Python: using a while loop to parse multiple pages

Question

Starting from the URL passed to the function call at the bottom of the code, I want to parse the quotes listed on every page. However, this code only returns the first page's quotes and the URL of the next page. I want to use a while loop to parse the quotes from the next page and from every subsequent page that has a next button.

import requests, bs4, urllib.parse

def process(url):
	# Download the page and parse its HTML.
	page = requests.get(url)
	soup = bs4.BeautifulSoup(page.text, 'html5lib')
	# Collect the text of every quote on this page.
	quotes = []
	for quote in soup.select('div[class="quote"] > span[class="text"]'):
		quotes.append(quote.getText())
	# Look up the "next" link once, outside the loop, so next_url is always set.
	next_button = soup.select('li[class="next"] > a')
	if next_button:
		next_url = urllib.parse.urljoin(page.url, next_button[0]['href'])
	else:
		next_url = None
	return quotes, next_url

process('http://quotes.toscrape.com/page/1/')

What I have tried:

Adding a while statement just before the return statement gives the same output, while adding one as the first line of the function body returns nothing. I have a feeling it has something to do with the page number, as this goes up to page/10/, but I can't quite figure it out. Using http://quotes.toscrape.com/ as the URL also returns the first page.

Posted 24-Dec-19 21:30pm

Member 14699359

Updated 26-Dec-19 1:36am

1 solution

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Richard MacCutchan · Answer 1 · 2019-12-26T01:36:00

Solution 1

See "8. Compound statements" in the Python 3.7.6 documentation. You need to put the while loop in the process function and repeat for as long as you find a new URL.
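One possible way to sketch that loop is shown below. It is not taken from the answer: it keeps the loop in a small driver function around process rather than inside it, and the names collect_all_quotes and process_page are hypothetical. It assumes the page function returns a (quotes, next_url) pair with next_url set to None on the last page, as the code in the question does.

```python
def collect_all_quotes(start_url, process_page):
    """Follow "next" links from start_url and gather the quotes of every page.

    process_page is assumed to be a function like the question's process():
    it takes a URL and returns (quotes, next_url), where next_url is None
    once there is no next button.
    """
    all_quotes = []
    seen = set()  # guards against a page that links back to an earlier one
    url = start_url
    while url is not None and url not in seen:
        seen.add(url)
        quotes, url = process_page(url)  # url now holds the next page, or None
        all_quotes.extend(quotes)
    return all_quotes
```

With the process function from the question, the call would look like collect_all_quotes('http://quotes.toscrape.com/page/1/', process). The loop ends when process reports no next URL, so the number of pages does not need to be known in advance.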

Posted 26-Dec-19 1:36am

Richard MacCutchan