I want to parse the quotes listed on every page of the site, starting from the URL passed to the function call at the bottom of the code. However, this code returns only the first page's quotes, along with the URL of the next page. I want to use a while loop to parse the quotes from the next page and from every subsequent page that has a next button.

import requests, bs4, urllib.parse

def process(url):
	# Download one page and parse it
	page = requests.get(url)
	soup = bs4.BeautifulSoup(page.text, 'html5lib')
	# Collect the text of every quote on this page
	quotes = []
	for quote in soup.select('div[class="quote"] > span[class="text"]'):
		quotes.append(quote.getText())
	# Look for the next button once, after the loop, so next_url is always set
	next_button = soup.select('li[class="next"] > a')
	if next_button:
		next_url = urllib.parse.urljoin(page.url, next_button[0]['href'])
	else:
		next_url = None
	return quotes, next_url

process('http://quotes.toscrape.com/page/1/')


What I have tried:

Adding a while statement just before the return statement gives the same result as before, while adding one as the first line of the function body returns nothing. I have a feeling it has something to do with the page number, as the site goes up to page/10/, but I can't quite figure it out. Passing http://quotes.toscrape.com/ also returns only the first page.
Updated 26-Dec-19 1:36am

1 solution

See section 8, Compound statements, in the Python 3.7.6 documentation, which covers the while statement. You need to put the while loop in the process function, and repeat for as long as you find a new URL.
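As a rough illustration of that loop, here is a minimal sketch. It is a variation on the advice rather than the exact change described: instead of editing process, it wraps any function with the same return shape (a list of quotes plus the next URL, or None on the last page). The names scrape_all, fake_process and FAKE_SITE are hypothetical, and the fake site only stands in for the real pages so the loop can be tried without a network connection.

```python
def scrape_all(start_url, process_page):
    """Follow 'next' links from start_url, collecting quotes from every page.

    process_page is any callable that takes a URL and returns
    (quotes, next_url), with next_url set to None on the last page.
    """
    all_quotes = []
    seen = set()  # guards against a 'next' link that loops back to an earlier page
    url = start_url
    while url is not None and url not in seen:
        seen.add(url)
        quotes, url = process_page(url)  # url becomes the next page, or None
        all_quotes.extend(quotes)
    return all_quotes


# Hypothetical stand-in for the real site, keyed by URL.
FAKE_SITE = {
    'page/1/': (['quote A', 'quote B'], 'page/2/'),
    'page/2/': (['quote C'], 'page/3/'),
    'page/3/': (['quote D'], None),
}

def fake_process(url):
    # Same return shape as the question's process(): (quotes, next_url)
    return FAKE_SITE[url]

print(scrape_all('page/1/', fake_process))
```

With the question's own function, the call would presumably be scrape_all('http://quotes.toscrape.com/page/1/', process). The loop ends when process reports no next button, so it does not need to know how many pages the site has.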
 
This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


