Issues in webscraping Python

Question

I tried to scrape page 1 of the amazon.in laptop search results in Python using the BeautifulSoup module. When I try to get a list of the product names and prices, it shows only one value instead of the full list.
Here is the Amazon website link: [DELETED]

What I have tried:

Here is my Code

from bs4 import BeautifulSoup
import requests


url= "https://www.amazon.in/s?k=laptop&crid=11C0N6A7MKTOH&sprefix=laptop%2Caps%2C1151&ref=nb_sb_noss_1"
page = requests.get(url)

soup = BeautifulSoup(page.content, 'html.parser')
lists = soup.find_all('div',class_="s-main-slot s-result-list s-search-results sg-row")
print(lists)

for list in lists:
        Name = list.find('span',class_="a-size-medium a-color-base a-text-normal").text.replace('\n', '')
        Price = list.find('span', class_="a-price-whole").text.replace('\n', '')
print(Name)
print(Price)

Here is my output

HP 15s-Ryzen 3 3250U 8GB SDRAM/256GB SSD 15.6inch(39.6cm) HD, Micro-Edge Laptop/AMD Radeon Graphics/Dual Speakers/Win 11 Home/MS Office/Fast Charge/Jet Black/1.69Kg, 15s-ey1508AU
29,999

Posted 10-Jan-23 23:33pm

Anirudh Sudarsan

Updated 11-Jan-23 0:52am

OriginalGriff

v2

Comments

Richard MacCutchan 11-Jan-23 5:50am

It may have something to do with the way that the information is generated on the page. But this is an issue with the source data, rather than a programming problem.

Anirudh Sudarsan 11-Jan-23 5:59am

I tried the same code with other websites and I get the same error.

Richard MacCutchan 11-Jan-23 6:14am

You need to examine the entire page that is returned on the call to requests.get(url). From that you should be able to see what data is being presented. You can only extract items that are sent in the page.
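One way to follow this advice is to count how many elements each selector matches in the returned HTML before writing any extraction code. The sketch below is only an illustration: `count_matches` is a hypothetical helper, and `sample_html` is a simplified stand-in for the real response. With a live request you would pass `page.text` and also check `page.status_code`.

```python
from bs4 import BeautifulSoup


def count_matches(html, selectors):
    """Return {selector: number of matching elements} for the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return {selector: len(soup.select(selector)) for selector in selectors}


# Hypothetical, simplified stand-in for the HTML returned by requests.get(url).
sample_html = """
<div class="s-main-slot s-result-list">
  <div class="s-result-item"><span class="a-price-whole">29,999</span></div>
  <div class="s-result-item"><span class="a-price-whole">45,990</span></div>
</div>
"""

counts = count_matches(sample_html, ["div.s-main-slot", "div.s-result-item", "span.a-price-whole"])
for selector, count in counts.items():
    print(selector, count)
```

A count of zero means the element was not sent in the page at all, and a count of one for a container explains why a loop over it runs only once.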

2 solutions

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Richard Deeming · Answer 1 · 2023-01-11T00:52:00

Quote:

Python

lists = soup.find_all('div',class_="s-main-slot s-result-list s-search-results sg-row")

Look at the returned HTML - there is only one <div class="s-main-slot ..."> element.

Your code loops over that one element, and finds the first name and price spans within the entire list.

Change your code to find all <div class="s-result-item ..."> items instead, and you might have better luck:

Python

lists = soup.select("div.s-result-item")

NB: use select rather than find_all, since you can't specify the full class attribute, as it changes for each element.
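A small sketch of the class-matching behaviour behind this note, using hypothetical markup in which each result carries extra classes that differ per item. Going by the Beautiful Soup documentation, a multi-class string passed to `class_` is compared against the exact attribute value, while a single class name (or a CSS class selector) matches on that one class:

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup: the extra classes differ from item to item.
html = """
<div class="s-main-slot s-result-list">
  <div class="s-result-item s-asin sg-col-4">first</div>
  <div class="s-result-item s-asin sg-col-8">second</div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Matches only when the class attribute is exactly this string.
exact = soup.find_all("div", class_="s-result-item s-asin sg-col-4")

# A CSS class selector matches any element that has the class among others.
by_selector = soup.select("div.s-result-item")

# A single class name passed to class_ also matches on that one class.
by_single_class = soup.find_all("div", class_="s-result-item")

print(len(exact), len(by_selector), len(by_single_class))
```

So `find_all` can work here too, as long as only the shared class name is passed instead of the full attribute string.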

Alternatively, use the data-component-type attribute to find the search result items:

Python

lists = soup.select('div[data-component-type="s-search-result"]')
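Putting the pieces together, a minimal sketch of the per-item loop might look like the following. It runs against a hypothetical, simplified `SAMPLE_HTML` instead of the live page, `parse_results` is an illustrative name, and the span classes are the ones from the question, which may not match what Amazon serves at any given time.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the search page markup.
SAMPLE_HTML = """
<div class="s-main-slot s-result-list s-search-results sg-row">
  <div class="s-result-item s-asin" data-component-type="s-search-result">
    <span class="a-size-medium a-color-base a-text-normal">Laptop A</span>
    <span class="a-price-whole">29,999</span>
  </div>
  <div class="s-result-item s-asin" data-component-type="s-search-result">
    <span class="a-size-medium a-color-base a-text-normal">Laptop B</span>
    <span class="a-price-whole">45,990</span>
  </div>
  <div class="s-result-item s-asin" data-component-type="s-search-result">
    <span class="a-size-medium a-color-base a-text-normal">Laptop C (no price)</span>
  </div>
</div>
"""


def parse_results(html):
    """Return a list of (name, price) tuples, one per search result item."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    # Loop over each result item, not over the single outer container.
    for item in soup.select('div[data-component-type="s-search-result"]'):
        name = item.find("span", class_="a-size-medium a-color-base a-text-normal")
        price = item.find("span", class_="a-price-whole")
        # Skip items that lack a name or a price instead of failing on None.
        if name is None or price is None:
            continue
        results.append((name.get_text(strip=True), price.get_text(strip=True)))
    return results


if __name__ == "__main__":
    # Print inside the loop so every item is shown, not only the last one.
    for name, price in parse_results(SAMPLE_HTML):
        print(name, price)
```

Collecting the values into a list inside the loop, and checking for `None` before reading the text, avoids both the single-value output and an `AttributeError` on items that have no price.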

OriginalGriff · Answer 2 · 2023-01-10T23:57:00

Solution 1

Quote:
so is there any way to solve it

You could try using the Amazon API: Selling Partner API

Posted 10-Jan-23 23:57pm

OriginalGriff

Comments

Anirudh Sudarsan 11-Jan-23 6:01am

The client wants to use the Python BeautifulSoup module for it so that the data can be converted to a CSV file. So is there any way to stop this error?

Dave Kreskowiak 11-Jan-23 8:05am

Well, the client doesn't get to dictate which libraries to use or even the method of getting the data. They usually do not know the pitfalls of going with a specific process like this.

The problem you're going to run into with web page scraping is when Amazon changes the page layout, your code will break.

Beautiful Soup is for processing an input (HTML), not for generating the CSV file. One has nothing to do with the other.
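To illustrate the separation, the CSV step can be handled by Python's standard `csv` module once the data has been extracted. This is a minimal sketch, assuming the scraped data is already a list of (name, price) tuples; `rows_to_csv` and the sample rows are hypothetical.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialise (name, price) rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["name", "price"])
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical rows, as they might come out of the parsing step.
rows = [("Laptop A", "29,999"), ("Laptop B", "45,990")]
csv_text = rows_to_csv(rows)
print(csv_text)
```

The writer quotes the prices because they contain commas. To write to a file instead of a string, pass a file opened with `open(path, "w", newline="")` to `csv.writer`.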

OriginalGriff 11-Jan-23 8:26am

Most times the client doesn't know or even care what language is used, never mind what package! :D

And maybe he just likes fixing his app every week or so because Amazon changed the site *again*. :laugh:

Dave Kreskowiak 11-Jan-23 8:27am

I get the feeling the "client" is a teacher or other student and this is a fiverr job.

Anirudh Sudarsan 11-Jan-23 8:32am

By client I mean my trainer in my company. As I am a fresher, I am learning from my trainer how to do it.
But the fact is that they don't know web scraping that well. So here I am using Beautiful Soup for it. So can you tell me how to solve this error instead of suggesting alternatives?

Richard MacCutchan 11-Jan-23 8:51am

You solve the "error" by doing what I suggested two hours ago. You need to examine all the data that is returned to find out exactly which classes are used for the items you are trying to extract. But as others have commented here, you cannot guarantee that Amazon will not change things the next time you try to run it. You would be better starting with a web page that you know will not change in order to test your code.

Anirudh Sudarsan 11-Jan-23 9:01am

1. The classes used in the Amazon code are the same classes that I used in my code.

2. I haven't used your solution, so I don't know if it will work.

3. Are there other software packages or modules for web scraping? If so, can you send the name and a brief description so that I can explore them in my free time?

4. Also, is it just popular, heavily used websites like Amazon that change their code often, or does every company do it? I am curious.

Richard MacCutchan 11-Jan-23 9:42am

1. Well I cannot see any reason why they are not being found. I have used BeautifulSoup a number of times without problems
2. N/A
3. If you know C# you could look at Html Agility Pack
4. Websites, by their very nature and function, are rarely static entities.
