I tried to scrape page 1 of the amazon.in laptop search results in Python using the BeautifulSoup module. When I try to get a list of product names and prices, it shows only one value instead of the full list.
Here is the Amazon website link: [DELETED]

What I have tried:

Here is my code:
from bs4 import BeautifulSoup
import requests


url= "https://www.amazon.in/s?k=laptop&crid=11C0N6A7MKTOH&sprefix=laptop%2Caps%2C1151&ref=nb_sb_noss_1"
page = requests.get(url)

soup = BeautifulSoup(page.content, 'html.parser')
lists = soup.find_all('div',class_="s-main-slot s-result-list s-search-results sg-row")
print(lists)

for list in lists:
        Name = list.find('span',class_="a-size-medium a-color-base a-text-normal").text.replace('\n', '')
        Price = list.find('span', class_="a-price-whole").text.replace('\n', '')
print(Name)
print(Price)


Here is my output
HP 15s-Ryzen 3 3250U 8GB SDRAM/256GB SSD 15.6inch(39.6cm) HD, Micro-Edge Laptop/AMD Radeon Graphics/Dual Speakers/Win 11 Home/MS Office/Fast Charge/Jet Black/1.69Kg, 15s-ey1508AU
29,999
Updated 11-Jan-23 0:52am
Comments
Richard MacCutchan 11-Jan-23 5:50am    
It may have something to do with the way that the information is generated on the page. But this is an issue with the source data, rather than a programming problem.
Anirudh Sudarsan 11-Jan-23 5:59am    
I tried the same code with other websites and I get the same result.
Richard MacCutchan 11-Jan-23 6:14am    
You need to examine the entire page that is returned on the call to requests.get(url). From that you should be able to see what data is being presented. You can only extract items that are sent in the page.
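One way to do that check, as a minimal sketch: the helper below (`describe_page` is a hypothetical name, not part of any library) counts how often a few marker strings occur in the HTML text you got back. The markers are taken from the selectors discussed in this thread, and the sample HTML is made up for illustration.

```python
def describe_page(html, markers):
    """Return a dict mapping each marker string to how often it occurs in html."""
    return {marker: html.count(marker) for marker in markers}


# Made-up stand-in for the downloaded page; real Amazon markup is far larger.
sample_html = """
<div class="s-main-slot s-result-list s-search-results sg-row">
  <div data-component-type="s-search-result" class="s-result-item s-asin">A</div>
  <div data-component-type="s-search-result" class="s-result-item s-asin">B</div>
</div>
"""

counts = describe_page(
    sample_html,
    ["s-main-slot", 'data-component-type="s-search-result"', "a-price-whole"],
)
for marker, count in counts.items():
    print(f"{marker}: {count}")
```

With the code from the question you would pass `page.text` instead of `sample_html`, and it is worth printing `page.status_code` as well. If a marker's count is zero, that element was not sent in the response, so no selector will find it.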

Quote:
Python
lists = soup.find_all('div',class_="s-main-slot s-result-list s-search-results sg-row")
Look at the returned HTML - there is only one <div class="s-main-slot ..."> element.

Your code loops over that one element, and finds the first name and price spans within the entire list.

Change your code to find all <div class="s-result-item ..."> items instead, and you might have better luck:
Python
lists = soup.select("div.s-result-item")
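NB: use select (or pass only the single class name to find_all) rather than the full class string. A multi-class string passed to class_ is matched against the exact value of the class attribute, and that value changes for each element.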
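A small sketch of that difference, using made-up HTML with two result items whose class attributes differ:

```python
from bs4 import BeautifulSoup

# Made-up markup: both divs share "s-result-item" but differ in the second class.
html = """
<div class="s-result-item s-asin">one</div>
<div class="s-result-item s-widget">two</div>
"""
soup = BeautifulSoup(html, "html.parser")

# A multi-class string only matches that exact class attribute value.
exact = soup.find_all("div", class_="s-result-item s-asin")
# A single class name matches every element that carries that class.
single = soup.find_all("div", class_="s-result-item")
# The CSS selector form behaves like the single-class search.
selected = soup.select("div.s-result-item")

print(len(exact), len(single), len(selected))
```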

Alternatively, use the data-component-type attribute to find the search result items:
Python
lists = soup.select('div[data-component-type="s-search-result"]')
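Putting that together, here is a minimal sketch of the per-item loop. It runs against made-up HTML that only mimics the class names quoted in this thread, so the real page may need different selectors; `extract_results` is a hypothetical helper name. Note that each result is collected inside the loop (the question's code printed after the loop, which would show only the last item), and that a missing name or price span is handled instead of raising an AttributeError.

```python
from bs4 import BeautifulSoup

# Made-up HTML for illustration; it is not a copy of Amazon's markup.
sample_html = """
<div class="s-main-slot s-result-list s-search-results sg-row">
  <div data-component-type="s-search-result" class="s-result-item s-asin">
    <span class="a-size-medium a-color-base a-text-normal">Laptop A</span>
    <span class="a-price-whole">29,999</span>
  </div>
  <div data-component-type="s-search-result" class="s-result-item s-asin AdHolder">
    <span class="a-size-medium a-color-base a-text-normal">Laptop B</span>
  </div>
  <div class="s-result-item s-widget">
    <span>Related searches</span>
  </div>
</div>
"""


def extract_results(html):
    """Return a list of (name, price) tuples, one per search result item."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for item in soup.select('div[data-component-type="s-search-result"]'):
        name_tag = item.select_one("span.a-size-medium.a-color-base.a-text-normal")
        price_tag = item.select_one("span.a-price-whole")
        if name_tag is None:
            continue  # not a product entry we can name; skip it
        name = name_tag.get_text(strip=True)
        # Some items have no price span; record None rather than crashing.
        price = price_tag.get_text(strip=True) if price_tag else None
        rows.append((name, price))
    return rows


for name, price in extract_results(sample_html):
    print(name, price)
```

To use it with the question's code, pass `page.content` to `extract_results`. If it returns an empty list, check what HTML was actually returned before changing the selectors.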
 
Quote:
so is there any way to solve it
You could try using the Amazon API: Selling Partner API
 
Comments
Anirudh Sudarsan 11-Jan-23 6:01am    
The client wants to use the Python BeautifulSoup module for it so that the data can be converted to a CSV file. So is there any way to fix this error?
Dave Kreskowiak 11-Jan-23 8:05am    
Well, the client doesn't get to dictate which libraries to use or even the method of getting the data. They usually do not know the pitfalls of going with a specific process like this.

The problem you're going to run into with web page scraping is when Amazon changes the page layout, your code will break.

Beautiful Soup is for processing an input (HTML), not for generating the CSV file. One has nothing to do with the other.
OriginalGriff 11-Jan-23 8:26am    
Most times the client doesn't know or even care what language is used, never mind what package! :D

And maybe he just likes fixing his app every week or so because Amazon changed the site *again*. :laugh:
Dave Kreskowiak 11-Jan-23 8:27am    
I get the feeling the "client" is a teacher or other student and this is a Fiverr job.
Anirudh Sudarsan 11-Jan-23 8:32am    
By client I mean my trainer in my company. As I am a fresher, I am learning from my trainer how to do it.
But the fact is that they don't know web scraping that well, so here I am using Beautiful Soup for it. Can you tell me how to solve this error instead of suggesting alternatives?
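On the CSV part of the request: as noted in the comments, that step is separate from Beautiful Soup and can be done with Python's standard csv module. A minimal sketch, assuming the scraped results are already available as (name, price) tuples; the sample rows and the helper name `rows_to_csv` are made up for illustration.

```python
import csv
import io


def rows_to_csv(rows, out_file):
    """Write (name, price) tuples to out_file as CSV with a header row."""
    writer = csv.writer(out_file)
    writer.writerow(["name", "price"])
    for name, price in rows:
        # A missing price is written as an empty cell.
        writer.writerow([name, price if price is not None else ""])


# Made-up rows standing in for scraped results.
rows = [("Laptop A", "29,999"), ("Laptop B", None)]

# An in-memory buffer keeps the example self-contained.
buffer = io.StringIO()
rows_to_csv(rows, buffer)
print(buffer.getvalue())
```

To write a real file, pass a file object opened with `open("laptops.csv", "w", newline="", encoding="utf-8")` instead of the buffer (the file name is only an example). The csv module quotes values such as 29,999 automatically, so the comma inside the price does not split the column.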

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


