I am trying to build a system that, given an input, returns specific, relevant information about it by scraping the web. For example, given a software name, it would output information about that software's releases.

How should I go about building a scraper for such a system?


What I have tried:

I have done web scraping before with Beautiful Soup, but only to extract information from a single, specific website.

In this case, I might have to scrape websites whose URLs are built dynamically (such as the wiki page of the input software, or official product pages that appear in Google search results), and different software websites and wikis structure their release data differently. Are there other approaches to getting this kind of information about different software products in a structured way?
Comments
Richard MacCutchan 17-Nov-17 4:39am    
Just the same as scraping a single site, but you will need to find a list of website addresses from somewhere.
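
One way to read the suggestion above is sketched below: build a list of candidate addresses from the input name, then route each fetched page to an extractor written for that site's layout. This is only a minimal sketch under stated assumptions. The hosts `wiki.example.org` and `vendor.example.com`, their URL patterns, the idea that releases sit in `td` or `li` elements, and every function name are hypothetical placeholders. It uses the standard-library `html.parser` so it runs without extra installs; Beautiful Soup could be used in its place. Fetching the pages is deliberately left out.

```python
from html.parser import HTMLParser
from urllib.parse import quote, urlparse


class _TextByTag(HTMLParser):
    """Collect the text found inside every element with a given tag name."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.depth = 0      # > 0 while we are inside a matching element
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1
            self.items.append("")

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.items[-1] += data


def texts(html, tag):
    """Return the stripped, non-empty text of each <tag> element in html."""
    parser = _TextByTag(tag)
    parser.feed(html)
    parser.close()
    return [t.strip() for t in parser.items if t.strip()]


# Hypothetical per-site extractors: each one encodes a single site's layout.
def releases_from_wiki(html):
    return texts(html, "td")    # assumption: releases are listed in table cells


def releases_from_vendor(html):
    return texts(html, "li")    # assumption: releases are listed as list items


# Registry mapping a host name to the extractor that understands it.
EXTRACTORS = {
    "wiki.example.org": releases_from_wiki,
    "vendor.example.com": releases_from_vendor,
}


def candidate_urls(name):
    """Build the list of addresses to try for a software name (made-up patterns)."""
    slug = quote(name.replace(" ", "_"))
    return [
        f"https://wiki.example.org/wiki/{slug}",
        f"https://vendor.example.com/{slug}/releases",
    ]


def extract_releases(url, html):
    """Pick the extractor by host; unknown sites yield an empty list."""
    extractor = EXTRACTORS.get(urlparse(url).netloc)
    return extractor(html) if extractor else []
```

Supporting a new site then means adding one extractor function and one registry entry, while the rest of the pipeline stays the same. Pages from hosts without an extractor return an empty list, which makes it easy to log which sites still need one.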

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


