Hi,
I'm looking for a way to read and collect data/information from any website.
For example: from the Amazon website I need the list of books and authors for a specific subject; from a blog site, a list of all titles and dates; or from a pizza website, all the types of pizza and their prices.
I then want to export it to an XML file or even a DB table.

Does anyone know of an API that does that?

Thanks

You'd have to write a parser for each and every site you want to gather data from. Amazon has its own API for retrieving data. Blog sites have their own APIs too, but you could just as easily use their RSS feeds as a standard interface for retrieving content. Pizza sites would require you to write a specialized parser to pull the information out of the HTML of each page. You'd need to supply custom information for each site to tell the parser how to find the data you're looking for.
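For the blog case, reading the RSS feed really is the easy route. Here is a minimal Python sketch, assuming an RSS 2.0 feed in which each post is an `<item>` element with `<title>` and `<pubDate>` children. The sample feed is made up. In practice you would pass in the bytes returned by `urllib.request.urlopen(feed_url).read()`, where `feed_url` is whatever feed address the blog advertises.

```python
import xml.etree.ElementTree as ET


def parse_rss_items(rss_text):
    """Return (title, pubDate) pairs from an RSS 2.0 document (str or bytes)."""
    root = ET.fromstring(rss_text)
    items = []
    # RSS 2.0 layout (assumed): <rss><channel><item><title/><pubDate/></item>...
    for item in root.iter("item"):
        title = item.findtext("title", default="").strip()
        date = item.findtext("pubDate", default="").strip()
        items.append((title, date))
    return items


# Made-up sample feed for illustration only.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel><title>Example blog</title>
<item><title>First post</title><pubDate>Wed, 02 Mar 2011 10:00:00 GMT</pubDate></item>
<item><title>Second post</title><pubDate>Thu, 03 Mar 2011 09:30:00 GMT</pubDate></item>
</channel></rss>"""

for title, date in parse_rss_items(SAMPLE_FEED):
    print(date, "-", title)
```

Atom feeds use different element names (`entry`, `updated`) and an XML namespace, so they would need their own lookups.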
 
 
Comments
Ed Nutting 2-Mar-11 16:32pm    
My vote of four. Very true if you want specific information that you can display the way you want and update without checking, say, the Amazon website for changes, but it's not necessarily the only way to solve the problem :P
wizardzz 2-Mar-11 17:06pm    
My vote of 5.

Ed, you are suggesting he use a crawler. The parser that Dave suggested is essentially a crawler. I once wrote a similar program in a matter of minutes, as a favor to my boss's friend. There is no need to suggest an off-the-shelf crawler for this.
Or could you use a web crawler?

Many good web crawlers let you pull all the web pages (and thus the information) from all or selected parts of a website. There are some good ones out there; it just takes a bit of searching. They can be very useful for gathering information, and excellent for keeping copies of useful sites that are about to be taken down ;P
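Here is a minimal sketch of what such a crawler does, in Python with only the standard library. It does a breadth-first walk that follows `<a href>` links but stays on the starting host. The `fetch` callable is an assumption left to the caller, for example a wrapper around `urllib.request.urlopen`. Passing it in also makes the sketch testable offline. It has no politeness delays, robots.txt checks, or error handling.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href values of <a> tags on one page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=50):
    """Breadth-first crawl restricted to start_url's host.

    fetch(url) must return the page HTML as a string, or None on failure.
    Stops once max_pages pages have been fetched successfully.
    Returns {url: html} for every page fetched.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.links:
            # Resolve relative links and drop any "#fragment" part.
            absolute, _ = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages
```

`crawl("http://example.com/", my_fetch)` (both names hypothetical) returns a dict from URL to HTML. That dict can then be handed to whatever per-site parser pulls out the actual data.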
 
 
The answers above give you enough information. However, it is also possible to write your own webbots; Google "how to write a webbot".

Here are a few links...

This is a book that gives examples:
http://www.heatonresearch.com/articles/series/20

Here is a free library:

http://www.download3k.com/Software-Development/Editors-Tools/Download-Web-Bot-Programming-Library.html

This snippet is from the book Introduction to Neural Networks for C#, Second Edition, published by Heaton Research (www.heatonresearch.com).

Chapter 13: Bot Programming and Neural Networks
• Creating a Simple Bot
• Analyzing Text
• Training a Neural Bot
• Using a Neural Bot
Bots are computer programs that are designed to use the Internet in much the same way as humans use it. Neural networks can be useful in developing bots. In this chapter you will see how a neural network can be used to assist a bot in finding desired information on the Internet.
 
 
Has anyone used this before?
http://web-harvest.sourceforge.net/index.php

In fact, I need something easy to modify that just grabs some data from a specific website.

Any advice, please?
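For something easy to modify, one option is to keep the site-specific parts in a small rules table: which CSS class marks one item and which classes mark its fields. Pointing the extractor at another site then means editing the rules, not the parser. Below is a minimal Python sketch (standard library only) that also exports the results to XML and to a SQLite table, as the original question asked. The rules, class names, and sample HTML are all hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical rules for one pizza site: edit these, not the parser, per site.
PIZZA_RULES = {
    "record": "pizza",  # class on the element that wraps one item
    "fields": {"name": "pizza-name", "price": "pizza-price"},  # field -> class
}

# Elements that never get an end tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class RuleExtractor(HTMLParser):
    """Builds one dict per record element, filling fields by CSS class."""

    def __init__(self, rules):
        super().__init__()
        self.rules = rules
        self.records = []
        self.field = None  # name of the field currently being captured
        self.depth = 0     # open elements inside that field's element
        self.text = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.rules["record"] in classes:
            self.records.append({})  # a new item starts here
        if tag in VOID_TAGS:
            return
        if self.field is not None:
            self.depth += 1  # nested markup such as <em> inside a field
            return
        for name, css_class in self.rules["fields"].items():
            if css_class in classes and self.records:
                self.field, self.depth, self.text = name, 1, []
                break

    def handle_endtag(self, tag):
        if self.field is None or tag in VOID_TAGS:
            return
        self.depth -= 1
        if self.depth == 0:
            # Collapse runs of whitespace in the captured text.
            self.records[-1][self.field] = " ".join("".join(self.text).split())
            self.field = None

    def handle_data(self, data):
        if self.field is not None:
            self.text.append(data)


def extract(html, rules):
    parser = RuleExtractor(rules)
    parser.feed(html)
    parser.close()
    return parser.records


def to_xml(records, root_tag="items", item_tag="item"):
    """Return an ElementTree with one <item> element per record."""
    root = ET.Element(root_tag)
    for record in records:
        item = ET.SubElement(root, item_tag)
        for key, value in record.items():
            ET.SubElement(item, key).text = value
    return ET.ElementTree(root)


def to_sqlite(records, conn, table, columns):
    """Insert records as rows; table/column names must come from trusted config."""
    cols = ", ".join(columns)
    marks = ", ".join("?" for _ in columns)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({cols})")
    conn.executemany(f"INSERT INTO {table} ({cols}) VALUES ({marks})",
                     [tuple(r.get(c) for c in columns) for r in records])
    conn.commit()


# Made-up markup standing in for a real pizza page.
SAMPLE_HTML = """
<div class="pizza"><h3 class="pizza-name">Margherita</h3>
  <span class="pizza-price">$8.99</span></div>
<div class="pizza"><h3 class="pizza-name">Pepperoni <em>Special</em></h3>
  <span class="pizza-price">$10.49</span></div>
"""
```

`to_xml(records).write("pizzas.xml", encoding="utf-8", xml_declaration=True)` writes the XML file. `to_sqlite(records, sqlite3.connect("pizzas.db"), "pizzas", ["name", "price"])` fills a table. Real sites rarely mark up items this cleanly, so the rules would have to be worked out per site by inspecting its HTML.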
 
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900