Hi everyone,

Does anyone have an idea of how to programmatically navigate through all the pages of a website (probably only pages where login is not required) when just the index (home URL) of the website is known? Any help or ideas will be highly appreciated.

Thanks
Anurag

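Yeah, there are several articles on Code Project that implement "web crawlers" or "web spiders" (the usual names for the thing you're trying to do). Read them. The short version is: download the HTML of the home page, extract the links from it, then visit each link you haven't seen yet and download that HTML too, repeating until there are no new links left to check. Keep a record of the URLs you have already visited so you don't loop forever, and skip links that lead off the site.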
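The loop described above can be sketched roughly as follows. This is only a minimal illustration using the Python standard library, not the approach of any particular Code Project article. The names LinkParser, extract_links, fetch_page and crawl are made up for this example, as are the max_pages limit and the rule of staying on the start URL's host.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html, base_url):
    """Return absolute URLs (fragments removed) for all links in html."""
    parser = LinkParser()
    parser.feed(html)
    links = []
    for href in parser.hrefs:
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        links.append(absolute)
    return links


def fetch_page(url):
    """Download a page and return its text (hypothetical default fetcher)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch=fetch_page, max_pages=100):
    """Breadth-first crawl that stays on the start URL's host."""
    host = urlparse(start_url).netloc
    seen = {start_url}            # URLs already queued, to avoid loops
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue              # skip pages that fail to download
        visited.append(url)
        for link in extract_links(html, url):
            # Assumption: only follow links on the same host.
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

Calling crawl("http://example.com/") would return the list of pages it managed to reach. A real crawler should also respect robots.txt, throttle its requests, and check the content type before parsing, none of which this sketch does.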
 
Comments
@nuraGGupta@ 21-Dec-10 1:32am    
Ok, thanks. It was really helpful. :-)
To navigate the pages programmatically, you really need some kind of information source that tells you the name and location of every page within the web site.

For example, Google looks for a sitemap.xml file in the web root directory of a site. If one is found, it reads the page locations listed in it and then crawls those pages to read SEO-oriented data from each one.

So, what you need is some kind of sitemap.xml on your target web site that lists the name and location of all its pages, so that you can programmatically navigate through them.
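If the target site does publish a sitemap, reading it could look something like the sketch below. It assumes the file follows the sitemaps.org 0.9 schema and sits at /sitemap.xml in the web root, which not every site does. The function names parse_sitemap and fetch_sitemap_urls are made up for this example.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen

# XML namespace used by the sitemaps.org 0.9 schema.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def parse_sitemap(xml_text):
    """Return the text of every <loc> element found in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]


def fetch_sitemap_urls(site_root):
    """Download <site_root>/sitemap.xml (assumed location) and list its URLs."""
    sitemap_url = site_root.rstrip("/") + "/sitemap.xml"
    with urlopen(sitemap_url, timeout=10) as response:
        return parse_sitemap(response.read())
```

If the file turns out to be a sitemap index, the URLs returned are those of further sitemap files rather than pages, so each of them would need to be fetched and parsed in turn.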
 
Comments
@nuraGGupta@ 21-Dec-10 2:37am    
Thank you. :-)
Hi,

In the page load event, you have to redirect to the related page.

Or

You could try the Application Start event.
 
Comments
@nuraGGupta@ 21-Dec-10 1:21am    
Not a good answer. I was asking how to get all the related pages, and you are telling me how to navigate through them.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


