To do this you can use a System.Net.WebClient:
- use it to download the page as one big string
- save that string to a file on the hard disk
- parse the string with regular expressions for images and whatever else you want
- download the images and whatever else you want
(If you want to get all the pages of the website, you will also need to parse for hyperlinks and recursively download those pages too.)
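A minimal sketch of those first four steps might look like this; the URL, output folder and class name are only placeholders you would swap for your own:

```
using System;
using System.IO;
using System.Net;
using System.Text.RegularExpressions;

class PageDownloader
{
    static void Main()
    {
        // Placeholder URL and output folder -- change these to your own site and paths.
        var pageUri = new Uri("http://www.example.com/index.html");
        var outputDir = @"C:\temp\site";
        Directory.CreateDirectory(outputDir);

        using (var client = new WebClient())
        {
            // 1. Download the page as one big string.
            string html = client.DownloadString(pageUri);

            // 2. Save the string to a file on the hard disk.
            File.WriteAllText(Path.Combine(outputDir, "index.html"), html);

            // 3. Parse the string with a regular expression for <img> tags.
            var imgPattern = new Regex("<img[^>]+src\\s*=\\s*[\"']([^\"']+)[\"']",
                                       RegexOptions.IgnoreCase);

            // 4. Download each image, resolving relative paths against the page URI.
            foreach (Match match in imgPattern.Matches(html))
            {
                var imageUri = new Uri(pageUri, match.Groups[1].Value);
                var fileName = Path.GetFileName(imageUri.LocalPath);
                client.DownloadFile(imageUri, Path.Combine(outputDir, fileName));
            }
        }
    }
}
```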
Please do keep in mind that the home page will have a link back to the home page!
To get around circles like that, use a Dictionary to keep track of what you have and have not downloaded. A dictionary can contain another dictionary, so you can mirror the folder structure of the site: store index.html in the first dictionary, add a nested dictionary for the asp.net folder, and store that folder's index.html in the second dictionary.
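Here is a rough sketch of that bookkeeping, assuming you key the nested dictionaries on the URL's path segments (the VisitedTracker class and TryAdd method are just illustrative names, not an existing API):

```
using System;
using System.Collections.Generic;

class VisitedTracker
{
    // Nested dictionaries mirror the folder structure of the site: a key whose
    // value is null is a downloaded page, a key holding another dictionary is a
    // folder (e.g. "asp.net" containing its own "index.html").
    private readonly Dictionary<string, object> root = new Dictionary<string, object>();

    public Dictionary<string, object> Root { get { return root; } }

    // Records the page and returns true if it was new, false if already downloaded.
    public bool TryAdd(Uri pageUri)
    {
        var current = root;
        var segments = pageUri.AbsolutePath.Trim('/').Split('/');

        // Walk (or create) a nested dictionary for every folder in the path.
        for (int i = 0; i < segments.Length - 1; i++)
        {
            object child;
            if (!current.TryGetValue(segments[i], out child) ||
                !(child is Dictionary<string, object>))
            {
                child = new Dictionary<string, object>();
                current[segments[i]] = child;
            }
            current = (Dictionary<string, object>)child;
        }

        // The last segment is the page itself; treat "/" as "index.html".
        var page = segments[segments.Length - 1];
        if (page.Length == 0) page = "index.html";

        if (current.ContainsKey(page))
            return false;          // already downloaded, skip it
        current[page] = null;      // mark as downloaded
        return true;
    }
}
```

A crawler would call TryAdd for every hyperlink it finds and only download the URL when it returns true.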
In the end you can then recursively print the dictionary to give you the sitemap.
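Continuing the VisitedTracker sketch above (same usings, same nested Dictionary<string, object> shape), the recursive print could look like this:

```
// Recursively prints the nested dictionary as an indented sitemap.
static void PrintSitemap(Dictionary<string, object> node, int depth)
{
    foreach (var entry in node)
    {
        Console.WriteLine(new string(' ', depth * 2) + entry.Key);

        // A value that is another dictionary is a folder; recurse into it.
        var folder = entry.Value as Dictionary<string, object>;
        if (folder != null)
            PrintSitemap(folder, depth + 1);
    }
}

// e.g. PrintSitemap(tracker.Root, 0); where tracker is your VisitedTracker instance
```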
Do keep in mind that this will generate a lot of traffic, and might not always be allowed by the website owners.
Hope this helps you on your way :-)