Hello, dear friends. I want to get documents from a website: build an interface where I enter the URL of any website and then retrieve the documents from that site. Kindly help.

What I have tried:

using System;
using System.Net;
using System.IO;

public partial class Y : System.Web.UI.Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        Text1.Text = GetWebSiteContents("http://localhost/WebSite1/X.aspx");
    }

    protected string GetWebSiteContents(string url) {
        WebRequest req = WebRequest.Create(url);
        System.Text.StringBuilder sb = new System.Text.StringBuilder();
        // Get the stream from the returned web response; the using blocks dispose both when done
        using (WebResponse resp = req.GetResponse())
        using (StreamReader sr = new StreamReader(resp.GetResponseStream())) {
            string strLine;
            // Read the stream a line at a time and place each one into the stringbuilder
            while ((strLine = sr.ReadLine()) != null) {
                // Ignore blank lines
                if (strLine.Length > 0) sb.Append(strLine);
            }
        }
        return sb.ToString();
    }
}
With this code I get the text of the page.
Posted
Updated 24-Jul-17 6:23am
Comments
F-ES Sitecore 20-Jul-17 5:05am    
You're not going to be able to do that. You can get the HTML, but that HTML will point to resources (CSS files, JS files, images, etc.) that, depending on how the links are structured, may no longer be valid or will no longer work now that they are being served from a different domain. If you could impersonate any page like this, everyone would be releasing their own Google or Gmail or whatever that simply hijack the proper services run by those companies.
Member 12982873 20-Jul-17 5:09am    
Can you help me in this case?
ZurdoDev 20-Jul-17 10:52am    
Help with what? It's not very clear.
Member 12982873 23-Jul-17 16:30pm    
then clear it?
ZurdoDev 24-Jul-17 7:48am    
Yes please.

1 solution

Hi,

The WebRequest class lets you get the HTML (source code) of a site.
You can then read the tags in that HTML with the HtmlAgilityPack library.
Link: Html Agility Pack | HAP

Here is the code to read the HTML of the site:
C#
// Requires: using System; using System.IO; using System.Net;
private HtmlAgilityPack.HtmlDocument GetWebSiteContents(string url)
{
    // Form fields to post, URL-encoded (empty here).
    // If there is nothing to post, a plain GET without a request body is usually enough.
    string postString = "";
    byte[] postBytes = System.Text.Encoding.UTF8.GetBytes(postString);

    HttpWebRequest WebReq = (HttpWebRequest)WebRequest.Create(new Uri(url));

    // Our method is POST, otherwise the buffer (postvars) would be useless.
    WebReq.Method = "POST";
    // We use the form content type for the postvars.
    WebReq.ContentType = "application/x-www-form-urlencoded";

    WebReq.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";

    // The length of the buffer (postvars) in bytes is used as the content length.
    WebReq.ContentLength = postBytes.Length;

    // Send the post fields.
    using (Stream requestStream = WebReq.GetRequestStream())
    {
        requestStream.Write(postBytes, 0, postBytes.Length);
    }

    // Get the response stream from the server and build an HTML document from it.
    HtmlAgilityPack.HtmlDocument htmldoc = new HtmlAgilityPack.HtmlDocument();
    using (WebResponse response = WebReq.GetResponse())
    using (Stream stream = response.GetResponseStream())
    {
        htmldoc.Load(stream);
    }
    // The using blocks close the stream and the response.

    return htmldoc;
}


IMPORTANT! If the site requires authentication, you need to send the authentication cookie.
C#
....

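HttpWebRequest WebReq = (HttpWebRequest)WebRequest.Create(url);

// authentication
// Placeholder: put your own authentication cookie here, in "name=value" form.
string authCookie = "name=value";
WebReq.CookieContainer = new CookieContainer();
// SetCookies expects a Uri, not a string.
WebReq.CookieContainer.SetCookies(new Uri(url), authCookie);

// end authentication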

....


To read the HTML tag elements:
C#
// Requires: using System.Linq; using HtmlAgilityPack;

// get the HTML document
HtmlAgilityPack.HtmlDocument html = this.GetWebSiteContents(url);

// read a specific input tag of the site, matched by its name attribute
HtmlNode[] elems = html.DocumentNode.Descendants("input")
    .Where(n => n.GetAttributeValue("name", "") == "name_html_tag")
    .ToArray();

// take the value of the first match; null if no such tag exists
string myvalue = elems.Length > 0 ? elems[0].GetAttributeValue("value", "") : null;
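
The same approach can be pointed at links instead of input tags, which is closer to getting documents from a site. What follows is only a minimal sketch. It assumes that "documents" means `<a href>` targets ending in a few common file extensions. The names `DocumentLinkFinder` and `FindDocumentLinks` and the extension list are made up for illustration. Relative links are resolved against the address of the page, since on their own they are not valid outside the original site.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class DocumentLinkFinder
{
    // Assumption: these extensions count as "documents"; adjust to your needs.
    private static readonly string[] DocumentExtensions =
        { ".pdf", ".doc", ".docx", ".xls", ".xlsx" };

    // Returns the absolute URLs of all <a href> targets that look like documents.
    public static List<Uri> FindDocumentLinks(HtmlDocument html, Uri pageUri)
    {
        var result = new List<Uri>();
        foreach (HtmlNode anchor in html.DocumentNode.Descendants("a"))
        {
            string href = anchor.GetAttributeValue("href", "");
            if (href.Length == 0) continue;

            // Resolve relative links such as "files/a.pdf" against the page address.
            Uri absolute;
            if (!Uri.TryCreate(pageUri, href, out absolute)) continue;

            // Skip mailto:, javascript: and other non-web links.
            if (absolute.Scheme != Uri.UriSchemeHttp &&
                absolute.Scheme != Uri.UriSchemeHttps) continue;

            // Compare only the path, so query strings do not hide the extension.
            string path = absolute.AbsolutePath;
            if (DocumentExtensions.Any(ext =>
                    path.EndsWith(ext, StringComparison.OrdinalIgnoreCase)))
            {
                result.Add(absolute);
            }
        }
        return result;
    }
}
```

Usage could look like `List<Uri> links = DocumentLinkFinder.FindDocumentLinks(html, new Uri(url));` with the `html` document from above. Each resulting URL can then be fetched, for example with `WebClient.DownloadFile`.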
 
Comments
Member 12982873 24-Jul-17 17:07pm    
But when I try to install HtmlAgilityPack, this error occurs: 'HtmlAgilityPack' already has a dependency defined for 'System.Net.Http'.
Sheila Pontes 24-Jul-17 21:01pm    
Hi,
Tomorrow, when I get to work, I'll look into this for you.
Sheila Pontes 25-Jul-17 7:31am    
Hi,

Don't install the HtmlAgilityPack package. Instead, download version 1.4.6 of Html Agility Pack, unzip the file, add a reference in your project to the assembly that matches your framework version, and add the namespace with "using HtmlAgilityPack;".

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


