WebClient DownloadStringAsync loading wrong webpage

Question

I am using a WebClient routine to capture a webpage. Previously all was well; however, on one website I am now suddenly retrieving a page that seems to come "before" the page I'm after.

In brief the routine looks like this:

C#

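// ~~> Create a new WebClient instance.
WebClient x = new WebClient();
// ~~> DownloadStringAsync returns void, so read the page in the completed event.
x.DownloadStringCompleted += (s, e) => { string source = e.Result; };
// ~~> Set the URL (SearchTerm is assumed to be a string) and start the download.
x.DownloadStringAsync(new Uri(SearchTerm));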

Previously all was fine, but now one website is returning a page that begins like this:

HTML

<html><head><meta http-equiv="Pragma" content="no-cache">

This is only occurring on one website (rest are fine) and I assume that this must be a change to this particular website. Can someone point me towards what it is that has changed and how I get round this to load the "true" file?

Thanks.

Posted 8-Sep-15 3:26am

Andrw_S

Comments

Kornfeld Eliyahu Peter 8-Sep-15 9:41am

What happens if you browse to that address from a browser?

Andrw_S 8-Sep-15 9:44am

The result is exactly as I would expect - i.e. the "true" page is loaded.

Kornfeld Eliyahu Peter 8-Sep-15 10:00am

Is there no secondary page (advertising, for example)?

F-ES Sitecore 8-Sep-15 9:42am

Ask the owners of the site you're downloading the html from.

1 solution

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Andrw_S · Accepted Answer · 2015-09-09T04:36:00

After much fiddling, I discovered that the page actually loaded twice. The first load was this default header file that now precedes all their pages. The second was the true page.

Should you ever run into this problem, you can check for something similar using the WebBrowser DocumentCompleted event. Something like this:

C#

// ~~> Create a browser.
WebBrowser browser = new WebBrowser();

// ~~> Add the DocumentCompleted handler to the browser.
browser.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(browser_DocumentCompleted);

// ~~> Suppress script error dialogs.
browser.ScriptErrorsSuppressed = true;

// ~~> Load the page (SearchTerm is the URL from the question).
browser.Navigate(SearchTerm);

Then check the content of the page from the browser DocumentText.

C#

private void browser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    // ~~> Get the source string from the browser that raised the event.
    string source = ((WebBrowser)sender).DocumentText;
}

Because the event fires once for each load, wrap the DocumentCompleted handling in some sort of holding routine that keeps waiting until the string "source" contains the information you expect.
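
For anyone wanting a starting point, here is a rough sketch of one possible holding routine. It is not the exact code I used. The names PageCapture, IsExpectedPage and Capture, the marker text and the timeout are all made up for illustration. It assumes a Windows Forms project and that Capture is called from an STA thread (the normal UI thread). It pumps messages with Application.DoEvents, which is simple but can cause re-entrancy, so treat it as a quick test rather than production code.

```csharp
using System;
using System.Threading;
using System.Windows.Forms;

static class PageCapture
{
    // Hypothetical check: true once the document contains text that only
    // the "true" page is expected to contain.
    public static bool IsExpectedPage(string source, string marker)
    {
        return !string.IsNullOrEmpty(source)
            && source.IndexOf(marker, StringComparison.OrdinalIgnoreCase) >= 0;
    }

    // Loads url in a WebBrowser and returns the first document that contains
    // marker, or null if no such document arrives before the timeout.
    // Must be called from an STA thread.
    public static string Capture(string url, string marker, TimeSpan timeout)
    {
        string result = null;

        using (WebBrowser browser = new WebBrowser())
        {
            browser.ScriptErrorsSuppressed = true;

            // Fires once per completed load, so the interim page arrives first.
            browser.DocumentCompleted += (sender, e) =>
            {
                string source = ((WebBrowser)sender).DocumentText;
                if (IsExpectedPage(source, marker))
                {
                    result = source;
                }
            };

            browser.Navigate(url);

            // Holding loop: keep processing window messages until the
            // expected page has loaded or the timeout expires.
            DateTime deadline = DateTime.UtcNow + timeout;
            while (result == null && DateTime.UtcNow < deadline)
            {
                Application.DoEvents();
                Thread.Sleep(50);
            }
        }

        return result;
    }
}
```

You would call it with something like PageCapture.Capture(SearchTerm, "some text from the real page", TimeSpan.FromSeconds(30)) and treat a null return as "the expected page never arrived".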