Hi,

When I compare the source of a web page as the browser shows it (e.g. YouTube) with the source I get from the code below, the two differ. I suspect the difference is caused by DOM manipulation.

var webGet = new HtmlWeb();
HtmlAgilityPack.HtmlDocument document = webGet.Load(_url);

Is it possible to get the HTML source after the JavaScript and/or AJAX manipulations programmatically (with C#)?

Thanks in advance!
Posted
Updated 12-Dec-22 20:09pm

1 solution

Yes and no. Those DOM manipulations are done purely on the client side, after the document has already been delivered from the HTTP server in the HttpWebResponse. So, if you only download the HTML file from the server (using HttpWebRequest), you can only get the document as it was before its DOM was manipulated.

So, what can you do? You can reproduce all those manipulations on the client side, as the Web browser does. For this purpose, you can navigate to the Web page using System.Windows.Forms.WebBrowser. You can even manipulate the DOM yourself using the instance of this class. See the properties System.Windows.Forms.WebBrowser.Document and System.Windows.Forms.WebBrowser.DocumentText, and the events System.Windows.Forms.WebBrowser.Navigated and System.Windows.Forms.WebBrowser.DocumentCompleted, at http://msdn.microsoft.com/en-us/library/system.windows.forms.webbrowser.aspx.
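
To make this concrete, here is a minimal sketch of the approach, not a definitive implementation. The class name RenderedHtml and the method Load are made up for illustration. It assumes a .NET project that references System.Windows.Forms. The control needs an STA thread with a message loop, so the sketch creates one. As far as I know, DocumentText returns the markup as it was downloaded, so the sketch reads OuterHtml from the live Document instead.

```csharp
using System;
using System.Threading;
using System.Windows.Forms;

// Hypothetical helper: the names RenderedHtml and Load are illustrative only.
static class RenderedHtml
{
    // Navigates to the URL in an off-screen WebBrowser control and returns
    // the HTML of the DOM as it stands when loading completes.
    public static string Load(string url)
    {
        string html = null;

        var thread = new Thread(() =>
        {
            using (var browser = new WebBrowser { ScriptErrorsSuppressed = true })
            {
                browser.DocumentCompleted += (sender, e) =>
                {
                    // DocumentCompleted can fire once per frame; wait until
                    // the control reports the whole page as complete.
                    if (browser.ReadyState != WebBrowserReadyState.Complete)
                        return;

                    // Read the current DOM, not the originally downloaded text.
                    var roots = browser.Document.GetElementsByTagName("html");
                    if (roots.Count > 0)
                        html = roots[0].OuterHtml;

                    // Stop the message loop started by Application.Run below.
                    Application.ExitThread();
                };

                browser.Navigate(url);
                Application.Run(); // pump messages until ExitThread is called
            }
        });

        // The WebBrowser control requires a single-threaded apartment.
        thread.SetApartmentState(ApartmentState.STA);
        thread.Start();
        thread.Join();

        return html;
    }
}
```

The returned string can then be parsed with HtmlAgilityPack by calling LoadHtml on a new HtmlAgilityPack.HtmlDocument instead of using HtmlWeb.Load. Two caveats: scripts that keep changing the page after DocumentCompleted (for example, delayed AJAX calls) are not captured by this sketch, and the control renders with the Internet Explorer engine, so a site may serve it different markup than a modern browser gets.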

—SA
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


