I would like to:

1. Load a website "home" page.
2. Enter a search term.
3. Parse the resulting page for content. (I'm okay from this point).

I have a method for doing stages (1) and (2), but it is rather clunky and I'm sure there must be a better way.

Here's what I do currently:

C#
// ~~> Create a browser, a flag and a "home" page.
// (Declared as class fields so that the DocumentCompleted handler can use them.)
// Create browser.
private WebBrowser browser = new WebBrowser();

// Create flag for home page loaded.
private bool homepageLoaded = false;

// Dummy page (would come from a textbox etc. in reality).
private string homePage = "http://CompanyName.com/web/";


Then I add a handler and navigate to the page.

C#
// ~~> Add handler and navigate to page.
// Add handler.
browser.DocumentCompleted += browser_DocumentCompleted;
// Navigate to web site home page.
browser.Navigate(homePage);


In the "DocumentCompleted" event handler, I check whether the page that has just finished loading is the home page. If it is, I enter the search term and click the search button.

C#
private void browser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    // DocumentCompleted fires each time a page finishes loading,
    // so check which page has just loaded.
    if (e.Url.AbsoluteUri == homePage)
    {
        // Set flag.
        homepageLoaded = true;

        // Get the loaded document.
        HtmlDocument document = browser.Document;

        // Find the search box and the search button.
        HtmlElement tb = document.GetElementById("searchTerm");
        HtmlElement btn = document.GetElementById("btnSearch");
        if (tb == null || btn == null)
        {
            return; // The page does not contain the expected elements.
        }

        // Enter search term value.
        tb.SetAttribute("value", "12345678");
        // Click the button.
        btn.InvokeMember("Click");
    }
    else if (homepageLoaded && e.Url.AbsoluteUri.Contains(homePage))
    {
        // Search result webpage.
        string url = e.Url.AbsoluteUri;

        // -------------------------------------------------------
        // Use this "url" for scraping...(I am okay from here)
        // -------------------------------------------------------
    }
}
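A note on the URL checks in the handler above: `System.Uri` lowercases the host when it parses an http address, so `e.Url.AbsoluteUri` for the dummy page comes back as `http://companyname.com/web/`, and neither `== homePage` nor `.Contains(homePage)` will match the mixed-case `"http://CompanyName.com/web/"`. A minimal sketch of comparing `Uri` objects instead follows. `PageMatcher`, `IsHomePage` and `IsUnderHomePage` are hypothetical names, and `IsUnderHomePage` assumes the results page lives under the home page's path, which may not be true for every site.

```csharp
using System;

// Hypothetical helper, not part of the WebBrowser API: identifies the page a
// DocumentCompleted event refers to by comparing Uri objects instead of strings.
public static class PageMatcher
{
    // True when 'loaded' is the home page itself. Uri equality works on the
    // parsed, normalized address (the http host is already lowercased) and
    // ignores any #fragment.
    public static bool IsHomePage(Uri homePage, Uri loaded) => homePage == loaded;

    // True when 'loaded' is a different page below the home page's path,
    // e.g. a results page. Assumes the site keeps its results under that path.
    public static bool IsUnderHomePage(Uri homePage, Uri loaded) =>
        !IsHomePage(homePage, loaded) && homePage.IsBaseOf(loaded);
}
```

Inside `browser_DocumentCompleted`, the two tests would then read `PageMatcher.IsHomePage(new Uri(homePage), e.Url)` and `homepageLoaded && PageMatcher.IsUnderHomePage(new Uri(homePage), e.Url)`.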


As I say, this works, but it strikes me as very inefficient. I'm new to this area of coding and would welcome alternative suggestions to this approach.

Thank you.

1 solution

Basically, here is what you are trying to do:

C#
// DownloadString sends a GET request, so any search data must be in the URL's query string.
using (var wc = new System.Net.WebClient())
{
    string HTML = wc.DownloadString("URL with post data");
    return HTML.Contains("text you want to parse");
}


So you can avoid all those unnecessary lines of code and replace them with these few. :P
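If the search form submits with GET, the results page is just the form's action URL with the search field appended as a query string, so it can be built directly and passed to `DownloadString` without loading the home page first. A minimal sketch, assuming a GET form; the action URL `http://CompanyName.com/web/search` and the field name `searchTerm` used below are hypothetical (the question only gives the text box's element id, which is not necessarily its `name`).

```csharp
using System;

// Minimal sketch, assuming the search form submits with GET. The action URL
// and field name passed in are hypothetical placeholders; take the real ones
// from the form's HTML (its action and the text box's name attribute).
public static class SearchRequest
{
    // Returns the URL the browser would request when the form is submitted.
    // Any query string already on actionUrl is replaced.
    public static Uri BuildSearchUrl(string actionUrl, string fieldName, string term)
    {
        var builder = new UriBuilder(actionUrl)
        {
            // EscapeDataString percent-encodes spaces, '&', '=' and other
            // characters that would otherwise break the query string.
            Query = Uri.EscapeDataString(fieldName) + "=" + Uri.EscapeDataString(term)
        };
        return builder.Uri;
    }
}
```

Usage with the answer's `WebClient`: `string HTML = wc.DownloadString(SearchRequest.BuildSearchUrl("http://CompanyName.com/web/search", "searchTerm", "12345678"));`.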
Comments
Andrw_S 26-Mar-15 12:39pm    
Sorry, but I do not follow.

~~>Where does this code go?
~~>Where does the "homepage" string go?

This needs quite a bit of expansion to make it clear to someone who is new to this area of coding.
Thanks.
jk0391 31-Mar-15 15:53pm    
This code can go into a button for all you care...

Sorry for the late reply; I don't check here often. Also, if you looked at my code and didn't understand that "URL" is basically the homepage link, then you need some more studying. :P
Andrw_S 2-Apr-15 6:18am    
Hi,
No, you've missed the point. I know that the information IS NOT on the home page. There is a "search" textbox on this home page. Into this you enter the search term and hit the corresponding "search" button.

This loads A NEW webpage, and it is this page that I parse for its information.

It is the method of getting to this NEW page FROM the home page that I'm trying to improve. Sorry, I can see how my initial phrasing of the question could have caused the confusion.

Thanks.
jk0391 2-Apr-15 11:21am    
Okay, then with my code you simply modify the URL to be:

The search function's URL plus the data sent to perform the search. Your response is then stored in HTML, where you can parse it to your liking.

If you don't know what data the search sends, use something like Wireshark or Live HTTP Headers for Firefox to capture your request. :)
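`WebClient.DownloadString` always sends a GET, so if the captured request turns out to be a POST, the data has to go in the request body rather than the URL (`WebClient.UploadValues` is the older way to do that). A minimal sketch using `HttpClient` and `FormUrlEncodedContent`, assuming the capture shows an ordinary `application/x-www-form-urlencoded` form post; the URL and field names are placeholders to replace with what the capture shows. On ASP.NET Web Forms sites the form may also carry hidden fields such as `__VIEWSTATE` that have to be sent along.

```csharp
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;

// Minimal sketch, assuming the search form POSTs ordinary form fields to a URL
// you have captured. Field names (e.g. "searchTerm") are hypothetical.
public static class SearchPost
{
    // One shared HttpClient for the whole application.
    private static readonly HttpClient Client = new HttpClient();

    // Encodes the fields as application/x-www-form-urlencoded
    // (spaces become '+', other special characters are percent-encoded).
    public static FormUrlEncodedContent BuildForm(IEnumerable<KeyValuePair<string, string>> fields) =>
        new FormUrlEncodedContent(fields);

    // Sends the POST and returns the results page's HTML for parsing.
    public static async Task<string> SearchAsync(string searchUrl, IEnumerable<KeyValuePair<string, string>> fields)
    {
        using (FormUrlEncodedContent content = BuildForm(fields))
        using (HttpResponseMessage response = await Client.PostAsync(searchUrl, content))
        {
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```

Usage, from an `async` method: `string html = await SearchPost.SearchAsync("http://CompanyName.com/web/search", new[] { new KeyValuePair<string, string>("searchTerm", "12345678") });`.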
jk0391 2-Apr-15 11:21am    
What is the website you are trying to do this on? I can help.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900