How to get the content inside a div with a particular class

Question

In some HTML code I found a div like this:

HTML

****************other html codes**************************
<div class="imageContainer" style="width:736px;">
        <img src="http://media-cache-ak0.pinimg.com/736x/88/d4/8f/88d48f85d605906c88c05df6c931f8f1.jpg" class="pinImage" style="height:580px;width:540px;margin:0 auto;padding:40px 0px;" alt="Batman - OKAY, I really like this. This is BATMAN as a Knight. Now if the armor was brought up to a current timeline and was future tech - that would be something.">

    </div>
****************other html codes**************************

I would like to extract the contents of the div with class="imageContainer", that is:

HTML

<img src="http://media-cache-ak0.pinimg.com/736x/88/d4/8f/88d48f85d605906c88c05df6c931f8f1.jpg" class="pinImage" style="height:580px;width:540px;margin:0 auto;padding:40px 0px;" alt="Batman - OKAY, I really like this. This is BATMAN as a Knight. Now if the armor was brought up to a current timeline and was future tech - that would be something.">

and show it in a textbox, using:

C#

Regex r = new Regex("");

Posted 1-Mar-14 16:28pm

satheeshk787

Updated 1-Mar-14 16:30pm

v2

2 solutions

Solution 2

Seriously? Don't.
Using a Regex to process HTML is a common mistake: it normally ends in tears, because HTML processing requires a fair amount more than "dumb" pattern matching: it is a hierarchical data structure, and really needs to be processed as such. Regexes are really, really bad at that.

You might be able to create a solution that works for that specific example, today, but there is a very, very good chance that it will break with the first change to the site you are scraping the data from, and either deliver no information, or the wrong information. The first is easy to spot, but the second normally requires a human, and is a real pain to sort out, particularly if it isn't spotted immediately, and "bad" data gets passed through the system and stored. Working out what info has been corrupted and fixing that can take a lot of manual work.

There are loads of HTML parsers / site scraping tools out there: have a google, find one that fits what you are trying to do, and use that. Going with a simple-to-implement regex will give you a lot more trouble in the future than investing a little time in doing it right, right from the start.
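
As one illustration of the parser route, here is a minimal sketch that assumes the third-party HtmlAgilityPack NuGet package; the answer above does not name a specific parser, so that choice is an assumption, and the class and method names are hypothetical. It selects the first div whose class list contains imageContainer and returns its inner HTML.

```csharp
using System;
using HtmlAgilityPack; // assumption: third-party NuGet package, not named in the answer

public static class ImageContainerParser
{
    // Returns the inner HTML of the first <div> whose class list contains
    // "imageContainer", or an empty string when no such div exists.
    public static string GetImageContainerHtml(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Match on the class token so extra classes on the div do not break the lookup.
        var div = doc.DocumentNode.SelectSingleNode(
            "//div[contains(concat(' ', normalize-space(@class), ' '), ' imageContainer ')]");

        return div == null ? "" : div.InnerHtml.Trim();
    }
}
```

The result could then be assigned to a textbox, for example `textBox1.Text = ImageContainerParser.GetImageContainerHtml(pageHtml);`, where `textBox1` and `pageHtml` are placeholder names.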

[edit]Typo: "chance" for "change" - OriginalGriff[/edit]

Posted 1-Mar-14 23:05pm

OriginalGriff

Updated 1-Mar-14 23:06pm

v2

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

satheeshk787 · Accepted Answer · 2014-03-04T19:24:00

C#

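// Requires: using System.Text.RegularExpressions;
public string get_div(string html)
{
    // DIVReg holds the pattern; group 1 captures the content of the div.
    Match match = Regex.Match(html, Properties.Resources.DIVReg, RegexOptions.IgnoreCase);

    // Return the captured content, or an empty string when nothing matches.
    return match.Success ? match.Groups[1].Value : "";
}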

"Properties.Resources.DIVReg" contains

"(?s)<div[^>]*?class="imageContainer"[^>]*?>(.*?)"