In some HTML I found a div like this:

HTML
****************other html codes**************************
<div class="imageContainer" style="width:736px;">
        <img src="http://media-cache-ak0.pinimg.com/736x/88/d4/8f/88d48f85d605906c88c05df6c931f8f1.jpg" class="pinImage" style="height:580px;width:540px;margin:0 auto;padding:40px 0px;" alt="Batman - OKAY, I really like this. This is BATMAN as a Knight. Now if the armor was brought up to a current timeline and was future tech - that would be something.">

    </div>
****************other html codes**************************


I would like to find the div with class="imageContainer" and extract its contents:

HTML
<img src="http://media-cache-ak0.pinimg.com/736x/88/d4/8f/88d48f85d605906c88c05df6c931f8f1.jpg" class="pinImage" style="height:580px;width:540px;margin:0 auto;padding:40px 0px;" alt="Batman - OKAY, I really like this. This is BATMAN as a Knight. Now if the armor was brought up to a current timeline and was future tech - that would be something.">


and put this into a textbox using a regex:

C#
Regex r = new Regex("");
Updated 1-Mar-14 16:30pm

Seriously? Don't.
Using a Regex to process HTML is a common mistake: it normally ends in tears, because HTML processing requires a fair amount more than "dumb" pattern matching: it is a hierarchical data structure, and really needs to be processed as such. Regexes are really, really bad at that.

You might be able to create a solution that works for that specific example, today, but there is a very, very good chance that it will break with the first change to the site you are scraping the data from, and either deliver no information, or the wrong information. The first is easy to spot, but the second normally requires a human, and is a real pain to sort out, particularly if it isn't spotted immediately, and "bad" data gets passed through the system and stored. Working out what info has been corrupted and fixing that can take a lot of manual work.

There are loads of HTML parsers / site scraping tools out there: have a google, find one that fits what you are trying to do, and use that. Going with a simple-to-implement regex will give you a lot more trouble in the future than investing a little time in doing it right, right from the start.
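
As a rough sketch of the parser route: the code below assumes HtmlAgilityPack, a third-party NuGet package that is only one of several .NET HTML parsers and is not named in this answer. The class and method names (`ImageContainerReader`, `GetImageContainerHtml`, `GetImageSrc`) are made up for illustration. It selects the div by XPath instead of pattern matching, so nested tags and attribute order should not matter.

```csharp
using HtmlAgilityPack; // assumption: third-party NuGet package, not part of .NET itself

public static class ImageContainerReader
{
    // Returns the inner HTML of the first <div class="imageContainer">,
    // or an empty string when no such div exists.
    // Note: @class='imageContainer' is an exact match on the attribute value.
    public static string GetImageContainerHtml(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode div = doc.DocumentNode.SelectSingleNode("//div[@class='imageContainer']");
        return div == null ? "" : div.InnerHtml.Trim();
    }

    // Returns the src attribute of the pinImage inside that div, or "" if absent.
    public static string GetImageSrc(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode img = doc.DocumentNode.SelectSingleNode(
            "//div[@class='imageContainer']//img[@class='pinImage']");
        return img == null ? "" : img.GetAttributeValue("src", "");
    }
}
```

Something like `textBox1.Text = ImageContainerReader.GetImageContainerHtml(html);` would then fill the textbox (`textBox1` being a placeholder name). If the site changes its markup, the lookup returns an empty string rather than a partial match, which is the easier failure to spot.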

[edit]Typo: "chance" for "change" - OriginalGriff[/edit]
 
C#
// Requires: using System.Text.RegularExpressions;
public string get_div(string html)
{
    // Properties.Resources.DIVReg holds the pattern shown below.
    Match match = Regex.Match(html, Properties.Resources.DIVReg, RegexOptions.IgnoreCase);

    // Group 1 captures whatever follows the opening <div ...> tag.
    return match.Success ? match.Groups[1].Value : "";
}

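"Properties.Resources.DIVReg" contains:

(?s)<div[^>]*?class="imageContainer"[^>]*?>(.*?)</div>

The closing </div> is needed: without it the lazy group (.*?) has nothing to stop at and matches an empty string. The match ends at the first </div>, so this only works when the container holds no nested divs.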
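
A self-contained version of the same idea, as a sketch only. It assumes the pattern is terminated with `</div>` and inlines it instead of reading it from resources. The names `DivExtractor` and `GetDiv` and the `Trim()` call are additions for illustration, not part of the code above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DivExtractor
{
    // Assumed pattern: (?s) lets '.' match newlines; the lazy group stops
    // at the first closing </div> after the imageContainer opening tag.
    private const string DivPattern =
        "(?s)<div[^>]*?class=\"imageContainer\"[^>]*?>(.*?)</div>";

    // Returns the trimmed content of the first matching div, or "" if none.
    public static string GetDiv(string html)
    {
        Match match = Regex.Match(html, DivPattern, RegexOptions.IgnoreCase);
        return match.Success ? match.Groups[1].Value.Trim() : "";
    }
}
```

For the div in the question this returns the `<img ...>` tag. For input such as `<div class="imageContainer"><div>a</div><img src="x"></div>` it returns only `<div>a`, because the match ends at the first `</div>`. That is the kind of silent wrong result the first answer warns about.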
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


