HTML Parser

2 Aug 2006, CPOL, 1 min read
A C# DLL for use in .NET applications; the code is easy to port to other languages

Introduction

This is an HTML parser for extracting titles, text, and links from a page. It is a DLL written in C#, but you can easily port it to any other programming language, provided you know how to fetch the page's HTML source in that language.

Basic Idea

The idea behind this code is to walk through the HTML source character by character. When the parser reaches the title tag, it stores the text that follows as the title string. When it reaches the body tag, it accepts all text that is not script code or CSS. Links are handled the same way.
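The text-extraction routine itself is only in the attached file, so the following is a minimal sketch of the idea rather than the DLL's actual code. The names TextSketch and ExtractText are hypothetical. As a simplification, it does not restrict itself to the body element and does not handle HTML comments.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of the article's DLL.
public static class TextSketch
{
    // Returns the text of an HTML string, skipping tags and the
    // contents of <script> and <style> elements.
    public static string ExtractText(string html)
    {
        StringBuilder text = new StringBuilder();
        int i = 0;
        while (i < html.Length)
        {
            if (html[i] != '<')
            {
                text.Append(html[i]);
                i++;
                continue;
            }
            int close = html.IndexOf('>', i);
            if (close < 0) break; // unterminated tag: stop reading
            string tag = html.Substring(i + 1, close - i - 1).Trim().ToLowerInvariant();
            i = close + 1;
            foreach (string skipped in new string[] { "script", "style" })
            {
                if (IsTag(tag, skipped))
                {
                    // Jump to the matching closing tag, or to the end if it is missing.
                    int end = html.IndexOf("</" + skipped, i, StringComparison.OrdinalIgnoreCase);
                    i = end < 0 ? html.Length : end;
                }
            }
            text.Append(' '); // treat every tag as a word separator
        }
        // Collapse runs of whitespace into single spaces.
        string[] words = text.ToString().Split(
            new char[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
        return string.Join(" ", words);
    }

    // True if the tag content starts with the given element name.
    private static bool IsTag(string tag, string name)
    {
        if (!tag.StartsWith(name, StringComparison.Ordinal)) return false;
        return tag.Length == name.Length
            || char.IsWhiteSpace(tag[name.Length])
            || tag[name.Length] == '/';
    }
}
```

Calling TextSketch.ExtractText(page) on a page source returns its text with tags, scripts, and style sheets removed. Because every tag is treated as a separator, inline markup in the middle of a word would split that word.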

Brief Code Description

I built a lookup table for special characters. For example, when the parser reads the characters &lt; in the HTML source, it translates them to the < character.
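The table itself is not shown in the article, so here is a small sketch of how such a lookup could work. The class name EntityTable, the method Decode, and the four entries are assumptions chosen for illustration; the real table in the DLL may contain different entries.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of the article's DLL.
public static class EntityTable
{
    // Illustrative entries only; a full table would list many more entities.
    private static readonly Dictionary<string, char> Entities =
        new Dictionary<string, char>
        {
            { "lt", '<' },
            { "gt", '>' },
            { "amp", '&' },
            { "quot", '"' }
        };

    // Replaces known named entities such as &lt; and leaves everything else unchanged.
    public static string Decode(string text)
    {
        StringBuilder result = new StringBuilder();
        int i = 0;
        while (i < text.Length)
        {
            char c = text[i];
            if (c == '&')
            {
                int end = text.IndexOf(';', i + 1);
                // Only short runs between '&' and ';' are treated as entity names.
                if (end > i && end - i <= 8)
                {
                    string name = text.Substring(i + 1, end - i - 1);
                    char decoded;
                    if (Entities.TryGetValue(name, out decoded))
                    {
                        result.Append(decoded);
                        i = end + 1;
                        continue;
                    }
                }
            }
            result.Append(c);
            i++;
        }
        return result.ToString();
    }
}
```

A dictionary keeps the table easy to extend: supporting another entity means adding one entry, with no change to the scanning loop.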

public string GetTitle(string Source)
{
    // Nothing to parse in a null or empty page.
    if (string.IsNullOrEmpty(Source))
        return "";

    int len = Source.Length;
    // Sliding window holding the last six characters read.
    string window = "      ";
    for (int i = 0; i < len; i++)
    {
        window = window.Remove(0, 1) + Source[i];
        if (window.ToLowerInvariant() == "<title")
        {
            // Skip any attributes up to the end of the opening tag.
            while (i < len && Source[i] != '>')
                i++;
            i++;

            // Collect characters until the next tag starts or the input ends.
            string title = "";
            while (i < len && Source[i] != '<')
            {
                title += Source[i];
                i++;
            }
            return title.Trim();
        }
    }
    // No title tag found.
    return "";
}

The code for extracting the text and the links is in the attached file.
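Since the link code is not reproduced here, the following is only a rough sketch of what a link-extraction routine could look like, not the code from the attached file. LinkSketch and ExtractLinks are hypothetical names. It assumes double-quoted href values and anchors without nested a tags, and it returns each link as a pair whose Key is the target and whose Value is the label.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the article's DLL.
public static class LinkSketch
{
    // Returns one pair per link: Key = href value, Value = label text.
    public static List<KeyValuePair<string, string>> ExtractLinks(string html)
    {
        List<KeyValuePair<string, string>> links = new List<KeyValuePair<string, string>>();
        int pos = 0;
        while (true)
        {
            int open = html.IndexOf("<a", pos, StringComparison.OrdinalIgnoreCase);
            if (open < 0) break;
            int tagEnd = html.IndexOf('>', open);
            if (tagEnd < 0) break;
            pos = tagEnd + 1;

            // Make sure this is an <a> tag and not, for example, <abbr>.
            char next = html[open + 2];
            if (next != '>' && !char.IsWhiteSpace(next)) continue;

            string tag = html.Substring(open, tagEnd - open + 1);
            string href = ReadHref(tag);
            int close = html.IndexOf("</a", pos, StringComparison.OrdinalIgnoreCase);
            if (href == null || close < 0) continue;

            // The label is everything between the opening and closing tags.
            string label = html.Substring(pos, close - pos).Trim();
            links.Add(new KeyValuePair<string, string>(href, label));
            pos = close;
        }
        return links;
    }

    // Reads a double-quoted href value from a tag; returns null if there is none.
    private static string ReadHref(string tag)
    {
        int start = tag.IndexOf("href=\"", StringComparison.OrdinalIgnoreCase);
        if (start < 0) return null;
        start += 6; // length of href="
        int end = tag.IndexOf('"', start);
        return end < 0 ? null : tag.Substring(start, end - start);
    }
}
```

Any markup inside a label, such as a bold tag, is returned as part of the label in this sketch; a fuller version would strip it.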

Usage

To use this code, add the library to your project and create an instance of the class, for example Parser.Parser inst = new Parser.Parser(). Then call its methods on the page source:

inst.GetTitle(page) returns the title

inst.GetText(page) returns the text

inst.MakeLinks(page) builds the links

After calling MakeLinks, the results are available in pLink and pLabel, which hold each link's target and the label displayed for it on the page.

Resources

C# DLL in .NET 2005

Contact me

 

If there is a problem, please contact me at ahmed_a_e2006@yahoo.com

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Web Developer
Egypt
Birth Date: 1/1/1985
ISFP Company(Integrated Solutions For Ports)
Computer Science and Automatic Control Department
Faculty of Engineering - Alexandria University
Graduation Year: 2006
Phone: (+2) 0183262859
