HTML Tag Extractor

2 Jun 2005
This article provides a solution to prevent HTML or JavaScript injections into your fields.

Introduction

Before I explain the code, I want to ask a question: what would you do if someone entered HTML tags or JavaScript into a textbox on your web form?

I wrote this article, and attached the code I use, to validate or, more accurately, extract (that is, remove) the tags entered into my textboxes. Although ASP.NET 1.1 has a built-in detector for tags entered into input fields, you may prefer to remove these tags yourself if you don't need them.

Injected tags or scripts can make your output unpredictable. For example, suppose a textbox saves a username to a database and a user enters <b>HisName</b>. If another page displays all the users in a table, that username will be shown in bold because of the <b></b> tags.

For example:

User name
---------
Abdullah
HisName (shown in bold)
Omar

The attached code contains two parts, one for ASP.NET and the other for VB.NET. I'll explain the class, which is the same in both.

Using the code

The class Extractor contains one public function, Extract, which returns a String, and two private functions, FoundOpener and CalculateLength.

Extract scans the entered text for "<" characters. When it finds one, it calls FoundOpener, which takes two parameters: the text being validated and the position of the "<".

FoundOpener searches for the next ">" character, which closes the tag, and returns its position. If no ">" is found, the tag was never closed; the position is then set to the length of the entered text, so everything from the "<" onwards is removed.
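The two outcomes of that search can be tried in isolation. The following is only a small sketch: FindCloser and CloserSearchDemo are hypothetical names used for illustration, and the helper merely mirrors the behaviour described above for a closed and an unclosed tag.

```vb
Imports System
Imports Microsoft.VisualBasic

Module CloserSearchDemo
    ' Hypothetical helper: returns the one-based position of the first ">"
    ' at or after the zero-based index openerIndex, or the length of the
    ' text when no ">" exists.
    Function FindCloser(ByVal text As String, ByVal openerIndex As Integer) As Integer
        Dim closer As Integer = InStr(openerIndex + 1, text, ">", CompareMethod.Binary)
        If closer = 0 Then
            closer = Len(text)
        End If
        Return closer
    End Function

    Sub Main()
        Console.WriteLine(FindCloser("a<b>c", 1))   ' closed tag: prints 4
        Console.WriteLine(FindCloser("a<b c", 1))   ' unclosed tag: prints 5
    End Sub
End Module
```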

Once the position of the closing character is known, CalculateLength computes the length of the text to remove, including both angle brackets. For example, the length of <center> is 8. The function takes the start and end positions as parameters: start is the position of "<" and end is the position of ">". The length is the end minus the start. The result includes both brackets because start is a zero-based index (used with Chars) while end is the one-based position returned by InStr.
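As a worked example of that arithmetic, here is a short sketch. The sample text and the module name LengthDemo are made up for illustration, and IndexOf stands in for the character-by-character scan that Extract performs.

```vb
Imports System
Imports Microsoft.VisualBasic

Module LengthDemo
    Sub Main()
        Dim text As String = "Hi <center>there"

        ' Zero-based index of "<" (3 in this sample).
        Dim start As Integer = text.IndexOf("<"c)

        ' One-based position of ">" (11 in this sample).
        Dim final As Integer = InStr(start + 1, text, ">", CompareMethod.Binary)

        ' 11 - 3 = 8, the full length of "<center>".
        Dim length As Integer = Math.Abs(final - start)

        Console.WriteLine(text.Remove(start, length))   ' prints "Hi there"
    End Sub
End Module
```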

Extract function:

Remove is a built-in String method that deletes a given number of characters starting at a given index:

VB
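Public Function Extract(ByVal srctext As String, _
                ByVal sender As frmTagExtractor) As String
 ' sender is not used in the body shown here.
 Dim Counter As Integer = 0       ' zero-based index of the current character
 Dim CloserPosition As Long       ' one-based position of the matching ">"
 Dim length As Integer            ' number of characters to remove
 Dim srcLength As Integer = Len(srctext) - 1

 Do While Counter <= srcLength
    If srctext.Chars(Counter) = "<"c Then
        ' Remove the tag, from "<" up to and including ">".
        CloserPosition = FoundOpener(srctext, Counter)
        length = CInt(CalculateLength(Counter, CloserPosition))
        srctext = srctext.Remove(Counter, length)

        ' The text is shorter now; stay at the same index so the
        ' character that moved into this position is checked next.
        srcLength = Len(srctext) - 1
    Else
        Counter += 1
    End If
 Loop

 Return srctext
End Function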

FoundOpener function:

InStr is a built-in VB.NET function that returns the one-based position of the first occurrence of one string within another, or 0 if it is not found:

VB
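' Member of Class Extractor.
Private Function FoundOpener(ByVal text As String, _
                 ByVal Position As Long) As Long
  ' Position is the zero-based index of "<". InStr is one-based, so
  ' Position + 1 starts the search at the "<" itself.
  Dim CloserPosition As Long
  CloserPosition = InStr(CInt(Position + 1), text, ">", CompareMethod.Binary)
  If CloserPosition = 0 Then
    ' No ">" found: treat the end of the text as the closer.
    CloserPosition = Len(text)
  End If
  Return CloserPosition
End Function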

CalculateLength function:

VB
Private Function CalculateLength(ByVal start As Long, _
                 ByVal final As Long) As Long
  ' start is zero-based and final is one-based, so the difference
  ' counts both "<" and ">".
  Return Math.Abs(final - start)
End Function
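
To see the three functions working together without the form, the sketch below folds them into one self-contained console program. It is not the attached code: TagStripper, Strip, FindCloser and Demo are hypothetical names, the frmTagExtractor parameter is dropped, and the length calculation is inlined as closer - index.

```vb
Imports System
Imports Microsoft.VisualBasic

Public Class TagStripper
    ' Removes every "<...>" run from source. An unclosed "<" removes
    ' everything from that point to the end of the text.
    Public Function Strip(ByVal source As String) As String
        Dim index As Integer = 0
        Do While index < source.Length
            If source.Chars(index) = "<"c Then
                Dim closer As Integer = FindCloser(source, index)
                ' closer is one-based and index is zero-based, so the
                ' difference covers both angle brackets.
                source = source.Remove(index, closer - index)
                ' Do not advance: the character now at index is unchecked.
            Else
                index += 1
            End If
        Loop
        Return source
    End Function

    ' One-based position of the next ">", or the text length if none.
    Private Function FindCloser(ByVal text As String, ByVal openerIndex As Integer) As Integer
        Dim closer As Integer = InStr(openerIndex + 1, text, ">", CompareMethod.Binary)
        If closer = 0 Then closer = Len(text)
        Return closer
    End Function
End Class

Module Demo
    Sub Main()
        Dim stripper As New TagStripper()
        Console.WriteLine(stripper.Strip("<b>HisName</b>"))          ' prints HisName
        Console.WriteLine(stripper.Strip("a <i>b</i> c <unclosed"))  ' prints "a b c " (trailing space)
    End Sub
End Module
```

Because an unclosed "<" removes the rest of the text, an input such as "3 < 5" comes back as "3 " with this approach.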

Finally

Please tell me if you have any suggestions concerning this technique or if you have another way to handle such a case.

Best regards.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

Written By
Web Developer
Syrian Arab Republic
A Pharmacist :)
