
A C# class to make your ASP.NET pages XHTML valid

4 Nov 2004

Latest update (V1.1)

Fixed a bug in the ConvertToLowerCase method, which did not work with tags that have attributes.

Introduction

This article presents a simple class that can be used to adjust the HTML code generated by ASP.NET in order to make it a valid XHTML document.

A valid XHTML document is one that passes the W3C Markup Validation Service (see http://validator.w3.org). This free service checks XHTML documents for conformance to W3C recommendations. Validity makes it more likely that your site will be handled correctly by any W3C-compliant browser, and it may also be an explicit requirement from your customer.


The problem

The problem is that if you try to create an XHTML document with ASP.NET, you will probably fail, because the markup generated by the ASP.NET engine is not valid XHTML.

Just create a simple ASPX page and run it through the W3C validator. These are the errors you will find:

Uppercase tags

XHTML is case sensitive, and all of its element and attribute names are lowercase. Tags such as HTML or HEAD are therefore undefined for the XHTML validator. You could fix this kind of problem by hand, editing the HTML directly in the Visual Studio editor. Unfortunately, each time you add a new control to the page and switch back and forth between the Design and HTML views, the Visual Studio editor turns the HTML and HEAD tags back into uppercase.
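Because the designer keeps reintroducing uppercase tags, the fix has to be applied to the generated output. Below is a minimal sketch of such a pass, assuming a regex-based approach is acceptable: it is not an HTML parser and does not skip script blocks or comments. The helper names LowercaseTagNames and LowercaseAttributeNames are hypothetical (they are not the methods in the XHTMLPage download), and the sketch is written as a C# 9+ top-level program, so the helpers are static local functions.

```csharp
using System;
using System.Text.RegularExpressions;

// Lowercases element names in start and end tags: <HTML> -> <html>, </HEAD> -> </head>.
// Declarations such as <!DOCTYPE ...> are left alone because '!' is not a letter.
static string LowercaseTagNames(string html) =>
    Regex.Replace(html, @"<(/?)([A-Za-z][A-Za-z0-9]*)",
        m => "<" + m.Groups[1].Value + m.Groups[2].Value.ToLowerInvariant());

// Lowercases the names of name=value attributes inside start tags.
// Each match consumes the whole attribute, value included, so text inside
// a quoted value is never mistaken for an attribute name.
static string LowercaseAttributeNames(string html) =>
    Regex.Replace(html, @"<[A-Za-z][^>]*>", tag =>
        Regex.Replace(tag.Value,
            @"(\s)([A-Za-z_:][-A-Za-z0-9_:.]*)(\s*=\s*(?:""[^""]*""|'[^']*'|[^\s>]*))",
            a => a.Groups[1].Value + a.Groups[2].Value.ToLowerInvariant() + a.Groups[3].Value));

string page = "<HTML><HEAD><TITLE>Demo</TITLE></HEAD><BODY BGCOLOR=\"White\">Hi</BODY></HTML>";
Console.WriteLine(LowercaseAttributeNames(LowercaseTagNames(page)));
// prints: <html><head><title>Demo</title></head><body bgcolor="White">Hi</body></html>
```

Text content and attribute values keep their original case; only names are changed.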

Self-closing tags

In XHTML (as in XML), every element must either have a matching end tag or be self-closing. Tags like <br> or <link href="style.css" rel="stylesheet"> are therefore not valid XHTML; use <br /> and <link href="style.css" rel="stylesheet" /> instead.
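The same post-processing idea handles self-closing tags. The sketch below is an illustration under stated assumptions: the hypothetical SelfCloseEmptyElements covers only the empty elements defined by XHTML 1.0 Strict, and it uses a regular expression rather than a parser.

```csharp
using System;
using System.Text.RegularExpressions;

// Self-closes the empty elements of XHTML 1.0 Strict (<br>, <img>, <link>, ...).
// Tags that already end in "/>" or " />" are normalised to " />", so running
// the function twice gives the same result. The \b stops <col> from matching <colgroup>.
static string SelfCloseEmptyElements(string html) =>
    Regex.Replace(html,
        @"<(area|base|br|col|hr|img|input|link|meta|param)\b([^>]*?)\s*/?>",
        m => "<" + m.Groups[1].Value + m.Groups[2].Value + " />",
        RegexOptions.IgnoreCase);

Console.WriteLine(SelfCloseEmptyElements("<br><link href=\"style.css\" rel=\"stylesheet\"><hr/>"));
// prints: <br /><link href="style.css" rel="stylesheet" /><hr />
```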

Deprecated attributes

Some attributes that are valid in HTML have been deprecated by XHTML. For instance, the name attribute of the form element is replaced by id. If you look at the HTML generated by ASP.NET, you will see the following markup and script, which implement the ASP.NET postback mechanism:

HTML
<form name="Form1" method="post" action="Index.aspx" id="Form1">
<input type="hidden" name="__EVENTTARGET" value="" ID="Hidden1"/>
<input type="hidden" name="__EVENTARGUMENT" value="" ID="Hidden2"/>
<input type="hidden" name="__VIEWSTATE" 
  value="ReuDDhCfGkeYOyM6Eg==" ID="Hidden3"/>

<script language="javascript">
       
function __doPostBack(eventTarget, eventArgument) {
   var theform;
   if (window.navigator.appName.toLowerCase().indexOf("netscape") > -1)
   {
      theform = document.forms["Form1"];
   }
   else {
      theform = document.Form1;
   }
   theform.__EVENTTARGET.value = eventTarget.split("$").join(":");
   theform.__EVENTARGUMENT.value = eventArgument;
   theform.submit();
}
</script>

The name attribute of the form tag must be removed to make this code XHTML compliant.

Note that this code is generated at run time, when the page is rendered, so you have no way to change it at design time.
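Since the form tag only exists in the rendered output, the attribute has to be removed there. A minimal sketch, assuming a regex over start tags is good enough; RemoveFormName is a hypothetical helper. It drops name only from form tags, because the hidden inputs still need their name attributes for their values to be posted back.

```csharp
using System;
using System.Text.RegularExpressions;

// Strips the name attribute from every <form> start tag; id and the other
// attributes are kept, and name attributes on other elements (such as the
// hidden __VIEWSTATE input) are not touched.
static string RemoveFormName(string html) =>
    Regex.Replace(html, @"<form\b[^>]*>",
        tag => Regex.Replace(tag.Value, @"\s+name\s*=\s*(?:""[^""]*""|'[^']*')", "",
                             RegexOptions.IgnoreCase),
        RegexOptions.IgnoreCase);

Console.WriteLine(RemoveFormName(
    "<form name=\"Form1\" method=\"post\" action=\"Index.aspx\" id=\"Form1\">"));
// prints: <form method="post" action="Index.aspx" id="Form1">
```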

Mandatory attributes

The script above has another problem: the script tag lacks the type="text/javascript" attribute, which is mandatory according to the XHTML specification.
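A possible fix, sketched under the assumption that every script on the page is JavaScript. FixScriptTags is a hypothetical helper, not the class's own method; it also drops the language attribute, which XHTML 1.0 Strict does not allow on script.

```csharp
using System;
using System.Text.RegularExpressions;

// Gives every <script> start tag the mandatory type="text/javascript":
// the language attribute is dropped, and type is added only when missing.
static string FixScriptTags(string html) =>
    Regex.Replace(html, @"<script\b([^>]*)>", m =>
    {
        // Remove language="..." (or language='...') from the attribute list.
        string attrs = Regex.Replace(m.Groups[1].Value,
            @"\s+language\s*=\s*(?:""[^""]*""|'[^']*')", "", RegexOptions.IgnoreCase);
        // Add type only if the tag does not already declare one.
        if (!Regex.IsMatch(attrs, @"\stype\s*=", RegexOptions.IgnoreCase))
            attrs = " type=\"text/javascript\"" + attrs;
        return "<script" + attrs + ">";
    }, RegexOptions.IgnoreCase);

Console.WriteLine(FixScriptTags("<script language=\"javascript\">"));
// prints: <script type="text/javascript">
```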

Misplaced elements

Looking again at the content of Form1, the hidden input tags are not correctly placed. According to the XHTML 1.0 Strict specification, an input tag cannot be a direct child of form; it has to be inside one of the following tags: "p", "h1", "h2", "h3", "h4", "h5", "h6", "div", "pre", "address", "fieldset", "ins", "del".
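One way to fix this is to wrap the hidden fields in a block element. A minimal sketch, assuming a div as the wrapper (any of the elements listed above is accepted by the validator); WrapHiddenInputs is a hypothetical helper that groups consecutive hidden inputs into a single div.

```csharp
using System;
using System.Text.RegularExpressions;

// Wraps each run of hidden <input> elements (separated only by whitespace)
// in one <div>, so they are no longer direct children of <form>.
static string WrapHiddenInputs(string html)
{
    // One hidden <input ...> start tag; the type attribute may appear anywhere.
    const string hidden = @"<input\b[^>]*\btype\s*=\s*[""']hidden[""'][^>]*>";
    return Regex.Replace(html, hidden + @"(?:\s*" + hidden + ")*",
        m => "<div>" + m.Value + "</div>", RegexOptions.IgnoreCase);
}

Console.WriteLine(WrapHiddenInputs(
    "<form id=\"Form1\">\n<input type=\"hidden\" name=\"__EVENTTARGET\" value=\"\" />\n" +
    "<input type=\"hidden\" name=\"__VIEWSTATE\" value=\"x\" />\n</form>"));
```

Grouping the whole run into one div keeps the added markup small; wrapping each input separately would also validate.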

The solution

A possible solution is to intercept the HTML code just before it is sent to the client web browser and make the needed corrections.

XHTMLPage class

The XHTMLPage class inherits from the System.Web.UI.Page class, and it overrides the Render method.

C#
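protected override void Render(HtmlTextWriter output) 
{
   // Render the page into an in-memory buffer instead of the response
   StringWriter w = new StringWriter();
   HtmlTextWriter myoutput = new HtmlTextWriter(w);
   base.Render(myoutput);
   myoutput.Close();

   // The complete page markup, ready to be post-processed
   m_sXHTML = w.GetStringBuilder().ToString();

   ReplaceDocType();

   // Apply the corrections required by the selected XHTML flavor
   switch (m_XHTMLFormat)
   {
      case _XHTMLFormat.XHTML10_Strict:
         ConvertToXHTMLStrict();
         break;
      case _XHTMLFormat.XHTML10_Transitional:
         ConvertToXHTMLTransitional();
         break;
   }

   // Send the corrected markup to the real output
   output.Write(m_sXHTML);
}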

In the XHTMLPage.Render method, the base class method base.Render is first called with a new HtmlTextWriter object created locally. This HtmlTextWriter writes to an underlying StringWriter, so the HTML code generated by ASP.NET ends up in the m_sXHTML string, where it can be processed.

The ConvertToXHTML… methods then replace the invalid parts with equivalent valid XHTML code.
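The capture-and-rewrite structure of Render does not depend on ASP.NET itself. The sketch below reproduces it with a plain TextWriter so that it runs outside System.Web; RenderPage is a hypothetical stand-in for base.Render, and the fix function passed in plays the role of the ConvertToXHTML… methods.

```csharp
using System;
using System.IO;

// Hypothetical stand-in for base.Render(): it writes markup to whatever
// TextWriter it receives, as Page.Render writes to an HtmlTextWriter.
static void RenderPage(TextWriter writer)
{
    writer.Write("<HTML><BODY>Hello<BR></BODY></HTML>");
}

// Same three steps as XHTMLPage.Render: buffer, fix, forward.
static void RenderFiltered(TextWriter output, Func<string, string> fix)
{
    string markup;
    using (var buffer = new StringWriter())
    {
        RenderPage(buffer);          // 1. render into memory, not to the client
        markup = buffer.ToString();  // 2. the whole page as one string
    }
    output.Write(fix(markup));       // 3. send only the corrected markup
}

RenderFiltered(Console.Out, html => html.Replace("<BR>", "<br />"));
Console.WriteLine();
// prints: <HTML><BODY>Hello<br /></BODY></HTML>
```

Buffering the whole page is what makes string-level fixes possible, at the cost of holding the full response in memory before anything is sent.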

Make your page XHTML valid

In order to make any ASP.NET page an XHTML valid page, you just need to inherit from XHTMLPage instead of System.Web.UI.Page.

C#
public class Index : XHTMLPage
//public class Index : System.Web.UI.Page

The XHTMLPage class can be configured through the XHTMLFormat property, which can be set to Strict or Transitional (the default) to make the page valid according to the XHTML 1.0 Strict or XHTML 1.0 Transitional specification.

C#
base.XHTMLFormat = XHTMLPage._XHTMLFormat.XHTML10_Strict;
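Besides the per-tag fixes, the chosen format also decides which DOCTYPE the page declares. A minimal sketch of that step, assuming the format is passed as a plain string ("Strict", "Transitional" or "Frameset") instead of the _XHTMLFormat enum; SetDocType and DocTypeFor are hypothetical names, while the DOCTYPE strings are the standard W3C ones for XHTML 1.0.

```csharp
using System;
using System.Text.RegularExpressions;

// W3C DOCTYPE declarations for the three XHTML 1.0 flavours.
// The string keys are an assumption of this sketch.
static string DocTypeFor(string flavour) => flavour switch
{
    "Strict" => "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" " +
                "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">",
    "Transitional" => "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" " +
                      "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">",
    "Frameset" => "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Frameset//EN\" " +
                  "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd\">",
    _ => throw new ArgumentException("Unknown XHTML flavour: " + flavour)
};

// Replaces the page's first DOCTYPE, wherever it starts (position 0 included),
// or prepends one when the page has none.
static string SetDocType(string html, string flavour)
{
    string docType = DocTypeFor(flavour);
    var existing = new Regex(@"<!DOCTYPE[^>]*>", RegexOptions.IgnoreCase);
    return existing.IsMatch(html)
        ? existing.Replace(html, docType, 1)
        : docType + Environment.NewLine + html;
}

Console.WriteLine(SetDocType(
    "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\">\n<html></html>", "Strict"));
```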

Conclusion

In this article I presented a problem you may run into when trying to produce valid XHTML pages with ASP.NET. It may be solved in the next version of Visual Studio; in the meantime, the simple solution presented here may be useful.

The attached sample code was not written with performance in mind: post-processing the HTML generated by ASP.NET adds some overhead to every page request.

Credits

  • sebmafate helped me extend and fix the class.

History

04-Nov-2004

  • Fixed a bug in the ConvertToLowerCase method, which did not work with tags that have attributes.

13-Oct-2004

  • Automatically convert all tag and attribute names to lowercase.
  • Added support for XHTML Frameset specification.
  • Added support for encoding and language XML attributes.
  • Added support for CDATA attributes.

27-Sep-2004

  • First version.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.



Written By
Web Developer
Italy

Comments and Discussions

 
  • The REAL Solution (Lonely Rolling Star, 28-Jul-07)
  • Re: The REAL Solution (Rodrigo Kono, 1-Aug-07)
  • Does not handle closing tags (maxgyver_sk, 22-Feb-07)
  • Using with AJAX and third party components? (Sameers Javed, 8-Jan-07)
  • what about __viewstate field? (Tomislav B., 18-Sep-06)
  • Re: what about __viewstate field? (Mika, 6-Dec-06)
  • Re: what about __viewstate field? (Michael Nero, 9-Sep-09)
  • Rendering inner text as lowercase (tcstom, 18-May-06)
  • vb.net and html 4.0 (foued69, 22-Mar-06)
  • Some suggestions (orcasquall, 30-Jun-05)
  • Performance (Anonymous, 18-Apr-05)
  • s (Anonymous, 7-Apr-05)
  • Some Improvements (Mardoek W, 7-Mar-05)
  • Removing invalid attributes (majic1978, 18-Feb-05)
  • Slight bug with FixStyle method. (miketong, 30-Nov-04)
  • Re: Slight bug with FixStyle method. (big71, 30-Nov-04)
  • VB? (Karls., 19-Nov-04)
  • Re: VB? (big71, 19-Nov-04)
  • Re: VB? (Lou Vanek, 18-May-05)

Here is a VB.NET port.


Imports System
Imports System.IO
Imports System.Text.RegularExpressions
Imports System.Text
Imports System.Web.UI
Imports System.Web.UI.WebControls
Imports System.Web.UI.HtmlControls
Imports Microsoft.VisualBasic.ControlChars


Namespace ASPNET2XHTML


Public Class XHTMLPage
Inherits System.Web.UI.Page


'XHTMLPage constructor
Public Sub New()
m_sXHTML = ""
m_XHTMLFormat = _XHTMLFormat.HTML401_Loose
m_Encoding = Encoding.UTF8
m_sLanguage = "en"
m_bXmlCDATA = False
End Sub


Public Enum _XHTMLFormat
XHTML10_Strict
XHTML10_Transitional
XHTML10_Frameset
HTML401_Loose
HTML4_Transitional
End Enum


Private m_sXHTML As String
Private m_XHTMLFormat As _XHTMLFormat
Private m_Encoding As Encoding
Private m_sLanguage As String
Private m_bXmlCDATA As Boolean

Public Property XHTMLFormat() As _XHTMLFormat
Get
Return m_XHTMLFormat
End Get
Set(ByVal Value As _XHTMLFormat)
m_XHTMLFormat = Value
End Set
End Property

Public Property Encoding() As Encoding
Get
Return m_Encoding
End Get
Set(ByVal Value As Encoding)
m_Encoding = Value
End Set
End Property

Public Property Language() As String
Get
Return m_sLanguage
End Get

Set(ByVal Value As String)
m_sLanguage = Value
End Set
End Property

Public Property XmlCDATA() As Boolean
Get
Return m_bXmlCDATA
End Get
Set(ByVal Value As Boolean)
m_bXmlCDATA = Value
End Set
End Property




Protected Overrides Sub Render(ByVal output As HtmlTextWriter)
Dim w As StringWriter
w = New StringWriter

Dim myoutput As HtmlTextWriter
myoutput = New HtmlTextWriter(w)
MyBase.Render(myoutput)

myoutput.Close()

m_sXHTML = w.GetStringBuilder().ToString()

ReplaceDocType()

Select Case (m_XHTMLFormat)
Case _XHTMLFormat.XHTML10_Strict
ConvertToXHTMLStrict()

Case _XHTMLFormat.XHTML10_Transitional
ConvertToXHTMLTransitional()

Case _XHTMLFormat.XHTML10_Frameset
ConvertToXHTMLFrameset()

Case Else
ConvertToHTML4()

End Select

output.Write(m_sXHTML)
End Sub



Private Sub ConvertToXHTMLFrameset()
ConvertToLowerCase()
AddSelfClose("meta")
FixHtml()
End Sub


Private Sub ConvertToXHTMLTransitional()
ConvertToLowerCase()
AddSelfClose("meta")
AddSelfClose("link")
AddSelfClose("img")
AddSelfClose("hr")

FixScript()
FixBr()
FixStyle()
FixHtml()
End Sub


Private Sub ConvertToHTML4()
'ConvertToLowerCase()
'AddSelfClose("meta")
'AddSelfClose("link")
'AddSelfClose("img")
'AddSelfClose("hr")

'FixScript()
'FixBr()
'FixStyle()
End Sub


Private Sub ConvertToXHTMLStrict()
ConvertToLowerCase()
AddSelfClose("meta")
AddSelfClose("link")
AddSelfClose("img")
AddSelfClose("hr")

FixScript()
RemoveAttribute("form", "name")
FixInput()
FixBr()
FixStyle()
FixHtml()
maskScript()
End Sub


Private Sub ReplaceDocType()
' delete the current DOCTYPE
Dim nStart, nEnd As Integer
nStart = m_sXHTML.IndexOf("<!DOCTYPE", 0)
If (nStart >= 0) Then
nEnd = m_sXHTML.IndexOf(">", nStart + 1)
If (nEnd > 0) Then
m_sXHTML = m_sXHTML.Remove(nStart, nEnd - nStart + 1)

Select Case m_XHTMLFormat
Case _XHTMLFormat.XHTML10_Strict
m_sXHTML = m_sXHTML.Insert(0, "<!DOCTYPE html PUBLIC ""-//W3C//DTD XHTML 1.0 Strict//EN"" ""http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"">")

Case _XHTMLFormat.XHTML10_Transitional
m_sXHTML = m_sXHTML.Insert(0, "<!DOCTYPE html PUBLIC ""-//W3C//DTD XHTML 1.0 Transitional//EN"" ""http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"">")

Case _XHTMLFormat.XHTML10_Frameset
m_sXHTML = m_sXHTML.Insert(0, "<!DOCTYPE html PUBLIC ""-//W3C//DTD XHTML 1.0 Frameset//EN"" ""http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd"">")

Case _XHTMLFormat.HTML401_Loose
m_sXHTML = m_sXHTML.Insert(0, "<!DOCTYPE html PUBLIC ""-//W3C//DTD HTML 4.01 Transitional//EN"" ""http://www.w3.org/TR/html4/loose.dtd"" >")

Case _XHTMLFormat.HTML4_Transitional
m_sXHTML = m_sXHTML.Insert(0, "<!DOCTYPE html PUBLIC ""-//W3C//DTD HTML 4.0 Transitional//EN"" >")

End Select

If m_XHTMLFormat <> _XHTMLFormat.HTML4_Transitional AndAlso m_XHTMLFormat <> _XHTMLFormat.HTML401_Loose Then
Dim s As String = "?>" & Cr & Lf
Dim h As String = m_Encoding.HeaderName
m_sXHTML = m_sXHTML.Insert(0, "<?xml version=""1.0"" encoding=""" & h & Chr(34) & s)
End If
End If
End If
End Sub



Private Sub ConvertToLowerCase()
' Update 03/11/2004 : Add support for Tags with properties
' Author : Sébastien FERRAND (mailto:sebastien.ferrand@vbmaf.net)
m_sXHTML = Regex.Replace(m_sXHTML, "<(/?)([a-zA-Z0-9]+)[ ]*(.*?)>", _
New MatchEvaluator(AddressOf SingleTagToLowerCase), RegexOptions.IgnoreCase)

' Update 03/11/2004 : Update to correctly match tags with more than one property
' Author : Sébastien FERRAND (mailto:sebastien.ferrand@vbmaf.net)
' Make all properties lower case
' (VB string literals have no escape sequences, so the regex \s takes a single backslash)
m_sXHTML = Regex.Replace(m_sXHTML, "<([a-zA-Z0-9]+)(\s+[a-zA-Z]+)(=" & Chr(34) & ".+?>)", _
New MatchEvaluator(AddressOf PropertiesToLowerCase), RegexOptions.IgnoreCase)
End Sub



Private Function SingleTagToLowerCase(ByVal m As Match) As String
' Update 03/11/2004 : Add support for Tags with multi-properties
' Author : Sébastien FERRAND (mailto:sebastien.ferrand@vbmaf.net)
If (m.Groups(3).ToString().Trim() = String.Empty) Then
Return "<" & m.Groups(1).ToString().ToLower() & m.Groups(2).ToString().ToLower() & ">"
Else
Return "<" & m.Groups(1).ToString().ToLower() & m.Groups(2).ToString().ToLower() & " " & m.Groups(3).ToString() & ">"
End If
End Function


Private Function PropertiesToLowerCase(ByVal m As Match) As String
Dim szReplace As String
szReplace = "<" & m.Groups(1).ToString() & m.Groups(2).ToString().ToLower()

' Search for another property in the tag
If (Regex.Match(m.Groups(3).ToString(), "(.*?"")(\s+\w+)(="".+>)", RegexOptions.IgnoreCase).Success) Then
szReplace &= Regex.Replace(m.Groups(3).ToString(), "(.*?"")(\s+\w+)(="".+>)", New MatchEvaluator(AddressOf nextProperty), RegexOptions.IgnoreCase)
Else
szReplace &= m.Groups(3).ToString()
End If

Return szReplace
End Function


' Recursively search for property in tag
' <param name="m">Match of the regular expression</param>
' <returns>tag with lower case properties</returns>
Private Function nextProperty(ByVal m As Match) As String
Dim szReplace As String = ""
szReplace = m.Groups(1).ToString() & m.Groups(2).ToString().ToLower()

' Search for another property in the tag
' Skip tags containing __VIEWSTATE to avoid a very long computation.
If (Regex.Match(m.Groups(3).ToString(), "(.*?"")(\s+\w+)(="".+>)", RegexOptions.IgnoreCase).Success AndAlso m.Groups(3).ToString().IndexOf("__VIEWSTATE") = -1) Then
System.Diagnostics.Debug.WriteLine("Match OK", "nextProperty")
szReplace &= Regex.Replace(m.Groups(3).ToString(), "(.*?"")(\s+\w+)(="".+>)", New MatchEvaluator(AddressOf nextProperty), RegexOptions.IgnoreCase)
Else
System.Diagnostics.Debug.WriteLine("Match NOK", "nextProperty")
szReplace &= m.Groups(3).ToString()
End If
Return szReplace
End Function


Private Function HTMLTag(ByVal m As Match) As String
Return "<html xmlns=""http://www.w3.org/1999/xhtml"" xml:lang=""" & m_sLanguage & """>"
End Function


Private Sub FixHtml()
m_sXHTML = Regex.Replace(m_sXHTML, "<html>", New MatchEvaluator(AddressOf HTMLTag), RegexOptions.IgnoreCase)
End Sub


Private Sub FixBr()
m_sXHTML = m_sXHTML.Replace("<br>", "<br />")
End Sub


Private Sub FixScript()
m_sXHTML = m_sXHTML.Replace("<script language=""javascript"">", "<script type=""text/javascript"">")
End Sub


Private Sub FixStyle()
m_sXHTML = Regex.Replace(m_sXHTML, "style=""[^""]+""", New MatchEvaluator(AddressOf ToLowerCase), RegexOptions.IgnoreCase)

' // Add <![CDATA[ ... ]]> to mask style
Dim m As New MatchEvaluator(AddressOf FixStyleAndScript)

m_sXHTML = Regex.Replace(m_sXHTML, _
"(<style[^<]*>){1}(?(?=\s*<!--)(\s*<!--)(\s*.*?)(//\s*-->)|(\s*.*?))\s*(</style>){1}", _
m, _
RegexOptions.IgnoreCase Or RegexOptions.Singleline)
End Sub


Private Sub FixInput()
Dim nStart, nPos, nEnd As Integer
nStart = 0
nPos = 0

While (nPos >= 0)
Dim sSearch As String = "<input type=""hidden"""
nPos = m_sXHTML.IndexOf(sSearch, nStart)
If (nPos >= 0) Then
nStart = nPos + sSearch.Length
m_sXHTML = m_sXHTML.Insert(nPos, "<pre>")

nEnd = m_sXHTML.IndexOf(">", nStart)
If (nEnd > 0) Then
m_sXHTML = m_sXHTML.Insert(nEnd + 1, "</pre>")
End If
End If
End While

End Sub



Private Sub AddSelfClose(ByVal sTagName As String)
Dim nStart, nPos, nEnd As Integer
nStart = 0
nPos = 0

While (nPos >= 0)
Dim sSearch As String = "<" & sTagName
nPos = m_sXHTML.IndexOf(sSearch, nStart)
If (nPos >= 0) Then
nStart = nPos + 1
nEnd = m_sXHTML.IndexOf(">", nStart)
If (nEnd > 0) Then
Dim c As Char = m_sXHTML.Chars(nEnd - 1)
If (c <> "/") Then
m_sXHTML = m_sXHTML.Insert(nEnd, " /")
End If
End If
End If
End While

End Sub



Private Sub RemoveAttribute(ByVal sTagName As String, ByVal sAttrName As String)
Dim nStart, nLength, nEnd As Integer
nStart = 0
nLength = 0
' Matches the tag containing the attribute
Dim rTagWithAttr As New Regex("<" & sTagName & "[^>]* " & sAttrName & "=""(.*?)""")
' Collection contains all occurrences of the tag with the attribute
Dim mcTags As MatchCollection
mcTags = rTagWithAttr.Matches(m_sXHTML)

' Count BACKWARDS through the collection because the m_sXHTML length is affected each time
' an attribute is removed
Dim i As Integer = mcTags.Count - 1

While i >= 0
nStart = mcTags(i).Index + mcTags(i).Value.IndexOf(sAttrName) - 1
nLength = mcTags(i).Length - mcTags(i).Value.IndexOf(sAttrName) + 1
m_sXHTML = m_sXHTML.Remove(nStart, nLength)
i -= 1
End While
End Sub


Private Sub maskScript()
' Add <![CDATA[ ... ]]> to mask script
m_sXHTML = Regex.Replace(m_sXHTML, _
"(<script[^<]*>){1}(?(?=\s*<!--)(\s*<!--)(\s*.*?)(//\s*-->)|(\s*.*?))\s*(</script>){1}", _
New MatchEvaluator(AddressOf FixStyleAndScript), _
RegexOptions.IgnoreCase Or RegexOptions.Singleline)
End Sub



Private Function FixStyleAndScript(ByVal m As Match) As String
Dim ret, st, ed As String
ret = ""

If (m_bXmlCDATA) Then
st = Cr & Lf & "<![CDATA[" & Cr & Lf
ed = Cr & Lf & "]]>" & Cr & Lf
Else
st = Cr & Lf & "<!--" & Cr & Lf
ed = Cr & Lf & "//-->" & Cr & Lf
End If

If (m.Groups(2).ToString().Trim() = "" AndAlso m.Groups(4).ToString().Trim() = "") Then
st = ""
ed = ""
End If

ret = m.Groups(1).ToString() & st
ret &= m.Groups(2).ToString() & m.Groups(4).ToString() & ed & m.Groups(5).ToString()
Return ret
End Function


Private Function ToLowerCase(ByVal m As Match) As String
Return m.ToString().ToLower()
End Function

End Class

End Namespace

  • Replace DOC TYPE don't runs (Anonymous, 6-Sep-05)
  • Re: Replace DOC TYPE don't runs (Anonymous, 6-Sep-05)
  • Re: Replace DOC TYPE don't runs (foued69, 23-Mar-06)
