
Remove HTML Tags from a String Using Regex

12 Dec 2015, CPOL
Remove all HTML tags from a text string.

Introduction

This tip shows how to remove HTML tags from a string using regular expressions.

Using the Code

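Assign the text to the input variable. The first regular expression removes tags that have no attributes, such as <p>, </p>, <body> and </div>. A tag with attributes, such as <img src=''>, is left in place. The second expression removes every tag, including those with attributes. Both work regardless of case. \w matches uppercase and lowercase letters alike, and <.*?> does not look at the tag name at all.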

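```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        string input = "<b>This is test.</b><p> Enter any text.</p><div> The place is really beautiful.</div><img src=''>";

        // Removes tags without attributes only; equivalent to the original (\<(\/)?(\w)*(\d)?\>), since \w already matches digits
        string str1 = Regex.Replace(input, @"</?\w*>", string.Empty);

        // Removes all kinds of tags, including ones with attributes (suggested by CodeProject member 'svella')
        string str2 = Regex.Replace(input, @"<.*?>", string.Empty);

        Console.WriteLine(str1); // This is test. Enter any text. The place is really beautiful.<img src=''>
        Console.WriteLine(str2); // This is test. Enter any text. The place is really beautiful.
        Console.ReadLine();      // keeps the console window open
    }
}
```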
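If you strip tags in many places, you can wrap the two patterns in a helper and construct each one once. The following is a minimal sketch, not the tip's own code. The class name HtmlTagStripper and its method names are hypothetical. RegexOptions.Singleline is also an addition that the tip's pattern does not use. It lets '.' match a newline, so a tag whose attributes span several lines is removed as well.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper wrapping the two patterns from this tip.
public static class HtmlTagStripper
{
    // Matches tags without attributes, e.g. <p>, </div>, <BODY>.
    // \w covers upper- and lowercase letters and digits, so no IgnoreCase flag is needed.
    private static readonly Regex SimpleTag =
        new Regex(@"</?\w*>", RegexOptions.Compiled);

    // Matches from '<' to the nearest following '>', so tags with attributes are removed too.
    // Singleline (an assumption, not in the original) lets '.' also match '\n'.
    private static readonly Regex AnyTag =
        new Regex(@"<.*?>", RegexOptions.Compiled | RegexOptions.Singleline);

    // Removes only attribute-free tags; tags such as <img src=''> survive.
    public static string StripSimpleTags(string input) => SimpleTag.Replace(input, string.Empty);

    // Removes every '<...>' run, including multi-line tags.
    public static string StripAllTags(string input) => AnyTag.Replace(input, string.Empty);
}
```

For example, HtmlTagStripper.StripAllTags(input) gives the same text as str2 for the sample input. The static Regex.Replace method already caches recently used patterns. A static readonly instance therefore mainly saves the cache lookup and allows RegexOptions.Compiled. Measure before assuming the difference matters.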

Points of Interest

  1. Less code than a chain of string.Replace() calls (one per tag), and therefore less maintenance.
  2. The patterns are not tied to a list of tag names. They keep working if new HTML tags appear, and they also remove third-party tags that follow HTML tag syntax.
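The comparison with string.Replace() can be made concrete by setting a string.Replace() version next to the regex. This is a hedged sketch. The ReplaceComparison class and its hard-coded KnownTags list are hypothetical, chosen only to show the maintenance cost. Note that string.Replace(string, string) compares ordinally and case-sensitively, so <P> would need its own entry in the list.

```csharp
using System.Text.RegularExpressions;

// Hypothetical comparison of the two approaches.
public static class ReplaceComparison
{
    // Hard-coded list: every tag to strip must be listed explicitly.
    private static readonly string[] KnownTags = { "<b>", "</b>", "<p>", "</p>", "<div>", "</div>" };

    // string.Replace() approach: one call per listed tag; unlisted tags (e.g. <span>, <P>) survive.
    public static string ReplaceKnownTags(string input)
    {
        foreach (string tag in KnownTags)
        {
            input = input.Replace(tag, string.Empty);
        }
        return input;
    }

    // Regex approach: one pattern covers any attribute-free tag, whatever its name or case.
    public static string ReplaceWithRegex(string input) =>
        Regex.Replace(input, @"</?\w*>", string.Empty);
}
```

With the input "<p>Hello <span>world</span></p>", the list-based version leaves the <span> tags behind, while the regex version returns "Hello world".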

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL).


Written By
Asia/Pacific Region

Comments and Discussions

General: Title needs changed
Eric A Law, 28-Jan-22 3:51

Question: DON'T USE REGEXP FOR PARSING HTML
giammin, 17-Dec-15 0:02

Question: Nice. Just added a little formatting helper with paragraphs and breaks
lespauled, 11-Dec-15 7:36

Question: Some simplification suggestions
Thomas Daniels, 10-Dec-15 5:28

Praise: Re: Some simplification suggestions
Rajdeep Debnath, 10-Dec-15 5:47

Answer: Re: Some simplification suggestions
svella, 11-Dec-15 11:38

Going even further: as written, this will fail to remove tags with attributes, or even tags with extra whitespace after the tag name. Rather than trying to match specific characters between the angle brackets, why not just use a non-greedy quantifier: @"<.*?>"

This has the added benefit of also removing comments and processing instructions, but it will also clobber CDATA sections, which you probably don't want. If you want to handle real-world HTML from arbitrary sources, there really is no substitute for a full HTML parser.
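You can check the behaviour described in this reply with a short sketch. The NonGreedyDemo class is hypothetical. StripGreedy uses <.*> only to show what goes wrong without the non-greedy '?', because a greedy match runs from the first '<' to the last '>'. The cases below cover a comment, a CDATA section, and an attribute value containing '>', which a regex cannot tell apart from the end of a tag.

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo of lazy vs. greedy tag matching and the limits svella describes.
public static class NonGreedyDemo
{
    // Lazy: each match stops at the first '>' after its '<'.
    public static string StripLazy(string input) => Regex.Replace(input, @"<.*?>", string.Empty);

    // Greedy: a match runs to the last '>' on the line, swallowing the text between tags.
    public static string StripGreedy(string input) => Regex.Replace(input, @"<.*>", string.Empty);
}
```

For example, StripLazy("<a title=\"1 > 0\">link</a>") returns " 0\">link". The '>' inside the attribute value ends the match early, which is the kind of case a real HTML parser handles and a regex does not.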

Praise: Re: Some simplification suggestions
Rajdeep Debnath, 11-Dec-15 21:13