How to Substring Articles/News with HTML Tags on Server-Side

7 Jul 2015 | CPOL | 3 min read
In this tip, we will learn how to summarize text that contains HTML tags on the server side.

Introduction

Article and news text is often stored together with its HTML tags. We will write a server-side method that cuts such text to a given length and closes all tags that are left open by the cut.

Background

Summarizing plain text is easy. However, if your blog or news articles are written in a rich text editor, the stored text contains HTML tags, and a plain substring can leave tags open or cut in the middle of a tag. Many people solve this with JavaScript or JS tools: they send the whole article to the index page and summarize it on the client side. We don't need to do that. Moreover, imagine a blog index page that lists 10-20 articles with summaries. If you send every whole article from the server, a lot of bandwidth is wasted on text that is never displayed.
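To make the problem concrete, here is a minimal sketch of a naive cut. The class and method names (NaiveCutDemo, NaiveCut) are hypothetical and are not part of the code in this tip.

```csharp
using System;

public static class NaiveCutDemo
{
    // Plain Substring counts tag characters as text and knows nothing about markup.
    public static string NaiveCut(string html, int length)
    {
        return html.Length <= length ? html : html.Substring(0, length);
    }
}
```

For the input "<p>Hello <strong>world</strong></p>", NaiveCut(html, 12) returns "<p>Hello <st" (cut inside a tag), and NaiveCut(html, 20) returns "<p>Hello <strong>wor" (both tags left open).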

Using the Code

The SubstringHtml method is easy to use. You pass the text and the number of visible characters (tags are not counted) that should remain after the cut.

public string SubstringHtml(string stringValue, int length)

First, we need to define regular expressions.

C#
var regexAllTags = new Regex(@"<[^>]*>");
var regexIsTag = new Regex(@"<|>");
var regexOpen = new Regex(@"<[^/][^>]*>");
var regexClose = new Regex(@"</[^>]*>");
var regexAttribute = new Regex(@"<[^ ]*");

"regexAllTags" is used to measure the text length. The string is stored in the database with its HTML tags, so its raw length includes the tags. Therefore, we remove all tags first and measure the length of the remaining text.

"regexOpen" and "regexClose" are used to find opening tags and closing tags.

"regexAttribute" is used to strip the attributes from an opening tag so that it can be transformed into a closing tag.

"regexIsTag" is used to determine whether the cut falls inside a tag.

C#
int necessaryCount = 0;

if (regexAllTags.Replace(stringValue, "").Length <= length)
{
    return stringValue;
}

If "stringValue" without its tags is no longer than "length", there is nothing to cut and we return it unchanged.
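The patterns and the length check above can be tried on a small sample. This is only an illustrative sketch: the wrapper class TagRegexDemo and its method names are assumptions, while the patterns themselves are the ones defined in this tip.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class TagRegexDemo
{
    static readonly Regex RegexAllTags = new Regex(@"<[^>]*>");
    static readonly Regex RegexOpen = new Regex(@"<[^/][^>]*>");
    static readonly Regex RegexClose = new Regex(@"</[^>]*>");
    static readonly Regex RegexAttribute = new Regex(@"<[^ ]*");

    // Visible text only; its Length is what gets compared with "length".
    public static string StripTags(string html)
    {
        return RegexAllTags.Replace(html, "");
    }

    // Every tag that does not start with "</".
    public static string[] OpenTags(string html)
    {
        return RegexOpen.Matches(html).Cast<Match>().Select(m => m.Value).ToArray();
    }

    // Every tag that starts with "</".
    public static string[] CloseTags(string html)
    {
        return RegexClose.Matches(html).Cast<Match>().Select(m => m.Value).ToArray();
    }

    // Everything from "<" up to the first space, i.e. the tag without attributes.
    public static string TagHead(string openTag)
    {
        return RegexAttribute.Match(openTag).Value;
    }
}
```

For "<p>Hi <a href='x'>there</a></p>", StripTags returns "Hi there" (8 characters), OpenTags returns "<p>" and "<a href='x'>", CloseTags returns "</a>" and "</p>", and TagHead("<a href='x'>") returns "<a".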

C#
string[] split = regexAllTags.Split(stringValue);
string counter = "";

foreach (string item in split)
{
    // The cut falls inside the first text section that brings the running length up to "length".
    if (counter.Length < length && counter.Length + item.Length >= length)
    {
        // Start of this section in the original string, plus the characters still needed.
        necessaryCount = stringValue.IndexOf(item, counter.Length)
                         + (length - counter.Length);

        break;
    }

    counter += item;
}

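In this part, we split the text on its tags into plain-text sections and accumulate them to find the section in which the cut must fall. We then search for that section in the original string (which still has its tags), starting at index "counter.Length", because the section cannot begin earlier than that. The section's start index plus the number of characters still needed gives "necessaryCount", the index at which to cut the original string so that "length" visible characters remain.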
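One caveat with searching by text: IndexOf returns the first occurrence at or after the start index, and that occurrence can sit inside an attribute value rather than in the text section itself. As a possible alternative (not the method used in this tip), the cut index can be found by walking the tag matches and tracking the offset directly. The names CutIndexSketch and FindCutIndex are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CutIndexSketch
{
    static readonly Regex RegexAllTags = new Regex(@"<[^>]*>");

    // Returns the index in html at which "length" visible characters have been seen.
    // Assumes the caller already checked that the visible text is longer than "length".
    public static int FindCutIndex(string html, int length)
    {
        int textSeen = 0;  // visible characters counted so far
        int position = 0;  // offset in html just after the last tag processed

        foreach (Match tag in RegexAllTags.Matches(html))
        {
            int textLength = tag.Index - position;  // text between the previous tag and this one
            if (textSeen + textLength >= length)
            {
                break;  // the cut falls inside this text section
            }
            textSeen += textLength;
            position = tag.Index + tag.Length;
        }

        // Guard against callers that skipped the length check.
        return Math.Min(html.Length, position + (length - textSeen));
    }
}
```

For the example string used later in this tip and a length of 30, FindCutIndex returns 60, the same index the split technique finds. For "<b>x</b> <b title='xyz'>xyz</b>" and a length of 3, it returns 25, whereas IndexOf("xyz", 2) would first hit the "xyz" inside the title attribute.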

C#
var x = regexIsTag.Match(stringValue, necessaryCount);
if (x.Value == ">")
{
    necessaryCount = x.Index + 1;
}

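In this part, we check whether the cut falls inside a tag: if the next angle bracket after the cut is ">", we move the cut to just past it. With the split technique, the cut normally lands in a text section, but "IndexOf" can also find the section's text inside an attribute value, so this check is worth keeping, especially if you change or add any code.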
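The check can be isolated in a small helper to see how it behaves. This is a sketch only: the names TagBoundarySketch and MoveOutOfTag are hypothetical, and it assumes the text itself contains no unescaped "<" or ">" characters.

```csharp
using System.Text.RegularExpressions;

public static class TagBoundarySketch
{
    static readonly Regex RegexIsTag = new Regex(@"<|>");

    // If the first angle bracket at or after cutIndex is ">", the cut is inside a tag,
    // so move it to just past that ">". Otherwise leave it where it is.
    public static int MoveOutOfTag(string html, int cutIndex)
    {
        Match next = RegexIsTag.Match(html, cutIndex);
        return next.Value == ">" ? next.Index + 1 : cutIndex;
    }
}
```

For "<p>Hello <strong>world</strong></p>", MoveOutOfTag(html, 12) returns 17 (index 12 is inside "<strong>"), while MoveOutOfTag(html, 20) returns 20 because the next bracket is the "<" of "</strong>".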

We now have necessaryCount, a safe cut index in the original string. Next, we cut the text and close all tags that are left open.

C#
string subs = stringValue.Substring(0, necessaryCount);

var openTags = regexOpen.Matches(subs);

var closeTags = regexClose.Matches(subs);

C#
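// Closing-tag form of every opening tag found in the cut text.
List<string> OpenTags = new List<string>();
foreach (var item in openTags)
{
    // Keep only "<name" (everything before the first space), dropping attributes.
    string trans = regexAttribute.Match(item.ToString()).Value;

    // "<name" or "<name>" becomes "</name" or "</name>".
    trans = "</" + trans.Substring(1);

    // Last() is a LINQ extension method, so "using System.Linq;" is required.
    if (trans.Last() != '>')
    {
        trans += ">";
    }

    OpenTags.Add(trans);
}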

In this section, we strip the attributes (and any blank spaces) from the opening tags. We then convert each one into its closing-tag form so that it can be compared with the list of closing tags.

C#
foreach (System.Text.RegularExpressions.Match close in closeTags)
{
    OpenTags.Remove(close.Value);
}

We compare the two lists and remove from the OpenTags list one entry for every closing tag that is already present in the cut text. What remains is the list of tags that are left open.
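Note that the list comparison treats every non-closing tag as one that needs a closer, so void elements such as <br> or <img> would receive "</br>" or "</img>". If your articles contain such tags, one possible variation (an assumption on my part, not the method used in this tip) is to walk the tags in order with a stack and skip void and self-closing elements. The names OpenTagStackSketch and ClosingTagsFor are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class OpenTagStackSketch
{
    // Group 1: optional leading slash. Group 2: tag name.
    static readonly Regex RegexTag = new Regex(@"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>");

    // Common HTML void elements, which never take a closing tag.
    static readonly HashSet<string> VoidElements = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
    {
        "area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "param", "source", "track", "wbr"
    };

    // Returns the closing tags needed to balance the fragment, innermost first.
    public static string ClosingTagsFor(string fragment)
    {
        var open = new Stack<string>();
        foreach (Match tag in RegexTag.Matches(fragment))
        {
            string name = tag.Groups[2].Value;
            bool selfClosing = tag.Value.EndsWith("/>", StringComparison.Ordinal);
            if (VoidElements.Contains(name) || selfClosing)
            {
                continue;  // nothing to close
            }

            if (tag.Groups[1].Value == "/")
            {
                // Pop only when the closer matches the innermost open tag; stray closers are ignored.
                if (open.Count > 0 && string.Equals(open.Peek(), name, StringComparison.OrdinalIgnoreCase))
                {
                    open.Pop();
                }
            }
            else
            {
                open.Push(name);
            }
        }

        var closers = new StringBuilder();
        while (open.Count > 0)
        {
            closers.Append("</").Append(open.Pop()).Append('>');
        }
        return closers.ToString();
    }
}
```

For the fragment "<div>line one<br>line <img src='x.png'>two", ClosingTagsFor returns "</div>" only.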

C#
for (int i = OpenTags.Count - 1; i >= 0; i--)
{
    if(i == 0) subs += "...";
    subs += OpenTags[i];
}

return subs;

Finally, we loop over the list in reverse (last in, first out) and append the closing tags to the end of the string. The "..." is added just before the last (outermost) closing tag.

Example

C#
string ex = "<p>hellooo codeproject<a href='blah'> blah</a><strong> blahblahh</strong> dsafsdf</p>"; 
string substring = SubstringHtml(ex, 30);

And the result is:

C#
substring = "<p>hellooo codeproject<a href='blah'> blah</a><strong> blahb</strong>...</p>"

The visible text "hellooo codeproject blah blahb" is exactly 30 characters long. The "..." before the final closing tag comes from the closing loop and is not counted in the 30.
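For convenience, the fragments from this tip can be assembled into one compilable unit, as sketched below. The wrapper class HtmlSummary and the static modifier are assumptions; the body follows the steps shown above, with only the redundant Substring(...).Length simplified. As written, the closing loop inserts "..." just before the outermost closing tag.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class HtmlSummary
{
    public static string SubstringHtml(string stringValue, int length)
    {
        var regexAllTags = new Regex(@"<[^>]*>");
        var regexIsTag = new Regex(@"<|>");
        var regexOpen = new Regex(@"<[^/][^>]*>");
        var regexClose = new Regex(@"</[^>]*>");
        var regexAttribute = new Regex(@"<[^ ]*");

        // Nothing to cut if the visible text already fits.
        if (regexAllTags.Replace(stringValue, "").Length <= length)
        {
            return stringValue;
        }

        // Find the index in the original string where the cut must happen.
        int necessaryCount = 0;
        string counter = "";
        foreach (string item in regexAllTags.Split(stringValue))
        {
            if (counter.Length < length && counter.Length + item.Length >= length)
            {
                necessaryCount = stringValue.IndexOf(item, counter.Length) + (length - counter.Length);
                break;
            }
            counter += item;
        }

        // If the cut landed inside a tag, move it past the closing ">".
        var x = regexIsTag.Match(stringValue, necessaryCount);
        if (x.Value == ">")
        {
            necessaryCount = x.Index + 1;
        }

        string subs = stringValue.Substring(0, necessaryCount);

        // Convert each opening tag to its closing form, then drop those already closed.
        var openTags = new List<string>();
        foreach (Match item in regexOpen.Matches(subs))
        {
            string trans = "</" + regexAttribute.Match(item.Value).Value.Substring(1);
            if (trans.Last() != '>')
            {
                trans += ">";
            }
            openTags.Add(trans);
        }
        foreach (Match close in regexClose.Matches(subs))
        {
            openTags.Remove(close.Value);
        }

        // Close the remaining tags innermost first; "..." goes before the outermost closer.
        for (int i = openTags.Count - 1; i >= 0; i--)
        {
            if (i == 0) subs += "...";
            subs += openTags[i];
        }

        return subs;
    }
}
```

Calling HtmlSummary.SubstringHtml with the example string and 30 returns "<p>hellooo codeproject<a href='blah'> blah</a><strong> blahb</strong>...</p>". An input whose visible text is 30 characters or shorter is returned unchanged.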

Conclusion

I hope this tip will help you. Please share your valuable thoughts and comments. Your feedback is always welcome.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Web Developer
Turkey
I graduated from Istanbul Technical University (Meteorological Engineering) in January 2015. During my education, I improved my coding skills and took some courses. I am now a member of a large team that develops web projects.
