How can I generate keywords and their counts from a web page using C#?

I have loaded the web page into a string using HtmlAgilityPack and then split the text into words stored in an ArrayList.

Now I need to filter the keywords: remove the duplicates and show each keyword's count next to it.

My Code:
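// Requires: using System; using HtmlAgilityPack; using System.Text.RegularExpressions;
var webGet = new HtmlWeb();
var doc = webGet.Load(url);

HtmlNode bodyContent = doc.DocumentNode.SelectSingleNode("/html/body");

if (bodyContent != null)
{
    pmd.Html = stripHtml(bodyContent.InnerHtml);
}

// Guard against a page without a <body> element
string wordsOnly = pmd.Html ?? string.Empty;

// Split on any whitespace (spaces, tabs, line breaks) and drop empty entries
string[] arrayWordsOnly = wordsOnly.Split(new char[] { ' ', '\t', '\r', '\n' },
                                          StringSplitOptions.RemoveEmptyEntries);
char[] spChar = new char[] { '?', '\"', ',', '\'', ';', ':', '.', '(', ')', '!' };

foreach (string word in arrayWordsOnly)
{
    // Normalise each word: remove surrounding punctuation and lower-case it
    string key = word.Trim(spChar).ToLower();
}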

protected string stripHtml(string strHtml)
{
    // Strips the HTML tags from strHtml
    Regex objRegExp = new Regex("<(.|\n)+?>");
    // Replace each HTML tag with a space so words in adjacent elements don't merge
    string strOutput = objRegExp.Replace(strHtml, " ");
    // Turn the escaped angle brackets in the remaining text back into < and >
    strOutput = strOutput.Replace("&lt;", "<");
    strOutput = strOutput.Replace("&gt;", ">");
    return strOutput;
}
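Two weaknesses of the approach above matter for keyword counting: at best, stripHtml decodes only &lt; and &gt; (other entities such as &amp; or &nbsp; stay in the text), and splitting on single characters and then trimming punctuation from the ends leaves punctuation inside tokens, so "foo,bar" counts as one word. A minimal sketch of an alternative follows. It assumes .NET 4.0 or later (for System.Net.WebUtility.HtmlDecode) and assumes that a word is any run of Unicode letters, digits or apostrophes; the PageText class and its method names are hypothetical. Like the original, it does not remove the contents of script or style elements.

```csharp
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helpers: HTML fragment -> plain text -> lower-case words.
public static class PageText
{
    // Matches any opening or closing tag, even one spanning several lines.
    private static readonly Regex TagPattern = new Regex("<[^>]*>");

    // Assumption: a word is a run of Unicode letters, digits or apostrophes.
    private static readonly Regex WordPattern = new Regex(@"[\p{L}\p{N}']+");

    public static string StripHtml(string html)
    {
        // Replace each tag with a space so words in adjacent elements stay apart,
        // then decode every entity (&amp;, &quot;, &nbsp;, &#39;, ...).
        string text = TagPattern.Replace(html, " ");
        return WebUtility.HtmlDecode(text);
    }

    public static List<string> Tokenize(string text)
    {
        var words = new List<string>();
        foreach (Match match in WordPattern.Matches(text))
        {
            // Drop apostrophes used as quote marks: 'hello' -> hello
            string word = match.Value.Trim('\'').ToLowerInvariant();
            if (word.Length > 0)
            {
                words.Add(word);
            }
        }
        return words;
    }
}
```

With the question's variables, PageText.Tokenize(PageText.StripHtml(bodyContent.InnerHtml)) would produce the word list for the page body.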
Updated 24-Mar-11 17:29pm

1 solution

Firstly, you mention using an ArrayList; this is no longer recommended. You should use the generic List<string> instead (see the MSDN page for List<T>).

Assuming you make that change, something like the following should remove the duplicates:
C#
List<string> uniqueWords = new List<string>();
foreach (string word in arrayWordsOnly)
{
    // Normalise the word, then add it only if it has not been seen before
    string key = word.Trim(spChar).ToLower();
    if (!uniqueWords.Contains(key))
    {
        uniqueWords.Add(key);
    }
}
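The loop above removes the duplicates but does not record how often each word occurs, which is the other half of the question. One way to get both at once is a Dictionary<string, int> keyed on the normalised word: a dictionary lookup is constant time on average, whereas List<string>.Contains scans the whole list for every word. The sketch below reuses the question's spChar characters; KeywordCounter and CountWords are hypothetical names, and it lower-cases with ToLowerInvariant (a swap for the question's ToLower) so the result does not depend on the current culture.

```csharp
using System.Collections.Generic;

public static class KeywordCounter
{
    // Hypothetical helper: counts occurrences of each word, ignoring case and
    // trimming the same punctuation characters as the question's spChar array.
    public static Dictionary<string, int> CountWords(IEnumerable<string> words)
    {
        char[] spChar = { '?', '\"', ',', '\'', ';', ':', '.', '(', ')', '!' };
        var counts = new Dictionary<string, int>();
        foreach (string word in words)
        {
            string key = word.Trim(spChar).ToLowerInvariant();
            if (key.Length == 0)
            {
                continue; // skip tokens that were empty or only punctuation
            }
            int current;
            counts.TryGetValue(key, out current); // current stays 0 for a new key
            counts[key] = current + 1;
        }
        return counts;
    }
}
```

Calling KeywordCounter.CountWords(arrayWordsOnly) gives one entry per distinct keyword, with the Key holding the word and the Value its count.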


If you are determined to use ArrayList, simply replace each occurrence of List<string> with ArrayList (it lives in the System.Collections namespace).
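On .NET 3.5 or later, LINQ offers a more compact alternative that groups, counts and sorts in one expression, which suits a report listing the most frequent keywords first. This is a sketch, assuming the words have already been trimmed and lower-cased as in the answer's loop; KeywordReport, TopKeywords and its take parameter are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class KeywordReport
{
    // Hypothetical helper: returns the 'take' most frequent words with their
    // counts, most frequent first; ties are ordered alphabetically.
    // Assumes the words are already trimmed and lower-cased.
    public static List<KeyValuePair<string, int>> TopKeywords(IEnumerable<string> words, int take)
    {
        return words
            .Where(w => !string.IsNullOrEmpty(w))          // skip empty tokens
            .GroupBy(w => w)                               // one group per distinct word
            .Select(g => new KeyValuePair<string, int>(g.Key, g.Count()))
            .OrderByDescending(p => p.Value)               // highest count first
            .ThenBy(p => p.Key, StringComparer.Ordinal)    // deterministic tie order
            .Take(take)
            .ToList();
    }
}
```

The explicit tie-break keeps the output stable between runs, which helps when the report is compared or tested.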
 
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


