Hey, I am currently working on a natural language project. At first the task at hand was to extract the keywords from a text. That is now done, and I am posting the code here. Can anyone suggest techniques for extracting the nouns from the text by further modifying the code?
C#
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

namespace maxrep
{
  class Program
  {
    static void Main(string[] args)
    {
      string filename = "hello.txt";
      // string filename1 = "text.txt";
      /*
      * 
      * List<StreamReader> SRL = new List<StreamReader>();
      for (int i = 1; i < Foo.number_of_files + 1; i++)
      { 
      StreamReader aa= new StreamReader(@"realtime_" + Foo.main_id + "_" + i + ".txt");
      SRL.Add (aa);
      }
      */
      string inputString = File.ReadAllText(filename);
      // string inputStr = File.ReadAllText(filename1);

      inputString = inputString.ToLower();

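      // Define characters to strip from the input and do it
      string[] stripChars = { ";", ",", ".", "-", "_", "^", "(", ")", "[", "]",
                              "0", "1", "2", "3", "4", "5", "6", "7", "8", "9" };
      foreach (string character in stripChars)
      {
        inputString = inputString.Replace(character, "");
      }

      // Turn line breaks and tabs into spaces so words on adjacent lines are not glued together
      foreach (string whitespace in new[] { "\n", "\t", "\r" })
      {
        inputString = inputString.Replace(whitespace, " ");
      }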

      List<string> wordList = inputString.Split(' ').ToList();

      string[] stopwords = new string[] { "and", "the", "she", "for", "this", "you", "but" };
      // string[] negative = new string[] { "bad", "worse", "low", "decrease", "fail", "reduce", "weak", "sad" };
      foreach (string word in stopwords)
      {
        while (wordList.Contains(word))
        {
          wordList.Remove(word);
        }
      }

      Dictionary<string, int> dictionary = new Dictionary<string, int>();

      foreach (string word in wordList)
      {
        if (word.Length >= 3)
        {
          if (dictionary.ContainsKey(word))
          {
            dictionary[word]++;
          }
          else
          {
            dictionary[word] = 1;
          }
        }
      }

      var sortedDict = (from entry in dictionary orderby entry.Value descending select entry).ToDictionary(pair => pair.Key, pair => pair.Value);

      int count = 1;
      Console.WriteLine("---- Most Frequent Terms in the File: " + filename + " ----");
      Console.WriteLine();
      foreach (KeyValuePair<string, int> pair in sortedDict)
      {
        Console.WriteLine(count + "\t" + pair.Key + "\t" + pair.Value);
        count++;
      }
      Console.ReadKey();
    }
  }
}
Updated 8-Mar-13 7:12am
Comments
Richard MacCutchan 8-Mar-13 6:50am    
Create a list of nouns and search for them.
joshrduncan2012 8-Mar-13 13:23pm    
Nice, +5.
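
The word-list approach suggested above might look roughly like the sketch below. It is a plain dictionary lookup, not part-of-speech tagging, so it cannot tell "run" the noun from "run" the verb. `NounFilter`, `KeepNouns`, and `knownNouns` are hypothetical names chosen for illustration; the thread does not supply a noun list, so one would have to be provided separately.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NounFilter
{
    // Keeps only the words that appear in the supplied noun list.
    // knownNouns is an assumed, externally provided word list (not part of the original code).
    public static List<string> KeepNouns(IEnumerable<string> words, IEnumerable<string> knownNouns)
    {
        // A HashSet gives constant-time lookups; the comparer makes the match case-insensitive.
        var nounSet = new HashSet<string>(knownNouns, StringComparer.OrdinalIgnoreCase);
        return words.Where(word => nounSet.Contains(word)).ToList();
    }
}
```

In the question's program this could be applied to `wordList` before counting, for example `wordList = NounFilter.KeepNouns(wordList, knownNouns);`.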

1 solution

I fixed the formatting of the code in your question.
However, your attempt to get a sorted dictionary is not guaranteed to work.
Calling .ToDictionary(...) turns the query result back into a regular Dictionary, which does not guarantee any enumeration order.
It looks like you can just keep the query as an IEnumerable<KeyValuePair<string, int>> and iterate over that:
C#
var sortedWordCounts = from entry in dictionary orderby entry.Value descending select entry;

int count = 1;
Console.WriteLine("---- Most Frequent Terms in the File: " + filename + " ----");
Console.WriteLine();
foreach (var pair in sortedWordCounts)
{
  Console.WriteLine(count + "\t" + pair.Key + "\t" + pair.Value);
  count++;
}
Console.ReadKey();

If you really need to keep the collection in the sorted order, you should use .ToList() or .ToArray().
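
As a minimal sketch of the .ToList() option, the helper below materializes the sorted pairs into a list, which keeps the order it was built in. `WordCountSorting` and `SortByCountDescending` are hypothetical names, and the alphabetical tie-break is an addition that is not in the original query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordCountSorting
{
    // Returns the word counts as a list ordered by count, highest first.
    // Unlike Dictionary, a List preserves the order in which it was filled.
    public static List<KeyValuePair<string, int>> SortByCountDescending(
        Dictionary<string, int> counts)
    {
        return counts
            .OrderByDescending(pair => pair.Value)
            // Tie-break on the word itself so equal counts come out in a predictable order
            // (an assumption added for this sketch).
            .ThenBy(pair => pair.Key, StringComparer.Ordinal)
            .ToList();
    }
}
```

With the question's code this would be called as `var sorted = WordCountSorting.SortByCountDescending(dictionary);`, after which `sorted[0]` is the most frequent term and the list can be indexed or iterated repeatedly without re-running the query.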
 
Comments
Arjun Abco 8-Mar-13 23:21pm    
OK, thanks for your effort. Now, if I want to extract the nouns, how should I restructure the given code, or can I do it by adding something to it? I cannot keep a separate file or directory of nouns and check against it, because then it becomes simple pattern or string matching and not natural language processing. I did try a few methods, but nothing worked well. Help appreciated!

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


