IDF (inverse document frequency) is a popular measure of a word's importance. The IDF invariably appears in a host of heuristic measures used in information retrieval. However, so far the IDF has itself been a heuristic.
Mathematically, IDF is defined as:
IDF(t, D) = log(total number of documents in D / number of documents in D containing term t)
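As a quick worked example of the formula (the numbers are made up for illustration): assume a collection of 1000 documents in which the term appears in 10 of them, and assume the natural logarithm, which is what Math.Log computes in C# when called with one argument.

```latex
\mathrm{IDF}(t, D) = \ln\frac{N}{n_t} = \ln\frac{1000}{10} = \ln 100 \approx 4.605
```

Here N is the total number of documents and n_t is the number of documents containing the term. A term that appears in every document gets ln(1) = 0, so rarer terms score higher.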
I have developed an application for document clustering. In it I have
an IDF method like this:
private static float FindInverseDocumentFrequency(string term)
{
int count = documentCollection.ToArray().Where(s => r.Split(s.ToUpper()).ToArray().Contains(term.ToUpper())).Count();
return (float)Math.Log((float)documentCollection.Count / (float)count);
}
This method uses the following declarations from the program.
documentCollection is assigned like this:
documentCollection = collection.DocumentList[dv.content] as Hashtable;
DocumentList is created like this:
private DocumentCollection docCollection= new DocumentCollection() { DocumentList = new Hashtable() };
s is a string, as in:
List<string> removeList = new List<string>(){"\"","\r","\n","(",")","[","]","{","}","","."," ",","};
foreach (string s in removeList)
{
distinctTerms.Remove(s);
}
r is the regular expression:
private static Regex r = new Regex("([ \\t{}()\",:;. \n])");
The IDF method does not compile. The call documentCollection.ToArray() gives this error:
"'System.Collections.Hashtable' does not contain a definition for 'ToArray' and no extension method 'ToArray' accepting a first argument of type 'System.Collections.Hashtable' could be found (are you missing a using directive or an assembly reference?)"
How can I solve this error?
Please help me. Thank you.