Click here to Skip to main content
15,888,610 members
Please Sign up or sign in to vote.
0.00/5 (No votes)
See more:
If we have a sentence which is stored in Trie and trying to find a pattern in that. Assuming i am storing text as an array. What if i have repeated words in that sentence. What is correct way of storing it.
Should i keep storing frequency at each leaf node?
What is best way of finding no of occurrence of word in a text?
Posted
Comments
Sergey Alexandrovich Kryukov 18-Mar-13 14:32pm    
Why "text as an array"? Of do you mean array of char, a string? The problem needs some considerable work to do it in good performance from scratch.
—SA

1 solution

Please see my comment to the question.

A solution with a good performance is possible via a hash table: http://en.wikipedia.org/wiki/Hash_table[^].

This structure keeps data in key-value pairs, supports uniqueness of keys, and allows to quickly find a value by a key. The computational time complexity of search is O(1), and is highly beneficial starting some considerable volume of data. (Please see http://en.wikipedia.org/wiki/Big_O_notation[^].)

You would need to have keys of the string type, to represent a word, and a value should represent a number of words found so far (use pointer to a number). You add words one-by one (first, you would need to split your string into words, another problem to solve), and try to add each word. If a value is found by some key, increment the word count by pointer. If not found, add a new key-value pair with value of 1 (a pointer to a value object). At the end, you will have all words each represented only once, and number of occurrences for each word. Don't forget to deallocate all the memory.

Perhaps, much simple structure with acceptable time complexity (O(log(n)) would be a binary tree: http://en.wikipedia.org/wiki/Binary_tree[^].

The idea of using binary tree is pretty much the same.

With C, it's going to be a good volume of work, but it promises you unforgettable and useful experience, especially in patience. :-)
You can help yourself if you find existing C implementation of a tree or a hash table, which should be easy.

Good luck,
—SA
 
Share this answer
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900