Click here to Skip to main content
15,867,568 members
Articles / Programming Languages / C#

A C# Library to Implement Array Tries

Rate me:
Please Sign up or sign in to vote.
4.91/5 (6 votes)
11 Jun 2021MIT9 min read 6.9K   196   9   1
Implementation, testing and practical use of a C# library implementing array tries
This article deals with the implementation, testing and practical use of a C# library implementing array tries. The library can be used by any managed-code Visual Studio application requiring efficient access to data stored in memory.

Introduction

A trie is a data structure used to access keys (strings) from a given data set. Most often, tries are built from character strings and represented as trees where the link from one node to each of its children is labeled with a unique character. According to the Wikipedia page titled “Trie”, the idea of a trie was first abstractly described by Axel Thue in 1912, and tries were described in terms of computing by Rene de la Briandais in 1959. In 1960, Edward Fredkin coined the term trie.

The following two figures illustrate tree representations of tries. The first figure was obtained from sample images of tries on the web. Each branch of the tree ends with a node labeled “EOW” which stands for “end-of-word”. As will be seen in the implementation of the library, such nodes are not necessary.

Image 1

The second figure is from the Wikipedia page titled “Trie”. In the original, the tree is missing two links: one from node “te” to node “ten” and one from node “in” to node “inn”. Those links have been added in red.

Image 2

From the example figures, it can be seen that tries are multiway trees, which means that a node in a trie can have an arbitrary number of children.

In his seminal 1960 paper titled “Trie Memory” (Communications of the ACM, Vol. 3, No. 9, pp. 490-499), Edward Fredkin described the use of memory registers to store tries. (As a passing note, the author had free access through the site ResearchGate only to the first page of Fredkin’s paper. To obtain the full paper, the user not only has to pay a “joining” fee but also the fee for the paper, and those fees add up to an unreasonable amount of money for such an old paper).

Figure 1 in Fredkin’s paper shows a schematic representation of the storage of words in trie memory, and it is reproduced in the following figure:

Image 3

For simplicity, Fredkin limited the character set to the first five letters of the alphabet plus a symbol (Ω in this reproduction) to denote “space” or end-of-word. The rows represent registers and the columns are labeled with the character set. Each non-empty entry in the table contains the number of the next register to be inspected. The process to retrieve a word always starts at register 1 (bottom row).

Lacking access to the rest of Fredkin’s paper, the workings of the schematic representation must be figured out somehow. As an example, entry (1,D) contains number 2; entry (2,A) contains number 3; entry (3,B) contains number 4; finally, entry (4,Ω) contains the special character “|” thus indicating the end of word “DAB”. Carrying out this process with the rest of some of the non-empty entries in register (row) 1, one obtains the following partial interpretation of the schematic.

Image 4

The reader is encouraged to add the links for the words “BE”, “BED” and “A”.

Implementation of the Trie Library

In this section, the C# implementation of the trie library will be described from the bottom up, that is, from elementary data structures and functions to more abstract ones. The complete code is in the ZIP file attached to this article. All the code in this section is assumed to reside within the namespace TriesLib.

Trie Nodes

Nodes in a trie contain the information necessary to encode the characters in a string.

C#
public class TrieNode
{
   private char ch;            // Character at some position in a string.
   private int limit, index;   // Number of subtrie nodes and index of the trie node.
   private TrieNode[] subtrie; // Array of all possible trie nodes that can follow 'ch'.

   /// <summary>Create an instance of a TrieNode with the given parameters.
   /// </summary>
   /// <param name="_ch">Character in the trie node.</param>
   /// <param name="_limit">Number of subtrie nodes.</param>
   /// <param name="_index">Index of the trie node.</param>
   /// <param name="endOfString">Whether or not the end of a string has been reached.
   /// </param>
   public TrieNode( char _ch, int _limit, int _index, bool endOfString )
   {
      ch = _ch;
      limit = _limit;

      subtrie = new TrieNode[ limit ];

      for ( int i = 0; i < limit; ++i )
      {
         subtrie[ i ] = null;
      }
      index = endOfString ? _index : -1;
   }// TrieNode (constructor)

The constuctor receives as arguments the character to be stored in the node, the number of subtrie nodes that the node will have, an index for the node and a flag indicating whether or not the character is at the end of the string being processed.

To access the value of the private data members of the node, there are three read-only (get) properties: Ch, Limit and Index. The following accessor function is used to get or set a subtrie element of the node.

C#
/// <summary>Get or Set the i-th subtrie of a trie node.
/// </summary>
/// <param name="i">Index of the subtrie.</param>
/// <returns>If the index {i} is valid, the i-th subtrie of a trie node.
/// </returns>
public TrieNode this[ int i ]
{
   get
   {
      if ( 0 <= i && i < limit )
      {
         return subtrie[ i ];
      }
      else
      {
         InvalidIndex( i );
         return null;
      }
   }
   set
   {
      if ( 0 <= i && i < limit )
      {
         subtrie[ i ] = value;
      }
      else
      {
         InvalidIndex( i );
      }
   }
}// this

Function InvalidIndex simply displays a MessageBox stating that the index passed to this is invalid.

Tries

Tries are built by creating trie nodes. In the examples of the previous section, tries were represented as trees. In the trie library being described, the implementation of tries is array-based.

C#
public class Trie
{
   private TrieNode root; // Root node of the trie.
   private int limit;     // Limit on the possible subtries of a trie node.

   public Trie()
   {
      limit = TLconst.asciiTrieLimit;
      root = new TrieNode( '@', limit, -1, false ); // Root node.
   }// Trie (constructor)

Every time the preceding constructor is called, a root TrieNode is created with the number of its possible subtries set by the following class:

C#
public static class TLconst
{
   public static readonly
      int
      asciiTrieLimit = 256; // A Trie contains characters from the standard ASCII code.
}// TLconst (static class)

What this means is that each trie node can have up to 256 possible subtries. This allows handling strings containing any possible combination of characters from the standard ASCII character set.

Insertion of Words into a Trie

Assuming that a sample application has a static class ATconst containing an array of words to be inserted, a Trie would be created and the words would be inserted as follows:

C#
Trie trie = new Trie();
for ( int j = 0; j < ATconst.words.Length; ++j )
{
   trie.Insert( ATconst.words[ j ], j );
}

The following two functions handle the insertion of the characters of a string into the trie.

C#
/// <summary>Insert the characters of a string into the trie.
/// </summary>
/// <param name="str">String containing the characters to be inserted.</param>
/// <param name="_index">Index of {str} in a data set.
/// </param>
public void Insert( string str, int _index )
{
   if ( !string.IsNullOrEmpty( str ) )
   {
      TraverseToInsert( str, _index, root );
   }
   else
   {
      MessageBox.Show( "Trie class error: null or empty strings cannot be inserted" );
   }
}// Insert

/// <summary>Traverse the trie to insert the characters of a non-null, non-empty string.
/// </summary>
/// <param name="str">String containing the characters to be inserted.</param>
/// <param name="_index">Index of {str} in a data set.</param>
/// <param name="node">Current trie node.
/// </param>
private void TraverseToInsert( string str, int _index, TrieNode node )
{
   int n = str.Length, m = n - 1;

   for ( int i = 0; i < n; ++i )
   {
      char _ch = str[ i ];
      int j = (int)_ch;    // ASCII code of {_ch}.

      // {node[ j ] is {node.subtrie[ j ]} accessed through {node.this[ j ]}

      if ( node[ j ] == null )
      {
         node[ j ] = new TrieNode( _ch, limit, _index, i == m );
      }
      node = node[ j ];
   }
}// TraverseToInsert

For example, starting with an empty trie, the insertion of the characters of the stringCAB” would leave the trie in the state shown in the following figure:

Image 5

There are four nodes in the trie. The top-most node corresponds to the root of the trie. At index 67 (ASCII code for character ‘C’) of the subtrie array of the root node a new TrieNode was created. At index 65 (ASCII code for character ‘A’) of the subtrie array of the second node a new TrieNode was created. Finally, at index 66 (ASCII code for character ‘B’) of the subtrie array of the third node a new TrieNode was created. All the elements of the subtrie array of the fourth node are null and ready for the insertion of more characters if necessary.

Backward Listing of Entries in a Trie

In order to allow the verification that all the strings in a data set have been inserted into the trie, an application using the library can call the following function, which collects into a list the indices of the nodes in the trie. The function relies on an auxiliary function that does the actual collection. The indices are collected in reverse order for no particular reason.

C#
/// <summary>Generate a backward listing of the trie indices.
/// </summary>
/// <returns>List of the indices.
/// </returns>
public List<int> BackwardListingOfEntries()
{
   List<int> indices = new List<int>();

   DescListNode( root, indices );

   return indices;
}// BackwardListingOfEntries

/// <summary>Descending listing of the indices in a trie.
/// </summary>
/// <param name="node">Current node in the trie.</param>
/// <param name="indices">List containing the indices of entries in the trie.
/// </param>
private void DescListNode( TrieNode node, List<int> indices )
{
   if ( node != null )
   {
      if ( node.Index != -1 )
      {
         indices.Add( node.Index );
      }
      for ( int i = limit - 1; i >= 0; --i )
      {
         DescListNode( node[ i ], indices );
      }
   }
}// DescListNode

Retrieval of Words That Start With a Given Prefix

One important application of tries is finding all the strings in a data set that start with a given prefix. Assuming that the strings are not sorted, for small data sets exhaustive search may be tolerated. However, for large data sets such an approach is not acceptable. If the strings are stored in a trie, there is no search at all and the retrieval of the indices of target strings takes time linear in the number of strings.

The retrieval of words that start with a given prefix is as follows. Starting at the root of the trie, traverse the prefix until reaching the node corresponding to the last character of the prefix. Then collect the indices of all the nodes under that node. This algorithm is implemented by the following three functions:

C#
/// <summary>Collect the indices in the trie of all the strings that
///          have the same prefix.
/// </summary>
/// <param name="prefix">Prefix for which the indices must be collected.</param>
/// <returns>List of indices from the trie.
/// </returns>
public List<int> Collect( string prefix )
{
   List<int> indices = new List<int>();

   if ( !string.IsNullOrEmpty( prefix ) )
   {
      int n = prefix.Length;
      TrieNode startNode;

      startNode = TraverseStr( prefix, root );
      if ( startNode != null )
      {
         CollectAll( startNode, indices );
      }
   }
   return indices;
}// Collect

/// <summary>Get to the node corresponding
/// to the last character of the prefix of a string.
/// </summary>
/// <param name="prefix">String to traverse in the trie.</param>
/// <param name="startNode">Start node for the traversal of {prefix}.</param>
/// <returns>Node in the trie corresponding to the last character of {prefix}.
/// </returns>
private TrieNode TraverseStr( string prefix, TrieNode startNode )
{
   TrieNode node = startNode;

   for ( int i = 0; i < prefix.Length; ++i )
   {
      char _ch = prefix[ i ];
      int j = (int)_ch;    // ASCII code of {_ch}

      if ( node[ j ] != null )
      {
         node = node[ j ];
      }
      else
      {
         node = null;
         break;
      }
   }
   return node;
}// TraverseStr

/// <summary>Collect the indices of all the nodes under
///          a given node in a trie.
/// </summary>
/// <param name="node">Current node in the trie.</param>
/// <param name="indices">List of all indices to collect.
/// </param>
private void CollectAll( TrieNode node, List<int> indices )
{
   if ( node != null )
   {
      if ( node.Index != -1 )
      {
         indices.Add( node.Index );
      }
      for ( int i = 0; i < limit; ++i )
      {
         CollectAll( node[ i ], indices );
      }
   }
}// CollectAll

An unstated assumption in the preceding functions is that the data set resides in memory and the target strings are accessed with the list of indices retrieved from the trie.

Testing the Library

The library to implement array tries can be tested by a simple console application such as the following. The code in the ZIP file contains more test cases that yield the expected results.

C#
using TriesLib;

namespace ArrayTries
{
   class Program
   {
      static void Main( string[] args )
      {
         Trie trie = new Trie();

         for ( int j = 0; j < Constants.words.Length; ++j )
         {
            trie.Insert( Constants.words[ j ], j );
         }

         ShowBackwardListing( trie.BackwardListingOfEntries() );

         ShowCollection( "a", trie );
         ShowCollection( "beh", trie );
         ShowCollection( "foo", trie );

         Console.WriteLine();
      }// Main

      public static void ShowBackwardListing( List<int> indices )
      {
         if ( indices.Count == 0 )
         {
            Console.WriteLine( "\nThe trie contains no entries\n" );
         }
         else
         {
            Console.WriteLine( "\nBackward listing of trie entries\n" );
            foreach ( int i in indices )
            {
               Console.WriteLine( Constants.words[ i ] );
            }
            Console.WriteLine();
         }
      }// ShowBackwardListing
      public static void ShowCollection( string prefix, Trie trie )
      {
         List<int> indices = trie.Collect( prefix );

         if ( indices.Count == 0 )
         {
            Console.WriteLine(
               string.Format( "\nThe trie contains no entries beginning with '{0}'", prefix ) );
         }
         else
         {
            Console.WriteLine( string.Format( "\nEntries beginning with '{0}'", prefix ) );

            foreach ( int i in indices )
            {
               Console.WriteLine( Constants.words[ i ] );
            }
         }
      }// ShowCollection
   }// Program (class)
}// ArrayTries (namespace)

To save space, the output from the program was organized in table form, and is shown in the following two figures:

Image 6

Image 7

A Simple Application: Simulation of Auto-Complete

Most, if not all, of the graphical user-interface (GUI) programs that provide a search capability have an auto-complete functionality. As the user is typing characters into the search box, the search “engine” displays suggestions to automatically complete the search text. It is very likely that such “engines” use data stored in a trie to present those suggestions.

The main program in the console application in the previous section can be extended to call the following function to simulate a user typing characters in a search box and a search engine’s list of auto-completion strings at each step.

C#
public static void SimulateAutoComplete( string target, Trie trie )
{
   int n = target.Length;

   if ( n > 4 )
   {
      Console.WriteLine( "\nSimulating auto-complete with string '{0}'\n", target );

      for ( int i = 0; i < n / 2; ++i )
      {
         string prefix = target.Substring( 0, i + 1 );

         Console.WriteLine( "prefix: '{0}'", prefix );
         ShowCollection( prefix, trie );
         Console.WriteLine();
      }
   }
}// SimulateAutoComplete

For example, the function call...

C#
SimulateAutoComplete( "behemoth", trie );

...would generate the output shown in the following figure. Again, the output was organized in table form.

Image 8

Implementing the Library

The ZIP file attached to this article contains three files that implement the library: TrieNode.cs, Trie.cs and TLconst.cs. In Visual Studio, create a new Visual C# Class Library with the name TriesLib. In the Solution Explorer, right-click on the default class name chosen by Visual Studio (usually Class1.cs), re-name it as TrieNode.cs and copy the code from file TrieNode.cs. Right-click on the solution name and select Add->New Item. In the menu that appears, select Class, name it Trie.cs and copy the code from file Trie.cs. In the Solution Explorer, right-click on References and select Add Reference. In the window that appears, select the .NET tab and add a reference to the library System.Windows.Forms. Right-click on the solution name and select Add->New Item. In the menu that appears, select Class, name it TLconst.cs and copy the code from file TLconst.cs. Build the solution and the bin\Debug directory will contain the library file TriesLib.dll.

Using the Library

The ZIP file also contains a file named Program.cs that implements a simple console application that uses the library, and a file named ATconst.cs which contains the sample strings to be inserted in a trie. In Visual Studio, create a new console application with the name ArrayTries. In the Solution Explorer, right-click on References and select Add Reference. In the window that appears, select the Browse tab and navigate to the bin\Debug directory containing the library file, TriesLib.dll. Select the library file and click on OK. Replace the entire contents of the empty Program.cs file with the text in the Program.cs file. Right-click on the solution name and select Add->New Item. In the menu that appears, select Class, name it ATconst.cs and copy the code from file ATconst.cs. Build the solution and run it.

Conclusion

This article has dealt with the implementation, testing, and practical use of a C# library implementing array tries. The library can be used by any managed-code Visual Studio application requiring efficient access to data stored in memory. The code can be adapted to access data in a database if the primary keys are strings. All the code described in this article was developed with Visual Studio 2010. The code was then ported to Visual Studio 2019 and it ran without requiring any modifications.

History

  • 11th June, 2021: Initial version

License

This article, along with any associated source code and files, is licensed under The MIT License


Written By
United States United States
This member has not yet provided a Biography. Assume it's interesting and varied, and probably something to do with programming.

Comments and Discussions

 
QuestionHehe, congratulations, you have discovered the "Data Dictionary"! Pin
AnotherKen17-Jun-21 10:01
professionalAnotherKen17-Jun-21 10:01 
Your Array Trie is exactly what a Data Dictionary is, they already exist in C#. In this case the final child node of the dictionary contains a data element or a link to one. It's a good structure for minimizing search times.

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.