Download source code - 626.8 KB

Introduction

I've been writing a novel using my Creative Writer's Word Processor app and, in the process, have continuously improved it. However, the Rhymer.com dictionary I scraped off their website is inadequate. I was able to download 79,635 files from their server and build a search engine for them, which is incorporated into the Words app. Although they provide thousands of words that rhyme with common suffix endings, these are all clustered together with no regard for rhymes that span more than the final syllable.

E.g.: 'uncomfortable' rhymes with 'bowel' and 10,000 other entries that share only a similar 'el' sound at the very end, all of which need to be picked through to find something reasonable.

Just because you add an '-ed' suffix to a word doesn't mean its list of rhymes needs to include every verb in the dictionary with the same '-ed' suffix added.

It's still a useful dictionary, but you have to do some work to filter out all the examples that don't really fit what you're looking for, and sometimes there are thousands of them. It's like looking for a specific snowflake in the middle of a blizzard.

This app, however, uses a phonetic fingerprint of each word to isolate similarities that go beyond just the final syllable.

Using the App

Note to Newbie: If you have never written any software and you're just looking for a rhyming dictionary, you can still run this app on Windows 10. Just download the software above and extract it onto your hard drive. Then you'll need to find the executable file:

C:\-wherever-you-extracted-the-file\Rhymes\Rhymes\bin\Debug\Rhymes.exe

You'll likely want to create a shortcut and keep it on your desktop.

Because CodeProject limits the downloadable file size to 10 MB, the app needs to build its database when you launch it for the first time. This takes about 10-15 minutes, and you'll see a list of words flash at the top left of the form while it's working. Once that is done, it will be ready to rhyme all you want.

Just type the word you want to rhyme into the text box and press Enter. Since the app uses a phonetic algorithm, your spelling matters less than it would if you were looking up a word in a predefined list. Even if you misspell the word you're trying to rhyme (as I often do), the search results will give you a list of correctly spelled words, and that might help you in the future. You don't need to worry about American or British spelling or laboriously argue with the app while you're just being creative.

Options

MaxEntries - You can limit the number of entries that are presented with this option. If the list of words found at a given search 'level' (number of matching ending sounds) exceeds this limit, none of the words in that level will be presented. Searching a short word like 'this' will give you oodles of answers that spill out all over the place, so I question your poetry if you need help rhyming 'this' or 'that', but if you must... just increase the MaxEntries and you'll find something.

By increasing the MaxEntries value, you may get the results you're looking for.

UseClipBoard - When this box is checked, the Windows clipboard will be tested once every second while the app is running. If you copy or cut text to the clipboard, the app will use that text as a search parameter. This is convenient while you're working in a separate word processor and don't want to switch apps.

TopMost - Checking this box will force the app to stay in front of other apps you're running. You can still use your word processor and do your writing, but with this option and the UseClipBoard option checked, you'll quickly see rhymes appear and only have to 'copy' the word you want to rhyme without switching to the app itself.
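The MaxEntries rule described above can be pictured with a small sketch. This is not the app's own code; the class and method names (MaxEntriesDemo, Filter) are hypothetical, and it only assumes that a search yields one list of words per level.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration of the MaxEntries option, not taken from the Rhymes source.
public static class MaxEntriesDemo
{
    // levels[0] holds words matching one ending sound, levels[1] two, and so on.
    // A level with more words than maxEntries is dropped whole, not truncated.
    public static List<List<string>> Filter(List<List<string>> levels, int maxEntries)
    {
        return levels.Where(level => level.Count <= maxEntries).ToList();
    }
}
```

With a limit of 2, a level holding three words disappears entirely while a level holding one word is kept, which is why raising the limit can make a hidden level show up.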

The Code

The data-tree is a ternary tree with a linked list of words attached to every leaf. All data is appended to the end of the file as it is created, and each data item (tree leaf or linked-list item) is referenced by its address in that same file, so there is no need for a cumbersome index system and varying word sizes do not affect storage or retrieval. The Insert and Search methods both start at the root of the tree and descend one level for each search key in the signature of the word being processed.
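To illustrate the addressing idea, here is a minimal sketch of appending variable-length records to a stream and reading them back by byte offset. It is not the code from classRD_TernaryTree.cs: the names AppendOnlyStore, Append and Read are hypothetical, the record layout is an assumption, and any seekable Stream (a FileStream or, for experiments, a MemoryStream) can be passed in.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of the Rhymes source: stores variable-length
// words at the end of a stream and hands back the byte offset as the "address".
public static class AppendOnlyStore
{
    // Appends a word plus the address of the next list item; returns the record's own address.
    public static long Append(Stream stream, string word, long nextAddr)
    {
        long addr = stream.Seek(0, SeekOrigin.End);   // new records always go at the end
        using (var writer = new BinaryWriter(stream, Encoding.UTF8, leaveOpen: true))
        {
            writer.Write(nextAddr);   // 8-byte pointer to the next item (-1 = none)
            writer.Write(word);       // length-prefixed string, so word size can vary
        }
        return addr;
    }

    // Positions the stream at addr and reads back one record.
    public static (string Word, long NextAddr) Read(Stream stream, long addr)
    {
        stream.Seek(addr, SeekOrigin.Begin);
        using (var reader = new BinaryReader(stream, Encoding.UTF8, leaveOpen: true))
        {
            long next = reader.ReadInt64();
            string word = reader.ReadString();
            return (word, next);
        }
    }
}
```

Appending "proposal" with a next-address of -1 and then "disposal" with the first record's address produces a two-item chain whose head is the most recently added word. No index is needed because every pointer is simply a file offset.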

The tree's search keys are sound tags sequenced in reverse order, from the end of the word to the front. Similar sounds have identical tags, and each level of the ternary-tree search corresponds to one more 'sound' counted from the end of the word. Each key comparison compares arbitrarily assigned unique numeric IDs that stand for a collection of letter combinations, so these comparisons are strictly numeric rather than alphabetical.

The entire Rhyming Dictionary and its ternary-tree algorithm can be incorporated into any app with a single file, classRD_TernaryTree.cs. The linked-list elements and the tree leaves have separate classes, each handling the Write and Read methods needed to access its data in the file, using a long integer Addr to position the FileStream. These are the same addresses that are recorded as pointers for all the tree's leaves and lists. The tree's insertion and search methods both receive the word to be inserted or searched as a string parameter. The Search() method returns not just one list of rhyming words but a list of incrementally more similar lists: the first list includes words in which only the final syllable rhymes with the search word, the next list includes words whose final two syllables are similar, and so on.

The word is first 'dissected' into its component phonetics, and the resulting list of sounds is used as the sequence of search keys for the tree. During the build phase, each word 'falls' down through the tree and is added, by front-end insertion, to the linked list of every leaf it passes. Each linked list therefore includes all the words in the tree that have identical phonetic signatures down to that level of the search (whatever level the leaf you're looking at is on).

E.g.: The words 'proposal' and 'disposal' will share the leaves at two successive levels, 'al'(1) and 'pos'(2), but will diverge at the next level into two separate leaves, 'pro'(3) and 'dis'(3).
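The 'proposal'/'disposal' example can be reproduced with a small in-memory sketch. This is an illustration under stated assumptions, not the file-backed implementation: the SoundTree class is hypothetical, and the numeric sound IDs used in the usage note (1 for 'al', 2 for 'pos', 3 for 'pro', 4 for 'dis') are made up.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory sketch; the article's tree lives in a file instead.
public class SoundTree
{
    private class Node
    {
        public int Key;
        public Node Lo, Hi, Down;   // smaller key, larger key, next level of the signature
        public LinkedList<string> Words = new LinkedList<string>();
        public Node(int key) { Key = key; }
    }

    private Node root;

    // keys: sound IDs already in reverse order (last sound of the word first).
    public void Insert(IList<int> keys, string word)
    {
        if (keys.Count == 0) return;
        root = Insert(root, keys, 0, word);
    }

    private static Node Insert(Node node, IList<int> keys, int level, string word)
    {
        int key = keys[level];
        if (node == null) node = new Node(key);
        if (key < node.Key) node.Lo = Insert(node.Lo, keys, level, word);       // numeric, not alphabetical
        else if (key > node.Key) node.Hi = Insert(node.Hi, keys, level, word);
        else
        {
            node.Words.AddFirst(word);                                          // front-end insertion
            if (level + 1 < keys.Count)
                node.Down = Insert(node.Down, keys, level + 1, word);
        }
        return node;
    }

    // Returns one list per matched level: result[0] shares one ending sound, result[1] two, ...
    public List<List<string>> Search(IList<int> keys)
    {
        var result = new List<List<string>>();
        Node node = root;
        int level = 0;
        while (node != null && level < keys.Count)
        {
            int key = keys[level];
            if (key < node.Key) node = node.Lo;
            else if (key > node.Key) node = node.Hi;
            else
            {
                result.Add(new List<string>(node.Words));
                node = node.Down;
                level++;
            }
        }
        return result;
    }
}
```

Inserting 'proposal' with keys {1, 2, 3} and 'disposal' with keys {1, 2, 4}, then searching for {1, 2, 3}, yields three lists: both words at levels one and two, and only 'proposal' at level three.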

The different sounds used to phonetically fingerprint each word are stored in classSounds, which is shown below in its entirety.

public class classSounds
{
    public List<string> lstText = new List<string>();  // similar sounding word snips
    static int intIDCounter = 0;                       // static ID counter
    int intID = intIDCounter++;                        // unique ID used as search 'key'
    public int ID { get { return intID; } }
    public classSounds() { }
    public classSounds(string strSound)
    {
        lstText.Add(strSound);
    }

    public classSounds(string[] strSounds)
    {
        lstText.AddRange(strSounds);   // a string[] is already an IEnumerable<string>
    }
}

The actual 'key' used in the search is the classSounds instance's ID. This value is a unique integer assigned to the instance when it is first created, through the use of a static counter variable. Each instance of this class holds one or more letter combinations in its lstText variable. The entire collection of instances is sorted into three categories: Prefix, Cluster & Suffix.

Each word is converted into a phonetic signature by the DissectWord() method. The prefix and suffix groups are processed first, in the sequence in which they were created, and are taken off the head and tail of the word. Sounds of the 'cluster' type can match any part of the word and are not limited to its head or tail as prefixes and suffixes are.

public static List<int> DissectWord(string strWord)

In order to create the word's phonetic signature, each series of characters found in the word that is defined in a classSounds instance's lstText is replaced by a square-bracketed tag.

E.g., an instance of classSounds having:

ID = 34
string variables 'ea' & 'ee' in its lstText

will be used to replace every occurrence of the letters 'ea' and 'ee' in the word being dissected with the corresponding 'key-string' [34], at the position where they were found. When the dissection is complete, the string strWord that the method first received has been converted into its equivalent series of key-strings (square-bracketed ID numbers in the order in which they appeared in the word) and no longer contains any letters, only square brackets and numerals. This string is then split at the square brackets into an array of strings holding the classSounds ID numbers, which are converted to integer values. They are returned to the calling method in the reverse of the order in which they were found (the ternary tree's Search and Insert methods both call DissectWord()) and are used to traverse the ternary tree.
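As a small worked illustration of the tagging, splitting and reversing steps, here is a simplified sketch. It is not DissectWord() itself: it has no prefix or suffix pass, the TagDemo class is hypothetical, and only ID 34 with 'ea'/'ee' comes from the example above. The IDs for 's', 'l' and 'p' are invented.

```csharp
using System;
using System.Collections.Generic;

// Simplified, hypothetical stand-in for the tagging steps; not the article's DissectWord().
public static class TagDemo
{
    // Each entry: a sound ID and the letter groups that map to it.
    // Only ID 34 with "ea"/"ee" comes from the text; the other IDs are made up.
    static readonly (int Id, string[] Texts)[] Sounds =
    {
        (34, new[] { "ea", "ee" }),
        (50, new[] { "s" }),
        (51, new[] { "l" }),
        (52, new[] { "p" }),
    };

    public static List<int> ToReversedKeys(string word)
    {
        // Step 1: replace every known letter group with its bracketed ID, e.g. "ee" -> "[34]".
        foreach (var sound in Sounds)
            foreach (string text in sound.Texts)
                word = word.Replace(text, "[" + sound.Id + "]");

        // Step 2: split at the brackets so only the ID strings remain.
        string[] parts = word.Split(new[] { '[', ']' }, StringSplitOptions.RemoveEmptyEntries);

        // Step 3: walk the IDs from the end of the word to the front.
        var keys = new List<int>();
        for (int i = parts.Length - 1; i >= 0; i--)
            if (int.TryParse(parts[i], out int id))   // leftover letters are skipped in this sketch
                keys.Add(id);
        return keys;
    }
}
```

Here 'sleep' becomes "[50][51][34][52]" and yields the keys 52, 34, 51, 50, while 'leap' becomes "[51][34][52]" and yields 52, 34, 51. The two words share their first three keys, so a tree search would report them as matching on three ending sounds.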

Here's the method below:

public static List<int> DissectWord(string strWord)
{
    if (lstSounds.Count == 0)  // if the list of sounds is still empty -> build it
        SoundsInit();

    string strDebugCopy = strWord;          // keep a copy for debugging purposes
    bool bolDebug = false;
    if (bolDebug)
        strWord = strDebugCopy;

    List<int> lstRetVal = new List<int>();
    strWord = Deaccent(strWord).ToLower();  //replaces accented letters 
                                            //with unaccented version

    // replace non-alpha char with enum.NULL
    classSounds cNULL = lstSounds[0];
    string strNULL = EnumReplacement(ref cNULL);

    for (int intLetterCounter = strWord.Length - 1; intLetterCounter >= 0; intLetterCounter--)
    {
        char chrTest = strWord[intLetterCounter];
        if (!char.IsLetter(chrTest))
        {
            string strLeft = strWord.Substring(0, intLetterCounter);
            string strRight = intLetterCounter < strWord.Length - 1
                                                ? strWord.Substring(intLetterCounter + 1)
                                                : "";
            strWord = strLeft + strNULL + strRight;
        }
    }

    //              prefixes - sound-tag prefix at the front of the word
    for (int intPrefixCounter = Prefixes_Start; 
         intPrefixCounter <= Prefixes_End; intPrefixCounter++)
    {
        classSounds cPrefix = lstSounds[intPrefixCounter];
        string strEnumReplacement = EnumReplacement(ref cPrefix);

        for (int intTextCounter = 0; 
             intTextCounter < cPrefix.lstText.Count; intTextCounter++)
        {
            string strPrefix = cPrefix.lstText[intTextCounter];

            if (strWord.Length > strPrefix.Length)
            {
                if (string.Compare(strWord.Substring(0, strPrefix.Length), strPrefix) == 0)
                {
                    // prefix matches word
                    strWord = strEnumReplacement + strWord.Substring(strPrefix.Length);
                    goto exitPrefix; // exit because there can only be one prefix
                }
            }
        }
    }
exitPrefix:

    // suffixes - sound-tag suffix at the end of the word
    for (int intSuffixCounter = Suffixes_Start; 
             intSuffixCounter <= Suffixes_End; intSuffixCounter++)
    {
        classSounds cSuffix = lstSounds[intSuffixCounter];
        string strEnumReplacement = EnumReplacement(ref cSuffix);

        for (int intTextCounter = 0; intTextCounter < cSuffix.lstText.Count; intTextCounter++)
        {
            string strSuffix = cSuffix.lstText[intTextCounter];

            if (strWord.Length > strSuffix.Length)
            {
                string strWordEnd = strWord.Substring(strWord.Length - strSuffix.Length);
                if (string.Compare(strWordEnd, strSuffix) == 0)
                {
                    // Suffix matches word
                    strWord = strWord.Substring
                              (0, strWord.Length - strSuffix.Length) + strEnumReplacement;
                    goto exitSuffixes; // exit because there can only be one suffix
                }
            }
        }
    }
exitSuffixes:

    //              consonantal
    for (int intClusterCounter = Clusters_Start; 
         intClusterCounter <= Clusters_End; intClusterCounter++)
    {  // for every classSound in Cluster list
        classSounds cSound = lstSounds[intClusterCounter];
        string strEnumReplacement = EnumReplacement(ref cSound);

        for (int intTextCounter = 0; intTextCounter < cSound.lstText.Count; intTextCounter++)
        { // for every letter-combination defined in this cSound's list 
            string strCluster = cSound.lstText[intTextCounter];
            if (strWord.Length >= strCluster.Length)
            {  
                if (strWord.Contains(strCluster))
                { 
                    // Cluster matches word -> replace it with its unique sound-tag
                    DissectWord_ReplaceEnum(ref strWord, ref cSound);
                }
            }
        }
    }

//  split the string sequence of soundtags into a string array
    char[] chrSplit = { ']', '[' };
    string[] strEnumList = strWord.Split(chrSplit, StringSplitOptions.RemoveEmptyEntries);
    for (int intCounter = strEnumList.Length - 1; intCounter >= 0; intCounter--)
    {    // proceed through the list of strings in reverse order
         // convert each string representation of the sound-tag's unique ID back to integer
        string strEnum = strEnumList[intCounter];
        // TryParse skips any fragment that is not a valid integer instead of throwing
        if (int.TryParse(strEnum, out int intEnum))
            lstRetVal.Add(intEnum);
    }

    return lstRetVal;
}

Creating the Phonetic Lists

The phonetic lists themselves will likely require fine tuning. As I use this writing tool, I will correct whatever issues I discover and gradually improve the already formidable performance. You can easily do this yourself with your own copy of this open-source program by editing the existing examples in the SoundsInit() method.

static void SoundsInit()
{
    lstSounds.Add(new classSounds("NULL"));

/////////////////////// prefix start value  //////////////////////////////////////////////////// 
    _intPrefixes_Start = lstSounds.Count;   // set prefix START value here
////////////////////////////////////////////////////////////////////////////////////////////////
                                            // as prefixes lead a word their importance 
                                            // in a rhyming dictionary are negligible
                                            // here are examples of classSound instances 
                                            // with only 1 Text combination each
    lstSounds.Add(new classSounds("extra"));
    lstSounds.Add(new classSounds("hyper"));
    lstSounds.Add(new classSounds("inter"));
    lstSounds.Add(new classSounds("trans"));
    lstSounds.Add(new classSounds("ultra"));
    lstSounds.Add(new classSounds("under"));
    lstSounds.Add(new classSounds("super"));
    lstSounds.Add(new classSounds("anti"));
    lstSounds.Add(new classSounds("auto"));
    lstSounds.Add(new classSounds("down"));
    lstSounds.Add(new classSounds("mega"));
    lstSounds.Add(new classSounds("over"));
    lstSounds.Add(new classSounds("post"));
    lstSounds.Add(new classSounds("semi"));
    lstSounds.Add(new classSounds("tele"));
    lstSounds.Add(new classSounds("con"));
    lstSounds.Add(new classSounds("dis"));
    lstSounds.Add(new classSounds("mid"));
    lstSounds.Add(new classSounds("mis"));
    lstSounds.Add(new classSounds("non"));
    lstSounds.Add(new classSounds("out"));
    lstSounds.Add(new classSounds("pre"));
    lstSounds.Add(new classSounds("pro"));
    lstSounds.Add(new classSounds("sub"));
    lstSounds.Add(new classSounds("de"));
    lstSounds.Add(new classSounds("il"));
    lstSounds.Add(new classSounds("im"));
    lstSounds.Add(new classSounds("ir"));
    lstSounds.Add(new classSounds("in"));
    lstSounds.Add(new classSounds("re"));
    lstSounds.Add(new classSounds("un"));
    lstSounds.Add(new classSounds("up"));
///////////////////////////// prefix END value ////////////////////////////////////////////////
    _intPrefixes_End = lstSounds.Count - 1;    // set prefix END value here
///////////////////////////////////////////////////////////////////////////////////////////////

    //  triphthongs     -       diphthongs      -       Consonantal Clusters        -       letters
///////////////////////////// Cluster start value /////////////////////////////////////////////
    _intClusters_Start = lstSounds.Count;     // set cluster START value here
///////////////////////////////////////////////////////////////////////////////////////////////
    classSounds cSound = new classSounds();   // create an instance of a classSound
    cSound.lstText.Add("ayer");               // add multiple equivalent or similar sounds 
                                              // to the same classSound object
    cSound.lstText.Add("ower");               // these different groups of letters will 
                                              // ALL have the same phonetic value
    cSound.lstText.Add("oyer");
    cSound.lstText.Add("our");
    cSound.lstText.Add("ure");
    lstSounds.Add(cSound);                    // insert the new instance of the classSound 
                                              // into the existing list

    cSound = new classSounds();
    cSound.lstText.Add("ord");
    cSound.lstText.Add("ard");
    cSound.lstText.Add("urd");
    lstSounds.Add(cSound);

These sound groupings are processed in the order in which they are inserted into the list, so longer strings should be tested first; otherwise a similarly spelled shorter combination may pre-empt a longer one and produce unexpected results for the user. The Suffix, Prefix and Cluster sounds all live in the same list and are told apart in DissectWord() only by the integer variables that mark where each group starts and ends, so you want to be certain those delimiting values reflect the sequence in which the groups actually appear.
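The ordering concern can be seen in a tiny sketch. The OrderDemo class, its method names and the sound IDs (10 for 'our', 11 for 'ou', 12 for 'r') are all hypothetical, and sorting by length is only one possible way to guarantee that longer strings are tested first.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration of why insertion order matters; not from the Rhymes source.
public static class OrderDemo
{
    // Replaces each letter group with its bracketed ID, strictly in the order given.
    public static string Tag(string word, IEnumerable<KeyValuePair<string, int>> sounds)
    {
        foreach (var sound in sounds)
            word = word.Replace(sound.Key, "[" + sound.Value + "]");
        return word;
    }

    // One possible safeguard: test longer letter groups before shorter ones.
    public static List<KeyValuePair<string, int>> LongestFirst(
        IEnumerable<KeyValuePair<string, int>> sounds)
    {
        return sounds.OrderByDescending(s => s.Key.Length).ToList();
    }
}
```

With the groups listed as 'ou', 'r', 'our', the word 'hour' is tagged "h[11][12]" and the longer 'our' group never gets a chance to match. After LongestFirst() the same word is tagged "h[10]".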

static int _intPrefixes_Start = -1;
static public int Prefixes_Start
{
    get { return _intPrefixes_Start; }
}
static int _intPrefixes_End = -1;
static public int Prefixes_End
{
    get { return _intPrefixes_End; }
}
static int _intClusters_Start = -1;
static public int Clusters_Start
{
    get { return _intClusters_Start; }
}
static int _intClusters_End = -1;
static public int Clusters_End
{
    get { return _intClusters_End; }
}
static int _intSuffixes_Start = -1;
static public int Suffixes_Start
{
    get { return _intSuffixes_Start; }
}
static int _intSuffixes_End = -1;
static public int Suffixes_End
{
    get { return _intSuffixes_End; }
}

You can see these integer values being assigned in the SoundsInit() method using the lstSounds.Count Property and any changes you make to that method should take them into consideration. Including sounds intended for Cluster (not head/tail of word but anywhere in the middle) in either prefix or suffix lists will not give you the results you want.
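One way to keep the delimiting values in step with the list is to capture them at the moment each group is appended. The sketch below is a hypothetical helper (SectionDemo.AddSection is not in the source) that mirrors the lstSounds.Count pattern used in SoundsInit(), with plain strings standing in for classSounds instances.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper showing the Count-based start/end bookkeeping; not from the Rhymes source.
public static class SectionDemo
{
    // Appends one category of sounds and returns the inclusive index range it occupies.
    public static (int Start, int End) AddSection(List<string> sounds, params string[] items)
    {
        int start = sounds.Count;          // index the first new item will receive
        sounds.AddRange(items);
        int end = sounds.Count - 1;        // index of the last item just added
        return (start, end);
    }
}
```

Starting from a list that already holds the "NULL" entry, adding two prefixes gives the range 1 to 2, and a cluster added afterwards gets 3 to 3. Because each range is derived from Count, inserting more sounds into an earlier group shifts the later ranges automatically.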

N.B.: To make changes to your data-tree, you'll need to delete the CK_RhymingDictionary.tree file that the app builds and relies on after you've made any changes to SoundsInit(). Then just launch the app again and it will rebuild the data-tree as it did the first time you launched it.

The words included in this rhyming dictionary are the file names from the Rhymer.com website that I scraped for a different project. As providing all these files would have been cumbersome and unnecessary, I collated their file names into 26 separate text files located in the bin\Debug subdirectory of the source code download. It's a simple thing to add or remove words from those lists and tailor your own personal rhyming dictionary.

If you want to make a rhyming dictionary in a different language, such as Polish, French or German, you'll have to change the SoundsInit() method so that it reflects the language you intend to rhyme. This will likely require a bit of tinkering on your part, but it's really not that painful.

Points of Interest

This is not the first ternary tree I've built. They tend to use far too much memory to be worth keeping in RAM, so I would normally opt for a binary tree for many of my search methods. This project, however, relies on the ternary tree's unique ability to track through a phonetic signature of repeated sounds (rather than the unique tree leaves you'd find in a binary tree), and I don't know of any better alternative, though there very well may be one.

I originally had only one ID for each letter combination, so 'ph' was unique and different from 'ff' or even 'f', which was kind of useless. Since that version only took me a few hours to write, I worked on it a little longer and made the changes needed to build the current version.

History

30th January, 2022 - First published