Click here to Skip to main content
15,888,301 members
Articles / Programming Languages / C++
Article

an efficient C++/STL library for word puzzles and spell checking

Rate me:
Please Sign up or sign in to vote.
1.80/5 (6 votes)
13 Sep 2005CPOL1 min read 39.3K   611   21   2
DicoLib stores words in lists of anagrams indexed by their length and a 26 bits bitset which describe which letters are present in the words. This makes it extremely fast to search for words which contain specified letters, and to search for words which are "close" for spell checking appli

Introduction

This project started from a puzzle : There is a common English word that is nine letters long. Each time you remove a letter from it, it still remains an English word - from nine letters right down to a single letter. What is the original word, and what are the words that it becomes after removing one letter at a time? I wrote DicoLib to find the word, but since I stupidely translated "nine" by "8" in the code, I didn't find THE solution, but some others listed here DicoLib uses an original data structure that optimises the search of words which are "close" to a given one or contain the same letters. It is therefore suitable for spell checking, although this application hasn't (yet) been developed.

Data Structure

DicoLib's indexes words:

  • by their length
  • and by a bitset of 26 bits which describe which letters are present in the word

Actually, DicoLib stores a list of anagrams for each length/bitset pair. Finding anagrams of a given word therefore takes 0 time!

Finding words that are made of specified letters for Scrabble, or of a given length and contianing given letters is extremely fast as it requires mainly bitset operations and table lookup.

The whole data structure used in DicoLib is depicted in the schema below:

Image 1

Source code

DicoLib is available on SourceForge You're welcome to contribute to the development of this project towards a very efficient spell-checker or any other application you may think of.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Technical Lead
Switzerland Switzerland
Born 1963, Programs since 1979 (Commodore PET).
PhD, consultant and software architect of technical software, especially add-ins for mechanical engineereing CAD.
I like science in general, geometry, algorithms and Python

Comments and Discussions

 
GeneralWould have been valuable if a little more care were shown. Pin
WREY16-Sep-05 8:49
WREY16-Sep-05 8:49 
GeneralBroken link Pin
TheGlynn14-Sep-05 0:15
TheGlynn14-Sep-05 0:15 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.