I have an application that is used for setting land values in New South Wales, Australia. The various people that use the program are required to enter notes and justifications for their values, but for various reasons, spelling mistakes creep into their notes. It's probably due to gremlins fiddling with the database overnight, because none of them actually make mistakes themselves. In order to help them find the errors, they requested that I include a spell checker with the program.
My application is written using WTL. I'm actually becoming a bit jaded with this ... there is quite a bit of source code now, and it's getting to the point where it's taking a very long time to build. sloccount says that there are about 110,000 lines of C++ code, and at least half of it has to be rebuilt and re-linked when anything of significance changes. Painful. Just out of curiosity, I checked the preprocessor output of the main source code file: there are around 50,000 lines of non-blank, non-comment code after the inclusion of stdafx.h, and something over 400,000 including the stdafx.h code. I'm not trying to brag about the size of the project here (it's just not that big); I'm saying that if you're considering WTL, and the project is going to be big, think long and hard about it. Build times get out of hand.
Anyway, I went in search of a spell-checker. I've been using VSSPELL version 6 for simple behind-the-scenes spell checking for a different client, but that's getting a bit long in the tooth. Also, although I bought the thing quite a few years back, it appears to be popping up a nag screen when I want to use the GUI component. I really wanted to use the GUI component so that it could hook into my edit windows and do the red underlining for me. So something else was required.
I came to The Code Project first, and found Matt Gullett's Spell Checking Engine. This was written for MFC, so wasn't immediately useful to me. I also wanted an Australian dictionary. The hunt continued. I did regular net searches and found aspell, Hunspell, and several other commercial offerings. Couldn't find Matt Gullett's www.spellican.com. Eventually, I settled on Hunspell. It's Open Source, people seemed happy with it, there is a nice MSVC project available for the library, and the Australian English dictionaries are available. And, well, gosh ... if it's good enough for OpenOffice, it's good enough for me.
So I downloaded it, built it, and then had to try and figure out how to use it. The API is not well documented ... or maybe I just couldn't find the documentation. I tracked down NHunspell, and that turned out to be useful in figuring out how to use the API. So all was good.
At that point, I needed to incorporate the checker into the edit window. Out came Matt Gullett's code, and I unashamedly stole his edit control code, and ported it into the WTL environment.
Using the Code
There are three main items that I want to address: using the wrapper that I built for the Hunspell code, using the CSpellCheckEdit class, and the anatomy of the CSpellCheckEdit class.
Using the Singleton SpellCheck Wrapper Class
I've been using STL for my strings and collections throughout my application, so I've continued with the same convention here. It's a good fit with the WTL stuff. Having said that, I've provided both const char* and const std::string& versions of methods where it's reasonable. Feel free to add your own const CString& methods as well.
Please note that the version of Hunspell that I downloaded doesn't come with Unicode methods. If you're writing code in a Unicode environment, you should have a look at NHunspell, as it contains all of the wide-to-multibyte string conversion code that you will need.
The SpellCheck wrapper class is a singleton class. Why? Because starting up the Hunspell checker is expensive, and I only want to do it once. To get things under way, you just need to get a reference to the singleton and then tell it where your dictionaries are.
SpellCheck& sc = SpellCheckS::instance();
sc.loadDicts("en_AU.aff", "en_AU.dic", "custom.dic");
When you invoke the loadDicts() method, the SpellCheck object starts a thread to create the actual Hunspell object and load the dictionaries. It also reads the words (one word per line) from the "custom.dic" file and adds them to the dictionary.
To check a word, invoke the singleton's spell() method. This method will return true (indicating that the spelling is valid) if the dictionaries are not yet available for checking or if the dictionaries have the word as valid; or false if the word is determined to be incorrect by the spelling engine.
SpellCheck& sc = SpellCheckS::instance();
I indicated above that the dictionaries might not be available for checking words at the time you want to check. The reason is that the dictionaries are loaded by a separate thread. I found that the dictionaries didn't load quite as quickly as I would like, particularly in debug mode. My users are used to the login screen coming up immediately, and I didn't want to have to delay the appearance of this window, or introduce a splash screen that loaded stuff in the background.
So, I have a _ready flag in the object. This is set to true when the dictionaries have been loaded. Until this has been done, every word checked will be shown to be correct. I can't have a bunch of red ink all over the screen just because the spell check is slow loading. When it has loaded successfully, everything will seamlessly switch over to checking as per expectation.
All of the code that interacts with the actual or custom dictionaries is protected by a critical section.
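As a rough illustration of the "everything is correct until the dictionaries are loaded" behaviour, here is a minimal, portable sketch. It is not the wrapper's actual code: GatedSpeller, load(), and the std::set standing in for the Hunspell object are all hypothetical, and std::mutex plays the part of the critical section.

```cpp
#include <atomic>
#include <mutex>
#include <set>
#include <string>

// Hypothetical stand-in for the wrapper: a std::set plays the role of the
// Hunspell object so that the sketch has no external dependencies.
class GatedSpeller {
public:
    GatedSpeller() : ready_(false) {}

    // Intended to run on a worker thread. Fills the dictionary, then
    // publishes the ready flag only after the load has completed.
    void load(const std::set<std::string>& words) {
        {
            std::lock_guard<std::mutex> lock(guard_);
            dict_ = words;
        }
        ready_.store(true);
    }

    // Reports every word as correct until load() has finished, so that
    // nothing is flagged while the dictionaries are still loading.
    bool spell(const std::string& word) const {
        if (!ready_.load()) return true;
        std::lock_guard<std::mutex> lock(guard_);
        return dict_.count(word) != 0;
    }

private:
    std::atomic<bool> ready_;
    mutable std::mutex guard_;   // plays the role of the critical section
    std::set<std::string> dict_;
};
```

In a real program, load() would be handed to a worker thread (std::thread or _beginthreadex, for example), and the set operations would be replaced by the corresponding Hunspell calls.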
Obviously, users expect a bit more than just a red flag to let them know that they've misspelled a word. Hunspell does contain a suggest() method, so you can use this to get a list of suggestions for the user.
SpellCheck& sc = SpellCheckS::instance();
for (STRINGLIST::iterator it = options.begin(); it != options.end(); it++)
ATLTRACE("Suggestion: %s\n", it->c_str());
The nicest thing to do with such a list is to put it in a context menu so that the user can right-click a flagged word and simply replace it with the correct one.
When I first started working with the Hunspell library, I wondered how to make it so that users could add words to the dictionary. Being property valuers, my users have their own collection of jargon and abbreviations that aren't necessarily represented in the common dictionary. I'd not really thought about it before, but the dictionaries provided with the library are essentially read-only. That's all well and good, but what about custom dictionaries? Hunspell has an add() method that allows you to add a word to the dictionary, but that is only for the duration of the Hunspell object. It doesn't propagate to the dictionary itself. I didn't really know what to do at that point.
Then, I experienced a D'oh! moment, and slapped my forehead. OK, when the user adds a word to the dictionary, I'll also write the word to their "custom.dic" file. When I load the dictionaries at launch time, I'll read that file and just add the words before making the spell checker available to the rest of the application. Right. Done.
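The custom dictionary round trip described above could be sketched roughly as follows. The helper names loadCustomWords and addCustomWord are hypothetical and are not part of the wrapper; a std::set stands in for the words added to the live Hunspell object during the session, where the real code would also call Hunspell's add().

```cpp
#include <fstream>
#include <set>
#include <string>

// Hypothetical helper: read one word per line from the custom dictionary
// file. A missing file simply yields an empty set.
std::set<std::string> loadCustomWords(const std::string& path) {
    std::set<std::string> words;
    std::ifstream in(path.c_str());
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line[line.size() - 1] == '\r')
            line.erase(line.size() - 1);   // tolerate CRLF line endings
        if (!line.empty()) words.insert(line);
    }
    return words;
}

// Hypothetical helper: remember a word for this session and append it to
// the file. Returns false if the word was empty or already known, or if
// the file could not be written.
bool addCustomWord(const std::string& path, std::set<std::string>& session,
                   const std::string& word) {
    if (word.empty() || !session.insert(word).second) return false;
    std::ofstream out(path.c_str(), std::ios::app);
    if (!out) return false;
    out << word << '\n';
    return static_cast<bool>(out);
}
```

Appending rather than rewriting the file keeps the "Add Word" action cheap, and the words come back on the next launch simply by reading the file again.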
SpellCheck& sc = SpellCheckS::instance();
When the program has finished, you should close the singleton
This ensures that the Hunspell object is deleted, so you don't get a million lines of memory leaks when you're debugging. Ahem.
Using the CSpellCheckEdit Class
OK, so we have the spell checker. Assume for the moment that we already have the CSpellCheckEdit class available. How should this be used in a given WTL dialog box? We need to do a couple of things.
- Include the SpellCheckEdit.h file.
- Create a CSpellCheckEdit instance.
- Subclass an edit control on the dialog.
- Reflect notifications.
What does this look like in code?
#include "SpellCheckEdit.h" // (1) above
class CMainDlg : public CDialogImpl<CMainDlg>
LRESULT OnInitDialog(UINT , WPARAM ,
LPARAM , BOOL& )
Anatomy of the CSpellCheckEdit Class
The file starts with an enum that contains the IDs of the commands that will be returned from the call to TrackPopupMenu discussed below. Basically, the deal is that suggestions are added to the context menu, and these values are associated with them.
The CSpellCheckEdit class itself is derived from CWindowImpl<CSpellCheckEdit, CEdit>, so you will create instances of this class rather than derive from it yourself.
The class has an internal struct (SpError) that represents errors that are found within the text. These have the rectangle that the spelling mistake lies within, the misspelled word, and the character position of the start of the word within the edit control. I've also included a typedef that lets me refer to a std::list of these as SPERRLIST:
typedef std::list<SpError> SPERRLIST;
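To make the role of the error list concrete, here is a small portable sketch of such a record and of a hit test against it, which is the kind of lookup needed to decide whether a mouse click landed on a misspelled word. The field names, the Box type (a stand-in for the Win32 RECT), and findErrorAt are assumptions for illustration only; the real SpError may be laid out differently.

```cpp
#include <list>
#include <string>

// Minimal stand-in for the Win32 RECT so that the sketch compiles anywhere.
struct Box { int left, top, right, bottom; };

// Hypothetical layout based on the description above.
struct SpErrorSketch {
    Box bounds;          // rectangle enclosing the misspelled word
    std::string word;    // the misspelled word itself
    int startChar;       // character offset of the word within the control
};

typedef std::list<SpErrorSketch> ErrorList;

// Return the error whose rectangle contains (x, y), or a null pointer if
// none does. Right and bottom edges are exclusive, which matches the
// convention used by the Win32 PtInRect function.
const SpErrorSketch* findErrorAt(const ErrorList& errors, int x, int y) {
    for (ErrorList::const_iterator it = errors.begin(); it != errors.end(); ++it) {
        const Box& b = it->bounds;
        if (x >= b.left && x < b.right && y >= b.top && y < b.bottom)
            return &*it;
    }
    return 0;
}
```

Keeping the start offset alongside the rectangle means that, once a suggestion is chosen, the replacement can be applied by selecting the word's character range directly.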
Finding and Drawing Errors
The methods involved in finding and drawing errors are:
RedrawErrors (two signatures: one called by event handlers, one called internally)
RedrawErrors (the one called by event handlers) clears the list of errors that had previously been found, and then loops through each visible line of text in the control. It invokes the internal RedrawErrors method for each line.
The original code from Matt Gullett's project uses the CEdit::LineLength call incorrectly. The original code assumed that you passed a line number to CEdit::LineLength to get the length of the line. This is not the case. You pass the character offset of a character in the line to get the length of the line containing that character. The upshot is that this:
190: int iLine = GetFirstVisibleLine();
191: int iChar = LineIndex(iLine);
192: int iLineLen = LineLength(iLine);
was changed to this:
91: int iLine = GetFirstVisibleLine();
92: int iChar = LineIndex(iLine);
93: int iLineLen = LineLength(iChar);
There is another instance of this same problem being corrected from FPSSpellingEditCtrl.cpp (216, 217) to SpellCheckEdit.h (116, 117).
While the original code worked, it looks like it checked each line many times. Possibly as many times as there were characters in the line. I didn't verify that ... I just saw that things were being checked way more often than they should have been.
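The difference between a line number and a character offset is easy to see in a toy model. The EditModel class below is purely hypothetical and is not the Win32 or WTL API; it only mimics the documented behaviour of EM_LINEINDEX and EM_LINELENGTH, under the assumption that every line ends in a hard CR LF break.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy model of a multi-line edit control, used only to illustrate the
// line-number versus character-offset distinction.
class EditModel {
public:
    explicit EditModel(const std::vector<std::string>& lines) : lines_(lines) {}

    // Like EM_LINEINDEX: character offset of the first character of a line.
    // Each hard line break is counted as two characters (CR LF).
    int lineIndex(int line) const {
        int offset = 0;
        for (int i = 0; i < line; ++i)
            offset += static_cast<int>(lines_[i].size()) + 2;
        return offset;
    }

    // Like EM_LINELENGTH: length of the line that CONTAINS the given
    // character offset. Offsets past the end fall into the last line.
    int lineLength(int charOffset) const {
        int start = 0;
        for (std::size_t i = 0; i < lines_.size(); ++i) {
            int end = start + static_cast<int>(lines_[i].size()) + 2;
            if (charOffset < end || i + 1 == lines_.size())
                return static_cast<int>(lines_[i].size());
            start = end;
        }
        return 0;
    }

private:
    std::vector<std::string> lines_;
};
```

With the lines "a", "hello world", and "xyz", passing the line number 1 to lineLength reports the length of the first line, because offset 1 still falls inside it; passing lineIndex(1) reports the length of "hello world" as intended.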
The internal RedrawErrors method gets each word from the given line (using IsWordBreak to determine where words break), trims it, and passes it to DrawError.
DrawError is the method that actually talks to the SpellCheck object. Given the word that needs to be checked, this method invokes SpellCheck::wordIsOK. If the word is OK, DrawError simply returns before going any further.
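The per-line word extraction that feeds this check could look something like the sketch below. It is a portable approximation rather than the control's code: WordSpan, isWordBreakSketch, and splitWords are hypothetical names, and the break rule (anything that is not a letter, digit, or apostrophe) is an assumption that may differ from the real IsWordBreak.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

struct WordSpan {
    std::string text;   // the word itself
    int start;          // offset of its first character within the line
};

// Hypothetical break test: anything other than a letter, digit, or
// apostrophe ends a word.
bool isWordBreakSketch(char c) {
    unsigned char u = static_cast<unsigned char>(c);
    return !(std::isalnum(u) || c == '\'');
}

// Split one line into words, remembering where each word starts so that
// its on-screen position can be worked out later.
std::vector<WordSpan> splitWords(const std::string& line) {
    std::vector<WordSpan> words;
    std::size_t i = 0;
    while (i < line.size()) {
        while (i < line.size() && isWordBreakSketch(line[i])) ++i;    // skip separators
        std::size_t begin = i;
        while (i < line.size() && !isWordBreakSketch(line[i])) ++i;   // consume the word
        if (i > begin) {
            WordSpan w;
            w.text = line.substr(begin, i - begin);
            w.start = static_cast<int>(begin);
            words.push_back(w);
        }
    }
    return words;
}
```

Each resulting word would then be handed to the spelling check, and only the ones that fail need their rectangle calculated.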
If the word is not in the dictionary, DrawError calculates the location and size of the word, and if the bottom of the calculated rectangle is within the bounds of the edit window, calls the method that draws the underline.
Finally, it creates an SpError object with the error's information and adds it to the control's list of errors.
This method (I renamed it from DrawSquiggly) simply draws the red dotted line under the misspelled word.
The "squiggly" was originally a jagged line, drawn by a series of oscillating LineTo calls. I thought I could do a bit better than that, and, using GDI+, drew an anti-aliased multipoint Bezier curve under the word. That looked pretty cool. Then, I saw the spell checker in Firefox, and thought that that looked better. So now, my code draws the single dotted line under the misspelled word.
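For anyone curious about the original jagged style, the points for such a line can be generated with a few lines of portable code. This is only a sketch of the idea, not the original drawing code: Pt and zigzagPoints are hypothetical names, and the amplitude and step are whatever looks right at your font size.

```cpp
#include <vector>

struct Pt { int x, y; };

// Points for a zigzag running from x = left to x = right. Each step moves
// `step` pixels to the right and alternates between the baseline and
// `amplitude` pixels below it. Returns no points if `step` is not positive.
std::vector<Pt> zigzagPoints(int left, int right, int baseline,
                             int amplitude, int step) {
    std::vector<Pt> pts;
    if (step <= 0) return pts;
    bool down = false;
    for (int x = left; x <= right; x += step) {
        Pt p = { x, down ? baseline + amplitude : baseline };
        pts.push_back(p);
        down = !down;
    }
    return pts;
}
```

With GDI, the first point would go to MoveToEx and each remaining point to LineTo, using a red pen.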
For the sake of interest, I've left the GDI+ code intact, and you can enable it if you want to. To enable this code, #ifdef the code in DrawSquiggly, and the block starting at wtlspell.cpp (20). You will also have to link with gdiplus.lib.
While not an event, this method fires off the timer.
This event handler kills the timer, and allows default processing to continue.
This is my least favourite method. It is invoked when the timer fires. The first thing it does is kill the timer. Later on, it recreates it.
I added this method to the original class because it caused the text of the window to be checked automatically when the text was set. For instance, by a
When the program receives an EN_CHANGE message, it invalidates the current checked state and causes the visible text to be checked again.
This handler causes the program to redraw the errors if any are known to exist.
OnScroll (WM_HSCROLL, WM_VSCROLL)
Because the program only checks and redraws text in the visible lines of the edit box, scrolling means that the visible area changes, so the check needs to be redone. The rectangles associated with the errors will also be changed, and this is why we need to handle the horizontal scroll messages.
OnLButton (WM_LBUTTONDOWN, WM_LBUTTONUP)
I actually don't know why I handle these messages. There must have been a reason.
Not every key press results in a change to the text (an EN_CHANGE message), so this handler covers those cases where the errors should be redrawn despite the fact that the text is unchanged. Perhaps a selection is being extended using Shift+Arrow. The error squiggles need to be redrawn in this case.
This is the most interesting event handler (I think). This handler builds and shows the context menu for the spell checker. Here's what it does in overview:
- Get the point in the control where the right-click took place (client coordinates).
- If the click was not inside a misspelled word, allow the framework to handle the event in the default manner.
- Create a popup menu. Note that you don't create a menu, you create a popup menu. It took me quite a while to figure that out again. Sigh.
- Get a list of suggestions from the SpellCheck object.
- Add the first (up to) 10 suggestions to the menu.
- Add a separator, the "Add Word" item, and another separator.
- Add the normal Edit context menu items (Undo, Cut, Copy, Paste, Delete, Select All).
- Invoke the TrackPopupMenu method.
- Handle the user's selection.
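The menu-building steps above can be modelled without any Win32 calls. The sketch below only assembles the list of entries and maps a returned command ID back to a suggestion; it does not create or show a real menu. The command IDs, MenuEntry, buildSpellMenu, and suggestionFor are all hypothetical (the real enum in SpellCheckEdit.h may use other names and values), and it assumes the menu is tracked with TPM_RETURNCMD so that the chosen ID, or 0 for "nothing chosen", comes back as the return value.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical command IDs. 0 is reserved because TrackPopupMenu with
// TPM_RETURNCMD returns 0 when the user dismisses the menu.
enum MenuCommand {
    CMD_NONE = 0,
    CMD_SUGGESTION_FIRST = 1,
    CMD_SUGGESTION_LAST = 10,
    CMD_ADD_WORD = 11,
    CMD_UNDO, CMD_CUT, CMD_COPY, CMD_PASTE, CMD_DELETE, CMD_SELECT_ALL
};

struct MenuEntry {
    int id;              // CMD_NONE marks a separator
    std::string label;
};

// Assemble the entries in the order described: up to ten suggestions,
// separator, "Add Word", separator, then the usual edit commands.
std::vector<MenuEntry> buildSpellMenu(const std::vector<std::string>& suggestions) {
    std::vector<MenuEntry> menu;
    const std::size_t maxShown = CMD_SUGGESTION_LAST - CMD_SUGGESTION_FIRST + 1;
    for (std::size_t i = 0; i < suggestions.size() && i < maxShown; ++i) {
        MenuEntry e = { CMD_SUGGESTION_FIRST + static_cast<int>(i), suggestions[i] };
        menu.push_back(e);
    }
    static const char* const editLabels[] =
        { "Undo", "Cut", "Copy", "Paste", "Delete", "Select All" };
    const MenuEntry separator = { CMD_NONE, "" };
    const MenuEntry addWord = { CMD_ADD_WORD, "Add Word" };
    menu.push_back(separator);
    menu.push_back(addWord);
    menu.push_back(separator);
    for (int i = 0; i < 6; ++i) {
        MenuEntry e = { CMD_UNDO + i, editLabels[i] };
        menu.push_back(e);
    }
    return menu;
}

// Map a returned command ID back to the chosen suggestion. Returns an
// empty string if the ID does not refer to a suggestion.
std::string suggestionFor(int id, const std::vector<std::string>& suggestions) {
    if (id < CMD_SUGGESTION_FIRST || id > CMD_SUGGESTION_LAST) return std::string();
    std::size_t index = static_cast<std::size_t>(id - CMD_SUGGESTION_FIRST);
    return index < suggestions.size() ? suggestions[index] : std::string();
}
```

Reserving a contiguous ID range for suggestions keeps the selection handling to a simple subtraction, with the remaining IDs dispatched to "Add Word" and the standard edit operations.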
Points of Interest
After I'd done all of this code, and settled down to write an article, I discovered Curtis J's Spell Checking Edit Control (Using HunSpell) article. Argh! I could have used his, and just ported his edit control to WTL! Ah well, there are another couple of points of difference between his code and mine ... I would strongly urge you to check his for "Ignore" functionality, dictionaries for different languages, a more comprehensive "user dictionary", and a lot more VERIFYs than I have.
History
- 2009-06-22: v1.0 - Initial release.