There are two questions to answer first:
- Which is more important, memory or execution time?
- How big is the character set you're processing?
If you've got plenty of memory available compared to the size of the size of the character set then a full frequency table (as in solution one) is probably the way to go. Please use C++ though and not some C relic[1]. Use:
std::vector<int> frequency_table( 256 );
rather than an array. You could even be flash and use
std::for_each
to do the counting in not many lines from a stream or another collection.
If you're limited by memory OR you can't fit the full frequency table in the application's working set then you can use a sparse frequency table. Make a map of ints to ints:
std::map<int,int> frequency_table;
and use it
exactly as you would the table in the other solution. It'll be slower but if it's the difference between code working and not working I'll take slower any day.
And if you bury it all in a class you can template the lot on the collection you're using and change it if it's too slow/too big. What's not to like?
Oh, one final thing. If your code is meant to handle UTF-8 or UTF-16 remember that one C++ char/wchar_t may not be a full character in those representations.
[1] I use the word relic to refer to the practice of using C constructs in C++ programs. I'm not calling C a relic, it's not - it's still one of the most important languages on the planet.
Edit for editor fail - I should be more careful about pasting stuff