Click here to Skip to main content
15,867,308 members
Articles / Programming Languages / C++
Article

Automated Cryptanalysis of Cryptograms

Rate me:
Please Sign up or sign in to vote.
0.00/5 (No votes)
11 Jul 20013 min read 57K   1.5K   19   4
An article, code, and a sample project showing how to use computers to help break cryptograms.

I assume you know what cryptograms are. A cryptogram breaking program can be thought of as two programs working together.

  • One that generates a sequence of possible solutions
  • One that evaluates each solution

Generator:

do
	generate a random initial permutation
	do
		for each pair of letters in the permutation
			swap the two letters
			evaluate the permutation
			if worse, then swap them back
		end for
	while there was an improvement
	if this was the best solution then save it
until you are bored

Evaluator:

for each subsequence of 4 letters (each 4-gram) in the solution
	add the cost associated with the subsequence
end for

Where the cost associated with each 4-letter subsequence is proportional to the inverse of its frequency in the book text. If the frequency is zero then assign some minimal baseline frequency. This evaluation method is a limiting point of the chi-square cost after assuming that each 4-gram has appeared infinite times. This assumption is blatantly wrong, but it makes the evaluation linear and therefore fast.

Evaluation example:

if the book text is:
"to be or not to be"

and the cryptogram solution for evaluation is:
"...qwe rwzxcrb..." --> "...for nothing..."

then the cost would be:
...
+ cost (for ) = 1/epsilon
+ cost (or n) = 1/1
+ cost (r no) = 1/1
+...

because "for " never occurs in the book text and "or n" and "r no" each appear once.
For real, you would use a *whole book* for the book text.

Hypothetical question/answer:

Q: Why do you use 4-grams and not 3 or 5?
A: Because a precomputed table of 4-gram cost fits in 4 megs of RAM, which is a reasonable amount to expect people to have. Lower counts give worse results and higher counts require using a hash table which would make the code more complicated and would add the complication of deciding what the hash table size should be. I've found that you get improvements up to 6-grams with cryptograms with spaces and up to 5-grams with cryptograms without spaces. After that I believe the book text becomes the limiting factor and other methods of solving would be more appropriate.

Q: Can the program be modified to solve cryptograms without spaces between the words?
A: Yes, in fact the code is simpler than solving ones with spaces. Just tweak the function that filters the text to not bother with spacing. Similarly, you can solve cryptograms where the space is like a 27th letter simply by looping through 27 elements instead of 26 when shuffling and letter swapping.

Q: Have you applied these methods to the cryptanalysis of other classical ciphers?
A: No. If you want to then please go ahead - that's what this article and code is for.

Q: Your code looks like something my dog threw up! Can I suggest some modifications?
A: Yes please. This is actually the first time I've written code that someone else might look at and I would appreciate any suggestions. No optimisations though unless it also makes the code smaller and cleaner and gives me a warm and fuzzy feeling.

Q: I ran your sample program but it just gives an error message and then quits. What gives?
A: The program is useless without a sample text to read. If you want it to break shakespearean quotes then get some from the web and rename the file to "book.txt" and put it in the same directory as the exe. Or if you want it to do french then rename "cyrano.txt" to "book.txt" and put it in the same folder.

Q: How can I save the output of the sample program?
A: First make sure the program is stopped (as in not trying to break the cryptogram anymore) and then cut and paste the text from the window.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here


Written By
United States United States
I'm a recent college graduate and have resigned myself to windows programming for now. I enjoy volleyball and juggling on the beach under the midnight sun. I live in Homer Alaska, Halibut fishing capital of the world!

Comments and Discussions

 
GeneralI loved it Pin
Peter Hawke18-Dec-11 17:17
Peter Hawke18-Dec-11 17:17 
GeneralNeed encrypted text Pin
Ahmed Toulan2-Jan-06 1:36
Ahmed Toulan2-Jan-06 1:36 
GeneralRe: Need encrypted text Pin
eyad.alim11-Jan-07 14:14
eyad.alim11-Jan-07 14:14 
please confirm your question with sample

thanx,
eyad.
GeneralActually pretty cool Pin
19-Jul-01 21:59
suss19-Jul-01 21:59 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.