Java Enigma

This is a Java implementation of an Enigma machine, along with code that attempts to break the encryption. This code is associated with an upcoming Computerphile video.

The Enigma Machine

An enigma machine is a mechanical encryption device that saw a lot of use before and during WW2. This code simulates a 3 rotor enigma, including the 8 rotors commonly seen during the war.

Creating a Java Enigma

The code itself is fairly straightforward. You can create a new enigma machine using a constructor, for example:

enigmaMachine = new Enigma(new String[] {"VII", "V", "IV"}, "B", new int[] {10,5,12}, new int[] {1,2,3}, "AD FT WH JO PN");

Rotors and the reflector are given by their common names used in the war, with rotors labelled as "I" through to "VIII", and reflectors "B" and "C". I've not implemented every variant, such as the thin reflectors seen in naval 4-rotor enigma. You could easily add these if you liked. Starting positions and ring settings are given as integers 0-25 rather than the A-Z often seen, this is just to avoid unnecessary mappings. The majority of the code here treats letters as 0-25 to aid indexing. Plugs are given as a string of character pairs representing steckered partners. If you don't wish to use a plugboard, "" or null is fine.

Encrypting and Decrypting

Given an enigma instance, encryption or decryption is performed on character arrays of capital letters [A-Z]. Simply to save time I've not done a lot of defensive coding to remove invalid characters, so be careful to only use uppercase, or to strip unwanted characters out of strings. Here is an encryption example using the enigma machine above:

char[] plaintext = "ABCDEFGHIJKLMNOPQRSTUVWXYZ".toCharArray();
char[] ciphertext = enigmaMachine.encrypt(plaintext);
String s = new String(ciphertext); // UJFZBOKXBAQSGCLDNUTSNTASEF

You can quickly check everything is working by running the tests found in the EnigmaTest.java file.

How it works

Throughout the enigma machine, letters A-Z are represented as integers 0-25. Most of the components, the rotors, reflector and plugboard are treated as arrays that map values 0-25 to a different set of values 0-25. Encrypting or decrypting is simply a case of passing a value through these arrays in turn. What makes enigma trickier is that the arrays rotate, and that they can have different starting or ring positions. For efficiency in this implementation I keep the arrays fixed, and simulate rotation by shifting the index in and out of each rotor. Before each character is encrypted the rotors rotate, sometimes causing the neighbouring rotors to also rotate, this is handled by the rotate() function. Enigma has a quirk whereby the middle rotors moves twice when it turns the left-most rotor. This is called double stepping, and is also implemented here.

Breaking a Code

Breaking an enigma message here comes down to decrypting a ciphertext with all possible rotor configurations and seeing which output looks the best. We measure best here using a fitness function.

Fitness functions

The code makes a number of fitness functions available that can be used to measure how close a test decryption is to English text. Each works similarly, some work better than others. You can test to see which work best for a given message. The fitness functions are:

Index of coincidence. The probability of any random two letters being identical. Tends to be higher for proper sentences than for random encrypted text
Bi / Tri / Quad grams. The probability of a sentence measured based on the probability of constituent sequences of characters. Bigrams are pairs, such as AA or ST. Trigrams are triplets, e.g. THE, and so on.
Plaintext Fitness. This function is a known plaintext attack, comparing the decryption against all or portions of a suspected real plaintext. This is by far the most effective solution, even a few words of known plaintext will substantially increase your odds of a break even with a number of plugboard swaps.

Ciphertext Analysis

All the code required to crack enigma messages is found the analysis package, and an example (used in the Computerphile video) is in the main() method. The basic approach is as follows:

Decrypt the ciphertext with every possible rotor in each position, and rotated to each starting position. All rotor ring settings are set to 0. No plugboard. For each decryption, measure the text fitness using one of the available fitness functions. Save the best performing rotor configuration.
Fix the rotors, and iterate through all possible ring settings for the middle and right rotors, again testing the fitness of the decryption. You do not have to use the same fitness function as before.
Fix all settings, and then use a hill climbing approach to find the best performing plugboard swaps, again measured using a fitness function.

Notes

The code is fairly efficient, Enigma boils down to a lot of array indexing into different rotors. This said, I didn't worry too much about speed, it's plenty fast enough. I used classes and functions rather than doing things inline, for example. Modern compilers will optimise a lot of it anyway.
Similarly, in the brute force key search code, for simplicity I create new enigma machines as required rather than implementing a number of specific reinitialisation functions that would be faster.
I've not written any kind of command line parsing here. You're welcome to add this, but i felt for a tutorial on enigma and breaking it, a step by step procedure in main was fine.

Resources

For more details on enigma, the wikipedia articles are a great resource.
This attack is a variant of that originally proposed by James Gillogly. His work on this is still available via the web archive here.
If you'd like a more visual example of both encryption and cracking enigma codes, the Cryptool project is a great tool for this. Cryptool 2 has a great visualiser and can run cracking code similar to my own. I used cryptool to write the tests I used to make sure my own enigma implementation was working.