
Classic Cryptographic Systems

Data, Information and Wisdom

Simply having access to data doesn't mean that we have information. Information is not wisdom. Compression and encryption manipulate information.

The main goal of compression is to extract the information from the data and encode it as efficiently as possible, with an algorithm that is publicly available. Encryption, on the other hand, is the process of diffusing a key into the information and encoding the result with a public algorithm.

By diffusion, we are referring to the idea that manipulating a bit in the key will result in a large and unpredictable change to the cipher text. If altering a part of the key only resulted in a small change to the text, then the encrypted data would be much easier to reverse engineer.

Diffusing the key into the data so that it generates huge changes is known as the avalanche effect, and most modern cryptosystems, including DES (the Data Encryption Standard), try to diffuse the key into the data. Attacks on DES exploit this property.

The Best Approach

When data is transmitted securely, we primarily do three things to it:

  1. Compress the data
  2. Encrypt
  3. Add error detection & recovery

This is done in the above order: compression removes redundancies in the initial data and also lets us feed less data into the encryption pipeline. The error detection and recovery is added as a last step, in the clear, to allow us to see and fix errors along the path, and ensure that the message is received error free.



Egyptian Hieroglyphs

This was one of the first documented uses of a cipher. It was a simple substitution of more common symbols for less well-known symbols. The scribe did not always substitute symbols, opting to alter only some of them.

Caesar Cipher

As covered in the introduction, this is one of the simplest monoalphabetic substitution ciphers: we replace each letter with the letter a fixed number of positions later in the alphabet. Caesar himself used a shift of three (meaning that, for example, "A" would become "D"). This is simple to write down:


We could also use a keyword such as ROME at the start of the substitution alphabet, which would introduce some more variety to the alphabet. It is important to note that these substitution alphabets cannot contain repeated characters, as otherwise the message cannot be unambiguously reversed.
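As a sketch of the above (the function names here are my own, for illustration only), a shift cipher and a keyword-based substitution alphabet might look like:

```python
import string

def caesar(text: str, shift: int) -> str:
    """Shift each letter `shift` positions forward, wrapping around at Z."""
    out = []
    for ch in text.upper():
        if ch in string.ascii_uppercase:
            out.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        else:
            out.append(ch)  # leave spaces and punctuation untouched
    return "".join(out)

def keyed_alphabet(keyword: str) -> str:
    """Keyword first (repeated letters dropped), then the unused letters."""
    seen: list[str] = []
    for ch in keyword.upper() + string.ascii_uppercase:
        if ch not in seen:
            seen.append(ch)
    return "".join(seen)

def substitute(text: str, cipher_alphabet: str) -> str:
    """Monoalphabetic substitution using the given 26-letter alphabet."""
    return text.upper().translate(
        str.maketrans(string.ascii_uppercase, cipher_alphabet))
```

With Caesar's shift of three, `caesar("ATTACK", 3)` gives `DWWDFN`, and `keyed_alphabet("ROME")` begins `ROMEABCDF`, dropping each letter already used by the keyword.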


Frequency Analysis

The Babington Plot

This was the event that led to the execution of Mary, Queen of Scots. Mary's letters were encrypted using a simple substitution cipher that notably included some null characters. The courier of the letter was a double agent, and the letter was sent to a languages expert, who analysed the frequency of the letters that appeared.

Once the cipher was broken, a forged postscript was added to the letter, asking for the names of any co-conspirators. Without frequency analysis, this letter may well never have been deciphered, and the course of history may well have changed.

Following this basic frequency analysis, the concept of a crib was invented, which is a known sequence of letters or words. To make frequency analysis more difficult, we can break the text into words not matching the original word lengths, or encode short bursts of text.

Manual Analysis

Take the following as an encrypted ciphertext:

We can look at the most common letters that appear in this encoded text, then substitute the most common letters in the normal alphabet (in this instance u becomes e):
Then substitute t for g:
tZe appears a lot, let's assign Z to h:
And maybe thPt as that (P to a):
that MDH ReeT VDC the aRFJeC KF aWAhaQet
Let's assign D to o, as e, t, h and a have already been assigned:
that MoH ReeT VoC the aRFJeC KF aWAhaQet
Then see that JeWW is maybe a two letter sequence ll, mm, nn, tt, etc. Let's go with W as l:
Jell ToRe MoH haIe FolIeT thKF FHQFtKtHtKoR SKAhUC aRT the JoCT that
MoH ReeT VoC the aRFJeC KF alAhaQet
alAhaQet looks really similar to alphabet, so let's assign A to p, and Q to b.
Jell Tone MoH haIe FolIeT thKF FHbFtKtHtKon SKphUC anT the JoCT that
MoH neeT VoC the anFJeC KF alphabet
Notice anT: could this be and? T becomes d:
Jell done MoH haIe FolIed thKF FHbFtKtHtKon SKphUC and the JoCd that
MoH need VoC the anFJeC KF alphabet
The rest is quite intuitive. Look at a remaining capital, see if it is likely to be a consonant or vowel, see what makes a full word.

The final sentence, for those who can't get to the end is:

well done you have solved this substitution cipher and the word that
you need for the answer is alphabet

Some useful things that can help with a substitution cipher like this are:

  • A, E, I, O are normally high frequency, followed by U, then Y (which some consider an almost vowel) is much lower.
  • Letters contacting low frequency letters are usually vowels.
  • Letters showing a wide variety of contact are usually vowels.
  • In repeated digrams, one letter is usually a vowel.
  • In reversed digrams, e.g., VX, XV, one letter is normally a vowel.
  • Double consonants and double vowels are usually bordered by the other type.
  • It is unusual to find more than five consonants in a row.
  • Vowels do not often contact one another.
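The starting point for an attack like the one above is a letter-frequency count. A minimal helper (the function name is illustrative, not from any standard tool):

```python
from collections import Counter

def letter_frequencies(text: str):
    """Return (letter, count) pairs, most common first, ignoring non-letters."""
    return Counter(ch for ch in text.upper() if ch.isalpha()).most_common()
```

The most frequent ciphertext letter is the natural first guess for plaintext e, the next for t, and so on, exactly as in the walkthrough above.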

Polyalphabetic Ciphers

These are ciphers which involve substitution between multiple alphabets. We have a number of substitution alphabets \(C_1\) to \(C_n\) and the plaintext \(P\), and we rotate through the substitution alphabets as we encrypt. A Vigenère cipher makes use of 26 alphabets \(C_1, \dots, C_{26}\), each a different offset of the standard A-Z. We then define a key, and encrypt each plaintext letter with the alphabet selected by the corresponding key letter, repeating the key once we reach its last letter.
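A minimal sketch of the scheme just described (the function name is my own; non-letters are dropped to keep the example simple):

```python
def vigenere(text: str, key: str, decrypt: bool = False) -> str:
    """Encrypt/decrypt by rotating through the key's shift alphabets."""
    shifts = [ord(k) - ord("A") for k in key.upper()]
    out = []
    i = 0
    for ch in text.upper():
        if not ch.isalpha():
            continue  # letters only, for simplicity
        s = shifts[i % len(shifts)]
        if decrypt:
            s = -s
        out.append(chr((ord(ch) - ord("A") + s) % 26 + ord("A")))
        i += 1
    return "".join(out)
```

With the key cat, the plaintext the encrypts to VHX, as in the Kasiski example below.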

To perform cryptanalysis on a message, we find the length of the key, divide the message into simple substitutions with the same length as the key, then use frequency analysis to solve the resulting substitutions. To find the key length, we can use the Kasiski test, or the index of coincidence.

Kasiski Test

This exploits the fact that repeated plaintext words such as the, when separated by a multiple of the key length, are encoded in the same way. For a key cat, this would work well with the following text:

Idx:  000000000011111111112222
      012345678901234567890123
Key:  catcatcatcatcatcatcatcat
Pln:  thecatsatthematatmiddays
Ct:   VHXEAMUAMVHXOAMCTFKDWCYL

In this instance, two of the the's are encoded as VHX. To perform the test, we search for identical segments of length at least 3. We then record the distances between the segments \(\Delta_1, \Delta_2, \dots\). The key length (\(m\)) should divide \(\textrm{gcd}(\Delta_1, \Delta_2, \dots)\).

When looking at the distance between them, this is measured from the start of the first token of the first occurrence to the first token of the second occurrence. In the above example, this is from index 0 to 9, meaning a distance of 9.

This test is ultimately used to deduce the length of the keyword. Once we know the length \(m\) of the keyword, we can see that every \(m\)-th letter is encrypted with the same alphabet, and thus we can run a simple frequency analysis on each of those groups.
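The test can be sketched as follows. The ciphertext here is an illustrative Vigenère encryption under the key cat, and the helper names are my own; note that the key length divides the returned gcd rather than necessarily equalling it:

```python
from functools import reduce
from math import gcd

def repeat_distances(ciphertext: str, seg_len: int = 3):
    """Distances between identical segments of length >= seg_len."""
    first_seen = {}
    distances = []
    for i in range(len(ciphertext) - seg_len + 1):
        seg = ciphertext[i:i + seg_len]
        if seg in first_seen:
            distances.append(i - first_seen[seg])
        else:
            first_seen[seg] = i
    return distances

# "thecatsatthematatmiddays" under key cat; both "the"s encrypt to VHX.
deltas = repeat_distances("VHXEAMUAMVHXOAMCTFKDWCYL")
candidate = reduce(gcd, deltas)   # the key length m divides this
```

Here the only repeated trigram is VHX at distance 9, and the true key length 3 divides 9.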

Index of Coincidence

This is the probability that two random letters from a text will be identical. Truly random text of the form A-Z will have a \(\frac{1}{26}\) chance of pulling out each letter. The chance of two of the same letters simultaneously is thus \(\frac{1}{26} \times \frac{1}{26} = \frac{1}{676} = 0.00148\).

The chance, therefore, of picking any matching pair of letters is \(26 \times \frac{1}{676} = \frac{1}{26} = 0.0385\).

For any given text, we can compute the \(IC\), or index of coincidence with the following formula:

\[ \begin{align} IC &=\frac{ \sum_{i=a}^z f_i(f_i - 1) }{ N(N-1) } \\ &\approx \frac{ \sum_{i=a}^z f_i^2 }{ N^2 }\\ &= \sum_{i=a}^z P_i^2\\ \textrm{where}\\ N & \textrm{ is the total number of letters in the text}\\ f_i & \textrm{ is the frequency of a specific letter $i$ in the text}\\ P_i & \textrm{ is the probability of a letter $i$ in the text}\\ a, z & \textrm{ start and end of the alphabet, respectively} \end{align} \]

Monoalphabetic substitutions have a frequency distribution similar to that of the plaintext language. If our cryptogram frequencies are similar to normal English, then we can be more confident that the text is English under a simple substitution. English has an \(IC\) of \(\approx0.066\), whereas random text is about \(0.0385\), as shown earlier in the text.
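The \(IC\) formula above can be computed directly (a minimal sketch; the function name is my own):

```python
from collections import Counter

def index_of_coincidence(text: str) -> float:
    """IC = sum_i f_i(f_i - 1) / (N(N - 1)), summed over the letters A-Z."""
    counts = Counter(ch for ch in text.upper() if ch.isalpha())
    n = sum(counts.values())
    if n < 2:
        return 0.0
    return sum(f * (f - 1) for f in counts.values()) / (n * (n - 1))
```

Values near 0.066 suggest English under a monoalphabetic substitution; values near 0.0385 suggest polyalphabetic or random text.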

We can use this metric to find the key length of a Vigenère cipher, assuming a text \(Y = y_1y_2y_3\dots y_n\). We guess the key length \(m\) and split the text into a matrix of \(m\) rows. We then calculate the index of coincidence for each row \(Y_1\) to \(Y_m\).

The matrix is of the form:

\[ \begin{bmatrix} y_1 & y_{m+1} & \cdots & y_{n-m+1}\\ y_2 & y_{m+2} & \cdots & y_{n-m+2}\\ \vdots & \vdots & \ddots & \vdots \\ y_m & y_{2m} & \cdots & y_n \\ \end{bmatrix} \]

If \(IC_i \approx 0.065\) for all \(1 \le i \le m\), then we can be reasonably confident that we have the correct key length. Once this is done, we can use normal frequency analysis on each row.

Enigma and Rotor Ciphers

The Enigma machine was an example of a machine that produced a complex polyalphabetic cipher. The machine had an array of rotating disks with electrical contacts on either side. Rotors advance in position after each letter, changing the substitution, to produce an even more complicated system.

The Enigma machine was widely used by the Germans in WW2, with 3 cylinders chosen from a set of 6, giving a possible number of rotor settings of \(K = 26^3 = 17576\). With 5 rotors, this rises to \(K = 26^5 \approx 12 \times 10^6\) possible combinations.

As there are 3 cylinders picked from 6, there are \(6 \times 5 \times 4 = 120\) possible combinations of cylinders to pick to put into the machine.

Not only did the machine have a set of rotors, but also a plugboard: a board of sockets into which cables could be plugged, swapping pairs of letters.

For the plugboard, we apply the simple way of choosing \(m\) pairs from \(n\) objects formula: $$ \frac{n!}{(n-2m)!\times m!\times2^m} $$

So 6 rotors, 10 plugboard connections and \(26^3\) initial rotor positions give roughly \(3 \times 10^{20}\) possible keys.
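Multiplying the three factors out, as a quick check of the figure above:

```python
from math import factorial

def plugboard_ways(n: int = 26, m: int = 10) -> int:
    """Ways to choose m unordered, disjoint plug pairs from n letters."""
    return factorial(n) // (factorial(n - 2 * m) * factorial(m) * 2 ** m)

rotor_orders = 6 * 5 * 4        # 3 rotors chosen and ordered from a set of 6
rotor_positions = 26 ** 3       # initial positions of the 3 rotors
total_keys = rotor_orders * rotor_positions * plugboard_ways()
# total_keys is roughly 3.2e20
```

Almost all of that keyspace comes from the plugboard: the pair-choosing formula alone gives about \(1.5 \times 10^{14}\) plug arrangements.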

The settings were changed every 24 hours, with a settings sheet produced and distributed. Luckily, the manual was obtained by the Allies, allowing the wheel wirings and the method of operation to be deduced.

The main issue was the interaction between the huge collection of plugboard settings and the smaller collection of the wheel settings. One flaw allowing the machine to be broken was that a letter was never encrypted as itself. The Enigma cipher was broken through the use of a machine called the Bombe, which was designed by Turing and used some messages with known formats and code books to decrypt the message. For example, weather messages were of a known format, and messages always ended with a familiar Nazi saying.

Transposition Ciphers

These differ from substitution ciphers in that symbols are not replaced; instead their locations within the ciphertext change. We can work at the letter/byte/bit level without altering the actual values, or combine transpositions with substitutions to create a stronger cipher.


Scytale

This was one of the earliest transposition ciphers and relied on both parties having a dowel of a particular diameter. The message was written lengthways on a piece of ribbon wrapped around the dowel, which was then removed for transport.

Persons with a different sized dowel would not be able to align and therefore read the message.

Rail Fence

Messages are written diagonally in a zig-zag across different rows (rails), then read off row-by-row. For example:

d . . . n . . . e . . . t . . . l . . . h . . . s . . .
. e . e . d . h . e . s . w . l . o . t . e . a . t . e
. . f . . . t . . . a . . . a . . . f . . . c . . . l .
This is then read off row-by-row: dnetlhseedheswloteateftaafcl. The key for this cipher is simply the number of rails.
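A sketch of the encryption (the function name is my own):

```python
def rail_fence_encrypt(text: str, rails: int) -> str:
    """Zig-zag the text over `rails` rows, then read off row by row."""
    if rails < 2:
        return text
    rows = [[] for _ in range(rails)]
    row, step = 0, 1
    for ch in text:
        rows[row].append(ch)
        if row == 0:
            step = 1            # bounce off the top rail
        elif row == rails - 1:
            step = -1           # bounce off the bottom rail
        row += step
    return "".join("".join(r) for r in rows)
```

`rail_fence_encrypt("defendtheeastwallofthecastle", 3)` reproduces the ciphertext above.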


Columnar Transposition

This is another transposition cipher. We have a keyword which doesn't repeat any letters, e.g., destiny. The plaintext is written in rows as wide as the keyword, with the final row padded with random gibberish.

Key:       d e s t i n y
Index:     1 2 3 4 5 6 7
Plaintext: a t t a c k p
           o s t p o n e
           d u n t i l t
           w o f g x n u

We then read off the columns according to the alphabetical order of the letters in the key, giving the column order 1, 2, 5, 6, 3, 4, 7 and the ciphertext AODW TSUO COIX KNLN TTNF APTG PETU.
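A sketch of this, assuming the plaintext is already padded to a full rectangle (the function name is my own):

```python
def columnar_encrypt(plaintext: str, key: str) -> str:
    """Write the text in rows under the key, then read each full column,
    taking the columns in the alphabetical order of the key letters."""
    cols = len(key)
    rows = [plaintext[i:i + cols] for i in range(0, len(plaintext), cols)]
    order = sorted(range(cols), key=lambda i: key[i])  # assumes no repeats
    return " ".join("".join(row[i] for row in rows) for i in order)
```

`columnar_encrypt("ATTACKPOSTPONEDUNTILTWOFGXNU", "destiny")` reproduces the ciphertext above.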


For cryptanalysis, we take the message length and deduce that the key length should be a divisor of it. For example, for a message of length 45, the possible key lengths would be 1, 3, 5, 9, 15 or 45.

If the guess for the key length is correct, then reading the text row by row should yield anagrams of English words, and we can look for patterns that suggest how to reorder the columns.

Brute Force and Dictionary

For short keywords (e.g., 9 letters), we can try all possible words and look for anagrams, patterns and possible combinations of words. We can also perform dictionary attacks with a wordlist of approximately 1,000,000 words, generating the candidate plaintext for each key, then keeping the plaintext that gives the highest fitness. Fitness would typically be measured using something like the index of coincidence, single letter frequencies, etc.

Hill Climbing

This method is a way to try and find the text using a key of length \(N \approx 10\). We generate a random starting keyword. This is known as the parent key. We apply the key, then measure the fitness of the resulting text. We then generate a child key with random swaps in the parent keyword.

We apply the child key, measure the fitness, and replace the parent if the child key is a better fit. We keep ranking different keys until we get a deciphered text with the fewest rare sequences. If no decryption key is found, then we choose another random keyword of length \(N\) and start again from a new parent.

If still not found after several iterations of this, then go back and increment \(N\) by 1 and repeat. A modern PC can supposedly manage a key length of around 20.
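The loop above can be sketched generically. The `decrypt` function is caller-supplied, and the digram-based fitness here is a deliberately tiny stand-in (a real attack would use quadgram log-probabilities, as discussed below):

```python
import random

COMMON_DIGRAMS = {"TH", "HE", "IN", "ER", "AN", "RE", "ED", "ON", "ES", "ST"}

def fitness(text: str) -> float:
    """Toy fitness: fraction of digrams that are common in English."""
    pairs = [text[i:i + 2] for i in range(len(text) - 1)]
    return sum(p in COMMON_DIGRAMS for p in pairs) / max(len(pairs), 1)

def hill_climb(decrypt, ciphertext: str, key_len: int, iters: int = 2000):
    """Start from a random parent key, swap two positions to form a child,
    and keep the child whenever it scores better."""
    parent = list(range(key_len))
    random.shuffle(parent)
    best = fitness(decrypt(ciphertext, parent))
    for _ in range(iters):
        child = parent[:]
        i, j = random.sample(range(key_len), 2)
        child[i], child[j] = child[j], child[i]
        score = fitness(decrypt(ciphertext, child))
        if score > best:
            parent, best = child, score
    return parent, best
```

Hill climbing can get stuck in local optima, which is why the method restarts with a fresh random parent when no key is found.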

Fitness Metrics

This is a measure to determine how similar a piece of text is to English text. Single letter frequencies are not too helpful for transposition ciphers, as the text itself doesn't change in frequency besides the random padding at the end. We can make use of digram statistics, trigram statistics and quadgram statistics to determine how similar a piece of text is to English.

Wrongly deciphered ciphertexts will contain sequences which are very rare in normal English.

Product Ciphers

Ciphers that perform one operation are less chaotic than ciphers that perform multiple operations. Combining substitutions and transpositions makes newer, harder ciphers. As seen above, the use of just a substitution cipher is relatively simple to cryptanalyse, as is the use of just a transposition cipher.


Steganography

Often confused with stenography, steganography is the process of hiding a message in plain sight. The goal of cryptography is to make the data unreadable, whilst still acknowledging that the data exists, and possibly broadcasting it for the world to see.

Steganography, on the other hand, implies no knowledge of the existence of the data. Historically, methods such as tattooing a message or image on the messenger's head might have been used. As far back as ancient China, secret inks were used, and messages were written on silk, crunched into a ball and covered in wax. The messenger then swallowed this ball of wax.

Microdots have been used to shrink a page of text into a dot less than a millimetre in diameter, then hiding the microdot on a normal letter. Null ciphers, where the message is camouflaged in an innocent sounding message have also been used.

Steganosystems Principles

Steganosystems comprise a cover object, which is the unaltered medium, and an embedding process, where we hide the message in the cover object to produce the stego-object.

Finally, there is a recovering process, where the receiver extracts the hidden message using the key. As a security requirement, the stego-objects should be indistinguishable from the cover objects they were created from.

Textual Steganography

This is the process of hiding messages in formatted texts. We could either hide the message in every \(n\)-th character, or at offsets from the start of each word. We can also distribute plaintext letters randomly in the cover text, then use a mask over the writing to read it.

Furthermore, we can use errors or stylistic features at predetermined points in the cover data; for example, word-shift encoding can be done by altering the whitespace on the page.

Null Cipher

"Apparently neutral's protest is thoroughly discounted and ignored. Isman hard hit. Blockade issue affects pretext for embargo on by-products, ejecting suets and vegetable oils."

Take the second letter of each word, and "Pershing sails from NY June 1" can be extracted from the text.
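The extraction rule can be automated (a sketch; this treats the hyphenated by-products as a single word, as in the original telegram):

```python
def second_letters(message: str) -> str:
    """Read the second character of every word, ignoring . and , marks."""
    words = message.replace(".", " ").replace(",", " ").split()
    return "".join(w[1].lower() for w in words if len(w) > 1)

telegram = ("Apparently neutral's protest is thoroughly discounted and "
            "ignored. Isman hard hit. Blockade issue affects pretext for "
            "embargo on by-products, ejecting suets and vegetable oils.")
hidden = second_letters(telegram)   # "pershingsailsfromnyjunei"
```

The trailing i stands in for the figure 1.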

Visual Steganography

Real images or audio files can lose approximately 3% of their information before the degradation becomes noticeable in the picture or audio. We can use this to hide messages in images. A 24-bit full colour image comprises three 8-bit RGB or CMY values. We can take the LSB of each plane and use it for a secret message. GIF stores an 8-bit pointer to 256 colours, with the distribution and palette optimized for each image.

Taking the LSB from this reduces the palette to 128 colours, which might not be enough. JPEG splits the scene into 8x8 blocks and applies a cosine transform to compress the data. JPEG is noisy, but we can hide data within the cosine frequencies and amplitudes.
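A sketch of LSB embedding over raw 8-bit samples (a list of ints stands in for real pixel data; the function names are my own):

```python
def embed_lsb(samples: list[int], bits: list[int]) -> list[int]:
    """Overwrite the least significant bit of each 8-bit sample with one
    message bit; each channel value changes by at most 1 out of 255."""
    out = samples[:]
    for i, bit in enumerate(bits):
        out[i] = (out[i] & ~1) | bit
    return out

def extract_lsb(samples: list[int], n_bits: int) -> list[int]:
    """Recover the first n_bits message bits."""
    return [s & 1 for s in samples[:n_bits]]
```

Since only the lowest bit of each channel moves, the change stays well inside the roughly 3% degradation budget mentioned above.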

Tools such as Steganography Studio and OpenStego are free tools that can be used for steganography.

Terminology for Modern Cryptosystems

Private Key Cryptography

This makes use of symmetric algorithms, and a single key for reading and writing the data. The key is shared between the sender and the recipient. As a sidenote, a symmetric cipher is often used as a session cipher after the initial asymmetric exchange is set up, as it is much faster.

Public Key Cryptography

These form the asymmetric algorithms. We generate two keys, the private and the public key. We can encrypt with the public key of another user, e.g., when sending them documents, and they can then decrypt using their private key. We may also use our private key to sign a document, and a remote user can use our public key to verify that the document has not been tampered with.

This type of cryptography relies on trapdoor one-way functions; if these are found to be reversible, such as with Shor's Algorithm, then this has huge implications for modern cryptography, and is why we are looking to move towards post-quantum cryptography.

One Way Functions

Essentially, given \(x\), evaluating \(f(x)\) should be trivial, but given \(y\), evaluating \(f^{-1}(y)\) should prove computationally difficult. The discrete logarithm is a good example of this.
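A toy illustration with deliberately tiny numbers (real systems use primes hundreds of digits long): modular exponentiation is cheap, while the only obvious inverse is exhaustive search.

```python
p, g = 101, 2          # small prime modulus and a primitive root
x = 57                 # the secret exponent
y = pow(g, x, p)       # the "easy" direction: fast modular exponentiation

# The "hard" direction: recover x from y by brute force.
acc, recovered = 1, 0
while acc != y:
    acc = acc * g % p
    recovered += 1
# For a realistically sized p, this loop is computationally infeasible.
```

`pow(g, x, p)` takes only about \(\log_2 x\) multiplications, whereas the brute-force loop scales with \(x\) itself.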

Trap Door One Way Functions

This is where we construct a pair of functions \(f, g = f^{-1}\). Given \(x\), evaluation of \(f(x)\) is trivial. Given only \(f\) and \(y\), evaluation of \(f^{-1}(y)\) is computationally difficult. Given \(g\) and \(y\), evaluating \(f^{-1}(y) = g(y)\) should be trivial. A good example of this is RSA.

Other Properties

One specific type of attack is the known plaintext attack, where the attacker knows \(C\), the ciphertext, and \(P\), the plaintext. Even knowing these, the attacker should be able to recover nothing beyond the public key.

We should also make it computationally difficult to generate the private key from the public key. As mentioned earlier, public algorithms are orders of magnitude (\(10^3\)) slower than symmetric ciphers, so we generate (carefully) a random session key, which we encrypt with the third party's public key.

Asymmetric systems are also vulnerable to exhaustive enumeration of plaintexts. We generate pairs of plaintexts and ciphertexts until the ciphertexts match. When they match, then the plaintext has been found. We therefore salt the plaintext to produce a different ciphertext.