Description of Solution Method

Next: Stage 5 Up: Stage 4 Previous: How we Found the

Description of Solution Method

There are two major obstacles when deciphering a Vigenère cipher. The first is to find the length of the keyword superposed on the plaintext, and the second is to find the shift between adjacent letters in the keyword.

If the keyword length is five, every fifth letter in the ciphertext have been shifted the same distance. This means that for the offsets 0, 1, 2, 3, 4, the text obtained by reading every fifth word will be an ordinary Caesar shift of the plaintext. Given a string x of n characters, the index of coincidence of x, denoted I_c(x) is defined as the probability that two random elements of x are identical. If the frequencies of the 26 letters in x are denoted f₀,..., f₂₅, this probability can be written

I_c(x) = $\displaystyle {\frac{\displaystyle\sum_{i=0}^{25}f_i(f_i-1)}{n(n-1)}}$ .

(1)

For a completely random string this value will be approximately 1/26 $\approx$ 0.038. However, if we let p_i be the probability that a random letter in English text is i, then for an English text x,

I_c(x) $\displaystyle \approx$ $\displaystyle \sum_{i=0}^{25}$ p_i² $\displaystyle \approx$ 0.065.

(2)

If the different letters in a text are shifted with different shifts this text will look like random text. However, if they are shifted with the same shift the index of coincidence should be that of the plaintext. If we compute the mean of the index of coincidence for the text obtained by reading every fifth letter for the five offsets we thus expect that the index of coincidence will be closer to 0.065 than to 0.038 if five is the correct keyword length. This is also what we observe above.

The next problem is to determine the difference in shifts between the adjacent letters in the text. This is done in a similar way. Let x and y be two strings with n_x and n_yletters, respectively. The mutual index of coincidence of xand y, denoted MI_c(x,y), is defined as the probability that a random element of x is identical to a random element of y. If f^x₀,..., f^x₂₅ and f^y₀,..., f^y₂₅ is the frequencies of the 26 letters in x and y, respectively, this probability can be written

MI_c(x,y) = $\displaystyle {\frac{\displaystyle\sum_{i=0}^{25}f^x_i\cdot f^y_i}{n_x\cdot n_y}}$ .

(3)

Assume once more that the keyword length is five. Let y₁,...,y₅ be the five substrings obtained by reading every fifth letter for the five different offsets. The substrings y_i are all obtained by a shift encryption of the plaintext. Assume that K = (k₁,..., k₅) is the keyword. It is quite easy to see that in this case

MI_c(y_i,y_j) $\displaystyle \approx$ $\displaystyle \sum_{h=0}^{25}$ p_{h - k_i}, p_{h - k_j} = $\displaystyle \sum_{h=0}^{25}$ p_h, p_{h + k_i - k_j}

(4)

where the subscripts are reduced modulo 26. If we test all possible relative shifts of two strings of English text we will see that when the relative shift is 0, the mutual coincidence will be approximately 0.065; and otherwise it lies between 0.030 and 0.045. This means that if we test all possible values for k_i - k_j for the two substrings y_i and y_j, the mutual coincidence should be close to 0.065 for the correct relative shift and between 0.030and 0.045 in the other cases. This will give us an linear system of equations for the five shifts k₁,..., k₅ which can be solved and which will yield the relative shifts for the five substrings.

Now we only have 26 possible keywords and plaintexts, and a visual inspection easily reveals the solution.

Next: Stage 5 Up: Stage 4 Previous: How we Found the

solvers@codebook.org