6. It is fairly obvious that the repeat rate for plain language is higher than that for random cypher material, and in particular that the rate for repeats of several letters in succession is much higher. For example, if two stretches of English are written out one underneath the other, there is a much better chance of getting THERE written underneath THERE than there is of getting, say, SJLWV coming underneath SJLWV in two random stretches of cypher material. Some approximate figures for German Naval traffic will illustrate this.
Probability of getting repeats of various lengths in Naval Traffic and at random.
|Length of repeat||At random||Naval traffic|
|4 letter||1/26^4||100 x 1/26^4|
|6 letter||1/26^6||15000 x 1/26^6|
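The figures in the table can be checked with a short sketch. It assumes the random chance of an n-letter repeat is 1 in 26 to the power n, and it takes the multipliers 100 and 15000 directly from the table. The function names and the `NAVAL_FACTOR` mapping are illustrative only and do not come from the original text.

```python
from fractions import Fraction

ALPHABET_SIZE = 26

# Multipliers taken from the table: how many times more likely a repeat of
# the given length is in Naval traffic than at random (approximate figures).
NAVAL_FACTOR = {4: 100, 6: 15000}


def random_repeat_probability(length: int) -> Fraction:
    """Chance that `length` chosen positions all agree in two random streams."""
    return Fraction(1, ALPHABET_SIZE ** length)


def naval_repeat_probability(length: int) -> Fraction:
    """Approximate chance of the same repeat in Naval traffic, per the table."""
    return NAVAL_FACTOR[length] * random_repeat_probability(length)


def count_repeats(upper: str, lower: str) -> int:
    """Count positions where two texts written one under the other agree."""
    return sum(a == b for a, b in zip(upper, lower))


if __name__ == "__main__":
    for n in (4, 6):
        print(n, float(random_repeat_probability(n)), float(naval_repeat_probability(n)))
```

On these figures a 4 letter repeat turns up about once in 457,000 trials at random, against about once in 4,570 in Naval traffic.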
7. This provides us with a criterion for testing whether or not a given "distance" is correct, viz. the number of repeats or "score" obtained by writing out the messages concerned at that distance. Going back to our example in Para. 4, if we write the encyphered messages out at a distance of 5, we get this result.
The "score" is 7 repeats, including 2 two letter repeats (bigrams) in a stretch of 32 letters. The whole result is written
TYQ = TYB + 5 7xx/32 ("x" denoting a bigram).
which in words would be roughly this: "We have tested the assumption that the message whose encyphered set-up (i.e. its trigram) is TYQ starts 5 places ahead of the message with encyphered set-up TYB. The result of this test is a score of 7 with 2 bigrams out of 32". More briefly still we might write Q = B + 5, 7xx/32 since, the TY being the same in both messages, the score is concerned simply with the encypherments of the third letters of the indicators. To determine the merits of this score it is necessary to know whether it is more likely that such a result would