On the other hand, it would be quite wrong to assume that the odds were 40:1 on that the new fertiliser was better than the usual type. This would be equivalent to neglecting the numerator in the special form of Bayes' theorem, namely the probability of obtaining a result as good as the one obtained if the fertiliser is known to be better than before. This numerator may be hard to estimate, but it is at any rate less than one. Another equally important criticism is that we are throwing away a lot of evidence if we say only that the result of the experiment is a deviation of at least 2σ above the mean. The result is likely to be known more exactly, say that the deviation is between 2.0σ and 2.1σ, and in this case the factor in favour of the hypothesis would be less (with a normal distribution). These points are stressed because there is a prominent school of statisticians who do not even accept Bayes' theorem.
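Both criticisms can be checked numerically in a minimal sketch. It assumes the score is normally distributed with unit σ, and that a "better" fertiliser would shift the mean upward by a hypothetical μ standard deviations; μ is not in the original and merely stands in for the unknown numerator. Only factors are compared; prior odds are left out. The naive figure 1/P(score ≥ 2σ) is about 44, which the text rounds to 40:1.

```python
from statistics import NormalDist

Z = NormalDist()  # score in units of sigma if the fertiliser is NOT better


def upper_tail(a: float) -> float:
    """P(score >= a) under the 'not better' hypothesis."""
    return 1.0 - Z.cdf(a)


def factor_tail(mu: float, threshold: float = 2.0) -> float:
    """Factor for 'better' when all we report is score >= threshold.
    Hypothetical alternative: the score is shifted up by mu sigma."""
    return upper_tail(threshold - mu) / upper_tail(threshold)


def factor_interval(mu: float, lo: float = 2.0, hi: float = 2.1) -> float:
    """Factor when we report the more exact result lo <= score < hi."""
    numerator = Z.cdf(hi - mu) - Z.cdf(lo - mu)
    denominator = Z.cdf(hi) - Z.cdf(lo)
    return numerator / denominator


# Treating 1/P(score >= 2 sigma) as the odds silently sets the numerator to 1.
naive = 1.0 / upper_tail(2.0)
print(f"naive odds: {naive:.1f} to 1")
for mu in (1.0, 2.0, 3.0):
    print(f"mu={mu}: tail factor {factor_tail(mu):.1f}, "
          f"interval factor {factor_interval(mu):.1f}")
```

For every shift tried, the tail factor is below the naive 44 (because the numerator is less than one), and the factor from the more exact 2.0σ to 2.1σ result is smaller still, in line with the remark about the normal distribution.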
An example of this from our work is given by the score on a 1+2 break-in. Suppose the best score is 4σ and there are no serious rivals. A score of 4σ or better occurs at random about once in 30,000 experiments, so it would be natural to imagine that the odds on the setting given are 30,000 divided by 1271, or about 23.6 on. In fact they are more like 3:1 on (that is, even after a factor has been set against all the other settings on account of there being no serious rival), though the odds depend to a reasonable extent on the particular link, the length of tape and d. In the very early days of the section there was a tendency to continue with a message for some time if it gave a 4σ, since it was not believed that the odds could be much below 20:1 on. This was before the deciban had been brought over from Hut 8. (Later on the deciban also influenced the work of the Testery, owing to the liaison between the two sections.)
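The naive arithmetic can be reproduced in a short sketch. It assumes a one-sided normal tail for "4σ or better", and takes 1271 as the number of candidate settings, i.e. 41 × 31, the product of the χ1 and χ2 wheel lengths. The helper names are illustrative only. Converting both sets of odds to decibans (10 log10 of the odds) shows how far apart the naive and realistic figures are.

```python
import math


def tail_probability(sigmas: float) -> float:
    """One-sided probability that a standard normal score is >= sigmas."""
    # erfc keeps precision far out in the tail, where 1 - cdf would not.
    return 0.5 * math.erfc(sigmas / math.sqrt(2))


def naive_odds(one_in: float, settings: int) -> float:
    """Odds obtained by dividing 'one in one_in' by the number of settings."""
    return one_in / settings


def decibans(odds: float) -> float:
    """Odds (or a factor) expressed in decibans: 10 * log10(odds)."""
    return 10 * math.log10(odds)


p4 = tail_probability(4.0)
print(f"P(score >= 4 sigma) = {p4:.3g}, about 1 in {1 / p4:,.0f}")

naive = naive_odds(30_000, 41 * 31)  # 41 * 31 = 1271 settings
print(f"naive odds: {naive:.1f} on, {decibans(naive):.1f} db")
print(f"realistic odds of 3:1 on: {decibans(3):.1f} db")
```

The exact tail gives roughly 1 in 31,600, consistent with the "once in 30,000" of the text. The naive 23.6:1 is about 13.7 db, against about 4.8 db for 3:1, a gap of nearly 9 db.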
(p) The principle of maximum likelihood.
If one has a continuous sequence of possible theories depending on a parameter x, it often happens that one has very little knowledge about the prior probabilities of the theories. If an experiment is done whose result has probability f(x) when the parameter has the value x, then the numbers f(x) are the relative factors in favour of the various theories concerning the magnitude of x. f(x) often has a maximum value at, say, x = x0. Then x0 is called the maximum likelihood solution for x. Provided that the prior distribution is uniform, for a given small value of ε it is, to a close approximation, more probable that x will lie in the interval (x0 - ε, x0 + ε) than in any other interval of the same length. In this special case the maximum likelihood solution is equal to the 'most probable value'. Neither of these should be confused with the expected value.
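The distinction between the maximum likelihood solution and the expected value can be seen in a small sketch. The experiment is hypothetical and not from the original: x is an unknown rate between 0 and 1, and 3 successes are observed in 4 trials, so f(x) = x³(1 - x). A simple grid search stands in for exact maximisation, and the prior is taken as uniform, so the posterior is proportional to f(x).

```python
def likelihood(x: float, successes: int, trials: int) -> float:
    """f(x): probability of the observed result if the unknown rate is x."""
    return x**successes * (1 - x) ** (trials - successes)


successes, trials = 3, 4  # hypothetical experiment
xs = [i / 1000 for i in range(1001)]  # grid over 0 <= x <= 1
fs = [likelihood(x, successes, trials) for x in xs]

# Maximum likelihood solution x0: the grid point where f(x) is largest.
# With a uniform prior this is also the most probable value.
x0 = max(zip(fs, xs))[1]

# Expected value: the mean of the posterior, which is proportional to f(x)
# under a uniform prior (the grid spacing cancels in the ratio).
mean = sum(x * f for x, f in zip(xs, fs)) / sum(fs)

print(f"maximum likelihood x0 = {x0}")
print(f"expected value       = {mean:.4f}")
```

Here x0 = 0.75, while the expected value is close to (3 + 1)/(4 + 2) = 2/3, so the two differ noticeably even for this simple case.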