BULLETIN OF THE UNIVERSITY OF TEXAS
1915: No. 4. January 15, 1915

THE ARITHMETIC MEAN AS APPROXIMATELY THE MOST PROBABLE VALUE A POSTERIORI UNDER THE GAUSSIAN PROBABILITY LAW

BY EDWARD L. DODD

OBJECT OF PAPER.

The object of the present paper is to harmonize as much as possible the Principle of the Arithmetic Mean and the Gaussian Probability Law, viewed from the standpoint of probability a posteriori. By the principle of the arithmetic mean is understood the statement that, if measurements are made of a magnitude under like circumstances, the most probable value of the magnitude is the arithmetic mean or "average" of the measurements. The Gaussian law will be explained later.

That a lack of harmony exists has been known for a long time.* But experience has seemed to substantiate both principle and law to such a high degree that an analysis of the discrepancy between the two has not invited the serious attention of many mathematicians.

In fact, a casual reader of many books on Least Squares or the Theory of Measurements might get the idea that the principle could be deduced from the law, or vice versa. It is not the intention here to condemn the suppression of intricate details in an elementary text-book. But the separate presentation of the principle and the law appears preferable to an attempt to unite the two by pseudo-logic.

Assuming the validity of the Gaussian law, I have compared the arithmetic mean with several functions of the measurements.† These comparisons show the arithmetic mean superior to most of the functions considered, but not to all.

*Bertrand: Calcul des Probabilités, Paris (1889), p. 180. "La règle des moyennes, il importe d'insister sur ce point, n'est ni démontrée ni exacte." ("The rule of the mean, it is important to insist upon this point, is neither proved nor exact.")

†The probability of the arithmetic mean compared with that of certain other functions of the measurements; Annals of Mathematics, June, 1913, pp. 186-198. The error-risk of certain functions of the measurements; Monatshefte für Mathematik und Physik, XXIV. Jahrgang, 1913, pp. 268-276. The error-risk of the median compared with that of the arithmetic mean; Bulletin of the University of Texas, No. 323, March 15, 1914.
To conduct these comparisons, direct probability was used. In this paper, on the contrary, probability a posteriori will be used.

Poincaré discusses* the relation between the law and the principle, making use of probability a posteriori and the so-called "probability of causes." He finds that the "proof" given by Gauss involves an assumption that the a priori probability is a constant. This assumption, while it may seem somewhat unwarranted, appears at first sight as the most simple and perhaps the most reasonable assumption to make.

But the assumption happens to be at variance† with one of the most fundamental principles of the theory of probability; viz., that the sum of the probabilities of the possible events shall be equal to one (unity), the symbol for certainty. On that ground we can not entertain such an assumption.

A natural course is then to try to modify this assumption and to deal with an a priori probability which is nearly constant, taking care to give this word "nearly" some mathematical precision. Again, it is natural to enquire whether there are other assumptions concerning the a priori probability which will appeal to us as in any sense reasonable. And in view of the intimate relation which seems to exist between the Gaussian law and the arithmetic mean, it is natural to try to combine with the Gaussian law some postulates, as broad as possible, concerning the a priori probability, which will lead logically to the arithmetic mean as at least a very natural and close approximation to the most probable value of the unknown.

The object of this paper is, then, to investigate the nature of the a priori probability which permits a close relation between the Gaussian probability law and the principle of the arithmetic mean.

The meaning of probability a priori in a problem of this kind is a matter for reflection. I shall not attempt to define it. We are forced continually to deal with problems in which the ultimate concepts are undefined and perhaps susceptible of considerable latitude of interpretation. In geometry we deal with the straight line. But what is a straight line? We may try to shift the difficulty of defining a straight line to algebra and make use of the linear function; but this does not define a straight line as geometrically conceived. Again, a "stretched string" is a good description; but it is not a definition.

The notion of probability a priori will be developed in descriptive fashion in the section which follows, and certain definitions and postulates given, also certain fallacies mentioned. Following this will be the formal statement and proof of four theorems, involving various hypotheses concerning the a priori probability. This will be followed by a section on defective hypotheses, those which are inadequate to bring the Gaussian law and the arithmetic mean into close relation. And a short discussion will follow this.

*Calcul des Probabilités (1912), p. 169.

†Bulletin of the American Mathematical Society, June, 1913, pp. 479-482.

§1. INTRODUCTORY CONCEPTS AND POSTULATES.

A physicist learns that a meter-rod that he has ordered has been shipped to him. Supposing for the sake of simplicity that a meter is just 39.37 inches, what is the probability that the rod will be 39.37 inches long? What is the probability that it will be 39.38 inches long? Or 39.35 inches long?
This problem may illustrate in a general way how an a priori probability may be conceived. The physicist has made no measurement of this rod; in fact, he has not even seen it. His order may have been misunderstood, and something altogether wrong may have been sent him. Nevertheless, it would be generally admitted that he would be more likely to receive a rod between 39.30 and 39.40 inches in length than a rod between 40.00 and 40.10 inches. And it is natural to attempt to get an expression for this a priori probability, rough though the approximation may be.

It frequently happens that some hypothesis about the a priori probability seems almost necessary, to make a start in certain problems. But fortunately the influence of this probability often becomes ultimately negligible; so that, even though it has been poorly represented, the harm done is of a vanishing nature. Poincaré* introduces an essentially unknown function to represent a certain probability in a problem on the roulette wheel and in a problem on the distribution of planets; and the influence of the function is practically nil.

*Poincaré, loc. cit., pp. 148-162; see also p. 277.

In view of the fact that the distribution of errors of measurements and the deviations from the normal in biological observations follow with more or less approximation the Gaussian probability law, it is not unnatural to assume that the a priori probability may likewise be approximately Gaussian; for example, to assume that the probability a priori that the length of the meter-bar lies between α and β is

(1)    p = \frac{k}{\sqrt{\pi}} \int_{\alpha}^{\beta} e^{-k^2 (a - z)^2} \, dz,

in which e = 2.718..., a = 39.37, and k, the measure of precision, depends upon the reputation of the firm for accuracy in construction. The Gaussian law, as is well known, makes large errors less likely than small errors, and very large errors well nigh impossible.

Many authors favor the use of a constant a priori probability. In the present problem, however, it is obvious that the probability that a rod between 1000 and 1001 inches long will be sent is not as great as the probability that a rod between 39 and 40 inches will be sent. It would be rash to assert that the present problem is typical of all problems that arise. But the use of a constant a priori probability in certain cases is highly objectionable, especially when it refers to an unknown magnitude for which all real numbers are assumed possible. To show this, let the probability a priori that the unknown true value lies between α and β be the integral of Ψ(z)dz from α to β. Now the symbol for certainty is unity. And so, as the unknown certainly lies between -∞ and +∞,

(2)    \int_{-\infty}^{+\infty} \Psi(z) \, dz = 1.

As I have already* pointed out, this equation can not be satisfied if Ψ(z) is a constant.

*Bulletin of the American Mathematical Society, loc. cit.
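For a concrete sense of how an a priori probability of the form (1) behaves, and of why a constant Ψ(z) is inadmissible under (2), the following short sketch may be consulted. It is an added illustration, not part of the original bulletin; the value k = 20 per inch, chosen for the meter-rod example, is an assumption made only for the illustration.

import math

def gaussian_prior_prob(lo, hi, a=39.37, k=20.0):
    """(k/sqrt(pi)) * integral over [lo, hi] of exp(-k^2 (a - z)^2) dz,
    written in closed form with the error function; the values of a and k
    are illustrative assumptions, not values fixed by the paper."""
    return 0.5 * (math.erf(k * (hi - a)) - math.erf(k * (lo - a)))

print(gaussian_prior_prob(39.30, 39.40))   # about 0.78: near the ordered length
print(gaussian_prior_prob(40.00, 40.10))   # effectively zero (smaller than 1e-70)

# A constant psi(z), by contrast, cannot satisfy (2): its integral over
# (-infinity, +infinity) is either zero or infinite, never unity.

The two intervals have the same length, yet the Gaussian a priori probability concentrates almost all of its weight near the ordered length, which is exactly the behavior the text describes.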
The failure to recognize this impossibility lies at the basis of a fallacious deduction of the Gaussian law from the so-called principle of the arithmetic mean. The same fallacy underlies an argument for the reverse process, attempting to get the principle of the arithmetic mean from the Gaussian law.

A still more objectionable fallacy, presented to accomplish this end, consists in confusing two distinct probabilities. The expression

(3)    \Phi(z) = \left( \frac{h}{\sqrt{\pi}} \right)^{n} e^{-h^2 [(z - m_1)^2 + \cdots + (z - m_n)^2]}

is first set up as the probability that, if z is the true value, the measurements m_1, m_2, ..., m_n will be made; and then the attempt is made to regard this expression (3) as the probability that z is the true value, the measurements having been made. Then, by setting the first derivative equal to zero, the result

    z = \frac{m_1 + m_2 + \cdots + m_n}{n}

is obtained; and it is asserted that the average or arithmetic mean is the most probable value of the unknown true value.

It is the object of this paper to arrive at the conclusion that the average is approximately the most probable value by assuming that Ψ(z) satisfies the requirement (2) and certain other natural conditions of a general nature, somewhat analogous to the conditions placed by Poincaré upon the arbitrary functions in his roulette and planet problems. Different hypotheses will be made for Ψ(z). It may seem that the only condition needed in addition to (2) is that Ψ(z) be continuous, so that it would be practically constant in small intervals. But, as will be shown, this condition in no wise guarantees that the arithmetic mean will even approximate the most probable value.

We do not here undertake to define a priori probability or indeed any kind of probability. A useful description of a probability may be an ideal frequency. In the preceding illustration, perhaps the firm has the reputation of making its meter-rods a trifle too long. In place of a in (1), the physicist may put his guess, g, which may or may not be 39.37. In thinking of p in (1) as an ideal frequency, we may have in mind that, if upon n occasions the physicist guesses that the length of the coming rod will be g, then in about pn cases it is to be expected that the length of the rod will be between α and β.

Concerning each measurement, m_1, m_2, ..., m_n, it will be assumed that it is subject to the Gaussian law, with measure of precision h. That is, if a is the true value, and x = a - m, the probability that the error of a measurement to be made will lie between x_1 and x_2 is

(5)    \frac{h}{\sqrt{\pi}} \int_{x_1}^{x_2} e^{-h^2 x^2} \, dx.

Or, stated in the differential form,* the probability that the error x will be made, that is, an error between x and x + dx, is

(6)    \frac{h}{\sqrt{\pi}} e^{-h^2 x^2} \, dx.

*For the sake of brevity, the differential form will be sometimes used. From an inspection of Theorem III or Theorem IV the reader will be able to restate the first two theorems in the more cumbrous but more precise form.

Then the probability that the n errors x_1, x_2, ..., x_n will be made is

(7)    \left( \frac{h}{\sqrt{\pi}} \right)^{n} e^{-h^2 [x_1^2 + x_2^2 + \cdots + x_n^2]} \, dx_1 \cdots dx_n.

This resembles Φ(z) in (3) somewhat; but in (7), x_1 is the actual error, a - m_1; whereas, in (3), z - m_1 is merely a residual with respect to z, where z is a candidate for recognition as the true value.

Now let Ψ(z)dz be the probability a priori that z is the true value, and let

(8)    F(z) = \Psi(z) \, \Phi(z).

Then, by Bayes' theorem, the probability a posteriori that z is the unknown true value, after the measurements have been made, is

(9)    \frac{1}{c} F(z) \, dz,

where c is the integral of F(z)dz from -∞ to +∞. We seek the value Z of z which will make F(z) the maximum. Then, a posteriori, the most probable value of the unknown is Z. By this is meant that it will be more probable a posteriori that the true value will differ from Z by less than a small ε than that the true value will differ from any other real number by less than ε.
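The maximization of F(z) just described can be sketched numerically. The snippet below is an added illustration, not part of the original text; the prior, the measure of precision h = 1, the grid limits, and the sample measurements are all assumptions made only for the example. It tabulates F(z) = Ψ(z)Φ(z) of (8) on a grid and reports the maximizing value Z beside the arithmetic mean M.

import math

def phi(z, ms, h=1.0):
    """Direct probability (3)/(7) of the observed measurements ms if z were
    the true value, with the constant factor (h/sqrt(pi))^n omitted."""
    return math.exp(-h * h * sum((z - m) ** 2 for m in ms))

def posterior_mode(prior, ms, lo=-10.0, hi=10.0, steps=20_001):
    """Maximize F(z) = psi(z) * phi(z) of (8) over a grid; the grid bounds
    and resolution are assumptions made for the illustration."""
    best_z, best_f = lo, -1.0
    for i in range(steps):
        z = lo + (hi - lo) * i / (steps - 1)
        f = prior(z) * phi(z, ms)
        if f > best_f:
            best_z, best_f = z, f
    return best_z

measurements = [2.9, 3.1, 3.0, 3.2, 2.8]                  # assumed sample values
mean = sum(measurements) / len(measurements)               # M = 3.0
broad_prior = lambda z: math.exp(-0.01 * z * z)            # weak Gaussian guess, g = 0

print(mean, posterior_mode(broad_prior, measurements))
# With a weak prior the maximizing Z already lies very close to the mean M.

How close Z comes to M, and how fast, is precisely what the hypotheses of the following sections are designed to control.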
§2. FIRST HYPOTHESIS CONCERNING THE A PRIORI PROBABILITY.

Let g be a guess at the true value, and let the a priori probability function be

(10)    \Psi(z) = \frac{k}{\sqrt{\pi}} e^{-k^2 (g - z)^2}.

Then from (8),

    F(z) = \frac{k}{\sqrt{\pi}} \left( \frac{h}{\sqrt{\pi}} \right)^{n} e^{-[k^2 (g - z)^2 + h^2 \Sigma (z - m)^2]}.

Setting the first derivative equal to zero and solving gives the value Z of z making F(z) the maximum:

(11)    Z = \frac{k^2 g + h^2 (m_1 + \cdots + m_n)}{k^2 + n h^2}.

A maximum actually occurs here, since the bracket above is a quadratic in z with the coefficient of z^2 positive. Let M designate the arithmetic mean or average of the measurements, and divide the numerator and denominator in (11) by nh^2. Then

(12)    Z = \frac{M + \eta_1}{1 + \eta_2} = M(1 + \omega) + \tau,

where η₁, η₂, ω and τ approach zero with increasing n. In fact, if it be postulated that M does not increase indefinitely in numerical value, and in practice this is usually the case, then Mω is an infinitesimal; and thus

(13)    Z = M + \sigma,

where σ approaches zero with increasing n.

THEOREM I. Let the probability a priori that the unknown true value is z be

    \frac{k}{\sqrt{\pi}} e^{-k^2 (g - z)^2} \, dz,

where g is any guess at the unknown. Let the probability that the error of a measurement to be made will be x be

    \frac{h}{\sqrt{\pi}} e^{-h^2 x^2} \, dx.

Then, a posteriori, after n measurements with arithmetic mean M have been made, the most probable value of the unknown is

    Z = M(1 + \omega) + \sigma,

where ω and σ approach zero with increasing n.

Discussion. If the tangent of an angle near 90° is being measured, the condition for (13) may not be satisfied. To illustrate further by an example, suppose that the measurements turn out to be the odd integers in natural order: m₁ = 1, m₂ = 3, m₃ = 5, etc. Then in (11), m₁ + m₂ + ... + m_n = n², and M = n. Suppose furthermore that g = -1, and that h = 1 = k. Then Z = n - 1 = M - 1. Thus M - Z = 1; and so this difference is not an infinitesimal.

In general, k and h in (11) are not equal. In place of the guess g in (10) a preliminary measurement m may be made; and thus m would replace g in (10). This may seem at first sight to make k equal to h, so that by (11) Z would become the exact average of the (n + 1) measurements. But even with a measurement m there is no justification for making k in (10) equal to h. The probability that z is the true value after a measurement m has been made is entirely distinct from the probability that the measurement m will be made if z is the true value. The failure to recognize the distinction between a "probability of cause" and a direct probability has been the source of many fallacies. The distinction can be brought out clearly by urn problems where the exact probability can be computed, under certain hypotheses. The probability that an urn contained two white balls and two black balls, if a white ball and a black ball have been drawn, is quite different from the probability that from an urn containing two white balls and two black balls a white ball and a black ball will be drawn.
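Formula (11) and the discussion above are easy to check numerically. The following sketch is an added illustration, not part of the original text; the values of g, h, k and the measurements are assumptions chosen for the example. It shows Z approaching M when the measurements remain bounded, and reproduces the odd-integer example, for which M - Z remains exactly 1.

def theorem_one_mode(ms, g, k=1.0, h=1.0):
    """Posterior mode (11) under the Gaussian prior (10):
    Z = (k^2 g + h^2 sum(m)) / (k^2 + n h^2)."""
    n = len(ms)
    return (k * k * g + h * h * sum(ms)) / (k * k + n * h * h)

# Bounded measurements: Z tends to the mean M as n grows.
for n in (5, 50, 500):
    ms = [3.0] * n                      # assumed constant readings, so M = 3
    print(n, theorem_one_mode(ms, g=0.0))

# The discussion's example: measurements 1, 3, 5, ... with g = -1 and h = k = 1.
for n in (10, 100, 1000):
    ms = list(range(1, 2 * n, 2))       # the first n odd integers, so M = n
    z = theorem_one_mode(ms, g=-1.0)
    print(n, n - z)                      # M - Z stays equal to 1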
§3. SECOND HYPOTHESIS CONCERNING THE A PRIORI PROBABILITY.

By differentiating F(z) in (8), we obtain

(14)    F'(z) = \Phi(z) \left[ \Psi'(z) - 2 h^2 (n z - \Sigma m) \, \Psi(z) \right].

Let f(z) be the function obtained by dividing the bracket in (14) by -2h²nΨ(z). Then

(15)    f(z) = z - M - \frac{\Psi'(z)}{2 h^2 n \, \Psi(z)}.

Now F'(z) = 0 provided f(z) = 0. To make f(z) vanish when z is nearly M, we naturally impose some condition to make the last term in (15) negligible with increasing n.

THEOREM II. Let the probability* a priori that the unknown true value is z be Ψ(z)dz, where Ψ(z) is positive and

(16)    |\Psi'(z)| \leq c \, \Psi(z),

c being a positive constant. Let the probability that the error of a measurement to be made will be x be

    \frac{h}{\sqrt{\pi}} e^{-h^2 x^2} \, dx.

Then, a posteriori, after n measurements with arithmetic mean M have been made, the most probable value of the unknown is Z = M + σ, where σ approaches zero with increasing n.

*See (5) and (6) and corresponding foot-note.

§4. THIRD HYPOTHESIS CONCERNING THE A PRIORI PROBABILITY.

In the theorem about to be stated, the special condition to be placed upon Ψ'(z) is suggested by (16). The differential form of statement will now be laid aside.

THEOREM III. Let the probability a priori that the true value lies between a and a + δ be

(17)    \int_{a}^{a + \delta} \Psi(z) \, dz,

where Ψ(z) is positive, except perhaps at isolated points; and

(18)    |\Psi'(z)| \leq [\, c_1 + c_2 |z| \,] \, \Psi(z),

c₁ and c₂ being positive constants. Let the probability that the error of a measurement to be made will lie between x₁ and x₂ be

    \frac{h}{\sqrt{\pi}} \int_{x_1}^{x_2} e^{-h^2 x^2} \, dx.

Then after n measurements with arithmetic mean M have been made, the probability a posteriori that the true value lies between a and a + δ is greatest when

(19)    a = M(1 + \omega) + \sigma,

where lim ω = 0 and lim σ = 0 uniformly as n becomes infinite and δ approaches zero.

Proof. By Bayes' theorem, the probability a posteriori that the true value lies between a and a + δ is

(20)    P = \frac{1}{c} \int_{a}^{a + \delta} F(z) \, dz.

The abridged form of this statement was given in (9). For brevity set

(21)    \epsilon_1 = \frac{c_1}{2 h^2 n}, \qquad \epsilon_2 = \frac{c_2}{2 h^2 n}.

These approach zero with increasing n. First, suppose that Ψ(z) > 0. Then from (15) and (18),

(22)    f(z) \leq z - M + \epsilon_1 + \epsilon_2 |z|,

so that f(z) will certainly be negative if

(23)    z + \epsilon_2 |z| < M - \epsilon_1.

If M > 0 and the ε's are small, this will be satisfied if

(24)    z < \frac{M - \epsilon_1}{1 + \epsilon_2};

whereas, if M ≤ 0, (23) will be satisfied if

(25)    z < \frac{M - \epsilon_1}{1 - \epsilon_2}.

In place of (24) and (25) we may write simply

(26)    z < M(1 + \omega) - \epsilon_3,

where ω and ε₃ are infinitesimals. Hence, if z satisfies (26), f(z) will be negative, F'(z) will be positive by (3), (14) and (15), and F(z) will be an increasing function of z. Hence P in (20) can not take its largest value, for a given δ, if

(27)    a + \delta \leq M(1 + \omega) - \epsilon_3.

This δ can be made as small as we please. Likewise it can be shown that P can not take its largest value when

(28)    a \geq M(1 + \omega) + \epsilon_4,

where ε₄ is an infinitesimal. But with δ fixed, P is a continuous function of a, and hence takes on its maximum. Hence (19) follows from (27) and (28). Here ω and σ approach zero uniformly; for, from (21), the rapidity with which ε₁ and ε₂ approach zero does not depend upon the magnitude of M. In Theorems I and II, ω and σ likewise approach zero uniformly.

Suppose now that Ψ(z) = 0 at some isolated points. Then by (18) and (14), F'(z) = 0 at these isolated points. This will not affect the character of F(z) as an increasing function or as a decreasing function.

To see that (18) is not a superfluous condition, let

(29)    \Psi(z) = C e^{-|z|^3},

where C is chosen to satisfy (2). The graph of (29) is a curve symmetrical with respect to the Y axis, has its maximum at z = 0, has just one point of inflection on each side of the Y axis, and otherwise resembles the usual probability curve (10), with g = 0. Then by (8), if h = 1,

(30)    F(z) = C \left( \frac{1}{\sqrt{\pi}} \right)^{n} e^{-|z|^3 - \Sigma (z - m)^2}.

If now the measurements turn out to be the odd integers, 1, 3, 5, ..., then Σm = n², and F(z) takes its maximum when z = (1/3) M (√7 - 1). Thus (19) is not satisfied; nor is (18).

It will be noticed that Theorem I is a special case of Theorem III; but Theorem I was given first, because of simplicity of development.
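The counterexample just given can be verified numerically. The sketch below is an added illustration, not part of the original bulletin; h = 1 as in (30), and the search window and grid resolution are assumptions. With the prior (29) and the odd-integer measurements, the ratio of the posterior mode to M stays near (√7 - 1)/3, about 0.55, instead of approaching 1.

import math

def posterior_mode_cube_prior(n, steps=200_001):
    """Locate the maximum of log F(z) from (30), with h = 1 and the
    measurements 1, 3, ..., 2n-1, for which M = n.  The sum of squared
    terms is expanded about the mean, so only n*(z - M)**2 varies with z."""
    m_mean = float(n)                       # M = n for the first n odd integers
    lo, hi = 0.0, 2.0 * n                   # assumed search window
    best_z, best_log_f = lo, -float("inf")
    for i in range(steps):
        z = lo + (hi - lo) * i / (steps - 1)
        log_f = -abs(z) ** 3 - n * (z - m_mean) ** 2
        if log_f > best_log_f:
            best_z, best_log_f = z, log_f
    return best_z

for n in (10, 50, 200):
    print(n, posterior_mode_cube_prior(n) / n, (math.sqrt(7) - 1) / 3)
    # the ratio Z/M stays near 0.5486 instead of approaching 1

So a prior that is continuous, everywhere positive, and practically constant in small intervals may nevertheless keep the most probable value far from the arithmetic mean; it is condition (18) that excludes this behavior.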
§5. FOURTH HYPOTHESIS CONCERNING THE A PRIORI PROBABILITY.

There are certain cases in which measurements must lie between two constants. If we accept the most elementary conception of an angle, the angle must lie between 0° and 180°. The tangent of this angle, however, may have any real value, positive or negative. But even if we are measuring the tangent of an angle, there would usually be an interval, from b₁ to b₂, in which the average M would in practice be. To postulate that M must lie in (b₁, b₂) would be contradictory to the Gaussian law. And so the theorem to be given applies in strictness to the case where M does lie in (b₁, b₂), rather than to the case where M must lie in (b₁, b₂).

THEOREM IV. Let the probability a priori that the true value lies between a and a + δ be

    \int_{a}^{a + \delta} \Psi(z) \, dz,

where Ψ(z) is limited for all values of z, and has a positive minimum in some interval (b₁, b₂), or at least in that part of (b₁ + ε, b₂ - ε) which remains when a finite number of sub-intervals of the form (ξ - ε, ξ + ε) are removed, with ε small at pleasure. Let the probability that the error of a measurement to be made will lie between x₁ and x₂ be

    \frac{h}{\sqrt{\pi}} \int_{x_1}^{x_2} e^{-h^2 x^2} \, dx.

Then after n measurements with arithmetic mean M have been made, the probability a posteriori that the true value lies between a and a + δ is greatest when a = M + σ, where lim σ = 0 uniformly in (b₁, b₂) as n becomes infinite and δ approaches zero, provided that M continues to lie in (b₁, b₂).

Proof. Let v₁, v₂, ..., v_n be the residuals of the measurements with respect to M; that is, let v₁ = M - m₁, etc. Then z - m₁ = z - M + v₁, etc.; and, since Σv = 0, (3) becomes

(31)    \Phi(z) = \left( \frac{h}{\sqrt{\pi}} \right)^{n} e^{-h^2 \Sigma v^2} \, e^{-n h^2 (z - M)^2}.

This shows, even more simply than (3), that Φ(z) takes its maximum when z = M, and is an increasing function when z < M and a decreasing function when z > M. We wish to show that, under the conditions of the hypothesis, Φ(z) can, when n is large enough, force its own point of maximum M upon the product F(z) in (8), and so upon P in (20), to as close an approximation as we please. From (31) it follows that

(32)    \frac{\Phi(z + t)}{\Phi(z)} = e^{-n h^2 [\, 2 t (z - M) + t^2 \,]}.

In particular,

(33)    \frac{\Phi(M + t)}{\Phi(M + \tfrac{1}{2} t)} = e^{-\tfrac{3}{4} n h^2 t^2}.

Now, by hypothesis, Ψ(z) is limited; that is, there is a constant K such that Ψ(z) < K for every z; and let T be the positive minimum which, by hypothesis, Ψ(z) possesses on the part of (b₁ + ε, b₂ - ε) described above. Suppose, first, that M differs from b₁, b₂ and every ξ by at least ε. Now take n in (33) large enough so that

(34)    \frac{\Phi(M + \epsilon)}{\Phi(M + \tfrac{1}{2}\epsilon)} < \frac{T}{K}, \qquad \frac{\Phi(M - \epsilon)}{\Phi(M - \tfrac{1}{2}\epsilon)} < \frac{T}{K}.

Then

(35)    K \, \Phi(M - \epsilon) < T \, \Phi(M - \tfrac{1}{2}\epsilon).

But Φ(z) is an increasing function when z < M. Hence, if z₁ ≤ M - ε, then KΦ(z₁) ≤ KΦ(M - ε) < TΦ(M - ½ε); and since F(z₁) ≤ KΦ(z₁), while F(M - ½ε) ≥ TΦ(M - ½ε), it follows that F(z₁) < F(M - ½ε). Likewise, if z₁ ≥ M + ε, then F(z₁) < F(M + ½ε). But with a chosen δ, the integral P is a continuous function of a; and hence P takes on its largest value when a lies between M - ε and M + ε.

If, in particular, M = ξ + ε, then by (34),

(36)    \frac{\Phi(\xi + 2\epsilon)}{\Phi(\xi + \tfrac{3}{2}\epsilon)} < \frac{T}{K}.

But from (32), whatever be the value of M,

    \frac{\Phi(\xi + 2\epsilon)}{\Phi(\xi + \tfrac{3}{2}\epsilon)} = e^{-n h^2 [\, \epsilon(\xi + \tfrac{3}{2}\epsilon - M) + \tfrac{1}{4}\epsilon^2 \,]},

and this ratio decreases with decreasing M. Hence (36), being satisfied when M = ξ + ε, is satisfied a fortiori when M < ξ + ε. Hence, when M < ξ + ε, and n and δ are chosen as specified above, P can not take its maximum when a > ξ + 2ε. And likewise, when M > ξ - ε, P can not take its maximum when a < ξ - 2ε. Thus, if M falls between ξ - ε and ξ + ε, P takes its maximum when a lies between ξ - 2ε and ξ + 2ε; and |a - M| < 3ε. The reasoning is analogous if M falls near b₁ or b₂.
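The behavior asserted by Theorem IV may be illustrated numerically. The sketch below is an addition to the text, not part of the bulletin; the particular prior shape, the interval (0, 10), the value of M, and the grid are all assumptions chosen only for the illustration. It shows the maximizing value being forced toward M as n increases, as long as M lies inside the interval on which the prior has a positive minimum.

import math

def phi(z, n, mean, h=1.0):
    """Direct probability (31) up to a constant factor: exp(-n h^2 (z - M)^2)."""
    return math.exp(-n * h * h * (z - mean) ** 2)

def psi(z):
    """An assumed prior shape: limited for all z, with positive minimum 0.5
    on (0, 10), and decaying outside; the normalizing constant required by
    (2) is omitted, since it does not move the location of the maximum."""
    return 1.0 + 0.5 * math.sin(5.0 * z) if 0.0 < z < 10.0 else 0.1 * math.exp(-abs(z - 5.0))

def posterior_mode(n, mean, lo=-5.0, hi=15.0, steps=100_001):
    """Grid search for the maximum of psi(z) * phi(z)."""
    best_z, best_f = lo, -1.0
    for i in range(steps):
        z = lo + (hi - lo) * i / (steps - 1)
        f = psi(z) * phi(z, n, mean)
        if f > best_f:
            best_z, best_f = z, f
    return best_z

M = 6.28                                 # assumed arithmetic mean, inside (0, 10)
for n in (1, 10, 100, 1000):
    print(n, posterior_mode(n, M))       # the mode is driven toward M as n grows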
§6. FOUR DEFECTIVE HYPOTHESES CONCERNING THE A PRIORI PROBABILITY.

Four hypotheses will now be mentioned which are untenable or artificial or inadequate.

1. Each real number is equally likely a priori to be the true value. This makes Ψ(z) a constant, and (2) can not be satisfied.

2. Each real number in a certain interval (b₁, b₂) is a priori equally likely to be the true value, and it is impossible for the true value to lie outside this interval. I have given an example* for which this hypothesis is natural. But in the general case it appears artificial to postulate that the a priori probability drops suddenly to zero at the ends of an interval, when these ends can be at best only hazily imagined. Even if we adopt this hypothesis, it would not follow that the arithmetic mean M is the most probable value of the unknown, without the additional proviso that M lies in (b₁, b₂). For if Ψ(z) = 0 outside (b₁, b₂), then by (8) the probability a posteriori that the true value lies outside (b₁, b₂) is also zero. Thus the most probable value of the unknown could not be outside (b₁, b₂); whereas the Gaussian law permits M to have any value whatever.

*Bulletin of the American Mathematical Society, loc. cit., p. 481.

3. The a priori probability is practically constant in small intervals, or, as we may wish to express it, Ψ(z) is a continuous function of z, and all real numbers are possible values of the unknown true value. These conditions are satisfied by Ψ(z) in (29); and hence they are inadequate.

4. The a priori probability is continuous, and is zero outside a certain interval. The supposition that Ψ(z) = 0 outside (b₁, b₂) leads to the defect mentioned under No. 2.

§7. FOUR TYPES OF A PRIORI PROBABILITY.

In a given case there may be a strong probability that the arithmetic mean M will not increase indefinitely, even though something like the tangent of an angle is being measured. Nevertheless, instruments may be subject to a progressive change due to a change in temperature. Or, indeed, an increasing set of measurements may be the result of mere chance as we usually understand the term. If a gambler loses in one night as much money as he has won previously in a year, he may well suspect that the dice are loaded. And an experimenter may well suspect that his instruments are suffering from some ailment, if his measurements persist in increasing. But in both cases it may be simply a run of bad luck.

It may seldom be clear just what type of a priori probability to postulate; but the theorems just given and the subsequent discussion permit us to distinguish four types of a priori probability, in accordance with the ease with which this probability allows itself to be eliminated when M increases indefinitely, thus departing indefinitely from any value which a priori may be the most probable value of the unknown.

1. The weakest Ψ(z) that has been mentioned as permissible is that of (17), restricted only by (18). This has no power of resistance, and its influence is evanescent with increasing n.

2. The function (10) has a greater power of resistance. An example was given after Theorem I, in which g = -1 and Z = M - 1. The function (10) can clip off a constant, in the example, unity, from the arithmetic mean, before this is presentable as the most probable value of the unknown.

3. The function (29) is still stronger. In the example given, it permits a number only about 55% of the arithmetic mean to come forth as the most probable value. But even this function is not absolutely prohibitive of a large most probable value. In spite of its strong preference for zero as the value of the unknown, it acknowledges the possibility of any value; and the persistent increase of M forces up the most probable value.

4. A function absolutely prohibitive of a most probable value outside an interval (b₁, b₂) can be formed by making Ψ(z) = 0 outside (b₁, b₂). This follows from (8), (9), and (20).

As long as M remains in some interval (b₁, b₂), both functions (10) and (29), and in general the Ψ(z) just mentioned, exert a vanishing influence upon the most probable value, by Theorem IV. With large values of M, it is easy to see why (29) should be more stubborn than (10), since, when z > k²,

    e^{-|z|^3} < e^{-k^2 z^2}.