Probability Theory
J.M. Steele
Wharton
Probability theory is that part of mathematics that aims to provide insight into phenomena that depend on chance or on uncertainty. The most prevalent use of the theory comes through the frequentist interpretation of probability in terms of the outcomes of repeated experiments, but probability is also used to provide a measure of subjective beliefs, especially as judged by one's willingness to place bets.
The roots of probability theory are not as ancient as those of many parts of mathematics, and only in the sixteenth and seventeenth centuries does one find the first glimmerings of the theory in the investigations made by Gerolamo Cardano, Pierre de Fermat, and Blaise Pascal into games of chance. Despite the luminous reputations of these famous mathematicians and philosophers, the subject of probability theory remained on the periphery of respectability, and for a long time development was halting and lugubrious. Through the first third of the twentieth century, the eighteenth-century works of Jakob Bernoulli (see Bernoulli Family) and Abraham De Moivre continued to be viewed as the nearly definitive treatises of probability theory.
Still, even in the early days of the twentieth century, when probability theory clearly suffered from the lack of a widely accepted foundation, there were profound developments, most notably Albert Einstein's use of Brownian motion in 1905 to provide the first determination of Avogadro's number [7]. Nevertheless, in 1933, when Andrey Nikolayevich Kolmogorov published his elegant, succinct volume Foundations of Probability Theory [10], the mathematical world was hungry for such a treatment, and the subsequent development of probability theory was explosive.
1 Firm Foundation
Central to Kolmogorov's foundation for probability theory was his introduction of the triple $(\Omega, \mathcal{F}, P)$ that we now call a probability space, or sometimes the probabilist's trinity. The triple's first element, $\Omega$, is required only to be a set. The second element, $\mathcal{F}$, is a collection of subsets of $\Omega$ about which more will be said later. The third element is a function that assigns a real number to each of the elements of $\mathcal{F}$. This function is called a probability measure $P$ provided that it satisfies the three following axioms:

Axiom 1. For all $A \in \mathcal{F}$ we have $P(A) \geq 0$.

Axiom 2. For any countable collection $\{A_i \in \mathcal{F} : 1 \leq i < \infty\}$ for which $A_i \cap A_j = \emptyset$ for all $i \neq j$, we have
\[
P\Big( \bigcup_{i=1}^{\infty} A_i \Big) = \sum_{i=1}^{\infty} P(A_i).
\]
Axiom 3. $P(\Omega) = 1$.

Axioms 1 and 3 are quite bland. Axiom 1 only captures our understanding that probabilities of events are nonnegative numbers, and Axiom 3 just echoes our assumption that $\Omega$ is a sensible representation for the universe of all possible outcomes of the chance experiment being modeled. Only about Axiom 2 can there be any quarrel, and at times arguments have been made for preferring a probability theory that only requires additivity of probabilities for finite collections of sets. Kolmogorov's decision to assume countable additivity is not the only possible choice, but it has been a fecund one that has proved to be appropriate in a wide variety of circumstances.
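A minimal Python sketch makes the axioms concrete; here the sample space is taken, purely for illustration, to be the six faces of a fair die, with the measure assigning mass 1/6 to each face.

```python
from itertools import chain, combinations

# A finite probability space: Omega is the six faces of a fair die,
# and P assigns mass 1/6 to every outcome (an illustrative choice).
omega = {1, 2, 3, 4, 5, 6}
mass = {w: 1.0 / 6.0 for w in omega}

def P(event):
    """Probability of an event, i.e., any subset of omega."""
    return sum(mass[w] for w in event)

# Axiom 1: P(A) >= 0 for every event A (here: every subset of omega).
events = [set(s) for s in chain.from_iterable(combinations(omega, r) for r in range(7))]
assert all(P(A) >= 0 for A in events)

# Axiom 2 (finite form): additivity over disjoint events.
evens, odds = {2, 4, 6}, {1, 3, 5}
assert abs(P(evens | odds) - (P(evens) + P(odds))) < 1e-12

# Axiom 3: the whole space has probability one.
assert abs(P(omega) - 1.0) < 1e-12
```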
The mathematical benefit of Kolmogorov's second axiom is that it connects probability theory with the theory of measure as put forward by Borel, Lebesgue, Radon, and Fréchet in the early part of the twentieth century. It was in fact Fréchet who noted some 13 years after Lebesgue's famous 1902 thesis that the natural domain for a probability measure is a collection of sets that is closed under complementation and countable unions. Fréchet called such collections $\sigma$-algebras, and Kolmogorov required that the second term of his triple be just such a collection.
To the practical mind, Kolmogorov's axiomatization of probability may seem only to defer the problem of construction of probability models that serve to inform us about the physical and social world, but by putting the elusive probability function P on an axiomatic footing Kolmogorov did provide real assurance that one could study probability as sensibly as one could study measure theory, analysis, or algebra. In particular, one could proceed with the investigation of the objects that had been of concern from probability's earliest days.
One of the most fundamental notions of probability theory is the random variable, and in Kolmogorov's framework a random variable is nothing more than a function $X : \Omega \to \mathbb{R}$ with the property that for all $t$ one has that the sets $\{\omega : X(\omega) \leq t\}$ are elements of the $\sigma$-algebra $\mathcal{F}$. With this definition we are on firm footing when we take the definition of the distribution function $F$ of $X$ to be
\[
F(t) = P(X \leq t),
\]
because the set $\{\omega : X(\omega) \leq t\}$ is in the domain of the set function $P$. In this framework the expectation $E(X)$ of the random variable $X$ can be defined as the Lebesgue integral of $X$ with regard to $P$, or as the Riemann-Stieltjes integral with respect to $F$, giving us
\[
E(X) = \int_{\Omega} X(\omega)\, dP(\omega) = \int_{-\infty}^{\infty} x\, dF(x).
\]
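For a concrete sense of the Riemann-Stieltjes expression, the following sketch approximates $E(X) = \int x\, dF(x)$ by a finite Stieltjes sum; the exponential distribution with rate one is an illustrative choice, and its true mean is 1.

```python
import math

# Riemann-Stieltjes approximation of E(X) = \int x dF(x) for an
# exponential distribution with rate 1 (chosen only for illustration).
def F(x):
    return 1.0 - math.exp(-x) if x > 0 else 0.0

grid = [i * 0.001 for i in range(0, 20001)]   # partition of [0, 20]
expectation = sum(x0 * (F(x1) - F(x0)) for x0, x1 in zip(grid, grid[1:]))
print(expectation)   # approximately 1.0, the true mean
```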
The probability distribution function and the expectation operation provide us with the core language that is needed to express almost everything that one needs to say about individual random variables. For example, a basic measure of dispersion of a random variable is the variance, which one writes in terms of the expectation as
\[
\mathrm{var}(X) = E(X - \mu)^2,
\]
where $\mu = E(X)$, and the standard deviation of $X$ is defined to be the square root of the variance.
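A short simulation illustrates these definitions; the uniform distribution on [0, 1] is an illustrative choice, with true variance 1/12.

```python
import random

# Monte Carlo illustration of var(X) = E(X - mu)^2 and the standard
# deviation, for X uniform on [0, 1] (an arbitrary illustrative choice).
random.seed(0)
samples = [random.random() for _ in range(100_000)]

mu = sum(samples) / len(samples)
variance = sum((x - mu) ** 2 for x in samples) / len(samples)
std_dev = variance ** 0.5
print(variance, std_dev)   # close to 1/12 and 1/sqrt(12)
```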
With expectations and distributions we recapture much of the most basic language of probability theory, but the real power of probability theory only emerges with the introduction of the central notion of independence of events, algebras, and random variables. To begin that development, one first defines elements $A$ and $B$ of $\mathcal{F}$ to be independent provided
\[
P(A \cap B) = P(A)P(B).
\]
This definition is then extended to sub-$\sigma$-algebras $\mathcal{A}$ and $\mathcal{B}$ of $\mathcal{F}$ by calling $\mathcal{A}$ and $\mathcal{B}$ independent provided $A$ and $B$ are independent for all $A \in \mathcal{A}$ and all $B \in \mathcal{B}$. Finally, random variables $X$ and $Y$ are independent if $\mathcal{A}$ and $\mathcal{B}$ are independent when these are respectively the smallest $\sigma$-algebras containing all the sets $\{X \leq t\}$ and all the sets $\{Y \leq t\}$.
This definition of independence of random variables may look a little burdensome at first, but for many purposes it is much more convenient than the definition of independence that is sometimes given in elementary texts, which calls for the factorization of the joint density of $X$ and $Y$. In fact, densities may not exist, but that is not the telling point. More to the heart of the matter is that with Kolmogorov's definition one clearly sees that the independence of $X$ and $Y$ implies the independence of $f(X)$ and $g(Y)$ for any monotone functions $f$ and $g$, while this intuitive fact is cumbersome to check if one needs to verify a density factorization.
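The following sketch gives an empirical, purely illustrative check of this fact: for independent $X$ and $Y$ and monotone $f$ and $g$, the joint probability $P(f(X) \leq a,\, g(Y) \leq b)$ should match the product of the marginal probabilities; the particular distributions, functions, and thresholds below are arbitrary choices.

```python
import random

# Empirical check that independence of X and Y carries over to monotone
# transforms f(X) and g(Y): the joint probability should be close to the
# product of the marginals.  All specific choices here are illustrative.
random.seed(1)
n = 200_000
xs = [random.random() for _ in range(n)]          # X ~ uniform(0, 1)
ys = [random.gauss(0.0, 1.0) for _ in range(n)]   # Y ~ standard normal, independent of X

f = lambda x: x ** 3          # a monotone function of X
g = lambda y: 2 * y + 1       # a monotone function of Y
a, b = 0.2, 1.0

p_joint = sum(1 for x, y in zip(xs, ys) if f(x) <= a and g(y) <= b) / n
p_fx = sum(1 for x in xs if f(x) <= a) / n
p_gy = sum(1 for y in ys if g(y) <= b) / n
print(p_joint, p_fx * p_gy)   # the two numbers should nearly agree
```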
There are two theorems that live at the very heart of probability theory. The first is the law of large numbers, without which our most fundamental intuitions about the relationship of probability theory and the physical world would be at odds. The second is the central limit theorem, which is arguably the result that most clearly accounts for the practical utility of probability as a helpmate to statistics, as well as to the social and physical sciences.

Theorem 1 (Law of Large Numbers). If $\{X_i : 1 \leq i < \infty\}$ is a sequence of independent random variables, each with the distribution function $F$, and if $E|X_i| < \infty$, then the event that the sequence
\[
\frac{1}{n}\{X_1 + X_2 + \cdots + X_n\}
\]
converges to $E(X_1)$ has probability one.

Theorem 2 (Central Limit Theorem). If $\{X_i : 1 \leq i < \infty\}$ is a sequence of independent random variables with distribution function $F$, $E(X_i) = \mu < \infty$, and $\mathrm{var}(X_i) = \sigma^2 < \infty$, then for all $x$ one has
\[
\lim_{n \to \infty} P\Big( \frac{1}{\sigma\sqrt{n}}\{X_1 + X_2 + \cdots + X_n - n\mu\} \leq x \Big) = \frac{1}{(2\pi)^{1/2}} \int_{-\infty}^{x} e^{-u^2/2}\, du.
\]
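Both theorems are easy to see at work in simulation; the sketch below uses i.i.d. exponential summands ($\mu = \sigma = 1$), an arbitrary illustrative choice.

```python
import math
import random

# Simulation illustrating Theorems 1 and 2 for i.i.d. exponential(1)
# summands (mu = 1, sigma = 1); the distribution is chosen only for
# illustration.
random.seed(2)
n, trials = 1_000, 2_000

# Law of large numbers: the sample mean of one long run is near mu = 1.
one_run = [random.expovariate(1.0) for _ in range(n)]
print(sum(one_run) / n)                      # close to 1.0

# Central limit theorem: (S_n - n*mu) / (sigma * sqrt(n)) is roughly
# standard normal, so it falls below 1.0 about Phi(1) ~ 0.84 of the time.
below = 0
for _ in range(trials):
    s = sum(random.expovariate(1.0) for _ in range(n))
    if (s - n * 1.0) / math.sqrt(n) <= 1.0:
        below += 1
print(below / trials)                        # close to 0.84
```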
While the purest view of the aims and accomplishments of probability theory may be found in the study of sums of independent random variables, the applications of probability theory require the development of structures that also capture aspects of dependence. To give the simplest illustration of such a system, we consider a finite set $S = \{1, 2, \ldots, n\}$, which we will call the set of states, and a matrix $P = (p_{ij})$, where all of the matrix entries satisfy $0 \leq p_{ij} \leq 1$ and where the row sums $p_{i1} + p_{i2} + \cdots + p_{in}$ all equal one. We now consider a sequence of random variables $X_n$ that are defined by sequential transitions according to the rows of the matrix $P$. Specifically, if $X_n = i$, then $X_{n+1}$ is determined by making a choice from the set $S$ in accordance with the probability masses $(p_{ij})$. Such a sequence of random variables is called a Markov chain, and the theory of such sequences offers an important first step from the core theory of independent random variables. The index of the sequence $\{X_n : n \geq 0\}$ is usually viewed as time, and an important extension of the notion of a Markov chain is that of a Markov process, where the index is taken to be the whole positive real line and the state space is permitted to be $\mathbb{R}^d$ (or even a more complex space). The most important such process is Brownian motion.
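Simulating a Markov chain requires nothing more than repeated sampling from the rows of the transition matrix; the three-state matrix below is an arbitrary illustrative choice.

```python
import random

# A small Markov chain on the state set S = {1, 2, 3}; the particular
# transition matrix is our own illustrative choice (each row sums to one).
transition = {
    1: [0.5, 0.3, 0.2],
    2: [0.1, 0.6, 0.3],
    3: [0.2, 0.2, 0.6],
}
states = [1, 2, 3]

def step(i):
    """Given X_n = i, draw X_{n+1} from the i-th row of the matrix."""
    return random.choices(states, weights=transition[i])[0]

random.seed(3)
x, path = 1, [1]
for _ in range(20):
    x = step(x)
    path.append(x)
print(path)   # one realization of X_0, X_1, ..., X_20
```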
Another direction for the development of probability theory that goes beyond independence is provided by the theory of martingales. On one level, martingales capture the notion of a fair gambling game, and although this view is interesting (and loyal to the origins of probability theory), the theory of martingales turns out to be an appropriate tool for many kinds of investigation (see Counting Process Methods in Survival Analysis). In particular, the theory of martingales provides the key to profound connections between the theory of Markov processes and the classical theory of harmonic functions.
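The simplest fair game is the symmetric random walk, and a brief simulation illustrates the defining martingale property that the expected fortune after one more fair bet equals the current fortune; the particular time and conditioning value below are arbitrary illustrative choices.

```python
import random

# A symmetric random walk S_n = X_1 + ... + X_n with P(X_i = +1) =
# P(X_i = -1) = 1/2 is the textbook example of a fair game, i.e. a
# martingale: E(S_{n+1} | S_1, ..., S_n) = S_n.  The sketch below checks
# this empirically by conditioning on the (arbitrary) event S_10 = 2.
random.seed(4)
trials, n, target = 200_000, 10, 2

next_values = []
for _ in range(trials):
    s_n = sum(random.choice((-1, 1)) for _ in range(n))
    if s_n == target:
        next_values.append(s_n + random.choice((-1, 1)))

print(sum(next_values) / len(next_values))   # close to 2, the current fortune
```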
References
a) Adams, W.J. (1974). The Life and Times of the Central Limit Theorem. Kaedmon Press, New York.
b) Chung, K.L. (1974). Elementary Probability with Stochastic Processes. Springer-Verlag, New York.
c) David, F.N. (1962). Games, Gods, and Gambling: The Origins and History of Probability from the Earliest Times to the Newtonian Era. Griffin, London.
d) Doob, J.L. (1994). The development of rigor in mathematical probability (1900-1950), in Development of Mathematics 1900-1950, J.-P. Pier, ed. Birkhäuser-Verlag, Basel.
e) Dudley, R.M. (1989). Real Analysis and Probability. Wadsworth-Brooks/Cole, Pacific Grove.
f) Durrett, R. (1991). Probability: Theory and Examples. Wadsworth-Brooks/Cole, Pacific Grove.
g) Einstein, A. (1905). On the movement of small particles suspended in a stationary liquid demanded by the molecular-kinetic theory of heat (in German). Ann. Phys. (Ser. 4) 17, 549-560.
h) Feller, W. (1968). An Introduction to Probability Theory and Its Applications, Vol. I, 3rd Ed. Wiley, New York.
i) Kolmogorov, A.N. (1933). Grundbegriffe der Wahrscheinlichkeitsrechnung. Springer-Verlag, Berlin. (English translation: N. Morrison (1956), Foundations of the Theory of Probability, Chelsea, New York.)
j) Stigler, S.M. (1986). The History of Statistics: The Measurement of Uncertainty before 1900. Harvard University Press, Cambridge, MA.
J.M. Steele