MIT 6.262S11 Lecture 2
2/7/11
Expectations

The distribution function of a rv X often contains more detail than necessary. The expectation \overline{X} = E[X] is sometimes all that is needed.
E[X] = \sum_i a_i \, p_X(a_i)   (discrete X with sample values a_i)

E[X] = \int_{-\infty}^{\infty} x f_X(x)\, dx   (X with a density)

E[X] = \int_0^{\infty} F_X^c(x)\, dx   (nonnegative X)

E[X] = -\int_{-\infty}^{0} F_X(x)\, dx + \int_0^{\infty} F_X^c(x)\, dx   (arbitrary X)
\sigma_X^2 = E[(X - \overline{X})^2]

[Figure: the complementary distribution function F_X^c(x) of a discrete rv with sample values a_1 < a_2 < a_3 < a_4; the area under the staircase F_X^c decomposes into rectangles of total area \sum_i a_i p_X(a_i).]
If X has a density, the same argument applies to every Riemann sum for \int x f_X(x)\, dx and thus to the limit. It is simpler and more fundamental to take \int F_X^c(x)\, dx as the general definition of E[X]. This is also useful in solving problems.
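As a quick numerical sanity check (not from the lecture), here is a minimal Python sketch, with a made-up PMF, showing that the complementary-distribution-function form agrees with the usual weighted sum for a nonnegative discrete rv:

```python
import numpy as np

# A small nonnegative discrete rv: values a_i with PMF p_X(a_i) (made up).
a = np.array([0.0, 1.0, 3.0, 7.0])
p = np.array([0.1, 0.4, 0.3, 0.2])

# E[X] as the weighted sum over sample values.
mean_from_pmf = np.sum(a * p)

# E[X] as the integral of the complementary CDF F_X^c(x) = Pr{X > x}.
# F_X^c is a staircase, constant between consecutive sample values, so the
# integral is an exact finite sum of rectangle areas.
mean_from_ccdf = sum(
    (hi - lo) * p[a > lo].sum()       # width * Pr{X > lo} on (lo, hi)
    for lo, hi in zip(a[:-1], a[1:])
)

print(mean_from_pmf)    # 2.7
print(mean_from_ccdf)   # 2.7
```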
Indicator random variables

For every event A in a probability model, an indicator rv I_A is defined where I_A(\omega) = 1 for \omega \in A and I_A(\omega) = 0 otherwise. Note that I_A is a binary rv.
[Figure: the distribution function F_{I_A}(x); it is 0 for x < 0, equals 1 - \Pr\{A\} for 0 \le x < 1, and equals 1 for x \ge 1.]
E[I_A] = \Pr\{A\}, \qquad \sigma_{I_A}^2 = \Pr\{A\}\,(1 - \Pr\{A\})
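A minimal simulation sketch (with a hypothetical event probability \Pr\{A\} = 0.3) confirming the mean and standard deviation of an indicator rv:

```python
import numpy as np

rng = np.random.default_rng(0)
p_A = 0.3                        # Pr{A}, an arbitrary choice for illustration

# Simulate the indicator I_A over many trials: 1 if A occurs, else 0.
I_A = (rng.random(1_000_000) < p_A).astype(float)

print(I_A.mean())                # ~0.3   = Pr{A}
print(I_A.std())                 # ~0.458 = sqrt(0.3 * 0.7)
```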
Multiple random variables

Is a random variable (rv) X specified by its distribution function F_X(x)? No, the relationship between rvs is important.
F_{XY}(x, y) = \Pr\big(\{\omega : X(\omega) \le x\} \cap \{\omega : Y(\omega) \le y\}\big)

The rvs X_1, \ldots, X_n are independent if, for all x_1, \ldots, x_n,

F_{X_1 \cdots X_n}(x_1, \ldots, x_n) = \prod_{m=1}^{n} F_{X_m}(x_m)
For discrete rvs, independence is more intuitive when stated in terms of conditional probabilities.
p_{X|Y}(x|y) = \frac{p_{XY}(x, y)}{p_Y(y)}
Then X and Y are independent if p_{X|Y}(x|y) = p_X(x) for all sample values x and y with p_Y(y) > 0. This essentially works for densities, but then \Pr\{Y = y\} = 0 (see notes). This is not very useful for distribution functions.

NitPick: If X_1, \ldots, X_n are independent, then all subsets of X_1, \ldots, X_n are independent. (This isn't always true for independent events.)
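A small Python sketch of this conditional-PMF test, using a made-up joint PMF constructed as an outer product so that independence holds by design:

```python
import numpy as np

# Joint PMF p_XY on a 2x3 grid; built as an outer product of hypothetical
# marginals, so X and Y are independent by construction.
p_X = np.array([0.4, 0.6])
p_Y = np.array([0.2, 0.5, 0.3])
p_XY = np.outer(p_X, p_Y)

# Conditional PMF p_{X|Y}(x|y) = p_XY(x,y) / p_Y(y), defined for p_Y(y) > 0.
p_X_given_Y = p_XY / p_XY.sum(axis=0, keepdims=True)

# Independence: every column of p_{X|Y} equals the marginal p_X.
print(np.allclose(p_X_given_Y, p_X[:, None]))   # True
```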
IID random variables

The random variables X_1, \ldots, X_n are independent and identically distributed (IID) if, for all x_1, \ldots, x_n,
FX (x1, . . . , xn) =
k=1
This product form works for PMFs and PDFs also. Consider a probability model in which R is the sam ple space and X is a rv. We can always create an extended model in which Rn is the sample space and X1, X2, . . . , Xn are IID rvs. We can further visualize n where X1, X2, . . . is a stochastic process of IID variables.
FX (xk )
We study the sample average, S_n/n = (X_1 + \cdots + X_n)/n. The laws of large numbers say that S_n/n essentially becomes deterministic as n \to \infty.

If the extended model corresponds to repeated experiments in the real world, then S_n/n corresponds to the arithmetic average in the real world. If X is the indicator rv for event A, then the sample average is the relative frequency of A.

Models can have two types of difficulties. In one, a sequence of real-world experiments is not sufficiently similar and isolated to correspond to the IID extended model. In the other, the IID extension is OK but the basic model is not. We learn about these problems here through study of the models.
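A brief simulation sketch (uniform rvs, an arbitrary choice) illustrating how S_n/n concentrates around the mean as n grows:

```python
import numpy as np

rng = np.random.default_rng(1)

# Sample averages S_n/n of IID uniform rvs (mean 1/2, sigma = 1/sqrt(12)).
# The standard deviation of S_n/n shrinks like sigma/sqrt(n).
for n in [4, 20, 50, 1000]:
    averages = rng.random((10_000, n)).mean(axis=1)   # 10,000 trials of S_n/n
    print(n, round(averages.mean(), 4), round(averages.std(), 4))
```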
Science, symmetry, analogies, earlier models, etc. are all used to model real-world situations.

Trivial example: Roll a white die and a red die. There are 36 sample outcomes, (i, j), 1 \le i, j \le 6, taken as equiprobable by symmetry.

Roll 2 indistinguishable white dice. The white and red outcomes (i, j) and (j, i) for i \ne j are now indistinguishable. There are now 21 finest-grain outcomes, but no sane person would use these as sample points. The appropriate sample space is the white/red sample space with an off-line recognition of what is distinguishable.

Neither the axioms nor experimentation motivate this model; i.e., modeling requires judgement and common sense.
Comparing models for similar situations and analyzing limited and defective models helps in clarifying fuzziness in a situation of interest.

Ultimately, as in all of science, some experimentation is needed. The outcome of an experiment is a sample point, not a probability. Experimentation with probability requires multiple trials. The outcome is modeled as a sample point in an extended version of the original model.

Experimental tests of an original model come from the laws of large numbers in the context of an extended model.
Laws of large numbers in pictures

Let X_1, X_2, \ldots, X_n be IID rvs with mean \overline{X} and variance \sigma^2. Let S_n = X_1 + \cdots + X_n. Then E[S_n] = n\overline{X} and \sigma_{S_n}^2 = n\sigma^2.
[Figure: the distribution functions F_{S_4}, F_{S_{20}}, and F_{S_{50}}.]
The center of the distribution varies with n and the spread \sigma_{S_n} varies with \sqrt{n}.
[Figure: the distribution functions of the sample average S_n/n for n = 4, 20, 50; as n increases they cluster ever more tightly around the mean.]
Note that S_n - n\overline{X} is a zero-mean rv with variance n\sigma^2. Thus (S_n - n\overline{X})/(\sigma\sqrt{n}) is zero mean, unit variance.
[Figure: the distribution functions of (S_n - n\overline{X})/(\sigma\sqrt{n}) for n = 4, 20, 50, converging to the Gaussian distribution function]

\Phi(y) = \int_{-\infty}^{y} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right) dx.
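A Python sketch (uniform summands and sample sizes chosen arbitrarily) comparing the empirical distribution of the normalized sum with \Phi:

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(2)

def Phi(y):
    """Gaussian distribution function, via the error function."""
    return 0.5 * (1 + erf(y / sqrt(2)))

# IID uniform summands (an arbitrary choice): mean 1/2, sigma = 1/sqrt(12).
n, trials = 50, 100_000
mean, sigma = 0.5, 1 / sqrt(12)

S_n = rng.random((trials, n)).sum(axis=1)
Z_n = (S_n - n * mean) / (sigma * sqrt(n))    # zero mean, unit variance

# Empirical distribution function of Z_n versus Phi at a few points.
for y in [-1.0, 0.0, 1.0]:
    print(y, (Z_n <= y).mean(), Phi(y))       # the two should be close
```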
Let Y_1, Y_2, \ldots, Y_n be IID binary rvs with

p_Y(1) = p > 0, \qquad p_Y(0) = 1 - p = q > 0
The n-tuple of k 1s followed by n-k 0s has probability p^k q^{n-k}. Each n-tuple with k 1s has this same probability. For p < 1/2, p^k q^{n-k} is largest at k = 0 and decreasing in k up to k = n. There are \binom{n}{k} n-tuples with k 1s. This is increasing in k for k < n/2 and then decreasing. Altogether,

p_{S_n}(k) = \binom{n}{k} p^k q^{n-k}, \qquad \text{where } S_n = Y_1 + \cdots + Y_n.
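A short Python check of this PMF formula, with arbitrary n and p, verifying that it sums to 1 and peaks near pn:

```python
from math import comb

# p_{S_n}(k) = C(n,k) p^k q^(n-k) for a Bernoulli sum; the maximum
# should sit near k = pn (here pn = 7.5).
n, p = 30, 0.25
q = 1 - p
pmf = [comb(n, k) * p**k * q**(n - k) for k in range(n + 1)]

k_max = max(range(n + 1), key=lambda k: pmf[k])
print(k_max)        # 7, i.e., near pn = 7.5
print(sum(pmf))     # 1.0 (it is a PMF)
```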
\frac{p_{S_n}(k+1)}{p_{S_n}(k)} = \frac{(n-k)\,p}{(k+1)\,q} \qquad (1)

This ratio is greater than 1 for k < pn - q and less than 1 for k > pn - q.
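As a sanity check on (1) (parameters arbitrary), a few lines of Python showing where the ratio crosses 1:

```python
from math import comb

# The ratio p_{S_n}(k+1)/p_{S_n}(k) = (n-k)p/((k+1)q) should cross 1
# at k = pn - q (here 7.5 - 0.75 = 6.75).
n, p = 30, 0.25
q = 1 - p

for k in range(5, 10):
    ratio = (n - k) * p / ((k + 1) * q)
    print(k, round(ratio, 4), ratio > 1)   # True up to k = 6, False from k = 7
```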
[Figure: p_{S_n}(k) plotted against k - pn; the PMF rises to its maximum near k = pn and falls thereafter.]
In other words, p_{S_n}(k), for fixed n, is increasing in k for k < pn and decreasing for k > pn.
With k = pn + i,

\ln \frac{p_{S_n}(pn+i+1)}{p_{S_n}(pn+i)} = \ln\left(1 - \frac{i}{nq}\right) - \ln\left(1 + \frac{i+1}{np}\right)
\ln\left(1 - \frac{i}{nq}\right) - \ln\left(1 + \frac{i+1}{np}\right) \approx -\frac{i}{nq} - \frac{i}{np} - \frac{1}{np} = -\frac{i}{npq} - \frac{1}{np}
where we have used 1/p + 1/q = 1/pq and the neglected terms are of order i^2/n^2. This says that these log unit-increment terms are essentially linear in i. We now have to combine these unit increments.
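A numerical sketch (arbitrary n and p) checking this linear-in-i approximation of the log unit increments against the exact PMF ratios:

```python
from math import comb, log

# Check ln[p(pn+i+1)/p(pn+i)] ~ -i/(npq) - 1/(np) for small i/n.
n, p = 400, 0.25
q = 1 - p
pn = int(n * p)                       # pn = 100; npq = 75, np = 100

def pmf(k):
    return comb(n, k) * p**k * q**(n - k)

for i in [-10, -3, 0, 3, 10]:
    exact = log(pmf(pn + i + 1) / pmf(pn + i))
    approx = -i / (n * p * q) - 1 / (n * p)
    print(i, round(exact, 5), round(approx, 5))   # close for |i| << n
```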
\ln \frac{p_{S_n}(pn+i+1)}{p_{S_n}(pn+i)} = -\frac{i}{npq} - \frac{1}{np}

Expressing an increment of j terms as a telescoping sum of j unit increments,

\ln \frac{p_{S_n}(pn+j)}{p_{S_n}(pn)} = \sum_{i=0}^{j-1} \left( -\frac{i}{npq} - \frac{1}{np} \right) = -\frac{j(j-1)}{2npq} - \frac{j}{np} \approx -\frac{j^2}{2npq}

where we have used 1 + 2 + \cdots + (j-1) = j(j-1)/2. We have also ignored terms linear in j since they are of the same order as a unit increment in j.
Finally,

p_{S_n}(pn+j) \approx p_{S_n}(pn) \exp\left(-\frac{j^2}{2npq}\right).

This applies for j both positive and negative, and is a quantized version of a Gaussian distribution, with the unknown scaling constant p_{S_n}(pn). Choosing this to get a PMF,

p_{S_n}(pn+j) \approx \frac{1}{\sqrt{2\pi npq}} \exp\left(-\frac{j^2}{2npq}\right),

which is the discrete PMF form of the central limit theorem. See Section 1.5.3 for a different approach.
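A final Python sketch (again with arbitrary n and p) comparing the exact binomial PMF near its peak with this Gaussian form:

```python
from math import comb, exp, pi, sqrt

# Compare the exact PMF p_{S_n}(pn + j) with the Gaussian approximation
# (1/sqrt(2*pi*npq)) * exp(-j^2 / (2*npq)).
n, p = 400, 0.25
q = 1 - p
pn = int(n * p)
npq = n * p * q

for j in [-20, -10, 0, 10, 20]:
    exact = comb(n, pn + j) * p**(pn + j) * q**(n - pn - j)
    gauss = exp(-j**2 / (2 * npq)) / sqrt(2 * pi * npq)
    print(j, round(exact, 6), round(gauss, 6))   # close agreement near the peak
```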
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.