MIT 6.262S11 Lecture 2
2/7/11
Expectations

The distribution function of a rv X often contains more detail than necessary. The expectation \overline{X} = E[X] is sometimes all that is needed.
E[X] = \sum_i a_i \, p_X(a_i)   (discrete X with sample values a_i)

E[X] = \int_{-\infty}^{\infty} x f_X(x)\, dx   (X with a density)

E[X] = \int_0^{\infty} F_X^c(x)\, dx   (nonnegative X)

E[X] = -\int_{-\infty}^{0} F_X(x)\, dx + \int_0^{\infty} F_X^c(x)\, dx   (arbitrary X)
\sigma_X^2 = E[(X - \overline{X})^2]

[Figure: the complementary distribution function F_X^c(x) of a discrete rv with sample values a_1 < a_2 < a_3 < a_4; the area under the staircase F_X^c decomposes into rectangles of total area \sum_i a_i p_X(a_i).]
If X has a density, the same argument applies to every Riemann sum for \int x f_X(x)\, dx and thus to the limit. It is simpler and more fundamental to take \int F_X^c(x)\, dx as the general definition of E[X]. This is also useful in solving problems.
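As a quick numerical sanity check (not from the lecture), here is a minimal Python sketch, with a made-up PMF, showing that the complementary-distribution-function form agrees with the usual weighted sum for a nonnegative discrete rv:

```python
import numpy as np

# A small nonnegative discrete rv: values a_i with PMF p_X(a_i) (made up).
a = np.array([0.0, 1.0, 3.0, 7.0])
p = np.array([0.1, 0.4, 0.3, 0.2])

# E[X] as the weighted sum over sample values.
mean_from_pmf = np.sum(a * p)

# E[X] as the integral of the complementary CDF F_X^c(x) = Pr{X > x}.
# F_X^c is a staircase, constant between consecutive sample values, so the
# integral is an exact finite sum of rectangle areas.
mean_from_ccdf = sum(
    (hi - lo) * p[a > lo].sum()       # width * Pr{X > lo} on (lo, hi)
    for lo, hi in zip(a[:-1], a[1:])
)

print(mean_from_pmf)    # 2.7
print(mean_from_ccdf)   # 2.7
```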
Indicator random variables

For every event A in a probability model, an indicator rv I_A is defined where I_A(\omega) = 1 for \omega \in A and I_A(\omega) = 0 otherwise. Note that I_A is a binary rv.
[Figure: the distribution function F_{I_A}(x); it is 0 for x < 0, equals 1 - \Pr\{A\} for 0 \le x < 1, and equals 1 for x \ge 1.]
E[I_A] = \Pr\{A\}, \qquad \sigma_{I_A}^2 = \Pr\{A\}\,(1 - \Pr\{A\})
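A minimal simulation sketch (with a hypothetical event probability \Pr\{A\} = 0.3) confirming the mean and standard deviation of an indicator rv:

```python
import numpy as np

rng = np.random.default_rng(0)
p_A = 0.3                        # Pr{A}, an arbitrary choice for illustration

# Simulate the indicator I_A over many trials: 1 if A occurs, else 0.
I_A = (rng.random(1_000_000) < p_A).astype(float)

print(I_A.mean())                # ~0.3   = Pr{A}
print(I_A.std())                 # ~0.458 = sqrt(0.3 * 0.7)
```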
Multiple random variables

Is a random variable (rv) X specified by its distribution function F_X(x)? No, the relationship between rvs is important.
F_{XY}(x, y) = \Pr\big(\{\omega : X(\omega) \le x\} \cap \{\omega : Y(\omega) \le y\}\big)

The rvs X_1, \ldots, X_n are independent if, for all x_1, \ldots, x_n,

F_{X_1 \cdots X_n}(x_1, \ldots, x_n) = \prod_{m=1}^{n} F_{X_m}(x_m)
For discrete rvs, independence is more intuitive when stated in terms of conditional probabilities.
p_{X|Y}(x|y) = \frac{p_{XY}(x, y)}{p_Y(y)}
Then X and Y are independent if p_{X|Y}(x|y) = p_X(x) for all sample values x and y with p_Y(y) > 0. This essentially works for densities, but then \Pr\{Y = y\} = 0 (see notes). This is not very useful for distribution functions.

NitPick: If X_1, \ldots, X_n are independent, then all subsets of X_1, \ldots, X_n are independent. (This isn't always true for independent events.)
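A small Python sketch of this conditional-PMF test, using a made-up joint PMF constructed as an outer product so that independence holds by design:

```python
import numpy as np

# Joint PMF p_XY on a 2x3 grid; built as an outer product of hypothetical
# marginals, so X and Y are independent by construction.
p_X = np.array([0.4, 0.6])
p_Y = np.array([0.2, 0.5, 0.3])
p_XY = np.outer(p_X, p_Y)

# Conditional PMF p_{X|Y}(x|y) = p_XY(x,y) / p_Y(y), defined for p_Y(y) > 0.
p_X_given_Y = p_XY / p_XY.sum(axis=0, keepdims=True)

# Independence: every column of p_{X|Y} equals the marginal p_X.
print(np.allclose(p_X_given_Y, p_X[:, None]))   # True
```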
IID random variables

The random variables X_1, \ldots, X_n are independent and identically distributed (IID) if, for all x_1, \ldots, x_n,
FX (x1, . . . , xn) =
k=1
This product form works for PMFs and PDFs also. Consider a probability model in which R is the sam ple space and X is a rv. We can always create an extended model in which Rn is the sample space and X1, X2, . . . , Xn are IID rvs. We can further visualize n where X1, X2, . . . is a stochastic process of IID variables.
FX (xk )
We study the sample average, S_n/n = (X_1 + \cdots + X_n)/n. The laws of large numbers say that S_n/n essentially becomes deterministic as n \to \infty.

If the extended model corresponds to repeated experiments in the real world, then S_n/n corresponds to the arithmetic average in the real world. If X is the indicator rv for event A, then the sample average is the relative frequency of A.

Models can have two types of difficulties. In one, a sequence of real-world experiments is not sufficiently similar and isolated to correspond to the IID extended model. In the other, the IID extension is OK but the basic model is not. We learn about these problems here through study of the models.
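A brief simulation sketch (uniform rvs, an arbitrary choice) illustrating how S_n/n concentrates around the mean as n grows:

```python
import numpy as np

rng = np.random.default_rng(1)

# Sample averages S_n/n of IID uniform rvs (mean 1/2, sigma = 1/sqrt(12)).
# The standard deviation of S_n/n shrinks like sigma/sqrt(n).
for n in [4, 20, 50, 1000]:
    averages = rng.random((10_000, n)).mean(axis=1)   # 10,000 trials of S_n/n
    print(n, round(averages.mean(), 4), round(averages.std(), 4))
```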
Science, symmetry, analogies, earlier models, etc. are all used to model real-world situations.

Trivial example: Roll a white die and a red die. There are 36 sample outcomes, (i, j), 1 \le i, j \le 6, taken as equiprobable by symmetry.

Roll 2 indistinguishable white dice. The white and red outcomes (i, j) and (j, i) for i \ne j are now indistinguishable. There are now 21 finest-grain outcomes, but no sane person would use these as sample points. The appropriate sample space is the white/red sample space with an off-line recognition of what is distinguishable.

Neither the axioms nor experimentation motivate this model; i.e., modeling requires judgement and common sense.
Comparing models for similar situations and analyzing limited and defective models helps in clarifying fuzziness in a situation of interest.

Ultimately, as in all of science, some experimentation is needed. The outcome of an experiment is a sample point, not a probability. Experimentation with probability requires multiple trials. The outcome is modeled as a sample point in an extended version of the original model.

Experimental tests of an original model come from the laws of large numbers in the context of an extended model.
Laws of large numbers in pictures

Let X_1, X_2, \ldots, X_n be IID rvs with mean \overline{X} and variance \sigma^2. Let S_n = X_1 + \cdots + X_n. Then E[S_n] = n\overline{X} and \sigma_{S_n}^2 = n\sigma^2.
[Figure: the distribution functions F_{S_4}, F_{S_{20}}, and F_{S_{50}}.]
The center of the distribution varies with n and the spread \sigma_{S_n} varies with \sqrt{n}.
[Figure: the distribution functions of the sample average S_n/n for n = 4, 20, 50; as n increases they cluster ever more tightly around the mean.]
Note that S_n - n\overline{X} is a zero-mean rv with variance n\sigma^2. Thus (S_n - n\overline{X})/(\sigma\sqrt{n}) is zero mean, unit variance.
[Figure: the distribution functions of (S_n - n\overline{X})/(\sigma\sqrt{n}) for n = 4, 20, 50, converging to the Gaussian distribution function]

\Phi(y) = \int_{-\infty}^{y} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right) dx.
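A Python sketch (uniform summands and sample sizes chosen arbitrarily) comparing the empirical distribution of the normalized sum with \Phi:

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(2)

def Phi(y):
    """Gaussian distribution function, via the error function."""
    return 0.5 * (1 + erf(y / sqrt(2)))

# IID uniform summands (an arbitrary choice): mean 1/2, sigma = 1/sqrt(12).
n, trials = 50, 100_000
mean, sigma = 0.5, 1 / sqrt(12)

S_n = rng.random((trials, n)).sum(axis=1)
Z_n = (S_n - n * mean) / (sigma * sqrt(n))    # zero mean, unit variance

# Empirical distribution function of Z_n versus Phi at a few points.
for y in [-1.0, 0.0, 1.0]:
    print(y, (Z_n <= y).mean(), Phi(y))       # the two should be close
```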
Let Y_1, Y_2, \ldots, Y_n be IID binary rvs with

p_Y(1) = p > 0, \qquad p_Y(0) = 1 - p = q > 0
The n-tuple of k 1s followed by n-k 0s has probability p^k q^{n-k}. Each n-tuple with k 1s has this same probability. For p < 1/2, p^k q^{n-k} is largest at k = 0 and decreasing in k up to k = n. There are \binom{n}{k} n-tuples with k 1s. This is increasing in k for k < n/2 and then decreasing. Altogether,

p_{S_n}(k) = \binom{n}{k} p^k q^{n-k}, \qquad \text{where } S_n = Y_1 + \cdots + Y_n.
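A short Python check of this PMF formula, with arbitrary n and p, verifying that it sums to 1 and peaks near pn:

```python
from math import comb

# p_{S_n}(k) = C(n,k) p^k q^(n-k) for a Bernoulli sum; the maximum
# should sit near k = pn (here pn = 7.5).
n, p = 30, 0.25
q = 1 - p
pmf = [comb(n, k) * p**k * q**(n - k) for k in range(n + 1)]

k_max = max(range(n + 1), key=lambda k: pmf[k])
print(k_max)        # 7, i.e., near pn = 7.5
print(sum(pmf))     # 1.0 (it is a PMF)
```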
\frac{p_{S_n}(k+1)}{p_{S_n}(k)} = \frac{(n-k)\,p}{(k+1)\,q} \qquad (1)

This ratio is greater than 1 for k < pn - q and less than 1 for k > pn - q.
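As a sanity check on (1) (parameters arbitrary), a few lines of Python showing where the ratio crosses 1:

```python
from math import comb

# The ratio p_{S_n}(k+1)/p_{S_n}(k) = (n-k)p/((k+1)q) should cross 1
# at k = pn - q (here 7.5 - 0.75 = 6.75).
n, p = 30, 0.25
q = 1 - p

for k in range(5, 10):
    ratio = (n - k) * p / ((k + 1) * q)
    print(k, round(ratio, 4), ratio > 1)   # True up to k = 6, False from k = 7
```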
[Figure: p_{S_n}(k) plotted against k - pn; the PMF rises to its maximum near k = pn and falls thereafter.]
In other words, p_{S_n}(k), for fixed n, is increasing in k for k < pn and decreasing for k > pn.
With k = pn + i,

\ln \frac{p_{S_n}(pn+i+1)}{p_{S_n}(pn+i)} = \ln\left(1 - \frac{i}{nq}\right) - \ln\left(1 + \frac{i+1}{np}\right)
\ln\left(1 - \frac{i}{nq}\right) - \ln\left(1 + \frac{i+1}{np}\right) \approx -\frac{i}{nq} - \frac{i}{np} - \frac{1}{np} = -\frac{i}{npq} - \frac{1}{np}
where we have used 1/p + 1/q = 1/pq and the neglected terms are of order i^2/n^2. This says that these log unit-increment terms are essentially linear in i. We now have to combine these unit increments.
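A numerical sketch (arbitrary n and p) checking this linear-in-i approximation of the log unit increments against the exact PMF ratios:

```python
from math import comb, log

# Check ln[p(pn+i+1)/p(pn+i)] ~ -i/(npq) - 1/(np) for small i/n.
n, p = 400, 0.25
q = 1 - p
pn = int(n * p)                       # pn = 100; npq = 75, np = 100

def pmf(k):
    return comb(n, k) * p**k * q**(n - k)

for i in [-10, -3, 0, 3, 10]:
    exact = log(pmf(pn + i + 1) / pmf(pn + i))
    approx = -i / (n * p * q) - 1 / (n * p)
    print(i, round(exact, 5), round(approx, 5))   # close for |i| << n
```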
\ln \frac{p_{S_n}(pn+i+1)}{p_{S_n}(pn+i)} = -\frac{i}{npq} - \frac{1}{np}

Expressing an increment of j terms as a telescoping sum of j unit increments,

\ln \frac{p_{S_n}(pn+j)}{p_{S_n}(pn)} = \sum_{i=0}^{j-1} \left( -\frac{i}{npq} - \frac{1}{np} \right) = -\frac{j(j-1)}{2npq} - \frac{j}{np} \approx -\frac{j^2}{2npq}

where we have used 1 + 2 + \cdots + (j-1) = j(j-1)/2. We have also ignored terms linear in j since they are of the same order as a unit increment in j.
Finally,

p_{S_n}(pn+j) \approx p_{S_n}(pn) \exp\left(-\frac{j^2}{2npq}\right).

This applies for j both positive and negative, and is a quantized version of a Gaussian distribution, with the unknown scaling constant p_{S_n}(pn). Choosing this to get a PMF,

p_{S_n}(pn+j) \approx \frac{1}{\sqrt{2\pi npq}} \exp\left(-\frac{j^2}{2npq}\right),

which is the discrete PMF form of the central limit theorem. See Section 1.5.3 for a different approach.
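A final Python sketch (again with arbitrary n and p) comparing the exact binomial PMF near its peak with this Gaussian form:

```python
from math import comb, exp, pi, sqrt

# Compare the exact PMF p_{S_n}(pn + j) with the Gaussian approximation
# (1/sqrt(2*pi*npq)) * exp(-j^2 / (2*npq)).
n, p = 400, 0.25
q = 1 - p
pn = int(n * p)
npq = n * p * q

for j in [-20, -10, 0, 10, 20]:
    exact = comb(n, pn + j) * p**(pn + j) * q**(n - pn - j)
    gauss = exp(-j**2 / (2 * npq)) / sqrt(2 * pi * npq)
    print(j, round(exact, 6), round(gauss, 6))   # close agreement near the peak
```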
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.