Mathematics for Informatics 4a: The Story of the Film So Far...


Mathematics for Informatics 4a
José Figueroa-O’Farrill
Lecture 14, 9 March 2012

The story of the film so far...

- X, Y independent random variables and Z = X + Y:
  f_Z = f_X ∗ f_Y, where ∗ is the convolution
- X, Y with joint density f(x, y) and Z = g(X, Y):
  E(Z) = ∬ g(x, y) f(x, y) dx dy
- X, Y independent:
  E(XY) = E(X) E(Y)
  Var(X + Y) = Var(X) + Var(Y)
  M_{X+Y}(t) = M_X(t) M_Y(t), where M_X(t) = E(e^{tX})
- We defined covariance and correlation of two r.v.s
- Proved Markov’s and Chebyshev’s inequalities
- Proved the (weak) law of large numbers and the Chernoff bound
- Waiting times of Poisson processes are exponentially distributed

More approximations

In Lecture 7 we saw that the binomial distribution with parameters n, p can
be approximated by a Poisson distribution with parameter λ in the limit as
n → ∞, p → 0 but np → λ:

  C(n, k) p^k (1 - p)^{n-k} ∼ e^{-λ} λ^k / k!

But what about if n → ∞ but p ↛ 0?

For example, consider flipping a fair coin n times and let X denote the
discrete random variable which counts the number of heads. Then

  P(X = k) = C(n, k) / 2^n

Lectures 6 and 7: this distribution has µ = n/2 and σ² = n/4.

[Plots: the distribution of X for n = 5, 10, 20, 50.]
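The Poisson approximation can be checked numerically. The following Python sketch (illustrative only, not part of the lecture) fixes λ = np = 2 and lets n grow with p = λ/n → 0; the binomial probabilities approach the Poisson ones:

```python
from math import comb, exp, factorial

def binom_pmf(n, p, k):
    """Exact binomial probability C(n, k) p^k (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(lam, k):
    """Poisson probability e^(-lam) lam^k / k!."""
    return exp(-lam) * lam**k / factorial(k)

# Fix lam = n*p = 2 and let n grow: the binomial pmf converges to the
# Poisson pmf pointwise in k, and the worst-case gap shrinks with n.
lam = 2.0
for n in (10, 100, 1000):
    p = lam / n
    err = max(abs(binom_pmf(n, p, k) - poisson_pmf(lam, k)) for k in range(10))
    print(f"n = {n:4d}: max_k |binomial - Poisson| = {err:.5f}")
```

The discrepancy shrinks roughly like λ²/n, which is the standard rate for this approximation.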
Normal limit of (symmetric) binomial distribution

Theorem
Let X be binomial with parameters n and p = 1/2. Then for n large and
k - n/2 not too large,

  C(n, k) (1/2)^n ≃ √(2/(nπ)) e^{-2(k - n/2)²/n} = (1/√(2πσ²)) e^{-(k - µ)²/(2σ²)}

for µ = n/2 and σ² = n/4.

The proof rests on the de Moivre/Stirling formula for the factorial of a
large number:

  n! ≃ √(2πn) n^n e^{-n}

which implies that

  C(n, n/2) = n! / ((n/2)! (n/2)!) ≃ √(2/(πn)) 2^n

Proof
Let k = n/2 + x. Then

  C(n, k) 2^{-n} = n! 2^{-n} / ((n/2 + x)! (n/2 - x)!)
                 = [n! 2^{-n} / ((n/2)! (n/2)!)]
                   × [(n/2)(n/2 - 1) ··· (n/2 - (x - 1))]
                   / [(n/2 + 1)(n/2 + 2) ··· (n/2 + x)]
                 ≃ √(2/(nπ))
                   × [1 · (1 - 2/n) ··· (1 - (x - 1)(2/n))]
                   / [(1 + 2/n)(1 + 2(2/n)) ··· (1 + x(2/n))]

Now we use the exponential approximation

  1 - z ≃ e^{-z}   and   1/(1 + z) ≃ e^{-z}

(valid for z small) to rewrite the big fraction on the RHS.
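Before continuing the proof, the theorem's approximation is easy to sanity-check numerically (an illustrative Python sketch, not from the slides). For n = 100 the exact symmetric binomial probabilities and the normal curve agree to about three decimal places near k = n/2:

```python
from math import comb, exp, pi, sqrt

def exact_pmf(n, k):
    """Exact P(X = k) for X ~ Binomial(n, 1/2)."""
    return comb(n, k) / 2**n

def normal_approx(n, k):
    """The theorem's approximation: sqrt(2/(n*pi)) * exp(-2*(k - n/2)^2 / n)."""
    return sqrt(2 / (n * pi)) * exp(-2 * (k - n / 2) ** 2 / n)

# Compare at k near n/2, where the theorem applies.
n = 100
for k in (50, 55, 60):
    print(f"k = {k}: exact = {exact_pmf(n, k):.6f}, "
          f"approx = {normal_approx(n, k):.6f}")
```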

Proof – continued

  C(n, k) 2^{-n} ≃ √(2/(nπ)) exp(-4/n - 8/n - ··· - 4(x - 1)/n - 2x/n)
                 = √(2/(nπ)) exp(-(4/n)(1 + 2 + ··· + (x - 1)) - 2x/n)
                 = √(2/(nπ)) exp(-(4/n)(x(x - 1)/2) - 2x/n)
                 = √(2/(nπ)) e^{-2x²/n}

which is indeed a normal distribution with σ² = n/4.

A similar proof shows that the general binomial distribution with µ = np
and σ² = np(1 - p) is also approximated by a normal distribution with the
same µ and σ².

Example (Rolling a die ad nauseam)
It’s raining outside, you are bored and you roll a fair die 12000 times.
Let X be the number of sixes. What is P(1900 < X < 2200)?

The variable X is the sum X_1 + ··· + X_{12000}, where X_i is the number of
sixes on the ith roll. This means that X is binomially distributed with
parameters n = 12000 and p = 1/6, so µ = np = 2000 and
σ² = np(1 - p) = 5000/3.

X ∈ (1900, 2200) iff (X - 2000)/σ ∈ (-√6, 2√6), whence

  P(1900 < X < 2200) ≃ Φ(2√6) - Φ(-√6) ≃ 0.992847

The exact result is

  Σ_{k=1901}^{2199} C(12000, k) (1/6)^k (5/6)^{12000-k} ≃ 0.992877
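Both numbers in the die example can be reproduced with a short Python sketch (illustrative, not from the lecture). One practical wrinkle: (1/6)^2000 underflows to 0 in floating point, so the binomial pmf is evaluated in log space via `lgamma`:

```python
from math import erf, exp, lgamma, log, sqrt

def Phi(x):
    """Standard normal c.d.f., via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def log_binom_pmf(n, p, k):
    """log of C(n, k) p^k (1-p)^(n-k); log space avoids float underflow."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p))

n, p = 12000, 1 / 6

# Exact: sum the binomial pmf over 1901 <= k <= 2199.
exact = sum(exp(log_binom_pmf(n, p, k)) for k in range(1901, 2200))

# Normal approximation from the slide: Phi(2*sqrt(6)) - Phi(-sqrt(6)).
approx = Phi(2 * sqrt(6)) - Phi(-sqrt(6))

print(f"exact  = {exact:.6f}")   # ~ 0.992877 as on the slide
print(f"approx = {approx:.6f}")  # ~ 0.992847
```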
Normal limit of Poisson distribution

[Plots: Poisson distributions for λ = 5, 10, 20, 50.]

We have just shown that in certain limits of the defining parameters, two
discrete probability distributions tend to normal distributions:

- the binomial distribution in the limit n → ∞,
- the Poisson distribution in the limit λ → ∞.

What about continuous probability distributions? We could try with the
uniform or exponential distributions:

[Plots: the uniform and exponential probability density functions.]

No amount of rescaling is going to work. Why?

The binomial and Poisson distributions have the following property:

- if X, Y are binomially distributed with parameters (n, p) and (m, p),
  X + Y is binomially distributed with parameters (n + m, p)
- if X, Y are Poisson distributed with parameters λ and µ, X + Y is Poisson
  distributed with parameter λ + µ

It follows that if X_1, X_2, ... are i.i.d. with binomial distribution with
parameters (m, p), then X_1 + ··· + X_n is binomial with parameters (nm, p).
Therefore m large is equivalent to adding many of the X_i.

It also follows that if X_1, X_2, ... are i.i.d. with Poisson distribution
with parameter λ, then X_1 + ··· + X_n is Poisson distributed with parameter
nλ, and again λ large is equivalent to adding a large number of the X_i.

The situation with the uniform and exponential distributions is different.

Sum of uniformly distributed variables

X_i i.i.d. uniformly distributed on [0, 1]: then X_1 + ··· + X_n is
distributed as in the plots below.

[Plots: the density of X_1 + ··· + X_n for n = 1, 2, 3, 4.]
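The emerging bell shape of the sum of uniforms can be seen in a quick Monte Carlo sketch (illustrative Python, not from the lecture). Here the empirical mean and variance of the sum are compared with the exact values n/2 and n/12:

```python
import random

random.seed(0)   # fixed seed for reproducibility
N = 100_000      # samples per experiment

# Sample X_1 + ... + X_n for X_i ~ Uniform[0, 1] and compare the empirical
# mean and variance of the sum with the exact n/2 and n/12.
for n in (1, 2, 3, 4):
    sums = [sum(random.random() for _ in range(n)) for _ in range(N)]
    mean = sum(sums) / N
    var = sum((s - mean) ** 2 for s in sums) / N
    print(f"n = {n}: mean = {mean:.3f} (exact {n/2:.3f}), "
          f"var = {var:.3f} (exact {n/12:.3f})")
```

A histogram of `sums` for n = 3 or 4 already looks close to the normal curve, matching the plots above.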
Sum of exponentially distributed variables

If X_i are i.i.d. exponentially distributed with parameter λ, we already
saw that Z_2 = X_1 + X_2 has a “gamma” probability density function:

  f_{Z_2}(z) = λ² z e^{-λz}

It is not hard to show that Z_n = X_1 + ··· + X_n has probability density
function

  f_{Z_n}(z) = λ^n (z^{n-1} / (n - 1)!) e^{-λz}

What happens when we take n large?

[Plots: f_{Z_n} for n = 5, 10, 20, 75.]
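The gamma density above can be sanity-checked by crude numerical integration (an illustrative Python sketch, not part of the slides): it should integrate to 1 and have mean n/λ, since it is the density of a sum of n exponentials each with mean 1/λ:

```python
from math import exp, factorial

def f_Zn(n, lam, z):
    """Density of Z_n = X_1 + ... + X_n from the slide:
    lam^n z^(n-1) e^(-lam z) / (n-1)!."""
    return lam**n * z ** (n - 1) * exp(-lam * z) / factorial(n - 1)

# Crude Riemann-sum check on [0, 60], where essentially all the mass lives
# for these parameters: total mass ~ 1 and mean ~ n/lam.
lam, n, dz = 1.0, 5, 0.001
zs = [i * dz for i in range(1, 60_000)]
total = sum(f_Zn(n, lam, z) * dz for z in zs)
mean = sum(z * f_Zn(n, lam, z) * dz for z in zs)
print(f"integral = {total:.4f}, mean = {mean:.4f}")  # ~ 1.0 and ~ 5.0
```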

The Central Limit Theorem

Let X_1, X_2, ... be i.i.d. random variables with mean µ and variance σ².
Let Z_n = X_1 + ··· + X_n.

Then Z_n has mean nµ and variance nσ², but in addition we have

Theorem (Central Limit Theorem)
In the limit as n → ∞,

  P((Z_n - nµ)/(√n σ) ≤ x) → Φ(x)

with Φ the c.d.f. of the standard normal distribution. In other words, for
n large, Z_n is normally distributed.

Our 4-line proof of the CLT rests on Lévy’s continuity law, which we will
not prove. Paraphrasing: “the m.g.f. determines the c.d.f.” It is then
enough to show that the limit n → ∞ of the m.g.f. of (Z_n - nµ)/(√n σ) is
the m.g.f. of the standard normal distribution.

Proof of CLT
- We shift the mean: the variables Y_i = X_i - µ are i.i.d. with mean 0 and
  variance σ², and Z_n - nµ = Y_1 + ··· + Y_n.
- M_{Z_n - nµ}(t) = M_{Y_1}(t) ··· M_{Y_n}(t) = M_{Y_1}(t)^n, by i.i.d.
- M_{(Z_n - nµ)/(√n σ)}(t) = M_{Z_n - nµ}(t/(√n σ)) = M_{Y_1}(t/(√n σ))^n
- M_{(Z_n - nµ)/(√n σ)}(t) = (1 + σ²t²/(2nσ²) + ···)^n → e^{t²/2}, which is
  the m.g.f. of a standard normal variable.
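The limit in the last step of the proof can be watched numerically. As an illustrative sketch (the choice of distribution is mine, not from the slides), take Y_i = ±1 with probability 1/2 each, so the mean is 0, σ = 1 and M_Y(t) = cosh(t); the proof says M_Y(t/√n)^n → e^{t²/2}:

```python
from math import cosh, exp, sqrt

# Y_i = ±1 with probability 1/2 each: mean 0, variance 1, and
# m.g.f. M_Y(t) = E(e^{tY}) = (e^t + e^{-t})/2 = cosh(t).
t = 1.0
target = exp(t**2 / 2)   # m.g.f. of the standard normal at t
for n in (10, 100, 10_000):
    mgf_n = cosh(t / sqrt(n)) ** n   # M_Y(t/sqrt(n))^n from the proof
    print(f"n = {n:5d}: {mgf_n:.6f}  (target {target:.6f})")
```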
Crucial observation
The CLT holds regardless of how the X_i are distributed! The sum of any
large number of i.i.d. random variables always tends to a normal
distribution. This also explains why normal distributions are so popular in
probabilistic modelling. Let us look at a few examples.

Example (Rounding errors)
Suppose that you round off 108 numbers to the nearest integer, and then add
them to get the total S. Assume that the rounding errors are independent
and uniform on [-1/2, 1/2]. What is the probability that S is wrong by more
than 3? more than 6?

Let Z = X_1 + ··· + X_{108}. We may approximate it by a normal distribution
with µ = 0 and σ² = 108/12 = 9, whence σ = 3. S is wrong by more than 3 iff
|Z| > 3, i.e. |Z - µ|/σ > 1, and hence

  P(|Z - µ| > σ) = 1 - P(|Z - µ| ≤ σ) = 1 - (2Φ(1) - 1) = 2(1 - Φ(1)) ≃ 0.3174

S is wrong by more than 6 iff |Z - µ|/σ > 2 and hence

  P(|Z - µ| > 2σ) = 1 - P(|Z - µ| ≤ 2σ) = 1 - (2Φ(2) - 1) = 2(1 - Φ(2)) ≃ 0.0456
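The two tail probabilities follow from the standard normal c.d.f., which Python exposes through the error function (an illustrative sketch, not part of the lecture):

```python
from math import erf, sqrt

def Phi(x):
    """Standard normal c.d.f.: Phi(x) = (1 + erf(x/sqrt(2)))/2."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

# Z = sum of 108 rounding errors, approximately N(0, 9), so sigma = 3.
sigma = sqrt(108 / 12)
p3 = 2 * (1 - Phi(3 / sigma))   # P(S off by more than 3) = P(|Z - mu| > sigma)
p6 = 2 * (1 - Phi(6 / sigma))   # P(S off by more than 6) = P(|Z - mu| > 2 sigma)
print(f"P(off by > 3) = {p3:.4f}")   # ~ 0.3174
print(f"P(off by > 6) = {p6:.4f}")   # ~ 0.0456
```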

Place your bets!

Example (Roulette)
A roulette wheel has 38 slots: the numbers 1 to 36 (18 black, 18 red) and
the numbers 0 and 00 in green. You place a £1 bet on whether the ball will
land on a red or black slot and win £1 if it does. Otherwise you lose the
bet. Therefore you win £1 with probability 18/38 = 9/19 and you “win” -£1
with probability 20/38 = 10/19. After 361 spins of the wheel, what is the
probability that you are ahead? (Notice that 361 = 19².)

Example (Roulette – continued)
Let X_i denote your winnings on the ith spin of the wheel. Then
P(X_i = 1) = 9/19 and P(X_i = -1) = 10/19. The mean is therefore

  µ = P(X_i = 1) - P(X_i = -1) = -1/19

and the variance is

  σ² = P(X_i = 1) + P(X_i = -1) - µ² = 1 - 1/361 = 360/361

Then Z = X_1 + ··· + X_{361} has mean -19 and variance 360. This means that
after 361 spins you are down £19 on average. We are after the probability
P(Z > 0):

  P(Z > 0) = P((Z + 19)/√360 > 19/√360)
           = 1 - P((Z + 19)/√360 ≤ 19/√360)
           ≃ 1 - Φ(1) ≃ 0.1587

So there is about a 16% chance that you are ahead.
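Since Z = 2W - 361, where W is the number of winning spins, Z > 0 exactly when W ≥ 181, and the exact answer is a binomial tail sum. This Python sketch (illustrative, not from the slides) compares it with the CLT estimate; note the slide rounds 19/√360 ≈ 1.0014 to 1:

```python
from math import comb, erf, sqrt

def Phi(x):
    """Standard normal c.d.f."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

# You are ahead after 361 spins iff you win at least 181 of them, where
# the number of wins W is Binomial(361, 9/19).
p_win = 9 / 19
exact = sum(comb(361, k) * p_win**k * (1 - p_win) ** (361 - k)
            for k in range(181, 362))

# CLT estimate from the slide, without rounding 19/sqrt(360) to 1.
approx = 1 - Phi(19 / sqrt(360))
print(f"exact  = {exact:.4f}")
print(f"approx = {approx:.4f}")
```

Both come out near 0.16, confirming the slide's "about a 16% chance".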
Example (Measurements in astronomy)
Astronomical measurements are subject to the vagaries of weather conditions
and other sources of errors. Hence in order to estimate, say, the distance
to a star one takes the average of many measurements. Let us assume that
different measurements are i.i.d. with mean d (the distance to the star)
and variance 4 (light-years²). How many measurements should we take to be
“reasonably sure” that the estimated distance is accurate to within half a
light-year?

Let X_i denote the measurements and Z_n = X_1 + ··· + X_n. Let’s say that
“reasonably sure” means 95%, which is 2σ in the standard normal
distribution. (In Particle Physics, “reasonably sure” means 5σ, but this is
Astronomy.) Then we are after n such that

  P(|Z_n/n - d| ≤ 0.5) ≃ 0.95

Example (Measurements in astronomy – continued)
By the CLT we can assume that (Z_n - nd)/(2√n) is standard normal, so we
are after

  P(|(Z_n - nd)/(2√n)| ≤ √n/4) ≃ 0.95

or, equivalently, √n/4 = 2, so that n = 64.

A question remains: is n = 64 large enough for the CLT? To answer it, we
need to know more about the distribution of the X_i. However Chebyshev’s
inequality can be used to provide a safe n. Since E(Z_n/n) = d and
Var(Z_n/n) = 4/n, Chebyshev’s inequality says

  P(|Z_n/n - d| > 0.5) ≤ 4/(n(0.5)²) = 16/n

so choosing n = 320 gives P(|Z_n/n - d| > 0.5) ≤ 0.05, or
P(|Z_n/n - d| ≤ 0.5) ≥ 0.95 as desired.
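Both sample-size calculations are one-liners; this sketch (illustrative Python, not from the lecture) recovers n = 64 from the CLT criterion and n = 320 from the Chebyshev bound:

```python
from math import ceil, sqrt

var, half = 4.0, 0.5   # measurement variance and desired accuracy
sigma = sqrt(var)

# CLT: the sample mean has standard deviation sigma/sqrt(n); requiring
# half = 2 * sigma/sqrt(n) (a 2-sigma, ~95% interval) gives n.
n_clt = ceil((2 * sigma / half) ** 2)
print(n_clt)    # 64

# Chebyshev: P(|Zn/n - d| > half) <= var/(n * half^2); require <= 0.05.
n_cheb = ceil(var / (half**2 * 0.05))
print(n_cheb)   # 320
```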

Summary

- The binomial distribution with parameters n, p can be approximated by a
  normal distribution (with the same mean and variance) for n large.
- Similarly for the Poisson distribution with parameter λ as λ → ∞.
- These are special cases of the Central Limit Theorem: if the X_i are
  i.i.d. with mean µ and (nonzero) variance σ², the sum Z_n = X_1 + ··· + X_n
  for n large is normally distributed.
- We saw some examples of the use of the CLT: rounding errors, a roulette
  game, and astronomical measurements.
