Chapter 1
Stochastic Variables
1.1.2 Solution
For one die, the range is {1, 2, 3, 4, 5, 6}. The probability distribution is 1/6 for
each. For two dice, the range is {2, 3, . . . , 12}. The probability distribution is
$$\frac{1}{36}\,\{1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1\}\,.$$
1.2.2 Solution
The probability that any one flip is heads is $2^{-1}$. Therefore, the probability of any one particular string of heads and tails is $2^{-N}$. We have to multiply that by the number of distinct strings with $n$ heads and $m$ tails, where
$$n + m = N, \qquad n - m = g.$$
The result is
$$P(g|N) = \frac{N!}{\left(\frac{N+g}{2}\right)!\left(\frac{N-g}{2}\right)!}\,2^{-N}.$$
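As a quick sanity check (a minimal sketch of my own, not part of the original solution; the function name is mine), this distribution can be evaluated directly and confirmed to be normalized:

    from math import comb

    def p_net_heads(g, N):
        """Probability that (heads - tails) = g after N fair coin flips."""
        if (N + g) % 2 != 0 or abs(g) > N:
            return 0.0
        return comb(N, (N + g) // 2) * 2.0**(-N)

    # g ranges over -N, -N+2, ..., N; the probabilities should sum to 1.
    N = 10
    print(sum(p_net_heads(g, N) for g in range(-N, N + 1, 2)))  # 1.0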
1.3.2 Solution
1.4.2 Solution
where $\gamma = V_1/V_2$.
1.5.2 Solution
This is just like the coin problem. Each time we place a molecule, it goes into
V1 with probability
$$p = \frac{V_1}{V_1 + V_2} = 1 - (1+\gamma)^{-1}$$
and it goes into $V_2$ with probability
$$q = 1 - p = \frac{V_2}{V_1 + V_2} = (1+\gamma)^{-1}.$$
Imagine a string of labels $\{1, 2, 2, 1, 2, 2, 2, 1, 1, \ldots\}$ where the $i$th label tells you
which volume the $i$th particle is in. The probability of any one particular arrangement with $n$ particles in $V_1$ and $N-n$ particles in $V_2$ is $p^n q^{N-n}$. There are $N!$ ways to rearrange all the labels, $n!$ ways to rearrange the "1" labels, and $(N-n)!$ ways to rearrange the "2" labels. Therefore, the probability to find $n$ "1" labels is
$$\begin{aligned}
P(n) &= \frac{N!}{n!(N-n)!}\, p^n q^{N-n} \\
 &= \frac{N!}{n!(N-n)!}\,\bigl(1-(1+\gamma)^{-1}\bigr)^n (1+\gamma)^{-(N-n)} \\
\text{(algebra)}\quad &= \binom{N}{n}\gamma^n (1+\gamma)^{-N}.
\end{aligned}$$
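As an illustration (a hedged sketch of my own, not part of the original solution), the formula can be checked against a direct simulation in which each of the $N$ molecules is placed in $V_1$ with probability $p = \gamma/(1+\gamma)$:

    import random
    from math import comb

    def p_analytic(n, N, gamma):
        """P(n) = C(N, n) * gamma^n * (1 + gamma)^(-N)."""
        return comb(N, n) * gamma**n * (1 + gamma)**(-N)

    def p_simulated(n, N, gamma, trials=200_000):
        """Place N molecules independently; count how often exactly n land in V1."""
        p = gamma / (1 + gamma)
        hits = sum(1 for _ in range(trials)
                   if sum(random.random() < p for _ in range(N)) == n)
        return hits / trials

    N, gamma = 20, 0.5
    print(p_analytic(8, N, gamma), p_simulated(8, N, gamma))  # the two should agree closely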
1.6.2 Solution
We solve the problem by first assuming that the balls are all distinguishable,
and that we draw them from the urn in a particular order. We will then apply
one correction for the fact that the black (white) balls are indistinguishable from
each other and another to account for the ordering.
Suppose that the balls, in addition to having color, have some other trait that
makes them each completely distinguishable, e.g. they could each be numbered.
Imagine a particular case where we extract M balls one by one, getting first n
white balls and then k black balls:
$$w_1, w_2, \ldots, w_n, b_1, b_2, \ldots, b_k\,.$$
What is the probability of this precise string? The probability of drawing ball $w_1$ on the first draw is $N_w/(N_w+N_b)$. Then, with ball $w_1$ already removed, the probability of drawing ball $w_2$ on the second draw is $(N_w-1)/(N_w+N_b-1)$.
Continuing this reasoning, we find the probability P of the string to be
$$P = \underbrace{\frac{N_w}{N_w+N_b}\,\frac{N_w-1}{N_w+N_b-1}\cdots\frac{N_w-(n-1)}{N_w+N_b-n+1}}_{\text{probability of white balls}}
\;\underbrace{\frac{N_b}{N_w+N_b-n}\,\frac{N_b-1}{N_w+N_b-n-1}\cdots\frac{N_b-(k-1)}{N_w+N_b-n-k+1}}_{\text{probability of black balls}}
= \frac{N_w!}{(N_w-n)!}\,\frac{N_b!}{(N_b-k)!}\,\frac{(N_w+N_b-n-k)!}{(N_w+N_b)!}\,.$$
Now of course, we’re really trying to find the probability of getting n indistin-
guishable white balls and k indistinguishable black ones. Therefore, shuffling
the n white balls or k black balls among themselves does not result in a uniquely
distinguishable arrangement of the draws. To correct for the overcounting, we
should divide by n!k!. Furthermore, we don’t care what order we draw the balls
in, so we should multiply by (n + k)!. The result is
$$\frac{N_w!}{(N_w-n)!\,n!}\;\frac{N_b!}{(N_b-k)!\,k!}\;\frac{(n+k)!\,(N_w+N_b-(n+k))!}{(N_w+N_b)!}\,.$$
Using the binomial notation we can write this as
$$\binom{N_w}{n}\binom{N_b}{k}\bigg/\binom{N_w+N_b}{n+k}\,,$$
which is equivalent to what we are trying to show because $M = n + k$.
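For a concrete check (my own sketch, not part of the original solution), the result can be evaluated with binomial coefficients and confirmed to be normalized over all feasible splits of the $M$ draws:

    from math import comb

    def p_draw(n, k, Nw, Nb):
        """Probability of drawing n white and k black balls in M = n + k draws
        without replacement from an urn with Nw white and Nb black balls."""
        return comb(Nw, n) * comb(Nb, k) / comb(Nw + Nb, n + k)

    # Summing over all feasible splits of M draws should give 1.
    Nw, Nb, M = 6, 4, 5
    print(sum(p_draw(n, M - n, Nw, Nb)
              for n in range(max(0, M - Nb), min(Nw, M) + 1)))  # 1.0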
1.7.2 Solution
That $\mathcal{P}$ is monotone non-decreasing is obvious from the fact that $P$ is positive, because
$$\mathcal{P}(x) = \int_{-\infty}^{x} P(x')\,dx'\,.$$
It's also obvious, for the same reason (and from the normalization condition), that $\mathcal{P}(-\infty) = 0$ and $\mathcal{P}(+\infty) = 1$.
The relation between $\mathcal{P}$ and $P$ is $P(x) = d\mathcal{P}/dx$.
1.8.2 Solution
This comes down to integrating the probability distribution of a binomial process with a large number of trials and a fixed 5% probability of success on each trial.
Consider a string of N polls. Each one is either “success”, meaning the
person polled is from the party, or “fail”, meaning that the person is not from the
party. The probability of success is p and the probability of failure is q = 1 − p.
The probability of getting n successes is
$$P(n) = \frac{N!}{n!(N-n)!}\, p^n q^{N-n}\,.$$
It's a reasonably well known result that if $N$ is big enough, $P(n)$ is well approximated (use Stirling's approximation to prove this) as
$$P(n) \approx \frac{1}{\sqrt{2\pi\sigma^2}}\exp\!\left(-\frac{(n-Np)^2}{2\sigma^2}\right)$$
where $\sigma^2 = Npq$. The probability that the fraction of successes is between 4.5% and 5.5% is
$$K \equiv \text{Probability}(0.045\,N < n < 0.055\,N) = \sum_{n=0.045\,N}^{0.055\,N} P(n)\,.$$
I don’t think we can do this analytically (although there may be some tricky
way to approximate the result). Instead, we can evaluate the sum numerically
for a few values of N and see where we get, say, 90% probability. Doing this,
we find that we get 90% probability at around N = 5, 200. A python script
opinion poll.py for the numerical analysis is included with the source repo.
Note that it’s helpful to approximate the distribution as a Gaussian to make
the numerics simple. Otherwise, we’d have to compute really large factorials.
It would be very nice to have a real analytic treatment of this problem so
that we know the functional dependence on N . From the numerics you can see
that the dependence of K on N seems to follow the form 1 − exp(−N/N0 ) for
some N0 .
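Within the Gaussian approximation the sum can be expressed through the error function, which makes the numerical scan over $N$ trivial. Here is a minimal sketch of my own (it is not the opinion poll.py script mentioned above, and the function names are mine):

    from math import erf, sqrt

    def K(N, p=0.05, half_width=0.005):
        """Gaussian approximation to the probability that the observed fraction
        of successes lies within p +/- half_width after N polls."""
        sigma = sqrt(N * p * (1 - p))   # standard deviation of the number of successes
        z = half_width * N / sigma      # half-width measured in units of sigma
        return erf(z / sqrt(2))

    # Scan for the smallest N (in steps of 100) where K reaches 90%.
    N = 100
    while K(N) < 0.90:
        N += 100
    print(N, K(N))   # roughly N ~ 5,200, consistent with the estimate above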
1.9.2 Solution
The moments are
$$\begin{aligned}
\mu_n &\equiv \frac{1}{2a}\int_{-a}^{a} x^n\,dx \\
 &= \frac{1}{2a}\left[\int_{-a}^{0} x^n\,dx + \int_{0}^{a} x^n\,dx\right] \\
 &= \frac{1}{2a}\int_{0}^{a} \bigl[x^n + (-x)^n\bigr]\,dx \\
 &= \begin{cases} 0 & n \text{ odd} \\ a^n/(n+1) & n \text{ even.} \end{cases}
\end{aligned}$$
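A quick symbolic check (my own sketch, not part of the original solution) confirms the pattern for the first few moments:

    import sympy as sp

    x, a = sp.symbols('x a', positive=True)

    def moment(n):
        """n-th moment of the uniform distribution on [-a, a]."""
        return sp.integrate(x**n / (2 * a), (x, -a, a))

    print([moment(n) for n in range(6)])   # [1, 0, a**2/3, 0, a**4/5, 0]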
$$P(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\!\left(-\frac{x^2}{2\sigma^2}\right).$$
1.10.2 Solution
The odd moments are obviously zero by symmetry. For the even moments, first note that $\mu_0 = 1$ is just a statement that the distribution is normalized. Now consider $\mu_{2n}$. Integrating by parts,
$$\begin{aligned}
\mu_{2n} &= \frac{1}{\sqrt{2\pi\sigma^2}}\int_{-\infty}^{\infty} x^{2n}\exp\!\left(-\frac{x^2}{2\sigma^2}\right)dx \\
 &= \frac{1}{\sqrt{2\pi\sigma^2}}\underbrace{\left[\frac{x^{2n+1}}{2n+1}\exp\!\left(-\frac{x^2}{2\sigma^2}\right)\right]_{-\infty}^{\infty}}_{0}
 + \frac{1}{\sigma^2(2n+1)}\,\frac{1}{\sqrt{2\pi\sigma^2}}\int_{-\infty}^{\infty} x^{2n+2}\exp\!\left(-\frac{x^2}{2\sigma^2}\right)dx \\
 &= \frac{1}{\sigma^2(2n+1)}\,\mu_{2n+2}\,,
\end{aligned}$$
so
$$\mu_{2n+2} = \sigma^2(2n+1)\,\mu_{2n}\,. \qquad (\star)$$
Now use induction. Assuming the target relation $\mu_{2n} = \frac{(2n)!}{2^n n!}\sigma^{2n}$ holds for $\mu_{2n}$, relation $(\star)$ gives
$$\mu_{2n+2} = \sigma^2(2n+1)\,\frac{(2n)!}{2^n n!}\sigma^{2n} = \frac{(2n+2)!}{2^{n+1}(n+1)!}\,\sigma^{2n+2}\,,$$
which is the same relation with $n \to n+1$. Together with $\mu_0 = 1$, this establishes $\mu_{2n} = \frac{(2n)!}{2^n n!}\sigma^{2n} = (2n-1)!!\,\sigma^{2n}$ for all $n$.
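A small numerical check of the recursion against the closed form (my own sketch, not part of the original solution):

    from math import factorial

    def moment_recursive(two_n, sigma=1.0):
        """Even moment mu_{2n} built up from mu_0 = 1 via mu_{2m+2} = sigma^2 (2m+1) mu_{2m}."""
        mu = 1.0
        for m in range(0, two_n, 2):   # m = 0, 2, ..., two_n - 2
            mu *= sigma**2 * (m + 1)
        return mu

    def moment_closed_form(two_n, sigma=1.0):
        """mu_{2n} = (2n)! / (2^n n!) * sigma^(2n)."""
        n = two_n // 2
        return factorial(2 * n) / (2**n * factorial(n)) * sigma**(2 * n)

    print([moment_recursive(m) for m in (0, 2, 4, 6)])     # [1.0, 1.0, 3.0, 15.0]
    print([moment_closed_form(m) for m in (0, 2, 4, 6)])   # [1.0, 1.0, 3.0, 15.0]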
1.11.2 Solution
The generating function is
$$\begin{aligned}
G(k) &= \frac{1}{2a}\int_{-a}^{a} e^{ikx}\,dx \\
 &= \frac{1}{2a}\,\frac{e^{ika}-e^{-ika}}{ik} \\
 &= \frac{\sin(ak)}{ak} \\
 &= \frac{1}{ak}\sum_{n=0}^{\infty}(-1)^n\frac{(ak)^{2n+1}}{(2n+1)!} \\
 &= \sum_{n=0}^{\infty}(-1)^n\frac{a^{2n}(ik)^{2n}}{i^{2n}(2n+1)!} \\
 &= \sum_{n=0}^{\infty} a^{2n}\frac{(ik)^{2n}}{(2n+1)!} \\
 &= \sum_{n\ \text{even}} \frac{a^{n}}{n+1}\,\frac{(ik)^{n}}{n!}\,.
\end{aligned}$$
Therefore, the moments are $\mu_n = a^n/(n+1)$ for even $n$, and zero for odd $n$.
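Equivalently (a sketch of my own, not part of the original solution), the moments can be read off by differentiating the characteristic function at $k = 0$, using $\mu_n = (-i)^n G^{(n)}(0)$:

    import sympy as sp

    k, a = sp.symbols('k a', positive=True)
    G = sp.sin(a * k) / (a * k)

    # mu_n = (-i)^n * d^n G / dk^n at k = 0 (taken as a limit, since G is 0/0 there)
    for n in range(5):
        mu_n = sp.simplify((-sp.I)**n * sp.limit(sp.diff(G, k, n), k, 0))
        print(n, mu_n)   # 1, 0, a**2/3, 0, a**4/5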
1.12.2 Solution
The generating function for the Gaussian is
$$\begin{aligned}
G(k) &= \frac{1}{\sqrt{2\pi\sigma^2}}\int_{-\infty}^{\infty} \exp\!\left(-\frac{x^2}{2\sigma^2}\right) e^{ikx}\,dx \\
 &= \exp\!\left(-\frac{\sigma^2 k^2}{2}\right), \\
\ln G(k) &= -\frac{\sigma^2 k^2}{2}\,.
\end{aligned}$$
Therefore, all of the cumulants are zero except for the second, which is σ 2 .
If we add in a first cumulant, we'd wind up with a Gaussian shifted away from zero. So, I guess the most general distribution whose cumulants beyond the second all vanish is just a shifted Gaussian.
1.13.2 Solution
The Poisson distribution can be written as a continuous distribution using delta
functions:
$$p(x) = \sum_{n=0}^{\infty}\frac{a^n}{n!}\,e^{-a}\,\delta(x-n)\,.$$
The moment generating function is therefore
$$\begin{aligned}
G(k) &= \sum_{n=0}^{\infty}\int \frac{a^n}{n!}\,e^{-a}\,\delta(x-n)\,e^{ikx}\,dx \\
 &= \sum_{n=0}^{\infty}\frac{a^n}{n!}\,e^{-a}\,e^{ikn} \\
 &= \sum_{n=0}^{\infty}\frac{(ae^{ik})^n}{n!}\,e^{-a} \\
 &= \exp\!\left(ae^{ik}-a\right) \\
 &= \exp\!\left[a\bigl(e^{ik}-1\bigr)\right], \\
\ln G(k) &= a\bigl(e^{ik}-1\bigr) = a\sum_{m=1}^{\infty}\frac{(ik)^m}{m!}\,,
\end{aligned}$$
so every cumulant of the Poisson distribution equals $a$,
with a = ρV1 .
1.14.2 Solution
We just barge ahead with Stirling’s approximation and ln(1 + x) ≈ x,
$$p_n = \frac{N!}{n!(N-n)!}\,(1+\gamma)^{-N}\gamma^n\,,$$
$$\begin{aligned}
\ln p_n &= -N\ln(1+\gamma) + n\ln\gamma + \ln N! - \ln(N-n)! - \ln n! \\
 &\approx -N\gamma + n\ln\gamma + N\ln N - N - (N-n)\ln(N-n) + (N-n) - \ln n! \\
 &= -N\gamma + n\ln\gamma + N\ln N - N - N\ln(N-n) + n\ln(N-n) + (N-n) - \ln n! \\
(N \gg n)\quad &\approx -N\gamma + n\ln(N\gamma) - \ln n!\,,
\end{aligned}$$
so, identifying $N\gamma$ with $\rho V_1$,
$$p_n = e^{-\rho V_1}\,\frac{(\rho V_1)^n}{n!}\,,$$
which is what we wanted to show.
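To see the limit numerically (my own sketch, not part of the original solution), take $N$ large while holding $N\gamma$ fixed and compare $p_n$ with the Poisson form:

    from math import comb, exp, factorial

    def binomial_pn(n, N, gamma):
        """Exact p_n = C(N, n) (1 + gamma)^(-N) gamma^n."""
        return comb(N, n) * gamma**n * (1 + gamma)**(-N)

    def poisson_pn(n, mean):
        """Limiting form p_n = e^(-mean) mean^n / n!, with mean playing the role of rho*V1."""
        return exp(-mean) * mean**n / factorial(n)

    N, mean = 100_000, 3.0
    gamma = mean / N            # hold N*gamma fixed as N grows
    for n in range(6):
        print(n, binomial_pn(n, N, gamma), poisson_pn(n, mean))   # nearly identical columns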
1.15.2 Solution
The Lorentz distribution is
$$P(x) = \frac{1}{\pi}\,\frac{\gamma}{(x-a)^2+\gamma^2}\,.$$
The characteristic function can be computed directly:
$$\begin{aligned}
G(k) &= \frac{\gamma}{\pi}\int \frac{e^{ikx}}{(x-a)^2+\gamma^2}\,dx \\
 &= e^{ika}\,\frac{\gamma}{\pi}\int \frac{e^{iky}}{y^2+\gamma^2}\,dy \\
 &= e^{ika}\,\frac{\gamma}{\pi}\int \frac{e^{iky}}{(y+i\gamma)(y-i\gamma)}\,dy\,.
\end{aligned}$$
Now we use contour integration. For $k > 0$, the integrand decays at large $|y|$ only when $y$ has a positive imaginary part, so we close the contour in the upper half plane and pick up the pole at $y = i\gamma$:
$$G(k) = e^{ika}\,\frac{\gamma}{\pi}\,2\pi i\,\frac{e^{-k\gamma}}{2i\gamma} = e^{ika}\,e^{-\gamma k} \qquad (k > 0)\,.$$
For $k < 0$ we close the contour in the lower half plane, picking up the pole at $y = -i\gamma$ (with a minus sign for the clockwise orientation), and get $G(k) = e^{ika}\,e^{+\gamma k}$. Combining the two cases,
$$G(k) = e^{ika - \gamma|k|}\,.$$
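The combined result can be spot-checked by Monte Carlo (a sketch of my own, not part of the original solution), since $e^{ikx}$ is bounded and its sample mean converges even though the Lorentz distribution has no moments:

    import numpy as np

    a, gamma = 1.0, 0.5
    rng = np.random.default_rng(0)
    # Lorentz (Cauchy) samples via the inverse CDF: x = a + gamma * tan(pi*(u - 1/2))
    x = a + gamma * np.tan(np.pi * (rng.random(1_000_000) - 0.5))

    for k in (-2.0, -0.5, 0.5, 2.0):
        empirical = np.exp(1j * k * x).mean()
        analytic = np.exp(1j * k * a - gamma * abs(k))
        print(k, empirical, analytic)   # the two columns agree closely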
1.16.2 Solution
The distribution is found from the inverse Fourier transform
$$\begin{aligned}
P(x) &= \int \frac{dk}{2\pi}\,G(k)\,e^{-ikx} \\
 &= \frac{1}{2}\int \frac{dk}{2\pi}\left(e^{iak}+e^{-iak}\right)e^{-ikx} \\
 &= \frac{1}{2}\bigl(\delta(x-a)+\delta(x+a)\bigr)\,.
\end{aligned}$$
This shows that all the odd moments are zero, and the even moments are $\mu_{2n} = a^{2n}$.
1.17.2 Solution
Let's recall the meaning of a generating function. The moment generating function $G$, defined by $G(k) = \langle e^{ikX}\rangle$, has the property that its derivatives at $k = 0$ generate the moments, $\mu_m = (-i)^m G^{(m)}(0)$.
Define a function $F$ as
$$F(k) = \langle k^X\rangle\,.$$
Differentiating $F$ and evaluating at $k = 1$ gives us the factorial moments, and the factorial cumulants $\theta_m$ are defined by the expansion
$$\ln F(1-x) = \sum_{m=1}^{\infty}\frac{(-x)^m}{m!}\,\theta_m\,.$$
Express the first few in terms of the moments. Show that the Poisson distribution is characterized by the vanishing of all factorial cumulants beyond $\theta_1$.
1.18.2 Solution
I’m skipping writing the cumulants in terms of the moments.
As shown in the previous problem, the factorial moment generating function is $F(k) = \langle k^X\rangle$. For the Poisson distribution,
$$\begin{aligned}
F(k) &= \sum_{n=0}^{\infty} e^{-a}\,\frac{a^n}{n!}\,k^n \\
 &= \exp\!\left[a(k-1)\right], \\
\ln F(k) &= a(k-1)\,, \\
\ln F(1-k) &= -ak\,.
\end{aligned}$$
Therefore, θ1 = a and all other factorial cumulants are zero for the Poisson
distribution.
$$p_n = \frac{N!}{n!(N-n)!}\,(1+\gamma)^{-N}\gamma^n\,.$$
1.19.2 Solution
The factorial moment generating function is
$$\begin{aligned}
F(k) &= \langle k^X\rangle \\
 &= \sum_{n=0}^{N}\frac{N!}{n!(N-n)!}\,(1+\gamma)^{-N}\gamma^n k^n \\
 &= (1+\gamma)^{-N}(1+\gamma k)^N\,.
\end{aligned}$$
The factorial moments follow by differentiating $m$ times:
$$\begin{aligned}
(D^m F)(k) &= \frac{N!}{(N-m)!}\,(1+\gamma)^{-N}\gamma^m(1+\gamma k)^{N-m}\,, \\
(D^m F)(k=1) &= \frac{N!}{(N-m)!}\left(1+\frac{1}{\gamma}\right)^{-m} \\
(\gamma \ll 1)\quad &\approx \frac{N!}{(N-m)!}\,\gamma^m\,.
\end{aligned}$$
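A symbolic spot-check of the differentiation (my own sketch, not part of the original solution), for small concrete $N$ and $m$:

    import sympy as sp

    k, gamma = sp.symbols('k gamma', positive=True)
    N, m = 6, 2

    F = (1 + gamma)**(-N) * (1 + gamma * k)**N          # factorial-moment generating function
    deriv_at_1 = sp.diff(F, k, m).subs(k, 1)
    expected = sp.factorial(N) / sp.factorial(N - m) * (1 + 1 / gamma)**(-m)
    print(sp.simplify(deriv_at_1 - expected))            # 0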
1.20.2 Solution
Use Bayes’s rule:
$$\begin{aligned}
\int P_{A|B}(a, b)\,da &= \frac{1}{P_B(b)}\int P_{A,B}(a, b)\,da \\
 &= \frac{P_B(b)}{P_B(b)} \\
 &= 1\,.
\end{aligned}$$
1.21.2 Solution
Use Bayes’s rule again:
Therefore, when all variables are mutually independent, the joint probability factorizes into a product of the marginals.
$$P_{X_1,X_2}(x_1, x_2) = \frac{1}{\pi a}\,\delta(x_1^2 + x_2^2 - a^2)\,.$$
1.22.2 Solution
1.23.2 Solution
The ways to get a total of 9 are:
(3, 6), (4, 5), (5, 4), (6, 3) .
Therefore, the probability distribution P (n) for the first die to show n points is
P (1) = P (2) = 0, P (3) = P (4) = P (5) = P (6) = 1/4 .
This is not incompatible with the fact that the dice are independent, because the condition that the total is 9 is a constraint which does not exist when describing the a priori probabilities.
1.24.2 Solution
Use Bayes’s rule,
$$P_{\text{die at age}\,|\,\text{alive at age}}(t|\tau) = \frac{P_{\text{die at age \& alive at age}}(t, \tau)}{P_{\text{alive at age}}(\tau)}\,.$$
If your age is $\tau$, that means you die at some time $t' > \tau$. Therefore, the probability that a given person is still alive at age $\tau$ is
$$P_{\text{alive at age}}(\tau) = \int_{\tau}^{\infty} P(t')\,dt'\,.$$
Also, when t > τ , the joint probability that a person is alive at τ and dies at t is
the same thing as the marginal probability that the person dies at t. Therefore
we have
$$P_{\text{die at age}\,|\,\text{alive at age}}(t|\tau) = \frac{P(t)}{\int_{\tau}^{\infty} P(t')\,dt'}\,,$$
as we wanted to show.
Now we suppose P (t) = γe−γt . Then the integral is
$$\int_{\tau}^{\infty} P(t')\,dt' = \gamma\left[\frac{e^{-\gamma t}}{-\gamma}\right]_{\tau}^{\infty} = e^{-\gamma\tau}\,,$$
so
$$P_{\text{die at age}\,|\,\text{alive at age}}(t|\tau) = \frac{\gamma e^{-\gamma t}}{e^{-\gamma\tau}} = \gamma e^{-\gamma(t-\tau)} = P_{\text{die at age}}(t-\tau)\,,$$
which is what we wanted to show.
To show that there is only one possible P (t) with the property we just
demonstrated, we use calculus:
$$\begin{aligned}
P(t-\tau) &= \frac{P(t)}{\int_{\tau}^{\infty} P(t')\,dt'} \\
\int_{\tau}^{\infty} P(t')\,dt' &= \frac{P(t)}{P(t-\tau)} \\
\text{(differentiate w.r.t. } \tau\text{)}\qquad -P(\tau) &= -\frac{P(t)}{P(t-\tau)^2}\,\bigl(-P'(t-\tau)\bigr) \\
P'(t-\tau) &= -\frac{P(\tau)\,P(t-\tau)^2}{P(t)} \\
\text{(set } \tau = 0\text{)}\qquad P'(t) &= -P(0)\,P(t) \\
P(t) &= P(0)\,e^{-P(0)\,t}\,.
\end{aligned}$$
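The memoryless property can also be seen in a quick simulation (my own sketch, not part of the original solution): for exponential lifetimes, the distribution of remaining life beyond any age $\tau$ looks exactly like the original distribution.

    import numpy as np

    rate, tau = 0.3, 2.0
    rng = np.random.default_rng(1)
    lifetimes = rng.exponential(1 / rate, size=1_000_000)   # P(t) = rate * exp(-rate * t)

    remaining = lifetimes[lifetimes > tau] - tau             # remaining life, given survival to tau
    print(remaining.mean(), 1 / rate)                        # both ~ 1/rate
    print(np.quantile(remaining, 0.5), np.log(2) / rate)     # medians also agree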
1.25.2 Solution
Call the subset of variables A and the other variables B. Suppose there are m
variables in A and n variables in B.
Denote the marginal distribution as $P_A$, i.e.
$$P_A(a) \equiv \int_B P_{A,B}(a, b)\,db\,.$$
The characteristic function of the subset is obtained by setting the remaining arguments of the joint characteristic function to zero:
$$G_A(k_1, \ldots, k_m) = \left\langle e^{i(k_1 x_1 + \cdots + k_m x_m)}\right\rangle = G_{A,B}(k_1, \ldots, k_m, 0, \ldots, 0)\,.$$
• The cumulants $\langle\langle X_1^{m_1} X_2^{m_2}\rangle\rangle$ vanish when both $m_1$ and $m_2$ are nonzero.
1.26.2 Solution
First, we show that if the variables are independent, the moments factorize. We
just crank through the math:
$$\begin{aligned}
\langle X_1^{m_1} X_2^{m_2}\rangle &= \int P_{X_1,X_2}(x_1, x_2)\,x_1^{m_1} x_2^{m_2}\,dx_1\,dx_2 \\
 &= \int P_{X_1}(x_1)\,x_1^{m_1}\,dx_1 \int P_{X_2}(x_2)\,x_2^{m_2}\,dx_2 \\
 &= \langle X_1^{m_1}\rangle\,\langle X_2^{m_2}\rangle\,.
\end{aligned}$$
This proves that if two variables are independent, the moments factorize. The
extension to more variables is obvious.
Next, we show that if the moments factorize, then the characteristic function
factorizes. Again, we just use math:
$$\begin{aligned}
G(k_1, k_2) &= \sum_{m_1, m_2 = 0}^{\infty}\frac{(ik_1)^{m_1}(ik_2)^{m_2}}{m_1!\,m_2!}\,\langle X_1^{m_1} X_2^{m_2}\rangle \\
 &= \sum_{m_1 = 0}^{\infty}\frac{(ik_1)^{m_1}}{m_1!}\langle X_1^{m_1}\rangle\;\sum_{m_2 = 0}^{\infty}\frac{(ik_2)^{m_2}}{m_2!}\langle X_2^{m_2}\rangle \\
 &= G_1(k_1)\,G_2(k_2)\,.
\end{aligned}$$
The Taylor series of $\ln G(k_1, k_2) = \ln G_1(k_1) + \ln G_2(k_2)$ obviously has no cross terms, which means the cumulants with more than one variable having a non-zero power vanish.
It remains only to show that the vanishing of cumulants where more than
one power is nonzero implies that the variables are independent. If we can prove
that, then we’ll have completed a circle of implications showing that any one
of the three listed criteria, and the independence of the variables, all imply one
another.
If the cumulants vanish when more than one of the powers is nonzero, then we can write
$$\begin{aligned}
\ln G(k_1, \ldots, k_n) &= \sum_{j=1}^{\infty}\langle\langle X_1^j\rangle\rangle\frac{(ik_1)^j}{j!} + \cdots + \sum_{j=1}^{\infty}\langle\langle X_n^j\rangle\rangle\frac{(ik_n)^j}{j!}\,, \\
G(k_1, \ldots, k_n) &= \prod_{m=1}^{n}\exp\!\left[\sum_{j=1}^{\infty}\langle\langle X_m^j\rangle\rangle\frac{(ik_m)^j}{j!}\right] \equiv \prod_{m=1}^{n} G_m(k_m)\,, \\
P_{X_1,\ldots,X_n}(x_1, \ldots, x_n) &= \prod_{m=1}^{n}\int\frac{dk_m}{2\pi}\,G_m(k_m)\,e^{-ik_m x_m} = \prod_{m=1}^{n} P_{X_m}(x_m)\,.
\end{aligned}$$
The correlation coefficient is
$$r = \frac{\langle\langle X_i X_j\rangle\rangle}{\sqrt{\langle\langle X_i^2\rangle\rangle\,\langle\langle X_j^2\rangle\rangle}} = \frac{\langle X_i X_j\rangle - \langle X_i\rangle\langle X_j\rangle}{\sqrt{\bigl(\langle X_i^2\rangle - \langle X_i\rangle^2\bigr)\bigl(\langle X_j^2\rangle - \langle X_j\rangle^2\bigr)}}\,.$$
1.27.2 Solution
The Cauchy-Schwarz inequality says that for two vectors $A$ and $B$ in an inner product space,
$$|\langle A|B\rangle|^2 \leq \langle A|A\rangle\,\langle B|B\rangle\,,$$
where here $\langle\cdot|\cdot\rangle$ means inner product. Fortunately, given two variables $X$ and $Y$, the set of functions of those variables is a vector space. Also, given a probability distribution $P_{X,Y}$ over those variables, the average
$$\langle fg\rangle \equiv \int P_{X,Y}(x, y)\,f(x, y)\,g(x, y)\,dx\,dy$$
defines an inner product on that space. Applying the Cauchy-Schwarz inequality with $f = X - \langle X\rangle$ and $g = Y - \langle Y\rangle$ gives
$$\frac{\bigl|\langle(X - \langle X\rangle)(Y - \langle Y\rangle)\rangle\bigr|^2}{\langle(X - \langle X\rangle)^2\rangle\,\langle(Y - \langle Y\rangle)^2\rangle} \leq 1\,,$$
i.e.
$$\frac{\bigl|\langle XY\rangle - \langle X\rangle\langle Y\rangle\bigr|^2}{\bigl(\langle X^2\rangle - \langle X\rangle^2\bigr)\bigl(\langle Y^2\rangle - \langle Y\rangle^2\bigr)} \leq 1\,.$$
This proves that $|r| \leq 1$, with equality only if $Y - \langle Y\rangle$ is proportional to $X - \langle X\rangle$, i.e. only if $X$ and $Y$ are linearly related. I'm not really sure what that linear relation means though. To say that $X$ and $Y$ are linearly related seems to be a statement about the probability distribution itself, but I'm not sure yet how to think about what that means.
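A quick numerical illustration of the bound and its equality case (my own sketch, not part of the original solution):

    import numpy as np

    def corr(x, y):
        """Correlation coefficient r built from moments, as in the text."""
        cov = np.mean(x * y) - x.mean() * y.mean()
        var_x = np.mean(x**2) - x.mean()**2
        var_y = np.mean(y**2) - y.mean()**2
        return cov / np.sqrt(var_x * var_y)

    rng = np.random.default_rng(2)
    x = rng.normal(size=100_000)
    print(corr(x, 0.7 * x + rng.normal(size=100_000)))   # strictly between -1 and 1
    print(corr(x, 2.0 * x + 1.0))                        # ~1: Y is a linear function of X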