Probbook
Olive
Springer
Preface
Acknowledgements
Teaching Probability and Measure and Large Sample Theory as Math 581
and Math 582 at Southern Illinois University in Fall 2021 and Spring 2022
was useful.
Contents
6 Martingales  137
Index  151
Chapter 1
Probability Measures and Measures
Definition 1.1. The sample space Ω is the set of all possible outcomes of
an experiment.
Remark 1.1. We will assume that Ω is not the empty set, which is the
set that contains no elements. The experiment is an idealized experiment.
For example, toss a coin once. Then Ω = {heads, tails}. Outcomes where one
can not tell whether the coin is heads or tails are not allowed in the idealized
experiment.
Warning 1.1: Since ∞ is not an integer, there is no set A_∞ in ∪_{i=m}^∞ A_i or ∩_{i=m}^∞ A_i.
vi) [A ∩ B]^c = A^c ∪ B^c.
Example 1.1. The largest σ-field consists of all subsets of Ω. The smallest
σ-field is F = {∅, Ω}.
for k = 2, ..., n. To see that the B_k are disjoint, without loss of generality (WLOG) let j < k. Then B_j ⊆ A_j and B_k ⊆ A_j^c. Hence B_j and B_k are disjoint for j ≠ k. Now
∪_{i=1}^n A_i = A_1 ∪ [A_2 ∩ A_1^c] ∪ [A_3 ∩ (A_1 ∪ A_2)^c] ∪ · · · ∪ [A_n ∩ (A_1 ∪ · · · ∪ A_{n−1})^c] = ⊎_{i=1}^n B_i.
(Use induction or make a Venn diagram of concentric circles with the inner-
most circle A1 . Then the second innermost circle is A1 ∪ A2 where the ring
about the A1 circle is the set B2 , the third innermost circle is A1 ∪ A2 ∪ A3
where the ring about the A1 ∪ A2 circle is B3 , et cetera.)
vi) We will find disjoint sets B_1, ..., B_n, ... such that the B_k are disjoint, A_n = ∪_{k=1}^n A_k = ∪_{k=1}^n B_k, and A = ∪_{k=1}^∞ A_k = ∪_{k=1}^∞ B_k. Then
P(A) = Σ_{k=1}^∞ P(B_k) = lim_{n→∞} Σ_{k=1}^n P(B_k) = lim_{n→∞} P(A_n).
Thus
P(A_n) = Σ_{k=1}^n P(B_k) ↑ P(A).
Remark 1.3. a) Unlike the limit, liminf_n a_n and limsup_n a_n always exist when ±∞ are allowed as limits, since limits of nondecreasing and nonincreasing sequences then exist.
b) liminf_n a_n ≤ limsup_n a_n.
c) lim_n a_n = a iff liminf_n a_n = limsup_n a_n = a. Hence the limit of a sequence exists iff liminf_n a_n = limsup_n a_n. Again, a = ±∞ is allowed.
d) Let lim∗n an be limn an or limn an .
If an ≤ bn , then lim∗n an ≤ lim∗n bn .
If an < bn , then lim∗n an ≤ lim∗n bn .
If an ≥ bn , then lim∗n an ≥ lim∗n bn .
If an > bn , then lim∗n an ≥ lim∗n bn .
That is, when taking the liminf or limsup on both sides of a strict inequality,
the < or > must be replaced by ≤ or ≥.
A similar result holds for limits if both limits exist.
e) limsupn (−an ) = −liminfn an .
f) i) limsup_n a_n = lim_{n→∞} sup_{k≥n} a_k is the limit of the nonincreasing sequence sup(a_k, k ≥ n).
iii) limsup_n a_n = inf_n sup_{k≥n} a_k = lim_{n→∞} sup(a_k, k ≥ n).
iv) liminf_n a_n = sup_n inf_{k≥n} a_k = lim_{n→∞} inf(a_k, k ≥ n).
Remark 1.4. Warning: a common error is to take the limit of both sides
of an equation an = bn or of an inequality an ≤ bn . Taking the limit is an
error if the existence of the limit has not been shown. If ±∞ are allowed,
liminf_n a_n and limsup_n a_n always exist. Hence the liminf or limsup of both sides of the above equation or inequality can be taken.
liminf A_n.
b) C_n = ∪_{k=n}^∞ A_k ↓ limsup A_n. Thus lim_{n→∞} ∪_{k=n}^∞ A_k = limsup A_n, and limsup A_n = ∩_{n=1}^∞ C_n.
c) Do not treat convergence of sets like convergence of functions.
An → A iff lim sup An = liminf An which implies that if ω ∈ An for in-
finitely many n, then ω ∈ An for all but finitely many n.
d) Warning: Students who have not figured out the following two examples
tend to make errors on similar problems.
e) Typically we want to show that open, closed, and half-open intervals can be written as a countable union or countable intersection of intervals of another type. Then the Borel σ-field B(R) = σ(C) where C is a class of intervals such as the class of all open intervals.
Example 1.4. Prove the following results.
a) A_1 ⊆ A_2 ⊆ · · · implies that A_n ↑ A = ∪_{n=1}^∞ A_n.
b) A_1 ⊇ A_2 ⊇ · · · implies that A_n ↓ A = ∩_{n=1}^∞ A_n.
Proof. a) For each n, ∪_{k=n}^∞ A_k = A. Thus limsup A_n = ∩_{n=1}^∞ A = A. For each n, ∩_{k=n}^∞ A_k = A_n. Thus liminf A_n = ∪_{n=1}^∞ A_n = A.
b) For each n, ∪_{k=n}^∞ A_k = A_n. Thus limsup A_n = ∩_{n=1}^∞ A_n = A. For each n, ∩_{k=n}^∞ A_k = A. Thus liminf A_n = ∪_{n=1}^∞ A = A.
Example 1.5. Simplify the following sets where a < b. Answers might be
(a, b), [a, b), (a, b], [a, b], [a, a] = {a}, (a, a) = ∅.
a) I = ∩_{n=1}^∞ (a, b + 1/n]
b) I = ∪_{n=1}^∞ (a, b − 1/n]
c) I = ∪_{n=1}^∞ [a + 1/n, b − 1/n]
d) I = ∩_{n=1}^∞ [a, b + 1/n)
e) I = ∩_{n=1}^∞ [a, a + 1/n)
f) I = ∪_{n=1}^∞ [a, b − 1/n]
Solution. a) I = (a, b] = ∩_{n=1}^∞ (a, b + 1/n] = ∩_{n=1}^∞ A_n where A_n ↓ I. Note that (a, b] ⊆ A = ∩_{n=1}^∞ (a, b + 1/n] since b ∈ (a, b + 1/n] ∀n. For any ε > 0, (a, b + ε] ⊄ A since b + 1/n < b + ε for large enough n. Note that b + 1/n → b, but sets are not functions. (A common error is to say I = (a, b).)
b) I = (a, b) = ∪_{n=1}^∞ (a, b − 1/n] = ∪_{n=1}^∞ A_n where A_n ↑ I. Note that b ∉ ∪_{n=1}^∞ (a, b − 1/n] = A since b ∉ (a, b − 1/n] ∀n and since n ∈ N, so n = ∞ never occurs. Note that (a, b − 1/n] = ∅ if b − 1/n ≤ a. For any ε > 0 such that b − ε > a, it follows that (a, b − ε] ⊆ A since b − 1/n > b − ε for large enough n, say n > N. Thus b − ε ∈ A_n for all but finitely many n.
c) I = (a, b) = ∪_{n=1}^∞ [a + 1/n, b − 1/n] = ∪_{n=1}^∞ A_n where A_n ↑ I. Note that a, b ∉ A = I since a, b ∉ [a + 1/n, b − 1/n] ∀n ∈ N. Then the proof is similar to that of b).
d) I = [a, b] = ∩_{n=1}^∞ [a, b + 1/n) = ∩_{n=1}^∞ A_n where A_n ↓ I. This proof is similar to that of a).
e) I = [a, a] = {a} = ∩_{n=1}^∞ [a, a + 1/n) = ∩_{n=1}^∞ A_n where A_n ↓ I. Note that a ∈ A = I, but a + ε ∉ A ∀ε > 0.
f) I = [a, b) = ∪_{n=1}^∞ [a, b − 1/n] = ∪_{n=1}^∞ A_n where A_n ↑ I. This proof is similar to that of b).
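These set identities can be checked numerically for finitely many n. The sketch below is an illustration I added (not from the text): it assumes a = 0, b = 1 and tests whether selected points belong to the finite unions/intersections up to N terms, which suggests, but of course does not prove, the limiting intervals.

# Numerical check of Example 1.5 parts a) and f); illustration only, not a proof.
a, b, N = 0.0, 1.0, 10**4

def in_A_a(x, n):      # A_n = (a, b + 1/n]  for part a)
    return a < x <= b + 1.0 / n

def in_A_f(x, n):      # A_n = [a, b - 1/n]  for part f)
    return a <= x <= b - 1.0 / n

def in_intersection(x, member):
    return all(member(x, n) for n in range(1, N + 1))

def in_union(x, member):
    return any(member(x, n) for n in range(1, N + 1))

# Part a): the intersection of (a, b + 1/n] behaves like (a, b].
for x in [a, a + 0.5, b, b + 0.001]:
    print("a)", x, in_intersection(x, in_A_a))   # False, True, True, False

# Part f): the union of [a, b - 1/n] behaves like [a, b).
for x in [a, b - 0.001, b]:
    print("f)", x, in_union(x, in_A_f))          # True, True, False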
Theorem 1.3 proved monotone continuity (continuity from below and con-
tinuity from above) of P . The following theorem proves continuity of P . In
the proof, we can’t take limits when the limits have not been shown to exist,
but we can use the liminf or limsup operators.
Theorem 1.5. For each sequence {An } of F sets,
i) P (liminfn An ) ≤ liminfn P (An ) ≤ limsupn P (An ) ≤ P (limsupn An )
ii) Continuity of probability: If An → A, then P (An ) → P (A).
Proof. i) We need to show a) P (liminfn An ) ≤ liminfn P (An ) and b)
limsup_n P(A_n) ≤ P(limsup_n A_n). Let B_n = ∩_{k=n}^∞ A_k ↑ liminf_n A_n, and C_n = ∪_{k=n}^∞ A_k ↓ limsup_n A_n. Then P(A_n) ≥ P(B_n) → P(liminf_n A_n).
(We can’t take limits on both sides of the inequality since we do not
know if limn P (An ) exists. Note that limn P (Bn ) = P (liminfn An ) by
monotone continuity.) Taking liminf of both sides gives liminfn P (An ) ≥
liminfn P (Bn ) = P (liminfn An ), proving a).
Similarly, P (An ) ≤ P (Cn ) → P (limsupn An ). Taking limsup of both sides
of the inequality gives limsupn P (An ) ≤ limsupn P (Cn ) = P (limsupn An ),
proving b).
ii) Follows from i) since P(A_n) → P(A) iff liminf_n P(A_n) = limsup_n P(A_n) = P(A).
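As a concrete illustration of continuity of probability (my example, not the book's), take Ω = (0, 1) with P the uniform (Lebesgue) probability, and A_n = (0, 1/2 + 1/n) ↓ A = (0, 1/2]. Then P(A_n) = 1/2 + 1/n ↓ P(A) = 1/2, and the sketch below simply evaluates the interval lengths.

# Continuity of probability for P = U(0,1) and A_n = (0, 1/2 + 1/n) decreasing to (0, 1/2].
def P_interval(left, right):
    """Uniform(0,1) probability of an interval inside (0, 1) = its length."""
    return max(0.0, min(right, 1.0) - max(left, 0.0))

for n in [1, 2, 10, 100, 1000]:
    print(n, P_interval(0.0, 0.5 + 1.0 / n))   # decreases toward 0.5
print("limit:", P_interval(0.0, 0.5))           # P(A) = 0.5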
The following theorem shows that if A_1, A_2, ... are sets each having probability 0, then ∪_{i=1}^∞ A_i is also a set having probability 0. If A_1, A_2, ... are sets each having probability 1, then ∩_{i=1}^∞ A_i is also a set having probability 1.
1.2 Measures
The proof of this theorem is similar to that of Theorem 1.3 which gives
properties of a probability measure. One difference is that in Theorem 1.10
v), the condition µ(A1 ) < ∞ is needed. See Problem 1.9.
1.3 Summary
1) The sample space Ω is the set of all outcomes from an idealized experiment. The empty set is ∅. The complement of a set A is A^c = {ω ∈ Ω : ω ∉ A}.
2) Let Ω ≠ ∅. A class F of subsets of Ω is a σ-field (or σ-algebra) on Ω if
i) Ω ∈ F.
ii) A ∈ F ⇒ A^c ∈ F.
iii) A, B ∈ F ⇒ A ∩ B ∈ F.
iv) A_1, A_2, ... ∈ F ⇒ ∪_{i=1}^∞ A_i ∈ F.
Note that i), ii), and iii) mean that a σ-field is a field (or algebra) on Ω. A
σ−field is closed under countable set operations. The term “on Ω” is often
understood and omitted.
Common error: Use n instead of ∞ in iv).
3) De Morgan's laws: i) A ∩ B = (A^c ∪ B^c)^c, ii) A ∪ B = (A^c ∩ B^c)^c,
iii) [∪_{i=1}^∞ A_i]^c = ∩_{i=1}^∞ A_i^c.
1.4 Complements
Billingsley (1995), Dudley (2002), Durrett (1995), Feller (1971), and Resnick (1999) are useful references for this material.
Gaughan (2009) is a good reference for induction.
1.5 Problems
was done for proving the analogous property for a probability measure.
e) continuity from below: If An ↑ A then µ(An ) ↑ µ(A).
Hint: Let B_1 = A_1 and B_k = A_k − A_{k−1}. You may use the fact that the B_k are disjoint, A_n = ∪_{i=1}^n A_i = ∪_{i=1}^n B_i for each n, and A = ∪_{i=1}^∞ A_i = ∪_{i=1}^∞ B_i, as was done for proving the analogous property for a probability measure.
1.10. Suppose A_n = A for n ≥ 1 where P(A) = 0.5. Then A_n → A, P(A_n) → P(A) = 0.5, limsup_n A_n = ∩_{n=1}^∞ ∪_{k=n}^∞ A_k = {ω : ω ∈ A_n for infinitely many A_n} = A and liminf_n A_n = ∪_{n=1}^∞ ∩_{k=n}^∞ A_k = {ω : ω ∈ A_n for all but finitely many A_n} = A. It is known that liminf A_n and limsup A_n are tail events. Why does the above result not contradict Kolmogorov's zero-one (0-1) law?
1.11. Let µ be a measure on (Ω, F) and let c > 0. Prove that ν = cµ is a measure on (Ω, F).
Note: If µ = ∏_{i=1}^n µ_i is a product measure, then ν = c^n µ = ∏_{i=1}^n cµ_i = ∏_{i=1}^n ν_i is a product measure by Problem 1.11. Also, a finite measure µ = P/c is a scaled probability measure ν = P = cµ with c = 1/µ(Ω).
Exam and Quiz Problems
1.12. Let a < b and let I = ∪_{n=1}^∞ [a + 1/n, b − 1/n] = ∪_{n∈N} [a + 1/n, b − 1/n] = ∪_{n=m}^∞ [a + 1/n, b − 1/n] where m is the smallest positive integer such that a + 1/m ≤ b − 1/m, since [c, d] = ∅ if c > d. I is equal to an interval. Find that interval.
1.13. a) Let {A_i}_{i=1}^∞ be a sequence of sets such that P(A_n) = 0 ∀n. Prove P(∪_{i=1}^∞ A_i) = 0.
b) Let {B_i}_{i=1}^∞ be a sequence of sets such that P(B_n) = 1 ∀n. Then P(B_n^c) = 0 ∀n, and by a), P(∪_{i=1}^∞ B_i^c) = 0. Prove P(∩_{i=1}^∞ B_i) = 1.
1.14. For an arbitrary sequence of events {An },
1.28.
1.29.
Some Qual Type Problems
1.30Q. Prove the following theorem.
Theorem 1.3. Properties of P: Let A, B, A_i, A_n, A_k be F sets.
i) Finite additivity: If A_1, ..., A_n are disjoint, then P(∪_{i=1}^n A_i) = Σ_{i=1}^n P(A_i).
ii) P is monotone: A ⊆ B ⇒ P(A) ≤ P(B).
iii) If A ⊆ B, then P(B − A) = P(B) − P(A).
iv) Complement rule: P(A^c) = 1 − P(A).
v) Finite subadditivity: P(∪_{i=1}^n A_i) ≤ Σ_{i=1}^n P(A_i).
Chapter 2
Random Variables and Random Vectors
This chapter shows that random variables and random vectors are measurable functions.
Measurable functions can also be defined for the extended real numbers
[−∞, ∞].
Definition 2.3. A function f : Ω → [−∞, ∞] is a measurable function (or measurable or F measurable or Borel measurable) if
i) f^{−1}(B) ∈ F ∀B ∈ B(R),
ii) f^{−1}({∞}) = {ω : f(ω) = ∞} ∈ F, and
iii) f^{−1}({−∞}) = {ω : f(ω) = −∞} ∈ F.
Comparing definitions 2.4 and 2.2 c) shows that X is a random variable iff
X is a measurable function.
Definition 2.4. Let (Ω, F , P ) be a probability space. A function X :
Ω → R = (−∞, ∞) is a random variable if the inverse image X −1 (B) ∈ F
∀B ∈ B(R). Equivalently, a function X : Ω → R is a random variable iff X
is a measurable function.
Warning: The inverse image X −1 (A) is a set, not an inverse function.
iv) If A and B are disjoint, then X −1 (A) and X −1 (B) are disjoint.
v) X −1 (B c ) = [X −1 (B)]c .
Let Λ be a nonempty index set.
vi) X −1 (∪λ∈Λ Bλ ) = ∪λ∈Λ X −1 (Bλ ).
vii) X −1 (∩λ∈Λ Bλ ) = ∩λ∈Λ X −1 (Bλ ).
Proof Sketch. i) If ω ∈ X^{−1}(A), then X(ω) ∈ A ⊆ B. Hence X(ω) ∈ B and ω ∈ X^{−1}(B). Thus X^{−1}(A) ⊆ X^{−1}(B).
ii) See Problem 2.1.
iii) ω ∈ X^{−1}(∩_{n=1}^∞ B_n) iff X(ω) ∈ ∩_{n=1}^∞ B_n iff X(ω) ∈ B_n for each n iff ω ∈ X^{−1}(B_n) for each n iff ω ∈ ∩_{n=1}^∞ X^{−1}(B_n).
iv) If ω ∈ X^{−1}(A), then X(ω) ∈ A. Hence X(ω) ∉ B. Thus ω ∉ X^{−1}(B).
v) ω ∈ X^{−1}(B^c) iff X(ω) ∈ B^c iff X(ω) ∉ B iff ω ∈ [X^{−1}(B)]^c.
vi) Similar to ii).
vii) Replace n by λ in iii).
Note that unions and intersections in the above theorem can be finite,
countable, or uncountable.
Theorem 2.2. Let (Ω, F , P ) be a probability space. A function X : Ω →
R = (−∞, ∞) is a random variable iff {X ≤ t} = {ω ∈ Ω : X(ω) ≤ t} ∈ F
∀t ∈ R.
since the union is countable. Thus a sum of two random variables is a random
variable, and by induction, a finite sum of random variables is a random
variable.
c) For each t, {max(X, Y ) ≤ t} = {X ≤ t} ∩ {Y ≤ t} ∈ F
(since max(X, Y ) ≤ t iff both X ≤ t and Y ≤ t).
d) For each t, {min(X, Y ) ≤ t} = {X ≤ t} ∪ {Y ≤ t} ∈ F
(since min(X, Y ) ≤ t iff at least one of the following holds i) X ≤ t or ii)
Y ≤ t).
e) First show that X² is a random variable if X is a random variable. For any t ≥ 0, {X² ≤ t} = {−√t ≤ X ≤ √t} = {X ≤ √t} − {X < −√t} ∈ F, while for any t < 0, {X² ≤ t} = ∅ ∈ F. Thus X² is a random variable. Then XY = 0.5[(X + Y)² − X² − Y²] is a random variable by b).
f) First show 1/Y is a random variable. Then the result follows by e). Now
{1/Y ≤ t} = {Y ≥ 1/t} ∪ {Y ≤ 0} if t > 0, {1/Y ≤ 0} = {Y ≤ 0}, and {1/Y ≤ t} = {Y ≥ 1/t} ∩ {Y ≤ 0} if t < 0.
Each of these sets is in F, so 1/Y is a random variable.
Theorem 2.5. Fix (Ω, F , P ). Let the induced probability PX = PF be
PX (B) = P [X −1 (B)] for any B ∈ B(R). Then (R, B(R), PX ) is a probability
space.
Proof. PX is a set function on B(R). We need to show that PX is a probability measure.
P1) Let B ∈ B(R). Then PX(B) = P[X^{−1}(B)]. Hence 0 ≤ PX(B) ≤ 1.
P2) PX(R) = P[X^{−1}(R)] = P(Ω) = 1, and PX(∅) = P({ω : X(ω) ∈ ∅}) = P(∅) = 0.
P3) Let {B_i} be disjoint B(R) sets. Then PX(⊎_{i=1}^∞ B_i) = P[X^{−1}(⊎_{i=1}^∞ B_i)] = P[⊎_{i=1}^∞ X^{−1}(B_i)] = Σ_{i=1}^∞ P[X^{−1}(B_i)] = Σ_{i=1}^∞ PX(B_i). (Theorem 2.1 ii) gives the second equality, but the inverse images of disjoint sets are disjoint sets by Theorem 2.1 iv), giving the third equality.)
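For a finite Ω the induced probability can be computed directly. The sketch below is a hypothetical example I added (fair die, X(ω) = ω mod 2); it evaluates PX(B) = P[X^{−1}(B)] and checks P1)-P3) informally.

# Induced probability P_X(B) = P[X^{-1}(B)] on a finite sample space; illustration only.
from fractions import Fraction

Omega = range(1, 7)
P = {w: Fraction(1, 6) for w in Omega}       # P({w}) for each outcome of a fair die
X = lambda w: w % 2                          # a random variable on Omega

def PX(B):
    """P_X(B) = P({w : X(w) in B})."""
    return sum(P[w] for w in Omega if X(w) in B)

print(PX({0}))          # 1/2 (the three even faces)
print(PX({0, 1}))       # 1, so P_X of the whole range is 1, as in P2)
print(PX(set()))        # 0, so P_X(empty set) = 0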
Definition 2.6. The distribution of X is PX (B) = P [X −1 (B)], B ∈
B(Rk ).
Note that the cumulative distribution function F(t) = FX(t) = PX((−∞, t]) since PX((−∞, t]) = P[X^{−1}((−∞, t])] = P({ω : X(ω) ∈ (−∞, t]}) = P(X ≤ t) and since (−∞, t] ∈ B(R).
Notation. For a given random variable X, the subscript X in PX will
often be suppressed: e.g., write P ((−∞, x]) for PX ((−∞, x]). This notation
is often used when PX is the only probability of interest, and this notation
is used in the following proof.
Theorem 2.6. The cumulative distribution function F : R → [0, 1] of a random variable X satisfies the following properties.
df1) F is nondecreasing: x1 < x2 ⇒ F (x1 ) ≤ F (x2 ).
df2) F is right continuous:
lim F (x + h) = F (x) ∀x ∈ R.
h↓0
df3) (−∞, −n] ↓ ∅. Hence F (−n) ↓ 0, and limn→∞ F (−n) = limx→−∞ F (x) =
0.
df4) (−∞, n] ↑ R. Hence F (n) ↑ 1, and limn→∞ F (n) = limx→∞ F (x) = 1.
For the above proof, technically need Ah ↓ A to be a countable limit,
where Ah = (−∞, x + h] ↓ (−∞, x] = A, to apply the continuity from above
property of probability, but (−∞, x+h] ↓ (−∞, x] regardless of how h ↓ 0 (e.g.
using h = 1/n, a countable sequence of rational numbers, or an uncountable
sequence of irrationals), and (−∞, x + h] and (−∞, x] are Borel sets. Thus
the probabilities do exist and do decrease and converge to the limit F (x).
Similar remarks apply to df3) and df4).
Remark 2.1. Define F (x−) = P (X < x). Then P (X = x) = F (x) −
F (x−). Note that P (a < X ≤ b) = F (b) − F (a).
Definition 2.7. The σ-field σ(X) is the smallest σ-field with respect to
which the random variable X is measurable.
Theorem 2.7. σ(X) = the collection {X −1 (B) : B ∈ B(R)}, which is a
σ-field.
Proof. The above collection of sets is a subset of σ(X). Hence the result
follows if the collection is a σ-field.
σ1) X −1 (R) = Ω ∈ σ(X).
σ2) Let A ∈ σ(X). Then A = X −1 (B) for some B ∈ B(R). Thus Ac =
[X −1 (B)]c = X −1 (B c ) by Theorem 2.1 v), where B c ∈ B(R). Hence Ac ∈
σ(X).
σ3) A, B ∈ σ(X) implies A = X −1 (C) and B = X −1 (D) for some sets
C, D ∈ B(R). Hence A ∩ B = X −1 (C ∩ D) ∈ σ(X) by Theorem 2.1 vii).
σ4) Let A_1, A_2, ... ∈ σ(X). Then A_i = X^{−1}(B_i) for some B_i ∈ B(R). Thus ∪_{i=1}^∞ A_i = ∪_{i=1}^∞ X^{−1}(B_i) = X^{−1}(∪_{i=1}^∞ B_i) by Theorem 2.1 ii). Thus ∪_{i=1}^∞ A_i ∈ σ(X).
Example 2.1, continued. For a) where X is a constant, σ(X) = {∅, Ω},
the smallest possible σ-field. For b) where X = IA where A ∈ F, σ(X) =
{∅, A, Ac, Ω}.
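For a finite Ω, σ(X) = {X^{−1}(B) : B ⊆ range(X)} can be listed explicitly. The sketch below is an illustration I added (Ω and A are hypothetical); it recovers {∅, A, A^c, Ω} for X = I_A.

# sigma(X) = {X^{-1}(B) : B a subset of the range of X} for a finite Omega; illustration only.
from itertools import chain, combinations

Omega = frozenset({1, 2, 3, 4})
A = frozenset({1, 2})
X = lambda w: 1 if w in A else 0             # indicator of A

def subsets(s):
    s = list(s)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

range_X = {X(w) for w in Omega}
sigma_X = {frozenset(w for w in Omega if X(w) in set(B)) for B in subsets(range_X)}

for S in sorted(sigma_X, key=len):
    print(sorted(S))     # [], [1, 2], [3, 4], [1, 2, 3, 4]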
iv) If A and B are disjoint, then X −1 (A) and X −1 (B) are disjoint.
v) X −1 (B c ) = [X −1 (B)]c .
Let Λ be a nonempty index set.
vi) X −1 (∪λ∈Λ Bλ ) = ∪λ∈Λ X −1 (Bλ ).
vii) X −1 (∩λ∈Λ Bλ ) = ∩λ∈Λ X −1 (Bλ ).
Proof Sketch. i) If ω ∈ X^{−1}(A), then X(ω) ∈ A ⊆ B. Hence X(ω) ∈ B and ω ∈ X^{−1}(B). Thus X^{−1}(A) ⊆ X^{−1}(B).
ii) Similar to Problem 2.1.
iii) ω ∈ X^{−1}(∩_{n=1}^∞ B_n) iff X(ω) ∈ ∩_{n=1}^∞ B_n iff X(ω) ∈ B_n for each n iff ω ∈ X^{−1}(B_n) for each n iff ω ∈ ∩_{n=1}^∞ X^{−1}(B_n).
iv) If ω ∈ X^{−1}(A), then X(ω) ∈ A. Hence X(ω) ∉ B. Thus ω ∉ X^{−1}(B).
v) ω ∈ X^{−1}(B^c) iff X(ω) ∈ B^c iff X(ω) ∉ B iff ω ∈ [X^{−1}(B)]^c.
vi) Similar to ii).
vii) Replace n by λ in iii).
Theorem 2.5 is the special case of Theorem 2.9 with k = 1.
Theorem 2.9. Fix (Ω, F, P). If X is a 1 × k random vector, let the induced probability PX = PF be PX(B) = P[X^{−1}(B)] for any B ∈ B(R^k). Then (R^k, B(R^k), PX) is a probability space.
Proof. PX is a set function on B(R^k). We need to show that PX is a probability measure.
P1) Let B ∈ B(R^k). Then PX(B) = P[X^{−1}(B)]. Hence 0 ≤ PX(B) ≤ 1.
P2) PX(R^k) = P[X^{−1}(R^k)] = P(Ω) = 1, and PX(∅) = P({ω : X(ω) ∈ ∅}) = P(∅) = 0.
P3) Let {B_i} be disjoint B(R^k) sets. Then PX(⊎_{i=1}^∞ B_i) = P[X^{−1}(⊎_{i=1}^∞ B_i)] = P[⊎_{i=1}^∞ X^{−1}(B_i)] = Σ_{i=1}^∞ P[X^{−1}(B_i)] = Σ_{i=1}^∞ PX(B_i). (Theorem 2.8 ii) gives the second equality, but the inverse images of disjoint sets are disjoint sets by Theorem 2.8 iv), giving the third equality.)
Definition 2.9. The σ-field σ(X) is the smallest σ-field with respect to which the 1 × k random vector X is measurable.
Theorem 2.10. σ(X) = the collection {X^{−1}(B) : B ∈ B(R^k)}, which is a σ-field.
Proof. The above collection of sets is a subset of σ(X). Hence the result follows if the collection is a σ-field.
σ1) X^{−1}(R^k) = Ω ∈ σ(X).
σ2) Let A ∈ σ(X). Then A = X^{−1}(B) for some B ∈ B(R^k). Thus A^c = [X^{−1}(B)]^c = X^{−1}(B^c) by Theorem 2.8 v), where B^c ∈ B(R^k). Hence A^c ∈ σ(X).
σ3) A, B ∈ σ(X) implies A = X^{−1}(C) and B = X^{−1}(D) for some sets
Some properties of the gamma function follow. i) Γ(k) = (k − 1)! for integer k ≥ 1. ii) Γ(x + 1) = x Γ(x) for x > 0. iii) Γ(x) = (x − 1) Γ(x − 1) for x > 1. iv) Γ(0.5) = √π.
1) Y ∼ beta(δ, ν):
f(y) = [Γ(δ + ν)/(Γ(δ)Γ(ν))] y^{δ−1} (1 − y)^{ν−1}
where y and µ are real numbers and σ > 0. E(Y) and V(Y) do not exist. c(t) = exp(itµ − |t|σ).
F(y) = (1/π)[arctan((y − µ)/σ) + π/2].
5) chi-square(p) = gamma(ν = p/2, λ = 2), Y ∼ χ²_p:
f(y) = y^{p/2 − 1} e^{−y/2} / (2^{p/2} Γ(p/2)).
F(y) = 1 − exp(−y/λ), y ≥ 0.
If Y_1, ..., Y_n are iid exponential EXP(λ), then Σ_{i=1}^n Y_i ∼ G(n, λ).
f(y) = y^{ν−1} e^{−y/λ} / (λ^ν Γ(ν)).
Thus if Y_1, ..., Y_n are iid G(ν, λ), then Σ_{i=1}^n Y_i ∼ G(nν, λ).
8) Y ∼ N(µ, σ²):
f(y) = (1/√(2πσ²)) exp(−(y − µ)²/(2σ²)).
Here a_i and b_i are fixed constants. Thus if Y_1, ..., Y_n are iid N(µ, σ²), then Ȳ ∼ N(µ, σ²/n).
9) Poisson(θ), Y ∼ POIS(θ):
f(y) = e^{−θ} θ^y / y!
V(Y) = θ³/λ.
The mgf is
m(t) = exp[(λ/θ)(1 − √(1 − 2θ²t/λ))] for t < λ/(2θ²), and c(t) = exp[(λ/θ)(1 − √(1 − 2θ²it/λ))].
14) If Y has a negative binomial distribution, Y ∼ NB(r, ρ), then the pmf of Y is
f(y) = P(Y = y) = (r + y − 1 choose y) ρ^r (1 − ρ)^y
for y = 0, 1, . . . where 0 < ρ < 1. E(Y) = r(1 − ρ)/ρ, and
V(Y) = r(1 − ρ)/ρ².
The moment generating function is
m(t) = [ρ/(1 − (1 − ρ)e^t)]^r.
f(y) = [Γ((p + 1)/2)/((pπ)^{1/2} Γ(p/2))] (1 + y²/p)^{−(p+1)/2}
Z/(W/p)^{1/2}
2.5 Summary
{ω ∈ Ω : X(ω) ∈ B}. Note that the inverse image X −1 (B) is a set. X −1 (B)
is not the inverse function.
27) Let B(R) be the Borel σ−field on the real numbers R = (−∞, ∞). Let
(Ω, F ) be a measurable space, and let the real function X : Ω → R. Then X
is a measurable function if X −1 (B) ∈ F ∀ B ∈ B(R). Equivalently, X is
a measurable function if
{X ≤ t} = {ω ∈ Ω : X(ω) ≤ t} ∈ F ∀t ∈ R.
28) Fix the probability space (Ω, F , P ). Combining 20) and 27) shows X
is a random variable iff X is a measurable function.
71) Let X : Ω → R. Let A, B, Bn ∈ B(R).
i) If A ⊆ B, then X −1 (A) ⊆ X −1 (B).
ii) X −1 (∪n Bn ) = ∪n X −1 (Bn ).
iii) X −1 (∩n Bn ) = ∩n X −1 (Bn ).
iv) If A and B are disjoint, then X −1 (A) and X −1 (B) are disjoint.
v) X −1 (B c ) = [X −1 (B)]c .
(The unions and intersections in ii) and iii) can be finite, countable or un-
countable.)
72) Theorem: Fix (Ω, F , P ). Let X : Ω → R. X is a measurable function
iff X is a RV iff any one of the following conditions holds.
i) X −1 (B) = {ω ∈ Ω : X(ω) ∈ B} ∈ F ∀ B ∈ B(R).
ii) X −1 ((−∞, t]) = {X ≤ t} = {ω ∈ Ω : X(ω) ≤ t} ∈ F ∀t ∈ R.
iii) X −1 ((−∞, t)) = {X < t} = {ω ∈ Ω : X(ω) < t} ∈ F ∀t ∈ R.
iv) X −1 ([t, ∞)) = {X ≥ t} = {ω ∈ Ω : X(ω) ≥ t} ∈ F ∀t ∈ R.
v) X −1 ((t, ∞)) = {X > t} = {ω ∈ Ω : X(ω) > t} ∈ F ∀t ∈ R.
73) Theorem: Let X, Y, and X_i be RVs on (Ω, F, P).
a) aX + bY is a RV for any a, b ∈ R. Hence Σ_{i=1}^n X_i is a RV.
b) max(X, Y) is a RV. Hence max(X_1, ..., X_n) is a RV.
c) min(X, Y) is a RV. Hence min(X_1, ..., X_n) is a RV.
d) XY is a RV. Hence X_1 · · · X_n is a RV.
e) X/Y is a RV if Y(ω) ≠ 0 ∀ ω ∈ Ω.
f) sup_n X_n is a RV.
g) inf_n X_n is a RV.
h) limsup_n X_n is a RV.
i) liminf_n X_n is a RV.
j) If lim_n X_n = X, then X is a RV.
k) If lim_m Σ_{n=1}^m X_n = Σ_{n=1}^∞ X_n = X, then X is a RV.
l) If h : R^n → R is measurable, then Y = h(X_1, ..., X_n) is a RV.
m) If h : R^n → R is continuous, then h is measurable and Y = h(X_1, ..., X_n) is a RV.
n) If h : R → R is monotone, then h is measurable and h(X) is a RV.
34) An indicator I_A is the function such that I_A(ω) = 1 if ω ∈ A and I_A(ω) = 0 if ω ∉ A.
35) A function f is a simple function if f = Σ_{i=1}^k x_i I_{A_i} for some positive integer k. Thus a simple function f has finite range.
36) A simple function is a random variable if each A_i ∈ F.
36) A simple function is a random variable if each Ai ∈ F.
2.6 Complements
2.7 Problems
2.1. Prove
X^{−1}(∪_{i=1}^∞ B_i) = ∪_{i=1}^∞ X^{−1}(B_i).
(You may assume, for example, that X : Ω → R^k is a random vector and the B_i ∈ B(R^k).)
2.8. Let (Ω, F ) be a measurable space. Suppose the 1 × k vector X : Ω →
Rk . Give the definition of a random vector X.
2.9. Fix (Ω, F , P ) and let X be a RV. What is the induced probability
PX (B) for B ∈ B(R)?
2.10.
2.11.
2.12.
2.13.
2.14.
2.15.
2.16.
2.17.
2.18.
2.19.
Some Qual Type Problems
2.20Q. Prove Theorem 2.4 using Theorem 2.3.
Theorem 2.3. Fix (Ω, F, P). Let X : Ω → R. X is a measurable function iff X is a random variable iff any one of the following conditions holds.
i) X^{−1}(B) = {ω ∈ Ω : X(ω) ∈ B} ∈ F ∀ B ∈ B(R).
ii) X^{−1}((−∞, t]) = {X ≤ t} = {ω ∈ Ω : X(ω) ≤ t} ∈ F ∀t ∈ R.
iii) X^{−1}((−∞, t)) = {X < t} = {ω ∈ Ω : X(ω) < t} ∈ F ∀t ∈ R.
iv) X^{−1}([t, ∞)) = {X ≥ t} = {ω ∈ Ω : X(ω) ≥ t} ∈ F ∀t ∈ R.
v) X^{−1}((t, ∞)) = {X > t} = {ω ∈ Ω : X(ω) > t} ∈ F ∀t ∈ R.
Theorem 2.4. Let X, Y, and X_i be RVs on (Ω, F, P).
a) aX + bY is a RV for any a, b ∈ R. Hence Σ_{i=1}^n X_i is a RV.
b) max(X, Y) is a RV. Hence max(X_1, ..., X_n) is a RV.
c) min(X, Y) is a RV. Hence min(X_1, ..., X_n) is a RV.
d) XY is a RV. Hence X_1 · · · X_n is a RV.
e) X/Y is a RV if Y(ω) ≠ 0 ∀ ω ∈ Ω.
f) sup_n X_n is a RV.
g) inf_n X_n is a RV.
h) limsup_n X_n is a RV.
i) liminf_n X_n is a RV.
j) If lim_n X_n = X, then X is a RV.
k) If lim_m Σ_{n=1}^m X_n = Σ_{n=1}^∞ X_n = X, then X is a RV.
2.21Q. Fix (Ω, F, P). For a random variable X, prove that the induced probability PX(B) = P[X^{−1}(B)] for B ∈ B(R) is a probability measure on (R, B(R)). You may use without proof i) X^{−1}(R) = Ω, ii) X^{−1}(∅) = ∅, iii) X^{−1}(∪_{i=1}^∞ B_i) = ∪_{i=1}^∞ X^{−1}(B_i), and iv) if A and C are disjoint, then X^{−1}(A) and X^{−1}(C) are disjoint.
Chapter 3
Integration and Expected Value
This chapter covers integration, expected values, Fubini's theorem, and product measures. Most of the proofs for integration are omitted, but the corresponding results for expectation are often given.
3.1 Integration
Uniqueness:
Σ_{i=1}^n x_i P(A_i) = Σ_x Σ_{i:x_i=x} x_i P(A_i) = Σ_x x P(∪_{i:x_i=x} A_i) = Σ_x x P(X = x).
Note that all sums in the above proof are finite. Also note that although many partitions A_i may exist, each partition gives the same value of E(X).
Theorem 3.11. Let X_n, X, and Y be SRVs.
a) −∞ < E(X) < ∞.
b) linearity: E(aX + bY) = aE(X) + bE(Y).
c) If SRV X = Σ_{i=1}^n x_i I_{A_i} where the A_i are not necessarily disjoint, then E(X) = Σ_{i=1}^n x_i P(A_i).
d) monotonicity: If X ≤ Y, then E(X) ≤ E(Y).
e) If the sequence {X_n} is uniformly bounded and X = lim_n X_n on a set of probability 1, then E(X) = lim_n E(X_n).
f) If t is a real valued function, then E[t(X)] = Σ_x t(x) P(X = x).
g) If X is nonnegative, X ≥ 0, then E(X) = Σ_i P(X > x_i) = ∫_0^∞ [1 − F(x)] dx.
h) If X ⫫ Y (X and Y are independent), then E(XY) = E(X)E(Y).
Proof. a) E(X) = Σ_x x P(X = x) where the x are bounded since X has finite range x_1, ..., x_m and P(X = x) ∈ [0, 1]. Hence min(x_i) ≤ E(X) ≤ max(x_i).
b) Let X = Σ_i x_i I_{A_i} and Y = Σ_j y_j I_{B_j} where the A_i partition Ω and the B_j partition Ω. Then the A_i ∩ B_j partition Ω, and aX + bY = ax_i + by_j for ω ∈ A_i ∩ B_j. Thus
aX + bY = Σ_i Σ_j (ax_i + by_j) I_{A_i ∩ B_j}
is a SRV with
E(aX + bY) = Σ_i Σ_j (ax_i + by_j) P(A_i ∩ B_j) = Σ_i Σ_j ax_i P(A_i ∩ B_j) + Σ_j Σ_i by_j P(A_i ∩ B_j) = a Σ_i x_i P(A_i) + b Σ_j y_j P(B_j) = aE(X) + bE(Y).
is a SRV. Thus
E(XY) = Σ_i Σ_j x_i y_j P(A_i ∩ B_j) = Σ_i Σ_j x_i y_j P(A_i)P(B_j) (by independence) = Σ_i x_i P(A_i) Σ_j y_j P(B_j) = E(X)E(Y).
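The linearity and independence properties for simple random variables can be verified on a small finite probability space. The sketch below is an illustration I added with an arbitrary four-point Ω; exact fractions are used so the identities hold exactly.

# Linearity of E and E(XY) = E(X)E(Y) for simple RVs on a finite Omega; illustration only.
from fractions import Fraction

P = {1: Fraction(1, 4), 2: Fraction(1, 4), 3: Fraction(1, 4), 4: Fraction(1, 4)}
X = {1: 2, 2: 2, 3: 5, 4: 5}     # X = 2*I_{1,2} + 5*I_{3,4}
Y = {1: 0, 2: 1, 3: 0, 4: 1}     # Y = I_{2,4}; here X and Y happen to be independent

def E(Z):
    """E(Z) = sum over outcomes of Z(w) * P({w})."""
    return sum(Z[w] * P[w] for w in P)

a, b = 3, -2
aXbY = {w: a * X[w] + b * Y[w] for w in P}
print(E(aXbY), a * E(X) + b * E(Y))                      # equal: linearity
print(E({w: X[w] * Y[w] for w in P}), E(X) * E(Y))       # equal: independence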
Remark 3.7. For expected values, assume (Ω, F, P) is fixed, and the random variables are measurable with respect to (wrt) F. We can define the expected value to be E(X) = ∫ X dP as the special case of integration where the measure µ = P is a probability measure, or we can use the following definition that ignores most measure theory. There are several equivalent ways to define integrals and expected values. Hence E(X) can also be defined as in Def. 3.2 with µ replaced by P and f replaced by X : Ω → [0, ∞).
Theorem 3.12. Let X ≥ 0 be a random variable. Then there exist SRVs
Xn ≥ 0 such that Xn ↑ X.
Proof.
Note: X_n ↑ X means X_n(ω) ↑ X(ω) ∀ω. An analogy for Theorem 3.12 is to take step functions and "increase them" to get Riemann integrability of a function. A consequence of Theorem 3.12 is that if X ≤ 0, then there exist SRVs X_n such that X_n ↓ X.
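One standard choice of such SRVs (a sketch I added, since the book omits the proof) is the dyadic staircase X_n = min(⌊2^n X⌋/2^n, n): it is simple, nonnegative, nondecreasing in n, and converges up to X pointwise. The code below just applies it to a few values of X(ω).

# Dyadic staircase approximation X_n = min(floor(2^n x)/2^n, n) increasing to x; illustration only.
import math

def staircase(x, n):
    return min(math.floor((2 ** n) * x) / (2 ** n), n)

for x in [0.3, 2.71828, 10.5]:
    print(x, [staircase(x, n) for n in range(1, 8)])
    # nondecreasing in n, converging up to x (capped at n until n exceeds x)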
Definition 3.9. Let X ≥ 0 be a nonnegative RV.
a) E(X) = lim_{n→∞} E(X_n) = ∫ X dP ≤ ∞ where the X_n are nonnegative SRVs with 0 ≤ X_n ↑ X.
b) The expectation of X over an event A is E(X I_A).
Proof of existence and uniqueness:
existence: 0 ≤ E(X1 ) ≤ E(X2 ) ≤ .... So {E(Xn )} is a monotone sequence
and limn→∞ E(Xn ) exists in [0, ∞].
uniqueness (show E(X) is well defined): later
The first and last equalities hold by the definition of expected value for nonnegative RVs. The second equality holds by linearity for SRVs. The third equality holds since lim(a_n + b_n) = lim a_n + lim b_n if the RHS exists.
b) Let W = Y − X ≥ 0. Since E(Z) ≥ 0 when Z ≥ 0, E(Y − X) ≥ 0. Using a) gives E(X) ≤ E(Y).
iv) |E(X)| ≤ E( |X| ).
Proof. i) If X is integrable, then E[|X|] = E[X + ] + E[X − ] by Theorem
3.13 a). Since E[X + ] ≥ 0, E[X − ] ≥ 0, and the sum is finite, both terms are
finite. If both E[X + ] and E[X − ] are finite, then E[|X|] = E[X + ] + E[X − ] is
finite.
ii)
iii) By ii) 0 ≤ E(Y − X) = E(Y ) − E(X). Thus E(Y ) ≥ E(X).
iv) Since −|X| ≤ X ≤ |X|, iii) implies that E(X) ≤ E(|X|) and
−E(|X|) ≤ E(X). Thus −E(X) ≤ E(|X|). Hence |E(X)| ≤ E( |X| ).
Theorem 3.15: Fatou’s Lemma: For RVs Xn ≥ 0, E[lim inf n Xn ] ≤
lim inf n E[Xn ].
Proof.
Theorem 3.16: Monotone Convergence Theorem (MCT): If 0 ≤
Xn ↑ X ae, then
E(Xn ) ↑ E(X).
Proof. The proof is for when the convergence is everywhere. Then Xn ↑ X
implies E(Xn ) ≤ E(X) for all n using monotonicity of nonnegative RVs. Thus
limsupn E(Xn ) ≤ E(X). By Fatou’s lemma:
by MCT.
Remark 3.9. Consequences: a) linearity implies E(Σ_{n=1}^k a_n X_n) = Σ_{n=1}^k a_n E(X_n): i.e., the expectation and finite sum operators can be interchanged, or the expectation of a finite sum is the sum of the expectations if the X_n are integrable.
b) MCT, LDCT, and BCT give conditions where the limit and E can be interchanged: lim_n E(X_n) = E[lim_n X_n] = E(X).
c) Theorem 3.18 i) and ii) give conditions where the infinite sum Σ_{n=1}^∞ and the expected value can be interchanged: E[Σ_{n=1}^∞ X_n] = Σ_{n=1}^∞ E(X_n).
Definition 3.12. Given (Ω, F, P), the collection of all integrable random vectors or random variables is denoted by L1 = L1(Ω, F, P).
Definition 3.13. Let X be a 1 × k random vector with cdf FX(t) = F(t) = P(X_1 ≤ t_1, ..., X_k ≤ t_k). Then the Lebesgue-Stieltjes integral E[h(X)] = ∫ h(t) dF(t) provided the expected value exists, and the integral is a linear operator with respect to both h and F. If X is a random variable, then E[h(X)] = ∫ h(t) dF(t). If W = h(X) is integrable or if W = h(X) ≥ 0, then the expected value exists. Here h : R^k → R^j with 1 ≤ j ≤ k.
Definition 3.14. The distribution of a 1 × k random vector X is a mixture distribution if the cdf of X is
FX(t) = Σ_{j=1}^J π_j F_{U_j}(t)
where the probabilities π_j satisfy 0 ≤ π_j ≤ 1 and Σ_{j=1}^J π_j = 1, J ≥ 2, and F_{U_j}(t) is the cdf of a 1 × k random vector U_j. Then X has a mixture distribution of the U_j with probabilities π_j. If X is a random variable, then
FX(t) = Σ_{j=1}^J π_j F_{U_j}(t).
f) Suppose X has a mixture distribution given by 68) and that E(h(X)) and the E(h(U_j)) exist. Then
E[h(X)] = Σ_{j=1}^J π_j E[h(U_j)] and E(X) = Σ_{j=1}^J π_j E[U_j].
This theorem is easy to prove if the U_j are continuous random vectors with (joint) probability density functions (pdfs) f_{U_j}(t). Then X is a continuous random vector with pdf
f_X(t) = Σ_{j=1}^J π_j f_{U_j}(t), and E[h(X)] = ∫_{−∞}^∞ · · · ∫_{−∞}^∞ h(t) f_X(t) dt = Σ_{j=1}^J π_j ∫_{−∞}^∞ · · · ∫_{−∞}^∞ h(t) f_{U_j}(t) dt = Σ_{j=1}^J π_j E[h(U_j)].
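A quick Monte Carlo check of the mixture formula (my illustration; the component distributions and h are arbitrary choices, not from the text): sample X by first picking a component j with probability π_j, then drawing from U_j, and compare the sample mean of h(X) with Σ_j π_j E[h(U_j)].

# Monte Carlo check of E[h(X)] = sum_j pi_j E[h(U_j)] for a two-component mixture; illustration only.
import random

random.seed(0)
pi = [0.3, 0.7]
draw = [lambda: random.gauss(0.0, 1.0), lambda: random.expovariate(1.0)]  # U_1 ~ N(0,1), U_2 ~ Exp(1)
h = lambda x: x * x

N = 200_000
total = 0.0
for _ in range(N):
    j = 0 if random.random() < pi[0] else 1   # pick a mixture component
    total += h(draw[j]())
print("Monte Carlo E[h(X)]:", total / N)

# Exact component values: E[N(0,1)^2] = 1 and E[Exp(1)^2] = 2.
print("pi-weighted exact:", pi[0] * 1.0 + pi[1] * 2.0)   # 1.7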
3.4 Summary
ii) The integral is defined unless it involves +∞ − ∞.
iii) The function f is integrable if both ∫ f⁺ dµ and ∫ f⁻ dµ are finite. Thus ∫ f dµ ∈ R if f is integrable.
49) A property holds almost everywhere (ae), if the property holds for
ω outside a set of measure 0, i.e. the property holds on a set A such that
µ(Ac ) = 0. If µ is a probability measure P , then P (A) = 1 while P (Ac ) = 0.
50) Theorem: suppose f and g are both nonnegative.
i) If f = 0 ae, then ∫ f dµ = 0.
ii) If µ({ω : f(ω) > 0}) > 0, then ∫ f dµ > 0.
iii) If ∫ f dµ < ∞, then f < ∞ ae.
iv) If f ≤ g ae, then ∫ f dµ ≤ ∫ g dµ.
v) If f = g ae, then ∫ f dµ = ∫ g dµ.
51) Theorem: i) f is integrable iff ∫ |f| dµ < ∞.
ii) monotonicity: If f and g are integrable and f ≤ g ae, then ∫ f dµ ≤ ∫ g dµ.
iii) linearity: If f and g are integrable and a, b ∈ R, then af + bg is integrable with ∫ (af + bg) dµ = a ∫ f dµ + b ∫ g dµ.
iv) Monotone Convergence Theorem (MCT): If 0 ≤ f_n ↑ f ae, then ∫ f_n dµ ↑ ∫ f dµ.
v) Fatou's Lemma: For nonnegative f_n, ∫ liminf_n f_n dµ ≤ liminf_n ∫ f_n dµ.
vi) Lebesgue's Dominated Convergence Theorem (LDCT): If the |f_n| ≤ g ae where g is integrable, and if f_n → f ae, then f and f_n are integrable and ∫ f_n dµ → ∫ f dµ.
vii) Bounded Convergence Theorem (BCT): If µ(Ω) < ∞ and the f_n are uniformly bounded, then f_n → f ae implies ∫ f_n dµ → ∫ f dµ.
viii) If f_n ≥ 0, then ∫ Σ_{n=1}^∞ f_n dµ = Σ_{n=1}^∞ ∫ f_n dµ.
ix) If Σ_{n=1}^∞ ∫ |f_n| dµ < ∞, then ∫ Σ_{n=1}^∞ f_n dµ = Σ_{n=1}^∞ ∫ f_n dµ.
x) If f and g are integrable, then |∫ f dµ − ∫ g dµ| ≤ ∫ |f − g| dµ.
52) Consequences: a) linearity implies ∫ Σ_{n=1}^k f_n dµ = Σ_{n=1}^k ∫ f_n dµ: i.e., the integral and finite sum operators can be interchanged.
b) MCT, LDCT, and BCT give conditions where the limit and ∫ can be interchanged: lim_n ∫ f_n dµ = ∫ lim_n f_n dµ = ∫ f dµ.
c) 51) viii) and ix) give conditions where the infinite sum Σ_{n=1}^∞ and the integral can be interchanged: ∫ Σ_{n=1}^∞ f_n dµ = Σ_{n=1}^∞ ∫ f_n dµ.
53) A common technique is to show the result is true for indicators. Extend to simple functions by linearity, and then to nonnegative functions by a monotone passage to the limit. Use f = f⁺ − f⁻ for general functions.
54) Induction Theorem: If R(n) is a statement for each n ∈ N such that a) R(1) is true, and b) for each k ∈ N, if R(k) is true, then R(k + 1) is true, then R(n) is true for each n ∈ N.
Note that ∞ ∉ N. Induction can be used with linearity to prove 52) a), but induction generally does not work for 52) c).
55) Def. If A ∈ F, then ∫_A f dµ = ∫ f I_A dµ.
56) If µ(A) = 0, then ∫_A f dµ = 0.
57) If µ : F → [0, ∞] is a measure and f ≥ 0, then
a) ν(A) = ∫_A f dµ is a measure on F.
b) If ∫_Ω f dµ = 1, then P(A) = ∫_A f dµ is a probability measure on F.
58) For expected values, assume (Ω, F, P) is fixed, and the random variables are measurable wrt F.
59) We can define the expected value to be E(X) = ∫ X dP as the special case of integration where the measure µ = P is a probability measure, or we can use a definition that ignores most measure theory.
60) Def. Let X ≥ 0 be a nonnegative RV.
a) E(X) = lim_{n→∞} E(X_n) = ∫ X dP ≤ ∞ where the X_n are nonnegative SRVs with 0 ≤ X_n ↑ X.
b) The expectation of X over an event A is E(X I_A).
There are several equivalent ways to define integrals and expected values.
Hence E(X) can also be defined as in 43) with µ replaced by P and f replaced
by X : Ω → R.
61) Theorem: Let X, Y be nonnegative random variables.
a) For X, Y ≥ 0 and a, b ≥ 0, E(aX + bY ) = aE(X) + bE(Y ).
b) If X ≤ Y ae, then E(X) ≤ E(Y ).
By induction, if the a_i X_i ≥ 0, then E(Σ_{i=1}^n a_i X_i) = Σ_{i=1}^n E(a_i X_i): the expected value of a finite sum of nonnegative RVs is the sum of the expected values.
62) For a random variable X : Ω → (−∞, ∞), the positive part X⁺ = X I(X ≥ 0) = max(X, 0), and the negative part X⁻ = −X I(X ≤ 0) = −min(X, 0).
69) Expected Value Theorem: Assume all expected values exist. Let dx = dx_1 dx_2 ... dx_k. Let X be the support of X = {x : f(x) > 0} or {x : p(x) > 0}.
a) If X has (joint) pdf f(x), then E[h(X)] = ∫_{−∞}^∞ · · · ∫_{−∞}^∞ h(x) f(x) dx = ∫ · · · ∫_X h(x) f(x) dx. Hence E[X] = ∫_{−∞}^∞ · · · ∫_{−∞}^∞ x f(x) dx = ∫ · · · ∫_X x f(x) dx.
b) If X has pdf f(x), then E[h(X)] = ∫_{−∞}^∞ h(x) f(x) dx = ∫_X h(x) f(x) dx. Hence E[X] = ∫_{−∞}^∞ x f(x) dx = ∫_X x f(x) dx.
c) If X has (joint) pmf p(x), then E[h(X)] = Σ_{x_1} · · · Σ_{x_k} h(x) p(x) = Σ_{x∈R^k} h(x) p(x) = Σ_{x∈X} h(x) p(x). Hence E[X] = Σ_{x_1} · · · Σ_{x_k} x p(x) = Σ_{x∈R^k} x p(x) = Σ_{x∈X} x p(x).
d) If X has pmf p(x), then E[h(X)] = Σ_x h(x) p(x) = Σ_{x∈X} h(x) p(x). Hence E[X] = Σ_x x p(x) = Σ_{x∈X} x p(x).
e) Suppose X has a mixture distribution given by 68) and that E(h(X)) and the E(h(U_j)) exist. Then
E[h(X)] = Σ_{j=1}^J π_j E[h(U_j)] and E(X) = Σ_{j=1}^J π_j E[U_j].
f) Suppose X has a mixture distribution given by 68) and that E(h(X)) and the E(h(U_j)) exist. Then
E[h(X)] = Σ_{j=1}^J π_j E[h(U_j)] and E(X) = Σ_{j=1}^J π_j E[U_j].
This theorem is easy to prove if the U_j are continuous random vectors with (joint) probability density functions (pdfs) f_{U_j}(t). Then X is a continuous random vector with pdf
f_X(t) = Σ_{j=1}^J π_j f_{U_j}(t), and E[h(X)] = ∫_{−∞}^∞ · · · ∫_{−∞}^∞ h(t) f_X(t) dt = Σ_{j=1}^J π_j ∫_{−∞}^∞ · · · ∫_{−∞}^∞ h(t) f_{U_j}(t) dt = Σ_{j=1}^J π_j E[h(U_j)].
88) The result in 87) can be extended to where the limits of integration
are infinite and to n ≥ 2 integrals. Using g(x, y) = h(x, y)f(x, y) where f is
a pdf gives E[h(X, Y )]. Note that g : R2 → R (at least ae).
3.5 Complements
3.6 Problems
X = 1·I_{(0,0.75)} + 1·I_{(0.5,1)}.
a) Find E(X) using linearity: E(Σ_{i=1}^n x_i I_{A_i}) = Σ_{i=1}^n x_i P(A_i).
b) Find E(X) = Σ_x x P(X = x) by finding the two distinct values of x in the range of X and the two values of P(X = x).
(Note: for X = 1·I_{(0,0.75)} + 1·I_{(0.5,1)}, n = 2, and x_i = 1 for i = 1, 2. Thus E(X) ≠ Σ_{i=1}^n x_i P(X = x_i) = 2(1)P(X = 1). Need the x_i to be the distinct values of the range of SRV X for E(X) = Σ_{i=1}^n x_i P(X = x_i) = Σ_x x P(X = x).)
3.4. Fix (Ω, F, P). Let the induced probability PX = PF be PX(B) = P[X^{−1}(B)] for any B ∈ B(R). Show that E[I_B(X)] = ∫ I_B dPX.
Note: X can be the claims distribution for an insurance policy where 95% of the policy holders make no claim in the year, and 5% make a claim with a complicated nonnegative distribution U_2 where the mean and variance are known from extensive past records. Then the central limit theorem can be used to find the percentiles of Σ X_i where the X_i are iid from the distribution of X.
3.11. The random variable X is a point mass at the real number c if
P (X = c) = 1. Then the pmf pX (x) > 0 only at x = c. If h is a (measurable)
function, find E[h(X)].
3.12. Like part of Billingsley (1986, 5.12): Let X = Σ_{k=1}^n I_{A_k} be a simple random variable, and find E[X/n].
3.13. Billingsley (1986, 5.14): Prove that if X has nonnegative integers as values, then E[X] = Σ_{n=1}^∞ P(X ≥ n).
Hint: E[X] = Σ_x x P(X = x) = Σ_{n=1}^∞ n P(X = n). Consider the following array, and sum on columns and sum on rows.
Table 3.1
P(X=1)  P(X=2)  P(X=3)  P(X=4)  · · ·   row sum = P(X ≥ 1)
        P(X=2)  P(X=3)  P(X=4)  · · ·   row sum = P(X ≥ 2)
                P(X=3)  P(X=4)  · · ·   row sum = P(X ≥ 3)
                        P(X=4)  · · ·   row sum = P(X ≥ 4)
...
column sums: P(X=1), 2P(X=2), 3P(X=3), 4P(X=4), · · · ; total = E(X)
b) Using a), show that if E[h(X)] and the E[h(U_j)] exist, then E[h(X)] = Σ_{j=1}^J π_j E[h(U_j)].
3.17. Let P be the uniform U(0,1) probability and let X = 1·I_{(0,0.7)} + 1·I_{(0.6,1)}. Find E(X).
3.18. Suppose X = Σ_{i=1}^n x_i I_{A_i} where the x_i are real numbers and the A_i are events. Using linearity, find E(X).
iv) Monotone Convergence Theorem (MCT): If 0 ≤ f_n ↑ f ae, then ∫ f_n dµ ↑ ∫ f dµ.
v) Fatou's Lemma: For nonnegative f_n, ∫ liminf_n f_n dµ ≤ liminf_n ∫ f_n dµ.
Use the pmf pX(t) to show that if E[h(X)] and the E[h(U_j)] exist, then E[h(X)] = Σ_{j=1}^J π_j E[h(U_j)].
3.24. Prove one of the following: a) the Monotone Convergence Theorem for RVs, b) If X_n ≥ 0, then E[Σ_{n=1}^∞ X_n] = Σ_{n=1}^∞ E[X_n], or c) Lebesgue's Dominated Convergence Theorem for RVs. State which result, a), b), or c), you are proving.
3.25.
3.26.
3.27.
3.28.
3.29.
Some Qual Type Problems
3.30Q. Suppose events A_1, ..., A_n are disjoint and ⊎_{i=1}^n A_i = Ω. Let simple random variable (SRV) X = Σ_{i=1}^n x_i I_{A_i}. Then the expected value of X is
E(X) = Σ_{i=1}^n x_i P(A_i) = Σ_x x P(X = x).   (3.3)
Chapter 4
Large Sample Theory
Example 4.1. Suppose that X_n ∼ U(−1/n, 1/n). Then the cdf F_n(x) of X_n is
F_n(x) = 0 for x ≤ −1/n; F_n(x) = nx/2 + 1/2 for −1/n ≤ x ≤ 1/n; F_n(x) = 1 for x ≥ 1/n.
Sketching F_n(x) shows that it has a line segment rising from 0 at x = −1/n to 1 at x = 1/n and that F_n(0) = 0.5 for all n ≥ 1. Examining the cases x < 0, x = 0 and x > 0 shows that as n → ∞,
F_n(x) → 0 for x < 0; F_n(x) → 1/2 for x = 0; F_n(x) → 1 for x > 0.
Example 4.2. Suppose Yn ∼ U (0, n). Then Fn (t) = t/n for 0 < t ≤ n
and Fn (t) = 0 for t ≤ 0. Hence limn→∞ Fn (t) = 0 for t ≤ 0. If t > 0 and
n > t, then Fn (t) = t/n → 0 as n → ∞. Thus limn→∞ Fn (t) = H(t) = 0
for all t, and Yn does not converge in distribution to any random variable Y
since H(t) ≡ 0 is a continuous function but not a cdf.
See Section 2.4 for some properties of the point mass distribution, which
corresponds to a discrete random variable that only takes on exactly one
value. Using characteristic functions, it can be shown that if X has a point
mass at τ (θ), then X ∼ N (τ (θ), 0), a normal distribution with mean τ (θ)
and variance 0. See Section 4.2. A point mass at 0, where P (X = 0) = 1, is
a common limiting distribution. See Examples 4.1 and 4.3.
Example 4.3. X has a point mass distribution at c or X is degenerate at c if P(X = c) = 1. Thus X has a probability mass function with all of the mass at the point c. Then FX(t) = 1 for t ≥ c and FX(t) = 0 for t < c. Often FXn(t) → FX(t) for all t ≠ c where P(X = c) = 1. Then X_n →D X where P(X = c) = 1. Thus FXn(t) → H(t) for all t ≠ c where H(t) = FX(t) ∀t ≠ c. It is possible that lim_{n→∞} FXn(c) = H(c) ∈ [0, 1] or that lim_{n→∞} FXn(c) does not exist.
Example 4.4. Determine, with proof, whether the following sequences of random variables X_n converge in distribution to some random variable X. If X_n →D X, find the distribution of X (for example, find FX(t) or note that P(X = c) = 1, so X has the point mass distribution at c).
a) Xn ∼ U (−n − 1, −n)
b) Xn ∼ U (n, n + 1)
c) Xn ∼ U (an , bn ) where an → a < b and bn → b.
d) Xn ∼ U (an , bn ) where an → c and bn → c.
e) Xn ∼ U (−n, n)
f) Xn ∼ U (c − 1/n, c + 1/n)
Solution. If X_n ∼ U(a_n, b_n) with a_n < b_n, then
FXn(t) = (t − a_n)/(b_n − a_n)
for a_n ≤ t ≤ b_n, FXn(t) = 0 for t ≤ a_n and FXn(t) = 1 for t ≥ b_n. On [a_n, b_n], FXn(t) is a line segment from (a_n, 0) to (b_n, 1) with slope 1/(b_n − a_n).
a) FXn(t) → H(t) ≡ 1 ∀t ∈ R since FXn(t) = 1 for t ≥ −n. Since H(t) is continuous but not a cdf, X_n does not converge in distribution to any RV X.
b) FXn(t) → H(t) ≡ 0 ∀t ∈ R since FXn(t) = 0 for t < n. Since H(t) is continuous but not a cdf, X_n does not converge in distribution to any RV X.
c)
FXn(t) → FX(t) = 0 for t ≤ a; (t − a)/(b − a) for a ≤ t ≤ b; 1 for t ≥ b.
Hence X_n →D X ∼ U(a, b).
d)
FXn(t) → 0 for t < c; FXn(t) → 1 for t > c.
Hence X_n →D X where P(X = c) = 1. Hence X has a point mass distribution at c. (The behavior of lim_{n→∞} FXn(c) is not important, even if the limit does not exist.)
e)
FXn(t) = (t + n)/(2n) = 1/2 + t/(2n)
for −n ≤ t ≤ n. Thus FXn(t) → H(t) ≡ 0.5 ∀t ∈ R. Since H(t) is continuous but not a cdf, X_n does not converge in distribution to any RV X.
f)
FXn(t) = (t − c + 1/n)/(2/n) = 1/2 + (n/2)(t − c)
for c − 1/n ≤ t ≤ c + 1/n. Thus
FXn(t) → H(t) = 0 for t < c; 1/2 for t = c; 1 for t > c.
Hence t = c is the only discontinuity point of FX(t), and H(t) = FX(t) at all continuity points of FX(t). Thus X_n →D X where P(X = c) = 1.
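The behavior in part f) can be seen numerically. The sketch below (my illustration, with the arbitrary choice c = 2) evaluates FXn(t) just below c, at c, and just above c for increasing n; the values head to 0, 1/2, and 1 respectively, matching H(t).

# Evaluate F_{X_n}(t) for X_n ~ U(c - 1/n, c + 1/n), as in Example 4.4 f); illustration only.
c = 2.0

def F(t, n):
    a, b = c - 1.0 / n, c + 1.0 / n
    if t <= a:
        return 0.0
    if t >= b:
        return 1.0
    return (t - a) / (b - a)

for t in [c - 0.01, c, c + 0.01]:
    print(t, [round(F(t, n), 3) for n in (1, 10, 100, 1000)])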
Definition 4.3. a) A sequence of random variables X_n converges in probability to a constant τ(θ), written X_n →P τ(θ), if for every ε > 0, P(|X_n − τ(θ)| ≥ ε) → 0 as n → ∞.
Notice that X_n →P X if X_n − X →P 0.
E(|Y_n − Y|^r) → 0
P[u(Y) ≥ c] ≤ E[u(Y)]/c.
If µ = E(Y) exists, then taking u(y) = |y − µ|^r and c̃ = c^r gives
Markov's Inequality: for r > 0 with E[|Y − µ|^r] finite and for any c > 0,
P(|Y − µ| ≥ c) = P(|Y − µ|^r ≥ c^r) ≤ E[|Y − µ|^r]/c^r.
If r = 2 and σ² = V(Y) exists, then we obtain
Chebyshev's Inequality:
P(|Y − µ| ≥ c) ≤ V(Y)/c².
Proof. The proof is given for pdfs. For pmfs, replace the integrals by sums. Now
E[u(Y)] = ∫_R u(y)f(y) dy = ∫_{{y:u(y)≥c}} u(y)f(y) dy + ∫_{{y:u(y)<c}} u(y)f(y) dy
≥ ∫_{{y:u(y)≥c}} u(y)f(y) dy ≥ c ∫_{{y:u(y)≥c}} f(y) dy = c P[u(Y) ≥ c].
Note: if E[|Y − µ|k ] is finite and k > 1, then E[|Y − µ|r ] is finite for
1 ≤ r ≤ k.
The following theorem gives sufficient conditions for T_n to converge in probability to τ(θ). Notice that MSE_{τ(θ)}(T_n) → 0 is equivalent to T_n →qm τ(θ).
Theorem 4.2. a) If
lim_{n→∞} MSE_{τ(θ)}(T_n) = 0,
then T_n →P τ(θ).
b) If
lim_{n→∞} V_θ(T_n) = 0 and lim_{n→∞} E_θ(T_n) = τ(θ),
then T_n →P τ(θ).
Proof. a) Using Theorem 4.1 with Y = T_n, u(T_n) = (T_n − τ(θ))² and c̃ = ε² shows that for any ε > 0,
P_θ(|T_n − τ(θ)| ≥ ε) = P_θ[(T_n − τ(θ))² ≥ ε²] ≤ E_θ[(T_n − τ(θ))²]/ε².
Hence
lim_{n→∞} E_θ[(T_n − τ(θ))²] = lim_{n→∞} MSE_{τ(θ)}(T_n) → 0
is a sufficient condition for T_n →P τ(θ).
b) MSE_{τ(θ)}(T_n) = V_θ(T_n) + [Bias_{τ(θ)}(T_n)]²
where Bias_{τ(θ)}(T_n) = E_θ(T_n) − τ(θ). Since MSE_{τ(θ)}(T_n) → 0 if both V_θ(T_n) → 0 and Bias_{τ(θ)}(T_n) = E_θ(T_n) − τ(θ) → 0, the result follows from a).
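A simulation sketch I added to illustrate Theorem 4.2 b): for T_n = Ȳ_n with iid Exp(1) data, E(T_n) = 1 and V(T_n) = 1/n → 0, so T_n →P 1; the empirical exceedance frequencies shrink and stay below the Chebyshev bound V(T_n)/ε². The data model and ε are arbitrary choices.

# T_n = sample mean of iid Exp(1) data converging in probability to mu = 1; illustration only.
import random

random.seed(1)
mu, eps, reps = 1.0, 0.2, 2000

def sample_mean(n):
    return sum(random.expovariate(1.0) for _ in range(n)) / n

for n in [10, 50, 200, 1000]:
    freq = sum(abs(sample_mean(n) - mu) >= eps for _ in range(reps)) / reps
    bound = (1.0 / n) / eps ** 2          # Chebyshev bound V(T_n)/eps^2, variance of Exp(1) is 1
    print(n, "empirical:", round(freq, 4), "Chebyshev bound:", round(bound, 4))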
Remark 4.2. We want conditions A ⇒ B where B is X_n →P X. A ⇒ B does not mean that if A does not hold, then B does not hold. A ⇒ B means that if A holds, then B holds. A common error is for the student to say A does not hold, so X_n does not converge in probability to X.
Theorem 4.3. a) Suppose X_n and X are RVs with the same probability space. If X_n →P X, then X_n →D X.
b) X_n →P τ(θ) iff X_n →D τ(θ).
Proof. a) Assume X_n →P X, and let ε > 0. Then
F_n(x) = P(X_n ≤ x) = P(X_n ≤ x, X ≤ x + ε) + P(X_n ≤ x, X > x + ε) ≤ FX(x + ε) + P(|X_n − X| ≥ ε)
where the second equality holds because the events form a partition. P(X_n ≤ x, X > x + ε) ≤ P(|X_n − X| ≥ ε) by the following diagram with e = ε.
X_n X
--------------------------
x x+e
Note that P(X_n ≤ x, X ≤ x + ε) ≤ P(X ≤ x + ε) since P(A ∩ B) ≤ P(B).
Similarly,
FX(x − ε) = P(X ≤ x − ε) = P(X ≤ x − ε, X_n > x) + P(X ≤ x − ε, X_n ≤ x)
X X_n
--------------------------
x-e x
Thus
P[|X_n − c| ≥ ε] → 1 − 1 + 0 = 0 as n → ∞, and X_n →P c.
Definition 4.5. a) A sequence of random variables Xn converges with
probability 1 (or almost surely, or almost everywhere) to X if
P ( lim Xn = X) = 1.
n→∞
b) X_n →wp1 τ(θ) if P(lim_{n→∞} X_n = τ(θ)) = 1.
Theorem 4.5: Let k > 0. If E(X^k) is finite, then E(X^j) is finite for 0 < j ≤ k.
Proof. If |y| ≤ 1, then |y^j| = |y|^j ≤ 1. If |y| > 1, then |y|^j ≤ |y|^k. Thus |y|^j ≤ |y|^k + 1 and |X|^j ≤ |X|^k + 1. Hence E[|X|^j] ≤ E[|X|^k] + 1 < ∞.
g[E(X)] ≤ E[g(X)]
if the expected values exist and the function g is convex on an interval containing the range of X.
Remark 4.4. a) Let (a, b) be an open interval where a = −∞ and b = ∞ are allowed. A sufficient condition for a function g to be convex on an open interval (a, b) is g″(x) > 0 on (a, b). If (a, b) = (0, ∞) and g is continuous on [0, ∞) and convex on (0, ∞), then g is convex on [0, ∞).
b) If X is a positive RV, then the range of X is (0, ∞).
Theorem 4.7: If X_n →r X, then X_n →k X where 0 < k < r.
Proof. Let U_n = |X_n − X|^r and W_n = |X_n − X|^k. Then U_n = W_n^t where t = r/k > 1. The function g(x) = x^t is convex on [0, ∞). By Jensen's inequality,
(E[|X_n − X|^k])^{r/k} = [E(W_n)]^t ≤ E(W_n^t) = E(U_n) = E[|X_n − X|^r]
for r > k. Thus lim_{n→∞} E[|X_n − X|^r] = 0 implies that lim_{n→∞} E[|X_n − X|^k] = 0 for 0 < k < r.
Theorem 4.8. If X_n →r X, then X_n →P X.
Proof I) For ε > 0,
|X_n − X|^r ≥ |X_n − X|^r I(|X_n − X| ≥ ε) ≥ ε^r I(|X_n − X| ≥ ε)
where the first inequality holds since the indicator is 0 or 1, and the second inequality holds since |X_n − X|^r ≥ ε^r when the indicator is 1. Thus for any ε > 0,
E[|X_n − X|^r] ≥ E[|X_n − X|^r I(|X_n − X| ≥ ε)] ≥ E[ε^r I(|X_n − X| ≥ ε)] = ε^r P[|X_n − X| ≥ ε].
Hence
P[|X_n − X| ≥ ε] ≤ E[|X_n − X|^r]/ε^r → 0
as n → ∞.
Proof II)
P[|X_n − X| ≥ ε] = P[|X_n − X|^r ≥ ε^r] ≤ E[|X_n − X|^r]/ε^r → 0
as n → ∞ by the Generalized Chebyshev Inequality.
Hence X_n →1 0 as expected by Theorem 4.7 since X_n →2 0.
Theorem 4.9: Let X_n have pdf fXn(x), and let X have pdf fX(x). If fXn(x) → fX(x) for all x (or for x outside of a set of Lebesgue measure 0), then X_n →D X.
Theorem 4.10: Suppose X_n and X are integer valued RVs with pmfs fXn(x) and fX(x). Then X_n →D X iff P(X_n = k) → P(X = k) for every integer k iff fXn(x) → fX(x) for every real x.
Definition 4.7. If the mgf exists, then the cumulant generating func-
tion (cgf) k(t) = log(m(t)) for the values of t where the mgf is defined.
formulas i) and ii) “hold” if Y has a pmf, at least for t such that the mgf is
defined. If Y is nonnegative then the mgf is a scaled Laplace transformation
and c(t) is a scaled Fourier transformation, and then the two formulas i) and
ii) hold by Laplace and Fourier transformation theory, at least for t such that
the mgf is defined. The Taylor series for the mgf is
mY(t) = Σ_{k=0}^∞ E[Y^k] t^k / k!
for all real t if Y has an mgf defined for all real t. Hence if b = ∞, the two
formulas i) and ii) hold. See Billingsley (1986, pp. 285, 353).
b) If E[Y²] is finite, then
cY(t) = 1 + itE(Y) − (1/2)t²E[Y²] + o(t²) as t → 0.
In particular, if E(Y) = 0 and E(Y²) = V(Y) = σ², then
cY(t) = 1 − t²σ²/2 + o(t²) as t → 0.   (4.2)
Here a(t) = o(t²) as t → 0 if lim_{t→0} a(t)/t² = 0. See Billingsley (1986, p. 354).
c) Properties of c(t): i) c(0) = 1, ii) the modulus |c(t)| ≤ 1 for all real t,
iii) c(t) is a continuous function.
d) If Y has mgf m(t), then E(Y k ) is finite for each positive integer k.
e) A complex random variable Z = X + iY where X and Y are ordi-
nary random √variables. Then E(Z) = E(X) + iE(Y ), and E(Z) exists if
E(|Z|) = E( X 2 + Y 2 ) < ∞. Linearity of expectation and key inequali-
ties such as |E(Z)| ≤ E(|Z|) remain valid. Also, if Z_1 ⫫ Z_2 (Z_1 and Z_2 are independent) and g_i(Z_i) is a
function of the complex random variable Zi alone, then E[g1 (Z1 )g2 (Z2 )] =
E[g1 (Z1 )]E[g2(Z2 )] if the expectations exist. Z = eitY is the main complex
random variable in this book.
Note that c(0) = E(e^{i0X}) = E(e^0) = 1. Note that |c(t)| = |E[e^{itX}]| ≤ E(|e^{itX}|) = E[√([cos(tX)]² + [sin(tX)]²)] = E(1) = 1 by f) since [cos(tX(ω))]² + [sin(tX(ω))]² = 1 for all ω.
Theorem 4.11. Suppose that the mgf m(t) exists for |t| < b for some constant b > 0, and suppose that the kth derivative m^{(k)}(t) exists for |t| < b. Then E[Y^k] = m^{(k)}(0) for positive integers k. In particular, E[Y] = m′(0) and E[Y²] = m″(0). For the cumulant generating function k(t) = kY(t), E(Y) = k′(0) and V(Y) = k″(0). If E(Y^k) exists for a positive integer k, then
E[Y^k] = (1/i^k) c^{(k)}(0).
Note that
k′(0) = (d/dt) log(mY(t))|_{t=0} = m′Y(0)/mY(0) = E(Y)/1 = E(Y).
Now
k″(t) = (d/dt)[m′Y(t)/mY(t)] = [m″Y(t) mY(t) − (m′Y(t))²]/[mY(t)]².
So
k″(0) = m″Y(0) − [m′Y(0)]² = E(Y²) − [E(Y)]² = V(Y).
Definition 4.10. Random variables X and Y are identically distributed, written X ∼ Y, X =D Y, or Y ∼ FX, if FX(y) = FY(y) for all real y.
Proof of the WLLN. Want to show that if the X_i are iid with E(X_i) < ∞, then X̄_n = T_n/n →D E(X_1) where T_n = Σ_{i=1}^n X_i = n X̄_n. Let Y_i = X_i − E(X_i) have characteristic function ϕ_Y(t). Then Ȳ_n = T_n/n − E(X_1) has characteristic function
ψ_n(t) = [ϕ_Y(t/n)]^n.
Now
|[ϕ_Y(t/n)]^n − 1| = |[ϕ_Y(t/n)]^n − 1^n| ≤ Σ_{k=1}^n |ϕ_Y(t/n) − 1| = n |ϕ_Y(t/n) − 1|.
If t ≠ 0, then
|[ϕ_Y(t/n)]^n − 1| ≤ n |ϕ_Y(t/n) − 1| = [|ϕ_Y(t/n) − ϕ_Y(0)| / (|t|/n)] |t| → |t| |ϕ′_Y(0)| = 0
since ϕ′_Y(0) = iE(Y_1) = 0. Hence ψ_n(t) → 1 for all t, the characteristic function of an X with P(X = 0) = 1, and
T_n/n − E(X_1) →D X.
Thus
T_n/n − E(X_1) + E(X_1) = T_n/n = X̄_n →D E(X_1)
by Slutsky's theorem using a_n = E(X_1) → a = E(X_1).
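The characteristic-function step in this proof can be checked numerically. The sketch below is my illustration with the arbitrary choice X ∼ U(0,1), so Y = X − 1/2 has ϕ_Y(t) = sin(t/2)/(t/2); it shows ψ_n(t) = [ϕ_Y(t/n)]^n → 1 and that the bound n|ϕ_Y(t/n) − 1| from the proof shrinks to 0.

# Numerical check of psi_n(t) = [phi_Y(t/n)]^n -> 1 for Y = X - E(X), X ~ U(0,1); illustration only.
import math

def phi_Y(t):
    return 1.0 if t == 0 else math.sin(t / 2.0) / (t / 2.0)

t = 3.0
for n in [1, 10, 100, 1000, 10000]:
    psi_n = phi_Y(t / n) ** n
    bound = n * abs(phi_Y(t / n) - 1.0)      # the bound n|phi_Y(t/n) - 1| used in the proof
    print(n, round(psi_n, 6), "bound on |psi_n - 1|:", round(bound, 6))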
provided that the expectation exists for all t in some neighborhood of the
origin 0.
Theorem 4.13. If Y1 , ..., Yn have a cf cY (t) and mgf mY (t) then the
marginal cf and mgf for Yi1 , ..., Yik are found from the joint cf and mgf by
replacing tij by 0 for j = k + 1, ..., n. In particular, if Y = (Y 1 , Y 2 )T and
t = (t1 , t2 )T , then
If the joint mgf exists, then the random vectors Y 1 and Y 2 are independent
iff their joint mgf factors into the product of their marginal mgfs:
∀t in some neighborhood of 0.
Note that if Y_1 ⫫ Y_2, then
cY(t) = E[exp(i t_1^T Y_1 + i t_2^T Y_2)] = E[exp(i t_1^T Y_1)] E[exp(i t_2^T Y_2)] = cY_1(t_1) cY_2(t_2)
for any t = (t_1^T, t_2^T)^T ∈ R^n.
Theorem 4.15. a) The characteristic function uniquely determines the
distribution.
b) If the moment generating function exists, then it uniquely determines
the distribution.
c) Assume that Y_1, ..., Y_n are independent with characteristic functions cY_i(t). Then the characteristic function of W = Σ_{i=1}^n Y_i is
cW(t) = ∏_{i=1}^n cY_i(t).   (4.3)
e) Assume that Y_1, ..., Y_n are independent with mgfs mY_i(t). Then the mgf of W = Σ_{i=1}^n Y_i is
mW(t) = ∏_{i=1}^n mY_i(t).   (4.5)
f) Assume that Y_1, ..., Y_n are iid with mgf mY(t). Then the mgf of W = Σ_{i=1}^n Y_i is
mW(t) = [mY(t)]^n.   (4.6)
g) Assume that Y_1, ..., Y_n are independent with characteristic functions cY_i(t). Then the characteristic function of W = Σ_{j=1}^n (a_j + b_j Y_j) is
cW(t) = exp(it Σ_{j=1}^n a_j) ∏_{j=1}^n cY_j(b_j t).   (4.7)
h) Assume that Y_1, ..., Y_n are independent with mgfs mY_i(t). Then the mgf of W = Σ_{i=1}^n (a_i + b_i Y_i) is
mW(t) = exp(t Σ_{i=1}^n a_i) ∏_{i=1}^n mY_i(b_i t).   (4.8)
Partial Proof:
c) c_{Σ_{j=1}^n Y_j}(t) = E[e^{it Σ_{j=1}^n Y_j}] = E[e^{itY_1 + ··· + itY_n}] = E[∏_{j=1}^n e^{itY_j}] = ∏_{j=1}^n E[e^{itY_j}] = ∏_{j=1}^n cY_j(t), where the fourth equality uses independence.
The proofs for d), e), and f) are similar, but for mgfs, omit the i's and change c to m.
g) Recall that exp(w) = e^w and exp(Σ_{j=1}^n d_j) = ∏_{j=1}^n exp(d_j). Now
cW(t) = E(e^{itW}) = E(exp[it Σ_{j=1}^n (a_j + b_j Y_j)])
= exp(it Σ_{j=1}^n a_j) E(exp[Σ_{j=1}^n it b_j Y_j])
= exp(it Σ_{j=1}^n a_j) E(∏_{j=1}^n exp[it b_j Y_j])
= exp(it Σ_{j=1}^n a_j) ∏_{j=1}^n E[exp(it b_j Y_j)] = exp(it Σ_{j=1}^n a_j) ∏_{j=1}^n cY_j(b_j t),
where the last product of expectations uses independence.
The distribution of W = Σ_{i=1}^n Y_i is known as the convolution of Y_1, ..., Y_n. Even for n = 2, convolution formulas tend to be hard; however, the following two theorems suggest that to find the distribution of W = Σ_{i=1}^n Y_i, first find the mgf or characteristic function of W. If the mgf or cf is that of a brand name distribution, then W has that distribution. For example, if the mgf of W is a normal (ν, τ²) mgf, then W has a normal (ν, τ²) distribution, written W ∼ N(ν, τ²). This technique is useful for several brand name distributions given in Section 2.4.
Here ai and bi are fixed constants. Thus if Y1 , ..., Yn are iid N (µ, σ 2 ), then
Y ∼ N (µ, σ 2 /n).
f) If Y_1, ..., Y_n are independent Poisson POIS(θ_i), then Σ_{i=1}^n Y_i ∼ POIS(Σ_{i=1}^n θ_i).
Also Ȳ ∼ IG(θ, nλ).
d) If Y_1, ..., Y_n are independent negative binomial NB(r_i, ρ), then Σ_{i=1}^n Y_i ∼ NB(Σ_{i=1}^n r_i, ρ).
Example 4.6. Suppose Y_1, ..., Y_n are iid IG(θ, λ) where the mgf
mY_i(t) = m(t) = exp[(λ/θ)(1 − √(1 − 2θ²t/λ))]
for t < λ/(2θ²). Then the mgf of Σ_{i=1}^n Y_i is
[m(t)]^n = exp[(nλ/θ)(1 − √(1 − 2θ²t/λ))] = exp[(n²λ/(nθ))(1 − √(1 − 2(nθ)²t/(n²λ)))],
which is the mgf of an IG(nθ, n²λ) random variable. The last equality was obtained by multiplying nλ/θ by 1 = n/n and by multiplying 2θ²t/λ by 1 = n²/n². Hence Σ_{i=1}^n Y_i ∼ IG(nθ, n²λ).
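The algebra in Example 4.6 can be checked symbolically. The sketch below (my illustration, assuming the IG mgf formula quoted above and the availability of sympy) verifies that n times the IG(θ, λ) cumulant equals the IG(nθ, n²λ) cumulant.

# Symbolic check for Example 4.6 with sympy; illustration only.
import sympy as sp

t, n, theta, lam = sp.symbols("t n theta lambda", positive=True)

def ig_log_mgf(th, la):
    # log of the IG(th, la) mgf as quoted in the text
    return (la / th) * (1 - sp.sqrt(1 - 2 * th**2 * t / la))

diff = n * ig_log_mgf(theta, lam) - ig_log_mgf(n * theta, n**2 * lam)
print(sp.simplify(diff))     # 0, so [m(t)]^n is the IG(n*theta, n^2*lambda) mgf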
The CLT is also known as the Lindeberg-Lévy CLT, and several proofs will
be given later in this chapter.
Remark 4.7. i) The sample mean is estimating the population mean µ with a √n convergence rate, and the asymptotic distribution is normal.
ii)
Z_n = √n (Ȳ_n − µ)/σ = (Ȳ_n − µ)/(σ/√n) = (Σ_{i=1}^n Y_i − nµ)/(√n σ)
is the z-score of Ȳ and the z-score of Σ_{i=1}^n Y_i. Then Z_n →D N(0, 1). If Z_n →D N(0, 1), then the notation Z_n ≈ N(0, 1), also written as Z_n ∼ AN(0, 1), means approximate the cdf of Z_n by the standard normal cdf. Similarly, the notation
Ȳ_n ≈ N(µ, σ²/n),
also written as Ȳ_n ∼ AN(µ, σ²/n), means approximate the cdf of Ȳ_n as if Ȳ_n ∼ N(µ, σ²/n). Note that the approximate distribution, unlike the limiting distribution, often does depend on n.
iii) The notation Y_n →D X means that for large n we can approximate the cdf of Y_n by the cdf of X.
iv) The distribution of X is the limiting distribution or asymptotic distribution of Y_n, and the limiting distribution does not depend on n.
The two main applications of the CLT are to give the limiting distribution of √n(Ȳ_n − µ) and the limiting distribution of √n(Y_n/n − µ_X) for a random variable Y_n such that Y_n = Σ_{i=1}^n X_i where the X_i are iid with E(X) = µ_X and V(X) = σ²_X. Several of the random variables in Theorems 4.16 and 4.17 can be approximated in this way.
Given iid data from some distribution,
√ a common homework problem is to
find the limiting distribution of n(Y n − µ) using the CLT. You may need to
find E(Y ), E(Y 2 ), and V (Y ) = E(Y 2 ) − [E(Y )]2 . A variant of this problem
gives a formula for E(Y r ). Then find E(Y ) = E(Y 1 ) with r = 1 and E(Y 2 )
with r = 2.
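A short simulation sketch I added to illustrate the CLT recipe: with iid Exp(1) data (an arbitrary choice, so µ = σ² = 1), the standardized sample mean Z_n = √n(Ȳ_n − µ)/σ has moments and tail probabilities close to those of N(0, 1).

# CLT illustration: the standardized sample mean of iid Exp(1) data is roughly N(0,1).
import random, statistics

random.seed(3)
n, reps = 200, 4000
mu, sigma = 1.0, 1.0          # Exp(1) has mean 1 and variance 1

z = []
for _ in range(reps):
    ybar = sum(random.expovariate(1.0) for _ in range(n)) / n
    z.append((n ** 0.5) * (ybar - mu) / sigma)

print("mean of Z_n:", round(statistics.mean(z), 3))            # near 0
print("var of Z_n:", round(statistics.variance(z), 3))         # near 1
print("P(Z_n <= 1.645):", sum(v <= 1.645 for v in z) / reps)   # near 0.95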
√n (Y_n/n − ρ) →D N(0, ρ(1 − ρ))
since
√n (Y_n/n − ρ) =D √n (X̄_n − ρ) →D N(0, ρ(1 − ρ))
by a).
4.4 Slutsky's Theorem, the Continuity Theorem and Related Results
Theorem 4.19. Suppose X_n and X are RVs with the same probability space.
a) If X_n →P X, then X_n →D X.
b) If X_n →wp1 X, then X_n →P X and X_n →D X.
c) If X_n →r X, then X_n →P X and X_n →D X.
d) X_n →P τ(θ) iff X_n →D τ(θ).
e) If X_n →D X and X_n →D Y, then X =D Y and FX(x) = FY(x) for all real x.
Partial Proof. a) See Theorem 4.3. c) See Theorem 4.8. d) See Theorem 4.3.
e) Suppose X has cdf F and Y has cdf G. Then F and G agree at their common points of continuity. Hence F and G agree at all but countably many points since F and G are cdfs. Hence F and G agree at all points by right continuity.
Note: If X_n →A X and X_n →A Y, then X =D Y where A is wp1, r, or P. This result holds by Theorem 4.19 e) since if X_n →A X and X_n →A Y, then X_n →D X and X_n →D Y.
Theorem 4.20: Slutsky's Theorem. Suppose Y_n →D Y and W_n →P w for some constant w. Then
a) Y_n + W_n →D Y + w,
b) Y_n W_n →D wY, and
c) Y_n/W_n →D Y/w if w ≠ 0.
Remark 4.8. Note that Y_n →A Y implies Y_n →D Y where A = wp1, r, or P. Also W_n →P w iff W_n →D w. If a sequence of constants c_n → c as n → ∞ (regular convergence is everywhere convergence), then c_n →wp1 c and c_n →P c. So W_n →P w can be replaced by W_n →B w where B = D, wp1, r, P, or regular convergence.
i) So Slutsky's theorem a), b) and c) hold if Y_n →A Y and W_n →B w.
ii) If Y ≡ y where y is a constant, then Y_n →A y and W_n →B w implies that a), b) and c) hold with Y replaced by y, and →D can be replaced by →P.
iii) If Y_n →D Y, a_n →P a, and b_n →P b, then a_n + b_n Y_n →D a + bY.
Theorem 4.21. a) If X_n →P θ and τ is continuous at θ, then τ(X_n) →P τ(θ).
b) If X_n →D θ and τ is continuous at θ, then τ(X_n) →D τ(θ).
Example 4.8. Let Y_1, ..., Y_n be iid with mean E(Y_i) = µ and variance V(Y_i) = σ². Then the sample mean Ȳ_n →P µ since i) the SLLN holds (use Theorem 4.19 and 4.4), ii) the WLLN holds and iii) the CLT holds (use Theorem 4.34). Since E(Ȳ_n) = µ and V(Ȳ_n) = σ²/n → 0, Ȳ_n →P µ by Theorem 4.2.
Example 4.9. (Ferguson 1996, p. 40): If X_n →D X then 1/X_n →D 1/X if X is a continuous random variable since P(X = 0) = 0 and x = 0 is the only discontinuity point of g(x) = 1/x.
The following theorem is often part of the continuity theorem in the literature, and helps explain why Theorem 4.22 is called the continuity theorem.
Theorem 4.23: If lim_{n→∞} cXn(t) = g(t) for all t where g is continuous at t = 0, then g(t) = cX(t) is a characteristic function for some RV X, and X_n →D X.
cY−µ(t) = 1 − (σ²t²)/2 + o(t²)  and
cY−µ(t/(σ√n)) = 1 − t²/(2n) + o(t²/n)
where o(t²/n)/(t²/n) → 0 as n → ∞. Hence n·o(t²/n) → 0 as n → ∞.
b) Let the Z-score of Ȳn be
Zn = √n(Ȳn − µ)/σ = (Ȳn − µ)/(σ/√n) = (Σ_{i=1}^n Yi − nµ)/(σ√n) = Σ_{i=1}^n (Yi − µ)/(σ√n)
where the Yi − µ are iid with characteristic function cY−µ(t). Then the characteristic function of (Yi − µ)/(σ√n) is cY−µ(t/(σ√n)), and the characteristic function of Zn is
cZn(t) = [cY−µ(t/(σ√n))]^n.
If cZn(t) → cZ(t), the N(0, 1) characteristic function, then σZn = √n(Ȳn − µ) has
cσZn(t) → cσZ(t) = cZ(σt) = e^(−σ²t²/2),
the N(0, σ²) characteristic function, and the CLT holds.
Proof of the CLT: Let Zn be the Z-score of Ȳn. By Remark 4.10,
cZn(t) = [1 − t²/(2n) + o(t²/n)]^n = [1 − (t²/2 − n·o(t²/n))/n]^n → e^(−t²/2) = cZ(t)
for all t by Remark 4.5 b). Thus Zn →D Z ∼ N(0, 1) and σZn = √n(Ȳn − µ) →D N(0, σ²).
The next proof does not use characteristic functions, but only applies to iid random variables Yi that have a moment generating function. Thus E(Yi^j) exists for each positive integer j. The CLT only needs E(Y) and E(Y²) to exist. In the proof, k(t) = log(m(t)) is the cumulant generating function with k′(0) = E(X) and k″(0) = V(X).
L'Hôpital's Rule: Suppose f(x) → 0 and g(x) → 0 as x ↓ d, x ↑ d, x → d, x → ∞, or x → −∞. If
f′(x)/g′(x) → L, then f(x)/g(x) → L
as x ↓ d, x ↑ d, x → d, x → ∞, or x → −∞.
Proof of a Special Case of the CLT. Following Rohatgi (1984, pp. 569-9) and Tardiff (1981), let Y1, ..., Yn be iid with mean µ, variance σ², and mgf mY(t) for |t| < to. Then
Zi = (Yi − µ)/σ
has mean 0, variance 1 and mgf mZ(t) = exp(−tµ/σ)mY(t/σ) for |t| < σto.
Want to show that
Wn = √n(Ȳn − µ)/σ →D N(0, 1).
Notice that
Wn = (1/√n) Σ_{i=1}^n Zi = (1/√n) Σ_{i=1}^n (Yi − µ)/σ = (Σ_{i=1}^n Yi − nµ)/(√n σ) = (Ȳn − µ)/(σ/√n).
Thus
mWn(t) = E(e^(tWn)) = E[exp((t/√n) Σ_{i=1}^n Zi)] = E[exp(Σ_{i=1}^n tZi/√n)]
= Π_{i=1}^n E[e^(tZi/√n)] = Π_{i=1}^n mZ(t/√n) = [mZ(t/√n)]^n.
Now kZ(0) = log[mZ(0)] = log(1) = 0. Thus by L'Hôpital's rule (where the derivative is with respect to n), lim_{n→∞} log[mWn(t)] =
lim_{n→∞} kZ(t/√n)/(1/n) = lim_{n→∞} [kZ′(t/√n)(−t/(2n^(3/2)))]/(−1/n²) = (t/2) lim_{n→∞} kZ′(t/√n)/(1/√n).
Now kZ′(0) = E(Zi) = 0, so L'Hôpital's rule can be applied again, giving lim_{n→∞} log[mWn(t)] =
(t/2) lim_{n→∞} [kZ″(t/√n)(−t/(2n^(3/2)))]/(−1/(2n^(3/2))) = (t²/2) lim_{n→∞} kZ″(t/√n) = (t²/2) kZ″(0).
Now kZ″(0) = V(Zi) = 1. Hence lim_{n→∞} log[mWn(t)] = t²/2 and
Wn = √n(Ȳn − µ)/σ →D N(0, 1).
(X1/d1)/(X2/d2) ∼ F_{d1,d2}.
If Ui ∼ χ²₁ are iid then Σ_{i=1}^k Ui ∼ χ²_k. Let d1 = r and k = d2 = dn. Hence if X2 ∼ χ²_{dn}, then
X2/dn = Σ_{i=1}^{dn} Ui/dn = Ū →P E(Ui) = 1
by the law of large numbers. Hence if Wn ∼ F_{r,dn}, then rWn →D χ²_r.
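A quick numerical check of this limit compares quantiles of r·F(r, dn), for a large denominator degrees of freedom dn, with quantiles of χ²_r. The values of r and dn below are arbitrary illustration choices.

```python
from scipy.stats import f, chi2

r, dn = 4, 5000                      # arbitrary illustration values
for p in [0.5, 0.9, 0.95, 0.99]:
    q_rF = r * f.ppf(p, r, dn)       # quantile of r * F(r, dn)
    q_chi = chi2.ppf(p, r)           # quantile of chi-square with r df
    print(p, round(q_rF, 3), round(q_chi, 3))   # nearly equal for large dn
```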
Example 4.11. a) Let Xn ∼ bin(n, pn) where npn = λ > 0 for all positive integers n. Then the mgf mXn(t) = (1 − pn + pn e^t)^n for all t. Thus
mXn(t) = (1 − λ/n + (λ/n)e^t)^n = (1 + λ(e^t − 1)/n)^n → e^(λ(e^t − 1)) = mX(t)
for all t where X ∼ POIS(λ). Hence Xn →D X ∼ POIS(λ) by the continuity theorem.
b) Now let Xn ∼ bin(n, pn) where npn → λ > 0 as n → ∞. Thus
mXn(t) = (1 + (−npn + npn e^t)/n)^n → e^(λ(e^t − 1)) = mX(t)
for all t, and again Xn →D X ∼ POIS(λ) by the continuity theorem.
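The Poisson approximation in Example 4.11 can also be seen directly by comparing pmfs. The sketch below (λ and n are arbitrary illustration values, not from the text) prints the bin(n, λ/n) and POIS(λ) probabilities side by side.

```python
from scipy.stats import binom, poisson

lam, n = 3.0, 1000                      # arbitrary illustration values
p = lam / n                             # so n * p = lambda

for k in range(8):
    print(k, round(binom.pmf(k, n, p), 5), round(poisson.pmf(k, lam), 5))
# The two columns agree to several decimal places, as the continuity theorem suggests.
```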
P(|Wn| ≤ Dε) ≥ 1 − ε,
P(dε ≤ Wn/Xn ≤ Dε) ≥ 1 − ε
for all n ≥ N.
d) Similar notation is used for a k × r matrix An = [ai,j (n)] if each
element ai,j (n) has the desired property. For example, An = OP (n−1/2 ) if
each ai,j (n) = OP (n−1/2 ).
a) Then Wn = OP (n−δ ).
b) If X is not degenerate, then Wn ≍P n^(−δ).
The above result implies that if Wn has convergence rate nδ , then Wn has
tightness rate nδ , and the term “tightness” will often be omitted. Part a) is
proved, for example, in Lehmann (1999, p. 67).
P(dε ≤ Wn/Xn ≤ Dε) = P(1/Dε ≤ Xn/Wn ≤ 1/dε) ≥ 1 − ε,
P(|Wn| ≤ |Xn Dε|) ≥ P(dε ≤ Wn/Xn ≤ Dε) ≥ 1 − ε,
P(A) ≡ P(Wn/Xn ≤ D_{ε/2}) ≥ 1 − ε/2
and
P(B) ≡ P(d_{ε/2} ≤ Wn/Xn) ≥ 1 − ε/2
for all n ≥ N = max(N1, N2). Since P(A ∩ B) = P(A) + P(B) − P(A ∪ B) ≥ P(A) + P(B) − 1,
P(A ∩ B) = P(d_{ε/2} ≤ Wn/Xn ≤ D_{ε/2}) ≥ 1 − ε/2 + 1 − ε/2 − 1 = 1 − ε.
The following result is used to prove Theorem 4.30, which says that if there are K estimators Tj,n of a parameter β such that ‖Tj,n − β‖ = OP(n^(−δ)) where 0 < δ ≤ 1, and if Tn* picks one of these estimators, then ‖Tn* − β‖ = OP(n^(−δ)).
Theorem 4.29: Pratt (1959). Let X1,n , ..., XK,n each be OP (1) where
K is fixed. Suppose Wn = Xin ,n for some in ∈ {1, ..., K}. Then
Wn = OP (1). (4.9)
Proof.
FWn(x) ≤ P(min{X1,n, ..., XK,n} ≤ x) = 1 − P(X1,n > x, ..., XK,n > x).
Since K is finite, there exist B > 0 and N such that P(Xi,n ≤ B) > 1 − ε/(2K) and P(Xi,n > −B) > 1 − ε/(2K) for all n > N and i = 1, ..., K. Bonferroni's inequality states that P(∩_{i=1}^K Ai) ≥ Σ_{i=1}^K P(Ai) − (K − 1). Thus
Proof. Let Xj,n = n^δ ‖Tj,n − β‖. Then Xj,n = OP(1), so by Theorem 4.29, n^δ ‖Tn* − β‖ = OP(1). Hence ‖Tn* − β‖ = OP(n^(−δ)).
Remark 4.11. For each positive integer n, let Wn1, ..., Wn,rn be independent. The probability space may change with n, giving a double array of random variables. Let E[Wnk] = 0, V(Wnk) = E[Wnk²] = σ²nk, and
s²n = Σ_{k=1}^{rn} σ²nk = V[Σ_{k=1}^{rn} Wnk].
Then
Zn = (Σ_{k=1}^{rn} Wnk)/sn
is the z-score of Σ_{k=1}^{rn} Wnk.
For the above remark, let rn = n. Then the double array is the triangular
array shown below. Double arrays are sometimes called triangular arrays.
W11
W21 , W22
W31 , W32 , W33
..
.
Wn1 , Wn2 , Wn3, ..., Wnn
..
.
Theorem 4.31, Lyapounov's CLT: Under Remark 4.11, assume the |Wnk|^(2+δ) are integrable for some δ > 0. Assume Lyapounov's condition:
lim_{n→∞} Σ_{k=1}^{rn} E[|Wnk|^(2+δ)] / s_n^(2+δ) = 0.    (4.11)
Then
Zn = (Σ_{k=1}^{rn} Wnk)/sn →D N(0, 1).
Theorem 4.31 can be proved using Theorem 4.32. Note that Zn is the z-score of Σ_{k=1}^{rn} Wnk.
Example 4.12. Special cases: i) rn = n and Wnk = Wk has W1, ..., Wn, ... independent with s²n = Σ_{k=1}^n σ²k.
ii) Wnk = Xnk − E(Xnk) = Xnk − µnk has
Σ_{k=1}^{rn} (Xnk − µnk)/sn →D N(0, 1).
iii) Suppose X1, X2, ... are independent with E(Xi) = µi and V(Xi) = σ²i. Let
Zn = (Σ_{i=1}^n Xi − Σ_{i=1}^n µi)/(Σ_{i=1}^n σ²i)^(1/2)
be the z-score of Σ_{i=1}^n Xi. Assume E[|Xi − µi|³] < ∞ for all i and
lim_{n→∞} Σ_{i=1}^n E[|Xi − µi|³]/(Σ_{i=1}^n σ²i)^(3/2) = 0.    (4.12)
Then Zn →D N(0, 1).
Proof of iii): Take Wnk = Xk − µk, δ = 1, s²n = Σ_{k=1}^n σ²k, and apply Lyapounov's CLT. Note that
(Σ_{k=1}^n σ²k)^(3/2) = (s²n)^(3/2) = s³n = s_n^(2+1).
The (Lindeberg-Lévy) CLT has the Xi iid with V (Xi ) = σ 2 < ∞. The
Lyapounov CLT in Example 4.12. iii) has the Xi independent (not necessar-
ily identically distributed), but needs stronger moment conditions to satisfy
Equation (4.11) or (4.12).
Theorem 4.32, Lindeberg CLT: Let the Wnk satisfy Remark 4.11 and Lindeberg's condition: for every ε > 0,
lim_{n→∞} Σ_{k=1}^{rn} E(Wnk² I[|Wnk| ≥ ε sn]) / s²n = 0.    (4.13)
= (1/σ²) ∫_{|W1| ≥ εσ√n} W1² dP → 0
as n → ∞ since P(|W1| ≥ εσ√n) ↓ 0 as n → ∞. Or Yn = W1² I[|W1| ≥ εσ√n] satisfies Yn ≤ W1² and Yn ↓ Y = 0 as n → ∞. Thus E(Yn) → E(Y) = 0 by Lebesgue's Dominated Convergence Theorem. Thus Equation (4.15) holds and Zn →D N(0, 1). If the Wi = Xi − µ, then
Zn = Σ_{i=1}^n (Xi − µ)/(σ√n) = √n(X̄n − µ)/σ →D N(0, 1).
Thus √n(X̄n − µ) →D N(0, σ²).
d) Note that
(1/s²n) Σ_{k=1}^{rn} ∫_{{|Wnk| ≥ εsn}} Wnk² dP ≤ (1/s²n) Σ_{k=1}^{rn} ∫_{{|Wnk| ≥ εsn}} |Wnk|^(2+δ)/(ε^δ s_n^δ) dP = RHS
since |Wnk|^δ/(ε^δ s_n^δ) > 1 on {|Wnk| ≥ εsn}.
Example 4.14. DeGroot (1975, pp. 229-230): Suppose the Xi are independent Ber(pi) ∼ bin(m = 1, pi) random variables with E(Xi) = pi, V(Xi) = pi qi, qi = 1 − pi, and Σ_{i=1}^∞ pi qi = ∞. Prove that
Zn = (Σ_{i=1}^n Xi − Σ_{i=1}^n pi)/(Σ_{i=1}^n pi qi)^(1/2) →D N(0, 1)
as n → ∞.
Proof. Let Yi = |Wi| = |Xi − pi|. Then P(Yi = 1 − pi) = pi and P(Yi = pi) = qi. Thus
E[|Xi − pi|³] = E[|Wi|³] = Σ_y y³ f(y) = (1 − pi)³ pi + p³i qi = q³i pi + p³i qi = pi qi (p²i + q²i) ≤ pi qi
since p²i + q²i ≤ (pi + qi)² = 1. Thus Σ_{i=1}^n E[|Xi − pi|³] ≤ Σ_{i=1}^n pi qi. Dividing both sides by (Σ_{i=1}^n pi qi)^(3/2) gives
Σ_{i=1}^n E[|Xi − pi|³]/(Σ_{i=1}^n pi qi)^(3/2) ≤ 1/(Σ_{i=1}^n pi qi)^(1/2) → 0
as n → ∞. Thus Equation (4.12) holds and Zn →D N(0, 1).
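A simulation sketch of Example 4.14 is given below. The particular sequence pi is an arbitrary choice (made so that Σ pi qi diverges); it is not from the text.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 2000, 5000
i = np.arange(1, n + 1)
p = 0.2 + 0.6 / np.sqrt(i)           # arbitrary p_i in (0, 1); sum p_i q_i diverges

X = rng.binomial(1, p, size=(reps, n))
Zn = (X.sum(axis=1) - p.sum()) / np.sqrt(np.sum(p * (1 - p)))

print(np.mean(Zn), np.var(Zn))       # approximately 0 and 1, as the Lyapounov CLT predicts
```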
Theorem 4.33, Hájek–Šidák CLT: Let X1, ..., Xn be iid with E(Xi) = µ and V(Xi) = σ². Let cn = (cn1, ..., cnn)ᵀ be a vector of constants such that
max_{1≤i≤n} c²ni / Σ_{j=1}^n c²nj → 0 as n → ∞.
Then
Zn = Σ_{i=1}^n cni(Xi − µ) / (σ √(Σ_{j=1}^n c²nj)) →D N(0, 1).
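The condition rules out weight vectors dominated by a single coordinate. The sketch below uses weights cni = i (an arbitrary choice, for which max c²ni/Σ c²nj ≈ 3/n → 0) and iid U(0, 1) data to illustrate the weighted CLT numerically.

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 500, 5000
c = np.arange(1.0, n + 1.0)               # weights c_ni = i
mu, sigma = 0.5, np.sqrt(1 / 12)          # mean and sd of U(0, 1)

X = rng.uniform(0.0, 1.0, size=(reps, n))
Zn = (X - mu) @ c / (sigma * np.linalg.norm(c))

print(np.mean(Zn), np.var(Zn))            # approximately 0 and 1
```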
D
Hence Xn = Yn + Wn → Y + 0 = X. The convergence in distribution parts
of b) and c) follow from a). Part f) follows from d) and e). Part e) implies
that if Tn is a consistent estimator of θ and τ is a continuous function, then
τ (Tn ) is a consistent estimator of τ (θ). Theorem 4. says that convergence in
distribution is preserved by continuous functions, and even some discontinu-
ities are allowed as long as the set of continuity points is assigned probability
1 by the asymptotic distribution. Equivalently, the set of discontinuity points
is assigned probability 0.
P[u(X) ≥ ε] ≤ E[u(X)]/ε.
Proof Sketch. The proof is nearly identical to that of Theorem 4.1.
P(‖X − c‖ ≥ ε) = P(‖X − c‖^r ≥ ε^r) ≤ E[‖X − c‖^r]/ε^r.
and
E(Ax) = AE(x) and E(AxB) = AE(x)B. (4.17)
Thus
Cov(a + Ax) = Cov(Ax) = ACov(x)AT . (4.18)
Theorem 4.41 is the multivariate extension of the CLT. When the limiting distribution of Zn = √n(g(Tn) − g(θ)) is multivariate normal Nk(0, Σ), approximate the joint cdf of Zn with the joint cdf of the Nk(0, Σ) distribution. Thus to find probabilities, manipulate Zn as if Zn ≈ Nk(0, Σ). To see that the CLT is a special case of the MCLT below, let k = 1, E(X) = µ, and V(X) = Σ = σ².
The results in B) can be proven using the multivariate delta method. Let A
be a q × k constant matrix, b a constant, a a k × 1 constant vector, and d a
q × 1 constant vector. Note that a + bX n = a + AX n with A = bI. Thus i)
and ii) follow from iii).
A) Suppose X ∼ Nk (µ, Σ), then
i) AX ∼ Nq (Aµ, AΣAT ).
ii) a + bX ∼ Nk (a + bµ, b2 Σ).
iii) AX + d ∼ Nq (Aµ + d, AΣAT ).
(Find the mean and covariance matrix of the left hand side and plug in those
values for the right hand side. Be careful with the dimension k or q.)
B) Suppose Xn →D Nk(µ, Σ). Then
i) AXn →D Nq(Aµ, AΣAᵀ).
ii) a + bXn →D Nk(a + bµ, b²Σ).
iii) AXn + d →D Nq(Aµ + d, AΣAᵀ).
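A numerical sketch of B) iii): simulate approximately Nk(µ, Σ) vectors and compare the sample mean and covariance of AXn + d with Aµ + d and AΣAᵀ. All of the specific matrices and vectors below are arbitrary illustration choices, not from the text.

```python
import numpy as np

rng = np.random.default_rng(4)
mu = np.array([1.0, 2.0, 3.0])
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])
A = np.array([[1.0, -1.0, 0.0],
              [0.5,  0.5, 1.0]])          # q x k with q = 2, k = 3
d = np.array([0.0, 10.0])

X = rng.multivariate_normal(mu, Sigma, size=50_000)   # X ~ N_k(mu, Sigma)
Y = X @ A.T + d                                       # AX + d

print(np.allclose(Y.mean(axis=0), A @ mu + d, atol=0.05))
print(np.round(np.cov(Y, rowvar=False), 2))           # approximately A Sigma A^T
print(np.round(A @ Sigma @ A.T, 2))
```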
P
Definition 4.17. If the estimator g(T n ) → g(θ) for all θ ∈ Θ, then g(T n )
is a consistent estimator of g(θ).
Theorem 4.43. If X 1 , ..., Xn are iid, E(kXk) < ∞ and E(X) = µ, then
P
a) WLLN: X n → µ and
ae
b) SLLN: X n → µ.
Cramér–Wold Device: Xn →D X iff tᵀXn →D tᵀX for all t ∈ Rᵏ.
Proof. Note that cXn(yt) is the characteristic function of tᵀXn evaluated at y ∈ R.
If Xn →D X, then cXn(t) → cX(t) ∀ t ∈ Rᵏ. Fix t. Then cXn(yt) → cX(yt) ∀ y ∈ R. Thus tᵀXn →D tᵀX.
Now assume tᵀXn →D tᵀX ∀ t ∈ Rᵏ. Then cXn(yt) → cX(yt) ∀ y ∈ R and ∀ t ∈ Rᵏ. Take y = 1 to get cXn(t) → cX(t) ∀ t ∈ Rᵏ. Hence Xn →D X by the Continuity Theorem.
Application: Proof of the MCLT Theorem 4.41. Note that for fixed t, the tᵀXi are iid random variables with mean tᵀµ and variance tᵀΣt. Hence by the CLT, tᵀ√n(X̄n − µ) →D N(0, tᵀΣt). The right hand side has the distribution of tᵀX where X ∼ Nk(0, Σ). Hence by the Cramér–Wold Device, √n(X̄n − µ) →D Nk(0, Σ).
P D
Theorem 4.46. a) If X n → X, then X n → X.
b)
P D
X n → g(θ) iff X n → g(θ).
Theorem 4.49. Suppose xn and x are random vectors with the same
probability space.
P D
a) If xn → x, then xn → x.
wp1 P D
b) If xn → x, then xn → x and xn → x.
r P D
c) If xn → x for some r > 0, then xn → x and xn → x.
P D
d) xn → c iff xn → c where c is a constant vector.
The proof of c) follows from the Generalized Chebyshev inequality. See
Example 4.15.
Remark 4.14. Let Wn be a sequence of m × m random matrices and let C be an m × m constant matrix.
a) Wn →P C iff aᵀWn b →P aᵀCb for all constant vectors a, b ∈ Rᵐ.
b) If Wn →P C, then the determinant det(Wn) = |Wn| →P |C| = det(C).
c) If Wn^(−1) exists for each n and C^(−1) exists, then Wn →P C iff Wn^(−1) →P C^(−1).
The following two theorems are taken from Severini (2005, pp. 345-349,
354).
Theorem 4.53. Let xn = (x1n , ..., xkn)T and x = (x1 , ..., xk)T be random
D D
vectors. Then xn → x implies xin → xi for i = 1, ..., k.
Proof. Use the Cramér Wold device with ti = (0, ..., 0, 1, 0, ...0)T where
the 1 is in the ith position. Thus
D
tTi xn = xin → xi = tTi x.
Joint convergence in distribution implies marginal convergence in distribution by Theorem 4.53. Typically marginal convergence in distribution xin →D xi for i = 1, ..., m does not imply that (x1n, ..., xmn)ᵀ →D (x1, ..., xm)ᵀ. However, if xn →D x, yn →D y, xn and yn are independent, and x and y are independent, then joint convergence does hold by the continuity theorem. To see this, let t = (t1ᵀ, t2ᵀ)ᵀ, zn = (xnᵀ, ynᵀ)ᵀ, and z = (xᵀ, yᵀ)ᵀ. Since xn and yn are independent and x and y are independent, the characteristic function czn(t) = cxn(t1)cyn(t2) → cx(t1)cy(t2) = cz(t). Hence zn →D z, and g(zn) →D g(z) if g is continuous by the continuous mapping theorem.
Thus Yn ∼ N(0, 1), Xn →D X, and Yn →D X. Then
(1  1)(Xn, Yn)ᵀ = Xn + Yn = 2X for n even and 0 for n odd,
so Xn + Yn does not converge in distribution and joint convergence fails.
then
√n(g(Tn) − g(θ)) →D N(0, σ²[g′(θ)]²).
The CLT says that Ȳn ∼ AN(µ, σ²/n). The delta method says that if Tn ∼ AN(θ, σ²/n), and if g′(θ) ≠ 0, then g(Tn) ∼ AN(g(θ), σ²[g′(θ)]²/n). Hence a smooth function g(Tn) of a well behaved statistic Tn tends to be well behaved (asymptotically normal with a √n convergence rate). By the delta method and Theorem 4.b, Tn = g(Ȳn) →P g(µ) if g′(µ) ≠ 0 for all µ ∈ Θ. By Theorem 4.e, g(Ȳn) →P g(µ) if g is continuous at µ.
Example 4.18. Let Y1, ..., Yn be iid with E(Y) = µ and V(Y) = σ². Then by the CLT,
√n(Ȳn − µ) →D N(0, σ²).
Let g(µ) = µ². Then g′(µ) = 2µ ≠ 0 for µ ≠ 0. Hence by the delta method, for µ ≠ 0,
√n((Ȳn)² − µ²) →D N(0, 4σ²µ²).
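A simulation sketch of Example 4.18 (the normal distribution and the values of µ, σ, n below are arbitrary illustration choices) compares the Monte Carlo variance of √n((Ȳn)² − µ²) with the delta method value 4σ²µ².

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps = 400, 20_000
mu, sigma = 2.0, 1.5                       # arbitrary illustration values

Y = rng.normal(mu, sigma, size=(reps, n))
Wn = np.sqrt(n) * (Y.mean(axis=1) ** 2 - mu ** 2)

print(np.var(Wn), 4 * sigma ** 2 * mu ** 2)   # both approximately 36
```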
a) Find the limiting distribution of √n(Xn/n − λ).
b) Find the limiting distribution of √n(√(Xn/n) − √λ).
Solution. a) Xn =D Σ_{i=1}^n Yi where the Yi are iid Poisson(λ). Hence E(Y) = λ = V(Y). Thus by the CLT,
√n(Xn/n − λ) = √n(Σ_{i=1}^n Yi/n − λ) →D N(0, λ).
b) Let g(λ) = √λ. Then g′(λ) = 1/(2√λ) and by the delta method,
√n(√(Xn/n) − √λ) = √n(g(Xn/n) − g(λ)) →D N(0, λ(g′(λ))²) = N(0, λ/(4λ)) = N(0, 1/4).
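Part b) is the classical variance stabilizing transformation for the Poisson: the limiting variance 1/4 does not depend on λ. The sketch below checks this numerically; λ, n, and the number of replications are arbitrary illustration choices.

```python
import numpy as np

rng = np.random.default_rng(6)
n, reps, lam = 300, 20_000, 4.0            # arbitrary illustration values

Xn = rng.poisson(lam * n, size=reps)       # X_n is distributed as a sum of n iid Poisson(lam)
Wn = np.sqrt(n) * (np.sqrt(Xn / n) - np.sqrt(lam))

print(np.var(Wn))                          # approximately 1/4, regardless of lam
```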
Example 4.20. Let Y1, ..., Yn be independent and identically distributed (iid) from a Gamma(α, β) distribution.
a) Find the limiting distribution of √n(Ȳ − αβ).
b) Find the limiting distribution of √n((Ȳ)² − c) for appropriate constant c.
N (0, p(1 − p)(g0 (p))2 ) = N (0, p(1 − p)4p2 ) = N (0, 4p3 (1 − p)).
Remark 4.16. a) Note that if √n(Tn − k) →D N(0, σ²), then evaluate the derivative at k. Thus use g′(k) where k = αβ in the above example.
Then
n[g(Tn) − g(θ)] →D (1/2) τ²(θ) g″(θ) χ²₁.
Example 4.22. Let Xn ∼ Binomial(n, p) where the positive integer n is large and 0 < p < 1. Let g(θ) = θ³ − θ. Find the limiting distribution of n[g(Xn/n) − c] for appropriate constant c when p = 1/√3.
Solution: Since Xn =D Σ_{i=1}^n Yi where Yi ∼ BIN(1, p),
√n(Xn/n − p) →D N(0, p(1 − p))
by the CLT. Let θ = p. Then g′(θ) = 3θ² − 1 and g″(θ) = 6θ. Notice that
g(1/√3) = (1/√3)³ − 1/√3 = (1/√3)(1/3 − 1) = −2/(3√3) = c.
Also g′(1/√3) = 0 and g″(1/√3) = 6/√3. Since τ²(p) = p(1 − p), τ²(1/√3) = (1/√3)(1 − 1/√3). Hence
n[g(Xn/n) − (−2/(3√3))] →D (1/2)(1/√3)(1 − 1/√3)(6/√3) χ²₁ = (1 − 1/√3) χ²₁.
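A simulation sketch of Example 4.22: when p = 1/√3 the first derivative vanishes, and n[g(Xn/n) − c] should behave like the scaled χ²₁ limit. The sample size and replication count below are arbitrary illustration choices.

```python
import numpy as np

rng = np.random.default_rng(7)
n, reps = 4000, 20_000
p = 1 / np.sqrt(3)
g = lambda t: t ** 3 - t
c = g(p)                                   # c = -2/(3*sqrt(3))

Xn = rng.binomial(n, p, size=reps)
Wn = n * (g(Xn / n) - c)

# Limit is 0.5 * p(1 - p) * g''(p) * chi2_1 = (1 - 1/sqrt(3)) * chi2_1.
scale = 1 - 1 / np.sqrt(3)
print(np.mean(Wn), scale)                  # E(chi2_1) = 1, so the mean is about `scale`
```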
To see that the delta method is a special case of the multivariate delta
method, note that if Tn and parameter θ are real valued, then Dg (θ ) = g0 (θ).
then
√n(g(Tn) − g(θ)) →D Nd(0, Dg(θ) Σ Dgᵀ(θ))
where the d × k Jacobian matrix Dg(θ) has ij-th entry ∂gi(θ)/∂θj, so its rows run from (∂g1(θ)/∂θ1, ..., ∂g1(θ)/∂θk) down to (∂gd(θ)/∂θ1, ..., ∂gd(θ)/∂θk).
Let (µ̂, φ̂) be the MLE of (µ, φ). According to Bain (1978, p. 215),
√n[(µ̂, φ̂)ᵀ − (µ, φ)ᵀ] →D N2( (0, 0)ᵀ, [[1.109µ²/φ², 0.257µ], [0.257µ, 0.608φ²]] ).
So, writing g(θ) = (g1(µ, φ), g2(µ, φ))ᵀ = (µ^φ, φ)ᵀ,
Dg(θ) = [[∂g1(θ)/∂θ1, ∂g1(θ)/∂θ2], [∂g2(θ)/∂θ1, ∂g2(θ)/∂θ2]] = [[∂(µ^φ)/∂µ, ∂(µ^φ)/∂φ], [∂φ/∂µ, ∂φ/∂φ]] = [[φµ^(φ−1), µ^φ log(µ)], [0, 1]].
Hence by the multivariate delta method,
√n[(λ̂, φ̂)ᵀ − (λ, φ)ᵀ] →D N2(0, Σ)
4.10 Summary
1) Xn →D X if
lim_{n→∞} Fn(t) = F(t)
at each continuity point t of F.
b) Xn →P X if for every ε > 0,
lim_{n→∞} P(|Xn − X| ≥ ε) = 0.
4) Theorem: Tn →P τ(θ) if any of the following 2 conditions holds:
i) lim_{n→∞} Vθ(Tn) = 0 and lim_{n→∞} Eθ(Tn) = τ(θ).
ii) MSE_{τ(θ)}(Tn) = E[(Tn − τ(θ))²] → 0.
Here
MSE_{τ(θ)}(Tn) = Vθ(Tn) + [Bias_{τ(θ)}(Tn)]²
where Bias_{τ(θ)}(Tn) = Eθ(Tn) − τ(θ).
5) Theorem: a) Let Xθ be a random variable with a distribution depending on θ, and 0 < δ ≤ 1. If
n^δ (Tn − τ(θ)) →D Xθ
for all θ ∈ Θ, then Tn →P τ(θ).
b) If
√n(Tn − τ(θ)) →D N(0, v(θ))
for all θ ∈ Θ, then Tn is a consistent estimator of τ(θ).
Note: If √n(Tn − θ) →D N(0, σ²), then Tn →P θ. Often Xθ ∼ N(0, v(θ)).
6) WLLN: Let Y1, ..., Yn, ... be a sequence of iid random variables with E(Yi) = µ. Then Ȳn →P µ. Hence Ȳn is a consistent estimator of µ.
7) Yn converges in rth mean to a random variable Y, Yn →r Y, if
E(|Yn − Y|^r) → 0
as n → ∞.
8) Xn converges almost everywhere (with probability 1) to X if
P(lim_{n→∞} Xn = X) = 1.
This type of convergence will be denoted by Xn →wp1 X. Notation such as "Xn converges to X wp1" will also be used. Sometimes "wp1" will be replaced with "as" or "ae." Xn →wp1 τ(θ) if P(lim_{n→∞} Xn = τ(θ)) = 1.
9) SLLN: If X1, ..., Xn are iid with E(Xi) = µ finite, then X̄n →wp1 µ.
10) a) For i) Xn →P X, ii) Xn →r X, or iii) Xn →wp1 X, the Xn and X need to be defined on the same probability space.
b) For Xn →D X, the probability spaces can differ.
c) For i) Xn →P c, ii) Xn →wp1 c, iii) Xn →D c, and iv) Xn →r c, the probability spaces of the Xn can differ.
11) Theorem: i) Tn → τ (θ) iff Tn → τ (θ).
P P
ii) If Tn → θ and τ is continuous at θ, then τ (Tn ) → τ (θ). Hence if Tn is
a consistent estimator of θ, then τ (Tn ) is a consistent estimator of τ (θ) if τ
is a continuous function on Θ.
12) Theorem: Suppose Xn and X are RVs with the same probability space
for b) and c). Let g : R → R be a continuous function.
D D
a) If Xn → X, then g(Xn ) → g(X).
P P
b) If Xn → X, then g(Xn ) → g(X).
ae wp1
c) If Xn → X, then g(Xn ) → g(X).
13) CLT: Let Y1 , ..., Yn be iid with E(Y ) = µ and V (Y ) = σ 2 . Then
√ D
n(Y n − µ) → N (0, σ 2 ).
14) a) Zn = √n(Ȳn − µ)/σ = (Ȳn − µ)/(σ/√n) = (Σ_{i=1}^n Yi − nµ)/(√n σ) is the z-score of Ȳn (and the z-score of Σ_{i=1}^n Yi), and Zn →D N(0, 1). b) Two applications of the CLT are to give the limiting distribution of √n(Ȳn − µ) and the limiting distribution of √n(Yn/n − µX) for a random variable Yn such that Yn = Σ_{i=1}^n Xi where the Xi are iid with E(X) = µX and V(X) = σ²X.
15) Theorem: Suppose Xn and X are RVs with the same probability space.
wp1 P D
a) If Xn → X, then Xn → X and Xn → X.
P D
b) If Xn → X, then Xn → X.
r P D
c) If Xn → X, then Xn → X and Xn → X.
d) Xn →P τ(θ) iff Xn →D τ(θ) (here τ(θ) is a constant).
P
16) Theorem: a) If E[(Xn − X)2 ] → 0 as n → ∞, then Xn → X.
P
b) If E(Xn ) → E(X) and V (Xn − X) → 0 as n → ∞, then Xn → X.
Note: See 15) if P (X = τ (θ)) = 1.
r k
17) Theorem: If Xn → X, then Xn → X where 0 < k < r.
18) Theorem: Let Xn have pdf fXn (x), and let X have pdf fX (x). If
fXn (x) → fX (x) for all x (or for x outside of a set of Lebesgue measure 0),
D
then Xn → X.
19) Theorem: Let g : R → R be continuous at constant c.
D D
a) If Xn → c, then g(Xn ) → c.
P P
b) If Xn → c, then g(Xn ) → c.
wp1 wp1
c) If Xn → c, then g(Xn ) → c.
20) Theorem: Suppose Xn and X are integer valued RVs with pmfs fXn (x)
D
and fX (x). Then Xn → X iff P (Xn = k) → P (X = k) for every integer k iff
fXn (x) → fX (x) for every real x.
D P
21) Slutsky’s Theorem: If Yn → Y and Wn → w for some constant w,
D D D
then i) Yn Wn → wY , ii) Yn + Wn → Y + w and iii) Yn /Wn → Y /w for w 6= 0.
B D P
Note that Yn → Y implies Yn → Y where B = wp1, r, or P . Also Wn → w
D
iff Wn → w. If a sequence of constants cn → c as n → ∞ (everywhere
wp1 P
convergence), then cn → c and cn → c. (So everywhere convergence is a
special case of almost everywhere convergence.)
22) The cumulative distribution function (cdf) of any random variable
Y is F (y) = P (Y ≤ y) for all y ∈ R. If F (y) is a cumulative distribution
function, then i) F (−∞) = lim F (y) = 0, ii) F (∞) = lim F (y) = 1, iii)
y→−∞ y→∞
cY(t) = 1 + t²σ²/2 + o(t²) as t → 0.
Here a(t) = o(t²) as t → 0 if lim_{t→0} a(t)/t² = 0. v) If Y is discrete with pmf fY(y), then cY(t) = Σ_y e^(ity) fY(y). vi) If Y is a random variable, then cY(t) always exists, and completely determines the distribution of Y.
25) Continuity Theorem: Let Yn be a sequence of random variables with
characteristic functions cYn (t). Let Y be a random variable with cf cY (t).
a)
D
Yn → Y iff cYn (t) → cY (t) ∀t ∈ R.
b) Also assume that Yn has mgf mYn and Y has mgf mY . Assume that
all of the mgfs mYn and mY are defined on |t| ≤ d for some d > 0. Then if
D
mYn (t) → mY (t) as n → ∞ for all |t| < c where 0 < c < d, then Yn → Y .
26) Theorem: If limn→∞ cXn (t) = g(t) for all t where g is continuous at
t = 0, then g(t) = cX (t) is a characteristic function for some RV X, and
D
Xn → X.
Note: Hence continuity at t = 0 implies continuity everywhere since g(t) =
ϕX (t) is continuous. If g(t) is not continuous at 0, then Xn does not converge
in distribution.
27) If cYn (t) → h(t) where h(t) is not continuous, then Yn does not con-
verge in distribution to any RV Y , by the Continuity Theorem and 26).
28) Let X1, ..., Xn be independent RVs with characteristic functions cXj(t). Then the characteristic function of Σ_{j=1}^n Xj is
c_{Σ_{j=1}^n Xj}(t) = Π_{j=1}^n cXj(t).
If the RVs also have mgfs mXj(t), then the mgf of Σ_{j=1}^n Xj is
m_{Σ_{j=1}^n Xj}(t) = Π_{j=1}^n mXj(t).
29) Helly–Bray–Portmanteau Theorem: Xn →D X iff E[g(Xn)] → E[g(X)] for every bounded, real, continuous function g.
Note: 29) is used to prove 30) b).
D
30) a) Generalized Continuous Mapping Theorem: If Xn → X and
the function g is such that P [X ∈ C(g)] = 1 where C(g) is the set of points
D
where g is continuous, then g(Xn ) → g(X).
Note: P [X ∈ C(g)] = 1 can be replaced by P [X ∈ D(g)] = 0 where D(g)
is the set of points where g is not continuous.
D
b) Continuous Mapping Theorem: If Xn → X and the function g is
D
continuous, then g(Xn ) → g(X).
Note: the function g can not depend on n since gn is a sequence of functions
rather than a single function.
31) Generalized Chebyshev’s Inequality or Generalized Markov’s Inequal-
ity: Let u : R → [0, ∞) be a nonnegative function. If E[u(Y )] exists then for
any c > 0,
E[u(Y )]
P [u(Y ) ≥ c] ≤ .
c
If µ = E(Y) exists, then taking u(y) = |y − µ|^r and c̃ = c^r gives
Markov's Inequality: for r > 0 and any c > 0,
P(|Y − µ| ≥ c) = P(|Y − µ|^r ≥ c^r) ≤ E[|Y − µ|^r]/c^r.
If r = 2 and σ² = V(Y) exists, then we obtain
Chebyshev's Inequality:
P(|Y − µ| ≥ c) ≤ V(Y)/c².
32) a) lim_{n→∞} (1 − c/n)^n = e^(−c).
b) If cn → c as n → ∞, then lim_{n→∞} (1 − cn/n)^n = e^(−c).
c) If cn is a sequence of complex numbers such that cn → c as n → ∞ where c is real, then lim_{n→∞} (1 − cn/n)^n = e^(−c).
33) For each positive integer n, let Wn1, ..., Wn,rn be independent. The probability space may change with n, giving a triangular array of RVs. Let E[Wnk] = 0, V(Wnk) = E[Wnk²] = σ²nk, and s²n = Σ_{k=1}^{rn} σ²nk = V[Σ_{k=1}^{rn} Wnk]. Then
Zn = (Σ_{k=1}^{rn} Wnk)/sn
is the z-score of Σ_{k=1}^{rn} Wnk.
34) Lyapounov's CLT: Under 33), assume the |Wnk|^(2+δ) are integrable for some δ > 0. Assume Lyapounov's condition:
lim_{n→∞} Σ_{k=1}^{rn} E[|Wnk|^(2+δ)]/s_n^(2+δ) = 0.
Then
Zn = (Σ_{k=1}^{rn} Wnk)/sn →D N(0, 1).
35) Special cases: i) rn = n and Wnk = Wk has W1, ..., Wn, ... independent.
ii) Wnk = Xnk − E(Xnk) = Xnk − µnk has
Σ_{k=1}^{rn} (Xnk − µnk)/sn →D N(0, 1).
iii) Suppose X1, X2, ... are independent with E(Xi) = µi and V(Xi) = σ²i. Let
Zn = (Σ_{i=1}^n Xi − Σ_{i=1}^n µi)/(Σ_{i=1}^n σ²i)^(1/2)
be the z-score of Σ_{i=1}^n Xi. Assume E[|Xi − µi|³] < ∞ for all i and
lim_{n→∞} Σ_{i=1}^n E[|Xi − µi|³]/(Σ_{i=1}^n σ²i)^(3/2) = 0.    (∗)
Then Zn →D N(0, 1).
36) The (Lindeberg-Lévy) CLT has the Xi iid with V(Xi) = σ² < ∞. The Lyapounov CLT in 35) iii) has the Xi independent (not necessarily identically distributed), but needs stronger moment conditions to satisfy (∗).
37) Lindeberg CLT: Let the Wnk satisfy 33) and Lindeberg's condition: for every ε > 0,
lim_{n→∞} Σ_{k=1}^{rn} E(Wnk² I[|Wnk| ≥ εsn])/s²n = 0
g^(k)(y) = (d^k/dy^k) g(y)
for integers k ≥ 2. Recall that the product rule is
4.11 Complements
4.12 Problems
4.1. Let Xn ∼ U (−n, n) have cdf Fn (x). Then limn Fn (x) = 0.5 for all real
D
x. Does Xn → X for some random variable X? Explain briefly.
4.2. Let Xn be a sequence of random variables such that P (Xn = 1/n) =
1. Does Xn converge in distribution? If yes, prove it by finding X and the
cdf of X. If no, prove it.
4.3. Suppose Xn has cdf
Fn(x) = 1 − (1 − x/(θn))^n
for x ≥ 0 and Fn(x) = 0 for x < 0. Show that Xn →D X by finding the cdf of X.
4.4. Suppose that Y1, ..., Yn are iid with E(Y) = (1 − ρ)/ρ and VAR(Y) = (1 − ρ)/ρ² where 0 < ρ < 1. Find the limiting distribution of √n(Ȳn − (1 − ρ)/ρ).
4.5. Let X1 , ..., Xn be iid with cdf F (x) = P (X ≤ x). Let Yi = I(Xi ≤ x)
where the indicator equals 1 if Xi ≤ x and 0, otherwise.
a) Find E(Yi ).
b) Find VAR(Yi ).
c) Let F̂n(x) = (1/n) Σ_{i=1}^n I(Xi ≤ x) for some fixed real number x. Find the limiting distribution of √n(F̂n(x) − cx) for an appropriate constant cx.
4.6. Let Xn ∼ Binomial(n, p) where the positive integer n is large and 0 < p < 1. Find the limiting distribution of √n(Xn/n − p).
4.7. Suppose Xn is a discrete random variable with P (Xn = n) = 1/n
and P (Xn = 0) = (n − 1)/n.
D
a) Does Xn → X? Explain
b) Does E(Xn ) → E(X)? Explain briefly.
4.8. Lemma 1 (from Billingsley (1986)): Let z1, ..., zm and w1, ..., wm be complex numbers of modulus at most 1. Then
|(z1 · · · zm) − (w1 · · · wm)| ≤ Σ_{k=1}^m |zk − wk|.
Prove this lemma by induction using (z1 · · · zm) − (w1 · · · wm) = (z1 − w1)(z2 · · · zm) + w1[(z2 · · · zm) − (w2 · · · wm)]. Also, the modulus |z| acts much like the absolute value. Hence |z1 z2| = |z1||z2|, and |z1 + z2| ≤ |z1| + |z2|.
4.14. For each n ∈ N, let Xn1, ..., Xn,rn be independent RVs on probability space (Ωn, Fn, Pn) with E(Xnk) = µnk, V(Xnk) = σ²nk, Tn = Σ_{k=1}^{rn} Xnk, E(Tn) = µn = Σ_{k=1}^{rn} µnk, and V(Tn) = σ²n = Σ_{k=1}^{rn} σ²nk.
a) If vn > 0 and σn/vn → 0 as n → ∞, use Chebyshev's inequality to prove
Pn(|Tn − µn|/vn ≥ ε) → 0
∀ ε > 0 as n → ∞.
b) Billingsley (1986, problem 6.5 slightly modified): Let A1, A2, ... be independent events with P(Ai) = pi and p̄n = (1/n) Σ_{i=1}^n pi. Let Xnk = Xk = I_{Ak} and Tn = Σ_{k=1}^n Xk = Σ_{k=1}^n I_{Ak}. Let rn = n and Pn = P for all n. Use a) to prove
P[|n^(−1) Tn − p̄n| ≥ ε] → 0
for all ε > 0 as n → ∞.
4.15. Let Yn ∼ χ²n. Find the limiting distribution of √n(Yn/n − 1).
4.16. Suppose that X1, ..., Xn are iid and that t is a function such that E(t(X1)) = µt. Is there a constant c such that
(1/n) Σ_{i=1}^n t(Xi) →P c?
Explain briefly.
4.17. Let P (Xn = n) = 1.
a) Show FXn (x) → H(x) as n → ∞.
b) Let MXn (t) be the moment generating function of Xn . Find limn MXn (t)
for all t.
Hint: examine t < 0, t = 0, and t > 0.
c) Does Xn converge in distribution?
4.18. Suppose that X1, ..., Xn are iid and V(X1) = σ². Given that
σ̂²n = (1/n) Σ_{i=1}^n (Xi − X̄)² →P σ²,
4.19. Suppose X1, ..., Xn are iid p × 1 random vectors from a multivariate t-distribution with parameters µ and Σ with d degrees of freedom. Then E(Xi) = µ and Cov(X) = (d/(d − 2))Σ for d > 2. Assuming d > 2, find the limiting distribution of √n(X̄ − c) for appropriate vector c.
4.20. Suppose
Zn = √n(X̄n − µ)/σ →D N(0, 1)
and s²n →P σ² where σ > 0. Prove that
√n(X̄n − µ)/sn →D N(0, 1).
D P P D
4.21. If Yn → Y , an → a, and bn → b, then an + bn Yn → X. Find X.
4.22. Let X 1 , ..., Xn be iid k × 1 random vectors where E(X i ) =
(λ1 , ..., λk )T and Cov(X i ) = diag(λ21 , ..., λ2k), a diagonal k × k matrix with
jth diagonal entry λ2j . The nondiagonal entries are 0. Find the limiting dis-
√
tribution of n(X − c) for appropriate vector c.
4.23. What theorem can be used to prove both the (usual) central limit
theorem and the Lyapounov CLT?
Exam and Quiz Problems
4.24. Let Yn ∼ binomial(n, p).
a) Find the limiting distribution of √n(Yn/n − p).
b) Find the limiting distribution of √n(arcsin(√(Yn/n)) − arcsin(√p)).
Hint: (d/dx) arcsin(x) = 1/√(1 − x²).
4.25. Suppose Yn ∼ uniform(−n, n). Let Fn (y) be the cdf of Yn .
a) Find F (y) such that Fn (y) → F (y) for all y as n → ∞.
D
b) Does Yn → Y ? Explain briefly.
4.26.
4.27. Suppose x1, ..., xn are iid p × 1 random vectors where E(xi) = e^(0.5) 1 and Cov(xi) = (e² − e)Ip. Find the limiting distribution of √n(x̄ − c) for appropriate vector c.
4.28. Assume that
√n[(β̂1, β̂2)ᵀ − (β1, β2)ᵀ] →D N2( (0, 0)ᵀ, [[σ²1, 0], [0, σ²2]] ).
Then
√n[(β̂1 − β̂2) − (β1 − β2)] = (1  −1) √n[(β̂1, β̂2)ᵀ − (β1, β2)ᵀ].
4.29. Let X1, ..., Xn be iid with mean E(X) = µ and variance V(X) = σ² > 0. Then n(X̄ − µ)² = [√n(X̄ − µ)]² →D W. What is W?
4.30. Suppose that X1, ..., Xn are iid N(µ, σ²).
a) Find the limiting distribution of √n(X̄n − µ).
b) Let g(θ) = [log(1 + θ)]². Find the limiting distribution of √n(g(X̄n) − g(µ)) for µ > 0.
c) Let g(θ) = [log(1 + θ)]². Find the limiting distribution of n(g(X̄n) − g(µ))
b) Does Xn →D X for some random variable X? Prove or disprove.
Hint: P(|Xn − 0| ≥ ε) ≤ P(Xn = n).
4.38. Suppose Y1, ..., Yn are iid EXP(λ). Let Tn = Y(1) = Y1:n = min(Y1, ..., Yn). It can be shown that the mgf of Tn is
mTn(t) = 1/(1 − λt/n)
for t < n/λ. Show that Tn →D X and give the distribution of X.
4.39. Suppose X1, ..., Xn are iid 3 × 1 random vectors from a multinomial distribution with E(Xi) = (mρ1, mρ2, mρ3)ᵀ and
Cov(Xi) = [[mρ1(1 − ρ1), −mρ1ρ2, −mρ1ρ3], [−mρ1ρ2, mρ2(1 − ρ2), −mρ2ρ3], [−mρ1ρ3, −mρ2ρ3, mρ3(1 − ρ3)]].
Let θ = (θ1 , ..., θp)T and let g(θ) = (eθ1 , ..., eθp )T . Find D g (θ ) .
4.43. Let µi be the ith population mean and let Σi be the nonsingular population covariance matrix of the ith population. Let xi,1, ..., xi,ni be iid from the ith population. Let x̄i be the k × 1 sample mean from the xi,j, j = 1, ..., ni.
a) Find the limiting distribution of √ni (x̄i − µi).
b) Assume there are p populations, n = Σ_{i=1}^p ni, and ni/n → πi where 0 < πi < 1 and Σ_{i=1}^p πi = 1. Find the limiting distribution of √n(x̄i − µi).
Hint: √n = (√n/√ni)(√ni).
D
4.44. Suppose Z n → Np (µ, I). Let a be a p × 1 constant vector. Find the
limiting distribution of aT (Z n − µ).
4.45. Let x1, ..., xn be iid with mean E(x) = µ and variance V(x) = σ² > 0. Then exp[√n(x̄ − µ)] →D W. What is W? Hint: use the continuous mapping theorem: if Zn →D X and g is continuous, then g(Zn) →D g(X).
for µ = 0. Hint: use the Second Order Delta Method and find g(0).
4.49. Suppose
FXn(x) = 0 for x ≤ c − 1/n, FXn(x) = (n/2)(x − c + 1/n) for c − 1/n < x < c + 1/n, and FXn(x) = 1 for x ≥ c + 1/n.
Does Xn →D X for some random variable X? Prove or disprove. If Xn →D X, find X.
4.50. Suppose Yn ∼ EXP (n) with cdf FYn (y) = 1 − exp(−y/n) for y ≥ 0
D
and FYn (y) = 0 for y < 0. Does Yn → Y for some random variable Y ? Prove
D
or disprove. If Yn → Y , find Y .
4.51. Suppose that Y1, ..., Yn are iid with E(Y) = (1 − ρ)/ρ and VAR(Y) = (1 − ρ)/ρ² where 0 < ρ < 1.
a) Find the limiting distribution of √n(Ȳn − (1 − ρ)/ρ).
b) Find the limiting distribution of √n(g(Ȳn) − ρ) for appropriate function g.
for t < 1/(λ + 1/n). Show that mn (t) → m(t) by finding m(t).
D
(Then Xn → X where X ∼ EXP (λ) with E(X) = λ by the continuity
theorem for mgfs.)
4.55. Suppose X 1 , ..., Xn are iid k × 1 random vectors where E(X i ) =
(µ1 , ..., µk)T and Cov(X i ) = diag(σ12 , ..., σk2), a diagonal k × k matrix with
jth diagonal entry σj2 . The nondiagonal entries are 0. Find the limiting dis-
√
tribution of n(X − c) for appropriate vector c.
P P
4.56. Suppose Yn → Y . Then Wn = Yn − Y → 0. Define Xn = Y for all
D D
n. Then Xn → Y . Then Yn = Xn + Wn → Z by Slutsky’s Theorem. What is
Z?
4.57. The method of moments estimator for Cov(X, Y) = σX,Y is
σ̂X,Y = (1/n) Σ_{i=1}^n (xi − x̄)(yi − ȳ).
Another common estimator is
SX,Y = (1/(n − 1)) Σ_{i=1}^n (xi − x̄)(yi − ȳ) = (n/(n − 1)) σ̂X,Y.
Using the fact that σ̂X,Y →P σX,Y when the covariance exists, prove that SX,Y →P σX,Y with Slutsky's Theorem. Hint: Zn →P c iff Zn →D c if c is a constant, and usual convergence an → a of a sequence of constants implies an →P a.
4.58. Suppose that the characteristic function of X̄n is
cX̄n(t) = exp(−t²σ²/(2n)).
Then the characteristic function of √n X̄n is c_{√n X̄n}(t) = cX̄n(√n t). Does √n X̄n →D W for some random variable W? Explain.
√ D
4.59. Suppose that β is a p × 1 vector and that n(β̂ n − β) → Np (0, C)
where C is a p × p nonsingular matrix. Let A be a j × p matrix with full rank
j. Suppose that Aβ = 0. √
a) What is the limiting distribution of nAβ̂n ?
√
b) What is the limiting distribution of Z n = n[ACAT ]−1/2Aβ̂ n ? Hint:
for a square symmetric nonsingular matrix D, we have D1/2 D 1/2 = D, and
D−1/2 D−1/2 = D−1 , and D−1/2 and D1/2 are both symmetric.
T
c) What is the limiting distribution of Z Tn Z n = nβ̂ n AT [ACAT ]−1 Aβ̂ n ?
D D
Hint: If Z n → Z ∼ Nk (0, I) then Z Tn Z n → Z T Z ∼ χ2k .
4.60. Suppose
√n[(σ̂²1, ..., σ̂²p)ᵀ − (σ²1, ..., σ²p)ᵀ] →D Np(0, Σ).
Let θ = (σ²1, ..., σ²p)ᵀ and let g(θ) = (√(σ²1), ..., √(σ²p))ᵀ. Find Dg(θ).
4.61. Suppose
√n[(σ̂1, ..., σ̂p)ᵀ − (σ1, ..., σp)ᵀ] →D Np(0, Σ).
Let θ = (σ1, ..., σp)ᵀ and let g(θ) = ((σ1)², ..., (σp)²)ᵀ. Find Dg(θ).
4.62. Let wB ∼ Np(0, Σ/B). Then wB →D w as B → ∞. Find w.
4.63. Let x1, ..., xn be iid with mean E(x) = µ and variance V(x) = σ² > 0. Then Σ_{i=1}^n (xi − x̄n)² = Σ_{i=1}^n (xi − µ + µ − x̄n)² = Σ_{i=1}^n (xi − µ)² − n(x̄ − µ)².
a) (1/n) Σ_{i=1}^n (xi − µ)² →P θ. What is θ?
b) n(x̄ − µ)² = [√n(x̄ − µ)]² →D W. What is W?
D
4.64. Suppose Z n → Nk (µ, I). Let A be a constant r × k matrix. Find
the limiting distribution of A(Z n − µ).
with 0 < γ < 1 and c > 0. Then E(xi) = µ and Cov(xi) = [1 + γ(c − 1)]Σ. Find the limiting distribution of √n(x̄ − d) for appropriate vector d.
4.66. Let Σi be the nonsingular population covariance matrix of the ith treatment group or population. To simplify the large sample theory, assume ni = πi n where 0 < πi < 1 and Σ_{i=1}^3 πi = 1. Let Ti be a multivariate location estimator such that
√ni (Ti − µi) →D Nm(0, Σi), and √n(Ti − µi) →D Nm(0, Σi/πi) for i = 1, 2, 3.
Assume the Ti are independent. Then
√n[(T1 − µ1)ᵀ, (T2 − µ2)ᵀ, (T3 − µ3)ᵀ]ᵀ →D u.
a) Find the distribution of u.
b) Suggest an estimator π̂i of πi.
4.67. Let X1, ..., Xn be independent and identically distributed (iid) from a Poisson(λ) distribution with E(X) = λ. Let X̄ = Σ_{i=1}^n Xi/n.
a) Find the limiting distribution of √n(X̄ − λ).
b) Find the limiting distribution of √n[(X̄)³ − (λ)³].
4.68. Let X1, ..., Xn be iid from a normal distribution with unknown mean µ and known variance σ². Find the limiting distribution of √n((X̄)³ − c) for an appropriate constant c.
4.69. Let X1, ..., Xn be a random sample from a population with pdf
f(x) = θx^(θ−1)/3^θ for 0 < x < 3, and f(x) = 0 elsewhere.
The method of moments estimator for θ is Tn = X̄/(3 − X̄). Find the limiting distribution of √n(Tn − θ) as n → ∞.
4.70. Let Yn ∼ χ²n.
a) Find the limiting distribution of √n(Yn/n − 1).
b) Find the limiting distribution of √n[(Yn/n)³ − 1].
4.71. Let Y1 , ..., Yn be iid with E(Y ) = µ and V (Y ) = σ 2 . Let g(µ) = µ2 .
For µ = 0, find the limiting distribution of n[(Y n )2 − 02 ] = n(Y n )2 by using
the Second Order Delta Method.
√n(Yn/n − µ) →D N(0, σ²).
For example, if Yn ∼ N(nµ, nσ²) then Yn ∼ Σ_{i=1}^n Xi where the Xi are iid N(µ, σ²). Hence
√n(Yn/n − µ) ∼ √n(X̄n − µ) →D N(0, σ²).
4.80. Suppose X1, ..., Xn are iid from a distribution with mean µ and variance σ². The method of moments estimator for σ² is
S²M = (1/n) Σ_{i=1}^n (Xi − X̄n)² = (1/n) Σ_{i=1}^n Xi² − (X̄n)².
a) (1/n) Σ_{i=1}^n Xi² →P c. What is c? Hint: Use the WLLN on Wi = Xi².
b) (X̄n)² →P d. What is d? Hint: g(x) = x² is continuous, so if Zn →P θ, then g(Zn) →P g(θ).
c) Show S²M →P σ².
d) S² = (n/(n − 1)) S²M = (1/(n − 1)) Σ_{i=1}^n (Xi − X̄n)². Prove S² →P σ².
4.81. Suppose X1, ..., Xn are iid k × 1 random vectors where E(Xi) = (µ1, ..., µk)ᵀ and Cov(Xi) = (1 − α)I + α11ᵀ, where I is the k × k identity matrix, 1 = (1, 1, ..., 1)ᵀ, and −1/(k − 1) < α < 1. Find the limiting distribution of √n(X̄ − c) for appropriate vector c.
4.82. Suppose Xn are random variables with characteristic functions cXn(t), and that cXn(t) → e^(itc) for every t ∈ R where c is a constant. Does Xn →D X for some random variable X? Explain briefly. Hint: Is the function g(t) = e^(itc) continuous at t = 0? Is there a random variable that has characteristic function g(t)?
4.83. The characteristic function for Y ∼ N (µ, σ 2 ) is
cY (t) = exp(itµ − t2 σ 2 /2). Let Xn ∼ N (0, n).
a) Prove cXn (t) → h(t) ∀t by finding h(t).
b) Use a) to prove whether Xn converges in distribution.
4.84. Suppose
Zn = √n(X̄n − µ)/σ →D N(0, 1)
and s²n →P σ² where σ > 0. Prove that
√n(X̄n − µ)/sn →D N(0, 1).
4.85. Show the usual Delta Method is a special case of the Multivariate
Delta Method if g is a real function (d = 1), Tn is a random variable, θ is a
scalar and Σ = σ 2 is a scalar (k = 1).
4.86. Let X be a k × 1 random vector and X n be a sequence of k × 1
random vectors and suppose that
D
tT X n → tT X
D
for all t ∈ Rk . Does X n → X? Explain briefly.
D
4.87. Suppose the k×1 random vector X n → Nk (µ, Σ). Hence the asymp-
totic distribution of X n is the multivariate normal MVN Nk (µ, Σ) distribu-
tion. Find the d, µ̃ and Σ̃ for the following problem. Let C T be the transpose
of C.
D
Let C be an m × k matrix, then CX n → Nd (µ̃, Σ̃).
4.88. Suppose X n are k × 1 random vectors with characteristic functions
cX n (t). Does cX n (0) → a for some constant a? Prove or disprove. Here 0 is
a k × 1 vector of zeroes.
4.89. Suppose
√n[(λ̂ᵀ, η̂)ᵀ − (λᵀ, η)ᵀ] →D N_{p+1}( 0, [[Σλ, Σλη], [Σηλ, Ση]] ) ∼ N_{p+1}(0, Σ).
Let θ = (σ12 , ..., σp2)T and let g(θ) = (log(σ12 ), ..., log(σp2 ))T . Find Dg (θ ) .
4.91. It is true that Wn has the same order as Xn in probability, written Wn ≍P Xn, iff for every ε > 0 there exist positive constants Nε and 0 < dε < Dε such that
P(dε ≤ Wn/Xn ≤ Dε) ≥ 1 − ε
for all n ≥ Nε.
a) Show that if Wn P Xn then Xn P Wn .
b) Show that if Wn P Xn then Wn = OP (Xn ).
c) Show that if Wn P Xn then Xn = OP (Wn ).
d) Show that if Wn = OP (Xn ) and if Xn = OP (Wn ), then Wn P Xn .
4.92. This problem will prove the following Theorem which says that if
there are K estimators Tj,n of a parameter β, such that kTj,n −βk = OP (n−δ )
where 0 < δ ≤ 1, and if Tn∗ picks one of these estimators, then kTn∗ − βk =
OP (n−δ ).
Lemma: Pratt (1959). Let X1,n , ..., XK,n each be OP (1) where K is
fixed. Suppose Wn = Xin ,n for some in ∈ {1, ..., K}. Then
Wn = OP (1). (4.20)
Proof.
FWn (x) ≤ P (min{X1,n , ..., XK,n} ≤ x) = 1 − P (X1,n > x, ..., XK,n > x).
Since K is finite, there exist B > 0 and N such that P(Xi,n ≤ B) > 1 − ε/(2K) and P(Xi,n > −B) > 1 − ε/(2K) for all n > N and i = 1, ..., K. Bonferroni's inequality states that P(∩_{i=1}^K Ai) ≥ Σ_{i=1}^K P(Ai) − (K − 1). Thus
FWn(B) ≥ P(X1,n ≤ B, ..., XK,n ≤ B) ≥ K(1 − ε/(2K)) − (K − 1) = 1 − ε/2
and
−FWn(−B) ≥ −1 + P(X1,n > −B, ..., XK,n > −B) ≥ −1 + K(1 − ε/(2K)) − (K − 1) = −1 + K − ε/2 − K + 1 = −ε/2.
Hence
FWn(B) − FWn(−B) ≥ 1 − ε for n > N. QED
Theorem. Suppose ‖Tj,n − β‖ = OP(n^(−δ)) for j = 1, ..., K where 0 < δ ≤ 1. Let Tn* = Tin,n for some in ∈ {1, ..., K} where, for example, Tin,n is the Tj,n that minimized some criterion function. Then ‖Tn* − β‖ = OP(n^(−δ)).
Prove the above theorem using the Lemma with an appropriate Xj,n.
4.93. Let W ∼ N(µW, σ²W) and let X ∼ Np(µ, Σ). The characteristic function of W is
ϕW(y) = E(e^(iyW)) = exp(iyµW − y²σ²W/2).
FXn(x) = 0 for x ≤ c − 1/n, FXn(x) = (n/2)(x − c + 1/n) for c − 1/n < x < c + 1/n, and FXn(x) = 1 for x ≥ c + 1/n.
Does Xn →D X for some random variable X? Prove or disprove. If Xn →D X, find X.
4.96. Suppose X1, ..., Xn are iid from a distribution with E(X^k) = Γ(3 − k)/(6λ^k) for integer k < 4. Recall that Γ(n) = (n − 1)! for integers n ≥ 1. Find the limiting distribution of √n(X̄n − c) for appropriate constant c.
4.97. Suppose Xn is a discrete random variable with P (Xn = n) = 1/n
and
D
P (Xn = 0) = (n − 1)/n. Does Xn → X? Explain.
4.98. Let Xn ∼ Poisson(nθ). Find the limiting distribution of √n(Xn/n − θ).
4.99. Let Y1, ..., Yn be iid Gamma(θ, θ) random variables with E(Yi) = θ² and V(Yi) = θ³ where θ > 0. Find the limiting distribution of √n(Ȳn − c) for appropriate constant c.
4.100. Let Xn = √n with probability 1/n and Xn = 0 with probability 1 − 1/n. (Xn = √n I_[0,1/n] wrt U(0, 1) probability.)
a) Prove that Xn →1 0.
b) Does Xn →2 0? Prove or disprove.
D
4.101. Suppose Xn ∼ U (c−1/n, c+1/n). Does Xn → X for some random
variable X? Prove or disprove. (If Y ∼ U (θ1 , θ2 ), then the cdf of Y is F (y) =
(y − θ1 )/(θ2 − θ1 ) for θ1 ≤ y ≤ θ2 .)
4.102. Suppose X 1 , ..., X n are iid with E(X i ) = 0 but Cov(X i ) does not
P
exist. Does X n → c for some constant vector c? Explain briefly.
D P D
4.103. Suppose X n → X and Y n − X n → 0. Does Y n → W for some
random vector W ? [Hint: Y n = X n + (Y n − X n ).]
4.104. Let Xn ∼ N (0, σn2 ) where σn2 → ∞ as n → ∞. Let Φ(x) be the cdf
of a N (0, 1) RV. Then the cdf of Xn is Fn (x) = Φ(x/σn ).
a) Find F (x) such that Fn (x) → F (x) for all real x.
D
b) Does Xn → X? Explain briefly.
4.105. Define when a sequence of random variables Xn converges in prob-
ability to a random variable X.
4.106. Suppose X1 , ..., Xn are iid C(µ, σ) with characteristic function
ϕX (t) = exp(itµ − |t|σ) where exp(a) = ea . Pn
a) Find the characteristic function ϕTn (t) of Tn = i=1 Xi .
b) Xn ∼ U (n, n + 1)
c) Xn ∼ U (an , bn ) where an → a < b and bn → b.
d) Xn ∼ U (an , bn ) where an → c and bn → c.
e) Xn ∼ U (−n, n)
f) Xn ∼ U (c − 1/n, c + 1/n)
4.131Q . a) Let P (Xn = n) = 1/n and P (Xn = 0) = 1 − 1/n.
1
i) Determine whether Xn → 0.
P
ii) Determine whether Xn → 0.
D
iii) Determine whether Xn → 0.
1
b) Let P (Xn = 0) = 1 − and P (Xn = 1) = 1/n.
n
2
i) Determine whether Xn → 0.
1
ii) Determine whether Xn → 0.
P
iii) Determine whether Xn → 0.
D
iv) Determine whether Xn → 0.
Definition 5.4. Let E(X) exist on (Ω, F, P), and let the σ-field G ⊆ F. A conditional expectation of X given G is a f = E[X|G] that is i) measurable G and integrable, and ii) ∫_G E[X|G] dP = E[E(X|G) I_G] = E[X I_G] = ∫_G X dP for any G ∈ G.
Remark 5.2. i) Note that f = E[X|G] is a random variable wrt G.
ii) There are many such RVs E[X|G] satisfying Definition 5.4, but any two
of them are equal wp1. A specific such RV is called a version of E[X|G].
iii) Fix A ∈ F. If X = IA , then E[IA |G] is a version of P [A|G].
iv) Since G ⊆ F, often X is not measurable G. Then X is not a version of
E[X|G]. If X is measurable G, then X is a version of E[X|G].
Theorem 5.2. If X is measurable G and Y and XY are integrable, then
E[XY |G] = XE[Y |G] wp1. That is, XE[Y |G] is a version of E[XY |G].
Theorem 5.3. Let X, Y , and Xn be integrable. Let a and b be constants.
i) If X = a wp1, then E[X|G] = a wp1.
ii) E[(aX + bY )|G] = aE[X|G] + bE[Y |G] wp1.
iii) If X ≤ Y wp1, then E[X|G] ≤ E[Y |G] wp1.
iv) |E[X|G]| ≤ E[|X| | G] wp1.
v) If limn Xn = X wp1, |Xn | ≤ Y , and Y is integrable, then limn E[Xn |G] =
E[X|G] wp1.
Theorem 5.4. If X is integrable and σ−fields G1 ⊆ G2 ⊆ F, then
E(E[X|G2 ]|G1 ) = E[X|G1 ] wp1.
5.3 Summary
5.4 Complements
5.5 Problems
Problem 5.1. What theorem can be used to prove the existence of P [A|G]
and E[X|G]?
Problem 5.2. Using E[IA|G] = P[A|G] wp1, use X = IA, Y = IB, Xi = IAi, and the result for E[X|G] to get the corresponding result for P[A|G].
a) Using E[Σ_{i=1}^n ai Xi | G] = Σ_{i=1}^n ai E[Xi | G], find E[Σ_{i=1}^n ai IAi | G] in terms of P[Ai|G].
b) If X ≤ Y wp1, then E[X|G] ≤ E[Y |G] wp1. If A ⊆ B, then IA ≤ IB .
Use these results to show that if A ⊆ B, then P [A|G] ≤ P [B|G] wp1.
c) If X = a wp1, then E[X|G] = a wp1. Use 1 = IΩ and b) with B = Ω
to prove P [A|G] ≤ 1 wp1.
Problem 5.3. Let a be a constant. Prove E[aX|G] = aE[X|G] wp1.
Chapter 7
Some Solutions
Uniqueness:
Σ_{i=1}^n xi P(Ai) = Σ_x Σ_{i: xi = x} xi P(Ai) = Σ_x x P(∪_{i: xi = x} Ai) = Σ_x x P(X = x).
3.31. See the proof of Theorem 3.11.
3.32. Proof. 0 ≤ E(X1 ) ≤ E(X2 ) ≤ .... So {E(Xn )} is a monotone
sequence and limn→∞ E(Xn ) exists in [0, ∞].
3.33. See the proof of Theorem 3.13.
3.34. See the proof of Theorem 3.14.
3.35. See the Monotone Convergence Theorem 3.16 and its proof.
3.36. See the Lebesgue Dominated Convergence Theorem for RVs and its proof.
D) Large Sample Theory:
4.1. Fn (y) = 0.5 + 0.5y/n for −n < y < n, so Fn (y) → H(y) ≡ 0.5 for all
real y. Hence Xn does not converge in distribution since H(y) is not a cdf.
4.16. c = µt by the WLLN since the Wi = t(Xi ) are iid
4.23. Lindeberg CLT
4.34. a) Fn (y) = y/n for 0 < y < n, so Fn (y) → H(y) ≡ 0.0 for all real y.
Hence Xn does not converge in distribution since H(y) is not a cdf.
4.36. c = E(Xi2 ) = σ 2 + µ2
4.46. a) √n(X̄ − µ) →D N(0, σ²).
b) Define g(x) = 1/x, so g′(x) = −1/x². Using the delta method, √n(1/X̄ − 1/µ) →D N(0, σ²/µ⁴), provided µ ≠ 0.
4.120. Solution. a) The cdf Fn(x) of Xn is
Fn(x) = 0 for x ≤ −1/n, Fn(x) = nx/2 + 1/2 for −1/n ≤ x ≤ 1/n, and Fn(x) = 1 for x ≥ 1/n.
Sketching Fn(x) shows that it has a line segment rising from 0 at x = −1/n to 1 at x = 1/n and that Fn(0) = 0.5 for all n ≥ 1. Examining the cases x < 0, x = 0 and x > 0 shows that as n → ∞,
Fn(x) → 0 for x < 0, Fn(x) → 1/2 for x = 0, and Fn(x) → 1 for x > 0.
D
Hence z n → z.
D
Hence Xn → X ∼ U (a, b).
d)
FXn(t) → 0 for t < c and FXn(t) → 1 for t > c.
D
Hence Xn → X where P (X = c) = 1. Hence X has a point mass distribution
at c. (The behavior of limn→∞ FXn (c) is not important, even if the limit does
not exist.)
e)
FXn(t) = (t + n)/(2n) = 1/2 + t/(2n)
for −n ≤ t ≤ n. Thus FXn(t) → H(t) ≡ 0.5 ∀ t ∈ R. Since H(t) is continuous but not a cdf, Xn does not converge in distribution to any RV X.
f)
FXn(t) = (t − c + 1/n)/(2/n) = 1/2 + (n/2)(t − c)
for c − 1/n ≤ t ≤ c + 1/n. Thus
FXn(t) → H(t) = 0 for t < c, H(t) = 1/2 for t = c, and H(t) = 1 for t > c.
Hence t = c is the only discontinuity point of FX(t), and H(t) = FX(t) at all continuity points of FX(t). Thus Xn →D X where P(X = c) = 1.
4.131. Solution. a) i) Xn is discrete and takes on two values with E(Xn) = 1 for all positive integers n. Hence E[|Xn − 0|] = E(Xn) = 1 ∀ n, and Xn does not satisfy Xn →1 0.
ii) Let ε > 0. Then
P[|Xn − 0| ≥ ε] ≤ P(Xn = n) = 1/n → 0
as n → ∞. Hence Xn →P 0.
iii) By ii), Xn →D 0.
b) i) Xn is discrete and takes on two values with
E[(Xn − 0)²] = E(Xn²) = Σ_x x² P(Xn = x) = 0²(1 − 1/n) + 1²(1/n) = 1/n → 0
as n → ∞. Hence Xn →2 0. Since i) holds, so do ii), iii) and iv).
(Also note that E[|Xn − 0|] = E(Xn) = 1/n → 0. Hence Xn →1 0.)
4.132. See the proof of Theorem 4.3.
4.133. Solution. a) Xn ∼ Σ_{i=1}^n Yi where the Yi are iid bin(n = 1, p) random variables with E(Yi) = p and V(Yi) = p(1 − p). Thus
√n(Xn/n − p) = √n(Ȳ − p) →D N[0, p(1 − p)]
by the CLT.
b) Yi = I(Xi ≤ x) ∼ bin(n = 1, F(x)) for fixed x.
i) E(Yi) = P(Xi ≤ x) = F(x)
ii) V(Yi) = F(x)(1 − F(x))
iii) √n(F̂n(x) − F(x)) →D N[0, F(x)(1 − F(x))] by the CLT.
c) √n(X̄ − µ) →D Np(0, (d/(d − 2))Σ) by the MCLT.
d) E(Y) = exp(µ + σ²/2) using r = 1, and E(Y²) = exp(2µ + 2σ²) using r = 2. V(Y) = E(Y²) − [E(Y)]². Thus
√n(Ȳn − E(Y)) = √n(Ȳn − exp(µ + σ²/2)) →D N(0, V(Y))
by the CLT.
4.134. Solution: Proof: Let δ > 0. Then
lim_{n→∞} Σ_{k=1}^{rn} E[|Wnk|^(2+δ)]/s_n^(2+δ) ≤ lim_{n→∞} Σ_{k=1}^{rn} M_n^δ E[|Wnk|²]/s_n^(2+δ) = lim_{n→∞} (Mn/sn)^δ Σ_{k=1}^{rn} E[|Wnk|²]/s²n = lim_{n→∞} (Mn/sn)^δ = 0
using s²n = Σ_{k=1}^{rn} E[|Wnk|²].
4.135. Solution: Proof: Let ε > 0. Then
(1/s²n) E[Wnk² I(|Wnk| ≥ εsn)] ≤ (1/s²n) E[Mn² I(|Wnk| ≥ εsn)] = (Mn²/s²n) P(|Wnk| ≥ εsn) ≤ (Mn²/s²n) E(Wnk²)/(ε²s²n) = (Mn²/s²n)(σ²nk/(ε²s²n))
where the last inequality holds by Chebyshev's inequality. So
Σ_{k=1}^{rn} (1/s²n) E[Wnk² I(|Wnk| ≥ εsn)] ≤ (Mn²/(s²n ε²)) (1/s²n) Σ_{k=1}^{rn} σ²nk = (Mn²/s²n)(1/ε²) → 0
using Σ_{k=1}^{rn} σ²nk = s²n.
4.136. Proof. Let Yi = |Wi| = |Xi − pi|. Then P(Yi = 1 − pi) = pi and P(Yi = pi) = qi. Thus
E[|Xi − pi|³] = E[|Wi|³] = Σ_y y³ f(y) = (1 − pi)³ pi + p³i qi = q³i pi + p³i qi = pi qi (p²i + q²i) ≤ pi qi
since p²i + q²i ≤ (pi + qi)² = 1. Thus Σ_{i=1}^n E[|Xi − pi|³] ≤ Σ_{i=1}^n pi qi. Dividing both sides by (Σ_{i=1}^n pi qi)^(3/2) gives
Σ_{i=1}^n E[|Xi − pi|³]/(Σ_{i=1}^n pi qi)^(3/2) ≤ 1/(Σ_{i=1}^n pi qi)^(1/2) → 0
as n → ∞. Hence the special case of Lyapounov's condition
lim_{n→∞} Σ_{i=1}^n E[|Xi − µi|³]/(Σ_{i=1}^n σ²i)^(3/2) = 0
holds.
4.137. See the proof of Lyapounov's CLT.
4.138. Solution: Proof: Once n is large enough so that εsn > c (which occurs since sn → ∞), I[|Wk| ≥ εsn] = 0. Hence Lindeberg's condition holds.
4.139. Solution: Proof: Need to show that Lindeberg's condition holds. Now s²n = nσ² and the Wk² I[|Wk| ≥ εsn] are iid for given n. Thus
(1/s²n) Σ_{k=1}^n E(Wk² I[|Wk| ≥ εsn]) = (1/σ²) E(W1² I[|W1| ≥ εσ√n]) = (1/σ²) ∫_{|W1| ≥ εσ√n} W1² dP → 0
as n → ∞ since P(|W1| ≥ εσ√n) ↓ 0 as n → ∞. Or Yn = W1² I[|W1| ≥ εσ√n] satisfies Yn ≤ W1² and Yn ↓ Y = 0 as n → ∞. Thus E(Yn) → E(Y) = 0 by Lebesgue's Dominated Convergence Theorem. Thus Equation (4.15) holds and Zn →D N(0, 1). If the Wi = Xi − µ, then
Zn = Σ_{i=1}^n (Xi − µ)/(σ√n) = √n(X̄n − µ)/σ →D N(0, 1).
Thus √n(X̄n − µ) →D N(0, σ²).
5.9.
Solution:
a) ∫_G E(X|G) dP = ∫_G X dP (= E[X I_G])
b) E[E(X|G)] = ∫_Ω E(X|G) dP = ∫_Ω X dP = E[X]
c) ∫_G E(IA|G) dP = ∫_G IA dP = ∫ IA IG dP = ∫ I_{A∩G} dP = P(A ∩ G)
d) ∫_G P(A|G) dP = P(A ∩ G)
e) ∫_Ω P(A|G) dP = P(A ∩ Ω) = P(A)
Ash, R.B. (1993), Real Variables: with Metric Space Topology, IEEE Press,
New York, NY. Available from (https://faculty.math.illinois.edu/∼r-ash/).
Ash, R.B., and Doleans-Dade, C.A. (1999), Probability and Measure The-
ory, 2nd ed., Academic Press, San Diego, CA.
Bickel, P.J., and Doksum, K.A. (1977), Mathematical Statistics: Basic
Ideas and Selected Topics, 1st ed., Holden Day, Oakland, CA.
Billingsley, P. (1986, 1995), Probability and Measure, 2nd and 3rd ed.,
Wiley, New York, NY.
Breiman, L. (1968), Probability, Addison-Wesley, Reading, MA.
Capiński, M., and Kopp, P.E. (2004), Measure, Integral and Probability,
2nd ed., Springer-Verlag, London, UK.
Casella, G., and Berger, R.L. (2002), Statistical Inference, 2nd ed., Duxbury,
Belmont, CA.
Chernoff, H. (1956), “Large-Sample Theory: Parametric Case,” The An-
nals of Mathematical Statistics, 27, 1-22.
Chung, K.L. (2001), A Course in Probability Theory, 3rd ed., Academic
Press, San Diego, CA.
Cramér, H. (1946), Mathematical Methods of Statistics, Princeton Univer-
sity Press, Princeton, NJ.
DasGupta, A. (2008), Asymptotic Theory of Statistics and Probability,
Springer, New York, NY.
Davidson, J. (1994), Stochastic Limit Theory, Oxford University Press,
Oxford, UK.
DeGroot, M.H. (1975), Probability and Statistics, 1st ed., Addison-Wesley
Publishing Company, Reading, MA.
Dudley, R.M. (2002), Real Analysis and Probability, Cambridge University
Press, Cambridge, UK.
Durrett, R. (2019), Probability, Theory and Examples, 5th ed., Cambridge
University Press, Cambridge, UK.
Feller, W. (1971), An Introduction to Probability Theory and Its Applica-
tions, Vol. II, 2nd ed., Wiley, New York, NY.
Ferguson, T.S. (1996), A Course in Large Sample Theory, Chapman &
Hall, New York, NY.
Gaughan, E.D. (2009), Introduction to Analysis, 5th ed., American Math-
ematical Society, Providence, RI.
Gnedenko, B.V. (1989), Theory of Probability, 5th ed., Chelsea Publishers,
Providence, RI.
Hoel, P.G., Port, S.C., and Stone, C.J. (1971), Introduction to Probability
Theory, Houghton Mifflin, Boston, MA.
Hunter, D.R. (2014), Notes for a Graduate-Level Course in Asymptotics
for Statisticians, available from (www.stat.psu.edu/∼dhunter/asymp/lectures/).
Jiang, J. (2022), Large Sample Techniques for Statistics, 2nd ed., Springer,
New York, NY.
Karr, A.F. (1993), Probability, Springer, New York, NY.