Lecture 05


STA732

Statistical Inference
Lecture 05: Rao-Blackwell Theorem

Yuansi Chen
Spring 2023
Duke University

https://www2.stat.duke.edu/courses/Spring23/sta732.01/

Recap from Lecture 04

• 𝑉 is ancillary if its distribution does not depend on 𝜃


• Completeness + sufficiency is the ideal notion of optimal data
compression. To prove completeness, one usually argues from the
definition or identifies a (full-rank) exponential family.
• Basu’s theorem is useful for proving independence between a
complete sufficient statistic and an ancillary statistic.

Goal of Lecture 05

1. Convex loss
2. Rao-Blackwell Theorem
3. Uniformly minimum variance unbiased estimator (UMVU)

Chap. 3.6, 4.1-4.2 in Keener or Chap. 1.7, 2.1 in Lehmann and Casella

We are entering the first approach to arguing for “the best”
estimator in point estimation: restricting to a smaller class of
estimators!

Convex loss
Definition. Convex set

A set 𝒞 ⊆ ℝᵖ is convex if for any two points 𝑥, 𝑦 ∈ 𝒞 and any
𝜆 ∈ [0, 1], we have

𝜆𝑥 + (1 − 𝜆)𝑦 ∈ 𝒞

Definition. Convex function

A real-valued function 𝑓 defined on a convex set 𝒞 ⊆ ℝᵖ is a convex
function if for any two points 𝑥, 𝑦 ∈ 𝒞 and any 𝜆 ∈ [0, 1], we have

𝑓(𝜆𝑥 + (1 − 𝜆)𝑦) ≤ 𝜆𝑓(𝑥) + (1 − 𝜆)𝑓(𝑦).

It is called strictly convex if the above inequality holds strictly for
𝑥 ≠ 𝑦 and 𝜆 ∈ (0, 1).

Jensen’s inequality in finite form


For a convex function 𝑓, points 𝑥₁, …, 𝑥ₙ in its domain, and positive
weights 𝛼ᵢ with ∑ᵢ₌₁ⁿ 𝛼ᵢ = 1, we have

𝑓(∑ᵢ₌₁ⁿ 𝛼ᵢ 𝑥ᵢ) ≤ ∑ᵢ₌₁ⁿ 𝛼ᵢ 𝑓(𝑥ᵢ)

proof by induction, omitted

Jensen’s inequality in a probabilistic setting


𝑋 is an integrable real-valued random variable, 𝑓 is convex. Then

𝑓(𝔼[𝑋]) ≤ 𝔼[𝑓(𝑋)]

If 𝑓 is strictly convex, the inequality holds strictly unless 𝑋 is
almost surely constant.

proof see Thm 3.25, remark 3.26 in Keener or Wikipedia

Examples of convex functions

• 𝑥 ↦ 1/𝑥 is strictly convex on (0, ∞). Then for 𝑋 > 0, we have

1/𝔼[𝑋] ≤ 𝔼[1/𝑋]

• 𝑥 ↦ − log(𝑥) is strictly convex on (0, ∞). Then for 𝑋 > 0, we have

log(𝔼[𝑋]) ≥ 𝔼[log(𝑋)].
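Both inequalities are easy to check by Monte Carlo; below is a minimal
numpy sketch (the lognormal law for 𝑋 is an arbitrary assumption, not
from the slides; any positive 𝑋 works).

```python
import numpy as np

# Monte Carlo check of Jensen's inequality for two strictly convex functions.
rng = np.random.default_rng(0)
X = rng.lognormal(mean=0.0, sigma=1.0, size=1_000_000)  # X > 0 almost surely

print(1 / X.mean(), "<=", (1 / X).mean())        # 1/E[X] <= E[1/X]
print(np.log(X.mean()), ">=", np.log(X).mean())  # log(E[X]) >= E[log X]
```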

Convex loss penalizes extra noise to an estimator

Proposition
Suppose the loss 𝐿(𝜃, 𝑑) is convex in 𝑑. Let 𝛿(𝑋) be an estimate of
𝜃. Define 𝛿̃(𝑋) = 𝛿(𝑋) + 𝜖, where 𝜖 is a zero-mean random variable
independent of 𝑋. Then

𝑅(𝜃, 𝛿̃) ≥ 𝑅(𝜃, 𝛿),

where the risk 𝑅(𝜃, 𝛿) = 𝔼𝜃 [𝐿(𝜃, 𝛿(𝑋))].

Proof idea: condition on 𝑋 (tower property), then apply Jensen’s
inequality to the conditional expectation over 𝜖.
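A quick simulation of the proposition under squared error loss (a
sketch with assumed ingredients: 𝛿 = sample mean of Gaussian data and
Gaussian noise 𝜖, none of which come from the slides).

```python
import numpy as np

# Adding independent zero-mean noise to an estimator cannot decrease a convex risk.
rng = np.random.default_rng(0)
theta, n, reps = 1.0, 25, 200_000
X = rng.normal(theta, 1.0, size=(reps, n))
delta = X.mean(axis=1)                                 # estimator of theta
delta_tilde = delta + rng.normal(0.0, 0.5, size=reps)  # delta plus extra noise eps

print("risk of delta:      ", np.mean((delta - theta) ** 2))        # ~ 1/n = 0.04
print("risk of delta tilde:", np.mean((delta_tilde - theta) ** 2))  # ~ 1/n + 0.25
```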

Rao-Blackwell Theorem

Thm 3.28 in Keener

Let 𝑇 be a sufficient statistic for P = {𝑃𝜃 ∶ 𝜃 ∈ Ω} and let 𝛿 be an
estimator of 𝑔(𝜃). Define 𝜂(𝑇) = 𝔼[𝛿(𝑋) ∣ 𝑇], which does not depend
on 𝜃 by sufficiency, so 𝜂 is a genuine estimator. If 𝐿(𝜃, ⋅) is convex,
then

𝑅(𝜃, 𝜂) ≤ 𝑅(𝜃, 𝛿),

where the risk 𝑅(𝜃, 𝛿) = 𝔼𝜃 [𝐿(𝜃, 𝛿(𝑋))].

Furthermore, if 𝐿(𝜃, ⋅) is strictly convex, the inequality is strict
unless 𝛿(𝑋) = 𝜂(𝑇) almost surely.

Interpretation

For convex loss functions,

1. If an estimator is not a function of a sufficient statistic 𝑇
alone, we can improve it (at least weakly).
2. The step of constructing 𝜂(𝑇) = 𝔼[𝛿(𝑋) ∣ 𝑇] from 𝛿 is called
Rao-Blackwellization.
3. When discussing optimal estimators, the only estimators of
𝑔(𝜃) worth considering are functions of a sufficient statistic 𝑇.

Proof of Rao-Blackwell Theorem

See Keener Thm 3.28: condition on 𝑇 (tower property) and apply
Jensen’s inequality to 𝐿(𝜃, ⋅).
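A classic illustration of Rao-Blackwellization (my own Bernoulli
example, not from the slides): start from the unbiased but wasteful
𝛿(𝑋) = 𝑋₁ and condition on 𝑇 = ∑ᵢ𝑋ᵢ, which gives 𝜂(𝑇) = 𝑇/𝑛.

```python
import numpy as np

# Rao-Blackwellization for X_1,...,X_n iid Bernoulli(theta):
# delta(X) = X_1 is unbiased; T = sum(X_i) is sufficient and
# eta(T) = E[X_1 | T] = T/n, the sample mean.
rng = np.random.default_rng(0)
theta, n, reps = 0.3, 20, 200_000
X = rng.binomial(1, theta, size=(reps, n))
delta = X[:, 0]       # crude unbiased estimator
eta = X.mean(axis=1)  # its Rao-Blackwellization

print("risk of delta:", np.mean((delta - theta) ** 2))  # ~ theta(1-theta)
print("risk of eta:  ", np.mean((eta - theta) ** 2))    # ~ theta(1-theta)/n
```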

UMVU
Bias

• The bias of an estimate 𝛿(𝑋) is 𝔼𝜃 [𝛿(𝑋) − 𝑔(𝜃)]


• We say an estimator 𝛿 is unbiased for 𝑔(𝜃) if

𝔼𝜃 [𝛿(𝑋)] = 𝑔(𝜃), ∀𝜃 ∈ Ω.

Ex: what is an unbiased estimator of 𝜃 for 𝑋 drawn from a uniform
distribution on (0, 𝜃)?
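One natural answer is 2𝑋, since 𝔼𝜃[𝑋] = 𝜃/2; a minimal simulation
check (my sketch, not part of the slides):

```python
import numpy as np

# For a single X ~ Unif(0, theta), E[X] = theta/2, so 2X is unbiased for theta.
rng = np.random.default_rng(0)
theta = 3.0
X = rng.uniform(0, theta, size=1_000_000)
print("mean of 2X:", (2 * X).mean())  # ~ theta = 3.0
```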

Bias-variance decomposition under squared error loss

Squared error loss:

𝐿(𝜃, 𝑑) = (𝑑 − 𝑔(𝜃))²

Risk decomposition under squared error loss


Risk becomes the mean squared error 𝑅(𝜃, 𝛿) = 𝔼𝜃[(𝛿(𝑋) − 𝑔(𝜃))²]:

𝔼𝜃[(𝛿(𝑋) − 𝑔(𝜃))²]
= 𝔼𝜃[(𝛿(𝑋) − 𝔼𝜃[𝛿] + 𝔼𝜃[𝛿] − 𝑔(𝜃))²]
= 𝔼𝜃[(𝛿(𝑋) − 𝔼𝜃[𝛿])²] + (𝔼𝜃[𝛿] − 𝑔(𝜃))² + 2𝔼𝜃[(𝛿 − 𝔼𝜃[𝛿])(𝔼𝜃[𝛿] − 𝑔(𝜃))]
= Var𝜃(𝛿) + Bias(𝛿)² + 0,

where the cross term vanishes because 𝔼𝜃[𝛿] − 𝑔(𝜃) is constant and
𝔼𝜃[𝛿 − 𝔼𝜃[𝛿]] = 0.
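A numeric check of the decomposition (a sketch; the shrunk sample mean
below is an arbitrary biased estimator chosen for illustration):

```python
import numpy as np

# Check MSE = Var + Bias^2 for a deliberately biased estimator of theta.
rng = np.random.default_rng(0)
theta, n, reps = 2.0, 10, 500_000
delta = 0.9 * rng.normal(theta, 1.0, size=(reps, n)).mean(axis=1)  # shrunk mean

mse = np.mean((delta - theta) ** 2)
var = delta.var()
bias_sq = (delta.mean() - theta) ** 2
print(mse, "~=", var + bias_sq)  # the two sides agree up to Monte Carlo error
```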

UMVU

Logic: according to the bias-variance decomposition under squared
error loss, if we restrict to unbiased estimators, comparing variance
is equivalent to comparing risk.

Def. UMVU
An unbiased estimator 𝛿 is uniformly minimum variance unbiased
(UMVU) if

Var𝜃(𝛿) ≤ Var𝜃(𝛿̃), ∀𝜃 ∈ Ω,

for any competing unbiased estimator 𝛿̃.

Does UMVU always exist?
No! Even unbiased estimators might not exist.

Ex: estimate 1/𝜃² for 𝑋 drawn from Uniform(0, 𝜃)

Def. U-estimable
We say 𝑔(𝜃) is U-estimable if there exists 𝛿 such that
𝔼𝜃[𝛿] = 𝑔(𝜃), ∀𝜃 ∈ Ω.

Does a UMVU estimator exist under the U-estimable assumption?
UMVU under U-estimable and given complete sufficient statistics

Theorem 4.4 in Keener, Lehmann-Scheffé

Suppose 𝑇(𝑋) is complete sufficient for P = {𝑃𝜃 ∶ 𝜃 ∈ Ω}. For
any U-estimable 𝑔(𝜃), there is a unique (up to almost-sure equality)
UMVU estimator, and it is a function of 𝑇.

Proof of Thm 4.4

• Existence: Rao-Blackwellize any unbiased estimator 𝛿 to get
𝜂(𝑇) = 𝔼[𝛿(𝑋) ∣ 𝑇], which is still unbiased by the tower property
• Uniqueness: if 𝜂₁(𝑇) and 𝜂₂(𝑇) are both unbiased, then
𝔼𝜃[𝜂₁(𝑇) − 𝜂₂(𝑇)] = 0 for all 𝜃, and completeness forces
𝜂₁(𝑇) = 𝜂₂(𝑇) a.s.
• UMVU: any competing unbiased estimator Rao-Blackwellizes to an
unbiased function of 𝑇, which equals 𝜂(𝑇) a.s., and Rao-Blackwell
says conditioning cannot increase variance

Extension to convex loss

Extension of Thm 4.4 to convex loss

Suppose 𝑇(𝑋) is complete sufficient for P = {𝑃𝜃 ∶ 𝜃 ∈ Ω}.
Under a strictly convex loss, among all unbiased estimators there
is a unique (up to almost-sure equality) uniformly minimum risk
unbiased estimator, and it is a function of 𝑇.
Strategies for finding UMVU estimators

Two strategies for finding UMVU estimators:


• Directly find an unbiased estimator that is a function of a
complete sufficient statistic 𝑇
• Find any unbiased estimator, then Rao-Blackwellize it by
conditioning on 𝑇.

Example 1

𝑋₁, …, 𝑋ₙ i.i.d. ∼ Poisson(𝜃), 𝜃 > 0.

• Find a UMVU estimator for 𝜃
• Find a UMVU estimator for 𝜃²
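A simulation sketch of both answers (my own check, using the standard
facts that 𝑋̄ = 𝑇/𝑛 is UMVU for 𝜃 and 𝑇(𝑇 − 1)/𝑛² is UMVU for 𝜃²,
where 𝑇 = ∑ᵢ𝑋ᵢ):

```python
import numpy as np

# T = sum(X_i) ~ Poisson(n*theta) is complete sufficient.
# T/n is unbiased for theta; E[T(T-1)] = (n*theta)^2, so T(T-1)/n^2 is
# unbiased for theta^2. Both are functions of T, hence UMVU.
rng = np.random.default_rng(0)
theta, n, reps = 2.0, 10, 500_000
T = rng.poisson(theta * n, size=reps)  # same law as the sum of n Poisson(theta) draws

print("mean of T/n:       ", T.mean() / n)                 # ~ theta
print("mean of T(T-1)/n^2:", (T * (T - 1)).mean() / n**2)  # ~ theta^2
```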

Example 2

𝑋₁, …, 𝑋ₙ i.i.d. ∼ Unif(0, 𝜃), 𝜃 > 0.

• Find a UMVU estimator for 𝜃 in two ways
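A simulation sketch (my own check): 𝑇 = max𝑋ᵢ is complete sufficient
with 𝔼𝜃[𝑇] = 𝑛𝜃/(𝑛 + 1), so (1 + 1/𝑛) max𝑋ᵢ is unbiased and hence
UMVU; Rao-Blackwellizing the unbiased 2𝑋̄ leads to the same estimator.

```python
import numpy as np

# Compare the UMVU (1 + 1/n) * max(X_i) with the unbiased but wasteful 2*Xbar.
rng = np.random.default_rng(0)
theta, n, reps = 5.0, 10, 500_000
X = rng.uniform(0, theta, size=(reps, n))
umvu = (1 + 1 / n) * X.max(axis=1)
naive = 2 * X.mean(axis=1)  # unbiased, but not a function of max(X_i)

print("means:    ", umvu.mean(), naive.mean())  # both ~ theta
print("variances:", umvu.var(), naive.var())    # UMVU variance is far smaller
```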

In Example 2, is the UMVU estimator also a “good” (admissible)
estimator in terms of its overall risk?
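A hint via simulation (my sketch, not the course’s answer): under
squared error the risk of 𝑐 ⋅ max𝑋ᵢ scales with 𝜃², so comparing at
one 𝜃 compares at all 𝜃, and the shrunk choice 𝑐 = (𝑛 + 2)/(𝑛 + 1)
dominates the unbiased choice 𝑐 = (𝑛 + 1)/𝑛.

```python
import numpy as np

# MSE of c * max(X_i) is proportional to theta^2, so one theta suffices
# for a uniform comparison across the parameter space.
rng = np.random.default_rng(0)
theta, n, reps = 5.0, 10, 500_000
mx = rng.uniform(0, theta, size=(reps, n)).max(axis=1)

for name, c in [("UMVU   c=(n+1)/n    ", (n + 1) / n),
                ("shrunk c=(n+2)/(n+1)", (n + 2) / (n + 1))]:
    print(name, "MSE:", np.mean((c * mx - theta) ** 2))
```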

Summary

• Jensen’s inequality for convex functions. Convex loss allows us
to rule out estimators with extra noise
• The Rao-Blackwell theorem allows us to improve an estimator by
basing it on a sufficient statistic 𝑇
• If an unbiased estimator exists and a complete sufficient
statistic 𝑇 exists, then the UMVU estimator exists and is unique

What is next?

• Reflection on unbiasedness
• Information inequality

Thank you
