Chapter 2: Point Estimation


1 Point Estimation

In statistical analysis, point estimation of population parameters plays a very significant role. In
studying a real-world phenomenon we begin with a random sample of size n taken from the totality
of a population. We assume that the form of the population distribution is known (binomial,
normal, etc.) but the parameters of the distribution (p for a binomial; µ and σ for a normal, etc.)
are unknown. We shall estimate these parameters using the data from our random sample.
Let X1, . . . , Xn be independent and identically distributed (iid) random variables (in statistical
language, a random sample) with a pdf f(x, θ1, . . . , θl), where θ1, . . . , θl are the unknown population
parameters (characteristics of interest). The actual values of these parameters are not known. The
problem in point estimation is to determine statistics

gi (X1 , . . . , Xn ), i = 1, ..., l,

which can be used to estimate the value of each of the parameters based on observed sample data
from the population. These statistics are called estimators for the parameters, and the values
calculated from these statistics using particular sample data values are called estimates of the
parameters.
There are many methods available for estimating the true value(s) of the parameter(s) of interest.
Three of the more popular methods of estimation are the method of moments, the method of
maximum likelihood, and the Bayes method. The Bayes method will be considered in the last chapter.

2 Method of moments
Let µ′_k = E(X^k) be the kth moment about the origin of a random variable X, whenever it exists.
Let m′_k = (1/n) Σ_{i=1}^n X_i^k be the corresponding kth sample moment. The method of moments is
based on matching the sample moments with the corresponding population (distribution) moments.

Example 2.1 Let X1, . . . , Xn be a random sample from a Bernoulli distribution with parameter
p, 0 ≤ p ≤ 1. Find the moment estimator for p.

Solution
For the Bernoulli random variable, µ′_1 = E(X) = p, so we can use m′_1 = (1/n) Σ_{i=1}^n X_i to estimate p. Thus

p̂ = m′_1 = (1/n) Σ_{i=1}^n X_i = X̄.

Example 2.2 Let X1 , . . . , Xn be a random sample from a gamma probability distribution with pa-
rameters α and β. Find moment estimators for the unknown parameters α and β.

Solution
For the gamma distribution,

E(X) = αβ and E(X²) = αβ² + α²β².

Matching the first two sample moments with the corresponding population moments, we have

m′_1 = (1/n) Σ_{i=1}^n X_i = X̄ = αβ and m′_2 = (1/n) Σ_{i=1}^n X_i² = αβ² + α²β².

Subtracting the square of the first equation from the second gives m′_2 − (m′_1)² = (1/n) Σ_{i=1}^n (X_i − X̄)² = αβ².
Solving for α, we obtain α̂ = X̄/β̂, and

β̂ = Σ_{i=1}^n (X_i − X̄)² / (nX̄).

Therefore

α̂ = nX̄² / Σ_{i=1}^n (X_i − X̄)².
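
As a numerical illustration, the moment estimators above can be computed directly from a sample. The snippet below is only a sketch: it assumes NumPy is available and uses simulated gamma data with shape α = 2 and scale β = 3 (values chosen purely for illustration).

```python
import numpy as np

# Simulated data for illustration only: gamma with shape alpha = 2, scale beta = 3.
rng = np.random.default_rng(0)
x = rng.gamma(shape=2.0, scale=3.0, size=1000)

xbar = x.mean()                 # first sample moment m'_1
s2 = np.mean((x - xbar) ** 2)   # m'_2 - (m'_1)^2 = (1/n) * sum (x_i - xbar)^2

beta_hat = s2 / xbar            # beta-hat = sum (x_i - xbar)^2 / (n * xbar)
alpha_hat = xbar / beta_hat     # alpha-hat = n * xbar^2 / sum (x_i - xbar)^2

print(alpha_hat, beta_hat)      # close to 2 and 3 for a large sample
```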

3 Method of Maximum likelihood


Definition 3.1 Let f (x1 , . . . , xn ; θ), θ ∈ Θ ⊂ Rk , be the joint probability (or density) function of n
random variables X1 , . . . , Xn with sample values x1 , . . . , xn . The likelihood function of the sample is
given by
L(θ; x1, . . . , xn) = f(x1, . . . , xn; θ),
or L(θ) in briefer notation. We emphasize that L is a function of θ for fixed sample values.

Definition 3.2 The maximum likelihood estimators (MLEs) are those values of the parameters that
maximize the likelihood function with respect to the parameter θ. That is,

L(θ̂; x1, . . . , xn) = max_{θ∈Θ} L(θ; x1, . . . , xn),

where Θ is the set of possible values of the parameter θ.

In general, the maximum likelihood method results in the problem of maximizing a function of
single or several variables. Hence, in most situations, the methods of calculus can be used. In
many cases, it is easier to work with the natural logarithm (ln) of the likelihood function, called the
log-likelihood function. Because the natural logarithm function is increasing, the maximum value
of the likelihood function, if it exists, will occur at the same point as the maximum value of the
log-likelihood function.

Example 3.3 Let X1, . . . , Xn be a random sample from a geometric distribution with parameter
p, 0 < p ≤ 1. Find the MLE p̂.

Solution
The probability mass function for a geometric random variable is

f(x, p) = p(1 − p)^{x−1}, 0 < p ≤ 1, x = 1, 2, 3, . . . .

The likelihood function is

L(p) = Π_{i=1}^n p(1 − p)^{x_i − 1} = p^n (1 − p)^{Σ_{i=1}^n x_i − n}.

Taking the natural logarithm of L(p),

ln L(p) = n ln p + (Σ_{i=1}^n x_i − n) ln(1 − p).

Taking the derivative with respect to p and setting it equal to zero, we have

d ln L(p)/dp = n/p − (Σ_{i=1}^n x_i − n)/(1 − p) = 0  ⟹  p̂ = n / Σ_{i=1}^n x_i = 1/x̄.
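
The closed form p̂ = 1/x̄ can be checked against a direct numerical maximization of the log-likelihood. This is only a sketch; it assumes NumPy and SciPy are available and uses simulated geometric data with p = 0.3 (an arbitrary illustrative value).

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
x = rng.geometric(p=0.3, size=500)   # simulated geometric data with support 1, 2, 3, ...
n = len(x)

def neg_log_lik(p):
    # -ln L(p) = -[ n ln p + (sum(x) - n) ln(1 - p) ]
    return -(n * np.log(p) + (x.sum() - n) * np.log(1 - p))

res = minimize_scalar(neg_log_lik, bounds=(1e-6, 1 - 1e-6), method="bounded")
print(res.x, 1 / x.mean())           # the two values agree closely
```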

Example 3.4 Let X1, . . . , Xn be a random sample from a Poisson distribution with parameter
λ. Find the MLE λ̂.
Solution
The probability mass function for a Poisson random variable is

P(X = x) = λ^x e^{−λ}/x!, x = 0, 1, 2, 3, . . . , λ > 0.

The likelihood function is

L(λ) = Π_{i=1}^n λ^{x_i} e^{−λ}/x_i! = λ^{Σ_{i=1}^n x_i} e^{−nλ} / Π_{i=1}^n x_i!.

Then, taking the natural logarithm, we have

ln L(λ) = (Σ_{i=1}^n x_i) ln λ − nλ − Σ_{i=1}^n ln(x_i!),

and differentiating with respect to λ results in

d ln L(λ)/dλ = (Σ_{i=1}^n x_i)/λ − n = 0  ⟹  λ̂ = (1/n) Σ_{i=1}^n x_i = x̄.
Example 3.5 Let X1, . . . , Xn be iid N(µ, σ²) random variables.
(a) If µ is unknown and σ² = σ₀² is known, find the MLE for µ.
(b) If µ = µ₀ is known and σ² is unknown, find the MLE for σ².
(c) If µ and σ² are both unknown, find the MLEs for µ and σ².
Solution
The likelihood function is

L(µ, σ²) = Π_{i=1}^n (1/(√(2π) σ)) exp(−(x_i − µ)²/(2σ²)) = (1/((2π)^{n/2} σ^n)) exp(−Σ_{i=1}^n (x_i − µ)²/(2σ²)).

In order to avoid notational confusion when taking the derivative, let θ = σ². Then, the likelihood
function is

L(µ, θ) = (2πθ)^{−n/2} exp(−Σ_{i=1}^n (x_i − µ)²/(2θ)).

Taking logs,

ln L(µ, θ) = −(n/2) ln(2π) − (n/2) ln θ − Σ_{i=1}^n (x_i − µ)²/(2θ).
(a) When θ₀ = σ₀² is known, the problem reduces to estimating the single parameter µ. Differentiating
the log-likelihood function with respect to µ, we obtain

∂ ln L(µ, θ₀)/∂µ = 2 Σ_{i=1}^n (x_i − µ) / (2θ₀) = 0.

Hence Σ_{i=1}^n (x_i − µ) = 0, and therefore Σ_{i=1}^n x_i = nµ, or µ̂ = x̄.

(b) When µ = µ₀ is known, the problem reduces to estimating the single parameter σ² = θ.
Differentiating with respect to θ, we obtain

∂ ln L(µ₀, θ)/∂θ = −n/(2θ) + Σ_{i=1}^n (x_i − µ₀)²/(2θ²) = 0.

Hence

θ̂ = σ̂² = Σ_{i=1}^n (x_i − µ₀)² / n.
(c) When both µ and θ are unknown, we need to differentiate with respect to both µ and θ
individually:

∂ ln L(µ, θ)/∂µ = 2 Σ_{i=1}^n (x_i − µ) / (2θ) = 0,

∂ ln L(µ, θ)/∂θ = −n/(2θ) + Σ_{i=1}^n (x_i − µ)²/(2θ²) = 0.

Solving the two equations simultaneously, we obtain µ̂ = x̄ and θ̂ = σ̂² = Σ_{i=1}^n (x_i − x̄)² / n.
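
A minimal numerical sketch of part (c), assuming NumPy and simulated N(10, 4) data: the MLEs are the sample mean and the variance with divisor n, which is what np.var returns with ddof=0.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=10.0, scale=2.0, size=1000)   # simulated N(10, 4) data

mu_hat = x.mean()                # MLE of mu: the sample mean
sigma2_hat = np.var(x, ddof=0)   # MLE of sigma^2: sum((x_i - xbar)^2) / n

print(mu_hat, sigma2_hat)        # close to 10 and 4 for a large sample
```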

4 Desirable properties of point estimators
4.1 Unbiased Estimators
Definition 4.1 A point estimator θ̂ is called an unbiased estimator of the parameter θ if E(θ̂) = θ
for all possible values of θ. Otherwise θ̂ is said to be biased. Furthermore, the bias of θ̂ is given by

B = E(θ̂) − θ.

Bias arises when an estimator systematically overestimates or underestimates the parameter it targets,
rather than being a property of any single sample. It is important to observe that in order to check whether θ̂ is unbiased, it is not necessary
to know the value of the true parameter. Instead, one can use the sampling distribution of θ̂.
Theorem 4.2 The mean X̄ of a random sample is an unbiased estimator of the population mean
µ.
Proof
Let X1, . . . , Xn be random variables with mean µ. Then the sample mean is X̄ = (1/n) Σ_{i=1}^n X_i, and

E(X̄) = (1/n) Σ_{i=1}^n E(X_i) = (1/n) nµ = µ.

How is this interpreted in practice? Suppose that a data set is collected with n numerical obser-
vations x1 , . . . , xn . The resulting sample mean may be either less than or greater than the true
population mean, µ (remember, we do not know this value). If the sampling experiment were
repeated many times, then the average of the estimates calculated over these repetitions of the
sampling experiment would be close to, and in the limit equal to, the true population mean.
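
This repeated-sampling interpretation is easy to mimic by simulation. The sketch below (NumPy assumed; normal data with µ = 5 chosen only for illustration) repeats the sampling experiment many times and averages the resulting sample means.

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma, n, reps = 5.0, 2.0, 30, 10_000

# Each row is one repetition of the sampling experiment; take the mean of each row.
sample_means = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)

print(sample_means.mean())   # close to mu = 5.0, illustrating E(X-bar) = mu
```
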
Theorem 4.3 If S 2 is the variance of a random sample from an infinite population with finite
variance σ 2 , then S 2 is an unbiased estimator for σ 2 .
Proof
Let X1, . . . , Xn be iid random variables with mean µ and variance σ² < ∞. We have

E(S²) = E[ (1/(n − 1)) Σ_{i=1}^n (X_i − X̄)² ] = (1/(n − 1)) E[ Σ_{i=1}^n {(X_i − µ) − (X̄ − µ)}² ]
      = (1/(n − 1)) [ Σ_{i=1}^n E(X_i − µ)² − n E(X̄ − µ)² ].

But E(X_i − µ)² = σ² and E(X̄ − µ)² = σ²/n, hence

E(S²) = (1/(n − 1)) [ Σ_{i=1}^n σ² − n (σ²/n) ] = (1/(n − 1)) (nσ² − σ²) = σ².
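
A small simulation (NumPy assumed, simulated normal data with σ² = 4) illustrates the theorem: with divisor n − 1 the sample variance averages to σ², while the divisor-n version is biased downward by the factor (n − 1)/n.

```python
import numpy as np

rng = np.random.default_rng(4)
sigma2, n, reps = 4.0, 10, 20_000

samples = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
s2 = samples.var(axis=1, ddof=1)       # S^2 with divisor n - 1 (unbiased)
s2_n = samples.var(axis=1, ddof=0)     # divisor n (biased)

print(s2.mean(), s2_n.mean())          # about 4.0 versus about 4.0 * (n - 1) / n = 3.6
```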

Definition 4.4 The mean square error of the estimator θ̂, denoted by MSE(θ̂), is defined as

MSE(θ̂) = E(θ̂ − θ)².

Note that

MSE(θ̂) = E(θ̂ − θ)² = E[ (θ̂ − E(θ̂)) + (E(θ̂) − θ) ]²
        = E(θ̂ − E(θ̂))² + (E(θ̂) − θ)² + 2 E[ (θ̂ − E(θ̂))(E(θ̂) − θ) ]
        = Var(θ̂) + (E(θ̂) − θ)²,

since E(θ̂ − E(θ̂)) = 0 and E(θ̂) − θ is a constant, so the cross term vanishes. Thus MSE(θ̂) = Var(θ̂) + B²,
where B is the bias of the estimator. For unbiased estimators,
MSE(θ̂) = Var(θ̂).
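
The decomposition MSE = Var + B² can also be verified numerically. The sketch below (NumPy assumed, simulated data) uses the divisor-n variance estimator, which is biased, so both terms contribute.

```python
import numpy as np

rng = np.random.default_rng(5)
sigma2, n, reps = 4.0, 10, 50_000

samples = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
est = samples.var(axis=1, ddof=0)                         # biased estimator of sigma^2

mse_direct = np.mean((est - sigma2) ** 2)                 # E(theta-hat - theta)^2
var_plus_bias2 = est.var() + (est.mean() - sigma2) ** 2   # Var(theta-hat) + B^2

print(mse_direct, var_plus_bias2)                         # agree up to simulation noise
```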

4.2 Consistency
It is a desirable property that the values of an estimator be closer to the value of the true parameter
being estimated as the sample size becomes larger.

Definition 4.5 The estimator θ̂n is said to be a consistent estimator of θ if, for any ε > 0,

lim_{n→∞} P( |θ̂n − θ| ≤ ε ) = 1,

or equivalently

lim_{n→∞} P( |θ̂n − θ| > ε ) = 0.

That is, θ̂n is a consistent estimator of θ if θ̂n converges in probability to θ: the sample estimator
should have a high probability of being close to the population value θ for large sample size n.

Theorem 4.6 Let θ̂n be an estimator of θ and let Var(θ̂n) be finite. If

lim_{n→∞} E[ (θ̂n − θ)² ] = 0,

then θ̂n is a consistent estimator of θ.

Recall that E[ (θ̂ − θ)² ] = MSE(θ̂) = Var(θ̂) + B². Hence, to apply the theorem it suffices to show that
Var(θ̂n) → 0 and the bias B(θ̂n) → 0 as n → ∞.
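
For the sample mean, consistency can be seen directly in simulation: the probability that X̄ falls farther than ε from µ shrinks as n grows. A sketch (NumPy assumed, standard normal data, ε = 0.1 chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(6)
mu, sigma, eps, reps = 0.0, 1.0, 0.1, 10_000

for n in (10, 100, 1000):
    xbar = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
    print(n, np.mean(np.abs(xbar - mu) > eps))   # estimated P(|X-bar - mu| > eps) shrinks toward 0
```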

4.3 Efficiency
Definition 4.7 If θ̂1 and θ̂2 are two unbiased estimators for θ, the efficiency of θ̂1 relative to θ̂2 is
the ratio

e(θ̂1, θ̂2) = Var(θ̂2) / Var(θ̂1).

If Var(θ̂2) > Var(θ̂1), or equivalently e(θ̂1, θ̂2) > 1, then θ̂1 is relatively more efficient than θ̂2.

Example 4.8 Let X1, . . . , Xn, n > 3, be a random sample from a population with a true mean µ
and variance σ². Consider the following three estimators of µ:

θ̂1 = (1/3)(X1 + X2 + X3),
θ̂2 = (1/8) X1 + (3/(4(n − 2))) (X2 + . . . + Xn−1) + (1/8) Xn,
θ̂3 = X̄.

(a) Show that each of the three estimators is unbiased.


(b) Find e(θ̂2 , θ̂1 ), e(θ̂3 , θ̂1 ), and e(θ̂3 , θ̂2 ). Which of the three estimators is more efficient?
(c) Check for consistency for each of the estimators.
Solution
Given E(X_i) = µ, i = 1, 2, . . . , n. Then

E(θ̂1) = (1/3)(E(X1) + E(X2) + E(X3)) = (1/3)(3µ) = µ,
E(θ̂2) = (1/8) E(X1) + (3/(4(n − 2)))(E(X2) + . . . + E(Xn−1)) + (1/8) E(Xn) = (2/8)µ + (3/(4(n − 2)))(n − 2)µ = µ,
E(θ̂3) = E(X̄) = µ.

Hence all three estimators are unbiased. Next,

Var(θ̂1) = (1/9)(Var(X1) + Var(X2) + Var(X3)) = 3σ²/9 = σ²/3,
Var(θ̂2) = (1/64) Var(X1) + (9/(16(n − 2)²))(Var(X2) + . . . + Var(Xn−1)) + (1/64) Var(Xn)
         = (1/64)σ² + (9/(16(n − 2)²))(n − 2)σ² + (1/64)σ² = ((n + 16)/(32(n − 2))) σ²,
Var(θ̂3) = Var(X̄) = σ²/n.

Therefore

e(θ̂1, θ̂2) = Var(θ̂2)/Var(θ̂1) = 3(n + 16)/(32(n − 2)) < 1 for n ≥ 4.

Thus for n ≥ 4, θ̂2 is more efficient than θ̂1.

e(θ̂3, θ̂1) = Var(θ̂1)/Var(θ̂3) = (σ²/3)/(σ²/n) = n/3 > 1 for n ≥ 4.

Thus for n ≥ 4, θ̂3 is more efficient than θ̂1.

e(θ̂3, θ̂2) = Var(θ̂2)/Var(θ̂3) = n(n + 16)/(32(n − 2)) > 1 for n ≥ 4.

Thus for n ≥ 4, θ̂3 is more efficient than θ̂2.

Finally, all three estimators are unbiased, but only Var(θ̂3) → 0 as n → ∞. Therefore only θ̂3 is consistent.
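
The variance ordering in Example 4.8 can be checked by simulation. The sketch below (NumPy assumed, standard normal data, n = 20 chosen arbitrarily) estimates the mean and variance of each of the three estimators; the empirical variances should be close to σ²/3, (n + 16)σ²/(32(n − 2)), and σ²/n.

```python
import numpy as np

rng = np.random.default_rng(7)
mu, sigma, n, reps = 0.0, 1.0, 20, 50_000

x = rng.normal(mu, sigma, size=(reps, n))

theta1 = x[:, :3].mean(axis=1)                  # (X1 + X2 + X3) / 3
theta2 = (x[:, 0] / 8                           # weight 1/8 on X1
          + 0.75 * x[:, 1:-1].mean(axis=1)      # weight 3/(4(n-2)) on each of X2, ..., X_{n-1}
          + x[:, -1] / 8)                       # weight 1/8 on Xn
theta3 = x.mean(axis=1)                         # the sample mean

for name, est in [("theta1", theta1), ("theta2", theta2), ("theta3", theta3)]:
    print(name, est.mean(), est.var())          # means near mu; variances ordered as derived above
```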

Definition 4.9 An unbiased estimator θ̂₀ is said to be a uniformly minimum variance unbiased
estimator (UMVUE) for the parameter θ if, for any other unbiased estimator θ̂,
Var(θ̂₀) ≤ Var(θ̂),
for all possible values of θ.
Theorem 4.10 (Cramér-Rao inequality): Let X1, . . . , Xn be a random sample from a popu-
lation with pdf f(x, θ). If θ̂ is an unbiased estimator of θ, then, under very general conditions, the
following inequality is true:

Var(θ̂) ≥ 1 / ( n E[ (∂ ln f(x, θ)/∂θ)² ] ).

Theorem 4.11 If θ̂ is an unbiased estimator of θ and if

Var(θ̂) = 1 / ( n E[ (∂ ln f(x, θ)/∂θ)² ] ),

then θ̂ is a uniformly minimum variance unbiased estimator (UMVUE) of θ. An unbiased estimator
that attains this lower bound is called an efficient estimator, so such a θ̂ is efficient.
Note that if the function f(·) is sufficiently smooth, it can be shown that

E[ (∂ ln f(x, θ)/∂θ)² ] = −E[ ∂² ln f(x, θ)/∂θ² ].
Example 4.12 Let X1, . . . , Xn be a random sample from an N(µ, σ²) population with density function f(x). Show that X̄ is an efficient estimator for µ.
Solution
For a normal distribution, f(x) = (1/(√(2π) σ)) exp(−(x − µ)²/(2σ²)). Hence

ln f(x) = −(1/2) ln(2π) − ln σ − (x − µ)²/(2σ²),

so ∂ ln f(x)/∂µ = (x − µ)/σ² and ∂² ln f(x)/∂µ² = −1/σ². Therefore the Cramér-Rao lower bound is

1 / ( n E[ −∂² ln f(x, µ)/∂µ² ] ) = 1 / ( n (1/σ²) ) = σ²/n = Var(X̄).

Therefore, X̄ attains the lower bound and is an efficient estimator of µ. That is, X̄ is an UMVUE of µ.


Example 4.13 Suppose P(x) is the Poisson distribution with parameter λ. Show that the sample
mean X̄ is an efficient estimator for λ.
Solution
P(x) = λ^x e^{−λ}/x!. Taking logarithms we have

ln P(x) = x ln λ − λ − ln(x!),

so ∂ ln P(x)/∂λ = x/λ − 1 and ∂² ln P(x)/∂λ² = −x/λ². Since E(X) = λ, we have
E[ −∂² ln P(x)/∂λ² ] = λ/λ² = 1/λ, and the Cramér-Rao lower bound is

1 / ( n E[ −∂² ln P(x)/∂λ² ] ) = λ/n = Var(X̄).

Therefore X̄ is an efficient estimator of λ.
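
As a quick check of Example 4.13 (a sketch only; NumPy assumed, simulated Poisson data with λ = 3), the empirical variance of X̄ is close to the Cramér-Rao bound λ/n:

```python
import numpy as np

rng = np.random.default_rng(8)
lam, n, reps = 3.0, 25, 50_000

xbar = rng.poisson(lam, size=(reps, n)).mean(axis=1)

print(xbar.var(), lam / n)   # empirical Var(X-bar) versus the Cramer-Rao bound lambda / n
```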

4.4 Sufficiency
In the statistical inference problems on a parameter, one of the major questions is: Can a specific
statistic replace the entire data without losing pertinent information?

Definition 4.14 Let X1 , . . . , Xn be a random sample from a probability distribution with unknown
parameter θ. Then, the statistic U = g(X1 , . . . , Xn ) is said to be sufficient for θ if the conditional
pdf of X1 , . . . , Xn given U = u does not depend on θ for any value of u. An estimator of θ that is
a function of a sufficient statistic for θ is said to be a sufficient estimator of θ .

Example 4.15 Let X1, . . . , Xn be iid Bernoulli random variables with parameter θ. Show that
U = Σ_{i=1}^n X_i is sufficient for θ.

Solution
The joint probability mass function of the sample is

f(X1, . . . , Xn; θ) = θ^{Σ_{i=1}^n X_i} (1 − θ)^{n − Σ_{i=1}^n X_i}, 0 ≤ θ ≤ 1.

Because U = Σ_{i=1}^n X_i, we have

f(X1, . . . , Xn; θ) = θ^U (1 − θ)^{n−U}, 0 ≤ U ≤ n.

Also U ∼ B(n, θ), and so

g(u; θ) = (n choose u) θ^u (1 − θ)^{n−u}.

Therefore

f(X1, . . . , Xn | U = u) = f(X1, . . . , Xn; θ) / g(u; θ) = θ^u (1 − θ)^{n−u} / [ (n choose u) θ^u (1 − θ)^{n−u} ] = 1 / (n choose u),

which does not depend on θ. Therefore U is sufficient for θ.

Theorem 4.16 (Neyman-Fisher Factorization Criterion) Let U be a statistic
based on the random sample X1, . . . , Xn. Then U is a sufficient statistic for θ if and only if the
joint pdf f(x1, . . . , xn; θ) can be factored into two nonnegative functions,

f(x1, . . . , xn; θ) = g(u, θ) h(x1, . . . , xn),

for all x1, . . . , xn, where g(u, θ) is a function only of u and θ and h(x1, . . . , xn) is a function only of
x1, . . . , xn and does not depend on θ.

5 EXERCISES
1. Let X1 , . . . , Xn be a random sample of size n from the exponential distribution whose pdf is
f(x, θ) = θe^{−θx} for x ≥ 0, and f(x, θ) = 0 for x < 0.

(a) Use the method of moments to find a point estimator for θ.


(b) The following data represent the time intervals between the emissions of beta particles:

0.9 0.1 0.1 0.8 0.9 0.1 0.1 0.7 1.0 0.2
0.1 0.1 0.1 2.3 0.8 0.3 0.2 0.1 1.0 0.9
0.1 0.5 0.4 0.6 0.2 0.4 0.2 0.1 0.8 0.2
0.5 3.0 1.0 0.5 0.2 2.0 1.7 0.1 0.3 0.1
0.4 0.5 0.8 0.1 0.1 1.7 0.1 0.2 0.3 0.1

Assuming the data follow an exponential distribution, obtain a moment estimate for the
parameter θ. Interpret.

2. Let X1 , . . . , Xn be a random sample from the beta distribution with parameters α and β.
Find the method of moments estimator for α and β.

3. Let X1 , . . . , Xn be a random sample from a Pareto distribution (named after the economist
Vilfredo Pareto) with shape parameter a. The density function is given by
f(x) = a/x^{a+1} for x ≥ 1, and f(x) = 0 otherwise.

Show that the maximum likelihood estimator of a is

â = n / Σ_{i=1}^n ln(X_i).

4. Let X1, . . . , Xn be a random sample from U(0, θ), θ > 0. Find the MLE θ̂ of θ.

5. Let X1 , . . . , Xn be a random sample from a population with pdf


f(x) = (1/α) x^{(1−α)/α} for 0 < x < 1 (α > 0), and f(x) = 0 otherwise.

(a) Show that the maximum likelihood estimator of α is α̂ = −(1/n) Σ_{i=1}^n ln(X_i).
(b) Is α̂ an unbiased estimator of α?
(c) Is α̂ a consistent estimator of α?

6. Let X1, . . . , Xn, n > 4, be a random sample from a population with a mean µ and variance
σ². Consider the following three estimators of µ:

θ̂1 = (1/9)(X1 + 2X2 + 5X3 + X4),
θ̂2 = (2/5) X1 + (1/5) X2 + (1/(5(n − 3))) (X3 + . . . + Xn−1) + (1/5) Xn,
θ̂3 = X̄.

(a) Show that each of the three estimators is unbiased.


(b) Find e(θ̂2 , θ̂1 ), e(θ̂3 , θ̂1 ), and e(θ̂3 , θ̂2 ). Which of the three estimators is more efficient?
(c) Check for consistency for each of the estimators.

7. Let X1, . . . , Xn be a random sample from a Rayleigh distribution with pdf

f(x) = (2x/β) e^{−x²/β} for x > 0, and f(x) = 0 otherwise.

Find an UMVUE for β.

8. Let X1, . . . , Xn be a random sample from a one-parameter Weibull distribution with pdf

f(x) = 2αx e^{−αx²} for x > 0, and f(x) = 0 otherwise.

(a) Find a sufficient statistic for α.


(b) Find an UMVUE for α.

