Chapter 2: Point Estimation
In statistical analysis, point estimation of population parameters plays a very significant role. In
studying a real-world phenomenon we begin with a random sample of size n taken from the totality
of a population. We assume that the form of the population distribution is known (binomial,
normal, etc.) but the parameters of the distribution (p for a binomial; µ and σ for a normal, etc.)
are unknown. We shall estimate these parameters using the data from our random sample.
Let X1, ..., Xn be independent and identically distributed (iid) random variables (in statistical
language, a random sample) with a pdf $f(x, \theta_1, \ldots, \theta_l)$, where $\theta_1, \ldots, \theta_l$ are the unknown population
parameters (characteristics of interest). The actual values of these parameters are not known. The
problem in point estimation is to determine statistics
$$g_i(X_1, \ldots, X_n), \quad i = 1, \ldots, l,$$
which can be used to estimate the value of each of the parameters based on observed sample data
from the population. These statistics are called estimators for the parameters, and the values
calculated from these statistics using particular sample data values are called estimates of the
parameters.
There are many methods available for estimating the true value(s) of the parameter(s) of interest.
Three of the more popular methods of estimation are the method of moments, the method of
maximum likelihood, and the Bayes method. The Bayes method will be considered in the last chapter.
2 Method of moments
Let $\mu'_k = E(X^k)$ be the kth moment about the origin of a random variable X, whenever it exists.
Let $m'_k = \frac{1}{n}\sum_{i=1}^{n} X_i^k$ be the corresponding kth sample moment. The method of moments is
based on matching the sample moments with the corresponding population (distribution) moments.
Example 2.1 Let X1, ..., Xn be a random sample from a Bernoulli distribution with parameter p, 0 ≤ p ≤ 1. Find the moment estimator of p.
Solution
For the Bernoulli random variable, $\mu'_1 = E(X) = p$, so we can use the first sample moment $m'_1 = \frac{1}{n}\sum_{i=1}^{n} X_i$ to estimate p. Thus
$$\hat{p} = m'_1 = \frac{1}{n}\sum_{i=1}^{n} X_i = \bar{X}.$$
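As a quick numerical illustration (my addition, not part of the notes), the following Python sketch assumes NumPy is available and uses simulated Bernoulli data with a hypothetical true value p = 0.3; the moment estimate is simply the sample mean.

```python
import numpy as np

rng = np.random.default_rng(0)
p_true = 0.3
x = rng.binomial(n=1, p=p_true, size=500)   # simulated Bernoulli(0.3) sample

# Method of moments: match the first sample moment m'_1 = X-bar to E(X) = p
p_hat = x.mean()
print(f"true p = {p_true}, moment estimate = {p_hat:.3f}")
```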
Example 2.2 Let X1 , . . . , Xn be a random sample from a gamma probability distribution with pa-
rameters α and β. Find moment estimators for the unknown parameters α and β.
Solution
For the gamma distribution, $\mu'_1 = E(X) = \alpha\beta$ and $\mu'_2 = E(X^2) = \alpha(\alpha+1)\beta^2$. Equating the first two population moments to the corresponding sample moments gives
$$\bar{X} = \alpha\beta \quad \text{and} \quad \frac{1}{n}\sum_{i=1}^{n} X_i^2 = \alpha(\alpha+1)\beta^2.$$
Solving the first equation for α, we obtain $\hat{\alpha} = \dfrac{\bar{X}}{\hat{\beta}}$. Subtracting $\bar{X}^2 = (\alpha\beta)^2$ from the second equation gives $\alpha\beta^2 = \frac{1}{n}\sum_{i=1}^{n} X_i^2 - \bar{X}^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2$, and therefore
$$\hat{\beta} = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n\bar{X}}.$$
Substituting back into $\hat{\alpha} = \bar{X}/\hat{\beta}$,
$$\hat{\alpha} = \frac{n\bar{X}^2}{\sum_{i=1}^{n}(X_i - \bar{X})^2}.$$
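As a hedged numerical check (my addition; it assumes NumPy and uses simulated gamma data with hypothetical true values α = 2 and β = 1.5), the sketch below evaluates these moment estimators on a sample.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha_true, beta_true = 2.0, 1.5
x = rng.gamma(shape=alpha_true, scale=beta_true, size=2000)

xbar = x.mean()
ss = np.sum((x - xbar) ** 2)        # sum of (Xi - Xbar)^2

beta_hat = ss / (x.size * xbar)     # beta-hat = sum(Xi - Xbar)^2 / (n * Xbar)
alpha_hat = xbar / beta_hat         # alpha-hat = n * Xbar^2 / sum(Xi - Xbar)^2
print(alpha_hat, beta_hat)          # should land near the true values (2.0, 1.5)
```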
3 Method of maximum likelihood
Given observed sample values x1, ..., xn from a distribution with pdf (or pmf) $f(x, \theta)$, $\theta \in \Theta$, the likelihood function is
$$L(\theta; x_1, \ldots, x_n) = \prod_{i=1}^{n} f(x_i, \theta),$$
regarded as a function of θ.
Definition 3.2 The maximum likelihood estimators (MLEs) are those values of the parameters that
maximize the likelihood function with respect to the parameter θ. That is,
$$L(\hat{\theta}; x_1, \ldots, x_n) = \max_{\theta \in \Theta} L(\theta; x_1, \ldots, x_n).$$
In general, the maximum likelihood method results in the problem of maximizing a function of
one or several variables. Hence, in most situations, the methods of calculus can be used. In
many cases, it is easier to work with the natural logarithm (ln) of the likelihood function, called the
log-likelihood function. Because the natural logarithm function is increasing, the maximum value
of the likelihood function, if it exists, will occur at the same point as the maximum value of the
log-likelihood function.
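When no closed form is available, or as a check on a calculus derivation, the log-likelihood can also be maximized numerically. The sketch below is my own illustration (it assumes NumPy and SciPy are available) and uses simulated Poisson data; it should reproduce the closed-form answer derived in Example 3.4 below.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(2)
x = rng.poisson(lam=4.2, size=300)   # simulated Poisson data; lambda = 4.2 is a hypothetical choice

def neg_log_likelihood(lam):
    # -ln L(lambda) for iid Poisson data, dropping the constant term sum(ln(x_i!))
    return -(np.sum(x) * np.log(lam) - x.size * lam)

res = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 50), method="bounded")
print(res.x, x.mean())               # numerical maximizer vs. closed-form MLE (the sample mean)
```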
Example 3.3 Let X1, ..., Xn be a random sample from a geometric distribution with parameter p, 0 ≤ p ≤ 1. Find the MLE p̂.
Solution
The probability mass function of a geometric random variable is
$$P(X = x) = p(1-p)^{x-1}, \quad x = 1, 2, 3, \ldots,$$
so the likelihood function is
$$L(p) = \prod_{i=1}^{n} p(1-p)^{x_i - 1} = p^n (1-p)^{\sum_{i=1}^{n} x_i - n}.$$
Taking the natural logarithm of L(p),
$$\ln L(p) = n \ln p + \left(-n + \sum_{i=1}^{n} x_i\right) \ln(1-p).$$
Setting
$$\frac{d}{dp} \ln L(p) = \frac{n}{p} - \frac{\sum_{i=1}^{n} x_i - n}{1-p} = 0$$
and solving for p gives
$$\hat{p} = \frac{n}{\sum_{i=1}^{n} x_i} = \frac{1}{\bar{x}}.$$
Example 3.4 Let X1, ..., Xn be a random sample from a Poisson distribution with parameter λ. Find the MLE λ̂.
Solution
The probability mass function for a Poisson random variable is
$$P(X = x) = \frac{\lambda^x e^{-\lambda}}{x!}, \quad x = 0, 1, 2, 3, \ldots, \quad \lambda > 0,$$
so the likelihood function is
$$L(\lambda) = \prod_{i=1}^{n} \frac{\lambda^{x_i} e^{-\lambda}}{x_i!} = \frac{\lambda^{\sum_{i=1}^{n} x_i}\, e^{-n\lambda}}{\prod_{i=1}^{n} x_i!}.$$
Taking logarithms,
$$\ln L(\lambda) = \left(\sum_{i=1}^{n} x_i\right) \ln\lambda - n\lambda - \sum_{i=1}^{n} \ln(x_i!).$$
Setting
$$\frac{d}{d\lambda} \ln L(\lambda) = \frac{1}{\lambda}\sum_{i=1}^{n} x_i - n = 0$$
gives λ̂ = x̄, the sample mean.
Next, let X1, ..., Xn be a random sample from a normal N(µ, σ²) population; we find the MLEs of µ and σ². In order to avoid notational confusion when taking the derivative, let θ = σ². Then, the likelihood
function is
$$L(\mu, \theta) = (2\pi\theta)^{-n/2} \exp\left(\frac{-\sum_{i=1}^{n}(x_i - \mu)^2}{2\theta}\right).$$
Taking logs,
$$\ln L(\mu, \theta) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln\theta - \frac{\sum_{i=1}^{n}(x_i - \mu)^2}{2\theta}.$$
(a) When $\theta = \theta_0 = \sigma_0^2$ is known, the problem reduces to estimating the single parameter µ. Differentiating
the log-likelihood function with respect to µ, we obtain
$$\frac{\partial}{\partial\mu} \ln L(\mu, \theta_0) = \frac{2\sum_{i=1}^{n}(x_i - \mu)}{2\theta_0} = 0.$$
Hence $\sum_{i=1}^{n}(x_i - \mu) = 0$, and therefore $\sum_{i=1}^{n} x_i = n\mu$, or µ̂ = x̄.
(b) When $\mu = \mu_0$ is known, the problem reduces to estimating the single parameter $\sigma^2 = \theta$. Differentiating the log-likelihood function with respect to θ, we obtain
$$\frac{\partial}{\partial\theta} \ln L(\mu_0, \theta) = \frac{-n}{2\theta} + \frac{\sum_{i=1}^{n}(x_i - \mu_0)^2}{2\theta^2} = 0.$$
Hence
$$\hat{\theta} = \hat{\sigma}^2 = \frac{\sum_{i=1}^{n}(x_i - \mu_0)^2}{n}.$$
(c) When both µ and θ are unknown, we need to differentiate with respect to µ and θ
individually:
$$\frac{\partial}{\partial\mu} \ln L(\mu, \theta) = \frac{2\sum_{i=1}^{n}(x_i - \mu)}{2\theta} = 0,$$
$$\frac{\partial}{\partial\theta} \ln L(\mu, \theta) = \frac{-n}{2\theta} + \frac{\sum_{i=1}^{n}(x_i - \mu)^2}{2\theta^2} = 0.$$
Solving these two equations simultaneously, the first gives µ̂ = x̄ as in case (a), and substituting into the second gives
$$\hat{\theta} = \hat{\sigma}^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n}.$$
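A quick numerical check (my addition; it assumes NumPy and simulates a normal sample with hypothetical parameters µ = 10 and σ = 2) confirms that the MLEs are the sample mean and the variance computed with divisor n.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(loc=10.0, scale=2.0, size=1000)   # simulated N(mu = 10, sigma^2 = 4) sample

mu_hat = x.mean()                                # MLE of mu: the sample mean
sigma2_hat = np.mean((x - mu_hat) ** 2)          # MLE of sigma^2: divides by n, not n - 1
print(mu_hat, sigma2_hat)                        # should be close to 10 and 4
```

Note that the MLE of σ² divides by n rather than n − 1, so it is slightly biased; this connects to the discussion of unbiasedness in the next section.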
4 Desirable properties of point estimators
4.1 Unbiased Estimators
Definition 4.1 A point estimator θ̂ is called an unbiased estimator of the parameter θ if E(θ̂) = θ
for all possible values of θ. Otherwise θ̂ is said to be biased. Furthermore, the bias of θ̂ is given by
B = E(θ̂) − θ.
Bias arises when an estimator systematically overestimates or underestimates the parameter being
estimated. It is important to observe that in order to check whether θ̂ is unbiased, it is not necessary
to know the value of the true parameter. Instead, one can use the sampling distribution of θ̂.
Theorem 4.2 The mean X̄ of a random sample is an unbiased estimator of the population mean µ.
Proof
Let X1, ..., Xn be random variables with mean µ. Then the sample mean is $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$, and
$$E(\bar{X}) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{1}{n}\, n\mu = \mu.$$
How is this interpreted in practice? Suppose that a data set is collected with n numerical obser-
vations x1 , . . . , xn . The resulting sample mean may be either less than or greater than the true
population mean, µ (remember, we do not know this value). If the sampling experiment were
repeated many times, then the average of the estimates calculated over these repetitions would, in
the long run, equal the true population mean.
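This repeated-sampling interpretation is easy to see in a small simulation (my illustration; it assumes NumPy and uses an arbitrary normal population with µ = 5 and σ = 3).

```python
import numpy as np

rng = np.random.default_rng(4)
mu_true, n, reps = 5.0, 25, 10_000

# Repeat the sampling experiment many times and average the resulting sample means
sample_means = rng.normal(loc=mu_true, scale=3.0, size=(reps, n)).mean(axis=1)
print(sample_means.mean())   # close to mu_true = 5.0, illustrating E(X-bar) = mu
```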
Theorem 4.3 If S 2 is the variance of a random sample from an infinite population with finite
variance σ 2 , then S 2 is an unbiased estimator for σ 2 .
Proof
Let X1, ..., Xn be iid random variables with mean µ and variance σ² < ∞. We have
$$E(S^2) = E\left(\frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2\right) = \frac{1}{n-1}\, E\left(\sum_{i=1}^{n}\{(X_i - \mu) - (\bar{X} - \mu)\}^2\right)$$
$$= \frac{1}{n-1}\left(\sum_{i=1}^{n} E(X_i - \mu)^2 - nE(\bar{X} - \mu)^2\right) = \frac{1}{n-1}\left(n\sigma^2 - n\cdot\frac{\sigma^2}{n}\right) = \sigma^2.$$
Definition 4.4 The mean square error of the estimator θ̂, denoted by MSE(θ̂), is defined as
$$MSE(\hat{\theta}) = E(\hat{\theta} - \theta)^2.$$
Note that
$$MSE(\hat{\theta}) = E(\hat{\theta} - \theta)^2 = E\left[(\hat{\theta} - E(\hat{\theta})) + (E(\hat{\theta}) - \theta)\right]^2$$
$$= E(\hat{\theta} - E(\hat{\theta}))^2 + (E(\hat{\theta}) - \theta)^2 + 2\,E\left[(\hat{\theta} - E(\hat{\theta}))(E(\hat{\theta}) - \theta)\right]$$
$$= Var(\hat{\theta}) + (E(\hat{\theta}) - \theta)^2,$$
since $E(\hat{\theta} - E(\hat{\theta})) = 0$, so that MSE(θ̂) = Var(θ̂) + B², where B is the bias of the estimator. For unbiased estimators,
MSE(θ̂) = Var(θ̂).
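The decomposition can be checked by simulation. The sketch below (my addition; it assumes NumPy, uses a normal population with a hypothetical σ² = 4, and takes as θ̂ the biased variance estimator that divides by n) compares the directly estimated MSE with Var(θ̂) + B².

```python
import numpy as np

rng = np.random.default_rng(5)
sigma2_true, n, reps = 4.0, 10, 200_000

samples = rng.normal(loc=0.0, scale=np.sqrt(sigma2_true), size=(reps, n))
est = samples.var(axis=1, ddof=0)        # biased variance estimator (divides by n)

mse = np.mean((est - sigma2_true) ** 2)  # direct Monte Carlo estimate of the MSE
bias = est.mean() - sigma2_true
var = est.var()
print(mse, var + bias ** 2)              # the two numbers should agree closely
```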
4.2 Consistency
It is a desirable property that the values of an estimator be closer to the value of the true parameter
being estimated as the sample size becomes larger.
Definition 4.5 The estimator θ̂n is said to be a consistent estimator of θ if, for any ε > 0,
$$\lim_{n\to\infty} P\left[|\hat{\theta}_n - \theta| \le \varepsilon\right] = 1,$$
or equivalently,
$$\lim_{n\to\infty} P\left[|\hat{\theta}_n - \theta| > \varepsilon\right] = 0.$$
That is, θ̂n is a consistent estimator of θ if θ̂n converges in probability to θ: the sample estimator
should have a high probability of being close to the population value θ for large sample size n.
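Consistency of the sample mean is easy to visualise numerically. The sketch below (my addition; it assumes NumPy and uses an arbitrary normal population with µ = 2, σ = 1, and ε = 0.1) estimates P(|X̄ − µ| > ε) for increasing sample sizes.

```python
import numpy as np

rng = np.random.default_rng(6)
mu_true, sigma, eps, reps = 2.0, 1.0, 0.1, 5_000

# Estimate P(|X-bar - mu| > eps) for increasing n; it should shrink toward 0
for n in (10, 100, 1000):
    xbar = rng.normal(loc=mu_true, scale=sigma, size=(reps, n)).mean(axis=1)
    print(n, np.mean(np.abs(xbar - mu_true) > eps))
```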
4.3 Efficiency
Definition 4.7 If θ̂1 and θ̂2 are two unbiased estimators of θ, the efficiency of θ̂1 relative to θ̂2 is
the ratio
$$e(\hat{\theta}_1, \hat{\theta}_2) = \frac{Var(\hat{\theta}_2)}{Var(\hat{\theta}_1)}.$$
If Var(θ̂2) > Var(θ̂1), or equivalently e(θ̂1, θ̂2) > 1, then θ̂1 is relatively more efficient than θ̂2.
Example 4.8 Let X1, ..., Xn, n > 3, be a random sample from a population with a true mean µ
and variance σ². Consider the following three estimators of µ:
$$\hat{\theta}_1 = \frac{1}{3}(X_1 + X_2 + X_3), \qquad \hat{\theta}_2 = \frac{1}{8}X_1 + \frac{3}{4(n-2)}(X_2 + \cdots + X_{n-1}) + \frac{1}{8}X_n, \qquad \hat{\theta}_3 = \bar{X}.$$
All three estimators are unbiased for µ, since in each case the coefficients sum to one, so they can be compared through their variances (see the simulation sketch below).
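The following simulation (my addition; it assumes NumPy, an arbitrary N(0, 1) population, and n = 20) estimates the mean and variance of each estimator: all three means come out near µ, and X̄ has the smallest variance.

```python
import numpy as np

rng = np.random.default_rng(7)
mu_true, n, reps = 0.0, 20, 100_000

x = rng.normal(loc=mu_true, scale=1.0, size=(reps, n))

theta1 = x[:, :3].mean(axis=1)                                   # (X1 + X2 + X3) / 3
theta2 = x[:, 0] / 8 + 3 * x[:, 1:-1].mean(axis=1) / 4 + x[:, -1] / 8
theta3 = x.mean(axis=1)                                          # X-bar

for name, est in [("theta1", theta1), ("theta2", theta2), ("theta3", theta3)]:
    print(name, round(est.mean(), 4), round(est.var(), 4))       # means ~ mu; X-bar has smallest variance
```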
Definition 4.9 An unbiased estimator θ̂0 is said to be a uniformly minimum variance unbiased
estimator (UMVUE) for the parameter θ if, for any other unbiased estimator θ̂,
$$Var(\hat{\theta}_0) \le Var(\hat{\theta})$$
for all possible values of θ.
Theorem 4.10 (Cramér–Rao inequality): Let X1, ..., Xn be a random sample from a popu-
lation with pdf f(x, θ). If θ̂ is an unbiased estimator of θ, then, under very general conditions, the
following inequality is true:
$$Var(\hat{\theta}) \ge \frac{1}{nE\left[\left(\frac{\partial \ln f(X, \theta)}{\partial\theta}\right)^2\right]}.$$
If the variance of an unbiased estimator θ̂ attains this lower bound, then θ̂ is a uniformly minimum variance unbiased estimator (UMVUE)
of θ, and such a θ̂ is called an efficient estimator.
Note that if the function f(·) is sufficiently smooth, it can be shown that
$$E\left[\left(\frac{\partial \ln f(X, \theta)}{\partial\theta}\right)^2\right] = -E\left[\frac{\partial^2 \ln f(X, \theta)}{\partial\theta^2}\right].$$
Example 4.12 Let X1, ..., Xn be a random sample from an N(µ, σ²) population with density function f(x). Show that X̄ is an efficient estimator of µ.
Solution
For a normal distribution, we have
$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(\frac{-(x - \mu)^2}{2\sigma^2}\right),$$
so
$$\ln f(x) = -\frac{1}{2}\ln(2\pi\sigma^2) - \frac{(x - \mu)^2}{2\sigma^2}.$$
Hence
$$\frac{\partial \ln f(x)}{\partial\mu} = \frac{x - \mu}{\sigma^2} \quad \text{and} \quad \frac{\partial^2 \ln f(x)}{\partial\mu^2} = \frac{-1}{\sigma^2}.$$
Therefore
$$Var(\bar{X}) = \frac{\sigma^2}{n} = \frac{1}{nE\left[-\frac{\partial^2 \ln f(X, \mu)}{\partial\mu^2}\right]},$$
so X̄ attains the Cramér–Rao lower bound and is an efficient estimator of µ.
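As a numerical complement (my addition; it assumes NumPy and uses arbitrary values µ = 0, σ = 2, n = 30), the sketch below compares the simulated variance of X̄ with the Cramér–Rao bound σ²/n.

```python
import numpy as np

rng = np.random.default_rng(8)
mu_true, sigma, n, reps = 0.0, 2.0, 30, 200_000

xbar = rng.normal(loc=mu_true, scale=sigma, size=(reps, n)).mean(axis=1)
print(xbar.var())        # simulated Var(X-bar)
print(sigma ** 2 / n)    # Cramer-Rao lower bound sigma^2 / n, which X-bar attains
```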
4.4 Sufficiency
In statistical inference problems concerning a parameter, one of the major questions is: can a specific
statistic replace the entire data set without losing pertinent information?
Definition 4.14 Let X1 , . . . , Xn be a random sample from a probability distribution with unknown
parameter θ. Then, the statistic U = g(X1 , . . . , Xn ) is said to be sufficient for θ if the conditional
pdf of X1 , . . . , Xn given U = u does not depend on θ for any value of u. An estimator of θ that is
a function of a sufficient statistic for θ is said to be a sufficient estimator of θ .
Example 4.15 Let X1, ..., Xn be iid Bernoulli random variables with parameter θ. Show that
$U = \sum_{i=1}^{n} X_i$ is sufficient for θ.
Solution
The joint probability mass function of the sample is
$$f(x_1, \ldots, x_n; \theta) = \theta^{\sum_{i=1}^{n} x_i}(1 - \theta)^{\,n - \sum_{i=1}^{n} x_i} = \theta^{U}(1 - \theta)^{n - U}, \quad 0 \le U \le n.$$
Since U has a binomial(n, θ) distribution, $P(U = u) = \binom{n}{u}\theta^{u}(1 - \theta)^{n-u}$, and hence for any sample with $\sum_{i=1}^{n} x_i = u$,
$$P(X_1 = x_1, \ldots, X_n = x_n \mid U = u) = \frac{\theta^{u}(1 - \theta)^{n-u}}{\binom{n}{u}\theta^{u}(1 - \theta)^{n-u}} = \frac{1}{\binom{n}{u}},$$
which does not depend on θ. Therefore U is sufficient for θ.
More generally, by the factorization criterion, U is sufficient for θ if and only if the joint pdf (or pmf) can be written as
$$f(x_1, \ldots, x_n; \theta) = g(u, \theta)\, h(x_1, \ldots, x_n)$$
for all x1, ..., xn, where g(u, θ) is a function only of u and θ and h(x1, ..., xn) is a function of only
x1, ..., xn and not of θ.
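The defining property can also be checked by simulation. The sketch below is my own illustration (it assumes NumPy and uses n = 4, u = 2, and two arbitrary values of θ): conditional on U = 2, every arrangement of the sample appears with roughly the same frequency whichever θ generated the data.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(9)
n, u, reps = 4, 2, 200_000

# For two different theta values, tabulate the conditional distribution of the
# sample pattern (x1, ..., x4) given that U = x1 + ... + x4 equals u = 2.
for theta in (0.2, 0.7):
    x = rng.binomial(n=1, p=theta, size=(reps, n))
    kept = x[x.sum(axis=1) == u]
    counts = Counter(map(tuple, kept))
    freqs = {pattern: round(c / len(kept), 3) for pattern, c in sorted(counts.items())}
    print(theta, freqs)   # roughly 1/6 for each of the 6 patterns, regardless of theta
```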
5 Exercises
1. Let X1, ..., Xn be a random sample of size n from the exponential distribution whose pdf is
$$f(x, \theta) = \begin{cases} \theta e^{-\theta x}, & x \ge 0, \\ 0, & x < 0. \end{cases}$$
The following data were observed:
0.9 0.1 0.1 0.8 0.9 0.1 0.1 0.7 1.0 0.2
0.1 0.1 0.1 2.3 0.8 0.3 0.2 0.1 1.0 0.9
0.1 0.5 0.4 0.6 0.2 0.4 0.2 0.1 0.8 0.2
0.5 3.0 1.0 0.5 0.2 2.0 1.7 0.1 0.3 0.1
0.4 0.5 0.8 0.1 0.1 1.7 0.1 0.2 0.3 0.1
Use these data to obtain an estimate of the parameter θ. Interpret.
2. Let X1 , . . . , Xn be a random sample from the beta distribution with parameters α and β.
Find the method of moments estimators of α and β.
3. Let X1, ..., Xn be a random sample from a Pareto distribution (named after the economist
Vilfredo Pareto) with shape parameter a. The density function is given by
$$f(x) = \begin{cases} \dfrac{a}{x^{a+1}}, & x \ge 1, \\ 0, & \text{otherwise.} \end{cases}$$
4. Let X1, ..., Xn be a random sample from U(0, θ), θ > 0. Find the MLE θ̂ of θ.
6. Let X1, ..., Xn, n > 4, be a random sample from a population with a mean µ and variance
σ². Consider the following three estimators of µ:
$$\hat{\theta}_1 = \frac{1}{9}(X_1 + 2X_2 + 5X_3 + X_4), \qquad \hat{\theta}_2 = \frac{2}{5}X_1 + \frac{1}{5}X_2 + \frac{1}{5(n-3)}(X_3 + \cdots + X_{n-1}) + \frac{1}{5}X_n, \qquad \hat{\theta}_3 = \bar{X}.$$