Sta 341 Class Notes Final


Course Code: STA 341

Course Title: THEORY OF ESTIMATION.

WEEK 1.
TOPIC 1: INTRODUCTION

Concepts of Estimation

• The objective of estimation is to determine the value of a population parameter on the basis of a sample statistic.
• There are two types of estimators:
– Point estimator
– Interval estimator
INTRODUCTION
Statistical inference refers to the principles employed to draw conclusions about a population using a sample drawn from it. The principles of sampling theory are very handy here.
Statistical Inference entails:
1. Statistical Estimation Theory
2. Statistical Hypothesis Testing, which assesses the accuracy of information. Each is treated separately as a discipline.
1.1 Definition of Terms

1. Random variable, population and sample


 Random variable: a rule/function that assigns a numerical value to each outcome of a random experiment.
 Population: the totality of all objects that a researcher wishes to study; a set of measurements which are subject to investigation/analysis/interpretation.
 Sample: a subset of the population that is chosen from the entire population by means of a specific sampling technique.

A population of size N is represented by the random variable X; a sample (a subset of the population) of size n is represented by the observations x1, x2, ..., xn, where n < N.

N/B
Assume that some characteristic of the elements in the population can be represented by a random variable X whose p.d.f. is f(x, θ), where the form of the p.d.f. is assumed known except that it contains an unknown parameter θ. Consider a random sample of size n, i.e. x1, x2, ..., xn, from the population X. On the basis of the observed values x1, x2, ..., xn, it is desired to estimate the unknown parameter θ or some function of θ, say r(θ).

2. Point Estimation
In point estimation we estimate an unknown parameter using a single number that is calculated
from the sample data.
This is a procedure/ process of obtaining a single number/ estimate/ point that would estimate
a population parameter.
E.g. X̄ = 10 to estimate μ.

In an experiment on memory for chess positions, the mean recall for tournament players was
63.8 and the mean for non-players was 33.1. Therefore a point estimate of the difference
between population means is 30.7. The 95% confidence interval on the difference between
means extends from 19.05 to 42.35.

3. Interval Estimation
Point estimates are usually supplemented by interval estimates called confidence intervals.
Confidence intervals are intervals constructed using a method that contains the population
parameter a specified proportion of the time. For example, if the pollster used a method that
contains the parameter 95% of the time it is used, he or she would arrive at the following 95%
confidence interval: 0.46 < π < 0.60. The pollster would then conclude that somewhere between
0.46 and 0.60 of the population supports the proposal. The media usually reports this type of
result by saying that 53% favour the proposition with a margin of error of 7%.

In interval estimation, we estimate an unknown parameter using an interval of values that is likely to contain the true value of that parameter (and state how confident we are that this interval indeed captures the true value of the parameter).
It is a procedure that makes use of sample information to arrive at two numbers that are intended to enclose the parameter of interest, i.e. A < θ < B.

 The unknown population parameter θ is likely to be within the interval with a prescribed probability.

In addition to the above two approaches, θ may be treated as a random variable with a given prior distribution. This prior knowledge is combined with the sample results to obtain the conditional (posterior) distribution of the parameter θ given the sample data, i.e. k(θ | x1, x2, ..., xn). The estimator of θ derived from such a posterior distribution leads to a Bayes estimate.
N/B
1. Take note of notation:

The estimator of an unknown parameter θ is denoted by θ̂, e.g.

μ̂ is an estimator of μ

σ̂² is an estimator of σ²

p̂ is an estimator of p

4. Parameter

 A parameter is a numerical value associated with a population. It is considered fixed and unchanging. It is a statistical constant that characterizes the distribution in a population, e.g. population mean μ, population standard deviation σ, and population proportion p.

 A parameter is a statistical constant that characterises the distribution in a population (i.e. a population characteristic) and is commonly unknown, e.g. let X ~ N(μ, σ²).

 μ and σ² are population parameters to be estimated.

A parameter θ can have many different estimators. It is essential that you choose the best estimator. E.g. for the variance:

$\hat{\sigma}^2 = \frac{1}{n}\sum (x_i - \bar{X})^2$

$\hat{\sigma}^2 = \frac{1}{n-1}\sum (x_i - \bar{X})^2$

Estimation of a population parameter is done by taking a small finite portion (a sample of size n) of the population. This sample is then used to construct a function which can be used in place of the parameter. The function constructed from the sample is called an estimator.
5. Statistic

 A statistic is a numerical quantity that is calculated from the data contained in a random sample (a formula). It is a function of the observed random variables which does not involve any unknown parameter, e.g. the sample mean.
 A statistic is a numerical value computed from a sample. Its value may differ for different samples, e.g. sample mean, sample standard deviation and sample proportion.

$\bar{X} = \frac{\sum x_i}{n}$
6. Estimator

 An estimator is a rule/formula derived from a sample for the purpose of estimating an unknown parameter.

7. Estimate

 An estimate is obtained when observed values of a random variable are inserted into the estimator; the result is an estimate, e.g. θ̂ is an estimator of θ, and its observed value is an estimate of θ.

8. Estimation Theory
Estimation theory is a procedure that utilizes sample information to get an estimate of the population parameter. The approaches are:
1. Point estimation
2. Interval estimation
3. Bayes estimation

1.2 OVERVIEW OF SAMPLING THEORY

The probability distribution of a statistic computed from the observations
x1, x2, x3, ..., xn
is called its sampling distribution. This distribution describes the random behaviour of the statistic. It is important to determine the sampling distribution of a statistic, since it describes its sampling behaviour. The sampling distribution is used to assess the accuracy of the statistic when it is used for the purpose of estimation. Sampling theory is the area of Mathematical Statistics that is concerned with determining the sampling distributions of various statistics.
Many statistics have a normal distribution. This is quite often true if the population is normal. It is also approximately true if the sample size is reasonably large (Central Limit Theorem).
1.3 Methods of sampling – nonprobability
• Friends, family, neighbours, acquaintances.
• Students in a class or co-workers in a workplace.
• Convenience.
• Volunteers.
• Snowball sample.
• Judgment sample.
• Quota sample – obtain a cross-section of a population, e.g. by age and sex for
individuals or by region, firm size, and industry for businesses. This may be reasonably
representative.
• Sampling distribution of statistics cannot be obtained using any of the above methods,
so statistical inference is not possible.

1.4 Methods of sampling – probabilistic


• Random sampling methods – each member has an equal probability of being selected.
• Systematic – every kth case. Equivalent to random if patterns in list are unrelated to
issues of interest. Eg. telephone book.
• Stratified samples – sample from each stratum or subgroup of a population. Eg. region,
size of firm.
• Cluster samples – sample only certain clusters of members of a population. Eg. city
blocks, firms.
• Multistage samples – combinations of random, systematic, stratified, and cluster
sampling.
• If probability involved at each stage, then distribution of sample statistics can be
obtained.
1.5 Some terms used in sampling
• Sampled population – population from which sample drawn. Researcher should
clearly define.
• Frame – list of elements that sample selected from. Eg. telephone book, city business
directory. May be able to construct a frame.
• Parameter – characteristics of a population. Eg. total (annual GDP or exports),
proportion p of population that votes Liberal in federal election. Also, µ or σ of a
probability distribution are termed parameters.
• Statistic – numerical characteristics of a sample. Eg. monthly unemployment rate, pre-
election polls.
• Sampling distribution of a statistic is the probability distribution of the statistic.

1.6 Sampling Distributions

 The sampling distribution of an estimator is the probability distribution of all possible values the statistic may assume when a random sample of size n is taken. An estimator is a random variable, since samples vary.
 The sampling distribution of a sample statistic gives the probability distribution of the values taken by the statistic in repeated random samples of a given size.
 A sampling distribution is the probability distribution for all possible values of the sample statistic.
1.7 Sampling variation
Consider eight random samples of size n = 5 from a large population of GMAT scores for MBA applicants.

The sample means (X̄) tend to be close to the population mean (μ = 520.78).
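As a quick numerical illustration of sampling variation, here is a minimal Python sketch. It assumes, purely for illustration, a normal population with mean 520.78 (from the notes) and a standard deviation of 90, which is an assumed value, not one given in the notes:

```python
import numpy as np

rng = np.random.default_rng(0)

# Population parameters: the mean 520.78 comes from the notes; the
# standard deviation (90) is an assumption made only for illustration.
mu, sigma, n = 520.78, 90.0, 5

samples = rng.normal(mu, sigma, size=(8, n))  # eight samples of size 5
print(samples.mean(axis=1))  # eight sample means scattered around 520.78
```

Running this shows eight different sample means, all scattered around the population mean, which is exactly the sampling variation described above.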

Theorem 1

Let X₁, ..., Xₙ be observations of a random sample of size n from the normal distribution N(μ, σ²),

 $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ the sample mean of the n observations, and

 $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$ the sample variance of the n observations.

Then

i. $\bar{X}$ and $S^2$ are independent, and

ii. $\frac{(n-1)S^2}{\sigma^2} = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{\sigma^2} \sim \chi^2_{(n-1)}$
Proof

To prove (ii), let

$$W = \sum_{i=1}^{n}\left(\frac{X_i - \mu}{\sigma}\right)^2$$

Now, we can take W and do the trick of adding 0 to each term in the summation. Doing so, of course, doesn't change the value of W:

$$W = \sum_{i=1}^{n}\left(\frac{(X_i - \bar{X}) + (\bar{X} - \mu)}{\sigma}\right)^2$$

As you can see, we added 0 by adding and subtracting the sample mean in the numerator. Now, let's square the term. Doing just that, and distributing the summation, we get:

$$W = \sum_{i=1}^{n}\frac{(X_i - \bar{X})^2}{\sigma^2} + \sum_{i=1}^{n}\frac{(\bar{X} - \mu)^2}{\sigma^2} + \frac{2(\bar{X} - \mu)}{\sigma^2}\sum_{i=1}^{n}(X_i - \bar{X})$$

But the last term is 0, since $\sum_{i=1}^{n}(X_i - \bar{X}) = n\bar{X} - n\bar{X} = 0$, so W reduces to:

$$W = \sum_{i=1}^{n}\frac{(X_i - \bar{X})^2}{\sigma^2} + \frac{n(\bar{X} - \mu)^2}{\sigma^2}$$

We can do a bit more with the first term of W. As an aside, if we take the definition of the sample variance:

$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$$

and multiply both sides by (n−1), we get:

$$(n-1)S^2 = \sum_{i=1}^{n}(X_i - \bar{X})^2$$

So, the numerator in the first term of W can be written as a function of the sample variance. That is:

$$W = \sum_{i=1}^{n}\left(\frac{X_i - \mu}{\sigma}\right)^2 = \frac{(n-1)S^2}{\sigma^2} + \frac{n(\bar{X} - \mu)^2}{\sigma^2}$$
Okay, let's take a break here to see what we have. We've taken the quantity on the left side of the above equation, added 0 to it, and shown that it equals the quantity on the right side. Now, what can we say about each of the terms? Well, the term on the left side of the equation:

$$\sum_{i=1}^{n}\left(\frac{X_i - \mu}{\sigma}\right)^2$$

is a sum of n independent chi-square(1) random variables. That's because we have assumed that X₁, ..., Xₙ are observations of a random sample of size n from the normal distribution N(μ, σ²). Therefore:

$$\frac{X_i - \mu}{\sigma}$$

follows a standard normal distribution. Now, recall that if we square a standard normal random variable, we get a chi-square random variable with 1 degree of freedom. So, again:

$$\sum_{i=1}^{n}\left(\frac{X_i - \mu}{\sigma}\right)^2$$

is a sum of n independent chi-square(1) random variables. Therefore, the moment-generating function of W is the same as the moment-generating function of a chi-square(n) random variable, namely:

$$M_W(t) = (1-2t)^{-\frac{n}{2}}$$

for $t < \frac{1}{2}$. Now, the second term of W, on the right side of the equals sign, that is:

$$\frac{n(\bar{X} - \mu)^2}{\sigma^2}$$

is a chi-square(1) random variable. That's because the sample mean is normally distributed with mean μ and variance $\frac{\sigma^2}{n}$. Therefore:

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$$

is a standard normal random variable. So, if we square Z, we get a chi-square random variable with 1 degree of freedom:

$$Z^2 = \frac{n(\bar{X} - \mu)^2}{\sigma^2} \sim \chi^2_{(1)}$$

And therefore the moment-generating function of Z² is:

$$M_{Z^2}(t) = (1-2t)^{-\frac{1}{2}}$$

for $t < \frac{1}{2}$. Let's summarize again what we know so far. W is a chi-square(n) random variable, and the second term on the right is a chi-square(1) random variable:

$$W = \sum_{i=1}^{n}\left(\frac{X_i - \mu}{\sigma}\right)^2 = \frac{(n-1)S^2}{\sigma^2} + \frac{n(\bar{X} - \mu)^2}{\sigma^2}$$

with $W \sim \chi^2_{(n)}$, $Z^2 \sim \chi^2_{(1)}$, $M_W(t) = (1-2t)^{-\frac{n}{2}}$ and $M_{Z^2}(t) = (1-2t)^{-\frac{1}{2}}$.

Now, let's use the uniqueness property of moment-generating functions. By definition, the moment-generating function of W is:

$$M_W(t) = E[e^{tW}] = E\left[e^{t\left(\frac{(n-1)S^2}{\sigma^2} + Z^2\right)}\right]$$

Using what we know about exponents, we can rewrite the term in the expectation as a product of two exponential terms:

$$E[e^{tW}] = E\left[e^{t\frac{(n-1)S^2}{\sigma^2}} \cdot e^{tZ^2}\right] = M_{\frac{(n-1)S^2}{\sigma^2}}(t)\cdot M_{Z^2}(t)$$

The last equality in the above equation comes from the independence between X̄ and S². That is, if they are independent, then functions of them are independent. Now, let's substitute in what we know about the moment-generating function of W and of Z². Doing so, we get:

$$(1-2t)^{-\frac{n}{2}} = M_{\frac{(n-1)S^2}{\sigma^2}}(t)\cdot(1-2t)^{-\frac{1}{2}}$$

Now, let's solve for the moment-generating function of $\frac{(n-1)S^2}{\sigma^2}$, whose distribution we are trying to determine. Doing so, we get:

$$M_{\frac{(n-1)S^2}{\sigma^2}}(t) = (1-2t)^{-\frac{(n-1)}{2}}$$

for $t < \frac{1}{2}$. But that's the moment-generating function of a chi-square random variable with n−1 degrees of freedom. Therefore, the uniqueness property of moment-generating functions tells us that $\frac{(n-1)S^2}{\sigma^2}$ must be a chi-square random variable with n−1 degrees of freedom. That is:

$$\frac{(n-1)S^2}{\sigma^2} = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{\sigma^2} \sim \chi^2_{(n-1)}$$

as was to be proved. And:

$$\frac{\sum_{i=1}^{n}(X_i - \mu)^2}{\sigma^2} \sim \chi^2_{(n)}$$

The only difference between these two summations is that in the first case we are summing the squared differences from the sample mean X̄, while in the second case we are summing the squared differences from the population mean μ. What happens is that when we estimate the unknown population mean μ with X̄ we "lose" one degree of freedom. This is generally true: a degree of freedom is lost for each parameter estimated in certain chi-square random variables.

Example 1

Let Xᵢ denote the Stanford-Binet Intelligence Quotient (IQ) of a randomly selected individual, i = 1, ..., 8. Recalling that IQs are normally distributed with mean μ = 100 and variance σ² = 16², what is the distribution of $\frac{(n-1)S^2}{\sigma^2}$?

Solution

Because the sample size is n = 8, the above theorem tells us that

$$\frac{(8-1)S^2}{\sigma^2} = \frac{7S^2}{16^2} = \frac{\sum_{i=1}^{8}(X_i - \bar{X})^2}{\sigma^2}$$

follows a chi-square distribution with 7 degrees of freedom.

Exercise
Suppose the weights of bags of flour are normally distributed with a population standard deviation of σ = 1.2 ounces. Find the probability that a sample of 200 bags would have a standard deviation between 1.1 and 1.3 ounces.
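By Theorem 1, $(n-1)S^2/\sigma^2 \sim \chi^2_{(n-1)}$, so the required probability can be computed as $P\left(\frac{199 \times 1.1^2}{1.2^2} < \chi^2_{(199)} < \frac{199 \times 1.3^2}{1.2^2}\right)$. A minimal Python sketch of this computation (scipy is an assumed dependency, not part of the notes):

```python
from scipy.stats import chi2

n, sigma = 200, 1.2
df = n - 1

# (n-1)S^2/sigma^2 ~ chi-square(n-1), so
# P(1.1 < S < 1.3) = P(df*1.1^2/sigma^2 < chi2 < df*1.3^2/sigma^2)
lo = df * 1.1**2 / sigma**2
hi = df * 1.3**2 / sigma**2
print(chi2.cdf(hi, df) - chi2.cdf(lo, df))
```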
WEEK 2
TOPIC 2: POINT ESTIMATION.

Introduction

In this topic, we are going to study consistency, sufficiency, efficiency and completeness, which are properties of point estimators in the theory of estimation.

Objectives

By the end of this topic, you should be able to:


1. Define the terms consistency, sufficiency, efficiency and completeness as used in
estimation.
2. Check whether a given estimator is consistent, sufficient, efficient and complete.

Learning Activities

Activity:
 Students to take note of the exercises/ activities provided within the text and at the end of
the topic

Topic Resources
 Students to take note of the reference text books provided in the course outline
2. PROPERTIES OF POINT ESTIMATORS

A good estimator should satisfy the following conditions:

 Unbiasedness: the expected value of the estimator must be equal to the parameter.
 Consistency: the value of the estimator approaches the value of the parameter as the sample size increases.
 Relative efficiency: the estimator has the smallest variance of all estimators which could be used.
 Sufficiency: the estimator uses all the information in the sample about the parameter.

2.1 UNBIASEDNESS

An unbiased estimator of a population parameter is an estimator whose expected value is equal to that parameter. An estimator of a given parameter is said to be unbiased if its expected value is equal to the true value of the parameter. In other words, an estimator is unbiased if it produces parameter estimates that are on average correct.
An estimator is said to be unbiased if its expectation equals the parameter being estimated, i.e. let θ̂ be an estimator of θ. Then θ̂ is said to be unbiased if E[θ̂] = θ. If E[θ̂] ≠ θ then θ̂ is said to be biased. In particular:

If E[θ̂] > θ, then θ̂ is positively biased.
If E[θ̂] < θ, then θ̂ is negatively biased.
If E[θ̂] − θ ≠ 0, then θ̂ is biased.

An unbiased estimator is an accurate statistic that's used to approximate a population parameter. "Accurate" in this sense means that it's neither an overestimate nor an underestimate. If an overestimate or underestimate does happen, the mean of the difference is called a "bias."

2.1.1 BIASED ESTIMATOR

An estimator which is not unbiased is said to be biased.
The bias of an estimator θ̂ is the expected difference between θ̂ and the true parameter:

$\mathrm{Bias}(\hat{\theta}) = E[\hat{\theta}] - \theta$

Thus, an estimator is unbiased if its bias is equal to zero, and biased otherwise. There are two common unbiased estimators:
 Sample mean X̄ for the population mean μ
 Sample proportion p̂ for the population proportion p
2.1.2 Bias of the Sample Variance
As an estimator of the population variance σ², the sample variance

$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$

is unbiased, whereas the version with divisor n is biased (see Example 2.1 below).

If we consider two possible estimators for the same population parameter and both are unbiased, variance might help us choose, i.e. prefer the estimator with the smaller variance if both are unbiased.
The standard deviation of an estimator is also called the standard error of the estimator.
Variance of two common estimators:

i) $\mathrm{var}(\hat{p}) = \frac{p(1-p)}{n}$

ii) $\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}$

If one or more of the estimators is biased, it may be harder to choose between them, e.g. one estimator may have a small bias and a small variance while another is unbiased but has a very large variance. In this case, we may prefer the biased estimator over the unbiased estimator.
Suppose that θ̂₁ and θ̂₂ are unbiased estimators of θ. If θ̂₁ has a smaller variance than θ̂₂, the estimator θ̂₁ is more likely to produce an estimate close to the true value θ. If we consider all unbiased estimators of θ, the one with the smallest variance is called the minimum variance unbiased estimator (MVUE). Sometimes the MVUE is called the UMVUE, where U represents "uniformly", meaning for all θ.
The mean square error (MSE) is a criterion which tries to take into account concerns about both the bias and the variance of estimators:

$\mathrm{MSE}(\hat{\theta}) = E[(\hat{\theta} - \theta)^2]$

MSE can be re-stated in terms of the variance and the bias:

$\mathrm{MSE}(\hat{\theta}) = \mathrm{var}(\hat{\theta}) + [\mathrm{Bias}(\hat{\theta})]^2$
Example 2.1

Let X₁, ..., Xₙ be a random sample of size n from a Normal distribution N(μ, σ²). Show that:

i. X̄ is an unbiased estimator of μ.

ii. $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$ is an unbiased estimator of σ².

iii. $S^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2$ is a biased estimator of σ².

Solution (i)

$\bar{X} = \frac{1}{n}(X_1 + \dots + X_n)$

$E[\bar{X}] = E\left(\frac{1}{n}(X_1 + \dots + X_n)\right) = \frac{1}{n}E(X_1 + \dots + X_n) = \frac{1}{n}\left[E(X_1) + E(X_2) + \dots + E(X_n)\right]$, but $E[X_i] = \mu$

$= \frac{1}{n}(\mu_1 + \mu_2 + \dots + \mu_n) = \frac{1}{n}\cdot n\mu = \mu$

Therefore X̄ is an unbiased estimator of μ.
Solution (ii)

$E(S^2) = E\left[\frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1}\right] = \frac{1}{n-1}E\left[\sum_{i=1}^{n}(X_i - \bar{X})^2\right]$

Expanding the square and using $\sum X_i = n\bar{X}$:

$= \frac{1}{n-1}E\left[\sum_{i=1}^{n}X_i^2 - n\bar{X}^2\right] = \frac{1}{n-1}\left[\sum_{i=1}^{n}E(X_i^2) - nE(\bar{X}^2)\right]$

but $E(X_i^2) = \sigma^2 + \mu^2$ and $E(\bar{X}^2) = \frac{\sigma^2}{n} + \mu^2$.

$E(S^2) = \frac{1}{n-1}\left[\sum_{i=1}^{n}(\sigma^2 + \mu^2) - n\left(\frac{\sigma^2}{n} + \mu^2\right)\right]$

$= \frac{1}{n-1}\left[n\sigma^2 + n\mu^2 - \sigma^2 - n\mu^2\right]$

$= \frac{1}{n-1}(n-1)\sigma^2$

$= \sigma^2$
Solution (iii)

$E(S^2) = E\left[\frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n}\right] = \frac{1}{n}E\left[\sum_{i=1}^{n}X_i^2 - n\bar{X}^2\right] = \frac{1}{n}\left[\sum_{i=1}^{n}E(X_i^2) - nE(\bar{X}^2)\right]$

but $E(X_i^2) = \sigma^2 + \mu^2$ and $E(\bar{X}^2) = \frac{\sigma^2}{n} + \mu^2$.

$E(S^2) = \frac{1}{n}\left[\sum_{i=1}^{n}(\sigma^2 + \mu^2) - n\left(\frac{\sigma^2}{n} + \mu^2\right)\right]$

$= \frac{1}{n}\left[n\sigma^2 + n\mu^2 - \sigma^2 - n\mu^2\right]$

$= \frac{1}{n}(n-1)\sigma^2 \neq \sigma^2$, hence biased.
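A minimal Monte Carlo sketch of Example 2.1, assuming an arbitrary normal population N(5, 4) chosen only for illustration: it checks empirically that the divisor n−1 gives an (approximately) unbiased variance estimator while the divisor n underestimates σ².

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma2, n, reps = 5.0, 4.0, 10, 100_000

samples = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))
s2_unbiased = samples.var(axis=1, ddof=1)  # divisor n-1
s2_biased = samples.var(axis=1, ddof=0)    # divisor n

print(s2_unbiased.mean())  # close to sigma^2 = 4
print(s2_biased.mean())    # close to ((n-1)/n)*sigma^2 = 3.6
```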

Exercise
a) Let X₁, X₂, X₃ be a random sample of size 3 from a distribution with unknown mean μ and positive variance σ².

Let $Y = \frac{X_1 + 2X_2 + 3X_3}{6}$

i. Show that E[Y] = μ.
ii. Show that X̄ is a better unbiased estimator than Y.

b) Let X₁, X₂ and X₃ be three identically and independently distributed random variables with mean E[Xᵢ] = θ and Var(Xᵢ) = 18 for i = 1, 2, 3. Consider the estimator of θ

$\hat{\theta} = \frac{1}{6}X_1 + \frac{1}{3}X_2 + \frac{1}{2}X_3$

i. Show that θ̂ is an unbiased estimator of θ.
ii. Obtain the value of Var(θ̂).

c) X₁, X₂, X₃ is a random sample of size 3 from a population which is normal, i.e. X ~ N(μ, σ²). t₁, t₂ and t₃ are estimators of the mean μ, where

t₁ = x₁ + x₂ + x₃
t₂ = 2x₁ + 3x₃ − 4x₂
t₃ = (λx₁ + x₂ + x₃)/2

where λ is such that t₃ is an unbiased estimator for μ. Which one is the best estimator?
WEEK 3
TOPIC 3: CONSISTENCY

A sequence of estimators θ̂ₙ is said to be consistent for θ if for every ε > 0

$\lim_{n\to\infty}\Pr\{|\hat{\theta}_n - \theta| \geq \varepsilon\} = 0$, or

$\lim_{n\to\infty}\Pr\{|\hat{\theta}_n - \theta| < \varepsilon\} = 1$

N/B:

Theorem 3.3.1

Suppose E(θ̂ₙ) = θ (i.e. θ̂ₙ is unbiased for θ). Then θ̂ₙ is a consistent estimator of θ if

$\lim_{n\to\infty}\mathrm{Var}(\hat{\theta}_n) = 0$

Equivalently,

$\lim_{n\to\infty} E[\hat{\theta}_n - \theta]^2 = 0$

3.1: Schwarz Inequality

For any random variables U and V, $[E(UV)]^2 \leq E(U^2)E(V^2)$.

Proof:
Let U and V be two random variables satisfying $0 < E(U^2) < \infty$ and $0 < E(V^2) < \infty$.

Then for any constants a and b,

$0 \leq E[(aU + bV)^2] = a^2E(U^2) + 2abE(UV) + b^2E(V^2)$ ......(1)

and

$0 \leq E[(aU - bV)^2] = a^2E(U^2) - 2abE(UV) + b^2E(V^2)$ ......(2)

Now let

$a = [E(V^2)]^{\frac{1}{2}}$ and $b = [E(U^2)]^{\frac{1}{2}}$

Then it follows from (1) that

$E(UV) \geq -[E(U^2)E(V^2)]^{\frac{1}{2}}$

and from (2) it follows that

$E(UV) \leq [E(U^2)E(V^2)]^{\frac{1}{2}}$

These two relations imply the inequality $[E(UV)]^2 \leq E(U^2)E(V^2)$.

3.2 Chebyshev's Inequality

Let X be a random variable with mean μ and variance σ². Then for any positive number ε > 0, we have

$\Pr\{|X - \mu| \geq \varepsilon\} \leq \frac{\mathrm{Var}(X)}{\varepsilon^2}$, provided $\sigma^2 < \infty$

or

$\Pr\{|X - \mu| < \varepsilon\} \geq 1 - \frac{\sigma^2}{\varepsilon^2}$

Also recall convergence in probability (stochastic convergence).

A sequence of random variables X₁, X₂, ..., Xₙ is said to converge in probability to a constant a if for any ε > 0

$\lim_{n\to\infty}\Pr\{|X_n - a| < \varepsilon\} = 1$, or

$\lim_{n\to\infty}\Pr\{|X_n - a| \geq \varepsilon\} = 0$

N/B: Even if θ̂ is an unbiased estimator of θ, we would like it to be as close as possible to θ for large n. We prefer an estimator that approaches the parameter θ in probability as n → ∞, i.e. a statistic Tₙ that converges in probability to θ.

Example 3.1: Let x₁, x₂, ..., xₙ be a random sample from a distribution with mean μ and a finite variance σ². Show that x̄ is consistent for μ.

Solution:

By Chebyshev's inequality, for every ε > 0,

$\Pr\{|\bar{x} - \mu| \geq \varepsilon\} \leq \frac{\mathrm{Var}(\bar{x})}{\varepsilon^2} = \frac{\sigma^2/n}{\varepsilon^2} = \frac{\sigma^2}{n\varepsilon^2}$

$\lim_{n\to\infty}\Pr\{|\bar{x} - \mu| \geq \varepsilon\} \leq \lim_{n\to\infty}\frac{\sigma^2}{n\varepsilon^2} = 0$

$\Rightarrow \lim_{n\to\infty}\Pr\{|\bar{x} - \mu| \geq \varepsilon\} = 0$

Hence x̄ is consistent for μ.

Remark:

|θ̂ₙ − θ| is the distance/difference between the estimator and the parameter to be estimated.

Or, equivalently,

$\Pr\{|\bar{x} - \mu| < \varepsilon\} \geq 1 - \frac{\mathrm{Var}(\bar{x})}{\varepsilon^2} = 1 - \frac{\sigma^2}{n\varepsilon^2}$

$\lim_{n\to\infty}\Pr\{|\bar{x} - \mu| < \varepsilon\} \geq 1 - \lim_{n\to\infty}\frac{\sigma^2}{n\varepsilon^2} = 1$

$\Rightarrow$ x̄ is consistent for μ.
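A minimal simulation sketch of this result, assuming an arbitrary population (exponential with mean 2, chosen only for illustration): the proportion of sample means falling outside μ ± ε shrinks towards zero as n grows.

```python
import numpy as np

rng = np.random.default_rng(2)
mu, eps, reps = 2.0, 0.1, 1_000

# Population chosen only for illustration: exponential with mean 2.
for n in (10, 100, 1000, 10_000):
    xbars = rng.exponential(mu, size=(reps, n)).mean(axis=1)
    print(n, np.mean(np.abs(xbars - mu) >= eps))  # -> 0 as n grows
```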

Example 3.2: Let x₁, x₂, ..., xₙ be i.i.d. N(μ, σ²) random variables. Show that:

i. The sample mean x̄ is consistent for μ.

ii. $s^2 = \frac{1}{n}\sum(x_i - \bar{x})^2$ and

iii. $S^2 = \frac{1}{n-1}\sum(x_i - \bar{x})^2$ are consistent estimators of σ².

Solution:

i. If $x \sim N(\mu, \sigma^2)$ then $\bar{x} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$, so

$\lim_{n\to\infty}\Pr\{|\bar{x} - \mu| \geq \varepsilon\} \leq \lim_{n\to\infty}\frac{\sigma^2}{n\varepsilon^2} = 0$

ii. $\lim_{n\to\infty}\Pr\{|s^2 - \sigma^2| \geq \varepsilon\} \leq \lim_{n\to\infty}\frac{\mathrm{Var}(s^2)}{\varepsilon^2}$

But

$\mathrm{Var}(s^2) = \mathrm{Var}\left(\frac{1}{n}\sum(x_i - \bar{x})^2\right) = \frac{1}{n^2}\mathrm{Var}\left(\sum(x_i - \bar{x})^2\right) = \frac{\sigma^4}{n^2}\mathrm{Var}\left(\frac{\sum(x_i - \bar{x})^2}{\sigma^2}\right)$

Since $\frac{\sum(x_i - \bar{x})^2}{\sigma^2} \sim \chi^2_{(n-1)}$, its mean is n − 1 and its variance is 2(n − 1). Hence

$\mathrm{Var}(s^2) = \frac{\sigma^4}{n^2}\cdot 2(n-1)$

$\Rightarrow \lim_{n\to\infty}\Pr\{|s^2 - \sigma^2| \geq \varepsilon\} \leq \lim_{n\to\infty}\frac{\sigma^4\cdot 2(n-1)}{n^2\varepsilon^2} = \frac{2\sigma^4}{\varepsilon^2}\lim_{n\to\infty}\left(\frac{n-1}{n^2}\right)$

But $\lim_{n\to\infty}\left(\frac{n-1}{n^2}\right) = \lim_{n\to\infty}\left(\frac{1}{n} - \frac{1}{n^2}\right) = 0$

$\therefore \lim_{n\to\infty}\Pr\{|s^2 - \sigma^2| \geq \varepsilon\} = 0$

Or

$\lim_{n\to\infty}\Pr\{|s^2 - \sigma^2| < \varepsilon\} \geq \lim_{n\to\infty}\left[1 - \frac{2\sigma^4(n-1)}{n^2\varepsilon^2}\right] = 1$

Hence s² is a consistent estimator of σ².

Task: Use the same technique used to solve (ii) to solve (iii).

Exercise 3.1: Let x₁, x₂, ..., xₙ be a random sample from a distribution with mean μ and variance σ². Show that x̄ is consistent for μ. (Hint: first get the variance of x̄.)

Exercise 3.2: Consider a binomial distribution with parameter p. Find the unbiased estimator for p and test it for consistency.

Remark: If the Xᵢ are i.i.d. from a distribution with mean μ and variance σ², then

i. $E(X_i) = \mu$ and $\mathrm{Var}(X_i) = \sigma^2$

ii. $E(aX_i) = aE(X_i) = a\mu$

iii. $\mathrm{Var}(aX_i) = a^2\mathrm{Var}(X_i) = a^2\sigma^2$

WEEK 4
TOPIC 4: EFFICIENCY
It's possible to have more than one unbiased and consistent estimator for the same parameter. Among these estimators, one will be more efficient than the other(s).

Definition: If θ̂₁ and θ̂₂ are two unbiased and consistent estimators of the parameter θ with variances Var(θ̂₁) and Var(θ̂₂) respectively, and we have that Var(θ̂₁) < Var(θ̂₂) for all n, then θ̂₁ is said to be the more efficient estimator.

I.e. among the unbiased and consistent estimators, we choose the one with the least variance.

The ratio $\frac{\mathrm{Var}(\hat{\theta}_2)}{\mathrm{Var}(\hat{\theta}_1)}$ is known as the efficiency of θ̂₁ relative to θ̂₂.

Example 4.1: A random sample of size 5 is drawn from a normal population with unknown mean μ and known variance σ², i.e. Xᵢ ~ N(μ, σ²); i = 1, 2, ..., 5. Consider the following estimators of the population mean μ:

$t_1 = \bar{x} = \frac{1}{5}\sum_{i=1}^{5}X_i$, $t_2 = \frac{X_1 + X_2}{2} + X_3$ and $t_3 = \frac{2X_1 + X_2 + \lambda X_3}{3}$, where λ is such that t₃ is unbiased for μ.

i. Are t₁ and t₂ unbiased?

ii. State with reasons the estimator which is best among t₁, t₂ and t₃.

Solution:

i. $E(t_1) = E\left(\frac{1}{5}\sum X_i\right) = \frac{1}{5}\sum E(X_i) = \frac{5\mu}{5} = \mu$

Thus t₁ is an unbiased estimator for μ.

$E(t_2) = E\left(\frac{X_1 + X_2}{2} + X_3\right) = \frac{1}{2}E(X_1 + X_2) + E(X_3) = \frac{1}{2}(\mu + \mu) + \mu = 2\mu$

Thus E(t₂) = 2μ ≠ μ, hence t₂ is not unbiased.

$E(t_3) = E\left(\frac{2X_1 + X_2 + \lambda X_3}{3}\right) = \frac{1}{3}(2\mu + \mu + \lambda\mu) = \mu + \frac{\lambda}{3}\mu$

For t₃ to be unbiased, we need $E(t_3) = \mu \Rightarrow \frac{\lambda}{3}\mu = 0 \Rightarrow \lambda = 0$

$\Rightarrow t_3 = \frac{2X_1 + X_2}{3}$, which is unbiased.

ii. $\mathrm{Var}(t_1) = \frac{1}{25}\sum\mathrm{Var}(X_i) = \frac{1}{25}(5\sigma^2) = \frac{5}{25}\sigma^2 = 0.2\sigma^2$

$\mathrm{Var}(t_2) = \mathrm{Var}\left(\frac{X_1 + X_2}{2} + X_3\right)$, which is not considered since t₂ is biased.

$\mathrm{Var}(t_3) = \mathrm{Var}\left(\frac{2X_1 + X_2}{3}\right) = \frac{1}{9}\mathrm{Var}(2X_1 + X_2) = \frac{1}{9}\left[4\mathrm{Var}(X_1) + \mathrm{Var}(X_2)\right] = \frac{1}{9}(4\sigma^2 + \sigma^2) = \frac{5}{9}\sigma^2 \approx 0.5556\sigma^2$

Since t₁ and t₃ are unbiased, we compare Var(t₁) and Var(t₃).

Since Var(t₁) < Var(t₃), t₁ is the more efficient estimator of μ.
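A minimal sketch verifying these variances by simulation, assuming arbitrary values μ = 10 and σ = 3 chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma, reps = 10.0, 3.0, 200_000

X = rng.normal(mu, sigma, size=(reps, 5))
t1 = X.mean(axis=1)               # t1 = (1/5) * sum of the X_i
t3 = (2 * X[:, 0] + X[:, 1]) / 3  # t3 with lambda = 0 (unbiased)

print(t1.var() / sigma**2)  # ~ 0.2
print(t3.var() / sigma**2)  # ~ 0.5556, so t1 is more efficient
```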

Exercise: Let x₁, x₂, ..., xₙ denote a random sample from a population with mean μ and variance σ². Consider the following three estimators for μ:

$\hat{\mu}_1 = \frac{1}{2}(X_1 + X_2)$

$\hat{\mu}_2 = \frac{X_1}{2} + \frac{X_2 + X_3 + \dots + X_{n-1}}{2(n-2)}$

$\hat{\mu}_3 = \bar{x}$

Show that each of the three estimators is unbiased for μ. Find the efficiency of μ̂₃ relative to μ̂₂ and of μ̂₃ relative to μ̂₁.
WEEK 5
TOPIC 5: SUFFICIENCY

Let x₁, x₂, ..., xₙ be a random sample of size n from a population whose p.d.f. is f(x, θ), where θ is unknown. Then the statistic T = t(x₁, x₂, ..., xₙ) is said to be sufficient for θ if the conditional distribution of x₁, x₂, ..., xₙ given T is independent of the unknown parameter θ.

This implies that T is sufficient for θ iff p(x | T) is independent of θ for all T,

i.e. T is a sufficient statistic for θ if it contains all the information in the sample regarding the parameter θ.

Example 5.1: Let x₁, x₂, ..., xₙ be i.i.d. (independent, identically distributed) random variables from a Bernoulli distribution with parameter p. Show that T = Σx is a sufficient statistic.

Solution:
$X \sim \mathrm{Bernoulli}(p)$:

$P(X = x) = p^x(1-p)^{1-x};\; x = 0, 1$

For T to be sufficient, p(x | T) must not depend on p, where $p(x \mid T) = \frac{p(x, T)}{p(T)}$.

But $T = \sum x_i \Rightarrow X_n = T - \sum_{i=1}^{n-1}X_i$, and the xᵢ are independent.

Then

$P(x, T) = P(X_1 = x_1)\cdot P(X_2 = x_2)\cdots P(X_{n-1} = x_{n-1})\cdot P\left(X_n = T - \sum_{i=1}^{n-1}X_i\right)$

$= p^{x_1}(1-p)^{1-x_1}\cdot p^{x_2}(1-p)^{1-x_2}\cdots p^{x_{n-1}}(1-p)^{1-x_{n-1}}\cdot p^{T-\sum_{i=1}^{n-1}x_i}(1-p)^{1-T+\sum_{i=1}^{n-1}x_i}$

$= \left[p^{\sum_{i=1}^{n-1}x_i}(1-p)^{(n-1)-\sum_{i=1}^{n-1}x_i}\right]\left[p^{T-\sum_{i=1}^{n-1}x_i}(1-p)^{1-T+\sum_{i=1}^{n-1}x_i}\right]$

$\Rightarrow p(x, T) = p^T(1-p)^{n-T}$

Next, $p(T) = \binom{n}{t}p^t(1-p)^{n-t};\; t = 0, 1, ..., n$,

i.e. T is the number of successes in n independent Bernoulli trials, which has the binomial distribution in t.

$p(x \mid T) = \frac{p(x, T)}{p(T)} = \frac{p^t(1-p)^{n-t}}{\binom{n}{t}p^t(1-p)^{n-t}} = \frac{1}{\binom{n}{t}}$, which does not depend on p.

Hence T = Σx is a sufficient statistic for p.

Alternatively, using the mgf technique:

$M_X(t) = E[e^{tX}] = \sum_x e^{tx}P(X = x) = \sum_{x=0}^{1}e^{tx}p^x(1-p)^{1-x} = \sum_{x=0}^{1}(pe^t)^x(1-p)^{1-x}$

$= (1-p) + pe^t$

Now

$M_T(t) = M_{\sum x_i}(t) = M_{X_1}(t)\cdot M_{X_2}(t)\cdots M_{X_n}(t) = \prod_{i=1}^{n}M_{X_i}(t) = \left[(1-p) + pe^t\right]^n = (q + pe^t)^n$

This is the mgf of a binomial distribution. Since the mgf is unique, the distribution of T is given by

$\binom{n}{t}p^t(1-p)^{n-t};\; t = 0, 1, 2, ..., n$

Thus $p(x \mid T) = \frac{p^t(1-p)^{n-t}}{\binom{n}{t}p^t(1-p)^{n-t}} = \frac{1}{\binom{n}{t}}$, which is independent of p.

Example 2.4.2: Let x₁, x₂, ..., xₙ be i.i.d. random variables of size n from a population having a Poisson distribution with parameter λ, i.e. $p(X = x) = \frac{e^{-\lambda}\lambda^x}{x!};\; x = 0, 1, 2, ...$

Show that T = Σx is sufficient for λ.

Solution:

The task is to show that p(x | T) doesn't depend on λ, where $p(x \mid T) = \frac{p(x, T)}{p(T)}$.

But $p(x, T) = p(x_1, x_2, ..., x_{n-1}, x_n, \sum x_i) = p\left(x_1, x_2, ..., x_{n-1}, x_n = t - \sum_{i=1}^{n-1}x_i\right)$

Since the xᵢ are i.i.d., then

$p(x, T) = p(x_1)\cdot p(x_2)\cdots p(x_{n-1})\cdot p\left(t - \sum_{i=1}^{n-1}x_i\right) = \prod_{i=1}^{n-1}p(x_i)\cdot p\left(t - \sum_{i=1}^{n-1}x_i\right)$

$= \prod_{i=1}^{n-1}\frac{e^{-\lambda}\lambda^{x_i}}{x_i!}\cdot\frac{e^{-\lambda}\lambda^{t-\sum_{i=1}^{n-1}x_i}}{\left(t - \sum_{i=1}^{n-1}x_i\right)!} = \frac{e^{-(n-1)\lambda}\lambda^{\sum_{i=1}^{n-1}x_i}}{\prod_{i=1}^{n-1}x_i!}\cdot\frac{e^{-\lambda}\lambda^{t-\sum_{i=1}^{n-1}x_i}}{\left(t - \sum_{i=1}^{n-1}x_i\right)!}$

$\Rightarrow p(x, T) = \frac{e^{-n\lambda}\lambda^t}{\left(\prod_{i=1}^{n-1}x_i!\right)\left(t - \sum_{i=1}^{n-1}x_i\right)!}$

Using the mgf technique for the distribution of T:

$M_X(t) = E(e^{tX}) = \sum_{x=0}^{\infty}e^{tx}\frac{e^{-\lambda}\lambda^x}{x!} = e^{\lambda(e^t - 1)}$

$M_T(t) = M_{\sum x_i}(t) = M_{x_1}(t)\cdot M_{x_2}(t)\cdots M_{x_n}(t) = \prod_{i=1}^{n}e^{\lambda(e^t - 1)} = e^{n\lambda(e^t - 1)}$

Therefore T ~ Poisson(nλ), i.e. $p(T) = \frac{e^{-n\lambda}(n\lambda)^t}{t!};\; t = 0, 1, 2, ...$

$p(x \mid T) = \frac{e^{-n\lambda}\lambda^t}{\left(\prod_{i=1}^{n-1}x_i!\right)\left(t - \sum_{i=1}^{n-1}x_i\right)!}\cdot\frac{t!}{e^{-n\lambda}(n\lambda)^t} = \frac{t!}{\left(\prod_{i=1}^{n-1}x_i!\right)\left(t - \sum_{i=1}^{n-1}x_i\right)!\;n^t}$

which is independent of λ. Hence T is a sufficient statistic for λ.

Alternatively,

$p(t = y) = \frac{e^{-n\lambda}(n\lambda)^y}{y!}$, where y is a non-negative integer.

$p(x \mid t = y) = \frac{p(X_1 = x_1, X_2 = x_2, ..., X_n = x_n, t = y)}{p(t = y)} = \frac{p(X_1 = x_1, X_2 = x_2, ..., X_n = x_n)}{p(t = y)}$

$= \frac{e^{-n\lambda}\lambda^{\sum x_i}}{\prod_{i=1}^{n}x_i!}\cdot\frac{y!}{e^{-n\lambda}(n\lambda)^y} = \frac{y!}{\left(\prod_{i=1}^{n}x_i!\right)n^y}$

Note: $p(x, T) = \frac{e^{-n\lambda}\lambda^{\sum x_i}}{\prod_{i=1}^{n}x_i!}$ and $p(T) = \frac{e^{-n\lambda}(n\lambda)^T}{T!}$

Remark:

Let $p_\theta(x) = P_\theta(X = x)$.

Then a necessary and sufficient condition for t to be sufficient for θ is that there exists a factorization

$p_\theta(x) = g_\theta(t(x))\,\phi(x)$ ..........(1)

where φ(x) does not depend on θ.

Suppose that (1) holds and let t(x) = y. Then $p_\theta(t = y) = \sum_{x:\,t(x)=y}p_\theta(x)$.

The conditional probability is

$p_\theta(X = x \mid t = y) = \frac{p_\theta(x)}{p_\theta(t = y)} = \frac{g_\theta(t(x))\,\phi(x)}{g_\theta(y)\sum_{x:\,t(x)=y}\phi(x)} = \frac{\phi(x)}{\sum_{x:\,t(x)=y}\phi(x)}$

which is independent of θ.

Conversely, if this conditional distribution does not depend on θ and is equal to, say, h(x, y), then

$p_\theta(x) = p_\theta(t = y)\,h(x, y)$, and so (1) holds.

We find that if g(s(x); θ) is the frequency function of a statistic s(x), then a necessary and sufficient condition for s(x) to be sufficient is that we can write the probability

$f(x_1; \theta)\cdot f(x_2; \theta)\cdots f(x_n; \theta) = g(s(x); \theta)\,\phi(x_1, x_2, ..., x_n)$

where φ(x₁, x₂, ..., xₙ) is independent of θ.

It follows that s(x) is a sufficient statistic for θ.

Sufficiency can also be verified in terms of the likelihood function by the following theorem.
Theorem 5.1: Factorization Theorem

Let x₁, x₂, ..., xₙ be a random sample of size n from a population X having p.d.f. f(x, θ), where θ is unknown. The likelihood function L(x, θ) is

$L(x, \theta) = f(x_1, \theta)\cdot f(x_2, \theta)\cdots f(x_n, \theta) = \prod_{i=1}^{n}f(x_i, \theta)$

i.e. L(x, θ) is the joint distribution of x₁, x₂, ..., xₙ, written L(x₁, x₂, ..., xₙ, θ).

The statistic T = t(x₁, x₂, ..., xₙ) is said to be sufficient iff the likelihood function can be expressed in the form

$L(x, \theta) = L_1(t, \theta)\cdot L_2(x)$

where L₁(t, θ) > 0 depends on the x's only through t and on θ, and L₂(x) > 0 doesn't depend on θ but depends on the x's only.
Example 5.2: Let x₁, x₂, ..., xₙ be a random sample from a population having a Poisson distribution with parameter λ, i.e. $p(X = x) = \frac{e^{-\lambda}\lambda^x}{x!};\; x = 0, 1, 2, ...$ Find a sufficient statistic for λ.

Solution:

$L(x, \lambda) = \prod_{i=1}^{n}p(X = x_i) = \prod_{i=1}^{n}\frac{e^{-\lambda}\lambda^{x_i}}{x_i!} = \frac{e^{-n\lambda}\lambda^{\sum x_i}}{\prod_{i=1}^{n}x_i!} = \underbrace{e^{-n\lambda}\lambda^{\sum x_i}}_{L_1(t,\lambda)}\cdot\underbrace{\frac{1}{\prod_{i=1}^{n}x_i!}}_{L_2(x)}$

$L_1(t, \lambda) > 0$, and it depends on the x's only through $t = \sum x_i$.

Similarly $L_2(x) > 0$ and doesn't depend on λ. Thus $t = \sum x_i$ is a sufficient statistic for the parameter λ.

Remark: A one-to-one function of a sufficient statistic is also sufficient, e.g. if $\sum x_i$ is sufficient, then $\bar{x} = \frac{\sum x_i}{n}$ is also sufficient for λ.
Definition: Jointly Sufficient Statistics

A set of statistics s₁(x₁, x₂, ..., xₙ), s₂(x₁, x₂, ..., xₙ), ..., s_k(x₁, x₂, ..., xₙ) is said to be jointly sufficient for the parameters θ₁, θ₂, ..., θ_r (r ≤ k) if the likelihood function L can be expressed as

$L = L_1(\theta_1, \theta_2, ..., \theta_r, s_1, s_2, ..., s_k)\cdot L_2(x_1, x_2, ..., x_n)$, where

L₁ is a function of θ₁, θ₂, ..., θ_r and s₁, s₂, ..., s_k, and

L₂ is a function of x₁, x₂, ..., xₙ alone.

Remark: Joint sufficiency doesn't imply single sufficiency, i.e. if (s₁, s₂, ..., s_k) are jointly sufficient for (θ₁, θ₂, ..., θ_j, ..., θ_r), then it does not follow that s_j is sufficient for θ_j.

Definition: A statistic T(x₁, x₂, ..., xₙ) is said to be minimal sufficient if it is a function of every other sufficient statistic.

Example 5.3: Let X ~ N(μ, σ²).

i. If σ² is known, find a sufficient statistic for μ (i.e. $t = \sum x_i$).

ii. If μ is known, find a sufficient statistic for σ² (i.e. $t = \sum(x_i - \mu)^2$).

iii. If μ and σ² are unknown, find sufficient statistics for μ and σ².

Solution:

i. $L(x, \mu, \sigma^2) = \prod_{i=1}^{n}\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(x_i-\mu)^2}{2\sigma^2}} = (2\pi\sigma^2)^{-\frac{n}{2}}e^{-\frac{1}{2\sigma^2}\sum(x_i-\mu)^2}$

$= (2\pi\sigma^2)^{-\frac{n}{2}}e^{-\frac{1}{2\sigma^2}\left(\sum x_i^2 - 2\mu\sum x_i + n\mu^2\right)}$

Since σ² is known,

$L(x, \mu, \sigma^2) = \underbrace{e^{\frac{1}{2\sigma^2}\left(2\mu\sum x_i - n\mu^2\right)}}_{L_1(t,\mu)}\cdot\underbrace{(2\pi\sigma^2)^{-\frac{n}{2}}e^{-\frac{1}{2\sigma^2}\sum x_i^2}}_{L_2(x)}$

$L_1(t, \mu) > 0$ and depends on the x's only through $t = \sum x_i$; $L_2(x) > 0$ is independent of μ. Thus $t = \sum x_i$ is a sufficient statistic for μ.

ii. Since μ is known,

$L(x, \mu, \sigma^2) = \underbrace{(2\pi\sigma^2)^{-\frac{n}{2}}e^{-\frac{1}{2\sigma^2}\sum(x_i-\mu)^2}}_{L_1(t,\sigma^2)}\cdot\underbrace{1}_{L_2(x)}$

$L_1(t, \sigma^2) > 0$ and depends on the x's only through $t = \sum(x_i - \mu)^2$, while $L_2(x) = 1 > 0$ is independent of σ². Hence $t = \sum(x_i - \mu)^2$ is sufficient for σ².

iii. Similarly, when both μ and σ² are unknown,

$L(x, \mu, \sigma^2) = \underbrace{(2\pi\sigma^2)^{-\frac{n}{2}}e^{-\frac{1}{2\sigma^2}\left(t_2 - 2\mu t_1 + n\mu^2\right)}}_{L_1(t_1,t_2,\mu,\sigma^2)}\cdot\underbrace{1}_{L_2(x)}$

where $t_1 = \sum x_i$ and $t_2 = \sum x_i^2$.

L₁ is non-negative (being an exponential) and depends on the x's only through $t_1 = \sum x_i$ and $t_2 = \sum x_i^2$. L₂ is non-negative and independent of μ and σ². Hence $t_1 = \sum x_i$ and $t_2 = \sum x_i^2$ are jointly sufficient for μ and σ².

5.2: Completeness

A family of distributions {f(x, θ); θ ∈ Ω} is said to be complete if for any function g(x),

$E_\theta[g(x)] = 0$ for all θ ∈ Ω implies $g(x) = 0$ for all x with probability 1,

i.e. $\Pr_\theta[g(x) = 0] = 1$.

Example 5.4: Let X ~ B(n, θ). Show that the family of distributions

$f(x, \theta) = \binom{n}{x}\theta^x(1-\theta)^{n-x};\; x = 0, 1, 2, ..., n;\; \theta \in (0, 1)$ is complete.

Solution:
Let g(x) be any function of x. Then

$E[g(x)] = \sum_{x=0}^{n}g(x)f(x, \theta) = \sum_{x=0}^{n}g(x)\binom{n}{x}\theta^x(1-\theta)^{n-x} = (1-\theta)^n\sum_{x=0}^{n}g(x)\binom{n}{x}\left(\frac{\theta}{1-\theta}\right)^x$

Now let $E[g(x)] = 0$. Since $(1-\theta)^n \neq 0$,

$\Rightarrow \sum_{x=0}^{n}g(x)\binom{n}{x}\rho^x = 0$, where $\rho = \frac{\theta}{1-\theta}$

$\Rightarrow g(0)\binom{n}{0}\rho^0 + g(1)\binom{n}{1}\rho^1 + g(2)\binom{n}{2}\rho^2 + \dots + g(n)\binom{n}{n}\rho^n = 0$

which is a polynomial of degree n in ρ; it can be identically equal to zero iff all the coefficients $a(n, x) = g(x)\binom{n}{x}$ are zero for x = 0, 1, 2, ..., n.

But $\binom{n}{x} \neq 0 \Rightarrow g(x) = 0$ for all x. Hence g(x) ≡ 0. Therefore, the family of binomial distributions is complete.

Note: Other complete families of distributions are the Poisson, and the Normal distribution with one unknown parameter.

Definition: A statistic T is said to be complete if its p.d.f. is complete (i.e. if it involves a complete family of distributions).

Example 5.5: Let x₁, x₂, ..., xₙ be a random sample from a Bernoulli distribution with parameter θ, i.e. X ~ B(1, θ). Show that the statistic T = x₁ − x₂ is not complete.

Solution:
T takes the values −1, 0, 1 with

$P(T = -1) = \theta(1-\theta)$, $P(T = 0) = \theta^2 + (1-\theta)^2$, $P(T = 1) = \theta(1-\theta)$

Let g(T) be a function of T. Then

$E[g(T)] = g(-1)\theta(1-\theta) + g(0)\left[\theta^2 + (1-\theta)^2\right] + g(1)\theta(1-\theta)$

Choosing g(−1) = 1, g(0) = 0 and g(1) = −1 gives E[g(T)] = 0 for all 0 < θ < 1, yet g is not identically zero (i.e. Pr[g(T) = 0] ≠ 1).

Hence T = x₁ − x₂ is not complete.

Task: Show that $s = \sum_{i=1}^{n}x_i$ is complete for the Bernoulli family above.

Exercise: Let x₁, x₂, ..., xₙ be a random sample from the p.d.f.

$f(x, \theta) = \begin{cases}\frac{1}{\theta} & ;\; 0 < x < \theta\\ 0 & ;\;\text{otherwise}\end{cases}$

Show that the statistic T = max{x₁, x₂, ..., xₙ} is complete.

Theorem 5.2

Let x₁, x₂, ..., xₙ be a random sample from the p.d.f. f(x, θ), θ ∈ Ω. If

$f(x, \theta) = A(\theta)B(x)\exp\{c(\theta)D(x)\}$,

then f(x, θ) belongs to the 1-parameter exponential family, and

$\sum_{i=1}^{n}D(x_i)$ is a complete minimal sufficient statistic.

Example 5.6: Show that $f(x, \theta) = \theta e^{-\theta x};\; x > 0$ belongs to the one-parameter exponential family.

Solution:
The task is to write $f(x, \theta) = A(\theta)B(x)\exp\{c(\theta)D(x)\}$,

where $A(\theta) = \theta$, $B(x) = I_{(0,\infty)}(x)$, $c(\theta) = -\theta$, $D(x) = x$.

Hence f(x, θ) belongs to the 1-parameter exponential family, and by Theorem 5.2, $\sum x_i$ is a complete minimal sufficient statistic for θ.

Exercise: Show that $f(x, \lambda) = \frac{\lambda^x e^{-\lambda}}{x!};\; x = 0, 1, 2, ...$ belongs to the 1-parameter exponential family.

Theorem 5.3: Rao-Blackwell

Let x₁, x₂, ..., xₙ be a random sample from the p.d.f. f(x, θ) and let S = s(x₁, x₂, ..., xₙ) be a sufficient statistic. Let the statistic T = t(x₁, x₂, ..., xₙ) be an unbiased estimator of τ(θ).

Define $T^* = E[T \mid S]$. Then

i. T* is a statistic, and it's a function of the sufficient statistic S.

ii. T* is an unbiased estimator of τ(θ).

iii. $\mathrm{Var}(T^*) \leq \mathrm{Var}(T)$ for all θ, and $\mathrm{Var}(T^*) < \mathrm{Var}(T)$ for some θ unless $P(T^* = T) = 1$.

Proof:

i. Since S is a sufficient statistic, the distribution of T given S, i.e. g(T | S), is independent of θ; hence $T^* = E[T \mid S]$ is independent of θ. Therefore T* is a statistic.

ii. $E[T^*] = E\left[E(T \mid S)\right] = E(T) = \tau(\theta)$. Therefore T* is an unbiased estimator of τ(θ).

iii. Write

$\mathrm{Var}(T) = E\left[\{T - E(T^*)\}^2\right] = E\left[\{(T - T^*) + (T^* - E(T^*))\}^2\right]$

$= E[(T - T^*)^2] + 2E\left[(T - T^*)(T^* - E(T^*))\right] + E\left[(T^* - E(T^*))^2\right]$

But

$E\left[(T - T^*)(T^* - E(T^*))\right] = E\left[E\left\{(T - T^*)(T^* - E(T^*)) \mid S\right\}\right] = E\left[(T^* - E(T^*))\,E\{T - T^* \mid S\}\right]$

$= E\left[(T^* - E(T^*))(T^* - T^*)\right] = 0$

Therefore $\mathrm{Var}(T) = E[(T - T^*)^2] + \mathrm{Var}(T^*)$.

Hence $\mathrm{Var}(T) \geq \mathrm{Var}(T^*)$, since $E[(T - T^*)^2] \geq 0$.

5.3: Uniqueness

Let x₁, x₂, ..., xₙ be a random sample of size n from the p.d.f. f(x, θ), θ ∈ Ω. Let T₁ = t(x) be a sufficient statistic for θ, and let the family {L(T₁, θ); θ ∈ Ω} of distributions of T₁ be complete. If there is a continuous function of T₁ {i.e. φ(T₁), not a function of θ} which is an unbiased statistic for θ, then this function of T₁ is the unique best statistic for θ.

Indeed, let φ(T₁) be a continuous function of T₁, independent of θ, with E[φ(T₁)] = θ, and let ψ(T₁) be another such continuous function of T₁ with E[ψ(T₁)] = θ, for all θ.

Then $E[\varphi(T_1) - \psi(T_1)] = 0$ for all θ, and by completeness $\varphi(T_1) = \psi(T_1)$ with probability 1.

Hence T₁ is a complete sufficient statistic for θ, and φ(T₁) is unique.

Task: Let x represent a random sample from each of the discrete distributions having the following p.d.f.s:

a) $f(x, \theta) = \begin{cases}\theta^x(1-\theta)^{1-x} & ;\; x = 0, 1;\; 0 < \theta < 1\\ 0 & ;\;\text{otherwise}\end{cases}$

b) $f(x, \lambda) = \begin{cases}\frac{\lambda^x e^{-\lambda}}{x!} & ;\; x = 0, 1, 2, ...;\; 0 < \lambda < \infty\\ 0 & ;\;\text{otherwise}\end{cases}$

Show in each case that $T = \sum_{i=1}^{n}x_i$ is a complete sufficient statistic for the parameter. Find for each distribution the unique continuous function of T which is the best statistic for the parameter.
Exercise

1. X₁, X₂, X₃ is a random sample of size 3 from a population which is normal, i.e. X ~ N(μ, σ²). t₁, t₂ and t₃ are estimators of the mean μ, where

t₁ = X₁ + X₂ + X₃

t₂ = 2X₁ + 3X₃ − 4X₂

t₃ = (λX₁ + X₂ + X₃)/2

where λ is such that t₃ is an unbiased estimator for μ.

i. Which is the best estimator?

ii. With this value of λ, is t₃ a consistent estimator? Give reasons.

2. Show that $\bar{x}$ and $\sum(x_i - \bar{x})^2$ are jointly sufficient statistics for μ and σ² respectively.
WEEK 6: CONTINUOUS ASSESSMENT TEST I
WEEK 7
TOPIC 7: METHOD OF MOMENTS (MME).

Introduction

In this topic, we introduce the concept of the method of moments (MME). We derive the method of moments estimators for parameters of various distributions, while taking note of the underlying assumptions.

Objectives

By the end of this topic, the learner should be able to:

3. Explain the basic theory and assumptions behind the method of moments (MME).
4. Derive the method of moments estimates for parameters of different distributions: Exponential, Geometric, Binomial, Poisson, Uniform distributions, etc.
5. Apply the method of moments (MME) in real life situations.

Learning Activities
 Students to take note of the activities and exercises provided within the text and at the
end of the topic.

Topic Resources

 Students to take note of the reference text books provided in the course outline.
 Learners to get e-learning materials from MMUST library and other links within their
reach.

7.1 METHOD OF MOMENTS (MME)

 The method of moments is the oldest method of deriving point estimators of population parameters.
 The commonly used moments are the 1st (related to the mean/average) and the 2nd (related to the variance), where the standard deviation is the square root of the variance: an indication of how closely the values are spread about the mean.
 Suppose that X is a continuous random variable with pdf $f(x \mid \Theta) = f(x \mid \theta_1, \theta_2, ..., \theta_k)$, or a discrete random variable with probability function $P(X = x \mid \Theta) = P(X = x \mid \theta_1, \theta_2, ..., \theta_k)$, characterized by k unknown parameters. Let X₁, X₂, ..., Xₙ be a random sample of size n from X.

The k-th sample moment about the origin is defined as

$m_k' = \frac{1}{n}\sum_{i=1}^{n}x_i^k$

while the k-th population moment about the origin is defined as

$\mu_k' = E[X^k] = \int_x x^k f(x \mid \theta_1, \theta_2, ..., \theta_k)\,dx$ (if X is continuous)

or

$\mu_k' = E[X^k] = \sum_x x^k P(X = x \mid \theta_1, \theta_2, ..., \theta_k)$ (if X is discrete)

Remark 7.1:

a) The method of moments is based on the assumption that the sample moments should provide good estimators of the corresponding population moments.
b) The population moments will be functions of the population parameters, thus we equate them to the corresponding sample moments. This yields k simultaneous equations in k unknowns, i.e.

$\mu_k' = m_k'$

The solution to the system, denoted by $\hat{\theta}_1, \hat{\theta}_2, ..., \hat{\theta}_k$, gives the moment estimators of the parameters θ₁, θ₂, ..., θ_k.

 A question to reflect on as we go through the topic is: are method of moments estimators (a) unbiased and (b) consistent?

Remark 7.2: Suppose the method-of-moments equations provide a one-to-one estimate of Θ given the first k sample moments. Then, if the first k population moments exist, the method-of-moments estimate is consistent.

Example 7.1: Find the method of moments estimate for λ if a random sample of size n is taken from the exponential distribution with pdf

$f(x) = f(x \mid \lambda) = \begin{cases}\lambda e^{-\lambda x} & ;\; x > 0\\ 0 & ;\;\text{otherwise}\end{cases}$

Solution:

The exponential distribution has one parameter, thus we will compute the first population moment:

$\mu_1' = E[X] = \int_{x=0}^{\infty}x\lambda e^{-\lambda x}\,dx = \lambda\int_{x=0}^{\infty}xe^{-\lambda x}\,dx = \frac{1}{\lambda}$

Using the sample, we have

$m_1' = \frac{1}{n}\sum_{i=1}^{n}x_i = \bar{x}$

Thus, by the method of moments, the population moment equals the sample moment, hence

$m_1' = \mu_1' \Rightarrow \bar{x} = \frac{1}{\hat{\lambda}} \Rightarrow \hat{\lambda} = \frac{1}{\bar{x}}$, which is the method of moments estimator.
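A minimal sketch of this estimator on simulated data, assuming an arbitrary true value λ = 2 chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)
lam_true, n = 2.0, 5000  # arbitrary true lambda, for illustration

# numpy parameterizes the exponential by the scale 1/lambda
x = rng.exponential(1 / lam_true, size=n)
print(1 / x.mean())  # MME lambda-hat = 1/x-bar, close to 2
```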
Example 7.2: Let X₁, X₂, ..., Xₙ be i.i.d. ~ B(k, p), where k and p are unknown (we write k for the binomial index to avoid confusion with the sample size n). Find the method of moments estimators of both k and p.

Solution:
The Binomial distribution has two parameters. Thus, we will compute the first and second population moments:

$\mu_1' = E[X] = kp$ and $\mu_2' = E[X^2] = \sigma^2 + \mu^2 = kp(1-p) + k^2p^2$

The first and second sample moments are

$m_1' = \frac{1}{n}\sum_{i=1}^{n}x_i = \bar{x}$ and $m_2' = \frac{1}{n}\sum_{i=1}^{n}x_i^2$

Thus, by the method of moments, the population moments equal the sample moments, hence

$m_1' = \mu_1' \Rightarrow \hat{k}\hat{p} = \bar{x}$ ......... (i)

and

$m_2' = \mu_2' \Rightarrow \hat{k}\hat{p}(1-\hat{p}) + \hat{k}^2\hat{p}^2 = \frac{1}{n}\sum_{i=1}^{n}x_i^2$ ......... (ii)

Activity 7.1: Solve equations (i) and (ii) simultaneously, and show that

$\hat{p} = \frac{\bar{x}}{\hat{k}}$ and $\hat{k} = \frac{\bar{x}^2}{\bar{x} - \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2}$

Example 7.3: A random sample X₁, X₂, ..., Xₙ is selected from a population with uniform density over the interval (0, θ), where θ is unknown. Use the method of moments to estimate the parameter θ and check for consistency.

Solution:

If X ~ U(0, θ), then

$f(x; \theta) = \begin{cases}\frac{1}{\theta} & ;\; 0 < x < \theta\\ 0 & ;\;\text{otherwise}\end{cases}$

First population moment:

$\mu_1' = E[X] = \int_{x=0}^{\theta}x\cdot\frac{1}{\theta}\,dx = \frac{1}{\theta}\left[\frac{x^2}{2}\right]_0^{\theta} = \frac{\theta}{2}$

First sample moment:

$m_1' = \frac{1}{n}\sum_{i=1}^{n}x_i = \bar{x}$

Thus $m_1' = \mu_1' \Rightarrow \frac{\hat{\theta}}{2} = \bar{x}$

$\Rightarrow \hat{\theta} = 2\bar{x}$ is the method of moments estimator for θ.

Activity 7.2: Check for unbiasedness and consistency.
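As a hint for the activity, here is a minimal simulation sketch, assuming an arbitrary true value θ = 6: the mean of 2x̄ stays at θ (unbiasedness), and its spread shrinks as n grows (consistency, in line with Remark 7.2).

```python
import numpy as np

rng = np.random.default_rng(5)
theta = 6.0  # arbitrary true value, for illustration

for n in (10, 100, 1000):
    est = 2 * rng.uniform(0, theta, size=(20_000, n)).mean(axis=1)
    # mean stays near theta (unbiased); spread shrinks with n (consistent)
    print(n, round(est.mean(), 3), round(est.std(), 3))
```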

Example 7.4: Let X ~ N(μ, σ²), where μ and σ² are unknown. Let X₁, X₂, ..., Xₙ be a random sample of size n from X. Find the method of moments estimators of μ and σ² and test for unbiasedness and consistency.

Solution:
The population moments are:

$\mu_1' = E[X] = \mu$ and $\mu_2' = E[X^2] = \sigma^2 + \mu^2$

The sample moments are:

$m_1' = \frac{1}{n}\sum_{i=1}^{n}x_i$ and $m_2' = \frac{1}{n}\sum_{i=1}^{n}x_i^2$

Hence we have the system of two equations, i.e.

$\mu_1' = m_1' \Rightarrow \hat{\mu} = \frac{1}{n}\sum_{i=1}^{n}x_i = \bar{x}$ ......... (i)

and

$\mu_2' = m_2' \Rightarrow \hat{\sigma}^2 + \hat{\mu}^2 = \frac{1}{n}\sum_{i=1}^{n}x_i^2$ ......... (ii)

Solving equations (i) and (ii) simultaneously, we have:

$\hat{\mu} = \bar{x}$ and

$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}x_i^2 - \hat{\mu}^2 = \frac{1}{n}\sum_{i=1}^{n}x_i^2 - \bar{x}^2 = \frac{1}{n}\left[\sum_{i=1}^{n}x_i^2 - n\bar{x}^2\right] = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$

Therefore, the moment estimator of μ is the sample mean, and the estimator of the variance σ² is the second sample moment about the sample mean.

Note: $\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$ is not the sample variance S² (and hence it is biased).

Activity 7.3: Check for unbiasedness and consistency.

Remark 7.3: Moment estimators:

1) May have multiple solutions.
2) Are consistent (under the conditions of Remark 7.2).
3) Need not be unbiased.
4) May not be applicable if sufficiently many population moments do not exist.
5) Have the invariance property: if $\hat{\theta}_1, \hat{\theta}_2, ..., \hat{\theta}_k$ are the MMEs of θ₁, θ₂, ..., θ_k, then the MME of $\tau(\Theta)$ is $\tau(\hat{\theta}_1, \hat{\theta}_2, ..., \hat{\theta}_k)$.

Example 7.5: Let X₁, X₂, ..., Xₙ be i.i.d. ~ Bernoulli(p). Find the MME of $\tau(p) = \frac{1}{p}$.

Solution:

The population moment is: $\mu_1' = E[X_1] = p$

The sample moment is: $m_1' = \frac{1}{n}\sum_{i=1}^{n}x_i = \bar{x}$

Hence $\mu_1' = m_1' \Rightarrow \hat{p} = \bar{x}$

Therefore, the MME of $\tau(p) = \frac{1}{p}$ is $\frac{1}{\hat{p}} = \frac{1}{\bar{x}}$

Example 7.6: Let X₁, X₂, ..., Xₙ be a random sample from the Poisson distribution with pdf

$f(x \mid \lambda) = \begin{cases}\frac{e^{-\lambda}\lambda^x}{x!} & ;\; x = 0, 1, 2, 3, ...\\ 0 & ;\;\text{otherwise}\end{cases}$

Find the moment estimator of λ and test for unbiasedness and consistency.

Solution:

The population moment is: $\mu_1' = E[X_1] = \sum_{x=0}^{\infty}x\frac{e^{-\lambda}\lambda^x}{x!} = \lambda$

The sample moment is: $m_1' = \frac{1}{n}\sum_{i=1}^{n}x_i = \bar{x}$

Hence $\mu_1' = m_1' \Rightarrow \hat{\lambda} = \bar{x}$

Therefore, the MME of λ is $\hat{\lambda} = \bar{x}$.

Activity 7.4: Check for unbiasedness and consistency.

Example 7.7: Let X₁, X₂, ..., Xₙ be i.i.d. ~ N(μ, σ²), where μ and σ² are unknown. Find the method of moments estimator of $\frac{\mu}{\sigma}$.

Solution:

From Example 7.4, $\hat{\mu} = \bar{x}$ and $\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$

Thus, by the invariance property, the method of moments estimator of $\frac{\mu}{\sigma}$ is

$\frac{\hat{\mu}}{\hat{\sigma}} = \frac{\bar{x}}{\sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2}}$
Exercise 7.1: Let X₁, X₂, ..., Xₙ be a random sample from a population having a uniform distribution on the interval (a, b), where a and b are unknown. Use the method of moments to find estimators of a and b. Let the following values be a realization of a random sample of size 10: 2.3, 4.2, 5.3, 5.7, 8.1, 2.8, 6.2, 4.4, 8.5, 3.5. Calculate the moment estimates of a and b based on these data. Do the estimates have reasonable values?

Exercise 7.2: Suppose that X₁ = 0.42, X₂ = 0.10, X₃ = 0.65, and X₄ = 0.23 is a random sample of size 4 from the pdf

$f(x) = f(x; \theta) = \begin{cases}\theta x^{\theta - 1} & ;\; 0 < x < 1\\ 0 & ;\;\text{otherwise}\end{cases}$

Find the method of moments estimate for θ.

Exercise 7.3: Find the MME of θ in the pdf

$f(x; \theta) = \begin{cases}(\theta + 2)(\theta + 1)x^{\theta}(1 - x) & ;\; 0 < x < 1\\ 0 & ;\;\text{otherwise}\end{cases}$

Exercise 7.4: Let X₁, X₂, ..., Xₙ be a random sample from a population having a Gamma distribution with parameters α and β. Find the moment estimators of α and β and test for unbiasedness and consistency.

Exercise 7.5: Let X₁, X₂, ..., Xₙ be a random sample from the beta distribution, X ~ Beta(α, β), where α and β are unknown. Find the moment estimators of α and β and test for unbiasedness and consistency.

Exercise 7.6: Let the random variable X have a uniform distribution over the interval (θ − ρ, θ + ρ), where θ and ρ are parameters. Obtain the moment estimators θ̂ and ρ̂.

Exercise 7.7: Find the moment estimator of θ from the pdf

$f(x) = f(x; \theta) = \begin{cases}\frac{2x}{\theta^2} & ;\; 0 < x < \theta\\ 0 & ;\;\text{otherwise}\end{cases}$

Exercise 7.8: Let X₁, X₂, ..., Xₙ be a random sample from a random variable X with probability function

$f(x) = f(x; \theta) = \begin{cases}(\theta + 1)x^{\theta} & ;\; 0 < x < 1;\; \theta > 0\\ 0 & ;\;\text{otherwise}\end{cases}$

Find the moment estimator of θ.


WEEK 8
TOPIC 8: MAXIMUM LIKELIHOOD ESTIMATION.

Introduction

In this topic, we introduce the concept of Maximum Likelihood Estimation. We derive the Maximum Likelihood Estimates (MLE) for parameters of various distributions.

Objectives

By the end of this topic, the learner should be able to:

6. Explain the basic theory behind maximum likelihood estimation (MLE).
7. Derive the Maximum Likelihood Estimates for parameters of different distributions: Exponential, Geometric, Binomial, Poisson, Uniform distributions, etc.
8. Find the variance of the maximum likelihood estimates of parameters.
9. Apply maximum likelihood estimates in real life situations.

Learning Activities

 Students to take note of the activities and exercises provided within the text and at the end of the topic.
Topic Resources

 Students to take note of the reference text books provided in the course outline.
 Learners to get e-learning materials from MMUST library and other links within their
reach.

8.1 LIKELIHOOD FUNCTION

Definition 8.1: Suppose that X₁, X₂, ..., Xₙ are independent identically distributed (i.i.d.) random variables from X and let x₁, x₂, ..., xₙ be the observed values of X₁, X₂, ..., Xₙ. The function of the parameters Θ = (θ₁, θ₂, ..., θ_k) defined by

$L(\Theta \mid x) = L(\Theta \mid x_1, x_2, ..., x_n) = \prod_{i=1}^{n}f(x_i \mid \Theta) = f(x_1 \mid \Theta)\cdot f(x_2 \mid \Theta)\cdots f(x_n \mid \Theta)$ if X is continuous, or

$L(\Theta \mid x) = \prod_{i=1}^{n}P(X_i = x_i \mid \Theta) = P(X_1 = x_1 \mid \Theta)\cdot P(X_2 = x_2 \mid \Theta)\cdots P(X_n = x_n \mid \Theta)$ if X is discrete,

is called the likelihood function.

 The maximum likelihood estimate (MLE), $\hat{\theta} = \hat{\theta}_{MLE}$, is the value which maximizes the likelihood function L(θ). That is, $\hat{\theta}_{MLE} = \arg\max_\theta L(\theta)$.
 Thus, to maximize the likelihood function L(θ), we solve for θ in the equation $\frac{\partial}{\partial\theta}L(\theta) = 0$ and finally check that $\frac{\partial^2}{\partial\theta^2}L(\theta) < 0$.

Remark 8.1:

a) Both L(θ) and $\ell(\theta) = \log L(\theta)$ are maximized at the same value θ̂.

b) It is easier to maximize $\ell(\theta) = \log L(\theta)$, i.e. to solve $\frac{\partial}{\partial\theta}\log L(\theta) = 0$.


8.2 MAXIMUM LIKELIHOOD ESTIMATION

Example 8.1: Find the maximum likelihood estimate for λ if a random sample of size n is taken from the exponential distribution with pdf

$f(x) = f(x \mid \lambda) = \begin{cases}\lambda e^{-\lambda x} & ;\; x > 0,\; \lambda > 0\\ 0 & ;\;\text{otherwise}\end{cases}$

Test for unbiasedness and consistency.

Solution:

The likelihood function is:

$L(\lambda \mid x) = L(\lambda) = \prod_{i=1}^{n}f(x_i \mid \lambda) = \prod_{i=1}^{n}\lambda e^{-\lambda x_i} = \lambda^n e^{-\lambda\sum x_i} = \lambda^n e^{-n\lambda\bar{x}}$

The log-likelihood function is:

$\ell(\lambda) = \log\left(\lambda^n e^{-n\lambda\bar{x}}\right) = n\ln\lambda - n\lambda\bar{x}$

Taking the partial derivative w.r.t. λ and equating to zero:

$\frac{\partial}{\partial\lambda}\ell(\lambda) = \frac{\partial}{\partial\lambda}\left[n\ln\lambda - n\lambda\bar{x}\right] = \frac{n}{\lambda} - n\bar{x}$

Take note that the moment you equate the derivative to zero, introduce a "hat" on the parameter(s) to be estimated:

$\frac{n}{\hat{\lambda}} - n\bar{x} = 0$

Solving the equation:

$\frac{n}{\hat{\lambda}} = n\bar{x} \Rightarrow \hat{\lambda} = \frac{n}{n\bar{x}} = \frac{1}{\bar{x}}$

Thus, the MLE for λ is $\hat{\lambda} = \frac{1}{\bar{x}}$

Note: If $f(x) = f(x \mid \beta) = \begin{cases}\frac{1}{\beta}e^{-\frac{x}{\beta}} & ;\; x > 0,\; \beta > 0\\ 0 & ;\;\text{otherwise}\end{cases}$, then the MLE for β is $\hat{\beta} = \bar{x}$ (i.e. the sample mean).
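As a quick numerical cross-check, the closed form λ̂ = 1/x̄ can be compared against a direct numerical maximization of the log-likelihood. A minimal sketch follows (scipy is an assumed dependency, and the data are simulated with an arbitrary λ = 1.5):

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(6)
x = rng.exponential(1 / 1.5, size=1000)  # arbitrary true lambda = 1.5

# Negative log-likelihood: -(n*ln(lambda) - lambda*sum(x))
def nll(lam):
    return -(len(x) * np.log(lam) - lam * x.sum())

res = minimize_scalar(nll, bounds=(1e-6, 100), method="bounded")
print(res.x, 1 / x.mean())  # numerical maximizer agrees with 1/x-bar
```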
Activity 8.1: Is the estimator obtained above (a) unbiased, and (b) consistent?
Example 8.2: Find the maximum likelihood estimate for θ if a random sample of size n is taken from the Geometric distribution with pdf

$f(x) = f(x \mid \theta) = \begin{cases}\theta(1-\theta)^{x-1} & ;\; x = 1, 2, 3, ...\;\text{and}\; 0 < \theta < 1\\ 0 & ;\;\text{otherwise}\end{cases}$

Test for unbiasedness and consistency.

Solution:

The likelihood function is:

$L(\theta \mid x) = L(\theta) = \prod_{i=1}^{n}f(x_i \mid \theta) = \prod_{i=1}^{n}\theta(1-\theta)^{x_i-1} = \theta^n(1-\theta)^{\sum x_i - n}$

The log-likelihood function is:

$\ell(\theta) = \log\left[\theta^n(1-\theta)^{\sum x - n}\right] = n\ln\theta + \left(\sum x - n\right)\ln(1-\theta)$

Taking the partial derivative w.r.t. θ and equating to zero:

$\frac{\partial}{\partial\theta}\ell(\theta) = \frac{\partial}{\partial\theta}\left[n\ln\theta + \left(\sum x - n\right)\ln(1-\theta)\right] = \frac{n}{\theta} - \frac{\sum x - n}{1-\theta}$

Take note that the moment you equate the derivative to zero, introduce a "hat" on the parameter(s) to be estimated. Thus:

$\frac{n}{\hat{\theta}} - \frac{\sum x - n}{1-\hat{\theta}} = 0$

Solving the equation:

$\frac{n}{\hat{\theta}} = \frac{\sum x - n}{1-\hat{\theta}}$

$\Rightarrow n(1-\hat{\theta}) = \hat{\theta}\left(\sum x - n\right)$

$\Rightarrow n - n\hat{\theta} = \hat{\theta}\sum x - \hat{\theta}n$

$\Rightarrow n = \hat{\theta}\sum x$

$\Rightarrow \hat{\theta} = \frac{n}{\sum x} = \frac{1}{\bar{x}}$

Therefore, the MLE for θ is $\hat{\theta} = \frac{1}{\bar{x}}$

Activity 8.2: Is the estimator obtained above (a) unbiased, and (b) consistent?

Example 8.3: A binomial experiment consisting of n trials resulted in observations x₁, x₂, ..., xₙ, where xᵢ = 1 if the i-th trial was a success and xᵢ = 0 otherwise. Find the maximum likelihood estimator of p, the probability of a success, and test for unbiasedness and consistency.

Solution:
The likelihood of the observed sample is:

$L(p \mid x) = L(p \mid x_1, x_2, ..., x_n) = L(p) = \prod_{i=1}^{n}f(x_i \mid p) = \prod_{i=1}^{n}p^{x_i}(1-p)^{1-x_i} = p^{\sum x_i}(1-p)^{n-\sum x_i}$

$L(p) = p^x(1-p)^{n-x}$, where $x = \sum_{i=1}^{n}x_i$

The log-likelihood function is:

$\ell(p) = \log\left[p^x(1-p)^{n-x}\right] = x\ln p + (n-x)\ln(1-p)$

Taking the derivative w.r.t. p and equating to zero:

$\frac{\partial}{\partial p}\ell(p) = \frac{\partial}{\partial p}\left[x\ln p + (n-x)\ln(1-p)\right] = \frac{x}{p} - \frac{n-x}{1-p}$

$\frac{x}{\hat{p}} - \frac{n-x}{1-\hat{p}} = 0$

$\Rightarrow \frac{x}{\hat{p}} = \frac{n-x}{1-\hat{p}}$

$\Rightarrow x(1-\hat{p}) = \hat{p}(n-x)$

$\Rightarrow x - \hat{p}x = n\hat{p} - \hat{p}x$

$\Rightarrow x = n\hat{p}$

$\Rightarrow \hat{p} = \frac{x}{n}$

Therefore, the MLE for p is $\hat{p} = \frac{x}{n}$
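A minimal sketch evaluating the log-likelihood on a grid for simulated Bernoulli data (the true value p = 0.3 is an arbitrary choice for illustration): the grid maximizer sits at x/n.

```python
import numpy as np

rng = np.random.default_rng(7)
trials = rng.binomial(1, 0.3, size=500)  # arbitrary true p = 0.3
x, n = trials.sum(), len(trials)

grid = np.linspace(0.001, 0.999, 999)
loglik = x * np.log(grid) + (n - x) * np.log(1 - grid)
print(grid[loglik.argmax()], x / n)  # grid maximizer sits at ~ x/n
```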
Activity 8.3: Is the estimator obtained above (a) unbiased, and (b) consistent?

Exercise 8.1: Find the maximum likelihood estimate for p if a random sample of size n is taken from the Binomial distribution with pdf

$f(x \mid p) = \begin{cases}\binom{m}{x}p^x(1-p)^{m-x} & ;\; x = 0, 1, 2, 3, ..., m\\ 0 & ;\;\text{otherwise}\end{cases}$

(writing m for the known binomial index, to avoid confusion with the sample size n).

Example 8.4: Let X₁, X₂, ..., Xₙ be a random sample from the Poisson distribution with pdf

$f(x \mid \lambda) = \begin{cases}\frac{e^{-\lambda}\lambda^x}{x!} & ;\; x = 0, 1, 2, 3, ...\\ 0 & ;\;\text{otherwise}\end{cases}$

Find the maximum likelihood estimate for λ and test for unbiasedness and consistency.

Solution:

The likelihood function is:

$L(\lambda \mid x) = L(\lambda) = \prod_{i=1}^{n}f(x_i \mid \lambda) = \prod_{i=1}^{n}\frac{e^{-\lambda}\lambda^{x_i}}{x_i!} = \frac{e^{-n\lambda}\lambda^{\sum x_i}}{\prod x_i!}$

The log-likelihood function is:

$\ell(\lambda) = \log\left[\frac{e^{-n\lambda}\lambda^{\sum x_i}}{\prod x_i!}\right] = -n\lambda + \sum x_i\ln\lambda - \ln\left(\prod x_i!\right)$

Taking the partial derivative w.r.t. λ and equating to zero:

$\frac{\partial}{\partial\lambda}\ell(\lambda) = \frac{\partial}{\partial\lambda}\left[-n\lambda + \sum x_i\ln\lambda - \ln\left(\prod x_i!\right)\right] = -n + \frac{\sum x_i}{\lambda}$

Solving the equation:

$-n + \frac{\sum x_i}{\hat{\lambda}} = 0$

$\Rightarrow \frac{\sum x_i}{\hat{\lambda}} = n$

$\Rightarrow \hat{\lambda} = \frac{\sum x_i}{n} = \bar{x}$

Therefore, the MLE for λ is $\hat{\lambda} = \bar{x}$

Activity 8.4: Is the estimator obtained above (a) unbiased, and (b) consistent?

Example 8.5: Let X₁, X₂, ..., Xₙ be a random sample from the Uniform distribution with pdf

f(x|θ) = 1/θ for 0 ≤ x ≤ θ, and 0 otherwise.

Find the maximum likelihood estimate for θ and test for unbiasedness and consistency.

Solution:

The likelihood function is: L(θ|x) = L(θ) = Π f(xᵢ|θ) = Π (1/θ) = (1/θ)ⁿ, valid for θ ≥ max(x₁, ..., xₙ); otherwise L(θ) = 0.

The log-likelihood function is: ℓ(θ) = log[(1/θ)ⁿ] = −n ln θ

Taking partial derivative w.r.t. θ and equating to zero:

∂ℓ(θ)/∂θ = ∂/∂θ [−n ln θ] = −n/θ

In solving the above equation, we note that ∂ℓ(θ)/∂θ = −n/θ̂ < 0 for θ̂ > 0, so the derivative never equals zero and there is no stationary point.

Hence L(θ) is a monotonically decreasing function of θ over the admissible range θ ≥ x₍ₙ₎, and it is therefore maximized at the smallest admissible value of θ, namely the largest observation in the sample, i.e. θ̂ = x₍ₙ₎ = max(X₁, X₂, ..., Xₙ).

Therefore, the MLE for θ is θ̂ = x₍ₙ₎ = max(X₁, X₂, ..., Xₙ).

Activity 8.5: Is the estimator obtained above (a) unbiased and (b) consistent?
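N/B (illustrative, not part of the original notes): a quick sketch confirming that the likelihood is maximized at the largest observation; θ and n are arbitrary.

import numpy as np

rng = np.random.default_rng(2)
theta_true = 5.0
x = rng.uniform(0, theta_true, size=30)

def L(theta):
    # L(theta) = theta^(-n) for theta >= max(x), and 0 otherwise
    return np.where(theta >= x.max(), theta**(-float(len(x))), 0.0)

grid = np.linspace(3.0, 7.0, 4001)
print(grid[np.argmax(L(grid))], x.max())  # the maximizer sits at the largest observation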

Theorem xx: Invariance property of MLE: If θ̂₁, θ̂₂, ..., θ̂ₖ are the MLEs of θ₁, θ₂, ..., θₖ, then the MLE of any function τ(θ₁, θ₂, ..., θₖ) is τ(θ̂₁, θ̂₂, ..., θ̂ₖ).

Example 8.6: The variance of an exponential distribution having parameter θ is 1/θ². Since the MLE of θ is 1/x̄, the MLE of the variance is x̄².

Example 8.7: Let X₁, X₂, ..., Xₙ be a random sample from the normal distribution, X ~ N(μ, σ²), where μ and σ² are unknown. Find the maximum likelihood estimates for μ and σ² and test for unbiasedness and consistency.

Solution:

If X ~ N(μ, σ²), then the pdf is

f(x|μ, σ²) = (1/(σ√(2π))) e^(−(x−μ)²/(2σ²)) for −∞ < x < ∞, −∞ < μ < ∞, σ² > 0, and 0 otherwise.

The likelihood function is:

L(μ, σ²|x) = Π f(xᵢ|μ, σ²) = Π (1/(σ√(2π))) e^(−(xᵢ−μ)²/(2σ²)) = (2π)^(−n/2)(σ²)^(−n/2) e^(−Σ(xᵢ−μ)²/(2σ²))

The log-likelihood function is:

ℓ(μ, σ²) = log[(2π)^(−n/2)(σ²)^(−n/2) e^(−Σ(xᵢ−μ)²/(2σ²))]

ℓ(μ, σ²) = −(n/2) ln σ² − (n/2) ln 2π − (1/(2σ²)) Σ(xᵢ−μ)²

Taking partial derivative w.r.t. μ and equating to zero:

∂ℓ/∂μ = ∂/∂μ [−(n/2) ln σ² − (n/2) ln 2π − (1/(2σ²)) Σ(xᵢ−μ)²] = (1/σ²) Σ(xᵢ−μ)

∂ℓ/∂μ = (1/σ̂²) Σ(xᵢ−μ̂) = 0

Solving the equation:

(1/σ̂²) Σ(xᵢ−μ̂) = 0 ⇒ Σ(xᵢ−μ̂) = 0 ⇒ Σxᵢ − nμ̂ = 0 ⇒ μ̂ = Σxᵢ/n = x̄

Taking partial derivative w.r.t. σ² and equating to zero:

∂ℓ/∂σ² = ∂/∂σ² [−(n/2) ln σ² − (n/2) ln 2π − (1/(2σ²)) Σ(xᵢ−μ)²] = −n/(2σ²) + (1/(2σ⁴)) Σ(xᵢ−μ)²

∂ℓ/∂σ² = −n/(2σ̂²) + (1/(2σ̂⁴)) Σ(xᵢ−μ̂)² = 0

Solving the equation (multiplying through by 2σ̂⁴):

−nσ̂² + Σ(xᵢ−μ̂)² = 0

Substituting μ̂ = x̄, we have σ̂² = (1/n) Σ(xᵢ−x̄)².

Therefore, the MLE for μ is μ̂ = x̄ and that for σ² is σ̂² = (1/n) Σ(xᵢ−x̄)².

Activity 8.6: Is the estimator obtained above (a) unbiased and (b) consistent?
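N/B (illustrative, not part of the original notes): a sketch computing both normal MLEs on simulated data; the parameter values are arbitrary. Note the MLE of σ² uses divisor n, so it differs from the unbiased estimator s² with divisor n − 1.

import numpy as np

rng = np.random.default_rng(3)
mu_true, var_true = 10.0, 4.0                    # arbitrary values
x = rng.normal(mu_true, np.sqrt(var_true), 200)

mu_hat = x.mean()                                # MLE of mu
var_hat = ((x - mu_hat) ** 2).mean()             # MLE of sigma^2 (divisor n, not n-1)
print(mu_hat, var_hat)
print(x.var(ddof=0), x.var(ddof=1))              # MLE (biased) vs unbiased estimate s^2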

Activity 8.7: Refer to Example 8.7. Find the maximum likelihood estimates for:

a) μ when σ² is known.
b) σ² when μ is known.

Activity 8.8: Is the estimator obtained above (a) unbiased and (b) consistent?

Exercise 8.2: Let X ~ N(μ, 1) and let x₁, x₂, ..., xₙ be a sample from X. Find the maximum likelihood estimate of μ and test for unbiasedness and consistency.

Exercise 8.3: Let X₁, X₂, ..., Xₙ be a random sample from the gamma distribution, X ~ Gamma(α, β), where α and β are unknown, i.e.

f(x|α, β) = (1/(β^α Γ(α))) x^(α−1) e^(−x/β) for x > 0, α > 0, β > 0, and 0 otherwise.

Find the maximum likelihood estimates of α and β and test for unbiasedness and consistency.

Exercise 8.4: Let X₁, X₂, ..., Xₙ be a random sample from the beta distribution, X ~ Beta(α, β), where α and β are unknown. Find the maximum likelihood estimates of α and β and test for unbiasedness and consistency.

Remark 8.2: The maximum likelihood estimators:


a) Need not be unbiased
b) Are consistent estimators (under regularity conditions)
c) Are asymptotically most efficient estimators

8.3 VARIANCE OF THE MAXIMUM LIKELIHOOD ESTIMATORS

Refer to Example 8.4 on the Poisson distribution. Find the variance of the MLE for the parameter λ.

Solution:

Since the MLE for λ is λ̂ = x̄, the variance of λ̂ is

Var(λ̂) = Var(x̄) = Var((1/n)Σxᵢ) = (1/n²) Var(Σxᵢ) = (1/n²) Σ Var(xᵢ)

For the Poisson distribution, Var(xᵢ) = λ, so

Var(λ̂) = (1/n²) × nλ = λ/n

Alternatively, we can use the theorem below.

Theorem xx: If θ̂ is the MLE of θ, then

Var(θ̂) = [−E(∂²/∂θ² log L(θ))]⁻¹ = [−E(∂²ℓ(θ)/∂θ²)]⁻¹

The likelihood function of the Poisson distribution is: L(λ|x) = L(λ) = e^(−nλ)λ^(Σxᵢ)/Πxᵢ!

The log-likelihood function is: ℓ(λ) = −nλ + Σxᵢ ln λ − ln(Πxᵢ!)

Taking partial derivative w.r.t. λ:

∂ℓ(λ)/∂λ = ∂/∂λ [−nλ + Σxᵢ ln λ − ln(Πxᵢ!)] = −n + Σxᵢ/λ

Second derivative:

∂²ℓ(λ)/∂λ² = ∂/∂λ [−n + Σxᵢ/λ] = −Σxᵢ/λ²

Thus −E(∂²/∂λ² log L(λ)) = −E(−Σxᵢ/λ²) = E(Σxᵢ)/λ² = nλ/λ² = n/λ

Therefore Var(λ̂) = [−E(∂²/∂λ² log L(λ))]⁻¹ = (n/λ)⁻¹ = λ/n

Therefore, the variance of the MLE λ̂ is λ/n.
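N/B (illustrative, not part of the original notes): a simulation sketch checking Var(λ̂) ≈ λ/n; λ, n and the replicate count are arbitrary.

import numpy as np

rng = np.random.default_rng(4)
lam, n, reps = 3.0, 50, 20000                  # arbitrary values
samples = rng.poisson(lam, size=(reps, n))
lam_hats = samples.mean(axis=1)                # MLE lambda_hat = x_bar per replicate
print(lam_hats.var())                          # empirical variance of the MLE
print(lam / n)                                 # theoretical Var(lambda_hat) = lambda/n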

Exercise 8.5: For each of Examples 8.1 to 8.7, find the variance of the MLE for the parameter.

Exercise 8.6: Let X₁, X₂, ..., Xₙ be a random sample of size n from the normal distribution X ~ N(μ, σ²), whose pdf is

f(x|μ, σ²) = (1/(σ√(2π))) e^(−(x−μ)²/(2σ²)) for −∞ < x < ∞, σ > 0, and 0 otherwise.

a) Find the maximum likelihood estimate μ̂ for the parameter μ.
b) Find the variance of the MLE μ̂ for the parameter μ.
c) Test for unbiasedness and consistency of the MLE μ̂.
WEEK 9
TOPIC 9: METHODS OF MINIMUM VARIANCE
Introduction

There are a number of ways of measuring the goodness of an estimator θ̂ of θ. Preference is given to unbiased estimators with as small a variance as possible; the MVUE and UMVUE formalize this idea. The Cramer-Rao inequality gives a lower bound on the variance of any unbiased estimator, and hence the lower bound attained by an MVUE.

Uses of the Cramer-Rao inequality
i) It gives a lower bound for the variance of any unbiased estimator.
ii) An estimator whose variance coincides with the Cramer-Rao lower bound is an MVUE.

Objectives

By the end of this topic, learner should be able to:

10. Compute the MVUE.


11. Compute the UMVUE
12. Compute the Cramer-Rao lower bound

Learning Activities
 Students to take note of the activities and exercises provided within the text and at the
end of the topic.

Topic Resources

 Students to take note of the reference text books provided in the course outline.
 Learners to get e-learning materials from MMUST library and other links within their
reach.

9.0 METHODS OF MINIMUM VARIANCE

If a statistic T  t ( x1 , x2 , x3 ,.......xn , ) based on a sample of size n is such that

i) T is unbiased
ii) T has the minimum variance

among the class of all unbiased estimators, then T is said to be the minimum variance unbiased estimator (MVUE) of θ.

T is MVUE if
i) E(T) = θ
ii) E(T′) = θ
iii) var(T) ≤ var(T′)

where T′ is any other unbiased estimator of θ.


Illustration: 9.0.1 Regularity condition


Let x₁, x₂, ..., xₙ be a random sample of size n from a distribution f(x, θ) where θ is unknown. Let T = t(x₁, x₂, ..., xₙ) be unbiased for θ. We make the following assumptions, called regularity conditions:

i) θ is real, and 0 < E[(∂/∂θ) log f(x, θ)]² < ∞

ii) ∂/∂θ ∫ L dx = ∫ (∂L/∂θ) dx

iii) ∂/∂θ ∫ T L dx = ∫ T (∂L/∂θ) dx, where ∫ T L dx = E[T]

Illustration: 9.0.2 Schwartz inequality

Let f(x) and g(x) be two functions. Then

∫ f²(x)dx · ∫ g²(x)dx ≥ [∫ f(x)g(x)dx]²

The equality sign holds iff f(x) = k g(x), where k is some constant.

Theorem: 9.0.0
Let x1, x2, ........xn, be a random sample of size n from a distribution f ( x, ) where  is
unknown. Let T  t ( x1 , x2 , x3 ,.......xn , ) be unbiased for  . Then the necessary and sufficient
condition for T to be MVUE of  is
∂ log L/∂θ = (T − θ)/λ

where λ is a constant which may be a function of the parameter θ but is independent of x.

Proof
We are to find T such that
i) E[T] = θ
ii) var(T) = E[T − E(T)]² = E[T − θ]² is a minimum.

From (i), E[T] = ∫ T L dx = θ. Differentiating w.r.t. θ:

∂/∂θ ∫ T L dx = 1

∫ T (∂L/∂θ) dx = 1 .........(1)

Also ∫ L dx = 1, so

∂/∂θ ∫ L dx = 0

∫ (∂L/∂θ) dx = 0 .........(2)

Multiplying equation (2) by θ and subtracting from equation (1):

∫ T (∂L/∂θ) dx − θ ∫ (∂L/∂θ) dx = 1 − 0

∫ (T − θ)(∂L/∂θ) dx = 1

But

∂ log L/∂θ = (1/L)(∂L/∂θ), so ∂L/∂θ = L (∂ log L/∂θ)

∫ (T − θ) L (∂ log L/∂θ) dx = 1 .........(3)

Writing (T − θ) L (∂ log L/∂θ) = [(T − θ)√L] × [√L (∂ log L/∂θ)] and using the Schwartz inequality:

[∫ (T − θ) L (∂ log L/∂θ) dx]² ≤ ∫ (T − θ)² L dx · ∫ L (∂ log L/∂θ)² dx

The equality holds iff

(T − θ)√L = λ√L (∂ log L/∂θ), i.e.

∂ log L/∂θ = (T − θ)/λ .........(4)
 
Theorem:9.0.1
If T is MVUE of θ, then var(T) = λ.

Proof

var(T) = E[T − E(T)]² = E[T − θ]²

From equation (4),

var(T) = E[λ (∂ log L/∂θ)]² = λ² E[∂ log L/∂θ]²

It can be shown that

E[∂ log L/∂θ]² = −E[∂² log L/∂θ²]

so

var(T) = −λ² E[∂² log L/∂θ²]

= −λ² E[∂/∂θ (∂ log L/∂θ)]

= −λ² E[∂/∂θ ((T − θ)/λ)] ............ from (4)

= −λ² E[(T − θ) ∂/∂θ(1/λ) − 1/λ]

= −λ² [∂/∂θ(1/λ) E(T − θ) − 1/λ]

= −λ² × (−1/λ), since E[T − θ] = E[T] − θ = θ − θ = 0

var(T) = λ, where ∂ log L/∂θ = (T − θ)/λ

Example 9.0.0

Let x ~ N(u, σ²) where σ² is known. Obtain the MVUE of u and find its variance.

Solution:

If T is MVUE of u, then

∂ log L/∂u = (T − u)/λ, where var(T) = λ

f(x, u) = (1/√(2πσ²)) e^(−(x−u)²/(2σ²))

L = Π f(x, u) = (2π)^(−n/2)(σ²)^(−n/2) e^(−Σ(x−u)²/(2σ²))

log L = −(n/2) log 2π − (n/2) log σ² − (1/(2σ²)) Σ(x − u)²

∂ log L/∂u = −(1/(2σ²)) Σ 2(x − u)(−1) = (1/σ²) Σ(x − u) = (n/σ²)(x̄ − u)

∂ log L/∂u = (x̄ − u)/(σ²/n)

This has the form (T − u)/λ with T = x̄ and λ = σ²/n, and E[T] = E[x̄] = u.

Therefore T = x̄ is the MVUE for u and var(T) = σ²/n.
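N/B (illustrative, not part of the original notes): a simulation sketch checking that x̄ is unbiased for u with variance σ²/n; all numeric values are arbitrary.

import numpy as np

rng = np.random.default_rng(5)
u, sigma2, n, reps = 2.0, 9.0, 25, 20000
samples = rng.normal(u, np.sqrt(sigma2), size=(reps, n))
T = samples.mean(axis=1)                       # T = x_bar, the MVUE of u
print(T.mean())                                # approximately u (unbiased)
print(T.var(), sigma2 / n)                     # empirical variance vs sigma^2/n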
Activity9.0

Suppose x ~ N(0, σ²). Find the MVUE for σ².

9.1 UNIFORMLY MINIMUM VARIANCE UNBIASED ESTIMATOR (UMVUE)

Illustration 9.1.1

Let x₁, x₂, ..., xₙ be a random sample from f(x, θ). An estimator T* = t*(x₁, ..., xₙ) is defined to be the uniformly minimum variance unbiased estimator of τ(θ) iff

i) E[T*] = τ(θ)
ii) var(T*) ≤ var(T) for any other estimator T = t(x₁, ..., xₙ) of τ(θ) which satisfies E[T] = τ(θ)

Illustration 9.1.2 Cramer-Rao Inequality

Theorem: 9.1.0

var(T) ≥ [τ′(θ)]² / ( n E[(∂/∂θ) log f(x, θ)]² ) .........(*)

where T = t(x₁, ..., xₙ) is an unbiased estimator of τ(θ). Equality is attained iff there exists a function, say k(θ, n), such that

Σᵢ (∂/∂θ) log f(xᵢ, θ) = k(θ, n)[t(x₁, ..., xₙ) − τ(θ)] .........(**)

Equation (*) is called the Cramer-Rao inequality; the right-hand side of the equation is called the Cramer-Rao lower bound for the variance of an unbiased estimator of τ(θ).

Equation (**) helps in finding an estimator whose variance coincides with the Cramer-Rao lower bound.

If T* = t*(x₁, ..., xₙ) is such that

Σᵢ (∂/∂θ) log f(xᵢ, θ) = k(θ, n)[t*(x₁, ..., xₙ) − τ*(θ)]

for some function k(θ, n) and τ*(θ), then T* is the MVUE of τ*(θ).

Theorem: 9.1.1
Let x₁, ..., xₙ be a random sample of size n from f(x, θ) where θ is an unknown parameter. Let T = t(x₁, ..., xₙ) be unbiased for a function of θ, τ(θ), i.e. E(T) = τ(θ). Then the Cramer-Rao inequality is given by

var(T) ≥ [τ′(θ)]² / I(θ), where

τ′(θ) = dτ(θ)/dθ

I(θ) = E[(∂ log L/∂θ)²] = −E[∂² log L/∂θ²]

Illustration 9.1.3
Cramer-Rao inequality expression:

var(φ(x)) ≥ [g′(θ)]² / ( n E[(∂/∂θ) log f(x, θ)]² )
Proof
E[φ(x)] = g(θ), i.e. g(θ) = E[φ(x)]. But E[φ(x)] = ∫ φ(x) f(x, θ) dx, so

g(θ) = ∫ φ(x) Πᵢ f(xᵢ, θ) dxᵢ

Differentiating w.r.t. θ:

g′(θ) = ∂/∂θ ∫ φ(x) Πᵢ f(xᵢ, θ) dxᵢ

= ∫ φ(x) (∂/∂θ) Πᵢ f(xᵢ, θ) dxᵢ

= ∫ φ(x) [Σᵢ (∂/∂θ) f(xᵢ, θ) / f(xᵢ, θ)] Πᵢ f(xᵢ, θ) dxᵢ

= ∫ φ(x) [Σᵢ (∂/∂θ) log f(xᵢ, θ)] Πᵢ f(xᵢ, θ) dxᵢ

Let

T = Σᵢ (∂/∂θ) log f(xᵢ, θ)

Then g′(θ) = ∫ φ(x) T Πᵢ f(xᵢ, θ) dxᵢ = E[φ(x)T] = cov(φ(x), T)

This is because E[T] = 0.

Showing that E[T] = 0:

E[T] = E[Σᵢ (∂/∂θ) log f(xᵢ, θ)]

= ∫ [Σᵢ (∂/∂θ) log f(xᵢ, θ)] Πᵢ f(xᵢ, θ) dxᵢ

= ∫ [Σᵢ (∂/∂θ) f(xᵢ, θ)/f(xᵢ, θ)] Πᵢ f(xᵢ, θ) dxᵢ

= ∂/∂θ ∫ Πᵢ f(xᵢ, θ) dxᵢ

= ∂/∂θ (1) = 0

E[T] = 0

Now

cov(φ(x), T) = E[φ(x)T] − E[φ(x)]E[T] = E[φ(x)T]

Also

var(T) = E[T²] − [E(T)]², but E[T] = 0, so

var(T) = E[T²] = E[(Σᵢ (∂/∂θ) log f(xᵢ, θ))²]

Given that the squared correlation coefficient lies between 0 and 1, i.e. 0 ≤ ρ² ≤ 1,

0 ≤ [cov(φ(x), T)]² / (var(φ(x)) var(T)) ≤ 1

Taking only the upper bound,

[cov(φ(x), T)]² / var(T) ≤ var(φ(x))

[cov(φ(x), T)]² / E[(Σᵢ (∂/∂θ) log f(xᵢ, θ))²] ≤ var(φ(x))

But cov(φ(x), T) = g′(θ), so

[g′(θ)]² / E[(Σᵢ (∂/∂θ) log f(xᵢ, θ))²] ≤ var(φ(x))

Using the independence of the xᵢ (and E[(∂/∂θ) log f(xᵢ, θ)] = 0), the above reduces to

var(φ(x)) ≥ [g′(θ)]² / ( n E[((∂/∂θ) log f(x, θ))²] )

Hence the proof.

Example: 9.1.0
Let xᵢ ~ N(u, θ) where u is known. Find the MVUE of θ.

Solution

Use the characteristic equation

∂ log L/∂θ = (T − τ(θ)) I(θ)/τ′(θ), i.e. T = τ(θ) + (τ′(θ)/I(θ)) ∂ log L/∂θ

where τ(θ) = θ and τ′(θ) = 1.

L = Πᵢ (1/√(2πθ)) e^(−(xᵢ−u)²/(2θ))

log L = −(n/2) log 2π − (n/2) log θ − (1/(2θ)) Σ(xᵢ − u)²

∂ log L/∂θ = −n/(2θ) + (1/(2θ²)) Σ(xᵢ − u)²

∂² log L/∂θ² = n/(2θ²) − (1/θ³) Σ(xᵢ − u)²

Since E[Σ(xᵢ − u)²] = nθ,

E[∂² log L/∂θ²] = n/(2θ²) − nθ/θ³ = n/(2θ²) − n/θ² = −n/(2θ²)

But I(θ) = −E[∂² log L/∂θ²] = n/(2θ²)

T = θ + (2θ²/n)[−n/(2θ) + (1/(2θ²)) Σ(xᵢ − u)²] = θ − θ + (1/n) Σ(xᵢ − u)²

T = (1/n) Σ(xᵢ − u)², hence the MVUE.

var(T) = (1/n²) var(Σ(xᵢ − u)²) = (θ²/n²) var( Σ((xᵢ − u)²/θ) ) = (θ²/n²)(2n) = 2θ²/n
Example: 9.1.1
Let x₁, ..., xₙ be a random sample from

f(x, θ) = θe^(−θx), x ≥ 0

Find the Cramer-Rao lower bound for the variance of an unbiased estimator of:

i) τ(θ) = θ
ii) τ(θ) = 1/θ

Solution

i) If τ(θ) = θ then τ′(θ) = 1

f(x, θ) = θe^(−θx)
log f(x, θ) = log θ − θx
(∂/∂θ) log f(x, θ) = 1/θ − x

E[(∂/∂θ) log f(x, θ)]² = E[(1/θ − x)²] = var(x) = 1/θ²

since E[x] = ∫ x f(x)dx = 1/θ, E[x²] = ∫ x² f(x)dx = 2/θ², and

var(x) = E[x²] − [E(x)]² = 1/θ²

From the above,

var(T) ≥ 1 / (n × 1/θ²)

var(T) ≥ θ²/n

ii) If τ(θ) = 1/θ, then τ′(θ) = −1/θ²

var(T) ≥ [τ′(θ)]² / ( n E[((∂/∂θ) log f(x, θ))²] )

var(T) ≥ (−1/θ²)² / (n × 1/θ²) = (1/θ⁴)(θ²/n)

var(T) ≥ 1/(nθ²)

where T is an unbiased estimator of τ(θ) = 1/θ.

Activity: 9.1.0

If a random variable Y has a density function f(y|θ), state the definition of the Fisher information I(θ).

Activity: 9.1.1

Suppose that Y ~ Exp(θ) for some parameter θ > 0. Calculate I(θ).

Example: 9.1.2

Suppose Y ~ N(θ, σ²) where θ is a parameter and σ is known. Calculate I(θ).

Solution

The density of Y is

f(y|θ) = (1/(σ√(2π))) exp(−(y − θ)²/(2σ²)), −∞ < y < ∞

Given that the density depends on θ only (since σ² is known), we have

log f(y|θ) = −log(σ√(2π)) − (y − θ)²/(2σ²)

So that

(∂/∂θ) log f(y|θ) = (y − θ)/σ²  and  (∂²/∂θ²) log f(y|θ) = −1/σ²

The Fisher information then becomes

I(θ) = −E[(∂²/∂θ²) log f(Y|θ)] = −E(−1/σ²) = 1/σ²
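N/B (illustrative, not part of the original notes): since the score has mean zero, I(θ) also equals the variance of the score, which this sketch estimates by simulation; θ and σ are arbitrary.

import numpy as np

rng = np.random.default_rng(6)
theta, sigma = 1.5, 2.0                        # arbitrary values
y = rng.normal(theta, sigma, size=200_000)

score = (y - theta) / sigma**2                 # d/d(theta) log f(y|theta)
print(score.var())                             # empirical I(theta) = Var(score)
print(1 / sigma**2)                            # theoretical Fisher information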

Exercises: 9.1.0

Suppose that Y₁, ..., Yₙ is a random sample from a population having a common density function f(y|θ) depending on a parameter θ, and let θ̂ be an unbiased estimator of θ based on Y₁, ..., Yₙ. Derive the Cramer-Rao inequality, assuming f(y|θ) is a smooth function of y and θ.

Exercises: 9.1.1

Suppose that Y₁, ..., Yₙ is a random sample from an Exp(θ) population depending on a parameter θ > 0. Prove that θ̂ = Ȳ is the minimum variance unbiased estimator of θ.

Exercises: 9.1.2

Suppose that Y₁, ..., Yₙ is a random sample from N(μ, σ²) where σ² is known but μ is a parameter. Prove that μ̂ = Ȳ is the minimum variance unbiased estimator of μ.

Exercises: 9.1.3

Find the UMVUE of τ(θ) = 1/θ for a random sample x₁, ..., xₙ taken from the pdf

f(x, θ) = θe^(−θx), x ≥ 0
WEEK 10
TOPIC 10: METHOD OF LEAST SQUARE

Objectives

By the end of this topic, learner should be able to:

13. Define linear model.


14. Estimate parameters of simple linear model
15. Apply the model to data

Learning Activities

 Students to take note of the activities and exercises provided within the text and at the
end of the topic.

Topic Resources

 Students to take note of the reference text books provided in the course outline.
 Learners to get e-learning materials from MMUST library and other links within their
reach.
10.1: METHOD OF LEAST SQUARES

Definition of least squares

It is a method of fitting a curve to a set of points representing statistical data in such a way that the sum of the squares of the distances of the points from the curve is a minimum.

The least squares principle states that the line should be drawn through the plotted points in such a manner that the sum of squares of the deviations of the actual y values from the computed y values is the least (i.e. we obtain the line of best fit).

Recall: Regression Equation. (Estimating Equation)

There are two regression equations:

(i) Regression Equation of Y on X

It is used to describe the variation in the values of Y for given changes in X. It is expressed as follows:

y = a + bX

where

y is the dependent variable (i.e. depends on X)
X is the independent variable
a is the y-intercept
b is the slope of the line

a and b are constants and can be obtained by the method of least squares.

(ii) Regression Equation of X on Y

It is used to describe the variation in the values of X for given changes in Y. It is expressed as

X = a + bY

where

X is the dependent variable (i.e. depends on Y)
Y is the independent variable
a is the intercept
b is the slope of the line

a and b are constants and can be obtained by the method of least squares.

To determine the values of a and b, the two normal equations are to be solved, i.e. Σ(y − a − bx)² should be a minimum.

Since y = a + bx, let

S = Σ(y − a − bx)²

We first minimise S by differentiating w.r.t. a, i.e.

dS/da = 2Σ(y − a − bx)(−1) = 0
Σ(y − a − bx) = 0
Σy − na − bΣx = 0

therefore

Σy = na + bΣx .........(i)

We also minimise S with respect to b:

dS/db = 2Σ(y − a − bx)(−x) = 0
Σ(y − a − bx)(x) = 0
Σxy − aΣx − bΣx² = 0

This implies that

Σxy = aΣx + bΣx² .........(ii)

To get the values of a and b, solve (i) and (ii) by any method of solving simultaneous linear equations.

Exercise: Find the regression equation of Y on X for the following data.

X: 6  2  10  12  4  8
Y: 9  11 15  10  8  7
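N/B (illustrative, not part of the original notes): a Python sketch that solves the two normal equations (i) and (ii) for this exercise's data; the printed coefficients are computed, not quoted from the notes.

import numpy as np

x = np.array([6, 2, 10, 12, 4, 8], dtype=float)
y = np.array([9, 11, 15, 10, 8, 7], dtype=float)

n = len(x)
# normal equations:  sum(y)  = n*a      + b*sum(x)
#                    sum(xy) = a*sum(x) + b*sum(x^2)
A = np.array([[n, x.sum()], [x.sum(), (x**2).sum()]])
rhs = np.array([y.sum(), (x*y).sum()])
a, b = np.linalg.solve(A, rhs)
print(f"y = {a:.4f} + {b:.4f} x")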

10.2 Least Square Estimation

The basic linear model assumes the existence of a linear relationship between two variables x and y which is disturbed by some random error ε. Hence for each value of x the corresponding y-value is a random variable of the form

y = β₀ + β₁X + ε ............................(1)

where β₀ and β₁ are the intercept parameter and the slope parameter, respectively, of the linear function β₀ + β₁X.

If n values xᵢ, i = 1, 2, ..., n of x are observed with corresponding errors εᵢ, i = 1, 2, ..., n, then the resulting random variables yᵢ, i = 1, 2, ..., n are given by

yᵢ = β₀ + β₁xᵢ + εᵢ ; i = 1, 2, ..., n .........(2) (simple linear regression model)

In this context it is assumed that the random errors εᵢ, i = 1, 2, ..., n are iid with mean zero and variance σ², so that

E(εᵢ) = 0 ; i = 1, 2, ..., n .........(3)

and E(y) = β₀ + β₁x

Var(εᵢ) = σ² ; i = 1, 2, ..., n .........(4)

and var(y) = σ², cov(εᵢ, εⱼ) = 0 for i ≠ j.

If the values of y corresponding to xᵢ, i = 1, 2, ..., n are observed and denoted by yᵢ, i = 1, 2, ..., n, then the least squares estimation (L.S.E.) problem is to find estimates β̂₀ and β̂₁ of the unknown parameter values β₀ and β₁ which minimize the sum of squared residuals, i.e.

S(β₀, β₁) = Σᵢ (yᵢ − β₀ − β₁xᵢ)² .........(5)

This function is easily seen to be convex and differentiable in β₀ and β₁, so that the unique solution (β̂₀, β̂₁) is given by the first-order conditions:

∂S/∂β₀ (β̂₀, β̂₁) = −2 Σᵢ (yᵢ − β̂₀ − β̂₁xᵢ) = 0 .........(6)

∂S/∂β₁ (β̂₀, β̂₁) = −2 Σᵢ (yᵢ − β̂₀ − β̂₁xᵢ) xᵢ = 0 .........(7)

If we let

x̄ = (1/n) Σxᵢ  and  ȳ = (1/n) Σyᵢ

then by (6),

Σyᵢ − nβ̂₀ − β̂₁ Σxᵢ = 0 .........(8)

which implies that

(1/n) Σyᵢ − β̂₀ − β̂₁ (1/n) Σxᵢ = 0

i.e.

ȳ − β̂₀ − β̂₁x̄ = 0

β̂₀ = ȳ − β̂₁x̄

And by (7),

Σᵢ (yᵢ − β̂₀ − β̂₁xᵢ) xᵢ = 0 .........(9)

To simplify (9), let the estimated y-value corresponding to (β̂₀, β̂₁) be defined by

ŷᵢ = β̂₀ + β̂₁xᵢ ; i = 1, 2, ..., n .........(10)

and rewrite (9) as

Σ (yᵢ − ŷᵢ) xᵢ = 0 .........(11)

Note from (8) that

Σᵢ (ŷᵢ − yᵢ) = Σᵢ ŷᵢ − Σᵢ yᵢ = Σᵢ (β̂₀ + β̂₁xᵢ) − nȳ = (nβ̂₀ + β̂₁Σxᵢ) − nȳ = n(β̂₀ + β̂₁x̄) − nȳ = 0 .........(12)

To solve for β̂₁, we first observe, by subtracting (8) from (10), that

ŷᵢ − ȳ = β̂₁(xᵢ − x̄) .........(13)

so that (yᵢ − ŷᵢ) + (ŷᵢ − ȳ) = yᵢ − ȳ ; i = 1, 2, ..., n.

Hence, multiplying both sides by (xᵢ − x̄) and summing over i, we obtain

Σ(yᵢ − ŷᵢ)(xᵢ − x̄) + β̂₁ Σ(xᵢ − x̄)² = Σ(yᵢ − ȳ)(xᵢ − x̄) .........(14)

But since (11) and (12) imply

Σ(yᵢ − ŷᵢ)(xᵢ − x̄) = Σ(yᵢ − ŷᵢ)xᵢ − x̄ Σ(yᵢ − ŷᵢ) = 0 .........(15)

we conclude from (14) that

β̂₁ = Σ(yᵢ − ȳ)(xᵢ − x̄) / Σ(xᵢ − x̄)² .........(16)

By employing (8), we may solve for β̂₀ in terms of β̂₁ as

β̂₀ = ȳ − β̂₁x̄ .........(17)

N/B

1. E(β̂₀) = β₀
2. E(β̂₁) = β₁
3. Var(β̂₁) = σ² / Σ(x − x̄)²

Therefore, in matrix notation (developed below),

β̂ = (x′x)⁻¹x′y

E(β̂) = (x′x)⁻¹x′E(y) = (x′x)⁻¹x′xβ = β

Var(β̂) = (x′x)⁻¹x′ var(y) x(x′x)⁻¹ = σ²(x′x)⁻¹

Summary

Let S = Σᵢεᵢ² = Σᵢ (yᵢ − β₀ − β₁xᵢ)²

Setting ∂S/∂β₁ = 0 gives Σ(yᵢ − β₀ − β₁xᵢ)xᵢ = 0, i.e.

Σxᵢyᵢ − β₀Σxᵢ − β₁Σxᵢ² = 0

β₁Σxᵢ² = Σxᵢyᵢ − β₀Σxᵢ

Substituting β̂₀ = ȳ − β̂₁x̄:

β̂₁Σxᵢ² = Σxᵢyᵢ − (ȳ − β̂₁x̄)Σxᵢ

β̂₁Σxᵢ² − β̂₁x̄Σxᵢ = Σxᵢyᵢ − ȳΣxᵢ

Since Σxᵢ = nx̄,

β̂₁(Σxᵢ² − nx̄²) = Σxᵢyᵢ − nx̄ȳ

β̂₁ = (Σxᵢyᵢ − nx̄ȳ) / (Σxᵢ² − nx̄²)

Example

Students in STA341 class claimed that doing the assignment had not helped them prepare for

the main exam. The exam score y and assignment score x for the 18 students were as follows:

X Y

96 95
77 80

0 0

0 0

78 79

64 77

89 72

49 66

90 98

93 90

18 0

86 95

0 35

30 50

59 72

77 55

74 75

67 66

a) Obtain β̂₀ and β̂₁ and write the prediction equation.

b) Test the hypothesis H₀: β₁ = 0.

c) Find the coefficient of determination.

Answer
β̂₀ = 10.7269, β̂₁ = 0.8726

Prediction equation: ŷ = 10.73 + 0.8726x
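N/B (illustrative, not part of the original notes): a sketch applying the β̂₁ and β̂₀ formulas to the 18 observations above. Small discrepancies from the quoted answers would point to rounding or data-transcription differences.

import numpy as np

X = np.array([96, 77, 0, 0, 78, 64, 89, 49, 90, 93, 18, 86, 0, 30, 59, 77, 74, 67], float)
Y = np.array([95, 80, 0, 0, 79, 77, 72, 66, 98, 90, 0, 95, 35, 50, 72, 55, 75, 66], float)

n = len(X)
b1 = (np.sum(X*Y) - n*X.mean()*Y.mean()) / (np.sum(X**2) - n*X.mean()**2)
b0 = Y.mean() - b1*X.mean()
print(b0, b1)   # compare with the quoted 10.7269 and 0.8726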

Estimating the Linear Model

Consider the linear model

yᵢ = β₀ + β₁xᵢ + εᵢ, where εᵢ ~ N(0, σ²)

i.e. E(εᵢ) = 0 and Var(εᵢ) = σ².

εᵢ is the error involved in the estimation of yᵢ as a function of xᵢ, i = 1, 2, ..., n.

Suppose we have n independent observations y₁, y₂, ..., yₙ. Then

y₁ = β₀ + β₁x₁ + ε₁
y₂ = β₀ + β₁x₂ + ε₂
y₃ = β₀ + β₁x₃ + ε₃
...
yₙ = β₀ + β₁xₙ + εₙ

Now in matrix notation,

[y₁]   [β₀ + β₁x₁]   [ε₁]         [y₁]   [1  x₁]          [ε₁]
[y₂] = [β₀ + β₁x₂] + [ε₂]   i.e.  [y₂] = [1  x₂] [β₀]  +  [ε₂]
[..]   [   ...   ]   [..]         [..]   [.. ..] [β₁]     [..]
[yₙ]   [β₀ + β₁xₙ]   [εₙ]         [yₙ]   [1  xₙ]          [εₙ]

Let

x = [1 x₁; 1 x₂; ...; 1 xₙ],  y = [y₁; y₂; ...; yₙ]  and  ε = [ε₁; ε₂; ...; εₙ]

Therefore the model simplifies to y = xβ + ε, where β = (β₀, β₁)′ and ε is the residual error.

The task is to minimize the error

ε = y − xβ

This is done by minimizing the sum of squares S = ε′ε and setting dS/dβ = 0.

Let S be the sum of squares of the residuals (errors), i.e.

S = ε′ε = (y − xβ)′(y − xβ)

But we know that (AB)′ = B′A′, so

S = (y′ − β′x′)(y − xβ)

S = y′y − y′xβ − β′x′y + β′x′xβ

To estimate β:

dS/dβ = −2x′y + 2x′xβ = 0

For dS/dβ = 0,

x′xβ = x′y (the least squares equation that has to be solved)

Recall that AX = λ gives A⁻¹AX = A⁻¹λ, i.e. IX = A⁻¹λ, where I is the identity matrix.

Therefore, to solve x′xβ = x′y, make β the subject:

x′xβ = x′y
(x′x)⁻¹(x′x)β = (x′x)⁻¹x′y
β̂ = (x′x)⁻¹x′y

But β̂ = [β̂₀; β̂₁], so

[β̂₀; β̂₁] = (x′x)⁻¹x′y

Let

x = [1 x₁; 1 x₂; ...; 1 xₙ],  x′ = [1 1 ... 1; x₁ x₂ ... xₙ]  and  y = [y₁; y₂; ...; yₙ]

Then

x′x = [n    Σxᵢ ]
      [Σxᵢ  Σxᵢ²]

N/B

If yᵢ = β₀ + β₁xᵢ + εᵢ, then E(yᵢ) = β₀ + β₁xᵢ, since E(εᵢ) = 0, and E(ȳ) = β₀ + β₁x̄.

Mean and variance of Least Squares Estimates

N/B

β̂₀ = (Σyᵢ Σxᵢ² − Σxᵢ Σxᵢyᵢ) / (nΣxᵢ² − (Σxᵢ)²)  and  β̂₁ = (nΣxᵢyᵢ − Σxᵢ Σyᵢ) / (nΣxᵢ² − (Σxᵢ)²)

β̂₁ = (Σxᵢyᵢ − nx̄ȳ) / (Σxᵢ² − nx̄²)

= Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)²

= [Σ(xᵢ − x̄)yᵢ − ȳΣ(xᵢ − x̄)] / Σ(xᵢ − x̄)²

= Σ(xᵢ − x̄)yᵢ / Σ(xᵢ − x̄)²   (since Σ(xᵢ − x̄) = 0)

E(β̂₁) = E[Σ(xᵢ − x̄)yᵢ / Σ(xᵢ − x̄)²]

= Σ(xᵢ − x̄)E(yᵢ) / Σ(xᵢ − x̄)²

= Σ(xᵢ − x̄)(β₀ + β₁xᵢ) / Σ(xᵢ − x̄)²

= [β₀Σ(xᵢ − x̄) + β₁Σxᵢ(xᵢ − x̄)] / Σ(xᵢ − x̄)²

= β₁Σ(xᵢ − x̄)xᵢ / Σ(xᵢ − x̄)²

= β₁(Σxᵢ² − x̄Σxᵢ) / (Σxᵢ² − nx̄²)

= β₁(Σxᵢ² − nx̄²) / (Σxᵢ² − nx̄²)

= β₁

∴ E(β̂₁) = β₁ and β̂₁ is unbiased for β₁.

Similarly, since ȳ = β̂₀ + β̂₁x̄, we have β̂₀ = ȳ − β̂₁x̄, i.e.

E(β̂₀) = E(ȳ − β̂₁x̄)

= E(ȳ) − x̄E(β̂₁)

= E(ȳ) − β₁x̄

= β₀ + β₁x̄ − β₁x̄

= β₀

∴ E(β̂₀) = β₀ and β̂₀ is an unbiased estimator for β₀.

Var(β̂) = Var([β̂₀; β̂₁]) = [ Var(β̂₀)       cov(β̂₀, β̂₁) ]
                           [ cov(β̂₀, β̂₁)   Var(β̂₁)     ]

Recall

var(ax) = a² var(x) if a is a constant, and var(Ax) = A var(x) A′ if A is a matrix.

var(β̂) = var((x′x)⁻¹x′y) = (x′x)⁻¹x′ var(y) x(x′x)⁻¹

From (AB)′ = B′A′ and ((x′x)⁻¹)′ = (x′x)⁻¹, and since var(y) = var(β₀ + β₁x + ε) = σ²I,

var(β̂) = (x′x)⁻¹x′x(x′x)⁻¹σ² = σ²(x′x)⁻¹

But (x′x)⁻¹ = 1/(nΣxᵢ² − (Σxᵢ)²) [ Σxᵢ²   −Σxᵢ ]
                                  [ −Σxᵢ    n   ]

∴ var(β̂) = σ²/(nΣxᵢ² − (Σxᵢ)²) [ Σxᵢ²   −Σxᵢ ]   (the variance-covariance matrix)
                                [ −Σxᵢ    n   ]

var(β̂₀) = σ²Σxᵢ² / (nΣxᵢ² − (Σxᵢ)²) = σ²Σxᵢ² / (nΣ(xᵢ − x̄)²)

var(β̂₁) = nσ² / (nΣxᵢ² − (Σxᵢ)²) = nσ² / (nΣ(xᵢ − x̄)²) = σ² / Σ(xᵢ − x̄)²

cov(β̂₀, β̂₁) = cov(β̂₁, β̂₀) = −σ²Σxᵢ / (nΣxᵢ² − (Σxᵢ)²) = −nx̄σ² / (nΣxᵢ² − n²x̄²) = −x̄σ² / Σ(xᵢ − x̄)²
Example

Use matrix notation to fit:

I. the linear regression model y = β₀ + β₁x

II. the quadratic regression model y = β₀ + β₁x + β₂x²

by the method of least squares, using the following data.

x: −2  −1  0  1  2
y:  0   0  1  1  3
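N/B (illustrative, not part of the original notes): a sketch of the matrix solution β̂ = (x′x)⁻¹x′y for both parts of this example; the coefficients are computed, not quoted.

import numpy as np

x = np.array([-2, -1, 0, 1, 2], dtype=float)
y = np.array([0, 0, 1, 1, 3], dtype=float)

# I. linear model: design matrix with columns [1, x]
X1 = np.column_stack([np.ones_like(x), x])
print(np.linalg.inv(X1.T @ X1) @ X1.T @ y)   # beta_hat = (x'x)^(-1) x'y

# II. quadratic model: columns [1, x, x^2]
X2 = np.column_stack([np.ones_like(x), x, x**2])
print(np.linalg.inv(X2.T @ X2) @ X2.T @ y)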
WEEK 11
TOPIC 11: INTERVAL ESTIMATION
A single estimator may not be expected to coincide with a population parameter; we therefore
give an interval in which the parameter may be expected to lie with a certain degree of
confidence/ certainty.

N/B
The resulting interval should contain the parameter and be relatively narrow.

Definition
An interval estimator is a rule that specifies the method for using the sample measurements to calculate two values C₁ and C₂ that form the end points of the interval.

Let x₁, x₂, ..., xₙ be iid random variables from a density f(x, θ).

Let T₁ = t₁(x) and T₂ = t₂(x) be two estimators (statistics) satisfying T₁ ≤ T₂ for which

Pr(T₁ ≤ τ(θ) ≤ T₂) = 1 − α, where α does not depend on θ. The random interval (T₁, T₂) is called a 100(1 − α)% confidence interval for τ(θ).

• Pr(T₁ ≤ τ(θ) ≤ T₂) = 1 − α means the probability that τ(θ) lies in the interval (T₁, T₂) is (1 − α).
• (1 − α) is known as the confidence coefficient.
• T₁ and T₂ are called the lower and upper confidence limits respectively for τ(θ).
• α > 0 but small, e.g. 5%, 1%, 2%, and is called the level of significance.

Methods of constructing confidence intervals

Pivotal quantity method

Definition
Let x₁, x₂, ..., xₙ be a random sample from the density f(x, θ). Let Q = q(x₁, ..., xₙ, θ) be a function of x₁, ..., xₙ and θ such that its distribution does not depend on θ. Then Q is called a pivotal quantity. (A PQ exists for continuous variables.)
Example
Let X ~ N(θ, q) with q known, and let x₁, ..., xₙ be a random sample from this distribution. Then (X̄ − θ) is a pivotal quantity:

X̄ − θ ~ N(0, q/n), a distribution that does not depend on θ.

But X̄/θ ~ N(1, q/(nθ²)) is not a PQ, since its distribution depends on θ.

Pivotal quantity method

If Q(x, θ) is a PQ, then for any fixed 0 < (1 − α) < 1 there exist q₁ and q₂ such that P(q₁ < Q < q₂) = 1 − α. If, for each possible sample value (x₁, ..., xₙ),

q₁ < Q(x₁, ..., xₙ, θ) < q₂  ⟺  t₁(x₁, ..., xₙ) < τ(θ) < t₂(x₁, ..., xₙ)

for functions t₁ and t₂ (not depending on θ), then (T₁, T₂) is a 100(1 − α)% C.I. for τ(θ).

EXAMPLE 1
Let x₁, ..., xₙ be a random sample from X ~ N(θ, 1). Construct a 100(1 − α)% C.I. for θ using the pivotal quantity method.

Solution

Q = (X̄ − θ)/(1/√n) = √n(X̄ − θ) ~ N(0, 1). We choose q₁ and q₂ such that P(q₁ < Q < q₂) = 1 − α:

q₁ < (X̄ − θ)/(1/√n) < q₂

q₁(1/√n) < X̄ − θ < q₂(1/√n)

X̄ − q₂(1/√n) < θ < X̄ − q₁(1/√n)

So the interval is (X̄ − q₂/√n, X̄ − q₁/√n), and its length is

L = T₂ − T₁ = (X̄ − q₁/√n) − (X̄ − q₂/√n) = (q₂ − q₁)/√n

min L = min(q₂ − q₁) under the restriction that

P(q₁ < Q < q₂) = 1 − α, i.e. Φ(q₂) − Φ(q₁) = 1 − α

It can be shown that the length is minimized when q₂ = −q₁ = z_(α/2), so the 100(1 − α)% C.I. for θ (here with σ = 1) is

( X̄ − z_(α/2)/√n , X̄ + z_(α/2)/√n )

EXAMPLE 2

A random sample of size n = 16 from N(μ, 25) yielded X̄ = 73.8. Find a 95% C.I. for μ.

Solution

X̄ ± z_(α/2) σ/√n, where z_(α/2) = z₀.₀₂₅ = 1.96

= 73.8 ± 1.96 × 5/√16 = 73.8 ± 1.96 × 5/4

= 73.8 ± 2.45

= [71.35, 76.25]
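N/B (illustrative, not part of the original notes): the same z-interval computed in Python, assuming scipy is available.

from scipy.stats import norm
import math

xbar, sigma2, n, alpha = 73.8, 25, 16, 0.05
z = norm.ppf(1 - alpha/2)                  # = 1.96
half = z * math.sqrt(sigma2 / n)
print(xbar - half, xbar + half)            # approx (71.35, 76.25)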

Confidence interval for the mean

Let x₁, x₂, ..., xₙ be a random sample from a normal population with unknown mean μ and known variance σ². To find the (1 − α)100% confidence interval for the mean:

We know that X̄ is N(μ, σ²/n), so Z = (X̄ − μ)/(σ/√n) ~ N(0, 1). From the normal tables, find z_(α/2) for which the central area is 1 − α. Then the (1 − α)100% confidence limits are

( X̄ − z_(α/2) σ/√n , X̄ + z_(α/2) σ/√n ), where L₁ = X̄ − z_(α/2) σ/√n and L₂ = X̄ + z_(α/2) σ/√n.

EXAMPLE 1

Assume that the top speed attainable by a certain design of car is a normal random variable x with unknown mean μ and standard deviation 10 km/h. A random sample of 10 cars built according to this design was selected and each car was tested, with Σxᵢ = 893. Compute 90% and 95% confidence intervals for μ.

Solution
(i) 90% confidence interval
Here x̄ = 89.3. From the normal tables, the value corresponding to z₀.₀₅ is 1.65. Thus the confidence limits are:

( 89.3 − 1.65 × 10/√10 , 89.3 + 1.65 × 10/√10 )

= (84.08, 94.52)

(ii) 95% confidence interval: z₀.₀₂₅ = 1.96, so the limits are 89.3 ± 1.96 × 10/√10 = (83.10, 95.50).
EXAMPLE 2

Confidence interval for the mean of a normal population with parameters μ and σ² both unknown.

Suppose we want a (1 − α)100% confidence interval from the sample. Then (x̄ − μ)/(s/√n) follows a t-distribution with n − 1 df, where

s² = (1/(n − 1)) Σ(xᵢ − x̄)²

From the t tables, note down the value of t_(1−α/2) corresponding to n − 1 degrees of freedom. Then the confidence interval is

( x̄ − t_(1−α/2) s/√n , x̄ + t_(1−α/2) s/√n )

EXAMPLE 3
Assume that the top speed attainable by a car is a normal random variable with unknown mean and standard deviation. A random sample of 10 cars built according to this design was selected and each car was tested. The sum and the sum of squares of the top speeds attained in km/h were Σxᵢ = 1652 and Σxᵢ² = 273765. Compute the 95% confidence interval for μ.

Solution

s² = (1/n) Σ(xᵢ − x̄)² = (1/n) Σxᵢ² − x̄²

= (1/10) × 273765 − (1652/10)² = 85.46

Since ns² = (n − 1)s*², the unbiased estimate is

s*² = (n/(n − 1)) s² = (10/9) × 85.46 = 94.96

95% ⇒ α/2 = 0.025 ⇒ 1 − α/2 = 0.975

From the t tables, the value of t₀.₉₇₅ corresponding to (10 − 1) = 9 df is 2.26. Thus the required confidence interval is

( 165.2 − 2.26 √(94.96/10) , 165.2 + 2.26 √(94.96/10) )

= (158.24, 172.16)

EXAMPLE 4

Let x₁, x₂, ..., xₙ be a random sample from a normal population with mean μ and variance σ², both unknown. Find a (1 − α)100% confidence interval for σ².

Solution

In this case we use the χ² variate with n − 1 df. From the χ² tables, note down the values of χ²_(α/2) (upper tail) and χ²_(1−α/2) (lower tail) corresponding to n − 1 df. Then the confidence interval is given by

( Σ(xᵢ − x̄)² / χ²_(α/2) , Σ(xᵢ − x̄)² / χ²_(1−α/2) )

Illustration

Suppose X has distribution N(μ, σ²) where μ and σ² are unknown. A random sample of size 15 yields values Σxᵢ = 8.7 and Σxᵢ² = 27.3. Obtain a 95% confidence interval for σ².

Solution

Given n = 15, α = 0.05, α/2 = 0.025.

Σ(xᵢ − x̄)²/σ² is a χ² variate with (n − 1) df. From the χ² tables, the values of χ²₀.₀₂₅ and χ²₀.₉₇₅ corresponding to 14 degrees of freedom are 26.1 and 5.63 respectively. Thus, the required C.I. is given by

( Σ(xᵢ − x̄)²/26.1 , Σ(xᵢ − x̄)²/5.63 )

We know that

Σ(xᵢ − x̄)² = Σxᵢ² − nx̄²

= 27.3 − 15(8.7/15)² = 22.25

The C.I. is ( 22.25/26.1 , 22.25/5.63 )

= (0.85, 3.95)

Finite population

In all the previous cases, the population was considered to be infinite or very large. If the population is finite, we need to modify the C.I. Suppose we are to find a confidence interval for the mean from a finite population; then the C.I. is

x̄ ± z_(α/2) (σ/√n) √((N − n)/(N − 1))

where N = population size, n = sample size, and σ = SD of the population (or sample).

Example

A random sample of 50 students out of a total of 200 showed a mean of 75 and an SD of 10.

(i) What are the 95% confidence limits for the estimate of the mean of the 200 students?

(ii) With what degree of confidence could we say that the mean of all 200 students is 75 ± 1?

Solution

The C.I. is given by

(i) ( X̄ − 1.96 (σ/√n)√((N − n)/(N − 1)) , X̄ + 1.96 (σ/√n)√((N − n)/(N − 1)) )

= 75 ± 1.96 × (10/√50)√(150/199)

= (72.59, 77.41)

(ii) 75 ± z₁ (10/√50)√(150/199) = 75 ± 1 requires

z₁ (10/√50)√(150/199) = 1

⇒ z₁ = 0.81445, so the degree of confidence is 2 × 0.2910 × 100% = 58.2%

Confidence interval for the difference of means of two independent normal populations

Let x₁, x₂, ..., x_{n₁} be iid X ~ N(μ₁, σ₁²)

Let y₁, y₂, ..., y_{n₂} be iid Y ~ N(μ₂, σ₂²)

Let X and Y be independent. Then

X̄ ~ N(μ₁, σ₁²/n₁) and Ȳ ~ N(μ₂, σ₂²/n₂)

X̄ − Ȳ ~ N( E(X̄ − Ȳ), Var(X̄ − Ȳ) )

But E(X̄ − Ȳ) = E(X̄) − E(Ȳ) = μ₁ − μ₂

Var(X̄ − Ȳ) = Var(X̄) + Var(Ȳ) = σ₁²/n₁ + σ₂²/n₂

Z = [(X̄ − Ȳ) − (μ₁ − μ₂)] / √(σ₁²/n₁ + σ₂²/n₂) ~ N(0, 1)

If the variances of the two populations are the same, i.e. σ₁² = σ₂² = σ² say, then

Z = [(X̄ − Ȳ) − (μ₁ − μ₂)] / (σ√(1/n₁ + 1/n₂)) ~ N(0, 1)

We wish to find a 100(1 − α)% C.I. for the difference of the means (μ₁ − μ₂).

Case A
If the common variance σ² is known:

Z = [(X̄ − Ȳ) − (μ₁ − μ₂)] / (σ√(1/n₁ + 1/n₂)) ~ N(0, 1)

Z will be between −z_(α/2) and z_(α/2) with probability 1 − α:

P(−z_(α/2) < Z ≤ z_(α/2)) = 1 − α

P( −z_(α/2) ≤ [(X̄ − Ȳ) − (μ₁ − μ₂)] / (σ√(1/n₁ + 1/n₂)) ≤ z_(α/2) ) = 1 − α

gives the (1 − α)100% C.I. for μ₁ − μ₂:

P( (X̄ − Ȳ) − z_(α/2) σ√(1/n₁ + 1/n₂) ≤ μ₁ − μ₂ ≤ (X̄ − Ȳ) + z_(α/2) σ√(1/n₁ + 1/n₂) ) = 1 − α

OR, if σ₁² and σ₂² are known but σ₁ ≠ σ₂:

P( (X̄ − Ȳ) − z_(α/2) √(σ₁²/n₁ + σ₂²/n₂) ≤ μ₁ − μ₂ ≤ (X̄ − Ȳ) + z_(α/2) √(σ₁²/n₁ + σ₂²/n₂) ) = 1 − α

where z_(α/2) is the Z-value giving an area of α/2 to the right.

CASE B

When σ² = σ₁² = σ₂² is unknown, σ² should be replaced by its unbiased estimate, the pooled sample variance

S_P² = [ Σ(xᵢ − X̄)² + Σ(yᵢ − Ȳ)² ] / (n₁ + n₂ − 2) = [ (n₁ − 1)S₁² + (n₂ − 1)S₂² ] / (n₁ + n₂ − 2)

i.e. (n₁ + n₂ − 2)S_P² = (n₁ − 1)S₁² + (n₂ − 1)S₂², where S₁² = (1/(n₁ − 1)) Σ(xᵢ − X̄)² and S₂² = (1/(n₂ − 1)) Σ(yᵢ − Ȳ)².

But

(n₁ − 1)S₁²/σ² = Σ(xᵢ − X̄)²/σ² ~ χ²(n₁ − 1)  and  (n₂ − 1)S₂²/σ² = Σ(yᵢ − Ȳ)²/σ² ~ χ²(n₂ − 1)

so

(n₁ + n₂ − 2)S_P²/σ² = (n₁ − 1)S₁²/σ² + (n₂ − 1)S₂²/σ² ~ χ²(n₁ + n₂ − 2)

Let

Z = [(X̄ − Ȳ) − (μ₁ − μ₂)] / (σ√(1/n₁ + 1/n₂))  and  W = (n₁ + n₂ − 2)S_P²/σ²

Then t = Z/√(W/(n₁ + n₂ − 2)) follows a t-distribution with n₁ + n₂ − 2 df:

t = [(X̄ − Ȳ) − (μ₁ − μ₂)] / (S_P √(1/n₁ + 1/n₂))

EXAMPLE 1
In an investigation to estimate the mean weight in kg of 15-year-old children in a particular region, a random sample of 100 children is selected. Previous studies indicate that the variance of weights of such children is 30 kg². Suppose the sample mean weight is x̄ = 38.4 kg. Estimate the population mean weight of all 15-year-old children in the region, assuming that these weights are normally distributed. Use α = 5%.

Solution

n = 100, X̄ = 38.4, α = 0.05, σ² = 30

Φ(z₀.₀₂₅) = 1 − 0.025 = 0.975 ⇒ z_(α/2) = 1.96

C.I. for μ when σ² is known:

C.I. = X̄ ± z_(α/2) σ/√n = 38.4 ± 1.96 × √30/√100

C₁ = 37.3265
C₂ = 39.4735

EXAMPLE 2
A random sample of 11 bags was selected from a machine packaging wheat flour in bags marked 1 kg. The actual weights of flour in kilograms were 1.017, 1.05, 1.078, 0.997, 1.033, 0.996, 1.059, 1.082, 1.014, 1.072 and 0.998. Construct a 95% C.I. for the mean weight of flour in bags marked 1 kg, assuming the weights are normally distributed.

Solution

C.I. for μ when σ² is not known; X ~ N(μ, σ²)

C.I. = x̄ ± t_(α/2) s/√n

α = 5%, α/2 = 0.025, n = 11, Σxᵢ = 11.397, Σxᵢ² = 11.81967

t₀.₀₂₅(10) = 2.23

x̄ = Σxᵢ/n = 11.397/11 = 1.036090909

s² = (1/(n − 1)) Σ(xᵢ − x̄)² = (1/(n − 1)) (Σxᵢ² − nx̄²)

= (1/10) [11.819671 − 11(11.397/11)²] = 0.011348909/10

s² = 0.0011348909 = (0.033688141)²

C₁ = 1.0361 − 2.23 × 0.033688/√11 = 1.013449189

C₂ = 1.0361 + 2.23 × 0.033688/√11 = 1.058750811

∴ 95% C.I. = (1.0134, 1.05875)


EXAMPLE 3

Let x₁, x₂, ..., xₙ denote a random sample of size n from a normal distribution with mean μ and variance σ², both unknown. For a random sample of size 25 it was found that Σxᵢ = 3700 and Σxᵢ² = 573000.

(a) Obtain unbiased estimates of μ and σ².

(b) Obtain a 95% C.I. for μ and for σ².

(c) Calculate a new C.I. for μ assuming that σ² is known to be 800.

SOLUTION

X ~ N(μ, σ²), x̄ ~ N(μ, σ²/n)

n = 25, Σxᵢ = 3700, Σxᵢ² = 573000

E(x̄) = μ, so

x̄ = (1/n)Σxᵢ = 3700/25 = 148

E(S²) = σ², where

S² = (1/(n − 1)) Σ(xᵢ − x̄)² = (1/(n − 1)) (Σxᵢ² − nx̄²)

= (1/24) [573000 − 25(148)²]

= (1/24)(25400) = 1058.33

95% C.I. for μ; σ² unknown:

α = 0.05, α/2 = 0.025, n = 25

C.I. = x̄ ± t_(α/2) s/√n

But t_(α/2) = t₀.₀₂₅,₂₄ = 2.06 and s = √1058.33

C₁ = 148 − 2.06 √(1058.33/25) = 134.5968

C₂ = 148 + 2.06 √(1058.33/25) = 161.4032

∴ (134.5968, 161.4032)

95% C.I. for σ²; μ unknown:

C₁ = (n − 1)S²/χ²_(α/2, n−1), where (n − 1)S² = Σ(xᵢ − x̄)² = Σxᵢ² − nx̄²

C₂ = (n − 1)S²/χ²_(1−α/2, n−1)

From the χ² tables with n − 1 = 24 df, χ²₀.₀₂₅,₂₄ = 39.36 and χ²₀.₉₇₅,₂₄ = 12.40

C₁ = 24(1058.33)/39.36 = 645.3

C₂ = 24(1058.33)/12.40 = 2048.4

∴ (645.3, 2048.4)

95% C.I. for μ; σ² known (= 800):

X̄ ± z_(α/2) σ/√n

C₁ = 148 − 1.96 √(800/25) = 136.913

C₂ = 148 + 1.96 √(800/25) = 159.087

∴ (136.913, 159.087)

Confidence interval for a proportion

Let p be the proportion of successes in a sample of n observations from X ~ B(n, p).

Let p̂ = x/n be the point estimate for the population proportion, where x = the number of individuals in the sample with the specified characteristic and n = sample size.

N/B: It is unlikely that a point estimate p̂ will be exactly equal to the population proportion.

Recall

Sampling distribution of p̂

For a simple random sample of size n ≤ 0.05N (i.e. the sample is not more than 5% of the population size), the sampling distribution of p̂ is approximately normal with mean μ_p̂ = p and standard deviation

σ_p̂ = √(p(1 − p)/n), provided that np(1 − p) ≥ 10.

Hence we use the sampling distribution of p̂ to construct a (1 − α)100% C.I. for the population proportion.

Suppose a simple random sample of size n is taken from a population. A 100(1 − α)% C.I. for p is given by

Pr(C₁ ≤ p ≤ C₂) = 1 − α

Pr(−z_(α/2) ≤ Z ≤ z_(α/2)) = 1 − α

Since

Z = (p̂ − p)/√(p(1 − p)/n)

Pr( −z_(α/2) ≤ (p̂ − p)/√(p(1 − p)/n) ≤ z_(α/2) ) = 1 − α

Pr( −z_(α/2)√(p(1 − p)/n) ≤ p̂ − p ≤ z_(α/2)√(p(1 − p)/n) ) = 1 − α

Pr( p̂ − z_(α/2)√(p̂(1 − p̂)/n) ≤ p ≤ p̂ + z_(α/2)√(p̂(1 − p̂)/n) ) ≈ 1 − α

where np̂(1 − p̂) ≥ 10 and n ≤ 0.05N in order to construct this C.I.

If p is unknown, its variance p(1 − p)/n is estimated by p̂(1 − p̂)/n; E(p̂) = p, i.e. p̂ is the best point estimate of p.

Example
A poll was conducted on an S.R.S. of 1505 Kenyan adults on their opinion about the ICC. Of the 1505, 1129 responded to be in favour of the ICC. Obtain a 95% C.I. for the proportion of Kenyans who were in favour of the ICC.

Solution

n = 1505; α = 0.05, α/2 = 0.025

p̂ = x/n = 1129/1505 = 0.750166

np̂(1 − p̂) = 282.06 ≥ 10 and n ≤ 0.05N (there are over 10 million adults in Kenya)

z_(α/2) = z₀.₀₂₅ = 1.96

C₁ = p̂ − z_(α/2)√(p̂(1 − p̂)/n) = 0.750 − 1.96 √(0.750(1 − 0.750)/1505) = 0.728293

C₂ = 0.772038

∴ the 95% C.I. for p is (0.7283, 0.7720)

Interpretation: if 100 samples of size 1505 were taken, about 95% of the resulting intervals would contain the parameter p, and about 5 would not.
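N/B (illustrative, not part of the original notes): the same proportion interval in Python, assuming scipy is available.

from scipy.stats import norm
import math

x, n, alpha = 1129, 1505, 0.05
p_hat = x / n
se = math.sqrt(p_hat * (1 - p_hat) / n)
z = norm.ppf(1 - alpha / 2)
print(p_hat - z * se, p_hat + z * se)      # approx (0.7283, 0.7720)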

N/B: Sample size necessary for estimating a population proportion within a specified margin of error.

Let E be the margin of error. Then we solve E = z_(α/2) σ_p̂ for n. Our margin of error is

E = z_(α/2) √(p̂(1 − p̂)/n)

(E/z_(α/2))² = p̂(1 − p̂)/n

n = p̂(1 − p̂)(z_(α/2)/E)²

N/B: The above formula depends on p̂, and p̂ = x/n itself depends on the sample size n.

Possibilities:
1. You can use an estimate of p based on a previous study.
2. We let p̂ = 0.5. When p̂ = 0.5, the maximum value of p̂(1 − p̂) = 0.25 is attained; hence the largest possible value of n for a given level of confidence and a given margin of error is obtained using p̂(1 − p̂) = 0.25.

Sample size needed for estimating the proportion p:

n = p̂(1 − p̂)(z_(α/2)/E)², rounded up to the next integer,

where p̂ is a prior estimate of p.

If a prior estimate of p is unavailable, the sample size required to obtain a (1 − α)100% C.I. is

n = 0.25 (z_(α/2)/E)², rounded up.

EXAMPLE 1
A researcher wishes to estimate the percentage of Kenyans living in poverty. What size of sample should be obtained if he wishes the estimate to be within 2 percentage points with 99% confidence:
a) if he uses the 2003 estimate of 12.7% obtained from the KNHS;
b) if he does not use any prior estimate?

Solution

E = 0.02, α = 1% = 0.01, α/2 = 0.005

z_(α/2) = z₀.₀₀₅ = 2.575

p̂ = 0.127

a) n = p̂(1 − p̂)(z_(α/2)/E)² = 0.127(1 − 0.127)(2.575/0.02)²

= 1837.9 ⇒ 1838 randomly selected people

b) E = 0.02, z₀.₀₀₅ = 2.575

n = 0.25 (z_(α/2)/E)² = 0.25 (2.575/0.02)²

= 4144.1 ⇒ n = 4145 randomly selected people

Note that without a prior estimate of p, the required sample is considerably larger.

EXAMPLE 2

A factory is producing some 50,000 pairs of shoes daily. From a random sample of 500 pairs, 2% were found to be of sub-standard quality. Estimate the number of pairs that can reasonably be expected to be spoiled in daily production, and assign limits at the 95% level of significance.

Solution

p̂ (defective) = 0.02; q̂ = 1 − p̂ = 0.98

The 95% C.I. for the population proportion is

p̂ ± 1.96 √(p̂(1 − p̂)/n)

0.02 ± 1.96 √(0.02 × 0.98/500) = 0.02 ± 0.0123

= 0.0077 to 0.0323

∴ the number of pairs expected to be spoiled in daily production lies between

0.0077 × 50000 = 385 and 0.0323 × 50000 = 1615

i.e. (385, 1615).

If the sampling fraction is not ignored, then the 95% limits for p are p̂ ± 1.96 √(p̂q̂/n) √((N − n)/(N − 1)).

EXAMPLE 3

Out of a consignment of 100,000 tennis balls, 400 were sorted out at random and examined, and it was found that 20 of these were defective. How many defective balls can you reasonably expect to have in the whole consignment at the 95% confidence level?

Solution
The 95% C.I. for the percentage of defective balls in the consignment is

5 ± 1.96 √(5 × 95/400) = 5 ± 2.136 = 2.864 to 7.136

Hence the 95% C.I. for the number of bad balls in the consignment is

100,000 × 2.864/100 = 2864 to 100,000 × 7.136/100 = 7136

Confidence interval for the variance σ²

(a) μ known (= μ₀)

Q = ( (X̄ − μ₀)/(σ/√n) )² ~ χ²(1)

Pr( q₁ ≤ ( (X̄ − μ₀)/(σ/√n) )² ≤ q₂ ) = 1 − α, where both q₁ and q₂ are positive.

n(X̄ − μ₀)²/q₂ ≤ σ² ≤ n(X̄ − μ₀)²/q₁

min L = (1/q₁ − 1/q₂) n(X̄ − μ₀)², subject to ∫ from q₁ to q₂ of g(t)dt = 1 − α, where g(t) is the pdf of χ²(1).

We use approximate (equal-tails) values of q₁ and q₂.

(b) μ unknown

Q = (n − 1)S²/σ² ~ χ²(n − 1). For a given α (or 1 − α), choose q₁ and q₂ with

Pr( q₁ ≤ (n − 1)S²/σ² ≤ q₂ ) = 1 − α

(n − 1)S²/q₂ ≤ σ² ≤ (n − 1)S²/q₁

To minimize L = (1/q₁ − 1/q₂)(n − 1)S² subject to ∫ from q₁ to q₂ of h(x)dx = 1 − α, where h(x) is the pdf of χ²(n − 1); a solution for q₁ and q₂ can be obtained by trial and error or numerical integration. The C.I. for σ² is

( (n − 1)s²/q₂ , (n − 1)s²/q₁ )

Example

The following sample, believed to come from a normal population, was obtained: 221, 311, 286, 392, 412, 187, 346, 301, 280, 446, 351, 320, 237, 276, 381. Determine a 90% C.I. for σ² (and hence for σ).

Solution

x̄ = 316.73, s = 72.75, where

s² = (Σxᵢ² − (Σxᵢ)²/n) / (n − 1)

(n − 1)s²/χ²_(α/2, n−1) ≤ σ² ≤ (n − 1)s²/χ²_(1−α/2, n−1)

14(72.75)²/χ²₀.₀₅,₁₄ ≤ σ² ≤ 14(72.75)²/χ²₀.₉₅,₁₄

= ( 14(72.75)²/23.6848 , 14(72.75)²/6.57063 )

= ( 3128.4 ≤ σ² ≤ 11276.8 )

90% C.I. for σ:

√3128.4 ≤ σ ≤ √11276.8, i.e. 55.9 ≤ σ ≤ 106.2


WEEK 12: CONTINUOUS ASSESSMENT TEST 2
