Likelihood Ratio Tests

Instructor: Songfeng Zheng
A very popular form of hypothesis test is the likelihood ratio test, which is a generalization of the optimal test for simple null and alternative hypotheses developed by Neyman and Pearson (we skipped the Neyman-Pearson lemma because we are short of time). The likelihood ratio test is based on the likelihood function $f_n(X_1, \cdots, X_n|\theta)$ and the intuition that the likelihood function tends to be highest near the true value of $\theta$. Indeed, this is also the foundation for maximum likelihood estimation. We will start from a very simple example.
Suppose $X_1, \cdots, X_n$ is a random sample from an exponential distribution with mean $\theta$, i.e. $f(x|\theta) = \frac{1}{\theta}e^{-x/\theta}$ for $x > 0$, and we wish to test the simple hypotheses
$$H_0: \theta = \theta_0 \quad \textrm{vs.} \quad H_a: \theta = \theta_1,$$
where $\theta_1 < \theta_0$ are fixed values. The likelihood ratio is
$$LR = \frac{f_n(X_1, \cdots, X_n|\theta_0)}{f_n(X_1, \cdots, X_n|\theta_1)} = \frac{\theta_0^{-n}\exp\left\{-\sum X_i/\theta_0\right\}}{\theta_1^{-n}\exp\left\{-\sum X_i/\theta_1\right\}} = \left(\frac{\theta_0}{\theta_1}\right)^{-n}\exp\left\{\left(\frac{1}{\theta_1}-\frac{1}{\theta_0}\right)\sum X_i\right\}.$$
Intuitively, if the evidence (data) supports $H_a$, then the likelihood function $f_n(X_1, \cdots, X_n|\theta_1)$ should be large, and therefore the likelihood ratio should be small. Thus, we reject the null hypothesis if the likelihood ratio is small, i.e. $LR \le k$, where $k$ is a constant such that $P(LR \le k) = \alpha$ under the null hypothesis ($\theta = \theta_0$).
To find what kind of test results from this criterion, we expand the condition
$$\begin{aligned}
\alpha = P(LR \le k) &= P\left(\left(\frac{\theta_0}{\theta_1}\right)^{-n}\exp\left\{\left(\frac{1}{\theta_1}-\frac{1}{\theta_0}\right)\sum X_i\right\} \le k\right)\\
&= P\left(\exp\left\{\left(\frac{1}{\theta_1}-\frac{1}{\theta_0}\right)\sum X_i\right\} \le k\left(\frac{\theta_0}{\theta_1}\right)^{n}\right)\\
&= P\left(\left(\frac{1}{\theta_1}-\frac{1}{\theta_0}\right)\sum X_i \le \log\left[k\left(\frac{\theta_0}{\theta_1}\right)^{n}\right]\right)\\
&= P\left(\sum X_i \le \frac{\log k + n\log\theta_0 - n\log\theta_1}{\frac{1}{\theta_1}-\frac{1}{\theta_0}}\right)\\
&= P\left(\frac{2}{\theta_0}\sum X_i \le \frac{2}{\theta_0}\cdot\frac{\log k + n\log\theta_0 - n\log\theta_1}{\frac{1}{\theta_1}-\frac{1}{\theta_0}}\right)\\
&= P\left(V \le \frac{2}{\theta_0}\cdot\frac{\log k + n\log\theta_0 - n\log\theta_1}{\frac{1}{\theta_1}-\frac{1}{\theta_0}}\right),
\end{aligned}$$
where $V = \frac{2}{\theta_0}\sum X_i$; dividing by $\frac{1}{\theta_1}-\frac{1}{\theta_0} > 0$ (recall $\theta_1 < \theta_0$) preserves the direction of the inequality. From the properties of the exponential distribution, we know that under the null hypothesis $\frac{2}{\theta_0}X_i$ follows a $\chi^2$ distribution with 2 degrees of freedom; consequently, $V$ follows a chi-square distribution with $2n$ degrees of freedom. Thus, by looking at the chi-square table, we can find the value of the chi-square statistic with $2n$ degrees of freedom such that the probability that $V$ is less than that number is $\alpha$, that is, solve for $c$ such that $P(V \le c) = \alpha$. Once we find the value of $c$, we can solve for $k$ and define the test in terms of the likelihood ratio.
For example, suppose that $H_0: \theta = 2$ and $H_a: \theta = 1$, and we want to do the test at significance level $\alpha = 0.05$ with a random sample of size $n = 5$ from an exponential distribution. Looking at the chi-square table under $2n = 10$ degrees of freedom, we find that 3.94 is the value below which there is 0.05 area. Using this, we obtain $P\left(\frac{2}{2}\sum X_i \le 3.94\right) = 0.05$. This implies that we should reject the null hypothesis if $\sum X_i \le 3.94$ in this example.
To find a rejection criterion directly in terms of the likelihood ratio, we can solve for $k$ from
$$\frac{2}{\theta_0}\cdot\frac{\log k + n\log\theta_0 - n\log\theta_1}{\frac{1}{\theta_1}-\frac{1}{\theta_0}} = 3.94.$$
With $\theta_0 = 2$, $\theta_1 = 1$, and $n = 5$, this gives $\log k = 3.94/2 - 5\log 2 \approx -1.4957$, i.e. $k \approx 0.2241$. So going back to the original likelihood ratio, we reject the null hypothesis if
$$\left(\frac{\theta_0}{\theta_1}\right)^{-n}\exp\left\{\left(\frac{1}{\theta_1}-\frac{1}{\theta_0}\right)\sum X_i\right\} = \left(\frac{2}{1}\right)^{-5}\exp\left\{\left(1-\frac{1}{2}\right)\sum X_i\right\} \le 0.2241.$$
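To make these computations concrete, here is a short numerical sketch in Python (my own illustration, not part of the original notes; it assumes numpy and scipy are available, and the variable names are mine). It reproduces the critical value 3.94, the rejection threshold on $\sum X_i$, and the constant $k$, and checks the size of the test by simulation.

```python
import numpy as np
from scipy import stats

theta0, theta1 = 2.0, 1.0   # H0: theta = 2  vs.  Ha: theta = 1 (exponential mean)
n, alpha = 5, 0.05

# Critical value c with P(V <= c) = alpha, where V = (2/theta0)*sum(X) ~ chi^2_{2n} under H0
c = stats.chi2.ppf(alpha, df=2 * n)      # about 3.94
sum_cutoff = (theta0 / 2) * c            # reject H0 if sum(X_i) <= this value

# Solve for the likelihood-ratio cutoff k from
# (2/theta0) * (log k + n log theta0 - n log theta1) / (1/theta1 - 1/theta0) = c
log_k = (theta0 / 2) * c * (1 / theta1 - 1 / theta0) - n * (np.log(theta0) - np.log(theta1))
k = np.exp(log_k)                        # about 0.2241

# Monte Carlo check: under H0 the rejection probability should be close to alpha
rng = np.random.default_rng(0)
x = rng.exponential(scale=theta0, size=(100_000, n))
print(c, sum_cutoff, k, (x.sum(axis=1) <= sum_cutoff).mean())
```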
In general, suppose the hypotheses are
$$H_0: \theta \in \Theta_0 \quad \textrm{vs.} \quad H_a: \theta \in \Theta_a,$$
where $\Theta_0$ and $\Theta_a$ are disjoint subsets of the parameter space $\Theta$, with $\Theta_0 \cup \Theta_a = \Theta$. Let $L(\hat\Theta_0) = \max_{\theta\in\Theta_0}L(\theta)$ and $L(\hat\Theta) = \max_{\theta\in\Theta}L(\theta)$. The (generalized) likelihood ratio statistic is
$$\Lambda = \frac{L(\hat\Theta_0)}{L(\hat\Theta)},$$
and we reject $H_0$ when $\Lambda$ is small.

Example 1: Suppose $X_1, \cdots, X_n$ is a random sample from a normal distribution $N(\mu, \sigma^2)$ where both $\mu$ and $\sigma$ are unknown. We wish to test the hypotheses
$$H_0: \mu = \mu_0 \quad \textrm{vs.} \quad H_a: \mu \ne \mu_0$$
at the level $\alpha$. Show that the likelihood ratio test is equivalent to the t test.
Solution: In this example, the parameter is $\theta = (\mu, \sigma^2)$. Notice that $\Theta_0$ is the set $\{(\mu_0, \sigma^2): \sigma^2 > 0\}$ and $\Theta_a = \{(\mu, \sigma^2): \mu \ne \mu_0, \sigma^2 > 0\}$, and hence $\Theta = \Theta_0 \cup \Theta_a = \{(\mu, \sigma^2): -\infty < \mu < \infty, \sigma^2 > 0\}$. The value of $\sigma^2$ is left completely unspecified. We must now find $L(\hat\Theta_0)$ and $L(\hat\Theta)$.
Restricting $\mu$ to $\Theta_0$ implies that $\mu = \mu_0$, and we can find $L(\hat\Theta_0)$ if we can determine the value of $\sigma^2$ that maximizes $L(\mu, \sigma^2)$ subject to the constraint $\mu = \mu_0$. It is easy to see that when $\mu = \mu_0$, the value of $\sigma^2$ that maximizes $L(\mu_0, \sigma^2)$ is
$$\hat\sigma_0^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \mu_0)^2.$$
Thus, $L(\hat\Theta_0)$ can be obtained by replacing $\mu$ with $\mu_0$ and $\sigma^2$ with $\hat\sigma_0^2$ in $L(\mu, \sigma^2)$, which yields
$$L(\hat\Theta_0) = \left(\frac{1}{\sqrt{2\pi}\,\hat\sigma_0}\right)^n \exp\left[-\sum_{i=1}^n \frac{(X_i-\mu_0)^2}{2\hat\sigma_0^2}\right] = \left(\frac{1}{\sqrt{2\pi}\,\hat\sigma_0}\right)^n e^{-n/2}.$$
We now turn to finding $L(\hat\Theta)$. Let $(\hat\mu, \hat\sigma^2)$ be the point in the set $\Theta$ that maximizes the likelihood function $L(\mu, \sigma^2)$; by the method of maximum likelihood estimation,
$$\hat\mu = \bar X \quad \textrm{and} \quad \hat\sigma^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \hat\mu)^2.$$
Thus,
$$L(\hat\Theta) = \left(\frac{1}{\sqrt{2\pi}\,\hat\sigma}\right)^n \exp\left[-\sum_{i=1}^n \frac{(X_i-\bar X)^2}{2\hat\sigma^2}\right] = \left(\frac{1}{\sqrt{2\pi}\,\hat\sigma}\right)^n e^{-n/2},$$
and the likelihood ratio statistic is
$$\Lambda = \frac{L(\hat\Theta_0)}{L(\hat\Theta)} = \left(\frac{\hat\sigma^2}{\hat\sigma_0^2}\right)^{n/2} = \left[\frac{\sum_{i=1}^n (X_i-\bar X)^2}{\sum_{i=1}^n (X_i-\mu_0)^2}\right]^{n/2}.$$
Notice that $0 < \Lambda \le 1$ because $\Theta_0 \subset \Theta$; thus we reject $H_0$ when $\Lambda < k$, where $k < 1$ is a constant. Because
$$\sum_{i=1}^n (X_i-\mu_0)^2 = \sum_{i=1}^n [(X_i-\bar X)+(\bar X-\mu_0)]^2 = \sum_{i=1}^n (X_i-\bar X)^2 + n(\bar X-\mu_0)^2,$$
the rejection region $\Lambda < k$ is equivalent to
$$\frac{\sum_{i=1}^n (X_i-\bar X)^2}{\sum_{i=1}^n (X_i-\bar X)^2 + n(\bar X-\mu_0)^2} < k^{2/n} = k',$$
that is,
$$\frac{1}{1 + \dfrac{n(\bar X-\mu_0)^2}{\sum_{i=1}^n (X_i-\bar X)^2}} < k'.$$
Rearranging,
$$\frac{n(\bar X-\mu_0)^2}{\sum_{i=1}^n (X_i-\bar X)^2} > \frac{1}{k'} - 1 = k'',$$
or equivalently,
$$\frac{n(\bar X-\mu_0)^2}{\frac{1}{n-1}\sum_{i=1}^n (X_i-\bar X)^2} > (n-1)k''.$$
By defining
$$S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i-\bar X)^2,$$
the above rejection region is equivalent to
$$\left|\frac{\sqrt{n}(\bar X-\mu_0)}{S}\right| > \sqrt{(n-1)k''}.$$
We recognize $\sqrt{n}(\bar X-\mu_0)/S$ as the t statistic employed in previous sections, and the decision rule is exactly the same as before. Consequently, in this situation, the likelihood ratio test is equivalent to the t test. For one-sided tests, we can likewise verify that the likelihood ratio test is equivalent to the t test.
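As a numerical sanity check of this equivalence, the following Python sketch (my own; the function names `lrt_stat_mean` and `t_stat` are hypothetical, not from the notes) computes $\Lambda$ and the t statistic on simulated data and verifies the algebraic identity $\Lambda = (1 + t^2/(n-1))^{-n/2}$ implied by the derivation above.

```python
import numpy as np

def lrt_stat_mean(x, mu0):
    """Likelihood ratio Lambda for H0: mu = mu0 against a two-sided alternative."""
    n = len(x)
    s2_hat = np.mean((x - x.mean()) ** 2)   # MLE of sigma^2 under the full model
    s2_hat0 = np.mean((x - mu0) ** 2)       # MLE of sigma^2 under H0
    return (s2_hat / s2_hat0) ** (n / 2)

def t_stat(x, mu0):
    n = len(x)
    return np.sqrt(n) * (x.mean() - mu0) / np.std(x, ddof=1)

rng = np.random.default_rng(1)
x = rng.normal(loc=0.3, scale=1.0, size=20)
lam, t, n = lrt_stat_mean(x, 0.0), t_stat(x, 0.0), len(x)

# Identity from the derivation: Lambda = (1 + t^2/(n-1))^(-n/2); the two printed
# numbers agree, so thresholding Lambda is the same as thresholding |t|.
print(lam, (1 + t ** 2 / (n - 1)) ** (-n / 2))
```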
Example 2: Suppose $X_1, \cdots, X_n$ is a random sample from a normal distribution $N(\mu, \sigma^2)$ where both $\mu$ and $\sigma$ are unknown. We wish to test the hypotheses
$$H_0: \sigma^2 = \sigma_0^2 \quad \textrm{vs.} \quad H_a: \sigma^2 \ne \sigma_0^2$$
at the level $\alpha$. Show that the likelihood ratio test is equivalent to the $\chi^2$ test.
Solution: The parameter is $\theta = (\mu, \sigma^2)$. Notice that $\Theta_0$ is the set $\{(\mu, \sigma_0^2): -\infty < \mu < \infty\}$ and $\Theta_a = \{(\mu, \sigma^2): -\infty < \mu < \infty, \sigma^2 \ne \sigma_0^2\}$, and hence $\Theta = \Theta_0 \cup \Theta_a = \{(\mu, \sigma^2): -\infty < \mu < \infty, \sigma^2 > 0\}$. We must now find $L(\hat\Theta_0)$ and $L(\hat\Theta)$.
For the normal distribution, we have
$$L(\theta) = L(\mu, \sigma^2) = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^n \exp\left[-\sum_{i=1}^n \frac{(X_i-\mu)^2}{2\sigma^2}\right].$$
In the subset $\Theta_0$, we have $\sigma^2 = \sigma_0^2$, and we can find $L(\hat\Theta_0)$ if we can determine the value of $\mu$ that maximizes $L(\mu, \sigma^2)$ subject to the constraint $\sigma^2 = \sigma_0^2$. It is easy to see that the value of $\mu$ that maximizes $L(\mu, \sigma_0^2)$ is $\hat\mu_0 = \bar X$. Thus, $L(\hat\Theta_0)$ can be obtained by replacing $\mu$ with $\hat\mu_0$ and $\sigma^2$ with $\sigma_0^2$ in $L(\mu, \sigma^2)$, which yields
$$L(\hat\Theta_0) = \left(\frac{1}{\sqrt{2\pi}\,\sigma_0}\right)^n \exp\left[-\sum_{i=1}^n \frac{(X_i-\hat\mu_0)^2}{2\sigma_0^2}\right].$$
Next, we find $L(\hat\Theta)$. Let $(\hat\mu, \hat\sigma^2)$ be the point in the set $\Theta$ that maximizes the likelihood function $L(\mu, \sigma^2)$; by the method of maximum likelihood estimation,
$$\hat\mu = \bar X \quad \textrm{and} \quad \hat\sigma^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \hat\mu)^2,$$
so that
$$L(\hat\Theta) = \left(\frac{1}{\sqrt{2\pi}\,\hat\sigma}\right)^n e^{-n/2}.$$
Notice that $0 < \Lambda \le 1$ because $\Theta_0 \subset \Theta$; thus we reject $H_0$ when $\Lambda < k$, where $k < 1$ is a constant. Here
$$\Lambda = \frac{L(\hat\Theta_0)}{L(\hat\Theta)} = \left(\frac{\hat\sigma^2}{\sigma_0^2}\right)^{n/2} \exp\left[\frac{n}{2} - \frac{n\hat\sigma^2}{2\sigma_0^2}\right],$$
so the rejection region $\Lambda < k$ is equivalent to
$$\left(\frac{\hat\sigma^2}{\sigma_0^2}\right)^{n/2} \exp\left[-\frac{n}{2}\cdot\frac{\hat\sigma^2}{\sigma_0^2}\right] < k e^{-n/2} = k'.$$
Viewing the left-hand side as a function of $\hat\sigma^2/\sigma_0^2$, say $g(u) = u^{n/2}e^{-nu/2}$, note that $g$ increases on $(0, 1)$ and decreases on $(1, \infty)$, so the above inequality holds if $\hat\sigma^2/\sigma_0^2$ is either too big or too small, i.e.
$$\frac{\hat\sigma^2}{\sigma_0^2} < a \quad \textrm{or} \quad \frac{\hat\sigma^2}{\sigma_0^2} > b.$$
This inequality is equivalent to
$$\frac{n\hat\sigma^2}{\sigma_0^2} < na \quad \textrm{or} \quad \frac{n\hat\sigma^2}{\sigma_0^2} > nb.$$
We recognize $n\hat\sigma^2/\sigma_0^2$ as the $\chi^2$ statistic employed in previous sections, and the decision rule is exactly the same as before. Consequently, in this situation, the likelihood ratio test is equivalent to the $\chi^2$ test.
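A brief sketch of the resulting test in Python (again my own illustration; it uses equal-tailed $\chi^2$ cutoffs for the constants $a$ and $b$, as in the earlier sections, rather than the exact likelihood-ratio choice of $a$ and $b$):

```python
import numpy as np
from scipy import stats

def chi2_test_variance(x, sigma0_sq, alpha=0.05):
    """Reject H0: sigma^2 = sigma0^2 if n*sigma_hat^2/sigma0^2 is too small or too large."""
    n = len(x)
    sigma_hat_sq = np.mean((x - x.mean()) ** 2)   # MLE of sigma^2
    stat = n * sigma_hat_sq / sigma0_sq           # ~ chi^2_{n-1} under H0
    a, b = stats.chi2.ppf([alpha / 2, 1 - alpha / 2], df=n - 1)
    return stat, bool(stat < a or stat > b)

rng = np.random.default_rng(2)
x = rng.normal(loc=5.0, scale=2.0, size=30)
print(chi2_test_variance(x, sigma0_sq=4.0))       # H0 is true here: rejects ~5% of the time
```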
The likelihood ratio statistic $\Lambda$ is a function of the sample $X_1, \cdots, X_n$, and we can prove that it depends on the sample only through a sufficient statistic. Formally, suppose $X_1, \cdots, X_n$ is a random sample from the distribution $f(x|\theta)$, where $\theta \in \Theta$ is the unknown parameter (vector). Furthermore, assume that $T(X)$ is a sufficient statistic; then by the factorization theorem, the joint distribution of $X_1, \cdots, X_n$ can be decomposed as
$$f(X|\theta) = u(X)\,v[T(X), \theta].$$
Suppose we test
$$H_0: \theta \in \Theta_0 \quad \textrm{vs.} \quad H_a: \theta \in \Theta_a,$$
where $\Theta_0$ and $\Theta_a$ are disjoint subsets of the parameter space $\Theta$, and $\Theta_0 \cup \Theta_a = \Theta$.
Using the likelihood ratio test, we first need to find the maximal points in $\Theta_0$ and $\Theta$. In $\Theta_0$, let
$$\hat\theta_0 = \arg\max_{\theta\in\Theta_0} f(X|\theta) = \arg\max_{\theta\in\Theta_0} u(X)\,v[T(X), \theta] = \arg\max_{\theta\in\Theta_0} v[T(X), \theta];$$
then clearly $\hat\theta_0$ depends on the data $X_1, \cdots, X_n$ only through the sufficient statistic $T(X)$, and let us denote this relation as $\hat\theta_0 = g(T(X))$. Similarly, in the set $\Theta$, we have
$$\hat\theta = \arg\max_{\theta\in\Theta} f(X|\theta) = \arg\max_{\theta\in\Theta} u(X)\,v[T(X), \theta] = \arg\max_{\theta\in\Theta} v[T(X), \theta] = h(T(X)),$$
which depends on the sufficient statistic only. Consequently, the factor $u(X)$ cancels in the likelihood ratio,
$$\Lambda = \frac{f(X|\hat\theta_0)}{f(X|\hat\theta)} = \frac{u(X)\,v[T(X), \hat\theta_0]}{u(X)\,v[T(X), \hat\theta]} = \frac{v[T(X), g(T(X))]}{v[T(X), h(T(X))]},$$
so $\Lambda$ depends on the sample only through $T(X)$. For example, in Example 1, the final likelihood ratio test depends on the sample only through $\bar X$ and $S$, and we know that $(\bar X, S^2)$ is a sufficient statistic for $(\mu, \sigma^2)$.
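As a quick illustration of this point (mine, using the exponential example from the beginning of these notes), two different samples with the same value of the sufficient statistic $\sum X_i$ yield exactly the same likelihood ratio:

```python
import numpy as np

def likelihood_ratio(x, theta0, theta1):
    """LR = L(theta0)/L(theta1) for an exponential(mean theta) sample."""
    x = np.asarray(x)
    n = len(x)
    return (theta0 / theta1) ** (-n) * np.exp((1 / theta1 - 1 / theta0) * x.sum())

# Two different samples with the same sufficient statistic sum(x) = 6.0
print(likelihood_ratio([1.0, 2.0, 3.0], 2, 1))
print(likelihood_ratio([0.5, 0.5, 5.0], 2, 1))   # identical likelihood ratio
```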
Here we see the importance of sufficient statistics once again. Previously, we saw that MLE and Bayesian estimators are functions of sufficient statistics, and that in the exponential family, the efficient estimator is a linear function of the sufficient statistic.
It can be verified that the t test and F test used for two-sample hypothesis testing problems can also be reformulated as likelihood ratio tests. Unfortunately, the likelihood ratio method
does not always produce a test statistic with a known probability distribution. If the sample
size is large, however, we can obtain an approximation to the distribution of Λ if some
reasonable “regularity conditions” are satisfied by the underlying population distribution(s).
These are general conditions that hold for most (but not all) of the distributions that we
have considered. The regularity conditions mainly involve the existence of derivatives, with
respect to the parameters, of the likelihood function. Another key condition is that the
region over which the likelihood function is positive cannot depend on unknown parameter
values. In summary, we have the following theorem:
Theorem. Let $X_1, \cdots, X_n$ have joint likelihood function $L(\theta)$. Let $r_0$ be the number of free parameters under the null hypothesis $H_0: \theta \in \Theta_0$, and let $r$ be the number of free parameters under the alternative hypothesis $H_a: \theta \in \Theta_a$. Then, for large sample size $n$, the null distribution of $-2\log\Lambda$ has approximately a $\chi^2$ distribution with $r - r_0$ degrees of freedom.
In Example 1, the null hypothesis specifies $\mu = \mu_0$ but does not specify $\sigma^2$, so there is one free parameter, $r_0 = 1$; under the alternative hypothesis, there are two free parameters, so $r = 2$. For this example, the null distribution of $-2\log\Lambda$ is therefore approximately $\chi^2_1$ for large $n$.
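The quality of this $\chi^2$ approximation can be checked by simulation. The sketch below (my own, not from the notes) simulates $-2\log\Lambda$ for Example 1 under $H_0$ and compares its quantiles with those of $\chi^2_1$:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, reps = 200, 20_000
sim = np.empty(reps)
for r in range(reps):
    x = rng.normal(loc=0.0, scale=1.0, size=n)   # H0: mu = 0 is true
    lam = (np.mean((x - x.mean()) ** 2) / np.mean(x ** 2)) ** (n / 2)
    sim[r] = -2 * np.log(lam)

# Simulated quantiles of -2*log(Lambda) vs. chi^2_1 quantiles
for q in (0.50, 0.90, 0.95, 0.99):
    print(q, np.quantile(sim, q), stats.chi2.ppf(q, df=1))
```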
Example 3: Suppose that an engineer wishes to compare the number of complaints per week filed by union stewards for two different shifts at a manufacturing plant. One hundred independent observations on the number of complaints from each shift gave means $\bar x = 20$ for shift 1 and $\bar y = 22$ for shift 2. Assume that the number of complaints per week on the $i$-th shift has a Poisson distribution with mean $\theta_i$, for $i = 1, 2$. Use the likelihood ratio method to test $H_0: \theta_1 = \theta_2$ versus $H_a: \theta_1 \ne \theta_2$ with significance level $\alpha = 0.01$.
Solution. The likelihood function of the sample is the joint probability function of all the $x_i$'s and $y_i$'s, and is given by
$$L(\theta_1, \theta_2) = \prod_{i=1}^n \frac{\theta_1^{x_i}e^{-\theta_1}}{x_i!} \prod_{i=1}^n \frac{\theta_2^{y_i}e^{-\theta_2}}{y_i!} = \frac{1}{K}\,\theta_1^{\sum x_i}e^{-n\theta_1}\,\theta_2^{\sum y_i}e^{-n\theta_2},$$
where $n = 100$ and $K = \prod_{i=1}^n x_i!\,\prod_{i=1}^n y_i!$ is a constant that does not depend on $(\theta_1, \theta_2)$.
In this example, $\Theta_0 = \{(\theta_1, \theta_2): \theta_1 = \theta_2\}$, $\Theta_a = \{(\theta_1, \theta_2): \theta_1 \ne \theta_2\}$, and $\Theta = \{(\theta_1, \theta_2): \theta_1 > 0, \theta_2 > 0\}$. Using the general likelihood function $L(\theta_1, \theta_2)$, we see that $L(\theta_1, \theta_2)$ is maximized over $\Theta$ when $\hat\theta_1 = \bar x$ and $\hat\theta_2 = \bar y$; that is, $L(\theta_1, \theta_2)$ is maximized when both $\theta_1$ and $\theta_2$ are replaced by their maximum likelihood estimates. Thus,
$$L(\hat\Theta) = \frac{1}{K}\,\bar x^{\sum x_i}e^{-n\bar x}\,\bar y^{\sum y_i}e^{-n\bar y}.$$
Under $H_0$, $\theta_1 = \theta_2 = \theta$, and the maximum likelihood estimate of the common mean is $\hat\theta_0 = \left(\sum x_i + \sum y_i\right)/(2n) = (\bar x + \bar y)/2$, so
$$L(\hat\Theta_0) = \frac{1}{K}\,\hat\theta_0^{\sum x_i + \sum y_i}e^{-2n\hat\theta_0}.$$
Because $2n\hat\theta_0 = n\bar x + n\bar y$, the exponential factors cancel in the ratio, and
$$-2\log\Lambda = -2\log\frac{L(\hat\Theta_0)}{L(\hat\Theta)} = 2n\left[\bar x\log\frac{\bar x}{\hat\theta_0} + \bar y\log\frac{\bar y}{\hat\theta_0}\right].$$
Here $r_0 = 1$ and $r = 2$, so by the theorem above, $-2\log\Lambda$ is approximately $\chi^2_1$ under $H_0$. With $n = 100$, $\bar x = 20$, $\bar y = 22$, and $\hat\theta_0 = 21$,
$$-2\log\Lambda = 200\left[20\log\frac{20}{21} + 22\log\frac{22}{21}\right] \approx 9.53,$$
which exceeds the upper 0.01 critical value $\chi^2_{1, 0.01} = 6.635$. We therefore reject $H_0$ at level $\alpha = 0.01$ and conclude that the mean numbers of complaints per week differ between the two shifts.
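The arithmetic above can be verified in a few lines (a sketch of mine, assuming scipy is available):

```python
import numpy as np
from scipy import stats

n, xbar, ybar = 100, 20.0, 22.0
theta0_hat = (xbar + ybar) / 2                       # pooled MLE under H0: 21
stat = 2 * n * (xbar * np.log(xbar / theta0_hat)
                + ybar * np.log(ybar / theta0_hat))  # -2*log(Lambda), about 9.53
crit = stats.chi2.ppf(0.99, df=1)                    # 6.635
print(stat, crit, stat > crit)                       # True: reject H0
```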
3 Exercises
Exercise 1. Suppose that X = (X1 , · · · , Xn ) is a random sample from a normal distribution
with unknown mean µ and known variance σ 2 . We wish to test the hypotheses
$$H_0: \mu = \mu_0 \quad \textrm{vs.} \quad H_a: \mu \ne \mu_0$$
at the level α. Show that the likelihood ratio test is equivalent to the z test.
Exercise 2. Suppose $X_1, \cdots, X_n$ is a random sample from a normal distribution $N(\mu, \sigma^2)$ where both $\mu$ and $\sigma$ are unknown. We wish to test the hypotheses
$$H_0: \sigma^2 = \sigma_0^2 \quad \textrm{vs.} \quad H_a: \sigma^2 \ne \sigma_0^2$$
at the level $\alpha$. Show that the likelihood ratio test is equivalent to the $\chi^2$ test.