All of Stats-W
2. What are the alternative terms for parametric and nonparametric inference?
(a) Bernoulli(p)
(b) Normal(µ, σ 2 )
(c) Gamma(k, θ)
(a) Bernoulli(p)
(b) Normal(µ, σ 2 )
(c) Gamma(k, θ)
(d) Exp(λ)
(e) Poisson(λ)
(f) Beta(α, β)
16. Explain how {mle, asymptotic normality} and {equivariance, Delta method} are related to one
another.
2
Let X1 , . . . , Xn ∼ Beta(α, β) iid. The value of β is known.
Hence
$$
E(X) = \frac{1}{B(\alpha,\beta)} \int_0^1 x \times x^{\alpha-1}(1-x)^{\beta-1}\,dx
     = \frac{1}{B(\alpha,\beta)} \int_0^1 x^{\alpha}(1-x)^{\beta-1}\,dx
     = \frac{B(\alpha+1,\beta)}{B(\alpha,\beta)}
$$
$$
     = \frac{\Gamma(\alpha+1)\Gamma(\beta)/\Gamma(\alpha+\beta+1)}{\Gamma(\alpha)\Gamma(\beta)/\Gamma(\alpha+\beta)}
     = \frac{\alpha\,\Gamma(\alpha)\Gamma(\beta)/\big[(\alpha+\beta)\Gamma(\alpha+\beta)\big]}{\Gamma(\alpha)\Gamma(\beta)/\Gamma(\alpha+\beta)}
     \quad\text{(because } \Gamma(z+1)=z\Gamma(z)\text{)}
     = \frac{\alpha}{\alpha+\beta}.
$$
Equating $E(X)$ with the first sample moment $\hat\mu_1 = n^{-1}\sum_{i=1}^n X_i$ and solving for $\alpha$ gives the mome
$$
\hat\alpha_n = \beta \times \frac{\hat\mu_1}{1 - \hat\mu_1}.
$$
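A sketch of this estimator as an R function follows; the name mome.alpha matches the call in the bootstrap script below, while the body is simply the formula above.

# method-of-moments estimator of alpha for known beta (the formula above)
mome.alpha <- function(X, beta) { beta * mean(X) / (1 - mean(X)) }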
# true parameter values (example values, assumed here; beta is the known parameter)
alpha <- 2
beta <- 5
# data sizes
N <- c(5, 10, 50, 100, 500, 1000, 5000, 10000)
# bootstrap size
B <- 1000
op <- par(mfrow = c(2, 4))
for (n in N)
{
  # data from Beta(alpha, beta)
  X <- rbeta(n, alpha, beta)
  # point estimate from the observed data
  alpha.hat <- mome.alpha(X, beta)
  # parametric bootstrap
  alpha.hat.star <- replicate(B, mome.alpha(rbeta(n, alpha.hat, beta), beta))
  # histogram of the bootstrap replicates, with the estimate marked
  hist(alpha.hat.star, main = paste('n =', n), xlab = expression(hat(alpha)))
  abline(v = alpha.hat)
}
#
# this code is intended to illustrate the basic parametric bootstrap.
# consider using package boot for other purposes.
par(op)
ML estimators for the three parameters are obtained by maximizing the likelihood; i.e., by solving
$$
\frac{\partial}{\partial\alpha}\, l(\alpha,\beta,\sigma^2) = 0, \qquad
\frac{\partial}{\partial\beta}\, l(\alpha,\beta,\sigma^2) = 0, \qquad
\frac{\partial}{\partial\sigma^2}\, l(\alpha,\beta,\sigma^2) = 0
$$
together for the three parameters. In principle, one should also check whether the 3 × 3 matrix
of second partial derivatives is negative (semi)definite to ensure that it is a maximum and not a
minimum. The ML estimators turn out to be
$$
\hat\alpha = \frac{1}{2}\left( n^{-1}\sum_{i=1}^{n} x_i + m^{-1}\sum_{j=1}^{m} y_j \right), \qquad
\hat\beta  = \frac{1}{2}\left( n^{-1}\sum_{i=1}^{n} x_i - m^{-1}\sum_{j=1}^{m} y_j \right),
$$
$$
\hat{\sigma^2} = \frac{1}{n+m}\left[ \sum_{i=1}^{n} \big(x_i - (\hat\alpha+\hat\beta)\big)^2
               + \sum_{j=1}^{m} \big(y_j - (\hat\alpha-\hat\beta)\big)^2 \right].
$$
# true parameter values (example values, assumed here)
#
alpha <- 1
beta <- 0.5
sigma <- 2
# data sizes
#
n <- 60
m <- 40
# data
#
x <- rnorm(n, mean = alpha + beta, sd = sigma)
y <- rnorm(m, mean = alpha - beta, sd = sigma)
# MLE
#
alpha.hat <- 0.5 * (mean(x) + mean(y))
beta.hat <- 0.5 * (mean(x) - mean(y))
sigma.sqr.hat <- (sum((x - (alpha.hat + beta.hat))^2)
                  + sum((y - (alpha.hat - beta.hat))^2)) / (n + m)
3. Give confidence intervals for the above estimates by estimating the standard error through the
parametric bootstrap and using the asymptotic normal approximation.
We use below the standard R package boot for the parametric bootstrap. The code below is based
on the last example on the help page for function boot.
# data generator
get.data <- function(data, mle)
{
  data$x <- rnorm(data$n, mle[1] + mle[2], mle[3])
  data$y <- rnorm(data$m, mle[1] - mle[2], mle[3])
  return(data)
}
# MLE
get.mle <- function(data)
{
  # stopifnot(length(data$x) == n, length(data$y) == m)
  alpha.hat <- 0.5 * (mean(data$x) + mean(data$y))
  beta.hat <- 0.5 * (mean(data$x) - mean(data$y))
  sigma.sqr.hat <- (sum((data$x - (alpha.hat + beta.hat))^2)
                    + sum((data$y - (alpha.hat - beta.hat))^2)) / (data$n + data$m)
  # return sigma.hat rather than sigma.sqr.hat, so that mle[3] can be used
  # directly as the sd argument of rnorm() in get.data()
  c(alpha = alpha.hat, beta = beta.hat, sigma = sqrt(sigma.sqr.hat))
}
# data as a list, as expected by get.data() and get.mle()
data <- list(x = x, y = y, n = n, m = m)
# MLE
theta.hat <- get.mle(data)
# parametric bootstrap
require(boot)
R <- 1000
b <- boot(data = data, statistic = get.mle, R = R,
          sim = "parametric", ran.gen = get.data, mle = theta.hat)
# bootstrap CIs
alpha <- 0.05 # confidence parameter (note: this overwrites the model parameter alpha above)
ci <- rbind(boot.ci(b, type = 'norm', conf = 1 - alpha, index = 1)$normal[-1],
            boot.ci(b, type = 'norm', conf = 1 - alpha, index = 2)$normal[-1],
            boot.ci(b, type = 'norm', conf = 1 - alpha, index = 3)$normal[-1])
rownames(ci) <- names(theta.hat)
# histograms of the bootstrap replicates, with the MLE and CI marked
op <- par(mfrow = c(1, 3))
for (i in 1:3)
{
  h <- hist(b$t[, i], freq = FALSE, main = names(theta.hat)[i], xlab = '')
  # MLE
  abline(v = theta.hat[i]) # estimated bias = mean(b$t[, i]) - b$t0[i]
  # CI
  arrows(ci[i, 1], max(h$density), ci[i, 2], max(h$density),
         length = 0.1, angle = 90, code = 3)
}
par(op)
4
Let X1 , . . . , Xn ∼ N (µ, σ 2 ) iid. Let τ be the β percentile (0 ≤ β ≤ 1); i.e., P (X < τ ) = β.
The log-likelihood function for this problem, sans additive constants, is
$$
l_n(\mu,\sigma) = -n\log\sigma - \frac{n}{2\sigma^2}\Big[ S_n^2 + (\bar{X}_n - \mu)^2 \Big],
$$
where $\bar{X}_n = n^{-1}\sum_{i=1}^n X_i$ and $S_n^2 = n^{-1}\sum_{i=1}^n (X_i - \bar{X}_n)^2$. See Example 9.11, AoS2004. The mle of $\mu$ and $\sigma$ are, respectively,
$$
\hat\mu_n = \bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i, \qquad
\hat\sigma_n = S_n = \sqrt{\frac{1}{n}\sum_{i=1}^n (X_i - \hat\mu_n)^2}.
$$
Since $P(X < \tau) = \beta$,
$$
\tau = \sigma\,\Phi^{-1}(\beta) + \mu.
$$
By equivariance of the mle,
$$
\hat\tau_n = \hat\sigma_n\,\Phi^{-1}(\beta) + \hat\mu_n.
$$
The expected values of the four partial second derivatives of $l_n(\mu, \sigma)$ with respect to $\mu$ and $\sigma$ can be obtained using the facts $E(\bar{X}_n) = \mu$ and $E(S_n^2) = (n-1)\sigma^2/n$. (The last expression follows from the fact that $E\big(nS_n^2/(n-1)\big) = \sigma^2$.) The Fisher information matrix $I_n$ turns out to be
$$
I_n = \frac{n}{\sigma^2}
\begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}.
$$
The inverse $J_n$ of $I_n$ is
$$
J_n = \frac{\sigma^2}{n}
\begin{pmatrix} 1 & 0 \\ 0 & \tfrac{1}{2} \end{pmatrix}.
$$
The gradient of $\tau$ with respect to the parameters is
$$
\nabla\tau(\mu,\sigma) =
\begin{pmatrix} \partial\tau/\partial\mu \\ \partial\tau/\partial\sigma \end{pmatrix} =
\begin{pmatrix} 1 \\ \Phi^{-1}(\beta) \end{pmatrix},
$$
where $\Phi$ is the $N(0,1)$ cdf. For the Delta method, the gradient is required to be non-zero at the maximum-likelihood point $(\hat\mu, \hat\sigma)$. The standard error of $\hat\tau_n$ is
$$
se(\hat\tau_n) = \sqrt{(\nabla\tau)^T J_n\, (\nabla\tau)}
             = \sqrt{\frac{\sigma^2}{n}\left[ 1 + \frac{\big(\Phi^{-1}(\beta)\big)^2}{2} \right]}.
$$
Plugging $\hat\sigma_n$ in for $\sigma$ gives the estimated standard error $\widehat{se}(\hat\tau_n)$. Then
$$
\frac{\hat\tau_n - \tau}{\widehat{se}(\hat\tau_n)} \rightsquigarrow N(0,1),
$$
and an approximate $(1-\alpha)$ confidence interval is
$$
\hat\tau_n \pm z_{\alpha/2}\, \widehat{se}(\hat\tau_n).
$$
3. Consider the LakeHuron data available in R. Find the maximum likelihood estimate of τ for this
dataset. Find the standard error using the delta method. Find the standard error using the
parametric bootstrap.
# visualize
#
op <- par(mfrow = c(1, 3))
# time series plot
plot(LakeHuron, bty = 'n')
# QQ against normal
qqnorm(LakeHuron, bty = 'n')
qqline(LakeHuron)
# data size
#
n <- length(LakeHuron)
# quantile
#
beta <- 0.95
qb <- qnorm(beta)
# MLE
#
mu.hat <- mean(LakeHuron)
sigma.hat <- sd(LakeHuron) * sqrt((n - 1) / n)
tau.hat <- sigma.hat * qb + mu.hat
tau.hat.se.hat <- sqrt(sigma.hat^2 * (1 + 0.5 * qb^2) / n)
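The question also asks for the standard error via the parametric bootstrap; a minimal sketch, reusing the quantities computed above and an assumed bootstrap size B:

# parametric bootstrap: resample from N(mu.hat, sigma.hat^2), recompute tau.hat
B <- 1000
tau.hat.star <- replicate(B, {
  x.star <- rnorm(n, mean = mu.hat, sd = sigma.hat)
  sigma.hat.star <- sd(x.star) * sqrt((n - 1) / n)
  sigma.hat.star * qb + mean(x.star)
})
tau.hat.se.boot <- sd(tau.hat.star)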
5
Two coins have P(Head) equal to p and p² respectively, where 0 < p < 1. Each coin is tossed exactly
once. The resulting rvs are denoted X and Y respectively. In other words, X ∼ Bernoulli(p) and
Y ∼ Bernoulli(p²).
1. Write the likelihood function for p, the log-likelihood, and its first two derivatives, assuming
independence between X and Y.
4. Based on any insights from the previous result, plus some guesswork/trial-and-error, find all
solutions to the likelihood equation in terms of X and Y .
Trial-and-error shows that p = 1 is one solution to this equation. Factoring p = 1 out of the
likelihood equation, we get the quadratic equation
$$
3p^2 + (1 - X)\,p - (X + 2Y) = 0,
$$
whose roots are
$$
p_{\pm} = \frac{1}{6}\Big[ -(1 - X) \pm \sqrt{(1 - X)^2 + 12(X + 2Y)} \Big]
        = \frac{1}{6}\Big[ (X - 1) \pm \sqrt{1 + 10X + 24Y + X^2} \Big].
$$
Both roots are real for the four joint possibilities for X and Y; namely, (a) X = 1 and Y = 0, (b)
X = 1 and Y = 1, (c) X = 0 and Y = 1, and (d) X = 0 and Y = 0. One of these roots,
namely 1 or $p_{+}$, is the mle.
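As a quick numerical check (a sketch; the helper names lik and p.plus are introduced here only for illustration), one can evaluate the likelihood at p+ and at 1 for each of the four cases:

lik <- function(p, X, Y) p^(X + 2 * Y) * (1 - p)^(1 - X) * (1 - p^2)^(1 - Y)
p.plus <- function(X, Y) ((X - 1) + sqrt(1 + 10 * X + 24 * Y + X^2)) / 6
for (X in 0:1) for (Y in 0:1)
  cat(sprintf("X = %d, Y = %d: p+ = %.3f, L(p+) = %.3f, L(1) = %.3f\n",
              X, Y, p.plus(X, Y), lik(p.plus(X, Y), X, Y), lik(1, X, Y)))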
5. Plot the likelihood, log-likelihood, and its first two derivatives as functions of p for the four cases
(a) X = 1 and Y = 0, (b) X = 1 and Y = 1, (c) X = 0 and Y = 1, and (d) X = 0 and Y = 0.
ll <- function(p, X, Y, deriv = 0)
{
  stopifnot(length(deriv) == 1, deriv %in% (0:2),
            all(X %in% (0:1)), all(Y %in% (0:1)))
  nX <- length(X)
  nY <- length(Y)
  sX <- sum(X)
  sY <- sum(Y)
  if (deriv == 0)
  {
    i <- which(p %in% c(0, 1))
    j <- setdiff(1:length(p), i)
    p[i] <- -Inf
    p[j] <- (sX + 2*sY) * log(p[j]) + (nX - sX) * log(1 - p[j]) + (nY - sY) * log(1 - p[j]^2)
    return(p)
  }
  # first and second derivatives of the log-likelihood
  if (deriv == 1)
    return((sX + 2*sY) / p - (nX - sX) / (1 - p) - 2 * p * (nY - sY) / (1 - p^2))
  -(sX + 2*sY) / p^2 - (nX - sX) / (1 - p)^2 - 2 * (nY - sY) / (1 - p^2) - 4 * p^2 * (nY - sY) / (1 - p^2)^2
}
# mle: the larger root p+ of the quadratic above (function name assumed)
p.mle <- function(X, Y)
{
  nX <- length(X)
  nY <- length(Y)
  sX <- sum(X)
  sY <- sum(Y)
  0.5 * (sX - nX + sqrt((sX - nX)^2 + 4 * (nX + 2*nY) * (sX + 2*sY))) / (nX + 2*nY)
}
Computation of the derivatives in the script above may not be correct for p = 0, 1.
6. Generalize the likelihood function, log-likelihood, and its derivatives for X1 , . . . , XN ∼ Bernoulli(p)
and Y1 , . . . , YM ∼ Bernoulli(p2 ). Assume independence.
Let $S_X = \sum_{i=1}^{N} X_i$ and $S_Y = \sum_{j=1}^{M} Y_j$. Then
$$
L(p) = p^{S_X}(1-p)^{N-S_X} \times (p^2)^{S_Y}(1-p^2)^{M-S_Y}
     = p^{S_X+2S_Y}(1-p)^{N-S_X}(1-p^2)^{M-S_Y}
$$
$$
l(p) = (S_X + 2S_Y)\log p + (N - S_X)\log(1-p) + (M - S_Y)\log(1-p^2)
$$
$$
\frac{d}{dp}\, l(p) = \frac{S_X + 2S_Y}{p} - \frac{N - S_X}{1-p} - 2p\,\frac{M - S_Y}{1-p^2}
$$
$$
\frac{d^2}{dp^2}\, l(p) = -\frac{S_X + 2S_Y}{p^2} - \frac{N - S_X}{(1-p)^2} - 2\,\frac{M - S_Y}{1-p^2} - (2p)^2\,\frac{M - S_Y}{(1-p^2)^2}.
$$
which has p = 1 as one solution. The other two solutions are the roots of the quadratic
$$
(N + 2M)\,p^2 + (N - S_X)\,p - (S_X + 2S_Y) = 0,
$$
namely,
$$
p_{\pm} = \frac{1}{2(N + 2M)}\Big[ -(N - S_X) \pm \sqrt{(N - S_X)^2 + 4(N + 2M)(S_X + 2S_Y)} \Big].
$$
6
If X1 , X2 , . . . , Xn ∼ Poisson(λ) iid, design estimators for λ using estimation methods you have studied.
The Poisson distribution is a discrete distribution defined over the nonnegative integers $k = 0, 1, \ldots$ by the pmf
$$
P(k) = \frac{\lambda^k e^{-\lambda}}{k!},
$$
with $\lambda > 0$. It is normalized (i.e., $1 = \sum_{k=0}^{\infty} P(k)$) because $\sum_{k=0}^{\infty} \lambda^k/k! = e^{\lambda}$.
1. mome. We have
$$
E(X) = \sum_{k=0}^{\infty} k\,P(k) = \sum_{k=1}^{\infty} k\,P(k)
     = e^{-\lambda} \sum_{k=1}^{\infty} \frac{\lambda^k}{(k-1)!}
     = e^{-\lambda} \sum_{k=0}^{\infty} \frac{\lambda^{k+1}}{k!}
     = \lambda \times e^{-\lambda} \sum_{k=0}^{\infty} \frac{\lambda^k}{k!}
     = \lambda \times \sum_{k=0}^{\infty} P(k)
     = \lambda.
$$
Equating this with the first sample moment $n^{-1}\sum_{i=1}^{n} X_i$, we get the mome
$$
\hat\lambda = n^{-1} \sum_{i=1}^{n} X_i.
$$
3. plug-in. This is a parametric setting and, as such, the plug-in method is not appropriate. If one
uses E(X) = λ and the plug-in estimator for the mean, then this is the same as the mome.
1. X1 , X2 , . . . , Xn ∼ Uniform(0, θ), where θ > 0. What is the maximum likelihood estimator for θ?
This is AoS2004 Example 9.12.
$$
f_X(x; \theta) =
\begin{cases}
\frac{1}{\theta} & 0 \le x \le \theta \\
0 & \text{otherwise}
\end{cases}.
$$
Because $\theta$ defines the right boundary of the pdf, the pdf is discontinuous at $x = \theta$. This
discontinuity will be reflected in the likelihood function. As such, the derivative-based method of
maximization is inapplicable here. The likelihood function is
$$
L(\theta) = \prod_{i=1}^{n} f_X(X_i; \theta) =
\begin{cases}
\theta^{-n} & 0 \le X_1, \ldots, X_n \le \theta \\
0 & \text{otherwise}
\end{cases}.
$$
To see how $L(\theta)$ changes with $\theta$, imagine sliding the value of $\theta$ from left to right across the range
of the observed data.
(Figure: the observed data points $X_1, \ldots, X_n$ marked on a number line, with $\theta$ sliding across them.)
As long as any data point remains to the right of θ, the likelihood L(θ) will be equal to zero.
Continuing the same argument, the likelihood will be nonzero (= θ−n ) for θ ≥ max{X1 , . . . , Xn }.
So
$$
L(\theta) =
\begin{cases}
0 & \theta < \max\{X_1, \ldots, X_n\} \\
\theta^{-n} & \theta \ge \max\{X_1, \ldots, X_n\}
\end{cases}.
$$
Clearly, the maximum value of $L$ occurs at the boundary, i.e., $\theta = \max\{X_1, \ldots, X_n\}$. Therefore,
$$
\hat\theta_n = \max\{X_1, \ldots, X_n\}.
$$
theta <- 1
n <- 10
u <- runif(n, 0, theta)
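A minimal sketch of one way to visualize the likelihood for this sample (the vector u from above is assumed):

# L(theta) is zero below max(u) and theta^(-n) at or beyond it
lik <- function(theta, u) ifelse(theta < max(u), 0, theta^(-length(u)))
curve(lik(x, u), from = 0, to = 2 * theta, n = 501,
      bty = 'n', lwd = 2, xlab = expression(theta), ylab = 'likelihood')
abline(v = max(u), lty = 2)  # the mle sits at the boundary max(u)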
Requiring all data to be inside the interval [0, θ] is the same as requiring the maximum data value to
be inside the interval [0, θ]. Therefore,
$$
\hat\theta_n = \min\{X_1, \ldots, X_n\}.
$$
8
Let X1 , X2 , . . . , Xn ∼ Uniform(a, b) iid. a and b are the unknown parameters. Assume that a < b.
The first two sample moments (i.e., estimators for the first two moments above) are
$$
\hat\mu_1 = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad
\hat\mu_2 = \frac{1}{n}\sum_{i=1}^{n} X_i^2.
$$
Equating $\mu_k$ with $\hat\mu_k$, $k = 1, 2$, we get two equations for the estimators of $a$ and $b$:
$$
\hat\mu_1 = \frac{1}{2}\big(\hat a_n + \hat b_n\big), \qquad
\hat\mu_2 = \frac{1}{3}\big(\hat a_n^2 + \hat a_n \hat b_n + \hat b_n^2\big).
$$
Solving for $\hat a_n$ and $\hat b_n$, we get the following two pairs of solutions:
$$
\hat a_n = \hat\mu_1 + \sqrt{3(\hat\mu_2 - \hat\mu_1^2)}, \qquad
\hat b_n = \hat\mu_1 - \sqrt{3(\hat\mu_2 - \hat\mu_1^2)}
$$
OR
$$
\hat a_n = \hat\mu_1 - \sqrt{3(\hat\mu_2 - \hat\mu_1^2)}, \qquad
\hat b_n = \hat\mu_1 + \sqrt{3(\hat\mu_2 - \hat\mu_1^2)}.
$$
Notice that $\hat\mu_2 - \hat\mu_1^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \hat\mu_1)^2 > 0$. Hence, the required pair of estimators, the one
consistent with $a < b$, is the second one.
$$
\hat a_n = \min\{X_1, \ldots, X_n\}, \qquad
\hat b_n = \max\{X_1, \ldots, X_n\}.
$$
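A small numerical check of both pairs of estimators on simulated data (true values a = 2, b = 7 assumed here):

a <- 2; b <- 7; n <- 1000
x <- runif(n, a, b)
m1 <- mean(x); m2 <- mean(x^2)
s <- sqrt(3 * (m2 - m1^2))
c(a.mome = m1 - s, b.mome = m1 + s, a.mle = min(x), b.mle = max(x))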
9
Let X1 , X2 , . . . , Xn be independent and identically distributed rvs with the pdf
$$
f(x; \alpha) = \frac{\alpha + 1}{x^{\alpha + 2}}, \qquad x \ge 1, \; \alpha > 0.
$$
This distribution function is a special form of the Pareto distribution, with parameter $x_m = 1$.
$$
L(\alpha) = \prod_{i=1}^{n} \frac{\alpha + 1}{X_i^{\alpha + 2}}
          = (\alpha + 1)^n \left( \prod_{i=1}^{n} X_i \right)^{-(\alpha + 2)}
$$
$$
l(\alpha) = n \log(\alpha + 1) - (\alpha + 2) \sum_{i=1}^{n} \log(X_i)
$$
$$
\frac{dl(\alpha)}{d\alpha} = \frac{n}{\alpha + 1} - \sum_{i=1}^{n} \log(X_i).
$$
Setting the derivative to zero and solving for $\alpha$, we get the mle
$$
\hat\alpha_n = \left( \frac{1}{n} \sum_{i=1}^{n} \log(X_i) \right)^{-1} - 1.
$$
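A simulation check of this mle (assumed true value of α; sampling by inverting the cdf F(x) = 1 − x^(−(α+1))):

alpha <- 3
n <- 10000
x <- runif(n)^(-1 / (alpha + 1))  # inverse-cdf sampling, x >= 1
alpha.hat <- 1 / mean(log(x)) - 1
c(alpha = alpha, alpha.hat = alpha.hat)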
10
Suppose we have independent normal data with unequal means but the same variance; i.e., Xi ∼
N(µi, σ²), i = 1, . . . , n. The means µi are all known.
The log-likelihood, sans additive constants, is $l(\sigma) = -n\log\sigma - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu_i)^2$; differentiating twice and using $E(X_i - \mu_i)^2 = \sigma^2$ gives
$$
I(\sigma) = -E\left[ \frac{d^2}{d\sigma^2}\, l(\sigma) \right] = \frac{2n}{\sigma^2}.
$$
11
Suppose we have independent normal data with the same mean but unequal variances; i.e., Xi ∼
N (µ, σi2 ), i = 1, . . . , n. The variances σi2 are all known.
The mle is the weighted mean
$$
\hat\mu = \sum_{i=1}^{n} w_i X_i, \qquad \text{where} \quad
w_i = \frac{1/\sigma_i^2}{\sum_{j=1}^{n} 1/\sigma_j^2}.
$$
Therefore,
$$
I(\mu) = -E\left[ \frac{d^2}{d\mu^2}\, l(\mu) \right] = \sum_{i=1}^{n} \frac{1}{\sigma_i^2}.
$$
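A minimal sketch in R (values assumed): the weighted-mean mle and its standard error 1/sqrt(I(µ)):

mu <- 3
sigma.i <- runif(20, 0.5, 2)               # known, unequal standard deviations
x <- rnorm(20, mean = mu, sd = sigma.i)
w <- (1 / sigma.i^2) / sum(1 / sigma.i^2)  # the weights defined above
mu.hat <- sum(w * x)
mu.hat.se <- 1 / sqrt(sum(1 / sigma.i^2))
c(mu.hat = mu.hat, se = mu.hat.se)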
12
X1 , X2 , . . . , Xn ∼ Exp(θ) iid.
13
n1 people are given treatment 1 and n2 people are given treatment 2. n1 and n2 are known. Let X1 be
the number of people on treatment 1 who respond favourably to the treatment and X2 be the number of
people on treatment 2 who respond favourably to the treatment. Assume that X1 ∼ Binomial(n1 , p1 ),
X2 ∼ Binomial(n2 , p2 ). Let ψ = p1 − p2 .
3. Use the multiparameter delta method to find the asymptotic standard error of ψ̂.
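A minimal sketch of the computation being asked for (the sample sizes and observed counts below are assumed values): ψ̂ = p̂1 − p̂2, with the delta-method standard error sqrt(p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2):

n1 <- 80; n2 <- 120
x1 <- 52; x2 <- 61                       # assumed observed counts
p1.hat <- x1 / n1; p2.hat <- x2 / n2
psi.hat <- p1.hat - p2.hat
psi.hat.se <- sqrt(p1.hat * (1 - p1.hat) / n1 + p2.hat * (1 - p2.hat) / n2)
c(psi.hat = psi.hat, se = psi.hat.se)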
14
Let X1 , . . . , Xn be iid rvs that take values θ + 1 and θ − 1 with probabilities p and 1 − p respectively.
Here, 0 ≤ p ≤ 1 is known. θ, the unknown, is a real number. We wish to estimate θ from data
X1 , X2 , . . . , Xn .
$$
E(X) = p \times (\theta + 1) + (1 - p) \times (\theta - 1) = 2p + \theta - 1.
$$
Equating this with the sample mean $\bar{X}_n$ gives the mome $\hat\theta_n = \bar{X}_n - 2p + 1$. It can be shown easily that this estimator is unbiased, and has variance $4p(1 - p)/n$.
$$
\hat\theta_n = \min(X_1, \ldots, X_n) + 1
$$
$$
\hat\theta_n = \max(X_1, \ldots, X_n) - 1
$$
8. If all the observations turn out to be identical, would you need to modify your estimators? If yes,
state the modification and justify.
The events
$$
X_1 = \ldots = X_n = \theta + 1
$$
and
$$
X_1 = \ldots = X_n = \theta - 1
$$
could occur with probabilities $p^n$ and $(1 - p)^n$ respectively. However, we don't know which
possibility has actually shown up. In these two cases, all previous estimators will lead to wrong
estimates of $\theta$. Suppose $X_1 = \ldots = X_n = \alpha$, where $\alpha$ is either $\theta + 1$ or $\theta - 1$; we don't know which.
The following estimators may work better for this situation:
$$
\hat\theta_n =
\begin{cases}
\alpha - 1 & \text{if } p > 1 - p \\
\alpha + 1 & \text{otherwise}
\end{cases}
$$
$$
\hat\theta_n =
\begin{cases}
\alpha - 1 & \text{with probability } p \\
\alpha + 1 & \text{with probability } 1 - p
\end{cases}
$$
The second estimator above requires generating a Uniform(0,1) random number $r$, and choosing
$\alpha - 1$ if $r \le p$, and $\alpha + 1$ otherwise. It appears more appropriate when $p = 1/2$.
** CONFIRM **
15
Suppose we have paired data (Xi , Yi ), i = 1, . . . , n. Let us assume that X and Y are related to each
other as
$$
Y_i = (aX_i + b) + \epsilon_i,
$$
where the random noise $\epsilon_i \sim N(0, 1)$. Find the mles of $a, b$.
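Since the noise is N(0, 1), maximizing the likelihood amounts to minimizing Σ(Yi − aXi − b)²; a minimal sketch on simulated data (true values assumed), cross-checked against lm():

n <- 100
X <- runif(n, 0, 10)
Y <- 2 * X + 1 + rnorm(n)        # assumed true a = 2, b = 1
a.hat <- cov(X, Y) / var(X)      # least-squares slope
b.hat <- mean(Y) - a.hat * mean(X)
c(a.hat = a.hat, b.hat = b.hat)
coef(lm(Y ~ X))                  # same estimates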
16
Suppose you have paired data (xi , yi ), i = 1, . . . , n, where xi are precisely known (they are not rvs).
Further, you are expecting a relationship between xi and yi that has the form
$$
y_i = a\, e^{x_i/b} + \epsilon_i,
$$
where $\epsilon_i \sim N(0, \sigma^2)$ iid and $\sigma^2$ is not known. Write down the log-likelihood function and the likelihood
equations for estimation of $a, b, \sigma^2$.
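One way to write these down, as a sketch: with iid $N(0, \sigma^2)$ noise the log-likelihood is
$$
l(a, b, \sigma^2) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\big(y_i - a e^{x_i/b}\big)^2,
$$
and the likelihood equations are
$$
\frac{\partial l}{\partial a} = \frac{1}{\sigma^2}\sum_{i=1}^{n}\big(y_i - a e^{x_i/b}\big)\, e^{x_i/b} = 0, \qquad
\frac{\partial l}{\partial b} = -\frac{a}{\sigma^2 b^2}\sum_{i=1}^{n} x_i \big(y_i - a e^{x_i/b}\big)\, e^{x_i/b} = 0,
$$
$$
\frac{\partial l}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n}\big(y_i - a e^{x_i/b}\big)^2 = 0.
$$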
17
Let X1, . . . , Xn ∼ N(µ, σ²) iid, with known µ and unknown σ. Find the mle of σ and ψ = log σ. Find
the Fisher information for σ. Obtain thereby an estimate of the standard error of ψ̂, together with a
(1 − α) confidence interval on ψ.
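A minimal sketch of the standard route (simulated data, values assumed): σ̂ from the known-mean sample variance, ψ̂ = log σ̂ by equivariance, and, via the Fisher information I(σ) = 2n/σ² and the delta method, se(ψ̂) ≈ 1/sqrt(2n):

mu <- 0; n <- 100
x <- rnorm(n, mean = mu, sd = 2)   # assumed true sigma = 2
sigma.hat <- sqrt(mean((x - mu)^2))
psi.hat <- log(sigma.hat)
psi.hat.se <- 1 / sqrt(2 * n)
alpha <- 0.05
ci <- psi.hat + c(-1, 1) * qnorm(1 - alpha / 2) * psi.hat.se
c(psi.hat = psi.hat, se = psi.hat.se, lower = ci[1], upper = ci[2])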
18
Let X1, . . . , Xn ∼ Bernoulli(p) and let ψ = g(p) = log(p/(1 − p)). State the maximum likelihood
estimate for p. Using that, derive the formula for the maximum likelihood estimate ψ̂ of ψ. Also compute
the standard error of ψ̂.
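A minimal sketch (simulated data, values assumed): p̂ is the sample mean, ψ̂ = log(p̂/(1 − p̂)) by equivariance, and the delta method gives se(ψ̂) = 1/sqrt(n p̂(1 − p̂)):

n <- 200
x <- rbinom(n, size = 1, prob = 0.3)   # assumed true p = 0.3
p.hat <- mean(x)
psi.hat <- log(p.hat / (1 - p.hat))
psi.hat.se <- 1 / sqrt(n * p.hat * (1 - p.hat))
c(p.hat = p.hat, psi.hat = psi.hat, se = psi.hat.se)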
19
Let X1 , . . . , Xn ∼ f (·; k, θ) iid, where
$$
f(x; k, \theta) =
\begin{cases}
\frac{k+1}{\theta^{k+1}}\, (\theta - x)^k & 0 \le x \le \theta \\
0 & \text{otherwise}
\end{cases}.
$$
We use the Beta integral $\int_0^1 y^{a-1}(1-y)^{b-1}\,dy = \Gamma(a)\Gamma(b)/\Gamma(a+b)$ and the Gamma function properties $\Gamma(n) = (n - 1)!$ for integer $n$, and $\Gamma(k + 1) = k\,\Gamma(k)$.
Second moment of the pdf:
$$
E(X^2) = \frac{k+1}{\theta^{k+1}} \int_0^{\theta} x^2 (\theta - x)^k\,dx
       = \frac{k+1}{\theta} \int_0^{\theta} x^2 \left(1 - \frac{x}{\theta}\right)^k dx
       = \theta^2 (k+1) \int_0^1 y^2 (1 - y)^k\,dy
$$
$$
       = \theta^2 (k+1)\, \frac{\Gamma(3)\Gamma(k+1)}{\Gamma(k+4)}
       = \frac{2\theta^2}{(k+3)(k+2)}.
$$
Equating the first two moments of the pdf with the first two sample moments $\hat\mu_1 = n^{-1}\sum_{i=1}^{n} X_i$ and
$\hat\mu_2 = n^{-1}\sum_{i=1}^{n} X_i^2$, we get the mome equations
$$
\hat\mu_1 = \frac{\hat\theta}{\hat k + 2}, \qquad
\hat\mu_2 = \frac{2\hat\theta^2}{(\hat k + 3)(\hat k + 2)}.
$$
Solving for $\hat k$ and $\hat\theta$, we get the mome
$$
\hat k = \frac{4\hat\mu_1^2 - 3\hat\mu_2}{\hat\mu_2 - 2\hat\mu_1^2}, \qquad
\hat\theta = (\hat k + 2)\,\hat\mu_1.
$$
2. Write the likelihood and log-likelihood functions. Proceed as far as possible towards finding the
mle for the parameters.
The pdf is discontinuous at x = 0 and possibly non-differentiable at x = θ. Since θ defines
a boundary for the pdf, we need to analyze the behaviour of the likelihood function carefully.
Write the pdf in the alternative form
$$
f_X(x; k, \theta) = \frac{k+1}{\theta^{k+1}}\, (\theta - x)^k\, I_{[0,\theta]}(x),
$$
where
$$
I_A(x) =
\begin{cases}
1 & x \in A \\
0 & x \notin A
\end{cases}.
$$
Because of the indicator functions $I_{[0,\theta]}(X_i)$, $i = 1, \ldots, n$, the likelihood as a function of $\theta$ will
be nonzero only when $\theta > \max\{X_1, \ldots, X_n\}$. We may write the likelihood and log-likelihood
functions as
$$
L(k, \theta) =
\begin{cases}
0 & \theta \le \max\{X_1, \ldots, X_n\} \\
\left( \frac{k+1}{\theta^{k+1}} \right)^{n} \prod_{i=1}^{n} (\theta - X_i)^k & \theta > \max\{X_1, \ldots, X_n\}
\end{cases}
$$
$$
l(k, \theta) =
\begin{cases}
-\infty & \theta \le \max\{X_1, \ldots, X_n\} \\
n \log(k+1) - n(k+1)\log(\theta) + k \sum_{i=1}^{n} \log(\theta - X_i) & \theta > \max\{X_1, \ldots, X_n\}
\end{cases}
$$
Assuming that $\theta$ is constrained to the region $\theta > \max\{X_1, \ldots, X_n\}$, we may write the likelihood
equations
$$
\frac{\partial l}{\partial \theta} = -\frac{n(k+1)}{\theta} + k \sum_{i=1}^{n} \frac{1}{\theta - X_i} = 0
$$
$$
\frac{\partial l}{\partial k} = \frac{n}{k+1} - n \log(\theta) + \sum_{i=1}^{n} \log(\theta - X_i) = 0,
$$
which turn out to be nonlinear in the parameters, and presumably hard or impossible to solve
analytically. We may have to resort to numerical maximization methods or numerical methods
for solving systems of nonlinear equations.
3. Write R code to plot the log-likelihood function (as a function of θ for a fixed value of k) to see if the
mle of θ necessarily lies on the boundary θ = max{X1, . . . , Xn} or to the right of this boundary.
Write R code to make a surface plot or heatmap of the log-likelihood as a function of both θ and k.
Core functions:
# pdf; the body follows the density given above (masks stats::df, as in the original script)
df <- function(x, theta = 1, k = 1)
{
  # probability density function
  ifelse(x >= 0 & x <= theta, (k + 1) / theta^(k + 1) * (theta - x)^k, 0)
}
# CDF (function header reconstructed)
pf <- function(x, theta = 1, k = 1)
{
  # cumulative distribution function
  l <- which(x < 0)
  m <- which(x >= 0 & x <= theta)
  n <- which(x > theta)
  if (!is.null(l)) x[l] <- 0
  if (!is.null(m)) x[m] <- 1 - (1 - x[m] / theta)^(k + 1)
  if (!is.null(n)) x[n] <- 1
  x
}
27
Tutorial C09 | Statistical Inference (C206) | Mihir Arjunwadkar | MTech M&S 2016-17 19
# quantile function (function header reconstructed)
qf <- function(q, theta = 1, k = 1)
{
  # quantile function: inverse of pf above
  theta * (1 - (1 - q)^(1 / (k + 1)))
}
# random number generator, by inverse-cdf sampling
rf <- function(n, theta = 1, k = 1)
{
  qf(runif(n), theta, k)
}
theta <- 1
k <- 1 / 2
op <- par(mfrow = c(1, 2))
# cdf
curve(pf(x, theta, k), from = -0.5, to = theta + 0.5, n = 501,
      bty = 'n', lwd = 2, xlim = c(-0.5, theta + 0.5), ylab = 'CDF', asp = 1)
abline(h = c(0, 1), v = c(0, theta), lty = 2)
# qf
curve(qf(x, theta, k), from = 0, to = 1, n = 501,
      bty = 'n', lwd = 2, xlim = c(-0.1, 1.1), ylab = 'QF', xlab = 'q', asp = 1)
abline(h = c(0, theta), v = c(0, 1), lty = 2)
par(op)
theta <- 1
k <- 1 / 2
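A minimal sketch of the first plot being asked for (data simulated with the rf sampler above; values assumed): the log-likelihood as a function of θ for a fixed k, with the boundary max{Xi} marked.

n <- 50
X <- rf(n, theta, k)
loglik <- function(theta, k, X)
  sapply(theta, function(t)
    if (t <= max(X)) -Inf
    else length(X) * log(k + 1) - length(X) * (k + 1) * log(t) + k * sum(log(t - X)))
curve(loglik(x, k, X), from = max(X) * 1.0001, to = 2 * theta, n = 501,
      bty = 'n', lwd = 2, xlab = expression(theta), ylab = 'log-likelihood')
abline(v = max(X), lty = 2)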
20
Let X1 , . . . , Xn ∼ f (·; θ, δ) iid, where
$$
f(x; \theta, \delta) =
\begin{cases}
\frac{1}{\theta}\, e^{-\frac{x - \delta}{\theta}} & \text{for } x \ge \delta \\
0 & \text{otherwise}
\end{cases}
$$
The pdf defined above can be thought of as a shifted version of the exponential distribution. It can be
verified that $\frac{1}{\theta} \int_{\delta}^{\infty} e^{-\frac{x - \delta}{\theta}}\,dx = 1$.
$$
\mu_1 = E(X) = \frac{1}{\theta} \int_{\delta}^{\infty} x\, e^{-\frac{x - \delta}{\theta}}\,dx
             = \int_0^{\infty} (\theta y + \delta)\, e^{-y}\,dy = \theta + \delta
$$
$$
\mu_2 = E(X^2) = \frac{1}{\theta} \int_{\delta}^{\infty} x^2\, e^{-\frac{x - \delta}{\theta}}\,dx
               = \int_0^{\infty} (\theta y + \delta)^2\, e^{-y}\,dy
               = 2\theta^2 + 2\theta\delta + \delta^2 = \theta^2 + \mu_1^2,
$$
where we have used the result $\int_0^{\infty} y^{k-1} e^{-y}\,dy \equiv \Gamma(k) = (k-1)!$ for positive integer $k$. The first
two sample moments are
$$
\hat\mu_1 = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad
\hat\mu_2 = \frac{1}{n}\sum_{i=1}^{n} X_i^2.
$$
Equating the respective moments with their sample counterparts, we get the mome equations
$$
\hat\mu_1 = \hat\theta_n + \hat\delta_n, \qquad
\hat\mu_2 = \hat\theta_n^2 + \hat\mu_1^2,
$$
which give $\hat\theta_n = \sqrt{\hat\mu_2 - \hat\mu_1^2}$ and $\hat\delta_n = \hat\mu_1 - \hat\theta_n$.
Clearly, something is wrong with this naive treatment: the culprit is that $f$ is neither continuous
nor differentiable at $x = \delta$. Alternate arguments lead to
$$
\hat\delta_n = \min\{X_1, \ldots, X_n\}.
$$
** CONFIRM **
# CDF
pf <- function(x, delta = 0, theta = 1) { pexp(x - delta, rate = 1 / theta) }
# QF
qf <- function(q, delta = 0, theta = 1) { qexp(q, rate = 1 / theta) + delta }
# RNG
rf <- function(n, delta = 0, theta = 1) { rexp(n, rate = 1 / theta) + delta }
delta <- 1
theta <- 2
op <- par(mfrow = c(1, 2))
# cdf
# qf
curve(qf(x, delta, theta), from = 0, to = 1, n = 501, bty = 'n', lwd = 2,
      xlim = c(-0.1, 1.1), ylim = c(-delta, 10 * delta), ylab = 'QF', xlab = 'q')
abline(h = delta, v = c(0, 1), lty = 2)
axis(4, at = delta, label = expression(delta), las = 2)
par(op)
delta <- 1
theta <- 2
# log-likelihood as a function of delta, with a fixed value for theta
# log-likelihood --> -infinity for delta > min(r) (not shown)
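A minimal sketch of the elided plot (data vector r and values assumed): the log-likelihood as a function of δ for fixed θ; it increases up to min(r) and is −∞ beyond it.

n <- 100
r <- rf(n, delta, theta)
loglik <- function(delta, theta, r)
  sapply(delta, function(d)
    if (d > min(r)) -Inf else -length(r) * log(theta) - sum(r - d) / theta)
curve(loglik(x, theta, r), from = delta - 2, to = min(r), n = 501,
      bty = 'n', lwd = 2, xlab = expression(delta), ylab = 'log-likelihood')
abline(v = min(r), lty = 2)  # the mle delta.hat = min(r)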
21
Let X1 , . . . , Xn ∼ Gamma(k, θ) iid. The pdf of Gamma(k, θ) is
$$
f(x; k, \theta) =
\begin{cases}
\frac{x^{k-1} e^{-x/\theta}}{\theta^k\, \Gamma(k)} & x > 0 \\
0 & \text{otherwise}
\end{cases}
$$
with $k, \theta > 0$.
1. Find the mome for k and θ.
We need the first two moments of Gamma(k, θ):
$$
\mu_1 = E(X) = k\theta, \qquad
\mu_2 = E(X^2) = \mathrm{Var}(X) + (E(X))^2 = k\theta^2 + (k\theta)^2.
$$
Equating these with the sample moments gives the mome equations
$$
\hat\mu_1 = \hat k \hat\theta, \qquad
\hat\mu_2 = \hat k \hat\theta^2 + (\hat k \hat\theta)^2,
$$
whose solution is
$$
\hat k = \frac{\hat\mu_1^2}{\hat\mu_2 - \hat\mu_1^2}, \qquad
\hat\theta = \frac{\hat\mu_2 - \hat\mu_1^2}{\hat\mu_1}.
$$
For known $k$, the log-likelihood in $\theta$, sans additive constants, is $l(\theta) = -nk\log\theta - \theta^{-1}\sum_{i=1}^{n} x_i$.
This is a continuous and differentiable function of $\theta$, so to maximize it, we find its derivative and
set it to zero:
$$
-\frac{nk}{\theta} + \frac{1}{\theta^2} \sum_{i=1}^{n} x_i = 0,
$$
giving us
$$
\hat\theta = \frac{1}{nk} \sum_{i=1}^{n} x_i = \frac{\hat\mu_1}{k}.
$$
3. Assuming that the value of k is known, find the mle for θ and for τ = (2θ − 1)².
Using equivariance, we get $\hat\tau = (2\hat\theta - 1)^2$.
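A small numerical check on simulated data (true values assumed): the mome for (k, θ), the mle of θ for known k, and the induced mle of τ.

k <- 3; theta <- 2; n <- 1000
x <- rgamma(n, shape = k, scale = theta)
m1 <- mean(x); m2 <- mean(x^2)
k.mome <- m1^2 / (m2 - m1^2)
theta.mome <- (m2 - m1^2) / m1
theta.mle <- m1 / k                 # k treated as known
tau.mle <- (2 * theta.mle - 1)^2    # by equivariance
c(k.mome = k.mome, theta.mome = theta.mome, theta.mle = theta.mle, tau.mle = tau.mle)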
22
Visualize the likelihood or the log-likelihood functions:
1. X1 , . . . , Xn ∼ Bernoulli(p) iid.
# likelihood function for a Bernoulli(p) sample
log.likelihood.bernoulli <- function(data, p = seq(0, 1, by = 0.1))
{
  N <- length(data)
  S <- sum(data)
  L <- p^S * (1 - p)^(N - S)
  logL <- S * log(p) + (N - S) * log(1 - p)
  cbind(p, L, logL)
}
# visualization
op <- par(mfrow = c(2, 3))
par(op)
2. X1 , . . . , Xn ∼ N (µ, σ 2 ) iid.
# log likelihood function for a normal sample
log.likelihood.normal <- function(mu, sigma, X)
{
  # stopifnot(sigma > 0)
  n <- length(X)
  S2 <- (n - 1) * var(X) / n
  Xbar <- mean(X)
  sapply(sigma, function(sigma) { -n * log(sigma) - 0.5 * n * (S2 + (Xbar - mu)^2) / sigma^2 })
}
# visualization
grid.size <- 501 # finer grid ==> smoother contours, slower draw, larger storage
par(op)
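A minimal sketch of the elided visualization step (simulated data and grid ranges assumed): contour the log-likelihood over a (µ, σ) grid and mark the mle.

X <- rnorm(50, mean = 10, sd = 2)
mu.grid <- seq(8, 12, length.out = grid.size)
sigma.grid <- seq(1, 4, length.out = grid.size)
z <- sapply(mu.grid, function(m) log.likelihood.normal(m, sigma.grid, X))  # sigma in rows, mu in columns
contour(mu.grid, sigma.grid, t(z), bty = 'n',
        xlab = expression(mu), ylab = expression(sigma))
points(mean(X), sd(X) * sqrt((length(X) - 1) / length(X)), pch = 19)  # the mle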