Tutorial C09 | Statistical Inference (C206) | Mihir Arjunwadkar | MTech M&S 2016-17

1. What are the differences between parametric and nonparametric inference?

2. What are the alternative terms for parametric and nonparametric inference?

3. Explain: Parameter of interest, nuisance parameter.

4. Define: kth moment of a distribution.

5. Define: kth sample moment.

6. Explain: Method of moments estimator/estimation (mome).

7. Derive expressions for the mome for

(a) Bernoulli(p)
(b) Normal(µ, σ 2 )
(c) Gamma(k, θ)

8. State the essential properties of mome.

9. Define: Likelihood function.

10. Explain the rationale behind the likelihood principle.

11. Define: Maximum likelihood estimation/estimator (mle).

12. Derive the likelihood functions for

(a) Bernoulli(p)
(b) Normal(µ, σ 2 )
(c) Gamma(k, θ)
(d) Exp(λ)
(e) Poisson(λ)
(f) Beta(α, β)

assuming that the respective parameters are unknown.

13. Derive expressions for the mle for indicated parameters:

(a) p for Bernoulli(p)


(b) µ, σ 2 for Normal(µ, σ 2 )
(c) θ for Gamma(k, θ), assuming k is known
(d) λ for Exp(λ)
(e) λ for Poisson(λ)

assuming that the respective parameters are unknown.

14. State the essential properties of mle.

15. Define Fisher information.

16. Explain how {mle, asymptotic normality} and {equivariance, Delta method} are related to one
another.

17. State the rationale behind the parametric bootstrap.

18. State the parametric bootstrap procedure.


2
Let X1 , . . . , Xn ∼ Beta(α, β) iid. The value of β is known.

1. Derive the expression for E(X).


The pdf for Beta(α, β) is

    fX(x) = x^(α−1) (1 − x)^(β−1) / B(α, β)  for 0 ≤ x ≤ 1, and 0 otherwise,

where

    B(α, β) = Γ(α)Γ(β) / Γ(α + β)             (Beta function)
    Γ(z) = ∫_0^∞ t^(z−1) e^(−t) dt            (Gamma function).

Because fX(x) is normalized, we have

    ∫_0^1 x^(α−1) (1 − x)^(β−1) dx = B(α, β).

Hence

    E(X) = (1/B(α, β)) ∫_0^1 x · x^(α−1) (1 − x)^(β−1) dx
         = (1/B(α, β)) ∫_0^1 x^α (1 − x)^(β−1) dx
         = B(α + 1, β) / B(α, β)
         = [Γ(α + 1)Γ(β)/Γ(α + β + 1)] / [Γ(α)Γ(β)/Γ(α + β)]
         = [αΓ(α)Γ(β)/((α + β)Γ(α + β))] / [Γ(α)Γ(β)/Γ(α + β)]     (because Γ(z + 1) = zΓ(z))
         = α/(α + β).

2. Assuming β to be known, find the mome for α.


Equate α̂n/(α̂n + β) with the first sample moment µ̂1 = n⁻¹ Σ_{i=1}^n Xi, and solve for α̂n. We get

    α̂n = β × µ̂1/(1 − µ̂1).

3. Use parametric bootstrap to get glimpses of the sampling distribution of α̂n for different values of n.
# Beta distribution parameter
alpha <- 0.5   # we will estimate this using MoM; see function below
beta  <- 0.5   # we will assume this to be known

# MoM estimator of alpha
mome.alpha <- function(x, beta)
{
  mu1.hat <- mean(x)
  beta * mu1.hat / (1 - mu1.hat)
}

# data sizes
N <- c(5, 10, 50, 100, 500, 1000, 5000, 10000)

# bootstrap size
B <- 1000

op <- par(mfrow = c(2, 4))

for (n in N)
{
  # data from Beta(alpha, beta)
  X <- rbeta(n, alpha, beta)

  # MoM estimate of alpha assuming beta to be known
  alpha.hat <- mome.alpha(X, beta)

  # parametric bootstrap
  alpha.hat.star <- replicate(B, mome.alpha(rbeta(n, alpha.hat, beta), beta))
  #
  # this code is intended to illustrate the basic parametric bootstrap.
  # consider using package boot for other purposes.

  # bootstrap distribution of alpha.hat
  hist(alpha.hat.star, min(50, nclass.FD(alpha.hat.star)),
       xlim = range(alpha.hat.star, alpha.hat, alpha),
       freq = FALSE, main = paste('n =', n), border = 'white', col = 'gray')
  curve(dnorm(x, mean(alpha.hat.star), sd(alpha.hat.star)), col = 'red', lty = 2,
        from = min(alpha.hat.star), to = max(alpha.hat.star), n = 501, add = TRUE)
  abline(v = alpha, col = 'blue')
  axis(3, alpha, label = expression(alpha), col = 'blue', col.axis = 'blue')
}

par(op)


1. Let X1 , . . . , Xn ∼ N (α + β, σ 2 ) iid and Y1 , . . . , Ym ∼ N (α − β, σ 2 ) iid. Also assume independence


between the X and Y rvs. Find the maximum likelihood estimators of α, β and σ 2 .
The likelihood L(α, β, σ²) and log-likelihood l(α, β, σ²) ≡ log L(α, β, σ²) functions for this problem are

    L(α, β, σ²) = ∏_{i=1}^n (1/√(2πσ²)) exp(−(xi − (α + β))²/(2σ²)) × ∏_{j=1}^m (1/√(2πσ²)) exp(−(yj − (α − β))²/(2σ²))

    l(α, β, σ²) = −((n + m)/2)(log 2π + log σ²) − (1/(2σ²)) Σ_{i=1}^n (xi − (α + β))² − (1/(2σ²)) Σ_{j=1}^m (yj − (α − β))².

ML estimators for the three parameters are obtained by maximizing the likelihood; i.e., by solving

    ∂l(α, β, σ²)/∂α = 0,   ∂l(α, β, σ²)/∂β = 0,   ∂l(α, β, σ²)/∂σ² = 0

together for the three parameters. In principle, one should also check whether the 3 × 3 matrix of second partial derivatives is negative (semi)definite to ensure that it is a maximum and not a minimum. The ML estimators turn out to be

    α̂ = (1/2) (n⁻¹ Σ_{i=1}^n xi + m⁻¹ Σ_{j=1}^m yj)
    β̂ = (1/2) (n⁻¹ Σ_{i=1}^n xi − m⁻¹ Σ_{j=1}^m yj)
    σ̂² = (1/(n + m)) ( Σ_{i=1}^n (xi − (α̂ + β̂))² + Σ_{j=1}^m (yj − (α̂ − β̂))² ).

2. With α = 1, β = 2, σ 2 = 1, draw a sample of size 60 from the distribution of X and a sample of


size 40 from the distribution of Y . Compute the maximum likelihood estimates of α, β and σ 2
from the combined sample of size 100.
# true values of parameters
#
alpha <- 1
beta  <- 2
sigma <- 1

# data sizes
#
n <- 60
m <- 40

# data
#
x <- rnorm(n, mean = alpha + beta, sd = sigma)
y <- rnorm(m, mean = alpha - beta, sd = sigma)

# MLE
#
alpha.hat <- 0.5 * (mean(x) + mean(y))
beta.hat  <- 0.5 * (mean(x) - mean(y))
sigma.sqr.hat <- (sum((x - (alpha.hat + beta.hat))^2)
                  + sum((y - (alpha.hat - beta.hat))^2)) / (n + m)

3. Give the confidence intervals for the above estimates by estimating standard error through para-
metric bootstrap and using asymptotic normal approximation.
We use below the standard R package boot for the parametric bootstrap. The code below is based
on the last example on the help page for function boot.


# data generator
get . data <- function ( data , mle )
{
data $ x <- rnorm ( data $n , mle [1] + mle [2] , mle [3] )
data $ y <- rnorm ( data $m , mle [1] - mle [2] , mle [3] )

return ( data )
}

# MLE
get . mle <- function ( data )
{
# stopifnot ( length ( data $ x ) == n , length ( data $ y ) == m )

a . hat <- 0.5 * ( mean ( data $ x ) + mean ( data $ y ) )


b . hat <- 0.5 * ( mean ( data $ x ) - mean ( data $ y ) )
ss . hat <- ( sum ( ( data $ x - ( a . hat + b . hat ) ) ^2 )
+ sum ( ( data $ y - ( a . hat - b . hat ) ) ^2 ) ) / ( data $ n + data $ m )

c ( alpha = a . hat , beta = b . hat , sigma = sqrt ( ss . hat ) )


}

# true values of parameters


theta <- c ( alpha = 1 , beta = 2 , sigma = 1 )

# data sizes and data


data <- list ( n = 60 , m = 40 , x = NULL , y = NULL )
data <- get . data ( data , theta )

# MLE
theta . hat <- get . mle ( data )

# parametric bootstrap
require ( boot )

R <- 1000
b <- boot ( data = data , statistic = get . mle , R = R ,
sim = " parametric " , ran . gen = get . data , mle = theta . hat )

# bootstrap CIs
alpha <- 0.05 # confidence parameter
ci <- rbind ( boot . ci ( b , type = ’ norm ’ , conf = 1 - alpha , index = 1 ) $ normal [ -1] ,
boot . ci ( b , type = ’ norm ’ , conf = 1 - alpha , index = 2 ) $ normal [ -1] ,
boot . ci ( b , type = ’ norm ’ , conf = 1 - alpha , index = 3 ) $ normal [ -1] )
rownames ( ci ) <- names ( theta )

# plot bootstrap distributions


op <- par ( mfrow = c ( 1 , 3 ) )
for ( i in 1: ncol ( b $ t ) )
{
# histogram
h <- hist ( b $ t [ , i ] , ’ FD ’ , plot = FALSE )
plot ( h , col = ’ gray ’ , border = ’ white ’ , main = ’ ’ , freq = FALSE ,
xlab = parse ( text = paste ( ’ widehat ( ’ , names ( theta ) [ i ] , ’) ’ ) ) )

# normal pdf with same mean and sd as the bootstrap distribution


curve ( dnorm ( x , mean ( b $ t [ , i ] ) , sd ( b $ t [ , i ] ) ) , from = min ( b $ t [ , i ] ) ,
to = max ( b $ t [ , i ] ) , n = 501 , add = TRUE , lty = 2 )
# mean of bootstrap values
abline ( v = mean ( b $ t [ , i ] ) , lty = 2 )

# MLE
abline ( v = theta . hat [ i ] ) # estimated bias = mean ( b $ t [ , i ] - b $ t [ i ] )

# CI
arrows ( ci [i ,1] , max ( h $ density ) , ci [i ,2] , max ( h $ density ) ,
length = 0.1 , angle = 90 , code = 3 )
}
par ( op )


4
Let X1 , . . . , Xn ∼ N (µ, σ 2 ) iid. Let τ be the β percentile (0 ≤ β ≤ 1); i.e., P (X < τ ) = β.
The log-likelihood function for this problem, sans additive constants, is

    ln(µ, σ) = −n log σ − (n/(2σ²)) [Sn² + (X̄n − µ)²],

where X̄n = n⁻¹ Σ_{i=1}^n Xi and Sn² = n⁻¹ Σ_{i=1}^n (Xi − X̄n)². See Example 9.11, AoS2004. The mle of µ and σ are, respectively,

    µ̂n = X̄n = (1/n) Σ_{i=1}^n Xi
    σ̂n = Sn = √( (1/n) Σ_{i=1}^n (Xi − µ̂n)² ).

1. Find the maximum likelihood estimate of τ .


Let Z = (X − µ)/σ ∼ N(0, 1), and let Φ be the cdf of the standard normal N(0, 1). Then

    β = P(X < τ) = P((X − µ)/σ < (τ − µ)/σ) = P(Z < (τ − µ)/σ) = Φ((τ − µ)/σ).

Hence,

    τ = σ Φ⁻¹(β) + µ.

The mle of τ is obtained by invoking the equivariance property of the mle:

    τ̂n = σ̂n Φ⁻¹(β) + µ̂n.

2. Derive an expression for an approximate 95% confidence interval for τ.

For this purpose, we need the multiparameter Delta method, which establishes asymptotic normality of the mle of functions of parameters. By definition, the (i, j)th element of the Fisher information matrix In is the negative expected value of the second partial derivative of ln(θ) with respect to the ith and jth parameters, θ ≡ (µ, σ); i.e.,

    (In)_{i,j} = −E[ ∂²ln(µ, σ) / (∂θi ∂θj) ].

The expected values of the four second partial derivatives of ln(µ, σ) with respect to µ and σ can be obtained using the facts E(X̄n) = µ and E(Sn²) = (n − 1)σ²/n. (The last expression follows from the fact that E(nSn²/(n − 1)) = σ².) The Fisher information matrix In turns out to be

    In = (n/σ²) [ 1  0 ; 0  2 ].

The inverse Jn of In is

    Jn = (σ²/n) [ 1  0 ; 0  1/2 ].

The gradient of τ with respect to the parameters is

    ∇τ(µ, σ) = ( ∂τ/∂µ , ∂τ/∂σ ) = ( 1 , Φ⁻¹(β) ).

For the Delta method, the gradient is required to be non-zero at the maximum-likelihood point (µ̂, σ̂). The standard error of τ̂n is

    se(τ̂n) = √( (∇τ)ᵀ Jn (∇τ) ) = √( (σ²/n) (1 + (Φ⁻¹(β))²/2) ).

The estimated standard error ŝe(τ̂n) is obtained by substituting σ̂n for σ. Because (τ̂n − τ)/ŝe(τ̂n) is approximately N(0, 1) for large n, an approximate/asymptotic (1 − α) confidence interval for τ is

    τ̂n ± z_{α/2} ŝe(τ̂n).

3. Consider the LakeHuron data available in R. Find the maximum likelihood estimate of τ for this dataset. Find the standard error using the delta method. Find the standard error using the parametric bootstrap.
# visualize
#
op <- par(mfrow = c(1, 3))
# time series plot
plot(LakeHuron, bty = 'n')

# QQ against normal
qqnorm(LakeHuron, bty = 'n')
qqline(LakeHuron)

# histogram with a normal curve overlaid
hist(LakeHuron, 'FD', col = 'gray', border = 'white', freq = FALSE)
curve(dnorm(x, mean(LakeHuron), sd(LakeHuron)),
      from = par('usr')[1], to = par('usr')[2], n = 501, add = TRUE, lty = 2)

# data locations on the x-axis
rug(LakeHuron)
par(op)

# data size
#
n <- length(LakeHuron)

# quantile
#
beta <- 0.95
qb <- qnorm(beta)

# MLE
#
mu.hat <- mean(LakeHuron)
sigma.hat <- sd(LakeHuron) * sqrt((n - 1) / n)
tau.hat <- sigma.hat * qb + mu.hat
tau.hat.se.hat <- sqrt(sigma.hat^2 * (1 + 0.5 * qb^2) / n)

# confidence interval for tau
#
alpha <- 0.05
ci <- tau.hat + c(-1, +1) * qnorm(1 - 0.5 * alpha) * tau.hat.se.hat
#
# while MLEs are asymptotically normal, one never knows if the distribution
# of tau.hat is approximately normal for this data size. The CI should not
# be over-interpreted.
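
The problem also asks for the standard error of τ̂ via the parametric bootstrap; the code above covers only the delta method. A minimal sketch, reusing n, qb, mu.hat and sigma.hat from above (the bootstrap size B is an arbitrary choice):

# parametric bootstrap standard error of tau.hat
B <- 1000
tau.hat.star <- replicate(B, {
  x.star <- rnorm(n, mean = mu.hat, sd = sigma.hat)   # simulate from the fitted normal model
  m <- mean(x.star)                                   # mle of mu on the bootstrap sample
  s <- sd(x.star) * sqrt((n - 1) / n)                 # mle of sigma on the bootstrap sample
  s * qb + m                                          # tau.hat on the bootstrap sample
})
tau.hat.se.boot <- sd(tau.hat.star)                   # bootstrap estimate of se(tau.hat)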


5
Two coins have P(Head) equal to p and p² respectively, where 0 < p < 1. Each coin is tossed exactly once. The resulting rvs are denoted X and Y respectively. In other words, X ∼ Bernoulli(p) and Y ∼ Bernoulli(p²).

1. Write the likelihood function for p, the log-likelihood and its first two derivatives, assuming independence between X and Y.

    L(p) = p^X (1 − p)^(1−X) × (p²)^Y (1 − p²)^(1−Y) = p^(X+2Y) (1 − p)^(1−X) (1 − p²)^(1−Y)

    l(p) = (X + 2Y) log p + (1 − X) log(1 − p) + (1 − Y) log(1 − p²)

    dl(p)/dp = (X + 2Y)/p − (1 − X)/(1 − p) − 2p (1 − Y)/(1 − p²)

    d²l(p)/dp² = −(X + 2Y)/p² − (1 − X)/(1 − p)² − 2(1 − Y)/(1 − p²) − (2p)² (1 − Y)/(1 − p²)².

2. Derive the likelihood equation for p.

Differentiating l(p) with respect to p and setting it equal to zero, we get

    0 = dl(p)/dp = (X + 2Y)/p − (1 − X)/(1 − p) − 2p (1 − Y)/(1 − p²)
      = [ (1 − p)(1 − p²)(X + 2Y) − p(1 − p²)(1 − X) − 2p²(1 − p)(1 − Y) ] / [ p(1 − p)(1 − p²) ]
      = [ (1 − p − p² + p³)(X + 2Y) − (p − p³)(1 − X) − 2(p² − p³)(1 − Y) ] / [ p(1 − p)(1 − p²) ]
      = [ 3p³ − (X + 2)p² − (2Y + 1)p + (X + 2Y) ] / [ p(1 − p)(1 − p²) ],

which leads to the cubic equation

    3p³ − (X + 2)p² − (2Y + 1)p + (X + 2Y) = 0.

3. If X = 1 and Y = 0, find the mle of p.

For X = 1 and Y = 0, the above equation reduces to

    3p³ − 3p² − p + 1 = (3p² − 1)(p − 1) = 0,

whose three solutions are p = ±1/√3 and p = 1. Of these, the negative solution is irrelevant. It can be verified that the second derivative of the log-likelihood is negative at p = 1/√3, so that p = 1/√3 is the required maximum of the log-likelihood and the estimate of the unknown parameter.

4. Based on any insights from the previous result, plus some guesswork/trial-and-error, find all solutions to the likelihood equation in terms of X and Y.

Trial-and-error shows that p = 1 is one solution to this equation. Factoring p = 1 out of the likelihood equation, we get the quadratic equation

    3p² + (1 − X)p − (X + 2Y) = 0,

whose roots are

    p± = (1/6) [ −(1 − X) ± √((1 − X)² + 12(X + 2Y)) ] = (1/6) [ (X − 1) ± √(1 + 10X + 24Y + X²) ].

Both roots are real for the four joint possibilities for X and Y; namely, (a) X = 1 and Y = 0, (b) X = 1 and Y = 1, (c) X = 0 and Y = 1, and (d) X = 0 and Y = 0. One of the positive roots, namely 1 or p+, is the mle.


5. Plot the likelihood, log-likelihood, and its first two derivatives as functions of p for the four cases
(a) X = 1 and Y = 0, (b) X = 1 and Y = 1, (c) X = 0 and Y = 1, and (d) X = 0 and Y = 0.
ll <- function ( p , X , Y , deriv = 0 )
{
stopifnot ( length ( deriv ) == 1 , deriv % in % ( 0:2 ) ,
all ( X % in % ( 0:1 ) ) , all ( Y % in % ( 0:1 ) ) )

nX <- length ( X )
nY <- length ( Y )
sX <- sum ( X )
sY <- sum ( Y )

if ( deriv == 0 )
{
i <- which ( p % in % c ( 0 , 1 ) )
j <- setdiff ( 1: length ( p ) , i )

p [ i ] <- - Inf
p [ j ] <- ( sX +2 * sY ) * log ( p [ j ]) +( nX - sX ) * log (1 - p [ j ]) +( nY - sY ) * log (1 - p [ j ]^2)

return ( p )
}

switch ( deriv , # need better handling of p =0 ,1


( sX +2 * sY ) /p -( nX - sX ) / (1 - p ) -2 * p * ( nY - sY ) / (1 - p ^2) ,
-( sX +2 * sY ) / p ^2 -( nX - sX ) / (1 - p ) ^2 -2 * ( nY - sY ) / (1 - p ^2) -4 * p ^2 * ( nY - sY ) / (1 - p ^2) ^2)
}

mle <- function ( X , Y )


{
stopifnot ( all ( X % in % ( 0:1 ) ) , all ( Y % in % ( 0:1 ) ) )

nX <- length ( X )
nY <- length ( Y )
sX <- sum ( X )
sY <- sum ( Y )

0.5 * ( sX - nX + sqrt ( ( sX - nX ) ^2 + 4 * ( nX +2 * nY ) * ( sX +2 * sY ) ) ) / ( nX +2 * nY )
}

plot . ll <- function ( X , Y , ylim = c ( -50 , +50 ) )


{
nX <- length ( X )
nY <- length ( Y )
sX <- sum ( X )
sY <- sum ( Y )
p . hat <- mle ( X , Y )

curve ( ll ( x , X , Y , 0 ) , from = 0 , to = 1 , n = 501 ,


bty = ’n ’ , xlab = ’p ’ , ylab = ’ ’ , lwd = 1.5 )
title ( main = paste ( ’Log - Likelihood | Sx = ’ , sX , ’ , Nx = ’ , nX ,
’ | Sy = ’ , sY , ’ , Ny = ’ , nY , sep = ’ ’ ) )
abline ( v = p . hat , lty = 2 , col = ’ red ’ )
axis ( 3 , at = p . hat , labels = ’ MLE ’ , col . axis = ’ red ’ , col = ’ white ’ )

curve ( ll ( x , X , Y , 1 ) , from = 0 , to = 1 , n = 501 ,


bty = ’n ’ , xlab = ’p ’ , ylab = ’ ’ , lwd = 1.5 , ylim = ylim )
title ( main = ’ First Derivative ’ )
abline ( v = p . hat , h = 0 , lty = 2 , col = ’ red ’ )
axis ( 3 , at = p . hat , labels = ’ MLE ’ , col . axis = ’ red ’ , col = ’ white ’ )

curve ( ll ( x , X , Y , 2 ) , from = 0 , to = 1 , n = 501 ,


bty = ’n ’ , xlab = ’p ’ , ylab = ’ ’ , lwd = 1.5 , ylim = ylim )
title ( main = ’ Second Derivative ’ )
abline ( v = p . hat , h = 0 , lty = 2 , col = ’ red ’ )
axis ( 3 , at = p . hat , labels = ’ MLE ’ , col . axis = ’ red ’ , col = ’ white ’ )
}

op <- par ( mfcol = c ( 3 , 4 ) )


plot . ll ( 0 , 0 )
plot . ll ( 1 , 0 )
plot . ll ( 0 , 1 )
plot . ll ( 1 , 1 )
par ( op )

Computation of the derivatives in the script above may not be correct for p = 0, 1.


6. Generalize the likelihood function, log-likelihood, and its derivatives for X1, . . . , XN ∼ Bernoulli(p) and Y1, . . . , YM ∼ Bernoulli(p²). Assume independence.

Let SX = Σ_{i=1}^N Xi and SY = Σ_{j=1}^M Yj. Then

    L(p) = p^SX (1 − p)^(N−SX) × (p²)^SY (1 − p²)^(M−SY) = p^(SX+2SY) (1 − p)^(N−SX) (1 − p²)^(M−SY)
    l(p) = (SX + 2SY) log p + (N − SX) log(1 − p) + (M − SY) log(1 − p²)
    dl(p)/dp = (SX + 2SY)/p − (N − SX)/(1 − p) − 2p (M − SY)/(1 − p²)
    d²l(p)/dp² = −(SX + 2SY)/p² − (N − SX)/(1 − p)² − 2(M − SY)/(1 − p²) − (2p)² (M − SY)/(1 − p²)².

The likelihood equation is

    (N + 2M)p³ − (SX + 2M)p² − (2SY + N)p + (SX + 2SY) = 0,

which has p = 1 as one solution. The other two solutions are the roots of the quadratic

    (N + 2M)p² + (N − SX)p − (SX + 2SY) = 0;

namely,

    p± = [ −(N − SX) ± √((N − SX)² + 4(N + 2M)(SX + 2SY)) ] / (2(N + 2M)).

The positive roots 1 and p+ qualify as candidates for the mle.

7. Derive the formula for Fisher information on p.

Since E(X) = p and E(Y) = p², E(SX) = Np and E(SY) = Mp². The Fisher information for parameter p is

    I(p) = −E[ d²l(p)/dp² ]
         = (Np + 2Mp²)/p² + N(1 − p)/(1 − p)² + 2M(1 − p²)/(1 − p²) + (2p)² M(1 − p²)/(1 − p²)²
         = (N + 2Mp)/p + N/(1 − p) + 2M + 4Mp²/(1 − p²).


6
If X1, X2, . . . , Xn ∼ Poisson(λ) iid, design estimators for λ using estimation methods you have studied.

The Poisson distribution is a discrete distribution defined over nonnegative integers k = 0, 1, . . . by the pmf

    P(k) = λ^k e^(−λ) / k!,

with λ > 0. It is normalized (i.e., Σ_{k=0}^∞ P(k) = 1) because Σ_{k=0}^∞ λ^k/k! = e^λ.

1. mome. We have

    E(X) = Σ_{k=0}^∞ k P(k) = Σ_{k=1}^∞ k P(k)
         = e^(−λ) Σ_{k=1}^∞ λ^k/(k − 1)! = e^(−λ) Σ_{k=0}^∞ λ^(k+1)/k!
         = λ × e^(−λ) Σ_{k=0}^∞ λ^k/k! = λ × Σ_{k=0}^∞ P(k)
         = λ.

Equating this with the first sample moment n⁻¹ Σ_{i=1}^n Xi, we get the mome

    λ̂ = n⁻¹ Σ_{i=1}^n Xi.

2. mle. The likelihood and log-likelihood functions are

    L(λ) = ∏_{i=1}^n P(Xi) = ∏_{i=1}^n λ^(Xi) e^(−λ)/Xi! = e^(−nλ) λ^(Σ_{i=1}^n Xi) × ∏_{i=1}^n 1/Xi!
    l(λ) = −nλ + log(λ) Σ_{i=1}^n Xi + terms that do not depend on λ.

To maximize l(λ), set dl(λ)/dλ = 0:

    −n + (1/λ) Σ_{i=1}^n Xi = 0,

which gives us the mle

    λ̂ = n⁻¹ Σ_{i=1}^n Xi.

3. plug-in. This is a parametric setting and, as such, the plug-in method is not appropriate. If one uses E(X) = λ and the plug-in estimator for the mean, the result is the same as the mome.


1. X1 , X2 , . . . , Xn ∼ Uniform(0, θ), where θ > 0. What is the maximum likelihood estimator for θ?
This is AoS2004 Example 9.12.
    fX(x; θ) = 1/θ for 0 ≤ x ≤ θ, and 0 otherwise.

Because θ defines the right boundary of the pdf, the pdf is discontinuous at x = θ. This
discontinuity will be reflected in the likelihood function. As such, the derivative-based method of
maximization is inapplicable here. The likelihood function is
    L(θ) = ∏_{i=1}^n fX(Xi; θ) = θ^(−n) if 0 ≤ X1, . . . , Xn ≤ θ, and 0 otherwise.

To see how L(θ) changes with θ, imagine sliding the value of θ from left to right across the range of the observed data¹ (picture the data points X1, . . . , Xn marked on the θ axis). As long as any data point remains to the right of θ, the likelihood L(θ) will be equal to zero. Continuing the same argument, the likelihood will be nonzero (= θ^(−n)) for θ ≥ max{X1, . . . , Xn}. So

    L(θ) = 0 for θ < max{X1, . . . , Xn}, and θ^(−n) for θ ≥ max{X1, . . . , Xn}.

Clearly, the maximum value of L occurs at the boundary, i.e., θ = max{X1, . . . , Xn}. Therefore,

    θ̂n = max{X1, . . . , Xn}.

theta <- 1
n <- 10
u <- runif(n, 0, theta)

curve(x^(-n), from = max(u), to = 1.5 * max(u), lwd = 2, bty = 'n',
      xlab = expression(theta), ylab = 'Likelihood', xlim = c(0, 1.5 * max(u)))
lines(c(0, max(u)), c(0, 0), lwd = 2)
abline(v = max(u), lty = 2)
rug(u, col = 'red', lwd = 1.5)
axis(3, at = max(u), label = expression(widehat(theta)[n]))

Formally, we may write the pdf in the alternative form

    fX(x; θ) = (1/θ) I_[0,θ](x),

where

    I_A(x) = 1 if x ∈ A, and 0 if x ∉ A.

The likelihood function takes the form

    L(θ) = θ^(−n) ∏_{i=1}^n I_[0,θ](Xi).

Requiring all data to be inside the interval [0, θ] is the same as requiring the maximum data value to be inside the interval [0, θ]. Therefore,

    L(θ) = θ^(−n) I_[0,θ](max{X1, . . . , Xn}).

¹ Because θ is the upper bound on X, we can meaningfully put X1, . . . , Xn on the θ axis.


From this form, it is straightforward to see that

    L(θ) = 0 for θ < max{X1, . . . , Xn}, and θ^(−n) for θ ≥ max{X1, . . . , Xn},

and that L(θ) attains its maximum value at θ = max{X1, . . . , Xn}.

2. If X1 , X2 , . . . , Xn ∼ Uniform(θ, 0), what is the maximum likelihood estimator for θ?

    θ̂n = min{X1, . . . , Xn}.


8
Let X1 , X2 , . . . , Xn ∼ Uniform(a, b) iid. a and b are the unknown parameters. Assume that a < b.

1. Find the mome for a and b.


The pdf of each Xi is

    fX(x) = 1/(b − a) if a ≤ x ≤ b, and 0 otherwise.

The first two moments of the distribution are

    µ1 = E(X) = ∫ x fX(x) dx = (1/(b − a)) ∫_a^b x dx = (a + b)/2
    µ2 = E(X²) = ∫ x² fX(x) dx = (1/(b − a)) ∫_a^b x² dx = (a² + ab + b²)/3.

The first two sample moments (i.e., estimators for the first two moments above) are

    µ̂1 = (1/n) Σ_{i=1}^n Xi
    µ̂2 = (1/n) Σ_{i=1}^n Xi².

Equating µk with µ̂k, k = 1, 2, we get two equations for the estimators of a and b:

    µ̂1 = (ân + b̂n)/2
    µ̂2 = (ân² + ân b̂n + b̂n²)/3.

Solving for ân and b̂n, we get the following two pairs of solutions:

    ân = µ̂1 − √(3(µ̂2 − µ̂1²)),   b̂n = µ̂1 + √(3(µ̂2 − µ̂1²))

OR

    ân = µ̂1 + √(3(µ̂2 − µ̂1²)),   b̂n = µ̂1 − √(3(µ̂2 − µ̂1²)).

Notice that µ̂2 − µ̂1² = (1/n) Σ_{i=1}^n (Xi − µ̂1)² > 0. Hence, the required pair of estimators, the one consistent with a < b, is the first one.

2. Find the maximum likelihood estimator for a and b.


Using arguments similar to AoS2004 Example 9.12, or those in the previous problem, we get

    ân = min{X1, . . . , Xn}
    b̂n = max{X1, . . . , Xn}.
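
A small simulation sketch (true values and sample size are hypothetical) comparing the two sets of estimators:

# compare mome and mle for Uniform(a, b) on simulated data
a <- 2; b <- 7; n <- 200              # hypothetical true values
x <- runif(n, a, b)

m1 <- mean(x)                         # first sample moment
m2 <- mean(x^2)                       # second sample moment
s  <- sqrt(3 * (m2 - m1^2))

c(a.mome = m1 - s, b.mome = m1 + s)   # method-of-moments estimates
c(a.mle  = min(x), b.mle  = max(x))   # maximum likelihood estimates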


9
Let X1, X2, . . . , Xn be independent and identically distributed rvs with the pdf

    f(x; α) = (α + 1)/x^(α+2),   x ≥ 1, α > 0.

This distribution is a special form of the Pareto distribution, with parameter xm = 1.

1. Derive the mome for α.

First get the expression for E(X):

    E(X) = ∫_1^∞ x f(x; α) dx = (α + 1) ∫_1^∞ x^(−(α+1)) dx = (α + 1) [ −x^(−α)/α ]_1^∞ = (α + 1)/α.

The mome α̂n is obtained by equating the first sample moment µ̂1 = n⁻¹ Σ_{i=1}^n Xi with this expected value, and solving the resulting equation for α. We get

    α̂n = 1/(µ̂1 − 1) = n / ( Σ_{i=1}^n Xi − n ).

2. Derive the mle for α.

Assuming that X1, . . . , Xn ≥ 1, the likelihood, log-likelihood, and log-likelihood derivative are

    L(α) = ∏_{i=1}^n (α + 1)/Xi^(α+2) = (α + 1)^n ( ∏_{i=1}^n Xi )^(−(α+2))
    l(α) = n log(α + 1) − (α + 2) Σ_{i=1}^n log(Xi)
    dl(α)/dα = n/(α + 1) − Σ_{i=1}^n log(Xi).

Setting the derivative to zero and solving for α, we get the mle

    α̂n = ( (1/n) Σ_{i=1}^n log(Xi) )^(−1) − 1.
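
A small simulation sketch (hypothetical α and n) that draws from this Pareto form via the inverse cdf F(x) = 1 − x^(−(α+1)) and computes both estimators:

# simulate from f(x; alpha) = (alpha + 1) / x^(alpha + 2), x >= 1, and estimate alpha
alpha <- 2; n <- 500                  # hypothetical true value and sample size
x <- runif(n)^(-1 / (alpha + 1))      # inverse-cdf sampling: F(x) = 1 - x^-(alpha + 1)

alpha.mome <- 1 / (mean(x) - 1)       # method of moments: 1 / (mu1.hat - 1)
alpha.mle  <- 1 / mean(log(x)) - 1    # maximum likelihood
c(mome = alpha.mome, mle = alpha.mle)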


10
Suppose we have independent normal data with unequal means but the same variance; i.e., Xi ∼ N(µi, σ²), i = 1, . . . , n. The means µi are all known.

1. Derive the mle for σ².

The likelihood function L(σ²) and the log-likelihood function l(σ²) = log L(σ²) for this problem are

    L(σ²) = ∏_{i=1}^n (1/√(2πσ²)) exp( −(Xi − µi)²/(2σ²) )
    l(σ²) = −(n/2) log(2πσ²) − (1/(2σ²)) Σ_{i=1}^n (Xi − µi)².

Differentiating l(σ²) with respect to σ² and setting the derivative equal to zero, we get

    dl(σ²)/dσ² = −n/(2σ²) + (1/(2σ⁴)) Σ_{i=1}^n (Xi − µi)² = 0.

The solution of the above equation for σ² is the desired estimator

    σ̂² = (1/n) Σ_{i=1}^n (Xi − µi)².

2. Find the Fisher information for σ².

The second derivative of the log-likelihood with respect to σ² is

    d²l(σ²)/d(σ²)² = n/(2σ⁴) − (1/σ⁶) Σ_{i=1}^n (Xi − µi)².

Because (Xi − µi)/σ ∼ N(0, 1), E[(Xi − µi)²] = σ². Therefore,

    I(σ²) = −E[ d²l(σ²)/d(σ²)² ] = −n/(2σ⁴) + nσ²/σ⁶ = n/(2σ⁴).
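
A small numerical check of the estimator and of the Fisher-information-based standard error 1/√(I(σ²)) = √(2σ⁴/n); the known means and the true σ are hypothetical:

# mle of sigma^2 with known, unequal means, plus its Fisher-information-based se
n  <- 100
mu <- seq(-2, 2, length.out = n)        # hypothetical known means
sigma <- 1.5                            # hypothetical true sigma
x <- rnorm(n, mean = mu, sd = sigma)

sigma2.hat <- mean((x - mu)^2)          # mle of sigma^2
se.hat <- sqrt(2 * sigma2.hat^2 / n)    # 1 / sqrt(I(sigma^2)), with sigma^2 plugged in
c(sigma2.hat = sigma2.hat, se = se.hat)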


11
Suppose we have independent normal data with the same mean but unequal variances; i.e., Xi ∼ N(µ, σi²), i = 1, . . . , n. The variances σi² are all known.

1. Derive the mle for µ.

The likelihood function L(µ) and the log-likelihood function l(µ) = log L(µ) for this problem are

    L(µ) = ∏_{i=1}^n (1/√(2πσi²)) exp( −(xi − µ)²/(2σi²) )
    l(µ) = −(1/2) Σ_{i=1}^n log(2πσi²) − (1/2) Σ_{i=1}^n ((xi − µ)/σi)².

Differentiating l(µ) with respect to µ and setting the derivative equal to zero, we get

    dl(µ)/dµ = Σ_{i=1}^n (xi − µ)/σi² = 0.

The solution of the above equation for µ is the desired estimator

    µ̂ = Σ_{i=1}^n wi xi,   where   wi = (1/σi²) / Σ_{j=1}^n (1/σj²).

2. Find the Fisher information for µ.

The second derivative of the log-likelihood with respect to µ is

    d²l(µ)/dµ² = −Σ_{i=1}^n 1/σi².

Therefore,

    I(µ) = −E[ d²l(µ)/dµ² ] = Σ_{i=1}^n 1/σi².
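
A small numerical check (hypothetical known variances and true µ) of the weighted-mean mle and of the standard error 1/√(I(µ)):

# mle of mu with known, unequal variances: the inverse-variance weighted mean
n <- 50
sigma2 <- runif(n, 0.5, 3)                  # hypothetical known variances
mu <- 2                                     # hypothetical true mean
x <- rnorm(n, mean = mu, sd = sqrt(sigma2))

w <- (1 / sigma2) / sum(1 / sigma2)         # weights proportional to 1 / sigma_i^2
mu.hat <- sum(w * x)                        # mle of mu
se.hat <- 1 / sqrt(sum(1 / sigma2))         # 1 / sqrt(I(mu))
c(mu.hat = mu.hat, se = se.hat)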


12
X1 , X2 , . . . , Xn ∼ Exp(θ) iid.

1. Derive the expression for E(X).

The pdf for Exp(θ) is

    fX(x) = (1/θ) e^(−x/θ) for x ≥ 0, and 0 otherwise.

Therefore,

    E(X) = ∫_0^∞ x fX(x) dx = (1/θ) ∫_0^∞ x e^(−x/θ) dx = θ ∫_0^∞ y e^(−y) dy = θ Γ(2) = θ,

where Γ(z) = ∫_0^∞ t^(z−1) e^(−t) dt is the Gamma function. For integer values of z, Γ(z) = (z − 1)!.

2. What is the mome for θ?

Equating E(X) with the first sample moment µ̂1 = n⁻¹ Σ_{i=1}^n Xi, we get the estimator

    θ̂n = n⁻¹ Σ_{i=1}^n Xi.


13
n1 people are given treatment 1 and n2 people are given treatment 2. n1 and n2 are known. Let X1 be
the number of people on treatment 1 who respond favourably to the treatment and X2 be the number of
people on treatment 2 who respond favourably to the treatment. Assume that X1 ∼ Binomial(n1 , p1 ),
X2 ∼ Binomial(n2 , p2 ). Let ψ = p1 − p2 .

1. Find the maximum likelihood estimator ψ̂ of ψ.

2. Find the Fisher information matrix I(p1 , p2 ).

3. Use the multiparameter delta method to find the asymptotic standard error of ψ̂.
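
No solution is worked out above. As a numerical complement, a small sketch of the standard answers — ψ̂ = p̂1 − p̂2 with the delta-method standard error √(p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2) — might look as follows; all data values are hypothetical:

# mle of psi = p1 - p2 and its delta-method standard error (values hypothetical)
n1 <- 120; x1 <- 78          # treatment 1: number of people and favourable responses
n2 <- 100; x2 <- 55          # treatment 2

p1.hat <- x1 / n1            # mle of p1
p2.hat <- x2 / n2            # mle of p2
psi.hat <- p1.hat - p2.hat   # mle of psi, by equivariance

# se from the inverse Fisher information of (p1, p2) via the delta method
psi.se <- sqrt(p1.hat * (1 - p1.hat) / n1 + p2.hat * (1 - p2.hat) / n2)
c(psi.hat = psi.hat, se = psi.se)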


14
Let X1 , . . . , Xn be iid rvs that take values θ + 1 and θ − 1 with probabilities p and 1 − p respectively.
Here, 0 ≤ p ≤ 1 is known. θ, the unknown, is a real number. We wish to estimate θ from data
X1 , X2 , . . . , Xn .

1. What is the the expected value of X?

E(X) = p × (θ + 1) + (1 − p) × (θ − 1) = 2p + θ − 1.

2. What is the variance of X?

Using similar reasoning, the variance of X turns out to be

    Var(X) = E(X²) − (E(X))² = 4p(1 − p).

3. What is the pdf of X?

The pdf (pmf) of X can be written in the form

    fX(x) = p if x = θ + 1, and 1 − p if x = θ − 1,

or equivalently

    fX(x) = p^((x−(θ−1))/2) (1 − p)^(1−(x−(θ−1))/2).

4. What is the plug-in estimator of θ?


This is a parametric setting and, as such, the plug-in method is not appropriate. We could
consider using the plug-in estimator for the mean and relate it to E(X), but this would be the
mome; see below.

5. What is the mome of θ?

Equating E(X) to the first sample moment µ̂1 = n⁻¹ Σ_{i=1}^n Xi, we get the mome for θ:

    θ̂n = 1 − 2p + n⁻¹ Σ_{i=1}^n Xi.

It can be shown easily that this estimator is unbiased, and has variance 4p(1 − p)/n.

6. What is the mle of θ?

The likelihood and log-likelihood functions are (with S = Σ_{i=1}^n xi):

    L(θ) = ∏_{i=1}^n fX(xi) = p^((S−n(θ−1))/2) (1 − p)^(n−(S−n(θ−1))/2)
    l(θ) = ((S − n(θ − 1))/2) log(p) + (n − (S − n(θ − 1))/2) log(1 − p).

Sans additive constants, the log-likelihood function is

    l(θ) = (n/2) (log(1 − p) − log(p)) × θ.

Because the log-likelihood is linear in θ, it is unbounded: it increases indefinitely as θ → ∞ when p < 1/2, increases indefinitely as θ → −∞ when p > 1/2, and is constant when p = 1/2. In no case does it have a finite maximizer, so the mle method is inapplicable here.


7. Any other estimator for θ that you could design?


Assuming that both values of X, i.e., θ ± 1, are represented in the data, we have the following
two obvious estimators

    θ̂n = min(X1, . . . , Xn) + 1
    θ̂n = max(X1, . . . , Xn) − 1

which will both give the true value of θ.

8. If all the observations turn out to be identical, would you need to modify your estimators? If yes,
state the modification and justify.
The events
X1 = . . . = Xn = θ + 1
and
X1 = . . . = Xn = θ − 1
could occur with probabilities pn and (1 − p)n respectively. However, we don’t know which
possibility has actually shown up. In these two cases, all previous estimators will lead to wrong
estimates of θ. Suppose X1 = . . . = Xn = α, where α is either θ + 1 or θ − 1 – we don’t know.
The following estimators may work better for this situation:

    θ̂n = α − 1 if p > 1 − p, and α + 1 otherwise;
    θ̂n = α − 1 with probability p, and α + 1 with probability 1 − p.

The second estimator above requires generating a Uniform(0,1) random number r, and choosing
α − 1 if r ≤ p, and α + 1 otherwise. The second estimator above appears more appropriate when
p = 1/2.

** CONFIRM **


15
Suppose we have paired data (Xi, Yi), i = 1, . . . , n. Let us assume that X and Y are related to each other as

    Yi = (aXi + b) + εi,

where the random noise εi ∼ N(0, 1). Find the mles of a, b.
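
No solution is worked out above. Since σ = 1 is known, maximizing the likelihood amounts to minimizing Σ_{i}(Yi − aXi − b)², so the mles coincide with the usual least-squares estimates. A minimal sketch with hypothetical values:

# mle of (a, b) for Y = a X + b + eps, eps ~ N(0, 1): coincides with least squares
n <- 100
a <- 1.5; b <- -0.5                  # hypothetical true values
x <- runif(n, 0, 10)
y <- a * x + b + rnorm(n)

a.hat <- cov(x, y) / var(x)          # = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b.hat <- mean(y) - a.hat * mean(x)
c(a.hat = a.hat, b.hat = b.hat)      # same as coef(lm(y ~ x)), up to naming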


16
Suppose you have paired data (xi, yi), i = 1, . . . , n, where the xi are precisely known (they are not rvs). Further, you are expecting a relationship between xi and yi that has the form

    yi = a e^(xi/b) + εi,

where εi ∼ N(0, σ²) iid and σ² is not known. Write down the log-likelihood function and the likelihood equations for estimation of a, b, σ².
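
No solution is worked out above. The log-likelihood is l(a, b, σ²) = −(n/2) log(2πσ²) − (1/(2σ²)) Σ_{i} (yi − a e^(xi/b))², and the likelihood equations obtained by setting its partial derivatives to zero are nonlinear in a and b. A sketch that instead maximizes l numerically with optim; the true values, starting values, and data grid are all hypothetical:

# numerical mle for y = a * exp(x / b) + eps, eps ~ N(0, sigma^2)
n <- 100
a <- 2; b <- 5; sigma <- 0.3           # hypothetical true values
x <- seq(0, 10, length.out = n)
y <- a * exp(x / b) + rnorm(n, sd = sigma)

negll <- function(par) {               # par = c(a, b, log(sigma)); log keeps sigma > 0
  mu <- par[1] * exp(x / par[2])
  -sum(dnorm(y, mean = mu, sd = exp(par[3]), log = TRUE))
}
fit <- optim(c(1, 3, 0), negll)        # minimize the negative log-likelihood
c(a = fit$par[1], b = fit$par[2], sigma = exp(fit$par[3]))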


17
Let X1, . . . , Xn ∼ N(µ, σ²) iid, with known µ and unknown σ. Find the mle of σ and of ψ = log σ. Find the Fisher information for σ. Obtain thereby an estimate of the standard error of ψ̂, together with a (1 − α) confidence interval on ψ.
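
No solution is worked out above. A numerical sketch of the standard results (σ̂² = n⁻¹ Σ(Xi − µ)², ψ̂ = log σ̂ by equivariance, I(σ) = 2n/σ², and, by the delta method, se(ψ̂) ≈ 1/√(2n)); all values are hypothetical:

# mle of sigma and psi = log(sigma) with known mu, plus a (1 - alpha) CI for psi
n <- 200
mu <- 3; sigma <- 2                        # hypothetical true values
x <- rnorm(n, mu, sigma)

sigma.hat <- sqrt(mean((x - mu)^2))        # mle of sigma (uses the known mu)
psi.hat <- log(sigma.hat)                  # mle of psi by equivariance

# I(sigma) = 2n / sigma^2, so se(sigma.hat) ~ sigma / sqrt(2n);
# delta method with dpsi/dsigma = 1/sigma gives se(psi.hat) ~ 1 / sqrt(2n)
alpha <- 0.05
psi.se <- 1 / sqrt(2 * n)
ci <- psi.hat + c(-1, 1) * qnorm(1 - alpha / 2) * psi.se
c(psi.hat = psi.hat, lower = ci[1], upper = ci[2])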


18
Let X1 , . . . , Xn ∼ Bernoulli(p) and let ψ = g(p) = log(p/(1 − p)). State the maximum likelihood
estimate for p. Using that, derive the formula for maximum likelihood estimate ψ̂ of ψ. Also compute
the standard error of ψ̂.
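
No solution is worked out above. A numerical sketch of the standard results (p̂ = X̄, ψ̂ = log(p̂/(1 − p̂)) by equivariance, and, by the delta method, se(ψ̂) ≈ 1/√(n p̂(1 − p̂))); all values are hypothetical:

# mle of p, the log-odds psi = log(p / (1 - p)), and its delta-method se
n <- 150
p <- 0.3                                     # hypothetical true value
x <- rbinom(n, size = 1, prob = p)

p.hat <- mean(x)                             # mle of p
psi.hat <- log(p.hat / (1 - p.hat))          # mle of psi by equivariance
psi.se <- 1 / sqrt(n * p.hat * (1 - p.hat))  # delta method: |dpsi/dp| * se(p.hat)
c(p.hat = p.hat, psi.hat = psi.hat, se = psi.se)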


19
Let X1, . . . , Xn ∼ f(·; k, θ) iid, where

    f(x; k, θ) = ((k + 1)/θ^(k+1)) (θ − x)^k for 0 ≤ x ≤ θ, and 0 otherwise,

with θ > 0, k > 0.

1. Find the mome for k and θ.

First moment of the pdf:

    E(X) = ((k + 1)/θ^(k+1)) ∫_0^θ x (θ − x)^k dx = ((k + 1)/θ) ∫_0^θ x (1 − x/θ)^k dx
         = θ(k + 1) ∫_0^1 y (1 − y)^k dy = θ(k + 1) Γ(2)Γ(k + 1)/Γ(k + 3) = θ/(k + 2),

where we have used the definition and a property of the Beta function

    B(α, β) = ∫_0^1 y^(α−1) (1 − y)^(β−1) dy = Γ(α)Γ(β)/Γ(α + β),

and the Gamma function properties Γ(n) = (n − 1)! for integer n, and Γ(k + 1) = kΓ(k).

Second moment of the pdf:

    E(X²) = ((k + 1)/θ^(k+1)) ∫_0^θ x² (θ − x)^k dx = ((k + 1)/θ) ∫_0^θ x² (1 − x/θ)^k dx
          = θ²(k + 1) ∫_0^1 y² (1 − y)^k dy = θ²(k + 1) Γ(3)Γ(k + 1)/Γ(k + 4) = (2θ/(k + 3)) × (θ/(k + 2)).

Equating the first two moments of the pdf with the first two sample moments µ̂1 = n⁻¹ Σ_{i=1}^n Xi and µ̂2 = n⁻¹ Σ_{i=1}^n Xi², we get the mome equations

    µ̂1 = θ/(k + 2)
    µ̂2 = (2θ/(k + 3)) × (θ/(k + 2)).

Solving for k and θ, we get the mome

    k̂ = (3µ̂2 − 4µ̂1²) / (2µ̂1² − µ̂2)
    θ̂ = (k̂ + 2) µ̂1.

2. Write the likelihood and log-likelihood functions. Proceed as far as possible towards finding the mle for the parameters.

The pdf is discontinuous at x = 0 and possibly non-differentiable at x = θ. Since θ defines a boundary for the pdf, we need to analyze the behaviour of the likelihood function carefully. Write the pdf in the alternative form

    fX(x; k, θ) = ((k + 1)/θ^(k+1)) (θ − x)^k I_[0,θ](x),

where

    I_A(x) = 1 if x ∈ A, and 0 if x ∉ A.

The likelihood function is

    L(k, θ) = ((k + 1)/θ^(k+1))^n ∏_{i=1}^n (θ − Xi)^k I_[0,θ](Xi).

Because of the indicator functions I_[0,θ](Xi), i = 1, . . . , n, the likelihood as a function of θ will be nonzero only when θ > max{X1, . . . , Xn}. We may write the likelihood and log-likelihood functions as

    L(k, θ) = 0 for θ ≤ max{X1, . . . , Xn}, and ((k + 1)/θ^(k+1))^n ∏_{i=1}^n (θ − Xi)^k for θ > max{X1, . . . , Xn}
    l(k, θ) = −∞ for θ ≤ max{X1, . . . , Xn}, and n log(k + 1) − n(k + 1) log(θ) + k Σ_{i=1}^n log(θ − Xi) for θ > max{X1, . . . , Xn}.

Assuming that θ is constrained to the region θ > max{X1, . . . , Xn}, we may write the likelihood equations

    ∂l/∂θ = −n(k + 1)/θ + k Σ_{i=1}^n 1/(θ − Xi) = 0
    ∂l/∂k = n/(k + 1) − n log(θ) + Σ_{i=1}^n log(θ − Xi) = 0,

which turn out to be nonlinear in the parameters, and presumably hard or impossible to solve analytically. We may have to resort to numerical maximization methods or numerical methods for solving systems of nonlinear equations.

3. Write R code to plot the log-likelihood function (as function of θ for fixed value of k) to see if the
mle of θ necessarily lies on the boundary θ = max{X1 , . . . , Xn }, or on the right of this boundary.
Write R code to make a surface plot or heatmap of the log-likelihood as function of both θ and k.
Core functions:
df <- function ( x , theta = 1 , k = 1 )
{
# probability density function

stopifnot ( theta > 0 , k > 0 )

i <- which ( ( x <= theta ) & ( x >= 0 ) )


j <- setdiff ( 1: length ( x ) , i )

if ( ! is . null ( i ) ) x [ i ] <- ( k + 1 ) * ( 1 - x [ i ] / theta ) ^ k / theta


if ( ! is . null ( j ) ) x [ j ] <- 0

x
}

pf <- function ( x , theta = 1 , k = 1 )


{
# cumulative distribution function

stopifnot ( theta > 0 , k > 0 )

l <- which ( x <= 0 )


n <- which ( x >= theta )
m <- setdiff ( 1: length ( x ) , c ( l , n ) )

if ( ! is . null ( l ) ) x [ l ] <- 0
if ( ! is . null ( m ) ) x [ m ] <- 1 - ( 1 - x [ m ] / theta ) ^( k + 1 )
if ( ! is . null ( n ) ) x [ n ] <- 1

x
}

qf <- function ( q , theta = 1 , k = 1 )


{
# quantile function

stopifnot ( theta > 0 , k > 0 , q >= 0 , q <= 1 )

theta * ( 1 - ( 1 - q ) ^( 1 / ( k + 1 ) ) )
}

rf <- function ( n , theta = 1 , k = 1 )


{
# random number generator

stopifnot ( theta > 0 , k > 0 , n >= 0 , n == as . integer ( n ) )

qf ( runif ( n ) , theta , k )
}

ll <- function ( theta , k = 1 , x )


{
# log - likelihood function for single fixed value of k

stopifnot ( length ( k ) == 1 , k > 0 , all ( theta > 0 ) )

n <- length ( theta )

i <- which ( theta <= max ( x ) )


j <- setdiff ( 1: n , i )

theta [ i ] <- - Inf


theta [ j ] <- ( n * log ( k + 1 ) - ( k + 1 ) * n * log ( theta [ j ] )
+ sapply ( theta [ j ] , function ( t ) { sum ( log ( t - x ) ) } ) )

theta
}

Plots of pdf, cdf, qf, and a random number histogram:


source ( ’ 18 -0. r ’ )

theta <- 1
k <- 1 / 2

op <- par ( mfcol = c ( 2 , 2 ) )


# pdf
curve ( df ( x , theta , k ) , from = 0 , to = theta + 0.5 , n = 501 ,
bty = ’n ’ , lwd = 2 , xlim = c ( -0.5 , theta + 0.5 ) , ylab = ’ PDF ’ )
lines ( c ( -0.5 , 0 ) , c ( 0 , 0 ) , lwd = 2 )
abline ( v = c ( 0 , theta ) , lty = 2 )
title ( main = parse ( text = paste ( " paste ( k == " , k , " ~ ~ theta == " , theta , " ) " ) ) )

# random numbers histogram overlaid with pdf


r <- rf ( 1000 , theta , k )
hist ( r , ’ FD ’ , border = ’ white ’ , col = ’ gray ’ , freq = FALSE , xlab = ’x ’ ,
main = ’ Random Numbers ’ , xlim = c ( -0.5 , theta + 0.5 ) )
abline ( v = c ( 0 , theta ) , lty = 2 )
lines ( c ( -0.5 , 0 ) , c ( 0 , 0 ) , lwd = 2 )
lines ( c ( theta , theta + 0.5 ) , c ( 0 , 0 ) , lwd = 2 )
curve ( df ( x , theta , k ) , from = 0 , to = theta , n = 501 , add = TRUE , lwd = 2 )

# cdf
curve ( pf ( x , theta , k ) , from = -0.5 , to = theta + 0.5 , n = 501 ,
bty = ’n ’ , lwd = 2 , xlim = c ( -0.5 , theta + 0.5 ) , ylab = ’ CDF ’ , asp = 1 )
abline ( h = c ( 0 , 1 ) , v = c ( 0 , theta ) , lty = 2 )

# qf
curve ( qf ( x , theta , k ) , from = 0 , to = 1 , n = 501 ,
bty = ’n ’ , lwd = 2 , xlim = c ( -0.1 , 1.1 ) , ylab = ’ QF ’ , xlab = ’q ’ , asp = 1 )
abline ( h = c ( 0 , theta ) , v = c ( 0 , 1 ) , lty = 2 )
par ( op )

Log-likelihood function for fixed k:


source ( ’ 18 -0. r ’ )

theta <- 1
k <- 1 / 2


r <- rf ( 10 , theta , k ) # a random sample from the pdf

# log - likelihood function as function of theta , with fixed value for k


# log - likelihood --> - infinity for theta < max ( r ) ( not shown )

curve ( ll ( x , k , r ) , from = max ( r ) , to = 1.01 * max ( r ) , n = 501 , col = ’ red ’ ,


bty = ’n ’ , xlab = expression ( theta ) , ylab = ’ Log Likelihood ’ , lwd = 2 )
abline ( v = max ( r ) , lty = 2 )
axis ( 3 , at = max ( r ) ,
label = expression ( paste ( max , ’( ’ , X [1] , ’ , ’ , ldots , ’ , ’ , X [2] , ’) ’ ) ) )
title ( main = paste ( ’log - likelihood for fixed value of k = ’ , k ) )


20
Let X1, . . . , Xn ∼ f(·; θ, δ) iid, where

    f(x; θ, δ) = (1/θ) e^(−(x−δ)/θ) for x ≥ δ, and 0 otherwise.

The pdf defined above can be thought of as a shifted version of the exponential distribution. It can be verified that (1/θ) ∫_δ^∞ e^(−(x−δ)/θ) dx = 1.

1. Find the mome for θ and δ.

Because there are two parameters to estimate, we need to evaluate the first two moments of f:

    µ1 = E(X) = (1/θ) ∫_δ^∞ x e^(−(x−δ)/θ) dx = ∫_0^∞ (θy + δ) e^(−y) dy = θ + δ
    µ2 = E(X²) = (1/θ) ∫_δ^∞ x² e^(−(x−δ)/θ) dx = ∫_0^∞ (θy + δ)² e^(−y) dy = 2θ² + 2θδ + δ² = θ² + µ1²,

where we have used the result ∫_0^∞ y^(k−1) e^(−y) dy ≡ Γ(k) = (k − 1)! for positive integer k. The first two sample moments are

    µ̂1 = (1/n) Σ_{i=1}^n Xi
    µ̂2 = (1/n) Σ_{i=1}^n Xi².

Equating the respective moments with their sample counterparts, we get the mome equations

    µ̂1 = θ̂n + δ̂n
    µ̂2 = θ̂n² + µ̂1².

Solving for θ̂n and δ̂n, we get the mome:

    θ̂n = √(µ̂2 − µ̂1²)
    δ̂n = µ̂1 − θ̂n.

2. Find the mle for θ and δ.

The likelihood and log-likelihood functions are, respectively,

    L(δ, θ) = ∏_{i=1}^n (1/θ) e^(−(Xi−δ)/θ) = θ^(−n) e^(−(1/θ) Σ_{i=1}^n (Xi − δ))
    l(δ, θ) = −n log(θ) − (1/θ) Σ_{i=1}^n (Xi − δ).

Setting the derivatives of l to zero:

    0 = ∂l(δ, θ)/∂δ = n/θ
    0 = ∂l(δ, θ)/∂θ = −n/θ + (1/θ²) Σ_{i=1}^n (Xi − δ).

Clearly, something is wrong with this naive treatment: the culprit is that f is neither continuous nor differentiable at x = δ. An argument like the one used for Uniform(0, θ) (for fixed θ, the likelihood is zero unless δ ≤ min{X1, . . . , Xn} and is increasing in δ on that region) leads to

    δ̂n = min{X1, . . . , Xn}.

With this, the second equation leads to

    θ̂n = (1/n) Σ_{i=1}^n (Xi − δ̂n).

** CONFIRM **

3. Visualize the density and the log-likelihood function.


Core functions:
# PDF
df <- function ( x , delta = 0 , theta = 1 ) { dexp ( x - delta , rate = 1 / theta ) }

# CDF
pf <- function ( x , delta = 0 , theta = 1 ) { pexp ( x - delta , rate = 1 / theta ) }

# QF
qf <- function ( q , delta = 0 , theta = 1 ) { qexp ( q , rate = 1 / theta ) + delta }

# RNG
rf <- function ( n , delta = 0 , theta = 1 ) { rexp ( n , rate = 1 / theta ) + delta }

# log - likelihood function for single fixed value of theta


ll <- function ( delta , theta = 1 , x )
{
stopifnot ( length ( theta ) == 1 , theta > 0 )

n <- length ( delta )

i <- which ( delta > min ( x ) )


j <- setdiff ( 1: n , i )

delta [ i ] <- - Inf


delta [ j ] <- - ( n * log ( theta )
+ sapply ( delta [ j ] , function ( d ) { sum ( x - d ) } ) / theta )

delta
}

Plots of pdf, cdf, qf, and a random number histogram:


source ( ’ 020 -0. r ’ )

delta <- 1
theta <- 2

op <- par ( mfcol = c ( 2 , 2 ) )


# pdf
curve ( df ( x , delta , theta ) , from = delta , to = 10 * delta , n = 501 , bty = ’n ’ , lwd = 2 ,
xlim = c ( - delta , 10 * delta ) , ylim = c ( 0 , 1 / theta ) , ylab = ’ PDF ’ )
lines ( c ( - delta , delta ) , c ( 0 , 0 ) , lwd = 2 )
abline ( v = delta , lty = 2 )
axis ( 3 , at = delta , label = expression ( delta ) )

# random numbers histogram overlaid with pdf


r <- rf ( 1000 , delta , theta )
hist ( r , ’ FD ’ , border = ’ white ’ , col = ’ gray ’ , freq = FALSE , xlab = ’x ’ ,
main = ’ Random Numbers ’ , xlim = c ( - delta , 10 * delta ) , ylim = c ( 0 , 1 / theta ) )
abline ( v = delta , lty = 2 )
lines ( c ( - delta , delta ) , c ( 0 , 0 ) , lwd = 2 )
curve ( df ( x , delta , theta ) , from = delta , to = 10 * delta , n = 501 , add = TRUE , lwd = 2
)
axis ( 3 , at = delta , label = expression ( delta ) )

# cdf


curve ( pf ( x , delta , theta ) , from = - delta , to = 10 * delta , n = 501 ,


bty = ’n ’ , lwd = 2 , ylab = ’ CDF ’ )
abline ( h = c ( 0 , 1 ) , v = delta , lty = 2 )
axis ( 3 , at = delta , label = expression ( delta ) )

# qf
curve ( qf ( x , delta , theta ) , from = 0 , to = 1 , n = 501 , bty = ’n ’ , lwd = 2 ,
xlim = c ( -0.1 , 1.1 ) , ylim = c ( - delta , 10 * delta ) , ylab = ’ QF ’ , xlab = ’q ’ )
abline ( h = delta , v = c ( 0 , 1 ) , lty = 2 )
axis ( 4 , at = delta , label = expression ( delta ) , las = 2 )
par ( op )

Log-likelihood function for fixed θ:


source ( ’ 020 -0. r ’ )

delta <- 1
theta <- 2

r <- rf ( 10 , delta , theta ) # a random sample from the pdf

# log - likelihood function as function of delta , with fixed value for theta
# log - likelihood --> - infinity for delta > min ( r ) ( not shown )

curve ( ll ( x , theta , r ) , from = 0.8 * min ( r ) , to = min ( r ) , n = 501 , col = ’ red ’ ,


bty = ’n ’ , xlab = expression ( delta ) , ylab = ’ Log Likelihood ’ , lwd = 2 )
abline ( v = min ( r ) , lty = 2 )
axis ( 3 , at = min ( r ) ,
label = expression ( paste ( min , ’( ’ , X [1] , ’ , ’ , ldots , ’ , ’ , X [2] , ’) ’ ) ) )
title ( main = paste ( ’log - likelihood for fixed value of theta = ’ , theta ) )


21
Let X1, . . . , Xn ∼ Gamma(k, θ) iid. The pdf of Gamma(k, θ) is

    f(x; k, θ) = x^(k−1) e^(−x/θ) / (θ^k Γ(k)) for x > 0, and 0 otherwise,

with k, θ > 0.
1. Find the mome for k and θ.

We need the first two moments of Gamma(k, θ):

    µ1 = E(X) = kθ
    µ2 = E(X²) = Var(X) + (E(X))² = kθ² + (kθ)².

Equating these to the first two sample moments µ̂1 = n⁻¹ Σ_{i=1}^n Xi and µ̂2 = n⁻¹ Σ_{i=1}^n Xi², we get the mome equations for the estimators of the parameters:

    µ̂1 = k̂ θ̂
    µ̂2 = k̂ θ̂² + (k̂ θ̂)².

Solving these for k̂ and θ̂, we get

    k̂ = µ̂1² / (µ̂2 − µ̂1²)
    θ̂ = (µ̂2 − µ̂1²) / µ̂1.

2. Assuming that the value of k is known, find the mle of θ.

The likelihood function is

    L(θ) = ∏_{i=1}^n f(xi; k, θ) = ∏_{i=1}^n xi^(k−1) e^(−xi/θ) / (θ^k Γ(k))

(assuming x1, . . . , xn > 0, which is true by assumption). The log-likelihood function is

    l(θ) = (k − 1) Σ_{i=1}^n log(xi) − (1/θ) Σ_{i=1}^n xi − nk log(θ) − n log(Γ(k)).

This is a continuous and differentiable function of θ, so to maximize it, we find its derivative and set it to zero:

    −nk/θ + (1/θ²) Σ_{i=1}^n xi = 0,

giving us

    θ̂ = (1/(nk)) Σ_{i=1}^n xi = µ̂1/k.

The mle of k (i.e., k̂) does not have a closed-form analytical expression; see https://en.wikipedia.org/wiki/Gamma_distribution#Maximum_likelihood_estimation for more information.

3. Assuming that the value of k is known, find the mle for θ and for τ = (2θ − 1)².

The mle of θ was found above; using equivariance, we get τ̂ = (2θ̂ − 1)².
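
A small numerical sketch (hypothetical k, θ and n) of the mome and of the mle of θ and τ when k is known:

# mome for Gamma(k, theta), and mle of theta and tau = (2 theta - 1)^2 when k is known
n <- 500
k <- 3; theta <- 2                       # hypothetical true values
x <- rgamma(n, shape = k, scale = theta)

m1 <- mean(x); m2 <- mean(x^2)
k.mome <- m1^2 / (m2 - m1^2)             # mome for k
theta.mome <- (m2 - m1^2) / m1           # mome for theta

theta.mle <- m1 / k                      # mle of theta when k is known
tau.mle <- (2 * theta.mle - 1)^2         # mle of tau by equivariance
c(k.mome = k.mome, theta.mome = theta.mome, theta.mle = theta.mle, tau.mle = tau.mle)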


22
Visualize the likelihood or the log-likelihood functions:

1. X1 , . . . , Xn ∼ Bernoulli(p) iid.
# likelihood function for a Bernoulli(p) sample
log.likelihood.bernoulli <- function(data, p = seq(0, 1, by = 0.1))
{
  N <- length(data)
  S <- sum(data)

  L    <- p^S * (1 - p)^(N - S)
  logL <- S * log(p) + (N - S) * log(1 - p)

  cbind(p, L, logL)
}

# The "Truth"
p <- 0.6   # stopifnot(p >= 0, p <= 1)

# visualization
op <- par(mfrow = c(2, 3))

for (N in c(5, 10, 15, 30, 100, 1000))   # loop over sample sizes
{
  x <- sample(c(1, 0), N, replace = T, prob = c(p, 1 - p))

  ll <- log.likelihood.bernoulli(x, p = seq(0, 1, by = 0.01))

  mle <- sum(x) / length(x)

  plot(ll[, 'p'], ll[, 'L'], type = 'l', xlab = 'p', ylab = 'likelihood',
       main = paste(N, 'Tosses,', sum(x), 'Heads'), bty = 'n')
  abline(v = mle, col = 'blue', lty = 2)
  abline(v = p, col = 'red')   # the true value of p
  mtext('TRUTH', 3, at = p, line = 0.25, cex = 0.5, col = 'red')
  text(mle, 1.01 * max(ll[, 'L']), 'MLE', cex = 0.75, col = 'blue', adj = c(0.5, 0))
}

par(op)

2. X1 , . . . , Xn ∼ N (µ, σ 2 ) iid.
# log likelihood function for a normal sample
log . likelihood . normal <- function ( mu , sigma , X )
{
# stopifnot ( sigma > 0 )

n <- length ( X )

S2 <- ( n - 1 ) * var ( X ) / n
Xbar <- mean ( X )

sapply ( sigma , function ( sigma ) { -n * log ( sigma ) - 0.5 * n * ( S2 +( Xbar - mu ) ^2) / sigma ^2 } )
}

# The " Truth "


mu <- 3
sigma <- 2

# visualization
grid . size <- 501 # finer grid == > smoother contours , slower draw , larger storage

# grids : arbitrary choice


mu . grid <- seq ( floor ( 0.5 * mu ) , ceiling ( 1.5 * mu ) , length = grid . size )
se . grid <- seq ( floor ( 0.5 * sigma ) , ceiling ( 1.5 * sigma ) , length = grid . size )

op <- par ( mfrow = c ( 2 , 3 ) )

for ( N in c ( 5 , 10 , 15 , 30 , 100 , 1000 ) ) # loop over sample sizes


{
X <- rnorm ( N , mu , sigma ) # data sample
mu . hat <- mean ( X ) # MLE of mu
se . hat <- sqrt ( ( N - 1 ) * var ( X ) / N ) # MLE of sigma ; ! = sd ( X )

ll <- log . likelihood . normal ( mu . grid , se . grid , X )


contour ( mu . grid , se . grid , ll , nlevels = 21 , xlim = range ( mu . grid , mu , mu . hat ) ,


ylim = range ( se . grid , sigma , se . hat ) , asp = 1 , col = ’ blue ’ , bty = ’n ’ )
title ( xlab = expression ( mu ) , ylab = expression ( sigma ) ,
main = paste ( ’ log ( likelihood ) for N ( mu , sigma ) | N = ’ , N ) )

points ( mu , sigma , pch = ’+ ’ , cex = 2 , col = ’ red ’ )


text ( mu , sigma , ’ TRUTH ’ ,
pos = if ( mu < mu . hat ) 2 else 4 , cex = 0.75 , col = ’ red ’ )

points ( mu . hat , se . hat , pch = ’+ ’ , cex = 2 , col = ’ blue ’ )


text ( mu . hat , se . hat , ’ MLE ’ ,
pos = if ( mu < mu . hat ) 4 else 2 , cex = 0.75 , col = ’ blue ’ )
}

par ( op )
