The Royal Statistical Society 2003 Examinations: Solutions
Graduate Diploma, Statistical Theory & Methods, Paper II, 2003. Question 1
(i) The likelihood is

$L(\theta) = \prod_{i=1}^n \frac{\theta}{(1+x_i)^{\theta+1}} = \theta^n \prod_{i=1}^n (1+x_i)^{-(\theta+1)}$,

i.e. we have

$\log L = n\log\theta - (\theta+1)\sum_{i=1}^n \log(1+x_i)$.
$\frac{d\log L}{d\theta} = \frac{n}{\theta} - \sum_{i=1}^n \log(1+x_i)$   (A)

and setting this equal to zero gives $\hat\theta = n\Big/\sum_{i=1}^n \log(1+x_i)$. Further, $\frac{d^2\log L}{d\theta^2} = -\frac{n}{\theta^2} < 0$, confirming that this is a maximum.
Hence, by the invariance property of maximum likelihood estimators, $\hat\mu = \frac{1}{\hat\theta} = \frac{1}{n}\sum_{i=1}^n \log(1+x_i)$.
(ii) $P(W > w) = P\{\log(1+X_i) > w\} = P(X_i > e^w - 1) = \int_{e^w-1}^{\infty} \frac{\theta}{(1+x)^{\theta+1}}\,dx$

$= \Big[-(1+x)^{-\theta}\Big]_{e^w-1}^{\infty} = 0 + (e^w)^{-\theta} = e^{-\theta w}$.

Hence the cdf of this is $1 - e^{-\theta w}$ and the pdf is $\theta e^{-\theta w}$, so the distribution is exponential with mean $1/\theta = \mu$.
$E[\hat\mu] = \frac{1}{n}\cdot n\,E\big[\log(1+X_i)\big] = \mu$. Thus $\hat\mu$ is an unbiased estimator of $\mu$.
(iii) $\frac{d}{d\mu}\log L = \frac{d\theta}{d\mu}\cdot\frac{d}{d\theta}\log L = \frac{1}{\mu^2}\Big\{\sum_{i=1}^n \log(1+x_i) - n\mu\Big\}$.
$\frac{d^2}{d\mu^2}\log L = \frac{n}{\mu^2} - \frac{2}{\mu^3}\sum_{i=1}^n \log(1+x_i)$,

$E\Big[\frac{d^2}{d\mu^2}\log L\Big] = \frac{n}{\mu^2} - \frac{2n}{\mu^2} = -\frac{n}{\mu^2}$, and

the C-R lower bound is $\mu^2/n$. From (ii), $\operatorname{Var}(\hat\mu) = \mu^2/n$, so the bound is attained.
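These properties can be checked by simulation. The sketch below (the values theta = 2, n = 50 and the number of replicates are illustrative choices, not from the question) draws from the fitted model by inverting the cdf $F(x) = 1 - (1+x)^{-\theta}$ and confirms that $\hat\mu$ is approximately unbiased for $\mu = 1/\theta$ with variance close to the bound $\mu^2/n$:

```python
import math
import random

random.seed(42)

theta = 2.0        # illustrative true parameter (my choice, not from the question)
mu = 1.0 / theta   # mu = 1/theta = E[log(1 + X)]
n = 50             # sample size per replicate
reps = 5000        # Monte Carlo replicates

def mu_hat():
    # Inverse-cdf sampling: F(x) = 1 - (1 + x)^(-theta)  =>  X = (1 - U)^(-1/theta) - 1
    total = 0.0
    for _ in range(n):
        u = random.random()
        x = (1.0 - u) ** (-1.0 / theta) - 1.0
        total += math.log(1.0 + x)
    return total / n

estimates = [mu_hat() for _ in range(reps)]
mean_hat = sum(estimates) / reps
var_hat = sum((e - mean_hat) ** 2 for e in estimates) / (reps - 1)

print(mean_hat)            # close to mu = 0.5, i.e. mu-hat is unbiased
print(var_hat, mu**2 / n)  # close to the C-R bound mu^2/n = 0.005
```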
(iv) No. Because the bound is attained for $\mu$, it cannot be attained for a non-linear function of $\mu$, such as $\theta = 1/\mu$.
Graduate Diploma, Statistical Theory & Methods, Paper II, 2003. Question 2
(i) Given a random sample of data X from a distribution having parameter $\theta$, a statistic T(X) is sufficient for $\theta$ if the conditional distribution of X given T(X) does not involve $\theta$.
(ii) Let $Y = \min(X_i)$. Defining the indicator function $I_\theta(x_i)$ to be 0 when $x_i < \theta$ and to be 1 when $x_i \ge \theta$, the likelihood function is

$L(\theta) = \prod_{i=1}^n e^{-(x_i - \theta)}\, I_\theta(x_i)$.

Also, we have $\prod_{i=1}^n I_\theta(x_i) = I_\theta(y)$ and so $L(\theta) = e^{-\sum x_i}\, e^{n\theta}\, I_\theta(y)$. Therefore, by the factorisation theorem, Y is sufficient for $\theta$.
(iii) $P(Y > y) = P(X_1 > y,\, X_2 > y,\, \ldots,\, X_n > y)$, i.e. $P(Y > y) = \prod_{i=1}^n P(X_i > y)$.

Now, $P(X > y) = \int_y^\infty e^{-(x-\theta)}\,dx = \big[-e^{-(x-\theta)}\big]_y^\infty = e^{-(y-\theta)}$, so $P(Y > y) = e^{-n(y-\theta)}$, for $y > \theta$.

Hence the cdf is $F(y) = 1 - e^{-n(y-\theta)}$ and the pdf is $f(y) = dF(y)/dy = n\,e^{-n(y-\theta)}$, for $y > \theta$.
(iv) We have that Y has a shifted exponential distribution. Hence $E(Y) = \theta + \frac{1}{n}$ and $\operatorname{Var}(Y) = \frac{1}{n^2}$, so that $E(Y - c) = \theta + \frac{1}{n} - c$ and $\operatorname{Var}(Y - c) = \frac{1}{n^2}$. From these, $\operatorname{Bias}(Y - c) = \frac{1}{n} - c$ and

$\operatorname{MSE} = \operatorname{Bias}^2 + \operatorname{Var} = \Big(\frac{1}{n} - c\Big)^2 + \frac{1}{n^2}$,

which is clearly minimised when $c = 1/n$. Thus $Y - (1/n)$ has smallest mean squared error of all estimators of the form $Y - c$.
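A quick numerical sketch (n = 10 is an illustrative choice, not from the question) confirms that the MSE $(1/n - c)^2 + 1/n^2$ is minimised over a grid of c values at $c = 1/n$:

```python
# MSE of the estimator Y - c for theta: (1/n - c)^2 + 1/n^2 (variance is constant in c)
n = 10

def mse(c):
    return (1.0 / n - c) ** 2 + 1.0 / n ** 2

grid = [i / 1000.0 for i in range(501)]   # c in [0, 0.5]
best_c = min(grid, key=mse)
print(best_c)   # minimised at c = 1/n = 0.1
```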
Graduate Diploma, Statistical Theory & Methods, Paper II, 2003. Question 3
(i) The likelihood for a sample $(x_1, x_2, \ldots, x_n)$ is $L(\theta) = \text{Const.}\;\theta^{\sum x_i}(1-\theta)^{n - \sum x_i}$, and so the likelihood ratio is

$\lambda = \frac{L(2/3)}{L(3/4)} = \frac{(2/3)^{\sum x_i}(1/3)^{n - \sum x_i}}{(3/4)^{\sum x_i}(1/4)^{n - \sum x_i}} = \Big(\frac{8}{9}\Big)^{\sum x_i}\Big(\frac{4}{3}\Big)^{n - \sum x_i}$.

Using the Neyman-Pearson lemma, we reject $H_0$ when $\lambda > c$, where c is chosen to give the required level of test, $\alpha$. Now, $\lambda$ is an increasing function of $-\sum x_i$, hence of $-\bar x$, and an equivalent rule is therefore to reject $H_0$ when $\bar x < k$.

(ii) By the Central Limit Theorem, the distribution of $\bar x$ is approximately $N\big(\theta,\, \theta(1-\theta)/n\big)$. When $\theta = 3/4$ this is $N\big(\tfrac34, \tfrac{3}{16n}\big)$, and when $\theta = 2/3$ it is $N\big(\tfrac23, \tfrac{2}{9n}\big)$.
(iii) For $\alpha = 0.05$, choose k such that $\Phi\!\left(\dfrac{k - \tfrac34}{\sqrt{3/16n}}\right) = \alpha$, giving $k = \dfrac34 - \dfrac{1.6449}{4}\sqrt{\dfrac{3}{n}}$.

(iv) For power 0.95, $\Phi\!\left(\dfrac{k - \tfrac23}{\sqrt{2/9n}}\right) = 0.95$, giving $k = \dfrac23 + \dfrac{1.6449}{3}\sqrt{\dfrac{2}{n}}$.
Using this expression for k together with the expression in (iii) means that we require

$\dfrac34 - \dfrac{1.6449}{4}\sqrt{\dfrac{3}{n}} = \dfrac23 + \dfrac{1.6449}{3}\sqrt{\dfrac{2}{n}}$ or $\dfrac{1}{12} = \dfrac{1.6449}{\sqrt n}\left(\dfrac{\sqrt 3}{4} + \dfrac{\sqrt 2}{3}\right)$.

Thus we get $\sqrt n = 12 \times 1.6449 \times 0.9044 = 17.8521$ and $n = 318.7$, so we take $n = 319$.
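The arithmetic leading to n = 319 can be reproduced directly (1.6449 is the upper 5% point of N(0, 1); this is only a check of the computation above):

```python
import math

z = 1.6449  # upper 5% point of N(0, 1)

# sqrt(n) = 12 * z * (sqrt(3)/4 + sqrt(2)/3), from equating the two expressions for k
root_n = 12 * z * (math.sqrt(3) / 4 + math.sqrt(2) / 3)
n = root_n ** 2
n_req = math.ceil(n)
print(root_n, n, n_req)   # approximately 17.85, 318.7, and 319

# critical value k under H0: theta = 3/4, using the required sample size
k = 0.75 - (z / 4) * math.sqrt(3 / n_req)
print(k)
```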
Graduate Diploma, Statistical Theory & Methods, Paper II, 2003. Question 4
(i) $P(0) = \theta$, $P(1) = \theta(1-\theta)$, $P(\ge 2) = 1 - \theta - \theta(1-\theta) = (1-\theta)^2$.

Thus the likelihood of $n_0$ zeros, $n_1$ ones and $n_2$ items with two or more flaws is (writing $n = n_0 + n_1 + n_2$)

$L(\theta) = \theta^{n_0}\{\theta(1-\theta)\}^{n_1}\{(1-\theta)^2\}^{n_2} = \theta^{n_0+n_1}(1-\theta)^{2n - 2n_0 - n_1}$.
(ii) $\log L(\theta) = (n_0+n_1)\log\theta + (2n - 2n_0 - n_1)\log(1-\theta)$.

$\dfrac{d}{d\theta}\log L = \dfrac{n_0+n_1}{\theta} - \dfrac{2n - 2n_0 - n_1}{1-\theta}$.

Setting this equal to zero gives that $\hat\theta$ satisfies $(n_0+n_1)(1-\hat\theta) = \hat\theta(2n - 2n_0 - n_1)$, so that $\hat\theta = \dfrac{n_0+n_1}{2n - n_0}$.
Further, $\dfrac{d^2}{d\theta^2}\log L = -\dfrac{n_0+n_1}{\theta^2} - \dfrac{2n - 2n_0 - n_1}{(1-\theta)^2} < 0$, confirming that $\hat\theta$ is a maximum,

and the sample information when $\theta = \hat\theta$ (given by $-\dfrac{d^2\log L}{d\theta^2}$ evaluated at $\theta = \hat\theta$) is

$\dfrac{n_0+n_1}{\hat\theta^2} + \dfrac{2n - 2n_0 - n_1}{(1-\hat\theta)^2} = \dfrac{(2n - n_0)^2}{n_0+n_1} + \dfrac{(2n - n_0)^2}{2n - 2n_0 - n_1}$

(using $1 - \hat\theta = \dfrac{2n - 2n_0 - n_1}{2n - n_0}$).
(iii) An approximate 90% confidence interval for $\theta$ is

$\hat\theta \pm \dfrac{1.6449}{\sqrt{\text{sample information}}}$.

In the case when $n = 100$, $n_0 = 90$ and $n_1 = 7$, we have $2n - n_0 = 110$, $n_0 + n_1 = 97$ and $2n - 2n_0 - n_1 = 13$.

Thus $\hat\theta = \dfrac{97}{110} = 0.882$ and the sample information is $\dfrac{110^2}{97} + \dfrac{110^2}{13} = 1055.5115$.

Thus the confidence interval is $0.882 \pm \dfrac{1.6449}{32.489}$, i.e. $0.882 \pm 0.051$ or $(0.831, 0.933)$.
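The interval arithmetic can be verified with a short computation:

```python
import math

n, n0, n1 = 100, 90, 7
a = 2 * n - n0           # 110
b = n0 + n1              # 97
c = 2 * n - 2 * n0 - n1  # 13

theta_hat = b / a                   # MLE, 97/110
info = a**2 / b + a**2 / c          # sample information at theta_hat
half_width = 1.6449 / math.sqrt(info)
print(theta_hat, info)
print(theta_hat - half_width, theta_hat + half_width)
```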
Graduate Diploma, Statistical Theory & Methods, Paper II, 2003. Question 5
(i) $\alpha = 0.025$, $\beta = 0.075$.

For observations $x_1, x_2, \ldots, x_n$ the likelihood is

$L(\theta) = \left(\prod_{i=1}^n \frac{2x_i}{\theta^2}\right)\exp\left(-\frac{1}{\theta^2}\sum_{i=1}^n x_i^2\right)$,

and for the given $H_0$ and $H_1$ the likelihood ratio is

$\lambda_n = \frac{L(2)}{L(1)} = \left(\frac{1}{2}\right)^{2n}\exp\left(\frac{3}{4}\sum_{i=1}^n x_i^2\right)$.
The sequential probability ratio test rule is to continue sampling while $A < \lambda_n < B$, accept $H_0$ if $\lambda_n \ge B$ and reject $H_0$ (i.e. accept $H_1$) if $\lambda_n \le A$. A and B are given by

$A = \dfrac{\alpha}{1-\beta} = \dfrac{0.025}{0.925} = \dfrac{1}{37} = 0.027$,  $B = \dfrac{1-\alpha}{\beta} = \dfrac{0.975}{0.075} = 13$.
(ii) $E(X^2) = \int_0^\infty \dfrac{2x^3}{\theta^2}\, e^{-x^2/\theta^2}\,dx$; put $y = x^2/\theta^2$, so that $dy/dx = 2x/\theta^2$, giving

$E(X^2) = \theta^2\int_0^\infty y\,e^{-y}\,dy = \theta^2\,\Gamma(2) = \theta^2$.

The $i$th item in the sequence making up $\{\log\lambda_n\}$ is $Z_i = \dfrac{3}{4}X_i^2 - 2\log 2$.
$E(Z_i \mid \theta = 2) = -2\log 2 + \tfrac34 \times 4 = 1.6137$.

$E(Z_i \mid \theta = 1) = -2\log 2 + \tfrac34 \times 1 = -0.6363$.

$E(N \mid \theta = 2) \approx \dfrac{\alpha\log A + (1-\alpha)\log B}{E(Z_i \mid \theta = 2)} = 1.494$.

$E(N \mid \theta = 1) \approx \dfrac{(1-\beta)\log A + \beta\log B}{E(Z_i \mid \theta = 1)} = 4.948$.
(iii) $x_1 = 2.2$: $\lambda_1 = \tfrac14\exp\big(\tfrac34 \times 4.84\big) = 9.428$, so $A < \lambda_1 < B$ and we continue sampling.

$x_2 = 2.5$: $\lambda_2 = \tfrac{1}{16}\exp\big(\tfrac34(2.2^2 + 2.5^2)\big) = 255.93 \ge B$, so accept $H_0$.

There is no need to consider $x_3$.
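The boundary values and the two likelihood-ratio evaluations can be reproduced as follows (a sketch of the arithmetic above only, not a general SPRT implementation):

```python
import math

A = 0.025 / 0.925   # alpha/(1 - beta) = 1/37
B = 0.975 / 0.075   # (1 - alpha)/beta = 13

def lam(xs):
    # lambda_n = (1/4)^n * exp((3/4) * sum of x_i^2)
    return 0.25 ** len(xs) * math.exp(0.75 * sum(x * x for x in xs))

l1 = lam([2.2])
l2 = lam([2.2, 2.5])
print(l1)   # between A and B: continue sampling
print(l2)   # exceeds B = 13: accept H0
```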
Graduate Diploma, Statistical Theory & Methods, Paper II, 2003. Question 6
(i) A prior distribution is conjugate for a particular model (e.g. Normal, beta) if
the prior and posterior distributions are from the same family.
(ii) Likelihood:

$L(\lambda \mid \mathbf{x}) = \text{constant}\times\lambda^{n/2}\exp\Big\{-\tfrac12\lambda\sum_{i=1}^n\big(x_i^2 - 2x_i + 1\big)\Big\}$.

The posterior distribution is proportional to $g(\lambda)\,L(\lambda \mid \mathbf{x})$, i.e. it is

$\text{constant}\times\lambda^{\alpha + (n/2) - 1}\exp\Big[-\lambda\Big\{\beta + \tfrac12\sum_{i=1}^n\big(x_i^2 - 2x_i + 1\big)\Big\}\Big]$,

which is gamma with parameters $\alpha + (n/2)$ and $\beta + \tfrac12\sum\big(x_i^2 - 2x_i + 1\big)$. Hence the gamma prior is conjugate.
(iii) The mean, 20, is $\alpha/\beta$. The variance, also 20, is $\alpha/\beta^2$. So $\beta$ must be 1, and $\alpha$ must be 20, and these must be the values used in the prior distribution.
(iv) The posterior of $\lambda$ given $\mathbf{x}$ is gamma$\left(20 + \dfrac{80}{2},\; 1 + \dfrac{5.0}{2}\right)$, i.e. gamma(60, 3.5).

The mean of this is $60/3.5$ and the variance is $60/(3.5)^2$. These are used in a Normal approximation, which is satisfactory for $n = 80$. Hence an approximate 90% highest posterior density interval for $\lambda$ is given by

$\dfrac{60}{3.5} \pm 1.6449\,\dfrac{\sqrt{60}}{3.5}$,

i.e. $17.143 \pm 3.640$ or $(13.50, 20.78)$.
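The posterior summaries can be checked numerically (using the same Normal approximation to the gamma(60, 3.5) posterior as in the solution):

```python
import math

alpha_post = 20 + 80 / 2   # 60
beta_post = 1 + 5.0 / 2    # 3.5

mean = alpha_post / beta_post            # posterior mean, 60/3.5
sd = math.sqrt(alpha_post) / beta_post   # posterior sd, sqrt(60)/3.5
lo = mean - 1.6449 * sd
hi = mean + 1.6449 * sd
print(mean, sd)
print(lo, hi)   # approximately (13.50, 20.78)
```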
Graduate Diploma, Statistical Theory & Methods, Paper II, 2003. Question 7
(i) The likelihood $L(\theta \mid \mathbf{x})$ is $k\,\theta^{\sum x_i}(1-\theta)^{n - \sum x_i}$. Multiplying by the beta$(\alpha, \beta)$ prior, proportional to $\theta^{\alpha-1}(1-\theta)^{\beta-1}$, the posterior is proportional to

$\theta^{\alpha + \sum x_i - 1}(1-\theta)^{\beta + n - \sum x_i - 1}$.

So $\theta \mid \mathbf{x}$ is beta$\big(\alpha + \sum x_i,\; \beta + n - \sum x_i\big)$, and with a squared error loss the Bayes estimator of $\theta$ is the mean of this distribution, i.e.

$\hat\theta_B = \dfrac{\alpha + \sum x_i}{\alpha + \beta + n}$.
(ii) When $\alpha = \beta = \tfrac12\sqrt n$, we have $\hat\theta_B = \dfrac{\sum x_i + \tfrac12\sqrt n}{n + \sqrt n}$. The expectation of this is $\dfrac{n\theta + \tfrac12\sqrt n}{n + \sqrt n}$, so its bias is given by

$\dfrac{n\theta + \tfrac12\sqrt n}{n + \sqrt n} - \theta = \dfrac{\tfrac12\sqrt n - \sqrt n\,\theta}{n + \sqrt n} = \dfrac{\sqrt n\big(\tfrac12 - \theta\big)}{n + \sqrt n}$.
Also,

$\operatorname{Var}\big(\hat\theta_B\big) = \operatorname{Var}\left(\dfrac{\sum x_i}{n + \sqrt n}\right) = \dfrac{n\theta(1-\theta)}{(n + \sqrt n)^2} = \dfrac{\theta(1-\theta)}{\big(1 + \sqrt n\big)^2}$.

The risk is

$\operatorname{MSE}\big(\hat\theta_B\big) = \operatorname{Bias}^2 + \operatorname{Variance} = \dfrac{n\big(\tfrac12 - \theta\big)^2 + n\theta(1-\theta)}{(n + \sqrt n)^2} = \dfrac{1}{4\big(1 + \sqrt n\big)^2}$.
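That the risk really is constant in $\theta$ can be checked on a grid (n = 25 is an illustrative choice, not from the question):

```python
n = 25
root_n = n ** 0.5
const_risk = 1.0 / (4 * (1 + root_n) ** 2)   # claimed constant risk

def mse(theta):
    bias = root_n * (0.5 - theta) / (n + root_n)
    var = n * theta * (1 - theta) / (n + root_n) ** 2
    return bias ** 2 + var

risks = [mse(t / 10) for t in range(11)]   # theta = 0.0, 0.1, ..., 1.0
print(const_risk)
print(max(abs(r - const_risk) for r in risks))   # essentially 0: risk is flat in theta
```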
(iii) A Bayes estimator with constant risk for all $\theta$ is minimax.
Graduate Diploma, Statistical Theory & Methods, Paper II, 2003. Question 8
Topics to be included in a comprehensive answer include the following, and suitable
examples should be given.
Parametric tests are based on assumptions about the values of the parameters in mass or density functions for a family of distributions, for example $N(\mu, \sigma^2)$ or $B(n, p)$, and confidence interval methods use the same theory.
Parametric methods often use a likelihood function based on an assumed model, for
example in a likelihood ratio test to compare hypotheses about a parameter in (say) a
gamma family.
Moments of a distribution, especially mean and variance, are often used in parametric
methods, whereas order statistics (median etc) are more useful for non-parametric
inference.
It is less easy to construct confidence-limit arguments in non-parametric inference.
Non-parametric methods need fewer assumptions, for example not requiring a
specific distribution as a model.
Prior information for parametric methods includes a model and some values for its
parameters, whereas merely the value of an order statistic is often sufficient in a non-
parametric test.
Exact probability theory based on samples from Normal distributions can be used for
parametric methods, whereas approximate methods are more common for non-
parametric methods.
Computing critical value tables for non-parametric tests is often very complex compared with what is required for parametric tests, although good Normal approximations exist for moderate-sized samples in some standard non-parametric tests.
If both types of test are possible for a set of data (for example a two-sample test), the
parametric one is more powerful (provided the underlying modelling assumptions are
satisfied) but the non-parametric one may be more robust (in case the assumptions are
not).
Ranked (non-numerical) data need the non-parametric approach.