The Royal Statistical Society 2003 Examinations: Solutions


THE ROYAL STATISTICAL SOCIETY

2003 EXAMINATIONS SOLUTIONS





GRADUATE DIPLOMA

PAPER II STATISTICAL THEORY & METHODS





The Society provides these solutions to assist candidates preparing for the
examinations in future years and for the information of any other persons using the
examinations.

The solutions should NOT be seen as "model answers". Rather, they have been
written out in considerable detail and are intended as learning aids.

Users of the solutions should always be aware that in many cases there are valid
alternative methods. Also, in the many cases where discussion is called for, there
may be other valid points that could be made.

While every care has been taken with the preparation of these solutions, the Society
will not be responsible for any errors or omissions.

The Society will not enter into any correspondence in respect of these solutions.












RSS 2003



Graduate Diploma, Statistical Theory & Methods, Paper II, 2003. Question 1


(i) The likelihood $L$ of the sample is

$L(\theta) = \prod_{i=1}^{n} f(x_i) = \theta^n \prod_{i=1}^{n} (1 + x_i)^{-(\theta + 1)}$,

i.e. we have $\log L = n \log\theta - (\theta + 1) \sum_{i=1}^{n} \log(1 + x_i)$.

$\dfrac{d \log L}{d\theta} = \dfrac{n}{\theta} - \sum_{i=1}^{n} \log(1 + x_i)$   (A)

and setting this equal to zero gives $\hat\theta = \dfrac{n}{\sum \log(1 + x_i)}$. Further, $\dfrac{d^2 \log L}{d\theta^2} = -\dfrac{n}{\theta^2} < 0$, confirming that this is a maximum.

Hence, by the invariance property of maximum likelihood estimators,

$\hat\phi = \dfrac{1}{\hat\theta} = \dfrac{1}{n} \sum_{i=1}^{n} \log(1 + x_i)$.


(ii) $P\{\log(1 + X_i) > w\} = P\{X_i > e^w - 1\} = \int_{e^w - 1}^{\infty} \theta (1 + x)^{-(\theta + 1)} \, dx$

$= \left[ -(1 + x)^{-\theta} \right]_{e^w - 1}^{\infty} = (e^w)^{-\theta} = e^{-\theta w}$.

Hence the cdf of $W = \log(1 + X_i)$ is $1 - e^{-\theta w}$ and the pdf is $\theta e^{-\theta w}$, so the distribution is exponential with mean $1/\theta = \phi$.

$E[\hat\phi] = \dfrac{1}{n} \cdot n E[\log(1 + X_i)] = \dfrac{1}{\theta} = \phi$. Thus $\hat\phi$ is an unbiased estimator of $\phi$.


(iii) $\dfrac{d}{d\phi} \log L = \dfrac{d}{d\theta} \log L \cdot \dfrac{d\theta}{d\phi} = -\dfrac{1}{\phi^2} \left\{ \dfrac{n}{\theta} - \sum \log(1 + x_i) \right\}$ [using result (A) above]

$= -\dfrac{n}{\phi} + \dfrac{1}{\phi^2} \sum_{i=1}^{n} \log(1 + x_i)$.

$\dfrac{d^2}{d\phi^2} \log L = \dfrac{n}{\phi^2} - \dfrac{2}{\phi^3} \sum_{i=1}^{n} \log(1 + x_i)$,

$-E\left[ \dfrac{d^2}{d\phi^2} \log L \right] = -\dfrac{n}{\phi^2} + \dfrac{2}{\phi^3} \cdot n\phi = \dfrac{n}{\phi^2}$, and

the C-R lower bound is $\phi^2 / n$. From (ii), $\mathrm{Var}(\hat\phi) = \phi^2 / n$, so the bound is attained.


(iv) No. Because the bound is attained for $\phi$, it cannot be attained for a non-linear function of $\phi$, such as $\theta = 1/\phi$.
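As a quick numerical check of part (i), the two estimators can be computed directly. This is an illustrative sketch only, and the sample values are hypothetical:

```python
import math

def mle_theta(xs):
    # theta_hat = n / sum(log(1 + x_i)), from setting d(log L)/d(theta) = 0
    return len(xs) / sum(math.log(1 + x) for x in xs)

def mle_phi(xs):
    # By invariance, phi_hat = 1/theta_hat = mean of log(1 + x_i)
    return sum(math.log(1 + x) for x in xs) / len(xs)

# Hypothetical sample, for illustration only
sample = [0.5, 1.2, 3.0, 0.8]
assert abs(mle_theta(sample) * mle_phi(sample) - 1.0) < 1e-12
```

The final assertion verifies the invariance relationship $\hat\phi = 1/\hat\theta$ used in the solution.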


Graduate Diploma, Statistical Theory & Methods, Paper II, 2003. Question 2


(i) Given a random sample of data $X$ from a distribution having parameter $\theta$, a statistic $T(X)$ is sufficient for $\theta$ if the conditional distribution of $X$ given $T(X)$ does not involve $\theta$.

(ii) Let $Y = \min(X_i)$. Defining the indicator function $I_\theta(x_i)$ to be 0 when $x_i < \theta$ and to be 1 when $x_i \geq \theta$, the likelihood function is

$L(\theta) = \prod_{i=1}^{n} e^{-(x_i - \theta)} I_\theta(x_i)$.

Also, we have $\prod_{i=1}^{n} I_\theta(x_i) = I_\theta(y)$ and so $L(\theta) = e^{n\theta} I_\theta(y) \cdot e^{-\sum x_i}$. Therefore, by the factorisation theorem, $Y$ is sufficient for $\theta$.


(iii) The event $Y > y$ is the same as the event $X_1 > y, X_2 > y, \ldots, X_n > y$, i.e. $P(Y > y) = \prod_{i=1}^{n} P(X_i > y)$.

Now, $P(X_i > y) = \int_y^\infty e^{-(x - \theta)} \, dx = \left[ -e^{-(x - \theta)} \right]_y^\infty = e^{-(y - \theta)}$, so $P(Y > y) = e^{-n(y - \theta)}$, for $y > \theta$.

Hence the cdf is $F(y) = 1 - e^{-n(y - \theta)}$ and the pdf is $f(y) = dF(y)/dy = n e^{-n(y - \theta)}$, for $y > \theta$.


(iv) We have that $Y$ has a shifted exponential distribution. Hence $E(Y) = \theta + \dfrac{1}{n}$ and $\mathrm{Var}(Y) = \dfrac{1}{n^2}$, so that $E(Y - c) = \theta + \dfrac{1}{n} - c$ and $\mathrm{Var}(Y - c) = \dfrac{1}{n^2}$. From these, $\mathrm{Bias}(Y - c) = \dfrac{1}{n} - c$ and

$\mathrm{MSE} = \mathrm{Bias}^2 + \mathrm{Var} = \left( \dfrac{1}{n} - c \right)^2 + \dfrac{1}{n^2}$,

which is clearly minimised when $c = 1/n$. Thus $Y - (1/n)$ has the smallest mean squared error of all estimators of the form $Y - c$ (the variance is the same for every $c$).
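The minimisation in part (iv) can be checked numerically by scanning candidate values of $c$; a sketch, with an arbitrary choice of $n$:

```python
def mse(c, n):
    # MSE of Y - c: bias = 1/n - c, variance = 1/n**2 (free of c)
    return (1.0 / n - c) ** 2 + 1.0 / n ** 2

n = 10
grid = [i / 1000.0 for i in range(501)]   # candidate values of c
best_c = min(grid, key=lambda c: mse(c, n))
assert abs(best_c - 1.0 / n) < 1e-9       # minimum sits at c = 1/n
```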



Graduate Diploma, Statistical Theory & Methods, Paper II, 2003. Question 3


(i) The likelihood for a sample $(x_1, x_2, \ldots, x_n)$ is $L(\theta) = \text{Const.} \times \theta^{\sum x_i} (1 - \theta)^{n - \sum x_i}$, and so the likelihood ratio is

$\lambda = \dfrac{L(2/3)}{L(3/4)} = \left( \dfrac{2/3}{3/4} \right)^{\sum x_i} \left( \dfrac{1/3}{1/4} \right)^{n - \sum x_i} = \left( \dfrac{8}{9} \right)^{\sum x_i} \left( \dfrac{4}{3} \right)^{n - \sum x_i}$.

Using the Neyman-Pearson lemma, we reject $H_0$ when $\lambda > c$, where $c$ is chosen to give the required level of the test, $\alpha$. Now, $\lambda$ is a decreasing function of $\sum x_i$, hence of $\hat\theta = \sum x_i / n$, and an equivalent rule is therefore to reject $H_0$ when $\hat\theta < k$, where $k$ is chosen to give test level $\alpha$.


(ii) $n\hat\theta$ is binomial with parameters $(n, \theta)$. Hence the large-sample distribution of $\hat\theta$ is $N(\theta, \theta(1 - \theta)/n)$. When $\theta = 3/4$ this is $N\left( \frac{3}{4}, \frac{3}{16n} \right)$, and when $\theta = 2/3$ it is $N\left( \frac{2}{3}, \frac{2}{9n} \right)$.


(iii) For $\alpha = 0.05$, choose $k$ such that $P(\hat\theta < k \mid \theta = 3/4) = 0.05$. That is, we want

$\Phi\left( \dfrac{k - \frac{3}{4}}{\sqrt{3/(16n)}} \right) = 0.05$, or $\dfrac{k - \frac{3}{4}}{\sqrt{3/(16n)}} = -1.6449$, giving $k = \dfrac{3}{4} - \dfrac{1.6449\sqrt{3}}{4\sqrt{n}}$.


(iv) For power 0.95, $P(\hat\theta < k \mid \theta = 2/3) = 0.95$, i.e.

$\Phi\left( \dfrac{k - \frac{2}{3}}{\sqrt{2/(9n)}} \right) = 0.95$, or $\dfrac{k - \frac{2}{3}}{\sqrt{2/(9n)}} = 1.6449$, giving $k = \dfrac{2}{3} + \dfrac{1.6449\sqrt{2}}{3\sqrt{n}}$.

Using this expression for $k$ together with the expression in (iii) means that we require

$\dfrac{3}{4} - \dfrac{1.6449\sqrt{3}}{4\sqrt{n}} = \dfrac{2}{3} + \dfrac{1.6449\sqrt{2}}{3\sqrt{n}}$, or $\dfrac{1}{12} = \dfrac{1.6449}{\sqrt{n}} \left( \dfrac{\sqrt{3}}{4} + \dfrac{\sqrt{2}}{3} \right)$.

Thus we get $\sqrt{n} = 12 \times 1.6449 \times 0.9044 = 17.8521$ and $n = 318.7$, so we take $n = 319$.
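The sample-size arithmetic above can be reproduced directly; a short check:

```python
import math

z = 1.6449                                   # upper 5% point of N(0, 1)
coeff = math.sqrt(3) / 4 + math.sqrt(2) / 3  # approximately 0.9044
root_n = 12 * z * coeff                      # approximately 17.852
n = math.ceil(root_n ** 2)                   # 318.7 rounded up to 319
```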



Graduate Diploma, Statistical Theory & Methods, Paper II, 2003. Question 4


(i) $P(0) = \theta$, $P(1) = \theta(1 - \theta)$, $P(\geq 2) = 1 - \theta - \theta(1 - \theta) = (1 - \theta)^2$.

Thus the likelihood of $n_0$ zeros, $n_1$ ones and $n_2$ items with two or more flaws is

$L(\theta) = \theta^{n_0} \{\theta(1 - \theta)\}^{n_1} \{(1 - \theta)^2\}^{n_2} = \theta^{n_0 + n_1} (1 - \theta)^{2n - 2n_0 - n_1}$,

using $n_2 = n - n_0 - n_1$.


(ii) $\log L = (n_0 + n_1)\log\theta + (2n - 2n_0 - n_1)\log(1 - \theta)$.

$\dfrac{d \log L}{d\theta} = \dfrac{n_0 + n_1}{\theta} - \dfrac{2n - 2n_0 - n_1}{1 - \theta}$.

Setting this equal to zero gives that $\hat\theta$ satisfies $(n_0 + n_1)(1 - \hat\theta) = \hat\theta(2n - 2n_0 - n_1)$, so that

$\hat\theta = \dfrac{n_0 + n_1}{2n - n_0}$.

Further, $\dfrac{d^2 \log L}{d\theta^2} = -\dfrac{n_0 + n_1}{\theta^2} - \dfrac{2n - 2n_0 - n_1}{(1 - \theta)^2} < 0$, which confirms that $\hat\theta$ is a maximum, and the sample information when $\theta = \hat\theta$ (given by $-\dfrac{d^2 \log L}{d\theta^2}$ evaluated at $\theta = \hat\theta$) is

$\dfrac{(2n - n_0)^2}{n_0 + n_1} + \dfrac{(2n - n_0)^2}{2n - 2n_0 - n_1}$   (using $1 - \hat\theta = \dfrac{2n - 2n_0 - n_1}{2n - n_0}$).


(iii) An approximate 90% confidence interval for $\theta$ is $\hat\theta \pm \dfrac{1.6449}{\sqrt{\text{sample information}}}$.

In the case when $n = 100$, $n_0 = 90$ and $n_1 = 7$, we have $2n - n_0 = 110$, $n_0 + n_1 = 97$ and $2n - 2n_0 - n_1 = 13$.

Thus $\hat\theta = \dfrac{97}{110} = 0.882$ and the sample information is $\dfrac{110^2}{97} + \dfrac{110^2}{13} = 1055.5115$.

Thus the confidence interval is $0.882 \pm \dfrac{1.6449}{32.489}$, i.e. $0.882 \pm 0.051$ or $(0.831, 0.933)$.
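The numerical work in part (iii) is easy to verify; a sketch of the calculation:

```python
import math

n, n0, n1 = 100, 90, 7
theta_hat = (n0 + n1) / (2 * n - n0)          # 97/110, about 0.882
# Observed information evaluated at theta_hat
info = (2 * n - n0) ** 2 / (n0 + n1) + (2 * n - n0) ** 2 / (2 * n - 2 * n0 - n1)
half_width = 1.6449 / math.sqrt(info)
interval = (theta_hat - half_width, theta_hat + half_width)
```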


Graduate Diploma, Statistical Theory & Methods, Paper II, 2003. Question 5


(i) $\alpha = 0.025$, $\beta = 0.075$.

For observations $x_1, x_2, \ldots, x_n$ the likelihood is $L(\theta) = \dfrac{2^n}{\theta^{2n}} \left( \prod_{i=1}^{n} x_i \right) \exp\left( -\dfrac{\sum x_i^2}{\theta^2} \right)$, and for the given $H_0$ ($\theta = 2$) and $H_1$ ($\theta = 1$) the likelihood ratio is

$\lambda_n = \dfrac{L(2)}{L(1)} = \left( \dfrac{1}{4} \right)^n \exp\left( \dfrac{3}{4} \sum_{i=1}^{n} x_i^2 \right)$.

The sequential probability ratio test rule is to continue sampling while $A < \lambda_n < B$, accept $H_0$ if $\lambda_n \geq B$ and reject $H_0$ (i.e. accept $H_1$) if $\lambda_n \leq A$. $A$ and $B$ are given by

$A = \dfrac{\alpha}{1 - \beta} = \dfrac{0.025}{0.925} = \dfrac{1}{37} = 0.027$,  $B = \dfrac{1 - \alpha}{\beta} = \dfrac{0.975}{0.075} = 13$.
= = = .


(ii) $E(X^2) = \int_0^\infty x^2 \cdot \dfrac{2x}{\theta^2} e^{-x^2/\theta^2} \, dx$; put $y = x^2/\theta^2$, so that $dy/dx = 2x/\theta^2$, giving

$E(X^2) = \theta^2 \int_0^\infty y e^{-y} \, dy = \theta^2$.

The $i$th item in the sequence making up $\{\log \lambda_n\}$ is $Z_i = -2\log 2 + \dfrac{3}{4} X_i^2$.

$E(Z_i \mid \theta = 2) = -2\log 2 + \dfrac{3}{4} \times 4 = 1.6137$.

$E(Z_i \mid \theta = 1) = -2\log 2 + \dfrac{3}{4} \times 1 = -0.6363$.

$E(N \mid \theta = 2) = \dfrac{(1 - \alpha)\log B + \alpha \log A}{E(Z_i \mid \theta = 2)} = 1.494$.

$E(N \mid \theta = 1) = \dfrac{\beta \log B + (1 - \beta)\log A}{E(Z_i \mid \theta = 1)} = 4.948$.


(iii) $x_1 = 2.2$: $\lambda_1 = \dfrac{1}{4} \exp\left( \dfrac{3}{4} \times 4.84 \right) = 9.428$; $A < \lambda_1 < B$, so continue sampling.

$x_2 = 2.5$: $\lambda_2 = \dfrac{1}{16} \exp\left( \dfrac{3}{4}(2.2^2 + 2.5^2) \right) = 255.93 \geq B$, so accept $H_0$.

There is no need to consider $x_3$.
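The SPRT boundaries and the two values of $\lambda_n$ in part (iii) can be sketched as follows:

```python
import math

alpha, beta = 0.025, 0.075
A = alpha / (1 - beta)    # 1/37, about 0.027: reject-H0 boundary
B = (1 - alpha) / beta    # 13: accept-H0 boundary

def lam(xs):
    # lambda_n = (1/4)^n * exp((3/4) * sum of x_i^2)
    return 0.25 ** len(xs) * math.exp(0.75 * sum(x * x for x in xs))

lam1 = lam([2.2])         # about 9.43: A < lam1 < B, keep sampling
lam2 = lam([2.2, 2.5])    # about 255.9: lam2 >= B, accept H0
```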



Graduate Diploma, Statistical Theory & Methods, Paper II, 2003. Question 6


(i) A prior distribution is conjugate for a particular model (e.g. Normal, beta) if
the prior and posterior distributions are from the same family.


(ii) The likelihood is

$L(\mathbf{x} \mid \lambda) = \text{constant} \times \lambda^{n/2} \exp\left\{ -\dfrac{\lambda}{2} \sum_{i=1}^{n} \left( x_i + x_i^{-1} - 2 \right) \right\}$.

The posterior distribution is proportional to $g(\lambda) L(\mathbf{x} \mid \lambda)$, i.e. it is

$\text{constant} \times \lambda^{\alpha + n/2 - 1} \exp\left[ -\lambda \left\{ \beta + \dfrac{1}{2} \sum_{i=1}^{n} \left( x_i + x_i^{-1} - 2 \right) \right\} \right]$,

which is gamma with parameters $\alpha + n/2$ and $\beta + \dfrac{1}{2} \sum \left( x_i + x_i^{-1} - 2 \right)$. Hence the gamma prior is conjugate.


(iii) The mean, 20, is $\alpha/\beta$. The variance, also 20, is $\alpha/\beta^2$. So $\beta$ must be 1, and $\alpha$ must be 20, and these must be the values used in the prior distribution.


(iv) $\lambda \mid \mathbf{x}$ is gamma$\left( 20 + \dfrac{80}{2},\ 1 + \dfrac{5.0}{2} \right)$, i.e. gamma$(60, 3.5)$.

The mean of this is $60/3.5$ and the variance is $60/(3.5)^2$. These are used in a Normal approximation, which is satisfactory for $n = 80$. Hence an approximate 90% highest posterior density interval for $\lambda$ is given by

$\dfrac{60}{3.5} \pm 1.6449 \dfrac{\sqrt{60}}{3.5}$,

i.e. $17.143 \pm 3.640$ or $(13.50, 20.78)$.
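The posterior update and the approximate HPD interval can be checked with a few lines:

```python
import math

a_post = 20 + 80 / 2          # prior alpha + n/2 = 60
b_post = 1 + 5.0 / 2          # prior beta + (data sum)/2 = 3.5
post_mean = a_post / b_post                        # about 17.143
half_width = 1.6449 * math.sqrt(a_post) / b_post   # about 3.640
hpd = (post_mean - half_width, post_mean + half_width)
```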



Graduate Diploma, Statistical Theory & Methods, Paper II, 2003. Question 7


(i) The likelihood $L(\mathbf{x} \mid \theta)$ is $k\,\theta^{\sum x_i} (1 - \theta)^{n - \sum x_i}$, and the posterior density is

$g(\theta \mid \mathbf{x}) \propto g(\theta) L(\mathbf{x} \mid \theta)$,

i.e. we have

$g(\theta \mid \mathbf{x}) \propto \theta^{\alpha - 1} (1 - \theta)^{\beta - 1} \theta^{\sum x_i} (1 - \theta)^{n - \sum x_i} = \theta^{\alpha + \sum x_i - 1} (1 - \theta)^{\beta + n - \sum x_i - 1}$.

So $\theta \mid \mathbf{x}$ is beta$(\alpha + \sum x_i,\ \beta + n - \sum x_i)$, and with a squared error loss the Bayes estimator of $\theta$ is the mean of this distribution, i.e.

$\dfrac{\alpha + \sum x_i}{\alpha + \beta + n}$.


(ii) When $\alpha = \beta = \dfrac{1}{2}\sqrt{n}$, we have $\hat\theta_B = \dfrac{\frac{1}{2}\sqrt{n} + \sum x_i}{n + \sqrt{n}}$. The expectation of this is $\dfrac{\frac{1}{2}\sqrt{n} + n\theta}{n + \sqrt{n}}$, so its bias is given by

$\dfrac{\frac{1}{2}\sqrt{n} + n\theta}{n + \sqrt{n}} - \theta = \dfrac{\sqrt{n}\left( \frac{1}{2} - \theta \right)}{n + \sqrt{n}} = \dfrac{\frac{1}{2} - \theta}{\sqrt{n} + 1}$.

Also, $\mathrm{Var}\left( \hat\theta_B \right) = \mathrm{Var}\left( \dfrac{\sum x_i}{n + \sqrt{n}} \right) = \dfrac{n\theta(1 - \theta)}{(n + \sqrt{n})^2} = \dfrac{\theta(1 - \theta)}{(\sqrt{n} + 1)^2}$.

The risk is

$\mathrm{MSE}\left( \hat\theta_B \right) = \mathrm{Bias}^2 + \mathrm{Variance} = \dfrac{\left( \frac{1}{2} - \theta \right)^2 + \theta(1 - \theta)}{(\sqrt{n} + 1)^2} = \dfrac{1}{4(\sqrt{n} + 1)^2}$.
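That the risk is free of $\theta$ can be confirmed numerically; a sketch with an arbitrary $n$:

```python
import math

def risk(theta, n):
    # bias^2 + variance of the Bayes estimator with alpha = beta = sqrt(n)/2
    bias = (0.5 - theta) / (math.sqrt(n) + 1)
    var = theta * (1 - theta) / (math.sqrt(n) + 1) ** 2
    return bias ** 2 + var

n = 25
const = 1 / (4 * (math.sqrt(n) + 1) ** 2)   # claimed constant risk
for theta in (0.1, 0.3, 0.5, 0.7, 0.9):
    assert abs(risk(theta, n) - const) < 1e-12
```

The cancellation of the $\theta$ terms in $(\frac{1}{2} - \theta)^2 + \theta(1 - \theta) = \frac{1}{4}$ is what the loop verifies.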


(iii) A Bayes estimator with constant risk for all $\theta$ is minimax.



Graduate Diploma, Statistical Theory & Methods, Paper II, 2003. Question 8


Topics to be included in a comprehensive answer include the following, and suitable
examples should be given.

Parametric tests are based on assumptions about the values of the parameters in mass or density functions for a family of distributions, for example $N(\mu, \sigma^2)$ or $B(n, p)$, and confidence interval methods use the same theory.

Parametric methods often use a likelihood function based on an assumed model, for
example in a likelihood ratio test to compare hypotheses about a parameter in (say) a
gamma family.

Moments of a distribution, especially mean and variance, are often used in parametric
methods, whereas order statistics (median etc) are more useful for non-parametric
inference.

It is less easy to construct confidence-limit arguments in non-parametric inference.

Non-parametric methods need fewer assumptions, for example not requiring a
specific distribution as a model.

Prior information for parametric methods includes a model and some values for its
parameters, whereas merely the value of an order statistic is often sufficient in a non-
parametric test.

Exact probability theory based on samples from Normal distributions can be used for
parametric methods, whereas approximate methods are more common for non-
parametric methods.

Computing critical-value tables for non-parametric tests is often very complex compared with what is required for parametric tests, although some good Normal approximations exist for moderate-sized samples in some standard non-parametric tests.

If both types of test are possible for a set of data (for example a two-sample test), the
parametric one is more powerful (provided the underlying modelling assumptions are
satisfied) but the non-parametric one may be more robust (in case the assumptions are
not).

Ranked (non-numerical) data need the non-parametric approach.
