CS2 Booklet 8 (Time Series) 2019 FINAL
Subject CS2
Revision Notes
For the 2019 exams
Time series
Booklet 8
covering
CONTENTS
Contents Page
Copyright agreement
Legal action will be taken if these terms are infringed. In addition, we may
seek to take disciplinary action through the profession or through your
employer.
These conditions remain in force after you have finished using the course.
These chapter numbers refer to the 2019 edition of the ActEd Course Notes.
The numbering of the syllabus items is the same as that used by the Institute
and Faculty of Actuaries.
2.1.9 Show that certain univariate time series models have the
Markov property and describe how to rearrange a
univariate time series model as a multivariate Markov
model.
OVERVIEW
This booklet covers Syllabus objectives 2.1 and 2.2, which relate to time
series.
In this course, we look in detail at four important types of time series:
moving average (MA) processes
autoregressive (AR) processes
autoregressive moving average (ARMA) processes, and
autoregressive integrated moving average (ARIMA) processes.
We then go on to discuss how we can fit a time series model to a data set
using the Box-Jenkins methodology, and how to use a model to forecast
future values of a process.
There are many past exam questions (from Subject CT6) that ask for the
derivation of an autocorrelation function. These questions involve standard
algebra.
CORE READING
All of the Core Reading for the topics covered in this booklet is contained in
this section.
The text given in Arial Bold Italic font is additional Core Reading that is not
directly related to the topic being discussed.
____________
x(t_1), x(t_2), ..., x(t_n), ie as { x(t_i) : i = 1, 2, 3, ..., n }
x_1, x_2, ..., x_n, ie as { x_t : t = 1, 2, 3, ..., n }
____________
For example, a list of returns of the stocks in the FTSE 100 index on a
particular day is not a time series, and the order of records in the list is
irrelevant. At the same time, a list of values of the FTSE 100 index
taken at one-minute intervals on a particular day is a time series, and
the order of records in the list is of paramount importance.
[Figure: plot of the FTSE 100 index (values in hundreds, roughly 70 to 100) against observation index]
{ X_t : t = 1, 2, 3, ..., n }
(Note, however, that in the modern literature the term ‘time series’ is
often used to mean both the data and the process of which it is a
realisation.)
____________
γ_k = cov(X_t, X_{t+k}) = E(X_t X_{t+k}) − E(X_t)E(X_{t+k})
γ_0 = var(X_t)
____________
ρ_k = corr(X_t, X_{t+k}) = γ_k / γ_0
____________
E[ (X_t − φ_{k,1}X_{t−1} − φ_{k,2}X_{t−2} − ... − φ_{k,k}X_{t−k})² ]
Figure 13.1: ACF and PACF values of some stationary time series
model.
____________
φ_1 = ρ_1,    φ_2 = det(1  ρ_1; ρ_1  ρ_2) / det(1  ρ_1; ρ_1  1) = (ρ_2 − ρ_1²) / (1 − ρ_1²)
____________
Operators
(BX)_t = X_{t−1}
____________
(∇X)_t = X_t − X_{t−1}
____________
diff(x, lag=1, differences=1) for a single difference, ∇x
diff(x, lag=1, differences=3) for a third-order difference, ∇³x
diff(x, lag=12, differences=1)
for a simple seasonal difference with period 12, ∇_12 (see later).
____________
White noise
γ_k = cov(e_t, e_{t+k}) = σ²  if k = 0,  and 0 otherwise
____________
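As a quick illustration (not part of the Core Reading; the variable names are ours), the defining property above can be checked by simulating white noise in R and inspecting its sample ACF:
# Simulate Gaussian white noise with sigma = 2 and check its sample ACF;
# all autocorrelations at non-zero lags should be negligible.
set.seed(1)
e <- rnorm(500, mean = 0, sd = 2)
acf(e, main = "Sample ACF of simulated white noise")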
21 The main linear models used for modelling stationary time series are:
Autoregressive process (AR)
Moving average process (MA)
Autoregressive moving average process (ARMA).
____________
X_t = μ + α_1(X_{t−1} − μ) + α_2(X_{t−2} − μ) + ... + α_p(X_{t−p} − μ) + e_t
X_t = μ + e_t + β_1 e_{t−1} + ... + β_q e_{t−q}
25 The two basic processes (AR and MA) can be combined to give an
autoregressive moving average, or ARMA, process. The defining
equation of an ARMA( p, q ) process is:
X_t = μ + α_1(X_{t−1} − μ) + ... + α_p(X_{t−p} − μ) + e_t + β_1 e_{t−1} + ... + β_q e_{t−q}
AR (1) processes
X_t = μ + α(X_{t−1} − μ) + e_t    (13.1)
____________
X_t = μ + α^t (X_0 − μ) + Σ_{j=0}^{t−1} α^j e_{t−j}    (13.2)
____________
μ_t = μ + α^t (μ_0 − μ)
____________
var(X_t) = σ² (1 − α^{2t}) / (1 − α²) + α^{2t} var(X_0)
X_t = μ + Σ_{j=0}^{∞} α^j e_{t−j}    (13.3)
____________
Σ_{j=0}^{∞} α^{2j} σ² = σ² / (1 − α²)   if |α| < 1
____________
γ_k = cov(X_t, X_{t+k}) = Σ_{j=0}^{∞} Σ_{i=0}^{∞} α^i α^j cov(e_{t−j}, e_{t+k−i})
    = Σ_{j=0}^{∞} σ² α^{2j+k} = α^k γ_0
γ_k = cov(X_t, X_{t−k}) = cov(μ + α(X_{t−1} − μ) + e_t, X_{t−k})
    = α cov(X_{t−1}, X_{t−k})
    = α γ_{k−1}
implying that:
γ_k = α^k γ_0 = α^k σ² / (1 − α²)   for k ≥ 0
____________
35 So:
ρ_k = γ_k / γ_0 = α^k   for k ≥ 0
____________
φ_1 = ρ_1 = α    φ_2 = (α² − α²) / (1 − α²) = 0
The following lines in R generate the ACF and PACF functions for an
AR (1) model:
par(mfrow=c(1,2))
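A minimal sketch along these lines (illustrative values and code, assumed rather than taken from the Core Reading) simulates an AR(1) series with α = 0.7 and plots its sample ACF and PACF:
# Simulate an AR(1) series with alpha = 0.7 and plot its sample ACF and PACF
set.seed(42)
x <- arima.sim(n = 300, model = list(ar = 0.7))
acf(x,  main = "Sample ACF of AR(1)")
pacf(x, main = "Sample PACF of AR(1)")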
r_t = μ + α(r_{t−1} − μ) + e_t
One initial condition, the value for r0 , is required for the complete
specification of the model for the force of inflation rt .
____________
AR ( p) processes
X_t = μ + α_1(X_{t−1} − μ) + α_2(X_{t−2} − μ) + ... + α_p(X_{t−p} − μ) + e_t    (13.4)
____________
(1 − α_1 B − α_2 B² − ... − α_p B^p)(X_t − μ) = e_t    (13.5)
____________
As seen for AR(1), there are some restrictions on the values of the α_j which are permitted if the process is to be stationary. In particular, we have the following result.
____________
1 − α_1 z − α_2 z² − ... − α_p z^p = 0
Proof
γ_k = cov(X_t, X_{t−k}) = cov( Σ_{j=1}^{p} α_j X_{t−j} + e_t, X_{t−k} ) = Σ_{j=1}^{p} α_j γ_{k−j}
γ_k = Σ_{j=1}^{p} A_j z_j^{−k}
The converse of this result is also true (but the proof is not given here):
if the roots of the characteristic polynomial are all greater than 1 in
absolute value, then it is possible to construct a stationary process X
satisfying (13.4). In order for an arbitrary process X satisfying (13.4)
to be stationary, the variances and covariances of the initial values
X_0, X_{−1}, ..., X_{−p+1} must also be equal to the appropriate values.
Often exact values for the γ_k are required, entailing finding the values
of the constants A_j.
____________
γ_k = α_1 γ_{k−1} + α_2 γ_{k−2} + ... + α_p γ_{k−p} + σ² 1_{k=0}
γ_3 = α_1 γ_2 + α_2 γ_1 + α_3 γ_0
γ_2 = α_1 γ_1 + α_2 γ_0 + α_3 γ_1
γ_1 = α_1 γ_0 + α_2 γ_1 + α_3 γ_2
γ_0 = α_1 γ_1 + α_2 γ_2 + α_3 γ_3 + σ²
____________
φ_1 = ρ_1 and φ_2 = (ρ_2 − ρ_1²) / (1 − ρ_1²), as we have seen before.
____________
MA(1) processes
X_t = μ + e_t + β e_{t−1}
____________
μ_t = μ
____________
γ_0 = var(e_t + β e_{t−1}) = (1 + β²) σ²
γ_1 = cov(e_t + β e_{t−1}, e_{t−1} + β e_{t−2}) = β σ²
γ_k = 0   for k > 1
____________
ρ_0 = 1
ρ_1 = β / (1 + β²)
ρ_k = 0   for k > 1
____________
X − μ = (1 + βB) e    (13.6)
(1 + βB)^{−1} (X − μ) = e
X_t − μ − β(X_{t−1} − μ) + β²(X_{t−2} − μ) − β³(X_{t−3} − μ) + ... = e_t
Although more than one MA process may share a given ACF, at most
one of the processes will be invertible.
φ_k = (−1)^{k+1} (1 − β²) β^k / (1 − β^{2(k+1)})
____________
MA(q ) processes
X − μ = (1 + β_1 B + β_2 B² + ... + β_q B^q) e
____________
γ_k = Σ_{i=0}^{q} Σ_{j=0}^{q} β_i β_j E(e_{t−i} e_{t−j−k}) = σ² Σ_{i=0}^{q−k} β_i β_{i+k}
52 Although there may be many moving average processes with the same
ACF, at most one of them is invertible, since no two invertible
processes have the same autocorrelation function. Moving average
models fitted to data by statistical packages will always be invertible.
____________
ARMA processes
X_t = μ + α_1(X_{t−1} − μ) + ... + α_p(X_{t−p} − μ) + e_t + β_1 e_{t−1} + ... + β_q e_{t−q}
(1 − α_1 B − ... − α_p B^p)(X − μ) = (1 + β_1 B + ... + β_q B^q) e
____________
53 Neither the ACF nor the PACF of the ARMA process eventually
becomes equal to zero.
X_t = α X_{t−1} + e_t + β e_{t−1}    (13.7)
is given by:
ρ_1 = (1 + αβ)(α + β) / (1 + β² + 2αβ)
ρ_k = α^{k−1} ρ_1,   k = 2, 3, ...
Figure 13.1 shows the ACF and PACF values of such a process with α = 0.7 and β = 0.5.
____________
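A hedged R sketch (our own illustration, using the built-in ARMAacf function) that reproduces the theoretical ACF and PACF plotted in Figure 13.1 for α = 0.7 and β = 0.5:
# Theoretical ACF and PACF of an ARMA(1,1) with alpha = 0.7, beta = 0.5
acf_vals  <- ARMAacf(ar = 0.7, ma = 0.5, lag.max = 10)
pacf_vals <- ARMAacf(ar = 0.7, ma = 0.5, lag.max = 10, pacf = TRUE)
par(mfrow = c(1, 2))
plot(0:10, acf_vals,  type = "h", xlab = "Lag", ylab = "ACF")
plot(1:10, pacf_vals, type = "h", xlab = "Lag", ylab = "PACF")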
cov(X_t, e_{t−1}) = α cov(X_{t−1}, e_{t−1}) + cov(e_t, e_{t−1}) + β cov(e_{t−1}, e_{t−1})
                 = (α + β) σ²
cov(X_t, X_{t−1}) = α cov(X_{t−1}, X_{t−1}) + cov(e_t, X_{t−1}) + β cov(e_{t−1}, X_{t−1})
So:
γ_0 = α γ_1 + (1 + αβ + β²) σ²
γ_1 = α γ_0 + β σ²
γ_k = α γ_{k−1}
γ_0 = (1 + 2αβ + β²) σ² / (1 − α²)
γ_1 = (α + β)(1 + αβ) σ² / (1 − α²)
γ_k = α^{k−1} γ_1,   k = 2, 3, ...
ARIMA processes
57 Example 1
X_t = X_{t−1} + e_t
X_t = X_0 + Σ_{j=1}^{t} e_j
Y_t = ∇X_t = e_t
Example 2
Z_t = Z_{t−1} exp(μ + e_t)
Y_t = μ + Y_{t−1} + e_t
Example 3
Markov property
P[X_t ∈ A | X_{s_1} = x_1, X_{s_2} = x_2, ..., X_{s_n} = x_n, X_s = x] = P[X_t ∈ A | X_s = x]
for all times s_1 < s_2 < ... < s < t, all states x_1, x_2, ..., x_n, x in S and all subsets A of S.
____________
but we may define a vector-valued process Y_t = (X_t, X_{t−1}, ..., X_{t−p+1})^T which does.
____________
Y_t = (X_t, X_{t−1}, ..., X_{t−p−d+1})^T which does.
____________
(X_n, X_{n−1}, ..., X_{n−q+1})^T: any finite collection will never be enough to
deduce the value of e_n, on which the distribution of X_{n+1} depends.
Since a moving average has been shown to be equivalent to an
autoregression of infinite order, and since a p th order autoregression
needs to be expressed as a p -dimensional vector in order to possess
the Markov property, a moving average has no similar
finite-dimensional Markov representation.
____________
All the methods that we shall investigate apply only to a time series
which gives the appearance of stationarity. In this section, therefore,
we deal with possible sources of non-stationarity and how to
compensate for them.
ts.plot(x)
Seasonal variation:
A company which sells greetings cards will find that sales in some months of the year are much higher than in others.
Plotting the series will highlight any obvious trends in the mean and
will show up any cyclic variation, which could also form evidence of
non-stationarity. This should always be the first step in any practical
time series analysis.
____________
ts.plot(log(FTSE100$Close))
points(log(FTSE100$Close),cex=.4)
generates Figure 14.1, which shows the time series of the logs of 300
successive closing values of FTSE100 index.
The corresponding sample ACF and sample PACF are produced using:
par(mfrow=c(1,2))
acf(log(FTSE100$Close))
pacf(log(FTSE100$Close))
Figure 14.2: Sample ACF and sample PACF of the log(FTSE100) data;
dotted lines indicate cut-offs for significance if data came from some
white noise process.
____________
66 If the sample ACF decreases slowly but steadily from a value near 1,
we would conclude that the data need to be differenced before fitting
the model.
____________
Figure 14.2 shows the sample ACF of a time series which is clearly
non-stationary as the values decrease in some linear fashion;
differencing is therefore required before fitting a stationary model.
See, for example, the change of ACF and PACF for the differenced
data:
x_t = a + bt + y_t
∇x_t = b + ∇y_t
x_t = μ + θ_t + y_t    (14.1)
is a stationary process.
____________
Figure 14.4: Data plot, sample ACF and PACF of temperature data.
y_t = (1/(2h)) ( ½ x_{t−h} + x_{t−h+1} + ... + x_{t−1} + x_t + ... + x_{t+h−1} + ½ x_{t+h} )
This ensures that each period makes an equal contribution to y_t.
The same can be done with odd periods d = 2h + 1, but the end terms x_{t−h} and x_{t+h} do not need to be halved.
____________
For example, when fitting the model in Equation 14.1 to a monthly time series x extending over 10 years from January 1990, the estimate for μ is x̄ and the estimate for θ_January is:
θ̂_January = (1/10)(x_1 + x_13 + x_25 + ... + x_109) − μ̂
____________
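A minimal R sketch of the method of seasonal means (our own illustration, assuming a monthly series x of length 120 that starts in January):
# Estimate mu and the twelve monthly seasonal terms by the method of seasonal means
mu.hat    <- mean(x)
theta.hat <- rowMeans(matrix(x, nrow = 12)) - mu.hat  # row 1 = Januaries, row 2 = Februaries, ...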
ts.plot(manston1$tmax,ylab="",main="Max
temperatures")
points(manston1$tmax,cex=0.4)
The moving average can be added (in red) using the code:
lines(as.vector(decomp$trend),col="red")
The sum of seasonal and moving average trends can be added (in blue)
as follows:
lines(as.vector(decomp$seasonal+decomp$trend),
col="blue")
A further caution when using transformed data involves the final step
of turning forecasts for the transformed process into forecasts for the
original process, as some transformations introduce a systematic bias.
____________
μ̂ = (1/n) Σ_{t=1}^{n} x_t
____________
γ̂_k = (1/n) Σ_{t=k+1}^{n} (x_t − μ̂)(x_{t−k} − μ̂)
____________
r_k = γ̂_k / γ̂_0
As we have seen before, R functions acf and pacf can be used for
generating these values.
set.seed(123)
x=arima.sim(n=300,model=list(ar=0.7,ma=0.5))
Then:
par(mfrow=c(1,2))
acf(x,main="Sample ACF")
pacf(x,main="Sample PACF")
Figure 14.7: ACF and PACF of some simulated data from ARMA(1,1) .
____________
Clearly the SACF and SPACF of a white noise process are random,
being simple functions of the observations. In particular, even if the
original process was a perfectly standard white noise the SACF and
SPACF would not be identically zero. The question is what scale of
deviation from zero is to be expected.
____________
X_t = μ + e_t
80 A ‘portmanteau’ test is due to Ljung and Box, who state that, if the
white noise model is correct, then:
n(n + 2) Σ_{k=1}^{m} r_k² / (n − k) ~ χ²_m
for each m.
____________
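In R the Ljung-Box statistic is available through Box.test; a hedged sketch (the residual series, lag and fitdf values are illustrative):
# Ljung-Box test on the residuals of a fitted model, eg fit <- arima(x, order = c(1, 0, 1))
res <- residuals(fit)
Box.test(res, lag = 10, type = "Ljung-Box", fitdf = 2)  # fitdf = p + q for an ARMA(1,1)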
Identification of MA(q )
81 If the data really do come from a MA(q) model, the estimators r_k for k > q will be roughly normally distributed with mean 0 and variance
(1/n) (1 + 2 Σ_{k=1}^{q} ρ_k²).
____________
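A short sketch (our own illustration) of how this bound might be computed from a sample ACF when assessing a candidate MA(q):
# Approximate 95% cut-off for r_k, k > q, under a candidate MA(q) model
r     <- acf(x, plot = FALSE)$acf[-1]   # sample ACF at lags 1, 2, ...
q     <- 1                              # candidate order (illustrative)
bound <- 1.96 * sqrt((1 + 2 * sum(r[1:q]^2)) / length(x))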
Identification of AR ( p )
Identifying p , d and q
Suppose now that the appropriate value for the parameter d has been found, and the time series { z_{d+1}, z_{d+2}, ..., z_n } is adequately stationary. (Notice that a differenced series has one fewer observation than the original series.) We shall assume throughout this section that the sample mean of the z sequence is zero; if this is not the case, obtain a new sequence by subtracting μ̂ = z̄ from each value in the sequence. We shall also assume, for the sake of simplicity in setting down the lower and upper limits of sums, that d = 0.
AIC(model) = log(σ̂²) + 2 × (number of parameters) / n
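As an illustration (assumed code, not Core Reading), candidate orders can be compared using the AIC reported by arima():
# Compare AIC across a few candidate models for a series x; smaller is better
fit1 <- arima(x, order = c(1, 0, 0))
fit2 <- arima(x, order = c(0, 0, 1))
fit3 <- arima(x, order = c(1, 0, 1))
c(AR1 = fit1$aic, MA1 = fit2$aic, ARMA11 = fit3$aic)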
Parameter estimation
Z_t = α_1 Z_{t−1} + ... + α_p Z_{t−p} + e_t + β_1 e_{t−1} + ... + β_q e_{t−q}
e_t = Z_t − α_1 Z_{t−1} − ... − α_p Z_{t−p}
Σ_{t=p+1}^{n} (z_t − α_1 z_{t−1} − ... − α_p z_{t−p})²
____________
e_t = z_t − α_1 z_{t−1} − β_1 e_{t−1}
93 First assume they are all equal to zero and estimate the α_i and β_j on
that basis, then use standard forecasting techniques on the
time-reversed process { z_n, ..., z_1 } to obtain predicted values for
(e_0, ..., e_{q−1}), a method known as backforecasting. These new values
can be used as the starting point for another application of the
estimation procedure; this continues until the estimates have
converged.
____________
In Figure 14.7, the ACF and PACF plots show some significant spikes at the early lags, suggesting the presence of both autoregressive and moving average components.
The code:
fit=arima(x,order=c(1,0,1));fit
σ̂² = (1/n) Σ_{t=p+1}^{n} ê_t²
   = (1/n) Σ_{t=p+1}^{n} (z_t − α̂_1 z_{t−1} − ... − α̂_p z_{t−p} − β̂_1 ê_{t−1} − ... − β̂_q ê_{t−q})²
Diagnostic checks
The behaviour of the sample ACF and sample PACF of a white noise
sequence have already been described.
____________
[ (2/3)(n − 2) − 1.96 √((16n − 29)/90),  (2/3)(n − 2) + 1.96 √((16n − 29)/90) ]
____________
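A hedged R sketch (illustrative names) of the turning points check on a residual series:
# Count turning points of the residuals and compare with the 95% interval above
res <- residuals(fit)
d   <- diff(res)
tp  <- sum(d[-length(d)] * d[-1] < 0)   # a turning point is a sign change in successive differences
n   <- length(res)
c(turning.points = tp,
  lower = 2/3 * (n - 2) - 1.96 * sqrt((16 * n - 29) / 90),
  upper = 2/3 * (n - 2) + 1.96 * sqrt((16 * n - 29) / 90))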
The command:
tsdiag(fit)
where the last plot shows a sequence of p-values from the Ljung-Box test; high p-values suggest a good fit, ie residuals close to white noise.
Forecasting
X_{n+k} = μ + α_1(X_{n+k−1} − μ) + ... + α_p(X_{n+k−p} − μ) + e_{n+k} + β_1 e_{n+k−1} + ... + β_q e_{n+k−q}
102 The one-step ahead and two-step ahead forecasts for an AR (2) are
given by:
x̂_n(1) = μ̂ + α̂_1(x_n − μ̂) + α̂_2(x_{n−1} − μ̂)
x̂_n(2) = μ̂ + α̂_1(x̂_n(1) − μ̂) + α̂_2(x_n − μ̂)
____________
predict(fit,n.ahead=3)
Exponential smoothing
x̂_n(1) = α ( x_n + (1 − α) x_{n−1} + (1 − α)² x_{n−2} + ... )
____________
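Simple exponential smoothing can be carried out in R with HoltWinters by switching off the trend and seasonal components; a sketch (the smoothing parameter 0.3 is illustrative):
# Simple exponential smoothing of a series x with smoothing parameter alpha = 0.3
es <- HoltWinters(x, alpha = 0.3, beta = FALSE, gamma = FALSE)
predict(es, n.ahead = 1)   # one-step ahead forecast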
107 The method lends itself easily to regular updating: it is easy to see that:
x̂_{n+1}(1) = α x_{n+1} + (1 − α) x̂_n(1)
108 This technique works for stationary series, but clearly cannot be
applied to series exhibiting a trend or seasonal variation.
____________
X_t^{(1)}, ..., X_t^{(m)}.
____________
113 In the stationary case the notation μ will be used to represent the common mean vector, and Σ_k the covariance matrix cov(X_t, X_{t+k}).
X_t = μ + Σ_{j=1}^{p} A_j (X_{t−j} − μ) + e_t    (14.2)
cov(e_t^{(i)}, e_s^{(j)}) = 0 for s ≠ t.
____________
( i_t − μ_i )   ( a_11   0   ) ( i_{t−1} − μ_i )   ( e_t^{(i)} )
( I_t − μ_I ) = ( a_21  a_22 ) ( I_{t−1} − μ_I ) + ( e_t^{(I)} )
____________
117 The theory and analysis of a VAR (1) closely parallels that of a
univariate AR (1) . Iterating from equation (14.2) in the case p = 1 , it is
clear that:
X_t = μ + Σ_{j=0}^{t−1} A^j e_{t−j} + A^t (X_0 − μ)
____________
118 In order that X should represent a stationary time series, the powers
of A should converge to zero in some sense. The appropriate
requirement is that all eigenvalues of the matrix A should be less than
1 in absolute value.
____________
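A hedged sketch of this check in R for an assumed 2x2 coefficient matrix A (the values are illustrative):
# VAR(1) stationarity check: all eigenvalues of A must lie inside the unit circle
A <- matrix(c(0.5, 0.2,
              0.1, 0.4), nrow = 2, byrow = TRUE)
all(abs(eigen(A)$values) < 1)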
(a_11 − λ)(a_22 − λ) − a_12 a_21 = 0
C_t = α Y_{t−1} + e_t^{(1)}
I_t = β (C_{t−1} − C_{t−2}) + e_t^{(2)}
where e^{(2)} is another zero-mean white noise. Finally, any part of the
national income is either consumed or invested; therefore:
Y_t = C_t + I_t
____________
C_t = α C_{t−1} + α I_{t−1} + e_t^{(1)}
I_t = β (C_{t−1} − C_{t−2}) + e_t^{(2)}
( C_t )   ( α   α ) ( C_{t−1} )   (  0   0 ) ( C_{t−2} )   ( e_t^{(1)} )
( I_t ) = ( β   0 ) ( I_{t−1} ) + ( −β   0 ) ( I_{t−2} ) + ( e_t^{(2)} )
____________
122 Two time series processes X and Y are called cointegrated if:
ln X_t = ln(P_t / Q_t) + Y_t
Y_t = μ + α(Y_{t−1} − μ) + e_t + β e_{t−1}
where e^{(1)} and e^{(2)} are zero-mean white noises, possibly correlated.
124 The general class of bilinear models can be exemplified by its simplest
representative, the random process X defined by the relation:
X_n + a(X_{n−1} − μ) = μ + e_n + β e_{n−1} + b(X_{n−1} − μ) e_{n−1}
125 The main qualitative difference between the bilinear model and models
from the ARMA class is that many bilinear models exhibit ‘bursty’
behaviour: when the process is far from its mean it tends to exhibit
larger fluctuations.
X_n = μ + α_1(X_{n−1} − μ) + e_n,  if X_{n−1} ≤ d
X_n = μ + α_2(X_{n−1} − μ) + e_n,  if X_{n−1} > d
____________
X_t = μ + α_t(X_{t−1} − μ) + e_t
129 The behaviour of these processes can vary widely, depending on the
distribution chosen for the a t , but is in general more irregular than
that of the corresponding AR (1) .
____________
X_t = μ + e_t √( α_0 + Σ_{k=1}^{p} α_k (X_{t−k} − μ)² )
132 The simplest representative of the ARCH(p) class is the ARCH(1) model defined by the relation:
X_t = μ + e_t √( α_0 + α_1 (X_{t−1} − μ)² )
____________
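A hedged R sketch (illustrative parameter values) simulating a path from this ARCH(1) model:
# Simulate X_t = mu + e_t * sqrt(a0 + a1 * (X_{t-1} - mu)^2)
set.seed(7)
n  <- 500; mu <- 0; a0 <- 0.2; a1 <- 0.6
x  <- numeric(n); x[1] <- mu
e  <- rnorm(n)
for (t in 2:n) x[t] <- mu + e[t] * sqrt(a0 + a1 * (x[t-1] - mu)^2)
ts.plot(x)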
133 As may be seen from the ARCH (1) model, a significant deviation of
X t - 1 from the mean m gives rise to an increase in the conditional
variance of X t given X t - 1 .
____________
134 The ARCH models have been used for modelling financial time series.
If Z_t is the price of an asset at the end of the t-th trading day, it is found that the ARCH model can be used to model X_t = ln(Z_t / Z_{t−1}), interpreted as the daily return on day t.
This section contains all the relevant exam questions from 2008 to 2017 that
are related to the topics covered in this booklet.
Solutions are given after the questions. These give enough information for
you to check your answer, including working, and also show you what an
outline examination answer should look like. Further information may be
available in the Examiners’ Report, ASET or Course Notes. (ASET can be
ordered from ActEd.)
We first provide you with a cross-reference grid that indicates the main
subject areas of each exam question. You can use this, if you wish, to
select the questions that relate just to those aspects of the topic that you
may be particularly interested in reviewing.
Alternatively, you can choose to ignore the grid, and attempt each question
without having any clues as to its content.
Cross-reference grid
[Table: exam questions against topic areas: MA processes, AR processes, ARMA processes, ARIMA processes, VAR processes, autocovariance / autocorrelation function, partial ACF, non-stationary series, parameter estimation, forecasting, testing / choosing a model, trends and cycles, ARCH models]
Lag ACF
1 0.73
2 0.14
3 0.37
4 0.59
5 0.24
6 0.12
7 0.07
(iii) Explain whether these results confirm the initial belief that the model
could be appropriate for these data. [3]
[Total 10]
(ii) Explain why weakly stationary multivariate normal processes are also
strictly stationary. [1]
(iii) Show that the following bivariate time series process, (X_n, Y_n)^T, is
weakly stationary:
where e_n^x and e_n^y are two independent white noise processes. [5]
is stationary. [6]
[Total 14]
X_t = μ + e_t √(α_0 + α_1 (X_{t−1} − μ)²)
Y_t = a_1 Y_{t−1} + e_t
where e_t is a white noise error term with mean zero and variance σ².
Y_t = a_1 Y_{t−1} + a_2 Y_{t−2} + e_t
where e_t is a white noise error term with mean zero and variance σ².
(iv) List two statistical tests that you should apply to the residuals after fitting
a model to time series data. [2]
[Total 15]
(i) Show that the new series X t = a + bt + Yt where a and b are fixed
non-zero constants, is not stationary. [2]
(iii) Show that if Yt is a moving average process of order 1, then the series
DX t is not invertible and has variance larger than that of Yt . [6]
[Total 15]
Y_t = 2α Y_{t−1} + Z_t,   |α| < 0.5
Express Y_t in the form Y_t = Σ_{j=0}^{∞} a_j Z_{t−j} and hence or otherwise find an
The following data is observed from n = 500 realisations from a time series:
Σ_{i=1}^{n} x_i = 13,153.32,   Σ_{i=1}^{n} (x_i − x̄)² = 3,153.67   and
Σ_{i=1}^{n−1} (x_i − x̄)(x_{i+1} − x̄) = 2,176.03
(i) Estimate, using the data above, the parameters m , a1 and s from the
model:
X_t − μ = a_1(X_{t−1} − μ) + e_t
(ii) After fitting the model with the parameters found in (i), it was calculated
that the number of turning points of the residuals series eˆt is 280.
The following two models have been suggested for representing some
quarterly data with underlying seasonality.
Model 1 Yt = aYt - 4 + et
Model 2 Yt = b et - 4 + et
(ii) State the features of the sample autocorrelation that would lead you to
prefer Model 1. [1]
[Total 5]
Observations y_1, y_2, ..., y_n are made from a random walk process given by:
Y_t = a + Y_{t−1} + e_t
(i) Derive expressions for E (Yt ) and var (Yt ) and explain why the process
is not stationary. [3]
(ii) Show that g t ,s = cov (Yt ,Yt - s ) for s < t is linear in s . [2]
(iii) Explain how you would use the observed data to estimate the
parameters a and s . [3]
(iv) Derive expressions for the one-step and two-step forecasts for Yn +1
and Yn + 2 . [2]
[Total 10]
Yt = 2a Yt -1 - a 2Yt - 2 + et
(i) Determine the values of a for which the process is stationary. [2]
(ii) Derive the auto-covariances g 0 and g 1 for this process and find a
general recursive expression for g k for k ≥ 2 . [10]
(iii) Show that the auto-covariance function can be written in the form:
γ_k = Aα^k + kBα^k
(iii) Calculate E (Yt ) and find the auto-covariance function for Yt . [6]
(1 − αB)³ X_t = e_t
where B is the backwards shift operator and e_t is a white noise process with variance σ².
(1 − B³)(1 − (α + β)B + αβB²) X_t = e_t
where B is the backward shift operator and e_t is a white noise process with variance σ².
(i) Show that for a suitable choice of s the seasonal difference series
Yt = X t - X t - s is stationary for a range of values of a and b , which
you should specify. [3]
(iii) Forecast the next two observations x̂101 and x̂102 based on the
parameters estimated in part (ii) and the observed values x1, x2 ,..., x100
of X t . [4]
[Total 14]
X_t = α X_{t−1} + e_t
L ∝ ∏_{i=1}^{n} (1/(√(2π) σ)) e^{−(x_i − α x_{i−1})²/(2σ²)}.    [3]
(ii) Show that the maximum likelihood estimate of a can also be regarded
as a least squares estimate. [2]
(iv) Derive the Yule-Walker equations for the model and hence derive
estimates of a and s 2 based on observed values of the
autocovariance function. [5]
(i) State the three main stages in the Box-Jenkins approach to fitting an
ARIMA time series model. [3]
(ii) Explain, with reasons, which ARIMA time series would fit the observed
data in the charts below. [2]
ACF PACF
X t = a1X t -1 + a 2 X t - 2 + b1et -1 + et
(iv) Explain whether the partial auto-correlation function for this model can
ever give a zero value. [2]
[Total 13]
A sequence of 100 observations was made from a time series and the
following values of the sample auto-covariance function (SACF) were
observed:
Lag SACF
1 0.68
2 0.55
3 0.30
4 0.06
The sample mean and variance of the same observations are 1.35 and 0.9
respectively.
(i) Calculate the first two values of the partial correlation function φ̂_1 and φ̂_2. [1]
(a) Yt = a0 + a1Yt -1 + et
(iv) Explain whether each of the models in part (ii) satisfies the Markov
property. [2]
[Total 17]
(i) List the main steps in the Box-Jenkins approach to fitting an ARIMA time
series to observed data. [3]
Observations x1, x2, , x200 are made from a stationary time series and
the following summary statistics are calculated:
Σ_{i=3}^{200} (x_i − x̄)(x_{i−2} − x̄) = 17.1,
(ii) Calculate the values of the sample auto-covariances γ̂_0, γ̂_1 and γ̂_2. [3]
(iii) Calculate the first two values of the partial correlation function φ̂_1 and φ̂_2.
X_t − μ = a_1(X_{t−1} − μ) + e_t
After fitting the model in part (iv) the 200 observed residual values eˆt were
calculated. The number of turning points in the residual series was 110.
(v) Carry out a statistical test at the 95% significance level to test the
hypothesis that eˆt is generated from a white noise process. [4]
[Total 18]
The following time series model is being used to model monthly data:
(i) Perform two differencing transformations and show that the result is a
moving average process which you may assume to be stationary. [3]
(iii) Derive the auto-correlation function of the model generated in part (i). [8]
[Total 12]
X t 0.5 X t 1 Yt t1
Yt 0.5Yt 1 X t t2
X X 1
M t N t 1 t
Yt Yt 1 t2
(ii) Show that the following set of equations represents a VAR( p ) (vector
auto regressive) process, by specifying the order and the relevant
parameters:
X t X t 1 Yt 1 t1
Yt X t 1 X t 2 t2
[3]
[Total 12]
Y_t = 1 + 0.6Y_{t−1} + 0.16Y_{t−2} + e_t
(1 − B^12)(1 − (α + β)B + αβB²) X_t = e_t
(ii) Determine the range of values for α and β for which the process will
be stationary after applying this seasonal difference. [3]
Assume that after the appropriate seasonal differencing the following sample
autocorrelation values for observations of Y_t are ρ̂_1 = 0 and ρ̂_2 = 0.09.
(iv) Derive the forecasts x̂_{T+1} and x̂_{T+2} for the next two observations, as a function of the existing observations. [4]
[Total 13]
Yt = m + a Yt -1 + e t
(i) State two approaches for estimating the parameters in Model A. [2]
Mary, an actuarial student, wishes to revise Model A such that the error
terms e t no longer follow a normal distribution.
(ii) Explain which of the approaches in part (i) she should now use for
parameter estimation. [2]
Mary has now constructed Model B. She has done this by multiplying both
sides of the equation above by (1 - cB) , where B is the backshift operator,
so that Model B follows the equation:
(1 - cB)Yt = (1 - cB)(a Yt -1 + e t )
Let ΔX_t = X_t − X_{t−1}.
(iv) Set out an equation for ΔX_t in terms of b, β, e_t and L, the lag operator. [1]
The solutions presented here are just outline solutions for you to use to
check your answers. See ASET for full solutions.
(i) Invertibility
We have roots:
λ = −1/β_1 and the fourth roots of −1/β_4
The time series is invertible if the roots are all greater than 1 in magnitude:
|−1/β_1| > 1  ⇒  |β_1| < 1  (and similarly |β_4| < 1)
(ii) ACF
γ_0 = cov(Y_t, Y_t)
    = cov(e_t + β_1e_{t−1} + β_4e_{t−4} + β_1β_4e_{t−5}, e_t + β_1e_{t−1} + β_4e_{t−4} + β_1β_4e_{t−5})
    = σ²(1 + β_1²)(1 + β_4²)
Similarly:
γ_1 = cov(Y_t, Y_{t−1}) = σ²β_1(1 + β_4²)
γ_2 = cov(Y_t, Y_{t−2}) = 0
γ_3 = cov(Y_t, Y_{t−3}) = β_1β_4σ²
γ_4 = cov(Y_t, Y_{t−4}) = β_4σ² + β_1²β_4σ² = σ²β_4(1 + β_1²)
γ_5 = cov(Y_t, Y_{t−5}) = β_1β_4σ²
γ_k = 0   for k > 5
Hence:
ρ_0 = γ_0/γ_0 = 1
ρ_{±1} = γ_1/γ_0 = σ²β_1(1 + β_4²) / [σ²(1 + β_1²)(1 + β_4²)] = β_1/(1 + β_1²)
ρ_{±2} = 0
ρ_{±3} = γ_3/γ_0 = β_1β_4 / [(1 + β_1²)(1 + β_4²)]
ρ_{±4} = γ_4/γ_0 = σ²β_4(1 + β_1²) / [σ²(1 + β_1²)(1 + β_4²)] = β_4/(1 + β_4²)
ρ_{±5} = γ_5/γ_0 = β_1β_4 / [(1 + β_1²)(1 + β_4²)]
ρ_{±k} = 0   for |k| > 5
So it appears that the sample ACFs are not consistent with the theoretical
ACFs.
E ( X t ) is constant
det( A - l I) = 0
For a 2×2 matrix M = (a  b; c  d), the determinant is det M = ad − bc. Hence:
l 2 - 1.3l + 0.37 = 0
Since both of these roots are less than 1 in magnitude, the multivariate
process is stationary.
Hence:
((1.3 + 2c) + √0.21)/2 < 1  ⇒  0.879 + c < 1  ⇒  c < 0.121
((1.3 + 2c) − √0.21)/2 < 1  ⇒  0.421 + c < 1  ⇒  c < 0.579
E(X_t) = μ for all t, and:
E(X_t X_{t−s}) = E[(μ + e_t √(α_0 + α_1(X_{t−1} − μ)²)) X_{t−s}]
             = μ E(X_{t−s}) + E[e_t √(α_0 + α_1(X_{t−1} − μ)²) X_{t−s}]
             = μ² + E(e_t) E[√(α_0 + α_1(X_{t−1} − μ)²) X_{t−s}]
             = μ² + 0
             = μ²
So:
E(X_t X_{t−s}) = μ² = E(X_t) E(X_{t−s})
Let Y_t = X_t − μ, so that:
Y_t = e_t √(α_0 + α_1 Y_{t−1}²)
Squaring this:
Y_t² = e_t² (α_0 + α_1 Y_{t−1}²)
We can use repeated substitution to get:
Y_t² = e_t² α_0 + e_t² α_1 Y_{t−1}²
    = e_t² α_0 + e_t² α_1 e_{t−1}² (α_0 + α_1 Y_{t−2}²)
    = α_0 e_t² + α_0 α_1 e_t² e_{t−1}² + α_0 α_1² e_t² e_{t−1}² e_{t−2}² + ... + α_1^s e_t² e_{t−1}² ⋯ e_{t−s+1}² Y_{t−s}²
so Y_t² is a function of Y_{t−s}² (and of e_t, ..., e_{t−s+1}). Then, for example:
P(Y_t² < 1 | Y_{t−s}² = 1,000,000) < P(Y_t² < 1 | Y_{t−s}² = 1)
So Y_t² is not independent of Y_{t−s}², which implies that Y_t is not independent of Y_{t−s} and hence that X_t is not independent of X_{t−s}.
From the figures given it looks like the ACF is decaying slowly and the PACF
is cutting off after lag 2. This is a characteristic of an AR (2) model.
As a starter step:
cov(Y_t, e_t) = cov(a_1Y_{t−1} + e_t, e_t) = σ²
Since we are told in the question that the sample ACF with lag 1 is 0.854, this is our estimate of ρ_1. So we have:
0.854 = â_1
Since we are told in the question that the sample ACF with lag 1 is 0.854 (our estimate of ρ_1) and the sample variance is 1.253 (our estimate of γ_0), we have:
γ̂_0 = 1.253
0.854 = γ̂_1/γ̂_0  ⇒  γ̂_1 = 0.854 × 1.253
From γ_0 = a_1γ_1 + σ²:
From this:
ρ_2 = (a_1² + (1 − a_2)a_2) / (1 − a_2)
Since we are told in the question that the sample ACF with lag 1 is 0.854 (our estimate for ρ_1) and that the sample ACF with lag 2 is 0.820 (our estimate of ρ_2), we have:
0.854 = â_1 / (1 − â_2)
0.820 = (â_1² + (1 − â_2)â_2) / (1 − â_2)
By substituting this back into the first equation above, we get â_1 = 0.568.
Since we are told in the question that the sample ACF with lag 1 is 0.854, the sample ACF with lag 2 is 0.820 and the sample variance is 1.253, we have:
γ̂_0 = 1.253
0.854 = γ̂_1/γ̂_0  ⇒  γ̂_1 = 0.854 × 1.253
0.820 = γ̂_2/γ̂_0  ⇒  γ̂_2 = 0.820 × 1.253
(iv) Tests
The appropriate tests are the Portmanteau (Ljung and Box) and Turning
Points tests.
We have:
E ( X t ) = a + bt + E (Yt )
Since Yt is stationary E (Yt ) does not depend upon time. However since
E ( X t ) contains t it depends on time and so is not constant. Hence X t is
not stationary.
(ii) Autocovariance
DX t = X t - X t -1 = a + bt + Yt - a - b(t - 1) - Yt -1 = b + Yt - Yt -1
This means that the autocovariance function with lag s depends upon the
lag only.
These are the two conditions required for the time series to be stationary.
DX t = b + et + b et -1 - et -1 - b et - 2
= b + et + ( b - 1)et -1 - b et - 2
Here:
1 + (β − 1)λ − βλ² = 0
⇒ λ = (1 − β ± √((β − 1)² + 4β)) / (−2β) = (1 − β ± √(β² + 2β + 1)) / (−2β) = (1 − β ± (β + 1)) / (−2β)
  = −1/β or 1
Since the roots of the characteristic equation are not all strictly greater than
1 in magnitude, the process is not invertible.
Also:
Y_t = 2αY_{t−1} + Z_t
    = 2α(2αY_{t−2} + Z_{t−1}) + Z_t = 4α²Y_{t−2} + 2αZ_{t−1} + Z_t
    = 4α²(2αY_{t−3} + Z_{t−2}) + 2αZ_{t−1} + Z_t = 8α³Y_{t−3} + 4α²Z_{t−2} + 2αZ_{t−1} + Z_t
    = Σ_{j=0}^{∞} (2α)^j Z_{t−j}
var(Y_t) = var(Σ_{j=0}^{∞} (2α)^j Z_{t−j}) = Σ_{j=0}^{∞} (2α)^{2j} var(Z_{t−j}) = Σ_{j=0}^{∞} (2α)^{2j} σ²
         = σ²/(1 − (2α)²) = σ²/(1 − 4α²)
var(X_t − μ) = var(a_1(X_{t−1} − μ) + e_t)
We estimate the variance of the residuals from the data given in the question:
3,153.67/500 = 0.690² × 3,153.67/500 + σ²  ⇒  σ² = 3.304
So:
s = 1.818
μ̂ = 13,153.32 / 500 = 26.31
E(T) = (2/3)(498) = 332
var(T) = (16 × 500 − 29)/90 = 88.567
Under H0 , this should come from the standard normal distribution. Since
-5.47 < -1.96 we have very strong evidence to reject H0 . This suggests
that the residuals are not consistent with white noise.
Model 1: Yt = a Yt - 4 + et
fi g 2 = 0 since a π 0 .
3 1 0 since a π 0 .
g 5 = ag 1 = 0
g 6 = ag 2 = 0
g 7 = ag 3 = 0
g 8 = ag 4 = a 2g 0 etc
Hence:
γ_k = α^{k/4} γ_0 if k = 4, 8, 12, 16, ...;  0 otherwise
ρ_k = 1 if k = 0;  α^{k/4} if k = 4, 8, 12, 16, ...;  0 otherwise
Model 2: Yt = b et - 4 + et
We have:
γ_0 = β²σ² + σ² = (β² + 1)σ²
γ_1 = cov(Y_t, Y_{t−1}) = cov(βe_{t−4} + e_t, βe_{t−5} + e_{t−1}) = 0
γ_4 = cov(Y_t, Y_{t−4}) = cov(βe_{t−4} + e_t, βe_{t−8} + e_{t−4}) = βσ²
γ_k = 0 for k ≠ 0, 4
So:
ρ_k = 1 if k = 0;  β/(β² + 1) if k = 4;  0 otherwise
ρ_k → 0 as k → ∞ for Model 1
Y_t = a + Y_{t−1} + e_t
    = a + (a + Y_{t−2} + e_{t−1}) + e_t
    = 2a + Y_{t−2} + e_{t−1} + e_t
    = 2a + (a + Y_{t−3} + e_{t−2}) + e_{t−1} + e_t
    = 3a + Y_{t−3} + e_{t−2} + e_{t−1} + e_t
    ...
    = ta + Y_0 + e_1 + e_2 + ... + e_{t−1} + e_t = ta + Σ_{j=1}^{t} e_j   (taking Y_0 = 0)
So:
E(Y_t) = E(ta + Σ_{j=1}^{t} e_j) = ta + Σ_{j=1}^{t} E(e_j) = ta
var(Y_t) = var(ta + Σ_{j=1}^{t} e_j) = Σ_{j=1}^{t} var(e_j) = tσ²
We have:
γ_{t,s} = cov(Y_t, Y_{t−s}) = cov(ta + Σ_{j=1}^{t} e_j, (t − s)a + Σ_{j=1}^{t−s} e_j) = cov(Σ_{j=1}^{t} e_j, Σ_{j=1}^{t−s} e_j)
        = cov(e_1 + e_2 + ... + e_{t−1} + e_t, e_1 + e_2 + ... + e_{t−s−1} + e_{t−s})
        = var(e_1) + var(e_2) + ... + var(e_{t−s})   since the e_i terms are independent
        = (t − s)σ²
∇Y_t = Y_t − Y_{t−1} = a + e_t
ŷ_n(1) = â + y_n + E(e_{n+1}) = â + y_n
Y_t − 2αY_{t−1} + α²Y_{t−2} = e_t
1 − 2αλ + α²λ² = 0
λ = (2α ± √(4α² − 4α²)) / (2α²) = 1/α
|1/α| > 1  ⇒  |α| < 1
We have:
Hence:
g 0 = 2ag 1 - a 2g 2 + s 2 (1)
Similarly, we get:
g 3 = cov(Yt ,Yt - 3 )
= 2ag 2 - a 2g 1
γ_1(1 + α²) = 2αγ_0  ⇒  γ_1 = 2α/(1 + α²) γ_0    (5)
γ_2 = 2α(2α/(1 + α²) γ_0) − α²γ_0 = 4α²/(1 + α²) γ_0 − α²γ_0 = (3α² − α⁴)/(1 + α²) γ_0    (6)
We can now substitute (5) and (6) into the lag 0 covariance equation (1):
γ_0 = 2α(2α/(1 + α²) γ_0) − α²((3α² − α⁴)/(1 + α²) γ_0) + σ²
    = (4α² − 3α⁴ + α⁶)/(1 + α²) γ_0 + σ²
⇒ γ_0 = (1 + α²)/(1 − 3α² + 3α⁴ − α⁶) σ²    (7)
γ_1 = 2α/(1 − 3α² + 3α⁴ − α⁶) σ²    (8)
γ_k − 2αγ_{k−1} + α²γ_{k−2} = 0
γ_k = (A + Bk)λ^k    (9)
λ² − 2αλ + α² = 0
λ = (2α ± √(4α² − 4α²))/2 = α
γ_k = (A + Bk)α^k    (10)
γ_0 = A = (1 + α²)/(1 − 3α² + 3α⁴ − α⁶) σ²
γ_1 = (A + B)α = 2α/(1 − 3α² + 3α⁴ − α⁶) σ²
Hence:
B = 2/(1 − 3α² + 3α⁴ − α⁶) σ² − A
B = 2/(1 − 3α² + 3α⁴ − α⁶) σ² − (1 + α²)/(1 − 3α² + 3α⁴ − α⁶) σ²
  = (1 − α²)/(1 − 3α² + 3α⁴ − α⁶) σ²
Yt is an ARIMA(2, 0, 0) if it is stationary.
1 − 0.4λ − 0.12λ²
which has roots -5 and 1.667. Since all roots are of magnitude greater
than 1, the process is stationary.
E(Y_t) = μ = 0.7/0.48 = 1.4583
(iv) Autocorrelations
g 1 = cov(Yt , Yt -1)
= cov(0.4Yt -1 + 0.12Yt - 2 + et , Yt -1)
= 0.4 cov(Yt -1, Yt -1) + 0.12 cov(Yt - 2,Yt -1) + cov(et , Yt -1)
= 0.4g 0 + 0.12g 1
g 2 = cov(Yt , Yt - 2 )
= cov(0.4Yt -1 + 0.12Yt - 2 + et , Yt - 2 )
= 0.4 cov(Yt -1, Yt - 2 ) + 0.12 cov(Yt - 2, Yt - 2 ) + cov(et , Yt - 2 )
= 0.4g 1 + 0.12g 0
g 3 = 0.4g 2 + 0.12g 1
g 4 = 0.4g 3 + 0.12g 2
γ_1 = (0.4/0.88) γ_0 = (5/11) γ_0
Solve to get:
ρ_1 = 5/11    ρ_2 = 83/275    ρ_3 = 241/1,375    ρ_4 = 731/6,875
1 − 0.4λ = 0  ⇒  λ = 1/0.4 = 2.5
Since the root is greater than 1 in absolute value, the process is stationary.
Hence d = 0 .
(ii)(a) Stationary?
(ii)(b) Invertible?
1 + 0.9λ = 0  ⇒  λ = −1/0.9 = −10/9
Since the process is stationary, we know that E (Yt ) = E (Yt -1) = ... = m . So:
μ − 0.4μ = 0.1 + 0 + 0  ⇒  μ = 1/6
Now since white noise is uncorrelated and future white noise is independent
of the past values of a time series:
Similarly:
and, in general:
Substituting our expression for g 0 into the expression for g 1 above, we get:
γ_1 = 0.4(0.4γ_1 + 2.17σ²) + 0.9σ²  ⇒  γ_1 = (1.768/0.84)σ² = (221/105)σ²
⇒ γ_0 = 0.4((221/105)σ²) + 2.17σ² = (253/84)σ²
⇒ γ_k = 0.4^{k−1} γ_1 = 0.4^{k−1} (221/105)σ²   for k ≥ 1
⇒ Y_t = (1 − 0.4B)^{−1}(0.1 + 0.9e_{t−1} + e_t)
However:
Y_t = (1 − 0.4B)^{−1}(0.1 + 0.9e_{t−1} + e_t)
    = (1 + (0.4B) + (0.4B)² + (0.4B)³ + ...)(0.1 + 0.9e_{t−1} + e_t)
    = 0.1/(1 − 0.4) + e_t + 1.3e_{t−1} + 0.4 × 1.3 e_{t−2} + 0.4² × 1.3 e_{t−3} + ...
    = 1/6 + e_t + 1.3 Σ_{j=1}^{∞} 0.4^{j−1} e_{t−j}
(1 − αλ)³ = 0  ⇒  λ = 1/α
Writing the time series in its long-hand form, ie by expanding out the
backward shift operator, gives:
(1 − αB)³ X_t = e_t
⇔ (1 − 3αB + 3α²B² − α³B³) X_t = e_t
⇔ X_t − 3αX_{t−1} + 3α²X_{t−2} − α³X_{t−3} = e_t
⇔ X_t = 3αX_{t−1} − 3α²X_{t−2} + α³X_{t−3} + e_t
g k = cov ( X t , X t - k )
Hence:
g 0 = cov ( X t , X t )
= cov (1.2 X t -1 - 0.48 X t - 2 + 0.064 X t - 3 + et , X t )
= 1.2 cov ( X t -1, X t ) - 0.48 cov ( X t - 2, X t ) + 0.064 cov ( X t - 3 , X t )
+ cov (et , X t )
= 1.2g 1 - 0.48g 2 + 0.064g 3 + s 2 (1)
Similarly:
g 1 = cov ( X t , X t -1)
= cov (1.2 X t -1 - 0.48 X t - 2 + 0.064 X t - 3 + et , X t -1)
= 1.2 cov ( X t -1, X t -1) - 0.48 cov ( X t - 2, X t -1) + 0.064 cov ( X t -3 , X t -1)
+ cov (et , X t -1)
= 1.2g 0 - 0.48g 1 + 0.064g 2 + 0 (2)
g 2 = cov ( X t , X t - 2 )
= cov (1.2 X t -1 - 0.48 X t - 2 + 0.064 X t - 3 + et , X t - 2 )
= 1.2cov ( X t -1, X t - 2 ) - 0.48 cov ( X t - 2, X t - 2 ) + 0.064 cov ( X t - 3 , X t - 2 )
+ cov (et , X t - 2 )
= 1.2g 1 - 0.48g 0 + 0.064g 1 + 0
= 1.264g 1 - 0.48g 0 (3)
g 3 = cov ( X t , X t -3 )
= cov (1.2 X t -1 - 0.48 X t - 2 + 0.064 X t - 3 + et , X t - 3 )
= 1.2 cov ( X t -1, X t - 3 ) - 0.48 cov ( X t - 2, X t - 3 ) + 0.064 cov ( X t - 3 , X t - 3 )
+ cov (et , X t - 3 )
= 1.2g 2 - 0.48g 1 + 0.064g 0 + 0
⇒ γ_1 = (290/347)γ_0 = 0.83573γ_0
γ_2 = 1.264((290/347)γ_0) − 0.48γ_0 = (200/347)γ_0 = 0.57637γ_0
and:
ρ_1 = γ_1/γ_0 = 290/347 = 0.83573
ρ_2 = γ_2/γ_0 = 200/347 = 0.57637
The partial autocorrelation function, φ_k, will cut off (ie be 0) for k > 3.
Since Yt = X t - X t - s = (1 - B s ) X t , we have s = 3 .
(1 − (α + β)B + αβB²) Y_t = e_t
1 − (α + β)λ + αβλ² = 0
(1 − αλ)(1 − βλ) = 0  ⇒  λ = 1/α, 1/β
|1/α| > 1 and |1/β| > 1
ie:
Y_t is an AR(2) process:
Y_t = (α + β) Y_{t−1} − αβ Y_{t−2} + e_t
γ_1 = (α + β)/(1 + αβ) γ_0
γ_2 = (α + β) × (α + β)/(1 + αβ) γ_0 − αβ γ_0
ρ_1 = (α + β)/(1 + αβ)
and ρ_2 = (α + β)(α + β)/(1 + αβ) − αβ = (α + β)ρ_1 − αβ
(α + β)/(1 + αβ) = 0.2    (3)
⇔ α − 0.2αβ = 0.2 − β
⇔ α = (0.2 − β)/(1 − 0.2β)    (5)
0.7 = ((0.2 − β)/(1 − 0.2β) + β) × 0.2 − ((0.2 − β)/(1 − 0.2β)) β
    = (0.04 − 0.2β + 0.2β − 0.04β² − 0.2β + β²)/(1 − 0.2β)
    = (0.04 − 0.2β + 0.96β²)/(1 − 0.2β)
Rearranging gives:
and:
(x − α)(x − β) = 0  ⇒  x² − (α + β)x + αβ = 0
x² − 0.0625x − 0.6875 = 0
(1 − 0.0625B − 0.6875B²) Y_t = e_t
⇔ Y_t = 0.0625Y_{t−1} + 0.6875Y_{t−2} + e_t
Substituting in Yt = X t - X t - 3 , we have:
X t - X t - 3 = 0.0625 ( X t -1 - X t - 4 ) + 0.6875 ( X t - 2 - X t - 5 ) + et
xˆ101 = 0.0625 x100 + 0.6875 x99 + x98 - 0.0625 x97 - 0.6875 x96
xˆ102 = 0.0625 xˆ101 + 0.6875 x100 + x99 - 0.0625 x98 - 0.6875 x97
N(αx_{t−1}, σ²)
(1/(√(2π) σ)) e^{−(x_t − αx_{t−1})²/(2σ²)}
L(α, σ) = ∏_{i=1}^{n} (1/(√(2π) σ)) e^{−(x_i − αx_{i−1})²/(2σ²)}
The least squares estimate will minimise the following with respect to a :
Σ_{i=1}^{n} e_i² = Σ_{i=1}^{n} (x_i − αx_{i−1})²
∏_{i=1}^{n} (1/(√(2π) σ)) e^{−(x_i − αx_{i−1})²/(2σ²)}
L(α, σ) = ∏_{i=1}^{n} (1/(√(2π) σ)) e^{−(x_i − αx_{i−1})²/(2σ²)} = const × (1/σⁿ) e^{−(1/(2σ²)) Σ_{i=1}^{n} (x_i − αx_{i−1})²}
Taking logs:
ln L(α, σ) = const − n ln σ − (1/(2σ²)) Σ_{i=1}^{n} (x_i − αx_{i−1})²
∂/∂α ln L(α, σ) = −(1/(2σ²)) Σ_{i=1}^{n} 2(x_i − αx_{i−1})(−x_{i−1})
               = (1/σ²) Σ_{i=1}^{n} x_{i−1}(x_i − αx_{i−1})
(1/σ̂²) Σ_{i=1}^{n} x_{i−1}(x_i − α̂x_{i−1}) = 0
Σ_{i=1}^{n} x_{i−1}x_i − α̂ Σ_{i=1}^{n} x_{i−1}² = 0
α̂ = Σ_{i=1}^{n} x_{i−1}x_i / Σ_{i=1}^{n} x_{i−1}²
∂/∂σ ln L(α, σ) = −n/σ + (1/σ³) Σ_{i=1}^{n} (x_i − αx_{i−1})²
−n/σ̂ + (1/σ̂³) Σ_{i=1}^{n} (x_i − α̂x_{i−1})² = 0
−nσ̂² + Σ_{i=1}^{n} (x_i − α̂x_{i−1})² = 0
σ̂² = (1/n) Σ_{i=1}^{n} (x_i − α̂x_{i−1})²
A preliminary step:
cov( X t , et ) = cov(a X t -1 + et , et )
= a cov( X t -1, et ) + cov(et , et )
= 0 +s 2
=s2
g 0 = cov( X t , X t )
= cov(a X t -1 + et , X t )
= a cov( X t -1, X t ) + cov(et , X t )
= ag 1 + s 2
g 1 = cov( X t , X t -1)
= cov(a X t -1 + et , X t -1)
= a cov( X t -1, X t -1) + cov(et , X t -1)
= ag 0
g 2 = cov( X t , X t - 2 )
= cov(a X t -1 + et , X t - 2 )
= a cov( X t -1, X t - 2 ) + cov(et , X t - 2 )
= ag 1
In general:
g k = ag k -1 k ≥1
γ̂_0 = α̂γ̂_1 + σ̂²    (1)
γ̂_1 = α̂γ̂_0    (2)
α̂ = γ̂_1 / γ̂_0
σ̂² = γ̂_0 − α̂γ̂_1 = γ̂_0 − γ̂_1²/γ̂_0
γ̂_0 = (1/n) Σ_{i=1}^{n} (x_i − x̄)²
γ̂_1 = (1/n) Σ_{i=1}^{n} (x_i − x̄)(x_{i−1} − x̄)
(v) Comment
(ii) ARIMA time series to fit the observed data in the charts
The ACF cuts off (becomes 0) at all lags greater than 1, whereas the PACF
decays towards 0. Hence we have an MA(1) .
Also:
g 0 = cov ( X t , X t )
Similarly:
g 1 = cov ( X t , X t -1)
= cov (a1X t -1 + a 2 X t - 2 + b1et -1 + et , X t -1)
= a1 cov ( X t -1, X t -1) + a 2 cov ( X t - 2, X t -1)
+ b1 cov (et -1, X t -1) + cov (et , X t -1)
= a1 g 0 + a 2 g 1 + b1s 2 + 0 (2)
g 2 = cov ( X t , X t - 2 )
= a1 g 1 + a 2 g 0 + 0 + 0 (3)
g k = a1 g k -1 + a 2 g k - 2 (4)
(iv) Can the partial auto-correlation function ever give a zero value?
For a MA (q ) process, where q ≥ 1 , the PACF tends towards 0 but does not
completely cut off.
fˆ1 = r1 = 0.68
g 0 = cov(Yt ,Yt )
= cov(a0 + a1Yt -1 + et ,Yt )
= a1g 1 + s 2 (1)
ρ_1 = γ_1/γ_0 = a_1γ_0/γ_0 = a_1
Equating the model lag 1 ACF to the sample lag 1 ACF gives:
aˆ1 = 0.68
Substituting the expression for g 1 from equation (2) into equation (1) gives:
g 0 = a1(a1g 0 ) + s 2
γ_0 = σ²/(1 − a_1²)    (3)
σ̂²/(1 − 0.68²) = 0.9  ⇒  σ̂² = 0.9(1 − 0.68²) = 0.48384
We have:
μ = a_0 + a_1μ
μ = a_0/(1 − a_1)
Equating this to the given sample mean of 1.35 and substituting in â_1 = 0.68 that we calculated, we get:
â_0/(1 − 0.68) = 1.35  ⇒  â_0 = 1.35(1 − 0.68) = 0.432
g 0 = cov(Yt ,Yt )
= cov(a0 + a1Yt -1 + a2Yt - 2 + et ,Yt )
= a1g 1 + a2g 2 + s 2 (1)
g 2 = cov(Yt ,Yt - 2 )
= cov(a0 + a1Yt -1 + a2Yt - 2 + et ,Yt - 2 )
= a1g 1 + a2g 0 (3)
γ_1 − a_2γ_1 = a_1γ_0
⇒ γ_1 = a_1/(1 − a_2) γ_0    (4)
γ_2 = a_1(a_1/(1 − a_2) γ_0) + a_2γ_0 = (a_1²/(1 − a_2) + a_2) γ_0    (5)
ρ_1 = γ_1/γ_0 = a_1/(1 − a_2)
ρ_2 = γ_2/γ_0 = a_1²/(1 − a_2) + a_2
Equating the lag 1 and lag 2 model and sample ACFs gives:
a_1/(1 − a_2) = 0.68    (6)
a_1²/(1 − a_2) + a_2 = 0.55    (7)
â_1 = 0.68 − 0.68 × 73/448 = 255/448 = 0.56920
Substituting γ_1 and γ_2 from equations (4) and (5) into equation (1) gives:
γ_0 = a_1(a_1/(1 − a_2) γ_0) + a_2((a_1²/(1 − a_2) + a_2) γ_0) + σ²
Rearranging:
γ_0 (1 − a_1²/(1 − a_2) − a_1²a_2/(1 − a_2) − a_2²) = σ²
⇒ γ_0 = σ² / (1 − a_1²/(1 − a_2) − a_1²a_2/(1 − a_2) − a_2²)
σ̂² = 0.9 (1 − (255/448)²/(1 − 73/448) − (255/448)²(73/448)/(1 − 73/448) − (73/448)²) = 0.47099
We have:
⇒ μ = a_0 + a_1μ + a_2μ
⇒ μ = a_0/(1 − a_1 − a_2)
Equating this to the given sample mean of 1.35 and substituting in â_1 = 255/448 and â_2 = 73/448, we get:
â_0/(1 − 255/448 − 73/448) = 1.35  ⇒  â_0 = 1.35(1 − 255/448 − 73/448) = 81/224 ≈ 0.36161
Yes stationarity is necessary for both models. Otherwise the mean, variance
and covariances would change over time.
(iv) Markov?
The first model satisfies the Markov property as it only depends on the
previous value of Yt -1 .
However, the second model does not satisfy the Markov property as it also
depends on Yt - 2 .
γ̂_0 = (1/n) Σ_{t=1}^{n} (x_t − μ̂)² = (1/200) × 35.4 = 0.177
γ̂_1 = (1/n) Σ_{t=2}^{n} (x_t − μ̂)(x_{t−1} − μ̂) = (1/200) × 28.4 = 0.142
γ̂_2 = (1/n) Σ_{t=3}^{n} (x_t − μ̂)(x_{t−2} − μ̂) = (1/200) × 17.1 = 0.0855
We have:
ρ̂_1 = γ̂_1/γ̂_0 = 0.142/0.177 = 0.802260
ρ̂_2 = γ̂_2/γ̂_0 = 0.0855/0.177 = 0.483051
Hence:
φ̂_1 = ρ̂_1 = 0.802260
μ̂ = x̄ = (1/200) Σ_{t=1}^{200} x_t = (1/200) × 83.7 = 0.4185
γ_0 = cov(X_t, X_t) = cov(μ + a_1(X_{t−1} − μ) + e_t, X_t) = a_1γ_1 + σ²    (1)
γ_1 = cov(X_t, X_{t−1}) = cov(μ + a_1(X_{t−1} − μ) + e_t, X_{t−1}) = a_1γ_0    (2)
ρ_1 = γ_1/γ_0 = a_1γ_0/γ_0 = a_1
Equating the model lag 1 ACF to the sample lag 1 ACF gives:
â_1 = 0.802260
γ_0 = a_1(a_1γ_0) + σ²  ⇒  γ_0 = σ²/(1 − a_1²)    (3)
σ̂²/(1 − 0.802260²) = 0.177  ⇒  σ̂² = 0.177(1 − 0.802260²) = 0.063079
We are testing:
H0 : the residuals are from a white noise process
H1 : the residuals are not from a white noise process
E(T) = (2/3)(n − 2) = (2/3)(200 − 2) = 132
and var(T) = (16n − 29)/90 = (16 × 200 − 29)/90 = 35.23
(110 − 132)/√35.23 = −3.706
The critical values are ±1.96 , so the test statistic lies in the rejection region.
Hence, we have sufficient evidence at the 5% level (and even at the 0.02%
level) to reject H0 . We have very strong evidence to suggest that the
residuals are not consistent with white noise.
(i) Differencing
We have:
γ_0 = cov(X_t, X_t)
    = cov(e_t + β_1e_{t−1} + β_12e_{t−12} + β_1β_12e_{t−13}, e_t + β_1e_{t−1} + β_12e_{t−12} + β_1β_12e_{t−13})
    = (1 + β_1² + β_12² + β_1²β_12²)σ²
Also:
γ_1 = cov(X_t, X_{t−1}) = β_1(1 + β_12²)σ²
γ_2 = γ_3 = ... = γ_10 = 0
γ_11 = cov(X_t, X_{t−11}) = β_1β_12σ²
γ_12 = cov(X_t, X_{t−12}) = β_12(1 + β_1²)σ²
γ_13 = cov(X_t, X_{t−13}) = β_1β_12σ²
γ_14 = γ_15 = ... = 0
ρ_0 = γ_0/γ_0 = 1
ρ_1 = γ_1/γ_0 = (β_1 + β_1β_12²)/(1 + β_1² + β_12² + β_1²β_12²)
ρ_2 = ρ_3 = ... = ρ_10 = 0
ρ_11 = γ_11/γ_0 = β_1β_12/(1 + β_1² + β_12² + β_1²β_12²)
ρ_12 = γ_12/γ_0 = β_12(1 + β_1²)/(1 + β_1² + β_12² + β_1²β_12²)
ρ_13 = γ_13/γ_0 = β_1β_12/(1 + β_1² + β_12² + β_1²β_12²)
ρ_14 = ρ_15 = ... = 0
1 X t 0.5 0 X t 1 t1
1 Yt 0 0.5 Yt 1 2
t
So we have:
1 0.5 0
M N
1 0 0.5
1 1
M1
1
2 1
X t M1NX t 1 M1εt
Xt 1 1 0.5 0 X t 1 1 1 t1
ie
Y
t 1 2 1 0 0.5 Yt 1 1 2 1 2
t
1 0.5 0.5 X t 1 1 1 t
1
1 0.5
2 0.5 Yt 1 1 2 1 2
t
0.5 0.5
1 2 1 2 X t 1 1 1 t1
0.5 0.5 Yt 1 1 2 1 2
t
1
2
1 2
Setting:
gives:
2 2
0.5 0.5
0
2 1 2
1
0.5 (1 )
2
0.5 0
2 2
0.25 (1 2 ) 2 (1 2 )2 0.25 2 0
2 (1 2 )2 (1 2 ) 0.25(1 2 ) 0
2 (1 2 ) 0.25 0
1 1 4 0.25(1 2 ) 1 2 1
2
2
2(1 ) 2(1 ) 2(1 2 )
1 1
ie or
2(1 ) 2(1 )
1 1
1 and 1
2(1 ) 2(1 )
1 1
2 or 1 21
1 or 3
2 2
1 1
2 or 1 21
21 or 32
1 or 3
2 2
(ii) VAR(p)
Xt X t 1 0 0 X t 2 t1
Yt 0 Yt 1 0 Yt 2 2
t
X t A1X t 1 A 2X t 2 εt
with:
0 0
A1 and A2
0 0
We have:
Y_t = 1 + 0.6Y_{t−1} + 0.16Y_{t−2} + e_t
1 − 0.6λ − 0.16λ² = 0
Solving for λ:
1 − 0.6λ − 0.16λ² = 0  ⇒  λ = −5 or 5/4
Since both |−5| > 1 and 5/4 > 1, the process is stationary.
μ = 1/(1 − 0.6 − 0.16) = 1/0.24 = 4 1/6
γ_1 = 0.6γ_0 + 0.16γ_1  ⇒  0.84γ_1 = 0.6γ_0  ⇒  γ_1 = (5/7)γ_0, ie ρ_1 = 5/7
Substituting γ_1 = (5/7)γ_0 into equation (2) gives:
γ_2 = 0.6(5/7)γ_0 + 0.16γ_0 = (103/175)γ_0, ie ρ_2 = 103/175
φ_1 = ρ_1 = 5/7
φ_2 = (ρ_2 − ρ_1²)/(1 − ρ_1²) = (103/175 − 25/49)/(1 − 25/49) = 4/25
(i) Why s = 12
(1 − (α + β)B + αβB²) Y_t = e_t
So, setting s = 12 removes the seasonal component from the time series.
1 − (α + β)λ + αβλ² = 0
(1 − αλ)(1 − βλ) = 0  ⇒  λ = 1/α, 1/β
|1/α| > 1 and |1/β| > 1
ie:
Y_t = (α + β)Y_{t−1} − αβY_{t−2} + e_t
γ_1 = cov[Y_t, Y_{t−1}] = cov[(α + β)Y_{t−1} − αβY_{t−2} + e_t, Y_{t−1}]
    = (α + β)γ_0 − αβγ_1 + 0    (1)
γ_1 = (α + β)/(1 + αβ) γ_0    (3)
Dividing equations (2) and (3) through by g 0 , we have the following two
equations:
ρ_1 = γ_1/γ_0 = (α + β)/(1 + αβ)
ρ_2 = γ_2/γ_0 = (α + β)ρ_1 − αβ
ρ_1 = (α + β)/(1 + αβ) = 0 and ρ_2 = 0 − αβ = 0.09
So:
ie:
α = −β and α² = 0.09
Hence α = 0.3 or −0.3, and the corresponding values of β are −0.3 or 0.3.
Using the estimated values of α and β from part (ii), the equation for the differenced series is:
Y_t = (α̂ + β̂)Y_{t−1} − α̂β̂Y_{t−2} + e_t = 0.09Y_{t−2} + e_t
ie:
γ_0 = cov(Y_t, Y_t) = cov(μ + αY_{t−1} + e_t, Y_t) = αγ_1 + σ²    (1)
ρ_1 = γ_1/γ_0 = αγ_0/γ_0 = α    (3)
We now equate equation (1) to the sample variance and equation (3) to the
sample ACF at lag 1.
Since the same filter has been applied to both sides of the equation, the
observations of Model A will also satisfy Model B.
However, in order to cancel the (1 - cB) term then (1 - cB)-1 must exist.
Looking at the conditions given on page 2 of the Tables for a convergent
series expansion for (1 + x )p we see that | x | < 1 which means we require
| c | < 1 . This was not included in the examiners’ solution though.
(i) Non-stationarity of X t
E ( X t ) = E (a + bt + Yt ) = a + bt + E (Yt )
E ( X t ) = a + bt + m
(ii) Stationarity of ΔX_t
We have:
γ_Y(s) = cov(Y_t, Y_{t−s})
Since ΔX_t = b + Y_t − Y_{t−1}:
cov(ΔX_t, ΔX_{t−s}) = γ_Y(s) − γ_Y(s + 1) − γ_Y(s − 1) + γ_Y(s)
                    = 2γ_Y(s) − γ_Y(s + 1) − γ_Y(s − 1)
ΔX_t = b + Y_t − Y_{t−1}
     = b + e_t + βe_{t−1} − e_{t−1} − βe_{t−2}
     = b + e_t + (β − 1)e_{t−1} − βe_{t−2}
Hence:
ΔX_t = b + (1 + (β − 1)L − βL²) e_t
(v) Variance of ΔX_t
var(Y_t) = σ² + β²σ² = (1 + β²)σ²
Similarly:
FACTSHEET
univariate (eg MA, AR, ARMA, ARIMA) – just one variable, say, X t
White noise
Stationarity
Invertibility
Purely indeterministic
Markov property
Integrated of order d
Autocovariance function
γ_0 = var(X_t)
γ_k = cov(X_t, X_{t+k})
ρ_k = corr(X_t, X_{t+k}) = γ_k/γ_0,   −1 ≤ ρ_k ≤ 1
φ_1 = ρ_1,   φ_2 = (ρ_2 − ρ_1²)/(1 − ρ_1²),   φ_k given on page 40 of the Tables
X_t = μ + e_t + β_1e_{t−1} + ... + β_q e_{t−q}
always stationary
need to check invertibility
never Markov
X_t = μ + α_1(X_{t−1} − μ) + ... + α_p(X_{t−p} − μ) + e_t
X_t = μ + α_1(X_{t−1} − μ) + ... + α_p(X_{t−p} − μ) + e_t + β_1e_{t−1} + ... + β_q e_{t−q}
X_t = μ + A_1(X_{t−1} − μ) + ... + A_p(X_{t−p} − μ) + e_t
always invertible
VAR(1) is Markov
Cointegration
y_t = (1/12)( ½x_{t−6} + ... + x_{t−1} + x_t + x_{t+1} + ... + ½x_{t+6} )
method of seasonal means, eg subtract the monthly estimate from the
appropriate month.
Box-Jenkins methodology
If the model is a good fit then the residuals, eˆt , will be white noise:
a turning points test can check that the {eˆt } are patternless
Forecasting