Lecture 02 20190212


02 Stationary time series

Andrius Buteikis, [email protected]


http://web.vu.lt/mif/a.buteikis/
Introduction
All time series may be divided into two big classes - stationary and
non-stationary.

I Stationary process - a random process with a constant mean,
  variance and covariance. Examples of stationary time series:

[Figure: simulated sample paths of WN (mean = 0), MA(3) (mean = 5) and AR(1) (mean = 5), plotted against Time]

The three example processes fluctuate around their constant mean values.
Judging from the graphs, the fluctuations of the first two series appear to have
a constant spread; for the third one this is less apparent.
If we plot the last time series for a longer time period:

[Figure: the AR(1) series (mean = 5) plotted over 200 observations and over 400 observations]

We can see that the fluctuations are indeed around a constant mean and
the variance does not appear to change throughout the period.
Some non-stationary time series examples:

I Yt = t + εt, where εt ∼ N(0, 1);
I Yt = εt · t, where εt ∼ N(0, σ²);
I Yt = ∑_{j=1}^t Zj, where the Zj are independent and each equals 1 or −1
  with probability 0.5.

The reasons for their non-stationarity are as follows:

I The first time series is not stationary because its mean is not
  constant: EYt = t - it depends on t;
I The second time series is not stationary because its variance is not
  constant: Var(Yt) = t² · σ² - it depends on t.
  However, EYt = 0 · t = 0 is constant;
I The third time series is not stationary because even though
  EYt = ∑_{j=1}^t E(Zj) = ∑_{j=1}^t (0.5 · 1 + 0.5 · (−1)) = 0, the variance
  Var(Yt) = E(Yt²) − (E(Yt))² = E(Yt²) = t, where:
  E(Yt²) = ∑_{j=1}^t E(Zj²) + 2 ∑_{j≠k} E(Zj Zk) = t · (0.5 · 1² + 0.5 · (−1)²) + 0 = t
The sample data graphs are provided below:
[Figure: sample paths of the three processes - non-stationary in mean, non-stationary in variance, and the random walk with no clear tendency]


I White noise (WN) - a stationary process of uncorrelated
(sometimes we may demand a stronger property of independence)
random variables with zero mean and constant variance. White
noise is a model of an absolutely chaotic process of uncorrelated
observations - it is a process that immediately forgets its past.

How can we know which of the previous three stationary graphs are not
WN? Two functions help us determine this:

I ACF - Autocorrelation function


I PACF - Partial autocorrelation function

If all the bars (except the 0th in the ACF) are within the blue band - the
stationary process is WN.
[Figure: sample ACF (top row) and sample PACF (bottom row) of the WN, MA(3) and AR(1) series; the blue bands mark the 95% confidence bounds]

The 95% confidence intervals are calculated from:

qnorm(p = c(0.025, 0.975))/sqrt(n)

(more details on the confidence interval calculation are provided later in
these slides)
par(mfrow = c(1, 2))          # two plots side by side
set.seed(10)
n  <- 50
x0 <- rnorm(n)                # simulate a WN sample of length n
acf(x0)                       # sample ACF
abline(h = qnorm(c(0.025, 0.975))/sqrt(n), col = "red")   # 95% bounds
pacf(x0)                      # sample PACF
abline(h = qnorm(c(0.025, 0.975))/sqrt(n), col = "red")

[Figure: sample ACF and PACF of the simulated WN series x0, with the 95% bounds marked in red]
Covariance-Stationary Time Series
I In cross-sectional data different observations were assumed to be
uncorrelated;
I In time series we require that there be some dynamics, some
persistence, some way in which the present is linked to the past and
the future - to the present. Having historical data then would allow
us to forecast the future.

If we want to forecast a series, at a minimum we would like its mean
and covariance structure to be stable over time. In that case, we would
say that the series is covariance stationary. There are two requirements
for this to be true:

1. The mean of the series is stable over time: EYt = µ;


2. The covariance structure is stable over time.

In general, the (auto)covariance between Yt and Yt−τ is:

γ(t, τ ) = cov (Yt , Yt−τ ) = E(Yt − µ)(Yt−τ − µ)

If the covariance structure is stable, then the covariance depends on τ
but not on t: γ(t, τ) = γ(τ). Note: γ(0) = cov(Yt, Yt) = Var(Yt) < ∞.
Remark
When observing/measuring time series we obtain numbers y1 , ..., yT
which are the realization of random variables Y1 , ..., YT .
Using probabilistic concepts, we can give a more precise definition of a
(weak) stationary series:
I If EYt = µ - the process is called mean-stationary;
I If Var (Yt ) = σ 2 < ∞ - the process is called variance-stationary;
I If γ(t, τ ) = γ(τ ) - the process is called covariance-stationary.
In other words, a time series Yt is stationary if its mean, variance and
covariance do not depend on t.
If at least one of the three requirements is not met, then the process is
non-stationary.
Since we often work with the (auto)correlation between Yt and Yt−τ
rather than the (auto)covariance (because they are easier to interpret),
we can calculate the autocorrelation function (ACF):

ρ(τ) = cov(Yt, Yt−τ) / √(Var(Yt) · Var(Yt−τ)) = γ(τ)/γ(0)

Note: ρ(0) = 1, |ρ(τ)| ≤ 1.


The partial autocorrelation function (PACF) measures the association
between Yt and Yt−k after the intermediate lags are accounted for:

p(k) = βk, where Yt = α + β1 Yt−1 + ... + βk Yt−k + εt

The sample autocorrelation coefficient at lag k, rk, is asymptotically
normally distributed, and its variance can be approximated by
Var(rk) ≈ 1/T (where T is the number of observations).
As such, we want to create lower and upper 95% confidence bounds for
the normal distribution N(0, 1/T), whose standard deviation is 1/√T.
The 95% confidence interval (of a stationary time series) is:

∆ = 0 ± 1.96/√T

In general, the critical value of a standard normal distribution and its
confidence interval can be found in these steps:

I Compute α = (1 − Q)/2, where Q is the confidence level;
I To express the critical value as a z-score, find the z1−α value.

For example, if Q = 0.95, then α = 0.025 and the standard normal
distribution's 1 − α quantile is z0.975 ≈ 1.96.
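A minimal R sketch of these steps for Q = 0.95 (the series length n = 100 is just an illustrative value):

Q <- 0.95
alpha <- (1 - Q)/2                    # 0.025
qnorm(1 - alpha)                      # z_{1 - alpha}, approximately 1.96
n <- 100                              # e.g. a series of length 100
qnorm(c(alpha, 1 - alpha))/sqrt(n)    # lower and upper 95% bounds for the sample ACF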
White Noise
White noise processes are the fundamental building blocks of all
stationary time series.
We denote it εt ∼ WN(0, σ²) - a zero mean, constant variance and
serially uncorrelated (ρ(t, τ) = 0, for τ > 0 and any t) random variable
process.
Sometimes we demand a stronger property of independence.
From the definition it follows that:

I E(εt) = 0;
I Var(εt) = σ² < ∞;
I γ(t, τ) = E(εt − Eεt)(εt−τ − Eεt−τ) = E(εt εt−τ), where:

  E(εt εt−τ) = σ², if τ = 0;  and 0, if τ ≠ 0
Example on how to check whether a process is stationary.
Let us check if Yt = εt + β1 εt−1, where εt ∼ WN(0, σ²), is stationary:
1. EYt = E(εt + β1 εt−1) = 0 + β1 · 0 = 0;
2. Var(Yt) = Var(εt + β1 εt−1) = σ² + β1² σ² = σ²(1 + β1²);
3. The autocovariance for τ > 0:
   γ(t, τ) = E(Yt Yt−τ) = E(εt + β1 εt−1)(εt−τ + β1 εt−τ−1)
           = Eεt εt−τ + β1 Eεt εt−τ−1 + β1 Eεt−1 εt−τ + β1² Eεt−1 εt−τ−1
           = β1 Eεt−1 εt−τ = β1 σ², if τ = 1;  and 0, if τ > 1

None of these characteristics depend on t, which means that the process
is stationary. This process has a very short memory (i.e. if Yt and Yt+τ
are separated by more than one time period, they are uncorrelated).
On the other hand, this process is not a WN.
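A small simulation sketch of this example (the value β1 = 0.7 and the sample size are arbitrary choices): the sample ACF should be significant only at lag 1.

set.seed(123)
n  <- 500
b1 <- 0.7                     # an arbitrary value of beta_1
e  <- rnorm(n)                # epsilon_t ~ WN(0, 1)
Y  <- e + b1 * c(0, e[-n])    # Y_t = e_t + b1 * e_{t-1}, with epsilon_0 set to 0
acf(Y)                        # only the lag-1 autocorrelation should stand out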
The Lag Operator
The lag operator L is used to lag a time series: LYt = Yt−1 . Similarly:
L2 Yt = L(LYt ) = L(Yt−1 ) = Yt−2 etc. In general, we can write:
Lp Yt = Yt−p
Typically, we operate on a time series with a polynomial in the lag
operator. A lag operator polynomial of degree m is:
B(L) = β0 + β1 L + β2 L2 + ... + βm Lm
For example, if B(L) = 1 + 0.9L − 0.6L2 , then:
B(L)Yt = Yt + 0.9Yt−1 − 0.6Yt−2

A well known operator - the first-difference operator ∆ - is a first-order
polynomial in the lag operator: ∆Yt = Yt − Yt−1 = (1 − L)Yt, i.e.
B(L) = 1 − L.
We can also write an infinite-order lag operator polynomial as:

B(L) = β0 + β1 L + β2 L² + ... = ∑_{j=0}^∞ βj L^j
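A minimal sketch of applying lag operator polynomials in R: diff() implements ∆ = 1 − L, while stats::filter() applies a general B(L) (the example series below is arbitrary).

set.seed(1)
Y  <- cumsum(rnorm(20))      # an arbitrary example series
dY <- diff(Y)                # first difference: (1 - L)Y_t = Y_t - Y_{t-1}
# apply B(L) = 1 + 0.9L - 0.6L^2 to Y_t (sides = 1 uses current and past values only):
BY <- stats::filter(Y, filter = c(1, 0.9, -0.6), method = "convolution", sides = 1)
head(cbind(dY = c(NA, dY), BY), 5)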
The General Linear Process
Wold’s representation theorem points to the appropriate model for
stationary processes.
Wold’s Representation Theorem
Let {Yt } be any zero-mean covariance-stationary process. Then we can
write it as:

Yt = B(L)εt = ∑_{j=0}^∞ βj εt−j,  εt ∼ WN(0, σ²)

where β0 = 1 and ∑_{j=0}^∞ βj² < ∞. On the other hand, any process of the
above form is stationary.
I If β1 = β2 = ... = 0 - this corresponds to a WN process. This shows
  once again that WN is a stationary process.
I If βj = φ^j, then, since 1 + φ + φ² + ... = 1/(1 − φ) < ∞ whenever |φ| < 1,
  the process Yt = εt + φεt−1 + φ²εt−2 + ... is a stationary process.
In Wold’s theorem, we assumed a zero mean, though this is not as
restrictive as it may seem. Whenever you see Yt, analyse the process
Yt − µ, so that the process is expressed in deviations from its mean. The
deviation from the mean has a zero mean by construction. So, there is
no loss of generality when analyzing zero-mean processes.
Wold’s representation theorem points to the importance of models with
infinite distributed (weighted) lags. Infinite distributed lag models are not
of immediate practical use in general, since they contain infinitely many
parameters; however, the number of free parameters may be small: in the
previous example, βj = φ^j, so the infinite polynomial B(L) depends on only one parameter.
Estimation and Inference for the Mean, ACF and PACF

Suppose we have sample data from a stationary time series, but we do not
know the true model that generated the data (we only know that it can be
written with some polynomial B(L)), nor the mean, ACF or PACF associated
with the model.
We want to use the data to estimate the mean, ACF and PACF, which
we might use to help us decide on a suitable model for the data.
Sample Mean
The mean of a stationary series is EYt = µ. A fundamental principle of
estimation, called the analog principle, suggests that we develop
estimators by replacing expectations with sample averages. Thus, our
estimator of the population mean, given a sample of size T, is the sample
mean:

Ȳ = (1/T) ∑_{t=1}^T Yt
Typically, we are not interested in estimating the mean but it is needed
for estimating the autocorrelation function.
Sample Autocorrelations
The autocorrelation at displacement, or lag, τ for the covariance
stationary series {Yt } is:

ρ(τ) = E[(Yt − µ)(Yt−τ − µ)] / E[(Yt − µ)²]

Application of the analog principle yields a natural estimator of ρ(τ):

ρ̂(τ) = [ (1/T) ∑_{t=1}^T (Yt − Ȳ)(Yt−τ − Ȳ) ] / [ (1/T) ∑_{t=1}^T (Yt − Ȳ)² ]
This estimator is called the sample autocorrelation function (sample
ACF).
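A sketch of this estimator computed by hand (summing over the overlapping observations) and checked against R's acf(), which uses the same 1/T scaling; the AR(1) series is just an arbitrary stationary example.

set.seed(5)
Y    <- arima.sim(model = list(ar = 0.5), n = 200)    # any stationary series will do
Ybar <- mean(Y)
rho_hat <- function(tau) {                            # analog-principle estimator
  n <- length(Y)
  sum((Y[(tau + 1):n] - Ybar) * (Y[1:(n - tau)] - Ybar)) / sum((Y - Ybar)^2)
}
c(manual = rho_hat(1), acf = acf(Y, plot = FALSE)$acf[2])   # the two should coincide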
It is often of interest to assess whether a series is reasonably
approximated as white noise, i.e. whether all of its autocorrelations are
zero in population.
If a series is white noise, then the sample autocorrelations ρ̂(τ),
τ = 1, ..., K, in large samples are approximately independent and distributed
as N(0, 1/T).
Thus, if the series is WN, ~95% of the sample autocorrelations should
fall in the interval ±1.96/√T.
Exactly the same holds for both sample ACF and sample PACF. We
typically plot the sample ACF and sample PACF along with their error
bands.
The aforementioned error bands provide 95% confidence bounds for only
the sample autocorrelation taken one at a time.
We are often interested in whether a series is white noise, i.e. whether all
its autocorrelations are jointly zero. Because of the sample size, we can
only take a finite number of autocorrelations. We want to test:

H0 : ρ(1) = 0, ρ(2) = 0, ..., ρ(k) = 0

Under the null hypothesis the Ljung-Box statistic:

Q = T(T + 2) ∑_{τ=1}^k ρ̂²(τ)/(T − τ)

is approximately distributed as a χ²_k random variable.

To test the null hypothesis, we calculate the
p-value = P(χ²_k > Q): if the p-value < 0.05, we reject the null
hypothesis, H0, and conclude that Yt is not white noise.
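A sketch of the test computed by hand for k = 5 lags on a simulated WN series (so H0 should not be rejected), checked against Box.test():

set.seed(42)
Y <- rnorm(100)                                  # a WN series
k <- 5
n <- length(Y)
r <- acf(Y, lag.max = k, plot = FALSE)$acf[-1]   # sample autocorrelations rho_hat(1), ..., rho_hat(k)
Q <- n * (n + 2) * sum(r^2 / (n - 1:k))          # the Ljung-Box statistic
c(Q = Q, p.value = 1 - pchisq(Q, df = k))
Box.test(Y, lag = k, type = "Ljung-Box")         # should reproduce the same Q and p-value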
Example: Canadian employment data

We will illustrate these ideas by examining a quarterly Canadian
employment index. The data is seasonally adjusted and displays no
trend; however, it does appear to be highly serially correlated.

suppressPackageStartupMessages({require("forecast")})
txt1 <- "http://uosis.mif.vu.lt/~rlapinskas/(data%20R&GRETL/"
txt2 <- "caemp.txt"
caemp <- read.csv(url(paste0(txt1, txt2)),
header = TRUE, as.is = TRUE)
caemp <- ts(caemp, start = c(1960, 1), freq = 4)
tsdisplay(caemp)
caemp

[Figure: the Canadian employment index (1960-1995) with its sample ACF and PACF]

I The sample ACF values are large and display a slow one-sided decay;
I The sample PACF values are large at first, but are statistically negligible
  beyond displacement τ = 2.
We shall once again test the WN hypothesis, this time using the
Ljung-Box test statistic.
Box.test(caemp, lag = 1, type = "Ljung-Box")

##
## Box-Ljung test
##
## data: caemp
## X-squared = 127.73, df = 1, p-value < 2.2e-16

with p < 0.05, we reject the null hypothesis H0 : ρ(1) = 0.


Box.test(caemp, lag = 2, type = "Ljung-Box")

##
## Box-Ljung test
##
## data: caemp
## X-squared = 240.45, df = 2, p-value < 2.2e-16

with p < 0.05, we reject the null hypothesis H0 : ρ(1) = 0, ρ(2) = 0,


and so on. We can see that the time series is not a WN.
We will now present a few more examples of stationary processes.
Moving-Average (MA) Models
Finite-order moving-average processes are approximations to the Wold
representation (an infinite-order moving average process).
The fact that all variation in time series, one way or another, is driven by
shocks of various sorts suggests the possibility of modelling time series
directly as distributed lags of current and past shocks - as
moving-average processes.
The MA(1) Process
The first-order moving average or MA(1) process is:

Yt = εt + θεt−1 = (1 + θL)εt,  −∞ < θ < ∞,  εt ∼ WN(0, σ²)

Defining characteristic of an MA process: the current value of the
observed series can be expressed as a function of current and lagged
unobservable shocks εt.
Whatever the value of θ (as long as |θ| < ∞), MA(1) is always a
stationary process and:

I E(Yt) = E(εt) + θE(εt−1) = 0;
I Var(Yt) = Var(εt) + θ² Var(εt−1) = (1 + θ²)σ²;
I ρ(τ) = 1, if τ = 0;  θ/(1 + θ²), if τ = 1;  0, otherwise.

Key feature of MA(1): the (sample) ACF has a sharp cutoff beyond τ = 1.


We can write MA(1) another way. Since:

Yt = (1 + θL)εt ⇒ εt = Yt / (1 + θL)

recalling the formula of a geometric series, if |θ| < 1:

εt = (1 − θL + θ²L² − θ³L³ + ...)Yt
   = Yt − θYt−1 + θ²Yt−2 − θ³Yt−3 + ...

and we can express Yt as an infinite AR process:

Yt = θYt−1 − θ²Yt−2 + θ³Yt−3 − ... + εt
   = ∑_{j=1}^∞ (−1)^{j+1} θ^j Yt−j + εt

Remembering the definition of the PACF, we have that for an MA(1)
process it will decay gradually to zero:

I If θ < 0, then the pattern of decay will be one-sided;
I If 0 < θ < 1, then the pattern of decay will be oscillating.
An example of how the sample ACF and PACF of MA(1)
processes might look:

[Figure: sample ACF and PACF of MA(1) processes with θ = 0.5 and with θ = −0.5]
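Plots of this kind can be reproduced by simulation; a sketch using arima.sim(), whose ma argument follows the same Yt = εt + θεt−1 convention (the sample size is arbitrary):

set.seed(100)
par(mfrow = c(2, 2))
y1 <- arima.sim(model = list(ma = 0.5),  n = 500)   # MA(1), theta = 0.5
y2 <- arima.sim(model = list(ma = -0.5), n = 500)   # MA(1), theta = -0.5
acf(y1);  pacf(y1)    # ACF cuts off after lag 1, PACF decays gradually
acf(y2);  pacf(y2)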
The MA(q) Process
We will now consider a general finite-order moving average process of
order q, MA(q):

Yt = εt + θ1 εt−1 + ... + θq εt−q = Θ(L)εt,  −∞ < θi < ∞,  εt ∼ WN(0, σ²)

where
Θ(L) = 1 + θ1 L + ... + θq Lq
is the qth-order lag polynomial. The MA(q) process is a generalization of
the MA(1) process. Compared to MA(1), MA(q) can capture richer
dynamic patterns which can be used for improved forecasting.
The properties of an MA(q) process are parallel to those of an MA(1)
process in all respects:

I The finite-order MA(q) process is covariance stationary for any value
  of its parameters (|θj| < ∞, j = 1, ..., q);
I In the MA(q) case, all autocorrelations in the ACF beyond displacement q
  are 0 (a distinctive property of MA processes);
I The PACF of the MA(q) decays gradually in accordance with the
  infinite autoregressive representation, similar to MA(1):
  Yt = a1 Yt−1 + a2 Yt−2 + ... + εt (with certain conditions on the aj).
An example of how the sample ACF and PACF of an MA(3)
process might look:
[Figure: sample ACF and PACF of an MA(3) process with θ1 = 1.2, θ2 = 0.65, θ3 = −0.35]
ACF is cut off at τ = 3 and PACF decays gradually.


Autoregressive (AR) Models
The autoregressive process is also a natural approximation of the Wold
representation. We have seen that, under certain conditions, a
moving-average process has an autoregressive representation. So, an
autoregressive process is, in a sense, the same as a moving average
process.
The AR(1) Process

The first-order autoregressive or AR(1) process is:

Yt = φYt−1 + εt,  εt ∼ WN(0, σ²)

or:

(1 − φL)Yt = εt ⇒ Yt = εt / (1 − φL)

Note the special interpretation of the errors, or disturbances, or shocks εt,
in time series theory: in contrast to regression theory, where they were
understood as the summary of all unobserved X's, here they are treated
as economic shocks which arrive in period t.
As we will see when analyzing ACF, the AR(1) model is capable of
capturing much more persistent dynamics (depending on its parameter
value) than the MA(1) model, which has a very short memory, regardless
of its parameter value.
Recall that a finite-order moving-average process is always covariance
stationary, but that certain conditions must be satisfied for AR(1) to be
stationary. The AR(1) process can be rewritten as:
Yt = εt / (1 − φL) = (1 + φL + φ²L² + ...)εt = εt + φεt−1 + φ²εt−2 + ...

This Wold moving-average representation for Yt is convergent if |φ| < 1,
thus:

AR(1) is stationary if |φ| < 1

Equivalently, the condition for covariance stationarity is that the root, z1,
of the autoregressive lag operator polynomial (i.e.
1 − φz1 = 0 ⇔ z1 = 1/φ) be greater than 1 in absolute value (a similar
condition on the roots applies in the AR(p) case).
We can also get the above equation by recursively applying the equation
of AR(1) to get the infinite MA process:
Yt = φYt−1 + εt = φ(φYt−2 + εt−1) + εt
   = εt + φεt−1 + φ²Yt−2 = ... = ∑_{j=0}^∞ φ^j εt−j
From the moving average representation of the covariance stationary
AR(1) process:

I E(Yt) = E(εt + φεt−1 + φ²εt−2 + ...) = 0;
I Var(Yt) = Var(εt) + φ² Var(εt−1) + ... = σ²/(1 − φ²);

Or, alternatively: when |φ| < 1, the process is stationary, i.e. EYt = m,
therefore EYt = φEYt−1 + Eεt ⇒ m = φm + 0 ⇒ m = 0.
This allows us to easily obtain the mean of the generalized AR(1)
process: if Yt = α + φYt−1 + εt, then m = α/(1 − φ).
The correlogram (ACF & PACF) of AR(1) is in a sense symmetric to that
of MA(1):

I ρ(τ) = φ^τ, τ = 0, 1, 2, ... - the ACF decays exponentially;
I p(τ) = φ, if τ = 1;  0, if τ > 1 - the PACF cuts off abruptly.
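These theoretical values can be verified with ARMAacf(); a sketch for φ = 0.85, the value used in the plots below:

phi <- 0.85
ARMAacf(ar = phi, lag.max = 5)                # theoretical ACF: 1, phi, phi^2, ...
ARMAacf(ar = phi, lag.max = 5, pacf = TRUE)   # theoretical PACF: phi at lag 1, 0 afterwards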
An example of how the sample ACF and PACF of an AR(1)
process might look:

[Figure: sample ACF and PACF of an AR(1) process with φ = 0.85]
The AR(p) Process
The general pth order autoregressive process, AR(p) is:
Yt = φ1 Yt−1 + φ2 Yt−2 + ... + φp Yt−p + εt,  εt ∼ WN(0, σ²)
In lag operator form, we write:
Φ(L)Yt = (1 − φ1 L − φ2 L² − ... − φp L^p)Yt = εt
Similar to the AR(1) case, the AR(p) process is covariance stationary
if and only if all the roots zi of the autoregressive lag operator polynomial
Φ(z) are outside the complex unit circle:
1 − φ1 z − φ2 z² − ... − φp z^p = 0 ⇒ |zi| > 1
So:

AR(p) is stationary if all the roots satisfy |zi| > 1

For a quick check of stationarity, use the following rule:

If ∑_{i=1}^p φi ≥ 1, the process isn't stationary

In the covariance stationary case, we can write the process in the infinite
moving average MA(∞) form:

Yt = εt / Φ(L)

I The ACF for the general AR(p) process decays gradually when the
lag increases;
I The PACF for the general AR(p) process has a sharp cutoff at
displacement p.
An example of how the sample ACF and PACF of the AR(2)
process Yt = 1.5Yt−1 − 0.9Yt−2 + εt might look:

[Figure: sample ACF and PACF of an AR(2) process with φ1 = 1.5, φ2 = −0.9]

The corresponding lag operator polynomial is 1 − 1.5L + 0.9L², with two
complex conjugate roots of 1 − 1.5z + 0.9z² = 0: z1,2 ≈ 0.83 ± 0.65i,
|z1,2| = √(0.83² + 0.65²) ≈ 1.054 > 1 - thus the process is stationary.
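The root check can be reproduced in R with polyroot(), which takes the polynomial coefficients in increasing order of powers:

z <- polyroot(c(1, -1.5, 0.9))   # roots of 1 - 1.5z + 0.9z^2 = 0
z                                # approximately 0.83 +/- 0.65i
Mod(z)                           # both moduli are about 1.054 > 1, so the AR(2) is stationary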
The ACF for an AR(2) is:

0,
 τ =0
ρ(τ ) = φ1 /(1 − φ2 ), τ =1

φ1 ρ(τ − 1) + φ2 ρ(τ − 2), τ = 2, 3, ...

Because the roots are complex, the ACF oscillates and because the roots
are close to the unit circle, the oscillation damps slowly.
Stationarity and Invertibility
The AR(p) is a generalization of the AR(1) strategy for approximating
the Wold representation. The moving-average representation associated
with the stationary AR(p) process:


Yt = εt / Φ(L),  where 1/Φ(L) = ∑_{j=0}^∞ ψj L^j,  ψ0 = 1

depends on p parameters only. This gives us the infinite process from
Wold's Representation Theorem:

Yt = ∑_{j=0}^∞ ψj εt−j

which is known as the infinite moving-average process, MA(∞). Because
the AR process is stationary, ∑_{j=0}^∞ ψj² < ∞ and Yt takes finite values.
Thus, a stationary AR process can be rewritten as an MA(∞) process.
Stationarity and Invertibility

In some cases the AR form of a stationary process is preferred to the
MA form. Just as we can write an AR process as an MA(∞), we can write
an MA process as an AR(∞). The MA process is called invertible if it can
be expressed as an AR process. So, the MA(q) process:

Yt = εt + θ1 εt−1 + ... + θq εt−q = Θ(L)εt,  −∞ < θi < ∞,  εt ∼ WN(0, σ²)

is invertible if all the roots of Θ(x) = 1 + θ1 x + ... + θq x^q lie outside the
unit circle:

1 + θ1 x + ... + θq x^q = 0 ⇒ |xi| > 1


Stationarity and Invertibility
Then we can write the process as:


εt = Yt / Θ(L),  where 1/Θ(L) = ∑_{j=0}^∞ πj L^j,  π0 = 1

εt = ∑_{j=0}^∞ πj Yt−j = Yt + ∑_{j=1}^∞ πj Yt−j

which gives us the infinite-order autoregressive process, AR(∞):

Yt = ∑_{j=1}^∞ π̃j Yt−j + εt

Because the MA process is invertible, the infinite series converges to a
finite value.
For example, the MA(1) process of the form Yt = εt − εt−1 is not invertible since
1 − x = 0 ⇒ x = 1.
Autoregressive Moving-Average (ARMA) Models

AR and MA models are often combined in attempts to obtain better
approximations to the Wold representation. The result is the
ARMA(p,q) process. The motivation for using ARMA models is as
follows:

I If the random shock that drives an AR process is itself an MA
  process, then we obtain an ARMA process;
I ARMA processes arise from aggregation - sums of AR processes,
  sums of AR and MA processes;
I AR processes observed subject to measurement error also turn out
  to be ARMA processes.
ARMA(1,1) process
The simplest ARMA process that is not a pure AR or pure MA is the
ARMA(1,1) process:
Yt = φYt−1 + εt + θεt−1,  εt ∼ WN(0, σ²)
or in lag operator form:
(1 − φL)Yt = (1 + θL)εt
where:
1. |φ| < 1 - required for stationarity;
2. |θ| < 1 - required for invertibility.
If the covariance stationarity condition is satisfied, then we have the
MA representation:

Yt = [(1 + θL)/(1 − φL)] εt = εt + b1 εt−1 + b2 εt−2 + ...

which is an infinite distributed lag of current and past innovations.
Similarly, we can rewrite it in the infinite AR form:

[(1 − φL)/(1 + θL)] Yt = Yt + a1 Yt−1 + a2 Yt−2 + ... = εt
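The coefficients b1, b2, ... of this MA representation can be obtained with ARMAtoMA(); a sketch for the φ = 0.85, θ = 0.5 example used below:

# psi-weights of the MA(infinity) representation of the ARMA(1,1) with phi = 0.85, theta = 0.5
ARMAtoMA(ar = 0.85, ma = 0.5, lag.max = 5)
# here b_j = (phi + theta) * phi^(j - 1): 1.35, 1.1475, 0.975..., ...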
ARMA(p,q) process

A natural generalization of the ARMA(1,1) is the ARMA(p,q) process,
which allows for multiple moving-average and autoregressive lags. We can
write it as:

Yt = φ1 Yt−1 + ... + φp Yt−p + εt + θ1 εt−1 + ... + θq εt−q,  εt ∼ WN(0, σ²)

or:
Φ(L)Yt = Θ(L)εt

I If all the roots of Φ(L) are outside the unit circle, then the process is
  stationary and has a convergent infinite moving average
  representation: Yt = (Θ(L)/Φ(L)) εt;
I If all the roots of Θ(L) are outside the unit circle, then the process is
  invertible and can be expressed as the convergent infinite
  autoregression: (Φ(L)/Θ(L)) Yt = εt.
An example of an ARMA(1,1) process Yt = 0.85Yt−1 + εt + 0.5εt−1:

[Figure: sample ACF and PACF of an ARMA(1,1) process with φ = 0.85, θ = 0.5]
ARMA models are often both highly accurate and highly parsimonious.
In a particular situation, for example, it might take an AR(5) model to
get the same approximation accuracy as could be obtained with an
ARMA(1,1), but the AR(5) has five parameters to be estimated, whereas
the ARMA(1,1) has only two.

The rule to determine the number of AR and MA terms:


- AR(p) - ACF declines, PACF = 0 if τ > p;
- MA(q) - ACF = 0 if τ > q, PACF declines;
- ARMA(p,q) - both ACF and PACF decline.
Estimation
Autoregressive process parameter estimation
Let's say we want to estimate the parameter of our AR(1) process:

Yt = φ1 Yt−1 + εt

I The OLS estimator of φ for the AR(1) case:

  φ̂ = ∑_{t=1}^T Yt Yt−1 / ∑_{t=1}^T Yt−1²

I The Yule-Walker estimator of φ for AR(1) can be calculated by
  multiplying Yt = φ1 Yt−1 + εt by Yt−1 and taking the expectation.
  We get the equation:

  γ(1) = φ γ(0)

Recall that γ(τ) is the covariance between Yt and Yt−τ.

For the AR(p) case, we would need p different equations, i.e.:

γ(k) = φ1 γ(k − 1) + ... + φp γ(k − p),  k = 1, ..., p
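A sketch comparing the two estimators on simulated AR(1) data (the true value φ = 0.6 is an arbitrary choice):

set.seed(10)
y <- arima.sim(model = list(ar = 0.6), n = 500)   # AR(1) with phi = 0.6
n <- length(y)
# OLS: regress Y_t on Y_{t-1} without an intercept, as in the formula above
phi_ols <- sum(y[-1] * y[-n]) / sum(y[-n]^2)
# Yule-Walker: solves gamma(1) = phi * gamma(0) using the sample autocovariances
phi_yw <- ar.yw(y, order.max = 1, aic = FALSE)$ar
c(OLS = phi_ols, YuleWalker = phi_yw)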

Moving-average process parameter estimation


Let's say we want to estimate the parameter of our invertible MA(1)
process (i.e. |θ| < 1):

Yt = εt + θ1 εt−1 ⇒ εt = Yt − θYt−1 + θ²Yt−2 − ...

Let S(θ) = ∑_{t=1}^T εt² and set ε0 = 0. We can find the parameter θ by
minimizing S(θ).

ARMA process parameter estimation


For the ARMA(1,1) process Yt = φYt−1 + εt + θεt−1 we would need to
minimize S(θ, φ) = ∑_{t=1}^T εt² with ε0 = Y0 = 0.
For the ARMA(p,q), we would need to minimize S(θ, φ) by setting
εk = Yk = 0 for k ≤ 0.
We can also estimate the parameters using the maximum
likelihood method.
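A sketch estimating an ARMA(1,1) on simulated data, once by minimizing the conditional sum of squares and once by maximum likelihood, using arima() (the true parameter values are an arbitrary choice):

set.seed(20)
y <- arima.sim(model = list(ar = 0.85, ma = 0.5), n = 500)            # ARMA(1,1) with phi = 0.85, theta = 0.5
arima(y, order = c(1, 0, 1), include.mean = FALSE, method = "CSS")    # conditional sum of squares
arima(y, order = c(1, 0, 1), include.mean = FALSE, method = "ML")     # maximum likelihood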
