Introduction to ARMA Models
Overview
1. Modeling paradigm
3. ARMA processes
7. ARIMA processes
Modeling paradigm
Modeling objective A common measure used to assess many statistical
models is their ability to reduce the input data to random noise. For
example, we often say that a regression model “fits well” if its residuals
ideally resemble iid random noise. With real data we often settle for
residuals that are merely uncorrelated.
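A minimal sketch of this diagnostic idea, assuming residuals from some fitted model are in hand (the toy series and crude lag-one fit below are made up for illustration): compute the sample autocorrelations of the residuals and compare them with the approximate ±1.96/√n bands for white noise.

    import numpy as np

    def sample_acf(x, max_lag):
        """Sample autocorrelations r(1), ..., r(max_lag) of a series x."""
        x = np.asarray(x, dtype=float) - np.mean(x)
        denom = np.sum(x**2)
        return np.array([np.sum(x[h:] * x[:-h]) / denom for h in range(1, max_lag + 1)])

    rng = np.random.default_rng(0)
    n = 500
    w = rng.normal(size=n)
    x = np.empty(n)                     # a toy AR(1) series standing in for "the data"
    x[0] = w[0]
    for t in range(1, n):
        x[t] = 0.6 * x[t - 1] + w[t]

    phi_hat = np.sum(x[1:] * x[:-1]) / np.sum(x[:-1] ** 2)   # crude lag-one regression fit
    resid = x[1:] - phi_hat * x[:-1]

    r = sample_acf(resid, max_lag=10)
    band = 1.96 / np.sqrt(len(resid))   # approximate white-noise bands
    print("residual ACF:", np.round(r, 3))
    print("all lags within the bands?", bool(np.all(np.abs(r) < band)))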
Filters and noise Model the observed time series as the output of an unknown
process (or model) M “driven by” an input sequence composed
of independent random errors $\{\epsilon_t\} \overset{\text{iid}}{\sim} \text{Dist}(0, \sigma^2)$ (not necessarily normal),
$$\epsilon_t \;\to\; \text{Process } M \;\to\; X_t$$
From observing the output, say X1 , . . . , Xn , the modeling task is to
characterize the process (and often predict its course). This “signal
processing” may be more appealing in the context of, say, underwater
acoustics rather than macroeconomic processes.
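A small illustration of this filtering view, with a made-up finite filter standing in for the unknown process M: iid errors go in, a dependent series X_t comes out.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 400
    eps = rng.standard_normal(n)        # iid input errors (normal here only for convenience)

    def process_M(eps):
        """A stand-in for the unknown process M: a short causal linear filter."""
        psi = np.array([1.0, 0.8, 0.5, 0.2])        # hypothetical filter weights
        return np.convolve(eps, psi)[: len(eps)]    # X_t = sum_j psi_j * eps_{t-j}

    x = process_M(eps)
    print("lag-1 sample correlation of X:", round(np.corrcoef(x[1:], x[:-1])[0, 1], 3))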
Prediction rationale If a model reduces the data to iid noise, then the
model captures all of the relevant structure, at least in the sense that
whatever remains cannot be predicted from the past.
Causal, one-sided Our notions of time and causation imply that the current
value of the process cannot depend upon the future (nonanticipating),
allowing us to express the process M as
$$X_t = M(\epsilon_t, \epsilon_{t-1}, \ldots).$$
Expanding M around zero gives a linear term plus higher-order terms,
$$X_t = \sum_j \psi_j\, \epsilon_{t-j} + \sum_{j,k} \psi_{jk}\, \epsilon_{t-j}\epsilon_{t-k} + \cdots, \qquad (1)$$
where, for example, $\psi_{jk} = \frac{\partial^2 M}{\partial \epsilon_{t-j}\, \partial \epsilon_{t-k}}$ evaluated at zero. The first
summand on the right gives the linear expansion.
Invertibility The linear representation (1) suggests a big problem for identifying
and then estimating the process: it resembles a regression in
which all of the explanatory variables are functions of the unobserved
errors. The invertibility condition implies that we can also express the
errors as a weighted sum of current and prior observations,
$$\epsilon_t = \sum_{j=0}^{\infty} \pi_j X_{t-j}.$$
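A sketch of what invertibility buys, using the MA(1) as a standard example (the π weights below are the familiar MA(1) ones, not something derived above): for X_t = w_t + θ w_{t-1} with |θ| < 1, the errors can be recovered with π_j = (−θ)^j.

    import numpy as np

    rng = np.random.default_rng(2)
    theta, n = 0.5, 2000
    w = rng.standard_normal(n)
    x = w.copy()
    x[1:] += theta * w[:-1]                  # MA(1): X_t = w_t + theta * w_{t-1}

    # Truncated inversion: w_t ~ sum_{j=0}^{J} pi_j X_{t-j} with pi_j = (-theta)^j
    J = 30
    pi = (-theta) ** np.arange(J + 1)
    w_hat = np.array([np.dot(pi, x[t::-1][: J + 1]) for t in range(J, n)])

    print("max reconstruction error:", np.max(np.abs(w_hat - w[J:])))   # tiny, since |theta| < 1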
Implication and questions The initial goal of time series modeling using
the class of ARMA models to be defined next amounts to finding a par-
simonious, linear model which can reduce {Xt } to iid noise. Questions
remain about how to do this:
1. Do such infinite sums of random variables exist, and how are they
to be manipulated?
2. What types of stationary processes can this approach capture
(i.e., which covariance functions)?
3. Can one express these models using few parameters?
Mean-square convergence For the partial sums $X_n$ of such an infinite series,
convergence is in mean square: there is a random variable $X$ with
$$E(X_n - X)^2 \to 0.$$
Consequently (by the Cauchy-Schwarz inequality),
$$\sum_j \psi_j \psi_{j+k} \le \sum_j \psi_j^2 < \infty.$$
Informally, with $Y_t = \sum_j \psi_j X_{t-j}$, $\mathrm{Var}(Y_t) = \sum_{j,k} \psi_j \psi_k\, \gamma(k-j) \le \gamma(0) \sum_j \psi_j^2$.
Covariances When the “input” is white noise, the covariances are
infinite sums of products of the coefficients,
$$\gamma_Y(h) = \sigma_X^2 \sum_{j=0}^{\infty} \psi_j \psi_{j+h}. \qquad (2)$$
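A numeric check of (2) under an assumed, made-up set of ψ weights: compute γ_Y(h) from the formula and compare with sample covariances of a long simulated Y_t = Σ_j ψ_j w_{t-j}.

    import numpy as np

    rng = np.random.default_rng(3)
    sigma2 = 1.0
    psi = np.array([1.0, 0.6, 0.3, 0.1])      # hypothetical square-summable weights

    def gamma_theory(h):
        """gamma_Y(h) = sigma^2 * sum_j psi_j psi_{j+h}, equation (2)."""
        return sigma2 * np.sum(psi[: len(psi) - h] * psi[h:]) if h < len(psi) else 0.0

    n = 200_000
    w = rng.normal(scale=np.sqrt(sigma2), size=n)
    y = np.convolve(w, psi)[:n]               # Y_t = sum_j psi_j w_{t-j}
    yc = y - y.mean()

    for h in range(4):
        sample = np.mean(yc[h:] * yc[: len(yc) - h])
        print(h, round(gamma_theory(h), 3), round(sample, 3))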
Absolutely summable S&S often assume that $\sum_j |\psi_j| < \infty$ (absolutely
summable). This is a stronger assumption that simplifies proofs of
a.s. convergence. For example, $\psi_j = 1/j$ is not absolutely summable, but it is
square summable. We will not be too concerned with a.s. convergence
and will focus on mean-square convergence. (The issue is moot for
ARMA processes.)
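A quick numeric illustration of the distinction: partial sums of 1/j keep growing, while partial sums of 1/j² settle down.

    import numpy as np

    j = np.arange(1, 1_000_001, dtype=float)
    print("sum of 1/j   up to 1e6:", round(float(np.sum(1.0 / j)), 2))      # grows like log(n)
    print("sum of 1/j^2 up to 1e6:", round(float(np.sum(1.0 / j**2)), 4))   # approaches pi^2/6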
ARMA Processes
Conflicting goals Obtain models that possess a wide range of covariance
functions (2) and that characterize the ψ_j as functions of a few parameters
that are reasonably easy to estimate. We have seen several such
parsimonious models previously (e.g., low-order autoregressions and moving averages).
ARMA(p, q) process The process {X_t} is an ARMA(p, q) process if:
1. It is stationary.
2. It (or the deviations X_t − E X_t) satisfies the linear difference
equation written in “regression form” (as in S&S, with negative
signs attached to the φs) as
$$X_t = \phi_1 X_{t-1} + \cdots + \phi_p X_{t-p} + w_t + \theta_1 w_{t-1} + \cdots + \theta_q w_{t-q}, \qquad (3)$$
where $w_t \sim WN(0, \sigma^2)$.
Backshift operator Abbreviate the equation (3) using the so-called backshift
operator defined as $B^k X_t = X_{t-k}$. Using B, write (3) as
$$\phi(B) X_t = \theta(B) w_t,$$
where
$$\phi(z) = 1 - \phi_1 z - \cdots - \phi_p z^p \quad\text{and}\quad \theta(z) = 1 + \theta_1 z + \cdots + \theta_q z^q.$$
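A sketch of generating an ARMA(p, q) series directly from the difference equation (3); the coefficients below are arbitrary illustrative values, ordered to match φ(z) and θ(z).

    import numpy as np

    def simulate_arma(phi, theta, n, sigma=1.0, burn=200, seed=0):
        """Simulate X_t = sum_i phi_i X_{t-i} + w_t + sum_j theta_j w_{t-j}, as in (3)."""
        rng = np.random.default_rng(seed)
        p, q = len(phi), len(theta)
        w = rng.normal(scale=sigma, size=n + burn)
        x = np.zeros(n + burn)
        for t in range(n + burn):
            ar = sum(phi[i] * x[t - 1 - i] for i in range(p) if t - 1 - i >= 0)
            ma = sum(theta[j] * w[t - 1 - j] for j in range(q) if t - 1 - j >= 0)
            x[t] = ar + w[t] + ma
        return x[burn:]                    # drop the burn-in so start-up transients fade

    x = simulate_arma(phi=[0.5], theta=[0.4], n=1000)   # an ARMA(1, 1) with made-up coefficients
    print("lag-1 sample correlation:", round(np.corrcoef(x[1:], x[:-1])[0, 1], 3))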
Closure The backshift operator shifts the stochastic process in time. Because
the process is stationary (having a distribution that is invariant
to shifts of the time indices), this transformation maps a stationary process
into another stationary process.
Such operators can be manipulated formally, much like power series, e.g.,
$$S = I + D + D^2/2 + D^3/3! + \cdots + D^m/m!.$$
All moving averages are stationary Under the typical constraint of S&S
that the coefficients of a moving average are absolutely summable, all
moving averages are stationary — even moving averages of other mov-
ing averages.
Proof. Suppose $\sum_j \theta_j^2 < \infty$ and that $X_t$ is stationary with covariance
function $\gamma_X(h)$. The covariance function of $Y_t = \sum_j \theta_j X_{t-j}$ is
$$\begin{aligned}
\mathrm{Cov}(Y_{t+h}, Y_t) &= \mathrm{Cov}\Big(\sum_{j=0}^{\infty} \theta_j X_{t+h-j},\ \sum_{k=0}^{\infty} \theta_k X_{t-k}\Big) \\
&= \sum_{j,k=0}^{\infty} \theta_j \theta_k\, \mathrm{Cov}(X_{t+h-j}, X_{t-k}) \\
&= \sum_{j,k=0}^{\infty} \theta_j \theta_k\, \gamma_X(h-j+k) \\
&\le \sum_j \theta_j^2\, \sum_j |\gamma_X(j)| .
\end{aligned}$$
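A small check of the closure claim (a moving average of a moving average is again a moving average), with made-up coefficient lists: applying one finite filter to the output of another is the same as applying a single filter whose coefficients are the convolution of the two.

    import numpy as np

    rng = np.random.default_rng(4)
    w = rng.standard_normal(5000)

    a = np.array([1.0, 0.5])                # first moving-average filter (illustrative)
    b = np.array([1.0, -0.3, 0.2])          # second filter, applied to the output of the first

    x = np.convolve(w, a)[: len(w)]         # X = moving average of white noise
    y = np.convolve(x, b)[: len(w)]         # Y = moving average of X

    c = np.convolve(a, b)                   # coefficients of the single equivalent moving average
    y2 = np.convolve(w, c)[: len(w)]

    print("max |Y - single MA of w|:", np.max(np.abs(y - y2)))   # essentially zero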
Hence, for the MA(1) process X_t = w_t + θ_1 w_{t-1}, for which γ(0) = σ²(1 + θ_1²) and γ(1) = σ² θ_1,
$$\rho(1) = \frac{\theta_1}{1+\theta_1^2} \le \frac{1}{2},$$
which we can see from a graph or by noting that the maximum occurs
where the derivative
$$\partial \rho(1)/\partial \theta_1 = \frac{1-\theta_1^2}{(1+\theta_1^2)^2} = 0,$$
that is, at θ_1 = 1, where ρ(1) = 1/2.
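A quick numeric confirmation of the bound: evaluate ρ(1) = θ_1/(1 + θ_1²) on a grid and check that the maximum is 1/2, attained at θ_1 = 1.

    import numpy as np

    theta = np.linspace(-5, 5, 100_001)
    rho1 = theta / (1 + theta**2)
    i = int(np.argmax(rho1))
    print("max rho(1) =", round(float(rho1[i]), 6), "at theta_1 =", round(float(theta[i]), 3))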
AR(2) example These processes are interesting because they allow for
complex-valued zeros in the polynomial φ(z). The presence of complex
pairs produces oscillations in the observed process.
For the process to be stationary, we need the zeros of φ(z) to lie outside
the unit circle. If the two zeros are z_1 and z_2, then φ(z) = (1 − z/z_1)(1 − z/z_2).
From the quadratic formula applied to (7), φ_1² + 4φ_2 < 0 implies that
the zeros form a complex conjugate pair.
We know ρ(0) = 1; to find ρ(1), use the equation defined by γ(0)
and γ(1) = γ(−1),
$$\gamma(1) = \phi_1\,\gamma(0) + \phi_2\,\gamma(-1) = \phi_1\,\gamma(0) + \phi_2\,\gamma(1),$$
which shows ρ(1) = φ_1/(1 − φ_2). When the zeros are a complex pair,
ρ(h) = c r^h cos(hλ), a damped sinusoid, and realizations exhibit quasi-periodic
behavior.
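A sketch of this quasi-periodic behavior, using made-up AR(2) coefficients with φ_1² + 4φ_2 < 0 so that the zeros of φ(z) are complex: the sample autocorrelations trace out a damped cosine.

    import numpy as np

    phi1, phi2 = 1.4, -0.85                 # phi1^2 + 4*phi2 = 1.96 - 3.40 < 0: complex zeros
    zeros = np.roots([-phi2, -phi1, 1.0])   # zeros of phi(z) = 1 - phi1*z - phi2*z^2
    print("zeros of phi(z):", zeros, "| moduli:", np.abs(zeros))   # both outside the unit circle

    rng = np.random.default_rng(5)
    n, burn = 3000, 300
    w = rng.standard_normal(n + burn)
    x = np.zeros(n + burn)
    for t in range(2, n + burn):
        x[t] = phi1 * x[t - 1] + phi2 * x[t - 2] + w[t]
    x = x[burn:]

    xc = x - x.mean()
    acf = np.array([np.sum(xc[h:] * xc[: len(xc) - h]) / np.sum(xc**2) for h in range(15)])
    print("sample ACF (damped cosine):", np.round(acf, 2))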
A regression analogy illustrates how a model can fail to be identified: with collinear predictors (say X_2 = 1 − X_1),
$$Y = \beta_0 + \beta_1 X_1 + 0\cdot X_2 + \epsilon \quad\Longleftrightarrow\quad Y = (\beta_0+\beta_1) + 0\cdot X_1 - \beta_1 X_2 + \epsilon.$$
Both models obtain the same fit, but with very different coefficients.
Least squares can find many fits that all obtain the same R2 (the
coefficients lie in a subspace).
Non-causal process. These “odd” processes hint at how models are not
identifiable. Suppose that |φ̃| > 1. Is there a stationary solution to
Xt − φ̃Xt−1 = Zt for some white-noise process Zt ? The surprising
answer is yes, but it’s weird because it runs backwards in time. The
hint that this might happen lies in the symmetry of the covariances,
Cov(Xt+h , Xt ) = Cov(Xt , Xt+h ).
To arrive at this representation, forward-substitute rather than back-substitute.
This flips the coefficient from φ̃ to 1/φ̃, with |1/φ̃| < 1. Start with the
process at time t + 1,
$$X_{t+1} = \tilde\phi X_t + w_{t+1} \quad\Rightarrow\quad X_t = \tfrac{1}{\tilde\phi}\, X_{t+1} - \tfrac{1}{\tilde\phi}\, w_{t+1}.$$
Continuing recursively,
$$X_t = -\sum_{j=0}^{\infty} (1/\tilde\phi)^j\, \tilde w_{t+1+j},$$
where w̃_t = w_t/φ̃. This is the unique stationary solution to the difference
equation X_t − φ̃X_{t−1} = w_t. The process is said to be non-causal
since X_t depends on “future” errors w_s, s > t, rather than those in
the past. If |φ̃| = 1, no stationary solution exists.
Thus, the non-causal process has the same correlation function as the
more familiar causal process with coefficient 1/φ̃, |1/φ̃| < 1,
$$X_t = \phi X_{t-1} + w_t, \qquad \phi = 1/\tilde\phi.$$
(The two versions differ in the error variance: to match the covariance
function exactly, the causal version needs the smaller error variance σ²/φ̃².)
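A numeric sketch of this equivalence, with φ̃ = 2 and σ² = 1 as made-up values: the forward-solved non-causal AR(1) and the causal AR(1) with coefficient 1/2 and error variance σ²/φ̃² produce essentially the same autocovariances.

    import numpy as np

    phi_tilde, sigma2 = 2.0, 1.0
    rng = np.random.default_rng(6)
    n, burn = 200_000, 500

    # Non-causal solution: X_t = -sum_{j>=1} phi_tilde^(-j) * w_{t+j}  (depends on future errors)
    w = rng.normal(scale=np.sqrt(sigma2), size=n + burn)
    coef = -(phi_tilde ** -np.arange(1.0, 60.0))
    x_nc = np.array([np.dot(coef, w[t + 1 : t + 60]) for t in range(n)])

    # Causal AR(1) with coefficient 1/phi_tilde and error variance sigma2 / phi_tilde^2
    v = rng.normal(scale=np.sqrt(sigma2) / phi_tilde, size=n + burn)
    x_c = np.zeros(n + burn)
    for t in range(1, n + burn):
        x_c[t] = x_c[t - 1] / phi_tilde + v[t]
    x_c = x_c[burn:]

    def acov(x, h):
        xc = x - x.mean()
        return np.mean(xc[h:] * xc[: len(xc) - h])

    for h in range(3):
        print(h, round(acov(x_nc, h), 3), round(acov(x_c, h), 3))   # the two columns nearly agree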
Moving averages: one more condition Such issues also appear in the
analysis of moving averages. Consider the covariances of the two processes
$$X_t = w_t + \theta w_{t-1} \quad\text{and}\quad X_t = w_{t-1} + \theta w_{t-2}.$$
The second incorporates a time delay. Since both are finite moving
averages, both are stationary. Is the model identified? It is with the
added condition that ψ0 = 1.
Covariance generating function The covariances of an ARMA process can be collected in the generating function
$$\Gamma(z) = \sigma^2\, \psi(z)\psi(z^{-1}) = \sigma^2\, \frac{\theta(z)\theta(z^{-1})}{\phi(z)\phi(z^{-1})}.$$
Because each polynomial appears together with its reciprocal argument, the
form of Γ(z) is the same whether the zeros go inside or outside the unit circle. They cannot lie
on the unit circle.
Since φ(z) (which has zeros outside the unit circle) and φ(1/z)
(which has zeros inside the unit circle) both appear in the definition
of Γ(z), some authors state the conditions for stationarity in terms of
one polynomial or the other. In any case, no zero can lie on the unit
circle.
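A sketch of this point for the MA(1), with a made-up θ: flipping the zero of θ(z) from inside to outside the unit circle (θ → 1/θ) while rescaling σ² leaves the autocovariances, and hence Γ(z), unchanged.

    import numpy as np

    def ma1_acov(theta, sigma2):
        """Autocovariances gamma(0), gamma(1) of X_t = w_t + theta * w_{t-1}, Var(w_t) = sigma2."""
        return sigma2 * (1 + theta**2), sigma2 * theta

    theta, sigma2 = 2.0, 1.0                        # zero of theta(z) = 1 + 2z at z = -1/2 (inside)
    print(ma1_acov(theta, sigma2))                  # (5.0, 2.0)
    print(ma1_acov(1 / theta, sigma2 * theta**2))   # same covariances: zero flipped to z = -2 (outside)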
ARIMA processes
Nonstationary processes are common in many situations, and these would
at first appear to lie outside the scope of ARMA models (certainly by the
definition of S&S). The use of differencing, via the operator (1 −
B)X_t = X_t − X_{t−1}, changes this.
For example, differencing removes a linear trend in the mean:
$$Y_t = \alpha + \beta t + X_t \quad\Rightarrow\quad (1-B)Y_t = \beta + X_t - X_{t-1}.$$
An ARIMA(p, d, q) model is then an ARMA(p, q) model with X_t replaced by the differenced series (1 − B)^d X_t.
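A short sketch of the d = 1 case, with a made-up linear trend plus AR(1) noise: first differencing removes the trend and leaves a stationary series that an ARMA model could describe.

    import numpy as np

    rng = np.random.default_rng(7)
    n = 1000
    t = np.arange(n)

    # Y_t = alpha + beta*t + X_t, where X_t is a stationary AR(1) component
    w = rng.standard_normal(n)
    x = np.zeros(n)
    for i in range(1, n):
        x[i] = 0.5 * x[i - 1] + w[i]
    y = 2.0 + 0.1 * t + x

    dy = np.diff(y)                         # (1 - B) Y_t = beta + X_t - X_{t-1}
    print("mean of (1-B)Y, approx beta = 0.1:", round(float(dy.mean()), 3))
    print("variance before and after differencing:", round(float(y.var()), 1), round(float(dy.var()), 2))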