Handout: Time Series, Semester II, 2020
Time Series
Data Generating Process
The process generating the realisations of a time series is known as the data generating process (DGP). The underlying factors determining the process are stochastic or random, and thus the DGP is a stochastic process.
In a statistical sense, a time series is a random variable ordered in time. Therefore, a time series variable can be analysed in terms of its probability distribution.
The stochastic process of a time series {yt} is characterised by the marginal distribution function obtained from the joint distribution of (y1, y2, ..., yT):

$$f_t(y_t) = \int \cdots \int f(y_1, y_2, \ldots, y_T)\, dy_1\, dy_2 \cdots dy_{t-1}\, dy_{t+1} \cdots dy_T$$
The marginal distribution function defined in the above equation is characterised by the first and second order moments:

$$\mu_t = E(y_t)$$
$$\sigma_t^2 = E(y_t - \mu_t)^2 = E(y_t^2) - \mu_t^2$$
$$\gamma_{t,k} = \mathrm{cov}(y_t, y_k) = E[(y_t - \mu_t)(y_k - \mu_k)] = E(y_t y_k) - \mu_t \mu_k$$
The DGP of a time series is of two types: stationary and non-stationary. Stationary and
nonstationary processes are different in their properties, and they require different inference
procedures.
A time series variable, yt, is generated by a stationary process, and the variable itself is stationary, if the marginal distribution of the variable is time invariant.
For a time series variable {yt} = (y1, y2, y3, ..., yT), the T-dimensional distribution function is defined by

$$F_t(y_1, y_2, \ldots, y_T) = P(Y_1 \le y_1, \ldots, Y_T \le y_T)$$
The DGP of {yt} is said to be first order stationary if the one-dimensional distribution function of the series is time invariant:

$$F_t(y_i) = F_{t+k}(y_i)$$

The process will be second order stationary if

$$F_t(y_i, y_j) = F_{t+k}(y_i, y_j)$$

In this way, the process is said to be Tth order stationary if

$$F_t(y_1, y_2, \ldots, y_T) = F_{t+k}(y_1, y_2, \ldots, y_T)$$
A time series process is strictly or strongly stationary if it is Tth order stationary, that is, if its entire joint distribution is time invariant. The process is weakly or covariance stationary if the first and second order moments of the marginal distribution are time independent:

1. $E(y_t) = E(y_{t-1}) = \ldots = \mu$
2. $E[(y_t - \mu)^2] = E[(y_{t-1} - \mu)^2] = \ldots = \sigma_y^2 = \mathrm{Var}(y)$
3. $E[(y_t - \mu)(y_{t-k} - \mu)] = \gamma_k$, a function of the lag $k$ only
A time series variable is nonstationary if the marginal distribution of {yt} is time varying. A nonstationary process is in a state of statistical disequilibrium, exhibiting trend, while a stationary series does not follow any trend. For a nonstationary time series, the mean, the variance, or both are time varying:

1. $E(y_t) = \mu_t$
2. $E(y_t - \mu_t)^2 = \sigma_t^2$
3. $E[(y_t - \mu_t)(y_{t-k} - \mu_{t-k})] = \mathrm{Cov}(y_{t-j}, y_{t-j-k}) = \gamma_{k,t}$
Consider, for example, an AR(2) equation:

$$y_t = \phi_0 + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \varepsilon_t$$
It is convenient to use a time-series operator called the lag operator to express equations in
compact form. The lag operator, L(.), is a mathematical operator or function in which the
argument is an element of a time series. By applying the lag operator to yt we get its predecessor
yt–1:
$$y_{t-1} = L y_t,$$
$$y_{t-2} = L y_{t-1} = L(L y_t) = L^2 y_t$$

Similarly, $y_{t-p} = L^p y_t$.
AR(1) process
Suppose that a time series variable, yt, follows an AR(1) process. The DGP of yt following AR(1) is described as

$$y_t = \phi_0 + \phi_1 y_{t-1} + \varepsilon_t$$

By substituting recursively for $y_{t-1} = \phi_0 + \phi_1 y_{t-2} + \varepsilon_{t-1}$ and so on back to $y_0$, we obtain

$$y_t = \phi_0 \sum_{i=0}^{t-1} \phi_1^i + \phi_1^t y_0 + \sum_{i=0}^{t-1} \phi_1^i \varepsilon_{t-i}$$
Here, yt could be interpreted as an aggregation of the entire history of innovations.
In lag operator notation,

$$(1 - \phi_1 L) y_t = \phi_0 + \varepsilon_t$$

Or,

$$y_t = \frac{\phi_0}{1 - \phi_1} + (1 - \phi_1 L)^{-1} \varepsilon_t$$

By expanding the polynomial function on the right-hand side we get the same result, under the restriction that $|\phi_1| < 1$:

$$y_t = \frac{\phi_0}{1 - \phi_1} + \sum_{i=0}^{\infty} \phi_1^i \varepsilon_{t-i}$$
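These moment implications can be checked by simulation. Below is a minimal sketch, with assumed illustrative parameter values (phi0 = 1.0, phi1 = 0.6, unit error variance), verifying that the sample mean and variance of a simulated AR(1) approach the theoretical values $\phi_0/(1-\phi_1)$ and $\sigma^2/(1-\phi_1^2)$:

```python
import numpy as np

# A minimal sketch: simulate y_t = phi0 + phi1*y_{t-1} + e_t and compare
# sample moments with their theoretical stationary values (|phi1| < 1).
rng = np.random.default_rng(42)
phi0, phi1, T = 1.0, 0.6, 10_000

y = np.zeros(T)
eps = rng.standard_normal(T)
for t in range(1, T):
    y[t] = phi0 + phi1 * y[t - 1] + eps[t]

print("sample mean:        ", round(y.mean(), 3))
print("theoretical mean:   ", phi0 / (1 - phi1))          # = 2.5
print("sample variance:    ", round(y.var(), 3))
print("theoretical variance:", 1 / (1 - phi1**2))          # sigma^2/(1-phi1^2)
```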
AR(2) process
The AR(2) process of yt is specified as

$$y_t = \phi_0 + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \varepsilon_t$$

By using the lag operator, the AR(2) process is expressed as

$$(1 - \phi_1 L - \phi_2 L^2) y_t = \phi_0 + \varepsilon_t$$

The lag polynomial function, or AR inverse characteristic polynomial, for AR(2) is

$$\phi(L) = 1 - \phi_1 L - \phi_2 L^2$$

And the AR inverse characteristic equation is

$$\phi(L) = 1 - \phi_1 L - \phi_2 L^2 = 0$$

The corresponding AR characteristic equation is

$$\phi(z) = z^2 - \phi_1 z - \phi_2 = 0$$

Here, $z = \frac{1}{L}$, and the characteristic roots are

$$z_1, z_2 = \frac{\phi_1 \pm \sqrt{\phi_1^2 + 4\phi_2}}{2}$$

We will discuss below how the stochastic behaviour of the AR(2) process depends on the nature of these roots.
Example 1
$$y_t = 0.75 y_{t-1} - 0.125 y_{t-2} + \varepsilon_t$$

The lag polynomial for this process is $1 - 0.75L + 0.125L^2$ and the characteristic equation is

$$z^2 - 0.75z + 0.125 = 0$$
We can find that the characteristic roots are z1 = 0.50 and z2 = 0.25. Both roots are real and less
than one in absolute value, so this AR(2) process is stationary.
Example 2
$$y_t = 1.25 y_{t-1} - 0.25 y_{t-2} + \varepsilon_t$$
$$z^2 - 1.25z + 0.25 = 0$$

The characteristic roots are z1 = 1 and z2 = 0.25. In this case, one root is equal to unity and the other is less than unity, so the AR(2) process is nonstationary, containing one unit root.
Example 3
$$z^2 - 0.2z - 0.35 = 0$$

The characteristic roots, z1 = 0.7 and z2 = -0.5, are both less than 1 in absolute value, and the series will be stationary.
Example 4
$$y_t = 1.6 y_{t-1} - 0.9 y_{t-2} + \varepsilon_t$$
$$z^2 - 1.6z + 0.9 = 0$$

The characteristic roots will be imaginary, and the homogeneous solution has the damped oscillatory form $r^t (A_1 \cos\theta t + A_2 \sin\theta t)$, with damping factor $r = \sqrt{-\phi_2} = \sqrt{0.9}$ and frequency

$$\theta = \cos^{-1}\!\left(\frac{\phi_1}{2\sqrt{-\phi_2}}\right) = \cos^{-1}\!\left(\frac{1.6}{2\sqrt{0.9}}\right) \approx 0.567 \text{ radians}$$
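The stationarity check in all four examples can be automated by computing the roots of the characteristic equation numerically. A small sketch (the example coefficients are taken from above):

```python
import numpy as np

# Check stationarity of y_t = phi1*y_{t-1} + phi2*y_{t-2} + e_t via the roots
# of z^2 - phi1*z - phi2 = 0: stationary when every root has modulus < 1.
examples = {
    "Example 1": (0.75, -0.125),
    "Example 2": (1.25, -0.25),
    "Example 3": (0.20, 0.35),
    "Example 4": (1.60, -0.90),
}

for name, (phi1, phi2) in examples.items():
    roots = np.roots([1.0, -phi1, -phi2])  # coefficients of z^2 - phi1*z - phi2
    moduli = np.abs(roots)
    status = "stationary" if np.all(moduli < 1) else "nonstationary (unit/explosive root)"
    print(f"{name}: roots = {np.round(roots, 3)}, |roots| = {np.round(moduli, 3)} -> {status}")
```

Note that Example 4 is stationary despite the complex roots, because their common modulus $\sqrt{0.9} \approx 0.949$ is below one.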
MA(1) process
The MA(1) process is specified as

$$y_t = \varepsilon_t + \theta_1 \varepsilon_{t-1}$$

Its moments are

$$E(y_t) = 0$$
$$V(y_t) = (1 + \theta_1^2)\sigma^2$$
$$\mathrm{Cov}(y_t, y_{t-k}) = E[(\varepsilon_t + \theta_1 \varepsilon_{t-1})(\varepsilon_{t-k} + \theta_1 \varepsilon_{t-k-1})]$$
$$= E(\varepsilon_t \varepsilon_{t-k}) + \theta_1 E(\varepsilon_t \varepsilon_{t-k-1}) + \theta_1 E(\varepsilon_{t-1} \varepsilon_{t-k}) + \theta_1^2 E(\varepsilon_{t-1} \varepsilon_{t-k-1})$$
$$= \sigma^2(1 + \theta_1^2), \quad \text{for } k = 0$$
$$= \theta_1 \sigma^2, \quad \text{for } k = 1$$
$$= 0, \quad \text{for } k > 1$$
The mean, variance and covariance of an MA(1) series are time independent. Therefore, yt following MA(1) is stationary. Although yt is a function of a white noise variable, it is not itself white noise, because the covariance is not zero for k = 1.
The autocorrelation function of the MA(1) process is 0 beyond lag 1. This fact is important for forecasting.
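The cut-off at lag 1 is easy to see in simulated data. A minimal sketch, with an assumed value theta1 = 0.8:

```python
import numpy as np
from statsmodels.tsa.stattools import acf

# Simulate y_t = e_t + theta1*e_{t-1} and confirm that the sample ACF
# is near zero beyond lag 1, matching the theoretical MA(1) result.
rng = np.random.default_rng(0)
theta1, T = 0.8, 5_000

eps = rng.standard_normal(T + 1)
y = eps[1:] + theta1 * eps[:-1]

print("sample ACF (lags 0-5):", np.round(acf(y, nlags=5), 3))
# Theoretical rho_1 = theta1/(1 + theta1^2) = 0.488; rho_k = 0 for k > 1.
print("theoretical rho_1:", round(theta1 / (1 + theta1**2), 3))
```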
For the MA(2) process, $y_t = \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2}$, the moments are

$$E(y_t) = 0$$
$$V(y_t) = (1 + \theta_1^2 + \theta_2^2)\sigma^2$$
$$\mathrm{Cov}(y_t, y_{t-k}) = E[(\varepsilon_t + \theta_1\varepsilon_{t-1} + \theta_2\varepsilon_{t-2})(\varepsilon_{t-k} + \theta_1\varepsilon_{t-k-1} + \theta_2\varepsilon_{t-k-2})]$$

Expanding and using $E(\varepsilon_t \varepsilon_s) = 0$ for $t \ne s$:

$$\gamma_k = \sigma^2(1 + \theta_1^2 + \theta_2^2), \quad \text{for } k = 0$$
$$= \theta_1(1 + \theta_2)\sigma^2, \quad \text{for } k = 1$$
$$= \theta_2 \sigma^2, \quad \text{for } k = 2$$
$$= 0, \quad \text{for } k > 2$$

Therefore, the mean, variance and covariance are time invariant, but the covariance is not equal to zero for k = 1 and k = 2.

$$\mathrm{Corr}(y_t, y_{t-2}) = \frac{\theta_2}{1 + \theta_1^2 + \theta_2^2}$$
$$\rho_k = 0, \quad \text{for } k > 2$$
Autoregressive Moving Average (ARMA) process
The ARMA(p, q) model is specified as

$$y_t = \phi_0 + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_p y_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q}$$
In this equation, there are p autoregressive terms and q moving average terms. This type of
stochastic process is referred to as an autoregressive moving average process of order (p, q), or
an ARMA(p, q) process. If q = 0, the process is a pure autoregressive process of order p, AR(p).
Similarly, if p = 0, the process is a pure moving average of order q, MA(q).
In lag operator form,

$$\phi(L) y_t = \phi_0 + \theta(L) \varepsilon_t$$

The left-hand side is the autoregressive part of the process, a homogeneous difference equation containing p lags, and the right-hand side is the moving average part containing q lags plus a drift that allows the mean of y to be non-zero.
For example, the ARMA(1,1) process is

$$y_t = \phi_0 + \phi_1 y_{t-1} + \varepsilon_t + \theta_1 \varepsilon_{t-1}$$

We can express it by recursive substitution as

$$y_t = \phi_1^t y_0 + \phi_0 \frac{1 - \phi_1^t}{1 - \phi_1} + \varepsilon_t + (\phi_1 + \theta_1) \sum_{i=1}^{t-1} \phi_1^{i-1} \varepsilon_{t-i} + \theta_1 \phi_1^{t-1} \varepsilon_0$$
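ARMA processes can be represented and simulated conveniently in software. A minimal sketch using statsmodels' ArmaProcess, with assumed values phi1 = 0.6 and theta1 = 0.4; note the lag-polynomial sign convention, in which the AR coefficients enter with a negative sign:

```python
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess

# Represent the ARMA(1,1) process (1 - phi1*L) y_t = (1 + theta1*L) e_t.
phi1, theta1 = 0.6, 0.4
process = ArmaProcess(ar=[1.0, -phi1], ma=[1.0, theta1])

print("stationary:", process.isstationary)
print("invertible:", process.isinvertible)
print("theoretical ACF (lags 0-4):", np.round(process.acf(lags=5), 3))

# generate_sample draws a simulated realisation of the process
y = process.generate_sample(nsample=500)
print("simulated series length:", y.shape[0])
```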
Autocorrelation Function
Autocorrelation refers to the correlation of a time series variable with its own past and future
values.
Autocorrelation is defined as the ratio of the autocovariance to the variance of a time series variable:

$$\rho_k = \frac{\mathrm{cov}(y_t, y_{t-k})}{\mathrm{var}(y_t)} = \frac{\gamma_k}{\gamma_0}$$

The graphical representation of the ACF is called a correlogram.
Consider first the AR(1) process in deviation form, $z_t = \phi_1 z_{t-1} + \varepsilon_t$, where $z_t = y_t - E(y_t)$. Multiplying both sides by $z_{t-k}$,

$$z_t z_{t-k} = \phi_1 z_{t-1} z_{t-k} + \varepsilon_t z_{t-k}$$

Taking expectations on both sides,

$$E(z_t z_{t-k}) = \phi_1 E(z_{t-1} z_{t-k}) + E(\varepsilon_t z_{t-k})$$

Or, $\gamma_k = \phi_1 \gamma_{k-1}$

Here, $\varepsilon_t, \varepsilon_{t-1}, \ldots, \varepsilon_{t-k+1}$ are independent of $y_{t-k}$, so $E(\varepsilon_t z_{t-k}) = 0$ for $k \ge 1$.

Or, $\gamma_k = \phi_1^k \gamma_0$

Therefore,

$$\rho_k = \frac{\gamma_k}{\gamma_0} = \phi_1^k$$

The autocorrelation function converges as long as |ϕ1| < 1. Therefore, |ϕ1| < 1 is a necessary and sufficient condition for covariance stationarity of the AR(1) process. The condition |ϕ1| < 1 also ensures that the AR process is ergodic:

$$\lim_{k \to \infty} \rho_k = 0$$
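The geometric decay $\rho_k = \phi_1^k$ can be verified against a sample ACF. A short sketch, with an assumed value phi1 = 0.7:

```python
import numpy as np
from statsmodels.tsa.stattools import acf

# For a simulated AR(1), the sample ACF should track rho_k = phi1**k.
rng = np.random.default_rng(7)
phi1, T = 0.7, 20_000

y = np.zeros(T)
for t in range(1, T):
    y[t] = phi1 * y[t - 1] + rng.standard_normal()

lags = np.arange(6)
print("sample ACF:     ", np.round(acf(y, nlags=5), 3))
print("theoretical ACF:", np.round(phi1 ** lags, 3))
```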
Now consider the AR(2) process in deviation form:

$$z_t = \phi_1 z_{t-1} + \phi_2 z_{t-2} + \varepsilon_t$$

where $z_t = y_t - E(y_t)$.

Multiplying both sides by $z_{t-k}$,

$$z_t z_{t-k} = \phi_1 z_{t-1} z_{t-k} + \phi_2 z_{t-2} z_{t-k} + \varepsilon_t z_{t-k}$$

After taking expectations on both sides,

$$\gamma_k = \phi_1 \gamma_{k-1} + \phi_2 \gamma_{k-2}$$

Or, $\rho_k = \phi_1 \rho_{k-1} + \phi_2 \rho_{k-2}$

This equation is called the Yule-Walker equation. It is valid for k ≥ 2.

For k = 1, using $\rho_0 = 1$ and $\rho_{-1} = \rho_1$,

$$\rho_1 = \frac{\phi_1}{1 - \phi_2}$$
For k = 2,

$$\rho_2 = \phi_1 \rho_1 + \phi_2$$

or,

$$\rho_2 = \frac{\phi_1^2}{1 - \phi_2} + \phi_2 = \frac{\phi_1^2 + \phi_2(1 - \phi_2)}{1 - \phi_2}$$
We can solve this second order difference equation to find the ACF for AR(2), whose time path is similar to that of yt itself. The explicit solution depends critically on the characteristic roots. For real and unequal roots, the solution will be

$$\rho_k = A_1 h_1^k + A_2 h_2^k$$

at k = 0, $A_1 + A_2 = 1$
at k = 1, $\rho_1 = A_1 h_1 + A_2 h_2$

Therefore,

$$\rho_1 = A_1 h_1 + (1 - A_1) h_2$$

or, $A_1 = \dfrac{\rho_1 - h_2}{h_1 - h_2}$

or, $A_2 = \dfrac{h_1 - \rho_1}{h_1 - h_2}$

Therefore,

$$\rho_k = \frac{\rho_1 - h_2}{h_1 - h_2} h_1^k + \frac{h_1 - \rho_1}{h_1 - h_2} h_2^k = \frac{(\rho_1 - h_2) h_1^k + (h_1 - \rho_1) h_2^k}{h_1 - h_2}$$
Here, $h_1, h_2 = \dfrac{\phi_1 \pm \sqrt{\phi_1^2 + 4\phi_2}}{2}$ are the characteristic roots. Substituting $\rho_1 = \phi_1/(1 - \phi_2)$ and using $h_1 + h_2 = \phi_1$ and $h_1 h_2 = -\phi_2$, the solution can be written as

$$\rho_k = \frac{(1 - h_2^2) h_1^{k+1} - (1 - h_1^2) h_2^{k+1}}{(h_1 - h_2)(1 + h_1 h_2)}$$
If the roots are real and equal, $\phi_1^2 + 4\phi_2 = 0$. In this case,

$$\rho_k = A_1 h^k + A_2 k h^k$$

at k = 0, $A_1 = 1$
at k = 1, $\rho_1 = h + A_2 h$

so that, with $h = \phi_1/2$,

$$A_2 = \frac{\rho_1}{h} - 1 = \frac{\phi_1/(1 - \phi_2)}{\phi_1/2} - 1 = \frac{2}{1 - \phi_2} - 1 = \frac{1 + \phi_2}{1 - \phi_2}$$

Therefore,

$$\rho_k = \left[1 + \frac{1 + \phi_2}{1 - \phi_2}\, k\right] \left(\frac{\phi_1}{2}\right)^k$$
If the roots are complex, $\phi_1^2 + 4\phi_2 < 0$, and the ACF is a damped sine wave:

$$\rho_k = r^k \frac{\sin(\theta k + \Phi)}{\sin \Phi}, \qquad r = \sqrt{-\phi_2}, \qquad \Phi = \tan^{-1}\!\left(\frac{1 - \phi_2}{1 + \phi_2} \tan\theta\right)$$

with $\cos\theta = \phi_1 / (2\sqrt{-\phi_2})$.
Therefore, in the case of AR(2), the autocorrelation function can assume a wide variety of
shapes. In all cases, the magnitude of ρk dies out exponentially fast as the lag k increases if
stationarity restrictions are satisfied. In the case of complex roots, ρk displays a damped sine
wave behaviour with damping factor r, frequency θ, and phase Φ.
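The damped sine wave can be computed directly for Example 4 above. A sketch using statsmodels' theoretical ACF (coefficients taken from that example):

```python
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess

# For y_t = 1.6*y_{t-1} - 0.9*y_{t-2} + e_t the roots are complex, so the ACF
# should oscillate in sign while shrinking geometrically toward zero.
phi1, phi2 = 1.6, -0.9
process = ArmaProcess(ar=[1.0, -phi1, -phi2], ma=[1.0])

r = np.sqrt(-phi2)
theta = np.arccos(phi1 / (2.0 * r))
print("damping factor r:", round(r, 4))     # ~0.9487
print("frequency theta:", round(theta, 4))  # ~0.567 radians

print("rho_k for k = 0..24:")
print(np.round(process.acf(lags=25), 3))    # damped sine wave pattern
```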
For the MA(1) process, $y_t = \varepsilon_t + \theta_1 \varepsilon_{t-1}$,

$$E(y_t y_{t-k}) = E(\varepsilon_t y_{t-k}) + \theta_1 E(\varepsilon_{t-1} y_{t-k})$$

For k = 0,

$$\gamma_0 = \sigma^2 + \theta_1^2 \sigma^2 = (1 + \theta_1^2)\sigma^2$$

For k = 1,

$$E(y_t y_{t-1}) = E(\varepsilon_t y_{t-1}) + \theta_1 E(\varepsilon_{t-1} y_{t-1})$$

or, $\gamma_1 = \theta_1 \sigma^2$

Therefore,

$$\rho_1 = \frac{\theta_1}{1 + \theta_1^2}$$

For $k \ge 2$, $\gamma_k = \rho_k = 0$.

Therefore, the MA(1) process has no correlation beyond lag 1.
For the MA(2) process,

$$y_t = \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2}$$

For k = 0,

$$E(y_t y_t) = E(\varepsilon_t y_t) + \theta_1 E(\varepsilon_{t-1} y_t) + \theta_2 E(\varepsilon_{t-2} y_t)$$

or, $\gamma_0 = \sigma^2 + \theta_1^2 \sigma^2 + \theta_2^2 \sigma^2 = (1 + \theta_1^2 + \theta_2^2)\sigma^2$

For k = 1,

$$E(y_t y_{t-1}) = E(\varepsilon_t y_{t-1}) + \theta_1 E(\varepsilon_{t-1} y_{t-1}) + \theta_2 E(\varepsilon_{t-2} y_{t-1})$$

or, $\gamma_1 = \theta_1 \sigma^2 + \theta_2 \theta_1 \sigma^2 = \theta_1(1 + \theta_2)\sigma^2$

For k = 2,

$$E(y_t y_{t-2}) = E(\varepsilon_t y_{t-2}) + \theta_1 E(\varepsilon_{t-1} y_{t-2}) + \theta_2 E(\varepsilon_{t-2} y_{t-2})$$

or, $\gamma_2 = \theta_2 \sigma^2$

For $k \ge 3$, $\gamma_k = 0$.

Therefore, for an MA(2) process,

$$\rho_1 = \frac{\theta_1(1 + \theta_2)}{1 + \theta_1^2 + \theta_2^2}, \qquad \rho_2 = \frac{\theta_2}{1 + \theta_1^2 + \theta_2^2}, \qquad \rho_k = 0 \text{ for } k \ge 3$$

Therefore, the MA(2) process has no correlation beyond lag 2.
For the ARMA(1,1) process in deviation form,

$$z_t = \phi_1 z_{t-1} + \varepsilon_t + \theta_1 \varepsilon_{t-1}$$

Now, $E(z_t \varepsilon_t) = \sigma^2$ and

$$E(z_t \varepsilon_{t-1}) = (\phi_1 + \theta_1)\sigma^2$$

Multiplying both sides of the process by $z_{t-k}$ and taking expectations,

$$\gamma_0 = \phi_1 \gamma_1 + \sigma^2 + \theta_1(\phi_1 + \theta_1)\sigma^2$$
$$\gamma_1 = \phi_1 \gamma_0 + \theta_1 \sigma^2$$
$$\gamma_k = \phi_1 \gamma_{k-1}, \quad \text{for } k \ge 2$$

Substituting the value of $\gamma_1$ and solving this simple recursion gives

$$\rho_k = \frac{(1 + \phi_1\theta_1)(\phi_1 + \theta_1)}{1 + 2\phi_1\theta_1 + \theta_1^2}\, \phi_1^{k-1}, \quad k \ge 1$$

The autocorrelation function decays exponentially with the increase in lag length k.
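The closed-form ACF can be checked numerically against a package computation. A quick sketch, with assumed values phi1 = 0.6 and theta1 = 0.4:

```python
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess

# Compare the ARMA(1,1) ACF formula above with statsmodels' theoretical ACF.
phi1, theta1 = 0.6, 0.4

rho1 = (1 + phi1 * theta1) * (phi1 + theta1) / (1 + 2 * phi1 * theta1 + theta1**2)
ks = np.arange(1, 6)
rho_formula = rho1 * phi1 ** (ks - 1)

process = ArmaProcess(ar=[1.0, -phi1], ma=[1.0, theta1])
rho_package = process.acf(lags=6)[1:]   # drop rho_0 = 1

print("formula:", np.round(rho_formula, 4))
print("package:", np.round(rho_package, 4))  # the two rows should agree
```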
Partial Autocorrelation Function (PACF)
The partial correlation between two variables measures the degree of relationship between them that is not explained by their correlations with other variables. For example, if we regress a variable y on other variables x1, x2, and x3, the partial correlation between y and x3 is the correlation between y and x3 after eliminating the effects of x1 and x2 on each of them.
Formally, we can define the partial correlation as

$$\rho^* = \frac{\mathrm{cov}(y, x_3 \mid x_1, x_2)}{\sqrt{V(y \mid x_1, x_2)\, V(x_3 \mid x_1, x_2)}}$$
For a time series yt, the partial autocorrelation between yt and yt-k is defined as the conditional correlation between yt and yt-k, conditioning on yt-k+1, ..., yt-1, the set of observations that lie between the time points t and t-k:

$$\rho_k^* = \frac{\mathrm{cov}(y_t, y_{t-k} \mid y_{t-1}, y_{t-2}, \ldots, y_{t-k+1})}{\sqrt{V(y_t \mid y_{t-1}, \ldots, y_{t-k+1})\, V(y_{t-k} \mid y_{t-1}, \ldots, y_{t-k+1})}}$$
Alternatively,

$$\rho_k^* = \mathrm{corr}(y_t, y_{t-k} \mid y_{t-1}, y_{t-2}, \ldots, y_{t-k+1})$$

Or,

$$\rho_k^* = \mathrm{corr}(y_t - \phi_{k1} y_{t-1} - \phi_{k2} y_{t-2} - \cdots - \phi_{k,k-1} y_{t-k+1},\; y_{t-k}) = \mathrm{corr}(\phi_{kk} y_{t-k} + \varepsilon_t,\; y_{t-k}) = \phi_{kk}$$

Therefore, the partial autocorrelation at lag k is equal to the estimated coefficient $\phi_{kk}$ on the last lag in an autoregressive model with k terms.
The coefficient of yt-2 in the linear regression of yt on yt-1 and yt-2 is the partial autocorrelation coefficient of order 2:

$$\rho_2^* = \mathrm{corr}(y_t, y_{t-2} \mid y_{t-1}) = \mathrm{corr}(y_t - \phi_{21} y_{t-1},\; y_{t-2}) = \mathrm{corr}(\phi_{22} y_{t-2} + \varepsilon_t,\; y_{t-2}) = \phi_{22}$$
An invertible MA(q) process can be converted into an AR(∞) process, so its PACF never cuts off, unlike the PACF of an AR(p) with finite p. In short, the PACF of MA models behaves like the ACF of AR models, and the PACF of AR models behaves like the ACF of MA models.
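The regression interpretation of the PACF can be demonstrated directly. A minimal sketch, using data simulated from an assumed AR(1) with phi1 = 0.7, in which $\phi_{kk}$ is obtained as the last OLS coefficient and compared with the packaged PACF:

```python
import numpy as np
from statsmodels.tsa.stattools import pacf

# phi_kk is the coefficient on y_{t-k} in an OLS regression of y_t
# on a constant and its first k lags.
rng = np.random.default_rng(3)
phi1, T = 0.7, 5_000
y = np.zeros(T)
for t in range(1, T):
    y[t] = phi1 * y[t - 1] + rng.standard_normal()

def pacf_by_regression(y, k):
    """Return phi_kk: the last OLS coefficient of y_t on a constant and k lags."""
    Y = y[k:]
    X = np.column_stack([np.ones(len(Y))] + [y[k - j:-j] for j in range(1, k + 1)])
    beta = np.linalg.lstsq(X, Y, rcond=None)[0]
    return beta[-1]

packaged = pacf(y, nlags=3, method="ols")
for k in (1, 2, 3):
    print(f"k={k}: regression {pacf_by_regression(y, k):+.3f}, "
          f"statsmodels {packaged[k]:+.3f}")
# For AR(1): phi_11 ~ 0.7, phi_22 and phi_33 ~ 0, i.e. the PACF cuts off at lag 1.
```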
The shapes of the ACF and PACF for different types of DGP of a time series are summarised briefly in the following chart.

Process      ACF                                            PACF
WN           r_s = 0 for s ≠ 0                              φ_ss = 0 for s ≠ 0
AR(1)        Exponential decay: r_s = φ1^s;                 Spike at lag 1 (at lag p for AR(p)):
             direct decay for φ1 > 0,                       φ11 = r1; φ_ss = 0 for s ≥ 2
             oscillating decay for φ1 < 0
MA(1)        Positive (negative) spike at lag 1             Oscillating (geometric) decay
             for θ1 > 0 (θ1 < 0); r_s = 0 for s ≥ 2         for φ11 > 0 (φ11 < 0)
ARMA(1,1)    Exponential (oscillating) decay from lag 1     Oscillating (exponential) decay from
             if φ1 > 0 (φ1 < 0); decay at lag q             lag 1; φ11 = ρ1; decay after lag p
             for ARMA(p,q)                                  for ARMA(p,q)
Sample Autocorrelation Function
$$r_k = \frac{\sum_{t=k+1}^{T} (y_t - \bar{y})(y_{t-k} - \bar{y})}{\sum_{t=1}^{T} (y_t - \bar{y})^2}$$
The sample ACF plays an important role in identifying an appropriate stochastic process. For
systematic inference concerning ρk, we need the sampling distribution of the estimator rk.
It can be shown that, for an iid random disturbance with finite variance, the sample autocorrelation rk is approximately independently normally distributed with mean 0 and variance 1/T for large T. Hence, approximately 95 percent of the sample autocorrelations should fall within the bounds ±1.96/√T.
A portmanteau test combines the first k sample autocorrelations. The Box-Pierce statistic is

$$Q = T \sum_{j=1}^{k} r_j^2$$

If {yt} has a finite variance, Q is approximately distributed as χ², the sum of squares of independent N(0, 1) random variables, with k degrees of freedom. A large value of Q suggests that the sample autocorrelations of the data are too large to have come from an iid sequence. We therefore reject the iid hypothesis for yt at level α if Q > χ²(1-α)(k), where χ²(1-α)(k) is the 1-α quantile of the χ² distribution with k degrees of freedom. The distribution of the refined Ljung-Box statistic,

$$Q = T(T+2) \sum_{j=1}^{k} \frac{r_j^2}{T - j},$$

is better approximated by the χ² distribution with k degrees of freedom.
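The Ljung-Box version of the test is available directly in statsmodels. A minimal sketch applying it to simulated white noise, where the null of no autocorrelation should not be rejected:

```python
import numpy as np
from statsmodels.stats.diagnostic import acorr_ljungbox

# For pure white noise the Ljung-Box Q statistic should be insignificant,
# so the reported p-value should be large.
rng = np.random.default_rng(11)
white_noise = rng.standard_normal(500)

result = acorr_ljungbox(white_noise, lags=[10], return_df=True)
print(result)   # columns lb_stat (Q) and lb_pvalue; expect p-value well above 0.05
```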
Nonstationarity with unit root
Many economic and financial time series like asset prices, exchange rates and GDP exhibit
trending behaviour or nonstationarity either in mean, or in variance, or in both.
The characteristic roots of the autoregressive or moving average polynomial of an ARMA process help determine the nature of the stochastic behaviour of a time series. If the roots of the AR characteristic equation are less than unity in absolute value, the series is stationary. When a characteristic root lies on the unit circle, the series contains a unit root. The presence of a unit root in an autoregressive process implies that the series is nonstationary, exhibiting a stochastic trend with or without a deterministic trend. A series with a unit root follows a random walk. If a series contains a unit root, it has no tendency to return to a long-run deterministic path. The variance of a series with a unit root is time dependent and increases over time; this type of nonstationarity produces permanent effects from random shocks. A series with no unit roots has constant mean and variance, is stationary, and the effects of shocks on it dissipate over time. This distinction is crucial for economic forecasting.
To understand the concept of a unit root and the behaviour of an AR process, let us start with the AR(1):

$$y_t = \phi_1 y_{t-1} + \varepsilon_t,$$

or, $(1 - \phi_1 L) y_t = \varepsilon_t$

The characteristic equation is

$$z - \phi_1 = 0, \quad \text{where } z = \frac{1}{L}$$

The value of z is the characteristic root. If the characteristic root equals unity, then yt contains a unit root. Therefore, yt following an AR(1) process will contain a unit root when

$$z = \phi_1 = 1$$

In this case the AR(1) becomes the random walk

$$y_t = y_{t-1} + \varepsilon_t$$
In a random walk process, yt is the sum of its initial value and the accumulation of shocks:

$$y_t = y_0 + \sum_{i=0}^{t-1} \varepsilon_{t-i}$$

In the time series literature, the accumulation of shocks is known as the stochastic trend. In this case we have

$$E(y_t) = y_0$$

Therefore, the unconditional mean of yt does not exhibit any trend. However,

$$V(y_t) = E\left(\sum_{i=0}^{t-1} \varepsilon_{t-i}\right)^2 = t\sigma^2$$

$$\mathrm{Cov}(y_t, y_{t-k}) = E\left[\left(\sum_{i=0}^{t-1} \varepsilon_{t-i}\right)\left(\sum_{i=0}^{t-k-1} \varepsilon_{t-k-i}\right)\right] = (t-k)\sigma^2$$
Trend in variance is the outcome of stochastic trend.
The variable yt generated through the random walk model is non-stationary, exhibiting a stochastic trend. But after taking the first difference the series becomes stationary:

$$\Delta y_t = y_t - y_{t-1} = \varepsilon_t$$

Therefore, a time series yt that follows a purely stochastic change over time exhibits a stochastic trend, and the stochastic process of the series is known as a difference stationary process (DSP).
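Both properties of the random walk, the linearly growing variance and the stationarity of the first difference, show up clearly in simulation. A short sketch across many replicated paths:

```python
import numpy as np

# A simulated random walk y_t = y_{t-1} + e_t has variance growing like
# t*sigma^2 across replications, while its first difference is white noise.
rng = np.random.default_rng(5)
T, n_paths = 400, 2_000

eps = rng.standard_normal((n_paths, T))
y = np.cumsum(eps, axis=1)           # each row is one random walk path

for t in (100, 200, 400):
    print(f"Var(y_{t}) across paths: {y[:, t - 1].var():.1f} (theory: {t})")

dy = np.diff(y, axis=1)              # first difference recovers the shocks
print("variance of the differenced series:", round(dy.var(), 3))  # ~1, stable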
For the AR(2) process, the inverse characteristic equation is

$$1 - \phi_1 L - \phi_2 L^2 = 0$$

By setting $z = \frac{1}{L}$ we get the characteristic equation:

$$z^2 - \phi_1 z - \phi_2 = 0$$

The characteristic roots are

$$z = \frac{\phi_1 \pm \sqrt{\phi_1^2 + 4\phi_2}}{2}$$

When at least one of the characteristic roots is equal to unity, the AR(2) series contains a unit root.
If z1 = z2 = 1, there are two unit roots. In this case, ϕ1 = 2 and ϕ2 = -1:

$$y_t = \phi_0 + 2y_{t-1} - y_{t-2} + \varepsilon_t$$

In the case of AR(2), the maximum number of unit roots is 2; the series then has to be differenced twice to obtain a stationary series, and the series is integrated of order 2, I(2). If z1 = 1 and z2 < 1, or z2 = 1 and z1 < 1, the AR(2) series has one unit root; we then have to take the first difference to make it stationary, and the series is integrated of order 1, I(1). If z1 < 1 and z2 < 1, the AR(2) series has no unit root; the series is stationary and is called integrated of order 0, I(0).
Now consider instead a series that grows by a constant increment each period:

$$\Delta y_t = y_t - y_{t-1} = \beta$$

The general solution to this linear difference equation is

$$y_t = y_0 + \beta t$$

Here, yt exhibits a deterministic linear time trend. The actual time behaviour of yt is described in the following way:

$$y_t = \alpha + \beta t + \varepsilon_t, \quad \varepsilon_t \sim N(0, \sigma^2), \quad \mathrm{cov}(\varepsilon_t, \varepsilon_{t+k}) = 0$$

The population regression function is

$$E(y_t) = \alpha + \beta t$$

This is the popular trend model used to estimate growth rates. The trend in the mean value is completely predictable and is known as the deterministic trend.
The variance of yt,

$$V(y_t) = E[y_t - E(y_t)]^2 = E(\varepsilon_t^2) = \sigma^2$$

is time invariant and does not exhibit any trend. Similarly,

$$\mathrm{Cov}(y_t, y_{t-k}) = E[(y_t - E(y_t))(y_{t-k} - E(y_{t-k}))] = E(\varepsilon_t \varepsilon_{t-k}) = 0$$

While the variance and covariance are time invariant, the mean of yt is time variant. Thus, yt is a nonstationary time series exhibiting a deterministic trend, and its stochastic process is known as a trend stationary process (TSP). A deterministic trend is a systematic change in the mean level of a series over time.
TSP:
1. Nonstationary
2. Stationary by detrending
3. Trend in mean - deterministic trend
4. No trend in variance or covariance - no stochastic trend
5. Effect of a shock is transitory; it dies out shortly

DSP:
1. Nonstationary
2. Stationary by taking differences
3. Trend in mean - deterministic trend, or no trend in mean (without deterministic trend)
4. Trend in variance or covariance - stochastic trend. The stochastic trend incorporates all the random shocks (ε1 to εt), which have permanent effects on the level of yt.
5. Effect of a shock is permanent; it persists for a long period
A random walk without drift has no trend in the mean values of the variable.
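The TSP/DSP contrast above can be illustrated numerically: a trend stationary series is made stationary by removing a fitted time trend, a difference stationary series by first differencing. A minimal sketch, with assumed illustrative parameter values:

```python
import numpy as np

# Detrending stabilises a TSP series; differencing stabilises a DSP series.
rng = np.random.default_rng(8)
T = 1_000
t = np.arange(T)

tsp = 2.0 + 0.05 * t + rng.standard_normal(T)     # y_t = alpha + beta*t + e_t
dsp = np.cumsum(0.05 + rng.standard_normal(T))    # random walk with drift

# Detrend the TSP series: regress on a constant and t, keep the residuals.
X = np.column_stack([np.ones(T), t])
resid = tsp - X @ np.linalg.lstsq(X, tsp, rcond=None)[0]
print("detrended TSP variance:", round(resid.var(), 3))          # ~1, stable

# Difference the DSP series: the increments are white noise around the drift.
print("differenced DSP variance:", round(np.diff(dsp).var(), 3))  # ~1, stable
```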
Dickey-Fuller tests assume that a series has at most one unit root. Dickey and Fuller (1979) developed a procedure for testing whether a variable has a unit root. The null hypothesis is that the variable contains a unit root, and the alternative is that the variable was generated by a stationary process.

We can express the AR(1) in three alternative ways:

$$y_t = \phi_1 y_{t-1} + \varepsilon_t$$
$$y_t = \phi_0 + \phi_1 y_{t-1} + \varepsilon_t$$
$$y_t = \phi_0 + \phi_1 y_{t-1} + \beta t + \varepsilon_t$$
Autoregressive unit root tests are based on testing the null hypothesis that the root of the autoregressive polynomial is unity, ϕ1 = 1, against the alternative hypothesis that ϕ1 < 1:

$$H_0: \phi_1 = 1$$
$$H_1: \phi_1 < 1$$

In the third specification, with drift and a time trend in the regression, the null hypothesis is that yt follows a unit root process with no deterministic trend:

$$H_0: \phi_1 = 1, \; \beta = 0$$
$$H_1: \phi_1 < 1, \; \beta \ne 0$$
For testing, each regression is reparameterised by subtracting $y_{t-1}$ from both sides, so the Dickey-Fuller test estimates a model of one of the following forms:

$$\Delta y_t = \gamma y_{t-1} + \varepsilon_t$$
$$\Delta y_t = \alpha_0 + \gamma y_{t-1} + \varepsilon_t$$
$$\Delta y_t = \alpha_0 + \gamma y_{t-1} + \beta t + \varepsilon_t$$

The hypothesis is

$$H_0: \gamma = 0, \qquad H_1: \gamma < 0$$

or

$$H_0: \gamma = 0,\ \beta = 0, \qquad H_1: \gamma < 0,\ \beta \ne 0$$
Here, $\gamma = \phi_1 - 1$. However, such a regression is likely to be affected by serial correlation when yt follows a higher-order autoregression. To control for that, lagged differences of yt are added. If yt follows AR(2), $y_t = \phi_0 + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \varepsilon_t$, the test equation becomes

$$\Delta y_t = \phi_0 + \gamma y_{t-1} + \delta_1 \Delta y_{t-1} + \varepsilon_t$$

with $\gamma = \phi_1 + \phi_2 - 1$ and $\delta_1 = -\phi_2$. In this equation, $\delta_1 \Delta y_{t-1}$ is called the augmented term, and the model is the augmented Dickey-Fuller (ADF) model. The unit root test using the ADF model is called the ADF unit root test. If yt follows AR(2), we need one augmented term to convert the test equation into this form; therefore it is an ADF of order 1. An AR(1) series corresponds to an ADF of order 0.
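The ADF test is implemented in statsmodels. A minimal sketch applying it to a simulated random walk, for which the unit root null should not be rejected; regression='c' includes a drift term, and autolag selects the number of augmented terms:

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

# ADF test on a simulated random walk (true unit root).
rng = np.random.default_rng(13)
y = np.cumsum(rng.standard_normal(500))   # H0 of a unit root should NOT be rejected

stat, pvalue, usedlag, nobs, crit, icbest = adfuller(y, regression="c", autolag="AIC")
print(f"ADF statistic: {stat:.3f}, p-value: {pvalue:.3f}, lags used: {usedlag}")
print("critical values:", {k: round(v, 2) for k, v in crit.items()})
# A p-value above 0.05 means we cannot reject the unit root null for this series.
```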
Cointegration
Classical econometric theory assumes that the random disturbance is white noise, and this assumption remains valid when the variables used in a regression model are stationary.

Consider a linear relationship between two time series variables:

$$x_{1t} = \beta_1 + \beta_2 x_{2t} + \varepsilon_t$$

If the series x1t and x2t are stationary, the random disturbance, $\varepsilon_t = x_{1t} - \beta_1 - \beta_2 x_{2t}$, will be stationary and we can consistently estimate the parameters by using OLS.
Suppose that x1t denotes GDP and x2t denotes the number of car accidents over time. In this example, there is no reason that x1t and x2t are causally related. But if both variables behave as nonstationary random walks, this relationship may produce empirical results in which the R² is quite high and the Durbin-Watson statistic is quite low. This is the symptom of a spurious regression.
In a two-variable framework, if x1t and x2t are both integrated of order 1, each containing a single unit root, and their linear combination, $\varepsilon_t = x_{1t} - \beta_1 - \beta_2 x_{2t}$, is stationary, then x1t and x2t will be cointegrated.

Following Engle and Granger (1987), we can define cointegration in the following way: for two time series x1t and x2t which are both I(d), if there exists a vector β = (β1, β2) such that $\varepsilon_t = x_{1t} - \beta_1 - \beta_2 x_{2t}$ is I(d-b), with d ≥ b > 0, then x1t and x2t are defined as cointegrated of order (d, b).
Engle and Granger (1987) proposed a single-equation method for testing the null hypothesis of no cointegration between a set of I(1) variables. In the first step, the coefficients of a static relationship between the I(1) variables are estimated by ordinary least squares; in the second step, an ADF unit root test is carried out on the residuals. Rejection of the null hypothesis provides evidence in favour of cointegration.
For two I(1) variables, the cointegration equation is

$$x_{1t} = \beta_0 + \beta_1 x_{2t} + u_t$$

If $\hat{u}_t$ has a unit root, then x1t and x2t are not cointegrated. Therefore, to carry out the test we estimate this equation by OLS, and a unit root test is applied to the residuals $\hat{u}_t$:

$$\Delta \hat{u}_t = \gamma \hat{u}_{t-1} + v_t$$

The intercept term is not included because the residuals from a regression equation have zero mean. If the null hypothesis is rejected at a given significance level, we can infer that x1t and x2t are cointegrated of order (1, 1), CI(1, 1).
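The two steps of the Engle-Granger procedure are bundled in statsmodels' coint function, which runs the OLS regression and the residual-based unit root test internally. A minimal sketch on simulated series constructed (as an assumption for illustration) to share one stochastic trend:

```python
import numpy as np
from statsmodels.tsa.stattools import coint

# Engle-Granger test: x1 and x2 are built around a common I(1) component,
# so the null of no cointegration should be rejected.
rng = np.random.default_rng(21)
T = 500

trend = np.cumsum(rng.standard_normal(T))        # common I(1) component
x2 = trend + rng.standard_normal(T)
x1 = 1.0 + 2.0 * trend + rng.standard_normal(T)  # cointegrated with x2

stat, pvalue, crit = coint(x1, x2, trend="c")
print(f"EG test statistic: {stat:.3f}, p-value: {pvalue:.4f}")
print("critical values (1%, 5%, 10%):", np.round(crit, 2))
# A small p-value rejects the null of no cointegration between x1 and x2.
```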