
Topic 9: Unit Roots, Cointegration and VECM

Dr. O. B. Aworinde
Department of Economics
Babcock University
Ilishan-Remo

These presentation notes are based on Introductory Econometrics for Finance, second edition, by Chris Brooks.
Stationarity and Unit Root Testing
Why do we need to test for Non-Stationarity?
• The stationarity or otherwise of a series can strongly influence its
behaviour and properties - e.g. persistence of shocks will be infinite for
nonstationary series

• Spurious regressions. If two variables are trending over time, a regression


of one on the other could have a high R² even if the two are totally unrelated

• If the variables in the regression model are not stationary, then it can be
proved that the standard assumptions for asymptotic analysis will not be
valid. In other words, the usual “t-ratios” will not follow a t-distribution, so
we cannot validly undertake hypothesis tests about the regression
parameters.

2
Two types of Non-Stationarity

• Various definitions of non-stationarity exist


• There are two models which have been frequently used to characterise non-stationarity: the random walk model with drift:

yt = μ + yt-1 + ut (1)

and the deterministic trend process:

yt = α + βt + ut (2)

where ut is white noise in both cases.

3
Sample Plots for various Stochastic Processes:
A White Noise Process

[Figure: plot of a simulated white noise process]

4
Sample Plots for various Stochastic Processes:
A Random Walk and a Random Walk with Drift

[Figure: plots of a simulated random walk and a random walk with drift]

5
Sample Plots for various Stochastic Processes:
A Random Walk and a Random Walk with Drift

• In a random walk of the form yt = b0 + b1yt-1 + ut, the coefficient b1 = 1.

• If b0 = 0, it is a random walk without drift,
and if b0 ≠ 0, it is a random walk with drift.

6
Sample Plots for various Stochastic Processes:
A Deterministic Trend Process

[Figure: plot of a simulated deterministic trend process]

7
Autoregressive Processes with differing values of φ (0, 0.8, 1)

[Figure: plots of simulated AR(1) processes with φ = 1, φ = 0.8 and φ = 0]

8
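To make these processes concrete, here is a minimal simulation sketch in Python (numpy only; the seed, sample size and variable names are illustrative assumptions, not part of the original slides). It generates the three AR(1) series yt = φyt-1 + ut plotted above, with φ = 0, 0.8 and 1:

import numpy as np

np.random.seed(0)
T = 1000
u = np.random.normal(size=T)          # white-noise innovations

def ar1(phi, u):
    # Simulate y_t = phi * y_{t-1} + u_t, starting from y_0 = 0
    y = np.zeros(len(u))
    for t in range(1, len(u)):
        y[t] = phi * y[t - 1] + u[t]
    return y

series_phi0 = ar1(0.0, u)     # phi = 0: white noise
series_phi08 = ar1(0.8, u)    # phi = 0.8: stationary but persistent
series_phi1 = ar1(1.0, u)     # phi = 1: unit root (random walk)

Plotting the three series reproduces the qualitative pattern in the figure: the φ = 1 series wanders with no tendency to return to its starting value, while the stationary series keep crossing zero.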
Definition of Non-Stationarity

• Consider again the simplest stochastic trend model:

yt = yt-1 + ut
or Δyt = ut
• We can generalise this concept to consider the case where the series
contains more than one “unit root”. That is, we would need to apply the
first difference operator, Δ, more than once to induce stationarity.

Definition
If a non-stationary series yt must be differenced d times before it becomes
stationary, then it is said to be integrated of order d. We write yt ∼ I(d).
So if yt ∼ I(d) then Δᵈyt ∼ I(0).
An I(0) series is a stationary series.
An I(1) series contains one unit root,
e.g. yt = yt-1 + ut
9
Characteristics of I(0), I(1) and I(2) Series

• An I(2) series contains two unit roots and so would require differencing
twice to induce stationarity.

• I(1) and I(2) series can wander a long way from their mean value and
cross this mean value rarely.

• I(0) series should cross the mean frequently.

• The majority of economic and financial series contain a single unit root,
although some are stationary and consumer prices have been argued to
have 2 unit roots.

10
How do we test for a unit root?

• The early and pioneering work on testing for a unit root in time series
was done by Dickey and Fuller (Dickey and Fuller 1979, Fuller 1976).
The basic objective of the test is to test the null hypothesis that φ = 1 in:
yt = φyt-1 + ut
against the one-sided alternative hypothesis that φ < 1. So we have
H0: series contains a unit root
vs. H1: series is stationary.

• We usually use the regression:

Δyt = ψyt-1 + ut
so that a test of φ = 1 is equivalent to a test of ψ = 0 (since φ − 1 = ψ).

11
Different forms for the DF Test Regressions

• Dickey-Fuller tests are also known as τ tests: τ, τμ, ττ.

• The null (H0) and alternative (H1) models in each case are
i) H0: yt = yt-1 + ut
H1: yt = φyt-1 + ut, φ < 1
This is a test for a random walk against a stationary autoregressive process of
order one (AR(1)).
ii) H0: yt = yt-1 + ut
H1: yt = φyt-1 + μ + ut, φ < 1
This is a test for a random walk against a stationary AR(1) with drift.
iii) H0: yt = yt-1 + ut
H1: yt = φyt-1 + μ + λt + ut, φ < 1
This is a test for a random walk against a stationary AR(1) with drift and a
time trend.

12
Computing the DF Test Statistic

• We can write
Δyt = ut
where Δyt = yt − yt-1, and the alternatives may be expressed as
Δyt = ψyt-1 + μ + λt + ut
with μ = λ = 0 in case i), λ = 0 in case ii), and ψ = φ − 1. In each case, the
tests are based on the t-ratio on the yt-1 term in the estimated regression of
Δyt on yt-1, plus a constant in case ii) and a constant and trend in case iii).
The test statistic is defined as

test statistic = ψ̂ / SE(ψ̂)
• The test statistic does not follow the usual t-distribution under the null,
since the null is one of non-stationarity, but rather follows a non-standard
distribution. Critical values are derived from Monte Carlo experiments in,
for example, Fuller (1976). Relevant examples of the distribution are
shown in table 4.1 below
13
Critical Values for the DF Test

Significance level                10%      5%      1%
C.V. for constant but no trend   −2.57    −2.86   −3.43
C.V. for constant and trend      −3.12    −3.41   −3.96

Table 4.1: Critical Values for DF and ADF Tests (Fuller, 1976, p. 373).

The null hypothesis of a unit root is rejected in favour of the stationary alternative
in each case if the test statistic is more negative than the critical value.

14
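To illustrate how the test statistic ψ̂ / SE(ψ̂) is obtained and compared with these critical values, here is a minimal sketch using Python and statsmodels OLS (the simulated random-walk data and all variable names are illustrative assumptions):

import numpy as np
import statsmodels.api as sm

np.random.seed(0)
y = np.cumsum(np.random.normal(size=500))      # simulated random walk (true null)

dy = np.diff(y)                                 # Δy_t
X = sm.add_constant(y[:-1])                     # constant plus y_{t-1}
res = sm.OLS(dy, X).fit()                       # DF regression: Δy_t = a + ψ y_{t-1} + u_t

df_stat = res.params[1] / res.bse[1]            # test statistic = ψ̂ / SE(ψ̂)
print("DF statistic:", df_stat)
print("Reject unit root at 5%?", df_stat < -2.86)   # C.V. for constant, no trend

Because the statistic follows the non-standard Dickey-Fuller distribution, it is compared with the tabulated values above rather than with the usual t-tables.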
The Augmented Dickey Fuller (ADF) Test

• The tests above are only valid if ut is white noise. In particular, ut will be
autocorrelated if there was autocorrelation in the dependent variable of the
regression (Δyt) which we have not modelled. The solution is to “augment”
the test using p lags of the dependent variable. The alternative model in
case (i) is now written:

Δyt = ψyt-1 + Σ_{i=1}^{p} αi Δyt-i + ut

• The same critical values from the DF tables are used as before. A problem
now arises in determining the optimal number of lags of the dependent
variable.
There are 2 ways
- use the frequency of the data to decide
- use information criteria
15
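In practice the ADF regression and lag selection are usually handled by a library routine. A minimal sketch with statsmodels' adfuller (an assumed tool choice; y can be any one-dimensional series, here a simulated random walk):

import numpy as np
from statsmodels.tsa.stattools import adfuller

np.random.seed(0)
y = np.cumsum(np.random.normal(size=500))       # simulated I(1) series

# regression="c": constant, no trend; autolag="AIC": lag length chosen by information criterion
adf_stat, pvalue, usedlag, nobs, crit, icbest = adfuller(y, regression="c", autolag="AIC")
print("ADF statistic  :", adf_stat)
print("Lags used      :", usedlag)
print("Critical values:", crit)
print("p-value        :", pvalue)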
Testing for Higher Orders of Integration

• Consider the simple regression:

Δyt = ψyt-1 + ut
We test H0: ψ = 0 vs. H1: ψ < 0.
• If H0 is rejected, we simply conclude that yt does not contain a unit root.
• But what do we conclude if H0 is not rejected? The series contains a unit
root, but is that it? No! What if yt ∼ I(2)? We would still not have rejected. So
we now need to test
H0: yt ∼ I(2) vs. H1: yt ∼ I(1)
We would continue to test for a further unit root until we rejected H0.
• We now regress Δ²yt on Δyt-1 (plus lags of Δ²yt if necessary).
• Now we test H0: Δyt ∼ I(1), which is equivalent to H0: yt ∼ I(2).
• So in this case, if we do not reject (unlikely), we conclude that yt is at least
I(2).
16
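The sequence of tests described above can be written as a short loop: keep differencing until the ADF null is rejected. A minimal sketch (statsmodels assumed; the 5% decision rule and the simulated I(2) series are illustrative):

import numpy as np
from statsmodels.tsa.stattools import adfuller

np.random.seed(1)
y = np.cumsum(np.cumsum(np.random.normal(size=300)))   # simulated I(2) series

d, series = 0, y
while d < 4 and adfuller(series, regression="c", autolag="AIC")[1] > 0.05:   # [1] is the p-value
    series = np.diff(series)    # difference once more and re-test
    d += 1
print("Series appears to be I(%d)" % d)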
The Phillips-Perron Test

• Phillips and Perron have developed a more comprehensive theory of unit


root nonstationarity. The tests are similar to ADF tests, but they incorporate
an automatic correction to the DF procedure to allow for autocorrelated
residuals.

• The tests usually give the same conclusions as, and suffer from most of the
same important limitations as, the ADF tests.

17
Criticism of Dickey-Fuller and
Phillips-Perron-type tests

• Main criticism is that the power of the tests is low if the process is
stationary but with a root close to the non-stationary boundary.
e.g. the tests are poor at deciding if
φ = 1 or φ = 0.95,
especially with small sample sizes.

• If the true data generating process (dgp) is


yt = 0.95yt-1 + ut
then the null hypothesis of a unit root should be rejected.

• One way to get around this is to use a stationarity test as well as the unit
root tests we have looked at.

18
Stationarity tests

• Stationarity tests have


H0: yt is stationary
versus H1: yt is non-stationary

So that by default under the null the data will appear stationary.

• One such stationarity test is the KPSS test (Kwiatkowski, Phillips, Schmidt
and Shin, 1992).

• Thus we can compare the results of these tests with the ADF/PP procedure
to see if we obtain the same conclusion.

19
Stationarity tests (cont’d)

• A Comparison

ADF / PP               KPSS
H0: yt ∼ I(1)          H0: yt ∼ I(0)
H1: yt ∼ I(0)          H1: yt ∼ I(1)

• 4 possible outcomes

Reject H0 and Do not reject H0


Do not reject H0 and Reject H0
Reject H0 and Reject H0
Do not reject H0 and Do not reject H0
20
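The comparison above is easy to carry out in code. A minimal sketch running both tests on the same series with statsmodels (the tool choice and the 5% cut-offs are assumptions):

import numpy as np
from statsmodels.tsa.stattools import adfuller, kpss

np.random.seed(2)
y = np.cumsum(np.random.normal(size=400))       # simulated I(1) series

adf_p = adfuller(y, regression="c", autolag="AIC")[1]
kpss_stat, kpss_p, _, _ = kpss(y, regression="c", nlags="auto")

# ADF: H0 is a unit root.  KPSS: H0 is stationarity.
print("ADF  p-value:", adf_p,  "->", "unit root not rejected" if adf_p > 0.05 else "unit root rejected")
print("KPSS p-value:", kpss_p, "->", "stationarity rejected" if kpss_p < 0.05 else "stationarity not rejected")

If both tests point to I(1) (the first of the four outcomes listed above), the evidence for a unit root is reinforced; conflicting results suggest caution.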
Cointegration: An Introduction

• In most cases, if we combine two variables which are I(1), then the
combination will also be I(1).

• More generally, if we combine variables with differing orders of


integration, the combination will have an order of integration equal to the
largest. i.e.,
if Xi,t ∼ I(di) for i = 1, 2, 3, ..., k
so we have k variables each integrated of order di.

Let
zt = Σ_{i=1}^{k} αi Xi,t        (1)

Then zt ∼ I(max di)

21
Linear Combinations of Non-stationary Variables

• Rearranging (1), we can write

X1,t = Σ_{i=2}^{k} βi Xi,t + z′t

where βi = −αi/α1 and z′t = zt/α1, for i = 2, ..., k

• This is just a regression equation.

• But the disturbances would have some very undesirable properties: z′t is
not stationary and is autocorrelated if all of the Xi are I(1).

• We want to ensure that the disturbances are I(0). Under what circumstances
will this be the case?
22
Definition of Cointegration (Engle & Granger, 1987)

• Let zt be a k × 1 vector of variables; then the components of zt are cointegrated
of order (d, b) if
i) All components of zt are I(d)
ii) There is at least one vector of coefficients α such that α′zt ∼ I(d − b)

• Many time series are non-stationary but “move together” over time.
• If variables are cointegrated, it means that a linear combination of them will
be stationary.
• There may be up to r linearly independent cointegrating relationships (where
r ≤ k − 1), also known as cointegrating vectors. r is also known as the
cointegrating rank of zt.
• A cointegrating relationship may also be seen as a long-term relationship.

23
Equilibrium Correction or Error Correction Models

• When the concept of non-stationarity was first considered, a usual


response was to independently take the first differences of a series of I(1)
variables.

• The problem with this approach is that pure first difference models have no
long run solution.
e.g. Consider yt and xt both I(1).
The model we may want to estimate is
Δyt = βΔxt + ut
But this collapses to nothing in the long run.

• The definition of the long run that we use is where

yt = yt-1 = y;  xt = xt-1 = x.
• Hence all the difference terms will be zero, i.e. Δyt = 0; Δxt = 0.
24
Specifying an ECM

• One way to get around this problem is to use both first difference and levels
terms, e.g.
Δyt = β1Δxt + β2(yt-1 − γxt-1) + ut (2)
• yt-1 − γxt-1 is known as the error correction term.

• Providing that yt and xt are cointegrated with cointegrating coefficient γ,

then (yt-1 − γxt-1) will be I(0) even though the constituents are I(1).

• We can thus validly use OLS on (2).

• The Granger representation theorem shows that any cointegrating


relationship can be expressed as an equilibrium correction model.

25
Testing for Cointegration in Regression

• The model for the equilibrium correction term can be generalised to


include more than two variables:
yt = β1 + β2x2t + β3x3t + … + βkxkt + ut (3)

• ut should be I(0) if the variables yt, x2t, ... xkt are cointegrated.

• So what we want to do is test the residuals of equation (3) to see whether

they are non-stationary or stationary. We can use the DF / ADF test on ût.
So we have the regression
Δût = ψût-1 + vt
with vt ∼ iid.
• However, since this is a test on the residuals of an actual model, ût,
the critical values are changed.

26
Methods of Parameter Estimation in
Cointegrated Systems:
The Engle-Granger Approach

• There are (at least) 3 methods we could use: Engle Granger, Engle and Yoo,
and Johansen.
• The Engle Granger 2 Step Method
This is a single equation technique which is conducted as follows:
Step 1:
- Make sure that all the individual variables are I(1).
- Then estimate the cointegrating regression using OLS.
- Save the residuals of the cointegrating regression, ût.
- Test these residuals to ensure that they are I(0).
Step 2:
- Use the step 1 residuals as one variable in the error correction model, e.g.
Δyt = β1Δxt + β2ût-1 + ut
where ût-1 = yt-1 − γ̂xt-1
27
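A minimal sketch of the two steps in Python, using OLS for the cointegrating regression and statsmodels' coint for the residual-based test with the adjusted critical values mentioned above (the simulated data and variable names are purely illustrative):

import numpy as np
import statsmodels.api as sm
from statsmodels.tsa.stattools import coint

np.random.seed(3)
x = np.cumsum(np.random.normal(size=400))      # I(1) regressor
y = 0.5 * x + np.random.normal(size=400)       # cointegrated with x

# Step 1: cointegrating regression, save residuals, test them for a unit root
step1 = sm.OLS(y, sm.add_constant(x)).fit()
u_hat = step1.resid
eg_stat, eg_pvalue, _ = coint(y, x)            # Engle-Granger test (adjusted critical values)
print("Engle-Granger p-value:", eg_pvalue)

# Step 2: error correction model using the lagged step-1 residuals
dy, dx = np.diff(y), np.diff(x)
Z = sm.add_constant(np.column_stack([dx, u_hat[:-1]]))
ecm = sm.OLS(dy, Z).fit()
print(ecm.params)    # constant, coefficient on Δx, coefficient on the error correction term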
The Engle-Granger Approach: Some Drawbacks

This method suffers from a number of problems:


1. Unit root and cointegration tests have low power in finite samples
2. We are forced to treat the variables asymmetrically and to specify one as
the dependent and the other as independent variables.
3. Cannot perform any hypothesis tests about the actual cointegrating
relationship estimated at stage 1.

- Problem 1 is a small sample problem that should disappear


asymptotically.
- Problem 2 is addressed by the Johansen approach.
- Problem 3 is addressed by the Engle and Yoo approach or the Johansen
approach.

28
The Engle & Yoo 3-Step Method

• One of the problems with the EG 2-step method is that we cannot make
any inferences about the actual cointegrating regression.

• The Engle & Yoo (EY) 3-step procedure takes its first two steps from EG.

• EY add a third step giving updated estimates of the cointegrating vector


and its standard errors.

• The most important problem with both these techniques is that in the
general case above, where we have more than two variables which may be
cointegrated, there could be more than one cointegrating relationship.

• In fact there can be up to r linearly independent cointegrating vectors

(where r ≤ g − 1), where g is the number of variables in total.
29
The Engle & Yoo 3-Step Method (cont’d)

• So, in the case where we just had y and x, then r can only be one or zero.

• But in the general case there could be more cointegrating relationships.

• And if there are others, how do we know how many there are or whether
we have found the “best”?

• The answer to this is to use a systems approach to cointegration which will


allow determination of all r cointegrating relationships - Johansen’s
method.

30
Testing for and Estimating Cointegrating Systems Using
the Johansen Technique Based on VARs
The Johansen Test and Eigenvalues

• Some properties of the eigenvalues of any square matrix A:


1. the sum of the eigenvalues is the trace
2. the product of the eigenvalues is the determinant
3. the number of non-zero eigenvalues is the rank

• Returning to Johansen’s test, the VECM representation of the VAR was

Δyt = Πyt-1 + Γ1Δyt-1 + Γ2Δyt-2 + ... + Γk-1Δyt-(k-1) + ut

• The test for cointegration between the y’s is calculated by looking at the
rank of the Π matrix via its eigenvalues.

• The rank of a matrix is equal to the number of its characteristic roots


(eigenvalues) that are different from zero.
31
The Johansen Test and Eigenvalues (cont’d)

• The eigenvalues, denoted λi, are put in order:

λ1 ≥ λ2 ≥ ... ≥ λg
• If the variables are not cointegrated, the rank of Π will not be
significantly different from zero, so λi ≈ 0 ∀ i.
Then if λi = 0, ln(1 − λi) = 0.
If the λ’s are roots, they must be less than 1 in absolute value.

• Say rank(Π) = 1, then ln(1 − λ1) will be negative and ln(1 − λi) = 0 ∀ i > 1.

• If an eigenvalue λi is non-zero, then ln(1 − λi) < 0.

32
The Johansen Test Statistics

• The test statistics for cointegration are formulated as

λtrace(r) = −T Σ_{i=r+1}^{g} ln(1 − λ̂i)

and λmax(r, r+1) = −T ln(1 − λ̂r+1)

where λ̂i is the estimated value for the ith ordered eigenvalue from the
Π matrix.
λtrace tests the null that the number of cointegrating vectors is less than or
equal to r against an unspecified alternative.
λtrace = 0 when all the λi = 0, so it is a joint test.
λmax tests the null that the number of cointegrating vectors is r against
an alternative of r + 1.

33
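A minimal sketch of the Johansen test using statsmodels' coint_johansen (an assumed tool choice; the two simulated series share one stochastic trend, so one cointegrating vector should be found):

import numpy as np
from statsmodels.tsa.vector_ar.vecm import coint_johansen

np.random.seed(4)
trend = np.cumsum(np.random.normal(size=400))              # common stochastic trend
data = np.column_stack([
    trend + np.random.normal(size=400),
    0.7 * trend + np.random.normal(size=400),
])                                                          # two I(1) series, cointegrated

# det_order=0: constant term; k_ar_diff=1: one lagged difference in the VECM
result = coint_johansen(data, det_order=0, k_ar_diff=1)
print("Eigenvalues        :", result.eig)
print("Trace statistics   :", result.lr1)    # lambda_trace for r = 0, 1, ...
print("Trace crit. values :", result.cvt)    # columns: 90%, 95%, 99%
print("Max-eig statistics :", result.lr2)    # lambda_max for r = 0, 1, ...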
Johansen Critical Values

• Johansen & Juselius (1990) provide critical values for the 2 statistics.
The distribution of the test statistics is non-standard. The critical values
depend on:
1. the value of g-r, the number of non-stationary components
2. whether a constant and / or trend are included in the regressions.

• If the test statistic is greater than the critical value from Johansen’s
tables, reject the null hypothesis that there are r cointegrating vectors in
favour of the alternative that there are more than r.

34
The Johansen Testing Sequence

• The testing sequence under the null is r = 0, 1, ..., g-1


so that the hypotheses for λtrace are

H0: r = 0 vs H1: 0 < r ≤ g
H0: r = 1 vs H1: 1 < r ≤ g
H0: r = 2 vs H1: 2 < r ≤ g
... ... ...
H0: r = g−1 vs H1: r = g

• We keep increasing the value of r until we no longer reject the null.

35
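Continuing from the coint_johansen result in the earlier sketch, this testing sequence can be written as a short loop over r, stopping at the first null that is not rejected (the 95% critical values are assumed to sit in column 1 of result.cvt):

# Keep increasing r until the trace statistic no longer exceeds the 95% critical value
rank = 0
for r in range(len(result.lr1)):
    if result.lr1[r] > result.cvt[r, 1]:     # column 1: 95% critical value
        rank = r + 1
    else:
        break
print("Selected cointegrating rank:", rank)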
Interpretation of Johansen Test Results

• But how does this correspond to a test of the rank of the Π matrix?

• r is the rank of Π.

• Π cannot be of full rank (g) since this would correspond to the original yt
being stationary.

• If Π has zero rank, then by analogy to the univariate case, Δyt depends only
on Δyt-j and not on yt-1, so that there is no long-run relationship between the
elements of yt-1. Hence there is no cointegration.

• For 1 < rank(Π) < g, there are multiple cointegrating vectors.

36
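Once the rank of Π has been determined, the VECM itself can be estimated. A minimal sketch with statsmodels' VECM class, continuing with the data array from the Johansen sketch (the deterministic specification "ci", a constant inside the cointegrating relation, is an illustrative choice):

from statsmodels.tsa.vector_ar.vecm import VECM

model = VECM(data, k_ar_diff=1, coint_rank=1, deterministic="ci")
res = model.fit()
print("alpha (adjustment coefficients):\n", res.alpha)
print("beta  (cointegrating vectors)  :\n", res.beta)
print("gamma (short-run dynamics)     :\n", res.gamma)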
Hypothesis Testing Using Johansen

• EG did not allow us to do hypothesis tests on the cointegrating relationship


itself, but the Johansen approach does.
• If there exist r cointegrating vectors, only these linear combinations will be
stationary.
• You can test a hypothesis about one or more coefficients in the
cointegrating relationship by viewing the hypothesis as a restriction on the
Π matrix.
• All linear combinations of the cointegrating vectors are also cointegrating
vectors.
• If the number of cointegrating vectors is large, and the hypothesis under
consideration is simple, it may be possible to recombine the cointegrating
vectors to satisfy the restrictions exactly.

37
ARDL Approach to Cointegration
• The Autoregressive Distributed Lag (ARDL) approach developed by
Pesaran, Shin and Smith (2001) for testing the presence of a
cointegrating relationship has several advantages over other
cointegration tests.
• First, the ARDL approach can be applied to variables with different
orders of integration (Pesaran and Pesaran, 1997), that is, when the
variables are a mix of I(0) and I(1).
• Second, the ARDL approach is applicable to small or finite sample
sizes (Pesaran et al., 2001).
• Third, the short- and long-run parameters are estimated concurrently.
• Fourth, the approach can accommodate structural breaks in time
series data.

38
ARDL Approach to Cointegration

• To use this approach, the dependent variable must be an I(1)
variable; otherwise the approach cannot be used.
• The ARDL method involves four steps.
• The first step is to examine the presence of cointegration using the
bounds testing procedure (Pesaran and Pesaran, 1997; Pesaran,
Shin and Smith, 2001).
• The second step is to estimate the coefficients of the long-run
relationship identified in the first step. The ARDL model is estimated
with the lag length chosen by an appropriate selection criterion, such
as the Schwarz Bayesian Criterion (SBC), since only an appropriate
lag length will identify the true dynamics of the model.
• The third step is to estimate the short-run dynamic coefficients.
39
ARDL Approach to Cointegration

• The fourth stage involves testing the stability of the model using the
CUSUM and CUSUMSQ tests. From the second stage, not only are
estimates of the long-run elasticities obtained, but the CUSUM and
CUSUMSQ tests are also applied to the residuals of the estimated
equation to test the stability of the long-run elasticities, taking into
account the short-run dynamics.
• Given the long-run relationship Y = f(X), this is expressed in structural
form as:

Y = b0 + b1X + εt        (1)


• Here it is assumed that Y is stationary at first difference and X is
stationary at levels.

40
ARDL Approach to Cointegration

• Thus, to distinguish the short-run effects from the long-run effects,
equation (1) is specified in an error-correction modelling form.
Following Pesaran et al.’s (2001) bounds testing approach, we
rewrite (1) as follows:

ΔLnYt = α + Σ_{i=1}^{p} βi ΔLnYt-i + Σ_{i=0}^{p} δi ΔLnXt-i + ρ0 LnYt-1 + ρ1 LnXt-1 + εt        (2)

• In this set-up, the null of no cointegration, H0: ρ0 = ρ1 = 0, is tested
against the alternative H1: ρ0 ≠ 0, ρ1 ≠ 0, by an F-test.
• The parameters ρ0 and ρ1 are the long-run multipliers, the parameters
α, βi and δi are the short-run multipliers, and εt represents the residuals.

41
ARDL Approach to Cointegration

• The asymptotic distribution of the F-statistic is non-standard


irrespective of whether the variables are I(0) or I(1).
• Pesaran et al. (2001) tabulated two sets of appropriate critical
values. One set assumes all variables are I(1) and the other assumes
that they are all I(0).
• This provides a band covering all possible classifications of the
variables into I(0) and I(1), or even fractionally integrated.
• If the calculated F-statistic lies above the upper bound, the null is
rejected, indicating that cointegration exists. If the F-statistic is below
the lower bound, there is no cointegration. Lastly, if the F-statistic
falls within the bounds, the test is inconclusive.

42
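A minimal sketch of the bounds-test F-statistic for equation (2), built directly with OLS (p = 1, the simulated data and all variable names are illustrative assumptions; the resulting F-statistic would then be compared with the lower and upper bounds tabulated in Pesaran et al. (2001), which are not reproduced here):

import numpy as np
import pandas as pd
import statsmodels.api as sm

np.random.seed(5)
T = 200
lx = np.random.normal(size=T)                         # LnX: stationary in levels, I(0)
ly = np.cumsum(0.3 * lx + np.random.normal(size=T))   # LnY: stationary in first differences, I(1)
df = pd.DataFrame({"LY": ly, "LX": lx})

# Unrestricted error-correction form of equation (2) with p = 1
uecm = pd.DataFrame({
    "dLY":      df["LY"].diff(),
    "dLY_lag1": df["LY"].diff().shift(1),
    "dLX":      df["LX"].diff(),
    "dLX_lag1": df["LX"].diff().shift(1),
    "LY_lag1":  df["LY"].shift(1),
    "LX_lag1":  df["LX"].shift(1),
}).dropna()

X = sm.add_constant(uecm[["dLY_lag1", "dLX", "dLX_lag1", "LY_lag1", "LX_lag1"]])
res = sm.OLS(uecm["dLY"], X).fit()

# Bounds test: H0: rho0 = rho1 = 0 (no cointegration), an F-test on the lagged levels
f_result = res.f_test("LY_lag1 = 0, LX_lag1 = 0")
print("Bounds-test F-statistic:", f_result.fvalue)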
