Time Series Using Stata (Oscar Torres-Reyna Version) : December 2007
1 author:
Aymen Ammari
INSEEC
All content following this page was uploaded by Aymen Ammari on 08 June 2016.
-----STATA 10.x/11.x:
gen datevar = date(date2,"MDY", 2012)
format datevar %td /*For daily data*/
------STATA 9.x:
gen datevar = date(date2,"mdy", 2012)
format datevar %td /*For daily data*/
------STATA 10.x/11.x:
tostring date3, gen(date3a)
gen datevar=date(date3a,"YMD")
format datevar %td /*For daily data*/
------STATA 9.x:
tostring date3, gen(date3a)
gen year=substr(date3a,1,4)
gen month=substr(date3a,5,2)
gen day=substr(date3a,7,2)
destring year month day, replace
gen datevar1 = mdy(month,day,year)
format datevar1 %td /*For daily data*/
If the components of the original date are stored in separate numeric variables (for example month, day and year), combine them with the mdy() function.
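A minimal sketch of that case, assuming the components are held in numeric variables named month, day and year (hypothetical names; replace them with the ones in your dataset):

```stata
/* month, day and year are hypothetical numeric variables holding the date parts */
gen datevar = mdy(month, day, year)
format datevar %td   /* For daily data */
```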
To extract days of the week (Monday, Tuesday, etc.) use the function dow(). For example, typing gen dayofweek = dow(date), with “date” replaced by the date variable in your dataset, creates the variable ‘dayofweek’ where 0 is ‘Sunday’, 1 is
‘Monday’, etc. (type help dow for more details).
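As an illustration, the numeric codes can be given readable labels. This is only a sketch: "date" stands for your own daily date variable and the label name dowlbl is an arbitrary choice.

```stata
/* "date" is a placeholder for a %td date variable; dowlbl is a made-up label name */
gen dayofweek = dow(date)
label define dowlbl 0 "Sunday" 1 "Monday" 2 "Tuesday" 3 "Wednesday" ///
                    4 "Thursday" 5 "Friday" 6 "Saturday"
label values dayofweek dowlbl
tabulate dayofweek   /* Count observations by day of the week */
```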
To specify a range of dates (or integers in general) you can use the tin() and twithin() functions. tin() includes the
first and last date, twithin() does not. Use the format of the date variable in your dataset.
/* Make sure to set your data as time series before using tin/twithin */
tsset date
regress y x1 x2 if tin(01jan1995,01jun1995)
regress y x1 x2 if twithin(01jan2000,01jan2001)
PU/DSS/OTR
Date variable (example)
Time series data is data collected over time for a single variable or a group of variables. For this kind of data the first thing
to do is to check the variable that contains the time or date range and make sure it has the frequency you need: yearly,
monthly, quarterly, daily, etc.
The next step is to verify it is in the correct format. In the example below the time variable is stored in “date”
but it is a string variable, not a date variable. In Stata you need to convert this string variable to a date variable.*
On closer inspection, the format of the variable changes from the year 2000 onward, so we need to create a new variable with
a uniform format. Type the following:
use http://dss.princeton.edu/training/tsdata.dta
gen date1=substr(date,1,7)
gen datevar=quarterly(date1,"YQ") /*Stata 10.x/11.x; in Stata 9.x use "yq"*/
format datevar %tq
browse date date1 datevar
browse
********************************************************
Once you have the date variable in a ‘date format’ you need to declare your data as time series in order to
use the time series operators. In Stata type:
tsset datevar
. tsset datevar
time variable: datevar, 1957q1 to 2005q1
delta: 1 quarter
You may have gaps in your time series; for example, there may not be data available for weekends. This
complicates any analysis that uses lags for those missing dates. In this case you may want to create a continuous
time trend as follows:
gen time = _n
tsset time
Use the command tsfill to fill in the gaps in the time series. You need to tsset or xtset the data
before using tsfill. In the example below:
tsset quarters
tsfill
Once the data are tsset (time series set) you can use two time series functions: tin() (‘times in’, from a to b, including a and b) and
twithin() (‘times within’, between a and b, excluding a and b). If you have yearly data just include the years.
/* Make sure to set your data as time series before using tin/twithin */
tsset date
regress y x1 x2 if tin(01jan1995,01jun1995)
regress y x1 x2 if twithin(01jan2000,01jan2001)
Merge/Append
See http://dss.princeton.edu/training/Merge101.pdf
Lag operators (lag)
Another set of time series tools are the lag, lead, difference and seasonal operators.
It is common to analyze the impact of previous values on current ones.
To generate variables holding past values use the “L” operator:
generate unempL1=L1.unemp
generate unempL2=L2.unemp
list datevar unemp unempL1 unempL2 in 1/5
. generate unempL1=L1.unemp
(1 missing value generated)
. generate unempL2=L2.unemp
(2 missing values generated)
     datevar   unemp      unempL1    unempL2
1.   1957q1    3.933333   .          .
2.   1957q2    4.1        3.933333   .
3.   1957q3    4.233333   4.1        3.933333
4.   1957q4    4.933333   4.233333   4.1
5.   1958q1    6.3        4.933333   4.233333
. generate unempF1=F1.unemp
(1 missing value generated)
. generate unempF2=F2.unemp
(2 missing values generated)
. generate unempD1=D1.unemp
(1 missing value generated)
. generate unempD2=D2.unemp
(2 missing values generated)
     datevar   unemp      unempD1    unempD2
1.   1957q1    3.933333   .          .
2.   1957q2    4.1        .1666665   .
3.   1957q3    4.233333   .1333332   -.0333333
4.   1957q4    4.933333   .7000003   .5666671
5.   1958q1    6.3        1.366667   .6666665
. generate unempS1=S1.unemp
(1 missing value generated)
. generate unempS2=S2.unemp
(2 missing values generated)
     datevar   unemp      unempS1    unempS2
1.   1957q1    3.933333   .          .
2.   1957q2    4.1        .1666665   .
3.   1957q3    4.233333   .1333332   .2999997
4.   1957q4    4.933333   .7000003   .8333335
5.   1958q1    6.3        1.366667   2.066667
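The D2 and S2 columns differ because D2 is the difference of the first difference, while S2 is the change over two periods. The sketch below rebuilds both by hand so they can be compared with the operator versions. It assumes the data are tsset and that unempD2 and unempS2 were generated as above; the names checkD2 and checkS2 are made up for the illustration.

```stata
/* checkD2 and checkS2 are hypothetical names used only for this comparison */
gen checkD2 = D.unemp - LD.unemp      /* first difference minus its own lag */
gen checkS2 = unemp - L2.unemp        /* current value minus value two periods ago */
list datevar unempD2 checkD2 unempS2 checkS2 in 1/5
```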
LAG   AC   PAC   Q   Prob>Q   [Autocorrelation]   [Partial Autocor]
AC: shows that the correlation between the current value of unemp and its value three quarters ago is 0.8045. AC can be used to define the q in MA(q) only in stationary series.
PAC: shows that the correlation between the current value of unemp and its value three quarters ago is 0.1091 without the effect of the two previous lags. PAC can be used to define the p in AR(p) only in stationary series.
Q: the Box-Pierce Q statistic tests the null hypothesis that all correlations up to lag k are equal to 0. This series shows significant autocorrelation: the Prob>Q values are less than 0.05 at every k, therefore rejecting the null that all lags are not autocorrelated.
[Autocorrelation]: graphic view of AC, which shows a slow decay in the trend, suggesting non-stationarity. See also the ac command.
[Partial Autocor]: graphic view of PAC, which does not show spikes after the second lag, which suggests that all other lags are mirrors of the second lag. See the pac command.
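A table with these columns can be produced with corrgram, and the two graphic views with ac and pac. The sketch below assumes the data are tsset; the choice of 12 lags is an assumption made for the example, not a recommendation.

```stata
/* lags(12) is an arbitrary choice for this illustration */
corrgram unemp, lags(12)   /* Table with AC, PAC, Q and Prob>Q */
ac unemp, lags(12)         /* Graph of the autocorrelations */
pac unemp, lags(12)        /* Graph of the partial autocorrelations */
```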
Correlograms: cross correlation
To explore the relationship between two time series use the command xcorr. The output below shows the correlation
between GDP quarterly growth rate and unemployment. When using xcorr list the independent variable first and the
dependent variable second. Type:
xcorr gdp unemp, lags(10) xlabel(-10(1)10,grid)
[Figure: cross-correlogram. Vertical axis: cross-correlations of gdp and unemp, from -1.00 to 1.00. Horizontal axis: Lag, from -10 to 10.]
. xcorr gdp unemp, lags(10) table
LAG     CORR
-10    -0.1080
 -9    -0.1052
 -8    -0.1075
 -7    -0.1144
 -6    -0.1283
 -5    -0.1412
 -4    -0.1501
 -3    -0.1578
 -2    -0.1425
 -1    -0.1437
  0    -0.1853
  1    -0.1828
  2    -0.1685
  3    -0.1177
  4    -0.0716
  5    -0.0325
  6    -0.0111
  7    -0.0038
  8     0.0168
  9     0.0393
 10     0.0419
At lag 0 there is a negative contemporaneous correlation between GDP growth rate and unemployment. This means that a drop
in GDP growth is associated with an immediate increase in unemployment.
Correlograms: cross correlation
xcorr interest unemp, lags(10) xlabel(-10(1)10,grid)
[Figure: cross-correlogram. Vertical axis: cross-correlations of interest and unemp, from -1.00 to 1.00. Horizontal axis: Lag, from -10 to 10.]
. xcorr interest unemp, lags(10) table
LAG     CORR
-10     0.3297
 -9     0.3150
 -8     0.2997
 -7     0.2846
 -6     0.2685
 -5     0.2585
 -4     0.2496
 -3     0.2349
 -2     0.2323
 -1     0.2373
  0     0.2575
  1     0.3095
  2     0.3845
  3     0.4576
  4     0.5273
  5     0.5850
  6     0.6278
  7     0.6548
  8     0.6663
  9     0.6522
 10     0.6237
Interest rates are positively correlated with future levels of unemployment, with the correlation reaching its highest point at lag 8 (eight
quarters, or two years). In this case, interest rates are positively correlated with unemployment rates eight quarters later.
Lag selection
Too many lags could increase the error in the forecasts; too few could leave out relevant information*.
Experience, knowledge and theory are usually the best way to determine the number of lags needed. There
are, however, information criterion procedures to help come up with a proper number. Three commonly used
criteria are Schwarz's Bayesian information criterion (SBIC), Akaike's information criterion (AIC), and the
Hannan and Quinn information criterion (HQIC). All of these are reported by the command ‘varsoc’ in Stata.
. varsoc gdp cpi, maxlag(10)
Selection-order criteria
Sample: 1959q4 - 2005q1 Number of obs = 182
When all three agree, the selection is clear, but what happens when the results conflict? A paper from
the CEPR suggests, in the context of VAR models, that AIC tends to be more accurate with monthly data,
HQIC works better for quarterly data on samples over 120 and SBIC works fine with any sample size for
quarterly data (on VEC models)**. In our example above we have quarterly data with 182 observations;
HQIC suggests a lag of 4 (which is also suggested by AIC).
* See Stock & Watson for more details and on how to estimate BIC and SIC
** Ivanov, V. and Kilian, L. 2001. 'A Practitioner's Guide to Lag-Order Selection for Vector Autoregressions'. CEPR Discussion Paper no. 2685. London, Centre for Economic Policy
Research. http://www.cepr.org/pubs/dps/DP2685.asp.
Unit roots
Having a unit root in a series means that there is more than one trend in the series.
Unit roots
Unemployment rate.
line unemp datevar
[Figure: line plot of the unemployment rate (unemp) against datevar]
Unit root test
The Dickey-Fuller test is one of the most commonly used tests for stationarity. The null
hypothesis is that the series has a unit root. The test statistic shows that the unemployment
series has a unit root: it lies within the acceptance region, so the null cannot be rejected.
One way to deal with stochastic trends (unit root) is by taking the first difference of the variable
(second test below).
. dfuller unemp, lag(5)
[Output: three Interpolated Dickey-Fuller tables, each reporting the Test Statistic and the 1%, 5% and 10% Critical Values. They are annotated, in order: Unit root; No unit root; Unit root*.]
See Stock & Watson for a table of critical values for the unit root test and the theory behind.
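The comparison between the series in levels and in first differences can be sketched as follows, assuming the data are tsset. Keeping lag(5) for the differenced series is an assumption made for the example.

```stata
dfuller unemp, lag(5)      /* Levels: null hypothesis is a unit root */
dfuller D.unemp, lag(5)    /* First difference of the same series */
```

The null of a unit root is rejected when the test statistic is more negative than the chosen critical value.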
*Critical value for one independent variable in the OLS regression, at 5% is -3.41 (Stock & Watson)
Granger causality: using OLS
If you regress ‘y’ on lagged values of ‘y’ and ‘x’ and the coefficients of the lags of ‘x’ are
jointly statistically significantly different from 0, then you can argue that ‘x’ Granger-causes ‘y’; that is,
‘x’ can be used to predict ‘y’ (see Stock & Watson 2007, Greene 2008).
           Coef.       Std. Err.    t       P>|t|    [95% Conf. Interval]
unemp
  L1.    1.625708    .0763035    21.31    0.000     1.475138    1.776279
  L2.   -.7695503    .1445769    -5.32    0.000    -1.054845    -.484256
  L3.    .0868131    .1417562     0.61    0.541    -.1929152    .3665415
  L4.    .0217041    .0726137     0.30    0.765    -.1215849    .1649931
gdp
  L1.    .0060996    .0136043     0.45    0.654    -.0207458    .0329451
  L2.   -.0189398    .0128618    -1.47    0.143    -.0443201    .0064405
  L3.    .0247494    .0130617     1.89    0.060    -.0010253    .0505241
  L4.    .003637     .0129079     0.28    0.778    -.0218343    .0291083
( 1) L.gdp = 0
( 2) L2.gdp = 0
( 3) L3.gdp = 0
( 4) L4.gdp = 0
F( 4, 179) = 1.67
Prob > F = 0.1601
You cannot reject the null hypothesis that all coefficients of the lags of ‘x’ are equal to 0.
Therefore ‘gdp’ does not Granger-cause ‘unemp’.
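Output of this kind can be obtained with a regression followed by a joint F-test. The sketch below assumes the data are tsset and uses four lags of each variable, as the coefficient table suggests.

```stata
/* Regress unemp on four lags of itself and four lags of gdp */
regress unemp L(1/4).unemp L(1/4).gdp
/* Joint test that all four gdp lag coefficients are equal to 0 */
test L1.gdp L2.gdp L3.gdp L4.gdp
```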
Granger causality: using VAR
The following procedure uses VAR models to test for Granger causality using the command
‘vargranger’.
The null hypothesis is ‘var1 does not Granger-cause var2’. In both cases, we cannot reject
the null that each variable does not Granger-cause the other.
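A minimal sketch of the procedure, assuming the data are tsset. The variables (unemp and gdp) and the lag order (1 to 4) are assumptions carried over from the OLS example.

```stata
/* Fit the VAR quietly, then run the Granger causality Wald tests */
quietly var unemp gdp, lags(1/4)
vargranger
```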
Chow test (testing for known breaks)
The Chow test lets you test whether a particular date marks a break in the regression coefficients. It is named after Gregory
Chow (1960)*.
Step 1. Create a dummy variable equal to 1 if date > break date and 0 if date <= break date. Below we’ll test whether the first quarter of
1982 marks a break in the regression coefficients.
tsset datevar
gen break = (datevar>tq(1981q4))
Replace “tq” with the correct date format: tw (week), tm (monthly), tq (quarterly), th (half), ty (yearly), and write the break date
inside the parentheses in the corresponding date format.
Step 2. Create interaction terms between the break dummy and the lags of the independent and dependent variables. We will
assume lag 1 for this example (the number of lags depends on your theory/data).
Step 3. Regress the outcome variable (in this case ‘unemp’) on the lagged variables, the interactions
and the dummy for the break.
Step 4. Run an F-test on the coefficients for the interactions and the dummy for the break.
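Steps 2 to 4 can be sketched as follows. This assumes the data are tsset, that the dummy ‘break’ from Step 1 already exists, and that ‘gdp’ is the independent variable. The names break_unemp and break_gdp are made up for the illustration.

```stata
/* Step 2: interactions between the break dummy and the first lags */
gen break_unemp = break*L1.unemp
gen break_gdp   = break*L1.gdp
/* Step 3: regression with the lags, the break dummy and the interactions */
regress unemp L1.unemp L1.gdp break break_unemp break_gdp
/* Step 4: joint F-test on the break dummy and the interactions */
test break break_unemp break_gdp
```

Rejecting the null in Step 4 suggests that the coefficients differ before and after the chosen date.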
/* Replace the words in bold with your own variables, do not change anything else*/
/* The log file ‘qlrtest.log’ will have the list for QLR statistics (use Word to read it)*/
/* See next page for a graph*/
/* STEP 1. Copy-and-paste-run the code below to a do-file, double-check the quotes (re-type them if necessary)*/
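/* The code below expects a local macro `var' and a variable qlr`var' that holds an F statistic for each candidate break date. */
/* What follows is only a sketch of one way to build them, under assumptions: the series is gdp, the model uses four lags of */
/* its first difference, the first and last 15% of the sample are trimmed, and the data are tsset with no gaps. */
local var "gdp"
gen diff`var' = D.`var'
gen qlr`var' = .
quietly summarize datevar
local span = r(max) - r(min) + 1
local first = r(min) + round(0.15*`span')
local last = r(min) + round(0.85*`span')
forvalues t = `first'/`last' {
    /* Dummy for observations after candidate date `t', plus its interactions with the four lags */
    quietly gen di = (datevar > `t')
    forvalues k = 1/4 {
        quietly gen di_l`k' = di*L`k'.diff`var'
    }
    quietly regress diff`var' L(1/4).diff`var' di di_l1 di_l2 di_l3 di_l4, vce(robust)
    /* Joint test of five restrictions; the critical value used later must match this number */
    quietly test di di_l1 di_l2 di_l3 di_l4
    quietly replace qlr`var' = r(F) if datevar == `t'
    drop di di_l1 di_l2 di_l3 di_l4
}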
sum qlr`var'
local maxvalue=r(max)
gen maxdate=datevar if qlr`var'==`maxvalue'
local maxvalue1=round(`maxvalue',0.01)
local critical=3.66 /*Replace with the appropriate critical value (see Stock & Watson)*/
sum datevar
local mindate=r(min)
sum maxdate
local maxdate=r(max)
gen break=datevar if qlr`var'>=`critical' & qlr`var'!=.
dis "Below are the break dates..."
list datevar qlr`var' if break!=.
levelsof break, local(break1)
twoway tsline qlr`var', title(Testing for breaks in GDP per-capita (1957-2005)) ///
xlabel(`break1', angle(90) labsize(0.9) alternate) ///
yline(`critical') ytitle(QLR statistic) xtitle(Time) ///
ttext(`critical' `mindate' "Critical value 5% (`critical')", placement(ne)) ///
ttext(`maxvalue' `maxdate' "Max QLR = `maxvalue1'", placement(e))
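[Output: list of datevar and qlrgdp for the break dates, followed by a figure of the QLR statistic against Time. Dates marked on the time axis: 1989q1, 1989q3, 1989q4, 1990q1, 1990q2, 1990q3, 1990q4, 1991q1, 1991q2, 1991q3, 1991q4, 1992q1, 1996q4, 1997q1.]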
Time Series: white noise
A variable is white noise when it has no autocorrelation. In Stata use
wntestq (white noise Q test) to check for autocorrelation. The null is that there is no serial
correlation (type help wntestq for more details):
. wntestq unemp
If your variable is not white noise then see the page on correlograms to see the order of the
autocorrelation.
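As a small illustration, the test can be run with an explicit number of lags and its stored results displayed. The choice of 10 lags is an assumption; leave the option out to use Stata's default.

```stata
/* lags(10) is an arbitrary choice for this example */
wntestq unemp, lags(10)
display "Q = " r(stat) ", p-value = " r(p)
```

A small p-value rejects the null of no serial correlation, which means the variable is not white noise.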
Time Series: Testing for serial correlation
The Breusch-Godfrey and Durbin-Watson tests are used to test for serial correlation. The null in both tests
is that there is no serial correlation (type help estat dwatson, help estat durbinalt
and help estat bgodfrey for more details).
. regress unempd gdp
. estat dwatson
. estat durbinalt
lags(p)    chi2       df    Prob > chi2
1          118.790    1     0.0000
. estat bgodfrey
Breusch-Godfrey LM test for autocorrelation
lags(p)    chi2       df    Prob > chi2
1          74.102     1     0.0000
Result: serial correlation
rho .966115