SEARCHING FOR ADDITIVE OUTLIERS IN NONSTATIONARY
TIME SERIES*
By Pierre Perron and Gabriel Rodríguez
Boston University and Université d'Ottawa
First Version received May 2000
Abstract. Recently, Vogelsang (1999) proposed a method to detect outliers which
explicitly imposes the null hypothesis of a unit root. It works in an iterative fashion to
select multiple outliers in a given series. We show, via simulations, that, under the null
hypothesis of no outliers, it has the right size in finite samples to detect a single outlier but,
when applied in an iterative fashion to select multiple outliers, it exhibits severe size
distortions towards finding an excessive number of outliers. We show that his iterative
method is incorrect and derive the appropriate limiting distribution of the test at each step
of the search. Whether corrected or not, we also show that the outliers need to be very
large for the method to have any decent power. We propose an alternative method based
on first-differenced data that has considerably more power. We also show that our method
to identify outliers leads to unit root tests with more accurate finite sample size and
robustness to departures from a unit root. The issues are illustrated using two US/Finland
real-exchange rate series.
Keywords. Additive outliers; t-test; Wiener process; unit root; size; power.
JEL: C2, C3, C5.
1. INTRODUCTION
Since Fox (1972) introduced the notion of additive and innovational
outliers, issues related to these types of atypical observations in time series have
received considerable attention in the statistics and econometrics literature. The
outlier detection issue itself has received particular attention.¹ Another topic of
interest has been the estimation of autoregressive moving average
(ARMA) models in the presence of outliers. In this case, as mentioned by Chen
and Liu (1993), a common approach is to identify the locations and the types of
outliers and then to accommodate the effects of outliers using intervention models
as proposed by Box and Tiao (1975). This approach requires iterations between
stages of outlier detection and estimation of the model.²

*This paper is drawn from Chapter 3 of Gabriel Rodríguez's PhD dissertation at the
Université de Montréal, Rodríguez (1999). We would like to thank Tim Vogelsang for
useful conversations. We also thank Lynda Khalaf for comments on an earlier version of
this paper entitled 'Additive Outliers and Unit Roots with an Application to Latin
American Inflation' when it was presented at the 39th Congrès de la société canadienne de
sciences économiques, Hull (Québec), May 1999. Address for correspondence: Pierre
Perron, Department of Economics, Boston University, 270 Bay State Road, Boston, MA,
02215, USA (e-mail: [email protected]).
¹ See, for example, Hawkins (1980), who presents a set of methods proposed before 1980,
and Hawkins (1973), who proposed one of the most used methods, based on order statistics,
to detect outliers.

0143-9782/03/02 193–220
JOURNAL OF TIME SERIES ANALYSIS Vol. 24, No. 2
© 2003 Blackwell Publishing Ltd., 9600 Garsington Road, Oxford OX4 2DQ, UK and 350 Main
Street, Malden, MA 02148, USA.
In the context of integrated data (processes with an autoregressive (AR) unit
root), the effects of additive outliers have recently been the object of sustained
research. It is, by now, well recognized that outliers affect the properties of unit
root tests (Franses and Haldrup, 1994). They do so by inducing a negative
moving average (MA) component in the noise function which causes most unit
root tests to exhibit substantial size distortions towards rejecting the null
hypothesis too often. Franses and Haldrup (1994) suggested applying Dickey–
Fuller (1979) unit root tests by incorporating dummy variables in the
autoregression chosen on the basis of the outlier detection procedure
proposed by Chen and Liu (1993). This procedure has been implemented in
the computer program TRAM (Time Series Regression with ARIMA Noise and
Missing Values) written by Gómez and Maravall (1992b), which allows us to
estimate ARIMA models where missing observations may be treated as additive
outliers.
In an interesting recent paper, Vogelsang (1999) makes two contributions to
the issue of the effects of additive outliers on unit root tests. First,
recognizing that outliers induce a negative MA component, he suggests using
unit root tests developed by Stock (1999) and Perron and Ng (1996) that are
robust, in terms of achieving exact size close to nominal size in small samples,
even in the presence of a substantial negative MA component. He shows via
simulations that these unit root tests are little affected by systematic outliers.
Second, he recognizes that one can take advantage of the null hypothesis of a
unit root in devising an outlier detection procedure. This allows the derivation
of a non-degenerate limiting distribution for the t-statistic on the relevant
one-time dummy.
In this paper, we make further contributions following the second suggestion
of Vogelsang (1999). We show, via simulations, that Vogelsang’s (1999)
procedure, under the null hypothesis of no outlier, has the right size in finite
samples to detect a single outlier but, when applied in an iterative fashion to
select multiple outliers, it exhibits severe size distortions towards finding an
excessive number of outliers. We show that there is a basic flaw in the iterative
method suggested by Vogelsang (1999). In effect, contrary to what he implicitly
assumes, the limiting distribution of the test used is different at each iteration of
the outlier detection procedure. We derive the appropriate limiting distribution
and tabulate some critical values. When so corrected, his method is shown to
have very low power to detect outliers (even a single one, without the correction
being made) unless the magnitude of the outlier is very large. As an alternative, we
propose a method based on first-differenced data which has considerably more
power. All of the methods considered are illustrated using two US/Finland
real-exchange rate series.

² Some references are Chang et al. (1988) and Tsay (1986). Chen and Liu (1993) also
followed this approach and proposed another method for detecting the locations of the
outliers jointly with estimating the parameters of the model. Their motivation was
that, even if the model is well specified, outliers may still produce biased estimates of the
parameters and, hence, may affect the outlier detection procedure. This is because atypical
observations, in general, affect the variance of the estimates (Peña, 1990).
The rest of the paper is organized as follows. Section 2 deals with the model and
the issue of outlier detection. It reviews the procedure suggested by Vogelsang
(1999) and presents simulation evidence about its size. Section 3 derives the
correct limiting distribution of the test he suggested for each iteration of the
outlier detection procedure. Section 4 presents the procedure based on first-differenced
data. Section 5 compares its size and power to methods based on levels
of the data using simulations. The size of unit root tests corrected using the
various methods to detect outliers is investigated in Section 6. An empirical
illustration using two US/Finland real-exchange rate series is presented in Section
7. Section 8 presents brief concluding remarks and some details about the data
used are discussed in an Appendix.
2. THE MODEL AND THE ISSUE OF OUTLIER DETECTION
There is a large literature in statistics and econometrics on the subject of outlier
detection in ARMA models. The standard approach is to estimate a fully
parameterized ARMA model and construct a t-statistic for the presence of an
outlier. Such a t-statistic is constructed at all possible dates and the supremum is
taken. The value of the supremum is then compared to a critical value to decide if
an outlier is present. Some references are Tsay (1986), Chang et al. (1988), Shin
et al. (1996) and Chen and Liu (1993). Using a time series with an ARIMA noise
function, Gómez and Maravall (1992a) proposed to analyse missing observations
as additive outliers. This paper was the basis for the computer program TRAM
written by Gómez and Maravall (1992b) to estimate ARIMA models with missing
observations, which was used by Franses and Haldrup (1994) in the context of
outlier detection in time series with unit roots.
The issue of outlier detection in the unit root framework offers a distinct
advantage, namely that one can work under the null hypothesis that a unit root is
present. This is the approach taken by Vogelsang (1999) whose procedure has two
useful features. First, it does not require a fully parametric model of the noise
function and is valid for a wide class of processes. Second, an asymptotic
distribution can be obtained and critical values tabulated even without having to
make specific distributional and parametric assumptions about the data-generating process.
The data-generating process entertained is of the general form:
$$y_t = d_t + \sum_{j=1}^{m} \delta_j D(T_{ao,j})_t + u_t \tag{1}$$
where $D(T_{ao,j})_t = 1$ if $t = T_{ao,j}$ and 0 otherwise. This permits the presence of $m$
additive outliers occurring at dates $T_{ao,j}$ $(j = 1, \ldots, m)$. The term $d_t$ specifies the
deterministic components. In most cases, $d_t = \mu$ if the series is non-trending or
$d_t = \mu + \beta t$ if the series is trending (of course, other specifications are possible).
The noise function is integrated of order one, i.e.
$$u_t = u_{t-1} + v_t \tag{2}$$
where $v_t$ can be, for example, a linear process of the form $v_t = \varphi(L) e_t$ with
$$\varphi(L) = \sum_{i=0}^{\infty} \varphi_i L^i, \qquad \sum_{i=0}^{\infty} i^2 \varphi_i^2 < \infty,$$
and $e_t$ is a martingale difference sequence with mean 0 such that
$$\sigma_e^2 = \lim_{T \to \infty} T^{-1} \sum_{t=1}^{T} E(e_t^2)$$
is finite. What is important is that the sequence $v_t$ satisfies the condition for the
application of a functional central limit theorem such that
$$T^{-1/2} \sum_{t=1}^{[Tr]} v_t \Rightarrow \sigma W(r)$$
where $W(r)$ is the unit Wiener process, $\Rightarrow$ denotes weak convergence in
distribution and
$$\sigma^2 = \lim_{T \to \infty} T^{-1} E\left(\sum_{t=1}^{T} v_t\right)^2$$
with $0 < \sigma^2 < \infty$.
The detection procedure suggested by Vogelsang (1999) starts with the
following regression estimated by ordinary least squares (OLS); if necessary, a
time trend can also be included:
$$y_t = \hat{\mu} + \hat{\delta} D(T_{ao})_t + \hat{u}_t \tag{3}$$
where $D(T_{ao})_t = 1$ if $t = T_{ao}$ and 0 otherwise. Let $t_{\hat{\delta}}(T_{ao})$ denote the t-statistic for
testing $\delta = 0$ in (3). Following Chen and Liu (1993), the presence of an additive
outlier can be tested using
$$\tau = \sup_{T_{ao}} |t_{\hat{\delta}}(T_{ao})|.$$
Assuming that $\lambda = T_{ao}/T$ remains fixed as $T$ grows, Vogelsang (1999) showed that,
as $T \to \infty$,
$$t_{\hat{\delta}}(T_{ao}) \Rightarrow H(\lambda) = \frac{W^*(\lambda)}{\left(\int_0^1 W^*(r)^2 \, dr\right)^{1/2}} \tag{4}$$
where $W^*(\lambda)$ denotes a demeaned standard Wiener process, i.e.
$$W^*(\lambda) = W(\lambda) - \int_0^1 W(s) \, ds.$$
If (3) also includes a time trend, $W^*(\lambda)$ will denote a detrended Wiener process.
Furthermore, from the continuous mapping theorem, it follows that
$$\tau \Rightarrow \sup_{\lambda \in (0,1)} |H(\lambda)| \equiv H^* \tag{5}$$
The distribution given in (5) is non-standard but is invariant with respect to any
nuisance parameters, including the correlation structure of the noise function. The
asymptotic critical values for $\tau$ were obtained using simulations. The Wiener
processes were approximated by normalized sums of i.i.d. (independently and
identically distributed) $N(0, 1)$ random deviates using 1000 steps and 50,000
replications. Two cases were considered according to the deterministic
components included in (3). When there is an intercept in (3), the critical values
are 3.53, 3.11 and 2.92 at the 1%, 5% and 10% significance levels, respectively. If
a time trend is also included in (3), the corresponding critical values are 3.73, 3.31
and 3.12.³
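The simulation of asymptotic critical values described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for the reported values: the function names, the seed handling and the default number of replications are hypothetical, and the integral in (4) is approximated by a simple average over the grid.

```python
import numpy as np

def simulate_sup_H(n_steps=1000, n_reps=2000, trend=False, seed=0):
    """Draw from the approximate limit law of tau, i.e. sup over lambda of |H(lambda)|."""
    rng = np.random.default_rng(seed)
    # Deterministic terms projected out of the discretized Wiener path.
    cols = [np.ones(n_steps)]
    if trend:
        cols.append(np.arange(1, n_steps + 1) / n_steps)
    Z = np.column_stack(cols)
    Z_pinv = np.linalg.pinv(Z)  # least-squares coefficients are Z_pinv @ W
    draws = np.empty(n_reps)
    for r in range(n_reps):
        # Normalized partial sums of N(0, 1) deviates approximate W(r) on [0, 1].
        W = np.cumsum(rng.standard_normal(n_steps)) / np.sqrt(n_steps)
        W_star = W - Z @ (Z_pinv @ W)  # demeaned (or detrended) path
        denom = np.sqrt(np.mean(W_star ** 2))  # approximates (int_0^1 W*(r)^2 dr)^(1/2)
        draws[r] = np.max(np.abs(W_star)) / denom
    return draws

def critical_values(draws, levels=(0.01, 0.05, 0.10)):
    """Upper-tail quantiles of the simulated draws at the given significance levels."""
    return {a: float(np.quantile(draws, 1.0 - a)) for a in levels}
```

Calling `critical_values(simulate_sup_H(n_reps=50_000))` would mimic the design in the text (1000 steps, 50,000 replications); the resulting numbers depend on the seed and are only expected to be close to the tabulated ones.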
The outlier detection procedure recommended by Vogelsang (1999) is
implemented as follows.⁴ First, compute the $\tau$ statistic for the entire series and
compare $\tau$ to the appropriate critical value. If $\tau$ exceeds the critical value, then an
outlier is detected at date
$$\hat{T}_{ao} = \arg\max_{T_{ao}} |t_{\hat{\delta}}(T_{ao})|.$$
The outlier and the corresponding row of the regression are dropped, and (3) is
estimated again and tested for the presence of another outlier. This continues until
the test fails to reject.
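A minimal sketch of this procedure is given below, assuming a single fixed critical value `crit` at every step (the uncorrected procedure). The function names and the `max_outliers` cap are hypothetical. Instead of running one regression per candidate date, the sketch uses the standard OLS identity that the t-statistic on a one-observation dummy equals the externally studentized residual of that observation.

```python
import numpy as np

def tau_statistic(y, times=None, trend=False):
    """Return (tau, position): the largest |t-statistic| on a one-time dummy and where it occurs."""
    y = np.asarray(y, dtype=float)
    n = y.size
    times = np.arange(1.0, n + 1) if times is None else np.asarray(times, dtype=float)
    cols = [np.ones(n)]
    if trend:
        cols.append(times)
    Z = np.column_stack(cols)
    k = Z.shape[1]
    Z_pinv = np.linalg.pinv(Z)
    resid = y - Z @ (Z_pinv @ y)            # residuals without any dummy
    lev = np.einsum("ij,ji->i", Z, Z_pinv)  # leverage of each observation
    # Adding a dummy at date t removes resid_t^2 / (1 - lev_t) from the residual sum of squares.
    ssr_with_dummy = resid @ resid - resid ** 2 / (1.0 - lev)
    s2 = ssr_with_dummy / (n - k - 1)
    t_stats = resid / np.sqrt(s2 * (1.0 - lev))
    pos = int(np.argmax(np.abs(t_stats)))
    return float(abs(t_stats[pos])), pos

def iterative_search(y, crit, trend=False, max_outliers=10):
    """Repeat the sup-t test, dropping each detected observation, until no rejection."""
    y = np.asarray(y, dtype=float)
    keep = np.arange(y.size)  # 0-based indices of observations still in the sample
    found = []
    while len(found) < max_outliers:
        tau, pos = tau_statistic(y[keep], times=keep + 1.0, trend=trend)
        if tau <= crit:
            break
        found.append(int(keep[pos]))
        keep = np.delete(keep, pos)
    return found
```

For example, `iterative_search(y, crit=3.11)` applies the 5% critical value for the intercept-only case at every step; positions are returned as 0-based indices into the original series.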
³ Critical values were also tabulated for the case where no deterministic components are
included in (3). The critical values at the 1%, 5% and 10% significance levels are 3.22, 2.84 and
2.65, respectively.
⁴ This is equivalent to the stepwise procedure to select multiple outliers. See Hawkins
(1980).
2.1. Simulation experiments for size
To assess the properties of the method in finite samples, we performed simulation
experiments under the hypothesis that the series contains no outlier. We consider a
simple data-generating process with an AR unit root, i.e.
$$y_t = y_{t-1} + u_t.$$
Two cases are considered for the errors $u_t$, namely MA(1) processes of the
form
$$u_t = v_t + \theta v_{t-1}$$
and AR(1) processes of the form
$$u_t = \rho u_{t-1} + v_t.$$
In all cases, $v_t \sim$ i.i.d. $N(0, 1)$. We consider values of $\theta$ and $\rho$ in the range
$[-0.8, 0.8]$ with a step size of 0.2. Two sample sizes are used: $T = 100$ and
$T = 200$. The number of replications used was 10,000 and tests at the 5% and
10% significance levels were performed.
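The data-generating process of these experiments can be sketched as follows. The function name, the `kind` and `coef` arguments, the burn-in length and the zero starting value are illustrative assumptions that are not specified in the text.

```python
import numpy as np

def simulate_unit_root(T, kind="iid", coef=0.0, burn=100, rng=None):
    """Simulate y_t = y_{t-1} + u_t with i.i.d., MA(1) or AR(1) errors u_t."""
    rng = np.random.default_rng() if rng is None else rng
    n = T + burn
    v = rng.standard_normal(n + 1)  # v_t ~ i.i.d. N(0, 1)
    if kind == "iid":
        u = v[1:]
    elif kind == "ma":
        u = v[1:] + coef * v[:-1]   # u_t = v_t + theta * v_{t-1}
    elif kind == "ar":
        u = np.empty(n)
        prev = 0.0
        for t in range(n):          # u_t = rho * u_{t-1} + v_t
            prev = coef * prev + v[t + 1]
            u[t] = prev
    else:
        raise ValueError("kind must be 'iid', 'ma' or 'ar'")
    # Discard the burn-in errors (an assumption) and cumulate the rest.
    return np.cumsum(u[burn:])
```

A size experiment would then loop over replications, for instance with `coef` taken from `np.arange(-0.8, 0.81, 0.2)`, and record how often the detection statistic exceeds its critical value.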
We first consider the size of the procedures in what we label the ‘one pass’ case.
The size is the proportion of replications in which an observation is categorized as an
outlier when searching for a single outlier (without iterating any further for a given sample).
Results are presented in Table I.
TABLE I
Exact Size of Single Outlier Detection

MA case
                    Constant          Time trend
  T      θ        5.0%    10.0%     5.0%    10.0%
 100   -0.80     0.184    0.332    0.137    0.241
       -0.60     0.101    0.191    0.095    0.177
       -0.40     0.067    0.125    0.068    0.126
       -0.20     0.050    0.104    0.051    0.097
        0.00     0.043    0.082    0.043    0.082
        0.20     0.038    0.076    0.037    0.075
        0.40     0.037    0.072    0.035    0.069
        0.60     0.037    0.069    0.035    0.067
        0.80     0.037    0.060    0.035    0.066
 200   -0.80     0.227    0.400    0.177    0.327
       -0.60     0.105    0.205    0.102    0.198
       -0.40     0.065    0.131    0.068    0.127
       -0.20     0.050    0.102    0.051    0.099
        0.00     0.042    0.086    0.042    0.084
        0.20     0.038    0.079    0.038    0.075
        0.40     0.037    0.076    0.036    0.072
        0.60     0.037    0.074    0.034    0.069
        0.80     0.037    0.074    0.034    0.069

AR case
                    Constant          Time trend
  T      ρ        5.0%    10.0%     5.0%    10.0%
 100   -0.80     0.073    0.149    0.068    0.135
       -0.60     0.064    0.121    0.061    0.118
       -0.40     0.053    0.107    0.055    0.106
       -0.20     0.046    0.091    0.048    0.094
        0.20     0.031    0.073    0.036    0.071
        0.40     0.034    0.064    0.032    0.063
        0.60     0.023    0.050    0.029    0.053
        0.80     0.020    0.042    0.029    0.047
 200   -0.80     0.080    0.154    0.076    0.151
       -0.60     0.064    0.126    0.065    0.124
       -0.40     0.055    0.110    0.056    0.109
       -0.20     0.048    0.097    0.049    0.096
        0.20     0.037    0.077    0.036    0.073
        0.40     0.034    0.067    0.030    0.062
        0.60     0.029    0.055    0.025    0.052
        0.80     0.022    0.043    0.022    0.041
For the i.i.d. case, Vogelsang’s method has an exact size close to nominal size.
For the case with negative MA errors, the test has size distortions (being liberal).
These distortions are smaller when more deterministic components are included in
the models. For positive MA errors and particularly for the model that includes a
time trend, the procedure is slightly undersized. A similar result is observed when
there are positively correlated AR errors.
The next experiments consider the properties of the method when applied in a
full iterative fashion, i.e. continuing to search for additional outliers when one is
found. Here, we record the total number of observations categorized as outliers
divided by the number of replications. These values can be labelled as the
expected number of outliers found. If the tests have the correct size $\alpha$, say, at each
step of the iterations, and the tests are independent, this number should be close
to $\alpha/(1 - \alpha)$, that is 0.111 for a significance level of 10% and 0.053 for a
significance level of 5%.
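The benchmark value follows from a geometric series: under independence and correct size, step $i$ is reached and rejects with probability $\alpha^i$, so the expected count is $\alpha + \alpha^2 + \cdots = \alpha/(1 - \alpha)$. A small numerical check is sketched below; the function name and the optional truncation argument are hypothetical.

```python
def expected_outliers(alpha, max_steps=None):
    """Expected number of detections when step i rejects with probability alpha**i."""
    if max_steps is None:
        return alpha / (1.0 - alpha)  # closed form of the geometric series
    # Truncated sum alpha + alpha^2 + ... + alpha^max_steps
    return sum(alpha ** i for i in range(1, max_steps + 1))
```

Evaluating the function at 0.10 and 0.05 gives the two benchmark values quoted in the text, rounded to three decimals.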
The results are presented in Table II. The main thing to note is that Vogelsang's
procedure finds many more outliers than would be expected if the test had the
correct size at each step. For example, for the model with only a constant with
i.i.d. errors, $T = 100$, and a significance level of 10%, the number is 0.293 instead
of 0.111, i.e. an average of 2.93 outliers for each replication that contains at least
one outlier. These distortions increase when $T$ increases to 200, with a value of
0.520 (instead of 0.111), which corresponds to approximately 5.2 outliers per
replication with at least one outlier.
TABLE II
Expected Number of Outliers Found Using Multiple Outlier Detection

MA case
                    Constant          Time trend
  T      θ        5.0%    10.0%     5.0%    10.0%
 100   -0.80     0.216    0.447    0.154    0.291
       -0.60     0.139    0.306    0.109    0.220
       -0.40     0.132    0.285    0.094    0.194
       -0.20     0.129    0.292    0.092    0.196
        0.00     0.129    0.293    0.096    0.205
        0.20     0.127    0.295    0.097    0.208
        0.40     0.130    0.296    0.097    0.205
        0.60     0.133    0.288    0.101    0.207
        0.80     0.132    0.293    0.101    0.206
 200   -0.80     0.295    0.638    0.206    0.428
       -0.60     0.197    0.478    0.141    0.311
       -0.40     0.200    0.494    0.138    0.308
       -0.20     0.214    0.515    0.144    0.324
        0.00     0.209    0.520    0.142    0.331
        0.20     0.214    0.509    0.147    0.338
        0.40     0.212    0.505    0.147    0.341
        0.60     0.214    0.503    0.147    0.338
        0.80     0.216    0.500    0.148    0.339

AR case
                    Constant          Time trend
  T      ρ        5.0%    10.0%     5.0%    10.0%
 100   -0.80     0.124    0.278    0.091    0.198
       -0.60     0.128    0.272    0.086    0.181
       -0.40     0.127    0.286    0.088    0.192
       -0.20     0.129    0.292    0.091    0.199
        0.20     0.128    0.294    0.097    0.204
        0.40     0.129    0.293    0.105    0.207
        0.60     0.136    0.294    0.110    0.227
        0.80     0.147    0.297    0.146    0.278
 200   -0.80     0.195    0.487    0.126    0.296
       -0.60     0.203    0.499    0.133    0.301
       -0.40     0.206    0.505    0.141    0.319
       -0.20     0.217    0.519    0.143    0.328
        0.20     0.217    0.505    0.145    0.337
        0.40     0.217    0.494    0.145    0.332
        0.60     0.225    0.495    0.150    0.345
        0.80     0.221    0.488    0.183    0.359
3. THE DISTRIBUTION OF THE TEST $\tau$ AT DIFFERENT ITERATIONS
In Section 2, we showed that the original procedure of Vogelsang (1999) has
severe size distortions when applied in an iterative fashion to search for outliers.
The reason for this is that the limiting distribution of the $\tau$ test given by (5) is only
valid in the first step of the iteration. In subsequent steps, the asymptotic critical
values used need to be modified. The correct limiting distribution at each step is
given in Theorem 1.
Theorem 1. Suppose that $y_t$ is generated by (1) with $\delta_i = 0$ $(i = 1, \ldots, m)$ and let
$\tau(i)$ be the statistic $\tau$ obtained at step $i$ of the iterative search for outliers; then
$$\lim_{T \to \infty} \Pr[\tau(i) > x] = \frac{\Pr[H^* > x]}{\alpha^{i-1}}$$
where $\alpha$ is the significance level of the test. Hence, the correct $\alpha$-percentage point of
the limiting distribution of $\tau(i)$ is the $\alpha^i$ percentage point of the distribution of
$H^* = \sup_{\lambda \in (0,1)} |H(\lambda)|$ defined by (5).
Proof. The basic reason for this result is that, at different steps, the tests are
not independent; indeed, they are asymptotically equivalent because the series is
integrated. Hence, at each step $\tau(i) \Rightarrow H^*$ unconditionally on what happened in
the previous steps. But subsequent steps are applied only if the previous one
showed a rejection; hence one must consider the limiting distribution conditional
on a rejection at the previous step. For simplicity, consider this limiting
distribution for the second step. With $x_\alpha$ denoting the $\alpha$-percentage point
of the distribution of $H^*$, it is given by
$$\lim_{T \to \infty} \Pr[\tau(2) > x \mid \tau(1) > x_\alpha] = \frac{\lim_{T \to \infty} \Pr[(\tau(2) > x) \cap (\tau(1) > x_\alpha)]}{\lim_{T \to \infty} \Pr[\tau(1) > x_\alpha]} = \frac{\lim_{T \to \infty} \Pr[(\tau(2) > x) \cap (\tau(1) > x_\alpha)]}{\alpha}$$
since $\tau(1) \Rightarrow H^*$. Now, since we also have $\tau(2) \Rightarrow H^*$,
$$\lim_{T \to \infty} \Pr[\tau(2) > x \mid \tau(1) > x_\alpha] = \frac{\Pr[(H^* > x) \cap (H^* > x_\alpha)]}{\alpha} = \frac{\Pr[H^* > x]}{\alpha}$$
provided $x \ge x_\alpha$, which we shall need to have tests with correct sizes. The result
stated in the theorem follows using further iterations of the same arguments. QED
We shall denote by $\tau_c$ the iterative outlier detection procedure that uses the
correct (and different) asymptotic critical values at different steps. We have
simulated some asymptotic critical values. We approximate the Wiener process by
normalized sums of i.i.d. $N(0, 1)$ random variables using 200 steps. To obtain a
fair range of critical values, we used 2 million replications. Nevertheless, even with
such a large number of replications, the critical values can be obtained for only a
few cases. This is because as we proceed further in the iterations of the outlier
detection, we need percentage points of the distribution of $H^*$ that are very far in
the tail. For example, if the significance level is $\alpha = 0.05$, the percentage point
needed at the fourth iteration is approximately 0.00001 ($0.05^4 = 6.25 \times 10^{-6}$).
Hence, even with 2 million replications, we can only present critical values up to
$i = 4$ for $\alpha = 0.05$, $i = 5$ for $\alpha = 0.10$, and $i = 7$ for $\alpha = 0.20$. These are
presented in Table III.⁵

TABLE III
Asymptotic Critical Values of the Test $\tau_c$

   α     i    Model 1: z_t = {1}    Model 2: z_t = {1, t}
 0.05    1          2.99                   3.33
         2          3.69                   4.86
         3          4.29                  13.16
         4          4.43                  18.20
 0.10    1          2.81                   3.11
         2          3.38                   3.94
         3          3.88                   6.08
         4          4.33                  14.43
         5          4.78                  36.44
 0.20    1          2.61                   2.87
         2          3.05                   3.41
         3          3.43                   4.05
         4          3.79                   5.40
         5          4.12                   8.88
         6          4.42                  18.04
         7          4.73                  33.41
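The tail-quantile argument above can be sketched in code: the critical value at step $i$ is the upper $\alpha^i$ quantile of simulated draws from the limit distribution in (5), and the search over $i$ stops once too few draws fall in the required tail. The function name and the stopping rule `min_tail` are assumptions; a threshold of 10 expected tail draws happens to match the cutoffs reported in the text for 2 million replications, but the text does not state the rule that was actually used.

```python
import numpy as np

def corrected_critical_values(draws, alpha, min_tail=10):
    """Upper alpha**i quantiles of the draws for i = 1, 2, ... while the tail is estimable."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    draws = np.asarray(draws, dtype=float)
    n = draws.size
    values = []
    step = 1
    # Step i needs the upper alpha**i tail; stop once fewer than
    # min_tail draws are expected to fall in it.
    while n * alpha ** step >= min_tail:
        values.append(float(np.quantile(draws, 1.0 - alpha ** step)))
        step += 1
    return values
```

With 2 million draws, the expected number of draws beyond the step-4 point at $\alpha = 0.05$ is only 12.5, which illustrates why the far-tail critical values are imprecise and why no value can be reported for $i = 5$.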
4. A TEST USING FIRST DIFFERENCES OF THE DATA
As discussed in Section 5, Vogelsang’s original procedure is not powerful unless
the size of the outlier is very large. As a consequence, the full corrected iterative
procedure is even less powerful since the critical values to be used at each iteration
increase. Simulation evidence to that effect is presented in Section 5. Hence, it is
desirable to entertain an alternative outlier detection procedure that is less likely
to suffer from this low power problem.
⁵ Note that the critical values with $i = 1$ are not quite identical to those presented in Section
2 of this paper or in Vogelsang (1999) since 200 instead of 1000 steps were used to
approximate the Wiener process. The differences, however, are minor and do not affect
subsequent results.
We propose an iterative strategy using tests based on first-differences of the
data. Consider data generated by (1) with $d_t = \mu$, and a single outlier occurring at
date $T_{ao}$ with magnitude $\delta$. Then,
$$\Delta y_t = \delta [D(T_{ao})_t - D(T_{ao})_{t-1}] + v_t \tag{6}$$
where $D(T_{ao})_t = 1$ if $t = T_{ao}$ (0 otherwise) and $D(T_{ao})_{t-1} = 1$ if $t = T_{ao} + 1$
(0 otherwise). Equation (6) reflects the fact that a unit root process with an outlier is
characterized in first-differences by two successive outliers of equal magnitude but with
opposite signs. If the data are trending, a constant should be included. We have
that the least-squares estimate of $\delta$ is given by, with $t = T_{ao}$,
$$\hat{\delta} = \frac{\Delta y_t - \Delta y_{t+1}}{2} = \frac{v_t - v_{t+1}}{2}$$
under the null hypothesis of no outlier. So, the variance of $\hat{\delta}$ is given by
$$\mathrm{var}(\hat{\delta}) = \frac{R_v(0) - R_v(1)}{2}$$
where $R_v(j)$ is the autocovariance function of $v_t$ at delay $j$. Let
$$\hat{R}_v(j) = T^{-1} \sum_{t=1}^{T-j} \hat{v}_t \hat{v}_{t+j}$$
with $\hat{v}_t$ the least-squares residuals obtained from regression (6). Then, $\hat{R}_v(j)$ is a
consistent estimate of $R_v(j)$. We can then consider the test statistic
$$\tau_d = \sup_{T_{ao}} |t_{\hat{\delta}}(T_{ao})|$$
where
$$t_{\hat{\delta}}(T_{ao}) = \frac{\hat{\delta}}{\left((\hat{R}_v(0) - \hat{R}_v(1))/2\right)^{1/2}}.$$
To detect multiple outliers, we can follow a strategy similar to that suggested by
Vogelsang (1999), by dropping the observation labelled as an outlier before
proceeding to the next step. The important feature is that, unlike for the case of
tests based on levels (as the $\tau$ statistic of Vogelsang), in the limit the test $\tau_d$ is not
perfectly correlated across each step of the iterations when dealing with multiple
outliers. With i.i.d. errors, the values of $\tau_d$ are approximately uncorrelated at each
step of the iterations; with positively correlated errors, this no longer holds but the
correlation is mild.
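A minimal sketch of a single pass of the first-difference statistic is given below, for the version of regression (6) without a constant. The function name and the 0-based indexing convention are illustrative assumptions, and the sketch does not implement the iterative dropping of detected observations.

```python
import numpy as np

def tau_d_statistic(y):
    """Return (tau_d, position) for the no-constant version of regression (6)."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)  # dy[k] = y[k+1] - y[k]
    n = dy.size
    best_t, best_pos = 0.0, -1
    # An outlier at 0-based position p of y adds +delta to dy[p-1] and -delta to dy[p].
    for p in range(1, y.size - 1):
        delta_hat = (dy[p - 1] - dy[p]) / 2.0
        resid = dy.copy()                    # residuals of regression (6)
        resid[p - 1] -= delta_hat
        resid[p] += delta_hat
        r0 = resid @ resid / n               # estimate of R_v(0)
        r1 = resid[:-1] @ resid[1:] / n      # estimate of R_v(1)
        var_delta = (r0 - r1) / 2.0
        if var_delta <= 0.0:
            continue                         # degenerate case, skip this date
        t_abs = abs(delta_hat) / np.sqrt(var_delta)
        if t_abs > best_t:
            best_t, best_pos = t_abs, p
    return best_t, best_pos
```

The returned value would be compared with a critical value for $\tau_d$; the first and last observations are not candidates here because their effect on the differenced series is a single spike rather than a pair.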
The disadvantage of this procedure, compared to that based on the level of the
data, is that the limiting distribution depends on the specific distribution of the
errors $u_t$, though not on the presence of serial correlation and heteroscedasticity.
This problem is exactly the same as that for finding outliers in stationary time
series since, by differencing, we effectively work with a stationary series. The
standard practice in the literature is rather ad hoc and consists in rejecting if the
t-statistic on some observation is greater than a critical value chosen to be some
number between 3 and 4; see, for example, Tiao (1985), Chang and Tiao (1983)
and Tsay (1986), among others. Here, we shall simulate critical values assuming
i.i.d. normal errors and discuss the extent to which inference is affected when the
data deviate from these specifications. So, the data-generating process is again
$$y_t = y_{t-1} + u_t \tag{7}$$
where $u_t \sim$ i.i.d. $N(0, 1)$. Two sample sizes are considered, namely $T = 100$ and
$T = 200$. The number of replications used was 50,000. The percentage points of
the test $\tau_d$ are presented in Table IV. To assess the size of the test in finite samples
when correlation is present in the errors, we consider, as in Section 2.1, the same
process defined by (7) with correlated errors. Two cases are considered for the
errors $u_t$, namely MA(1) processes of the form $u_t = v_t + \theta v_{t-1}$ and AR(1) processes
of the form $u_t = \rho u_{t-1} + v_t$. In all cases, $v_t \sim$ i.i.d. $N(0, 1)$. We consider values of
$\theta$ and $\rho$ in the range $[-0.8, 0.8]$ with a step size of 0.4. The sample size is $T = 100$,
the number of replications used was 10,000 and tests at the 5% significance level
were performed. We consider the iterative procedure with up to four outliers. The
results are presented in Table V.
TABLE IV
Finite Sample Critical Values of the Test $\tau_d$

                         Model 1: z_t = {1}      Model 2: z_t = {1, t}
Level of significance    T = 100    T = 200      T = 100    T = 200
  1.0%                    4.14       4.20         4.13       4.19
  2.5%                    3.87       3.95         3.85       3.94
  5.0%                    3.65       3.75         3.63       3.74
 10.0%                    3.44       3.56         3.42       3.55
TABLE V
Exact Size of the Test Based on $\tau_d$

                              Probability to find
                 1st outlier  2nd outlier  3rd outlier  4th outlier
i.i.d. case         0.047        0.002        0.000        0.000
MA case
  θ = -0.80         0.053        0.003        0.000        0.000
  θ = -0.40         0.052        0.002        0.000        0.000
  θ =  0.40         0.034        0.003        0.001        0.000
  θ =  0.80         0.021        0.005        0.001        0.001
AR case
  ρ = -0.80         0.029        0.003        0.000        0.000
  ρ = -0.40         0.053        0.002        0.000        0.000
  ρ =  0.40         0.039        0.003        0.001        0.000
  ρ =  0.80         0.029        0.007        0.005        0.004
The probability of finding at least one outlier is close to the nominal 5% level
throughout. The test is slightly conservative with positive MA errors or when the
AR coefficient is very large in absolute value. The probability of finding at least
two outliers is close to the theoretically expected value of 0.0025. The probability
of finding more than two outliers is basically null in all cases. Hence, we conclude
that the iterative procedure is adequate in that it delivers the expected number of
rejections at each stage of the iterations. Also, the correction for the presence of
serial correlation appears to perform satisfactorily.
5. SIMULATIONS FOR SIZE AND POWER
In this section, we present results about the size and, especially, the power of the
various procedures when multiple outliers are present. The data-generating
process considered is
$$y_t = \sum_{j=1}^{m} \delta_j D(T_{ao,j})_t + u_t \tag{8}$$
$$u_t = u_{t-1} + v_t \tag{9}$$
where $D(T_{ao,j})_t = 1$ if $t = T_{ao,j}$ and 0 otherwise. This permits the presence of $m$
additive outliers occurring at dates $T_{ao,j}$ $(j = 1, \ldots, m)$. Again, two cases are considered
for the errors $v_t$: MA(1) processes of the form $v_t = e_t + \theta e_{t-1}$ and AR(1) processes
of the form $v_t = \rho v_{t-1} + e_t$. In all cases, $e_t \sim$ i.i.d. $N(0, 1)$. We consider values of $\theta$
and $\rho$ in the range $[-0.8, 0.8]$ with a step size of 0.4. We consider two cases: one
with $m = 0$ to assess size and one with $m = 4$ to assess power. All simulations are
based on a sample size $T = 100$ and 10,000 replications were performed. We
present results only for the case where a constant is included in the set of
deterministic components. The significance level of the test is set to 5%. For the
procedures based on $\tau_c$ and $\tau_d$, we used the critical values presented in Tables III
and IV respectively.
When $m = 4$, the outliers are located at observations 20, 40, 60 and 80.
The magnitudes of the outliers considered are either
(a) $\delta_1 = 5$, $\delta_2 = 3$ and $\delta_3 = \delta_4 = 2$, or
(b) $\delta_1 = 10$ and $\delta_2 = \delta_3 = \delta_4 = 5$.
We consider the properties of Vogelsang's uncorrected method ($\tau$), its corrected
version ($\tau_c$) and the method based on first-differenced data ($\tau_d$). The results are
presented in Table VI (MA errors) and Table VII (AR errors).
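The design in (8)-(9) with four outliers can be sketched as follows. The function name, the `kind` and `coef` arguments and the start-up value of the AR recursion are illustrative assumptions; the defaults correspond to case (a).

```python
import numpy as np

def simulate_with_outliers(T=100, dates=(20, 40, 60, 80),
                           deltas=(5.0, 3.0, 2.0, 2.0),
                           kind="iid", coef=0.0, rng=None):
    """Return (y, u): the contaminated series in (8) and its outlier-free I(1) part in (9)."""
    rng = np.random.default_rng() if rng is None else rng
    e = rng.standard_normal(T + 1)  # e_t ~ i.i.d. N(0, 1)
    if kind == "iid":
        v = e[1:]
    elif kind == "ma":
        v = e[1:] + coef * e[:-1]   # v_t = e_t + theta * e_{t-1}
    elif kind == "ar":
        v = np.empty(T)
        prev = e[0]                 # start-up value (an assumption)
        for t in range(T):          # v_t = rho * v_{t-1} + e_t
            prev = coef * prev + e[t + 1]
            v[t] = prev
    else:
        raise ValueError("kind must be 'iid', 'ma' or 'ar'")
    u = np.cumsum(v)                # u_t = u_{t-1} + v_t
    y = u.copy()
    for date, delta in zip(dates, deltas):
        y[date - 1] += delta        # dates are 1-based observation numbers
    return y, u
```

Case (b) corresponds to `deltas=(10.0, 5.0, 5.0, 5.0)`, and the size experiments to `dates=()` with `deltas=()`.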
Consider first the behaviour of the tests when there is no outlier. The only
procedure with a size close to the expected theoretical nominal size (5% at the first
step, 0.0025 at the second and basically 0 at the third and fourth) is that based on
first-differenced data (sd ), though, as noted before, it is somewhat conservative
with an AR coefficient that is large in absolute value and for positive MA
Ó Blackwell Publishing Ltd 2003
TABLE VI
Size and Power of the Tests to Detect for Additive Outliers; MA(1) Errors
d1 ¼ 0; d2 ¼ 0; d3 ¼ 0; d4 ¼ 0
d1 ¼ 5; d2 ¼ 3; d3 ¼ 2; d4 ¼ 2
d1 ¼ 10; d2 ¼ 5; d3 ¼ 5; d4 ¼ 5
s
sc
sd
s
sc
sd
s
sc
sd
1st outlier
2nd outlier
3rd outlier
4th outlier
0.166
0.026
0.004
0.001
0.242
0.001
0.000
0.000
0.056
0.002
0.000
0.000
0.834
0.360
0.094
0.022
0.865
0.121
0.001
0.000
0.746
0.179
0.019
0.002
0.998
0.957
0.865
0.661
0.999
0.828
0.359
0.147
1.000
0.921
0.779
0.518
h ¼ 0:40
1st outlier
2nd outlier
3rd outlier
4th outlier
0.063
0.021
0.011
0.007
0.098
0.003
0.000
0.000
0.054
0.002
0.000
0.000
0.286
0.058
0.013
0.007
0.342
0.009
0.000
0.000
0.941
0.396
0.076
0.007
0.793
0.401
0.194
0.081
0.823
0.199
0.021
0.004
1.000
0.997
0.985
0.912
h ¼ 0:00
1st outlier
2nd outlier
3rd outlier
4th outlier
0.040
0.023
0.015
0.010
0.065
0.004
0.000
0.000
0.047
0.002
0.000
0.000
0.101
0.024
0.014
0.010
0.135
0.004
0.000
0.000
0.996
0.674
0.228
0.040
0.464
0.122
0.038
0.013
0.516
0.035
0.001
0.000
1.000
1.000
1.000
0.998
h ¼ 0:40
1st outlier
2nd outlier
3rd outlier
4th outlier
0.036
0.024
0.018
0.012
0.054
0.004
0.000
0.000
0.038
0.004
0.001
0.000
0.055
0.022
0.015
0.010
0.079
0.003
0.000
0.000
1.000
0.821
0.380
0.106
0.231
0.044
0.016
0.010
0.282
0.007
0.001
0.000
1.000
1.000
1.000
1.000
h ¼ 0:80
1st outlier
2nd outlier
3rd outlier
4th outlier
0.130
0.026
0.015
0.011
0.164
0.004
0.000
0.000
1.000
1.000
1.000
1.000
0.036
0.053
0.021
0.042
0.063
0.994
0.026
0.004
0.005
0.023
0.003
0.749
0.019
0.000
0.001
0.017
0.000
0.297
0.014
0.000
0.001
0.012
0.000
0.087
P
Note: The data generating process is: y t ¼ 4j¼1 dj DðT ao;j Þt þ ut with ut ¼ ut1 þ vt and vt ¼ et þ het1 where et i.i.d:
Nð0; 1Þ. 10,000 replications are used.
205
Ó Blackwell Publishing Ltd 2003
h ¼ 0:80
SEARCHING FOR OUTLIERS I(1) TIME SERIES
Probability to find
206
Ó Blackwell Publishing Ltd 2003
TABLE VII
Size and Power of the Tests to Detect for Additive Outliers; AR(1) Errors
d1 ¼ 0; d2 ¼ 0; d3 ¼ 0; d4 ¼ 0
d1 ¼ 10; d2 ¼ 5; d3 ¼ 5; d4 ¼ 5
s
sc
sd
s
sc
sd
s
sc
sd
q ¼ 0:80
1st outlier
2nd outlier
3rd outlier
4th outlier
0.069
0.022
0.009
0.006
0.106
0.002
0.000
0.000
0.029
0.003
0.001
0.000
0.301
0.059
0.014
0.005
0.360
0.010
0.000
0.000
0.375
0.044
0.003
0.000
0.824
0.426
0.202
0.077
0.849
0.210
0.018
0.003
0.965
0.565
0.279
0.098
q ¼ 0:40
1st outlier
2nd outlier
3rd outlier
4th outlier
0.052
0.020
0.013
0.009
0.083
0.003
0.000
0.000
0.055
0.002
0.000
0.000
0.206
0.038
0.013
0.008
0.254
0.006
0.000
0.000
0.921
0.361
0.067
0.006
0.701
0.288
0.119
0.045
0.738
0.126
0.009
0.001
1.000
0.993
0.973
0.880
q ¼ 0:40
1st outlier
2nd outlier
3rd outlier
4th outlier
0.033
0.024
0.019
0.014
0.050
0.004
0.001
0.000
0.042
0.003
0.001
0.000
0.042
0.021
0.016
0.012
0.067
0.003
0.000
0.000
1.000
0.856
0.429
0.131
0.159
0.030
0.015
0.011
0.200
0.004
0.000
0.000
1.000
1.000
1.000
1.000
q ¼ 0:80
1st outlier
2nd outlier
3rd outlier
4th outlier
0.027
0.021
0.018
0.015
0.038
0.003
0.001
0.000
1.000
1.000
1.000
1.000
0.025
0.033
0.030
0.025
0.033
1.000
0.022
0.004
0.007
0.022
0.003
0.935
0.019
0.001
0.004
0.019
0.001
0.608
0.017
0.000
0.004
0.016
0.000
0.308
Note: The data generating process is $y_t = \sum_{j=1}^{4} \delta_j D(T_{ao,j})_t + u_t$ with $u_t = u_{t-1} + v_t$ and $v_t = \rho v_{t-1} + e_t$, where $e_t \sim$ i.i.d. $N(0, 1)$. 10,000 replications are used.
Probability to find
d1 ¼ 5; d2 ¼ 3; d3 ¼ 2; d4 ¼ 2
coefficients. Vogelsang’s procedure, whether corrected or not, shows substantial
size distortions (liberal tests) in the presence of negative MA errors and, to a
lesser extent, in the presence of strong negative AR errors. The results also confirm
that Vogelsang’s uncorrected procedure ($\tau$) finds an excessive number of
outliers when applied in an iterative fashion.
The most interesting feature of the results is that the methods based on the level
of the data have basically no power, while the method based on first-differenced
data has excellent power even for outliers of moderate size. Consider, for example,
case (a), which is representative of outliers of moderate size. With i.i.d.
errors, Vogelsang’s corrected procedure ($\tau_c$) finds one outlier in 14% of the cases
and basically never finds more than one. The method based on first-differenced
data finds at least one outlier almost 100% of the time and more than
three outliers 23% of the time. Consider now case (b), which is representative of
large outliers: the method $\tau_d$ finds four outliers basically 100% of the time, while
the method $\tau_c$ finds at least one outlier 52% of the time and more than two
outliers only 4% of the time. The results are qualitatively similar with serially
correlated errors. Negative serial correlation (of the AR or MA type) induces
a loss of power, while positive serial correlation (again of either type) induces an
increase in power.
5.1. Robustness to departures from a unit root
It is of interest to assess the extent to which our suggested procedure is robust to
departures from a unit root. Indeed, the purpose of detecting outliers is sometimes
to provide appropriate corrections to unit root tests, in which case it is unknown
whether a unit root is present. A popular device to analyse this issue is the so-called
near-integrated process, which specifies, under the null hypothesis of no outlier, that
$$y_t = \left(1 + \frac{c}{T}\right) y_{t-1} + v_t \tag{10}$$
where $v_t$ is a stationary process. Here, $c$ is a non-centrality parameter which
measures the extent of departures from a strict unit root process. When $c < 0$, we
have a locally stationary process. This is labelled a process local to unity since,
as $T$ increases, the AR parameter converges to one. A little algebra shows that,
under this specification, the OLS estimate $\hat{\delta}$ from regression (6) is
$$\hat{\delta} = \frac{\Delta y_t - \Delta y_{t+1}}{2} = \frac{-\frac{c}{T}\Delta y_t + (v_t - v_{t+1})}{2} = \frac{v_t - v_{t+1}}{2} + O_p(T^{-1})$$
Also, $(\hat{R}_v(0) - \hat{R}_v(1))/2$ remains a consistent estimate of the variance of $\hat{\delta}$. Hence,
we can expect our procedure to remain adequate under this local to unit root
set-up. But the robustness of our procedure also extends to the case where $y_t$ is a
stationary process. The reason is again that $(\hat{R}_v(0) - \hat{R}_v(1))/2$ remains a
consistent estimate of the variance of $\hat{\delta}$.
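The first-difference statistic discussed above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code. It assumes that regression (6) takes the form $\Delta y_t = \delta[D(T_{ao})_t - D(T_{ao})_{t-1}] + v_t$, that the autocovariances are computed from the OLS residuals of that regression without demeaning, and that the series has no drift; the names `autocov` and `tau_d` are hypothetical.

```python
import random


def autocov(x, lag):
    """Sample autocovariance at the given lag (no demeaning, divisor len(x))."""
    n = len(x)
    return sum(x[t] * x[t - lag] for t in range(lag, n)) / n


def tau_d(y):
    """Return (sup |t|, candidate date) for a single additive outlier.

    Assumes the first-difference regression dy_t = delta * (D_t - D_{t-1}) + v_t,
    so that delta_hat = (dy_t - dy_{t+1}) / 2 at each candidate date.
    """
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]  # dy[i] is the change into y[i + 1]
    best_stat, best_date = 0.0, None
    for i in range(len(dy) - 1):
        delta_hat = (dy[i] - dy[i + 1]) / 2.0
        resid = list(dy)
        resid[i] -= delta_hat       # the regressor is +1 at the outlier date
        resid[i + 1] += delta_hat   # and -1 one period later
        var_hat = (autocov(resid, 0) - autocov(resid, 1)) / 2.0
        if var_hat <= 0.0:
            continue
        stat = abs(delta_hat) / var_hat ** 0.5
        if stat > best_stat:
            best_stat, best_date = stat, i + 1  # position of the outlier in y
    return best_stat, best_date


# Illustration: a driftless random walk with one additive outlier of size 25 at t = 50.
rng = random.Random(0)
y = [0.0]
for _ in range(99):
    y.append(y[-1] + rng.gauss(0.0, 1.0))
y[50] += 25.0
stat, date = tau_d(y)
print(stat, date)
```

The additive outlier enters the first differences with opposite signs in two consecutive periods, which is why the estimate uses the contrast between adjacent differences. In practice the returned statistic would be compared with a simulated critical value for the relevant sample size.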
To assess the size of the procedures $\tau_c$ and $\tau_d$ under departures from a unit root,
we performed simulations with data generated by (10) with $T = 100$ and
$v_t \sim$ i.i.d. $N(0, 1)$ for a range of values of $c$. The results are presented in Table
VIII. They clearly show that the procedure $\tau_d$ has an exact size close to the
nominal 5% for any value of $c$. On the other hand, the procedure $\tau_c$ shows
increasing size distortions as $c$ moves away from 0 (it is easy to show that the
asymptotic distribution of the test $\tau_c$ is different under the local to unity
framework, the Wiener process being replaced by an Ornstein–Uhlenbeck
process with drift parameter $c$). The last row of Table VIII shows the exact size
when $y_t \sim$ i.i.d. $N(0, 1)$. Again, the procedure $\tau_d$ has an exact size close to the
nominal 5%.
When the process is stationary, a more natural procedure is to base the test on a
regression using levels of the data, i.e. a regression of the form
$$y_t = \mu + \delta D(T_{ao})_t + v_t \tag{11}$$
Let $\tilde{\delta}$ be the OLS estimate of $\delta$ and denote the t-statistic for testing $\delta = 0$ by
$$t_{\tilde{\delta}}(T_{ao}) = \frac{\tilde{\delta}}{\hat{R}_v(0)^{1/2}}$$
where $\hat{R}_v(0) = T^{-1} \sum_{t=1}^{T} \hat{v}_t^2$ with $\hat{v}_t$ the OLS residuals from regression (11). The
statistic is then
$$\tau_l = \sup_{T_{ao}} |t_{\tilde{\delta}}(T_{ao})|$$
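The level-based statistic can be computed in a single pass over the candidate dates, because a regression on a constant and a one-period dummy has a closed-form solution: the constant is the mean of the other observations and the dummy coefficient is the gap between the candidate observation and that mean. The Python sketch below relies on this; the function name `tau_l` and the simulated series are illustrative assumptions, not part of the original study.

```python
import random


def tau_l(y):
    """Return (sup |t|, candidate date) from the regression y_t = mu + delta * D_t + v_t."""
    n = len(y)
    total = sum(y)
    total_sq = sum(v * v for v in y)
    best_stat, best_date = 0.0, None
    for t in range(n):
        mu_hat = (total - y[t]) / (n - 1)   # mean of the other observations
        delta_hat = y[t] - mu_hat           # OLS coefficient on the dummy
        # The dummy absorbs y[t], so only the other residuals contribute.
        rss = total_sq - y[t] ** 2 - (n - 1) * mu_hat ** 2
        r0 = rss / n                        # T^{-1} times the sum of squared residuals
        if r0 <= 0.0:
            continue
        stat = abs(delta_hat) / r0 ** 0.5
        if stat > best_stat:
            best_stat, best_date = stat, t
    return best_stat, best_date


# Illustration: i.i.d. N(0, 1) data with one additive outlier of size 10 at t = 30.
rng = random.Random(1)
y = [rng.gauss(0.0, 1.0) for _ in range(100)]
y[30] += 10.0
stat, date = tau_l(y)
print(stat, date)
```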
TABLE VIII
Size of the Tests $\tau_c$ and $\tau_d$ with Non-unit Root Processes

                              $\tau_c$                   $\tau_d$
                        T = 100    T = 200         T = 100    T = 200
Near-integrated case
  c = -1.25              0.062      0.065           0.053      0.051
  c = -2.5               0.065      0.071           0.053      0.051
  c = -5.0               0.083      0.091           0.054      0.051
  c = -10.0              0.126      0.145           0.054      0.051
  c = -20.0              0.206      0.248           0.055      0.050
  c = -40.0              0.281      0.362           0.056      0.051
i.i.d. case              0.329      0.501           0.056      0.051

Note: For the near-integrated case, the data generating process is $y_t = (1 + c/T) y_{t-1} + e_t$, where
$e_t \sim$ i.i.d. $N(0, 1)$. For the i.i.d. case, it is $y_t = e_t$. 10,000 replications are used.

To assess the size and power properties of $\tau_d$ and $\tau_l$, we performed a simulation
experiment with data generated by (8)′. Now the errors $u_t$ are stationary and,
again, two cases are considered: MA(1) processes of the form $u_t = e_t + \theta e_{t-1}$ and
AR(1) processes of the form $u_t = \rho u_{t-1} + e_t$, with $e_t \sim$ i.i.d. $N(0, 1)$. The results,
obtained from 10,000 replications, are presented in Tables IX and X (the critical
values for $\tau_l$ were obtained in the same way as those for $\tau_d$ in Table IV, except that
regression (11) is used instead of (6)). When the process is i.i.d. or negatively
serially correlated, the procedure $\tau_d$ based on first differences is indeed less
powerful than that based on levels ($\tau_l$). However, with positive serial correlation,
the reverse holds and the procedure based on first differences is more powerful.
This is encouraging since most macroeconomic time series are positively
correlated. Hence, for most applications of interest in economics, which have
positive correlation, not only is $\tau_d$ valid if a unit root is present, but it is also more
powerful than the more common procedure based on levels.
5.2. Consistency against local alternatives
It is well known that tests for outliers of the type considered here (based on the value
of a single observation) are inconsistent against fixed alternatives. That is, the power
of the tests against a fixed value of $\delta$ does not converge to 1 as the sample size
increases. Nevertheless, insights into relative power can be obtained by looking at the
properties of the tests in the presence of local alternatives of the form
$$H_1 : \delta_T = \delta_o T^{a} \tag{12}$$
for some fixed $\delta_o$. It is easy to show that Vogelsang’s procedure, corrected or not
($\tau$ or $\tau_c$), is consistent against alternatives of the form (12) only for values $a > 1/2$. On
the other hand, the procedures $\tau_d$ and $\tau_l$ are consistent against such alternatives
for any value $a > 0$. This goes some way towards explaining the greater power
found for $\tau_d$ in the simulations.
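A heuristic way to see where the two thresholds come from is to compare the order of magnitude of the outlier with that of the noise in each t-ratio, evaluated at the true outlier date. The sketch below is an informal order-of-magnitude argument, not a proof. It assumes that the level-based statistic is the OLS t-ratio on the dummy in a regression of $y_t$ on a constant and the dummy when the errors are I(1), and it uses the expression for the first-difference estimate given in Section 5.1.

```latex
% Levels with I(1) errors: u_t is O_p(T^{1/2}), so both the estimation error
% and the standard error of the dummy coefficient are of order T^{1/2}.
\[
t^{\mathrm{level}}(T_{ao}) \approx
  \frac{\delta_T + O_p(T^{1/2})}{O_p(T^{1/2})}
  = O_p(1) + \delta_o \times \text{(a term of order } T^{a - 1/2}\text{)},
\]
% which diverges only if a > 1/2.

% First differences: the noise (v_t - v_{t+1})/2 and its standard error are O_p(1).
\[
t^{\mathrm{diff}}(T_{ao}) \approx
  \frac{\delta_T + (v_t - v_{t+1})/2}{\left[(R_v(0) - R_v(1))/2\right]^{1/2}}
  = O_p(1) + \delta_o \times \text{(a term of order } T^{a}\text{)},
\]
% which diverges for any a > 0.
```

The same reasoning applies to the level regression when the errors are stationary, since the residual variance is then bounded in probability. Taking the supremum over candidate dates makes the null distribution grow slowly with $T$ (at a logarithmic rate for normal errors), which does not alter the comparison because $T^{a}$ dominates.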
5.3. Departures from normality
As we argued above, the procedure $\tau_d$ has several advantages over the procedure
$\tau_c$ proposed by Vogelsang: the same critical values can be used at each step of the
iterations, power is much higher, and it is robust to departures from a unit root.
However, an advantage of the procedure $\tau_c$ is that it is not affected by departures
from the normality assumption (at least in large samples and with a strict unit
root). This is not the case for the procedure $\tau_d$, whose distribution is heavily
dependent on the normality assumption. This problem is not new; it has
been present throughout much of the literature on outlier detection and, as
discussed in Section 4, most authors have resorted to recommending some ad hoc rule of
thumb to decide whether to reject.
The distribution of tests like $\tau_d$ or the more common $\tau_l$ depends on the shape of
the tail of the distribution of the error process. For alternative distributions, we
obtained the following results from 25,000 replications. With uniform $[-\frac{1}{2}, \frac{1}{2}]$
errors, which have no tail, the 5% critical value of $\tau_d$ is 2.61 for $T = 100$
d1 ¼ 0; d2 ¼ 0; d3 ¼ 0; d4 ¼ 0
Probability to find
d1 ¼ 5; d2 ¼ 3; d3 ¼ 2; d4 ¼ 2
d1 ¼ 10; d2 ¼ 5; d3 ¼ 5; d4 ¼ 5
sl
sd
sl
sd
sl
sd
h ¼ 0:80
1st outlier
2nd outlier
3rd outlier
4th outlier
0.047
0.003
0.000
0.000
0.055
0.003
0.000
0.000
0.571
0.089
0.006
0.001
0.168
0.014
0.001
0.000
0.999
0.821
0.553
0.257
0.832
0.239
0.051
0.001
h ¼ 0:40
1st outlier
2nd outlier
3rd outlier
4th outlier
0.047
0.003
0.000
0.000
0.056
0.003
0.000
0.000
0.798
0.206
0.022
0.002
0.318
0.035
0.002
0.000
1.000
0.964
0.862
0.605
0.974
0.505
0.198
0.048
h ¼ 0:00
1st outlier
2nd outlier
3rd outlier
4th outlier
0.053
0.002
0.000
0.000
0.056
0.003
0.000
0.000
0.876
0.276
0.036
0.003
0.614
0.113
0.011
0.000
1.000
0.989
0.946
0.765
0.999
0.827
0.596
0.313
h ¼ 0:40
1st outlier
2nd outlier
3rd outlier
4th outlier
0.052
0.002
0.000
0.000
0.053
0.002
0.000
0.000
0.807
0.211
0.023
0.003
0.878
0.283
0.039
0.003
1.000
0.967
0.868
0.616
1.000
0.977
0.928
0.772
h ¼ 0:80
1st outlier
2nd outlier
3rd outlier
4th outlier
0.037
0.004
0.000
0.000
0.595
0.099
0.008
0.000
0.907
0.323
0.048
0.001
0.999
0.839
0.570
0.260
1.000
0.984
0.948
0.828
0.044
0.004
0.000
0.000
Note: The data generating process is $y_t = \sum_{j=1}^{4} \delta_j D(T_{ao,j})_t + u_t$ with
$u_t = e_t + \theta e_{t-1}$, where $e_t \sim$ i.i.d. $N(0, 1)$. 10,000 replications are used.
TABLE IX
Size and Power of $\tau_l$ and $\tau_d$ for Stationary Processes; MA(1) Errors
TABLE X
Size and Power of $\tau_l$ and $\tau_d$ for Stationary Processes; AR(1) Errors
d1 ¼ 5; d2 ¼ 3; d3 ¼ 2; d4 ¼ 2
d1 ¼ 10; d2 ¼ 5; d3 ¼ 5; d4 ¼ 5
sl
sd
sl
sd
sl
sd
1st outlier
2nd outlier
3rd outlier
4th outlier
0.021
0.003
0.001
0.000
0.025
0.002
0.000
0.000
0.315
0.027
0.002
0.000
0.062
0.003
0.000
0.000
0.959
0.511
0.197
0.058
0.397
0.048
0.005
0.000
q ¼ 0:40
1st outlier
2nd outlier
3rd outlier
4th outlier
0.046
0.003
0.000
0.000
0.052
0.003
0.000
0.000
0.789
0.189
0.021
0.002
0.287
0.027
0.002
0.000
1.000
0.957
0.840
0.567
0.952
0.432
0.141
0.030
q ¼ 0:40
1st outlier
2nd outlier
3rd outlier
4th outlier
0.051
0.003
0.000
0.000
0.056
0.003
0.000
0.000
0.804
0.200
0.022
0.002
0.885
0.283
0.043
0.004
1.000
0.963
0.855
0.591
1.000
0.979
0.938
0.788
q ¼ 0:80
1st outlier
2nd outlier
3rd outlier
4th outlier
0.055
0.002
0.000
0.000
0.366
0.034
0.003
0.000
0.985
0.533
0.129
0.020
0.975
0.585
0.249
0.070
1.000
0.999
0.998
0.986
0.022
0.004
0.001
0.000
Note: The data generating process is $y_t = \sum_{j=1}^{4} \delta_j D(T_{ao,j})_t + u_t$ with
$u_t = \rho u_{t-1} + e_t$, where $e_t \sim$ i.i.d. $N(0, 1)$. 10,000 replications are used.
q ¼ 0:80
Probability to find
d1 ¼ 0; d2 ¼ 0; d3 ¼ 0; d4 ¼ 0
(compared to 3.65 with normal errors). On the other hand, when the errors are
distributed as a $\chi^2(1)$ (centred to have mean zero), the 5% critical value is 6.22,
much higher owing to the long right tail of the distribution.
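The dependence of critical values on tail shape is easy to reproduce qualitatively by simulation. The Python sketch below uses a deliberately simplified stand-in statistic, the largest studentized absolute deviation in an i.i.d. sample, rather than the first-difference or level statistics studied here, so its numbers will not match those quoted in the text. The function names, the number of replications and the seed are all illustrative assumptions.

```python
import random


def max_studentized(x):
    """Largest |x_t - mean| / sd: a simplified stand-in for a sup-type outlier statistic."""
    n = len(x)
    mean = sum(x) / n
    sd = (sum((v - mean) ** 2 for v in x) / n) ** 0.5
    return max(abs(v - mean) for v in x) / sd


def critical_value(draw, n_obs=100, n_rep=2000, level=0.05, seed=0):
    """Simulated upper-`level` quantile of the statistic when errors come from `draw`."""
    rng = random.Random(seed)
    stats = sorted(
        max_studentized([draw(rng) for _ in range(n_obs)]) for _ in range(n_rep)
    )
    return stats[int((1.0 - level) * n_rep)]


cv_uniform = critical_value(lambda rng: rng.uniform(-0.5, 0.5))      # no tail
cv_normal = critical_value(lambda rng: rng.gauss(0.0, 1.0))          # normal tails
cv_chi2 = critical_value(lambda rng: rng.gauss(0.0, 1.0) ** 2 - 1.0)  # centred chi-square(1)
print(cv_uniform, cv_normal, cv_chi2)
```

The ordering of the three simulated quantiles mirrors the pattern reported above: the thinner the tail of the error distribution, the smaller the critical value of a statistic built from the most extreme observation.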
To the authors’ knowledge, no satisfactory procedure is available to overcome
this dependence of outlier detection procedures on the exact nature of the error
distribution. It may be possible to use extreme value theory and non-parametric
estimates of tail behaviour but such an extension is well beyond the scope of the
present paper.
6.
SIZE OF CORRECTED ADF UNIT ROOT TESTS
As noted by Franses and Haldrup (1994) and Vogelsang (1999), the presence of
outliers biases unit root tests towards over-rejection of the null hypothesis, acting
like a negative MA component. One of the aims of outlier detection mentioned by
these authors is to be able to correct the unit root tests by incorporating
appropriate dummy variables. To assess the relative merits of the outlier detection
procedures discussed in correcting the size of unit root tests, we again resorted to
a simulation analysis concerning the size of the Dickey–Fuller (1979) test given by
the t-statistic for testing that $\alpha = 1$ in the regression
$$y_t = \mu + \alpha y_{t-1} + \sum_{i=0}^{p+1} \sum_{j=1}^{m} \delta_{ij} D(T_{ao,j})_{t-i} + \sum_{i=1}^{k} d_i \Delta y_{t-i} + e_t$$
where $D(T_{ao,j})_{t-i} = 1$ if $t = T_{ao,j} + i$ and 0 otherwise, with $T_{ao,j}$ $(j = 1, \ldots, m)$ the
dates of the outliers identified. The data generating process is the same as
described earlier. In constructing the unit root tests, the lag length $k$ was selected
in the same way as in Vogelsang (1999), namely using a recursive general-to-specific
t-test on the last lag with a significance level of 10%, starting at a
maximal value set at 5. The results are presented in Table XI (MA errors)
and Table XII (AR errors). For each case, the row marked ‘without’ indicates
the percentage of rejections of the null hypothesis that occurred when no outlier
was selected; the row marked ‘with’ indicates the percentage of rejections of
the null hypothesis when outliers were detected and the appropriate dummies
introduced in the autoregression; the row marked ‘total’ is simply the sum of the
two values.
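The dummy regressors entering the corrected regression are simple to construct: each identified outlier date contributes one impulse dummy at the date itself and one at each of the following shifts. The Python sketch below builds these columns only; it does not estimate the regression. The function name `outlier_dummies`, the parameter `max_shift` (standing in for the upper limit of the sum over $i$) and the 0-based dating are illustrative assumptions.

```python
def outlier_dummies(n_obs, outlier_dates, max_shift):
    """Build the columns D(T_ao,j)_{t-i} for i = 0..max_shift and each outlier date.

    Column (i, j) equals 1 at t = outlier_dates[j] + i and 0 elsewhere. Dates are
    0-based positions in the sample; columns that would fall outside it are skipped.
    """
    columns = {}
    for j, date in enumerate(outlier_dates):
        for i in range(max_shift + 1):
            t = date + i
            if 0 <= t < n_obs:
                col = [0] * n_obs
                col[t] = 1
                columns[(i, j)] = col
    return columns


# Two hypothetical outlier dates in a sample of 10 observations, shifts i = 0, 1, 2.
dummies = outlier_dummies(n_obs=10, outlier_dates=[3, 8], max_shift=2)
for key in sorted(dummies):
    print(key, dummies[key])
```

Including the shifted dummies removes the influence of an outlier not only on $y_t$ at the outlier date but also on the lagged level and lagged differences in which the contaminated observation reappears.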
We first consider the case where no outlier is present. This establishes a base
case against which to compare size distortions when outliers are present. With AR
errors, all procedures have approximately the correct size. The same is true with a
positively correlated MA component. As is well known, the ADF unit root test
suffers from substantial size distortions with a negatively correlated MA
component, and this is reflected in our results. When outliers are present, the
size of the ADF test corrected for outliers using $\tau_c$ is, in almost all cases, larger
than when corrected using the method $\tau_d$. For example, with i.i.d. errors and large
TABLE XI
Size of the ADF Test; MA(1) Errors*
d1 ¼ 5; d2 ¼ 3; d3 ¼ 2; d4 ¼ 2
d1 ¼ 10; d2 ¼ 5; d3 ¼ 5; d4 ¼ 5
s
sc
sd
s
sc
sd
s
sc
sd
Without
With
Total
0.309
0.089
0.398
0.274
0.122
0.396
0.368
0.021
0.389
0.033
0.399
0.431
0.024
0.402
0.426
0.124
0.309
0.433
0.000
0.380
0.380
0.000
0.460
0.460
0.000
0.392
0.392
h ¼ 0:40
Without
With
Total
0.069
0.019
0.088
0.062
0.025
0.087
0.081
0.005
0.086
0.040
0.070
0.110
0.031
0.072
0.103
0.007
0.081
0.088
0.003
0.123
0.125
0.001
0.132
0.133
0.000
0.074
0.074
h ¼ 0:00
Without
With
Total
0.038
0.012
0.050
0.034
0.016
0.049
0.049
0.002
0.051
0.048
0.030
0.078
0.041
0.035
0.076
0.001
0.051
0.052
0.011
0.082
0.094
0.008
0.079
0.087
0.000
0.041
0.041
h ¼ 0:40
Without
With
Total
0.048
0.010
0.057
0.044
0.012
0.056
0.057
0.001
0.058
0.035
0.016
0.051
0.029
0.020
0.049
0.000
0.045
0.045
0.024
0.055
0.079
0.017
0.057
0.074
0.000
0.043
0.043
h ¼ 0:80
Without
With
Total
0.048
0.008
0.056
0.045
0.008
0.053
0.056
0.001
0.057
0.036
0.012
0.047
0.031
0.015
0.046
0.000
0.050
0.050
0.044
0.032
0.076
0.037
0.033
0.070
0.000
0.043
0.043
*Choosing the lag length with the sequential t-sig method.
h ¼ 0:80
d1 ¼ 0; d2 ¼ 0; d3 ¼ 0; d4 ¼ 0
TABLE XII
Size of the ADF Test; AR(1) Errors*
d1 ¼ 5; d2 ¼ 3; d3 ¼ 2; d4 ¼ 2
d1 ¼ 10; d2 ¼ 5; d3 ¼ 5; d4 ¼ 5
s
sc
sd
s
sc
sd
s
sc
sd
q ¼ 0:80
Without
With
Total
0.040
0.015
0.055
0.034
0.016
0.051
0.049
0.001
0.050
0.034
0.056
0.090
0.028
0.053
0.081
0.061
0.024
0.085
0.001
0.110
0.111
0.001
0.118
0.119
0.008
0.089
0.097
q ¼ 0:40
Without
With
Total
0.040
0.013
0.053
0.036
0.017
0.053
0.050
0.003
0.053
0.033
0.042
0.075
0.027
0.043
0.070
0.006
0.047
0.053
0.003
0.099
0.103
0.002
0.097
0.099
0.000
0.045
0.045
q ¼ 0:40
Without
With
Total
0.041
0.009
0.050
0.038
0.010
0.048
0.050
0.002
0.052
0.031
0.012
0.043
0.026
0.016
0.042
0.000
0.041
0.041
0.020
0.032
0.052
0.015
0.031
0.046
0.000
0.041
0.041
q ¼ 0:80
Without
With
Total
0.051
0.004
0.055
0.049
0.006
0.055
0.056
0.001
0.057
0.039
0.004
0.043
0.037
0.006
0.043
0.000
0.045
0.045
0.030
0.004
0.034
0.028
0.007
0.035
0.000
0.043
0.043
*Choosing the lag length with the sequential t-sig method.
d1 ¼ 0; d2 ¼ 0; d3 ¼ 0; d4 ¼ 0
outliers, the size is 0.087 when corrected with $\tau_c$ and 0.041 when corrected with $\tau_d$.
Comparing the rows ‘with’ and ‘without’, we see that, when outliers are present,
rejections of the unit root occurring when no outlier is identified are very rare for
the method $\tau_d$, while they are substantial when using the method $\tau_c$.
As emphasized by Franses and Haldrup (1994), outliers induce an MA-like
component in the errors when they are not accounted for. Even if Vogelsang’s
method selects too few outliers (given its low power), the size of the corrected
ADF test can be brought close to nominal size by using a data-dependent
method to select the lag length, since the latter tends to correct for missed
outliers by choosing a higher lag length (the missed outliers having the effect of
inducing a negative MA structure in the errors). To verify this claim, we
conducted the same simulation experiments with pure AR(1) errors and the lag
length fixed at its true value 1. The results are presented in Table XIII. Consider,
for instance, the case with an AR coefficient $\rho = -0.4$ with large outliers. The size
of the unit root test corrected using $\tau_c$ is 0.19 with $k$ fixed at 1 instead of 0.10 with
$k$ selected using the sequential t-test procedure. Hence, it is clear that the failure to
account for all outliers present can be compensated by the selection of a larger lag
length. Yet, as the results for the size of the unit root test corrected using $\tau_d$ show,
a good method to select outliers does a better job at reducing size distortions.
7.
EMPIRICAL APPLICATIONS
The procedures analysed in the last sections were applied to two series of real-exchange
rates for US/Finland. The first series covers the period 1900–88 and is
constructed using the consumer price index (CPI). The other series spans the
years 1900–87 and is constructed using the gross domestic product (GDP)
deflator. The series are shown in Figures 1 and 2, respectively. These are the same
series used by Vogelsang (1999), Franses and Haldrup (1994) and Perron and
Vogelsang (1992) and are described in more detail in Appendix A.
Franses and Haldrup (1994) used the TRAM program (Time Series Regression
With ARIMA Noise and Missing Values) written by Gómez and Maravall
(1992b) to search for outliers in these two real-exchange rate series. They
considered two types of outliers, additive outliers and outliers that produce
temporary changes, denoted AO and TC outliers, respectively. For the US/
Finland real-exchange rate series based on the CPI index, they found four additive
outliers at dates 1918, 1922, 1945 and 1948. The observations associated with the
years 1917, 1932 and 1949 were found to be outliers that produce temporary
changes (TC outliers). For the US/Finland real-exchange rate series based on the
GDP deflator, an additive outlier was found only at date 1918, whereas outliers
that produce temporary changes were found at dates 1917, 1932, 1949 and 1957.
Table XIV reports the empirical results from applying the procedures discussed
in this paper using 5% and 10% significance levels. Vogelsang (1999) presents
results for additive outliers only for the US/Finland real-exchange rate series
TABLE XIII
Size of the ADF Test; AR(1) Errors*
d1 ¼ 0; d2 ¼ 0; d3 ¼ 0; d4 ¼ 0
d1 ¼ 5; d2 ¼ 3; d3 ¼ 2; d4 ¼ 2
d1 ¼ 10; d2 ¼ 5; d3 ¼ 5; d4 ¼ 5
sc
sd
s
sc
sd
s
sc
sd
q ¼ 0:80
Without
With
Total
0.037
0.018
0.055
0.031
0.022
0.053
0.046
0.001
0.047
0.056
0.077
0.133
0.046
0.078
0.124
0.091
0.033
0.124
0.013
0.192
0.205
0.009
0.232
0.241
0.019
0.163
0.182
q ¼ 0:40
Without
With
Total
0.034
0.016
0.050
0.029
0.021
0.050
0.045
0.002
0.047
0.051
0.058
0.109
0.041
0.058
0.099
0.010
0.057
0.067
0.017
0.159
0.176
0.011
0.177
0.188
0.000
0.050
0.050
q ¼ 0:00
Without
With
Total
0.034
0.013
0.047
0.030
0.018
0.048
0.046
0.002
0.048
0.035
0.029
0.064
0.030
0.031
0.061
0.000
0.048
0.048
0.021
0.094
0.115
0.015
0.089
0.104
0.000
0.046
0.046
q ¼ 0:40
Without
With
Total
0.038
0.009
0.047
0.034
0.013
0.047
0.048
0.002
0.050
0.019
0.011
0.030
0.014
0.015
0.029
0.000
0.041
0.041
0.017
0.023
0.040
0.012
0.025
0.037
0.000
0.045
0.045
q ¼ 0:80
Without
With
Total
0.047
0.005
0.052
0.045
0.008
0.053
0.052
0.001
0.053
0.023
0.004
0.027
0.021
0.008
0.029
0.000
0.036
0.036
0.028
0.004
0.032
0.025
0.008
0.033
0.000
0.047
0.047
*With the lag length fixed at one.
s
Figure 1. Logarithm of the US/Finland real exchange rate based on the consumer price index (CPI);
annual from 1900 to 1988.
Figure 2. Logarithm of the US/Finland real exchange rate based on the GDP deflator; annual from
1900 to 1987.
TABLE XIV
Empirical Results; Logarithm of the US/Finland Real Exchange Rate

Significance level   Test       CPI-based series 1900–88        GDP-based series 1900–87
5.0%                 $\tau_c$   1918                            No outlier
5.0%                 $\tau_d$   1917, 1918, 1919, 1932, 1948    1918, 1949
10.0%                $\tau_c$   1918, 1919                      No outlier
10.0%                $\tau_d$   1917, 1918, 1919, 1932, 1948    1917, 1918, 1919, 1921, 1932, 1947, 1948, 1957
based on the CPI index. The dates he found (using the procedure $\tau$) were 1917–19,
1921 and 1932. When appropriately corrected, Vogelsang’s (1999) method finds
outliers only for the year 1918 at the 5% level and for 1918 and 1919 at the 10%
level, illustrating the fact that, when it is not corrected, it tends to select more
outliers than warranted. The procedure based on first-differenced data ($\tau_d$) finds
outliers at dates 1917, 1918, 1919, 1932 and 1948 at both the 5% and 10%
significance levels. This illustrates how the latter method is more powerful.
For the US/Finland real-exchange rate series based on the GDP deflator, the
method based on $\tau_c$ finds no outlier. As mentioned by Vogelsang (1999), this may
be due to the presence of a shift in the mean of the series, as documented by Perron
and Vogelsang (1992). The procedure based on first-differenced data ($\tau_d$) is,
nevertheless, able to identify the years 1918 and 1948 as outliers at the 5% level.6
These two dates are not associated with the change in mean identified by Perron
and Vogelsang (1992) as occurring in 1937. The fact that our procedure identifies
the year 1918 as an outlier is comforting since visual inspection clearly points in
that direction.
6
At the 10% significance level, the outliers found are for the years 1917, 1918, 1919, 1921,
1932, 1947, 1948 and 1957.
8.
CONCLUSIONS
We analysed in this paper the size and power properties of some test procedures
for multiple outliers in series with an AR unit root. We showed, via simulations,
that the procedure suggested by Vogelsang (1999) has indeed the right size when
applied to detect a single outlier but that it finds an excessive number of outliers
when applied in an iterative fashion. We showed this iterative method to be
theoretically incorrect and we derived the appropriate limiting distribution for
each step of the iterations. We also showed that, whether corrected or not, such
outlier detection methods based on the level of the data have very low power
unless the magnitude of the outliers is unrealistically large. Our suggestion was to
use a procedure based on first-differenced data, which was shown to have
considerably more power. Our analysis remained in the tradition of sequential
searches for outliers. It may well be the case that a global procedure would
perform better. Work is under way to investigate this issue.
APPENDIX: THE DATA
The US/Finland real-exchange rate series based on the CPI index and the GDP deflator
were kindly provided by Tim Vogelsang. They are the same series used in Vogelsang (1999),
Franses and Haldrup (1994) and Perron and Vogelsang (1992). The US/Finland real-exchange
rate series based on the CPI index is annual from 1900 to 1988, whereas that
based on the GDP deflator is from 1900 to 1987. The details of the sources are as follows
(Perron and Vogelsang 1992, App. A): Nominal exchange rate series, 1900–88 from the
Bank of Finland; CPI, 1900–85 from the Bank of Finland, 1986–88 from the IMF (1988);
GDP deflator, 1900–85 from the Bank of Finland, 1986–87 from IMF (1988). The sources
of the US data are: for the GNP deflator, 1869–1975 from Friedman and Schwartz (1982),
1976–88 from IMF (1988); for the CPI, 1860–1970 from the US Bureau of the Census
(1976) and 1971–88 from IMF (1988).
REFERENCES
Box, G. E. P. and Tiao, G. C. (1975) Intervention analysis with applications to economic and
environmental problems. Journal of the American Statistical Association 70, 70–9.
Chang, I. and Tiao, G. C. (1983) Estimation of time series parameters in the presence of outliers,
Technical Report 8, University of Chicago, Statistics Research Center.
———, ——— and Chen, C. (1988) Estimation of time series parameters in the presence of outliers.
Technometrics 30, 193–204.
Chen, C. and Liu, L. (1993) Joint estimation of model parameters and outlier effects in time series.
Journal of the American Statistical Association 88, 284–97.
Dickey, D. A. and Fuller, W. A. (1979) Distribution of the estimators for autoregressive time series
with a unit root. Journal of the American Statistical Association 74, 427–31.
Fox, A. J. (1972) Outliers in time series. Journal of the Royal Statistical Society Series B 34,
350–63.
Franses, P. H. and Haldrup, N. (1994) The effects of additive outliers on tests for unit roots and
cointegration. Journal of Business & Economic Statistics 12, 471–8.
Friedman, M. and Schwartz, A. J. (1982) Monetary Trends in the United States and the United
Kingdom: Their Relation to Income, Prices and Interest Rates, 1867–1975. Chicago: The University of
Chicago Press.
Gómez, V. and Maravall, A. (1992a) Estimation, prediction and interpolation for nonstationary
series with the Kalman Filter. European University Institute, Working Paper ECO 92/80.
———, and ——— (1992b) Time series regression with ARIMA noise and missing observations.
Program TRAM. European University Institute, Working Paper ECO 92/81.
Hawkins, D. M. (1973) Repeated testing for outliers. Statistica Neerlandica 27, 1–10.
——— (1980) Identification of Outliers. New York: Chapman and Hall.
International Monetary Fund (1988) International Financial Statistics: Yearbook. Washington,
DC: IMF.
Peña, D. (1990). Influential observations in time series. Journal of Business & Economic Statistics 8,
235–41.
Perron, P. and Ng, S. (1996) Useful modifications to some unit root tests with dependent errors and
their local asymptotic properties. Review of Economic Studies 63, 435–63.
——— and Vogelsang, T. J. (1992) Nonstationarity and level shifts with an application to purchasing
power parity. Journal of Business & Economic Statistics 10, 301–20.
Rodríguez, G. (1999) Unit Root, Outliers and Cointegration Analysis with Macroeconomic Applications, Unpublished PhD dissertation, Département de Sciences Économiques, Université de
Montréal.
Shin, D. W., Sarkar, S. and Lee, J. H. (1996) Unit root tests for time series with outliers. Statistics and
Probability Letters 30, 189–97.
Stock, J. H. (1999) A class of tests for integration and cointegration. In Cointegration, Causality and
Forecasting. A Festschrift in Honour of Clive W. J. Granger (eds R. F. Engle and H. White). Oxford
University Press, 137–67.
Tiao, G. C. (1985) Autoregressive moving average models, intervention problems and outlier detection
in time series. In Time Series in the Time Domain, Handbook of Statistics 5, (eds E. J. Hannan, P.
R. Krishnaiah and M. M. Rao). New York: North Holland, 85–118.
Tsay, R. S. (1986) Time series model specification in the presence of outliers. Journal of the American
Statistical Association 81, 132–41.
US Bureau of the Census (1976) The Statistical History of the United States, From Colonial Times to
the Present. New York: Basic Books.
Vogelsang, T. J. (1999) Two simple procedures for testing for a unit root when there are additive
outliers. Journal of Time Series Analysis 20, 237–52.