
SEARCHING FOR ADDITIVE OUTLIERS IN NONSTATIONARY TIME SERIES

2003, Journal of Time Series Analysis

Abstract

Recently, Vogelsang (1999) proposed a method to detect outliers which explicitly imposes the null hypothesis of a unit root. It works in an iterative fashion to select multiple outliers in a given series. We show, via simulations, that, under the null hypothesis of no outliers, it has the right size in finite samples to detect a single outlier but, when applied in an iterative fashion to select multiple outliers, it exhibits severe size distortions towards finding an excessive number of outliers. We show that his iterative method is incorrect and derive the appropriate limiting distribution of the test at each step of the search. Whether corrected or not, we also show that the outliers need to be very large for the method to have any decent power. We propose an alternative method based on first-differenced data that has considerably more power. We also show that our method to identify outliers leads to unit root tests with more accurate finite sample size and robustness to departures from a unit root. The issues are illustrated using two US/Finland real exchange rate series.

By Pierre Perron and Gabriel Rodríguez*
Boston University and Université d'Ottawa
First Version received May 2000

Keywords. Additive outliers; t-test; Wiener process; unit root; size; power.
JEL: C2, C3, C5.

1. INTRODUCTION

Since Fox (1972), who introduced the notions of additive and innovational outliers, issues related to this type of atypical observation in time series have received considerable attention in the statistics and econometrics literature. Outlier detection itself has received particular attention.¹ Another topic of interest has been the estimation of autoregressive moving average (ARMA) models in the presence of outliers. In this case, as mentioned by Chen and Liu (1993), a common approach is to identify the locations and types of the outliers and then to accommodate their effects using intervention models as proposed by Box and Tiao (1975). This approach requires iterating between stages of outlier detection and model estimation.²

In the context of integrated data (processes with an autoregressive (AR) unit root), the effects of additive outliers have recently been the object of sustained research. It is by now well recognized that outliers affect the properties of unit root tests (Franses and Haldrup, 1994). They do so by inducing a negative moving average (MA) component in the noise function, which causes most unit root tests to exhibit substantial size distortions towards rejecting the null hypothesis too often.

* This paper is drawn from Chapter 3 of Gabriel Rodríguez's PhD dissertation at the Université de Montréal, Rodríguez (1999). We would like to thank Tim Vogelsang for useful conversations. We also thank Lynda Khalaf for comments on an earlier version of this paper entitled 'Additive Outliers and Unit Roots with an Application to Latin-American Inflation' when it was presented at the 39th Congrès de la société canadienne de sciences économiques, Hull (Québec), May 1999. Address for correspondence: Pierre Perron, Department of Economics, Boston University, 270 Bay State Road, Boston, MA 02215, USA (e-mail: [email protected]).
¹ See, for example, Hawkins (1980), who presents a set of methods proposed before 1980, and Hawkins (1973), who proposed one of the most widely used methods, based on order statistics, to detect outliers.

0143-9782/03/02 193–220 JOURNAL OF TIME SERIES ANALYSIS Vol. 24, No. 2 © 2003 Blackwell Publishing Ltd., 9600 Garsington Road, Oxford OX4 2DQ, UK and 350 Main Street, Malden, MA 02148, USA.
Franses and Haldrup (1994) suggested applying Dickey–Fuller (1979) unit root tests with dummy variables incorporated in the autoregression, the dummies being chosen on the basis of the outlier detection procedure proposed by Chen and Liu (1993). This procedure has been implemented in the computer program TRAM (Time Series Regression with ARIMA Noise and Missing Values) written by Gómez and Maravall (1992b), which allows the estimation of ARIMA models in which missing observations may be treated as additive outliers.

In an interesting recent paper, Vogelsang (1999) makes two contributions regarding the effects of additive outliers on unit root tests. First, recognizing that outliers induce a negative MA component, he suggests using the unit root tests developed by Stock (1999) and Perron and Ng (1996). These tests are robust in the sense of achieving exact size close to nominal size in small samples, even in the presence of a substantial negative MA component. He shows via simulations that these unit root tests are little affected by systematic outliers. Second, he recognized that one can take advantage of the null hypothesis of a unit root in devising an outlier detection procedure. This allows the derivation of a non-degenerate limiting distribution for the t-statistic on the relevant one-time dummy.

In this paper, we make further contributions following the second suggestion of Vogelsang (1999). We show, via simulations, that Vogelsang's (1999) procedure, under the null hypothesis of no outlier, has the right size in finite samples to detect a single outlier but, when applied in an iterative fashion to select multiple outliers, it exhibits severe size distortions towards finding an excessive number of outliers. We show that there is a basic flaw in the iterative method suggested by Vogelsang (1999). Contrary to what he implicitly assumes, the limiting distribution of the test used is different at each iteration of the outlier detection procedure. We derive the appropriate limiting distribution and tabulate some critical values. When so corrected, his method is shown to have very low power to detect outliers unless the magnitude of the outliers is very large (and even the uncorrected method has low power to detect a single outlier). As an alternative, we propose a method based on first-differenced data which has considerably more power. All of the methods considered are illustrated using two US/Finland real exchange rate series.

The rest of the paper is organized as follows. Section 2 deals with the model and the issue of outlier detection. It reviews the procedure suggested by Vogelsang (1999) and presents simulation evidence about its size. Section 3 derives the correct limiting distribution of the test he suggested for each iteration of the outlier detection procedure. Section 4 presents the procedure based on first-differenced data. Section 5 uses simulations to compare its size and power to those of methods based on the levels of the data. The size of unit root tests corrected using the various methods to detect outliers is investigated in Section 6. An empirical illustration using two US/Finland real exchange rate series is presented in Section 7. Section 8 presents brief concluding remarks, and some details about the data used are discussed in an Appendix.

² Some references are Chang et al. (1988) and Tsay (1986). Chen and Liu (1993) also followed this approach and proposed another method to detect the locations of the outliers and jointly estimate the parameters of the model. Their point was that, even if the model is well specified, outliers may still produce biased estimates of the parameters and, hence, may affect the outlier detection procedure. This is because atypical observations, in general, affect the variance of the estimates (Peña, 1990).

2. THE MODEL AND THE ISSUE OF OUTLIER DETECTION

There is a large literature in statistics and econometrics on outlier detection in ARMA models. The standard approach is to estimate a fully parameterized ARMA model and construct a t-statistic for the presence of an outlier. Such a t-statistic is constructed at all possible dates and the supremum is taken. The value of the supremum is then compared to a critical value to decide whether an outlier is present. Some references are Tsay (1986), Chang et al. (1988), Shin et al. (1996) and Chen and Liu (1993). Using a time series with an ARIMA noise function, Gómez and Maravall (1992a) proposed to analyse missing observations as additive outliers. This paper was the basis for the computer program TRAM, written by Gómez and Maravall (1992b) to estimate ARIMA models with missing observations. Franses and Haldrup (1994) used TRAM in the context of outlier detection in time series with unit roots.

The issue of outlier detection in the unit root framework offers a distinct advantage, namely that one can work under the null hypothesis that a unit root is present. This is the approach taken by Vogelsang (1999), whose procedure has two useful features. First, it does not require a fully parametric model of the noise function and is valid for a wide class of processes. Second, an asymptotic distribution can be obtained and critical values tabulated without making specific distributional and parametric assumptions about the data-generating process.

The data-generating process entertained is of the general form

$$y_t = d_t + \sum_{j=1}^{m} \delta_j D(T_{ao,j})_t + u_t \qquad (1)$$

where $D(T_{ao,j})_t = 1$ if $t = T_{ao,j}$ and 0 otherwise. This permits the presence of $m$ additive outliers occurring at dates $T_{ao,j}$ $(j = 1, \ldots, m)$. The term $d_t$ specifies the deterministic components.
In most cases, $d_t = \mu$ if the series is non-trending or $d_t = \mu + \beta t$ if the series is trending (of course, other specifications are possible). The noise function is integrated of order one, i.e.

$$u_t = u_{t-1} + v_t \qquad (2)$$

where $v_t$ can be, for example, a linear process of the form $v_t = \psi(L) e_t$ with

$$\psi(L) = \sum_{i=0}^{\infty} \psi_i L^i, \qquad \sum_{i=0}^{\infty} i^2 \psi_i^2 < \infty$$

and $e_t$ is a martingale difference sequence with mean 0 and

$$\sigma_e^2 = \lim_{T \to \infty} T^{-1} \sum_{t=1}^{T} E(e_t^2)$$

finite. What is important is that the sequence $v_t$ satisfies the conditions for the application of a functional central limit theorem such that

$$T^{-1/2} \sum_{t=1}^{[Tr]} v_t \Rightarrow \sigma W(r)$$

where $W(r)$ is the unit Wiener process, $\Rightarrow$ denotes weak convergence in distribution and

$$\sigma^2 = \lim_{T \to \infty} T^{-1} E\Big(\sum_{t=1}^{T} v_t\Big)^2$$

with $0 < \sigma^2 < \infty$.

The detection procedure suggested by Vogelsang (1999) starts with the following regression, estimated by ordinary least squares (OLS) (if necessary, a time trend can also be included):

$$y_t = \hat\mu + \hat\delta D(T_{ao})_t + \hat u_t \qquad (3)$$

where $D(T_{ao})_t = 1$ if $t = T_{ao}$ and 0 otherwise. Let $t_{\hat\delta}(T_{ao})$ denote the t-statistic for testing $\delta = 0$ in (3). Following Chen and Liu (1993), the presence of an additive outlier can be tested using

$$\tau = \sup_{T_{ao}} |t_{\hat\delta}(T_{ao})|$$

Assuming that $\lambda = T_{ao}/T$ remains fixed as $T$ grows, Vogelsang (1999) showed that, as $T \to \infty$,

$$t_{\hat\delta}(T_{ao}) \Rightarrow H(\lambda) = \frac{W^*(\lambda)}{\left(\int_0^1 W^*(r)^2\,dr\right)^{1/2}} \qquad (4)$$

where $W^*(\lambda)$ denotes a demeaned standard Wiener process, i.e.

$$W^*(\lambda) = W(\lambda) - \int_0^1 W(s)\,ds$$

If (3) also includes a time trend, $W^*(\lambda)$ denotes a detrended Wiener process. Furthermore, from the continuous mapping theorem, it follows that

$$\tau \Rightarrow \sup_{\lambda \in (0,1)} |H(\lambda)| \equiv H^* \qquad (5)$$

The distribution given in (5) is non-standard, but it is invariant with respect to any nuisance parameters, including the correlation structure of the noise function. The asymptotic critical values for τ were obtained by simulation. The Wiener processes were approximated by normalized sums of i.i.d. (independently and identically distributed) N(0, 1) random deviates using 1000 steps and 50,000 replications. Two cases were considered according to the deterministic components included in (3). When there is an intercept in (3), the critical values are 3.53, 3.11 and 2.92 at the 1%, 5% and 10% significance levels, respectively. If a time trend is also included in (3), the corresponding critical values are 3.73, 3.31 and 3.12.³

The outlier detection procedure recommended by Vogelsang (1999) is implemented as follows.⁴ First, compute the τ statistic for the entire series and compare it to the appropriate critical value. If τ exceeds the critical value, an outlier is detected at date

$$\hat T_{ao} = \arg\max_{T_{ao}} |t_{\hat\delta}(T_{ao})|$$

The outlier and the corresponding row of the regression are dropped, and (3) is estimated again and tested for the presence of another outlier. This continues until the test fails to reject.

³ Critical values were also tabulated for the case where no deterministic components are included in (3). The critical values at the 1%, 5% and 10% significance levels are 3.22, 2.84 and 2.65, respectively.
⁴ This is equivalent to the stepwise procedure to select multiple outliers; see Hawkins (1980).

2.1. Simulation experiments for size

To assess the properties of the method in finite samples, we performed simulation experiments under the hypothesis that the series contains no outlier. We consider a simple data-generating process with an AR unit root, i.e.

$$y_t = y_{t-1} + u_t$$

Two cases are considered for the errors $u_t$, namely MA(1) processes of the form $u_t = v_t + \theta v_{t-1}$ and AR(1) processes of the form $u_t = \rho u_{t-1} + v_t$. In all cases, $v_t \sim$ i.i.d. N(0, 1). We consider values of θ and ρ in the range [−0.8, 0.8] with a step size of 0.2. Two sample sizes are used: T = 100 and T = 200. The number of replications used was 10,000, and tests at the 5% and 10% significance levels were performed.
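The following is a minimal sketch of the level-based statistic τ and the iterative deletion search described above. It assumes the intercept-only version of regression (3) (Model 1) and uses plain Python. The function names `tau_levels` and `level_search` are hypothetical. For a one-time dummy, the OLS quantities have closed forms: $\hat\delta$ equals $y_{T_{ao}}$ minus the mean of the other observations, the residual at $T_{ao}$ is zero, and the (2, 2) element of $(X'X)^{-1}$ is $n/(n-1)$. Passing the same critical value at every step reproduces the uncorrected procedure. Passing step-specific values, such as the Model 1 column of Table III, gives the corrected version $\tau_c$ of Section 3.

```python
import math
import random


def tau_levels(y):
    """t-statistics t_delta(T_ao) at every candidate date for
    y_t = mu + delta * D(T_ao)_t + u_t (intercept only, OLS)."""
    n = len(y)
    s1 = sum(y)                    # sum of y_t over the current sample
    s2 = sum(v * v for v in y)     # sum of y_t^2
    m = n - 1                      # observations other than the candidate date
    taus = []
    for yk in y:
        delta_hat = yk - (s1 - yk) / m               # OLS estimate of delta
        ssr = (s2 - yk * yk) - (s1 - yk) ** 2 / m    # residual at T_ao is zero
        sigma2 = ssr / (n - 2)                       # two regressors: mu, delta
        se = math.sqrt(sigma2 * n / m)               # [(X'X)^{-1}]_22 = n/(n-1)
        taus.append(delta_hat / se)
    return taus


def level_search(y, crits):
    """Iterative search; crits[i] is the critical value used at step i + 1.
    Stops at the first non-rejection or when crits is exhausted."""
    dates = list(range(len(y)))    # original time indices still in the sample
    values = list(y)
    found = []
    for crit in crits:
        if len(values) < 4:
            break
        taus = tau_levels(values)
        j = max(range(len(taus)), key=lambda i: abs(taus[i]))
        if abs(taus[j]) <= crit:   # tau = sup |t_delta| does not exceed the critical value
            break
        found.append(dates[j])
        del dates[j]               # drop the flagged row and re-estimate (3)
        del values[j]
    return found


# Usage: a random walk with one large additive outlier at index 50.
random.seed(0)
walk, level = [], 0.0
for _ in range(100):
    level += random.gauss(0.0, 1.0)
    walk.append(level)
walk[50] += 200.0

uncorrected = level_search(walk, [3.11] * 10)             # same 5% value at every step
corrected = level_search(walk, [2.99, 3.69, 4.29, 4.43])  # 5%, Model 1 column of Table III
print(uncorrected, corrected)
```

The closed forms keep each pass linear in the sample size. Including a time trend (Model 2) would require a general OLS solve instead.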
We first consider the size of the procedures in what we label the 'one pass' case. The size is the proportion of replications in which an observation is categorized as an outlier when searching for a single outlier (without iterating any further for a given sample). Results are presented in Table I.

TABLE I
Exact Size of Single Outlier Detection

MA case
                   Constant          Time trend
  T      θ       5.0%    10.0%     5.0%    10.0%
 100   −0.80    0.184    0.332    0.137    0.241
       −0.60    0.101    0.191    0.095    0.177
       −0.40    0.067    0.125    0.068    0.126
       −0.20    0.050    0.104    0.051    0.097
        0.00    0.043    0.082    0.043    0.082
        0.20    0.038    0.076    0.037    0.075
        0.40    0.037    0.072    0.035    0.069
        0.60    0.037    0.069    0.035    0.067
        0.80    0.037    0.060    0.035    0.066
 200   −0.80    0.227    0.400    0.177    0.327
       −0.60    0.105    0.205    0.102    0.198
       −0.40    0.065    0.131    0.068    0.127
       −0.20    0.050    0.102    0.051    0.099
        0.00    0.042    0.086    0.042    0.084
        0.20    0.038    0.079    0.038    0.075
        0.40    0.037    0.076    0.036    0.072
        0.60    0.037    0.074    0.034    0.069
        0.80    0.037    0.074    0.034    0.069

AR case
                   Constant          Time trend
  T      ρ       5.0%    10.0%     5.0%    10.0%
 100   −0.80    0.073    0.149    0.068    0.135
       −0.60    0.064    0.121    0.061    0.118
       −0.40    0.053    0.107    0.055    0.106
       −0.20    0.046    0.091    0.048    0.094
        0.20    0.031    0.073    0.036    0.071
        0.40    0.034    0.064    0.032    0.063
        0.60    0.023    0.050    0.029    0.053
        0.80    0.020    0.042    0.029    0.047
 200   −0.80    0.080    0.154    0.076    0.151
       −0.60    0.064    0.126    0.065    0.124
       −0.40    0.055    0.110    0.056    0.109
       −0.20    0.048    0.097    0.049    0.096
        0.20    0.037    0.077    0.036    0.073
        0.40    0.034    0.067    0.030    0.062
        0.60    0.029    0.055    0.025    0.052
        0.80    0.022    0.043    0.022    0.041

In the i.i.d. case, Vogelsang's method has an exact size close to the nominal size. With negative MA errors, the test has size distortions (it is liberal). These distortions are smaller when more deterministic components are included in the model. With positive MA errors, and particularly for the model that includes a time trend, the procedure is slightly undersized. A similar result is observed with positively correlated AR errors.
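The asymptotic critical values quoted earlier in this section come from simulating the functional $H^*$ in (5). The following is a minimal Monte Carlo sketch of that calculation for the intercept-only case. It approximates $W$ by partial sums of i.i.d. N(0, 1) draws, as described in the text, but uses far fewer replications. The function names are hypothetical. Because $H^*$ is scale invariant, the $T^{-1/2}$ normalization cancels, so raw partial sums suffice. The helper `upper_point` also returns the $\alpha^i$ upper percentage point that Theorem 1 in Section 3 requires at step $i$. It returns `None` when the simulated tail is too thin, which is the same practical limit the text notes for Table III.

```python
import math
import random


def draw_h_star(steps, rng):
    """One draw of H* = sup |W*(lambda)| / (int W*(r)^2 dr)^{1/2}, with W* the
    demeaned Wiener process approximated by partial sums of N(0, 1) draws."""
    w, path = 0.0, []
    for _ in range(steps):
        w += rng.gauss(0.0, 1.0)
        path.append(w)
    mean = sum(path) / steps
    dev = [p - mean for p in path]                    # demeaned path W*
    rms = math.sqrt(sum(d * d for d in dev) / steps)  # (int W*^2)^{1/2}
    return max(abs(d) for d in dev) / rms


def simulate_h_star(reps=2000, steps=200, seed=12345):
    """Sorted Monte Carlo draws of H*."""
    rng = random.Random(seed)
    return sorted(draw_h_star(steps, rng) for _ in range(reps))


def upper_point(draws, alpha, step=1):
    """Empirical alpha**step upper percentage point of H* (the k-th largest
    draw, k = floor(alpha**step * n)), or None if k < 1."""
    k = int(math.floor(alpha ** step * len(draws)))
    if k < 1:
        return None
    return draws[len(draws) - k]


draws = simulate_h_star()
print(upper_point(draws, 0.05), upper_point(draws, 0.05, step=2))
```

With 2,000 draws, the step-4 point for α = 0.05 (a tail of 0.00000625) is out of reach. This illustrates why so many replications are needed for deeper iterations.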
The next experiments consider the properties of the method when applied in a fully iterative fashion, i.e. continuing to search for additional outliers whenever one is found. Here, we record the total number of observations categorized as outliers divided by the number of replications. These values can be labelled as the expected number of outliers found. If the tests have the correct size α, say, at each step of the iterations, and the tests are independent, this number should be close to α/(1 − α), that is 0.111 for a significance level of 10% and 0.053 for a significance level of 5%. The results are presented in Table II. The main thing to note is that Vogelsang's procedure finds many more outliers than would be expected if the test had the correct size at each step. For example, for the model with only a constant, i.i.d. errors, T = 100 and a significance level of 10%, the number is 0.293 instead of 0.111. Dividing by the nominal 10% of replications that contain at least one outlier, this is an average of 2.93 outliers per such replication. These distortions increase when T increases to 200, with a value of 0.520 (instead of 0.111), which corresponds to approximately 5.2 outliers per replication with at least one outlier.
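The benchmark α/(1 − α) follows from treating each step as an independent test of size α. The search reaches step k + 1 only after k rejections, so the expected number of outliers found is $\sum_{k \ge 1} \alpha^k = \alpha/(1-\alpha)$. The short sketch below checks this arithmetic and confirms it by simulating that idealized search. It models only the benchmark, not Vogelsang's statistic, and its function names are hypothetical.

```python
import random


def expected_outliers(alpha):
    """sum_{k>=1} alpha^k: mean number of rejections when each step is an
    independent test of size alpha and the search stops at the first non-rejection."""
    return alpha / (1.0 - alpha)


def simulate_idealized_search(alpha, reps=200_000, seed=7):
    rng = random.Random(seed)
    total = 0
    for _ in range(reps):
        while rng.random() < alpha:   # reject with probability alpha, then search again
            total += 1
    return total / reps


for a in (0.10, 0.05):
    print(a, round(expected_outliers(a), 3), round(simulate_idealized_search(a), 3))
```

Against this benchmark, the i.i.d., constant-only entries of Table II for T = 100 (0.293 at 10% and 0.129 at 5%) are well above 0.111 and 0.053.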
TABLE II
Expected Number of Outliers Found Using Multiple Outlier Detection

MA case
                   Constant          Time trend
  T      θ       5.0%    10.0%     5.0%    10.0%
 100   −0.80    0.216    0.447    0.154    0.291
       −0.60    0.139    0.306    0.109    0.220
       −0.40    0.132    0.285    0.094    0.194
       −0.20    0.129    0.292    0.092    0.196
        0.00    0.129    0.293    0.096    0.205
        0.20    0.127    0.295    0.097    0.208
        0.40    0.130    0.296    0.097    0.205
        0.60    0.133    0.288    0.101    0.207
        0.80    0.132    0.293    0.101    0.206
 200   −0.80    0.295    0.638    0.206    0.428
       −0.60    0.197    0.478    0.141    0.311
       −0.40    0.200    0.494    0.138    0.308
       −0.20    0.214    0.515    0.144    0.324
        0.00    0.209    0.520    0.142    0.331
        0.20    0.214    0.509    0.147    0.338
        0.40    0.212    0.505    0.147    0.341
        0.60    0.214    0.503    0.147    0.338
        0.80    0.216    0.500    0.148    0.339

AR case
                   Constant          Time trend
  T      ρ       5.0%    10.0%     5.0%    10.0%
 100   −0.80    0.124    0.278    0.091    0.198
       −0.60    0.128    0.272    0.086    0.181
       −0.40    0.127    0.286    0.088    0.192
       −0.20    0.129    0.292    0.091    0.199
        0.20    0.128    0.294    0.097    0.204
        0.40    0.129    0.293    0.105    0.207
        0.60    0.136    0.294    0.110    0.227
        0.80    0.147    0.297    0.146    0.278
 200   −0.80    0.195    0.487    0.126    0.296
       −0.60    0.203    0.499    0.133    0.301
       −0.40    0.206    0.505    0.141    0.319
       −0.20    0.217    0.519    0.143    0.328
        0.20    0.217    0.505    0.145    0.337
        0.40    0.217    0.494    0.145    0.332
        0.60    0.225    0.495    0.150    0.345
        0.80    0.221    0.488    0.183    0.359

3. THE DISTRIBUTION OF THE TEST τ AT DIFFERENT ITERATIONS

In Section 2, we showed that the original procedure of Vogelsang (1999) has severe size distortions when applied in an iterative fashion to search for outliers. The reason is that the limiting distribution of the τ test given by (5) is valid only at the first step of the iteration. In subsequent steps, the asymptotic critical values used need to be modified. The correct limiting distribution at each step is given in Theorem 1.

Theorem 1. Suppose that $y_t$ is generated by (1) with $\delta_i = 0$ $(i = 1, \ldots, m)$ and let $\tau(i)$ be the statistic τ obtained at step $i$ of the iterative search for outliers. Then, conditional on rejections at steps $1, \ldots, i-1$,

$$\lim_{T \to \infty} \Pr[\tau(i) > x] = \frac{\Pr[H^* > x]}{\alpha^{i-1}}$$

where α is the significance level of the test. Hence, the correct α-percentage point of the limiting distribution of $\tau(i)$ is the $\alpha^i$ percentage point of the distribution of $H^*$ defined by (5).

Proof. The basic reason for this result is that the tests at different steps are not independent; indeed, they are asymptotically equivalent because the series is integrated. Hence, at each step $\tau(i) \Rightarrow H^*$ unconditionally on what happened in the previous steps. But subsequent steps are applied only if the previous one showed a rejection; hence one must consider the limiting distribution conditional on a rejection at the previous step. For simplicity, consider this limiting distribution for the second step. Letting $x_\alpha$ denote the α-percentage point of the distribution of $H^*$, it is given by

$$\lim_{T \to \infty} \Pr[\tau(2) > x \mid \tau(1) > x_\alpha] = \frac{\lim_{T \to \infty} \Pr[(\tau(2) > x) \cap (\tau(1) > x_\alpha)]}{\lim_{T \to \infty} \Pr[\tau(1) > x_\alpha]} = \frac{\lim_{T \to \infty} \Pr[(\tau(2) > x) \cap (\tau(1) > x_\alpha)]}{\alpha}$$

since $\tau(1) \Rightarrow H^*$. Now, since we also have $\tau(2) \Rightarrow H^*$,

$$\lim_{T \to \infty} \Pr[\tau(2) > x \mid \tau(1) > x_\alpha] = \frac{\Pr[(H^* > x) \cap (H^* > x_\alpha)]}{\alpha} = \frac{\Pr[H^* > x]}{\alpha}$$

provided $x \geq x_\alpha$, which we need in order to have tests with correct size. The result stated in the theorem follows by iterating the same argument. QED

We denote by $\tau_c$ the iterative outlier detection procedure that uses the correct (and different) asymptotic critical values at different steps. We have simulated some asymptotic critical values, approximating the Wiener process by normalized sums of i.i.d. N(0, 1) random variables using 200 steps. To obtain a fair range of critical values, we used 2 million replications. Nevertheless, even with such a large number of replications, the critical values can be obtained for only a few cases. This is because, as we proceed further in the iterations of the outlier detection, we need percentage points of the distribution of $H^*$ that are very far in the tail. For example, if the significance level is α = 0.05, the percentage point needed at the fourth iteration is $\alpha^4 = 0.00000625$, i.e. of the order of 0.00001. Hence, even with 2 million replications, we can only present critical values up to i = 4 for α = 0.05, i = 5 for α = 0.10, and i = 7 for α = 0.20. These are presented in Table III.⁵

TABLE III
Asymptotic Critical Values of the Test $\tau_c$

   α     i    Model 1: z_t = {1}    Model 2: z_t = {1, t}
 0.05    1          2.99                   3.33
         2          3.69                   4.86
         3          4.29                  13.16
         4          4.43                  18.20
 0.10    1          2.81                   3.11
         2          3.38                   3.94
         3          3.88                   6.08
         4          4.33                  14.43
         5          4.78                  36.44
 0.20    1          2.61                   2.87
         2          3.05                   3.41
         3          3.43                   4.05
         4          3.79                   5.40
         5          4.12                   8.88
         6          4.42                  18.04
         7          4.73                  33.41

⁵ Note that the critical values with i = 1 are not quite identical to those presented in Section 2 of this paper or in Vogelsang (1999), since 200 instead of 1000 steps were used to approximate the Wiener process. The differences, however, are minor and do not affect subsequent results.

4. A TEST USING FIRST DIFFERENCES OF THE DATA

As shown in Section 5, Vogelsang's original procedure is not powerful unless the outlier is very large. As a consequence, the fully corrected iterative procedure is even less powerful, since the critical values to be used at each iteration increase. Simulation evidence to that effect is presented in Section 5. Hence, it is desirable to entertain an alternative outlier detection procedure that is less likely to suffer from this low power problem.

We propose an iterative strategy using tests based on first differences of the data. Consider data generated by (1) with $d_t = \mu$ and a single outlier occurring at date $T_{ao}$ with magnitude δ. Then

$$\Delta y_t = \delta\,[D(T_{ao})_t - D(T_{ao})_{t-1}] + v_t \qquad (6)$$

where $D(T_{ao})_t = 1$ if $t = T_{ao}$ (0 otherwise) and $D(T_{ao})_{t-1} = 1$ if $t = T_{ao} + 1$ (0 otherwise). If the data are trending, a constant should be included.
This reflects the fact that a unit root process with an outlier is characterized, in first differences, by two successive outliers of equal magnitude but opposite signs. The least-squares estimate of δ is

$$\hat\delta = \frac{\Delta y_{T_{ao}} - \Delta y_{T_{ao}+1}}{2} = \frac{v_{T_{ao}} - v_{T_{ao}+1}}{2}$$

under the null hypothesis of no outlier. Hence the variance of $\hat\delta$ is

$$\mathrm{var}(\hat\delta) = \frac{2R_v(0) - 2R_v(1)}{4} = \frac{R_v(0) - R_v(1)}{2}$$

where $R_v(j)$ is the autocovariance function of $v_t$ at lag $j$. Let

$$\hat R_v(j) = T^{-1} \sum_{t=1}^{T-j} \hat v_t \hat v_{t+j}$$

with $\hat v_t$ the least-squares residuals obtained from regression (6). Then $\hat R_v(j)$ is a consistent estimate of $R_v(j)$. We can then consider the test statistic

$$\tau_d = \sup_{T_{ao}} |t_{\hat\delta}(T_{ao})|, \qquad t_{\hat\delta}(T_{ao}) = \frac{\hat\delta}{\big((\hat R_v(0) - \hat R_v(1))/2\big)^{1/2}}$$

To detect multiple outliers, we can follow a strategy similar to that suggested by Vogelsang (1999), dropping the observation labelled as an outlier before proceeding to the next step. The important feature is that, unlike tests based on levels (such as Vogelsang's τ statistic), the test $\tau_d$ is not, in the limit, perfectly correlated across steps of the iterations when dealing with multiple outliers. With i.i.d. errors, the values of $\tau_d$ are approximately uncorrelated across steps; with positively correlated errors, this no longer holds, but the correlation is mild. The disadvantage of this procedure, compared to that based on the levels of the data, is that the limiting distribution depends on the specific distribution of the errors $u_t$, though not on the presence of serial correlation and heteroscedasticity. This problem is exactly the same as that of finding outliers in stationary time series since, by differencing, we effectively work with a stationary series.
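The following is a minimal sketch of the first-difference statistic. It assumes regression (6) without a constant (the non-trending case), and its function names are hypothetical. For each candidate date, the code computes $\hat\delta = (\Delta y_{T_{ao}} - \Delta y_{T_{ao}+1})/2$, forms the residuals of (6) and studentizes with $((\hat R_v(0) - \hat R_v(1))/2)^{1/2}$. The text says only that the flagged observation is dropped before the next step. Here that step is read, as an assumption, as deleting the flagged level observation and differencing the remaining series again. The default critical value 3.65 is the 5% entry of Table IV for Model 1 with T = 100 and is used at every step.

```python
import math
import random


def tau_d_stats(y):
    """t-statistics from (6), Delta y_t = delta*[D_t - D_{t-1}] + v_t, for each
    candidate date k (0-based) where Delta y_k and Delta y_{k+1} both exist."""
    dy = [b - a for a, b in zip(y, y[1:])]   # dy[i] is Delta y_{i+1}
    n = len(dy)
    stats = {}
    for k in range(1, len(y) - 1):
        a, b = k - 1, k                      # positions of Delta y_k, Delta y_{k+1}
        delta_hat = (dy[a] - dy[b]) / 2.0    # OLS estimate of delta
        resid = list(dy)
        resid[a] -= delta_hat                # regressor is +1 at date k
        resid[b] += delta_hat                # regressor is -1 at date k + 1
        r0 = sum(e * e for e in resid) / n                            # R_v(0) estimate
        r1 = sum(resid[i] * resid[i + 1] for i in range(n - 1)) / n   # R_v(1) estimate
        stats[k] = delta_hat / math.sqrt((r0 - r1) / 2.0)
    return stats


def first_difference_search(y, crit=3.65, max_steps=4):
    """Iterative tau_d search with the same critical value at every step."""
    dates = list(range(len(y)))
    values = list(y)
    found = []
    for _ in range(max_steps):
        if len(values) < 4:
            break
        stats = tau_d_stats(values)
        k = max(stats, key=lambda j: abs(stats[j]))
        if abs(stats[k]) <= crit:
            break
        found.append(dates[k])
        del dates[k]   # assumption: delete the flagged level observation
        del values[k]
    return found


# Usage: a random walk with an additive outlier of size 20 at index 50.
random.seed(3)
y, level = [], 0.0
for _ in range(100):
    level += random.gauss(0.0, 1.0)
    y.append(level)
y[50] += 20.0
print(first_difference_search(y))
```

In differences, the outlier appears as the pair (+20, −20), which the two-date regressor matches exactly. This is why $\hat\delta$ recovers the full magnitude of the outlier.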
The standard practice in the literature is rather ad hoc and consists of rejecting if the t-statistic on some observation is greater than a critical value chosen to be some number between 3 and 4; see, for example, Tiao (1985), Chang and Tiao (1983) and Tsay (1986), among others. Here, we simulate critical values assuming i.i.d. normal errors and discuss the extent to which inference is affected when the data deviate from these specifications. The data-generating process is again

$$y_t = y_{t-1} + u_t \qquad (7)$$

where $u_t \sim$ i.i.d. N(0, 1). Two sample sizes are considered, namely T = 100 and T = 200. The number of replications used was 50,000. The percentage points of the test $\tau_d$ are presented in Table IV.

To assess the size of the test in finite samples when the errors are correlated, we consider, as in Section 2.1, the process defined by (7) with correlated errors. Two cases are considered for the errors $u_t$, namely MA(1) processes of the form $u_t = v_t + \theta v_{t-1}$ and AR(1) processes of the form $u_t = \rho u_{t-1} + v_t$. In all cases, $v_t \sim$ i.i.d. N(0, 1). We consider values of θ and ρ in the range [−0.8, 0.8] with a step size of 0.4. The sample size is T = 100, the number of replications used was 10,000, and tests at the 5% significance level were performed. We consider the iterative procedure with up to four outliers. The results are presented in Table V.
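The percentage points in Table IV are obtained by simulating (7) with i.i.d. N(0, 1) errors. The following is a compact Monte Carlo sketch of that calculation for the no-constant case (Model 1). To stay self-contained, it repeats a condensed version of the $\tau_d$ computation. It also uses far fewer than the 50,000 replications used in the text, so its output is only indicative. The function names, seed and replication count are hypothetical choices.

```python
import math
import random


def sup_tau_d(y):
    """sup over candidate dates of |t_delta| from regression (6), no constant."""
    dy = [b - a for a, b in zip(y, y[1:])]
    n = len(dy)
    best = 0.0
    for a in range(n - 1):                    # outlier pair at positions a, a + 1
        delta_hat = (dy[a] - dy[a + 1]) / 2.0
        resid = list(dy)
        resid[a] -= delta_hat
        resid[a + 1] += delta_hat
        r0 = sum(e * e for e in resid) / n
        r1 = sum(resid[i] * resid[i + 1] for i in range(n - 1)) / n
        best = max(best, abs(delta_hat) / math.sqrt((r0 - r1) / 2.0))
    return best


def tau_d_critical_values(T=100, reps=500, seed=2024, levels=(0.10, 0.05, 0.01)):
    """Empirical upper percentage points of tau_d under (7)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        y, level = [], 0.0
        for _ in range(T):
            level += rng.gauss(0.0, 1.0)      # y_t = y_{t-1} + u_t, u_t ~ i.i.d. N(0, 1)
            y.append(level)
        draws.append(sup_tau_d(y))
    draws.sort()
    return {lev: draws[int(math.ceil((1.0 - lev) * reps)) - 1] for lev in levels}


if __name__ == "__main__":
    print(tau_d_critical_values())
```

With a few hundred replications, the 5% point should land in the neighbourhood of the Table IV entry, but the extreme tail (1%) is noisy at this size.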
TABLE IV
Finite Sample Critical Values of the Test $\tau_d$

                  Model 1: z_t = {1}      Model 2: z_t = {1, t}
Level of
significance    T = 100    T = 200      T = 100    T = 200
  1.0%            4.14       4.20         4.13       4.19
  2.5%            3.87       3.95         3.85       3.94
  5.0%            3.65       3.75         3.63       3.74
 10.0%            3.44       3.56         3.42       3.55

TABLE V
Exact Size of the Test Based on $\tau_d$

                            Probability to find
                   1st outlier  2nd outlier  3rd outlier  4th outlier
i.i.d. case           0.047        0.002        0.000        0.000
MA case  θ = −0.80    0.053        0.003        0.000        0.000
         θ = −0.40    0.052        0.002        0.000        0.000
         θ =  0.40    0.034        0.003        0.001        0.000
         θ =  0.80    0.021        0.005        0.001        0.001
AR case  ρ = −0.80    0.029        0.003        0.000        0.000
         ρ = −0.40    0.053        0.002        0.000        0.000
         ρ =  0.40    0.039        0.003        0.001        0.000
         ρ =  0.80    0.029        0.007        0.005        0.004

The probability of finding at least one outlier is close to the nominal 5% level throughout. The test is slightly conservative with positive MA errors or when the AR coefficient is very large in absolute value. The probability of finding at least two outliers is close to the theoretically expected value of 0.0025 (= 0.05²). The probability of finding more than two outliers is basically zero in all cases. Hence, we conclude that the iterative procedure is adequate, in that it delivers the expected number of rejections at each stage of the iterations. Also, the correction for the presence of serial correlation appears to perform satisfactorily.

5. SIMULATIONS FOR SIZE AND POWER

In this section, we present results about the size and, especially, the power of the various procedures when multiple outliers are present. The data-generating process considered is

$$y_t = \sum_{j=1}^{m} \delta_j D(T_{ao,j})_t + u_t \qquad (8)$$

$$u_t = u_{t-1} + v_t \qquad (9)$$

where $D(T_{ao,j})_t = 1$ if $t = T_{ao,j}$ and 0 otherwise. Again, two cases are considered for the errors $v_t$: MA(1) processes of the form $v_t = e_t + \theta e_{t-1}$ and AR(1) processes of the form $v_t = \rho v_{t-1} + e_t$. In all cases, $e_t \sim$ i.i.d. N(0, 1). We consider values of θ and ρ in the range [−0.8, 0.8] with a step size of 0.4.
This permits the presence of $m$ additive outliers occurring at dates $T_{ao,j}$ $(j = 1, \ldots, m)$. We consider two cases: one with m = 0 to assess size and one with m = 4 to assess power. All simulations are based on a sample size of T = 100, and 10,000 replications were performed. We present results only for the case where a constant is included in the set of deterministic components. The significance level of the tests is set to 5%. For the procedures based on $\tau_c$ and $\tau_d$, we used the critical values presented in Tables III and IV, respectively. When m = 4, the outliers are located at observations 20, 40, 60 and 80. The magnitudes of the outliers considered are either (a) δ1 = 5, δ2 = 3 and δ3 = δ4 = 2, or (b) δ1 = 10 and δ2 = δ3 = δ4 = 5. We consider the properties of Vogelsang's uncorrected method (τ), its corrected version ($\tau_c$) and the method based on first-differenced data ($\tau_d$). The results are presented in Table VI (MA errors) and Table VII (AR errors). Consider first the behaviour of the tests when there is no outlier.
The only procedure with a size close to the expected theoretical nominal size (5% at the first step, 0.0025 at the second and basically 0 at the third and fourth) is that based on first-differenced data ($\tau_d$), though, as noted before, it is somewhat conservative with an AR coefficient that is large in absolute value and with positive MA coefficients.

TABLE VI
Size and Power of the Tests to Detect Additive Outliers; MA(1) Errors

                         δ = (0, 0, 0, 0)       δ = (5, 3, 2, 2)       δ = (10, 5, 5, 5)
  θ     Probability     τ      τc     τd       τ      τc     τd       τ      τc     τd
        to find
−0.80   1st outlier   0.166  0.242  0.056    0.834  0.865  0.746    0.998  0.999  1.000
        2nd outlier   0.026  0.001  0.002    0.360  0.121  0.179    0.957  0.828  0.921
        3rd outlier   0.004  0.000  0.000    0.094  0.001  0.019    0.865  0.359  0.779
        4th outlier   0.001  0.000  0.000    0.022  0.000  0.002    0.661  0.147  0.518
−0.40   1st outlier   0.063  0.098  0.054    0.286  0.342  0.941    0.793  0.823  1.000
        2nd outlier   0.021  0.003  0.002    0.058  0.009  0.396    0.401  0.199  0.997
        3rd outlier   0.011  0.000  0.000    0.013  0.000  0.076    0.194  0.021  0.985
        4th outlier   0.007  0.000  0.000    0.007  0.000  0.007    0.081  0.004  0.912
 0.00   1st outlier   0.040  0.065  0.047    0.101  0.135  0.996    0.464  0.516  1.000
        2nd outlier   0.023  0.004  0.002    0.024  0.004  0.674    0.122  0.035  1.000
        3rd outlier   0.015  0.000  0.000    0.014  0.000  0.228    0.038  0.001  1.000
        4th outlier   0.010  0.000  0.000    0.010  0.000  0.040    0.013  0.000  0.998
 0.40   1st outlier   0.036  0.054  0.038    0.055  0.079  1.000    0.231  0.282  1.000
        2nd outlier   0.024  0.004  0.004    0.022  0.003  0.821    0.044  0.007  1.000
        3rd outlier   0.018  0.000  0.001    0.015  0.000  0.380    0.016  0.001  1.000
        4th outlier   0.012  0.000  0.000    0.010  0.000  0.106    0.010  0.000  1.000
 0.80   1st outlier   0.036  0.053  0.021    0.042  0.063  0.994    0.130  0.164  1.000
        2nd outlier   0.026  0.004  0.005    0.023  0.003  0.749    0.026  0.004  1.000
        3rd outlier   0.019  0.000  0.001    0.017  0.000  0.297    0.015  0.000  1.000
        4th outlier   0.014  0.000  0.001    0.012  0.000  0.087    0.011  0.000  1.000

Note: The data-generating process is $y_t = \sum_{j=1}^{4} \delta_j D(T_{ao,j})_t + u_t$ with $u_t = u_{t-1} + v_t$ and $v_t = e_t + \theta e_{t-1}$, where $e_t \sim$ i.i.d. N(0, 1). 10,000 replications are used.

TABLE VII
Size and Power of the Tests to Detect Additive Outliers; AR(1) Errors

                         δ = (0, 0, 0, 0)       δ = (5, 3, 2, 2)       δ = (10, 5, 5, 5)
  ρ     Probability     τ      τc     τd       τ      τc     τd       τ      τc     τd
        to find
−0.80   1st outlier   0.069  0.106  0.029    0.301  0.360  0.375    0.824  0.849  0.965
        2nd outlier   0.022  0.002  0.003    0.059  0.010  0.044    0.426  0.210  0.565
        3rd outlier   0.009  0.000  0.001    0.014  0.000  0.003    0.202  0.018  0.279
        4th outlier   0.006  0.000  0.000    0.005  0.000  0.000    0.077  0.003  0.098
−0.40   1st outlier   0.052  0.083  0.055    0.206  0.254  0.921    0.701  0.738  1.000
        2nd outlier   0.020  0.003  0.002    0.038  0.006  0.361    0.288  0.126  0.993
        3rd outlier   0.013  0.000  0.000    0.013  0.000  0.067    0.119  0.009  0.973
        4th outlier   0.009  0.000  0.000    0.008  0.000  0.006    0.045  0.001  0.880
 0.40   1st outlier   0.033  0.050  0.042    0.042  0.067  1.000    0.159  0.200  1.000
        2nd outlier   0.024  0.004  0.003    0.021  0.003  0.856    0.030  0.004  1.000
        3rd outlier   0.019  0.001  0.001    0.016  0.000  0.429    0.015  0.000  1.000
        4th outlier   0.014  0.000  0.000    0.012  0.000  0.131    0.011  0.000  1.000
 0.80   1st outlier   0.025  0.033  0.030    0.025  0.033  1.000    0.027  0.038  1.000
        2nd outlier   0.022  0.004  0.007    0.022  0.003  0.935    0.021  0.003  1.000
        3rd outlier   0.019  0.001  0.004    0.019  0.001  0.608    0.018  0.001  1.000
        4th outlier   0.017  0.000  0.004    0.016  0.000  0.308    0.015  0.000  1.000

Note: The data-generating process is $y_t = \sum_{j=1}^{4} \delta_j D(T_{ao,j})_t + u_t$ with $u_t = u_{t-1} + v_t$ and $v_t = \rho v_{t-1} + e_t$, where $e_t \sim$ i.i.d. N(0, 1). 10,000 replications are used.
Vogelsang’s procedure, whether corrected or not, shows substantial size distortions (liberal tests) in the presence of negative MA errors and, to a lesser extent, in the presence of strong negative AR errors. The results also confirm that Vogelsang’s uncorrected procedure (s) finds an excessive number of outliers when applied in an iterative fashion. The most interesting feature of the results is that the methods based on the level of the data have basically no power, while the method based on first-differenced data has excellent power even for outliers of moderate size. Consider, for example, case (a), which is representative of outliers of moderate size. In the case with i.i.d. errors, Vogelsang’s corrected procedure (sc) finds at least one outlier in 14% of the cases, while it basically never finds more than one. The method based on first-differenced data finds at least one outlier almost 100% of the time and at least three outliers 23% of the time. Consider now case (b), which is representative of large outliers: the method sd finds four outliers basically 100% of the time, while the method sc finds at least one outlier 52% of the time and at least two outliers only 4% of the time. The results are qualitatively similar with errors that are serially correlated. For the method sd, negative serial correlation (of the AR or MA type) induces a loss of power, while positive serial correlation (again of either type) induces an increase in power.

5.1. Robustness to departures from a unit root

It is of interest to assess the extent to which our suggested procedure is robust to departures from a unit root. Indeed, the purpose of detecting outliers is sometimes to provide appropriate corrections to unit root tests, in which case whether or not a unit root is present is unknown.
A popular device to analyse this issue is a so-called near-integrated process, which specifies, under the null hypothesis of no outlier, that

$$ y_t = \left(1 + \frac{c}{T}\right) y_{t-1} + v_t \qquad (10) $$

where $v_t$ is a stationary process. Here, $c$ is a non-centrality parameter which measures the extent of departures from a strict unit root process. When $c < 0$, we have a locally stationary process. This is labelled a process local to unity, since as $T$ increases the AR parameter converges to one. A little algebra shows that, under this specification, the OLS estimate $\hat\delta$ from regression (6) is

$$ \hat\delta = \frac{\Delta y_t - \Delta y_{t+1}}{2} = \frac{-(c/T)\,\Delta y_t + (v_t - v_{t+1})}{2} = \frac{v_t - v_{t+1}}{2} + O_p(T^{-1}), $$

since $\Delta y_t = O_p(1)$. Also, $(\hat R_v(0) - \hat R_v(1))/2$ remains a consistent estimate of the variance of $\hat\delta$. Hence, we can expect our procedure to remain adequate under this local to unit root set-up. But the robustness of our procedure also extends to the case where $y_t$ is a stationary process. The reason is again that $(\hat R_v(0) - \hat R_v(1))/2$ remains a consistent estimate of the variance of $\hat\delta$. To assess the size of the procedures sc and sd under departures from a unit root, we performed simulations from data generated by (10) with $T = 100$ and $T = 200$ and $v_t \sim$ i.i.d. $N(0, 1)$ for a range of values of $c$. The results are presented in Table VIII. They clearly show that the procedure sd has an exact size close to the nominal 5% for any value of $c$. On the other hand, the procedure sc shows increasing size distortions as $c$ moves away from 0 (it is easy to show that the asymptotic distribution of the test sc is different under the local to unity framework, the Wiener process being replaced by an Ornstein–Uhlenbeck process with drift parameter $c$). The last row of Table VIII shows the exact size when $y_t \sim$ i.i.d. $N(0, 1)$. Again, the procedure sd has an exact size close to the nominal 5%. When the process is stationary, a more natural procedure is to base the test on a regression using levels of the data, i.e.
a regression of the form

$$ y_t = \mu + \delta D(T_{ao})_t + v_t \qquad (11) $$

Let $\hat\delta$ be the OLS estimate of $\delta$ and denote the t-statistic for testing $\delta = 0$ by

$$ t_\delta(T_{ao}) = \frac{\hat\delta}{\hat R_v(0)^{1/2}}, $$

where $\hat R_v(0) = T^{-1} \sum_{t=1}^{T} \hat v_t^2$ with $\hat v_t$ the OLS residuals from regression (11). The statistic is then

$$ s_l = \sup_{T_{ao}} |t_\delta(T_{ao})|. $$

To assess the size and power properties of sd and sl, we performed a simulation experiment with data generated by (8). Now the errors $u_t$ are stationary and, again, two cases are considered: MA(1) processes of the form $u_t = e_t + \theta e_{t-1}$ and AR(1) processes of the form $u_t = \rho u_{t-1} + e_t$, with $e_t \sim$ i.i.d. $N(0, 1)$. The results, obtained from 10,000 replications, are presented in Tables IX and X (the critical values for sl were obtained in the same way as those for sd in Table IV, except that regression (11) is used instead of (6)). When the process is i.i.d. or negatively serially correlated, the procedure sd based on first differences is indeed less powerful than that based on levels (sl). However, with positive serial correlation, the reverse holds and the procedure based on first differences is more powerful. This is encouraging since most macroeconomic time series are positively correlated.

TABLE VIII
Size of the Tests sc and sd with Non-unit Root Processes

                          sc                   sd
                     T = 100   T = 200    T = 100   T = 200
Near-integrated case
  c = −1.25          0.062     0.065      0.053     0.051
  c = −2.5           0.065     0.071      0.053     0.051
  c = −5.0           0.083     0.091      0.054     0.051
  c = −10.0          0.126     0.145      0.054     0.051
  c = −20.0          0.206     0.248      0.055     0.050
  c = −40.0          0.281     0.362      0.056     0.051
i.i.d. case          0.329     0.501      0.056     0.051

Note: For the near-integrated case, the data generating process is $y_t = (1 + c/T) y_{t-1} + e_t$ where $e_t \sim$ i.i.d. $N(0, 1)$. For the i.i.d. case, it is $y_t = e_t$. 10,000 replications are used.
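The level-based statistic sl, and the way critical values of such statistics can be simulated, can be sketched as follows. This is a minimal Python illustration under our own assumptions, not the procedure used for Tables IX and X: the function names, the quantile convention and the small number of replications are ours. It uses the fact that, with a single dummy, OLS in (11) sets $\hat\mu$ to the mean of the other $T - 1$ observations and leaves a zero residual at the candidate date, so the whole scan is O(T). The error samplers mirror the normal, uniform $[-1/2, 1/2]$ and centred $\chi^2(1)$ cases that Section 5.3 discusses for sd.

```python
import math
import random


def sl_statistic(y):
    """s_l: max over candidate dates of |delta_hat| / R(0)^{1/2} from regression (11).

    With one dummy at date t, OLS gives mu_hat = mean of the other T - 1
    observations, delta_hat = y_t - mu_hat and a zero residual at t, so
    R(0) = T^{-1} * sum_{s != t} (y_s - mu_hat)^2; running sums make the scan O(T).
    """
    n = len(y)
    s1 = sum(y)
    s2 = sum(v * v for v in y)
    best = 0.0
    for yt in y:
        rest = s1 - yt
        mu_hat = rest / (n - 1)
        rss = (s2 - yt * yt) - rest * rest / (n - 1)  # sum over s != t of (y_s - mu_hat)^2
        if rss > 0.0:
            best = max(best, abs(yt - mu_hat) / math.sqrt(rss / n))
    return best


def mc_critical_value(stat, draw_errors, T=100, reps=500, level=0.05, seed=0):
    """Upper-`level` quantile of `stat` across `reps` samples simulated under the null."""
    rng = random.Random(seed)
    draws = sorted(stat(draw_errors(rng, T)) for _ in range(reps))
    return draws[math.ceil((1.0 - level) * reps) - 1]


def normal_errors(rng, T):
    return [rng.gauss(0.0, 1.0) for _ in range(T)]


def uniform_errors(rng, T):
    return [rng.uniform(-0.5, 0.5) for _ in range(T)]


def centred_chi2_errors(rng, T):
    return [rng.gauss(0.0, 1.0) ** 2 - 1.0 for _ in range(T)]  # chi2(1) minus its mean


for name, sampler in [("normal", normal_errors), ("uniform", uniform_errors),
                      ("centred chi2(1)", centred_chi2_errors)]:
    print(name, round(mc_critical_value(sl_statistic, sampler), 2))
```

Only the ordering of the three simulated values should be read from this sketch: bounded errors should give the smallest critical value and the long right tail of the centred $\chi^2(1)$ the largest. The numbers themselves concern sl rather than sd and rest on far fewer replications than the 25,000 used in the text.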
Hence, for most applications of interest in economics, which have positive correlation, not only is sd valid if a unit root is present, but it is also more powerful than the more common procedure based on levels.

5.2. Consistency against local alternatives

It is well known that tests for outliers of the type considered here (based on the value of a single observation) are inconsistent against fixed alternatives. That is, the power of the tests against a fixed value of $\delta$ does not converge to 1 as the sample size increases. Nevertheless, insights into relative powers can be obtained by looking at the properties of the tests in the presence of local alternatives of the form

$$ H_1: \delta_T = \delta_0 T^{a} \qquad (12) $$

for some fixed $\delta_0$. It is easy to show that Vogelsang’s procedure, corrected or not (s or sc), is consistent against alternatives of the form (12) only for values $a > 1/2$. On the other hand, the procedures sd and sl are consistent against such alternatives for any value $a > 0$. This goes some way towards explaining the greater power found for sd in the simulations.

5.3. Departures from normality

As we argued above, the procedure sd has several advantages over the procedure sc proposed by Vogelsang: the same critical values can be used at each step of the iterations, power is much higher, and it is robust to departures from a unit root. However, an advantage of the procedure sc is that it is not affected by departures from the normality assumption (at least in large samples and with a strict unit root). This is not the case for the procedure sd, whose distribution is heavily dependent on the normality assumption. This is not a new problem: it has been present throughout much of the literature on outlier detection and, as discussed in Section 4, most authors have resorted to recommending ad hoc rules of thumb to decide whether or not to reject. The distribution of tests like sd or the more common sl depends on the shape of the tail of the distribution of the error process.
For alternative distributions, we obtained the following results from 25,000 replications. With uniform $[-1/2, 1/2]$ errors, which have no tails, the 5% critical value of sd is 2.61 for $T = 100$ (compared to 3.65 with normal errors). On the other hand, when the errors are distributed as a $\chi^2(1)$ (centred to have mean zero), the 5% critical value is 6.22, much higher because of the long right tail of the distribution. To the authors’ knowledge, no satisfactory procedure is available to overcome this dependence of outlier detection procedures on the exact nature of the error distribution. It may be possible to use extreme value theory and non-parametric estimates of tail behaviour, but such an extension is well beyond the scope of the present paper.

TABLE IX
Size and Power of sl and sd for Stationary Processes; MA(1) Errors

Probability to find, with δ = (δ1, δ2, δ3, δ4):
                   δ = (0, 0, 0, 0)    δ = (5, 3, 2, 2)    δ = (10, 5, 5, 5)
                   sl     sd           sl     sd           sl     sd
θ = −0.80
  1st outlier      0.047  0.055        0.571  0.168        0.999  0.832
  2nd outlier      0.003  0.003        0.089  0.014        0.821  0.239
  3rd outlier      0.000  0.000        0.006  0.001        0.553  0.051
  4th outlier      0.000  0.000        0.001  0.000        0.257  0.001
θ = −0.40
  1st outlier      0.047  0.056        0.798  0.318        1.000  0.974
  2nd outlier      0.003  0.003        0.206  0.035        0.964  0.505
  3rd outlier      0.000  0.000        0.022  0.002        0.862  0.198
  4th outlier      0.000  0.000        0.002  0.000        0.605  0.048
θ = 0.00
  1st outlier      0.053  0.056        0.876  0.614        1.000  0.999
  2nd outlier      0.002  0.003        0.276  0.113        0.989  0.827
  3rd outlier      0.000  0.000        0.036  0.011        0.946  0.596
  4th outlier      0.000  0.000        0.003  0.000        0.765  0.313
θ = 0.40
  1st outlier      0.052  0.053        0.807  0.878        1.000  1.000
  2nd outlier      0.002  0.002        0.211  0.283        0.967  0.977
  3rd outlier      0.000  0.000        0.023  0.039        0.868  0.928
  4th outlier      0.000  0.000        0.003  0.003        0.616  0.772
θ = 0.80
  1st outlier      0.037  0.044        0.595  0.907        0.999  1.000
  2nd outlier      0.004  0.004        0.099  0.323        0.839  0.984
  3rd outlier      0.000  0.000        0.008  0.048        0.570  0.948
  4th outlier      0.000  0.000        0.000  0.001        0.260  0.828

Note: The data generating process is $y_t = \sum_{j=1}^{4} \delta_j D(T_{ao,j})_t + u_t$ with $u_t = e_t + \theta e_{t-1}$, where $e_t \sim$ i.i.d. $N(0, 1)$. 10,000 replications are used.

TABLE X
Size and Power of sl and sd for Stationary Processes; AR(1) Errors

Probability to find, with δ = (δ1, δ2, δ3, δ4):
                   δ = (0, 0, 0, 0)    δ = (5, 3, 2, 2)    δ = (10, 5, 5, 5)
                   sl     sd           sl     sd           sl     sd
ρ = −0.80
  1st outlier      0.021  0.025        0.315  0.062        0.959  0.397
  2nd outlier      0.003  0.002        0.027  0.003        0.511  0.048
  3rd outlier      0.001  0.000        0.002  0.000        0.197  0.005
  4th outlier      0.000  0.000        0.000  0.000        0.058  0.000
ρ = −0.40
  1st outlier      0.046  0.052        0.789  0.287        1.000  0.952
  2nd outlier      0.003  0.003        0.189  0.027        0.957  0.432
  3rd outlier      0.000  0.000        0.021  0.002        0.840  0.141
  4th outlier      0.000  0.000        0.002  0.000        0.567  0.030
ρ = 0.40
  1st outlier      0.051  0.056        0.804  0.885        1.000  1.000
  2nd outlier      0.003  0.003        0.200  0.283        0.963  0.979
  3rd outlier      0.000  0.000        0.022  0.043        0.855  0.938
  4th outlier      0.000  0.000        0.002  0.004        0.591  0.788
ρ = 0.80
  1st outlier      0.055  0.022        0.366  0.985        0.975  1.000
  2nd outlier      0.002  0.004        0.034  0.533        0.585  0.999
  3rd outlier      0.000  0.001        0.003  0.129        0.249  0.998
  4th outlier      0.000  0.000        0.000  0.020        0.070  0.986

Note: The data generating process is $y_t = \sum_{j=1}^{4} \delta_j D(T_{ao,j})_t + u_t$ with $u_t = \rho u_{t-1} + e_t$, where $e_t \sim$ i.i.d. $N(0, 1)$. 10,000 replications are used.

6. SIZE OF CORRECTED ADF UNIT ROOT TESTS

As noted by Franses and Haldrup (1994) and Vogelsang (1999), the presence of outliers biases unit root tests towards over-rejection of the null hypothesis, acting like a negative MA component. One of the aims of outlier detection mentioned by these authors is to be able to correct the unit root tests by incorporating appropriate dummy variables. To assess the relative merits of the outlier detection procedures discussed in correcting the size of unit root tests, we again resorted to a simulation analysis of the size of the Dickey–Fuller (1979) test given by the t-statistic for testing that $\alpha = 1$ in the regression

$$ y_t = \mu + \alpha y_{t-1} + \sum_{i=0}^{p+1} \sum_{j=1}^{m} \delta_{ij} D(T_{ao,j})_{t-i} + \sum_{i=1}^{k} d_i \Delta y_{t-i} + e_t $$

where $D(T_{ao,j})_{t-i} = 1$ if $t = T_{ao,j} + i$ and 0 otherwise, with $T_{ao,j}$ ($j = 1, \ldots, m$) the dates of the outliers identified. The data generating process is the same as described earlier. In constructing the unit root tests, the lag length $k$ was selected in the same way as in Vogelsang (1999), namely using a recursive general-to-specific t-test on the last lag with a significance level of 10%, starting at a maximal value set at 5. The results are presented in Table XI (MA errors) and Table XII (AR errors). For each case, the row marked ‘without’ indicates the proportion of rejections of the null hypothesis that occurred when no outlier was selected; the row marked ‘with’ indicates the proportion of rejections of the null hypothesis when outliers were detected and the appropriate dummies introduced in the autoregression; the row marked ‘total’ is simply the sum of the two values. We first consider the case where no outlier is present. This establishes a base case against which to compare size distortions in the cases where outliers are present. With AR errors, all procedures have approximately the correct size. The same is true with a positively correlated MA component.
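The corrected test and the lag selection rule described above can be sketched in plain Python. This is a minimal sketch under our own assumptions, not the authors’ or Vogelsang’s code: the function names are hypothetical; the upper dummy lag $p + 1$ in the regression above is taken to be $k + 1$, so that every lag at which an outlier enters the autoregression receives a dummy; dummies falling on the same date are merged (this spans the same column space); the 10% two-sided t-test on the last lag uses the normal critical value 1.645; and, for simplicity, the estimation sample shrinks as $k$ grows instead of being held fixed at the maximal lag. OLS is computed from the normal equations, which is adequate for these small designs.

```python
import math
import random


def ols(X, z):
    """OLS coefficients and standard errors, inverting X'X by Gauss-Jordan elimination."""
    n, k = len(X), len(X[0])
    xtx = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)] for i in range(k)]
    aug = [xtx[i] + [1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [a / p for a in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    inv = [row[k:] for row in aug]
    xtz = [sum(X[r][i] * z[r] for r in range(n)) for i in range(k)]
    beta = [sum(inv[i][j] * xtz[j] for j in range(k)) for i in range(k)]
    resid = [z[r] - sum(X[r][i] * beta[i] for i in range(k)) for r in range(n)]
    s2 = sum(e * e for e in resid) / (n - k)
    return beta, [math.sqrt(s2 * inv[i][i]) for i in range(k)]


def adf_ao(y, k, ao_dates=()):
    """Return (t-stat for alpha = 1, t-stat of the last lagged difference or None).

    y_t = mu + alpha*y_{t-1} + impulse dummies + sum_{i=1}^{k} d_i*Delta y_{t-i} + e_t,
    with impulses at dates T_ao,j + i, i = 0, ..., k + 1 (assumed upper lag).
    """
    T = len(y)
    start = k + 1  # first t with y_{t-1} and k lagged differences available
    dummy_dates = sorted({d + i for d in ao_dates for i in range(k + 2) if start <= d + i < T})
    X, z = [], []
    for t in range(start, T):
        row = [1.0, y[t - 1]]
        row += [1.0 if t == dd else 0.0 for dd in dummy_dates]
        row += [y[t - i] - y[t - i - 1] for i in range(1, k + 1)]
        X.append(row)
        z.append(y[t])
    beta, se = ols(X, z)
    t_last = beta[-1] / se[-1] if k > 0 else None
    return (beta[1] - 1.0) / se[1], t_last


def select_lag(y, ao_dates=(), kmax=5, crit=1.645):
    """General-to-specific: from kmax down, stop at the first k whose last lag is significant."""
    for k in range(kmax, 0, -1):
        if abs(adf_ao(y, k, ao_dates)[1]) >= crit:
            return k
    return 0


rng = random.Random(11)
y, level = [], 0.0
for _ in range(200):
    level += rng.gauss(0.0, 1.0)  # random walk under the unit root null
    y.append(level)
y[60] += 200.0  # one very large, hypothetical additive outlier

print("selected k:", select_lag(y, ao_dates=[60]))
print("t without dummies (k = 0):", round(adf_ao(y, 0)[0], 2))
print("t with dummies (k = 0):   ", round(adf_ao(y, 0, ao_dates=[60])[0], 2))
```

With one very large outlier, the regression without dummies typically yields a strongly negative t-statistic, the over-rejection discussed above, whereas with the dummies included the statistic behaves like an ordinary Dickey–Fuller statistic.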
As is well known, the ADF unit root test suffers from substantial size distortions with a negatively correlated MA component, and this is reflected in our results. When outliers are present, the size of the ADF test corrected for outliers using sc is, in almost all cases, larger than when corrected using the method sd. For example, with i.i.d. errors and large outliers, the size is 0.087 when corrected with sc and 0.041 when corrected with sd. Comparing the rows ‘with’ and ‘without’, we see that, when outliers are present, rejections of the unit root occurring when no outlier is identified are very rare for the method sd, while they are substantial when using the method sc. As emphasized by Franses and Haldrup (1994), outliers induce an MA-like component in the errors when they are not accounted for.

TABLE XI
Size of the ADF Test; MA(1) Errors*

                   δ = (0, 0, 0, 0)       δ = (5, 3, 2, 2)       δ = (10, 5, 5, 5)
                   s      sc     sd       s      sc     sd       s      sc     sd
θ = −0.80
  Without          0.309  0.274  0.368    0.033  0.024  0.124    0.000  0.000  0.000
  With             0.089  0.122  0.021    0.399  0.402  0.309    0.380  0.460  0.392
  Total            0.398  0.396  0.389    0.431  0.426  0.433    0.380  0.460  0.392
θ = −0.40
  Without          0.069  0.062  0.081    0.040  0.031  0.007    0.003  0.001  0.000
  With             0.019  0.025  0.005    0.070  0.072  0.081    0.123  0.132  0.074
  Total            0.088  0.087  0.086    0.110  0.103  0.088    0.125  0.133  0.074
θ = 0.00
  Without          0.038  0.034  0.049    0.048  0.041  0.001    0.011  0.008  0.000
  With             0.012  0.016  0.002    0.030  0.035  0.051    0.082  0.079  0.041
  Total            0.050  0.049  0.051    0.078  0.076  0.052    0.094  0.087  0.041
θ = 0.40
  Without          0.048  0.044  0.057    0.035  0.029  0.000    0.024  0.017  0.000
  With             0.010  0.012  0.001    0.016  0.020  0.045    0.055  0.057  0.043
  Total            0.057  0.056  0.058    0.051  0.049  0.045    0.079  0.074  0.043
θ = 0.80
  Without          0.048  0.045  0.056    0.036  0.031  0.000    0.044  0.037  0.000
  With             0.008  0.008  0.001    0.012  0.015  0.050    0.032  0.033  0.043
  Total            0.056  0.053  0.057    0.047  0.046  0.050    0.076  0.070  0.043

*Choosing the lag length with the sequential t-sig method.

TABLE XII
Size of the ADF Test; AR(1) Errors*

                   δ = (0, 0, 0, 0)       δ = (5, 3, 2, 2)       δ = (10, 5, 5, 5)
                   s      sc     sd       s      sc     sd       s      sc     sd
ρ = −0.80
  Without          0.040  0.034  0.049    0.034  0.028  0.061    0.001  0.001  0.008
  With             0.015  0.016  0.001    0.056  0.053  0.024    0.110  0.118  0.089
  Total            0.055  0.051  0.050    0.090  0.081  0.085    0.111  0.119  0.097
ρ = −0.40
  Without          0.040  0.036  0.050    0.033  0.027  0.006    0.003  0.002  0.000
  With             0.013  0.017  0.003    0.042  0.043  0.047    0.099  0.097  0.045
  Total            0.053  0.053  0.053    0.075  0.070  0.053    0.103  0.099  0.045
ρ = 0.40
  Without          0.041  0.038  0.050    0.031  0.026  0.000    0.020  0.015  0.000
  With             0.009  0.010  0.002    0.012  0.016  0.041    0.032  0.031  0.041
  Total            0.050  0.048  0.052    0.043  0.042  0.041    0.052  0.046  0.041
ρ = 0.80
  Without          0.051  0.049  0.056    0.039  0.037  0.000    0.030  0.028  0.000
  With             0.004  0.006  0.001    0.004  0.006  0.045    0.004  0.007  0.043
  Total            0.055  0.055  0.057    0.043  0.043  0.045    0.034  0.035  0.043

*Choosing the lag length with the sequential t-sig method.
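The MA-like effect is easiest to see in first differences. An additive outlier of size δ at $T_{ao}$ adds $+\delta$ to $\Delta y_{T_{ao}}$ and $-\delta$ to $\Delta y_{T_{ao}+1}$, which contributes $-\delta^2$ to the lag-one sum of cross products and $2\delta^2$ to the sum of squares: negative first-order autocorrelation, the signature of an MA(1) component with a negative coefficient. The short Python check below illustrates this on one simulated random walk; the sample size, seed, outlier dates and outlier size are arbitrary choices of ours, not values from the paper.

```python
import random


def lag1_autocorr(x):
    """Sample first-order autocorrelation of x (mean-corrected)."""
    n = len(x)
    m = sum(x) / n
    c0 = sum((v - m) ** 2 for v in x) / n
    c1 = sum((x[t] - m) * (x[t + 1] - m) for t in range(n - 1)) / n
    return c1 / c0


def diff(x):
    """First differences x_{t+1} - x_t."""
    return [x[t + 1] - x[t] for t in range(len(x) - 1)]


rng = random.Random(3)
walk, level = [], 0.0
for _ in range(200):
    level += rng.gauss(0.0, 1.0)  # random walk: Delta y_t = v_t, i.i.d. N(0, 1)
    walk.append(level)

contaminated = list(walk)
for date in (40, 80, 120, 160):  # four hypothetical additive outliers of size 10
    contaminated[date] += 10.0

r_clean = lag1_autocorr(diff(walk))
r_ao = lag1_autocorr(diff(contaminated))
print(f"lag-1 autocorrelation of differences: clean {r_clean:.3f}, with outliers {r_ao:.3f}")
```

A back-of-the-envelope calculation based on the argument above, about $-400/199$ divided by $1 + 800/199$, suggests a value near $-0.4$ for the contaminated series, against roughly zero for the clean one.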
Even if Vogelsang’s method selects too few outliers (given its low power), the size of the corrected ADF test can be brought close to the nominal size when a data-dependent method is used to select the lag length, since such a method tends to compensate for missed outliers by choosing a higher lag length (missed outliers induce a negative MA structure in the errors). To verify this claim, we conducted the same simulation experiments with pure AR(1) errors and the lag length fixed at its true value of 1. The results are presented in Table XIII. Consider, for instance, the case with an AR coefficient $\rho = -0.4$ and large outliers. The size of the unit root test corrected using sc is 0.19 with $k$ fixed at 1, instead of 0.10 with $k$ selected using the sequential t-test procedure. Hence, the failure to account for all outliers present can clearly be compensated for by the selection of a larger lag length. Yet, as the results for the size of the unit root test corrected using sd show, a good method to select outliers does a better job at reducing size distortions.

7. EMPIRICAL APPLICATIONS

The procedures analysed in the previous sections were applied to two series of real-exchange rates for US/Finland. The first series covers the period 1900–88 and is constructed using the consumer price index (CPI). The other series spans the years 1900–87 and is constructed using the gross domestic product (GDP) deflator. The series are shown in Figures 1 and 2, respectively. These are the same series used by Vogelsang (1999), Franses and Haldrup (1994) and Perron and Vogelsang (1992) and are described in more detail in the Appendix. Franses and Haldrup (1994) used the TRAM program (Time Series Regression With ARIMA Noise and Missing Values) written by Gómez and Maravall (1992b) to search for outliers in these two real-exchange rate series.
They considered two types of outliers, additive outliers and outliers that produce temporary changes, denoted AO and TC outliers, respectively. For the US/Finland real-exchange rate series based on the CPI index, they found four additive outliers, at dates 1918, 1922, 1945 and 1948. The observations associated with the years 1917, 1932 and 1949 were found to be outliers that produce temporary changes (TC outliers). For the US/Finland real-exchange rate series based on the GDP deflator, an additive outlier was found only at date 1918, whereas outliers that produce temporary changes were found at dates 1917, 1932, 1949 and 1957. Table XIV reports the empirical results from applying the procedures discussed in this paper using 5% and 10% significance levels.

TABLE XIII
Size of the ADF Test; AR(1) Errors*

                   δ = (0, 0, 0, 0)       δ = (5, 3, 2, 2)       δ = (10, 5, 5, 5)
                   s      sc     sd       s      sc     sd       s      sc     sd
ρ = −0.80
  Without          0.037  0.031  0.046    0.056  0.046  0.091    0.013  0.009  0.019
  With             0.018  0.022  0.001    0.077  0.078  0.033    0.192  0.232  0.163
  Total            0.055  0.053  0.047    0.133  0.124  0.124    0.205  0.241  0.182
ρ = −0.40
  Without          0.034  0.029  0.045    0.051  0.041  0.010    0.017  0.011  0.000
  With             0.016  0.021  0.002    0.058  0.058  0.057    0.159  0.177  0.050
  Total            0.050  0.050  0.047    0.109  0.099  0.067    0.176  0.188  0.050
ρ = 0.00
  Without          0.034  0.030  0.046    0.035  0.030  0.000    0.021  0.015  0.000
  With             0.013  0.018  0.002    0.029  0.031  0.048    0.094  0.089  0.046
  Total            0.047  0.048  0.048    0.064  0.061  0.048    0.115  0.104  0.046
ρ = 0.40
  Without          0.038  0.034  0.048    0.019  0.014  0.000    0.017  0.012  0.000
  With             0.009  0.013  0.002    0.011  0.015  0.041    0.023  0.025  0.045
  Total            0.047  0.047  0.050    0.030  0.029  0.041    0.040  0.037  0.045
ρ = 0.80
  Without          0.047  0.045  0.052    0.023  0.021  0.000    0.028  0.025  0.000
  With             0.005  0.008  0.001    0.004  0.008  0.036    0.004  0.008  0.047
  Total            0.052  0.053  0.053    0.027  0.029  0.036    0.032  0.033  0.047

*With the lag length fixed at one.

Figure 1. Logarithm of the US/Finland real exchange rate based on the consumer price index (CPI); annual from 1900 to 1988.

Figure 2. Logarithm of the US/Finland real exchange rate based on the GDP deflator; annual from 1900 to 1987.

TABLE XIV
Empirical Results; Logarithm of the US/Finland Real Exchange Rate

Significance level   Test   CPI-based series 1900–88          GDP-based series 1900–87
5.0%                 sc     1918                              No outlier
                     sd     1917, 1918, 1919, 1932, 1948      1918, 1949
10.0%                sc     1918, 1919                        No outlier
                     sd     1917, 1918, 1919, 1932, 1948      1917, 1918, 1919, 1921, 1932, 1947, 1948, 1957

Vogelsang (1999) presents results for additive outliers only for the US/Finland real-exchange rate series based on the CPI index. The dates he found (using the procedure s) were 1917–19, 1921 and 1932. When appropriately corrected, Vogelsang’s (1999) method finds outliers only for the year 1918 at the 5% level and for 1918 and 1919 at the 10% level, illustrating the fact that, when it is not corrected, it tends to select more outliers than warranted. The procedure based on first-differenced data (sd) finds outliers at dates 1917, 1918, 1919, 1932 and 1948 at both the 5% and 10% significance levels. This illustrates how this latter method is more powerful. For the US/Finland real-exchange rate series based on the GDP deflator, the method sc finds no outlier. As mentioned by Vogelsang (1999), this may be due to the presence of a shift in the mean of the series, as documented by Perron and Vogelsang (1992). The procedure based on first-differenced data (sd) is, nevertheless, able to identify the years 1918 and 1948 as outliers at the 5% level.6 These two dates are not associated with the change in mean identified by Perron and Vogelsang (1992) as occurring in 1937.
The fact that our procedure identifies the year 1918 as an outlier is comforting since visual inspection clearly points in that direction.

6 At the 10% significance level, the outliers found are for the years 1917, 1918, 1919, 1921, 1932, 1947, 1948 and 1957.

8. CONCLUSIONS

We analysed in this paper the size and power properties of some test procedures for multiple outliers in series with an AR unit root. We showed, via simulations, that the procedure suggested by Vogelsang (1999) indeed has the right size when applied to detect a single outlier but that it finds an excessive number of outliers when applied in an iterative fashion. We showed this iterative method to be theoretically incorrect and we derived the appropriate limiting distribution for each step of the iterations. We also showed that, whether corrected or not, such outlier detection methods based on the level of the data have very low power unless the magnitude of the outliers is unrealistically large. Our suggestion was to use a procedure based on first-differenced data, which was shown to have considerably more power. Our analysis remained in the tradition of sequential searches for outliers. It may well be the case that a global procedure would perform better. Work is under way to investigate this issue.

APPENDIX: THE DATA

The US/Finland real-exchange rate series based on the CPI index and the GDP deflator were kindly provided by Tim Vogelsang. They are the same series used in Vogelsang (1999), Franses and Haldrup (1994) and Perron and Vogelsang (1992). The US/Finland real-exchange rate series based on the CPI index is annual from 1900 to 1988, whereas that based on the GDP deflator is from 1900 to 1987. The details of the sources are as follows (Perron and Vogelsang 1992, App. A): nominal exchange rate series, 1900–88 from the Bank of Finland; CPI, 1900–85 from the Bank of Finland, 1986–88 from the IMF (1988); GDP deflator, 1900–85 from the Bank of Finland, 1986–87 from IMF (1988). The sources of the US data are: for the GNP deflator, 1869–1975 from Friedman and Schwartz (1982), 1976–88 from IMF (1988); for the CPI, 1860–1970 from the US Bureau of the Census (1976) and 1971–88 from IMF (1988).

REFERENCES

Box, G. E. P. and Tiao, G. C. (1975) Intervention analysis with applications to economic and environmental problems. Journal of the American Statistical Association 70, 70–9.
Chang, I. and Tiao, G. C. (1983) Estimation of time series parameters in the presence of outliers. Technical Report 8, University of Chicago, Statistics Research Center.
Chang, I., Tiao, G. C. and Chen, C. (1988) Estimation of time series parameters in the presence of outliers. Technometrics 30, 193–204.
Chen, C. and Liu, L. (1993) Joint estimation of model parameters and outlier effects in time series. Journal of the American Statistical Association 88, 284–97.
Dickey, D. A. and Fuller, W. A. (1979) Distribution of the estimators for autoregressive time series with a unit root. Journal of the American Statistical Association 74, 427–31.
Fox, A. J. (1972) Outliers in time series. Journal of the Royal Statistical Society Series B 34, 350–63.
Franses, P. H. and Haldrup, N. (1994) The effects of additive outliers on tests for unit roots and cointegration. Journal of Business & Economic Statistics 12, 471–8.
Friedman, M. and Schwartz, A. J. (1982) Monetary Trends in the United States and the United Kingdom: Their Relation to Income, Prices and Interest Rates, 1867–1975. Chicago: The University of Chicago Press.
Gómez, V. and Maravall, A. (1992a) Estimation, prediction and interpolation for nonstationary series with the Kalman Filter. European University Institute, Working Paper ECO 92/80.
Gómez, V. and Maravall, A. (1992b) Time series regression with ARIMA noise and missing observations. Program TRAM. European University Institute, Working Paper ECO 92/81.
Hawkins, D. M. (1973) Repeated testing for outliers. Statistica Neerlandica 27, 1–10.
Hawkins, D. M. (1980) Identification of Outliers. New York: Chapman and Hall.
International Monetary Fund (1988) International Financial Statistics: Yearbook. Washington, DC: IMF.
Peña, D. (1990) Influential observations in time series. Journal of Business & Economic Statistics 8, 235–41.
Perron, P. and Ng, S. (1996) Useful modifications to some unit root tests with dependent errors and their local asymptotic properties. Review of Economic Studies 63, 435–63.
Perron, P. and Vogelsang, T. J. (1992) Nonstationarity and level shifts with an application to purchasing power parity. Journal of Business & Economic Statistics 10, 301–20.
Rodríguez, G. (1999) Unit Root, Outliers and Cointegration Analysis with Macroeconomic Applications. Unpublished PhD dissertation, Département de Sciences Économiques, Université de Montréal.
Shin, D. W., Sarkar, S. and Lee, J. H. (1996) Unit root tests for time series with outliers. Statistics and Probability Letters 30, 189–97.
Stock, J. H. (1999) A class of tests for integration and cointegration. In Cointegration, Causality and Forecasting. A Festschrift in Honour of Clive W. J. Granger (eds R. F. Engle and H. White). Oxford University Press, 137–67.
Tiao, G. C. (1985) Autoregressive moving average models, intervention problems and outlier detection in time series. In Time Series in the Time Domain, Handbook of Statistics 5 (eds E. J. Hannan, P. R. Krishnaiah and M. M. Rao). New York: North Holland, 85–118.
Tsay, R. S. (1986) Time series model specification in the presence of outliers. Journal of the American Statistical Association 81, 132–41.
US Bureau of the Census (1976) The Statistical History of the United States, From Colonial Times to the Present. New York: Basic Books.
Vogelsang, T. J. (1999) Two simple procedures for testing for a unit root when there are additive outliers. Journal of Time Series Analysis 20, 237–52.