Beck and Katz 2011
ABSTRACT This paper deals with a variety of dynamic issues in the analysis of time-series cross-section (TSCS) data. While the issues raised are general, we focus on applications to comparative political economy, which frequently uses TSCS data. We begin with a discussion of specification and lay out the theoretical differences implied by the various types of dynamic models that can be estimated. It is shown that there is nothing pernicious in using a lagged dependent variable and that all dynamic models either implicitly or explicitly have such a variable; the differences between the models relate to assumptions about the speeds of adjustment of measured and unmeasured variables. When adjustment is quick it is hard to differentiate between the various models; with slower speeds of adjustment the various models make sufficiently different predictions that they can be tested against each other. As the speed of adjustment gets slower and slower, specification (and estimation) gets more and more tricky. We then turn to a discussion of estimation. It is noted that models with both a lagged dependent variable and serially correlated errors can easily be estimated; it is only OLS that is inconsistent in this situation. There is a brief discussion of lagged dependent variables combined with fixed effects and issues related to non-stationarity. We then show how our favored method of modeling dynamics combines nicely with methods for dealing with other TSCS issues, such as parameter heterogeneity and spatial dependence. We conclude with two examples, one being more extended.
For research support we thank the National Science Foundation. An earlier version was presented at the Annual Meeting of the Society for Political Methodology, Stanford University, Stanford, CA, July 29-31, 2004. We thank Geoff Garrett, Evelyne Huber and John Stephens for providing data and many colleagues who have discussed TSCS issues with us and allowed us to present in various forums, amongst whom Chris Achen and Simon Jackman (for efforts well beyond the call of duty) must be singled out. Department of Politics; New York University; New York, NY 10003 USA; [email protected] Division of the Humanities and Social Sciences; California Institute of Technology; Pasadena, CA 91125 USA; [email protected]
1. INTRODUCTION

Time-series–cross-section (TSCS) data is perhaps the most commonly used in comparative politics (broadly defined) and particularly in comparative political economy.1 While there are a variety of issues related to TSCS data, a number of important ones relate to the dynamic (time series) properties of the data. Obviously many of these issues are similar to those for a single time series, but the context of comparative political economy and the relatively short lengths of the TSCS time periods make for some interesting special issues. We assume that the reader is familiar with the basic statistical issues of time series data; since various specification issues are covered for political scientists elsewhere (Beck, 1985, 1991; De Boef and Keele, 2008; Keele and Kelly, 2006) we go fairly quickly over the basic issues, spending more time on issues relating to the dynamic modeling and interpretation of those models in political economy.

1.1. Notation and nomenclature

Our interest is in modeling the dynamics of TSCS models. By dynamics we mean any process where some variable has an impact that is distributed over time. Let y_{i,t} be an observation for unit i at time t, where i = 1, . . . , N and t = 1, . . . , T. We assume that y is measured as a continuous variable, or at least can be taken as continuous. Since in what follows we typically do not care if we have one or more than one independent variable, let x_{i,t} be either an observation on a single independent variable or a vector of such variables; if the latter, it is assumed that the dynamics apply similarly to all the components of that vector. Where we need to differentiate dynamics we use a second variable (or vector of variables), z_{i,t}. Since the constant term is typically irrelevant for what we do, we omit it from our notation (or include it in the vector of other independent variables). We will assume that these independent variables are exogenous, which is clearly both a strong and critical assumption. We postpone discussing specifications for y_{i,t} until the next section. Since the paradigmatic applications are to comparative political economy, we will often refer to the time periods as years and the units as countries. Given the yearly frequency, we focus on models where explicit lags are only of one or two years; we would not expect to see the higher order dynamic processes common in standard time series analysis of monthly or quarterly data. While we focus on low order processes, it is trivial to extend the interpretation, tests and estimation to higher order dynamics. T must be large enough so that averaging over time makes sense. In our own prior work (Beck and Katz, 1995, 1996) we did not examine situations where T < 15.
1 Comparative politics refers to any comparison of political units, so it encompasses almost all of international relations, which has countries or country pairs as the unit of analysis, and any study which compares units (regions, states or counties) within a single country. Our language comes from comparative political economy, but everything extends to other types of TSCS studies (both within and beyond political science) so long as the data are observed over a long enough period. In terms of the importance of this type of data, Adolph, Butler and Wilson (2005) report that by the early 2000s approximately 5% of all political science articles in JSTOR journals used TSCS data.
Much political economy data spans a sample period of 30 or more years and so there are no problems. This paper does not discuss standard survey panels, which typically have five or fewer waves, and there is no reason to believe that the methods discussed here apply to such survey panels. We make no assumptions about N, though in comparative political economy it is seldom less than 20 (advanced industrial nations); it can be quite large (over 100,000 for some studies in International Relations using dyadic data). We distinguish two types of error terms; ε_{i,t} refers to an independent identically distributed (iid) error process, whereas ν_{i,t} refers to a generic error process that may or may not be iid. Unless specifically stated, we restrict non-iid processes to simple first order autoregressions; this simplifies notation with no loss to our argument; it is simple to extend our argument to more complicated error processes. Since coefficients are only interpretable in the context of a specific model, we superscript coefficients to indicate the specification they refer to whenever confusion might arise.2 When relevant, we use L as the lag operator, so that Ly_{i,t} = y_{i,t−1} if t > 1 and Ly_{i,1} is missing. The first difference operator is then Δ = 1 − L.

1.2. Stationarity

We initially, and for most of the article, assume that the data are drawn from a stationary process. A univariate process is stationary if its various moments and cross-moments do not vary with the time period. In particular, the initial sections assume that the data are drawn from a covariance stationary process, that is

E(y_{i,t}) = μ    (2a)
Var(y_{i,t}) = σ²    (2b)
Cov(y_{i,t}, y_{i,t−k}) = γ_k    (2c)
(and similarly for any other random variables). Stationary processes have various important features. In particular, they are mean reverting, and the best long-run forecast for a stationary process is that mean. Thus we can think of the mean as the equilibrium of a stationary process. Alternatively, we can think of the statistical properties of the data as not varying simply as a function of time (so, for example, there are no trends in the data). We briefly discuss non-stationary data in Section 4.

1.3. Missing data

To simplify notation, we assume that the data set is rectangular, that is, each country is observed for the same time period (which is called the sample period even though it is not a sample of anything).
2 We use the terms specification and model interchangeably, and refer to both as equations when making reference to a specific equation in the text.
It is easy to extend everything to a world in which some countries are not observed for a few years either at the beginning or the end of the period under study, and the only cost of so doing would be an additional subscript in the notation. Missing data in the interior of the period under study is not benign. At a minimum, such missing data causes all the standard problems associated with missing data (Little and Rubin, 1987). The default solution, list-wise deletion, is well known to be an inferior solution to the missing data problem for cross-sectional data. But the problem is more severe for TSCS data, since the specification invariably includes temporal lags of the data; even if the model has only first order lags, each observation with missing data leads to the deletion of two data points. Thus, even more so than with cross-sectional data, multiple imputation techniques are de rigueur for dynamic models that have more than a trivial amount of missing data. Obviously the amount of missingness will vary as we move from studies of advanced industrial societies to poorer nations, and so the attention paid to missingness can vary. While it is easy to say that analysts should use Little and Rubin's multiple imputation, the standard methods for cross-sectional imputation (hot decking or assuming that both missing and non-missing observations are essentially multivariate normal) are not appropriate for TSCS data. This is because we know a lot about TSCS data. Thus, for example, we would expect that missing economic variables are likely to be highly related to observed values in nearby countries with similar economies, or that observations on trending time series can be imputed with interpolated values. Honaker and King's (2010) Amelia II allows users this kind of flexibility. But there is no mechanistic solution to the missing data problem in political economy. To give but one example, missingness is often related to such things as civil wars; if we simply use some complicated averaging method to impute missing economic data during a civil war, our imputations are likely to be overly optimistic. Analysts using TSCS data sets with a significant amount of missing data can only be warned that they must take extreme care.

1.4. Roadmap

The next section, on the interpretation of alternative dynamic specifications, is the heart of the article. There we deal only with stationary data. The following section briefly examines combining dynamics with cross-sectional issues, in particular accounting for heterogeneity across units. The following section extends the argument to slowly moving and non-stationary data. Two examples of dynamic modeling with political economy TSCS data are in Section 5, and we offer some general conclusions in the final section.

2. DYNAMIC MODELS: STATIONARY DATA

There are a variety of specifications for any time series model; for reviews considering applications to political science see Beck (1991) and De Boef and Keele (2008). All time series specifications have identical counterparts in TSCS models. Since these specifications appear in any standard text, we discuss general specification issues without either citation or claim of originality.
In our own prior work (Beck and Katz, 1996) we argued that a specification with a lagged dependent variable (LDV) is often adequate; since that has sparked some discussion (Achen, 2000; Keele and Kelly, 2006), we spend some time on this issue. After discussing a variety of specifications we discuss issues of interpretation and estimation.3

2.1. Dynamic specifications

The generic static (non-dynamic) specification is

y_{i,t} = β^s x_{i,t} + ε_{i,t}.    (3)
This specification is static because any changes in x or the errors are felt instantaneously, with the effect also dissipating instantaneously; there are no delayed effects.4 There are a variety of ways to add dynamics to the static specification. The simplest is the finite distributed lag (FDL) model, which assumes that the impact of x sets in over two (or a few) periods but then dissipates completely. This specification has:

y_{i,t} = β^{f1} x_{i,t} + β^{f2} x_{i,t−1} + ε_{i,t},    (4)
with the obvious generalization for higher ordered lags. Equation 3 is nested inside5 Equation 4, so testing between the two is simple in principle (though the correlation of x and its lags makes for a number of practical issues). Another commonly used dynamic specification is to assume that the errors follow a first order autoregressive (AR1) process (rather than the iid process of Equation 3). If we assume that the errors follow an AR1 process, we have

y_{i,t} = β^{ar1} x_{i,t} + ν_{i,t},  ν_{i,t} = ρν_{i,t−1} + ε_{i,t}    (5a)
      = β^{ar1} x_{i,t} + ε_{i,t}/(1 − ρL)    (5b)
      = β^{ar1} x_{i,t} + ρy_{i,t−1} − ρβ^{ar1} x_{i,t−1} + ε_{i,t}.    (5c)
The advantage of the formulation in Equation 5c is that it makes the dynamics implied by the model clearer and also makes it easier to compare various models. Another alternative model is the lagged dependent variable (LDV) model (with iid errors),

y_{i,t} = β^{ldv} x_{i,t} + φy_{i,t−1} + ε_{i,t}    (6a)
      = β^{ldv} x_{i,t}/(1 − φL) + ε_{i,t}/(1 − φL).    (6b)
3 At various points we refer to critiques of the use of lagged dependent variables in Achen's paper. While it is odd to spend time critiquing a decade-old unpublished paper, this paper has been influential (over 300 Google Scholar cites as of this writing). We only deal with the portions of Achen's paper relevant to issues raised here.
4 It may be that x_{i,t} is measured with a lag, so the effect could be felt with a lag, but the model is still inflexible in that the effect is completely and only felt at the one specified year.
5 One specification is nested inside another if it is a special case of the more general specification, obtained by restricting some coefficients.
As Equation 6b makes clear, the LDV model simply assumes that the effect of x decays geometrically (and for a vector of independent variables, all decay geometrically at the same rate). Note also that the compound error term is an infinite geometric sum (with the same decay parameter as for x); this error term is equivalent to a first order moving average (MA1) error process, again with its decay parameter constrained to equal φ, the rate at which the effect of x on y decays. Both the AR1 and LDV specifications are special cases of the autoregressive distributed lag (ADL) model,

y_{i,t} = β^{adl} x_{i,t} + φy_{i,t−1} + γx_{i,t−1} + ε_{i,t}    (7)

where Equation 5c imposes the constraint that γ = −ρβ^{adl} (with φ = ρ) and Equation 6a assumes that γ = 0. The nesting of both the LDV and AR1 specifications within the ADL specification allows for testing between the various models. For interpretative purposes, it can be helpful to rewrite the ADL model in error correction (EC) form (Davidson, Hendry, Srba and Yeo, 1978). To do this, subtract y_{i,t−1} from both sides of the ADL model to get a first difference of y on the left hand side, and add and subtract β^{adl} x_{i,t−1} on the right hand side to get a first difference of x in the specification. This leads to

Δy_{i,t} = β^{ec} Δx_{i,t} − λ(y_{i,t−1} − θx_{i,t−1}) + ε_{i,t}    (8)

which allows for the nice interpretation that short run changes in y are a function of both short run changes in x and how much x and y were out of equilibrium last year, where the equilibrium relation is y_{i,t} = θx_{i,t} and the speed of equilibration (per year) is λ. The coefficients of the EC model can be easily derived from the corresponding ADL model: β^{ec} = β^{adl}, λ = 1 − φ and θ = (β^{adl} + γ)/(1 − φ). For comparison with other models the ADL works better, but for direct substantive interpretation of the coefficients the EC model is easier to work with (since one can directly read off the short term impact of a change in x as well as various long run impacts). Since the two are identical, either one can be estimated. We return to the EC model when we deal with non-stationary data in Section 4.

2.2. Interpretation

To see how the various specifications differ, we turn to unit and impulse response functions. Since x itself is stochastic, assume the process has run long enough for it to be at its equilibrium value (stationarity implies the existence of such an equilibrium). We can then think of a one time shock in x (or ε) of one unit, with a subsequent return to equilibrium (zero for the error) the next year; if we then plot y against this, we get an impulse response function (IRF). Alternatively, we can shock x by one unit and let it stay at the new value; the plot of y against x is a unit response function (URF). The static specification assumes that all variables have an instantaneous and only an instantaneous impact. Thus the IRF for either x or ε is a spike, associated with an instantaneous change in y, and if x or ε then returns to previous values in the next period, y immediately also returns to its previous value. The URF is simply a step function, with the height of the single step being β^s.
The finite distributed lag model generalizes this, with the URF having two steps, of height β^{f1} and β^{f1} + β^{f2}, and the interval between the steps being one year. Thus, unlike the simple static model, if x changes, it takes two years for the full effect of the change to be felt, but the effect is fully felt in those two years. Thus it may take one year for a party to have an impact on unemployment, but it may be the case that after that year the new party in office has done all it can and will do in terms of changing unemployment. Similarly, an institutional change may not have all of its impact immediately, but the full impact may occur within the space of a year. We could add more lags to Equation 4, allowing for more complicated dynamics. But time series within a country are often temporally correlated, so multicollinearity makes it difficult to get good estimates of the coefficients of the FDL specification with many lags of x. Given the annual frequency of much of the data seen in comparative political economy, the problem of having to add too many lags to Equation 4 (say more than one additional lag) may not, in practice, be a problem. It may be unlikely that interesting institutional changes have only an immediate impact, but the FDL model might be appropriate. It surely should be borne in mind in thinking about appropriate specifications, and, as we shall see, it combines nicely with some other specifications. The AR1 model has a different IRF for x and the error. The IRF for x is a spike, identical to that of the static model; the IRF for the error has a declining geometric form with rate of decay ρ. It seems odd that all the omitted variables have a declining geometric IRF but the x we are modeling has only an instantaneous impact. Maybe that is correct, but this is not the first specification that would occur to us. The AR1 specification can be generalized by adding more lags of x, but we would still have very different dynamics for x and the unobserved variables in the error term. One should clearly have a reason to believe that dynamics are of this form before using the AR1 specification. The LDV model has an IRF for both x and the error that has a declining geometric form; the initial response is β^{ldv} (or 1 for the error); this declines to zero geometrically at a rate φ. While the effect never completely dissipates, it becomes tiny fairly quickly unless φ is almost one. The URF starts with an effect β^{ldv} immediately, increasing to β^{ldv}/(1 − φ). If φ is close to one, the long run impact of x can be 10 or more times the immediate impact. While the ADL specification appears to be much more flexible, it actually has an IRF similar to the LDV specification, other than in the first year (and is identical for a shock to the error process). Initially y changes by β^{adl} units; then the next period the change is φβ^{adl} + γ, which then dies out geometrically at a rate φ. Thus the ADL specification is only a bit more general than the LDV specification. It does allow for the maximal impact of x to occur a year later, rather than instantaneously (or, more generally, the effect of x after one period is not constrained to be the immediate impact with one year's decay). This may be important in some applications. A comparison of the various IRFs and URFs is in Figure 1. These clearly show that the difference between the specifications has simply to do with the timing of the adjustment of y after a change in x. Before getting to slightly more complicated models, this analysis tells us several things.
The various models differ in the assumptions they impose on the dynamics that govern how x and the errors impact y.
Figure 1: Comparison of impulse and unit response functions for four specifications: Autoregressive Distributed Lag (ADL, y_{i,t} = 0.2x_{i,t} + 0.5y_{i,t−1} + 0.8x_{i,t−1}), Finite Distributed Lag (FDL, y_{i,t} = 1.5x_{i,t} + 0.5x_{i,t−1}), Lagged Dependent Variable (LDV, y_{i,t} = 1.0x_{i,t} + 0.5y_{i,t−1}), and Autoregressive Errors (AR1, y_{i,t} = 2.0x_{i,t} + ν_{i,t}, ν_{i,t} = 0.5ν_{i,t−1} + ε_{i,t})
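The response functions in Figure 1 are easy to reproduce by direct simulation. The sketch below (in Python; the parameter values are those given in the caption, and for the AR1 specification only the response to x is traced) propagates a one-unit impulse and a one-unit step in x through each specification:

```python
import numpy as np

def response(beta0, phi=0.0, beta1=0.0, T=10, impulse=True):
    """Trace y for y_t = beta0*x_t + beta1*x_{t-1} + phi*y_{t-1}
    after a one-unit shock to x at t=0 (IRF) or a permanent
    one-unit step in x starting at t=0 (URF)."""
    x = np.zeros(T)
    if impulse:
        x[0] = 1.0
    else:
        x[:] = 1.0
    y = np.zeros(T)
    for t in range(T):
        xlag = x[t - 1] if t > 0 else 0.0
        ylag = y[t - 1] if t > 0 else 0.0
        y[t] = beta0 * x[t] + beta1 * xlag + phi * ylag
    return y

# Parameter values from the Figure 1 caption.
specs = {
    "ADL": dict(beta0=0.2, phi=0.5, beta1=0.8),
    "FDL": dict(beta0=1.5, beta1=0.5),
    "LDV": dict(beta0=1.0, phi=0.5),
    "AR1": dict(beta0=2.0),  # x enters only contemporaneously
}
for name, p in specs.items():
    print(name, "IRF:", np.round(response(**p), 2))
    print(name, "URF:", np.round(response(**p, impulse=False), 2))
```

Every unit response converges to the same long run value of 2; the specifications differ only in the path by which they get there.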
None of the dynamic specifications can be more or less right a priori. Later we will discuss some estimation (see Section 2.5) and testing (see Section 2.6) issues, but for now we can say that various theories would suggest various specifications. The most important issue is whether we think a change in some variable is felt only immediately, or whether its impact is distributed over time; in the latter case, do we think that a simple structure, such as a declining geometric form, is adequate? How would we expect an institutional change to affect some y of interest in terms of the timing of that effect? If only immediately or completely in one or two years, the AR1 model or the FDL model seems right; if we expect some initial effect which increases to some limit over time, the LDV or ADL model would be used. But there is nothing atheoretical about the use of a lagged dependent variable, and there is nothing which should lead anyone to think that the use of a lagged dependent variable causes incorrect harm. It may cause correct harm, in that it may keep us from incorrectly concluding that x has a big effect when it does not, but that cannot be a bad thing. As has been well known, and as Hibbs (1974) showed three decades ago for political science, the correct modeling and estimation of time series models often undoes seemingly obvious findings. A related way to say the same thing is that each of the various models (for stationary data) implies some long-run equilibrium and a speed with which that equilibrium is reached after some shock to the system. It is easy to solve for equilibria (if they exist) by noting that in equilibrium both x and y are stable. Let y_E and x_E refer to equilibrium y and x. Then for the ADL model it must be true that

y_E = β^{adl} x_E + φy_E + γx_E    (9)

yielding y_E = (β^{adl} + γ)x_E/(1 − φ) (|φ| < 1 by stationarity). This is easier to see in the error correction form, where y_{i,t−1} = θx_{i,t−1} in equilibrium and λ is the rate (per year) at which y returns to this equilibrium. Any of the models for stationary data imply both a long run equilibrium and a speed of equilibration, with the different parameter constraints determining these long run features. Each of these models implies different short and long run reactions of y to x, and standard econometric methods (see Section 2.6) can be used to discriminate between them.
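Because the EC form is just a reparameterization, converting estimates from one form to the other is purely mechanical. A minimal sketch (the ADL parameter values are the illustrative ones from Figure 1, not estimates):

```python
def adl_to_ec(beta, phi, gamma):
    """Map ADL coefficients, y_t = beta*x_t + phi*y_{t-1} + gamma*x_{t-1},
    to the error correction quantities of Equation 8."""
    short_run = beta                         # beta^ec = beta^adl
    speed = 1.0 - phi                        # lambda, speed of equilibration
    long_run = (beta + gamma) / (1.0 - phi)  # theta, equilibrium multiplier
    return short_run, speed, long_run

# ADL values from Figure 1: y_t = 0.2*x_t + 0.5*y_{t-1} + 0.8*x_{t-1}
print(adl_to_ec(0.2, 0.5, 0.8))  # (0.2, 0.5, 2.0): half of any
# disequilibrium is closed each year; the long run effect of x is 2.
```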
2.3. Higher order processes and other complications

We can generalize any of these models to allow for non-iid errors and higher order dynamics. However, since our applications typically use annual data, it is often the case that first order error processes suffice, and it would be unusual to have more than second order processes. As we shall see, it is easy to test for higher order error processes, so there is no reason to simply assume that errors are iid or only follow a first order process. For notational simplicity we restrict ourselves to second order processes, but the generalization is obvious. Consider the LDV model with AR1 errors, so that

y_{i,t} = β^{ldv} x_{i,t} + φy_{i,t−1} + ε_{i,t}/(1 − ρL).    (10)
After multiplying through by (1 − ρL) we get a model with two lags of y, x and lagged x and some constraints on the parameters; if we generalize the ADL model similarly, we get a model with two lags of both y and x and more constraints. The interpretation of this model is similar to the model with iid errors. We have already seen that the LDV model with iid errors is equivalent to a model where the effect of all the independent variables and the error decline at the same geometric rate. But if we assume that the errors, that is omitted or unmeasured variables, follow an MA1 process with the same decay rate, φ, as for the measured variables (which may or may not be a good assumption), we have

y_{i,t} = β^{ldv} x_{i,t} + φy_{i,t−1} + (1 − φL)ε_{i,t}    (11a)

which simplifies to

y_{i,t} = β^{ldv} x_{i,t}/(1 − φL) + ε_{i,t},    (11b)
that is, a model which combines a geometrically declining impact of x on y with iid errors. It is surely more likely that the errors are correlated than that they are independent. Of course the most likely case is that the errors are neither iid nor MA1 with the same dynamics as x, so we should entertain a more general specification where the effects of both measured and unmeasured variables have a declining geometric impact with different rates of decline. The simplest such specification is Equation 10. We return to this more general specification in Section 2.5.

2.4. More complicated dynamics - multiple independent variables

We typically have more than one independent variable. How much dynamic generality can (or should) be allowed for? One easy generalization is to allow for two independent (or sets of independent) variables, x and z. Allowing also for a separate speed of adjustment for the errors yields

y_{i,t} = β^x x_{i,t}/(1 − φ_x L) + β^z z_{i,t}/(1 − φ_z L) + ε_{i,t}/(1 − ρL).    (12)

Obviously each new variable now requires us to estimate two additional parameters. Also, on multiplying out the lag structures, we see that with three separate speeds of adjustment we have a third-order lag polynomial multiplying y, which means that we will have the first three lags of y on the right hand side of the specification (and two lags of both x and z) and an MA2 error process. While there are many constraints on the parameters of this model, the need for 3 lags of y costs us 3 years worth of observations (assuming the original data set contained as many observations as were available). With k independent variables, we would lose k + 1 years of data; for a typical problem where T is perhaps 30 and k is perhaps 5, this is non-trivial. Thus we are unlikely to ever be able to (or want to) estimate a model where each variable has its own speed of adjustment. But we might get some leverage by allowing for two kinds of independent variables; those where adjustment (speed of return to equilibrium) is relatively fast (x) and those where
the system returns to equilibrium much more slowly (z). Since we are trying to simplify here, assume the error process shows the same slower adjustment speed as z; we can obviously build more complex models but they bring nothing additional to this discussion. We then would have

y_{i,t} = β^x x_{i,t} + (β^z z_{i,t} + ε_{i,t})/(1 − φL)    (13a)
      = β^x x_{i,t} − φβ^x x_{i,t−1} + β^z z_{i,t} + φy_{i,t−1} + ε_{i,t}.    (13b)
Thus at the cost of one extra parameter, we can allow some variables to have only an immediate or very quick effect, while others have a slower effect, with that effect setting in geometrically. With enough years we could estimate more complex models, allowing for multiple dynamic processes, but such an opportunity is unlikely to present itself in studies of comparative political economy. We could also generalize the model by allowing for the lags of x and z to enter without constraint. It is possible to test for whether these various complications are supported by the data, or whether they simply ask too much of the data. As always, it is easy enough to test and then make a decision.

2.5. Estimation issues

As is well known, a specification with no lagged dependent variable but serially correlated errors is easy to estimate using any of several variants of feasible generalized least squares (FGLS), with the Cochrane-Orcutt iterated procedure being the most well known. It is also easy to estimate such a model via maximum likelihood, breaking up the full likelihood into a product of conditional likelihoods. The LDV model with iid errors is optimally estimated by OLS. However, it is also well-known that OLS yields inconsistent estimates of the LDV model if the error process is serially correlated. Perhaps less well-known is that Cochrane-Orcutt or maximum likelihood provides consistent estimates of the LDV model with serially correlated errors by accounting for that serial correlation (Hamilton, 1994, 226).6 Thus it is easy to correctly estimate the LDV model while allowing for serially correlated errors if analysts wish to do so. But we hope that analysts will not wish to do so. It is often the case that the inclusion of a lagged dependent variable eliminates almost all serial correlation of the errors. To see this, start with the AR1 equation:

y_{i,t} = β^{ar1} x_{i,t} + ν_{i,t}    (14a)
ν_{i,t} = ρν_{i,t−1} + ε_{i,t}.    (14b)
Remember that the error term is simply all the omitted variables, that is, everything that determines y that is not explained by x. If we adjoin y_{i,t−1} to the specification, the error in that new specification is ν_{i,t} − φy_{i,t−1}, where ν_{i,t} is the original error in Equation 14a, not some generic error term. Since the ν_{i,t} are serially correlated because they contain a common omitted variable, and y_{i,t−1} contains the omitted variables at time t − 1, including y_{i,t−1} will almost certainly lower the degree of serial correlation, and often will eliminate it.
6 The Cochrane-Orcutt procedure may find a local minimum, so analysts should try various starting values. This is seldom an issue in practice, but it is clearly easy enough to try alternative starting values.
But there is no reason to simply hope that this happens; we can estimate (using OLS) the LDV model assuming iid errors, and then test the null that the errors are independent using a Lagrange multiplier test (which only requires that OLS be consistent under the null of iid errors, which it is).7 If, as often happens, we do not reject the null that the remaining errors are iid, we can continue with the OLS estimates. If we do reject that null we should estimate a more complicated model. Obviously failing to reject the null of no serial correlation of the errors is not the same thing as knowing there is no serial correlation of the errors. Is this incorrect logic in interpreting a failure to reject the null hypothesis likely to cause problems? There are two reasons to be sanguine here. First, the large amount of data in typical TSCS studies gives the Lagrange multiplier test good power. In our first example (Section 5.1), with about 300 total observations, the Lagrange multiplier test detected a serial correlation of the errors of about 0.10. It is also the case that ignoring a small amount of serial correlation (that is, estimating the LDV model with OLS as if there were no serial correlation) leads to only small amounts of bias. As Achen (2000, 13) elegantly shows, the estimation bias from incorrectly using OLS to estimate the LDV model with serially correlated errors is directly proportional to that serial correlation. Applied researchers make many assumptions to simplify analysis, assumptions that are never exactly correct. Ignoring a small serial correlation of the errors is surely one of the more benign mistakes. As we shall see in Section 3, a number of fruitful avenues of investigation are open if the errors are either uncorrelated or sufficiently uncorrelated that we can ignore that correlation. But what if a researcher is not so sanguine? As we have seen in Section 2.5, one can easily estimate, using methods other than OLS, the ADL model with serially correlated errors. But a more fruitful approach, as shown in Section 2.3, is to include second order lags of the variables in the ADL specification; this ADL2 specification can be appropriately estimated by OLS, once again allowing the researcher to more easily examine other interesting features of the data. Of course the same Lagrange multiplier testing procedure should first be used to test for remaining serial correlation, but with annual data we can be hopeful that we will not need highly complicated lag structures in the preferred specification. Obviously more parsimonious specifications are easier to interpret (and convey to the reader) and so more complicated specifications with higher order lags come at a cost. Thus we might want to consider models intermediate between the ADL and ADL2 models. One obvious choice is to simply append a second lag of the dependent variable to the ADL specification; this is analogous to moving from the static to the LDV specification as discussed previously. This simpler specification, ADLLDV2, should be tested to see if the errors are iid. The ADLLDV2 specification may be a good compromise between parsimony and fidelity to important features of the data; in our first example this is our preferred model. In other cases even simpler models may provide a better tradeoff between the various goals.
7 The test is trivial to implement. Take the residuals from the OLS regression and regress them on the appropriate number of lags of those residuals and all the independent variables including the lagged dependent variable; the relevant test statistic is NT·R² from this auxiliary regression, which is asymptotically distributed χ² with degrees of freedom equal to the number of lags.
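For concreteness, here is a minimal sketch (in Python, with hypothetical inputs) of the Lagrange multiplier test described in footnote 7; for TSCS data the residual lags should strictly be taken within each unit, which this sketch ignores for brevity:

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

def lm_serial_test(resid, X, lags=1):
    """LM test for serially correlated errors: regress OLS residuals
    on their lags and all original regressors (including the lagged
    dependent variable); NT*R^2 is asymptotically chi^2(lags) under
    the null of iid errors."""
    e = np.asarray(resid)
    Z = np.column_stack([np.roll(e, k) for k in range(1, lags + 1)])
    Z = Z[lags:]                      # drop rows with undefined lags
    W = np.column_stack([Z, np.asarray(X)[lags:]])
    aux = sm.OLS(e[lags:], sm.add_constant(W)).fit()
    lm = len(e[lags:]) * aux.rsquared
    return lm, stats.chi2.sf(lm, df=lags)  # statistic and p-value
```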
2.6. Discriminating between models

We can use the fact that the ADL model nests the LDV and AR1 models to test which specification better fits the data. The LDV model assumes that γ = 0 (in Equation 7) whereas the AR1 model assumes that γ = −ρβ^{adl}. Thus we can estimate the full ADL model and test whether γ = 0 or γ = −ρβ^{adl}.8 If both simplifications are rejected we can simply retain the more complicated ADL model.9 Even in the absence of a precise test, the ADL estimates will often indicate which simplification is not too costly to impose. Note that for fast dynamics (where ρ is close to zero) it will be hard to distinguish between the LDV and AR1 specifications, or, alternatively, it does not make much difference which specification we use. To see this, note that if the AR1 model is correct, but we estimate the LDV model, we are incorrectly omitting the lagged x variable, when it should be in the specification, but with the constrained coefficient −ρβ^{ar1}. As ρ goes to zero, the bias from failing to include this term goes to zero. Similarly, if we incorrectly estimate the AR1 model when the LDV model is correct, we have incorrectly included in the specification the lagged x variable, with coefficient −ρβ^{ar1}. Again, as ρ goes to zero, this goes to zero. Thus we might find ourselves not rejecting either the LDV or AR1 specifications in favor of the more general specification, but for small ρ it matters little. As ρ grows larger the two models diverge, and so we have a better chance of discriminating between the specifications. Interestingly, this is different from the conventional wisdom on omitted variable bias, where it is normally thought to be worse to incorrectly exclude than to incorrectly include a variable. This difference is because both models constrain the coefficient of the lagged x, and so the AR1 model forces the lagged x to be in the specification. But if we start with the ADL model and then test for whether simplifications are consistent with the data we will not be misled. This testing of simplifications is easy to extend to more complicated models, such as Equation 13b.

3. COMBINING DYNAMIC AND CROSS-SECTIONAL ISSUES

Obviously modeling dynamics with TSCS data is only half the job; clearly analysts also need to model the cross-sectional properties of the data. Here we discuss some issues that relate to the interaction of modeling dynamics and cross-sectional issues.10
8 The first test is an ordinary t-test. The second is easiest via a linear approximation to the non-linear constraint using a Taylor series (Greene, 2008, 968); this test is implemented in some common statistical packages such as Stata.
9 Starting with the ADL model and then testing whether simplifications are consistent with the data is part of the idea of general to simple testing (also called the encompassing approach) as espoused by Hendry and his colleagues (Hendry and Mizon, 1978; Mizon, 1984). Note that this approach could start with a more complicated model with higher order specifications, but given annual data, the ADL model with no more than first order lags is often the most complicated specification that need be considered.
10 See Beck and Katz (1995) or Beck (2001) for our discussion of dealing with various cross-sectional issues for TSCS data. For reasons of space we omit discussion of dynamics with discrete dependent variables, which we have discussed elsewhere (Beck, Katz and Tucker, 1998). Dynamics are no less important in models with discrete dependent variables, but the recommended modeling is different for that situation.
3.1. Independent errors simplify cross-sectional modeling

We have advocated modeling dynamics by including appropriate current and lagged values of the x's and lagged values of the dependent variable so that the resulting errors appear to be serially independent, allowing both for easy interpretation and estimation. One big advantage of this approach is that it is then much simpler to model cross-sectional situations. Most standard programs that allow for modeling complicated cross-sectional situations do not allow for temporally correlated errors. While this is a practical rather than a theoretical issue, some estimation methods are sufficiently complex that one really wants to use a canned program.11 In particular, realistic political economy models often should allow for spatial effects, that is, variables in one country impact other countries. Models of the political causes of economic performance, for example, must take into account that economic performance of any country is a function of the economic performance of its trading partners. These issues have been discussed in the context of TSCS data elsewhere (Beck, Gleditsch and Beardsley, 2006; Franzese and Hays, 2007) and here we simply point out that our preferred approach to dynamics makes it easy for analysts to also deal with this critical cross-sectional issue. Another cross-sectional feature that should be considered (see Beck and Katz, 2007 and the references cited there) is that parameters may vary randomly by country (and possibly as a function of country level covariates). It is easy to allow for this using the random coefficients model (which is equivalent to a mixed or hierarchical or multilevel model) if the error process is iid. Note that one of the randomly varying parameters can be that of the lagged dependent variable, the parameter that controls the speed of adjustment in the model. Perhaps countries differ in that speed of adjustment. As we see in Section 5.1, this issue is easy to examine when errors are iid.

3.2. Fixed effects and lagged dependent variables

Perhaps the most common cross-sectional issue is heterogeneity of the intercepts. In the TSCS context this is usually dealt with by adding fixed effects (country specific intercepts) to the specification. We would adjoin these country specific intercepts to the preferred ADL specification. But here we get into potential trouble, since it is well known that autoregressive models with fixed effects lead to biased parameter estimates (Nickell, 1981). This bias is induced because centering all variables by country, which eliminates the heterogeneity of the constant term, induces a correlation between the centered lagged dependent variable and the centered error term.
11 Some researchers try to solve this problem by using simple models and then correcting the standard errors using some variant of Huber's (1967) robust standard errors. This is the reasoning behind our recommendation to use PCSEs to deal with some difficult cross-sectional complications of the error process. There are similar autocorrelation consistent standard errors (Newey and West, 1987). We do not recommend these because failing to account for serially correlated errors often leads to substantial inefficiencies in estimation as well as incorrect standard errors; failing to account for cross-sectional problems in the data is usually less serious. In any event, users of our preferred methods have no need to resort to autocorrelation consistent standard errors.
It is also well known that this bias is of order 1/T, and almost all of the work on dealing with this problem has been in the context of small T panels. When T is 2 or 3, the bias is indeed severe (50% or so). But when T is 20 or more the bias becomes small. Various corrections for this bias are well-known. Most of them involve the use of instrumental variables, building on the work of Anderson and Hsiao (1982). As is often the case, it is hard to find good instruments, and so the instrumental variable corrections often obtain consistency at the price of rather poor finite sample properties. Other estimators (Kiviet, 1995) are hard to combine with other methods and hard to generalize to even non-rectangular data sets. In Beck and Katz (2009) we ran Monte Carlo experiments to compare OLS estimation of a simple LDV model with fixed effects to the Kiviet and Anderson-Hsiao estimators. For the T's seen typically in TSCS analysis (20 or more), OLS performs about as well as Kiviet and much better than Anderson-Hsiao. Given the advantages of the OLS method discussed in the previous subsection, we do not hesitate to recommend OLS when country specific intercepts must be adjoined to the specification of a TSCS model.12

4. NON-STATIONARITY IN POLITICAL ECONOMY TSCS DATA

Before looking at some examples, one topic remains: what to do with non-stationary data. During the last two decades, with the pioneering work of Engle and Granger (1987), time series econometrics has been dominated by the study of non-stationary series. While there are many ways to violate the assumptions of stationarity presented in Equation 2, most of the work has dealt with the issue of unit roots or integrated series, in which shocks to the series accumulate forever. These series are long-memoried, since even distant shocks persist to the present. The key question is how to estimate models where the data are integrated (we restrict ourselves to integration of order one with no loss of generality). Such data, denoted I(1), are not stationary but their first difference is stationary. The simplest example of such an I(1) process is a random walk, where

y_{i,t} = y_{i,t−1} + ε_{i,t}    (15)
with ε_{i,t} being stationary by definition. Integrated data look very different from data generated by a stationary process. Most importantly, they do not have equilibria (since there is no mean reversion) and the best prediction of an integrated series many periods ahead is just the current value of that series. There is a huge literature on estimating models with integrated data. Such methods must take into account that standard asymptotic theory does not apply, and also that

lim_{t→∞} Var(y_{i,t}) = ∞.    (16)
Thus if we wait long enough, any integrated series will wander infinitely far from its mean. Much work on both diagnosing and estimating models with integrated series builds heavily on both these issues.
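A quick simulation makes Equation 16 concrete; the contrast with a stationary series (here an AR1 with an arbitrarily chosen ρ of 0.5) is stark:

```python
import numpy as np

rng = np.random.default_rng(0)
reps, T = 5000, 200
eps = rng.standard_normal((reps, T))

walk = np.cumsum(eps, axis=1)      # random walk: y_t = y_{t-1} + eps_t
ar = np.zeros((reps, T))           # stationary AR1 with rho = 0.5
for t in range(1, T):
    ar[:, t] = 0.5 * ar[:, t - 1] + eps[:, t]

# Variance across replications: linear growth in t for the random walk,
# convergence to 1/(1 - 0.5**2) = 1.33 for the stationary series.
print(np.var(walk[:, [9, 49, 199]], axis=0))  # roughly 10, 50, 200
print(np.var(ar[:, [9, 49, 199]], axis=0))    # roughly 1.33 throughout
```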
12 Similar advice is given by Judson and Owen (1999) following a similar discussion of this issue.
Our interest is not in the estimation of single time series, but rather TSCS political economy data.13 Political economy data is typically observed annually for relatively short periods of time (often 20-40 years). Of most relevance, during that time period we often observe very few cycles. Thus, while the series may be very persistent, we have no idea whether a longer time period would show the series to be stationary (though with a slow speed of adjustment) or non-stationary. These annual observations on, for example, GDP or left political control of the economy are very different from the daily observations we may have on various financial rates. So while it may appear from an autoregression that some political economy series have unit roots, is this the right characterization of these series? For example, using Huber and Stephens' (2001) data, an autoregression of social security on its lag yields a point estimate of the autoregressive coefficient of 1.003 with a standard error of 0.009; a similar autoregression of Christian Democratic party cabinet participation yields 1.03 with a standard error of 0.001. It does not take heavy duty statistical testing to see we cannot reject the null that the autoregressive coefficient is one in favor of the alternative that it is less than one. But does this mean that we think the series might be I(1)? Making an argument similar to that of Alvarez and Katz (2000), if these series had unit roots, there would be a tendency for them to wander far from their means and the variance of the observations would grow larger and larger over time. But by definition both the proportion of the budget spent on social security and Christian Democratic cabinet participation are bounded between zero and 100%, which then bounds how large their variances can become. Further, if either series were I(1), then we would be equally likely to see an increase or decrease in either variable regardless of its present value; do we really believe that there is no tendency for social security spending to be more likely to rise when it is low and to fall when high, or similarly for Christian Democratic cabinet strength? In the Huber and Stephens data social security spending ranges only between 3% and 33% of the budget and Christian Democratic cabinet strength ranges between zero and 34%. While these series are very persistent, they simply cannot be I(1). Therefore, the impressive apparatus built over the last two decades to estimate models with I(1) series does not provide the tools needed for many, if not most, political economy TSCS datasets. One possibility is to induce stationarity by first differencing all slowly changing variables, leading to a model which explains changes in y by changes in x. In practice first difference models often perform poorly (from the perspective, at least, of the researcher, where changes in x appear unrelated to changes in y). Modeling first differences also throws out any long-run information about y and x, so the effect of a change in x is the same regardless of whether y is high or low by historical standards. Fortunately, the modeling issue is not really about the univariate properties of any series, but the properties of the stochastic process that generated the y's conditional on the observed covariates. Even with data similar to Huber and Stephens', the errors may appear stationary and so the methods of the previous section can be used.
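The point about short samples is easy to verify by simulation: a stationary but persistent series observed for only 30 years yields autoregressive estimates that are both noisy and biased, so they carry little information about whether the true coefficient is, say, 0.9 or exactly one (the 0.9 here is our choice, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
reps, T, rho = 2000, 30, 0.9  # stationary but persistent, short sample
est = np.empty(reps)
for r in range(reps):
    e = rng.standard_normal(T)
    y = np.zeros(T)
    for t in range(1, T):
        y[t] = rho * y[t - 1] + e[t]
    ylag, ycur = y[:-1], y[1:]
    est[r] = ylag @ ycur / (ylag @ ylag)  # OLS autoregression, no constant

# The estimates are widely spread (and biased downward in small samples),
# so a test against the unit root null has very little power here.
print(est.mean().round(2), est.std().round(2))
```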
In particular, whether the series are integrated or stationary but slowly moving, they may be well modeled by the error correction specification (Equation 8), which as we have seen is just an alternative parameterization of the ADL model.
13 There is a literature on panel unit roots (Im, Pesaran and Shin, 2003; Levin, Lin and Chu, 2002), but at this point the literature is still largely about testing for unit roots.
The EC form is nice, because it combines the short run first differences model with the long run tendency for series to be in equilibrium. If the estimated λ in the EC specification is zero, that indicates that y and x have no long run equilibrium relationship. We have already seen that if x and y are stationary they always have a long run relationship, so this is only a problem if the series are integrated. In other words, if the series are stationary but adjust very slowly the EC (or equivalent ADL) model is a good place to start, and if the series are integrated either the EC model will work (the series are said to be co-integrated) or the residuals will appear highly correlated. Since our preferred methodology chooses specifications with almost uncorrelated residuals, it should never lead to choosing an incorrect EC (or ADL) specification. Why do we propose ignoring much of what has dominated econometric argument for two decades? First, economists study many series (such as interest or exchange rates) which inherently are in levels, and so are likely to be integrated; variables in political economy are often expressed as a proportion of GDP or the government budget and hence are much less likely to be integrated. Other political variables, such as party control of government, may be persistent, but cannot possibly be integrated (they take on values of zero and one only, and so neither have infinite variance nor no tendency to revert back towards the mean). A second difference is that economists have no theory about whether one short run exchange rate adjusts to a second rate, or the second rate adjusts to the first, or both; this leads to complicated estimation issues. In many political economy models it is clear that y adjusts to x but not vice versa. We think that left governments increase spending but we do not think that low spending leads directly to a right wing government (Beck, 1992). Thus, even with highly persistent data, the EC (or ADL) model, estimated by OLS, will quite often work well, and, when it fails, simple tests and a rigorous methodology will indicate that failure.14

5. EXAMPLES

In this section we consider two examples to explore the practical issues in estimating dynamics in political economy TSCS datasets. The first example, presented in some detail, looks at the political determinants of capital taxation rates, where adjustment speeds are fairly slow. The second example, presented more cursorily, looks at the impact of political variables on the growth of GDP. Here, since the dynamics are quite fast, the specification choice has fewer consequences.15
14 There is a slight technical problem in that the distribution of the estimated λ is not normal if the series are not co-integrated. Instead, it has a Dickey-Fuller type distribution which has fatter tails. Thus there may be some cases where a standard test of the null hypothesis that λ = 0 yields incorrect conclusions. But given the large N and T of TSCS data, it is often the case that it is clear whether the EC model is adequate or not, and if we incorrectly assume stationarity consistent application of appropriate but standard methods will indicate the problem.
15 All computations were done using Stata 11.1 with data kindly provided by the authors. While our analysis is different from those of the original authors, we began by easily replicating their results.
[Figure 2: Box plots of capital taxation rates by country, 16 OECD nations]
5.1. Capital taxation rates

Our first example models capital taxation rates in 16 OECD nations from 1961-93, using the data and specification of Garrett and Mitchell (2001).16 Obviously tax rates move relatively slowly over time; the autoregressive coefficient of tax rates is 0.77. Thus, while tax rates are clearly stationary, it will take some number of years for the system to get close to fully adjusting; it takes about 2.65 years for half of any shock to dissipate. Before estimation one should examine the data to see whether there is sufficient within-country heterogeneity to make TSCS analysis meaningful, to see whether there appears to be very much inter-country heterogeneity which might need to be modeled, and to see whether there are any temporal patterns, such as trends, which again need to be modeled. For the first two issues a standard country specific box plot of tax rates, shown in Figure 2, is appropriate; for the third question time series plots by country, shown in Figure 3, are more useful.
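Nothing specialized is needed for this kind of eyeballing; a short Python sketch (with a synthetic stand-in for the Garrett and Mitchell data, since the variable and country names here are our own) produces both kinds of figure:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in: 16 countries observed annually, 1961-93.
rng = np.random.default_rng(2)
df = pd.DataFrame(
    [(c, yr, 30 + 5 * rng.standard_normal())
     for c in [f"country{i}" for i in range(16)]
     for yr in range(1961, 1994)],
    columns=["country", "year", "captax"],
)

# Figure 2 analogue: country specific box plots of tax rates.
df.boxplot(column="captax", by="country", rot=90)

# Figure 3 analogue: small-multiple time series plots, one per country.
fig, axes = plt.subplots(4, 4, sharex=True, sharey=True)
for ax, (name, g) in zip(axes.flat, df.groupby("country")):
    ax.plot(g["year"], g["captax"])
    ax.set_title(name, fontsize=8)
plt.show()
```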
16 The data set is not rectangular; some countries only report tax rates for a portion of the period under study. In total there were 322 observations after dropping missing data at the beginning of a period (and also observations with missing lagged data, so that all results pertain to the same data). The extra lag in the ADLLDV2 model leads to the loss of the first observation for each country, yielding 306 observations for that estimation. This loss of data points is an additional reason why analysts may not prefer this specification.
[Figure 3: Time series plots of capital taxation rates by country]

While some countries (Austria, Denmark, the Netherlands and New Zealand) show little if any variation in their tax rates, the other countries show enough variation over time to make a TSCS analysis of interest. There is also some country heterogeneity, with France, Sweden and the United Kingdom having generally higher rates. Figure 3 shows that taxation rates in some countries are strongly trending whereas others show little trend; this figure also clearly shows the beginning of period missingness pattern in the data. A regression of tax rates on time shows a trend of about .33% (with a small standard error) per annum in those rates. Thus it appears as though a TSCS analysis of this data is sensible, and also that it may be the case that there will be unexplained temporal and cross-sectional heterogeneity. Following Garrett and Mitchell, we mean-centered all observations by country and year, which is equivalent to allowing for year and country specific intercepts.17 Garrett and Mitchell wish to explain capital taxation rates (this is only one of their analyses) by variables which relate to the economy (in good times one can tax more), the demand for services (which require more revenue) and political factors (left parties spend and tax more). We work more or less with the Garrett and Mitchell specification, dropping a few variables that were neither substantively interesting nor statistically significant in any specification.
17 Obviously one can only decide whether these year and country specific intercepts are needed after a specification is chosen, and obviously the country and year specific intercepts are atheoretical, so we should attempt to find specifications where they are not necessary. Alas, this is often impossible. Here both country and year specific intercepts were significant in all specifications. We might have preferred a model with a time trend instead of year specific intercepts, but the difference between the two specifications was negligible and we preferred to stay consistent with Garrett and Mitchell. Obviously in actual research such decisions should be made with care, and researchers should not simply do what others have done before.
We thus regress the capital tax rate (CAPTAX) on unemployment (UNEM), economic growth (GDPPC), the proportion of the population that is elderly (AGED), vulnerability of the workforce as measured by low wage imports (VULN), foreign direct investment (FDI), and two political variables, the proportion of the cabinet portfolios held by the left (LEFT) and the proportion held by Christian Democrats (CDEM). Because we mean centered all variables there are no intercepts in the model. Table 1 reports the results of the various dynamic estimations. All standard errors are our recommended panel corrected standard errors (Beck and Katz, 1995), which are easy to compute with our recommended methodology.

Table 1: Comparison of AR1, LDV, ADL and ADLLDV2 estimates of Garrett and Mitchell's model of capital taxation in 16 OECD nations, 1967-1992 (country and year centered)
Variable   AR1            LDV            ADL            ADLLDV2
VULN       0.22 (0.12)    0.10 (0.07)    0.28 (0.13)    0.33 (0.14)
FDI        0.51 (0.26)    0.37 (0.21)    0.59 (0.26)    0.48 (0.28)
UNEM       0.18 (0.22)    0.34 (0.14)    0.68 (0.27)    0.68 (0.30)
AGED       1.42 (0.51)    0.35 (0.24)    0.26 (0.71)    0.27 (0.87)
GDPPC      0.69 (0.11)    0.62 (0.12)    0.80 (0.13)    0.81 (0.14)
LEFT       0.004 (0.012)  0.006 (0.009)  0.003 (0.013)  0.002 (0.014)
CDEM       0.018 (0.022)  0.015 (0.012)  0.015 (0.025)  0.031 (0.024)
TAXL                      0.70 (0.06)    0.76 (0.07)    0.93 (0.10)
VULNL                                    0.21 (0.14)    0.24 (0.15)
FDIL                                     0.55 (0.29)    0.56 (0.31)
UNEML                                    0.48 (0.26)    0.62 (0.28)
AGEDL                                    0.24 (0.76)    0.98 (0.94)
GDPPCL                                   0.29 (0.12)    0.36 (0.14)
LEFTL                                    0.005 (0.013)  0.004 (0.014)
CDEML                                    0.005 (0.024)  0.010 (0.025)
TAXL2                                                   0.26 (0.09)
ρ          0.66

Panel corrected standard errors in parentheses.
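For reference, the panel corrected standard errors reported in Table 1 are simple to compute for pooled OLS; a minimal numpy sketch for a balanced panel (the real data here are unbalanced, which requires a little more bookkeeping) is:

```python
import numpy as np

def pcse(X, resid, n_units, T):
    """Panel corrected standard errors for pooled OLS on a balanced
    TSCS data set; X is the (n_units*T, k) regressor matrix and resid
    the OLS residuals, both sorted by unit and then by year."""
    E = resid.reshape(n_units, T)      # one row of residuals per unit
    Sigma = E @ E.T / T                # contemporaneous covariance of units
    Omega = np.kron(Sigma, np.eye(T))  # implied covariance of all errors
    XtX_inv = np.linalg.inv(X.T @ X)
    cov = XtX_inv @ X.T @ Omega @ X @ XtX_inv  # sandwich estimator
    return np.sqrt(np.diag(cov))
```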
The static model (not shown) is clearly inadequate; a Lagrange multiplier test for serial correlation of the errors strongly rejects the null hypothesis of serially independent errors. Since the static model is nested inside both the LDV and AR1 models, standard Wald tests (a t-test of either H0: ρ = 0 or H0: the coefficient on TAXL is zero) clearly show that the static model can be rejected in favor of either of these two models. But we must compare both the LDV and AR1 specifications to the more general ADL specification. Again, since both the LDV and AR1 specifications are nested inside the ADL specification we can use standard Wald tests (in this case an F-test of the null hypothesis that the coefficients on all the lagged x's are zero); that null is decisively rejected, so the more general ADL specification is preferred.
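With OLS estimates in hand these Wald tests are one-liners; a sketch using the statsmodels formula interface (the data frame df and the lagged-variable names are hypothetical, following Table 1):

```python
import statsmodels.formula.api as smf

# ADL specification: current and lagged regressors plus the lagged tax rate;
# df is assumed to hold the country- and year-centered variables.
adl = smf.ols(
    "captax ~ vuln + fdi + unem + aged + gdppc + left + cdem + captax_l"
    " + vuln_l + fdi_l + unem_l + aged_l + gdppc_l + left_l + cdem_l",
    data=df,
).fit()

# F-test that all lagged x's are jointly zero (the LDV restriction).
print(adl.f_test("vuln_l = 0, fdi_l = 0, unem_l = 0, aged_l = 0,"
                 " gdppc_l = 0, left_l = 0, cdem_l = 0"))
```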
The ADL specification still shows serial correlation of the errors; a Lagrange multiplier test shows we can reject the null hypothesis of iid errors.18 As discussed in Section 2.5, we added a second lag of capital taxation to the specification; results of estimating this specification are in the ADLLDV2 columns. We cannot reject the null hypothesis of independent errors for this regression (χ²₁ = 1.1). The ADLLDV2 specification is thus both statistically superior to the simpler specifications and shows iid errors. There are, of course, many other specifications that a substantive article would test (multiple speeds of adjustment, for example), but we do not pursue these here.

All of the models show that a one-time unit shock to the error process dies out exponentially (or nearly exponentially for the ADLLDV2 model), with similar decay rates ranging from 24% to 34% per annum for the first three models; for the ADLLDV2 model the initial decay rate is only 7% in the first year, but it increases to 33% (one minus the sum of the coefficients on the lagged dependent variable) after the first year. Given the standard errors on these coefficients, the decay rates are quite similar. Thus, for example, a classical confidence interval for the decay rate in the ADL model is (11%, 38%), while in the ADLLDV2 model it is (17%, 49%) (after the first year).

Turning to the estimated effects of the various independent variables (omitting the two political variables, which show almost no effect but huge standard errors), recall that the AR1 specification assumes the effect of the variables is only instantaneous, the LDV model assumes the effect decays geometrically, and the ADL and ADLLDV2 models allow us to test those assumptions. In the latter specifications, the coefficients on the current and lagged values of VULN and FDI are close in absolute value and of opposite sign. Thus the impact of those variables on capital taxation rates is more or less only instantaneous, and the ADL estimate of this instantaneous effect is similar to the AR1 estimate but different from the LDV estimate. Of course the ADL specifications allow us to study the speed of adjustment, while the AR1 specification simply assumes instantaneous adjustment. The coefficients on UNEM and GDPPC and their lags are of opposite sign but do not quite offset each other. Here the ADL estimates are, as we would expect, much closer to the LDV estimates than to the AR1 estimates. But again, we need not impose the decay rates that the LDV specification assumes; instead we can examine what the decay process actually looks like. Interestingly, and contrary to Achen's notion of the lagged dependent variable dominating a regression, the coefficients on all four of these substantive variables are as large as or larger than the corresponding coefficients in the AR1 specification.

AGED is the only variable that some might think ought to determine tax rates, that does determine tax rates in the AR1 specification, but that fails to show any impact in any of the other specifications. It may be noted that while AGED perhaps ought to affect tax rates, its coefficient in the AR1 specification seems a bit large; would a one point increase in the aged population be expected to lead to over a one point increase in capital taxation rates? Thus perhaps it is not so simple to discuss which results "make sense," and making sense is hardly a statistical criterion. Note also that AGED is itself highly trending (its autoregression has a coefficient of 0.93 with a standard error of 0.01). While we can reject the null that AGED has a unit root, it, like the capital tax rate, changes very slowly. Thus we might suspect that the simple contemporaneous relationship between the two variables is spurious (in the sense of Granger and Newbold, 1974). Of course we cannot know the truth here, but it is not obvious that the ADL (or LDV) results on the impact of AGED are somehow foolish or wrong.
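The decay arithmetic used above is easy to mechanize. The sketch below computes the impulse response and long-run effect implied by a generic ADL(1,1) specification; the numerical inputs in the example call are hypothetical placeholders rather than estimates from Table 1.

    # Dynamic multipliers implied by y_t = phi*y_{t-1} + b0*x_t + b1*x_{t-1} + e_t.
    # A one-period unit shock to x has effect b0 on impact and
    # phi**(k-1) * (phi*b0 + b1) after k >= 1 periods; a permanent unit
    # change in x has long-run effect (b0 + b1) / (1 - phi).

    def impulse_response(phi, b0, b1, horizon=5):
        effects = [b0]
        for k in range(1, horizon + 1):
            effects.append(phi ** (k - 1) * (phi * b0 + b1))
        return effects

    def long_run_effect(phi, b0, b1):
        assert abs(phi) < 1, "requires stationarity"
        return (b0 + b1) / (1 - phi)

    # Hypothetical values, for illustration only:
    print(impulse_response(phi=0.76, b0=0.5, b1=-0.3))
    print(long_run_effect(phi=0.76, b0=0.5, b1=-0.3))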
18. A regression of the residuals from the ADL specification on the lagged residuals and all the other independent variables has an R² of .046, which, multiplied by the number of observations in that regression, yields a statistic of 14; since this statistic is distributed χ²₁, the null hypothesis of independent errors is clearly rejected.
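A sketch of that Lagrange multiplier procedure, assuming df holds the centered data under hypothetical column names and resid_col names the ADL residuals:

    import statsmodels.api as sm
    from scipy import stats

    def lm_serial_test(df, resid_col, x_cols, unit="country"):
        # Regress residuals on their within-country lag plus the regressors;
        # n * R^2 is distributed chi-squared(1) under the null of serially
        # independent errors.
        d = df.copy()
        d["resid_lag"] = d.groupby(unit)[resid_col].shift(1)
        d = d.dropna(subset=["resid_lag"])
        aux = sm.OLS(d[resid_col],
                     sm.add_constant(d[["resid_lag"] + x_cols])).fit()
        lm = aux.nobs * aux.rsquared
        return lm, stats.chi2.sf(lm, df=1)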
The moral so far is that researchers should estimate a model flexible enough to account for various types of dynamics; they should also try hard to make sure that the error process is close to iid. The ADLLDV2 model performs very well here, both in passing various tests and in its interpretability (the simpler ADL model is easier to interpret but does not quite pass the statistical tests). While no specification will be good in all situations, it is clear that researchers should consider more general specifications before accepting highly constrained ones such as either the AR1 or LDV model.

While we focus on dynamics, no TSCS analysis is complete without a final assessment of heterogeneity over countries. Remember that our analysis uses country-centered data, so there can be no heterogeneity in the various centered means. But we can see if the model fails to work for some subset of countries by cross-validation (Stone, 1974), leaving out one country at a time. Thus we reran the ADLLDV2 specification leaving out one country at a time, and then used the estimated values to predict capital tax rates in the omitted country; the mean absolute prediction error was then computed for each country (the procedure is sketched below). Across all observations the mean absolute forecast error was about 2.3. Four countries, Japan, Norway, Sweden and the United Kingdom, had mean absolute forecast errors above 3.5, indicating at least some lack of homogeneity. While we do not pursue the issue further here, it clearly would be pursued in a more complete analysis.19

We also assessed heterogeneity by testing for parameter heterogeneity (by country). Here, since we focus on dynamics, we fit the ADL specification allowing the coefficient on the lagged dependent variable to vary by country, with the country-specific deviations drawn from a zero-mean normal distribution. This allows us to see whether the general speed of adjustment varies by country. Results of this estimation reveal no statistically (or substantively) significant parameter heterogeneity on the lagged dependent variable; the estimated standard deviation of the normal from which the coefficients were drawn was only 0.09.20
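The leave-one-country-out exercise is straightforward to script. A minimal sketch, assuming hypothetical column names and an OLS implementation of the ADLLDV2 regression:

    import statsmodels.api as sm

    def loco_mae(df, y_col, x_cols, unit="country"):
        # Refit the model without each country, predict that country's tax
        # rates, and report the mean absolute forecast error per country.
        errors = {}
        for c in df[unit].unique():
            train = df[df[unit] != c]
            test = df[df[unit] == c]
            fit = sm.OLS(train[y_col], train[x_cols]).fit()
            errors[c] = (test[y_col] - fit.predict(test[x_cols])).abs().mean()
        return errors

Countries with conspicuously large errors flag possible heterogeneity worth further investigation.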
19. We also do not present other post-estimation analyses that should be standard, such as residual plots by country.

20. The standard error of the estimated standard deviation was 0.07; a test of the null hypothesis that the coefficient does not vary by country yields a statistic of 0.70; this statistic is distributed χ²₁, so it is far from the critical value for rejection of the null. We can also look at the individual country estimates. Most are within 0.01 of the overall estimate, with only the coefficient for the UK really differing; the estimate for the UK is 0.11 below the overall estimate, though with a standard error of about 0.07. Given all of this, we prefer not to pursue whether further investigation of the speed of adjustment in tax rates in the UK is needed, but clearly this type of analysis might prove extremely useful in other situations.
5.2. The growth of GDP

Our second example relates to political economy explanations of the growth of GDP in 14 OECD nations observed from 1966–1990, using data from Garrett (1998).21 We use one of his models, taking the growth in GDP as a linear additive function of political factors and economic controls. The political variables are the proportion of cabinet posts occupied by left parties (LEFT), the degree of centralized labor bargaining as a measure of corporatism (CORP) and the product of the latter two variables (LEFTxCORP); the economic and control variables are a dummy marking the relatively prosperous period through 1973 (PER73), overall OECD GDP growth (DEMAND), trade openness (TRADE), capital mobility (CAPMOB) and a measure of oil imports (OILD). All variables, following Garrett's use of country fixed effects, were mean-centered by country. As before, all standard errors are panel corrected.

GDP growth appears stationary, with an autoregressive coefficient of 0.32 (this check is sketched after Table 2). Thus all specifications are expected to show relatively fast dynamics, with quick returns to equilibrium. Turning to models with explanatory variables, results of estimating various specifications are in Table 2.

Table 2: Comparison of static, AR1 and LDV estimates of Garrett's model of economic growth in 14 OECD nations, 1966–1990 (country centered)

                   Static          AR1 errors        LDV
    Variable   Coef.    PCSE    Coef.    PCSE    Coef.    PCSE
    DEMAND     0.007    0.0012  0.007    0.002   0.007    0.001
    TRADE      0.018    0.019   0.021    0.021   0.019    0.019
    CAPMOB     0.20     0.21    0.25     0.23    0.24     0.21
    OILD       7.86     7.34    6.69     7.89    5.85     7.08
    PER73      1.76     0.42    1.76     0.45    1.45     0.43
    CORP       0.54     0.56    0.43     0.61    0.30     0.56
    LEFT       0.075    0.17    0.076    0.18    0.062    0.17
    LEFTxCORP  0.10     0.53    0.10     0.56    0.17     0.52
    GDPL                                         0.16     0.07
    ρ                           0.12
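The stationarity check mentioned just before Table 2 can be scripted as a pooled autoregression of country-centered growth on its within-country lag; a minimal sketch with hypothetical column names:

    import statsmodels.api as sm

    # Pooled AR(1) check on country-centered GDP growth; a coefficient well
    # below one indicates fast mean reversion. `df` and its columns are
    # hypothetical stand-ins for the Garrett (1998) data.
    d = df.copy()
    d["growth_lag"] = d.groupby("country")["growth"].shift(1)
    d = d.dropna(subset=["growth_lag"])
    ar1 = sm.OLS(d["growth"], d[["growth_lag"]]).fit()  # centered data, no constant
    print(ar1.params["growth_lag"])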
The static model showed modest serial correlation of the errors; a Lagrange multiplier test showed we could clearly reject the null of serially independent errors (χ²₁ = 8.6); substantively, however, the serial correlation of the errors is small (0.12). Because of the small, albeit significant, amount of serial correlation of the errors, the OLS results are similar to the slightly more correct results in the two dynamic specifications.
21. We show this example only because we want to contrast data with a quick speed of adjustment to the previous example; our treatment is cursory for reasons of space. As in the previous example, we use the same data for all analyses, yielding 336 observations.
Given the rapid speed of adjustment (the coefficient on the LDV is 0.16), it is not surprising that all three specifications show similar estimates. Very few coefficients are significant in any of the specifications, but the two variables that show a strong impact in the static specification continue to show a strong impact in the two dynamic specifications.

The similarity of the AR1 and LDV estimates is not surprising; because of the fast dynamics the two models are not really very different. Two periods out, the various independent variables in the LDV specification retain only about 3% of their original impact; the long-run effects in the LDV specification are only 18% larger than the immediate impacts. Thus the two specifications are saying more or less the same things, and the estimated coefficients are quite similar. Substantively, it appears as though GDP growth in a country is largely determined by GDP growth in its trading partners, and politics appears to play little if any role.

Both specifications were tested against the full ADL specification that contained all the one-year lags of the independent variables. Standard hypothesis tests do not come close to rejecting the simpler AR1 or LDV models in favor of the ADL model. Since the LDV and AR1 specifications are not nested, discriminating between them is not so simple. Because both specifications have the same number of parameters, discrimination using standard information criteria (AIC or BIC) simplifies to a comparison of goodness of fit (the sketch following this paragraph makes this concrete), on which criterion both specifications perform almost equally well. This will often be the case, since both the AR1 and LDV specifications imply very quick adjustments to equilibrium when the dynamic parameters are near zero. In short, the data are consistent with very short-run impacts, and it does not particularly matter exactly how we specify those dynamics. In terms of the Achen critique, there are two predictors of GDP that are strong in the AR1 model; they remain about equally strong in the LDV model. As we have argued, there is nothing about LDVs that dominates a regression or makes real effects disappear. Given the nature of dynamics, this will always be the case when variables adjust quickly.
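A minimal sketch of that comparison, assuming resid_ar1 and resid_ldv are residual vectors from the two fitted specifications (hypothetical names) and k is their common number of parameters:

    import numpy as np

    def gaussian_aic(resid, k):
        # Gaussian AIC up to an additive constant: n*log(RSS/n) + 2k. With
        # equal k, the AIC (and BIC) ranking reduces to comparing residual
        # sums of squares, i.e. to goodness of fit.
        n = len(resid)
        rss = float(np.sum(np.asarray(resid) ** 2))
        return n * np.log(rss / n) + 2 * k

    # The specification with the lower value is preferred:
    # gaussian_aic(resid_ar1, k) versus gaussian_aic(resid_ldv, k)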
6. CONCLUSION

There is no cookbook for modeling dynamics in TSCS data; instead, careful examination of the specifications, and what they entail substantively, can allow TSCS analysts to think about how to model these dynamics. Well-known econometric tests help in this process, and standard methods make it easy to estimate the appropriate dynamic model. Modeling decisions are less critical where variables equilibrate quickly; as the adjustment process slows, the various specifications imply more and more different characteristics of the data. Analysts should take advantage of this to choose the appropriate model, that is, one which implies dynamics consistent with theoretical concerns. The specification chosen should of course be flexible enough to allow for testing against alternative dynamic specifications.

Being more specific, we have provided evidence that, contra Achen, there is nothing pernicious in the use of a model with a lagged dependent variable. Obviously attention to issues of testing and specification is as important here as anywhere, but there is nothing about lagged dependent variables that makes them generically harmful. As we have seen, there are a variety of generic dynamic specifications, and researchers should choose amongst them using the same general methodology they use in other cases. The ADL model (or its ADLLDV2 complication) is a good place to start; at that point various specializations of the model can be tested against this general specification. Analysts should, of course, interpret the dynamic results in substantive terms, focusing on long- as well as short-run effects.

It is much better to model the dynamics directly (that is, in terms of observable variables) than to push dynamics into a complicated error process which must then be fixed up to allow for estimation. There are both theoretical and practical advantages to this. The theoretical advantage is that dynamic issues become much more than nuisances for estimation. The practical advantage is that it is easy to estimate models with (approximately) independent error processes via OLS, and easy then to extend those models with additional complicating cross-sectional features.

There are many important features of TSCS data, in both the temporal and spatial realms. Both are substantively interesting, and neither should be swept under the rug. Fortunately the econometrics involved in good TSCS modeling is not difficult, and a clear eye on specification and testing allows researchers to find substantively interesting features of the data.
REFERENCES

Achen, Christopher. 2000. Why Lagged Dependent Variables Can Suppress the Explanatory Power of Other Independent Variables. Presented at the Annual Meeting of the Society for Political Methodology, UCLA.

Adolph, Christopher, Daniel M. Butler and Sven E. Wilson. 2005. Which Time-Series Cross-Section Estimator Should I Use Now? Guidance from Monte Carlo Experiments. Paper presented at the 2005 Annual Meeting of the American Political Science Association, Washington, D.C.

Alvarez, R. Michael and Jonathan N. Katz. 2000. Aggregation and Dynamics of Survey Responses: The Case of Presidential Approval. Social Science Working Paper 1103, Division of the Humanities and Social Sciences, California Institute of Technology.

Anderson, T. W. and Cheng Hsiao. 1982. Formulation and Estimation of Dynamic Models Using Panel Data. Journal of Econometrics 18:47–82.

Beck, Nathaniel. 1985. Estimating Dynamic Models is Not Merely a Matter of Technique. Political Methodology 11:71–90.

Beck, Nathaniel. 1991. Comparing Dynamic Specifications: The Case of Presidential Approval. Political Analysis 3:51–87.

Beck, Nathaniel. 1992. The Methodology of Cointegration. Political Analysis 4:237–47.

Beck, Nathaniel. 2001. Time-Series–Cross-Section Data: What Have We Learned in the Past Few Years? Annual Review of Political Science 4:271–93.

Beck, Nathaniel and Jonathan N. Katz. 1995. What To Do (and Not To Do) with Time-Series Cross-Section Data. American Political Science Review 89:634–647.

Beck, Nathaniel and Jonathan N. Katz. 1996. Nuisance vs. Substance: Specifying and Estimating Time-Series–Cross-Section Models. Political Analysis 6:1–36.

Beck, Nathaniel and Jonathan N. Katz. 2007. Random Coefficient Models for Time-Series–Cross-Section Data: Monte Carlo Experiments. Political Analysis 15:182–95.

Beck, Nathaniel and Jonathan N. Katz. 2009. Modeling Dynamics in Time-Series–Cross-Section Political Economy Data. Caltech Social Science Working Paper No. 1304.

Beck, Nathaniel, Jonathan N. Katz and Richard Tucker. 1998. Taking Time Seriously: Time-Series–Cross-Section Analysis with a Binary Dependent Variable. American Journal of Political Science 42:1260–1288.

Beck, Nathaniel, Kristian Skrede Gleditsch and Kyle Beardsley. 2006. Space is More than Geography: Using Spatial Econometrics in the Study of Political Economy. International Studies Quarterly 50:27–44.
Davidson, James E. H., David F. Hendry, F. Srba and S. Yeo. 1978. Econometric Modelling of the Aggregate Time-Series Relationship Between Consumers' Expenditure and Income in the United Kingdom. Economic Journal 88:661–92.

De Boef, Suzanna and Luke Keele. 2008. Taking Time Seriously. American Journal of Political Science 52:184–200.

Engle, Robert F. and Clive W. Granger. 1987. Co-Integration and Error Correction: Representation, Estimation and Testing. Econometrica 55:251–76.

Franzese, Robert J. and Jude C. Hays. 2007. Spatial Econometric Models of Cross-Sectional Interdependence in Political Science Panel and Time-Series Cross-Section Data. Political Analysis 15:140–64.

Garrett, Geoffrey. 1998. Partisan Politics in the Global Economy. New York: Cambridge University Press.

Garrett, Geoffrey and Deborah Mitchell. 2001. Globalization, Government Spending and Taxation in the OECD. European Journal of Political Research 39:145–77.

Granger, Clive W. and Paul Newbold. 1974. Spurious Regressions in Econometrics. Journal of Econometrics 2:111–20.

Greene, William. 2008. Econometric Analysis. Sixth ed. Upper Saddle River, N.J.: Pearson Prentice Hall.

Hamilton, James D. 1994. Time Series Analysis. Princeton: Princeton University Press.

Hendry, David and Graham Mizon. 1978. Serial Correlation as a Convenient Simplification, Not a Nuisance: A Comment on a Study of the Demand for Money by the Bank of England. Economic Journal 88:549–563.

Hibbs, Douglas. 1974. Problems of Statistical Estimation and Causal Inference in Time-Series Regression Models. In Sociological Methodology 1973–1974, ed. H. Costner. San Francisco: Jossey-Bass pp. 252–308.

Honaker, James and Gary King. 2010. What to Do About Missing Values in Time Series Cross-Section Data. American Journal of Political Science 54:561–581.

Huber, Evelyne and John D. Stephens. 2001. Development and Crisis of the Welfare State. Chicago: University of Chicago Press.

Huber, Peter J. 1967. The Behavior of Maximum Likelihood Estimates Under Non-Standard Conditions. In Proceedings of the Fifth Annual Berkeley Symposium on Mathematical Statistics and Probability, ed. Lucien M. LeCam and Jerzy Neyman. Vol. I. Berkeley, CA: University of California Press pp. 221–33.

Im, Kyung So, M. Hashem Pesaran and Yongcheol Shin. 2003. Testing for Unit Roots in Heterogeneous Panels. Journal of Econometrics 115:53–74.
Judson, Ruth A. and Ann L. Owen. 1999. Estimating Dynamic Panel Data Models: A Guide for Macroeconomists. Economics Letters 65:9–15.

Keele, Luke and Nathan J. Kelly. 2006. Dynamic Models for Dynamic Theories: The Ins and Outs of Lagged Dependent Variables. Political Analysis 14:186–205.

Kiviet, Jan F. 1995. On Bias, Inconsistency, and Efficiency of Various Estimators in Dynamic Panel Data Models. Journal of Econometrics 68:53–78.

Levin, Andrew, C.-F. Lin and C.-S. J. Chu. 2002. Unit Root Tests in Panel Data: Asymptotic and Finite-Sample Properties. Journal of Econometrics 108:1–24.

Little, Roderick J. A. and Donald B. Rubin. 1987. Statistical Analysis with Missing Data. New York: Wiley.

Mizon, Graham. 1984. The Encompassing Approach in Econometrics. In Econometrics and Quantitative Economics, ed. David Hendry and Kenneth Wallis. Oxford: Basil Blackwell pp. 135–172.

Newey, Whitney K. and Kenneth D. West. 1987. A Simple Positive Semi-Definite Heteroskedasticity and Autocorrelation Consistent Covariance Matrix Estimator. Econometrica 55:703–708.

Nickell, Stephen. 1981. Biases in Dynamic Models with Fixed Effects. Econometrica 49:1417–26.

Stone, M. 1974. Cross-Validatory Choice and Assessment of Statistical Predictions. Journal of the Royal Statistical Society, Series B 36:111–33.