
Panel data analysis—advantages and challenges

2007, TEST

We explain the proliferation of panel data studies in terms of (i) data availability, (ii) the more heightened capacity for modeling the complexity of human behavior than a single cross-section or time series data can possibly allow, and (iii) challenging methodology. Advantages and issues of panel data modeling are also discussed.

WISE WORKING PAPER SERIES, WISEWP0602

Panel Data Analysis - Advantages and Challenges

Cheng Hsiao*
Department of Economics, University of Southern California, Los Angeles, CA 90089-0253
and Wang Yanan Institute for Studies in Economics, Xiamen University, China

April 19, 2006

COPYRIGHT © WISE, XIAMEN UNIVERSITY, CHINA

Keywords: Panel data; Longitudinal data; Unobserved heterogeneity; Random effects; Fixed effects

* I would like to thank Irene C. Hsiao for helpful discussion and editorial assistance and Kannika Damrongplasit for drawing the figures. Some of the arguments presented here also appear in Hsiao (2005, 2006).

1. Introduction

Panel data or longitudinal data typically refer to data containing time series observations of a number of individuals. Observations in panel data therefore involve at least two dimensions: a cross-sectional dimension, indicated by subscript i, and a time series dimension, indicated by subscript t. However, panel data could have a more complicated clustering or hierarchical structure. For instance, variable y may be the measurement of the level of air pollution at station ℓ in city j of country i at time t (e.g. Antweiler (2001), Davis (1999)). For ease of exposition, I shall confine my presentation to a balanced panel involving N cross-sectional units, i = 1, ..., N, over T time periods, t = 1, ..., T.

Panel data studies, both methodological and empirical, have proliferated.
In 1986, when the first edition of Hsiao's (1986) Panel Data Analysis was published, there were 29 studies listing the key words "panel data or longitudinal data", according to the Social Sciences Citation Index. By 2004, there were 687, and by 2005, there were 773. The growth of applied studies and the methodological development of new econometric tools for panel data have been phenomenal since the seminal paper of Balestra and Nerlove (1966).

There are at least three factors contributing to the geometric growth of panel data studies: (i) data availability, (ii) greater capacity for modeling the complexity of human behavior than a single cross-section or time series data, and (iii) challenging methodology. In what follows, we shall briefly elaborate on each of these. However, it is impossible to do justice to the vast literature on panel data. For further reference, see Arellano (2003), Baltagi (2001), Hsiao (2003), Matyas and Sevester (1996), and Nerlove (2002), etc.

2. Data Availability

The collection of panel data is obviously much more costly than the collection of cross-sectional or time series data. However, panel data have become widely available in both developed and developing countries.

The two most prominent panel data sets in the US are the National Longitudinal Surveys of Labor Market Experience (NLS) and the University of Michigan's Panel Study of Income Dynamics (PSID). The NLS began in the mid-1960s. It contains five separate annual surveys covering distinct segments of the labor force with different spans: men whose ages were 45 to 59 in 1966, young men 14 to 24 in 1966, women 30 to 44 in 1967, young women 14 to 24 in 1968, and youth of both sexes 14 to 21 in 1979. In 1986, the NLS expanded to include annual surveys of the children born to women who participated in the National Longitudinal Survey of Youth 1979. The list of variables surveyed runs into the thousands, with emphasis on the supply side of the labor market.
The PSID began with the collection of annual economic information from a representative national sample of about 6,000 families and 15,000 individuals in 1968 and has continued to the present. The data set contains over 5,000 variables (Becketti, Gould, Lillard and Welch (1988)). In addition to the NLS and PSID data sets, there are many other panel data sets that could be of interest to economists; see Juster (2000).

In Europe, many countries have their own annual or more frequent national surveys, such as the Netherlands Socio-Economic Panel (SEP), the German Social Economics Panel (GSOEP), the Luxembourg Social Panel (PSELL), the British Household Panel Survey (BHS), etc. Starting in 1994, the National Data Collection Units (NDUs) of the Statistical Office of the European Communities have been coordinating and linking existing national panels with centrally designed multi-purpose annual longitudinal surveys. The data of the European Community Household Panel (ECHP) are published in Eurostat's reference data base New Cronos in three domains: health, housing, and income and living conditions.

Panel data have also become increasingly available in developing countries. In these countries, there may not have been a long tradition of statistical collection, so it is of special importance to obtain original survey data to answer many significant questions. Many international agencies have sponsored and helped to design panel surveys. For instance, the Dutch non-government organization (NGO), ICS, Africa, collaborated with the Kenya Ministry of Health to carry out a Primary School Deworming Project (PSDP). The project took place in Busia district, a poor and densely-settled farming region in western Kenya. The 75 project schools include nearly all rural primary schools in this area, with over 30,000 enrolled pupils between the ages of six and eighteen from 1998 to 2001.
Another example is the Development Research Institute of the Research Center for Rural Development of the State Council of China, which, in collaboration with the World Bank, undertook an annual survey of 200 large Chinese township and village enterprises from 1984 to 1990.

3. Advantages of Panel Data

Panel data, by blending inter-individual differences and intra-individual dynamics, have several advantages over cross-sectional or time-series data:

(i) More accurate inference of model parameters. Panel data usually contain more degrees of freedom and less multicollinearity than cross-sectional data, which may be viewed as a panel with T = 1, or time series data, which is a panel with N = 1, hence improving the efficiency of econometric estimates (e.g. Hsiao, Mountain and Ho-Illman (1995)).

(ii) Greater capacity for capturing the complexity of human behavior than a single cross-section or time series data. These include:

(ii.a) Constructing and testing more complicated behavioral hypotheses. For instance, consider the example of Ben-Porath (1973) in which a cross-sectional sample of married women was found to have an average yearly labor-force participation rate of 50 percent. This could be the outcome of random draws from a homogeneous population, or of draws from heterogeneous populations in which 50% were from the population who always work and 50% from the population who never work. If the sample was from the former, each woman would be expected to spend half of her married life in the labor force and half out of the labor force. Job turnover would be expected to be frequent, and the average job duration would be about two years. If the sample was from the latter, there is no turnover. The current information about a woman's work status is a perfect predictor of her future work status.
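The two scenarios in the Ben-Porath example can be mimicked with a small simulation. The following is only a sketch under assumed settings (10,000 women, 10 periods, independent draws over time in the homogeneous case; all names and sample sizes are hypothetical and not taken from the paper). It shows that the first-period participation rate is about 50 percent in both scenarios, while the share of period-to-period status changes differs sharply.

```python
import numpy as np

rng = np.random.default_rng(0)
n_women, n_periods = 10_000, 10  # hypothetical sample sizes

# Scenario A: homogeneous population, each woman works in any period
# with probability 0.5, independently over time.
homog = rng.random((n_women, n_periods)) < 0.5

# Scenario B: heterogeneous population, half always work, half never work.
always_works = rng.random(n_women) < 0.5
heterog = np.repeat(always_works[:, None], n_periods, axis=1)


def participation_rate(panel):
    """Participation rate seen in a single cross-section (first period)."""
    return panel[:, 0].mean()


def turnover_rate(panel):
    """Share of adjacent-period pairs in which work status changes."""
    return (panel[:, 1:] != panel[:, :-1]).mean()


rate_a, rate_b = participation_rate(homog), participation_rate(heterog)
turn_a, turn_b = turnover_rate(homog), turnover_rate(heterog)

print(f"cross-section participation: A={rate_a:.3f}, B={rate_b:.3f}")
print(f"panel turnover rate:         A={turn_a:.3f}, B={turn_b:.3f}")
```

Only the turnover measure, which requires repeated observations on the same women, separates the two populations.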
Cross-sectional data cannot distinguish between these two possibilities, but panel data can, because the sequential observations for a number of women contain information about their labor participation in different subintervals of their life cycle.

Another example is the evaluation of the effectiveness of social programs (e.g. Heckman, Ichimura, Smith and Toda (1998), Hsiao, Shen, Wang and Wang (2005), Rosenbaum and Rubin (1985)). Evaluating the effectiveness of certain programs using a cross-sectional sample typically suffers from the fact that those receiving treatment are different from those without. In other words, one does not simultaneously observe what happens to an individual when she receives the treatment and when she does not. An individual is observed as either receiving treatment or not receiving treatment. Using the difference between the treatment group and control group could suffer from two sources of bias: selection bias due to differences in observable factors between the treatment and control groups, and selection bias due to endogeneity of participation in treatment. For instance, the Northern Territory (NT) in Australia decriminalized possession of small amounts of marijuana in 1996. Evaluating the effects of decriminalization on marijuana smoking behavior by comparing the differences between NT and other states that had not decriminalized could suffer from either or both sorts of bias. If panel data over this time period are available, they would allow the possibility of observing the before- and after-effects of decriminalization on individuals, as well as providing the possibility of isolating the effects of treatment from other factors affecting the outcome.

(ii.b) Controlling the impact of omitted variables.
It is frequently argued that the real reason one finds (or does not find) certain effects is that one's model specification ignores certain variables that are correlated with the included explanatory variables. Because panel data contain information on both the intertemporal dynamics and the individuality of the entities, they may allow one to control the effects of missing or unobserved variables. For instance, MaCurdy's (1981) life-cycle labor supply model under certainty implies that the logarithm of a worker's hours worked is a linear function of the logarithm of her wage rate and the logarithm of her marginal utility of initial wealth. Leaving the latter out of the regression of hours worked on the wage rate because it is unobserved can lead to seriously biased inference on the wage elasticity of hours worked, since initial wealth is likely to be correlated with the wage rate. However, since a worker's marginal utility of initial wealth stays constant over time, if time series observations of an individual are available, one can take the difference of a worker's labor supply equation over time to eliminate the effect of marginal utility of initial wealth on hours worked. The rate of change of an individual's hours worked now depends only on the rate of change of her wage rate. It no longer depends on her marginal utility of initial wealth.

(ii.c) Uncovering dynamic relationships. "Economic behavior is inherently dynamic so that most econometrically interesting relationships are explicitly or implicitly dynamic" (Nerlove (2002)). However, the estimation of time-adjustment patterns using time series data often has to rely on arbitrary prior restrictions such as Koyck or Almon distributed lag models because time series observations of current and lagged variables are likely to be highly collinear (e.g. Griliches (1967)).
With panel data, we can rely on the inter-individual differences to reduce the collinearity between current and lagged variables to estimate unrestricted time-adjustment patterns (e.g. Pakes and Griliches (1984)).

(ii.d) Generating more accurate predictions for individual outcomes by pooling the data rather than generating predictions of individual outcomes using the data on the individual in question. If individual behaviors are similar conditional on certain variables, panel data provide the possibility of learning an individual's behavior by observing the behavior of others. Thus, it is possible to obtain a more accurate description of an individual's behavior by supplementing observations of the individual in question with data on other individuals (e.g. Hsiao, Appelbe and Dineen (1993), Hsiao, Chan, Mountain and Tsui (1989)).

(ii.e) Providing micro foundations for aggregate data analysis. Aggregate data analysis often invokes the "representative agent" assumption. However, if micro units are heterogeneous, not only can the time series properties of aggregate data be very different from those of disaggregate data (e.g., Granger (1990); Lewbel (1992); Pesaran (2003)), but policy evaluation based on aggregate data may be grossly misleading. Furthermore, the prediction of aggregate outcomes using aggregate data can be less accurate than the prediction based on micro-equations (e.g., Hsiao, Shen and Fujiki (2005)). Panel data containing time series observations for a number of individuals are ideal for investigating the "homogeneity" versus "heterogeneity" issue.

(iii) Simplifying computation and statistical inference. Panel data involve at least two dimensions, a cross-sectional dimension and a time series dimension. Under normal circumstances one would expect that the computation of panel data estimators or inference would be more complicated than for cross-sectional or time series data.
However, in certain cases, the availability of panel data actually simplifies computation and inference. For instance:

(iii.a) Analysis of nonstationary time series. When time series data are not stationary, the large sample approximations of the distributions of the least-squares or maximum likelihood estimators are no longer normal (e.g. Anderson (1959), Dickey and Fuller (1979, 1981), Phillips and Durlauf (1986)). But if panel data are available, and observations among cross-sectional units are independent, then one can invoke the central limit theorem across cross-sectional units to show that the limiting distributions of many estimators remain asymptotically normal (e.g. Binder, Hsiao and Pesaran (2005), Levin, Lin and Chu (2002), Im, Pesaran and Shin (2004), Phillips and Moon (1999)).

(iii.b) Measurement errors. Measurement errors can lead to under-identification of an econometric model (e.g. Aigner, Hsiao, Kapteyn and Wansbeek (1985)). The availability of multiple observations for a given individual or at a given time may allow a researcher to make different transformations to induce different and deducible changes in the estimators, and hence to identify an otherwise unidentified model (e.g. Biorn (1992), Griliches and Hausman (1986), Wansbeek and Koning (1989)).

(iii.c) Dynamic Tobit models. When a variable is truncated or censored, the actual realized value is unobserved. If an outcome variable depends on the previous realized value and the previous realized value is unobserved, one has to integrate over the truncated range to obtain the likelihood of observables. In a dynamic framework with multiple missing values, the multiple integration is computationally infeasible. With panel data, the problem can be simplified by focusing only on the subsample in which previous realized values are observed (e.g. Arellano, Bover, and Labeager (1999)).

4. Methodology

Standard statistical methodology is based on the assumption that the outcomes, say $y$, conditional on certain variables, say $\mathbf{x}$, are random outcomes from a probability distribution that is characterized by a fixed dimensional parameter vector, $\boldsymbol{\theta}$, $f(y \mid \mathbf{x}; \boldsymbol{\theta})$. For instance, the standard linear regression model assumes that $f(y \mid \mathbf{x}; \boldsymbol{\theta})$ takes the form

$$E(y \mid \mathbf{x}) = \alpha + \boldsymbol{\beta}'\mathbf{x}, \tag{4.1}$$

and

$$\mathrm{Var}(y \mid \mathbf{x}) = \sigma^2, \tag{4.2}$$

where $\boldsymbol{\theta}' = (\alpha, \boldsymbol{\beta}', \sigma^2)$. Typical panel data focus on individual outcomes. Factors affecting individual outcomes are numerous. It is rare to be able to assume a common conditional probability density function of $y$ conditional on $\mathbf{x}$ for all cross-sectional units, $i$, at all times, $t$. For instance, suppose that in addition to $\mathbf{x}$, individual outcomes are also affected by unobserved individual abilities (or marginal utility of initial wealth as in the MaCurdy (1981) labor supply model discussed in (ii.b) of Section 3), represented by $\alpha_i$, so that the observed $(y_{it}, \mathbf{x}_{it})$, $i = 1, \ldots, N$, $t = 1, \ldots, T$, are actually generated by

$$y_{it} = \alpha_i + \boldsymbol{\beta}'\mathbf{x}_{it} + u_{it}, \qquad i = 1, \ldots, N, \quad t = 1, \ldots, T, \tag{4.3}$$

as depicted by Figures 1, 2 and 3, in which the broken-line ellipses represent the point scatter of individual observations around the mean, represented by the broken straight lines. If an investigator mistakenly imposes the homogeneity assumption (4.1) - (4.2), the solid lines in those figures would represent the estimated relationships between $y$ and $\mathbf{x}$, which can be grossly misleading.

If the conditional density of $y$ given $\mathbf{x}$ varies across $i$ and over $t$, the fundamental theorems for statistical inference, the laws of large numbers and central limit theorems, will be difficult to implement. One way to restore homogeneity across $i$ and/or over $t$ is to add more conditional variables, say $\mathbf{z}$,

$$f(y_{it} \mid \mathbf{x}_{it}, \mathbf{z}_{it}; \boldsymbol{\theta}). \tag{4.4}$$

However, the dimension of $\mathbf{z}$ can be large.
A model is a simplification of reality, not a mimic of reality. The inclusion of $\mathbf{z}$ may confuse the fundamental relationship between $y$ and $\mathbf{x}$, in particular when there is a shortage of degrees of freedom or multicollinearity, etc. Moreover, $\mathbf{z}$ may not be observable. If an investigator is only interested in the relationship between $y$ and $\mathbf{x}$, one approach to characterizing the heterogeneity not captured by $\mathbf{x}$ is to assume that the parameter vector varies across $i$ and over $t$, $\boldsymbol{\theta}_{it}$, so that the conditional density of $y$ given $\mathbf{x}$ takes the form $f(y_{it} \mid \mathbf{x}_{it}; \boldsymbol{\theta}_{it})$. However, without a structure being imposed on $\boldsymbol{\theta}_{it}$, such a model only has descriptive value. It is not possible to draw any inference about $\boldsymbol{\theta}_{it}$.

The methodological literature on panel data suggests possible structures on $\boldsymbol{\theta}_{it}$ (e.g. Hsiao (2003)). One way to impose some structure on $\boldsymbol{\theta}_{it}$ is to decompose $\boldsymbol{\theta}_{it}$ into $(\boldsymbol{\beta}, \boldsymbol{\gamma}_{it})$, where $\boldsymbol{\beta}$ is the same across $i$ and over $t$, referred to as structural parameters, and $\boldsymbol{\gamma}_{it}$ are referred to as incidental parameters because when the number of cross-sectional units, $N$, and/or the number of time series observations, $T$, increases, so does the dimension of $\boldsymbol{\gamma}_{it}$. The focus of the panel data literature is to make inference on $\boldsymbol{\beta}$ after controlling the impact of $\boldsymbol{\gamma}_{it}$.

Without imposing a structure for $\boldsymbol{\gamma}_{it}$, again it is difficult to make any inference on $\boldsymbol{\beta}$ because estimation of $\boldsymbol{\beta}$ could depend on $\boldsymbol{\gamma}_{it}$ and the estimation of the unknown $\boldsymbol{\gamma}_{it}$ probably will exhaust all available sample information. Assuming that the impacts of observable variables, $\mathbf{x}$, are the same across $i$ and over $t$, represented by the structural parameters, $\boldsymbol{\beta}$, the incidental parameters $\boldsymbol{\gamma}_{it}$ represent the heterogeneity across $i$ and over $t$ that is not captured by $\mathbf{x}_{it}$. They can be considered as composed of the effects of omitted individual time-invariant, $\alpha_i$, period individual-invariant, $\lambda_t$, and individual time-varying variables, $\delta_{it}$.
The individual time-invariant variables are variables that are the same for a given cross-sectional unit through time but vary across cross-sectional units, such as individual-firm management, ability, gender, and socio-economic background variables. The period individual-invariant variables are variables that are the same for all cross-sectional units at a given time but vary through time, such as prices, interest rates, and widespread optimism or pessimism. The individual time-varying variables are variables that vary across cross-sectional units at a given point in time and also exhibit variations through time, such as firm profits, sales and capital stock. The effects of unobserved heterogeneity can be assumed to be random variables, referred to as the random effects model, or fixed parameters, referred to as the fixed effects model, or a mixture of both, referred to as the mixed effects model.

The challenge of panel methodology is to control the impact of unobserved heterogeneity, represented by the incidental parameters, $\boldsymbol{\gamma}_{it}$, to obtain valid inference on the structural parameters $\boldsymbol{\beta}$. A general principle for obtaining valid inference on $\boldsymbol{\beta}$ in the presence of incidental parameters $\boldsymbol{\gamma}_{it}$ is to find a proper transformation to eliminate $\boldsymbol{\gamma}_{it}$ from the specification. Since proper transformations depend on the model one is interested in, I shall, as illustrations, try to demonstrate the fundamental issues from the perspective of linear static models, dynamic models, nonlinear models, models with cross-sectional dependencies and models with large $N$ and large $T$.

For ease of exposition, I shall assume for the most part that there are no time-specific effects, $\lambda_t$, and that the individual time-varying effects, $\delta_{it}$, can be represented by a random variable $u_{it}$ that is treated as the error of an equation. In other words, only individual-specific effects, $\alpha_i$, are present. The individual-specific effects, $\alpha_i$, can be assumed to be either random or fixed.
The standard assumption for the random effects specification is that they are randomly distributed with a common mean and are independent of fixed $\mathbf{x}_{it}$.

The advantages of the random effects (RE) specification are: (a) The number of parameters stays constant when the sample size increases. (b) It allows the derivation of efficient estimators that make use of both within and between (group) variation. (c) It allows the estimation of the impact of time-invariant variables. The disadvantage is that one has to specify a conditional density of $\alpha_i$ given $\mathbf{x}_i' = (\mathbf{x}_{i1}', \ldots, \mathbf{x}_{iT}')$, $f(\alpha_i \mid \mathbf{x}_i)$, while $\alpha_i$ are unobservable. A common assumption is that $f(\alpha_i \mid \mathbf{x}_i)$ is identical to the marginal density $f(\alpha_i)$. However, if the effects are correlated with $\mathbf{x}_{it}$, or if there is a fundamental difference among individual units, i.e., conditional on $\mathbf{x}_{it}$, $y_{it}$ cannot be viewed as a random draw from a common distribution, the common RE model is misspecified and the resulting estimator is biased.

The advantages of the fixed effects (FE) specification are that it allows the individual- and/or time-specific effects to be correlated with the explanatory variables $\mathbf{x}_{it}$, and that it does not require an investigator to model their correlation patterns. The disadvantages of the FE specification are: (a') The number of unknown parameters increases with the number of sample observations. In the case when $T$ (or $N$ for $\lambda_t$) is finite, it introduces the classical incidental parameter problem (e.g. Neyman and Scott (1948)). (b') The FE estimator does not allow the estimation of the coefficients of time-invariant variables.

In other words, the advantages of the RE specification are the disadvantages of the FE specification, and the disadvantages of the RE specification are the advantages of the FE specification.
To choose between the two specifications, Hausman (1978) notes that the FE estimator (or GMM), $\hat{\boldsymbol{\theta}}_{FE}$, is consistent whether $\alpha_i$ is fixed or random, while the commonly used RE estimator (or GLS), $\hat{\boldsymbol{\theta}}_{RE}$, is consistent and efficient only when $\alpha_i$ is indeed uncorrelated with $\mathbf{x}_{it}$ and is inconsistent if $\alpha_i$ is correlated with $\mathbf{x}_{it}$. Therefore, he suggests using the statistic

$$\left(\hat{\boldsymbol{\theta}}_{FE} - \hat{\boldsymbol{\theta}}_{RE}\right)' \left[\mathrm{Cov}(\hat{\boldsymbol{\theta}}_{FE}) - \mathrm{Cov}(\hat{\boldsymbol{\theta}}_{RE})\right]^{-} \left(\hat{\boldsymbol{\theta}}_{FE} - \hat{\boldsymbol{\theta}}_{RE}\right) \tag{4.5}$$

to test the RE vs the FE specification. The statistic (4.5) is asymptotically chi-square distributed with degrees of freedom equal to the rank of $\left[\mathrm{Cov}(\hat{\boldsymbol{\theta}}_{FE}) - \mathrm{Cov}(\hat{\boldsymbol{\theta}}_{RE})\right]$.

4.1 Linear Static Models

A widely used panel data model assumes that the effects of observed explanatory variables, $\mathbf{x}$, are identical across cross-sectional units, $i$, and over time, $t$, while the effects of omitted variables can be decomposed into the individual-specific effects, $\alpha_i$, time-specific effects, $\lambda_t$, and individual time-varying effects, $\delta_{it} = u_{it}$, as follows:

$$y_{it} = \boldsymbol{\beta}'\mathbf{x}_{it} + \alpha_i + \lambda_t + u_{it}, \qquad i = 1, \ldots, N, \quad t = 1, \ldots, T. \tag{4.6}$$

In a single equation framework, individual-time effects, $u$, are assumed random and uncorrelated with $\mathbf{x}$, while $\alpha_i$ and $\lambda_t$ may or may not be correlated with $\mathbf{x}$. When $\alpha_i$ and $\lambda_t$ are treated as fixed constants, as coefficients of dummy explanatory variables, $d_{it} = 1$ if the observation corresponds to the $i$th individual at time $t$, and 0 otherwise, whether they are correlated with $\mathbf{x}$ is not an issue. On the other hand, when $\alpha_i$ and $\lambda_t$ are treated as random, they become part of the error term and are typically assumed to be uncorrelated with $\mathbf{x}_{it}$.

For ease of exposition, we shall assume that there are no time-specific effects, i.e., $\lambda_t = 0$ for all $t$, and that $u_{it}$ are independently, identically distributed (i.i.d.) across $i$ and over $t$. Stacking an individual's $T$ time series observations of $(y_{it}, \mathbf{x}_{it}')$ into a vector and a matrix, (4.6) may alternatively be written as

$$\mathbf{y}_i = X_i\boldsymbol{\beta} + \mathbf{e}\alpha_i + \mathbf{u}_i, \qquad i = 1, \ldots, N, \tag{4.7}$$

where $\mathbf{y}_i = (y_{i1}, \ldots, y_{iT})'$, $X_i = (\mathbf{x}_{i1}, \ldots, \mathbf{x}_{iT})'$, $\mathbf{u}_i = (u_{i1}, \ldots, u_{iT})'$, and $\mathbf{e}$ is a $T \times 1$ vector of 1's. Let $Q$ be a $T \times T$ matrix satisfying the condition that $Q\mathbf{e} = \mathbf{0}$. Premultiplying (4.7) by $Q$ yields

$$Q\mathbf{y}_i = QX_i\boldsymbol{\beta} + Q\mathbf{u}_i, \qquad i = 1, \ldots, N. \tag{4.8}$$

Equation (4.8) no longer involves $\alpha_i$. The issue of whether $\alpha_i$ is correlated with $\mathbf{x}_{it}$ or whether $\alpha_i$ should be treated as fixed or random is no longer relevant for (4.8). Moreover, since $X_i$ is exogenous, $E(QX_i\mathbf{u}_i'Q') = QE(X_i\mathbf{u}_i')Q' = 0$ and $E(Q\mathbf{u}_i\mathbf{u}_i'Q') = \sigma_u^2 QQ'$. An efficient estimator of $\boldsymbol{\beta}$ is the generalized least squares estimator (GLS),

$$\hat{\boldsymbol{\beta}} = \left[\sum_{i=1}^{N} X_i'(Q'Q)^{-}X_i\right]^{-1} \left[\sum_{i=1}^{N} X_i'(Q'Q)^{-}\mathbf{y}_i\right], \tag{4.9}$$

where $(Q'Q)^{-}$ denotes the Moore-Penrose generalized inverse (e.g. Rao (1973)).

When $Q = I_T - \frac{1}{T}\mathbf{e}\mathbf{e}'$, $Q$ is idempotent. The Moore-Penrose generalized inverse $(Q'Q)^{-}$ is just $Q = I_T - \frac{1}{T}\mathbf{e}\mathbf{e}'$ itself. Premultiplying (4.7) by $Q$ is equivalent to transforming (4.6) into the model

$$(y_{it} - \bar{y}_i) = \boldsymbol{\beta}'(\mathbf{x}_{it} - \bar{\mathbf{x}}_i) + (u_{it} - \bar{u}_i), \qquad i = 1, \ldots, N, \quad t = 1, \ldots, T, \tag{4.10}$$

where $\bar{y}_i = \frac{1}{T}\sum_{t=1}^{T} y_{it}$, $\bar{\mathbf{x}}_i = \frac{1}{T}\sum_{t=1}^{T} \mathbf{x}_{it}$ and $\bar{u}_i = \frac{1}{T}\sum_{t=1}^{T} u_{it}$. The transformation is called the covariance transformation. The least squares estimator (LS) (or a generalized least squares estimator (GLS)) of (4.10),

$$\hat{\boldsymbol{\beta}}_{cv} = \left[\sum_{i=1}^{N}\sum_{t=1}^{T} (\mathbf{x}_{it} - \bar{\mathbf{x}}_i)(\mathbf{x}_{it} - \bar{\mathbf{x}}_i)'\right]^{-1} \left[\sum_{i=1}^{N}\sum_{t=1}^{T} (\mathbf{x}_{it} - \bar{\mathbf{x}}_i)(y_{it} - \bar{y}_i)\right], \tag{4.11}$$

is called the covariance estimator or within estimator because the estimation of $\boldsymbol{\beta}$ makes use only of the within (group) variation of $y_{it}$ and $\mathbf{x}_{it}$. The covariance estimator of $\boldsymbol{\beta}$ turns out to be also the least squares estimator of (4.10). It is the best linear unbiased estimator of $\boldsymbol{\beta}$ if $\alpha_i$ is treated as fixed and $u_{it}$ is i.i.d.
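The contrast between imposing homogeneity and applying the covariance transformation can be illustrated numerically. The following is a minimal sketch, assuming a scalar regressor, standard normal errors, and individual effects that enter the regressor one-for-one; the design and the function names `pooled_ols` and `within_estimator` are hypothetical choices made for illustration, not part of the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, beta = 500, 5, 1.0  # hypothetical design

alpha = rng.normal(size=N)                    # individual effects alpha_i
x = alpha[:, None] + rng.normal(size=(N, T))  # regressor correlated with alpha_i
u = rng.normal(size=(N, T))                   # i.i.d. errors u_it
y = beta * x + alpha[:, None] + u             # scalar version of (4.3), no time effects


def pooled_ols(y, x):
    """Slope from OLS with a common intercept, ignoring alpha_i."""
    xc, yc = x - x.mean(), y - y.mean()
    return (xc * yc).sum() / (xc ** 2).sum()


def within_estimator(y, x):
    """Scalar version of (4.11): OLS on deviations from individual means."""
    xd = x - x.mean(axis=1, keepdims=True)
    yd = y - y.mean(axis=1, keepdims=True)
    return (xd * yd).sum() / (xd ** 2).sum()


b_pooled = pooled_ols(y, x)
b_within = within_estimator(y, x)
print(f"true beta = {beta}, pooled OLS = {b_pooled:.3f}, within = {b_within:.3f}")
```

In this design the pooled slope absorbs the correlation between the regressor and the omitted individual effect, while the within estimator removes the effect before estimating the slope.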
If $\alpha_i$ is random, transforming (4.7) into (4.8) transforms $T$ independent equations (or observations) into $(T - 1)$ independent equations, hence the covariance estimator is not as efficient as the efficient generalized least squares estimator if $E\alpha_i\mathbf{x}_{it}' = \mathbf{0}'$. When $\alpha_i$ is independent of $\mathbf{x}_{it}$ and is independently, identically distributed across $i$ with mean 0 and variance $\sigma_\alpha^2$, the best linear unbiased estimator (BLUE) of $\boldsymbol{\beta}$ is GLS,

$$\hat{\boldsymbol{\beta}} = \left[\sum_{i=1}^{N} X_i'V^{-1}X_i\right]^{-1} \left[\sum_{i=1}^{N} X_i'V^{-1}\mathbf{y}_i\right], \tag{4.12}$$

where $V = \sigma_u^2 I_T + \sigma_\alpha^2 \mathbf{e}\mathbf{e}'$ and $V^{-1} = \frac{1}{\sigma_u^2}\left[I_T - \frac{\sigma_\alpha^2}{\sigma_u^2 + T\sigma_\alpha^2}\mathbf{e}\mathbf{e}'\right]$. Let $\psi = \frac{\sigma_u^2}{\sigma_u^2 + T\sigma_\alpha^2}$. The GLS is equivalent to first transforming the data by subtracting a fraction $(1 - \psi^{1/2})$ of the individual means $\bar{y}_i$ and $\bar{\mathbf{x}}_i$ from their corresponding $y_{it}$ and $\mathbf{x}_{it}$, then regressing $[y_{it} - (1 - \psi^{1/2})\bar{y}_i]$ on $[\mathbf{x}_{it} - (1 - \psi^{1/2})\bar{\mathbf{x}}_i]$ (for details, see Baltagi (2001), Hsiao (2003)).

If a variable is time-invariant, like a gender dummy, $x_{kit} = x_{kis} = \bar{x}_{ki}$, the covariance transformation eliminates the corresponding variable from the specification. Hence, the coefficients of time-invariant variables cannot be estimated. On the other hand, if $\alpha_i$ is random and uncorrelated with $\mathbf{x}_i$, then $\psi > 0$ and only a fraction $(1 - \psi^{1/2}) < 1$ of the individual mean is subtracted, so the GLS can still estimate the coefficients of those time-invariant variables.

4.2 Dynamic Models

Suppose the regressors of a linear model contain lagged dependent variables, say, of the form (e.g. Balestra and Nerlove (1966))

$$\mathbf{y}_i = \mathbf{y}_{i,-1}\gamma + X_i\boldsymbol{\beta} + \mathbf{e}\alpha_i + \mathbf{u}_i = Z_i\boldsymbol{\theta} + \mathbf{e}\alpha_i + \mathbf{u}_i, \qquad i = 1, \ldots, N, \tag{4.13}$$

where $\mathbf{y}_{i,-1} = (y_{i0}, \ldots, y_{i,T-1})'$, $Z_i = (\mathbf{y}_{i,-1}, X_i)$ and $\boldsymbol{\theta} = (\gamma, \boldsymbol{\beta}')'$. For ease of notation, we assume that $y_{i0}$ are observable. Technically, we can still eliminate the individual-specific effects by premultiplying (4.13) by the transformation matrix $Q$ ($Q\mathbf{e} = \mathbf{0}$),

$$Q\mathbf{y}_i = QZ_i\boldsymbol{\theta} + Q\mathbf{u}_i. \tag{4.14}$$

However, because of the presence of lagged dependent variables, $E(QZ_i\mathbf{u}_i'Q') \neq 0$ even with the assumption that $u_{it}$ is independently, identically distributed across $i$ and over $t$. For instance, the covariance transformation matrix $Q = I_T - \frac{1}{T}\mathbf{e}\mathbf{e}'$ transforms (4.13) into the form

$$(y_{it} - \bar{y}_i) = (y_{i,t-1} - \bar{y}_{i,-1})\gamma + (\mathbf{x}_{it} - \bar{\mathbf{x}}_i)'\boldsymbol{\beta} + (u_{it} - \bar{u}_i), \qquad i = 1, \ldots, N, \quad t = 1, \ldots, T, \tag{4.15}$$

where $\bar{y}_i = \frac{1}{T}\sum_{t=1}^{T} y_{it}$, $\bar{y}_{i,-1} = \frac{1}{T}\sum_{t=1}^{T} y_{i,t-1}$ and $\bar{u}_i = \frac{1}{T}\sum_{t=1}^{T} u_{it}$. Although $y_{i,t-1}$ and $u_{it}$ are uncorrelated under the assumption of serial independence of $u_{it}$, the covariance between $\bar{y}_{i,-1}$ and $u_{it}$, or between $y_{i,t-1}$ and $\bar{u}_i$, is of order $(1/T)$ if $|\gamma| < 1$. Therefore, the covariance estimator of $\boldsymbol{\theta}$ has a bias of order $(1/T)$ when $N \to \infty$ (Anderson and Hsiao (1981, 1982), Nickell (1981)). Since most panel data contain large $N$ but small $T$, the magnitude of the bias cannot be ignored (e.g. with $T = 10$ and $\gamma = 0.5$, the asymptotic bias is $-0.167$).

When $E(QZ_i\mathbf{u}_i'Q') \neq 0$, one way to obtain a consistent estimator for $\boldsymbol{\theta}$ is to find instruments $W_i$ that satisfy

$$E(W_i\mathbf{u}_i'Q') = 0, \tag{4.16}$$

and

$$\mathrm{rank}(W_iQZ_i) = k, \tag{4.17}$$

where $k$ denotes the dimension of $(\gamma, \boldsymbol{\beta}')'$, then apply the generalized instrumental variable or generalized method of moments estimator (GMM) by minimizing the objective function

$$\left[\sum_{i=1}^{N} W_i(Q\mathbf{y}_i - QZ_i\boldsymbol{\theta})\right]' \left[\sum_{i=1}^{N} W_iQ\mathbf{u}_i\mathbf{u}_i'Q'W_i'\right]^{-1} \left[\sum_{i=1}^{N} W_i(Q\mathbf{y}_i - QZ_i\boldsymbol{\theta})\right], \tag{4.18}$$

with respect to $\boldsymbol{\theta}$ (e.g. Arellano (2003), Ahn and Schmidt (1995), Arellano and Bond (1991), Arellano and Bover (1995)). For instance, one may let $Q$ be a $(T - 1) \times T$ matrix of the form

$$D = \begin{bmatrix} -1 & 1 & 0 & \cdots & 0 \\ 0 & -1 & 1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & -1 & 1 \end{bmatrix}, \tag{4.19}$$

then the transformation (4.14) is equivalent to taking the first difference of (4.13) over time to eliminate $\alpha_i$ for $t = 2, \ldots, T$,

$$\Delta y_{it} = \Delta y_{i,t-1}\gamma + \Delta\mathbf{x}_{it}'\boldsymbol{\beta} + \Delta u_{it}, \qquad i = 1, \ldots, N, \quad t = 2, \ldots, T, \tag{4.20}$$

where $\Delta = (1 - L)$ and $L$ denotes the lag operator, $Ly_t = y_{t-1}$.
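The bias of the covariance estimator and the idea of instrumenting the first-differenced equation can be illustrated by simulation. The following is a minimal sketch, assuming a pure autoregressive model without exogenous regressors, standard normal errors and effects, and a single instrument $y_{i,t-2}$ for $\Delta y_{i,t-1}$ in the spirit of the Anderson and Hsiao (1981) approach cited above. The sample sizes, the burn-in length and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T, gamma, burn = 5000, 10, 0.5, 50  # hypothetical design

# Simulate y_it = gamma * y_{i,t-1} + alpha_i + u_it and keep y_i0, ..., y_iT.
alpha = rng.normal(size=N)
y_prev = alpha / (1 - gamma)  # start each unit at its long-run mean
cols = []
for t in range(burn + T + 1):
    y_prev = gamma * y_prev + alpha + rng.normal(size=N)
    if t >= burn:
        cols.append(y_prev)
Y = np.column_stack(cols)  # shape (N, T + 1)

# Covariance (within) estimator of (4.15) without exogenous regressors.
y_cur, y_lag = Y[:, 1:], Y[:, :-1]
yc = y_cur - y_cur.mean(axis=1, keepdims=True)
yl = y_lag - y_lag.mean(axis=1, keepdims=True)
gamma_within = (yl * yc).sum() / (yl ** 2).sum()

# Simple IV estimator of the first-differenced equation (4.20):
# instrument Delta y_{i,t-1} with the level y_{i,t-2}, for t = 2, ..., T.
dY = np.diff(Y, axis=1)   # column k holds y_{i,k+1} - y_{i,k}
d_cur = dY[:, 1:]         # Delta y_it
d_lag = dY[:, :-1]        # Delta y_{i,t-1}
z = Y[:, :-2]             # y_{i,t-2}
gamma_iv = (z * d_cur).sum() / (z * d_lag).sum()

print(f"true gamma = {gamma}")
print(f"within estimator = {gamma_within:.3f}")
print(f"first-difference IV estimator = {gamma_iv:.3f}")
```

With large N and T = 10, the within estimate should fall noticeably below the true value, whereas the instrumental variable estimate should be close to it, at the cost of a larger sampling variance.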
Since $\Delta u_{it} = (u_{it} - u_{i,t-1})$ is uncorrelated with $y_{i,t-j}$ for $j \geq 2$ and $\mathbf{x}_{is}$, for all $s$, when $u_{it}$ is independently distributed over time and $\mathbf{x}_{it}$ is exogenous, one can let $W_i$ be a $T(T - 1)[K + \frac{1}{2}] \times (T - 1)$ matrix of the form

$$W_i = \begin{bmatrix} \mathbf{q}_{i2} & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \mathbf{q}_{i3} & \cdots & \mathbf{0} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & \mathbf{q}_{iT} \end{bmatrix}, \tag{4.21}$$

where $\mathbf{q}_{it} = (y_{i0}, y_{i1}, \ldots, y_{i,t-2}, \mathbf{x}_i')'$, $\mathbf{x}_i = (\mathbf{x}_{i1}', \ldots, \mathbf{x}_{iT}')'$, and $K = k - 1$. Under the assumption that $(\mathbf{y}_i', \mathbf{x}_i')$ are independently, identically distributed across $i$, the Arellano-Bond (1991) GMM estimator takes the form

$$\hat{\boldsymbol{\theta}}_{AB,GMM} = \left\{\left[\sum_{i=1}^{N} Z_i'D'W_i'\right] \left[\sum_{i=1}^{N} W_iAW_i'\right]^{-1} \left[\sum_{i=1}^{N} W_iDZ_i\right]\right\}^{-1} \left\{\left[\sum_{i=1}^{N} Z_i'D'W_i'\right] \left[\sum_{i=1}^{N} W_iAW_i'\right]^{-1} \left[\sum_{i=1}^{N} W_iD\mathbf{y}_i\right]\right\}, \tag{4.22}$$

where $A$ is a $(T - 1) \times (T - 1)$ matrix with 2 on the diagonal, $-1$ on the elements immediately above and below the diagonal, and 0 elsewhere.

The GMM estimator has the advantage that it is consistent and asymptotically normally distributed whether $\alpha_i$ is treated as fixed or random because it eliminates $\alpha_i$ from the specification. However, the number of moment conditions increases at the order of $T^2$, which can create severe downward bias in finite samples (Ziliak (1997)). An alternative is to use a (quasi-) likelihood approach, which has the advantage of having a fixed number of orthogonality conditions independent of the sample size. It also has the advantage of making use of all the available sample, and hence may yield a more efficient estimator than (4.22) (e.g. Hsiao, Pesaran and Tahmiscioglu (2002), Binder, Hsiao and Pesaran (2004)). However, the likelihood approach has to formulate the joint likelihood function of $(y_{i0}, y_{i1}, \ldots, y_{iT})$ (or the conditional likelihood function of $(y_{i1}, \ldots, y_{iT} \mid y_{i0})$). Since there is no reason to assume that the data generating process of the initial observations, $y_{i0}$, is different from that of the rest of $y_{it}$, the initial $y_{i0}$ depends on previous values of $\mathbf{x}_{i,-j}$ and $\alpha_i$, which are unavailable.
Bhargava and Sargan (1983) suggest circumventing this missing data problem by conditioning $y_{i0}$ on $\mathbf{x}_i$ and $\alpha_i$ if $\alpha_i$ is treated as random. If $\alpha_i$ is treated as a fixed constant, Hsiao, Pesaran and Tahmiscioglu (2002) propose conditioning $(y_{i1} - y_{i0})$ on the first difference of $\mathbf{x}_i$.

4.3 Nonlinear Models

When the unobserved individual-specific effects, $\alpha_i$ (and/or time-specific effects, $\lambda_t$), affect the outcome, $y_{it}$, linearly, one can avoid the choice between random and fixed effects specifications by eliminating the effects through some linear transformation, such as the covariance transformation (4.8) or the first difference transformation (4.20). However, if $\alpha_i$ affects $y_{it}$ nonlinearly, it is not easy to find a transformation that eliminates $\alpha_i$. For instance, consider the following binary choice model, where the observed $y_{it}$ takes the value of either 1 or 0 depending on the latent response function

$$y_{it}^* = \boldsymbol{\beta}'\mathbf{x}_{it} + \alpha_i + u_{it}, \tag{4.23}$$

and

$$y_{it} = \begin{cases} 1, & \text{if } y_{it}^* > 0, \\ 0, & \text{if } y_{it}^* \leq 0, \end{cases} \tag{4.24}$$

where $u_{it}$ is independently, identically distributed with density function $f(u_{it})$. Let

$$y_{it} = E(y_{it} \mid \mathbf{x}_{it}, \alpha_i) + \epsilon_{it}, \tag{4.25}$$

then

$$E(y_{it} \mid \mathbf{x}_{it}, \alpha_i) = \int_{-(\boldsymbol{\beta}'\mathbf{x}_{it} + \alpha_i)}^{\infty} f(u)\,du = [1 - F(-\boldsymbol{\beta}'\mathbf{x}_{it} - \alpha_i)]. \tag{4.26}$$

Since $\alpha_i$ affects $E(y_{it} \mid \mathbf{x}_{it}, \alpha_i)$ nonlinearly, $\alpha_i$ remains after taking successive differences of $y_{it}$,

$$y_{it} - y_{i,t-1} = [1 - F(-\boldsymbol{\beta}'\mathbf{x}_{it} - \alpha_i)] - [1 - F(-\boldsymbol{\beta}'\mathbf{x}_{i,t-1} - \alpha_i)] + (\epsilon_{it} - \epsilon_{i,t-1}). \tag{4.27}$$

The likelihood function conditional on $\mathbf{x}_i$ and $\alpha_i$ takes the form

$$\prod_{i=1}^{N} \prod_{t=1}^{T} [F(-\boldsymbol{\beta}'\mathbf{x}_{it} - \alpha_i)]^{1 - y_{it}} [1 - F(-\boldsymbol{\beta}'\mathbf{x}_{it} - \alpha_i)]^{y_{it}}. \tag{4.28}$$

If $T$ is large, consistent estimators of $\boldsymbol{\beta}$ and $\alpha_i$ can be obtained by maximizing (4.28). If $T$ is finite, there is only limited information about $\alpha_i$ no matter how large $N$ is. The presence of the incidental parameters, $\alpha_i$, violates the regularity conditions for the consistency of the maximum likelihood estimator of $\boldsymbol{\beta}$.
If $f(\alpha_i \mid \mathbf{x}_i)$ is known, and is characterized by a fixed dimensional parameter vector, a consistent estimator of $\boldsymbol{\beta}$ can be obtained by maximizing the marginal likelihood function,

$$\prod_{i=1}^{N} \int \prod_{t=1}^{T} [F(-\boldsymbol{\beta}'\mathbf{x}_{it} - \alpha_i)]^{1 - y_{it}} [1 - F(-\boldsymbol{\beta}'\mathbf{x}_{it} - \alpha_i)]^{y_{it}} f(\alpha_i \mid \mathbf{x}_i)\,d\alpha_i. \tag{4.29}$$

However, maximizing (4.29) involves $T$-dimensional integration. Butler and Moffitt (1982), Chamberlain (1984), Heckman (1981), etc., have suggested methods to simplify the computation.

The advantage of the RE specification is that there is no incidental parameter problem. The problem is that $f(\alpha_i \mid \mathbf{x}_i)$ is in general unknown. If a wrong $f(\alpha_i \mid \mathbf{x}_i)$ is postulated, maximizing the wrong likelihood function will not yield a consistent estimator of $\boldsymbol{\beta}$. Moreover, the derivation of the marginal likelihood through multiple integration may be computationally infeasible.

The advantage of the FE specification is that there is no need to specify $f(\alpha_i \mid \mathbf{x}_i)$. The likelihood function will be the product of individual likelihoods (e.g. (4.28)) if the errors are i.i.d. The disadvantage is that it introduces incidental parameters.

A general approach to estimating a model involving incidental parameters is to find a transformation that turns the original model into one that does not involve incidental parameters. Unfortunately, there is no general rule available for nonlinear models. One has to explore the specific structure of a nonlinear model to find such a transformation. For instance, if $f(u)$ in (4.23) is logistic, then

$$\operatorname{Prob}(y_{it} = 1 \mid \mathbf{x}_{it}, \alpha_i) = \frac{e^{\boldsymbol{\beta}'\mathbf{x}_{it} + \alpha_i}}{1 + e^{\boldsymbol{\beta}'\mathbf{x}_{it} + \alpha_i}}. \tag{4.30}$$

Since, in a logit model, the denominators of $\operatorname{Prob}(y_{it} = 1 \mid \mathbf{x}_{it}, \alpha_i)$ and $\operatorname{Prob}(y_{it} = 0 \mid \mathbf{x}_{it}, \alpha_i)$ are identical and the numerator of any sequence $\{y_{i1}, \ldots, y_{iT}\}$ with $\sum_{t=1}^{T} y_{it} = s$ is always equal to $\exp(\alpha_i s) \cdot \exp\{\sum_{t=1}^{T} (\boldsymbol{\beta}'\mathbf{x}_{it}) y_{it}\}$, the conditional likelihood function conditional on $\sum_{t=1}^{T} y_{it} = s$ will not involve the incidental parameters $\alpha_i$.
For instance, consider the simple case that $T = 2$, then

$$\operatorname{Prob}(y_{i1} = 1, y_{i2} = 0 \mid y_{i1} + y_{i2} = 1) = \frac{e^{\boldsymbol{\beta}'\mathbf{x}_{i1}}}{e^{\boldsymbol{\beta}'\mathbf{x}_{i1}} + e^{\boldsymbol{\beta}'\mathbf{x}_{i2}}} = \frac{1}{1 + e^{\boldsymbol{\beta}'\Delta\mathbf{x}_{i2}}}, \tag{4.31}$$

and

$$\operatorname{Prob}(y_{i1} = 0, y_{i2} = 1 \mid y_{i1} + y_{i2} = 1) = \frac{e^{\boldsymbol{\beta}'\Delta\mathbf{x}_{i2}}}{1 + e^{\boldsymbol{\beta}'\Delta\mathbf{x}_{i2}}}, \tag{4.32}$$

(Chamberlain (1980), Hsiao (2003)).

This approach works because of the logit structure. In the case when $f(u)$ is unknown, Manski (1987) exploits the latent linear structure of (4.23) by noting that for given $i$,

$$\boldsymbol{\beta}'\mathbf{x}_{it} \gtreqless \boldsymbol{\beta}'\mathbf{x}_{i,t-1} \iff E(y_{it} \mid \mathbf{x}_{it}, \alpha_i) \gtreqless E(y_{i,t-1} \mid \mathbf{x}_{i,t-1}, \alpha_i), \tag{4.33}$$

and suggests maximizing the objective function

$$H_N(\mathbf{b}) = \frac{1}{N} \sum_{i=1}^{N} \sum_{t=2}^{T} \operatorname{sgn}(\mathbf{b}'\Delta\mathbf{x}_{it})\,\Delta y_{it}, \tag{4.34}$$

where $\operatorname{sgn}(w) = 1$ if $w > 0$, $= 0$ if $w = 0$, and $= -1$ if $w < 0$. The advantage of the Manski (1987) maximum score estimator is that it is consistent without knowledge of $f(u)$. The disadvantage is that (4.33) holds for any $c\boldsymbol{\beta}$ where $c > 0$. Only the relative magnitude of the coefficients can be estimated, under some normalization rule, say $\|\boldsymbol{\beta}\| = 1$. Moreover, the speed of convergence is considerably slower ($N^{1/3}$) and the limiting distribution is quite complicated. Horowitz (1992) and Lee (1999) have proposed modified estimators that improve the speed of convergence and are asymptotically normally distributed.

Other examples of exploiting the specific structure of nonlinear models to eliminate the effects of the incidental parameters $\alpha_i$ include dynamic discrete choice models (Chamberlain (1993), Honoré and Kyriazidou (2000), Hsiao, Shen, Wang and Weeks (2005)), the symmetrically trimmed least squares estimator for truncated and censored data (Tobit models) (Honoré (1992)), sample selection models (or type II Tobit models) (Kyriazidou (1997)), etc. However, these methods often impose very severe restrictions on the data, so that not much of the information in the data can be used to obtain parameter estimates. Moreover, there are models for which no consistent estimator appears to exist when $T$ is finite.
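Two of the points above lend themselves to a quick numerical check: the conditional probability (4.32) does not depend on α_i, and the maximum score objective (4.34) is unchanged when b is multiplied by a positive constant. The sketch below assumes a scalar regressor; all function names and the toy panel are hypothetical.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def cond_prob_01(beta, x1, x2, alpha):
    """Prob(y_i1 = 0, y_i2 = 1 | y_i1 + y_i2 = 1), built from the logit
    probabilities (4.30) with a scalar regressor."""
    p1 = logistic(beta * x1 + alpha)
    p2 = logistic(beta * x2 + alpha)
    p01 = (1.0 - p1) * p2
    p10 = p1 * (1.0 - p2)
    return p01 / (p01 + p10)


def sgn(w):
    return (w > 0) - (w < 0)


def max_score_objective(b, panel):
    """H_N(b) of (4.34) for a scalar regressor.

    panel is a list of (x_i, y_i) pairs, each a sequence of length T.
    """
    total = 0
    for x, y in panel:
        for t in range(1, len(y)):
            total += sgn(b * (x[t] - x[t - 1])) * (y[t] - y[t - 1])
    return total / len(panel)


if __name__ == "__main__":
    # The conditional probability is the same for every alpha.
    for alpha in (-3.0, 0.0, 2.0):
        print(alpha, cond_prob_01(0.8, 0.3, 1.5, alpha))
    # Rescaling b by a positive constant leaves the score objective unchanged.
    toy = [([0.0, 1.0], [0, 1]), ([1.0, 0.0], [1, 0]), ([0.0, 2.0], [1, 1])]
    print(max_score_objective(1.0, toy), max_score_objective(5.0, toy))
```

Units whose outcome does not change over time (the third unit in the toy panel) contribute nothing to either the conditional likelihood or the score objective, which illustrates why these methods can discard much of the sample.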
An alternative to seeking consistent estimators is to consider bias-reduced estimators. The advantage of such an approach is that bias-reduced estimators may still allow the use of all the sample information, so that, from a mean square error point of view, a bias-reduced estimator may still dominate a consistent estimator, because the latter often has to throw away a large part of the sample and thus tends to have a large variance.

Following the idea of Cox and Reid (1987), Arellano (2001) and Carro (2005) propose to derive the modified MLE by maximizing the modified log-likelihood function

$$L^*(\boldsymbol{\beta}) = \sum_{i=1}^{N} \left\{ \ell_i^*(\boldsymbol{\beta}, \hat{\alpha}_i(\boldsymbol{\beta})) - \frac{1}{2} \log \ell_{i,\alpha_i\alpha_i}^*(\boldsymbol{\beta}, \hat{\alpha}_i(\boldsymbol{\beta})) \right\}, \tag{4.35}$$

where $\ell_i^*(\boldsymbol{\beta}, \hat{\alpha}_i(\boldsymbol{\beta}))$ denotes the concentrated log-likelihood function of $\mathbf{y}_i$ after substituting the MLE of $\alpha_i$ in terms of $\boldsymbol{\beta}$, $\hat{\alpha}_i(\boldsymbol{\beta})$ (i.e., the solution of $\partial \log L / \partial \alpha_i = 0$ in terms of $\boldsymbol{\beta}$, $i = 1, \ldots, N$), into the log-likelihood function, and $\ell_{i,\alpha_i\alpha_i}^*(\boldsymbol{\beta}, \hat{\alpha}_i(\boldsymbol{\beta}))$ denotes the second derivative of $\ell_i^*$ with respect to $\alpha_i$. The bias correction term is derived by noting that, to the order of $(1/T)$, the first derivative of $\ell_i^*$ with respect to $\boldsymbol{\beta}$ converges to

$$\frac{1}{2} \frac{E[\ell_{i,\boldsymbol{\beta}\alpha_i\alpha_i}^*(\boldsymbol{\beta}, \alpha_i)]}{E[\ell_{i,\alpha_i\alpha_i}^*(\boldsymbol{\beta}, \alpha_i)]}.$$

By subtracting the order $(1/T)$ bias from the likelihood function, the modified MLE is biased only to the order of $(1/T^2)$, without increasing the asymptotic variance.

Monte Carlo experiments conducted by Carro (2005) have shown that when $T = 8$, the bias of the modified MLE for dynamic probit and logit models is negligible. Another advantage of the Arellano-Carro approach is its generality. For instance, a dynamic logit model with time dummy explanatory variables cannot meet the Honoré and Kyriazidou (2000) conditions for generating a consistent estimator, but can still be estimated by the modified MLE with good finite sample properties.
4.4 Modeling Cross-Sectional Dependence

Most panel studies assume that, apart from the possible presence of individual-invariant but period-varying time-specific effects, $\lambda_t$, the effects of omitted variables are independently distributed across cross-sectional units. However, economic theory often predicts that agents take actions that lead to interdependence among themselves. For example, the prediction that risk-averse agents will make insurance contracts allowing them to smooth idiosyncratic shocks implies dependence in consumption across individuals. Ignoring cross-sectional dependence can lead to inconsistent estimators, in particular when $T$ is finite (e.g. Hsiao and Tahmiscioglu (2005)). Unfortunately, contrary to time series data, in which the time label gives a natural ordering and structure, general forms of dependence along the cross-sectional dimension are difficult to formulate. Therefore, econometricians have relied on strong parametric assumptions to model cross-sectional dependence. Two approaches have been proposed: the economic distance or spatial approach, and the factor approach.

In regional science, correlation across cross-sectional units is assumed to follow a certain spatial ordering, i.e. dependence among cross-sectional units is related to location and distance, in a geographic or more general economic or social network space (e.g. Anselin (1988), Anselin and Griffith (1988), Anselin, Le Gallo and Jayet (2005)). A known spatial weights matrix, $W = (w_{ij})$, an $N \times N$ positive matrix in which the rows and columns correspond to the cross-sectional units, is specified to express the prior strength of the interaction, $w_{ij}$, between individual (location) $i$ (in the row of the matrix) and individual (location) $j$ (column). By convention, the diagonal elements are $w_{ii} = 0$. The weights are often standardized so that the sum of each row is one, $\sum_{j=1}^{N} w_{ij} = 1$.

The spatial weights matrix, $W$, is often included in a model specification by applying it to the dependent variable, to the explanatory variables, or to the error term. For instance, a spatial lag model for the $NT \times 1$ variable $\mathbf{y} = (\mathbf{y}_1', \ldots, \mathbf{y}_N')'$, $\mathbf{y}_i = (y_{i1}, \ldots, y_{iT})'$, may take the form

$$\mathbf{y} = \rho(W \otimes I_T)\mathbf{y} + X\boldsymbol{\beta} + \mathbf{u}, \tag{4.36}$$

where $X$ and $\mathbf{u}$ denote the $NT \times K$ matrix of explanatory variables and the $NT \times 1$ vector of error terms, respectively, and $\otimes$ denotes the Kronecker product. A spatial error model may take the form

$$\mathbf{y} = X\boldsymbol{\beta} + \mathbf{v}, \tag{4.37}$$

where $\mathbf{v}$ may be specified in a spatial autoregressive form,

$$\mathbf{v} = \theta(W \otimes I_T)\mathbf{v} + \mathbf{u}, \tag{4.38}$$

or a spatial moving average form,

$$\mathbf{v} = \gamma(W \otimes I_T)\mathbf{u} + \mathbf{u}. \tag{4.39}$$

The spatial model can be estimated by the instrumental variables (generalized method of moments) estimator or by the maximum likelihood method. However, defining cross-sectional dependence in terms of an "economic distance" measure requires that the econometrician have information regarding this "economic distance" (e.g. Conley (1999)).

Another approach to modeling cross-sectional dependence is to assume that the error of a model, say model (4.37), follows a linear factor model,

$$v_{it} = \sum_{j=1}^{r} b_{ij} f_{jt} + u_{it}, \tag{4.40}$$

where $\mathbf{f}_t = (f_{1t}, \ldots, f_{rt})'$ is an $r \times 1$ vector of random factors, $\mathbf{b}_i = (b_{i1}, \ldots, b_{ir})'$ is an $r \times 1$ vector of nonrandom factor loading coefficients, and $u_{it}$ represents the effects of idiosyncratic shocks, which are independent of $\mathbf{f}_t$ and independently distributed across $i$ (e.g. Bai and Ng (2002), Moon and Perron (2004), Pesaran (2004)). The conventional time-specific effects model is a special case of (4.40) when $r = 1$ and $b_i = b_\ell$ for all $i$ and $\ell$.

The factor approach requires considerably less prior information than the economic distance approach. Moreover, the number of time-varying factors, $r$, and the factor loading matrix $B = (b_{ij})$ can be empirically identified if both $N$ and $T$ are large.
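As a small illustration of the spatial approach described above, the following sketch row-standardizes a weights matrix and computes the spatial lag (W ⊗ I_T)y that appears in (4.36), (4.38) and (4.39). The contiguity weights for three units arranged in a line are a made-up example, the function names are hypothetical, and the Kronecker product is applied implicitly rather than formed as an NT × NT matrix.

```python
def row_standardize(W):
    """Scale each row of W so that it sums to one (rows summing to zero are left as is)."""
    out = []
    for row in W:
        s = sum(row)
        out.append([w / s for w in row] if s != 0 else list(row))
    return out


def spatial_lag(W, y, T):
    """Compute (W kron I_T) y without forming the Kronecker product.

    y is stacked unit by unit: (y_11, ..., y_1T, ..., y_N1, ..., y_NT).
    Element (i, t) of the result is sum_j w_ij * y_jt.
    """
    N = len(W)
    return [sum(W[i][j] * y[j * T + t] for j in range(N))
            for i in range(N) for t in range(T)]


if __name__ == "__main__":
    # Hypothetical contiguity weights: unit 2 neighbors units 1 and 3.
    W = row_standardize([[0, 1, 0],
                         [1, 0, 1],
                         [0, 1, 0]])
    y = [1, 2, 3, 4, 5, 6]          # N = 3 units, T = 2 periods
    print(W)
    print(spatial_lag(W, y, T=2))   # neighbor averages, period by period
```

With row-standardized weights, each element of the spatial lag is a weighted average of the neighbors' values in the same period, which gives ρ in (4.36) a direct interpretation.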
The estimation of a factor loading matrix when $N$ is large may not be computationally feasible. Pesaran (2004) has therefore suggested adding the cross-sectional means $\bar{y}_t = \frac{1}{N}\sum_{i=1}^{N} y_{it}$ and $\bar{\mathbf{x}}_t = \frac{1}{N}\sum_{i=1}^{N} \mathbf{x}_{it}$ as additional regressors with individual-specific coefficients to (4.37) to filter out cross-sectional dependence. This approach is very appealing because of its simplicity. However, it is not clear how it will perform if $N$ is neither small nor large. Neither is it clear how it can be generalized to nonlinear models.

4.5 Large-N and Large-T Panels

Our discussion has mostly focused on panels with large $N$ and finite $T$. There are panel data sets, like the Penn-World tables, covering different individuals, industries, and countries over long periods. In general, if an estimator is consistent in the fixed-$T$, large-$N$ case, it will remain consistent if both $N$ and $T$ tend to infinity. Moreover, even in the case that an estimator is inconsistent for fixed $T$ and large $N$ (say, the MLE of the dynamic model (4.13) or of the fixed effects probit or logit models (4.26)), it can become consistent if $T$ also tends to infinity. The probability limit of an estimator, in general, is identical irrespective of how $N$ and $T$ tend to infinity. However, the properly scaled limiting distribution may depend on how the two indexes, $N$ and $T$, tend to infinity.

There are several approaches for deriving the limits of large-$N$, large-$T$ panels:

a. Sequential limits: first, fix one index, say $N$, and allow the other, say $T$, to go to infinity, giving an intermediate limit; then let $N$ go to infinity.

b. Diagonal-path limits: let the two indexes, $N$ and $T$, pass to infinity along a specific diagonal path, say $T = T(N)$ as $N \to \infty$.

c. Joint limits: let $N$ and $T$ pass to infinity simultaneously without placing specific diagonal path restrictions on the divergence.

In many applications, sequential limits are easy to derive. However, sometimes sequential limits can give misleading asymptotic results.
A joint limit will give a more robust result than either a sequential limit or a diagonal-path limit, but will also be substantially more difficult to derive and will apply only under stronger conditions, such as the existence of higher moments. Phillips and Moon (1999) have given a set of sufficient conditions that ensures that sequential limits are equivalent to joint limits.

When $T$ is large, there is a need to consider serial correlations more generally, including both short-memory and persistent components. For instance, if unit roots are present in $y$ and $x$ (i.e. both are integrated of order 1) but the two are not cointegrated, Phillips and Moon (1999) show that if $N$ is fixed but $T \to \infty$, the least squares regression coefficient of $y$ on $x$ is a nondegenerate random variable that is a functional of Brownian motion and does not converge to the long-run average relation between $y$ and $x$, but it does converge if $N$ also tends to infinity. In other words, the issue of spurious regression will not arise in panels with large $N$ (e.g. Kao (1999)).

Both theoretical and applied researchers have paid a great deal of attention to the unit root and cointegration properties of variables. When $N$ is finite and $T$ is large, standard time series techniques can be used to derive the statistical properties of panel data estimators. When $N$ is large and cross-sectional units are independently distributed across $i$, central limit theorems can be invoked along the cross-sectional dimension. Asymptotically normal estimators and test statistics (with suitable adjustment for finite-$T$ bias) for unit roots and cointegration have been proposed (e.g. Baltagi and Kao (2000), Im, Pesaran and Shin (2003), Levin, Lin and Chu (2002)). They, in general, gain statistical power over their standard time series counterparts (e.g. Choi (2001)).
When both $N$ and $T$ are large and cross-sectional units are not independent, a factor analytic framework of the form (4.40) has been proposed to model cross-sectional dependency, and variants of unit root tests have been proposed (e.g. Moon and Perron (2004)). However, the implementation of those panel unit root tests is quite complicated. When $N \to \infty$, $\frac{1}{N}\sum_{i=1}^{N} u_{it} \to 0$, and (4.40) implies that $\bar{v}_t = \bar{\mathbf{b}}'\mathbf{f}_t$, where $\bar{\mathbf{b}}$ is the cross-sectional average of $\mathbf{b}_i = (b_{i1}, \ldots, b_{ir})'$ and $\mathbf{f}_t = (f_{1t}, \ldots, f_{rt})'$. Pesaran (2004, 2005) suggests a simple approach to filter out the cross-sectional dependency by augmenting the regression model (4.37) with the cross-sectional means, $\bar{y}_t$ and $\bar{\mathbf{x}}_t$,

$$y_{it} = \mathbf{x}_{it}'\boldsymbol{\beta} + \alpha_i + \bar{y}_t c_i + \bar{\mathbf{x}}_t'\mathbf{d}_i + e_{it}, \tag{4.41}$$

or by augmenting the Dickey-Fuller (1979) type regression model with $\bar{y}_{t-1}$ and $\Delta\bar{y}_{t-j}$,

$$\Delta y_{it} = \alpha_i + \delta_i t + \gamma_i y_{i,t-1} + \sum_{\ell=1}^{p_i} \phi_{i\ell}\,\Delta y_{i,t-\ell} + c_i \bar{y}_{t-1} + \sum_{\ell=1}^{p_i} d_{i\ell}\,\Delta\bar{y}_{t-\ell} + e_{it}, \tag{4.42}$$

for testing of unit roots, where $\bar{y}_t = \frac{1}{N}\sum_{i=1}^{N} y_{it}$, $\bar{\mathbf{x}}_t = \frac{1}{N}\sum_{i=1}^{N} \mathbf{x}_{it}$, $\Delta\bar{y}_{t-j} = \frac{1}{N}\sum_{i=1}^{N} \Delta y_{i,t-j}$, $\Delta = (1 - L)$, and $L$ denotes the lag operator. The resulting pooled estimator will again be asymptotically normally distributed.

When cross-sectional dependency is of unknown form, Chang (2002) suggests using nonlinear transformations, $F(y_{i,t-1})$, of the lagged level variable, $y_{i,t-1}$, as instruments (IV) for the usual augmented Dickey-Fuller (1979) type regression. The test statistic for the unit root hypothesis is simply defined as a standardized sum of individual IV t-ratios. As long as $F(\cdot)$ is regularly integrable, say $F(y_{i,t-1}) = y_{i,t-1} e^{-c_i |y_{i,t-1}|}$, where $c_i$ is a positive constant, the nonlinear instruments $F(y_{i,t-1})$ and $F(y_{j,t-1})$ from different cross-sectional units $i$ and $j$ are asymptotically uncorrelated, even if the variables $y_{i,t-1}$ and $y_{j,t-1}$ generating the instruments are correlated. Hence, the usual central limit theorems can be invoked and the standardized sum of individual IV t-ratios is asymptotically normally distributed.
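The two devices just described are simple to compute. The sketch below builds the cross-sectional means that enter (4.41) as extra regressors and evaluates the instrument-generating function F(y) = y·exp(−c|y|) given as an example for Chang (2002). It assumes a balanced panel with a scalar regressor stored as nested lists; the function names are hypothetical, and no estimation or testing step is implemented.

```python
import math


def cross_section_means(panel):
    """Means over units i for each period t; panel[i][t], balanced panel assumed."""
    n_units = len(panel)
    return [sum(column) / n_units for column in zip(*panel)]


def augmented_regressors(x, y):
    """For each unit i and period t, return (x_it, ybar_t, xbar_t): the
    regressors of (4.41) other than the individual effect (scalar x assumed)."""
    ybar = cross_section_means(y)
    xbar = cross_section_means(x)
    return [[(x[i][t], ybar[t], xbar[t]) for t in range(len(ybar))]
            for i in range(len(y))]


def chang_instrument(y_lag, c):
    """F(y) = y * exp(-c * |y|) with c > 0, the example function in the text."""
    if c <= 0:
        raise ValueError("c must be positive")
    return y_lag * math.exp(-c * abs(y_lag))


if __name__ == "__main__":
    y = [[1.0, 2.0], [3.0, 6.0]]   # N = 2 units, T = 2 periods
    x = [[0.0, 1.0], [2.0, 5.0]]
    print(cross_section_means(y))
    print(augmented_regressors(x, y)[1])
    print([chang_instrument(v, 0.5) for v in (-2.0, 0.0, 2.0, 50.0)])
```

The instrument function is odd in y and decays to zero for large |y|, which is the integrability property the argument above relies on.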
For further review of the literature on unit roots and cointegration in panels, see Breitung and Pesaran (2005) and Choi (2004). However, a more fundamental issue of panel modeling with large $N$ and large $T$ is whether the standard approach of formulating unobserved heterogeneity for data with finite $T$ remains a good approximation to the true data generating process when $T$ is large.

5. Concluding Remarks

In this paper we have tried to provide a summary of the advantages of using panel data and the fundamental issues of panel data analysis. Assuming that the heterogeneity across cross-sectional units and over time that is not captured by the observed variables can be captured by period-invariant individual-specific and/or individual-invariant time-specific effects, we surveyed the fundamental methods for the analysis of linear static and dynamic models. We have also discussed the difficulties of analyzing nonlinear models and of modeling cross-sectional dependence. There are many important issues, such as the modeling of joint dependence or simultaneous equations models, varying parameter models (e.g. Hsiao (1992, 2003), Hsiao and Pesaran (2005)), unbalanced panels, measurement errors (Griliches and Hausman (1986), Wansbeek and Koning (1989)), nonparametric or semiparametric approaches, repeated cross-section data, etc., that are not discussed but are of no less importance.

Although panel data offer many advantages, they are not a panacea. The power of panel data to isolate the effects of specific actions, treatments or more general policies depends critically on the compatibility of the assumptions of the statistical tools with the data generating process. In choosing a proper method for exploiting the richness and unique properties of the panel, it might be helpful to keep the following factors in mind: First, what advantages do panel data offer us in investigating economic issues over data sets consisting of a single cross section or time series?
Second, what are the limitations of panel data and of the econometric methods that have been proposed for analyzing such data? Third, when using panel data, how can we increase the efficiency of parameter estimates and the reliability of statistical inference? Fourth, are the assumptions underlying the statistical inference procedures and the data-generating process compatible?

References

Ahn, S.C. and P. Schmidt (1995), “Efficient Estimation of Models for Dynamic Panel Data”, Journal of Econometrics, 68, 5-27.
Aigner, D.J., C. Hsiao, A. Kapteyn and T. Wansbeek (1985), “Latent Variable Models in Econometrics”, in Handbook of Econometrics, vol. II, ed. by Z. Griliches and M.D. Intriligator, Amsterdam: North-Holland, 1322-1393.
Anderson, T.W. (1959), “On Asymptotic Distributions of Estimates of Parameters of Stochastic Difference Equations”, Annals of Mathematical Statistics, 30, 676-687.
Anderson, T.W. and C. Hsiao (1981), “Estimation of Dynamic Models with Error Components”, Journal of the American Statistical Association, 76, 598-606.
Anderson, T.W. and C. Hsiao (1982), “Formulation and Estimation of Dynamic Models Using Panel Data”, Journal of Econometrics, 18, 47-82.
Anselin, L. (1988), Spatial Econometrics: Methods and Models, Boston: Kluwer.
Anselin, L. and D.A. Griffith (1988), “Do Spatial Effects Really Matter in Regression Analysis?”, Papers of the Regional Science Association, 65, 11-34.
Anselin, L., J. Le Gallo and H. Jayet (2005), “Spatial Panel Econometrics”, mimeo.
Antweiler, W. (2001), “Nested Random Effects Estimation in Unbalanced Panel Data”, Journal of Econometrics, 101, 295-313.
Arellano, M. (2001), “Discrete Choice with Panel Data”, working paper 0101, CEMFI, Madrid.
Arellano, M. (2003), Panel Data Econometrics, Oxford: Oxford University Press.
Arellano, M. and S.R.
Bond (1991), “Some Tests of Specification for Panel Data: Monte Carlo Evidence and an Application to Employment Equations”, Review of Economic Studies, 58, 277-297.
Arellano, M. and O. Bover (1995), “Another Look at the Instrumental Variable Estimation of Error-Components Models”, Journal of Econometrics, 68, 29-51.
Arellano, M., O. Bover and J. Labeaga (1999), “Autoregressive Models with Sample Selectivity for Panel Data”, in Analysis of Panels and Limited Dependent Variable Models, ed. by C. Hsiao, K. Lahiri, L.F. Lee and M.H. Pesaran, Cambridge: Cambridge University Press, 23-48.
Bai, J. and S. Ng (2002), “Determining the Number of Factors in Approximate Factor Models”, Econometrica, 70, 91-121.
Balestra, P. and M. Nerlove (1966), “Pooling Cross-Section and Time Series Data in the Estimation of a Dynamic Model: The Demand for Natural Gas”, Econometrica, 34, 585-612.
Baltagi, B.H. (2001), Econometric Analysis of Panel Data, Second edition, New York: Wiley.
Baltagi, B.H. and C. Kao (2000), “Nonstationary Panels, Cointegration in Panels and Dynamic Panels: A Survey”, in Nonstationary Panels, Panel Cointegration, and Dynamic Panels, ed. by B. Baltagi, Advances in Econometrics, vol. 15, Amsterdam: JAI Press, 7-52.
Becketti, S., W. Gould, L. Lillard and F. Welch (1988), “The Panel Study of Income Dynamics After Fourteen Years: An Evaluation”, Journal of Labor Economics, 6, 472-492.
Ben-Porath, Y. (1973), “Labor Force Participation Rates and the Supply of Labor”, Journal of Political Economy, 81, 697-704.
Bhargava, A. and J.D. Sargan (1983), “Estimating Dynamic Random Effects Models from Panel Data Covering Short Time Periods”, Econometrica, 51, 1635-1659.
Binder, M., C. Hsiao and M.H. Pesaran (2005), “Estimation and Inference in Short Panel Vector Autoregressions with Unit Roots and Cointegration”, Econometric Theory, 21, 795-837.
Biorn, E. (1992), “Econometrics of Panel Data with Measurement Errors”, in Econometrics of Panel Data: Theory and Applications, ed. by L. Mátyás and P.
Sevestre, Kluwer, 152-195.
Breitung, J. and M.H. Pesaran (2005), “Unit Roots and Cointegration in Panels”, in The Econometrics of Panel Data, Kluwer (forthcoming).
Butler, J.S. and R. Moffitt (1982), “A Computationally Efficient Quadrature Procedure for the One Factor Multinomial Probit Model”, Econometrica, 50, 761-764.
Carro, J.M. (2005), “Estimating Dynamic Panel Data Discrete Choice Models with Fixed Effects”, Journal of Econometrics (forthcoming).
Chamberlain, G. (1980), “Analysis of Covariance with Qualitative Data”, Review of Economic Studies, 47, 225-238.
Chamberlain, G. (1984), “Panel Data”, in Handbook of Econometrics, vol. II, ed. by Z. Griliches and M. Intriligator, Amsterdam: North-Holland, 1247-1318.
Chamberlain, G. (1993), “Feedback in Panel Data Models”, mimeo, Department of Economics, Harvard University.
Chang, Y. (2002), “Nonlinear IV Unit Root Tests in Panels with Cross-Sectional Dependency”, Journal of Econometrics, 110, 261-292.
Choi, I. (2001), “Unit Root Tests for Panel Data”, Journal of International Money and Finance, 20, 249-272.
Choi, I. (2004), “Nonstationary Panels”, in Palgrave Handbooks of Econometrics, vol. I (forthcoming).
Conley, T.G. (1999), “GMM Estimation with Cross-sectional Dependence”, Journal of Econometrics, 92, 1-45.
Cox, D.R. and Reid (1987), “Parameter Orthogonality and Approximate Conditional Inference”, Journal of the Royal Statistical Society, B, 49, 1-39.
Davis, P. (1999), “Estimating Multi-way Error Components Models with Unbalanced Panel Data Structure”, MIT Sloan School.
Dickey, D.A. and W.A. Fuller (1979), “Distribution of the Estimators for Autoregressive Time Series with a Unit Root”, Journal of the American Statistical Association, 74, 427-431.
Dickey, D.A. and W.A. Fuller (1981), “Likelihood Ratio Statistics for Autoregressive Time Series with a Unit Root”, Econometrica, 49, 1057-1072.
Granger, C.W.J. (1990), “Aggregation of Time-Series Variables: A Survey”, in Disaggregation in Econometric Modeling, ed. by T. Barker and M.H. Pesaran, London: Routledge.
Griliches, Z. (1967), “Distributed Lags: A Survey”, Econometrica, 35, 16-49.
Griliches, Z. and J.A. Hausman (1986), “Errors-in-Variables in Panel Data”, Journal of Econometrics, 31, 93-118.
Hausman, J.A. (1978), “Specification Tests in Econometrics”, Econometrica, 46, 1251-71.
Heckman, J.J. (1981), “Statistical Models for Discrete Panel Data”, in Structural Analysis of Discrete Data with Econometric Applications, ed. by C.F. Manski and D. McFadden, Cambridge, Mass.: MIT Press, 114-178.
Heckman, J.J., H. Ichimura, J. Smith and P. Todd (1998), “Characterizing Selection Bias Using Experimental Data”, Econometrica, 66, 1017-1098.
Honoré, Bo (1992), “Trimmed LAD and Least Squares Estimation of Truncated and Censored Regression Models with Fixed Effects”, Econometrica, 60, 533-567.
Honoré, Bo and E. Kyriazidou (2000), “Panel Data Discrete Choice Models with Lagged Dependent Variables”, Econometrica, 68, 839-874.
Horowitz, J.L. (1992), “A Smoothed Maximum Score Estimator for the Binary Response Model”, Econometrica, 60, 505-531.
Hsiao, C. (1986), Analysis of Panel Data, Econometric Society monographs No. 11, New York: Cambridge University Press.
Hsiao, C. (1992), “Random Coefficient Models”, in The Econometrics of Panel Data, ed. by L. Matyas and P. Sevestre, Kluwer, 1st edition, 223-241, 2nd ed. (1996), 410-428.
Hsiao, C. (2003), Analysis of Panel Data, 2nd edition, Cambridge: Cambridge University Press (Econometric Society monograph no. 34).
Hsiao, C. (2005), “Why Panel Data?”, Singapore Economic Review, 50(2), 1-12.
Hsiao, C. (2006), “Longitudinal Data Analysis”, in The New Palgrave Dictionary of Economics, MacMillan (forthcoming).
Hsiao, C. and M.H. Pesaran (2005), “Random Coefficients Models”, in The Econometrics of Panel Data, ed. by L. Matyas and P. Sevestre, Kluwer (forthcoming).
Hsiao, C. and T. Tahmiscioglu (2005), “Estimation of Dynamic Panel Data Models with Both Individual and Time Specific Effects”.
Hsiao, C., T.W. Appelbe, and C.R.
Dineen (1993), “A General Framework for Panel Data Analysis - With an Application to Canadian Customer Dialed Long Distance Service”, Journal of Econometrics, 59, 63-86.
Hsiao, C., D.C. Mountain and K. Ho-Illman (1995), “Bayesian Integration of End-Use Metering and Conditional Demand Analysis”, Journal of Business and Economic Statistics, 13, 315-326.
Hsiao, C., M.W. Luke Chan, D.C. Mountain and K.Y. Tsui (1989), “Modeling Ontario Regional Electricity System Demand Using a Mixed Fixed and Random Coefficients Approach”, Regional Science and Urban Economics, 19, 567-587.
Hsiao, C., M.H. Pesaran and A.K. Tahmiscioglu (2002), “Maximum Likelihood Estimation of Fixed Effects Dynamic Panel Data Models Covering Short Time Periods”, Journal of Econometrics, 109, 107-150.
Hsiao, C., Y. Shen and H. Fujiki (2005), “Aggregate vs Disaggregate Data Analysis - A Paradox in the Estimation of Money Demand Function of Japan Under the Low Interest Rate Policy”, Journal of Applied Econometrics, 20, 579-601.
Hsiao, C., Y. Shen, B. Wang and G. Weeks (2005), “Evaluating the Effectiveness of Washington State Repeated Job Search Services on the Employment Rate of Prime-age Female Welfare Recipients”, mimeo.
Im, K., M.H. Pesaran and Y. Shin (2003), “Testing for Unit Roots in Heterogeneous Panels”, Journal of Econometrics, 115, 53-74.
Juster, T. (2000), “Economics/Micro Data”, in International Encyclopedia of Social Sciences (forthcoming).
Kao, C. (1999), “Spurious Regression and Residual-Based Tests for Cointegration in Panel Data”, Journal of Econometrics, 90, 1-44.
Kyriazidou, E. (1997), “Estimation of a Panel Data Sample Selection Model”, Econometrica, 65, 1335-1364.
Lee, M.J. (1999), “A Root-N-Consistent Semiparametric Estimator for Related Effects Binary Response Panel Data”, Econometrica, 67, 427-433.
Lee, M.J. (2005), Micro-Econometrics for Policy, Program and Treatment Analysis, Oxford: Oxford University Press.
Lewbel, A. (1994), “Aggregation and Simple Dynamics”, American Economic Review, 84, 905-918.
Levin, A., C.
Lin, and J. Chu (2002), “Unit Root Tests in Panel Data: Asymptotic and Finite-Sample Properties”, Journal of Econometrics, 108, 1-24.
MaCurdy, T.E. (1981), “An Empirical Model of Labor Supply in a Life Cycle Setting”, Journal of Political Economy, 89, 1059-85.
Manski, C.F. (1987), “Semiparametric Analysis of Random Effects Linear Models from Binary Panel Data”, Econometrica, 55, 357-362.
Mátyás, L. and P. Sevestre, ed. (1996), The Econometrics of Panel Data - Handbook of Theory and Applications, 2nd ed., Dordrecht: Kluwer.
Moon, H.R. and B. Perron (2004), “Testing for a Unit Root in Panels with Dynamic Factors”, Journal of Econometrics, 122, 81-126.
Nerlove, M. (2002), Essays in Panel Data Econometrics, Cambridge: Cambridge University Press.
Neyman, J. and E.L. Scott (1948), “Consistent Estimates Based on Partially Consistent Observations”, Econometrica, 16, 1-32.
Nickell, S. (1981), “Biases in Dynamic Models with Fixed Effects”, Econometrica, 49, 1399-1416.
Pakes, A. and Z. Griliches (1984), “Estimating Distributed Lags in Short Panels with an Application to the Specification of Depreciation Patterns and Capital Stock Constructs”, Review of Economic Studies, 51, 243-262.
Pesaran, M.H. (2003), “On Aggregation of Linear Dynamic Models: An Application to Life-Cycle Consumption Models Under Habit Formation”, Economic Modeling, 20, 227-435.
Pesaran, M.H. (2004), “Estimation and Inference in Large Heterogeneous Panels with Cross-Sectional Dependence”, mimeo.
Pesaran, M.H. (2005), “A Simple Panel Unit Root Test in the Presence of Cross-Section Dependence”, mimeo.
Phillips, P.C.B. (1986), “Understanding Spurious Regressions in Econometrics”, Journal of Econometrics, 33, 311-340.
Phillips, P.C.B. and S.N. Durlauf (1986), “Multiple Time Series Regression with Integrated Processes”, Review of Economic Studies, 53, 473-495.
Phillips, P.C.B. and H.R. Moon (1999), “Linear Regression Limit Theory for Nonstationary Panel Data”, Econometrica, 67, 1057-1111.
Rao, C.R. (1973), Linear Statistical Inference and Its Applications, 2nd ed., New York: Wiley.
Rosenbaum, P. and D. Rubin (1985), “Reducing Bias in Observational Studies Using Subclassification on the Propensity Score”, Journal of the American Statistical Association, 79, 516-524.
Wansbeek, T.J. and R.H. Koning (1989), “Measurement Error and Panel Data”, Statistica Neerlandica, 45, 85-92.
Ziliak, J.P. (1997), “Efficient Estimation with Panel Data When Instruments are Predetermined: An Empirical Comparison of Moment-Condition Estimators”, Journal of Business and Economic Statistics, 15, 419-431.

Figures 1, 2 and 3: Scatter Diagrams of (y(i,t), x(i,t)), with y on the vertical axis and x on the horizontal axis.