WISE WORKING PAPER SERIES
WISEWP0602
Panel Data Analysis - Advantages and Challenges
Cheng Hsiao
April 19, 2006
COPYRIGHT© WISE, XIAMEN UNIVERSITY, CHINA
Cheng Hsiao∗
Department of Economics
University of Southern California
Los Angeles, CA 90089-0253
and
Wang Yanan Institute for Studies in Economics
Xiamen University, China
ABSTRACT
We explain the proliferation of panel data studies in terms of (i) data availability, (ii)
a greater capacity for modeling the complexity of human behavior than a single
cross-section or time series data set can possibly allow, and (iii) challenging methodology.
Advantages and issues of panel data modeling are also discussed.
Keywords: Panel data; Longitudinal data; Unobserved heterogeneity; Random effects;
Fixed effects
∗ I would like to thank Irene C. Hsiao for helpful discussion and editorial assistance and Kannika Damrongplasit for drawing the figures. Some of the arguments presented here also appear in Hsiao (2005, 2006).
1. Introduction
Panel data or longitudinal data typically refer to data containing time series observations of a number of individuals. Observations in panel data therefore involve at least two dimensions: a cross-sectional dimension, indicated by subscript i, and a time series dimension, indicated by subscript t. However, panel data could have a more complicated
clustering or hierarchical structure. For instance, variable y may be the measurement of
the level of air pollution at station ℓ in city j of country i at time t (e.g. Antweiler (2001),
Davis (1999)). For ease of exposition, I shall confine my presentation to a balanced panel
involving N cross-sectional units, i = 1, . . . , N , over T time periods, t = 1, . . . , T .
There is a proliferation of panel data studies, be they methodological or empirical. In 1986, when the first edition of Hsiao's Analysis of Panel Data was published, there were 29 studies listing the key words "panel data or longitudinal data," according to the Social Sciences Citation Index. By 2004 there were 687, and by 2005 there were 773. The growth of applied studies and the methodological development of new econometric tools for panel data have been simply phenomenal since the seminal paper of Balestra and Nerlove (1966).
There are at least three factors contributing to the geometric growth of panel data studies: (i) data availability; (ii) greater capacity for modeling the complexity of human behavior than a single cross-section or time series data set; and (iii) challenging methodology. In what follows, we shall briefly elaborate on each of these factors. However, it is impossible to do justice to the vast literature on panel data; for further reference, see Arellano (2003), Baltagi (2001), Hsiao (2003), Mátyás and Sevestre (1996), and Nerlove (2002), among others.
2. Data Availability
The collection of panel data is obviously much more costly than the collection of cross-sectional or time series data. However, panel data have become widely available in both developed and developing countries.
The two most prominent panel data sets in the US are the National Longitudinal Surveys of Labor Market Experience (NLS) and the University of Michigan's Panel Study of Income Dynamics (PSID). The NLS began in the mid-1960s. It contains five separate annual surveys covering distinct segments of the labor force with different spans: men whose ages were 45 to 59 in 1966, young men 14 to 24 in 1966, women 30 to 44 in 1967, young women 14 to 24 in 1968, and youth of both sexes 14 to 21 in 1979. In 1986, the NLS expanded to include annual surveys of the children born to women who participated in the National Longitudinal Survey of Youth 1979. The list of surveyed variables runs into the thousands, with emphasis on the supply side of the labor market.
The PSID began with collection of annual economic information from a representative
national sample of about 6,000 families and 15,000 individuals in 1968 and has continued
to the present. The data set contains over 5,000 variables (Becketti, Gould, Lillard and
Welch (1988)). In addition to the NLS and PSID data sets, there are many other panel
data sets that could be of interest to economists, see Juster (2000).
In Europe, many countries have annual national or more frequent surveys, such as the Netherlands Socio-Economic Panel (SEP), the German Socio-Economic Panel (GSOEP), the Luxembourg Social Panel (PSELL), and the British Household Panel Survey (BHPS). Starting in 1994, the National Data Collection Units (NDUS) of the Statistical Office of the European Communities have been coordinating and linking existing national panels with centrally designed multi-purpose annual longitudinal surveys. The European Community Household Panel (ECHP) data are published in Eurostat's reference database New Cronos in three domains: health, housing, and income and living conditions.
Panel data have also become increasingly available in developing countries, where there may not be a long tradition of statistical collection, so it is of special importance to obtain original survey data to answer many significant questions. Many international agencies have sponsored and helped to design panel surveys. For instance, the Dutch non-governmental organization (NGO) ICS Africa collaborated with the Kenya Ministry of Health to carry out a Primary School Deworming Project (PSDP). The project took place in Busia district, a poor and densely settled farming region in
western Kenya. The 75 project schools include nearly all rural primary schools in this area, with over 30,000 enrolled pupils between the ages of six and eighteen from 1998 to 2001.
Another example is the Development Research Institute of the Research Center for Rural
Development of the State Council of China, in collaboration with the World Bank, which
undertook an annual survey of 200 large Chinese township and village enterprises from
1984 to 1990.
3. Advantages of Panel Data
Panel data, by blending inter-individual differences and intra-individual dynamics, have several advantages over cross-sectional or time-series data:
(i) More accurate inference of model parameters. Panel data usually contain more degrees of freedom and less multicollinearity than cross-sectional data, which may be viewed as a panel with T = 1, or time series data, which is a panel with N = 1, hence improving the efficiency of econometric estimates (e.g. Hsiao, Mountain and Ho-Illman (1995)).
(ii) Greater capacity for capturing the complexity of human behavior than a single
cross-section or time series data. These include:
(ii.a) Constructing and testing more complicated behavioral hypotheses. For instance, consider the example of Ben-Porath (1973): a cross-sectional sample of married women was found to have an average yearly labor-force participation rate of 50 percent. This could be the outcome of random draws from a homogeneous population, or of draws from heterogeneous populations in which 50% always work and 50% never work. If the sample was from the former, each woman would be expected to spend half of her married life in the labor force and half out of it; job turnover would be expected to be frequent, and the average job duration would be about two years. If the sample was from the latter, there is no turnover: the current information about a woman's work status is a perfect predictor of her future work status. A single cross-section is not able to distinguish between these two possibilities, but panel data can, because the sequential observations for a number of women contain information about their labor participation in different subintervals of their life cycle.
Another example is the evaluation of the effectiveness of social programs (e.g. Heckman, Ichimura, Smith and Todd (1998), Hsiao, Shen, Wang and Wang (2005), Rosenbaum and Rubin (1985)). Evaluating the effectiveness of certain programs using a cross-sectional sample typically suffers from the fact that those receiving treatment are different from those without. In other words, one does not simultaneously observe what happens to an individual when she receives the treatment and when she does not; an individual is observed either receiving treatment or not receiving treatment. Using the difference between the treatment group and the control group could suffer from two sources of bias: selection bias due to differences in observable factors between the treatment and control groups, and selection bias due to endogeneity of participation in treatment. For instance, the Northern Territory (NT) in Australia decriminalized possession of small amounts of marijuana in 1996. Evaluating the effects of decriminalization on marijuana smoking behavior by comparing the differences between the NT and other states that had not decriminalized could suffer from either or both sorts of bias. If panel data over this time period are available, they allow the possibility of observing the before and after effects of decriminalization on individuals, as well as the possibility of isolating the effects of the treatment from other factors affecting the outcome.
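The Ben-Porath point above can be made concrete with a small simulation (a hypothetical sketch, not from the paper): two populations share the same 50 percent cross-sectional participation rate, but only the panel's within-person turnover separates the homogeneous process from the heterogeneous one.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 5000, 10

# Homogeneous population: each woman works in any period with probability 0.5,
# independently over time, so turnover is frequent.
homog = rng.random((N, T)) < 0.5

# Heterogeneous population: half always work, half never work, so no turnover.
heter = np.zeros((N, T), dtype=bool)
heter[: N // 2, :] = True

for name, y in [("homogeneous", homog), ("heterogeneous", heter)]:
    rate = y[:, 0].mean()                      # what a single cross-section shows
    turnover = (y[:, 1:] != y[:, :-1]).mean()  # within-person status changes
    print(f"{name}: participation {rate:.2f}, turnover {turnover:.2f}")
```

Both populations report a participation rate near 0.5 in the first cross-section; the within-person turnover rate is near 0.5 for the homogeneous process and exactly 0 for the heterogeneous one, which only the panel reveals.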
(ii.b) Controlling the impact of omitted variables. It is frequently argued that the real reason one finds (or does not find) certain effects is the omission from one's model specification of certain variables that are correlated with the included explanatory variables. Panel data, which contain information on both the intertemporal dynamics and the individuality of the entities, may allow one to control for the effects of missing or unobserved variables. For instance, MaCurdy's (1981) life-cycle labor supply model under certainty implies that the logarithm of a worker's hours worked is a linear function of the logarithm of her wage rate and the logarithm of her marginal utility of initial wealth. Leaving the logarithm of the worker's marginal utility of initial wealth out of the regression of hours worked on the wage rate because it is unobserved can lead to seriously biased inference on the wage elasticity of hours worked, since initial wealth is likely to be correlated with the wage rate. However, since a worker's marginal utility of initial wealth stays constant over time, if time series observations on an individual are available, one can take the difference of the worker's labor supply equation over time to eliminate the effect of the marginal utility of initial wealth on hours worked. The rate of change of an individual's hours worked then depends only on the rate of change of her wage rate; it no longer depends on her marginal utility of initial wealth.
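The differencing argument can be sketched numerically. In this hypothetical simulation, the unobserved time-invariant effect plays the role of the marginal utility of initial wealth and is correlated with the regressor; pooled least squares is badly biased, while first differencing eliminates the effect and recovers the true slope.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, beta = 1000, 5, 0.8

alpha = rng.normal(size=(N, 1))              # unobserved, constant over time
x = alpha + rng.normal(size=(N, T))          # regressor correlated with alpha
y = alpha + beta * x + 0.1 * rng.normal(size=(N, T))

# Pooled OLS ignores alpha_i: omitted-variable bias pushes the slope up.
xc, yc = x - x.mean(), y - y.mean()
b_pooled = np.sum(xc * yc) / np.sum(xc * xc)

# First differencing removes alpha_i: dy_it = beta * dx_it + du_it.
dx, dy = np.diff(x, axis=1), np.diff(y, axis=1)
b_fd = np.sum(dx * dy) / np.sum(dx * dx)

print(f"pooled: {b_pooled:.2f}, first-difference: {b_fd:.2f}")
```

With these particular variances the pooled slope converges to (1 + 2β)/2 = 1.3 rather than β = 0.8, while the first-difference estimate is consistent.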
(ii.c) Uncovering dynamic relationships.
"Economic behavior is inherently dynamic so that most econometrically interesting relationships are explicitly or implicitly dynamic" (Nerlove (2002)). However, the estimation of time-adjustment patterns using time series data often has to rely on arbitrary prior restrictions, such as Koyck or Almon distributed lag models, because time series observations of current and lagged variables are likely to be highly collinear (e.g. Griliches (1967)). With panel data, we can rely on the inter-individual differences to reduce the collinearity between current and lagged variables and estimate unrestricted time-adjustment patterns (e.g. Pakes and Griliches (1984)).
(ii.d) Generating more accurate predictions for individual outcomes by pooling
the data rather than generating predictions of individual outcomes using
the data on the individual in question. If individual behaviors are similar
conditional on certain variables, panel data provide the possibility of learning
an individual’s behavior by observing the behavior of others. Thus, it is
possible to obtain a more accurate description of an individual’s behavior by
supplementing observations of the individual in question with data on other
individuals (e.g. Hsiao, Appelbe and Dineen (1993), Hsiao, Chan, Mountain
and Tsui (1989)).
(ii.e) Providing micro foundations for aggregate data analysis.
Aggregate data analysis often invokes the “representative agent” assumption.
However, if micro units are heterogeneous, not only can the time series properties of aggregate data be very different from those of disaggregate data
(e.g., Granger (1990); Lewbel (1992); Pesaran (2003)), but policy evaluation based on aggregate data may be grossly misleading. Furthermore, the
prediction of aggregate outcomes using aggregate data can be less accurate
than the prediction based on micro-equations (e.g., Hsiao, Shen and Fujiki
(2005)). Panel data containing time series observations for a number of individuals are ideal for investigating the "homogeneity" versus "heterogeneity" issue.
(iii) Simplifying computation and statistical inference.
Panel data involve at least two dimensions, a cross-sectional dimension and a
time series dimension. Under normal circumstances one would expect that the computation of panel data estimators or inference would be more complicated than for cross-sectional or time series data. However, in certain cases, the availability of panel data actually simplifies computation and inference. For instance:
(iii.a) Analysis of nonstationary time series.
When time series data are not stationary, the large-sample distributions of the least-squares or maximum likelihood estimators are no longer normal (e.g. Anderson (1959), Dickey and Fuller (1979, 1981), Phillips and Durlauf (1986)). But if panel data are available, and observations among cross-sectional units are independent, then one can invoke the central limit theorem across cross-sectional units to show that the limiting distributions of many estimators remain asymptotically normal (e.g. Binder, Hsiao and Pesaran (2005), Levin, Lin and Chu (2002), Im, Pesaran and Shin (2004), Phillips and Moon (1999)).
(iii.b) Measurement errors.
Measurement errors can lead to under-identification of an econometric model (e.g. Aigner, Hsiao, Kapteyn and Wansbeek (1985)). The availability of multiple observations for a given individual or at a given time may allow a researcher to make different transformations to induce different and deducible changes in the estimators, and hence to identify an otherwise unidentified model (e.g. Biorn (1992), Griliches and Hausman (1986), Wansbeek and Koning (1989)).
(iii.c) Dynamic Tobit models. When a variable is truncated or censored, the actual realized value is unobserved. If an outcome variable depends on previous realized values and those previous realized values are unobserved, one has to integrate over the truncated range to obtain the likelihood of the observables. In a dynamic framework with multiple missing values, the multiple integration is computationally infeasible. With panel data, the problem can be simplified by focusing only on the subsample in which previous realized values are observed (e.g. Arellano, Bover and Labeaga (1999)).
4. Methodology
Standard statistical methodology is based on the assumption that the outcomes, say y, conditional on certain variables, say x, are random outcomes from a probability distribution that is characterized by a fixed dimensional parameter vector, θ, f(y | x; θ). For instance, the standard linear regression model assumes that f(y | x; θ) takes the form

E(y | x) = α + β′x,   (4.1)

and

Var(y | x) = σ²,   (4.2)

where θ′ = (α, β′, σ²). Typical panel data focus on individual outcomes. Factors affecting individual outcomes are numerous. It is rare to be able to assume a common conditional probability density function of y conditional on x for all cross-sectional units, i, at all times, t. For instance, suppose that in addition to x, individual outcomes are also affected by unobserved individual abilities (or the marginal utility of initial wealth, as in MaCurdy's (1981) labor supply model discussed in (ii.b) of Section 3), represented by α_i, so that the observed (y_it, x_it), i = 1, ..., N, t = 1, ..., T, are actually generated by

y_it = α_i + β′x_it + u_it,   i = 1, ..., N,  t = 1, ..., T,   (4.3)

as depicted in Figures 1, 2 and 3, in which the broken-line ellipses represent the point scatter of individual observations around the mean, represented by the broken straight lines. If an investigator mistakenly imposes the homogeneity assumption (4.1)-(4.2), the solid lines in those figures would represent the estimated relationships between y and x, which can be grossly misleading.
If the conditional density of y given x varies across i and over t, the fundamental theorems for statistical inference, the laws of large numbers and central limit theorems, will be difficult to implement. One way to restore homogeneity across i and/or over t is to add more conditioning variables, say z,

f(y_it | x_it, z_it; θ).   (4.4)

However, the dimension of z can be large. A model is a simplification of reality, not a mimic of reality. The inclusion of z may confuse the fundamental relationship between y and x, in particular when there is a shortage of degrees of freedom, multicollinearity, etc. Moreover, z may not be observable. If an investigator is only interested in the relationship between y and x, one approach to characterize the heterogeneity not captured by x is to assume that the parameter vector varies across i and over t, θ_it, so that the conditional density of y given x takes the form f(y_it | x_it; θ_it). However, without a structure being imposed on θ_it, such a model only has descriptive value; it is not possible to draw any inference about θ_it.

The methodological literature on panel data suggests possible structures on θ_it (e.g. Hsiao (2003)). One way to impose some structure on θ_it is to decompose θ_it into (β, γ_it), where β is the same across i and over t, referred to as the structural parameters, and the γ_it are referred to as incidental parameters because when the number of cross-section units, N, and/or time series observations, T, increases, so does the dimension of γ_it. The focus of the panel data literature is to make inference on β after controlling for the impact of γ_it.

Without imposing a structure on γ_it, it is again difficult to make any inference on β, because the estimation of β could depend on γ_it, and the estimation of the unknown γ_it will probably exhaust all available sample information. Assuming that the impacts of the observable variables, x, are the same across i and over t, represented by the structural parameters β, the incidental parameters γ_it represent the heterogeneity across i and over t that is not captured by x_it. They can be considered to be composed of the effects of omitted individual time-invariant variables, α_i, period individual-invariant variables, λ_t, and individual time-varying variables, δ_it. The individual time-invariant variables are the same for a given cross-sectional unit through time but vary across cross-sectional units, such as individual-firm management, ability, gender, and socio-economic background. The period individual-invariant variables are the same for all cross-sectional units at a given time but vary through time, such as prices, interest rates, and widespread optimism or pessimism. The individual time-varying variables vary across cross-sectional units at a given point in time and also exhibit variation through time, such as firm profits, sales and capital stock. The effects of unobserved heterogeneity can be treated either as random variables, referred to as the random effects model, or as fixed parameters, referred to as the fixed effects model, or as a mixture of both, referred to as the mixed effects model.

The challenge of panel methodology is to control the impact of unobserved heterogeneity, represented by the incidental parameters, γ_it, to obtain valid inference on the structural parameters, β. A general principle for obtaining valid inference on β in the presence of the incidental parameters γ_it is to find a proper transformation that eliminates γ_it from the specification; the proper transformation depends on the model one is interested in. As illustrations, I shall try to demonstrate the fundamental issues from the perspective of linear static models, dynamic models, nonlinear models, models with cross-sectional dependence, and models with large N and large T.

For ease of exposition, I shall assume most of the time that there are no time-specific effects, λ_t, and that the individual time-varying effects, δ_it, can be represented by a random variable, u_it, that is treated as the error of an equation. In other words, only individual-specific effects, α_i, are present. The individual-specific effects, α_i, can be assumed to be either random or fixed. The standard assumption for the random effects specification is that they are randomly distributed with a common mean and are independent of x_it.
The advantages of the random effects (RE) specification are: (a) the number of parameters stays constant when the sample size increases; (b) it allows the derivation of efficient estimators that make use of both within and between (group) variation; (c) it allows the estimation of the impact of time-invariant variables. The disadvantage is that one has to specify a conditional density of α_i given x_i′ = (x_i1′, ..., x_iT′), f(α_i | x_i), while the α_i are unobservable. A common assumption is that f(α_i | x_i) is identical to the marginal density f(α_i). However, if the effects are correlated with x_it, or if there is a fundamental difference among individual units (i.e., conditional on x_it, y_it cannot be viewed as a random draw from a common distribution), the common RE model is misspecified and the resulting estimator is biased.

The advantages of the fixed effects (FE) specification are that it allows the individual- and/or time-specific effects to be correlated with the explanatory variables x_it, and that it does not require an investigator to model their correlation patterns. The disadvantages of the FE specification are: (a′) the number of unknown parameters increases with the number of sample observations; in the case when T (or N for λ_t) is finite, this introduces the classical incidental parameter problem (e.g. Neyman and Scott (1948)); (b′) the FE estimator does not allow the estimation of the coefficients of time-invariant variables.
In other words, the advantages of the RE specification are the disadvantages of the FE specification, and the disadvantages of the RE specification are the advantages of the FE specification. To choose between the two specifications, Hausman (1978) notes that the FE estimator (or GMM), θ̂_FE, is consistent whether α_i is fixed or random, while the commonly used RE estimator (or GLS), θ̂_RE, is consistent and efficient only when α_i is indeed uncorrelated with x_it, and is inconsistent if α_i is correlated with x_it. Therefore, he suggests using the statistic

(θ̂_FE − θ̂_RE)′ [Cov(θ̂_FE) − Cov(θ̂_RE)]⁻ (θ̂_FE − θ̂_RE)   (4.5)

to test the RE vs. FE specification. The statistic (4.5) is asymptotically chi-square distributed with degrees of freedom equal to the rank of [Cov(θ̂_FE) − Cov(θ̂_RE)].
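As a minimal sketch (not from the paper), the statistic (4.5) can be computed for a single slope coefficient on simulated data in which α_i is independent of x_it, so the RE specification is valid; for simplicity the variance components are treated as known, whereas in practice they would be estimated.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T, beta = 500, 5, 1.0
s2_a, s2_u = 1.0, 1.0                        # variance components, known here

alpha = rng.normal(scale=np.sqrt(s2_a), size=(N, 1))
x = rng.normal(size=(N, T))                  # independent of alpha: RE is valid
y = alpha + beta * x + rng.normal(scale=np.sqrt(s2_u), size=(N, T))

def slope(xm, ym):
    """Slope and centered sum of squares of a pooled bivariate regression."""
    xc, yc = xm - xm.mean(), ym - ym.mean()
    s = np.sum(xc * xc)
    return np.sum(xc * yc) / s, s

# FE (within) estimator: subtract individual means.
b_fe, s_w = slope(x - x.mean(axis=1, keepdims=True),
                  y - y.mean(axis=1, keepdims=True))

# RE (GLS) estimator: subtract the fraction (1 - psi**0.5) of individual means.
psi = s2_u / (s2_u + T * s2_a)
c = 1 - np.sqrt(psi)
b_re, s_q = slope(x - c * x.mean(axis=1, keepdims=True),
                  y - c * y.mean(axis=1, keepdims=True))

# Hausman statistic (4.5): Var(b_fe) - Var(b_re) = s2_u * (1/s_w - 1/s_q).
H = (b_fe - b_re) ** 2 / (s2_u * (1.0 / s_w - 1.0 / s_q))
print(f"b_fe = {b_fe:.3f}, b_re = {b_re:.3f}, H = {H:.3f}")
```

Under the valid RE specification, H is asymptotically chi-square with one degree of freedom, so a large value of H would reject the RE model in favor of FE.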
4.1 Linear Static Models
A widely used panel data model assumes that the effects of the observed explanatory variables, x, are identical across cross-sectional units, i, and over time, t, while the effects of omitted variables can be decomposed into individual-specific effects, α_i, time-specific effects, λ_t, and individual time-varying effects, δ_it = u_it, as follows:

y_it = β′x_it + α_i + λ_t + u_it,   i = 1, ..., N,  t = 1, ..., T.   (4.6)

In a single equation framework, the individual-time effects, u, are assumed random and uncorrelated with x, while α_i and λ_t may or may not be correlated with x. When α_i and λ_t are treated as fixed constants, as coefficients of dummy explanatory variables (d_it = 1 if the observation corresponds to the ith individual at time t, and 0 otherwise), whether they are correlated with x is not an issue. On the other hand, when α_i and λ_t are treated as random, they become part of the error term and are typically assumed to be uncorrelated with x_it.
For ease of exposition, we shall assume that there are no time-specific effects, i.e., λ_t = 0 for all t, and that the u_it are independently, identically distributed (i.i.d.) across i and over t. Stacking an individual's T time series observations of (y_it, x_it′) into a vector and a matrix, (4.6) may alternatively be written as

y_i = X_iβ + eα_i + u_i,   i = 1, ..., N,   (4.7)

where y_i = (y_i1, ..., y_iT)′, X_i = (x_i1, ..., x_iT)′, u_i = (u_i1, ..., u_iT)′, and e is a T × 1 vector of 1's.

Let Q be a T × T matrix satisfying the condition Qe = 0. Premultiplying (4.7) by Q yields

Qy_i = QX_iβ + Qu_i,   i = 1, ..., N.   (4.8)
Equation (4.8) no longer involves α_i. The issue of whether α_i is correlated with x_it, or whether α_i should be treated as fixed or random, is no longer relevant for (4.8). Moreover, since X_i is exogenous, E(QX_iu_i′Q′) = QE(X_iu_i′)Q′ = 0 and E(Qu_iu_i′Q′) = σ²_u QQ′. An efficient estimator of β is the generalized least squares (GLS) estimator,

β̂ = [Σ_{i=1}^N X_i′(Q′Q)⁻X_i]⁻¹ [Σ_{i=1}^N X_i′(Q′Q)⁻y_i],   (4.9)

where (Q′Q)⁻ denotes the Moore-Penrose generalized inverse (e.g. Rao (1973)).
When Q = I_T − (1/T)ee′, Q is idempotent, and the Moore-Penrose generalized inverse (Q′Q)⁻ is just Q = I_T − (1/T)ee′ itself. Premultiplying (4.8) by this Q is equivalent to transforming (4.6) into the model

(y_it − ȳ_i) = β′(x_it − x̄_i) + (u_it − ū_i),   i = 1, ..., N,  t = 1, ..., T,   (4.10)

where ȳ_i = (1/T)Σ_{t=1}^T y_it, x̄_i = (1/T)Σ_{t=1}^T x_it and ū_i = (1/T)Σ_{t=1}^T u_it. This transformation is called the covariance transformation. The least squares (LS) estimator of (4.10),

β̂_cv = [Σ_{i=1}^N Σ_{t=1}^T (x_it − x̄_i)(x_it − x̄_i)′]⁻¹ [Σ_{i=1}^N Σ_{t=1}^T (x_it − x̄_i)(y_it − ȳ_i)],   (4.11)

is called the covariance estimator or within estimator because the estimation of β makes use of the within (group) variation of y_it and x_it only. It coincides with the GLS estimator (4.9) for this choice of Q, and it is the best linear unbiased estimator of β if α_i is treated as fixed and u_it is i.i.d.
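The covariance transformation and the within estimator (4.10)-(4.11) can be sketched as follows (a hypothetical numerical check, with α_i correlated with the regressors): Q = I_T − (1/T)ee′ annihilates e, is idempotent, and applying it to each individual's block of data reproduces the demeaned regression.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T, k = 200, 6, 2
beta = np.array([1.5, -0.5])

alpha = rng.normal(size=(N, 1))
X = rng.normal(size=(N, T, k)) + alpha[..., None]  # regressors correlated with alpha_i
y = alpha + X @ beta + rng.normal(size=(N, T))

e = np.ones((T, 1))
Q = np.eye(T) - (e @ e.T) / T                # covariance transformation, Qe = 0

# Within / covariance estimator (4.11): least squares on the Q-transformed data.
Xq = (Q @ X).reshape(N * T, k)               # Q applied to each individual's block
yq = (Q @ y[..., None]).reshape(N * T)
b_cv = np.linalg.lstsq(Xq, yq, rcond=None)[0]
print(np.round(b_cv, 2))
```

Applying Q is identical to subtracting individual means, so the unobserved α_i never enters the estimation and the slope estimates are close to the true β despite the correlation between α_i and X.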
If α_i is random, transforming (4.7) into (4.8) transforms T independent equations (or observations) into (T − 1) independent equations; hence the covariance estimator is not as efficient as the generalized least squares estimator if E(α_i x_it′) = 0′. When α_i is independent of x_it and is independently, identically distributed across i with mean 0 and variance σ²_α, the best linear unbiased estimator (BLUE) of β is the GLS estimator,

β̂ = [Σ_{i=1}^N X_i′V⁻¹X_i]⁻¹ [Σ_{i=1}^N X_i′V⁻¹y_i],   (4.12)

where V = σ²_u I_T + σ²_α ee′ and V⁻¹ = (1/σ²_u)[I_T − σ²_α/(σ²_u + Tσ²_α) ee′]. Let ψ = σ²_u/(σ²_u + Tσ²_α); the GLS estimator is equivalent to first transforming the data by subtracting a fraction (1 − ψ^{1/2}) of the individual means, ȳ_i and x̄_i, from their corresponding y_it and x_it, then regressing [y_it − (1 − ψ^{1/2})ȳ_i] on [x_it − (1 − ψ^{1/2})x̄_i] (for details, see Baltagi (2001), Hsiao (2003)).

If a variable is time-invariant, like a gender dummy, x_kit = x_kis = x̄_ki, the covariance transformation eliminates the corresponding variable from the specification. Hence, the coefficients of time-invariant variables cannot be estimated. On the other hand, if α_i is random and uncorrelated with x_i, the GLS, which subtracts only the fraction (1 − ψ^{1/2}) of the individual means, can still estimate the coefficients of those time-invariant variables.
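The equivalence between the GLS formula (4.12) and quasi-demeaning by the fraction 1 − ψ^{1/2} can be checked numerically; in this sketch the variance components are assumed known.

```python
import numpy as np

rng = np.random.default_rng(4)
N, T, beta = 100, 5, 2.0
s2_u, s2_a = 1.0, 0.5                        # variance components, assumed known

alpha = rng.normal(scale=np.sqrt(s2_a), size=(N, 1))
x = rng.normal(size=(N, T))                  # independent of alpha_i
y = alpha + beta * x + rng.normal(scale=np.sqrt(s2_u), size=(N, T))

e = np.ones((T, 1))
V = s2_u * np.eye(T) + s2_a * (e @ e.T)      # Var(e alpha_i + u_i)
Vinv = np.linalg.inv(V)

# GLS (4.12) for a single regressor: sums of x_i' V^-1 y_i over x_i' V^-1 x_i.
b_gls = sum(x[i] @ Vinv @ y[i] for i in range(N)) / \
        sum(x[i] @ Vinv @ x[i] for i in range(N))

# Quasi-demeaning: subtract the fraction (1 - psi**0.5) of individual means.
psi = s2_u / (s2_u + T * s2_a)
c = 1 - np.sqrt(psi)
xs = x - c * x.mean(axis=1, keepdims=True)
ys = y - c * y.mean(axis=1, keepdims=True)
b_qd = np.sum(xs * ys) / np.sum(xs * xs)

print(f"GLS: {b_gls:.6f}, quasi-demeaned LS: {b_qd:.6f}")
```

The two estimates agree to machine precision, which is the algebraic content of the ψ^{1/2} transformation.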
4.2 Dynamic Models
When the regressors of a linear model contain lagged dependent variables, say, of the form (e.g. Balestra and Nerlove (1966))

y_i = y_{i,−1}γ + X_iβ + eα_i + u_i = Z_iθ + eα_i + u_i,   i = 1, ..., N,   (4.13)

where y_{i,−1} = (y_i0, ..., y_{i,T−1})′, Z_i = (y_{i,−1}, X_i) and θ = (γ, β′)′; for ease of notation, we assume that the y_i0 are observable. Technically, we can still eliminate the individual-specific effects by premultiplying (4.13) by the transformation matrix Q (with Qe = 0),

Qy_i = QZ_iθ + Qu_i.   (4.14)
However, because of the presence of lagged dependent variables, E(QZ_iu_i′Q′) ≠ 0 even under the assumption that u_it is independently, identically distributed across i and over t. For instance, the covariance transformation matrix Q = I_T − (1/T)ee′ transforms (4.13) into the form

(y_it − ȳ_i) = (y_{i,t−1} − ȳ_{i,−1})γ + (x_it − x̄_i)′β + (u_it − ū_i),   i = 1, ..., N,  t = 1, ..., T,   (4.15)

where ȳ_i = (1/T)Σ_{t=1}^T y_it, ȳ_{i,−1} = (1/T)Σ_{t=1}^T y_{i,t−1} and ū_i = (1/T)Σ_{t=1}^T u_it. Although y_{i,t−1} and u_it are uncorrelated under the assumption of serial independence of u_it, the covariance between ȳ_{i,−1} and u_it, or between y_{i,t−1} and ū_i, is of order (1/T) if |γ| < 1. Therefore, the covariance estimator of θ has a bias of order (1/T) when N → ∞ (Anderson and Hsiao (1981, 1982), Nickell (1981)). Since most panel data contain large N but small T, the magnitude of the bias cannot be ignored (e.g. with T = 10 and γ = 0.5, the asymptotic bias is −0.167).
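This order-(1/T) bias can be reproduced by a small simulation (a sketch under a stationary AR(1) design with fixed effects): with γ = 0.5 and T = 10 the within estimate falls well below the true value, consistent with the asymptotic bias just cited.

```python
import numpy as np

rng = np.random.default_rng(5)
N, T, g = 2000, 10, 0.5

alpha = rng.normal(size=N)
y = np.empty((N, T + 1))
# Draw y_i0 from the stationary distribution given alpha_i.
y[:, 0] = alpha / (1 - g) + rng.normal(size=N) / np.sqrt(1 - g**2)
for t in range(1, T + 1):
    y[:, t] = alpha + g * y[:, t - 1] + rng.normal(size=N)

# Within estimator: demean y_it and y_{i,t-1} over t = 1, ..., T.
ycur, ylag = y[:, 1:], y[:, :-1]
yc = ycur - ycur.mean(axis=1, keepdims=True)
yl = ylag - ylag.mean(axis=1, keepdims=True)
g_within = np.sum(yl * yc) / np.sum(yl * yl)
print(f"within estimate: {g_within:.3f} (true gamma = {g})")
```

Even with N = 2000 individuals the within estimate stays far from 0.5; increasing N does not help, because the bias vanishes only as T grows.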
When E(QZ_iu_i′Q′) ≠ 0, one way to obtain a consistent estimator of θ is to find instruments W_i that satisfy

E(W_iu_i′Q′) = 0,   (4.16)

and

rank(W_iQZ_i) = k,   (4.17)

where k denotes the dimension of (γ, β′)′, and then apply the generalized instrumental variable or generalized method of moments (GMM) estimator obtained by minimizing the objective function

[Σ_{i=1}^N W_i(Qy_i − QZ_iθ)]′ [Σ_{i=1}^N W_iQu_iu_i′Q′W_i′]⁻¹ [Σ_{i=1}^N W_i(Qy_i − QZ_iθ)]   (4.18)

with respect to θ (e.g. Arellano (2003), Ahn and Schmidt (1995), Arellano and Bond (1991), Arellano and Bover (1995)). For instance, one may let Q be the (T − 1) × T matrix

D = [ −1   1   0  ···   0
       0  −1   1  ···   0
       ·    ·   ·   ·    ·
       0   ···  ·  −1   1 ];   (4.19)

then the transformation (4.14) is equivalent to taking the first difference of (4.13) over time to eliminate α_i for t = 2, ..., T:

∆y_it = ∆y_{i,t−1}γ + ∆x_it′β + ∆u_it,   i = 1, ..., N,  t = 2, ..., T,   (4.20)

where ∆ = (1 − L) and L denotes the lag operator, Ly_t = y_{t−1}. Since ∆u_it = (u_it − u_{i,t−1}) is uncorrelated with y_{i,t−j} for j ≥ 2 and with x_is for all s, when u_it is independently distributed over time and x_it is exogenous, one can let W_i be the T(T − 1)[K + 1/2] × (T − 1) block-diagonal matrix

W_i = diag(q_i2, q_i3, ..., q_iT),   (4.21)
where q_it = (y_i0, y_i1, ..., y_{i,t−2}, x_i′)′, x_i = (x_i1′, ..., x_iT′)′, and K = k − 1. Under the assumption that the (y_i′, x_i′)′ are independently, identically distributed across i, the Arellano-Bond (1991) GMM estimator takes the form

θ̂_{AB,GMM} = {[Σ_{i=1}^N Z_i′D′W_i′][Σ_{i=1}^N W_iAW_i′]⁻¹[Σ_{i=1}^N W_iDZ_i]}⁻¹ {[Σ_{i=1}^N Z_i′D′W_i′][Σ_{i=1}^N W_iAW_i′]⁻¹[Σ_{i=1}^N W_iDy_i]},   (4.22)

where A is a (T − 1) × (T − 1) matrix with 2 on the diagonal elements, −1 on the elements above and below the diagonal, and 0 elsewhere.
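Rather than the full GMM estimator (4.22), a minimal sketch can use the simpler Anderson-Hsiao (1981) instrumental variable idea on the first-differenced equation (4.20), with y_{i,t−2} as the instrument for ∆y_{i,t−1}; unlike the within estimator, it is consistent for fixed T.

```python
import numpy as np

rng = np.random.default_rng(6)
N, T, g = 20000, 6, 0.5

alpha = rng.normal(size=N)
y = np.empty((N, T + 1))
y[:, 0] = alpha / (1 - g) + rng.normal(size=N) / np.sqrt(1 - g**2)
for t in range(1, T + 1):
    y[:, t] = alpha + g * y[:, t - 1] + rng.normal(size=N)

dy = np.diff(y, axis=1)        # dy[:, j] = y_{i,j+1} - y_{i,j}
# Regress dy_it on dy_{i,t-1} for t = 2, ..., T, instrumenting dy_{i,t-1}
# with the level y_{i,t-2}, which is uncorrelated with du_it = u_it - u_{i,t-1}.
dep = dy[:, 1:].ravel()        # dy_it
reg = dy[:, :-1].ravel()       # dy_{i,t-1}
ins = y[:, :-2].ravel()        # y_{i,t-2}
g_iv = np.sum(ins * dep) / np.sum(ins * reg)
print(f"IV estimate: {g_iv:.3f} (true gamma = {g})")
```

Differencing removes α_i and the lagged level is a valid instrument, so the estimate is close to the true γ even with T = 6; the full GMM estimator (4.22) exploits many more such moment conditions and is more efficient.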
The GMM estimator has the advantage that it is consistent and asymptotically normally distributed whether α_i is treated as fixed or random, because it eliminates α_i from the specification. However, the number of moment conditions increases at the order of T², which can create severe downward bias in finite samples (Ziliak (1997)). An alternative is to use a (quasi-) likelihood approach, which has the advantage of a fixed number of orthogonality conditions independent of the sample size. It also has the advantage of making use of all the available sample information, and hence may yield a more efficient estimator than (4.22) (e.g. Hsiao, Pesaran and Tahmiscioglu (2002), Binder, Hsiao and Pesaran (2004)). However, the likelihood approach has to formulate the joint likelihood function of (y_i0, y_i1, ..., y_iT) (or the conditional likelihood function of (y_i1, ..., y_iT | y_i0)). Since there is no reason to assume that the data generating process of the initial observations, y_i0, is different from that of the rest of the y_it, the initial y_i0 depends on previous values of x_{i,−j} and α_i, which are unavailable. Bhargava and Sargan (1983) suggest circumventing this missing data problem by conditioning y_i0 on x_i and α_i if α_i is treated as random. If α_i is treated as a fixed constant, Hsiao, Pesaran and Tahmiscioglu (2002) propose conditioning (y_i1 − y_i0) on the first difference of x_i.
4.3 Nonlinear Models
When the unobserved individual-specific effects, αi, (and/or time-specific effects, λt)
affect the outcome, yit, linearly, one can avoid the choice between random and fixed
effects specifications by eliminating them from the specification through some linear transformation, such as the covariance transformation (4.8) or the first difference transformation
(4.20). However, if αi affects yit nonlinearly, it is not easy to find a transformation that can
eliminate αi. For instance, consider the following binary choice model, where the observed
yit takes the value of either 1 or 0 depending on the latent response function
$$
y_{it}^* = \beta' x_{it} + \alpha_i + u_{it}, \tag{4.23}
$$

and

$$
y_{it} = \begin{cases} 1, & \text{if } y_{it}^* > 0, \\ 0, & \text{if } y_{it}^* \le 0, \end{cases} \tag{4.24}
$$
where uit is independently, identically distributed with density function f(uit). Let

$$
y_{it} = E(y_{it} \mid x_{it}, \alpha_i) + \epsilon_{it}, \tag{4.25}
$$

then

$$
E(y_{it} \mid x_{it}, \alpha_i) = \int_{-(\beta' x_{it} + \alpha_i)}^{\infty} f(u)\, du = 1 - F(-\beta' x_{it} - \alpha_i). \tag{4.26}
$$

Since αi affects E(yit | xit, αi) nonlinearly, αi remains after taking successive differences of
yit,

$$
y_{it} - y_{i,t-1} = [1 - F(-\beta' x_{it} - \alpha_i)] - [1 - F(-\beta' x_{i,t-1} - \alpha_i)] + (\epsilon_{it} - \epsilon_{i,t-1}). \tag{4.27}
$$

The likelihood function conditional on xi and αi takes the form

$$
\prod_{i=1}^{N} \prod_{t=1}^{T} [F(-\beta' x_{it} - \alpha_i)]^{1-y_{it}} [1 - F(-\beta' x_{it} - \alpha_i)]^{y_{it}}. \tag{4.28}
$$
If T is large, consistent estimators of β and αi can be obtained by maximizing (4.28). If T
is finite, there is only limited information about αi no matter how large N is. The presence
of the incidental parameters, αi, violates the regularity conditions for the consistency of the
maximum likelihood estimator of β.
If f(αi | xi) is known and is characterized by a fixed dimensional parameter vector,
a consistent estimator of β can be obtained by maximizing the marginal likelihood function,

$$
\prod_{i=1}^{N} \int \prod_{t=1}^{T} [F(-\beta' x_{it} - \alpha_i)]^{1-y_{it}} [1 - F(-\beta' x_{it} - \alpha_i)]^{y_{it}} f(\alpha_i \mid x_i)\, d\alpha_i. \tag{4.29}
$$
However, maximizing (4.29) involves T-dimensional integration. Butler and Moffitt (1982),
Chamberlain (1984), Heckman (1981), etc., have suggested methods to simplify the computation.
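For instance, if αi is assumed N(0, σ²α) and independent of xi, the integral in (4.29) with a probit F is one-dimensional in αi and can be evaluated by Gauss–Hermite quadrature, which is the essence of the Butler–Moffitt procedure. A sketch for a single individual's marginal likelihood (the function names, data, and parameter values below are illustrative, not from the paper):

```python
import math
import numpy as np

def Phi(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def marginal_likelihood_i(y_i, x_i, beta, sigma_a, m=12):
    # Integrate the probit likelihood over a_i ~ N(0, sigma_a^2) using
    # m-point Gauss-Hermite quadrature (change of variables a = sqrt(2)*sigma_a*x).
    nodes, weights = np.polynomial.hermite.hermgauss(m)
    total = 0.0
    for xk, wk in zip(nodes, weights):
        a = math.sqrt(2.0) * sigma_a * xk
        p = 1.0
        for y, x in zip(y_i, x_i):
            pr1 = Phi(beta * x + a)          # Prob(y_it = 1 | x_it, a_i)
            p *= pr1 if y == 1 else 1.0 - pr1
        total += wk * p
    return total / math.sqrt(math.pi)

# Compare with a fine trapezoid-rule integration over a_i.
y_i, x_i = [1, 0, 1], [0.5, -1.0, 0.2]
beta, sigma_a = 1.0, 0.8
gh = marginal_likelihood_i(y_i, x_i, beta, sigma_a)

grid = np.linspace(-6 * sigma_a, 6 * sigma_a, 4001)
dens = np.exp(-0.5 * (grid / sigma_a) ** 2) / (sigma_a * math.sqrt(2 * math.pi))
vals = np.ones_like(grid) * dens
for y, x in zip(y_i, x_i):
    pr1 = np.array([Phi(beta * x + a) for a in grid])
    vals *= pr1 if y == 1 else 1.0 - pr1
h = grid[1] - grid[0]
brute = float(h * (vals.sum() - 0.5 * (vals[0] + vals[-1])))
print(abs(gh - brute) < 1e-4)   # the 12-point rule is already very accurate
```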
The advantage of the RE specification is that there is no incidental parameter problem.
The problem is that f(αi | xi) is in general unknown. If a wrong f(αi | xi) is postulated, maximizing the wrong likelihood function will not yield a consistent estimator of β.
Moreover, the derivation of the marginal likelihood through multiple integration may be
computationally infeasible. The advantage of the FE specification is that there is no need to
specify f(αi | xi). The likelihood function will be the product of the individual likelihoods (e.g.
(4.28)) if the errors are i.i.d. The disadvantage is that it introduces incidental parameters.
A general approach to estimating a model involving incidental parameters is to find
transformations that turn the original model into one that does not involve incidental parameters. Unfortunately, no general rule is available for nonlinear models. One
has to explore the specific structure of a nonlinear model to find such a transformation.
For instance, if f(u) in (4.23) is logistic, then
$$
\text{Prob}(y_{it} = 1 \mid x_{it}, \alpha_i) = \frac{e^{\beta' x_{it} + \alpha_i}}{1 + e^{\beta' x_{it} + \alpha_i}}. \tag{4.30}
$$

Since, in a logit model, the denominators of Prob(yit = 1 | xit, αi) and Prob(yit = 0 |
xit, αi) are identical, and the numerator of the probability of any sequence {yi1, . . . , yiT} with
$\sum_{t=1}^{T} y_{it} = s$ is always equal to $\exp(\alpha_i s) \cdot \exp\{\sum_{t=1}^{T} (\beta' x_{it}) y_{it}\}$, the conditional likelihood function conditional
on $\sum_{t=1}^{T} y_{it} = s$ will not involve the incidental parameters αi. For instance, consider the
simple case that T = 2, then

$$
\text{Prob}(y_{i1} = 1, y_{i2} = 0 \mid y_{i1} + y_{i2} = 1) = \frac{e^{\beta' x_{i1}}}{e^{\beta' x_{i1}} + e^{\beta' x_{i2}}} = \frac{1}{1 + e^{\beta' \Delta x_{i2}}}, \tag{4.31}
$$

and

$$
\text{Prob}(y_{i1} = 0, y_{i2} = 1 \mid y_{i1} + y_{i2} = 1) = \frac{e^{\beta' \Delta x_{i2}}}{1 + e^{\beta' \Delta x_{i2}}}, \tag{4.32}
$$
(Chamberlain (1980), Hsiao (2003)).
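The cancellation of αi in (4.31)–(4.32) is easy to verify numerically: the conditional probability is the same for any value of αi. A small sketch (the parameter values are arbitrary):

```python
import math

def p1(x, beta, alpha):
    # Logit Prob(y = 1 | x, alpha) as in (4.30).
    z = beta * x + alpha
    return math.exp(z) / (1.0 + math.exp(z))

beta, x1, x2 = 0.7, 0.2, 1.1
gaps = []
for alpha in (-2.0, 0.0, 3.0):
    p10 = p1(x1, beta, alpha) * (1.0 - p1(x2, beta, alpha))   # Prob(1, 0)
    p01 = (1.0 - p1(x1, beta, alpha)) * p1(x2, beta, alpha)   # Prob(0, 1)
    cond = p10 / (p10 + p01)                # Prob(1, 0 | y1 + y2 = 1)
    target = 1.0 / (1.0 + math.exp(beta * (x2 - x1)))         # (4.31)
    gaps.append(abs(cond - target))
print(all(g < 1e-12 for g in gaps))   # True: alpha_i drops out
```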
This approach works because of the logit structure. In the case when f(u) is unknown,
Manski (1987) exploits the latent linear structure of (4.23) by noting that, for given i,

$$
\beta' x_{it} \gtrless \beta' x_{i,t-1} \iff E(y_{it} \mid x_{it}, \alpha_i) \gtrless E(y_{i,t-1} \mid x_{i,t-1}, \alpha_i), \tag{4.33}
$$

and suggests maximizing the objective function

$$
H_N(b) = \frac{1}{N} \sum_{i=1}^{N} \sum_{t=2}^{T} \text{sgn}(b' \Delta x_{it})\, \Delta y_{it}, \tag{4.34}
$$
where sgn(w) = 1 if w > 0, 0 if w = 0, and −1 if w < 0. The advantage of the Manski
(1987) maximum score estimator is that it is consistent without knowledge of f(u).
The disadvantage is that (4.33) holds for any cβ with c > 0, so only the relative magnitudes
of the coefficients can be estimated, subject to some normalization rule, say ∥β∥ = 1. Moreover,
the speed of convergence is considerably slower (N^{1/3}) and the limiting distribution is
quite complicated. Horowitz (1992) and Lee (1999) have proposed modified estimators
that improve the speed of convergence and are asymptotically normally distributed.
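A minimal sketch of the maximum score idea, under assumed simulation settings: with two regressors and the normalization ∥b∥ = 1, HN(b) of (4.34) can be maximized by a grid search over directions on the unit circle. The data generating process and sample size below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 20000, 2
beta = np.array([0.8, 0.6])        # true coefficient direction, ||beta|| = 1

# Latent model (4.23) with logistic errors and fixed effects alpha_i.
x = rng.normal(size=(N, T, 2))
alpha = rng.normal(size=(N, 1))
y = (x @ beta + alpha + rng.logistic(size=(N, T)) > 0).astype(float)

def H(b):
    # Manski's objective (4.34), written for T = 2.
    dx = x[:, 1, :] - x[:, 0, :]
    dy = y[:, 1] - y[:, 0]
    return np.mean(np.sign(dx @ b) * dy)

# Grid search over the unit circle: beta is identified only up to scale.
angles = np.linspace(0.0, 2.0 * np.pi, 721)
cands = np.column_stack([np.cos(angles), np.sin(angles)])
b_hat = cands[np.argmax([H(b) for b in cands])]
print(b_hat)   # points roughly in the direction of beta
```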
Other examples of exploiting specific structure of nonlinear models to eliminate the
effects of incidental parameters αi include dynamic discrete choice models (Chamberlain
(1993), Honoré and Kyriazidou (2000), Hsiao, Shen, Wang and Weeks (2005)), symmetrically trimmed least squares estimator for truncated and censored data (Tobit models)
(Honoré (1992)), sample selection models (or type II Tobit models) (Kyriazidou (1997)),
etc. However, these estimators often impose very severe restrictions on the data, so that not much
of the information in the data can be utilized to obtain parameter estimates. Moreover, there are
models for which no consistent estimator appears to exist when T is finite.
An alternative to seeking consistent estimators is to consider bias-reduced estimators.
The advantage of such an approach is that bias-reduced estimators may still allow the
use of all the sample information, so that from a mean square error point of view they
may dominate consistent estimators, because the latter often have
to discard much of the sample and thus tend to have large variances.
Following the idea of Cox and Reid (1987), Arellano (2001) and Carro (2005) propose
to derive the modified MLE by maximizing the modified log-likelihood function

$$
L^*(\beta) = \sum_{i=1}^{N} \left\{ \ell_i^*(\beta, \hat{\alpha}_i(\beta)) - \frac{1}{2} \log \ell_{i,\alpha_i \alpha_i}^*(\beta, \hat{\alpha}_i(\beta)) \right\}, \tag{4.35}
$$

where ℓ*i(β, α̂i(β)) denotes the concentrated log-likelihood function of yi after substituting the MLE of αi in terms of β, α̂i(β) (i.e., the solution of ∂ log L/∂αi = 0 in terms of
β, i = 1, . . . , N), into the log-likelihood function, and ℓ*i,αiαi(β, α̂i(β)) denotes the second
derivative of ℓ*i with respect to αi. The bias correction term is derived by noting that, to the
order of (1/T), the first derivative of ℓ*i with respect to β converges to

$$
\frac{1}{2}\, \frac{E[\ell_{i,\beta \alpha_i}^*(\beta, \alpha_i)]}{E[\ell_{i,\alpha_i \alpha_i}^*(\beta, \alpha_i)]}.
$$

By subtracting the order (1/T) bias from the likelihood function, the modified MLE is biased
only to the order of (1/T²), without increasing the asymptotic variance.
Monte Carlo experiments conducted by Carro (2005) have shown that when T = 8,
the bias of the modified MLE for dynamic probit and logit models is negligible. Another
advantage of the Arellano–Carro approach is its generality. For instance, a dynamic logit
model with time dummies as explanatory variables cannot meet the Honoré and Kyriazidou
(2000) conditions for generating a consistent estimator, but can still be estimated by the
modified MLE with good finite sample properties.
4.4 Modeling Cross-Sectional Dependence
Most panel studies assume that, apart from the possible presence of individual-invariant but period-varying time-specific effects, λt, the effects of omitted variables are
independently distributed across cross-sectional units. However, economic theory often
predicts that agents take actions that lead to interdependence among themselves. For example, the prediction that risk-averse agents will make insurance contracts allowing them
to smooth idiosyncratic shocks implies dependence in consumption across individuals. Ignoring cross-sectional dependence can lead to inconsistent estimators, in particular when
T is finite (e.g. Hsiao and Tahmiscioglu (2005)). Unfortunately, contrary to time series
data, in which the time label gives a natural ordering and structure, general forms of dependence along the cross-sectional dimension are difficult to formulate. Therefore, econometricians
have relied on strong parametric assumptions to model cross-sectional dependence. Two
approaches have been proposed: the economic distance
or spatial approach and the factor approach.
In regional science, correlation across cross-sectional units is assumed to follow a certain spatial ordering, i.e. dependence among cross-sectional units is related to location and
distance, in a geographic or more general economic or social network space (e.g. Anselin
(1988), Anselin and Griffith (1988), Anselin, Le Gallo and Jayet (2005)). A known spatial
weights matrix, W = (wij), an N × N nonnegative matrix in which the rows and columns
correspond to the cross-sectional units, is specified to express the prior strength of the
interaction between individual (location) i (in the row of the matrix) and individual (location) j (in the column), wij. By convention, the diagonal elements wii = 0. The weights are
often standardized so that each row sums to one, $\sum_{j=1}^{N} w_{ij} = 1$.
The spatial weights matrix, W, is often incorporated into a model specification through the
dependent variable, the explanatory variables, or the error term. For instance, a
spatial lag model for the NT × 1 variable y = (y′1, . . . , y′N)′, yi = (yi1, . . . , yiT)′, may take
the form

$$
y = \rho (W \otimes I_T) y + X\beta + u, \tag{4.36}
$$

where X and u denote the NT × K matrix of explanatory variables and the NT × 1 vector of error terms,
respectively, and ⊗ denotes the Kronecker product. A spatial error model may take the
form,
$$
y = X\beta + v, \tag{4.37}
$$

where v may be specified in a spatial autoregressive form,

$$
v = \theta (W \otimes I_T) v + u, \tag{4.38}
$$

or in a spatial moving average form,

$$
v = \gamma (W \otimes I_T) u + u. \tag{4.39}
$$
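The pieces of the spatial lag model (4.36) are straightforward to assemble numerically. The sketch below builds an illustrative row-normalized "neighbors on a line" weights matrix (the choice of W is an assumption made for the example) and generates y from the reduced form y = (I − ρ(W ⊗ I_T))⁻¹(Xβ + u):

```python
import numpy as np

rng = np.random.default_rng(2)
N, T, K = 6, 4, 2
rho, beta = 0.4, np.array([1.0, -0.5])

# Row-normalized "neighbors on a line" spatial weights (illustrative W).
W = np.zeros((N, N))
for i in range(N):
    for j in (i - 1, i + 1):
        if 0 <= j < N:
            W[i, j] = 1.0
W = W / W.sum(axis=1, keepdims=True)   # rows sum to one, diagonal zero

# Spatial lag model (4.36): y = rho (W kron I_T) y + X beta + u, solved for y.
X = rng.normal(size=(N * T, K))
u = rng.normal(size=N * T)
S = np.eye(N * T) - rho * np.kron(W, np.eye(T))
y = np.linalg.solve(S, X @ beta + u)

print(np.allclose(S @ y, X @ beta + u))   # True: y satisfies (4.36)
```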
The spatial model can be estimated by instrumental variables (the generalized method
of moments estimator) or by the maximum likelihood method. However, the approach of
defining cross-sectional dependence in terms of an "economic distance" measure requires that
the econometrician have information regarding this "economic distance" (e.g. Conley
(1999)). Another approach to modeling cross-sectional dependence is to assume that the
error of a model, say model (4.37), follows a linear factor model,
$$
v_{it} = \sum_{j=1}^{r} b_{ij} f_{jt} + u_{it}, \tag{4.40}
$$

where ft = (f1t, . . . , frt)′ is an r × 1 vector of random factors, bi = (bi1, . . . , bir)′ is an r × 1
vector of nonrandom factor loading coefficients, and uit represents the effects of idiosyncratic shocks,
which are independent of ft and independently distributed across i (e.g. Bai and Ng
(2002), Moon and Perron (2004), Pesaran (2004)). The conventional time-specific effects
model is a special case of (4.40) with r = 1 and bi = b for all i.
The factor approach requires considerably less prior information than the economic
distance approach. Moreover, the number of time-varying factors, r, and the factor loading
matrix B = (bij) can be empirically identified if both N and T are large. However, the estimation
of the factor loading matrix when N is large may not be computationally feasible. Pesaran
(2004) has therefore suggested adding the cross-sectional means, $\bar{y}_t = \frac{1}{N}\sum_{i=1}^{N} y_{it}$ and $\bar{x}_t = \frac{1}{N}\sum_{i=1}^{N} x_{it}$,
as additional regressors with individual-specific coefficients to (4.37) to filter out the cross-sectional dependence. This approach is very appealing because of its simplicity. However,
it is not clear how it will perform if N is neither small nor large. Neither is it clear how it
can be generalized to nonlinear models.
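Pesaran's suggestion can be sketched as follows: partialling the vector (1, ȳt, x̄t) out of each unit's data before pooling is, by the Frisch–Waugh logic, equivalent to giving each unit its own coefficients on the cross-sectional means. The data generating process and dimensions below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
N, T, beta = 100, 50, 1.5

# A common factor f_t loads on both the regressor and the error, so pooled
# OLS of y on x is biased; cross-sectional means of (y, x) proxy the factor.
f = rng.normal(size=T)
gx = rng.normal(1.0, 0.3, size=N)        # loadings of x on f
gy = rng.normal(1.0, 0.3, size=N)        # loadings of the error on f
x = gx[:, None] * f + rng.normal(size=(N, T))
y = beta * x + gy[:, None] * f + 0.3 * rng.normal(size=(N, T))

ybar, xbar = y.mean(axis=0), x.mean(axis=0)

def ccep(y, x, ybar, xbar):
    # Partial (1, ybar_t, xbar_t) out of each unit's series, then pool:
    # this gives every unit its own coefficients on the means.
    H = np.column_stack([np.ones(T), ybar, xbar])
    P = H @ np.linalg.solve(H.T @ H, H.T)    # projection onto H
    num = den = 0.0
    for i in range(N):
        xt = x[i] - P @ x[i]
        yt = y[i] - P @ y[i]
        num += xt @ yt
        den += xt @ xt
    return num / den

b_ols = np.sum(x * y) / np.sum(x * x)
b_cce = ccep(y, x, ybar, xbar)
print(abs(b_cce - beta) < abs(b_ols - beta))   # the augmented fit is closer
```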
4.5 Large-N and Large-T Panels
Our discussion has mostly focused on panels with large N and finite T. There are
panel data sets, like the Penn World Tables, covering different individuals, industries, and
countries over long periods. In general, if an estimator is consistent in the fixed-T, large-N
case, it will remain consistent if both N and T tend to infinity. Moreover, even in the case
that an estimator is inconsistent for fixed T and large N (say, the MLE of the dynamic model
(4.13) or of the fixed effects probit or logit models (4.26)), it can become consistent if T also
tends to infinity. The probability limit of an estimator, in general, is identical irrespective
of how N and T tend to infinity. However, the properly scaled limiting distribution may
depend on how the two indexes, N and T, tend to infinity.
There are several approaches for deriving the limits of large-N , large-T panels:
a. Sequential limits — First, fix one index, say N , and allow the other, say T , to go
to infinity, giving an intermediate limit, then, let N go to infinity.
b. Diagonal-path limits — Let the two indexes, N and T , pass to infinity along a
specific diagonal path, say T = T (N ) as N −→ ∞.
c. Joint limits — Let N and T pass to infinity simultaneously without placing
specific diagonal path restrictions on the divergence.
In many applications, sequential limits are easy to derive. However, sometimes sequential limits can give misleading asymptotic results. A joint limit will give a more robust
result than either a sequential limit or a diagonal-path limit, but will also be substantially
more difficult to derive and will apply only under stronger conditions, such as the existence
of higher moments. Phillips and Moon (1999) have given a set of sufficient conditions that
ensures that sequential limits are equivalent to joint limits.
When T is large, there is a need to consider serial correlation more generally, including
both short-memory and persistent components. For instance, if unit roots are present in
y and x (i.e. both are integrated of order 1) but they are not cointegrated, Phillips and Moon
(1999) show that if N is fixed and T −→ ∞, the least squares regression of y on x yields a
nondegenerate random variable that is a functional of Brownian motion and does not
converge to the long-run average relation between y and x, but it does if N also tends to
infinity. In other words, the issue of spurious regression will not arise in panels with large
N (e.g. Kao (1999)).
Both theoretical and applied researchers have paid a great deal of attention to the unit root
and cointegration properties of variables. When N is finite and T is large, standard time
series techniques can be used to derive the statistical properties of panel data estimators.
When N is large and cross-sectional units are independently distributed across i, central
limit theorems can be invoked along the cross-sectional dimension. Asymptotically normal
estimators and test statistics (with suitable adjustment for finite-T bias) for unit roots
and cointegration have been proposed (e.g. Baltagi and Kao (2000), Im, Pesaran and Shin
(2003), Levin, Lin and Chu (2002)). They, in general, gain statistical power over their
standard time series counterparts (e.g. Choi (2001)).
When both N and T are large and the cross-sectional units are not independent, a factor
analytic framework of the form (4.40) has been proposed to model the cross-sectional dependence, and variants of unit root tests have been proposed (e.g. Moon and Perron (2004)).
However, the implementation of those panel unit root tests is quite complicated. When
N −→ ∞, $\frac{1}{N}\sum_{i=1}^{N} u_{it} \longrightarrow 0$, so (4.40) implies that $\bar{v}_t = \bar{b}' f_t$, where b̄ is the cross-sectional average of bi = (bi1, . . . , bir)′ and ft = (f1t, . . . , frt)′. Pesaran (2004, 2005) suggests a simple
approach to filter out the cross-sectional dependence by augmenting the regression model (4.37) with the cross-sectional
means, ȳt and x̄t,
$$
y_{it} = x_{it}'\beta + \alpha_i + \bar{y}_t c_i + \bar{x}_t' d_i + e_{it}, \tag{4.41}
$$
or ȳt−1, ∆ȳt−ℓ to the Dickey–Fuller (1979) type regression model for testing a unit root,

$$
\Delta y_{it} = \alpha_i + \delta_i t + \gamma_i y_{i,t-1} + \sum_{\ell=1}^{p_i} \phi_{i\ell} \Delta y_{i,t-\ell} + c_i \bar{y}_{t-1} + \sum_{\ell=1}^{p_i} d_{i\ell} \Delta \bar{y}_{t-\ell} + e_{it}, \tag{4.42}
$$

where $\bar{y}_t = \frac{1}{N}\sum_{i=1}^{N} y_{it}$, $\bar{x}_t = \frac{1}{N}\sum_{i=1}^{N} x_{it}$, $\Delta \bar{y}_{t-j} = \frac{1}{N}\sum_{i=1}^{N} \Delta y_{i,t-j}$, and
∆ = (1 − L), with L the lag operator. The resulting pooled estimator will again be
asymptotically normally distributed.
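A sketch of constructing the cross-sectionally augmented regression (4.42) for one unit, with p_i = 1 and simulated random walks under the unit root null (all settings below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(5)
N, T, p = 20, 60, 1

# N independent random walks: the unit-root null.
y = np.cumsum(rng.normal(size=(N, T)), axis=1)
ybar = y.mean(axis=0)            # cross-sectional mean ybar_t
dybar = np.diff(ybar)            # Delta ybar_t

def cadf_t(yi):
    # t-ratio on gamma_i in
    # Dy_it = a_i + d_i*t + gamma_i*y_{i,t-1} + phi_i*Dy_{i,t-1}
    #         + c_i*ybar_{t-1} + g_i*Dybar_{t-1} + e_it   (p_i = 1)
    dy = np.diff(yi)
    rows = range(p, T - 1)
    Y = np.array([dy[t] for t in rows])
    X = np.array([[1.0, float(t), yi[t], dy[t - 1], ybar[t], dybar[t - 1]]
                  for t in rows])
    b, *_ = np.linalg.lstsq(X, Y, rcond=None)
    e = Y - X @ b
    s2 = e @ e / (len(Y) - X.shape[1])
    cov = s2 * np.linalg.inv(X.T @ X)
    return b[2] / np.sqrt(cov[2, 2])

t_bar = np.mean([cadf_t(y[i]) for i in range(N)])
print(t_bar)   # the average CADF t-ratio across units
```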
When cross-sectional dependence is of unknown form, Chang (2002) suggests using
nonlinear transformations of the lagged level variable, F(yi,t−1), as instruments (IV)
for the usual augmented Dickey–Fuller (1979) type regression. The test statistic for the unit
root hypothesis is simply defined as a standardized sum of individual IV t-ratios. As long as
F(·) is regularly integrable, say $F(y_{i,t-1}) = y_{i,t-1} e^{-c_i |y_{i,t-1}|}$, where ci is a positive constant,
the products of the nonlinear instruments F(yi,t−1) and F(yj,t−1) from different cross-sectional units i and j are asymptotically uncorrelated, even if the variables yi,t−1 and yj,t−1
generating the instruments are correlated. Hence, the usual central limit theorems can
be invoked, and the standardized sum of individual IV t-ratios is asymptotically normally
distributed.
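The key property — that the instrument F(y) = y e^{−c|y|} delivers an approximately standard normal IV t-ratio for each unit under the unit root null — can be illustrated by simulation (the value of c and the sample sizes are illustrative assumptions, not Chang's recommended choices):

```python
import numpy as np

rng = np.random.default_rng(4)
T = 200
c = 3.0 / np.sqrt(T)             # illustrative choice of the constant c_i

def iv_t_ratio(y, c):
    # IV t-ratio for gamma in Dy_t = gamma * y_{t-1} + e_t, instrumenting
    # y_{t-1} with the regularly integrable F(y) = y * exp(-c * |y|).
    ylag, dy = y[:-1], np.diff(y)
    z = ylag * np.exp(-c * np.abs(ylag))
    gamma = (z @ dy) / (z @ ylag)
    resid = dy - gamma * ylag
    s2 = resid @ resid / (len(dy) - 1)
    se = np.sqrt(s2 * (z @ z)) / np.abs(z @ ylag)
    return gamma / se

# Under the unit-root null y is a random walk; the IV t-ratio is close to
# standard normal, so a standardized sum across units is as well.
t_stats = [iv_t_ratio(np.cumsum(rng.normal(size=T)), c) for _ in range(2000)]
print(np.mean(t_stats), np.std(t_stats))   # roughly 0 and 1
```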
For further review of the literature on unit roots and cointegration in panels, see
Breitung and Pesaran (2005) and Choi (2004). However, a more fundamental issue of
panel modeling with large N and large T is whether the standard approach of formulating
unobserved heterogeneity for data with finite T remains a good approximation to the
true data generating process when T is large.
5. Concluding Remarks
In this paper we have tried to provide a summary of the advantages of using panel data
and of the fundamental issues of panel data analysis. Assuming that the heterogeneity across
cross-sectional units and over time that is not captured by the observed variables can be
captured by period-invariant individual-specific and/or individual-invariant time-specific
effects, we surveyed the fundamental methods for the analysis of linear static and dynamic
models. We have also discussed the difficulties of analyzing nonlinear models and of modeling
cross-sectional dependence. There are many important issues, such as the modeling of
joint dependence or simultaneous equations models, varying parameter models (e.g. Hsiao
(1992, 2003), Hsiao and Pesaran (2005)), unbalanced panels, measurement errors (Griliches
and Hausman (1986), Wansbeek and Koning (1989)), nonparametric or semiparametric
approaches, and repeated cross-section data, that are not discussed here but are of no less
importance.
Although panel data offer many advantages, they are not a panacea. The power of
panel data to isolate the effects of specific actions, treatments or more general policies
depends critically on the compatibility of the assumptions of the statistical tools with the
data generating process. In choosing a proper method for exploiting the richness and
unique properties of the panel, it might be helpful to keep the following factors in mind:
First, what advantages do panel data offer us in investigating economic issues over data
sets consisting of a single cross section or time series? Second, what are the limitations
of panel data and of the econometric methods that have been proposed for analyzing such
data? Third, when using panel data, how can we increase the efficiency of parameter
estimates and the reliability of statistical inference? Fourth, are the assumptions underlying
the statistical inference procedures and the data-generating process compatible?
References
Ahn, S.C. and P. Schmidt (1995), “ Efficient Estimation of Models for Dynamic Panel
Data”, Journal of Econometrics, 68, 5-27.
Aigner, D.J., C. Hsiao, A. Kapteyn and T. Wansbeek (1985), “Latent Variable Models in
Econometrics”, in Handbook of Econometrics, vol. II., ed. by Z. Griliches and M.D.
Intriligator, Amsterdam: North-Holland, 1322-1393.
Anderson, T.W. (1959), “On Asymptotic Distributions of Estimates of Parameters of
Stochastic Difference Equations”, Annals of Mathematical Statistics 30, 676-687.
Anderson, T.W. and C. Hsiao (1981), “Estimation of Dynamic Models with Error Components”, Journal of the American Statistical Association, 76, 598-606.
(1982), “Formulation and Estimation of Dynamic Models Using Panel Data”,
Journal of Econometrics, 18, 47-82.
Anselin, L., (1988), Spatial Econometrics: Methods and Models, Boston: Kluwer.
and D.A. Griffith (1988), “Do Spatial Effects Really Matter in Regression
Analysis?”, Papers of the Regional Science Association, 65, 11-34.
, J. Le Gallo and H. Jayet (2005), “Spatial Panel Econometrics”, mimeo.
Antweiler, W. (2001), “Nested Random Effects Estimation in Unbalanced Panel Data”,
Journal of Econometrics, 101, 295-313.
Arellano, M., (2001), “Discrete Choice with Panel Data”, working paper 0101, CEMFI,
Madrid.
(2003), Panel Data Econometrics, Oxford: Oxford University Press.
and S.R. Bond (1991), “Some Tests of Specification for Panel Data:
Monte Carlo Evidence and an Application to Employment Equations”, Review of
Economic Studies, 58, 277-297.
and O. Bover (1995), “Another Look at the Instrumental Variable Estimation of Error-Components Models”, Journal of Econometrics, 68, 29-51.
Arellano, M., O. Bover and J. Labeaga (1999), “Autoregressive Models with Sample Selectivity for Panel Data”, in Analysis of Panels and Limited Dependent Variable Models,
ed., by C. Hsiao, K. Lahiri, L.F. Lee and M.H. Pesaran, Cambridge: Cambridge
University Press, 23-48.
Bai, J. and S. Ng (2002), “Determining the Number of Factors in Approximate Factor
Models”, Econometrica, 70, 191-221.
Balestra, P. and M. Nerlove (1966), “Pooling Cross-Section and Time Series Data in the
Estimation of a Dynamic Model: The Demand for Natural Gas”, Econometrica, 34,
585-612.
Baltagi, B.H. (2001), Econometric Analysis of Panel Data, Second edition, New York:
Wiley.
and C. Kao (2000), “Nonstationary Panels, Cointegration in Panels and Dynamic Panels: A Survey”, in Nonstationary Panels, Panel Cointegration, and Dynamic
Panels, ed. by B. Baltagi, Advances in Econometrics, vol. 15, Amsterdam: JAI Press,
7-52.
Becketti, S., W. Gould, L. Lillard and F. Welch (1988), “The Panel Study of Income
Dynamics After Fourteen Years: An Evaluation”, Journal of Labor Economics, 6,
472-492.
Ben-Porath, Y. (1973), “Labor Force Participation Rates and the Supply of Labor”, Journal of Political Economy, 81, 697-704.
Bhargava, A. and J.D. Sargan (1983), “Estimating Dynamic Random Effects Models from
Panel Data Covering Short Time Periods”, Econometrica, 51, 1635-1659.
Binder, M., C. Hsiao and M.H. Pesaran (2005), “Estimation and Inference in Short Panel
Vector Autoregressions with Unit Roots and Cointegration”, Econometric Theory, 21,
pp. 795-837.
Biorn, E. (1992), “Econometrics of Panel Data with Measurement Errors” in Econometrics
of Panel Data: Theory and Applications, ed. by L. Mátyás and P. Sevestre, Kluwer,
152-195.
Breitung, J. and M.H. Pesaran (2005), “Unit Roots and Cointegration in Panels”, in The
Econometrics of Panel Data, Kluwer (forthcoming).
Butler, J.S. and R. Moffitt (1982), “A Computationally Efficient Quadrature Procedure
for the One Factor Multinominal Probit Model”, Econometrica, 50, 761-764.
Carro, J.M. (2005), “Estimating Dynamic Panel Data Discrete Choice Models with Fixed
Effects”, Journal of Econometrics (forthcoming).
Chamberlain, G. (1980), “Analysis of Covariance with Qualitative Data”, Review of Economic Studies, 47, 225-238.
(1984), “Panel Data”, in Handbook of Econometrics, Vol. II, ed. by Z. Griliches and M. Intriligator, Amsterdam: North-Holland, 1247-1318.
(1993), “Feedback in Panel Data Models”, mimeo, Department of Economics, Harvard University.
Chang, Y. (2002), “Nonlinear IV Unit Root Tests in Panels with Cross-Sectional Dependency”, Journal of Econometrics, 110, 261-292.
Choi, I. (2001), “Unit Root Tests for Panel Data”, Journal of International Money and
Finance, 20, 249-272.
Choi, I. (2004), “Nonstationary Panels”, in Palgrave Handbooks of Econometrics, vol I,
(forthcoming).
Conley, T.G. (1999), “GMM Estimation with Cross-sectional Dependence”, Journal of
Econometrics, 92, 1-45.
Cox, D.R. and N. Reid (1987), “Parameter Orthogonality and Approximate Conditional Inference”, Journal of the Royal Statistical Society, Series B, 49, 1-39.
Davis, P. (1999), “Estimating Multi-way Error Components Models with Unbalanced Panel
Data Structure”, MIT Sloan School.
Dickey, D.A. and W.A. Fuller (1979), “Distribution of the Estimators for Autoregressive
Time Series with a Unit Root”, Journal of the American Statistical Association, 74,
427-431.
(1981), “Likelihood Ratio Statistics for Autoregressive Time Series with a
Unit Root”, Econometrica 49, 1057-1072.
Granger, C.W.J. (1990), “Aggregation of Time-Series Variables: A Survey”, in Disaggregation in Econometric Modeling, ed. by T. Barker and M.H. Pesaran, London:
Routledge.
Griliches, Z. (1967), “Distributed Lags: A Survey”, Econometrica, 35, 16-49.
and J.A. Hausman (1986), “Errors-in-Variables in Panel Data”, Journal of
Econometrics, 31, 93-118.
Hausman, J.A. (1978), “Specification Tests in Econometrics”, Econometrica, 46, 1251-1271.
Heckman, J.J. (1981), “Statistical Models for Discrete Panel Data”, in Structural Analysis of Discrete Data with Econometric Applications, ed. by C.F. Manski and D.
McFadden, Cambridge, Mass., MIT Press, 114-178.
Heckman, J.J., H. Ichimura, J. Smith and P. Todd (1998), “Characterizing Selection Bias
Using Experimental Data”, Econometrica, 66, 1017-1098.
Honoré, Bo (1992), “Trimmed LAD and Least Squares Estimation of Truncated and Censored Regression Models with Fixed Effects”, Econometrica, 60, 533-567.
Honoré, Bo and E. Kyriazidou (2000), “Panel Data Discrete Choice Models with Lagged
Dependent Variables”, Econometrica, 68, 839-874.
Horowitz, J.L. (1992), “A Smoothed Maximum Score Estimator for the Binary Response
Model”, Econometrica 60, 505-531.
Hsiao, C. (1986), Analysis of Panel Data, Econometric Society Monographs No. 11, New
York: Cambridge University Press.
(1992), “Random Coefficient Models” in The Econometrics of Panel Data,
ed. by L. Matyas and P. Sevestres, Kluwer, 1st edition, 223-241, 2nd ed. (1996),
410-428.
(2003), Analysis of Panel Data, 2nd edition, Cambridge: Cambridge University Press (Econometric Society monograph no. 34).
(2005), “Why Panel Data?”, Singapore Economic Review, 50(2), 1-12.
(2006), “Longitudinal Data Analysis”, in The New Palgrave Dictionary of
Economics, MacMillan (forthcoming).
and M.H. Pesaran (2005), “Random Coefficients Models”, in The Econometrics of Panel Data, ed. by L. Matyas and P. Sevestre, Kluwer (forthcoming).
and T. Tahmiscioglu (2005), “Estimation of Dynamic Panel Data Models
with Both Individual and Time Specific Effects”.
Hsiao, C., T.W. Appelbe, and C.R. Dineen (1993), “A General Framework for Panel
Data Analysis — With an Application to Canadian Customer Dialed Long Distance
Service”, Journal of Econometrics, 59, 63-86.
D.C. Mountain and K. Ho-Illman (1995), “Bayesian Integration of End-Use Metering and Conditional Demand Analysis”, Journal of Business and Economic
Statistics, 13, 315-326.
M.W. Luke Chan, D.C. Mountain and K.Y. Tsui (1989), “Modeling Ontario
Regional Electricity System Demand Using a Mixed Fixed and Random Coefficients
Approach”, Regional Science and Urban Economics, 19, 567-587.
Hsiao, C., M.H. Pesaran and A.K. Tahmiscioglu (2002), “Maximum Likelihood Estimation
of Fixed Effects Dynamic Panel Data Models Covering Short Time Periods”, Journal
of Econometrics, 109, 107-150.
, Y. Shen and H. Fujiki (2005), “Aggregate vs Disaggregate Data Analysis
— A Paradox in the Estimation of Money Demand Function of Japan Under the Low
Interest Rate Policy”, Journal of Applied Econometrics, 20, 579-601.
, Y. Shen, B. Wang and G. Weeks (2005), “Evaluating the Effectiveness of
Washington State Repeated Job Search Services on the Employment Rate of Prime-age Female Welfare Recipients”, mimeo.
Im, K., M.H. Pesaran and Y. Shin (2003), “Testing for Unit Roots in Heterogeneous
Panels”, Journal of Econometrics, 115, 53-74.
Juster, T. (2000), “Economics/Micro Data”, in International Encyclopedia of Social Sciences, (forthcoming).
Kao, C. (1999), “Spurious Regression and Residual-Based Tests for Cointegration in Panel
Data”, Journal of Econometrics, 90, 1-44.
Kyriazidou, E. (1997), “Estimation of a Panel Data Sample Selection Model”, Econometrica, 65, 1335-1364.
Lee, M.J. (1999), “A Root-N-Consistent Semiparametric Estimator for Related Effects
Binary Response Panel Data”, Econometrica, 67, 427-433.
Lee, M.J. (2005), Micro-Econometrics for Policy, Program and Treatment Analysis, Oxford: Oxford University Press.
Lewbel, A. (1994), “Aggregation and Simple Dynamics”, American Economic Review, 84,
905-918.
Levin, A., C. Lin, and J. Chu (2002), “Unit Root Tests in Panel Data: Asymptotic and
Finite-Sample Properties”, Journal of Econometrics, 108, 1-24.
MaCurdy, T.E. (1981), “An Empirical Model of Labor Supply in a Life Cycle Setting”,
Journal of Political Economy, 89, 1059-85.
Manski, C.F. (1987), “Semiparametric Analysis of Random Effects Linear Models from
Binary Panel Data”, Econometrica, 55, 357-362.
Mátyás, L. and P. Sevestre, ed (1996), The Econometrics of Panel Data — Handbook of
Theory and Applications, 2nd ed. Dordrecht: Kluwer.
Moon, H.R. and B. Perron (2004), “Testing for Unit Roots in Panels with Dynamic
Factors”, Journal of Econometrics, 122, 81-126.
Nerlove, M. (2002), Essays in Panel Data Econometrics, Cambridge: Cambridge University Press.
Neyman, J. and E.L. Scott (1948), “Consistent Estimates Based on Partially Consistent
Observations”, Econometrica, 16, 1-32.
Nickell, S. (1981), “Biases in Dynamic Models with Fixed Effects”, Econometrica, 49,
1399-1416.
Pakes, A. and Z. Griliches (1984), “Estimating Distributed Lags in Short Panels with
an Application to the Specification of Depreciation Patterns and Capital Stock Constructs”, Review of Economic Studies, 51, 243-262.
Pesaran, M.H. (2003), “On Aggregation of Linear Dynamic Models: An Application to
Life-Cycle Consumption Models Under Habit Formation”, Economic Modelling, 20,
227-435.
Pesaran, M.H. (2004), “Estimation and Inference in Large Heterogeneous Panels with
Cross-Sectional Dependence”, mimeo.
(2005), “A Simple Panel Unit Root Test in the Presence of Cross-Section
Dependence”, mimeo.
Phillips, P.C.B. (1986), “Understanding Spurious Regressions in Econometrics”, Journal
of Econometrics, 33, 311-340.
Phillips, P.C.B. and S.N. Durlauf (1986), “Multiple Time Series Regression with Integrated
Processes”, Review of Economic Studies, 53, 473-495.
and H.R. Moon (1999), “Linear Regression Limit Theory for Nonstationary
Panel Data”, Econometrica, 67, 1057-1111.
Rao, C.R., (1973), Linear Statistical Inference and Its Applications, 2nd ed., New York:
Wiley.
Rosenbaum, P., and D. Rubin (1985), “Reducing Bias in Observational Studies Using
Subclassification on the Propensity Score”, Journal of the American Statistical Association, 79, 516-524.
Wansbeek, T.J. and R.H. Koning (1989), “Measurement Error and Panel Data”, Statistica
Neerlandica, 45, 85-92.
Ziliak, J.P. (1997), “Efficient Estimation with Panel Data When Instruments are Predetermined: An Empirical Comparison of Moment-Condition Estimators”, Journal of
Business and Economic Statistics, 15, 419-431.
Figures 1, 2 and 3. Scatter diagrams of (yit, xit).