Panel Data Analysis Using STATA 13
4. Panel Data
Panel data go by several names: pooled, longitudinal,
multi-dimensional and cross-sectional time-series data. They are a
combination of both time-series and cross-sectional data. Panel data
can therefore be described simply as data on several distinct units,
observed on one or more variables over a period of time. Distinct units
and distinct time periods are the basic elements of panel data. For
example, data on the profits of UBA and Zenith Banks over a period of
40 years are panel data. There are also macro and micro panel data.
Each of these is discussed separately below:
Panel data that have values for all their observations are termed
balanced. In essence, each unit (cross-section) covers the same time
span, so the number of time periods (t) is the same for every unit.
A typical example is shown below:
Table 3: Unbalanced Panel Data Set
company   year   GDP(N,B)   inflation(%)
1         2001   30         8
1         2002   23         9
2         2001   67         13
2         2002   32         9
2         2003   35         8
3         2001   87         10
Source: Hypothetical
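
As a minimal sketch, the hypothetical data in Table 3 can be typed into STATA and declared as a panel. The variable names gdp and inflation are chosen for this illustration only; xtset reports whether the panel is balanced, and xtdescribe shows which years are observed for each company.

```stata
* Enter the hypothetical Table 3 data by hand
* (gdp and inflation are variable names assumed for this sketch)
clear
input company year gdp inflation
1 2001 30  8
1 2002 23  9
2 2001 67 13
2 2002 32  9
2 2003 35  8
3 2001 87 10
end

* Declare the panel structure: unit identifier first, then the time variable
xtset company year

* Show the pattern of observed years for each company
xtdescribe
```

Because company 1 has two years, company 2 has three and company 3 has one, STATA should flag this panel as unbalanced.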
That covers the nature of the data we shall be dealing with in this
workshop. Let’s proceed to modeling.
(3) $\mathrm{Cov}(\mu_t, \mu_j) = 0$ for $t \neq j$; that is, the error
term ($\mu_t$) in equation 2 must not be correlated with any other error
term such as $\mu_j$.
(4) $\mathrm{Cov}(\mu_t, x_t) = 0$; that is, the error term must not be
statistically related to the explanatory variables.
(5) Open the Data Editor, right-click and choose Paste; a dialog box
will appear. In the dialog box, click on "Treat first row as variable
names". The data will be pasted immediately.
(6) Declare the data to be time series. There are two ways to do this,
manual and automatic, but in our discussion here let’s follow the
manual method. Just type “tsset year” in the Command window below and
press Enter.
(7) Estimate the model by typing regress followed by the dependent
variable and the independent variable(s). Then press Enter.
(8) For the diagnostic tests and descriptive statistics, we shall use
the automatic (menu) method here.
(9) Click on Statistics on the upper menu, move to Linear models and
related, navigate right to Regression diagnostics, navigate right again
to Specification tests, etc. and click.
(10) Select the diagnostic tests you want to perform and click on
either OK or Submit.
FOR THE DESCRIPTIVE STATISTICS
(11) Move down to Summaries, tables, and tests, navigate to Summary
and descriptive statistics, then navigate again to Summary statistics.
(12) Click, then enter the variables and click on OK.
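
For readers who prefer typing commands to using the menus, steps (6) to (12) can be sketched in a do-file as follows. The names y, x1 and x2 are placeholders for your own dependent and independent variables, and year is assumed to be the time variable of a dataset already in memory. The diagnostic tests listed are only a selection of those available after regress.

```stata
* Placeholder names: y (dependent variable), x1 and x2 (regressors),
* year (time variable) in a dataset that is already loaded

* Step (6): declare the data as time series
tsset year

* Step (7): estimate the model by OLS
regress y x1 x2

* Steps (8)-(10): diagnostic tests, run immediately after regress
* Breusch-Pagan test for heteroskedasticity
estat hettest
* Ramsey RESET test for omitted variables
estat ovtest
* Variance inflation factors (multicollinearity)
estat vif
* Breusch-Godfrey test for serial correlation (needs tsset data)
estat bgodfrey

* Steps (11)-(12): descriptive statistics
summarize y x1 x2
```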
We have completed the routine tasks, so we can now proceed to panel
modeling and estimation. Note that equation 2 has a panel data
counterpart, generally referred to as the pooled regression model. The
classical/traditional form of this model can be specified as:
(11) Panel Data Unit Root Test
A unit root means that the autoregressive parameter of a series is
equal to 1. When a series has a unit root, there is evidence of a
random walk in the series, and the series is therefore not stationary.
Regression results based on such series may be spurious or
nonsensical. To avoid this situation in either panel or time-series
analysis, it is important to subject the data series to a stationarity
(unit root) test. Unit root testing is, however, a relatively recent
development in panel data; see, for example, Levin, Lin and Chu
(2002), Im, Pesaran and Shin (2003), Harris and Tzavalis (1999),
Choi (2001) and Hadri (2000). The Levin, Lin and Chu test
specification can be expressed as:
Note that the majority of the unit root tests assume that you have
a balanced panel dataset, but the Im–Pesaran–Shin and Fisher-type
tests (i.e. Choi 2001) allow for unbalanced panels. The syntax for
the Fisher test is: xtunitroot fisher varname, dfuller trend demean
lags(1). The syntax for the Harris–Tzavalis test is: xtunitroot ht
varname. The next aspect of this discussion is the descriptive
statistics. The panel commands for these are xtsum, xtline and
histogram.
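
The commands named above can be tried out as in the sketch below. It assumes the Grunfeld investment data that STATA makes available through webuse (internet access is needed), with variables company, year, invest, mvalue and kstock. Any other panel dataset can be substituted, provided it is balanced where the test requires it.

```stata
* Load the Grunfeld investment data and declare the panel structure
webuse grunfeld, clear
xtset company year

* Fisher-type test based on augmented Dickey-Fuller regressions
xtunitroot fisher invest, dfuller trend demean lags(1)

* Harris-Tzavalis test (requires a balanced panel)
xtunitroot ht invest

* Descriptive statistics: overall, between and within variation
xtsum invest mvalue kstock

* Time-series line plot of invest, one panel per company
xtline invest

* Histogram of invest pooled over all companies and years
histogram invest
```

In both tests the null hypothesis is that the panels contain unit roots, so a small p-value is evidence against a unit root.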
(12) Estimation of the Pooled Regression
To state equation 18 in the normal form, let’s represent $h(x_i)$ with
$\alpha_i$; then we have
$y_{it} = \alpha_i + \beta x'_{it} + w_{it}$ (19)
Note that each $\alpha_i$ is treated as an unknown parameter to be estimated.
How do you estimate the fixed effects model in STATA? The following
syntax is applicable: type xtreg followed by the dependent variable and
the independent variable(s), then a comma and fe, and press Enter.
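
As an illustration of this syntax, the sketch below again assumes the Grunfeld data from webuse, with invest as the dependent variable and mvalue and kstock as the independent variables. These names belong to that dataset and are not part of the workshop data.

```stata
* Load the data and declare the panel structure
webuse grunfeld, clear
xtset company year

* Fixed effects (within) estimation
xtreg invest mvalue kstock, fe
```

The data must be declared with xtset before xtreg is used, so that STATA knows which variable identifies the units.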
It is important to know that the explanatory variables included in
equation 19 fall into two classes: time-invariant and time-varying
variables. The time-invariant variables mimic the individual-specific
constant term $c_i$. The coefficients of the time-invariant variables
cannot be estimated using the fixed effects model; instead, the fixed
effects model absorbs them into the $\alpha_i$ stated in equation 19.
This is a limitation of the fixed effects model. The random effects
model, by contrast, can provide separate estimates of the coefficients
of the time-invariant variables, which is why it is of considerable
interest in econometrics. The assumption underlying the formulation of
the random effects model is that the unobserved individual
heterogeneity is uncorrelated with the explanatory variables
($x'_{it}$); being uncorrelated with the explanatory variables, it can
be included in the disturbance term. Thus, the random effects model can
be specified as:
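$y_{it} = \alpha z'_i - E(\alpha z'_i) + E(\alpha z'_i) + \beta x'_{it} + \varepsilon_{it}$ (20)
Rearrange the terms in equation 20:
$y_{it} = E(\alpha z'_i) + \beta x'_{it} + \alpha z'_i - E(\alpha z'_i) + \varepsilon_{it}$ (21)
Again let:
$\alpha_i = E(\alpha z'_i)$ (22)
$\mu_i = \alpha z'_i - E(\alpha z'_i)$ (23)
Then equation 21 is reduced to:
$y_{it} = \alpha_i + \beta x'_{it} + \mu_i + \varepsilon_{it}$ (24)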
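where $\mu_i$ is the group-specific error term, similar to $\varepsilon_{it}$.
We can now maintain the strict exogeneity assumptions:
$E[\varepsilon_{it} \mid x'_{it}] = 0$
$E[\mu_i \mid x'_{it}] = 0$
$E[\varepsilon_{it}^2 \mid x'_{it}] = \sigma_\varepsilon^2$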
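
A random effects model of this form can be estimated with the re option of xtreg. The sketch below is only an illustration: it assumes the Grunfeld data from webuse, and the variable names are those of that dataset.

```stata
* Load the data and declare the panel structure
webuse grunfeld, clear
xtset company year

* Random effects (GLS) estimation
xtreg invest mvalue kstock, re
```

In the output, sigma_u and sigma_e are the estimated standard deviations of the group-specific error term and of the idiosyncratic error term, respectively.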