date cpi
1990 130.658
1995 152.383
2000 172.192
What we need is for the CPI variable in the panel to repeat these three values 500 times.
Solution: Simple! With the panel dataset open in gretl,
append cpi.txt
Comment: If the length of the time series is the same as the length of the time dimension in the
panel (3 in this example), gretl will perform the stacking automatically. Rather than using the
append command you could use the “Append data” item under the File menu in the GUI program.
If the length of your time series does not exactly match the T dimension of your panel dataset,
append will not work but you can use the join command, which is able to pick just the observations
with matching time periods. On selecting “Append data” in the GUI you are given a choice between
plain “append” and “join” modes, and if you choose the latter you get a dialog window allowing you
to specify the key(s) for the join operation. For native gretl data files you can use built-in series that
identify the time periods, such as $obsmajor, for your outer key to match the dates. In the example
above, if the CPI data were in gretl format $obsmajor would give you the year of the observations.
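For instance, if the open panel dataset contains a year series and the CPI data are stored as a native gretl file, a join along the following lines should do the trick (file and series names are illustrative):
join cpi.gdt cpi --ikey=year --okey=$obsmajor   # sketch: "year" is assumed to exist in the panel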
nulldata 36
set seed 61218
setobs 12 1:1 --stacked-time-series
###
### generate simulated yearly data
###
series year = 2000 + time
series y = round(normal())
series x = round(3*uniform())
list X = y x
print year X -o
###
### now re-cast as 4-year averages
###
# a dummy for endpoints
series endpoint = (year % 4 == 0)
# id variable
series id = $unit
# compute averages
loop foreach i X
series $i = movavg($i, 4)
endloop
? print year X -o
year y x
1:01 2001 1 1
1:02 2002 -1 1
1:03 2003 -1 0
1:04 2004 0 1
1:05 2005 1 2
1:06 2006 1 2
1:07 2007 -1 0
1:08 2008 1 1
1:09 2009 0 3
1:10 2010 -1 1
1:11 2011 1 1
1:12 2012 0 1
...
3:09 2009 0 1
3:10 2010 1 1
3:11 2011 0 2
3:12 2012 1 2
? print id year X -o
id year y x
open data4-10.gdt
markers --to-array="state_codes"
genr index
stringify(index, state_codes)
store joindata.gdt
Comment: The markers command saves the observation markers to an array of strings. The com-
mand genr index creates a series that goes 1, 2, 3, . . . , and we attach the state codes to this series
via stringify(). After saving the result we have a datafile that contains a series, index, that can
be matched with whatever series holds the state code strings in the target dataset.
Suppose the relevant string-valued key series in the target dataset is called state. We might prefer
to avoid the need to specify a distinct “outer” key (again see chapter 7). In that case, in place of
genr index
stringify(index, state_codes)
we could do
genr index
series state = index
stringify(state, state_codes)
and the two datafiles will contain a comparable string-valued state series.
series d = (t=="1984:2")
Comment: The internal variable t is used to refer to observations in string form, so if you have a
cross-section sample you may just use d = (t=="123"). If the dataset has observation labels you
can use the corresponding label. For example, if you open the dataset mrw.gdt, supplied with gretl
among the examples, a dummy variable for Italy could be generated via
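a line along these lines (the name of the dummy series is arbitrary):
series d_Italy = (t=="Italy")   # sketch, using the string-comparison trick described above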
Note that this method does not require scripting at all. In fact, you might as well use the GUI Menu
“Add/Define new variable” for the same purpose, with the same syntax.
Comment: Suppose you have a list D of mutually exclusive dummies, that is a full set of 0/1 vari-
ables coding for the value of some characteristic, such that the sum of the values of the elements
of D is 1 at each observation. This is, by the way, exactly what the dummify command produces.
The reverse job of dummify can be performed neatly by using the lincomb function.
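A sketch of the idea, combining lincomb() with seq():
series x = lincomb(D, seq(1, nelem(D)))   # sketch; D is the list of mutually exclusive dummies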
The code above multiplies the first dummy variable in the list D by 1, the second one by 2, and so
on. Hence, the return value is a series whose value is i if and only if the i-th member of D has value
1.
If you want your coding to start from 0 instead of 1, you’ll have to modify the code snippet above
into
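something along the lines of
series x = lincomb(D, seq(0, nelem(D)-1))   # sketch: coding starts from 0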
Easter
Problem: I have a 7-day daily dataset. How do I create an “Easter” dummy?
Solution: We have the easterday() function, which returns month and day of Easter given the
year. The following is an example script which uses this function and a few string magic tricks:
series Easter = 0
loop y=2011..2016
a = easterday(y)
m = floor(a)
d = round(100*(a-m))
ed_str = sprintf("%04d-%02d-%02d", y, m, d)
Easter["@ed_str"] = 1
endloop
Comment: The round() function is necessary for the “day” component because otherwise floating-
point problems may ensue. Try the year 2015, for example.
Recoding a variable
Problem: You want to perform a 1-to-1 recode on a variable. For example, consider tennis points:
you may have a variable x holding values 1 to 3 and you want to recode it to 15, 30, 40.
Solution 1:
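# a possible rendering via repeated calls to replace(); names are arbitrary
series y = replace(x, 1, 15)
series y = replace(y, 2, 30)
series y = replace(y, 3, 40)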
Solution 2:
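# a more compact sketch using matrix arguments to replace()
matrix from = {1, 2, 3}
matrix to = {15, 30, 40}
series y = replace(x, from, to)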
Comment: There are many equivalent ways to achieve the same effect, but for simple cases such
as this, the replace function is simple and transparent. If you don’t mind using matrices, scripts
using replace can also be remarkably compact. Note that replace also performs n-to-1 (“surjec-
tive”) replacements, such as
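series y = replace(x, {2, 3, 5, 11, 22, 33}, 1)   # sketch; series names are arbitrary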
which would turn all entries equal to 2, 3, 5, 11, 22 or 33 to 1 and leave the other ones unchanged.
Comment: The above works fine if the number of distinct values in the source to be condensed into
each dummy variable is fairly small, but it becomes cumbersome if a single dummy must comprise
dozens of source values.
Clever solution:
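# a sketch; "src" is the source series and the values in sel are illustrative
matrix sel = {2, 3, 5, 11, 22, 33}
series dum = maxr({src} .= vec(sel)') .> 0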
Comment: The subset of values to be grouped together can be written out as a matrix relatively
compactly (first line). The magic that turns this into the desired series (second line) relies on the
versatility of the “dot” (element-wise) matrix operators. The expression “{src}” gets a column-
vector version of the input series—call this x — and “vec(sel)’” gets the input matrix as a row
vector, in case it’s a column vector or a matrix with both dimensions greater than 1 — call this s. If
x is n × 1 and s is 1 × m, the “.=” operator produces an n × m result, each element (i, j) of which
equals 1 if xi = sj , otherwise 0. The maxr() function along with the “.>” operator (see chapter 17
for both) then produces the result we want.
Of course, whichever procedure you use, you have to repeat for each of the dummy series you want
to create (but keep reading—the “proper” solution is probably what you want if you plan to create
several dummies).
Further comment: Note that the clever solution depends on converting what is “naturally” a vector
result into a series. This will fail if there are missing values in src, since (by default) missing values
will be skipped when converting src to x, and so the number of rows in the result will fall short
of the number of observations in the dataset. One fix is then to subsample the dataset to exclude
missing values before employing this method; another is to adjust the skip_missing setting via
the set command (see the Gretl Command Reference).
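For the latter route, the relevant switch should be:
set skip_missing off   # keep NAs when series are converted to matrices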
Proper solution:
The best solution, in terms of both computational efficiency and code clarity, would be using a
“conversion table” and the replace function, to produce a series on which the dummify command
can be used. For example, suppose we want to convert from a series called fips holding FIPS
codes1 for the 50 US states plus the District of Columbia to a series holding codes for the four
standard US regions. We could create a 2 × 51 matrix — call it srmap— with the 51 FIPS codes on
the first row and the corresponding region codes on the second, and then do
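# a sketch of the idea: map FIPS codes to region codes, then dummify
series region = replace(fips, srmap[1,], srmap[2,])
dummify region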
Generating an ARMA(1,1)
Problem: Generate $y_t = 0.9\,y_{t-1} + \varepsilon_t - 0.5\,\varepsilon_{t-1}$, with $\varepsilon_t \sim NIID(0, 1)$.
Recommended solution:
alpha = 0.9
theta = -0.5
series y = filter(normal(), {1, theta}, alpha)
Bread and butter solution:
alpha = 0.9
theta = -0.5
series e = normal()
series y = 0
series y = alpha * y(-1) + e + theta * e(-1)
1 FIPS is the Federal Information Processing Standard: it assigns numeric codes from 1 to 56 to the US states and
outlying areas.
Comment: The filter function is specifically designed for this purpose so in most cases you’ll
want to take advantage of its speed and flexibility. That said, in some cases you may want to
generate the series in a manner which is more transparent (maybe for teaching purposes).
In the second solution, the statement series y = 0 is necessary because the next statement eval-
uates y recursively, so y[1] must be set. Note that you must use the keyword series here instead
of writing genr y = 0 or simply y = 0, to ensure that y is a series and not a scalar.
Problem: You have a continuous variable x and want to generate a discrete coding y by classes, as in
$$y_i = \begin{cases} 1 & \text{for } x_i < 18 \\ 2 & \text{for } 18 \le x_i < 65 \\ 3 & \text{for } x_i \ge 65 \end{cases}$$
Solution:
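# a sketch: true comparisons evaluate to 1, false to 0
series y = 1 + (x >= 18) + (x >= 65)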
Comment: True and false expressions are evaluated as 1 and 0 respectively, so they can be ma-
nipulated algebraically as any other number. The same result could also be achieved by using the
conditional assignment operator (see below), but in most cases it would probably lead to more
convoluted constructs.
Conditional assignment
Problem: Generate yt via the following rule:
$$y_t = \begin{cases} x_t & \text{for } d_t > a \\ z_t & \text{for } d_t \le a \end{cases}$$
Solution:
series y = (d > a) ? x : z
Comment: There are several alternatives to the one presented above. One is a brute force solution
using loops. Another one, more efficient but still suboptimal, would be
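# one possibility (series names as in the problem statement)
series y = (d > a)*x + (d <= a)*z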
However, the ternary conditional assignment operator is not only the most efficient way to accom-
plish what we want, it is also remarkably transparent to read when one gets used to it. Some readers
may find it helpful to note that the conditional assignment operator works exactly the same way as
the =IF() function in spreadsheets.
series x = time
Comment: The special construct genr time and its variants are aware of whether a dataset is a
panel.
nulldata 20
x = normal()
y = normal()
z = x + y # collinear
list A = x y const z
list B = sanitize(A)
list print A
list print B
returns
? list print A
x y const z
? list print B
const x y
Besides: it has been brought to our attention that some mischievous programs out there put the
constant last, instead of first, like God intended. We are not amused by this utter disrespect of
econometric tradition, but if you want to pursue the way of evil, it is rather simple to adapt the
script above to that effect.
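The four methods below presuppose an estimated OLS model together with matrix versions of the regressors and of (X'X)^-1; a possible preamble (the dependent variable y and the list X, including const, are assumptions) is:
ols y X --quiet
matrix mX = {X}              # regressors as a matrix
matrix iXX = invpd(mX'mX)    # (X'X)^{-1}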
# method 1
leverage --save --quiet
series h1 = lever
# method 2
series h2 = diag(qform(mX, iXX))
# method 3
series h3 = sumr(mX .* (mX*iXX))
# method 4
series h4 = NA
loop i=1..$nobs
matrix x = mX[i,]'
h4[i] = x'iXX*x
endloop
# verify
print h1 h2 h3 h4 --byobs
Comment: Solution 1 is the preferable one: it relies on the built-in leverage command, which
computes the requested series quite efficiently, taking care of missing values, possible restrictions
to the sample, etc.
However, three more are shown for didactical purposes, mainly to show the user how to manipulate
matrices. Solution 2 first constructs the PX matrix explicitly, via the qform function, and then takes
its diagonal; this is definitely not recommended (despite its compactness), since you generate a
much bigger matrix than you actually need and waste a lot of memory and CPU cycles in the
process. It doesn’t matter very much in the present case, since the sample size is very small, but
with a big dataset this could be a very bad idea.
Solution 3 is more clever, and relies on the fact that, if you define $Z = X\cdot(X'X)^{-1}$, then $h_i$ could also be written as
$$h_i = x_i' z_i = \sum_{j=1}^{k} x_{ij} z_{ij}$$
which is in turn equivalent to the sum of the elements of the i-th row of X ⊙ Z, where ⊙ is the
element-by-element product. In this case, your clever usage of matrix algebra would produce a
solution computationally much superior to solution 2.
Solution 4 is the most old-fashioned one, and employs an indexed loop. While this wastes practi-
cally no memory and employs no more CPU cycles in algebraic operations than strictly necessary,
it imposes a much greater burden on the hansl interpreter, since handling a loop is conceptually
more complex than a single operation. In practice, you'll find that for any realistically-sized problem, solution 4 is much slower than solution 3.
open bjg.gdt
order = 12
list L = lg || lags(order-1, lg)
smpl +order ;
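The moving statistics themselves are then obtained by applying min(), max() and median() to the list, exploiting the fact that these functions, when given a list, operate across its members observation by observation (a sketch):
series movmin = min(L)
series movmax = max(L)
series movmed = median(L)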
computes the moving minimum, maximum and median of the lg series. Plotting the four series
would produce something similar to figure 21.1.
[Figure 21.1: the lg series together with its moving minimum, median and maximum, 1950–1960]
? eval mcov(X)
2.0016 1.0157
1.0157 1.0306
If, instead, you want your simulated data to have a given sample covariance matrix, you have to
apply the same technique twice: one for standardizing the data, another one for giving it the
covariance structure you want. Example:
S = {2,1;1,1}
T = 1000
X = mnormal(T, rows(S))
X = X * (cholesky(S)/cholesky(mcov(X)))'
eval mcov(X)
gives you
? eval mcov(X)
2 1
1 1
as required.
list X = x1 x2 x3
list Z = z1 z2
list dZ = null
loop foreach i Z
series d$i = d * $i
list dZ = dZ d$i
endloop
ols y X Z d dZ
Comment: It’s amazing what string substitution can do for you, isn’t it?
Realized volatility
Problem: Given data by the minute, you want to compute the “realized volatility” for the hour as $RV_t = \frac{1}{60}\sum_{\tau=1}^{60} y_{t:\tau}^2$. Imagine your sample starts at time 1:1.
Solution:
smpl --full
genr time
series minute = int(time/60) + 1
series second = time % 60
setobs minute second --panel-vars
series rv = psd(y)^2
setobs 1 1
smpl second==1 --restrict
store foo rv
Comment: Here we trick gretl into thinking that our dataset is a panel dataset, where the minutes
are the “units” and the seconds are the “time”; this way, we can take advantage of the special
function psd(), panel standard deviation. Then we simply drop all observations but one per minute
and save the resulting data (store foo rv translates as “store in the gretl datafile foo.gdt the
series rv”).
list L1 = a b c
list L2 = x y z
k1 = 1
loop foreach i L1
k2 = 1
loop foreach j L2
if k1 == k2
ols $i 0 $j
endif
k2++
endloop
k1++
endloop
Comment: The simplest way to achieve the result is to loop over all possible combinations and
filter out the unneeded ones via an if condition, as above. That said, in some cases variable names
can help. For example, if
list Lx = x1 x2 x3
list Ly = y1 y2 y3
then we could just loop over the integers — quite intuitive and certainly more elegant:
loop i=1..3
ols y$i const x$i
endloop
where
$$r_k = \sum_{i=0}^{k} p_i\, q_{k-i}$$
is the convolution of the pi and qi coefficients. The same operation can be performed via the FFT,
but in most cases using conv2d is quicker and more natural.
As an example, we’ll use the same one we used in Section 30.5: consider the multiplication of two
polynomials:
$$P(x) = 1 + 0.5x$$
$$Q(x) = 1 + 0.3x - 0.8x^2$$
$$R(x) = P(x)\cdot Q(x) = 1 + 0.8x - 0.65x^2 - 0.4x^3$$
p = {1; 0.5}
q = {1; 0.3; -0.8}
r = conv2d(p, q)
print r
r (4 x 1)
1
0.8
-0.65
-0.4
which is indeed the desired result. Note that the same computation could also be performed via
the filter function, at the price of slightly more elaborate syntax.
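Here is one way the filter-based route might look: filter an impulse vector first by Q's coefficients and then by P's. This is a sketch, not the guide's own listing:
matrix e = {1; 0; 0; 0}                # impulse, long enough to hold the product
matrix r2 = filter(e, {1, 0.3, -0.8})  # apply Q(x) as an MA filter
r2 = filter(r2, {1, 0.5})              # then apply P(x)
print r2                               # should reproduce 1, 0.8, -0.65, -0.4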
$$A \,\triangle\, B = (A \setminus B) \cup (B \setminus A)$$
where in this context the backslash represents the relative complement operator, such that
$$A \setminus B = \{x \in A \mid x \notin B\}$$
In practice we first check if there are any series in A but not in B, then we perform the reverse check.
If the union of the two results is an empty set, then the lists must contain the same variables. The
hansl syntax for this would be something like
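a sketch using list subtraction and nelem() (A and B are the two lists to be compared):
list AnotB = A - B
list BnotA = B - A
if nelem(AnotB) == 0 && nelem(BnotA) == 0
    printf "A and B contain the same series\n"
else
    printf "A and B differ\n"
endif
In the example output below, the lists X and revX contain the same series in reverse order: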
list X print
list revX print
will produce
? list X print
D1 D872 EEN_DIS GCD
? list revX print
GCD EEN_DIS D872 D1
Cross-validation
Problem: “I’d like to compute the so-called leave-one-out cross-validation criterion for my regression.
Is there a command in gretl?”
If you have a sample with n observations, the “leave-one-out” cross-validation criterion can be
mechanically computed by running n regressions in which one observation at a time is omitted
and all the other ones are used to forecast its value. The sum of the n squared forecast errors is
the statistic we want. Fortunately, there is no need to do so. It is possible to prove that the same
statistic can be computed as
$$CV = \sum_{i=1}^{n}\left[\hat u_i/(1-h_i)\right]^2,$$
where hi is the i-th element of the “hat” matrix (see section 21.2) from a regression on the whole
sample.
This method is natively provided by gretl as a side benefit to the leverage command, that stores
the CV criterion into the $test accessor. The following script shows the equivalence of the two
approaches:
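The fragment shown presupposes an open dataset, a dependent variable price and a regressor list X, plus the leverage-based value for comparison; a possible preamble (dataset and variable names are illustrative) is:
open data4-1.gdt           # illustrative dataset containing "price" and "sqft"
list X = const sqft
ols price X --quiet
leverage --save --quiet
scalar CV_lev = $test      # the CV criterion as stored by the leverage command
matrix mX = {X}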
scalar CV = 0
matrix mX = {X}
loop i = 1 .. $nobs
xi = mX[i,]
yi = price[i]
smpl obs != i --restrict
ols price X --quiet
smpl full
scalar fe = yi - xi * $coeff
CV += fe^2
endloop
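The two numbers can then be compared, using the CV_lev scalar from the illustrative preamble above:
printf "CV: %g (loop), %g (leverage)\n", CV, CV_lev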
sumc(sumr(!ok(m))) > 0
If this gives a non-zero return value you know that m contains at least one non-finite element.
Part II
Econometric methods
Chapter 22
Robust covariance matrix estimation
22.1 Introduction
Consider (once again) the linear regression model
$$y = X\beta + u \qquad (22.1)$$
The OLS estimator of $\beta$ is
$$\hat\beta = (X'X)^{-1}X'y \qquad (22.2)$$
If the condition E(u|X) = 0 is satisfied, this is an unbiased estimator; under somewhat weaker conditions the estimator is biased but consistent. It is straightforward to show that when the OLS estimator is unbiased (that is, when $E(\hat\beta - \beta) = 0$), its variance is
$$\mathrm{Var}(\hat\beta) = E\left[(\hat\beta - \beta)(\hat\beta - \beta)'\right] = (X'X)^{-1}X'\Omega X\,(X'X)^{-1} \qquad (22.3)$$
where $\Omega = E(uu')$. If the errors are iid with common variance $\sigma^2$, so that $\Omega = \sigma^2 I$, this collapses to the classical formula $\sigma^2(X'X)^{-1}$ (equation 22.4).
If the iid assumption is not satisfied, two things follow. First, it is possible in principle to construct
a more efficient estimator than OLS—for instance some sort of Feasible Generalized Least Squares
(FGLS). Second, the simple “classical” formula for the variance of the least squares estimator is no
longer correct, and hence the conventional OLS standard errors — which are just the square roots
of the diagonal elements of the matrix defined by (22.4) — do not provide valid means of statistical
inference.
In the recent history of econometrics there are broadly two approaches to the problem of non-
iid errors. The “traditional” approach is to use an FGLS estimator. For example, if the departure
from the iid condition takes the form of time-series dependence, and if one believes that this
could be modeled as a case of first-order autocorrelation, one might employ an AR(1) estimation
method such as Cochrane–Orcutt, Hildreth–Lu, or Prais–Winsten. If the problem is that the error
variance is non-constant across observations, one might estimate the variance as a function of the
independent variables and then perform weighted least squares, using as weights the reciprocals
of the estimated variances.
While these methods are still in use, an alternative approach has found increasing favor: that is,
use OLS but compute standard errors (or more generally, covariance matrices) that are robust with
respect to deviations from the iid assumption. This is typically combined with an emphasis on
using large datasets—large enough that the researcher can place some reliance on the (asymptotic)
consistency property of OLS. This approach has been enabled by the availability of cheap computing
power. The computation of robust standard errors and the handling of very large datasets were
daunting tasks at one time, but now they are unproblematic. The other point favoring the newer
methodology is that while FGLS offers an efficiency advantage in principle, it often involves making
additional statistical assumptions which may or may not be justified, which may not be easy to test
rigorously, and which may threaten the consistency of the estimator — for example, the “common
factor restriction” that is implied by traditional FGLS “corrections” for autocorrelated errors.
James Stock and Mark Watson’s Introduction to Econometrics illustrates this approach at the level of
undergraduate instruction: many of the datasets they use comprise thousands or tens of thousands
of observations; FGLS is downplayed; and robust standard errors are reported as a matter of course.
In fact, the discussion of the classical standard errors (labeled “homoskedasticity-only”) is confined
to an Appendix.
Against this background it may be useful to set out and discuss all the various options offered
by gretl in respect of robust covariance matrix estimation. The first point to notice is that gretl
produces “classical” standard errors by default (in all cases apart from GMM estimation). In script
mode you can get robust standard errors by appending the --robust flag to estimation commands.
In the GUI program the model specification dialog usually contains a “Robust standard errors”
check box, along with a “configure” button that is activated when the box is checked. The configure
button takes you to a configuration dialog (which can also be reached from the main menu bar:
Tools → Preferences → General → HCCME). There you can select from a set of possible robust
estimation variants, and can also choose to make robust estimation the default.
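In script mode, for example, the --robust flag is simply appended to the estimation command (the variable names here are arbitrary):
ols y const x1 x2 --robust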
The specifics of the available options depend on the nature of the data under consideration —
cross-sectional, time series or panel— and also to some extent the choice of estimator. (Although
we introduced robust standard errors in the context of OLS above, they may be used in conjunction
with other estimators too.) The following three sections of this chapter deal with matters that are
specific to the three sorts of data just mentioned. Note that additional details regarding covariance
matrix estimation in the context of GMM are given in chapter 27.
We close this introduction with a brief statement of what “robust standard errors” can and cannot
achieve. They can provide for asymptotically valid statistical inference in models that are basically
correctly specified, but in which the errors are not iid. The “asymptotic” part means that they
may be of little use in small samples. The “correct specification” part means that they are not a
magic bullet: if the error term is correlated with the regressors, so that the parameter estimates
themselves are biased and inconsistent, robust standard errors will not save the day.
1 In some specialized contexts spatial autocorrelation may be an issue. Gretl does not have any built-in methods to
small” on average. This point is quite intuitive. The OLS parameter estimates, $\hat\beta$, satisfy by design the criterion that the sum of squared residuals,
$$\sum \hat u_t^2 = \sum \left(y_t - X_t\hat\beta\right)^2$$
is minimized for given X and y. Suppose that $\hat\beta \ne \beta$. This is almost certain to be the case: even if OLS is not biased, it would be a miracle if the $\hat\beta$ calculated from any finite sample were exactly equal to $\beta$. But in that case the sum of squares of the true, unobserved errors, $\sum u_t^2 = \sum (y_t - X_t\beta)^2$, is bound to be greater than $\sum \hat u_t^2$. The elaborated variants on HC0 take this point on board as follows:
• HC1: Applies a degrees-of-freedom correction, multiplying the HC0 matrix by $T/(T-k)$.
• HC2: Instead of using $\hat u_t^2$ for the diagonal elements of $\hat\Omega$, uses $\hat u_t^2/(1-h_t)$, where $h_t = X_t(X'X)^{-1}X_t'$, the $t$th diagonal element of the projection matrix $P_X$, which has the property that $P_X \cdot y = \hat y$. The relevance of $h_t$ is that if the variance of all the $u_t$ is $\sigma^2$, the expectation of $\hat u_t^2$ is $\sigma^2(1-h_t)$, or in other words, the ratio $\hat u_t^2/(1-h_t)$ has expectation $\sigma^2$. As Davidson and MacKinnon show, $0 \le h_t < 1$ for all $t$, so this adjustment cannot reduce the diagonal elements of $\hat\Omega$ and in general revises them upward.
• HC3: Uses $\hat u_t^2/(1-h_t)^2$. The additional factor of $(1-h_t)$ in the denominator, relative to HC2, may be justified on the grounds that observations with large variances tend to exert a lot of influence on the OLS estimates, so that the corresponding residuals tend to be underestimated. See Davidson and MacKinnon for a fuller explanation.
• HC3a: Implements the jackknife approach from MacKinnon and White (1985). (HC3 is a close approximation of this.)
The relative merits of these variants have been explored by means of both simulations and the-
oretical analysis. Unfortunately there is not a clear consensus on which is “best”. Davidson and
MacKinnon argue that the original HC0 is likely to perform worse than the others; nonetheless,
“White’s standard errors” are reported more often than the more sophisticated variants and there-
fore, for reasons of comparability, HC0 is the default HCCME in gretl.
If you wish to use HC1 , HC2 , HC3 , or HC3a you can arrange for this in either of two ways. In script
mode, you can do, for example,
set hc_version 2
In the GUI program you can go to the HCCME configuration dialog, as noted above, and choose any
of these variants to be the default.
disturbances to the market may usher in periods of increased volatility. Such phenomena call for
specific estimation strategies, such as GARCH (see chapter 31).
Despite the points made above, some residual degree of heteroskedasticity may be present in time
series data: the key point is that in most cases it is likely to be combined with serial correlation
(autocorrelation), hence demanding a special treatment. In White’s approach, Ω̂, the estimated
covariance matrix of the ut , remains conveniently diagonal: the variances, E(u2t ), may differ by
t but the covariances, E(ut us ) for s ≠ t, are all zero. Autocorrelation in time series data means
that at least some of the off-diagonal elements of Ω̂ should be non-zero. This introduces a
substantial complication and requires another piece of terminology: estimates of the covariance
matrix that are asymptotically valid in face of both heteroskedasticity and autocorrelation of the
error process are termed HAC (heteroskedasticity- and autocorrelation-consistent).
The issue of HAC estimation is treated in more technical terms in chapter 27. Here we try to
convey some of the intuition at a more basic level. We begin with a general comment: residual
autocorrelation is not so much a property of the data as a symptom of an inadequate model. Data
may be persistent though time, and if we fit a model that does not take this aspect into account
properly we end up with a model with autocorrelated disturbances. Conversely, it is often possible
to mitigate or even eliminate the problem of autocorrelation by including relevant lagged variables
in a time series model, or in other words, by specifying the dynamics of the model more fully. HAC
estimation should not be seen as the first resort in dealing with an autocorrelated error process.
That said, the “obvious” extension of White’s HCCME to the case of autocorrelated errors would
seem to be this: estimate the off-diagonal elements of Ω̂ (that is, the autocovariances, E(ut us ))
using, once again, the appropriate OLS residuals: ω̂ts = ût ûs . This is basically right, but demands
an important amendment. We seek a consistent estimator, one that converges towards the true Ω as
the sample size tends towards infinity. This can’t work if we allow unbounded serial dependence.
A larger sample will enable us to estimate more of the true ωts elements (that is, for t and s
more widely separated in time) but will not contribute ever-increasing information regarding the
maximally separated ωts pairs since the maximal separation itself grows with the sample size.
To ensure consistency we have to confine our attention to processes exhibiting temporally limited
dependence. In other words we cut off the computation of the ω̂ts values at some maximum value
of p = t − s, where p is treated as an increasing function of the sample size, T , although it cannot
increase in proportion to T .
The simplest variant of this idea is to truncate the computation at some finite lag order p, where
p grows as, say, T 1/4 . The trouble with this is that the resulting Ω̂ may not be a positive definite
matrix. In practical terms, we may end up with negative estimated variances. One solution to this
problem is offered by the Newey–West estimator (Newey and West, 1987), which assigns declining
weights to the sample autocovariances as the temporal separation increases.
To understand this point it is helpful to look more closely at the covariance matrix given in (22.5), namely,
$$(X'X)^{-1}(X'\hat\Omega X)(X'X)^{-1}$$
This is known as a “sandwich” estimator. The bread, which appears on both sides, is $(X'X)^{-1}$. This $k \times k$ matrix is also the key ingredient in the computation of the classical covariance matrix. The filling in the sandwich is
$$\hat\Sigma = X'\,\hat\Omega\,X$$
where $X'$ is $k \times T$, $\hat\Omega$ is $T \times T$ and $X$ is $T \times k$, so that $\hat\Sigma$ is $k \times k$.
It can be proven that under mild regularity conditions T −1 Σ̂ is a consistent estimator of the long-run
covariance matrix of the random k-vector xt · ut .
From a computational point of view it is neither necessary nor desirable to store the (potentially
very large) T × T matrix Ω̂ as such. Rather, one computes the sandwich filling by summation as a
weighted sum:
$$\hat\Sigma = \hat\Gamma(0) + \sum_{j=1}^{p} w_j\left(\hat\Gamma(j) + \hat\Gamma'(j)\right)$$
where $w_j$ is the weight given to lag $j > 0$ and the $k \times k$ matrix $\hat\Gamma(j)$, for $j \ge 0$, is given by
$$\hat\Gamma(j) = \sum_{t=j+1}^{T} \hat u_t \hat u_{t-j}\, X_t' X_{t-j};$$
that is, the sample autocovariance matrix of $x_t \cdot u_t$ at lag $j$, apart from a scaling factor $T$.
This leaves two questions. How exactly do we determine the maximum lag length or “bandwidth”,
p, of the HAC estimator? And how exactly are the weights wj to be determined? We will return to
the (difficult) question of the bandwidth shortly. As regards the weights, gretl offers three variants.
The default is the Bartlett kernel, as used by Newey and West. This sets
$$w_j = \begin{cases} 1 - \frac{j}{p+1} & j \le p \\ 0 & j > p \end{cases}$$
so the weights decline linearly as $j$ increases. The other two options are the Parzen kernel and the Quadratic Spectral (QS) kernel. For the Parzen kernel,
$$w_j = \begin{cases} 1 - 6a_j^2 + 6a_j^3 & 0 \le a_j \le 0.5 \\ 2(1-a_j)^3 & 0.5 < a_j \le 1 \\ 0 & a_j > 1 \end{cases}$$
where $a_j = j/(p+1)$.
[Figure: the Bartlett, Parzen and QS kernel weight functions]
In gretl you select the kernel using the set command with the hac_kernel parameter:
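For example, to switch to the Parzen kernel (the admissible parameter values should be bartlett, parzen and qs):
set hac_kernel parzen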
As shown in Table 22.1 the choice between nw1 and nw2 does not make a great deal of difference.
T p (nw1) p (nw2)
50 2 3
100 3 4
150 3 4
200 4 4
300 5 5
400 5 5
You also have the option of specifying a fixed numerical value for p, as in
set hac_lag 6
In addition you can set a distinct bandwidth for use with the Quadratic Spectral kernel (since this
need not be an integer). For example,
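assuming the relevant set variable is named qs_bandwidth:
set qs_bandwidth 3.5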
$$V_{LR}(x_t) = \frac{V_{LR}(u_t)}{(1-\rho)^2}$$
In most cases $u_t$ is likely to be less autocorrelated than $x_t$, so a smaller bandwidth should suffice. Estimation of $V_{LR}(x_t)$ can therefore proceed in three steps: (1) estimate $\rho$; (2) obtain a HAC estimate of the long-run variance of $\hat u_t = x_t - \hat\rho x_{t-1}$; and (3) divide the result by $(1-\hat\rho)^2$.
The application of the above concept to our problem implies estimating a finite-order Vector Au-
toregression (VAR) on the vector variables ξt = Xt ût . In general the VAR can be of any order, but
in most cases 1 is sufficient; the aim is not to build a watertight model for ξt , but just to “mop up”
a substantial part of the autocorrelation. Hence, the following VAR is estimated
$$\xi_t = A\xi_{t-1} + \varepsilon_t$$
set hac_prewhiten on
There is at present no mechanism for specifying an order other than 1 for the initial VAR.
A further refinement is available in this context, namely data-based bandwidth selection. It makes
intuitive sense that the HAC bandwidth should not simply be based on the size of the sample,
but should somehow take into account the time-series properties of the data (and also the kernel
chosen). A nonparametric method for doing this was proposed by Newey and West (1994); a good
concise account of the method is given in Hall (2005). This option can be invoked in gretl via
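presumably by selecting the data-based rule for the hac_lag setting, along these lines:
set hac_lag nw3   # nw3 = data-based bandwidth selection (assumed parameter name)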
This option is the default when prewhitening is selected, but you can override it by giving a specific
numerical value for hac_lag.
Even the Newey–West data-based method does not fully pin down the bandwidth for any particular
sample. The first step involves calculating a series of residual covariances. The length of this series
is given as a function of the sample size, but only up to a scalar multiple — for example, it is given
as O(T 2/9 ) for the Bartlett kernel. Gretl uses an implied multiple of 1.
The ES method is the default. The off option means that gretl will refuse to produce HAC standard
errors when the sample includes incomplete observations: use this if you have qualms about the
modified methods.
Long-run variance
Let us expand a little on the subject of the long-run variance that was mentioned above and the
associated tools offered by gretl for scripting. (You may also want to check out the reference for
the lrcovar function for the multivariate case.) As is well known, the variance of the average of T
random variables x1 , x2 , . . . , xT with equal variance σ 2 equals σ 2 /T if the data are uncorrelated. In
this case, the sample variance of xt over the sample size provides a consistent estimator.
If, however, there is serial correlation among the $x_t$s, the variance of $\bar X = T^{-1}\sum_{t=1}^{T} x_t$ must be estimated differently. One of the most widely used statistics for this purpose is a nonparametric kernel estimator with the Bartlett kernel defined as
$$\hat\omega^2(k) = T^{-1}\sum_{t=k}^{T-k}\left[\sum_{i=-k}^{k} w_i\,(x_t - \bar X)(x_{t-i} - \bar X)\right], \qquad (22.6)$$
where the integer $k$ is known as the window size and the $w_i$ terms are the so-called Bartlett weights, defined as $w_i = 1 - \frac{|i|}{k+1}$. It can be shown that, for $k$ large enough, $\hat\omega^2(k)/T$ yields a consistent estimator of the variance of $\bar X$.
gretl implements this estimator by means of the function lrvar(). This function takes one re-
quired argument, namely the series whose long-run variance is to be estimated, followed by two
optional arguments. The first of these can be used to supply a value for k; if it is omitted or nega-
tive, the popular choice T 1/3 is used. The second allows specification of an assumed value for the
population mean of X, which then replaces X̄ in the variance calculation. Usage is illustrated below.
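A minimal sketch of typical usage (the series is simulated here purely for illustration):
nulldata 256
setobs 1 1 --time-series
series x = filter(normal(), 1, 0.7)   # an autocorrelated series
scalar lr1 = lrvar(x)                 # window size k chosen by default
scalar lr2 = lrvar(x, 12)             # user-specified window size
printf "long-run variance: %g (default k), %g (k=12)\n", lr1, lr2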
• The variance of the error term may differ across the cross-sectional units.
• The covariance of the errors across the units may be non-zero in each time period.
• If the “between” variation is not swept out, the errors may exhibit autocorrelation, not in the
usual time-series sense but in the sense that the mean value of the error term may differ
across units. This is relevant when estimation is by pooled OLS.
Gretl currently offers two robust covariance matrix estimators specifically for panel data. These are
available for models estimated via fixed effects, random effects, pooled OLS, and pooled two-stage
least squares. The default robust estimator is that suggested by Arellano (2003), which is HAC
provided the panel is of the “large n, small T ” variety (that is, many units are observed in relatively
few periods). The Arellano estimator is
$$\hat\Sigma_A = \left(X'X\right)^{-1}\left(\sum_{i=1}^{n} X_i'\,\hat u_i\hat u_i'\,X_i\right)\left(X'X\right)^{-1}$$
where X is the matrix of regressors (with the group means subtracted in the case of fixed effects, or
quasi-demeaned in the case of random effects) ûi denotes the vector of residuals for unit i, and n
is the number of cross-sectional units.2 Cameron and Trivedi (2005) make a strong case for using
this estimator; they note that the ordinary White HCCME can produce misleadingly small standard
2 This variance estimator is also known as the “clustered (over entities)” estimator.
errors in the panel context because it fails to take autocorrelation into account.3 In addition Stock
and Watson (2008) show that the White HCCME is inconsistent in the fixed-effects panel context for
fixed T > 2.
In cases where autocorrelation is not an issue the estimator proposed by Beck and Katz (1995)
and discussed by Greene (2003, chapter 13) may be appropriate. This estimator, which takes into
account contemporaneous correlation across the units and heteroskedasticity by unit, is
$$\hat\Sigma_{BK} = \left(X'X\right)^{-1}\left(\sum_{i=1}^{n}\sum_{j=1}^{n} \hat\sigma_{ij}\, X_i' X_j\right)\left(X'X\right)^{-1}$$
set pcse on
(Note that regardless of the pcse setting, the robust estimator is not used unless the --robust flag
is given, or the “Robust” box is checked in the GUI program.)
For least-squares estimators the cluster-robust variance estimator takes the sandwich form
$$\hat\Sigma_C = \left(X'X\right)^{-1}\left(\sum_{j=1}^{m} X_j'\,\hat u_j\hat u_j'\,X_j\right)\left(X'X\right)^{-1}$$
where m denotes the number of clusters, and $X_j$ and $\hat u_j$ denote, respectively, the matrix of regressors and the vector of residuals that fall within cluster j. As noted above, the Arellano variance estimator for panel data models is a special case of this, where the clustering is by panel unit.
For models estimated by the method of Maximum Likelihood (in which case the standard variance
estimator is the inverse of the negative Hessian, H), the cluster estimator is
$$\hat\Sigma_C = H^{-1}\left(\sum_{j=1}^{m} G_j' G_j\right) H^{-1}$$
where Gj is the sum of the “score” (that is, the derivative of the loglikelihood with respect to the
parameter estimates) across the observations falling within cluster j.
3 See also Cameron and Miller (2015) for a discussion of the Arellano-type estimator in the context of the random
effects model.
It is common to apply a degrees of freedom adjustment to these estimators (otherwise the variance
may appear misleadingly small in comparison with other estimators, if the number of clusters is
small). In the least squares case the factor is (m/(m − 1)) × (n − 1)/(n − k), where n is the total
number of observations and k is the number of parameters estimated; in the case of ML estimation
the factor is just m/(m − 1).
ols y 0 x1 x2 --cluster=cvar
The specified clustering variable must (a) be defined (not missing) at all observations used in esti-
mating the model and (b) take on at least two distinct values over the estimation range. The clusters
are defined as sets of observations having a common value for the clustering variable. It is generally
expected that the number of clusters is substantially less than the total number of observations.
Chapter 23
Panel data
A panel dataset is one in which each of N > 1 units (sometimes called “individuals” or “groups”) is
observed over time. In a balanced panel there are T > 1 observations on each unit; more generally
the number of observations may differ by unit. In the following we index units by i and time by t.
To allow for imbalance in a panel we use the notation Ti to refer to the number of observations for
unit or individual i.
The pooled specification may be written as
$$y_{it} = X_{it}\beta + u_{it} \qquad (23.1)$$
where $y_{it}$ is the observation on the dependent variable for cross-sectional unit i in period t, $X_{it}$ is a $1 \times k$ vector of independent variables observed for unit i in period t, $\beta$ is a $k \times 1$ vector of parameters, and $u_{it}$ is an error or disturbance term specific to unit i in period t.
The fixed and random effects models have in common that they decompose the unitary pooled
error term, $u_{it}$. For the fixed effects model we write $u_{it} = \alpha_i + \varepsilon_{it}$, yielding
$$y_{it} = X_{it}\beta + \alpha_i + \varepsilon_{it} \qquad (23.2)$$
That is, we decompose uit into a unit-specific and time-invariant component, αi , and an observation-
specific error, εit .1 The αi s are then treated as fixed parameters (in effect, unit-specific y-intercepts),
which are to be estimated. This can be done by including a dummy variable for each cross-sectional
1 It is possible to break a third component out of $u_{it}$, namely $w_t$, a shock that is time-specific but common to all the units in a given period. In the interest of simplicity we do not pursue that option here.
unit (and suppressing the global constant). This is sometimes called the Least Squares Dummy Vari-
ables (LSDV) method. Alternatively, one can subtract the group mean from each of the variables and estimate a model without a constant. In the latter case the dependent variable may be written as
$$\tilde y_{it} = y_{it} - \bar y_i, \qquad \bar y_i = \frac{1}{T_i}\sum_{t=1}^{T_i} y_{it}$$
where Ti is the number of observations for unit i. An exactly analogous formulation applies to the
independent variables. Given parameter estimates, β̂, obtained using such de-meaned data we can
recover estimates of the $\alpha_i$s using
$$\hat\alpha_i = \frac{1}{T_i}\sum_{t=1}^{T_i}\left(y_{it} - X_{it}\hat\beta\right)$$
These two methods (LSDV, and using de-meaned data) are numerically equivalent. gretl takes the
approach of de-meaning the data. If you have a small number of cross-sectional units, a large num-
ber of time-series observations per unit, and a large number of regressors, it is more economical
in terms of computer memory to use LSDV. If need be you can easily implement this manually. For
example,
genr unitdum
ols y x du_*
For the random effects model the decomposition is instead $u_{it} = v_i + \varepsilon_{it}$, giving
$$y_{it} = X_{it}\beta + v_i + \varepsilon_{it}$$
In contrast to the fixed effects model, the $v_i$s are not treated as fixed parameters, but as random drawings from a given probability distribution.
The celebrated Gauss–Markov theorem, according to which OLS is the best linear unbiased esti-
mator (BLUE), depends on the assumption that the error term is independently and identically
distributed (IID). In the panel context, the IID assumption means that E(u2it ), in relation to equa-
tion 23.1, equals a constant, σu2 , for all i and t, while the covariance E(uis uit ) equals zero for all
s ≠ t and the covariance E(ujt uit ) equals zero for all j ≠ i.
If these assumptions are not met—and they are unlikely to be met in the context of panel data —
OLS is not the most efficient estimator. Greater efficiency may be gained using generalized least
squares (GLS), taking into account the covariance structure of the error term.
Consider observations on a given unit i at two different times s and t. From the hypotheses above
it can be worked out that Var(uis ) = Var(uit ) = σv2 + σε2 , while the covariance between uis and uit
is given by E(uis uit ) = σv2 .
In matrix notation, we may group all the Ti observations for unit i into the vector yi and write it as
yi = Xi β + ui (23.4)
The vector ui , which includes all the disturbances for individual i, has a variance–covariance matrix
given by
Var(ui ) = Σi = σε2 I + σv2 J (23.5)
where $J$ is a square matrix with all elements equal to 1. It can be shown that the matrix
$$K_i = I - \frac{\theta_i}{T_i}J,$$
where $\theta_i = 1 - \sqrt{\sigma_\varepsilon^2\big/\left(\sigma_\varepsilon^2 + T_i\sigma_v^2\right)}$, has the property
$$K_i \Sigma_i K_i' = \sigma_\varepsilon^2 I$$
It follows that the transformed model
$$K_i y_i = K_i X_i\beta + K_i u_i \qquad (23.6)$$
satisfies the Gauss–Markov conditions, and OLS estimation of (23.6) provides efficient inference.
But since
$$K_i y_i = y_i - \theta_i \bar y_i$$
GLS estimation is equivalent to OLS using “quasi-demeaned” variables; that is, variables from which
we subtract a fraction θ of their average.2 Notice that for σε2 → 0, θ → 1, while for σv2 → 0, θ → 0.
This means that if all the variance is attributable to the individual effects, then the fixed effects
estimator is optimal; if, on the other hand, individual effects are negligible, then pooled OLS turns
out, unsurprisingly, to be the optimal estimator.
To implement the GLS approach we need to calculate θ, which in turn requires estimates of the
two variances σε2 and σv2 . (These are often referred to as the “within” and “between” variances
respectively, since the former refers to variation within each cross-sectional unit and the latter to
variation between the units). Several means of estimating these magnitudes have been suggested in
the literature (see Baltagi, 1995); by default gretl uses the method of Swamy and Arora (1972): σε2
is estimated by the residual variance from the fixed effects model, and σv2 is estimated indirectly
with the help of the “between” regression which uses the group means of all the relevant variables, that is,
$$\bar y_i = \bar X_i\beta + e_i$$
The residual variance from this regression, $s_e^2$, can be shown to estimate the sum $\sigma_v^2 + \sigma_\varepsilon^2/T$. An estimate of $\sigma_v^2$ can therefore be obtained by subtracting $1/T$ times the estimate of $\sigma_\varepsilon^2$ from $s_e^2$:
$$\hat\sigma_v^2 = s_e^2 - \hat\sigma_\varepsilon^2/T$$
Alternatively, if the --nerlove option is given, gretl uses the method suggested by Nerlove (1971).
In this case $\sigma_v^2$ is estimated as the sample variance of the fixed effects, $\hat\alpha_i$:
$$\hat\sigma_v^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(\hat\alpha_i - \bar{\hat\alpha}\right)^2 \qquad (23.8)$$
where $\bar{\hat\alpha}$ denotes the mean of the fixed effects.
option with the panel command. But the gain in efficiency from doing so may well be slim; for a
discussion of this point and related matters see Cottrell (2017). Unbalancedness also affects the
Nerlove (1971) estimator, but the econometric literature offers no guidance on the details. Gretl
uses the weighted average of the fixed effects as a natural extension of the original method. Again,
see Cottrell (2017) for further details.
Choice of estimator
Which panel method should one use, fixed effects or random effects?
One way of answering this question is in relation to the nature of the data set. If the panel comprises
observations on a fixed and relatively small set of units of interest (say, the member states of the
European Union), there is a presumption in favor of fixed effects. If it comprises observations on a
large number of randomly selected individuals (as in many epidemiological and other longitudinal
studies), there is a presumption in favor of random effects.
Besides this general heuristic, however, various statistical issues must be taken into account.
1. Some panel data sets contain variables whose values are specific to the cross-sectional unit
but which do not vary over time. If you want to include such variables in the model, the fixed
effects option is simply not available. When the fixed effects approach is implemented using
dummy variables, the problem is that the time-invariant variables are perfectly collinear with
the per-unit dummies. When using the approach of subtracting the group means, the issue is
that after de-meaning these variables are nothing but zeros.
2. A somewhat analogous issue arises with the random effects estimator. As mentioned above,
the default Swamy–Arora method relies on the group-means regression to obtain a measure
of the between variance. Suppose we have observations on n units or individuals and there
are k independent variables of interest. If k > n, this regression cannot be run — since we
have only n effective observations — and hence Swamy–Arora estimates cannot be obtained.
In this case, however, it is possible to use Nerlove’s method instead.
If both fixed effects and random effects are feasible for a given specification and dataset, the choice
between these estimators may be expressed in terms of the two econometric desiderata, efficiency
and consistency.
From a purely statistical viewpoint, we could say that there is a tradeoff between robustness and
efficiency. In the fixed effects approach, we do not make any hypotheses on the “group effects” (that
is, the time-invariant differences in mean between the groups) beyond the fact that they exist —
and that can be tested; see below. As a consequence, once these effects are swept out by taking
deviations from the group means, the remaining parameters can be estimated.
On the other hand, the random effects approach attempts to model the group effects as drawings
from a probability distribution instead of removing them. This requires that individual effects are
representable as a legitimate part of the disturbance term, that is, zero-mean random variables,
uncorrelated with the regressors.
As a consequence, the fixed-effects estimator “always works”, but at the cost of not being able to
estimate the effect of time-invariant regressors. The richer hypothesis set of the random-effects
estimator ensures that parameters for time-invariant regressors can be estimated, and that esti-
mation of the parameters for time-varying regressors is carried out more efficiently. These advan-
tages, though, are tied to the validity of the additional hypotheses. If, for example, there is reason
to think that individual effects may be correlated with some of the explanatory variables, then the
random-effects estimator would be inconsistent, while fixed-effects estimates would still be valid.
The Hausman test is built on this principle (see below): if the fixed- and random-effects estimates
agree, to within the usual statistical margin of error, there is no reason to think the additional
hypotheses invalid, and as a consequence, no reason not to use the more efficient RE estimator.
• Collect the fixed-effects estimates in a vector β̃ and the corresponding random-effects esti-
mates in β̂, then form the difference vector (β̃ − β̂).
• Form the covariance matrix of the difference vector as Var(β̃ − β̂) = Var(β̃) − Var(β̂) = Ψ ,
where Var(β̃) and Var(β̂) are estimated by the sample variance matrices of the fixed- and
random-effects models respectively.3
• Compute $H = \left(\tilde\beta - \hat\beta\right)'\,\Psi^{-1}\left(\tilde\beta - \hat\beta\right)$.
Given the relative efficiencies of β̃ and β̂, the matrix Ψ “should be” positive definite, in which case
H is positive, but in finite samples this is not guaranteed and of course a negative χ 2 value is not
admissible.
The regression method avoids this potential problem. The procedure is to estimate, via OLS, an
augmented regression in which the dependent variable is quasi-demeaned y and the regressors
include both quasi-demeaned X (as in the RE specification) and the de-meaned variants of all the
time-varying variables (i.e. the fixed-effects regressors). The Hausman null then implies that the
coefficients on the latter subset of regressors should be statistically indistinguishable from zero.
If the RE specification employs the default covariance-matrix estimator (assuming IID errors), H
can be obtained as follows:
3 Hausman (1978) showed that the covariance of the difference takes this simple form when β̂ is an efficient estimator
and β̃ is inefficient.
• Treat the random-effects model as the restricted model, and record its sum of squared resid-
uals as SSRr .
• Estimate the augmented (unrestricted) regression and record its sum of squared residuals as
SSRu .
• Compute H = n (SSRr − SSRu ) /SSRu , where n is the total number of observations used.
Alternatively, if the --robust option is selected for RE estimation, H is calculated as a Wald test
based on a robust estimate of the covariance matrix of the augmented regression. Either way, H
cannot be negative.
By default gretl computes the Hausman test via the regression method, but it uses the matrix-
difference method if you pass the option --matrix-diff to the panel command.
Serial correlation
A simple test for first-order autocorrelation of the error term, namely the Durbin–Watson (DW)
statistic, is printed as part of the output for pooled OLS as well as fixed-effects and random-effects
estimation. Let us define “serial correlation proper” as serial correlation strictly in the time di-
mension of a panel dataset. When based on the residuals from fixed-effects estimation, the DW
statistic is a test for serial correlation proper.4 The DW value shown in the case of random-effects
estimation is based on the fixed-effects residuals. When DW is based on pooled OLS residuals it
tests for serial correlation proper only on the assumption of a common intercept. Put differently,
in this case it tests a joint null hypothesis: absence of fixed effects plus absence of (first order)
serial correlation proper. In the presence of missing observations the DW statistic is calculated as
described in Baltagi and Wu (1999) (their expression for d1 under equation (16) on page 819).
When it is computed, the DW statistic can be retrieved via the accessor $dw after estimation. In
addition, an approximate P -value for the null of no serial correlation (ρ = 0) against the alternative
of ρ > 0 may be available via the accessor $dwpval. This is based on the analysis in Bhargava
et al. (1982); strictly speaking it is the marginal significance level of DW considered as a dL value
(the value below which the test rejects, as opposed to dU , the value above which the test fails to
reject). In the panel case, however, dL and dU are quite close, particularly when N (the number of
individual units) is large. At present gretl does not attempt to compute such P -values when the
number of observations differs across individuals.
4 The generalization of the Durbin–Watson statistic from the straight time-series context to panel data is due to Bhargava et al. (1982).
The method that gretl uses internally is exemplified in Listing 23.1. The coefficients in the second
OLS estimation, including the intercept, agree with those in the initial fixed effects model, though
the standard errors differ due to a degrees of freedom correction in the fixed-effects covariance
matrix. (Note that the pmean function returns the group mean of a series.) The third estimator —
which produces quite a lot of output — instead uses the stdize function to create the centered
dummies. It thereby shows the equivalence of the internally-used method to “OLS plus centered
dummies”. (Note that in this case the standard errors agree with the initial estimates.)
Listing 23.1: Calculating the intercept in the fixed effects model
open abdata.gdt
###
### built-in method
###
###
### recentering "by hand"
###
###
### using centered dummies
###
the “important” parameters of a model in which you want to include (for whatever reason) a full set of individual dummies. If you take the second of these perspectives, your dependent variable is unmodified y and your model includes the unit dummies; the appropriate R² measure is then the squared correlation between y and the ŷ computed using both the measured individual effects and the effects of the explicitly named regressors. This is reported by gretl as the “LSDV R-squared”. If you take the first point of view, on the other hand, your dependent variable is really $y_{it} - \bar y_i$ and your model just includes the β terms, the coefficients of deviations of the x variables from their per-unit means. In this case, the relevant measure of R² is the so-called “within” R²; this variant is printed by gretl for fixed-effects models in place of the adjusted R² (it being unclear in this case what exactly the “adjustment” should amount to anyway).
In this model gretl takes the “fitted value” ($yhat) to be α̂i + Xit β̂, and the residual ($uhat) to be
yit minus this fitted value. This makes sense because the fixed effects (the αi terms) are taken as
parameters to be estimated. However, it can be argued that the fixed effects are not really “explana-
tory” and if one defines the residual as the observed yit value minus its “explained” component
one might prefer to see just yit − Xit β̂. You can get this after fixed-effects estimation as follows:
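presumably via something along these lines (the series name is arbitrary):
series u_alt = $uhat + $ahat - $coeff[1]   # sketch, using the accessors described below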
where $ahat gives the unit-specific intercept (as it would be calculated if one included all N unit
dummies and omitted a common y-intercept), and $coeff[1] gives the “global” y-intercept.6
Now consider the random-effects model:
In this case gretl considers the error term to be vi + εit (since vi is conceived as a random drawing)
and the $uhat series is an estimate of this, namely
yit − Xit β̂
What if you want an estimate of just vi (or just εit ) in this case? This poses a signal-extraction
problem: given the composite residual, how to recover an estimate of its components? The solution
is to ascribe to the individual effect, $\hat v_i$, a suitable fraction of the mean residual per individual, $\bar{\hat u}_i = T_i^{-1}\sum_{t=1}^{T_i}\hat u_{it}$. The “suitable fraction” is the proportion of the variance of $\bar u_i$ that is due to $v_i$, namely
$$\frac{\sigma_v^2}{\sigma_v^2 + \sigma_\varepsilon^2/T_i} = 1 - (1-\theta_i)^2$$
After random effects estimation in gretl you can access a series containing the v̂i s under the name
$ahat. This series can be calculated by hand as follows:
6 For anyone used to Stata, gretl's fixed-effects $uhat corresponds to what you get from Stata's “predict, e” after estimation with “xtreg, fe”.
Special issues arise when the regressors include a lag of the dependent variable, as in
$$y_{it} = X_{it}\beta + \rho y_{i,t-1} + u_{it} \qquad (23.9)$$
First, if the error $u_{it}$ includes a group effect, $v_i$, then $y_{i,t-1}$ is bound to be correlated with the error, since the value of $v_i$ affects $y_i$ at all $t$. That means that OLS applied to (23.9) will be inconsistent
as well as inefficient. The fixed-effects model sweeps out the group effects and so overcomes this
particular problem, but a subtler issue remains, which applies to both fixed and random effects
estimation. Consider the de-meaned representation of fixed effects, as applied to the dynamic
model,
ỹit = X̃it β + ρ ỹi,t−1 + εit
where ỹit = yit − ȳi and εit = uit − ūi (or uit − αi , using the notation of equation 23.2). The trouble
is that ỹi,t−1 will be correlated with εit via the group mean, ȳi . The disturbance εit influences yit
directly, which influences ȳi , which, by construction, affects the value of ỹit for all t. The same
issue arises in relation to the quasi-demeaning used for random effects. Estimators which ignore
this correlation will be consistent only as T → ∞ (in which case the marginal effect of εit on the
group mean of y tends to vanish).
One strategy for handling this problem, and producing consistent estimates of β and ρ, was pro-
posed by Anderson and Hsiao (1981). Instead of de-meaning the data, they suggest taking the first
difference of (23.9), an alternative tactic for sweeping out the group effects:
$$\Delta y_{it} = \Delta X_{it}\beta + \rho\,\Delta y_{i,t-1} + \eta_{it}$$
where ηit = ∆uit = ∆(vi + εit ) = εit − εi,t−1 . We’re not in the clear yet, given the structure of the
error ηit : the disturbance εi,t−1 is an influence on both ηit and ∆yi,t−1 = yit − yi,t−1 . The next step
is then to find an instrument for the “contaminated” ∆yi,t−1 . Anderson and Hsiao suggest using
either yi,t−2 or ∆yi,t−2 , both of which will be uncorrelated with ηit provided that the underlying
errors, εit , are not themselves serially correlated.
The Anderson–Hsiao estimator is not provided as a built-in function in gretl, since gretl’s sensible
handling of lags and differences for panel data makes it a simple application of regression with
instrumental variables—see Listing 23.2, which is based on a study of country growth rates by
Nerlove (1999).7
Although the Anderson–Hsiao estimator is consistent, it is not the most efficient: it does not make the
fullest use of the available instruments for ∆yi,t−1 , nor does it take into account the differenced
structure of the error ηit . It is improved upon by the methods of Arellano and Bond (1991) and
Blundell and Bond (1998). These methods are taken up in the next chapter.
Listing 23.2: The Anderson–Hsiao estimator for a dynamic panel model
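In outline, and assuming a panel dataset with dependent variable y and a single additional regressor x (both names hypothetical), the estimator amounts to something like the following:

diff y x                                  # create first differences d_y and d_x
# IV estimation of the differenced equation, instrumenting the
# "contaminated" d_y(-1) with the second lag of the level, y(-2)
tsls d_y const d_y(-1) d_x ; const y(-2) d_x

Alternatively, ∆yi,t−2 (d_y(-2)) could serve as the instrument in place of yi,t−2.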
The command for estimating dynamic panel models in gretl is dpanel. This command supports
both the “difference” estimator (Arellano and Bond, 1991) and the “system” estimator (Blundell and
Bond, 1998), which has become the method of choice in the applied literature.
24.1 Introduction
Notation
A dynamic linear panel data model can be represented as follows (in notation based on Arellano
(2003)):
yit = αyi,t−1 + β′ xit + ηi + vit (24.1)
where i = 1, 2 . . . , N indexes the cross-section units and t indexes time.
The main idea behind the difference estimator is to sweep out the individual effect via differencing.
First-differencing eq. (24.1) yields

    ∆yit = α∆yi,t−1 + β′ ∆xit + ∆vit                                (24.2)

in obvious notation. The error term of (24.2) is, by construction, autocorrelated and also correlated
with the lagged dependent variable, so an estimator that takes both issues into account is needed.
The endogeneity issue is solved by noting that all values of yi,t−k with k > 1 can be used as
instruments for ∆yi,t−1 : unobserved values of yi,t−k (whether missing or pre-sample) can safely be
substituted with 0. In the language of GMM, this amounts to using the relation

    E(∆vit · yi,t−k ) = 0,    for k > 1                             (24.3)

as an orthogonality condition.
Autocorrelation is dealt with by noting that if vit is white noise, the covariance matrix of the vector
whose typical element is ∆vit is proportional to a matrix H that has 2 on the main diagonal, −1
on the first subdiagonals and 0 elsewhere. One-step GMM estimation of equation (24.2) amounts to
computing
    γ̂ = [ (Σi W′i Zi) AN (Σi Z′i Wi) ]⁻¹ (Σi W′i Zi) AN (Σi Z′i ∆yi)        (24.4)
where
    ∆yi = [ ∆yi3  ···  ∆yiT ]′

    Wi  = [ ∆yi2  ···  ∆yi,T−1 ; ∆xi3  ···  ∆xiT ]′

    Zi  = [ yi1   0    0   ···   0                ∆xi3 ]
          [  0   yi1  yi2  ···   0                ∆xi4 ]
          [  ⋮                                     ⋮   ]
          [  0    0    0   ···  yi1 ··· yi,T−2    ∆xiT ]
and
    AN = ( Σi Z′i H Zi )⁻¹
Once the 1-step estimator is computed, the sample covariance matrix of the estimated residuals
can be used instead of H to obtain 2-step estimates, which are not only consistent but asymp-
totically efficient. (In principle the process may be iterated, but nobody seems to be interested.)
Standard GMM theory applies, except for one point: Windmeijer (2005) has computed finite-sample
corrections to the asymptotic covariance matrix of the parameters, which are nowadays almost
universally used.
The difference estimator is consistent, but has been shown to have poor properties in finite samples
when α is near one. People these days prefer the so-called “system” estimator, which complements
the differenced data (with lagged levels used as instruments) with data in levels (using lagged dif-
ferences as instruments). The system estimator relies on an extra orthogonality condition which
has to do with the earliest value of the dependent variable yi,1 . The interested reader is referred
to Blundell and Bond (1998, pp. 124–125) for details, but here it suffices to say that this condi-
tion is satisfied in mean-stationary models and brings an improvement in efficiency that may be
substantial in many cases.
The set of orthogonality conditions exploited in the system approach is not very much larger than
with the difference estimator since most of the possible orthogonality conditions associated with
the equations in levels are redundant, given those already used for the equations in differences.
The key equations of the system estimator can be written as
    γ̃ = [ (Σi W̃′i Z̃i) AN (Σi Z̃′i W̃i) ]⁻¹ (Σi W̃′i Z̃i) AN (Σi Z̃′i ∆ỹi)        (24.5)
where
    ∆ỹi = [ ∆yi3  ···  ∆yiT   yi3  ···  yiT ]′

    W̃i  = [ ∆yi2  ···  ∆yi,T−1   yi2  ···  yi,T−1 ; ∆xi3  ···  ∆xiT   xi3  ···  xiT ]′

    Z̃i  = [ yi1   0    0   ···   0                0     ···    0          ∆xi3 ]
          [  0   yi1  yi2  ···   0                0     ···    0          ∆xi4 ]
          [  ⋮                                                              ⋮  ]
          [  0    0    0   ···  yi1 ··· yi,T−2    0     ···    0          ∆xiT ]
          [  0    0    0   ···   0               ∆yi2   ···    0           xi3 ]
          [  ⋮                                                              ⋮  ]
          [  0    0    0   ···   0                0     ···  ∆yi,T−1       xiT ]
and
    AN = ( Σi Z̃′i H∗ Z̃i )⁻¹
In this case choosing a precise form for the matrix H ∗ for the first step is no trivial matter. Its
north-west block should be as similar as possible to the covariance matrix of the vector ∆vit , so
the same choice as the “difference” estimator is appropriate. Ideally, the south-east block should
be proportional to the covariance matrix of the vector ιηi + vi , that is, σv² I + ση² ιι′; but since ση² is
unknown and any positive definite matrix renders the estimator consistent, people just use I. The
off-diagonal blocks should, in principle, contain the covariances between ∆vis and vit , which would
be an identity matrix if vit is white noise. However, since the south-east block is typically given a
conventional value anyway, the benefit in making this choice is not obvious. Some packages use I;
others use a zero matrix. Asymptotically, it should not matter, but on real datasets the difference
between the resulting estimates can be noticeable.
Rank deficiency
Both the difference estimator (24.4) and the system estimator (24.5) depend for their existence on
the invertibility of AN . This matrix may turn out to be singular for several reasons. However, this
does not mean that the estimator is not computable. In some cases, adjustments are possible such
that the estimator does exist, but the user should be aware that in such cases not all software
packages use the same strategy and replication of results may prove difficult or even impossible.
A first reason why AN may be singular is unavailability of instruments, chiefly because of missing
observations. This case is easy to handle. If a particular row of Z̃i is zero for all units, the corre-
sponding orthogonality condition (or the corresponding instrument if you prefer) is automatically
dropped; the overidentification rank is then adjusted for testing purposes.
Even if no instruments are zero, however, AN could be rank deficient. A trivial case occurs if there
are collinear instruments, but a less trivial case may arise when T (the total number of time periods
available) is not much smaller than N (the number of units), as, for example, in some macro datasets
where the units are countries. The total number of potentially usable orthogonality conditions is
O(T²), which may well exceed N in some cases. Since AN is the sum of N matrices which have, at most, rank 2T − 3, it could well happen that the sum is singular.
In all these cases, dpanel substitutes the pseudo-inverse of AN (Moore–Penrose) for its regular
inverse. Our choice is shared by some software packages, but not all, so replication may be hard.
Treatment of missing values
Consider the model

    yt = αyt−1 + η + ϵt
where the i index is omitted for clarity. Suppose you have an individual with t = 1 . . . 5, for which
y3 is missing. It may seem that the data for this individual are unusable, because differencing yt
would produce something like
    t      1   2   3   4   5
    yt     ∗   ∗   ◦   ∗   ∗
    ∆yt    ◦   ∗   ◦   ◦   ∗
where ∗ = nonmissing and ◦ = missing. Estimation seems to be unfeasible, since there are no
periods in which ∆yt and ∆yt−1 are both observable.
However, we can use a k-difference operator and get
∆k yt = α∆k yt−1 + ∆k ϵt
where ∆k = 1 − Lk and past levels of yt are valid instruments. In this example, we can choose k = 3
and use y1 as an instrument, so this unit is in fact usable.
Not all software packages seem to be aware of this possibility, so replicating published results may
prove tricky if your dataset contains individuals with gaps between valid observations.
24.2 Usage
One feature of dpanel’s syntax is that you get default values for several choices you may wish to
make, so that in a “standard” situation the command is very concise. The simplest case of the
model (24.1) is a plain AR(1) process:

    yit = αyi,t−1 + ηi + vit                                        (24.6)

which is estimated by
dpanel 1 ; y
Gretl assumes that you want to estimate (24.6) via the difference estimator (24.4), using as many
orthogonality conditions as possible. The scalar 1 between dpanel and the semicolon indicates that
only one lag of y is included as an explanatory variable; using 2 would give an AR(2) model. The
syntax that gretl uses for the non-seasonal AR and MA lags in an ARMA model is also supported in
this context. For example, if you want the first and third lags of y (but not the second) included as
explanatory variables you can say
dpanel {1 3} ; y
To use a single lag of y other than the first you need to employ this mechanism:
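For example (an illustrative case), to include only the third lag:

dpanel {3} ; y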
To use the system estimator instead, you add the --system option, as in
dpanel 1 ; y --system
The level orthogonality conditions and the corresponding instrument are appended automatically
(see eq. 24.5).
Regressors
If additional regressors are to be included, they should be listed after the dependent variable in
the same way as other gretl estimation commands, such as ols. For the difference orthogonality
relations, dpanel takes care of transforming the regressors in parallel with the dependent variable.
One case of potential ambiguity is when an intercept is specified but the difference-only estimator
is selected, as in
dpanel 1 ; y const
In this case the default dpanel behavior, which agrees with David Roodman’s xtabond2 for Stata
(Roodman, 2009a), is to drop the constant (since differencing reduces it to nothing but zeros).
However, for compatibility with the DPD package for Ox, you can give the option --dpdstyle, in
which case the constant is retained (equivalent to including a linear trend in equation 24.1). A
similar point applies to the period-specific dummy variables which can be added in dpanel via the
--time-dummies option: in the differences-only case these dummies are entered in differenced
form by default, but when the --dpdstyle switch is applied they are entered in levels.
The standard gretl syntax applies if you want to use lagged explanatory variables, so for example
the command
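dpanel 1 ; y const x(0 to -1)

would include both the current value and the first lag of a regressor x (a hypothetical name) among the explanatory variables.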
Instruments
The default rules for instruments are:
• lags of the dependent variable are instrumented using all available orthogonality conditions;
and
• additional regressors are considered exogenous, so they are used as their own instruments.
If a different policy is wanted, the instruments should be specified in an additional list, separated
from the regressors list by a semicolon. The syntax closely mirrors that of the tsls command,
but in this context it is necessary to distinguish between “regular” instruments and what are often
called “GMM-style” instruments (that is, instruments that are handled in the same block-diagonal
manner as lags of the dependent variable, as described above).
“Regular” instruments are transformed in the same way as regressors, and the contemporaneous
value of the transformed variable is used to form an orthogonality condition. Since regressors are
treated as exogenous by default, it follows that these two commands estimate the same model:
dpanel 1 ; y z
dpanel 1 ; y z ; z
The instrument specification in the second case simply confirms what is implicit in the first: that
z is exogenous. Note, though, that if you have some additional variable z2 which you want to add
as a regular instrument, it then becomes necessary to include z in the instrument list if it is to be
treated as exogenous:
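dpanel 1 ; y z ; z z2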
The specification of “GMM-style” instruments is handled by the special constructs GMM() and
GMMlevel(). The first of these relates to instruments for the equations in differences, and the
second to the equations in levels. The syntax for GMM() is

    GMM(name, minlag, maxlag)
where name is replaced by the name of a series (or the name of a list of series), and minlag and
maxlag are replaced by the minimum and maximum lags to be used as instruments. The same goes
for GMMlevel().
One common use of GMM() is to limit the number of lagged levels of the dependent variable used as
instruments for the equations in differences. It’s well known that although exploiting all possible
orthogonality conditions yields maximal asymptotic efficiency, in finite samples it may be prefer-
able to use a smaller subset—see Roodman (2009b), Okui (2009). For example, the specification
dpanel 1 ; y ; GMM(y, 2, 4)

ensures that only lags 2 through 4 of the level of y are used as instruments for the equations in differences.
A further refinement is the option to “collapse” the block-diagonal instrument set, so that each lag supplies a single column rather than one column per time period; the effect on the GMM() and GMMlevel() instrument blocks is shown schematically below.
GMM():

    [ yi1   0    0    0    0    0   ··· ]        [ yi1   0    0   ··· ]
    [  0   yi1  yi2   0    0    0   ··· ]   =⇒   [ yi1  yi2   0   ··· ]
    [  0    0    0   yi1  yi2  yi3  ··· ]        [ yi1  yi2  yi3  ··· ]
    [  ⋮                                ]        [  ⋮                 ]

GMMlevel():

    [ ∆yi2   0     0   ··· ]        [ ∆yi2 ]
    [  0    ∆yi3   0   ··· ]   =⇒   [ ∆yi3 ]
    [  0     0   ∆yi4  ··· ]        [ ∆yi4 ]
    [  ⋮                   ]        [  ⋮   ]
This treatment of instruments can be selected per GMM or GMMlevel case — by appending the
collapse flag following the maxlag value — or it can be set “globally” by use of the --collapse
option to the dpanel command. To our knowledge Roodman’s xtabond2 was the first software to
offer this useful facility.
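For example, either of the following (illustrative) forms collapses the block of instruments generated from y:

dpanel 1 ; y ; GMM(y, 2, 4, collapse)
dpanel 1 ; y ; GMM(y, 2, 4) --collapse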
A further use of GMM() is to exploit more fully the potential orthogonality conditions afforded by
an exogenous regressor, or a related variable that does not appear as a regressor. For example, in
dpanel 1 ; y x ; GMM(z, 2, 6)
the variable x is considered an endogenous regressor, and up to 5 lags of z are used as instruments.
Note that in the following script fragment
dpanel 1 ; y z
dpanel 1 ; y z ; GMM(z,0,0)
the two estimation commands should not be expected to give the same result, as the sets of orthog-
onality relationships are subtly different. In the latter case, you have T − 2 separate orthogonality
relationships pertaining to zit , none of which has any implication for the other ones; in the former
case, you only have one. In terms of the Zi matrix, the first form adds a single row to the bottom of
the instruments matrix, while the second form adds a diagonal block with T − 2 columns; that is,
    [ zi3   zi4   ···   ziT ]

versus

    [ zi3    0   ···    0  ]
    [  0    zi4  ···    0  ]
    [  ⋮          ⋱     ⋮  ]
    [  0     0   ···   ziT ]
#include <oxstd.h>
#import <packages/dpd/dpd>
main()
{
decl dpd = new DPD();
dpd.Load("abdata.in7");
dpd.SetYear("YEAR");
delete dpd;
}
In the examples below we take this template for granted and show just the model-specific code.
Example 1
The following Ox/DPD code—drawn from abest1.ox— replicates column (b) of Table 4 in Arellano
and Bond (1991), an instance of the differences-only or GMM-DIF estimator. The dependent variable
1 See http://www.doornik.com/download.html.
2 To be specific, this is using Ox Console version 5.10, version 1.24 of the DPD package, and gretl built from CVS as of
2010-10-23, all on Linux.
is the log of employment, n; the regressors include two lags of the dependent variable, current and
lagged values of the log real-product wage, w, the current value of the log of gross capital, k, and
current and lagged values of the log of industry output, ys. In addition the specification includes
a constant and five year dummies; unlike the stochastic regressors, these deterministic terms are
not differenced. In this specification the regressors w, k and ys are treated as exogenous and serve
as their own instruments. In DPD syntax this requires entering these variables twice, on the X_VAR
and I_VAR lines. The GMM-type (block-diagonal) instruments in this example are the second and
subsequent lags of the level of n. Both 1-step and 2-step estimates are computed.
dpd.Gmm("n", 2, 99);
dpd.SetDummies(D_CONSTANT + D_TIME);
open abdata.gdt
list X = w w(-1) k ys ys(-1)
dpanel 2 ; n X const --time-dummies --asy --dpdstyle
dpanel 2 ; n X const --time-dummies --asy --two-step --dpdstyle
Note that in gretl the switch to suppress robust standard errors is --asymptotic, here abbreviated
to --asy.3 The --dpdstyle flag specifies that the constant and dummies should not be differenced,
in the context of a GMM-DIF model. With gretl’s dpanel command it is not necessary to specify the
exogenous regressors as their own instruments since this is the default; similarly, the use of the
second and all longer lags of the dependent variable as GMM-type instruments is the default and
need not be stated explicitly.
Example 2
The DPD file abest3.ox contains a variant of the above that differs with regard to the choice of
instruments: the variables w and k are now treated as predetermined, and are instrumented GMM-
style using the second and third lags of their levels. This approximates column (c) of Table 4 in
Arellano and Bond (1991). We have modified the code in abest3.ox slightly to allow the use of
robust (Windmeijer-corrected) standard errors, which are the default in both DPD and gretl with
2-step estimation:
dpd.Gmm("n", 2, 99);
dpd.Gmm("w", 2, 3);
dpd.Gmm("k", 2, 3);
3 Option flags in gretl can always be truncated, down to the minimal unique abbreviation.
open abdata.gdt
list X = w w(-1) k ys ys(-1)
list Ivars = ys ys(-1)
dpanel 2 ; n X const ; GMM(w,2,3) GMM(k,2,3) Ivars --time --two-step --dpd
Note that since we are now calling for an instrument set other than the default (following the second
semicolon), it is necessary to include the Ivars specification for the variable ys. However, it is
not necessary to specify GMM(n,2,99) since this remains the default treatment of the dependent
variable.
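In other words, the command above should be equivalent to the more explicit (partly redundant) form

dpanel 2 ; n X const ; GMM(n,2,99) GMM(w,2,3) GMM(k,2,3) Ivars --time --two-step --dpd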
Example 3
Our third example replicates the DPD output from bbest1.ox: this uses the same dataset as the
previous examples but the model specifications are based on Blundell and Bond (1998), and involve
comparison of the GMM-DIF and GMM-SYS (“system”) estimators. The basic specification is slightly
simplified in that the variable ys is not used and only one lag of the dependent variable appears as
a regressor. The corresponding gretl code is:
open abdata.gdt
list X = w w(-1) k k(-1)
list Z = w k
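# The dpanel calls are sketched below; the exact set of option flags is an
# assumption based on the surrounding discussion, not a verbatim
# reproduction of the original listing.
# GMM-DIF
dpanel 1 ; n X const --time-dummies --two-step --dpd
# GMM-SYS: lagged differences of w and k (list Z) serve as instruments for
# the levels equations; X is listed again so it keeps its exogenous status
dpanel 1 ; n X const ; GMMlevel(Z,1,1) X --time-dummies --two-step --dpd --system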
Note the use of the --system option flag to specify GMM-SYS, including the default treatment of
the dependent variable, which corresponds to GMMlevel(n,1,1). In this case we also want to
use lagged differences of the regressors w and k as instruments for the levels equations so we
need explicit GMMlevel entries for those variables. If you want something other than the default
treatment for the dependent variable as an instrument for the levels equations, you should give an
explicit GMMlevel specification for that variable — and in that case the --system flag is redundant
(but harmless).
For the sake of completeness, note that if you specify at least one GMMlevel term, dpanel will then
include equations in levels, but it will not automatically add a default GMMlevel specification for
the dependent variable unless the --system option is given.
which allows for a time-specific disturbance νt . The Solow model with Cobb–Douglas production
function implies that γ = −α, but this assumption is not imposed in estimation. The time-specific
disturbance is eliminated by subtracting the period mean from each of the series.
Equation (24.7) can be transformed to an AR(1) dynamic panel-data model by adding yi,t−5 to both
sides, which gives
yit = (1 + β)yi,t−5 + αsit + γ(nit + g + δ) + ηi + ϵit (24.8)
where all variables are now assumed to be time-demeaned.
In (rough) replication of Bond et al. (2001) we now proceed to estimate the following two models:
(a) equation (24.8) via GMM-DIF, using as instruments the second and all longer lags of yit , sit and
nit + g + δ; and (b) equation (24.8) via GMM-SYS, using ∆yi,t−1 , ∆si,t−1 and ∆(ni,t−1 + g + δ) as
additional instruments in the levels equations. We report robust standard errors throughout. (As a
purely notational matter, we now use “t − 1” to refer to values five years prior to t, as in Bond et al.
(2001)).
The gretl script to do this job is shown in Listing 24.1. Note that the final transformed versions of
the variables (logs, with time-means subtracted) are named ly (yit ), linv (sit ) and lngd (nit +g +δ).
For comparison we estimated the same two models using Ox/DPD and xtabond2. (In each case
we constructed a comma-separated values dataset containing the data as transformed in the gretl
script shown above, using a missing-value code appropriate to the target program.) For reference,
the commands used with Stata are reproduced below:
4 We say an “approximation” because we have not been able to replicate exactly the OLS results reported in the papers
cited, though it seems from the description of the data in Caselli et al. (1996) that we ought to be able to do so. We note
that Bond et al. (2001) used data provided by Professor Caselli yet did not manage to reproduce the latter’s results.
open CEL.gdt
series ngd = n + 0.05      # n_it + g + delta, with g + delta set to 0.05
series ly = log(y)         # y_it in the notation of eq. (24.8)
series linv = log(s)       # s_it
series lngd = log(ngd)     # n_it + g + delta, in logs
smpl --full                # restore the full sample
list X = linv lngd
# 1-step GMM-DIF
dpanel 1 ; ly X ; GMM(X,2,99)
# 2-step GMM-DIF
dpanel 1 ; ly X ; GMM(X,2,99) --two-step
# GMM-SYS
dpanel 1 ; ly X ; GMM(X,2,99) GMMlevel(X,1,1) --two-step --sys
#delimit ;
insheet using CEL.csv;
tsset unit time;
xtabond2 ly L.ly linv lngd, gmm(L.ly, lag(1 99)) gmm(linv, lag(2 99))
gmm(lngd, lag(2 99)) rob nolev;
xtabond2 ly L.ly linv lngd, gmm(L.ly, lag(1 99)) gmm(linv, lag(2 99))
gmm(lngd, lag(2 99)) rob nolev twostep;
xtabond2 ly L.ly linv lngd, gmm(L.ly, lag(1 99)) gmm(linv, lag(2 99))
gmm(lngd, lag(2 99)) rob nocons twostep;
For the GMM-DIF model all three programs find 382 usable observations and 30 instruments, and
yield identical parameter estimates and robust standard errors (up to the number of digits printed,
or more); see Table 24.1.5
                      1-step                    2-step
              coeff        std. error    coeff        std. error
  ly(-1)      0.577564     0.1292        0.610056     0.1562
  linv        0.0565469    0.07082       0.100952     0.07772
  lngd       −0.143950     0.2753       −0.310041     0.2980
Results for GMM-SYS estimation are shown in Table 24.2. In this case we show two sets of gretl
5 The coefficient shown for ly(-1) in the Tables is that reported directly by the software; for comparability with the
original model (eq. 24.7) it is necessary to subtract 1, which produces the expected negative value indicating conditional
convergence in per capita income.
results: those labeled “gretl(1)” were obtained using gretl’s --dpdstyle option, while those labeled
“gretl(2)” did not use that option—the intent being to reproduce the H matrices used by Ox/DPD
and xtabond2 respectively.
In this case all three programs use 479 observations; gretl and xtabond2 use 41 instruments and
produce the same estimates (when using the same H matrix) while Ox/DPD nominally uses 66.6
It is noteworthy that with GMM-SYS plus “messy” missing observations, the results depend on the
precise array of instruments used, which in turn depends on the details of the implementation of
the estimator.
Overidentification
If a model estimated with the use of instrumental variables is just-identified, the condition of or-
thogonality of the residuals and the instruments can be satisfied exactly. But if the specification
is overidentified (more instruments than endogenous regressors) this condition can only be ap-
proximated, and the degree to which orthogonality “fails” serves as a test for the validity of the
instruments (and/or the specification). Since dynamic panel models are almost always overidenti-
fied such a test is of particular importance.
There are two such tests in the econometric literature, devised respectively by Sargan (1958) and
Hansen (1982). They share a common principle: a suitably scaled measure of deviation from perfect
orthogonality can be shown to be distributed as χ 2 (k), with k the degree of overidentification,
under the null hypothesis of valid instruments and correct specification. Both test statistics can be
written as
    S = ( Σi v̂∗i′ Zi ) AN ( Σi Z′i v̂∗i )
where the sums run over i = 1, . . . , N and the v̂∗i are the residuals in first differences for unit i. The two statistics share this common form, and for that reason they are often rolled together—for example, as “Hansen–Sargan” tests by Davidson and MacKinnon (2004).
The Sargan vs Hansen difference is buried in AN : Sargan’s original test is the minimized orthogonal-
ity score divided by a scalar estimate of the error variance (which is presumed to be homoskedastic),
while Hansen’s is the minimized criterion from efficient GMM estimation, in which the scalar vari-
ance estimate is replaced by a heteroskedasticity- and autocorrelation-consistent (HAC) estimator
of the variance matrix of the error term. These variants correspond to 1-step and 2-step estimates
of the given specification.
Up till version 2021d, gretl followed Ox/DPD in presenting a single overidentification statistic under
the name “Sargan”—in effect, a Sargan test proper for the 1-step estimator and a Hansen test
6 This is a case of the issue described in section 24.1: the full AN matrix turns out to be singular and special measures must be taken to produce estimates.
for 2-step. Subsequently, however, gretl follows xtabond2 in distinguishing between the tests
and presenting both statistics, under their original names, when 2-step estimation is selected (and
therefore the HAC variance estimator is available). This choice responds to an argument made by
Roodman (2009b): the Sargan test is questionable owing to its assumption of homoskedasticity but
the Hansen test is seriously weakened by an excessive number of instruments (it may under-reject
substantially), so there may be a benefit to taking both tests into consideration.
There are cases where the degrees of freedom for the overidentification test differs between DPD
and gretl; this occurs when the AN matrix is singular (section 24.1). In concept the df equals the
number of instruments minus the number of parameters estimated; for the first of these terms
gretl uses the rank of AN , while DPD appears to use the full dimension of this matrix.
Autocorrelation
Negative first-order autocorrelation of the residuals in differences is expected by construction of
the dynamic panel estimator, so a significant value for the AR(1) test does not indicate a problem.
If the AR(2) test rejects, however, this indicates violation of the maintained assumptions. Note that
valid AR tests cannot be produced when the --asymptotic option is specified in conjunction with
one-step GMM-SYS estimation; if you need the tests, either add the two-step option or drop the
asymptotic flag (which is recommended in any case).
After estimation with dpanel the following items can be retrieved from the $model bundle:

  Key                  Content
  AR1, AR2             1st and 2nd order autocorrelation test statistics
  sargan, sargan_df    Sargan test for overidentifying restrictions and corresponding degrees of freedom
  hansen, hansen_df    Hansen test for overidentifying restrictions and corresponding degrees of freedom
  wald, wald_df        Wald test for overall significance and corresponding degrees of freedom
  GMMinst              The matrix Z of instruments (see equations (24.2) and (24.5))
  wgtmat               The matrix A of GMM weights (see equations (24.2) and (24.5))
Note that hansen and hansen_df are not included when 1-step estimation is selected. Note also
that GMMinst and wgtmat (which may be quite large matrices) are not saved in the $model bundle
by default; that requires use of the --keep-extra option with the dpanel command. Listing 24.2
illustrates use of these matrices to replicate via hansl commands the calculation of the GMM esti-
mator.
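As a minimal sketch of retrieving these matrices (the dpanel specification below is arbitrary; only the --keep-extra flag and the bundle keys matter here):

dpanel 1 ; y ; GMM(y,2,99) --two-step --keep-extra
bundle b = $model
matrix Z = b["GMMinst"]    # the instrument matrix
matrix A = b["wgtmat"]     # the GMM weight matrix
print Z A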
The --time-dummies option also accepts the modifier noprint, as in

--time-dummies=noprint
This means that although the dummies are included in the specification their coefficients, standard
errors and so on are not printed.