Gretl Guide

Chapter 21. Cheat sheet

date cpi
1990 130.658
1995 152.383
2000 172.192

What we need is for the CPI variable in the panel to repeat these three values 500 times.
Solution: Simple! With the panel dataset open in gretl,

append cpi.txt

Comment: If the length of the time series is the same as the length of the time dimension in the
panel (3 in this example), gretl will perform the stacking automatically. Rather than using the
append command you could use the “Append data” item under the File menu in the GUI program.
If the length of your time series does not exactly match the T dimension of your panel dataset,
append will not work but you can use the join command, which is able to pick just the observations
with matching time periods. On selecting “Append data” in the GUI you are given a choice between
plain “append” and “join” modes, and if you choose the latter you get a dialog window allowing you
to specify the key(s) for the join operation. For native gretl data files you can use built-in series that
identify the time periods, such as $obsmajor, for your outer key to match the dates. In the example
above, if the CPI data were in gretl format $obsmajor would give you the year of the observations.
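
For instance, a minimal sketch of the join-based route might look like the following. This assumes the panel contains a series year holding the observation year, and that the outer file has columns named date and cpi as in the listing above; see chapter 7 for the full range of key options.

join cpi.txt cpi --ikey=year --okey=date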

Time averaging of panel datasets


Problem: You have a panel dataset (comprising observations of n individuals in each of T periods)
and you want to lower the time frequency by averaging. This is commonly done in empirical growth
economics, where annual data are turned into 3- or 4- or 5-year averages (see for example Islam,
1995).
Solution: In a panel dataset, gretl functions that deal with time are aware of the panel structure,
so they will automatically “do the right thing”. Therefore, all you have to do is use the movavg()
function for computing moving averages and then just drop the years you don’t need. An example
with artificial data follows:

nulldata 36
set seed 61218
setobs 12 1:1 --stacked-time-series

###
### generate simulated yearly data
###
series year = 2000 + time
series y = round(normal())
series x = round(3*uniform())
list X = y x
print year X -o

###
### now re-cast as 4-year averages
###
# a dummy for endpoints
series endpoint = (year % 4 == 0)
# id variable
series id = $unit

# compute averages
loop foreach i X
series $i = movavg($i, 4)
endloop

# drop extra observations
smpl endpoint --dummy --permanent

# restore panel structure
setobs id year --panel-vars
print id year X -o

Running the above script produces (among other output):

? print year X -o

year y x

1:01 2001 1 1
1:02 2002 -1 1
1:03 2003 -1 0
1:04 2004 0 1
1:05 2005 1 2
1:06 2006 1 2
1:07 2007 -1 0
1:08 2008 1 1
1:09 2009 0 3
1:10 2010 -1 1
1:11 2011 1 1
1:12 2012 0 1
...
3:09 2009 0 1
3:10 2010 1 1
3:11 2011 0 2
3:12 2012 1 2

? print id year X -o

id year y x

1:1 1 2004 -0.25 0.75
1:2 1 2008 0.50 1.25
1:3 1 2012 0.00 1.50
...
3:3 3 2012 0.50 1.50

Turning observation-marker strings into a series


Problem: Here’s one that might turn up in the context of the join command (see chapter 7).
The current dataset contains a string-valued series that you’d like to use as a key for matching
observations, perhaps the two-letter codes for the names of US states. The file from which you wish
to add data contains that same information, but not in the form of a string-valued series, rather it
exists in the form of “observation markers”. Such markers cannot be used as a key directly, but is
there a way to parlay them into a string-valued series? Why, of course there is!
Solution: We’ll illustrate with the Ramanathan data file data4-10.gdt, which contains private
school enrollment data and covariates for the 50 US states plus Washington, D.C. (n = 51).

open data4-10.gdt
markers --to-array="state_codes"
genr index
stringify(index, state_codes)
store joindata.gdt

Comment: The markers command saves the observation markers to an array of strings. The com-
mand genr index creates a series that goes 1, 2, 3, . . . , and we attach the state codes to this series
via stringify(). After saving the result we have a datafile that contains a series, index, that can
be matched with whatever series holds the state code strings in the target dataset.
Suppose the relevant string-valued key series in the target dataset is called state. We might prefer
to avoid the need to specify a distinct “outer” key (again see chapter 7). In that case, in place of

genr index
stringify(index, state_codes)

we could do

genr index
series state = index
stringify(state, state_codes)

and the two datafiles will contain a comparable string-valued state series.

21.2 Creating/modifying variables


Generating a dummy variable for a specific observation
Problem: Generate dt = 0 for all observations but one, for which dt = 1.
Solution:

series d = (t=="1984:2")

Comment: The internal variable t is used to refer to observations in string form, so if you have a
cross-section sample you may just use d = (t=="123"). If the dataset has observation labels you
can use the corresponding label. For example, if you open the dataset mrw.gdt, supplied with gretl
among the examples, a dummy variable for Italy could be generated via

series DIta = (t=="Italy")

Note that this method does not require scripting at all. In fact, you might as well use the GUI Menu
“Add/Define new variable” for the same purpose, with the same syntax.

Generating a discrete variable out of a set of dummies


Problem: The dummify function (also available as a command) generates a set of mutually exclusive
dummies from a discrete variable. The reverse functionality, however, seems to be absent.
Solution:

series x = lincomb(D, seq(1, nelem(D)))

Comment: Suppose you have a list D of mutually exclusive dummies, that is a full set of 0/1 vari-
ables coding for the value of some characteristic, such that the sum of the values of the elements
of D is 1 at each observation. This is, by the way, exactly what the dummify command produces.
The reverse job of dummify can be performed neatly by using the lincomb function.
The code above multiplies the first dummy variable in the list D by 1, the second one by 2, and so
on. Hence, the return value is a series whose value is i if and only if the i-th member of D has value
1.
If you want your coding to start from 0 instead of 1, you’ll have to modify the code snippet above
into

series x = lincomb(D, seq(0, nelem(D)-1))
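
As a quick sanity check of the round trip, the sketch below builds an artificial discrete series, constructs the corresponding full set of dummies by hand, and then recovers the original coding via lincomb (the series names are arbitrary):

nulldata 12
series z = 1 + int(3*uniform())    # artificial discrete series with values 1, 2, 3
series d1 = (z == 1)
series d2 = (z == 2)
series d3 = (z == 3)
list D = d1 d2 d3
series z2 = lincomb(D, seq(1, nelem(D)))
print z z2 -o                      # z2 should reproduce z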



Easter
Problem: I have a 7-day daily dataset. How do I create an “Easter” dummy?
Solution: We have the easterday() function, which returns month and day of Easter given the
year. The following is an example script which uses this function and a few string magic tricks:

series Easter = 0
loop y=2011..2016
a = easterday(y)
m = floor(a)
d = round(100*(a-m))
ed_str = sprintf("%04d-%02d-%02d", y, m, d)
Easter["@ed_str"] = 1
endloop

Comment: The round() function is necessary for the “day” component because otherwise floating-
point problems may ensue. Try the year 2015, for example.
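
For instance, you can inspect the raw return value for the year just mentioned (Easter 2015 fell on April 5, so the printed value should correspond to 4.05):

eval easterday(2015)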

Recoding a variable
Problem: You want to perform a 1-to-1 recode on a variable. For example, consider tennis points:
you may have a variable x holding values 1 to 3 and you want to recode it to 15, 30, 40.
Solution 1:

series x = replace(x, 1, 15)
series x = replace(x, 2, 30)
series x = replace(x, 3, 40)

Solution 2:

matrix tennis = {15, 30, 40}
series x = replace(x, seq(1,3), tennis)

Comment: There are many equivalent ways to achieve the same effect, but for simple cases such
as this, the replace function is simple and transparent. If you don’t mind using matrices, scripts
using replace can also be remarkably compact. Note that replace also performs n-to-1 (“surjec-
tive”) replacements, such as

series x = replace(z, {2, 3, 5, 11, 22, 33}, 1)

which would turn all entries equal to 2, 3, 5, 11, 22 or 33 to 1 and leave the other ones unchanged.

Generating a “subset of values” dummy


Problem: You have a dataset which contains a fine-grained coding for some qualitative variable
and you want to “collapse” this to a relatively small set of dummy variables. Examples: you have
place of work by US state and you want a small set of regional dummies; or you have detailed
occupational codes from a census dataset and you want a manageable number of occupational
category dummies.
Let’s call the source series src and one of the target dummies D1. And let’s say that the values
of src to be grouped under D1 are 2, 13, 14 and 25. We’ll consider three possible solutions:
“Longhand,” “Clever,” and “Proper.”
“Longhand” solution:

series D1 = src==2 || src==13 || src==14 || src==25



Comment: The above works fine if the number of distinct values in the source to be condensed into
each dummy variable is fairly small, but it becomes cumbersome if a single dummy must comprise
dozens of source values.
Clever solution:

matrix sel = {2,13,14,25}
series D1 = maxr({src} .= vec(sel)') .> 0

Comment: The subset of values to be grouped together can be written out as a matrix relatively
compactly (first line). The magic that turns this into the desired series (second line) relies on the
versatility of the “dot” (element-wise) matrix operators. The expression “{src}” gets a column-
vector version of the input series—call this x — and “vec(sel)’” gets the input matrix as a row
vector, in case it’s a column vector or a matrix with both dimensions greater than 1 — call this s. If
x is n × 1 and s is 1 × m, the “.=” operator produces an n × m result, each element (i, j) of which
equals 1 if xi = sj , otherwise 0. The maxr() function along with the “.>” operator (see chapter 17
for both) then produces the result we want.
Of course, whichever procedure you use, you have to repeat for each of the dummy series you want
to create (but keep reading—the “proper” solution is probably what you want if you plan to create
several dummies).
Further comment: Note that the clever solution depends on converting what is “naturally” a vector
result into a series. This will fail if there are missing values in src, since (by default) missing values
will be skipped when converting src to x, and so the number of rows in the result will fall short
of the number of observations in the dataset. One fix is then to subsample the dataset to exclude
missing values before employing this method; another is to adjust the skip_missing setting via
the set command (see the Gretl Command Reference).
Proper solution:
The best solution, in terms of both computational efficiency and code clarity, would be using a
“conversion table” and the replace function, to produce a series on which the dummify command
can be used. For example, suppose we want to convert from a series called fips holding FIPS
codes (the Federal Information Processing Standard assigns numeric codes from 1 to 56 to the US
states and outlying areas) for the 50 US states plus the District of Columbia to a series holding
codes for the four standard US regions. We could create a 2 × 51 matrix — call it srmap — with the 51 FIPS codes on
the first row and the corresponding region codes on the second, and then do

series region = replace(fips, srmap[1,], srmap[2,])
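
As a toy illustration of how such a conversion table might be put together (the codes below are made up for the example; a real srmap would have all 51 columns), one could do something like:

# miniature conversion table: FIPS-style codes in row 1, region codes in row 2
matrix srmap = {1, 2, 4, 5, 6; 3, 4, 4, 3, 4}
series region = replace(fips, srmap[1,], srmap[2,])
discrete region    # mark as discrete so that dummify can be applied
dummify region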

Generating an ARMA(1,1)
Problem: Generate yt = 0.9yt−1 + εt − 0.5εt−1 , with εt ∼ NIID(0, 1).
Recommended solution:

alpha = 0.9
theta = -0.5
series y = filter(normal(), {1, theta}, alpha)

“Bread and butter” solution:

alpha = 0.9
theta = -0.5
series e = normal()
series y = 0
series y = alpha * y(-1) + e + theta * e(-1)

Comment: The filter function is specifically designed for this purpose so in most cases you’ll
want to take advantage of its speed and flexibility. That said, in some cases you may want to
generate the series in a manner which is more transparent (maybe for teaching purposes).
In the second solution, the statement series y = 0 is necessary because the next statement eval-
uates y recursively, so y[1] must be set. Note that you must use the keyword series here instead
of writing genr y = 0 or simply y = 0, to ensure that y is a series and not a scalar.

Recoding a variable by classes


Problem: You want to recode a variable by classes. For example, you have the age of a sample of
individuals (xi ) and you need to compute age classes (yi ) as

yi = 1 for xi < 18
yi = 2 for 18 ≤ xi < 65
yi = 3 for xi ≥ 65

Solution:

series y = 1 + (x >= 18) + (x >= 65)

Comment: True and false expressions are evaluated as 1 and 0 respectively, so they can be ma-
nipulated algebraically as any other number. The same result could also be achieved by using the
conditional assignment operator (see below), but in most cases it would probably lead to more
convoluted constructs.
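
For comparison, here is what the same recode might look like with the conditional assignment operator discussed below; it works, but it is arguably harder to read than the additive version:

series y = (x < 18) ? 1 : ((x < 65) ? 2 : 3)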

Conditional assignment
Problem: Generate yt via the following rule:
$$y_t = \begin{cases} x_t & \text{for } d_t > a \\ z_t & \text{for } d_t \le a \end{cases}$$

Solution:

series y = (d > a) ? x : z

Comment: There are several alternatives to the one presented above. One is a brute force solution
using loops. Another one, more efficient but still suboptimal, would be

series y = (d>a)*x + (d<=a)*z

However, the ternary conditional assignment operator is not only the most efficient way to accom-
plish what we want, it is also remarkably transparent to read when one gets used to it. Some readers
may find it helpful to note that the conditional assignment operator works exactly the same way as
the =IF() function in spreadsheets.

Generating a time index for panel datasets


Problem: gretl has a $unit accessor, but not the equivalent for time. What should I use?
Solution:

series x = time

Comment: The special construct genr time and its variants are aware of whether a dataset is a
panel.

Sanitizing a list of regressors


Problem: I noticed that built-in commands like ols automatically drop collinear variables and put
the constant first. How can I achieve the same result for an estimator I’m writing?
Solution: No worry. The function below does just that

function list sanitize(list X)
    list R = X - const
    if nelem(R) < nelem(X)
        R = const R
    endif
    return dropcoll(R)
end function

so for example the code below

nulldata 20
x = normal()
y = normal()
z = x + y # collinear
list A = x y const z
list B = sanitize(A)

list print A
list print B

returns

? list print A
x y const z
? list print B
const x y

Besides: it has been brought to our attention that some mischievous programs out there put the
constant last, instead of first, like God intended. We are not amused by this utter disrespect of
econometric tradition, but if you want to pursue the way of evil, it is rather simple to adapt the
script above to that effect.
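
For the record, one possible adaptation along those lines is sketched below; it simply appends the constant instead of prepending it:

function list sanitize_last(list X)
    list R = X - const
    if nelem(R) < nelem(X)
        R = R const
    endif
    return dropcoll(R)
end function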

Generating the “hat” values after an OLS regression


Problem: I’ve just run an OLS regression, and now I need the so-called leverage values (also
known as the “hat” values). I know you can access residuals and fitted values through “dollar”
accessors, but nothing like that seems to be available for “hat” values.
Solution: “Hat” values can be thought of as the diagonal of the projection matrix PX, or more
explicitly as

$$h_i = x_i'(X'X)^{-1}x_i$$
where X is the matrix of regressors and x′i is its i-th row.
The reader is invited to study the code below, which offers four different solutions to the problem:

open data4-1.gdt --quiet
list X = const sqft bedrms baths
ols price X

# method 1
leverage --save --quiet

series h1 = lever

# these are necessary for what comes next
matrix mX = {X}
matrix iXX = invpd(mX'mX)

# method 2
series h2 = diag(qform(mX, iXX))
# method 3
series h3 = sumr(mX .* (mX*iXX))
# method 4
series h4 = NA
loop i=1..$nobs
matrix x = mX[i,]'
h4[i] = x'iXX*x
endloop

# verify
print h1 h2 h3 h4 --byobs

Comment: Solution 1 is the preferable one: it relies on the built-in leverage command, which
computes the requested series quite efficiently, taking care of missing values, possible restrictions
to the sample, etc.
However, three more are shown for didactical purposes, mainly to show the user how to manipulate
matrices. Solution 2 first constructs the PX matrix explicitly, via the qform function, and then takes
its diagonal; this is definitely not recommended (despite its compactness), since you generate a
much bigger matrix than you actually need and waste a lot of memory and CPU cycles in the
process. It doesn’t matter very much in the present case, since the sample size is very small, but
with a big dataset this could be a very bad idea.
Solution 3 is more clever, and relies on the fact that, if you define Z = X · (X ′ X)−1 , then hi could
also be written as

$$h_i = x_i'z_i = \sum_{j=1}^{k} x_{ij}\,z_{ij}$$

which is in turn equivalent to the sum of the elements of the i-th row of X ⊙ Z, where ⊙ is the
element-by-element product. In this case, your clever usage of matrix algebra would produce a
solution computationally much superior to solution 2.
Solution 4 is the most old-fashioned one, and employs an indexed loop. While this wastes practi-
cally no memory and employs no more CPU cycles in algebraic operations than strictly necessary,
it imposes a much greater burden on the hansl interpreter, since handling a loop is conceptually
more complex than a single operation. In practice, you’ll find that for any realistically-sized prob-
lem, solution 4 is much slower than solution 3.

Moving functions for time series


Problem: gretl provides native functions for moving averages, but I need to compute a different
statistic on a sliding data window. Is there a way to do this without using loops?
Solution: One of the nice things about the list data type is that, if you define a list, then several
functions that would normally apply “vertically” to elements of a series apply “horizontally” across
the list. So for example, the following piece of code

open bjg.gdt
order = 12
list L = lg || lags(order-1, lg)
smpl +order ;

series movmin = min(L)
series movmax = max(L)
series movmed = median(L)
smpl full

computes the moving minimum, maximum and median of the lg series. Plotting the four series
would produce something similar to figure 21.1.

Figure 21.1: “Moving” functions (the series lg plotted together with movmin, movmed and movmax)

Generating data with a prescribed correlation structure


Problem: I’d like to generate a bunch of normal random variates whose covariance matrix is exactly
equal to a given matrix Σ. How can I do this in gretl?
Solution: The Cholesky decomposition is your friend. If you want to generate data with a given
population covariance matrix, then all you have to do is post-multiply your pseudo-random data by
the Cholesky factor (transposed) of the matrix you want. For example:

set seed 123
S = {2,1;1,1}
T = 1000
X = mnormal(T, rows(S))
X = X * cholesky(S)'
eval mcov(X)

should give you

? eval mcov(X)
2.0016 1.0157
1.0157 1.0306

If, instead, you want your simulated data to have a given sample covariance matrix, you have to
apply the same technique twice: once for standardizing the data, and once more for giving it the
covariance structure you want. Example:

S = {2,1;1,1}
T = 1000
X = mnormal(T, rows(S))
X = X * (cholesky(S)/cholesky(mcov(X)))'
eval mcov(X)

gives you

? eval mcov(X)
2 1
1 1

as required.

21.3 Neat tricks


Interaction dummies
Problem: You want to estimate the model yi = xi β1 + zi β2 + di β3 + (di · zi )β4 + εi , where di is a
dummy variable while xi and zi are vectors of explanatory variables.
Solution: As of version 1.9.12, gretl provides the ^ operator to make this operation easy. See
section 15.1 for details (especially example script 15.1). But back in my day, we used loops to do
that! Here’s how:

list X = x1 x2 x3
list Z = z1 z2
list dZ = null
loop foreach i Z
series d$i = d * $i
list dZ = dZ d$i
endloop

ols y X Z d dZ

Comment: It’s amazing what string substitution can do for you, isn’t it?

Realized volatility
Problem: Given data by the minute, you want to compute the “realized volatility” for the hour as
$RV_t = \frac{1}{60}\sum_{\tau=1}^{60} y_{t:\tau}^2$. Imagine your sample starts at time 1:1.

Solution:

smpl --full
genr time
series minute = int(time/60) + 1
series second = time % 60
setobs minute second --panel
series rv = psd(y)^2
setobs 1 1
smpl second==1 --restrict
store foo rv

Comment: Here we trick gretl into thinking that our dataset is a panel dataset, where the minutes
are the “units” and the seconds are the “time”; this way, we can take advantage of the special
function psd(), panel standard deviation. Then we simply drop all observations but one per minute
and save the resulting data (store foo rv translates as “store in the gretl datafile foo.gdt the
series rv”).

Looping over two paired lists


Problem: Suppose you have two lists with the same number of elements, and you want to apply
some command to corresponding elements over a loop.
Solution:

list L1 = a b c
list L2 = x y z

k1 = 1
loop foreach i L1
k2 = 1
loop foreach j L2
if k1 == k2
ols $i 0 $j
endif
k2++
endloop
k1++
endloop

Comment: The simplest way to achieve the result is to loop over all possible combinations and
filter out the unneeded ones via an if condition, as above. That said, in some cases variable names
can help. For example, if

list Lx = x1 x2 x3
list Ly = y1 y2 y3

then we could just loop over the integers — quite intuitive and certainly more elegant:

loop i=1..3
ols y$i const x$i
endloop

Convolution / polynomial multiplication


Problem: How do I multiply polynomials? There’s no dedicated function to do that, and yet it’s a
fairly basic mathematical task.
Solution: Never fear! We have the conv2d function, which is a tool for a more general problem, but
includes polynomial multiplication as a special case.
Suppose you want to multiply two finite-order polynomials $P(x) = \sum_{i=0}^{m} p_i x^i$ and $Q(x) = \sum_{i=0}^{n} q_i x^i$.
What you want is the sequence of coefficients of the polynomial

$$R(x) = P(x)\cdot Q(x) = \sum_{k=0}^{m+n} r_k x^k$$

where

$$r_k = \sum_{i=0}^{k} p_i q_{k-i}$$
is the convolution of the pi and qi coefficients. The same operation can be performed via the FFT,
but in most cases using conv2d is quicker and more natural.
As an example, we’ll use the same one we used in Section 30.5: consider the multiplication of two
polynomials:
P(x) = 1 + 0.5x
Q(x) = 1 + 0.3x − 0.8x^2
R(x) = P(x) · Q(x) = 1 + 0.8x − 0.65x^2 − 0.4x^3

The following code snippet performs all the necessary calculations:

p = {1; 0.5}
q = {1; 0.3; -0.8}
r = conv2d(p, q)
print r

Running the above produces

r (4 x 1)

1
0.8
-0.65
-0.4

which is indeed the desired result. Note that the same computation could also be performed via
the filter function, at the price of slightly more elaborate syntax.
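
For the curious, here is a sketch of how the filter-based route might go: pad the shorter coefficient vector with zeros up to the length of the product, m + n + 1, and apply the other polynomial as a finite MA filter (using the same p and q as above):

matrix pp = {1; 0.5; 0; 0}               # p padded with zeros to length m+n+1 = 4
matrix r2 = filter(pp, {1; 0.3; -0.8})   # MA filtering by the coefficients of q
print r2                                 # should again give 1, 0.8, -0.65, -0.4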

Comparing two lists


Problem: How can I tell if two lists contain the same variables (not necessarily in the same order)?
Solution: In many respects, lists are like sets, so it makes sense to use the so-called “symmetric
difference” operator, which is defined as

A △ B = (A \ B) ∪ (B \ A)

where in this context backslash represents the relative complement operator, such that

A \ B = {x ∈ A | x ̸∈ B}

In practice we first check if there are any series in A but not in B, then we perform the reverse check.
If the union of the two results is an empty set, then the lists must contain the same variables. The
hansl syntax for this would be something like

scalar NotTheSame = nelem((A-B) || (B-A)) > 0

Reordering list elements


Problem: Is there a way to reorder list elements?
Solution: You can use the fact that a list can be cast into a vector of integers and then manipulated
via ordinary matrix syntax. So, for example, if you wanted to “flip” a list you may just use the
mreverse function:

open AWM.gdt --quiet

list X = 3 6 9 12
matrix tmp = X
list revX = mreverse(tmp')

list X print
list revX print

will produce

? list X print
D1 D872 EEN_DIS GCD
? list revX print
GCD EEN_DIS D872 D1

Plotting an asymmetric confidence interval


Problem: “I like the look of the --band option to the gnuplot and plot commands, but it’s set up
for plotting a symmetric interval and I want to show an asymmetric one.”
Solution: Any interval is by construction symmetrical about its mean at each observation. So you
just need to perform a little tweak. Say you want to plot a series x along with a band defined by the
two series top and bot. Here we go:

# create series for mid-point and deviation
series mid = (top + bot)/2
series dev = top - mid
gnuplot x --band=mid,dev --time-series --with-lines --output=display

Cross-validation
Problem: “I’d like to compute the so-called leave-one-out cross-validation criterion for my regression.
Is there a command in gretl?”
If you have a sample with n observations, the “leave-one-out” cross-validation criterion can be
mechanically computed by running n regressions in which one observation at a time is omitted
and all the other ones are used to forecast its value. The sum of the n squared forecast errors is
the statistic we want. Fortunately, there is no need to do so. It is possible to prove that the same
statistic can be computed as

$$\mathrm{CV} = \sum_{i=1}^{n} \left[\hat{u}_i/(1 - h_i)\right]^2,$$

where hi is the i-th element of the “hat” matrix (see section 21.2) from a regression on the whole
sample.
This method is natively provided by gretl as a side benefit to the leverage command, that stores
the CV criterion into the $test accessor. The following script shows the equivalence of the two
approaches:

set verbose off
open data4-1.gdt
list X = const sqft bedrms baths

# compute the CV criterion the silly way

scalar CV = 0
matrix mX = {X}
loop i = 1 .. $nobs
xi = mX[i,]
yi = price[i]
smpl obs != i --restrict
ols price X --quiet
smpl full
scalar fe = yi - xi * $coeff
CV += fe^2
endloop

printf "CV = %g\n", CV

# the smart way
ols price X --quiet
leverage --quiet
printf "CV = %g\n", $test

Is my matrix result broken?


Problem: “Most of the matrix manipulation functions available in gretl flag an error if something
goes wrong, but there’s no guarantee that every matrix computation will return an entirely finite
matrix, containing no infinities or NaNs. So how do I tell if I’ve got a fully valid matrix?”
Solution: Given a matrix m, the call “ok(m)” returns a matrix with the same dimensions as m, with
elements 1 for finite values and 0 for infinities or NaNs. A matrix as a whole is OK if it has no
elements which fail this test, so here’s a suitable check for a “broken” matrix, using the logical NOT
operator, “!”:

sumc(sumr(!ok(m))) > 0

If this gives a non-zero return value you know that m contains at least one non-finite element.
Part II

Econometric methods

Chapter 22

Robust covariance matrix estimation

22.1 Introduction
Consider (once again) the linear regression model

y = Xβ + u (22.1)

where y and u are T -vectors, X is a T × k matrix of regressors, and β is a k-vector of parameters.


As is well known, the estimator of β given by Ordinary Least Squares (OLS) is

β̂ = (X ′ X)−1 X ′ y (22.2)

If the condition E(u|X) = 0 is satisfied, this is an unbiased estimator; under somewhat weaker
conditions the estimator is biased but consistent. It is straightforward to show that when the OLS
estimator is unbiased (that is, when E(β̂ − β) = 0), its variance is
 
$$\mathrm{Var}(\hat\beta) = E\left[(\hat\beta - \beta)(\hat\beta - \beta)'\right] = (X'X)^{-1}X'\Omega X(X'X)^{-1} \qquad (22.3)$$

where Ω = E(uu′ ) is the covariance matrix of the error terms.


Under the assumption that the error terms are independently and identically distributed (iid) we
can write Ω = σ 2 I, where σ 2 is the (common) variance of the errors (and the covariances are zero).
In that case (22.3) simplifies to the “classical” formula,

Var(β̂) = σ 2 (X ′ X)−1 (22.4)

If the iid assumption is not satisfied, two things follow. First, it is possible in principle to construct
a more efficient estimator than OLS—for instance some sort of Feasible Generalized Least Squares
(FGLS). Second, the simple “classical” formula for the variance of the least squares estimator is no
longer correct, and hence the conventional OLS standard errors — which are just the square roots
of the diagonal elements of the matrix defined by (22.4) — do not provide valid means of statistical
inference.
In the recent history of econometrics there are broadly two approaches to the problem of non-
iid errors. The “traditional” approach is to use an FGLS estimator. For example, if the departure
from the iid condition takes the form of time-series dependence, and if one believes that this
could be modeled as a case of first-order autocorrelation, one might employ an AR(1) estimation
method such as Cochrane–Orcutt, Hildreth–Lu, or Prais–Winsten. If the problem is that the error
variance is non-constant across observations, one might estimate the variance as a function of the
independent variables and then perform weighted least squares, using as weights the reciprocals
of the estimated variances.
While these methods are still in use, an alternative approach has found increasing favor: that is,
use OLS but compute standard errors (or more generally, covariance matrices) that are robust with
respect to deviations from the iid assumption. This is typically combined with an emphasis on
using large datasets—large enough that the researcher can place some reliance on the (asymptotic)
consistency property of OLS. This approach has been enabled by the availability of cheap computing
power. The computation of robust standard errors and the handling of very large datasets were
daunting tasks at one time, but now they are unproblematic. The other point favoring the newer
methodology is that while FGLS offers an efficiency advantage in principle, it often involves making
additional statistical assumptions which may or may not be justified, which may not be easy to test
rigorously, and which may threaten the consistency of the estimator — for example, the “common
factor restriction” that is implied by traditional FGLS “corrections” for autocorrelated errors.
James Stock and Mark Watson’s Introduction to Econometrics illustrates this approach at the level of
undergraduate instruction: many of the datasets they use comprise thousands or tens of thousands
of observations; FGLS is downplayed; and robust standard errors are reported as a matter of course.
In fact, the discussion of the classical standard errors (labeled “homoskedasticity-only”) is confined
to an Appendix.
Against this background it may be useful to set out and discuss all the various options offered
by gretl in respect of robust covariance matrix estimation. The first point to notice is that gretl
produces “classical” standard errors by default (in all cases apart from GMM estimation). In script
mode you can get robust standard errors by appending the --robust flag to estimation commands.
In the GUI program the model specification dialog usually contains a “Robust standard errors”
check box, along with a “configure” button that is activated when the box is checked. The configure
button takes you to a configuration dialog (which can also be reached from the main menu bar:
Tools → Preferences → General → HCCME). There you can select from a set of possible robust
estimation variants, and can also choose to make robust estimation the default.
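
In scripts, then, the typical pattern is just the estimation command plus the flag; for example (the series names here are placeholders):

ols y const x1 x2 --robust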
The specifics of the available options depend on the nature of the data under consideration —
cross-sectional, time series or panel— and also to some extent the choice of estimator. (Although
we introduced robust standard errors in the context of OLS above, they may be used in conjunction
with other estimators too.) The following three sections of this chapter deal with matters that are
specific to the three sorts of data just mentioned. Note that additional details regarding covariance
matrix estimation in the context of GMM are given in chapter 27.
We close this introduction with a brief statement of what “robust standard errors” can and cannot
achieve. They can provide for asymptotically valid statistical inference in models that are basically
correctly specified, but in which the errors are not iid. The “asymptotic” part means that they
may be of little use in small samples. The “correct specification” part means that they are not a
magic bullet: if the error term is correlated with the regressors, so that the parameter estimates
themselves are biased and inconsistent, robust standard errors will not save the day.

22.2 Cross-sectional data and the HCCME


With cross-sectional data, the most likely departure from iid errors is heteroskedasticity (non-constant
variance). (In some specialized contexts spatial autocorrelation may be an issue; gretl does not have
any built-in methods to handle this and we will not discuss it here.) In some cases one may be able
to arrive at a judgment regarding the likely
form of the heteroskedasticity, and hence to apply a specific correction. The more common case,
however, is where the heteroskedasticity is of unknown form. We seek an estimator of the covari-
ance matrix of the parameter estimates that retains its validity, at least asymptotically, in face of
unspecified heteroskedasticity. It is not obvious a priori that this should be possible, but White
(1980) showed that
$$\widehat{\mathrm{Var}}_h(\hat\beta) = (X'X)^{-1}X'\hat\Omega X(X'X)^{-1} \qquad (22.5)$$
does the trick. (As usual in statistics we need to say “under certain conditions”, but the conditions
are not very restrictive.) Ω̂ is in this context a diagonal matrix, whose non-zero elements may be
estimated using squared OLS residuals. White referred to (22.5) as a heteroskedasticity-consistent
covariance matrix estimator (HCCME).
Davidson and MacKinnon (2004, chapter 5) offer a useful discussion of several variants on White’s
HCCME theme. They refer to the original variant of (22.5) — in which the diagonal elements of Ω̂
are estimated directly by the squared OLS residuals, û2t — as HC0 . (The associated standard errors
are often called “White’s standard errors”.) The various refinements of White’s proposal share a
common point of departure, namely the idea that the squared OLS residuals are likely to be “too
small” on average. This point is quite intuitive. The OLS parameter estimates, β̂, satisfy by design
the criterion that the sum of squared residuals,

$$\sum_t \hat{u}_t^2 = \sum_t \left(y_t - X_t\hat\beta\right)^2$$

is minimized for given X and y. Suppose that β̂ ≠ β. This is almost certain to be the case: even if
OLS is not biased, it would be a miracle if the β̂ calculated from any finite sample were exactly equal
to β. But in that case the sum of squares of the true, unobserved errors, Σ u_t^2 = Σ (y_t − X_t β)^2, is
bound to be greater than Σ û_t^2. The elaborated variants on HC0 take this point on board as follows:

• HC1: Applies a degrees-of-freedom correction, multiplying the HC0 matrix by T/(T − k).

• HC2: Instead of using û_t^2 for the diagonal elements of Ω̂, uses û_t^2/(1 − h_t), where
h_t = X_t(X'X)^{-1}X_t', the t-th diagonal element of the projection matrix PX, which has the property
that PX · y = ŷ. The relevance of h_t is that if the variance of all the u_t is σ^2, the expectation
of û_t^2 is σ^2(1 − h_t), or in other words, the ratio û_t^2/(1 − h_t) has expectation σ^2. As Davidson
and MacKinnon show, 0 ≤ h_t < 1 for all t, so this adjustment cannot reduce the diagonal
elements of Ω̂ and in general revises them upward.

• HC3: Uses û_t^2/(1 − h_t)^2. The additional factor of (1 − h_t) in the denominator, relative to
HC2, may be justified on the grounds that observations with large variances tend to exert a
lot of influence on the OLS estimates, so that the corresponding residuals tend to be under-
estimated. See Davidson and MacKinnon for a fuller explanation.

• HC3a: Implements the jackknife approach from MacKinnon and White (1985). (HC3 is a close
approximation of this.)

The relative merits of these variants have been explored by means of both simulations and the-
oretical analysis. Unfortunately there is not a clear consensus on which is “best”. Davidson and
MacKinnon argue that the original HC0 is likely to perform worse than the others; nonetheless,
“White’s standard errors” are reported more often than the more sophisticated variants and there-
fore, for reasons of comparability, HC0 is the default HCCME in gretl.
If you wish to use HC1 , HC2 , HC3 , or HC3a you can arrange for this in either of two ways. In script
mode, you can do, for example,

set hc_version 2

In the GUI program you can go to the HCCME configuration dialog, as noted above, and choose any
of these variants to be the default.

22.3 Time series data and HAC covariance matrices


Heteroskedasticity may be an issue with time series data too but it is unlikely to be the only, or
even the primary, concern.
One form of heteroskedasticity is common in macroeconomic time series but is fairly easily dealt
with. That is, in the case of strongly trending series such as Gross Domestic Product, aggregate
consumption, aggregate investment, and so on, higher levels of the variable in question are likely
to be associated with higher variability in absolute terms. The obvious “fix”, employed in many
macroeconometric studies, is to use the logs of such series rather than the raw levels. Provided the
proportional variability of such series remains roughly constant over time the log transformation
is effective.
Other forms of heteroskedasticity may resist the log transformation, but may demand a special
treatment distinct from the calculation of robust standard errors. We have in mind here autore-
gressive conditional heteroskedasticity, for example in the behavior of asset prices, where large
disturbances to the market may usher in periods of increased volatility. Such phenomena call for
specific estimation strategies, such as GARCH (see chapter 31).
Despite the points made above, some residual degree of heteroskedasticity may be present in time
series data: the key point is that in most cases it is likely to be combined with serial correlation
(autocorrelation), hence demanding a special treatment. In White’s approach, Ω̂, the estimated
covariance matrix of the ut , remains conveniently diagonal: the variances, E(u2t ), may differ by
t but the covariances, E(ut us ) for s ≠ t, are all zero. Autocorrelation in time series data means
that at least some of the off-diagonal elements of Ω̂ should be non-zero. This introduces a
substantial complication and requires another piece of terminology: estimates of the covariance
matrix that are asymptotically valid in face of both heteroskedasticity and autocorrelation of the
error process are termed HAC (heteroskedasticity- and autocorrelation-consistent).
The issue of HAC estimation is treated in more technical terms in chapter 27. Here we try to
convey some of the intuition at a more basic level. We begin with a general comment: residual
autocorrelation is not so much a property of the data as a symptom of an inadequate model. Data
may be persistent through time, and if we fit a model that does not take this aspect into account
properly we end up with a model with autocorrelated disturbances. Conversely, it is often possible
to mitigate or even eliminate the problem of autocorrelation by including relevant lagged variables
in a time series model, or in other words, by specifying the dynamics of the model more fully. HAC
estimation should not be seen as the first resort in dealing with an autocorrelated error process.
That said, the “obvious” extension of White’s HCCME to the case of autocorrelated errors would
seem to be this: estimate the off-diagonal elements of Ω̂ (that is, the autocovariances, E(ut us ))
using, once again, the appropriate OLS residuals: ω̂ts = ût ûs . This is basically right, but demands
an important amendment. We seek a consistent estimator, one that converges towards the true Ω as
the sample size tends towards infinity. This can’t work if we allow unbounded serial dependence.
A larger sample will enable us to estimate more of the true ωts elements (that is, for t and s
more widely separated in time) but will not contribute ever-increasing information regarding the
maximally separated ωts pairs since the maximal separation itself grows with the sample size.
To ensure consistency we have to confine our attention to processes exhibiting temporally limited
dependence. In other words we cut off the computation of the ω̂ts values at some maximum value
of p = t − s, where p is treated as an increasing function of the sample size, T , although it cannot
increase in proportion to T .
The simplest variant of this idea is to truncate the computation at some finite lag order p, where
p grows as, say, T^{1/4}. The trouble with this is that the resulting Ω̂ may not be a positive definite
matrix. In practical terms, we may end up with negative estimated variances. One solution to this
problem is offered by the Newey–West estimator (Newey and West, 1987), which assigns declining
weights to the sample autocovariances as the temporal separation increases.
To understand this point it is helpful to look more closely at the covariance matrix given in (22.5),
namely,
(X ′ X)−1 (X ′ Ω̂X)(X ′ X)−1
This is known as a “sandwich” estimator. The bread, which appears on both sides, is (X ′ X)−1 . This
k × k matrix is also the key ingredient in the computation of the classical covariance matrix. The
filling in the sandwich is

$$\hat{\Sigma}_{(k\times k)} = X'_{(k\times T)}\,\hat{\Omega}_{(T\times T)}\,X_{(T\times k)}$$

It can be proven that under mild regularity conditions T −1 Σ̂ is a consistent estimator of the long-run
covariance matrix of the random k-vector xt · ut .
From a computational point of view it is neither necessary nor desirable to store the (potentially
very large) T × T matrix Ω̂ as such. Rather, one computes the sandwich filling by summation as a
weighted sum:

$$\hat\Sigma = \hat\Gamma(0) + \sum_{j=1}^{p} w_j\left(\hat\Gamma(j) + \hat\Gamma'(j)\right)$$

where $w_j$ is the weight given to lag $j > 0$ and the $k \times k$ matrix $\hat\Gamma(j)$, for $j \ge 0$, is given by

$$\hat\Gamma(j) = \sum_{t=j+1}^{T} \hat{u}_t\hat{u}_{t-j}X_t'X_{t-j}\,;$$

that is, the sample autocovariance matrix of xt · ut at lag j, apart from a scaling factor T .
This leaves two questions. How exactly do we determine the maximum lag length or “bandwidth”,
p, of the HAC estimator? And how exactly are the weights wj to be determined? We will return to
the (difficult) question of the bandwidth shortly. As regards the weights, gretl offers three variants.
The default is the Bartlett kernel, as used by Newey and West. This sets

$$w_j = \begin{cases} 1 - \dfrac{j}{p+1} & j \le p \\ 0 & j > p \end{cases}$$

so the weights decline linearly as j increases. The other two options are the Parzen kernel and the
Quadratic Spectral (QS) kernel. For the Parzen kernel,

$$w_j = \begin{cases} 1 - 6a_j^2 + 6a_j^3 & 0 \le a_j \le 0.5 \\ 2(1 - a_j)^3 & 0.5 < a_j \le 1 \\ 0 & a_j > 1 \end{cases}$$

where $a_j = j/(p+1)$, and for the QS kernel,

$$w_j = \frac{25}{12\pi^2 d_j^2}\left(\frac{\sin m_j}{m_j} - \cos m_j\right)$$

where $d_j = j/p$ and $m_j = 6\pi d_j/5$.


Figure 22.1 shows the weights generated by these kernels, for p = 4 and j = 1 to 9.

Figure 22.1: Three HAC kernels (Bartlett, Parzen and QS)

In gretl you select the kernel using the set command with the hac_kernel parameter:

set hac_kernel parzen


set hac_kernel qs
set hac_kernel bartlett

Selecting the HAC bandwidth


The asymptotic theory developed by Newey, West and others tells us in general terms how the
HAC bandwidth, p, should grow with the sample size, T — that is, p should grow in proportion
to some fractional power of T . Unfortunately this is of little help to the applied econometrician,
working with a given dataset of fixed size. Various rules of thumb have been suggested and gretl
implements two such. The default is p = 0.75 T^{1/3}, as recommended by Stock and Watson (2003).
An alternative is p = 4(T/100)^{2/9}, as in Wooldridge (2002b). In each case one takes the integer
part of the result. These variants are labeled nw1 and nw2 respectively, in the context of the set
command with the hac_lag parameter. That is, you can switch to the version given by Wooldridge
with

set hac_lag nw2

As shown in Table 22.1 the choice between nw1 and nw2 does not make a great deal of difference.

T p (nw1) p (nw2)
50 2 3
100 3 4
150 3 4
200 4 4
300 5 5
400 5 5

Table 22.1: HAC bandwidth: two rules of thumb

You also have the option of specifying a fixed numerical value for p, as in

set hac_lag 6

In addition you can set a distinct bandwidth for use with the Quadratic Spectral kernel (since this
need not be an integer). For example,

set qs_bandwidth 3.5
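
Putting these settings together, a typical script fragment for a time-series regression might look as follows (the series names are placeholders):

set hac_kernel qs
set qs_bandwidth 3.5
ols y const x1 x2 --robust   # HAC standard errors using the QS kernel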

Prewhitening and data-based bandwidth selection


An alternative approach is to deal with residual autocorrelation by attacking the problem from two
sides. The intuition behind the technique known as VAR prewhitening (Andrews and Monahan,
1992) can be illustrated by a simple example. Let xt be a sequence of first-order autocorrelated
random variables
xt = ρxt−1 + ut
The long-run variance of $x_t$ can be shown to be

$$V_{LR}(x_t) = \frac{V_{LR}(u_t)}{(1-\rho)^2}$$

In most cases ut is likely to be less autocorrelated than xt , so a smaller bandwidth should suffice.
Estimation of V_LR(x_t) can therefore proceed in three steps: (1) estimate ρ; (2) obtain a HAC estimate
of the long-run variance of û_t = x_t − ρ̂ x_{t−1}; and (3) divide the result by (1 − ρ̂)^2.
The application of the above concept to our problem implies estimating a finite-order Vector Au-
toregression (VAR) on the vector variables ξt = Xt ût . In general the VAR can be of any order, but
in most cases 1 is sufficient; the aim is not to build a watertight model for ξt , but just to “mop up”
a substantial part of the autocorrelation. Hence, the following VAR is estimated

ξt = Aξt−1 + εt

Then an estimate of the matrix X ′ ΩX can be recovered via

(I − Â)−1 Σ̂ε (I − Â′ )−1

where Σ̂ε is any HAC estimator, applied to the VAR residuals.


You can ask for prewhitening in gretl using

set hac_prewhiten on

There is at present no mechanism for specifying an order other than 1 for the initial VAR.
A further refinement is available in this context, namely data-based bandwidth selection. It makes
intuitive sense that the HAC bandwidth should not simply be based on the size of the sample,
but should somehow take into account the time-series properties of the data (and also the kernel
chosen). A nonparametric method for doing this was proposed by Newey and West (1994); a good
concise account of the method is given in Hall (2005). This option can be invoked in gretl via

set hac_lag nw3

This option is the default when prewhitening is selected, but you can override it by giving a specific
numerical value for hac_lag.
Even the Newey–West data-based method does not fully pin down the bandwidth for any particular
sample. The first step involves calculating a series of residual covariances. The length of this series
is given as a function of the sample size, but only up to a scalar multiple — for example, it is given
as O(T^{2/9}) for the Bartlett kernel. Gretl uses an implied multiple of 1.

Newey–West with missing values


If the estimation sample for a time-series model includes incomplete observations — where the
value of the dependent variable or one or more regressors is missing — the Newey–West procedure
must be either modified or abandoned, since some ingredients of the Σ̂ matrix defined above will
be absent. Two modified methods have been discussed in the literature. Parzen (1963) proposed
what he called Amplitude Modulation (AM), which involves setting the values of the residual and
each of the regressors to zero for the incomplete observations (and then proceeding as usual).
Datta and Du (2012) propose the so-called Equal Spacing (ES) method: calculate as if the incomplete
observations did not exist, and the complete observations therefore form an equally-spaced series.
Somewhat surprisingly, it can be shown that both of these methods have appropriate asymptotic
properties; see Rho and Vogelsang (2018) for further elaboration.
In gretl you can select a preferred method via one or other of these commands:

set hac_missvals es # ES, Datta and Du
set hac_missvals am # AM, Parzen
set hac_missvals off

The ES method is the default. The off option means that gretl will refuse to produce HAC standard
errors when the sample includes incomplete observations: use this if you have qualms about the
modified methods.

VARs: a special case


A well-specified vector autoregression (VAR) will generally include enough lags of the dependent
variables to obviate the problem of residual autocorrelation, in which case HAC estimation is
redundant —although there may still be a need to correct for heteroskedasticity. For that rea-
son plain HCCME, and not HAC, is the default when the --robust flag is given in the context of the
var command. However, if for some reason you need HAC you can force the issue by giving the
option --robust-hac.
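
In other words, the two variants might be requested as follows (the variable names and lag order are purely illustrative):

var 4 y1 y2 --robust       # HCCME, the default robust option for VARs
var 4 y1 y2 --robust-hac   # force HAC estimation instead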

Long-run variance
Let us expand a little on the subject of the long-run variance that was mentioned above and the
associated tools offered by gretl for scripting. (You may also want to check out the reference for
the lrcovar function for the multivariate case.) As is well known, the variance of the average of T
random variables x1 , x2 , . . . , xT with equal variance σ 2 equals σ 2 /T if the data are uncorrelated. In
this case, the sample variance of xt over the sample size provides a consistent estimator.

If, however, there is serial correlation among the $x_t$s, the variance of $\bar{X} = T^{-1}\sum_{t=1}^{T} x_t$ must be
estimated differently. One of the most widely used statistics for this purpose is a nonparametric
kernel estimator with the Bartlett kernel, defined as

$$\hat\omega^2(k) = T^{-1} \sum_{t=k}^{T-k}\left[\sum_{i=-k}^{k} w_i (x_t - \bar{X})(x_{t-i} - \bar{X})\right], \qquad (22.6)$$

where the integer k is known as the window size and the $w_i$ terms are the so-called Bartlett weights,
defined as $w_i = 1 - \frac{|i|}{k+1}$. It can be shown that, for k large enough, $\hat\omega^2(k)/T$ yields a consistent
estimator of the variance of $\bar{X}$.
gretl implements this estimator by means of the function lrvar(). This function takes one re-
quired argument, namely the series whose long-run variance is to be estimated, followed by two
optional arguments. The first of these can be used to supply a value for k; if it is omitted or nega-
tive, the popular choice T 1/3 is used. The second allows specification of an assumed value for the
population mean of X, which then replaces X̄ in the variance calculation. Usage is illustrated below.

# automatic window size; use xbar for mean
lrs2 = lrvar(x)
# set a window size of 12
lrs2 = lrvar(x, 12)
# set window size and impose assumed mean of zero
lrs2 = lrvar(x, 12, 0)
# impose mean zero, automatic window size
lrs2 = lrvar(x, -1, 0)

22.4 Special issues with panel data


Since panel data have both a time-series and a cross-sectional dimension one might expect that, in
general, robust estimation of the covariance matrix would require handling both heteroskedasticity
and autocorrelation (the HAC approach). In addition, some special features of panel data require
attention.

• The variance of the error term may differ across the cross-sectional units.

• The covariance of the errors across the units may be non-zero in each time period.

• If the “between” variation is not swept out, the errors may exhibit autocorrelation, not in the
usual time-series sense but in the sense that the mean value of the error term may differ
across units. This is relevant when estimation is by pooled OLS.

Gretl currently offers two robust covariance matrix estimators specifically for panel data. These are
available for models estimated via fixed effects, random effects, pooled OLS, and pooled two-stage
least squares. The default robust estimator is that suggested by Arellano (2003), which is HAC
provided the panel is of the “large n, small T ” variety (that is, many units are observed in relatively
few periods). The Arellano estimator is
 
n
−1  −1
X
Σ̂A = X ′ X Xi′ ûi û′i Xi  X ′ X


i=1

where X is the matrix of regressors (with the group means subtracted in the case of fixed effects, or
quasi-demeaned in the case of random effects), ûi denotes the vector of residuals for unit i, and n
is the number of cross-sectional units. (This variance estimator is also known as the “clustered
(over entities)” estimator.) Cameron and Trivedi (2005) make a strong case for using this estimator;
they note that the ordinary White HCCME can produce misleadingly small standard errors in the
panel context because it fails to take autocorrelation into account; see also Cameron and Miller
(2015) for a discussion of the Arellano-type estimator in the context of the random effects model.
In addition Stock
and Watson (2008) show that the White HCCME is inconsistent in the fixed-effects panel context for
fixed T > 2.
In cases where autocorrelation is not an issue the estimator proposed by Beck and Katz (1995)
and discussed by Greene (2003, chapter 13) may be appropriate. This estimator, which takes into
account contemporaneous correlation across the units and heteroskedasticity by unit, is

$$\hat\Sigma_{BK} = \left(X'X\right)^{-1}\left(\sum_{i=1}^{n}\sum_{j=1}^{n}\hat\sigma_{ij}\,X_i'X_j\right)\left(X'X\right)^{-1}$$

The covariances $\hat\sigma_{ij}$ are estimated via

$$\hat\sigma_{ij} = \frac{\hat{u}_i'\hat{u}_j}{T}$$
where T is the length of the time series for each unit. Beck and Katz call the associated standard
errors “Panel-Corrected Standard Errors” (PCSE). This estimator can be invoked in gretl via the
command

set pcse on

The Arellano default can be re-established via

set pcse off

(Note that regardless of the pcse setting, the robust estimator is not used unless the --robust flag
is given, or the “Robust” box is checked in the GUI program.)
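
So, for example, a fixed-effects estimation with Beck–Katz standard errors might be requested as follows (the series names are placeholders):

set pcse on
panel y const x1 x2 --robust   # fixed effects (the default) with PCSE
set pcse off                   # revert to the Arellano default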

22.5 The cluster-robust estimator


One further variance estimator is available in gretl, namely the “cluster-robust” estimator. This may
be appropriate (for cross-sectional data, mostly) when the observations naturally fall into groups or
clusters, and one suspects that the error term may exhibit dependency within the clusters and/or
have a variance that differs across clusters. Such clusters may be binary (e.g. employed versus
unemployed workers), categorical with several values (e.g. products grouped by manufacturer) or
ordinal (e.g. individuals with low, middle or high education levels).
For linear regression models estimated via least squares the cluster estimator is defined as

$$\hat\Sigma_C = \left(X'X\right)^{-1}\left(\sum_{j=1}^{m} X_j'\hat{u}_j\hat{u}_j'X_j\right)\left(X'X\right)^{-1}$$

where m denotes the number of clusters, and Xj and ûj denote, respectively, the matrix of regres-
sors and the vector of residuals that fall within cluster j. As noted above, the Arellano variance
estimator for panel data models is a special case of this, where the clustering is by panel unit.
For models estimated by the method of Maximum Likelihood (in which case the standard variance
estimator is the inverse of the negative Hessian, H), the cluster estimator is

$$\hat\Sigma_C = H^{-1}\left(\sum_{j=1}^{m} G_j'G_j\right)H^{-1}$$

where Gj is the sum of the “score” (that is, the derivative of the loglikelihood with respect to the
parameter estimates) across the observations falling within cluster j.

It is common to apply a degrees of freedom adjustment to these estimators (otherwise the variance
may appear misleadingly small in comparison with other estimators, if the number of clusters is
small). In the least squares case the factor is (m/(m − 1)) × (n − 1)/(n − k), where n is the total
number of observations and k is the number of parameters estimated; in the case of ML estimation
the factor is just m/(m − 1).

Availability and syntax


The cluster-robust estimator is currently available for models estimated via OLS and TSLS, and also
for most ML estimators other than those specialized for time-series data: binary logit and pro-
bit, ordered logit and probit, multinomial logit, Tobit, interval regression, biprobit, count models
and duration models. Additionally, the same option is available for generic maximum likelihood
estimation as provided by the mle command (see chapter 26 for extra details).
In all cases the syntax is the same: you give the option flag --cluster= followed by the name of
the series to be used to define the clusters, as in

ols y 0 x1 x2 --cluster=cvar

The specified clustering variable must (a) be defined (not missing) at all observations used in esti-
mating the model and (b) take on at least two distinct values over the estimation range. The clusters
are defined as sets of observations having a common value for the clustering variable. It is generally
expected that the number of clusters is substantially less than the total number of observations.
Chapter 23

Panel data

A panel dataset is one in which each of N > 1 units (sometimes called “individuals” or “groups”) is
observed over time. In a balanced panel there are T > 1 observations on each unit; more generally
the number of observations may differ by unit. In the following we index units by i and time by t.
To allow for imbalance in a panel we use the notation Ti to refer to the number of observations for
unit or individual i.

23.1 Estimation of panel models


Pooled Ordinary Least Squares
The simplest estimator for panel data is pooled OLS. In most cases this is unlikely to be adequate,
but it provides a baseline for comparison with more complex estimators.
If you estimate a model on panel data using OLS an additional test item becomes available. In
the GUI model window this is the item “panel specification” under the Tests menu; the script
counterpart is the panspec command.
To take advantage of this test, you should specify a model without any dummy variables represent-
ing cross-sectional units. The test compares pooled OLS against the principal alternatives, the fixed
effects and random effects models. These alternatives are explained in the following section.
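A minimal sketch of the script usage, assuming a panel dataset with hypothetical series y and x is open:

ols y const x   # pooled OLS
panspec         # compare against fixed and random effects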

The fixed and random effects models


In the graphical interface these options are found under the menu item “Model/Panel/Fixed and
random effects”. In the command-line interface one uses the panel command, with or without
the --random-effects option. (The --fixed-effects option is also allowed but not strictly
necessary, being the default.)
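In script form, a minimal sketch (with hypothetical series y and x) would be:

panel y const x                    # fixed effects, the default
panel y const x --random-effects   # random-effects GLS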
This section explains the nature of these models and comments on their estimation via gretl.
The pooled OLS specification may be written as

yit = Xit β + uit (23.1)

where yit is the observation on the dependent variable for cross-sectional unit i in period t, Xit
is a 1 × k vector of independent variables observed for unit i in period t, β is a k × 1 vector of
parameters, and uit is an error or disturbance term specific to unit i in period t.
The fixed and random effects models have in common that they decompose the unitary pooled
error term, uit . For the fixed effects model we write uit = αi + εit , yielding

yit = Xit β + αi + εit (23.2)

That is, we decompose uit into a unit-specific and time-invariant component, αi , and an observation-
specific error, εit .1 The αi s are then treated as fixed parameters (in effect, unit-specific y-intercepts),
which are to be estimated. This can be done by including a dummy variable for each cross-sectional

1 It is possible to break a third component out of uit, namely wt, a shock that is time-specific but common to all the units in a given period. In the interest of simplicity we do not pursue that option here.


unit (and suppressing the global constant). This is sometimes called the Least Squares Dummy Vari-
ables (LSDV) method. Alternatively, one can subtract the group mean from each of variables and
estimate a model without a constant. In the latter case the dependent variable may be written as

ỹit = yit − ȳi

The “group mean”, ȳi , is defined as


$$\bar{y}_i = \frac{1}{T_i}\sum_{t=1}^{T_i} y_{it}$$

where Ti is the number of observations for unit i. An exactly analogous formulation applies to the
independent variables. Given parameter estimates, β̂, obtained using such de-meaned data we can
recover estimates of the αi s using

$$\hat{\alpha}_i = \frac{1}{T_i}\sum_{t=1}^{T_i}\left(y_{it} - X_{it}\hat{\beta}\right)$$

These two methods (LSDV, and using de-meaned data) are numerically equivalent. gretl takes the
approach of de-meaning the data. If you have a small number of cross-sectional units, a large num-
ber of time-series observations per unit, and a large number of regressors, it is more economical
in terms of computer memory to use LSDV. If need be you can easily implement this manually. For
example,

genr unitdum
ols y x du_*

(See Chapter 10 for details on unitdum).


The α̂i estimates are not printed as part of the standard model output in gretl (there may be a large
number of these, and typically they are not of much inherent interest). However you can retrieve
them after estimation of the fixed effects model if you wish. In the graphical interface, go to the
“Save” menu in the model window and select “per-unit constants”. In command-line mode, you can
do series newname = $ahat, where newname is the name you want to give the series.
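A short sketch (ahat_i being an arbitrary name for the new series):

panel y const x          # fixed-effects estimation
series ahat_i = $ahat    # retrieve the per-unit intercepts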
For the random effects model we write uit = vi + εit , so the model becomes

yit = Xit β + vi + εit (23.3)

In contrast to the fixed effects model, the vi s are not treated as fixed parameters, but as random
drawings from a given probability distribution.
The celebrated Gauss–Markov theorem, according to which OLS is the best linear unbiased esti-
mator (BLUE), depends on the assumption that the error term is independently and identically
distributed (IID). In the panel context, the IID assumption means that E(u2it ), in relation to equa-
tion 23.1, equals a constant, σu2 , for all i and t, while the covariance E(uis uit ) equals zero for all
s ≠ t and the covariance E(ujt uit ) equals zero for all j ≠ i.
If these assumptions are not met—and they are unlikely to be met in the context of panel data —
OLS is not the most efficient estimator. Greater efficiency may be gained using generalized least
squares (GLS), taking into account the covariance structure of the error term.
Consider observations on a given unit i at two different times s and t. From the hypotheses above
it can be worked out that Var(uis ) = Var(uit ) = σv2 + σε2 , while the covariance between uis and uit
is given by E(uis uit ) = σv2 .
In matrix notation, we may group all the Ti observations for unit i into the vector yi and write it as

yi = Xi β + ui (23.4)

The vector ui , which includes all the disturbances for individual i, has a variance–covariance matrix
given by
Var(ui ) = Σi = σε2 I + σv2 J (23.5)
where J is a square matrix with all elements equal to 1. It can be shown that the matrix

$$K_i = I - \frac{\theta_i}{T_i} J,$$

where $\theta_i = 1 - \sqrt{\sigma_\varepsilon^2 / \left(\sigma_\varepsilon^2 + T_i \sigma_v^2\right)}$, has the property

$$K_i \Sigma_i K_i' = \sigma_\varepsilon^2 I$$

It follows that the transformed system

Ki yi = Ki Xi β + Ki ui (23.6)

satisfies the Gauss–Markov conditions, and OLS estimation of (23.6) provides efficient inference.
But since
Ki yi = yi − θi ȳi
GLS estimation is equivalent to OLS using “quasi-demeaned” variables; that is, variables from which
we subtract a fraction θ of their average.2 Notice that for σε2 → 0, θ → 1, while for σv2 → 0, θ → 0.
This means that if all the variance is attributable to the individual effects, then the fixed effects
estimator is optimal; if, on the other hand, individual effects are negligible, then pooled OLS turns
out, unsurprisingly, to be the optimal estimator.
To implement the GLS approach we need to calculate θ, which in turn requires estimates of the
two variances σε2 and σv2 . (These are often referred to as the “within” and “between” variances
respectively, since the former refers to variation within each cross-sectional unit and the latter to
variation between the units). Several means of estimating these magnitudes have been suggested in
the literature (see Baltagi, 1995); by default gretl uses the method of Swamy and Arora (1972): σε2
is estimated by the residual variance from the fixed effects model, and σv2 is estimated indirectly
with the help of the “between” regression which uses the group means of all the relevant variables,
that is,
$$\bar{y}_i = \bar{X}_i \beta + e_i$$
The residual variance from this regression, se2 , can be shown to estimate the sum σv2 + σε2 /T . An
estimate of σv2 can therefore be obtained by subtracting 1/T times the estimate of σε2 from se2 :

σ̂v2 = se2 − σ̂ε2 /T (23.7)

Alternatively, if the --nerlove option is given, gretl uses the method suggested by Nerlove (1971).
In this case σv2 is estimated as the sample variance of the fixed effects, α̂i ,
$$\hat{\sigma}_v^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(\hat{\alpha}_i - \bar{\hat{\alpha}}\right)^2 \qquad (23.8)$$

where N is the number of individuals and $\bar{\hat{\alpha}}$ is the mean of the estimated fixed effects.
Swamy and Arora’s equation (23.7) involves T , hence assuming a balanced panel. When the number
of time series observations, Ti , differs across individuals some sort of adjustment is needed. By
default gretl follows Stata by using the harmonic mean of the Ti s in place of T . It may be argued,
however, that a more substantial adjustment is called for in the unbalanced case. Baltagi and Chang
(1994) recommend a variant of Swamy–Arora which involves Ti -weighted estimation of the between
regression, on the basis that units with more observations will be more informative about the vari-
ance of interest. In gretl one can switch to the Baltagi–Chang variant by giving the --unbalanced
2 In a balanced panel, the value of θ is common to all individuals, otherwise it differs depending on the value of Ti .

option with the panel command. But the gain in efficiency from doing so may well be slim; for a
discussion of this point and related matters see Cottrell (2017). Unbalancedness also affects the
Nerlove (1971) estimator, but the econometric literature offers no guidance on the details. Gretl
uses the weighted average of the fixed effects as a natural extension of the original method. Again,
see Cottrell (2017) for further details.
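The following sketch illustrates the options discussed above (again with hypothetical series y and x):

panel y const x --random-effects                # Swamy-Arora, the default
panel y const x --random-effects --nerlove      # Nerlove variance estimator
panel y const x --random-effects --unbalanced   # Baltagi-Chang variant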

Choice of estimator
Which panel method should one use, fixed effects or random effects?
One way of answering this question is in relation to the nature of the data set. If the panel comprises
observations on a fixed and relatively small set of units of interest (say, the member states of the
European Union), there is a presumption in favor of fixed effects. If it comprises observations on a
large number of randomly selected individuals (as in many epidemiological and other longitudinal
studies), there is a presumption in favor of random effects.
Besides this general heuristic, however, various statistical issues must be taken into account.

1. Some panel data sets contain variables whose values are specific to the cross-sectional unit
but which do not vary over time. If you want to include such variables in the model, the fixed
effects option is simply not available. When the fixed effects approach is implemented using
dummy variables, the problem is that the time-invariant variables are perfectly collinear with
the per-unit dummies. When using the approach of subtracting the group means, the issue is
that after de-meaning these variables are nothing but zeros.

2. A somewhat analogous issue arises with the random effects estimator. As mentioned above,
the default Swamy–Arora method relies on the group-means regression to obtain a measure
of the between variance. Suppose we have observations on n units or individuals and there
are k independent variables of interest. If k > n, this regression cannot be run — since we
have only n effective observations — and hence Swamy–Arora estimates cannot be obtained.
In this case, however, it is possible to use Nerlove’s method instead.

If both fixed effects and random effects are feasible for a given specification and dataset, the choice
between these estimators may be expressed in terms of the two econometric desiderata, efficiency
and consistency.
From a purely statistical viewpoint, we could say that there is a tradeoff between robustness and
efficiency. In the fixed effects approach, we do not make any hypotheses on the “group effects” (that
is, the time-invariant differences in mean between the groups) beyond the fact that they exist —
and that can be tested; see below. As a consequence, once these effects are swept out by taking
deviations from the group means, the remaining parameters can be estimated.
On the other hand, the random effects approach attempts to model the group effects as drawings
from a probability distribution instead of removing them. This requires that individual effects are
representable as a legitimate part of the disturbance term, that is, zero-mean random variables,
uncorrelated with the regressors.
As a consequence, the fixed-effects estimator “always works”, but at the cost of not being able to
estimate the effect of time-invariant regressors. The richer hypothesis set of the random-effects
estimator ensures that parameters for time-invariant regressors can be estimated, and that esti-
mation of the parameters for time-varying regressors is carried out more efficiently. These advan-
tages, though, are tied to the validity of the additional hypotheses. If, for example, there is reason
to think that individual effects may be correlated with some of the explanatory variables, then the
random-effects estimator would be inconsistent, while fixed-effects estimates would still be valid.
The Hausman test is built on this principle (see below): if the fixed- and random-effects estimates
agree, to within the usual statistical margin of error, there is no reason to think the additional
hypotheses invalid, and as a consequence, no reason not to use the more efficient RE estimator.

Testing panel models


If you estimate a fixed effects or random effects model in the graphical interface, you may notice
that the number of items available under the “Tests” menu in the model window is relatively limited.
Panel models carry certain complications that make it difficult to implement all of the tests one
expects to see for models estimated on straight time-series or cross-sectional data.
Nonetheless, various panel-specific tests are printed along with the parameter estimates as a matter
of course, as follows.
When you estimate a model using fixed effects, you automatically get an F -test for the null hy-
pothesis that the cross-sectional units all have a common intercept. That is to say that all the αi s
are equal, in which case the pooled model (23.1), with a column of 1s included in the X matrix, is
adequate.
When you estimate using random effects (RE), the Breusch–Pagan and Hausman tests are presented
automatically. To save their results in the context of a script one would copy the $model.bp_test
or $model.hausman_test bundles which are nested inside the $model bundle. Both of these inner
bundles contain the elements test, dfn (degrees of freedom), and pvalue.
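For example, a sketch of retrieving the Hausman results in a script (names other than the accessors are hypothetical):

panel y const x --random-effects
bundle b = $model
bundle hb = b["hausman_test"]   # or b["bp_test"] for Breusch-Pagan
printf "Hausman: %g, df: %g, p-value: %g\n", hb["test"], hb["dfn"], hb["pvalue"]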
The Breusch–Pagan test is the counterpart to the F -test mentioned above. The null hypothesis is
that the variance of vi in equation (23.3) equals zero; if this hypothesis is not rejected, then again
we conclude that the simple pooled model is adequate. If the panel is unbalanced the method from
Baltagi and Li (1990) is used to perform the Breusch–Pagan test for individual effects.
The Hausman test probes the consistency of the GLS estimates. The null hypothesis is that these
estimates are consistent—that is, that the requirement of orthogonality of the vi and the Xi is
satisfied. The test is based on a measure, H, of the “distance” between the fixed-effects and random-
effects estimates, constructed such that under the null it follows the χ 2 distribution with degrees
of freedom equal to the number of time-varying regressors in the matrix X. If the value of H is
“large” this suggests that the random effects estimator is not consistent and the fixed-effects model
is preferable.
There are two ways of calculating H, the matrix-difference method and the regression method. The
procedure for the matrix-difference method is this:

• Collect the fixed-effects estimates in a vector β̃ and the corresponding random-effects esti-
mates in β̂, then form the difference vector (β̃ − β̂).

• Form the covariance matrix of the difference vector as Var(β̃ − β̂) = Var(β̃) − Var(β̂) = Ψ ,
where Var(β̃) and Var(β̂) are estimated by the sample variance matrices of the fixed- and
random-effects models respectively.3

• Compute $H = \left(\tilde{\beta} - \hat{\beta}\right)' \Psi^{-1} \left(\tilde{\beta} - \hat{\beta}\right)$.

Given the relative efficiencies of β̃ and β̂, the matrix Ψ “should be” positive definite, in which case
H is positive, but in finite samples this is not guaranteed and of course a negative χ 2 value is not
admissible.
The regression method avoids this potential problem. The procedure is to estimate, via OLS, an
augmented regression in which the dependent variable is quasi-demeaned y and the regressors
include both quasi-demeaned X (as in the RE specification) and the de-meaned variants of all the
time-varying variables (i.e. the fixed-effects regressors). The Hausman null then implies that the
coefficients on the latter subset of regressors should be statistically indistinguishable from zero.
If the RE specification employs the default covariance-matrix estimator (assuming IID errors), H
can be obtained as follows:

3 Hausman (1978) showed that the covariance of the difference takes this simple form when β̂ is an efficient estimator and β̃ is inefficient.

• Treat the random-effects model as the restricted model, and record its sum of squared resid-
uals as SSRr .

• Estimate the augmented (unrestricted) regression and record its sum of squared residuals as
SSRu .

• Compute H = n (SSRr − SSRu ) /SSRu , where n is the total number of observations used.

Alternatively, if the --robust option is selected for RE estimation, H is calculated as a Wald test
based on a robust estimate of the covariance matrix of the augmented regression. Either way, H
cannot be negative.
By default gretl computes the Hausman test via the regression method, but it uses the matrix-
difference method if you pass the option --matrix-diff to the panel command.
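For example, to request the matrix-difference variant along with random-effects estimation:

panel y const x --random-effects --matrix-diff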

Serial correlation
A simple test for first-order autocorrelation of the error term, namely the Durbin–Watson (DW)
statistic, is printed as part of the output for pooled OLS as well as fixed-effects and random-effects
estimation. Let us define “serial correlation proper” as serial correlation strictly in the time di-
mension of a panel dataset. When based on the residuals from fixed-effects estimation, the DW
statistic is a test for serial correlation proper.4 The DW value shown in the case of random-effects
estimation is based on the fixed-effects residuals. When DW is based on pooled OLS residuals it
tests for serial correlation proper only on the assumption of a common intercept. Put differently,
in this case it tests a joint null hypothesis: absence of fixed effects plus absence of (first order)
serial correlation proper. In the presence of missing observations the DW statistic is calculated as
described in Baltagi and Wu (1999) (their expression for d1 under equation (16) on page 819).
When it is computed, the DW statistic can be retrieved via the accessor $dw after estimation. In
addition, an approximate P -value for the null of no serial correlation (ρ = 0) against the alternative
of ρ > 0 may be available via the accessor $dwpval. This is based on the analysis in Bhargava
et al. (1982); strictly speaking it is the marginal significance level of DW considered as a dL value
(the value below which the test rejects, as opposed to dU , the value above which the test fails to
reject). In the panel case, however, dL and dU are quite close, particularly when N (the number of
individual units) is large. At present gretl does not attempt to compute such P -values when the
number of observations differs across individuals.
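A brief sketch of retrieving these quantities after fixed-effects estimation (scalar names are arbitrary):

panel y const x     # fixed effects
scalar d = $dw
scalar p = $dwpval  # may be unavailable if the panel is unbalanced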

Robust standard errors


For most estimators, gretl offers the option of computing an estimate of the covariance matrix that
is robust with respect to heteroskedasticity and/or autocorrelation (and hence also robust standard
errors). In the case of panel data, robust covariance matrix estimators are available for the pooled,
fixed effects and random effects models. See section 22.4 for details.

The constant in the fixed effects model


Users are sometimes puzzled by the constant or intercept reported by gretl on estimation of the
fixed effects model: how can a constant remain when the group means have been subtracted from
the data? The method of calculation of this term is a matter of convention, but the gretl authors
decided to follow the convention employed by Stata; this involves adding the global mean back
into the variables from which the group means have been removed.5 If you prefer to interpret the
fixed effects model as “OLS plus unit dummies throughout”, it can be proven that this approach is
equivalent to using centered unit dummies instead of plain 0/1 dummies.

4 The generalization of the Durbin–Watson statistic from the straight time-series context to panel data is due to Bhargava et al. (1982).


5 See Gould (2013) for an extended explanation.

The method that gretl uses internally is exemplified in Listing 23.1. The coefficients in the second
OLS estimation, including the intercept, agree with those in the initial fixed effects model, though
the standard errors differ due to a degrees of freedom correction in the fixed-effects covariance
matrix. (Note that the pmean function returns the group mean of a series.) The third estimator —
which produces quite a lot of output — instead uses the stdize function to create the centered
dummies. It thereby shows the equivalence of the internally-used method to “OLS plus centered
dummies”. (Note that in this case the standard errors agree with the initial estimates.)

Listing 23.1: Calculating the intercept in the fixed effects model

open abdata.gdt

list X = w k ys # list of explanatory variables

###
### built-in method
###

panel n const X --fixed-effects

###
### recentering "by hand"
###

depvar = n - pmean(n) + mean(n) # redefine the dependent variable

list indepvars = const


loop foreach i X
# redefine the explanatory variables
x_$i = $i - pmean($i) + mean($i)
indepvars += x_$i
endloop

ols depvar indepvars # perform estimation

###
### using centered dummies
###

list C = dummify(unit) # create the unit dummies


smpl n X --no-missing # adjust to perform centering correctly
list D = stdize(C, -1) # center the unit dummies
ols n const X D # perform estimation

R-squared in the fixed effects model


There is no uniquely “correct” way of calculating R 2 in the context of the fixed-effects model. It
may be argued that a measure of the squared correlation between the dependent variable and the
prediction yielded by the model is a desirable descriptive statistic to have, but which model and
which (variant of the) dependent variable are we talking about?
Fixed-effects models can be thought of in two equally defensible ways. From one perspective they
provide a nice, clean way of sweeping out individual effects by using the fact that in the linear
model a sufficient statistic is easy to compute. Alternatively, they provide a clever way to estimate

the “important” parameters of a model in which you want to include (for whatever reason) a full
set of individual dummies. If you take the second of these perspectives, your dependent variable is
unmodified y and your model includes the unit dummies; the appropriate R 2 measure is then the
squared correlation between y and the ŷ computed using both the measured individual effects and
the effects of the explicitly named regressors. This is reported by gretl as the “LSDV R-squared”. If
you take the first point of view, on the other hand, your dependent variable is really yit − ȳi and
your model just includes the β terms, the coefficients of deviations of the x variables from their
per-unit means. In this case, the relevant measure of R 2 is the so-called “within” R 2 ; this variant
is printed by gretl for fixed-effects models in place of the adjusted R 2 (it being unclear in this case
what exactly the “adjustment” should amount to anyway).

Residuals in the fixed and random effect models


After estimation of most kinds of models in gretl, you can retrieve a series containing the residuals
using the $uhat accessor. This is true of the fixed and random effects models, but the exact
meaning of gretl’s $uhat in these cases requires a little explanation.
Consider first the fixed effects model:

yit = Xit β + αi + εit

In this model gretl takes the “fitted value” ($yhat) to be α̂i + Xit β̂, and the residual ($uhat) to be
yit minus this fitted value. This makes sense because the fixed effects (the αi terms) are taken as
parameters to be estimated. However, it can be argued that the fixed effects are not really “explana-
tory” and if one defines the residual as the observed yit value minus its “explained” component
one might prefer to see just yit − Xit β̂. You can get this after fixed-effects estimation as follows:

series ue_fe = $uhat + $ahat - $coeff[1]

where $ahat gives the unit-specific intercept (as it would be calculated if one included all N unit
dummies and omitted a common y-intercept), and $coeff[1] gives the “global” y-intercept.6
Now consider the random-effects model:

yit = Xit β + vi + εit

In this case gretl considers the error term to be vi + εit (since vi is conceived as a random drawing)
and the $uhat series is an estimate of this, namely

yit − Xit β̂

What if you want an estimate of just vi (or just εit ) in this case? This poses a signal-extraction
problem: given the composite residual, how to recover an estimate of its components? The solution
is to ascribe to the individual effect, v̂i , a suitable fraction of the mean residual per individual,
$\bar{\hat{u}}_i = T_i^{-1}\sum_{t=1}^{T_i}\hat{u}_{it}$. The “suitable fraction” is the proportion of the variance of $\bar{\hat{u}}_i$ that is
due to vi , namely

$$\frac{\sigma_v^2}{\sigma_v^2 + \sigma_\varepsilon^2/T_i} = 1 - (1 - \theta_i)^2$$

After random effects estimation in gretl you can access a series containing the v̂i s under the name
$ahat. This series can be calculated by hand as follows:

# case 1: balanced panel


scalar theta = $["theta"]
series vhat = (1 - (1 - theta)^2) * pmean($uhat)

6 For anyone used to Stata, gretl’s fixed-effects $uhat corresponds to what you get from Stata’s “predict, e” after xtreg, while the second variant corresponds to Stata’s “predict, ue”.



# case 2: unbalanced, Ti varies by individual


scalar s2v = $["s2v"]
scalar s2e = $["s2e"]
series frac = s2v / (s2v + s2e/pnobs($uhat))
series ahat = frac * pmean($uhat)

If an estimate of εit is wanted, it can then be obtained by subtraction from $uhat.
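For example, after random-effects estimation (ehat being an arbitrary name):

series ehat = $uhat - $ahat   # estimate of the idiosyncratic component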

23.2 Autoregressive panel models


Special problems arise when a lag of the dependent variable is included among the regressors in a
panel model. Consider a dynamic variant of the pooled model (eq. 23.1):

yit = Xit β + ρyit−1 + uit (23.9)

First, if the error uit includes a group effect, vi , then yit−1 is bound to be correlated with the error,
since the value of vi affects yi at all t. That means that OLS applied to (23.9) will be inconsistent
as well as inefficient. The fixed-effects model sweeps out the group effects and so overcomes this
particular problem, but a subtler issue remains, which applies to both fixed and random effects
estimation. Consider the de-meaned representation of fixed effects, as applied to the dynamic
model,
ỹit = X̃it β + ρ ỹi,t−1 + εit
where ỹit = yit − ȳi and εit = uit − ūi (or uit − αi , using the notation of equation 23.2). The trouble
is that ỹi,t−1 will be correlated with εit via the group mean, ȳi . The disturbance εit influences yit
directly, which influences ȳi , which, by construction, affects the value of ỹit for all t. The same
issue arises in relation to the quasi-demeaning used for random effects. Estimators which ignore
this correlation will be consistent only as T → ∞ (in which case the marginal effect of εit on the
group mean of y tends to vanish).
One strategy for handling this problem, and producing consistent estimates of β and ρ, was pro-
posed by Anderson and Hsiao (1981). Instead of de-meaning the data, they suggest taking the first
difference of (23.9), an alternative tactic for sweeping out the group effects:

∆yit = ∆Xit β + ρ∆yi,t−1 + ηit (23.10)

where ηit = ∆uit = ∆(vi + εit ) = εit − εi,t−1 . We’re not in the clear yet, given the structure of the
error ηit : the disturbance εi,t−1 is an influence on both ηit and ∆yi,t−1 = yit − yi,t−1 . The next step
is then to find an instrument for the “contaminated” ∆yi,t−1 . Anderson and Hsiao suggest using
either yi,t−2 or ∆yi,t−2 , both of which will be uncorrelated with ηit provided that the underlying
errors, εit , are not themselves serially correlated.
The Anderson–Hsiao estimator is not provided as a built-in function in gretl, since gretl’s sensible
handling of lags and differences for panel data makes it a simple application of regression with
instrumental variables—see Listing 23.2, which is based on a study of country growth rates by
Nerlove (1999).7
Although the Anderson–Hsiao estimator is consistent, it is not most efficient: it does not make the
fullest use of the available instruments for ∆yi,t−1 , nor does it take into account the differenced
structure of the error ηit . It is improved upon by the methods of Arellano and Bond (1991) and
Blundell and Bond (1998). These methods are taken up in the next chapter.

7 Also see Clint Cummins’ benchmarks page, http://www.stanford.edu/~clint/bench/.



Listing 23.2: The Anderson–Hsiao estimator for a dynamic panel model

# Penn World Table data as used by Nerlove


open penngrow.gdt
# Fixed effects (for comparison)
panel Y 0 Y(-1) X
# Random effects (for comparison)
panel Y 0 Y(-1) X --random-effects
# take differences of all variables
diff Y X
# Anderson-Hsiao, using Y(-2) as instrument
tsls d_Y d_Y(-1) d_X ; 0 d_X Y(-2)
# Anderson-Hsiao, using d_Y(-2) as instrument
tsls d_Y d_Y(-1) d_X ; 0 d_X d_Y(-2)
Chapter 24

Dynamic panel models

The command for estimating dynamic panel models in gretl is dpanel. This command supports
both the “difference” estimator (Arellano and Bond, 1991) and the “system” estimator (Blundell and
Bond, 1998), which has become the method of choice in the applied literature.

24.1 Introduction
Notation
A dynamic linear panel data model can be represented as follows (in notation based on Arellano
(2003)):
yit = αyi,t−1 + β′ xit + ηi + vit (24.1)
where i = 1, 2 . . . , N indexes the cross-section units and t indexes time.
The main idea behind the difference estimator is to sweep out the individual effect via differencing.
First-differencing eq. (24.1) yields

∆yit = α∆yi,t−1 + β′ ∆xit + ∆vit = γ ′ Wit + ∆vit , (24.2)

in obvious notation. The error term of (24.2) is, by construction, autocorrelated and also correlated
with the lagged dependent variable, so an estimator that takes both issues into account is needed.
The endogeneity issue is solved by noting that all values of yi,t−k with k > 1 can be used as
instruments for ∆yi,t−1 : unobserved values of yi,t−k (whether missing or pre-sample) can safely be
substituted with 0. In the language of GMM, this amounts to using the relation

E(∆vit · yi,t−k ) = 0, k>1 (24.3)

as an orthogonality condition.
Autocorrelation is dealt with by noting that if vit is white noise, the covariance matrix of the vector
whose typical element is ∆vit is proportional to a matrix H that has 2 on the main diagonal, −1
on the first subdiagonals and 0 elsewhere. One-step GMM estimation of equation (24.2) amounts to
computing

$$\hat{\gamma} = \left[\left(\sum_i W_i' Z_i\right) A_N \left(\sum_i Z_i' W_i\right)\right]^{-1} \left(\sum_i W_i' Z_i\right) A_N \left(\sum_i Z_i' \Delta y_i\right) \qquad (24.4)$$

where

$$\Delta y_i = \left[\begin{array}{ccc}\Delta y_{i3} & \cdots & \Delta y_{iT}\end{array}\right]'$$

$$W_i = \left[\begin{array}{ccc}\Delta y_{i2} & \cdots & \Delta y_{i,T-1} \\ \Delta x_{i3} & \cdots & \Delta x_{iT}\end{array}\right]'$$

$$Z_i = \left[\begin{array}{cccccc}
y_{i1} & 0 & 0 & \cdots & 0 & \Delta x_{i3} \\
0 & y_{i1} & y_{i2} & \cdots & 0 & \Delta x_{i4} \\
& & & \ddots & & \vdots \\
0 & 0 & 0 & \cdots & y_{i,T-2} & \Delta x_{iT}
\end{array}\right]$$

and

$$A_N = \left(\sum_i Z_i' H Z_i\right)^{-1}$$

Once the 1-step estimator is computed, the sample covariance matrix of the estimated residuals
can be used instead of H to obtain 2-step estimates, which are not only consistent but asymp-
totically efficient. (In principle the process may be iterated, but nobody seems to be interested.)
Standard GMM theory applies, except for one point: Windmeijer (2005) has computed finite-sample
corrections to the asymptotic covariance matrix of the parameters, which are nowadays almost
universally used.
The difference estimator is consistent, but has been shown to have poor properties in finite samples
when α is near one. People these days prefer the so-called “system” estimator, which complements
the differenced data (with lagged levels used as instruments) with data in levels (using lagged dif-
ferences as instruments). The system estimator relies on an extra orthogonality condition which
has to do with the earliest value of the dependent variable yi,1 . The interested reader is referred
to Blundell and Bond (1998, pp. 124–125) for details, but here it suffices to say that this condi-
tion is satisfied in mean-stationary models and brings an improvement in efficiency that may be
substantial in many cases.
The set of orthogonality conditions exploited in the system approach is not very much larger than
with the difference estimator since most of the possible orthogonality conditions associated with
the equations in levels are redundant, given those already used for the equations in differences.
The key equations of the system estimator can be written as

$$\tilde{\gamma} = \left[\left(\sum_i \tilde{W}_i' \tilde{Z}_i\right) A_N \left(\sum_i \tilde{Z}_i' \tilde{W}_i\right)\right]^{-1} \left(\sum_i \tilde{W}_i' \tilde{Z}_i\right) A_N \left(\sum_i \tilde{Z}_i' \Delta\tilde{y}_i\right) \qquad (24.5)$$

where

$$\Delta\tilde{y}_i = \left[\begin{array}{cccccc}\Delta y_{i3} & \cdots & \Delta y_{iT} & y_{i3} & \cdots & y_{iT}\end{array}\right]'$$

$$\tilde{W}_i = \left[\begin{array}{cccccc}\Delta y_{i2} & \cdots & \Delta y_{i,T-1} & y_{i2} & \cdots & y_{i,T-1} \\ \Delta x_{i3} & \cdots & \Delta x_{iT} & x_{i3} & \cdots & x_{iT}\end{array}\right]'$$

$$\tilde{Z}_i = \left[\begin{array}{ccccccccc}
y_{i1} & 0 & 0 & \cdots & 0 & 0 & \cdots & 0 & \Delta x_{i3} \\
0 & y_{i1} & y_{i2} & \cdots & 0 & 0 & \cdots & 0 & \Delta x_{i4} \\
& & & \ddots & & & & & \vdots \\
0 & 0 & 0 & \cdots & y_{i,T-2} & 0 & \cdots & 0 & \Delta x_{iT} \\
0 & 0 & 0 & \cdots & 0 & \Delta y_{i2} & \cdots & 0 & x_{i3} \\
& & & & & & \ddots & & \vdots \\
0 & 0 & 0 & \cdots & 0 & 0 & \cdots & \Delta y_{i,T-1} & x_{iT}
\end{array}\right]$$

and

$$A_N = \left(\sum_i \tilde{Z}_i' H^{*} \tilde{Z}_i\right)^{-1}$$

In this case choosing a precise form for the matrix H ∗ for the first step is no trivial matter. Its
north-west block should be as similar as possible to the covariance matrix of the vector ∆vit , so

the same choice as the “difference” estimator is appropriate. Ideally, the south-east block should
be proportional to the covariance matrix of the vector ι ηi + v, that is σv2 I + ση2 ι ι ′ ; but since ση2 is
unknown and any positive definite matrix renders the estimator consistent, people just use I. The
off-diagonal blocks should, in principle, contain the covariances between ∆vis and vit , which would
be an identity matrix if vit is white noise. However, since the south-east block is typically given a
conventional value anyway, the benefit in making this choice is not obvious. Some packages use I;
others use a zero matrix. Asymptotically, it should not matter, but on real datasets the difference
between the resulting estimates can be noticeable.

Rank deficiency
Both the difference estimator (24.4) and the system estimator (24.5) depend for their existence on
the invertibility of AN . This matrix may turn out to be singular for several reasons. However, this
does not mean that the estimator is not computable. In some cases, adjustments are possible such
that the estimator does exist, but the user should be aware that in such cases not all software
packages use the same strategy and replication of results may prove difficult or even impossible.
A first reason why AN may be singular is unavailability of instruments, chiefly because of missing
observations. This case is easy to handle. If a particular row of Z̃i is zero for all units, the corre-
sponding orthogonality condition (or the corresponding instrument if you prefer) is automatically
dropped; the overidentification rank is then adjusted for testing purposes.
Even if no instruments are zero, however, AN could be rank deficient. A trivial case occurs if there
are collinear instruments, but a less trivial case may arise when T (the total number of time periods
available) is not much smaller than N (the number of units), as, for example, in some macro datasets
where the units are countries. The total number of potentially usable orthogonality conditions is
O(T 2 ), which may well exceed N in some cases. Since AN is the sum of N matrices which have, at
most, rank 2T − 3 it could well happen that the sum is singular.
In all these cases, dpanel substitutes the pseudo-inverse of AN (Moore–Penrose) for its regular
inverse. Our choice is shared by some software packages, but not all, so replication may be hard.

Covariance matrix and standard errors


By default the standard errors shown for 1-step estimation are robust, based on the heteroskedasticity-
consistent variance estimator

$$\widehat{\mathrm{Var}}(\hat{\gamma}) = M^{-1}\left(\sum_i W_i' Z_i\right) A_N \hat{V}_N A_N \left(\sum_i Z_i' W_i\right) M^{-1}$$

where $M = \left(\sum_i W_i' Z_i\right) A_N \left(\sum_i Z_i' W_i\right)$ and $\hat{V}_N = N^{-1}\sum_i Z_i'\hat{u}_i\hat{u}_i' Z_i$, with $\hat{u}_i$ the vector of residuals in
differences for individual i. In addition, as noted above, the variance estimator for 2-step estimation
employs the finite-sample correction of Windmeijer (2005).
When the --asymptotic option is passed to dpanel, however, the 1-step variance estimator is
simply σ̂u2 M −1 , which is not heteroskedasticity-consistent, and the Windmeijer correction is not
applied for 2-step estimation. Use of this option is not recommended unless you wish to replicate
prior results that did not report robust standard errors. In particular, tests based on the asymptotic
2-step variance estimator are known to over-reject quite substantially (standard errors too small).

Treatment of missing values


Textbooks seldom bother with missing values, but in some cases their treatment may be far from
obvious. This is especially true if missing values are interspersed between valid observations. For
example, consider the plain difference estimator with one lag, so

yt = αyt−1 + η + ϵt

where the i index is omitted for clarity. Suppose you have an individual with t = 1 . . . 5, for which
y3 is missing. It may seem that the data for this individual are unusable, because differencing yt
would produce something like
t 1 2 3 4 5
yt ∗ ∗ ◦ ∗ ∗
∆yt ◦ ∗ ◦ ◦ ∗
where ∗ = nonmissing and ◦ = missing. Estimation seems to be unfeasible, since there are no
periods in which ∆yt and ∆yt−1 are both observable.
However, we can use a k-difference operator and get

∆k yt = α∆k yt−1 + ∆k ϵt

where ∆k = 1 − Lk and past levels of yt are valid instruments. In this example, we can choose k = 3
and use y1 as an instrument, so this unit is in fact usable.
Not all software packages seem to be aware of this possibility, so replicating published results may
prove tricky if your dataset contains individuals with gaps between valid observations.

24.2 Usage
One feature of dpanel’s syntax is that you get default values for several choices you may wish to
make, so that in a “standard” situation the command is very concise. The simplest case of the
model (24.1) is a plain AR(1) process:

yi,t = αyi,t−1 + ηi + vit . (24.6)

If you give the command

dpanel 1 ; y

Gretl assumes that you want to estimate (24.6) via the difference estimator (24.4), using as many
orthogonality conditions as possible. The scalar 1 between dpanel and the semicolon indicates that
only one lag of y is included as an explanatory variable; using 2 would give an AR(2) model. The
syntax that gretl uses for the non-seasonal AR and MA lags in an ARMA model is also supported in
this context. For example, if you want the first and third lags of y (but not the second) included as
explanatory variables you can say

dpanel {1 3} ; y

or you can use a pre-defined matrix for this purpose:

matrix ylags = {1, 3}


dpanel ylags ; y

To use a single lag of y other than the first you need to employ this mechanism:

dpanel {3} ; y # only lag 3 is included


dpanel 3 ; y # compare: lags 1, 2 and 3 are used

To use the system estimator instead, you add the --system option, as in

dpanel 1 ; y --system

The level orthogonality conditions and the corresponding instrument are appended automatically
(see eq. 24.5).

Regressors
If additional regressors are to be included, they should be listed after the dependent variable in
the same way as other gretl estimation commands, such as ols. For the difference orthogonality
relations, dpanel takes care of transforming the regressors in parallel with the dependent variable.
One case of potential ambiguity is when an intercept is specified but the difference-only estimator
is selected, as in

dpanel 1 ; y const

In this case the default dpanel behavior, which agrees with David Roodman’s xtabond2 for Stata
(Roodman, 2009a), is to drop the constant (since differencing reduces it to nothing but zeros).
However, for compatibility with the DPD package for Ox, you can give the option --dpdstyle, in
which case the constant is retained (equivalent to including a linear trend in equation 24.1). A
similar point applies to the period-specific dummy variables which can be added in dpanel via the
--time-dummies option: in the differences-only case these dummies are entered in differenced
form by default, but when the --dpdstyle switch is applied they are entered in levels.
The standard gretl syntax applies if you want to use lagged explanatory variables, so for example
the command

dpanel 1 ; y const x(0 to -1) --system

would result in estimation of the model

yit = αyi,t−1 + β0 + β1 xit + β2 xi,t−1 + ηi + vit .

Instruments
The default rules for instruments are:

• lags of the dependent variable are instrumented using all available orthogonality conditions;
and

• additional regressors are considered exogenous, so they are used as their own instruments.

If a different policy is wanted, the instruments should be specified in an additional list, separated
from the regressors list by a semicolon. The syntax closely mirrors that of the tsls command,
but in this context it is necessary to distinguish between “regular” instruments and what are often
called “GMM-style” instruments (that is, instruments that are handled in the same block-diagonal
manner as lags of the dependent variable, as described above).
“Regular” instruments are transformed in the same way as regressors, and the contemporaneous
value of the transformed variable is used to form an orthogonality condition. Since regressors are
treated as exogenous by default, it follows that these two commands estimate the same model:

dpanel 1 ; y z
dpanel 1 ; y z ; z

The instrument specification in the second case simply confirms what is implicit in the first: that
z is exogenous. Note, though, that if you have some additional variable z2 which you want to add
as a regular instrument, it then becomes necessary to include z in the instrument list if it is to be
treated as exogenous:

dpanel 1 ; y z ; z2 # z is now implicitly endogenous


dpanel 1 ; y z ; z z2 # z is treated as exogenous

The specification of “GMM-style” instruments is handled by the special constructs GMM() and
GMMlevel(). The first of these relates to instruments for the equations in differences, and the
second to the equations in levels. The syntax for GMM() is

GMM(name, minlag, maxlag[,collapse])

where name is replaced by the name of a series (or the name of a list of series), and minlag and
maxlag are replaced by the minimum and maximum lags to be used as instruments. The same goes
for GMMlevel().
One common use of GMM() is to limit the number of lagged levels of the dependent variable used as
instruments for the equations in differences. It’s well known that although exploiting all possible
orthogonality conditions yields maximal asymptotic efficiency, in finite samples it may be prefer-
able to use a smaller subset—see Roodman (2009b), Okui (2009). For example, the specification

dpanel 1 ; y ; GMM(y, 2, 4)

ensures that no lags of yt earlier than t − 4 will be used as instruments.


A second means of limiting the number of instruments is to “collapse” the sets of block-diagonal
instruments shown following equations 24.4 and 24.5. Instead of having a distinct instrument per
observation per lag, this is reduced to a distinct instrument per lag, as shown in Figure 24.1.

GMM():

$$\left[\begin{array}{ccccccc}
y_{i1} & 0 & 0 & 0 & 0 & 0 & \cdots \\
0 & y_{i1} & y_{i2} & 0 & 0 & 0 & \cdots \\
0 & 0 & 0 & y_{i1} & y_{i2} & y_{i3} & \cdots \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots
\end{array}\right]
\Longrightarrow
\left[\begin{array}{cccc}
y_{i1} & 0 & 0 & \cdots \\
y_{i1} & y_{i2} & 0 & \cdots \\
y_{i1} & y_{i2} & y_{i3} & \cdots \\
\vdots & \vdots & \vdots & \ddots
\end{array}\right]$$

GMMlevel():

$$\left[\begin{array}{cccc}
\Delta y_{i2} & 0 & 0 & \cdots \\
0 & \Delta y_{i3} & 0 & \cdots \\
0 & 0 & \Delta y_{i4} & \cdots \\
\vdots & \vdots & \vdots & \ddots
\end{array}\right]
\Longrightarrow
\left[\begin{array}{c}
\Delta y_{i2} \\
\Delta y_{i3} \\
\Delta y_{i4} \\
\vdots
\end{array}\right]$$

Figure 24.1: Collapsing block-diagonal instruments

This treatment of instruments can be selected per GMM or GMMlevel case — by appending the
collapse flag following the maxlag value — or it can be set “globally” by use of the --collapse
option to the dpanel command. To our knowledge Roodman’s xtabond2 was the first software to
offer this useful facility.
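As an illustrative sketch (y being a hypothetical dependent variable), the two ways of requesting collapsed instruments are:

dpanel 1 ; y ; GMM(y, 2, 99, collapse)   # collapse within this GMM() term
dpanel 1 ; y --collapse                  # collapse all GMM-style instruments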
A further use of GMM() is to exploit more fully the potential orthogonality conditions afforded by
an exogenous regressor, or a related variable that does not appear as a regressor. For example, in

dpanel 1 ; y x ; GMM(z, 2, 6)

the variable x is considered an endogenous regressor, and up to 5 lags of z are used as instruments.
Note that in the following script fragment

dpanel 1 ; y z
dpanel 1 ; y z ; GMM(z,0,0)

the two estimation commands should not be expected to give the same result, as the sets of orthog-
onality relationships are subtly different. In the latter case, you have T − 2 separate orthogonality
relationships pertaining to zit , none of which has any implication for the other ones; in the former
case, you only have one. In terms of the Zi matrix, the first form adds a single row to the bottom of
the instruments matrix, while the second form adds a diagonal block with T − 2 columns; that is,
$$\left[\begin{array}{cccc} z_{i3} & z_{i4} & \cdots & z_{iT} \end{array}\right]$$

versus

$$\left[\begin{array}{cccc}
z_{i3} & 0 & \cdots & 0 \\
0 & z_{i4} & \cdots & 0 \\
& & \ddots & \\
0 & 0 & \cdots & z_{iT}
\end{array}\right]$$

24.3 Replication of DPD results


In this section we show how to replicate the results of some of the pioneering work with dynamic
panel-data estimators by Arellano, Bond and Blundell. As the DPD manual (Doornik, Arellano and
Bond, 2006) explains, it is difficult to replicate the original published results exactly, for two main
reasons: not all of the data used in those studies are publicly available; and some of the choices
made in the original software implementation of the estimators have been superseded. Here, there-
fore, our focus is on replicating the results obtained using the current DPD package and reported
in the DPD manual.
The examples are based on the program files abest1.ox, abest3.ox and bbest1.ox. These
are included in the DPD package, along with the Arellano–Bond database files abdata.bn7 and
abdata.in7.1 The Arellano–Bond data are also provided with gretl, in the file abdata.gdt. In the
following we do not show the output from DPD or gretl; it is somewhat voluminous, and is easily
generated by the user. As of this writing the results from Ox/DPD and gretl are identical in all
relevant respects for all of the examples shown.2
A complete Ox/DPD program to generate the results of interest takes this general form:

#include <oxstd.h>
#import <packages/dpd/dpd>

main()
{
decl dpd = new DPD();

dpd.Load("abdata.in7");
dpd.SetYear("YEAR");

// model-specific code here

delete dpd;
}

In the examples below we take this template for granted and show just the model-specific code.

Example 1
The following Ox/DPD code—drawn from abest1.ox— replicates column (b) of Table 4 in Arellano
and Bond (1991), an instance of the differences-only or GMM-DIF estimator. The dependent variable
1 See http://www.doornik.com/download.html.
2 To be specific, this is using Ox Console version 5.10, version 1.24 of the DPD package, and gretl built from CVS as of 2010-10-23, all on Linux.

is the log of employment, n; the regressors include two lags of the dependent variable, current and
lagged values of the log real-product wage, w, the current value of the log of gross capital, k, and
current and lagged values of the log of industry output, ys. In addition the specification includes
a constant and five year dummies; unlike the stochastic regressors, these deterministic terms are
not differenced. In this specification the regressors w, k and ys are treated as exogenous and serve
as their own instruments. In DPD syntax this requires entering these variables twice, on the X_VAR
and I_VAR lines. The GMM-type (block-diagonal) instruments in this example are the second and
subsequent lags of the level of n. Both 1-step and 2-step estimates are computed.

dpd.SetOptions(FALSE); // don’t use robust standard errors


dpd.Select(Y_VAR, {"n", 0, 2});
dpd.Select(X_VAR, {"w", 0, 1, "k", 0, 0, "ys", 0, 1});
dpd.Select(I_VAR, {"w", 0, 1, "k", 0, 0, "ys", 0, 1});

dpd.Gmm("n", 2, 99);
dpd.SetDummies(D_CONSTANT + D_TIME);

print("\n\n***** Arellano & Bond (1991), Table 4 (b)");


dpd.SetMethod(M_1STEP);
dpd.Estimate();
dpd.SetMethod(M_2STEP);
dpd.Estimate();

Here is gretl code to do the same job:

open abdata.gdt
list X = w w(-1) k ys ys(-1)
dpanel 2 ; n X const --time-dummies --asy --dpdstyle
dpanel 2 ; n X const --time-dummies --asy --two-step --dpdstyle

Note that in gretl the switch to suppress robust standard errors is --asymptotic, here abbreviated
to --asy.3 The --dpdstyle flag specifies that the constant and dummies should not be differenced,
in the context of a GMM-DIF model. With gretl’s dpanel command it is not necessary to specify the
exogenous regressors as their own instruments since this is the default; similarly, the use of the
second and all longer lags of the dependent variable as GMM-type instruments is the default and
need not be stated explicitly.

Example 2
The DPD file abest3.ox contains a variant of the above that differs with regard to the choice of
instruments: the variables w and k are now treated as predetermined, and are instrumented GMM-
style using the second and third lags of their levels. This approximates column (c) of Table 4 in
Arellano and Bond (1991). We have modified the code in abest3.ox slightly to allow the use of
robust (Windmeijer-corrected) standard errors, which are the default in both DPD and gretl with
2-step estimation:

dpd.Select(Y_VAR, {"n", 0, 2});


dpd.Select(X_VAR, {"w", 0, 1, "k", 0, 0, "ys", 0, 1});
dpd.Select(I_VAR, {"ys", 0, 1});
dpd.SetDummies(D_CONSTANT + D_TIME);

dpd.Gmm("n", 2, 99);
dpd.Gmm("w", 2, 3);
dpd.Gmm("k", 2, 3);

3 Option flags in gretl can always be truncated, down to the minimal unique abbreviation.

print("\n***** Arellano & Bond (1991), Table 4 (c)\n");


print(" (but using different instruments!!)\n");
dpd.SetMethod(M_2STEP);
dpd.Estimate();

The gretl code is as follows:

open abdata.gdt
list X = w w(-1) k ys ys(-1)
list Ivars = ys ys(-1)
dpanel 2 ; n X const ; GMM(w,2,3) GMM(k,2,3) Ivars --time --two-step --dpd

Note that since we are now calling for an instrument set other than the default (following the second
semicolon), it is necessary to include the Ivars specification for the variable ys. However, it is
not necessary to specify GMM(n,2,99) since this remains the default treatment of the dependent
variable.

Example 3
Our third example replicates the DPD output from bbest1.ox: this uses the same dataset as the
previous examples but the model specifications are based on Blundell and Bond (1998), and involve
comparison of the GMM-DIF and GMM-SYS (“system”) estimators. The basic specification is slightly
simplified in that the variable ys is not used and only one lag of the dependent variable appears as
a regressor. The Ox/DPD code is:

dpd.Select(Y_VAR, {"n", 0, 1});


dpd.Select(X_VAR, {"w", 0, 1, "k", 0, 1});
dpd.SetDummies(D_CONSTANT + D_TIME);

print("\n\n***** Blundell & Bond (1998), Table 4: 1976-86 GMM-DIF");


dpd.Gmm("n", 2, 99);
dpd.Gmm("w", 2, 99);
dpd.Gmm("k", 2, 99);
dpd.SetMethod(M_2STEP);
dpd.Estimate();

print("\n\n***** Blundell & Bond (1998), Table 4: 1976-86 GMM-SYS");


dpd.GmmLevel("n", 1, 1);
dpd.GmmLevel("w", 1, 1);
dpd.GmmLevel("k", 1, 1);
dpd.SetMethod(M_2STEP);
dpd.Estimate();

Here is the corresponding gretl code:

open abdata.gdt
list X = w w(-1) k k(-1)
list Z = w k

# Blundell & Bond (1998), Table 4: 1976-86 GMM-DIF


dpanel 1 ; n X const ; GMM(Z,2,99) --time --two-step --dpd

# Blundell & Bond (1998), Table 4: 1976-86 GMM-SYS


dpanel 1 ; n X const ; GMM(Z,2,99) GMMlevel(Z,1,1) \
--time --two-step --dpd --system

Note the use of the --system option flag to specify GMM-SYS, including the default treatment of
the dependent variable, which corresponds to GMMlevel(n,1,1). In this case we also want to
use lagged differences of the regressors w and k as instruments for the levels equations so we
need explicit GMMlevel entries for those variables. If you want something other than the default
treatment for the dependent variable as an instrument for the levels equations, you should give an
explicit GMMlevel specification for that variable — and in that case the --system flag is redundant
(but harmless).
For the sake of completeness, note that if you specify at least one GMMlevel term, dpanel will then
include equations in levels, but it will not automatically add a default GMMlevel specification for
the dependent variable unless the --system option is given.

24.4 Cross-country growth example


The previous examples all used the Arellano–Bond dataset; for this example we use the dataset
CEL.gdt, which is also included in the gretl distribution. As with the Arellano–Bond data, there
are numerous missing values. Details of the provenance of the data can be found by opening the
dataset information window in the gretl GUI (Data menu, Dataset info item). This is a subset of the
Barro–Lee 138-country panel dataset, an approximation to which is used in Caselli, Esquivel and
Lefort (1996) and Bond, Hoeffler and Temple (2001).4 Both of these papers explore the dynamic
panel-data approach in relation to the issues of growth and convergence of per capita income across
countries.
The dependent variable is growth in real GDP per capita over successive five-year periods; the
regressors are the log of the initial (five years prior) value of GDP per capita, the log-ratio of in-
vestment to GDP, s, in the prior five years, and the log of annual average population growth, n,
over the prior five years plus 0.05 as stand-in for the rate of technical progress, g, plus the rate of
depreciation, δ (with the last two terms assumed to be constant across both countries and periods).
The original model is

∆5 yit = βyi,t−5 + αsit + γ(nit + g + δ) + νt + ηi + ϵit (24.7)

which allows for a time-specific disturbance νt . The Solow model with Cobb–Douglas production
function implies that γ = −α, but this assumption is not imposed in estimation. The time-specific
disturbance is eliminated by subtracting the period mean from each of the series.
Equation (24.7) can be transformed to an AR(1) dynamic panel-data model by adding yi,t−5 to both
sides, which gives
yit = (1 + β)yi,t−5 + αsit + γ(nit + g + δ) + ηi + ϵit (24.8)
where all variables are now assumed to be time-demeaned.
In (rough) replication of Bond et al. (2001) we now proceed to estimate the following two models:
(a) equation (24.8) via GMM-DIF, using as instruments the second and all longer lags of yit , sit and
nit + g + δ; and (b) equation (24.8) via GMM-SYS, using ∆yi,t−1 , ∆si,t−1 and ∆(ni,t−1 + g + δ) as
additional instruments in the levels equations. We report robust standard errors throughout. (As a
purely notational matter, we now use “t − 1” to refer to values five years prior to t, as in Bond et al.
(2001)).
The gretl script to do this job is shown in Listing 24.1. Note that the final transformed versions of
the variables (logs, with time-means subtracted) are named ly (yit ), linv (sit ) and lngd (nit +g +δ).
For comparison we estimated the same two models using Ox/DPD and xtabond2. (In each case
we constructed a comma-separated values dataset containing the data as transformed in the gretl
script shown above, using a missing-value code appropriate to the target program.) For reference,
the commands used with Stata are reproduced below:
4 We say an “approximation” because we have not been able to replicate exactly the OLS results reported in the papers

cited, though it seems from the description of the data in Caselli et al. (1996) that we ought to be able to do so. We note
that Bond et al. (2001) used data provided by Professor Caselli yet did not manage to reproduce the latter’s results.

Listing 24.1: GDP growth example

open CEL.gdt

ngd = n + 0.05
ly = log(y)
linv = log(s)
lngd = log(ngd)

# take out time means


loop i=1..8
smpl (time == i) --restrict --replace
ly -= mean(ly)
linv -= mean(linv)
lngd -= mean(lngd)
endloop

smpl --full
list X = linv lngd
# 1-step GMM-DIF
dpanel 1 ; ly X ; GMM(X,2,99)
# 2-step GMM-DIF
dpanel 1 ; ly X ; GMM(X,2,99) --two-step
# GMM-SYS
dpanel 1 ; ly X ; GMM(X,2,99) GMMlevel(X,1,1) --two-step --sys

#delimit ;
insheet using CEL.csv;
tsset unit time;
xtabond2 ly L.ly linv lngd, gmm(L.ly, lag(1 99)) gmm(linv, lag(2 99))
gmm(lngd, lag(2 99)) rob nolev;
xtabond2 ly L.ly linv lngd, gmm(L.ly, lag(1 99)) gmm(linv, lag(2 99))
gmm(lngd, lag(2 99)) rob nolev twostep;
xtabond2 ly L.ly linv lngd, gmm(L.ly, lag(1 99)) gmm(linv, lag(2 99))
gmm(lngd, lag(2 99)) rob nocons twostep;

For the GMM-DIF model all three programs find 382 usable observations and 30 instruments, and
yield identical parameter estimates and robust standard errors (at least up to the number of digits
printed); see Table 24.1.⁵

⁵ The coefficient shown for ly(-1) in the Tables is that reported directly by the software; for comparability with
the original model (eq. 24.7) it is necessary to subtract 1, which produces the expected negative value indicating
conditional convergence in per capita income.

             1-step                    2-step
             coeff       std. error    coeff       std. error

ly(-1)       0.577564    0.1292        0.610056    0.1562
linv         0.0565469   0.07082       0.100952    0.07772
lngd        −0.143950    0.2753       −0.310041    0.2980

Table 24.1: GMM-DIF: Barro–Lee data

Results for GMM-SYS estimation are shown in Table 24.2. In this case we show two sets of gretl
results: those labeled “gretl(1)” were obtained using gretl’s --dpdstyle option, while those labeled
“gretl(2)” did not use that option—the intent being to reproduce the H matrices used by Ox/DPD
and xtabond2 respectively.
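For concreteness, the two gretl columns in Table 24.2 should correspond to dpanel calls along the
following lines, re-using ly and X as defined in Listing 24.1; the only difference between the two is
the --dpdstyle flag:

# gretl(1): DPD-style H matrix, comparable with Ox/DPD
dpanel 1 ; ly X ; GMM(X,2,99) GMMlevel(X,1,1) --two-step --sys --dpdstyle
# gretl(2): default H matrix, comparable with xtabond2
dpanel 1 ; ly X ; GMM(X,2,99) GMMlevel(X,1,1) --two-step --sys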

            gretl(1)           Ox/DPD             gretl(2)           xtabond2

ly(-1)      0.9237 (0.0385)    0.9167 (0.0373)    0.9073 (0.0370)    0.9073 (0.0370)
linv        0.1592 (0.0449)    0.1636 (0.0441)    0.1856 (0.0411)    0.1856 (0.0411)
lngd       −0.2370 (0.1485)   −0.2178 (0.1433)   −0.2355 (0.1501)   −0.2355 (0.1501)

Table 24.2: 2-step GMM-SYS: Barro–Lee data (standard errors in parentheses)

In this case all three programs use 479 observations; gretl and xtabond2 use 41 instruments and
produce the same estimates (when using the same H matrix) while Ox/DPD nominally uses 66.⁶
It is noteworthy that with GMM-SYS plus “messy” missing observations, the results depend on the
precise array of instruments used, which in turn depends on the details of the implementation of
the estimator.

⁶ This is a case of the issue described in section 24.1: the full A_N matrix turns out to be singular and special
measures must be taken to produce estimates.

24.5 Auxiliary test statistics

We have concentrated above on parameter estimates and standard errors. Here we add some
discussion of the additional test statistics that typically accompany both GMM-DIF and GMM-SYS
estimation: tests of overidentification, tests for first- and second-order autocorrelation, and tests
for the joint significance of regressors.

Overidentification
If a model estimated with the use of instrumental variables is just-identified, the condition of or-
thogonality of the residuals and the instruments can be satisfied exactly. But if the specification
is overidentified (more instruments than endogenous regressors) this condition can only be ap-
proximated, and the degree to which orthogonality “fails” serves as a test for the validity of the
instruments (and/or the specification). Since dynamic panel models are almost always
overidentified, such a test is of particular importance.
There are two such tests in the econometric literature, devised respectively by Sargan (1958) and
Hansen (1982). They share a common principle: a suitably scaled measure of deviation from perfect
orthogonality can be shown to be distributed as χ²(k), with k the degree of overidentification,
under the null hypothesis of valid instruments and correct specification. Both test statistics can be
written as

    S = ( ∑_{i=1}^N v̂*_i′ Z_i ) A_N ( ∑_{i=1}^N Z_i′ v̂*_i )

where the v̂*_i are the residuals in first differences for unit i. Since the two statistics share this
common form they are often rolled together, for example as “Hansen–Sargan” tests by Davidson
and MacKinnon (2004).
The Sargan vs Hansen difference is buried in A_N: Sargan’s original test is the minimized orthogonal-
ity score divided by a scalar estimate of the error variance (which is presumed to be homoskedastic),
while Hansen’s is the minimized criterion from efficient GMM estimation, in which the scalar vari-
ance estimate is replaced by a heteroskedasticity- and autocorrelation-consistent (HAC) estimator
of the variance matrix of the error term. These variants correspond to 1-step and 2-step estimates
of the given specification.
Up till version 2021d, gretl followed Ox/DPD in presenting a single overidentification statistic under
the name “Sargan”—in effect, a Sargan test proper for the 1-step estimator and a Hansen test
for 2-step. Subsequently, however, gretl follows xtabond2 in distinguishing between the tests
and presenting both statistics, under their original names, when 2-step estimation is selected (and
therefore the HAC variance estimator is available). This choice responds to an argument made by
Roodman (2009b): the Sargan test is questionable owing to its assumption of homoskedasticity but
the Hansen test is seriously weakened by an excessive number of instruments (it may under-reject
substantially), so there may be a benefit to taking both tests into consideration.
There are cases where the degrees of freedom for the overidentification test differ between DPD
and gretl; this occurs when the A_N matrix is singular (section 24.1). In concept the df equals the
number of instruments minus the number of parameters estimated; for the first of these terms
gretl uses the rank of A_N, while DPD appears to use the full dimension of this matrix.
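In practice both statistics and their degrees of freedom can be retrieved after estimation from the
$model bundle described in section 24.6 below. A minimal hansl sketch, assuming a 2-step dpanel
model has just been estimated (recall that the Hansen entries are absent after 1-step estimation):

# chi-square p-values for the two overidentification tests
bundle b = $model
printf "Sargan: %g, df %d, p-value %.4f\n", b.sargan, b.sargan_df, \
  pvalue(X, b.sargan_df, b.sargan)
printf "Hansen: %g, df %d, p-value %.4f\n", b.hansen, b.hansen_df, \
  pvalue(X, b.hansen_df, b.hansen)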

Autocorrelation
Negative first-order autocorrelation of the residuals in differences is expected by construction of
the dynamic panel estimator, so a significant value for the AR(1) test does not indicate a problem.
If the AR(2) test rejects, however, this indicates violation of the maintained assumptions. Note that
valid AR tests cannot be produced when the --asymptotic option is specified in conjunction with
one-step GMM-SYS estimation; if you need the tests, either add the two-step option or drop the
asymptotic flag (which is recommended in any case).
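These statistics can likewise be read from the $model bundle. A minimal sketch, on the standard
assumption that the Arellano–Bond AR tests are asymptotically N(0, 1) z-statistics:

# two-sided normal p-values for the autocorrelation tests
bundle b = $model
printf "AR(1): z = %g, p-value %.4f\n", b.AR1, 2 * pvalue(z, abs(b.AR1))
printf "AR(2): z = %g, p-value %.4f\n", b.AR2, 2 * pvalue(z, abs(b.AR2))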

Wald tests on regressors

Wald tests on the regressors (and separately on the time dummy variables, if included) are based
on the estimated variance matrix of the parameter estimates and are generally in agreement across
software packages provided the parameter variance is estimated in the same way. One small ex-
ception pertains to comparison between Ox/DPD and gretl when the difference estimator is used, a
constant term is included, and the --dpdstyle option is given with dpanel (so the constant is not
automatically omitted). In this case DPD includes the constant in the time-dummies Wald test but
gretl does not.
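If the statistic is needed programmatically it can be fetched in the same way as the other tests,
using the bundle keys listed in the next section; a one-line sketch:

# chi-square p-value for the joint-significance Wald test
printf "Wald chi-square(%d) = %g, p-value %.4f\n", $model.wald_df, \
  $model.wald, pvalue(X, $model.wald_df, $model.wald)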

24.6 Post-estimation available statistics

After estimation, the $model accessor will return a bundle containing several items that may be of
interest: most should be self-explanatory, but here’s a partial list:

Key                  Content

AR1, AR2             1st and 2nd order autocorrelation test statistics
sargan, sargan_df    Sargan test for overidentifying restrictions and corresponding
                     degrees of freedom
hansen, hansen_df    Hansen test for overidentifying restrictions and corresponding
                     degrees of freedom
wald, wald_df        Wald test for overall significance and corresponding degrees
                     of freedom
GMMinst              The matrix Z of instruments (see equations (24.2) and (24.5))
wgtmat               The matrix A of GMM weights (see equations (24.2) and (24.5))

Note that hansen and hansen_df are not included when 1-step estimation is selected. Note also
that GMMinst and wgtmat (which may be quite large matrices) are not saved in the $model bundle
by default; that requires use of the --keep-extra option with the dpanel command. Listing 24.2
illustrates use of these matrices to replicate via hansl commands the calculation of the GMM esti-
mator.
Listing 24.2: replication of built-in command via hansl commands

set verbose off

open abdata.gdt

# compose list of regressors
list X = w w(-1) k k(-1)
list Z = w k

dpanel 1 ; n X const ; GMM(Z,2,99) --two-step --dpd --keep-extra

### --- re-do by hand ----------------------------

# fetch Z' and A from the model bundle
A = $model.wgtmat
mZt = $model.GMMinst # note: transposed

# create data matrices restricted to the usable observations
series valid = ok($uhat)
series ddep = diff(n)
series dldep = ddep(-1)
list dreg = diff(X)

smpl valid --dummy

matrix m_reg = {dldep} ~ {dreg} ~ 1
matrix m_dep = {ddep}

# GMM: coef = (R'Z A Z'R)^{-1} R'Z A Z'y, with R = m_reg, y = m_dep, Z' = mZt
matrix uno = mZt * m_reg
matrix due = qform(uno', A)
matrix tre = (uno'A) * (mZt * m_dep)
matrix coef = due\tre

print coef
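As a quick check, not part of the original listing, the hand-computed vector can be set beside the
coefficients stored by the dpanel command above, which remain available through the standard
accessor:

# the built-in estimates, for comparison with coef
matrix builtin = $coeff
print builtin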
24.7 Memo: dpanel options

flag               effect

--asymptotic       Suppresses the use of robust standard errors
--two-step         Calls for 2-step estimation (the default being 1-step)
--system           Calls for GMM-SYS, with default treatment of the dependent
                   variable, as in GMMlevel(y,1,1)
--collapse         Collapses block-diagonal sets of GMM instruments as per
                   Roodman (2009a)
--time-dummies     Includes period-specific dummy variables
--dpdstyle         Computes the H matrix as in DPD; also suppresses differencing
                   of automatic time dummies and omission of the intercept in the
                   GMM-DIF case
--verbose          Prints confirmation of the GMM-style instruments used; and
                   when --two-step is selected, prints the 1-step estimates first
--vcv              Calls for printing of the covariance matrix
--quiet            Suppresses the printing of results
--keep-extra       Saves additional matrices in the $model bundle (see above)

The time dummies option supports the qualifier noprint, as in

    --time-dummies=noprint

This means that although the dummies are included in the specification, their coefficients, standard
errors and so on are not printed.
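By way of illustration of how these flags combine, here is a hypothetical variant of the estimation
in Listing 24.2: 2-step system GMM with collapsed instrument blocks and time dummies whose
coefficients are not printed. The particular combination is chosen purely for the sake of the example.

open abdata.gdt
list X = w w(-1) k k(-1)
list Z = w k
# 2-step GMM-SYS, collapsed instruments, unprinted time dummies
dpanel 1 ; n X const ; GMM(Z,2,99) GMMlevel(Z,1,1) \
  --system --two-step --collapse --time-dummies=noprint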
