Title

ivregress — Single-equation instrumental-variables regression
Syntax

    ivregress estimator depvar [varlist1] (varlist2 = varlistiv) [if] [in] [weight] [, options]

estimator                Description
2sls                     two-stage least squares (2SLS)
liml                     limited-information maximum likelihood (LIML)
gmm                      generalized method of moments (GMM)

options                  Description
Model
  noconstant             suppress constant term
  hascons                has user-supplied constant
GMM¹
  wmatrix(wmtype)        wmtype may be robust, cluster clustvar, hac kernel, or unadjusted
  center                 center moments in weight matrix computation
  igmm                   use iterative instead of two-step GMM estimator
  eps(#)²                specify # for parameter convergence criterion; default is eps(1e-6)
  weps(#)²               specify # for weight matrix convergence criterion; default is weps(1e-6)
  optimization_options²  control the optimization process; seldom used
SE/Robust
  vce(vcetype)           vcetype may be unadjusted, robust, cluster clustvar, bootstrap,
                           jackknife, or hac kernel
Reporting
  level(#)               set confidence level; default is level(95)
  first                  report first-stage regression
  small                  make degrees-of-freedom adjustments and report small-sample statistics
  noheader               display only the coefficient table
  depname(depname)       substitute dependent variable name
  eform(string)          report exponentiated coefficients and use string to label them
  display_options        control column formats, row spacing, line width, display of omitted
                           variables and base and empty cells, and factor-variable labeling

  perfect                do not check for collinearity between endogenous regressors and
                           excluded instruments
  coeflegend             display legend instead of statistics

¹ These options may be specified only when gmm is specified.
² These options may be specified only when igmm is specified.
varlist1, varlist2, and varlistiv may contain factor variables; see [U] 11.4.3 Factor variables.
depvar, varlist1, varlist2, and varlistiv may contain time-series operators; see [U] 11.4.4 Time-series varlists.
bootstrap, by, jackknife, rolling, statsby, and svy are allowed; see [U] 11.1.10 Prefix commands.
Weights are not allowed with the bootstrap prefix; see [R] bootstrap.
aweights are not allowed with the jackknife prefix; see [R] jackknife.
hascons, vce(), noheader, depname(), and weights are not allowed with the svy prefix; see [SVY] svy.
aweights, fweights, iweights, and pweights are allowed; see [U] 11.1.6 weight.
perfect and coeflegend do not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
Menu
Statistics > Endogenous covariates > Single-equation instrumental-variables regression
Description
ivregress fits a linear regression of depvar on varlist1 and varlist2, using varlistiv (along with
varlist1) as instruments for varlist2. ivregress supports estimation via two-stage least squares (2SLS),
limited-information maximum likelihood (LIML), and generalized method of moments (GMM).
In the language of instrumental variables, varlist1 and varlistiv are the exogenous variables, and
varlist2 are the endogenous variables.
Options

Model
noconstant; see [R] estimation options.
hascons indicates that a user-defined constant or its equivalent is specified among the independent
variables.

GMM
wmatrix(wmtype) specifies the type of weighting matrix to be used in conjunction with the GMM
estimator.
Specifying wmatrix(robust) requests a weighting matrix that is optimal when the error term is
heteroskedastic. wmatrix(robust) is the default.
Specifying wmatrix(cluster clustvar) requests a weighting matrix that accounts for arbitrary
correlation among observations within clusters identified by clustvar.
Specifying wmatrix(hac kernel #) requests a heteroskedasticity- and autocorrelation-consistent
(HAC) weighting matrix using the specified kernel (see below) with # lags. The bandwidth of a
kernel is equal to # + 1.
Specifying wmatrix(hac kernel opt) requests an HAC weighting matrix using the specified kernel,
and the lag order is selected using Newey and West’s (1994) optimal lag-selection algorithm.
Specifying wmatrix(hac kernel) requests an HAC weighting matrix using the specified kernel and
N − 2 lags, where N is the sample size.
There are three kernels available for HAC weighting matrices, and you may request each one by
using the name used by statisticians or the name perhaps more familiar to economists:
bartlett or nwest requests the Bartlett (Newey–West) kernel;
parzen or gallant requests the Parzen (Gallant 1987) kernel; and
quadraticspectral or andrews requests the quadratic spectral (Andrews 1991) kernel.
Specifying wmatrix(unadjusted) requests a weighting matrix that is suitable when the errors are
homoskedastic. The GMM estimator with this weighting matrix is equivalent to the 2SLS estimator.
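The equivalence with 2SLS can be checked directly. The sketch below uses the housing model from the examples later in this entry and compares the two coefficient vectors; the matrix names b2sls and bgmm are illustrative choices, not part of ivregress.

```stata
use http://www.stata-press.com/data/r13/hsng, clear
ivregress 2sls rent pcturban (hsngval = faminc i.region)
matrix b2sls = e(b)
ivregress gmm rent pcturban (hsngval = faminc i.region), wmatrix(unadjusted)
matrix bgmm = e(b)
* relative difference between the coefficient vectors; expected to be near zero
display mreldif(b2sls, bgmm)
```

Only the point estimates are compared here; the reported standard errors depend on the vce() setting in effect for each command.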
center requests that the sample moments be centered (demeaned) when computing GMM weight
matrices. By default, centering is not done.
igmm requests that the iterative GMM estimator be used instead of the default two-step GMM estimator.
Convergence is declared when the relative change in the parameter vector from one iteration to
the next is less than eps() or the relative change in the weight matrix is less than weps().
eps(#) specifies the convergence criterion for successive parameter estimates when the iterative GMM
estimator is used. The default is eps(1e-6). Convergence is declared when the relative difference
between successive parameter estimates is less than eps() and the relative difference between
successive estimates of the weighting matrix is less than weps().
weps(#) specifies the convergence criterion for successive estimates of the weighting matrix when
the iterative GMM estimator is used. The default is weps(1e-6). Convergence is declared when
the relative difference between successive parameter estimates is less than eps() and the relative
difference between successive estimates of the weighting matrix is less than weps().
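As a minimal sketch of how these options combine, the following requests the iterative GMM estimator with tolerances tighter than the defaults. The model is the housing example used later in this entry, and the tolerance values are arbitrary illustrations.

```stata
use http://www.stata-press.com/data/r13/hsng, clear
* iterative GMM with tighter tolerances than the 1e-6 defaults
ivregress gmm rent pcturban (hsngval = faminc i.region), igmm eps(1e-8) weps(1e-8)
* number of GMM iterations performed
display e(iterations)
```

Because eps() and weps() apply only to the iterative estimator, they are specified together with igmm.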

optimization_options: iterate(#), [no]log. iterate() specifies the maximum number of iterations
to perform in conjunction with the iterative GMM estimator. The default is 16,000 or the number
set using set maxiter (see [R] maximize). log/nolog specifies whether to show the iteration
log. These options are seldom used.

SE/Robust
vce(vcetype) specifies the type of standard error reported, which includes types that are robust to
some kinds of misspecification (robust), that allow for intragroup correlation (cluster clustvar),
and that use bootstrap or jackknife methods (bootstrap, jackknife); see [R] vce option.
vce(unadjusted), the default for 2sls and liml, specifies that an unadjusted (nonrobust) VCE
matrix be used. The default for gmm is based on the wmtype specified in the wmatrix() option;
see wmatrix(wmtype) above. If wmatrix() is specified with gmm but vce() is not, then vcetype
is set equal to wmtype. To override this behavior and obtain an unadjusted (nonrobust) VCE matrix,
specify vce(unadjusted).
ivregress also allows the following:

vce(hac kernel [# | opt]) specifies that an HAC covariance matrix be used. The syntax used
with vce(hac kernel . . .) is identical to that used with wmatrix(hac kernel . . .); see
wmatrix(wmtype) above.

Reporting
level(#); see [R] estimation options.
first requests that the first-stage regression results be displayed.

small requests that the degrees-of-freedom adjustment N/(N −k) be made to the variance–covariance
matrix of parameters and that small-sample F and t statistics be reported, where N is the sample
size and k is the number of parameters estimated. By default, no degrees-of-freedom adjustment
is made, and Wald and z statistics are reported. Even with this option, no degrees-of-freedom
adjustment is made to the weighting matrix when the GMM estimator is used.
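To illustrate, here is a sketch using the housing example from later in this entry, where N = 50 and k = 3 (hsngval, pcturban, and the constant), so the adjustment factor is 50/47, or about 1.064.

```stata
use http://www.stata-press.com/data/r13/hsng, clear
ivregress 2sls rent pcturban (hsngval = faminc i.region), small
* residual degrees of freedom and the small-sample F statistic
display e(df_r)
display e(F)
```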
noheader suppresses the display of the summary statistics at the top of the output, displaying only
the coefficient table.
depname(depname) is used only in programs and ado-files that use ivregress to fit models other than
instrumental-variables regression. depname() may be specified only at estimation time. depname
is recorded as the identity of the dependent variable, even though the estimates are calculated using
depvar. This method affects the labeling of the output — not the results calculated — but could
affect later calculations made by predict, where the residual would be calculated as deviations
from depname rather than depvar. depname() is most typically used when depvar is a temporary
variable (see [P] macro) used as a proxy for depname.
eform(string) is used only in programs and ado-files that use ivregress to fit models other
than instrumental-variables regression. eform() specifies that the coefficient table be displayed in
“exponentiated form”, as defined in [R] maximize, and that string be used to label the exponentiated
coefficients in the table.
display_options: noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel,
fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), sformat(%fmt), and
nolstretch; see [R] estimation options.
The following options are available with ivregress but are not shown in the dialog box:
perfect requests that ivregress not check for collinearity between the endogenous regressors and
excluded instruments, allowing one to specify “perfect” instruments. This option cannot be used
with the LIML estimator. This option may be required when using ivregress to implement other
estimators.
coeflegend; see [R] estimation options.
Remarks and examples

ivregress performs instrumental-variables regression and weighted instrumental-variables regression.
For a general discussion of instrumental variables, see Baum (2006), Cameron and Trivedi (2005;
2010, chap. 6), Davidson and MacKinnon (1993, 2004), Greene (2012, chap. 8), and Wooldridge
(2010, 2013). See Hall (2005) for a lucid presentation of GMM estimation. Angrist and Pischke (2009,
chap. 4) offer a casual yet thorough introduction to instrumental-variables estimators, including their
use in estimating treatment effects. Some of the earliest work on simultaneous systems can be
found in Cowles Commission monographs — Koopmans and Marschak (1950) and Koopmans and
Hood (1953) — with the first developments of 2SLS appearing in Theil (1953) and Basmann (1957).
However, Stock and Watson (2011, 422–424) present an example of the method of instrumental
variables that was first published in 1928 by Philip Wright.
The syntax for ivregress assumes that you want to fit one equation from a system of equations
or an equation for which you do not want to specify the functional form for the remaining equations
of the system. To fit a full system of equations, using either 2SLS equation-by-equation or three-stage
least squares, see [R] reg3. An advantage of ivregress is that you can fit one equation of a
multiple-equation system without specifying the functional form of the remaining equations.
Formally, the model fit by ivregress is
$$y_i = \mathbf{y}_i \boldsymbol{\beta}_1 + \mathbf{x}_{1i} \boldsymbol{\beta}_2 + u_i \tag{1}$$
$$\mathbf{y}_i = \mathbf{x}_{1i} \boldsymbol{\Pi}_1 + \mathbf{x}_{2i} \boldsymbol{\Pi}_2 + \mathbf{v}_i \tag{2}$$
Here $y_i$ is the dependent variable for the $i$th observation, $\mathbf{y}_i$ represents the endogenous regressors (varlist2 in the syntax diagram), $\mathbf{x}_{1i}$ represents the included exogenous regressors (varlist1 in the syntax diagram), and $\mathbf{x}_{2i}$ represents the excluded exogenous regressors (varlistiv in the syntax diagram). $\mathbf{x}_{1i}$ and $\mathbf{x}_{2i}$ are collectively called the instruments. $u_i$ and $\mathbf{v}_i$ are zero-mean error terms, and the correlations between $u_i$ and the elements of $\mathbf{v}_i$ are presumably nonzero.
The rest of the discussion is presented under the following headings:
2SLS and LIML estimators
GMM estimator

2SLS and LIML estimators
The most common instrumental-variables estimator is 2SLS.
Example 1: 2SLS estimator

We have state data from the 1980 census on the median dollar value of owner-occupied housing
(hsngval) and the median monthly gross rent (rent). We want to model rent as a function of
hsngval and the percentage of the population living in urban areas (pcturban):
$$\text{rent}_i = \beta_0 + \beta_1\,\text{hsngval}_i + \beta_2\,\text{pcturban}_i + u_i$$
where i indexes states and ui is an error term.

Because random shocks that affect rental rates in a state probably also affect housing values, we
treat hsngval as endogenous. We believe that the correlation between hsngval and u is not equal
to zero. On the other hand, we have no reason to believe that the correlation between pcturban and
u is nonzero, so we assume that pcturban is exogenous.
Because we are treating hsngval as an endogenous regressor, we must have one or more additional
variables available that are correlated with hsngval but uncorrelated with u. Moreover, these excluded
exogenous variables must not affect rent directly, because if they do then they should be included
in the regression equation we specified above. In our dataset, we have a variable for family income
(faminc) and for region of the country (region) that we believe are correlated with hsngval but
not the error term. Together, pcturban, faminc, and factor variables 2.region, 3.region, and
4.region constitute our set of instruments.
To fit the equation in Stata, we specify the dependent variable and the list of included exogenous
variables. In parentheses, we specify the endogenous regressors, an equal sign, and the excluded
exogenous variables. Only the additional exogenous variables must be specified to the right of the
equal sign; the exogenous variables that appear in the regression equation are automatically included
as instruments.
Here we fit our model with the 2SLS estimator:

. use http://www.stata-press.com/data/r13/hsng
(1980 Census housing data)
. ivregress 2sls rent pcturban (hsngval = faminc i.region)
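Instrumental variables (2SLS) regression          Number of obs   =        50
                                                  Wald chi2(2)    =     90.76
                                                  Prob > chi2     =    0.0000
                                                  R-squared       =    0.5989
                                                  Root MSE        =    22.166

------------------------------------------------------------------------------
        rent |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     hsngval |   .0022398   .0003284     6.82   0.000     .0015961    .0028836
    pcturban |    .081516   .2987652     0.27   0.785     -.504053     .667085
       _cons |   120.7065   15.22839     7.93   0.000     90.85942    150.5536
------------------------------------------------------------------------------
Instrumented:  hsngval
Instruments:   pcturban faminc 2.region 3.region 4.region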
As we would expect, states with higher housing values have higher rental rates. The proportion
of a state’s population that is urban does not have a significant effect on rents.
Technical note
In a simultaneous-equations framework, we could write the model we just fit as
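$$\text{hsngval}_i = \pi_0 + \pi_1\,\text{faminc}_i + \pi_2\,\text{2.region}_i + \pi_3\,\text{3.region}_i + \pi_4\,\text{4.region}_i + v_i$$
$$\text{rent}_i = \beta_0 + \beta_1\,\text{hsngval}_i + \beta_2\,\text{pcturban}_i + u_i$$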
which here happens to be recursive (triangular), because hsngval appears in the equation for rent
but rent does not appear in the equation for hsngval. In general, however, systems of simultaneous
equations are not recursive. Because this system is recursive, we could fit the two equations individually
via OLS if we were willing to assume that u and v were independent. For a more detailed discussion
of triangular systems, see Kmenta (1997, 719–720).
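Under that independence assumption, the two equations of the recursive system could be fit one at a time. A sketch, using the same dataset and the equations exactly as written above:

```stata
use http://www.stata-press.com/data/r13/hsng, clear
* equation for hsngval
regress hsngval faminc i.region
* equation for rent; appropriate only if u and v are independent
regress rent hsngval pcturban
```

If u and v are in fact correlated, the second regression suffers from the endogeneity problem that motivated instrumental variables in the first place.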
Historically, instrumental-variables estimation and systems of simultaneous equations were taught
concurrently, and older textbooks describe instrumental-variables estimation solely in the context of
simultaneous equations. However, in recent decades, the treatment of endogeneity and instrumental-
variables estimation has taken on a much broader scope, while interest in the specification of
complete systems of simultaneous equations has waned. Most recent textbooks, such as Cameron
and Trivedi (2005), Davidson and MacKinnon (1993, 2004), and Wooldridge (2010, 2013), treat
instrumental-variables estimation as an integral part of the modern economists’ toolkit and introduce
it long before shorter discussions on simultaneous equations.
In addition to the 2SLS member of the κ-class estimators, ivregress implements the LIML
estimator. Both theoretical and Monte Carlo exercises indicate that the LIML estimator may yield less
bias and confidence intervals with better coverage rates than the 2SLS estimator. See Poi (2006) and
Stock, Wright, and Yogo (2002) (and the papers cited therein) for Monte Carlo evidence.
Example 2: LIML estimator

Here we refit our model with the LIML estimator:
. ivregress liml rent pcturban (hsngval = faminc i.region)
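Instrumental variables (LIML) regression          Number of obs   =        50
                                                  Wald chi2(2)    =     75.71
                                                  Prob > chi2     =    0.0000
                                                  R-squared       =    0.4901
                                                  Root MSE        =    24.992

------------------------------------------------------------------------------
        rent |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     hsngval |   .0026686   .0004173     6.39   0.000     .0018507    .0034865
    pcturban |  -.1827391   .3571132    -0.51   0.609    -.8826681    .5171899
       _cons |   117.6087   17.22625     6.83   0.000     83.84587    151.3715
------------------------------------------------------------------------------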
These results are qualitatively similar to the 2SLS results, although the coefficient on hsngval is
about 19% higher.
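The value of κ selected by LIML is stored in e(kappa) and can be inspected after estimation; 2SLS corresponds to κ = 1. The sketch below also reproduces the "about 19%" figure from the two reported coefficients on hsngval.

```stata
use http://www.stata-press.com/data/r13/hsng, clear
ivregress liml rent pcturban (hsngval = faminc i.region)
* kappa used by the LIML estimator
display e(kappa)
* ratio of the reported LIML and 2SLS coefficients on hsngval
display .0026686/.0022398
```

The ratio is roughly 1.19, which is the source of the 19% statement.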
GMM estimator
Since the celebrated paper of Hansen (1982), the GMM has been a popular method of estimation
in economics and finance, and it lends itself well to instrumental-variables estimation. The basic
principle is that we have some moment or orthogonality conditions of the form
$$E(\mathbf{z}_i u_i) = \mathbf{0} \tag{3}$$
From (1), we have $u_i = y_i - \mathbf{y}_i\boldsymbol{\beta}_1 - \mathbf{x}_{1i}\boldsymbol{\beta}_2$. What are the elements of the instrument vector $\mathbf{z}_i$? By assumption, $\mathbf{x}_{1i}$ is uncorrelated with $u_i$, as are the excluded exogenous variables $\mathbf{x}_{2i}$, and so we use $\mathbf{z}_i = [\mathbf{x}_{1i}\ \mathbf{x}_{2i}]$. The moment conditions are simply the mathematical representation of the assumption that the instruments are exogenous; that is, the instruments are orthogonal to (uncorrelated with) $u_i$.
If the number of elements in $\mathbf{z}_i$ is just equal to the number of unknown parameters, then we can apply the analogy principle to (3) and solve
$$\frac{1}{N}\sum_i \mathbf{z}_i u_i = \frac{1}{N}\sum_i \mathbf{z}_i\left(y_i - \mathbf{y}_i\boldsymbol{\beta}_1 - \mathbf{x}_{1i}\boldsymbol{\beta}_2\right) = \mathbf{0} \tag{4}$$
This equation is known as the method of moments estimator. Here, where the number of instruments
equals the number of parameters, the method of moments estimator coincides with the 2SLS estimator,
which also coincides with what has historically been called the indirect least-squares estimator (Judge
et al. 1985, 595).
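A just-identified version of the housing model illustrates this coincidence. In the sketch below, faminc is the only excluded instrument for the single endogenous regressor hsngval, so there are three instruments (pcturban, faminc, and the constant) for three parameters. The matrix names are illustrative.

```stata
use http://www.stata-press.com/data/r13/hsng, clear
* one excluded instrument for one endogenous regressor
ivregress 2sls rent pcturban (hsngval = faminc)
matrix b2sls = e(b)
ivregress gmm rent pcturban (hsngval = faminc)
matrix bgmm = e(b)
* expected to be near zero: the weighting matrix does not matter here
display mreldif(b2sls, bgmm)
```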
The “generalized” in GMM addresses the case in which the number of instruments (columns of $\mathbf{z}_i$) exceeds the number of parameters to be estimated. Here there is no unique solution to the population moment conditions defined in (3), so we cannot use (4). Instead, we define the objective function
$$Q(\boldsymbol{\beta}_1, \boldsymbol{\beta}_2) = \left(\frac{1}{N}\sum_i \mathbf{z}_i u_i\right)' \mathbf{W} \left(\frac{1}{N}\sum_i \mathbf{z}_i u_i\right) \tag{5}$$
where $\mathbf{W}$ is a positive-definite matrix with the same number of rows and columns as the number of columns of $\mathbf{z}_i$. $\mathbf{W}$ is known as the weighting matrix, and we specify its structure with the wmatrix() option. The GMM estimator of $(\boldsymbol{\beta}_1, \boldsymbol{\beta}_2)$ minimizes $Q(\boldsymbol{\beta}_1, \boldsymbol{\beta}_2)$; that is, the GMM estimator chooses $\boldsymbol{\beta}_1$ and $\boldsymbol{\beta}_2$ to make the moment conditions as close to zero as possible for a given $\mathbf{W}$. For a more
general GMM estimator, see [R] gmm. gmm does not restrict you to fitting a single linear equation,
though the syntax is more complex.
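Because the model is linear, the minimizer of (5) has a closed form. The following sketch derives it in the stacked notation of Methods and formulas below (X holds all regressors, Z all instruments), assuming W is symmetric and X'ZWZ'X is invertible.

```latex
\begin{aligned}
Q(\boldsymbol{\beta}) &= \frac{1}{N^2}\,(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})'\,\mathbf{Z}\mathbf{W}\mathbf{Z}'\,(\mathbf{y}-\mathbf{X}\boldsymbol{\beta}) \\
\frac{\partial Q}{\partial \boldsymbol{\beta}} &= -\frac{2}{N^2}\,\mathbf{X}'\mathbf{Z}\mathbf{W}\mathbf{Z}'\,(\mathbf{y}-\mathbf{X}\boldsymbol{\beta}) = \mathbf{0} \\
\Longrightarrow\quad \mathbf{b}_{\rm GMM} &= \left(\mathbf{X}'\mathbf{Z}\mathbf{W}\mathbf{Z}'\mathbf{X}\right)^{-1}\mathbf{X}'\mathbf{Z}\mathbf{W}\mathbf{Z}'\mathbf{y}
\end{aligned}
```

This agrees with the expression for the GMM estimator given in Methods and formulas.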
A well-known result is that if we define the matrix $\mathbf{S}_0$ to be the covariance of $\mathbf{z}_i u_i$ and set $\mathbf{W} = \mathbf{S}_0^{-1}$, then we obtain the optimal two-step GMM estimator, where by optimal estimator we mean the one that results in the smallest variance given the moment conditions defined in (3).
Suppose that the errors $u_i$ are heteroskedastic but independent among observations. Then
$$\mathbf{S}_0 = E(\mathbf{z}_i u_i u_i \mathbf{z}_i') = E(u_i^2\, \mathbf{z}_i \mathbf{z}_i')$$
and the sample analogue is
$$\widehat{\mathbf{S}} = \frac{1}{N}\sum_i \widehat{u}_i^{\,2}\, \mathbf{z}_i \mathbf{z}_i' \tag{6}$$
To implement this estimator, we need estimates of the sample residuals $\widehat{u}_i$. ivregress gmm obtains the residuals by estimating $\boldsymbol{\beta}_1$ and $\boldsymbol{\beta}_2$ by 2SLS and then evaluates (6) and sets $\mathbf{W} = \widehat{\mathbf{S}}^{-1}$. Equation (6) is the same as the center term of the “sandwich” robust covariance matrix available from most Stata estimation commands through the vce(robust) option.
Example 3: GMM estimator

Here we refit our model of rents by using the GMM estimator, allowing for heteroskedasticity in $u_i$:
. ivregress gmm rent pcturban (hsngval = faminc i.region), wmatrix(robust)
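Instrumental variables (GMM) regression           Number of obs   =        50
                                                  Wald chi2(2)    =    112.09
                                                  Prob > chi2     =    0.0000
                                                  R-squared       =    0.6616
GMM weight matrix: Robust                         Root MSE        =    20.358

------------------------------------------------------------------------------
             |               Robust
        rent |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     hsngval |   .0014643   .0004473     3.27   0.001     .0005877     .002341
    pcturban |   .7615482   .2895105     2.63   0.009     .1941181    1.328978
       _cons |   112.1227   10.80234    10.38   0.000     90.95052    133.2949
------------------------------------------------------------------------------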
Because we requested that a heteroskedasticity-consistent weighting matrix be used during estimation but did not specify the vce() option, ivregress reported standard errors that are robust to heteroskedasticity. Had we specified vce(unadjusted), we would have obtained standard errors that would be correct only if the weighting matrix $\mathbf{W}$ does in fact converge to $\mathbf{S}_0^{-1}$.
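For comparison, the unadjusted variant of the same fit can be requested as sketched below. The weighting matrix is unchanged, so the point estimates should match those above; only the reported standard errors differ.

```stata
use http://www.stata-press.com/data/r13/hsng, clear
* same robust weighting matrix, but an unadjusted (nonrobust) VCE
ivregress gmm rent pcturban (hsngval = faminc i.region), wmatrix(robust) vce(unadjusted)
```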
Technical note
Many software packages that implement GMM estimation use the same heteroskedasticity-consistent
weighting matrix we used in the previous example to obtain the optimal two-step estimates but do not use
a heteroskedasticity-consistent VCE, even though they may label the standard errors as being “robust”.
To replicate results obtained from other packages, you may have to use the vce(unadjusted) option.
See Methods and formulas below for a discussion of robust covariance matrix estimation in the GMM
framework.
By changing our definition of $\mathbf{S}_0$, we can obtain GMM estimators suitable for use with other types
of data that violate the assumption that the errors are independent and identically distributed. For
example, you may have a dataset that consists of multiple observations for each person in a sample.
The observations that correspond to the same person are likely to be correlated, and the estimation
technique should account for that lack of independence. Say that in your dataset, people are identified
by the variable personid and you type
. ivregress gmm ..., wmatrix(cluster personid)
Here ivregress estimates $\mathbf{S}_0$ as
$$\widehat{\mathbf{S}} = \frac{1}{N}\sum_{c \in C} \mathbf{q}_c \mathbf{q}_c'$$
where $C$ denotes the set of clusters and
$$\mathbf{q}_c = \sum_{i \in c_j} \widehat{u}_i \mathbf{z}_i$$
where $c_j$ denotes the $j$th cluster. This weighting matrix accounts for the within-person correlation among observations, so the GMM estimator that uses this version of $\mathbf{S}_0$ will be more efficient than the estimator that ignores this correlation.
Example 4: GMM estimator with clustering

We have data from the National Longitudinal Survey on young women’s wages as reported in a
series of interviews from 1968 through 1988, and we want to fit a model of wages as a function of
each woman’s age and age squared, job tenure, birth year, and level of education. We believe that
random shocks that affect a woman’s wage also affect her job tenure, so we treat tenure as endogenous.
As additional instruments, we use her union status, number of weeks worked in the past year, and a
dummy indicating whether she lives in a metropolitan area. Because we have several observations for
each woman (corresponding to interviews done over several years), we want to control for clustering
on each person.
. use http://www.stata-press.com/data/r13/nlswork
(National Longitudinal Survey. Young Women 14-26 years of age in 1968)
. ivregress gmm ln_wage age c.age#c.age birth_yr grade
> (tenure = union wks_work msp), wmatrix(cluster idcode)
Instrumental variables (GMM) regression           Number of obs   =     18625
                                                  Wald chi2(5)    =   1807.17
                                                  Prob > chi2     =    0.0000
                                                  R-squared       =         .
GMM weight matrix: Cluster (idcode)               Root MSE        =    .46951

                             (Std. Err. adjusted for 4110 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      tenure |    .099221   .0037764    26.27   0.000     .0918194    .1066227
         age |   .0171146   .0066895     2.56   0.011     .0040034    .0302259
 c.age#c.age |  -.0005191    .000111    -4.68   0.000    -.0007366   -.0003016
    birth_yr |  -.0085994   .0021932    -3.92   0.000     -.012898   -.0043008
       grade |    .071574   .0029938    23.91   0.000     .0657062    .0774417
       _cons |   .8575071   .1616274     5.31   0.000     .5407231    1.174291
------------------------------------------------------------------------------
Instrumented:  tenure
Instruments:   age c.age#c.age birth_yr grade union wks_work msp
Both job tenure and years of schooling have significant positive effects on wages.
Time-series data are often plagued by serial correlation. In these cases, we can construct a weighting
matrix to account for the fact that the error in period t is probably correlated with the errors in periods
t − 1, t − 2, etc. An HAC weighting matrix can be used to account for both serial correlation and
potential heteroskedasticity.

To request an HAC weighting matrix, you specify the wmatrix(hac kernel [# | opt]) option.
kernel specifies which of three kernels to use: bartlett, parzen, or quadraticspectral. kernel
determines the amount of weight given to lagged values when computing the HAC matrix, and #
denotes the maximum number of lags to use. Many texts refer to the bandwidth of the kernel instead
of the number of lags; the bandwidth is equal to the number of lags plus one. If neither opt nor #
is specified, then N − 2 lags are used, where N is the sample size.
If you specify wmatrix(hac kernel opt), then ivregress uses Newey and West’s (1994)
algorithm for automatically selecting the number of lags to use. Although the authors’ Monte Carlo
simulations do show that the procedure may result in size distortions of hypothesis tests, the procedure
is still useful when little other information is available to help choose the number of lags.
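The two forms of the option might be used as sketched below. The data are simulated purely for illustration; the variable names (t, z1, z2, v, u, x, y), the seed, and the sample size are all hypothetical and not taken from any dataset in this entry.

```stata
clear
set seed 12345
set obs 200
generate t = _n
tsset t
* simulated instruments, errors, endogenous regressor, and outcome
generate z1 = rnormal()
generate z2 = rnormal()
generate v  = rnormal()
generate u  = 0.5*v + rnormal()
generate x  = z1 + z2 + v
generate y  = 1 + 2*x + u
* Bartlett kernel with 4 lags (bandwidth 5)
ivregress gmm y (x = z1 z2), wmatrix(hac bartlett 4)
* Bartlett (Newey-West) kernel with automatic lag selection
ivregress gmm y (x = z1 z2), wmatrix(hac nwest opt)
* number of lags chosen by the algorithm
display e(wlagopt)
```

The data must be tsset before an HAC weighting matrix is requested, because the lags are defined by the time variable.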
For more on GMM estimation, see Baum (2006); Baum, Schaffer, and Stillman (2003, 2007);
Cameron and Trivedi (2005); Davidson and MacKinnon (1993, 2004); Hayashi (2000); or
Wooldridge (2010). See Newey and West (1987) and Wang and Wu (2012) for an introduction
to HAC covariance matrix estimation.
Stored results
ivregress stores the following in e():
Scalars
e(N) number of observations
e(mss) model sum of squares
e(df_m) model degrees of freedom
e(rss) residual sum of squares
e(df_r) residual degrees of freedom
e(r2) R-squared
e(r2_a) adjusted R-squared
e(F) F statistic
e(rmse) root mean squared error
e(N_clust) number of clusters
e(chi2) χ²
e(kappa) κ used in LIML estimator
e(J) value of GMM objective function
e(wlagopt) lags used in HAC weight matrix (if Newey–West algorithm used)
e(vcelagopt) lags used in HAC VCE matrix (if Newey–West algorithm used)
e(rank) rank of e(V)
e(iterations) number of GMM iterations (0 if not applicable)
Macros
e(cmd) ivregress
e(cmdline) command as typed
e(depvar) name of dependent variable
e(instd) instrumented variable
e(insts) instruments
e(constant) noconstant or hasconstant if specified
e(wtype) weight type
e(wexp) weight expression
e(title) title in estimation output
e(clustvar) name of cluster variable
e(hac_kernel) HAC kernel
e(hac_lag) HAC lag
e(vce) vcetype specified in vce()
e(vcetype) title used to label Std. Err.
e(estimator) 2sls, liml, or gmm
e(exogr) exogenous regressors
e(wmatrix) wmtype specified in wmatrix()
e(moments) centered if center specified
e(small) small if small-sample statistics
e(depname) depname if depname(depname) specified; otherwise same as e(depvar)
e(properties) b V
e(estat_cmd) program used to implement estat
e(predict) program used to implement predict
e(footnote) program used to implement footnote display
e(marginsok) predictions allowed by margins
e(marginsnotok) predictions disallowed by margins
e(asbalanced) factor variables fvset as asbalanced
e(asobserved) factor variables fvset as asobserved
Matrices
e(b) coefficient vector
e(Cns) constraints matrix
e(W) weight matrix used to compute GMM estimates
e(S) moment covariance matrix used to compute GMM variance–covariance matrix
e(V) variance–covariance matrix of the estimators
e(V_modelbased) model-based variance
Functions
e(sample) marks estimation sample
Methods and formulas

Methods and formulas are presented under the following headings:
Notation
GMM estimator
Notation
Items printed in lowercase and italicized (for example, x) are scalars. Items printed in lowercase
and boldfaced (for example, x) are vectors. Items printed in uppercase and boldfaced (for example,
X) are matrices.
The model is
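$$\mathbf{y} = \mathbf{Y}\boldsymbol{\beta}_1 + \mathbf{X}_1\boldsymbol{\beta}_2 + \mathbf{u} = \mathbf{X}\boldsymbol{\beta} + \mathbf{u}$$
$$\mathbf{Y} = \mathbf{X}_1\boldsymbol{\Pi}_1 + \mathbf{X}_2\boldsymbol{\Pi}_2 + \mathbf{V} = \mathbf{Z}\boldsymbol{\Pi} + \mathbf{V}$$
where $\mathbf{y}$ is an $N \times 1$ vector of the left-hand-side variable; $N$ is the sample size; $\mathbf{Y}$ is an $N \times p$ matrix of $p$ endogenous regressors; $\mathbf{X}_1$ is an $N \times k_1$ matrix of $k_1$ included exogenous regressors; $\mathbf{X}_2$ is an $N \times k_2$ matrix of $k_2$ excluded exogenous variables; $\mathbf{X} = [\mathbf{Y}\ \mathbf{X}_1]$; $\mathbf{Z} = [\mathbf{X}_1\ \mathbf{X}_2]$; $\mathbf{u}$ is an $N \times 1$ vector of errors; $\mathbf{V}$ is an $N \times p$ matrix of errors; $\boldsymbol{\beta} = [\boldsymbol{\beta}_1\ \boldsymbol{\beta}_2]$ is a $k \times 1$ vector of parameters, with $k = p + k_1$; and $\boldsymbol{\Pi}$ is a $(k_1 + k_2) \times p$ matrix of parameters. If a constant term is included in the model, then one column of $\mathbf{X}_1$ contains all ones.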
Let $\mathbf{v}$ be a column vector of weights specified by the user. If no weights are specified, $\mathbf{v} = \mathbf{1}$. Let $\mathbf{w}$ be a column vector of normalized weights. If no weights are specified or if the user specified fweights or iweights, $\mathbf{w} = \mathbf{v}$; otherwise, $\mathbf{w} = \{\mathbf{v}/(\mathbf{1}'\mathbf{v})\}(\mathbf{1}'\mathbf{1})$. Let $\mathbf{D}$ denote the $N \times N$ matrix with $\mathbf{w}$ on the main diagonal and zeros elsewhere. If no weights are specified, $\mathbf{D}$ is the identity matrix.
The weighted number of observations $n$ is defined as $\mathbf{1}'\mathbf{w}$. For iweights, this is truncated to an integer. The sum of the weights is $\mathbf{1}'\mathbf{v}$. Define $c = 1$ if there is a constant in the regression and zero otherwise.
The order condition for identification requires that k2 ≥ p: the number of excluded exogenous
variables must be at least as great as the number of endogenous regressors.
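In the following formulas, if weights are specified, $\mathbf{X}_1'\mathbf{X}_1$, $\mathbf{X}'\mathbf{X}$, $\mathbf{X}'\mathbf{y}$, $\mathbf{y}'\mathbf{y}$, $\mathbf{Z}'\mathbf{Z}$, $\mathbf{Z}'\mathbf{X}$, and $\mathbf{Z}'\mathbf{y}$ are replaced with $\mathbf{X}_1'\mathbf{D}\mathbf{X}_1$, $\mathbf{X}'\mathbf{D}\mathbf{X}$, $\mathbf{X}'\mathbf{D}\mathbf{y}$, $\mathbf{y}'\mathbf{D}\mathbf{y}$, $\mathbf{Z}'\mathbf{D}\mathbf{Z}$, $\mathbf{Z}'\mathbf{D}\mathbf{X}$, and $\mathbf{Z}'\mathbf{D}\mathbf{y}$, respectively. We suppress the $\mathbf{D}$ below to simplify the notation.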

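Define the κ-class estimator of $\boldsymbol{\beta}$ as
$$\mathbf{b} = \left\{\mathbf{X}'(\mathbf{I} - \kappa\mathbf{M}_Z)\mathbf{X}\right\}^{-1}\mathbf{X}'(\mathbf{I} - \kappa\mathbf{M}_Z)\mathbf{y}$$
where $\mathbf{M}_Z = \mathbf{I} - \mathbf{Z}(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}'$. The 2SLS estimator results from setting $\kappa = 1$. The LIML estimator results from selecting $\kappa$ to be the minimum eigenvalue of $(\mathbf{Y}'\mathbf{M}_Z\mathbf{Y})^{-1/2}\,\mathbf{Y}'\mathbf{M}_{X_1}\mathbf{Y}\,(\mathbf{Y}'\mathbf{M}_Z\mathbf{Y})^{-1/2}$, where $\mathbf{M}_{X_1} = \mathbf{I} - \mathbf{X}_1(\mathbf{X}_1'\mathbf{X}_1)^{-1}\mathbf{X}_1'$.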
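With κ = 1, I − κM_Z is the projection matrix Z(Z'Z)^{-1}Z', so b is the OLS coefficient vector from regressing y on the projection of X onto Z. The sketch below reproduces the 2SLS point estimates from Example 1 in two explicit stages; the variable name hsnghat and the matrix name b2sls are illustrative choices.

```stata
use http://www.stata-press.com/data/r13/hsng, clear
ivregress 2sls rent pcturban (hsngval = faminc i.region)
matrix b2sls = e(b)
* first stage: project the endogenous regressor on all instruments
regress hsngval pcturban faminc i.region
predict double hsnghat, xb
* second stage: the coefficients match 2SLS, but the standard errors do not
regress rent hsnghat pcturban
* relative difference in the coefficient on hsngval; expected to be near zero
display reldif(_b[hsnghat], b2sls[1,1])
```

The second-stage standard errors are not valid 2SLS standard errors, because OLS computes the residuals from the fitted values rather than from hsngval itself; ivregress handles this correctly.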
The total sum of squares (TSS) equals $\mathbf{y}'\mathbf{y}$ if there is no intercept and $\mathbf{y}'\mathbf{y} - (\mathbf{1}'\mathbf{y})^2/n$ otherwise. Its degrees of freedom is $n - c$. The error sum of squares (ESS) is defined as $\mathbf{y}'\mathbf{y} - 2\mathbf{b}'\mathbf{X}'\mathbf{y} + \mathbf{b}'\mathbf{X}'\mathbf{X}\mathbf{b}$. The model sum of squares (MSS) equals TSS − ESS. Its degrees of freedom is $k - c$.
The mean squared error, $s^2$, is defined as ESS$/(n - k)$ if small is specified and ESS$/n$ otherwise. The root mean squared error is $s$, its square root.
If $c = 1$ and small is not specified, a Wald statistic, $W$, of the joint significance of the $k - 1$ parameters of $\boldsymbol{\beta}$ except the constant term is calculated; $W \sim \chi^2(k - 1)$. If $c = 1$ and small is specified, then an $F$ statistic is calculated as $F = W/(k - 1)$; $F \sim F(k - 1, n - k)$.
The R-squared is defined as $R^2 = 1 - \text{ESS}/\text{TSS}$.
The adjusted R-squared is $R_a^2 = 1 - (1 - R^2)(n - c)/(n - k)$.
If robust is not specified, then $\text{Var}(\mathbf{b}) = s^2\left\{\mathbf{X}'(\mathbf{I} - \kappa\mathbf{M}_Z)\mathbf{X}\right\}^{-1}$. For a discussion of robust variance estimates in regression and regression with instrumental variables, see Methods and formulas in [R] regress. If small is not specified, then $k = 0$ in the formulas given there.
This command also supports estimation with survey data. For details on VCEs with survey data,
see [SVY] variance estimation.
GMM estimator
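We obtain an initial consistent estimate of $\boldsymbol{\beta}$ by using the 2SLS estimator; see above. Using this estimate of $\boldsymbol{\beta}$, we compute the weighting matrix $\mathbf{W}$ and calculate the GMM estimator
$$\mathbf{b}_{\rm GMM} = \left(\mathbf{X}'\mathbf{Z}\mathbf{W}\mathbf{Z}'\mathbf{X}\right)^{-1}\mathbf{X}'\mathbf{Z}\mathbf{W}\mathbf{Z}'\mathbf{y}$$
The variance of $\mathbf{b}_{\rm GMM}$ is
$$\text{Var}(\mathbf{b}_{\rm GMM}) = n\left(\mathbf{X}'\mathbf{Z}\mathbf{W}\mathbf{Z}'\mathbf{X}\right)^{-1}\mathbf{X}'\mathbf{Z}\mathbf{W}\widehat{\mathbf{S}}\mathbf{W}\mathbf{Z}'\mathbf{X}\left(\mathbf{X}'\mathbf{Z}\mathbf{W}\mathbf{Z}'\mathbf{X}\right)^{-1}$$
$\text{Var}(\mathbf{b}_{\rm GMM})$ is of the sandwich form DMD; see [P] robust. If the user specifies the small option, ivregress implements a small-sample adjustment by multiplying the VCE by $N/(N - k)$.
If vce(unadjusted) is specified, then we set $\widehat{\mathbf{S}} = \mathbf{W}^{-1}$ and the VCE reduces to the “optimal” GMM variance estimator
$$\text{Var}(\mathbf{b}_{\rm GMM}) = n\left(\mathbf{X}'\mathbf{Z}\mathbf{W}\mathbf{Z}'\mathbf{X}\right)^{-1}$$
However, if $\mathbf{W}^{-1}$ is not a good estimator of $E(\mathbf{z}_i u_i u_i \mathbf{z}_i')$, then the optimal GMM estimator is inefficient, and inference based on the optimal variance estimator could be misleading.
$\mathbf{W}$ is calculated using the residuals from the initial 2SLS estimates, whereas $\mathbf{S}$ is estimated using the residuals based on $\mathbf{b}_{\rm GMM}$. The wmatrix() option affects the form of $\mathbf{W}$, whereas the vce() option affects the form of $\mathbf{S}$. Except for different residuals being used, the formulas for $\mathbf{W}^{-1}$ and $\mathbf{S}$ are identical, so we focus on estimating $\mathbf{W}^{-1}$.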
If wmatrix(unadjusted) is specified, then
$$\mathbf{W}^{-1} = \frac{s^2}{n}\sum_i \mathbf{z}_i\mathbf{z}_i'$$
where $s^2 = \sum_i u_i^2/n$. This weight matrix is appropriate if the errors are homoskedastic.
If wmatrix(robust) is specified, then
$$\mathbf{W}^{-1} = \frac{1}{n}\sum_i u_i^2\,\mathbf{z}_i\mathbf{z}_i'$$
which is appropriate if the errors are heteroskedastic.
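The unadjusted and robust forms differ only in how each outer product $\mathbf{z}_i\mathbf{z}_i'$ is weighted. A minimal sketch, with hypothetical function names and instruments stored as a list of rows:

```python
def outer_sum(Z, weights):
    """Return sum_i weights[i] * z_i z_i' for the rows z_i of Z."""
    q = len(Z[0])
    return [[sum(w * z[a] * z[b] for z, w in zip(Z, weights))
             for b in range(q)] for a in range(q)]


def w_inv_unadjusted(Z, u):
    """W^{-1} = (s^2 / n) * sum_i z_i z_i', with s^2 = sum_i u_i^2 / n."""
    n = len(u)
    s2 = sum(e * e for e in u) / n
    return outer_sum(Z, [s2 / n] * n)


def w_inv_robust(Z, u):
    """W^{-1} = (1 / n) * sum_i u_i^2 z_i z_i'."""
    n = len(u)
    return outer_sum(Z, [e * e / n for e in u])


Z = [[1, 0], [1, 1], [1, 2]]   # toy instruments, constant in column 0
u = [1, -1, 2]                 # toy residuals
unadj = w_inv_unadjusted(Z, u)
robust = w_inv_robust(Z, u)
```

The unadjusted form applies one common weight $s^2/n$ to every observation, whereas the robust form lets each observation's squared residual scale its own contribution.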

If wmatrix(cluster clustvar) is specified, then
$$\mathbf{W}^{-1} = \frac{1}{n}\sum_c \mathbf{q}_c\mathbf{q}_c'$$
where $c$ indexes clusters,
$$\mathbf{q}_c = \sum_{i \in c_j} u_i\mathbf{z}_i$$
and $c_j$ denotes the $j$th cluster.
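A sketch of the clustered form, assuming cluster membership is supplied as one label per observation. The function name and the `cluster_ids` argument are hypothetical.

```python
from collections import defaultdict


def w_inv_cluster(Z, u, cluster_ids):
    """W^{-1} = (1 / n) * sum_c q_c q_c', where q_c sums u_i * z_i within cluster c."""
    n, q = len(u), len(Z[0])
    sums = defaultdict(lambda: [0.0] * q)
    for z, e, g in zip(Z, u, cluster_ids):
        sums[g] = [s + e * zj for s, zj in zip(sums[g], z)]
    return [[sum(qc[a] * qc[b] for qc in sums.values()) / n
             for b in range(q)] for a in range(q)]


Z = [[1, 0], [1, 1], [1, 2]]
u = [1, -1, 2]
clustered = w_inv_cluster(Z, u, cluster_ids=["a", "a", "b"])
singletons = w_inv_cluster(Z, u, cluster_ids=[1, 2, 3])
```

When every observation is its own cluster, each $\mathbf{q}_c$ is a single $u_i\mathbf{z}_i$ and the result coincides with the heteroskedasticity-robust form.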

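If wmatrix(hac kernel #) is specified, then
$$\mathbf{W}^{-1} = \frac{1}{n}\sum_i u_i^2\,\mathbf{z}_i\mathbf{z}_i' + \frac{1}{n}\sum_{l=1}^{n-1}\sum_{i=l+1}^{n} K(l, m)\,u_i u_{i-l}\left(\mathbf{z}_i\mathbf{z}_{i-l}' + \mathbf{z}_{i-l}\mathbf{z}_i'\right)$$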
where $m$ = # if # is specified and $m = n - 2$ otherwise. Define $z = l/(m + 1)$. If kernel is nwest,
then
$$K(l, m) = \begin{cases} 1 - z & 0 \le z \le 1 \\ 0 & \text{otherwise} \end{cases}$$
If kernel is gallant, then
$$K(l, m) = \begin{cases} 1 - 6z^2 + 6z^3 & 0 \le z \le 0.5 \\ 2(1 - z)^3 & 0.5 < z \le 1 \\ 0 & \text{otherwise} \end{cases}$$
If kernel is quadraticspectral, then
$$K(l, m) = \begin{cases} 1 & z = 0 \\ 3\left\{\sin(\theta)/\theta - \cos(\theta)\right\}/\theta^2 & \text{otherwise} \end{cases}$$
where $\theta = 6\pi z/5$.
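The three kernels and the HAC weight matrix can be sketched together as follows. The function names are hypothetical, the kernel argument reuses the option names from the text, and the observations are assumed to be already sorted in time order.

```python
import math


def kernel_weight(kernel, l, m):
    """K(l, m) for the nwest, gallant, and quadraticspectral kernels."""
    z = l / (m + 1)
    if kernel == "nwest":
        return 1 - z if 0 <= z <= 1 else 0.0
    if kernel == "gallant":
        if 0 <= z <= 0.5:
            return 1 - 6 * z ** 2 + 6 * z ** 3
        if 0.5 < z <= 1:
            return 2 * (1 - z) ** 3
        return 0.0
    if kernel == "quadraticspectral":
        if z == 0:
            return 1.0
        theta = 6 * math.pi * z / 5
        return 3 * (math.sin(theta) / theta - math.cos(theta)) / theta ** 2
    raise ValueError("unknown kernel: " + kernel)


def w_inv_hac(Z, u, kernel, m=None):
    """HAC estimate of W^{-1}; m defaults to n - 2 as in the text."""
    n, q = len(u), len(Z[0])
    if m is None:
        m = n - 2
    # Lag-0 term: (1 / n) * sum_i u_i^2 z_i z_i'
    W = [[sum(e * e * z[a] * z[b] for z, e in zip(Z, u)) / n
          for b in range(q)] for a in range(q)]
    for l in range(1, n):
        k = kernel_weight(kernel, l, m)
        if k == 0:
            continue
        for i in range(l, n):          # 0-based version of i = l + 1, ..., n
            c = k * u[i] * u[i - l] / n
            for a in range(q):
                for b in range(q):
                    W[a][b] += c * (Z[i][a] * Z[i - l][b] + Z[i - l][a] * Z[i][b])
    return W


Z = [[1, 0], [1, 1], [1, 2]]
u = [1, -1, 2]
hac = w_inv_hac(Z, u, "nwest", m=1)
```

With the nwest kernel and m = 0, every lagged term receives zero weight, so the result reduces to the heteroskedasticity-robust form.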
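If wmatrix(hac kernel opt) is specified, then ivregress uses Newey and West’s (1994) automatic
lag-selection algorithm, which proceeds as follows. Define $\mathbf{h}$ to be a $(k_1 + k_2) \times 1$ vector containing
ones in all rows except for the row corresponding to the constant term (if present); that row contains
a zero. Define
$$f_i = (u_i\mathbf{z}_i)\mathbf{h}$$
$$\widehat{\sigma}_j = \frac{1}{n}\sum_{i=j+1}^{n} f_i f_{i-j} \qquad j = 0, \ldots, m^*$$
$$\widehat{s}^{(q)} = 2\sum_{j=1}^{m^*} \widehat{\sigma}_j\, j^q$$
$$\widehat{s}^{(0)} = \widehat{\sigma}_0 + 2\sum_{j=1}^{m^*} \widehat{\sigma}_j$$
$$\widehat{\gamma} = c_\gamma\left\{\left(\frac{\widehat{s}^{(q)}}{\widehat{s}^{(0)}}\right)^2\right\}^{1/(2q+1)}$$
$$m = \widehat{\gamma}\, n^{1/(2q+1)}$$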
where $q$, $m^*$, and $c_\gamma$ depend on the kernel specified:

Kernel               $q$   $m^*$                                  $c_\gamma$
Bartlett             1     $\mathrm{int}\{20(T/100)^{2/9}\}$      1.1447
Parzen               2     $\mathrm{int}\{20(T/100)^{4/25}\}$     2.6614
Quadratic spectral   2     $\mathrm{int}\{20(T/100)^{2/25}\}$     1.3221

where $\mathrm{int}(x)$ denotes the integer obtained by truncating $x$ toward zero. For the Bartlett and Parzen
kernels, the optimal lag is $\min\{\mathrm{int}(m), m^*\}$. For the quadratic spectral, the optimal lag is $\min\{m, m^*\}$.
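The lag-selection steps above can be sketched as below. The function names and dictionary keys are hypothetical, and $T$ in the table is assumed to equal the number of observations $n$. The Bartlett and Parzen rows correspond to the nwest and gallant kernel options.

```python
KERNEL_PARAMS = {
    # kernel: (q, exponent used in m*, c_gamma), taken from the table above
    "bartlett": (1, 2 / 9, 1.1447),
    "parzen": (2, 4 / 25, 2.6614),
    "quadraticspectral": (2, 2 / 25, 1.3221),
}


def moment_series(Z, u, const_col=None):
    """f_i = (u_i z_i) h, where h is all ones except a zero for the constant."""
    return [e * sum(zj for j, zj in enumerate(z) if j != const_col)
            for z, e in zip(Z, u)]


def optimal_lag(f, kernel):
    """Automatic lag choice from the series f, following the steps above."""
    n = len(f)
    q, expo, c_gamma = KERNEL_PARAMS[kernel]
    m_star = int(20 * (n / 100) ** expo)

    def sigma(j):
        # (1 / n) * sum_{i = j + 1}^{n} f_i f_{i - j}, written with 0-based indices
        return sum(f[i] * f[i - j] for i in range(j, n)) / n

    s_q = 2 * sum(sigma(j) * j ** q for j in range(1, m_star + 1))
    s_0 = sigma(0) + 2 * sum(sigma(j) for j in range(1, m_star + 1))
    gamma = c_gamma * ((s_q / s_0) ** 2) ** (1 / (2 * q + 1))
    m = gamma * n ** (1 / (2 * q + 1))
    if kernel == "quadraticspectral":
        return min(m, m_star)          # not truncated for this kernel
    return min(int(m), m_star)


# A strongly alternating series of length 100 hits the cap m* = 20.
lag = optimal_lag([(-1.0) ** i for i in range(100)], "bartlett")
```

The sketch does not guard against $\widehat{s}^{(0)} = 0$, which would make the ratio undefined.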
If center is specified, when computing weighting matrices ivregress replaces the term $u_i\mathbf{z}_i$ in
the formulas above with $u_i\mathbf{z}_i - \overline{u\mathbf{z}}$, where $\overline{u\mathbf{z}} = \sum_i u_i\mathbf{z}_i/N$.
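Centering amounts to subtracting the sample mean of the moment contributions before they enter the weight-matrix formulas. A minimal sketch with a hypothetical function name:

```python
def centered_moments(Z, u):
    """Return the rows u_i * z_i minus their sample mean across observations."""
    n, q = len(u), len(Z[0])
    g = [[e * zj for zj in z] for z, e in zip(Z, u)]
    mean = [sum(row[j] for row in g) / n for j in range(q)]
    return [[v - mj for v, mj in zip(row, mean)] for row in g]


centered = centered_moments([[1, 0], [1, 2]], [1, 3])
```

Each column of the centered rows sums to zero by construction.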
References
Andrews, D. W. K. 1991. Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica
59: 817–858.
Angrist, J. D., and J.-S. Pischke. 2009. Mostly Harmless Econometrics: An Empiricist’s Companion. Princeton, NJ:
Princeton University Press.
Basmann, R. L. 1957. A generalized classical method of linear estimation of coefficients in a structural equation.
Econometrica 25: 77–83.
Bauldry, S. 2014. miivfind: A command for identifying model-implied instrumental variables for structural equation
models in Stata. Stata Journal 14: 60–75.
Baum, C. F. 2006. An Introduction to Modern Econometrics Using Stata. College Station, TX: Stata Press.
Baum, C. F., M. E. Schaffer, and S. Stillman. 2003. Instrumental variables and GMM: Estimation and testing. Stata
Journal 3: 1–31.
Baum, C. F., M. E. Schaffer, and S. Stillman. 2007. Enhanced routines for instrumental variables/generalized method of moments estimation and testing. Stata
Journal 7: 465–506.
Cameron, A. C., and P. K. Trivedi. 2005. Microeconometrics: Methods and Applications. New York: Cambridge
University Press.
Cameron, A. C., and P. K. Trivedi. 2010. Microeconometrics Using Stata. Rev. ed. College Station, TX: Stata Press.
Davidson, R., and J. G. MacKinnon. 1993. Estimation and Inference in Econometrics. New York: Oxford University
Press.
Davidson, R., and J. G. MacKinnon. 2004. Econometric Theory and Methods. New York: Oxford University Press.
Desbordes, R., and V. Verardi. 2012. A robust instrumental-variables estimator. Stata Journal 12: 169–181.
Finlay, K., and L. M. Magnusson. 2009. Implementing weak-instrument robust tests for a general class of instrumental-
variables models. Stata Journal 9: 398–421.
Gallant, A. R. 1987. Nonlinear Statistical Models. New York: Wiley.
Greene, W. H. 2012. Econometric Analysis. 7th ed. Upper Saddle River, NJ: Prentice Hall.
Hall, A. R. 2005. Generalized Method of Moments. Oxford: Oxford University Press.
Hansen, L. P. 1982. Large sample properties of generalized method of moments estimators. Econometrica 50:
1029–1054.
Hayashi, F. 2000. Econometrics. Princeton, NJ: Princeton University Press.
Judge, G. G., W. E. Griffiths, R. C. Hill, H. Lütkepohl, and T.-C. Lee. 1985. The Theory and Practice of Econometrics.
2nd ed. New York: Wiley.
Kmenta, J. 1997. Elements of Econometrics. 2nd ed. Ann Arbor: University of Michigan Press.
Koopmans, T. C., and W. C. Hood. 1953. Studies in Econometric Method. New York: Wiley.
Koopmans, T. C., and J. Marschak. 1950. Statistical Inference in Dynamic Economic Models. New York: Wiley.
Newey, W. K., and K. D. West. 1987. A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent
covariance matrix. Econometrica 55: 703–708.
Newey, W. K., and K. D. West. 1994. Automatic lag selection in covariance matrix estimation. Review of Economic Studies 61: 631–653.
Nichols, A. 2007. Causal inference with observational data. Stata Journal 7: 507–541.
Palmer, T. M., V. Didelez, R. R. Ramsahai, and N. A. Sheehan. 2011. Nonparametric bounds for the causal effect
in a binary instrumental-variable model. Stata Journal 11: 345–367.
Poi, B. P. 2006. Jackknife instrumental variables estimation in Stata. Stata Journal 6: 364–376.
Stock, J. H., and M. W. Watson. 2011. Introduction to Econometrics. 3rd ed. Boston: Addison–Wesley.
Stock, J. H., J. H. Wright, and M. Yogo. 2002. A survey of weak instruments and weak identification in generalized
method of moments. Journal of Business and Economic Statistics 20: 518–529.
Theil, H. 1953. Repeated Least Squares Applied to Complete Equation Systems. Mimeograph from the Central
Planning Bureau, The Hague.
Wang, Q., and N. Wu. 2012. Long-run covariance and its applications in cointegration regression. Stata Journal 12:
515–542.
Wooldridge, J. M. 2010. Econometric Analysis of Cross Section and Panel Data. 2nd ed. Cambridge, MA: MIT Press.
Wooldridge, J. M. 2013. Introductory Econometrics: A Modern Approach. 5th ed. Mason, OH: South-Western.
Wright, P. G. 1928. The Tariff on Animal and Vegetable Oils. New York: Macmillan.
Also see
[R] ivregress postestimation — Postestimation tools for ivregress
[R] gmm — Generalized method of moments estimation
[R] ivprobit — Probit model with continuous endogenous regressors
[R] ivtobit — Tobit model with continuous endogenous regressors
[R] reg3 — Three-stage estimation for systems of simultaneous equations
[R] regress — Linear regression
[SEM] intro 5 — Tour of models
[SVY] svy estimation — Estimation commands for survey data
[TS] forecast — Econometric model forecasting
[XT] xtivreg — Instrumental variables and two-stage least squares for panel-data models
[U] 20 Estimation and postestimation commands
