ivregress — Single-equation instrumental-variables regression
Syntax
ivregress estimator depvar [varlist1] (varlist2 = varlistiv) [if] [in] [weight] [, options]
estimator Description
2sls two-stage least squares (2SLS)
liml limited-information maximum likelihood (LIML)
gmm generalized method of moments (GMM)
options Description
Model
noconstant suppress constant term
hascons has user-supplied constant
GMM
wmatrix(wmtype) wmtype may be robust, cluster clustvar, hac kernel, or unadjusted
center center moments in weight matrix computation
igmm use iterative instead of two-step GMM estimator
eps(#) specify # for parameter convergence criterion; default is eps(1e-6)
weps(#) specify # for weight matrix convergence criterion; default is weps(1e-6)
optimization_options control the optimization process; seldom used
SE/Robust
vce(vcetype) vcetype may be unadjusted, robust, cluster clustvar, bootstrap,
jackknife, or hac kernel
Reporting
level(#) set confidence level; default is level(95)
first report first-stage regression
small make degrees-of-freedom adjustments and report small-sample
statistics
noheader display only the coefficient table
depname(depname) substitute dependent variable name
eform(string) report exponentiated coefficients and use string to label them
display_options control column formats, row spacing, line width, display of omitted
variables and base and empty cells, and factor-variable labeling
Menu
Statistics > Endogenous covariates > Single-equation instrumental-variables regression
Description
ivregress fits a linear regression of depvar on varlist1 and varlist2, using varlistiv (along with
varlist1) as instruments for varlist2. ivregress supports estimation via two-stage least squares (2SLS),
limited-information maximum likelihood (LIML), and generalized method of moments (GMM).
In the language of instrumental variables, varlist1 and varlistiv are the exogenous variables, and
varlist2 are the endogenous variables.
Options
Model
noconstant; see [R] estimation options.
hascons indicates that a user-defined constant or its equivalent is specified among the independent
variables.
GMM
wmatrix(wmtype) specifies the type of weighting matrix to be used in conjunction with the GMM
estimator.
Specifying wmatrix(robust) requests a weighting matrix that is optimal when the error term is
heteroskedastic. wmatrix(robust) is the default.
Specifying wmatrix(cluster clustvar) requests a weighting matrix that accounts for arbitrary
correlation among observations within clusters identified by clustvar.
Specifying wmatrix(hac kernel #) requests a heteroskedasticity- and autocorrelation-consistent
(HAC) weighting matrix using the specified kernel (see below) with # lags. The bandwidth of a
kernel is equal to # + 1.
Specifying wmatrix(hac kernel opt) requests an HAC weighting matrix using the specified kernel,
and the lag order is selected using Newey and West’s (1994) optimal lag-selection algorithm.
Specifying wmatrix(hac kernel) requests an HAC weighting matrix using the specified kernel and
N − 2 lags, where N is the sample size.
There are three kernels available for HAC weighting matrices, and you may request each one by
using the name used by statisticians or the name perhaps more familiar to economists:
bartlett or nwest requests the Bartlett (Newey–West) kernel;
parzen or gallant requests the Parzen (Gallant 1987) kernel; and
quadraticspectral or andrews requests the quadratic spectral (Andrews 1991) kernel.
Specifying wmatrix(unadjusted) requests a weighting matrix that is suitable when the errors are
homoskedastic. The GMM estimator with this weighting matrix is equivalent to the 2SLS estimator.
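The three HAC kernels listed under wmatrix() can be written as weight functions of z = j/(lags + 1), following the convention above that the bandwidth equals the number of lags plus one. The Python sketch below uses the standard textbook forms of the Bartlett, Parzen, and quadratic spectral kernels (the last with θ = 6πz/5); these formulas are assumptions here rather than a transcription of Stata's internal code, and the function names are hypothetical.

```python
import math


def bartlett(z):
    """Bartlett (Newey-West) kernel: declines linearly, zero for |z| > 1."""
    z = abs(z)
    return 1.0 - z if z <= 1.0 else 0.0


def parzen(z):
    """Parzen (Gallant) kernel: piecewise cubic, zero for |z| > 1."""
    z = abs(z)
    if z <= 0.5:
        return 1.0 - 6.0 * z ** 2 + 6.0 * z ** 3
    if z <= 1.0:
        return 2.0 * (1.0 - z) ** 3
    return 0.0


def quadratic_spectral(z):
    """Quadratic spectral (Andrews) kernel with theta = 6*pi*z/5; not truncated at |z| = 1."""
    if z == 0:
        return 1.0
    theta = 6.0 * math.pi * z / 5.0
    return 25.0 / (12.0 * math.pi ** 2 * z ** 2) * (math.sin(theta) / theta - math.cos(theta))


def lag_weights(kernel, lags):
    """Weights for lags 0, 1, ..., lags when the bandwidth is lags + 1."""
    bandwidth = lags + 1
    return [kernel(j / bandwidth) for j in range(lags + 1)]


print(lag_weights(bartlett, 3))   # [1.0, 0.75, 0.5, 0.25]
```

With the default of N − 2 lags, lag_weights(kernel, N - 2) gives the weight applied to every lag that enters the HAC matrix.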
center requests that the sample moments be centered (demeaned) when computing GMM weight
matrices. By default, centering is not done.
igmm requests that the iterative GMM estimator be used instead of the default two-step GMM estimator.
Convergence is declared when the relative change in the parameter vector from one iteration to
the next is less than eps() or the relative change in the weight matrix is less than weps().
eps(#) specifies the convergence criterion for successive parameter estimates when the iterative GMM
estimator is used. The default is eps(1e-6). Convergence is declared when the relative difference
between successive parameter estimates is less than eps() and the relative difference between
successive estimates of the weighting matrix is less than weps().
weps(#) specifies the convergence criterion for successive estimates of the weighting matrix when
the iterative GMM estimator is used. The default is weps(1e-6). Convergence is declared when
the relative difference between successive parameter estimates is less than eps() and the relative
difference between successive estimates of the weighting matrix is less than weps().
optimization_options: iterate(#), [no]log. iterate() specifies the maximum number of iterations
to perform in conjunction with the iterative GMM estimator. The default is 16,000 or the number
set using set maxiter (see [R] maximize). log and nolog specify whether to show the iteration
log. These options are seldom used.
SE/Robust
vce(vcetype) specifies the type of standard error reported, which includes types that are robust to
some kinds of misspecification (robust), that allow for intragroup correlation (cluster clustvar),
and that use bootstrap or jackknife methods (bootstrap, jackknife); see [R] vce option.
vce(unadjusted), the default for 2sls and liml, specifies that an unadjusted (nonrobust) VCE
matrix be used. The default for gmm is based on the wmtype specified in the wmatrix() option;
see wmatrix(wmtype) above. If wmatrix() is specified with gmm but vce() is not, then vcetype
is set equal to wmtype. To override this behavior and obtain an unadjusted (nonrobust) VCE matrix,
specify vce(unadjusted).
ivregress also allows the following:
vce(hac kernel [# | opt]) specifies that an HAC covariance matrix be used. The syntax used
with vce(hac kernel ...) is identical to that used with wmatrix(hac kernel ...); see
wmatrix(wmtype) above.
Reporting
level(#); see [R] estimation options.
The following options are available with ivregress but are not shown in the dialog box:
perfect requests that ivregress not check for collinearity between the endogenous regressors and
excluded instruments, allowing one to specify “perfect” instruments. This option cannot be used
with the LIML estimator. This option may be required when using ivregress to implement other
estimators.
coeflegend; see [R] estimation options.
$$ y_i = \mathbf{y}_i\boldsymbol{\beta}_1 + \mathbf{x}_{1i}\boldsymbol{\beta}_2 + u_i \qquad (1) $$
$$ \mathbf{y}_i = \mathbf{x}_{1i}\boldsymbol{\Pi}_1 + \mathbf{x}_{2i}\boldsymbol{\Pi}_2 + \mathbf{v}_i \qquad (2) $$
Here $y_i$ is the dependent variable for the $i$th observation, $\mathbf{y}_i$ represents the endogenous regressors
(varlist2 in the syntax diagram), $\mathbf{x}_{1i}$ represents the included exogenous regressors (varlist1 in the syntax
diagram), and $\mathbf{x}_{2i}$ represents the excluded exogenous regressors (varlistiv in the syntax diagram).
$\mathbf{x}_{1i}$ and $\mathbf{x}_{2i}$ are collectively called the instruments. $u_i$ and $\mathbf{v}_i$ are zero-mean error terms, and the
correlations between $u_i$ and the elements of $\mathbf{v}_i$ are presumably nonzero.
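Equations (1) and (2) suggest the textbook reading of 2SLS as two literal stages: fit (2) by OLS to obtain fitted values of the endogenous regressors, then fit (1) by OLS with those fitted values in place of $\mathbf{y}_i$. The pure-Python sketch below, using made-up toy data and hypothetical function names, checks that this gives the same coefficients as the projection formula $(\mathbf{X}'\mathbf{P}_Z\mathbf{X})^{-1}\mathbf{X}'\mathbf{P}_Z\mathbf{y}$ with $\mathbf{P}_Z = \mathbf{Z}(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}'$. It illustrates the algebra only; it is not how ivregress computes its estimates, and naive second-stage OLS standard errors would not be valid.

```python
def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def inverse(a):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(a)
    m = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [x / piv for x in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [row[n:] for row in m]


def ols(X, y):
    """OLS coefficients (X'X)^{-1} X'y; X is a list of rows, y a list."""
    Xt = transpose(X)
    return [r[0] for r in matmul(inverse(matmul(Xt, X)), matmul(Xt, [[v] for v in y]))]


def two_stage(y, Yend, X1, X2):
    """2SLS computed as two literal OLS stages; returns [beta1, beta2]."""
    Z = [a + b for a, b in zip(X1, X2)]              # instruments [X1 X2]
    fitted = []
    for j in range(len(Yend[0])):                    # first stage, one column at a time
        g = ols(Z, [row[j] for row in Yend])
        fitted.append([sum(zv * gv for zv, gv in zip(zrow, g)) for zrow in Z])
    Xhat = [a + b for a, b in zip(transpose(fitted), X1)]   # [Yhat X1]
    return ols(Xhat, y)                              # second stage


def tsls_projection(y, Yend, X1, X2):
    """2SLS as (X'P_Z X)^{-1} X'P_Z y with X = [Y X1] and Z = [X1 X2]."""
    Z = [a + b for a, b in zip(X1, X2)]
    X = [a + b for a, b in zip(Yend, X1)]
    XtZ = matmul(transpose(X), Z)
    B = matmul(XtZ, inverse(matmul(transpose(Z), Z)))       # X'Z (Z'Z)^{-1}
    A = matmul(B, transpose(XtZ))                           # X'P_Z X
    c = matmul(B, matmul(transpose(Z), [[v] for v in y]))   # X'P_Z y
    return [r[0] for r in matmul(inverse(A), c)]


# Toy data: a constant and w are included exogenous; z1 and z2 are excluded instruments
w = [0.5, 1.2, -0.3, 2.0, 1.1, -1.4, 0.8, 0.0]
z1 = [1.0, 0.2, -0.7, 1.5, 0.3, -1.1, 0.9, -0.4]
z2 = [0.3, -0.8, 1.2, 0.1, -0.5, 0.7, -1.3, 0.6]
x = [1.4, 0.1, -0.2, 2.3, 0.5, -1.6, 1.0, -0.1]
y = [2.9, 1.0, 0.2, 4.8, 1.7, -2.1, 2.2, 0.4]
X1 = [[1.0, wi] for wi in w]
X2 = [[a, b] for a, b in zip(z1, z2)]
Yend = [[xi] for xi in x]
print(two_stage(y, Yend, X1, X2))
print(tsls_projection(y, Yend, X1, X2))
```

The two printed coefficient vectors agree because $\mathbf{P}_Z\mathbf{X}_1 = \mathbf{X}_1$, so the second-stage regressors are exactly $\mathbf{P}_Z\mathbf{X}$.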
The rest of the discussion is presented under the following headings:
2SLS and LIML estimators
GMM estimator
Instrumented: hsngval
Instruments: pcturban faminc 2.region 3.region 4.region
As we would expect, states with higher housing values have higher rental rates. The proportion
of a state’s population that is urban does not have a significant effect on rents.
Technical note
In a simultaneous-equations framework, we could write the model we just fit as
which here happens to be recursive (triangular), because hsngval appears in the equation for rent
but rent does not appear in the equation for hsngval. In general, however, systems of simultaneous
equations are not recursive. Because this system is recursive, we could fit the two equations individually
via OLS if we were willing to assume that u and v were independent. For a more detailed discussion
of triangular systems, see Kmenta (1997, 719–720).
Historically, instrumental-variables estimation and systems of simultaneous equations were taught
concurrently, and older textbooks describe instrumental-variables estimation solely in the context of
simultaneous equations. However, in recent decades, the treatment of endogeneity and instrumental-
variables estimation has taken on a much broader scope, while interest in the specification of
complete systems of simultaneous equations has waned. Most recent textbooks, such as Cameron
and Trivedi (2005), Davidson and MacKinnon (1993, 2004), and Wooldridge (2010, 2013), treat
instrumental-variables estimation as an integral part of the modern economists’ toolkit and introduce
it long before shorter discussions on simultaneous equations.
In addition to the 2SLS member of the κ-class estimators, ivregress implements the LIML
estimator. Both theoretical and Monte Carlo exercises indicate that the LIML estimator may yield less
bias and confidence intervals with better coverage rates than the 2SLS estimator. See Poi (2006) and
Stock, Wright, and Yogo (2002) (and the papers cited therein) for Monte Carlo evidence.
Instrumented: hsngval
Instruments: pcturban faminc 2.region 3.region 4.region
These results are qualitatively similar to the 2SLS results, although the coefficient on hsngval is
about 19% higher.
GMM estimator
Since the celebrated paper of Hansen (1982), the GMM has been a popular method of estimation
in economics and finance, and it lends itself well to instrumental-variables estimation. The basic
principle is that we have some moment or orthogonality conditions of the form
$$ E(\mathbf{z}_i u_i) = \mathbf{0} \qquad (3) $$
From (1), we have $u_i = y_i - \mathbf{y}_i\boldsymbol{\beta}_1 - \mathbf{x}_{1i}\boldsymbol{\beta}_2$. What are the elements of the instrument vector $\mathbf{z}_i$? By
assumption, $\mathbf{x}_{1i}$ is uncorrelated with $u_i$, as are the excluded exogenous variables $\mathbf{x}_{2i}$, and so we use
$\mathbf{z}_i = [\mathbf{x}_{1i}\ \mathbf{x}_{2i}]$. The moment conditions are simply the mathematical representation of the assumption
that the instruments are exogenous, that is, that the instruments are orthogonal to (uncorrelated with) $u_i$.
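If the number of elements in $\mathbf{z}_i$ is just equal to the number of unknown parameters, then we can
apply the analogy principle to (3) and solve
$$ \frac{1}{N}\sum_i \mathbf{z}_i u_i = \frac{1}{N}\sum_i \mathbf{z}_i\left(y_i - \mathbf{y}_i\boldsymbol{\beta}_1 - \mathbf{x}_{1i}\boldsymbol{\beta}_2\right) = \mathbf{0} \qquad (4) $$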
Solving this equation yields the method of moments estimator. When, as here, the number of instruments
equals the number of parameters, the method of moments estimator coincides with the 2SLS estimator,
which also coincides with what has historically been called the indirect least-squares estimator (Judge
et al. 1985, 595).
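In the simplest just-identified case (one regressor, one instrument, and no constant, an assumption made here only to keep the arithmetic scalar), solving (4) gives $b = \sum_i z_i y_i / \sum_i z_i x_i$. The Python sketch below, with made-up data and hypothetical function names, checks numerically that this matches 2SLS computed in two stages.

```python
def mm_just_identified(y, x, z):
    """Solve (1/N) sum_i z_i (y_i - x_i b) = 0 for the scalar b."""
    szx = sum(zi * xi for zi, xi in zip(z, x))
    if szx == 0:
        raise ValueError("instrument is orthogonal to the regressor in this sample")
    return sum(zi * yi for zi, yi in zip(z, y)) / szx


def tsls_scalar(y, x, z):
    """2SLS in two stages: regress x on z, then regress y on the fitted values."""
    gamma = sum(zi * xi for zi, xi in zip(z, x)) / sum(zi * zi for zi in z)
    xhat = [gamma * zi for zi in z]
    return sum(xh * yi for xh, yi in zip(xhat, y)) / sum(xh * xh for xh in xhat)


z = [1.0, 2.0, 3.0, 4.0]
x = [2.0, 3.0, 5.0, 7.0]
y = [3.0, 5.0, 9.0, 12.0]
b = mm_just_identified(y, x, z)   # 88/51, about 1.7255
print(b, tsls_scalar(y, x, z))
```

Both functions return 88/51 here; the fitted-value scale factor gamma cancels in the second stage, which is why the two estimators coincide under exact identification.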
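The “generalized” in GMM addresses the case in which the number of instruments (columns of $\mathbf{z}_i$)
exceeds the number of parameters to be estimated. Here (4) has more equations than unknowns, so in general
no parameter values set all the sample moment conditions exactly to zero, and we cannot simply solve (4).
Instead, we define the objective function
$$ Q(\boldsymbol{\beta}_1, \boldsymbol{\beta}_2) = \left(\frac{1}{N}\sum_i \mathbf{z}_i u_i\right)' \mathbf{W} \left(\frac{1}{N}\sum_i \mathbf{z}_i u_i\right) \qquad (5) $$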
where W is a positive-definite matrix with the same number of rows and columns as the number of
columns of zi . W is known as the weighting matrix, and we specify its structure with the wmatrix()
option. The GMM estimator of (β1 , β2 ) minimizes Q(β1 , β2 ); that is, the GMM estimator chooses
β1 and β2 to make the moment conditions as close to zero as possible for a given W. For a more
general GMM estimator, see [R] gmm. gmm does not restrict you to fitting a single linear equation,
though the syntax is more complex.
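To make (5) concrete, the sketch below (pure Python, toy data, hypothetical names) takes one regressor and two instruments without a constant, evaluates $Q(b)$ for a given weighting matrix, and computes the closed-form minimizer $b = (\mathbf{x}'\mathbf{Z}\mathbf{W}\mathbf{Z}'\mathbf{x})^{-1}\mathbf{x}'\mathbf{Z}\mathbf{W}\mathbf{Z}'\mathbf{y}$. The moments are treated as a column vector so that $Q$ is a scalar. It also shows that when $\mathbf{W}$ is proportional to $(\mathbf{Z}'\mathbf{Z})^{-1}$, the scale of $\mathbf{W}$ does not matter and the estimate equals 2SLS, which is the sense in which wmatrix(unadjusted) reproduces 2SLS.

```python
def moments(b, y, x, Z):
    """g(b) = (1/N) sum_i z_i (y_i - x_i b), one entry per instrument."""
    n = len(y)
    return [sum(row[l] * (yi - xi * b) for row, xi, yi in zip(Z, x, y)) / n
            for l in range(len(Z[0]))]


def objective(b, y, x, Z, W):
    """Q(b) = g(b)' W g(b), equation (5) for a single parameter."""
    g = moments(b, y, x, Z)
    L = len(g)
    return sum(g[r] * W[r][s] * g[s] for r in range(L) for s in range(L))


def gmm_scalar(y, x, Z, W):
    """Closed-form minimizer of Q: (x'Z W Z'x)^{-1} x'Z W Z'y."""
    L = len(Z[0])
    a = [sum(row[l] * xi for row, xi in zip(Z, x)) for l in range(L)]  # Z'x
    c = [sum(row[l] * yi for row, yi in zip(Z, y)) for l in range(L)]  # Z'y
    aWa = sum(a[r] * W[r][s] * a[s] for r in range(L) for s in range(L))
    aWc = sum(a[r] * W[r][s] * c[s] for r in range(L) for s in range(L))
    return aWc / aWa


def inv2(m):
    """Inverse of a 2 x 2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


Z = [[1.0, 0.5], [2.0, -1.0], [0.5, 1.5], [1.5, 0.0], [-1.0, 2.0], [0.0, 1.0]]
x = [1.2, 1.9, 1.1, 1.4, 0.3, 0.8]
y = [2.5, 3.6, 2.4, 2.9, 0.9, 1.7]
ZtZ = [[sum(r[i] * r[j] for r in Z) for j in range(2)] for i in range(2)]

b_identity = gmm_scalar(y, x, Z, [[1.0, 0.0], [0.0, 1.0]])
b_unadj = gmm_scalar(y, x, Z, inv2(ZtZ))
b_unadj_scaled = gmm_scalar(y, x, Z, inv2([[4.0 * v for v in row] for row in ZtZ]))

# 2SLS in two stages for comparison: xhat = Z (Z'Z)^{-1} Z'x
Zx = [sum(r[l] * xi for r, xi in zip(Z, x)) for l in range(2)]
gamma = [sum(inv2(ZtZ)[l][k] * Zx[k] for k in range(2)) for l in range(2)]
xhat = [r[0] * gamma[0] + r[1] * gamma[1] for r in Z]
b_2sls = sum(h * yi for h, yi in zip(xhat, y)) / sum(h * h for h in xhat)
print(b_identity, b_unadj, b_unadj_scaled, b_2sls)
```

Different weighting matrices generally give different estimates (compare b_identity with b_unadj), but every choice is the minimizer of its own version of Q.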
A well-known result is that if we define the matrix $\mathbf{S}_0$ to be the covariance of $\mathbf{z}_i u_i$ and set
$\mathbf{W} = \mathbf{S}_0^{-1}$, then we obtain the optimal two-step GMM estimator, where by optimal estimator we mean
the one that results in the smallest variance given the moment conditions defined in (3).
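Suppose that the errors $u_i$ are heteroskedastic but independent among observations. Then
$\mathbf{S}_0 = E(u_i^2\mathbf{z}_i'\mathbf{z}_i)$, and its sample analogue is
$$ \widehat{\mathbf{S}} = \frac{1}{N}\sum_i \widehat{u}_i^2\,\mathbf{z}_i'\mathbf{z}_i \qquad (6) $$
To implement this estimator, we need estimates of the sample residuals $\widehat{u}_i$. ivregress gmm obtains
the residuals by estimating $\boldsymbol{\beta}_1$ and $\boldsymbol{\beta}_2$ by 2SLS and then evaluates (6) and sets $\mathbf{W} = \widehat{\mathbf{S}}^{-1}$. Equation (6)
is the same as the center term of the “sandwich” robust covariance matrix available from most Stata
estimation commands through the vce(robust) option.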
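The two steps of the heteroskedasticity-robust GMM estimator can be sketched in pure Python (toy data; function names are hypothetical, and this is not Stata's code): 2SLS residuals feed the sample moment covariance, its inverse becomes $\mathbf{W}$, and the GMM formula $(\mathbf{X}'\mathbf{Z}\mathbf{W}\mathbf{Z}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Z}\mathbf{W}\mathbf{Z}'\mathbf{y}$ is applied again. Here $\mathbf{z}_i$ is stored as a column, so each term is the outer product $\widehat{u}_i^2\mathbf{z}_i\mathbf{z}_i'$. The returned variance is the “optimal” form $n(\mathbf{X}'\mathbf{Z}\mathbf{W}\mathbf{Z}'\mathbf{X})^{-1}$; by default ivregress instead re-estimates the moment covariance from the GMM residuals.

```python
def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def inverse(a):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(a)
    m = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [x / piv for x in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [row[n:] for row in m]


def gmm(y, X, Z, W):
    """b = (X'Z W Z'X)^{-1} X'Z W Z'y; also returns X'Z W Z'X."""
    XtZ = matmul(transpose(X), Z)
    XtZW = matmul(XtZ, W)
    A = matmul(XtZW, transpose(XtZ))
    c = matmul(XtZW, matmul(transpose(Z), [[v] for v in y]))
    return [r[0] for r in matmul(inverse(A), c)], A


def two_step_gmm(y, X, Z):
    n, L = len(y), len(Z[0])
    # Step 1: 2SLS, i.e. GMM with W proportional to (Z'Z)^{-1}
    b1, _ = gmm(y, X, Z, inverse(matmul(transpose(Z), Z)))
    u = [yi - sum(xv * bv for xv, bv in zip(row, b1)) for yi, row in zip(y, X)]
    # Robust moment covariance: (1/N) sum_i u_i^2 z_i z_i'
    S = [[sum(ui * ui * zr[r] * zr[s] for ui, zr in zip(u, Z)) / n for s in range(L)]
         for r in range(L)]
    W = inverse(S)
    # Step 2: GMM with W = S_hat^{-1}
    b2, A = gmm(y, X, Z, W)
    vce = [[n * v for v in row] for row in inverse(A)]   # n (X'Z W Z'X)^{-1}
    return b2, W, vce


z1 = [1.0, 0.2, -0.7, 1.5, 0.3, -1.1, 0.9, -0.4]
z2 = [0.3, -0.8, 1.2, 0.1, -0.5, 0.7, -1.3, 0.6]
x = [1.4, 0.1, -0.2, 2.3, 0.5, -1.6, 1.0, -0.1]
y = [2.9, 1.0, 0.2, 4.8, 1.7, -2.1, 2.2, 0.4]
Z = [[1.0, a, b] for a, b in zip(z1, z2)]   # instruments: constant, z1, z2
X = [[xi, 1.0] for xi in x]                 # X = [Y X1] with a constant in X1
b_gmm, W, vce = two_step_gmm(y, X, Z)
print(b_gmm, [vce[i][i] ** 0.5 for i in range(2)])
```

At the second-step estimate the weighted moment conditions $\mathbf{X}'\mathbf{Z}\mathbf{W}\mathbf{Z}'(\mathbf{y} - \mathbf{X}\mathbf{b})$ are zero, which is a convenient numerical check of any implementation.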
Instrumented: hsngval
Instruments: pcturban faminc 2.region 3.region 4.region
Technical note
Many software packages that implement GMM estimation use the same heteroskedasticity-consistent
weighting matrix we used in the previous example to obtain the optimal two-step estimates but do not use
a heteroskedasticity-consistent VCE, even though they may label the standard errors as being “robust”.
To replicate results obtained from other packages, you may have to use the vce(unadjusted) option.
See Methods and formulas below for a discussion of robust covariance matrix estimation in the GMM
framework.
By changing our definition of S0 , we can obtain GMM estimators suitable for use with other types
of data that violate the assumption that the errors are independent and identically distributed. For
example, you may have a dataset that consists of multiple observations for each person in a sample.
The observations that correspond to the same person are likely to be correlated, and the estimation
technique should account for that lack of independence. Say that in your dataset, people are identified
by the variable personid and you type
. ivregress gmm ..., wmatrix(cluster personid)
where $c_j$ denotes the $j$th cluster. This weighting matrix accounts for the within-person correlation
among observations, so the GMM estimator that uses this version of S0 will be more efficient than
the estimator that ignores this correlation.
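A cluster weighting matrix is commonly built from per-cluster sums of the moment contributions, $\mathbf{q}_j = \sum_{i\in c_j}\widehat{u}_i\mathbf{z}_i$, with $\widehat{\mathbf{S}} = N^{-1}\sum_j \mathbf{q}_j\mathbf{q}_j'$. That standard form is assumed in the Python sketch below rather than taken from this entry, any small-sample scaling Stata may apply is omitted, and the function names are hypothetical. With one observation per cluster it collapses to the heteroskedasticity-robust form.

```python
from collections import defaultdict


def cluster_S(resid, Z, cluster_ids):
    """S_hat = (1/N) sum_j q_j q_j', where q_j sums u_i z_i over cluster j."""
    n, L = len(resid), len(Z[0])
    q = defaultdict(lambda: [0.0] * L)
    for u, z, c in zip(resid, Z, cluster_ids):
        q[c] = [qv + u * zv for qv, zv in zip(q[c], z)]
    S = [[0.0] * L for _ in range(L)]
    for qj in q.values():
        for r in range(L):
            for s in range(L):
                S[r][s] += qj[r] * qj[s] / n
    return S


def robust_S(resid, Z):
    """S_hat = (1/N) sum_i u_i^2 z_i z_i' (independent observations)."""
    n, L = len(resid), len(Z[0])
    return [[sum(u * u * z[r] * z[s] for u, z in zip(resid, Z)) / n for s in range(L)]
            for r in range(L)]


resid = [0.4, -0.2, 0.1, -0.5, 0.3]
Z = [[1.0, 0.5], [1.0, -1.0], [1.0, 2.0], [1.0, 0.0], [1.0, 1.5]]
print(cluster_S(resid, Z, list(range(5))))   # singleton clusters: same as robust_S(resid, Z)
print(cluster_S([1.0, -1.0, 2.0], [[1.0]] * 3, ["a", "a", "b"]))   # cluster "a" sums to 0
```

In the second call, the two errors in cluster "a" cancel, so only cluster "b" contributes and the result is 4/3; the robust form would give 2 because it ignores the negative within-cluster correlation.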
. use http://www.stata-press.com/data/r13/nlswork
(National Longitudinal Survey. Young Women 14-26 years of age in 1968)
. ivregress gmm ln_wage age c.age#c.age birth_yr grade
> (tenure = union wks_work msp), wmatrix(cluster idcode)
Instrumental variables (GMM) regression Number of obs = 18625
Wald chi2(5) = 1807.17
Prob > chi2 = 0.0000
R-squared = .
GMM weight matrix: Cluster (idcode) Root MSE = .46951
(Std. Err. adjusted for 4110 clusters in idcode)
Robust
ln_wage Coef. Std. Err. z P>|z| [95% Conf. Interval]
Instrumented: tenure
Instruments: age c.age#c.age birth_yr grade union wks_work msp
Both job tenure and years of schooling have significant positive effects on wages.
Time-series data are often plagued by serial correlation. In these cases, we can construct a weighting
matrix to account for the fact that the error in period t is probably correlated with the errors in periods
t − 1, t − 2, etc. An HAC weighting matrix can be used to account for both serial correlation and
potential heteroskedasticity.
To request an HAC weighting matrix, you specify the wmatrix(hac kernel [# | opt]) option.
kernel specifies which of three kernels to use: bartlett, parzen, or quadraticspectral. kernel
determines the amount of weight given to lagged values when computing the HAC matrix, and #
denotes the maximum number of lags to use. Many texts refer to the bandwidth of the kernel instead
of the number of lags; the bandwidth is equal to the number of lags plus one. If neither opt nor #
is specified, then N − 2 lags are used, where N is the sample size.
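A kernel-weighted HAC estimate adds weighted autocovariances of the moment contributions $\widehat{u}_i\mathbf{z}_i$ to the lag-0 term. The Python sketch below assumes the usual form $\widehat{\mathbf{S}} = \widehat{\boldsymbol\Gamma}_0 + \sum_{j=1}^{m} K\{j/(m+1)\}(\widehat{\boldsymbol\Gamma}_j + \widehat{\boldsymbol\Gamma}_j')$ with $\widehat{\boldsymbol\Gamma}_j = N^{-1}\sum_{i>j}\widehat{u}_i\mathbf{z}_i\,\widehat{u}_{i-j}\mathbf{z}_{i-j}'$, uses the Bartlett kernel, sets the bandwidth to lags + 1, and falls back to N − 2 lags when none is given. It is an illustration under those assumptions, not Stata's implementation, and the names are hypothetical.

```python
def bartlett(z):
    """Bartlett kernel weight."""
    z = abs(z)
    return 1.0 - z if z <= 1.0 else 0.0


def hac_S(resid, Z, lags=None, kernel=bartlett):
    """Kernel-weighted HAC estimate of the covariance of u_i z_i (rows in time order)."""
    n, L = len(resid), len(Z[0])
    if lags is None:
        lags = n - 2                       # default when neither # nor opt is given
    g = [[u * zv for zv in z] for u, z in zip(resid, Z)]

    def gamma(j):
        # (1/N) sum_{i=j+1}^{N} g_i g_{i-j}'  (0-based: i = j, ..., n-1)
        return [[sum(g[i][r] * g[i - j][s] for i in range(j, n)) / n for s in range(L)]
                for r in range(L)]

    S = gamma(0)                           # lag 0: heteroskedasticity-robust term
    for j in range(1, lags + 1):
        w = kernel(j / (lags + 1))         # bandwidth = lags + 1
        G = gamma(j)
        for r in range(L):
            for s in range(L):
                S[r][s] += w * (G[r][s] + G[s][r])
    return S


# Constant moment contributions: Gamma_0 = 1, Gamma_1 = 3/4, Bartlett weight 1/2 at lag 1
print(hac_S([1.0] * 4, [[1.0]] * 4, lags=1))   # [[1.75]]
```

With lags=0 the sum is empty and the result is the heteroskedasticity-robust matrix; each added lag lets the estimate absorb serial correlation up to that order, down-weighted by the kernel.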
If you specify wmatrix(hac kernel opt), then ivregress uses Newey and West’s (1994)
algorithm for automatically selecting the number of lags to use. Although the authors’ Monte Carlo
simulations do show that the procedure may result in size distortions of hypothesis tests, the procedure
is still useful when little other information is available to help choose the number of lags.
For more on GMM estimation, see Baum (2006); Baum, Schaffer, and Stillman (2003, 2007);
Cameron and Trivedi (2005); Davidson and MacKinnon (1993, 2004); Hayashi (2000); or
Wooldridge (2010). See Newey and West (1987) and Wang and Wu (2012) for an introduction
to HAC covariance matrix estimation.
Stored results
ivregress stores the following in e():
Scalars
e(N) number of observations
e(mss) model sum of squares
e(df_m) model degrees of freedom
e(rss) residual sum of squares
e(df_r) residual degrees of freedom
e(r2) R-squared
e(r2_a) adjusted R-squared
e(F) F statistic
e(rmse) root mean squared error
e(N_clust) number of clusters
e(chi2) χ²
e(kappa) κ used in LIML estimator
e(J) value of GMM objective function
e(wlagopt) lags used in HAC weight matrix (if Newey–West algorithm used)
e(vcelagopt) lags used in HAC VCE matrix (if Newey–West algorithm used)
e(rank) rank of e(V)
e(iterations) number of GMM iterations (0 if not applicable)
Macros
e(cmd) ivregress
e(cmdline) command as typed
e(depvar) name of dependent variable
e(instd) instrumented variable
e(insts) instruments
e(constant) noconstant or hasconstant if specified
e(wtype) weight type
e(wexp) weight expression
e(title) title in estimation output
e(clustvar) name of cluster variable
e(hac_kernel) HAC kernel
e(hac_lag) HAC lag
e(vce) vcetype specified in vce()
e(vcetype) title used to label Std. Err.
e(estimator) 2sls, liml, or gmm
e(exogr) exogenous regressors
e(wmatrix) wmtype specified in wmatrix()
e(moments) centered if center specified
e(small) small if small-sample statistics
e(depname) depname if depname(depname) specified; otherwise same as e(depvar)
e(properties) b V
e(estat_cmd) program used to implement estat
e(predict) program used to implement predict
e(footnote) program used to implement footnote display
e(marginsok) predictions allowed by margins
e(marginsnotok) predictions disallowed by margins
e(asbalanced) factor variables fvset as asbalanced
e(asobserved) factor variables fvset as asobserved
Matrices
e(b) coefficient vector
e(Cns) constraints matrix
e(W) weight matrix used to compute GMM estimates
e(S) moment covariance matrix used to compute GMM variance–covariance matrix
e(V) variance–covariance matrix of the estimators
e(V_modelbased) model-based variance
Functions
e(sample) marks estimation sample
Methods and formulas
Notation
Items printed in lowercase and italicized (for example, $x$) are scalars. Items printed in lowercase
and boldfaced (for example, $\mathbf{x}$) are vectors. Items printed in uppercase and boldfaced (for example,
$\mathbf{X}$) are matrices.
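The model is
$$ \mathbf{y} = \mathbf{Y}\boldsymbol{\beta}_1 + \mathbf{X}_1\boldsymbol{\beta}_2 + \mathbf{u} = \mathbf{X}\boldsymbol{\beta} + \mathbf{u} $$
$$ \mathbf{Y} = \mathbf{X}_1\boldsymbol{\Pi}_1 + \mathbf{X}_2\boldsymbol{\Pi}_2 + \mathbf{V} = \mathbf{Z}\boldsymbol{\Pi} + \mathbf{V} $$
where $\mathbf{y}$ is an $N\times 1$ vector of the left-hand-side variable; $N$ is the sample size; $\mathbf{Y}$ is an $N\times p$
matrix of $p$ endogenous regressors; $\mathbf{X}_1$ is an $N\times k_1$ matrix of $k_1$ included exogenous regressors;
$\mathbf{X}_2$ is an $N\times k_2$ matrix of $k_2$ excluded exogenous variables; $\mathbf{X} = [\mathbf{Y}\ \mathbf{X}_1]$; $\mathbf{Z} = [\mathbf{X}_1\ \mathbf{X}_2]$; $\mathbf{u}$ is an
$N\times 1$ vector of errors; $\mathbf{V}$ is an $N\times p$ matrix of errors; $\boldsymbol{\beta} = [\boldsymbol{\beta}_1'\ \boldsymbol{\beta}_2']'$ is a $k\times 1$ vector
of parameters with $k = p + k_1$; and $\boldsymbol{\Pi}$ is a $(k_1 + k_2)\times p$ matrix of parameters. If a constant term is included in the
model, then one column of $\mathbf{X}_1$ contains all ones.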
Let $\mathbf{v}$ be a column vector of weights specified by the user. If no weights are specified, $\mathbf{v} = \mathbf{1}$.
Let $\mathbf{w}$ be a column vector of normalized weights. If no weights are specified or if the user specified
fweights or iweights, $\mathbf{w} = \mathbf{v}$; otherwise, $\mathbf{w} = \{\mathbf{v}/(\mathbf{1}'\mathbf{v})\}(\mathbf{1}'\mathbf{1})$. Let $\mathbf{D}$ denote the $N\times N$ matrix
with $\mathbf{w}$ on the main diagonal and zeros elsewhere. If no weights are specified, $\mathbf{D}$ is the identity
matrix.
The weighted number of observations $n$ is defined as $\mathbf{1}'\mathbf{w}$. For iweights, this is truncated to an
integer. The sum of the weights is $\mathbf{1}'\mathbf{v}$. Define $c = 1$ if there is a constant in the regression and $c = 0$
otherwise.
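The weight normalization just described can be sketched as follows. The weight-type strings are hypothetical labels chosen to mirror Stata's fweight, iweight, aweight, and pweight; any type other than fweight or iweight is rescaled so the weights sum to $N$, as in $\mathbf{w} = \{\mathbf{v}/(\mathbf{1}'\mathbf{v})\}(\mathbf{1}'\mathbf{1})$.

```python
def normalized_weights(v, wtype=None):
    """Return w: all ones if unweighted, v for fweights/iweights, else v rescaled to sum to N."""
    N = len(v)
    if wtype is None:
        return [1.0] * N                   # no weights: v = 1, so w = 1
    if wtype in ("fweight", "iweight"):
        return [float(x) for x in v]
    total = sum(v)
    return [x / total * N for x in v]      # w = {v / (1'v)} (1'1)


def weighted_n(w, wtype=None):
    """n = 1'w, truncated to an integer for iweights."""
    n = sum(w)
    return float(int(n)) if wtype == "iweight" else n


w = normalized_weights([1, 2, 3, 2], "aweight")
print(w, weighted_n(w, "aweight"))        # [0.5, 1.0, 1.5, 1.0] 4.0
```

Because rescaled weights always sum to $N$, $n$ equals the number of observations for those weight types, whereas with fweights $n$ is the sum of the frequencies.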
The order condition for identification requires that k2 ≥ p: the number of excluded exogenous
variables must be at least as great as the number of endogenous regressors.
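In the following formulas, if weights are specified, $\mathbf{X}_1'\mathbf{X}_1$, $\mathbf{X}'\mathbf{X}$, $\mathbf{X}'\mathbf{y}$, $\mathbf{y}'\mathbf{y}$, $\mathbf{Z}'\mathbf{Z}$, $\mathbf{Z}'\mathbf{X}$, and $\mathbf{Z}'\mathbf{y}$
are replaced with $\mathbf{X}_1'\mathbf{D}\mathbf{X}_1$, $\mathbf{X}'\mathbf{D}\mathbf{X}$, $\mathbf{X}'\mathbf{D}\mathbf{y}$, $\mathbf{y}'\mathbf{D}\mathbf{y}$, $\mathbf{Z}'\mathbf{D}\mathbf{Z}$, $\mathbf{Z}'\mathbf{D}\mathbf{X}$, and $\mathbf{Z}'\mathbf{D}\mathbf{y}$, respectively. We
suppress the $\mathbf{D}$ below to simplify the notation.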
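The 2SLS and LIML estimators are members of the κ-class of estimators,
$$ \mathbf{b} = \{\mathbf{X}'(\mathbf{I} - \kappa\mathbf{M}_Z)\mathbf{X}\}^{-1}\mathbf{X}'(\mathbf{I} - \kappa\mathbf{M}_Z)\mathbf{y} $$
where $\mathbf{M}_Z = \mathbf{I} - \mathbf{Z}(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}'$. The 2SLS estimator results from setting $\kappa = 1$. The LIML estimator
results from selecting $\kappa$ to be the minimum eigenvalue of $(\mathbf{Y}^{*\prime}\mathbf{M}_Z\mathbf{Y}^*)^{-1/2}\mathbf{Y}^{*\prime}\mathbf{M}_{X_1}\mathbf{Y}^*(\mathbf{Y}^{*\prime}\mathbf{M}_Z\mathbf{Y}^*)^{-1/2}$,
where $\mathbf{Y}^* = [\mathbf{y}\ \mathbf{Y}]$ stacks the dependent variable with the endogenous regressors and $\mathbf{M}_{X_1} = \mathbf{I} - \mathbf{X}_1(\mathbf{X}_1'\mathbf{X}_1)^{-1}\mathbf{X}_1'$.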
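The κ-class formula and the LIML choice of κ can be checked numerically. The pure-Python sketch below uses toy data with one endogenous regressor and a constant as the only included exogenous variable; the names are hypothetical. It forms the LIML eigenvalue problem from the dependent variable together with the endogenous regressor, the standard textbook construction, and finds the smallest root of $\det(\mathbf{A} - \kappa\mathbf{B}) = 0$ for the resulting 2 × 2 matrices. With exactly one excluded instrument the model is just identified, so the smallest root is 1 and LIML coincides with 2SLS; with two instruments the root is at least 1.

```python
import math


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def inverse(a):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(a)
    m = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [x / piv for x in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [row[n:] for row in m]


def residualize(A, B):
    """M_B A = A - B (B'B)^{-1} B'A, with A and B given as lists of rows."""
    Bt = transpose(B)
    fit = matmul(B, matmul(inverse(matmul(Bt, B)), matmul(Bt, A)))
    return [[a - f for a, f in zip(ra, rf)] for ra, rf in zip(A, fit)]


def kclass(y, Yend, X1, X2, kappa):
    """b = {X'(I - kappa M_Z) X}^{-1} X'(I - kappa M_Z) y with X = [Y X1], Z = [X1 X2]."""
    Z = [a + b for a, b in zip(X1, X2)]
    X = [a + b for a, b in zip(Yend, X1)]
    Xt, yc = transpose(X), [[v] for v in y]
    XX, XMX = matmul(Xt, X), matmul(Xt, residualize(X, Z))
    Xy, XMy = matmul(Xt, yc), matmul(Xt, residualize(yc, Z))
    A = [[p - kappa * q for p, q in zip(r1, r2)] for r1, r2 in zip(XX, XMX)]
    c = [[p[0] - kappa * q[0]] for p, q in zip(Xy, XMy)]
    return [r[0] for r in matmul(inverse(A), c)]


def liml_kappa(y, Yend, X1, X2):
    """Smallest root of det(A - k B) = 0; A = Y*'M_X1 Y*, B = Y*'M_Z Y*, Y* = [y Y], p = 1."""
    Z = [a + b for a, b in zip(X1, X2)]
    Ys = [[yi] + list(row) for yi, row in zip(y, Yend)]
    A = matmul(transpose(Ys), residualize(Ys, X1))
    B = matmul(transpose(Ys), residualize(Ys, Z))
    # det(A - k B) = qa k^2 + qb k + qc for 2 x 2 matrices
    qa = B[0][0] * B[1][1] - B[0][1] * B[1][0]
    qb = -(A[0][0] * B[1][1] + A[1][1] * B[0][0] - A[0][1] * B[1][0] - A[1][0] * B[0][1])
    qc = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    disc = math.sqrt(max(qb * qb - 4.0 * qa * qc, 0.0))
    return 2.0 * qc / (-qb + disc)         # numerically stable form of the smaller root


z1 = [1.0, 0.2, -0.7, 1.5, 0.3, -1.1, 0.9, -0.4]
z2 = [0.3, -0.8, 1.2, 0.1, -0.5, 0.7, -1.3, 0.6]
x = [1.4, 0.1, -0.2, 2.3, 0.5, -1.6, 1.0, -0.1]
y = [2.9, 1.0, 0.2, 4.8, 1.7, -2.1, 2.2, 0.4]
X1 = [[1.0] for _ in y]
Yend = [[xi] for xi in x]
X2_just = [[a] for a in z1]
X2_over = [[a, b] for a, b in zip(z1, z2)]
k_just = liml_kappa(y, Yend, X1, X2_just)
k_over = liml_kappa(y, Yend, X1, X2_over)
print(k_just, k_over)
print(kclass(y, Yend, X1, X2_over, 1.0), kclass(y, Yend, X1, X2_over, k_over))
```

Setting kappa to 0 in kclass gives OLS, to 1 gives 2SLS, and to the LIML root gives LIML, which makes the κ-class a convenient single code path for all three.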
The total sum of squares (TSS) equals $\mathbf{y}'\mathbf{y}$ if there is no intercept and $\mathbf{y}'\mathbf{y} - (\mathbf{1}'\mathbf{y})^2/n$ otherwise,
with $n - c$ degrees of freedom. The error sum of squares (ESS) is defined as $\mathbf{y}'\mathbf{y} - 2\mathbf{b}'\mathbf{X}'\mathbf{y} + \mathbf{b}'\mathbf{X}'\mathbf{X}\mathbf{b}$.
The model sum of squares (MSS) equals TSS − ESS, with $k - c$ degrees of freedom.
The mean squared error, $s^2$, is defined as ESS$/(n - k)$ if small is specified and ESS$/n$ otherwise.
The root mean squared error is $s$, its square root.
If $c = 1$ and small is not specified, a Wald statistic, $W$, of the joint significance of the $k - 1$
parameters of $\boldsymbol{\beta}$ except the constant term is calculated; $W \sim \chi^2(k - 1)$. If $c = 1$ and small is
specified, then an $F$ statistic is calculated as $F = W/(k - 1)$; $F \sim F(k - 1, n - k)$.
The R-squared is defined as $R^2 = 1 - \mathrm{ESS}/\mathrm{TSS}$.
The adjusted R-squared is $R_a^2 = 1 - (1 - R^2)(n - c)/(n - k)$.
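The sums of squares and fit statistics above translate directly into code. The sketch below ignores weights (so $n = N$), uses hypothetical names, and computes ESS from the expanded quadratic form used in the text, which equals the sum of squared residuals $\sum_i(y_i - \mathbf{x}_i\mathbf{b})^2$. Because IV coefficients are not the least-squares fit, ESS can exceed TSS and $R^2$ can be negative, as the second call shows with a deliberately poor coefficient vector.

```python
def fit_stats(y, X, b, has_constant=True, small=False):
    """TSS, ESS, MSS, s^2, root MSE, R^2 and adjusted R^2 per the formulas above (no weights)."""
    n, k = len(y), len(b)
    c = 1 if has_constant else 0
    yy = sum(v * v for v in y)
    Xy = [sum(row[j] * v for row, v in zip(X, y)) for j in range(k)]
    XX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    tss = yy - sum(y) ** 2 / n if c else yy
    ess = (yy - 2 * sum(bj * xy for bj, xy in zip(b, Xy))
           + sum(b[i] * XX[i][j] * b[j] for i in range(k) for j in range(k)))
    mss = tss - ess
    s2 = ess / (n - k) if small else ess / n
    r2 = 1 - ess / tss
    r2_a = 1 - (1 - r2) * (n - c) / (n - k)
    return {"tss": tss, "ess": ess, "mss": mss, "s2": s2, "rmse": s2 ** 0.5,
            "r2": r2, "r2_a": r2_a}


y = [1.0, 2.0, 3.0, 4.0]
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]   # constant and one regressor
print(fit_stats(y, X, [0.0, 1.0]))   # perfect fit: ess 0, r2 1
print(fit_stats(y, X, [0.0, 0.0]))   # poor fit: r2 = 1 - 30/5 = -5
```

This is also why e(r2) is sometimes reported as missing or negative for IV models: it is a descriptive quantity, not a goodness-of-fit test.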
If robust is not specified, then $\mathrm{Var}(\mathbf{b}) = s^2\{\mathbf{X}'(\mathbf{I} - \kappa\mathbf{M}_Z)\mathbf{X}\}^{-1}$. For a discussion of robust
variance estimates in regression and regression with instrumental variables, see Methods and formulas
in [R] regress. If small is not specified, then $k = 0$ in the formulas given there.
This command also supports estimation with survey data. For details on VCEs with survey data,
see [SVY] variance estimation.
GMM estimator
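We obtain an initial consistent estimate of $\boldsymbol{\beta}$ by using the 2SLS estimator; see above. Using this
estimate of $\boldsymbol{\beta}$, we compute the weighting matrix $\mathbf{W}$ and calculate the GMM estimator
$$ \mathbf{b}_{\mathrm{GMM}} = (\mathbf{X}'\mathbf{Z}\mathbf{W}\mathbf{Z}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Z}\mathbf{W}\mathbf{Z}'\mathbf{y} $$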
$\mathrm{Var}(\mathbf{b}_{\mathrm{GMM}})$ is of the sandwich form $\mathbf{DMD}$; see [P] robust. If the user specifies the small option,
ivregress implements a small-sample adjustment by multiplying the VCE by $N/(N - k)$.
If vce(unadjusted) is specified, then we set $\widehat{\mathbf{S}} = \mathbf{W}^{-1}$, and the VCE reduces to the “optimal”
GMM variance estimator
$$ \mathrm{Var}(\mathbf{b}_{\mathrm{GMM}}) = n\left(\mathbf{X}'\mathbf{Z}\mathbf{W}\mathbf{Z}'\mathbf{X}\right)^{-1} $$
However, if $\mathbf{W}^{-1}$ is not a good estimator of $E(\mathbf{z}_i u_i u_i \mathbf{z}_i')$, then the optimal GMM estimator is
inefficient, and inference based on the optimal variance estimator could be misleading.
$\mathbf{W}$ is calculated using the residuals from the initial 2SLS estimates, whereas $\mathbf{S}$ is estimated using
the residuals based on $\mathbf{b}_{\mathrm{GMM}}$. The wmatrix() option affects the form of $\mathbf{W}$, whereas the vce()
option affects the form of $\mathbf{S}$. Except for the different residuals being used, the formulas for $\mathbf{W}^{-1}$ and
$\mathbf{S}$ are identical, so we focus on estimating $\mathbf{W}^{-1}$.
If wmatrix(unadjusted) is specified, then
$$ \mathbf{W}^{-1} = \frac{s^2}{n}\sum_i \mathbf{z}_i\mathbf{z}_i' $$
where $s^2 = \sum_i u_i^2/n$. This weight matrix is appropriate if the errors are homoskedastic.
If wmatrix(robust) is specified, then
$$ \mathbf{W}^{-1} = \frac{1}{n}\sum_i u_i^2\,\mathbf{z}_i\mathbf{z}_i' $$
where θ = 6πz/5.
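If wmatrix(hac kernel opt) is specified, then ivregress uses Newey and West’s (1994) automatic
lag-selection algorithm, which proceeds as follows. Define $\mathbf{h}$ to be a $(k_1 + k_2)\times 1$ vector containing
ones in all rows except for the row corresponding to the constant term (if present); that row contains
a zero. Define
$$ f_i = (u_i\mathbf{z}_i)\mathbf{h} $$
$$ \widehat{\sigma}_j = \frac{1}{n}\sum_{i=j+1}^{n} f_i f_{i-j}, \qquad j = 0, \ldots, m^* $$
$$ \widehat{s}^{(q)} = 2\sum_{j=1}^{m^*} \widehat{\sigma}_j\, j^q $$
$$ \widehat{s}^{(0)} = \widehat{\sigma}_0 + 2\sum_{j=1}^{m^*} \widehat{\sigma}_j $$
$$ \widehat{\gamma} = c_\gamma\left\{\left(\frac{\widehat{s}^{(q)}}{\widehat{s}^{(0)}}\right)^2\right\}^{1/(2q+1)} $$
$$ \widehat{m} = \widehat{\gamma}\, n^{1/(2q+1)} $$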
Kernel               $q$   $m^*$                              $c_\gamma$
Bartlett             1     $\mathrm{int}\{20(T/100)^{2/9}\}$    1.1447
Parzen               2     $\mathrm{int}\{20(T/100)^{4/25}\}$   2.6614
Quadratic spectral   2     $\mathrm{int}\{20(T/100)^{2/25}\}$   1.3221
where $\mathrm{int}(x)$ denotes the integer obtained by truncating $x$ toward zero. For the Bartlett and Parzen
kernels, the optimal lag is $\min\{\mathrm{int}(\widehat{m}), m^*\}$. For the quadratic spectral kernel, the optimal lag is $\min\{\widehat{m}, m^*\}$.
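The lag-selection steps and the table above can be sketched in Python as follows. The input is the scalar series $f_i = (u_i\mathbf{z}_i)\mathbf{h}$, with the constant's entry of $\mathbf{h}$ set to zero; $m^*$ is computed from the table with $T$ taken to be the number of observations, which is an assumption, and the names are hypothetical. The sketch follows the formulas as written here and is not a substitute for ivregress's own computation.

```python
# kernel: (q, c_gamma, exponent in m* = int{20 (T/100)^exponent}), from the table above
NW_TABLE = {
    "bartlett": (1, 1.1447, 2 / 9),
    "parzen": (2, 2.6614, 4 / 25),
    "quadraticspectral": (2, 1.3221, 2 / 25),
}


def f_series(resid, Z, const_col=None):
    """f_i = (u_i z_i) h, where h has ones except a zero in the constant's position."""
    return [u * sum(zv for l, zv in enumerate(row) if l != const_col)
            for u, row in zip(resid, Z)]


def m_star(kernel, T):
    _, _, expo = NW_TABLE[kernel]
    return int(20 * (T / 100) ** expo)     # int() truncates toward zero


def nw_optimal_lag(f, kernel):
    n = len(f)
    q, c_gamma, _ = NW_TABLE[kernel]
    mstar = m_star(kernel, n)
    sigma = [sum(f[i] * f[i - j] for i in range(j, n)) / n for j in range(mstar + 1)]
    s_q = 2 * sum(sigma[j] * j ** q for j in range(1, mstar + 1))
    s_0 = sigma[0] + 2 * sum(sigma[j] for j in range(1, mstar + 1))
    if s_0 == 0:
        raise ValueError("s^(0) is zero; the lag cannot be selected")
    gamma = c_gamma * ((s_q / s_0) ** 2) ** (1 / (2 * q + 1))
    m = gamma * n ** (1 / (2 * q + 1))
    if kernel == "quadraticspectral":
        return min(m, mstar)               # QS: the lag need not be an integer
    return min(int(m), mstar)              # Bartlett and Parzen: truncate


# Example with a made-up autocorrelated series of length 200
f, prev = [], 0.0
for i in range(200):
    prev = 0.6 * prev + ((i * 7) % 11 - 5) / 5
    f.append(prev)
print(m_star("bartlett", 200), nw_optimal_lag(f, "bartlett"))
```

The ratio $\widehat{s}^{(q)}/\widehat{s}^{(0)}$ grows with the persistence of $f_i$, so more strongly autocorrelated moments lead to longer selected lags, capped at $m^*$.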
If center is specified, when computing weighting matrices ivregress replaces the term $u_i\mathbf{z}_i$ in
the formulas above with $u_i\mathbf{z}_i - \overline{u\mathbf{z}}$, where $\overline{u\mathbf{z}} = \sum_i u_i\mathbf{z}_i/N$.
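Centering amounts to demeaning the moment contributions before forming the outer products. The sketch below (toy data, hypothetical names) applies it to the wmatrix(robust) form of $\mathbf{W}^{-1}$; algebraically, centering subtracts the outer product of the average moment, $\overline{u\mathbf{z}}\,\overline{u\mathbf{z}}'$, from the uncentered matrix.

```python
def robust_winv(resid, Z, center=False):
    """W^{-1} = (1/n) sum_i g_i g_i' with g_i = u_i z_i, optionally demeaned first."""
    n, L = len(resid), len(Z[0])
    g = [[u * zv for zv in z] for u, z in zip(resid, Z)]
    if center:
        gbar = [sum(gi[l] for gi in g) / n for l in range(L)]   # average of u_i z_i
        g = [[gi[l] - gbar[l] for l in range(L)] for gi in g]
    return [[sum(gi[r] * gi[s] for gi in g) / n for s in range(L)] for r in range(L)]


resid = [0.4, -0.2, 0.1, -0.5, 0.3]
Z = [[1.0, 0.5], [1.0, -1.0], [1.0, 2.0], [1.0, 0.0], [1.0, 1.5]]
print(robust_winv(resid, Z))
print(robust_winv(resid, Z, center=True))
```

In an overidentified model the sample moments are not exactly zero at the estimates, so the two matrices differ; centering removes that nonzero mean from the estimated moment covariance.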
References
Andrews, D. W. K. 1991. Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica
59: 817–858.
Angrist, J. D., and J.-S. Pischke. 2009. Mostly Harmless Econometrics: An Empiricist’s Companion. Princeton, NJ:
Princeton University Press.
Basmann, R. L. 1957. A generalized classical method of linear estimation of coefficients in a structural equation.
Econometrica 25: 77–83.
Bauldry, S. 2014. miivfind: A command for identifying model-implied instrumental variables for structural equation
models in Stata. Stata Journal 14: 60–75.
Baum, C. F. 2006. An Introduction to Modern Econometrics Using Stata. College Station, TX: Stata Press.
Baum, C. F., M. E. Schaffer, and S. Stillman. 2003. Instrumental variables and GMM: Estimation and testing. Stata
Journal 3: 1–31.
Baum, C. F., M. E. Schaffer, and S. Stillman. 2007. Enhanced routines for instrumental variables/generalized method of moments estimation and testing. Stata
Journal 7: 465–506.
Cameron, A. C., and P. K. Trivedi. 2005. Microeconometrics: Methods and Applications. New York: Cambridge
University Press.
Cameron, A. C., and P. K. Trivedi. 2010. Microeconometrics Using Stata. Rev. ed. College Station, TX: Stata Press.
Davidson, R., and J. G. MacKinnon. 1993. Estimation and Inference in Econometrics. New York: Oxford University
Press.
Davidson, R., and J. G. MacKinnon. 2004. Econometric Theory and Methods. New York: Oxford University Press.
Desbordes, R., and V. Verardi. 2012. A robust instrumental-variables estimator. Stata Journal 12: 169–181.
Finlay, K., and L. M. Magnusson. 2009. Implementing weak-instrument robust tests for a general class of instrumental-
variables models. Stata Journal 9: 398–421.
Gallant, A. R. 1987. Nonlinear Statistical Models. New York: Wiley.
Greene, W. H. 2012. Econometric Analysis. 7th ed. Upper Saddle River, NJ: Prentice Hall.
Hall, A. R. 2005. Generalized Method of Moments. Oxford: Oxford University Press.
Hansen, L. P. 1982. Large sample properties of generalized method of moments estimators. Econometrica 50:
1029–1054.
Hayashi, F. 2000. Econometrics. Princeton, NJ: Princeton University Press.
Judge, G. G., W. E. Griffiths, R. C. Hill, H. Lütkepohl, and T.-C. Lee. 1985. The Theory and Practice of Econometrics.
2nd ed. New York: Wiley.
Kmenta, J. 1997. Elements of Econometrics. 2nd ed. Ann Arbor: University of Michigan Press.
Koopmans, T. C., and W. C. Hood. 1953. Studies in Econometric Method. New York: Wiley.
Koopmans, T. C., and J. Marschak. 1950. Statistical Inference in Dynamic Economic Models. New York: Wiley.
Newey, W. K., and K. D. West. 1987. A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent
covariance matrix. Econometrica 55: 703–708.
Newey, W. K., and K. D. West. 1994. Automatic lag selection in covariance matrix estimation. Review of Economic Studies 61: 631–653.
Nichols, A. 2007. Causal inference with observational data. Stata Journal 7: 507–541.
Palmer, T. M., V. Didelez, R. R. Ramsahai, and N. A. Sheehan. 2011. Nonparametric bounds for the causal effect
in a binary instrumental-variable model. Stata Journal 11: 345–367.
Poi, B. P. 2006. Jackknife instrumental variables estimation in Stata. Stata Journal 6: 364–376.
Stock, J. H., and M. W. Watson. 2011. Introduction to Econometrics. 3rd ed. Boston: Addison–Wesley.
Stock, J. H., J. H. Wright, and M. Yogo. 2002. A survey of weak instruments and weak identification in generalized
method of moments. Journal of Business and Economic Statistics 20: 518–529.
Theil, H. 1953. Repeated Least Squares Applied to Complete Equation Systems. Mimeograph from the Central
Planning Bureau, The Hague.
Wang, Q., and N. Wu. 2012. Long-run covariance and its applications in cointegration regression. Stata Journal 12:
515–542.
Wooldridge, J. M. 2010. Econometric Analysis of Cross Section and Panel Data. 2nd ed. Cambridge, MA: MIT Press.
Wooldridge, J. M. 2013. Introductory Econometrics: A Modern Approach. 5th ed. Mason, OH: South-Western.
Wright, P. G. 1928. The Tariff on Animal and Vegetable Oils. New York: Macmillan.
Also see
[R] ivregress postestimation — Postestimation tools for ivregress
[R] gmm — Generalized method of moments estimation
[R] ivprobit — Probit model with continuous endogenous regressors
[R] ivtobit — Tobit model with continuous endogenous regressors
[R] reg3 — Three-stage estimation for systems of simultaneous equations
[R] regress — Linear regression
[SEM] intro 5 — Tour of models
[SVY] svy estimation — Estimation commands for survey data
[TS] forecast — Econometric model forecasting
[XT] xtivreg — Instrumental variables and two-stage least squares for panel-data models
[U] 20 Estimation and postestimation commands