Generalized Maximum Entropy Analysis of The Linear Simultaneous Equations Model

Entropy 2014, 16, 825–852; doi:10.3390/e16020825

Article

Received: 20 November 2013; in revised form: 17 January 2014 / Accepted: 28 January 2014 / Published: 12 February 2014
Abstract: A generalized maximum entropy estimator is developed for the linear simultaneous
equations model. Monte Carlo sampling experiments are used to evaluate the estimator's
performance in small and medium-sized samples, suggesting contexts in which the proposed
generalized maximum entropy estimator is superior in mean square error to two and three
stage least squares. Analytical results are provided relating to asymptotic properties of the
estimator and associated hypothesis testing statistics. Monte Carlo experiments are also
used to provide evidence on the power and size of test statistics. An empirical application is
included to demonstrate the practical implementation of the estimator.
MSC Codes: 62
1. Introduction
Traditional estimators for the linear simultaneous equations model (SEM) include two stage least squares [1], three stage least squares [2], limited information maximum likelihood [3], and full information maximum
likelihood [4,5]. These estimators yield consistent estimates of structural parameters by correcting for
simultaneity between the endogenous variables and the disturbance terms of the statistical model.
However, in the presence of small samples or ill-posed problems, traditional approaches may provide
parameter estimates with high variance and/or bias, or provide no solution at all. As an alternative to
traditional estimators, we present a generalized maximum entropy estimator for the linear SEM and
rigorously analyze its sampling properties in small and large sample situations including the case of
contaminated error models.
Finite sampling properties of the SEM have been discussed in [6–10], where alternative estimation
techniques that have potentially superior sampling properties are suggested. Specifically, they
discussed limitations of asymptotically justified estimators in finite sample situations and the lack of
research on estimators that have small sample justification. In a special issue of The Journal of
Business and Economic Statistics, the authors of [11,12] examined small sample properties of generalized
methods of moments estimators for model parameters and covariance matrices. References [13–15]
pointed out that even small deviations from model assumptions in parametric econometric-statistical
models that are only asymptotically justified can lead to undesirable outcomes. Moreover, Reference [16]
singled out the extreme sensitivity of least squares estimators to modest departures from strictly Gaussian
conditions as a justification for examining robust methods of estimation. These studies motivate the
importance of investigating alternatives to parameter estimation methods for the SEM that are robust
in finite samples and lead to improved prediction, forecasting, and policy analysis.
The principle of maximum entropy has been applied in a variety of modeling contexts. Reference [10]
proposed estimation of the SEM based on generalized maximum entropy (GME) to deal with small
samples or ill-posed problems, and defined a criterion that balances the entropy in both the parameter
and residual spaces. The estimator was justified on information theoretic grounds, but the repeated
sampling properties of the estimator and its asymptotic properties were not analyzed extensively.
Reference [17] suggested an information theoretic estimator based on minimization of the Kullback-Leibler
Information Criterion as an alternative to optimally-weighted generalized method of moments estimation
that can accommodate weakly dependent data generating mechanisms. Subsequently, [18] investigated
an information theoretic estimator based on minimization of the Cressie-Read discrepancy statistic as
an alternative approach to inference in models whose data information was cast in terms of moment
conditions. Reference [18] identified both exponential empirical likelihood (negative entropy) andempirical
likelihood as special cases of the Cressie-Read power divergence statistic. More recently, [19,20] applied
the Kullback-Leibler Information Criterion to define empirical moment equations leading to estimators
with improved predictive accuracy and mean square error in some small sample estimation contexts.
Reference [21] provided an overview of information theoretic estimators for the SEM. Reference [22]
demonstrated that maximum entropy estimation of the SEM has relevant application to spatial
autoregressive models wherein autocorrelation parameters are inherently bounded and in circumstances
when traditional spatial estimators become unstable. Reference [23] examined the effect of
management factors on enterprise performance using a GME SEM estimator. Finally, [24] estimated
spatial structural equation models and extended the approach to a panel data framework.
In this paper we investigate a GME estimator for the linear SEM that is fundamentally different
from traditional approaches and identify classes of problems (e.g., contaminated error models) in
which the proposed estimator outperforms traditional estimators. The estimator: (1) is completely
consistent with data and other model information constraints on parameters, even in finite samples;
(2) has large sample justification in that, under regularity conditions, it retains properties of consistency
and asymptotic normality to provide practitioners with means to apply standard hypothesis testing
procedures; and (3) has the potential for improved finite sample properties relative to alternative
traditional methods of estimation. The proposed estimator is a one-step instrumental variable-type
estimator based on a nonlinear-in-parameters SEM model discussed in [1,7,25]. The method does not
deal with data information by projecting it in the form of moment constraints but rather, in GME
parlance, is based on data constraints that deal with the data in individual sample observation form.
Additional information utilized in the GME estimator includes finite support spaces that are imposed
on model parameters and disturbances, which allows users to incorporate a priori interval restrictions
on the parameters of the model.
Monte Carlo (MC) sampling experiments are used to investigate the finite sample performance
of the proposed GME estimator. In the small sample situations analyzed, the GME estimator is
superior to two and three stage least squares based on mean square error considerations. Further, we
demonstrate the improved robustness of GME relative to 3SLS in the case of contaminated error
models. For larger sample sizes, the consistency of the GME estimator results in sampling behavior
that emulates that of 2SLS and 3SLS estimators. Observations on power and size of asymptotic test
statistics suggest that the GME does not dominate, nor is it dominated by, traditional testing methods.
An empirical application is provided to demonstrate practical implementation of the GME estimator
and to delineate inherent differences between GME and traditional estimators in finite samples.
The empirical analysis also highlights the sensitivity of GME coefficient estimates and predictive fit to
specification of error truncation points, underscoring the need for care in specifying the empirical
error support.
Consider the SEM with G equations, which can be written in matrix form as:

$$Y\Gamma + XB + E = 0 \qquad (1)$$

where $Y = (y_1, \ldots, y_G)$ is an $N \times G$ matrix of jointly determined endogenous variables, $\Gamma = (\Gamma_1, \ldots, \Gamma_G)$ is an invertible $G \times G$ matrix of structural coefficients of the endogenous variables, $X = (x_1, \ldots, x_K)$ is an $N \times K$ matrix of exogenous variables that has full column rank, $B = (B_1, \ldots, B_G)$ is a $K \times G$ matrix of coefficients of exogenous variables, and $E = (\varepsilon_1, \ldots, \varepsilon_G)$ is an $N \times G$ matrix of unobserved random disturbances. The standard stochastic assumptions on the disturbance vectors are that $E[\varepsilon_i] = 0$ for i = 1,...,G and $E[\varepsilon_i \varepsilon_j'] = \sigma_{ij} I_N$ for i,j = 1,...,G. Letting $\varepsilon \equiv \mathrm{vec}(\varepsilon_1, \ldots, \varepsilon_G)$ denote the vertical concatenation of the vectors $\varepsilon_1, \ldots, \varepsilon_G$, the covariance matrix is given by $E[\varepsilon\varepsilon'] = \Sigma \otimes I_N$, where the $G \times G$ matrix $\Sigma$ contains the unknown $\sigma_{ij}$'s for i,j = 1,...,G.

The reduced form model is obtained by post-multiplying Equation (1) by $\Gamma^{-1}$ and solving for Y as:

$$Y = -XB\Gamma^{-1} - E\Gamma^{-1} = X\Pi + V \qquad (2)$$

where $\Pi = -B\Gamma^{-1}$ and $V = -E\Gamma^{-1}$.
The ith equation of the reduced form model in Equation (2) is:

$$y_i = X\pi_i + v_i \qquad (3)$$

The ith equation in Equation (1) can be rewritten in terms of a nonlinear structural parameter representation of the reduced form model as [1]:

$$y_i = X\Pi_{(-i)}\gamma_i + X_i\beta_i + \mu_i = Z_i\delta_i + \mu_i \qquad (4)$$

where $E[Y_{(-i)}] = X\Pi_{(-i)}$, $\mu_i \equiv \varepsilon_i + \left(Y_{(-i)} - E[Y_{(-i)}]\right)\gamma_i$, $Z_i \equiv \left[X\Pi_{(-i)}, X_i\right]$, and $\delta_i = \mathrm{vec}(\gamma_i, \beta_i)$. In general the notation (−i) in the subscript of a variable represents the explicit exclusion of the ith column vector, such as $y_i$ being excluded from Y to form $Y_{(-i)}$, in addition to the exclusion of any other column vectors implied by the structural restrictions. Then $Y_{(-i)}$ represents an $N \times G_i$ matrix of $G_i$ jointly dependent explanatory variables having nonzero coefficients in the ith equation, $\gamma_i$ is the corresponding $G_i \times 1$ subvector of the structural parameter vector $\Gamma_i$, $X_i$ is an $N \times K_i$ matrix that represents the $K_i$ exogenous variables with nonzero coefficients in the ith equation, and $\beta_i$ is the corresponding $K_i \times 1$ subvector of the parameter vector $B_i$. It is assumed that the linear exclusion restrictions on the structural parameters are sufficient to identify each equation. The $K \times G_i$ matrix of reduced form coefficients $\Pi_{(-i)}$ coincides with the endogenous variables in $Y_{(-i)}$.
Historically, Equation (4) has provided motivation for two stage least squares (2SLS) and three stage least squares (3SLS) estimators. Replacing $E[Y_{(-i)}]$ with the observed right hand side endogenous variables $Y_{(-i)}$ and applying OLS yields biased and inconsistent estimates [1]. In 2SLS and 3SLS, the first stage is to approximate $E[Y_{(-i)}]$ by applying ordinary least squares (OLS) to the unrestricted reduced form model in Equation (2) and thereby obtain predicted values of $Y_{(-i)}$. Then, using the predicted values to replace $E[Y_{(-i)}]$, the second stage is to estimate the model in Equation (4) with OLS. In the event that the error terms are normally distributed, homoskedastic, and serially independent, the 3SLS estimator is asymptotically equivalent to the asymptotically efficient full-information maximum likelihood (FIML) estimator [21]. Under the same conditions, it is equivalent to apply FIML to either Equation (1) or to Equation (4) under the restriction $\Pi = -B\Gamma^{-1}$.
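To fix ideas, the two-stage logic can be sketched in a few lines; this is an illustrative fragment of ours (not from the original paper), and the argument names are hypothetical:

```python
import numpy as np

def two_sls(y_i, Y_minus_i, X_i, X):
    """Textbook 2SLS for the ith equation of the SEM.

    Stage 1: regress Y_(-i) on all exogenous variables X (the unrestricted
    reduced form) to approximate E[Y_(-i)] with fitted values.
    Stage 2: OLS of y_i on the fitted values and the included exogenous X_i.
    """
    Y_hat = X @ np.linalg.lstsq(X, Y_minus_i, rcond=None)[0]  # stage 1 fit
    Z_i = np.hstack([Y_hat, X_i])                             # [X*Pi_(-i), X_i]
    return np.linalg.lstsq(Z_i, y_i, rcond=None)[0]           # stage 2 OLS
```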
The GME approach builds on the entropy measure of a discrete distribution $q = (q_1, \ldots, q_N)'$,

$$H(q) = -\sum_{n=1}^{N} q_n \ln q_n$$

as developed in [26]. The value of H(q) reaches a maximum when $q_n = N^{-1}$ for n = 1,...,N, which characterizes the
uniform distribution. Generalizations of the entropy function that have been examined elsewhere in the
econometrics and statistics literature include the Cressie-Read power divergence statistic [18],
Kullback-Leibler Information Criterion [27], and the α-entropy measure [28]. We restrict our analysis
to the entropy objective function due to its efficiency and robustness properties [18], and its
widespread use within the context of GME applications [9].
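As a minimal numerical illustration of this property (our sketch, not part of the original text), the entropy of equal weights attains ln N, while any less uniform weighting scores lower:

```python
import numpy as np

def H(q):
    """Shannon entropy H(q) = -sum_n q_n ln q_n, with 0 ln 0 taken as 0."""
    q = np.asarray(q, dtype=float)
    return -np.sum(np.where(q > 0.0, q * np.log(q), 0.0))

N = 4
print(H(np.full(N, 1.0 / N)))     # uniform weights: ln 4 = 1.386..., the maximum
print(H([0.7, 0.1, 0.1, 0.1]))    # less uniform weights: 0.940..., lower entropy
```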
GME estimators previously proposed for the SEM include (a) the data constrained estimator for the
general linear model, hereafter GME-D, which amounts to applying the GME principle to a vectorized
version of the structural model in Equation (1); and (b) a two stage estimator analogous to 2SLS
whereby GME-D is applied to the reduced form model in the first stage and to the structural model in
the second stage, hereafter GME-2S. Alternatively, [10] applied the GME principle to the reduced
form model in Equation (3) with the restriction $\Pi = -B\Gamma^{-1}$ imposed, hereafter GME-GJM.
Our approach follows 2SLS and 3SLS in the sense that the restriction $\Pi = -B\Gamma^{-1}$ is not explicitly enforced and that $E[Y_{(-i)}]$ is algebraically replaced by $X\Pi_{(-i)}$. However, unlike 2SLS and 3SLS, our approach is formulated under the GME principle so that it remains completely consistent with Equation (4), which is retained as a nonlinear constraint and solved concurrently with the unrestricted reduced form model in Equation (3) to identify structural and reduced form coefficient estimates. Reference [7] refers to Equations (3) and (4)
as a nonlinear-in-parameters (NLP) form of the SEM model.
To formulate a GME estimator for the NLP model of the SEM, henceforth referred to as GME-NLP, parameters and disturbance terms of Equations (3) and (4) are reparameterized as convex combinations of reference support points and unknown convexity weights. Support matrices $S^i$ for $i = \beta, \gamma, \pi, z, w$ that identify finite bounded feasible spaces for individual parameters, and weight vectors $p^{\beta}, p^{\gamma}, p^{\pi}, z, w$ that consist of unknown parameters to be estimated, are explicitly defined below. The parameters are redefined as $\beta \equiv \mathrm{vec}(\beta_1, \ldots, \beta_G) = S^{\beta}p^{\beta}$, $\gamma \equiv \mathrm{vec}(\gamma_1, \ldots, \gamma_G) = S^{\gamma}p^{\gamma}$, and $\pi \equiv \mathrm{vec}(\pi_1, \ldots, \pi_G) = S^{\pi}p^{\pi}$, while the disturbance vectors are defined as $v \equiv \mathrm{vec}(v_1, \ldots, v_G) = S^{z}z$ and $\mu \equiv \mathrm{vec}(\mu_1, \ldots, \mu_G) = S^{w}w$. Using these identities and letting $p \equiv \mathrm{vec}(p^{\beta}, p^{\gamma}, p^{\pi}, z, w)$, the estimates of $\pi, \gamma, \beta$ are obtained by solving the constrained GME problem:

$$\max_{p}\ \{-p'\ln p\} \qquad (5)$$

subject to:

$$y = (I_G \otimes X)\,S^{(\pi)}(p^{\pi})\,S^{\gamma}p^{\gamma} + \bar{X}S^{\beta}p^{\beta} + S^{w}w \qquad (6)$$

$$y = (I_G \otimes X)\,S^{\pi}p^{\pi} + S^{z}z \qquad (7)$$

$$\left(I_{Q+2NG} \otimes 1'_{M}\right)p = 1_{Q+2NG} \qquad (8)$$
The $S^i$ support matrices (for $i = \beta, \gamma, \pi, z, w$) present in Equations (6) and (7) consist of user supplied reference support points defining feasible spaces for parameters and disturbances. For example, $S^{w}$ is given by:

$$S^{w} = \begin{pmatrix} S^{w}_{1} & 0 & \cdots & 0\\ 0 & S^{w}_{2} & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & S^{w}_{G} \end{pmatrix}_{(GN \times GNM)} \quad S^{w}_{i} = \begin{pmatrix} s^{w\prime}_{1i} & 0 & \cdots & 0\\ 0 & s^{w\prime}_{2i} & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & s^{w\prime}_{Ni} \end{pmatrix}_{(N \times NM)} \quad s^{w}_{ni} = \begin{pmatrix} s^{w}_{ni1}\\ s^{w}_{ni2}\\ \vdots\\ s^{w}_{niM} \end{pmatrix}_{(M \times 1)} \qquad (9)$$
where the nth disturbance term of the gth equation with M support points is defined, in summation notation, as $\mu_{ng} = \sum_{m=1}^{M} s^{w}_{ngm} w_{ngm}$. Similarly, the kth $\beta$ parameter of the gth equation is defined by $\beta_{kg} = \sum_{m=1}^{M} s^{\beta}_{kgm} p^{\beta}_{kgm}$. For notational convenience the number of support points has been defined as $M \geq 2$ for both errors and parameters.
In Equation (6), the matrix $S^{(\pi)}$ defines the reference supports for the block diagonal matrix $\mathrm{diag}\left(\Pi_{(-1)}, \ldots, \Pi_{(-G)}\right)$, while $\bar{X} = \mathrm{diag}(X_1, \ldots, X_G)$ is a $GN \times \bar{K}$ block diagonal matrix and $y \equiv \mathrm{vec}(y_1, \ldots, y_G)$ is a $GN \times 1$ vector of endogenous variables. In Equations (6) and (7) the $NGM \times 1$ vectors $w \equiv \mathrm{vec}(w_{11}, \ldots, w_{NG})$ and $z \equiv \mathrm{vec}(z_{11}, \ldots, z_{NG})$ represent vertical concatenations of sets of $M \times 1$ subvectors for n = 1,...,N and g = 1,...,G, where each subvector $w_{ng} = (w_{ng1}, \ldots, w_{ngM})'$ and $z_{ng} = (z_{ng1}, \ldots, z_{ngM})'$ contains a set of M convex weights. Also $p^{\pi} \equiv \mathrm{vec}(p^{\pi}_{11}, \ldots, p^{\pi}_{KG})$ is a $KGM \times 1$ vector that consists of convex weights $p^{\pi}_{kg} = (p^{\pi}_{kg1}, \ldots, p^{\pi}_{kgM})'$ for k = 1,...,K and g = 1,...,G. The $\bar{G}M \times 1$ vector $p^{\gamma} \equiv \mathrm{vec}(p^{\gamma}_{11}, \ldots, p^{\gamma}_{GG})$ and the $\bar{K}M \times 1$ vector $p^{\beta} \equiv \mathrm{vec}(p^{\beta}_{11}, \ldots, p^{\beta}_{KG})$ are similarly defined. Equation (8) contains the required adding up conditions for each of the sets of convexity weights used in forming the GME-NLP estimator. Nonnegativity of the weights is an inherent characteristic of the maximum entropy objective and does not need to be explicitly enforced with inequality constraints. Regarding notation in (8), $I_G$ represents a $G \times G$ identity matrix and $1_N$ is an $N \times 1$ unit vector. Letting $\bar{K} \equiv \sum_{i=1}^{G} K_i$ denote the number of unknown $\beta_{kg}$'s and $\bar{G} \equiv \sum_{i=1}^{G} G_i$ denote the number of unknown $\gamma_{ig}$'s, then together with the KG reduced form parameters, the $\pi_{kg}$'s, the total number of unknown parameters in the structural and reduced form equations is $Q = \bar{K} + \bar{G} + KG$.
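To make the reparameterization concrete, the following fragment (an illustrative sketch of ours; the support values echo those used in the sampling experiments below) recovers a coefficient and a disturbance term as convex combinations of support points:

```python
import numpy as np

s_beta = np.array([-5.0, 0.0, 5.0])   # coefficient support (M = 3)
s_w = np.array([-3.0, 0.0, 3.0])      # error support, e.g., from a 3-sigma rule

p = np.array([0.2, 0.5, 0.3])         # convexity weights: nonnegative, sum to one
w = np.array([0.25, 0.50, 0.25])

beta_kg = s_beta @ p                  # beta_kg = sum_m s_kgm * p_kgm = 0.5
mu_ng = s_w @ w                       # mu_ng = 0 under symmetric weights
print(beta_kg, mu_ng)
```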
Optimizing the objective function defined in Equation (5) optimizes the entropy in the parameter
and disturbance spaces for both the structural model in Equation (6) and the reduced form model in
Equation (7). The optimized objective function can mitigate the detrimental effects of ill-conditioned
explanatory and/or instrumental variables and extreme outliers due to heavy tailed sampling
distributions. In these circumstances traditional estimators are unstable and often represent an
unsatisfactory basis for estimation and inference [20,25,29].
We emphasize that the proposed GME-NLP is a data-constrained estimator. Equations (5)–(8)
constitute a data-constrained model in which the regression models themselves, as opposed to moment
conditions based on them, represent constraining functions to the entropy objective function. Reference [16]
pointed out that outside the Gaussian error model, estimation based on sample moments can be
inefficient relative to other procedures. Reference [9] provided MC evidence that data-constrained
GME models, making use of the full set of observations, outperformed moment-constrained GME
models in mean square error. In the GME-NLP model, constraints Equations (6) and (7) remain
completely consistent with sample data information in Equations (3) and (4).
We also emphasize that the proposed GME-NLP estimator is a one-step approach, simultaneously
solving for reduced form and structural parameters. As a result, the nonlinear specification of Equation (6)
leads to first order optimization conditions (Equation (A15), derived in the Appendix) that are different
from other multiple-step or asymptotically justified estimators. The most obvious difference is that the
first order conditions do not require orthogonality between right hand side variables and error terms,
i.e., GME-NLP relaxes the orthogonality condition between instruments and the structural error term.
Perhaps more importantly, multiple-step estimators (e.g., 2SLS or GME-2S) only approximate the
NLP model and ignore nonlinear interactions between reduced and structural form coefficients. Thus,
constraints Equations (6) and (7) are not completely satisfied by multiple-step procedures, yielding an
estimator that is not fully consistent with the entire information set underlying the specification of the
model. Although this is not a critical issue in large sample estimation, as demonstrated below,
estimation inefficiency can be substantial in small samples if multiple-step estimators do not adequately
approximate the NLP model.
The proposed GME-NLP estimator has some econometric limitations similar to, and other
limitations which set it apart from, 2SLS that are evident when inspecting Equations (5)–(8). Firstly,
like 2SLS, the residuals in Equations (4) and (6) are not identical to those of the original structural
model, nor are they the same as the reduced form error term, except when evaluated at the true
parameter values. Secondly, the GME-NLP estimator does not attempt to correct for contemporaneous
correlation among the errors of the structural equations. Although a relevant efficiency issue,
contemporaneous correlation is left for future research. Thirdly, and perhaps most importantly, the use
of bounded disturbance support spaces in GME estimation introduces a specification issue in empirical
analysis that typically does not arise with traditional estimators. These issues are discussed in more
detail ahead.
In practice, parameter restrictions for coefficients of the SEM have been imposed using constrained
maximum likelihood or Bayesian regression [7,30]. Neither approach is necessarily simple enough to
specify analytically nor estimate empirically, and each has its empirical advantages and disadvantages.
For example, Bayesian estimation is well-suited for representing uncertainty with respect to model
parameters, but can also require extensive MC sampling when numerical estimation techniques are
required, as is often the case in non-normal, non-conjugate prior model contexts. In comparison to
constrained maximum likelihood or Bayesian analysis, the GME-NLP estimator also enforces
restrictions on parameter values, is arguably no more difficult to specify or estimate, and does not
require the use of MC sampling in the estimation phase of the analysis. Moreover, and in contrast to
constrained maximum likelihood or the typical parametric Bayesian analysis, GME-NLP does not
require explicit specification of the distributions of the disturbance terms or of the parameter values.
However, both the coefficient and the disturbance support spaces are compact in the GME-NLP
estimation method, which may not apply in some idealized empirical modeling contexts.
Imposing bounded support spaces on coefficients and error terms has several implications for GME
estimation. Consider support spaces for coefficients. Selecting bounds and intermediate reference
support points provides an effective way to restrict parameters of the model to intervals. If prior
knowledge about coefficients is limited, wider truncation points can be used to increase the confidence
that the support space contains the true coefficient value. If knowledge exists about, say, the sign of a specific
coefficient from economic theory, this can be straightforwardly imposed together with a reasonable
bound on the coefficient.
Importantly, there is a bias-efficiency tradeoff that arises when parameter support spaces are specified in
terms of bounded intervals. A disadvantage of bounded intervals is that they will generally introduce
bias into the GME estimator unless the intervals happen to be centered on the true values of the
parameters. An advantage of restricting parameters to finite intervals is that they can lead to increases
in efficiency by lowering parameter estimation variability. In the MC analysis ahead, it is demonstrated
that the bias introduced by bounded parameter intervals in the GME-NLP estimator can be more
than compensated for by substantial decreases in variability, leading to notable increases in
overall estimation efficiency.
In practice, support spaces for disturbances can always be chosen in a manner that provides a
reasonable approximation to the true disturbance distribution because upper and lower truncation
points can always be selected sufficiently wide to contain the true disturbances of regression models [31].
The number, M, of support points for each disturbance can be chosen to account for additional information
relating to higher moments (e.g., skewness and kurtosis) of each disturbance term. MC experiments
by [9] demonstrated that support points ranging from 2 to 10 are acceptable for empirical applications.
For the GME-NLP estimator, identifying bounds for the disturbance support spaces is complicated
by the interaction among truncation points of the parameters and disturbance support points of both the
reduced and structural form models. Yet, several informative generalizations can be drawn. First, [32]
demonstrated that ordinary least squares-like behavior can be obtained by appropriately selecting
truncation points of the GME-D estimator of the general linear model. This has direct implications for
SEM estimation in that appropriately selected truncation points of the GME-2S estimator lead to
2SLS-like behavior. However, as demonstrated ahead, given the nonlinear interactions between the
structural and reduced form models, adjusting truncation points of the GME-NLP does not necessarily
lead to two stage like behavior in finite samples. Second, the reduced form model in Equation (3) and
the nonlinear structural parameter representation of the reduced form model in Equation (4) have
identical error structure at the true parameter values. Hence, in the empirical applications below, we
specify identical support matrices for error terms of both the structural and reduced form models.
Third, in the limiting case where the disturbance boundary points of the GME-NLP structural model
expand in absolute value to infinity, the parameter estimates converge to the mean of their support points.
Given ignorance regarding the disturbance distribution, [9,10] suggest using a sample scale parameter
and the multiple-sigma truncation rule to determine error bounds. For example, the three sigma rule for
random variables states that the probability of a unimodal continuous random variable assuming
outcomes distant from its mean by more than three standard deviations is at most 5% [33]. Intuitively,
this multiple-sigma truncation rule provides a means of encompassing an arbitrarily large proportion of
the disturbance support space. From the empirical evidence presented below, it appears that combining
the three sigma rule with a sample scale parameter to estimate the GME-NLP model is a useful approach.
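A quick simulation check of the rule (ours): for a unimodal distribution such as the normal, the mass beyond three standard deviations is far below the 5% bound:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(0.0, 1.0, 100_000)   # a unimodal continuous example
outside = np.mean(np.abs(x - x.mean()) > 3.0 * x.std())
print(outside)   # about 0.0027 for the normal, well under the rule's 5% bound
```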
To derive consistency and asymptotic normality results for the GME-NLP estimator, we assume the
following regularity conditions.
R1. The N rows of the $N \times G$ disturbance matrix E are independent random drawings from a G-dimensional population with zero mean vector and unknown finite covariance matrix $\Sigma$.

R2. The $N \times K$ matrix X of exogenous variables has rank K and consists of nonstochastic elements, with $\lim_{N\to\infty} N^{-1}X'X$ equal to a positive definite matrix.

R3. The elements $\mu_{ng}$ and $v_{ng}$ of the vectors $\mu_g$ and $v_g$ (n = 1,...,N, g = 1,...,G) are independent and bounded such that $c_{g1} + \epsilon_g \leq \mu_{ng}, v_{ng} \leq c_{gM} - \epsilon_g$ for some $\epsilon_g > 0$ and large enough positive $c_{gM} = -c_{g1}$. The probability density function of $\mu$ is assumed to be symmetric about the origin with a finite covariance matrix.

R4. $\beta_{kg} \in [\beta_{kgL}, \beta_{kgH}]$, for finite $\beta_{kgL}$ and $\beta_{kgH}$, k = 1,...,K and g = 1,...,G; $\gamma_{jg} \in [\gamma_{jgL}, \gamma_{jgH}]$, for finite $\gamma_{jgL}$ and $\gamma_{jgH}$, $(j \neq g)$, j,g = 1,...,G, and $\gamma_{gg} = -1$; $\pi_{kg} \in [\pi_{kgL}, \pi_{kgH}]$, for finite $\pi_{kgL}$ and $\pi_{kgH}$, k = 1,...,K and g = 1,...,G.

R5. For the true B and nonsingular $\Gamma$, there exist positive definite matrices $\Psi_g$ (g = 1,...,G) such that $\lim_{N\to\infty} N^{-1}Z_g'Z_g = \Psi_g$, where $\Pi = -B\Gamma^{-1}$.
Condition R1 allows the disturbances to be contemporaneously correlated. It also requires independence of the N rows of the $N \times G$ disturbance matrix E, which is stronger than the uncorrelated error assumptions introduced immediately following Equation (1). Conditions R1, R2, and R5 are typical assumptions made when deriving asymptotic properties for the 2SLS and 3SLS estimators of the SEM [1]. Condition R3 states that the supports of $\mu_{ng}$ and $v_{ng}$ are symmetric about the origin and can be contained in the interior of closed and bounded intervals $[c_{g1}, c_{gM}]$. Extending the lower and upper bounds of the interval by a (possibly arbitrarily small) $\epsilon_g > 0$ is a technical and computational convenience ensuring feasibility of the entropic solutions [32]. Condition R4 implies that the true values of the parameters $\beta_{kg}$, $\gamma_{jg}$, $\pi_{kg}$ can be enclosed within bounded intervals.
The regularity conditions (R1)–(R5) provide a basic set of assumptions sufficient to establish asymptotic properties for the GME-NLP estimator of the SEM. For notational convenience let $\theta \equiv \mathrm{vec}(\pi, \delta)$, where we follow the standard convention that $\delta \equiv \mathrm{vec}(\delta_1, \ldots, \delta_G)$. The theorems for consistency and asymptotic normality are stated below with proofs in the Appendix.

Theorem 1. Under the regularity conditions R1–R5, the GME-NLP estimator, $\hat{\theta} \equiv \mathrm{vec}(\hat{\pi}, \hat{\delta})$, is a consistent estimator of the true coefficient values $\theta \equiv \mathrm{vec}(\pi, \delta)$.
The intuition behind the proof is that without the reduced form component in Equation (7) the parameters of the structural component in Equation (6) are not identified. As shown in the Appendix, the reduced form component yields estimates that are consistent and contribute to identifying the structural parameters, and the structural component in Equation (6) ties the structural coefficients to the data and draws the GME-NLP estimates toward the true parameter values as the sample size increases.
Theorem 2. Under the conditions of Theorem 1, the GME-NLP estimator, $\hat{\delta} \equiv \mathrm{vec}(\hat{\delta}_1, \ldots, \hat{\delta}_G)$, is asymptotically normally distributed as $\hat{\delta} \overset{a}{\sim} N\left(\delta,\ \frac{1}{N}\Omega^{-1}\bar{\Omega}\,\Omega^{-1}\right)$.

The asymptotic covariance matrix consists of $\Omega \equiv \mathrm{diag}(\eta_1\Psi_1, \ldots, \eta_G\Psi_G)$, which follows from R5 and $\eta_g \equiv E\left[\partial\lambda^{w}_{ng}(u_{ng})/\partial u_{ng}\right]$ with:

$$\frac{\partial\lambda^{w}_{ng}(u_{ng})}{\partial u_{ng}} = \left[\sum_{m=1}^{M}\left(s^{w}_{ngm}\right)^{2}w_{ngm} - \left(u_{ng}\right)^{2}\right]^{-1}$$

The elements of $\bar{\Omega}$ are defined by $\lim_{N\to\infty} N^{-1}Z'\left(\Sigma^{w}\otimes I\right)Z = \bar{\Omega}$, where $Z = \mathrm{diag}(Z_1, \ldots, Z_G)$ and $\Sigma^{w}$ is a $G \times G$ covariance matrix for the $\lambda^{w}_{ng}$'s.
Estimators of the SEM are generally categorized as “full information” (e.g., 3SLS or FIML) or
“limited information” (e.g., 2SLS or LIML) estimators. GME-NLP is not a full information estimator
because the estimator neither enforces the restriction Π =-BΓ -1 nor explicitly characterizes the
contemporaneous correlation of the disturbance terms. An advantage of GME-NLP is that it is
completely consistent with data constraints in both small and large samples, because we concurrently
estimate the parameters of the reduced form and structural models. As a limited information estimator,
GME-NLP has several additional attractive characteristics. First, similar to other limited information
estimators, it is likely to be more robust to misspecification than a full information alternative because
in the latter case misspecification of any one equation can lead to inconsistent estimation of all the
equations in the system [34]. Second, GME-NLP is easily applied in the case of a single equation,
G = 1, and it retains the asymptotic properties identified above. Finally, the single equation case is a
natural generalization of the data-constrained GME estimator for the general linear model.
To operationalize the asymptotic covariance matrix, the unknown components are replaced by their GME-NLP estimates. Letting $\hat{\eta}_g = N^{-1}\sum_{n=1}^{N}\left[\sum_{m=1}^{M}\left(s^{w}_{ngm}\right)^{2}\hat{w}_{ngm} - \hat{u}_{ng}^{2}\right]^{-1}$ and $\hat{\Psi}_g = N^{-1}\hat{Z}_g'\hat{Z}_g$, then $\hat{\Omega} = \mathrm{diag}\left(\hat{\eta}_1\hat{\Psi}_1, \ldots, \hat{\eta}_G\hat{\Psi}_G\right)$. A straightforward estimate of $\bar{\Omega}$ can be constructed as $\hat{\bar{\Omega}} = N^{-1}\hat{Z}'\left(\hat{\Sigma}^{w}\otimes I\right)\hat{Z}$, where the elements of $\hat{\Sigma}^{w}$ are estimated from the fitted multipliers $\hat{\lambda}^{w}_{ni}$ for i,j = 1,...,G. Combining these elements, the estimated asymptotic covariance matrix of $\hat{\delta}$ is defined as $\widehat{\mathrm{Var}}(\hat{\delta}) = \frac{1}{N}\hat{\Omega}^{-1}\hat{\bar{\Omega}}\hat{\Omega}^{-1}$.
To define Wald tests on the elements of $\delta$, let $H_0$: $R(\delta) = 0$ be the null hypothesis to be tested. Here $R(\delta)$ is a continuously differentiable L-dimensional vector function with rank $R_{\delta}(\delta) = L$, where $R_{\delta}(\delta) \equiv \partial R(\delta)/\partial\delta'$. In the special case of a linear null hypothesis $H_0$: $R\delta = r$, then $R_{\delta}(\delta) = R$. It follows from Theorem 5.37 in [35] that:

$$\sqrt{N}\left(R(\hat{\delta}) - r\right) \overset{d}{\to} N\left(0,\ R_{\delta}(\delta)\,\Omega^{-1}\bar{\Omega}\,\Omega^{-1}R_{\delta}(\delta)'\right)$$

The Wald test statistic has a $\chi^{2}$ limiting distribution with L degrees of freedom given as:

$$W = \left(R(\hat{\delta}) - r\right)'\left[R_{\delta}(\hat{\delta})\,\widehat{\mathrm{Var}}(\hat{\delta})\,R_{\delta}(\hat{\delta})'\right]^{-1}\left(R(\hat{\delta}) - r\right) \overset{d}{\to} \chi^{2}_{L}$$

under the null hypothesis.
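A direct implementation of the linear Wald statistic is straightforward; the following sketch is ours, with `delta_hat` and `var_hat` standing in for the GME-NLP estimates and the estimated covariance matrix $\widehat{\mathrm{Var}}(\hat{\delta})$ defined above:

```python
import numpy as np
from scipy.stats import chi2

def wald_test(delta_hat, var_hat, R, r):
    """W = (R d - r)' [R Var(d) R']^{-1} (R d - r), chi-square with L = rows(R) df."""
    diff = R @ delta_hat - r
    W = float(diff @ np.linalg.solve(R @ var_hat @ R.T, diff))
    return W, chi2.sf(W, df=R.shape[0])

# Hypothetical joint test of two coefficients against their true design values
delta_hat = np.array([0.234, 0.053])
var_hat = np.diag([0.004, 0.025])   # stands in for Var-hat(delta-hat)
R, r = np.eye(2), np.array([0.222, 0.046])
print(wald_test(delta_hat, var_hat, R, r))   # small W, large p: fail to reject
```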
For the sampling experiments we set up an overdetermined simultaneous system with contemporaneously
correlated errors that is similar, but not identical, to empirical models discussed in [10,36,37].
Reference [10] provides empirical evidence of the performance of the GME-GJM estimator for both
ill-posed (multicollinearity) and well-posed problems using a sample size of 20 observations. In this
study we attempt to focus on both smaller and larger sample size performance of the GME-NLP
estimator, the size and power of single and joint hypothesis tests, and the relative performance of
GME-NLP to 2SLS and 3SLS. In addition, the performance of GME-NLP is compared to Golan,
Judge, and Miller’s GME-GJM estimator. The estimation performance measure is the mean square
error (MSE) between the empirical coefficient estimates and the true coefficient values.
The parameters $\Gamma$ and B and the covariance structure $\Sigma$ of the structural system in Equation (1) are specified as:

$$\Gamma = \begin{pmatrix} -1 & .267 & .087\\ .222 & -1 & 0\\ 0 & .046 & -1 \end{pmatrix} \qquad B = \begin{pmatrix} 6.2 & 4.4 & 4.0\\ 0 & .74 & 0\\ .7 & 0 & .53\\ 0 & 0 & .11\\ .96 & .13 & 0\\ 0 & 0 & .56\\ .06 & 0 & 0 \end{pmatrix} \qquad \Sigma = \begin{pmatrix} 1 & -1 & -.125\\ -1 & 4 & .0625\\ -.125 & .0625 & 8 \end{pmatrix}$$
The exogenous variables are drawn from an iid N(0,1) distribution, while the errors for the structural equations are drawn from a multivariate normal distribution with mean zero and covariance $\Sigma \otimes I$ that is truncated at ±3 standard deviations.
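For reference, one MC sample from this design can be generated as follows (our sketch; the rejection loop implements the ±3 standard deviation truncation, and the equation ordering follows the matrices above):

```python
import numpy as np

rng = np.random.default_rng(0)

Gamma = np.array([[-1.0, 0.267, 0.087],
                  [0.222, -1.0, 0.0],
                  [0.0, 0.046, -1.0]])
B = np.array([[6.2, 4.4, 4.0], [0.0, 0.74, 0.0], [0.7, 0.0, 0.53],
              [0.0, 0.0, 0.11], [0.96, 0.13, 0.0], [0.0, 0.0, 0.56],
              [0.06, 0.0, 0.0]])
Sigma = np.array([[1.0, -1.0, -0.125],
                  [-1.0, 4.0, 0.0625],
                  [-0.125, 0.0625, 8.0]])

def truncated_mvn(rng, Sigma, n, k=3.0):
    """N(0, Sigma) draws; rows falling outside +/- k std deviations are redrawn."""
    sd = np.sqrt(np.diag(Sigma))
    rows = []
    while len(rows) < n:
        cand = rng.multivariate_normal(np.zeros(len(sd)), Sigma, size=n)
        rows.extend(cand[np.all(np.abs(cand) <= k * sd, axis=1)])
    return np.array(rows[:n])

N = 100
X = rng.standard_normal((N, 7))           # iid N(0,1) exogenous variables
E = truncated_mvn(rng, Sigma, N)          # truncated structural disturbances
Y = -(X @ B + E) @ np.linalg.inv(Gamma)   # solve Y Gamma + X B + E = 0 for Y
```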
To specify the GME models, additional information beyond that traditionally used in 2SLS and
3SLS is required. Upper and lower bounds, as well as intermediate support points for the individual
coefficients and disturbance terms, are supplied for the GME-NLP and GME-GJM models along with
starting values for the parameter coefficients. The difference in specification of GME-GJM relative to GME-NLP is that in the former, $\Pi = -B\Gamma^{-1}$ replaces the structural model in Equation (6) and the GME-GJM objective function excludes any parameters associated with the structural form disturbance term. The upper and lower bounds of the support spaces specified for the structural and reduced form models are identical to [10] except that we use three rather than five support points. The supports are defined as $s^{\beta}_{ik} = s^{\pi}_{ik} = (-5, 0, 5)$ for k = 2,...,7, $s^{\beta}_{i1} = s^{\pi}_{i1} = (-20, 0, 20)$, and $s^{\gamma}_{ij} = (-2, 0, 2)$ for i,j = 1,2,3. The error supports for the reduced form and structural model were specified as $s^{z}_{in} = s^{w}_{in} = \left(-(\hat{\sigma}_i \cdot 3 + \epsilon_i),\ 0,\ \hat{\sigma}_i \cdot 3 + \epsilon_i\right)$, where $\hat{\sigma}_i$ is the standard deviation of the errors from the ith equation and from R3 we let $\epsilon_i = 2.5$ to ensure feasibility. See the Appendix for a more complete discussion of computational issues.
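Under this reading of the support specification, the supports can be assembled as follows (an illustrative sketch of ours; `sigma_hat` denotes the sample scale estimate for equation i):

```python
import numpy as np

def error_support(sigma_hat, j=3.0, eps=2.5):
    """Three-point error support (-(j*sigma + eps), 0, j*sigma + eps)."""
    c = j * sigma_hat + eps
    return np.array([-c, 0.0, c])

s_slope = np.array([-5.0, 0.0, 5.0])        # beta/pi supports, k = 2,...,7
s_intercept = np.array([-20.0, 0.0, 20.0])  # beta/pi supports, k = 1
s_gamma = np.array([-2.0, 0.0, 2.0])        # structural gamma supports
print(error_support(sigma_hat=2.0))         # [-8.5, 0., 8.5]
```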
Table 1 contains the mean values of the estimated Γ parameters based on 1,000 MC repetitions
for sample sizes of 5, 25, 100, 400, and 1,600 observations per equation. From this information,
we can infer several implications about the performance of the GME estimators. For a sample size of
five observations per equation, 2SLS and 3SLS estimators provide no solution due to insufficient
degrees of freedom. For five and 25 observations the GME-NLP and GME-GJM estimators have mean
values that are similar, although GME-NLP exhibits more bias. When the sample size is 100, the
GME-NLP estimator generally exhibits less bias. Like 2SLS and 3SLS, the GME-NLP estimator is
converging to the true coefficient values as N increases to 1,600 observations per equation (3SLS
estimates are not reported for 1,600 observations).
In Table 2 the standard error (SE) and MSE are reported for 3SLS and GME-NLP. The GME-NLP
estimator has uniformly lower standard error and MSE than does 3SLS. For small samples of
25 observations the MSE performance of the GME-NLP estimator is vastly improved relative to the
3SLS estimator, which is consistent with MC results from other studies relating to other GME-type
estimators [9,32]. As the sample size increases from 25 to 400 observations, both the standard error
and mean squared error of the 3SLS and GME-NLP converge towards each other. Interestingly, even
at a sample size of 100 observations the GME-NLP mean squared error remains notably superior
to 3SLS.
Table 1. Mean value of parameter estimates from 1000 Monte Carlo simulations using
2SLS, 3SLS, GME-GJM, and GME-NLP.
Obs      2SLS     3SLS     GME-GJM   GME-NLP
γ21 = 0.222
5        -        -        0.331     0.353
25       0.165    0.186    0.304     0.311
100      0.207    0.220    0.357     0.259
400      0.219    0.222    0.373     0.234
1,600    0.223    -        0.393     0.227
γ12 = 0.267
5        -        -        0.267     0.301
25       0.274    0.241    0.292     0.304
100      0.264    0.278    0.278     0.283
400      0.272    0.276    0.293     0.274
1,600    0.268    -        0.319     0.269
γ32 = 0.046
5        -        -        0.144     0.158
25       0.067    0.103    0.107     0.144
100      0.044    0.048    0.101     0.083
400      0.039    0.040    0.095     0.053
1,600    0.046    -        0.075     0.048
γ13 = 0.087
5        -        -        0.197     0.223
25       0.115    0.114    0.182     0.208
100      0.084    0.085    0.165     0.139
400      0.083    0.083    0.155     0.100
1,600    0.088    -        0.153     0.093
To investigate the size of the asymptotically normal test, the single hypothesis $H_0$: $\gamma_{ij} = k$ was tested with k set equal to the true values of the structural parameters. Critical values of the tests were based on a normal distribution with a 0.05 level of significance. An observation on the power of the respective tests was obtained by performing a test of significance whereby k = 0 in the preceding hypothesis. To complement this analysis, we investigated the size and power of a joint hypothesis $H_0$: $\gamma_{21} = k_1$, $\gamma_{32} = k_2$ using the Wald test. The scenarios were analyzed using 1000 MC repetitions for sample sizes of 25, 100, and 400 per equation.
Table 3 contains the rejection probabilities for the true and false hypotheses of both the GME-NLP and 3SLS estimators. The single hypothesis test for the parameter $\gamma_{21} = 0.222$ based on the asymptotically normal test responded well for GME-NLP (3SLS), yielding an estimated test size of 0.066 (0.043) and power of 0.980 (0.964) at 400 observations per equation. In contrast, for the remaining parameters, the size and power of the hypothesis tests were considerably less satisfactory. This is due in part to the second and third equations having substantially larger disturbance variability. For the joint hypothesis test based on the Wald test, the size and power perform well for GME-NLP (3SLS) with an estimated test size of 0.047 (0.047) and power of 0.961 (0.934) at 400 observations. Overall, the results indicate that based on asymptotic test statistics GME-NLP does not dominate, nor is it dominated by, 3SLS.
Table 2. Standard error (SE) and mean square error (MSE) of parameter estimates from
1000 Monte Carlo simulations using 3SLS and GME-NLP.
Obs      SE                      MSE
         3SLS      GME-NLP       3SLS      GME-NLP
γ21 = 0.222
5        -         0.101         -         0.027
25       0.442     0.155         0.197     0.032
100      0.143     0.116         0.021     0.015
400      0.065     0.064         0.004     0.004
γ12 = 0.267
5        -         0.103         -         0.012
25       1.281     0.166         1.641     0.029
100      0.459     0.183         0.211     0.034
400      0.198     0.149         0.039     0.022
γ32 = 0.046
5        -         0.168         -         0.041
25       0.842     0.256         0.711     0.075
100      0.449     0.226         0.201     0.052
400      0.183     0.158         0.033     0.025
γ13 = 0.087
5        -         0.120         -         0.033
25       0.669     0.202         0.448     0.055
100      0.269     0.188         0.073     0.038
400      0.133     0.121         0.018     0.015
Further MC results are presented to demonstrate the sensitivity of the GME-NLP to the sigma truncation rule (Table 4) and to illustrate the robustness of the GME-NLP relative to 3SLS in the presence of contaminated error models (Table 5). Each of these issues plays a critical role in empirical analysis of the SEM, while the latter can compound estimation problems especially in small sample estimation.

To obtain the results in Table 4, the error supports for the reduced form and structural model were specified as before with $s^{z}_{in} = s^{w}_{in} = \left(-(\sigma_i j + \epsilon_i),\ 0,\ \sigma_i j + \epsilon_i\right)$, where $\sigma_i$ is the standard deviation of the errors from the ith equation, j = 3, 4, 5, and from R3 $\epsilon_i = 2.5$, again for solution feasibility. The results exhibit a tradeoff between bias and MSE specific to the individual coefficient estimates. For $\gamma_{21}$ the bias and the MSE decrease as the truncation points are shrunk from five to three sigma. In contrast, for the remaining coefficients in Table 4, the MSE increases as the truncation points are decreased. The bias decreases for $\gamma_{32}$ and $\gamma_{13}$ as the truncation points are shrunk, while the direction of bias is ambiguous for $\gamma_{12}$. Predominantly, the empirical standard error of the coefficients decreased with wider truncation points. Overall, these results underscore that the mean and standard error of GME-NLP coefficient values are sensitive to the choice of truncation points.
Table 4. Mean, standard error (SE), and mean square error (MSE) of parameter estimates
from 1000 Monte Carlo simulations for GME-NLP with 3, 4, and 5-sigma truncation rules.
Obs      3-Sigma                  4-Sigma                  5-Sigma
         Mean    SE      MSE      Mean    SE      MSE      Mean    SE      MSE
γ21 = 0.222
25       0.311   0.155   0.030    0.336   0.133   0.031    0.345   0.111   0.033
100      0.259   0.116   0.015    0.277   0.111   0.015    0.292   0.108   0.017
400      0.234   0.064   0.004    0.244   0.066   0.005    0.247   0.063   0.005
γ12 = 0.267
25       0.304   0.166   0.029    0.303   0.120   0.016    0.301   0.095   0.010
100      0.283   0.183   0.034    0.283   0.146   0.021    0.285   0.118   0.014
400      0.274   0.149   0.022    0.271   0.130   0.017    0.272   0.115   0.013
γ32 = 0.046
25       0.144   0.256   0.075    0.144   0.203   0.051    0.164   0.152   0.037
100      0.083   0.226   0.052    0.101   0.199   0.042    0.113   0.158   0.029
400      0.053   0.158   0.025    0.063   0.137   0.019    0.068   0.128   0.017
γ13 = 0.087
25       0.208   0.202   0.055    0.210   0.145   0.036    0.217   0.109   0.029
100      0.139   0.188   0.038    0.157   0.157   0.030    0.176   0.139   0.027
400      0.100   0.121   0.015    0.111   0.112   0.013    0.127   0.106   0.013
Results from Table 5 provide the mean and MSE of the distribution of coefficient estimates for 3SLS and GME-NLP when the error term is contaminated by outcomes from an asymmetric distribution [14,15]. For a given percentage level $\alpha$, the errors for the structural equations are drawn from $(1-\alpha)\,N([0], \Sigma \otimes I) + \alpha\,F(2,3)$ and then truncated at ±3 standard deviations. We define $F(2,3) \equiv \mathrm{Beta}(2,3)\cdot 6$ and examine the robustness of 3SLS and GME-NLP with values of $\alpha$ = 0.1, 0.5, and 0.9. The error supports for the reduced form and structural model were specified with the three sigma rule. As evident in Table 5, when the percent of contamination induced in the error component of the SEM increases, performance of both estimators is detrimentally impacted. For 25 observations, the 3SLS coefficient estimates are much less robust to the contamination process than are the GME-NLP estimates as measured by the MSE values. At 100 observations the performance of 3SLS improves, but it still remains less robust than GME-NLP.
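Read as a row-wise mixture, the contaminated draws can be sketched as follows (ours; treating contamination as replacing entire observation rows, and clipping as a simple stand-in for truncation, are assumptions):

```python
import numpy as np

def contaminated_errors(rng, Sigma, n, alpha, k=3.0):
    """(1 - alpha) N(0, Sigma) + alpha F(2,3), with F(2,3) = 6 * Beta(2,3)."""
    G = Sigma.shape[0]
    normal = rng.multivariate_normal(np.zeros(G), Sigma, size=n)
    f23 = 6.0 * rng.beta(2.0, 3.0, size=(n, G))   # asymmetric contaminant
    pick = rng.random(n) < alpha                  # rows drawn from the contaminant
    out = np.where(pick[:, None], f23, normal)
    sd = np.sqrt(np.diag(Sigma))
    return np.clip(out, -k * sd, k * sd)          # clipped at +/- 3 std deviations

rng = np.random.default_rng(1)
Sigma = np.array([[1.0, -1.0, -0.125], [-1.0, 4.0, 0.0625], [-0.125, 0.0625, 8.0]])
E = contaminated_errors(rng, Sigma, n=25, alpha=0.5)
```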
Table 5. Mean and mean square error (in parentheses) of parameter estimates from 1000
Monte Carlo simulations for 3SLS and GME-NLP with contaminated normal distribution.
Obs      0.90N(0,Σ) + 0.10F(2,3)    0.50N(0,Σ) + 0.50F(2,3)    0.10N(0,Σ) + 0.90F(2,3)
         3SLS        GME-NLP        3SLS        GME-NLP        3SLS        GME-NLP
γ21 = 0.222
25       0.184       0.320          0.278       0.414          0.350       0.451
         (0.159)     (0.032)        (0.406)     (0.064)        (1.404)     (0.082)
100      0.226       0.262          0.243       0.329          0.268       0.368
         (0.023)     (0.016)        (0.082)     (0.037)        (0.204)     (0.050)
γ12 = 0.267
25       0.262       0.309          0.427       0.385          0.608       0.422
         (1.058)     (0.029)        (1.195)     (0.041)        (4.578)     (0.056)
100      0.267       0.282          0.356       0.339          0.374       0.364
         (0.353)     (0.036)        (0.551)     (0.038)        (0.726)     (0.044)
γ32 = 0.046
25       0.084       0.111          −0.009      0.105          −0.070      0.097
         (0.794)     (0.067)        (0.779)     (0.058)        (2.489)     (0.062)
100      0.061       0.082          0.010       0.067          −0.003      0.075
         (0.326)     (0.049)        (0.395)     (0.048)        (0.601)     (0.057)
γ13 = 0.087
25       0.081       0.198          0.094       0.198          0.083       0.219
         (0.330)     (0.048)        (0.401)     (0.056)        (1.366)     (0.067)
100      0.093       0.142          0.093       0.144          0.077       0.150
         (0.061)     (0.036)        (0.059)     (0.038)        (0.124)     (0.055)
4.5. Discussion
The performance of the GME-NLP estimator was based on a variety of MC experiments. In small
and medium sample situations (≤100 observations) the GME-NLP is MSE superior to 3SLS for the
defined experiments. Increasing the sample size clearly demonstrated consistency of the GME-NLP
estimator for the SEM. Regarding performance in single or joint hypothesis testing contexts, the
empirical results indicate that the GME-NLP did not dominate, nor was it dominated by 3SLS.
The MC evidence provided above indicates that applying the multiple-sigma truncation rule with a
sample scale parameter to estimate the GME-NLP model is a useful empirical approach. Across the 3,
4, and 5-sigma rule sampling experiments, GME-NLP continued to dominate 3SLS in MSE for 25,
100, and 400 observations per equation. For wider truncation points the empirical SE of the
coefficients decreased. However, these results also demonstrate that the GME-NLP coefficients are sensitive to the choice of truncation points, with no clear basis for choosing narrower (3-sigma) over wider (5-sigma) truncation supports under a Gaussian error structure. We suggest that additional research is needed to optimally identify error truncation points.
Finally, the GME-NLP estimator exhibited more robustness in the presence of contaminated errors
relative to 3SLS. The MC analysis illustrates that deviations from normality assumptions in asymptotically
justified econometric-statistical models lead to dramatically less robust outcomes in small samples.
References [9,16] emphasized that under traditional econometric assumptions, when samples are Gaussian in nature and sample moments are taken as minimal sufficient statistics, then no information may be lost. However, they point out that outside the Gaussian setting, reducing data constraints to moment constraints can waste sample information and result in estimators that are less than fully
efficient. The above MC analysis suggests that GME-NLP, which relies on full sample information but
does not rely on a full parametric specification such as maximum likelihood, can be more robust to
alternative error distributions.
5. Empirical Illustration
Klein’s Model I was selected as an empirical application because it has been extensively applied in
many studies. Klein’s macroeconomic model is highly aggregated with relatively low parameter
dimensionality, making it useful for pedagogical purposes. It is a three-equation SEM based on annual
data for the United States from 1920 to 1941. All variables are in billions of constant dollars with base year 1934 (for a complete description of the model and data see [1,38]).
The model is comprised of three stochastic equations and five identities. The stochastic equations
include demand for consumption, investment, and labor. Klein’s consumption function is given as:
$$CN_t = \beta_{11} + \gamma_{11}(W1_t + W2_t) + \gamma_{21}P_t + \beta_{21}P_{t-1} + \varepsilon_{t1}$$

where $CN_t$ is consumption, $W1_t$ is wages earned by workers in the private sector, $W2_t$ is wages earned by government workers, $P_t$ is nonwage income (profit), and $\varepsilon_{t1}$ is a stochastic error term. This equation describes aggregate consumption as a function of the total wage bill and current and lagged profit. The investment equation is given by:

$$I_t = \beta_{12} + \gamma_{12}P_t + \beta_{22}P_{t-1} + \beta_{32}K_{t-1} + \varepsilon_{t2}$$

where $I_t$ is net investment, $K_t$ is the stock of capital goods at the end of the year, and $\varepsilon_{t2}$ is a stochastic error term. This equation implies that net investment reacts to current and lagged profits, as well as beginning of the year capital stocks. The demand for labor is given by:

$$W1_t = \beta_{13} + \gamma_{13}E_t + \beta_{23}E_{t-1} + \beta_{33}A_t + \varepsilon_{t3}$$

where $E_t$ is private product and $A_t$ is a time trend, consistent with the parameter labels reported in Table 6.
Table 6 contains the estimates of the three stochastic equations using ordinary least squares (OLS),
two stage least squares (2SLS), three stage least squares (3SLS), and GME-NLP. Parameter
restrictions for GME-NLP were specified using the fairly uninformative reference support points
(-50,0,50) for the intercept, (-5,0,5) for the slope parameters of the reduced form models and
(-2,0,2) for the slope parameters of the structural form models. Truncation points for the error
supports of the structural model are specified using both three- and five-sigma rules.
For the given truncation points, the GME-NLP estimates of asymptotic standard errors are greater
than those of the other estimators. It is to be expected that if more informative parameter support
ranges had been used when representing the feasible space of the parameters, standard errors would
have been reduced. In most of the cases, the parameter, standard error, and R2 measures were not
particularly sensitive to the choice of error truncation point, although there were a few notable
exceptions dispersed throughout the three equation system.
Table 6. Structural parameter estimates and standard errors (in parentheses) of Klein’s
Model I using OLS, 2SLS, 3SLS, and GME-NLP.
Structural Parameter      OLS        2SLS       3SLS       GME-NLP    GME-NLP
                                                           3-sigma    5-sigma
Consumption
β11                       16.237     16.555     16.441     14.405     14.374
                          (1.303)    (1.468)    (12.603)   (2.788)    (2.625)
γ11                       0.796      0.810      0.790      0.772      0.750
                          (0.040)    (0.045)    (0.038)    (0.073)    (0.071)
γ21                       0.193      0.017      0.125      0.325      0.280
                          (0.091)    (0.131)    (0.108)    (0.372)    (0.306)
β21                       0.090      0.216      0.163      0.120      0.206
                          (0.091)    (0.119)    (0.100)    (0.332)    (0.274)
R2                        0.981      0.929      0.928      0.916      0.922
Investment
β12                       10.126     20.278     28.178     8.394      9.511
                          (5.466)    (8.383)    (6.79)     (10.012)   (10.940)
γ12                       0.480      0.150      −0.013     0.440      0.358
                          (0.097)    (0.193)    (0.162)    (0.386)    (0.362)
β22                       0.333      0.616      0.756      0.340      0.350
                          (0.101)    (0.181)    (0.153)    (0.342)    (0.325)
β32                       −0.112     −0.158     −0.195     −0.100     −0.100
                          (0.027)    (0.040)    (0.033)    (0.046)    (0.051)
R2                        0.931      0.837      0.831      0.819      0.811
Labor
β13                       1.497      1.500      1.797      2.423      1.859
                          (1.270)    (1.276)    (1.12)     (3.112)    (3.157)
γ13                       0.439      0.439      0.400      0.481      0.381
                          (0.032)    (0.040)    (0.032)    (0.255)    (0.178)
β23                       0.146      0.147      0.181      0.087      0.200
                          (0.037)    (0.043)    (0.034)    (0.272)    (0.180)
β33                       0.130      0.130      0.150      0.112      0.114
                          (0.032)    (0.032)    (0.028)    (0.091)    (0.085)
R2                        0.987      0.942      0.941      0.905      0.907
The Klein Model I benchmarks the GME-NLP estimator relative to OLS, 2SLS, and 3SLS.
Comparisons are based on the sum of the squared difference (SSD) measures between GME-NLP and
the OLS, 2SLS and 3SLS parameter estimates. Turning to the consumption model, the SSD is smallest
(largest) between GME-NLP and OLS (3SLS) parameter estimates for both the three- and five-sigma
rules (but only marginally). For example, the SSD between OLS (3SLS) and GME-NLP under the
3-sigma is 3.35 (4.15). Alternatively, for the labor model, the SSD is smallest (largest) between
GME-NLP and 3SLS (OLS) parameter estimates for both the three- and five-sigma rules. The most
dramatic differences arise in the investment model. For example, the SSD between OLS (3SLS) and
GME-NLP under the 3-sigma is 3.00 (391.79). This comparison underscores divergences that exist
between GME-NLP and 2SLS and 3SLS estimators. In addition to the information introduced by the
parameter support spaces, another reason for this divergence may be due to the fact that GME-NLP is
a single-step estimator that is completely consistent with data constraints Equations (6) and (7), while
2SLS and 3SLS are multiple step estimators that only approximate the NLP model and ignore
nonlinear interactions between reduced and structural form coefficients. The nonlinear specification of
GME-NLP leads to first order optimization conditions (Equation (A15), derived in the Appendix) that are
different from other multiple-step or asymptotically justified estimators such as 2SLS and 3SLS.
Overall, the SSD comparisons characterize finite sample differences in the GME-NLP estimator
relative to more traditional estimators.
6. Conclusions
The GME-NLP estimator developed here offers an alternative for estimating the linear SEM in the presence of small samples or ill-posed problems, and underscores the need for continued research on a number of problems in small sample estimation based on asymptotically justified estimators.
Acknowledgments
We thank George Judge (Berkeley) for helpful comments and suggestions. All errors remaining are
the sole property of the authors.
Conflicts of Interest

The authors declare no conflicts of interest.
Appendix
To facilitate both the derivation of the asymptotic properties and computational efficiency of
the GME-NLP estimator, we reformulate the maximum entropy model into scalar notation that is
completely consistent with Equations (5)–(8) (under the prevailing assumptions and the constraints
Equations A1–A8 defined below). The scalar notation exhibits the flexibility to use different numbers
of support points for each parameter or error term. However, we simplify the notation by using M
support points for each parameter and error term.
Let $\Theta$ represent a bounded, convex, and dense parameter space containing the $Q \times 1$ vector of the reduced form and structural parameters $\theta = \mathrm{vec}(\theta^{\beta}, \theta^{\gamma}, \theta^{\pi})$. The reformulated constrained maximum entropy model is defined as:
$$\max_{p^{\beta}, p^{\gamma}, p^{\pi}, z, w}\left\{-\sum_{kgm} p^{\beta}_{kgm}\ln p^{\beta}_{kgm} - \sum_{igm} p^{\gamma}_{igm}\ln p^{\gamma}_{igm} - \sum_{kgm} p^{\pi}_{kgm}\ln p^{\pi}_{kgm} - \sum_{ngm} w_{ngm}\ln w_{ngm} - \sum_{ngm} z_{ngm}\ln z_{ngm}\right\} \qquad (A1)$$

subject to:

$$\sum_{m=1}^{M} s^{\beta}_{kgm}\,p^{\beta}_{kgm} = \beta_{kg};\qquad \beta_{kgL} = s^{\beta}_{kg1} < \ldots < s^{\beta}_{kgM} = \beta_{kgH} \qquad (A2)$$

$$\sum_{m=1}^{M} s^{\gamma}_{igm}\,p^{\gamma}_{igm} = \gamma_{ig};\qquad \gamma_{igL} = s^{\gamma}_{ig1} < \ldots < s^{\gamma}_{igM} = \gamma_{igH} \qquad (A3)$$

$$\sum_{m=1}^{M} s^{\pi}_{kgm}\,p^{\pi}_{kgm} = \pi_{kg};\qquad \pi_{kgL} = s^{\pi}_{kg1} < \ldots < s^{\pi}_{kgM} = \pi_{kgH} \qquad (A4)$$

$$\sum_{m=1}^{M} s^{w}_{ngm}\,w_{ngm} = u_{ng} \equiv y_{ng} - X_n\Pi^{\theta}_{(-g)}\theta^{\gamma}_{g} - X_{gn}\theta^{\beta}_{g};\qquad c_{g1} = s^{w}_{ng1} < \ldots < s^{w}_{ngM} = c_{gM} \qquad (A5)$$

$$\sum_{m=1}^{M} s^{z}_{ngm}\,z_{ngm} = v_{ng} \equiv y_{ng} - X_n\theta^{\pi}_{g};\qquad c_{g1} = s^{z}_{ng1} < \ldots < s^{z}_{ngM} = c_{gM} \qquad (A6)$$

$$s^{i}_{jgm} = -s^{i}_{jg(M+1-m)}\ \text{for}\ m = 1,\ldots,M\ \left(\text{where for } M \text{ odd } s^{i}_{jg\frac{M+1}{2}} = 0 \text{ and } i = w, z\right) \qquad (A7)$$

$$\sum_{m=1}^{M} p^{\beta}_{kgm} = 1,\quad \sum_{m=1}^{M} p^{\gamma}_{igm} = 1,\quad \sum_{m=1}^{M} p^{\pi}_{kgm} = 1,\quad \sum_{m=1}^{M} w_{ngm} = 1,\quad \sum_{m=1}^{M} z_{ngm} = 1 \qquad (A8)$$
Constraints A2–A6 define the reparameterized coefficients and errors with supports. In A5 the term $\Pi^{\theta}_{(-g)}$ is a $K \times G_g$ matrix of elements $\theta^{\pi}_{kg}$ that coincide with the endogenous variables in $Y_{(-g)}$. The constraint A7 implies symmetry of the error supports about the origin and A8 defines the normalization conditions. The nonnegativity restrictions on $p^{\beta}_{kgm}$, $p^{\gamma}_{igm}$, $p^{\pi}_{kgm}$, $w_{ngm}$, and $z_{ngm}$ are inherently satisfied by the optimization problem and are not explicitly incorporated into the constraint set.
Next, we define the conditional entropy function by conditioning on $\theta^{\beta} = \tau^{\beta}$, $\theta^{\gamma} = \tau^{\gamma}$, and $\theta^{\pi} = \tau^{\pi}$, or simply $\theta = \tau$ where $\theta = \mathrm{vec}(\theta^{\beta}, \theta^{\gamma}, \theta^{\pi})$ and $\tau = \mathrm{vec}(\tau^{\beta}, \tau^{\gamma}, \tau^{\pi})$. This yields:

$$F(\tau) = \max_{p^{\beta}, p^{\gamma}, p^{\pi}, z, w\,:\,\theta = \tau}\left\{-\sum_{kgm} p^{\beta}_{kgm}\ln p^{\beta}_{kgm} - \sum_{igm} p^{\gamma}_{igm}\ln p^{\gamma}_{igm} - \sum_{kgm} p^{\pi}_{kgm}\ln p^{\pi}_{kgm} - \sum_{ngm} w_{ngm}\ln w_{ngm} - \sum_{ngm} z_{ngm}\ln z_{ngm}\right\} \qquad (A9)$$

The optimal value of $z_{ngm}$ in the conditionally-maximized entropy function is the solution to the Lagrangian

$$L\left(z_{ng}, \rho^{z}_{ng}, \lambda^{z}_{ng}\right) = -\sum_{m=1}^{M} z_{ngm}\ln(z_{ngm}) + \rho^{z}_{ng}\left(\sum_{m=1}^{M} z_{ngm} - 1\right) + \lambda^{z}_{ng}\left(\sum_{m=1}^{M} s^{z}_{ngm}z_{ngm} - v_{ng}(\tau)\right)$$

and is given by:

$$z_{ngm}\left(\lambda^{z}_{ng}(v_{ng}(\tau))\right) = \frac{e^{\lambda^{z}_{ng}(v_{ng}(\tau))\,s^{z}_{ngm}}}{\sum_{j=1}^{M} e^{\lambda^{z}_{ng}(v_{ng}(\tau))\,s^{z}_{ngj}}},\quad m = 1,\ldots,M$$

Similarly:

$$w_{ngm}\left(\lambda^{w}_{ng}(u_{ng}(\tau))\right) = \frac{e^{\lambda^{w}_{ng}(u_{ng}(\tau))\,s^{w}_{ngm}}}{\sum_{j=1}^{M} e^{\lambda^{w}_{ng}(u_{ng}(\tau))\,s^{w}_{ngj}}},\quad m = 1,\ldots,M \qquad (A10)$$

solves $L\left(w_{ng}, \rho^{w}_{ng}, \lambda^{w}_{ng}\right) = -\sum_{m=1}^{M} w_{ngm}\ln(w_{ngm}) + \rho^{w}_{ng}\left(\sum_{m=1}^{M} w_{ngm} - 1\right) + \lambda^{w}_{ng}\left(\sum_{m=1}^{M} s^{w}_{ngm}w_{ngm} - u_{ng}(\tau)\right)$. The identities:

$$\sum_{m=1}^{M} s^{z}_{ngm}\,z_{ngm}\left(\lambda^{z}_{ng}(-v_{ng}(\tau))\right) = -\sum_{m=1}^{M} s^{z}_{ngm}\,z_{ngm}\left(\lambda^{z}_{ng}(v_{ng}(\tau))\right) = -v_{ng}(\tau) \qquad (A11)$$

and:

$$\sum_{m=1}^{M} s^{w}_{ngm}\,w_{ngm}\left(\lambda^{w}_{ng}(-u_{ng}(\tau))\right) = -\sum_{m=1}^{M} s^{w}_{ngm}\,w_{ngm}\left(\lambda^{w}_{ng}(u_{ng}(\tau))\right) = -u_{ng}(\tau) \qquad (A12)$$

follow from the symmetry of the support points around zero. Likewise the optimal values of $p^{\eta}_{kgm}$ (for $\eta = \beta, \gamma, \pi$) are respectively:

$$p^{\eta}_{kgm} = \frac{e^{\lambda^{\eta}_{kg}\,s^{\eta}_{kgm}}}{\sum_{j=1}^{M} e^{\lambda^{\eta}_{kg}\,s^{\eta}_{kgj}}},\quad m = 1,\ldots,M \qquad (A13)$$

which satisfy $L\left(p^{\eta}_{kg}, \rho^{\eta}_{kg}, \lambda^{\eta}_{kg}\right) = -\sum_{m=1}^{M} p^{\eta}_{kgm}\ln(p^{\eta}_{kgm}) + \rho^{\eta}_{kg}\left(\sum_{m=1}^{M} p^{\eta}_{kgm} - 1\right) + \lambda^{\eta}_{kg}\left(\sum_{m=1}^{M} s^{\eta}_{kgm}p^{\eta}_{kgm} - \eta_{kg}\right)$. For notational convenience we let $\lambda^{z}_{ng} = \lambda^{z}_{ng}(v_{ng}(\tau))$, $\lambda^{w}_{ng} = \lambda^{w}_{ng}(u_{ng}(\tau))$, $\lambda^{\beta}_{kg} = \lambda^{\beta}_{kg}(\beta_{kg})$, $\lambda^{\gamma}_{ig} = \lambda^{\gamma}_{ig}(\gamma_{ig})$, and $\lambda^{\pi}_{kg} = \lambda^{\pi}_{kg}(\pi_{kg})$ represent the optimal values of the Lagrangian multipliers. Substituting the solutions defined in Equations (A10)–(A13) into the conditional objective function yields the conditional maximum value entropy function:

$$F(\tau) = \sum_{kg}\left[-\lambda^{\beta}_{kg}\beta_{kg} + \ln\sum_{m}\exp\left(\lambda^{\beta}_{kg}s^{\beta}_{kgm}\right)\right] + \sum_{jg}\left[-\lambda^{\gamma}_{jg}\gamma_{jg} + \ln\sum_{m}\exp\left(\lambda^{\gamma}_{jg}s^{\gamma}_{jgm}\right)\right] + \sum_{kg}\left[-\lambda^{\pi}_{kg}\pi_{kg} + \ln\sum_{m}\exp\left(\lambda^{\pi}_{kg}s^{\pi}_{kgm}\right)\right] + \sum_{ng}\left[-\lambda^{w}_{ng}u_{ng}(\tau) + \ln\sum_{m}\exp\left(\lambda^{w}_{ng}s^{w}_{ngm}\right)\right] + \sum_{ng}\left[-\lambda^{z}_{ng}v_{ng}(\tau) + \ln\sum_{m}\exp\left(\lambda^{z}_{ng}s^{z}_{ngm}\right)\right] \qquad (A14)$$

The gradient of $F(\tau)$ with respect to $\tau$ is:

$$\frac{\partial F(\tau)}{\partial \tau} = -\begin{pmatrix}\lambda^{\pi}(\tau)\\ \lambda^{\gamma}(\tau)\\ \lambda^{\beta}(\tau)\end{pmatrix} + \begin{pmatrix}(I_G \otimes X)' & (\Gamma(\tau)' \otimes X)'\\ [0] & \mathrm{diag}\left(X\Pi(\tau)_{(-1)}, \ldots, X\Pi(\tau)_{(-G)}\right)'\\ [0] & \mathrm{diag}\left(X_1, \ldots, X_G\right)'\end{pmatrix}\begin{pmatrix}\lambda^{z}\\ \lambda^{w}\end{pmatrix} = -\lambda^{\theta}(\tau) + Z^{*}(\tau)'\begin{pmatrix}\lambda^{z}\\ \lambda^{w}\end{pmatrix} \qquad (A15)$$

Above, $\Gamma(\tau)$ is a $G \times G$ matrix of elements $\gamma_{ig}$ and $\Pi(\tau)_{(-g)}$ is a $K \times G_g$ matrix of elements $\pi_{kg}$. The Lagrangian multipliers are vertically concatenated into $\lambda^{\beta}, \lambda^{\gamma}, \lambda^{\pi}, \lambda^{w}, \lambda^{z}$, where, for example, the vector $\lambda^{w} \equiv \mathrm{vec}(\lambda^{w}_1, \ldots, \lambda^{w}_G)$ is of dimension $NG \times 1$ and is made up of $\lambda^{w}_g = (\lambda^{w}_{1g}, \ldots, \lambda^{w}_{Ng})'$ for g = 1,...,G.
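The exponential-form solutions can be evaluated directly. The following sketch (ours; sign conventions for the multipliers may differ from the derivation above) recovers the weights for a single term given its support and realized value by solving the one-dimensional dual condition:

```python
import numpy as np
from scipy.optimize import brentq

def weights(lam, s):
    """w_m = exp(lam * s_m) / sum_j exp(lam * s_j), cf. the form of (A10) and (A13)."""
    e = np.exp(lam * s - np.max(lam * s))   # stabilized softmax
    return e / e.sum()

def solve_multiplier(target, s, bound=50.0):
    """Find lam with sum_m s_m w_m(lam) = target (target inside (min s, max s))."""
    return brentq(lambda lam: s @ weights(lam, s) - target, -bound, bound)

s = np.array([-3.0, 0.0, 3.0])
lam = solve_multiplier(1.2, s)
w = weights(lam, s)
print(w, s @ w)   # the recovered weights reproduce the target mean 1.2
```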
The $Q \times Q$ Hessian matrix of the conditional maximum value function $F(\tau)$, obtained by differentiating the gradient in Equation (A15), is given by:

$$H(\tau) = \left(I \otimes \begin{pmatrix}\lambda^{z}\\ \lambda^{w}\end{pmatrix}\right)' \odot \frac{\partial Z^{*}(\tau)}{\partial \tau} - \Xi^{\theta}(\tau) - Z^{*}(\tau)'\,\Xi(\tau)\,Z^{*}(\tau) \qquad (A16)$$

where $\odot$ denotes the Hadamard product (element wise) between two matrices. The $Q \times Q$ diagonal matrix $\Xi^{\theta}(\tau) = \mathrm{diag}\left(\Xi^{\pi}(\tau), \Xi^{\gamma}(\tau), \Xi^{\beta}(\tau)\right)$ is defined by:

$$\left[\Xi^{\eta}\right]_{rt} = \begin{cases}\left[\sum_{m=1}^{M}\left(s^{\eta}_{kgm}\right)^{2}p^{\eta}_{kgm} - \left(\eta_{kg}\right)^{2}\right]^{-1} & \text{if } k = r,\ g = t\\ 0 & \text{otherwise}\end{cases}\qquad \text{for } \eta = \beta, \gamma, \pi$$

where:

$$\xi^{w}_{ng}(\tau) = \left[\sum_{m=1}^{M}\left(s^{w}_{ngm}\right)^{2}w_{ngm} - u_{ng}(\tau)^{2}\right]^{-1}$$

and:

$$\xi^{z}_{ng}(\tau) = \left[\sum_{m=1}^{M}\left(s^{z}_{ngm}\right)^{2}z_{ngm} - v_{ng}(\tau)^{2}\right]^{-1}$$

are the diagonal elements of $\Xi(\tau)$. By the Cauchy-Schwarz inequality, the symmetry assumption on the supports, and the adding up conditions, $-\Xi^{\theta}(\tau) - Z^{*}(\tau)'\,\Xi(\tau)\,Z^{*}(\tau)$ is a negative definite matrix. Next, we prove consistency and asymptotic normality of the GME-NLP estimator.
Theorem 1. Under the regularity conditions R1–R5, the GME-NLP estimator, $\hat{\theta} \equiv \mathrm{vec}(\hat{\pi}, \hat{\delta})$, is a consistent estimator of the true coefficient values $\theta \equiv \mathrm{vec}(\pi, \delta)$.
Proof. Let $\Theta$ represent a bounded, convex, and dense parameter space such that the true coefficient values $\theta \in \Theta$. Consider the just identified case. From Equations (5)–(8), the component

$$\max_{p^{\beta}, p^{\gamma}, p^{\pi}, w, z}\left\{-w'\ln w\right\}$$

is not a function of $p^{\pi}$ or z almost everywhere. Furthermore, it is not a function of the reduced form coefficients satisfying the identification conditions that are discussed after Equation (4). In addition, the conditional structural estimator is:

$$\hat{\delta}(\tau^{\pi}) = \left(\hat{\gamma}(\tau^{\pi}), \hat{\beta}(\tau^{\pi})\right) = \arg\max_{\tau^{\gamma}, \tau^{\beta}:\,\tau \in \Theta} F(\tau)$$

for $\tau^{\pi}$ in the parameter set that satisfies the identification conditions. By [32]:

$$\hat{\gamma}(\tau^{\pi}) \overset{p}{\to} \gamma(\tau^{\pi}) \quad\text{and}\quad \hat{\beta}(\tau^{\pi}) \overset{p}{\to} \beta(\tau^{\pi})$$

and:

$$\hat{\gamma}(\pi) \overset{p}{\to} \gamma(\pi) = \gamma \quad\text{and}\quad \hat{\beta}(\pi) \overset{p}{\to} \beta(\pi) = \beta$$

Then by [39]:

$$\left(\hat{\pi}, \hat{\gamma}(\hat{\pi}), \hat{\beta}(\hat{\pi})\right) \overset{p}{\to} (\pi, \gamma, \beta)$$

which establishes consistency for the just identified case. Further results pertaining to the overidentified case are available from the authors upon request.
Theorem 2. Under the conditions of Theorem 1, the GME-NLP estimator, $\hat{\delta} \equiv \mathrm{vec}(\hat{\delta}_1, \ldots, \hat{\delta}_G)$, is asymptotically normally distributed as $\hat{\delta} \overset{a}{\sim} N\left(\delta,\ \frac{1}{N}\Omega^{-1}\bar{\Omega}\,\Omega^{-1}\right)$.
Proof. Let $\hat{\delta}$ be the GME-NLP estimator of $\delta \equiv \mathrm{vec}(\delta_1, \ldots, \delta_G)$. Expand the gradient vector in a Taylor series around $\delta$ to obtain:

$$\nabla_{\delta}F(\hat{\delta}) = \nabla_{\delta}F(\delta) + H(\delta^{*})\left(\hat{\delta} - \delta\right) \qquad (A17)$$

where $\delta^{*}$ is between $\hat{\delta}$ and $\delta$. Since $\hat{\delta}$ is a consistent estimator of $\delta$, then $\delta^{*} \overset{p}{\to} \delta$. Using this information and the fact that $\nabla_{\delta}F(\hat{\delta}) = [0]$ at the optimum, then:

$$\sqrt{N}\left(\hat{\delta} - \delta\right) \overset{d}{=} -\left(\frac{1}{N}H(\delta)\right)^{-1}\frac{1}{\sqrt{N}}\nabla_{\delta}F(\delta)$$

where both the left hand and right hand side terms have equivalent limiting distributions. Note that $\frac{1}{N}H(\delta) = -\frac{1}{N}Z'\,\Xi^{w}\,Z + O_{p}(N^{-1})$, where Z is the block diagonal matrix $Z = \mathrm{diag}(Z_1, \ldots, Z_G)$. From the regularity conditions, $-\frac{1}{N}H(\delta) \overset{p}{\to} \Omega \equiv \mathrm{diag}(\eta_1\Psi_1, \ldots, \eta_G\Psi_G)$, where $\Omega$ is a positive definite matrix. Because the $\xi^{w}_{ng}(\theta)$ and $\xi^{z}_{ng}(\theta)$ are iid for n = 1,...,N, then $\eta_g \equiv E[\xi^{w}_{ng}]$. The scaled gradient $\frac{1}{\sqrt{N}}\nabla_{\delta}F(\delta)$ is asymptotically normally distributed with covariance matrix $\frac{1}{N}Z'\left(\Sigma^{w}\otimes I\right)Z \to \bar{\Omega}$, where $\Sigma^{w}$ is a $G \times G$ covariance matrix for the $\lambda^{w}_{g}$'s (see [40,41]). From the above results and by applying Slutsky's Theorem:

$$\sqrt{N}\left(\hat{\delta} - \delta\right) \overset{a}{\sim} N\left([0],\ \Omega^{-1}\bar{\Omega}\,\Omega^{-1}\right)$$

and:

$$\hat{\delta} \overset{a}{\sim} N\left(\delta,\ \frac{1}{N}\Omega^{-1}\bar{\Omega}\,\Omega^{-1}\right)$$
Proposition 1. Under the assumptions of Theorem 1, the reduced form estimates of (3) are consistent,
p
πˆ arg max z τ ln z τ π .
τ
Proof. With the exception that we account for contemporaneous correlation in the errors, this is the
proof for consistency of the data-constrained GME estimator of the general linear model [32].
Consider the conditional maximum function:
z z z
FR τ ng vng τ ln exp ng
sng
ng m
where v ng y ng X n π g .
We expand FR τ about π with a Taylor series expansion that yields:
FR τ FR π π τ π 12 τ π H R π* τ π
where π* lies between τ and π . The gradient vector is given by I X λ z and the Hessian
matrix is H R I X Ξz I X . The scaled gradient term is asymptotically normally
d
distributed as 1
N
π N [0], Ω R by a multivariate version of Liapounov’s central limit theorem
The quadratic term satisfies:
$$\tfrac{1}{2}\,(\tau - \pi)'\,H_{R}(\pi^{*})\,(\tau - \pi) \;\le\; -\tfrac{1}{2}\,N\,s\,\|\tau - \pi\|^{2}$$
The parameter $s$ denotes the smallest eigenvalue of $-N^{-1}H_{R}(\pi^{*})$ for any $\pi^{*}$ that lies between $\tau$ and $\pi$, where $\|a\| = \big(\sum_{k=1}^{K} a_{k}^{2}\big)^{1/2}$ denotes the standard vector norm.
Combining the elements from above, for all $\varepsilon > 0$, $P\big(\max_{\tau:\,\|\tau - \pi\| \ge \varepsilon} F_{R}(\tau) - F_{R}(\pi) < 0\big) \rightarrow 1$ as $N \rightarrow \infty$. Thus, $\hat{\pi} = \arg\max_{\tau}\big\{-z(\tau)'\ln z(\tau)\big\} \xrightarrow{p} \pi$.
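To make the concentrated approach of Proposition 1 concrete, the sketch below (ours; the paper's implementation is in GAUSS) estimates a single reduced-form equation $y = X\pi + v$ by minimizing the standard unconstrained GME dual over the Lagrange multipliers $\lambda$ and recovering $\hat{\pi}$ from the implied support-point probabilities. The supports, simulated data, and names are illustrative assumptions.

```python
# Minimal GME sketch for one reduced-form equation y = X pi + v, via the
# standard unconstrained entropy dual (in the spirit of F_R above and [32]).
# Supports, data, and names are illustrative, not from the paper.
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

rng = np.random.default_rng(0)
N, K, M = 50, 3, 3                                   # observations, coefficients, support points

X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
pi_true = np.array([1.0, -2.0, 0.5])
y = X @ pi_true + rng.normal(size=N)

S = np.tile(np.array([-10.0, 0.0, 10.0]), (K, 1))    # K x M coefficient supports
V = np.tile(3.0 * y.std() * np.array([-1.0, 0.0, 1.0]), (N, 1))  # N x M error supports, three-sigma rule [33]

def dual(lam):
    # lam'y + sum_k ln sum_m exp(-S_km (X'lam)_k) + sum_n ln sum_m exp(-V_nm lam_n):
    # minimizing this over lam solves the entropy maximization with the data
    # constraints y = X(S p) + (V w) concentrated out.
    xl = X.T @ lam
    return (lam @ y
            + logsumexp(-S * xl[:, None], axis=1).sum()
            + logsumexp(-V * lam[:, None], axis=1).sum())

lam_hat = minimize(dual, np.zeros(N), method="BFGS").x
P = np.exp(-S * (X.T @ lam_hat)[:, None])
P /= P.sum(axis=1, keepdims=True)                    # implied coefficient probabilities
pi_hat = (P * S).sum(axis=1)                         # pi_k = sum_m S_km p_km
print(np.round(pi_hat, 2))
```

Here the $N$ multipliers are the only free unknowns and all $(K + N)M$ probability weights are recovered analytically; Equation (A15) pushes the same concentration one step further, leaving only the coefficients themselves as unknowns.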
To estimate the GME-NLP model, the conditional entropy function (Equation (A15)) was maximized. Note that the constrained maximization problem in Equations (5)–(8) requires estimation of (Q + 2GNM) unknown parameters. Solving Equations (5)–(8) for (Q + 2GNM) unknowns is not computationally practical as the sample size, N, grows larger. For example, consider an empirical application with Q = 36 coefficients, G = 3 equations, and M = 3 support points. Even for a small number of observations, say N = 50, the number of unknown parameters would be 936. In contrast, maximizing Equation (A15) requires estimation of only Q unknown coefficients for any sample size N.
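The arithmetic behind this comparison is immediate; a quick check (illustrative only):

```python
# Unknowns in the primal problem (5)-(8) versus the concentrated dual (A15),
# for the example Q = 36, G = 3, M = 3, N = 50 quoted above.
Q, G, M, N = 36, 3, 3, 50
print(Q + 2 * G * N * M)   # primal: 936 unknown parameters
print(Q)                   # dual (A15): 36 unknown coefficients
```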
The GME-NLP estimator uses the reduced and structural form models as data constraints with a
dual objective function as part of its information set. To completely specify the GME-NLP model,
support (upper and lower truncation and intermediate) points for the individual parameters, support
points for each error term, and Q starting values for the parameter coefficients are supplied by the user.
In the Monte Carlo analysis and empirical application, the model was estimated using the unconstrained
optimizer OPTIMUM in the econometric software GAUSS. We used 3 support points for each
parameter and error term. To increase the efficiency of the estimation process, the analytical gradient and Hessian were coded in GAUSS and called in the optimization routine. This also offered an opportunity to empirically validate the derivations of the gradient, Hessian, and covariance matrix. Given suitable starting values, the optimization routine generally converged within seconds for the empirical examples discussed above. Moreover, solutions were quite robust to alternative starting values.
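A cheap way to replicate that validation step, using the single-equation sketch above, is to compare a coded analytical gradient against finite differences; `check_grad` and the names below are our illustrative choices, not the authors' GAUSS routines.

```python
# Numerical validation of an analytical gradient, mirroring the check on the
# coded gradient/Hessian described above. Reuses dual, X, y, S, V, N from the
# earlier sketch; softmax recovers the implied probabilities.
import numpy as np
from scipy.optimize import check_grad
from scipy.special import softmax

def dual_grad(lam):
    # Gradient of the dual: y - X beta(lam) - v(lam), with beta and v the means
    # of the coefficient and error supports under the implied probabilities.
    P = softmax(-S * (X.T @ lam)[:, None], axis=1)
    W = softmax(-V * lam[:, None], axis=1)
    return y - X @ (P * S).sum(axis=1) - (W * V).sum(axis=1)

# Discrepancy should be at finite-difference noise level (roughly 1e-6 or below).
print(check_grad(dual, dual_grad, 0.1 * np.ones(N)))
```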
References
1. Theil, H. Principles of Econometrics; John Wiley & Sons: New York, NY, USA, 1971.
2. Zellner, A.; Theil, H. Three-stage least squares: Simultaneous estimation of simultaneous equations.
Econometrica 1962, 30, 54–78.
3. Fuller, W.A. Some properties of a modification of the limited information estimator. Econometrica
1977, 45, 939–953.
4. Koopmans, T.C. Statistical Inference in Dynamic Economic Models; Cowles Commission
Monograph 10; Wiley: New York, NY, USA, 1950.
5. Hausman, J.A. Full information instrumental variable estimation of simultaneous equations
systems. Ann. Econ. Soc. Meas. 1974, 3, 641–652.
6. Zellner, A. Statistical analysis of econometric models. J. Am. Stat. Assoc. 1976, 74, 628–643.
7. Zellner, A. The finite sample properties of simultaneous equations estimates and estimators: Bayesian and non-Bayesian approaches. J. Econom. 1998, 83, 185–212.
8. Phillips, P.C.B. Exact Small Sample Theory in the Simultaneous Equations Model. In Handbook
of Econometrics; Griliches, Z., Intrilligator, M.D., Eds.; Elsevier: New York, NY, USA, 1983.
9. Golan, A.; Judge, G.; Miller, D. Maximum Entropy Econometrics: Robust Estimation with
Limited Data; John Wiley & Sons: New York, NY, USA, 1996.
10. Golan, A.; Judge, G.; Miller, D. Information Recovery in Simultaneous Equations Statistical
Models. In Handbook of Applied Economic Statistics; Ullah, A., Giles, D., Eds.; Marcel Dekker:
New York, NY, USA, 1997.
11. West, K.D.; Wilcox, D.W. A comparison of alternative instrumental variables estimators of a
dynamic linear model. J. Bus. Econ. Stat. 1996, 14, 281–293.
12. Hansen, L.P.; Heaton, J.; Yaron, A. Finite-sample properties of some alternative GMM
estimators. J. Bus. Econ. Stat. 1996, 14, 262–280.
13. Tukey, J.W. A Survey of Sampling from Contaminated Distributions. In Contributions to
Probability and Statistics; Olkin, I., Ed.; Stanford University Press: Stanford, CA, USA, 1960.
14. Huber, P.J. Robust Statistics; John Wiley & Sons: New York, NY, USA, 1981.
15. Hampel, F.R.; Ronchetti, E.M.; Rousseeuw, P.J.; Stahel, W.A. Robust Statistics: The Approach
Based on Influence Functions; John Wiley & Sons: New York, NY, USA, 1986.
16. Koenker, R.; Machado, J.A.F.; Skeels, C.L.; Welsh, A.H. Momentary lapses: Moment expansions
and the robustness of minimum distance estimation. Econom. Theory 1994, 10, 172–197.
17. Kitamura, Y.; Stutzer, M. An information-theoretic alternative to generalized method of moments
estimation. Econometrica 1997, 65, 861–874.
18. Imbens, G.; Spady, R.; Johnson, P. Information theoretic approaches to inference in moment
condition models. Econometrica 1998, 66, 333–357.
19. Van Akkeren, M.; Judge, G.G.; Mittelhammer, R.C. Generalized moment based estimation and
inference. J. Econom. 2002, 107, 127–148.
20. Mittelhammer, R.; Judge, G. Endogeneity and Moment Based Estimation under Squared Error
Loss. In Handbook of Applied Econometrics and Statistics; Wan, A., Ullah, A., Chaturvedi, A.,
Eds.; Marcel Dekker: New York, NY, USA, 2001.
21. Mittelhammer, R.C.; Judge, G.; Miller, D. Econometric Foundations; Cambridge University
Press: New York, NY, USA, 2000.
22. Marsh, T.L.; Mittelhammer, R.C. Generalized Maximum Entropy Estimation of a First Order
Spatial Autoregressive Model. In Advances in Econometrics, Spatial and Spatiotemporal
Econometrics; LeSage, J.P., Ed.; Elsevier: New York, NY, USA, 2004; Volume 18.
23. Ciavolino, E.; Dahlgaard, J.J. Simultaneous Equation Model based on the generalized maximum
entropy for studying the effect of management factors on enterprise performance. J. Appl. Stat.
2009, 36, 801–815.
24. Papalia, R.B.; Ciavolino, E. GME estimation of spatial structural equations models. J. Classif.
2011, 28, 126–141.
25. Zellner, A. Estimation of regression relationships containing unobservable independent variables.
Int. Econ. Rev. 1970, 11, 441–454.
26. Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 379–423.
27. Kullback, S. Information Theory and Statistics; John Wiley & Sons: New York, NY, USA, 1959.
28. Pompe, B. On some entropy measures in data analysis. Chaos Solitons Fractals 1994, 4, 83–96.
29. Zellner, A. Bayesian and Non-Bayesian Estimation Using Balanced Loss Functions. In Statistical
Decision Theory and Related Topics; Gupta, S., Berger, J., Eds.; Springer Verlag: New York, NY,
USA, 1994.
30. Dreze, J.H.; Richard, J.F. Bayesian Analysis of Simultaneous Equations Systems. In Handbook of
Econometrics; Griliches, Z., Intrilligator, M.D., Eds.; Elsevier: New York, NY, USA, 1983.
31. Malinvaud, E. Statistical Methods of Econometrics, 3rd ed.; North-Holland: Amsterdam,
The Netherlands, 1980.
32. Mittelhammer, R.C.; Cardell, N.S.; Marsh, T.L. The data-constrained generalized maximum
entropy estimator of the GLM: Asymptotic theory and inference. Entropy 2013, 15, 1756–1775.
33. Pukelsheim, F. The three sigma rule. Am. Stat. 1994, 48, 88–91.
34. Davidson, R.; MacKinnon, J.G. Estimation and Inference in Econometrics; Oxford: New York,
NY, USA, 1993.
35. Mittelhammer, R.C. Mathematical Statistics for Economics and Business; Springer: New York,
NY, USA, 1996.
36. Cragg, J.G. On the relative small sample properties of several structural-equation estimators.
Econometrica 1967, 35, 89–110.
37. Tsurumi, H. Comparing Bayesian and Non-Bayesian Limited Information Estimators. In Bayesian
and Likelihood Methods in Statistics and Econometrics; Geisser, S., Hodges, J.S., Press, S.J.,
Zellner, A., Eds.; North Holland Publishing: Amsterdam, The Netherlands, 1990.
38. Klein, L.R. Economic Fluctuations in the United States, 1921–1941; John Wiley & Sons:
New York, NY, USA, 1950.
39. Rao, C.R. Linear Statistical Inference and Its Applications, 2nd ed.; John Wiley & Sons:
New York, NY, USA, 1973.
40. Hoadley, B. Asymptotic properties of maximum likelihood estimators for the independent but not
identically distributed case. Ann. Math. Stat. 1971, 42, 1977–1991.
41. White, H. Asymptotic Theory for Econometricians; Academic Press: New York, NY, USA, 1984.
© 2014 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article
distributed under the terms and conditions of the Creative Commons Attribution license
(http://creativecommons.org/licenses/by/3.0/).