Structural Equation Modeling: Foundations and Extensions
VOLUMES IN THE SERIES
1. HIERARCHICAL LINEAR MODELS: Applications and Data Analysis
Methods, 2nd Edition
Stephen W. Raudenbush and Anthony S. Bryk
2. MULTIVARIATE ANALYSIS OF CATEGORICAL DATA: Theory
John P. van de Geer
3. MULTIVARIATE ANALYSIS OF CATEGORICAL DATA: Applications
John P. van de Geer
4. STATISTICAL MODELS FOR ORDINAL VARIABLES
Clifford C. Clogg and Edward S. Shihadeh
5. FACET THEORY: Form and Content
Ingwer Borg and Samuel Shye
6. LATENT CLASS AND DISCRETE LATENT TRAIT MODELS:
Similarities and Differences
Ton Heinen
7. REGRESSION MODELS FOR CATEGORICAL AND LIMITED
DEPENDENT VARIABLES
J. Scott Long
8. LOG-LINEAR MODELS FOR EVENT HISTORIES
Jeroen K. Vermunt
9. MULTIVARIATE TAXOMETRIC PROCEDURES: Distinguishing
Types From Continua
Niels G. Waller and Paul E. Meehl
10. STRUCTURAL EQUATION MODELING: Foundations and
Extensions, 2nd Edition
David Kaplan
All rights reserved. No part of this book may be reproduced or utilized in any form or
by any means, electronic or mechanical, including photocopying, recording, or by any information
storage and retrieval system, without permission in writing from the publisher.
Portions of Chapter 7 will also appear in the forthcoming SAGE Handbook of Quantitative Methods
in Psychology edited by Roger E. Millsap and Albert Maydeu-Olivares. Portions of Chapter 9 were
first published in the following articles by David Kaplan and are reprinted here with permission:
Finite mixture dynamic regression modeling of panel data with implications for dynamic response
analysis, Journal of Educational and Behavioral Statistics; An overview of Markov chain methods for
the study of stage-sequential developmental processes, Developmental Psychology (Copyright ©
2008 by the American Psychological Association); Methodological advances in the analysis of
individual growth with relevance to educational policy, Peabody Journal of Education.
Contents
References 232
Index 245
About the Author 255
Over the last 35 years Structural Equation Modeling (SEM) has become
one of the most important data analysis techniques in the social sci-
ences. In fact, it has become much more than that. It has become a language to
formulate social science theories, and a language to talk about the relationship
between variables. For about 10 years, the AQT series of books on advanced techniques in the social sciences has included David Kaplan’s excellent text. We are
now pleased to have a completely revised and updated second edition.
SEM is not without its critics, and most researchers active in the area will
admit that it can easily be misused and in fact has frequently been misused. Its
routine application as a tool for theory formation and causal analysis has been
criticized by well-known statisticians such as Freedman, Wermuth, Rogosa,
Speed, Rubin, and Cox. One obvious problem is that SEM is a complicated
technique, whose statistical properties are far from simple, and many of its
users do not have enough statistical expertise to understand the various com-
plications. Another problem is that using SEM allows one to search over an
enormously large space of possible models, with a complicated combinatorial
structure, and the task of choosing an appropriate model, let alone the “best
model,” is horrendously difficult. Unless one has very strong prior knowledge,
which sets strict limitations on the choice of the model, it is easy to search until
one has an acceptable fit. That fit will often convey more about the tenacity and
good fortune of the investigator than about the world the model is supposed
to characterize. Finally, at a deeper level, there is considerable disagreement
about precisely what one can learn about cause and effect in the absence of
experiments in which causal variables are actually manipulated. For the critics,
SEM can never be a substitute for real experiments. The second edition pays a
great deal of attention to causal inference.
It is somewhat unfortunate that most of the books discussing SEM con-
centrate on the practical aspects of the technique, and are often ill-disguised
Almost a decade has passed since the publication of the first edition of Structural Equation Modeling: Foundations and Extensions, and considerable methodological advances in the area of structural equation modeling have
taken place. The vast majority of these advances have been in the analysis of
longitudinal data, but advances in the analysis of categorical latent variables as
well as general models that combine both categorical and continuous latent
variables have also made their way into applied work in the social and behav-
ioral sciences during this time. In addition, there have been advances in
estimation methods, techniques of model evaluation, and modern conceptual-
izations of the modeling process—including recent thinking on the use of
structural equation modeling for causal inference. In light of these advances,
I have undertaken a substantial revision of the book from the original format
that was adopted in the first edition.
This new edition maintains and updates so-called “first-generation” struc-
tural equation modeling but now brings in developments in so-called “second
generation” structural equation modeling—methods that combine continuous
latent variables (factors) with categorical latent variables (latent classes) in
cross-sectional and longitudinal contexts. As a result, the term structural equa-
tion modeling is being used here in a much more expansive sense, covering
models for continuous and categorical latent variables.
The present edition is now organized as follows. Chapter 1 retains the
original historical overview but now adds an historical overview of latent class
models. Chapter 2 remains relatively intact from the first edition. For com-
pleteness, Chapter 3 now contains material on nonstatistical estimation in the
unrestricted model—including a discussion of principal components analysis
and the common factor model. Chapter 4 remains mostly intact from the first
edition. Chapter 5 provides more detail regarding mean- and variance-adjusted
maximum likelihood and weighted least squares estimators along with a dis-
cussion of the extant evidence regarding their performance. Additional mater-
ial regarding developments in the analysis of missing data in the structural
with the first edition, this is not a book on how to use Mplus. For any
supplementary analyses, I have decided to use the open source software
program R. The R programming language is best considered a “dialect” of the
S programming language (Chambers, 1998). In most cases, S code can be
exported to the R environment without difficulty. In some cases, it is necessary
to invoke the S environment, and this is best accomplished through the
commercially available version, S-Plus.
Acknowledgments
Roger J. Calantone
Michigan State University
George Farkas
Pennsylvania State University
Scott M. Lynch
Princeton University
Keith A. Markus
John Jay College of Criminal Justice
Sandy Marquart-Pyatt
Utah State University
Victor L. Willson
Texas A&M University
1

Historical Foundations of Structural Equation Modeling for Continuous and Categorical Latent Variables
ρij = λiλj,   [1.1]

where ρij is the population correlation between scores on test i and test j, and λi and λj are weights (loadings) that relate test i and test j to the general factor. Consistent with our general definition of structural equation modeling, Equation [1.1] expresses the correlations in terms of a set of structural parameters. Spearman used the newly developed product-moment correlation coefficient to correlate scores on a variety of tests taken by a small sample of boys. Spearman reported findings that were consistent with the structural equation in Equation [1.1].
The work of Spearman and others (e.g., Thomson, 1956; Vernon, 1961)
formed the so-called British school of factor analysis. However, in the 1930s,
attention shifted to the work of L. L. Thurstone and his colleagues at the
University of Chicago. Thurstone argued that there was not one underlying
general factor of ability accompanied by specific ability factors as postulated by
Spearman. Rather, Thurstone argued that there existed major group factors
referred to as primary mental abilities (Thurstone, 1935). According to Mulaik
(1972), Thurstone’s search for group factors was motivated by a parsimony
principle that suggested that each factor should account for as much covariation
as possible in nonoverlapping sets of observed measures. Factors displaying this
property were said to exhibit simple structure. To achieve simple structure, how-
ever, Thurstone (1947) had to allow for the possibility that the factors them-
selves were correlated. Proponents of the British school, as noted by Mulaik
(1972), found this correlation to validate their claim of a general unitary ability
factor. In the context of Thurstone’s (1947) multiple factor model, the general
ability factor exists at a higher level of the ability hierarchy and is postulated to
account for the intercorrelations between the lower order primary factors.
By the 1950s and 1960s, factor analysis gained tremendous popularity,
owing much to the development and refinement of statistical computing
capacity. Indeed, Mulaik (1972) characterized this era as a time of agnostic and
blind factor analysis. However, during this era, developments in statistical
factor analysis were also occurring, allowing for the explicit testing of hypothe-
ses regarding the number of factors. Specifically, work by researchers such as
Jöreskog (1967), Jöreskog and Lawley (1968), Lawley (1940, 1941), and Lawley
and Maxwell (1971) led to the development of a maximum likelihood–based
approach to factor analysis. The maximum likelihood approach allowed a
researcher to test a hypothesis that a specified number of factors was present to
account for the intercorrelations between the variables. Minimization of the
maximum likelihood fitting function led directly to the likelihood ratio chi-
square test of the hypothesis that a proposed model fits the data. A generalized
least squares approach was later developed by Jöreskog and Goldberger (1972).
Developments by researchers such as Anderson and Rubin (1956) and later
by Jöreskog (1969) led to the methodology of confirmatory factor analysis that
allowed for testing hypotheses regarding the number of factors and the pattern
of loadings. From a historical perspective, these developments lent a rigorous
statistical approach to Thurstone’s simple structure ideas. In particular, a
researcher could now specify a model that certain factors accounted for the cor-
relations of only a specific subset of the observed variables. Again, using the
method of maximum likelihood, the hypothesis of simple structure could be
tested.
Exploratory and confirmatory factor analysis remain to this day very pop-
ular methodologies in quantitative social science research. In the context of
structural equation modeling, however, factor analysis constitutes a part of the
overall framework. Indeed, structural equation modeling represents a method
that, among other things, allows for the assessment of complex relationships
among factors. These complex relationships are often represented as systems of
simultaneous equations. The historical development of simultaneous equation
methodology is traced next.
of path analysis was statistically equivalent to factor analysis and was developed
apparently without knowledge of the work of Spearman (1904; see also Bollen,
1989). Wright also applied path analysis to problems of estimating supply and
demand equations and also treated the problem of model identification. These
issues formed the core of later econometric contributions to structural equa-
tion modeling (Goldberger & Duncan, 1972).
A second line of development occurred in the field of econometrics.
Mathematical models of economic phenomena have had a long history, begin-
ning with Petty (1676; as cited in Spanos, 1986). However, the form of econo-
metric modeling of relevance to structural equation modeling must be
credited to the work of Haavelmo (1943). Haavelmo was interested in model-
ing the interdependence between economic variables using the form for systems
of simultaneous equations written as
y = By + Γx + ζ , [1.2]
The above discussion briefly sketched the history of factor analysis and the
history of simultaneous equation modeling. The subject of this book is the
combination of the two, namely simultaneous equation modeling among
latent variables. The combination of these methodologies into a coherent ana-
lytic framework was based on the work of Jöreskog (1973), Keesling (1972),
and Wiley (1973).
The general structural equation model as outlined by Jöreskog (1973)
consists of two parts: (1) the measurement part, which links observed variables
to latent variables via a confirmatory factor model, and (2) the structural part
linking latent variables to each other via systems of simultaneous equations.
The estimation of the model parameters uses maximum likelihood estimation.
In the case where it is assumed that there is no measurement error in the
observed variables, the general model reduces to the simultaneous equations
model developed in econometrics (e.g., Hood & Koopmans, 1953). Issues of
model identification developed in econometrics (e.g., Fisher, 1966) were
brought into the general model with latent variables by Wiley (1973). A long line of software development followed, culminating in the popular LISREL program (Jöreskog & Sörbom, 2000).
What has been considered thus far is the history of structural equation model-
ing with a focus on the introduction of continuous latent variables into the
simultaneous equation framework. In many applications, however, it may be
useful to hypothesize the existence of categorical latent variables. Such cate-
gorical latent variables are presumed to explain response frequencies among
dichotomous or ordered categorical variables.
The use of categorical latent variables underlies the methodology of latent
structure analysis. Latent structure analysis was originally proposed by
Lazarsfeld (1950) as a means of modeling latent attitudes derived from dichotomous survey items, with its origins in studies of military personnel during World War II.3 The problem facing
researchers at that time concerned the development of reliable and valid
instruments measuring the attitudes soldiers had toward the army. The results
of research conducted on World War II soldiers were published between 1949
and 1950 in a four-volume set titled The American Soldier: Studies in Social
Psychology in WWII (Stouffer, Suchman, Devinney, Star, & Williams, 1949).
Volume 4 of this study was devoted to the problems of measurement and scal-
ing with major contributions made by Louis Guttman and Paul Lazarsfeld.
As with the earlier work of Spearman and Thurstone on the measurement
of intelligence, the goal here was to uncover underlying or “latent” structure
describing the attitudes of army personnel. However, unlike Spearman and
Thurstone, the observed data were discrete categorical responses and, in par-
ticular, dichotomous yes/no, agree/disagree responses.
The summary empirical data were in the form of frequencies of agreement
to a set of questions administered to the sample of personnel. In an example
using four dichotomous items, Lazarsfeld summarized the counts of indivi-
duals who agreed with all four statements, agreed with the first, but not the
remaining three, and so on. An inspection of the response frequencies led
Lazarsfeld to postulate that soldiers belonged to one of two possible “latent
classes”: the first being soldiers who are generally favorable to the army versus
those who are generally unfavorable. Moreover, Lazarsfeld noted that if one
were to have administered the items to only one of the two classes, there would
be no correlations among the items. This phenomenon was termed local inde-
pendence by Lazarsfeld and it implies that holding the latent class constant,
there is no correlation among the manifest item responses.
Missing during the early days in the development of latent structure analy-
sis was explicit testing of the latent class model. The standard issues of good-
ness-of-fit, parameter estimation, standard errors, and other concepts familiar
to mathematical statistics at the time were not discussed in any meaningful way
within the emerging literature on latent structure analysis. It wasn’t until much
later with the work of Lazarsfeld and Henry (1968) that the conventional con-
cepts of mathematical statistics were brought into the domain of latent struc-
ture analysis. Full integration of latent structure analysis with mathematical
statistics came with the publication of Leo Goodman’s (1968) paper on log-
linear modeling approaches to latent structure analysis and the publication of
Discrete Multivariate Analysis by Yvonne Bishop, Stephen Fienberg, and Paul
Holland (1975).
Structural equation modeling is, without question, one of the most popular
statistical methodologies available to quantitative social scientists. The popu-
larity of structural equation modeling can be attested to by the creation of a
scholarly journal devoted specifically to structural equation modeling4 as well
as the existence of SEMNET, a very popular and active electronic discussion list
that focuses on structural equation modeling and related issues.5
Structural equation modeling also continues to be an active area of theo-
retical and applied statistical research. Indeed, the past 40 years have seen
The previous sections provided only a taste of the foundations and extensions
of structural equation modeling. Each chapter of this book provides more
detail to both. These developments come about primarily through the interac-
tion of statisticians with substantive researchers motivated by a need to solve
[Figure: the conventional sequence of structural equation modeling—Theory → Model Specification → Sample and Measures → Estimation → Assessment of Fit → Modification → Discussion.]
The substantive examples in this book draw from current issues in the field of
education that are at the forefront of national debate on school effects. My
choice in using examples from the field of education stems mainly from the fact
that this is the substantive area with which I am most familiar. In addition, many
of the topics covered in this book can be convincingly demonstrated on prob-
lems in the field of education. However, many of the new extensions in structural
equation modeling that constitute a part of this book can be quite clearly demon-
strated on problems arising from fields other than education.
Many of the examples used throughout this book are guided by a theoret-
ical framework. The theoretical framework used throughout this book is
referred to as the input-process-output theory of education (Bidwell & Kasarda,
1975). A number of diagrammatic formulations have been offered to describe
the input-process-output theory of the U.S. educational system. Figure 1.2 shows
one such diagram offered by Shavelson, McDonnell, and Oakes (1989) and often
referred to as the RAND Corporation Indicators Model.
There are numerous aspects of this figure that are worth pointing out.
First, and of relevance to the subject of this book, is the implied complexity of
the educational system. To take an example, schooling inputs such as fiscal
resources are theorized to have their effects on outputs mostly via other school-
ing variables as well as teacher and classroom process variables. The teacher/
classroom process variables, in turn, exhibit their own structural complexity.
[Figure 1.2. The RAND Corporation indicators model, relating fiscal resources, curriculum quality, teaching quality, and student background and attitudes/aspirations to achievement.]
Notes
2

Path Analysis

Modeling Systems of Structural Equations Among Observed Variables
y = α + By + Γx + ζ, [2.1]
where α is a vector of structural intercepts, B is a matrix of regression coefficients relating the endogenous variables y to one another, Γ is a matrix of regression coefficients relating the endogenous variables to the exogenous variables x, ζ is a vector of disturbance terms with covariance matrix Ψ, and Φ denotes the covariance matrix of the exogenous variables.

[Figure 2.1. Path diagram for the science achievement model, relating SCIGRA6, SES, and CERTSCI to UNDERSTD, CHALLG, SCIGRA10, and SCIACH.]

Restrictions on the elements of B and Γ
are imposed by the underlying, substantive theory. For example, in the model shown in Figure 2.1, an element of B would be the path relating SCIGRA10 to CHALLG. An element in Γ would be the path relating SCIGRA10 to SES.
We can distinguish between three types of parameters in B, Γ, and Ψ. The
first set of parameters is the one that is to be estimated. These are often referred
to as free parameters. Thus, in the model in Figure 2.1, the observable paths are
the free parameters. The value of these free parameters will be estimated with
the methods described below.
The second set of parameters are given a priori values that are held con-
stant during estimation. These parameters are often referred to as fixed para-
meters. Most often, the fixed parameters are set equal to zero to represent the
absence of the relationship. However, it is possible to fix an element to a
nonzero value if the theory is strong enough to suggest what that value
should be. Again, considering the model in Figure 2.1, we theorize that there
is no direct relationship between SCIACH and SES. Thus, this path is fixed
to zero.
Finally, it is possible to constrain certain parameters to be equal to other
parameters in the model. These elements are referred to as constrained
parameters. An example of constrained parameters would be requiring that
the effect of SCIGRA6 on SCIGRA10 be the same as the effect of SES on
SCIGRA10.
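As a concrete illustration, the free and fixed elements of B and Γ for the science achievement model can be laid out as pattern matrices. The minimal sketch below is written in base R (the software used for supplementary analyses in this book); NA marks a free parameter and 0 a fixed parameter, and the specific pattern is read off the direct effects reported in Table 2.2—the variable ordering itself is an assumption made only for illustration.

```r
# Pattern matrices for the science achievement path model (Figure 2.1).
# NA = free parameter (to be estimated), 0 = fixed parameter (restricted to zero).
y_names <- c("UNDERSTD", "CHALLG", "SCIGRA10", "IRTSCI")   # endogenous variables
x_names <- c("SCIGRA6", "SES", "CERTSCI")                   # exogenous variables

# B: effects of endogenous variables on other endogenous variables
B_pattern <- matrix(0, 4, 4, dimnames = list(y_names, y_names))
B_pattern["CHALLG",   "UNDERSTD"] <- NA   # UNDERSTD -> CHALLG
B_pattern["SCIGRA10", "UNDERSTD"] <- NA   # UNDERSTD -> SCIGRA10
B_pattern["SCIGRA10", "CHALLG"]   <- NA   # CHALLG   -> SCIGRA10
B_pattern["IRTSCI",   "SCIGRA10"] <- NA   # SCIGRA10 -> IRTSCI

# Gamma: effects of exogenous variables on endogenous variables
G_pattern <- matrix(0, 4, 3, dimnames = list(y_names, x_names))
G_pattern["UNDERSTD", "CERTSCI"] <- NA
G_pattern["SCIGRA10", "SCIGRA6"] <- NA
G_pattern["SCIGRA10", "SES"]     <- NA
G_pattern["SCIGRA10", "CERTSCI"] <- NA

# A constrained specification would force two of these free elements to share
# a single estimate during estimation.
sum(is.na(B_pattern)) + sum(is.na(G_pattern))   # 8 free path coefficients
```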
[Path diagrams relating the exogenous variables x1, x2, and x3 to the endogenous variables y1 and y2.]
(I − B)y = α + Γx + ζ.   [2.2]

Provided that (I − B) is nonsingular, the reduced form of the model can be written as

y = (I − B)⁻¹α + (I − B)⁻¹Γx + (I − B)⁻¹ζ.   [2.3]

From here, the mean vector and the covariance matrix of the observed variables can be expressed in terms of the model parameters. Specifically, the mean vector can be written as

μ = [ E(y) ]   [ (I − B)⁻¹(α + ΓE(x)) ]
    [ E(x) ] = [ E(x)                 ],   [2.4]

and the covariance matrix of y and x can be written as

Σ = [ E(yy′)  E(yx′) ]   [ (I − B)⁻¹(ΓΦΓ′ + Ψ)(I − B)′⁻¹   (I − B)⁻¹ΓΦ ]
    [ E(xy′)  E(xx′) ] = [ ΦΓ′(I − B)′⁻¹                    Φ          ].   [2.5]
Equations [2.4] and [2.5] show that structural equation modeling repre-
sents a structure on the mean vector and covariance matrix. The structure is in
terms of the parameters of the model.
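The mapping from model parameters to the implied covariance matrix in Equation [2.5] can be computed directly. The base R sketch below does so for a small hypothetical model with two endogenous and two exogenous variables; the parameter values are arbitrary and serve only to show the computation.

```r
# Hypothetical parameter matrices (2 endogenous, 2 exogenous variables)
B     <- matrix(c(0.0, 0.0,
                  0.4, 0.0), nrow = 2, byrow = TRUE)   # y1 -> y2
Gamma <- matrix(c(0.5, 0.2,
                  0.0, 0.3), nrow = 2, byrow = TRUE)   # x -> y paths
Phi   <- matrix(c(1.0, 0.3,
                  0.3, 1.0), nrow = 2, byrow = TRUE)   # cov(x)
Psi   <- diag(c(0.6, 0.5))                             # cov(zeta), diagonal

IB_inv <- solve(diag(2) - B)                           # (I - B)^{-1}

Sigma_yy <- IB_inv %*% (Gamma %*% Phi %*% t(Gamma) + Psi) %*% t(IB_inv)
Sigma_yx <- IB_inv %*% Gamma %*% Phi
Sigma    <- rbind(cbind(Sigma_yy, Sigma_yx),
                  cbind(t(Sigma_yx), Phi))             # implied covariance matrix, Eq. [2.5]
round(Sigma, 3)
```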
structural equation models. In this section, we will define the problem of iden-
tification from the covariance structure perspective. Later, we introduce the
problem of identification from the reduced form perspective when consider-
ing some simple rules for establishing identification.
Recall again that we wish to know whether the variances and covariances of the
exogenous variables (contained in Φ), the variances and covariances of the dis-
turbance terms (contained in Ψ), and the regression coefficients (contained in B
and Γ) can be solved in terms of the variances and covariances contained in Σ.
Two classical approaches to identification can be distinguished in terms of
whether identification is evaluated on the model as a whole or whether identifi-
cation is evaluated on each equation composing the system of equations. The for-
mer approach is generally associated with social science applications of structural
equation modeling, whereas the latter approach appears to be favored in econo-
metrics. Nevertheless, they both provide a consistent picture of identification in
that if any equation is not identified, the model as a whole is not identified.
The first, and perhaps simplest, method for ascertaining the identification of the model parameters is referred to as the counting rule (see, e.g., Bollen, 1989). Let s = p + q be the total number of p endogenous and q exogenous variables. Then the number of nonredundant elements in Σ is equal to ½s(s + 1). Let t be the total number of parameters in the model that are to be estimated (i.e., the free parameters). The counting rule states that a necessary condition for identification is that t ≤ ½s(s + 1). If the equality holds, then we say that the model may be just identified. If t is strictly less than ½s(s + 1), then we say that the model may be overidentified. If t is greater than ½s(s + 1), then the model is not identified.
As an example of the counting rule, consider the model of science achievement given in Figure 2.1. The total number of variables, s, in this model is 7. Thus, we obtain 28 nonredundant elements in Σ. There are 10 variances and covariances of exogenous variables (including disturbances) and 8 path coefficients, so the number of free parameters is t = 18, leaving 28 − 18 = 10 overidentifying restrictions. Because t is strictly less than the number of nonredundant elements in Σ, we say that the model is overidentified. The 10 degrees of freedom correspond to the 10 restrictions placed on the model.
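The counting rule is easy to verify directly. The following base R sketch reproduces the computation for the science achievement model; the counts of free parameters are the ones given in the text and implied by Table 2.2.

```r
p <- 4                          # endogenous: UNDERSTD, CHALLG, SCIGRA10, IRTSCI
q <- 3                          # exogenous:  SCIGRA6, SES, CERTSCI
s <- p + q                      # total number of observed variables

nonredundant <- s * (s + 1) / 2 # 28 distinct elements in Sigma
t  <- 8 + 10                    # 8 path coefficients + 10 variances/covariances
df <- nonredundant - t          # 10 overidentifying restrictions

c(nonredundant = nonredundant, t = t, df = df)
```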
Clearly, the advantage to the counting rule is its simplicity. It is also a nec-
essary but not sufficient rule for identification. We can, however, provide rules
for identification that are sufficient, but that pertain only to recursive models,
or to special cases of recursive models. Specifically, a sufficient condition for
identification is that B is triangular and that Ψ is a diagonal matrix. However,
this is the same as saying that recursive models are identified. Indeed, this is the
case, and Bollen (1989) refers to this rule as the recursive rule of identification.
In combination with the counting rule above, recursive models can be either
just identified or overidentified.
A special case of the recursive rule concerns the situation where B = 0 and Ψ is again a diagonal matrix. Under this condition, the model in Equation [2.1]
reduces to
y = α + Γx + ζ, [2.6]
Consider the nonrecursive model shown in Figure 2.3, with two endogenous variables (y1 and y2) and three exogenous variables (x1, x2, and x3). The system of equations can be written as

[ y1 ]   [  0    β12 ] [ y1 ]   [ γ11  γ12   0  ] [ x1 ]   [ ζ1 ]
[ y2 ] = [ β21    0  ] [ y2 ] + [  0    0   γ23 ] [ x2 ] + [ ζ2 ],   [2.7]
                                                  [ x3 ]

from which we can form the p × s matrix of structural coefficients

A = [ (I − B) | −Γ ]

  = [   1    −β12   −γ11   −γ12    0   ]
    [ −β21     1      0      0   −γ23  ],   [2.8]
where s = p + q. Note that the zeros placed in Equation [2.8] represent paths
that have been excluded (restricted) from the model based on a priori model
specification. We can represent the restrictions in the first equation of A, say
[   0  ]
[ −γ23 ].
With the first row 0, the rank of this matrix is 1, and hence, the first equa-
tion is identified. Considering the second equation, the resulting submatrix is
[ −γ11  −γ12 ]
[   0     0  ].
Again, because of the 0s in the second row, the rank of this submatrix is 1,
and we conclude that the second equation is identified.
A corollary of the rank condition is referred to as the order condition. The
order condition states that the number of variables (exogenous and endoge-
nous) excluded (restricted) from any of the equations in the model must be at
least p −1 (Fisher, 1966). Despite the simplicity of the order condition, it is only
a necessary condition for the identification of an equation of the model. Thus,
the order condition guarantees that there is a solution to the equation, but it
does not guarantee that the solution is unique. A unique solution is guaranteed
by the rank condition.
As an example of the order condition, we observe that the first equation has one restriction and the second equation has two restrictions, as required by the condition that the number of restrictions must be at least p − 1 (here, equal
to 1). It may be of interest to modify the model slightly to demonstrate how the
first equation of the model would not be identified according to the order con-
dition. Referring to Figure 2.3, imagine a path from x3 to y1. Then the 0 in the
first row of A would be replaced by −γ13. Using the simple approach for deter-
mining the order condition, we find that there are no restrictions in the first
equation and therefore the first equation is not identified. Similarly, the first
equation fails the rank condition of identification.
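A quick numerical check of the rank condition is possible by plugging arbitrary nonzero values into the unrestricted elements of A and computing the rank of each equation's submatrix. The base R sketch below uses hypothetical numeric values standing in for the coefficients of Equation [2.8]; the pattern of zeros is what matters.

```r
# Structural coefficient matrix A = [(I - B) | -Gamma] for Equation [2.8],
# with arbitrary nonzero values in place of the free coefficients.
A <- rbind(c( 1.0, -0.5, -0.3, -0.2,  0.0),   # equation for y1
           c(-0.4,  1.0,  0.0,  0.0, -0.6))   # equation for y2
p <- nrow(A)

rank_condition <- function(A, eq) {
  restricted <- which(A[eq, ] == 0)            # columns restricted to zero in this equation
  sub <- A[, restricted, drop = FALSE]         # submatrix of those columns
  qr(sub)$rank == p - 1                        # rank condition: rank must equal p - 1
}

sapply(1:p, rank_condition, A = A)             # TRUE TRUE: both equations identified
```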
Assuming that the parameters of the model are identified, we now move on to
describe procedures for the estimation of the parameters of the model. The
parameters of the model are (a) the variances and covariances of exogenous
variables contained in Φ, (b) the variances and covariances of disturbance
terms contained in Ψ, and (c) the regression coefficients contained in B
and Γ. Once again, it is convenient to consider collecting these parameters
together in a parameter vector denoted as Ω. The goal of estimation is to
obtain estimates of the parameter vector Ω, which we will write as Ω̂, that minimize a discrepancy function F(S, Σ̂), where Σ̂ = Σ(Ω̂) is the covariance matrix based on the estimates of the model—the so-called fitted covariance matrix.
The function F(S, Σ̂) is a scalar that measures the discrepancy (distance) between the sample covariance matrix S (the data) and the fitted covariance matrix Σ̂ based on model estimates. A correct discrepancy function is characterized by the following properties (see Browne, 1984):

(i) F(S, Σ̂) ≥ 0,
(ii) F(S, Σ̂) = 0 if and only if Σ̂ = S,
(iii) F(S, Σ̂) is a continuous function in S and Σ̂.
The first property (i) requires that the discrepancy function must be a
positive real number. The second property (ii) indicates that the discrepancy func-
tion is zero only if the model estimates reproduce the sample covariance matrix
perfectly. The third property (iii) simply states that the function is continuous.
For the purposes of this chapter, we consider maximum likelihood (ML)
and generalized least squares (GLS).5 We consider the distributional assump-
tions underlying these methods, but we postpone the discussion of assumption
violations until Chapter 5 where we consider alternative methods of estimation
under more relaxed distributional assumptions.
If Equation [2.9] represents the multivariate normal density function for a sin-
gle sample member, then the product given in Equation [2.10] can be written as
" #
−N ðp + qÞ=2 −N =2
X
N
LðΩÞ = ð2pÞ jΣðΩÞj exp 1
2 z0 i Σ−1 ðΩÞzi , [2.11]
i=1
−N ðp + qÞ N
logLðΩÞ = logð2pÞ − logjΣðΩÞj
2 2
N [2.12]
− tr½TΣ−1 ðΩÞ:
2
The last term on the right-hand side of Equation [2.12] arises from the
fact that the term in the brackets of Equation [2.11] is a scalar, and the trace of
a scalar is a scalar. Thus, referring to the last term on the right-hand side of
Equation [2.11] we have
−½ ∑_{i=1}^{N} zi′Σ⁻¹(Ω)zi = −½ ∑_{i=1}^{N} tr[zi′Σ⁻¹(Ω)zi].   [2.13]

Multiplying and dividing by N and using the trace rule that tr(ABC) = tr(CAB) yields

−½ ∑_{i=1}^{N} tr[zi′Σ⁻¹(Ω)zi] = −(N/2) tr[ N⁻¹ ∑_{i=1}^{N} zizi′ Σ⁻¹(Ω) ]
                               = −(N/2) tr[TΣ⁻¹(Ω)],   [2.14]

where T = N⁻¹ ∑_{i=1}^{N} zizi′. Dropping the term that does not involve the model parameters and substituting the sample covariance matrix S for T, the log-likelihood can be written as

logL(Ω) = −(N/2){ log|Σ(Ω)| + tr[SΣ⁻¹(Ω)] }.   [2.15]
A problem with Equation [2.15] is that it does not possess the properties
of a correct discrepancy function as described above. To see this, note that if
S = Σ, then the argument of the trace in the second term on the right-hand side of Equation [2.15] becomes an identity matrix of order p + q, so that the trace equals p + q. However, the difference between the first term and the second term will not equal zero as
required if Equation [2.15] is to be a proper discrepancy function, as in prop-
erty (ii) discussed above. To render Equation [2.15] a proper discrepancy func-
tion, we need to add terms that do not depend on model parameters and
therefore are not involved in the differentiation. To begin, we can remove the term −N/2, in which case we are minimizing the function rather than maximizing it. Then, we can add terms that do not depend on the model parameters and thus are of no consequence to the differentiation. This gives

FML = log|Σ(Ω)| + tr[SΣ⁻¹(Ω)] − log|S| − t,   [2.16]

where t is the total number of variables in z, that is, t = p + q. It can be seen that if the model fits perfectly, the first and third terms sum to zero and the second and fourth terms sum to zero, and therefore Equation [2.16] is now a proper fitting function as defined by properties (i) to (iii) above.
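The fitting function in Equation [2.16] is easy to evaluate directly. The base R sketch below computes FML for a sample covariance matrix S and a model-implied matrix; the matrices shown are small hypothetical ones, used only to illustrate that FML = 0 when the fitted matrix equals S.

```r
# Maximum likelihood fitting function of Equation [2.16]
F_ML <- function(S, Sigma) {
  nvar <- nrow(S)   # number of observed variables (p + q)
  log(det(Sigma)) + sum(diag(S %*% solve(Sigma))) - log(det(S)) - nvar
}

# Hypothetical 2 x 2 example
S     <- matrix(c(1.0, 0.4, 0.4, 1.0), 2, 2)
Sigma <- matrix(c(1.0, 0.3, 0.3, 1.0), 2, 2)

F_ML(S, S)       # 0: a perfectly fitting model
F_ML(S, Sigma)   # > 0: discrepancy between S and the fitted matrix
```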
In addition to obtaining the estimates of the model parameters, we can also obtain the covariance matrix of the estimates. Let Ω̂ be the r × 1 vector of estimated model parameters. Then, the asymptotic covariance matrix of Ω̂ can be written as

ACOV(Ω̂) = { −E[ ∂²logL(Ω)/∂Ω∂Ω′ ] }⁻¹,   [2.17]
a. SCIGRA6, self-reported science grades from grade 6 to present; CERTSCI, Is teacher certified to
teach science in state? (1 = yes); SES, socioeconomic status composite; UNDERSTD, How often is
student asked to show understanding of science concepts?; CHALLG, How often does student feel
challenged in science class?; SCIGRA10, self-reported science grades from grade 10; SCIACH, item
response theory estimated number right on science achievement test.
b. Mardia’s coefficient of multivariate kurtosis.
The software program Mplus (L. Muthén & Muthén, 2006) was used for
this analysis. ML estimates of the model parameters and tests of significance are
given in the upper panel of Table 2.2. The unstandardized estimates are the direct
effects and the covariances of the exogenous variables in the model. It can be
seen that with few exceptions, each direct effect is statistically significant.
Table 2.2   Maximum Likelihood Estimates of Direct Effects for the Initial Science Achievement Model

                        Estimate     S.E.    Est./S.E.      Std     StdYX
IRTSCI ON
  SCIGRA10                 1.228    0.034      35.678     1.228     0.384
SCIGRA10 ON
  CHALLG                  −0.033    0.017      −1.961    −0.033    −0.022
  SCIGRA6                  0.781    0.020      38.625     0.781     0.413
  SES                      0.239    0.026       9.103     0.239     0.097
  CERTSCI                 −0.040    0.039      −1.039    −0.040    −0.011
  UNDERSTD                 0.168    0.015      11.315     0.168     0.125
CHALLG ON
  UNDERSTD                 0.318    0.010      33.225     0.318     0.361
UNDERSTD ON
  CERTSCI                 −0.030    0.033      −0.929    −0.030    −0.011
Residual variances
  UNDERSTD                 1.858    0.031      60.667     1.858     1.000
  CHALLG                   1.250    0.021      60.667     1.250     0.870
  SCIGRA10                 2.637    0.043      60.667     2.637     0.786
  IRTSCI                  29.291    0.483      60.667    29.291     0.853

Observed variable            R²
  UNDERSTD                0.000
  CHALLG                  0.130
  SCIGRA10                0.214
  IRTSCI                  0.147
A more general family of estimators can be obtained by minimizing the weighted least squares (WLS) fitting function

FWLS = [S − Σ(Ω)]′ W⁻¹ [S − Σ(Ω)],   [2.18]

where W⁻¹ is a weight matrix that weights the deviations S − Σ(Ω) in terms of their variances and covariances with other elements. Notice that this is a proper discrepancy function insofar as, if the model fits the data perfectly, the first and last terms on the right-hand side of Equation [2.18] will yield a null matrix.
A critical consideration of WLS estimators is the choice of a weight matrix W⁻¹. One choice could be W⁻¹ = I, the identity matrix. With the identity matrix as the choice for the weight matrix, WLS reduces to unweighted least squares
(ULS). Unweighted least squares is identical to ordinary least squares in the
standard regression setting in that it assumes homoscedastic disturbances.
Moreover, although ULS is known to yield unbiased estimates of model
Choosing the weight matrix on the basis of the sample covariance matrix S yields the generalized least squares (GLS) fitting function

FGLS = ½ tr[S⁻¹(S − Σ)]²
     = ½ tr(I − S⁻¹Σ)².   [2.19]
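A base R sketch of Equation [2.19], again using small hypothetical matrices, might look as follows.

```r
# Generalized least squares fitting function of Equation [2.19]
F_GLS <- function(S, Sigma) {
  D <- diag(nrow(S)) - solve(S) %*% Sigma   # I - S^{-1} Sigma
  0.5 * sum(diag(D %*% D))                  # (1/2) tr[(I - S^{-1} Sigma)^2]
}

S     <- matrix(c(1.0, 0.4, 0.4, 1.0), 2, 2)
Sigma <- matrix(c(1.0, 0.3, 0.3, 1.0), 2, 2)
F_GLS(S, Sigma)   # small positive discrepancy; zero when Sigma equals S
```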
A feature of ML and GLS estimation of the path model is that one can explicitly
test the hypothesis that the model fits the data. Consider again Equation [2.15].
This is the log-likelihood under the null hypothesis that the specified model
holds. Under the alternative hypothesis, Σ is an arbitrary symmetric positive definite matrix, so that Σ̂ = S, and the log-likelihood is

logLa = −(n/2)[ log|S| + tr(SS⁻¹) ]
      = −(n/2)[ log|S| + tr(I) ]
      = −(n/2)( log|S| + t ).   [2.20]
The statistic for testing the null hypothesis that the model fits in the pop-
ulation is referred to as the likelihood ratio (LR) test and is expressed as
−2log(L0/La) = −2logL0 + 2logLa
             = n[ log|Σ| + tr(Σ⁻¹S) ] − n( log|S| + t )
             = n[ log|Σ| + tr(Σ⁻¹S) − log|S| − t ].   [2.21]
Notice from the last equality in Equation [2.21] that the log-likelihood
ratio is simply n × FML.
The large sample distribution of the LR test is chi-square with degrees of freedom (df) given by the difference between the number of nonredundant elements in Σ and the number of free parameters in the model. The LR chi-square test is
used to test the null hypothesis that the population covariance matrix pos-
sesses the structure implied by the model against the alternative hypothesis
that Σ is an arbitrary symmetric positive definite matrix.
In the context of our science achievement example, the LR chi-square statistic indicates that the model does not fit the data (χ² = 1321.13, df = 10, p < .001). Numerous explanations for the lack of fit are possible including
nonnormality, missing data, sample size sensitivity, and incorrect model spec-
ification. These are taken up in detail in Chapter 5.
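For reference, the base R sketch below shows how the LR statistic and its p value are obtained from the ML fitting function value; the degrees of freedom reproduce those of the science achievement model, while the values of FML and n are purely hypothetical.

```r
# Likelihood ratio test of overall model fit: chi-square = n * F_ML
lr_test <- function(F_ML, n, s, t) {
  chisq <- n * F_ML
  df    <- s * (s + 1) / 2 - t            # nonredundant elements minus free parameters
  p     <- pchisq(chisq, df, lower.tail = FALSE)
  c(chisq = chisq, df = df, p = p)
}

# Degrees of freedom for the science achievement model: 28 - 18 = 10
lr_test(F_ML = 0.05, n = 1000, s = 7, t = 18)   # hypothetical F_ML and n, for illustration
```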
In addition to a global test of whether the model fits perfectly in the pop-
ulation, one can also test hypotheses regarding the individual fixed and freed
parameters in the model. We can consider three alternative ways to evaluate the
fixed and freed elements of the model vis-à-vis overall fit. The first method
rests on the difference between the LR chi-square statistics comparing a given
model against a less restrictive model. A less restrictive model can be formed
by freeing one of the currently restricted paths.
Recall from Section 2.3.1 that the LR test of the null hypothesis is given as
n ∗ FML. This initial null hypothesis, say H01, is tested against the alternative
Define

s(Ω) = ∂logL(Ω)/∂Ω   [2.23]
as the score vector representing the change in the log-likelihood for a change in
Ω. For the estimated parameters in the model, the elements of s(Ω) will be
zero, because at the maximum of the likelihood, the vector of partial derivatives
is zero. However, for the restricted elements in Ω, say Ωr, the partial derivatives
will only be zero if the restrictions hold exactly. If the restrictions don’t hold
exactly, which would almost always be the case in practice, then the maximum
of the likelihood would not be reached and the derivatives would not be zero.
Thus, a test can be formed, referred to as the Lagrange multiplier (LM) test,
which assesses the validity of the restrictions in the model (Silvey, 1959). The
LM test can be written as
LM = s(Ω̂r)′ [I(Ω̂r)]⁻¹ s(Ω̂r),   [2.24]
where I(Ωr) was earlier defined as the information matrix. The LM test is asymp-
totically distributed as chi-square with degrees of freedom equaling the difference
between the degrees-of-freedom of the more restrictive model and the less restric-
tive model. Again, if one restriction is being evaluated, then the LM test is evalu-
ated with one degree of freedom. The LM test in Equation [2.24] is also referred
to as the modification index (Sörbom, 1989). This test is most commonly used for
model modification and we will defer that discussion until Chapter 6.
Finally, we can consider evaluating the impact of placing restrictions on
the unrestricted model. Let r(Ω) represent a set of restrictions placed on a
model. In our science achievement example, r(Ω) represents the paths fixed to
zero. The estimates r(Ωr) are zero by virtue of the specification. The question
is whether the restrictive model holds for the set of unrestricted estimates, say
r(Ωu). In other words, if a small (and perhaps nonsignificant) path coefficient
was restricted to be zero (removed from the model), would that restriction
hold in the population? If the restrictive model holds, then r(Ωu) should not
differ significantly from zero. However, if the restrictive model does not hold,
one would expect the elements of r(Ωu) to differ significantly from zero. The
test for the validity of restricting parameters is given by the Wald test (W) writ-
ten as
Wj = ω̂j² / Var(ω̂j),   [2.26]
where Var(ωj) is the jth diagonal element of the asymptotic covariance matrix
of the estimates. In large samples, Wj is distributed as chi-square with one
degree of freedom. Note that the square root of Equation [2.26] gives
z = ω̂j / se(ω̂j),   [2.27]
which has an asymptotic normal distribution with mean 0 and variance 1. This
statistic can also be used to test the null hypothesis that ωj = 0. The LR difference
test, the LM test, and the Wald test are known to be asymptotically equivalent
(see Buse, 1982; Engle, 1984).
CERTSCI, SES on CERTSCI, and the correlation of CERTSCI with SCIGRA6, all
paths are statistically significant. As noted above, we can evaluate the effect of
restricting one of these paths on the overall LR chi-square test by simply squar-
ing the specific z-value of interest. So, for example, restricting the regression of
SCIGRA10 on UNDERSTD (z = 11.315) to zero, we would expect the LR chi-
square test to increase by z2 = 128.029. This would indicate a significant decre-
ment in the overall fit of the model. Similarly, if we wished to restrict the path from CERTSCI to UNDERSTD (z = −0.929), the resulting change in the LR chi-square test would be z² = 0.863, which is not a significant decrement to model fit.
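These calculations can be done by hand, but the base R sketch below makes the relationship between the z statistic, the Wald test, and the expected change in the LR chi-square explicit, using estimates and standard errors from Table 2.2 (the tabled values are rounded, so the z values differ slightly from those printed by the software).

```r
# Wald test (Equation [2.26]) and its square-root z form (Equation [2.27])
wald <- function(est, se) {
  z <- est / se
  c(z = z, W = z^2, p = 2 * pnorm(-abs(z)))
}

wald(est = 0.168,  se = 0.015)   # SCIGRA10 on UNDERSTD: z ~ 11.2, W ~ 125
wald(est = -0.030, se = 0.033)   # UNDERSTD on CERTSCI:  z ~ -0.9, W ~ 0.8 (nonsignificant)

# Squaring the reported z values gives the expected increase in the LR
# chi-square if the corresponding path were fixed to zero
c(11.315^2, (-0.929)^2)          # 128.029 and 0.863
```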
In addition to the direct effects of the model, path analysis allows for the
further decomposition of the total and indirect effects. Indeed, the decomposi-
tion of effects represents a classic approach to the development of path analy-
sis (see, e.g., Duncan, 1975).
It is perhaps pedagogically easier to first consider the decomposition of
the total effect. The total effect is the sum of the direct effect and all indirect
effects of an exogenous variable on an endogenous variable of interest. From
the standpoint of the equations of the model, it is useful to consider the
reduced form specification shown earlier in Equation [2.3]. In the context of
the reduced form of the model, the coefficient matrix Π1 ≡ (I − B)⁻¹Γ is
the matrix of total effects.
In many respects, an analysis of the total effects and their substantive
and/or statistical significance provides the information necessary to further
use the model for prediction purposes. That is, often an investigator can isolate
a particular endogenous outcome as the ultimate outcome of interest. The
exogenous variables, on the other hand, may have clinical or policy relevance
to the investigator.8 The mediating variables, then, represent the theorized
processes of how changes in exogenous variables lead to changes in endoge-
nous variables. However, the process may be less important in some contexts
than the simple matter of the overall effect. An analysis of the total effects can
provide this information.
If, in a given context, it is important to understand mediating processes,
then one needs to consider the indirect effects. An indirect effect is one in
which an exogenous variable influences an endogenous variable through the
mediation of at least one other variable. For example, it may be of interest to determine whether students whose teachers are certified in science earn higher science grades and higher science achievement scores by virtue of higher instructional quality.
To obtain an expression for the indirect effects recall that the total effect is
the sum of the direct and all indirect effects. This then leads to an expression
for the indirect effects of exogenous variables on endogenous variables.
Specifically, if Γ is the matrix of direct effects of exogenous variables on
endogenous variables, and (I − B)⁻¹Γ is the matrix of total effects, then it follows that the matrix containing total indirect effects is (I − B)⁻¹Γ − Γ.
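The decomposition is a one-line matrix computation. The base R sketch below uses small hypothetical coefficient matrices (two endogenous and two exogenous variables) purely to illustrate the identities total = (I − B)⁻¹Γ and indirect = total − direct.

```r
# Hypothetical direct-effect matrices for a small recursive model:
# y1 <- x1, x2;  y2 <- y1, x2
B     <- matrix(c(0.0, 0.0,
                  0.5, 0.0), nrow = 2, byrow = TRUE)   # effects among endogenous variables
Gamma <- matrix(c(0.3, 0.2,
                  0.0, 0.4), nrow = 2, byrow = TRUE)   # effects of exogenous on endogenous

total    <- solve(diag(2) - B) %*% Gamma   # (I - B)^{-1} Gamma: total effects
indirect <- total - Gamma                  # total minus direct: total indirect effects

total
indirect   # e.g., x1 affects y2 only indirectly, through y1: 0.5 * 0.3 = 0.15
```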
Table 2.3 provides a selected set of effect decompositions for the science
achievement model. The total indirect effect of sixth grade reported science
grades on science achievement is statistically significant as is the total indirect
effect of SES on science achievement. The specific indirect effects of certifica-
tion to teach science on science achievement are all nonsignificant. The specific
indirect effects from UNDERSTD to science achievement are each statistically
significant.
Table 2.3 Selected Total Indirect and Specific Indirect Estimates for the Science
Achievement Model
science achievement model, such variables as SES and SCIACH have under-
standable and usable metrics. In cases where the metrics are not readily inter-
pretable, or perhaps arbitrary, it is useful to adopt a new metric for the variables
so as to yield substantively interesting interpretations. One such approach to the
problem is to standardize the structural parameters of the model. In the context
of path analysis, consider an unstandardized path coefficient for an element of
Γ, say γpq. Then, the standardized element is obtained as
γ̂*pq = γ̂pq (σ̂xq / σ̂yp),   [2.28]

where σ̂xq and σ̂yp are the estimated standard deviations of xq and yp, respectively. In a similar manner, a standardized element of B, say βpp′, is obtained as

β̂*pp′ = β̂pp′ (σ̂yp′ / σ̂yp).   [2.29]
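The base R sketch below verifies the standardization in Equations [2.28] and [2.29] using the estimates in Table 2.2: the unstandardized effect of SCIGRA10 on IRTSCI (1.228) is rescaled by the ratio of the two standard deviations, which can be recovered from the residual variances and their standardized counterparts reported in the table.

```r
# Standard deviations implied by Table 2.2:
# total variance = residual variance / standardized residual variance
sd_scigra10 <- sqrt(2.637 / 0.786)    # predictor
sd_irtsci   <- sqrt(29.291 / 0.853)   # outcome

est_unstd <- 1.228                    # IRTSCI on SCIGRA10, unstandardized
est_std   <- est_unstd * sd_scigra10 / sd_irtsci
est_std                               # ~0.384, matching the standardized value in the table
```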
2.6 Conclusion
This chapter introduced the basics of path analysis. We covered issues of iden-
tification, estimation, and testing as well as provided a substantive example.
Although additional topics addressing assumptions and model testing are
taken up in later chapters, the steps used in this example are characteristic of
the conventional approach to structural equation modeling. Namely, a model
was postulated that represents causal relationships implied by the input-
process-output theory. Next, data were collected and variables were chosen
that represent the theoretical constructs of interest. Finally, model parameters
were estimated and tested as was the overall fit of the model. Throughout this
chapter, an underlying assumption was that the variables were measured with-
out error. This, of course, is a heroic assumption in most situations. Moreover,
consequences of violating this assumption are fairly well known—namely
measurement error in our exogenous and endogenous variables can attenuate
regression coefficients and induce biased standard errors, respectively
(Duncan, 1975; Jöreskog & Sörbom, 2000). Ideally, the goal would be to con-
duct path analysis on error-free measures. One approach to obtaining error-
free measures of theoretical variables is to develop multiple measures of the
underlying construct and eventually incorporate the construct directly into the path analysis.
In the next chapter, we address the issue of validating measures of our the-
oretical variables via the method of factor analysis. This discussion then leads
into Chapter 4, which combines factor analysis and path analysis into a com-
prehensive structural equation methodology.
Notes
1. Path diagrams do not represent the theory or even the theoretical model,
assuming there is one. Rather, the path diagram is a pictorial representation of a statis-
tical model of the data.
2. In this chapter, I will use standard econometric terminology for describing path
models. Thus, the terms endogenous variables and exogenous variables are terms derived
from econometrics. Other related terms are dependent variables and independent vari-
ables or criterion variables and predictor variables, respectively. Moreover, I will use nota-
tion similar to that used in LISREL (Jöreskog & Sörbom, 2000).
3. This and other path diagrams were drawn using AMOS 4.0.
4. Perhaps more accurately, the parameter vector Φ could be omitted from this
list. The parameter vector Φ contains the variances and covariances of the exogenous
variables and is not structured in terms of model parameters. In fact, estimates in Φ will
be identical to the corresponding elements in the sample covariance matrix S.
5. The focus of attention on these estimators does not result in a loss of general-
ity. Indeed, the maximum likelihood estimator that is discussed is referred to as FIML
(full information maximum likelihood) in the econometrics literature. Moreover, for
recursive models, two-stage least squares and unweighted least squares are identical.
6. This is not to suggest that goodness-of-fit is unimportant. Indeed, serious lack
of fit may be due to specification errors that would, in turn, lead to biased structural
coefficients. However, the evidence suggests that goodness-of-fit dominates the model-
ing effort with little regard to whether the model estimates are sensible or informative.
7. In this section, we will focus on terminology typically encountered in social
and behavioral science applications of structural equation modeling. The literature on
causal inference in structural equation models makes clear distinctions between statis-
tical parameters and causal parameters. We defer that discussion to Chapter 11.
8. Implicit here is the idea that the exogenous variables of interest are truly exoge-
nous. The issue of exogeneity is taken up in Chapter 5.
3
Factor Analysis
The example used throughout this chapter explores the factor structure of
student perceptions of school climate. This problem has important implica-
tions for the input-process-output model not only because student percep-
tions are important education indicators in their own right but they may also
be predictive of achievement.
The data for this example come from the responses of a sample of public
school 10th grade students to survey items in the National Educational
Longitudinal Study (NCES, 1988).1 Table 3.1 defines the items in the question-
naire. After mean imputation of missing data, the 10th grade sample consisted of 12,669 students.2
Although it may be the case that a researcher has a particular model relat-
ing student perceptions of school climate to achievement in mind, of priority
is the measurement of the constructs that are incorporated into the model.
The researcher may postulate that there are several important dimensions to
student perceptions. The question is whether a set of measurements that asks
students to rate their agreement to statements about the climate of the school
correlate in such a way as to suggest the existence of the factors in question.
The model used to relate observed measures to factors is the linear factor
analysis model and can be written as
x = Λx ξ + δ, [3.1]
E(ξ) = 0,   E(δ) = 0,   and   Cov(ξ, δ) = 0.
Under these assumptions, the covariance matrix of the observed data can be written in the form of the fundamental factor analytic equation,

Σ = ΛxΦΛx′ + Θδ,   [3.2]

where Φ is the covariance matrix of the factors and Θδ is the covariance matrix of the unique variables. Following classical test theory, an observed score can be expressed as the sum of a true score and measurement error, so that the true score can be written as

t = x − e.   [3.3]
However, the factor model for the true scores will not contain measurement
error but will contain specific error due to the particular selection of variables
in the model. The factor analytic model for the true scores can be written as
t = Λxξ + s.   [3.4]
To see the rotation problem, let T be an orthogonal transformation matrix such that TT′ = I, and define

Λ* = ΛT,
Φ* = T⁻¹Φ(T′)⁻¹,
Θ* = Θ.

Then

Σ* = Λ*Φ*Λ*′ + Θ*
   = ΛTT⁻¹Φ(T′)⁻¹T′Λ′ + Θ
   = ΛΦΛ′ + Θ
   = Σ.
This shows that any orthogonal transformation of the system will give rise to the same covariance matrix. When this is the case, we say that the model is not identified and that there are k × k = k² indeterminacies that must be removed in order for there to be a unique solution to the factor model. The k² elements correspond to the dimension of the transformation matrix T.

To see how the identification problem is handled, first consider the case of orthogonal factors—that is, factors that are not correlated. For the orthogonal factor case, Φ = I. When there is only one factor, that is, k = 1, setting Φ = I (or φ = 1) removes the k² indeterminacies completely. No orthogonal transformation of the system is possible, and the parameters are uniquely identified. This is the reason that we cannot rotate one factor. When k = 2, then k² = 4, and setting Φ = I removes k(k + 1)/2 = 3 indeterminacies, leaving one remaining indeterminacy. Finally, we can consider the general case of k ≥ 2. Again, with Φ = I, we have removed k(k + 1)/2 indeterminacies, leaving k² − k(k + 1)/2 = k(k − 1)/2 indeterminacies to be removed.
We can see from the above discussion that simply setting Φ = I does not remove all the indeterminacies in the model except in the case when there is only one factor (k = 1). The remaining k(k − 1)/2 restrictions must be placed on the elements of Λ. The manner in which these remaining restrictions are imposed
depends on the method of estimation that is used. For the most part, the dif-
ferences among estimation methods in the manner in which these restrictions
are imposed are arbitrary. However, given that the restrictions are imposed in
an arbitrary fashion to simply fix the reference factors, this arbitrariness can
be exploited for purposes of factor rotation. This topic is further elaborated
on below.
What has been covered so far concerns issues of identification of the para-
meters irrespective of the method of factor extraction. We can now turn to the
problem of factor extraction directly. We consider principal components analy-
sis and the common factor model, both of which use the method of principal
axis factoring for factor extraction. We then briefly consider two statistical
methods of factor analysis that rest on the common factor model, namely
generalized least squares and maximum likelihood estimation. These estima-
tion methods were discussed in greater detail in Chapter 2. Each method is
applied to the problem of estimating the factors of student perception of
school climate.
(Σ − λI)u = 0,   [3.5]

which has a nontrivial solution for u only if

|Σ − λI| = 0.   [3.6]
z = u′x.   [3.8]

The variance of z is then

V(z) = V(u′x)
     = u′V(x)u   [3.9]
     = u′Σu.
But from Equation [3.7], Σ = uDu′. Making use of the fact that u is an orthogonal matrix, Equation [3.9] can be written as

V(z) = u′uDu′u
     = IDI   [3.10]
     = D.
From Equation [3.10], we can see that the principal components are
orthogonal to each other and that the variances of the principal components
are the eigenvalues of Σ.
In the process of extracting the principal components of Σ, the principal
components are ordered in terms of decreasing size of their eigenvalues. With
all components retained, the total variance of the principal components is equal
to the total variance of the original variables. That is,
σz1 + σz2 + ⋯ + σzq = λ1 + λ2 + ⋯ + λq = tr(Σ) = σy1 + σy2 + ⋯ + σyq,
where σzi is the variance of the ith principal component, σyi is the variance of
the ith original variable, and “tr” is the trace operator.
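In R, the eigen-decomposition underlying principal components analysis is available through eigen() (or, equivalently, through prcomp()/princomp()). The sketch below illustrates Equations [3.5] through [3.10] on a small simulated data set; the simulated variables are of course only a stand-in for the NELS school climate items.

```r
set.seed(1)
# Simulated stand-in for a set of observed measures (n = 500, q = 6 variables)
X <- matrix(rnorm(500 * 6), nrow = 500, ncol = 6)
X[, 2] <- X[, 1] + rnorm(500, sd = 0.5)            # induce some correlation
S <- cov(X)

eig <- eigen(S, symmetric = TRUE)                  # Sigma = u D u'
u <- eig$vectors
D <- eig$values

z <- scale(X, center = TRUE, scale = FALSE) %*% u  # principal component scores
round(diag(cov(z)), 3)                             # component variances = eigenvalues
round(D, 3)

sum(D); sum(diag(S))                               # total variance preserved: tr(Sigma)
cumsum(D) / sum(D)                                 # proportion of variance for m components
```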
In the typical application of PCA as a factor analysis method, usually m < q principal components are retained because they account for a substantively important amount of the total variance. This is discerned using the fact that

(λ1 + λ2 + ⋯ + λm)/tr(Σ)

is the proportion of the total variance accounted for by the first m components. It is also common to rescale the principal components to have unit variance, that is,

z* = D^{−1/2}z,

so that V(z*) = I.
[Figure: scree plot of eigenvalues plotted against component number.]
when the diagonal elements are altered, it is not necessarily the case that the
Gramian property will hold.
The question for Thurstone was how to estimate communalities that
resulted in an altered correlation matrix that remained Gramian. If the com-
munalities were underestimated, then it would be possible that the resulting
rank would be too high. Conversely, if the communalities were overestimated
then the factors would account for more than simply the off-diagonal correla-
tions and the rank would not be reduced (Mulaik, 1972, p. 136). Therefore, the
goal was to choose estimates of communalities that resulted in minimal rank
under the required Gramian conditions.
With regard to the choice of communality estimates, the major work on
this problem can be traced to Guttman (1954, 1956). Guttman provided three
different estimates of communality that would result in lower bounds for min-
imum rank. In the interest of space, and given the fact that more recent statis-
tical approaches have rendered the problem of communality estimation somewhat
moot, we consider the most typical form of communality estimation—namely the
use of the squared multiple correlation (SMC).
The SMC of the qth variable with the other q − 1 variables was shown by
Guttman to be the best estimate of the lower bound for communality. The idea
was to compute the SMCs and insert them into the diagonal of the sample
correlation matrix R and then subject the correlation matrix to principal
axis factoring. Before factoring, however, it is necessary to slightly adjust off-
diagonal elements to preserve the required Gramian property.
An early method, suggested by Wrigley (1956), involved updating the
analysis after each factor extraction. That is, for a given a priori number of factors, and with SMCs in the diagonal, the correlation matrix is
subjected to principal axis factoring. Next, the sums of squared estimated
factor loadings from the first factoring, that is, the diagonal elements of $\hat{\Lambda}_1\hat{\Lambda}_1'$, are used as “updated” com-
munality estimates. These updated estimates are now inserted into the diago-
nal of the correlation matrix. This process continues until the difference
between the current communality estimates and the previous communality
estimates is less than an arbitrary constant. Once convergence is obtained, the
final iterated solution is presented for interpretation. What we have described
here is the method of iterated principal axis factoring and is found in most com-
monly used statistical software packages.
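The iterated procedure just described can be sketched in a few lines of R. This is only a minimal illustration of the algorithm, not a reproduction of any particular package's implementation; it assumes a sample correlation matrix R and an a priori number of factors k, starts from squared multiple correlations, and iterates until the communality estimates stabilize.

  # Iterated principal axis factoring: a minimal sketch
  paf <- function(R, k, tol = 1e-6, max.iter = 100) {
    h2 <- 1 - 1 / diag(solve(R))            # SMCs as initial communality estimates
    for (i in 1:max.iter) {
      Rr <- R
      diag(Rr) <- h2                        # reduced correlation matrix
      e <- eigen(Rr, symmetric = TRUE)
      L <- e$vectors[, 1:k, drop = FALSE] %*%
           diag(sqrt(pmax(e$values[1:k], 0)), k)   # loadings on the first k factors
      h2.new <- rowSums(L^2)                # updated communality estimates
      if (max(abs(h2.new - h2)) < tol) break
      h2 <- h2.new
    }
    list(loadings = L, communalities = h2.new, iterations = i)
  }
  # Example call (dat is the hypothetical data frame used earlier):
  # paf(cor(dat), k = 2)$loadings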
$T = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}.$ [3.11]
Note that this matrix has one parameter—namely, the angle of rotation θ.
With the angle of rotation chosen, the last remaining indeterminacy is
removed. In the general case, setting Φ = I removes k(k + 1)/2 indeterminacies,
leaving k(k − 1)/2. These remaining indeterminacies can be resolved by
choosing k(k − 1)/2 angles of rotation.
The decision to rotate the solution usually rests on a desire to achieve a
simple structure representation of the factors (Thurstone, 1935, 1947). Simple
structure criteria are designed to reduce the complexity of the variables—that
is, the number of factors that the variable is loaded on. Ideally, under simple
structure, each variable should have a factor complexity of one—meaning that
a variable should load on one, and only one, factor. Methods of factor rotation
aid in achieving this goal. As we will see, only with the advent of the restricted
factor analysis model do we have a rigorous approach to testing simple struc-
ture hypotheses.
$\Lambda^* = \Lambda T,$
and
$d_j = \sum_{i=1}^{q} \lambda^{*2}_{ij}, \qquad j = 1, \ldots, k,$
where $d_j$ is the sum of the squared loadings for the jth column of Λ*. Then,
varimax maximizes
" q #
X
k X 2
V= ðλ2
ji − dj =qÞ : [3.12]
i=1 j=1
Essentially, Equation [3.12] shows that the varimax criterion maximizes the
sum of squared deviations of the squared loadings from the corresponding col-
umn mean. As shown in Lawley and Maxwell (1971, p. 73), this amounts to
maximization with respect to the elements of the transformation matrix T.
Once maximized, T contains elements whose angles of rotation satisfy the vari-
max criterion.
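In R, an implementation of the varimax criterion (with Kaiser normalization by default) is available in the stats function varimax(), which operates directly on a loadings matrix; promax() provides the corresponding oblique rotation. A brief sketch, assuming L is a q × k matrix of unrotated loadings such as the one returned by the principal axis sketch above:

  vrot <- varimax(L)
  vrot$loadings                 # rotated loadings, Lambda* = Lambda T
  vrot$rotmat                   # the transformation matrix T

  prot <- promax(L)
  prot$loadings                 # obliquely rotated (pattern) loadings
  solve(t(prot$rotmat) %*% prot$rotmat)   # implied factor correlation matrix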
Table 3.3 Promax Rotated Principal Axis Factor Loadings and Factor Correlations
1 2
1 1.000
2 0.713 1.000
tests for maximum likelihood leads to the conclusion that the three-factor
model does not fit the data. This could be affected by sample size and nonnor-
mality. However, the root mean square error of approximation, along with its
90% confidence interval, suggests an approximately good fit of the model. Issues
of goodness-of-fit will be taken up in Chapter 4.
Up to now, the focus of attention has been on methods of extraction that do not require
assumptions regarding the distribution of the data. These were essentially
Table 3.4 Promax Rotated Maximum Likelihood Factor Loadings and Factor
Correlations
1 2
1 1.000
2 0.674 1.000
Once the parameters are estimated, the maximum likelihood solution can
be further rotated to attain greater interpretability. The problem of factor rota-
tion is described in Section 3.4.6.
In addition to maximum likelihood, the method of generalized least
squares can also be used to estimate the parameters of the factor model. The
generalized least squares estimator was originated by Aitken (1935) but was
applied to the factor analysis setting by Jöreskog and Goldberger (1972).
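For readers working in R, maximum likelihood factor analysis of this kind is available through the base function factanal(), which also reports the likelihood ratio test of the hypothesized number of factors. The sketch below assumes the hypothetical data frame dat used earlier rather than the NELS items analyzed in this chapter.

  fit <- factanal(dat, factors = 2, rotation = "promax")
  fit$loadings          # promax-rotated loadings
  fit$STATISTIC         # likelihood ratio chi-square for the two-factor hypothesis
  fit$dof               # degrees of freedom
  fit$PVAL              # p value

  # The model can also be fit from a covariance matrix:
  factanal(covmat = cov(dat), n.obs = nrow(dat), factors = 2)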
POSITIVE CLIMATE
GETALONG 1.000a 0.000 0.000 0.866 0.859
SPIRIT 0.999 0.010 104.674 0.865 0.762
STRICT 0.899 0.010 87.260 0.778 0.674
FAIR 0.989 0.010 100.728 0.856 0.744
RACEFRND 1.013 0.010 102.665 0.876 0.753
TCHGOOD 1.082 0.009 122.594 0.937 0.838
TCHINT 1.107 0.009 123.147 0.958 0.840
TCHPRAIS 1.026 0.009 112.010 0.888 0.795
LISTEN 1.064 0.009 117.955 0.921 0.820
NEGATIVE CLIMATE
DISRUPT 1.000a 0.000 0.000 0.903 0.776
PUTDOWN 0.832 0.010 85.759 0.752 0.741
STUDOWN 0.873 0.010 85.167 0.789 0.737
FEELSAFE 0.797 0.010 82.342 0.720 0.716
IMPEDE 0.958 0.011 85.337 0.865 0.738
MISBEHAV 0.993 0.011 89.840 0.897 0.772
NEGATIVE CLIMATE with POSITIVE CLIMATE
0.598 0.011 56.355 0.765 0.765
χ² (89 df) = 4827.399, p < .05
3.8 Conclusion
[Figure: Path diagram of the two-factor model of school climate, with POS CLIMATE measured by GETALNG through LISTEN and NEG CLIMATE measured by DISRUPT through MISBEHAV.]
Notes
1. The sample of students in the NELS survey does not represent a random sam-
ple of U.S. students. Rather, the NELS survey sampling scheme provides a proportion-
ally representative sample of schools. Within the schools, classroom teachers are sampled
for purposes of course coverage. This is followed by a sample of students within those
classrooms.
2. More accurately, imputation based on the mean of nearby points was used. The
argument is that because students are nested in schools, it is important to attempt to
maintain values that reflect the nesting of students within schools. For this analysis,
means based on five nearby points were chosen. Chapter 8 takes up the problem of fac-
tor analysis in multilevel settings.
3. This decision may rest on an inspection of a “scree” plot that plots the sizes of
the eigenvalues. Other criteria may include the number of eigenvalues exceeding 1.0 or
the percent of variance accounted for by the factor.
4. Extraction of the eigenvalues uses the R function prcomp.
4
Structural Equation Models
in Single and Multiple Groups
T his chapter focuses on linking path analysis with factor analysis into a
comprehensive methodology typically referred to as structural equation
modeling.1 The rationale for linking these two methodologies into a compre-
hensive framework is that by doing so, we mitigate the problems associated with
measurement error, thereby obtaining parameter estimates that are improved in
terms of both bias and sampling variability. The improvement resulting from com-
bining path analysis with factor analysis comes with a price, however—namely,
adding a measurement model to a path model will often dramatically increase
the total number of degrees of freedom available for testing model fit. This is
because, as we saw in Chapter 3, the restricted factor model will typically be
associated with a large number of restrictions reflecting a simple structure
hypothesis underlying the measurement instrument. These added restrictions
make it all the more likely that a reasonably well fitting structural part of the
model will be rejected due to problems within the measurement model.
Moreover, the potential for misspecification in the measurement part of the
model owing to these restrictions can, in some circumstances, propagate into
the structural part of the model (Kaplan, 1988; Kaplan & Wenger, 1993). We
take up these issues in more detail in Chapter 6 when we discuss modeling
strategies. Despite these difficulties, structural equation modeling represents an
extremely important advancement in statistical modeling when the goal is accu-
rate estimation and inference within complex systems.
The organization of this chapter is as follows. First, the basic model spec-
ification is presented. This is followed by a discussion of the problem of iden-
tification that pertains specifically to structural equation models.
Next, we discuss the method of multiple group structural equation mod-
eling as a means of addressing group differences while taking into account
$\eta = B\eta + \Gamma\xi + \zeta,$ [4.1]
$y = \Lambda_y\eta + \varepsilon$ [4.2]
and
$x = \Lambda_x\xi + \delta,$ [4.3]
The problem of identification of path models and factor models was discussed
in Chapters 2 and 3, respectively. Here we discuss identification as it pertains
to the full model in Equation [4.1]. The general problem of identification
remains the same—namely, whether unique estimates of the parameters of the
full model can be determined from the elements of the covariance matrix of
the observable variables. When combining the measurement and structural
models together into a single analytic framework, a set of new identification
conditions can be added to those that have already been presented.
To begin, we note that by adding the measurement model to the path
model, the identification conditions of the measurement model, specifically
that of restricted factor analysis, are required as part of overall identification.
In particular, it is essential to set the metric of the latent variables as discussed
in Section 3.5 of Chapter 3. In the typical case, the metric of the exogenous
latent variables is set either by fixing one loading in each column of Λx to 1.0
or by fixing the diagonal elements of Φ to 1.0. The metric of the endogenous
latent variables is typically set by fixing a loading in each column of Λy to 1.0.
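Although the analyses reported in this chapter use Mplus, the two ways of setting the metric are easy to illustrate with the R package lavaan (used here only as an illustration). By default, lavaan fixes the first loading of each factor to 1.0; the argument std.lv = TRUE instead fixes the factor variances to 1.0 and frees all loadings. The example uses lavaan's built-in HolzingerSwineford1939 data rather than the data analyzed in the text.

  library(lavaan)

  model <- '
    visual  =~ x1 + x2 + x3
    textual =~ x4 + x5 + x6
  '
  # Metric set by fixing one loading per factor to 1.0 (the lavaan default)
  fit1 <- cfa(model, data = HolzingerSwineford1939)

  # Metric set instead by fixing the factor variances to 1.0
  fit2 <- cfa(model, data = HolzingerSwineford1939, std.lv = TRUE)

  summary(fit1, standardized = TRUE)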
With the metric of the latent variables determined, we can now consider a
set of rules provided by Bollen (1989) that can be used to evaluate the identi-
fication status of structural equation models. The first rule is the counting rule
[Figure: Path diagram fragment showing SCIGRA6, SES, SCIGRA10, and SCIACH.]
The two rules just described are either necessary or sufficient, but not
both. Indeed, there appear to be no necessary and sufficient conditions for
identification of the full model. If sufficient conditions for identification are
not met then it may be necessary to directly attempt to solve the structural
form equations in terms of the reduced form coefficients. Generally, however,
except for extremely complex models, the counting rule will work most often
in practice.
$D_\eta = \{\mathrm{diag}[(I - \hat{B})^{-1}(\hat{\Gamma}\hat{\Phi}\hat{\Gamma}' + \hat{\Psi})(I - \hat{B})'^{-1}]\}^{1/2},$ [4.5]
$D_\xi = (\mathrm{diag}\,\hat{\Phi})^{1/2},$ [4.6]
$D_y = \{\mathrm{diag}[\hat{\Lambda}_y(I - \hat{B})^{-1}(\hat{\Gamma}\hat{\Phi}\hat{\Gamma}' + \hat{\Psi})(I - \hat{B})'^{-1}\hat{\Lambda}_y' + \hat{\Theta}_\varepsilon]\}^{1/2},$ [4.7]
and
$D_x = \{\mathrm{diag}[\hat{\Lambda}_x\hat{\Phi}\hat{\Lambda}_x' + \hat{\Theta}_\delta]\}^{1/2},$ [4.8]
of science models, writing reports, making calculations, and so on. A third set
of items assesses the extent to which students feel challenged in the science class-
room, work hard in science class, and are challenged to show understanding. A
fourth set of items examines the extent to which students engage in hands-on
science learning. Finally, a fifth set of items assesses the extent to which students
perceive themselves to be engaged in passive science learning, such as copying
down teachers' notes, listening to lectures, and so on.
An exploratory factor analysis with oblique rotation (not shown) revealed
a very clear five-factor structure along the lines of the sets of items just
described. Of the five factors extracted, it was decided to use two factors that
could be argued to be most relevant for science achievement. These were items
that measured perceptions of hands-on involvement in science learning
(INVOLVE) and those that measured the extent to which the students felt chal-
lenged in the classroom (CHALLG).
In Chapter 2, the variables UNDERST and CHALLG were considered
separate though highly correlated variables. On the basis of a more detailed
exploratory factor analysis, we find that these two variables actually serve as
indicators of a single CHALLG factor. It is argued that the extent to which
students perceive themselves to be challenged in the classroom can be predicted in part by
the extent of active learning. This hypothesis is consistent with the general
input-process-output paradigm discussed in Chapter 1.
The path diagram of the expanded path model incorporating the latent
variables is shown in Figure 4.1. The initial model is similar to that discussed
in Chapter 2. Namely, it is hypothesized that the background student charac-
teristics of previous science grades (SCIGRA6) and socioeconomic status
(SES) influence science achievement indirectly through 10th grade science
grades. The role of teacher certification in science is hypothesized to predict
the extent of hands-on science involvement. This in turn is hypothesized to pre-
dict student perceptions of a challenging classroom environment, which in
turn should predict science achievement through science grades.
The analysis of the initial model was based on a sample of public school
10th grade students from the NELS:88 data set. After listwise deletion, the sam-
ple was 6,677. The analysis used the software program Mplus with maximum
likelihood estimation.
Table 4.1 presents the maximum likelihood estimates of the expanded
science achievement model. An inspection of Table 4.1 reveals a moderately large
and significant effect of perceptions of hands-on involvement on perceptions
of being challenged. However, the direct effect of perceptions of hands-on
involvement on 10th grade science grades is neither significant nor very large.
The results in Table 4.1 are supplemented with a breakdown of the total
and indirect effects displayed in Table 4.2. Here, it can be seen that although
Measurement model
INVOLV BY
MAKEMETH 1.000 0.000 0.000 0.606 0.738
OWNEXP 0.724 0.027 26.469 0.439 0.605
CHOICE 0.755 0.029 26.036 0.458 0.507
CHALL BY
CHALLG 1.000 0.000 0.000 0.917 0.748
UNDERST 0.757 0.024 31.043 0.694 0.503
WORKHARD 0.867 0.026 32.921 0.795 0.723
Structural model
CHALL ON
INVOLV 0.251 0.027 9.282 0.166 0.166
INVOLV ON
CERTSCI 0.012 0.031 0.399 0.021 0.006
SCIGRA10 ON
CHALL 0.264 0.026 10.319 0.242 0.133
SCIACH ON
SCIGRA10 1.217 0.036 33.687 1.217 0.381
SCIGRA10 ON
SCIGRA6 0.788 0.021 37.147 0.788 0.416
SES 0.240 0.028 8.734 0.240 0.098
CERTSCI 0.032 0.068 0.466 0.032 0.005
Table 4.2 Standardized Total and Indirect Effects for the Expanded Science
Achievement
This section considers the extension of the single group model discussed in this
chapter and in Chapters 2 and 3, to multiple group situations. To motivate this
section, consider the substantive problem of student perceptions of school climate
as discussed in Chapter 3. Suppose that an investigator wishes to understand
the differences between public and private schools in terms of student perceptions
of school climate. Using, for example, the National Educational Longitudinal
Study (NCES, 1988), it is possible to identify whether students belong to public or
private (e.g., Catholic, other religious, and other nonreligious) schools.
An investigator may have a program of research designed to study school-type
differences in student perceptions of school climate eventually relating these differ-
ences to important educational outcomes such as academic achievement and
student dropout. First, the investigator might wish to determine whether the mea-
surement structure of the student perception items operates the same way across
school types. Second, the investigator may be interested in knowing if the means of
the factors of school climate are different between public and private school students.
$x_g = \Lambda_{x_g}\xi_g + \delta_g,$ [4.9]
$\log L_0(\Omega)_g = -\frac{n_g}{2}\left[\log|\Sigma_g| + \mathrm{tr}(S_g\Sigma_g^{-1})\right],$ [4.10]
$\log L_0(\Omega) = \sum_{g=1}^{G}\log L_0(\Omega)_g.$ [4.11]
Given the model specification and the requisite assumptions, the first
test that may be of interest is the equality of covariance matrices across
groups. Note that for this first step, no structure is imposed. Rather, the goal
is simply to determine if the covariance matrices differ. Borrowing from
Jöreskog’s (1971) notation, the null hypothesis for this first step can be writ-
ten as
$H_0: \Sigma_1 = \Sigma_2 = \cdots = \Sigma_G.$ [4.13]
This hypothesis is tested using Box’s M test (Timm, 1975) and can be writ-
ten as
$M = n\log|S| - \sum_{g=1}^{G} n_g\log|S_g|,$ [4.14]
$d = \frac{1}{2}(g - 1)q(q + 1).$ [4.15]
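Equations [4.14] and [4.15] are simple to compute directly. A minimal R sketch is given below; it assumes a list of group sample covariance matrices and their sample sizes, and it takes S to be the n_g-weighted pooled covariance matrix (an assumption of this sketch).

  # Box's M statistic [4.14] and its degrees of freedom [4.15]
  box_m <- function(S_list, n_g) {
    S <- Reduce(`+`, Map(`*`, S_list, n_g)) / sum(n_g)   # pooled covariance matrix
    M <- sum(n_g) * log(det(S)) - sum(n_g * log(sapply(S_list, det)))
    q <- ncol(S_list[[1]])
    G <- length(S_list)
    list(M = M, df = 0.5 * (G - 1) * q * (q + 1))
  }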
$H_{0k}: k_1 = k_2 = \cdots = k_G,$ [4.16]
$H_{0\Lambda}: \Lambda_1 = \Lambda_2 = \cdots = \Lambda_G.$ [4.18]
$H_{0\Lambda\Theta}: \Lambda_1 = \Lambda_2 = \cdots = \Lambda_G,$ [4.20]
$\Theta_1 = \Theta_2 = \cdots = \Theta_G.$
$d_{\Lambda\Theta} = \frac{1}{2}gq(q + 1) - qk + q - \frac{1}{2}gk(k + 1) - q.$ [4.21]
Again, if this test is not rejected, then further sequential testing is allowed.
Finally, one can test for complete invariance of all parameters across
groups by adding the constraint that the factor covariance matrices Φg be
equal across groups. The hypothesis can be represented as
$H_{0\Lambda\Theta\Phi}: \Lambda_1 = \Lambda_2 = \cdots = \Lambda_G,$
$\Theta_1 = \Theta_2 = \cdots = \Theta_G,$ [4.22]
$\Phi_1 = \Phi_2 = \cdots = \Phi_G.$
$d_{\Lambda\Theta\Phi} = \frac{1}{2}gq(q + 1) - qk + q - \frac{1}{2}k(k + 1) - q.$ [4.23]
public school students in line with the sample size for private school students,
a random sample of 500 public school students was drawn from the total sam-
ple of 11,000.
An exploratory factor analysis of both groups separately (not shown)
revealed that the same two factors held for both groups. Therefore, we proceed
with the sequence of testing proposed by Jöreskog (1971), beginning with the
test of factorial invariance.
Table 4.3 displays the sequential testing of invariance restrictions.
Preliminary analyses (not shown) revealed that the hypothesis of equality of
covariance matrices was rejected, suggesting that we can begin to explore
increasingly restrictive hypotheses of school type differences in student per-
ceptions of school climate.
The next hypothesis under Model 1 is that of invariance of factor loadings
corresponding to Equation [4.17]. The analysis suggests that the hypothesis of
invariance of factor loadings is rejected. Strictly speaking, this finding suggests
that while the number of factors is the same for both groups, the relationship
between the variables and their corresponding factors is not. According to the
testing strategy proposed by Jöreskog (1971), hypothesis testing stops at this
point. For completeness, however, we provide the remaining tests as well as chi-
square difference tests that compare increasing restrictions.
a. Model 1: Invariance of factor loadings. Model 2: Invariance of factor loadings and measurement
error variances. Model 3: Invariance of factor loadings, measurement error variances, and factor
variances and covariances.
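The sequence of increasingly restrictive models in Table 4.3 can be outlined with the R package lavaan (an illustration only; the analyses in the text were done in Mplus). The group.equal argument adds the equality constraints, and chi-square difference tests compare adjacent models. The built-in HolzingerSwineford1939 data are used as a stand-in for the school climate items.

  library(lavaan)

  model <- '
    visual  =~ x1 + x2 + x3
    textual =~ x4 + x5 + x6
  '
  dat <- HolzingerSwineford1939

  m0 <- cfa(model, data = dat, group = "school")   # same pattern, all parameters free
  m1 <- cfa(model, data = dat, group = "school",
            group.equal = "loadings")                          # Model 1: invariant loadings
  m2 <- cfa(model, data = dat, group = "school",
            group.equal = c("loadings", "residuals"))          # Model 2: + error variances
  m3 <- cfa(model, data = dat, group = "school",
            group.equal = c("loadings", "residuals",
                            "lv.variances", "lv.covariances")) # Model 3: + factor (co)variances

  anova(m0, m1, m2, m3)   # chi-square difference tests for increasing restrictions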
$x_g = \tau + \Lambda_g\xi_g + \delta_g,$ [4.24]
$E(x_g) = \tau + \Lambda_g E(\xi_g) = \tau + \Lambda_g\kappa_g,$ [4.25]
$E(x_g) = \tau - \Lambda d + \Lambda(\kappa_g + d) = \tau + \Lambda\kappa_g.$ [4.26]
This section covers a special case of the general structural equation model
considered above for the estimation of group differences on latent variables.
a. Estimate reflects factor mean difference between private school students (=1) and public school
students (=0) under the assumption of factorial invariance. Note that scales are coded 1 = strongly
agree, 4 = strongly disagree.
This model is referred to as the MIMIC model, standing for the multiple
indicators and multiple causes model, and was proposed by Jöreskog and
Goldberger (1975).
Consider once again the goal of estimating school type differences on
student perceptions of school climate. Denote by x a vector that contains
dummy codes representing group membership (e.g., 1 = public school; 0 = pri-
vate school). Then, the MIMIC model can be written as
$y = \Lambda_y\eta + \varepsilon,$
$\eta = \Gamma x + \zeta,$ [4.27]
$x \equiv \xi.$
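In lavaan syntax (again only an illustration of the specification, not the Mplus setup used in the text), the MIMIC model in Equation [4.27] amounts to regressing the factors on the observed dummy code:

  library(lavaan)

  dat <- HolzingerSwineford1939
  dat$schtype <- as.numeric(dat$school == "Grant-White")   # 0/1 dummy code for group

  mimic <- '
    visual  =~ x1 + x2 + x3
    textual =~ x4 + x5 + x6
    visual  ~ schtype
    textual ~ schtype
  '
  fit <- sem(mimic, data = dat)
  summary(fit)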
[Figure: Path diagram of the MIMIC model, with SCHOOL TYPE predicting POS CLIMATE (measured by GETALNG through LISTEN) and NEG CLIMATE (measured by DISRUPT through MISBEHAV).]
$\eta = B\eta + \Gamma\xi + \zeta,$
$y = \Lambda_y\eta + \varepsilon,$ [4.28]
$x = \Lambda_x\xi + \delta.$
Measurement part
POS BY
GETALNG 1.000 0.000 0.000
SPIRIT 0.945 0.033 28.438
STRICT 0.880 0.036 24.315
FAIR 0.971 0.032 29.993
RACEFRND 1.001 0.033 30.000
TCHGOOD 1.087 0.031 35.201
TCHINT 1.072 0.029 37.514
TCHPRAIS 1.029 0.031 33.702
LISTEN 1.023 0.033 31.121
NEG BY
DISRUPT 1.000 0.000 0.000
PUTDOWN 0.825 0.034 24.162
STUDOWN 0.867 0.036 23.778
FEELSAFE 0.790 0.034 23.155
IMPEDE 0.963 0.040 23.975
MISBEHAV 0.960 0.038 25.324
Structural part
POS ON
SCHTYPE −0.279 0.067 −4.161
NEG ON
SCHTYPE 0.284 0.070 4.041
NEG WITH
POS 0.694 0.045 15.337
matrix, and hence, one of the elements of B must be fixed to 1.0. Also, in this
parameterization, the matrix Γ contains the regressions of the factor as well
as its indicators on the exogenous variables. Moreover, in the case of a single
factor, the vector ζ contains p + 1 elements. The first p elements are the
uniquenesses associated with the elements of ε and the last element is the dis-
turbance term ζ.
A natural question that arises in the context of the educational tracking exam-
ple is whether we can unequivocally ascribe mean differences in latent self-
concept to the effect of educational tracking. When there is random
assignment of observations to conditions, we are in a stronger position to
argue for causal effects of treatments. However, in this example, random
assignment was not utilized. Indeed, numerous manifest and latent variables
are in play that select children into educational tracks. This section considers
the problem of nonrandom selection into treatment conditions as it pertains
to issues of factorial invariance. We also consider methods that attempt to
account for nonrandom selection mechanisms in latent variable models of
group differences.
population of children, still holds within educational tracks, where the tracks
are not usually formed by random assignment.
To motivate the problem of factorial invariance, consider again the factor
analytic model discussed in Chapter 3. As in Equation [3.2] the covariance
matrix of the q variables can be written as
$\Sigma = \Lambda\Phi\Lambda' + \Theta,$ [4.29]
$\mu = \tau_x + \Lambda\kappa,$ [4.30]
$\mu_s = \tau_x + \Lambda\kappa_s,$ [4.31]
and
$\Sigma_s = \Lambda\Phi_s\Lambda' + \Theta_s.$ [4.32]
Equations [4.31] and [4.32] mean that under strong factorial invari-
ance the structural intercepts and the factor loadings are invariant across the
groups but that the factor means, factor covariance matrix, and covariance
matrix of the uniquenesses may differ. In contrast, strict factorial invariance
retains Equation [4.32] but now requires that the matrix of unique variances is
constant across subpopulations—namely, that
$\Sigma_s = \Lambda\Phi_s\Lambda' + \Theta.$ [4.33]
4.8 Conclusion
This chapter covered the merging of path analysis and factor analysis into a
comprehensive structural equation modeling methodology. We extended the
basic model to cover modeling in multiple groups. In addition, we discussed
new developments in modeling nonrandom selection that can occur in quasi-
experimental applications of structural equation modeling.
The general model discussed in this chapter, as well as the special cases dis-
cussed so far, rests on a certain set of statistical assumptions. It is crucial for
these assumptions to be met in order to have confidence in the inferences drawn from
application of structural equation modeling. The next chapter discusses, in detail,
the assumptions underlying structural equation modeling.
Notes
1. Duncan (1975) has used the term structural equation modeling to refer to what
we are calling path analysis. Although the terminology is somewhat arbitrary, I have
decided to maintain the common parlance used in this field.
2. The focus on multiple group measurement models is due to the fact that these
are the most common models studied across groups. Presenting the multiple group
measurement model does not result in a loss of generality.
3. It is possible to assess group differences in the unrestricted model, using a vari-
ety of factor comparability measures. See, for example, Harman (1976).
4. However, this is only reasonable if there is random selection of observations
and random assignment to groups.
5
Statistical Assumptions Underlying
Structural Equation Modeling
5.1 Nonnormality
As with any other sample statistic, one can obtain its variance and covari-
ance with other sample statistics. The general form of the asymptotic covari-
ance matrix of covariances in s can be written as
$(N - 1)\,\mathrm{acov}(s_{ij}, s_{gh}) = \sigma_{ig}\sigma_{jh} + \sigma_{ih}\sigma_{jg} + \frac{N - 1}{N}\kappa_{ijgh},$ [5.3]
With Equation [5.4] as the weight matrix, FWLS reduces to the generalized
least squares estimator discussed in Chapter 2.
The robustness properties of the ADF estimator have been the subject of
numerous simulation studies. The results of early studies of the ADF estimator
were somewhat mixed. For example, Browne (1982) found ADF estimates to
be biased. Muthén and Kaplan (1985, 1992), on the other hand, found very lit-
tle bias in ADF estimates. In all cases, the ADF chi-square was smaller than ML
chi-square when applied to continuous nonnormal data. However, when ADF
was applied to categorical data, B. Muthén and Kaplan (1992) found that the
ADF chi-square was markedly sensitive and that this sensitivity increased as the
size of the model increased. Moreover, ADF standard errors were noticeably
downward biased, becoming worse as the model size increased.
A troubling feature of the ADF estimator concerns the computational dif-
ficulties encountered for models of moderate size. Specifically, with p variables
there are u = ½p(p + 1) elements in the sample covariance matrix S. The weight
matrix W is of order u × u. Therefore, the size of the weight matrix grows
rapidly with the number of variables. So, if a model were estimated with 20 vari-
ables, the weight matrix would contain 22,155 distinct elements. Moreover,
ADF estimation required that the sample size (for each group if relevant) exceed
p + ½p(p + 1) to ensure that the weight matrix is nonsingular. These constraints
have limited the utility of the ADF estimator in applied settings. Below, we dis-
cuss some new estimators that appear to work well for small samples.
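The arithmetic behind these figures is easily verified:

  p <- 20
  u <- p * (p + 1) / 2     # nonredundant elements of S: 210
  u * (u + 1) / 2          # distinct elements of the u x u weight matrix: 22,155
  p + p * (p + 1) / 2      # minimum sample size for a nonsingular weight matrix: 230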
More recently, three expectation-maximization (EM) based ML estima-
tors have been developed for the structural equation modeling framework
which provide robust chi-square tests and correct standard errors under non-
normality. These estimators are distinguished by the approach they take for the
calculation of standard errors. The first method uses a first-order approxima-
tion of the asymptotic covariance matrix of the estimates to obtain the stan-
dard errors and is referred to as the MLF estimator. The second method is the
conventional ML estimator that uses the second-order derivatives of the
observed log-likelihood. The third method is based on a sandwich estimator
derived from the information matrices of ML and MLF and produces the
$y_i = \begin{cases} C_i - 1 & \text{if } \nu_{i,C_i-1} < y_i^* \\ C_i - 2 & \text{if } \nu_{i,C_i-2} < y_i^* \le \nu_{i,C_i-1} \\ \;\;\vdots \\ 1 & \text{if } \nu_{i,1} < y_i^* \le \nu_{i,2} \\ 0 & \text{if } y_i^* \le \nu_{i,1} \end{cases}$ [5.5]
$y^* = \tau_y + \Lambda\eta + \varepsilon,$ [5.6]
$\eta = \alpha + B\eta + \Gamma x + \zeta.$ [5.7]
variables x that do not follow a linear factor model. This specification allows
for two types of cases of Muthén’s general model. In Case A, there is no x vec-
tor. This case allows Muthén’s framework to capture all of the models consid-
ered by Jöreskog and Sörbom (1993). For Case B, the vector x is included
allowing one to capture models in which x is a nonstochastic vector of vari-
ables (such as dummy variables). An example of such a model would be the
MIMIC model discussed in Chapter 4, where x may represent treatment group
conditions.
If it is reasonable to assume that continuous and normally distributed y∗
variables underlie the categorical y variables, then from classic psychometric
theory a variety of latent correlations can be specified. Table 5.1 summarizes
Pearson and the latent correlations.
The first step in Muthén’s approach is to estimate the thresholds for the
categorical variables using ML. In the second step, the latent correlations (e.g.,
tetrachoric correlations) are estimated. Finally, in the third step, a consistent
estimator of the asymptotic covariance matrix of the latent correlations is
obtained and implemented in a WLS estimator.
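This three-step approach is automated in standard software. As an illustration with the R package lavaan (the analyses reported below use Mplus), declaring indicators as ordered triggers threshold estimation, the latent (polychoric) correlations, and a robust WLS estimator; the ordinal items here are simulated for the example.

  library(lavaan)

  set.seed(1)
  f      <- rnorm(300)                                        # latent factor scores
  ystar  <- outer(f, c(1, 0.8, 0.7, 0.6)) + matrix(rnorm(1200), 300, 4)
  ordcat <- as.data.frame(lapply(as.data.frame(ystar),
              function(v) cut(v, c(-Inf, -0.5, 0.5, Inf), labels = FALSE)))
  names(ordcat) <- paste0("y", 1:4)

  model <- 'f1 =~ y1 + y2 + y3 + y4'
  fit <- cfa(model, data = ordcat,
             ordered   = c("y1", "y2", "y3", "y4"),   # treat the items as categorical
             estimator = "WLSMV")                     # robust weighted least squares
  lavInspect(fit, "sampstat")   # thresholds and latent correlations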
Observe that Muthén’s approach is quite flexible insofar as any combina-
tion of categorical and continuous observed variables can be present in the
data. The only requirement is the assumption that the categorical variables are
associated with continuous normally distributed latent response variables. A
trivariate normality test was offered by Muthén and Hofacker (1988). If the
assumption of trivariate normality holds, then bivariate normality could be
assumed for each pair.
The first simulation study examining the performance of the categorical
variable methodology (CVM) estimator compared with estimators for contin-
uous variables was Muthén and Kaplan (1985). In this article, Muthén and
Kaplan examined the ability of CVM to recover the parameters of the y∗ model
when the variables were split into 25%/75% dichotomies. The results showed
that CVM yielded a slight underestimation of chi-square but that the parame-
ter estimates and sampling variability were well in line with expected values.
Similar findings were observed for multiple group mean structure models
(Kaplan, 1991b).
WLS (B. Muthén, 1984) and compare the results with the new robust estima-
tion methods for categorical data (B. Muthén et al., in press). Normal theory
ML results are reproduced for comparative purposes. All analyses used Mplus
(L. Muthén & Muthén, 2006).
Table 5.2 presents the results of the estimators under study. A comparison
of the WLS estimators against ML reveals some noticeable differences. In
particular, the standard errors under the WLS-based estimators are generally
smaller than those under normal theory ML. This is expected given theoretical
studies of these estimators. The changes in the estimates do not follow
any particular pattern. Finally, and perhaps most noticeably, the chi-
square test of model fit is smaller for the WLS-based estimators compared to ML.
Indeed, the robust WLS estimators result in progressively smaller goodness-of-
fit tests compared to WLS without such a correction.
Estimates (S.E.)
IRTSCI ON
SCIGRA10 1.228 (0.034) 1.873 (0.052) 1.876 (0.051) 1.876 (0.051)
SCIGRA10 ON
CHALLG −0.033 (0.017) −0.035 (0.012) −0.038 (0.012) −0.038 (0.012)
SCIGRA6 0.781 (0.020) 0.500 (0.012) 0.544 (0.013) 0.544 (0.013)
SES 0.239 (0.026) 0.207 (0.017) 0.288 (0.017) 0.288 (0.017)
CERTSCI −0.040 (0.039) −0.037 (0.025) −0.049 (0.025) −0.049 (0.025)
UNDERSTD 0.168 (0.015) 0.144 (0.013) 0.153 (0.013) 0.153 (0.013)
CHALLG ON
UNDERSTD 0.318 (0.010) 0.409 (0.011) 0.408 (0.011) 0.408 (0.011)
UNDERSTD ON
CERTSCI −0.031 (0.033) −0.019 (0.025) −0.017 (0.026) −0.017 (0.026)
Residual variances
UNDERSTD 1.858 (0.031)
CHALLG 1.250 (0.021)
SCIGRA10 2.637 (0.043)
IRTSCI 29.290 (0.483) 19.452 (0.491) 22.759 (0.501) 22.759 (0.501)
Chi-square 1321.31 1190.19 1001.68 901.51
df 10 10 10 9
It should be pointed out that the results of this application, though gener-
ally consistent with theoretical expectations, may not occur in practice. Indeed,
the application of estimators that account for nonnormality may reveal other
problems that could actually inflate chi-square and standard errors.
MAR. This is because although the observed values on Y are not a random
sample from the original sample, they are a random sample within subgroups
defined on X (Little & Rubin, 2002). Finally, if the probability of responding to
Y depends on Y even after controlling for X, then the data are neither MAR nor
OAR. Here again, the missing data mechanism is nonignorable.
$\mathrm{acov}(s_{ij}, s_{gh}) = \frac{1}{n}(\sigma_{ig}\sigma_{jh} + \sigma_{ih}\sigma_{jg}),$ [5.8]
$\mathrm{acov}(s_{ij}, s_{gh}) = \frac{n_{ijgh}}{n_{ij}\,n_{gh}}(\sigma_{ig}\sigma_{jh} + \sigma_{ih}\sigma_{jg}),$ [5.9]
$y^* = \tau + \Lambda\eta + \varepsilon,$ [5.10]
$s^* = \Gamma_\eta\eta + \Gamma_{y^*}y^* + \delta,$ [5.11]
where $\Gamma_\eta$ and $\Gamma_{y^*}$ are coefficient matrices that allow the selection to be a function
of η, y∗, or both, and δ is a vector of disturbances. In essence, the vector s∗
$\theta = (\tau, \Lambda, \Psi, \Theta_\varepsilon),$ [5.13]
where the matrix $\Theta_{\delta\varepsilon}$ allows for the possibility that ε and δ are correlated.
The approach taken by Muthén et al. (1987) is to arrange the data into G
distinct missing data patterns. Let Σg and Sg be the population and sample
covariance matrices for the gth missing data pattern, respectively. Then, follow-
ing Little (1982), B. Muthén et al. (1987), and Rubin (1976), the log-likelihood
can be written as
$\log L(\theta, \varphi \mid y) = \sum_{g=1}^{G}\log h_g(\theta \mid y) + \log f(\theta, \varphi \mid y),$ [5.15]
where
$\log h_g(\theta \mid y) = \mathrm{const.} - \frac{1}{2}N^g\log|\Sigma_g| - \frac{1}{2}N^g\,\mathrm{tr}\{\Sigma_g^{-1}[S_g + (\bar{y}^g - \mu^g)(\bar{y}^g - \mu^g)']\}.$ [5.16]
$s^* = \Gamma y^* + \delta.$ [5.17]
B. Muthén et al. (1987) show that when Γ2 = 0, and Θδε = 0, then the sec-
ond term on the right-hand side of Equation [5.18] does not enter into the dif-
ferentiation with respect to the model parameters in the first term of the
right-hand side of Equation [5.18]. These conditions satisfy the definition of
MAR.
Next, consider the case where respondents omit data at Time 2 due to true
academic self-concept rather than on any one or more of the unreliable measures
of academic self-concept. In this case, the selection model can be written as
$s^* = \Gamma_\eta\eta + \delta.$ [5.19]
B. Muthén et al. (1987) show that in this case, the likelihood involves both
model parameters θ and missing data parameters φ. Hence, the assumption of
MAR is not satisfied in this case and the missing data mechanism is not ignorable.
Some Studies Using the FQL Estimator. The FQL estimator was compared with
LPA and PPA in an extensive simulation study by B. Muthén et al. (1987). The
general findings were that the FQL estimator was superior to the more tradi-
tional approaches to handling missing data even under conditions where it was
not appropriate to ignore the mechanism that generated the missing data.
In another simulation study, Kaplan (1995b) demonstrated the superior-
ity of the FQL estimator over the PPA approach for data that were missing com-
pletely at random by design.
Problems With the FQL Estimator. The major problem associated with the FQL
estimator is that it is restricted to modeling under a relatively small number of
distinct missing data patterns. Specifically, for the covariance matrices for each
distinct group to be positive definite, the number of respondents in any given
group must be one more than the total number of variables in the model.4
With the exception of cases where missing data is by design,5 small numbers
for distinct missing data patterns are rare in social and behavioral science
applications.
Once the mean vectors and covariance matrices have been formed, the full
information ML approach of Arbuckle (1996) uses the fact that for the ith case,
the log-likelihood function can be written as
$\log L_i = C_i - \frac{1}{2}\log|\Sigma_i| - \frac{1}{2}(x_i - \mu_i)'\Sigma_i^{-1}(x_i - \mu_i),$ [5.20]
and the log-likelihood of the entire sample is the sum of the individual log-
likelihoods. As usual, the likelihood is maximized in terms of the parameters of
the model.
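Equation [5.20] can be made concrete in a few lines of R. The sketch below computes the casewise log-likelihood contribution by subsetting μ and Σ to the entries observed for that case; the constant C_i is taken here to be the usual multivariate normal normalizing constant. It illustrates the mechanics only and is not a reproduction of any particular program's implementation.

  # Casewise log-likelihood for one observation x (with possible NAs),
  # given a mean vector mu and covariance matrix Sigma
  loglik_case <- function(x, mu, Sigma) {
    obs <- !is.na(x)                                  # entries observed for this case
    xo  <- x[obs]; muo <- mu[obs]
    So  <- Sigma[obs, obs, drop = FALSE]
    k   <- sum(obs)
    drop(-0.5 * k * log(2 * pi) - 0.5 * log(det(So)) -
         0.5 * t(xo - muo) %*% solve(So) %*% (xo - muo))
  }
  # The sample log-likelihood is the sum of the casewise contributions:
  # sum(apply(X, 1, loglik_case, mu = mu, Sigma = Sigma))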
A comparable development in estimation under MAR was proposed by
B. Muthén and incorporated in Mplus (L. Muthén & Muthén, 2006). In the
first step the mean, variances, and covariances are estimated via ML using the
EM algorithm suggested by Little and Rubin (2002) with no restrictions. This
is referred to as the H1 model. Then, the model of interest (H0) is estimated
conditional on the exogenous variables.6 If there are missing values on the exoge-
nous variables, they are estimated via ML using EM and held fixed at those val-
ues when the H0 model is estimated. A fitting function similar to Equation [5.20]
is used, yielding a large sample chi-square test of model fit.
Simulation studies of ML under MAR have been undertaken. For example,
Arbuckle (1996) compared his full-information ML approach with PPA and
LPA under missing data conditions of MCAR and MAR. His results suggested
that the ML approach performed about the same as PPA and LPA with respect
to parameter estimate bias but outperformed these ad hoc methods with respect
to sampling variability. More recently, a simulation study by Enders and
Bandalos (2001) demonstrated the effectiveness of full-information ML relative to PPA
and LPA under MCAR and MAR.
$W_{21} = n\,[\hat{\omega}_1 \;\; \hat{\omega}_2]\begin{bmatrix}\mathrm{Var}(\hat{\omega}_1) & \mathrm{sym.}\\ \mathrm{Cov}(\hat{\omega}_1, \hat{\omega}_2) & \mathrm{Var}(\hat{\omega}_2)\end{bmatrix}^{-1}\begin{bmatrix}\hat{\omega}_1\\ \hat{\omega}_2\end{bmatrix},$ [5.21]
where $\mathrm{Var}(\hat{\omega}_1)$ and $\mathrm{Var}(\hat{\omega}_2)$ are the variances of $\hat{\omega}_1$ and $\hat{\omega}_2$, and $\mathrm{Cov}(\hat{\omega}_1, \hat{\omega}_2)$ is the
covariance of $\hat{\omega}_1$ and $\hat{\omega}_2$, which is not assumed to be zero. In this case, the
determinant of the middle term in Equation [5.21] can be written as
$D = V_1V_2 - C_{21}^2$, where $V_1 = \mathrm{Var}(\hat{\omega}_1)$, $V_2 = \mathrm{Var}(\hat{\omega}_2)$, and $C_{21} = \mathrm{Cov}(\hat{\omega}_1, \hat{\omega}_2)$.
Thus, Equation [5.21] can be reexpressed as
$W_{21} = \frac{n}{D}\left[\hat{\omega}_1^2 V_2 - \hat{\omega}_1\hat{\omega}_2 C_{21} - \hat{\omega}_2\hat{\omega}_1 C_{21} + \hat{\omega}_2^2 V_1\right].$ [5.22]
From Equation [5.22], it can be seen that the Wald test involves the covari-
ance of the parameters C21. If C21 = 0, then the simultaneous Wald test in
Equation [5.22] decomposes into the sum of two univariate Wald tests. When
C21 = 0, we say that the test statistics are asymptotically independent and the
hypotheses associated with these tests are separable (Aitchison, 1962; see also
Satorra, 1989).
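A small numerical sketch in R, using hypothetical estimates and an assumed asymptotic covariance matrix, shows the behavior described above: when C21 = 0 the joint Wald test equals the sum of the two univariate tests, and otherwise it does not.

  # Joint Wald test for two parameter estimates (Equations [5.21] and [5.22])
  wald2 <- function(omega, V, n) n * drop(t(omega) %*% solve(V) %*% omega)

  n     <- 500
  omega <- c(0.20, 0.15)                        # hypothetical estimates
  V0    <- diag(c(0.010, 0.008))                # C21 = 0
  V1    <- matrix(c(0.010, 0.004,
                    0.004, 0.008), 2, 2)        # C21 != 0

  wald2(omega, V0, n)                                    # joint test, C21 = 0
  n * omega[1]^2 / V0[1, 1] + n * omega[2]^2 / V0[2, 2]  # sum of univariate tests (equal)
  wald2(omega, V1, n)                                    # joint test, C21 != 0 (not equal)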
The example just given for the case of two hypotheses can be expanded to
sets of multivariate hypotheses (see Kaplan & Wenger, 1993). Of particular
relevance to the problem of specification error propagation is the case of three
hypotheses. Specifically, Kaplan and Wenger (1993) considered the problem of
restricting two parameter estimates, say $\hat{\omega}_1$ and $\hat{\omega}_3$, that have zero covariance,
that is, $C_{31} = 0$. However, $\hat{\omega}_1$ has a nonzero covariance with $\hat{\omega}_2$ and $\hat{\omega}_3$ has a
nonzero covariance with $\hat{\omega}_2$. Then, in this case, $\hat{\omega}_1$ and $\hat{\omega}_3$ are asymptotically
independent, but the joint hypothesis that $\hat{\omega}_1$ and $\hat{\omega}_3$ are zero will not decom-
pose into the sum of the individual hypotheses because of their “shared” covari-
ance with $\hat{\omega}_2$. In other words, these particular hypotheses are not separable.
Kaplan and Wenger (1993) referred to hypotheses of this sort as transitive. For
the hypotheses associated with these parameters to be separable, all three para-
meters must be mutually asymptotically independent.
The above discussion suggests that the pattern of zero and nonzero values
in the covariance matrix of the estimates is the underlying mechanism that
governs the impact of testing parameters, either singly or in sets. In the context
of specification error, if a parameter (or sets of parameters) has a zero covari-
ance with another parameter, then the restriction of one (say on the basis of
the Wald test) will not affect the estimate or standard error of the other para-
meter. In addition, the concept of transitive hypotheses suggests that if two
parameters are asymptotically independent but not mutually asymptotically
independent with respect to a third parameter, then the restriction of one will
affect the other due to its shared covariance with that third parameter (Kaplan &
Wenger, 1993).
$f(z \mid \theta) = f(z_1, z_2, \ldots, z_N \mid \theta) = \prod_{i=1}^{N} f(z_i \mid \theta),$ [5.23]
Following our notation from above, arrange the conditional and marginal
parameters accordingly:
Note that for the bivariate normal (and by extension the multivariate nor-
mal), x is weakly exogenous for the estimation of the parameters in ω1 because
the parameters of the marginal distribution ω2 do not appear in the set of the
parameters for the conditional distribution ω1. In other words, the choices of
values of the parameters on ω2 do not restrict in any way the range of values
that the parameters in ω1 can take.
The multivariate normal distribution, as noted above, belongs to the class
of elliptically symmetric distributions. Other distributions include Student’s t,
the logistic, and the Pearson type III distributions. Consider the case where the
joint distribution can be characterized by a bivariate Student’s t distribution
(symmetric but leptokurtotic). The conditional and marginal densities under
the bivariate Student’s t can be written as (Spanos, 1999)
$(y \mid x) \sim \mathrm{St}\!\left((\beta_0 + \beta_1 x),\ \frac{\nu\sigma^2}{\nu - 1}\left[1 + \frac{(x - \mu_2)^2}{\nu\sigma_{22}^2}\right];\ \nu + 1\right),$
$x \sim \mathrm{St}(\mu_2, \sigma_{22}^2;\ \nu),$ [5.27]
5.5 Conclusion
Notes
1. The vech(·) operator takes the ½s(s + 1) nonredundant elements of the matrix
and strings them into a vector of dimension [½s(s + 1)] × 1.
6
Evaluating and Modifying
Structural Equation Models
$\mathrm{NFI} = \frac{\chi_b^2 - \chi_t^2}{\chi_b^2},$ [6.1]
where χ2b is the chi-square for the model of complete independence (the so-
called baseline model) and χ2t is the chi-square for the target model of interest.
The model of complete independence will typically be associated with a very
large chi-square value because the null hypothesis tested by χ2b states that there
are no covariances among the variables in the population. Therefore, values
close to 0 suggest that the target model is not much better than a model of
complete independence among the variables. Values close to 1 suggest that the
target model is an improvement over the baseline model. As noted above, a
value of 0.95 is considered evidence that the target model is a good fit to the
data relative to the baseline model.
An index that is similar to the NFI but which takes into account the
expected value of the chi-square statistic of the target model is the Tucker-
Lewis index (TLI; Tucker & Lewis, 1973), also referred to as the nonnormed fit
index (NNFI).1 The TLI can be written as
$\mathrm{TLI} = \frac{(\chi_b^2/df_b) - (\chi_t^2/df_t)}{(\chi_b^2/df_b) - 1},$ [6.2]
where dfb denotes the degrees of freedom for the model of complete independence
and dft denotes the degrees of freedom for the target model of interest. This
index may yield values that lie outside the 0 to 1 range.
The NFI and TLI assume a true null hypothesis and therefore a central
chi-square distribution for the test statistic. However, an argument could be
made that the null hypothesis is never exactly true and that the distribution of
the test statistic can be better approximated by a noncentral chi-square with
noncentrality parameter λ. An estimate of the noncentrality parameter can be
obtained as the difference between the statistic and its associated degrees of
freedom. Thus, for models that are not extremely misspecified, an index devel-
oped by McDonald and Marsh (1990) and referred to as the relative noncen-
trality index (RNI) can be defined as
$\mathrm{RNI} = \frac{(\chi_b^2 - df_b) - (\chi_t^2 - df_t)}{\chi_b^2 - df_b}.$ [6.3]
The RNI can lie outside the 0 to 1 range. To remedy this, Bentler (1990)
adjusted the RNI so that it would lie in the range of 0 to 1. This adjusted ver-
sion is referred to as the comparative fit index (CFI).2
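The indices in Equations [6.1] through [6.3] are simple functions of the baseline and target chi-squares and their degrees of freedom. A small R sketch follows, with the CFI computed here by holding the RNI to the 0 to 1 range, in line with the description of Bentler's adjustment given above; the input values are hypothetical.

  fit_indices <- function(chisq_b, df_b, chisq_t, df_t) {
    nfi <- (chisq_b - chisq_t) / chisq_b                              # Equation [6.1]
    tli <- (chisq_b / df_b - chisq_t / df_t) / (chisq_b / df_b - 1)   # Equation [6.2]
    rni <- ((chisq_b - df_b) - (chisq_t - df_t)) / (chisq_b - df_b)   # Equation [6.3]
    cfi <- min(max(rni, 0), 1)                                        # RNI held to [0, 1]
    c(NFI = nfi, TLI = tli, RNI = rni, CFI = cfi)
  }
  fit_indices(chisq_b = 2250.0, df_b = 45, chisq_t = 160.2, df_t = 39)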
Finally, there are classes of CFIs that adjust existing fit indices for the
number of parameters that are estimated. These are so called parsimony-based
CFIs. The rationale behind these indices is that a model can be made to fit the
data by simply estimating more and more parameters. Indeed, a model that is
just-identified fits the data perfectly. Therefore, an appeal to parsimony would
require that these indices be adjusted for the number of parameters that are
estimated. One such parsimony-based index proposed by Mulaik et al. (1989)
is the parsimony-NFI (PNFI) defined as
$\mathrm{PNFI} = \frac{df_t}{df_b}\,\mathrm{NFI}.$ [6.4]
The rationale behind the PNFI is as follows. Note that the baseline model
of complete independence restricts all off-diagonal elements of the covariance
matrix to zero. Thus, the degrees of freedom for the baseline model is
$df_b = p(p - 1)/2$, where p is the total number of observed variables; this repre-
sents the degrees of freedom for the most restricted model possible. The more
parameters estimated in the target model, the less restricted the model becomes
relative to the baseline model and the greater the penalty attached to the NFI.
As noted above, considerable attention has been paid to the development
and application of CFIs. The extant literature is replete with studies on the
behavior of these, and many other, CFIs. The major questions concern the
extent to which these indices are sensitive to sample size, method of estimation,
and distributional violations (Marsh, Balla, & McDonald, 1988). A detailed
account of the extant studies on the behavior of the CFIs is beyond the scope
of this chapter. An excellent review can be found in Hu and Bentler (1995).
Suffice to say, that use of comparative indices has not been without con-
troversy. In particular, Sobel and Bohrnstedt (1985) argued early on that these
indices are designed to compare one’s hypothesized model against a scientifi-
cally questionable baseline hypothesis. That is, the baseline hypothesis states
that the observed variables are completely uncorrelated with each other. Yet,
as Sobel and Bohrnstedt have pointed out, one would never seriously entertain
such a hypothesis, and that perhaps these indices should be compared with a
different baseline hypothesis. Unfortunately, the conventional practice of
structural equation modeling as represented in scholarly journals does not
suggest that these indices have ever been compared to anything other than the
baseline model of complete independence (see also Tanaka, 1993).
we provide the alternative fit indices described above and interpret the fit of
the model with respect to those indices. We will revisit our conclusions regard-
ing the fit of the model in Section 6.2 after the model has been modified.
Table 6.1 provides the TLI and CFI as described in Section 6.1. If we were
to evaluate the fit of the model on the basis of indices that compare the speci-
fied model against a baseline model of independence, we would conclude here
as well that the model does not fit the data well. That is, the TLI does not reach
or exceed the criterion of 0.95 for acceptable fit. Also, the CFI, which does not
rest on the assumption of a true population model but takes into account pop-
ulation noncentrality, does not suggest good fit.
Table 6.1 Selected Alternative Measures of Model Fit for the Initial Education
Indicators Model
χ² (df = 39)    1730.524    0.000
Tucker-Lewis index 0.792
Comparative fit indexa 0.844
RMSEA 0.081 0.000
RMSEA lower bound 0.077
RMSEA upper bound 0.084
AIC 209277.638
BIC 209420.573
not fit this covariance matrix perfectly. Further, let $\tilde{\Sigma}_0 = \Sigma(\Omega_0)$ be the best
fit of the model to the population covariance matrix. Finally, let $\hat{\Sigma} = \Sigma(\hat{\Omega})$ be
the best fit of the model to the sample covariance matrix S.
From these various fit measures, Browne and Cudeck (1993) defined three
types of discrepancies that are of relevance to our discussion. First, there
is $F(\Sigma_0, \tilde{\Sigma}_0)$, referred to as the discrepancy due to approximation, which mea-
sures the lack of fit of the model to the population covariance matrix. Second,
there is the discrepancy due to estimation, defined as $F(\tilde{\Sigma}_0, \hat{\Sigma})$, which measures
the difference between the model fit to the sample covariance matrix and the
model fit to the population covariance matrix. The discrepancy due to esti-
mation is unobserved but may be approximated by
$E[F(\tilde{\Sigma}_0, \hat{\Sigma})] = n^{-1}q,$ [6.5]
where q is the number of unknown parameters of the model (Browne &
Cudeck, 1993). Finally, there is the discrepancy due to overall error, defined
as $F(\Sigma_0, \hat{\Sigma})$, which measures the difference between the elements of the
population covariance matrix and the model fit to the sample covariance matrix.
Here too, this quantity is unobserved but may be approximated by
$E[F(\Sigma_0, \hat{\Sigma})] \approx F(\Sigma_0, \tilde{\Sigma}_0) + n^{-1}q,$ [6.6]
which is the sum of the discrepancy due to approximation and the discrepancy
due to error of estimation (Browne & Cudeck, 1993). For completeness of our
discussion we may wish to include the usual sample discrepancy function
$\hat{F} = F(S, \hat{\Sigma})$ defined in Chapter 2.
Measures of approximate fit are concerned with the discrepancy due to
approximation. Based on the work of Steiger and Lind (1980; see also Browne
& Cudeck, 1989), it is possible to assess approximate fit of a model in the pop-
ulation. The method of Steiger and Lind (1980) for measuring approximate fit
can be sketched as follows. First, it should be recalled that in line with statisti-
cal distribution theory, if the null hypothesis is true, then the likelihood ratio
possesses a central chi-square distribution with d degrees of freedom. If the
null hypothesis is false, which will almost surely be the case in most realistic
situations, then the likelihood ratio statistic has a noncentral chi-square
distribution with $d = \frac{1}{2}p(p + 1) - q$ degrees of freedom and noncentrality para-
meter λ. The noncentrality parameter serves to shift the central chi-square dis-
tribution to the right.
Continuing, let F0 be the population discrepancy value that would be
obtained if the model were fit to the population covariance matrix. Generally,
F0 > 0 unless the model fits the data perfectly, in which case F0 = 0. Further, let
$E(\hat{F}) = F_0 + \frac{d}{n},$ [6.7]
with the bias being d/n. The bias in $\hat{F}$ can be reduced by forming the estimator
$\hat{F}_0 = \hat{F} - \frac{d}{n}$ [6.8]
as the estimate of the error due to approximation (Browne & Cudeck, 1993).
From Equation [6.8], we now have a population discrepancy function $F_0$
and its estimator $\hat{F}_0$. However, an inspection of Equation [6.8] shows that $\hat{F}_0$
decreases with increasing degrees of freedom. Thus, to control for model
complexity, Steiger (1990; see also Steiger & Lind, 1980) defines a root mean
square error of approximation (RMSEA) as
$\varepsilon_a = \sqrt{\frac{F_0}{d}},$ [6.10]
with point estimate
$\hat{\varepsilon}_a = \sqrt{\frac{\hat{F}_0}{d}} = \sqrt{\max\left\{\frac{\hat{F} - n^{-1}d}{d},\ 0\right\}}.$ [6.11]
$H_0: \varepsilon \le 0.05.$ [6.12]
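Equations [6.11] and [6.12] translate directly into code. The sketch below computes the RMSEA point estimate from the model chi-square and evaluates H0: ε ≤ 0.05 by referring the observed chi-square to a noncentral chi-square whose noncentrality parameter corresponds to ε = 0.05. The sample size used in the example is assumed for illustration.

  rmsea <- function(chisq, df, N) {
    n    <- N - 1
    eps  <- sqrt(max((chisq - df) / (n * df), 0))   # point estimate, Equation [6.11]
    ncp0 <- n * df * 0.05^2                         # noncentrality implied by eps = .05
    c(RMSEA = eps,
      p_close = 1 - pchisq(chisq, df, ncp = ncp0))  # probability of close fit
  }
  rmsea(chisq = 1730.524, df = 39, N = 6677)   # chi-square and df from Table 6.1; N assumed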
Recall that the likelihood ratio (LR) discussed in Chapter 2 can be written
here as
When the goal is to use the AIC for model comparison and selection then
AIC(Ha) is common to all computations based on the same data and cancels
out of the comparisons. Therefore, AIC(H0) can be simply defined as
The use of the AIC requires fitting several competing models. As noted
above, the model with the lowest value of the AIC among the competing mod-
els is deemed to fit the data best from a predictive point of view. The smallest
value of the AIC is referred to as the minimum AIC (MAIC).
A particularly important feature of the AIC is that it can be used for com-
paring nonnested models. Nonnested models are quite different in terms of
their structural specification. However, as discussed below, the AIC can be used
to select from a series of nested models that are formed on the basis of relax-
ing constraints in the model.
$\frac{p(M_1 \mid Y)}{p(M_2 \mid Y)} = \frac{p(Y \mid M_1)}{p(Y \mid M_2)}\cdot\frac{p(M_1)}{p(M_2)} = B_{12}\,\pi_{12},$ [6.19]
where B12 is called the Bayes factor, and π12 is the prior odds of M1 relative to
M2. Note that in the case of neutral prior odds, the Bayes factor is the ratio of
the marginal likelihood of M1 to the marginal likelihood of M2. In the case
where the prior odds are not neutral, the Bayes factor is the ratio of the poste-
rior odds to the prior odds.
It is possible to avoid using the prior probabilities in Equation [6.19] and
still obtain a rough approximation to the Bayes factor. Specifically, we define
the BIC as
where q0 is the number of parameters under the null hypothesis and n is the
sample size. Following Raftery (1993), it can be shown that
$B_{12} = e^{-\frac{1}{2}\mathrm{BIC}_{12}}.$ [6.21]
Studies have shown that the BIC tends to penalize models with too many
parameters more harshly than the AIC.
$\mathrm{CVI} = F(S_V, \hat{\Sigma}_C),$ [6.22]
where SV is the sample covariance matrix from the validation sample, and Σ̂c is
the fitted covariance matrix from the calibration sample. This index measures
the extent to which the model fitted to the calibration sample also fits the
validation sample.
If we consider the expected value of the CVI over validation samples given
the calibration sample we have (Browne & Cudeck, 1993)
$E(\mathrm{CVI}) = E[F(S_V, \hat{\Sigma}_C) \mid \hat{\Sigma}_C] = F(\Sigma_0, \hat{\Sigma}_C) + n_V^{-1}p^*,$ [6.23]
where $n_V$ is the size of the validation sample and $p^* = \frac{1}{2}p(p + 1)$ is the number
of nonredundant elements in Σ. It can be seen that the CVI is a biased estimate
of the overall discrepancy, with the bias being $n_V^{-1}p^*$. Note that one cannot
remove the bias by simply subtracting $n_V^{-1}p^*$ from $F(S_V, \hat{\Sigma}_C)$ because in some
cases the resulting value of the fit function would take on an inadmissible
negative value. And, in any case, $n_V^{-1}p^*$ is the same for all competing models that
are fit to the calibration sample and would not change the rank ordering of the
competing models (Browne & Cudeck, 1993).
The above discussion assumes that one can split the sample to form a cal-
ibration sample and validation sample. Clearly, this is a disadvantage when one
is working with small samples. The problem arises from the fact that the over-
all error (defined above) is larger in small samples. Thus, it is desirable not to
split the sample but to develop a measure of cross-validation based only on the
calibration sample.
For the purposes of developing a single sample cross-validation index, we
must assume that the sample sizes for the calibration and the validation sam-
ples are the same. Then, it can be shown (Browne & Cudeck, 1993) that the
ECVI is approximately
$c = F(S, \hat{\Sigma}) + 2n^{-1}q.$ [6.25]
In practice, it is often the case that structural equation models are rejected on
the basis of the likelihood ratio chi-square and/or one or more of the alterna-
tive indices of fit described in Section 6.1. The reasons for model rejection are,
of course, many. The most obvious reasons include (a) violations of underly-
ing assumptions, such as normality and completely random missing data;
(b) incorrect restrictions placed on the model; and (c) sample size sensitivity.
The problem of violating assumptions was discussed in Chapter 5. There we
noted that in some cases violations of assumptions could be addressed—but we
argued that an explicit presentation of the assumptions was crucial when evalu-
ating the quality of the model. In addition, in Chapter 5, we noted that specifi-
cation errors in the form of incorrect restrictions was a pernicious problem and
intimately related to the issue of sample size sensitivity. In this section, we con-
sider the problem of modifying models to bring them closer in line with the data.
We consider model modification in light of statistical power thereby more for-
mally integrating the problem of specification error and sample size sensitivity.
When a model, such as our science achievement model, is rejected on the
basis of the LR chi-square test, attempts are usually made to modify the model
to bring it in line with the data. Assuming one has ruled out assumption viola-
tions such as nonnormality, missing data, and nonindependent observations
(Kaplan, 1990a), methods of model modification usually involve relaxing
restrictions in the model by freeing parameters that were fixed in the initial
specification. The decision to free such parameters is often guided by the size
of the LM statistic, which, as discussed in Chapter 2, possesses a one degree-of-
freedom chi-square distribution and gives the decrease in the overall LR chi-
square test when the parameter in question is freed. The LM test is also referred
to as the modification index (MI) (Sörbom, 1989).
For each restricted but potentially identified parameter, there exists a LM
test. Software programs generally list each of the LM test values and in some
cases will provide the largest LM value. The temptation, of course, is to relax
the fixed parameter associated with the maximum LM value. The difficulty
here is that the parameter associated with the largest LM value may not be one
that makes any substantive sense whatsoever (see, e.g., Kaplan, 1988).
Regardless of whether the parameter with the largest associated LM value
is relaxed first, typically more than one, and often many, modifications to the
model are made. Two problems exist when engaging in numerous model
modifications. First, extant simulation studies have shown that searching for
specification errors via the LM test does not always result in locating the speci-
fication errors imposed on a “true” model—that is, the model that generated
the covariance matrix (Kaplan, 1988, 1989c; Luijben, Boomsma, & Molenaar,
1987; MacCallum, 1986).
A second problem associated with unrestricted model modifications on
the basis of the LM test is the increase in the probability of Type II errors
resulting from the general goal of not rejecting the null hypothesis that the
model fits the data (Kaplan, 1989b; MacCallum, Roznowski, & Necowitz,
1992). In one sense, the way to mitigate the problem of Type II errors is to free
paths that have maximum power.
A general method for calculating the empirical power of the LR chi-square
test was given by Satorra and Saris (1985). Their approach can be outlined as
follows. First, estimate the model under the null hypothesis H0 and obtain the
LR chi-square statistic. Second, estimate a new model, call it H1 (not to be con-
fused with the unrestricted alternative Ha), which consists of the model under
H0 with parameters fixed at their maximum likelihood estimates obtained
from the first step but with the restriction of interest dropped and replaced
with an alternative “true” parameter value to be tested. Note that estimating the
model under H0 with parameters fixed at their maximum likelihood values will
yield a chi-square test value of 0 with degrees of freedom equal to the number
of degrees of freedom of the model. When estimating the H1 model under the
H0 specification, the resulting chi-square will no longer be 0. Indeed, the chi-
square statistic resulting from this test is distributed as a noncentral chi-square
distribution with noncentrality parameter

λ = ndε²,  [6.26]
where n = N − 1. Note that from the standpoint of the RMSEA, perfect fit implies that ε = 0, which in turn implies that the distribution of the test statistic is a central chi-square distribution because under the central chi-square distribution the noncentrality parameter λ = 0. However, using
Equation [6.26], MacCallum et al. (1996) showed that the hypothesis of close fit, ε ≤ 0.05, can be tested with a noncentral chi-square distribution with noncentrality parameter λ given in Equation [6.26].
But now suppose that the true value of ε is 0.08 (considered mediocre fit), and we test the hypothesis of close fit, ε ≤ 0.05. What is the power of this test? To be specific, let ε0 represent the null hypothesis of close fit and let εa represent the alternative hypothesis of mediocre fit. The distribution used to test the null hypothesis in this case is the noncentral chi-square distribution χ²_{d,λ0}, and the distribution used to test the alternative hypothesis is the noncentral chi-square distribution χ²_{d,λa}, where d are the degrees of freedom and λ0 and λa are the respective noncentrality parameters. From here, the power of the test of close fit is given as

π = 1 − Pr(χ²_{d,λa} ≤ χ²_c),  [6.27]

where χ²_c is the critical value of chi-square under the null hypothesis for a given Type I error probability α (MacCallum et al., 1996).
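A small numerical sketch of this power calculation is given below, assuming Python with SciPy; the sample size and degrees of freedom are illustrative, not values from the text.

```python
# A minimal sketch of the power of the test of close fit against mediocre fit,
# using the noncentrality parameters of Equation [6.26]. N and d are arbitrary.
from scipy.stats import ncx2

def power_close_fit(N, d, eps0=0.05, epsa=0.08, alpha=0.05):
    n = N - 1
    lam0 = n * d * eps0**2               # noncentrality under close fit
    lama = n * d * epsa**2               # noncentrality under mediocre fit
    crit = ncx2.ppf(1 - alpha, d, lam0)  # critical value under the null distribution
    return 1 - ncx2.cdf(crit, d, lama)   # probability of exceeding it under the alternative

print(power_close_fit(N=500, d=38))
```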
Given that the power of the test can be determined from values of N, d, ε0,
εa, and α, we can turn to the question of determining the sample size necessary
to achieve a desired level of power. MacCallum et al. (1996) use an indirect
approach of interval halving to solve the problem of assessing the sample size
necessary to achieve a desired level of power for testing close fit. The details of
this procedure can be found in their article. Suffice it to say here that the minimum
N necessary to achieve a desired level of power of the test of close fit against the
alternative of mediocre fit is a function of the degrees of freedom of the model
where models with large degrees of freedom require smaller sample sizes.
MacCallum et al. (1996) are careful to point out that their procedure must
be used with caution because, for example, models associated with a very large
number of degrees of freedom may yield required minimum sample sizes that
are smaller than the number of variables one has in hand. They also correctly
point out that their procedure is designed for omnibus testing of close fit and
that the sample size suggested for adequate power for the overall test of close
fit may not be adequate for testing parameter estimates. This concern is in line
with Kaplan (1989c), who found that power can be different in different parts
of the model even for the same size specification error. Thus, sample size effects
are not uniform throughout a model.
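The interval-halving idea can be sketched as a simple bisection over N using the same power calculation as above; this is an assumed illustration rather than the exact routine of MacCallum et al. (1996), and the target power and degrees of freedom below are arbitrary.

```python
# A minimal sketch of interval halving: bisect on N until the power of the test
# of close fit reaches a target level.
from scipy.stats import ncx2

def power_close_fit(N, d, eps0=0.05, epsa=0.08, alpha=0.05):
    lam0, lama = (N - 1) * d * eps0**2, (N - 1) * d * epsa**2
    return 1 - ncx2.cdf(ncx2.ppf(1 - alpha, d, lam0), d, lama)

def min_n_for_power(d, target=0.80, lo=10, hi=100000):
    while hi - lo > 1:                     # power increases with N, so bisection applies
        mid = (lo + hi) // 2
        if power_close_fit(mid, d) >= target:
            hi = mid
        else:
            lo = mid
    return hi

print(min_n_for_power(d=38))
```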
It is important to keep in mind that sample size exerts its influence on the likelihood ratio chi-square only when the model is false. To see this, recall again that the log-likelihood ratio test can be written as n × FML. Clearly, if the model fits perfectly, FML will be 0 and the sample size will have no effect. Sample size comes into play in its interaction with model misfit—where FML will then take on a value greater than zero. Thus, there is a need to gauge the relative effect of sample size against the degree of model misfit.
A method of gauging the influence of sample size and model misfit in the
context of power analysis is through the use of the expected parameter change
(EPC) statistic. The EPC was developed by Saris, Satorra, and Sörbom (1987)
as a means of gauging the size of a fixed parameter if that parameter were freed.
To motivate this statistic, let ωi be a parameter that takes on the value ω0 (usually zero) under the null hypothesis. Let dωi = ∂ ln L(Ω)/∂ωi evaluated at ω̂i, where ln L(Ω) is the log-likelihood function. Then, Saris et al. (1987) defined the EPC as

EPC = ω̂i − ω0 = MI / dωi.  [6.28]

The standardized expected parameter change (SEPC) is then defined as

θSEPC = θEPC [Var(ξ̂) / Var(η̂)]^{1/2},  [6.29]
First, consistent with the modification index, the addition of the path from SES to SCIACH resulted in a significant drop in the value of the likelihood ratio chi-square. Moreover, the selected set of indices presented in Table 6.3 show improvement, although the RMSEA remains statistically significant. Second, the addition of this path does not result in a substantial change in the other paths in the model. That is, the estimated effects in the initial model remain about the same.
Parameter              Estimate      SE      Est./SE      Std      StdYX
Measurement model
INVOLV BY
MAKEMETH 1.000 0.000 0.000 0.606 0.738
OWNEXP 0.724 0.027 26.469 0.439 0.605
CHOICE 0.755 0.029 26.036 0.458 0.507
CHALL BY
CHALLG 1.000 0.000 0.000 0.917 0.748
UNDERST 0.757 0.024 31.043 0.694 0.503
WORKHARD 0.867 0.026 32.921 0.795 0.723
Structural model
CHALL ON
INVOLV 0.251 0.027 9.282 0.166 0.166
INVOLV ON
CERTSCI 0.012 0.031 0.399 0.021 0.006
SCIGRA10 ON
CHALL 0.264 0.026 10.319 0.242 0.133
SCIACH ON
SCIGRA10 1.013 0.035 29.120 1.013 0.317
SES 2.456 0.086 28.710 2.456 0.313
SCIGRA10 ON
SCIGRA6 0.788 0.021 37.146 0.788 0.416
SES 0.240 0.028 8.734 0.240 0.098
CERTSCI 0.032 0.068 0.466 0.032 0.005
Selected goodness-of-fit indices
χ2 (df = 38) = 953.726, p < .000; TLI = 0.884; CFI = 0.915; RMSEA = 0.060, p < .05;
BIC = 208652.581
Parameter              Estimate      SE      Est./SE      Std      StdYX
Measurement model
INVOLV BY
MAKEMETH 1.000 0.000 0.000 0.606 0.738
OWNEXP 0.724 0.027 26.469 0.439 0.605
CHOICE 0.755 0.029 26.036 0.458 0.507
CHALL BY
CHALLG 1.000 0.000 0.000 0.917 0.748
UNDERST 0.757 0.024 31.043 0.694 0.503
WORKHARD 0.867 0.026 32.921 0.795 0.723
Structural model
CHALL ON
INVOLV 0.251 0.027 9.282 0.166 0.166
INVOLV ON
CERTSCI 0.012 0.031 0.399 0.021 0.006
SCIGRA10 ON
CHALL 0.264 0.026 10.319 0.242 0.133
SCIACH ON
SCIGRA10 0.748 0.037 20.019 0.748 0.235
SES 2.191 0.085 25.695 2.191 0.279
SCIGRA6 1.214 0.072 16.944 1.214 0.201
SCIGRA10 ON
SCIGRA6 0.788 0.021 37.146 0.788 0.416
SES 0.240 0.028 8.734 0.240 0.098
CERTSCI 0.032 0.068 0.466 0.032 0.005
Selected goodness-of-fit indices
χ2 (df = 37) = 675.121, p < .000; TLI = 0.917; CFI = 0.941; RMSEA = 0.051, p = 0.336,
BIC = 208382.783
6.3 Conclusion
Notes
probability distribution of molecules and showed that entropy was equal to the log-
probability of a statistical distribution.
7. A true distribution can be realized through Monte Carlo simulations.
8. Note that other software programs for structural equation modeling such as
AMOS (Arbuckle, 1999), LISREL (Jöreskog & Sörbom, 2000), and EQS (Bentler, 1995)
may provide more or fewer fit indices than Mplus.
7
Multilevel Structural Equation Modeling
When we carefully consider the problem of analyzing data arising from hierar-
chically nested systems, such as schools, it is clear that neither standard struc-
tural equation modeling nor standard multilevel modeling alone can give a
complete picture of the problem under investigation. Indeed, use of either
methodology separately could result in different but perhaps equally serious
specification errors. Specifically, using conventional structural equation model-
ing assuming simple random samples alone would ignore the sampling schemes
that are often used to generate educational data and would result in biased
structural regression coefficients (B. Muthén & Satorra, 1989). The use of mul-
tilevel modeling alone would preclude the analyst from studying complex indi-
rect and simultaneous effects within and across levels of the system. What is
required, therefore, is a method that combines the best of both methodologies.
The second method is the conventional ML estimator, which uses the second-
order derivatives of the observed log-likelihood. The third method is based on
a sandwich estimator derived from the information matrices of ML and MLF
and produces the correct asymptotic covariance matrix of the estimates that is
not dependent on the assumption of normality, and which also yields a robust
chi-square test of model fit. This estimator is referred to as MLR. The MLR is
a robust full information ML estimator for MLLVMs. A small simulation study
reported in Asparouhov and Muthén (2003) compared the ML and MLR
estimator to a mean-adjusted and mean- and variance-adjusted ML estimator.
The results demonstrated better performance of the MLR estimator for non-
normal variables than that obtained from the maximum likelihood estimator
with mean and variance adjustment.
(2001) has extended multilevel modeling to the item response theory context.
However, a full discussion of their work is beyond the scope of this chapter.
To begin, consider a model that decomposes a p-dimensional response
vector yig for student i in school g into the sum of a grand mean μ, a between-
groups part νg and a within-groups part uig. That is,

yig = μ + νg + uig.  [7.1]

The total sample covariance matrix for the response vector yig can be written as
ΣT = Σb + Σw , [7.2]
ȳ.g = (1/ng) Σ_{i=1}^{ng} yig,  [7.3]

ȳ = (1/N) Σ_{g=1}^{G} Σ_{i=1}^{ng} yig,  [7.4]

Sw = [1/(N − G)] Σ_{g=1}^{G} Σ_{i=1}^{ng} (yig − ȳ.g)(yig − ȳ.g)′,  [7.5]

Sb = [1/(G − 1)] Σ_{g=1}^{G} ng (ȳ.g − ȳ)(ȳ.g − ȳ)′,  [7.6]
where ȳ.g is the sample mean for group g, ȳ is the grand mean, Sw is the sample
pooled within-group covariance matrix, and Sb is the between-groups covari-
ance matrix.
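As a computational illustration (not from the text), the sketch below computes the pooled within-group and between-group covariance matrices of Equations [7.5] and [7.6] from a raw data matrix; the simulated data and group structure are arbitrary.

```python
# A minimal sketch of Equations [7.3]-[7.6] with NumPy; data are simulated.
import numpy as np

def within_between_cov(y, g):
    y, g = np.asarray(y, float), np.asarray(g)
    groups = np.unique(g)
    N, p = y.shape
    G = len(groups)
    grand_mean = y.mean(axis=0)                    # Equation [7.4]
    Sw, Sb = np.zeros((p, p)), np.zeros((p, p))
    for grp in groups:
        yg = y[g == grp]
        ybar_g = yg.mean(axis=0)                   # Equation [7.3]
        dev = yg - ybar_g
        Sw += dev.T @ dev                          # within-group cross-products
        d = (ybar_g - grand_mean)[:, None]
        Sb += len(yg) * (d @ d.T)                  # size-weighted between-group cross-products
    return Sw / (N - G), Sb / (G - 1)              # Equations [7.5] and [7.6]

rng = np.random.default_rng(1)
g = np.repeat(np.arange(10), 50)                   # 10 groups of 50 observations
y = rng.normal(size=(500, 3)) + np.repeat(rng.normal(size=(10, 3)), 50, axis=0)
Sw, Sb = within_between_cov(y, g)
print(Sw.round(2), Sb.round(2), sep="\n")
```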
As with the standard application of linear regression to data arising from
multistage sampling, the application of factor analysis should also account for
nested effects. For example, a battery of attitude items assessing student per-
ceptions of school climate administered to students will most likely exhibit
between-school variability. Ignoring the between-school variability in the
scores of students within schools will result in predictable biases in the para-
meters of the factor analysis model. Therefore, it is desirable to extend multi-
level methodology to the factor analysis framework.
To start, let the vector of student responses be expressed in terms of the
multilevel linear factor model as

yig = ν + Λwηwig + Λbηbg + εwig + εbg,  [7.7]

where yig was defined earlier, ν is the grand mean, Λw is the factor loading matrix
for the within-group variables, ηwig is a factor that varies randomly across units
within groups, Λb is the between-groups factor loading matrix, ηbg is a
factor that varies randomly across groups, and εwig and εbg are within- and between-group uniquenesses. Under the standard assumptions of linear factor analysis,
here extended to the multilevel case, the total sample covariance matrix defined in Equation [7.2] can be expressed in terms of factor model parameters as

ΣT = ΛwΦwΛ′w + Θw + ΛbΦbΛ′b + Θb,  [7.8]
where Φw and Φb are the factor covariance matrices for the within-group and
between-group parts and Θw and Θb are diagonal matrices of unique vari-
ances for the within-group and between-group parts.
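To see how the factor decomposition and Equation [7.2] fit together, the sketch below (with made-up loadings, factor variances, and uniquenesses for three items and a single factor at each level) builds the implied within- and between-group covariance matrices and their total.

```python
# A minimal sketch of the covariance structure implied by the multilevel factor
# model; all parameter values are illustrative.
import numpy as np

Lw = np.array([[1.00], [0.72], [0.76]])        # within-group loadings
Lb = np.array([[1.00], [1.37], [1.19]])        # between-group loadings
Phi_w, Phi_b = np.array([[0.29]]), np.array([[0.05]])
Theta_w = np.diag([0.30, 0.35, 0.40])          # within-group uniquenesses
Theta_b = np.diag([0.02, 0.03, 0.02])          # between-group uniquenesses

Sigma_w = Lw @ Phi_w @ Lw.T + Theta_w          # within-group implied covariance
Sigma_b = Lb @ Phi_b @ Lb.T + Theta_b          # between-group implied covariance
Sigma_T = Sigma_w + Sigma_b                    # total covariance, Equation [7.2]
print(Sigma_T.round(3))
```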
Generally speaking, it is usually straightforward to specify a factor struc-
ture for the within-school variables. It is also straightforward to allow for
within-school variables to vary between schools. Conceptual difficulties often
arise in warranting a factor structure to explain variation between groups. In
an example given in Kaplan and Kreisman (2000) examining student percep-
tions of school climate, two clear factors were extracted for the within-school
part, but the between-school part appeared to suggest one factor. The fact that
it is sometimes difficult to conceptualize a factor structure for the between-
groups covariance matrix does not diminish the importance of taking the
between-group variability into account when conducting a factor analysis on
multilevel structured data.
As noted above, multilevel regression models may not be suited for capturing
the structural complexity within and between organizational levels. For exam-
ple, it may be of interest to determine if school level variation in student
science achievement can be accounted for by school-level variables. Moreover,
one might hypothesize and wish to test direct and indirect effects of school
level exogenous variables on that portion of student-level achievement that
                                    Estimate   SE     Estimate   SE     Estimate   SE     Estimate   SE
Within-School Model
Calculating Mathematics
Train timetable 1.000 0.000 1.000 0.000 1.000 0.000 1.000 0.000
Discount % 1.187∗ 0.022 1.190∗ 0.022 1.136∗ 0.026 1.135∗ 0.025
Size (m2) of a floor 1.140∗ 0.023 1.140∗ 0.023 1.125∗ 0.027 1.124∗ 0.027
Graphs in newspaper 0.909∗ 0.021 0.908∗ 0.021 0.876∗ 0.026 0.875∗ 0.026
Distance on a map 1.184∗ 0.028 1.185∗ 0.028 1.113∗ 0.031 1.109∗ 0.033
Petrol consumption rate 0.881∗ 0.022 0.883∗ 0.022 0.905∗ 0.027 0.904∗ 0.027
Solving equations
3x + 5 = 17 1.000 0.000 1.000 0.000 1.000 0.000 1.000 0.000
2(x + 3)=(x + 3)(x − 3) 1.060∗ 0.015 1.059∗ 0.015 1.039∗ 0.020 1.036∗ 0.021
Calculating Mathematics −0.133∗ 0.016 −0.113∗ 0.024
on MALE
Solving Equations on MALE −0.039 0.023 0.001 0.038
Factor Covariances
Calculating Mathematics 0.286∗ 0.009 0.284∗ 0.009 0.197∗ 0.008 0.197∗ 0.008
with Solving Equations
Between-School Model
General Mathematics Emphasis
Train timetable 1.000 0.000 1.000 0.000
Discount % 1.373∗ 0.067 1.379∗ 0.068
Size (m2) of a floor 1.192∗ 0.062 1.195∗ 0.060
Graphs in newspaper 1.047∗ 0.063 1.053∗ 0.064
Distance on a map 1.460∗ 0.102 1.474∗ 0.105
Petrol consumption rate 0.752∗ 0.072 0.764∗ 0.074
3x + 5 = 17 1.808∗ 0.132 1.814∗ 0.143
2(x + 3)=(x + 3)(x − 3) 1.987∗ 0.136 1.994∗ 0.145
General Mathematics −0.057 0.051
Emphasis on MALE
Model Fit Indices
χ2 456.250 (19 df ) 526.500 (25 df ) 641.253 (39 df ) 670.784 (52 df )
AIC 86593.8 95130.6 85173.1 89797.8
BIC 86758.6 95308.9 85443.3 90088.2
NOTE: Unstandardized estimates are displayed. SE = standard error; AIC = Akaike information criterion; BIC = Bayesian
information criterion.
∗Values are statistically significant at p < .05.
[Figure: Path diagrams of the within-school and between-school models for the PISA 2003 mathematics items, relating gender to the Calculating Mathematics and Solving Equations factors (within school) and to the General Mathematics Emphasis factor (between school).]
varies over schools. We argue that these questions are important for a fuller
understanding of organizational systems and such questions can be addressed
via multilevel structural equation modeling.
For ease of notation and development of concepts, we focus our discus-
sion on multilevel path analysis. By focusing on this model, we are assuming
that reliable and valid measures of the variables are available. We recognize that
this assumption may be unreasonable for most social and behavioral science
research, but as shown in the previous section, multilevel measurement mod-
els exist that allow one to examine heterogeneity in measurement structure.
Indeed, as a matter of modeling strategy, it may be very informative to exam-
ine heterogeneity in measurement structure prior to forming scales to be used
in multilevel path analysis. However, it is possible to combine multilevel path
models and measurement models into a comprehensive multilevel structural
equation model.
The model that we will consider allows for varying intercepts and varying
structural regression coefficients. Earlier work on multilevel path analysis by
Kaplan and Elliott (1997a) building on the work of B. Muthén (1989) specified
a structural model for varying intercepts only. This “intercepts as outcomes”
model was applied to a specific educational problem in Kaplan and Elliott
(1997b) and Kaplan and Kreisman (2000).
In what follows, we write the within-school (Level-1) full structural equa-
tion model as
Note how Equations [7.10] to [7.12] allow for randomly varying inter-
cepts and two types of randomly varying slopes—namely, Bg are randomly
varying slopes relating endogenous variables to each other and Γg are ran-
domly varying slopes relating endogenous variables to exogenous variables.
These randomly varying structural coefficients are modeled as functions of a
set of between-school predictors zg and wg. These between-school predictors
appear in Equations [7.10] to [7.12], but their respective regression coefficients
are parameterized to reflect a priori structural relationships.
Of particular importance for substantive research is the fact that the full mul-
tilevel path model allows for a set of structural relationships among between-
school endogenous and exogenous variables, which we can write as
zg = τ + Δzg + Ωwg + δg,  [7.13]
where τ, Δ, and Ω are the fixed structural effects. Finally, ε, ζ, θ, and δ are dis-
turbance terms that are assumed to be normally distributed with mean zero
and covariance matrix T with elements
T = [ σ²ε
      σζε   σ²ζ
      σθε   σθζ   σ²θ
      σδε   σδζ   σδθ   σ²δ ].  [7.14]
After a series of substitutions we can obtain the reduced form of the Level-1
and Level-2 models and express yig as a function of a grand mean, the main effect
of within-school variables, the main effect of between-school variables, and the
cross-level moderator effects of between- and within-school variables. These
reduced form effects contain the structural relations as specified in Equations [7.9]
through [7.13]. The importance of this model is that if w consists of variables that could, in principle, be manipulated in the context of a hypothetical experiment, then this model could be used to test cross-level causal hypotheses taking into account the structural relationships between and within levels.1
Although this discussion has focused on multilevel structural equation modeling with manifest variables, it is relatively straightforward to specify a multilevel structural equation model among latent variables. A review of the extant literature has not uncovered an application of the full model described here using latent variables, except in the context of the analysis of longitudinal data, which is described next.
The data for this example come from the PISA 2003 survey (OECD, 2004). The final outcome variable at the student level was
a measure of mathematics achievement (MATHSCOR).2 Mediating predictors
of mathematics achievement consisted of whether students enjoyed mathe-
matics (ENJOY) and whether students felt mathematics was important in life
(IMPORTNT). Student exogenous background variables included students’
perception of teacher qualities (PERTEACH), as well as both parents’ educa-
tional levels (MOMEDUC & DADEDUC). At the school level, a model was
specified to predict the extent to which students are encouraged to achieve
their full potential (ENCOURAG). A measure of teachers’ enthusiasm for their
work (ENTHUSIA) was viewed as an important mediator variable between
background variables and encouragement to make students achieve full poten-
tial. The variables used to predict encouragement via teachers’ enthusiasm
consisted of math teachers’ use of new methodology (NEWMETHO), consen-
sus between math teachers with regard to school expectations and teaching
goals as they pertain directly to mathematics instruction (CNSENSUS), and
the teaching conditions of the school (CNDITION). The teaching condition
variable was computed from the shortage of school’s equipment, so higher val-
ues on this variable reflect a worse condition.
A diagram of the multilevel path model is shown in Figure 7.2. The dia-
gram is drawn to convey the fact that the intercepts of the endogenous vari-
ables, ENJOY, IMPORTNT, and MATHSCOR and the slope of MATHSCOR
on ENJOY are regressed on the endogenous and exogenous school level vari-
ables. The results of the multilevel path analysis are displayed in Table 7.2.
First, we estimated the intraclass correlations to determine the amount of
variation in the student-level variables that can be accounted for by differences
between schools. We found intraclass correlations (not shown) ranging from a low of 0.02 for the importance of math in one's life to a high of 0.259 for mathematics achievement. Under the heading "Within School," we find that MOMEDUC, DADEDUC, ENJOY, and IMPORTNT are significant and positive predictors of MATHSCOR. We also observe that ENJOY is significantly and positively predicted by PERTEACH. Finally, MOMEDUC, PERTEACH, and ENJOY are positive and significant predictors of IMPORTNT.
Of importance to this chapter are the results under the heading “Between
School.” Here, we find that the resource conditions of the school (CNDITION)
and the extent to which the school encourages students to use their full poten-
tial (ENCOURAG) are both significant predictors of math achievement.
Enjoyment of mathematics is significantly related to whether there is consensus among mathematics teachers with regard to expectations and teaching goals. Importance of mathematics is related to the resource conditions of the school. Teacher enthusiasm for their work significantly predicts the extent to which they encourage students to use their full potential. Enthusiasm is pre-
dicted by use of new methods for teaching math and the extent of consensus
Within-School Model                     Estimate     SE
ENJOY on
  PERTEACH                               0.457∗     0.026
IMPORTNT on
  MOMEDUC                                0.026∗     0.006
  PERTEACH                               0.245∗     0.021
  ENJOY                                  0.534∗     0.015

Between-School Model                    Estimate     SE
MATHSCOR on
  NEWMETHO                               6.806      6.550
  ENTHUSIA                             −14.081      8.881
  CNSENSUS                               2.407      7.898
  CNDITION                               3.366      6.683
  ENCOURAG                              14.594      7.299
ENJOY on
  NEWMETHO                               0.008      0.025
  ENTHUSIA                               0.016      0.038
  CNSENSUS                               0.109∗     0.036
  CNDITION                               0.019      0.025
  ENCOURAG                              −0.035      0.024
IMPORTNT on
  NEWMETHO                              −0.027      0.019
  ENTHUSIA                               0.028      0.031
  CNSENSUS                               0.057      0.030
  CNDITION                               0.044∗     0.020
  ENCOURAG                               0.002      0.020
ENCOURAG on
  ENTHUSIA                               0.579∗     0.086
ENTHUSIA on
  NEWMETHO                               0.164∗     0.044
  CNSENSUS                               0.323∗     0.067
  CNDITION                              −0.042      0.040

NOTE: Unstandardized estimates are displayed. SE = standard error. ∗Values are statistically significant at p < .05.
[Figure 7.2: Diagram of the multilevel path model relating the within-school variables (MOMEDUC, DADEDUC, PERTEACH, ENJOY, IMPORTNT, MATHSCOR) and the between-school variables (NEWMETHO, CNSENSUS, CNDITION, ENTHUSIA, ENCOURAG), including a random slope.]
The resource conditions of the school also predict the random slope relating enjoyment of math to math achievement, where poorer conditions of the school lower the relationship between enjoyment of math and math achievement.
The models described earlier in this chapter account for the multilevel nature
of organizations such as schools. So far, though, it has been assumed that ran-
dom samples from each level of analysis have been obtained. However, it is not
often the case that random samples from each level of the organization are
obtained, but rather it is more likely that samples are taken from each level of
the system with unequal probabilities, often due to the necessity of oversam-
pling underrepresented units of analysis.
Such complex sampling is common with large-scale national and interna-
tional assessments. For example, in the Early Childhood Longitudinal Study
(ECLS-K) (NCES, 2001), a three-stage sampling design is employed. The first stage consists of primary sampling units made up of single counties or groups of counties. The second stage is schools within counties. The third stage of sampling is students
within schools. A process of stratification as well as disproportionate sampling
was employed in the first two stages of sampling. Disproportionate sampling
was also employed at the third stage of sampling. Clearly, therefore, to obtain
unbiased estimates of population parameters, some type of weighting scheme
to reflect these design features must be used.
The problem of using sampling weights in complex sample designs has
been widely studied in the sampling literature (Kish, 1965; Kish & Frankel,
1974; Tryfos, 1996) and will not be discussed in detail in this chapter. Of con-
cern to us, however, is that the issue of sampling weights applied in complex
sample surveys has recently been discussed in the literature on structural
equation modeling (Asparouhov, 2005; Kaplan & Ferguson, 1999; Stapleton,
2002). This section overviews the recent literature on the incorporation of
sampling weights in the structural equation modeling framework and distills
the important findings and recommendations that are relevant for the appli-
cation of structural equation modeling to complex sample surveys.
The first systematic study of sampling weights in the structural equation
modeling framework was conducted by Kaplan and Ferguson (1999). In their
study, Kaplan and Ferguson considered the case of single sample factor analy-
sis applied to a simple random sample taken from a population with a mixture
of strata of different sizes. In their design, Kaplan and Ferguson assumed that
the size of the population and the size of the strata were known to the inves-
tigator, but due to the unequal strata sizes, it is necessary to apply sample
weights.
To motivate the central ideas in the Kaplan and Ferguson (1999) study,
note first that a weight wi is simply the inverse of the probability of sample
selection—namely,
wi = 1/pi,  [7.15]
where pi is the probability of sample selection. With pi = n/N, where n is the size
of the strata and N is the population size, the weights sum to the population
sample size N.
The weighting scheme studied by Kaplan and Ferguson (1999) was based
on the Horvitz-Thompson estimator (Horvitz & Thompson, 1952). The cen-
tral idea of the Horvitz-Thompson estimator is that when the inclusion prob-
abilities are known, raw sampling weights can be computed and applied to
analyses so that unbiased estimates of population parameters can be obtained.
A well-known disadvantage to the use of raw sampling weights is that they
sum to the population sample size. Clearly, this would have profound effects on
the size of standard errors, but in the latent variable context, there would also
be profound inflation of goodness-of-fit indices based on the likelihood ratio
chi-square. Therefore, it may be preferable to use normalized sampling weights
that sum to the actual sample size.
In their analysis, Kaplan and Ferguson (1999) compared raw sampling
weights to normalized sampling weights using a factor analysis model.
Specifically, they employed the PRELIS software program (Jöreskog &
Sörbom, 2000) to compute weighted variances and covariances that followed
a factor analysis model. The weighted variances and covariances are calcu-
lated as
Varw(x) = Σi wi(xi − x̄)² / Σi wi  [7.16]

and

Covw(x, y) = Σi wi(xi − x̄)(yi − ȳ) / Σi wi,  [7.17]
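A small sketch (assumed, not from the Kaplan and Ferguson study) illustrates raw versus normalized weights and the weighted covariance of Equation [7.17]; the selection probabilities and data are illustrative. Note that rescaling the weights leaves the point estimate unchanged, so the choice of raw versus normalized weights matters mainly for standard errors and chi-square-based fit statistics.

```python
# A minimal sketch of raw and normalized sampling weights and the weighted
# covariance; all inputs are made-up values.
import numpy as np

p = np.array([0.01, 0.01, 0.05, 0.05, 0.05])    # selection probabilities, Equation [7.15]
w_raw = 1.0 / p                                  # raw weights
w_norm = w_raw * len(w_raw) / w_raw.sum()        # normalized weights sum to the sample size

x = np.array([1.2, 0.8, 2.3, 1.9, 2.1])
y = np.array([0.5, 0.7, 1.4, 1.2, 1.6])

def weighted_cov(x, y, w):
    xbar, ybar = np.average(x, weights=w), np.average(y, weights=w)
    return np.sum(w * (x - xbar) * (y - ybar)) / np.sum(w)   # Equation [7.17]

print(weighted_cov(x, y, w_raw), weighted_cov(x, y, w_norm))  # identical point estimates
```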
The results of their study showed that ignoring sampling weights led to much greater bias in the population parameter values compared with when either raw or normalized sampling weights are employed. Standard errors appeared to be somewhat underestimated when sampling weights are employed. Interestingly, Kaplan and Ferguson found that although the normalized weighting procedure yielded likelihood ratio chi-square values close to the true value, the remaining goodness-of-fit indices showed no discernible pattern regardless of weighting. Kaplan and Ferguson concluded that, in terms of applications of latent variable modeling, incorporating sampling weights should be routine practice when the weights are provided or otherwise known. Weighting is crucial for accurate inferences in latent variable models, with normalized sampling weights showing potential with regard to standard errors.
Although the Kaplan and Ferguson (1999) study may have been the first
to systematically examine the use of sampling weights in the latent variable
modeling situation, their approach did not consider sampling weights in mul-
tilevel structural equation models of the sort discussed in Sections 7.2 and 7.3.
Rather, more recent studies have extended that work to multilevel structural
equation modeling.
A recent study by Stapleton (2002) provided a systematic examination
of sampling weights employed in multilevel structural equation models.
Stapleton focused on three forms of weighting based on work by Potthoff,
Woodbury, and Manton (1992). The first form of weighting employs standard
raw weights and produces a sampling variance based on the population sam-
ple size. The second form of weighting employs relative weights that sum to the actual sample size but have been shown to yield downward bias in estimates of sampling variance (Potthoff et al., 1992). The third weighting scheme produces weights that sum to the effective sample size and has been shown by Potthoff et al. to yield unbiased estimates of the sampling variance of the
mean. Stapleton argued that the use of the effective sample size weights may
address a conjecture by Kaplan and Ferguson that the underestimation of stan-
dard errors using relative weights was due to not adjusting the standard errors
in the process of ML estimation.
In a detailed simulation study using a prototype multilevel structural
equation model, Stapleton (2002) corroborated the findings of Kaplan and
Ferguson (1999) regarding raw and relative sampling weights but found that
effective sampling weights yielded relatively robust estimates without the need
to adjust standard errors.
A more recent study by Stapleton (2006) examined five approaches for
obtaining robust estimates and standard errors when structural equation
modeling is applied to data from complex sampling designs. The approaches
included (a) robust maximum likelihood estimation ignoring stratification
Table 7.3 Results of Confirmatory Factor Analysis (CFA) of PISA 2003 Mathematics
Assessment Using Student Sampling Weights
Single-Level CFA Without Weights Single-Level CFA With Weights
Within-School Model
Calculating Mathematics
Train timetable 1.000 0.000 1.000 0.000 1.000 0.000 1.000 0.000
Discount % 1.187∗ 0.022 1.190∗ 0.022 1.179∗ 0.023 1.182∗ 0.023
Size (m2) of a floor 1.140∗ 0.023 1.140∗ 0.023 1.141∗ 0.022 1.140∗ 0.022
Graphs in newspaper 0.909∗ 0.021 0.908∗ 0.021 0.904∗ 0.021 0.903∗ 0.021
Distance on a map 1.184∗ 0.028 1.185∗ 0.028 1.185∗ 0.028 1.189∗ 0.028
Petrol consumption rate 0.881∗ 0.022 0.883∗ 0.022 0.880∗ 0.022 0.884∗ 0.022
Solving equations
3x + 5 = 17 1.000 0.000 1.000 0.000 1.000 0.000 1.000 0.000
2(x + 3) = (x + 3)(x − 3) 1.060∗ 0.015 1.059∗ 0.015 1.060∗ 0.016 1.060∗ 0.015
Factor Covariances
Calculating Mathematics 0.286∗ 0.009 0.284∗ 0.009 0.284∗ 0.009 0.282∗ 0.009
with Solving Equations
NOTE: SE = standard error; AIC = Akaike information criterion; BIC = Bayesian information criterion.
∗Values are statistically significant at p < .05.
7.5 Conclusion
recently given in Kaplan, Kim, and Kim (in press). However, the linkage of
multilevel latent variable models with finite mixture modeling is richer than
that considered in the Kaplan et al. chapter—allowing for models of, say, students nested within schools, where there might exist unobserved heterogeneity among schools that can be captured by finite mixture modeling. In the final analysis, multilevel latent variable modeling and its special cases provide a natural framework for cross-sectional studies. In the next chapter, we consider latent growth curve modeling for longitudinal data.
Notes
8
Latent Growth Curve Modeling
T hus far, the examples used to motivate the utility of structural equation
modeling have been based on cross-sectional data. Specifically, it has been
assumed that the data have been obtained from a sample of individuals mea-
sured at one point in time. Although it may be argued that most applications
of structural equation modeling are applied to cross-sectional data, it can also
be argued that most social and behavioral processes under investigation are
dynamic, that is, changing over time. In this case, cross-sectional data constitute
only a snapshot of an ongoing dynamic process and interest might naturally
center on the study of this process.
Increasingly, social scientists have access to longitudinal data that can pro-
vide insights into how outcomes of interest change over time. Indeed many
important data sets now exist that are derived from panel studies (e.g., NCES,
1988; NELS:88; The National Longitudinal Study; The Longitudinal Study of
American Youth; and the Early Childhood Longitudinal Study; to name a few).
Access to longitudinal data allows researchers to address an important class of
substantive questions—namely, the growth and development of social and
behavioral outcomes over time. For example, interest may center on the devel-
opment of mathematical competencies in young children (Jordan, Hanich, &
Kaplan, 2003a, 2003b; Jordan, Kaplan, & Hanich, 2002). Or, interest may cen-
ter on growth in science proficiency over the middle school years. Moreover,
in both cases, interest may focus on predictors of individual growth that are
assumed to be invariant across time (e.g., gender) or that vary across time (e.g.,
a student’s absenteeism rate during a school year).
This chapter considers the methodology of growth curve modeling—a
procedure that has been advocated for many years by researchers such as
Raudenbush and Bryk (2002); Rogosa, Brandt, and Zimowski, (1982); and
Willett (1988) for the study of intraindividual differences in change (see also
Willett & Sayer, 1994). The chapter is organized as follows. First, we consider
[Figure: IRT science achievement scores plotted across the five LSAY waves.]

[Figure: Science attitude scores plotted across the five LSAY waves.]
We estimate initial status in science achievement and the rate of change over time and link these parameters of growth to time-varying and time-invariant variables. In this example, such predictors will include student gender as well as teacher and parental push
variables. However, in addition to applying univariate growth curve models, we
also examine how these outcomes vary together in a multivariate growth curve
application.
The specification of growth models can be viewed as falling within the class of
multilevel linear models (Raudenbush & Bryk, 2002), where Level 1 represents
intraindividual differences in initial status and growth, and Level-2 models
individual initial status and growth parameters as a function of interindividual
differences.
To fix ideas, consider a growth model for a continuous variable such as science achievement. We can write a Level-1 equation expressing outcomes over time within an individual as

yip = π0p + π1p ti + εip,  [8.1]

where yip is the achievement score for person p at time i, π0p represents the initial status at time t = 0, π1p represents the growth trajectory, ti represents a temporal dimension that here is assumed to be the same for all individuals—such as grade level, and εip is the disturbance term. Later in this chapter, we consider more flexible alternatives to specifying time metrics.
Quadratic growth can also be incorporated into the model by extending the specification as

yip = π0p + π1p ti + π2p ti² + εip.  [8.2]

The Level-2 model relates the growth parameters to time-invariant characteristics of individuals, for example,

π0p = μπ0 + γπ0 xp + ζ0p  [8.3]

and

π1p = μπ1 + γπ1 xp + ζ1p,  [8.4]

where μπ0 and μπ1 are intercept parameters representing population true status and population growth when xp is zero; γπ0 and γπ1 are slopes relating xp to initial status and growth, respectively.
The model specified above can be further extended to allow individuals to
be nested in groups such as classrooms. In this case, classrooms become a
Level-3 unit of analysis. Finally, the model can incorporate time-varying pre-
dictors of change. In the science achievement example, such a time-varying
predictor might be changes in parental push or changes in attitudes toward
science over time. Thus, this model can be used to study such issues as the
influence of classroom-level characteristics and student-level invariant and
varying characteristics on initial status and growth in reading achievement
over time.
Research by B. Muthén (1991) and Willett and Sayer (1994) has shown how the general growth model described in the previous section can also be incor-
porated into a structural equation modeling framework. In what follows, the
specification proposed by Willett and Sayer (1994) is described. The broad
details of the specification are provided; however, the reader is referred to
Willett and Sayer’s (1994) article for more detail.
The Level-1 individual growth model can be written in the form of the
factor analysis measurement model in Equation [4.24] of Chapter 4 as
y = τy + Λy η + ε, [8.5]
where y is a vector representing the empirical growth record for person p. For
example, y could be science achievement scores for person p at the 7th, 8th, 9th,
10th, and 11th grades.
In this specification, τy is an intercept vector with elements fixed to zero
and Λy is a fixed matrix containing a column of ones and a column of constant
time values. Assuming that time is centered at the seventh grade,1 the time
constants would be 0, 1, 2, 3, and 4. The vector η contains the initial status and
growth rate parameters denoted as π0p and π1p, and the vector ε contains
measurement errors, where it is assumed that Cov(ε) is a diagonal matrix of
constant measurement error variances. Because this specification results in the
initial status and growth parameters being absorbed into the latent variable vec-
tor η, which vary randomly over individuals, this model is sometimes referred to
as a latent variable growth model (B. Muthén, 1991). The growth factors, as in the
multilevel specification, are random variables.
η = α + Bη + ζ, [8.6]
x = τx + Λx ξ + δ, [8.7]
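To make the specification concrete, the sketch below (a hypothetical illustration, not the author's code or the LSAY estimates) builds the fixed loading matrix Λy for five waves centered at the seventh grade and computes the mean vector and covariance matrix of the repeated measures implied by Equations [8.5] and [8.6] when there are no regressions among the growth factors (B = 0); the growth-factor means, covariance matrix, and error variance are made-up values.

```python
# A minimal sketch of the implied moments of a linear latent growth curve model.
import numpy as np

time_scores = np.array([0, 1, 2, 3, 4])                 # Grades 7-11, centered at 7
Lambda_y = np.column_stack([np.ones(5), time_scores])   # [1 | t] loading matrix

alpha = np.array([50.5, 2.2])        # illustrative means of initial status and growth
Psi = np.array([[60.0, -3.0],
                [-3.0,  1.5]])       # illustrative covariance matrix of the growth factors
Theta = 25.0 * np.eye(5)             # constant measurement error variances

mu_y = Lambda_y @ alpha                          # implied means of the repeated measures
Sigma_y = Lambda_y @ Psi @ Lambda_y.T + Theta    # implied covariance matrix
print(mu_y)
print(Sigma_y.round(1))
```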
[Figure: Path diagram of the linear latent growth curve model, with loadings on the initial status factor fixed to 1 and loadings on the growth rate factor fixed to 0, 1, 2, 3, and 4.]
Column 1 of Table 8.1 presents the results of the simple linear growth curve model with time centered at the seventh grade. The results indicate that the average seventh grade science achievement score is 50.51 and increases an average of 2.21 points a year. The correlation between the initial status and rate of change is negative, suggesting the possibility of a ceiling effect. Figure 8.1 presents a random sample of 50 model-estimated science achievement trajectories.
Column 2 of Table 8.1 presents the results of the linear growth curve
model with gender as a time-invariant predictor of initial status and growth
rate. A path diagram of this model is shown in Figure 8.4. The results indicate
a significant difference in favor of boys for seventh grade science achievement,
but no significant difference between boys and girls in the rate of change over
the five grades.
Column 3 of Table 8.1 presents the results of the linear growth curve model
with the time-varying covariates of PAP and STP included. The results for
gender remain the same. A path diagram of this model is shown in Figure 8.5.
The results for the time-varying covariates suggest that early PAP is a stronger
predictor of early science achievement compared with STP. However, the
effects of both time-varying covariates balance out at the later grades.
[Figure 8.4: Path diagram of the growth curve model with gender as a time-invariant predictor of initial status and growth rate.]
Growth in Attitudes Toward Science. Column 1 of Table 8.2 shows the results of
the simple linear growth curve model applied to the science attitude data. Path
diagrams for this and the remaining models are not shown. The results show a
seventh grade average attitude score of 14.25 points (on a scale of 1 to 20) and
a small but significant decline over time. Moreover, a strong negative correla-
tion can be observed between initial science attitudes and the change over time.
This suggests, as with achievement, that higher initial attitudes are associated
with slower change in attitudes over time.
[Figure 8.5: Path diagram of the growth curve model with gender and the time-varying covariates.]
Table 8.2 Selected Results of Growth Curve Model of Attitudes Toward Science
Effect                     Maximum Likelihood Estimates
model, the nonlinear curve fitting approach suggested by Meredith and Tisak would require that the first loading be fixed to zero to estimate the intercept, that the second loading be fixed to one to identify the metric of the slope factor, and that the third through fifth loadings be freely estimated. In this case, the time metrics are being empirically determined. When this type of model is estimated, it perhaps makes better sense to refer to the slope factor as a shape factor.
Following Meredith and Tisak (1990), we fix the first and second loadings as in the conventional growth curve
modeling case and free the loadings associated with the third, fourth, and fifth
waves of the study. The results are displayed in Table 8.4. It is clear from an
inspection of Table 8.4 that the nonlinear curve fitting model results in a sub-
stantial improvement in model fit. Moreover, we find that there are significant
sex differences with respect to the intercept in the nonlinear curve fitted model.
Intercept by
Ach1 1.000 1.000
Ach2 1.000 1.000
Ach3 1.000 1.000
Ach4 1.000 1.000
Ach5 1.000 1.000
Shape by
Ach1 0.000 0.000
Ach2 1.000 1.000
Ach3 3.351 3.299
Ach4 3.928 3.869
Ach5 5.089 5.004
Ach. intercept 50.360∗ 49.966∗
Ach. shape 1.693∗ 1.770∗
r(shape, intercept) −0.397∗ −0.398∗
Intercept on
Male 0.737∗
Shape on
Male −0.091
BIC 111240.859 115769.718
∗p < .05.
It is not difficult to make the case for specifying an ALT model for devel-
opmental research studies. Consider the example used throughout this chapter
where the focus is on modeling the development of science proficiency
through the middle and high school years. We can imagine that interest centers
on how change in science proficiency predicts later outcomes of educational
relevance—such as majoring in science-related disciplines in college. It is not
unreasonable, therefore, to assume that in addition to overall growth in science
proficiency, prior science scores predict later science scores, thus suggesting an
autoregressive structure.
In the case of long periods between assessment waves, we might reasonably expect smaller autoregressive coefficients than with more closely spaced assessment waves. Nevertheless, if the ALT model represents the true data gen-
erating structure, then omission of the autoregressive part may lead to sub-
stantial parameter bias. A recent article by Sivo, Fan, and Witta (2005) found
extensive bias for all parameters of the growth curve model as well as biases in
measures of model fit when a true autoregressive component was omitted
from the analysis.
For the purposes of this chapter, we focus on the baseline lag-1 ALT
model with a time-invariant predictor. This will be referred to as the
ALT(1) model. The ALT(1) specification indicates that the outcome at time
t is predicted only by the outcome at time t − 1. It should be noted that lags greater than one can also be specified. As with conventional growth curve
modeling, the ALT model can be extended to include more than one out-
come, each having its own autoregressive structure, as well as extensions
that include proximal or distal outcomes and time-varying and time-
invariant predictors.
To contextualize the study, consider the example of an ALT model for the
development of reading competencies in young children. The first model is a
baseline lag-1 ALT model. This model can be written in structural equation
modeling notation as
y = α + Λη + By + δ, [8.8]
η = τ + Γη + ζ, [8.9]
[Figure: Path diagram of the ALT(1) model, combining initial status and growth rate factors with autoregressive paths among the repeated measures.]
Ach5 ON
Ach4 0.135∗ 0.135∗
Ach4 ON
Ach3 0.103∗ 0.103∗
Ach3 ON
Ach2 0.102∗ 0.102∗
Ach2 ON
Ach1 0.031∗ 0.031∗
Ach. intercept 50.335∗ 49.911∗
Ach. slope 0.246∗ 0.329∗
r(slope, intercept) −0.575∗ −0.573∗
Intercept on
Male 0.814∗
Slope on
Male −0.156
BIC 111195.653 115723.216
∗p < .05.
A second approach is to exploit the inherent missing data structure. In this case,
we could arrange the data as shown in Table 8.6 patterned after Bollen and
Curran (2006, p. 77). Notice that there are three cohorts and five time points.
Any given child in this example can provide between one and four repeated
measures. The pattern of missing data allows estimation using maximum like-
lihood imputation under the assumption of missing-at-random (Allison,
1987; Arbuckle, 1996; Muthén et al., 1987). Thus, the growth parameters span-
ning the entire time span can be estimated.
As Bollen and Curran (2006) point out, however, this approach suffers
from the potential of cohort effects. That is, children in Cohort 1 may have
been 7 years old at the second wave of assessment, but children in Cohort 2
would have been 7 at the first wave of assessment.
Cohort    Age of Assessment
1         6    7    8    9
2         7    8    9    10
3         8    9    10   11
It may be useful to consider if there are aspects of model fit that are pertinent
to the questions being addressed via the use of latent growth curve models.
Clearly, we can apply traditional statistical and nonstatistical measures of fit,
such as the likelihood ratio chi-square, RMSEA, NNFI, or the like. In many
cases, the Bayesian information criterion is used to compare latent growth
curve models as well. However, these measures of fit are capturing whether the
restrictions that are placed on the data to provide estimates of the initial status
and growth rate are supported by the data. In addition, these measures
are assessing whether such assumptions as non-autocorrelated errors are sup-
ported by the data.
The application of traditional statistical and nonstatistical measures of fit
does provide useful information. However, because growth curve models pro-
vide estimates of rates of change, it may be useful to consider whether the
model predicted growth rate fits the empirical trajectory over time. So, for
example, if we know how science achievement scores have changed over the
five waves of LSAY, we may wish to know if our growth curve model accurately
predicts the known growth rate. In the context of economic forecasting, this
exercise is referred to as ex post simulation. The results of an ex post simulation exercise are particularly useful when the goal of modeling is to make forecasts of future values.
To evaluate the quality and utility of latent growth curve models, Kaplan
and George (1998) studied the use of six different ex post (historical) simula-
tion statistics originally proposed by Theil (1966) in the domain of economet-
ric modeling. These statistics evaluate different aspects of the growth curve.
The first of these statistics discussed by Kaplan and George was the root mean
square simulation error (RMSSE), defined as

RMSSE = [ (1/T) Σ_{t=1}^{T} (y^s_t − y^a_t)² ]^{1/2},  [8.10]

where y^s_t and y^a_t are the simulated (model-predicted) and actual values of the growth record at time t, respectively. A related measure is the root mean square percent error (RMSPE), defined as

RMSPE = [ (1/T) Σ_{t=1}^{T} ((y^s_t − y^a_t)/y^a_t)² ]^{1/2}.  [8.11]
A problem with the RMSPE is that its scale is arbitrary. Although the lower
bound of the measure is zero, the upper bound is not constrained. Thus, it is
of interest to scale the RMSSE to lie in the range of 0 to 1. A measure that lies
between 0 and 1 is Theil’s inequality coefficient, defined as
U = { (1/T) Σ_{t=1}^{T} (y^s_t − y^a_t)² }^{1/2} / [ { (1/T) Σ_{t=1}^{T} (y^s_t)² }^{1/2} + { (1/T) Σ_{t=1}^{T} (y^a_t)² }^{1/2} ].  [8.12]
Theil's inequality coefficient can be decomposed into three components. The bias proportion is defined as

U^M = (ȳ^s − ȳ^a)² / [ (1/T) Σ_{t=1}^{T} (y^s_t − y^a_t)² ],  [8.13]

where ȳ^s and ȳ^a are the means of the simulated and actual growth record,
respectively, calculated across the T time periods. The bias proportion provides
a measure of systematic error because it considers deviations of average actual
values from average simulated values (Pindyck & Rubinfeld, 1991).
The ideal would be a value of U^M = 0. Values greater than 0.1 or 0.2 are
considered problematic.
Another component of Theil’s U is the variance proportion defined as
U^S = (σs − σa)² / [ (1/T) Σ_{t=1}^{T} (y^s_t − y^a_t)² ],  [8.14]
where σs and σa are the standard deviations of the respective growth records
calculated across the T time periods. The variance proportion provides a
measure of the extent to which the model tracks the variability in the growth
record. If U^S is large, it suggests that the actual (or simulated) growth record
varied a great deal while the simulated (or actual) growth record did not
deviate by a comparable amount.
A final measure based on the decomposition of the inequality coefficient
is the covariance proportion, defined as
U^C = 2(1 − ρ)σs σa / [ (1/T) Σ_{t=1}^{T} (y^s_t − y^a_t)² ],  [8.15]

where ρ is the correlation between the simulated and actual growth records. The three proportions decompose the inequality coefficient such that

U^M + U^S + U^C = 1.  [8.16]
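A compact sketch of these forecasting statistics is given below; it is an assumed illustration, and the two growth records are made-up numbers rather than the LSAY results.

```python
# A minimal sketch of the ex post simulation statistics in Equations [8.10]-[8.16].
import numpy as np

def theil_statistics(y_sim, y_act):
    y_sim, y_act = np.asarray(y_sim, float), np.asarray(y_act, float)
    mse = np.mean((y_sim - y_act) ** 2)                         # mean squared simulation error
    rmsse = np.sqrt(mse)                                        # Equation [8.10]
    rmspe = np.sqrt(np.mean(((y_sim - y_act) / y_act) ** 2))    # Equation [8.11]
    U = rmsse / (np.sqrt(np.mean(y_sim**2)) + np.sqrt(np.mean(y_act**2)))  # Equation [8.12]
    s_s, s_a = y_sim.std(), y_act.std()                         # SDs over the T periods
    rho = np.corrcoef(y_sim, y_act)[0, 1]
    UM = (y_sim.mean() - y_act.mean()) ** 2 / mse               # bias proportion, [8.13]
    US = (s_s - s_a) ** 2 / mse                                 # variance proportion, [8.14]
    UC = 2 * (1 - rho) * s_s * s_a / mse                        # covariance proportion, [8.15]
    return rmsse, rmspe, U, UM, US, UC                          # UM + US + UC = 1

print(theil_statistics([50.8, 53.0, 55.1, 57.3, 59.4],
                       [50.5, 52.9, 55.4, 57.1, 59.6]))
```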
Table 8.7 Observed and Predicted Science Achievement Means and Forecasting
Statistics for the Science Achievement Model
[Figure: Observed and model-predicted science achievement means, Grades 7 through 11.]
8.6 Conclusion
Notes
1. Clearly, other choices of centering are possible. Centering will not affect the
growth rate parameter but will affect the initial status parameter.
2. LSAY was a National Science Foundation funded national longitudinal study of
middle and high school students. The goal of LSAY was to provide a description of
students’ attitudes toward science and mathematics focusing also on these areas as pos-
sible career choices (Miller et al., 1992, p. 1).
9
Structural Models for Categorical and Continuous Latent Variables
T his chapter describes what can be reasonably considered the state of the
art in structural equation modeling—namely, structural equation models
that combine categorical and continuous latent variables for cross-sectional
and longitudinal designs. The comprehensive modeling framework described
in this chapter rests on the work of B. Muthén (2002, 2004), which builds on
the foundations of finite mixture modeling (e.g., McLachlan & Peel, 2000) and
conventional structural equation modeling for single and multiple groups as
described in Chapter 4.
It is beyond the scope of this chapter to describe every special case that
can be accommodated by the general framework. Rather, this chapter
touches on a few key methods that tie into many of the previous chapters.
The organization of this chapter is as follows. First, we set the stage for the
applications of structural equation modeling for categorical and continuous
latent variables with a brief review of finite mixture modeling and the
expectation-maximization (EM) algorithm, following closely the discussion
given in McLachlan and Peel (2000). This is followed by a discussion of
applications of finite mixture modeling for categorical outcomes leading to
latent class analysis and variants of Markov chain modeling. Next, we dis-
cuss applications of finite mixture modeling to the combination of contin-
uous and categorical outcomes, leading to growth mixture modeling. We
focus solely on growth mixture modeling because this methodology encom-
passes structural equation modeling, factor analysis, and growth curve
modeling for continuous outcomes. The chapter closes with a brief overview
of other extensions of the general framework that relate to previous chapters
of this book.
The approach taken to specifying models that combine categorical and con-
tinuous latent variables is finite mixture modeling. Finite mixture modeling
relaxes the assumption that a sample is drawn from a population characterized
by a single set of parameters. Rather, finite mixture modeling assumes that the
population is composed of a mixture of unobserved subpopulations charac-
terized by their own unique set of parameters.
To fix notation, let z = (z′1, z′2, . . . , z′n)′ denote the realized values of a p-dimensional random vector Z = (Z′1, Z′2, . . . , Z′n)′ based on a random sample of size n. An element Zi of the vector Z has an associated probability density function f(zi). Next, define the finite mixture density as
f(zi) = Σ_{k=1}^{K} πk fk(zi),  (i = 1, 2, . . . , n; k = 1, 2, . . . , K),  [9.1]

where πk are the mixing proportions, which are nonnegative and sum to 1, and fk(zi) are the component densities.
The elements of π defined earlier arise from the fact that

Pr{Ci = ci} = π1^{c1i} π2^{c2i} ⋯ πK^{cKi}.  [9.3]

The posterior probability that zi belongs to component k can then be written as

τk(zi) = πk fk(zi) / f(zi),  (i = 1, 2, . . . , n; k = 1, 2, . . . , K).  [9.4]
When the component densities are indexed by a parameter vector, the mixture density can be written as

f(zi; Ω) = Σ_{k=1}^{K} πk fk(zi; θk),  (i = 1, 2, . . . , n; k = 1, 2, . . . , K),  [9.5]
π = (π1, π2, . . . , πK).  [9.7]
context of incomplete data problems (Dempster et al., 1977; see also Little & Rubin, 2002). However, it was soon recognized that a wide array of statistical models, including finite mixture models and the latent class model, could be conceptualized as incomplete data problems. Specifically, in the context of finite mixture models, the component label vector c is not observed. The EM algorithm proceeds by specifying the complete-data vector—the observed data augmented by the unobserved component labels—for which the complete-data log likelihood can be written as

log Lcomp(Ω) = Σ_{k=1}^{K} Σ_{i=1}^{n} cik {log πk + log fk(zi | θk)},  [9.9]
where cik is an element of c. The form of Equation [9.9] shows the role of cik as
an indicator of whether individual i is a member of class k.
The EM algorithm involves two steps. The E-step begins by taking the
conditional expectation of Equation [9.9] given the observed data z using
the current estimates of Ω based on a set of starting values, say Ω(0).
Following McLachlan and Peel (2000), the conditional expectation is writ-
ten as

Q(Ω, Ω^(0)) = E_{Ω^(0)} { log Lcomp(Ω) | z }.  [9.10]

Let Ω(m) be the updated value of Ω after the mth iteration of the EM algo-
rithm. Then the E-step on the (m + 1)th iteration calculates Q(Ω, Ω(m)).
With regard to the class-label vector c, the E-step of the EM algorithm
computes the conditional expectation of Cik given z, where Cik is an element of
C. Specifically, on the (m + 1)th iteration, the E-step computes

E_{Ω(m)}(Cik | z) = τk(zi; Ω(m)).  [9.11]

The M-step then updates the mixing proportions as

πk^{(m+1)} = Σ_{i=1}^{n} τk(zi; Ω(m)) / n,  (k = 1, 2, . . . , K).  [9.12]
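As an illustration of these steps (not tied to the models in this chapter), the sketch below runs the E- and M-steps for a univariate two-component normal mixture with simulated data; the component form, starting values, and iteration count are arbitrary.

```python
# A minimal sketch of the EM steps in Equations [9.4], [9.9], and [9.12] for a
# two-component normal mixture; data and starting values are illustrative.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
z = np.concatenate([rng.normal(0, 1, 300), rng.normal(3, 1, 200)])

pi = np.array([0.5, 0.5])            # mixing proportions
mu = np.array([-1.0, 1.0])           # component means
sd = np.array([1.0, 1.0])            # component standard deviations

for _ in range(100):
    # E-step: posterior class probabilities tau_k(z_i), Equation [9.4]
    dens = np.vstack([p * norm.pdf(z, m, s) for p, m, s in zip(pi, mu, sd)])
    tau = dens / dens.sum(axis=0)
    # M-step: update mixing proportions (Equation [9.12]) and component parameters
    pi = tau.mean(axis=1)
    mu = (tau * z).sum(axis=1) / tau.sum(axis=1)
    sd = np.sqrt((tau * (z - mu[:, None])**2).sum(axis=1) / tau.sum(axis=1))

print(pi, mu, sd)
```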
In this section, we discuss models for categorical latent variables, with applica-
tions to cross-sectional and longitudinal designs. This section is drawn from
Kaplan (in press). To motivate the use of categorical latent variables consider the
problem of measuring reading ability in young children. Typical studies of read-
ing ability measure reading on a continuous scale. Using the methods of item
response theory (see, e.g., Hambleton & Swaminathan, 1985), reading measures
are administered to survey participants on multiple occasions, with scores
equated in such a way as to allow for a meaningful notion of growth. However,
in large longitudinal studies such as the Early Childhood Longitudinal Study
(NCES, 2001), not only are continuous scale scores of total reading proficiency
available for analyses but also mastery scores for subskills of reading. For exam-
ple, a fundamental subskill of reading is letter recognition. A number of items
constituting a cluster that measures letter recognition are administered, and,
according to the ECLS-K scoring protocol, if the child receives 3 out of 4 items
in the cluster correct, then the child is assumed to have mastered the skill with
mastery coded “1” and nonmastery coded as “0.” Of course, there exist other,
more difficult, subskills of reading, including beginning sounds, ending sounds,
sight words, and words in context with subskill cluster coded for mastery.
Assume for now that these subskills tap a general reading ability factor. In
the context of factor analysis, a single continuous factor can be derived that
would allow children to be placed somewhere along the factor. Another approach
might be to derive a factor that serves to categorize children into mutually exclu-
sive classes on the latent reading ability factor. Latent class analysis is designed to
accomplish this categorization.
Latent classes are, in essence, categorical factors arising from the pattern of response frequencies to categorical items, where the response frequencies play a role similar to that of the correlation matrix in factor analysis (Collins, Hyatt, & Graham, 2000). The analogues of factor loadings are probabilities associ-
ated with responses to the manifest indicators given membership in the latent
class. Unlike continuous latent variables, categorical latent variables serve to
partition the population into discrete groups based on response patterns
derived from manifest categorical variables.
P_{ijklm} = \sum_{c=1}^{C} \pi_c^{x}\, \pi_{ic}^{A|x}\, \pi_{jc}^{B|x}\, \pi_{kc}^{C|x}\, \pi_{lc}^{D|x}\, \pi_{mc}^{E|x},   [9.13]

where \pi_c^{x} is the probability that a randomly selected child will belong to latent
class c (c = 1, 2, . . . , C) of the categorical latent variable ξ, \pi_{ic}^{A|x} is the
conditional probability of response i to variable A given membership in latent
class c, and \pi_{jc}^{B|x}, \pi_{kc}^{C|x}, \pi_{lc}^{D|x}, and \pi_{mc}^{E|x} are likewise the conditional probabilities for
items B, C, D, and E, respectively. For this example, the manifest variables are
dichotomously scored, and so there are two response options for each item.2
Identification of a latent class model is typically achieved by imposing the
constraint that the latent classes and the response probabilities that serve as
indicators of the latent classes sum to 1.0—namely, that
\sum_c \pi_c^{x} = \sum_i \pi_{ic}^{A|x} = \sum_j \pi_{jc}^{B|x} = \sum_k \pi_{kc}^{C|x} = \sum_l \pi_{lc}^{D|x} = \sum_m \pi_{mc}^{E|x} = 1.0,   [9.15]
where the first term on the left-hand side of Equation [9.15] indicates that the
latent class proportions must sum to 1.0, and the remaining terms on the left-
hand side of Equation [9.15] denote that the latent class indicator variables
sum to 1.0 as well (McCutcheon, 2002).3
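As a small illustration of how Equation [9.13] maps class proportions and conditional response probabilities onto the probability of an observed response pattern, consider the sketch below. The three classes, the five items, and all parameter values are hypothetical, chosen only to mimic the structure of the reading example.

```python
import numpy as np

# Hypothetical latent class parameters for five dichotomous items (A-E),
# in the form of Equation [9.13]: rows = latent classes c, columns = items;
# entries are P(item passed | class c).
class_props = np.array([0.6, 0.3, 0.1])                   # pi_c^x, sums to 1.0
pass_probs = np.array([[0.50, 0.05, 0.01, 0.00, 0.00],    # class 1
                       [0.95, 0.85, 0.50, 0.05, 0.00],    # class 2
                       [0.99, 0.99, 0.95, 0.90, 0.40]])   # class 3

def pattern_probability(pattern):
    """P(response pattern) = sum_c pi_c * prod over items of P(response | class c)."""
    pattern = np.asarray(pattern)
    item_probs = np.where(pattern == 1, pass_probs, 1.0 - pass_probs)
    return float(np.sum(class_props * item_probs.prod(axis=1)))

# Probability of passing the first two items and failing the remaining three
print(pattern_probability([1, 1, 0, 0, 0]))
```

Summing this function over all 32 possible response patterns returns 1.0, which is the constraint expressed in Equation [9.15].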
To continue with our reading example, suppose that we hypothesize that
the latent class variable ξ is a measure of reading ability with three classes (1 =
advanced reading ability, 2 = average reading ability, and 3 = beginning reading
ability). Assume also that we have a random sample of first semester kinder-
garteners. Then, we might find that a large proportion of kindergartners in the
sample who show mastery of letter recognition (items A and B, both coded 1/0)
are located in the beginning reading ability class. A smaller proportion of
kindergartners demonstrating mastery of ending sounds and sight words might
be located in the average reading ability class, and still fewer might be located in
the advanced reading class. Of course at the end of kindergarten and hopefully
by the end of first grade, we would expect to see the relative proportions shift.4
Table 9.1 presents the response probabilities measuring the latent classes
for each wave of the study separately. The interpretation of this table is similar
to the interpretation of a factor loading matrix. The pattern of response probabilities
across the subsets of reading tests suggests the labels given to the latent
classes—namely, low alphabet knowledge (LAK), early word reading (EWR), and
early reading comprehension (ERC). The extreme differences
across time in the likelihood ratio chi-square tests are indicative of sparse cells,
particularly occurring at spring kindergarten. For the purposes of this chapter,
I proceed with the analysis without attempting to ameliorate the problem.
Table 9.1   Response Probabilities and Class Proportions for Separate Latent Class Models: Total Sample

Latent Class    LR^b    BS      ES      SW      WIC     Class Proportions    χ²_LR (29 df)
Fall K
  LAK^c         0.47    0.02    0.01    0.00    0.00    0.67                 3.41
  EWR           0.97    0.87    0.47    0.02    0.00    0.30
  ERC           1.00    0.99    0.98    0.97    0.45    0.03
Spring K
  LAK           0.56    0.06    0.00    0.00    0.00    0.24                 4831.89*
  EWR           0.99    0.92    0.63    0.05    0.00    0.62
  ERC           0.00    0.99    0.99    0.96    0.38    0.14
Fall First
  LAK           0.52    0.08    0.01    0.00    0.00    0.15                 11.94
  EWR           1.00    0.92    0.71    0.05    0.03    0.59
  ERC           1.00    0.99    0.98    0.98    0.42    0.26
Spring First
  LAK           0.19    0.00    0.00    0.00    0.00    0.04                 78.60*
  EWR           0.98    0.90    0.79    0.35    0.00    0.18
  ERC           1.00    0.99    0.98    0.99    0.60    0.78

a. Response probabilities are for passed items. Response probabilities for failed items can be computed from 1 − prob(mastery).
b. LR = letter recognition, BS = beginning sounds, ES = ending letter sounds, SW = sight words, WIC = words in context.
c. LAK = low alphabet knowledge, EWR = early word reading, ERC = early reading comprehension.
* p < .05. Extreme value likely due to sparse cells.
The last column of Table 9.1 presents the latent class membership pro-
portions across the four ECLS-K waves for the full sample. We see that in fall
of kindergarten, approximately 67% of the cases fall into the LAK class,
whereas only approximately 3% of the cases fall into the ERC class. This breakdown
of proportions can be compared with the results for Spring of first grade;
by that time, only 4% of the sample is in the LAK class, whereas approximately
78% of the sample is in the ERC class.
The example of latent class analysis given in the previous sections presented
results over the waves of the ECLS-K but treated each wave cross-sectionally.
Nevertheless, it could be seen from Table 9.1 that response probabilities did
change over time, as did the latent class membership proportions. Given these
changes, a precise approach to characterizing change in latent class membership
over time is needed, and that is the focus of this section. We begin by describing a general approach
to the study of change in qualitative status over time via Markov chain model-
ing, extended to the case of latent variables. This is followed by a discussion of
latent transition analysis, a methodology well-suited for the study of stage-
sequential development.
\chi^2 = \sum_{ijkl} \frac{(F_{ijkl} - f_{ijkl})^2}{f_{ijkl}},   [9.16]

where F_{ijkl} are the observed frequencies of the I \times J \times K \times L contingency table and f_{ijkl}
are the expected cell counts. The degrees of freedom are obtained by subtracting
the number of parameters to be estimated from the total number of cells of the
contingency table that are free to vary.
In addition to the Pearson chi-square test, a likelihood ratio statistic can
be obtained that is asymptotically distributed as chi-square, where the degrees
of freedom are calculated as with the Pearson chi-square test. Finally, the
Akaike information criterion (AIC) and Bayesian information criterion (BIC)
discussed in Chapter 6 can be used to choose among competing models.
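For a concrete sense of how these fit statistics are computed once a model has been fitted, the sketch below evaluates the Pearson chi-square of Equation [9.16] and the likelihood ratio statistic for a set of observed and model-implied cell counts. The counts shown are placeholders rather than values from the ECLS-K analyses.

```python
import numpy as np

def fit_statistics(observed, expected, n_params):
    """Pearson chi-square (Equation [9.16]) and likelihood ratio chi-square
    for a frequency table, with degrees of freedom obtained by subtracting
    the number of estimated parameters from the number of free cells."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)

    pearson = np.sum((observed - expected) ** 2 / expected)

    # G^2 = 2 * sum F * log(F / f); cells with an observed count of 0 contribute 0.
    mask = observed > 0
    lr = 2.0 * np.sum(observed[mask] * np.log(observed[mask] / expected[mask]))

    df = observed.size - 1 - n_params
    return pearson, lr, df

obs = [120, 30, 25, 25]          # illustrative 2 x 2 table, flattened
exp = [112.5, 37.5, 32.5, 17.5]  # hypothetical model-implied counts
print(fit_statistics(obs, exp, n_params=2))
```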
The parameter τ^{21}_{j|i} represents the transition probability from time 1 to time 2 for those in category j given
that they were in category i at the beginning of the study. The parameter τ^{32}_{k|j} represents
the transition probability from time 2 to time 3 for those in category k given
that they were in category j at the previous time point. Finally, the parameter τ^{43}_{l|k} is
the transition probability from time 3 to time 4 for those in category l given
that they were in category k at the previous time point.
The manifest Markov model can be specified to allow transition probabil-
ities to be constant over time or to allow transition probabilities to differ over
time. The former is referred to as a stationary Markov chain while the latter is
referred to as a nonstationary Markov chain.
P_{ijkl} = \sum_{a=1}^{A}\sum_{b=1}^{B}\sum_{c=1}^{C}\sum_{d=1}^{D} \delta_a^1\, \rho_{i|a}^1\, \tau_{b|a}^{21}\, \rho_{j|b}^2\, \tau_{c|b}^{32}\, \rho_{k|c}^3\, \tau_{d|c}^{43}\, \rho_{l|d}^4,   [9.18]
Table 9.2   Results of the Nonstationary Manifest Markov Chain Model Applied to Mastery of Ending Sounds

a. 1 = nonmastery, 2 = mastery.
b. χ²_P refers to the Pearson chi-square test; χ²_LR refers to the likelihood ratio chi-square test.
latent—in the sense of not being directly observed but possibly measured by
numerous manifest indicators. The advantage of measuring latent variables with
multiple indicators lies in the well-known benefits for reliability and validity.
Therefore, it might be more realistic to specify multiple manifest categorical
indicators of the categorical latent variable and combine them with Markov
chain models.
The combination of multiple indicator latent class models and Markov
chain models provides the foundation for the latent transition analysis of stage-
sequential dynamic latent variables. In line with Collins and Flaherty (2002),
consider the current reading example where the data provide information on
the mastery of five different skills. At any given point in time, a child has mas-
tered or not mastered one or more of these skills. It is reasonable in this exam-
ple to postulate a model that specifies that these reading skills are related in
such a way that mastery of a later skill implies mastery of all preceding skills.
At each time point, the child’s latent class membership defines his or her latent
status. The model specifies a particular type of change over time in latent sta-
tus. This is defined by Collins and Flaherty (2002) as a “model of stage-sequential
development, and the skill acquisition process is a stage-sequential dynamic
latent variable” (p. 289). It is important to point out that there is no funda-
mental difference between latent transition analysis and latent Markov chain
modeling. The difference is practical, with latent transition analysis being
perhaps better suited conceptually for the study of change in developmental
status.
The model form for latent transition analysis uses Equation [9.18] except
that model estimation is undertaken with multiple indicators of the latent cat-
egorical variable. The appropriate measurement model for categorical latent
variables is the latent class model.
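The way Equation [9.18] assembles initial proportions, response probabilities, and transition probabilities into a cell probability can be sketched as follows for a two-state latent Markov model with a single dichotomous indicator at each of four waves. All parameter values, and the simplification of a time-constant transition matrix, are assumptions made for the illustration; a latent transition analysis would replace the single indicator with the latent class measurement model just described.

```python
import numpy as np

# Hypothetical parameters in the form of Equation [9.18].
delta = np.array([0.7, 0.3])                 # initial latent state proportions
rho = np.array([[0.9, 0.1],                  # P(item response | latent state);
                [0.2, 0.8]])                 # rows = states, cols = 0/1 response
tau = np.array([[0.8, 0.2],                  # transition matrix, assumed
                [0.1, 0.9]])                 # constant over time for brevity

def cell_probability(responses):
    """P(i, j, k, l): sum over the latent states a, b, c, d in Equation [9.18]."""
    prob = 0.0
    for a in range(2):
        for b in range(2):
            for c in range(2):
                for d in range(2):
                    prob += (delta[a] * rho[a, responses[0]] *
                             tau[a, b] * rho[b, responses[1]] *
                             tau[b, c] * rho[c, responses[2]] *
                             tau[c, d] * rho[d, responses[3]])
    return prob

print(cell_probability((0, 0, 1, 1)))   # e.g., fail, fail, pass, pass
```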
In Table 9.4, the results of the latent transition probabilities for the full latent
transition model are provided. On the basis of the latent transition analysis, we see
that for those in the LAK class at Fall kindergarten, 30% are predicted to remain
in the LAK class, while 69% are predicted to move to the EWR class and 1% are
predicted to transition to ERC in Spring kindergarten. Among those in the EWR
class at Fall kindergarten, 66% are predicted to remain in that class, and 34% of
the children are predicted to transition to the ERC class in Spring kindergarten.
Among those children who are in the LAK class at Spring Kindergarten,
59% are predicted to remain in that class at Fall of first grade, while 40% are
predicted to transition to the EWR class, with 1% predicted to transition to the
ERC class. Among those children who are in the EWR class at Spring kindergarten,
82% are predicted to stay in the EWR class, while 18% are predicted to
transition to the ERC class by Fall of first grade.
Finally, among those children who are in the LAK class in Fall of first grade,
30% are predicted to remain in that class at Spring of first grade, while 48% are
predicted to transition to the EWR class by Spring of first grade, with 22%
Table 9.4   Transition Probabilities From Fall Kindergarten to Spring First Grade

                    Spring K
Fall K        LAK     EWR     ERC
LAK           0.30    0.69    0.01
EWR           0.00    0.66    0.34
ERC           0.00    0.00    1.00

a. LAK = low alphabet knowledge, EWR = early word reading, ERC = early reading comprehension.
transitioning to the ERC class. Among those children in the EWR class at fall
of first grade, 13% are assumed to remain in that class with 86% transitioning
to the ERC class by Spring of first grade.
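One way to see what the transition probabilities in Table 9.4 imply is to push a vector of class proportions through the transition matrix, as in the sketch below. The Fall kindergarten proportions are taken from Table 9.1; comparing the result with the separately estimated Spring kindergarten proportions is only a rough check, since those come from the cross-sectional latent class models rather than from the transition model itself.

```python
import numpy as np

# Fall kindergarten class proportions (Table 9.1) and the Fall K -> Spring K
# transition matrix (Table 9.4); rows and columns ordered LAK, EWR, ERC.
fall_k = np.array([0.67, 0.30, 0.03])
transition = np.array([[0.30, 0.69, 0.01],
                       [0.00, 0.66, 0.34],
                       [0.00, 0.00, 1.00]])

# Predicted Spring K class proportions implied by the transition model
spring_k_predicted = fall_k @ transition
print(spring_k_predicted.round(3))   # approximately [0.201, 0.660, 0.139]
```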
P_{ijkl} = \sum_{s=1}^{S}\sum_{a=1}^{A}\sum_{b=1}^{B}\sum_{c=1}^{C}\sum_{d=1}^{D} \pi_s\, \delta_{a|s}^1\, \rho_{i|as}^1\, \tau_{b|as}^{21}\, \rho_{j|bs}^2\, \tau_{c|bs}^{32}\, \rho_{k|cs}^3\, \tau_{d|cs}^{43}\, \rho_{l|ds}^4,   [9.19]
Table 9.5   Transition Probabilities for the Mover-Stayer Model: Total Sample
Proportion of Movers and Stayers (Rows) by Time 1 Classes (Columns), Total Sample

a. LAK = low alphabet knowledge, EWR = early word reading, ERC = early reading comprehension.
and
ηi = αc + Bc ηi + Γc xi + ζi , [9.21]
For the baseline model, which did not include the covariate of poverty level, we
settled on retaining three growth mixture classes. A plot of the three classes can
be found in Figure 9.2.
From Table 9.6 and Figure 9.2, we labeled the first latent class, consisting of
35.5% of our sample, as “below average developers.” Students in this class evidenced
a spring kindergarten mean math achievement score of 23.201, a linear
growth rate of 1.317, and a deceleration in growth of .005. We labeled the
second latent class, comprising 58.3% of our sample, as “average developers.”
Students in this class evidenced a spring kindergarten mean math achievement
score of 33.646, a linear growth rate of 1.890, and a deceleration of
.006. Finally, we labeled the third latent class, consisting of 35.5% of our sample,
as “above average developers.” Students in this class evidenced a spring
kindergarten mean math achievement score of 54.308, a linear growth rate of
1.988, and a deceleration of −.016.
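To visualize what these class-specific estimates imply, one can evaluate the quadratic growth function for each class over the assessment window, as in the sketch below. The time coding (equally spaced occasions starting at spring kindergarten) and the treatment of the reported deceleration as a negative quadratic coefficient are assumptions made only for this illustration; the intercept, linear, and deceleration point estimates are the ones reported in the text.

```python
import numpy as np

# Class-specific growth parameters reported in the text: intercept (spring K
# mean), linear rate, and deceleration (entered here as a negative quadratic
# coefficient, an assumption).  Time coding is also an assumption.
classes = {
    "below average": (23.201, 1.317, -0.005),
    "average":       (33.646, 1.890, -0.006),
    "above average": (54.308, 1.988, -0.016),
}

t = np.arange(0, 7)   # hypothetical time scores
for label, (intercept, slope, quad) in classes.items():
    trajectory = intercept + slope * t + quad * t ** 2
    print(label, trajectory.round(1))
```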
When poverty level was added into the growth mixture model, three latent
classes were again identified.8 The above average developer class started signifi-
cantly above their peers and continued to grow at a rate higher than the rest of
their peers. Interestingly, the above average achiever group was composed
entirely of students living above the poverty line. The average achiever group
Figure 9.2   The Three-Class Growth Mixture Model (Baseline): Math IRT Score by Times of Assessment (Fall K, Spring K, Fall 1st, Spring 1st, Spring 3rd) for the Average, Above Average, and Below Average Developer Classes
was composed of both students who lived above and below the poverty line.
The below average achiever group was composed disproportionately of below
poverty students but did contain some above poverty students. A plot of the
three-class solution with poverty added to the model can be found in Figure 9.3.
Figure 9.3 The Three-Class Growth Mixture Model With Poverty Status Added
Table 9.7 Average Posterior Probabilities for the Three-Class Solution for
Baseline Model
NOTE: Class 1 = average developing; Class 2 = above average; Class 3 = below average.
Table 9.8 Average Posterior Probabilities for the Three-Class Solution With
Poverty Status Included
NOTE: Class 1 = average developing; Class 2 = above average; Class 3 = below average.
9.6 Conclusion
Notes
1. From here on, we will use the term “class” to refer to components of the mix-
ture model. The term is not to be confused with latent classes (e.g., Clogg, 1995),
although finite mixture modeling can be used to obtain latent classes (McLachlan &
Peel, 2000).
2. Note that latent class models can handle polytomously scored items.
3. For dichotomous items, it is only necessary to present the value of one latent
class indicator.
4. Methods for assessing latent class membership over time are discussed later in
this chapter.
5. The sampling design of ECLS-K included a 27% subsample of the total sample
at Fall of first grade, which reduced the cost burden of following the entire sample for
four waves while still allowing for the study of summer learning loss (NCES, 2001).
6. A nonstationary Markov model is one that allows heterogeneous transition
probabilities over time. In contrast, stationary Markov models assume homogeneous
transition probabilities over time.
7. It should be noted that finite mixture modeling has been applied to continuous
growth curve models under the name general growth mixture models (B. Muthén, 2004).
These models have been applied to problems in the development of reading competen-
cies (Kaplan, 2002) and math competencies (Jordan, Kaplan, Nabors-Olah, & Locuniak,
2006).
8. It is sometimes the case that adding covariates can change the number of mix-
ture classes. See Kaplan (2002) for an example of this problem in the context of reading
achievement.
9. This is an admittedly simple explanation. The CACE approach makes very
important assumptions—including random assignment and stable unit treatment
value (Jo & Muthén, 2001).
10
Epilogue
Toward a New Approach to the
Practice of Structural Equation Modeling
A s stated in the Preface, one goal of this book was to provide the reader
with an understanding of the foundations of structural equation model-
ing and hopefully to stimulate the use of the methodology through examples
that show how structural modeling can illuminate our understanding of social
reality—with problems in the field of education serving as motivating exam-
ples. At this point, we revisit the question of whether structural equation
Four steps almost completely describe it: a model is postulated, data gathered,
a regression run, some t-statistics or simulation performance provided and
another empirical regularity was forged.
I argue that one response to this critique, offered by Spanos (1986, 1990,
1995), may provide an alternative to the conventional practice of structural
equation modeling in the social sciences. Spanos refers to this alternative
approach as the probabilistic reduction approach.
The Theory of Errors Paradigm. The theory of errors paradigm had its roots
in the mathematical theory of approximation and led to the method of least
squares proposed by Legendre in 1805. A probabilistic foundation was given to
the least squares approach by Gauss in 1809 and developed into a “theory of
errors” by Laplace in 1812.
The basic idea originally proposed by Legendre was that a certain function
was optimally approximated by another function via the minimization of the
sum of the squared deviations about the line. The probabilistic formulation
proposed by Gauss and later Laplace was that if the errors were the result of
insignificant omitted factors, then the distribution of the sum of the errors
would be normal as the number of errors increased. If it could be argued that
the omitted variables were essentially unrelated to the systematic part of the
model, then the phenomena under study could be treated as if it were a nearly
isolated system (as cited in Spanos, 1995; Stigler, 1986).
Arguably, the theory of errors paradigm had a more profound influence
on econometric and social science modeling than the Fisher paradigm. Specifically,
the theory of errors paradigm led to a tremendous focus on statistical estima-
tion. Indeed, a perusal of most econometric textbooks shows that the domi-
nant discussion is typically around the choice of an estimation method. The
choice of an alternative estimator, whether it be two-stage least squares,
limited-information maximum likelihood, instrumental variable estimation,
or generalized least squares, is the result of viewing ordinary least squares as
not living up to its optimal properties in the context of real data.
[Figure: the probabilistic reduction approach, with elements Theory, DGP, Theoretical Model, Observed Data, Estimable Model, Statistical Model, Estimation, Misspecification, Reparameterization, Model Selection, and Empirical Social Science Model]
a theory, and the latter does not always provide information regarding what
can be observed or how it should be measured. One only need think of “school
quality” as an important theoretical variable of the input-process-output
theory to realize how many different ways such a theoretical variable can be
measured. Therefore, a distinction needs to be made regarding the theoretical
model and an estimable model, where the estimable model is specified with an
eye toward the DGP (Spanos, 1990).
As an example, let us assume the appropriateness of the input-process-
output theory. If interest centers on the measurement of school quality via a
survey of school climate, this will have bearing on the form of the estimable
model as well as the form of the statistical model (to be described next). If
school quality actually referred to the distribution of resources to classrooms,
then clearly the estimable model will differ from the theoretical model and aux-
iliary measurements might need to be added. It may be interesting to note that
the theoretical model and estimable model coincides when data are generated
from an experimental arrangement. However, we noted that such arrangements
are rare in social science applications of structural equation modeling.
where the first term on the right-hand side of Equation [10.1] is the conditional
distribution of the endogenous variables given the exogenous variables, and
the second term on the right-hand side is the marginal distribution of the
exogenous variables. Conditioning the endogenous variables on the exogenous
variables yields the reduced form

y = Πx + ζ,   [10.2]
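To make the reduction explicit, the factorization referred to as Equation [10.1] can be written, using D(·) for a distribution, ψ for the full parameter set, and ψ₁ and ψ₂ for the parameters of the two factors (these symbols are notational conveniences for this sketch rather than the original display), as

D(y, x; ψ) = D(y | x; ψ₁) D(x; ψ₂).

Under joint normality, the conditional distribution D(y | x; ψ₁) has mean Πx, and defining ζ = y − E(y | x) gives the reduced form in Equation [10.2].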
The two forms of identification are related, but the distinction is useful from the
viewpoint of the probabilistic reduction approach (see Spanos, 1990).
data, and it may be necessary to put forth numerous statistical models until one
is finally chosen. These assumptions include exogeneity, normality, linearity,
homogeneity, and independence. Weak exogeneity becomes a very serious
assumption at this step because evidence against weak exogeneity implies that
conditional estimation is inappropriate—that is, the conditional and marginal
distributions must both be taken into consideration during estimation. In any
case, a violation of one or more of these assumptions requires respecification and
adjustment until a statistically adequate model is obtained.
The next step in the probabilistic reduction approach is to begin testing
theoretical propositions of interest via parameter restrictions placed on a sta-
tistically adequate model. Note that whereas the resulting statistical model may
be based on considerable data mining, this does not present a problem because
the parameters of the statistical model do not have a direct interpretation rel-
ative to the theoretical parameters. However, the process of parameter restric-
tion of the statistical model is based on theoretical suppositions and should not
be data specific. Indeed, as Spanos points out, the more restrictions placed on
the model, the less data-specific the theoretical/estimable model becomes.
From the point of view of structural equation modeling in the social sciences,
this means that we tend to favor models with many degrees of freedom.
In contrast to the probabilistic reduction approach, the conventional
approach typically starts with an over-identified model wherein the more overi-
dentifying restrictions the better from a theoretical point of view. However, the
process of model modification that characterizes the conventional approach
becomes problematic insofar as it does not rest on a statistically adequate and
convenient summary of the probabilistic structure of the data.
under which the causal variables are assumed to operate. This view encourages
the practitioner to provide a rationale for the choice of variables in a particu-
lar model and how they might work together as a field within which a select set
of causal variables operates. This exercise in providing a deep description of
the causal field and the inus conditions for causation should be guided by
theory and, in turn, can be used to inform and test theory.
to the conjuncts of particular inus conditions. But what of the remaining rele-
vant causes of reading proficiency in our example? According to Mackie, they
are relegated to the causal field. Hoover views the causal field as the standing
conditions of the problem that are known not to change, or perhaps to be
extremely stable for the purposes at hand. In Hoover’s words, they represent
the “boundary conditions” of the problem.
However, the causal field is much more than simply the standing condi-
tions of a particular problem. Indeed, from the standpoint of linear statistical
models generally, those variables that are relegated to the causal field are part
of what is typically referred to as the error term. Introducing random error
into the discussion allows Mackie’s notions to be possibly relevant to indeter-
ministic problems such as those encountered in the social and behavioral sci-
ences. However, according to Hoover, this is only possible if the random error
terms are components of Mackie’s notion of a causal field.
Hoover argues that the notion of a causal field has to be expanded for
Mackie’s ideas to be relevant to indeterministic problems. In the first instance,
certain parameters of a causal process may not, in fact, be constant. If parame-
ters of a causal question were truly constant, then they can be relegated to the
causal field. Parameters that are mostly stable over time can also be relegated
to the causal field, but should they in fact change, the consequences for the
problem at hand may be profound. In Hoover’s analysis, these parameters are
part of the boundary conditions of the problem. Hoover argues that most inter-
ventions are defined within certain, presumably constant, boundary conditions—
although this may be questionable outside of economics.
In addition to parameters, there are also variables that are not of our imme-
diate concern and thus part of the causal field. Random errors, in Hoover’s
analysis, contain the variables omitted from the problem and are “impounded”
in the causal field. “The causal field is a background of standing conditions and,
within the boundaries of validity claimed for the causal relation, must be invari-
ant to exercises of controlling the consequent by means of the particular causal
relation (INUS condition) of interest” (Hoover, 2001, p. 222).
Hoover points out that for the inus condition to be a sophisticated approach
to the problem of causal inference, the antecedents must truly be antecedent.
Frequently, this requirement is presumed to be met by appealing to temporal
priority. But the assumption of temporal priority is often unsatisfactory.
Hoover gives the example of laying one’s head on a pillow and the resulting
indentation in the pillow as an example of the problem of simultaneity and
temporal priority.12 Mackie, however, sees the issue somewhat more simply—
namely the antecedent must be directly controllable. This focus on direct con-
trollability is an important feature Woodward’s (2003) manipulability theory
of causation described next.
my idea is that one ought to be able to associate with any successful explana-
tion a hypothetical or counterfactual experiment that shows us that and how
manipulation of the factors mentioned in the explanation . . . would be a way
of manipulating or altering the phenomenon explained . . . Put in still
another way, an explanation ought to be such that it can be used to answer
what I call the what-if-things-had-been-different question . . . (p. 11)
Consider, for example, the structural system

y = βx + u,   [10.3]
z = γx + λy + v,   [10.4]

and the corresponding reduced-form system

y = βx + u,   [10.5]
z = πx + w.   [10.6]

Although these two sets of equations yield observationally equivalent
information, they are distinct causal representations.
To see this, note that Equations [10.3] and [10.4] say that x is a direct cause
of y and x and y are direct causes of z. But, Equations [10.5] and [10.6] say that
x is a direct cause of y and z and says nothing about y being a direct cause of z.
If Equations [10.3] and [10.4] represent the true causal system and are assumed
to be modular in Woodward's sense, then Equations [10.5] and [10.6] cannot be
modular. For example, if y is fixed to a particular value by intervention, then
this implies that β = 0. Nevertheless, despite this intervention, Equation [10.4]
will continue to hold. In contrast, given modularity of Equations [10.3] and
[10.4], we see that Equation [10.6] will change because π is a function of β.
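The dependence of π on β can be made explicit by substituting Equation [10.3] into Equation [10.4]:

z = γx + λ(βx + u) + v = (γ + λβ)x + (λu + v),

so that π = γ + λβ and w = λu + v in Equation [10.6]. An intervention that sets y to a fixed value (in effect setting β to zero) therefore changes π to γ, while Equation [10.4] itself is left intact.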
We see then, that the structural form and reduced form are distinct causal
systems, and although they provide identical observational information as well
as inform the problem of identification, they do not provide identical causal
information. Moreover, given that numerous equivalent models can be formed,
the criterion for choosing among them, according to Woodward, is that the
model satisfies modularity, because that will be the model that fully represents
the causal mechanism and set of relationships (Woodward, 2003, p. 332).
In what sense does the manipulability theory of causation inform model-
ing practice? For Woodward (2003), the problem is that the model possessing
the property of modularity cannot be unambiguously determined from among
competing observationally equivalent models. Only the facts about causal
processes can determine this. For Woodward therefore, the prescription for
modeling practice is that researchers should theorize distinct causal mecha-
nisms and hypothesize what would transpire under hypothetical interventions.
This information is then mapped into a system of equations wherein each
equation represents a clearly specified and distinct causal mechanism. The
right-hand side in any given equation contains those variables on which inter-
ventions would change the variables on the left-hand side. And, although dif-
ferent systems of equations may be mathematically equivalent, this is only a
problem if we are postulating relatively simple associations. As Pearl (2000)
points out, mathematically equivalent models are not syntactically equivalent
when considered in light of hypothetical interventions. That is, each equation
in a system of equations should “encode” counterfactual information necessary
for considering hypothetical interventions (Pearl, 2000; Woodward, 2003).
y = βz + u,   [10.7]
z = γx + v   [10.8]
is interpreted quite differently from the case where we also allow x to directly
influence y—that is,
y = βz + λx + u,   [10.9]
z = γx + v.   [10.10]
In the purely mediating model given in Equations [10.7] and [10.8], the effect
of intervening on x is to change y by βγ. In the model in Equations [10.9] and
[10.10], the effect of intervening on x is to change y by βγ + λ.
The difference between the interpretations of these two models is not
trivial. They represent important causal information regarding what would
obtain after an intervention on x. For Pearl (2000), structural equations are
meant to define an equilibrium state, where that state would be violated when
there is an outside intervention (p. 157). As such, structural equations encode
not only information about the equilibrium state but also information about
which equations must be perturbed to explain the new equilibrium state. For
the two models just described, an intervention on x would lead to different
equilibrium states.
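A quick simulation makes the difference between these two intervention effects concrete. The sketch below is a minimal illustration with arbitrary coefficient values, not an analysis from this chapter.

```python
import numpy as np

rng = np.random.default_rng(0)
beta, gamma, lam = 0.5, 0.8, 0.3   # arbitrary structural coefficients
n = 200_000

def mean_y(x_value, include_direct_path):
    """Mean of y when x is set to x_value by intervention.

    Purely mediating system (Equations [10.7]-[10.8]): x -> z -> y.
    With a direct path (Equations [10.9]-[10.10]): y = beta*z + lambda*x + u.
    """
    x = np.full(n, x_value)
    v = rng.normal(size=n)
    u = rng.normal(size=n)
    z = gamma * x + v
    y = beta * z + (lam * x if include_direct_path else 0.0) + u
    return y.mean()

for direct in (False, True):
    effect = mean_y(1.0, direct) - mean_y(0.0, direct)
    print("direct path" if direct else "mediation only", round(effect, 3))
# Expected effects: beta*gamma = 0.4 and beta*gamma + lambda = 0.7
```

Shifting x by one unit moves the mean of y by approximately βγ = 0.4 in the purely mediating system and by βγ + λ = 0.7 when the direct path is included.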
Much more can be said regarding Woodward’s (2003) manipulability
theory of causation as well as Pearl’s (2000) interventional interpretation of
structural equation modeling, but a full account of their ideas is simply beyond
the scope of this chapter. Suffice it to say that, in the context of structural equation
modeling, Woodward's (2003) and Pearl's (2000) expansion of the
counterfactual theory of causation to the problem of hypothetical interventions
on exogenous variables provides a practical framework for using structural
equation modeling to guide causal inference and is in line with how its
founders (Haavelmo, 1943; Marschak, 1950; Simon, 1953) viewed the utility of
the methodology.
10.8 Conclusion
Over the past 10 years, there have been important developments in the
methodology of structural equation modeling—particularly in methods
Notes
1. However, with the advent of new estimation methods, such as those discussed
in Chapter 5, this may become less of a concern in the future.
2. Except perhaps indirectly when using the Akaike information criterion for
nested comparisons.
3. It is beyond the scope of this chapter to conduct a detailed historical analysis,
but it is worth speculating whether Goldberger’s important influence in structural
equation modeling may partially account for the conventional practice observed in the
social sciences.
4. As noted in Spanos (1989), this view was based on the perceived outcome of a
classic debate between Koopmans (1947) and Vining (1949).
5. Included are such important contributions as randomization, replication, and
blocking.
6. A difficulty that arises in the context of this discussion is the confusion of
terms such as theory, model, and statistical model. No attempt will be made to resolve
this confusion in the context of this chapter and thus it is assumed that the reader will
understand the meaning of these terms in context.
7. Of course, nonzero restrictions and equality constraints are also possible.
8. Note that one can also use the multilevel reduced form discussed in Chapter 7
for this purpose as well.
9. Haavelmo, Wright, and Koopmans were referring to simultaneous equation
modeling, but the point still holds for structural equation modeling as understood in
this book.
10. An example might be a match being lit without it being struck—for example,
if it were hit by lightning.
11. In this regard, there does not appear to be any inherent conflict between the
probabilistic reduction approach described earlier and the counterfactual model of
causal inference.
12. This example was originally put forth by Immanuel Kant in the context of an
iron ball depressing a cushion.
References
Bentler, P. M., & Bonett, D. G. (1980). Significance tests and goodness-of-fit in the
analysis of covariance structures. Psychological Bulletin, 88, 588–606.
Bentler, P. M., & Liang, J. (2003). Two-level mean and covariance structures: Maximum
likelihood via an EM algorithm. In S. Reise & N. Duan (Eds.), Multilevel modeling:
Methodological advances, issues, and applications (pp. 53–70). Mahwah, NJ: Lawrence
Erlbaum.
Berndt, E. R. (1991). The practice of econometrics: Classic and contemporary. New York:
Addison-Wesley.
Bidwell, C. E., & Kasarda, J. D. (1975). School district organization and student achieve-
ment. American Sociological Review, 40, 55–70.
Bishop, Y. M. M., Fienberg, S. E., & Holland, P. W. (1975). Discrete multivariate analysis.
Cambridge, MA: MIT Press.
Blumen, I. M., Kogan, M., & McCarthy, P. J. (1955). The industrial mobility of labor as a
probability process. Ithaca, NY: Cornell University Press.
Bollen, K. A. (1989). Structural equations with latent variables. New York: Wiley.
Bollen, K. A., & Curran, P. J. (2004). Autoregressive latent trajectory (ALT) models: A
synthesis of two traditions. Sociological Methods and Research, 32, 336–383.
Bollen, K. A., & Curran, P. J. (2006). Latent curve models: A structural equation perspec-
tive. New York: Wiley.
Boomsma, A. (1983). On the robustness of LISREL (maximum likelihood estimation)
against small sample size and non-normality. Unpublished dissertation, University
of Groningen, Groningen, The Netherlands.
Browne, M. W. (1982). Covariance structures. In D. M. Hawkins (Ed.), Topics in applied
multivariate analysis (pp. 72–141). London: Cambridge University Press.
Browne, M. W. (1984). Asymptotic distribution free methods in the analysis of
covariance structures. British Journal of Mathematical and Statistical Psychology,
37, 62–83.
Browne, M. W., & Cudeck, R. (1989). Single sample cross-validation indices for covari-
ance structures. Multivariate Behavioral Research, 24, 445–455.
Browne, M. W., & Cudeck, R. (1993). Alternative ways of assessing model fit. In
K. A. Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 136–162).
Newbury Park, CA: Sage.
Browne, M. W., & Mels, G. (1990). RAMONA user’s guide. Columbus: Department of
Psychology, Ohio State University.
Buse, A. (1982). The likelihood ratio, Wald, and Lagrange multiplier tests: An exposi-
tory note. The American Statistician, 36, 153–157.
Byrne, B. M., Shavelson, R. J., & Muthén, B. (1989). Testing for the equivalence of fac-
tor covariance and mean structure. Psychological Bulletin, 105, 456–466.
Caldwell, B. (1982). Beyond positivism: Economic methodology in the twentieth century.
London: George Allen & Unwin.
Cartwright, N. (2007). Hunting causes and using them: Approaches in philosophy and eco-
nomics. Cambridge: Cambridge University Press.
Chambers, J. M. (1998). Programming with data: A guide to the S language. New York:
Springer-Verlag.
Chou, C.-P., & Bentler, P. M. (1990). Model modification in covariance structure mod-
eling: A comparison among likelihood ratio, Lagrange multiplier, and Wald tests.
Multivariate Behavioral Research, 25, 115–136.
Hume, D. (1739). A treatise of human nature. Oxford, UK: Oxford University Press.
Intriligator, M. D., Bodkin, R. G., & Hsiao, C. (1996). Econometric models, techniques,
and applications. Upper Saddle River, NJ: Prentice Hall.
Jo, B., & Muthén, B. O. (2001). Modeling of intervention effects with noncompliance: A
latent variable modeling approach for randomized trials. In G. A. Marcoulides &
R. E. Schumacker (Eds.), New developments and techniques in structural equation
modeling (pp. 57–87). Mahwah, NJ: Lawrence Erlbaum.
Johnston, J. (1972). Econometric methods (2nd ed.). New York: McGraw-Hill.
Jordan, N. C., Hanich, L. B., & Kaplan, D. (2003a). Arithmetic fact mastery in young
children: A longitudinal investigation. Journal of Experimental Child Psychology,
85, 103–119.
Jordan, N. C., Hanich, L. B., & Kaplan, D. (2003b). A longitudinal study of mathematical
competencies in children with specific mathematics difficulties versus children with
co-morbid mathematics and reading difficulties. Child Development, 74, 834–850.
Jordan, N. C., Kaplan, D., & Hanich, L. B. (2002). Achievement growth in children with
learning difficulties in mathematics: Findings of a two-year longitudinal study.
Journal of Educational Psychology, 94, 586–597.
Jordan, N. C., Kaplan, D., Nabors-Oláh, L., & Locuniak, M. N. (2006). Number sense
growth in kindergarten: A longitudinal investigation of children at risk for math-
ematics difficulties. Child Development, 77, 153–175.
Jöreskog, K. G. (1967). Some contributions to maximum likelihood factor analysis.
Psychometrika, 32, 443–482.
Jöreskog, K. G. (1969). A general approach to confirmatory maximum likelihood factor
analysis. Psychometrika, 34, 183–202.
Jöreskog, K. G. (1971). Simultaneous factor analysis in several populations.
Psychometrika, 36, 409–426.
Jöreskog, K. G. (1973). A general method for estimating a linear structural equation sys-
tem. In A. S. Goldberger & O. D. Duncan (Eds.), Structural equation models in the
social sciences (pp. 85–112). New York: Academic Press.
Jöreskog, K. G. (1977). Structural equation models in the social sciences: Specification,
estimation and testing. In P. R. Krishnaiah (Ed.), Applications of statistics
(pp. 265–287). Amsterdam: North-Holland.
Jöreskog, K. G., & Goldberger, A. (1972). Factor analysis by generalized least squares.
Psychometrika, 37, 243–259.
Jöreskog, K. G., & Goldberger, A. S. (1975). Estimation of a model with multiple indi-
cators and multiple causes of a single latent variable. Journal of the American
Statistical Association, 70, 631–639.
Jöreskog, K. G., & Lawley, D. N. (1968). New methods in maximum likelihood factor
analysis. British Journal of Mathematical and Statistical Psychology, 21, 85–96.
Jöreskog, K. G., & Sörbom, D. (1993). LISREL 8.14. Chicago: Scientific Software
International.
Jöreskog, K. G., & Sörbom, D. (2000). LISREL 8.30 and PRELIS 2.30. Lincolnwood, IL:
Scientific Software International.
Kaiser, H. F. (1958). The varimax criterion for analytic rotation in factor analysis.
Psychometrika, 23, 187–200.
Kaplan, D. (1988). The impact of specification error on the estimation, testing, and
improvement of structural equation models. Multivariate Behavioral Research, 23,
69–86.
Kaplan, D., Harik, P., & Hotchkiss, L. (2000). Cross-sectional estimation of dynamic
structural equation models in disequilibrium. In R. Cudeck, S. H. C. du Toit, &
D. Sorbom (Eds.), Structural equation modeling: Present and future. A festschrift
in honor of Karl G. Jöreskog (pp. 315–339). Lincolnwood, IL: Scientific Software
International.
Kaplan, D., Kim, J.-S., & Kim, S.-Y. (in press). Multilevel latent variable modeling:
Current research and recent developments. In R. Millsap & A. Maydeu-Olivares
(Eds.), Sage handbook of quantitative methods in psychology. Thousand Oaks, CA:
Sage.
Kaplan, D., & Kreisman, M. B. (2000). On the validation of indicators of mathematics
education using TIMSS: An application of multilevel covariance structure model-
ing. International Journal of Educational Policy, Research, and Practice, 1, 217–242.
Kaplan, D., & Walpole, S. (2005). A stage-sequential model of literacy transitions:
Evidence from the Early Childhood Longitudinal Study. Journal of Educational
Psychology, 97, 551–563.
Kaplan, D., & Wenger, R. N. (1993). Asymptotic independence and separability in
covariance structure models: Implications for specification error, power, and
model modification. Multivariate Behavioral Research, 28, 483–498.
Kass, R. E., & Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical
Association, 90, 773–795.
Keesling, J. W. (1972). Maximum likelihood approaches to causal analysis. Unpublished
doctoral dissertation, University of Chicago, Chicago.
Kirk, R. E. (1995). Experimental design: Procedures for the behavioral sciences. Pacific
Grove, CA: Brooks/Cole.
Kish, L. (1965). Survey sampling. New York: Wiley.
Kish, L., & Frankel, M. R. (1974). Inference from complex samples. Journal of the Royal
Statistical Society Series B, 36, 1–37.
Koopmans, T. C. (1947). Measurement without theory. Review of Economics and
Statistics, 29, 161–172.
Koopmans, T. C. (Ed.). (1950). Statistical inference in dynamic economic models
(Vol. 10). New York: Wiley.
Koopmans, T. C., Rubin, H., & Leipnik, R. B. (1950). Measuring the equation systems of
dynamic economics. In T. C. Koopmans (Ed.), Statistical inference in dynamic
economic models (pp. 53–237). New York: Wiley.
Kreft, I., & de Leeuw, J. (1998). Introducing multilevel modeling. Thousand Oaks, CA:
Sage.
Land, K. C. (1973). Identification, parameter estimation, and hypothesis testing in recur-
sive sociological models. In A. S. Goldberger & O. D. Duncan (Eds.), Structural
equation models in the social sciences (pp. 19–49). New York: Seminar Press.
Langeheine, R., & Van de Pol, F. (2002). Latent Markov chains. In J. A. Hagenaars &
A. L. McCutcheon (Eds.), Applied latent class analysis (pp. 304–341). Cambridge,
UK: Cambridge University Press.
Lawley, D. N. (1940). The estimation of factor loadings by the method of maximum
likelihood. Proceedings of the Royal Society of Edinburgh, 60, 64–82.
Lawley, D. N. (1941). Further investigations in factor estimation. Proceedings of the
Royal Society of Edinburgh, 61, 176–185.
Lawley, D. N., & Maxwell, A. E. (1971). Factor analysis as a statistical method. London:
Butterworth.
Muthén, B., & Satorra, A. (1989). Multilevel aspects of varying parameters in structural
models. In R. D. Bock (Ed.), Multilevel analysis of educational data. San Diego, CA:
Academic Press.
Muthén, B. O. (2001). Latent variable mixture modeling. In G. A. Marcoulides &
R. E. Schumacker (Eds.), New developments and techniques in structural equation
modeling. Mahwah, NJ: Lawrence Erlbaum.
Muthén, B. O., & Curran, P. J. (1997). General longitudinal modeling of individual dif-
ferences in experimental designs: A latent variable framework for analysis and
power estimation. Psychological Methods, 2, 371–402.
Muthén, B. O., du Toit, S. H. C., & Spisic, D. (1997). Robust inference using weighted
least squares and quadratic estimating equations in latent variable modeling with
categorical outcomes. Unpublished manuscript, University of California, Los
Angeles.
Muthén, L. K., & Muthén, B. O. (2006). Mplus: Statistical analysis with latent variables.
Los Angeles: Muthén & Muthén.
Muthén, L. K., & Muthén, B. (1998–2007). Mplus user’s guide (5th ed.). Los Angeles:
Muthén & Muthén.
Nagin, D. S. (1999). Analyzing developmental trajectories: A semi-parametric, group-
based approach. Psychological Methods, 4, 139–157.
National Assessment of Educational Progress (NAEP). (1986). The NAEP 1986
technical report. Princeton, NJ: Educational Testing Service.
National Center for Education Statistics. (1988). National educational longitudinal study
of 1988. Washington, DC: U.S. Department of Education.
National Center for Education Statistics. (2001). Early childhood longitudinal study:
Kindergarten class of 1998–99: Base year public-use data files user’s manual (No.
NCES 2001–029). Washington, DC: Government Printing Office.
Olsson, U. (1979). On the robustness of factor analysis against crude classification of
the observations. Multivariate Behavioral Research, 14, 485–500.
Organisation for Economic Co-operation and Development. (2004). The PISA 2003
assessment framework: Mathematics, reading, science, and problem solving knowl-
edge and skills. Paris: Author.
Pagan, A. R. (1984). Model evaluation by variable addition. In D. F. Hendry & K. F.
Wallis (Eds.), Econometrics and quantitative economics (pp. 275–314). Oxford, UK:
Basil Blackwell.
Pearl, J. (2000). Causality: Models, reasoning, and inference. Cambridge, UK: Cambridge
University Press.
Pearson, K., & Lee, A. (1903). On the laws of inheritance in man. Biometrika, 2,
357–462.
Pindyck, R. S., & Rubinfeld, D. L. (1991). Econometric models & economic forecasts. New
York: McGraw-Hill.
Potthoff, R. F., Woodbury, M. A., & Manton, K. G. (1992). “Equivalent sample size” and
“equivalent degrees of freedom” refinements using survey weights under super-
population models. Journal of the American Statistical Association, 87, 383–396.
Rabe-Hesketh, S., Skrondal, A., & Pickles, A. (2004). Generalized multilevel structural
equation modeling. Psychometrika, 69, 167–190.
Raftery, A. E. (1993). Bayesian model selection in structural equation models. In K. A.
Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 163–180).
Newbury Park, CA: Sage.
Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and
data analysis methods (2nd ed.). Thousand Oaks, CA: Sage.
Richard, J.-F. (1982). Exogeneity, causality, and structural invariance in econometric
modeling. In G. C. Chow & P. Corsi (Eds.), Evaluating the reliability of macro-
economic models (pp. 105–118). New York: Wiley.
Rogosa, D. R., Brandt, D., & Zimowski, M. (1982). A growth curve approach to the mea-
surement of change. Psychological Bulletin, 90, 726–748.
Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in
observational studies for causal effects. Biometrika, 70, 41–55.
Rubin, D. (1976). Inference and missing data. Biometrika, 63, 581–592.
Saris, W. E., & Stronkhorst, H. (1984). Causal modeling in nonexperimental research.
Amsterdam: Sociometric Research Foundation.
Saris, W. E., Satorra, A., & Sörbom, D. (1987). The detection and correction of specifi-
cation errors in structural equation models. In C. C. Clogg (Ed.), Sociological
methodology 1987 (pp. 105–129). San Francisco: Jossey-Bass.
Satorra, A. (1989). Alternative test criteria in covariance structure analysis: A unified
approach. Psychometrika, 54, 131–151.
Satorra, A. (1992). Asymptotic robust inference in the analysis of mean and covariance
structures. In P. V. Marsden (Ed.), Sociological methodology 1992 (pp. 249–278).
Oxford, UK: Blackwell.
Satorra, A., & Saris, W. E. (1985). Power of the likelihood ratio test in covariance struc-
ture analysis. Psychometrika, 50, 83–90.
Schmidt, W. H. (1969). Covariance structure analysis of the multivariate random effects
model. Unpublished doctoral dissertation, University of Chicago.
Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6,
461–464.
Shavelson, R. J., McDonnell, L. M., & Oakes, J. (Eds.). (1989). Indicators for monitoring
mathematics and science: A sourcebook. Santa Monica, CA: Rand Corporation.
Silvey, S. D. (1959). The Lagrangian multiplier test. Annals of Mathematical Statistics, 30,
389–407.
Simon, H. A. (1953). Causal ordering and identifiability. In W. C. Hood & T. C.
Koopmans (Eds.), Studies in econometric method (pp. 49–74). New York: Wiley.
Sivo, S. A., Fan, X., & Witta, E. L. (2005). The biasing effects of unmodeled ARMA time
series processes on latent growth curve model estimates. Structural Equation
Modeling, 12, 215–232.
Sobel, M. E. (1990). Effect analysis and causation in linear structural equation models.
Psychometrika, 55, 495–515.
Sobel, M. E., & Bohrnstedt, G. W. (1985). Use of null models in evaluating the fit of
covariance structure models. In N. B. Tuma (Ed.), Sociological methodology 1985
(pp. 152–178). San Francisco: Jossey-Bass.
Sörbom, D. (1974). A general method for studying differences in factor means and fac-
tor structure between groups. British Journal of Mathematical and Statistical
Psychology, 27, 229–239.
Sörbom, D. (1978). An alternative to the methodology of analysis of covariance.
Psychometrika, 43, 381–396.
Sörbom, D. (1989). Model modification. Psychometrika, 54, 371–384.
Spanos, A. (1986). Statistical foundations of econometric modeling. Cambridge, UK:
Cambridge University Press.
David Kaplan received his PhD in Education from UCLA in 1987, after which
he joined the faculty of the University of Delaware where he remained until
2006. He is currently Professor of Quantitative Methods in the Department of
Educational Psychology at the University of Wisconsin–Madison. His current
research focuses on the problem of causal inference in nonexperimental settings
within a “structuralist” perspective. He also maintains a strong and active inter-
est in the development and testing of statistical models for social and behavioral
processes that are not necessarily directly observed—including latent variable
models, growth curve models, mixture models, and Markov models. His
collaborative work involves applications of advanced statistical methods to
problems in education and human development. His Web site can be found at
http://www.education.wisc.edu/edpsych/facstaff/kaplan/kaplan.htm.