Structural Equation Modeling: Foundations and Extensions
VOLUMES IN THE SERIES
1. HIERARCHICAL LINEAR MODELS: Applications and Data Analysis
Methods, 2nd Edition
Stephen W. Raudenbush and Anthony S. Bryk
2. MULTIVARIATE ANALYSIS OF CATEGORICAL DATA: Theory
John P. van de Geer
3. MULTIVARIATE ANALYSIS OF CATEGORICAL DATA: Applications
John P. van de Geer
4. STATISTICAL MODELS FOR ORDINAL VARIABLES
Clifford C. Clogg and Edward S. Shihadeh
5. FACET THEORY: Form and Content
Ingwer Borg and Samuel Shye
6. LATENT CLASS AND DISCRETE LATENT TRAIT MODELS:
Similarities and Differences
Ton Heinen
7. REGRESSION MODELS FOR CATEGORICAL AND LIMITED
DEPENDENT VARIABLES
J. Scott Long
8. LOG-LINEAR MODELS FOR EVENT HISTORIES
Jeroen K. Vermunt
9. MULTIVARIATE TAXOMETRIC PROCEDURES: Distinguishing
Types From Continua
Niels G. Waller and Paul E. Meehl
10. STRUCTURAL EQUATION MODELING: Foundations and
Extensions, 2nd Edition
David Kaplan
All rights reserved. No part of this book may be reproduced or utilized in any form or
by any means, electronic or mechanical, including photocopying, recording, or by any information
storage and retrieval system, without permission in writing from the publisher.
Portions of Chapter 7 will also appear in the forthcoming SAGE Handbook of Quantitative Methods
in Psychology edited by Roger E. Millsap and Albert Maydeu-Olivares. Portions of Chapter 9 were
first published in the following articles by David Kaplan and are reprinted here with permission:
Finite mixture dynamic regression modeling of panel data with implications for dynamic response
analysis, Journal of Educational and Behavioral Statistics; An overview of Markov chain methods for
the study of stage-sequential developmental processes, Developmental Psychology (Copyright ©
2008 by the American Psychological Association); Methodological advances in the analysis of
individual growth with relevance to educational policy, Peabody Journal of Education.
Contents
References 232
Index 245
About the Author 255
Over the last 35 years Structural Equation Modeling (SEM) has become
one of the most important data analysis techniques in the social sci-
ences. In fact, it has become much more than that. It has become a language to
formulate social science theories, and a language to talk about the relationship
between variables. For about 10 years, the AQT series of books on advanced techniques in the social sciences has included David Kaplan’s excellent text. We are
now pleased to have a completely revised and updated second edition.
SEM is not without its critics, and most researchers active in the area will
admit that it can easily be misused and in fact has frequently been misused. Its
routine application as a tool for theory formation and causal analysis has been
criticized by well-known statisticians such as Freedman, Wermuth, Rogosa,
Speed, Rubin, and Cox. One obvious problem is that SEM is a complicated
technique, whose statistical properties are far from simple, and many of its
users do not have enough statistical expertise to understand the various com-
plications. Another problem is that using SEM allows one to search over an
enormously large space of possible models, with a complicated combinatorial
structure, and the task of choosing an appropriate model, let alone the “best
model,” is horrendously difficult. Unless one has very strong prior knowledge,
which sets strict limitations on the choice of the model, it is easy to search until
one has an acceptable fit. That fit will often convey more about the tenacity and
good fortune of the investigator than about the world the model is supposed
to characterize. Finally, at a deeper level, there is considerable disagreement
about precisely what one can learn about cause and effect in the absence of
experiments in which causal variables are actually manipulated. For the critics,
SEM can never be a substitute for real experiments. The second edition pays a
great deal of attention to causal inference.
It is somewhat unfortunate that most of the books discussing SEM con-
centrate on the practical aspects of the technique, and are often ill-disguised
Almost a decade has passed since the publication of the first edition of Structural Equation Modeling: Foundations and Extensions, and considerable methodological advances in the area of structural equation modeling have
taken place. The vast majority of these advances have been in the analysis of
longitudinal data, but advances in the analysis of categorical latent variables as
well as general models that combine both categorical and continuous latent
variables have also made their way into applied work in the social and behav-
ioral sciences during this time. In addition, there have been advances in
estimation methods, techniques of model evaluation, and modern conceptual-
izations of the modeling process—including recent thinking on the use of
structural equation modeling for causal inference. In light of these advances,
I have undertaken a substantial revision of the book from the original format
that was adopted in the first edition.
This new edition maintains and updates so-called “first-generation” struc-
tural equation modeling but now brings in developments in so-called “second
generation” structural equation modeling—methods that combine continuous
latent variables (factors) with categorical latent variables (latent classes) in
cross-sectional and longitudinal contexts. As a result, the term structural equa-
tion modeling is being used here in a much more expansive sense, covering
models for continuous and categorical latent variables.
The present edition is now organized as follows. Chapter 1 retains the
original historical overview but now adds an historical overview of latent class
models. Chapter 2 remains relatively intact from the first edition. For com-
pleteness, Chapter 3 now contains material on nonstatistical estimation in the
unrestricted model—including a discussion of principal components analysis
and the common factor model. Chapter 4 remains mostly intact from the first
edition. Chapter 5 provides more detail regarding mean- and variance-adjusted
maximum likelihood and weighted least squares estimators along with a dis-
cussion of the extant evidence regarding their performance. Additional mater-
ial regarding developments in the analysis of missing data in the structural
with the first edition, this is not a book on how to use Mplus. For any
supplementary analyses, I have decided to use the open source software
program R. The R programming language is best considered a “dialect” of the
S programming language (Chambers, 1998). In most cases, S code can be
exported to the R environment without difficulty. In some cases, it is necessary
to invoke the S environment, and this is best accomplished through the
commercially available version, S-Plus.
Acknowledgments
Roger J. Calantone
Michigan State University
George Farkas
Pennsylvania State University
Scott M. Lynch
Princeton University
Keith A. Markus
John Jay College of Criminal Justice
Sandy Marquart-Pyatt
Utah State University
Victor L. Willson
Texas A&M University
1

Historical Foundations of Structural Equation Modeling for Continuous and Categorical Latent Variables
ρij = λiλj,   [1.1]

where ρij is the population correlation between scores on test i and test j, and λi and λj are weights (loadings) that relate test i and test j to the general factor. Consistent with our general definition of structural equation modeling, Equation [1.1] expresses the correlations in terms of a set of structural parameters. Spearman used the newly developed product-moment correlation coefficient to correlate scores on a variety of tests taken by a small sample of boys. Spearman reported findings that were consistent with the structural equation in Equation [1.1].
The work of Spearman and others (e.g., Thomson, 1956; Vernon, 1961)
formed the so-called British school of factor analysis. However, in the 1930s,
attention shifted to the work of L. L. Thurstone and his colleagues at the
University of Chicago. Thurstone argued that there was not one underlying
general factor of ability accompanied by specific ability factors as postulated by
Spearman. Rather, Thurstone argued that there existed major group factors
referred to as primary mental abilities (Thurstone, 1935). According to Mulaik
(1972), Thurstone’s search for group factors was motivated by a parsimony
principle that suggested that each factor should account for as much covariation
as possible in nonoverlapping sets of observed measures. Factors displaying this
property were said to exhibit simple structure. To achieve simple structure, how-
ever, Thurstone (1947) had to allow for the possibility that the factors them-
selves were correlated. Proponents of the British school, as noted by Mulaik
(1972), found this correlation to validate their claim of a general unitary ability
factor. In the context of Thurstone’s (1947) multiple factor model, the general
ability factor exists at a higher level of the ability hierarchy and is postulated to
account for the intercorrelations between the lower order primary factors.
By the 1950s and 1960s, factor analysis gained tremendous popularity,
owing much to the development and refinement of statistical computing
capacity. Indeed, Mulaik (1972) characterized this era as a time of agnostic and
blind factor analysis. However, during this era, developments in statistical
factor analysis were also occurring, allowing for the explicit testing of hypothe-
ses regarding the number of factors. Specifically, work by researchers such as
Jöreskog (1967), Jöreskog and Lawley (1968), Lawley (1940, 1941), and Lawley
and Maxwell (1971) led to the development of a maximum likelihood–based
approach to factor analysis. The maximum likelihood approach allowed a
researcher to test a hypothesis that a specified number of factors was present to
account for the intercorrelations between the variables. Minimization of the
maximum likelihood fitting function led directly to the likelihood ratio chi-
square test of the hypothesis that a proposed model fits the data. A generalized
least squares approach was later developed by Jöreskog and Goldberger (1972).
Developments by researchers such as Anderson and Rubin (1956) and later
by Jöreskog (1969) led to the methodology of confirmatory factor analysis that
allowed for testing hypotheses regarding the number of factors and the pattern
of loadings. From a historical perspective, these developments lent a rigorous
statistical approach to Thurstone’s simple structure ideas. In particular, a
researcher could now specify a model that certain factors accounted for the cor-
relations of only a specific subset of the observed variables. Again, using the
method of maximum likelihood, the hypothesis of simple structure could be
tested.
Exploratory and confirmatory factor analysis remain to this day very pop-
ular methodologies in quantitative social science research. In the context of
structural equation modeling, however, factor analysis constitutes a part of the
overall framework. Indeed, structural equation modeling represents a method
that, among other things, allows for the assessment of complex relationships
among factors. These complex relationships are often represented as systems of
simultaneous equations. The historical development of simultaneous equation
methodology is traced next.
of path analysis was statistically equivalent to factor analysis and was developed
apparently without knowledge of the work of Spearman (1904; see also Bollen,
1989). Wright also applied path analysis to problems of estimating supply and
demand equations and also treated the problem of model identification. These
issues formed the core of later econometric contributions to structural equa-
tion modeling (Goldberger & Duncan, 1972).
A second line of development occurred in the field of econometrics.
Mathematical models of economic phenomena have had a long history, begin-
ning with Petty (1676; as cited in Spanos, 1986). However, the form of econo-
metric modeling of relevance to structural equation modeling must be
credited to the work of Haavelmo (1943). Haavelmo was interested in model-
ing the interdependence between economic variables using the form for systems
of simultaneous equations written as
y = By + Γx + ζ , [1.2]
The above discussion briefly sketched the history of factor analysis and the
history of simultaneous equation modeling. The subject of this book is the
combination of the two, namely simultaneous equation modeling among
latent variables. The combination of these methodologies into a coherent ana-
lytic framework was based on the work of Jöreskog (1973), Keesling (1972),
and Wiley (1973).
The general structural equation model as outlined by Jöreskog (1973)
consists of two parts: (1) the measurement part, which links observed variables
to latent variables via a confirmatory factor model, and (2) the structural part
linking latent variables to each other via systems of simultaneous equations.
The estimation of the model parameters uses maximum likelihood estimation.
In the case where it is assumed that there is no measurement error in the
observed variables, the general model reduces to the simultaneous equations
model developed in econometrics (e.g., Hood & Koopmans, 1953). Issues of
model identification developed in econometrics (e.g., Fisher, 1966) were
brought into the general model with latent variables by Wiley (1973). A long line of software development followed, culminating in the popular LISREL program (Jöreskog & Sörbom, 2000).
What has been considered thus far is the history of structural equation model-
ing with a focus on the introduction of continuous latent variables into the
simultaneous equation framework. In many applications, however, it may be
useful to hypothesize the existence of categorical latent variables. Such cate-
gorical latent variables are presumed to explain response frequencies among
dichotomous or ordered categorical variables.
The use of categorical latent variables underlies the methodology of latent
structure analysis. Latent structure analysis was originally proposed by
Lazarsfeld (1950) as a means of modeling latent attitudes derived from dichotomous survey items, with its origins in studies of military personnel during World War II.3 The problem facing
researchers at that time concerned the development of reliable and valid
instruments measuring the attitudes soldiers had toward the army. The results
of research conducted on World War II soldiers were published between 1949
and 1950 in a four-volume set titled The American Soldier: Studies in Social
Psychology in WWII (Stouffer, Suchman, Devinney, Star, & Williams, 1949).
Volume 4 of this study was devoted to the problems of measurement and scal-
ing with major contributions made by Louis Guttman and Paul Lazarsfeld.
As with the earlier work of Spearman and Thurstone on the measurement
of intelligence, the goal here was to uncover underlying or “latent” structure
describing the attitudes of army personnel. However, unlike Spearman and
Thurstone, the observed data were discrete categorical responses and, in par-
ticular, dichotomous yes/no, agree/disagree responses.
The summary empirical data were in the form of frequencies of agreement
to a set of questions administered to the sample of personnel. In an example
using four dichotomous items, Lazarsfeld summarized the counts of indivi-
duals who agreed with all four statements, agreed with the first, but not the
remaining three, and so on. An inspection of the response frequencies led
Lazarsfeld to postulate that soldiers belonged to one of two possible “latent
classes”: the first being soldiers who are generally favorable to the army versus
those who are generally unfavorable. Moreover, Lazarsfeld noted that if one
were to have administered the items to only one of the two classes, there would
be no correlations among the items. This phenomenon was termed local inde-
pendence by Lazarsfeld and it implies that holding the latent class constant,
there is no correlation among the manifest item responses.
Missing during the early days in the development of latent structure analy-
sis was explicit testing of the latent class model. The standard issues of good-
ness-of-fit, parameter estimation, standard errors, and other concepts familiar
to mathematical statistics at the time were not discussed in any meaningful way
within the emerging literature on latent structure analysis. It wasn’t until much
later with the work of Lazarsfeld and Henry (1968) that the conventional con-
cepts of mathematical statistics were brought into the domain of latent struc-
ture analysis. Full integration of latent structure analysis with mathematical
statistics came with the publication of Leo Goodman’s (1968) paper on log-
linear modeling approaches to latent structure analysis and the publication of
Discrete Multivariate Analysis by Yvonne Bishop, Stephen Fienberg, and Paul
Holland (1975).
Structural equation modeling is, without question, one of the most popular
statistical methodologies available to quantitative social scientists. The popu-
larity of structural equation modeling can be attested to by the creation of a
scholarly journal devoted specifically to structural equation modeling4 as well
as the existence of SEMNET, a very popular and active electronic discussion list
that focuses on structural equation modeling and related issues.5
Structural equation modeling also continues to be an active area of theo-
retical and applied statistical research. Indeed, the past 40 years have seen
The previous sections provided only a taste of the foundations and extensions
of structural equation modeling. Each chapter of this book provides more
detail to both. These developments come about primarily through the interac-
tion of statisticians with substantive researchers motivated by a need to solve
[Figure: the conventional sequence of structural equation modeling—Theory → Model Specification → Sample and Measures → Estimation → Assessment of Fit → Modification → Discussion.]
The substantive examples in this book draw from current issues in the field of
education that are at the forefront of national debate on school effects. My
choice in using examples from the field of education stems mainly from the fact
that this is the substantive area with which I am most familiar. In addition, many
of the topics covered in this book can be convincingly demonstrated on prob-
lems in the field of education. However, many of the new extensions in structural
equation modeling that constitute a part of this book can be quite clearly demon-
strated on problems arising from fields other than education.
Many of the examples used throughout this book are guided by a theoret-
ical framework. The theoretical framework used throughout this book is
referred to as the input-process-output theory of education (Bidwell & Kasarda,
1975). A number of diagrammatic formulations have been offered to describe
the input-process-output theory of the U.S. educational system. Figure 1.2 shows
one such diagram offered by Shavelson, McDonnell, and Oakes (1989) and often
referred to as the RAND Corporation Indicators Model.
There are numerous aspects of this figure that are worth pointing out.
First, and of relevance to the subject of this book, is the implied complexity of
the educational system. To take an example, schooling inputs such as fiscal
resources are theorized to have their effects on outputs mostly via other school-
ing variables as well as teacher and classroom process variables. The teacher/
classroom process variables, in turn, exhibit their own structural complexity.
[Figure 1.2. The RAND Corporation indicators model, relating fiscal resources, curriculum quality, teaching quality, and student background and attitudes/aspirations to achievement.]
Notes
2

Path Analysis

Modeling Systems of Structural Equations Among Observed Variables
y = α + By + Γx + ζ, [2.1]
where α is a vector of structural intercepts, B is a matrix of regression coefficients relating the endogenous variables y to one another, Γ is a matrix of regression coefficients relating the endogenous variables to the exogenous variables x, ζ is a vector of disturbance terms with covariance matrix Ψ, and Φ denotes the covariance matrix of the exogenous variables.

[Figure 2.1. Path diagram for the science achievement model, relating SCIGRA6, SES, and CERTSCI to UNDERSTD, CHALLG, SCIGRA10, and SCIACH.]

Restrictions on the elements of B and Γ
are imposed by the underlying, substantive theory. For example, in the model shown in Figure 2.1, an element of B would be the path relating SCIGRA10 to CHALLG. An element in Γ would be the path relating SCIGRA10 to SES.
We can distinguish between three types of parameters in B, Γ, and Ψ. The
first set of parameters is the one that is to be estimated. These are often referred
to as free parameters. Thus, in the model in Figure 2.1, the observable paths are
the free parameters. The value of these free parameters will be estimated with
the methods described below.
The second set of parameters are given a priori values that are held con-
stant during estimation. These parameters are often referred to as fixed para-
meters. Most often, the fixed parameters are set equal to zero to represent the
absence of the relationship. However, it is possible to fix an element to a
nonzero value if the theory is strong enough to suggest what that value
should be. Again, considering the model in Figure 2.1, we theorize that there
is no direct relationship between SCIACH and SES. Thus, this path is fixed
to zero.
Finally, it is possible to constrain certain parameters to be equal to other
parameters in the model. These elements are referred to as constrained
parameters. An example of constrained parameters would be requiring that
the effect of SCIGRA6 on SCIGRA10 be the same as the effect of SES on
SCIGRA10.
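As a concrete illustration, the free and fixed elements of B and Γ for the science achievement model can be laid out as pattern matrices. The minimal sketch below is written in base R (the software used for supplementary analyses in this book); NA marks a free parameter and 0 a fixed parameter, and the specific pattern is read off the direct effects reported in Table 2.2—the variable ordering itself is an assumption made only for illustration.

```r
# Pattern matrices for the science achievement path model (Figure 2.1).
# NA = free parameter (to be estimated), 0 = fixed parameter (restricted to zero).
y_names <- c("UNDERSTD", "CHALLG", "SCIGRA10", "IRTSCI")   # endogenous variables
x_names <- c("SCIGRA6", "SES", "CERTSCI")                   # exogenous variables

# B: effects of endogenous variables on other endogenous variables
B_pattern <- matrix(0, 4, 4, dimnames = list(y_names, y_names))
B_pattern["CHALLG",   "UNDERSTD"] <- NA   # UNDERSTD -> CHALLG
B_pattern["SCIGRA10", "UNDERSTD"] <- NA   # UNDERSTD -> SCIGRA10
B_pattern["SCIGRA10", "CHALLG"]   <- NA   # CHALLG   -> SCIGRA10
B_pattern["IRTSCI",   "SCIGRA10"] <- NA   # SCIGRA10 -> IRTSCI

# Gamma: effects of exogenous variables on endogenous variables
G_pattern <- matrix(0, 4, 3, dimnames = list(y_names, x_names))
G_pattern["UNDERSTD", "CERTSCI"] <- NA
G_pattern["SCIGRA10", "SCIGRA6"] <- NA
G_pattern["SCIGRA10", "SES"]     <- NA
G_pattern["SCIGRA10", "CERTSCI"] <- NA

# A constrained specification would force two of these free elements to share
# a single estimate during estimation.
sum(is.na(B_pattern)) + sum(is.na(G_pattern))   # 8 free path coefficients
```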
[Path diagrams relating the exogenous variables x1, x2, and x3 to the endogenous variables y1 and y2.]
(I − B)y = α + Γx + ζ.   [2.2]

Provided that (I − B) is nonsingular, the reduced form of the model can be written as

y = (I − B)⁻¹α + (I − B)⁻¹Γx + (I − B)⁻¹ζ.   [2.3]

From here, the mean vector and the covariance matrix of the observed variables can be expressed in terms of the model parameters. Specifically, the mean vector can be written as

μ = [ E(y) ]   [ (I − B)⁻¹(α + ΓE(x)) ]
    [ E(x) ] = [ E(x)                 ],   [2.4]

and the covariance matrix of y and x can be written as

Σ = [ E(yy′)  E(yx′) ]   [ (I − B)⁻¹(ΓΦΓ′ + Ψ)(I − B)′⁻¹   (I − B)⁻¹ΓΦ ]
    [ E(xy′)  E(xx′) ] = [ ΦΓ′(I − B)′⁻¹                    Φ          ].   [2.5]
Equations [2.4] and [2.5] show that structural equation modeling repre-
sents a structure on the mean vector and covariance matrix. The structure is in
terms of the parameters of the model.
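The mapping from model parameters to the implied covariance matrix in Equation [2.5] can be computed directly. The base R sketch below does so for a small hypothetical model with two endogenous and two exogenous variables; the parameter values are arbitrary and serve only to show the computation.

```r
# Hypothetical parameter matrices (2 endogenous, 2 exogenous variables)
B     <- matrix(c(0.0, 0.0,
                  0.4, 0.0), nrow = 2, byrow = TRUE)   # y1 -> y2
Gamma <- matrix(c(0.5, 0.2,
                  0.0, 0.3), nrow = 2, byrow = TRUE)   # x -> y paths
Phi   <- matrix(c(1.0, 0.3,
                  0.3, 1.0), nrow = 2, byrow = TRUE)   # cov(x)
Psi   <- diag(c(0.6, 0.5))                             # cov(zeta), diagonal

IB_inv <- solve(diag(2) - B)                           # (I - B)^{-1}

Sigma_yy <- IB_inv %*% (Gamma %*% Phi %*% t(Gamma) + Psi) %*% t(IB_inv)
Sigma_yx <- IB_inv %*% Gamma %*% Phi
Sigma    <- rbind(cbind(Sigma_yy, Sigma_yx),
                  cbind(t(Sigma_yx), Phi))             # implied covariance matrix, Eq. [2.5]
round(Sigma, 3)
```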
structural equation models. In this section, we will define the problem of iden-
tification from the covariance structure perspective. Later, we introduce the
problem of identification from the reduced form perspective when consider-
ing some simple rules for establishing identification.
Recall again that we wish to know whether the variances and covariances of the
exogenous variables (contained in Φ), the variances and covariances of the dis-
turbance terms (contained in Ψ), and the regression coefficients (contained in B
and Γ) can be solved in terms of the variances and covariances contained in Σ.
Two classical approaches to identification can be distinguished in terms of
whether identification is evaluated on the model as a whole or whether identifi-
cation is evaluated on each equation composing the system of equations. The for-
mer approach is generally associated with social science applications of structural
equation modeling, whereas the latter approach appears to be favored in econo-
metrics. Nevertheless, they both provide a consistent picture of identification in
that if any equation is not identified, the model as a whole is not identified.
The first, and perhaps simplest, method for ascertaining the identification of the model parameters is referred to as the counting rule (see, e.g., Bollen, 1989). Let s = p + q be the total number of p endogenous and q exogenous variables. Then the number of nonredundant elements in Σ is equal to ½s(s + 1). Let t be the total number of parameters in the model that are to be estimated (i.e., the free parameters). The counting rule states that a necessary condition for identification is that t ≤ ½s(s + 1). If the equality holds, then we say that the model may be just identified. If t is strictly less than ½s(s + 1), then we say that the model may be overidentified. If t is greater than ½s(s + 1), then the model is not identified.
As an example of the counting rule, consider the model of science achievement given in Figure 2.1. The total number of variables, s, in this model is 7. Thus, we obtain 28 nonredundant elements in Σ. There are 10 variances and covariances of exogenous variables (including disturbances) and 8 path coefficients, so the number of free parameters is t = 18, leaving 28 − 18 = 10 overidentifying restrictions. Because t is strictly less than the number of nonredundant elements in Σ, we say that the model is overidentified. The 10 degrees of freedom correspond to the 10 restrictions placed on the model.
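The counting rule is easy to verify directly. The following base R sketch reproduces the computation for the science achievement model; the counts of free parameters are the ones given in the text and implied by Table 2.2.

```r
p <- 4                          # endogenous: UNDERSTD, CHALLG, SCIGRA10, IRTSCI
q <- 3                          # exogenous:  SCIGRA6, SES, CERTSCI
s <- p + q                      # total number of observed variables

nonredundant <- s * (s + 1) / 2 # 28 distinct elements in Sigma
t  <- 8 + 10                    # 8 path coefficients + 10 variances/covariances
df <- nonredundant - t          # 10 overidentifying restrictions

c(nonredundant = nonredundant, t = t, df = df)
```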
Clearly, the advantage to the counting rule is its simplicity. It is also a nec-
essary but not sufficient rule for identification. We can, however, provide rules
for identification that are sufficient, but that pertain only to recursive models,
or to special cases of recursive models. Specifically, a sufficient condition for
identification is that B is triangular and that Ψ is a diagonal matrix. However,
this is the same as saying that recursive models are identified. Indeed, this is the
case, and Bollen (1989) refers to this rule as the recursive rule of identification.
In combination with the counting rule above, recursive models can be either
just identified or overidentified.
A special case of the recursive rule concerns the situation where B = 0 and Ψ is again a diagonal matrix. Under this condition, the model in Equation [2.1]
reduces to
y = α + Γx + ζ, [2.6]
Consider the nonrecursive model shown in Figure 2.3, with two endogenous variables (y1 and y2) and three exogenous variables (x1, x2, and x3). The system of equations can be written as

[ y1 ]   [  0    β12 ] [ y1 ]   [ γ11  γ12   0  ] [ x1 ]   [ ζ1 ]
[ y2 ] = [ β21    0  ] [ y2 ] + [  0    0   γ23 ] [ x2 ] + [ ζ2 ],   [2.7]
                                                  [ x3 ]

from which we can form the p × s matrix of structural coefficients

A = [ (I − B) | −Γ ]

  = [   1    −β12   −γ11   −γ12    0   ]
    [ −β21     1      0      0   −γ23  ],   [2.8]
where s = p + q. Note that the zeros placed in Equation [2.8] represent paths
that have been excluded (restricted) from the model based on a priori model
specification. We can represent the restrictions in the first equation of A, say
[   0  ]
[ −γ23 ].
With the first row 0, the rank of this matrix is 1, and hence, the first equa-
tion is identified. Considering the second equation, the resulting submatrix is
[ −γ11  −γ12 ]
[   0     0  ].
Again, because of the 0s in the second row, the rank of this submatrix is 1,
and we conclude that the second equation is identified.
A corollary of the rank condition is referred to as the order condition. The
order condition states that the number of variables (exogenous and endoge-
nous) excluded (restricted) from any of the equations in the model must be at
least p −1 (Fisher, 1966). Despite the simplicity of the order condition, it is only
a necessary condition for the identification of an equation of the model. Thus,
the order condition guarantees that there is a solution to the equation, but it
does not guarantee that the solution is unique. A unique solution is guaranteed
by the rank condition.
As an example of the order condition, we observe that the first equation has one restriction and the second equation has two restrictions, as required by the condition that the number of restrictions must be at least p − 1 (here, equal
to 1). It may be of interest to modify the model slightly to demonstrate how the
first equation of the model would not be identified according to the order con-
dition. Referring to Figure 2.3, imagine a path from x3 to y1. Then the 0 in the
first row of A would be replaced by −γ13. Using the simple approach for deter-
mining the order condition, we find that there are no restrictions in the first
equation and therefore the first equation is not identified. Similarly, the first
equation fails the rank condition of identification.
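A quick numerical check of the rank condition is possible by plugging arbitrary nonzero values into the unrestricted elements of A and computing the rank of each equation's submatrix. The base R sketch below uses hypothetical numeric values standing in for the coefficients of Equation [2.8]; the pattern of zeros is what matters.

```r
# Structural coefficient matrix A = [(I - B) | -Gamma] for Equation [2.8],
# with arbitrary nonzero values in place of the free coefficients.
A <- rbind(c( 1.0, -0.5, -0.3, -0.2,  0.0),   # equation for y1
           c(-0.4,  1.0,  0.0,  0.0, -0.6))   # equation for y2
p <- nrow(A)

rank_condition <- function(A, eq) {
  restricted <- which(A[eq, ] == 0)            # columns restricted to zero in this equation
  sub <- A[, restricted, drop = FALSE]         # submatrix of those columns
  qr(sub)$rank == p - 1                        # rank condition: rank must equal p - 1
}

sapply(1:p, rank_condition, A = A)             # TRUE TRUE: both equations identified
```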
Assuming that the parameters of the model are identified, we now move on to
describe procedures for the estimation of the parameters of the model. The
parameters of the model are (a) the variances and covariances of exogenous
variables contained in Φ, (b) the variances and covariances of disturbance
terms contained in Ψ, and (c) the regression coefficients contained in B
and Γ. Once again, it is convenient to consider collecting these parameters
together in a parameter vector denoted as Ω. The goal of estimation is to
obtain estimates of the parameter vector Ω, which we will write as Ω̂, that minimize a discrepancy function F(S, Σ̂), where Σ̂ = Σ(Ω̂) is the covariance matrix based on the estimates of the model—the so-called fitted covariance matrix.
The function F(S, Σ̂) is a scalar that measures the discrepancy (distance) between the sample covariance matrix S (the data) and the fitted covariance matrix Σ̂ based on model estimates. A correct discrepancy function is characterized by the following properties (see Browne, 1984):

(i) F(S, Σ̂) ≥ 0,
(ii) F(S, Σ̂) = 0 if and only if Σ̂ = S,
(iii) F(S, Σ̂) is a continuous function in S and Σ̂.
The first property (i) requires that the discrepancy function must be a
positive real number. The second property (ii) indicates that the discrepancy func-
tion is zero only if the model estimates reproduce the sample covariance matrix
perfectly. The third property (iii) simply states that the function is continuous.
For the purposes of this chapter, we consider maximum likelihood (ML)
and generalized least squares (GLS).5 We consider the distributional assump-
tions underlying these methods, but we postpone the discussion of assumption
violations until Chapter 5 where we consider alternative methods of estimation
under more relaxed distributional assumptions.
If Equation [2.9] represents the multivariate normal density function for a sin-
gle sample member, then the product given in Equation [2.10] can be written as
" #
−N ðp + qÞ=2 −N =2
X
N
LðΩÞ = ð2pÞ jΣðΩÞj exp 1
2 z0 i Σ−1 ðΩÞzi , [2.11]
i=1
−N ðp + qÞ N
logLðΩÞ = logð2pÞ − logjΣðΩÞj
2 2
N [2.12]
− tr½TΣ−1 ðΩÞ:
2
The last term on the right-hand side of Equation [2.12] arises from the
fact that the term in the brackets of Equation [2.11] is a scalar, and the trace of
a scalar is a scalar. Thus, referring to the last term on the right-hand side of
Equation [2.11] we have
−½ ∑_{i=1}^{N} zi′Σ⁻¹(Ω)zi = −½ ∑_{i=1}^{N} tr[zi′Σ⁻¹(Ω)zi].   [2.13]

Multiplying and dividing by N and using the trace rule that tr(ABC) = tr(CAB) yields

−½ ∑_{i=1}^{N} tr[zi′Σ⁻¹(Ω)zi] = −(N/2) tr[ N⁻¹ ∑_{i=1}^{N} zizi′ Σ⁻¹(Ω) ]
                               = −(N/2) tr[TΣ⁻¹(Ω)],   [2.14]

where T = N⁻¹ ∑_{i=1}^{N} zizi′. Dropping the term that does not involve the model parameters and substituting the sample covariance matrix S for T, the log-likelihood can be written as

logL(Ω) = −(N/2){ log|Σ(Ω)| + tr[SΣ⁻¹(Ω)] }.   [2.15]
A problem with Equation [2.15] is that it does not possess the properties
of a correct discrepancy function as described above. To see this, note that if
S = Σ, then the argument of the trace in the second term on the right-hand side of Equation [2.15] becomes an identity matrix of order p + q, so that the trace equals p + q. However, the difference between the first term and the second term will not equal zero as
required if Equation [2.15] is to be a proper discrepancy function, as in prop-
erty (ii) discussed above. To render Equation [2.15] a proper discrepancy func-
tion, we need to add terms that do not depend on model parameters and
therefore are not involved in the differentiation. To begin, we can remove the term −N/2, in which case we are minimizing the function rather than maximizing it. Then, we can add terms that do not depend on the model parameters and thus are of no consequence to the differentiation. This gives

FML = log|Σ(Ω)| + tr[SΣ⁻¹(Ω)] − log|S| − t,   [2.16]

where t is the total number of variables in z, that is, t = p + q. It can be seen that if the model fits perfectly, the first and third terms sum to zero and the second and fourth terms sum to zero, and therefore Equation [2.16] is now a proper fitting function as defined by properties (i) to (iii) above.
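The fitting function in Equation [2.16] is easy to evaluate directly. The base R sketch below computes FML for a sample covariance matrix S and a model-implied matrix; the matrices shown are small hypothetical ones, used only to illustrate that FML = 0 when the fitted matrix equals S.

```r
# Maximum likelihood fitting function of Equation [2.16]
F_ML <- function(S, Sigma) {
  nvar <- nrow(S)   # number of observed variables (p + q)
  log(det(Sigma)) + sum(diag(S %*% solve(Sigma))) - log(det(S)) - nvar
}

# Hypothetical 2 x 2 example
S     <- matrix(c(1.0, 0.4, 0.4, 1.0), 2, 2)
Sigma <- matrix(c(1.0, 0.3, 0.3, 1.0), 2, 2)

F_ML(S, S)       # 0: a perfectly fitting model
F_ML(S, Sigma)   # > 0: discrepancy between S and the fitted matrix
```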
In addition to obtaining the estimates of the model parameters, we can also obtain the covariance matrix of the estimates. Let Ω̂ be the r × 1 vector of estimated model parameters. Then, the asymptotic covariance matrix of Ω̂ can be written as

ACOV(Ω̂) = { −E[ ∂²logL(Ω)/∂Ω∂Ω′ ] }⁻¹,   [2.17]
a. SCIGRA6, self-reported science grades from grade 6 to present; CERTSCI, Is teacher certified to
teach science in state? (1 = yes); SES, socioeconomic status composite; UNDERSTD, How often is
student asked to show understanding of science concepts?; CHALLG, How often does student feel
challenged in science class?; SCIGRA10, self-reported science grades from grade 10; SCIACH, item
response theory estimated number right on science achievement test.
b. Mardia’s coefficient of multivariate kurtosis.
The software program Mplus (L. Muthén & Muthén, 2006) was used for
this analysis. ML estimates of the model parameters and tests of significance are
given in the upper panel of Table 2.2. The unstandardized estimates are the direct
effects and the covariances of the exogenous variables in the model. It can be
seen that with few exceptions, each direct effect is statistically significant.
Table 2.2   Maximum Likelihood Estimates of Direct Effects for the Initial Science Achievement Model

                        Estimate     S.E.    Est./S.E.      Std     StdYX
IRTSCI ON
  SCIGRA10                 1.228    0.034      35.678     1.228     0.384
SCIGRA10 ON
  CHALLG                  −0.033    0.017      −1.961    −0.033    −0.022
  SCIGRA6                  0.781    0.020      38.625     0.781     0.413
  SES                      0.239    0.026       9.103     0.239     0.097
  CERTSCI                 −0.040    0.039      −1.039    −0.040    −0.011
  UNDERSTD                 0.168    0.015      11.315     0.168     0.125
CHALLG ON
  UNDERSTD                 0.318    0.010      33.225     0.318     0.361
UNDERSTD ON
  CERTSCI                 −0.030    0.033      −0.929    −0.030    −0.011
Residual variances
  UNDERSTD                 1.858    0.031      60.667     1.858     1.000
  CHALLG                   1.250    0.021      60.667     1.250     0.870
  SCIGRA10                 2.637    0.043      60.667     2.637     0.786
  IRTSCI                  29.291    0.483      60.667    29.291     0.853

Observed variable            R²
  UNDERSTD                0.000
  CHALLG                  0.130
  SCIGRA10                0.214
  IRTSCI                  0.147
A more general family of estimators can be obtained by minimizing the weighted least squares (WLS) fitting function

FWLS = [S − Σ(Ω)]′ W⁻¹ [S − Σ(Ω)],   [2.18]

where W⁻¹ is a weight matrix that weights the deviations S − Σ(Ω) in terms of their variances and covariances with other elements. Notice that this is a proper discrepancy function insofar as, if the model fits the data perfectly, the first and last terms on the right-hand side of Equation [2.18] will yield a null matrix.
A critical consideration of WLS estimators is the choice of a weight matrix W⁻¹. One choice could be W⁻¹ = I, the identity matrix. With the identity matrix as the choice for the weight matrix, WLS reduces to unweighted least squares
(ULS). Unweighted least squares is identical to ordinary least squares in the
standard regression setting in that it assumes homoscedastic disturbances.
Moreover, although ULS is known to yield unbiased estimates of model
Choosing the weight matrix on the basis of the sample covariance matrix S yields the generalized least squares (GLS) fitting function

FGLS = ½ tr[S⁻¹(S − Σ)]²
     = ½ tr(I − S⁻¹Σ)².   [2.19]
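A base R sketch of Equation [2.19], again using small hypothetical matrices, might look as follows.

```r
# Generalized least squares fitting function of Equation [2.19]
F_GLS <- function(S, Sigma) {
  D <- diag(nrow(S)) - solve(S) %*% Sigma   # I - S^{-1} Sigma
  0.5 * sum(diag(D %*% D))                  # (1/2) tr[(I - S^{-1} Sigma)^2]
}

S     <- matrix(c(1.0, 0.4, 0.4, 1.0), 2, 2)
Sigma <- matrix(c(1.0, 0.3, 0.3, 1.0), 2, 2)
F_GLS(S, Sigma)   # small positive discrepancy; zero when Sigma equals S
```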
A feature of ML and GLS estimation of the path model is that one can explicitly
test the hypothesis that the model fits the data. Consider again Equation [2.15].
This is the log-likelihood under the null hypothesis that the specified model
holds. Under the alternative hypothesis, Σ is an arbitrary symmetric positive definite matrix, so that Σ̂ = S, and the log-likelihood is

logLa = −(n/2)[ log|S| + tr(SS⁻¹) ]
      = −(n/2)[ log|S| + tr(I) ]
      = −(n/2)( log|S| + t ).   [2.20]
The statistic for testing the null hypothesis that the model fits in the pop-
ulation is referred to as the likelihood ratio (LR) test and is expressed as
−2log(L0/La) = −2logL0 + 2logLa
             = n[ log|Σ| + tr(Σ⁻¹S) ] − n( log|S| + t )
             = n[ log|Σ| + tr(Σ⁻¹S) − log|S| − t ].   [2.21]
Notice from the last equality in Equation [2.21] that the log-likelihood
ratio is simply n × FML.
The large sample distribution of the LR test is chi-square with degrees of freedom (df) given by the difference between the number of nonredundant elements in Σ and the number of free parameters in the model. The LR chi-square test is
used to test the null hypothesis that the population covariance matrix pos-
sesses the structure implied by the model against the alternative hypothesis
that Σ is an arbitrary symmetric positive definite matrix.
In the context of our science achievement example, the LR chi-square statistic indicates that the model does not fit the data (χ² = 1321.13, df = 10, p < .001). Numerous explanations for the lack of fit are possible including
nonnormality, missing data, sample size sensitivity, and incorrect model spec-
ification. These are taken up in detail in Chapter 5.
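For reference, the base R sketch below shows how the LR statistic and its p value are obtained from the ML fitting function value; the degrees of freedom reproduce those of the science achievement model, while the values of FML and n are purely hypothetical.

```r
# Likelihood ratio test of overall model fit: chi-square = n * F_ML
lr_test <- function(F_ML, n, s, t) {
  chisq <- n * F_ML
  df    <- s * (s + 1) / 2 - t            # nonredundant elements minus free parameters
  p     <- pchisq(chisq, df, lower.tail = FALSE)
  c(chisq = chisq, df = df, p = p)
}

# Degrees of freedom for the science achievement model: 28 - 18 = 10
lr_test(F_ML = 0.05, n = 1000, s = 7, t = 18)   # hypothetical F_ML and n, for illustration
```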
In addition to a global test of whether the model fits perfectly in the pop-
ulation, one can also test hypotheses regarding the individual fixed and freed
parameters in the model. We can consider three alternative ways to evaluate the
fixed and freed elements of the model vis-à-vis overall fit. The first method
rests on the difference between the LR chi-square statistics comparing a given
model against a less restrictive model. A less restrictive model can be formed
by freeing one of the currently restricted paths.
Recall from Section 2.3.1 that the LR test of the null hypothesis is given as
n ∗ FML. This initial null hypothesis, say H01, is tested against the alternative
Define

s(Ω) = ∂logL(Ω)/∂Ω   [2.23]
as the score vector representing the change in the log-likelihood for a change in
Ω. For the estimated parameters in the model, the elements of s(Ω) will be
zero, because at the maximum of the likelihood, the vector of partial derivatives
is zero. However, for the restricted elements in Ω, say Ωr, the partial derivatives
will only be zero if the restrictions hold exactly. If the restrictions don’t hold
exactly, which would almost always be the case in practice, then the maximum
of the likelihood would not be reached and the derivatives would not be zero.
Thus, a test can be formed, referred to as the Lagrange multiplier (LM) test,
which assesses the validity of the restrictions in the model (Silvey, 1959). The
LM test can be written as
LM = s(Ω̂r)′ [I(Ω̂r)]⁻¹ s(Ω̂r),   [2.24]
where I(Ωr) was earlier defined as the information matrix. The LM test is asymp-
totically distributed as chi-square with degrees of freedom equaling the difference
between the degrees-of-freedom of the more restrictive model and the less restric-
tive model. Again, if one restriction is being evaluated, then the LM test is evalu-
ated with one degree of freedom. The LM test in Equation [2.24] is also referred
to as the modification index (Sörbom, 1989). This test is most commonly used for
model modification and we will defer that discussion until Chapter 6.
Finally, we can consider evaluating the impact of placing restrictions on
the unrestricted model. Let r(Ω) represent a set of restrictions placed on a
model. In our science achievement example, r(Ω) represents the paths fixed to
zero. The estimates r(Ωr) are zero by virtue of the specification. The question
is whether the restrictive model holds for the set of unrestricted estimates, say
r(Ωu). In other words, if a small (and perhaps nonsignificant) path coefficient
was restricted to be zero (removed from the model), would that restriction
hold in the population? If the restrictive model holds, then r(Ωu) should not
differ significantly from zero. However, if the restrictive model does not hold,
one would expect the elements of r(Ωu) to differ significantly from zero. The
test for the validity of restricting parameters is given by the Wald test (W) writ-
ten as
Wj = ω̂j² / Var(ω̂j),   [2.26]
where Var(ωj) is the jth diagonal element of the asymptotic covariance matrix
of the estimates. In large samples, Wj is distributed as chi-square with one
degree of freedom. Note that the square root of Equation [2.26] gives
z = ω̂j / se(ω̂j),   [2.27]
which has an asymptotic normal distribution with mean 0 and variance 1. This
statistic can also be used to test the null hypothesis that ωj = 0. The LR difference
test, the LM test, and the Wald test are known to be asymptotically equivalent
(see Buse, 1982; Engle, 1984).
CERTSCI, SES on CERTSCI, and the correlation of CERTSCI with SCIGRA6, all
paths are statistically significant. As noted above, we can evaluate the effect of
restricting one of these paths on the overall LR chi-square test by simply squar-
ing the specific z-value of interest. So, for example, restricting the regression of
SCIGRA10 on UNDERSTD (z = 11.315) to zero, we would expect the LR chi-
square test to increase by z2 = 128.029. This would indicate a significant decre-
ment in the overall fit of the model. Similarly, if we wished to restrict the path from CERTSCI to UNDERSTD (z = −0.929), the resulting change in the LR chi-square test would be z² = 0.863, which is not a significant decrement to model fit.
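These calculations can be done by hand, but the base R sketch below makes the relationship between the z statistic, the Wald test, and the expected change in the LR chi-square explicit, using estimates and standard errors from Table 2.2 (the tabled values are rounded, so the z values differ slightly from those printed by the software).

```r
# Wald test (Equation [2.26]) and its square-root z form (Equation [2.27])
wald <- function(est, se) {
  z <- est / se
  c(z = z, W = z^2, p = 2 * pnorm(-abs(z)))
}

wald(est = 0.168,  se = 0.015)   # SCIGRA10 on UNDERSTD: z ~ 11.2, W ~ 125
wald(est = -0.030, se = 0.033)   # UNDERSTD on CERTSCI:  z ~ -0.9, W ~ 0.8 (nonsignificant)

# Squaring the reported z values gives the expected increase in the LR
# chi-square if the corresponding path were fixed to zero
c(11.315^2, (-0.929)^2)          # 128.029 and 0.863
```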
In addition to the direct effects of the model, path analysis allows for the
further decomposition of the total and indirect effects. Indeed, the decomposi-
tion of effects represents a classic approach to the development of path analy-
sis (see, e.g., Duncan, 1975).
It is perhaps pedagogically easier to first consider the decomposition of
the total effect. The total effect is the sum of the direct effect and all indirect
effects of an exogenous variable on an endogenous variable of interest. From
the standpoint of the equations of the model, it is useful to consider the
reduced form specification shown earlier in Equation [2.3]. In the context of
the reduced form of the model, the coefficient matrix Π1 ≡ (I − B)⁻¹Γ is
the matrix of total effects.
In many respects, an analysis of the total effects and their substantive
and/or statistical significance provides the information necessary to further
use the model for prediction purposes. That is, often an investigator can isolate
a particular endogenous outcome as the ultimate outcome of interest. The
exogenous variables, on the other hand, may have clinical or policy relevance
to the investigator.8 The mediating variables, then, represent the theorized
processes of how changes in exogenous variables lead to changes in endoge-
nous variables. However, the process may be less important in some contexts
than the simple matter of the overall effect. An analysis of the total effects can
provide this information.
If, in a given context, it is important to understand mediating processes,
then one needs to consider the indirect effects. An indirect effect is one in
which an exogenous variable influences an endogenous variable through the
mediation of at least one other variable. For example, it may be of interest to determine whether students whose teachers are certified in science earn higher science grades and higher science achievement scores by virtue of higher instructional quality.
To obtain an expression for the indirect effects recall that the total effect is
the sum of the direct and all indirect effects. This then leads to an expression
for the indirect effects of exogenous variables on endogenous variables.
Specifically, if Γ is the matrix of direct effects of exogenous variables on
endogenous variables, and (I − B)⁻¹Γ is the matrix of total effects, then it follows that the matrix containing total indirect effects is (I − B)⁻¹Γ − Γ.
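The decomposition is a one-line matrix computation. The base R sketch below uses small hypothetical coefficient matrices (two endogenous and two exogenous variables) purely to illustrate the identities total = (I − B)⁻¹Γ and indirect = total − direct.

```r
# Hypothetical direct-effect matrices for a small recursive model:
# y1 <- x1, x2;  y2 <- y1, x2
B     <- matrix(c(0.0, 0.0,
                  0.5, 0.0), nrow = 2, byrow = TRUE)   # effects among endogenous variables
Gamma <- matrix(c(0.3, 0.2,
                  0.0, 0.4), nrow = 2, byrow = TRUE)   # effects of exogenous on endogenous

total    <- solve(diag(2) - B) %*% Gamma   # (I - B)^{-1} Gamma: total effects
indirect <- total - Gamma                  # total minus direct: total indirect effects

total
indirect   # e.g., x1 affects y2 only indirectly, through y1: 0.5 * 0.3 = 0.15
```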
Table 2.3 provides a selected set of effect decompositions for the science
achievement model. The total indirect effect of sixth grade reported science
grades on science achievement is statistically significant as is the total indirect
effect of SES on science achievement. The specific indirect effects of certifica-
tion to teach science on science achievement are all nonsignificant. The specific
indirect effects from UNDERSTD to science achievement are each statistically
significant.
Table 2.3 Selected Total Indirect and Specific Indirect Estimates for the Science
Achievement Model
science achievement model, such variables as SES and SCIACH have under-
standable and usable metrics. In cases where the metrics are not readily inter-
pretable, or perhaps arbitrary, it is useful to adopt a new metric for the variables
so as to yield substantively interesting interpretations. One such approach to the
problem is to standardize the structural parameters of the model. In the context
of path analysis, consider an unstandardized path coefficient for an element of
Γ, say γpq. Then, the standardized element is obtained as
γ̂*pq = γ̂pq (σ̂xq / σ̂yp),   [2.28]

where σ̂xq and σ̂yp are the estimated standard deviations of xq and yp, respectively. In a similar manner, a standardized element of B, say βpp′, is obtained as

β̂*pp′ = β̂pp′ (σ̂yp′ / σ̂yp).   [2.29]
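The base R sketch below verifies the standardization in Equations [2.28] and [2.29] using the estimates in Table 2.2: the unstandardized effect of SCIGRA10 on IRTSCI (1.228) is rescaled by the ratio of the two standard deviations, which can be recovered from the residual variances and their standardized counterparts reported in the table.

```r
# Standard deviations implied by Table 2.2:
# total variance = residual variance / standardized residual variance
sd_scigra10 <- sqrt(2.637 / 0.786)    # predictor
sd_irtsci   <- sqrt(29.291 / 0.853)   # outcome

est_unstd <- 1.228                    # IRTSCI on SCIGRA10, unstandardized
est_std   <- est_unstd * sd_scigra10 / sd_irtsci
est_std                               # ~0.384, matching the standardized value in the table
```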
2.6 Conclusion
This chapter introduced the basics of path analysis. We covered issues of iden-
tification, estimation, and testing as well as provided a substantive example.
Although additional topics addressing assumptions and model testing are
taken up in later chapters, the steps used in this example are characteristic of
the conventional approach to structural equation modeling. Namely, a model
was postulated that represents causal relationships implied by the input-
process-output theory. Next, data were collected and variables were chosen
that represent the theoretical constructs of interest. Finally, model parameters
were estimated and tested as was the overall fit of the model. Throughout this
chapter, an underlying assumption was that the variables were measured with-
out error. This, of course, is a heroic assumption in most situations. Moreover,
consequences of violating this assumption are fairly well known—namely
measurement error in our exogenous and endogenous variables can attenuate
regression coefficients and induce biased standard errors, respectively
(Duncan, 1975; Jöreskog & Sörbom, 2000). Ideally, the goal would be to con-
duct path analysis on error-free measures. One approach to obtaining error-
free measures of theoretical variables is to develop multiple measures of the
underlying construct and eventually incorporate the construct directly into the path analysis.
In the next chapter, we address the issue of validating measures of our the-
oretical variables via the method of factor analysis. This discussion then leads
into Chapter 4, which combines factor analysis and path analysis into a com-
prehensive structural equation methodology.
Notes
1. Path diagrams do not represent the theory or even the theoretical model,
assuming there is one. Rather, the path diagram is a pictorial representation of a statis-
tical model of the data.
2. In this chapter, I will use standard econometric terminology for describing path
models. Thus, the terms endogenous variables and exogenous variables are terms derived
from econometrics. Other related terms are dependent variables and independent vari-
ables or criterion variables and predictor variables, respectively. Moreover, I will use nota-
tion similar to that used in LISREL (Jöreskog & Sörbom, 2000).
3. This and other path diagrams were drawn using AMOS 4.0.
4. Perhaps more accurately, the parameter vector Φ could be omitted from this
list. The parameter vector Φ contains the variances and covariances of the exogenous
variables and is not structured in terms of model parameters. In fact, estimates in Φ will
be identical to the corresponding elements in the sample covariance matrix S.
5. The focus of attention on these estimators does not result in a loss of general-
ity. Indeed, the maximum likelihood estimator that is discussed is referred to as FIML
(full information maximum likelihood) in the econometrics literature. Moreover, for
recursive models, two-stage least squares and unweighted least squares are identical.
6. This is not to suggest that goodness-of-fit is unimportant. Indeed, serious lack
of fit may be due to specification errors that would, in turn, lead to biased structural
coefficients. However, the evidence suggests that goodness-of-fit dominates the model-
ing effort with little regard to whether the model estimates are sensible or informative.
7. In this section, we will focus on terminology typically encountered in social
and behavioral science applications of structural equation modeling. The literature on
causal inference in structural equation models makes clear distinctions between statis-
tical parameters and causal parameters. We defer that discussion to Chapter 11.
8. Implicit here is the idea that the exogenous variables of interest are truly exoge-
nous. The issue of exogeneity is taken up in Chapter 5.
3
Factor Analysis
The example used throughout this chapter explores the factor structure of
student perceptions of school climate. This problem has important implica-
tions for the input-process-output model not only because student percep-
tions are important education indicators in their own right but they may also
be predictive of achievement.
The data for this example come from the responses of a sample of public
school 10th grade students to survey items in the National Educational
Longitudinal Study (NCES, 1988).1 Table 3.1 defines the items in the question-
naire. After mean imputation of missing data, the 10th grade sample consisted of 12,669 students.2
Although it may be the case that a researcher has a particular model relat-
ing student perceptions of school climate to achievement in mind, of priority
is the measurement of the constructs that are incorporated into the model.
The researcher may postulate that there are several important dimensions to
student perceptions. The question is whether a set of measurements that asks
students to rate their agreement to statements about the climate of the school
correlate in such a way as to suggest the existence of the factors in question.
The model used to relate observed measures to factors is the linear factor
analysis model and can be written as
x = Λx ξ + δ, [3.1]
E(ξ) = 0,   E(δ) = 0,   and   Cov(ξ, δ) = 0.
Under these assumptions, the covariance matrix of the observed data can be written in the form of the fundamental factor analytic equation,

Σ = ΛxΦΛx′ + Θδ,   [3.2]

where Φ is the covariance matrix of the factors and Θδ is the covariance matrix of the unique variables. Following classical test theory, an observed score can be expressed as the sum of a true score and measurement error, so that the true score can be written as

t = x − e.   [3.3]
However, the factor model for the true scores will not contain measurement
error but will contain specific error due to the particular selection of variables
in the model. The factor analytic model for the true scores can be written as
t = Λxξ + s.   [3.4]
To see the rotation problem, let T be an orthogonal transformation matrix such that TT′ = I, and define

Λ* = ΛT,
Φ* = T⁻¹Φ(T′)⁻¹,
Θ* = Θ.

Then

Σ* = Λ*Φ*Λ*′ + Θ*
   = ΛTT⁻¹Φ(T′)⁻¹T′Λ′ + Θ
   = ΛΦΛ′ + Θ
   = Σ.
This shows that any orthogonal transformation of the system will give rise to the same covariance matrix. When this is the case, we say that the model is not identified and that there are k × k = k² indeterminacies that must be removed in order for there to be a unique solution to the factor model. The k² elements correspond to the dimension of the transformation matrix T.

To see how the identification problem is handled, first consider the case of orthogonal factors—that is, factors that are not correlated. For the orthogonal factor case, Φ = I. When there is only one factor, that is, k = 1, setting Φ = I (or φ = 1) removes the k² indeterminacies completely. No orthogonal transformation of the system is possible, and the parameters are uniquely identified. This is the reason that we cannot rotate one factor. When k = 2, then k² = 4, and setting Φ = I removes k(k + 1)/2 = 3 indeterminacies, leaving one remaining indeterminacy. Finally, we can consider the general case of k ≥ 2. Again, with Φ = I, we have removed k(k + 1)/2 indeterminacies, leaving k² − k(k + 1)/2 = k(k − 1)/2 indeterminacies to be removed.
We can see from the above discussion that simply setting Φ = I does not remove all the indeterminacies in the model except in the case when there is only one factor (k = 1). The remaining k(k − 1)/2 restrictions must be placed on the elements of Λ. The manner in which these remaining restrictions are imposed
depends on the method of estimation that is used. For the most part, the dif-
ferences among estimation methods in the manner in which these restrictions
are imposed are arbitrary. However, given that the restrictions are imposed in
an arbitrary fashion to simply fix the reference factors, this arbitrariness can
be exploited for purposes of factor rotation. This topic is further elaborated
on below.
What has been covered so far concerns issues of identification of the para-
meters irrespective of the method of factor extraction. We can now turn to the
problem of factor extraction directly. We consider principal components analy-
sis and the common factor model, both of which use the method of principal
axis factoring for factor extraction. We then briefly consider two statistical
methods of factor analysis that rest on the common factor model, namely
generalized least squares and maximum likelihood estimation. These estima-
tion methods were discussed in greater detail in Chapter 2. Each method is
applied to the problem of estimating the factors of student perception of
school climate.
(Σ − λI)u = 0,   [3.5]

which has a nontrivial solution for u only if

|Σ − λI| = 0.   [3.6]
z = u′x.   [3.8]

The variance of z is then

V(z) = V(u′x)
     = u′V(x)u   [3.9]
     = u′Σu.
But from Equation [3.7], Σ = uDu′. Making use of the fact that u is an orthogonal matrix, Equation [3.9] can be written as

V(z) = u′uDu′u
     = IDI   [3.10]
     = D.
From Equation [3.10], we can see that the principal components are
orthogonal to each other and that the variances of the principal components
are the eigenvalues of Σ.
In the process of extracting the principal components of Σ, the principal
components are ordered in terms of decreasing size of their eigenvalues. With
all components retained, the total variance of the principal components is equal
to the total variance of the original variables. That is,
σz1 + σz2 + ⋯ + σzq = λ1 + λ2 + ⋯ + λq = tr(Σ) = σy1 + σy2 + ⋯ + σyq,
where σzi is the variance of the ith principal component, σyi is the variance of
the ith original variable, and “tr” is the trace operator.
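In R, the eigen-decomposition underlying principal components analysis is available through eigen() (or, equivalently, through prcomp()/princomp()). The sketch below illustrates Equations [3.5] through [3.10] on a small simulated data set; the simulated variables are of course only a stand-in for the NELS school climate items.

```r
set.seed(1)
# Simulated stand-in for a set of observed measures (n = 500, q = 6 variables)
X <- matrix(rnorm(500 * 6), nrow = 500, ncol = 6)
X[, 2] <- X[, 1] + rnorm(500, sd = 0.5)            # induce some correlation
S <- cov(X)

eig <- eigen(S, symmetric = TRUE)                  # Sigma = u D u'
u <- eig$vectors
D <- eig$values

z <- scale(X, center = TRUE, scale = FALSE) %*% u  # principal component scores
round(diag(cov(z)), 3)                             # component variances = eigenvalues
round(D, 3)

sum(D); sum(diag(S))                               # total variance preserved: tr(Sigma)
cumsum(D) / sum(D)                                 # proportion of variance for m components
```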
In the typical application of PCA as a factor analysis method, usually m < q principal components are retained because they account for a substantively important amount of the total variance. This is discerned using the fact that

(λ1 + λ2 + ⋯ + λm)/tr(Σ)

is the proportion of the total variance accounted for by the first m components. It is also common to rescale the principal components to have unit variance, that is,

z* = D^{−1/2}z,

so that V(z*) = I.
[Figure: scree plot of eigenvalues plotted against component number.]
when the diagonal elements are altered, it is not necessarily the case that the
Gramian property will hold.
The question for Thurstone was how to estimate communalities that
resulted in an altered correlation matrix that remained Gramian. If the com-
munalities were underestimated, then it would be possible that the resulting
rank would be too high. Conversely, if the communalities were overestimated
then the factors would account for more than simply the off-diagonal correla-
tions and the rank would not be reduced (Mulaik, 1972, p. 136). Therefore, the
goal was to choose estimates of communalities that resulted in minimal rank
under the required Gramian conditions.
With regard to the choice of communality estimates, the major work on
this problem can be traced to Guttman (1954, 1956). Guttman provided three
different estimates of communality that would result in lower bounds for min-
imum rank. In the interest of space, and given the fact that more recent statis-
tical approaches have rendered the problem of communality estimation somewhat
moot, we consider the most typical form of communality estimation—namely the
use of the squared multiple correlation (SMC).
The SMC of the qth variable with the other q − 1 variables was shown by
Guttman to be the best estimate of the lower bound for communality. The idea
was to compute the SMCs and insert them into the diagonal of the sample
correlation matrix R and then subject the correlation matrix to principal
axis factoring. Before factoring, however, it is necessary to slightly adjust off-
diagonal elements to preserve the required Gramian property.
An early method, suggested by Wrigley (1956), involved updating the
analysis after each factor extraction. That is, for a given a priori number of factors, and with SMCs in the diagonal, the correlation matrix is
subjected to principal axis factoring. Next, the sums of squared estimated
factor loadings from the first factoring, that is, the diagonal elements of $\hat{\Lambda}_1\hat{\Lambda}_1'$, are used as “updated” com-
munality estimates. These updated estimates are now inserted into the diago-
nal of the correlation matrix. This process continues until the difference
between the current communality estimates and the previous communality
estimates is less than an arbitrary constant. Once convergence is obtained, the
final iterated solution is presented for interpretation. What we have described
here is the method of iterated principal axis factoring and is found in most com-
monly used statistical software packages.
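The iterated procedure just described can be sketched in a few lines of R. This is only a minimal illustration of the algorithm, not a reproduction of any particular package's implementation; it assumes a sample correlation matrix R and an a priori number of factors k, starts from squared multiple correlations, and iterates until the communality estimates stabilize.

  # Iterated principal axis factoring: a minimal sketch
  paf <- function(R, k, tol = 1e-6, max.iter = 100) {
    h2 <- 1 - 1 / diag(solve(R))            # SMCs as initial communality estimates
    for (i in 1:max.iter) {
      Rr <- R
      diag(Rr) <- h2                        # reduced correlation matrix
      e <- eigen(Rr, symmetric = TRUE)
      L <- e$vectors[, 1:k, drop = FALSE] %*%
           diag(sqrt(pmax(e$values[1:k], 0)), k)   # loadings on the first k factors
      h2.new <- rowSums(L^2)                # updated communality estimates
      if (max(abs(h2.new - h2)) < tol) break
      h2 <- h2.new
    }
    list(loadings = L, communalities = h2.new, iterations = i)
  }
  # Example call (dat is the hypothetical data frame used earlier):
  # paf(cor(dat), k = 2)$loadings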
$T = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}.$ [3.11]
Note that this matrix has one parameter—namely, the angle of rotation θ.
With the angle of rotation chosen, the last remaining indeterminacy is
removed. In the general case, setting Φ = I removes k(k + 1)/2 indeterminacies,
leaving k(k − 1)/2. These remaining indeterminacies can be resolved by
choosing k(k − 1)/2 angles of rotation.
The decision to rotate the solution usually rests on a desire to achieve a
simple structure representation of the factors (Thurstone, 1935, 1947). Simple
structure criteria are designed to reduce the complexity of the variables—that
is, the number of factors that the variable is loaded on. Ideally, under simple
structure, each variable should have a factor complexity of one—meaning that
a variable should load on one, and only one, factor. Methods of factor rotation
aid in achieving this goal. As we will see, only with the advent of the restricted
factor analysis model do we have a rigorous approach to testing simple struc-
ture hypotheses.
$\Lambda^* = \Lambda T,$
and
$d_j = \sum_{i=1}^{q} \lambda^{*2}_{ij}, \qquad j = 1, \ldots, k,$
where $d_j$ is the sum of the squared loadings for the jth column of Λ*. Then,
varimax maximizes
" q #
X
k X 2
V= ðλ2
ji − dj =qÞ : [3.12]
i=1 j=1
Essentially, Equation [3.12] shows that the varimax criterion maximizes the
sum of squared deviations of the squared loadings from the corresponding col-
umn mean. As shown in Lawley and Maxwell (1971, p. 73), this amounts to
maximization with respect to the elements of the transformation matrix T.
Once maximized, T contains elements whose angles of rotation satisfy the vari-
max criterion.
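In R, an implementation of the varimax criterion (with Kaiser normalization by default) is available in the stats function varimax(), which operates directly on a loadings matrix; promax() provides the corresponding oblique rotation. A brief sketch, assuming L is a q × k matrix of unrotated loadings such as the one returned by the principal axis sketch above:

  vrot <- varimax(L)
  vrot$loadings                 # rotated loadings, Lambda* = Lambda T
  vrot$rotmat                   # the transformation matrix T

  prot <- promax(L)
  prot$loadings                 # obliquely rotated (pattern) loadings
  solve(t(prot$rotmat) %*% prot$rotmat)   # implied factor correlation matrix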
Table 3.3 Promax Rotated Principal Axis Factor Loadings and Factor Correlations
1 2
1 1.000
2 0.713 1.000
tests for maximum likelihood leads to the conclusion that the three-factor
model does not fit the data. This could be affected by sample size and nonnor-
mality. However, the root mean square error of approximation, along with its
90% confidence interval, suggests an approximately good fit of the model. Issues
of goodness-of-fit will be taken up in Chapter 4.
Up to now, the focus of attention has been on methods of extraction that do not require
assumptions regarding the distribution of the data. These were essentially
Table 3.4 Promax Rotated Maximum Likelihood Factor Loadings and Factor
Correlations
1 2
1 1.000
2 0.674 1.000
Once the parameters are estimated, the maximum likelihood solution can
be further rotated to attain greater interpretability. The problem of factor rota-
tion is described in Section 3.4.6.
In addition to maximum likelihood, the method of generalized least
squares can also be used to estimate the parameters of the factor model. The
generalized least squares estimator was originated by Aitken (1935) but was
applied to the factor analysis setting by Jöreskog and Goldberger (1972).
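For readers working in R, maximum likelihood factor analysis of this kind is available through the base function factanal(), which also reports the likelihood ratio test of the hypothesized number of factors. The sketch below assumes the hypothetical data frame dat used earlier rather than the NELS items analyzed in this chapter.

  fit <- factanal(dat, factors = 2, rotation = "promax")
  fit$loadings          # promax-rotated loadings
  fit$STATISTIC         # likelihood ratio chi-square for the two-factor hypothesis
  fit$dof               # degrees of freedom
  fit$PVAL              # p value

  # The model can also be fit from a covariance matrix:
  factanal(covmat = cov(dat), n.obs = nrow(dat), factors = 2)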
POSITIVE CLIMATE
GETALONG 1.000a 0.000 0.000 0.866 0.859
SPIRIT 0.999 0.010 104.674 0.865 0.762
STRICT 0.899 0.010 87.260 0.778 0.674
FAIR 0.989 0.010 100.728 0.856 0.744
RACEFRND 1.013 0.010 102.665 0.876 0.753
TCHGOOD 1.082 0.009 122.594 0.937 0.838
TCHINT 1.107 0.009 123.147 0.958 0.840
TCHPRAIS 1.026 0.009 112.010 0.888 0.795
LISTEN 1.064 0.009 117.955 0.921 0.820
NEGATIVE CLIMATE
DISRUPT 1.000a 0.000 0.000 0.903 0.776
PUTDOWN 0.832 0.010 85.759 0.752 0.741
STUDOWN 0.873 0.010 85.167 0.789 0.737
FEELSAFE 0.797 0.010 82.342 0.720 0.716
IMPEDE 0.958 0.011 85.337 0.865 0.738
MISBEHAV 0.993 0.011 89.840 0.897 0.772
NEGATIVE CLIMATE with POSITIVE CLIMATE
0.598 0.011 56.355 0.765 0.765
χ² (89 df) = 4827.399, p < .05
3.8 Conclusion
[Figure: Path diagram of the two-factor model of school climate, with POS CLIMATE measured by GETALNG through LISTEN and NEG CLIMATE measured by DISRUPT through MISBEHAV.]
Notes
1. The sample of students in the NELS survey does not represent a random sam-
ple of U.S. students. Rather, the NELS survey sampling scheme provides a proportion-
ally representative sample of schools. Within the schools, classroom teachers are sampled
for purposes of course coverage. This is followed by a sample of students within those
classrooms.
2. More accurately, imputation based on the mean of nearby points was used. The
argument is that because students are nested in schools, it is important to attempt to
maintain values that reflect the nesting of students within schools. For this analysis,
means based on five nearby points were chosen. Chapter 8 takes up the problem of fac-
tor analysis in multilevel settings.
3. This decision may rest on an inspection of a “scree” plot that plots the sizes of
the eigenvalues. Other criteria may include the number of eigenvalues exceeding 1.0 or
the percent of variance accounted for by the factor.
4. Extraction of the eigenvalues uses the R function prcomp.
4
Structural Equation Models
in Single and Multiple Groups
T his chapter focuses on linking path analysis with factor analysis into a
comprehensive methodology typically referred to as structural equation
modeling.1 The rationale for linking these two methodologies into a compre-
hensive framework is that by doing so, we mitigate the problems associated with
measurement error, thereby obtaining parameter estimates that are improved in
terms of both bias and sampling variability. The improvement resulting from com-
bining path analysis with factor analysis comes with a price, however—namely,
adding a measurement model to a path model will often dramatically increase
the total number of degrees of freedom available for testing model fit. This is
because, as we saw in Chapter 3, the restricted factor model will typically be
associated with a large number of restrictions reflecting a simple structure
hypothesis underlying the measurement instrument. These added restrictions
make it all the more likely that a reasonably well fitting structural part of the
model will be rejected due to problems within the measurement model.
Moreover, the potential for misspecification in the measurement part of the
model owing to these restrictions can, in some circumstances, propagate into
the structural part of the model (Kaplan, 1988; Kaplan & Wenger, 1993). We
take up these issues in more detail in Chapter 6 when we discuss modeling
strategies. Despite these difficulties, structural equation modeling represents an
extremely important advancement in statistical modeling when the goal is accu-
rate estimation and inference within complex systems.
The organization of this chapter is as follows. First, the basic model spec-
ification is presented. This is followed by a discussion of the problem of iden-
tification that pertains specifically to structural equation models.
Next, we discuss the method of multiple group structural equation mod-
eling as a means of addressing group differences while taking into account
$\eta = B\eta + \Gamma\xi + \zeta,$ [4.1]
$y = \Lambda_y\eta + \varepsilon$ [4.2]
and
$x = \Lambda_x\xi + \delta,$ [4.3]
The problem of identification of path models and factor models was discussed
in Chapters 2 and 3, respectively. Here we discuss identification as it pertains
to the full model in Equation [4.1]. The general problem of identification
remains the same—namely, whether unique estimates of the parameters of the
full model can be determined from the elements of the covariance matrix of
the observable variables. When combining the measurement and structural
models together into a single analytic framework, a set of new identification
conditions can be added to those that have already been presented.
To begin, we note that by adding the measurement model to the path
model, the identification conditions of the measurement model, specifically
that of restricted factor analysis, are required as part of overall identification.
In particular, it is essential to set the metric of the latent variables as discussed
in Section 3.5 of Chapter 3. In the typical case, the metric of the exogenous
latent variables is set either by fixing one loading in each column of Λx to 1.0
or by fixing the diagonal elements of Φ to 1.0. The metric of the endogenous
latent variables is typically set by fixing a loading in each column of Λy to 1.0.
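Although the analyses reported in this chapter use Mplus, the two ways of setting the metric are easy to illustrate with the R package lavaan (used here only as an illustration). By default, lavaan fixes the first loading of each factor to 1.0; the argument std.lv = TRUE instead fixes the factor variances to 1.0 and frees all loadings. The example uses lavaan's built-in HolzingerSwineford1939 data rather than the data analyzed in the text.

  library(lavaan)

  model <- '
    visual  =~ x1 + x2 + x3
    textual =~ x4 + x5 + x6
  '
  # Metric set by fixing one loading per factor to 1.0 (the lavaan default)
  fit1 <- cfa(model, data = HolzingerSwineford1939)

  # Metric set instead by fixing the factor variances to 1.0
  fit2 <- cfa(model, data = HolzingerSwineford1939, std.lv = TRUE)

  summary(fit1, standardized = TRUE)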
With the metric of the latent variables determined, we can now consider a
set of rules provided by Bollen (1989) that can be used to evaluate the identi-
fication status of structural equation models. The first rule is the counting rule
[Figure: Path diagram fragment showing SCIGRA6, SES, SCIGRA10, and SCIACH.]
The two rules just described are either necessary or sufficient, but not
both. Indeed, there appear to be no necessary and sufficient conditions for
identification of the full model. If sufficient conditions for identification are
not met then it may be necessary to directly attempt to solve the structural
form equations in terms of the reduced form coefficients. Generally, however,
except for extremely complex models, the counting rule will work most often
in practice.
$D_\eta = \{\mathrm{diag}[(I - \hat{B})^{-1}(\hat{\Gamma}\hat{\Phi}\hat{\Gamma}' + \hat{\Psi})(I - \hat{B})'^{-1}]\}^{1/2},$ [4.5]
$D_\xi = (\mathrm{diag}\,\hat{\Phi})^{1/2},$ [4.6]
$D_y = \{\mathrm{diag}[\hat{\Lambda}_y(I - \hat{B})^{-1}(\hat{\Gamma}\hat{\Phi}\hat{\Gamma}' + \hat{\Psi})(I - \hat{B})'^{-1}\hat{\Lambda}_y' + \hat{\Theta}_\varepsilon]\}^{1/2},$ [4.7]
and
$D_x = \{\mathrm{diag}[\hat{\Lambda}_x\hat{\Phi}\hat{\Lambda}_x' + \hat{\Theta}_\delta]\}^{1/2},$ [4.8]
of science models, writing reports, making calculations, and so on. A third set
of items assesses the extent to which students feel challenged in the science class-
room, work hard in science class, and are challenged to show understanding. A
fourth set of items examines the extent to which students engage in hands-on
science learning. Finally, a fifth set of items assesses the extent to which students
perceive themselves to be engaged in passive science learning, such as copying
down teachers' notes, listening to lectures, and so on.
An exploratory factor analysis with oblique rotation (not shown) revealed
a very clear five-factor structure along the lines of the sets of items just
described. Of the five factors extracted, it was decided to use two factors that
could be argued to be most relevant for science achievement. These were items
that measured perceptions of hands-on involvement in science learning
(INVOLVE) and those that measured the extent to which the students felt chal-
lenged in the classroom (CHALLG).
In Chapter 2, the variables UNDERST and CHALLG were considered
separate though highly correlated variables. On the basis of a more detailed
exploratory factor analysis, we find that these two variables actually serve as
indicators of a single CHALLG factor. It is argued that the extent to which
students perceive themselves to be challenged in the classroom can be predicted in part by
the extent of active learning. This hypothesis is consistent with the general
input-process-output paradigm discussed in Chapter 1.
The path diagram of the expanded path model incorporating the latent
variables is shown in Figure 4.1. The initial model is similar to that discussed
in Chapter 2. Namely, it is hypothesized that the background student charac-
teristics of previous science grades (SCIGRA6) and socioeconomic status
(SES) influence science achievement indirectly through 10th grade science
grades. The role of teacher certification in science is hypothesized to predict
the extent of hands-on science involvement. This in turn is hypothesized to pre-
dict student perceptions of a challenging classroom environment, which in
turn should predict science achievement through science grades.
The analysis of the initial model was based on a sample of public school
10th grade students from the NELS:88 data set. After listwise deletion, the sam-
ple was 6,677. The analysis used the software program Mplus with maximum
likelihood estimation.
Table 4.1 presents the maximum likelihood estimates of the expanded
science achievement model. An inspection of Table 4.1 reveals a moderately large
and significant effect of perceptions of hands-on involvement on perceptions
of being challenged. However, the direct effect of perceptions of hands-on
involvement on 10th grade science grades is neither significant nor very large.
The results in Table 4.1 are supplemented with a breakdown of the total
and indirect effects displayed in Table 4.2. Here, it can be seen that although
Measurement model
INVOLV BY
MAKEMETH 1.000 0.000 0.000 0.606 0.738
OWNEXP 0.724 0.027 26.469 0.439 0.605
CHOICE 0.755 0.029 26.036 0.458 0.507
CHALL BY
CHALLG 1.000 0.000 0.000 0.917 0.748
UNDERST 0.757 0.024 31.043 0.694 0.503
WORKHARD 0.867 0.026 32.921 0.795 0.723
Structural model
CHALL ON
INVOLV 0.251 0.027 9.282 0.166 0.166
INVOLV ON
CERTSCI 0.012 0.031 0.399 0.021 0.006
SCIGRA10 ON
CHALL 0.264 0.026 10.319 0.242 0.133
SCIACH ON
SCIGRA10 1.217 0.036 33.687 1.217 0.381
SCIGRA10 ON
SCIGRA6 0.788 0.021 37.147 0.788 0.416
SES 0.240 0.028 8.734 0.240 0.098
CERTSCI 0.032 0.068 0.466 0.032 0.005
Table 4.2 Standardized Total and Indirect Effects for the Expanded Science
Achievement
This section considers the extension of the single group model discussed in this
chapter and in Chapters 2 and 3, to multiple group situations. To motivate this
section, consider the substantive problem of student perceptions of school climate
as discussed in Chapter 3. Suppose that an investigator wishes to understand
the differences between public and private schools in terms of student perceptions
of school climate. Using, for example, the National Educational Longitudinal
Study (NCES, 1988), it is possible to identify whether students belong to public or
private (e.g., Catholic, other religious, and other nonreligious) schools.
An investigator may have a program of research designed to study school-type
differences in student perceptions of school climate eventually relating these differ-
ences to important educational outcomes such as academic achievement and
student dropout. First, the investigator might wish to determine whether the mea-
surement structure of the student perception items operates the same way across
school types. Second, the investigator may be interested in knowing if the means of
the factors of school climate are different between public and private school students.
$x_g = \Lambda_{x_g}\xi_g + \delta_g,$ [4.9]
$\log L_0(\Omega)_g = -\frac{n_g}{2}\left[\log|\Sigma_g| + \mathrm{tr}(S_g\Sigma_g^{-1})\right],$ [4.10]
$\log L_0(\Omega) = \sum_{g=1}^{G}\log L_0(\Omega)_g.$ [4.11]
Given the model specification and the requisite assumptions, the first
test that may be of interest is the equality of covariance matrices across
groups. Note that for this first step, no structure is imposed. Rather, the goal
is simply to determine if the covariance matrices differ. Borrowing from
Jöreskog’s (1971) notation, the null hypothesis for this first step can be writ-
ten as
$H_0: \Sigma_1 = \Sigma_2 = \cdots = \Sigma_G.$ [4.13]
This hypothesis is tested using Box’s M test (Timm, 1975) and can be writ-
ten as
$M = n\log|S| - \sum_{g=1}^{G} n_g\log|S_g|,$ [4.14]
$d = \frac{1}{2}(g - 1)q(q + 1).$ [4.15]
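Equations [4.14] and [4.15] are simple to compute directly. A minimal R sketch is given below; it assumes a list of group sample covariance matrices and their sample sizes, and it takes S to be the n_g-weighted pooled covariance matrix (an assumption of this sketch).

  # Box's M statistic [4.14] and its degrees of freedom [4.15]
  box_m <- function(S_list, n_g) {
    S <- Reduce(`+`, Map(`*`, S_list, n_g)) / sum(n_g)   # pooled covariance matrix
    M <- sum(n_g) * log(det(S)) - sum(n_g * log(sapply(S_list, det)))
    q <- ncol(S_list[[1]])
    G <- length(S_list)
    list(M = M, df = 0.5 * (G - 1) * q * (q + 1))
  }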
$H_{0k}: k_1 = k_2 = \cdots = k_G,$ [4.16]
$H_{0\Lambda}: \Lambda_1 = \Lambda_2 = \cdots = \Lambda_G.$ [4.18]
$H_{0\Lambda\Theta}: \Lambda_1 = \Lambda_2 = \cdots = \Lambda_G,$ [4.20]
$\Theta_1 = \Theta_2 = \cdots = \Theta_G.$
$d_{\Lambda\Theta} = \frac{1}{2}gq(q + 1) - qk + q - \frac{1}{2}gk(k + 1) - q.$ [4.21]
Again, if this test is not rejected, then further sequential testing is allowed.
Finally, one can test for complete invariance of all parameters across
groups by adding the constraint that the factor covariance matrices Φg be
equal across groups. The hypothesis can be represented as
$H_{0\Lambda\Theta\Phi}: \Lambda_1 = \Lambda_2 = \cdots = \Lambda_G,$
$\Theta_1 = \Theta_2 = \cdots = \Theta_G,$ [4.22]
$\Phi_1 = \Phi_2 = \cdots = \Phi_G.$
$d_{\Lambda\Theta\Phi} = \frac{1}{2}gq(q + 1) - qk + q - \frac{1}{2}k(k + 1) - q.$ [4.23]
public school students in line with the sample size for private school students,
a random sample of 500 public school students was drawn from the total sam-
ple of 11,000.
An exploratory factor analysis of both groups separately (not shown)
revealed that the same two factors held for both groups. Therefore, we proceed
with the sequence of testing proposed by Jöreskog (1971), beginning with the
test of factorial invariance.
Table 4.3 displays the sequential testing of invariance restrictions.
Preliminary analyses (not shown) revealed that the hypothesis of equality of
covariance matrices was rejected, suggesting that we can begin to explore
increasingly restrictive hypotheses of school type differences in student per-
ceptions of school climate.
The next hypothesis under Model 1 is that of invariance of factor loadings
corresponding to Equation [4.17]. The analysis suggests that the hypothesis of
invariance of factor loadings is rejected. Strictly speaking, this finding suggests
that while the number of factors is the same for both groups, the relationship
between the variables and their corresponding factors is not. According to the
testing strategy proposed by Jöreskog (1971), hypothesis testing stops at this
point. For completeness, however, we provide the remaining tests as well as chi-
square difference tests that compare increasing restrictions.
a. Model 1: Invariance of factor loadings. Model 2: Invariance of factor loadings and measurement
error variances. Model 3: Invariance of factor loadings, measurement error variances, and factor
variances and covariances.
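The sequence of increasingly restrictive models in Table 4.3 can be outlined with the R package lavaan (an illustration only; the analyses in the text were done in Mplus). The group.equal argument adds the equality constraints, and chi-square difference tests compare adjacent models. The built-in HolzingerSwineford1939 data are used as a stand-in for the school climate items.

  library(lavaan)

  model <- '
    visual  =~ x1 + x2 + x3
    textual =~ x4 + x5 + x6
  '
  dat <- HolzingerSwineford1939

  m0 <- cfa(model, data = dat, group = "school")   # same pattern, all parameters free
  m1 <- cfa(model, data = dat, group = "school",
            group.equal = "loadings")                          # Model 1: invariant loadings
  m2 <- cfa(model, data = dat, group = "school",
            group.equal = c("loadings", "residuals"))          # Model 2: + error variances
  m3 <- cfa(model, data = dat, group = "school",
            group.equal = c("loadings", "residuals",
                            "lv.variances", "lv.covariances")) # Model 3: + factor (co)variances

  anova(m0, m1, m2, m3)   # chi-square difference tests for increasing restrictions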
$x_g = \tau + \Lambda_g\xi_g + \delta_g,$ [4.24]
$E(x_g) = \tau + \Lambda_g E(\xi_g) = \tau + \Lambda_g\kappa_g,$ [4.25]
$E(x_g) = \tau - \Lambda d + \Lambda(\kappa_g + d) = \tau + \Lambda\kappa_g.$ [4.26]
This section covers a special case of the general structural equation model
considered above for the estimation of group differences on latent variables.
a. Estimate reflects factor mean difference between private school students (=1) and public school
students (=0) under the assumption of factorial invariance. Note that scales are coded 1 = strongly
agree, 4 = strongly disagree.
This model is referred to as the MIMIC model, standing for the multiple
indicators and multiple causes model, and was proposed by Jöreskog and
Goldberger (1975).
Consider once again the goal of estimating school type differences on
student perceptions of school climate. Denote by x a vector that contains
dummy codes representing group membership (e.g., 1 = public school; 0 = pri-
vate school). Then, the MIMIC model can be written as
$y = \Lambda_y\eta + \varepsilon,$
$\eta = \Gamma x + \zeta,$ [4.27]
$x \equiv \xi.$
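In lavaan syntax (again only an illustration of the specification, not the Mplus setup used in the text), the MIMIC model in Equation [4.27] amounts to regressing the factors on the observed dummy code:

  library(lavaan)

  dat <- HolzingerSwineford1939
  dat$schtype <- as.numeric(dat$school == "Grant-White")   # 0/1 dummy code for group

  mimic <- '
    visual  =~ x1 + x2 + x3
    textual =~ x4 + x5 + x6
    visual  ~ schtype
    textual ~ schtype
  '
  fit <- sem(mimic, data = dat)
  summary(fit)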
[Figure: Path diagram of the MIMIC model, with SCHOOL TYPE predicting POS CLIMATE (measured by GETALNG through LISTEN) and NEG CLIMATE (measured by DISRUPT through MISBEHAV).]
$\eta = B\eta + \Gamma\xi + \zeta,$
$y = \Lambda_y\eta + \varepsilon,$ [4.28]
$x = \Lambda_x\xi + \delta.$
Measurement part
POS BY
GETALNG 1.000 0.000 0.000
SPIRIT 0.945 0.033 28.438
STRICT 0.880 0.036 24.315
FAIR 0.971 0.032 29.993
RACEFRND 1.001 0.033 30.000
TCHGOOD 1.087 0.031 35.201
TCHINT 1.072 0.029 37.514
TCHPRAIS 1.029 0.031 33.702
LISTEN 1.023 0.033 31.121
NEG BY
DISRUPT 1.000 0.000 0.000
PUTDOWN 0.825 0.034 24.162
STUDOWN 0.867 0.036 23.778
FEELSAFE 0.790 0.034 23.155
IMPEDE 0.963 0.040 23.975
MISBEHAV 0.960 0.038 25.324
Structural part
POS ON
SCHTYPE −0.279 0.067 −4.161
NEG ON
SCHTYPE 0.284 0.070 4.041
NEG WITH
POS 0.694 0.045 15.337
matrix, and hence, one of the elements of B must be fixed to 1.0. Also, in this
parameterization, the matrix Γ contains the regressions of the factor as well
as its indicators on the exogenous variables. Moreover, in the case of a single
factor, the vector ζ contains p + 1 elements. The first p elements are the
uniquenesses associated with the elements of ε and the last element is the dis-
turbance term ζ.
A natural question that arises in the context of the educational tracking exam-
ple is whether we can unequivocally ascribe mean differences in latent self-
concept to the effect of educational tracking. When there is random
assignment of observations to conditions, we are in a stronger position to
argue for causal effects of treatments. However, in this example, random
assignment was not utilized. Indeed, numerous manifest and latent variables
are in play that select children into educational tracks. This section considers
the problem of nonrandom selection into treatment conditions as it pertains
to issues of factorial invariance. We also consider methods that attempt to
account for nonrandom selection mechanisms in latent variable models of
group differences.
population of children, still holds within educational tracks, where the tracks
are not usually formed by random assignment.
To motivate the problem of factorial invariance, consider again the factor
analytic model discussed in Chapter 3. As in Equation [3.2] the covariance
matrix of the q variables can be written as
$\Sigma = \Lambda\Phi\Lambda' + \Theta,$ [4.29]
$\mu = \tau_x + \Lambda\kappa,$ [4.30]
$\mu_s = \tau_x + \Lambda\kappa_s,$ [4.31]
and
$\Sigma_s = \Lambda\Phi_s\Lambda' + \Theta_s.$ [4.32]
Equations [4.31] and [4.32] mean that under strong factorial invari-
ance the structural intercepts and the factor loadings are invariant across the
groups but that the factor means, factor covariance matrix, and covariance
matrix of the uniquenesses may differ. In contrast, strict factorial invariance
retains Equation [4.32] but now requires that the matrix of unique variances is
constant across subpopulations—namely, that
$\Sigma_s = \Lambda\Phi_s\Lambda' + \Theta.$ [4.33]
4.8 Conclusion
This chapter covered the merging of path analysis and factor analysis into a
comprehensive structural equation modeling methodology. We extended the
basic model to cover modeling in multiple groups. In addition, we discussed
new developments in modeling nonrandom selection that can occur in quasi-
experimental applications of structural equation modeling.
The general model discussed in this chapter, as well as the special cases dis-
cussed so far, rests on a certain set of statistical assumptions. It is crucial for
these assumptions to be met in order to have confidence in the inferences drawn from
application of structural equation modeling. The next chapter discusses, in detail,
the assumptions underlying structural equation modeling.
Notes
1. Duncan (1975) has used the term structural equation modeling to refer to what
we are calling path analysis. Although the terminology is somewhat arbitrary, I have
decided to maintain the common parlance used in this field.
2. The focus on multiple group measurement models is due to the fact that these
are the most common models studied across groups. Presenting the multiple group
measurement model does not result in a loss of generality.
3. It is possible to assess group differences in the unrestricted model, using a vari-
ety of factor comparability measures. See, for example, Harman (1976).
4. However, this is only reasonable if there is random selection of observations
and random assignment to groups.
5
Statistical Assumptions Underlying
Structural Equation Modeling
5.1 Nonnormality
As with any other sample statistic, one can obtain its variance and covari-
ance with other sample statistics. The general form of the asymptotic covari-
ance matrix of covariances in s can be written as
$(N - 1)\,\mathrm{acov}(s_{ij}, s_{gh}) = \sigma_{ig}\sigma_{jh} + \sigma_{ih}\sigma_{jg} + \frac{N - 1}{N}\kappa_{ijgh},$ [5.3]
With Equation [5.4] as the weight matrix, FWLS reduces to the generalized
least squares estimator discussed in Chapter 2.
The robustness properties of the ADF estimator have been the subject of
numerous simulation studies. The results of early studies of the ADF estimator
were somewhat mixed. For example, Browne (1982) found ADF estimates to
be biased. Muthén and Kaplan (1985, 1992), on the other hand, found very lit-
tle bias in ADF estimates. In all cases, the ADF chi-square was smaller than ML
chi-square when applied to continuous nonnormal data. However, when ADF
was applied to categorical data, B. Muthén and Kaplan (1992) found that the
ADF chi-square was markedly sensitive and that this sensitivity increased as the
size of the model increased. Moreover, ADF standard errors were noticeably
downward biased, becoming worse as the model size increased.
A troubling feature of the ADF estimator concerns the computational dif-
ficulties encountered for models of moderate size. Specifically, with p variables
there are u = ½p(p + 1) elements in the sample covariance matrix S. The weight
matrix W is of order u × u. Therefore, the size of the weight matrix grows
rapidly with the number of variables. So, if a model were estimated with 20 vari-
ables, the weight matrix would contain 22,155 distinct elements. Moreover,
ADF estimation required that the sample size (for each group if relevant) exceed
p + ½p(p + 1) to ensure that the weight matrix is nonsingular. These constraints
have limited the utility of the ADF estimator in applied settings. Below, we dis-
cuss some new estimators that appear to work well for small samples.
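The arithmetic behind these figures is easily verified:

  p <- 20
  u <- p * (p + 1) / 2     # nonredundant elements of S: 210
  u * (u + 1) / 2          # distinct elements of the u x u weight matrix: 22,155
  p + p * (p + 1) / 2      # minimum sample size for a nonsingular weight matrix: 230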
More recently, three expectation-maximization (EM) based ML estima-
tors have been developed for the structural equation modeling framework
which provide robust chi-square tests and correct standard errors under non-
normality. These estimators are distinguished by the approach they take for the
calculation of standard errors. The first method uses a first-order approxima-
tion of the asymptotic covariance matrix of the estimates to obtain the stan-
dard errors and is referred to as the MLF estimator. The second method is the
conventional ML estimator that uses the second-order derivatives of the
observed log-likelihood. The third method is based on a sandwich estimator
derived from the information matrices of ML and MLF and produces the
$y_i = \begin{cases} C_i - 1 & \text{if } \nu_{i,C_i-1} < y_i^* \\ C_i - 2 & \text{if } \nu_{i,C_i-2} < y_i^* \le \nu_{i,C_i-1} \\ \;\;\vdots \\ 1 & \text{if } \nu_{i,1} < y_i^* \le \nu_{i,2} \\ 0 & \text{if } y_i^* \le \nu_{i,1} \end{cases}$ [5.5]
$y^* = \tau_y + \Lambda\eta + \varepsilon,$ [5.6]
$\eta = \alpha + B\eta + \Gamma x + \zeta.$ [5.7]
variables x that do not follow a linear factor model. This specification allows
for two types of cases of Muthén’s general model. In Case A, there is no x vec-
tor. This case allows Muthén’s framework to capture all of the models consid-
ered by Jöreskog and Sörbom (1993). For Case B, the vector x is included
allowing one to capture models in which x is a nonstochastic vector of vari-
ables (such as dummy variables). An example of such a model would be the
MIMIC model discussed in Chapter 4, where x may represent treatment group
conditions.
If it is reasonable to assume that continuous and normally distributed y∗
variables underlie the categorical y variables, then from classic psychometric
theory a variety of latent correlations can be specified. Table 5.1 summarizes
Pearson and the latent correlations.
The first step in Muthén’s approach is to estimate the thresholds for the
categorical variables using ML. In the second step, the latent correlations (e.g.,
tetrachoric correlations) are estimated. Finally, in the third step, a consistent
estimator of the asymptotic covariance matrix of the latent correlations is
obtained and implemented in a WLS estimator.
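This three-step approach is automated in standard software. As an illustration with the R package lavaan (the analyses reported below use Mplus), declaring indicators as ordered triggers threshold estimation, the latent (polychoric) correlations, and a robust WLS estimator; the ordinal items here are simulated for the example.

  library(lavaan)

  set.seed(1)
  f      <- rnorm(300)                                        # latent factor scores
  ystar  <- outer(f, c(1, 0.8, 0.7, 0.6)) + matrix(rnorm(1200), 300, 4)
  ordcat <- as.data.frame(lapply(as.data.frame(ystar),
              function(v) cut(v, c(-Inf, -0.5, 0.5, Inf), labels = FALSE)))
  names(ordcat) <- paste0("y", 1:4)

  model <- 'f1 =~ y1 + y2 + y3 + y4'
  fit <- cfa(model, data = ordcat,
             ordered   = c("y1", "y2", "y3", "y4"),   # treat the items as categorical
             estimator = "WLSMV")                     # robust weighted least squares
  lavInspect(fit, "sampstat")   # thresholds and latent correlations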
Observe that Muthén’s approach is quite flexible insofar as any combina-
tion of categorical and continuous observed variables can be present in the
data. The only requirement is the assumption that the categorical variables are
associated with continuous normally distributed latent response variables. A
trivariate normality test was offered by Muthén and Hofacker (1988). If the
assumption of trivariate normality holds, then bivariate normality could be
assumed for each pair.
The first simulation study examining the performance of the categorical
variable methodology (CVM) estimator compared with estimators for contin-
uous variables was Muthén and Kaplan (1985). In this article, Muthén and
Kaplan examined the ability of CVM to recover the parameters of the y∗ model
when the variables were split into 25%/75% dichotomies. The results showed
that CVM yielded a slight underestimation of chi-square but that the parame-
ter estimates and sampling variability were well in line with expected values.
Similar findings were observed for multiple group mean structure models
(Kaplan, 1991b).
WLS (B. Muthén, 1984) and compare the results with the new robust estima-
tion methods for categorical data (B. Muthén et al., in press). Normal theory
ML results are reproduced for comparative purposes. All analyses used Mplus
(L. Muthén & Muthén, 2006).
Table 5.2 presents the results of the estimators under study. A comparison
of the WLS estimators against ML reveals some noticeable differences. In
particular, the standard errors under the WLS-based estimators are generally
smaller than those under normal theory ML. This is expected given theoretical
studies of these estimators. The changes in the estimates do not follow
any particular pattern. Finally, and perhaps most noticeably, the chi-
square test of model fit is smaller for the WLS-based estimators compared to ML.
Indeed, the robust WLS estimators result in progressively smaller goodness-of-
fit tests compared to WLS without such a correction.
Estimates (S.E.)
IRTSCI ON
SCIGRA10 1.228 (0.034) 1.873 (0.052) 1.876 (0.051) 1.876 (0.051)
SCIGRA10 ON
CHALLG −0.033 (0.017) −0.035 (0.012) −0.038 (0.012) −0.038 (0.012)
SCIGRA6 0.781 (0.020) 0.500 (0.012) 0.544 (0.013) 0.544 (0.013)
SES 0.239 (0.026) 0.207 (0.017) 0.288 (0.017) 0.288 (0.017)
CERTSCI −0.040 (0.039) −0.037 (0.025) −0.049 (0.025) −0.049 (0.025)
UNDERSTD 0.168 (0.015) 0.144 (0.013) 0.153 (0.013) 0.153 (0.013)
CHALLG ON
UNDERSTD 0.318 (0.010) 0.409 (0.011) 0.408 (0.011) 0.408 (0.011)
UNDERSTD ON
CERTSCI −0.031 (0.033) −0.019 (0.025) −0.017 (0.026) −0.017 (0.026)
Residual variances
UNDERSTD 1.858 (0.031)
CHALLG 1.250 (0.021)
SCIGRA10 2.637 (0.043)
IRTSCI 29.290 (0.483) 19.452 (0.491) 22.759 (0.501) 22.759 (0.501)
Chi-square 1321.31 1190.19 1001.68 901.51
df 10 10 10 9
It should be pointed out that the results of this application, though gener-
ally consistent with theoretical expectations, may not occur in practice. Indeed,
the application of estimators that account for nonnormality may reveal other
problems that could actually inflate chi-square and standard errors.
MAR. This is because although the observed values on Y are not a random
sample from the original sample, they are a random sample within subgroups
defined on X (Little & Rubin, 2002). Finally, if the probability of responding to
Y depends on Y even after controlling for X, then the data are neither MAR nor
OAR. Here again, the missing data mechanism is nonignorable.
$\mathrm{acov}(s_{ij}, s_{gh}) = \frac{1}{n}(\sigma_{ig}\sigma_{jh} + \sigma_{ih}\sigma_{jg}),$ [5.8]
$\mathrm{acov}(s_{ij}, s_{gh}) = \frac{n_{ijgh}}{n_{ij}\,n_{gh}}(\sigma_{ig}\sigma_{jh} + \sigma_{ih}\sigma_{jg}),$ [5.9]
$y^* = \tau + \Lambda\eta + \varepsilon,$ [5.10]
$s^* = \Gamma_\eta\eta + \Gamma_{y^*}y^* + \delta,$ [5.11]
where $\Gamma_\eta$ and $\Gamma_{y^*}$ are coefficient matrices that allow the selection to be a function
of η, y∗, or both, and δ is a vector of disturbances. In essence, the vector s∗
$\theta = (\tau, \Lambda, \Psi, \Theta_\varepsilon),$ [5.13]
where the matrix $\Theta_{\delta\varepsilon}$ allows for the possibility that ε and δ are correlated.
The approach taken by Muthén et al. (1987) is to arrange the data into G
distinct missing data patterns. Let Σg and Sg be the population and sample
covariance matrices for the gth missing data pattern, respectively. Then, follow-
ing Little (1982), B. Muthén et al. (1987), and Rubin (1976), the log-likelihood
can be written as
$\log L(\theta, \varphi \mid y) = \sum_{g=1}^{G}\log h_g(\theta \mid y) + \log f(\theta, \varphi \mid y),$ [5.15]
where
$\log h_g(\theta \mid y) = \mathrm{const.} - \frac{1}{2}N^g\log|\Sigma_g| - \frac{1}{2}N^g\,\mathrm{tr}\{\Sigma_g^{-1}[S_g + (\bar{y}^g - \mu^g)(\bar{y}^g - \mu^g)']\}.$ [5.16]
$s^* = \Gamma y^* + \delta.$ [5.17]
B. Muthén et al. (1987) show that when Γ2 = 0, and Θδε = 0, then the sec-
ond term on the right-hand side of Equation [5.18] does not enter into the dif-
ferentiation with respect to the model parameters in the first term of the
right-hand side of Equation [5.18]. These conditions satisfy the definition of
MAR.
Next, consider the case where respondents omit data at Time 2 due to true
academic self-concept rather than on any one or more of the unreliable measures
of academic self-concept. In this case, the selection model can be written as
$s^* = \Gamma_\eta\eta + \delta.$ [5.19]
B. Muthén et al. (1987) show that in this case, the likelihood involves both
model parameters θ and missing data parameters φ. Hence, the assumption of
MAR is not satisfied in this case and the missing data mechanism is not ignorable.
Some Studies Using the FQL Estimator. The FQL estimator was compared with
LPA and PPA in an extensive simulation study by B. Muthén et al. (1987). The
general findings were that the FQL estimator was superior to the more tradi-
tional approaches to handling missing data even under conditions where it was
not appropriate to ignore the mechanism that generated the missing data.
In another simulation study, Kaplan (1995b) demonstrated the superior-
ity of the FQL estimator over the PPA approach for data that were missing com-
pletely at random by design.
Problems With the FQL Estimator. The major problem associated with the FQL
estimator is that it is restricted to modeling under a relatively small number of
distinct missing data patterns. Specifically, for the covariance matrices for each
distinct group to be positive definite, the number of respondents in any given
group must be one more than the total number of variables in the model.4
With the exception of cases where missing data is by design,5 small numbers
for distinct missing data patterns are rare in social and behavioral science
applications.
Once the mean vectors and covariance matrices have been formed, the full
information ML approach of Arbuckle (1996) uses the fact that for the ith case,
the log-likelihood function can be written as
$\log L_i = C_i - \frac{1}{2}\log|\Sigma_i| - \frac{1}{2}(x_i - \mu_i)'\Sigma_i^{-1}(x_i - \mu_i),$ [5.20]
and the log-likelihood of the entire sample is the sum of the individual log-
likelihoods. As usual, the likelihood is maximized in terms of the parameters of
the model.
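Equation [5.20] can be made concrete in a few lines of R. The sketch below computes the casewise log-likelihood contribution by subsetting μ and Σ to the entries observed for that case; the constant C_i is taken here to be the usual multivariate normal normalizing constant. It illustrates the mechanics only and is not a reproduction of any particular program's implementation.

  # Casewise log-likelihood for one observation x (with possible NAs),
  # given a mean vector mu and covariance matrix Sigma
  loglik_case <- function(x, mu, Sigma) {
    obs <- !is.na(x)                                  # entries observed for this case
    xo  <- x[obs]; muo <- mu[obs]
    So  <- Sigma[obs, obs, drop = FALSE]
    k   <- sum(obs)
    drop(-0.5 * k * log(2 * pi) - 0.5 * log(det(So)) -
         0.5 * t(xo - muo) %*% solve(So) %*% (xo - muo))
  }
  # The sample log-likelihood is the sum of the casewise contributions:
  # sum(apply(X, 1, loglik_case, mu = mu, Sigma = Sigma))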
A comparable development in estimation under MAR was proposed by
B. Muthén and incorporated in Mplus (L. Muthén & Muthén, 2006). In the
first step the mean, variances, and covariances are estimated via ML using the
EM algorithm suggested by Little and Rubin (2002) with no restrictions. This
is referred to as the H1 model. Then, the model of interest (H0) is estimated
conditional on the exogenous variables.6 If there are missing values on the exoge-
nous variables, they are estimated via ML using EM and held fixed at those val-
ues when the H0 model is estimated. A fitting function similar to Equation [5.20]
is used, yielding a large sample chi-square test of model fit.
Simulation studies of ML under MAR have been undertaken. For example,
Arbuckle (1996) compared his full-information ML approach with PPA and
LPA under missing data conditions of MCAR and MAR. His results suggested
that the ML approach performed about the same as PPA and LPA with respect
to parameter estimate bias but outperformed these ad hoc methods with respect
to sampling variability. More recently, a simulation study by Enders and
Bandalos (2001) demonstrated the effectiveness of full-information ML relative to PPA
and LPA under MCAR and MAR.
$W_{21} = n\,[\hat{\omega}_1 \;\; \hat{\omega}_2]\begin{bmatrix}\mathrm{Var}(\hat{\omega}_1) & \mathrm{sym.}\\ \mathrm{Cov}(\hat{\omega}_1, \hat{\omega}_2) & \mathrm{Var}(\hat{\omega}_2)\end{bmatrix}^{-1}\begin{bmatrix}\hat{\omega}_1\\ \hat{\omega}_2\end{bmatrix},$ [5.21]
where $\mathrm{Var}(\hat{\omega}_1)$ and $\mathrm{Var}(\hat{\omega}_2)$ are the variances of $\hat{\omega}_1$ and $\hat{\omega}_2$, and $\mathrm{Cov}(\hat{\omega}_1, \hat{\omega}_2)$ is the
covariance of $\hat{\omega}_1$ and $\hat{\omega}_2$, which is not assumed to be zero. In this case, the
determinant of the middle term in Equation [5.21] can be written as
$D = V_1V_2 - C_{21}^2$, where $V_1 = \mathrm{Var}(\hat{\omega}_1)$, $V_2 = \mathrm{Var}(\hat{\omega}_2)$, and $C_{21} = \mathrm{Cov}(\hat{\omega}_1, \hat{\omega}_2)$.
Thus, Equation [5.21] can be reexpressed as
$W_{21} = \frac{n}{D}\left[\hat{\omega}_1^2 V_2 - \hat{\omega}_1\hat{\omega}_2 C_{21} - \hat{\omega}_2\hat{\omega}_1 C_{21} + \hat{\omega}_2^2 V_1\right].$ [5.22]
From Equation [5.22], it can be seen that the Wald test involves the covari-
ance of the parameters C21. If C21 = 0, then the simultaneous Wald test in
Equation [5.22] decomposes into the sum of two univariate Wald tests. When
C21 = 0, we say that the test statistics are asymptotically independent and the
hypotheses associated with these tests are separable (Aitchison, 1962; see also
Satorra, 1989).
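A small numerical sketch in R, using hypothetical estimates and an assumed asymptotic covariance matrix, shows the behavior described above: when C21 = 0 the joint Wald test equals the sum of the two univariate tests, and otherwise it does not.

  # Joint Wald test for two parameter estimates (Equations [5.21] and [5.22])
  wald2 <- function(omega, V, n) n * drop(t(omega) %*% solve(V) %*% omega)

  n     <- 500
  omega <- c(0.20, 0.15)                        # hypothetical estimates
  V0    <- diag(c(0.010, 0.008))                # C21 = 0
  V1    <- matrix(c(0.010, 0.004,
                    0.004, 0.008), 2, 2)        # C21 != 0

  wald2(omega, V0, n)                                    # joint test, C21 = 0
  n * omega[1]^2 / V0[1, 1] + n * omega[2]^2 / V0[2, 2]  # sum of univariate tests (equal)
  wald2(omega, V1, n)                                    # joint test, C21 != 0 (not equal)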
The example just given for the case of two hypotheses can be expanded to
sets of multivariate hypotheses (see Kaplan & Wenger, 1993). Of particular
relevance to the problem of specification error propagation is the case of three
hypotheses. Specifically, Kaplan and Wenger (1993) considered the problem of
restricting two parameter estimates, say $\hat{\omega}_1$ and $\hat{\omega}_3$, that have zero covariance,
that is, $C_{31} = 0$. However, $\hat{\omega}_1$ has a nonzero covariance with $\hat{\omega}_2$ and $\hat{\omega}_3$ has a
nonzero covariance with $\hat{\omega}_2$. Then, in this case, $\hat{\omega}_1$ and $\hat{\omega}_3$ are asymptotically
independent, but the joint hypothesis that $\hat{\omega}_1$ and $\hat{\omega}_3$ are zero will not decom-
pose into the sum of the individual hypotheses because of their “shared” covari-
ance with $\hat{\omega}_2$. In other words, these particular hypotheses are not separable.
Kaplan and Wenger (1993) referred to hypotheses of this sort as transitive. For
the hypotheses associated with these parameters to be separable, all three para-
meters must be mutually asymptotically independent.
The above discussion suggests that the pattern of zero and nonzero values
in the covariance matrix of the estimates is the underlying mechanism that
governs the impact of testing parameters, either singly or in sets. In the context
of specification error, if a parameter (or sets of parameters) has a zero covari-
ance with another parameter, then the restriction of one (say on the basis of
the Wald test) will not affect the estimate or standard error of the other para-
meter. In addition, the concept of transitive hypotheses suggests that if two
parameters are asymptotically independent but not mutually asymptotically
independent with respect to a third parameter, then the restriction of one will
affect the other due to its shared covariance with that third parameter (Kaplan &
Wenger, 1993).
$f(z \mid \theta) = f(z_1, z_2, \ldots, z_N \mid \theta) = \prod_{i=1}^{N} f(z_i \mid \theta),$ [5.23]
Following our notation from above, arrange the conditional and marginal
parameters accordingly:
Note that for the bivariate normal (and by extension the multivariate nor-
mal), x is weakly exogenous for the estimation of the parameters in ω1 because
the parameters of the marginal distribution ω2 do not appear in the set of the
parameters for the conditional distribution ω1. In other words, the choices of
values of the parameters on ω2 do not restrict in any way the range of values
that the parameters in ω1 can take.
The multivariate normal distribution, as noted above, belongs to the class
of elliptically symmetric distributions. Other distributions include Student’s t,
the logistic, and the Pearson type III distributions. Consider the case where the
joint distribution can be characterized by a bivariate Student’s t distribution
(symmetric but leptokurtotic). The conditional and marginal densities under
the bivariate Student’s t can be written as (Spanos, 1999)
$(y \mid x) \sim \mathrm{St}\!\left((\beta_0 + \beta_1 x),\ \frac{\nu\sigma^2}{\nu - 1}\left[1 + \frac{(x - \mu_2)^2}{\nu\sigma_{22}^2}\right];\ \nu + 1\right),$
$x \sim \mathrm{St}(\mu_2, \sigma_{22}^2;\ \nu),$ [5.27]
5.5 Conclusion
Notes
1. The vech(·) operator takes the ½s(s + 1) nonredundant elements of the matrix
and strings them into a vector of dimension [½s(s + 1)] × 1.
6
Evaluating and Modifying
Structural Equation Models
$\mathrm{NFI} = \frac{\chi_b^2 - \chi_t^2}{\chi_b^2},$ [6.1]
where χ2b is the chi-square for the model of complete independence (the so-
called baseline model) and χ2t is the chi-square for the target model of interest.
The model of complete independence will typically be associated with a very
large chi-square value because the null hypothesis tested by χ2b states that there
are no covariances among the variables in the population. Therefore, values
close to 0 suggest that the target model is not much better than a model of
complete independence among the variables. Values close to 1 suggest that the
target model is an improvement over the baseline model. As noted above, a
value of 0.95 is considered evidence that the target model is a good fit to the
data relative to the baseline model.
An index that is similar to the NFI but which takes into account the
expected value of the chi-square statistic of the target model is the Tucker-
Lewis index (TLI; Tucker & Lewis, 1973), also referred to as the nonnormed fit
index (NNFI).1 The TLI can be written as
$\mathrm{TLI} = \frac{(\chi_b^2/df_b) - (\chi_t^2/df_t)}{(\chi_b^2/df_b) - 1},$ [6.2]
where dfb denotes the degrees of freedom for the model of complete independence
and dft denotes the degrees of freedom for the target model of interest. This
index may yield values that lie outside the 0 to 1 range.
The NFI and TLI assume a true null hypothesis and therefore a central
chi-square distribution for the test statistic. However, an argument could be
made that the null hypothesis is never exactly true and that the distribution of
the test statistic can be better approximated by a noncentral chi-square with
noncentrality parameter λ. An estimate of the noncentrality parameter can be
obtained as the difference between the statistic and its associated degrees of
freedom. Thus, for models that are not extremely misspecified, an index devel-
oped by McDonald and Marsh (1990) and referred to as the relative noncen-
trality index (RNI) can be defined as
$\mathrm{RNI} = \frac{(\chi_b^2 - df_b) - (\chi_t^2 - df_t)}{\chi_b^2 - df_b}.$ [6.3]
The RNI can lie outside the 0 to 1 range. To remedy this, Bentler (1990)
adjusted the RNI so that it would lie in the range of 0 to 1. This adjusted ver-
sion is referred to as the comparative fit index (CFI).2
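The indices in Equations [6.1] through [6.3] are simple functions of the baseline and target chi-squares and their degrees of freedom. A small R sketch follows, with the CFI computed here by holding the RNI to the 0 to 1 range, in line with the description of Bentler's adjustment given above; the input values are hypothetical.

  fit_indices <- function(chisq_b, df_b, chisq_t, df_t) {
    nfi <- (chisq_b - chisq_t) / chisq_b                              # Equation [6.1]
    tli <- (chisq_b / df_b - chisq_t / df_t) / (chisq_b / df_b - 1)   # Equation [6.2]
    rni <- ((chisq_b - df_b) - (chisq_t - df_t)) / (chisq_b - df_b)   # Equation [6.3]
    cfi <- min(max(rni, 0), 1)                                        # RNI held to [0, 1]
    c(NFI = nfi, TLI = tli, RNI = rni, CFI = cfi)
  }
  fit_indices(chisq_b = 2250.0, df_b = 45, chisq_t = 160.2, df_t = 39)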
Finally, there are classes of CFIs that adjust existing fit indices for the
number of parameters that are estimated. These are so called parsimony-based
CFIs. The rationale behind these indices is that a model can be made to fit the
data by simply estimating more and more parameters. Indeed, a model that is
just-identified fits the data perfectly. Therefore, an appeal to parsimony would
require that these indices be adjusted for the number of parameters that are
estimated. One such parsimony-based index proposed by Mulaik et al. (1989)
is the parsimony-NFI (PNFI) defined as
$\mathrm{PNFI} = \frac{df_t}{df_b}\,\mathrm{NFI}.$ [6.4]
The rationale behind the PNFI is as follows. Note that the baseline model
of complete independence restricts all off-diagonal elements of the covariance
matrix to zero. Thus, the degrees of freedom for the baseline model is
$df_b = p(p - 1)/2$, where p is the total number of observed variables; this repre-
sents the degrees of freedom for the most restricted model possible. The more
parameters estimated in the target model, the less restricted the model becomes
relative to the baseline model and the greater the penalty attached to the NFI.
As noted above, considerable attention has been paid to the development
and application of CFIs. The extant literature is replete with studies on the
behavior of these, and many other, CFIs. The major questions concern the
extent to which these indices are sensitive to sample size, method of estimation,
and distributional violations (Marsh, Balla, & McDonald, 1988). A detailed
account of the extant studies on the behavior of the CFIs is beyond the scope
of this chapter. An excellent review can be found in Hu and Bentler (1995).
Suffice to say, that use of comparative indices has not been without con-
troversy. In particular, Sobel and Bohrnstedt (1985) argued early on that these
indices are designed to compare one’s hypothesized model against a scientifi-
cally questionable baseline hypothesis. That is, the baseline hypothesis states
that the observed variables are completely uncorrelated with each other. Yet,
as Sobel and Bohrnstedt have pointed out, one would never seriously entertain
such a hypothesis, and that perhaps these indices should be compared with a
different baseline hypothesis. Unfortunately, the conventional practice of
structural equation modeling as represented in scholarly journals does not
suggest that these indices have ever been compared to anything other than the
baseline model of complete independence (see also Tanaka, 1993).
we provide the alternative fit indices described above and interpret the fit of
the model with respect to those indices. We will revisit our conclusions regard-
ing the fit of the model in Section 6.2 after the model has been modified.
Table 6.1 provides the TLI and CFI as described in Section 6.1. If we were
to evaluate the fit of the model on the basis of indices that compare the speci-
fied model against a baseline model of independence, we would conclude here
as well that the model does not fit the data well. That is, the TLI does not reach
or exceed the criterion of 0.95 for acceptable fit. Also, the CFI, which does not
rest on the assumption of a true population model but takes into account pop-
ulation noncentrality, does not suggest good fit.
Table 6.1 Selected Alternative Measures of Model Fit for the Initial Education
Indicators Model
χ² (df = 39)    1730.524    0.000
Tucker-Lewis index 0.792
Comparative fit indexa 0.844
RMSEA 0.081 0.000
RMSEA lower bound 0.077
RMSEA upper bound 0.084
AIC 209277.638
BIC 209420.573
not fit this covariance matrix perfectly. Further, let $\tilde{\Sigma}_0 = \Sigma(\Omega_0)$ be the best
fit of the model to the population covariance matrix. Finally, let $\hat{\Sigma} = \Sigma(\hat{\Omega})$ be
the best fit of the model to the sample covariance matrix S.
From these various fit measures, Browne and Cudeck (1993) defined three
types of discrepancies that are of relevance to our discussion. First, there
is $F(\Sigma_0, \tilde{\Sigma}_0)$, referred to as the discrepancy due to approximation, which mea-
sures the lack of fit of the model to the population covariance matrix. Second,
there is the discrepancy due to estimation, defined as $F(\tilde{\Sigma}_0, \hat{\Sigma})$, which measures
the difference between the model fit to the sample covariance matrix and the
model fit to the population covariance matrix. The discrepancy due to esti-
mation is unobserved but may be approximated by
$E[F(\tilde{\Sigma}_0, \hat{\Sigma})] = n^{-1}q,$ [6.5]
where q is the number of unknown parameters of the model (Browne &
Cudeck, 1993). Finally, there is the discrepancy due to overall error, defined
as $F(\Sigma_0, \hat{\Sigma})$, which measures the difference between the elements of the
population covariance matrix and the model fit to the sample covariance matrix.
Here too, this quantity is unobserved but may be approximated by
$E[F(\Sigma_0, \hat{\Sigma})] \approx F(\Sigma_0, \tilde{\Sigma}_0) + n^{-1}q,$ [6.6]
which is the sum of the discrepancy due to approximation and the discrepancy
due to error of estimation (Browne & Cudeck, 1993). For completeness of our
discussion we may wish to include the usual sample discrepancy function
$\hat{F} = F(S, \hat{\Sigma})$ defined in Chapter 2.
Measures of approximate fit are concerned with the discrepancy due to
approximation. Based on the work of Steiger and Lind (1980; see also Browne
& Cudeck, 1989), it is possible to assess approximate fit of a model in the pop-
ulation. The method of Steiger and Lind (1980) for measuring approximate fit
can be sketched as follows. First, it should be recalled that in line with statisti-
cal distribution theory, if the null hypothesis is true, then the likelihood ratio
possesses a central chi-square distribution with d degrees of freedom. If the
null hypothesis is false, which will almost surely be the case in most realistic
situations, then the likelihood ratio statistic has a noncentral chi-square
distribution with $d = \frac{1}{2}p(p + 1) - q$ degrees of freedom and noncentrality para-
meter λ. The noncentrality parameter serves to shift the central chi-square dis-
tribution to the right.
Continuing, let F0 be the population discrepancy value that would be
obtained if the model were fit to the population covariance matrix. Generally,
F0 > 0 unless the model fits the data perfectly, in which case F0 = 0. Further, let
$E(\hat{F}) = F_0 + \frac{d}{n},$ [6.7]
with the bias being d/n. The bias in $\hat{F}$ can be reduced by forming the estimator
$\hat{F}_0 = \hat{F} - \frac{d}{n}$ [6.8]
as the estimate of the error due to approximation (Browne & Cudeck, 1993).
From Equation [6.8], we now have a population discrepancy function $F_0$
and its estimator $\hat{F}_0$. However, an inspection of Equation [6.8] shows that $\hat{F}_0$
decreases with increasing degrees of freedom. Thus, to control for model
complexity, Steiger (1990; see also Steiger & Lind, 1980) defines a root mean
square error of approximation (RMSEA) as
$\varepsilon_a = \sqrt{\frac{F_0}{d}},$ [6.10]
with point estimate
$\hat{\varepsilon}_a = \sqrt{\frac{\hat{F}_0}{d}} = \sqrt{\max\left\{\frac{\hat{F} - n^{-1}d}{d},\ 0\right\}}.$ [6.11]
$H_0: \varepsilon \le 0.05.$ [6.12]
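Equations [6.11] and [6.12] translate directly into code. The sketch below computes the RMSEA point estimate from the model chi-square and evaluates H0: ε ≤ 0.05 by referring the observed chi-square to a noncentral chi-square whose noncentrality parameter corresponds to ε = 0.05. The sample size used in the example is assumed for illustration.

  rmsea <- function(chisq, df, N) {
    n    <- N - 1
    eps  <- sqrt(max((chisq - df) / (n * df), 0))   # point estimate, Equation [6.11]
    ncp0 <- n * df * 0.05^2                         # noncentrality implied by eps = .05
    c(RMSEA = eps,
      p_close = 1 - pchisq(chisq, df, ncp = ncp0))  # probability of close fit
  }
  rmsea(chisq = 1730.524, df = 39, N = 6677)   # chi-square and df from Table 6.1; N assumed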
Recall that the likelihood ratio (LR) discussed in Chapter 2 can be written
here as
When the goal is to use the AIC for model comparison and selection then
AIC(Ha) is common to all computations based on the same data and cancels
out of the comparisons. Therefore, AIC(H0) can be simply defined as
The use of the AIC requires fitting several competing models. As noted
above, the model with the lowest value of the AIC among the competing mod-
els is deemed to fit the data best from a predictive point of view. The smallest
value of the AIC is referred to as the minimum AIC (MAIC).
A particularly important feature of the AIC is that it can be used for com-
paring nonnested models. Nonnested models are quite different in terms of
their structural specification. However, as discussed below, the AIC can be used
to select from a series of nested models that are formed on the basis of relax-
ing constraints in the model.
$\frac{p(M_1 \mid Y)}{p(M_2 \mid Y)} = \frac{p(Y \mid M_1)}{p(Y \mid M_2)}\cdot\frac{p(M_1)}{p(M_2)} = B_{12}\,\pi_{12},$ [6.19]
where B12 is called the Bayes factor, and π12 is the prior odds of M1 relative to
M2. Note that in the case of neutral prior odds, the Bayes factor is the ratio of
the marginal likelihood of M1 to the marginal likelihood of M2. In the case
where the prior odds are not neutral, the Bayes factor is the ratio of the poste-
rior odds to the prior odds.
It is possible to avoid using the prior probabilities in Equation [6.19] and
still obtain a rough approximation to the Bayes factor. Specifically, we define
the BIC as
where q0 is the number of parameters under the null hypothesis and n is the
sample size. Following Raftery (1993), it can be shown that
$B_{12} = e^{-\frac{1}{2}\mathrm{BIC}_{12}}.$ [6.21]
Studies have shown that the BIC tends to penalize models with too many
parameters more harshly than the AIC.
$\mathrm{CVI} = F(S_V, \hat{\Sigma}_C),$ [6.22]
where SV is the sample covariance matrix from the validation sample, and Σ̂c is
the fitted covariance matrix from the calibration sample. This index measures
the extent to which the model fitted to the calibration sample also fits the
validation sample.
If we consider the expected value of the CVI over validation samples given
the calibration sample we have (Browne & Cudeck, 1993)
$E(\mathrm{CVI}) = E[F(S_V, \hat{\Sigma}_C) \mid \hat{\Sigma}_C] = F(\Sigma_0, \hat{\Sigma}_C) + n_V^{-1}p^*,$ [6.23]
where $n_V$ is the size of the validation sample and $p^* = \frac{1}{2}p(p + 1)$ is the number
of nonredundant elements in Σ. It can be seen that the CVI is a biased estimate
of the overall discrepancy, with the bias being $n_V^{-1}p^*$. Note that one cannot
remove the bias by simply subtracting $n_V^{-1}p^*$ from $F(S_V, \hat{\Sigma}_C)$ because in some
cases the resulting value of the fit function would take on an inadmissible
negative value. And, in any case, $n_V^{-1}p^*$ is the same for all competing models that
are fit to the calibration sample and would not change the rank ordering of the
competing models (Browne & Cudeck, 1993).
The above discussion assumes that one can split the sample to form a cal-
ibration sample and validation sample. Clearly, this is a disadvantage when one
is working with small samples. The problem arises from the fact that the over-
all error (defined above) is larger in small samples. Thus, it is desirable not to
split the sample but to develop a measure of cross-validation based only on the
calibration sample.
For the purposes of developing a single sample cross-validation index, we
must assume that the sample sizes for the calibration and the validation sam-
ples are the same. Then, it can be shown (Browne & Cudeck, 1993) that the
ECVI is approximately
$c = F(S, \hat{\Sigma}) + 2n^{-1}q.$ [6.25]
In practice, it is often the case that structural equation models are rejected on
the basis of the likelihood ratio chi-square and/or one or more of the alterna-
tive indices of fit described in Section 6.1. The reasons for model rejection are,
of course, many. The most obvious reasons include (a) violations of underly-
ing assumptions, such as normality and completely random missing data;
(b) incorrect restrictions placed on the model; and (c) sample size sensitivity.
The problem of violating assumptions was discussed in Chapter 5. There we
noted that in some cases violations of assumptions could be addressed—but we
argued that an explicit presentation of the assumptions was crucial when evalu-
ating the quality of the model. In addition, in Chapter 5, we noted that specifi-
cation errors in the form of incorrect restrictions was a pernicious problem and
intimately related to the issue of sample size sensitivity. In this section, we con-
sider the problem of modifying models to bring them closer in line with the data.
We consider model modification in light of statistical power thereby more for-
mally integrating the problem of specification error and sample size sensitivity.
When a model, such as our science achievement model, is rejected on the
basis of the LR chi-square test, attempts are usually made to modify the model
to bring it in line with the data. Assuming one has ruled out assumption viola-
tions such as nonnormality, missing data, and nonindependent observations
(Kaplan, 1990a), methods of model modification usually involve relaxing
restrictions in the model by freeing parameters that were fixed in the initial
specification. The decision to free such parameters is often guided by the size
of the LM statistic, which, as discussed in Chapter 2, possesses a one degree-of-
freedom chi-square distribution and gives the decrease in the overall LR chi-
square test when the parameter in question is freed. The LM test is also referred
to as the modification index (MI) (Sörbom, 1989).
For each restricted but potentially identified parameter, there exists a LM
test. Software programs generally list each of the LM test values and in some
cases will provide the largest LM value. The temptation, of course, is to relax
the fixed parameter associated with the maximum LM value. The difficulty
here is that the parameter associated with the largest LM value may not be one
that makes any substantive sense whatsoever (see, e.g., Kaplan, 1988).
Regardless of whether the parameter with the largest associated LM value
is relaxed first, typically more than one, and often many, modifications to the
model are made. Two problems exist when engaging in numerous model
modifications. First, extant simulation studies have shown that searching for
specification errors via the LM test does not always result in locating the speci-
fication errors imposed on a “true” model—that is, the model that generated
the covariance matrix (Kaplan, 1988, 1989c; Luijben, Boomsma, & Molenaar,
1987; MacCallum, 1986).
A second problem associated with unrestricted model modifications on
the basis of the LM test is the increase in the probability of Type II errors
resulting from the general goal of not rejecting the null hypothesis that the
model fits the data (Kaplan, 1989b; MacCallum, Roznowski, & Necowitz,
1992). In one sense, the way to mitigate the problem of Type II errors is to free
paths that have maximum power.
A general method for calculating the empirical power of the LR chi-square
test was given by Satorra and Saris (1985). Their approach can be outlined as
follows. First, estimate the model under the null hypothesis H0 and obtain the
LR chi-square statistic. Second, estimate a new model, call it H1 (not to be con-
fused with the unrestricted alternative Ha), which consists of the model under
H0 with parameters fixed at their maximum likelihood estimates obtained
from the first step but with the restriction of interest dropped and replaced
with an alternative “true” parameter value to be tested. Note that estimating the
model under H0 with parameters fixed at their maximum likelihood values will
yield a chi-square test value of 0 with degrees of freedom equal to the number
of degrees of freedom of the model. When estimating the H1 model under the
H0 specification, the resulting chi-square will no longer be 0. Indeed, the chi-
square statistic resulting from this test is distributed as a noncentral chi-square
distribution with noncentrality parameter

λ = ndε²,  [6.26]
where n = N − 1. Note that from the standpoint of the RMSEA, perfect fit implies that ε = 0, which in turn implies that the distribution of the test statistic is a central chi-square distribution because under the central chi-square distribution the noncentrality parameter λ = 0. However, using
Equation [6.26], MacCallum et al. (1996) showed that the hypothesis of close fit, ε ≤ 0.05, can be tested with a noncentral chi-square distribution with noncentrality parameter λ given in Equation [6.26].
But now suppose that the true value of ε is 0.08 (considered mediocre fit), and we test the hypothesis of close fit, ε ≤ 0.05. What is the power of this test? To be specific, let ε0 represent the null hypothesis of close fit and let εa represent the alternative hypothesis of mediocre fit. The distribution used to test the null hypothesis in this case is the noncentral chi-square distribution χ²_{d,λ0}, and the distribution used to test the alternative hypothesis is the noncentral chi-square distribution χ²_{d,λa}, where d are the degrees of freedom and λ0 and λa are the respective noncentrality parameters. From here, the power of the test of close fit is given as

π = 1 − Pr(χ²_{d,λa} ≤ χ²_c),  [6.27]

where χ²_c is the critical value of chi-square under the null hypothesis for a given Type I error probability α (MacCallum et al., 1996).
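A small numerical sketch of this power calculation is given below, assuming Python with SciPy; the sample size and degrees of freedom are illustrative, not values from the text.

```python
# A minimal sketch of the power of the test of close fit against mediocre fit,
# using the noncentrality parameters of Equation [6.26]. N and d are arbitrary.
from scipy.stats import ncx2

def power_close_fit(N, d, eps0=0.05, epsa=0.08, alpha=0.05):
    n = N - 1
    lam0 = n * d * eps0**2               # noncentrality under close fit
    lama = n * d * epsa**2               # noncentrality under mediocre fit
    crit = ncx2.ppf(1 - alpha, d, lam0)  # critical value under the null distribution
    return 1 - ncx2.cdf(crit, d, lama)   # probability of exceeding it under the alternative

print(power_close_fit(N=500, d=38))
```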
Given that the power of the test can be determined from values of N, d, ε0,
εa, and α, we can turn to the question of determining the sample size necessary
to achieve a desired level of power. MacCallum et al. (1996) use an indirect
approach of interval halving to solve the problem of assessing the sample size
necessary to achieve a desired level of power for testing close fit. The details of
this procedure can be found in their article. Suffice it to say here that the minimum
N necessary to achieve a desired level of power of the test of close fit against the
alternative of mediocre fit is a function of the degrees of freedom of the model
where models with large degrees of freedom require smaller sample sizes.
MacCallum et al. (1996) are careful to point out that their procedure must
be used with caution because, for example, models associated with a very large
number of degrees of freedom may yield required minimum sample sizes that
are smaller than the number of variables one has in hand. They also correctly
point out that their procedure is designed for omnibus testing of close fit and
that the sample size suggested for adequate power for the overall test of close
fit may not be adequate for testing parameter estimates. This concern is in line
with Kaplan (1989c), who found that power can be different in different parts
of the model even for the same size specification error. Thus, sample size effects
are not uniform throughout a model.
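The interval-halving idea can be sketched as a simple bisection over N using the same power calculation as above; this is an assumed illustration rather than the exact routine of MacCallum et al. (1996), and the target power and degrees of freedom below are arbitrary.

```python
# A minimal sketch of interval halving: bisect on N until the power of the test
# of close fit reaches a target level.
from scipy.stats import ncx2

def power_close_fit(N, d, eps0=0.05, epsa=0.08, alpha=0.05):
    lam0, lama = (N - 1) * d * eps0**2, (N - 1) * d * epsa**2
    return 1 - ncx2.cdf(ncx2.ppf(1 - alpha, d, lam0), d, lama)

def min_n_for_power(d, target=0.80, lo=10, hi=100000):
    while hi - lo > 1:                     # power increases with N, so bisection applies
        mid = (lo + hi) // 2
        if power_close_fit(mid, d) >= target:
            hi = mid
        else:
            lo = mid
    return hi

print(min_n_for_power(d=38))
```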
It is important to keep in mind that sample size exerts its influence on the likelihood ratio chi-square only when the model is false. To see this, recall again that the log-likelihood ratio test can be written as n × FML. Clearly, if the model fits perfectly, FML will be 0 and the sample size will have no effect. Sample size comes into play in its interaction with model misfit—where FML will then take on a value greater than zero. Thus, there is a need to gauge the relative effect of sample size against the degree of model misfit.
A method of gauging the influence of sample size and model misfit in the
context of power analysis is through the use of the expected parameter change
(EPC) statistic. The EPC was developed by Saris, Satorra, and Sörbom (1987)
as a means of gauging the size of a fixed parameter if that parameter were freed.
To motivate this statistic, let ωi be a parameter that takes on the value ω0 (usually zero) under the null hypothesis. Let dωi = ∂ ln L(Ω)/∂ωi evaluated at ω̂i, where ln L(Ω) is the log-likelihood function. Then, Saris et al. (1987) defined the EPC as

EPC = ω̂i − ω0 = MI / dωi.  [6.28]

The standardized expected parameter change (SEPC) is then defined as

θSEPC = θEPC [Var(ξ̂) / Var(η̂)]^{1/2},  [6.29]
First, consistent with the modification index, the addition of the path from SES to SCIACH resulted in a significant drop in the value of the likelihood ratio chi-square. Moreover, the selected set of indices presented in Table 6.3 show improvement, although the RMSEA remains statistically significant. Second, the addition of this path does not result in a substantial change in the other paths in the model. That is, the estimated effects in the initial model remain about the same.
Parameter              Estimate      SE      Est./SE      Std      StdYX
Measurement model
INVOLV BY
MAKEMETH 1.000 0.000 0.000 0.606 0.738
OWNEXP 0.724 0.027 26.469 0.439 0.605
CHOICE 0.755 0.029 26.036 0.458 0.507
CHALL BY
CHALLG 1.000 0.000 0.000 0.917 0.748
UNDERST 0.757 0.024 31.043 0.694 0.503
WORKHARD 0.867 0.026 32.921 0.795 0.723
Structural model
CHALL ON
INVOLV 0.251 0.027 9.282 0.166 0.166
INVOLV ON
CERTSCI 0.012 0.031 0.399 0.021 0.006
SCIGRA10 ON
CHALL 0.264 0.026 10.319 0.242 0.133
SCIACH ON
SCIGRA10 1.013 0.035 29.120 1.013 0.317
SES 2.456 0.086 28.710 2.456 0.313
SCIGRA10 ON
SCIGRA6 0.788 0.021 37.146 0.788 0.416
SES 0.240 0.028 8.734 0.240 0.098
CERTSCI 0.032 0.068 0.466 0.032 0.005
Selected goodness-of-fit indices
χ2 (df = 38) = 953.726, p < .000; TLI = 0.884; CFI = 0.915; RMSEA = 0.060, p < .05;
BIC = 208652.581
Parameter              Estimate      SE      Est./SE      Std      StdYX
Measurement model
INVOLV BY
MAKEMETH 1.000 0.000 0.000 0.606 0.738
OWNEXP 0.724 0.027 26.469 0.439 0.605
CHOICE 0.755 0.029 26.036 0.458 0.507
CHALL BY
CHALLG 1.000 0.000 0.000 0.917 0.748
UNDERST 0.757 0.024 31.043 0.694 0.503
WORKHARD 0.867 0.026 32.921 0.795 0.723
Structural model
CHALL ON
INVOLV 0.251 0.027 9.282 0.166 0.166
INVOLV ON
CERTSCI 0.012 0.031 0.399 0.021 0.006
SCIGRA10 ON
CHALL 0.264 0.026 10.319 0.242 0.133
SCIACH ON
SCIGRA10 0.748 0.037 20.019 0.748 0.235
SES 2.191 0.085 25.695 2.191 0.279
SCIGRA6 1.214 0.072 16.944 1.214 0.201
SCIGRA10 ON
SCIGRA6 0.788 0.021 37.146 0.788 0.416
SES 0.240 0.028 8.734 0.240 0.098
CERTSCI 0.032 0.068 0.466 0.032 0.005
Selected goodness-of-fit indices
χ2 (df = 37) = 675.121, p < .000; TLI = 0.917; CFI = 0.941; RMSEA = 0.051, p = 0.336,
BIC = 208382.783
6.3 Conclusion
Notes
probability distribution of molecules and showed that entropy was equal to the log-
probability of a statistical distribution.
7. A true distribution can be realized through Monte Carlo simulations.
8. Note that other software programs for structural equation modeling such as
AMOS (Arbuckle, 1999), LISREL (Jöreskog & Sörbom, 2000), and EQS (Bentler, 1995)
may provide more or fewer fit indices than Mplus.
7
Multilevel Structural Equation Modeling
When we carefully consider the problem of analyzing data arising from hierar-
chically nested systems, such as schools, it is clear that neither standard struc-
tural equation modeling nor standard multilevel modeling alone can give a
complete picture of the problem under investigation. Indeed, use of either
methodology separately could result in different but perhaps equally serious
specification errors. Specifically, using conventional structural equation model-
ing assuming simple random samples alone would ignore the sampling schemes
that are often used to generate educational data and would result in biased
structural regression coefficients (B. Muthén & Satorra, 1989). The use of mul-
tilevel modeling alone would preclude the analyst from studying complex indi-
rect and simultaneous effects within and across levels of the system. What is
required, therefore, is a method that combines the best of both methodologies.
The second method is the conventional ML estimator, which uses the second-
order derivatives of the observed log-likelihood. The third method is based on
a sandwich estimator derived from the information matrices of ML and MLF
and produces the correct asymptotic covariance matrix of the estimates that is
not dependent on the assumption of normality, and which also yields a robust
chi-square test of model fit. This estimator is referred to as MLR. The MLR is
a robust full information ML estimator for MLLVMs. A small simulation study
reported in Asparouhov and Muthén (2003) compared the ML and MLR
estimator to a mean-adjusted and mean- and variance-adjusted ML estimator.
The results demonstrated better performance of the MLR estimator for non-
normal variables than that obtained from the maximum likelihood estimator
with mean and variance adjustment.
(2001) has extended multilevel modeling to the item response theory context.
However, a full discussion of their work is beyond the scope of this chapter.
To begin, consider a model that decomposes a p-dimensional response
vector yig for student i in school g into the sum of a grand mean μ, a between-
groups part νg and a within-groups part uig. That is,

yig = μ + νg + uig.  [7.1]

The total sample covariance matrix for the response vector yig can be written as
ΣT = Σb + Σw , [7.2]
ȳ.g = (1/ng) Σ_{i=1}^{ng} yig,  [7.3]

ȳ = (1/N) Σ_{g=1}^{G} Σ_{i=1}^{ng} yig,  [7.4]

Sw = [1/(N − G)] Σ_{g=1}^{G} Σ_{i=1}^{ng} (yig − ȳ.g)(yig − ȳ.g)′,  [7.5]

Sb = [1/(G − 1)] Σ_{g=1}^{G} ng (ȳ.g − ȳ)(ȳ.g − ȳ)′,  [7.6]
where ȳ.g is the sample mean for group g, ȳ is the grand mean, Sw is the sample
pooled within-group covariance matrix, and Sb is the between-groups covari-
ance matrix.
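As a computational illustration (not from the text), the sketch below computes the pooled within-group and between-group covariance matrices of Equations [7.5] and [7.6] from a raw data matrix; the simulated data and group structure are arbitrary.

```python
# A minimal sketch of Equations [7.3]-[7.6] with NumPy; data are simulated.
import numpy as np

def within_between_cov(y, g):
    y, g = np.asarray(y, float), np.asarray(g)
    groups = np.unique(g)
    N, p = y.shape
    G = len(groups)
    grand_mean = y.mean(axis=0)                    # Equation [7.4]
    Sw, Sb = np.zeros((p, p)), np.zeros((p, p))
    for grp in groups:
        yg = y[g == grp]
        ybar_g = yg.mean(axis=0)                   # Equation [7.3]
        dev = yg - ybar_g
        Sw += dev.T @ dev                          # within-group cross-products
        d = (ybar_g - grand_mean)[:, None]
        Sb += len(yg) * (d @ d.T)                  # size-weighted between-group cross-products
    return Sw / (N - G), Sb / (G - 1)              # Equations [7.5] and [7.6]

rng = np.random.default_rng(1)
g = np.repeat(np.arange(10), 50)                   # 10 groups of 50 observations
y = rng.normal(size=(500, 3)) + np.repeat(rng.normal(size=(10, 3)), 50, axis=0)
Sw, Sb = within_between_cov(y, g)
print(Sw.round(2), Sb.round(2), sep="\n")
```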
As with the standard application of linear regression to data arising from
multistage sampling, the application of factor analysis should also account for
nested effects. For example, a battery of attitude items assessing student per-
ceptions of school climate administered to students will most likely exhibit
between-school variability. Ignoring the between-school variability in the
scores of students within schools will result in predictable biases in the para-
meters of the factor analysis model. Therefore, it is desirable to extend multi-
level methodology to the factor analysis framework.
To start, let the vector of student responses be expressed in terms of the
multilevel linear factor model as

yig = ν + Λwηwig + Λbηbg + εwig + εbg,  [7.7]

where yig was defined earlier, ν is the grand mean, Λw is the factor loading matrix
for the within-group variables, ηwig is a factor that varies randomly across units
within groups, Λb is the between-groups factor loading matrix, ηbg is a
factor that varies randomly across groups, and εwig and εbg are within- and between-group uniquenesses. Under the standard assumptions of linear factor analysis,
here extended to the multilevel case, the total sample covariance matrix defined in Equation [7.2] can be expressed in terms of factor model parameters as

ΣT = ΛwΦwΛ′w + Θw + ΛbΦbΛ′b + Θb,  [7.8]
where Φw and Φb are the factor covariance matrices for the within-group and
between-group parts and Θw and Θb are diagonal matrices of unique vari-
ances for the within-group and between-group parts.
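To see how the factor decomposition and Equation [7.2] fit together, the sketch below (with made-up loadings, factor variances, and uniquenesses for three items and a single factor at each level) builds the implied within- and between-group covariance matrices and their total.

```python
# A minimal sketch of the covariance structure implied by the multilevel factor
# model; all parameter values are illustrative.
import numpy as np

Lw = np.array([[1.00], [0.72], [0.76]])        # within-group loadings
Lb = np.array([[1.00], [1.37], [1.19]])        # between-group loadings
Phi_w, Phi_b = np.array([[0.29]]), np.array([[0.05]])
Theta_w = np.diag([0.30, 0.35, 0.40])          # within-group uniquenesses
Theta_b = np.diag([0.02, 0.03, 0.02])          # between-group uniquenesses

Sigma_w = Lw @ Phi_w @ Lw.T + Theta_w          # within-group implied covariance
Sigma_b = Lb @ Phi_b @ Lb.T + Theta_b          # between-group implied covariance
Sigma_T = Sigma_w + Sigma_b                    # total covariance, Equation [7.2]
print(Sigma_T.round(3))
```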
Generally speaking, it is usually straightforward to specify a factor struc-
ture for the within-school variables. It is also straightforward to allow for
within-school variables to vary between schools. Conceptual difficulties often
arise in warranting a factor structure to explain variation between groups. In
an example given in Kaplan and Kreisman (2000) examining student percep-
tions of school climate, two clear factors were extracted for the within-school
part, but the between-school part appeared to suggest one factor. The fact that
it is sometimes difficult to conceptualize a factor structure for the between-
groups covariance matrix does not diminish the importance of taking the
between-group variability into account when conducting a factor analysis on
multilevel structured data.
As noted above, multilevel regression models may not be suited for capturing
the structural complexity within and between organizational levels. For exam-
ple, it may be of interest to determine if school level variation in student
science achievement can be accounted for by school-level variables. Moreover,
one might hypothesize and wish to test direct and indirect effects of school
level exogenous variables on that portion of student-level achievement that
                                    Estimate   SE     Estimate   SE     Estimate   SE     Estimate   SE
Within-School Model
Calculating Mathematics
Train timetable 1.000 0.000 1.000 0.000 1.000 0.000 1.000 0.000
Discount % 1.187∗ 0.022 1.190∗ 0.022 1.136∗ 0.026 1.135∗ 0.025
Size (m2) of a floor 1.140∗ 0.023 1.140∗ 0.023 1.125∗ 0.027 1.124∗ 0.027
Graphs in newspaper 0.909∗ 0.021 0.908∗ 0.021 0.876∗ 0.026 0.875∗ 0.026
Distance on a map 1.184∗ 0.028 1.185∗ 0.028 1.113∗ 0.031 1.109∗ 0.033
Petrol consumption rate 0.881∗ 0.022 0.883∗ 0.022 0.905∗ 0.027 0.904∗ 0.027
Solving equations
3x + 5 = 17 1.000 0.000 1.000 0.000 1.000 0.000 1.000 0.000
2(x + 3)=(x + 3)(x − 3) 1.060∗ 0.015 1.059∗ 0.015 1.039∗ 0.020 1.036∗ 0.021
Calculating Mathematics −0.133∗ 0.016 −0.113∗ 0.024
on MALE
Solving Equations on MALE −0.039 0.023 0.001 0.038
Factor Covariances
Calculating Mathematics 0.286∗ 0.009 0.284∗ 0.009 0.197∗ 0.008 0.197∗ 0.008
with Solving Equations
Between-School Model
General Mathematics Emphasis
Train timetable 1.000 0.000 1.000 0.000
Discount % 1.373∗ 0.067 1.379∗ 0.068
Size (m2) of a floor 1.192∗ 0.062 1.195∗ 0.060
Graphs in newspaper 1.047∗ 0.063 1.053∗ 0.064
Distance on a map 1.460∗ 0.102 1.474∗ 0.105
Petrol consumption rate 0.752∗ 0.072 0.764∗ 0.074
3x + 5 = 17 1.808∗ 0.132 1.814∗ 0.143
2(x + 3)=(x + 3)(x − 3) 1.987∗ 0.136 1.994∗ 0.145
General Mathematics −0.057 0.051
Emphasis on MALE
Model Fit Indices
χ2 456.250 (19 df ) 526.500 (25 df ) 641.253 (39 df ) 670.784 (52 df )
AIC 86593.8 95130.6 85173.1 89797.8
BIC 86758.6 95308.9 85443.3 90088.2
NOTE: Unstandardized estimates are displayed. SE = standard error; AIC = Akaike information criterion; BIC = Bayesian
information criterion.
∗Values are statistically significant at p < .05.
[Figure: Path diagrams of the within-school and between-school models for the PISA 2003 mathematics items, relating gender to the Calculating Mathematics and Solving Equations factors (within school) and to the General Mathematics Emphasis factor (between school).]
varies over schools. We argue that these questions are important for a fuller
understanding of organizational systems and such questions can be addressed
via multilevel structural equation modeling.
For ease of notation and development of concepts, we focus our discus-
sion on multilevel path analysis. By focusing on this model, we are assuming
that reliable and valid measures of the variables are available. We recognize that
this assumption may be unreasonable for most social and behavioral science
research, but as shown in the previous section, multilevel measurement mod-
els exist that allow one to examine heterogeneity in measurement structure.
Indeed, as a matter of modeling strategy, it may be very informative to exam-
ine heterogeneity in measurement structure prior to forming scales to be used
in multilevel path analysis. However, it is possible to combine multilevel path
models and measurement models into a comprehensive multilevel structural
equation model.
The model that we will consider allows for varying intercepts and varying
structural regression coefficients. Earlier work on multilevel path analysis by
Kaplan and Elliott (1997a) building on the work of B. Muthén (1989) specified
a structural model for varying intercepts only. This “intercepts as outcomes”
model was applied to a specific educational problem in Kaplan and Elliott
(1997b) and Kaplan and Kreisman (2000).
In what follows, we write the within-school (Level-1) full structural equa-
tion model as
Note how Equations [7.10] to [7.12] allow for randomly varying inter-
cepts and two types of randomly varying slopes—namely, Bg are randomly
varying slopes relating endogenous variables to each other and Γg are ran-
domly varying slopes relating endogenous variables to exogenous variables.
These randomly varying structural coefficients are modeled as functions of a
set of between-school predictors zg and wg. These between-school predictors
appear in Equations [7.10] to [7.12], but their respective regression coefficients
are parameterized to reflect a priori structural relationships.
Of particular importance for substantive research is the fact that the full mul-
tilevel path model allows for a set of structural relationships among between-
school endogenous and exogenous variables, which we can write as
zg = τ + Δzg + Ωwg + δg,  [7.13]
where τ, Δ, and Ω are the fixed structural effects. Finally, ε, ζ, θ, and δ are dis-
turbance terms that are assumed to be normally distributed with mean zero
and covariance matrix T with elements
T = [ σ²ε
      σζε   σ²ζ
      σθε   σθζ   σ²θ
      σδε   σδζ   σδθ   σ²δ ].  [7.14]
After a series of substitutions we can obtain the reduced form of the Level-1
and Level-2 models and express yig as a function of a grand mean, the main effect
of within-school variables, the main effect of between-school variables, and the
cross-level moderator effects of between- and within-school variables. These
reduced form effects contain the structural relations as specified in Equations [7.9]
through [7.13]. The importance of this model is that if w consists of variables that could, in principle, be manipulated in the context of a hypothetical experiment, then this model could be used to test cross-level causal hypotheses taking into account the structural relationships between and within levels.1
Although this discussion has focused on multilevel structural equation modeling with manifest variables, it is relatively straightforward to specify a multilevel structural equation model among latent variables. A review of the extant literature has not uncovered an application of the full model described here using latent variables, except in the context of the analysis of longitudinal data, which is described next.
The data for this example come from the PISA 2003 survey (OECD, 2004). The final outcome variable at the student level was
a measure of mathematics achievement (MATHSCOR).2 Mediating predictors
of mathematics achievement consisted of whether students enjoyed mathe-
matics (ENJOY) and whether students felt mathematics was important in life
(IMPORTNT). Student exogenous background variables included students’
perception of teacher qualities (PERTEACH), as well as both parents’ educa-
tional levels (MOMEDUC & DADEDUC). At the school level, a model was
specified to predict the extent to which students are encouraged to achieve
their full potential (ENCOURAG). A measure of teachers’ enthusiasm for their
work (ENTHUSIA) was viewed as an important mediator variable between
background variables and encouragement to make students achieve full poten-
tial. The variables used to predict encouragement via teachers’ enthusiasm
consisted of math teachers’ use of new methodology (NEWMETHO), consen-
sus between math teachers with regard to school expectations and teaching
goals as they pertain directly to mathematics instruction (CNSENSUS), and
the teaching conditions of the school (CNDITION). The teaching condition
variable was computed from the shortage of school’s equipment, so higher val-
ues on this variable reflect a worse condition.
A diagram of the multilevel path model is shown in Figure 7.2. The dia-
gram is drawn to convey the fact that the intercepts of the endogenous vari-
ables, ENJOY, IMPORTNT, and MATHSCOR and the slope of MATHSCOR
on ENJOY are regressed on the endogenous and exogenous school level vari-
ables. The results of the multilevel path analysis are displayed in Table 7.2.
First, we estimated the intraclass correlations to determine the amount of
variation in the student-level variables that can be accounted for by differences
between schools. We found intraclass correlations (not shown) ranging from a low of 0.02 for the importance of math in one's life to a high of 0.259 for mathematics achievement. Under the heading "Within School," we find that MOMEDUC, DADEDUC, ENJOY, and IMPORTNT are significant and positive predictors of MATHSCOR. We also observe that ENJOY is significantly and positively predicted by PERTEACH. Finally, MOMEDUC, PERTEACH, and ENJOY are positive and significant predictors of IMPORTNT.
Of importance to this chapter are the results under the heading “Between
School.” Here, we find that the resource conditions of the school (CNDITION)
and the extent to which the school encourages students to use their full poten-
tial (ENCOURAG) are both significant predictors of math achievement.
Enjoyment of mathematics is significantly related to whether there is consensus among mathematics teachers with regard to expectations and teaching goals. Importance of mathematics is related to the resource conditions of the school. Teacher enthusiasm for their work significantly predicts the extent to which they encourage students to use their full potential. Enthusiasm is pre-
dicted by use of new methods for teaching math and the extent of consensus
Within-School Model                     Estimate     SE
ENJOY on
  PERTEACH                               0.457∗     0.026
IMPORTNT on
  MOMEDUC                                0.026∗     0.006
  PERTEACH                               0.245∗     0.021
  ENJOY                                  0.534∗     0.015

Between-School Model                    Estimate     SE
MATHSCOR on
  NEWMETHO                               6.806      6.550
  ENTHUSIA                             −14.081      8.881
  CNSENSUS                               2.407      7.898
  CNDITION                               3.366      6.683
  ENCOURAG                              14.594      7.299
ENJOY on
  NEWMETHO                               0.008      0.025
  ENTHUSIA                               0.016      0.038
  CNSENSUS                               0.109∗     0.036
  CNDITION                               0.019      0.025
  ENCOURAG                              −0.035      0.024
IMPORTNT on
  NEWMETHO                              −0.027      0.019
  ENTHUSIA                               0.028      0.031
  CNSENSUS                               0.057      0.030
  CNDITION                               0.044∗     0.020
  ENCOURAG                               0.002      0.020
ENCOURAG on
  ENTHUSIA                               0.579∗     0.086
ENTHUSIA on
  NEWMETHO                               0.164∗     0.044
  CNSENSUS                               0.323∗     0.067
  CNDITION                              −0.042      0.040

NOTE: Unstandardized estimates are displayed. SE = standard error. ∗Values are statistically significant at p < .05.
[Figure 7.2: Diagram of the multilevel path model relating the within-school variables (MOMEDUC, DADEDUC, PERTEACH, ENJOY, IMPORTNT, MATHSCOR) and the between-school variables (NEWMETHO, CNSENSUS, CNDITION, ENTHUSIA, ENCOURAG), including a random slope.]
The resource conditions of the school also predict the random slope relating enjoyment of math to math achievement, where poorer conditions of the school lower the relationship between enjoyment of math and math achievement.
The models described earlier in this chapter account for the multilevel nature
of organizations such as schools. So far, though, it has been assumed that ran-
dom samples from each level of analysis have been obtained. However, it is not
often the case that random samples from each level of the organization are
obtained, but rather it is more likely that samples are taken from each level of
the system with unequal probabilities, often due to the necessity of oversam-
pling underrepresented units of analysis.
Such complex sampling is common with large-scale national and interna-
tional assessments. For example, in the Early Childhood Longitudinal Study
(ECLS-K) (NCES, 2001), a three-stage sampling design is employed. The first stage consists of primary sampling units made up of single counties or groups of counties. The second stage is schools within counties. The third stage of sampling is students
within schools. A process of stratification as well as disproportionate sampling
was employed in the first two stages of sampling. Disproportionate sampling
was also employed at the third stage of sampling. Clearly, therefore, to obtain
unbiased estimates of population parameters, some type of weighting scheme
to reflect these design features must be used.
The problem of using sampling weights in complex sample designs has
been widely studied in the sampling literature (Kish, 1965; Kish & Frankel,
1974; Tryfos, 1996) and will not be discussed in detail in this chapter. Of con-
cern to us, however, is that the issue of sampling weights applied in complex
sample surveys has recently been discussed in the literature on structural
equation modeling (Asparouhov, 2005; Kaplan & Ferguson, 1999; Stapleton,
2002). This section overviews the recent literature on the incorporation of
sampling weights in the structural equation modeling framework and distills
the important findings and recommendations that are relevant for the appli-
cation of structural equation modeling to complex sample surveys.
The first systematic study of sampling weights in the structural equation
modeling framework was conducted by Kaplan and Ferguson (1999). In their
study, Kaplan and Ferguson considered the case of single sample factor analy-
sis applied to a simple random sample taken from a population with a mixture
of strata of different sizes. In their design, Kaplan and Ferguson assumed that
the size of the population and the size of the strata were known to the inves-
tigator, but due to the unequal strata sizes, it is necessary to apply sample
weights.
To motivate the central ideas in the Kaplan and Ferguson (1999) study,
note first that a weight wi is simply the inverse of the probability of sample
selection—namely,
wi = 1/pi,  [7.15]
where pi is the probability of sample selection. With pi = n/N, where n is the size
of the strata and N is the population size, the weights sum to the population
sample size N.
The weighting scheme studied by Kaplan and Ferguson (1999) was based
on the Horvitz-Thompson estimator (Horvitz & Thompson, 1952). The cen-
tral idea of the Horvitz-Thompson estimator is that when the inclusion prob-
abilities are known, raw sampling weights can be computed and applied to
analyses so that unbiased estimates of population parameters can be obtained.
A well-known disadvantage to the use of raw sampling weights is that they
sum to the population sample size. Clearly, this would have profound effects on
the size of standard errors, but in the latent variable context, there would also
be profound inflation of goodness-of-fit indices based on the likelihood ratio
chi-square. Therefore, it may be preferable to use normalized sampling weights
that sum to the actual sample size.
In their analysis, Kaplan and Ferguson (1999) compared raw sampling
weights to normalized sampling weights using a factor analysis model.
Specifically, they employed the PRELIS software program (Jöreskog &
Sörbom, 2000) to compute weighted variances and covariances that followed
a factor analysis model. The weighted variances and covariances are calcu-
lated as
Varw(x) = Σi wi(xi − x̄)² / Σi wi  [7.16]

and

Covw(x, y) = Σi wi(xi − x̄)(yi − ȳ) / Σi wi,  [7.17]
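A small sketch (assumed, not from the Kaplan and Ferguson study) illustrates raw versus normalized weights and the weighted covariance of Equation [7.17]; the selection probabilities and data are illustrative. Note that rescaling the weights leaves the point estimate unchanged, so the choice of raw versus normalized weights matters mainly for standard errors and chi-square-based fit statistics.

```python
# A minimal sketch of raw and normalized sampling weights and the weighted
# covariance; all inputs are made-up values.
import numpy as np

p = np.array([0.01, 0.01, 0.05, 0.05, 0.05])    # selection probabilities, Equation [7.15]
w_raw = 1.0 / p                                  # raw weights
w_norm = w_raw * len(w_raw) / w_raw.sum()        # normalized weights sum to the sample size

x = np.array([1.2, 0.8, 2.3, 1.9, 2.1])
y = np.array([0.5, 0.7, 1.4, 1.2, 1.6])

def weighted_cov(x, y, w):
    xbar, ybar = np.average(x, weights=w), np.average(y, weights=w)
    return np.sum(w * (x - xbar) * (y - ybar)) / np.sum(w)   # Equation [7.17]

print(weighted_cov(x, y, w_raw), weighted_cov(x, y, w_norm))  # identical point estimates
```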
The results of their study showed that ignoring sampling weights led to much greater bias in the population parameter values compared with when either raw or normalized sampling weights are employed. Standard errors appeared to be somewhat underestimated when sampling weights are employed. Interestingly, Kaplan and Ferguson found that although the normalized weighting procedure yielded likelihood ratio chi-square values close to the true value, the remaining goodness-of-fit indices showed no discernible pattern regardless of weighting. Kaplan and Ferguson concluded that, in terms of applications of latent variable modeling, incorporating sampling weights should be routine practice when the weights are provided or otherwise known. Weighting is crucial for accurate inferences in latent variable models, with normalized sampling weights showing potential with regard to standard errors.
Although the Kaplan and Ferguson (1999) study may have been the first
to systematically examine the use of sampling weights in the latent variable
modeling situation, their approach did not consider sampling weights in mul-
tilevel structural equation models of the sort discussed in Sections 7.2 and 7.3.
Rather, more recent studies have extended that work to multilevel structural
equation modeling.
A recent study by Stapleton (2002) provided a systematic examination
of sampling weights employed in multilevel structural equation models.
Stapleton focused on three forms of weighting based on work by Potthoff,
Woodbury, and Manton (1992). The first form of weighting employs standard
raw weights and produces a sampling variance based on the population sam-
ple size. The second form of weighting employs relative weights that sum to the actual sample size but have been shown to yield downward bias in estimates of sampling variance (Potthoff et al., 1992). The third weighting scheme produces weights that sum to the effective sample size and has been shown by Potthoff et al. to yield unbiased estimates of the sampling variance of the
mean. Stapleton argued that the use of the effective sample size weights may
address a conjecture by Kaplan and Ferguson that the underestimation of stan-
dard errors using relative weights was due to not adjusting the standard errors
in the process of ML estimation.
In a detailed simulation study using a prototype multilevel structural
equation model, Stapleton (2002) corroborated the findings of Kaplan and
Ferguson (1999) regarding raw and relative sampling weights but found that
effective sampling weights yielded relatively robust estimates without the need
to adjust standard errors.
A more recent study by Stapleton (2006) examined five approaches for
obtaining robust estimates and standard errors when structural equation
modeling is applied to data from complex sampling designs. The approaches
included (a) robust maximum likelihood estimation ignoring stratification
Table 7.3 Results of Confirmatory Factor Analysis (CFA) of PISA 2003 Mathematics
Assessment Using Student Sampling Weights
Single-Level CFA Without Weights Single-Level CFA With Weights
Within-School Model
Calculating Mathematics
Train timetable 1.000 0.000 1.000 0.000 1.000 0.000 1.000 0.000
Discount % 1.187∗ 0.022 1.190∗ 0.022 1.179∗ 0.023 1.182∗ 0.023
Size (m2) of a floor 1.140∗ 0.023 1.140∗ 0.023 1.141∗ 0.022 1.140∗ 0.022
Graphs in newspaper 0.909∗ 0.021 0.908∗ 0.021 0.904∗ 0.021 0.903∗ 0.021
Distance on a map 1.184∗ 0.028 1.185∗ 0.028 1.185∗ 0.028 1.189∗ 0.028
Petrol consumption rate 0.881∗ 0.022 0.883∗ 0.022 0.880∗ 0.022 0.884∗ 0.022
Solving equations
3x + 5 = 17 1.000 0.000 1.000 0.000 1.000 0.000 1.000 0.000
2(x + 3) = (x + 3)(x − 3) 1.060∗ 0.015 1.059∗ 0.015 1.060∗ 0.016 1.060∗ 0.015
Factor Covariances
Calculating Mathematics 0.286∗ 0.009 0.284∗ 0.009 0.284∗ 0.009 0.282∗ 0.009
with Solving Equations
NOTE: SE = standard error; AIC = Akaike information criterion; BIC = Bayesian information criterion.
∗Values are statistically significant at p < .05.
7.5 Conclusion
recently given in Kaplan, Kim, and Kim (in press). However, the linkage of
multilevel latent variable models with finite mixture modeling is richer than
that considered in the Kaplan et al. chapter—allowing for models of, say, students nested within schools, where there might exist unobserved heterogeneity among schools that can be captured by finite mixture modeling. In the final analysis, multilevel latent variable modeling and its special cases provide a natural framework for cross-sectional studies. In the next chapter, we consider latent growth curve modeling for longitudinal data.
Notes
8
Latent Growth Curve Modeling
T hus far, the examples used to motivate the utility of structural equation
modeling have been based on cross-sectional data. Specifically, it has been
assumed that the data have been obtained from a sample of individuals mea-
sured at one point in time. Although it may be argued that most applications
of structural equation modeling are applied to cross-sectional data, it can also
be argued that most social and behavioral processes under investigation are
dynamic, that is, changing over time. In this case, cross-sectional data constitute
only a snapshot of an ongoing dynamic process and interest might naturally
center on the study of this process.
Increasingly, social scientists have access to longitudinal data that can pro-
vide insights into how outcomes of interest change over time. Indeed many
important data sets now exist that are derived from panel studies (e.g., NCES,
1988; NELS:88; The National Longitudinal Study; The Longitudinal Study of
American Youth; and the Early Childhood Longitudinal Study; to name a few).
Access to longitudinal data allows researchers to address an important class of
substantive questions—namely, the growth and development of social and
behavioral outcomes over time. For example, interest may center on the devel-
opment of mathematical competencies in young children (Jordan, Hanich, &
Kaplan, 2003a, 2003b; Jordan, Kaplan, & Hanich, 2002). Or, interest may cen-
ter on growth in science proficiency over the middle school years. Moreover,
in both cases, interest may focus on predictors of individual growth that are
assumed to be invariant across time (e.g., gender) or that vary across time (e.g.,
a student’s absenteeism rate during a school year).
This chapter considers the methodology of growth curve modeling—a
procedure that has been advocated for many years by researchers such as
Raudenbush and Bryk (2002); Rogosa, Brandt, and Zimowski, (1982); and
Willett (1988) for the study of intraindividual differences in change (see also
Willett & Sayer, 1994). The chapter is organized as follows. First, we consider
[Figure: IRT science achievement scores plotted across the five LSAY waves.]

[Figure: Science attitude scores plotted across the five LSAY waves.]
We estimate initial status in science achievement and the rate of change over time and link these parameters of growth to time-varying and time-invariant variables. In this example, such predictors will include student gender as well as teacher and parental push
variables. However, in addition to applying univariate growth curve models, we
also examine how these outcomes vary together in a multivariate growth curve
application.
The specification of growth models can be viewed as falling within the class of
multilevel linear models (Raudenbush & Bryk, 2002), where Level 1 represents
intraindividual differences in initial status and growth, and Level-2 models
individual initial status and growth parameters as a function of interindividual
differences.
To fix ideas, consider a growth model for a continuous variable such as science achievement. We can write a Level-1 equation expressing outcomes over time within an individual as

yip = π0p + π1p ti + εip,  [8.1]

where yip is the achievement score for person p at time i, π0p represents the initial status at time t = 0, π1p represents the growth trajectory, ti represents a temporal dimension that here is assumed to be the same for all individuals—such as grade level, and εip is the disturbance term. Later in this chapter, we consider more flexible alternatives to specifying time metrics.
Quadratic growth can also be incorporated into the model by extending the specification as

yip = π0p + π1p ti + π2p ti² + εip.  [8.2]

The Level-2 model relates the growth parameters to time-invariant characteristics of individuals, for example,

π0p = μπ0 + γπ0 xp + ζ0p  [8.3]

and

π1p = μπ1 + γπ1 xp + ζ1p,  [8.4]

where μπ0 and μπ1 are intercept parameters representing population true status and population growth when xp is zero; γπ0 and γπ1 are slopes relating xp to initial status and growth, respectively.
The model specified above can be further extended to allow individuals to
be nested in groups such as classrooms. In this case, classrooms become a
Level-3 unit of analysis. Finally, the model can incorporate time-varying pre-
dictors of change. In the science achievement example, such a time-varying
predictor might be changes in parental push or changes in attitudes toward
science over time. Thus, this model can be used to study such issues as the
influence of classroom-level characteristics and student-level invariant and
varying characteristics on initial status and growth in reading achievement
over time.
Research by B. Muthén (1991) and Willett and Sayer (1994) has shown how the general growth model described in the previous section can also be incor-
porated into a structural equation modeling framework. In what follows, the
specification proposed by Willett and Sayer (1994) is described. The broad
details of the specification are provided; however, the reader is referred to
Willett and Sayer’s (1994) article for more detail.
The Level-1 individual growth model can be written in the form of the
factor analysis measurement model in Equation [4.24] of Chapter 4 as
y = τy + Λy η + ε, [8.5]
where y is a vector representing the empirical growth record for person p. For
example, y could be science achievement scores for person p at the 7th, 8th, 9th,
10th, and 11th grades.
In this specification, τy is an intercept vector with elements fixed to zero
and Λy is a fixed matrix containing a column of ones and a column of constant
time values. Assuming that time is centered at the seventh grade,1 the time
constants would be 0, 1, 2, 3, and 4. The vector η contains the initial status and
growth rate parameters denoted as π0p and π1p, and the vector ε contains
measurement errors, where it is assumed that Cov(ε) is a diagonal matrix of
constant measurement error variances. Because this specification results in the
initial status and growth parameters being absorbed into the latent variable vec-
tor η, which vary randomly over individuals, this model is sometimes referred to
as a latent variable growth model (B. Muthén, 1991). The growth factors, as in the
multilevel specification, are random variables.
η = α + Bη + ζ, [8.6]
x = τx + Λx ξ + δ, [8.7]
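To make the specification concrete, the sketch below (a hypothetical illustration, not the author's code or the LSAY estimates) builds the fixed loading matrix Λy for five waves centered at the seventh grade and computes the mean vector and covariance matrix of the repeated measures implied by Equations [8.5] and [8.6] when there are no regressions among the growth factors (B = 0); the growth-factor means, covariance matrix, and error variance are made-up values.

```python
# A minimal sketch of the implied moments of a linear latent growth curve model.
import numpy as np

time_scores = np.array([0, 1, 2, 3, 4])                 # Grades 7-11, centered at 7
Lambda_y = np.column_stack([np.ones(5), time_scores])   # [1 | t] loading matrix

alpha = np.array([50.5, 2.2])        # illustrative means of initial status and growth
Psi = np.array([[60.0, -3.0],
                [-3.0,  1.5]])       # illustrative covariance matrix of the growth factors
Theta = 25.0 * np.eye(5)             # constant measurement error variances

mu_y = Lambda_y @ alpha                          # implied means of the repeated measures
Sigma_y = Lambda_y @ Psi @ Lambda_y.T + Theta    # implied covariance matrix
print(mu_y)
print(Sigma_y.round(1))
```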
[Figure: Path diagram of the linear latent growth curve model, with loadings on the initial status factor fixed to 1 and loadings on the growth rate factor fixed to 0, 1, 2, 3, and 4.]
Column 1 of Table 8.1 presents the results of the simple linear growth curve model with time centered at the seventh grade. The results indicate that the average seventh grade science achievement score is 50.51 and increases an average of 2.21 points a year. The correlation between the initial status and rate of change is negative, suggesting the possibility of a ceiling effect. Figure 8.1 presents a random sample of 50 model-estimated science achievement trajectories.
Column 2 of Table 8.1 presents the results of the linear growth curve
model with gender as a time-invariant predictor of initial status and growth
rate. A path diagram of this model is shown in Figure 8.4. The results indicate
a significant difference in favor of boys for seventh grade science achievement,
but no significant difference between boys and girls in the rate of change over
the five grades.
Column 3 of Table 8.1 presents the results of the linear growth curve model
with the time-varying covariates of PAP and STP included. The results for
gender remain the same. A path diagram of this model is shown in Figure 8.5.
The results for the time-varying covariates suggest that early PAP is a stronger
predictor of early science achievement compared with STP. However, the
effects of both time-varying covariates balance out at the later grades.
[Figure 8.4: Path diagram of the growth curve model with gender as a time-invariant predictor of initial status and growth rate.]
Growth in Attitudes Toward Science. Column 1 of Table 8.2 shows the results of
the simple linear growth curve model applied to the science attitude data. Path
diagrams for this and the remaining models are not shown. The results show a
seventh grade average attitude score of 14.25 points (on a scale of 1 to 20) and
a small but significant decline over time. Moreover, a strong negative correla-
tion can be observed between initial science attitudes and the change over time.
This suggests, as with achievement, that higher initial attitudes are associated
with slower change in attitudes over time.
[Figure 8.5: Path diagram of the growth curve model with gender and the time-varying covariates.]
Table 8.2 Selected Results of Growth Curve Model of Attitudes Toward Science
Effect                     Maximum Likelihood Estimates
model, the nonlinear curve fitting approach suggested by Meredith and Tisak would require that the first loading be fixed to zero to estimate the intercept, that the second loading be fixed to one to identify the metric of the slope factor, and that the third through fifth loadings be freely estimated. In this case, the time metrics are being empirically determined. When this type of model is estimated, it perhaps makes better sense to refer to the slope factor as a shape factor.
Following Meredith and Tisak (1990), we fix the first and second loadings as in the conventional growth curve
modeling case and free the loadings associated with the third, fourth, and fifth
waves of the study. The results are displayed in Table 8.4. It is clear from an
inspection of Table 8.4 that the nonlinear curve fitting model results in a sub-
stantial improvement in model fit. Moreover, we find that there are significant
sex differences with respect to the intercept in the nonlinear curve fitted model.
Intercept by
Ach1 1.000 1.000
Ach2 1.000 1.000
Ach3 1.000 1.000
Ach4 1.000 1.000
Ach5 1.000 1.000
Shape by
Ach1 0.000 0.000
Ach2 1.000 1.000
Ach3 3.351 3.299
Ach4 3.928 3.869
Ach5 5.089 5.004
Ach. intercept 50.360∗ 49.966∗
Ach. shape 1.693∗ 1.770∗
r(shape, intercept) −0.397∗ −0.398∗
Intercept on
Male 0.737∗
Shape on
Male −0.091
BIC 111240.859 115769.718
∗p < .05.
It is not difficult to make the case for specifying an ALT model for devel-
opmental research studies. Consider the example used throughout this chapter
where the focus is on modeling the development of science proficiency
through the middle and high school years. We can imagine that interest centers
on how change in science proficiency predicts later outcomes of educational
relevance—such as majoring in science-related disciplines in college. It is not
unreasonable, therefore, to assume that in addition to overall growth in science
proficiency, prior science scores predict later science scores, thus suggesting an
autoregressive structure.
In the case of long periods between assessment waves, we might reasonably expect smaller autoregressive coefficients than with more closely spaced assessment waves. Nevertheless, if the ALT model represents the true data gen-
erating structure, then omission of the autoregressive part may lead to sub-
stantial parameter bias. A recent article by Sivo, Fan, and Witta (2005) found
extensive bias for all parameters of the growth curve model as well as biases in
measures of model fit when a true autoregressive component was omitted
from the analysis.
For the purposes of this chapter, we focus on the baseline lag-1 ALT
model with a time-invariant predictor. This will be referred to as the
ALT(1) model. The ALT(1) specification indicates that the outcome at time
t is predicted only by the outcome at time t − 1. It should be noted that lags greater than one can also be specified. As with conventional growth curve
modeling, the ALT model can be extended to include more than one out-
come, each having its own autoregressive structure, as well as extensions
that include proximal or distal outcomes and time-varying and time-
invariant predictors.
To contextualize the study, consider the example of an ALT model for the
development of reading competencies in young children. The first model is a
baseline lag-1 ALT model. This model can be written in structural equation
modeling notation as
y = α + Λη + By + δ, [8.8]
η = τ + Γη + ζ, [8.9]
[Figure: Path diagram of the ALT(1) model, combining initial status and growth rate factors with autoregressive paths among the repeated measures.]
Ach5 ON
Ach4 0.135∗ 0.135∗
Ach4 ON
Ach3 0.103∗ 0.103∗
Ach3 ON
Ach2 0.102∗ 0.102∗
Ach2 ON
Ach1 0.031∗ 0.031∗
Ach. intercept 50.335∗ 49.911∗
Ach. slope 0.246∗ 0.329∗
r(slope, intercept) −0.575∗ −0.573∗
Intercept on
Male 0.814∗
Slope on
Male −0.156
BIC 111195.653 115723.216
∗p < .05.
A second approach is to exploit the inherent missing data structure. In this case,
we could arrange the data as shown in Table 8.6 patterned after Bollen and
Curran (2006, p. 77). Notice that there are three cohorts and five time points.
Any given child in this example can provide between one and four repeated
measures. The pattern of missing data allows estimation using maximum like-
lihood imputation under the assumption of missing-at-random (Allison,
1987; Arbuckle, 1996; Muthén et al., 1987). Thus, the growth parameters span-
ning the entire time span can be estimated.
As Bollen and Curran (2006) point out, however, this approach suffers
from the potential of cohort effects. That is, children in Cohort 1 may have
been 7 years old at the second wave of assessment, but children in Cohort 2
would have been 7 at the first wave of assessment.
Cohort    Age of Assessment
1         6    7    8    9
2         7    8    9    10
3         8    9    10   11
It may be useful to consider if there are aspects of model fit that are pertinent
to the questions being addressed via the use of latent growth curve models.
Clearly, we can apply traditional statistical and nonstatistical measures of fit,
such as the likelihood ratio chi-square, RMSEA, NNFI, or the like. In many
cases, the Bayesian information criterion is used to compare latent growth
curve models as well. However, these measures of fit are capturing whether the
restrictions that are placed on the data to provide estimates of the initial status
and growth rate are supported by the data. In addition, these measures
are assessing whether such assumptions as non-autocorrelated errors are sup-
ported by the data.
The application of traditional statistical and nonstatistical measures of fit
does provide useful information. However, because growth curve models pro-
vide estimates of rates of change, it may be useful to consider whether the
model predicted growth rate fits the empirical trajectory over time. So, for
example, if we know how science achievement scores have changed over the
five waves of LSAY, we may wish to know if our growth curve model accurately
predicts the known growth rate. In the context of economic forecasting, this
exercise is referred to as ex post simulation. The results of an ex post simulation exercise are particularly useful when the goal of modeling is to make forecasts of future values.
To evaluate the quality and utility of latent growth curve models, Kaplan
and George (1998) studied the use of six different ex post (historical) simula-
tion statistics originally proposed by Theil (1966) in the domain of economet-
ric modeling. These statistics evaluate different aspects of the growth curve.
The first of these statistics discussed by Kaplan and George was the root mean
square simulation error (RMSSE), defined as

RMSSE = [ (1/T) Σ_{t=1}^{T} (y^s_t − y^a_t)² ]^{1/2},  [8.10]

where y^s_t and y^a_t are the simulated (model-predicted) and actual values of the growth record at time t, respectively. A related measure is the root mean square percent error (RMSPE), defined as

RMSPE = [ (1/T) Σ_{t=1}^{T} ((y^s_t − y^a_t)/y^a_t)² ]^{1/2}.  [8.11]
A problem with the RMSPE is that its scale is arbitrary. Although the lower
bound of the measure is zero, the upper bound is not constrained. Thus, it is
of interest to scale the RMSSE to lie in the range of 0 to 1. A measure that lies
between 0 and 1 is Theil’s inequality coefficient, defined as
U = { (1/T) Σ_{t=1}^{T} (y^s_t − y^a_t)² }^{1/2} / [ { (1/T) Σ_{t=1}^{T} (y^s_t)² }^{1/2} + { (1/T) Σ_{t=1}^{T} (y^a_t)² }^{1/2} ].  [8.12]
Theil's inequality coefficient can be decomposed into three components. The bias proportion is defined as

U^M = (ȳ^s − ȳ^a)² / [ (1/T) Σ_{t=1}^{T} (y^s_t − y^a_t)² ],  [8.13]

where ȳ^s and ȳ^a are the means of the simulated and actual growth record,
respectively, calculated across the T time periods. The bias proportion provides
a measure of systematic error because it considers deviations of average actual
values from average simulated values (Pindyck & Rubinfeld, 1991).
The ideal would be a value of U^M = 0. Values greater than 0.1 or 0.2 are
considered problematic.
Another component of Theil’s U is the variance proportion defined as
U^S = (σs − σa)² / [ (1/T) Σ_{t=1}^{T} (y^s_t − y^a_t)² ],  [8.14]
where σs and σa are the standard deviations of the respective growth records
calculated across the T time periods. The variance proportion provides a
measure of the extent to which the model tracks the variability in the growth
record. If U^S is large, it suggests that the actual (or simulated) growth record
varied a great deal while the simulated (or actual) growth record did not
deviate by a comparable amount.
A final measure based on the decomposition of the inequality coefficient
is the covariance proportion, defined as
U^C = 2(1 − ρ)σs σa / [ (1/T) Σ_{t=1}^{T} (y^s_t − y^a_t)² ],  [8.15]

where ρ is the correlation between the simulated and actual growth records. The three proportions decompose the inequality coefficient such that

U^M + U^S + U^C = 1.  [8.16]
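A compact sketch of these forecasting statistics is given below; it is an assumed illustration, and the two growth records are made-up numbers rather than the LSAY results.

```python
# A minimal sketch of the ex post simulation statistics in Equations [8.10]-[8.16].
import numpy as np

def theil_statistics(y_sim, y_act):
    y_sim, y_act = np.asarray(y_sim, float), np.asarray(y_act, float)
    mse = np.mean((y_sim - y_act) ** 2)                         # mean squared simulation error
    rmsse = np.sqrt(mse)                                        # Equation [8.10]
    rmspe = np.sqrt(np.mean(((y_sim - y_act) / y_act) ** 2))    # Equation [8.11]
    U = rmsse / (np.sqrt(np.mean(y_sim**2)) + np.sqrt(np.mean(y_act**2)))  # Equation [8.12]
    s_s, s_a = y_sim.std(), y_act.std()                         # SDs over the T periods
    rho = np.corrcoef(y_sim, y_act)[0, 1]
    UM = (y_sim.mean() - y_act.mean()) ** 2 / mse               # bias proportion, [8.13]
    US = (s_s - s_a) ** 2 / mse                                 # variance proportion, [8.14]
    UC = 2 * (1 - rho) * s_s * s_a / mse                        # covariance proportion, [8.15]
    return rmsse, rmspe, U, UM, US, UC                          # UM + US + UC = 1

print(theil_statistics([50.8, 53.0, 55.1, 57.3, 59.4],
                       [50.5, 52.9, 55.4, 57.1, 59.6]))
```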
Table 8.7 Observed and Predicted Science Achievement Means and Forecasting
Statistics for the Science Achievement Model
[Figure: Observed and model-predicted science achievement means, Grades 7 through 11.]
8.6 Conclusion
Notes
1. Clearly, other choices of centering are possible. Centering will not affect the
growth rate parameter but will affect the initial status parameter.
2. LSAY was a National Science Foundation funded national longitudinal study of
middle and high school students. The goal of LSAY was to provide a description of
students’ attitudes toward science and mathematics focusing also on these areas as pos-
sible career choices (Miller et al., 1992, p. 1).
9
Structural Models for Categorical and Continuous Latent Variables
T his chapter describes what can be reasonably considered the state of the
art in structural equation modeling—namely, structural equation models
that combine categorical and continuous latent variables for cross-sectional
and longitudinal designs. The comprehensive modeling framework described
in this chapter rests on the work of B. Muthén (2002, 2004), which builds on
the foundations of finite mixture modeling (e.g., McLachlan & Peel, 2000) and
conventional structural equation modeling for single and multiple groups as
described in Chapter 4.
It is beyond the scope of this chapter to describe every special case that
can be accommodated by the general framework. Rather, this chapter
touches on a few key methods that tie into many of the previous chapters.
The organization of this chapter is as follows. First, we set the stage for the
applications of structural equation modeling for categorical and continuous
latent variables with a brief review of finite mixture modeling and the
expectation-maximization (EM) algorithm, following closely the discussion
given in McLachlan and Peel (2000). This is followed by a discussion of
applications of finite mixture modeling for categorical outcomes leading to
latent class analysis and variants of Markov chain modeling. Next, we dis-
cuss applications of finite mixture modeling to the combination of contin-
uous and categorical outcomes, leading to growth mixture modeling. We
focus solely on growth mixture modeling because this methodology encom-
passes structural equation modeling, factor analysis, and growth curve
modeling for continuous outcomes. The chapter closes with a brief overview
of other extensions of the general framework that relate to previous chapters
of this book.
The approach taken to specifying models that combine categorical and con-
tinuous latent variables is finite mixture modeling. Finite mixture modeling
relaxes the assumption that a sample is drawn from a population characterized
by a single set of parameters. Rather, finite mixture modeling assumes that the
population is composed of a mixture of unobserved subpopulations charac-
terized by their own unique set of parameters.
To fix notation, let z = (z′1, z′2, . . . , z′n)′ denote the realized values of a p-dimensional random vector Z = (Z′1, Z′2, . . . , Z′n)′ based on a random sample of size n. An element Zi of the vector Z has an associated probability density function f(zi). Next, define the finite mixture density as
f(zi) = Σ_{k=1}^{K} πk fk(zi),  (i = 1, 2, . . . , n; k = 1, 2, . . . , K),  [9.1]

where πk are the mixing proportions, which are nonnegative and sum to 1, and fk(zi) are the component densities.
The elements of π defined earlier arise from the fact that

Pr{Ci = ci} = π1^{c1i} π2^{c2i} ⋯ πK^{cKi}.  [9.3]

The posterior probability that zi belongs to component k can then be written as

τk(zi) = πk fk(zi) / f(zi),  (i = 1, 2, . . . , n; k = 1, 2, . . . , K).  [9.4]
When the component densities are indexed by a parameter vector, the mixture density can be written as

f(zi; Ω) = Σ_{k=1}^{K} πk fk(zi; θk),  (i = 1, 2, . . . , n; k = 1, 2, . . . , K),  [9.5]
π = (π1, π2, . . . , πK).  [9.7]
context of incomplete data problems (Dempster et al., 1977; see also Little & Rubin, 2002). However, it was soon recognized that a wide array of statistical models, including finite mixture models and the latent class model, could be conceptualized as incomplete data problems. Specifically, in the context of finite mixture models, the component label vector c is not observed. The EM algorithm proceeds by specifying the complete-data vector—the observed data augmented by the unobserved component labels—for which the complete-data log likelihood can be written as

log Lcomp(Ω) = Σ_{k=1}^{K} Σ_{i=1}^{n} cik {log πk + log fk(zi | θk)},  [9.9]
where cik is an element of c. The form of Equation [9.9] shows the role of cik as
an indicator of whether individual i is a member of class k.
The EM algorithm involves two steps. The E-step begins by taking the
conditional expectation of Equation [9.9] given the observed data z using
the current estimates of Ω based on a set of starting values, say Ω(0).
Following McLachlan and Peel (2000), the conditional expectation is writ-
ten as

Q(Ω, Ω^(0)) = E_{Ω^(0)} { log Lcomp(Ω) | z }.  [9.10]

Let Ω(m) be the updated value of Ω after the mth iteration of the EM algo-
rithm. Then the E-step on the (m + 1)th iteration calculates Q(Ω, Ω(m)).
With regard to the class-label vector c, the E-step of the EM algorithm
computes the conditional expectation of Cik given z, where Cik is an element of
C. Specifically, on the (m + 1)th iteration, the E-step computes

E_{Ω(m)}(Cik | z) = τk(zi; Ω(m)).  [9.11]

The M-step then updates the mixing proportions as

πk^{(m+1)} = Σ_{i=1}^{n} τk(zi; Ω(m)) / n,  (k = 1, 2, . . . , K).  [9.12]
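As an illustration of these steps (not tied to the models in this chapter), the sketch below runs the E- and M-steps for a univariate two-component normal mixture with simulated data; the component form, starting values, and iteration count are arbitrary.

```python
# A minimal sketch of the EM steps in Equations [9.4], [9.9], and [9.12] for a
# two-component normal mixture; data and starting values are illustrative.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
z = np.concatenate([rng.normal(0, 1, 300), rng.normal(3, 1, 200)])

pi = np.array([0.5, 0.5])            # mixing proportions
mu = np.array([-1.0, 1.0])           # component means
sd = np.array([1.0, 1.0])            # component standard deviations

for _ in range(100):
    # E-step: posterior class probabilities tau_k(z_i), Equation [9.4]
    dens = np.vstack([p * norm.pdf(z, m, s) for p, m, s in zip(pi, mu, sd)])
    tau = dens / dens.sum(axis=0)
    # M-step: update mixing proportions (Equation [9.12]) and component parameters
    pi = tau.mean(axis=1)
    mu = (tau * z).sum(axis=1) / tau.sum(axis=1)
    sd = np.sqrt((tau * (z - mu[:, None])**2).sum(axis=1) / tau.sum(axis=1))

print(pi, mu, sd)
```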
In this section, we discuss models for categorical latent variables, with applica-
tions to cross-sectional and longitudinal designs. This section is drawn from
Kaplan (in press). To motivate the use of categorical latent variables consider the
problem of measuring reading ability in young children. Typical studies of read-
ing ability measure reading on a continuous scale. Using the methods of item
response theory (see, e.g., Hambleton & Swaminathan, 1985), reading measures
are administered to survey participants on multiple occasions, with scores
equated in such a way as to allow for a meaningful notion of growth. However,
in large longitudinal studies such as the Early Childhood Longitudinal Study
(NCES, 2001), not only are continuous scale scores of total reading proficiency
available for analyses but also mastery scores for subskills of reading. For exam-
ple, a fundamental subskill of reading is letter recognition. A number of items
constituting a cluster that measures letter recognition are administered, and,
according to the ECLS-K scoring protocol, if the child receives 3 out of 4 items
in the cluster correct, then the child is assumed to have mastered the skill with
mastery coded “1” and nonmastery coded as “0.” Of course, there exist other,
more difficult, subskills of reading, including beginning sounds, ending sounds,
sight words, and words in context with subskill cluster coded for mastery.
Assume for now that these subskills tap a general reading ability factor. In
the context of factor analysis, a single continuous factor can be derived that
would allow children to be placed somewhere along the factor. Another approach
might be to derive a factor that serves to categorize children into mutually exclu-
sive classes on the latent reading ability factor. Latent class analysis is designed to
accomplish this categorization.
Latent classes are, in essence, categorical factors arising from the pattern of response frequencies to categorical items, where the response frequencies play a role similar to that of the correlation matrix in factor analysis (Collins, Hyatt, & Graham, 2000). The analogues of factor loadings are probabilities associ-
ated with responses to the manifest indicators given membership in the latent
class. Unlike continuous latent variables, categorical latent variables serve to
partition the population into discrete groups based on response patterns
derived from manifest categorical variables.
P_{ijklm} = \sum_{c=1}^{C} \pi_c^{x}\, \pi_{ic}^{A|x}\, \pi_{jc}^{B|x}\, \pi_{kc}^{C|x}\, \pi_{lc}^{D|x}\, \pi_{mc}^{E|x},   [9.13]

where \pi_c^{x} is the probability that a randomly selected child will belong to latent
class c (c = 1, 2, . . . , C) of the categorical latent variable ξ, \pi_{ic}^{A|x} is the
conditional probability of response i to variable A given membership in latent
class c, and \pi_{jc}^{B|x}, \pi_{kc}^{C|x}, \pi_{lc}^{D|x}, and \pi_{mc}^{E|x} are likewise the conditional probabilities for
items B, C, D, and E, respectively. For this example, the manifest variables are
dichotomously scored, and so there are two response options for each item.2
Identification of a latent class model is typically achieved by imposing the
constraint that the latent classes and the response probabilities that serve as
indicators of the latent classes sum to 1.0—namely, that
\sum_c \pi_c^{x} = \sum_i \pi_{ic}^{A|x} = \sum_j \pi_{jc}^{B|x} = \sum_k \pi_{kc}^{C|x} = \sum_l \pi_{lc}^{D|x} = \sum_m \pi_{mc}^{E|x} = 1.0,   [9.15]
where the first term on the left-hand side of Equation [9.15] indicates that the
latent class proportions must sum to 1.0, and the remaining terms on the left-
hand side of Equation [9.15] denote that the latent class indicator variables
sum to 1.0 as well (McCutcheon, 2002).3
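As a small illustration of how Equation [9.13] maps class proportions and conditional response probabilities onto the probability of an observed response pattern, consider the sketch below. The three classes, the five items, and all parameter values are hypothetical, chosen only to mimic the structure of the reading example.

```python
import numpy as np

# Hypothetical latent class parameters for five dichotomous items (A-E),
# in the form of Equation [9.13]: rows = latent classes c, columns = items;
# entries are P(item passed | class c).
class_props = np.array([0.6, 0.3, 0.1])                   # pi_c^x, sums to 1.0
pass_probs = np.array([[0.50, 0.05, 0.01, 0.00, 0.00],    # class 1
                       [0.95, 0.85, 0.50, 0.05, 0.00],    # class 2
                       [0.99, 0.99, 0.95, 0.90, 0.40]])   # class 3

def pattern_probability(pattern):
    """P(response pattern) = sum_c pi_c * prod over items of P(response | class c)."""
    pattern = np.asarray(pattern)
    item_probs = np.where(pattern == 1, pass_probs, 1.0 - pass_probs)
    return float(np.sum(class_props * item_probs.prod(axis=1)))

# Probability of passing the first two items and failing the remaining three
print(pattern_probability([1, 1, 0, 0, 0]))
```

Summing this function over all 32 possible response patterns returns 1.0, which is the constraint expressed in Equation [9.15].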
To continue with our reading example, suppose that we hypothesize that
the latent class variable ξ is a measure of reading ability with three classes (1 =
advanced reading ability, 2 = average reading ability, and 3 = beginning reading
ability). Assume also that we have a random sample of first semester kinder-
garteners. Then, we might find that a large proportion of kindergartners in the
sample who show mastery of letter recognition (items A and B, both coded 1/0)
are located in the beginning reading ability class. A smaller proportion of
kindergartners demonstrating mastery of ending sounds and sight words might
be located in the average reading ability class, and still fewer might be located in
the advanced reading class. Of course at the end of kindergarten and hopefully
by the end of first grade, we would expect to see the relative proportions shift.4
Table 9.1 presents the response probabilities measuring the latent classes
for each wave of the study separately. The interpretation of this table is similar
to the interpretation of a factor loading matrix. The pattern of response probabilities
across the subsets of reading tests suggests the labels given to the latent
classes—namely, low alphabet knowledge (LAK), early word reading (EWR), and
early reading comprehension (ERC). The extreme differences
across time in the likelihood ratio chi-square tests are indicative of sparse cells,
particularly occurring at spring kindergarten. For the purposes of this chapter,
I proceed with the analysis without attempting to ameliorate the problem.
Table 9.1   Response Probabilities and Class Proportions for Separate Latent Class Models: Total Sample

Latent Class    LR^b    BS      ES      SW      WIC     Class Proportions    χ²_LR (29 df)
Fall K
  LAK^c         0.47    0.02    0.01    0.00    0.00    0.67                 3.41
  EWR           0.97    0.87    0.47    0.02    0.00    0.30
  ERC           1.00    0.99    0.98    0.97    0.45    0.03
Spring K
  LAK           0.56    0.06    0.00    0.00    0.00    0.24                 4831.89*
  EWR           0.99    0.92    0.63    0.05    0.00    0.62
  ERC           0.00    0.99    0.99    0.96    0.38    0.14
Fall First
  LAK           0.52    0.08    0.01    0.00    0.00    0.15                 11.94
  EWR           1.00    0.92    0.71    0.05    0.03    0.59
  ERC           1.00    0.99    0.98    0.98    0.42    0.26
Spring First
  LAK           0.19    0.00    0.00    0.00    0.00    0.04                 78.60*
  EWR           0.98    0.90    0.79    0.35    0.00    0.18
  ERC           1.00    0.99    0.98    0.99    0.60    0.78

a. Response probabilities are for passed items. Response probabilities for failed items can be computed from 1 − prob(mastery).
b. LR = letter recognition, BS = beginning sounds, ES = ending letter sounds, SW = sight words, WIC = words in context.
c. LAK = low alphabet knowledge, EWR = early word reading, ERC = early reading comprehension.
* p < .05. Extreme value likely due to sparse cells.
The last column of Table 9.1 presents the latent class membership pro-
portions across the four ECLS-K waves for the full sample. We see that in fall
of kindergarten, approximately 67% of the cases fall into the LAK class,
whereas only approximately 3% of the cases fall into the ERC class. This breakdown
of proportions can be compared with the results for Spring of first grade;
by that time, only 4% of the sample is in the LAK class, whereas approximately
78% of the sample is in the ERC class.
The example of latent class analysis given in the previous sections presented
results over the waves of the ECLS-K but treated each wave cross-sectionally.
Nevertheless, it could be seen from Table 9.1 that response probabilities did
change over time, as did the latent class membership proportions. Given these
changes, a precise approach to characterizing change in latent class membership
over time is needed, and that is the focus of this section. We begin by describing a general approach
to the study of change in qualitative status over time via Markov chain model-
ing, extended to the case of latent variables. This is followed by a discussion of
latent transition analysis, a methodology well-suited for the study of stage-
sequential development.
\chi^2 = \sum_{ijkl} \frac{(F_{ijkl} - f_{ijkl})^2}{f_{ijkl}},   [9.16]

where F_{ijkl} are the observed frequencies of the I \times J \times K \times L contingency table and f_{ijkl}
are the expected cell counts. The degrees of freedom are obtained by subtracting
the number of parameters to be estimated from the total number of cells of the
contingency table that are free to vary.
In addition to the Pearson chi-square test, a likelihood ratio statistic can
be obtained that is asymptotically distributed as chi-square, where the degrees
of freedom are calculated as with the Pearson chi-square test. Finally, the
Akaike information criterion (AIC) and Bayesian information criterion (BIC)
discussed in Chapter 6 can be used to choose among competing models.
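For a concrete sense of how these fit statistics are computed once a model has been fitted, the sketch below evaluates the Pearson chi-square of Equation [9.16] and the likelihood ratio statistic for a set of observed and model-implied cell counts. The counts shown are placeholders rather than values from the ECLS-K analyses.

```python
import numpy as np

def fit_statistics(observed, expected, n_params):
    """Pearson chi-square (Equation [9.16]) and likelihood ratio chi-square
    for a frequency table, with degrees of freedom obtained by subtracting
    the number of estimated parameters from the number of free cells."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)

    pearson = np.sum((observed - expected) ** 2 / expected)

    # G^2 = 2 * sum F * log(F / f); cells with an observed count of 0 contribute 0.
    mask = observed > 0
    lr = 2.0 * np.sum(observed[mask] * np.log(observed[mask] / expected[mask]))

    df = observed.size - 1 - n_params
    return pearson, lr, df

obs = [120, 30, 25, 25]          # illustrative 2 x 2 table, flattened
exp = [112.5, 37.5, 32.5, 17.5]  # hypothetical model-implied counts
print(fit_statistics(obs, exp, n_params=2))
```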
The parameter τ^{21}_{j|i} represents the transition probability from time 1 to time 2 for those in category j given
that they were in category i at the beginning of the study. The parameter τ^{32}_{k|j} represents
the transition probability from time 2 to time 3 for those in category k given
that they were in category j at the previous time point. Finally, the parameter τ^{43}_{l|k} is
the transition probability from time 3 to time 4 for those in category l given
that they were in category k at the previous time point.
The manifest Markov model can be specified to allow transition probabil-
ities to be constant over time or to allow transition probabilities to differ over
time. The former is referred to as a stationary Markov chain while the latter is
referred to as a nonstationary Markov chain.
P_{ijkl} = \sum_{a=1}^{A}\sum_{b=1}^{B}\sum_{c=1}^{C}\sum_{d=1}^{D} \delta_a^1\, \rho_{i|a}^1\, \tau_{b|a}^{21}\, \rho_{j|b}^2\, \tau_{c|b}^{32}\, \rho_{k|c}^3\, \tau_{d|c}^{43}\, \rho_{l|d}^4,   [9.18]
Table 9.2   Results of the Nonstationary Manifest Markov Chain Model Applied to Mastery of Ending Sounds

a. 1 = nonmastery, 2 = mastery.
b. χ²_P refers to the Pearson chi-square test; χ²_LR refers to the likelihood ratio chi-square test.
latent—in the sense of not being directly observed but possibly measured by
numerous manifest indicators. The advantage of measuring latent variables with
multiple indicators lies in the well-known benefits for reliability and validity.
Therefore, it might be more realistic to specify multiple manifest categorical
indicators of the categorical latent variable and combine them with Markov
chain models.
The combination of multiple indicator latent class models and Markov
chain models provides the foundation for the latent transition analysis of stage-
sequential dynamic latent variables. In line with Collins and Flaherty (2002),
consider the current reading example where the data provide information on
the mastery of five different skills. At any given point in time, a child has mas-
tered or not mastered one or more of these skills. It is reasonable in this exam-
ple to postulate a model that specifies that these reading skills are related in
such a way that mastery of a later skill implies mastery of all preceding skills.
At each time point, the child’s latent class membership defines his or her latent
status. The model specifies a particular type of change over time in latent sta-
tus. This is defined by Collins and Flaherty (2002) as a “model of stage-sequential
development, and the skill acquisition process is a stage-sequential dynamic
latent variable” (p. 289). It is important to point out that there is no funda-
mental difference between latent transition analysis and latent Markov chain
modeling. The difference is practical, with latent transition analysis being
perhaps better suited conceptually for the study of change in developmental
status.
The model form for latent transition analysis uses Equation [9.18] except
that model estimation is undertaken with multiple indicators of the latent cat-
egorical variable. The appropriate measurement model for categorical latent
variables is the latent class model.
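The way Equation [9.18] assembles initial proportions, response probabilities, and transition probabilities into a cell probability can be sketched as follows for a two-state latent Markov model with a single dichotomous indicator at each of four waves. All parameter values, and the simplification of a time-constant transition matrix, are assumptions made for the illustration; a latent transition analysis would replace the single indicator with the latent class measurement model just described.

```python
import numpy as np

# Hypothetical parameters in the form of Equation [9.18].
delta = np.array([0.7, 0.3])                 # initial latent state proportions
rho = np.array([[0.9, 0.1],                  # P(item response | latent state);
                [0.2, 0.8]])                 # rows = states, cols = 0/1 response
tau = np.array([[0.8, 0.2],                  # transition matrix, assumed
                [0.1, 0.9]])                 # constant over time for brevity

def cell_probability(responses):
    """P(i, j, k, l): sum over the latent states a, b, c, d in Equation [9.18]."""
    prob = 0.0
    for a in range(2):
        for b in range(2):
            for c in range(2):
                for d in range(2):
                    prob += (delta[a] * rho[a, responses[0]] *
                             tau[a, b] * rho[b, responses[1]] *
                             tau[b, c] * rho[c, responses[2]] *
                             tau[c, d] * rho[d, responses[3]])
    return prob

print(cell_probability((0, 0, 1, 1)))   # e.g., fail, fail, pass, pass
```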
In Table 9.4, the results of the latent transition probabilities for the full latent
transition model are provided. On the basis of the latent transition analysis, we see
that for those in the LAK class at Fall kindergarten, 30% are predicted to remain
in the LAK class, while 69% are predicted to move to the EWR class and 1% are
predicted to transition to ERC in Spring kindergarten. Among those in the EWR
class at Fall kindergarten, 66% are predicted to remain in that class, and 34% of
the children are predicted to transition to the ERC class in Spring kindergarten.
Among those children who are in the LAK class at Spring Kindergarten,
59% are predicted to remain in that class at Fall of first grade, while 40% are
predicted to transition to the EWR class, with 1% predicted to transition to the
ERC class. Among those children who are in the EWR class at Spring kindergarten,
82% are predicted to stay in the EWR class, while 18% are predicted to
transition to the ERC class by Fall of first grade.
Finally, among those children who are in the LAK class in Fall of first grade,
30% are predicted to remain in that class at Spring of first grade, while 48% are
predicted to transition to the EWR class by Spring of first grade, with 22%
Table 9.4   Transition Probabilities From Fall Kindergarten to Spring First Grade

                    Spring K
Fall K        LAK     EWR     ERC
LAK           0.30    0.69    0.01
EWR           0.00    0.66    0.34
ERC           0.00    0.00    1.00

a. LAK = low alphabet knowledge, EWR = early word reading, ERC = early reading comprehension.
transitioning to the ERC class. Among those children in the EWR class at fall
of first grade, 13% are assumed to remain in that class with 86% transitioning
to the ERC class by Spring of first grade.
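One way to see what the transition probabilities in Table 9.4 imply is to push a vector of class proportions through the transition matrix, as in the sketch below. The Fall kindergarten proportions are taken from Table 9.1; comparing the result with the separately estimated Spring kindergarten proportions is only a rough check, since those come from the cross-sectional latent class models rather than from the transition model itself.

```python
import numpy as np

# Fall kindergarten class proportions (Table 9.1) and the Fall K -> Spring K
# transition matrix (Table 9.4); rows and columns ordered LAK, EWR, ERC.
fall_k = np.array([0.67, 0.30, 0.03])
transition = np.array([[0.30, 0.69, 0.01],
                       [0.00, 0.66, 0.34],
                       [0.00, 0.00, 1.00]])

# Predicted Spring K class proportions implied by the transition model
spring_k_predicted = fall_k @ transition
print(spring_k_predicted.round(3))   # approximately [0.201, 0.660, 0.139]
```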
P_{ijkl} = \sum_{s=1}^{S}\sum_{a=1}^{A}\sum_{b=1}^{B}\sum_{c=1}^{C}\sum_{d=1}^{D} \pi_s\, \delta_{a|s}^1\, \rho_{i|as}^1\, \tau_{b|as}^{21}\, \rho_{j|bs}^2\, \tau_{c|bs}^{32}\, \rho_{k|cs}^3\, \tau_{d|cs}^{43}\, \rho_{l|ds}^4,   [9.19]
Table 9.5   Transition Probabilities for the Mover-Stayer Model: Total Sample
Proportion of Movers and Stayers (Rows) by Time 1 Classes (Columns), Total Sample

a. LAK = low alphabet knowledge, EWR = early word reading, ERC = early reading comprehension.
and
ηi = αc + Bc ηi + Γc xi + ζi , [9.21]
For the baseline model, which did not include the covariate of poverty level, we
settled on retaining three growth mixture classes. A plot of the three classes can
be found in Figure 9.2.
From Table 9.6 and Figure 9.2, we labeled the first latent class, consisting of
35.5% of our sample, as “below average developers.” Students in this class evidenced
a spring kindergarten mean math achievement score of 23.201, a linear
growth rate of 1.317, and a deceleration in growth of .005. We labeled the
second latent class, comprising 58.3% of our sample, as “average developers.”
Students in this class evidenced a spring kindergarten mean math achievement
score of 33.646, a linear growth rate of 1.890, and a deceleration of
.006. Finally, we labeled the third latent class, consisting of 35.5% of our sample,
as “above average developers.” Students in this class evidenced a spring
kindergarten mean math achievement score of 54.308, a linear growth rate of
1.988, and a deceleration of −.016.
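To visualize what these class-specific estimates imply, one can evaluate the quadratic growth function for each class over the assessment window, as in the sketch below. The time coding (equally spaced occasions starting at spring kindergarten) and the treatment of the reported deceleration as a negative quadratic coefficient are assumptions made only for this illustration; the intercept, linear, and deceleration point estimates are the ones reported in the text.

```python
import numpy as np

# Class-specific growth parameters reported in the text: intercept (spring K
# mean), linear rate, and deceleration (entered here as a negative quadratic
# coefficient, an assumption).  Time coding is also an assumption.
classes = {
    "below average": (23.201, 1.317, -0.005),
    "average":       (33.646, 1.890, -0.006),
    "above average": (54.308, 1.988, -0.016),
}

t = np.arange(0, 7)   # hypothetical time scores
for label, (intercept, slope, quad) in classes.items():
    trajectory = intercept + slope * t + quad * t ** 2
    print(label, trajectory.round(1))
```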
When poverty level was added into the growth mixture model, three latent
classes were again identified.8 The above average developer class started signifi-
cantly above their peers and continued to grow at a rate higher than the rest of
their peers. Interestingly, the above average achiever group was composed
entirely of students living above the poverty line. The average achiever group
Figure 9.2   The Three-Class Growth Mixture Model (Baseline): Math IRT Score by Times of Assessment (Fall K, Spring K, Fall 1st, Spring 1st, Spring 3rd) for the Average, Above Average, and Below Average Developer Classes
was composed of both students who lived above and below the poverty line.
The below average achiever group was composed disproportionately of below
poverty students but did contain some above poverty students. A plot of the
three-class solution with poverty added to the model can be found in Figure 9.3.
Figure 9.3 The Three-Class Growth Mixture Model With Poverty Status Added
Table 9.7 Average Posterior Probabilities for the Three-Class Solution for
Baseline Model
NOTE: Class 1 = average developing; Class 2 = above average; Class 3 = below average.
Table 9.8 Average Posterior Probabilities for the Three-Class Solution With
Poverty Status Included
NOTE: Class 1 = average developing; Class 2 = above average; Class 3 = below average.
9.6 Conclusion
Notes
1. From here on, we will use the term “class” to refer to components of the mix-
ture model. The term is not to be confused with latent classes (e.g., Clogg, 1995),
although finite mixture modeling can be used to obtain latent classes (McLachlan &
Peel, 2000).
2. Note that latent class models can handle polytomously scored items.
3. For dichotomous items, it is only necessary to present the value of one latent
class indicator.
4. Methods for assessing latent class membership over time are discussed later in
this chapter.
5. The sampling design of ECLS-K included a 27% subsample of the total sample
at Fall of first grade, which reduced the cost burden of following the entire sample for
four waves while still allowing for the study of summer learning loss (NCES, 2001).
6. A nonstationary Markov model is one that allows heterogeneous transition
probabilities over time. In contrast, stationary Markov models assume homogeneous
transition probabilities over time.
7. It should be noted that finite mixture modeling has been applied to continuous
growth curve models under the name general growth mixture models (B. Muthén, 2004).
These models have been applied to problems in the development of reading competen-
cies (Kaplan, 2002) and math competencies (Jordan, Kaplan, Nabors-Olah, & Locuniak,
2006).
8. It is sometimes the case that adding covariates can change the number of mix-
ture classes. See Kaplan (2002) for an example of this problem in the context of reading
achievement.
9. This is an admittedly simple explanation. The CACE approach makes very
important assumptions—including random assignment and stable unit treatment
value (Jo & Muthén, 2001).
10
Epilogue
Toward a New Approach to the
Practice of Structural Equation Modeling
A s stated in the Preface, one goal of this book was to provide the reader
with an understanding of the foundations of structural equation model-
ing and hopefully to stimulate the use of the methodology through examples
that show how structural modeling can illuminate our understanding of social
reality—with problems in the field of education serving as motivating exam-
ples. At this point, we revisit the question of whether structural equation
Four steps almost completely describe it: a model is postulated, data gathered,
a regression run, some t-statistics or simulation performance provided and
another empirical regularity was forged.
I argue that one response to this critique, offered by Spanos (1986, 1990,
1995), may provide an alternative to the conventional practice of structural
equation modeling in the social sciences. Spanos refers to this alternative
approach as the probabilistic reduction approach.
The Theory of Errors Paradigm. The theory of errors paradigm had its roots
in the mathematical theory of approximation and led to the method of least
squares proposed by Legendre in 1805. A probabilistic foundation was given to
the least squares approach by Gauss in 1809 and developed into a “theory of
errors” by Laplace in 1812.
The basic idea originally proposed by Legendre was that a certain function
was optimally approximated by another function via the minimization of the
sum of the squared deviations about the line. The probabilistic formulation
proposed by Gauss and later Laplace was that if the errors were the result of
insignificant omitted factors, then the distribution of the sum of the errors
would be normal as the number of errors increased. If it could be argued that
the omitted variables were essentially unrelated to the systematic part of the
model, then the phenomena under study could be treated as if it were a nearly
isolated system (as cited in Spanos, 1995; Stigler, 1986).
Arguably, the theory of errors paradigm had a more profound influence
on econometric and social science modeling than the Fisher paradigm. Specifically,
the theory of errors paradigm led to a tremendous focus on statistical estima-
tion. Indeed, a perusal of most econometric textbooks shows that the domi-
nant discussion is typically around the choice of an estimation method. The
choice of an alternative estimator, whether it be two-stage least squares,
limited-information maximum likelihood, instrumental variable estimation,
or generalized least squares, is the result of viewing ordinary least squares as
not living up to its optimal properties in the context of real data.
[Figure: the probabilistic reduction approach, with elements Theory, DGP, Theoretical Model, Observed Data, Estimable Model, Statistical Model, Estimation, Misspecification, Reparameterization, Model Selection, and Empirical Social Science Model]
a theory, and the latter does not always provide information regarding what
can be observed or how it should be measured. One only need think of “school
quality” as an important theoretical variable of the input-process-output
theory to realize how many different ways such a theoretical variable can be
measured. Therefore, a distinction needs to be made regarding the theoretical
model and an estimable model, where the estimable model is specified with an
eye toward the DGP (Spanos, 1990).
As an example, let us assume the appropriateness of the input-process-
output theory. If interest centers on the measurement of school quality via a
survey of school climate, this will have bearing on the form of the estimable
model as well as the form of the statistical model (to be described next). If
school quality actually referred to the distribution of resources to classrooms,
then clearly the estimable model will differ from the theoretical model and aux-
iliary measurements might need to be added. It may be interesting to note that
the theoretical model and estimable model coincides when data are generated
from an experimental arrangement. However, we noted that such arrangements
are rare in social science applications of structural equation modeling.
where the first term on the right-hand side of Equation [10.1] is the conditional
distribution of the endogenous variables given the exogenous variables, and
the second term on the right-hand side is the marginal distribution of the
exogenous variables. Conditioning the endogenous variables on the exogenous
variables yields the reduced form

y = Πx + ζ,   [10.2]
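To make the reduction explicit, the factorization referred to as Equation [10.1] can be written, using D(·) for a distribution, ψ for the full parameter set, and ψ₁ and ψ₂ for the parameters of the two factors (these symbols are notational conveniences for this sketch rather than the original display), as

D(y, x; ψ) = D(y | x; ψ₁) D(x; ψ₂).

Under joint normality, the conditional distribution D(y | x; ψ₁) has mean Πx, and defining ζ = y − E(y | x) gives the reduced form in Equation [10.2].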
The two forms of identification are related, but the distinction is useful from the
viewpoint of the probabilistic reduction approach (see Spanos, 1990).
data, and it may be necessary to put forth numerous statistical models until one
is finally chosen. These assumptions include exogeneity, normality, linearity,
homogeneity, and independence. Weak exogeneity becomes a very serious
assumption at this step because evidence against weak exogeneity implies that
conditional estimation is inappropriate—that is, the conditional and marginal
distributions must both be taken into consideration during estimation. In any
case, a violation of one or more of these assumptions requires respecification and
adjustment until a statistically adequate model is obtained.
The next step in the probabilistic reduction approach is to begin testing
theoretical propositions of interest via parameter restrictions placed on a sta-
tistically adequate model. Note that whereas the resulting statistical model may
be based on considerable data mining, this does not present a problem because
the parameters of the statistical model do not have a direct interpretation rel-
ative to the theoretical parameters. However, the process of parameter restric-
tion of the statistical model is based on theoretical suppositions and should not
be data specific. Indeed, as Spanos points out, the more restrictions placed on
the model, the less data-specific the theoretical/estimable model becomes.
From the point of view of structural equation modeling in the social sciences,
this means that we tend to favor models with many degrees of freedom.
In contrast to the probabilistic reduction approach, the conventional
approach typically starts with an over-identified model wherein the more overi-
dentifying restrictions the better from a theoretical point of view. However, the
process of model modification that characterizes the conventional approach
becomes problematic insofar as it does not rest on a statistically adequate and
convenient summary of the probabilistic structure of the data.
under which the causal variables are assumed to operate. This view encourages
the practitioner to provide a rationale for the choice of variables in a particu-
lar model and how they might work together as a field within which a select set
of causal variables operates. This exercise in providing a deep description of
the causal field and the inus conditions for causation should be guided by
theory and, in turn, can be used to inform and test theory.
to the conjuncts of particular inus conditions. But what of the remaining rele-
vant causes of reading proficiency in our example? According to Mackie, they
are relegated to the causal field. Hoover views the causal field as the standing
conditions of the problem that are known not to change, or perhaps to be
extremely stable for the purposes at hand. In Hoover’s words, they represent
the “boundary conditions” of the problem.
However, the causal field is much more than simply the standing condi-
tions of a particular problem. Indeed, from the standpoint of linear statistical
models generally, those variables that are relegated to the causal field are part
of what is typically referred to as the error term. Introducing random error
into the discussion allows Mackie’s notions to be possibly relevant to indeter-
ministic problems such as those encountered in the social and behavioral sci-
ences. However, according to Hoover, this is only possible if the random error
terms are components of Mackie’s notion of a causal field.
Hoover argues that the notion of a causal field has to be expanded for
Mackie’s ideas to be relevant to indeterministic problems. In the first instance,
certain parameters of a causal process may not, in fact, be constant. If parame-
ters of a causal question were truly constant, then they can be relegated to the
causal field. Parameters that are mostly stable over time can also be relegated
to the causal field, but should they in fact change, the consequences for the
problem at hand may be profound. In Hoover’s analysis, these parameters are
part of the boundary conditions of the problem. Hoover argues that most inter-
ventions are defined within certain, presumably constant, boundary conditions—
although this may be questionable outside of economics.
In addition to parameters, there are also variables that are not of our imme-
diate concern and thus part of the causal field. Random errors, in Hoover’s
analysis, contain the variables omitted from the problem and are “impounded”
in the causal field. “The causal field is a background of standing conditions and,
within the boundaries of validity claimed for the causal relation, must be invari-
ant to exercises of controlling the consequent by means of the particular causal
relation (INUS condition) of interest” (Hoover, 2001, p. 222).
Hoover points out that for the inus condition to be a sophisticated approach
to the problem of causal inference, the antecedents must truly be antecedent.
Frequently, this requirement is presumed to be met by appealing to temporal
priority. But the assumption of temporal priority is often unsatisfactory.
Hoover gives the example of laying one’s head on a pillow and the resulting
indentation in the pillow as an example of the problem of simultaneity and
temporal priority.12 Mackie, however, sees the issue somewhat more simply—
namely the antecedent must be directly controllable. This focus on direct con-
trollability is an important feature Woodward’s (2003) manipulability theory
of causation described next.
my idea is that one ought to be able to associate with any successful explana-
tion a hypothetical or counterfactual experiment that shows us that and how
manipulation of the factors mentioned in the explanation . . . would be a way
of manipulating or altering the phenomenon explained . . . Put in still
another way, an explanation ought to be such that it can be used to answer
what I call the what-if-things-had-been-different question . . . (p. 11)
Consider, for example, the structural system

y = βx + u,   [10.3]
z = γx + λy + v,   [10.4]

and the corresponding reduced-form system

y = βx + u,   [10.5]
z = πx + w.   [10.6]

Although these two sets of equations yield observationally equivalent
information, they are distinct causal representations.
To see this, note that Equations [10.3] and [10.4] say that x is a direct cause
of y and x and y are direct causes of z. But, Equations [10.5] and [10.6] say that
x is a direct cause of y and z and says nothing about y being a direct cause of z.
If Equations [10.3] and [10.4] represent the true causal system and are assumed
to be modular in Woodward's sense, then Equations [10.5] and [10.6] cannot be
modular. For example, if y is fixed to a particular value by intervention, then
this implies that β = 0. Nevertheless, despite this intervention, Equation [10.4]
will continue to hold. In contrast, given modularity of Equations [10.3] and
[10.4], we see that Equation [10.6] will change because π is a function of β.
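The dependence of π on β can be made explicit by substituting Equation [10.3] into Equation [10.4]:

z = γx + λ(βx + u) + v = (γ + λβ)x + (λu + v),

so that π = γ + λβ and w = λu + v in Equation [10.6]. An intervention that sets y to a fixed value (in effect setting β to zero) therefore changes π to γ, while Equation [10.4] itself is left intact.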
We see then, that the structural form and reduced form are distinct causal
systems, and although they provide identical observational information as well
as inform the problem of identification, they do not provide identical causal
information. Moreover, given that numerous equivalent models can be formed,
the criterion for choosing among them, according to Woodward, is that the
model satisfies modularity, because that will be the model that fully represents
the causal mechanism and set of relationships (Woodward, 2003, p. 332).
In what sense does the manipulability theory of causation inform model-
ing practice? For Woodward (2003), the problem is that the model possessing
the property of modularity cannot be unambiguously determined from among
competing observationally equivalent models. Only the facts about causal
processes can determine this. For Woodward therefore, the prescription for
modeling practice is that researchers should theorize distinct causal mecha-
nisms and hypothesize what would transpire under hypothetical interventions.
This information is then mapped into a system of equations wherein each
equation represents a clearly specified and distinct causal mechanism. The
right-hand side in any given equation contains those variables on which inter-
ventions would change the variables on the left-hand side. And, although dif-
ferent systems of equations may be mathematically equivalent, this is only a
problem if we are postulating relatively simple associations. As Pearl (2000)
points out, mathematically equivalent models are not syntactically equivalent
when considered in light of hypothetical interventions. That is, each equation
in a system of equations should “encode” counterfactual information necessary
for considering hypothetical interventions (Pearl, 2000; Woodward, 2003).
y = βz + u,   [10.7]
z = γx + v   [10.8]
is interpreted quite differently from the case where we also allow x to directly
influence y—that is,
y = βz + λx + u,   [10.9]
z = γx + v.   [10.10]
In the purely mediating model given in Equations [10.7] and [10.8], the effect
of intervening on x is to change y by βγ. In the model in Equations [10.9] and
[10.10], the effect of intervening on x is to change y by βγ + λ.
The difference between the interpretations of these two models is not
trivial. They represent important causal information regarding what would
obtain after an intervention on x. For Pearl (2000), structural equations are
meant to define an equilibrium state, where that state would be violated when
there is an outside intervention (p. 157). As such, structural equations encode
not only information about the equilibrium state but also information about
which equations must be perturbed to explain the new equilibrium state. For
the two models just described, an intervention on x would lead to different
equilibrium states.
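A quick simulation makes the difference between these two intervention effects concrete. The sketch below is a minimal illustration with arbitrary coefficient values, not an analysis from this chapter.

```python
import numpy as np

rng = np.random.default_rng(0)
beta, gamma, lam = 0.5, 0.8, 0.3   # arbitrary structural coefficients
n = 200_000

def mean_y(x_value, include_direct_path):
    """Mean of y when x is set to x_value by intervention.

    Purely mediating system (Equations [10.7]-[10.8]): x -> z -> y.
    With a direct path (Equations [10.9]-[10.10]): y = beta*z + lambda*x + u.
    """
    x = np.full(n, x_value)
    v = rng.normal(size=n)
    u = rng.normal(size=n)
    z = gamma * x + v
    y = beta * z + (lam * x if include_direct_path else 0.0) + u
    return y.mean()

for direct in (False, True):
    effect = mean_y(1.0, direct) - mean_y(0.0, direct)
    print("direct path" if direct else "mediation only", round(effect, 3))
# Expected effects: beta*gamma = 0.4 and beta*gamma + lambda = 0.7
```

Shifting x by one unit moves the mean of y by approximately βγ = 0.4 in the purely mediating system and by βγ + λ = 0.7 when the direct path is included.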
Much more can be said regarding Woodward’s (2003) manipulability
theory of causation as well as Pearl’s (2000) interventional interpretation of
structural equation modeling, but a full account of their ideas is simply beyond
the scope of this chapter. Suffice it to say that, in the context of structural equation
modeling, Woodward's (2003) and Pearl's (2000) expansion of the
counterfactual theory of causation to the problem of hypothetical interventions
on exogenous variables provides a practical framework for using structural
equation modeling to guide causal inference and is in line with how its
founders (Haavelmo, 1943; Marschak, 1950; Simon, 1953) viewed the utility of
the methodology.
10.8 Conclusion
Over the past 10 years, there have been important developments in the
methodology of structural equation modeling—particularly in methods
Notes
1. However, with the advent of new estimation methods, such as those discussed
in Chapter 5, this may become less of a concern in the future.
2. Except perhaps indirectly when using the Akaike information criterion for
nested comparisons.
3. It is beyond the scope of this chapter to conduct a detailed historical analysis,
but it is worth speculating whether Goldberger’s important influence in structural
equation modeling may partially account for the conventional practice observed in the
social sciences.
4. As noted in Spanos (1989), this view was based on the perceived outcome of a
classic debate between Koopmans (1947) and Vining (1949).
5. Included are such important contributions as randomization, replication, and
blocking.
6. A difficulty that arises in the context of this discussion is the confusion of
terms such as theory, model, and statistical model. No attempt will be made to resolve
this confusion in the context of this chapter and thus it is assumed that the reader will
understand the meaning of these terms in context.
7. Of course, nonzero restrictions and equality constraints are also possible.
8. Note that one can also use the multilevel reduced form discussed in Chapter 7
for this purpose as well.
9. Haavelmo, Wright, and Koopmans were referring to simultaneous equation
modeling, but the point still holds for structural equation modeling as understood in
this book.
10. An example might be a match being lit without it being struck—for example,
if it were hit by lightning.
11. In this regard, there does not appear to be any inherent conflict between the
probabilistic reduction approach described earlier and the counterfactual model of
causal inference.
12. This example was originally put forth by Immanuel Kant in the context of an
iron ball depressing a cushion.
References
Bentler, P. M., & Bonett, D. G. (1980). Significance tests and goodness-of-fit in the
analysis of covariance structures. Psychological Bulletin, 88, 588–606.
Bentler, P. M., & Liang, J. (2003). Two-level mean and covariance structures: Maximum
likelihood via an EM algorithm. In S. Reise & N. Duan (Eds.), Multilevel modeling:
Methodological advances, issues, and applications (pp. 53–70). Mahwah, NJ: Lawrence
Erlbaum.
Berndt, E. R. (1991). The practice of econometrics: Classic and contemporary. New York:
Addison-Wesley.
Bidwell, C. E., & Kasarda, J. D. (1975). School district organization and student achieve-
ment. American Sociological Review, 40, 55–70.
Bishop, Y. M. M., Fienberg, S. E., & Holland, P. W. (1975). Discrete multivariate analysis.
Cambridge, MA: MIT Press.
Blumen, I. M., Kogan, M., & McCarthy, P. J. (1955). The industrial mobility of labor as a
probability process. Ithaca, NY: Cornell University Press.
Bollen, K. A. (1989). Structural equations with latent variables. New York: Wiley.
Bollen, K. A., & Curran, P. J. (2004). Autoregressive latent trajectory (ALT) models: A
synthesis of two traditions. Sociological Methods and Research, 32, 336–383.
Bollen, K. A., & Curran, P. J. (2006). Latent curve models: A structural equation perspec-
tive. New York: Wiley.
Boomsma, A. (1983). On the robustness of LISREL (maximum likelihood estimation)
against small sample size and non-normality. Unpublished dissertation, University
of Groningen, Groningen, The Netherlands.
Browne, M. W. (1982). Covariance structures. In D. M. Hawkins (Ed.), Topics in applied
multivariate analysis (pp. 72–141). London: Cambridge University Press.
Browne, M. W. (1984). Asymptotic distribution free methods in the analysis of
covariance structures. British Journal of Mathematical and Statistical Psychology,
37, 62–83.
Browne, M. W., & Cudeck, R. (1989). Single sample cross-validation indices for covari-
ance structures. Multivariate Behavioral Research, 24, 445–455.
Browne, M. W., & Cudeck, R. (1993). Alternative ways of assessing model fit. In
K. A. Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 136–162).
Newbury Park, CA: Sage.
Browne, M. W., & Mels, G. (1990). RAMONA user’s guide. Columbus: Department of
Psychology, Ohio State University.
Buse, A. (1982). The likelihood ratio, Wald, and Lagrange multiplier tests: An exposi-
tory note. The American Statistician, 36, 153–157.
Byrne, B. M., Shavelson, R. J., & Muthén, B. (1989). Testing for the equivalence of fac-
tor covariance and mean structure. Psychological Bulletin, 105, 456–466.
Caldwell, B. (1982). Beyond positivism: Economic methodology in the twentieth century.
London: George Allen & Unwin.
Cartwright, N. (2007). Hunting causes and using them: Approaches in philosophy and eco-
nomics. Cambridge: Cambridge University Press.
Chambers, J. M. (1998). Programming with data: A guide to the S language. New York:
Springer-Verlag.
Chou, C.-P., & Bentler, P. M. (1990). Model modification in covariance structure mod-
eling: A comparison among likelihood ratio, Lagrange multiplier, and Wald tests.
Multivariate Behavioral Research, 25, 115–136.
Hume, D. (1739). A treatise of human nature. Oxford, UK: Oxford University Press.
Intriligator, M. D., Bodkin, R. G., & Hsiao, C. (1996). Econometric models, techniques,
and applications. Upper Saddle River, NJ: Prentice Hall.
Jo, B., & Muthén, B. O. (2001). Modeling of intervention effects with noncompliance: A
latent variable modeling approach for randomized trials. In G. A. Marcoulides &
R. E. Schumacker (Eds.), New developments and techniques in structural equation
modeling (pp. 57–87). Mahwah, NJ: Lawrence Erlbaum.
Johnston, J. (1972). Econometric methods (2nd ed.). New York: McGraw-Hill.
Jordan, N. C., Hanich, L. B., & Kaplan, D. (2003a). Arithmetic fact mastery in young
children: A longitudinal investigation. Journal of Experimental Child Psychology,
85, 103–119.
Jordan, N. C., Hanich, L. B., & Kaplan, D. (2003b). A longitudinal study of mathematical
competencies in children with specific mathematics difficulties versus children with
co-morbid mathematics and reading difficulties. Child Development, 74, 834–850.
Jordan, N. C., Kaplan, D., & Hanich, L. B. (2002). Achievement growth in children with
learning difficulties in mathematics: Findings of a two-year longitudinal study.
Journal of Educational Psychology, 94, 586–597.
Jordan, N. C., Kaplan, D., Nabors-Oláh, L., & Locuniak, M. N. (2006). Number sense
growth in kindergarten: A longitudinal investigation of children at risk for math-
ematics difficulties. Child Development, 77, 153–175.
Jöreskog, K. G. (1967). Some contributions to maximum likelihood factor analysis.
Psychometrika, 32, 443–482.
Jöreskog, K. G. (1969). A general approach to confirmatory maximum likelihood factor
analysis. Psychometrika, 34, 183–202.
Jöreskog, K. G. (1971). Simultaneous factor analysis in several populations.
Psychometrika, 36, 409–426.
Jöreskog, K. G. (1973). A general method for estimating a linear structural equation sys-
tem. In A. S. Goldberger & O. D. Duncan (Eds.), Structural equation models in the
social sciences (pp. 85–112). New York: Academic Press.
Jöreskog, K. G. (1977). Structural equation models in the social sciences: Specification,
estimation and testing. In P. R. Krishnaiah (Ed.), Applications of statistics
(pp. 265–287). Amsterdam: North-Holland.
Jöreskog, K. G., & Goldberger, A. (1972). Factor analysis by generalized least squares.
Psychometrika, 37, 243–259.
Jöreskog, K. G., & Goldberger, A. S. (1975). Estimation of a model with multiple indi-
cators and multiple causes of a single latent variable. Journal of the American
Statistical Association, 70, 631–639.
Jöreskog, K. G., & Lawley, D. N. (1968). New methods in maximum likelihood factor
analysis. British Journal of Mathematical and Statistical Psychology, 21, 85–96.
Jöreskog, K. G., & Sörbom, D. (1993). LISREL 8.14. Chicago: Scientific Software
International.
Jöreskog, K. G., & Sörbom, D. (2000). LISREL 8.30 and PRELIS 2.30. Lincolnwood, IL:
Scientific Software International.
Kaiser, H. F. (1958). The varimax criterion for analytic rotation in factor analysis.
Psychometrika, 23, 187–200.
Kaplan, D. (1988). The impact of specification error on the estimation, testing, and
improvement of structural equation models. Multivariate Behavioral Research, 23,
69–86.
Kaplan, D., Harik, P., & Hotchkiss, L. (2000). Cross-sectional estimation of dynamic
structural equation models in disequilibrium. In R. Cudeck, S. H. C. du Toit, &
D. Sorbom (Eds.), Structural equation modeling: Present and future. A festschrift
in honor of Karl G. Jöreskog (pp. 315–339). Lincolnwood, IL: Scientific Software
International.
Kaplan, D., Kim, J.-S., & Kim, S.-Y. (in press). Multilevel latent variable modeling:
Current research and recent developments. In R. Millsap & A. Maydeu-Olivares
(Eds.), Sage handbook of quantitative methods in psychology. Thousand Oaks, CA:
Sage.
Kaplan, D., & Kreisman, M. B. (2000). On the validation of indicators of mathematics
education using TIMSS: An application of multilevel covariance structure model-
ing. International Journal of Educational Policy, Research, and Practice, 1, 217–242.
Kaplan, D., & Walpole, S. (2005). A stage-sequential model of literacy transitions:
Evidence from the Early Childhood Longitudinal Study. Journal of Educational
Psychology, 97, 551–563.
Kaplan, D., & Wenger, R. N. (1993). Asymptotic independence and separability in
covariance structure models: Implications for specification error, power, and
model modification. Multivariate Behavioral Research, 28, 483–498.
Kass, R. E., & Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical
Association, 90, 773–795.
Keesling, J. W. (1972). Maximum likelihood approaches to causal analysis. Unpublished
doctoral dissertation, University of Chicago, Chicago.
Kirk, R. E. (1995). Experimental design: Procedures for the behavioral sciences. Pacific
Grove, CA: Brooks/Cole.
Kish, L. (1965). Survey sampling. New York: Wiley.
Kish, L., & Frankel, M. R. (1974). Inference from complex samples. Journal of the Royal
Statistical Society Series B, 36, 1–37.
Koopmans, T. C. (1947). Measurement without theory. Review of Economics and
Statistics, 29, 161–172.
Koopmans, T. C. (Ed.). (1950). Statistical inference in dynamic economic models
(Vol. 10). New York: Wiley.
Koopmans, T. C., Rubin, H., & Leipnik, R. B. (1950). Measuring the equation systems of
dynamic economics. In T. C. Koopmans (Ed.), Statistical inference in dynamic
economic models (pp. 53–237). New York: Wiley.
Kreft, I., & de Leeuw, J. (1998). Introducing multilevel modeling. Thousand Oaks, CA:
Sage.
Land, K. C. (1973). Identification, parameter estimation, and hypothesis testing in recur-
sive sociological models. In A. S. Goldberger & O. D. Duncan (Eds.), Structural
equation models in the social sciences (pp. 19–49). New York: Seminar Press.
Langeheine, R., & Van de Pol, F. (2002). Latent Markov chains. In J. A. Hagenaars &
A. L. McCutcheon (Eds.), Applied latent class analysis (pp. 304–341). Cambridge,
UK: Cambridge University Press.
Lawley, D. N. (1940). The estimation of factor loadings by the method of maximum
likelihood. Proceedings of the Royal Society of Edinburgh, 60, 64–82.
Lawley, D. N. (1941). Further investigations in factor estimation. Proceedings of the
Royal Society of Edinburgh, 61, 176–185.
Lawley, D. N., & Maxwell, A. E. (1971). Factor analysis as a statistical method. London:
Butterworth.
Muthén, B., & Satorra, A. (1989). Multilevel aspects of varying parameters in structural
models. In R. D. Bock (Ed.), Multilevel analysis of educational data. San Diego, CA:
Academic Press.
Muthén, B. O. (2001). Latent variable mixture modeling. In G. A. Marcoulides &
R. E. Schumacker (Eds.), New developments and techniques in structural equation
modeling. Mahwah, NJ: Lawrence Erlbaum.
Muthén, B. O., & Curran, P. J. (1997). General longitudinal modeling of individual dif-
ferences in experimental designs: A latent variable framework for analysis and
power estimation. Psychological Methods, 2, 371–402.
Muthén, B. O., du Toit, S. H. C., & Spisic, D. (1997). Robust inference using weighted
least squares and quadratic estimating equations in latent variable modeling with
categorical outcomes. Unpublished manuscript, University of California, Los
Angeles.
Muthén, L. K., & Muthén, B. O. (2006). Mplus: Statistical analysis with latent variables.
Los Angeles: Muthén & Muthén.
Muthén, L. K., & Muthén, B. (1998–2007). Mplus user’s guide (5th ed.). Los Angeles:
Muthén & Muthén.
Nagin, D. S. (1999). Analyzing developmental trajectories: A semi-parametric, group-
based approach. Psychological Methods, 4, 139–157.
National Assessment of Educational Progress (NAEP). (1986). The NAEP 1986
technical report. Princeton, NJ: Educational Testing Service.
National Center for Education Statistics. (1988). National educational longitudinal study
of 1988. Washington, DC: U.S. Department of Education.
National Center for Education Statistics. (2001). Early childhood longitudinal study:
Kindergarten class of 1998–99: Base year public-use data files user’s manual (No.
NCES 2001–029). Washington, DC: Government Printing Office.
Olsson, U. (1979). On the robustness of factor analysis against crude classification of
the observations. Multivariate Behavioral Research, 14, 485–500.
Organisation for Economic Co-operation and Development. (2004). The PISA 2003
assessment framework: Mathematics, reading, science, and problem solving knowl-
edge and skills. Paris: Author.
Pagan, A. R. (1984). Model evaluation by variable addition. In D. F. Hendry & K. F.
Wallis (Eds.), Econometrics and quantitative economics (pp. 275–314). Oxford, UK:
Basil Blackwell.
Pearl, J. (2000). Causality: Models, reasoning, and inference. Cambridge, UK: Cambridge
University Press.
Pearson, K., & Lee, A. (1903). On the laws of inheritance in man. Biometrika, 2,
357–462.
Pindyck, R. S., & Rubinfeld, D. L. (1991). Econometric models & economic forecasts. New
York: McGraw-Hill.
Potthoff, R. F., Woodbury, M. A., & Manton, K. G. (1992). “Equivalent sample size” and
“equivalent degrees of freedom” refinements using survey weights under super-
population models. Journal of the American Statistical Association, 87, 383–396.
Rabe-Hesketh, S., Skrondal, A., & Pickles, A. (2004). Generalized multilevel structural
equation modeling. Psychometrika, 69, 167–190.
Raftery, A. E. (1993). Bayesian model selection in structural equation models. In K. A.
Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 163–180).
Newbury Park, CA: Sage.
Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and
data analysis methods (2nd ed.). Thousand Oaks, CA: Sage.
Richard, J.-F. (1982). Exogeneity, causality, and structural invariance in econometric
modeling. In G. C. Chow & P. Corsi (Eds.), Evaluating the reliability of macro-
economic models (pp. 105–118). New York: Wiley.
Rogosa, D. R., Brandt, D., & Zimowski, M. (1982). A growth curve approach to the mea-
surement of change. Psychological Bulletin, 90, 726–748.
Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in
observational studies for causal effects. Biometrika, 70, 41–55.
Rubin, D. (1976). Inference and missing data. Biometrika, 63, 581–592.
Saris, W. E., & Stronkhorst, H. (1984). Causal modeling in nonexperimental research.
Amsterdam: Sociometric Research Foundation.
Saris, W. E., Satorra, A., & Sörbom, D. (1987). The detection and correction of specifi-
cation errors in structural equation models. In C. C. Clogg (Ed.), Sociological
methodology 1987 (pp. 105–129). San Francisco: Jossey-Bass.
Satorra, A. (1989). Alternative test criteria in covariance structure analysis: A unified
approach. Psychometrika, 54, 131–151.
Satorra, A. (1992). Asymptotic robust inference in the analysis of mean and covariance
structures. In P. V. Marsden (Ed.), Sociological methodology 1992 (pp. 249–278).
Oxford, UK: Blackwell.
Satorra, A., & Saris, W. E. (1985). Power of the likelihood ratio test in covariance struc-
ture analysis. Psychometrika, 50, 83–90.
Schmidt, W. H. (1969). Covariance structure analysis of the multivariate random effects
model. Unpublished doctoral dissertation, University of Chicago.
Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6,
461–464.
Shavelson, R. J., McDonnell, L. M., & Oakes, J. (Eds.). (1989). Indicators for monitoring
mathematics and science: A sourcebook. Santa Monica, CA: Rand Corporation.
Silvey, S. D. (1959). The Lagrangian multiplier test. Annals of Mathematical Statistics, 30,
389–407.
Simon, H. A. (1953). Causal ordering and identifiability. In W. C. Hood & T. C.
Koopmans (Eds.), Studies in econometric method (pp. 49–74). New York: Wiley.
Sivo, S. A., Fan, X., & Witta, E. L. (2005). The biasing effects of unmodeled ARMA time
series processes on latent growth curve model estimates. Structural Equation
Modeling, 12, 215–232.
Sobel, M. E. (1990). Effect analysis and causation in linear structural equation models.
Psychometrika, 55, 495–515.
Sobel, M. E., & Bohrnstedt, G. W. (1985). Use of null models in evaluating the fit of
covariance structure models. In N. B. Tuma (Ed.), Sociological methodology 1985
(pp. 152–178). San Francisco: Jossey-Bass.
Sörbom, D. (1974). A general method for studying differences in factor means and fac-
tor structure between groups. British Journal of Mathematical and Statistical
Psychology, 27, 229–239.
Sörbom, D. (1978). An alternative to the methodology of analysis of covariance.
Psychometrika, 43, 381–396.
Sörbom, D. (1989). Model modification. Psychometrika, 54, 371–384.
Spanos, A. (1986). Statistical foundations of econometric modeling. Cambridge, UK:
Cambridge University Press.
David Kaplan received his PhD in Education from UCLA in 1987, after which
he joined the faculty of the University of Delaware where he remained until
2006. He is currently Professor of Quantitative Methods in the Department of
Educational Psychology at the University of Wisconsin–Madison. His current
research focuses on the problem of causal inference in nonexperimental settings
within a “structuralist” perspective. He also maintains a strong and active inter-
est in the development and testing of statistical models for social and behavioral
processes that are not necessarily directly observed—including latent variable
models, growth curve models, mixture models, and Markov models. His
collaborative work involves applications of advanced statistical methods to
problems in education and human development. His Web site can be found at
http://www.education.wisc.edu/edpsych/facstaff/kaplan/kaplan.htm.