Discovering Structural Equation Modeling Using Stata - Alan C. Acock
Published by Stata Press, 4905 Lakeway Drive, College Station, Texas 77845
Typeset in LaTeX 2ε
10 9 8 7 6 5 4 3 2 1
No part of this book may be reproduced, stored in a retrieval system, or transcribed, in any
form or by any means—electronic, mechanical, photocopy, recording, or otherwise—
without the prior written permission of StataCorp LP.
Stata and Stata Press are registered trademarks with the World Intellectual Property
Organization of the United Nations.
Preface
Acknowledgments
1 Introduction to confirmatory factor analysis
1.1 Introduction
1.2 The “do not even think about it” approach
1.3 The principal component factor analysis approach
1.4 Alpha reliability for our nine-item scale
1.5 Generating a factor score rather than a mean or summative score
1.6 What can CFA add?
1.7 Fitting a CFA model
1.8 Interpreting and presenting CFA results
1.9 Assessing goodness of fit
1.9.1 Modification indices
1.9.2 Final model and estimating scale reliability
1.10 A two-factor model
1.10.1 Evaluating the depression dimension
1.10.2 Estimating a two-factor model
1.11 Parceling
1.12 Extensions and what is next
1.13 Exercises
1.A Using the SEM Builder to run a CFA
1.A.1 Drawing the model
1.A.2 Estimating the model
2 Using structural equation modeling for path models
2.1 Introduction
2.2 Path model terminology
2.2.1 Exogenous predictor, endogenous outcome, and endogenous
mediator variables
2.2.2 A hypothetical path model
2.3 A substantive example of a path model
2.4 Estimating a model with correlated residuals
2.4.1 Estimating direct, indirect, and total effects
2.4.2 Strengthening our path model and adding covariates
2.5 Auxiliary variables
2.6 Testing equality of coefficients
2.7 A cross-lagged panel design
2.8 Moderation
2.9 Nonrecursive models
2.9.1 Worked example of a nonrecursive model
2.9.2 Stability of a nonrecursive model
2.9.3 Model constraints
2.9.4 Equality constraints
2.10 Exercises
2.A Using the SEM Builder to run path models
3 Structural equation modeling
3.1 Introduction
3.2 The classic example of a structural equation model
3.2.1 Identification of a full structural equation model
3.2.2 Fitting a full structural equation model
3.2.3 Modifying our model
3.2.4 Indirect effects
3.3 Equality constraints
3.4 Programming constraints
3.5 Structural model with formative indicators
3.5.1 Identification and estimation of a composite latent variable
3.5.2 Multiple indicators, multiple causes model
3.6 Exercises
4 Latent growth curves
4.1 Discovering growth curves
4.2 A simple growth curve model
4.3 Identifying a growth curve model
4.3.1 An intuitive idea of identification
4.3.2 Identifying a quadratic growth curve
4.4 An example of a linear latent growth curve
4.4.1 A latent growth curve model for BMI
4.4.2 Graphic representation of individual trajectories (optional)
4.4.3 Intraclass correlation (ICC) (optional)
4.4.4 Fitting a latent growth curve
4.4.5 Adding correlated adjacent error terms
4.4.6 Adding a quadratic latent slope growth factor
4.4.7 Adding a quadratic latent slope and correlating adjacent
error terms
4.5 How can we add time-invariant covariates to our model?
4.5.1 Interpreting a model with time-invariant covariates
4.6 Explaining the random effects—time-varying covariates
4.6.1 Fitting a model with time-invariant and time-varying
covariates
4.6.2 Interpreting a model with time-invariant and time-varying
covariates
4.7 Constraining variances of error terms to be equal (optional)
4.8 Exercises
5 Group comparisons
5.1 Interaction as a traditional approach to multiple-group comparisons
5.2 The range of applications of Stata’s multiple-group comparisons
with sem
5.2.1 A multiple indicators, multiple causes model
5.2.2 A measurement model
5.2.3 A full structural equation model
5.3 A measurement model application
5.3.1 Step 1: Testing for invariance comparing women and men
5.3.2 Step 2: Testing for invariant loadings
5.3.3 Step 3: Testing for an equal loadings and equal error-variances model
5.3.4 Testing for equal intercepts
5.3.5 Comparison of models
5.3.6 Step 4: Comparison of means
5.3.7 Step 5: Comparison of variances and covariance of latent
variables
5.4 Multiple-group path analysis
5.4.1 What parameters are different?
5.4.2 Fitting the model with the SEM Builder
5.4.3 A standardized solution
5.4.4 Constructing tables for publications
5.5 Multiple-group comparisons of structural equation models
5.6 Exercises
6 Epilogue–what now?
6.1 What is next?
A The graphical user interface
A.1 Introduction
A.2 Menus for Windows, Unix, and Mac
A.2.1 The menus, explained
A.2.2 The vertical drawing toolbar
A.3 Designing a structural equation model
A.4 Drawing an SEM model
A.5 Fitting a structural equation model
A.6 Postestimation commands
A.7 Clearing preferences and restoring the defaults
B Entering data from summary statistics
References
Author index
Subject index
Tables
1.1 Final results for CFA model
2.1 Standardized effects of attention span at age 4 and math achievement at
age 21 with correlated residual for math at age 7 and reading at age 7
3.1 Requirements for levels of invariance
3.2 Comparison of two models
5.1 Comparison of models
5.2 Comparison of models
5.3 Model testing of covariance difference
5.4 Summary table for our multiple-group results
Figures
1.1 Histogram of generated mean score on conservatism
1.2 Generated mean score on conservatism versus factor score on
conservatism
1.3 CFA for nine-item conservatism scale
1.4 Presenting standardized results for a single-factor solution
1.5 Final model
1.6 Two-factor CFA, final model
1.7 Measurement component dialog box
1.8 Two-factor conceptual model
1.9 Unstandardized solution
1.10 Variable settings dialog box
1.11 Presentation figure
1.12 Final presentation figure with legend
2.1 Recursive path model
2.2 Attention span at age 4 and math achievement at age 21
2.3 Standardized estimates for attention span at age 4 and math achievement at age 21
2.4 Revised model of attention span at age 4 and math achievement at age 21 with correlated errors
2.5 Standardized estimates for attention span at age 4 and math achievement at age 21 with correlated errors (* p < 0.05, ** p < 0.01, and *** p < 0.001)
2.6 Controlling for covariates can weaken a relationship, as illustrated by
gender and vocabulary skills at age 4
2.7 Cross-lagged panel design
2.8 Estimated model for cross-lagged panel relating reading and math skills (* p < 0.05, ** p < 0.01, and *** p < 0.001)
2.9 Bivariate relationship between MPG and weight in 1,000s
2.10 Weight and foreign brand as predictors of MPG (*** p < 0.001; ns = not significant)
2.11 The relationship between weight and MPG as mediated by whether the car is a foreign brand (* p < 0.05 and *** p < 0.001)
2.12 A scatterplot of predicted values between weight and MPG as mediated
by whether the car is a foreign brand
2.13 Reciprocal relationship between a wife’s satisfaction and her husband’s
satisfaction: A nonrecursive model (unidentified)
2.14 Reciprocal relationship between a wife’s satisfaction and her husband’s
satisfaction: A nonrecursive model (identified)
2.15 Reciprocal relationship between respondent and friend for occupational
aspirations
2.16 Estimated reciprocal relationship between respondent and friend for occupational aspirations
2.17 Model fit with standardized path coefficients displayed
2.18 Model fit with less clutter and moved path values
3.1 A structural equation model
3.2 Stability of alienation
3.3 Model with correlated error terms to acknowledge the possible operation
of unobserved variables
3.4 Final model (standardized; *** p < 0.001)
3.5 A model of marital satisfaction (hypothetical data)
3.6 Formative model using the SEM Builder
3.7 Standardized solution when SES66 is a composite variable
3.8 MIMIC model
4.1 A linear growth curve
4.2 A linear relationship with two, three, and four data points
4.3 Growth curve with linear and quadratic components
4.4 Linear latent growth curve for BMI between 20 and 28 years of age
4.5 Subset of our data in wide format
4.6 Subset of our data in long format
4.7 Growth patterns for a sample of people
4.8 Plot of means by year
4.9 A quadratic growth curve relating BMI to age
4.10 The effects of a time-varying covariate (spikes and dips)
4.11 Simplified BMI model with time-invariant and time-varying covariates
5.1 Additive and interactive models
5.2 Confirmatory factor analysis for depression and support for government
intervention (standardized; maximum likelihood estimator; *** p < 0.001)
5.3 Same form equivalence model for women and men (unstandardized)
solution
5.4 Variable settings dialog box
5.5 Comparison of means on depression and government responsibility for women and men with an equal loadings measurement model (** p < 0.01; *** p < 0.001)
5.6 Path model
5.7 Standardized results for children who are not foster children and who are foster children
5.8 Model drawn showing parameters that are constrained
5.9 Full structural equation model predicting peer relations from a person’s
physical fitness and attractiveness
5.10 Multigroup comparison of fourth and fifth grade students (** p < 0.01; *** p < 0.001)
A.1 The drawing toolbar
A.2 A model built in the SEM Builder with types of objects labeled
A.3 Final figure
A.4 Laying out our latent variables
A.5 Adding observed indicators
A.6 Adding paths
A.7 Final figure
A.8 Unstandardized solution
A.9 Standardized results
A.10 Text block with Greek letters
B.1 Original model correlating executive functioning and math achievement
B.2 Your model of cognitive ability, working memory, and inhibitory control
to predict math achievement
Boxes
1.1 Using the SEM Builder to draw a model
1.2 How close does our model come to fitting the covariance matrix?
1.3 Interpretation of CFA results
1.4 Working with missing values
1.5 What you need to know about identification
1.6 Differences between Windows and Mac
2.1 Traditional approach to mediation
2.2 Proportional interpretation of direct and indirect effects
2.3 Testing standardized indirect and specific indirect effects
3.1 Reordering labels for the error terms
3.2 Changing the reference indicator
3.3 Testing simultaneous equality constraints with three or more indicators
3.4 Equality constraints using the SEM Builder
3.5 Heywood case—negative error variances
4.1 Fitting a growth curve with the SEM Builder
4.2 Interpreting a linear slope when there is a quadratic slope
4.3 A note of caution when using method(mlmv)
4.4 Reconciling the sem command and the mixed command
5.1 Why we use unstandardized values in group comparisons
5.2 Why does it matter if groups have different ’s?
5.3 Using 0: and 1: is optional here but can be useful
A.1 Draw a figure freehand
Preface
What is assumed?
There are two ways of learning about structural equation modeling (SEM).
The one I have chosen for this book is best described by an old advertising
tag for a sport shoe company: “Just do it”. My approach could be called
kinetic learning because it is based on the tactile experience of learning about
SEM by using Stata to estimate and interpret models. This means you should
have Stata open while you read this book; otherwise, this book might help
you go to sleep if you try to read it without simultaneously working through
it on your computer. By contrast, if you do work through the examples in the
book by running the commands as you are reading, I hope you develop the
same excitement that I have for SEM.
The alternative approach to learning SEM is to read books that are much
more theoretical and may not even illustrate the mechanics of estimating
models. These kinds of books are important, and reading them will enrich
your understanding of SEM. This book is not meant to replace those books,
but simply to get you started. My intent is for you to work your way through
this book sequentially, but I recognize that some readers will want to skip
around. I am hopeful that after you have been through this book once, you
will want to return to specific chapters to reference techniques covered there.
To facilitate this, each chapter includes some repetition of the most salient
concepts covered in prior chapters. There is also a detailed index at the end of
the book.
What background is assumed? A person who has never used Stata will
need some help getting started. A big part of Stata’s brilliance is its
simplicity, so a few minutes of help will get you up and ready for what you
need to know about Stata. If you are new to Stata, have a friend who is
familiar with the program show you the basics. If you have read my book A
Gentle Introduction to Stata (2012a), you are ahead of the game. If you have
any experience using Stata, then you are in great shape for this book. If you
are a longtime Stata user, you will find that parts of this book explain things
you already know.
To get the most out of this book, you need to have some background in
statistics with experience in multiple regression. If you know path analysis,
you will find the SEM approach to path analysis a big improvement over
traditional approaches; however, the material on path analysis has been
written for someone who has had very little exposure to path analysis. Even
though the first chapter begins by covering how factor analysis has been used
traditionally, a background in factor analysis is less important than having
had some exposure to multiple regression. The first chapter shows how
confirmatory factor analysis adds capabilities to move beyond the traditional
approach—you may never want to rely on alpha and principal component
factor analysis again for developing a scale. I have covered enough about the
traditional applications of factor analysis that you will be okay if you have
had little or no prior exposure to factor analysis.
The book has two appendixes. Appendix A shows you how to use Stata’s
graphical user interface (GUI) to draw and estimate models with Stata’s SEM
Builder. It would be very useful to begin here so that you are familiar with
the SEM Builder interface. If you have no background in SEM, you will not
understand how to interpret the results you generate in appendix A, but this is
not the point. Appendix A is just there to acquaint you with the SEM Builder
that Stata introduced in version 12 and enhanced in version 13. How the
interface works is the focus of appendix A. In the text, I use this GUI fairly
often, but the focus is on understanding why we are estimating models the
way we do and how we interpret and present the results. All the figures
presented in this book were created using the SEM Builder, which produces
publication-quality figures—far better than what you can draw with most
other software packages that produce “near” publication-quality figures.
Appendix B shows you how to work with summary data (means, standard
deviations, correlations) that are often reported in published works. You will
be able to fit most models with these summary statistics even if you do not
have the real data. This feature is great when you read an article and would
like to explore how alternative models might be more appropriate. Many
articles include a correlation matrix along with standard deviations and
means. If these are not included, it is easier to request them from the author
than it is to request the author’s actual data.
In addition to the two main appendixes, the first two chapters each have
their own appendix that briefly describes using the SEM Builder for the
models estimated in that chapter.
Though the chapters are fairly long, they are broken up into more
manageable sections. If you are like me, once you know the commands I
cover, you will have enough on your plate that you will forget the specifics
before you need to fit a particular type of model. The sections in each chapter
build on each other but are sufficiently independent that you should find them
useful as a reference. Someday you will want to estimate a nonrecursive path
model or a mediation model; you can easily find the section covering the
appropriate model and come back to it. At the same time, this book does not
attempt to compete with Stata’s own Structural Equation Modeling Reference
Manual, or [SEM]; I only cover a widely used subset of the options and
postestimation commands available in Stata’s SEM package.
When you run the commands shown in this book, you do not type the initial period and space, called the dot prompt. A convention in all Stata documentation and output in the Results window is to include the dot prompt as a prefix to each command, but you need only type the command itself.
There are several varieties of Stata software, and all of these are able to
run the models described in this text. I focus on the Windows and Mac
operating systems, and I show when there are slight differences in how they
work in the GUI. The Unix GUI is very similar to the Windows GUI. The same
Stata do-files run on all operating systems, though the systems differ slightly
in how the file structure is organized.
One variety of Stata is called Small Stata. This is full featured and is
small only in the sense of being limited in the number of observations (1,200)
and variables (99) it can handle. Because a few of the datasets I use have
more than 1,200 observations, I have made up smaller datasets that will work
using Small Stata. You can obtain these datasets by entering the following in
the Command window:
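One plausible form of those commands, using Stata's net facility (the package name small here is a placeholder, not necessarily the book's actual name):

. net from http://www.stata-press.com/data/dsemusr/
. net get small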
Using the Small Stata data, you will get somewhat different results for some
models in the book simply because you will be using a smaller dataset. In
addition, there are three models in chapter 4 that will not run using Small
Stata.
At the end of each chapter, you will find some exercises that illustrate the
material covered in the chapter. It is important to fit all the models in the text
while you read the book because this reinforces what you are learning, as
does typing in the commands yourself. The exercises extend this learning
process by having you develop your own set of commands and models using
the GUI system.
There is much more to SEM than could possibly be covered in a book this
size. This book is intended to complement the material in the Stata manuals
(over 11,000 pages of helpful information), which are available as PDF files
when you install Stata. One way to access the [SEM] manual is to type help
sem in the Command window of Stata. This opens a help file. At the top left
of the help file, the title ([SEM] sem and gsem) is highlighted in blue.
Clicking on this blue link will open up the PDF file of the [SEM] manual.
Conventions
Typewriter font.
I use a typewriter font when something would be meaningful to Stata
as input. This would be the case for something you type in the
Command window or in a do-file. If a command is separated from the main text, a dot prompt will precede the command. I also use a typewriter font for
all Stata results, variable names, folders, and filenames.
Bold font.
I use a bold font for menu items and for buttons you click within a
menu. The bold font helps distinguish the button from the text; for
example, you might be instructed to click the Adjust Canvas Size
button.
Slant font.
I use a slant font when referring to keys on our keyboard, such as the
Enter key.
Italic font.
I use an italic font when referring to text in a menu that you need to
replace with something else, such as an actual variable name.
Capitalization.
Stata is case sensitive. The command sem (compliance <- educ
income gender) will produce a maximum likelihood multiple
regression. If you replace sem with Sem, Stata will report that it has no
command called Sem. I will use lowercase for all commands and all
observed variables. When I refer to latent variables, I will capitalize the
first letter of the latent variable. A simple confirmatory factor analysis
would be sem (Alienation -> anomia isolate depress report).
Only the latent variable, Alienation, is capitalized. The arrow indicates
that observed variables measure how a person responds on an anomia
scale labeled anomia, an isolation scale labeled isolate, a depression
scale labeled depress, and a reported score from an observer labeled
report. All four of these observed variables depend on their level of
Alienation, the latent variable.
Acknowledgments
I acknowledge the extraordinary support of the Stata staff who have worked
with me on this project. Kristin MacDonald was the statistical reviewer of
this book, but her contribution is much greater than you might expect from a
reviewer. Because this book was written for the first version of the structural
equation modeling capabilities in Stata, I was learning new options as I was
writing the book. Kristin MacDonald pointed out better ways of doing many
of the procedures covered in this book. Her advice has greatly enhanced the
value of this book. Writing a book on an advanced statistical application with
the intention of making it accessible to people who are just learning the
application is challenging. Deirdre Skaggs was the English editor, and she
made innumerable changes in the wording that helped to clarify what I was
trying to write. Annette Fett designed the cover. The idea of learning a new
application and creating a new research capability was cleverly captured in
her design. Kaycee Headley assisted me in preparing a draft for reviews. I
want to thank Stephanie White, who converted my text to proper LaTeX, which is the document markup language used for TeX typesetting. Those of you who are familiar with LaTeX can fully appreciate what this means. Stata works very well with LaTeX, and all the tables of results that appear in the book are exactly what you see when you run Stata.
Chapter 1
Introduction to confirmatory factor analysis
1.1 Introduction
When we are measuring a concept, it is desirable for that concept to be
unidimensional. For example, if we are measuring a person’s perception of
his or her health, the concept is vague in that there are multiple dimensions of
health—physical health, mental health, and so on. Let us suppose Chloe is in
excellent physical health (a perfect 10), but she is very low on mental health
(a score of 0.0). Madison is in excellent mental health (a perfect 10), but has
serious problems with her physical health (a score of 0.0). Jaylen is average
on both dimensions (a score of 5.0 on both). Do we want to give all three the
same score because Chloe, Madison, and Jaylen each average 5.0? Should
one dimension be more important than another?
When you have two dimensions (x and y) and try to represent them on a graph, you need two values: one showing a score on the x dimension and one showing a score on the y dimension. When there is more than one dimension, a single score becomes difficult to interpret, and it is often misleading to
there are advantages to narrowly defining our concepts so our measures can
tap a single dimension. If we are interested in multiple dimensions, such as
distinguishing between physical and mental health, then we need multiple
concepts and multiple empirical sets of measures.
On the other hand, we can carry this argument too far. Each item we
might pick to measure physical health will represent a slightly different
aspect of physical health. We should aim to represent as broad a meaning of
physical health as we can without adding distinctly different dimensions. The
ideal way to do this is to allow each item to have its own unique variance and
develop a scale score that represents the shared meaning of the set of items
on a single dimension. This way, our measurement model represents concepts
that are neither too broad to have a clear meaning nor too narrow to be of
general interest.
A good alpha value does not ensure that a single dimension is being
tapped. Consider the following correlation matrix:
        x1    x2    x3    x4    x5    x6
  x1   1.0
  x2   0.6   1.0
  x3   0.6   0.6   1.0
  x4   0.3   0.3   0.3   1.0
  x5   0.3   0.3   0.3   0.6   1.0
  x6   0.3   0.3   0.3   0.6   0.6   1.0
We see in this matrix two subsets of items: x1–x3 and x4–x6. Items x1–x3 are all highly correlated with each other (r = 0.6) but much less correlated with items x4–x6 (r = 0.3). Similarly, items x4–x6 are highly correlated with one another (r = 0.6) but much less correlated with items x1–x3 (r = 0.3). This indicates that there are two related dimensions, namely, whatever is being measured by x1–x3 for one dimension and whatever is being measured by x4–x6 for the other. The alpha for these six items is 0.81, which is considered good. For example, Kline (2000)
indicates that an alpha of 0.70 and above is acceptable. However, the point
here is that when we rely on alpha to justify computing a total or mean score
for a set of items, we may be forcing together two or more dimensions, that
is, trying to represent two (or more) concepts with one number. At the very
least, we should routinely combine reports of reliability with some sort of
factor analysis to evaluate how many dimensions we are measuring.
Alpha can be high even with items that are only minimally related to one
another. The formula for a standardized alpha is

    alpha = (k × r̄) / [1 + (k − 1) × r̄]

where k is the number of items in the scale and r̄ is the mean correlation among the items. We would not think of an r̄ = 0.17, for example, as more than a minimal relationship. After all, if r = 0.17 then r² ≈ 0.03, meaning that 97% of the variance in the two variables is not linearly related. However, if you had a 40-item scale with an average correlation of just 0.17, your alpha would be 0.89. The measure would be reliable in the sense of internal
consistency, but the high alpha does not mean we are measuring a single
dimension. Simply adding up a series of items (or taking their mean) and
reporting an alpha is insufficient to qualify a measure as a good measure.
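Stata's display command can double as a calculator to check this arithmetic; plugging k = 40 and r̄ = 0.17 into the formula above:

. display 40*0.17/(1 + 39*0.17)
.89121887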
A major concern with PCFA is that it tries to account for all the variance
and covariance of the set of items rather than the portion of the covariance
that the items share in common. Thus it assumes there is no unique or error
variance in each of the indicator variables. One reason PCFA is so widely used
is because it is the default method in other widely used statistical packages,
and you need to override this default in those programs to get a truer form of
factor analysis. In Stata, PCFA is an option you need to specify and not the
default. The Stata command for PCFA is simply factor varlist, pcf, where pcf stands for principal component factor analysis. Through the menu system, click on Statistics > Multivariate analysis > Factor and principal-component analysis > Factor analysis. In that dialog box, you list your variables under the Model tab. Under the Model 2 tab, you pick Principal-component factor.
We will illustrate PCFA using actual data from the National Longitudinal
Survey of Youth, 1997 (NLSY97). This is a longitudinal study that focuses on
the transition from youth to adulthood. In 2006, when the participants were in
their 20s, the NLSY97 asked a series of questions about the government being
proactive in promoting well-being. The questions covered such topics as
providing decent housing, college aid, reducing the income differential,
health care, and providing jobs. We are interested in using 10 items to create
a measure of conservatism. In the nlsy97cfa.dta dataset, these items are
named s8646900–s8647800; for simplicity, we have renamed them x1 to x10.
The commands appear in a do-file called ch1.do, which you can find at http://www.stata-press.com/data/dsemusr/ch1.do. The dataset itself is at the same location, http://www.stata-press.com/data/dsemusr/nlsy97cfa.dta.
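A sketch of loading the data and running the principal component factor analysis described next (the dataset URL follows the do-file's location):

. use http://www.stata-press.com/data/dsemusr/nlsy97cfa, clear
. factor x1-x10, pcf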
The PCFA analyzes the correlation matrix where each item is standardized
to have a variance of 1.0. Therefore, with 10 items, the eigenvalues combined
will add up to 10. With 3.92 out of 10 being explained by the first factor, we
say the first factor explains 39.2% of the variance in the set of items. Any
factor with an eigenvalue of less than 1.0 can usually be ignored.
The second factor has an eigenvalue of 1.01, which is very weak though it
does not strictly fall below the 1.0 cutoff. We decide that the first factor,
explaining 39.2% of the variance in the 10 items, is the only strong factor.
This is reasonably consistent with our intention to pick items that tap a single
dimension. We do not have an explicit test of a single-factor solution, but the
eigenvalue of 3.92 is large enough to be reasonably confident that all the
items are tapping a single dimension. Notice that all the loadings of the items on Factor1 are substantial, varying from 0.45 to 0.73. This range compares well with the convention that loadings should be 0.40 or above.
Some authors feel a loading of at least 0.30 is the minimum criterion for an
item (Costello and Osborne 2005). You may recall that with the PCFA, the
loadings are the correlation between how people respond to each item and the
underlying, latent dimension.
Even though the last item has a loading over 0.40, its loading is
considerably weaker than the rest of the items. The last item is about the
environment, which can be a personal concern of anyone, whether
conservative or not. By contrast, the other nine items involve government
response to needs people have because of their limited personal resources.
Because there is a second factor with an eigenvalue greater than 1.0 and
because the loading of the tenth item on the first factor is the weakest, we will
drop that item and rerun our analysis to see if we can obtain a clearer result.
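Presumably the rerun is the same command without the tenth item:

. factor x1-x9, pcf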
Most researchers would be quite happy with these results. Only one factor
has an eigenvalue greater than 1.0, and all nine items load over 0.5 on that
factor.
Here is the alpha command with results. Stata can estimate alpha using the variances and covariances (unstandardized, the default) or the correlations (standardized). Because we are going to generate mean or total
scores, we will estimate the unstandardized value. The unstandardized
version is recommended when generating a scale score using unstandardized
variables.
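A sketch of the command (the item option requests the item-by-item table discussed below):

. alpha x1-x9, item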
Our scale looks great by conventional standards. At the bottom of the
table in the row labeled Test scale, we have the alpha for our scale. The
alpha is 0.81, which is over the 0.70 minimum value standard. Under the
column labeled alpha, we see what would happen if we dropped any single
item from our scale; in each case, the alpha would go down. If dropping an
item (one at a time) would substantially raise the alpha, we might look
carefully at the item to make sure it was measuring the same concept as the
other items. Most likely, the PCFA would have spotted such a problematic
item as not fitting the first factor.
To obtain our scale score for each person in our sample, we would simply
compute the total or mean score for the nine items. I usually prefer the mean
score of the items, because it will be on the same scale as the original items
(for these items, between 1 and 4). Given this, a mean of 3.0 would denote that
a person is conservative and does not support a proactive government. A
mean of 1.5 would denote that the person is fairly liberal, between definitely
and probably supporting a proactive government.
By contrast, a total score would range from 9 to 36, and it would be much
harder to interpret a total score of, say, 24.0 (instead of 3.0) or 12.0 (instead
of 1.5). Another problem with the total score arises if there are missing values
for some items. An item with a missing value would contribute nothing to the
total, as if we had assigned that item a value of 0.0. If a person skips an item,
giving them a score of 0 for that item is ridiculous because that would
indicate more definite support of a proactive government than the most
favorable available response that is coded as 1.0.
To obtain the mean score for each person, we generate our scale score as
the mean of the items the person answered. This egen (extended generation)
command gives you the mean of however many of the nine items the person
answered:
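A sketch of that command, using the variable name conserve from the text, followed by the summarize command discussed next:

. egen conserve = rowmean(x1-x9)
. summarize conserve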
The egen command shows that there are 7,097 missing values on our
generated conserve variable. This is not a problem because the item was only
asked for a subset of the overall dataset. The summarize command tells us that the mean is 1.78, the standard deviation is 0.51, and this is based
on 1,888 observations. These 1,888 observations include anybody who
answered at least one of the items (see box 2.1 for alternative treatments of
missing values). The histogram with a normal distribution overlay
(figure 1.1) shows that our score is pretty skewed to the right with a
concentration of people favoring a proactive government.
[Figure 1.1: Histogram of conserve, the generated mean conservatism score, with a normal-distribution overlay; frequency is on the vertical axis, and scores run from 1 to 4]
You can generate a factor score that weights each item according to how
salient it is to the concept being measured. Factor scores will be extremely
highly correlated with the simple mean or summative score whenever the
loadings are all fairly similar. If the loadings vary widely, the factor score
will be a better score to use because factor scores weight items by their
salience (loadings and correlations with the other items), but the advantage is
only substantial when some items have much weaker loadings than others.
The factor score will be scaled to have a mean of 0.0 and a variance of 1.0; in
other words, it will be the standardized score for the concept.
The results above show us the factor scoring coefficients, which are like
standardized beta weights. Notice that the ninth item has a scoring coefficient
of 0.20 and the second item has a scoring coefficient of 0.16. This means the
ninth item counts slightly more in generation of the factor score, which
makes sense because the ninth item had a bigger loading than the second item (0.74 versus 0.59).
The default for the predict command is to predict the factor score as the
weighted sum of the items using the scoring coefficient as the weight for each
item. The factor score should be more reliable than the summative or mean
score because it more optimally weights the items.
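A sketch of generating the score after the factor analysis (the variable name fconserve is mine, not the book's):

. quietly factor x1-x9, pcf
. predict fconserve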
[Figure 1.2: Side-by-side histograms of the mean conservatism score (scores 1 to 4) and the factor score on conservatism (scores roughly -2 to 6), with frequency on the vertical axes]
CFA assumes that the latent variable accounts for how people respond to
all nine individual questions, which is what the nine items share in common.
Notice the direction of the arrows from Conservative to each of the nine
items; the arrows take us from the latent variable to the observed items. This
is because how people respond to a question is the dependent variable; that is,
a person’s response depends on how conservative he or she is, the
independent variable. Because all the items seem to tap conservatism, we will
posit that a single factor is all we need, and so we draw the single-factor
model seen in figure 1.3.
[Figure 1.3: CFA for the nine-item conservatism scale: paths from Conservative to x1 through x9, each indicator with its own error term, ε1 through ε9]
There are real advantages to CFA. By isolating the shared variance of the
nine questions from their unique variances, we are able to obtain a better
measure of the latent variable. We are also likely to get stronger results by
removing measurement error if the latent variable is subsequently used as an
independent or dependent variable in a structural equation model. This is
because measurement error, by its nature, only adds noise to our
measurement; it has no explanatory power.
This new canvas size will be large enough to accommodate the full diagram. However, you may not yet be able to see the full canvas. Click on the Fit in Window button to see the full canvas in the Builder window. If a portion of the diagram is not on the canvas, click on the Select tool and drag it over the model so that all objects are highlighted. Then move the diagram until you see the entire diagram on the canvas.
Because Conservative is so long, it does not fit in the default size oval for a latent variable. To make the oval larger, select Settings > Variables > All Latent... from the menu (on a Mac, click the Tools button in the upper right to find the Settings menu). In the dialog box that opens, change the size so that the full name fits. You can also change the size of the boxes for observed variables through the Settings > Variables menu if you like.
With so many indicators, it should be clear now why you want short
names for your variables. I used the clonevar command to rename the
variables because their original names in the dataset were long and
unclear, for example, clonevar x1 = s8332500.
The Stata command to fit our CFA model is simple. We do need to run a
set of four commands, but each of them is quite simple. First, to fit the model,
we run
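A sketch of that first command, fitting the nine-item CFA with the defaults; the numbered list that follows describes the estimation methods you could specify instead:

. sem (Conservative -> x1-x9)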
1. The default is method(ml), which means that we fit the model using
maximum likelihood estimation. By default, when using method(ml),
the variance–covariance matrix of the estimators (and therefore the
standard errors) is computed using an observed information matrix.
Where you assume normality, method(ml) is often the best option and
is fairly robust even with some violation of normality. This uses listwise
deletion.
2. When option method(ml) is combined with option vce(robust), sem
performs quasi maximum likelihood estimation, and the standard errors
are estimated in a manner that does not assume normality. This uses the
Huber–White sandwich estimator of the variance–covariance matrix of
the estimators. Because several of our items are clearly not normally
distributed, this might be a good option to use. The robust standard
errors are less efficient than the observed information matrix standard
errors if the assumptions of maximum likelihood estimation are met.
This uses listwise deletion.
3. The option method(adf) is asymptotically distribution free. This
method makes no normality assumptions and is a form of weighted least
squares. It is also less efficient than maximum likelihood where that is
appropriate, but more efficient than the quasi maximum likelihood
estimation. Because it does not assume normality and is asymptotically
equivalent (in a large sample) to maximum likelihood, this may be the
best option for our data. This uses listwise deletion.
4. The option method(mlmv) is appropriate when you want to use all the
information available in the presence of missing values on one or more
variables. This method assumes joint normality and that the missing
values are missing at random. This does not use listwise deletion. In our example, we would have a much larger sample using the method(mlmv) option, whereas with any of the other three estimators our sample is limited to observations that are complete on all nine items.
You can also use the vce(bootstrap) option to estimate the standard
errors with the bootstrap procedure. This method will resample your
observations with replacement and fit the model however many times you
specify. It will then use the distribution of the parameter estimates across
these replications to estimate your standard error. This will be especially
useful when you are concerned about violating the normality assumption of
the maximum likelihood options. For example, you might run the following
command:
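A plausible form of that command (the number of replications is my choice, not the book's):

. sem (Conservative -> x1-x9), vce(bootstrap, reps(500))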
For now, we will just use the default version of the command. Here are
the results:
1.8 Interpreting and presenting CFA results
At the top of the results, we see a note that 7,360 observations with missing values were excluded. The default estimation method, maximum likelihood, uses listwise deletion and drops any observations that do not have a response for all nine of our items. The results next report our endogenous (dependent) variables. All of our observed items, x1 to x9, are endogenous; that is, these measurement variables depend on the latent variable. We next have a list of exogenous variables. Stata reports just one latent exogenous variable, Conservative; Stata does not list the measurement-error terms here even though these are also latent exogenous variables.
[Figure: the single-factor model, Conservative with paths to x1 through x9, each indicator with its own error term, ε1 through ε9]
Alpha is the lower limit of ρ when there are no correlated errors, as in our example. Alpha may be greater than ρ when there are correlated error terms, whether these are between indicators of the same latent variable or indicators of different latent variables. You need to use the unstandardized loadings
when estimating these reliability values. In addition, to use this formula you
need to rerun your structural equation model and fix the variance of the latent
variable at 1.0; so far, we have been using a reference indicator to estimate
the variance of the latent variable, but we no longer need to do this when we
fix the variance of the latent variable at 1.0. In section 1.9.2, we will work out
a detailed example of how to do this.
In other words, the CFI reported by the estat gof command above
indicates that our model does 89.8% better than a null model in which we
assume the items are all unrelated to each other. This is quite an
improvement, but depending on the source, the recommended cutoff values should be either 0.90 or 0.95, with the 0.95 cutoff becoming more widely used today. Our one-factor solution is again less than ideal, falling short of either
of these criteria.
Another index of how well our model fits is the standardized root mean
squared residual (SRMR). This is a measure of how close we come to
reproducing each correlation, on average. Our results have an SRMR of 0.05.
This means that, on average, we come within 0.05 of reproducing each
correlation among the nine indicators. The recommended value is less than
0.08, so our 0.05 is a good value. The SRMR can be misleading if you have
items that are minimally related; for example, if the average correlation of
items is 0.06, then coming within 0.08 would be terrible, but if the average
correlation is 0.60, then coming within 0.08 might be considered okay.
Box 1.2. How close does our model come to fitting the covariance matrix?
So far, the CFA result is telling us that our model is not ideal, but the result
does not indicate what we might change about our model. Stata provides
postestimation options that can help here, but it is dangerous to rely too much
on this information.
Also remember that these values are for adding a single parameter. The
values are not additive, so if you add a parameter, the modification indices
for everything else will change. Best practice is to add only one parameter at
a time.
Both of these items involve a focus on helping people who, because of their
circumstance (illness or age), may have limited capacity to make a substantial
income. estat mindices indicates that the error terms for these two items
should be correlated with an approximate correlation (standardized estimated
parameter change) of 0.39. In other words, what is unique about one of these
items with respect to Conservative is correlated with what is unique about
the other item. x3 and x4 are more correlated with each other than they are
with the other seven items (the correlation between this pair of items is 0.59).
We will drop x2 and x8 and keep the covariance between the pair of error terms for x3 and x4. Our final model is fit and evaluated using the following four commands. Notice the inclusion of the covariance() option to provide for the error term of x3 to be correlated with the error term of x4. We name these error terms e.x3 and e.x4. The asterisk tells Stata to allow them to be correlated.
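A sketch of those four commands, inferred from the final model shown in figure 1.5:

. sem (Conservative -> x1 x3-x7 x9), cov(e.x3*e.x4)
. estat gof, stats(all)
. estat mindices
. sem, standardized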
Our results are very strong, and the goodness of fit is greatly improved. Our model still has a significant chi-squared of 56.02, which is highly significant with 13 degrees of freedom, p < 0.001. Although this means our model is not perfect, the measures of fit are all good: RMSEA = 0.05, CFI = 0.99, and SRMR = 0.02.
We use the formula for scale reliability that includes the covariance of the two error terms in the denominator. We use the unstandardized loadings, and we need to refit our model one last time. We need to have the variance of Conservative fixed at 1.0 so that no reference indicator is needed; that is, each of the λ values is estimated. When we fix the variance of Conservative at 1.0, Stata recognizes that we do not need a reference indicator fixed at 1.0. Here is our command with the additional constraint on the variance of Conservative, and partial results.
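A sketch of that command, assuming the final seven-item model (the var() option fixes the latent variance at 1.0):

. sem (Conservative -> x1 x3-x7 x9), cov(e.x3*e.x4) var(Conservative@1)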
The formula for reliability is

    ρ = (Σ λi)² / [ (Σ λi)² + Σ var(ei) + 2 Σ cov(ei, ej) ]

where the λi are the unstandardized loadings, the var(ei) are the error variances, and the cov(ei, ej) term picks up each pair of correlated error terms.
In our final model, shown in figure 1.5, we have included the key
measures of goodness of fit along with sample size and reliability.
[Figure 1.5: Final model (standardized). Loadings: x1 = .56, x3 = .58, x4 = .56, x5 = .50, x6 = .64, x7 = .58, x9 = .73; correlated errors for x3 and x4 = .38. Legend: χ²(13) = 56.02, p < 0.001, RMSEA = 0.05, CFI = 0.99, SRMR = 0.02, N = 1625, ρ reliability = 0.79]
Many readers are perfectly happy to have just a figure showing the results. Other readers want more detailed information, which can be provided in a table. By including a figure as well as a table like table 1.1, you can save space in the text because you will not need to report each value (see box 1.3 for an interpretation of CFA results).
Table 1.1. Final results for CFA model

                                Unstandardized value    Standardized value
  Loadings
  Variances
  Covariance
    error.x3 with error.x4           0.11***                 0.38***

  *** p < 0.001
This table would be presented along with the measures of fit shown in
figure 1.5. The table makes it clear which item we picked as the reference
indicator. Notice that Stata does not report significance levels for the error
variances, although it does report approximate confidence intervals. Other
structural equation model software does report significance levels for the
error variances; Stata does not because there is a boundary problem in that a
variance cannot be less than 0. However, because none of the confidence
intervals for the error variances approaches 0 for its lower limit, we should
not dismiss any of these error variances by fixing them at 0. There is a unique
variance for each item that is not directly shared with Conservatism.
We have between 7,291 and 7,397 observations. These items were asked
of far more people than the items on conservatism. We are interested in
evaluating our measure of depression using these three indicators and then
combining our latent depression variable and our latent conservative variable
to see if they are correlated. We aim to find out whether depressed people are
more conservative or less conservative.
Let us look at the standardized solution to assess our model. We run our
set of four commands. But because the estat gof and estat mindices
commands do not add any useful information, the output from these two
commands is not shown.
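A sketch of the fit and the standardized replay, assuming the final Conservative measurement model (including the e.x3–e.x4 covariance) is retained:

. sem (Depress -> x11-x13) (Conservative -> x1 x3-x7 x9), cov(e.x3*e.x4)
. sem, standardized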
The first and third Depress indicators are in the correct direction, but
item x12 would need to be reverse-coded if we were using a traditional
summated scale. We do not need to do this when constructing a latent
variable because the loadings can be either positive or negative for each item.
To simplify interpretation, it is always useful to have the majority of the
indicators coded so that a higher score is associated with more of the concept.
In this case, two of our three items are positively loading on Depress, so
higher scores for the majority of items reflect greater depression. If this were
not the case, we could reverse-code items.
If we examine the standardized solution, we see that all three loadings are significant and fairly strong. The scale reliability of our simple three-item measure of latent Depress can be computed as before. Remember, to compute ρ, we need to refit the model with the variance of the latent variable fixed at 1.0: sem (Depress -> x11-x13), var(Depress@1). To calculate the reliability estimate, we use the absolute value of the unstandardized loadings.
Option 1 uses the default, which is listwise deletion. This will include
the 7,183 people who answered all three of the Depress indicators; it
will exclude people who answered just one or two of the Depress
indicators. More importantly, it will include thousands of people who
were not asked any of the indicators of Conservative.
Option 2 uses the method(mlmv) estimator. This will include the 7,429
people who answered at least one of the Depress items, regardless of
whether they had missing values coded as dots or as special missing-
value codes (.a, .b, .c, etc.). This will handle part of our problem in
that it will include data for those who answered any of the three
Depress items; the problem is that it will also include people who were
not asked any of the Conservative items (remember, those items were
only asked of a subsample of survey participants).
Option 3 does listwise deletion for the three indicators of Depress (the default) but also does listwise deletion for anyone missing one or more of the x1–x9 indicators used to measure Conservative (the !missing means "not missing"; a sketch of such a command appears after this list). This results in a sample size of 1,466 who answered all nine indicators of Conservative and all three indicators of Depress.
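A minimal sketch of the option 3 command (the exact variable list inside the if condition is my assumption):

. sem (Depress -> x11-x13) (Conservative -> x1 x3-x7 x9) ///
      if !missing(x1, x2, x3, x4, x5, x6, x7, x8, x9)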
How well does our model fit the data? Let us find out with our third
command:
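Presumably the goodness-of-fit command, with the modification indices (the fourth command) whose output is discussed next:

. estat gof, stats(all)
. estat mindices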
The first set of modification indices is for the measurement loadings. The biggest modification index among this set is 16.58, for letting x13 load on Conservative. If we added this path, our model would be more
complex conceptually because this item would be loading on two latent
variables. There are cases where a single item should load on two latent
variables, but this makes the latent variables at least somewhat factorially
confounded.
What should we do about x13? The item asks “How often R depressed last month?” This has face validity as an indicator of Depress. I do not see
any face validity for it as an indicator of Conservative, and so would not feel
justified in allowing the item to load on both dimensions. If you wanted to let
x13 load on both dimensions, your command would be sem (Depress ->
x11-x13) (Conservative -> x1 x3-x7 x9 x13); you simply have the item
loading on both latent variables.
At the bottom of the results, we see that there are several error terms we
could correlate. Although all these modification indices are greater than 3.84,
we should not make any changes unless we have a compelling reason to do
so. Given how well our model fits the data and that logically we do not see
why these errors should be correlated, we will go with this model. Figure 1.6 shows our standardized results. All reported values are significant at the 0.001 level except the correlation of Depress and Conservative, which is significant at the 0.05 level.
[Figure 1.6: Two-factor CFA, final model (standardized): Depress (x11–x13) correlated with Conservative (x1, x3–x7, x9) at .12, with the e.x3–e.x4 correlation of .38]
If you have experience with exploratory factor analysis using Stata, you
will remember that there are oblique solutions that allow the factors to be
correlated. What we are doing with CFA is a bit different. We are estimating
the correlation directly in the model, whereas conventional oblique solutions
make arbitrary assumptions to get a moderate correlation. If you change those
arbitrary assumptions, you can make the correlation either bigger or smaller,
but it is questionable to interpret the correlation as very meaningful given it is
based on arbitrary assumptions. By contrast, with CFA we can directly
estimate the correlation. If our model is correct, the correlation between
depression and conservatism is statistically significant but weak at 0.12.
Keep in mind that this box is about what you need to know and not
what is nice to know. Identification can be simple and often is for many
models, but it is sometimes very difficult. One solution is to follow
some general rules and then assume that Stata would not give you a
definite answer if there were an identification problem. Ideally, you
would generate a system of simultaneous equations—one for each
variance and covariance—and then determine algebraically whether you
have enough information to identify each parameter you are estimating.
What happens if you have more than three indicators for a single latent
variable? With four indicators, we would have 4(4 + 1)/2 = 10 variances and covariances. How many parameters are we trying to
estimate? These are easiest to count for the standardized solution. We
would need to estimate four loadings (paths from the latent variable to
the indicators) plus four error variances. Thus we have 10 pieces of
information and 8 parameters to estimate, resulting in 10 - 8 = 2 degrees of freedom. What if we correlated a pair of error terms? We
would still have 10 pieces of information, but now we would have 9
parameters to estimate, meaning that we would have just 1 degree of
freedom. We could fit this model and test the fit, although 1 degree of
freedom is a fairly limited amount of information for the testing of the
model.
Perhaps you have read articles that use structural equation modeling
with several latent variables, one or more of which had just two
indicators. In terms of identification, we can estimate a latent variable
that has just two indicators if we have additional observed variables in
the model.
1.11 Parceling
Sometimes we have so many indicators that the model becomes difficult to fit
and impossible to draw. We might have three latent variables with 20 items
for each of them. This means we need to estimate 3 × 20 = 60 loadings, 60 error variances, and the variances and covariances of the latent variables
themselves. In cases like this, parceling is one solution you might consider.
This approach combines the items into a few parcels. For example, we might
have three parcels for each latent variable with the first and second parcels
based on seven items and the third parcel based on six items.
How do we combine items into parcels? If we randomly assigned the
items to three sets and generated a mean score for each set, we would have
three parcels. We could create the parcels like this: egen parcel1 =
rowmean(x1 x2 x3 x4 x5 x6 x7). If there were a single underlying
dimension for each of the latent variables, then the three randomly assigned
sets of items would be approximately equivalent.
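A minimal sketch with a hypothetical 20-item scale x1–x20, split into parcels of seven, seven, and six items (in practice the assignment to parcels would be randomized rather than following this regular pattern):

. egen parcel1 = rowmean(x1 x4 x7 x10 x13 x16 x19)
. egen parcel2 = rowmean(x2 x5 x8 x11 x14 x17 x20)
. egen parcel3 = rowmean(x3 x6 x9 x12 x15 x18)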
A second approach is to actively balance the items going into each parcel.
First, you run a separate factor analysis for each of the three latent variables.
You then assign the top three items (based on loadings) to anchor each
parcel. The next step is to assign the next best triad of items to the three
parcels, and continue until you run out of items. It is okay if one of the
parcels has fewer items than the others. Balancing the items may be better
than random assignment because it ensures that each parcel is a well-balanced representation of the latent variable. As with the random selection,
this approach assumes there is a single underlying dimension.
We could have a methods factor where similar methods are used for some
of the items for different latent variables. If we were interested in a latent
variable to represent a person’s health, we might have five self-report items, a
performance measure of fitness, and a clinical assessment. Suppose our
second latent variable was compliance and it had a similar mixture of
indicators. In addition to our two latent variables, Health and Compliance,
we might have to add a factor for self-report method. This factor would load
on the five items that are self-reported health and the five items that are self-reported compliance.
The big extension of CFA is when we use latent variables in full structural
equation models, where we replace correlations of latent variables with
causal paths between them. Before we look into that, we must first examine
using structural equation modeling for path models that do not involve latent
variables. Then, chapter 3 will bring these two traditions, CFA and path
analysis, together to show how we fit full structural equation models.
1.13 Exercises
1. Why should a measure represent a single dimension?
2. Why is alpha reliability sometimes described as the lower limit on
reliability?
3. Why is an average of a series of items easier to interpret than the
summation of those items?
4. Use the SEM Builder to draw a three-factor CFA with three indicators of
each factor.
5. What is the difference between method(ml), method(ml) with
vce(robust), and method(mlmv)?
6. When would you use bootstrap to estimate standard errors?
7. The reliability ρ of a scale corresponding to a latent variable is usually bigger than the alpha reliability for the set of indicators. When could ρ be smaller and why?
8. You have 15 observed indicators. What is the maximum number of
parameters you can estimate?
9. If you have two results that have modification indices of 40.0, what does
this mean and what should you do?
10. You have a psychological measure of competitiveness consisting of 40
items and a measure of liberalism consisting of 30 items. If you are
doing a CFA of these two latent variables, what is the problem? What is a
solution?
11. You read an article that does a CFA involving three latent variables with
three indicators of each. There is an appendix that shows a correlation
matrix and standard deviations:
1.0
0.7 1.0
0.6 0.6 1.0
0.2 0.1 0.2 1.0
0.1 0.2 0.1 0.6 1.0
0.1 0.1 0.1 0.6 0.7 1.0
0.3 0.3 0.3 0.1 0.1 0.1 1.0
0.3 0.3 0.3 0.2 0.1 0.1 0.8 1.0
0.3 0.3 0.3 0.1 0.1 0.1 0.9 0.8 1.0
Read appendix B at the end of the book to learn how to enter these data.
[Exercise figure: latent variables measured by Confid1 and Confid2 and by Comply1 and Comply2, each indicator with its own error term]
We will draw a modified version of the two-factor model shown in figure 1.6
and fit the model using the SEM Builder. We have seven indicators of
Conservative, which we arranged vertically in figure 1.6. With a large
number of indicators, a vertical arrangement usually works best. Here we will
arrange the indicators horizontally to illustrate how to handle a large number
of indicators horizontally; this is sometimes necessary when there are several
latent variables. First, let us open the dataset:
[Figure: the model to be drawn, with Depress and its indicators on one side and Conservative with its seven indicators x1, x3, x4, x5, x6, x7, and x9 (errors ε4–ε10) arranged horizontally]
If you are using Stata for Mac, you see a Tools button at the top right of
the screen. If you do not see the button, you need to right-click on the bar,
select Customize, and follow the instructions to get the button to show up.
When you click on the Tools button, a drop-down menu appears that allows
you to access the Objects, Estimation, Settings, and View menus that are
directly accessible on the top bar of Stata for Windows and Unix.
Both styles of top bar also have other tools available. You can use
Zoom to adjust the size of the model you are creating, which is helpful
if part of your model has, say, several arrows in close proximity. The
Fit in Window tool is also useful. You can drag the border of the SEM
Builder window to be as large or small as you want, and then you can
click on Fit in Window to have the canvas fill the entire window size.
1.A.2 Estimating the model

Now we need to fit the model. Click on Estimation > Estimate..., and then
simply click on OK to get the results shown in figure 1.9.
[Figure 1.9: The fitted two-factor model with the default results displayed, including the variances of Depress and Conservative, the error variances, and the intercepts of the observed variables]
This is not a bad model, but it shows several numbers we do not need,
such as the variance of Depress and Conservative as well as the intercepts
for each observed variable. To remove these, click on Settings > Variables >
All Latent.... Select the Results tab (see figure 1.10). Both of our latent
variables are exogenous, so change the first result for Exogenous variables
from Variance to None.
Figure 1.10: Variable settings dialog box
[Figure: the model redrawn without the latent-variable variances and intercepts; only the loadings, the error variances, and the covariance between Depress and Conservative remain]
There is one more thing we might do to make the figure a bit more useful
for our readers. We can add a legend to report the goodness-of-fit
information. Let us put this legend in a box in the left middle of the figure
where there is some white space. Click on the Add Text tool on the left
side of your screen, and then click on the figure where you want the text to
start. Creating the legend can be a bit tricky because we want to italicize
the p and the N, and we also want the Greek letter ρ (rho). We need to enter
{it:p} for the italic p, {it:{&rho}} for the italic ρ, and {it:N} for the
italic N. Notice that we are using curly braces rather than parentheses or
square brackets.
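For example, a finished legend entered with this markup might read as follows; the values are placeholders rather than our model's actual results:

chi-squared = 30.18; {it:p} < 0.001; {it:{&rho}} = .87; {it:N} = 500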
If there is a problem fitting the legend on the figure, you can always
include this information in a footnote to the figure.
You can do a lot more with the SEM Builder; I give you just a taste here.
Click on Estimation > Estimate.... The Model tab provides three options for
estimating parameters. You will want to use the Maximum likelihood with
missing values option to use all available data. The Asymptotic distribution
free option should be used when you have variables with far from normal
distributions. Under the if/in tab, you can enter restrictions on your sample
like we did in the text (see ch1.do). The Weights tab allows you to select
sampling weights when these are available for your data. The SE/Robust tab
provides many options for how you estimate the standard errors. Clicking on
Survey data estimation changes your options in ways that are relevant for
working with complex samples. Without clicking on Survey data estimation,
you have a wide variety of estimators for the standard errors. The Clustered
robust estimator adjusts for lack of independence within clusters, like you
might get if you sampled 50 children from each of 20 different schools, with
the school being your cluster. The Robust estimator uses the sandwich
estimator to compute standard errors and thereby avoids the assumption of a
normal distribution. The Bootstrap estimator of standard errors was
illustrated in the text of this chapter, but it can also be used from here.
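The command-line counterparts of these dialog choices are options on the sem command. As a sketch using the one-factor model from this chapter (school is a hypothetical cluster identifier):

. sem (Conservative -> x1-x9), method(mlmv)
. sem (Conservative -> x1-x9), vce(robust)
. sem (Conservative -> x1-x9), vce(cluster school)
. sem (Conservative -> x1-x9), vce(bootstrap)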
When you click on Estimation > Goodness of fit, you get fit options
that are available as postestimation commands that were illustrated in the
chapter text. These include the equation-level goodness of fit (estat eqgof)
and the overall goodness of fit (estat gof) commands. The modification
indices (estat mindices) are accessed by clicking on Estimation > Testing
and CIs > Modification indices. You also get many more options that go
beyond the scope of this chapter. When you run these options, the results
appear in the Results window rather than on the figure.
Stata uses a general number format with up to seven total digits by default
when reporting the path coefficients. If a path coefficient has a 0 in its
second decimal place (for example, 0.50), Stata reports only the first
decimal, .5. You may want to change this to report two decimal places. To do
this, you would click on Settings > Connections > Paths... to open the dialog
box. In the lower left on the Results tab, click on the button labeled
Result 1.... Change the Format to %4.2f.
I will illustrate how to use SEM Builder again in each of the following
chapters. Try new features as you learn them. You will find this to be a
remarkable tool that Stata has built. It works as well as or better than
specialized commercial drawing programs, and they cannot even fit your
model.
. Warning: When you go to Statistics > Multivariate analysis > Factor and principal component
analysis, do not then pick Principal component analysis (PCA) from the menu. This is intended to extract
principal components, linear combinations of the variables, rather than factors.
. To get the mean for only those who answered all nine items, we would have used egen conservm =
rowmean(x1-x9) if !missing(x1, x2, x3, x4, x5, x6, x7, x8, x9). Stata reads !missing as “not
missing”. Notice the items are all listed and separated by a comma; the commas are necessary for this
command.
. Note that the name of the latent variable should be capitalized to help us distinguish indicators, which
should be all lowercase, from latent variables.
. If we had wanted a full information approach that utilized all available information, we would have
specified sem (Conservative -> x1-x9), method(mlmv).
. Remember: The first loading is fixed at 1.0 as a reference indicator.
. I do not describe the Tucker–Lewis index, which has the same cutoff values as the more commonly used
CFI.
. Remember: The model chi-squared is testing whether or not our model fits, so we want to consider
changes that reduce the size of chi-squared significantly.
. Whether we use a reference indicator to estimate the variance of the latent variable or fix the latent
variable’s variance at 1.0, the standardized solution reported in figure 1.5 will be the same.
Chapter 2
Using structural equation modeling for path
models
2.1 Introduction
In this chapter, we will learn about the second building block in structural
equation modeling. Now that you have some experience with the
measurement model—that is, confirmatory factor analysis—we will see how
to fit path models by using the sem command. We will finally delve into full
structural equation modeling in chapter 3, which will combine latent
variables presented in chapter 1 with the ideas of path analysis presented in
this chapter.
But, let us say you do not include z because you do not have it measured
or you did not think about it. The model you fit is

y = β1x + ε1

In this case, if the correlation between x and z is greater than 0, then x
will be correlated with ε1, resulting in a biased estimate of β1, the direct
effect of x on y. This is one example of omitted variable bias, which is sometimes
referred to as an endogeneity problem. We never know whether all relevant
predictors have been included in our model, and one of those we left out may
actually cause the endogenous outcome variable. Thus even with longitudinal
data, the term “causal model” has extremely limited meaning. Many
researchers say that “x influences y” or “x is associated with y” instead of
asserting that “x causes y”. A discussion of what is meant by a causal
model is beyond the scope of this chapter, but it is an important issue that
every researcher should examine (see Bollen and Pearl [2013]; Halpern and
Pearl [2005]; Hedström and Swedberg [1998]; Shadish, Cook, and Campbell
[2002]).
The only endogenous outcome variable in figure 2.1 is x6, the final
outcome. The remaining variables, x3–x5, are endogenous variables that
mediate some part of the effect of antecedent variables on subsequent
variables. Each of these endogenous variables is explained by other variables
in the model. We would say that x1 has a direct effect on x6 (β61).1
[Figure 2.1: A hypothetical recursive path model with exogenous variables x1 and x2 (correlation r21), endogenous mediators x3, x4, and x5, and the endogenous outcome x6; the labeled paths include β31, β32, β41, β43, β51, β61, β64, and β65, and the errors of two of the mediators are correlated (re2,e3)]
Figure 2.1 is a recursive path model because the flow of influence goes in
a single direction; there is no feedback. We say that x1 causes the level of
x6 directly, with effect β61, and indirectly, with effects that are products
of the direct paths through the mediators (for example, β64β41 for the route
x1 → x4 → x6). We do not allow for x6 to have any feedback to x1. Study
figure 2.1 carefully to see if you understand how these indirect influences
are calculated by multiplying a combination of direct effects. Can you figure
out the direct and indirect effects of x2 on x6?3
[Figure: the classic mediation diagrams. In the first, x affects y with total effect c; in the second, x affects a mediator (path a), the mediator affects y (path b), and a direct path from x to y remains (path c)]
Today, many path models are like our figure 2.1 in the sense that there
are multiple endogenous mediators. Some of these may result in
positive indirect effects, and some may result in negative indirect
effects.
[Figure 2.2: Path model in which attention4 has direct paths to math7, read7, and math21, and math7 and read7 each have a direct path to math21; the errors are ε1 for math7, ε2 for read7, and ε3 for math21]
Let us fit this model just as it is drawn. It is always useful to draw a
figure before writing the command, even if it is just a freehand sketch with
a pencil. Figure 2.2 was drawn using Stata’s SEM Builder (see
appendix A and the appendix to this chapter). The SEM Builder allows us to
estimate the results directly from the figure. You may want to rely on the SEM
Builder as your primary way of fitting models; however, in the text of this
chapter, I show the actual Stata commands. It is important to understand what
these commands are, even if you rely on the SEM Builder most of the time.
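Based on figure 2.2, the command would look something like this sketch (assuming the chapter's dataset is already in memory):

. sem (math7 <- attention4) (read7 <- attention4) ///
>     (math21 <- attention4 math7 read7), standardized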
Figure 2.3 is the resulting model including the standardized estimates that
Stata generates:
[Figure 2.3: The model with the standardized estimates displayed on the paths; the values shown include 0.31, 0.14, 0.12, 0.13, and 0.25]
The next command tells us how well our model fits the data. This
includes some information to evaluate our model that is unavailable with
traditional regression results.
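That command, which we also use later in the chapter, is

. estat gof, stats(all)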
These results are disappointing. Our model significantly fails to
reproduce the covariance matrix for our four variables; the chi-squared test
is significant. Remember from chapter 1 that with structural equation modeling,
a significant chi-squared means that we fail to account for the covariances
among our variables. We want a small chi-squared relative to the degrees of
freedom and one that is not statistically significant. The root mean squared
error of approximation (RMSEA) is 0.25, much greater than our ideal standard
of less than or equal to 0.05; the comparative fit index (CFI) is 0.79, far below
our ideal standard of 0.95 or even the acceptable standard of 0.90. We have 1
degree of freedom. A recursive path model is always identified or over-
identified.
We will ignore the direct and indirect effects until we get a better fitting
model. Here we will examine the last command, estat mindices, to see if
there is a way to improve the fit. This command reports modification indices,
each of which is an estimate of how much we would reduce chi-squared if we
added the indicated path. For example, math7 <- read7 has a modification
index of 26.89. This means that if we added a path going from read7 to
math7, our chi-squared would be reduced by roughly 26.89 points and this
would use up 1 degree of freedom. Because a chi-squared of 3.84 is
significant with 1 degree of freedom, this would be a significant
improvement in how well our model fits the data. We need to be very
cautious here: we have only 1 degree of freedom, and estimating any
additional parameter will guarantee a perfect fit because we will then have 0
degrees of freedom. Any model with 0 degrees of freedom will have chi-squared
= 0, RMSEA = 0, and CFI = 1. That does not mean it is a good model, just that
we cannot test it unless it is over-identified (that is, has at least 1 degree of
freedom).
At the bottom of the results, the final extra path we might add would be to
allow the error terms for math7 and read7—that is, e.math7 and e.read7—
to be correlated. In figure 2.3, e.math7 appears as ε1 and e.read7 appears as
ε2. Allowing these error terms to be correlated makes a lot of sense, and we
do not have to make a causal argument as we would for a path from read7 to
math7 or vice versa. This correlated error means that there are variables that would
influence both math7 and read7 that are not in our model. There are lots of
these, such as the child’s socioeconomic status and gender. Allowing the
error terms of endogenous variables that have no direct linkage between them
to be correlated is similar to the idea of a partial correlation. It is how much
of the variance in read7 that is unexplained by our model is correlated with
the variance in math7 that is unexplained by our model. Because our model
leaves about 98% of the variance of both variables unexplained (the R²'s for
both variables were about 0.02), it is likely that there will be some covariance.
[Figure 2.4: The model of figure 2.2 with the errors of math7 and read7 (ε1 and ε2) allowed to covary]
To fit this model, we run exactly the same set of commands as before
except we additionally request the covariance of and . We add this using
covariance(), abbreviated cov(), as an additional option after the comma:
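A sketch of the revised command:

. sem (math7 <- attention4) (read7 <- attention4) ///
>     (math21 <- attention4 math7 read7), ///
>     cov(e.math7*e.read7) standardized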
The results of the sem command (results not shown) are almost the same
as before, except we now find a covariance of e.math7 and e.read7.
Because we requested the standardized solution, the reported value is actually
a correlation. I leave it to you to run this structural equation model using
the commands or using the SEM Builder. The R²'s for read7 and math7 have not
changed much compared with what we had in the initial model, but the R² for
math21 has increased a bit from 0.19 to 0.22.
Many of the statistics produced by the estat gof, stats(all)
command (results not shown) are meaningless because we have no degrees of
freedom and will have a perfect fit, by definition. The estat mindices
command reports no modification indices because we have a perfect fit, as
we must have with no degrees of freedom.
The first table reports the unstandardized and the standardized direct
effects. These are the same standardized direct effects we already had. The
z test and probability reported in this table are for the unstandardized direct
effects. You can either rely on the tests for the unstandardized solution or see
how you can compute tests for the standardized effects by reading box 2.3.
The second table reports indirect effects, where they exist. By examining
figure 2.4, we see that there is no indirect effect of attention4 on math7,
just a direct effect. Similarly, there is no indirect effect of attention4 on
read7. There are two indirect paths between attention4 and math21:
attention4 → math7 → math21 and attention4 → read7 → math21.
Stata provides the total of these indirect effects, 0.07, and a significance
test for the unstandardized indirect effect; Stata does not provide
the two separate effects. We can easily calculate each of these indirect effects
by hand by multiplying the corresponding paths together (0.14 × 0.30 and
0.13 × 0.25), but we would not have separate tests of significance for them. I
explain how you can test these two indirect effects in box 2.3.
The last table in our results provides the total effects by adding together
the direct and corresponding indirect effects (see box 2.2). The Std. Coef.
column provides these results. The total standardized effect of attention4
on math21 is the direct effect plus the indirect effect, that is,
0.11 + 0.07 ≈ 0.19 (allowing for rounding); the z test for the unstandardized
total effect shows that it is statistically significant.
Let us put all this information together in both a figure and a table. In
figure 2.5, we report the standardized path coefficients for the direct effects
as well as the correlation of the error terms. In addition, we report the
available measures of goodness of fit. With no degrees of freedom, the
standard measures (chi-squared test, CFI, and RMSEA) are meaningless, so we
do not report them here; we would report them for any model that had at
least 1 degree of freedom. We also report the R² for each endogenous
variable and the sample size.
[Figure 2.5: Final model with standardized coefficients: attention4 → math7 = 0.14**, attention4 → read7 = 0.13**, attention4 → math21 = 0.11*, math7 → math21 = 0.30***, read7 → math21 = 0.25***, and a correlation of 0.26*** between ε1 and ε2; R² = 0.02 for math7, 0.02 for read7, and 0.22 for math21; N = 430]
Table 2.1 shows the direct, indirect, and total effects. We can get all the
information we need from the estat teffects, standardized command
results. We report the significance levels based on the tests for the
unstandardized solution because the specific tests for the indirect and direct
effects are not provided in the standardized solution. While the test of an
unstandardized coefficient and the test of the corresponding standardized
coefficient may differ, this difference usually does not change the overall
significance level.
Table 2.1: Direct, indirect, and total effects (standardized)

                            Direct     Indirect    Total
Math at age 7
    Attention at age 4      0.14**        -        0.14**
Reading at age 7
    Attention at age 4      0.13**        -        0.13**
Math at age 21
    Attention at age 4      0.11*       0.07**     0.19***
    Math at age 7           0.30***       -        0.30***
    Reading at age 7        0.25***       -        0.25***

The significance levels shown here are for the unstandardized solution.
* p < 0.05, ** p < 0.01, and *** p < 0.001.
To use nlcom, we first make up a name for each indirect effect so that
we can find the estimated values in the results. First, let us name the
indirect effect attention4 → math7 → math21 attn_m7_m21. We will name
attention4 → read7 → math21 attn_r7_m21. Next we need to
find out the names Stata has given to the coefficients for each of the
direct paths in the model. We need a legend of the names of
coefficients, which we can get by replaying the results with sem,
coeflegend.
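Putting this together, a sketch of the nlcom command follows; the _b[] names assume Stata's usual equation:variable labeling, so confirm them against your own coeflegend output:

. sem, coeflegend
. nlcom (attn_m7_m21: _b[math7:attention4]*_b[math21:math7]) ///
>       (attn_r7_m21: _b[read7:attention4]*_b[math21:read7])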
Perhaps gender is related to both the attention span at age 4 and the
reading level at age 7 (gender → attention4 and gender → read7).
When there is a common antecedent that causes both variables—that is,
gender causes both attention span at 4 and reading at 7—we say the
relationship between attention span at 4 and reading at 7 is spurious or at
least partially spurious.
[Figure: a common antecedent, vocab4, with paths to both attention4 and read7, illustrating how a shared cause produces a spurious association]
Let us make this a bit more complicated. We will focus on the model in
figure 2.5 except we will add four covariates: vocabulary at 4, adoption status
of the child, gender of the child, and his or her mother’s education.
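A sketch of what that command might look like, with the four covariates added to each of the three equations and the correlated errors retained from our earlier model:

. sem (math7 <- attention4 vocab4 adopted male momed) ///
>     (read7 <- attention4 vocab4 adopted male momed) ///
>     (math21 <- attention4 math7 read7 vocab4 adopted male momed), ///
>     cov(e.math7*e.read7) standardized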
We have added four covariates for each equation: vocab4, adopted, male,
and momed. In the first line, the equation for math7, we are now saying that the
math7 score may be influenced by attention4 as in our original model, but
also by the child’s vocabulary at age 4, whether he or she is adopted, whether
the child is a boy or girl, and his or her mother’s education. When we fit this
equation, the question becomes: Does attention4 still have a significant
effect on math7? We do the same thing in the next two lines for read7 and
math21.
An auxiliary variable is a variable that is not part of your model but that
explains who is more likely to have missing values. An example would be
education because people with less education often have more missing values
than people with more education. While we cannot test the MAR assumption,
it is less restrictive than MCAR; sometimes we can include a few auxiliary
variables that help explain who has missing values. The point to remember is
that method(mlmv) is less restrictive than the assumption of some other
methods, such as listwise (casewise) deletion, that are valid only if the
missing values are MCAR. Note that method(mlmv) still assumes that the
variables are multivariate normal, which is often a problematic assumption.
When you violate the multivariate normal assumption, this can bias your
parameter estimates, in some cases substantially (Acock 2012b).
Auxiliary variables that are added to make the MAR assumption more
plausible either help predict a score that is missing (age might help predict a
person’s height if height were a missing value for that person) or help explain
why variables have missing values (older people may be more likely to skip
the item about their height). Although adding a few auxiliary variables helps
justify the MAR assumption, they may also complicate the multivariate
normal assumption, for example, using gender. Additionally, some potential
auxiliary variables might have a lot of missing data themselves, which can
add more noise to your model. For example, you might think of income as an
auxiliary variable, but many people refuse to report their income. One
solution is to fit your model using both the default maximum likelihood
estimation assuming listwise deletion and the method(mlmv) approach.
Hopefully, the results will be similar, adding confidence to your findings.
Stata’s sem command does not have an explicit way to add auxiliary
variables; however, there is a way that we can trick it. Suppose we come up
with a set of six variables to serve as auxiliary variables. We will label them
aux1–aux6 in this example. If we can include these auxiliary variables, then
we can explain the missingness and add credibility to our meeting the MAR
assumption.
These estimates are the unstandardized estimates, but that is not going to
be a problem. Now we can test the equality of standardized path coefficients
by running a test command and using the labels shown in the legend. The
test command gives us a Wald chi-squared test of the difference. We use
the prefix command estat stdize: so that the test is applied to our
standardized path coefficients. Here is what we get:
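As a sketch, the test takes this form (the bracketed names come from the coeflegend output):

. estat stdize: test _b[math21:math7] = _b[math21:read7]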
A chi-squared test of equality of the standardized coefficients yields a
nonsignificant result. You can see why this test is important. Although the
effect of math7 on math21 in our sample was stronger, 0.30, than the
effect of read7 on math21, 0.25, this difference is not statistically
significant. As with any test of significance, we are not saying that reading
and math at age 7 are equally important, just that we have not demonstrated a
significant difference using our data.
[Figure 2.7: A cross-lagged panel model: math7 and read7 each predict both math21 and read21 (stability paths plus cross-lagged paths), with the errors of math21 and read21 correlated]
We must have correlated errors for read21 and math21 in our model.
Some variance in math21 that is not accounted for by math7 and read7 will
surely be correlated with some variance in read21 that is not accounted for
by math7 and read7. Some researchers choose not to include this correlation;
however, it should be included, and the sem command makes it simple to
estimate this correlation. Without this correlation, you are unlikely to obtain a
good fit for your model, and excluding it would be equivalent to saying that
all factors influencing math21 are uncorrelated with those influencing read21
except for read7 and math7.
[Figure 2.8: The fitted cross-lagged model (standardized): math7 → math21 = 0.31***, math7 → read21 = 0.11*, read7 → math21 = 0.26***, read7 → read21 = 0.49***, correlation of math7 and read7 = 0.27***, correlation of the error terms = 0.17**; R² = 0.21 for math21 and 0.29 for read21; N = 416]
What can we say about these results? First, it appears that reading skills
are more stable than math skills (0.49 compared with 0.31, both p < 0.001),
although we should test whether there is a significant difference. Second, it
appears that reading skills at age 7 influence math skills at age 21 (0.26,
p < 0.001) more so than math skills at age 7 influence reading skills at age
21 (0.11, p < 0.05). We should also test whether these differences are
statistically significant.
To test for significant differences between standardized path coefficients,
we first run sem, coeflegend to retrieve the names Stata has given the paths.
Using these names, we then run our tests. We precede each test with the
prefix command estat stdize: so that Stata runs the test on the
standardized estimates. Here are the postestimation commands and their
results:
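As a sketch, the two tests take this form (again, confirm the bracketed names against your coeflegend output):

. estat stdize: test _b[math21:math7] = _b[read21:read7]
. estat stdize: test _b[math21:read7] = _b[read21:math7]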
Thus we can assert that the stability coefficient for reading skills is
significantly different from the stability coefficient for math skills. We
can also assert that the effect of reading on later math skills differs
significantly from the effect of math on later reading skills. The
chi-squared test is a two-tail test. If you had directional hypotheses and
wanted a one-tail test, you could cut the estimated p-values in half.
You can apply a panel model like this to a wide variety of situations. Do
wives’ political views influence their husbands’ political views more than the
other way around? You might measure their political views at three time
points: one year before the election, six months before the election, and at the
election. In this case, you would simply extend the model in figure 2.7 to
have three waves. You would repeat the cross-lagged paths between the
second and third waves, and you would correlate the error terms for the
second and third waves.
2.8 Moderation
In this chapter, I have focused on mediational models, but sem can be used
equally well for models that involve moderation. Statisticians use the term
“statistical interaction” when they want to show that the effect of one variable
x on another variable y is moderated by the level of a third variable z.
People who have a high score on z might have a stronger or weaker
relationship between x and y, in which case we say that z moderates the
relationship between x and y.
[Figure 2.9: mpg regressed on wgt1000s; the unstandardized slope is -6.01, R² = 0.65, N = 74]
Next we think that foreign brand cars may get better mileage than
domestic brand cars. We could incorporate both weight and foreign brand as
joint predictors of MPG as illustrated in figure 2.10 by using
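a command along these lines (a sketch, assuming weight has already been rescaled into wgt1000s):

. sem (mpg <- wgt1000s foreign)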
The addition of whether the brand is foreign does not help significantly
once we have adjusted for weight. Thus it appears that the key variable for
MPG is simply how much a car weighs.
[Figure 2.10: mpg regressed on wgt1000s and foreign; the slope for wgt1000s is -6.59***, while foreign is not significant (-1.65, ns); R² = 0.66, N = 74]

[Separate prediction equations for domestic cars and foreign cars, with weight slopes of -5.98 and -10.43, were displayed here]
Thus a domestic car loses 5.98 MPG for each additional 1,000 pounds of
weight. Surprisingly, a foreign car loses 10.43 MPG for each additional 1,000
pounds of weight. Controlling for weight, lighter foreign cars do get better
mileage, but this advantage disappears for heavier foreign cars.
We can illustrate this in two ways. Figure 2.11 shows one way,8 where
the car being foreign moderates the effect of weight on mileage.

[Figure 2.11: Moderation model in which foreign moderates the effect of wgt1000s on mpg; the interaction term has a coefficient of -4.45* and foreign has a coefficient of 9.27*; R² = 0.68, N = 74]
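One way to fit this moderation model is to create the product term ourselves, because sem does not accept Stata's factor-variable notation; the interaction variable's name here is our own choice:

. generate forXwgt = foreign*wgt1000s
. sem (mpg <- wgt1000s foreign forXwgt)

The slope for domestic cars is the coefficient on wgt1000s, and the slope for foreign cars adds the interaction coefficient to it: -5.98 + (-4.45) = -10.43.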
Our scatterplot, shown in figure 2.12, does a good job of showing the
moderation effect. Foreign cars, represented by hollow diamonds, tend to
weigh less than domestic cars; none of the foreign cars weigh more than
about 3,500 pounds, while some domestic cars weigh nearly 5,000 pounds.
As the weight goes up, the foreign cars have a sharper drop-off in their
estimated MPG. Foreign cars, on average, get better gas mileage because most
of them are relatively light, around 2,500 pounds, whereas most U.S. cars are
over 3,000 pounds.
[Figure 2.12: Scatterplot of estimated miles per gallon against car's weight in 1,000s of pounds, with separate fitted lines for foreign (hollow diamonds) and domestic cars]
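A plot like figure 2.12 can be approximated with twoway by overlaying group-specific points and fitted lines; this is only a sketch, not necessarily how the original figure was produced:

. twoway (scatter mpg wgt1000s if foreign==1, msymbol(diamond_hollow)) ///
>        (scatter mpg wgt1000s if foreign==0, msymbol(circle_hollow)) ///
>        (lfit mpg wgt1000s if foreign==1) ///
>        (lfit mpg wgt1000s if foreign==0), ///
>        legend(order(1 "Foreign" 2 "Domestic")) ///
>        ytitle("Estimated Miles Per Gallon") ///
>        xtitle("Car's Weight in 1000s of pounds")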
We could do all this with the regress command, but using the sem
command has a couple of advantages. First, if we had missing values and it
were reasonable to assume multivariate normality, we could use the
method(mlmv) option to handle the missing values automatically rather than
relying on the listwise deletion of the ordinary least-squares command.
Second, after you complete chapter 3 using the full structural equation
modeling design involving both a measurement model and a structural model,
we can take advantage of the ability to estimate measurement errors and have
correlated measurement errors where appropriate. Finally, we can extend the
structural equation model to include much more complex moderation models.
When the moderator is categorical, we sometimes treat the categories as
separate groups and do a multigroup analysis, as you will learn in chapter 5.
By examining figure 2.13, we can see that we have far too little
information to fit the model. We have too many parameters to estimate with
just a pair of observed variables. Can you explain why we have correlated
error terms in the figure? Remember that ε1 represents the unexplained
variance in the wife’s satisfaction and ε2 represents the unexplained variance
in the husband’s satisfaction. Couples share so many variables in common
that influence their mutual satisfaction; they face the same economic
challenges, neighborhood problems, and parenting issues. You can generate
a long list of factors that influence the satisfaction of both a wife and her
husband. None of these are included in our model, so we must at least
acknowledge this by allowing the errors to be correlated.
[Figure: wife_sat and husb_sat with reciprocal paths and correlated errors ε1 and ε2]

Figure 2.13: Reciprocal relationship between a wife’s satisfaction and her husband’s satisfaction: A nonrecursive model (unidentified)
One way to fit this model is to use what are known as instrumental
variables. These variables directly influence the wife’s satisfaction but do not
directly influence her husband’s satisfaction, and vice versa. We use the
instrumental variables to predict the wife’s satisfaction in a way that does not
influence her husband’s satisfaction directly. We can then use this predicted
value of her satisfaction to estimate the effect of her satisfaction on his
satisfaction. We do the same for the husband. The problem with this
approach is that it can be very difficult to locate variables that directly
influence one but not both of our reciprocally related variables.
The instrumental variable could have an indirect effect. For example, you
might use the husband’s education as the instrumental variable for husb_sat.
His own education might predict his satisfaction but not directly influence his
wife’s satisfaction. If he has more education, he is likely to make more
income, which may indirectly make the wife more satisfied. Then we could
use the wife’s education as the instrumental variable for her satisfaction,
making the same argument in reverse.9 Our model might look like
figure 2.14.
[Figure 2.14: The nonrecursive model with instruments: wife_ed predicts wife_sat, husb_ed predicts husb_sat, wife_sat and husb_sat are reciprocally related (errors ε1 and ε2), and fam_inc (with error ε3) also appears in the model]
[Figure 2.15: Reciprocal model of occupational aspirations: r_intel and r_ses predict the respondent's aspirations (r_occasp), f_intel and f_ses predict the friend's aspirations (f_occasp), each SES variable also predicts the other person's aspirations, and r_occasp and f_occasp are reciprocally related with correlated errors ε1 and ε2]
Do you like this model? Do you agree with the researchers’ assumptions?
Pretend you are the respondent. What they are saying, for example, is that
your friend’s intelligence only indirectly influences your aspirations by first
directly influencing your friend’s aspirations. If your best friend Susan is
extremely intelligent, she is likely to aspire to a fairly high occupational
status. As she shares these aspirations with you, this may expand your
horizons about your own possible occupational status, regardless of how
intelligent you are and regardless of your socioeconomic status.
We can use the SEM Builder or the sem command directly to fit this
model. This command specifies each path in the model. The covariance
between the errors appears as an option, and we ask for a standardized
solution.
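A sketch of that command; the exact structural paths should be checked against figure 2.15, but a specification consistent with the classic reciprocal-aspirations model is

. sem (r_occasp <- r_intel r_ses f_ses f_occasp) ///
>     (f_occasp <- f_intel f_ses r_ses r_occasp), ///
>     cov(e.r_occasp*e.f_occasp) standardized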
Figure 2.16 shows the results for the SEM Builder. When we used the SEM
Builder, we did not draw the correlations between the exogenous variables,
but Stata assumes these observed exogenous variables are correlated. We can
draw them using the SEM Builder, but the results we get will be the same. If
you publish a figure like this, you typically add a footnote indicating that the
correlations among the independent variables are not shown to simplify the
presentation.
[Figure 2.16: Standardized solution for the reciprocal aspirations model; the estimates shown on the paths include 0.29, 0.16, 0.10, -0.15, 0.21, 0.28, 0.08, 0.17, and 0.37]
Alternatively, imagine you had unstandardized values of 0.2 and 0.3 for
the pair of reciprocal paths. Then when your aspirations go up 1 unit, your
friend’s go up 0.2 units. This makes your aspirations go up a further
0.2 × 0.3 = 0.06 units, which makes your friend’s aspirations go up a further
0.06 × 0.2 = 0.012 units, and so on. In this case, the model will stabilize
as the subsequent increments become smaller and eventually converge on 0.000.
Take a close look at our model. Is there any reason why a randomly selected
respondent should have more influence on a friend than a friend has on the
respondent? Because our sample is treated as a random sample, there is no
reason why these reciprocal paths should not be identical. The same case can
be made for the effects of each of the exogenous variables. Certainly there
will be some variation in a particular sample like this, but logically the
population values should be equal. A randomly selected respondent’s
intelligence should have no more effect on that respondent’s aspirations than
a friend’s intelligence has on the friend’s aspirations. This also applies to the
effects of socioeconomic status. We expect the following set of equalities to
hold (using the variable names of figure 2.15):

    f_occasp → r_occasp = r_occasp → f_occasp
    r_intel → r_occasp = f_intel → f_occasp
    r_ses → r_occasp = f_ses → f_occasp
    f_ses → r_occasp = r_ses → f_occasp
To make these four equality tests, we need to first know the nicknames
Stata assigns to each of the eight paths. We can find these labels by running
sem, coeflegend. Here are partial results:
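The coeflegend listing itself showed the _b[] name for each path. A sketch of the four tests that follow, assuming the usual equation:variable naming pattern:

. test _b[r_occasp:f_occasp] = _b[f_occasp:r_occasp]
. test _b[r_occasp:r_intel] = _b[f_occasp:f_intel]
. test _b[r_occasp:r_ses] = _b[f_occasp:f_ses]
. test _b[r_occasp:f_ses] = _b[f_occasp:r_ses]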
While these results tell us that corresponding paths are not significantly
different, the results do not estimate the shared value each pair of paths has.
Stata can constrain specific paths to be equal by attaching the same name to
each of them. The name can be from 1 to 32 characters long. For example, to
constrain the path from f_occasp to r_occasp to be the same value as the
path from r_occasp to f_occasp, we could use the label b1. Here is how we
change our command:10
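A sketch of the constrained command, using b1 for the reciprocal pair and b2–b4 (our own labels) for the other three pairs:

. sem (r_occasp <- r_intel@b2 r_ses@b3 f_ses@b4 f_occasp@b1) ///
>     (f_occasp <- f_intel@b2 f_ses@b3 r_ses@b4 r_occasp@b1), ///
>     cov(e.r_occasp*e.f_occasp)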
These equality constraints are entirely appropriate and simplify our model
considerably. We now have 4 degrees of freedom instead of 0. The reciprocal
effects have equal values because of our equality constraint on them.
Importantly, the constrained reciprocal paths between r_occasp and f_occasp
are statistically significant.
The postestimation commands you would also want to run on this model
should be familiar by now. We did include estat eqgof to obtain the R²
values, but you should also run estat teffects, standardized to obtain
the indirect and total effects, and estat mindices to obtain the modification
indices. You may also want to run estat eqgof with the mc2 coefficients as
your estimates of explained variance instead of R² (Bentler and
Raykov 2000). This is a variation of R² designed exclusively for
nonrecursive models. Like other correlations, the mc coefficient can be
negative. Because its square must be positive, you should check
whether the mc value itself is negative.
2.10 Exercises
1. Draw a path model that includes six variables with at least one variable
that is an endogenous mediator. Use variables from your substantive
area of interest and make sure you are able to defend your path model.
a. Identify which variables are exogenous.
b. Identify which variables are endogenous mediators.
c. Identify which variables are endogenous outcomes.
d. Why do we not use the traditional labels of a variable being either
independent or dependent?
2. Label the paths in your model as shown in figure 2.1.
a. List each direct effect.
b. List each possible indirect effect, and show how it is a product of
direct effects.
3. The traditional approach to mediation assumes that if x is uncorrelated
with y, then there is no mediation.
a. Draw a model in which the correlation of x and y might be 0, but
there is still mediation.
b. Explain how there can be mediation when there is no bivariate
correlation.
4. What command gives you indirect effects and total effects?
5. What command gives you the R² for each endogenous variable?
6. Why would you correlate error terms for endogenous mediator
variables?
7. You have a large company that has a serious problem with employee
absenteeism. You are considering a company-wide, employer-sponsored
exercise program. You feel that exercise directly reduces stress,
improves overall health, and decreases absenteeism. You feel that stress
directly reduces health and increases absenteeism. Finally, you feel that
improved health reduces absenteeism. The following table shows
hypothetical correlations and standard deviations. These are based on a
pilot test of 200 employees.
a. Draw this as a path model by using the SEM Builder. Fit the model
by using commands (not the SEM Builder).
b. Estimate a standardized solution to the model.
c. How well does your model fit the data?
d. How much variance (R²) do you explain for each endogenous
variable?
e. What is the direct, indirect, and total effect of exercise on
absenteeism? How significant is each?
f. Provide a summary of your results, including a table that you could
show the Board of Directors to convince them to sponsor a
company-wide exercise program.
8. Draw the model in exercise 7, and fit the result by using the SEM
Builder.
a. Is there a significant difference between the direct effect of health
and the direct effect of stress on absenteeism?
b. Compare the results in the model you fit above with the results
from a model that only includes the direct effects of exercise,
stress, and health on absenteeism. Why is a model that includes
only direct effects misleading? What does the total effect tell you
that you would miss if you did a simple regression of absenteeism
on the three predictors?
9. Name two variables that are reciprocally related and explain why they
are reciprocally related. Name some instrumental variables that would
allow you to estimate the reciprocal relationship. Do a freehand drawing
of the model, and justify why the instrumental variables are appropriate.
2.A Using the SEM Builder to run path models

We will first draw and then fit the model that appears in figure 2.7. With
the dataset open, type sembuilder into the Command window and press
Enter.12 If you have changed settings in your SEM Builder when constructing
an earlier model, you may want to restore the default settings that control
how the results will be displayed. Click on Settings > Settings Defaults.
Click on the Select tool in the upper left of the screen, and then click
and drag over the top two rectangles to select them. Once these are
highlighted, click on Object > Align > Horizontal Center. Do the same for
the bottom two rectangles. You can also use Object > Align to vertically
align the two rectangles on the left as well as the two rectangles on the right.
Using the Select tool, click on the rectangle at the top left. Use the
Variable drop-down menu on the toolbar to choose math7 as the name of the
variable for this rectangle. Repeat this process to specify the variable names
for the other three rectangles.
Next click on the Add Path tool and draw the four paths. When you
draw these, the SEM Builder automatically adds the error terms, ε1 and ε2, for
the endogenous variables. Click on the Add Covariance tool and
connect the two error terms. Stata will assume the exogenous variables,
math7 and read7, are correlated, so we do not need to draw the curve
between them (but we will anyway for practice). When you use the icon for a
curved line, the line will curve inward or outward depending on whether you
connect going up or down. If it curves the wrong way, you can click on the
Select tool, and then click on the curve. A blue line with a small, hollow
circle at one end will appear on the curve. Click on the circle to change the
curvature of the line.
[Figure 2.17: The fitted model: math7 → math21 = 0.320, math7 → read21 = 0.118, read7 → math21 = 0.261, read7 → read21 = 0.499, correlation of math7 and read7 = 0.270, correlation of the error terms = 0.183; a variance is also displayed for each error term]

Figure 2.17: Model fit with standardized path coefficients displayed
The model has a variance reported for each error term. My preference is
to delete the error-variance numbers from the figure, keeping just the
correlation between error terms wherever there is one. Go to Settings >
Variables > All Error..., and on the Results tab change Error variance to
None.
To change the number of decimal places reported for the path coefficients
and correlations, click on Settings > Connections > All.... In the dialog box
that opens, go to the Results tab and click on Result 1.... Another dialog box
will open, and here you can change the Format to %5.2f and then click on
OK.
Moving the values for the paths is a bit tricky. So far, you have something
that looks like the figure below. In the figure, I have used the Select tool to
highlight the path from math7 to read21. This causes the path coefficient 0.11
to appear in blue and a button labeled Properties... to appear on the top
toolbar.
Click on the Properties... button, and go to the Appearance tab of the
dialog box. Check Customize appearance for selected connections, and then
click on Set custom appearance to open a new dialog box. Go to the Results
tab, and click on the Result 1... button on the lower left side. One element of
the dialog box that opens (see below) is Distance between nodes, which is a
percentage of the distance between the predictor and its outcome. The default
distance of 50% causes a problem in our case because this is where our paths
cross each other. Change the percentage to 30% and click on OK.
Repeat the procedure for the path from read7 to math21 by first using
the Select tool to highlight the path. Once you have changed the percentage
for that path, we have taken care of our second problem.
[Figure 2.18: The decluttered model: math7 → read21 = 0.12*, read7 → math21 = 0.26***, read7 → read21 = 0.50***, correlation of math7 and read7 = 0.27***, correlation of the error terms = 0.18**]

Figure 2.18: Model fit with less clutter and moved path values
Now we can enter the text for the model fit. Text can be included on the
figure itself or in a footnote. You will learn how to include it in the figure in
appendix A at the end of the book, but for now you may want to skip this part
and just put the text in a footnote. In either case, we need to run estat eqgof
as well as estat gof, stats(all). We can run these using the SEM Builder.
Click on Estimation > Goodness of fit > Equation-level goodness of fit.
This gives us many options, but Equation-level goodness-of-fit
statistics (eqgof) is already selected for us. Click on OK. Looking at the
Results window, we see the R² values of 0.206 for math21 and 0.285 for
read21. Next click on Estimation > Goodness of fit > Overall goodness of
fit. In the dialog box, use the drop-down menu under Statistics to be
displayed to select All of the above. Click OK. If you also want
modification indices, select Estimation > Testing and CIs > Modification
indices.
. Notice the order of the subscripts on β61. The first is the dependent variable, x6, and the second is the
independent variable, x1. When you write out a simple regression equation, such as y = a + bx,
your dependent variable is on the left and your independent variable is on the right; thus the
subscript for x6 comes first and the subscript for x1 comes second.
. Without structural equation modeling, you would need to use special regression commands, such as
seemingly unrelated regression (sureg), when you have correlated residuals. It would be inappropriate to fit
this model using Stata’s regress command.
. There is no direct effect, but there are three indirect effects, each the product of the direct paths
along one of the routes from x2 to x6.
. Remember, if you use a Mac, you first click on the gear-shaped tool in the upper right of your SEM
Builder screen.
. Here total effects = direct effects + indirect effects.
. This is not necessary because Stata assumes these exogenous variables are freely correlated; however, it is
sometimes nice to include this for readers who are unfamiliar with Stata. When you have several exogenous
variables, including all the correlations becomes very messy and detracts from the main point of your figure.
In that case, leave them out of the figure and add a footnote indicating that the exogenous variables were
allowed to be correlated.
. It is not necessary to draw these correlations when working with the SEM Builder; it could make the
drawing quite messy if we had several exogenous variables. Remember, Stata assumes that these exogenous
variables are correlated.
. This figure was not drawn using the SEM Builder because this freehand model was a better way to
represent the data.
. This model will be limited by how compelling this argument is. Certainly, there are many couples who are
satisfied where neither the wife nor the husband has much education, and there are also many couples who
are miserable though both the wife and the husband have substantial education.
. This command would normally be applied to the unstandardized solution, but the data we are using for
this example are limited to a correlation matrix. Nonetheless, we drop the standardized option. When we
leave this option in, we get slight differences on one of the pairs that we constrained to be equal. This is
because the standardization uses the fitted variances, and they are not exactly equal to 1.0.
. You may also want to reread the chapter 1 appendix box 1.A, which details the differences between the
SEM Builder interfaces in Stata for Windows, Unix, and Mac.
. I like to make the SEM Builder fill most of my screen. You can do this by dragging the edge of the
Builder. Then press the Fit in Window button on the toolbar to make the canvas fit the SEM Builder window.
. If you ever have variable names that are too long to fit in your rectangles, this is also where you fix that
problem. Simply go to the Box/Oval tab, and change the size of the rectangles. Alternatively, you can click
on the variable, go to the Appearance tab, check the box for Customize appearance for selected variable,
click on Set custom appearance, and then adjust the size of the rectangle or oval.
Chapter 3
Structural equation modeling
3.1 Introduction
In this chapter, we will learn about structural equation modeling (SEM),
which is a combination of what you learned in chapter 1 about measurement
models and what you learned in chapter 2 about path analysis and structural
models. SEM opens up an enormous range of research capabilities. This
chapter will complete your basic understanding of SEM, and then we will
learn about some of the specific applications of SEM in the next chapters. SEM
techniques can be applied to an ever-expanding range of research topics and
offer highly flexible strategies.
Consider the latent concept of alienation. This was seen as varying from
day to day if not from hour to hour, and certainly from one year to the next.
The argument was made that many social–psychological concepts were so
unstable that we should ignore them. It makes little sense to predict a variable
or to use it as a predictor if the concept that it represents is constantly
changing.
[Figure 3.1: Full structural equation model: SES66 is measured by educ66 and occstat66; Alien67 and Alien71 are each measured by two indicators (errors ε3–ε6); SES66 predicts Alien67 and Alien71, Alien67 predicts Alien71, and the structural errors are ε7 and ε8]
Fitting the model does not depend on the order of the error term labels,
but you may want to override the initial order Stata gives you. In
figure 3.1, we first label the error terms for our indicators from ε1 to ε6,
and then label the error terms for our latent endogenous variables ε7 and
ε8.
To customize the label for an error term, click on the Select tool, and
then double-click on the error term you want to reorder. In the dialog
box that opens, you can specify a label in the Custom label box.
Suppose the error term for Alien67 had some other label, and we wanted to
change it to ε7. We can enter ε7 as {&epsilon}{sub:7}, and then click on OK.
The ampersand, &, precedes the Greek letter. The sub: within braces
makes whatever follows it a subscript. A sup: within braces would
make whatever follows it a superscript.
By default, Stata uses the Greek letter ε (“epsilon”) for all error terms.
However, you might encounter an article that uses a δ (“delta”) for error
terms of indicators of exogenous latent variables, an ε for error terms of
indicators of endogenous latent variables, and a ζ (“zeta”) for error
terms of the endogenous latent variables. You can imagine how we
would do this. To label the error term for Alien67 as ζ1, we would
double-click on the error term for Alien67, and then we would enter
{&zeta}{sub:1} in the Custom label box.
For this example, we use a dataset from Stata’s Structural Equation Modeling
Reference Manual. These are not raw data but are data in the form of a
covariance matrix. All we need to fit a structural equation model is a matrix
of correlations and a vector of standard deviations.2 We have three latent
variables and need to specify measurements for each of them. We have three
paths in our structural model that link the latent variables. Here is our
program with the postestimation commands you learned in chapters 1 and 2.
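As a sketch, the program has a measurement equation for each latent variable, the two structural equations, and the postestimation commands:

. sem (SES66 -> educ66 occstat66) ///
>     (Alien67 -> anomia67 pwless67) ///
>     (Alien71 -> anomia71 pwless71) ///
>     (Alien67 <- SES66) ///
>     (Alien71 <- Alien67 SES66), standardized
. estat gof, stats(all)
. estat mindices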
The next two lines define the structural components of our model:
(Alien67 <- SES66) and (Alien71 <- Alien67 SES66). This reflects figure 3.1,
where alienation in 1967 depends only on SES in 1966, while alienation in
1971 depends on both alienation in 1967 and SES in 1966. As in the
measurement portion of the model, the ordering of the variables does not
matter so long as the arrow points to the dependent variable. We finish our
sem command with the standardized option because we want a
standardized solution.
When you enter the sem command in Stata, you get the following results:
The standardized coefficients appear in figure 3.2. To create this figure, I
used the SEM Builder and deleted some of the parameter estimates reported
by default (the appendix to chapter 2 shows you how to delete these). The
SEM Builder is described briefly in the appendices for chapters 1 and 2 and
more fully in appendix A at the end of the book. If you are not familiar with
the SEM Builder yet, you should work through all of appendix A quite
carefully at this point.
[Figure 3.2: Standardized solution: the loadings are 0.83 for educ66, 0.65 for occstat66, and 0.80 to 0.84 for the four alienation indicators; the structural paths are SES66 → Alien67 = -0.57, SES66 → Alien71 = -0.15, and Alien67 → Alien71 = 0.66]
The initial structural equation model has 932 observations. We are told
which three observed variables were used as reference indicators and
therefore fixed at 1.0 for the unstandardized solution, even though we asked
for a standardized solution.
When we ask for the standardized solution, both the latent and the observed
variables are rescaled automatically to have a variance of 1.0, which allows
us to have standardized estimates for each of the loadings for the
measurement model. All six of these are strong, 0.65 to 0.84. These results
are included in figure 3.2. The z tests show that all the loadings are
statistically significant.
For the structural part of the model, we see that SES66 has a strong effect
on Alien67, with a standardized coefficient of -0.57. Both SES66 and Alien67
have a significant effect on Alien71. The standardized coefficient on the path
from Alien67 to Alien71 is 0.66, and the coefficient on the path from SES66
to Alien71 is -0.15.
Stata assumes the first observed variable used to measure each latent
variable is the reference indicator. It normally does not matter which
observed variable is the reference indicator; however, it makes sense to
pick a strong indicator as the reference indicator. In the model shown in
figure 3.2, the first indicators—educ66, anomia67, and anomia71—are
the reference indicators, and each of them has the strongest standardized
loading. When the first indicator is relatively weaker than the others, it
makes sense to rearrange the variables so that the strongest indicator
appears first. You might need to first fit the model with the default (the
first indicator) as the initial reference indicator and then pick the
indicator that has the largest loading as the replacement reference
indicator.
To change the reference indicator, you simply list the indicator you
want to be the reference first. For example, our sem command above
has (SES66 -> educ66 occstat66). If we wanted to have occstat66
be the reference indicator, we would replace that portion of the
command with (SES66 -> occstat66 educ66). Alternatively, we
could still use our original sem command and simply constrain the
coefficient on occstat66 to 1 by appending @1 to the variable name:
(SES66 -> educ66 occstat66@1). Doing that would cause the
unstandardized solution to have a fixed loading of 1.0 for occstat66.
We see that 32.1% of the variance in the latent variable Alien67 and
57.6% of the variance in Alien71 were explained. The pre-SEM literature
suggested that alienation would not be stable; however, the stability
coefficient for alienation, 0.66, is highly significant and
substantial, especially given the time interval between its measurements in
1967 and 1971. The stability of alienation is substantially higher in our SEM
approach because we specify anomia and powerlessness as measurements of
alienation but also account for other variation in these variables through their
corresponding error terms; that is, we include a measurement portion of the
model in addition to the structural portion.
Next we look at the results of estat gof, stats(all) to see how well
our model fits the data. The chi-squared statistic indicates that our model
significantly fails to perfectly reproduce the original covariance matrix. Our
model's significant chi-squared means that it fails to fully account for all
the variances and covariances. The root mean squared error of approximation
(RMSEA) of 0.11 is well above the goal of being less than 0.05. The
comparative fit index (CFI) of 0.97, however, is better than our target of 0.95
for a good fit.
Let us focus on the modification indices for the covariances of the error
terms. It would make sense to have the error terms for anomia67 and
anomia71 be correlated as well as the error terms for pwless67 and
pwless71. These changes in our model reflect the idea that unobserved
variables are shared by these respective error terms. For example,
psychological variables such as neuroticism are not in our model but are
stable personality traits that might influence your sense of powerlessness at
both waves of data collection. Although we do not observe these variables,
we can allow for their influence by correlating the error terms, which is like
acknowledging the existence of some level of spuriousness.4
Figure 3.3 shows two models:5 panel A has correlated error terms, and
panel B shows unobserved latent variables.
[Panel A: Alien67 and Alien71 with correlated error terms among their indicators]
(a) Panel A
[Panel B: the same model with the unobserved variables (Unobserved1 and Unobserved2) that would generate those correlations shown explicitly]
(b) Panel B
Figure 3.3: Model with correlated error terms to acknowledge the possible operation of unobserved variables
We are not able to fit the model in panel B because we do not have the
necessary explanatory variables. However, the model in panel A makes
considerable sense. By looking at the modification indices, we can see that
estimating these correlated errors might help our fit—recognizing that once
we estimate one of them, the modification indices for everything else will
change. I think we can justify correlating the error terms for anomia at both
waves and the error terms for powerlessness at both waves. These are not the
biggest modification indices, but they are substantial and correlating them
makes sense conceptually. Here we choose to free two correlated errors at
once because it makes sense conceptually for both to be correlated; we have
no conceptual basis to pick just one to be correlated.
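A sketch of the respecified command, adding the two correlated error terms through the cov() option:

. sem (SES66 -> educ66 occstat66) ///
>     (Alien67 -> anomia67 pwless67) ///
>     (Alien71 -> anomia71 pwless71) ///
>     (Alien67 <- SES66) (Alien71 <- Alien67 SES66), ///
>     cov(e.anomia67*e.anomia71 e.pwless67*e.pwless71) standardized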
The standardized results, fit statistics, R²'s, and indirect effect of SES66
on Alien71 are shown in our final model (see figure 3.4). There are no
modification indices reported. The default in estat mindices only reports
modification indices greater than 3.84, corresponding to the 0.05 significance
level with 1 degree of freedom.
Indirect effects work in the same way that they do in a path model (see
chapter 2). By examining figure 3.4, we see that SES66 has an indirect effect
on Alien71 that is mediated by Alien67 (the product of the SES66 → Alien67
and Alien67 → Alien71 paths).
We obtain the indirect effects with the postestimation command estat
teffects, nodirect standardized. As explained in chapter 2, the estat
teffects command provides the direct, indirect, and total effects. In our
command, we added the option nodirect to specify that we do not need the
direct effects (we have already estimated them).
[Figure 3.4: Final model with the two correlated error terms: Alien67 → Alien71 = 0.57; the alienation loadings are 0.85, 0.83, 0.77, and 0.81; the correlations between the error terms are 0.36 and 0.12]
We have only one indirect effect in this model, so these results are
exactly what we need to know; in more complicated models, we may have
two or more indirect effects for a single variable, a scenario we discussed in
chapter 2. In box 2.3, you learned how to estimate and test specific indirect
effects when they are part of your model. The estat teffects command
does not give you an estimate of specific indirect effects automatically nor
their significance when there are multiple indirect paths between a pair of
variables.
You will notice that one of the correlated errors is substantial and
significant, while the other is relatively small (0.12) and not significant. We
leave them both in the model because we earlier decided on theoretical
grounds that they should be included. Some researchers would argue that
because we aim to develop a parsimonious model, we should rerun the model
with only the significant correlated error term. The reality is that whether you
leave a parameter with a small value in a model or remove it does not change
much, simply because the parameter is already close to 0.
If this were the case, then marital satisfaction would have a fundamentally
different meaning for a husband than it has for his wife. His view of marriage
would put the emphasis on sexual satisfaction and his wife’s being a good
mother, while her view of marriage would emphasize emotional support and
financial security. Such a model with hypothetical coefficients appears in
figure 3.5.
[Figure 3.5: Hypothetical model in which Wife_Sat and Husb_Sat (correlated 0.60) load on parallel sets of four indicators, with loadings of 0.90, 0.70, 0.50, and 0.40 for one spouse and 0.40, 0.50, 0.80, and 0.70 for the other (errors ε1–ε8)]
There are four widely recognized levels of invariance, which are detailed
in table 3.1. The first and lowest level requires that the same set of indicators
be relevant to the latent variable. The hypothetical results in figure 3.5 meet
this level of invariance. In our example about alienation (figure 3.4),
powerlessness might have been more central in 1967 than it was in 1971.
This first level allows both the loadings and the error variances for the
observed measures to vary from one wave to the next. There may be more or
less unique variance for e.anomia67 or e.pwless67 than for e.anomia71 or
e.pwless71, respectively. If this is the best case you can make for invariance,
you need to make this limitation clear to your reader; to some extent, we are
correlating apples with oranges.
There are two strategies we can use. For the first approach, we can test
whether the loadings for pwless67 and pwless71 are significantly different
by using the test command that we learned about in chapter 2. To run this
command, we need to know the nicknames that Stata has assigned to the
different parameters. After running our unstandardized sem command, we
can obtain the nicknames by running sem, coeflegend.
We see that _b[name] is the internal name assigned to each of the
parameter estimates. To test whether the loading for pwless67 equals the loading for pwless71, we need to test for a difference between _b[pwless67:Alien67] and _b[pwless71:Alien71].
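Putting the two steps together (the bracketed names are exactly as sem, coeflegend reports them):

. sem, coeflegend
. test _b[pwless67:Alien67] = _b[pwless71:Alien71]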
The second approach to testing for level three invariance is to fit two
structural equation models. We have already fit the first model, with no equality constraints imposed. The second model is identical except that we place constraints so the loadings for pwless67 and pwless71 must have identical values (note we are working with the unstandardized result).
One way to fit our model is to attach the same label to each of these
paths; we will use @a1 as our label. Every parameter with this label attached
to it will be forced by Stata to have identical estimated values. Thus because
pwless67@a1 and pwless71@a1 have the same label, Stata solves for the best
maximum likelihood solution with the constraint that these two
unstandardized loadings are identical:
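A sketch of the constrained command, assuming the measurement and structural paths described above (both models in this comparison exclude SES66):

sem (Alien67 -> anomia67 pwless67@a1) /// loadings for 1967; pwless67 labeled a1
    (Alien71 -> anomia71 pwless71@a1) /// loadings for 1971; pwless71 labeled a1
    (Alien71 <- Alien67),             /// stability path
    cov(e.anomia67*e.anomia71 e.pwless67*e.pwless71)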
Near the top of the results, we see a list of the four constraints on our
structural equation model:
The second listed constraint requests that there be no difference between the
loading for pwless67 and the loading for pwless71. Both of these loadings
are constrained to be equal, and the estimated unstandardized value for both
of them is 0.95.
You have learned how to fit two models and compare the chi-squared
values. Stata has a nice way to automate this process by using the estimates
store and lrtest commands. These work for a maximum likelihood
solution because the chi-squared for the differences (0.88 in table 3.2) is a
simple function of the likelihood ratio for the two models. After running the
first model, we add the command estimates store level1 (we could
assign any name here; I chose level1 because that was our least restrictive
model). This command saves all the estimated values for this model. Then
after running the second model, we add the command estimates store
level3 (named level3 because we require the loadings to be invariant).
Finally, we run lrtest level1 level3.
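The whole sequence, sketched with the two sem commands abbreviated to their distinguishing features:

sem (Alien67 -> anomia67 pwless67) (Alien71 -> anomia71 pwless71)       ///
    (Alien71 <- Alien67), cov(e.anomia67*e.anomia71 e.pwless67*e.pwless71)
estimates store level1
sem (Alien67 -> anomia67 pwless67@a1) (Alien71 -> anomia71 pwless71@a1) ///
    (Alien71 <- Alien67), cov(e.anomia67*e.anomia71 e.pwless67*e.pwless71)
estimates store level3
lrtest level1 level3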
We produced essentially the same values for the chi-squared difference;
the slight difference is because lrtest keeps more decimal places in the
calculations.
Once you have drawn the model, it is easy to add equality constraints
with the SEM Builder. In the text above, you learned how to assign the
same label to parameters you are constraining. We want the coefficient
on the path from Alien67 to pwless67 to equal the coefficient on the path from Alien71 to pwless71. Click on the Select tool, and then click on one of these paths. In the Constrain parameters box on the toolbar, enter a1 to attach the label a1 to this path. Then click on
the other path and enter the same constraint. We can now fit the
unstandardized result and get the same results we got above.
We can also put equality constraints on error variances. For a model
with level four invariance, we need to show that the corresponding
variances for the error terms are equal; that is, we need to show that the
variance of the error term for anomia67 equals the variance of the error term for anomia71, and that the variance of the error term for pwless67 equals the variance of the error term for pwless71. We do this in the same way that we put an equality constraint on the two loadings: click on each error in turn and enter b1 (for the anomia pair) or b2 (for the powerlessness pair) in the Constrain parameters box on the toolbar. Alternatively, you can double-click on the error itself and enter the constraint in the dialog box that opens.
Many researchers find the level four invariance model, requiring both the
loadings and the error variances to be equal, to be too restrictive; most are
satisfied with the level three model. After all, the level three model assumes
that the indicators have the same salience (loadings) at both waves. This
allows for differences in the unique variances. For the level four model in
figure 3.4, we need to constrain the error variance for anomia67 to equal the error variance for anomia71, and the error variance for pwless67 to equal the error variance for pwless71.
Box 3.4 shows how to do this using the SEM Builder. The sem command to
accomplish this is as follows:
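The command itself is not reproduced here; a sketch consistent with the description adds pairwise labels on the error variances (b1 for the anomia pair, b2 for the powerlessness pair) on top of the a1 loading constraint:

sem (Alien67 -> anomia67 pwless67@a1)                ///
    (Alien71 -> anomia71 pwless71@a1)                ///
    (Alien71 <- Alien67),                            ///
    cov(e.anomia67*e.anomia71 e.pwless67*e.pwless71) ///
    var(e.anomia67@b1 e.anomia71@b1)                 /// equal anomia error variances
    var(e.pwless67@b2 e.pwless71@b2)                 //  equal powerlessness error variances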
You can fit the model with level four invariance, where both loadings and
error variances are constrained to be equal, as an exercise. The level four
model compares with the level three model in the same way we compared the
level three model with the level one model (see table 3.2). If the chi-squared
difference test does not show that the chi-squared values are significantly
different, then we go with the more restricted model, that is, the level four
invariance model. If the p-value for the chi-squared difference test is small
enough to indicate a statistically significant difference in the two chi-squared
values, then we compare relative fit by using the RMSEA and CFI tests.
Other latent variables have causal flow in the opposite direction, that is,
the latent variables are caused by the observed indicators. This type of latent
variable is called a formative construct or a composite latent variable. A
model with a composite latent variable is known as a formative measurement
model.
These items will vary in how correlated they are. For example, the
correlation between using pot and cutting somebody is very low. In fact, we
will improve our index if we pick a diverse set of dimensions of delinquency,
which means that many of the correlations will be very low. A simple
exploratory factor analysis will yield several factors.
For the composite latent variable, we have fixed one of our formative
indicators, educ66, to have a loading of 1.0. We should pick one of our
most central indicators as the reference indicator.
We have fixed the error variance for the composite latent variable, ε7, to be 0.0.
[Figure 3.6: SES66 as a composite latent variable formed by educ66 and occstat66, with its error variance (ε7) fixed at 0 and the pwless67 and pwless71 loadings sharing the label a1]
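A sketch of the sem command implied by figure 3.6; the @1 on educ66 and the var(e.SES66@0) option implement the two identification constraints just described:

sem (SES66 <- educ66@1 occstat66)              /// formative indicators; educ66 is the reference
    (Alien67 <- SES66) (Alien71 <- SES66 Alien67) ///
    (Alien67 -> anomia67 pwless67@a1)          ///
    (Alien71 -> anomia71 pwless71@a1),         ///
    var(e.SES66@0)                             /// no error in the composite
    cov(e.anomia67*e.anomia71 e.pwless67*e.pwless71) standardized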
Most of the results are familiar. Notice that SES66 is no longer part of the
measurement model. SES66 moved to the structural model results section
because it is now an endogenous variable explained by educ66 and
occstat66. Because we asked for a standardized solution, we have a separate
standardized loading and test of significance for both educ66 and occstat66.
Stata accomplishes this by rescaling the unstandardized solution, which
makes the variance of all variables, including SES66, equal to 1.0; thus we do
not need a reference indicator to estimate its variance. As before, we have
allowed the respective error terms for anomia and powerlessness to be
correlated. And the last item of note: the variance of the error term for SES66,
var(e.SES66), is now 0.0.
To fit this model using the SEM Builder, we need to add the indicated
constraints. After placing the variables and appropriate paths in the path
diagram, we use the Select tool to click on the path from educ66 to SES66, and then we enter 1 in the Constrain parameter box in the toolbar. Next we
constrain the loadings of pwless67 and pwless71 to be equal by clicking on
each in turn and entering the constraint a1 for both of them. Finally, we click
on the error for SES66, ε7, and add the constraint of 0 for its variance. The
standardized results are shown in figure 3.7.
[Figure 3.7: standardized results for the composite latent variable model; educ66 (0.80) and occstat66 (0.30) form SES66, with paths from SES66 to the alienation factors of −0.49 and −0.16, a stability path of 0.60, loadings of 0.85, 0.83, 0.77, and 0.81, and correlated errors of 0.36 and 0.12]
What does it mean to fix the error variance of SES66 at 0.0 and the
unstandardized coefficient from educ66 to SES66 at 1.0? This means our
latent variable is assumed to be a perfect composite of the observed formative
indicators. These two changes allow us to identify a formative construct.
Constraining the error variance to be 0.0 means that there is nothing more
to SES66 than the effect of education and occupational status. This may be a
difficult assumption to justify to your reader. What about income? A person
who has limited education and limited occupational status may still have a
reasonably high SES66 because he or she has a lot of money. The old HBO series The Sopranos was about the leader of an organized crime family who made a lot of money. If you did not happen to know how he made his income, you would think of him as having a high SES66 in spite of his limited education and the low status of his occupation.
In the previous section, we fixed the error variance for our latent variable
at 0.0, but here you see that we want to estimate it. Why? Simply because
there is more to the quality of a child’s home learning environment than our
five indicators. We have not included anything about the father’s education
or his involvement. Because we lack information on these variables, we allow
there to be error in our latent variable. We are acknowledging that more goes
into the quality of a child’s home learning environment than our five
formative indicators.
[Figure 3.8: home learning environment model; mom_ed, fam_inc, par_sup, and par_mon are formative indicators of the latent quality of the home learning environment, with acad_ach and tch_rate as reflective indicators]
There are just two differences in the sem command we would use for this
model. First, we do not need to fix one of the formative indicators at 1.0,
because we already have a reference indicator (academic achievement).
Second, we do not need an option of var(e.LearnEnv@0), because we are
able to allow for some error variance in our latent variable.
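Under those two changes, a sketch of the command looks like this (indicator names are taken from figure 3.8; the figure's full indicator list may include more variables than are legible here):

sem (LearnEnv <- mom_ed fam_inc par_sup par_mon) /// formative indicators, none fixed at 1
    (LearnEnv -> acad_ach tch_rate)              //  acad_ach serves as the reference indicator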
At the top of the results, Stata reports the log-likelihood function for
each iteration. Where there is a problem, Stata reports “not concave”,
which means that the log-likelihood function is essentially flat at a
particular location. This is not a problem if it happens in some
iterations, but when there is a long list of iterations that are not concave,
Stata may never converge on a solution. If this happens, you can
examine the results for an iteration that is not concave to see if there is
an error variance that is very close to 0. To get these intermediate
results, you add iterate(#) as an option, where # is the number of an
iteration that is not concave. This option forces Stata to print out the
results it has at that iteration. Here are the first seven iterations from a
result reported in this chapter:
The model eventually converged on the 15th iteration. If you have what
seems like an endless list of not concave iterations, you might find an
error variance that is very close to 0, which may be the reason your
model is not converging. If you do not see any other problem in the way
your model is specified, you might try fixing the offending error
variance at 0 or some other very small value.
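For example, if iteration 5 were flagged as not concave, you could rerun the troublesome model (abbreviated here to the sketch above) with iterate(5) to inspect the interim estimates:

. sem (LearnEnv <- mom_ed fam_inc par_sup par_mon) (LearnEnv -> acad_ach tch_rate), iterate(5)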
3.6 Exercises
1. A modification index for the covariance between a pair of error terms is
40.00. What does this mean? When should you revise your model to
allow for this covariance?
2. What are the differences between the four levels of invariance?
3. Draw a simple model and explain how you would test whether two
parameter estimates were equal. Why is testing the equality important?
4. Give an example of a composite latent variable.
a. Why does the arrow go in the opposite direction for this type of
latent variable?
b. What assumptions do we make to identify a composite latent
variable?
5. Make up an example of a MIMIC model that would be useful in your
substantive area. Use the SEM Builder to draw this figure.
6. Does a patient’s confidence in his or her primary care physician
influence the patient’s compliance? Does a patient’s compliance
influence his or her confidence? You have two measures of confidence:
a) a scale of the patient’s rating of the physician’s medical expertise and
b) a scale of the patient’s rating of how caring the physician is. You also
have two measures of compliance: a) compliance about prescriptions
and b) compliance to behavioral instructions given by the physician.
You have measured all four variables at two time points, and your fit
model looks like this:
[Figure: cross-lagged model relating patient confidence and patient compliance measured at two time points]
Your data are in the compliance.dta dataset, and they are hypothetical.
. We discussed nonrecursive path models in chapter 2. The same identification issues apply to nonrecursive
full structural equation models.
. I discuss this data format in appendix B at the end of the text.
. Modification indices are based on freeing parameters one at a time, not sets of parameters.
. Spuriousness means that a third variable explains the relationship between two variables. For example,
neuroticism might explain part of the relationship between anomia67 and anomia71. Thus we could say that
part of the relationship between anomia in 1967 and anomia in 1971 is spurious because of the common
antecedent cause of neuroticism.
. Both models exclude SES66 for simplification.
. A test of significance for the standardized indirect effect is not available directly from estat teffects.
To obtain tests of standardized indirect effects, you can use the method described in chapter 2 (see box 2.1).
Chapter 4
Latent growth curves
4.1 Discovering growth curves
In chapters 2 and 3, we were predicting a person’s score on an observed or
latent variable. We identified the variables that predicted who would score
higher or lower on an endogenous variable. For example, we could use a path
model (chapter 2) or a full structural equation model (chapter 3) to predict
who would consume a very high or very low amount of alcohol. In chapter 3,
we used an example involving alienation. We predicted a person’s level of
alienation in 1971 from his or her level of alienation in 1967 and
socioeconomic status in 1966. In this chapter, we go over a fundamentally
different way of examining data. Instead of predicting a person's score on a variable, we want to model each person's trajectory, that is, how his or her score changes over time.
Actually, there are two possible random effects for a growth trajectory.
The first is the intercept or initial level. Some people start higher or lower on
alcohol consumption, for example. A latent variable can be used to represent
the random intercept. This latent variable may also be called the latent
intercept growth factor. When there is substantial variance in the intercepts of
different people, we may want to look for covariates that help explain this
variance. Do men or women have a higher intercept? Do more religious
people have a lower intercept? Do people from the “Bible Belt” states have a
lower intercept?
The second possible random effect is the slope or rate of change. Why do
some people experience a dramatic increase in BMI during their 20s while
others maintain a constant BMI and still others actually lower their BMI in
their 20s? Perhaps education plays a role, with more educated people having
a less dramatic increase in BMI. Perhaps gender is relevant. Could
race/ethnicity be important? Surely a person’s exercise routine is relevant to
both the intercept and the slope of his or her BMI trajectory. Our predictors
are now explaining trajectories rather than scores at just one time point. Our
latent variables represent an intercept and a slope.
Let us begin with the simplest example, a linear growth curve like the one
in figure 4.1.
[Figure 4.1: linear growth curve model; the Intercept has loadings of 1 on y0, y1, and y2, the Slope has loadings of 0, 1, and 2, and each observed variable has its own error term]
The variables y0, y1, and y2 are observed variables. They indicate how
people score on whatever we are measuring. The y0 might be the person’s
score at the baseline; for example, if these observed variables represented
alcohol consumption at ages 18, 20, and 22, then y0 would be the person’s
score on alcohol consumption at age 18. If these observed variables were
testing the effects of an intervention, the y0 might be the score before the
intervention commenced; y1 would be the person’s score at wave 2, which
could be the score at the end of the intervention; and y2 would be the
person’s score at wave 3, maybe at a follow-up measurement one month after
the intervention was completed.
Similarly, we allow for some variation from one person to the next in the
latent slope growth factor, and we call that difference from the overall slope
the random effect of the slope. Again, we estimate the variance of this
random effect. Notice in figure 4.1 that we have even allowed the latent
intercept growth factor and the latent slope growth factor to be correlated.
This is the curved line with an arrow at both ends that connects the intercept
and slope growth factors.
To identify the latent intercept and latent slope, remember your basic
regression equation: the intercept is the constant to which you add or subtract
the effect of change in your predictor. Most statistical packages, including
Stata, refer to the intercept as the coefficient on the constant. Figure 4.1
represents this constant by assigning a fixed value of 1.0 to the path from the
latent intercept to each of the observed variables.
From your basic regression training, you will remember that the slope is
the change in the outcome for each unit change in your predictor. In
figure 4.1, we identify these values at 0, 1, and 2. (It may look strange to have
a path with a 0 on it, but showing it this way may clarify what happens.) If y
is measured at three equal-interval time points, then we can say that the first
time point is 0, the second is 1, and the third is 2; the values we choose
depend completely on how time is measured. If you are using a person’s
score at age 18, again at age 20, and finally at age 22, you might use 0, 2,
and 4 because each wave is two time units (years) of time apart. If you use 0,
1, and 2, you need to remember that one unit represents 2 years when you
interpret your results.
We also know the means of the observed variables. We thus have three
variances, three covariances, and three means. It is possible to write an
equation for each of these nine known values.
What would happen if we had four time points? Then we would have 10 elements of the covariance matrix (four variances and six covariances) along with four means, for 14 known values. We would still have the same parameters we were estimating plus one additional error variance, nine parameters in all, and so we would have 14 − 9 = 5 degrees of freedom. Having one more time point would provide a much more rigorous test of our model.
Time points added to a model not only add degrees of freedom but also
provide more information for testing the model. With just two time points,
there is no information to test such a relationship because two points determine only one line; for example, see figure 4.2. With three time points,
there is a bit more information to test our linear model, and with four time
points, there is even more information for our test. The take-away point to
remember is that you need at least three time points for a linear latent growth
curve, though four or more time points is best.
Figure 4.2: A linear relationship with two, three, and four data points
With four time points, we can fit a simple curve using a quadratic. Figure 4.3
shows how we would draw such a model. The latent intercept growth factor
with loadings of 1 for each wave and the latent linear slope growth factor
with loadings of 0, 1, 2, and 3 are unchanged. What is new is the latent
quadratic slope growth factor. You may remember that in ordinary regression, you used x to represent the linear predictor (b1x) and x² to represent the quadratic (b2x²). We follow the same
logic here. Notice in figure 4.3 that the loadings for the latent quadratic slope
are just the square of the loadings for the latent linear slope; thus 0 stays at 0,
1 stays at 1, 2 becomes 4, and 3 becomes 9.
[Figure 4.3: quadratic growth curve model; Intercept loadings of 1, 1, 1, 1; Slope loadings of 0, 1, 2, 3; Quadratic loadings of 0, 1, 4, 9; each of y0–y3 has an error term]
With Stata’s sem command, we can fit linear and quadratic growth
curves, and test to see whether the quadratic is necessary. We should test for
a quadratic component whenever we believe the growth trajectory follows a
simple curve. With more time points, it is possible to add higher dimensions,
but these are often difficult to interpret.
In 2001, our subset of participants were 20 years old, and in 2009 they
were 28. We are interested in what happens to BMIs of people during their
20s. BMI is calculated as 703 × weight/height², where weight is measured
in pounds and height is measured in inches. The data are in the public
domain, and the complete dataset can be downloaded at
http://www.nlsinfo.org/investigator/pages/login.jsp. We are just using a few
variables and a subset of years for our example in this chapter; I have also
arbitrarily deleted some observations where people reported weighing under
50 pounds or being under 4 feet, 2 inches tall. The dataset we are using is
bmiworking.dta.
Figure 4.4: Linear latent growth curve for BMI between 20 and 28
years of age
Before jumping into fitting the latent growth curve of BMI from age 20 to
age 28, it is helpful to look at a subsample of a few individual trajectories. To
create a figure showing individual trajectories requires you to run Stata code
that does not use the sem command. This is not a necessary step. However, it
is nice to see the overall linear trajectory and also the year-by-year change for
a small sample of observations—say, 20 to 50 people. The results of these
efforts appear in figure 4.7 at the end of this section.
First, we open the dataset and keep just the bmi01–bmi09 observed
variables along with the id variable.
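The commands are presumably along these lines (there is no bmi04, so the varlist range simply spans the variables that do exist):

. use bmiworking, clear
. keep id bmi01-bmi09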
The data are in what is called a wide format. This means we have one row
of data for each person. Look at the tiny part of our data shown in figure 4.5.
The person whose id is 14 has missing values for all nine BMI scores,
bmi01–bmi09. The next person has a missing score on all the variables except
for bmi03. You may notice that we have no variable bmi04 for when the
person was 23; this variable was left out so that you can see how to adjust for
a missing wave in the sem command.
To construct our graph, we need to reshape the data into what is called a
long format. I will not go into the details of this extremely powerful
command other than to illustrate how we use it. The command we run is
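Given the stub and the year variable described next, the command is presumably:

. reshape long bmi0, i(id) j(year)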
The reshape long part of the command tells Stata that we are going to
go from a wide way of arranging the data to a long way. The bmi0 is the stub
for each of our variables, bmi01, bmi02, …, bmi09. The stub includes
everything except the numerical value that distinguishes each wave; we
include the 0 in the stub because each wave has a 0 in its name. If our
variables were hlth1, hlth2, …, hlth10, then our stub would be hlth, and
the first part of our command would be reshape long hlth.
Person 14 had no data on any of the BMI scores. Person 20 only had a
score for the third year of data, 2003, when the person was 22 years old.
Person 26 had 8 years of BMI data. There is just one BMI variable, bmi0,
because we now have our year variable to distinguish the waves. Check out
person 26’s BMI in the last wave, wave 9. Person 26 had a BMI of 37.12. By
contrast, when person 26 was just 21 years old at wave 2, the BMI was 24.21.
A BMI over 25 is considered overweight and a BMI over 30 is considered obese, so person 26 went from a normal weight to being obese during his or her 20s.
Now we are ready to create our graph. We can use the Graphics > Twoway graph (scatter, line, etc.) dialog box to prepare our command. We
will create one graph with just the linear predicted value and then a separate
graph for 10 randomly selected people. We then overlay these graphs into a
single figure.
The complete graph command is quite lengthy, but the syntax is simply
repeated for each line:
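A compressed sketch of the idea; the ids and the age variable here are assumptions (age recodes year to run from 20 to 28), and the actual command repeats the line plot for each of the 10 selected people:

generate age = year + 19
twoway (lfit bmi0 age, lwidth(thick))                    ///
       (line bmi0 age if id == 26, sort lpattern(dash))  ///
       (line bmi0 age if id == 30, sort lpattern(dash)), ///
       legend(off) ytitle("Body mass index (BMI)") xtitle("Age")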
The command above produces the graph in figure 4.7. The thick line in
the figure shows the overall linear trajectory. The dashed lines show the
actual changes from year to year for our 10 randomly selected people. The
overall trajectory is positive, with BMI increasing from year to year. When we
look at the 10 individual line graphs, we see some people with missing
values, some with no missing values, some who bounce up and down more
than others; however, there is an overall upward trend in BMI during our
respondents’ 20s.
The overall intercept looks like it is about 26, but notice the substantial
variance between individuals: some people start with a BMI of over 30, and
some start with a BMI of around 20. When there is a lot of variance in the
intercept, we need to use what is called a random intercept model. We need to
estimate the intercept but also the amount of variance around the intercept.
We have a similar result with the slope: some trajectories are positive with
fairly steep inclines, while some have a slope near 0 or even slightly negative.
[Figure 4.7: overall linear trajectory and year-by-year BMI trajectories for 10 randomly selected people, plotted from age 20 to age 28]
The bold regression line we plotted in figure 4.7 is not ideal because it
ignores the interdependence of the repeated measures on the same people.
People who have a high score one year are more likely to have high scores
other years. We would expect some degree of consistency, and this means
that our observations are not going to be statistically independent as is
assumed for ordinary least-squares (OLS) regression. For example, we expect
that your score when you were 20 years old will be related to your score
when you were 21 years old.
Because our data are in the long format, we can quickly check for this
lack of statistical independence by estimating the intraclass correlation (ICC),
also called (“rho”). The higher the ICC, the greater the dependence and the
more inappropriate the OLS regression would be. Values close to 0.0—say,
less than 0.05—are sometimes used to justify using OLS regression. However,
when a study is designed to have repeated measures, that is, multiple scores
nested in each individual, then there is a good reason to go beyond depending
on OLS, regardless of the size of the ICC. OLS offers no theoretical advantage,
even when the ICC is small. However, if the ICC is very close to 0, you may
not have convergence when fitting the model that includes a random
intercept.
With the data in long format, a way to estimate the ICC is to use the mixed
command:
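. mixed bmi0 || id:
. estat icc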
The mixed command is shown with the outcome variable, bmi0, and no
predictors. After the pair of vertical bars, ||, is the name of the identification
variable that specifies how the data are clustered. In this example, each
individual is a cluster and the identification variable for individuals is id (it
is necessary to include the colon after the identification variable). The estat
icc command gives us the ICC based on the results of the mixed model.
The ICC of 0.82 indicates a very high degree of dependence in BMI scores
across the waves. The variance between individuals is shown in the mixed
results as var(_cons) and is 32.42. This represents the differences between
people. The variance within people (across the repeated measures) is reported
as the var(Residual) and is 7.01. This represents how individuals vary from
year to year. The ICC is the between-person variance divided by the total variance: 32.42/(32.42 + 7.01) = 0.82.
Stata has alternative ways of estimating the ICC; for example, you could
use the command icc bmi0 id. This command applies when you have a
balanced design (no missing values). There are usually at least some missing
values, making our growth curve an unbalanced design. The icc command
uses casewise deletion, dropping any individual that has any missing values.
In our example, icc bmi0 id computes the ICC by using only the 781 observations that have a value at all eight waves of data. We want to use all of our
available data for the 1,581 observations and so rely on mixed bmi0 || id:
followed by estat icc.
We can fit a basic linear growth curve either using listwise deletion or using
all available data. We just saw from the mixed results that the average
number of BMI reports was 6.6 waves for our 1,581 individuals. It is
extremely common to have substantial missing values like this in a
longitudinal study. People are unavailable for one or more years because of
any number of reasons (out of town during the interview period, incarcerated,
in military service, refuse to answer, ill at the time of the interview). We can
either use all the information we have for each individual, missing values and
all, or we can just keep the individuals who have complete data. Only 781
people have complete data that allow us to calculate BMI for all eight years.
In all likelihood, limiting our sample to just these 781 individuals and thereby
throwing out more than 50% of our subset will introduce more bias than
using all the available information we have on the 1,581 individuals. By
using all available data, we assume the missing values are missing at random
and that variables have a multivariate normal distribution.
Have you guessed what is missing? The mean of the latent intercept
growth factor and the mean of the latent slope growth factor do not appear in
our results. We are missing exactly what we are trying to estimate.
We must have a mean for the intercept and a mean for the slope because
those two mean values determine the overall trajectory. We have to ask for
these values by adding the options noconstant and means(Intercept
Slope). We will also include an option to give us a full-information
maximum likelihood estimation: method(mlmv), where mlmv stands for
maximum likelihood estimation with missing values. The added options
appear in the last line of the command.
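Back in the wide-format data, a sketch of the full command (the Slope loadings skip 3 because there is no bmi04):

sem (bmi01 <- Intercept@1 Slope@0) (bmi02 <- Intercept@1 Slope@1)  ///
    (bmi03 <- Intercept@1 Slope@2) (bmi05 <- Intercept@1 Slope@4)  ///
    (bmi06 <- Intercept@1 Slope@5) (bmi07 <- Intercept@1 Slope@6)  ///
    (bmi08 <- Intercept@1 Slope@7) (bmi09 <- Intercept@1 Slope@8), ///
    noconstant means(Intercept Slope) method(mlmv)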
We have all 1,581 observations that have a BMI score for at least 1 year.
The first several lines in the table of results show the coefficients on the
Intercept and Slope where we have fixed values. Stata says these values
are constrained, meaning they are forced to be these particular values. Below
that are the means: the Intercept mean is 25.63 and the Slope mean is 0.35.
This means the expected gain in BMI is 0.35 points each year between the
ages of 20 and 28. An increase of 0.35 points may not sound like much, but
over 9 years, you can see that it will add up. We could write an equation to represent our growth curve: predicted BMI = 25.63 + 0.35 × (years since age 20).
How well does our linear model fit the data? At the bottom of our results,
we see one of the most important criteria: LR test of model
vs. saturated: chi2(31) = 376.67, Prob > chi2 = 0.0000. A saturated
model, by definition, has no degrees of freedom and perfectly reproduces the
known moments (variances, covariances, means). Our model, which has 31
degrees of freedom, fails to perfectly reproduce all the known moments. In
the strictest sense, our model should be rejected. However, it is often the case
that a model can be imperfect based on this chi-squared test and still be very
useful. We can use the postestimation command estat gof, stats(all) to
get a more comprehensive view of how well our model fits the data.
We can examine the modification indices to see whether they suggest any
possible solutions to our marginal model fit.
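Both are postestimation commands run right after the sem command:

. estat gof, stats(all)
. estat mindices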
Because we know that the modification indices are reported only if
freeing a parameter would significantly improve the fit, we can see there are a
lot of problems. This many problems means we need to be especially careful
to avoid capitalizing on chance; we only want to free parameters that make
conceptual sense. For example, if we freed the intercept at the first wave, we
could reduce the chi-squared by roughly 40.95. The negative value for the
expected change in the parameter estimate suggests that the linear fit slightly
overestimates the initial BMI. Conceptually, the intercept is a constant value
and was coded 1.0 for all years, so it would not make sense to have it vary
from year to year.
There is also a big problem with the slope at bmi09 with a modification
index of 83.09 and a large negative expected change. Some of the other years
also have significant changes in the slope as possible improvements. As with
the intercept, freeing these parameters would essentially destroy a linear
trajectory that has a fixed intercept and a fixed slope.
Before fitting a model with adjacent errors correlated, let us think about a
different strategy to improve fit. We might look at our means for each year.
We could run the following command to get the means for each wave with
listwise deletion to remove observations with missing values from the
sample:
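One way to do this in the wide data, sketched with a hypothetical helper variable:

egen nmiss = rowmiss(bmi01-bmi09)    // count missing waves per person
summarize bmi01-bmi09 if nmiss == 0  // means on the complete cases only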
We could graph the means compared with a linear fit. We can enter the
data directly and then create the plot as shown in figure 4.8:
[Figure 4.8: observed mean BMI at each year compared with a linear fit]
Our graph suggests a bit of leveling in the growth of BMI, and this would
be tapped by adding a quadratic slope. Does adding a quadratic slope make
sense? We can justify this by recognizing that BMI could not continue to go
up indefinitely in a straight line. We do not know when it will start to level
out, but it is possible that this could happen in the late 20s, as our means
suggest.
When drawing your growth curve model with the SEM Builder, you
have to do two things before you fit the model. To ensure that you will
estimate the mean for the latent growth factors (the Intercept and the
Slope), you need to double-click on them one at a time. When you
double-click on, say, the Intercept, a Variable properties dialog box
will open. In the dialog box, click on Estimate mean:
You next need to ensure that Stata knows all the intercepts on the
observed variables are fixed at 0. Usually, the simplest way to do this is
to wait until you are ready to fit the model. Before you fit it, click on
Estimation > Estimate, select the Advanced tab, and click on the box
for Do not fit intercepts:
Alternatively, you could click on each observed variable and enter a 0
in the box that appears on the top toolbar.
The following is the sem command for correlating errors. Notice that there
are no data for wave 4, so we ignore that. Stata generated names for the error
terms automatically by adding an e. as a prefix to the observed variable
name. Thus the error for bmi01 becomes e.bmi01. Correlating this error term
with the error term for bmi02 is accomplished by adding the option
cov(e.bmi01*e.bmi02).
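A sketch of the command with each adjacent pair listed in cov(); note that the 5-degree-of-freedom improvement reported below implies one fewer pair than the six adjacent pairs sketched here, so the exact list is an assumption:

sem (bmi01 <- Intercept@1 Slope@0) (bmi02 <- Intercept@1 Slope@1)  ///
    (bmi03 <- Intercept@1 Slope@2) (bmi05 <- Intercept@1 Slope@4)  ///
    (bmi06 <- Intercept@1 Slope@5) (bmi07 <- Intercept@1 Slope@6)  ///
    (bmi08 <- Intercept@1 Slope@7) (bmi09 <- Intercept@1 Slope@8), ///
    noconstant means(Intercept Slope) method(mlmv)                 ///
    cov(e.bmi01*e.bmi02 e.bmi02*e.bmi03 e.bmi05*e.bmi06            ///
        e.bmi06*e.bmi07 e.bmi07*e.bmi08 e.bmi08*e.bmi09)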
The results (not shown) for the estat gof, stats(all) postestimation command show a significant improvement. Our chi-squared drops to 196.43 with 26 degrees of freedom, and the RMSEA and CFI both improve. Our previous model (without correlated error terms) had a chi-squared of 376.67 with 31 degrees of freedom. The difference between these two chi-squared values is 376.67 − 196.43 = 180.24 with 5 degrees of freedom, which is highly significant, p < 0.001. You can estimate the exact probability with display chi2tail(df,chi_squared), that is, display chi2tail(5, 180.24).
Rather than subtracting these by hand and using the display command
to obtain the probability level, you can use the likelihood ratio test as
implemented by the lrtest command. It has slightly better precision
because it keeps more decimal places, and you avoid the chance of making an
arithmetic mistake. After the first structural equation model that did not have
correlated errors, you would tell Stata to save the results by typing estimates
store ind_errors.2 Then after the second structural equation model, you
would save those results under a different name by typing estimates store
corr_errors. Finally, you would run lrtest ind_errors corr_errors.
We can fit a quadratic simply by adding it as a latent variable and fixing its
loading at the squared value of the corresponding linear variable. Here is our
program:
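A sketch; the Quadratic loadings are the squares of the corresponding Slope loadings:

sem (bmi01 <- Intercept@1 Slope@0 Quadratic@0)   ///
    (bmi02 <- Intercept@1 Slope@1 Quadratic@1)   ///
    (bmi03 <- Intercept@1 Slope@2 Quadratic@4)   ///
    (bmi05 <- Intercept@1 Slope@4 Quadratic@16)  ///
    (bmi06 <- Intercept@1 Slope@5 Quadratic@25)  ///
    (bmi07 <- Intercept@1 Slope@6 Quadratic@36)  ///
    (bmi08 <- Intercept@1 Slope@7 Quadratic@49)  ///
    (bmi09 <- Intercept@1 Slope@8 Quadratic@64), ///
    noconstant means(Intercept Slope Quadratic) method(mlmv)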
We cannot use the chi-squared values to test which of the two previous
solutions is best. The model with correlated adjacent error terms was not
nested in the model with a latent quadratic slope factor nor vice versa.
However, both models make considerable sense and both improve the fit.
What happens if we put them together?
We can do this by using the following sem command. Because this is our
final model, we will look at all the results and go over them carefully.
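This is simply the quadratic command sketched above with the cov() option for adjacent errors appended; in full:

sem (bmi01 <- Intercept@1 Slope@0 Quadratic@0)   ///
    (bmi02 <- Intercept@1 Slope@1 Quadratic@1)   ///
    (bmi03 <- Intercept@1 Slope@2 Quadratic@4)   ///
    (bmi05 <- Intercept@1 Slope@4 Quadratic@16)  ///
    (bmi06 <- Intercept@1 Slope@5 Quadratic@25)  ///
    (bmi07 <- Intercept@1 Slope@6 Quadratic@36)  ///
    (bmi08 <- Intercept@1 Slope@7 Quadratic@49)  ///
    (bmi09 <- Intercept@1 Slope@8 Quadratic@64), ///
    noconstant means(Intercept Slope Quadratic) method(mlmv) ///
    cov(e.bmi01*e.bmi02 e.bmi02*e.bmi03 e.bmi05*e.bmi06      ///
        e.bmi06*e.bmi07 e.bmi07*e.bmi08 e.bmi08*e.bmi09)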
Although the model is not perfect because the chi-squared is still significant, the other measures of goodness of fit, the RMSEA and the CFI, are excellent. The intercept mean is 25.49 and is highly significant; the linear slope mean is 0.49 and is highly significant; and the quadratic slope mean is small and negative but statistically significant. We could write the estimated trajectory as predicted BMI = 25.49 + 0.49(wave) + b2(wave²), where b2 is the small negative quadratic mean.
Figure 4.9 presents the resulting quadratic graph of BMI from age 20 to
age 28. You can see there is a substantial initial increase in the BMI and a
slight leveling off of the effect because of the small but statistically
significant quadratic term.
We need to pay special attention to the variance of the latent growth
factors. The mean of the Intercept is 25.49, and the variance is 28.71 with a
standard error of 1.16. Stata does not report a test for the variance: the
sampling distribution has a boundary problem because the variance cannot be
less than 0. However, Stata reports a 95% confidence interval, and the lower
limit is well above 0. Therefore, there appears to be a substantial variance in
the random effects. We can take the square root of the variance to obtain a
standard deviation: the square root of 28.71 is 5.36. Based on the assumption that the random
intercepts are normally distributed, about 95% of the people will have a
personal intercept within two standard deviations of the mean. Thus about
95% of the people will have an intercept between 14.77 and 36.20. This is an
enormous range in the estimated initial BMI varying from underweight to
morbidly obese. Whenever you have a substantial variance in the Intercept,
you will likely want to identify exogenous variables (gender, exercise,
parents’ weights) that might explain this variance. The variance around the
overall linear slope of 0.49 is 0.99, so the standard deviation is 0.99. This is a
huge variance relative to the size of the slope. Consider that an approximate
95% confidence interval on the slope will be 0.49 ± 2(0.99) (95% confidence interval of −1.50 to 2.48). Some people have a steep negative
slope, while others have a steep positive slope. We have a substantial
variance for the random intercept and a substantial variance for the random
slope. There is also variance of the random effects for the quadratic, but it is
less dramatic than the variances of the random effects for the intercept and
linear slope.
When you have both a linear and a quadratic slope, the interpretation of
the linear slope is problematic. This is how such a growth curve might
appear:
[Figure: hypothetical quadratic growth curve; fitted values plotted against x1, rising from 0 to a peak near 7 and then declining over waves 0 to 10]
You might want to be more specific. For example, you could say that initially the slope is 6.21, but the curve becomes flat (the highest point) at wave 6.87. At any given wave, the slope is the linear coefficient plus twice the quadratic coefficient times the wave value. Given that the initial slope is 6.21, the slope at 6.87 is 0.0, and the slope at wave 10 is negative (about −2.83 by the same arithmetic), you can see that the graph is easier to explain to your reader than trying to interpret the slopes directly.
4.5 How can we add time-invariant covariates to our
model?
A covariate is called time-invariant if its value remains constant across waves
—such as gender. In this section, we will add a pair of time-invariant
covariates. We will use gender as a binary predictor of the trajectory and self-
described weight during adolescence, deswgt97, as a second predictor. The
self-described weight variable is coded from 1 for very underweight to 5 for
very overweight. To help simplify the interpretation of the intercept, we will
recode this variable, generating the variable deswgt0_97 that ranges from 0
to 4 instead of from 1 to 5. We do this so that a value of 0 is meaningful for
the time-invariant covariate.
We think that men will have a higher intercept and a steeper slope than
women. We also think that people who thought they were overweight as
adolescents will have a larger intercept and steeper slope than those who felt
they were underweight as adolescents. The steeper slope for those who
thought they were overweight as adolescents represents a cumulative
disadvantage that overweight people have. This is a type of interaction where
the growth rate in BMI during your 20s varies depending on how overweight
you reported that you were as an adolescent.
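A sketch of this model; fixing each observed intercept at 0 with _cons@0 while regressing the growth factors on the covariates is one way to write it, and cov(e.Intercept*e.Slope) keeps the growth factors' errors correlated:

sem (bmi01 <- Intercept@1 Slope@0 _cons@0) (bmi02 <- Intercept@1 Slope@1 _cons@0) ///
    (bmi03 <- Intercept@1 Slope@2 _cons@0) (bmi05 <- Intercept@1 Slope@4 _cons@0) ///
    (bmi06 <- Intercept@1 Slope@5 _cons@0) (bmi07 <- Intercept@1 Slope@6 _cons@0) ///
    (bmi08 <- Intercept@1 Slope@7 _cons@0) (bmi09 <- Intercept@1 Slope@8 _cons@0) ///
    (Intercept <- male deswgt0_97) (Slope <- male deswgt0_97),                    ///
    cov(e.Intercept*e.Slope) method(mlmv)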
The RMSEA signifies a marginal fit for our model (RMSEA between
0.06 and 0.08); the CFI suggests a good fit to the data. However, the chi-squared test is highly significant. Remember, to simplify
this model we have not included a quadratic term or correlation of adjacent
errors.
As you may recall, the intercept for the linear growth curve was 25.63;
for the current model, the intercept is 24.86 (this is reported as the coefficient
for the _cons). The reason for the difference is that we have added the two
exogenous variables to the model. The intercept is the value when all
predictors are at 0. For the linear growth curve model without covariates, the
intercept was simply the mean of the latent variable Intercept. However,
our new model including covariates produces an equation for an individual’s
estimated value in the first time period based on his or her gender and self-
reported weight:
Thus the coefficient on _cons in the equation for the Intercept, 24.86, is the estimated intercept for a woman (male = 0) who had an average perception of her weight as an adolescent. For a comparable man, the intercept is 24.86 + 1.56 = 26.42. This confirms our speculation that men would have a higher intercept, and the difference of 1.56 is highly significant.
You can see how gender and self-described weight explain variance in the
Intercept. In the results of the estat eqgof command, we see that R² = 0.28 for the Intercept. Thus our two time-invariant covariates are explaining 28% of the variance in the initial BMI score.
The mean Slope, when we did the simple linear growth curve with no
time-invariant covariates, was 0.35. From our output, we see that the slope
has an expected value of 0.34, that is, the coefficient on _cons in the equation for the Slope. This looks like a small difference, but we need to be very careful about interpreting slopes and intercepts when we have time-invariant covariates: as with the Intercept, each individual's expected slope depends on his or her gender and self-reported adolescent weight.
Imagine an intervention that has five waves of data starting at the end of
first grade and ending at the end of fifth grade. The intervention is designed
to increase positive behavior. The solid line in figure 4.10 shows the growth
curve for hypothetical data, while diamonds represent the actual means at
each year. At the end of first grade, the average score was 2.0. No progress
was made in the second year; the average remained at 2.0. Third grade
brought the students up to 2.4, fourth grade brought them to 2.8, and there
was no progress in fifth grade. The intervention ended with an average score
of 2.8.
It looks like a linear growth trajectory makes overall sense, but how could
you explain the lack of progress in grades 2 and 5, and the substantial
progress in grades 3 and 4? One possible factor would be the level of
implementation of the intervention. Each school year would have had a
different teacher: perhaps the teachers for grades 2 and 5 did little to
implement the intervention, while the teachers at grades 3 and 4 made a huge
effort to implement the intervention. We could measure the level of
implementation by each teacher, and this would be a time-varying covariate.
Rather than explaining the overall growth trajectory, time-varying covariates
seek to explain deviations from the overall trajectory.
[Figure 4.10: hypothetical intervention growth curve across five waves; the solid line shows the fitted linear trajectory and the diamonds show the actual means at each wave]
The spikes and dips from the overall trajectory are exactly what time-
varying covariates seek to explain. Our simplified growth curve model
appears in figure 4.11. The box labeled “invariant” is a vector of time-
invariant covariates (gender, race, initial health, etc.). The variables labeled
tv0–tv3 are the time-varying covariates. The Intercept and Slope
represent the overall trajectory that may be shaped by the time-invariant
covariates. By contrast, the time-varying covariates directly influence the BMI
at each wave, causing a spike or a dip. In this model, time-varying covariates
do not directly influence the overall trajectory, Intercept, or Slope.
[Figure 4.11: growth curve model in which time-invariant covariates influence the Intercept and Slope and the time-varying covariates tv0–tv3 directly influence the outcome at each wave]
Although figure 4.11 illustrates only four waves of data, we will keep all
waves in our analysis. The time-varying covariate is measured at each wave,
drkdays01–drkdays03 and drkdays05–drkdays09. We will first recode the
extended missing value codes .a to .e as simple missing values.5 We also
center the drkdays variables before including them in the model. To fit this
model, we add the time-varying covariate in the lines that specify the
relationship between Intercept and Slope and BMI at each year. Here are
our commands and partial results:
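A sketch of the data preparation and the model; the loop treating extended missing codes, the center calls, and the per-wave c_drkdays predictors follow the description above (there is no drkdays04, matching bmi04):

foreach v of varlist drkdays01-drkdays03 drkdays05-drkdays09 {
    replace `v' = . if `v' > .   // .a-.e (and any other extended code) becomes plain missing
}
center drkdays01-drkdays03 drkdays05-drkdays09   // creates c_drkdays01, etc.
center deswgt97                                  // creates c_deswgt97
sem (bmi01 <- Intercept@1 Slope@0 c_drkdays01 _cons@0) ///
    (bmi02 <- Intercept@1 Slope@1 c_drkdays02 _cons@0) ///
    (bmi03 <- Intercept@1 Slope@2 c_drkdays03 _cons@0) ///
    (bmi05 <- Intercept@1 Slope@4 c_drkdays05 _cons@0) ///
    (bmi06 <- Intercept@1 Slope@5 c_drkdays06 _cons@0) ///
    (bmi07 <- Intercept@1 Slope@6 c_drkdays07 _cons@0) ///
    (bmi08 <- Intercept@1 Slope@7 c_drkdays08 _cons@0) ///
    (bmi09 <- Intercept@1 Slope@8 c_drkdays09 _cons@0) ///
    (Intercept <- male c_deswgt97) (Slope <- male c_deswgt97), ///
    cov(e.Intercept*e.Slope) method(mlmv)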
4.6.2 Interpreting a model with time-invariant and time-varying
covariates
The RMSEA and CFI suggest a good fit to the data. The chi-squared test, however, is still highly significant. Adding the time-varying
covariates greatly increases our model complexity, giving us 99 degrees of
freedom compared with 43 for the model that did not include these
covariates. The CFI is unchanged, while the RMSEA of 0.05 is improved and is
in the very good fit range. Remember, to simplify this model, we have not
included a quadratic term nor correlation of adjacent errors for BMI; we do
still have correlated errors for Intercept and Slope. We also obtain
estimates of the covariances for all the exogenous variables, male,
c_deswgt97, and c_drkdays01–c_drkdays09.
Our intercept and slope are similar to what they were with the time-
invariant covariates. We now have an Intercept of 24.83 compared with
24.86 before. Our slope is now 0.35 compared with 0.34 before. In the results
of the estat eqgof command, we see that R² = 0.28 for the Intercept, which is the same as it was before. The R² for the Slope is also the same as before. These results should not be surprising because our model does not
have a direct effect of the time-varying covariates on either the Intercept or
the Slope. Such effects would be difficult to justify causally given that all but
the first time-varying covariates occur after the initiation of the growth
process.
Our estimated means illustrate the potential problem with our results,
even though our sample is quite large. Notice the estimated means for
c_drkdays01–c_drkdays09. Each of these is centered in our observed
data and has a mean of 0. Although none of the estimates is
significantly less than 0, they are all in a negative direction. The
estimated mean for c_deswgt97 is approximately 0 because it is
relatively more normally distributed. The mean for our observed data on
male is 0.51, indicating that it is a symmetrical variable, and the
estimated mean of 0.50 is very close.
Let us look at the estimated error variances for our measured BMI
variables in the most recent example where we have included time-invariant
and time-varying covariates but not equality constraints:
You can imagine what would happen if we forced these error variances to be
equal. The variance at wave 6 is over three times greater than the variance at
wave 1.
If we examine the variances of the eight error terms above, we see that
there is an increasing error up to age 25 and then it drops back down. Perhaps
there is something going on in a person’s mid-20s that would account for
this. In other situations, there might be increasing variation in errors over the
course of a growth curve if the later measurements are after the completion of
an intervention. On the other hand, when working with children from first
grade to fifth grade, there might be a decrease in the variation of the errors as
the cognitive skills of the students increase, making them better able to fully
understand the questions they are asked.
Why would you force the variance of the errors to be equal? You might
do this whenever you have no reason to think of them differing, because this
simplifies your model considerably—you only need to estimate one variance
for all the errors. Other times, this may be a practical solution to a model that
does not converge. The more complicated a model gets, the harder it is for
Stata to converge on a solution. When you force the error variances to be
equal, you will have a worse fit to your model, but often it does not make a
substantial difference.
Here is the change we make to force all the error variances to be identical:
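A sketch; everything matches the previous command except the last two lines, which assign the common label v to every error variance:

sem (bmi01 <- Intercept@1 Slope@0 c_drkdays01 _cons@0) ///
    (bmi02 <- Intercept@1 Slope@1 c_drkdays02 _cons@0) ///
    (bmi03 <- Intercept@1 Slope@2 c_drkdays03 _cons@0) ///
    (bmi05 <- Intercept@1 Slope@4 c_drkdays05 _cons@0) ///
    (bmi06 <- Intercept@1 Slope@5 c_drkdays06 _cons@0) ///
    (bmi07 <- Intercept@1 Slope@6 c_drkdays07 _cons@0) ///
    (bmi08 <- Intercept@1 Slope@7 c_drkdays08 _cons@0) ///
    (bmi09 <- Intercept@1 Slope@8 c_drkdays09 _cons@0) ///
    (Intercept <- male c_deswgt97) (Slope <- male c_deswgt97), ///
    cov(e.Intercept*e.Slope) method(mlmv)                      ///
    var(e.bmi01@v e.bmi02@v e.bmi03@v e.bmi05@v)               ///
    var(e.bmi06@v e.bmi07@v e.bmi08@v e.bmi09@v)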
What is new? The last two lines of the command force all the variances to
be equal by assigning a common name, v, to each of them (any common
name will work). Every parameter being estimated that is assigned this
common name will be forced to be equal.
Look at the error variances above, where we allowed them to be free with
Stata’s default. What do you think the estimates will be when we run this new
sem command that constrains them all to have the same value? You probably
will not be surprised with these results:
The parameter estimates for the Intercept and Slope do not change very much, but the goodness of fit is much worse (results not shown) in terms of the chi-squared. Still, the RMSEA and the CFI are only a bit worse.
Box 4.4. Reconciling the sem command and the mixed command
Many growth models can be fit either by using the sem command as
described in this chapter or by using the mixed command. One
limitation of the mixed command is that it can only fit a single growth
curve and is not able to fit parallel growth curves. The sem command
fits growth models much faster in most cases compared with mixed. The
sem command also provides additional information for assessing the fit
of the model, for example, the RMSEA, CFI, and modification indices. It
can be frustrating when fitting a growth model with sem only to find
that the RMSEA is too large and the CFI is too small to meet publishable
standards. If you had run the model with mixed, you would not have this
information.
The two commands have different default treatments of error terms for
each wave. The sem command assumes that error variances are free to
differ from wave to wave; this is its default. The mixed command
assumes that the variances of the errors for the waves are invariant as a
default; this is a much more restrictive assumption. In section 4.7, we
saw that the error variances for the waves varied widely from 2.43
to 8.22. When we constrained them to be invariant, they were estimated
to all be 4.53. Our fit with this invariance constraint was a bit worse but
still acceptable.
Is it justifiable to assume the error variances are all different? If you
have a sample of adolescents being followed from age 13 to age 19 and
are measuring how enmeshed they are with their families, it is possible
to expect the error variances to be different; likely, they would become
larger as the adolescents expand their social networks. On the other
hand, it is good to think about the reasonableness of the assumption that
the error variances are different from wave to wave (Grimm, Ram, and
Hamagami 2011).
4.8 Exercises
1. Using the SEM Builder, draw a figure showing a linear growth curve
with four waves. Draw a figure showing a quadratic growth curve with
four waves. What additional parameters need to be estimated with the
quadratic model?
2. In a latent growth curve, why do the arrows go from the intercept and
slope to the measured scores rather than the other way around?
3. You read that between ages 75 and 85, the speed with which people
perform simple math calculation slows. You are using a math speed test
that has a mean of 80 and a standard deviation of 15 in the entire
sample, ages 16–90. The estimated intercept for your subsample of 75-
to 85-year-old people is 76 with a variance of 1.2. The estimated slope is
negative, with a variance of 1.1.
a. Draw a freehand figure showing the estimated growth curve for
your subsample.
b. How similar are people initially (age 75)?
c. How similar is the rate of change for people between ages 75 and
85? Why would a fixed-effects model be problematic?
4. Given the results for exercise 3, use Stata to draw a two-way graph of
the growth curve.
5. You run an sem linear growth curve, and your modification indices
(MIs) have several estimated MIs over 3.84.
a. What does an MI greater than 3.84 mean? How likely is this?
b. You free the parameter with the largest MI and refit your model. What will happen to the remaining MI estimates?
c. Why should you generally free only one parameter at a time based
on the MI values?
d. (Optional) Give an example of a situation where you might free
more than one parameter at a time.
6. Why do time-invariant covariates influence the intercept and slope but
time-varying covariates influence the scores at each wave directly?
7. The sem approach to fitting a growth curve allows variances of errors to
differ from wave to wave. Why would you constrain them to be equal?
What will this constraint do to your goodness of fit?
8. Using the exercise_wide.dta dataset, fit a growth curve for the 12-
week exercise program by using commands (sem, etc.).
9. Repeat exercise 8 by using the SEM Builder.
10. The exercise_wide.dta dataset has a time-invariant covariate,
program. This is coded 0 for programs that increase the number of
replications and is coded 1 for programs that increase the amount of
weight used in each replication. Which program has better results?
. Self-reports of height and weight can have serious measurement error, and this error can be biased. Men
might over-report their height and under-report their weight if they feel they are overweight. Men who have a
very low weight might overestimate their weight. Women may also give biased reports. Because these data
are being used as an example, we will not worry about these possible biases; but we should acknowledge that
direct observer measurement of height and weight would give us far greater confidence in our findings.
. The name you assign to each saved set of estimates is your choice.
. If you do not have fre installed, type ssc install fre to automatically download it.
. If you do not have center installed, type ssc install center to download it.
. Stata supports several missing value codes other than the single dot. For example, .a might mean a
person skipped an item and .b might mean that the person was not interviewed that year. For simplicity, we
are treating all reasons as simply having missing values.
Chapter 5
Group comparisons
Stata offers a sophisticated way of comparing known groups with the sem
command. In earlier chapters, we have done some types of group
comparisons whenever we included a categorical variable as a predictor. In
chapter 2, we fit path models; we might include a categorical variable such as
gender as a predictor of one or more of the endogenous variables. This would
simply add the gender effect, if any, to the estimated score on the outcome
variable. For example, we might have income as an endogenous outcome
variable, and education and gender as exogenous variables. We would have
an effect for education, which would be added to the effect of gender to
predict income. This is often referred to as an additive model. On the other
hand, we might think the effects of other exogenous variables varied by
gender; perhaps the effect of education on income is different for women than
it is for men. A simple additive model would not be able to handle this, and
we need to allow for the interaction of gender with education. We would say
the effect of education on income was moderated by gender.
We might predict that a) the more a study participant felt that the U.S.
must be the peacekeeper for the world, the more the participant will support
the Republican candidate and b) women are less likely to support the
Republican candidate. An additive model appears in figure 5.1 panel a. To
obtain a predicted support level for the Republican candidate, you simply add
the two main effects.
[Figure 5.1: panel a shows the additive model, with a positive path from pol_order and a negative path from female to support; panel b adds the interaction term femXorder, which has a negative path to support]
You might believe that the additive model is too simple. Having an
interventionist policy might have a positive effect on men but a negative
effect on women; that is, you believe there is an interaction. Gender
moderates the effect of supporting the U.S.’s responsibility for maintaining
the pol_order of the world on support for the Republican candidate. The
effect of one variable, pol_order, on another variable, support, is contingent
on the level of the third variable, female. This is called an interactive model
and is illustrated in figure 5.1 panel b.
We can fit a simple interactive model like this by entering a product term:
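A sketch using the variable names in figure 5.1:

generate femXorder = female * pol_order   // the product term
sem (support <- pol_order female femXorder)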
Let us take a second look at the multiple indicators, multiple causes model
that was presented in chapter 3. We were interested in how important a series
of factors was in producing a quality home learning environment and how the
quality of the home learning environment influenced the academic
achievement, educational aspirations, and teacher ratings of children. We
might think that the gender of the child is an important moderator. Perhaps
mother’s education is more critical for daughters than it is for sons. Perhaps
parental monitoring is more important for sons, but parental support is more
important for daughters. Could the number of siblings have a different effect
on the quality of the learning environment for sons than it does for daughters?
Perhaps all these effects are equal for daughters and sons. Based on the right
side of figure 3.8, we might speculate that the quality of the home learning
environment has a greater or lesser effect on academic achievement,
educational aspirations, and teacher ratings for girls than it does for boys.
Perhaps we think that it may have been true in the past that women and
men had very different emphases on different indicators of their perceived
marital satisfaction, but not so much now. We might compare a sample from
the 1980s, where this difference between women and men was stark, with a
sample from the 2010s, where the difference is smaller and perhaps no longer
significant. To use the date of the study (1980 versus 2010) and gender as
categorical variables, we would have a four-group design: 1980 women
versus 1980 men versus 2010 women versus 2010 men. Perhaps today’s
women place as much emphasis on sexual satisfaction as do their husbands;
perhaps today’s men place as much emphasis on emotional support as do
their wives.
Once you have evaluated the measurement model to show that the latent
variables have the same or very similar meanings for each of the groups, then
you can proceed to comparing the structural models. Are all of your latent
predictor variables equally salient to predicting the outcome variable?
All the examples we have used so far are purely hypothetical. You might
agree or disagree with any of the speculations we have proposed. The point
of this introduction is to give you an idea of the rich set of research questions
you can generate in your own field of study—questions that Stata’s multiple-
group modeling can help you resolve.
Our model appears in figure 5.2, including the results for the standardized
solution using the default maximum likelihood estimator. Figure 5.2 provides
the solution when both women and men are combined into one group. The
figure is arranged so that we could have a large number of indicators.2
We fit this model using the SEM Builder, but the sem command is as
follows:
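Reading the indicator lists off figure 5.2, the command would be along these lines (a sketch rather than the exact command):

    sem (Depress -> x1 x2 x3) (Gov_Resp -> x4 x9 x10 x12), ///
        group(female) ginvariant(all)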
[Figure 5.2: Standardized solution for the combined sample, with Depress measured by x1–x3 and Gov_Resp measured by x4, x9, x10, and x12. All loadings are statistically significant, and there is a small but significant correlation between Depress and Gov_Resp.]
We fit the model simultaneously in both groups and constrain all the
corresponding parameters to be equal for women and men. The estimated
parameters are equivalent to those from a single group model fit without the
group(female) and ginvariant(all) options. We include these options
here to obtain a likelihood and chi-squared statistic that can be compared with
models that we fit below. Although the model has a significant chi-squared,
both the root mean squared error of approximation (RMSEA) and comparative
fit index (CFI) are excellent. All the loadings are highly significant and in the
correct direction.3 There is a very weak but statistically significant
correlation between a person’s level of depression and his or her favoring a
more activist government.
5.3.1 Step 1: Testing for invariance comparing women and men
Now suppose we think there might be group differences between women and
men. Think about all the possible differences. The loadings on the indicator
variables of the latent variables might be different for women than they are
for men. Some of the indicator variables might be completely irrelevant to
one group. Although not likely in this example, we might even have one or
more indicators loading on a different latent variable for one of the groups.
Here we will fit a model that imposes the equivalent form on all the
relationships but does not impose any equality constraints. An equivalent
form solution has the same form with the same indicators loading on the
latent variables for each group, but it does not require the corresponding
loadings to be equal. This model places no equality constraints on the error
variances, the variances of the latent variables, or the covariance of the latent
variables. Additionally, we are not examining whether the means of the latent
variables are the same or different. Our sem command is as follows:
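Using the same indicator lists as before, a sketch of the command is

    sem (Depress -> x1 x2 x3) (Gov_Resp -> x4 x9 x10 x12), ///
        group(female) ginvariant(none) means(Depress@0 Gov_Resp@0)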
The group(female) option names the grouping variable we are using and
is unchanged. What is new? The ginvariant(none) option tells Stata to
apply the same form of the measurement model as defined by the first two
lines but to require none of the corresponding estimates to be equal. The
means(Depress@0 Gov_Resp@0) option tells Stata that the means of the latent
variables are constrained to equal 0 in both groups. We do this because we
are not yet ready to compare means. It is standard to work with the
unstandardized solution when comparing groups, so we do not ask for a
standardized solution. Also, Stata assumes for this model that women and
men may have different variances on the latent variables.
The single group model that we fit previously is nested in this model, so we can compare their fits. The difference in chi-squared between the model fit with ginvariant(all) and the model fit with ginvariant(none) (which had no constraints on any of the estimates) is 74.32, with 22 degrees of freedom. An alternative way of evaluating the significance of the difference is to use a function: display chi2tail(22, 74.32). This returns a probability of 1.36e-07, that is, p < 0.001. Thus we can assert that a model with no invariance constraints does significantly better than a model in which all the parameters are constrained to be equal for women and men.
As for the model fit with the ginvariant(none) option, at the least our
results allow us to assert that women and men have the same form of the
model. This is a much weaker assertion than if we had required equal
loadings, and without equal loadings we would weaken any additional
comparisons, such as the correlation between depression and support for an
activist government. Why? Because depression or support for government
responsibility would have a different meaning for women than it would for
men. Perhaps the government being responsible for decent housing is more
central to women’s beliefs about the role of government than it is to men’s
beliefs. In this case, it would have a significantly different loading for women
and for men.
We could have fit this multiple-group model by using the SEM Builder.
Open a new SEM Builder and draw your model as if you had a single group;
your figure should look like figure 5.2 without any results. Using the Select
tool, click on Depress. On the toolbar at the top of the window, type 0 in the
Constrain mean box. Then select Gov_Resp and constrain its mean to 0 as
well. Next click on Estimation > Estimate.... On the Group tab, check
Group analysis, and for the Group variable enter female. Under
Parameters that are equal across groups, choose None of the above.
Because we are just testing for the same form of the model, there are no
constraints on parameters across groups. Now click on OK to get the results.
There is one big difference from results for single group models. At the
center of the top bar of icons, you have a drop-down box labeled Group that
shows 0 - Man; this indicates that the results shown are for men.
You can change this to get the results for women. Figure 5.3 shows the
results for both groups. In this figure, I do not bother to indicate significance
levels or reported measures of fit.
[Figure 5.3: Unstandardized same form solution, shown separately for men and for women. The loadings are estimated freely in each group (for example, one loading is 1.09 for men versus 1.19 for women), and the covariance of Depress and Gov_Resp also differs across groups.]
Based on the previous section, we will assume that some parameters are
different for women than they are for men, though we do not know which
parameters are different. The difference could be in the loadings, the error
variances, or the covariance of the latent variables. We will begin with the
loadings.
Is the restriction that the loadings are equal for women and men likely to
work? If this is true, then we can assume the latent variables have the same
meaning for both women and men; that is, one or another of the indicators is
not more central to the meaning for women or for men. In our results above
that did not impose this restriction, the unstandardized loading for one indicator is 1.09 for men versus 1.19 for women; the loading for another indicator is 1.03 for men versus 1.35 for women; etc. Some of these
differences sound like they might be substantial, and some are remarkably
similar. For invariant loadings to be justified, we need to impose the
constraint that all the corresponding loadings are the same for women as they
are for men.
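A sketch of the command, which differs from the same form model only in the ginvariant() option:

    sem (Depress -> x1 x2 x3) (Gov_Resp -> x4 x9 x10 x12), ///
        group(female) ginvariant(mcoef) means(Depress@0 Gov_Resp@0)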
The difference between this model and the previous model is that we have required the loadings to be invariant. Each constrained loading has an [*] preceding it, indicating it is constrained to be equal for both women and men; one of the loadings, for example, is estimated as 0.69 for both groups. Although this constraint applies to each of the loadings, a separate estimate of each error variance is shown for men and for women. Additionally, there is a separate estimate of the covariance of the latent variables for each group.
This invariant loadings model is nested in our same form model, so we can compare the chi-squared values. The difference is only 5.29, which is not statistically significant.
If you had found that the equal loadings model did significantly worse
than the same form equivalence model, you would reject the equal loadings
model. You could proceed but would need to caution your readers that you
have not justified an equal loadings model, and it appears that the latent
variables have somewhat different meanings for the two groups. You can run
the postestimation command estat ginvariant, showpclass(mcoef) to see
which, if any, loadings were problematic. This command will give you a test
of significance for each constrained parameter estimate. It also provides a test
of significance for each parameter that was not constrained against the
hypothesis that it should be constrained. Let us look at what this command
reports:
If we can show support for the model with equal loadings for women and
men, then our comparisons are much more reasonable. If each of the
corresponding indicators can be constrained to have the same loading for
both women and men, then both depression and government responsibility
are assumed to mean the same thing for women and for men.
We could use the SEM Builder in the same way as was described at the
bottom of section 5.3.1 with one difference. Instead of choosing None of the
above in the Parameters that are equal across groups box, we would choose
Measurement coefficients. The figure would show that the corresponding
loadings are identical for women and men, but the covariance that was not
constrained would be different. This equal loadings model does not require
the error variances to be equal.
This test rarely supports both equal loadings and equal error variances. Most
researchers are happy to proceed once they have demonstrated equal
loadings, and some proceed even if they can only support same form
equivalence. Will we be able to demonstrate equal loadings and equal error
variances? Look at the separate variance estimates for the measurement errors
in the results above. One error variance is 0.50 for men, with a similar value for women; for e.x1, the corresponding values are 0.13 for women and 0.12 for men. Most of the
variance estimates are remarkably similar, so we may be able to demonstrate
equal loadings and equal error-variance equivalence. We test this model with
the following:
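A sketch of the command:

    sem (Depress -> x1 x2 x3) (Gov_Resp -> x4 x9 x10 x12), ///
        group(female) ginvariant(mcoef merrvar) means(Depress@0 Gov_Resp@0)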
The only change is that we now require the corresponding variances of
the measurement errors to be equal across groups by adding merrvar
(constrain measurement error variances) to our ginvariant() option. The results (not shown) indicate that, compared with our equal loadings model, we are doing slightly worse with the additional constraints of the equal error-variances model: the difference in chi-squared is statistically significant, but the RMSEA and CFI are virtually unchanged.
We could draw this model with the SEM Builder in the same way as
described at the bottom of section 5.3.1 except that in the Parameters that are
equal across groups box, we would choose Measurement coefficients and
Covariances of measurement errors.
So far, we have not tested for equal means on the latent variables, Depress
and Gov_Resp. Before doing this, some researchers add a test of equal
intercepts. When the latent means are constrained to be 0 as we have been
doing so far, the intercepts, labeled _cons, are simply the means for the
indicators. If the items have very different means, then this might reflect an
important gender difference. For example, women might have higher means
(or lower means) on the three indicators of depression. We can run tabstat
to see a comparison of these means. We use the if e(sample) restriction to
compute means with only those observations used to fit the models above.
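A sketch of the command, listing the indicators shown in the figures:

    tabstat x1 x2 x3 x4 x9 x10 x12 if e(sample), by(female)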
We can use sem to test whether the corresponding means (intercepts) are
equal for the indicators of Depress and Gov_Resp. Let us work with the
model that has the invariant loadings (mcoef) but does not impose equal
measurement error variances (merrvar). We can achieve this by adding the
intercepts, mcons, to our ginvariant() option.
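A sketch of the command:

    sem (Depress -> x1 x2 x3) (Gov_Resp -> x4 x9 x10 x12), ///
        group(female) ginvariant(mcoef mcons) means(Depress@0 Gov_Resp@0)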
The intercepts are now preceded by an [*] indicating their equality constraint; for example, one intercept is estimated as 3.23 for both groups. We can compare the model that has equal intercepts and equal loadings with the model that has equal loadings but no equal intercept constraints. The difference in chi-squared is significant, and this would lead us to reject the equal intercepts constraint. This is not especially disappointing in
our example, because we want to move to comparing the means of the latent
variables; if the loadings and intercepts were both equal across gender, then it
would be unlikely for the latent variables to have different means.
We can construct a table that summarizes our findings so far. Table 5.1
shows that all the models provide reasonable fits to the data.
We could make a case for any of the models summarized in table 5.1
because they all provide a good fit to the data. Both the RMSEA and CFI are in
the good range for all models. Usually, we would pick the most restrictive
model that still provides a good fit so that when we are comparing the means
for women and men, we have first eliminated as many differences as possible
—other than a difference of means. I would feel very comfortable with the
equal loadings model (model 2). It is not significantly worse than the
completely unconstrained model (model 1). Going this way, we can assert
that there is not a significant difference in how women and men rate the
corresponding indicators of Depress and Gov_Resp.
You might want to argue for the equal loadings and equal error-variances
model (model 3). This means that women and men weight the different
indicators equally and additionally have equal unique error variances. This
model provides a good overall fit in terms of the RMSEA and CFI values, but it
is significantly worse than the equal loadings only model (model 2). The
estat ginvariant command pointed to two error variances that were
significantly different; we might free just those two error variances. The
defense of model 3 is that its RMSEA and CFI are great and virtually as strong
as for the less restricted model.
If you are willing to accept the equal loadings and equal error-variances
model as reasonable, you likely would go even further and accept the model
that has equal loadings, error variances, variances, and covariances (model 4).
It does not do significantly worse than the less restrictive model, and its RMSEA and CFI are both excellent. The final model we fit,
model 5, does significantly worse than model 2.
Can you see the problem for readers who are not familiar with this distinction? If the unstandardized coefficients (the b's) are invariant, then the standardized coefficients (the β's) will be invariant only if the standard deviations of both variables are identical across groups. Thus when your test for invariance is successful but you show a standardized result, the standardized values will differ between groups. This is not a problem: your model is invariant, and the standardized solution simply mixes that invariant relationship with group differences in the standard deviations.
In our example, we will illustrate this to see whether women are more or
less depressed and more or less supportive of an activist government
compared with men. The reference group is the group that was scored a 0 on
the grouping variable; this makes men our reference group because our
grouping variable, female, is coded 0 for men. Here is our sem command:
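A sketch of the command; note that the means() option is gone:

    sem (Depress -> x1 x2 x3) (Gov_Resp -> x4 x9 x10 x12), ///
        group(female) ginvariant(mcoef mcons)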
We add the option ginvariant(mcoef mcons). Making the intercepts
invariant by specifying mcons in the ginvariant() option and removing the
mean(Depress@0 Gov_Resp@0) option results in Stata estimating the group
means. The mcons option forces the intercepts to be equal, and thereby any
difference in means of the indicators is reflected in the means of the latent
variables. As the reference group, men have a fixed mean of 0 on each latent
variable, and women have a different mean. The intercept, _cons, for each
indicator has an [*] indicating that the intercept is constrained to be
invariant across groups. Because we have not constrained the variances of the
measurement errors to be invariant, our model is equal loadings rather than
equal loadings and equal error variances.
To fit this model using the SEM Builder, we would add a constraint on the
measurement intercepts and remove the constraint on the latent means. Using
the Select tool, we click on Depress. In the Constrain mean box on the
toolbar, we remove the 0 constraint. Then click on Gov_Resp, and remove the
0 constraint for it as well. Now click on Estimation > Estimate.... On the
Group tab, choose Measurement coefficients and Measurement
intercepts in the box labeled Parameters that are equal across groups.
Click on OK to fit the model.
[Figure: Solution with equal loadings and equal intercepts, shown separately for men and for women; the loadings and intercepts are identical across groups. For men, the reference group, the latent means are fixed at 0 (with latent variances of 0.32 and 0.31); for women, the estimated latent means are -0.15 and -0.14 (with latent variances of 0.34 and 0.28).]
We fit two models: the first allows the covariance to be different for
women than it is for men, and the second constrains the covariance to be
equal. If the model constraining the covariance to be equal has a significantly
worse fit, then we can assert that the covariances are significantly different.
On the other hand, if the model that constrains the covariance to be equal is
not a significantly worse fit, then we assert that the covariances are not
significantly different. We test this by using the unstandardized model with a
covariance but then replay the standardized results that convert the
covariance into a correlation.
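In sketch form (the constraint label c1 is arbitrary; because it carries no group prefix, it applies to both groups):

    estimates store covfree             // the model just fit, covariance free
    sem (Depress -> x1 x2 x3) (Gov_Resp -> x4 x9 x10 x12), ///
        group(female) ginvariant(mcoef mcons) cov(Depress*Gov_Resp@c1)
    lrtest covfree .                    // compare the two nested models
    sem, standardized                   // replay: covariance shown as a correlation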
We will explore a simple path model that compares Head Start children
who live in a foster home with Head Start children who do not live in a foster
home. Colleagues of mine were interested in predicting problem behavior
among first graders based on three variables using a national sample
(U.S. Department of Health and Human Services 2010). The quality of the
preschool teacher–child relationship, tcr, and the quality of the preschool
based on systematic observation, qual, are used as predictors of behavioral
problems the child had in kindergarten, bk. The first pair of predictors, tcr
and qual, were felt to lead to the level of kindergarten problem behavior. All
three variables were thought to lead to the level of first grade problem
behavior, b1.
The full study was much more complex than this, and we are using only a
small subset of the data. The results we report will differ from those in the
larger and more complex study based on the entire sample. We are just using
this small subsample of data to illustrate how to compare path models across
groups. Do not draw any substantive conclusions from the results we report
here. Our subset of the data is in the multgrp_path.dta dataset. We have a
grouping variable, grp, that is coded 0 for Head Start children who do not
live with a foster parent and 1 for Head Start children who live with a foster
parent. Figure 5.6 shows our model, where qual is quality of preschool, tcr
is preschool teacher–child relationship, bk is behavioral problems in
kindergarten, and b1 is behavioral problems in first grade.
[Figure 5.6: Path model in which qual and tcr predict bk, and qual, tcr, and bk predict b1.]
We can fit this model with all the children in our subsample and ignore
whether they come from a foster home situation, but we are fundamentally
interested in possible group differences. If we simply identify our grouping
variable, grp, we obtain solutions that allow all the parameters being
estimated in our path model to be different for our two groups. This is the
same idea for the structural model as the same form equivalence was for the
measurement model. That is, we allow the path coefficients (structural parts)
and the error variances to differ across groups. Here is our command:
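In sketch form:

    sem (bk <- qual tcr) (b1 <- qual tcr bk), group(grp) cov(tcr*qual)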
The group() option identifies the grouping variable, grp, which has a
value of 0 for children who are not from foster homes and a value of 1 for
children who are from foster homes. We have asked for an unstandardized
solution (the default) because these are best for comparing groups (see
box 5.1). Adding the covariance option, cov(tcr*qual), will not change the
results of the other parameter estimates but will force Stata to include the
covariance estimate in the results (we may want to have this in the figure).
You can see that there is a separate parameter estimate for each path
depending on the group the child was in. Both groups were estimated
simultaneously with no equality constraints across the groups. For example,
the path coefficient for bk <- qual (quality of the child’s preschool to behavioral problems in kindergarten) is not significant for children who are not from foster homes; however, this same path coefficient is significant and positive for children from foster homes. It appears that the quality of a preschool is a
significant predictor of behavioral problems in kindergarten, but only if the
child is from a foster home situation. Several of the predictors have quite
different path coefficients for the two groups. At the bottom of the table, we
see a chi-squared of 0 with no degrees of freedom, meaning that our model is just-identified.
We see that we have 84 children who are not from foster homes and 91
children who are. The overall model as well as the solutions for each group
have a chi-squared of 0 because our model is just-identified within each
group—that is, we have no degrees of freedom.
We could have fit this path model by using the SEM Builder. After
drawing the model, click on Estimation > Estimate..., and then click on
Group analysis. Enter grp as the Group variable. If you want a
standardized solution, you can move to the Reporting tab and click on
Display standardized coefficients and values. Click on OK. Your results are
shown for the group scored 0, that is, not a foster child. You can change to
the foster child group by selecting it at the middle of the top bar. Figure 5.7
shows how you might present the standardized results after deleting some of
the coefficients you do not want to show.
Figure 5.7: Standardized results for children who are not foster children (n = 84) and who are foster children (n = 91)
If somebody were to replicate your study with 5,000 in one group and
5,000 in the other group, the constrained unstandardized path would be
approximately in the middle, say, around 1.1. The relative sizes of the groups have a huge effect on the constrained estimate.
We can run the command estat eqgof (results shown below) to get the R-squared values for each endogenous variable, separately by group. We have more explanatory power to predict kindergarten behavioral problems, bk, among the foster home children than among the nonfoster children. The R-squared values for b1, behavioral problems at first grade, also differ between nonfoster and foster children.
Should we run the estat ginvariant command on this solution? There is no reason to do so: our chi-squared is 0, so no change could produce a significant improvement. However, if there were
problems with model fit, we would want to run estat ginvariant to locate
the problem. It would be possible to add equality constraints to the variances
and the covariance, but we have no hypothesis about that, so we will allow
them to be different.
We could have fit the model above by using this sem command:
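A sketch, with tcr@a1 as mentioned below and the remaining label names illustrative; because the labels carry no group prefix, each path is constrained to be equal across the two groups:

    sem (bk <- tcr@a1 qual@a2) (b1 <- tcr@a3 qual@a4 bk@a5), group(grp)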
We did not need the prefix of 0: or 1:. When we do not include 0: and
1:, whatever constraints we specify—such as tcr@a1—
apply to all groups. Sometimes it is very useful to include the group
prefix as we did here. Suppose you want to compare the effect of qual
on bk with the effect of tcr on bk, assuming that both qual and tcr
were measured on the same scale. Further, while constraining the paths
to be equal in each group, we might not want to constrain them to be
equal across groups. Our command would look like this:
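    sem (0: bk <- qual@a1 tcr@a1) ///
        (1: bk <- qual@b1 tcr@b1) ///
        (b1 <- qual tcr bk), group(grp)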
Because both qual and tcr have the a1 constraint in group 0:, these
two parameter estimates are constrained to be equal. Because qual and
tcr have the b1 constraint in group 1:, these two parameter estimates
are constrained to be equal. However, there is no cross-group equality
constraint.
We need to draw our model with constraints on all the parameter estimates
we want to be the same in both groups; see figure 5.8.
[Figure 5.8: Path model with labels (a1, b1, c1, d1) attached to the structural paths so that each path is constrained to be equal in both groups.]
[Figure 5.9: Three-factor model in which the latent variables Appear (appear1–appear4) and Physical (phyab1–phyab4) predict Peerrel (peerrel1–peerrel4).]
We will use data published by Marsh and Hocevar (1985). They provided
summary data (means, standard deviations, and correlations). Stata used these
summary statistics to illustrate how to create a dataset for structural equation
modeling when you only have summary statistics and wish to perform
multiple-group analysis (StataCorp 2013). Appendix B illustrates how you
can do this when you have a single group. The dataset can be accessed by
typing
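a use command pointing at the Stata Press website. The filename below is a placeholder; substitute the name given in the Stata manual:

    use https://www.stata-press.com/data/r13/sem_mg   // placeholder filename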
We are using two groups: a sample of fourth graders and a sample of fifth
graders. We are interested in group differences in how well Physical and
Appear predict Peerrel. It might be more interesting to have other groupings
for this example to see whether there are differences between women and
men or between preadolescents and adolescents. You can think of other
groups we might want to compare. Adults in their early 20s might have a
different emphasis on appearance and physical fitness than those in their 40s,
for example, while those in their 40s might have a different emphasis than
those in their 70s. Still, the fourth and fifth graders will be sufficient to
illustrate how to do multiple-group structural equation modeling.
When analyzing summary data instead of raw data, we cannot use many
common Stata commands such as summarize or tabulate to examine the
data. We cannot use other Stata commands to assess normality. Importantly,
we cannot use the method(mlmv) option to work with missing values. We can
work with summary data based on listwise/casewise deletion. If, however, the
summary data are based on pairwise deletion, then we may encounter
estimation problems, because the correlation or covariance matrix cannot be
inverted. Because the matrix may have a different subsample for each
element, the resulting matrix may be impossible to obtain on any single
sample.
To get an idea of the data we have, we can run ssd describe. Because
Stata added notes to describe the dataset, we can also run the command
notes. When you are ready to generate your own summary data, you will
want to include a detailed set of notes so that a user of your data can know
more about the original raw data you used.
We have 385 observations and 16 variables. At the bottom of the ssd
describe results, we see there are two groups and the grouping variable is
grade. We also see that we have 134 fourth graders and 251 fifth graders.
We could go directly to fitting the model, but it is best to have a
satisfactory CFA result before moving to the full structural equation model. If
the CFA model in which the latent variables are freely correlated rather than
having structural paths does not fit the data, then the full structural equation
model is certain to fail. We would fit the CFA model using the same steps we
did at the start of this chapter to do multigroup CFA. We will not repeat this
here, but let us assume that we are satisfied with a three-factor CFA with the
four indicators of each of the latent variables as shown in figure 5.9. We also
assume that we have equal loadings across groups but have allowed the error
variances to differ. Now we are going to estimate the structural part of the
model instead of simply having the latent variables be correlated.
First, we will fit the model using two groups with the loadings and
measurement intercepts constrained to be equal across groups. This is the
default: Stata assumes you will let the variances of the error terms vary but
requires the loadings to be equal to ensure that the latent variables have the
same meaning in both groups. Our command is
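In sketch form, with four indicators per latent variable as in figure 5.9:

    sem (Appear -> appear1 appear2 appear3 appear4) ///
        (Physical -> phyab1 phyab2 phyab3 phyab4) ///
        (Peerrel -> peerrel1 peerrel2 peerrel3 peerrel4) ///
        (Peerrel <- Appear Physical), group(grade)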
From the estat gof postestimation command (results not shown), we learn the chi-squared, RMSEA, and CFI for this model. From the estat eqgof postestimation command, we learn that the R-squared for our Peerrel equation is 0.73 in the first group, fourth graders, and 0.45 in the second group, fifth graders. The difference in R-squared values suggests that more factors go into explaining peer relations among the older students than just attractiveness and physical fitness.
For the fourth graders, the sem results show that the unstandardized coefficient on one of the structural paths is 0.49, while for fifth graders it is smaller at 0.35. On the other structural path, the unstandardized coefficient is 0.28 for fourth graders compared with 0.19 for fifth graders. We might ask for a standardized solution, running sem, standardized (results not shown); the standardized coefficients are likewise larger among fourth graders than among fifth graders for both paths.
Let us rerun the model with both of the structural path coefficients
constrained across groups (results are not shown for the following set of
commands).
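In sketch form (the labels s1 and s2 are arbitrary; with no group prefix, each constrains its path to be equal across groups):

    sem (Appear -> appear1 appear2 appear3 appear4) ///
        (Physical -> phyab1 phyab2 phyab3 phyab4) ///
        (Peerrel -> peerrel1 peerrel2 peerrel3 peerrel4) ///
        (Peerrel <- Appear@s1 Physical@s2), group(grade)
    estat gof, stats(all)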
Now we need to make a decision. This model does not do much worse in
terms of the RMSEA and CFI, but in terms of chi-squared, it does significantly
worse. With an extremely large sample, a chi-squared will be significant even
when the inequalities are trivial, but our sample is not extremely large. Many
researchers would probably say that the equality constraints do significantly
worse and conclude that the structural paths are different for fourth graders
than they are for fifth graders.
The next question is whether both paths are different or just one of them
is different. We can explore this question by using the estat ginvariant
command. Here are our results:
The results may be a bit confusing. The columns labeled Wald Test tell
us what would happen if we constrained parameters to be equal that are
currently free to differ. For example, the variance of the error term for
peerrel1, e.peerrel1, is not constrained to be equal across groups. If we put
an equality constraint on it, var(e.peerrel1@a), the model would not fit significantly worse according to the Wald test. Although we have not put
any equality constraints on the error variances, these results suggest we might
have tried a more restrictive model that did place equality constraints on all
of them.
The last three columns are labeled Score Tests (Lagrange multiplier
tests). These indicate how much our chi-squared test would be reduced if we
removed an equality constraint. You will remember that Stata, by default, put
an equality constraint on each of the loadings. Removing the constraint on
any individual loading would not result in a significant improvement of chi-squared; that is, none of the score tests approaches significance.
Figure 5.10 is one way we could present our results. To simplify the
figure, I leave off the error variances. I do include the unstandardized
loadings which, by default, are equal for both groups. We report both the
unstandardized and standardized structural paths for the two groups. It would
appear that this model fits the data at least adequately. Appearance and
physical attractiveness are both significant predictors of peer relations, and
this is true for both fourth and fifth graders.
Both predictors are somewhat stronger for fourth graders than they are for fifth graders, as the standardized effects show. Although the coefficients are only slightly weaker for fifth graders, the model fits significantly worse when the unstandardized coefficients are constrained to be equal. The difference in the effects is reflected in a larger R-squared for fourth graders (0.73) compared with fifth graders (0.45).
[Figure 5.10: Final two-group model, reporting the unstandardized loadings (equal for both groups by default) along with the unstandardized and standardized structural paths for fourth and fifth graders.]
Each of these exercises involves some type of comparison of fourth and fifth
grade children. You should interpret the results you obtain for each exercise.
a. Fit the model with all paths allowing for group differences where
socioeconomic status is your grouping variable.
b. Fit the model with all structural paths constrained to be equal.
Compare with your results from part a of this exercise.
. The interaction femXorder is the product of female and pol_order. When I generate an interaction as a
product, I include a capital X in the middle to help me easily see that the variable is an interaction term.
. When you have a large number of indicators, placing the boxes for the indicators in a vertical arrangement
works much better than placing them in a horizontal arrangement.
. We did not reverse-code x2, so it should have a negative loading.
Chapter 6
Epilogue–what now?
You have seen several specific applications of structural equation modeling.
These examples were selected to represent as broad a range of applications as
possible. You may find an example that is exactly what works for your own
research, or you may need to mix features of several of these examples to
meet your needs. These techniques greatly expand the types of questions we
can address in our research; as you worked through the examples, you
probably thought of several studies you could do yourself.
I designed this book to meet the needs of two types of researchers. The
primary audience is people who are new to structural equation modeling and
who will read this book and work their way through the examples and
exercises to facilitate learning. The secondary audience is people who already
know structural equation modeling and just want to see how to estimate
models using Stata. Whichever type of reader you are, keep this book close at
hand and use it as a reference.
One important extension is the generalized SEM command, gsem, which handles generalized response variables. Although these capabilities are beyond the scope of this book, they are covered in the Structural Equation Modeling Reference Manual, and numerous examples are shown there. Depending on your experience with these estimators outside of SEM, this book should have prepared you to explore their applications within SEM. These estimators allow extensions of SEM to cover
models where the outcome variables may be binary (logit, probit,
cloglog), categorical (mlogit), ordinal (ologit, oprobit, ocloglog), count
(poisson, nbreg), or positive continuous but skewed (gamma).
[Figure: A generalized SEM in which male, adopted, and momed predict the binary outcome college, using the Bernoulli family with a logit link.]
You could fit the model using gsem with a syntax similar to sem syntax
but specifying the family and link.
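Reading the figure as male, adopted, and momed predicting the binary outcome college, a sketch would be

    gsem (college <- male adopted momed, family(bernoulli) link(logit))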
In this appendix, you will learn how to use the SEM Builder based on
Stata 13. It is good to work through this appendix before trying to use the
Builder yourself. Though we used the Builder in the text, I did not present a
systematic introduction to its capabilities—that is the purpose of this
appendix. If you work through this appendix, you will be able to create all the
figures that appear in this book.
As you work through the following pages, do not worry if you do not
understand the statistical information presented here; it is fully explained in
the main text. Here I make no attempt to explain the statistical information.
The best way to learn how to draw a model with the SEM Builder is to use
the SEM Builder, so open Stata and let us get started.1 We will use a dataset that is an example from Stata’s Structural Equation Modeling Reference Manual. You can open this dataset by typing
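a use command pointing at the Stata Press website. The filename below is a placeholder; substitute the name given in the manual:

    use https://www.stata-press.com/data/r13/sem_sm2   // placeholder filename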
These data are in summary format rather than being raw data.2 We can
run the command ssd describe to obtain a description of the data:
Open up the SEM Builder in Stata. You can do this either by typing sembuilder in the Command window or by using the menu system Statistics > SEM (structural equation modeling) > Model building and estimation.
For Windows and Unix, the SEM Builder has a bar across the top with
straightforward names for each menu (File, Edit, Object, Estimation,
Settings, View, and Help); you can see this bar at the top of figure A.2.
Regardless of your operating system, take a minute to experiment with the
menus to get an idea of the many special features that are available. For
example, in the Object drop-down menu, there is an option to Align objects.
After you draw parts of your model, you can highlight them and then use the
alignment feature to make them line up properly.
The View drop-down menu has options for zooming in or out. This helps
when you are working on a complex model where fitting everything in the
right place is challenging. You can also expand your window for the SEM
Builder to fill your screen and then click on the Fit in Window button. I find
this extremely useful because it is easier to construct a model when you have
the largest possible workspace for doing this. You can also Show Grid lines
here. These will not appear when you print your graph, but they are
extremely helpful when you are trying to arrange objects. When you fit a
model in the unstandardized metric but want to see standardized results, you
simply click on View > Standardized Estimates—you often want these.
On the left side of the Builder, you will see drawing tools (see figure A.1;
these are the same for Windows, Unix, and Mac). From top to bottom, these
are the Change to generalized SEM button, the Select tool, the Add
Observed Variable tool, the Add Generalized Response Variable tool, the
Add Latent Variable tool, the Add Multilevel Latent Variable tool, the
Add Path tool, the Add Covariance tool, the Add Measurement
Component tool, the Add Observed Variables Set tool, the Add Latent
Variables Set tool, the Add Regression Component tool, the Add Text
tool, and the Add Area tool. The names of the tools are fairly self-
explanatory as to what they do. The Select tool is used to click or double-
click on objects that you want to modify. Sometimes the Add Measurement
Component tool is the easiest way to create a latent variable that has
multiple indicators. I described how to use this tool in the appendix to
chapter 1. The Change to generalized SEM button, the Add Generalized
Response Variable tool, and the Add Multilevel Latent Variable tool are
used for fitting multilevel SEMs or SEMs with generalized-response variables,
which are beyond the scope of this book.
Figure A.1: The drawing toolbar
The objects in this model are labeled with letters. Notice that figure A.2 does not have an observed exogenous variable (E). An example of such a variable might be gender4 if you wanted to add a path from gender to Alien67 and from gender to Alien71. Selected variables have two attached letters. If we wanted to make a change across all latent variables, such as increasing the size of the ovals to accommodate longer names, we could easily refer to all of them (A). However, we could make a change only to the oval around SES66 by just referring to latent exogenous variables (C). The final point to mention is that the paths from Alien67 to pwless67 and from Alien71 to pwless71 are constrained to be equal.
Figure A.2: A model built in the SEM Builder with types of objects
labeled
Until you are very familiar with the SEM Builder, I suggest that you first
draw a freehand sketch of your model. This way, you have the
opportunity to decide the best way to arrange the variables. Usually we
work from the left to the right. Perhaps you want to have SES66 farther
to the left than it is in our final model. You also might like to flip the
figure so that SES66 is above Alien66 and Alien67. You also need to
decide how close things are to each other and where you want error
terms located. I suggest you sketch this out with paper and pencil first
because it is easier to draw another quick sketch than it is to move
objects around in the SEM Builder.
A.4 Drawing an SEM model
If you have not already done so, open Stata and open the dataset with the
following command:
Also open the SEM Builder if you have not yet. You can do this either by
running the command sembuilder or by using the drop-down menu
Statistics > SEM (structural equation modeling) > Model building and estimation.
change the Zoom percentage to make your workspace for the figure as large
as possible.
We are now ready to add the variables to our model. It would be easiest
to add each latent variable and its corresponding indicators using the Add
Measurement Component tool. However, to demonstrate some features of
the SEM Builder, we will instead add each component of this model to the
diagram separately.
To begin, click on the Add Latent Variable tool (the oval) in the
drawing toolbar and then click on your workspace where you want SES66;
repeat this for Alien67 and Alien71. Click on the Select tool at the top of the
drawing toolbar, and click on the oval you want to label Alien67. When you
click on a latent variable oval, the SEM Builder places a blue box around it,
and a toolbar appears just above the drawing area. In the new toolbar, type in
the box next to Name the name of the latent variable, Alien67. Repeat this
procedure to name each of the latent variables. At this point, you have
something similar to figure A.4.5
Figure A.4: Laying out our latent variables
Now you can move the ovals around to where you want them. Make sure
that there is good alignment for Alien67 and Alien71: Click on the Select
tool, and then click and drag over these two latent variables. Once they are
both highlighted, choose the Object > Align menu item and click on the Horizontal Bottom option.
Now we want to add rectangles for our observed variables. Click on the
Add Observed Variable tool (the rectangle) in the drawing toolbar. Just like
with the latent variable ovals, you now click on the diagram to create a
rectangle for each of our six observed variables (two per latent variable). You
can move the rectangles around and use Object > Align to get correct
alignment. Make sure you leave enough room for the error terms; this is
especially important for the indicators of Alien67 and Alien71 because you
will need room for the curved lines representing correlated error terms.
When you click on the Select tool and then click on one of the rectangles,
you can choose a variable name from the Variable drop-down menu that
becomes available in the top toolbar. You can resize the rectangles if your
actual variable names are too long to fit, but you may prefer to rename your
variables to have just eight or fewer characters before starting to use the SEM
Builder. Your model may look something like figure A.5 now.
We are now ready to put in the paths. Click on the Add Path tool (the
thin arrow pointing to the right) in the drawing toolbar. Click near the source
and drag to the destination to draw the path. The objects are green at both
ends of the path when you have made the connection.
When you insert a path, Stata inserts the corresponding error terms. The
two observed endogenous variables, educ66 and occstat66, have their error
terms, but notice there is no error term for SES66. This is an exogenous latent
variable, and so we are not trying to explain it—it does not need an error
term. Do not worry if the error terms are in a different order than what you
see in figures A.2 and A.3. The subscripts on the error terms will vary
depending on the order in which you input the variables. This does not
matter. Your model may now look like figure A.6.
[Figure A.6: The model with the paths drawn in; Stata has added error terms for Alien67, Alien71, educ66, occstat66, and the indicators.]
So you think this looks ugly. You really do not like the location of some
of the error terms. You might also like to have the arrows from the latent
variables go to the centers of the observed variables.
Click on the Select tool, and then click on educ66. On the toolbar that
appears just above the figure, click on the button to bring the error term to
a 6 o’clock position below educ66. You might want to make similar changes
for other error terms that are not nicely located in figure A.6. This does what
we need, but sometimes you may prefer to fine-tune the location of the error
terms by clicking on them and dragging them to where you want them.
Next click on any of the paths, and notice the two small circles that
appear at the ends of the path. Click on one of these circles and drag the path
end to where it belongs, and then click on the other circle and do the same.
You can do this to fine-tune where the direct paths intersect the objects.
Let us also move the SES66 latent variable and its indicators up and to the
right a bit. Using the Select tool, click and drag over SES66 and its indicators,
including the error terms, to highlight the block of variables. You can now
click and drag to move all the highlighted objects as a block.
Sometimes we need to correlate error terms. Let us correlate the error
terms for anomia67 and anomia71 and the error terms for pwless67 and
pwless71. Click on the Add Covariance tool (the arched arrow) on the
drawing toolbar, and draw in the lines. When you do this, you may find that
the curves are quite pronounced; perhaps they even extend above the top of
your figure. Click on a curve to see a light blue line with a circle at the top
appear at one end of the curve. Click on the circle and drag to reshape your
curve. After you add the correlated error paths, your model looks like
figure A.7.
[Figure A.7: The model with the two correlated-error curves added and the paths from Alien67 to pwless67 and from Alien71 to pwless71 both labeled equal.]
This looks a lot better. For reasons explained in the main text, we also
might want to put equality constraints on the path from Alien67 to pwless67
and Alien71 to pwless71. We do this by assigning the same label to each of
these paths. In figure A.7, I used the label equal for both. To do this, click on
the Select tool, and then click on one of the paths we want to constrain. In the
top toolbar, a box will appear where you will type equal. Then click on
the other path to constrain and do the same. Because both paths have the
same attached label, Stata will constrain them to be equal for the
unstandardized solution.
We are done! Let us copy this figure into a Word document. First, resize
the canvas so that no more than half an inch of space is between the edge of
the figure and the edge of the canvas on all sides. The easiest way to
accomplish this is to Select the entire drawing and move it to the lower left
corner of your workspace; now when you resize the canvas, everything will
fit nicely.
Once the canvas is the right size, again Select your entire figure. You can
use Ctrl c to copy the figure. Move to a Word document, and use Ctrl v to
paste the figure into the document. You can also save your figure as a
structural equation model path diagram (.stsem). In this format, you can
open it whenever you want to use it again. You should save in this format
anytime you create a model you may want to use later. You may also save it
as any of several other formats, including as a portable document file (.pdf),
a picture (.jpg), or an encapsulated postscript file (.eps).
Let us fit this model. Click on the Estimate button on the top toolbar,
and then click on OK in the dialog box that opens. Stata will automatically
select the first observed indicator variable for each latent variable as the
reference indicator and will make its unstandardized loading fixed at 1.0.6
The results for our unstandardized model are shown in figure A.8.
[Figure A.8: Unstandardized results for the model.]
We can verify that Stata fixed the first indicator of each latent variable as
a reference indicator, fixing its unstandardized loading at 1.0. And we can
also see that because we constrained the loadings from Alien67 to pwless67 and from Alien71 to pwless71 to be equal, they are both showing the same value, 0.95.
Next click on Settings > Variables > All Observed..., and select the tab
marked Results. We do not have any exogenous observed variables, but we
do have six endogenous observed variables. Because we do not want the
intercepts reported for these variables, click on Intercept and change it to
None. Next click on Settings > Variables > All Latent..., and select the
Results tab. Here for both exogenous and endogenous variables, change what
is reported to None. Finally, click on Settings > Variables > All Error...,
and select the Results tab. Change the reported results to None.
With our preferences implemented, our figure now looks like figure A.9.
[Figure A.9: The simplified diagram, with intercepts, means, and variance estimates suppressed so that only the paths, loadings, and correlated-error covariances are shown.]
Stata has options for what it reports along with each of the paths (z values, p-values, confidence intervals). There is rarely room for this much information in a figure. It is typical to report one asterisk for p < 0.05, two asterisks for p < 0.01, and three asterisks for p < 0.001. To add this
reporting in the SEM Builder, we need to Add Text. Click on the Add Text
tool on the drawing toolbar, click on the figure about where you want the
asterisks, enter *** in the dialog box that opens, and then click on OK. You
can always move these text boxes around so that they are properly located.
Figure A.3 has asterisks that were added in this way.
Our model fits the data very well, but if you want modification indices,
click on Estimation > Tests and CIs > Modification indices. This can help
us diagnose problems in a poorly fitting model. In many models, we want
indirect effects. The only indirect effect we might want to estimate in our model is the effect of SES66 on Alien71 through Alien67. To compute this indirect effect, click on Estimation > Tests and CIs > Direct and indirect effects. Check
Do not display effects with no paths, Report standardized effects, Do not
display direct effects (we already have them), and Suppress blank lines. This
postestimation command reports the standardized indirect effect of SES66 on Alien71; the unstandardized indirect effect is statistically significant. You will find several other postestimation commands
in the Estimation menu as well.
You may want to add a block of text to the figure that includes
information such as the chi-squared value, its level of significance, etc.
Suppose you want to add a block of text like the hypothetical one in
figure A.10.
Figure A.10: Text block with Greek letters
Click on the Add Text tool, and then click where you want the text to
start. In the dialog box that opens, enter the following:
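As an illustration (the numbers are invented), a line such as the following uses {&chi} for the Greek letter chi and {sup:2} for a superscript 2:

    {&chi}{sup:2}(5) = 71.47, p < .001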
If you cannot remember these ways to enter special characters, you can
always use Stata’s help system. You access this help menu by typing help
sg__tags (notice the two underscores between sg and tags).
The best way to develop skill using the SEM Builder is simply to use it to
draw figures of varying complexity. You cannot hurt Stata by making a
mistake, and restoring the original settings ensures you can always get a fresh
start.
. Use caution as you learn how to use the SEM Builder because you will likely change some of the defaults.
These changes stick in the SEM Builder and may not be what you want in a subsequent model you are
drawing. To restore Stata defaults for the SEM Builder, click on Settings > Settings Defaults when you
open your next SEM Builder window before drawing your next diagram. If you use a Mac, be sure to read
box 1.A in the appendix to chapter 1 to learn where to access your Settings menu; information on the
differences in operating systems is also presented in this appendix section A.2.
. How to enter data in summary format is described in appendix B.
. I make only one exception to this rule: when I have interaction terms, I use a capital X between the
component names; for example, I might use the name genXinc to refer to the interaction of the variables
gender and income.
. Notice that gender is all lowercase because it is an observed rather than a latent variable.
. Figure A.4 shows the Properties window. To view this window, click on View > Show Property Sheet, and to hide it click on View > Hide Property Sheet. When drawing more complex models, you may want
to hide this window to maximize the space you have for your drawing.
. Our example does not have any observed exogenous variables; however, if we have two or more of them,
we do not need to draw in the correlations among them because Stata assumes they will be correlated.
Appendix B
Entering data from summary statistics
Many articles report enough information for you to analyze their data. Path
analysis and standard structural equation models only need a covariance
matrix or a correlation matrix along with the standard deviations. Some
models, such as multiple-group models or growth curve models, also need the
means. Given just these summary statistics, Stata can fit the same model and
get the same results for all the parameter estimates, standard errors, and
measures of goodness of fit. You can use the summary statistics to modify
the original model in ways you feel are more appropriate, such as by adding a
path, correlating a pair of error terms, or creating a nonrecursive model. This
capability is extremely useful when you want to reanalyze published results.
It also might save some time if you had an unusually large dataset, because
Stata does not need to read and analyze the large raw dataset.
Be warned, however, that the results produced by sem are only as good as
the summary statistics you enter. Many journals, such as those following
American Psychological Association guidelines, only report two decimal
places. This arbitrary restriction is unfortunate because it gives you less
precision in your results than would be possible with more decimal places.
In this example, we will use hypothetical data. Suppose you read a study
of the executive functioning of 6-year-old children. Children who have strong
working memories, can exercise inhibitory control (wait for a better but less
immediate advantage), and have good cognitive skills are said to have strong
executive functioning. This strong executive functioning leads to better
scores on math achievement. Suppose the study combines three items to
measure each of these concepts to obtain a single score on a nine-item
executive function scale. The study also uses math achievement when the
child is 8 years old to show that executive functioning predicts academic
performance. The original model is shown in figure B.1.
After reading this article, you decide that it is better to estimate the
individual effects of the three components of executive functioning—
working memory, inhibitory control, and cognitive skills—rather than
treating the three as a single scale. You think that treating these as three
separate variables will do a better job of predicting math performance. You
believe that each of these variables may have differential importance in
predicting math achievement.
[Figure B.1: The original model, in which the nine items c1–c3, m1–m3, and i1–i3 load on a single ExecFunction latent variable that predicts math.]
You may have noticed that ExecFunction is a long name for a variable,
and it does not fit into the default width of the latent variable oval. Use the
Select tool to click on the ExecFunction oval. On the top toolbar, click on
Properties... to open the Variable properties dialog box. In the Appearance
tab, click on Customize appearance for selected variables and then Set
custom appearance. In the menu that opens, change the size of the oval from
0.65 inch to 0.75 inch.
Your model, which is shown in figure B.2, uses the three components of
executive functions separately to predict math achievement.
[Figure B.2: The revised model, in which Cognitive (c1–c3), WrkMem (m1–m3), and IhibCtrl (i1–i3) each predict math.]
We use the ssd command (summary statistics data) to enter the original
reported summary statistics. First, we use ssd init c1 c2 c3 m1 m2 m3 i1
i2 i3 math to provide names for our variables. Next, we use ssd set
observations 500 to set the number of observations in the dataset (here we
assume there were 500 children in the study). We next enter the
standard deviations for each variable with the ssd set sd command. If you
want to just do a standardized analysis or you do not know the standard
deviations, you could just enter 1.0 as the standard deviation for each
variable, which would make all your results standardized. We enter the
correlation matrix with ssd set correlations, supplying everything on and
below the diagonal. This looks nicer if we temporarily change the delimiter
from a carriage return to a semicolon; that way, we do not need to put
three slashes (///) at the end of each line. We change the delimiter back
to a carriage return after entering the matrix.
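Here is a sketch of the full data-entry sequence in a do-file. The standard deviations and correlation values below are placeholders for illustration, not values from any actual study; in practice, you would type the numbers reported in the article:

    * placeholder values -- replace with the article's reported statistics
    clear
    ssd init c1 c2 c3 m1 m2 m3 i1 i2 i3 math
    ssd set observations 500
    ssd set sd 1 1 1 1 1 1 1 1 1 1    // placeholder: 1s give standardized results
    #delimit ;
    ssd set correlations
        1.00 \
         .56 1.00 \
         .55 .57 1.00 \
         .34 .33 .35 1.00 \
         .33 .34 .32  .59 1.00 \
         .35 .32 .34  .58 .60 1.00 \
         .31 .30 .32  .36 .35 .34 1.00 \
         .30 .32 .31  .35 .34 .36  .57 1.00 \
         .32 .31 .30  .34 .36 .35  .58 .56 1.00 \
         .30 .31 .29  .33 .32 .34  .31 .30 .32 1.00 ;
    #delimit cr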
The results are reported as follows:
The means are not set because we did not enter any. We do not need the
means to produce an equivalent dataset for our analysis, but we would for
other analyses. At this point, it is wise to save this dataset; we would not want
to have to reenter all this information. We save it, reopen it, and then run ssd
list to see what is in the dataset.
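In code, the save-and-check step might look like this; ssdmath.dta is the dataset name used for this appendix:

    save ssdmath, replace   // save the summary statistics dataset
    use ssdmath, clear      // reopen it
    ssd list                // display the stored summary statistics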
Everything looks fine, so we are ready to fit our structural equation
models. First, we reproduce the model in the original article:
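Here is a sketch of that command, using the variable names from figure B.1; the standardized option requests standardized estimates:

    * one executive-functioning factor with nine indicators,
    * predicting math achievement at age 8
    sem (ExecFunction -> c1 c2 c3 m1 m2 m3 i1 i2 i3) ///
        (math <- ExecFunction), standardized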
These are standardized results, so we get a standardized loading for each
indicator of the single construct, executive functioning. The cognitive items
have loadings of 0.55–0.57; the memory items have loadings of 0.61–0.63;
and each of the inhibitory control items has a loading of 0.61. All of this
seems reasonable.
What about our model? Remember that we kept the three dimensions of
executive functioning separate. Here is our set of commands:
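A sketch of the specification, using the latent variable names from figure B.2; sem correlates the three exogenous latent variables by default:

    * three components of executive functioning, each predicting math
    sem (Cognitive -> c1 c2 c3)  ///
        (WrkMem -> m1 m2 m3)     ///
        (IhibCtrl -> i1 i2 i3)   ///
        (math <- Cognitive WrkMem IhibCtrl), standardized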
Have we improved on the original publication? First, we can examine the
loadings. We see that the three indicators of cognitive ability have loadings
between 0.70 and 0.78, the three indicators of working memory have
loadings between 0.72 and 0.84, and the three indicators of inhibitory control
have loadings between 0.72 and 0.81. This is impressive!
There are some limitations to keep in mind when you work with summary statistics data:
. Many authors do not report the necessary summary statistics, and some journals may even discourage reporting them to save journal space. When the summary statistics are not published, you can sometimes request them from the authors.
. When the appropriate summary statistics are reported, they often are given with just two or three decimal places, so when you try to reproduce what the authors did, you may get slightly different results.
. We are using structural equation modeling, but many published studies relied on ordinary least-squares regression, which cannot isolate the estimated measurement error from the structural model. Our estimates can therefore differ from those in the original article.
. In analyzing the summary statistics, we are acting as if there were no missing values. The authors of the original study may have reported a listwise correlation matrix or a pairwise correlation matrix, and both of these are poor ways of handling missing values.
Author index
B
Bentler, P. M., 1.9, 2.9.4
Bollen, K. A., 2.2
Brown, T. A., 3.5.2
C
Campbell, D. T., 2.2
Cook, T. D., 2.2
Costello, A. B., 1.3
Cunningham, W. A., 1.11
D
Duncan, O. D., 2.9.1
F
Fabrigar, L. R., 1.3
G
Graham, J. W., 4.6.2
Grimm, K. J., 4.7
H
Haller, A. O., 2.9.1
Halpern, J. Y., 2.2
Hamagami, F., 4.7
Hedström, P., 2.2
Hocevar, D., 5.5
Hu, L., 1.9
K
Kline, P., 1.2
L
Little, T. D., 1.11
M
MacCallum, R. C., 1.3
Marsh, H. W., 5.5
McClelland, M. M., 2.3
Medeiros, R., 2.4.1
Mitchell, M. N., 4.4.2
Muthén, B., 3.2
O
Osborne, J. W., 1.3
P
Pearl, J., 2.2
Piccinin, A., 2.3
Portes, A., 2.9.1
R
Ram, N., 4.7
Raykov, T., 2.9.4, 3.3
Rhea, S. A., 2.3
S
Schafer, J. L., 4.6.2
Shadish, W. R., 2.2
Shahar, G., 1.11
Stallings, M. C., 2.3
StataCorp, 5.5
Strahan, E. J., 1.3
Summers, G., 3.2
Swedberg, R., 2.2
U
U.S. Department of Health and Human Services, 5.4
W
Wegener, D. T., 1.3
Wheaton, B., 3.2
Widaman, K. F., 1.11
Subject index
#delimit command, B
A
alpha, 1.2
alpha command, 1.4
alpha formula, 1.2
asymptotically distribution free estimator, 1.7
auxiliary variables, 2.5
B
bmiworking.dta, 4.4, 4.4.4
C
casewise deletion, 1.5
center command, 4.5
centering, 4.5
CFA, see confirmatory factor analysis
CFI, see comparative fit index
chi2tail() option, 4.4.5
clonevar command, 1.6
codebook, compact command, 1.3
comparative fit index, 1.9
comparisons of models, 5.3.5
composite latent variables, 3.2, 3.5, 3.5.2
confirmatory factor analysis
advantages, 1.7
estimation of two-factor model, 1.10.2
interpreting results, 1.8, 1.9.2
sem command, 1.7
constraints, 2.9.3
equality, 3.3
fixed error variance, 3.5.1
correlated errors, 2.4, 2.4.2, 3.2.3
growth curves, 4.4.5
spurious effects, 3.2.3
correlated residuals, 2.4, 2.4.2
correlation comparisons, 5.3.7
covariance comparisons, 5.3.7
covariates, 2.4.2
cross-lagged panel design, 2.7
cumulative disadvantage, 4.2
D
data
entering correlations, means, and standard deviations, B
entering summary statistics, B
degrees of freedom, 1.9, 1.10.1, 1.10.2, 2.3, 2.4, 2.4.1, 2.4.2, 4.3.1, 4.4.4, 5.4
direct effects, 2.2, 2.2.2
testing, 2.4.1
display chi2tail() command, 3.4, 4.4.5
display command, 1.3
drawing, A
E
effect size, 5.3.6
effects
direct, 2.4.1
indirect, 2.4.1
total, 2.4.1
egen command, 1.4
eigenvalue, 1.3
endogeneity, 2.2
endogenous mediator variables, 2.2.1
endogenous outcome variables, 2.2.1
equal loadings
multiple-group comparison, 5.3.2
equality constraints, 2.9.3, 3.3
SEM Builder, 3.4
error terms
changing order, 3.2.1
correlated, 2.4, 2.4.2
equality constraints, 3.4
naming, 3.2.3
normally distributed, 1.6
estat
eqgof postestimation command, 2.3
framework postestimation command, 1.9
ggof postestimation command, 5.4
ginvariant postestimation command, 5.3.2, 5.3.3, 5.4.1
gof postestimation command, 1.9, 1.9.2
icc postestimation command, 4.4.3
mindices postestimation command, 1.9.1
stable postestimation command, 2.9.2
stdize: postestimation command, 2.4.1, 2.6, 2.7
teffects postestimation command, 2.4.1, 3.2.4
estimates store command, 3.4, 5.3.2
estimation
asymptotically distribution free, 1.7
maximum likelihood, 1.7
methods, 1.7
exogenous variables, 2.2.1
covariance structure, 2.4.2
explained variance, 2.3
F
factor, pcf command, 1.3
factor score, 1.5
fixed effects, 4.1
fixed error variance, 3.5.1
formative construct, 3.5
formative indicators, 3.5, 3.5.2
formative measurement model, 3.5, 3.5.2
G
ginvariant() option, 5.3.1
goodness of fit
CFI, 1.9
growth curves, 4.4.4
RMSEA, 1.9
SRMR, 1.9
graphs
individual trajectories, 4.4.2
moderation effects, 2.8
group comparisons, 5, 5.6
as an additive effect, 5
equal loadings, 5.3.2
equivalent form, 5.3.1
full SEM model, 5.2.3 , 5.2.3
ginvariant() option, 5.3.1
group means, 5.3.6
invariant loadings, 5.3.2
means, 5.3.1, 5.3.6
measurement model, 5.3, 5.3.7
model fit, 5.4.1
path analysis, 5.4, 5.4.4
reference group, 5.3.6
R-squared, 5.4.1
unequal R-squareds, 5.4.1
variances and covariances, 5.3.7
group() option, 5.3.1, 5.4
growth curves
correlated errors, 4.4.4, 4.4.5
equal error variances, 4.7
estimation, 4.4.4
fixed effect, 4.1
goodness of fit, 4.4.4
identification, 4.3.1, 4.3.2
individual trajectories, 4.4.2
interpreting intercept and slope with covariates, 4.5.1
interpreting mean intercept and slope, 4.4.4
modification indices, 4.4.5
purpose of, 4.1
quadratic, 4.4.6
random effects, 4.4.7
random intercept, 4.1, 4.4.7
random slope, 4.1, 4.4.7
time-invariant covariates, 4.6.2
time-varying covariates, 4.6, 4.6.2
trajectory, 4.1
H
histogram command, 1.4
I
ICC, see intraclass correlation
identification, 1.9, 1.10.2
composite latent variable, 3.5.1
formative indicators, 3.5.1
growth curves, 4.3, 4.3.2
intercept, 4.2
structural equation model, 3.2.1
indirect effects, 2.2.2, 3.2.4
testing, 2.4.1
interaction, 2.8
traditional approach, 5.1
intercept
interpreting with covariates, 4.5.1
intercept constraints, 5.3.4
internal consistency, 1.2
intraclass correlation, 4.4.3
invariance
equal loadings, 5.3.2
measurement model, 3.3
L
latent variable, 1.6
reliability of measurement scale, 1.8
variance of, 1.8
likelihood-ratio test, 5.3.2
listwise deletion, 1.5
loadings, 1.3
long format, 4.4.2
lrtest command, 3.4, 5.3.2
M
MAR, see missing at random
maximum likelihood estimation, 1.7
MCAR, see missing completely at random
means
group comparisons, 5.3.6
means() option, 4.4.4, 5.3.1
measurement invariance, 3.3
measurement models
and group comparisons, 5.2.2
mediation, 2.2.1, 2.2.2
method(adf) option, 1.7
method(mlmv) option, 1.7
methods factor, 1.12
MIMIC model, see multiple indicators, multiple causes models
missing at random, 1.7, 2.5, 4.6.2
missing completely at random, 2.5, 4.6.2
missing values, 1.4, 1.5, 1.10.1, 4.4.4
auxiliary variables, 2.5
missing at random, 2.5
missing completely at random, 2.5
method(mlmv) option, 1.7
mixed command, 4.7
model comparison, 5.3.5
nested models, 4.4.5, 4.4.6
model fit
estat gof command, 1.9, 1.9.2
models
constraints, 2.9.3, 2.9.4
cross-lagged panel design, 2.7
interaction, 2.8
MIMIC, 3.5.2
moderation, 2.8
nonrecursive, 2.9, 2.9.4
reciprocal relations, 2.9
moderation, 2.8
traditional approach, 5.1
modification indices, 1.9, 1.9.1, 1.9.2, 3.2.2, 4.4.5
multgrp_cfa.dta, 5.3
multgrp_path.dta, 5.4
multiple groups
ginvariant(none) option, 5.3.1
invariant loadings, 5.3.2
multiple indicators, multiple causes models, 3.5.2, 5.2.1
multiple-group analysis, 5, 5.6
measurement model, 5.3, 5.3.7
multiple-group comparisons
equal loadings, 5.3.2
equivalent form, 5.3.1
path analysis, 5.4, 5.4.4
N
National Longitudinal Survey of Youth, 1997, 1.3, 4.4, 5.3
nlsy97cfa.dta, 1.3
noconstant option, 4.4.4
nodirect option, 3.2.4
nonrecursive models, 2.9, 2.9.4
O
ordering error terms, 3.2.1
P
parceling, 1.11
path analysis, 2, 2.1
multiple-group comparisons, 5.4, 5.4.4
postestimation commands
estat eqgof, 2.3
estat framework, 1.9
estat ggof, 5.4
estat ginvariant, 5.3.2, 5.3.3, 5.4.1
estat icc, 4.4.3
estat mindices, 1.9.1
estat stable, 2.9.2
estat stdize:, 2.6, 2.7
estat teffects, 2.4.1, 3.2.4
predict, 1.5, 2.8
predict postestimation command, 1.5, 2.8
principal component factor analysis, 1.3
Q
quadratic graph, 4.4.7
quadratic growth curve
identification, 4.3.2
R
random effects, 4.1
random intercept, 4.1
random slope, 4.1, 4.4.2
recursive model, 3.2.1
reference group
group comparisons, 5.3.6
reference indicator, 1.8
selection of, 3.2.2
reflective indicators, 3.2, 3.5
reliability
scale reliability in CFA, 1.8
scale reliability with correlated errors, 1.8, 1.9.2
scale reliability without correlated errors, 1.8
reordering labels, 3.2.1
reshape command, 4.4.2
RMSEA, see root mean squared error of approximation
robust estimator, 1.7
root mean squared error of approximation, 1.9
rowmean() option, 1.4
R-squared, 2.3
S
saving estimates, 5.3.2
scale score, 1.4
SEM Builder, A
CFA model, 1.6
custom reporting, 5.3.6
equality constraints, 3.4
equality constraints on errors, 3.4
fixing error variance at 0, 3.5.1
group comparisons
same form of model, 5.3.1
sembuilder command, 1.6
sem command, 1.7, 1.8, 1.10.1
sem, coeflegend command, 2.4.1, 3.4
sem, standardized command, 1.8
sembuilder command, 1.6
set obs command, B
significance
for chi-squared, 3.4
not reported for variances, 1.9.2
slope, 4.1, 4.2
spurious effects
and correlated errors, 3.2.3
SRMR, see standardized root mean squared residual
ssd
command limitations, B
list command, B
set correlations command, B
ssdmath.dta, B
standardized coefficients, 5.3.5
standardized root mean squared residual, 1.9
standardized solution, 1.8
storing estimates, 5.3.2
structural model, 2 , 2.1
summarize command, 1.4
summary statistics
entering, B
limitations, B
T
t test, 5.3.6
tables
comparison of models, 5.3.5
mean comparison, 5.3.6
testing equality of coefficients, 2.6
tests
equality constraints, 2.9.3, 2.9.4, 3.4
indirect effects, 2.4.1
invariance, 3.4
nonlinear, 2.4.1
significance of chi-squared, 3.4
total effects, 2.4.1
total score, 1.4
trajectory
fixed effect, 4.1
random effect, 4.1
twoway command, 4.4.7
U
unequal R-squareds, 5.4.1
unidimensional, 1.1
unique variance, 1.6
unstandardized coefficients, 5.3.5
V
variance comparisons, 5.3.7
vce(bootstrap) option, 1.7
W
wide format, 4.4.4