Patrick Dattalo - Analysis of Multiple Dependent Variables
POCKET GUIDES TO
SOCIAL WORK RESEARCH METHODS
Series Editor
Tony Tripodi, DSW
Professor Emeritus, Ohio State University
Determining Sample Size: Balancing Power, Precision, and Practicality
Patrick Dattalo
Preparing Research Articles
Bruce A. Thyer
Systematic Reviews and Meta-Analysis
Julia H. Littell, Jacqueline Corcoran, and Vijayan Pillai
Historical Research
Elizabeth Ann Danto
Confirmatory Factor Analysis
Donna Harrington
Randomized Controlled Trials: Design and Implementation for Community-Based Psychosocial Interventions
Phyllis Solomon, Mary M. Cavanaugh, and Jeffrey Draine
Needs Assessment
David Royse, Michele Staton-Tindall, Karen Badger, and J. Matthew Webster
Multiple Regression with Discrete Dependent Variables
John G. Orme and Terri Combs-Orme
Developing Cross-Cultural Measurement
Thanh V. Tran
Intervention Research: Developing Social Programs
Mark W. Fraser, Jack M. Richman, Maeda J. Galinsky, and Steven H. Day
Developing and Validating Rapid Assessment Instruments
Neil Abell, David W. Springer, and Akihito Kamata
Clinical Data-Mining: Integrating Practice and Research
Irwin Epstein
Strategies to Approximate Random Sampling and Assignment
Patrick Dattalo
Analyzing Single System Design Data
William R. Nugent
Survival Analysis
Shenyang Guo
The Dissertation: From Beginning to End
Peter Lyons and Howard J. Doueck
Cross-Cultural Research
Jorge Delva, Paula Allen-Meares, and Sandra L. Momper
Secondary Data Analysis
Thomas P. Vartanian
Narrative Inquiry
Kathleen Wells
Structural Equation Modeling
Natasha K. Bowen and Shenyang Guo
Finding and Evaluating Evidence: Systematic Reviews and Evidence-Based Practice
Denise E. Bronson and Tamara S. Davis
Policy Creation and Evaluation: Understanding Welfare Reform in the United States
Richard Hoefer
Grounded Theory
Julianne S. Oktay
Systematic Synthesis of Qualitative Research
Michael Saini and Aron Shlonsky
Quasi-Experimental Research Designs
Bruce A. Thyer
Conducting Research in Juvenile and Criminal Justice Settings
Michael G. Vaughn, Carrie Pettus-Davis, and Jeffrey J. Shook
Qualitative Methods for Practice Research
Jeffrey Longhofer, Jerry Floersch, and Janet Hoy
Culturally Competent Research: Using Ethnography as a Meta-Framework
Mo Yee Lee and Amy Zaharlick
Using Complexity Theory for Research and Program Evaluation
Michael Wolf-Branigin
Analysis of Multiple Dependent Variables
Patrick Dattalo
PATRICK DATTALO
Analysis of Multiple
Dependent Variables
Oxford University Press is a department of the University of Oxford.
It furthers the University’s objective of excellence in research, scholarship,
and education by publishing worldwide.
With offices in
Argentina Austria Brazil Chile Czech Republic France Greece
Guatemala Hungary Italy Japan Poland Portugal Singapore
South Korea Switzerland Thailand Turkey Ukraine Vietnam
Oxford is a registered trademark of Oxford University Press in the UK and certain other
countries.
1 3 5 7 9 8 6 4 2
Printed in the United States of America
on acid-free paper
For Marie
Contents
References 157
Index 171
Analysis of Multiple Dependent Variables
1
INTRODUCTION
Multivariate procedures allow social workers and other human services
researchers to analyze multidimensional social problems and interven-
tions in ways that minimize oversimplification. The term multivariate
is used here to describe analyses in which several dependent variables
(DVs) are being analyzed simultaneously. The term univariate is used to
describe analyses in which one DV is being analyzed. DVs are also called
criterion, response, y-variables, and endogenous variables. Independent
variables (IVs) are also called predictor, explanatory, x-variables, and
exogenous variables. For consistency, only the terms independent and
dependent variables will be used here. Examples of multivariate analy-
ses include investigations of relationships between personality traits and
aspects of behavior, program characteristics and effectiveness, and client
characteristics and service needs. The purposes of multivariate analyses
include (1) data structure simplification and reduction, (2) description
of relationships, and (3) prediction.
Using multivariate analysis for data structure simplification and reduc-
tion involves identifying and interpreting concepts called latent variables,
or emergent variables. According to Huberty (1994), “what is being sought
is an attribute or trait that may not be directly observable, and is not
Boring (1953) stated, “ . . . as long as a new construct has only the sin-
gle operational definition that it received at birth, it is just a construct;
when it gets two alternative operational definitions, it is beginning to be
validated” (p. 183).
Multiple DVs may be suggested by theory, previously conducted
empirical investigations, or practice experience. Moreover, as Campbell
and Fiske (1959) have suggested, even if the primary interest is in only
one DV, multiple operationalization may provide a more valid assess-
ment of that DV. However, difficulty may arise in the interpretation of
relationships among multiple DVs if there is reason to suspect that these
variables are correlated. Consequently, the examination of correlations
among DVs may provide a greater understanding than can be attained by
considering these variables separately. The four multivariate procedures
discussed in this book facilitate the analysis of correlated DVs.
Control of Type I Error. A second reason to conduct a multivariate
analysis is to control type I error. Type I error is the probability of incor-
rectly identifying a statistically significant effect. Type I error is a false
the output was produced. Instructions for using PASW and SAS
to conduct MMR also are provided; and
3. An annotated example of SEM is provided using AMOS.
Computer output is presented as well as an explanation of how
the output was produced.
Assumptions
The term statistical model is used here to describe mathematical state-
ments about how variables are associated. Ideally, models are constructed
to capture a pattern of association suggested by a theory. Models are con-
structed and fitted to statistically test theories that purport to explain
relationships between and among variables. Statistical models include IVs
and DVs. Variables also may be described in terms of order. Higher-order
variables (e.g., interactions or moderators) are constructed from
lower-order variables (i.e., main effects).
Model building concerns strategies for selecting an optimal set of IVs
that explain the variance in one or more DVs (Schuster, 1998). A statisti-
cal model is said to be misspecified if it incorrectly includes unimportant
variables or excludes important variables (Babyak, 2004). Including more
variables than are needed is termed overfitting. Overfitting occurs when
a model contains too many IVs in relation to sample size, and when IVs
are intercorrelated. An overfitted model captures not only true regulari-
ties reflected in the data, but also chance patterns that reduce the model’s
predictive accuracy. Overfitting yields overly optimistic model results;
that is, findings that appear in an overfitted model may not exist in the
population from which a sample has been drawn. In contrast, underfit-
ting occurs when a model has too few IVs. Underfitting results in omis-
sion bias, and, consequently, may have poor predictive ability because of
the lack of detail in a regression model.
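Although the book's examples use SPSS, Stata, and SAS, the overfitting problem can be illustrated with a small, self-contained Python simulation (the sample size, seed, and solver are illustrative assumptions, not taken from the text). Here the DV is pure noise, unrelated to every IV, yet R² climbs steadily as more IVs are added:

```python
import random

def r_squared(X, y):
    # Ordinary least squares via the normal equations (X'X)b = X'y,
    # solved by Gaussian elimination with partial pivoting; returns R-squared.
    n, k = len(X), len(X[0])
    XtX = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)]
           for a in range(k)]
    Xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]
    A = [XtX[j][:] + [Xty[j]] for j in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k + 1):
                A[r][c] -= f * A[col][c]
    b = [0.0] * k
    for r in range(k - 1, -1, -1):
        b[r] = (A[r][k] - sum(A[r][c] * b[c] for c in range(r + 1, k))) / A[r][r]
    y_hat = [sum(X[i][a] * b[a] for a in range(k)) for i in range(n)]
    y_bar = sum(y) / n
    ss_res = sum((y[i] - y_hat[i]) ** 2 for i in range(n))
    ss_tot = sum((v - y_bar) ** 2 for v in y)
    return 1 - ss_res / ss_tot

random.seed(1)
n = 20                                        # small sample
y = [random.gauss(0, 1) for _ in range(n)]    # DV: pure noise
r2_by_k = {}
for k in (1, 5, 15):                          # number of noise IVs
    X = [[1.0] + [random.gauss(0, 1) for _ in range(k)] for _ in range(n)]
    r2_by_k[k] = r_squared(X, y)
    print(k, round(r2_by_k[k], 2))
```

With 15 noise IVs and only 20 cases, R² becomes large even though nothing real is being modeled — exactly the "overly optimistic" pattern described above.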
The simplest statistical model consists of IVs whose relationships
with a DV are separate, and, consequently, there is additivity, or no inter-
action. A more complicated model is when the effect of one IV (X1) on a
DV (Y) depends on another IV (X2). The relationship between X1 and X2
with Y is called a statistical interaction (or moderation). In a statistical
test of interaction, the researcher explores this interaction in the regres-
sion model by multiplying X1 by X2, which produces X1X2, which is called
Basic Concepts and Assumptions 7
The multiple regression model may be written as

y = c + b1x1 + b2x2 + . . . + bkxk

The bi's are the regression coefficients, representing the amount the DV (y) changes when the IV (e.g., x1) changes one unit. The c is the constant, or where the regression line intercepts the y-axis, and represents the value of the DV (y) when all IVs equal zero.
Ordinary least squares (OLS) regression derives its name from the
method used to estimate the best-fit regression line: a line such that
the sum of the squared deviations of the distances of all the points from
the line is minimized. More specifically, the regression surface (i.e., a line
in simple regression, or a plane or higher-dimensional surface in mul-
tiple regression) expresses the best predicted value of the DV (Y), given
the values on the IVs (X’s). However, phenomena of interest to social
workers and other applied researchers are never perfectly predictable,
and usually there is variation between observed values of a DV and those
values predicted by OLS regression. The deviation of observed versus pre-
dicted values is called the residual value. Because the goal of regression
analysis is to fit a linear function of the X variables as closely as possible
to the observed Y variable, the residual values for the observed points can
be used to devise a criterion for the “best fit.” Specifically, in regression
problems, the linear function is computed for which the sum-of-squared
deviations of the observed points from values predicted by that func-
tion are minimized. Consequently, this general procedure is referred to
as least squares estimation. Power terms (e.g., x², x³) can be added as IVs to explore curvilinear effects. Cross-product terms (e.g., x1x2) can be added as IVs to explore interaction effects.
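The construction of such terms is mechanical: each added column of the design matrix is computed from the original IVs. A minimal Python sketch (the predictor values are made up for illustration):

```python
# Hypothetical values for two IVs
x1 = [1.0, 2.0, 3.0]
x2 = [0.5, 1.5, 2.5]

# Each row: intercept, main effects, power term (curvilinearity),
# cross-product term (interaction)
design = [[1.0, a, b, a ** 2, a * b] for a, b in zip(x1, x2)]
```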
test the significance of the individual and group mean variables. If the
group-mean variable is significant in this model it indicates that the
individual-level and group-level slopes are significantly different, and
one has evidence of a contextual effect.
Model Is Specified Correctly. As discussed above, this assumption
concerns the accuracy of the statistical model being tested. Several tests
of specification have been proposed. Ramsey’s Regression Specification
Error Test (RESET) is an F-test of differences of R2 under linear versus
nonlinear assumptions (Ramsey, 1969). More specifically, the RESET
evaluates whether nonlinear combinations of the estimated values help
explain the DV. The basic assumption of the RESET is that if nonlinear
combinations of the explanatory variables have any power in explaining
the endogenous variable, then the model is misspecified. That is, for a
linear model that is properly specified, nonlinear transforms of the fitted values should not be useful in predicting the DV. While Stata, for example, labels the RESET as a test of omitted variables, it only tests whether nonlinear transforms of the specified IVs have been omitted. RESET does not test for other relevant linear or nonlinear variables that have been omitted.
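In practice, the RESET reduces to an incremental F-test on the gain in R² when powers of the fitted values (e.g., yhat², yhat³) are added to the model. A sketch of that arithmetic in Python, with all numbers made up for illustration:

```python
def incremental_f(r2_restricted, r2_augmented, n, k_augmented, q):
    # F-test for the increase in R-squared when q added terms
    # (e.g., yhat^2 and yhat^3 in RESET) join a model, leaving
    # k_augmented total predictors fitted to n cases.
    numerator = (r2_augmented - r2_restricted) / q
    denominator = (1 - r2_augmented) / (n - k_augmented - 1)
    return numerator / denominator

# Hypothetical values: R-squared rises from .40 to .44 when yhat^2 and
# yhat^3 (q = 2) are added to a 3-IV model (5 predictors total), n = 100
print(round(incremental_f(0.40, 0.44, 100, 5, 2), 2))
```

The resulting F is compared with the critical F on (q, n − k_augmented − 1) degrees of freedom; a significant result suggests misspecification.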
The RESET (and other specification tests) should not be viewed as
a substitute for a good literature review, which is critical to identify a
model’s variables. As a rule of thumb, the lower the overall effect (e.g.,
R2 in multiple regression with a single DV), the more likely it is that
important variables have been omitted from the model and that exist-
ing interpretations of the model will change when the model is correctly
specified. Perhaps specification error is best identified when the researcher relies on model comparison, as opposed to the testing of one model, to assess the relative importance of the IVs.
Missing Data. Missing data in empirical social work research can
substantially affect results. Common causes of missing data include par-
ticipant nonresponse and research design. That is, some participants may
decline to provide certain information, some data (e.g., archival) may
not be available for all participants, and information may be purposely
censored, for example, to protect confidentiality. There is a rich body of
literature on missing data analysis. However, there is no consensus in the
methodological literature about what constitutes excessive missingness
(Enders, 2010; Little & Rubin, 2002). Based on whether missing data are
dependent on observed values, patterns of missing data are classified into
tails,” and there is a relatively small proportion of values both above and
below the mean residual. Platykurtosis is a distribution with “thin tails,”
and there are a relatively large proportion of values both above and below
the mean residual.
According to Stevens (2009), deviation from multivariate normal-
ity has only a small effect on type I error. Multivariate skewness appears
to have a negligible effect on power. However, Olson (1974) found that platykurtosis does have an effect on power, and the severity of the effect increases as platykurtosis spreads from one to all groups.
Several strategies to assess multivariate normality are as follows:
CHAPTER SUMMARIES
MANOVA, MANCOVA, MMR, and SEM are discussed in terms
of (1) purpose of the analysis; (2) important assumptions; (3) key
concepts and analytical steps; (4) sample size and power requirements;
(5) strengths and limitations; (6) an annotated example; (7) reporting
the results of an analysis; and (8) additional examples from the applied
research literature.
Multivariate Analysis
of Variance: Overview
and Key Concepts
THE t-TEST
Frequently, hypotheses concern differences between the means of two
groups in a sample. For example, a hypothesis could state that, in a ran-
dom sample, the average age of males does not equal the average age
of females. The t-test may be used to test the significance of this differ-
ence in average age between males and females. The variances of the
two sample means may be assumed to be equal or unequal. There are
two basic designs for comparing the mean of one group on a particu-
lar variable with the mean of another group on that same variable. Two
samples are independent if the data in one sample are unrelated to the
data in the other sample. Two samples are paired (also termed correlated
or dependent samples) if each data point in one sample is matched to a
unique data point in the second sample. An example of a paired sample
is a pre-test/post-test study design in which all participants are measured
on a DV before and after an intervention.
William Gosset, who published under the pseudonym of Student,
noted that using a sample’s standard deviation to estimate the popula-
tion’s standard deviation is unreliable for small samples (Student, 1908).
This unreliability is because the sample standard deviation tends to
underestimate the population standard deviation. As a result, Gosset described a distribution that permits the testing of hypotheses about normally distributed populations when the population standard deviation is not known.
This distribution is the t-distribution or Student’s t.
The t-test is used when a population’s variance is not known and the
sample’s size is small (N < 30). In fact, even when N > 30, the t-distribution
is more accurate than the normal distribution for assessing probabilities,
and, therefore, t is the distribution of choice in studies that rely on sam-
ples to compare two means.
The t-distribution is similar to the normal distribution when the esti-
mate of variance is based on many degrees of freedom (df) (i.e., larger
samples), but has relatively more scores in its tails when there are fewer
Figure 2.1 t versus Normal Distribution (a normal curve overlaid with a t distribution on 4 df).
difference is true. The α-level (e.g., .05) defines “likely.” That is, if α is set at .05, then unlikely means a probability of .05 or less. If the probability (p) of an observed difference is equal to or less than .05, then the observed difference is unlikely if the null hypothesis is true. As a result, the null hypothesis is treated as false and rejected. In contrast, if the p of the observed difference is greater than .05, then the null is retained.
The following procedure is used to test whether a difference between
two means in a sample is likely to be true at some level of probability
(e.g., .05) for the population:
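The core of that procedure, the pooled-variance t statistic for independent samples, can be sketched in Python (the sample values below are made up; the book's own computations use SPSS, Stata, and SAS):

```python
import math
import statistics

def pooled_t(sample1, sample2):
    # Independent-samples t statistic, equal variances assumed
    n1, n2 = len(sample1), len(sample2)
    m1, m2 = statistics.mean(sample1), statistics.mean(sample2)
    v1, v2 = statistics.variance(sample1), statistics.variance(sample2)
    # Pooled estimate of the common population variance
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / std_error, n1 + n2 - 2  # t statistic and df

# Hypothetical ages for two small groups
t, df = pooled_t([23.0, 25.0, 27.0], [24.0, 26.0, 28.0])
```

The observed t is then compared with the critical value of the t-distribution on df degrees of freedom at the chosen α-level.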
ANOVA
When there are more than two means, conducting multiple t-tests can
lead to inflation of the type I error rate. This refers to the fact that the more
comparisons that are conducted at, for example, α = .05, the more likely
that a statistically significant comparison will be identified. For example,
if the comparisons are independent, with five means there are 10 possible pair-wise comparisons. Doing all possible pair-wise comparisons on five means at α = .05 would increase the overall chance of a type I error to approximately 1 − (1 − .05)¹⁰ = .40.
dfSSB = k − 1 (2.2)
dfSSW = N − k (2.3)
and
F = (SSB/dfSSB) / (SSW/dfSSW) (2.4)
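The type I error inflation from multiple t-tests, noted above, can be computed directly (α and the number of means are the standard illustrative values, not data from the text):

```python
import math

alpha = 0.05
m = math.comb(5, 2)                 # 10 pair-wise comparisons among five means
familywise = 1 - (1 - alpha) ** m   # chance of at least one type I error
print(m, round(familywise, 3))      # → 10 0.401
```

That is, running all 10 comparisons at α = .05 raises the familywise error rate to roughly .40, which is why ANOVA's single omnibus F-test is preferred.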
Necessary assumptions for the F-test include the following: (1) the
response variable is normally distributed in each group; and (2) the
groups have equal variances. Note that the aforementioned assumptions
are parametric assumptions and that ANOVA (as illustrated here) also
requires that the observations are independent. F is an extended fam-
ily of distributions, which varies as a function of a pair of df (one for
SSB and one for SSW). F is positively skewed. Figure 2.2 illustrates an F
distribution with degrees of freedom (df) = df1,df2, or df = 5,10. F ratios,
like the variance estimates from which they are derived, cannot have a
value less than zero. Consequently, in one sense, F is a one-tailed prob-
ability test. However, the F-ratio is sensitive to any pattern of differences
among means. Therefore, F should be more appropriately treated as a
two-tailed test.
When the overall F-test is not significant (i.e., assuming the null hypothesis is true, the probability of the observed value of F is greater than a pre-established α-level, such as .05), analysis usually terminates. When differences are statistically significant (i.e., assuming the null hypothesis is true, the probability of the observed value of F is less than a pre-established α-level, such as .05), the researcher will often identify specific differences among groups with post hoc tests. That is, with ANOVA, if the null
hypothesis is rejected, then at least two groups are different in terms of the
mean for the group on the DV. To determine which groups are different,
post hoc tests are performed using some form of correction. Commonly
used post hoc tests include Bonferroni, Tukey's b, Tukey's HSD, and Scheffé. Post
hoc tests are discussed further within the context of MANOVA.
In general, post hoc tests will be robust in those situations where the
one-way ANOVA’s F-test is robust, and will be subject to the same poten-
tial problems with unequal variances, particularly when the sample sizes
are unequal.
Figure 2.2 The F Distribution.
MANOVA
Developed as a theoretical construct by Wilks (1932), MANOVA designs
are substantially more complicated than ANOVA designs, and, therefore,
there can be ambiguity about the relationship of each IV with each DV.
In MANOVA, there is at least one IV (i.e., group) with two or more lev-
els (i.e., subgroups), and at least two DVs. MANOVA designs evaluate
whether groups differ on at least one optimally weighted linear combina-
tion (i.e., composite means or centroids) of at least two DVs.
MANOVA, therefore, is an ANOVA with several DVs. The testing of
multiple DVs is accomplished by creating new DVs that maximize group
differences. The gain in power obtained from decreased within-group
sum of squares might be offset by the loss in degrees of freedom. One
degree of freedom is lost for each DV. Some researchers argue that
MANOVA is preferable to performing a series of ANOVAs (i.e., one for
each DV) because (1) multiple ANOVAs can capitalize on chance (i.e.,
inflate α or increase type I error); and (2) ANOVA ignores intercorrela-
tions among DVs; in contrast, MANOVA controls for intercorrelations
among DVs. See chapter 1 for additional discussion of intercorrelations
among DVs and inflated α-levels.
A two-group MANOVA, termed Hotelling's T², is used when one IV has two groups and there are several DVs (Hotelling, 1931). For example, there might be two DVs, such as score on an academic achievement test and attention span in the classroom, and two levels of types of educational therapy, emphasis on perceptual training versus emphasis on academic training. It is inappropriate to use separate t-tests for each DV to evaluate differences between groups because this strategy inflates type I error. Instead, Hotelling's T² is used to see if groups differ on the two DVs combined. The researcher asks if there are non-chance differences in the centroids (averages on the combined DVs) for the two groups. Hotelling's T² is a special case of MANOVA, just as the t-test is a special case of ANOVA, when the IV has only two groups.
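As a minimal sketch (not the book's example), Hotelling's T² for two groups measured on exactly two DVs can be computed from the mean-difference vector and the pooled covariance matrix; all data below are made up:

```python
import statistics

def hotelling_t2(group1, group2):
    # Two-group Hotelling's T^2 for exactly two DVs, using a pooled
    # 2x2 covariance matrix (a teaching sketch, not production code)
    n1, n2 = len(group1), len(group2)
    m1 = [statistics.mean(col) for col in zip(*group1)]
    m2 = [statistics.mean(col) for col in zip(*group2)]
    diff = [m1[0] - m2[0], m1[1] - m2[1]]

    def sscp(group, means):
        # Sums of squares and cross-products about the group means
        s = [[0.0, 0.0], [0.0, 0.0]]
        for row in group:
            for i in range(2):
                for j in range(2):
                    s[i][j] += (row[i] - means[i]) * (row[j] - means[j])
        return s

    s1, s2 = sscp(group1, m1), sscp(group2, m2)
    pooled = [[(s1[i][j] + s2[i][j]) / (n1 + n2 - 2) for j in range(2)]
              for i in range(2)]
    det = pooled[0][0] * pooled[1][1] - pooled[0][1] * pooled[1][0]
    inv = [[pooled[1][1] / det, -pooled[0][1] / det],
           [-pooled[1][0] / det, pooled[0][0] / det]]
    quad = sum(diff[i] * inv[i][j] * diff[j]
               for i in range(2) for j in range(2))
    return (n1 * n2) / (n1 + n2) * quad
```

Identical groups yield T² = 0, and T² grows as the centroids separate relative to the pooled within-group covariance.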
One-way MANOVA evaluates differences among optimally weighted
linear combinations for a set of DVs when there are more than two levels (i.e., groups) of one IV (factor). Any number of DVs may be used;
the procedure addresses correlations among them, and controls for type
I error. Once statistically significant differences are found, planned and
post hoc comparisons (discussed below) are available to assess which
DVs are influenced by the IV.
is the absolute value of the determinant of the matrix formed by the vec-
tors representing the parallelogram’s sides. The more similar the vectors
(i.e., they point in the same direction), the smaller the area or volume and
the smaller the determinant. In higher dimensions, the analog of volume
is called hypervolume and the same conclusion can be drawn by the same
argument: the hypervolume of the parallel-sided region determined by k vectors in k dimensions is the absolute value of the determinant whose entries are the components of these vectors.
For example, in Figure 2.3, parallelogram 1 has a greater area (i.e., a larger determinant) than parallelogram 2. As the parallelogram tilts, the lengths of its sides remain the same, but its area, and thus its determinant, shrinks.
For a square matrix in which some vectors (either rows or columns) are not independent of the other vectors, it can be shown that the determinant is always zero (such a matrix is termed singular); and
V = ⎡ σx²    covxy ⎤
    ⎣ covyx  σy²   ⎦
Figure 2.3 Parallelogram 1 versus Parallelogram 2.
Λ = |E| / |T|   (2.6)
Wilks' Λ is an inverse criterion: the smaller the value of Λ, the more evidence for the relationship of the IVs with the DVs. If there is no association between the two sets of variables, then Λ approaches 1.
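The inverse character of Λ is easy to verify numerically. With hypothetical 2 × 2 error (E) and hypothesis (H) SSCP matrices (values invented for illustration):

```python
def det2(m):
    # Determinant of a 2x2 matrix
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

E = [[8.0, 2.0], [2.0, 6.0]]   # hypothetical error SSCP matrix
H = [[4.0, 1.0], [1.0, 3.0]]   # hypothetical hypothesis SSCP matrix
T = [[E[i][j] + H[i][j] for j in range(2)] for i in range(2)]  # T = E + H

wilks_lambda = det2(E) / det2(T)  # smaller values = stronger group effect
```

If H were all zeros (no between-group variation), T would equal E and Λ would be exactly 1; the larger H is relative to E, the closer Λ moves toward 0.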
Note that other multivariate F-tests rely on alternative measures of
multivariate variance, such as the trace and an eigenvalue. The trace of
an n-by-n square matrix A is defined to be the sum of the elements on
the main diagonal (the diagonal from the upper left to the lower right) of
A. An eigenvalue provides quantitative information about the variance
in a portion of a matrix. Specifically, if A is a linear transformation represented by a matrix A such that AX = λX for some scalar λ and nonzero vector X, then λ is called an eigenvalue of A with corresponding eigenvector X.
Pillai–Bartlett trace is the trace of H/T (i.e., the variance between groups). Hotelling–Lawley trace is the trace of H/E (i.e., the sum of the eigenvalues of HE⁻¹). Roy's largest root is the largest eigenvalue of H/E (i.e., the
eigenvalue of the linear combination that explains most of the variance
and covariance between groups).
The sampling distributions of these statistics are usually converted
to approximate F-ratio statistics (Tabachnick & Fidell, 2007), and they
will generally produce similar results with more than two groups. The
Assumptions of MANOVA
MANOVA assumes that data being analyzed follow the GLM (Stevens,
2009; Tabachnick & Fidell, 2007). These assumptions are discussed in
further detail in chapter 1, and demonstrated in the annotated example
below. In particular, the tenability of the following assumptions should be assessed prior to conducting a MANOVA:
the overall type I error rate increases. This problem also may occur when
a MANOVA is followed-up with multiple comparisons between groups
on linear combinations of the DVs.
There are two basic approaches to assessing multivariate group
differences: (1) perform the omnibus test first, followed by a study of
some comparisons between pairs of means; or (2) proceed directly to
group comparisons and answer specific research questions. That is, some
authors (cf. Huberty & Smith, 1982; Wilcox, 1987) question the need
to test the overall null hypothesis, and argue that this second approach
may be appropriate when there is an a priori ordering of the DVs, which
implies that a specific set of hypotheses be tested. Howell (2009) explains
that the hypotheses tested by multivariate F-tests and multiple comparison tests are different. Multivariate F-tests distribute differences among
groups across the number of degrees of freedom for groups. This has
the effect of diluting the overall F-value if, for example, several group means are equal to each other but differ from one other mean. In general,
follow-up tests, sometimes termed comparisons or contrasts, are a priori
or planned, or post hoc or unplanned. An a priori comparison is one
that the researcher has decided to test prior to an examination of the
data. These comparisons are theory-driven and part of a strategy of con-
firmatory data analysis. A post hoc comparison is one that a researcher
decided to test after observing all or part of the data. These comparisons
are data-driven and are part of an exploratory data analysis strategy.
The following comparison strategies are discussed here: (1) multiple
ANOVAs (Leary & Altmaier, 1980); (2) two-group multivariate compari-
sons (Stevens, 2009); (3) step-down analysis (SDA) (Roy & Bargmann,
1958); and (4) simultaneous confidence intervals (SCI) (Smithson, 2003).
Multiple ANOVAs and SCIs are univariate perspectives. Two group mul-
tivariate comparisons and SDA are multivariate perspectives.
It should be noted that descriptive discriminant analysis (DDA) also
has been suggested (Huberty, 1994). In DDA, IVs are linearly combined to
create a composite IV that maximally differentiates the groups. Although
mathematically identical to a one-way MANOVA, DDA emphasizes clas-
sification and prediction. The first step in any DDA is to derive discrim-
inant functions that are linear combinations of the original variables.
The first discriminant function is the linear combination of variables
that maximizes the ratio of between-groups to within-groups vari-
ance. Subsequent discriminant functions are uncorrelated with previous
qobserved = (Mi − Mj) / √(MSerror / s)   (2.7)
where Mi and Mj are the group means being compared, MSerror is the mean square within (an estimate of the population variance based on the average of all variances within the several samples), and s is the number of observations per group (the groups are assumed to be of equal size).
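With made-up numbers, equation 2.7 is a one-line computation. For example, with group means of 10 and 7, MSerror = 4, and s = 9 observations per group:

```python
import math

def q_observed(m_i, m_j, ms_error, s):
    # Studentized range statistic for a pair of group means
    # (groups assumed to be of equal size s)
    return (m_i - m_j) / math.sqrt(ms_error / s)

# Hypothetical values, for illustration only
q = q_observed(10.0, 7.0, 4.0, 9)
```

Here q = 3 / √(4/9) = 4.5, which would then be compared against the tabled critical value.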
Once qobserved is computed, it is then compared with a qcritical value from a table of critical values (see this book's companion webpage). The value of qcritical depends upon the α-level, the degrees of freedom v = N − k,
The Scheffé Test. The Scheffé test uses F tables versus Studentized
range tables (Scheffé, 1953). If the overall null hypothesis is rejected,
F-values are computed simultaneously for all possible comparison pairs. These critical values are larger than that of the overall F-test. The formula simply modifies the F-critical value by taking into account the number of groups being compared: (a − 1)Fcrit, where a is the number of groups. The new critical value represents the
critical value for the maximum possible family-wise error rate. This test
is the most conservative of all post hoc tests discussed here. Compared
to Tukey’s HSD, Scheffé has less power when making pair-wise compari-
sons (Hsu, 1996). That is, although the Scheffé test has the advantage
of maintaining the study-wise significance level, it does so at the cost of
the increased probability of type II errors. Some authors (cf. Brown &
Melamed, 1990) argue that the Scheffé test is used most appropriately if there is a need for unplanned comparisons. Toothaker (1993) recommends
mends the Scheffé test for complex comparisons, or when the number of
comparisons is large.
One limitation of using multiple ANOVAs to follow up a significant multivariate F-test result is that univariate and multivariate tests use different information. If multiple ANOVAs with unadjusted or adjusted
Model Validation
In the last stage of MANOVA, the model should be validated. If sam-
ple size permits, one approach to validation is sample splitting, which
involves creating two subsamples of the data and performing a MANOVA
on each subsample. Then, the results can be compared. Differences in
results between subsamples suggest that these results may not generalize
to the population.
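The splitting step can be sketched in a few lines of Python (the half-and-half split, the seed, and the use of the N = 66 case count from the annotated example are illustrative choices):

```python
import random

def split_halves(cases, seed=0):
    # Randomly divide a sample into two validation subsamples;
    # a MANOVA would then be run on each half and the results compared
    rng = random.Random(seed)
    shuffled = list(cases)
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]

half_a, half_b = split_halves(range(66))  # e.g., 66 case indices
```

Each half would then be analyzed separately, with substantial disagreement between halves signaling results that may not generalize.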
f² = R² / (1 − R²)   (2.10)
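Equation 2.10 converts a model's R² into the effect size f². A one-line Python helper, with made-up R² values for illustration:

```python
def cohens_f2(r_squared):
    # Effect size f^2 from a model's R-squared (equation 2.10)
    return r_squared / (1 - r_squared)

# Hypothetical value: an R-squared of .242 corresponds to f^2 of about .32
print(round(cohens_f2(0.242), 3))
```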
ANNOTATED EXAMPLE
A study is conducted to test a model that compares satisfaction with agency
services and satisfaction with parenting by client race. Satisfaction with
agency services and satisfaction with parenting are operationalized as scale
scores. Race is operationalized as clients’ self-reports of their racial group
(coded 1 = Caucasian, 2 = Asian American, 3 = African American). Both
the satisfaction with agency services and satisfaction with parenting scales
are considered reliable. Both scales have associated Cronbach’s alphas
greater than .80. Cronbach’s alpha is the most common form of internal
SPSS, Stata, and SAS commands are numbered in sequence and high-
lighted in Courier. The data set used in this annotated example, enti-
tled Annotated_Example1_FULL_N=66.sav, may be downloaded from
this book’s companion website. These data are in SPSS format, but may
be imported into Stata and SAS.
1. include ‘C:\Users\pat\Desktop\MANOVA_Ex_1\
normtest.sps’
2. normtest vars = satisfaction_
parenting,satisfaction_adopt_agency
3. execute.
The first line of the aforementioned syntax includes (calls) the macro entitled normtest.sps, which is located in the directory C:\Users\pat\Desktop\MANOVA_Ex_1\, and the second line invokes the macro for the variables satisfaction_parenting and satisfaction_adopt_agency. This macro is available from http://www.columbia.edu/~ld208/ and this book's companion website.
The results of DeCarlo's (1997) macro are summarized in Figure 2.4. For brevity, the focus here is on Small's (1980) and Srivastava's (1984) tests of
Srivastava’s test
chi (b1p) df p-value
.4892 2.0000 .7830
Srivastava’s test
b2p N(b2p) p-value
2.7672 –.5459 .5851
Mardia’s test
b2p N(b2p) p-value
7.9961 –.0040 .9968
1. findit mvtest
2. install
3. mvtest normality satisfaction_parenting
satisfaction_adopt_agency, stats(all)
For this second command statement, DATA = the SAS data set to be analyzed. If the DATA = option is not supplied, the most recently created SAS data set is used. VAR = the list of variables to be used. Individual variable names,
separated by blanks, must be specified. PLOT = MULT requests a high-
or low-resolution chi-square quantile–quantile (Q-Q) plot of the squared
Mahalanobis distances of the observations from the mean vector. PLOT
= UNI requests high-resolution univariate histograms of each variable
with overlaid normal curves and additional univariate tests of normality.
Note that the univariate plots cannot be produced if HIRES = NO. PLOT
= BOTH (the default) requests both of the above. PLOT = NONE sup-
presses all plots and the univariate normality tests. HIRES = YES (the
default) requests that high-resolution graphics be used when creating
plots. You must set the graphics device (GOPTIONS DEVICE =) and any
other desired graphics-related options before invoking the macro.
Cook’s D values will be added to the data file as COO_1 and COO_2.
Box’s M = 12.046, F(6, 31697.215) = 1.906, p = .076
R-Squared = .242, Adjusted R-Squared = .179
R-Squared = .366, Adjusted R-Squared = .305
R-Squared = .083, Adjusted R-Squared = .023
Stata Commands
1. findit xi3
2. install
3. xi3: regress satisfaction_adopt_agency
g.race* satisfaction_parenting
SAS Commands
1. proc glm data = file name;
2. class race;
3. model satisfaction_adopt_agency =
race satisfaction_parenting
race*satisfaction_parenting;
4. run;
5. quit;
Multivariate Tests (c)
Effect                      Value      F           Hypothesis df  Error df  Sig.  Partial Eta Squared
Intercept Pillai’s Trace .999 48837.040a 2.000 62.000 .000 .999
Wilks’ Lambda .001 48837.040a 2.000 62.000 .000 .999
Hotelling’s Trace 1575.388 48837.040a 2.000 62.000 .000 .999
Roy’s Largest Root 1575.388 48837.040a 2.000 62.000 .000 .999
Race Pillai’s Trace .211 3.724 4.000 126.000 .007 .106
Wilks’ Lambda .793 3.816a 4.000 124.000 .006 .110
Hotelling’s Trace .256 3.904 4.000 122.000 .005 .113
Roy’s Largest Root .233 7.342b 2.000 63.000 .001 .189
a. Exact statistic
b. The statistic is an upper bound on F that yields a lower bound on the significance level.
c. Design: Intercept+Race
for the current model. All DVs were judged to be sufficiently reliable
to warrant stepdown analysis: both the satisfaction with agency services
and satisfaction with parenting scales have associated Cronbach’s alphas
greater than .80. To conduct a Roy–Bargmann stepdown analysis with
two DVs, an ANOVA is performed with the most important DV as the
DV; then, an ANCOVA is performed with the next most important DV
as the DV and the most important DV as a covariate. (With three DVs,
for example, an ANOVA is first performed on the most important DV.
Next, an ANCOVA is performed with the second most important DV as
the DV and the most important DV as a covariate. Finally, an ANCOVA
is performed with the third most important DV as the DV and the two
more important DVs as covariates.)
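The stepdown logic itself can be sketched with ordinary least squares model comparisons. The example below is a minimal illustration with simulated data and hypothetical variable names (not the SAS run shown here): the highest-priority DV is tested by ANOVA, and the second DV by ANCOVA with the first DV as a covariate, each group effect tested with a partial F test.

```python
import numpy as np
from scipy import stats

def partial_f(X_full, X_reduced, y):
    """Partial F test comparing nested OLS models (columns incl. intercept)."""
    def rss(X):
        b, *_ = np.linalg.lstsq(X, y, rcond=None)
        r = y - X @ b
        return r @ r
    rss_f, rss_r = rss(X_full), rss(X_reduced)
    df_num = X_full.shape[1] - X_reduced.shape[1]
    df_den = len(y) - X_full.shape[1]
    F = ((rss_r - rss_f) / df_num) / (rss_f / df_den)
    return F, stats.f.sf(F, df_num, df_den)

rng = np.random.default_rng(0)
g = np.repeat(np.arange(3), 22)                        # three groups
dummies = np.column_stack([(g == j).astype(float) for j in (1, 2)])
dv1 = 50 + 1.5 * (g == 2) + rng.normal(0, 2, g.size)   # most important DV
dv2 = 40 + 0.5 * dv1 + rng.normal(0, 2, g.size)        # next most important DV

ones = np.ones((g.size, 1))
# Step 1: ANOVA on the highest-priority DV
F1, p1 = partial_f(np.hstack([ones, dummies]), ones, dv1)
# Step 2: ANCOVA on the next DV, with the first DV as a covariate
cov = dv1.reshape(-1, 1)
F2, p2 = partial_f(np.hstack([ones, cov, dummies]), np.hstack([ones, cov]), dv2)
```

The step-1 F is identical to the one-way ANOVA F for the first DV; the step-2 F is the group effect after adjusting for the higher-priority DV.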
To conduct a Roy–Bargmann stepdown analysis with two DVs, pro-
ceed as follows in SPSS:
1. Proc glm;
2. class IV;
3. model DVs = IV;
4. means IV / LSD alpha = p-value;
5. manova h=IV;
6. run;
1. proc glm;
2. class race;
3. model satisfaction_parenting satisfaction_
adopt_agency = race;
4. means race / LSD alpha = .05;
5. manova h=race;
6. run;
Results
This study tested a model that compared a client’s satisfaction with an
adoption agency’s services and their satisfaction with being an adoptive
parent by the client’s race (coded 1 = Caucasian, 2 = Asian American,
3 = African American). Satisfaction with agency services and satisfaction
with parenting were operationalized as scale scores. Race was operation-
alized as clients’ self-reports of their racial group.
A one-way multivariate analysis of variance (MANOVA) was per-
formed to determine the effect of being a member of three racial groups
(i.e., African, Asian, and Caucasian American) on a client’s satisfaction
Multivariate Analysis of Variance 59
Table 2.2 Means and Standard Deviations for Each Dependent Variable by Race

                                 Caucasian (N = 16)  Asian American (N = 19)  African American (N = 31)
                                 Mean (SD)           Mean (SD)                Mean (SD)
Satisfaction w/ Adoption Agency  49.31 (1.078)*      51.00 (1.764)*           51.26 (1.949)*
Satisfaction w/ Parenting        50.50 (1.826)       40.74 (1.485)            50.32 (1.922)

*Significant at p < .05
Jewell, J. D., & Stark, K. D. (2003). Comparing the family environments of adoles-
cents with conduct disorder or depression. Journal of Child and Family Studies,
12(1), 77–89.
This study attempted to differentiate the family environments of youth
with Conduct Disorder (CD) compared to youth with a depressive disorder.
Participants were 34 adolescents from a residential treatment facility. The
K-SADS-P was used to determine the youth’s diagnosis, while their family
environment was assessed by the Self Report Measure of Family Functioning
Child Version. A MANOVA was used to compare the two diagnostic groups on
seven family environment variables. Results indicated that adolescents with
CD described their parents as having a permissive and ambiguous discipline
style, while adolescents with a depressive disorder described their relation-
ship with their parents as enmeshed. A discriminant function analysis, using
the two family environment variables of enmeshment and laissez-faire family
style as predictors, correctly classified 82% of the participants. Implications
for treatment of youth with both types of diagnoses and their families are dis-
cussed. (Journal abstract.)
Thevos, A. K., Thomas, S. E., & Randall, C. L. (2001). Social support in alco-
hol dependence and social phobia: Treatment comparisons. Research on Social
Work Practice, 11, 458–472.
This study investigated whether different alcoholism treatment approaches
differentially impact social support scores in individuals with concurrent alco-
hol dependence and social phobia. Individuals (N = 397) were selected retro-
spectively from a larger pool of participants enrolled in a multisite randomized
clinical trial on treatment matching. Three standard treatments were delivered
over 12 weeks: Cognitive-Behavioral Therapy (CBT), Twelve Step Facilitation
Therapy (TSF), and Motivational Enhancement Therapy (MET). MANOVA
was used to analyze social support measures to test the effects of treatment
group and gender. For men, there was significant improvement on two mea-
sures of social support regardless of treatment group. Women who received
CBT or TSF had better support outcomes than women who received MET.
Multivariate Analysis of Covariance
ASSUMPTIONS OF MANCOVA
For MANCOVA, all of the assumptions for MANOVA apply, with the fol-
lowing additions: (1) covariates are measured without error; (2) a linear
relationship between the DV and the covariates; and (3) homogeneity of
the regression hyperplanes. This assumption, also termed homogeneity
of regressions or parallelism, requires that the regression slopes between
the covariate and DV are the same (homogeneous) for all groups.
According to Stevens (2002), a violation of this assumption means that
there is a statistically significant covariate-by-IV interaction. Conversely,
if the interaction is not statistically significant, this assumption is met.
To test this assumption for a MANCOVA with one covariate, then,
a model that contains a covariate-by-IV interaction term is tested. It is
hoped that this interaction term does not achieve statistical significance
(e.g., p > .05). If there is more than one covariate, then, for each covariate,
a covariate-by-IV interaction term is included in the model. It is hoped
that all covariate-by-IV interaction terms are not statistically significant.
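This model-comparison test of the interaction can be sketched directly. The code below is a minimal illustration with simulated data and a hypothetical function name (not the SPSS/Stata/SAS runs shown in this chapter): it compares an OLS model with covariate-by-IV interaction terms to one without them.

```python
import numpy as np
from scipy import stats

def slope_homogeneity_p(y, covariate, g, n_groups):
    """p-value of the covariate-by-IV interaction (homogeneity of slopes).

    Compares an OLS model with group, covariate, and group-by-covariate
    terms against one without the interaction terms; a non-significant
    result supports the homogeneity-of-regression assumption.
    """
    ones = np.ones(len(y))
    dummies = [(g == j).astype(float) for j in range(1, n_groups)]
    base = np.column_stack([ones, covariate] + dummies)
    inter = np.column_stack([d * covariate for d in dummies])
    full = np.column_stack([base, inter])

    def rss(X):
        b, *_ = np.linalg.lstsq(X, y, rcond=None)
        r = y - X @ b
        return r @ r

    df_num = inter.shape[1]
    df_den = len(y) - full.shape[1]
    F = ((rss(base) - rss(full)) / df_num) / (rss(full) / df_den)
    return stats.f.sf(F, df_num, df_den)

rng = np.random.default_rng(3)
g = np.repeat(np.arange(3), 30)
x = rng.normal(50, 5, g.size)                                  # covariate
y_same = 10 + 0.6 * x + rng.normal(0, 2, g.size)               # common slope
y_diff = 10 + (0.2 + 0.5 * g) * x + rng.normal(0, 2, g.size)   # slopes differ
```

When the slopes truly differ across groups, the interaction p-value is very small; when the slopes are common, it is not.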
MANCOVA and MANOVA are similar in terms of (1) evaluating
the omnibus null hypothesis, (2) assessing overall model fit, (3) explor-
ing multivariate group differences, and (4) model validation.
where k is the number of cells (IVs by DVs) in the design and g is the
number of covariates, the number of groups = k + g. This strategy for
determining sample size is demonstrated in the annotated example below.

A common rule of thumb is to limit the number of covariates, C, such that

(C + (J − 1)) / N < .10,      (3.1)

where J is the number of groups and N is the number of participants.
This formula can be rewritten as C < .10(N) − (J − 1). If, for example, a
model has J = 3 groups and N = 60 participants, then C < 6 − 2 = 4; that
is, if four or more covariates are used, then estimates of the adjusted
means are likely to be unstable.
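The rule of thumb above is easy to automate. A small sketch (the function name is ours, not from any package):

```python
def max_covariates(n_participants, j_groups, ratio=0.10):
    """Largest C satisfying (C + (J - 1)) / N < ratio (formula 3.1)."""
    c = 0
    # increase C until adding one more covariate would violate the rule
    while (c + 1 + (j_groups - 1)) / n_participants < ratio:
        c += 1
    return c
```

For J = 3 groups and N = 60 participants, the function returns 3, matching C < 4 in the worked example.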
All covariates should be correlated with the DV, and none should be
substantially correlated with each other. If covariates are substantially
correlated with each other, they may not add significantly to reduction
of error, or they may cause computational difficulties such as multicol-
linearity. In addition, to avoid confounding of the intervention effect with
a change on the covariate, one should only use pretest or other information
gathered before the intervention begins as covariates. If a covariate
is used that is measured after the intervention begins and that variable
was affected by the intervention, then the change on the covariate may
be correlated with change in the DV. Consequently, when the covariate
adjustment is made, part of the intervention effect is removed.
MANCOVA has the following analytical limitations:
ANNOTATED EXAMPLE
A study is conducted to test a model that compares satisfaction with
agency services and satisfaction with parenting by client race, controlling
for client self-efficacy. Satisfaction with agency services, satisfaction with
parenting, and client self-efficacy were operationalized as scale scores. Race
was operationalized as clients’ self-reports of their racial group. The three
scales have associated Cronbach’s alphas greater than .80. See chapter 2
for a discussion of reliability.
The data analysis strategy used here is consistent with the recom-
mendations of several prominent authors (cf. Tabachnick & Fidell, 2007;
Stevens, 2009) to use Roy–Bargmann stepdown analysis and simultaneous
confidence intervals as follow-up comparisons to elucidate group differences.
Multivariate Analysis of Covariance 67
1. include ‘C:\Users\pat\Desktop\MANCOVA_Ex_1\
normtest.sps’
2. normtest vars = satisfaction_
parenting,satisfaction_adopt_agency,
self_efficacy/.
3. execute.
The first line of the aforementioned syntax includes (calls) the macro
entitled normtest.sps, which is located in the directory C:\Users\pat\
Desktop\MANCOVA_Ex_1\.
1. Select run→all.
The results of DeCarlo’s (1997) macro are summarized in Figure 3.1. For
brevity, the focus here is on Small’s (1980) and Srivastava’s (1984) tests of
multivariate kurtosis and skew, Mardia’s (1970) multivariate kurtosis,
and an omnibus test of multivariate normality based on Small’s statis-
tics (see Looney, 1995). However, all other tests calculated by the macro
were consistent with these results. That is, all tests of multivariate skew
and kurtosis were not statistically significant at p = .05.
Srivastava’s test of skew: chi(b1p) = 7.5667, df = 3.0000, p-value = .0559
Srivastava’s test of kurtosis: b2p = 3.2197, N(b2p) = .5707, p-value = .5682
Mardia’s test of kurtosis: b2p = 15.9375, N(b2p) = .6289, p-value = .5294
1. findit mvtest
2. install
3. mvtest normality satisfaction_parenting
satisfaction_adopt_agency self_efficacy,
stats(all)
For this second command statement, DATA = SAS data set to be ana-
lyzed. If the DATA = option is not supplied, the most recently created SAS
data set is used. VAR = the list of variables to be used. Individual variable
names, separated by blanks, must be specified. PLOT = MULT requests
a high- or low-resolution chi-square quantile–quantile (Q-Q) plot of the
squared Mahalanobis distances of the observations from the mean vector.
Cook’s D values will be added to the data file as COO_1 and COO_2.
Box’s M = 13.658, F(6, 64824.923) = 2.147, p = .045
Figure 3.4 Group 1 versus Group 3: R-Squared = .325, Adjusted R-Squared = .222.
Figure 3.5 Group 1 versus Group 2: R-Squared = .368, Adjusted R-Squared = .287.
Figure 3.6 Group 2 versus Group 3: R-Squared = .177, Adjusted R-Squared = .071.
with parenting and race and self-efficacy (highlighted with rectangles in
the output). In each analysis, the race-by-satisfaction with parenting and
race-by-self-efficacy interactions are not statistically significant at p = .05.
Therefore, homogeneity of regression was achieved for all components of
the stepdown analysis.
Stata Commands
1. findit xi3
2. install
3. xi3: regress satisfaction_adopt_
agency g.race* satisfaction_parenting
g.race*self_efficacy
SAS Commands
1. proc glm data = file name;
2. class race;
3. model satisfaction_adopt_agency = race
satisfaction_parenting race*satisfaction_
parenting race*self_efficacy;
4. run;
5. quit;
Multivariate Tests (c)
a. Exact statistic
b. The statistic is an upper bound on F that yields a lower bound on the significance level.
c. Design: Intercept+self_efficacy+Race
Dependent Variable: satisfaction_adopt_agency

Source                  Type III Sum of Squares  df  Mean Square  F       Sig.
Corrected Model         40.400a                  4   10.100       3.761   .010
Intercept               140.908                  1   140.908      52.466  .000
self_efficacy           6.634                    1   6.634        2.470   .122
satisfaction_parenting  .074                     1   .074         .027    .869
Race                    37.085                   2   18.542       6.904   .002
Error                   131.600                  49  2.686
Total                   136978.000               54
Corrected Total         172.000                  53
a. R Squared = .235 (Adjusted R Squared = .172)
Figure 3.8 Tests of Between-Subjects Effects.
1. Proc glm;
2. class IV;
Note that the options command CLASS is followed by the names of the
variables to be used as MANOVA IVs (factors) in the model. The options
command MODEL follows the format of names of DV(s) = names of
IV(s).
For this analysis, the following commands were used:
1. proc glm;
2. class race;
3. model satisfaction_parenting satisfaction_
adopt_agency
= race;
4. means race / LSD alpha =.05;
5. manova h=race;
6. run;
Results
This study tested a model that compared a client’s satisfaction with an
adoption agency’s services and their satisfaction with being an adoptive
parent by the client’s race (coded 1 = Caucasian, 2 = Asian American,
3 = African American), controlling for client self-efficacy. Satisfaction with agency
services, satisfaction with parenting, and client self-efficacy were opera-
tionalized as scale scores. Race was operationalized as clients’ self-reports
of their racial group.
A one-way multivariate analysis of covariance (MANCOVA) was
performed to determine the effect of being a member of three racial
groups on a client’s satisfaction with an adoption agency’s services and
their satisfaction with being an adoptive parent, controlling for client
self-efficacy.
No problems were noted for missing data, outliers, or multivari-
ate normality. The tenability of the absence of outliers assumption was
evaluated using Cook’s distance, D, which provides an overall measure
of the impact of an observation on the estimated MANCOVA model.
Observations with larger values of D than the rest of the data are those
which have unusual leverage.
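Cook’s D for a single-equation fit can be computed from the hat matrix and residuals. The sketch below uses simulated data and a hypothetical function name; the COO_ values SPSS saves come from the analogous per-DV regressions.

```python
import numpy as np

def cooks_distance(X, y):
    """Cook's D for each observation of an OLS fit (X includes an intercept)."""
    n, p = X.shape
    H = X @ np.linalg.inv(X.T @ X) @ X.T          # hat (leverage) matrix
    h = np.diag(H)
    e = y - H @ y                                 # residuals
    mse = (e @ e) / (n - p)
    return (e ** 2 / (p * mse)) * h / (1.0 - h) ** 2

rng = np.random.default_rng(7)
n = 54
X = np.column_stack([np.ones(n), rng.normal(50, 5, n)])   # intercept + one IV
y = 10 + 0.8 * X[:, 1] + rng.normal(0, 2, n)
D = cooks_distance(X, y)
flagged = np.flatnonzero(D > 0.07)   # the screening threshold used below
```

Each D value equals the (scaled) shift in fitted values when that observation is deleted, which is why large values flag influential cases.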
For these data, five cases had values greater than .07 (.08–.11). If they
appear atypical, cases that contain values of D greater than 0.07 may be
deleted. See chapter 1 for additional discussion about the management of
outliers. Alternatively, MANCOVA results should be reported with and
without the outliers. Note that for these cases, MANCOVA results with
and without cases that had values on one or both DVs that exceeded
.07 yielded equivalent results. Consequently, only results for all cases are
reported below. These results suggest that the assumption of absence of
outliers is tenable.
The tenability of the multivariate normality assumption was
evaluated by using an SPSS macro developed by DeCarlo (1997). All
tests of multivariate skew, multivariate kurtosis, and an omnibus test
of multivariate normality were not statistically significant at p = .05.
These results suggest that the assumption of multivariate normality is
tenable.
Box’s M was used to test the assumption (i.e., H0) of equality of vari-
ance–covariance matrices. Box’s M equaled 13.658, F(6, 64825) = 2.147,
p = .045, which means that equality of variance–covariance matrices can-
not be assumed.
Significant differences were found among the three categories of race
on the DVs, with Wilks’ lambda = .760, F(4, 98) = 3.600, p < .01. A Roy–
Bargmann stepdown analysis and simultaneous confidence intervals
were used to further explore these differences.
For these data, controlling for client self-efficacy, there are statistically
significant mean differences between racial groups in level of satisfaction
with the adoption agency (see Table 3.2). Specifically, controlling for client
self-efficacy, African American respondents report the highest level of
satisfaction with the adoption agency, followed by Asian American and
Caucasian American respondents. Although these results are statistically
significant, they
Table 3.2 Means and Standard Deviations for each Dependent Variable by Race

                                 Caucasian (N = 18)         Asian American (N = 18)    African American (N = 18)
                                 Mean (SE)     95% CI       Mean (SE)     95% CI       Mean (SE)     95% CI
Satisfaction w/ Parenting        50.50 (.43)   49.26–51.37  49.82 (.43)   48.95–50.69  50.02 (.44)   49.13–50.91
Satisfaction w/ Adoption Agency  49.13 (.39)*  48.35–49.91  50.76 (.39)*  49.99–51.54  51.11 (.40)*  50.31–51.90

*Significant at p < .05
Jones, R., Yates, W. R., Williams, S., Zhou, M., & Hardman, L. (1999). Outcome
for adjustment disorder with depressed mood: Comparison with other mood
disorders. Journal of Affective Disorders, 55 (1), 55–61.
Retrospective data were used to evaluate the construct validity of the adjust-
ment disorder diagnostic category. The data primarily consisted of SF-36
Health Status Survey responses by a large group of adult psychiatric outpatients
before treatment and again six months after beginning treatment. Respondents
were divided into five diagnostic groups, and MANOVA, MANCOVA and chi
square were used to clarify relationships among diagnoses, sociodemographic
data and SF-36 scores.
DiNitto, D. M., Webb, D. K., & Rubin, A. (2002). The effectiveness of an
integrated treatment approach for clients with dual diagnoses. Research
on Social Work Practice, 12(5), 621–641.
A MANCOVA tested the effectiveness of adding a psychoeducationally ori-
ented group therapy intervention, Good Chemistry Groups, to standard inpa-
tient chemical-dependency services for clients dually diagnosed with mental
and substance dependence disorders. Ninety-seven clients were randomly
assigned to an experimental group (n = 48) and a control group (n = 49).
Outcome variables included drug and alcohol use, participation in self-help
support group meetings, incarceration days, psychiatric symptoms, psychiatric
Farooqi, A., Hägglöf, B., Sedin, G., Gothefors, L., & Serenius, F. (2007). Mental
health and social competencies of 10- to 12-year-old children born at 23 to
25 weeks of gestation in the 1990s: A Swedish national prospective follow-up
study. Pediatrics, 120, 118–133.
The study investigated a national cohort of extremely immature children with
respect to behavioral and emotional problems and social competencies, from
the perspectives of parents, teachers, and children themselves. MANCOVA of
parent-reported behavioral problems revealed no interactions, but significant
main effects emerged for group status (extremely immature versus control), fam-
ily function, social risk, and presence of a chronic medical condition, with all effect
sizes being medium and accounting for 8% to 12% of the variance. MANCOVA of
teacher-reported behavioral problems showed significant effects for group status
and gender but not for the covariates mentioned above. According to the teachers’
ratings, extremely immature children were less well adjusted to the school envi-
ronment than were control subjects. However, a majority of extremely immature
children (85%) were functioning in mainstream schools without major adjustment
problems. Reports from children showed a trend toward increased
depression symptoms compared with control subjects.
Pomeroy, E. C., Kiam, R., & Abel, E. M. (1999). The effectiveness of a psycho-
educational group for HIV-infected/affected incarcerated women. Research on
Social Work Practice, 9 (2), 171–187.
This study evaluated the effectiveness of a psychoeducational group inter-
vention for HIV/AIDS-infected and affected women at a large southeastern
county jail facility. A MANCOVA yielded significant differences between the
experimental and comparison groups. Subsequent analysis of covariance for
Rubin, A., Bischofshausen, S., Conroy-Moore, K., Dennis, K., Hastie, M., Melnick,
L., Reeves, D., & Smith, T. (2001). The effectiveness of EMDR in a child guid-
ance center. Research on Social Work Practice, 11 (4), 435–457.
This study evaluated the effectiveness of adding Eye Movement
Desensitization and Reprocessing (EMDR) to the routine treatment regi-
men of child therapists. MANCOVA found no significant differences in Child
Behavior Checklist scores between groups. Subanalyses conducted for 33 cli-
ents with elevated pretest scores found moderate effect sizes that approached,
but fell short of, statistical significance. These findings raise doubts about
notions that EMDR produces rapid and dramatic improvements with children
whose emotional and behavioral problems are not narrowly connected to a
specific trauma and who require improvisational deviations from the standard
EMDR protocol. Further research is needed in light of the special difficulties
connected to implementing the EMDR protocol with clients like those in this
study.
4
Multivariate Multiple Regression
Mallows’ Cp = RSS(p) / s² − (n − 2p),      (4.1)

where n is the sample size, p is the number of IVs plus the intercept,
RSS(p) is the residual sum-of-squares from a model containing p parameters,
and s² is the mean residual sum-of-squares from the full model that
contains all candidate IVs. Recall from chapter 2 that residual sum-of-squares
(also termed SSW, sum-of-squares-within, and sum-of-squares error) is
a measure of the variability within respective groups around that group’s
mean for the same variable (e.g., age).
The values of Cp are typically positive and greater than one, where
lower values are better. Models that yield the best (lowest) values of Cp
will tend to be similar to those that yield the best (highest) values of
adjusted R2, but the exact ranking may be slightly different. Compared
Multivariate Multiple Regression 91
to adjusted R2, the Cp criterion tends to favor models with fewer param-
eters, so it is perhaps more robust to model overfitting. Generally, plots
of R2 and Cp versus the number of variables are examined to determine
an optimal model.
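An all-possible-subsets search with Cp can be sketched as follows. The data and variable roles are simulated and purely illustrative; note that in this sketch s² is the mean squared error of the full model containing all candidate IVs.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
n, k = 66, 4
X = rng.normal(size=(n, k))                      # candidate IVs (hypothetical)
y = 2.0 + 1.5 * X[:, 0] - 0.8 * X[:, 1] + rng.normal(size=n)

def rss(cols):
    """Residual sum of squares for the OLS model using the given IVs."""
    Xd = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    b, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    r = y - Xd @ b
    return r @ r

s2 = rss(range(k)) / (n - k - 1)   # MSE of the full model

results = []
for size in range(1, k + 1):
    for cols in combinations(range(k), size):
        p = size + 1                              # parameters incl. intercept
        cp = rss(cols) / s2 - (n - 2 * p)
        results.append((cols, cp))

best = min(results, key=lambda t: t[1])           # lowest Cp
```

By construction, the full model’s Cp equals its own parameter count, and subsets that omit a truly relevant IV have inflated Cp values.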
SPSS does not compute Mallows’ Cp directly from its menu system.
However, a syntax file that can be used to calculate Cp may be downloaded
from http://www.spsstools.net/Syntax/RegressionRepeatedMeasure/
DoAll-SubsetsRegressions.txt. For Stata users, rsquare is a module that
will calculate Mallows’ Cp. The rsquare module can be downloaded from
within Stata by typing findit rsquare. For SAS users, the SAS command
file, model_selection.sas, which can be downloaded from http://www-
personal.umich.edu/~kwelch/workshops/regression/sas/model_
selection.sas calculates Cp.
When all-possible-subsets and best subsets models are reported, it
is essential to describe how the model was derived. It is impossible to
determine from the numerical results whether a set of IVs was specified
before data collection or was obtained by using a selection procedure for
finding the “best” model. Parameter estimates and ANOVA tables do not
change to reflect which variable selection procedure was used. Results are
the same as what would have been obtained if that set of IVs had been
specified in advance.
An increasing number of applied researchers believe that computer-
controlled model-enhancement procedures, such as stepwise regression
and all-possible-subsets regression, are most appropriate in exploratory
research. For theory testing, a researcher should base selection of vari-
ables and their order of entry into a model on theory, not on a computer
algorithm. Menard (1995), for example, writes,
model for each of the variables not in the model in order to obtain their
F-to-enter values. It should be noted that statistical software packages,
such as SPSS, do not fit all models from scratch. Instead, the stepwise
search process, for example, is performed by a sequence of transforma-
tions to the correlation matrix of the variables in the model. That is,
variables are only read in once, and the sequence of adding or removing
variables and recalculating the F-statistics requires an updating opera-
tion on the correlation matrix, which is called “sweeping.”
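A naive forward-selection pass, refitting each candidate model from scratch rather than sweeping the correlation matrix, can be sketched as follows; the data are simulated, with only one IV that truly matters, and the function name is ours.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 120, 5
X = rng.normal(size=(n, k))
y = 1.0 + 2.0 * X[:, 2] + rng.normal(size=n)   # only IV 2 matters

def rss(cols):
    Xd = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    b, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    r = y - Xd @ b
    return r @ r

def forward_select(f_to_enter=4.0):
    """Naive forward selection: refit every candidate model at each step."""
    selected = []
    while True:
        best = None
        for j in set(range(k)) - set(selected):
            trial = selected + [j]
            df_den = n - len(trial) - 1
            F = (rss(selected) - rss(trial)) / (rss(trial) / df_den)
            if best is None or F > best[1]:
                best = (j, F)
        if best is None or best[1] < f_to_enter:
            return selected
        selected.append(best[0])

model = forward_select()
```

Because every null IV still gets a shot at exceeding the F-to-enter threshold, runs like this occasionally admit spurious IVs, which is exactly the alpha-inflation problem discussed next.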
The nominal significance level (e.g., .05) used at each step in stepwise
regression is subject to inflation, such that by the last step, the true signif-
icance level may be much higher, increasing the chances of type I errors.
That is, when many IVs are considered, and there is nothing theoretically
compelling about any of them before the data are collected, probability
theory suggests that at least one IV will achieve statistical significance
(Draper, Guttman, & Lapczak, 1979). Therefore, as more tests are per-
formed, the probability that one or more achieve statistical significance
because of chance (type I error) increases. This phenomenon (discussed
in chapter 2), sometimes termed probability pyramiding or inflated alpha,
explains why IVs that are theoretically unimportant sometimes achieve
statistical significance. Stepwise regression, therefore, usually results in
measures (F-test, t-tests, R2, standard error of the estimate, prediction
intervals) that are biased toward too much strength in the relationship
between the DV and the IVs. It is incorrect to call them statistically significant
because the reported values do not accurately reflect the selection
procedure.
Also troublesome is when there are missing data. Stepwise procedures
must exclude observations that are missing for any of the potential IVs.
Sometimes one or more of the IVs in the final model are no longer sta-
tistically significant when the model is fitted to the data set that includes
missing observations that had been deleted, even when these values are
missing at random.
Perhaps, the fundamental problem with computer-controlled meth-
ods is that they often substitute for thinking about a problem. Statistical
computing packages available today do our arithmetic for us in a way
that was totally unthinkable thirty years ago. When solving regres-
sion equations with many variables could take weeks using a desk cal-
culator, researchers were understandably reluctant to embark on the
x*_i = (x_i − x̄_i) / s_i,   i = 1, . . . , k,
y* = (y − ȳ) / s_y,      (4.2)

where x̄_i and ȳ are the means of each variable in the sample and s_i and
s_y are the corresponding sample standard deviations. A second way to
standardize regression coefficients is to multiply each unstandardized
coefficient B_i by the ratio between the standard deviation of the respective
IV (s_i) and the standard deviation of the DV (s_y):

β_i = B_i (s_i / s_y).      (4.3)
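Both routes yield identical standardized coefficients, which can be verified numerically (simulated data, illustrative only):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200
x1 = rng.normal(10, 3, n)
x2 = rng.normal(0, 1, n)
y = 4 + 2 * x1 - 3 * x2 + rng.normal(0, 2, n)

def ols(X, y):
    """OLS coefficients with an intercept prepended."""
    Xd = np.column_stack([np.ones(len(y)), X])
    b, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return b

X = np.column_stack([x1, x2])
b = ols(X, y)                                   # unstandardized: [b0, b1, b2]

# Route 1 (eq. 4.2): regress z-scored y on z-scored IVs
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
zy = (y - y.mean()) / y.std(ddof=1)
beta_z = ols(Z, zy)[1:]

# Route 2 (eq. 4.3): rescale the unstandardized coefficients
beta_r = b[1:] * X.std(axis=0, ddof=1) / y.std(ddof=1)
```

The agreement is exact because standardizing the variables is just a linear rescaling of the design.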
⎛ y_1 ⎞   ⎛ 1  x_11  x_12  ⋯  x_1q ⎞ ⎛ β_0 ⎞   ⎛ ε_1 ⎞
⎜ y_2 ⎟ = ⎜ 1  x_21  x_22  ⋯  x_2q ⎟ ⎜ β_1 ⎟ + ⎜ ε_2 ⎟      (4.5)
⎜  ⋮  ⎟   ⎜ ⋮    ⋮     ⋮        ⋮  ⎟ ⎜  ⋮  ⎟   ⎜  ⋮  ⎟
⎝ y_n ⎠   ⎝ 1  x_n1  x_n2  ⋯  x_nq ⎠ ⎝ β_q ⎠   ⎝ ε_n ⎠
Hotelling’s trace, Pillai’s trace, and Roy’s largest root. See chapter 2 for a
detailed discussion of these multivariate F-tests.
under the assumption that the data are generated by a structural linear
model with normally distributed variables and disturbances. According
to Anderson (1999; 2002), if these conditions hold, the coefficients of
the first canonical pair correspond to those in a MMR, and the asymp-
totic distribution of the sample canonical correlations and coefficients
of the canonical variates may be used for statistical inference about the
coefficients.
Model Validation
In the last stage of MMR, the model should be validated. If sample size
permits, one approach to validation is sample splitting, which involves
creating two subsamples of the data and performing an MMR analysis
on each subsample. Then, the results can be compared. Differences in
results between subsamples suggest that these results may not generalize
to the population.
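A minimal sample-splitting sketch for MMR (simulated data and a hypothetical design; coefficient instability across halves is the warning sign):

```python
import numpy as np

rng = np.random.default_rng(11)
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])  # intercept + 3 IVs
B = np.array([[5.0, 2.0], [1.0, 0.5], [-0.8, 0.3], [0.0, 1.2]])
Y = X @ B + rng.normal(0, 1, size=(n, 2))                   # two DVs

idx = rng.permutation(n)
half1, half2 = idx[: n // 2], idx[n // 2:]

def mmr_fit(rows):
    """OLS coefficients for all DVs at once (one column per DV)."""
    b, *_ = np.linalg.lstsq(X[rows], Y[rows], rcond=None)
    return b

B1, B2 = mmr_fit(half1), mmr_fit(half2)
max_diff = np.abs(B1 - B2).max()   # large differences flag instability
```

Here both halves recover similar coefficients; substantial divergence would suggest the results may not generalize.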
ANNOTATED EXAMPLE
A study is conducted to test a model that predicts post-adoption service
utilization and positive adoption outcomes. Specifically, the study tests a model that
includes (1) factors influencing the utilization of post-adoption services
(parents’ perceptions of self-efficacy, relationship satisfaction between par-
ents, and attitudes toward adoption) as IVs and (2) service utilization and
positive adoption outcomes (satisfaction with parenting and satisfaction
with the adoption agency) as DVs. All variables were operationalized as scale
scores. The researcher decides to perform a MMR analysis to investigate
further the relationship between the following (1) IVs: parents’ percep-
tions of self-efficacy, relationship satisfaction between parents, and attitudes
toward adoption, and (2) the following DVs: service utilization, and satis-
faction with parenting. The next section presents Stata commands, which
are numbered in sequence and in Courier. These Stata commands use
the format DVs = IVs.
Figure 4.1 illustrates the results of the manova command. The overall
F tests the null hypothesis that regression coefficients for all IVs equal zero
for all DVs. The multivariate F is based on the sum of squares between
and within groups, and on the sum of crossproducts; that is, it considers
correlations between the criterion variables (see chapter two for a more
detailed discussion).
The F- and p-values for all four tests under the section labeled
“Model,” Wilks’ lambda, Lawley–Hotelling trace, Pillai’s trace, and Roy’s
largest root, are statistically significant (p < .001). Because the overall
multivariate tests are significant, it is concluded that there are differences
among the DVs as a function of one or more IVs.
Figure 4.2 illustrates the results of the mvreg command, which first pro-
vides the multiple R2 values for each DV (equation), with associated F- and
p-values. Second, the output provides unstandardized regression coeffi-
cients, standard errors, t-values, p-values, and 95% confidence intervals
for each IV in each model. In Stata mvreg is the command used for MMR
estimates. The output from the mvreg command is similar to the output
from the regress command, except that there are three equations (one
for each DV) instead of one. The coefficients (and all of the output) are
interpreted in the same way as they are for any OLS regression. To be clear,
the “R-sq” in Figure 4.2 corresponds to the R2 from the regress com-
mand, not the adjusted R2. If regressions were performed for each out-
come variable, the same coefficients, standard errors, t- and p-values, and
confidence intervals as shown above would be obtained.
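That equivalence is easy to verify numerically: solving the least-squares problem for all DVs at once reproduces the separate per-DV fits (simulated data, hypothetical variable roles):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
# hypothetical IVs: self-efficacy, relationship satisfaction, attitude
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
# two hypothetical DVs: service utilization, satisfaction with parenting
Y = X @ rng.normal(size=(4, 2)) + rng.normal(0, 1, size=(n, 2))

# one multivariate solve for all DVs at once...
B_joint, *_ = np.linalg.lstsq(X, Y, rcond=None)
# ...equals separate OLS fits per DV
B_sep = np.column_stack(
    [np.linalg.lstsq(X, Y[:, j], rcond=None)[0] for j in range(2)]
)
```

What the multivariate formulation adds is not different coefficients but joint tests across equations, as the test command illustrates below.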
MMR does not provide an overall association measure. The Ri2
statistics are conventional multiple correlation coefficients and reflect
assessments of predictive ability. The values predicted by the separate
regression equations are not necessarily uncorrelated. That is, the R2 values
for each DV may not indicate the unique variance explained for those
DVs by its set of IVs as a proportion of total variance explained for
all DVs.
As mentioned, if a separate regression was run for each DV, the same
coefficients, standard errors, t- and p-values, and confidence intervals as
shown above would be obtained. The use of the test command is one of
the compelling reasons for conducting a multivariate regression analysis.
One advantage of using the mvreg command is that tests of coefficients
(IVs) across the DVs may be run. Accordingly, the researcher tests the null
hypothesis that the coefficients for the IVs self_efficacy, relationship_sat,
and attitude_adoption equal 0 in the equations for each of the two DVs.
satisfacti~g
self_effic~y   –.0402116  .0280111  –1.44  0.152  –.0953378  .0149146
relationsh~t   –.0193035  .0307725  –0.63  0.531  –.0798641   .041257
attitude_a~n    .6267131  .0273102  22.95  0.000   .5729664  .6804598
_cons          21.56306   2.001662  10.77  0.000  17.62377   25.50235
Figures 4.4 and 4.5 illustrate the results of these two analyses. In
Figure 4.4 for the DV service utilization, the three variables seem to make
equal contributions. In Figure 4.5 for the DV satisfaction with parenting,
the variable attitude toward adoption seems to make the greatest relative
contribution (Beta = 0.8011).
service_ut~i Beta
self_effic~y –.0530787
relationsh~t –.0470319
attitude_a~n .0221673
_cons .
Figure 4.4 DV is service utilization: Standardized Regression Coefficients.
satisfacti~g Beta
self_effic~y –.0557719
relationsh~t –.0244611
attitude_a~n .8011388
_cons .
Figure 4.5 DV is satisfaction with parenting: Standardized Regression Coefficients.
/METHOD=SSTYPE(3)
/INTERCEPT=INCLUDE
/CRITERIA=ALPHA(.05)
/DESIGN= self_efficacy relationship_sat
attitude_adoption
Cargill, B. R., Emmons, K. M., Kahler, C.W., & Brown, R. A. (2001). Relationship
among alcohol use, depression, smoking behavior, and motivation to quit
smoking with hospitalized smokers. Psychology of Addictive Behaviors, 15(3),
272–275.
Relationships among depression, alcohol use, and motivation to quit smok-
ing were examined in a sample of hospitalized smokers. Multivariate multiple
regression and logistic regression analyses indicated that participants with
depressed mood were more likely to have a history of problematic drinking.
106 Analysis of Multiple Dependent Variables
Flores, L. Y., Navarro, R. L., & DeWitz, S. J. (2008). Mexican American high school
students’ postsecondary educational goals: Applying social cognitive career
theory. Journal of Career Assessment, 16(4), 489–501.
A multivariate multiple regression analysis predicting the educational
goal aspirations and expectations of Mexican American high school students
was examined based on Lent, Brown, and Hackett’s Social Cognitive Career
Theory and prior research findings with Mexican American samples. No gen-
der or generational status differences were found in educational aspirations
or expectations; however, participants reported higher educational aspirations
than educational expectations. In addition, results of a multivariate multiple
regression analysis suggested that Anglo-oriented acculturation was signifi-
cantly positively related to educational goal expectations and educational goal
aspirations. Mexican-oriented acculturation, college self-efficacy, and college
outcome expectations were not significantly related to Mexican American
students’ educational goal aspirations or expectations. Results are discussed
as they relate to improving the educational achievement among Mexican
American youth.
Henderson, M. J., Saules, K. K., & Galen, L. W. (2004). The predictive validity
of the University of Rhode Island Change Assessment Questionnaire in a
heroin-addicted polysubstance abuse sample. Psychology of Addictive Behaviors,
18(2), 106–112.
The purpose of this investigation was to examine the predictive utility
of the stages-of-change scales of the University of Rhode Island Change
Assessment Questionnaire in a heroin-addicted polysubstance-abusing
treatment sample. Participants completed the URICA at the beginning of
a 29-week treatment period that required thrice-weekly urine drug screens.
Multivariate multiple regression analysis indicated that after controlling for
demographic variables, substance abuse severity, and treatment assignment,
the stages of change scales added significant variance to the prediction of
heroin- and cocaine-free urine samples. The Maintenance scale was posi-
tively related to cocaine-free urines and length in treatment. The implica-
tions of these findings for treatment and for measuring readiness among
individuals using multiple substances while taking maintenance medications
are discussed.
Miville, M. L., Darlington, P., Whitlock, B., & Mulligan, T. (2005). Integrating
identities: The relationships of racial, gender, and ego identities among white
college students. Journal of College Student Development, 46(2), 157–175.
The authors proposed that racial and gender identities were related to ego
identities based on common themes that exist across these different dimen-
sions of identity. A sample of White college students completed the White
Racial Identity Attitude Scale, the Womanist Identity Attitude Scale or Men’s
Identity Attitude Scale, and the Extended Objective Measure of Ego Identity
Status. Multivariate multiple regression analyses revealed that all ego identity
statuses were significantly related to gender and/or racial identity statuses for
both women and men. Implications for practice, limitations, and directions
for future research are discussed.
Tang, T., & Kim, J. K. (1999). The meaning of money among mental health
workers: The endorsement of money ethic as related to organizational citizenship behavior, job satisfaction, and commitment. Public Personnel Management,
28(1), 15–26.
Exploratory and confirmatory factor analyses were conducted to examine
the measurement and dimensions of the six-item Money Ethic Scale (MES)
using a sample of mental health workers. Results showed that the items of the
new MES had very low and negligible cross-loadings and the interfactor cor-
relations were small. Therefore, the three factors (Budget, Evil, and Success)
measured fairly independent constructs. In addition, the results of a multi-
variate multiple regression showed that the linear combination of the factors
Budget, Evil, and Success was a significant predictor of the linear combina-
tion of organizational citizenship behavior, job satisfaction, and organizational
commitment.
5
Structural Equation
Modeling
more accurately to the extent that the measures that define it are strongly
related to one another. If, for example, one measure is only weakly cor-
related with two other measures of the same construct, then that con-
struct will be poorly defined. Ideally, each indicator is a separate measure
of the hypothesized latent variable. In SEM, measurement is recognized
as error-prone. By explicitly modeling measurement error, SEM seeks to
derive unbiased estimates for the relations between latent constructs.
Multi-item scales pose challenges for SEM if all the items are used
as indicators of a latent construct (Cattell, 1956). For instance, a model
could have too many parameters to estimate relative to the available
sample size, resulting in reduced power to detect important parameters.
In addition, it might not fit the data sufficiently well because individual
items may have less than ideal measurement properties, leading to the
rejection of a plausible model.
When describing strategies for incorporating multi-item scales into SEM, it is useful to distinguish among factors or latent variables, items or observed variables, and groups of items or parcels. Two basic strategies for incorporating lengthy scales into SEM are as follows: (1) including all items individually and (2) combining items (e.g., summed or averaged) into one or more subsets or parcels. The practice
of parceling items within scales or subscales has received considerable
attention in the structural equation modeling literature (cf. Bandalos,
2002; Little, Cunningham, Shahar, & Widaman, 2002; MacCallum et al.,
1999; Nasser & Takahashi, 2003). Item parceling can reduce the dimensionality and the number of parameters estimated, resulting in more stable parameter estimates and a greater likelihood of proper solutions. When items are severely nonnormal or are coarsely categorized, research suggests that item parceling improves the normality and continuity of the indicators and enhances estimates of model fit relative to using the original items (Bandalos, 2002).
Item parceling’s potential psychometric benefits notwithstanding,
this strategy has been controversial. One concern is that parceling results
in a loss of information about the relative importance of individual items
(Marsh & O’Neill, 1984), because items are implicitly weighted equally
in parcels (Bollen & Lennox, 1991). Another concern is that parceling of ordinal scales results in indicators with undefined values, potentially changing the original relations between the indicators and latent variables, for instance, from nonlinear to linear relations (Coanders,
[Path diagram: X and Y connected by a direct path c′, with M as an intervening variable via paths a (X → M) and b (M → Y).]
Figure 5.1 A Recursive Relationship.
compare the obtained value of the test statistic against tabled values in
order to make a decision about the null hypothesis.
A structural equation model implies a structure of the covariance
matrix of the measures (see chapter 2 for a discussion of covariance
matrices). Once the model’s parameters have been estimated, the result-
ing model-implied covariance matrix can then be compared to an empir-
ical or data-based covariance matrix. If the two matrices are consistent
with one another, then the structural equation model can be considered
a plausible explanation for relations between the measures.
From another perspective, if a set of numbers X is related to another set of numbers Y by the equation Y = 4X, then the variance of Y must be 16 times that of X. Accordingly, the hypothesis that Y and X are related by the equation Y = 4X can be tested indirectly by comparing the variances of the Y and X variables. This idea generalizes, in various ways, to
several variables interrelated by a group of linear equations. The rules
become more complex, the calculations more difficult, but the basic mes-
sage remains the same: to test whether variables are interrelated through
a set of linear relationships by examining the variances and covariances
of the variables.
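The Y = 4X illustration can be verified directly; a small Python check (illustrative, outside the text's software workflow):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=5.0, scale=2.0, size=100_000)
y = 4 * x  # Y related to X by the equation Y = 4X

# The variance of Y is 16 times the variance of X, so the implied
# relation can be tested by comparing variances alone.
print(round(y.var() / x.var(), 6))  # 16.0
```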
Model fit or goodness-of-fit may be assessed by examining the results
of the analysis, in particular the solution (i.e., parameter estimates, stan-
dard errors, correlations of parameter estimates, squared multiple cor-
relations, coefficients of determination), the overall fit (i.e., chi-square
based and non–chi-squared comparative fit indices), and the detailed
assessment of fit (i.e., standardized residuals and modification indices).
The remaining discussion is organized as follows: (1) assumptions, (2) the procedure, (3) sample-size requirements, (4) strengths and limitations, (5) an annotated example, (6) reporting results, and (7) additional examples of SEM from the applied research literature.
ASSUMPTIONS OF SEM
For SEM, all of the assumptions for MANOVA apply, together with the
following extension of the assumption that the model is specified cor-
rectly. Because SEM is a confirmatory technique, the full model must be
defined a priori. Within the context of SEM, this assumption, termed
identification, specifies the requirements of an appropriate model.
Structural Identification
Model identification is a complex topic and a comprehensive mathemati-
cal discussion is beyond the scope of this book. However, some insight
into identification is essential for researchers to competently perform
structural equation modeling. Essentially, the following discussion
focuses on the t-rule, one of several tests associated with identification.
Other tests associated with identification will be briefly described and
sources of more comprehensive discussion will be recommended. More
extensive discussions of model identification within the context of SEM
are provided by Bollen (1989) and Kline (2011).
A statistical model is structurally identified if the known informa-
tion available implies that there is one best value for each parameter in
the model whose value is not known. Structural models must be identi-
fied for the overall SEM to be identified. That is, a model is identified if
the unknown parameters in the model only are functions of identified
parameters and these functions lead to unique solutions (Bollen, 1989).
Models for which there are an infinite number of possible parameter
estimate values are said to be underidentified. For example, suppose a theoretical model implies that X + Y = 10. One possible solution is that X = 5 and Y = 5, another is that X = 2 and Y = 8, and there are many other possible solutions; that is, there is indeterminacy, or the possibility that the data fit more than one implied theoretical model equally well. If a model is underidentified, then it will remain underidentified regardless of sample size. Models that are not identified should be respecified.
Models in which there is only one possible solution for each param-
eter estimate are said to be just-identified. Finally, models that have more
than one possible solution (but one best or optimal solution) for each
parameter estimate are considered overidentified. Usually, overidentified
[Figure 5.2: a single factor, F1, with two observed variables, V1 and V2, and error terms e1 and e2.]
models are used in SEM because these models allow a researcher to test
statistical hypotheses (Loehlin, 2004).
Heuristics are available to help to determine whether a model is struc-
turally identified. One commonly used heuristic is the t-Rule. This rule
states that there must be more known pieces of information (i.e., inputs)
than unknown pieces of information (i.e., parameters to be estimated) to
calculate a unique solution. If this condition is not satisfied, the model is
not identified. If this condition is satisfied, the model may be identified.
Consider Figure 5.2.
This model contains one factor, F1, two observed variables, V1 and V2,
and two error variances or residuals, e1 and e2. This model requires that
four parameters be estimated: the factor’s variance, the two error variances,
and one factor loading. To estimate the number of inputs available to esti-
mate the aforementioned four parameters, use the following formula:
[Q(Q + 1)] / 2, where Q is the number of observed variables.
[Figure: two factors, F1 and F2, with observed variables V1–V4 and error terms e1–e4.]
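The counting logic of the t-rule is easy to state in code. The sketch below is illustrative; "possibly" reflects the fact that the rule is necessary but not sufficient for identification:

```python
def t_rule(n_observed: int, n_free_params: int) -> str:
    """Compare the number of inputs (distinct variances and covariances
    among the observed variables) with the number of free parameters."""
    inputs = n_observed * (n_observed + 1) // 2
    df = inputs - n_free_params
    if df < 0:
        return "not identified (underidentified)"
    if df == 0:
        return "possibly just-identified"
    return "possibly overidentified"

# One factor with two indicators (Figure 5.2): 3 inputs vs. 4 parameters.
print(t_rule(2, 4))  # not identified (underidentified)

# One factor with three indicators and one loading fixed: 6 inputs vs. 6.
print(t_rule(3, 6))  # possibly just-identified
```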
Empirical Identification
Measurement models must also be empirically identified for the overall
SEM to be identified (Kenny & Judd, 1986). A model in which at least
one parameter estimate is unstable is empirically underidentified. As
discussed above, in SEM, the measurement model describes the rela-
tionships between observed variables and the construct or constructs
those variables are hypothesized to measure. The measurement model
of SEM allows the researcher to evaluate how well his or her observed
(measured) variables combine to identify underlying hypothesized con-
structs. Confirmatory factor analysis is used in testing the measurement
model, and the hypothesized factors are referred to as latent variables.
The measures chosen by the researcher define the latent variables in the
measurement model. A latent variable is defined more accurately to the
extent that the measures that define it are strongly related to one another.
Accordingly, in building measurement models, multiple-indicator mea-
surement models (Hunter & Gerbing, 1982) are preferred because they
allow the most unambiguous assignment of meaning to the estimated
constructs. The reason for this is that with multiple-indicator measure-
ment models, each estimated construct is defined by at least two measures,
118 Analysis of Multiple Dependent Variables
where one item partly “causes” the response to the next item in a
survey);
2. The construct has at least two indicators whose errors are
uncorrelated and either (a) both the indicators of the construct
correlate with a third indicator of another construct but neither
of the two indicators’ errors is correlated with the error of that
third indicator, or (b) the two indicators’ loadings are set equal to
each other; and
3. The construct has one indicator and (a) its error variance is
fixed to zero or some other a priori value (e.g., the quantity one
minus the reliability times the indicator’s variance), or (b) there
is a variable that can serve as an instrumental variable in the
structural model and the error in the indicator is not correlated
with that instrumental variable. Note that for a variable to be a valid instrument, it must be (a) correlated with the endogenous independent variable and (b) affect the dependent variable only through that independent variable.
to the researcher. That is, prior to any data collection or analysis, the
researcher describes a model to be confirmed. Available information is
used to decide which variables to include in the theoretical model, which
implicitly also involves which variables not to include in the model and
how these variables are related. A model will be misspecified to the extent
that the relationships hypothesized do not capture the observed relation-
ships. The following basic rules are used when drawing a model:
Model Estimation
In SEM, the parameters of a proposed model are estimated by minimiz-
ing the discrepancy between the empirical (sample) covariance matrix,
S, and a covariance matrix, Σ, implied by the model (population). When
elements in the matrix S minus the elements in the matrix Σ equal zero
(S − Σ = 0), then one has a perfect fit of the model to the data. The model
estimation process uses a fitting function or estimation procedures to
minimize the difference between Σ and S. Several fitting functions are
available. In AMOS, the following estimation procedures are available:
unweighted or ordinary least squares (ULS or OLS), generalized least
squares (GLS), asymptotically distribution free (ADF), scale-free least
squares (SFLS), and maximum likelihood (ML).
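For example, the ML fitting function is commonly written FML = ln|Σ| + tr(SΣ−1) − ln|S| − p, which equals zero when the model-implied matrix Σ reproduces S exactly. A Python sketch under that standard definition (the matrices below are hypothetical, not from the book's example):

```python
import numpy as np

def f_ml(S: np.ndarray, Sigma: np.ndarray) -> float:
    """ML discrepancy between the sample covariance matrix S and the
    model-implied covariance matrix Sigma (p = number of variables)."""
    p = S.shape[0]
    _, logdet_Sigma = np.linalg.slogdet(Sigma)
    _, logdet_S = np.linalg.slogdet(S)
    return logdet_Sigma + np.trace(S @ np.linalg.inv(Sigma)) - logdet_S - p

S = np.array([[4.0, 1.2],
              [1.2, 2.5]])

print(abs(f_ml(S, S)) < 1e-9)   # True: a perfect fit yields zero
print(f_ml(S, np.eye(2)) > 0)   # True: any misfit is positive
```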
The least-squares criterion minimizes the sum of squared residuals
between the observed and predicted values of y. In the regression setting,
The chi-square (χ2) test has at least two limitations. First, the
chi-square test offers only a dichotomous decision strategy implied by a
statistical decision rule and cannot be used to quantify the degree of fit
along a continuum with some prespecified boundary. Second, as with
most statistics, large sample sizes increase power, resulting in signifi-
cance with small effect sizes. Consequently, a nonsignificant χ2 may be
unlikely, although the model may be a close fit to the observed data.
Despite these limitations, researchers almost universally report the χ2
(Martens, 2005).
GFI = 1 − Cres / Ctot    (5.1)
where Cres and Ctot estimate, respectively, the residual and total variability in the sample covariance matrix. The numerator on the right side of Equation 5.1 is related to the sum of the squared covariance residuals, and the denominator is related to the total sum of squares in the data matrix. GFI is at most 1, and a value of 1 indicates a perfect fit. By convention, GFI should be equal to or greater than .90 to indicate acceptable fit, and GFI > .95 indicates good fit; note that GFI tends to be larger as sample size increases. The GFI is roughly analogous to the multiple R-square in multiple regression because it represents the overall amount of the covariation among the observed variables that can be accounted for by the hypothesized model. One limitation of the GFI is that its expected values vary with sample size. Another limitation is that values of the GFI sometimes fall outside of the range 0–1.
The Comparative Fit Index (CFI) (Bentler, 1990) measures the rela-
tive improvement in the fit of the researcher’s model over that of a base-
line model, typically the independence or null model, which specifies no
relationships among variables. CFI ranges from 0 to 1.0. Values close to 1 indicate a very good fit, and values greater than .90 (or close to .95) indicate good fit; by convention, CFI should be equal to or greater than .90 to accept the model. The formula is
CFI = 1 − [(χ2M − dfM) / (χ2B − dfB)]    (5.2)
where the numerator and the denominator of the expression on the right side of the equation estimate the chi-square noncentrality parameters for, respectively, the researcher’s model and the baseline model. Note that
noncentrality parameters reflect the extent to which the null hypothesis
Structural Equation Modeling 127
is false. For example, the traditional χ2 test assumes that the null hypoth-
esis is true (χ2 = 0) in the population. This test relies on the “central”
distribution of χ2 values. Because the researcher is hoping not to reject
the null hypothesis, it is argued that it is more appropriate to test the
alternative hypothesis (Ha). This test of Ha would rely on a “noncentral”
chi-square distribution that assumes Ha is true in the population. This
approach to model fit uses a chi-square equal to the df for the model
as having a perfect fit (as opposed to χ2 = 0). Thus, the noncentrality
parameter estimate is calculated by subtracting the df of the model from
the chi-square (χ2 − df).
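Equation 5.2, with the noncentrality estimates floored at zero, can be computed directly; the chi-square values below are hypothetical:

```python
def cfi(chi2_model, df_model, chi2_baseline, df_baseline):
    """CFI from the noncentrality estimates max(chi-square - df, 0) of
    the researcher's model and the baseline (independence) model."""
    d_model = max(chi2_model - df_model, 0.0)
    d_baseline = max(chi2_baseline - df_baseline, 0.0)
    if d_baseline == 0.0:
        return 1.0
    return 1.0 - d_model / d_baseline

# A well-fitting model against a poorly fitting baseline:
print(round(cfi(12.0, 10, 880.0, 15), 3))  # 0.998

# A model whose chi-square is below its df is treated as a perfect fit:
print(cfi(8.0, 10, 880.0, 15))  # 1.0
```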
CFI is relatively insensitive to sample size (Fan, Thompson, and Wang,
1999). However, one limitation is that the hypothesis that the researcher’s model is better than the independence model is almost always true. This is because the assumption of zero covariances among the observed
variables is improbable in most studies. Although it is possible to specify
a different, more plausible baseline model—such as one that allows the
exogenous variables only to covary—and compute by hand the value of
an incremental fit index with its equation, this is rarely done in practice.
Widaman and Thompson (2003) describe how to specify more plausible
baseline models.
The root-mean-square error of approximation (RMSEA) is another index that is based on the noncentrality parameter (Steiger,
1990). The formula is
RMSEA = √[(χ2M − dfM) / (dfM(N − 1))]    (5.3)
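Equation 5.3 can likewise be computed directly; the values below are hypothetical, and the noncentrality estimate is floored at zero:

```python
import math

def rmsea(chi2_model, df_model, n):
    """RMSEA from the model chi-square, its df, and the sample size N."""
    ncp = max(chi2_model - df_model, 0.0)
    return math.sqrt(ncp / (df_model * (n - 1)))

print(round(rmsea(85.0, 40, 400), 3))  # 0.053
print(rmsea(30.0, 40, 400))            # 0.0 (chi-square below its df)
```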
the mean absolute correlation residual, the overall difference between the
observed and predicted correlations. The Hu and Bentler (1999) threshold of SRMR ≤ .08 for acceptable fit is not a very demanding standard.
This is because if the average absolute correlation residual is around .08,
then many individual values could exceed this value, which would indi-
cate poor explanatory power at the level of pairs of observed variables.
It is better to actually inspect the matrix of correlation residuals and
describe their pattern as part of a diagnostic assessment of fit than just to
report the summary statistic SRMR.
In addition to considering overall model fit, it is important to con-
sider the significance of estimated parameters, which are analogous to
regression coefficients. As with regression, a model that fits the data quite
well but has few significant parameters would be meaningless. At a mini-
mum, the researcher should inspect model estimates to determine if pro-
posed parameters were significant and in the expected direction.
Model Modification
Rarely is a proposed model the best-fitting model. Consequently, modifi-
cation, also termed respecification, may be needed. This involves adjusting
the estimated model by freeing (estimating) or setting (not estimating)
parameters. Post hoc modifications of the model are often based on
modification indices. Improvement in fit is measured by a reduction in
chi-square, which makes the chi-square fit index less likely to be found
significant (recall a finding of significance corresponds to rejecting the
model as one that fits the data). For each fixed and constrained param-
eter (coefficient), the modification index reflects the predicted decrease
in chi-square if a single fixed parameter or equality constraint is removed
from the model by eliminating its path, and the model is re-estimated.
One arbitrary rule of thumb is to consider eliminating paths associated
with parameters whose modification index exceeds 10. However, another
approach is to eliminate the parameter with the largest modification index, and then to examine the effect as measured by the chi-square fit index.
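A single modification of this kind is typically evaluated with a chi-square difference test on 1 df. The fit statistics below are hypothetical; with 1 df, the p-value can be computed from the complementary error function, so no statistics library is needed:

```python
import math

def chi2_sf_1df(x: float) -> float:
    """P(chi-square with 1 df > x), via P = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))

# Hypothetical fit before and after freeing one parameter (df drops by 1).
chi2_before, df_before = 96.4, 24
chi2_after, df_after = 84.1, 23

delta_chi2 = chi2_before - chi2_after   # about 12.3, on 1 df
p_value = chi2_sf_1df(delta_chi2)

# A significant drop suggests the freed parameter improves fit, although
# such data-driven changes capitalize on chance, as noted above.
print(p_value < .05)  # True
```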
Modification is a controversial topic, which has been likened to the
debate about post hoc comparisons in ANOVA (MacCallum & Austin,
2000; McDonald & Ho, 2002). The suggested modifications, however,
may or may not be supported on theoretical grounds. As with ANOVA
and regression, problems with model modification include capitalization
on chance and results that are specific to a sample because they are data
driven. Although there is disagreement regarding the acceptability of
post hoc model modification, statisticians and applied researchers alike
emphasize the need to clearly state when there was post hoc modification
rather than imply that analyses were a priori.
Researchers are urged not to make too many changes based on modi-
fication indices, even if such modifications seem sensible on theoretical
grounds. Note that SEM takes a confirmatory approach to model testing;
one does not try to find the best model or theory via data using SEM.
Rather than data-driven post hoc modifications (which may be very
inconsistent over repeated samples), it is often more defensible to con-
sider multiple alternative models a priori. That is, multiple models (e.g.,
based on competing theories or different sides of an argument) should
be specified prior to model fitting, and the best-fitting model should be
selected among the alternatives. Because a more complex model, assum-
ing it is identified, will generally produce better fit, and different models
can produce the same fit, theory is imperative in model testing.
In conclusion, it is worth noting that although SEM allows the test-
ing of causal hypotheses, a well-fitting SEM model does not and can-
not prove causal relations without satisfying the necessary conditions for
causal inference (e.g., time precedence, robust relationship in the pres-
ence or absence of other variables). A selected well-fitting model in SEM
is like a retained null hypothesis in conventional hypothesis testing; it
remains plausible among perhaps many other models that are not tested
but may produce the same or better level of fit. SEM users are cautioned
not to make unwarranted causal claims. Replications of findings with
independent samples are recommended, especially if the models are
obtained with post hoc modifications.
ANNOTATED EXAMPLE
A researcher plans to examine the relationship between factors that influ-
ence postadoption service utilization and positive adoption outcomes.
Specifically, the study tests a model that links (1) factors influencing the
utilization of postadoption services (parents’ perceptions of self-efficacy,
relationship satisfaction between parents, and attitudes toward adoption)
with (2) service utilization, and (3) positive adoption outcomes (satisfac-
tion with parenting and satisfaction with adoption agency). See Figure 5.5.
Note that latent variables are not included in the current model, since
[Figure 5.5: path diagram of the hypothesized model. Self-efficacy, relationship satisfaction, and attitude toward adoption predict service utilization, which in turn predicts satisfaction with parenting and satisfaction with the adoption agency; e1 and e2 are error terms.]
v(v + 1)/2,
The output is as follows (see Figure 5.6) and the estimated sample size
equals 408.
An alternative strategy for estimating sample size for SEM is provided
by the following webpage: http://timo.gnambs.at/en/scripts/powerforsem. This webpage generates syntax for SPSS and R to estimate sample size for various measures of model fit, including RMSEA, GFI, and AGFI.
This webpage http://www.datavis.ca/sasmac/csmpower.html provides an
SAS macro, csmpower, to calculate retrospective or prospective power
computations for SEM using the method of MacCallum and Browne
(1993). Their approach allows for testing a null hypothesis of “not-good-
fit,” so that a significant result provides support for good fit. Effect size
in this approach is defined in terms of a null hypothesis and alterna-
tive hypothesis value of the root-mean-square error of approximation
(RMSEA) index. These values, together with the degrees of freedom (df)
for the model being fitted, the sample size (n), and error rate (alpha),
allow power to be calculated.
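This style of power computation can be sketched with SciPy's noncentral chi-square distribution. The defaults below (null RMSEA of .05, alternative of .08, alpha of .05) are illustrative choices for a test of close fit, not values taken from the csmpower macro:

```python
from scipy.stats import ncx2

def rmsea_power(df, n, eps0=0.05, eps_a=0.08, alpha=0.05):
    """Power for a test of close fit (H0: RMSEA <= eps0) against an
    alternative RMSEA of eps_a, given model df and sample size n."""
    nc0 = (n - 1) * df * eps0 ** 2      # noncentrality under H0
    nc_a = (n - 1) * df * eps_a ** 2    # noncentrality under Ha
    crit = ncx2.ppf(1 - alpha, df, nc0)
    return ncx2.sf(crit, df, nc_a)

# Power grows with sample size for fixed df:
p_small = rmsea_power(df=50, n=100)
p_large = rmsea_power(df=50, n=500)
print(0 < p_small < p_large < 1)  # True
```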
1. Selects the data set that will be used to test the model by
clicking File Data Files File Name to browse to and select the
file;
2. Draws the model diagram by using the Diagram dropdown
menu (see Figure 5.7);
3. Draws observed variables by selecting Draw Observed, then uses
the cursor to draw the five observed variables (rectangles) in the
model;
4. Names an observed variable by right-clicking on a rectangle, and
under text tab, adds the variable name;
5. Draws latent variables (not included in the current model) by selecting Draw Unobserved, and then uses the cursor to draw the latent variables (ellipses) in the model;
6. Names a latent variable by right-clicking on an ellipse, and under the text tab, adding the variable name;
7. Draws paths by selecting Draw Path, and using the cursor;
8. Draws error terms by selecting Draw Unique Variable and using
the cursor; and
9. Names an error term by right-clicking a rectangle, and under text
tab adding the error term’s name.
Once the model is illustrated,
10. Clicks View Analysis Properties Estimation tab;
11. Selects Maximum likelihood;
12. Selects Fit saturated and independence models;
13. Clicks Output tab;
14. Selects Standardized estimates;
15. Selects Residual moments;
16. Selects Modification indices;
17. Selects Indirect, direct, and total effects;
18. Selects Covariances of estimates;
19. Selects Correlations of estimates;
20. Selects Tests for normality and outliers;
21. Once the model is illustrated and analysis properties are selected,
clicks Analyze, Calculate Estimates; and
the model. By this criterion the present model is not rejected. Recall that
nonsignificant chi-square (e.g., p > .05) indicates that the parameters that
were estimated for the model fit the data (Please see Figure 5.8).
CMIN/DF is the χ2 test divided by the current model’s degrees of
freedom (df). Some researchers allow values as large as 5 as being an
adequate fit, but conservative use calls for rejecting models with rela-
tive chi-square greater than 3. By this criterion the current model is not
rejected (Please see Figure 5.8).
One important problem with chi-square is that, with large samples, significance is easy to obtain. Given that large samples are recommended for the SEM technique, this presents a dilemma. One solution has been to develop what are called fit indexes, which are based on the chi-square but which control in some way for sample size.
RMR is the Root-Mean-Square Residual, which is the square root of
the mean squared amount by which the sample covariances differ from
the estimated covariances, estimated on the assumption that your model
is correct; the smaller the RMR, the better the fit. An RMR of zero indi-
cates a perfect fit. The closer the RMR to 0 for a model being tested, the
better the model fit, and an RMR value smaller than .05 suggests good fit.
For these data, RMR equals 9.370, which suggests that the current model
is not a good fit (Please see Figure 5.9).
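RMR as described can be computed from the distinct (lower-triangular) elements of the residual matrix S − Σ̂; the matrices below are hypothetical:

```python
import numpy as np

def rmr(S: np.ndarray, Sigma: np.ndarray) -> float:
    """Root-mean-square residual over the distinct elements
    (lower triangle, including the diagonal) of S - Sigma."""
    resid = S - Sigma
    tril = resid[np.tril_indices_from(resid)]
    return float(np.sqrt(np.mean(tril ** 2)))

S = np.array([[10.0, 3.0],
              [3.0, 8.0]])
Sigma_hat = np.array([[10.0, 2.0],
                      [2.0, 8.0]])

print(round(rmr(S, Sigma_hat), 4))  # 0.5774
```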
GFI is the Goodness of Fit Index. GFI varies from 0 to 1, but theoreti-
cally can yield meaningless negative values. By convention, GFI should
be equal to or greater than .90 to accept the model. By this criterion the
present model (GFI = .979) is accepted (Please see Figure 5.9).
AGFI is the Adjusted Goodness of Fit Index. AGFI is a variant of GFI.
AGFI also varies from 0 to 1, but theoretically can yield meaningless neg-
ative values. AGFI should also be at least .90. By this criterion the present
model (AGFI = .913) is accepted (Please see Figure 5.9).
Model                 NPAR    CMIN     DF    P      CMIN/DF
Saturated model         21    .000      0
Independence model       6    87.629   15    .000   5.842

Model                 RMR     GFI     AGFI    PGFI
Default model         9.370   .979    .913    .233
not reject the null hypothesis and conclude that RMSEA is no greater
than .05 (Please see Figure 5.14).
AIC is the Akaike Information Criterion. AIC is not standardized and is not interpreted for a given model in isolation; its absolute value has relatively little meaning. Rather, the focus is on relative size: for two models estimated from the same data set, the model with the smaller AIC is preferred. AIC makes the researcher pay a “penalty” for every parameter that is estimated (Please see Figure 5.15).
BCC is the Browne–Cudeck Criterion, also called the Cudeck and Browne single-sample cross-validation index. Like AIC, BCC is interpreted comparatively rather than against a fixed cutoff, and it penalizes model complexity (lack of parsimony) somewhat more than AIC does. For two models estimated from the same data set, the model with the smaller BCC is preferred (Please see Figure 5.15).
BIC is the Bayes Information Criterion, also known as the Schwarz Bayesian Criterion (SBC). BIC penalizes for sample size as well as model complexity; specifically, BIC penalizes additional model parameters more severely than does AIC. For two models estimated from
the same data set, the model with the smaller BIC is to be preferred
(Please see Figure 5.15).
CAIC is the Consistent AIC Criterion, which also penalizes for sample size as well as model complexity (lack of parsimony). Its penalty per parameter is greater than that of AIC, BCC, or BIC. For two models estimated from the same data set, the model with the smaller CAIC is preferred (Please see Figure 5.15).
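The chi-square-based forms of these criteria can be sketched and compared across the two models in the output. The default-model chi-square (6.5) and parameter count (16) are inferred from the chapter's output rather than reported directly; the independence-model values (87.629, q = 6) appear in the text:

```python
import math

# Hedged sketch of the information criteria in their chi-square form:
#   AIC  = chi2 + 2q
#   BIC  = chi2 + q * ln(N)
#   CAIC = chi2 + q * (ln(N) + 1)
# where q is the number of estimated parameters and N the sample size.
# Default-model chi2 = 6.5 and q = 16 are assumptions inferred from the
# reported CMIN/DF = 1.30 with N = 408.

def aic(chi2, q):
    return chi2 + 2 * q

def bic(chi2, q, n):
    return chi2 + q * math.log(n)

def caic(chi2, q, n):
    return chi2 + q * (math.log(n) + 1)

N = 408
models = {"default": (6.5, 16), "independence": (87.629, 6)}
for name, (chi2, q) in models.items():
    print(name, round(aic(chi2, q), 1),
          round(bic(chi2, q, N), 1), round(caic(chi2, q, N), 1))
```

Under these assumed inputs the default model yields the smaller value on every criterion and so would be preferred, and CAIC's per-parameter penalty (ln N + 1) visibly exceeds BIC's (ln N).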
ECVI is the Expected Cross-Validation Index. It is another variant on
AIC. For two models estimated from the same data set, the model with
the smaller ECVI is to be preferred. MECVI is the Modified Expected
Cross-Validation Index. It is a variant on BCC. MECVI is a variant on
BCC, differing in scale factor. Compared to ECVI, a greater penalty is
imposed for model complexity. Lower is better between models. For two
models estimated from the same data set, the model with the smaller
MECVI is to be preferred (Please see Figure 5.16).
Hoelter’s critical N, or Hoelter index, is the largest sample size at which the researcher would accept the model at the .05 or .01 level. This offers a perspective on the chi-square fit statistic, which has the problem that the larger the sample size, the more likely the rejection of the model and the more likely a Type I error. In this case, the actual sample size was 408 and the model was accepted. Had the sample size been only 164, the model would also have been accepted at the .05 level (Please see Figure 5.17).
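The index can be sketched from the chi-square output. The model chi-square (6.5) and df (5) below are inferred, not reported; 11.0705 is the standard .05 critical value of chi-square with 5 degrees of freedom:

```python
# Hedged sketch: Hoelter's critical N is the largest sample size at which
# the obtained amount of misfit would still not be rejected,
#   N_crit = chi2_crit / F_min + 1,   with F_min = chi2 / (N - 1).
# chi2 = 6.5 and df = 5 are assumptions inferred from the chapter's output;
# rounding conventions vary slightly across software.

def hoelter_n(chi2, n, chi2_crit):
    f_min = chi2 / (n - 1)  # minimized discrepancy per observation
    return int(chi2_crit / f_min) + 1

CHI2_CRIT_05_DF5 = 11.0705  # .05 critical value, df = 5
print(hoelter_n(6.5, 408, CHI2_CRIT_05_DF5))
```

Under these assumed inputs the critical N comfortably exceeds the actual sample of 408, which is consistent with the model's acceptance at the .05 level.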
SRMR is the Standardized Root-Mean-Square Residual; SRMR ≤ .08 suggests good fit. For these data, SRMR equals .0513; therefore, the model is a fair to good fit. Additionally, the matrix of correlations was inspected, and no unusual coefficients were observed. No modification index was greater than zero, and, consequently, no model modifications were suggested.
In addition to considering overall model fit, it is important to consider the significance of estimated parameters, which are analogous to regression coefficients. As with regression, a model that fits the data quite well but has few significant parameters would be meaningless.
[Path diagram with standardized estimates relating Self-Efficacy, Attitude toward Adoption, Satisfaction with Relationship, Satisfaction with Adoption Agency, Service Utilization, and Parenting Satisfaction (error terms e1–e3).]
RESULTS
The hypothesized model is described in Figure 5.5. An SEM analysis was
performed on hypothetical data from 408 clients of an adoption agency
(one mean score for each adoptive couple for each concept). The analysis
was performed using AMOS version 18. The assumptions of no missing
data, multivariate normality, linearity, absence of multivariate outliers, and absence of perfect multicollinearity were evaluated. Each of the aforementioned assumptions seems tenable.
The hypothesized model was tested with maximum likelihood esti-
mation. The hypothesized model appears to be a good fit to the data.
P(CMIN) equals .261, and the present model is not rejected. CMIN/
DF equals 1.30. By this criterion the current model is not rejected. GFI
equals .979. By this criterion the present model is not rejected. CFI equals
.979. By this criterion the present model is not rejected. An RMSEA between .00 and .05 indicates close fit; as PCLOSE equals .394, we do not reject the null hypothesis and conclude that RMSEA is no greater than .05. By this criterion the present model is not rejected. SRMR ≤ .08 suggests good fit; SRMR equals .0513, so the model is a fair to good fit. Additionally, the matrix of correlations was inspected, and no unusual coefficients were observed. No modification index was greater than zero, and, consequently, no model modifications were suggested.
Structural Equation Modeling 145
Fandrem, H., Strohmeier, D., & Roland, E. (2009). Bullying and victimization
among native and immigrant adolescents in Norway: The role of proactive and
reactive aggressiveness. The Journal of Early Adolescence, 29(6), 898–923.
This study compares levels of bullying others, victimization, and aggressiveness
in native Norwegian and immigrant adolescents living in Norway and shows how
bullying is related to proactive and reactive aggressiveness. The sample consists
of 2,938 native Norwegians and 189 immigrant adolescents in school grades 8, 9,
and 10. Data were collected via self-assessments. SEMs were conducted separately
for girls and boys in both groups. The levels of victimization and of reactive and proactive aggressiveness were the same for both native Norwegians and immigrant adolescents, but there was a significant difference in the levels of bullying others.
Compared with the native Norwegians, immigrant adolescents were found to be
at higher risk of bullying others. Structural models revealed significantly stronger
relations between affiliation-related proactive aggressiveness and bullying others
in immigrant boys compared with the other groups. This indicates that the wish
for affiliation is an important mechanism of bullying others in immigrant boys.
The authors also discuss further research directions and the practical importance of the findings for prevention efforts targeting immigrant adolescents.
Owens, T. J. (2009). Depressed mood and drinking occasions across high school:
comparing the reciprocal causal structures of a panel of boys and girls. Journal
of Adolescence, 32(4), 763–780.
Does adolescent depressed mood portend increased or decreased drink-
ing? Is frequent drinking positively or negatively associated with emotional
well-being? Do the dynamic relations between depression and drinking differ
by gender? Using block-recursive SEMs, we explore the reciprocal short-term
effects (within time, t) and the cross-lagged medium-term effects (t + 1 year)
and long-term effects (t + 2 years) of depressed mood and monthly drinking
occasions. Data come from the high school waves of the Youth Development
148 Analysis of Multiple Dependent Variables
Choosing among Procedures for the Analysis of Multiple Dependent Variables
Comparison of MANOVA, MANCOVA, MMR, and SEM:

Research Objective: MANOVA focuses on mean differences; MANCOVA focuses on mean differences while controlling for other variables that may affect these differences; MMR tests models that focus on relationships between a set of IVs and a set of DVs; SEM tests models that focus on individual paths between exogenous and endogenous variables.

Important Assumptions: GLM for all four procedures.

Variables Modeled: MANOVA, MANCOVA, and MMR model multiple independent (IV) and multiple dependent (DV) variables; SEM models multiple exogenous and multiple endogenous variables.

Level of Measurement: For MANOVA and MANCOVA, IVs are nominal and DVs are interval/ratio (covariates for MANCOVA are interval/ratio); for MMR, IVs may be at any level and DVs are interval/ratio; for SEM, exogenous and endogenous variables may be at any level.

Minimum Sample Size: Smaller for MANOVA, MANCOVA, and MMR; larger for SEM.

Criteria to Evaluate the Utility of the Model Tested: Eta squared for MANOVA and MANCOVA; multiple R-squared for MMR; various for SEM, including model chi-square, CFI, and RMSEA.

Modeling Capabilities: MANOVA models the path of all IVs to a linear combination of all DVs; MANCOVA does the same while controlling for other variables that may affect these differences; MMR models the path of all IVs to each DV; SEM models paths to observed and latent endogenous variables.

Type of Variance Modeled: Common for MANOVA, MANCOVA, and MMR; common, specific, and error for SEM.

Moderating Relationships Modeled? Yes for all four procedures.

Mediating Relationships Modeled? No for MANOVA and MANCOVA; yes for MMR and SEM.

Causation Established? No for all four procedures.
Salas (1993), for example, have argued that the choice between MANOVA
and SEM should be guided by the question under investigation and by
the type of DV system being modeled.
More specifically, both SEM and MANOVA may be used to test mod-
els that link latent variables (or factors) and the empirical indicators of
those latent variables. But, the type of link between latent variables and
empirical indicators differs for the two approaches. In SEM models, the
direction is from the latent variables to the indicators; in contrast, in
MANOVA models, the direction is from the empirical indicators to the
latent variables. Following Bollen (1989) and Bollen and Lennox (1991), in SEM, empirical indicators of a latent variable are referred to as effect indicators, and in MANOVA, empirical indicators of a latent variable are referred to as cause indicators. In deciding which analysis to conduct, it is crucial to distinguish between DVs that are affected by latent variables and DVs that cause latent variables.
While it may seem reasonable to expect indicators of the same latent variable to be positively related to each other, this need not hold. More specifically, empirical indicators should be related to each other if they are the effects of a latent
indicators should be related to each other if they are the effects of a latent
variable. For example, to measure self-esteem, a person may be asked to
indicate whether he or she agrees or disagrees with the statements: (1) I
believe that I am a good person; and (2) I am happy with who I am. A per-
son with high self-esteem should agree with both statements; in contrast,
a person with low self-esteem would probably disagree with both state-
ments. Because each indicator depends on or is caused by self-esteem,
both of the aforementioned indicators should be positively correlated
with each other. That is, indicators that depend on the same variable
should be associated with one another if they are valid measures (Rubin,
2010). In contrast, when indicators are the cause rather than the effect of a latent variable, these indicators may correlate positively, correlate negatively, or be uncorrelated (Rubin, 2010). For example, gender and
race could be used as indicators of the variable “exposure to discrimina-
tion.” Being nonwhite or female increases the likelihood of experiencing
discrimination, so both are good indicators of the variable. But, race and
gender of individuals would not be expected to be strongly associated.
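This distinction is easy to demonstrate by simulation. The sketch below is illustrative and not from the book: two effect indicators that both depend on the same latent variable end up correlated, whereas two independent cause indicators of a latent outcome need not be related to each other at all:

```python
import random

# Illustrative simulation: effect indicators share variance through the
# latent variable they measure; cause indicators (e.g., independent
# background traits that both feed "exposure to discrimination") do not.

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

random.seed(0)
n = 5000
latent = [random.gauss(0, 1) for _ in range(n)]

# Effect indicators: each equals the latent variable plus unique error.
eff1 = [score + random.gauss(0, 1) for score in latent]
eff2 = [score + random.gauss(0, 1) for score in latent]

# Cause indicators: generated independently of each other.
cause1 = [random.gauss(0, 1) for _ in range(n)]
cause2 = [random.gauss(0, 1) for _ in range(n)]

print(round(pearson_r(eff1, eff2), 2))     # substantial positive correlation
print(round(pearson_r(cause1, cause2), 2))  # near zero
```

The effect indicators correlate because the latent variable is a common cause; the cause indicators do not, even though both influence the latent outcome, which is exactly why inter-item correlation is evidence of validity only for effect indicators.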
In summary, in SEM, empirical indicators (i.e., measured variables)
are hypothesized to be linear combinations of latent variables plus error;
154 Analysis of Multiple Dependent Variables
that is, arrows are directed from latent to empirical indicators. In SEM,
measured variables are effect indicators within a latent variable sys-
tem in that the indicators are affected by the factors (Bollen & Lennox,
1991; MacCallum & Browne, 1993). In MANOVA and MANCOVA,
researchers may hypothesize that a latent variable (e.g., program effec-
tiveness) is a linear combination of measured variables (e.g., increased
client self-efficacy, congruence between client expectations and program
performance, and client reports of program responsiveness in terms of
respectful and timely worker behaviors). Under these conditions, mea-
sured variables may be best conceptualized as cause indicators (Bollen &
Lennox, 1991; MacCallum & Browne, 1993).
associated with the use of multivariate procedures. The use of too many DVs may reduce power or result in spurious findings due to chance.
The proper selection of an analytical strategy is a crucial part of a research study. Which strategy is most appropriate depends on the: (1) purpose of the analysis; (2) sample size; (3) tenability of assumptions; and (4) type of DV system being modeled. These four issues have been discussed for MANOVA, MANCOVA, MMR, and SEM, which were presented as alternative statistical procedures for analyzing models with more than one DV.
References
Cook, J. A., Razzano, L., & Cappelleri, J. C. (1996). Canonical correlation anal-
ysis of residential and vocational outcomes following psychiatric rehabilitation.
Evaluation and Program Planning, 19(4), 351–363.
Cooley, W. W., & Lohnes, P. R. (1971). Evaluation research in education. New York:
Irvington.
Cottingham, K. L., Lennon, J. T., & Brown, B. L. (2005). Knowing when to draw
the line: designing more informative ecological experiments. Frontiers in
Ecology and the Environment, 3(3), 145–152.
Cox, D. R., & Small N. J. H. (1978). Testing multivariate normality. Biometrika
65(2), 263–272.
Cramer, E. M., & Nicewander, W. A. (1979). Some symmetric, invariant measures
of multivariate association. Psychometrika, 44, 43–54.
Chrisman, N. R. (1998). Rethinking levels of measurement for cartography.
Cartography and Geographic Information Science, 25(4), 231–242.
DeCarlo, L. T. (1997). On the meaning and use of kurtosis. Psychological Methods,
2(3), 292–307.
Draper, N. R., & Smith, H. (1998). Applied regression analysis. New York: John
Wiley.
Draper, N. R., Guttman, I., & Lapczak, L. (1979). Actual rejection levels in a cer-
tain stepwise test. Communications in Statistics, 8, 99–105.
Duffy, M. E., Wood, R. Y., & Morris, S. (2001). The influence of demograph-
ics, functional status and comorbidity on the breast self-examination profi-
ciency of older African-American women. Journal of National Black Nurses
Association, 12(1), 1–9.
Edgeworth, F. Y. (1886). Progressive means. Journal of the Royal Statistical Society,
49, 469–475.
Enders, C. K. (2003). Performing multivariate group comparisons following a
statistically significant MANOVA. Measurement and Evaluation in Counseling
and Development, 36, 40–56.
Enders, C. K. (2010). Applied missing data analysis. New York: The Guilford
Press.
Fan, X. (1997). Canonical correlation analysis and structural equation modeling:
What do they have in common? Structural Equation Modeling, 4, 65–79.
Fan, X., Thompson, B., & Wang, L. (1999). The effects of sample size, estima-
tion methods, and model specification on SEM fit indices. Structural Equation
Modeling, 6, 56–83.
Finch, W. H. (2007). Performance of the Roy-Bargmann stepdown procedure as
a follow up to a significant MANOVA. Multiple Regression Viewpoints, 33(1),
12–22.
Finn, J. D. (1974). A general model for multivariate analysis. New York: Holt.
Larsen, J. J., & Juhasz, A. M. (1985). The effects of knowledge of child develop-
ment and social-emotional maturity on adolescent attitudes toward parenting.
Adolescence, 20(80), 823–39.
Leary, M. R., & Altmaier, E. M. (1980). Type I error in counseling research: A plea
for multivariate analyses. Journal of Counseling Psychology, 27, 611–615.
Loehlin, J. C. (2004). Latent variable models: An introduction to factor, path, and
structural equation analysis. Mahwah, NJ: Erlbaum.
Little, R. J., & Rubin, D. B. (2002). Statistical analysis with missing data. New York:
John Wiley & Sons.
Little, T. D., Cunningham, W. A., Shahar, G., & Widaman, K. F. (2002). To par-
cel or not to parcel: Exploring the question, weighing the merits. Structural
Equation Modeling, 9, 151–173.
Looney, S. W. (1995). How to use tests for univariate normality to assess multi-
variate normality. The American Statistician, 49(1), 64–70.
Lorenz, F. O. (1987). Teaching about influence in simple regression. Teaching Sociology,
15(2), 173–177.
Lynch, S. M., & Graham-Bermann, S. A. (2004). Exploring the relationship
between positive work experiences and women’s sense of self in the context of
partner abuse. Psychology of Women Quarterly, 28(2), 159–167.
MacCallum, R. C., & Austin, J. T. (2000). Applications of structural equation mod-
eling in psychological research. Annual Review of Psychology, 51, 201–226.
MacCallum, R. C., & Browne, M. W. (1993). The use of causal indicators in covariance
structure models: Some practical issues. Psychological Bulletin, 114, 533–541.
MacCallum, R. C., Wegener, D. T., Uchino, B. N., & Fabrigar L. R. (1993). The
problem of equivalent models in applications of covariance structure analysis.
Psychological Bulletin, 114, 185–99.
MacCallum, R. C., Widaman, K. F., Zhang, S., & Hong, S. (1999). Sample size in
factor analysis. Psychological Methods, 4, 84–99.
Mallows, C. L. (1973). Some comments on CP. Technometrics, 15(4), 661–675.
Marsh, H. W., & O’Neill, R. (1984). Self Description Questionnaire III: The con-
struct validity of multidimensional self-concept ratings by late adolescents.
Journal of Educational Measurement, 21, 153–174.
Martens, M. P. (2005). Future directions of structural equation modeling in
counseling psychology. The Counseling Psychologist, 33(3), 375–382.
Menard, S. (1995). Applied logistic regression analysis. Thousand Oaks, CA: Sage
Publications.
Satorra, A., & Bentler, P. (1994). Corrections to test statistics and standard
errors in covariance structure analysis. In A. von Eye and C.C. Clogg (eds.),
Latent variable analysis: Applications to developmental research (pp. 399–419).
Newbury Park: Sage.
Schau, C., Stevens, J., Dauphinee, T. L., & Del Vecchio, A. (1995). The develop-
ment and validation of the Survey of Attitudes toward Statistics. Educational
and Psychological Measurement, 55, 868–875.
Scheffé, H. (1953). A method for judging all contrasts in the analysis of variance.
Biometrika, 40, 87–104.
Schmidt, F. L., & Hunter, J. E. (1997). Eight common but false objections to
the discontinuation of significance testing in the analysis of research data.
In L. L. Harlow, S. A. Mulaik, & J. H. Steiger (Eds.), What if there were no
significance tests? (pp. 37–64). Mahwah, NJ: Erlbaum.
Schuster, C. (1998). Regression analysis for social sciences. New York: Academic
Press.
Seaman, M. A., Levin, J. R., & Serlin, R. C. (1991). New developments in pair-
wise multiple comparisons: Some powerful and practicable procedures.
Psychological Bulletin, 110, 577–586.
Searle, S. (1987). Linear models for unbalanced data. New York: John Wiley &
Sons, Inc.
Shapiro, S. S., & Wilk, M. B. (1965). An analysis of variance test for normality
(complete samples). Biometrika, 52, 591–611.
Shevlin, M., Miles, J. N. V., & Bunting B. P. (1997). Summated rating scales: A
Monte Carlo investigation of the effects of reliability and collinearity in regres-
sion models. Personality and Individual Differences, 23, 665–676.
Sokal, R. R., & Rohlf, F. J. (1995). Biometry. New York, NY: WH Freeman.
Small, N. J. H. (1980). Marginal skewness and kurtosis in testing multivariate
normality. Applied Statistics, 29, 85–87.
Smith, S. P., & Jain, A. K. (1988). A test to determine the multivariate normality
of a dataset. IEEE Transactions on Pattern Analysis and Machine Intelligence,
10(5), 757–761.
Smithson, M. (2003). Confidence intervals. Thousand Oaks, CA: Sage.
Sorbom, D. (1981). Structural equation models with structured means. In K.
G. Joreskog & H. Wolds, (Eds.), Systems under indirect observation: Causality,
structure and prediction (pp. 23–69). Amsterdam: North-Holland.
Srivastava, M. S. (1984). A measure of skewness and kurtosis and a graphical
method for assessing multivariate normality. Statistics and Probability Letters,
2, 263–276.
Steiger, J. H. (1990). Structural model evaluation and modification: An interval
estimation approach. Multivariate Behavioral Research, 25, 173–180.
Stevens, J. P. (1972). Four methods of analyzing between variation for the K-group
MANOVA problem. Multivariate Behavioral Research, 7, 499–522.
Stevens, J. P. (1973). Step-down analysis and simultaneous confidence intervals in
MANOVA. Multivariate Behavioral Research, 8(3), 391–402.
Stevens, J. P. (1980). Power of the multivariate analysis of variance tests.
Psychological Bulletin, 88(3), 728–737.
Stevens, J. (1996). Applied multivariate statistics for the social sciences. New York:
Routledge.
Stevens, J. (2002). Applied multivariate statistics for the social sciences. New York:
Routledge.
Stevens, J. (2009). Applied multivariate statistics for the social sciences. New York:
Routledge.
Stevens, S. S. (1946). On the theory of scales of measurement. Science, 103,
677–680.
Stevens, S.S. (1951). Mathematics, measurement and psychophysics. In S. S.
Stevens (Ed.), Handbook of experimental psychology (pp. 1–49). New York:
Wiley.
Student. (1908). The probable error of a mean. Biometrika, 6, 1–25.
Subbaiah, P., & Mudholkar, G. S. (1978). A comparison of two tests for the sig-
nificance of a mean vector. Journal of the American Statistical Association, 73,
414–418.
Tabachnick, B. G., & Fidell, L. S. (2007). Using multivariate statistics. Boston:
Allyn and Bacon.
Takane, Y., & Hwang, H. (2005). On a test of dimensionality in redundancy analy-
sis. Psychometrika, 70(2), 271–281.
Tang, T., & Kim, J. K. (1999). The meaning of money among mental health work-
ers: The endorsement of money ethic as related to organizational citizenship,
behavior, job satisfaction, and commitment. Public Personnel Management,
28(1), 15–26.
Thompson, B. (1984). Canonical correlation analysis: Use and interpretation.
Beverly Hills, CA: Sage.
Thompson, B. (1991). A primer on the logic and use on canonical correlation
analysis. Measurement and Evaluation in Counseling and Development, 24(2),
80–95.
Timm, N. H. (1975). Multivariate analysis with applications in education and psy-
chology. Monterey, California: Brookes-Cole.
Tintner, G. (1950). Some formal relations in multivariate analysis. Journal of the
Royal Statistical Society, Series B (Methodological), 12, 95–101.
Tomarken, A. J., & Waller, N. G. (2005). Structural equation modeling: Strengths,
limitations, and misconceptions. Annual Review of Clinical Psychology, 1,
31–65.