Uses and Abuses of The Analysis of Covariance


Uses and abuses of ANCOVA 1

Uses and Abuses of the Analysis of Covariance


(Research in Nursing and Health, 1998, 21(4), 557-562.)

Steven V. Owen, PhD


Professor of Nursing

Robin D. Froman, PhD, FAAN


Associate Dean for Research, School of Nursing

University of Texas Medical Branch at Galveston

Reprint requests or correspondence to:


Steven V. Owen
School of Nursing
University of Texas Medical Branch
301 University Blvd.
Galveston, TX 77555-1029

Abstract
The analysis of covariance (ANCOVA) is a powerful analytic tool, but there
continue to be abuses of the method. We review assumptions and illustrate
legitimate uses of ANCOVA, and summarize statistical packages’ approach to the
method. Finally, we consider how ANCOVA is used in contemporary nursing
research.

As many statistics books point out, the analysis of covariance (ANCOVA) has
two primary purposes: (a) to improve the power of a statistical analysis by
reducing error variance, and (b) to statistically "equate" comparison groups. The
first purpose operates well when participants are randomly assigned to their
groups. But using ANCOVA with intact or pre-existing groups can have the
opposite effect, a reduction in statistical power. The second purpose usually
accompanies non-random group comparisons, and analysts apply ANCOVA to
make the group comparisons more “fair.”
In this paper, we review the merits and demerits of these claims for
ANCOVA. More specifically, we explore various ANCOVA pitfalls that can
deliver misleading results for the unwary analyst, and review appropriate uses of
ANCOVA. We also show how statistical packages (BMDP, SPSS, SAS, and
SYSTAT) differ in their approach to ANCOVA. Though our focus is on the
conventional ANOVA formulation, for researchers who subscribe to Cohen’s
(1968) idea that regression analysis can do (just about) anything, our remarks
apply to regression models as well. In fact, regression models may be more
vulnerable to ANCOVA problems because independent variables often serve as
covariates whether or not the researcher intended them to take that role.
When Sir Ronald Fisher invented the ANCOVA model in the 1930s, he took
random assignment and experimental control for granted. Fisher had been studying
agricultural methods, and random assignment was easy to arrange. The point of his
invention was to enhance the precision of the statistical analysis. Today, ANCOVA
is used routinely with quasi-experimental data where treatments cannot—because
of expense, ethical concerns, or general disruptiveness—be randomly assigned to
participants. The inability to assign participants to treatments is particularly
evident in health care research. For example, in comparing lung vital capacity in
smokers and nonsmokers, participants self-select into the two
comparison groups. If the researcher thinks that age might be a confounding
variable, age might be assigned to a covariate role. Whether that decision is a good
or bad one depends largely on two ANCOVA assumptions.
The first statistical assumption is that each covariate is uncorrelated with the
other independent variables. In the smoking example, is age correlated with the
independent variable, groups? If the correlation is non-zero, then removing the
variance associated with age will also remove some of the variance associated with
the grouping variable. This in effect leaves less of the dependent variable’s (lung
vital capacity) variance to be accounted for by the independent variable (smoking).
Figure 1 illustrates the situation. Notice that the covariate, age, overlaps with
smoking status (arrowed portion), absorbing some of smoking’s relationship with
lung vital capacity.

[Diagram labels: IV: SMOKING; COV: AGE; DV: LUNG VITAL CAPACITY]

Figure 1. A covariate erodes the effect of another independent variable

In the frequent case where ANCOVA is arranged specifically to “equate”
groups that differ on some pretest measure, the analyst has automatically
violated the assumption. Does that make any difference? Wu and Slakter (1989),
discussing ANCOVA in nursing research, showed no hesitation in recommending
the technique to adjust for pre-existing group differences; whereas Pedhazur and
Schmelkin (1991, p. 283) remarked that the approach “is fraught with serious
biases and threats to validity.” Our position is more aligned with Pedhazur and
Schmelkin’s.
The second statistical assumption for ANCOVA is that each covariate is
correlated with the dependent variable. When a covariate is a pretest and the
dependent variable is the posttest, there should be a substantial correlation between
the two (but see Pedhazur and Schmelkin [1991, pp. 283-284] for other potential
problems with such a design).
In the smoking example, age of course is not a pretest; rather, it is considered a
proxy variable, that is, a convenient substitute for other underlying constructs. In a
proxy role, age might represent longer opportunity for smoking and increased
vulnerability to an earlier cultural acceptance of smoking. What is the correlation
of age with the dependent variable, lung vital capacity? If the correlation is small,
then the covariate adjustment is similarly small, and there will not be a noticeable
improvement in power. There might even be a reduction in power, because the
covariate uses a degree of freedom that had been assigned to the error term. That
results in an increase in the mean square error (reducing power). When sample size
is small to begin with, the covariate must be strong enough to compensate for the
loss of the error degree of freedom. But even if the covariate-dependent variable
correlation is large, there may or may not be an improvement in power; it depends
on meeting the first assumption.
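
The degree-of-freedom trade-off just described can be made concrete. The following is a minimal sketch, assuming a single covariate and a pooled within-group correlation r between the covariate and the dependent variable. Under those assumptions the adjusted error sum of squares is the unadjusted one multiplied by (1 - r²), while the error degrees of freedom fall from N - g to N - g - 1, so the mean square error shrinks only when r² exceeds 1/(N - g). The function names and sample sizes are illustrative, and the sketch ignores both the small change in the critical F value and any adjustment of the group means.

```python
import math

def mse_ratio(r_within, n_total, n_groups):
    """Ratio of the ANCOVA error mean square to the ANOVA error mean square.

    Assumes one covariate whose pooled within-group correlation with the
    dependent variable is r_within. A value below 1 means the covariate
    shrank the error term; a value above 1 means it inflated it.
    """
    df_anova = n_total - n_groups      # error df without the covariate
    df_ancova = df_anova - 1           # the covariate costs one error df
    return (1.0 - r_within ** 2) * df_anova / df_ancova

def break_even_r(n_total, n_groups):
    """Absolute correlation at which the error mean square is unchanged."""
    return math.sqrt(1.0 / (n_total - n_groups))

# Hypothetical small study: 20 participants in 2 groups.
for r in (0.0, 0.2, 0.5):
    print(f"r = {r:.1f}: MSE ratio = {mse_ratio(r, 20, 2):.3f}")
print(f"break-even |r| = {break_even_r(20, 2):.3f}")
```

With small samples the break-even correlation is far from trivial, which is why a weak covariate can cost power rather than add it.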

Example 1: Reducing statistical power


The data here are from an example in the BMDP Manual, Volume 1 (Dixon,
1992), in which 40 participants run a mile, and then their pulse rates are measured.
A 2 (Sex) by 2 (Smoking) ANOVA is arranged. Both main effects are significant:
Sex F(1,36) = 37.33, p < .001, effect size¹ (partial η²) = .51; Smoking F(1,36) =
6.56, p < .05, partial η² = .15. The interaction term is nonsignificant.
Things change when the data are rerun, using baseline pulse as a covariate.
Once again, the main effects are significant: Sex F(1,35) = 21.81, p < .001, partial
η² = .38; Smoking F(1,35) = 5.71, p < .05, partial η² = .14. Notice that the effect
size for Sex has dropped substantially. That is because baseline pulse was
correlated with Sex, violating assumption #1.
What about the substantive problem of interpretation? This involves the
well-known problem of variance partitioning. Darlington (1968) and other early
analysts described the problem clearly, and admitted no answers. Even today,
Pedhazur (1997) has devoted an entire chapter to the issue, because “variance
partitioning is widely used, mostly abused, in the social sciences for determining
the relative importance of independent variables….[In the last 15 years,] abuses of
variance partitioning have not abated but rather increased” (p. 243). When
variables covary, there is no satisfactory way to assign unique explanatory power
to them individually. One can make odd-sounding statements that reveal how
confusing the situation is. For example, “Adjusting for initial pulse rate, sex is
associated with post-exercise pulse rate.” What does it mean to hold pulse rate
constant, as though everyone had the same initial pulse rate? What value is the
hypothetical constant pulse rate? Could one choose a different pulse rate to hold
constant? Back to Pedhazur: “Unfortunately, applications of ANCOVA in quasi-
experimental and non-experimental research are by and large not valid” (1997, p.
654).

Example 2: Improving statistical power


These data are from an intervention study of 32 veterans diagnosed with post-
traumatic stress disorder. Two treatments are randomly assigned: A one-week
Outward Bound experience, or regular counseling sessions at a Veterans Affairs
hospital. Pre-intervention and one-week follow-up measures are taken with the
Beck Hopelessness Scale (Beck & Steer, 1988).
A one-way ANOVA is run, and the main effect for treatment is significant:
F(1,30) = 10.67, p < .01, effect size (partial η²) = .26. As in the first example, the
data are rerun, this time using baseline Hopelessness scores as a covariate. Once
again, the treatment effect is significant: F(1,29) = 17.24, p < .001, partial η² = .37.
All the signs of increased statistical power are present: larger F-ratio, smaller
p value, and larger effect size. In this case, ANCOVA did its job because both
assumptions were in place: Random assignment of treatments to participants
creates an expected correlation of zero between the pretest and the grouping
variable; and pretest scores are theoretically and statistically related to the outcome
measure.

¹ In ANCOVA, partial η² is defined as

$$\text{partial } \eta^2 = \frac{\text{adjusted } SS_{\text{effect}}}{\text{adjusted } SS_{\text{effect}} + \text{adjusted } SS_{\text{error}}}$$

(Tabachnick & Fidell, 1996, p. 349).

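
Because an F-ratio is the effect mean square divided by the error mean square, the definition of partial η² can be rewritten in terms of F and its degrees of freedom: partial η² = (F × df effect) / (F × df effect + df error). The short sketch below uses that identity to check the effect sizes reported in the two examples against their F-ratios. The function name is ours, and the figures are simply those quoted in the text.

```python
def partial_eta_squared(f_ratio, df_effect, df_error):
    """Recover partial eta squared from a reported F-ratio and its df."""
    return (f_ratio * df_effect) / (f_ratio * df_effect + df_error)

# (label, F, df effect, df error, effect size reported in the text)
reported = [
    ("Example 1 ANOVA, Sex", 37.33, 1, 36, 0.51),
    ("Example 1 ANOVA, Smoking", 6.56, 1, 36, 0.15),
    ("Example 1 ANCOVA, Sex", 21.81, 1, 35, 0.38),
    ("Example 1 ANCOVA, Smoking", 5.71, 1, 35, 0.14),
    ("Example 2 ANOVA, Treatment", 10.67, 1, 30, 0.26),
    ("Example 2 ANCOVA, Treatment", 17.24, 1, 29, 0.37),
]

for label, f, df1, df2, stated in reported:
    computed = partial_eta_squared(f, df1, df2)
    print(f"{label}: computed {computed:.2f}, reported {stated:.2f}")
```

The same identity lets a reader recover an effect size from any published ANCOVA table that reports F and its degrees of freedom but omits partial η².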
Figure 2 depicts this case. Because of random assignment of treatment groups,
the grouping variable is not related to the covariate, Hopelessness pretest scores.
But the covariate is related to the dependent variable, and boosts the independent
variable’s power by removing some of what otherwise would be error variance.
Once the dependent variable’s variance associated with the covariate is
removed, the portion of the remaining variance in the dependent variable shared
with the independent variable (treatment) becomes larger.

[Diagram labels: IV: GROUP; COV: PRE-HOPELESSNESS; DV: POST-HOPELESSNESS]

Figure 2. A covariate improves the effect of another independent variable
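
The contrast between Figures 1 and 2 can be reproduced with a toy calculation. The sketch below is illustrative only: the eight scores per design are invented, not taken from either study, and the function names are ours. It uses the textbook one-covariate formulas (pooled within-group regression for the error term, and the drop in residual sum of squares for the adjusted group effect). In the first design both groups have the same covariate values, as random assignment would produce on average; in the second the groups differ on the covariate, as intact groups often do.

```python
def ss(values):
    """Sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def sp(u, v):
    """Sum of cross-products of deviations from the means."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))

def group_partial_eta(groups):
    """groups: one (covariate, outcome) pair of lists per group.

    Returns partial eta squared for the group effect,
    first without and then with the covariate.
    """
    all_x = [a for x, _ in groups for a in x]
    all_y = [b for _, y in groups for b in y]
    ss_total = ss(all_y)
    ss_within = sum(ss(y) for _, y in groups)
    eta_anova = (ss_total - ss_within) / ss_total
    # Error SS after removing the pooled within-group regression on the covariate.
    sp_within = sum(sp(x, y) for x, y in groups)
    ssx_within = sum(ss(x) for x, _ in groups)
    ss_error_adj = ss_within - sp_within ** 2 / ssx_within
    # Residual SS with the covariate alone; the further drop when
    # groups are added is the adjusted SS for the group effect.
    ss_resid_cov_only = ss_total - sp(all_x, all_y) ** 2 / ss(all_x)
    ss_group_adj = ss_resid_cov_only - ss_error_adj
    eta_ancova = ss_group_adj / (ss_group_adj + ss_error_adj)
    return eta_anova, eta_ancova

noise = [1, -1, -1, 1]                      # made-up residuals
x_a = [0, 1, 2, 3]
y_a = [x + e for x, e in zip(x_a, noise)]
x_b_random = [0, 1, 2, 3]                   # same covariate values as group A
x_b_intact = [2, 3, 4, 5]                   # group B is higher on the covariate
y_b_random = [2 + x + e for x, e in zip(x_b_random, noise)]
y_b_intact = [2 + x + e for x, e in zip(x_b_intact, noise)]

randomized = group_partial_eta([(x_a, y_a), (x_b_random, y_b_random)])
intact = group_partial_eta([(x_a, y_a), (x_b_intact, y_b_intact)])
print("covariate unrelated to group: ANOVA %.3f, ANCOVA %.3f" % randomized)
print("covariate related to group:   ANOVA %.3f, ANCOVA %.3f" % intact)
```

In these made-up data the group effect size rises from about .31 to .50 when the covariate is unrelated to group membership, and falls from .64 to about .36 when it is related.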

Other ANCOVA Assumptions


The usual ANOVA assumptions—homogeneity of variance, normality, and
independence of scores—hold for ANCOVA as well. And, as usual, the F-ratio can
withstand some disruption in homogeneity of variance and normality (especially
with equal cell sizes), but it is highly vulnerable to correlated scores, which create
Type I errors.
There is another ANOVA/ANCOVA assumption often unmentioned in
statistics books: No measurement error in the covariate(s). In the case of ANCOVA
with random assignment, covariate measurement error does not bias the adjusted
means, but it does produce less statistical power, which in turn increases the
probability of a Type II error. With a quasi-experimental design lacking random
assignment, covariate measurement error creates bias in adjusted means. The bias
is usually negative (underadjustment), but under some conditions can be positive
(Bryk & Weisberg, 1977).

Although measurement error in the dependent variable is not an
ANOVA/ANCOVA assumption, it can disrupt statistical power. With ANOVA,
measurement error in the dependent variable reduces statistical power, but with
ANCOVA, the outcome is less predictable: Even Type I errors may result if the
covariate is correlated with other independent variables, because measurement error
in the covariate may now ripple through the entire model by way of its correlations
with other variables.
Homogeneity of regression slopes is an additional assumption for the
ANCOVA model. This means that each comparison group should show a similar
regression slope when the dependent variable is regressed on the covariate(s). The
reason for the assumption is that all groups’ dependent variable scores are adjusted
based on a pooled regression slope; if the groups’ individual slopes differ sharply,
then the pooling becomes a muddy average.
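
One common way to test this assumption is to compare a model that fits a separate slope in each group with a model that forces one pooled slope. The sketch below does that for a single covariate using only sums of squares and cross-products. The data are invented so that the arithmetic is easy to follow, and the function names are ours. It returns the F-ratio and its degrees of freedom, which would then be compared with a critical value from an F table.

```python
def ss(values):
    """Sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def sp(u, v):
    """Sum of cross-products of deviations from the means."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))

def slope_homogeneity_f(groups):
    """groups: one (covariate, outcome) pair of lists per group.

    Returns (F, df_effect, df_error) for the test that all
    within-group regression slopes are equal.
    """
    k = len(groups)
    n = sum(len(x) for x, _ in groups)
    ss_y = [ss(y) for _, y in groups]
    ss_x = [ss(x) for x, _ in groups]
    sp_xy = [sp(x, y) for x, y in groups]
    # Residual SS when every group gets its own slope.
    rss_separate = sum(sy - c ** 2 / sx for sy, sx, c in zip(ss_y, ss_x, sp_xy))
    # Residual SS when all groups share one pooled slope.
    rss_pooled = sum(ss_y) - sum(sp_xy) ** 2 / sum(ss_x)
    df_effect, df_error = k - 1, n - 2 * k
    f_ratio = ((rss_pooled - rss_separate) / df_effect) / (rss_separate / df_error)
    return f_ratio, df_effect, df_error

x = [0, 1, 2, 3]
noise = [1, -1, -1, 1]                                  # made-up residuals
group_a = (x, [v + e for v, e in zip(x, noise)])        # slope 1
parallel_b = (x, [2 + v + e for v, e in zip(x, noise)])       # slope 1
diverging_b = (x, [2 + 3 * v + e for v, e in zip(x, noise)])  # slope 3

print(slope_homogeneity_f([group_a, parallel_b]))
print(slope_homogeneity_f([group_a, diverging_b]))
```

A near-zero F for the parallel groups and a larger F for the diverging groups is the pattern the assumption test is designed to detect.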
Interestingly, when cell sizes are equal, the ANCOVA F-ratios are generally
robust except for the most gross violations of homogeneity of regression (Hamilton,
1977; Wu, 1984). That does not mean that equal cell sizes allow the analyst to
ignore the homogeneity of slope assumption. A robust F-ratio is a statistical
summary that delivers no particular insight about how groups are different. It can
be far more informative, following a violation of homogeneous slopes, to calculate
Johnson-Neyman regions of significance. This technique helps to map out where
groups do and do not differ along various values of the covariate. Dorsey and
Soeken (1996) produced an introduction to the Johnson-Neyman method applied
to nursing research.
The final ANCOVA assumption is that the relationship between the
covariate(s) and the dependent variable is linear. Because the regression is based
only on the linear portion, any systematic but nonlinear relationship will cause a
reduction in statistical power. The simplest solution for nonlinearity is to apply a
power transformation (e.g., quadratic, cubic) to the covariate before running the
ANCOVA.
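
A small illustration of the linearity point follows. It is a sketch with invented scores in which the outcome depends on the cube of the covariate; it takes "power transformation" to mean replacing the covariate with its cube before the analysis, which is one reading of the suggestion above. The function names are ours. The quantity compared is the ANCOVA error sum of squares, which is what the covariate is supposed to reduce.

```python
def ss(values):
    """Sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def sp(u, v):
    """Sum of cross-products of deviations from the means."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))

def ancova_error_ss(groups):
    """Error SS after the pooled within-group linear regression on the covariate."""
    ss_y = sum(ss(y) for _, y in groups)
    ss_x = sum(ss(x) for x, _ in groups)
    sp_xy = sum(sp(x, y) for x, y in groups)
    return ss_y - sp_xy ** 2 / ss_x

x = [0, 1, 2, 3, 4]
y_a = [v ** 3 for v in x]          # outcome is a cubic function of the covariate
y_b = [5 + v ** 3 for v in x]      # same curve, shifted by a made-up group effect

error_raw = ancova_error_ss([(x, y_a), (x, y_b)])

x_cubed = [v ** 3 for v in x]      # transform the covariate before the analysis
error_transformed = ancova_error_ss([(x_cubed, y_a), (x_cubed, y_b)])

print("error SS with raw covariate:        ", round(error_raw, 1))
print("error SS with transformed covariate:", round(error_transformed, 1))
```

The raw covariate leaves the curved part of the relationship in the error term; the transformed covariate removes it. Real data will not linearize this cleanly, so a plot of the dependent variable against the covariate remains the first step.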

What Do Nurse Researchers Do with ANCOVA?


We searched four important nursing journals over a five-year period (1993-
1997; reference list available from first author) for examples of ANCOVA. Image
had none, in keeping with its recent drift toward qualitative research (Henry,
1998). Western Journal of Nursing Research had only two, Research in Nursing
and Health published five, and Nursing Research showed nine, for a total of 16.
Of those articles, nine (56%) used random assignment of participants to
treatments, so the covariate(s) were expected to be uncorrelated with the independent
variables (assumption #1).
Only one of the 16 articles assessed ANCOVA assumptions thoroughly. In
fairness, though, the nine using random assignment should not have needed to
check the correlation between the covariate(s) and independent variable(s). Also,
because random assignment can produce (approximately) equal cell sizes, the
analysis is inoculated against violations of all assumptions except independence of
scores. It is surprising that so few of the articles reported information about
assumption #2, the relationship of the covariate and the dependent variable. Only
one reported F-ratios for covariates, and one other study gave the simple
correlations between covariates and dependent variables.

Statistical Packages and ANCOVA


In 1982, Searle and Hudson compared ANCOVA procedures from 10 computer
programs, and discovered different output among all 10. Although contemporary
statistical programs are easier to use, output and labeling have not improved much
since then. Three of the four packages we reviewed are owned by SPSS (BMDP
version 7, SPSS version 8.0, and SYSTAT version 7.01). Interestingly, the flagship
program, SPSS, differs from the other two in its approach to ANCOVA. Its default
setting is what SPSS terms the “experimental” approach, in which main effects and
interactions are adjusted for the covariate. The default for BMDP and SYSTAT, in
SPSS language, is called the “regression” approach, in which each term—even the
covariate—is adjusted for each other term. When the covariate is uncorrelated with
other independent variables, then both approaches give the same result (that is,
there is nothing to adjust in the covariate). But in the non-experimental situation,
where the covariate may be related to a grouping variable, the two approaches can
deliver markedly different results. In this case, the regression approach gives more
conservative results, with less statistical power. With each package, the thoughtful
analyst can easily override the defaults to produce the alternate approach. SYSTAT
does not label the covariate(s) as such on the printout. BMDP’s programs 1V and
4V label the covariate(s) clearly, but 2V does not.
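
As we read the two approaches, in a one-way design with a single covariate they test the grouping variable identically (adjusted for the covariate) and differ in how the covariate itself is tested: entered first and unadjusted in the "experimental" approach, adjusted for the grouping variable in the "regression" approach. The sketch below computes both covariate sums of squares for invented data. It is not output from any of the packages, and the function names are ours.

```python
def ss(values):
    """Sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def sp(u, v):
    """Sum of cross-products of deviations from the means."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))

def covariate_ss(groups):
    """groups: one (covariate, outcome) pair of lists per group.

    Returns (SS for the covariate entered first and unadjusted,
             SS for the covariate adjusted for the grouping variable).
    """
    all_x = [a for x, _ in groups for a in x]
    all_y = [b for _, y in groups for b in y]
    entered_first = sp(all_x, all_y) ** 2 / ss(all_x)
    sp_within = sum(sp(x, y) for x, y in groups)
    ssx_within = sum(ss(x) for x, _ in groups)
    adjusted_for_group = sp_within ** 2 / ssx_within
    return entered_first, adjusted_for_group

noise = [1, -1, -1, 1]                                   # made-up residuals
x_a = [0, 1, 2, 3]
y_a = [x + e for x, e in zip(x_a, noise)]
x_b_same = [0, 1, 2, 3]        # covariate unrelated to group
x_b_shifted = [2, 3, 4, 5]     # covariate related to group
y_b_same = [2 + x + e for x, e in zip(x_b_same, noise)]
y_b_shifted = [2 + x + e for x, e in zip(x_b_shifted, noise)]

balanced = covariate_ss([(x_a, y_a), (x_b_same, y_b_same)])
unbalanced = covariate_ss([(x_a, y_a), (x_b_shifted, y_b_shifted)])
print("covariate unrelated to group: %.2f vs %.2f" % balanced)
print("covariate related to group:   %.2f vs %.2f" % unbalanced)
```

When the covariate is unrelated to the grouping variable the two sums of squares agree; when it is related, the unadjusted value is much larger because it also carries variance shared with group membership.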
SAS (version 6.12) does not treat the approaches as alternate. It delivers both
the regression and experimental results in a single table, so the user can decide
which to use (or not decide, and report both). SAS does not identify the
covariate(s) on the printout.
SPSS’s ANCOVA is the most unconventional of the four packages. The only
way to assign covariate status to a variable is through SPSS’s General Linear
Model procedure. The resulting printout does not distinguish covariates from other
independent variables.

Statistical Packages’ Treatment of ANCOVA Assumptions


As a rule, statistical packages encourage users to ignore assumptions and leap
right to the main analysis. Inside ANOVA programs, packages offer the Levene
test for homogeneity of variance, but any other tests of assumptions must be
arranged by the user.
For ANCOVA, the situation is no better. In BMDP, only one of its three
ANCOVA programs (1V) automatically delivers a homogeneity of regression test.
Unfortunately, this program handles only a one-way model, so if the analyst has a
factorial model, she must convert the cell structure to a one-way model just to get
the assumption tested. SYSTAT and SAS offer no homogeneity test. In SPSS’s
GLM procedure, one must construct an interaction term representing the
assumption test. Without a clear guide (SPSS, 1997, pp. 118-119), this would be
hard to discover.
Any analyst facile with regression analysis could readily test for slope
homogeneity inside a regression model. Caution should be used, though, in
arranging the model. With a hierarchical analysis (the preferred approach), the
homogeneity term (interaction between the covariate and the independent variable)
is entered last, and the test is a version of SPSS’s “experimental” approach, where
each successive term is adjusted for previous terms. If a direct or simultaneous
regression is used, the homogeneity term is tested with SPSS’s “regression”
approach.

Conclusions
In 1969, Janet Elashoff called the analysis of covariance (ANCOVA) "a
delicate instrument." It still is. Carefully handled, though, it is an excellent device
for the analyst’s toolkit. To improve the quality of future ANCOVA studies, we
recommend that the method be limited primarily to randomized designs. When the
analyst wants to use ANCOVA with an intact group or other nonrandom
assignment, the correlation between the covariate(s) and the independent
variable(s) should be reported. As the correlations depart further from zero,
conclusions drawn about the independent variables become increasingly suspect.
ANCOVA is an interesting and useful tool, but it is not a fix-all to be applied
indiscriminately to equate groups. As mentioned above, the Johnson-Neyman
method can be used as an alternative (or as a complement) to ANCOVA. Myers and
Well (1995) offer a brief comparison of ANCOVA with other approaches
(blocking, analysis of gain scores) to improving statistical power in nonrandom
groups. Kirk (1995, Chapter 15) gives a short but excellent review of ANCOVA
applications, and Huitema’s (1980) text remains the definitive work on
ANCOVA.
We also recommend that researchers report tests of ANCOVA assumptions.
That statistical packages make assumption tests challenging is not a good reason to
avoid them entirely. And it is easy, not challenging, to report the simple
correlations between covariates and dependent variables. When the
correlations are tiny, there is no gain whatsoever from using ANCOVA.
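
The two correlations we recommend reporting take only a few lines to compute. The sketch below is a minimal example with invented scores and a 0/1 code for a two-group independent variable (so the covariate-group correlation is a point-biserial r). The variable names are illustrative.

```python
import math

def pearson_r(u, v):
    """Pearson correlation between two equal-length lists of scores."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cross = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    ss_u = sum((a - mu) ** 2 for a in u)
    ss_v = sum((b - mv) ** 2 for b in v)
    return cross / math.sqrt(ss_u * ss_v)

# Made-up data for eight participants in two intact groups.
group = [0, 0, 0, 0, 1, 1, 1, 1]        # 0/1 code for the independent variable
covariate = [0, 1, 2, 3, 2, 3, 4, 5]
outcome = [1, 0, 1, 4, 5, 4, 5, 8]

r_cov_group = pearson_r(covariate, group)      # bears on assumption #1
r_cov_outcome = pearson_r(covariate, outcome)  # bears on assumption #2
print(f"covariate with group:   r = {r_cov_group:.2f}")
print(f"covariate with outcome: r = {r_cov_outcome:.2f}")
```

Here the covariate is strongly related to the outcome but also clearly related to group membership, which is the situation in which adjusted results deserve the most caution.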

References
Beck, A.T., & Steer, R.A. (1988). Beck Hopelessness Scale manual. San Antonio:
Psychological Corporation.
Bryk, A.S., & Weisberg, H.I. (1977). Use of the nonequivalent control group
design when subjects are growing. Psychological Bulletin, 84, 950-962.
Cohen, J. (1968). Multiple regression as a general data analytic system.
Psychological Bulletin, 70, 426-443.
Darlington, R.B. (1968). Multiple regression in psychological research and
practice. Psychological Bulletin, 69, 161-182.
Dixon, W.J. (1992). BMDP statistical software manual, Vol. 1. Berkeley, CA:
University of California Press.
Dorsey, S.G., & Soeken, K.L. (1996). Use of the Johnson-Neyman technique as an
alternative to analysis of covariance. Nursing Research, 45, 363-366.
Elashoff, J.D. (1969). Analysis of covariance: A delicate instrument. American
Educational Research Journal, 6, 383-401.
Hamilton, B.L. (1977). An empirical investigation of the effects of heterogeneous
regression slopes in analysis of covariance. Educational and Psychological
Measurement, 37, 701-702.
Henry, B. (1998). To Journal readers, report and requests, 1998. Image: Journal of
Nursing Scholarship, 30, 2.
Huitema, B. (1980). The analysis of covariance and alternatives. New York: Wiley.
Kirk, R.E. (1995). Experimental design: Procedures for the behavioral sciences
(3rd Ed.). Pacific Grove, CA: Brooks/Cole.
Myers, J.L., & Well, A.D. (1995). Research design & statistical analysis. Hillsdale,
NJ: Lawrence Erlbaum.
Pedhazur, E.J. (1997). Multiple regression in behavioral research (3rd Ed.). New
York: Harcourt Brace.
Pedhazur, E.J., & Schmelkin, L.P. (1991). Measurement, design, and analysis: An
integrated approach. Hillsdale, NJ: Lawrence Erlbaum.
Searle, S.R., & Hudson, G.F.S. (1982). Some distinctive features of output from
statistical computing packages for analysis of covariance. Biometrics, 38,
737-745.
SPSS. (1997). SPSS advanced statistics 7.5. Chicago: Author.
Wu, Y-W.B. (1984). The effects of heterogeneous regression slopes on the
robustness of two test statistics in the analysis of covariance. Educational and
Psychological Measurement, 44, 647-663.
Wu, Y-W.B., & Slakter, M.J. (1989). Analysis of covariance in nursing research.
Nursing Research, 38, 306-308.