1 s2.0 S2452301118300154 Main
www.elsevier.com/locate/hpe
Abstract
Analysis of covariance (ANCOVA) is a commonly used statistical method in experimental and quasi-experimental studies. One
of the fundamental assumptions underlying ANCOVA is that of no interaction between factor and covariate. Unfortunately, many
researchers report the outcomes of ANCOVA but not the outcomes of a check on that non-interaction assumption. Through a
comparison of ANCOVA (which assumes non-interaction) and moderated regression (MODREG, which allows for interaction) in
a worked example, this article demonstrates that omitting the check of the non-interaction assumption comes at the risk of
misestimating a treatment effect or other group difference of interest. If there is substantial interaction between factor and
covariate, ANCOVA will result in conclusions of there being a group difference or no group difference whereas MODREG
indicates that the magnitude of a group difference depends on the level of the covariate. Therefore, this article advises to first
check and report on the interaction, to use that check to decide whether a model without interaction (ANCOVA) or with
interaction (MODREG) is to be preferred, and to use ANCOVA only if the criteria outlined in this article indicate a preference
towards the model without interaction. Moreover, omitted terms, such as the omitted interaction if one proceeds with ANCOVA,
should be reported as well.
© 2018 King Saud bin Abdulaziz University for Health Sciences. Production and Hosting by Elsevier B.V. This is an open access
article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
https://doi.org/10.1016/j.hpe.2018.04.001
2452-3011/© 2018 King Saud bin Abdulaziz University for Health Sciences. Production and Hosting by Elsevier B.V. This is an open access
article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
226 J. Leppink / Health Professions Education 4 (2018) 225–232
assumptions,1,5 and violations of these assumptions may have serious consequences for the outcomes and interpretations.

1.1. Assumptions

As elegantly formulated by Huitema,1 the assumptions for the ANCOVA model “are relatively straightforward because it is
simply another linear model” (p. 182).
Firstly, the residuals are assumed to be independent. Although this assumption may be realistic for instance in randomized
controlled experiments where participants receive individual treatment and – throughout the experiment – do not interact in
any way with other participants, when participants interact (e.g., group learning) or are measured repeatedly on the same
variable(s) of interest (i.e., repeated measures) that assumption is usually violated. Interaction between participants and
repeated measurements from the same participants are two phenomena that usually create some kind of a dependence of
residuals and that dependence needs to be accounted for in the statistical analysis, for instance through multilevel
analysis.6,7
Secondly, the residuals are assumed to have a mean of zero regardless of the grouping variable or the level of the covariate.
This can be considered true when the relation between response variable and covariate within groups is linear, and in cases
where the latter is not true, researchers should consider nonlinear alternatives to the linear model (i.e., it is possible to
have means of zero for residual distributions when the relation between response variable and covariate is nonlinear and an
appropriate nonlinear function is used).1
Thirdly, the residuals are assumed to be normally distributed. Inspecting the plotted residuals of the ANCOVA model and/or
the normal probability plot provides a straightforward approach to checking this assumption.1 Although moderate departures
from normality in samples in the 20s or larger generally do not constitute a cause of concern, more severe departures from
normality may introduce substantial distortion8 and hence need to be accounted for, for instance by using a model that allows
for another type of distribution.1
Fourthly, the variance of the residuals is the same regardless of the grouping variable or covariate (i.e.,
homoscedasticity). Two common types of deviation from that assumption are (1) increasing (or decreasing) residual variance
with increases in the level of the covariate but no difference between groups and (2) constant residual variance within but
not between groups. The first type of deviation appears to not meaningfully affect the outcomes of ANCOVA, whereas the second
type of deviation is mainly problematic when dealing with groups that differ in sample size (with larger differences being
more problematic).9 Huitema1 discusses several alternatives to ANCOVA for such situations.
Fifthly, the grouping variable and covariate are assumed to be fixed and measured without error. For the grouping variable,
this is straightforward. Whenever the interest lies in a comparison between specific groups, such as treatment conditions in
a randomized controlled experiment, the categories of the grouping variable are fixed. The comparison is clear and there is
no interest in generalizing to other groups not observed in the study. However, in cases where groups under comparison can be
considered a random sample of a population of possible groups and the interest lies in generalizing the findings of the
groups observed to other groups, the groups are in fact treated as random not fixed. An example of the latter is found in a
random sample of say twenty health centers from a much larger population of health centers. In the latter case, ANCOVA does
not work for it does not enable generalization to groups not observed; a multilevel model that treats the groups as random
units in which individuals of interest (e.g., employees, patients) are nested constitutes a better approach.6 That said, for
the covariate, the situation is more complex. Assuming random sampling, covariate values observed in a sample in practice
rarely cover all values of the covariate in the population but rather constitute a random sample of covariate values in the
population. Moreover, given that in educational and psychological settings the covariate often results from a psychometric
instrument, the covariate is often measured with error. Although ANCOVA was initially derived under the assumption of the
covariate being fixed, ANCOVA with a random variable covariate measured without error is appropriate.10 For instance, from a
limited number of values observed on the covariate, researchers may use a linear model to generalize to values of the
covariate that have not been observed in the sample but are within the range (i.e., between minimum and maximum) of values
observed in the sample. However, the measurement error issue is more serious: “In experimental research, unreliable
covariates lead to loss of power and a conservative statistical test through underadjustment of the error term”11 (p. 326).
Besides, when ANCOVA is used to adjust means in nonrandomized studies, “the difference between the adjusted means is partly a
function of the reliability of the
covariate”1 (p. 191). The latter is serious, because the adjusted difference may have an incorrect magnitude and even be
opposite to what it should be (i.e., negative instead of positive or vice versa). In other words, it is of paramount
importance to use covariates that do not suffer from poor reliability (i.e., high measurement error).
Sixth and finally, the slope of the linear relation between response variable and covariate is assumed to be the same across
groups (i.e., homogeneity of the within-group slopes). In plain language, this means parallel regression lines aka
non-interaction: there is no interaction between factor (i.e., treatment or other grouping variable) and covariate. If this
assumption holds, any group difference of interest is the same across the observed range of the covariate. Unfortunately,
many researchers do report the outcomes of ANCOVA but not the outcomes of a check of that non-interaction assumption.11 At
first, one may wonder why to report on the outcome of a non-interaction assumption if we do not commonly report in detail
(e.g., through residual plots) on the normality assumption either. Besides, some may argue that the interest lies in ‘the
treatment effect’ and not in an interaction per se. Researchers who follow this argument probably underestimate the effects
that ignoring an interaction may have on the treatment effect of interest. In the words of Keselman et al.12 (p. 351): “The
applied researcher who routinely adopts a traditional procedure without giving thought to its associated assumptions may
unwittingly be filling the literature with nonreplicable results.” Although moderate departure from normality may in practice
often not meaningfully undermine the validity of ANCOVA, moderate departure from non-interaction poses serious threats to
that validity.1,11 Thus, when there is interaction between grouping variable and covariate (i.e., non-parallel regression
lines), we need to consider alternatives to ANCOVA.

1.2. Alternatives

One alternative to ANCOVA that has been proposed to deal with a violation of the non-interaction assumption is analysis of
covariate residuals (ANCOVRES).13 In ANCOVA, residuals are computed for the full sample by pooling the within-group
regression of the response variable on the covariate (i.e., one estimate for all groups). In ANCOVRES, no such pooling takes
place; instead, the within-group slope for each group is used to remove covariate variance from the response variable within
the respective group. As such, ANCOVRES does not rely on the non-interaction assumption that underlies ANCOVA. For that
reason, when there is substantial interaction or when the statistical power to detect an interaction of a given size is
limited due to relatively small groups, ANCOVRES may have somewhat more statistical power to detect a group difference in the
response variable of interest than ANCOVA.13 However, just like in ANCOVA, the comparison between groups takes place at the
average value of the covariate. Although ANCOVRES may at first come with somewhat more statistical power than ANCOVA for that
group difference at the average value of the covariate, the elephant in the room – the interaction – is still ignored.
Avoiding the non-interaction assumption by using group-specific slopes (i.e., ANCOVRES) instead of a pooled (i.e., across
groups) slope (i.e., ANCOVA) does not mean the interaction is suddenly gone. With ANCOVRES, we still learn nothing about how
a group difference in a response variable of interest depends on the level of the covariate. Furthermore, especially when
samples are on the smaller side, group-specific regression slope estimates (cf. ANCOVRES) may be quite inaccurate, any
departures from normally distributed residuals may have more severe consequences for the validity of model outcomes, and a
possible correction in degrees of freedom of the residual term due to estimating group-specific instead of using a pooled
slope14–16 may reduce the gain in statistical power relative to ANCOVA back to about zero. In other words, the added value of
ANCOVRES over ANCOVA may be qualified as questionable at best.
Luckily, there is another alternative to ANCOVA called moderated regression17,18 (henceforth: MODREG). In this alternative,
the interaction effect is neither assumed to be non-existent (cf. ANCOVA) nor avoided (cf. ANCOVRES) but is tested and
estimated. In two-way analysis of variance (ANOVA),1 most researchers would agree that it is odd to run a model without
interaction effect when there is an interaction effect. Rather, researchers report the interaction effect and proceed with
simple effects analysis: group differences in a response variable of interest are then studied for each group of the second
factor. Analogously, MODREG acknowledges that running an ANCOVA (or ANCOVRES, for that matter) when there is interaction is
odd. Instead, we evaluate group differences of interest at more than one level of the covariate (i.e., picked-points
analysis1 aka pick-a-point analysis19). Several software packages have been developed for this analysis, including Hayes’
process software20 and Jamovi.21 It is common to choose three points: at the average value of the covariate, at one standard
deviation (SD) below the average value of the covariate, and at one SD above the average value of the covariate.

1.3. The current study

Through a comparison of ANCOVA (which assumes non-interaction) and MODREG (which enables research to study an interaction),
this article demonstrates that omitting the check of the non-interaction assumption comes at the risk of misestimating the
treatment effect or other group difference of interest. If there is substantial interaction between factor and covariate,
ANCOVA will result in conclusions of there being a group difference or no group difference (for the whole observed range of
the covariate, since no interaction is assumed) whereas MODREG indicates that the magnitude of a group difference depends on
the level of the covariate (i.e., interaction aka moderation aka effect modification). Therefore, this article advises to
first check and report on the interaction, to use the outcome of that check to decide whether a model without interaction
(ANCOVA) or with interaction (MODREG) is to be preferred, to use ANCOVA only if the criteria outlined in this paper indicate
a preference towards the model without interaction, and even in that case to report on the omitted interaction so that
readers understand why the interaction was omitted.

2. Method

In a hypothetical randomized controlled experiment (i.e., these are simulated data), two groups of n = 150 learners each
(i.e., total N = 300) practice a complex problem-solving procedure in an online learning environment according to the
condition they have been allocated to and then complete a posttest that is supposed to measure learning outcomes. Before the
experiment, learners individually rate, on a series of five items – each with a visual analog scale ranging from -5 (no
experience at all) through 0 (average experience) to +5 (a lot of experience) – their self-perceived prior experience with
the procedure (i.e., each of the five items covers one aspect of the procedure).

2.1. Assumptions

For the simplicity of the example, let us assume that psychometric analysis reveals that this set of items provides a
unidimensional (i.e., one-factor) assessment22 of self-perceived prior experience that has a high reliability (e.g.,
McDonald's omega, an in practice commonly better alternative to Cronbach's alpha,23 equals 0.81). The posttest yields a score
ranging from 0 (all wrong) to 20 (all correct). Again, for the simplicity of the example, let us assume that psychometric
analysis reveals that this set of items provides a unidimensional assessment of learning outcomes with a good reliability
(e.g., McDonald's omega equals 0.85). Prior to the experiment, the researchers hypothesize a positive linear relation between
self-perceived prior experience (i.e., the covariate) and posttest performance (i.e., the response variable of interest).
However, because to the knowledge of the researchers the comparison between conditions is made in an empirical study for the
first time, they have no definite expectations with regard to one condition doing better than the other condition. Further,
the example used for this article is such that the other assumptions discussed in this paper pose no threats to the validity
of the models compared.

2.2. Data analysis

Once the data are collected, five models can be compared: a treatment effect but no linear relation between covariate and
response variable (i.e., Model 1: treatment); a linear relation between covariate and response variable (i.e., Model 2:
covariate); both treatment and covariate but no interaction (i.e., Model 3: ANCOVA); a model with interaction (i.e., Model 4:
MODREG); or neither treatment nor covariate matters (i.e., Model 5: same prediction for all). These five models are compared
in terms of proportion of variance explained (R² and adjusted R²), information criteria (i.e., smaller is better), and Bayes
factors (i.e., larger is better) as outlined in a recent article.24 Proportion of variance explained statistics are obtained
from Jamovi 0.8.6.0,21 information criteria – Akaike's information criterion (AIC),25 Schwarz’ Bayesian information criterion
(BIC),26,27 and the sample size-adjusted BIC (SABIC)28,29 – from Mplus 8,30 and Bayes factors (BF)31 from JASP 0.8.6.0.32

3. Results

Self-perceived prior experience was on average (M) 0.054 (SD = 1.012) and, as to be expected after random allocation of a
number of participants of this size, was almost the same for the experimental treatment (M = 0.059, SD = 0.954) and control
(M = 0.049, SD = 1.069) condition. Moreover, in terms of posttest performance, the experimental treatment (M = 11.072, SD =
1.337) and control (M = 11.019, SD = 0.962) condition were not much different either.
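The contrast between ANCOVA and MODREG, and the picked-points analysis described in Section 1.2, can be sketched in a few
lines of Python. This is only an illustrative sketch: the data are freshly simulated with hypothetical coefficients (a slope
of 0.2 in the control condition, an extra slope of 0.8 under treatment, residual SD 0.9), and the function names, the seed,
and the hand-written least-squares solver are assumptions made for this example, not the data or software used in this
article.

```python
import random
import statistics


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def fit_ancova(rows):
    """y = b0 + b1*group + b2*x: one pooled slope, no interaction."""
    return ols([[1.0, g, x] for g, x, _ in rows], [y for _, _, y in rows])


def fit_modreg(rows):
    """y = b0 + b1*group + b2*x + b3*group*x: slopes may differ per group."""
    return ols([[1.0, g, x, g * x] for g, x, _ in rows], [y for _, _, y in rows])


def group_difference_at(b, x0):
    """MODREG estimate of treatment minus control at covariate value x0."""
    return b[1] + b[3] * x0


def simulate(n_per_group=150, seed=2018):
    """Hypothetical data: no group difference at x = 0, steeper slope under treatment."""
    rng = random.Random(seed)
    rows = []
    for g in (0, 1):  # 0 = control, 1 = treatment
        for _ in range(n_per_group):
            x = rng.gauss(0.0, 1.0)
            y = 11.0 + 0.2 * x + 0.8 * g * x + rng.gauss(0.0, 0.9)
            rows.append((g, x, y))
    return rows


if __name__ == "__main__":
    rows = simulate()
    xs = [x for _, x, _ in rows]
    m, sd = statistics.mean(xs), statistics.stdev(xs)
    print("ANCOVA group difference:", round(fit_ancova(rows)[1], 3))
    b = fit_modreg(rows)
    for label, x0 in (("mean - 1 SD", m - sd), ("mean", m), ("mean + 1 SD", m + sd)):
        print("MODREG difference at", label, ":", round(group_difference_at(b, x0), 3))
```

Under these assumed coefficients, ANCOVA reports a single group difference for the whole covariate range, whereas the three
MODREG estimates differ by roughly the interaction coefficient times one SD of the covariate per step.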
Table 1
Comparison of five competing models – treatment; covariate; ANCOVA; MODREG; or same prediction for all (i.e., ‘None’) – in terms of proportion
of explained variance (R² and adj. R²), information criteria (AIC, BIC, SABIC: smaller is better), and Bayes factor (BF: larger is better).
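A comparison of the five models along the lines of Table 1 can be approximated with the Python sketch below. It is a
hedged illustration only: the data are simulated with hypothetical coefficients, the function names are invented for this
example, and the information criteria are computed from the Gaussian log-likelihood of an ordinary least-squares fit,
counting the regression coefficients plus the residual variance as parameters and using (n + 2)/24 in place of n for the
SABIC. These conventions are assumptions and may differ in detail from those of the software named in Section 2.2; Bayes
factors are not reproduced here.

```python
import math
import random


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def fit_stats(X, y):
    """R2, adjusted R2, AIC, BIC and SABIC for one linear model."""
    n, k = len(y), len(X[0])
    b = ols(X, y)
    rss = sum((yi - sum(bj * xj for bj, xj in zip(b, r))) ** 2 for r, yi in zip(X, y))
    ybar = sum(y) / n
    tss = sum((yi - ybar) ** 2 for yi in y)
    r2 = 1.0 - rss / tss
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k)
    # Gaussian log-likelihood at the maximum-likelihood variance estimate rss / n.
    loglik = -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1.0)
    p = k + 1  # assumption: coefficients plus residual variance
    return {"r2": r2, "adj_r2": adj_r2,
            "aic": -2 * loglik + 2 * p,
            "bic": -2 * loglik + p * math.log(n),
            "sabic": -2 * loglik + p * math.log((n + 2) / 24)}


def compare_models(rows):
    """Fit the five competing models to (group, covariate, response) rows."""
    y = [yi for _, _, yi in rows]
    designs = {
        "Treatment": [[1.0, g] for g, _, _ in rows],
        "Covariate": [[1.0, x] for _, x, _ in rows],
        "ANCOVA": [[1.0, g, x] for g, x, _ in rows],
        "MODREG": [[1.0, g, x, g * x] for g, x, _ in rows],
        "None": [[1.0] for _ in rows],
    }
    return {name: fit_stats(X, y) for name, X in designs.items()}


def simulate(n_per_group=150, seed=2018):
    """Hypothetical data with a sizeable treatment-by-covariate interaction."""
    rng = random.Random(seed)
    rows = []
    for g in (0, 1):
        for _ in range(n_per_group):
            x = rng.gauss(0.0, 1.0)
            rows.append((g, x, 11.0 + 0.2 * x + 0.8 * g * x + rng.gauss(0.0, 0.9)))
    return rows


if __name__ == "__main__":
    for name, s in compare_models(simulate()).items():
        print(name, {key: round(val, 3) for key, val in s.items()})
```

Because the models are compared on the same data, only differences between models in AIC, BIC, or SABIC matter, not their
absolute values.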
effect among learners who report relatively high prior experience (i.e., covariate average + 1 SD).

4. Discussion

Through a comparison of ANCOVA and MODREG, this article demonstrates why it is important to check the non-interaction
assumption underlying ANCOVA before one proceeds with ANCOVA. Failing to do so may result in overlooking a treatment effect
of interest, as in the example study in this article. In other cases, researchers may conclude an overall treatment effect
based on ANCOVA while MODREG reveals that there is a substantial (positive or negative) treatment effect on one side of the
observed range of the covariate but not on the other side of that range (i.e., substantial at covariate average – 1 SD but
not at covariate average + 1 SD, or vice versa). Although for illustration purposes the interaction effect in this example is
rather large (β > 0.40), substantial misestimation of treatment effects of interest can also occur when the interaction
effect is of a somewhat more modest size (e.g., β = 0.25). As in the context of ANOVA, in the case of substantial interaction,
the interpretation of main effects (in this study: effects of treatment and covariate) is tricky business that may well
result in inappropriate conclusions with regard to effects of interest. Therefore, the following strategy should be applied
when combinations of factors and covariates are considered:

1. Describe and plot: means and standard deviations per group on the covariate and response variable of interest and a
   scatterplot for the linear relation between covariate and response variable; these statistics and graphic representations
   will help to get a first impression of the data and of the magnitude of the interaction (i.e., the extent to which the
   regression lines deviate from going parallel).
2. Compare models: in practice, the interaction term will rarely be exactly zero (i.e., there will usually be some deviation
   from parallel regression lines); information criteria and Bayes factors can help us decide which model to prefer. If, for
   instance, the interaction effect is only small, these criteria may indicate a preference towards ANCOVA relative to
   MODREG. Likewise, if there is neither substantial interaction nor an overall group difference, the aforementioned criteria
   will likely prefer a model with covariate only24 (i.e., a simple linear regression). In a similar fashion, if there is a
   substantial group difference in the response variable but the linear relation between covariate and response variable is
   close to zero in each of the groups, the outcome of the model comparison is likely a model with treatment and without
   covariate (i.e., a one-way ANOVA or t-test). Finally, if the model selection criteria indicate that the model without any
   of these terms (i.e., the ‘None’ model in Table 1) is to be preferred, it seems we do not need any term in the model
   (i.e., same prediction for all will do).
3. Proceed with the best model: when we report the outcomes of model comparison as in Table 1, we are transparent to our
   readers with regard to which model we prefer and why. When all criteria indicate a preference towards one particular
   model, the choice is easy. Occasionally, not all criteria point in the same direction: in some cases, the AIC may prefer a
   slightly more complex model (e.g., MODREG over ANCOVA) whereas the BIC may prefer a slightly simpler model (e.g., ANCOVA
   over MODREG). In such cases, the SABIC and BF along with differences between models in R² and adjusted R² can help us to
   make a motivated choice, and by presenting the outcomes cf. Table 1 readers can decide for themselves if we made a
   sensible choice.
4. Also report the outcomes of effects omitted: regardless of which model we choose, we should report some key statistics
   with regard to effects omitted from the final model. If we choose ANCOVA, we report the outcomes of group differences of
   interest (i.e., estimated difference with confidence interval or posterior interval24) as well as the effect of the
   covariate (i.e., β with regard to the main effect of the covariate). However, we should in that case also report the
   standardized estimate (β) of the omitted interaction effect. This information will not only provide useful information in
   addition to the outcomes of model comparison (cf. Table 1) but can facilitate meta-analysis33 on a phenomenon of interest
   as well; one of the main struggles for researchers who attempt to do a meta-analysis, or systematic review otherwise, is
   unreported data. For the same reasons, even if a simpler model than ANCOVA (i.e., treatment only, covariate only, or same
   prediction for all) is to be preferred, we should report the outcomes with regard to group differences of interest (i.e.,
   estimated difference with confidence interval or posterior interval24), the effect of the covariate (β), and the
   interaction effect (β) as well.

Indeed, if two factors instead of one factor and/or two covariates instead of one covariate are to be considered, this comes
down to more statistics being
20. Hayes AF. The PROCESS macro for SPSS and SAS; 2017. Retrieved from: 〈http://www.processmacro.org/index.html〉. Accessed
    16 April 2018.
21. Jamovi project. Jamovi (version 0.8.6.0); 2018. Retrieved from: 〈https://www.jamovi.org〉. Accessed 16 April 2018.
22. Crutzen R, Peters GJY. Scale quality: alpha is an inadequate estimate and factor-analytic evidence is needed first of
    all. Health Psychol Rev 2017;11:242–247.
23. Dunn TJ, Baguley T, Brunsden V. From alpha to omega: a practical solution to the pervasive problem of internal
    consistency estimation. Br J Psychol 2014;105:399–412.
24. Leppink J. A pragmatic approach to statistical testing and estimation (PASTE). Health Prof Educ 2018.
    http://dx.doi.org/10.1016/j.hpe.2017.12.009.
25. Akaike H. Information theory and an extension of the maximum likelihood principle. In: Petrov BN, Csaki F, editors.
    Proceedings of the Second International Symposium on Information Theory. Budapest: Academiai Kiado. p. 267–281.
26. Schwarz G. Estimating the dimensions of a model. Ann Stat 1978;6:461–464.
27. Kass RE, Raftery AE. Bayes factors. J Am Stat Assoc 1995;90:773–795.
28. Enders CK, Tofighi D. The impact of misspecifying class-specific residual variance in growth mixture models. Struct Eq
    Mod: Multidisc J 2008;15:75–95.
29. Tofighi D, Enders CK. Identifying the correct number of classes in mixture models. In: Hancock GR, Samuelsen KM, editors.
    Advances in Latent Variable Mixture Models. Greenwich, CT: Information Age; 2007. p. 317–341.
30. Muthén LK, Muthén B. Mplus user’s guide. Version 8; 2017. Retrieved from:
    〈https://www.statmodel.com/download/usersguide/MplusUserGuideVer_8.pdf〉. Accessed 16 April 2018.
31. Wagenmakers EJ, Marsman M, Jamil T, et al. Bayesian inference for psychology. Part I: theoretical advantages and
    practical ramifications. Psychon Bull Rev 2017. http://dx.doi.org/10.3758/s13423-017-1343-3.
32. Love J, Selker R, Marsman M, et al. JASP (version 0.8.6.0); 2018. Retrieved from: 〈https://jasp-stats.org/〉. Accessed 16
    April 2018.
33. Lipsey MW, Wilson DB. Applied social research methods series. Practical Meta-analysis, Vol. 49. Thousand Oaks, CA: Sage;
    2001.