The Stroop Color Word Test Influence of
Van der Elst et al. / THE STROOP COLOR-WORD TEST
The Stroop Color-Word Test was administered to 1,856 cognitively screened, healthy Dutch-speaking participants aged 24 to 81 years. The effects of age, gender, and education on Stroop test performance were investigated to adequately stratify the normative data. The results showed that especially the speed-dependent Stroop scores (time to complete a subtest), rather than the accuracy measures (the errors made per Stroop subtask), were profoundly affected by the demographic variables. In addition to the main effects of the demographic variables, an Age × Low Level of Education interaction was found for the Error III and the Stroop Interference scores. This suggests that executive function, as measured by the Stroop test, declines with age and that the decline is more pronounced in people with a low level of education. This is consistent with the reserve hypothesis of brain aging (i.e., that education generates reserve capacity against the damaging effects of aging on brain functions). Normative Stroop data were established using both a regression-based and traditional approach, and the appropriateness of both methods for generating normative data is discussed.
Keywords: Stroop Color-Word Test; executive function; normative data; education; sex; brain reserve
The Stroop Color-Word Test (Stroop, 1935) is a useful and reliable assessment tool in psychology (Lezak, Howieson, & Loring, 2004). Numerous different Stroop test versions have been developed (e.g., Comalli, Wapner, & Werner, 1962; Golden, 1978; Trenerry, Crosson, DeBoe, & Leber, 1989), with variations in the color and number of the test items, the number of subtests, and the administration procedure. Despite these variations, the basic paradigm of the Stroop test has remained the same: An individual’s performance on a basic task (e.g., reading names of colors) is compared with his or her performance on an analogous task in which a habitual response needs to be suppressed in support of an unusual one (i.e., naming the ink color that incongruously named color words are printed in). The increase in time taken to perform the latter task compared with the basic task is referred to as “the Stroop interference effect” (e.g., Davidson, Zacks, & Williams, 2003; Moering, Schinka, Mortimer, & Graves, 2003) and is considered a general measure of cognitive flexibility and control (Uttl & Graf, 1997) or executive functioning (Moering et al., 2003). These abilities decline with age (Ivnik, Malec, Smith, Tangalos, & Petersen, 1996) and in dementia (Houx, Jolles, & Vreeling, 1993), which has made the Stroop test a popular test to evaluate various groups of patients with borderline or established brain pathology (Bohnen, Twijnstra, & Jolles,
Address correspondence to Jelle Jolles, Faculty of Medicine, Department of Psychiatry and Neuropsychology, Maastricht University,
P.O. Box 616, 6200 MD, Maastricht, The Netherlands; phone: +31 43 38 81041; e-mail: [email protected].
Assessment, Volume 13, No. 1, March 2006 62-79
DOI: 10.1177/1073191105283427
© 2006 Sage Publications
1992; Davidson et al., 2003; Dulaney & Rogers, 1994; Graf, Uttl, & Tuokko, 1995; Houx et al., 1993; MacLeod, 1991; Uttl & Graf, 1997).
The popularity of the Stroop test in clinical and research settings (Lezak et al., 2004) means that it is important to determine the influence of age and age-extrinsic factors on test performance. Previous studies have provided inconclusive data about the effects of age, sex, and education on Stroop test performance. Although most authors reported age-related decrements in Stroop test performance (Daigneault, Braun, & Whitaker, 1992; Feinstein, Brown, & Ron, 1994; Hameleers et al., 2000; Houx et al., 1993; Ivnik et al., 1996; Klein, Ponds, Houx, & Jolles, 1997; Libon et al., 1994; Moering et al., 2003; Spreen & Strauss, 1998; Swerdlow, Filion, Geyer, & Braff, 1995; Van Boxtel, ten Tusscher, Metsemakers, Willems, & Jolles, 2001), Graf et al. (1995) did not find age to influence test performance. Sex differences in Stroop performance have been reported by some (Hameleers et al., 2000; Martin & Franzen, 1989; Moering et al., 2003; Van Boxtel et al., 2001) but not all (Houx et al., 1993; Klein et al., 1997; Swerdlow et al., 1995; Trenerry et al., 1989) authors. Again, education was found to be positively related to Stroop test performance by some authors (Hameleers et al., 2000; Houx et al., 1993; Moering et al., 2003; Van Boxtel et al., 2001) but not by others (Trenerry et al., 1989).
In this study, we administered the Stroop test to a sample of 1,856 cognitively intact men and women aged 24 to 81 years with different levels of educational attainment in order to establish normative data. In the past decade, there has been considerable debate about which methodological approach should be used to derive normative data, that is, a traditional approach or a regression-based approach (see Fastenau, 1998; Fastenau & Adams, 1996; Heaton, Avitable, Grant, & Matthews, 1999; Heaton, Matthews, Grant, & Avitable, 1996; Van Breukelen & Vlaeyen, in press). The main difference between the two approaches is that with the traditional approach, normative data are derived from raw test scores by splitting the sample into relevant demographic subgroups, whereas with the regression-based approach, normative data are derived from test scores as predicted from the relevant demographic variables. Because both methods have their advantages, either methodological or in terms of ease of use, we established normative data using both methods.

METHOD

Participants

The data were derived from the Maastricht Aging Study (MAAS), a prospective study into the determinants of cognitive aging (Jolles, Houx, Van Boxtel, & Ponds, 1995). Participants in the MAAS study were recruited from the Registration Network Family Practices (RegistratieNet Huisartspraktijken [RNH]), which includes 80,000 people who live in the province of Limburg in the Netherlands (Metsemakers, Höppener, Knottnerus, Kocken, & Limonard, 1992). RNH physicians classify health problems according to the International Classification of Health Problems in Primary Care (ICPC; Lamberts & Wood, 1987). The use of the RNH as a sample frame rather than a general population sample has the advantage that an eligible study sample could be selected beforehand. Thus, individuals with medical conditions known to interfere with cognition (i.e., cerebrovascular pathology, tumors of the nervous system, multiple sclerosis, epilepsy, Parkinsonism, dementia, organic psychosis, schizophrenia, affective psychosis, and mental retardation) were identified using the RNH records and excluded from the sample frame. In total, 10,396 individuals between age 24 and 81 were then randomly drawn from the RNH. They were informed about the study by their general practitioner rather than by the MAAS project staff, which was expected to have a facilitative effect on compliance, and were asked to return a prepaid postcard to indicate their willingness to participate in the study. In total, 4,490 people (43.2%) agreed to participate (3,531 individuals [34%] refused participation and 2,375 [22.8%] people did not return the postcard). Potential participants were then screened in a semistructured interview to check for additional exclusion criteria that were not coded in the RNH (i.e., a history of transient ischemic attacks, brain surgery, hemodialysis for renal failure, electroconvulsive therapy, and chronic psychotropic drug use), which led to the exclusion of 301 individuals. Of the remaining 4,189 participants, 1,856 were randomly selected from 12 discontinuous age categories (25 ± 1 years, 30 ± 1 years . . . 80 ± 1 years) for participation in the study.
Not all data for the 1,856 participants administered the Stroop test were included in the analyses. The following exclusion criteria were used: a score below 24 on the Mini-Mental State Examination (Folstein, Folstein, & McHugh, 1975), the occurrence of technical problems during test assessment, and more than 20 errors on the third Stroop subtask (indicative of possible cognitive problems). This led to the exclusion of data for 68 participants (data for 13, 52, and 3 participants were excluded for the above-mentioned reasons, respectively). Table 1 provides basic descriptive data for the sample. Level of Education (LE) was assessed by classifying formal schooling in a system often used in the Netherlands (De Bie, 1987), which is comparable to the International Standard Classification of Education (United Nations Educational, Scientific and Cultural Organisation, 1976). The participants were grouped as follows: those with at most primary education
64 ASSESSMENT
TABLE 1
Descriptive Characteristics of the Sample (N = 1,788)
Age Level of Education (frequency)
Age Group (years) n M SD Low Average High Male:Female Ratio
(LE low), those with junior vocational training (LE average), and those with senior vocational or academic training (LE high). The ethnic background of all participants was Caucasian, and all participants were native Dutch speakers.

Procedure and Instruments

All participants were tested individually at the neuropsychological laboratory of the Brain & Behaviour Institute in Maastricht, the Netherlands, using the Stroop test version most commonly used in this country (Hammes, 1973). This Stroop test version consists of three subtasks. The stimulus material for each of these subtasks is shown on a white sheet of paper that is landscape oriented (A4 letter format, 11.69 in × 8.26 in [29.70 cm × 20.99 cm]). The 100 stimuli for each subtask are distributed evenly in a 10 × 10 matrix on each sheet of paper with a margin of about 1.97 in (5 cm) at the top, 0.59 in (1.5 cm) on the left and on the right, and 1.57 in (4 cm) at the bottom. The first subtask shows color words in random order (red, blue, yellow, green) printed in black ink (noncapital letters, 0.157 in [0.4 cm] high). Subtask 2 displays solid color patches of 0.276 in × 0.787 in (0.7 cm × 2.0 cm) in one of these four basic colors. The third subtask contains color words printed in an incongruous ink color (noncapital letters, 0.157 in [0.4 cm] high), for example, the word yellow printed in red ink. The Dutch words for red, blue, yellow, and green are similar to their English equivalents in length and pronunciation time (i.e., rood, blauw, geel, and groen, respectively).
The participants were instructed to read the words, name the colors, and finally, name the ink color of the printed words as quickly and as accurately as possible in the three subsequent subtasks. There was no time limit to complete a subtask. The times needed to complete each Stroop subtask served as dependent measures (Stroop I, Stroop II, and Stroop III, respectively). An interference measure was calculated by subtracting the average time needed to complete the first two subtasks from the time needed to complete the third subtask (Interference = Stroop III – [(Stroop I + Stroop II) / 2]; Valentijn et al., 2005). The examiners did not point out errors made during the test. Although many participants spontaneously corrected themselves when they noticed an error (which requires a certain amount of time, so that the Stroop I, Stroop II, and Stroop III scores were, to some extent, indirectly corrected for poor accuracy), this was not always the case. Therefore, the number of errors that were not self-corrected was also recorded for each Stroop subtask (Error I, Error II, and Error III, respectively).
The data were collected by five examiners who had been intensively trained in test administration. Regular training sessions were scheduled to ensure uniform administration of the Stroop.

Data Analysis

We first established which demographic variables were and were not predictive for the different Stroop scores so that the normative data could be appropriately corrected for these variables. Thus, the Stroop scores were regressed on age, age², sex, level of education, and all two-way interactions. Age was centered (calendar age – 50) before computing the quadratic terms and interactions to avoid multicollinearity (Marquardt, 1980). Sex was dummy coded with male = 1 and female = 0. LE was dummy coded with
TABLE 2
Final Regression Models for the Stroop Scores
Measure Variable B SE B T Standardized B R²
NOTE: The full models included age, age², sex, LE low, LE high, and all two-way interactions as predictors. LE = level of education. Coding of the predictors: Age = calendar age – 50; Age² = (calendar age – 50)²; Sex: male = 1, female = 0; LE low: low education = 1, average or high education = 0; LE high: high education = 1, low or average education = 0.
*p < .005. **p < .001.
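The predictor coding given in the note to Table 2 can be sketched in a few lines of Python. This is only an illustration of the coding scheme: the function names, the dictionary keys, and the string labels for sex and education are hypothetical and do not come from the article.

```python
def code_predictors(calendar_age, sex, education):
    """Code one person's demographics as described in the note to Table 2."""
    age = calendar_age - 50  # age is centered at 50 years
    return {
        "age": age,
        "age2": age ** 2,                             # quadratic age term
        "sex": 1 if sex == "male" else 0,             # male = 1, female = 0
        "le_low": 1 if education == "low" else 0,     # low education dummy
        "le_high": 1 if education == "high" else 0,   # high education dummy
    }


def interaction(coded, first, second):
    """A two-way interaction term is the product of two coded predictors."""
    return coded[first] * coded[second]


# A 40-year-old man with an average level of education.
print(code_predictors(40, "male", "average"))
```

With this coding, a person with an average level of education scores 0 on both education dummies, so average education acts as the reference category.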
TABLE 3
Zero-Order Correlations Between the Predictors Used in the
Regression Analyses and the Stroop Measures
Age Sex LE Low LE High Stroop I Stroop II Stroop III Interference Error I Error II Error III
Age 1
Sex 0.008 1
LE low 0.313** –0.098**a 1
LE high –0.193** 0.128**a –0.409**a 1
Stroop I 0.379** 0.017 0.343** –0.223** 1
Stroop II 0.429** 0.085** 0.322** –0.214** 0.756** 1
Stroop III 0.578** 0.061 0.407** –0.232** 0.602** 0.739** 1
Interference 0.547** 0.051 0.368** –0.197** 0.377** 0.532** 0.958** 1
Error I 0.044 –0.002 0.077** –0.058 0.046 0.074* 0.078** 0.071* 1
Error II 0.084** 0.008 0.101** –0.088** 0.072* 0.143** 0.137** 0.125** 0.112** 1
Error III 0.176** 0.026 0.194** –0.094** 0.176** 0.188** 0.321** 0.323** 0.107** 0.167** 1
a. These correlations (between two dummy variables) correspond to φ coefficients (Hays & Winkler, 1971).
*p < .005. **p < .001.
FIGURE 1
Predicted Stroop Interference Scores for Male Participants by Level of Education as a Function of Age
[Figure: y-axis, predicted Stroop Interference score (20 to 90); x-axis, age (25 to 80 years); separate lines for LE low, LE average, and LE high.]
FIGURE 2
Predicted Stroop Error III Scores for Males and Females, by Level of Education as a Function of Age
[Figure: y-axis, predicted Stroop Error III score (0 to 2.5); x-axis, age (25 to 80 years); separate lines for LE low, LE average, and LE high.]
residuals) as a function of predicted scores did not solve the problem of skewed Error II and Error III standardized residual scores, which makes the regression-based normative approach inappropriate for these scores. For this reason, there are no SD (residual) values for these scores presented in Table 5.
As an example of the regression-based method, the interference score of 35 of the 40-year-old man from the previous example is considered. The predicted interference score for this person was 35.676 [= 36.066 + (.500 * –10) + (.016 * [–10]²) + (3.010 * 1) + (8.505 * 0) + (–2.092 * 0) + (.167 * 0) + (–.019 * 0)]. So the residual was .676 (= –[35 – 35.676]) and the standardized residual was .053 (= .676 / 12.667). This standardized residual corresponds to an exact p value of .52 and can be considered normal. Because the steps that are required in this regression-based approach require some active calculation from the user of the normative data, we also calculated simplified normative tables to increase the user-friendliness of the regression-based norms (see Tables B1 to B4 in the appendix). If an individual is not exactly 25, 30, . . . , 80 years old, then the person’s age should be rounded up to the closest age given. If the performance of such an individual is borderline normal (e.g., Z value ≈ –1.64), the exact Z values can be determined using Tables 2 and 5.

DISCUSSION

In this study, we first determined which variables affect Stroop performance, i.e., age, sex, LE, and all possible two-way interactions between these predictors.
All the speed-dependent Stroop test scores (Stroop I, Stroop II, Stroop III, and interference) were profoundly affected by linear and quadratic age effects and LE (see Table 2). Although MacLeod (1991) suggested that sex only has a minor influence on Stroop test performance at any age, we found clear sex differences on the Stroop II, Stroop III, and the interference scores, with women outperforming men over the entire age range studied (there were no Sex × Age interactions). With respect to the accuracy measures (Error I, Error II, and Error III), the influence of the demographic variables was less profound: The Error I score was not affected by any of the demographic
TABLE 4
Descriptive Statistics for the Stroop Measures Stratified by Their Relevant Predictors
Age Group
Measure Sex LE All 25 ± 1 30 ± 1 35 ± 1 40 ± 1 45 ± 1 50 ± 1 55 ± 1 60 ± 1 65 ± 1 70 ± 1 75 ± 1 80 ± 1
Females Low
M 58.70 54.45 57.67 57.20 51.70 56.14 57.74 62.50 61.36 60.80 67.90 71.38
SD 8.47 7.74 10.06 13.84 7.58 9.19 8.57 8.82 10.11 8.13 11.30 11.82
n 13 13 11 16 26 35 48 42 51 45 48 18
Average
M 49.68 51.17 48.84 52.13 51.53 52.39 56.34 53.89 57.08 56.72 63.80 65.71
SD 8.27 6.31 6.84 8.30 7.84 7.23 7.10 7.52 8.44 10.01 8.43 11.06
n 40 38 52 48 41 28 22 27 21 23 26 5
High
M 49.66 49.41 50.86 44.91 49.88 55.69 48.62 56.55 58.41 54.58 56.34 61.96
SD 8.72 9.47 4.52 5.62 6.51 14.29 11.04 7.71 10.75 11.80 9.02 7.63
n 28 24 16 14 17 18 6 7 5 6 6 6
(continued)
TABLE 4 (continued)
Age Group
Measure Sex LE All 25 ± 1 30 ± 1 35 ± 1 40 ± 1 45 ± 1 50 ± 1 55 ± 1 60 ± 1 65 ± 1 70 ± 1 75 ± 1 80 ± 1
Females Low
M 92.55 89.16 90.32 91.13 82.68 91.90 100.17 111.74 109.83 112.19 131.97 150.17
SD 16.74 17.67 13.99 24.64 15.74 13.43 18.32 21.16 22.15 20.24 30.48 36.24
n 13 13 11 16 26 35 48 42 51 45 48 18
Average
M 73.93 76.85 77.08 80.32 83.82 83.22 95.42 88.01 97.75 106.03 114.55 126.22
SD 11.00 12.66 14.15 11.75 13.89 17.12 14.58 13.77 25.37 26.45 19.39 21.86
n 40 38 52 48 41 28 22 27 21 23 26 5
High
M 75.75 74.52 80.68 69.63 74.76 86.23 79.72 88.11 91.76 95.68 108.54 125.61
SD 13.50 17.37 11.20 10.52 9.17 15.50 18.72 11.90 19.55 16.26 29.94 33.28
n 28 24 16 14 17 18 6 7 5 6 6
Interference Males Low
M 47.14 36.29 38.75 43.03 53.65 54.90 55.87 49.55 60.70 65.33 76.36 88.95
SD 8.34 8.53 12.86 16.59 18.59 19.37 18.49 17.17 25.11 28.49 22.91 29.22
n 8 9 22 21 23 17 33 31 32 35 32 20
Average
M 36.16 36.17 33.06 37.36 38.85 39.35 45.02 43.12 46.88 52.73 60.92 74.44
SD 11.19 10.95 9.56 10.55 14.94 15.94 14.58 13.50 14.03 18.94 20.06 24.52
n 37 41 25 24 32 43 27 38 29 34 27 4
High
M 29.52 36.47 33.33 37.68 38.01 34.04 38.77 44.35 49.86 49.95 57.90 61.24
SD 7.52 10.97 11.56 15.30 13.00 11.58 13.77 17.72 13.55 12.73 19.60 17.55
n 33 29 31 32 23 19 21 12 15 12 18 6
Females Low
M 41.33 40.42 37.97 38.55 36.30 41.52 49.09 56.51 55.73 58.59 72.47 89.06
SD 15.42 14.80 10.05 16.35 10.94 10.29 14.79 18.42 18.72 18.34 26.45 29.11
n 13 13 11 16 26 35 48 42 51 45 48 18
Average
M 30.13 31.04 33.72 33.76 37.48 35.56 46.16 41.10 47.07 55.91 58.54 70.81
SD 7.43 10.42 10.64 10.95 10.73 14.34 12.22 11.19 23.10 20.89 16.41 14.78
n 40 38 52 48 41 28 22 27 21 23 26 5
High
M 31.34 29.81 35.15 28.64 29.84 38.85 35.03 37.94 40.88 47.16 58.03 69.91
SD 9.82 12.95 10.29 8.56 9.60 11.53 11.81 7.35 7.85 7.68 25.88 27.93
n 28 24 16 14 17 18 6 7 5 6 6 6
Error I Males and females All LEs
a
M 0.21
SD 0.593
n 1,788
Error II Males and females Low
a
M 0.46
SD 0.94
n 649
Average
a
M 0.33
SD 0.81
n 732
High
a
M 0.22
SD 0.61
n 404
Error III Males and females Low
a a a a a a a a a
M 1.14 0.50 0.82 0.57 0.86 1.46 0.80 0.89 1.01 1.15 2.63 2.45
SD 1.65 0.67 1.18 1.28 1.61 2.04 1.44 1.45 1.87 2.03 4.46 3.45
n 21 22 33 37 49 52 80 73 83 80 79 38
Average
a a a a a a a a a a a
M 0.52 0.53 0.45 0.39 0.68 0.49 0.55 0.54 0.96 0.47 0.60 1.33
SD 0.84 1.02 0.91 0.85 1.26 0.89 1.50 0.99 2.37 0.89 1.10 1.58
n 77 79 77 72 73 71 49 65 49 57 53 9
High
a a a a a a a a a
M 0.33 0.60 0.32 0.57 0.35 0.27 0.30 0.26 0.42 0.72 1.33 0.92
SD 0.89 1.81 0.69 1.26 0.80 0.56 0.78 0.73 0.84 1.32 2.55 2.02
n 61 53 47 46 40 37 27 19 19 18 24 12
NOTE: LE = level of education. The normative statistics for the scores that are stratified by LE were based on a total sample of 1,785 instead of 1,788 because data on LE were missing from 3 participants. Data on
Error III scores were missing from 4 participants.
a. The mean and standard deviation for this subgroup cannot be used to convert the raw scores into Z scores because the raw scores in this subgroup were not normally distributed (p value of the Kolmogorov-
Smirnov Z < .005).
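As an illustration of the traditional approach, the sketch below converts a raw interference score into a Z score using a subgroup mean and standard deviation from Table 4. It rests on two assumptions that are not stated in the table itself: that the fourth value in each row belongs to the 40 ± 1 age group (giving M = 37.36 and SD = 10.55 for males with an average level of education), and that the sign is reversed so that a positive Z means a faster than average performance, mirroring the regression-based example in the text. The function name is hypothetical.

```python
def traditional_z(raw_time, subgroup_mean, subgroup_sd):
    """Z score of a time-based Stroop score relative to its normative subgroup.

    The sign is reversed (an assumption) so that positive values indicate
    a faster, that is better, performance than the subgroup mean.
    """
    return -(raw_time - subgroup_mean) / subgroup_sd


# Interference score of 35 for a 40-year-old man, LE average (Table 4 values).
z = traditional_z(35.0, 37.36, 10.55)
print(round(z, 2))  # 0.22
```

Subgroups marked with note a in Table 4 should not be converted in this way, because their raw scores were not normally distributed.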
12 years) would be evaluated against the same normative data, which is not acceptable (Capitani, 1997).
In the regression-based approach, the problem of unreliable normative data for certain subgroups because the sample was subdivided does not occur. Indeed, regression-based norms provide more accurate estimates of population statistics because they are based on equations that are derived using the data for all demographic groups (Van Breukelen & Vlaeyen, in press; Zachary & Gorsuch, 1985). In fact, normative data can even be provided for people with certain demographic characteristics that were not in the sample. For example, in the present study, discontinuous age groups were used (24-26 years, 29-31 years, 34-36 years, etc.), but normative data for nontested people aged 28 years, for example, could be determined by using the regression model and the SD (residual) derived from the scores of the tested people. Also, the problem of unbalanced data does not occur in the regression-based approach because any imbalance in the sample does not bias the estimation or testing of the regression weights but only causes some loss of statistical power (because the standard error of a regression weight is proportional to the VIF of the predictor at hand; see, e.g., Kleinbaum, Kupper, Muller, & Nizam, 1998). In addition, the regression-based normative approach offers some interesting possibilities that cannot be achieved by using a traditional approach. For example, as an alternative for the interference score, a regression model can be constructed that predicts the Stroop III score based on the Stroop I and Stroop II scores in addition to the relevant demographic variables. Such an approach avoids the use of difference scores such as the interference score, which may reduce the problems that are associated with such scores, that is, their typically low reliability. We did not provide such measures here, because normative tables for each relevant combination of predictors (Stroop I score, Stroop II score, age, sex, and/or LE) would be too numerous. Thus, for such norms to be used, the three-step procedure to convert raw scores into standardized residuals would be required, which is not very user-friendly. A solution is to use a computer-based algorithm that performs these calculations automatically (which can easily be done with standard programs such as Microsoft Excel) or to use categorized values instead of continuous values.
Although the regression-based approach has some significant advantages over the traditional method, it requires some distributional assumptions that are not always satisfied. Indeed, in our study, the regression-based method was appropriate to derive norms for the speed-dependent scores (Stroop I, Stroop II, Stroop III, and interference scores), but there were problems with the accuracy measures (error scores). An important difference between both types of scores is that the speed-dependent scores are continuous variables, whereas the accuracy measures are discrete. In general, linear regression analysis is primarily suitable to analyze continuous dependent variables, although in most cases discrete dependent variables can also be analyzed with regression analysis. For example, the regression-based normative approach was found to be appropriate in previous normative research with the Verbal Learning Test of Rey (Van der Elst, Van Boxtel, Van Breukelen, & Jolles, 2005), a test for declarative memory of which most of the scores are discrete variables (with a range of the scores of 0 to 15). The Stroop error scores are also discrete variables that cover about the same range of scores, but a problem with the error scores is that they are confined to a small part of the possible range of the scores and that they are highly skewed. When such data are analyzed with regression analysis, the assumption of normally distributed residuals is usually violated (Fox, 1997). This assumption is arbitrary for most purposes, because regression analysis is robust against violations of the normality of the residuals assumption (Fox, 1997), but in the context of the regression-based normative approach, this assumption is important because the standardized residuals can only be interpreted via a Z distribution table if they are normally distributed. However, heavily skewed scores that are confined to a small part of the possible range of scores can be normed by using a method similar to the regression-based approach, that is, by dichotomizing the scores and analyzing them by logistic regression (Fox, 1997), but this method is more complex and beyond the scope of this article. Rather than using logistic regression, we used cumulative percentages (see the appendix; Table A1) to evaluate the error scores of a testee. The use of cumulative percentages avoids the problem of nonnormally distributed raw scores (which is required to use the Z-score transformation in the traditional approach), but the problems associated with the division of the sample into subgroups remain. Indeed, for the Error III score, the sample had to be divided into many subgroups because of significant effects of age and LE on this score, which means that the cumulative percentages of the Error III scores for certain subgroups are derived from the data of small samples (see the ns in Table 4). Thus, caution is needed when using the normative cumulative percentages for the Error III score.
Although the normative data presented here are for Dutch-speaking participants, the Stroop test has been found to be culturally robust in a large-scale study that involved seven different languages (English, Danish, Dutch, French, German, Greek, and Spanish; Møller et al., 1998). This suggests that the normative data can be used for people living in different cultural areas or for people who speak other languages, but more research is needed. Houx et al. (2002) showed that the shortened Stroop test (with 40 items per subtest instead of 100) was a reliable test (test/retest correlation with 2-week test interval of .80 for the 40-item Stroop III score) in a large sample of elderly participants in Great Britain, Ireland, and the Netherlands (N = 5,804; age range = 70-82 years). The Pearson’s correlations varied less than .05 for each of the three countries, which suggests that the psychometric properties of the Stroop test are similar across countries and languages. As a final remark, it should be kept in mind we used the Hammes (1973) version of the Stroop test. Although the Stroop interference effect is a robust effect that has been observed with numerous different versions of the Stroop test, more research is needed regarding the comparability and interchangeability of the different Stroop test versions.
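The discussion notes that the conversion of a raw score into a standardized residual can be automated. A minimal Python sketch of that idea is given below, using only the interference weights, the predictor values, and the SD (residual) of 12.667 quoted in the worked example in the text. The function and variable names are hypothetical, and the predictor labels attached to the last four terms are assumptions, because the text lists those terms without naming them (all four are multiplied by zero for this person).

```python
import math

# Interference model terms as quoted in the worked example:
# (regression weight, coded predictor value) for a 40-year-old man.
INTERCEPT = 36.066
TERMS = [
    (0.500, -10),         # age, coded as calendar age - 50
    (0.016, (-10) ** 2),  # age squared
    (3.010, 1),           # sex (male = 1)
    (8.505, 0),           # presumably LE low (assumed label)
    (-2.092, 0),          # presumably LE high (assumed label)
    (0.167, 0),           # interaction term (label not given in the text)
    (-0.019, 0),          # interaction term (label not given in the text)
]
SD_RESIDUAL = 12.667


def predicted_score(intercept, terms):
    """Step 1: predicted score from the regression weights."""
    return intercept + sum(weight * value for weight, value in terms)


def standardized_residual(observed, predicted, sd_residual):
    """Steps 2 and 3: sign-reversed residual divided by SD (residual)."""
    return -(observed - predicted) / sd_residual


def normal_cdf(z):
    """Cumulative probability of a standard normal deviate."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


predicted = predicted_score(INTERCEPT, TERMS)
z = standardized_residual(35, predicted, SD_RESIDUAL)
print(round(predicted, 3), round(z, 3), round(normal_cdf(z), 2))
# 35.676 0.053 0.52
```

The printed values match the predicted score, standardized residual, and p value reported in the worked example.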
APPENDIX A1
Cumulative Percentages for the Raw Error I, Error II, and Error III Scores That Represent the Proportion of Individuals
Who Score At Or Above 0, 1, 2, 3, 4, and 5, Stratified by Their Relevant Predictors
4 .6
3 1.4
2 3.8
1 14.7
0 100.0
MALE & FEMALE
Raw score All Age Groups
5 .6
4 1.5
LE low
3 4.0
2 11.6
1 27.2
0 100.0
Raw score All Age Groups
5 .5
LE average
4 .8
Error II
3 2.2
2 7.1
1 21.0
0 100.0
Raw score All Age Groups
5 .2
LE high
4 1.5
3 1.5
2 5.4
1 14.1
0 100.0
MALE & FEMALE
Age group
Raw score
25±1 30±1 35±1 40±1 45±1 50±1 55±1 60±1 65±1 70±1 75±1 80±1
LE low
5 4.8 0.0 0.0 2.7 2.0 9.6 1.3 4.1 6.0 7.5 17.7 18.4
4 9.5 0.0 3.0 5.4 4.1 17.3 3.8 4.1 9.6 11.3 22.8 21.1
3 19.0 0.0 12.1 8.1 12.2 25.0 11.3 8.2 14.5 15.0 27.8 28.9
2 28.6 9.1 27.3 10.8 22.4 34.6 22.5 24.7 24.1 27.5 38.0 39.5
1 47.6 40.9 39.4 27.0 36.7 48.1 36.3 42.5 37.3 42.5 55.7 63.2
0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0
Age group
Raw score
25±1 30±1 35±1 40±1 45±1 50±1 55±1 60±1 65±1 70±1 75±1 80±1
Error III
LE average
5 0.0 0.0 0.0 0.0 1.4 0.0 4.1 1.5 2.0 0.0 1.9 0.0
4 0.0 2.5 2.6 1.4 4.1 1.4 4.1 1.5 2.0 1.8 3.8 22.0
3 5.2 8.9 5.2 4.2 13.7 5.6 8.2 3.1 8.2 5.3 5.7 22.0
2 11.7 13.9 10.4 11.1 21.9 11.3 16.3 10.8 18.4 10.5 17.0 22.0
1 35.1 27.8 27.3 22.2 27.4 31.0 16.3 35.4 42.9 29.8 32.1 66.7
0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0
Age group
Raw score
25±1 30±1 35±1 40±1 45±1 50±1 55±1 60±1 65±1 70±1 75±1 80±1
5 1.6 3.8 0.0 4.3 0.0 0.0 0.0 0.0 0.0 5.6 12.5 8.3
LE high
4 1.6 3.8 0.0 6.5 0.0 0.0 0.0 0.0 0.0 5.6 12.5 8.3
3 3.3 5.7 2.1 8.7 5.0 0.0 3.7 5.3 0.0 5.6 12.5 8.3
2 9.8 5.7 8.5 10.9 10.0 5.4 11.1 5.3 21.1 22.2 25.0 16.7
1 16.4 28.3 21.3 26.1 20.0 21.6 14.8 15.8 21.1 33.3 37.5 33.3
0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0
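Reading a cumulative percentage from Appendix A1 amounts to a simple lookup, sketched below for one column of the table (Error III, LE low, age group 40 ± 1, values copied from the rows above). The names are hypothetical, and the handling of scores above 5 is an assumption: the table stops at 5, so for higher scores the sketch returns the value for 5, which is then only an upper bound.

```python
# Error III, LE low, age group 40 +/- 1 (column copied from Appendix A1):
# raw error score -> percentage of the subgroup scoring at or above it.
ERROR_III_LE_LOW_AGE_40 = {0: 100.0, 1: 27.0, 2: 10.8, 3: 8.1, 4: 5.4, 5: 2.7}


def percent_at_or_above(raw_errors, table):
    """Percentage of the normative subgroup with at least raw_errors errors.

    Scores above the highest tabulated score are capped at that score,
    so the returned percentage is an upper bound in that case.
    """
    capped = min(raw_errors, max(table))
    return table[capped]


print(percent_at_or_above(3, ERROR_III_LE_LOW_AGE_40))  # 8.1
```

In this subgroup, 8.1% of the normative sample made three or more uncorrected errors on the third subtask.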
APPENDIX B1
Regression based normative data for the Stroop I score stratified by Age (25, 30, ..., 80 years) and Level of Education. The
raw score leading to a particular Z-value is given for Z-values indicating the percentiles 5, 10, 20, 50, 80, 90 and 95
LE low
1.28 .90 34.5 34.5 34.6 34.9 35.3 35.9 36.6 36.6 37.6 38.8 40.1 41.6
0.84 .80 37.6 37.6 37.8 38.0 38.5 39.0 39.8 40.1 41.1 42.3 43.6 45.1
0 .50 43.7 43.7 43.8 44.1 44.5 45.1 45.8 46.7 47.8 48.9 50.3 51.7
-0.84 .20 49.8 49.8 49.9 50.2 50.6 51.2 51.9 53.4 54.4 55.6 56.9 58.4
-1.28 .10 52.9 52.9 53.1 53.3 53.8 54.3 55.1 56.9 57.9 59.1 60.4 61.9
-1.64 .05 55.5 55.5 55.7 55.9 56.4 56.9 57.7 59.7 60.7 61.9 63.3 64.7
1.28 .90 32.5 32.5 32.0 32.3 32.7 33.3 34.1 34.9 34.9 36.1 36.5 38.0
0.84 .80 35.1 35.1 34.9 35.1 35.6 36.1 36.9 37.8 38.1 39.3 40.0 41.5
0 .50 40.1 40.1 40.2 40.5 40.9 41.5 42.2 43.1 44.2 45.3 46.7 48.1
-0.84 .20 45.1 45.1 45.6 45.9 46.3 46.9 47.6 48.5 50.2 51.4 53.3 54.8
-1.28 .10 47.7 47.7 48.4 48.7 49.1 49.7 50.4 51.3 53.4 54.6 56.8 58.3
-1.64 .05 49.9 49.9 50.7 51.0 51.4 52.0 52.7 53.6 56.0 57.2 59.7 61.1
value prob. 25 30 35 40 45 50 55 60 65 70 75 80
1.64 .95 28.8 28.8 28.9 29.2 29.7 30.2 30.2 31.1 32.2 32.0 33.3 33.6
LE high
1.28 .90 31.0 31.0 31.1 31.4 31.8 32.4 32.5 33.4 34.5 34.6 35.9 36.5
0.84 .80 33.6 33.6 33.7 34.0 34.4 35.0 35.4 36.2 37.3 37.8 39.1 40.0
0 .50 38.6 38.6 38.7 39.0 39.4 40.0 40.7 41.6 42.7 43.8 45.2 46.6
-0.84 .20 43.6 43.6 43.7 44.0 44.4 45.0 46.1 47.0 48.0 49.9 51.2 53.3
-1.28 .10 46.2 46.2 46.4 46.6 47.1 47.6 48.9 49.8 50.8 53.1 54.4 56.8
-1.64 .05 48.4 48.4 48.5 48.8 49.2 49.8 51.2 52.1 53.1 55.7 57.0 59.6
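The simplified tables can be used by bracketing an observed time between two tabulated raw scores. The sketch below does this for one column of Appendix B1 (LE low, age 25), with the values copied from the rows above; the 1.64 row for LE low does not appear in the table as reproduced here and is therefore left out. The function name and the convention of returning None for an open-ended band are hypothetical.

```python
# (Z value, cumulative probability, raw Stroop I time) for LE low, age 25,
# copied from Appendix B1. Higher times mean slower performance.
B1_LE_LOW_AGE_25 = [
    (1.28, 0.90, 34.5),
    (0.84, 0.80, 37.6),
    (0.00, 0.50, 43.7),
    (-0.84, 0.20, 49.8),
    (-1.28, 0.10, 52.9),
    (-1.64, 0.05, 55.5),
]


def percentile_band(raw_time, rows):
    """Return (lower, upper) cumulative probabilities bracketing raw_time.

    None marks an open end: performance beyond the tabulated range.
    """
    reached = [prob for _, prob, raw in rows if raw_time >= raw]
    not_reached = [prob for _, prob, raw in rows if raw_time < raw]
    upper = min(reached) if reached else None
    lower = max(not_reached) if not_reached else None
    return lower, upper


print(percentile_band(51.0, B1_LE_LOW_AGE_25))  # (0.1, 0.2)
```

A Stroop I time of 51.0 in this subgroup therefore falls between the 10th and 20th percentiles; for a borderline result, the exact Z value should still be computed from the regression model.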
APPENDIX B2
Regression based normative data for the Stroop II score stratified by Age (25, 30, ..., 80 years), Sex, and Level of Education.
The raw score leading to a particular Z-value is given for Z-values indicating the percentiles 5, 10, 20, 50, 80, 90 and 95
MALE FEMALE
Z Cum. Age in years Age in years
value prob. 25 30 35 40 45 50 55 60 65 70 75 80 25 30 35 40 45 50 55 60 65 70 75 80
1.64 .95 42.8 42.3 42.1 42.3 42.8 43.6 44.9 44.5 46.4 48.7 51.3 54.3 42.0 41.5 41.3 41.4 42.0 41.3 42.5 44.0 44.1 46.3 48.9 51.9
LE low
1.28 .90 46.2 45.7 45.5 45.6 46.2 47.0 48.3 48.3 50.3 52.5 55.1 58.1 45.0 44.5 44.3 44.5 45.0 44.6 45.9 47.4 47.9 50.1 52.8 55.7
0.84 .80 50.3 49.8 49.6 49.8 50.3 51.2 52.4 53.0 54.9 57.2 59.8 62.8 48.7 48.2 48.0 48.2 48.7 48.8 50.0 51.6 52.5 54.8 57.4 60.4
0 .50 58.2 57.7 57.5 57.7 58.2 59.1 60.3 61.9 63.8 66.1 68.7 71.7 55.9 55.3 55.1 55.3 55.8 56.7 57.9 59.5 61.4 63.7 66.3 69.3
-0.84 .20 66.2 65.6 65.4 65.6 66.1 67.0 68.2 70.8 72.7 75.0 77.6 80.6 63.0 62.4 62.2 62.4 62.9 64.6 65.8 67.4 70.3 72.6 75.2 78.2
-1.28 .10 70.3 69.8 69.6 69.8 70.3 71.1 72.4 75.4 77.4 79.6 82.2 85.2 66.7 66.2 66.0 66.1 66.7 68.8 70.0 71.5 75.0 77.2 79.9 82.8
-1.64 .05 73.7 73.2 73.0 73.2 73.7 74.5 75.8 79.2 81.2 83.4 86.1 89.0 69.7 69.2 69.0 69.2 69.7 72.2 73.4 74.9 78.8 81.0 83.7 86.6
LE average
1.28 .90 43.2 42.7 42.5 42.6 43.2 44.0 44.0 45.6 47.5 48.3 50.9 53.9 41.4 40.9 40.7 40.9 41.4 41.6 42.9 44.4 45.1 47.4 48.5 51.5
0.84 .80 46.9 46.4 46.2 46.4 46.9 47.8 48.2 49.7 51.7 52.9 55.6 58.5 44.9 44.4 44.2 44.4 44.9 45.4 46.6 48.2 49.3 51.5 53.2 56.1
0 .50 54.0 53.5 53.3 53.5 54.0 54.9 56.1 57.6 59.6 61.8 64.5 67.4 51.6 51.1 50.9 51.1 51.6 52.5 53.7 55.3 57.2 59.4 62.1 65.0
-0.84 .20 61.1 60.6 60.4 60.6 61.1 62.0 64.0 65.6 67.5 70.7 73.4 76.3 58.3 57.8 57.6 57.8 58.3 59.6 60.8 62.4 65.1 67.4 71.0 73.9
-1.28 .10 64.8 64.3 64.1 64.3 64.8 65.7 68.1 69.7 71.6 75.4 78.0 81.0 61.8 61.3 61.1 61.3 61.8 63.3 64.5 66.1 69.2 71.5 75.6 78.6
-1.64 .05 67.9 67.4 67.2 67.3 67.9 68.7 71.5 73.1 75.0 79.2 81.8 84.8 64.7 64.2 64.0 64.2 64.7 66.3 67.6 69.1 72.6 74.9 79.4 82.4
LE high
1.28 .90 40.8 40.9 40.7 40.9 41.4 41.7 42.9 44.5 45.2 47.4 48.6 51.5 39.0 38.5 38.3 38.5 39.0 39.9 41.1 42.1 44.0 45.0 47.7 49.1
0.84 .80 44.6 44.4 44.2 44.4 44.9 45.4 46.6 48.2 49.3 51.6 53.2 56.2 42.6 42.0 41.9 42.0 42.5 43.4 44.6 45.8 47.7 49.2 51.8 53.8
0 .50 51.7 51.1 51.0 51.1 51.6 52.5 53.7 55.3 57.2 59.5 62.1 65.1 49.3 48.7 48.6 48.7 49.3 50.1 51.3 52.9 54.8 57.1 59.7 62.7
-0.84 .20 58.8 57.8 57.7 57.8 58.4 59.6 60.8 62.4 65.1 67.4 71.0 74.0 56.0 55.5 55.3 55.4 56.0 56.8 58.1 60.0 61.9 65.0 67.6 71.6
-1.28 .10 62.5 61.4 61.2 61.3 61.9 63.3 64.6 66.1 69.3 71.5 75.7 78.6 59.5 59.0 58.8 59.0 59.5 60.3 61.6 63.7 65.7 69.2 71.8 76.2
-1.64 .05 65.5 64.2 64.1 64.2 64.7 66.4 67.6 69.2 72.7 74.9 79.5 82.4 62.4 61.8 61.7 61.8 62.4 63.2 64.4 66.8 68.7 72.5 75.2 80.1
APPENDIX B3
Regression-based normative data for the Stroop III score, stratified by age (25, 30, ..., 80 years), sex, and level of education (LE).
The raw score leading to a particular Z-value is given for the Z-values corresponding to the percentiles 5, 10, 20, 50, 80, 90, and 95.
MALE (first 12 score columns) and FEMALE (last 12 score columns), by age in years
Z value Cum. prob. 25 30 35 40 45 50 55 60 65 70 75 80 25 30 35 40 45 50 55 60 65 70 75 80
LE low
1.64 .95 64.9 63.3 62.8 63.5 65.4 68.4 72.5 67.3 73.7 81.3 90.0 99.9 65.6 64.0 63.5 64.2 60.9 63.9 68.0 73.3 69.2 76.8 85.6 95.5
1.28 .90 71.9 70.3 69.9 70.5 72.4 75.4 79.5 76.6 83.0 90.6 99.4 109.3 71.5 69.9 69.4 70.1 67.9 70.9 75.1 80.4 78.6 86.2 94.9 104.8
0.84 .80 80.5 78.9 78.4 79.1 81.0 84.0 88.1 88.0 94.5 102.0 110.8 120.7 78.7 77.1 76.6 77.3 76.5 79.5 83.6 88.9 90.0 97.6 106.3 116.2
0 .50 96.9 95.3 94.8 95.5 97.4 100.4 104.5 109.8 116.2 123.8 132.6 142.5 92.4 90.8 90.4 91.0 92.9 95.9 100.0 105.3 111.8 119.4 128.1 138.0
-0.84 .20 113.3 111.7 111.2 111.9 113.7 116.7 120.9 131.6 138.0 145.6 154.4 164.3 106.2 104.6 104.1 104.8 109.3 112.3 116.4 121.7 133.6 141.2 149.9 159.8
-1.28 .10 121.8 120.2 119.8 120.5 122.3 125.3 129.5 143.0 149.4 157.0 165.8 175.7 113.4 111.8 111.3 112.0 117.9 120.9 125.0 130.3 145.0 152.6 161.3 171.2
-1.64 .05 128.9 127.3 126.8 127.5 129.4 132.3 136.5 152.3 158.8 166.4 175.1 185.0 119.3 117.6 117.2 117.9 124.9 127.9 132.0 137.3 154.3 161.9 170.6 180.5
LE average
1.28 .90 62.6 61.0 60.6 61.3 63.1 66.1 70.3 71.5 78.0 77.4 86.1 96.0 61.3 59.6 59.2 59.9 61.7 61.7 65.8 71.1 73.5 81.1 81.6 91.5
0.84 .80 69.8 68.2 67.8 68.5 70.3 73.3 77.5 80.1 86.6 88.8 97.5 107.4 67.4 65.8 65.3 66.0 67.9 68.9 73.0 78.3 82.1 89.7 93.0 102.9
0 .50 83.6 82.0 81.5 82.2 84.1 87.1 91.2 96.5 103.0 110.6 119.3 129.2 79.1 77.5 77.1 77.8 79.6 82.6 86.7 92.0 98.5 106.1 114.8 124.7
-0.84 .20 97.3 95.7 95.3 96.0 97.8 100.8 105.0 112.9 119.3 132.3 141.1 151.0 90.9 89.2 88.8 89.5 91.3 96.3 100.5 105.8 114.9 122.5 136.6 146.5
-1.28 .10 104.5 102.9 102.5 103.2 105.0 108.0 112.2 121.5 127.9 143.7 152.5 162.4 97.0 95.4 94.9 95.6 97.5 103.6 107.7 113.0 123.5 131.0 148.0 157.9
-1.64 .05 110.4 108.8 108.4 109.1 110.9 113.9 118.1 128.5 134.9 153.1 161.8 171.7 102.0 100.4 100.0 100.7 102.5 109.4 113.6 118.9 130.5 138.1 157.4 167.3
LE high
1.28 .90 61.9 60.2 59.8 60.5 59.3 62.2 66.4 71.7 74.1 81.7 82.2 92.1 57.4 55.8 55.3 56.0 57.9 60.9 61.9 67.2 69.6 77.2 77.8 87.6
0.84 .80 68.0 66.4 65.9 66.6 66.5 69.4 73.6 78.9 82.7 90.3 93.6 103.5 63.5 61.9 61.5 62.2 64.0 67.0 69.1 74.4 78.2 85.8 89.2 99.1
0 .50 79.7 78.1 77.7 78.4 80.2 83.2 87.3 92.6 99.1 106.7 115.4 125.3 75.3 73.6 73.2 73.9 75.7 78.7 82.9 88.2 94.6 102.2 111.0 120.8
-0.84 .20 91.5 89.8 89.4 90.1 94.0 96.9 101.1 106.4 115.5 123.1 137.2 147.1 87.0 85.4 84.9 85.6 87.5 90.5 96.6 101.9 111.0 118.6 132.7 142.6
-1.28 .10 97.6 96.0 95.5 96.2 101.2 104.1 108.3 113.6 124.1 131.6 148.6 158.5 93.1 91.5 91.1 91.8 93.6 96.6 103.8 109.1 119.6 127.2 144.2 154.0
-1.64 .05 102.6 101.0 100.6 101.3 107.0 110.0 114.2 119.5 131.1 138.7 158.0 167.9 98.2 96.5 96.1 96.8 98.6 101.6 109.7 115.0 126.6 134.2 153.5 163.4
APPENDIX B4
Regression-based normative data for the Stroop Interference score, stratified by age (25, 30, ..., 80 years), sex, and level of education (LE).
The raw score leading to a particular Z-value is given for the Z-values corresponding to the percentiles 5, 10, 20, 50, 80, 90, and 95.
MALE (first 12 score columns) and FEMALE (last 12 score columns), by age in years
Z value Cum. prob. 25 30 35 40 45 50 55 60 65 70 75 80 25 30 35 40 45 50 55 60 65 70 75 80
LE low
1.64 .95 20.1 19.9 20.4 16.5 18.6 21.6 25.3 19.0 24.3 30.5 37.4 45.1 17.1 16.9 17.4 18.7 20.9 18.6 22.3 26.8 21.3 27.5 34.4 42.1
1.28 .90 24.7 24.4 25.0 22.2 24.4 27.3 31.0 27.1 32.4 38.6 45.5 53.2 21.7 21.4 22.0 23.3 25.4 24.3 28.0 32.5 29.4 35.5 42.5 50.2
0.84 .80 30.3 30.0 30.5 29.2 31.3 34.3 38.0 37.0 42.3 48.4 55.4 63.1 27.3 27.0 27.5 28.9 31.0 31.3 35.0 39.5 39.3 45.4 52.4 60.1
0 .50 40.9 40.6 41.2 42.5 44.6 47.6 51.3 55.9 61.2 67.3 74.3 82.0 37.9 37.6 38.2 39.5 41.6 44.6 48.3 52.8 58.2 64.3 71.2 79.0
-0.84 .20 51.5 51.3 51.8 55.8 58.0 60.9 64.6 74.7 80.1 86.2 93.1 100.9 48.5 48.3 48.8 50.1 52.3 57.9 61.6 66.2 77.1 83.2 90.1 97.9
-1.28 .10 57.1 56.9 57.4 62.8 64.9 67.9 71.6 84.6 90.0 96.1 103.0 110.8 54.1 53.8 54.4 55.7 57.8 64.9 68.6 73.1 86.9 93.1 100.0 107.7
-1.64 .05 61.7 61.4 61.9 68.5 70.6 73.6 77.3 92.7 98.0 104.2 111.1 118.8 58.7 58.4 58.9 60.3 62.4 70.6 74.3 78.8 95.0 101.2 108.1 115.8
Z value Cum. prob. 25 30 35 40 45 50 55 60 65 70 75 80 25 30 35 40 45 50 55 60 65 70 75 80
LE average
1.64 .95 15.8 14.7 14.4 14.9 16.2 18.3 16.0 19.7 24.2 18.6 24.7 31.6 15.5 14.4 14.1 14.6 15.9 15.3 18.2 16.7 21.2 26.5 21.7 28.6
1.28 .90 20.4 19.3 19.0 19.5 20.8 22.9 21.7 25.4 29.9 26.7 32.8 39.7 19.4 18.3 18.0 18.5 19.8 19.9 22.8 22.4 26.9 32.2 29.8 36.7
0.84 .80 25.9 24.8 24.5 25.0 26.3 28.4 28.7 32.4 36.9 36.6 42.7 49.6 24.3 23.2 22.9 23.4 24.7 25.4 28.3 29.3 33.8 39.1 39.7 46.6
0 .50 36.6 35.5 35.2 35.7 37.0 39.1 42.0 45.7 50.2 55.5 61.6 68.5 33.6 32.5 32.2 32.7 34.0 36.1 39.0 42.7 47.2 52.5 58.6 65.5
-0.84 .20 47.2 46.1 45.8 46.3 47.6 49.7 55.3 59.0 63.5 74.4 80.5 87.4 42.8 41.7 41.4 41.9 43.2 46.7 49.6 56.0 60.5 65.8 77.4 84.3
-1.28 .10 52.8 51.7 51.4 51.9 53.2 55.3 62.3 66.0 70.5 84.2 90.3 97.2 47.7 46.6 46.3 46.8 48.1 52.3 55.2 63.0 67.5 72.8 87.3 94.2
-1.64 .05 57.3 56.2 55.9 56.4 57.7 59.8 68.0 71.7 76.2 92.3 98.4 105.3 51.7 50.6 50.3 50.8 52.1 56.8 59.7 68.7 73.2 78.5 95.4 102.3
Z value Cum. prob. 25 30 35 40 45 50 55 60 65 70 75 80 25 30 35 40 45 50 55 60 65 70 75 80
LE high
1.64 .95 14.2 15.7 15.3 15.7 14.2 16.2 19.0 17.4 21.8 27.0 22.2 29.0 13.8 12.7 12.3 12.7 13.9 15.9 16.0 19.6 18.8 24.0 19.1 25.9
1.28 .90 18.7 19.6 19.2 19.6 18.8 20.8 23.6 23.1 27.5 32.7 30.2 37.0 17.8 16.6 16.2 16.6 17.8 19.8 20.6 24.2 24.5 29.7 27.2 34.0
0.84 .80 24.3 24.5 24.1 24.5 24.3 26.3 29.1 30.1 34.5 39.7 40.1 46.9 22.7 21.5 21.1 21.5 22.7 24.7 26.1 29.7 31.5 36.7 37.1 43.9
0 .50 35.0 33.8 33.4 33.8 35.0 37.0 39.8 43.4 47.8 53.0 59.0 65.8 31.9 30.8 30.4 30.8 32.0 34.0 36.8 40.4 44.8 50.0 56.0 62.8
-0.84 .20 45.6 43.0 42.6 43.0 45.6 47.6 50.4 56.7 61.1 66.3 77.9 84.7 41.2 40.0 39.6 40.0 41.2 43.2 47.4 51.0 58.1 63.3 74.9 81.7
-1.28 .10 51.2 47.9 47.5 47.9 51.2 53.2 56.0 63.7 68.1 73.3 87.8 94.6 46.1 44.9 44.5 44.9 46.1 48.1 53.0 56.6 65.1 70.3 84.8 91.6
-1.64 .05 55.7 51.9 51.5 51.9 55.8 57.8 60.6 69.4 73.8 79.0 95.9 102.7 50.0 48.9 48.5 48.9 50.1 52.1 57.6 61.2 70.8 76.0 92.9 99.7
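As an illustration of how these tables might be used, the following minimal sketch looks up an approximate Z-value for an observed raw score. It uses the Appendix B4 column for males with a low level of education at age 25. The helper name `z_from_raw`, the linear interpolation between tabulated points, and the clamping outside the tabulated range are assumptions made for this example and are not part of the published norms.

```python
from bisect import bisect_right

# Appendix B4 column: male, low level of education (LE low), age 25.
# Pairs are (raw Stroop Interference score, Z-value). Higher raw scores map
# to lower Z-values because a higher score means worse performance.
B4_MALE_LE_LOW_AGE_25 = [
    (20.1, 1.64), (24.7, 1.28), (30.3, 0.84), (40.9, 0.0),
    (51.5, -0.84), (57.1, -1.28), (61.7, -1.64),
]


def z_from_raw(raw, column):
    """Approximate a Z-value by linear interpolation (hypothetical helper)."""
    raws = [r for r, _ in column]
    if raw <= raws[0]:
        return column[0][1]   # better than the best tabulated score: clamp
    if raw >= raws[-1]:
        return column[-1][1]  # worse than the worst tabulated score: clamp
    i = bisect_right(raws, raw)
    (r0, z0), (r1, z1) = column[i - 1], column[i]
    return z0 + (z1 - z0) * (raw - r0) / (r1 - r0)


print(round(z_from_raw(45.0, B4_MALE_LE_LOW_AGE_25), 2))  # -0.32
```

A raw score of 45.0 falls between the tabulated median (40.9) and the 20th percentile (51.5), so the interpolated Z-value is slightly below zero.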
NOTES
1. A Z score is usually calculated as (observed score - mean score) / SD, without reversing the sign, because a higher (lower) observed score than the mean generally signifies better (worse) performance than expected. For the Stroop test scores, however, a higher score means worse performance. For this reason, the sign of the residual value is reversed.
2. The sign of the residual value is reversed for the reason given in Note 1.
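The sign convention described in Note 1 can be sketched as follows. The function name `stroop_z` and the numeric values are hypothetical and chosen only for illustration. The conversion to a cumulative probability assumes normally distributed residuals, which is an assumption of this sketch.

```python
from statistics import NormalDist


def stroop_z(observed, expected, sd):
    """Z score with the sign reversed, so a higher (worse) score gives a lower Z."""
    residual = observed - expected
    return -residual / sd


# Hypothetical values for illustration only (not taken from the article).
observed, expected, sd = 50.0, 40.0, 10.0

z = stroop_z(observed, expected, sd)
# Cumulative probability under a standard normal assumption: the share of
# the reference group expected to perform worse than this score.
cum_prob = NormalDist().cdf(z)
print(z, round(cum_prob, 2))  # -1.0 0.16
```

Without the sign reversal the same score would yield Z = 1.0, which would wrongly suggest above-average performance.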
Hays, W. L., & Winkler, R. L. (1971). Statistics: Probability, inference,
and decision. New York: Holt, Rinehart & Winston.
Heaton, R. K., Avitable, N., Grant, I., & Matthews, C. G. (1999). Further
REFERENCES crossvalidation of regression-based neuropsychological norms with
an update for the Boston Naming Test. Journal of Clinical and Exper-
imental Neuropsychology, 21, 572-582.
Aiken, L. S., & West, S. G. (1991). Multiple regression: Testing and inter- Heaton, R. K., Matthews, C. G., Grant, I., & Avitable, N. (1996). Demo-
preting interactions. Newbury Park, CA: Sage. graphic corrections with comprehensive norms: An overzealous at-
Belsley, D. A., Kuh, E., & Welsch, R. E. (1980). Regression diagnostics: tempt, or a good start? Journal of Clinical and Experimental
Identifying influential data and sources of collinearity. New York: Neuropsychology, 18, 449-458.
John Wiley. Houx, P. J., Jolles, J., & Vreeling, F. W. (1993). Stroop Interference: Ag-
Bohnen, N., Twijnstra, A., & Jolles, J. (1992). Performance in the Stroop ing effects associated with the Stroop Color-Word Test. Experimental
Color Word Test in relationship to the persistence of symptoms fol- Aging Research, 19, 209-224.
lowing mild head injury. Acta Neurologica Scandinavica, 85, 116- Houx, P. J., Shepherd, J., Blauw, G., Murphy, M. B., Ford, I., Bollen,
121. E. L., et al. (2002). Testing cognitive function in elderly populations:
Bosma, H., Van Boxtel, M. P. J., Ponds, R. W. H. M., Houx, P. J., & Jolles, The PROSPER study. Journal of Neurology, Neurosurgery & Psychi-
J. (2002). Mental work demands protect against cognitive impair- atry, 73, 385-389.
ment: MAAS prospective cohort study. Experimental Aging Re- Ivnik, R. J., Malec, J. F., Smith, G. E., Tangalos, E. G., & Petersen, R. C.
search, 29, 33-45. (1996). Neuropsychological tests’ norms above age 55: COWAT,
Capitani, E. (1997). Normative data and neuropsychological assessment: BNT, MAE Token, WRAT-R Reading, AMNART, STROOP, TMT,
Common problems in clinical practice and research. Neuropsycho- and JLO. The Clinical Neuropsychologist, 10, 262-278.
logical Rehabilitation, 7, 295-309. Jolles, J., Houx, P. J., Van Boxtel, M. P. J., & Ponds, R. W. H. M. (1995).
Comalli, P. E., Wapner, S., & Werner, H. (1962). Interference effects of Maastricht aging study: Determinants of cognitive aging. Maastricht,
Stroop Color-Word Test in childhood, adulthood, and aging. Journal the Netherlands: Neuropsych Publishers.
of Genetic Psychology, 100, 47-53. Klein, M., Ponds, R. W., Houx, P. J., & Jolles, J. (1997). Effect of test du-
Daigneault, S., Braun, C. M., & Whitaker, H. A. (1992). Early effects of ration on age-related differences in Stroop interference. Journal of
normal aging on perseverative and non-perseverative prefrontal mea- Clinical and Experimental Neuropsychology, 19, 77-82.
sures. Developmental Neuropsychology, 8, 99-114. Kleinbaum, D. G., Kupper, L. L., Muller, K. E., & Nizam, A. (1998). Ap-
Davidson, D. J., Zacks, R. T., & Williams, C. C. (2003). Stroop Interfer- plied regression analysis and other multivariate methods (3rd ed.).
ence, practice and aging. Aging Neuropsychology and Cognition, 10, New York: Duxbury Press.
85-98. Lamberts, H., & Wood, M. (1987). ICPC: International Classification of
De Bie, S. E. (1987). Standaardvragen 1987: Voorstellen voor uniformer- Primary Care. Oxford: Oxford University Press.
ing van vraagstellingen naar achtergrondkenmerken en interviews Le Carret, N., Lafont, S., Letenneur, L., Dartigues, J. F., Mayo, W., &
[Standard questions 1987: Proposal for uniformization of questions Fabrigoule, C. (2003). The effect of education on cognitive perfor-
regarding background variables and interviews]. Leiden, the Nether- mances and its implication for the constitution of the cognitive re-
lands: Leiden University Press. serve. Developmental Neuropsychology, 23, 317-337.
Dufouil, C., Alpérovitch, A., & Tzourio, C. (2003). Influence of educa- Lezak, M. D., Howieson, D. B., & Loring, D. W. (2004). Neuropsycho-
tion on the relationship between white matter lesions and cognition. logical assessment (4th ed.). New York: Oxford University Press.
Neurology, 60, 831-836. Libon, D. J., Glosser, G., Malamut, B. L., Kaplan, E., Goldberg, E.,
Dulaney, C. L., & Rogers, W. A. (1994). Mechanisms underlying reduc- Swenson, R., et al. (1994). Age, executive functions, and visuospatial
tion in Stroop Interference with practice for young and old adults. functioning in healthy older adults. Neuropsychology, 8, 38-43.
Journal of Experimental Psychology: Learning, Memory, and Cogni- MacLeod, C. M. (1991). Half a century of research on the Stroop effect:
tion, 20, 470-484. An integrative review. Psychological Bulletin, 109, 163-203.
Fastenau, P. S. (1998). Validity of regression-based norms: an empirical Marquardt, D. W. (1980). You should standardize the predictor variables
test of the comprehensive norms with older adults. Journal of Clinical in your regression models. Journal of the American Statistical Asso-
and Experimental Neuropsychology, 6, 906-916. ciation, 75, 87-91.
Fastenau, P. S., & Adams, K. M. (1996). Heaton, Grant, and Matthews’ Martin, N. J., & Franzen, M. D. (1989). The effect of anxiety on neuro-
Comprehensive Norms: An overzealous attempt. Journal of Clinical psychological function. International Journal of Clinical Neuro-
and Experimental Neuropsychology, 18, 444-448. psychology, 11, 1-8.
Feinstein, A., Brown, R., & Ron, M. (1994). Effects of practice of serial Metsemakers, J. F. M., Höppener, P., Knottnerus, J. A., Kocken, R. J. J., &
tests of attention in healthy subjects. Journal of Clinical and Experi- Limonard, C. B. G. (1992). Computerized health information in the
mental Neuropsychology, 16, 436-447. Netherlands: A registration network of family practices. British Jour-
Folstein, M. F., Folstein, S. E., & McHugh, P. R. (1975). Mini-Mental nal of General Practice, 42, 102-106.
State: a practical method for grading the cognitive state of patients for Mitrushina, M. N., Boone, K. B., & D’Elia, L. F. (1999). Handbook of
the clinician. Journal of Psychiatric Research, 12, 189-198. normative data for neuropsychological assessment. New York: Ox-
Fox, J. (1997). Applied regression analysis, linear models, and related ford University Press.
methods. Newbury Park, CA: Sage. Moering, R. G., Schinka, J. A., Mortimer, J. A., & Graves, A. B. (2003).
Golden, C. (1978). Stroop Color and Word Test: Manual for clinical and Normative data for elderly African Americans for the Stroop Color
experimental uses. Chicago: Stoelting. and Word Test. Archives of Clinical Neuropsychology, 607, 1-11.
Wim Van der Elst is a PhD student in neuropsychology in the Department of Psychiatry and Neuropsychology at Maastricht University in the Netherlands. His research interests include psychometrics and models of cognitive aging. He has authored several articles on the development of normative data for commonly used neuropsychological tests.

Martin P. J. Van Boxtel, MD, PhD, is an associate professor in the Department of Psychiatry and Neuropsychology at Maastricht University. He is engaged in the Maastricht Aging Study, a longitudinal research program of the determinants of usual and pathological cognitive aging. His present research interests involve vascular risk factors and imaging techniques in cognitive aging studies.

Gerard J. P. Van Breukelen is an associate professor of statistics in the Department of Psychology at Maastricht University. His past research focused on psychometrics and response time models. His current research deals with the optimal design and analysis of field experiments with a nested design or repeated measures and is based on mixed (multilevel) regression modeling. He has coauthored numerous articles in this field and in applied psychology.

Jelle Jolles, PhD, has a degree in psychology (1977, specialization neuropsychology) and a degree in chemistry (1975, specialization neurochemistry). He is a full professor of neuropsychology and biological psychology at Maastricht University. He leads the Division of Cognitive Disorders, which is embedded in the Research Institute Brain & Behavior, and is the director of the Alzheimer Centre Limburg. Research activities are focused on the relationship between biological and psychological factors in their effect on behavioral, cognitive, and affective functioning in normal subjects and neuropsychiatric patients. His managerial activities relate to research, teaching, education, and patient care. He is head of the patient care unit specializing in clinical neuropsychology at the psychiatric hospital Vijverdal and the Academic Hospital Maastricht.