Confounding
validity
• Learning outcomes: at the end of this lecture, students should be able to
  • Explain the concept of confounding and how it affects the results of epidemiologic studies
  • Reiterate the criteria that a variable must meet to be a possible confounder
  • Conduct a stratified analysis to determine whether a variable is a confounder or not
  • Provide examples of exposure/outcome/confounder relationships, in terms of confounder criteria and analysis requirements
Kemoh Rogers
Confounding
• Like random error and bias, confounding is another threat to study
validity
• While it also leads to a systematic error in the data, confounding is a
special case
Confounding
• Imagine that you are doing a cross-sectional study of foot size and
reading ability among Pupils at Benevolent, Makeni
• The question we pose is: does foot size affect reading ability?
Confounding
• You go to Benevolent, Makeni and measure
both foot size (measured as length in inches) and
reading ability (measured in terms of words read per
minute, averaged over a 5-minute testing period), and
you collect the following data
Participant # Foot Size (inches) Reading Speed (wpm)
1 7.2 40
2 7.7 85
3 7.2 63
4 7.6 52
5 7.4 51
6 7.1 41
7 7.0 82
8 7.2 60
9 7.6 53
10 7.5 55
11 8.3 123
12 8.2 97
13 8.5 108
14 8.1 111
15 8.2 109
16 8.2 99
17 8.7 95
18 8.0 110
19 8.5 121
20 8.2 108
21 9.4 128
22 8.1 117
23 9.8 115
24 8.8 109
25 9.1 112
26 9.3 112
27 9.8 106
28 9.2 125
29 9.6 163
30 9.0 137
Confounding
• As discussed previously, we will always dichotomize (i.e., split in
two) continuous variables to make the math simpler. If we
dichotomize both foot size and reading speed—at 8.25” and 100 wpm,
respectively—we can draw the following 2 x 2 table:
• (Last, 2001)
                      Reading Speed (wpm)
                      <100     100+
Foot Size   <8.25″     12        5
            8.25″+      1       12
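The dichotomization step can be sketched in Python. This is a minimal illustration, not part of the original lecture: the data are the 30 participants listed earlier, the cutoffs are the 8.25 inch and 100 wpm values stated above, and the names `DATA`, `two_by_two`, `FOOT_CUTOFF`, and `SPEED_CUTOFF` are made up for this sketch.

```python
from collections import Counter

# (foot size in inches, reading speed in wpm) for participants 1-30
DATA = [
    (7.2, 40), (7.7, 85), (7.2, 63), (7.6, 52), (7.4, 51),
    (7.1, 41), (7.0, 82), (7.2, 60), (7.6, 53), (7.5, 55),
    (8.3, 123), (8.2, 97), (8.5, 108), (8.1, 111), (8.2, 109),
    (8.2, 99), (8.7, 95), (8.0, 110), (8.5, 121), (8.2, 108),
    (9.4, 128), (8.1, 117), (9.8, 115), (8.8, 109), (9.1, 112),
    (9.3, 112), (9.8, 106), (9.2, 125), (9.6, 163), (9.0, 137),
]

FOOT_CUTOFF = 8.25   # inches
SPEED_CUTOFF = 100   # words per minute


def two_by_two(rows, foot_cutoff=FOOT_CUTOFF, speed_cutoff=SPEED_CUTOFF):
    """Count participants in each (large foot?, fast reader?) cell."""
    return Counter(
        (foot >= foot_cutoff, speed >= speed_cutoff) for foot, speed in rows
    )


cells = two_by_two(DATA)
a = cells[(False, False)]  # foot < 8.25, speed < 100
b = cells[(False, True)]   # foot < 8.25, speed 100+
c = cells[(True, False)]   # foot 8.25+, speed < 100
d = cells[(True, True)]    # foot 8.25+, speed 100+
print(a, b, c, d)  # 12 5 1 12
```

The four counts reproduce the cells of the 2 x 2 table and sum to the 30 participants.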
Confounding
• Because this is a cross-sectional study, we would calculate the odds
ratio
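• OR = (A/B) / (C/D), where A, B, C, and D are the four cells of the
2 x 2 table, read left to right and top to bottom
• = AD/BC
• = (12)(12) / (5)(1)
• = 28.8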
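As a check on the arithmetic, the cross-product odds ratio can be written as a small Python function. This is a sketch only: the function name `odds_ratio` and the cell order (A, B, C, D read left to right, top to bottom) are assumptions for illustration, and the counts are the cells of the foot size by reading speed table.

```python
def odds_ratio(a, b, c, d):
    """Cross-product odds ratio AD/BC for a 2 x 2 table laid out as
    [[a, b], [c, d]]."""
    if b == 0 or c == 0:
        raise ValueError("odds ratio is undefined when cell B or C is zero")
    return (a * d) / (b * c)


# Cells from the foot size by reading speed table: 12, 5, 1, 12
foot_or = odds_ratio(12, 5, 1, 12)
print(round(foot_or, 1))  # 28.8
```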
Confounding
In words,
• Pupils at Benevolent, Makeni with feet that are at least 8.25″ long are
28.8 times as likely to be able to read at least 100 words per minute,
compared to students with shorter feet.
• This is a huge finding! Should we give all Benevolent, Makeni Pupils
growth hormones so that they get bigger feet and increase their
reading speeds?
• Given that the target population for this hypothetical study is
Benevolent Pupils, it seems likely that there is a confounder at work:
Pupils in higher grades will have bigger feet because they are older,
and they will also by and large be faster readers
[Diagram: education level (confounder) linked to both foot size (exposure) and reading ability (outcome)]
Confounding
• In this scenario, we need to control for the confounder (education
level): we need to remove its influence to get an accurate estimate of
the association between the
exposure (foot size) and the
outcome (reading ability)
• Before we delve into how to control for confounders, let’s discuss
what confounders are from a theoretical perspective
A confounder - definition
• A confounder is a third variable—not the exposure, and not the
outcome - that biases the measure of association we calculate for the
particular exposure/outcome pair
• Importantly, from a research perspective, we never want to report a
measure of association that is confounded.
• Imagine if we do our cross-sectional study on foot size and reading
ability, without accounting for level of education.
• We would report the odds ratio of 28.8 as calculated above…
• We’ve reported an association that’s not really true—it’s just
confounded by grade level.
Criteria for Confounders
• There are 3 criteria that a variable must meet in order for it to be a
potential confounder
1. The variable must be statistically associated with the exposure.
2. The variable must cause the outcome.
3. The variable must not be on the causal pathway from exposure to outcome.
• We now discuss each of these in more detail
• (Bovbjerg, 2019; Hennekens and Buring, 2012)
1. Associated with Exposure
• Association is a statistical term that does not necessarily imply a
causal relationship (we have discussed this previously)
• Basically, association means that the confounding variable is more
common in the exposed group than the unexposed group (or vice
versa), thus producing a statistical association
• The confounder does not need to cause or prevent the exposure, it just
needs to be disproportionately distributed between the exposed and
unexposed groups
1. Associated with Exposure
• In our previous example, education level is disproportionately
distributed among various foot sizes—students at higher grades are
more likely to have bigger feet compared to kids in lower grades
• Note that there can be a causal relationship, with the confounder
causing the exposure (but not the other way around—see criterion 3),
but this is not necessary. In our example, grade level is not causing
foot size (age is causing foot size)—but they are associated
2. Causes the Outcome
• In this case, there must be a causal link between the confounder and
the outcome
• It does not have to be a proven causal link, just an “it is reasonably
possible that this variable causes (or prevents) that outcome” link.
• In our foot size/reading ability example, education level (the
confounder) certainly causes faster reading speed (the outcome)
• Importantly, the confounder must cause the outcome—not the other
way around.
• If the outcome is causing the confounder, then it’s not a confounder.
2. Causes the Outcome
• There are many times in epidemiology when we aren’t sure which way
a causal arrow would go—does the disease cause the confounder, or
does the confounder cause the disease?
• An example might be excessive weight loss and illness
• Losing a large amount of weight quickly can make one ill—but being
ill can also cause a large amount of weight loss
2. Causes the Outcome
• In scenarios like this, where we aren’t sure which way the arrow
points, what epidemiologists do in practice is first assume the arrow
goes one way and do the analysis accordingly (here, that would mean
either including or not the potential confounder)
• They then assume the arrow goes the other way and do the analysis
again
• If the results of both analyses are similar, then the arrow direction isn’t
important. But if the 2 analyses produce very different results, then we
would report both and let the reader decide which is more applicable
for them.
3. Not on the Causal Pathway
• The final criterion for a variable to be a potential confounder is that it
is not on the causal pathway from exposure to outcome
• That is, we do not want a scenario in which the exposure causes the
variable, which in turn causes the outcome (exposure → variable → outcome)
3. Not on the Causal Pathway
• An example of a variable on a causal pathway might be as follows
WASSCE Pupils
1. Restricting the sample
• By restricting to just WASSCE Pupils, we remove the confounding by
grade level: Pupils in both higher and lower grades are no longer
relevant because if we have only WASSCE Pupils, then there aren’t
any in higher or lower grades.
• Among WASSCE Pupils only, we would expect that foot size and
reading ability are uncorrelated.
Inherent variability
• Obviously not all WASSCE Pupils will have the same size feet, nor
will all uniformly have the same reading ability.
• However, on a group level, WASSCE Pupils in general have bigger
feet and are better readers than first graders, who in turn have
smaller feet and are poorer readers.
• Epidemiology as a science works because of both this individual
variation and the fact that groups of people (selected on some
characteristic, like being WASSCE Pupils) are more similar to each
other than they are to people in other groups.
• See the same data with a column added for grade level:
Participant # Foot Size (inches) Reading Speed (wpm) Grade
1 7.2 40 1
2 7.7 85 1
3 7.2 63 1
4 7.6 52 1
5 7.4 51 1
6 7.1 41 1
7 7.0 82 1
8 7.2 60 1
9 7.6 53 1
10 7.5 55 1
11 8.3 123 3
12 8.2 97 3
13 8.5 108 3
14 8.1 111 3
15 8.2 109 3
16 8.2 99 3
17 8.7 95 3
18 8.0 110 3
19 8.5 121 3
20 8.2 108 3
21 9.4 128 5
22 8.1 117 5
23 9.8 115 5
24 8.8 109 5
25 9.1 112 5
26 9.3 112 5
27 9.8 106 5
28 9.2 125 5
29 9.6 163 5
30 9.0 137 5
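With the grade column available, criterion 1 (association with the exposure) can be checked directly by comparing how grades are distributed among pupils with smaller and larger feet. The sketch below is illustrative only: it reuses the 8.25 inch cutoff from the earlier dichotomization, and the names `FOOT_SIZES`, `GRADES`, `small_feet`, and `big_feet` are made up for this example.

```python
from collections import Counter

# Foot sizes (inches) for participants 1-30, in table order
FOOT_SIZES = [
    7.2, 7.7, 7.2, 7.6, 7.4, 7.1, 7.0, 7.2, 7.6, 7.5,   # grade 1
    8.3, 8.2, 8.5, 8.1, 8.2, 8.2, 8.7, 8.0, 8.5, 8.2,   # grade 3
    9.4, 8.1, 9.8, 8.8, 9.1, 9.3, 9.8, 9.2, 9.6, 9.0,   # grade 5
]
GRADES = [1] * 10 + [3] * 10 + [5] * 10
CUTOFF = 8.25  # inches, same cutoff as the 2 x 2 table

# Grade distribution within each exposure group
small_feet = Counter(g for f, g in zip(FOOT_SIZES, GRADES) if f < CUTOFF)
big_feet = Counter(g for f, g in zip(FOOT_SIZES, GRADES) if f >= CUTOFF)

print(dict(small_feet))  # {1: 10, 3: 6, 5: 1}
print(dict(big_feet))    # {3: 4, 5: 9}
```

All ten grade 1 pupils fall in the small-foot group and nine of the ten grade 5 pupils fall in the large-foot group, which is the disproportionate distribution that criterion 1 asks for.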
Restricting ourselves to just third grade
• Limiting ourselves to just third grade, then, the 2 x 2 table looks like
this
                      Reading Speed (wpm)
                      <100     100+
Foot Size   <8.25″      2        2
            8.25″+      3        3
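• We now turn to a second example, a case-control study of oral
contraceptive (OCP) use and ovarian cancer. We conduct the study and
obtain the data in the next slides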
Participant #   Ever OCP? (0 = no, 1 = yes)   Ovarian Cancer? (0 = no (control), 1 = yes (case))
1 1 1
2 1 1
3 1 1
4 1 1
5 0 1
6 0 1
7 0 1
8 0 1
9 0 1
10 0 1
11 1 0
12 1 0
13 1 0
14 1 0
15 0 0
16 0 0
17 0 0
18 0 0
19 0 0
20 0 0
The 2 x 2 table would be as follows
                 Ovarian Cancer
                 +      –
OCP   Ever       4      4
      Never      6      6
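The 2 x 2 table above can be tallied from the participant-level data, and its odds ratio computed from the cross-product. The following Python sketch is illustrative: `PARTICIPANTS` encodes the 20 rows of the OCP/ovarian cancer table in order, and `tally` is a hypothetical helper name.

```python
# (ever_ocp, ovarian_cancer) for participants 1-20, in table order
PARTICIPANTS = [(1, 1)] * 4 + [(0, 1)] * 6 + [(1, 0)] * 4 + [(0, 0)] * 6


def tally(rows):
    """Return the 2 x 2 cells (a, b, c, d):
    a = ever OCP / case,  b = ever OCP / control,
    c = never OCP / case, d = never OCP / control."""
    a = sum(1 for ocp, cancer in rows if ocp == 1 and cancer == 1)
    b = sum(1 for ocp, cancer in rows if ocp == 1 and cancer == 0)
    c = sum(1 for ocp, cancer in rows if ocp == 0 and cancer == 1)
    d = sum(1 for ocp, cancer in rows if ocp == 0 and cancer == 0)
    return a, b, c, d


a, b, c, d = tally(PARTICIPANTS)
crude_or = (a * d) / (b * c)  # cross-product odds ratio AD/BC
print((a, b, c, d), crude_or)  # (4, 4, 6, 6) 1.0
```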
Stratifying
• The OR is 1.0—use of oral contraceptives is not associated with
ovarian cancer.
• During confounding analyses, this value is referred to as the crude or
unadjusted measure of association—meaning that we have not yet
accounted, adjusted, or controlled for any confounders.
• Unadjusted measures only take into account the exposure and the
outcome.
• What about smoking as a confounder? Let’s check the confounder
criteria in the next slide
Smoking as a confounder
• 1. The variable must be associated with the exposure.
Yes! Both oral contraceptives and smoking increase one’s risk of deep
venous thrombosis, a potentially life-threatening condition.
• Smoking is thus considered a contraindication to oral contraceptive
use, which leads clinicians to prescribe other forms of birth control
instead for women who smoke. (Bonnema et al, 2010)
• This leads to a disproportionate distribution of smokers (the
confounder) between women who do and do not use oral
contraceptives (the exposure).
• 2. The variable must cause the outcome.
Possibly. While we often think of smoking as causing lung cancer
(which it certainly does), smoking has also been associated with other
cancers often enough that it is reasonable to suspect that it might cause
ovarian cancer too
• 3. The variable must not be on a causal pathway.
Yes! It seems highly unlikely that taking birth control pills would in
turn cause a woman to take up smoking
Smoking thus meets our criteria and is a potential confounder in this
scenario (Greenland, Pearl and Robins, 1999)
The data with smoking status added is shown below
• (Bovbjerg, 2019; Hennekens and Buring, 2012)
Participant #   Ever OCP? (0 = no, 1 = yes)   Ovarian Cancer? (0 = no (control), 1 = yes (case))   Smoker? (0 = no, 1 = yes)
1 1 1 1
2 1 1 1
3 1 1 0
4 1 1 0
5 0 1 1
6 0 1 1
7 0 1 1
8 0 1 0
9 0 1 0
10 0 1 0
11 1 0 1
12 1 0 1
13 1 0 0
14 1 0 0
15 0 0 1
16 0 0 1
17 0 0 1
18 0 0 0
19 0 0 0
20 0 0 0
Stratify by smoking status
• We now stratify by smoking status. In other words, we make 2
different 2 × 2 tables: one for smokers, and the other for nonsmokers.
• Keep in mind that all the women who appeared in the above 2 × 2
table for ovarian cancer and OCP use are still present—they’re just in
one of the two tables below, depending on whether they smoke or not.
Stratify by smoking status
Smokers
                 Ovarian Cancer
                 +      –
OCP   Ever       2      3
      Never      3      2
Nonsmokers
                 Ovarian Cancer
                 +      –
OCP   Ever       2      3
      Never      3      2
Stratify by smoking status
• Note that the 2 x 2 tables are still for OCP (exposure) and ovarian
cancer (outcome)—we have just made one such table for smokers and
another for nonsmokers
• The next step in a stratified analysis is to calculate the ORs from these
2 x 2 tables, so we have an
OR for smokers, and an
OR for nonsmokers.
Stratify by smoking status
• The odds ratio for smokers is:
• ORsmokers = AD/BC = (2)(2) / (3)(3) = 0.44
• Interpretation: Women who have ovarian cancer are 0.44 times as
likely to report a history of OCP use, compared to women without
ovarian cancer, among smokers only.
• The odds ratio for nonsmokers is:
• ORnonsmokers = AD/BC = (2)(2) / (3)(3) = 0.44
• Interpretation: Among nonsmokers, women who have ovarian cancer are
0.44 times as likely to report a history of oral contraceptive (OCP)
use, compared to women without ovarian cancer.
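The stratum-specific calculation can be sketched in Python as a loop over one 2 x 2 table per stratum. This is a minimal illustration: the cell counts are copied from the smokers and nonsmokers tables shown in the slides, and the names `STRATA` and `odds_ratio` are assumptions made for this sketch.

```python
def odds_ratio(a, b, c, d):
    """Cross-product odds ratio AD/BC for a table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


# Cells (a, b, c, d) = (ever/case, ever/control, never/case, never/control),
# taken from the stratified tables in the slides
STRATA = {
    "smokers": (2, 3, 3, 2),
    "nonsmokers": (2, 3, 3, 2),
}

stratum_ors = {name: odds_ratio(*cells) for name, cells in STRATA.items()}
for name, value in stratum_ors.items():
    print(f"OR among {name}: {value:.2f}")
# OR among smokers: 0.44
# OR among nonsmokers: 0.44
```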
Confounders
• When conducting stratified analysis, it is important to say which group
your measure of association applies to. This can come either at the
beginning (as it does above for nonsmokers) or at the end (as it does
above for smokers).
• Since our stratum-specific odds ratios (0.44 for smokers and 0.44 for
nonsmokers) are similar to each other but different from the crude OR
(which was 1.0), we say that smoking is indeed acting as a confounder
in these data. The crude OR was wrong; it was confounded by
smoking.
Interpretation of confounders
• To interpret our OCP/ovarian cancer findings in words (the adjusted
odds ratio, whether calculated via Mantel-Haenszel or by regression
(not used here), is 0.44), we would say:
• 1. Women who have ovarian cancer are 0.44 times as likely to report a
history of OCP use compared to women without ovarian cancer,
controlling for smoking
• Or we could say:
• 2. Women who have ovarian cancer are 0.44 times as likely to report a
history of OCP use compared to women without ovarian cancer,
adjusting for smoking.
Interpretation of confounders
• Or we could say:
• 3. Women who have ovarian cancer are 0.44 times as likely to report a
history of oral contraceptive use compared to women without ovarian
cancer, holding smoking constant
• Notice how there are multiple ways of letting the reader know that
smoking was treated as a confounder (“controlling for smoking”,
“adjusting for smoking”, “holding smoking constant”). It doesn’t matter
which you choose; the important thing is that you make it clear that
we are presenting the measure of association having already dealt with
the confounding.
• (Bovbjerg, 2019; Hennekens and Buring, 2012)
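The adjusted odds ratio of 0.44 quoted above can be reproduced with the Mantel-Haenszel pooled estimator. The lecture names the method but does not write it out, so the sketch below is an illustration using the standard formula (sum of AD/n across strata divided by sum of BC/n across strata) applied to the two stratum tables from the slides. The function name `mantel_haenszel_or` is made up for this example.

```python
def mantel_haenszel_or(tables):
    """Mantel-Haenszel pooled odds ratio.

    `tables` is an iterable of (a, b, c, d) cell counts, one per stratum,
    each laid out as [[a, b], [c, d]].
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d          # stratum size
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator


# Stratum tables from the slides: smokers, then nonsmokers
strata = [(2, 3, 3, 2), (2, 3, 3, 2)]
adjusted_or = mantel_haenszel_or(strata)
print(round(adjusted_or, 2))  # 0.44
```

Because both strata have the same odds ratio, the pooled estimate equals the stratum-specific value. When strata differ somewhat, the estimator gives a weighted average of them.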
Choosing Confounders
• When conducting an analysis in real life, there are often multiple
potential confounders. The first step in any analysis is to make a list of
all such potential confounders.
• The easiest way to do this is first to make a list of all variables that
might cause your outcome. Then take that list and make sure the
variables are associated with the exposure.
• Finally, for any confounders that meet our first 2 criteria, make sure
they are not on the causal pathway (e.g., that the exposure is not
causing the confounder).
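The screening steps above can be sketched as a simple filter. This is only an illustration of the order of the checks: the true/false flags are judgment calls made by the analyst, not values computed from data, and the class `Candidate`, the function `potential_confounders`, and the two "hypothetical" entries are invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    may_cause_outcome: bool         # criterion 2
    associated_with_exposure: bool  # criterion 1
    on_causal_pathway: bool         # criterion 3 is failed when this is True


def potential_confounders(candidates):
    """Apply the three checks in the order described above."""
    causes_outcome = [c for c in candidates if c.may_cause_outcome]
    also_associated = [c for c in causes_outcome if c.associated_with_exposure]
    return [c.name for c in also_associated if not c.on_causal_pathway]


candidates = [
    # From the foot size / reading ability example in this lecture
    Candidate("education level", True, True, False),
    # Hypothetical entries, invented to show variables being screened out
    Candidate("hypothetical mediator", True, True, True),
    Candidate("hypothetical unrelated trait", False, True, False),
]

print(potential_confounders(candidates))  # ['education level']
```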
Choosing Confounders
• As mentioned above, there are many instances where it is difficult to
know which is causing which; in such cases, we do the analysis both
ways.
• The next step would be to determine which of the potential
confounders meet the 3 criteria to control for in an analysis (regression
allows you to control for many confounders at once, if you wish to
read further on this).
(Bovbjerg, 2019)
• Bonnema RA, McNamara MC, Spencer AL. Contraception choices in women with underlying medical conditions. Am Fam Physician. 2010;82(6):621-628.
• Bovbjerg, M.L., 2019. Foundations of epidemiology.
• Greenland S, Pearl J, Robins JM. Causal diagrams for epidemiologic research. Epidemiol Camb Mass. 1999;10(1):37-48.
• Hennekens, C.H. and Buring, J.E., 2012. Epidemiology in medicine. In Epidemiology in medicine (pp. 383-383).
• Last, J.M., 2001. Pandemic. A dictionary of epidemiology (4th ed.). Oxford: Oxford University Press. https://doi.org/10.1093/aje/154.1.
• https://www.cancer.org/latest-news/study-smoking-causes-almost-half-of-deaths-from-12-cancer-types.html.