© The Author(s) 2019. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail: [email protected].
DOI: 10.1093/aje/kwy280
Advance Access publication: March 16, 2019
Commentary

A Future for Observational Epidemiology

Sam Harper*
* Correspondence to Dr. Sam Harper, Department of Epidemiology, Biostatistics & Occupational Health, McGill University, 1020
Initially submitted November 12, 2018; accepted for publication December 18, 2018.
Observational studies are ambiguous, difficult, and necessary for epidemiology. Presently, there are concerns that the evidence produced by most observational studies in epidemiology is not credible and contributes to research waste. I argue that observational epidemiology could be improved by focusing greater attention on 1) defining questions that make clear whether the inferential goal is descriptive or causal; 2) greater utilization of quantitative bias analysis and alternative research designs that aim to decrease the strength of assumptions needed to estimate causal effects; and 3) promoting, experimenting with, and perhaps institutionalizing both reproducible research standards and replication studies to evaluate the fragility of study findings in epidemiology. Greater clarity, credibility, and transparency in observational epidemiology will help to provide reliable evidence that can serve as a basis for making decisions about clinical or population-health interventions.
causal inference; observational studies; quantitative bias analysis; quasi-experiments; reproducible research;
research reporting; transparency
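The quantitative bias analysis recommended above can be made concrete with a minimal sketch. One of the simplest forms is external adjustment of an observed risk ratio for a single binary unmeasured confounder. The function name and all numerical inputs below are hypothetical, chosen only to show the arithmetic; a full bias analysis would also propagate uncertainty in the bias parameters:

```python
def adjusted_rr(rr_obs, p_exposed, p_unexposed, rr_cd):
    """Externally adjust an observed risk ratio for one binary unmeasured
    confounder, using the simple bias-factor formula.

    rr_obs      -- observed exposure-outcome risk ratio
    p_exposed   -- confounder prevalence among the exposed
    p_unexposed -- confounder prevalence among the unexposed
    rr_cd       -- confounder-outcome risk ratio
    """
    bias = (rr_cd * p_exposed + 1 - p_exposed) / \
           (rr_cd * p_unexposed + 1 - p_unexposed)
    return rr_obs / bias

# Hypothetical example: an observed RR of 2.0 shrinks toward the null if a
# confounder that triples risk is carried by 50% of the exposed but only
# 20% of the unexposed.
print(round(adjusted_rr(2.0, 0.5, 0.2, 3.0), 2))  # prints 1.4
```

In practice one would vary the three bias parameters over plausible ranges (or sample them probabilistically) rather than plug in single values.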
Observational studies are ambiguous, difficult, and necessary. Randomized treatment assignment has important conceptual strengths, especially for reducing bias, but it is resource intensive and often applied to highly selected populations, making generalizations difficult. There are good reasons (feasibility, resources, ethical issues) why observational studies remain the workhorse of epidemiology and constitute the bulk of epidemiologic output. Recent years have seen critiques (and defenses) of observational studies, but the issues are longstanding. In the late 1950s, R. A. Fisher (1) notoriously and mistakenly thought that observational studies of the association between smoking and lung cancer were flawed. Concerns about reliability related to clinical questions were raised in the early 1980s (2) and were extended to questions about “lifestyle” exposures later in the decade (3, 4). In the early 21st century, concerns about conflicting findings from observational studies and large-scale randomized evaluations (5, 6) led some epidemiologists to ask whether it was “time to call it a day” for observational studies (7), as well as to broad accusations of “junk science” being leveled at the entire discipline in the popular press (8).

Despite important new efforts to reconcile differences between randomized and nonrandomized studies (9, 10), the issue of the reliability of observational evidence isn’t likely to go away (see the recent discrepancy between randomized and observational results from the Illinois workplace wellness evaluation (11)). Exponential growth in the published literature (12) has also increased our reliance on evidence synthesis for observational studies (e.g., systematic reviews, meta-analysis, meta-meta-analysis), but such studies also hide many investigator decisions (13). Moreover, synthesized evidence may reduce the role of sampling error but will do nothing about systematic errors. Consensus, repetition, and synthesis of poor research do not generate strong evidence.

Large-scale replication efforts in observational epidemiology are lacking, but there is increasing concern that the considerable flexibility researchers have when analyzing observational data can lead to unreliable results. We know, for example, that if you give multiple authors the same observational data set and ask them to answer the same (admittedly vague) research question, you will see a broad range of estimates, largely because of the different methodological approaches used (14). We also know that when a single investigative team analyzes a single data set using various regression specifications, coding and inclusion of covariates, and decision rules, they can generate results on either side of the null (15, 16). Such analyses do not suggest any causal hypotheses; rather, they serve to illustrate the myriad ways in which the same data may provide results consistent with many conflicting hypotheses. Such flexibility also makes it extraordinarily difficult to identify true heterogeneity in effects across populations that may be relevant for understanding causal mechanisms, because presently we cannot separate it from the (typically unseen) choices of the investigators.
These problems are not restricted to observational studies in epidemiology; they are pervasive within many scientific disciplines. Distorted incentives for publishing, promotion, and granting that prioritize novelty and quantity rather than quality exacerbate the problem (17). One goal of observational epidemiology is to provide reliable evidence about causal relationships between exposures and outcomes, such that the evidence could serve as a basis for making decisions about clinical or population-health interventions. Unreliable, poor-quality epidemiologic research limits scientific understanding of the determinants of health and disease, reduces our ability to contribute to clinical and policy decisions, and could potentially undermine the public’s faith in epidemiologic research.

surveillance in state and local public health agencies (25–27). Such analyses are quite explicitly descriptive: Is the number of deaths increasing or decreasing? Who is dying, from what are they dying, and where are they dying? Yet few, if any, of these reports were published in leading epidemiology journals. This is not necessarily an indication of bias (perhaps they were never submitted), but the fact that a massive increase in disability and death due to opioids, with profound implications for society and the economy, has received comparatively little attention in our top journals feels like a lost opportunity.

Similarly, the impact of the disastrous cost-cutting decision to change the water supply in the city of Flint, Michigan, from Lake Huron to the Flint River was known not only as a result of citizens voicing their concerns (28) but also from basic
Am J Epidemiol. 2019;188(5):840–845
to create exchangeable treatment groups through “acts of chance” or other arbitrary rules or events not under investigators’ control. In the best-case scenario, this leads to randomized or “as-if” randomized treatment allocation (e.g., a lottery), straightforward analysis, and more robust inference. These designs are not new but also are not extensively taught in most epidemiology programs. However, recent explications in the epidemiologic literature of interrupted time series (49), difference-in-differences (50), synthetic controls (51), instrumental variables (52), and regression discontinuity (53) should help.

These designs are valuable, and we should encourage greater use of them to help reduce some sources of bias. However, enthusiasm for quasi-experimental designs should be tempered by keeping in mind that they are still observational studies. They

of results. Finally, I think we need transparency in the preparation and deposition of materials so that people other than the researchers can reproduce, and possibly improve upon, their results. Here, I think epidemiology can learn from other disciplines that have also been feeling the heat of the so-called “replication crisis.” Large-scale efforts to replicate published results in psychology (61), genetics (62), cancer biology (63), neuroscience (64), and economics (65) have generally produced results that do not inspire confidence in published work. Quite frankly, I think epidemiology needs to embrace more vigorous challenges of the evidence it produces. To that extent, I hope that the editors and editorial boards of our leading journals also experiment with strategies for promoting replication studies.

It is important to be clear about the distinction between so-
A Future for Observational Epidemiology 843
Re-analyses of published randomized trials with different assumptions that led to different conclusions (73) should make clear that it is not only observational studies that could benefit from greater reproducibility. Clearly there may be limits to such an exercise, because there are far too many papers submitted and published to subject all papers to such scrutiny; however, without making the materials behind our studies available, such dialogue will not happen.

I wonder whether epidemiology could benefit from a more structured approach. A recent example from development economics highlights both the promise and some of the perils of replication work (74). The Journal of Development Studies used a structured grant program to fund researchers to replicate 8 high-profile publications from a candidate list, enlisted

inadequate peer review, novelty chasing, exaggerated claims, and conflicts of interest, all of which contribute to problems with the credibility of published research (84). Better training of epidemiology graduate students in how to think critically and formulate well-defined questions, evaluate evidence and adjudicate between competing claims, and maintain humility regarding their own research findings would be valuable. For what is the alternative? Some may argue that epidemiology mostly gets it right over the long haul, but how will we know? What other masses of sliced, prodded, hacked, and unreliably “significant” results might be lurking beneath the surface of the sea of observational epidemiologic papers published every day? Will we bother to wade in and find out?
Research; 2018. http://www.nber.org/papers/w24229. Accessed November 16, 2018.
12. Bowen A, Casadevall A. Increasing disparities between resource inputs and outcomes, as measured by certain health deliverables, in biomedical research. Proc Natl Acad Sci U S A. 2015;112(36):11335–11340.
13. Ioannidis JPA. The mass production of redundant, misleading, and conflicted systematic reviews and meta-analyses. Milbank Q. 2016;94(3):485–514.
14. Silberzahn R, Uhlmann EL, Martin DP, et al. Many analysts, one data set: making transparent how variations in analytic choices affect results. Adv Methods Pract Psychol Sci. 2018;1(3):337–356.
15. Madigan D, Ryan PB, Schuemie M. Does design matter? Systematic evaluation of the impact of analytical choices on
31. Lieberson S. Making It Count: The Improvement of Social Research and Theory. Berkeley, CA: University of California Press; 1987.
32. Lawlor DA, Tilling K, Davey Smith G. Triangulation in aetiological epidemiology. Int J Epidemiol. 2016;45(6):1866–1886.
33. Klinenberg E. Denaturalizing disaster: a social autopsy of the 1995 Chicago heat wave. Theory Soc. 1999;28(2):239–295.
34. Leon DA, Chenet L, Shkolnikov VM, et al. Huge variation in Russian mortality rates 1984–94: artefact, alcohol, or what? Lancet. 1997;350(9075):383–388.
35. Venero Fernández SJ, Medina RS, Britton J, et al. The association between living through a prolonged economic depression and the male:female birth ratio–a longitudinal study from Cuba, 1960–2008. Am J Epidemiol. 2011;174(12):1327–1331.
36. Jayachandran S, Pande R. Why are Indian children so short?
54. Pletscher M. The effects of organized screening programs on the demand for mammography in Switzerland. Eur J Health Econ. 2017;18(5):649–665.
55. Cullati S, von Arx M, Courvoisier DS, et al. Organised population-based programmes and change in socioeconomic inequalities in mammography screening: a 1992–2012 nationwide quasi-experimental study. Prev Med. 2018;116:19–26.
56. Feynman RP. Cargo cult science. Engineering and Science. 1974;37(7):10–13.
57. Kahneman D. Thinking, Fast and Slow. New York, NY: Farrar, Straus & Giroux; 2011.
58. Equator Network. EQUATOR Network reporting guideline manual. https://www.equator-network.org/library/equator-network-reporting-guideline-manual/. Accessed December 17, 2018.
59. Munafò MR, Nosek BA, Bishop DV, et al. A manifesto for
70. Hicks JH, Kremer M, Miguel E. Commentary: deworming externalities and schooling impacts in Kenya: a comment on Aiken et al. (2015) and Davey et al. (2015). Int J Epidemiol. 2015;44(5):1593–1596.
71. Regnerus M. Is structural stigma’s effect on the mortality of sexual minorities robust? A failure to replicate the results of a published study. Soc Sci Med. 2017;188:157–165.
72. Hatzenbuehler ML, Bellatorre A, Lee Y, et al. Structural stigma and all-cause mortality in sexual minority populations. Soc Sci Med. 2014;103:33–41.
73. Ebrahim S, Sohani ZN, Montoya L, et al. Reanalyses of randomized clinical trial data. JAMA. 2014;312(10):1024–1032.
74. Brown AN, Wood BD. Replication studies of development impact evaluations. J Dev Stud. 2019;55(5):917–924.