© The Author(s) 2019. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail: [email protected].
DOI: 10.1093/aje/kwy280
Advance Access publication: March 16, 2019
Commentary

A Future for Observational Epidemiology

Sam Harper*
* Correspondence to Dr. Sam Harper, Department of Epidemiology, Biostatistics & Occupational Health, McGill University, 1020
Initially submitted November 12, 2018; accepted for publication December 18, 2018.
Observational studies are ambiguous, difficult, and necessary for epidemiology. Presently, there are concerns that the evidence produced by most observational studies in epidemiology is not credible and contributes to research waste. I argue that observational epidemiology could be improved by focusing greater attention on 1) defining questions that make clear whether the inferential goal is descriptive or causal; 2) greater utilization of quantitative bias analysis and alternative research designs that aim to decrease the strength of assumptions needed to estimate causal effects; and 3) promoting, experimenting with, and perhaps institutionalizing both reproducible research standards and replication studies to evaluate the fragility of study findings in epidemiology. Greater clarity, credibility, and transparency in observational epidemiology will help to provide reliable evidence that can serve as a basis for making decisions about clinical or population-health interventions.
causal inference; observational studies; quantitative bias analysis; quasi-experiments; reproducible research;
research reporting; transparency
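The quantitative bias analysis recommended above can be made concrete with a minimal sketch. One of the simplest forms is external adjustment of an observed risk ratio for a single binary unmeasured confounder. The function name and all numerical inputs below are hypothetical, chosen only to show the arithmetic; a full bias analysis would also propagate uncertainty in the bias parameters:

```python
def adjusted_rr(rr_obs, p_exposed, p_unexposed, rr_cd):
    """Externally adjust an observed risk ratio for one binary unmeasured
    confounder, using the simple bias-factor formula.

    rr_obs      -- observed exposure-outcome risk ratio
    p_exposed   -- confounder prevalence among the exposed
    p_unexposed -- confounder prevalence among the unexposed
    rr_cd       -- confounder-outcome risk ratio
    """
    bias = (rr_cd * p_exposed + 1 - p_exposed) / \
           (rr_cd * p_unexposed + 1 - p_unexposed)
    return rr_obs / bias

# Hypothetical example: an observed RR of 2.0 shrinks toward the null if a
# confounder that triples risk is carried by 50% of the exposed but only
# 20% of the unexposed.
print(round(adjusted_rr(2.0, 0.5, 0.2, 3.0), 2))  # prints 1.4
```

In practice one would vary the three bias parameters over plausible ranges (or sample them probabilistically) rather than plug in single values.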
Observational studies are ambiguous, difficult, and necessary. Randomized treatment assignment has important conceptual strengths, especially for reducing bias, but it is resource intensive and often applied to highly selected populations, making generalizations difficult. There are good reasons (feasibility, resources, ethical issues) why observational studies remain the workhorse of epidemiology and constitute the bulk of epidemiologic output. Recent years have seen critiques (and defenses) of observational studies, but the issues are longstanding. In the late 1950s, R. A. Fisher (1) notoriously and mistakenly thought that observational studies of the association between smoking and lung cancer were flawed. Concerns about reliability related to clinical questions were raised in the early 1980s (2) and were extended to questions about “lifestyle” exposures later in the decade (3, 4). In the early 21st century, concerns about conflicting findings from observational studies and large-scale randomized evaluations (5, 6) led some epidemiologists to ask whether it was “time to call it a day” for observational studies (7), as well as to broad accusations of “junk science” being leveled at the entire discipline in the popular press (8).

Despite important new efforts to reconcile differences between randomized and nonrandomized studies (9, 10), the issue of the reliability of observational evidence isn’t likely to go away (see the recent discrepancy between randomized and observational results from the Illinois workplace wellness evaluation (11)). Exponential growth in the published literature (12) has also increased our reliance on evidence synthesis for observational studies (e.g., systematic reviews, meta-analysis, meta-meta-analysis), but such studies also hide many investigator decisions (13). Moreover, synthesized evidence may reduce the role of sampling error but will do nothing about systematic errors. Consensus, repetition, and synthesis of poor research do not generate strong evidence.

Large-scale replication efforts in observational epidemiology are lacking, but there is increasing concern that the considerable flexibility researchers have when analyzing observational data can lead to unreliable results. We know, for example, that if you give multiple authors the same observational data set and ask them to answer the same (admittedly vague) research question, you will see a broad range of estimates, largely because of the different methodological approaches used (14). We also know that when a single investigative team analyzes a single data set using various regression specifications, coding and inclusion of covariates, and decision rules, they can generate results on either side of the null (15, 16). Such analyses do not suggest any causal hypotheses; rather, they serve to illustrate the myriad ways in which the same data may provide results consistent with many conflicting hypotheses. Such flexibility also makes it extraordinarily difficult to identify true heterogeneity in effects across populations that may be relevant for understanding causal mechanisms, because presently we cannot separate it from the (typically unseen) choices of the investigators.
These problems are not restricted to observational studies in epidemiology; they are pervasive within many scientific disciplines. Distorted incentives for publishing, promotion, and granting that prioritize novelty and quantity rather than quality exacerbate the problem (17). One goal of observational epidemiology is to provide reliable evidence about causal relationships between exposures and outcomes, such that the evidence could serve as a basis for making decisions about clinical or population-health interventions. Unreliable, poor-quality epidemiologic research limits scientific understanding of the determinants of health and disease, reduces our ability to contribute to clinical and policy decisions, and could potentially undermine the public’s faith in epidemiologic research.

surveillance in state and local public health agencies (25–27). Such analyses are quite explicitly descriptive: Is the number of deaths increasing or decreasing? Who is dying, from what are they dying, and where are they dying? Yet few, if any, of these reports were published in leading epidemiology journals. This is not necessarily an indication of bias (perhaps they were never submitted), but the fact that a massive increase in disability and death due to opioids, with profound implications for society and the economy, has received comparatively little attention in our top journals feels like a lost opportunity.

Similarly, the impact of the disastrous cost-cutting decision to change the water supply in the city of Flint, Michigan, from Lake Huron to the Flint River was known not only as a result of citizens voicing their concerns (28) but also from basic
Am J Epidemiol. 2019;188(5):840–845
to create exchangeable treatment groups through “acts of chance” or other arbitrary rules or events not under investigators’ control. In the best-case scenario, this leads to randomized or “as-if” randomized treatment allocation (e.g., a lottery), straightforward analysis, and more robust inference. These designs are not new but also are not extensively taught in most epidemiology programs. However, recent explications in the epidemiologic literature of interrupted time series (49), difference-in-differences (50), synthetic controls (51), instrumental variables (52), and regression discontinuity (53) should help.

These designs are valuable, and we should encourage greater use of them to help reduce some sources of bias. However, enthusiasm for quasi-experimental designs should be tempered by keeping in mind that they are still observational studies. They

of results. Finally, I think we need transparency in the preparation and deposition of materials so that people other than the researchers can reproduce, and possibly improve upon, their results. Here, I think epidemiology can learn from other disciplines that have also been feeling the heat of the so-called “replication crisis.” Large-scale efforts to replicate published results in psychology (61), genetics (62), cancer biology (63), neuroscience (64), and economics (65) have generally produced results that do not inspire confidence in published work. Quite frankly, I think epidemiology needs to embrace more vigorous challenges of the evidence it produces. To that extent, I hope that the editors and editorial boards of our leading journals also experiment with strategies for promoting replication studies.

It is important to be clear about the distinction between so-
A Future for Observational Epidemiology 843
Re-analyses of published randomized trials with different assumptions that led to different conclusions (73) should make clear that it is not only observational studies that could benefit from greater reproducibility. Clearly there may be limits to such an exercise, because there are far too many papers submitted and published to subject all papers to such scrutiny; however, without making the materials behind our studies available, such dialogue will not happen.

I wonder whether epidemiology could benefit from a more structured approach. A recent example from development economics highlights both the promise and some of the perils of replication work (74). The Journal of Development Studies used a structured grant program to fund researchers to replicate 8 high-profile publications from a candidate list, enlisted

inadequate peer review, novelty chasing, exaggerated claims, and conflicts of interest, all of which contribute to problems with the credibility of published research (84). Better training of epidemiology graduate students in how to think critically and formulate well-defined questions, evaluate evidence and adjudicate between competing claims, and maintain humility regarding their own research findings would be valuable. For what is the alternative? Some may argue that epidemiology mostly gets it right over the long haul, but how will we know? What other masses of sliced, prodded, hacked, and unreliably “significant” results might be lurking beneath the surface of the sea of observational epidemiologic papers published every day? Will we bother to wade in and find out?
Research; 2018. http://www.nber.org/papers/w24229. Accessed November 16, 2018.
12. Bowen A, Casadevall A. Increasing disparities between resource inputs and outcomes, as measured by certain health deliverables, in biomedical research. Proc Natl Acad Sci U S A. 2015;112(36):11335–11340.
13. Ioannidis JPA. The mass production of redundant, misleading, and conflicted systematic reviews and meta-analyses. Milbank Q. 2016;94(3):485–514.
14. Silberzahn R, Uhlmann EL, Martin DP, et al. Many analysts, one data set: making transparent how variations in analytic choices affect results. Adv Methods Pract Psychol Sci. 2018;1(3):337–356.
15. Madigan D, Ryan PB, Schuemie M. Does design matter? Systematic evaluation of the impact of analytical choices on
31. Lieberson S. Making It Count: The Improvement of Social Research and Theory. Berkeley, CA: University of California Press; 1987.
32. Lawlor DA, Tilling K, Davey Smith G. Triangulation in aetiological epidemiology. Int J Epidemiol. 2016;45(6):1866–1886.
33. Klinenberg E. Denaturalizing disaster: a social autopsy of the 1995 Chicago heat wave. Theory Soc. 1999;28(2):239–295.
34. Leon DA, Chenet L, Shkolnikov VM, et al. Huge variation in Russian mortality rates 1984–94: artefact, alcohol, or what? Lancet. 1997;350(9075):383–388.
35. Venero Fernández SJ, Medina RS, Britton J, et al. The association between living through a prolonged economic depression and the male:female birth ratio–a longitudinal study from Cuba, 1960–2008. Am J Epidemiol. 2011;174(12):1327–1331.
36. Jayachandran S, Pande R. Why are Indian children so short?
54. Pletscher M. The effects of organized screening programs on the demand for mammography in Switzerland. Eur J Health Econ. 2017;18(5):649–665.
55. Cullati S, von Arx M, Courvoisier DS, et al. Organised population-based programmes and change in socioeconomic inequalities in mammography screening: a 1992–2012 nationwide quasi-experimental study. Prev Med. 2018;116:19–26.
56. Feynman RP. Cargo cult science. Engineering and Science. 1974;37(7):10–13.
57. Kahneman D. Thinking, Fast and Slow. New York, NY: Farrar, Straus & Giroux; 2011.
58. Equator Network. EQUATOR Network reporting guideline manual. https://www.equator-network.org/library/equator-network-reporting-guideline-manual/. Accessed December 17, 2018.
59. Munafò MR, Nosek BA, Bishop DV, et al. A manifesto for
70. Hicks JH, Kremer M, Miguel E. Commentary: deworming externalities and schooling impacts in Kenya: a comment on Aiken et al. (2015) and Davey et al. (2015). Int J Epidemiol. 2015;44(5):1593–1596.
71. Regnerus M. Is structural stigma’s effect on the mortality of sexual minorities robust? A failure to replicate the results of a published study. Soc Sci Med. 2017;188:157–165.
72. Hatzenbuehler ML, Bellatorre A, Lee Y, et al. Structural stigma and all-cause mortality in sexual minority populations. Soc Sci Med. 2014;103:33–41.
73. Ebrahim S, Sohani ZN, Montoya L, et al. Reanalyses of randomized clinical trial data. JAMA. 2014;312(10):1024–1032.
74. Brown AN, Wood BD. Replication studies of development impact evaluations. J Dev Stud. 2019;55(5):917–924.