STARD

Bossuyt et al.: The STARD Initiative

Background: To comprehend the results of diagnostic accuracy studies, readers must understand the design, conduct, analysis, and results of such studies. That goal can be achieved only through complete transparency from authors.

Objective: To improve the accuracy and completeness of reporting of studies of diagnostic accuracy, to allow readers to assess the potential for bias in the study and to evaluate its generalisability.

Methods: The Standards for Reporting of Diagnostic Accuracy (STARD) steering committee searched the literature to identify publications on the appropriate conduct and reporting of diagnostic studies and extracted potential items into an extensive list. Researchers, editors, and members of professional organisations shortened this list during a two-day consensus meeting, with the goal of developing a checklist and a generic flow diagram for studies of diagnostic accuracy.

Results: The search for published guidelines on diagnostic research yielded 33 previously published checklists, from which we extracted a list of 75 potential items. The consensus meeting shortened the list to 25 items, using evidence on bias whenever available. A prototypical flow diagram provides information about the method of patient recruitment, the order of test execution, and the numbers of patients undergoing the test under evaluation, the reference standard, or both.

Conclusions: Evaluation of research depends on complete and accurate reporting. If medical journals adopt the checklist and the flow diagram, the quality of reporting of studies of diagnostic accuracy should improve, to the advantage of clinicians, researchers, reviewers, journals, and the public.

1 Department of Clinical Epidemiology and Biostatistics, Academic Medical Center—University of Amsterdam, 1100 DE Amsterdam, The Netherlands.
2 Department of Pathology, University of Virginia, Charlottesville, VA 22903.
3 Clinical Chemistry, Washington, DC 20037.
4 Centre for Statistical Sciences, Brown University, Providence, RI 02912.
5 Centre for General Practice, University of Queensland, Herston QLD 4006, Australia.
6 Department of Public Health & Community Medicine, University of Sydney, Sydney NSW 2006, Australia.
7 Chalmers Research Group, Ottawa, Ontario, K1N 6M4 Canada.
8 Institute for Health Policy Studies, University of California, San Francisco, San Francisco, CA 94118.
9 Journal of the American Medical Association, Chicago, IL 60610.
10 Institute for Research in Extramural Medicine, Free University, 1081 BT Amsterdam, The Netherlands.
*Address correspondence to this author at: Department of Clinical Epidemiology and Biostatistics, Academic Medical Center—University of Amsterdam, PO Box 22700, 1100 DE Amsterdam, The Netherlands. Fax 31-20-6912683; e-mail [email protected].
Received September 15, 2002; accepted September 15, 2002.

The world of diagnostic tests is highly dynamic. New tests are developed at a fast rate, and the technology of existing tests is continuously being improved. Exaggerated and biased results from poorly designed and reported diagnostic studies can trigger their premature dissemination and lead physicians into making incorrect treatment decisions. A rigorous evaluation process of diagnostic tests before introduction into clinical practice could not only reduce the number of unwanted clinical consequences related to misleading estimates of test accuracy, but also limit healthcare costs by preventing unnecessary testing. Studies to determine the diagnostic accuracy of a test are a vital part of this evaluation process (1–3).

In studies of diagnostic accuracy, the outcomes from one or more tests under evaluation are compared with outcomes from the reference standard, both measured in subjects who are suspected of having the condition of interest. The term test refers to any method for obtaining additional information on a patient's health status. It includes information from history and physical examination, laboratory tests, imaging tests, function tests, and
histopathology. The condition of interest, or target condition, can refer to a particular disease or to any other identifiable condition that may prompt clinical actions, such as further diagnostic testing, or the initiation, modification, or termination of treatment. In this framework, the reference standard is considered to be the best available method for establishing the presence or absence of the condition of interest. The reference standard can be a single method, or a combination of methods, to establish the presence of the target condition. It can include laboratory tests, imaging tests, and pathology, but also dedicated clinical follow-up of subjects. The term accuracy refers to the amount of agreement between the information from the test under evaluation, referred to as the index test, and the reference standard. Diagnostic accuracy can be expressed in many ways, including sensitivity and specificity, likelihood ratios, the diagnostic odds ratio, and the area under a receiver operating characteristic (ROC) curve (4–6).

There are several potential threats to the internal and external validity of a study of diagnostic accuracy. A survey of studies of diagnostic accuracy published in four major medical journals between 1978 and 1993 revealed that the methodological quality was mediocre at best (7). However, evaluations were hampered because many reports lacked information on key elements of the design, conduct, and analysis of diagnostic studies (7). The absence of critical information about the design and conduct of diagnostic studies has been confirmed by authors of meta-analyses (8, 9). As in any other type of research, flaws in study design can lead to biased results. One report showed that diagnostic studies with specific design features are associated with biased, optimistic estimates of diagnostic accuracy compared to studies without such deficiencies (10).

At the 1999 Cochrane Colloquium meeting in Rome, the Cochrane Diagnostic and Screening Test Methods Working Group discussed the low methodological quality and substandard reporting of diagnostic test evaluations. The Working Group felt that the first step to correct these problems was to improve the quality of reporting of diagnostic studies. Following the successful CONSORT initiative (11–13), the Working Group aimed at the development of a checklist of items that should be included in the report of a study of diagnostic accuracy.

The objective of the Standards for Reporting of Diagnostic Accuracy (STARD) initiative is to improve the quality of reporting of studies of diagnostic accuracy. Complete and accurate reporting allows the reader to detect the potential for bias in the study (internal validity) and to assess the generalisability and applicability of the results (external validity).

Materials and Methods

The STARD steering committee (see appendix for membership and details) started with an extensive search to identify publications on the conduct and reporting of diagnostic studies. This search included Medline, Embase, BIOSIS, and the methodological database from the Cochrane Collaboration, up to July 2000. In addition, the steering committee members examined reference lists of retrieved articles, searched personal files, and contacted other experts in the field of diagnostic research. They reviewed all relevant publications and extracted an extended list of potential checklist items.

Subsequently, the STARD steering committee convened a two-day consensus meeting for invited experts from the following interest groups: researchers, editors, methodologists, and professional organisations. The aim of the conference was to reduce the extended list of potential items, where appropriate, and to discuss the optimal format and phrasing of the checklist. The selection of items to retain was based on evidence whenever possible.

The meeting format consisted of a mixture of small-group sessions and plenary sessions. Each small group focused on a set of related items from the list. The suggestions of the small groups were then discussed in plenary sessions. Overnight, a first draft of the STARD checklist was assembled, based on the suggestions from the small groups and the additional remarks from the plenary sessions. All meeting attendees discussed this version the next day and made additional changes. The members of the STARD group could suggest further changes through a later round of comments by electronic mail.

Potential users field-tested the conference version of the checklist and flow diagram, and additional comments were collected. This version was placed on the CONSORT Web site with a call for comments. The STARD steering committee discussed all comments and assembled the final checklist.

Results

The search for published guidelines for diagnostic research yielded 33 lists. Based on these published guidelines and on input from steering and STARD group members, the steering committee assembled a list of 75 items. During the consensus meeting on September 16 and 17, 2000, participants consolidated and eliminated items to form the 25-item checklist. Conference members made major revisions to the phrasing and format of the checklist.

The STARD group received valuable comments and remarks during the various stages of evaluation after the conference, which resulted in the version of the STARD checklist that appears in Table 1.

Table 1. STARD checklist for the reporting of studies of diagnostic accuracy.

The flow diagram provides information about the method of patient recruitment (e.g., based on a consecutive series of patients with specific symptoms, or case-control), the order of test execution, and the number of patients undergoing the test under evaluation (index test) and the reference test (see Fig. 1). We provide one prototypical flowchart that reflects the most commonly employed design in diagnostic research. Examples that
reflect other designs are on the STARD Web site (see www.consort-statement.org).

Discussion

The purpose of the STARD initiative is to improve the quality of the reporting of diagnostic studies. The items in the checklist and the flowchart can help authors in describing essential elements of the design and conduct of the study, the execution of tests, and the results. We arranged the items under the usual headings of a medical research article, but this is not intended to dictate the order in which they have to appear within an article.

The guiding principle in the development of the STARD checklist was to select items that would help readers to judge the potential for bias in the study and to appraise the applicability of the findings. Two other general considerations shaped the content and format of the checklist. First, the STARD group believes that one general checklist for studies of diagnostic accuracy, rather than different checklists for each field, is likely to be more widely disseminated and perhaps accepted by authors, peer reviewers, and journal editors. Although the evaluation of imaging tests differs from that of tests in the laboratory, we felt that these differences were more of degree than of kind. The second consideration was the development of a checklist specifically aimed at studies of diagnostic accuracy. We did not include general issues in the reporting of research findings, like the recommendations in the Uniform Requirements for Manuscripts Submitted to Biomedical Journals (14).
Clinical Chemistry 49, No. 1, 2003
chemistry (Copenhagen, Denmark); Barbara McNeil, Harvard Medical School, Department of Health Care Policy (Boston, MA); Matthew McQueen, Hamilton Civic Hospitals, Department of Laboratory Medicine (Hamilton, Canada); Andrew Onderdonk, Channing Laboratory (Boston, MA); John Overbeke, Nederlands Tijdschrift voor Geneeskunde (Amsterdam, The Netherlands); Christopher Price, St Bartholomew's - Royal London School of Medicine and Dentistry (London, United Kingdom); Anthony Proto, Radiology Editorial Office (Richmond, VA); Hans Reitsma, Academic Medical Center, Department of Clinical Epidemiology (Amsterdam, The Netherlands); David Sackett, Trout Centre (Ontario, Canada); Gerard Sanders, Academic Medical Center, Department of Clinical Chemistry (Amsterdam, The Netherlands); Harold Sox, Annals of Internal Medicine (Philadelphia, PA); Sharon Straus, Mt. Sinai Hospital (Toronto, Canada); Stephen Walter, McMaster University, Clinical Epidemiology and Biostatistics (Hamilton, Canada).

References

1. Guyatt GH, Tugwell PX, Feeny DH, Haynes RB, Drummond M. A framework for clinical evaluation of diagnostic technologies. Can Med Assoc J 1986;134:587–94.
2. Fryback DG, Thornbury JR. The efficacy of diagnostic imaging. Med Decis Making 1991;11:88–94.
3. Kent DL, Larson EB. Disease, level of impact, and quality of research methods. Three dimensions of clinical efficacy assessment applied to magnetic resonance imaging. Invest Radiol 1992;27:245–54.
4. Griner PF, Mayewski RJ, Mushlin AI, Greenland P. Selection and interpretation of diagnostic tests and procedures. Principles and applications. Ann Intern Med 1981;94:557–92.
5. Sackett DL, Haynes RB, Guyatt GH, Tugwell P. The selection of diagnostic tests. In: Sackett D, editor. Clinical epidemiology, 2nd ed. Boston/Toronto/London: Little, Brown and Company; 1991:47–57.
6. Metz CE. Basic principles of ROC analysis. Semin Nucl Med 1978;8:283–98.
7. Reid MC, Lachs MS, Feinstein AR. Use of methodological standards in diagnostic test research. Getting better but still not good. JAMA 1995;274:645–51.
8. Nelemans PJ, Leiner T, de Vet HCW, van Engelshoven JMA. Peripheral arterial disease: meta-analysis of the diagnostic performance of MR angiography. Radiology 2000;217:105–14.
9. Devries SO, Hunink MGM, Polak JF. Summary receiver operating characteristic curves as a technique for meta-analysis of the diagnostic performance of duplex ultrasonography in peripheral arterial disease. Acad Radiol 1996;3:361–9.
10. Lijmer JG, Mol BW, Heisterkamp S, Bonsel GJ, Prins MH, van der Meulen JH, et al. Empirical evidence of design-related bias in studies of diagnostic tests. JAMA 1999;282:1061–6.
11. Begg C, Cho M, Eastwood S, Horton R, Moher D, Olkin I, et al. Improving the quality of reporting of randomized controlled trials. The CONSORT statement. JAMA 1996;276:637–9.
12. Moher D, Schulz KF, Altman D. The CONSORT statement: revised recommendations for improving the quality of reports of parallel-group randomized trials. JAMA 2001;285:1987–91.
13. Moher D, Jones A, Lepage L. Use of the CONSORT statement and quality of reports of randomized trials. A comparative before-and-after evaluation. JAMA 2001;285:1992–5.
14. International Committee of Medical Journal Editors. Uniform requirements for manuscripts submitted to biomedical journals. JAMA 1997;277:927–34. Also available at: ACP Online, http://www.acponline.org.
15. Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig LM, et al. The STARD statement for reporting studies of diagnostic accuracy: explanation and elaboration. Clin Chem 2003;49:7–18.
16. Egger M, Jüni P, Bartlett C. Value of flow diagrams in reports of randomized controlled trials. JAMA 2001;285:1996–9.
17. Knottnerus JA. The effects of disease verification and referral on the relationship between symptoms and diseases. Med Decis Making 1987;7:139–48.
18. Panzer RJ, Suchman AL, Griner PF. Workup bias in prediction research. Med Decis Making 1987;7:115–9.
19. Begg CB. Biases in the assessment of diagnostic tests. Stat Med 1987;6:411–23.