

Vickers et al. Diagnostic and Prognostic Research (2019) 3:18
https://doi.org/10.1186/s41512-019-0064-7

COMMENTARY Open Access

A simple, step-by-step guide to interpreting decision curve analysis
Andrew J. Vickers¹*, Ben van Calster²,³ and Ewout W. Steyerberg³

Abstract
Background: Decision curve analysis is a method to evaluate prediction models and diagnostic tests that was
introduced in a 2006 publication. Decision curves are now commonly reported in the literature, but there remains
widespread misunderstanding of and confusion about what they mean.
Summary of commentary: In this paper, we present a didactic, step-by-step introduction to interpreting a
decision curve analysis and answer some common questions about the method. We argue that many of the
difficulties with interpreting decision curves can be solved by relabeling the y-axis as “benefit” and the x-axis as
“preference.” A model or test can be recommended for clinical use if it has the highest level of benefit across a
range of clinically reasonable preferences.
Conclusion: Decision curves are readily interpretable if readers and authors follow a few simple guidelines.
Keywords: Net benefit, Decision curve analysis, Educational paper

* Correspondence: [email protected]
1 Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, 485 Lexington Avenue, 2nd Floor, New York, NY 10017, USA
Full list of author information is available at the end of the article

© The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Introduction

Decision curve analysis is a method to evaluate prediction models and diagnostic tests that was introduced by Vickers and Elkin in a 2006 publication in Medical Decision Making [1]. The method sought to overcome the limitations of both traditional statistical metrics, such as discrimination and calibration, which are not directly informative as to clinical value, and full decision analytic approaches, which are too unwieldy to be used in regular biostatistical practice.

In brief, decision curve analysis calculates a clinical “net benefit” for one or more prediction models or diagnostic tests in comparison to default strategies of treating all or no patients. Net benefit is calculated across a range of threshold probabilities, defined as the minimum probability of disease at which further intervention would be warranted, as net benefit = sensitivity × prevalence − (1 − specificity) × (1 − prevalence) × w, where w is the odds at the threshold probability. For a prediction model that gives predicted probability of disease p̂, sensitivity and specificity at a given threshold probability pt are calculated by defining test positive as p̂ ≥ pt. Net benefit differs from accuracy metrics such as discrimination and calibration because it incorporates the consequences of the decisions made on the basis of a model or test. For more on the background to decision curve analysis, see Vickers et al. [2].

Recent years have seen an explosion of interest in and practical use of decision curve analysis. The paper has been widely cited, with > 1000 citations on Google Scholar as of May 2019. Decision curve analysis has been recommended by editorials in many top journals, including JAMA, BMJ, Annals of Internal Medicine, Journal of Clinical Oncology, and PLoS Medicine [2–6]. That said, there does appear to be widespread misunderstanding of and confusion about decision curve analysis. For instance, a well-respected epidemiologist claimed that he had yet to find more than a couple of people in the world who could explain what decision curves meant and that he himself was not clear on their interpretation. We have also attended meetings where presenters have shown a decision curve slide and then commented that they themselves did not actually understand it.

Here, we present a didactic, step-by-step introduction to interpreting a decision curve analysis. Each step aims to give increasing understanding. Mastery of any step will give at least some insight into a published decision curve, although understanding all steps will naturally provide the greatest insight. In contrast to prior editorials, which are aimed predominately at investigators wishing to report a decision curve analysis, our main audience here is readers who wish to understand a published decision curve. In this paper, we ask readers to consider individual patient scenarios, such as a patient who has young children and is worried about cancer. Note that such examples are for didactic purposes only: as pointed out below, decision curves are a research tool and are not for direct use in the clinic. Note also that we will not comment further on how to calculate decision curves, nor comment on their mathematical properties: readers are referred to the appropriate methodological literature [1, 7, 8] and to www.decisioncurveanalysis.org.

We will use as our main example a study of prostate cancer biopsy, a topic that has been subject to several papers reporting decision curves, with a multi-institutional comparison of two prediction models being but one example [9]. As background, men undergoing screening with prostate-specific antigen (PSA) are generally advised to have a biopsy if their PSA is elevated, for instance, a value of 3 ng/mL or higher. However, only a small proportion of such men have high-grade cancer, the kind that benefits from early treatment. In contrast, low-grade cancer is considered to constitute overdiagnosis, and, of course, no urologist would recommend a biopsy of a man without cancer. Researchers have tried to find additional markers that could predict high-grade cancer in men with elevated PSA. The idea is that any man with an elevated PSA would undergo a second test, and only be referred to biopsy if that indicated a high risk of aggressive disease. In our hypothetical study, the prevalence of high-grade cancer is 10%. We suppose that the study evaluated both a binary diagnostic test (sensitivity 40%, specificity 90%) and a statistical prediction model based on several markers that gives an output in terms of a predicted probability of disease and has an area under the curve (AUC) of 0.79. We calculate the decision curves following the methods first described in the Vickers and Elkin paper [1]. We then address some frequently asked questions about decision curves.

Interpreting a decision curve analysis

Step 1: Benefit is good

Figure 1 shows only the most essential elements of a decision curve analysis. The result for the prediction model is the light gray line, and the diagnostic test is the dashed line. The two other lines are for “intervention for all” (thin black line) and “intervention for none” (thick black line).

Fig. 1 A decision curve plotting benefit against preference

“Intervention” is used in a general sense: it might refer to drugs or surgery, but it could also encompass lifestyle advice, additional diagnostic workup, or subsequent monitoring. Indeed, intervention reflects any action that a patient at high risk from a model, or getting a positive result on a diagnostic test, would consider to improve their health, or their life in general. The exact intervention depends on the clinical setting. In our study of prostate cancer in men with elevated PSA, intervention would mean prostate biopsy. To give other examples, in a study of infection, intervention might be giving antibiotics; in a study of heart disease prevention, intervention might be giving statins. In a study of palliative surgery for advanced cancer with an endpoint of death within 3 months, however, the idea would be to avoid surgery in patients at high risk, and intervention would be “best supportive care.” Note that in the original paper describing



decision curve analysis, and in many empirical applications, the word “treat” is used in place of intervention.

Decision curve analysis includes results for “intervention for all” and “intervention for none” because these are often reasonable clinical strategies [10, 11]. To give a specific example, one reasonable strategy in the prostate biopsy study would be to biopsy all patients with elevated PSA irrespective of the results of the diagnostic test or prediction model. Indeed, this is generally what happens in contemporary practice, where men who have a PSA above a certain threshold are routinely biopsied without additional testing. On the other hand, we might imagine a study of men with low PSA, who are not subject to biopsy in routine clinical practice. Some of these men do have high-grade prostate cancer, and researchers might be investigating a suitable test. In this case, the reference strategy would be “intervention for none.”

On the figure, the y-axis is benefit and the x-axis is preference. The benefit of a test or model is that it correctly identifies which patients do and do not have disease (in our example, high-grade cancer). Preference refers to how doctors value different outcomes for a given patient, a decision that is often influenced by a discussion between the doctor and that patient. Both preference and benefit are described in further detail below: at this stage, it is only important to know that benefit is good and that preferences vary. It is easily seen that the light gray line, corresponding to the prediction model, has the highest benefit across a wide range of values of preference. Hence, we can conclude that, except for a small range of low preferences, intervening on (i.e., biopsying) patients on the basis of the prediction model leads to higher benefit than the alternative strategies of biopsying all patients, biopsying no patients, or biopsying only those patients who are positive on the diagnostic test. For the prostate biopsy study, the conclusion is that using the model to determine whether patients should have a biopsy would lead to improved clinical outcome.

Step 2: Preference refers to how doctors value different outcomes for their patients

Following a consultation and a discussion with some patients, a doctor might be particularly worried about missing disease; for other patients, the doctor may be more concerned about avoiding unnecessary intervention. Doctors may also vary in their propensity to intervene, some being more conservative, others more aggressive. In Fig. 1, the extremes of the x-axis for preference are “I’m worried about disease” and “I’m worried about biopsy.” In the case of prostate cancer biopsy, a doctor who, for a given patient, has a preference towards the left end of the x-axis weighs the relative harm of missing a high-grade cancer as much greater than the harm of unnecessary biopsy. This may be, for instance, because the patient is younger and has school-age children, and so very much prioritizes finding any lethal cancer at a curable stage: this patient is clearly “worried about disease,” consistent with a low threshold for continuing diagnostic workup. A doctor with a preference for a given patient towards the right of the x-axis wants to avoid biopsy if possible. This might reflect a patient who does not like the idea of invasive medical procedures, or a doctor who is treating an older patient and is skeptical about the value of early detection in that population: they are “worried about biopsy” and will opt for biopsy only if the patient is at particularly high risk.

This helps us take our interpretation a little bit further. We can see that the model has higher benefit than the other approaches, apart from doctors who fall in the “very worried” category, for whom the benefit is actually slightly higher for the strategy of “intervention for all.” This makes intuitive sense: a patient with an elevated PSA who has a strong preference for early identification of potentially lethal cancer might want to go straight ahead and get a biopsy rather than depend on a second model or test that is not 100% accurate.

Step 3: The unit of preference is threshold probability

Our model gives a patient’s predicted probability of high-grade cancer. One might assume that if the model estimated the patient’s risk as 1%, both the patient and the doctor would agree that there was no need for biopsy; if the risk was 99%, however, the doctor would advise and the patient accept that biopsy was indicated. Comparable conclusions would be drawn if the risks were 2% versus 98%. We might imagine that we vary the risks, counting up from 2% and down from 98% until the doctor is no longer sure. For instance, a doctor might say “Thinking about this patient, I wouldn’t do more than 10 biopsies to find one high-grade cancer in patients with similar health and who think about the risks and benefits of biopsy vs. finding cancer in the same way. So if a patient’s risk was above 10% I do a biopsy, otherwise, I just carefully monitor the patient and perhaps do a biopsy later if I saw a reason to.”

The relationship between preference and threshold probability is perhaps the easiest to see when using the odds. The risk of 10% is an odds of 1:9, so in using a threshold probability of 10%, the doctor is telling us “missing a high-grade cancer is 9 times worse than doing an unnecessary biopsy” [2]. This can be interpreted as the “number-needed-to-test,” that is, 10% is a number-needed-to-test of 10. Figure 2 shows threshold probabilities on the x-axis. Odds are also shown for didactic purposes, although these are omitted when presenting decision curves. This helps us to understand our previous conclusion that patients who are particularly worried about disease do not benefit from using the model.

Fig. 2 A decision curve plotting net benefit against threshold probability
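
The arithmetic linking threshold probability, odds, and number-needed-to-test can be illustrated with a short Python sketch. It is not part of the original analysis: the function names are hypothetical, and the inputs are simply the numbers of the hypothetical study (prevalence 10%; binary test with sensitivity 40% and specificity 90%), combined using the net benefit formula given in the Introduction. The prediction model is left out because its net benefit depends on individual predicted probabilities, which are not available here.

```python
"""Sketch: threshold probability, odds, number-needed-to-test, net benefit.

Assumptions (illustrative, not a real dataset): prevalence 10% and a binary
test with sensitivity 40% and specificity 90%, as in the hypothetical study.
All function names are made up for this example.
"""


def threshold_odds(pt: float) -> float:
    """Odds w at threshold probability pt, i.e. pt / (1 - pt)."""
    return pt / (1.0 - pt)


def number_needed_to_test(pt: float) -> float:
    """Most interventions accepted to find one true case: 1 / pt."""
    return 1.0 / pt


def net_benefit(sensitivity: float, specificity: float,
                prevalence: float, pt: float) -> float:
    """sensitivity*prevalence - (1 - specificity)*(1 - prevalence)*w."""
    w = threshold_odds(pt)
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives - false_positives * w


def net_benefit_all(prevalence: float, pt: float) -> float:
    """'Intervention for all': everyone is positive (sens 1, spec 0)."""
    return net_benefit(1.0, 0.0, prevalence, pt)


if __name__ == "__main__":
    prevalence, sens, spec = 0.10, 0.40, 0.90  # assumed study values
    for pt in (0.02, 0.05, 0.10, 0.20):
        print(
            f"pt={pt:.2f}  odds={threshold_odds(pt):.3f}  "
            f"NNT={number_needed_to_test(pt):.0f}  "
            f"NB(test)={net_benefit(sens, spec, prevalence, pt):.3f}  "
            f"NB(all)={net_benefit_all(prevalence, pt):.3f}  NB(none)=0.000"
        )
```

Under these assumptions, a threshold of 10% corresponds to odds of 1:9 and a number-needed-to-test of 10; at that threshold the formula gives a net benefit of about 0.03 for the binary test and about 0 for “intervention for all.” These values follow from the formula alone and are not read off the figures.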

We can now see that it is only if threshold probabilities are less than 2 or 3% that we should avoid using the model. That would be a stretch in prostate cancer, where biopsy is invasive, painful, and associated with the risk of sepsis. However, such a low threshold might be plausible in some other scenarios, for instance, biopsy for skin cancer, which is a far less risky and less invasive procedure. Note also that the curve is only plotted up to 20%. This is because, given the relative risks of missing a high-grade prostate cancer compared to the harms of biopsy, we would consider it unreasonable for any patient or doctor to demand greater than 20% risk before accepting biopsy. The plausible range of thresholds hence depends critically on context. Elsewhere, we describe in detail the process by which a reasonable range of thresholds can be agreed upon [2].

Step 4: Benefit is actually net benefit

Figure 2 also shows the correct units for benefit, what is known as “net benefit.” The “net” in “net benefit” is the same as in “net profit,” that is, income minus expenditure. If, say, a wine importer buys €1m of wine from France and sells it in the USA for $1.5m, then if the exchange rate is €1 to $1.25, the net profit is income in dollars (1.5m) − expenditure in euros (1m) × exchange rate (1.25) = $250,000. Leaving aside, for the sake of simplicity, the issue of risk and the time and trouble to trade, this is equivalent to being given $250,000 without having to do any trading. In the case of diagnosis, the income is true positives (e.g., finding a cancer) and the expenditure is false positives (e.g., unnecessary biopsies), with the “exchange rate” being the number of false positives that are worth one true positive. The exchange rate will depend on the relative seriousness of the intervention and outcome. For instance, we will be willing to conduct more unnecessary biopsies to find one cancer if the biopsy procedure is safe rather than dangerous, or if the cancer is aggressive rather than indolent. The exchange rate is calculated, as explained above, from the threshold probability. Another analogy is with net health benefit or net monetary benefit, which both depend on the willingness-to-pay threshold in their exchange of benefits in terms of health and costs [12].

The unit of net benefit is true positives. A net benefit of 0.07, for instance, means “7 true positives for every 100 patients in the target population.” So just like in the example of net profit for the wine trader, a net benefit of 0.07 would be the equivalent of identifying 7 patients per 100, all of whom had disease. In the prostate biopsy example, a 0.07 net benefit would be equivalent to a strategy where 7 patients per 100 were biopsied and all were found to have high-grade tumors. Also comparable to the business example, where a profit of $250,000 could result from various combinations of income and expenditure, a net benefit of 0.07 could result from different combinations of true and false positives.

Step 5: Net benefit can also be expressed as interventions avoided

In many scenarios, the most common strategy is “intervention for all” rather than “intervention for none.” Indeed, this is the case for our prostate cancer example, where urologists routinely biopsy all patients with an elevated PSA. In these scenarios, a model or test would aim to reduce unnecessary intervention. Net benefit can be expressed in terms of true negatives

rather than true positives. Figure 3 shows an example of this type of decision curve. This could be interpreted as showing that, at a risk threshold of 10%, use of the prediction model would be the equivalent of a strategy that reduced the number of unnecessary biopsies by about 40 per 100 without missing biopsy for any patients with high-grade cancer. Expressing net benefit in terms of avoided unnecessary diagnostic procedures or avoided unnecessary treatments is recommended if the reference strategy is “intervention for all.” Note that doing so does not change any conclusions as to which model or test has the highest net benefit.

Fig. 3 A decision curve plotting decrease in interventions against threshold probability

Some common questions about interpreting decision curves

1. What if we do not know the threshold probability? A threshold probability is necessary to use any model or test for decision-making. If our prostate cancer prediction model gave a predicted risk of, say, 40%, and no one knew whether that was high or low, and therefore could not tell whether biopsy was indicated, then the model could not be used to make a decision. As a result, the question of using a decision analytic technique such as decision curve analysis to evaluate the model would be redundant.

2. How is treatment effect taken into account? In most decision curves, the effect of treatment is implicit and is incorporated into the threshold probability. In general, the more effective the treatment, the lower the threshold probability: larger treatment effects imply lower thresholds. In the prostate cancer example, the diagnosis of high-grade disease would be considered more important, and hence probability thresholds would be lower, if diagnosing and treating high-grade cancer had a larger effect on life expectancy. As another simple example, consider a decision curve to predict heart attack, where patients at high risk are given a prophylactic drug. Imagine that the drug reduced the relative risk of a cardiac event by 10% and was associated with an absolute 1% risk of a serious side-effect such as a stroke or gastrointestinal hemorrhage. If we assume that cardiac events and serious side-effects are equally harmful, then the minimum threshold probability to justify treatment would be 10%. This is because a 10% relative risk reduction from 10% is 1% in absolute terms, so the reduction in the risk of a cardiac event would be the same as the increase in the risk of stroke. However, if the drug were more effective, say a 20% relative risk reduction, then the minimum threshold probability would be 5%. That said, some models predict not absolute risk but treatment benefit, that is, “patient X is predicted to have a 2% absolute reduction in risk” rather than “patient X has a 20% absolute risk of the event.” An alternative version of decision curve analysis is available for such models [13].

3. How much of a difference in curves is enough? In classical decision theory, the strategy with the highest expected utility should be chosen, irrespective of the size or statistical significance of the benefit. Theoretically, any improvement in net benefit is therefore worth having. That said, a straightforward decision analysis does not take into account the time and trouble required to obtain data for and implement a model. Now, if a model required a variable from an invasive medical



procedure associated with non-trivial risk, we would likely not use the model if it had only a small improvement in net benefit. There are two approaches to this problem. First, as described in the original paper on net benefit [1], investigators can formally incorporate harm associated with the model or test into a decision curve. In brief, the investigators ask: “if the test/model was perfect, how many patients would I subject to it in order to find one true case (e.g., a cancer)?” The reciprocal of this number is known as “test harm” and is subtracted from net benefit. Alternatively, the investigators can look at differences in net benefit, or interventions avoided, and make an informal judgment; this is related to the concept of “test trade-off” [14]. Using the data shown in Figure 3, one might ask whether it is worth calculating the model for 100 patients in order to prevent 39 biopsies, or whether it is worth using the model rather than the test to prevent 5 biopsies. The answer to those questions depends on the sort of information required for the model and for the test, such as whether an invasive, harmful, or expensive procedure was required.

4. Should there be confidence intervals or p values for decision curves? Statistical significance and confidence intervals are not important concepts in classical decision theory. This can be described in brief as follows. A decision-maker should start by considering all reasonable options for a given decision problem. Which options count as “reasonable” might well include consideration of statistical significance. But when choosing between different options, the most rational choice is (in general) that with the highest expected utility, irrespective of statistical significance. As a simple thought experiment, consider an individual who had to rush home for an appointment, could take either one of two bus routes, and happened to have a dataset of the times for each route. If the mean times home were 30 vs. 35 min, with similar distributions and variances, the individual would be advised to take the quicker route home, even if the difference was not statistically significant and the confidence interval for the difference in times overlapped with zero. As a result, few published decision curves incorporate confidence intervals. Confidence intervals may be useful in certain scenarios, for instance, to determine whether more research is required. Methods for the calculation of confidence intervals have been published [7].

5. How can a model be harmful if area under the curve (AUC) is better than 0.50? If one model has a better AUC than another, how can it have a worse net benefit? Net benefit takes into account both discrimination (AUC) and calibration [15]. To give a simple example, imagine that we took the predictions from the prostate cancer model and divided them by 10. Although this would have no effect on AUC (patients with a higher predicted risk would still be more likely to have high-grade cancer than patients with a lower predicted risk), it would have obvious effects on clinical value: we might tell a patient at 40% risk that his risk was only 4%. With that risk estimate, he would elect not to have a biopsy, leading to an important risk of missing an aggressive cancer.

6. Why is “intervention for all” or “intervention for none” a relevant comparison? Intervening for all or no patients, irrespective of test or model results, is a reasonable clinical strategy in many scenarios. A test or model must be found superior to both of these strategies to justify being used in clinical practice [10]. There are several examples in the literature demonstrating the value of comparing models to intervention for all or no patients. For instance, Nam et al. [9] found that a prostate biopsy model had a lower net benefit than biopsying all men at elevated risk, because the model underestimated the risk of cancer.

7. Can a decision curve analysis substitute for a traditional decision analysis or cost-effectiveness analysis? Decision curve analysis is much quicker and easier than a full decision analysis because it requires fewer parameters to be specified (indeed, only one, the reasonable range of threshold probabilities). However, doing so involves simplifying assumptions. If the results of the decision curve analysis are very clear, for instance, that a model has no benefit, this may obviate the need for a more complex decision analysis. On the other hand, if the results are more equivocal, there may be a case for a decision analysis with a more completely specified list of parameters for benefits, harms, and costs.

8. Can you use a decision curve to choose the best threshold? This is a frequent and fundamental misunderstanding. Investigators have sometimes written statements such as “the model was superior in the range 30–40%; therefore patients should choose intervention if their probability from the model is greater than 30–40%.” This reverses the relationship between threshold probability and evaluation of a model. Investigators should first work out a clinically reasonable range of threshold probabilities, based on considering the relative harms of avoiding intervention for a patient with disease versus unnecessarily intervening on a patient who is disease free. They should then determine whether the net benefit of their model

or test is better than alternatives across this range of threshold probabilities.

9. How do you use a decision curve analysis in the clinic? A decision curve analysis has no more direct clinical applicability than, say, the p value and overall absolute risk reduction from a trial of a new drug. In the drug trial, a p value might be used to conclude “the drug works” and the overall absolute risk reduction to judge that “the benefit of the drug outweighs the harms.” In such a case, a doctor would then give the drug to patients where indicated, without looking up the trial results each time. In a similar way, a decision curve is used to evaluate whether a model or test would be of benefit in the clinic. If results are positive, then the model or test can be used with appropriate patients as part of shared decision-making without any need to refer back to the original decision curve.

10. Do I need to know the threshold probability for an individual patient before I use the results of a decision curve and use the results of a test or model? This is not how decision curves are intended to be used. If a model or test has the highest net benefit across the entire range of reasonable threshold probabilities, then clearly that model or test should be used irrespective of patient preference. If the optimal approach depends on the threshold probability, then the typical conclusion would be that the model or test is of unproven benefit or that it is only useful in settings where we assume a specific range of preferences. A more formal decision analysis might involve elicitation of individual preferences from a study sample and integration of utilities across a distribution of these preferences.

Conclusion

A PubMed search for “decision analysis” restricted to 2017 retrieves 311 papers; a comparable search for “decision curve analysis” retrieves 95. Given that few of the decision curve papers would have involved a decision analytic methodology if not for the availability of a straightforward analytic technique, this means that decision curve analysis is responsible for an important increase in the use of decision analysis in the medical literature. A greater understanding of decision curve analysis is therefore not only of inherent value, but will also lead to a greater appreciation of decision analytic principles in the research community as a whole.

When investigators have indicated to us that “decision curve analysis is hard to understand,” it is clear that this confusion centers on the metric rather than the methodology. Calculating a decision curve requires only the most trivial math [1], but the two axes (threshold probability and net benefit) are concepts that are novel to many.

We hope that this didactic overview will aid in the interpretation of decision curve analysis and ensure that the basic concepts underpinning decision curves are more widely understood.

Abbreviations
AUC: Area under the curve; p̂: Predicted probability that a given patient has the event of interest; PSA: Prostate-specific antigen; pt: Threshold probability

Acknowledgements
None

Authors’ contributions
AV conceived of the initial paper. BVC and ES provided input to the concept. The manuscript was written, edited, and approved by all authors.

Funding
AJV is supported by David H. Koch provided through the Prostate Cancer Foundation, the Sidney Kimmel Center for Prostate and Urologic Cancers, SPORE grant from the National Cancer Institute to Dr. H. Scher (grant number P50-CA92629), and a National Institutes of Health/National Cancer Institute Cancer Center Support Grant to MSKCC (grant number P30-CA008748). EWS is supported by U01 NS086294 from the NIH. BVC is supported by the Research Foundation–Flanders (FWO; grant G0B4716N) and Internal Funds KU Leuven (grant C24/15/037).

Availability of data and materials
Not applicable

Ethics approval and consent to participate
Not applicable

Consent for publication
Not applicable

Competing interests
The authors declare that they have no competing interests.

Author details
1 Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, 485 Lexington Avenue, 2nd Floor, New York, NY 10017, USA.
2 Department of Development and Regeneration, KU Leuven, Oude Markt 13, 3000 Leuven, Belgium.
3 Department of Biomedical Data Sciences, Leiden University Medical Center, Albinusdreef 2, 2333 ZA Leiden, Netherlands.

Received: 13 December 2018 Accepted: 26 June 2019

References
1. Vickers AJ, Elkin EB. Decision curve analysis: a novel method for evaluating prediction models. Med Decis Making. 2006;26(6):565–74.
2. Vickers AJ, Van Calster B, Steyerberg EW. Net benefit approaches to the evaluation of prediction models, molecular markers, and diagnostic tests. BMJ. 2016;352:i6.
3. Fitzgerald M, Saville BR, Lewis RJ. Decision curve analysis. JAMA. 2015;313(4):409–10.
4. Localio AR, Goodman S. Beyond the usual prediction accuracy metrics: reporting results for clinical decision making. Ann Intern Med. 2012;157(4):294–5.
5. Kerr KF, Brown MD, Zhu K, Janes H. Assessing the clinical impact of risk prediction models with decision curves: guidance for correct interpretation and appropriate use. J Clin Oncol. 2016;34(21):2534–40.
6. Holmberg L, Vickers A. Evaluation of prediction models for decision-making: beyond calibration and discrimination. PLoS Med. 2013;10(7):e1001491.
7. Vickers AJ, Cronin AM, Elkin EB, Gonen M. Extensions to decision curve analysis, a novel method for evaluating diagnostic tests, prediction models and molecular markers. BMC Med Inform Decis Mak. 2008;8:53.
8. Baker SG, Cook NR, Vickers A, Kramer BS. Using relative utility curves to evaluate risk prediction. J R Stat Soc Ser A Stat Soc. 2009;172(4):729–48.

9. Nam RK, Kattan MW, Chin JL, Trachtenberg J, Singal R, Rendon R, Klotz LH, Sugar L, Sherman C, Izawa J, et al. Prospective multi-institutional study evaluating the performance of prostate cancer risk calculators. J Clin Oncol. 2011;29(22):2959–64.
10. Hunink M, Glasziou P, Siegel J, Weeks J, Pliskin J, Elstein A, M. W. Decision making in health and medicine: integrating evidence and values. Cambridge: Cambridge University Press; 2001.
11. Pauker SG, Kassirer JP. The threshold approach to clinical decision making. N Engl J Med. 1980;302(20):1109–17.
12. Stinnett AA, Mullahy J. Net health benefits: a new framework for the analysis of uncertainty in cost-effectiveness analysis. Med Decis Making. 1998;18(2 Suppl):S68–80.
13. Vickers AJ, Kattan MW, Daniel S. Method for evaluating prediction models that apply the results of randomized trials to individual patients. Trials. 2007;8:14.
14. Baker SG. The summary test tradeoff: a new measure of the value of an additional risk prediction marker. Stat Med. 2017;36(28):4491–4.
15. Van Calster B, Vickers AJ. Calibration of risk prediction models: impact on decision-analytic performance. Med Decis Making. 2015;35(2):162–9.

Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in
published maps and institutional affiliations.
