Disease Prediction Models and Operational Readiness
Courtney D. Corley1*, Laura L. Pullum2, David M. Hartley3, Corey Benedum1, Christine Noonan1,
Peter M. Rabinowitz4, Mary J. Lancaster1
1 Pacific Northwest National Laboratory, Richland, Washington, United States of America, 2 Oak Ridge National Laboratory, Oak Ridge, Tennessee, United States of
America, 3 Georgetown University Medical Center, Washington, DC, United States of America, 4 Yale University School of Medicine, New Haven, Connecticut, United
States of America
The objective of this manuscript is to present a systematic review of biosurveillance models that operate on select agents
and can forecast the occurrence of a disease event. We define a disease event to be a biological event with focus on the
One Health paradigm. These events are characterized by evidence of infection and/or disease condition. We reviewed
models that attempted to predict a disease event, not merely its transmission dynamics, and we considered models
involving pathogens of concern as determined by the US National Select Agent Registry (as of June 2011). We searched
commercial and government databases and harvested Google search results for eligible models, using terms and phrases
provided by public health analysts relating to biosurveillance, remote sensing, risk assessments, spatial epidemiology, and
ecological niche modeling. After removal of duplications and extraneous material, a core collection of 6,524 items was
established, and these publications along with their abstracts are presented in a semantic wiki at http://BioCat.pnnl.gov. As
a result, we systematically reviewed 44 papers, and the results are presented in this analysis. We identified 44 models,
classified as one or more of the following: event prediction (4), spatial (26), ecological niche (28), diagnostic or clinical (6),
spread or response (9), and reviews (3). The model parameters (e.g., etiology, climatic, spatial, cultural) and data sources
(e.g., remote sensing, non-governmental organizations, expert opinion, epidemiological) were recorded and reviewed. A
component of this review is the identification of verification and validation (V&V) methods applied to each model, if any
V&V method was reported. All models were classified as either having undergone Some Verification or Validation method,
or No Verification or Validation. We close by outlining an initial set of operational readiness level guidelines for disease
prediction models based upon established Technology Readiness Level definitions.
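The reduction from raw search results to the 44 reviewed papers is simple arithmetic, and a short tally makes each step checkable. The sketch below uses only the counts reported in the Methods; the variable names are illustrative and are not part of the original study.

```python
# Counts as reported in the Methods; variable names are illustrative only.
database_citations = 12_152   # citations returned by the database queries
database_kept = 6_503         # left after de-duplication and removal of extraneous studies
web_documents = 13_767        # documents gathered by Google harvesting
web_kept = 21                 # theses and dissertations retained from the web documents

# Core collection: retained database publications plus retained web documents.
core_collection = database_kept + web_kept

curated = 117                 # papers kept after filtering by hand
relevant = 54                 # papers meeting the selection criteria
excluded = 10                 # pure spread, bacterial inactivation, or immune response models

# Papers that were systematically reviewed.
reviewed = relevant - excluded

print(f"core collection: {core_collection:,}")
print(f"reviewed papers: {reviewed}")
```

The two derived values agree with the 6,524 items and 44 papers stated in the text.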
Citation: Corley CD, Pullum LL, Hartley DM, Benedum C, Noonan C, et al. (2014) Disease Prediction Models and Operational Readiness. PLoS ONE 9(3): e91989.
Editor: Niko Speybroeck, Université Catholique de Louvain, Belgium
Received November 2, 2012; Accepted February 19, 2014; Published March 19, 2014
Copyright: © 2014 Corley et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits
unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: This study was supported through a contract to Pacific Northwest National Laboratory from the National Biosurveillance Integration Center, Office of
Health Affairs, and the Science and Technology Directorate, Chemical and Biological Division, Threat Characterization and Attribution Branch, of the U.S.
Department of Homeland Security (DHS). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing Interests: The authors have declared that no competing interests exist.
* E-mail: [email protected]
Table 1. Sample of keywords and phrases used.
Biosurveillance | Disease forecast | Infectious disease surveillance | Remote sensing + disease forecast
Bioterror* and model | Disease outbreak origin | Pathogen detection | Spatial disease model
CBRN model* | Epidemic model* | Population dynamic + outbreak | Vector-borne disease model
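Several of the sample search terms end in an asterisk (e.g., Bioterror*, Epidemic model*). The sketch below shows one way such terms could be matched against an abstract. It assumes the asterisk is a truncation wildcard, as in many bibliographic databases; the paper does not specify the syntax of each database, and the function names are hypothetical. Combined terms that use "+" or "and" are not handled here.

```python
import re


def term_to_pattern(term):
    """Compile a search term into a case-insensitive regular expression.

    Assumption: "*" is a truncation wildcard that matches any run of
    word characters (so "model*" matches "models" and "modeling").
    """
    pieces = [re.escape(piece) for piece in term.split("*")]
    return re.compile(r"\b" + r"\w*".join(pieces) + r"\b", re.IGNORECASE)


def matches(term, text):
    """Return True if the search term occurs anywhere in the text."""
    return term_to_pattern(term).search(text) is not None


# Hypothetical abstract text, used only to illustrate the matching.
abstract = "We compare epidemic models for a simulated bioterrorism release."
for term in ["Bioterror*", "Epidemic model*", "CBRN model*"]:
    print(term, matches(term, abstract))
```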
pathogens (e.g., Aphthae epizooticae), and plant pathogens (e.g., soybean and wheat rusts). Examples of evidence of condition include accidental or deliberate events affecting air or water quality (e.g., volcanic ash, pesticide runoff), economically motivated adulteration of the food and pharmaceutical supply, and intentional exposure. In the context of this article, a biosurveillance model is broadly defined as an abstract computational, algorithmic, statistical, or mathematical representation that produces informative output related to event detection or event risk [23]. The model is formulated with a priori knowledge and may ingest, process, and analyze data. A biosurveillance model may be proactive or anticipatory (e.g., used to detect or forecast an event, respectively), it may assess risk, or it may be descriptive (e.g., used to understand the dynamics or drivers of an event) [23].
There also is a true lack of implementation of such models in routine surveillance and control activities; as a result, there is no active effort to build and improve capacity for such model implementation in the future [24–27]. When it comes to emerging infectious disease events, or the intentional or accidental release of a bioterrorism agent, most such pathogens are zoonotic (transmitted from animal to human) in origin [28–30]. Therefore, in assessing disease prediction models for biosurveillance preparedness, it is reasonable to include a focus on agents of zoonotic origin that could arise from wildlife or domestic animal populations or could affect such animal populations concurrently with human populations [31]. To date, the development of surveillance systems for tracking disease events in animals and humans has arisen largely in isolation, leading to calls for better integration of human and animal disease surveillance data streams [32], to better prepare for emerging and existing disease threats. Recent reports have shown some utility for such linkage [26,33].
Two critical characteristics differentiate this work from other infectious disease modeling systematic reviews (e.g., [34–38]). First, we reviewed models that attempted to predict or forecast the disease event (not simply predict transmission dynamics). Second, we considered models involving pathogens of concern as determined by the U.S. National Select Agent Registry as of June 2011 (http://www.selectagents.gov).
Methods
Subject matter experts were asked to supply keywords and phrases salient to the research topic. A sample of keywords and phrases used is shown in Table 1. Multiple searches were conducted in bibliographic databases covering the broad areas of medicine, physical and life sciences, the physical environment, government and security. There were no restrictions placed on publication date or language of publication. Abstracts and citations of journal articles, books, books in a series, book sections or chapters, edited books, theses and dissertations, conference proceedings and abstracts, and technical reports containing the keywords and phrases were reviewed. The publication dates of the search results returned are bounded by the dates of coverage of each database and the date on which the search was performed; however, all searching was completed by December 31, 2010. The database queries returned 12,152 citations, including irrelevant citations on topics such as sexually transmitted diseases, cancer and diabetes. We de-duplicated the citations and removed extraneous studies, resulting in a collection of 6,503 publications. We also collected 13,767 web documents based on Google queries, often referred to as Google harvesting. We down-selected the web documents for theses and dissertations, reducing this number to 21. Citations not relevant to the study of select agents, such as those on sexually transmitted diseases, cancer and diabetes, were identified and removed, leaving 6,524 documents. See Checklist S1 for a list of information sources used in this study.
Next, we filtered citations by hand based upon the definition of a biosurveillance model presented in the introduction and for select agents, which resulted in 117 curated papers. Of these 117 papers, 54 were considered relevant to the study based on our selection criteria; however, 10 of these dealt purely with disease spread models, inactivation of bacteria, or the modeling of human immune system responses to pathogens. As a result, we systematically reviewed 44 papers and the results are presented in this analysis. See Figure S1 for a graphic summary of the data reduction methodology and Checklist S1 for the PRISMA guidelines used for the evaluation of the 44 papers. To enable real-time collaboration and sharing of the literature, the citations were exported to the Biosurveillance Model Catalog housed at http://BioCat.pnnl.gov.
The models in the selected publications were classified in the categories listed below. These categories are not mutually exclusive, and publications involving multiple modeling and analytic approaches were assigned to multiple categories.
Risk Assessment models correlate risk factors for a specific location based upon weather and other covariates to calculate disease risk, similar to a forest fire warning. This type of model is commonly referred to as ecological niche modeling or disease risk mapping [39–41].
Event Prediction models assign a probability for when and where the disease event is likely to occur based upon specific data sources and variables. The difference between event prediction and risk assessment is the end product of the model: in the former, the output is the location and time period in which a disease outbreak will occur, while the risk assessment model provides the risk of an outbreak occurring under specified conditions [42–44].
Spatial models forecast the geographic spread of a disease after it occurs based upon the relationship between the outbreak and primarily geospatial factors. It should be noted that spatial models can be considered dynamical models in that they change in time, e.g., spatial patch models [45–47].
Dynamical models examine how a specific disease moves through a population. These models may include parameters, such
as movement restrictions, that have the effect of interventions on the severity of an epidemic or epizootic. These models may be used to predict and understand the dynamics of how a disease will spread through a naïve population or when the pathogenicity will change [48,49].
Event Detection models attempt to identify outbreaks either through sentinel groups or through the collection of real-time diagnostic, clinical, or syndromic data and to detect spikes in signs, symptoms or syndromes that are indicative of an event (e.g., event-based biosurveillance) [50,51].
The disease agents examined in this study were taken from the U.S. National Select Agent Registry and include human, plant, and animal pathogens. The agents described within these models are grouped non-exclusively by their mode of transmission: direct contact, vector-borne, water- or soil-borne and non-specific.
Next, we analyzed the data sources in order to find ways to improve operational use of biosurveillance models. These non-mutually exclusive data source categories were: "Epidemiological Data from the Same Location"; "Epidemiological Data from a Different Location"; "Governmental and Non-Governmental Organizations"; "Satellite (Remote Sensing)"; "Simulated"; "Laboratory Diagnostic"; "Expert Opinion"; and "Literature." If a paper cited any form of literature that was not epidemiological, weather, or population data, it was categorized within the literature group. An example of this is references to the preferred natural habitat or survival requirements for a disease agent. Papers that cited epidemiological data from a location independent of the validation data were grouped as "Epidemiological Data from a Different Location," "Simulated Data," and "Experimental Data." "Expert Opinion" sources did not explicitly state from whom or what type of data was used. In addition to the model data sources, twelve non-mutually exclusive variable categories were identified to facilitate understanding of how these models could be used effectively by the research and operational communities. Models with variables describing location or distance and rainfall or temperature were categorized as "Geospatial" and "Climatic," respectively. Models that took into account the epidemiological (population-level) characteristics of the disease were grouped together as "Epidemiological." Variables that dealt specifically with the agent or etiology were categorized under "Etiology." Population size, density, and other related variables were grouped into either "Affected Population" (i.e., the animal, plant, or human population affected by the disease) or "Vectors and Other Populations" (i.e., populations of the vector or any other population that may be considered within the model but that was not affected by the disease). Models that utilized remote sensing data such as the "normalized difference vegetation index" (NDVI), a measurement used to determine the amount of living green vegetation in a targeted area, were grouped within "Satellite (Remote Sensing)." "Agricultural" techniques, such as tillage systems, were also identified as variables in some models, as were "Clinical" and "Temporal" variables. The final two variable types identified were "Topographic and Environmental," such as altitude or forest type, and "Social, Cultural, and Behavioral," which included religious affiliations and education.
There are many verification and validation (V&V) standards (e.g., ISO/IEC 15288-2008 [52], IEEE Std 1012-2012 [53], ISO/IEEE 12207 [54]) and definitions, including some that are specifically focused on modeling and simulation: NASA-STD-7009 [55], Verification, Validation, and Accreditation Recommended Practices Guide from the U.S. Department of Defense (U.S. DoD) Modeling & Simulation Coordination Office [56], U.S. Army TRADOC Reg 5-11 [57], U.S. Navy Best Practices Guide for Verification, Validation, and Accreditation of Legacy Modeling and Simulation [58], and U.S. DoD MIL-STD-3022 [59]. For instance, the U.S. DoD definition of verification for modeling and simulation is "the process of determining that a model implementation and its associated data accurately represent the developer's conceptual description and specifications" [56]. The U.S. DoD definition of validation for modeling and simulation is "the process of determining the degree to which a model and its associated data provide an accurate representation of the real world from the perspective of the intended uses of the model" [56]. In the words of Boehm, verification answers the question "Did we build the system right?" and validation answers, "Did we build the right system?" [60]. Further, the "official certification that a model, simulation, or federation of models and simulations and its associated data is acceptable for use for a specific purpose" is its accreditation [56], which answers the question of whether the model/simulation is credible enough to be used.
All models were classified as either a) having undergone Some V&V method, or b) No V&V, based only on the paper(s) cited for that model. Those models classified as having undergone Some V&V were further classified based upon the type of V&V method(s) applied to these models. The V&V method classifications used were "Statistical Verification"; "Sensitivity Analysis (verification)"; "Specificity and Sensitivity (verification)"; "Verification using Training Data"; "Validation using Temporally Independent Data"; and "Validation using Spatially and Temporally Independent Data." In general, no conclusions on model credibility can be based on the types of V&V methods used, given that a) none of the papers were focused on the model V&V, and b) seldom are all aspects of V&V reported upon in the types of papers surveyed. The most frequently used verification method is some form of statistical verification. It is important to note that verification methods do not necessarily imply that a model is correct. In this type of verification, methods such as Kappa (used to assess the degree to which two or more persons, examining the same data, agree on the assignment of data to categories), area under the receiver operating characteristic (ROC) curve, goodness of fit, and other statistical values are examined to help measure the ability of the model to accurately describe or predict the outbreak. Several models plotted observed data against predicted data as a V&V technique. This technique was further delineated, depending on whether the observed data were part of the model's training data (verification), temporally independent of the training data (validation), or temporally and spatially independent of the training data (validation). The remaining models applied verification methods such as sensitivity analysis, which examined whether a model functioned as it was believed to when different values were input into important variables, or specificity and sensitivity metrics, which measure the ability to determine true positives and negatives. We acknowledge that not all of these V&V techniques are applicable to every model type. Also note that the use of a verification or validation method does not constitute complete verification or validation of the model. For instance, the IEEE standard for software verification and validation (IEEE Std 1012-2005) includes five V&V processes, supported by ten V&V activities, in turn implemented by 79 V&V tasks. To put this in the perspective of the study, the V&V methods noted herein are at or below the level of task. Assessment of inherent biases present within these source documents and models reviewed is beyond the scope of this study.
Results and Analysis
The publications' models were categorized as follows (see Table 2): event prediction (n = 4), spatial (n = 26), ecological niche
(n = 28), diagnostic or clinical (n = 6), spread or response (n = 9), and reviews (n = 3). The event prediction type includes only four models, possibly explained by the difficulty of creating a model that truly predicts disease events. In general, these models were applied to (or involved) small or special populations (e.g., populations with chronic diseases). According to Favier et al., the lack of prediction models could be addressed by taking a "toy model" and creating a predictive model [61]. If models that are similar to predictive models, such as risk assessment, could be modified into such, the number of predictive models could be increased.
Table 2. Categorization of the publications' models.
Model Type | Citations | Number of Models
Dynamical | [74–82] | 9
Event Detection | [80,83–87] | 6
Event Prediction | [74,88–90] | 4
Review Articles | [91–93] | 3*
Risk Assessment | [21,62,75–78,88,91,94–107] | 28
Spatial | [61,74,75,78,79,83–85,89,94–99,108,109] | 26
Transmission Mode
The transmission modes of the models' disease agents spanned the following: direct contact (n = 24), vector-borne (n = 15), water- or soil-borne (n = 7), and non-specific (n = 3) (see Table 3). Direct contact and vector-borne models accounted for approximately 84% of all of the evaluated models.
Data Sources and Variables
The data sources (e.g., remote sensing, non-governmental organizations, expert opinion, epidemiological) and variable parameters (e.g., etiology, climatic, spatial, cultural) for each model were recorded and reviewed (see Table 4). The two categories that contained the most data sources were "Epidemiological Data from the Same Location" (n = 25), such as a previous outbreak, and data gathered from an organization, such as census data. Thirty-two models used some type of "Literature" (n = 14); an important fact is that the majority of data used in the models were scientifically measured.
Categories of variables and parameters utilized in the models supplemented the data sources. The two largest groupings were "Geospatial" and "Climatic" variables. According to Eisen et al. [62], models that do not use epidemiological data produce results with lower confidence such that users may not trust the results or may not trust that the findings are relevant. Similarly, users may not have faith that models are structured in a biologically meaningful way if biologic or epidemiologic data do not appear in a model [63]. Nonetheless, before incorporating epidemiological data in disease event prediction models, further research is needed to determine whether such data will increase the model's robustness, sensitivity, and specificity. Factors such as accuracy and precision of epidemiological data will influence this analysis.
To better understand the relationship between the variables and the disease agent's mode of transmission, a graph (Figure 1) was created to show the distribution of different modes of transmission cited for each variable type used in the evaluated models. Table 5 shows the distribution of citations for each variable type. It was noted without surprise that, as more research was done on a mode of transmission, more variables were examined. Furthermore, the variables "Vectors or Other Populations" and "Social, Cultural, Behavioral" were underutilized in the evaluated models. This is unfortunate because these variables typically have a seasonal abundance pattern. Further, human socio-cultural behaviors greatly impact the interactions between human and vector populations, and seasonal meteorological variation can strongly affect vector abundance and competence [64]. Relatively few disease prediction models were identified in which the causative agent was water- or soil-borne [20,65].
Verification and Validation Methods
The V&V methods applied to each model, if any, were also analyzed; see Table 6. Among the types of papers surveyed, few aspects of V&V are typically reported. The majority of models selected for this study were subjected to some method of verification or validation. Publications on many applications of predictive models typically state statistical, sensitivity analysis, and training data test results. These are necessary, though insufficient, methods
to determine the credibility, verification or validation of a model. For instance, the IEEE standard for software verification and validation (IEEE Std 1012-2005) includes five V&V processes, supported by ten V&V activities, which are in turn implemented by 79 V&V tasks. To put this into perspective, the V&V methods noted herein are at or below the level of task. The papers reported the use of V&V methods for many models but not for others, and for the latter case it is unclear whether V&V methods were not used or merely unreported. Another positive observation is the significant use of real epidemiological data to examine aspects of model validity. Even though "Validation using Spatially and Temporally Independent Data" was used for one of the smallest sets of models, use of actual data versus predicted data for validation tests was reported for approximately 33% of the models. The reader is encouraged to understand that the use of a verification or validation method does not constitute complete verification or validation of the model [66–68].
Table footnote: If a model involved multiple agents in different categories, the paper was placed in multiple groups.
Operational Readiness
Given the importance of these models to national and international health security [69], we note the importance of a categorization scheme that defines a model's viability for use in an operational setting. To our knowledge, none exists, but below we illustrate one possibility, based upon the "technology readiness level" (TRL), originally defined by NASA [70] to evaluate the technology readiness of space development programs. It is important to note that NASA TRLs were not developed to cover modeling and simulation, much less biosurveillance models, so the definitions require modification. In the public health domain, TRLs can assist decision makers in understanding the operational
readiness level, maturity and utility of a disease event or prediction model. Advantages of utilizing the TRL paradigm are that it can provide a common understanding of biosurveillance model maturity, inform risk management, support decision making concerning government funded research and technology investments, and support decisions concerning transition of technology. We also point out the characteristics of TRLs that may limit their utility: the operational readiness of a model does not necessarily correspond to technology maturity (V&V); a mature disease prediction or forecasting model may possess a greater or lesser degree of readiness for use in a particular geographic region than one of lower maturity; and numerous additional factors must be considered, including the relevance of the models' operational environment, the cost, technological accessibility, sustainability,
"Operational readiness" is a concept that is user and intended use dependent. A model that one user may consider ready may not suffice for readiness with another user. Different users have different needs according to their missions. For example, in the case of surveillance models, some will need to see everything reported by event-based surveillance systems (i.e., they are unconcerned with specificity but sensitivity is of high value to them), while other users may demand low false alarm rates (i.e., specificity is important for their needs) [71,72]. The Operational Readiness Level rating of any given model will thus depend upon the diverse questions and purposes to which any given model is applied.
An initial scheme modifying these definitions is shown in Table 7. In such a scheme, the models would be characterized based on how the model was validated, what type of data was used to validate the model, and the validity of data used to create the model. The V&V of predictive models, regardless of realm of application, is an area that requires better definition and techniques. The results of model V&V can be used in the definition of model operational readiness; however, the readiness level definitions must also be accompanied by data validation, uncertainty quantification, and model fitness for use evaluations, many of which are areas of active research [73].
Figure 1. The Percentage of Citations Placed in Each Variable Group by Transmission Mode (if a model contained variables from multiple groups, it was placed in each respective group).
Discussion
Our study was conducted to characterize published select-agent pathogen models that are capable of predicting disease events in order to determine opportunities for expanded research and to define operational readiness levels [38]. Out of an initial collection of 6,524 items, 44 papers met inclusion criteria and were systematically reviewed. Models were classified as one or more of the following: event prediction, spatial, ecological niche,
Table 6. V&V methods applied to each model.
V&V Method | Citations | Number of Models
No V&V | [61,78,86,92,109] | 5
Sensitivity Analysis (verification) | [79–82,94,99,112,113] | 8
Specificity and Sensitivity (verification) | [1,75,84,95] | 4
Statistical Verification | [21,75,77,79,82,83,88,94–97,99–106,108,115] | 21
Validation using Spatially and Temporally Independent Data | [79,90] | 2
Validation using Temporally Independent Data | [84,88,97,102,103,111] | 6
Verification using Training Data | [21,75,81,85,89,97,104,105,107,112,115] | 11
If a model used multiple methods for its verification or validation, it was categorized in each respective group.
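The statistical verification measures named in this review (sensitivity, specificity, Kappa, and area under the ROC curve) can all be computed from paired observed and predicted outcomes. The following is a minimal sketch, not the procedure of any reviewed paper. It assumes binary outcomes (1 = disease event, 0 = no event), treats the model and the observed record as the two "raters" for Kappa, and uses made-up data and hypothetical function names. Both outcome classes must be present to avoid division by zero.

```python
def confusion_counts(observed, predicted):
    """Return (tp, fp, tn, fn) for paired binary labels (1 = event)."""
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)
    return tp, fp, tn, fn


def sensitivity(observed, predicted):
    """Fraction of observed events that the model predicted."""
    tp, _, _, fn = confusion_counts(observed, predicted)
    return tp / (tp + fn)


def specificity(observed, predicted):
    """Fraction of observed non-events that the model predicted."""
    _, fp, tn, _ = confusion_counts(observed, predicted)
    return tn / (tn + fp)


def cohen_kappa(observed, predicted):
    """Agreement between observation and prediction, corrected for chance."""
    tp, fp, tn, fn = confusion_counts(observed, predicted)
    n = tp + fp + tn + fn
    agreement = (tp + tn) / n
    chance = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n ** 2
    return (agreement - chance) / (1 - chance)


def roc_auc(observed, scores):
    """Probability that a random event outscores a random non-event."""
    pos = [s for o, s in zip(observed, scores) if o == 1]
    neg = [s for o, s in zip(observed, scores) if o == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical data: observed outcomes and model risk scores for 8 sites.
observed = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]
predicted = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 cutoff

print(sensitivity(observed, predicted), specificity(observed, predicted))
print(cohen_kappa(observed, predicted), roc_auc(observed, scores))
```

For these made-up values the sketch gives a sensitivity and specificity of 0.75, a Kappa of 0.5, and an AUC of 0.9375. Whether such a figure counts as verification or validation depends, as the text notes, on whether the observed data were part of the training data or independent of it.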
Table 7. Initial Definitions of Operational Readiness Levels for Disease Prediction Models.
Level Definition
