
ARTICLE

DOI: 10.1038/s41467-017-00923-8 OPEN

Global hotspots and correlates of emerging zoonotic diseases
Toph Allen1, Kris A. Murray2,3, Carlos Zambrana-Torrelio1, Stephen S. Morse4, Carlo Rondinini5,
Moreno Di Marco6,7, Nathan Breit1, Kevin J. Olival1 & Peter Daszak1

Zoonoses originating from wildlife represent a significant threat to global health, security and
economic growth, and combatting their emergence is a public health priority. However, our
understanding of the mechanisms underlying their emergence remains rudimentary. Here we
update a global database of emerging infectious disease (EID) events, create a novel measure
of reporting effort, and fit boosted regression tree models to analyze the demographic,
environmental and biological correlates of their occurrence. After accounting for reporting
effort, we show that zoonotic EID risk is elevated in forested tropical regions experiencing
land-use changes and where wildlife biodiversity (mammal species richness) is high. We
present a new global hotspot map of spatial variation in our zoonotic EID risk index, and
partial dependence plots illustrating relationships between events and predictors. Our results
may help to improve surveillance and long-term EID monitoring programs, and design field
experiments to test underlying mechanisms of zoonotic disease emergence.

1 EcoHealth Alliance, 460 West 34th Street, 17th Floor, New York, NY 10001, USA. 2 Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, St Mary’s Campus, Norfolk Place, London W2 1PG, UK. 3 Grantham Institute – Climate Change and the Environment, Imperial College London, Exhibition Road, London SW7 2AZ, UK. 4 Mailman School of Public Health, Columbia University, 722 West 168th St #1504, New York, NY 10032, USA. 5 Global Mammal Assessment Program, Department of Biology and Biotechnologies, Sapienza University of Rome, Viale dell’Università 32, 00185 Rome, Italy. 6 ARC Centre of Excellence for Environmental Decisions, Centre for Biodiversity and Conservation Science, University of Queensland, St Lucia, QLD 4072, Australia. 7 School of Earth and Environmental Sciences, The University of Queensland, St Lucia, QLD 4072, Australia. Correspondence and requests for materials should be addressed to P.D. (email: [email protected])

NATURE COMMUNICATIONS | 8: 1124 | DOI: 10.1038/s41467-017-00923-8 | www.nature.com/naturecommunications 1


ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/s41467-017-00923-8

Emerging infectious diseases (EIDs) are a significant and growing threat to global health, global economy and global security1, 2. Analyses of their trends suggest that their frequency and economic impact are on the rise3, 4, yet our understanding of the causes of disease emergence is incomplete. The majority of EIDs (and almost all recent pandemics) originate in animals, mostly wildlife, and their emergence often involves dynamic interactions among populations of wildlife, livestock, and people within rapidly changing environments5–7. The mechanisms underlying this process are likely complex, and occur in contexts that are often characterized by a paucity of systematically collected data8.

Global efforts to reduce the impacts of emerging diseases are largely focused on post-emergence outbreak control, quarantine, drug, and vaccine development3. However, delays in detection of or response to newly emerged pathogens, combined with increased global urbanization and connectivity, have resulted in recent EIDs causing extensive mortality across cultural, political, and national boundaries (e.g., HIV), and disproportionately high economic damages (e.g., SARS, H1N1). Efforts to identify the origins and causes of disease emergence at local scales, and regions from which novel diseases may be more likely to emerge, are valuable for focusing surveillance, prevention, and control programs earlier in the chain of emergence, containing EIDs closer to their source, and more effectively limiting their subsequent spread and socioeconomic impacts8.

A previous analysis of global EID trends modeled the spatial variation of “EID events”, representing records of the first appearance of a pathogen in a human population related to increased distribution (e.g., new geographic location, new host species), incidence, virulence, or other factors4. The EID events were divided into four groups, including wildlife origin zoonoses4. To model the potential risk of disease emergence, these four groups were regressed as a function of human population density and growth, latitude, rainfall, and wildlife species richness. The results suggest that wildlife origin EIDs are more likely to occur in regions with higher human population density and greater wildlife diversity (mammal species richness)8. However, the study is limited in its mechanistic inference due, in part, to the lack of specificity of the predictors. For example, the effect of population density could represent anthropogenic environmental changes (human pressure on landscapes), human-animal contact rates, reporting biases, or a combination of these. Furthermore, a range of potential mechanisms may not be adequately represented by this predictor set; a lack of an effect of rainfall, for example, does not discount the potential for other climatic factors to play a role, and a lack of an effect of latitude could mean that it is simply a poor proxy for other more meaningful factors that nevertheless exhibit some latitudinal variation (e.g., temperature, habitat types, biodiversity, and GDP). Improving the predictor set to better target underlying mechanisms could improve model performance and our ability to explain spatial variation in EID risk.

[Fig. 1 image: box plots of relative influence (%, axis 0 to 40) for each predictor, colored by group (Human activity, Animals, Environment). Predictors, top to bottom: evergreen broadleaf trees; population; global environmental stratification; mammal biodiversity; cultivated/managed vegetation; pasture change; pasture; evergreen/deciduous needleleaf trees; livestock mammal headcount; cropland change; regularly flooded vegetation; population change; cropland; shrubs; urban/built-up; poultry; deciduous broadleaf trees; herbaceous vegetation; mixed/other trees.]

Fig. 1 The relative influence of predictors on EID event occurrence probability. The box plots show the spread of relative influence across 1000 replicate model runs to account for uncertainty in EID event location (see Methods). Whiskers represent the minimum or maximum datum up to 1.5 times the interquartile range beyond the lower or upper quartile. BRTs do not provide p-values or coefficients, but rank variables by their relative influence in explaining variation in the outcome26.
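As a hedged illustration of how the box plots in Fig. 1 could be summarized, the sketch below takes one set of raw per-predictor influence scores from each replicate model run, rescales each set so it sums to 100%, and then computes the median, quartiles and Tukey-style whiskers (the most extreme datum within 1.5 times the interquartile range beyond the quartiles) for each predictor. The input format, the function names (`normalize_influence`, `box_summary`, `summarize_replicates`) and the numbers are illustrative assumptions, not the authors' code. The raw scores are taken as given; in gbm-style BRTs they are typically the summed reduction in squared error attributed to splits on each predictor.

```python
import statistics

def normalize_influence(raw):
    """Rescale one run's raw influence scores so they sum to 100 (percent)."""
    total = sum(raw.values())
    return {name: 100.0 * value / total for name, value in raw.items()}

def box_summary(values):
    """Median, quartiles and Tukey whiskers for one predictor across runs."""
    # "inclusive" treats the sample min/max as the 0th/100th percentiles
    # and interpolates linearly between sorted data points.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers stop at the most extreme observed values inside the fences.
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {"median": median, "q1": q1, "q3": q3,
            "whisker_low": min(inside), "whisker_high": max(inside)}

def summarize_replicates(replicates):
    """replicates: list of {predictor: raw influence} dicts, one per model run."""
    percents = [normalize_influence(run) for run in replicates]
    predictors = percents[0].keys()
    return {p: box_summary([run[p] for run in percents]) for p in predictors}

# Hypothetical raw scores from three replicate runs (illustrative numbers only).
runs = [
    {"evergreen_broadleaf": 8.0, "population": 7.0, "mammal_richness": 5.0},
    {"evergreen_broadleaf": 6.0, "population": 8.0, "mammal_richness": 6.0},
    {"evergreen_broadleaf": 9.0, "population": 5.0, "mammal_richness": 6.0},
]
summary = summarize_replicates(runs)
```

With real BRT output the list would hold one entry per replicate run (1000 in the caption above). Quartile conventions differ between software packages, so whisker limits computed this way may differ slightly from those drawn by a particular plotting library.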

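The partial dependence plots referred to in the abstract and in Fig. 2 can be illustrated with a toy example. The sketch below computes a one-variable partial dependence curve in a common averaging form: each value on a grid is substituted for the chosen predictor in every row of the data, the other predictors keep their observed values, and the model's predictions are averaged. The grid spans the 10th to 90th percentiles of the predictor, mirroring the X axes described for Fig. 2. The prediction function `toy_model`, the column names and the data are hypothetical stand-ins for a fitted BRT and the sampled grid cells, and the sketch does not reproduce the bootstrap confidence intervals.

```python
import statistics

def percentile_grid(values, n_points=5, lo_pct=10, hi_pct=90):
    """Evenly spaced grid from the lo_pct-th to the hi_pct-th percentile of values."""
    # n=100 returns 99 cut points; cuts[k - 1] is the k-th percentile.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    start, stop = cuts[lo_pct - 1], cuts[hi_pct - 1]
    step = (stop - start) / (n_points - 1)
    return [start + i * step for i in range(n_points)]

def partial_dependence(predict, rows, feature, grid):
    """Mean prediction with `feature` fixed at each grid value in every row."""
    curve = []
    for value in grid:
        # Copy each row, overwrite only the chosen predictor, keep the others as observed.
        preds = [predict({**row, feature: value}) for row in rows]
        curve.append(statistics.fmean(preds))
    return curve

def toy_model(row):
    """Hypothetical stand-in for a fitted BRT; output is clipped to [0, 1]."""
    score = 0.4 + 0.02 * row["forest"] - 0.01 * row["density"]
    return min(max(score, 0.0), 1.0)

rows = [{"forest": f, "density": d} for f, d in [(0, 0), (5, 10), (10, 20)]]
grid = percentile_grid([r["forest"] for r in rows], n_points=3)
curve = partial_dependence(toy_model, rows, "forest", grid)
```

Because the toy model is additive, the curve simply rises linearly with the forest value; with a fitted BRT the same averaging step yields the mostly non-linear shapes described for Fig. 2.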

The current study aims to better analyze the mechanistic underpinnings of disease emergence for zoonotic EIDs of wildlife origin, while addressing some methodological limitations of Jones et al.4 We focus on EIDs of wildlife origin, which are responsible for nearly all recent pandemics (e.g., Ebola, MERS), constitute the majority of the high impact EIDs from the last few decades, and are a significantly growing proportion of all EIDs combined4. We updated the EID database from ref. 4, and employed a new modeling framework (boosted regression trees, BRT) to capture high-dimensional interactions and generate response functions for individual variables. We selected a refined set of spatial predictors for their relevance to a priori hypotheses on plausible mechanisms underlying zoonotic EID emergence, including proxies for human activity, environmental factors, and the zoonotic pathogen pool from which novel diseases could emerge, all key features of conceptual models of zoonotic spillover7–11. We used an improved data set of mammal species distributions12, and included numerous data sets on measures of land use, land-use change and land cover. Furthermore, all data sets with sufficient temporal coverage were matched to events in the EID database by decade, such that covariates more accurately reflect the prevailing conditions at the time of disease emergence. We also constructed a novel proxy of reporting effort to match the spatial resolution of the other predictors, where previous studies have relied on coarse, country-level measures, and compared EID risk predictions with and without corrections for reporting effort. Finally, we accounted for spatial uncertainty in EID event data by random resampling to explicitly take into account the difficulties of accurately geocoding EID events.

Our results suggest that EID events are best predicted by the distribution of tropical forested regions, higher mammalian species richness, and variables relating to shifts in agricultural land use, and appear to occur more often in tropical regions. We identify specific areas and approaches where a research focus may identify more specific trends not apparent in our data.

[Fig. 2 image: partial dependence plots, one panel per predictor in order of relative influence: evergreen broadleaf trees; population (/1,000,000); global environmental stratification; mammal biodiversity; cultivated/managed vegetation; pasture change; pasture; evergreen/deciduous needleleaf trees; livestock mammal headcount (/100,000); cropland change; regularly flooded vegetation; population change; cropland; shrubs; urban/built-up; poultry; deciduous broadleaf trees; herbaceous vegetation; mixed/other trees. X axes: value of predictor; Y axes: EID event risk index (and 90% CI), 0.40 to 0.60; colors by group (Human activity, Animals, Environment).]

Fig. 2 Partial dependence plots showing the influence on zoonotic EID events for all predictors in the weighted boosted regression tree model, ordered by relative influence. X axes show the range from the 10th to 90th percentiles of sampled values of predictors (e.g., number of mammal species per grid square for mammalian richness, or proportion of grid cell for a land cover type). Gray bars show histograms of predictor distribution along X axes. Y axes show the effect on the EID event risk index from that variable. Black lines show the median and colored areas show the 90% confidence intervals, computed using a bootstrap resampling regime incorporating uncertainty in EID event locations. The overall prevalence of our outcome, which indexes EID event risk, is fixed by the resampling regime between 0 and 1, with a mean at 0.5. Y axes are centered around the mean and scaled to 0.1 above and below. Partial dependence plots display the response for an individual variable in the model while holding all other variables constant26, 61. They allow a visualization of what are mostly non-linear relationships between drivers and the EID event risk index (in this case, after reporting effort is factored out). See Supplementary Note 3 for results of the model unweighted by reporting effort.

[Fig. 3 image: panels a and b, world heat maps of predicted relative risk, color scale from Low to High.]

Fig. 3 Heat maps of predicted relative risk distribution of zoonotic EID events. a shows the predicted distribution of new events being observed (weighted model output with current reporting effort); b shows the estimated risk of event locations after factoring out reporting bias (weighted model output reweighted by population). See Fig. 4 for raw weighted model output. Maps were created using standard deviation scaling, with the color palette scaled to 2.5 s.d. above and below the mean.

Results

Variables in boosted regression tree models. After factoring out reporting effort (in the weighted model), evergreen broadleaf trees (median 7.6% of the model’s predictive power), human population density (6.9%), Global Environmental Stratification (climate) (5.9%), and mammal species richness (an aspect of biodiversity) (5.6%) had the largest relative influence over the distribution of EID events (Fig. 1). Across 1000 iterations of the model, no variables consistently emerged as much stronger predictors than others, but an average ranking of predictor importance could be derived. Of the top predictors, evergreen broadleaf trees (representing tropical rainforests) exhibited an overall positive trend, human population density an overall negative trend, the Global Environmental Stratification (climate)

an idiosyncratic trend towards warmer and wetter (i.e., more tropical) climates, and mammal species richness showed an idiosyncratic trend, with higher risk values at lower and particularly higher richness values (Fig. 2). After mammal species richness, three variables involving agricultural practices followed in importance: cultivated/managed vegetation (5.6%), pasture change (5.2%), and areas dedicated to pasture (5.1%). In the unweighted model, which did not account for reporting effort (Supplementary Note 3), urban/built-up land was by far the strongest predictor of observed events, explaining a median of 30.6% of the model’s variation and exhibiting a distinct positive trend.

Global distribution of EID risk index. Relative to the observed risk index for EID events, the model’s estimated risk index correcting for reporting bias (Fig. 3) is more concentrated in tropical regions. Areas of higher suitability for EID occurrence are fairly evenly distributed across the continents, with no major land mass free from areas predicted to be suitable for EIDs. In particular, areas of high population outside the tropics, such as cities in Europe, the United States, Asia and Latin America, remain among areas at the high end of the risk index. Tropical regions in North America, Asia, Central Africa, and regions of South America have more extensive areas of predicted EID occurrence.

[Fig. 4 image: world heat map; color scale “Event probability (relative to reporting effort)”, 0.3 to 0.7.]

Fig. 4 Heat map of weighted model response, i.e., EID risk relative to reporting effort. Value indicates the binomial probability that a grid cell sampled at that location will contain an EID event as opposed to a background sample, when drawing equal numbers of absence and background samples weighted by reporting effort (see Methods section). This layer was weighted by reporting effort to produce the “observed” EID risk index map (Fig. 3a) and by population to produce the risk index map with bias factored out (Fig. 3b).

Model performance and validation statistics. Our model validation statistics were computed both for the weighted model (with a background, or absence, sample weighted by reporting effort, effectively computing statistics on the residuals of that variable) and for our unweighted model, using a background sample uniform across land area. The weighted bootstrap model reported a median of 31.6% of deviance explained across the 1000 replicate models (empirical 90% confidence interval (CI) 15.9% to 50.5%), whereas the unweighted model explained a median 50.2% of deviance (empirical 90% CI 35.8% to 67.2%). Our weighted model’s cross-validation statistics, computed over 100 runs of 10-fold cross-validation, varied depending on the weighting of the null validation sample. With validation absences weighted by reporting effort, the weighted model had a median AUC of 0.64, with an empirical 90% confidence interval ranging from 0.54 to 0.69 (out of possible values between 0 and 1, with 0.5 indicating performance no better than random). The median True Skill Statistic (TSS) was 0.23 with an empirical 90% CI of 0.14 to 0.33 (out of a range of −1 to 1). These indicate low to moderate predictive performance13–15. Evaluated against an unweighted null, the weighted model had a median AUC of 0.78 (90% CI (0.75, 0.81)) and a median TSS of 0.43 (90% CI (0.37, 0.50)). The unweighted model, evaluated against an unweighted null, had a median AUC of 0.77 (90% CI (0.73, 0.81)) and a median TSS of 0.44 (90% CI (0.37, 0.50)).

Discussion

We developed a spatial model to describe the global spatial patterns of zoonotic EIDs. Our main model (the “weighted model”) factored out clear effects of reporting effort, which otherwise bias our ability to interpret EID event observations. It ranked risk factors according to their predictive power, capturing both their main effects and potential interactions with other variables, and we derived the directionality and shape of their relationships to EID events for graphical interpretation. Our results suggest that the risk of disease emergence is elevated in tropical forest regions, in areas high in mammal biodiversity, and in areas experiencing anthropogenic land use changes related to agricultural practices16–18.

The link between mammal biodiversity and zoonotic disease emergence has been identified previously4 and hypothesized widely8, 19. Areas with tropical forest and high mammalian biodiversity were elevated on our EID risk index (henceforth “EID risk”), although the uncertainty of the estimates was high. It may be that these variables represent the same mechanism, as tropical forests are generally areas of high biodiversity20, and the apparent association may be attenuated by the presence of both in the model. This trend is consistent with existing hypotheses, which suggest that greater host biodiversity increases the “depth” of the pathogen pool from which novel pathogens may emerge, which in turn increases the potential for novel zoonotic pathogens to emerge21. There is a large literature on the relationship between biodiversity and infectious disease risk in people, with some studies suggesting that high host

biodiversity decreases risk or that biodiversity loss may increase risk (i.e., the dilution effect)22, while others refute the generalizability of this23, 24 or suggest that disease richness or prevalence increases with increasing wildlife species richness13. Our findings look at the global scale and a large group of pathogens, and so do not speak directly to this debate: although the dominant trend is an increase in risk of disease emergence with higher mammalian richness, this neither rules out nor substantiates the possibility of a dilution effect for specific diseases. Rather, it is consistent with previous suggestions that the relationship between biodiversity and disease risk is complex, context-specific and idiosyncratic23.

When not accounting for reporting effort (unweighted), our model showed urban land as having a very strong positive association with EID events. However, this can be interpreted as an effect of reporting bias, since (1) urban land was also strongly associated with our measure of reporting effort, and (2) fitting our weighted model, relative to reporting effort, attenuated this effect. Similarly, although population density was not found to be an important predictor in the unweighted model (median relative influence 2.2%), weighting the model by reporting effort drove up its importance (median rel. inf. 6.9%), such that EID risk was inversely related to population density. Population density was also included in the reporting effort model, but was not as strong a predictor (rel. inf. 3.6%) as urban land (rel. inf. 45.2%). Theoretically, population has a baseline multiplicative effect on human disease events25 (of which EID events are a subclass), and the detection of these events is modulated by reporting effort. Reporting effort appears to be associated with urbanization, but reporting effort and urbanization are also both products of human population. We did not attempt to fully disentangle these factors, instead using our measure of reporting effort to present a map of emerging infectious disease hotspots with bias “factored out” (described below in Methods section).

Our reporting effort measure was created by matching place names in a subset of the biomedical literature. The BRT model of reporting effort suggested that the distribution of this effort was strongly and positively related to urban areas. This could be because our extraction of place names biases the outcome toward urban areas, or it may accurately represent the true distribution of reporting toward urban areas, or a combination of the two. In either case, our reporting effort data set is likely to be a large improvement over similar previous studies that have used country-level data to control for heterogeneous reporting effort in spatial analyses of disease risk conducted at finer than country-level resolution4, 25 (detailed fully in Supplementary Methods).

The work presented here builds on previous research4 in a number of important ways to advance our understanding of wildlife origin zoonotic disease emergence. First, our model building approach explores the explanatory value of a large collection of globally gridded data on environmental, demographic, and host diversity variables, including newly developed models of mammal distributions and richness patterns. This has allowed us to close the gap between predictors and a priori mechanistic hypotheses specifically relevant to zoonotic disease emergence from wildlife reservoirs. Second, we adopted a machine-learning modeling approach (boosted regression trees) suited to the analysis of complex ecological data26, and used various resampling regimes to measure and visualize multiple sources of uncertainty (model uncertainty, spatial uncertainty of EID events, and temporal uncertainty of covariates matching with events) and predictive performance. Third, we have attempted to improve how the model accounts for uneven global distribution of surveillance and research on disease event detection (i.e., reporting effort). This includes an algorithm-based approach to more realistically map reporting effort and shows the significant implications that a finer-scale, sub-national resolution variable for reporting effort can have for a model. Finally, we were able to temporally match predictors to events.

Despite using a more flexible modeling framework, there are limitations to our approach. When differentiating between EID events and a uniformly weighted background sample, our weighted and unweighted models had an AUC of 0.78 and 0.77, and a TSS of 0.43 and 0.41, respectively, indicating moderate predictive performance. However, against a background sample weighted by reporting effort, our weighted model had an AUC of 0.61 and a TSS of 0.18, indicating low–moderate performance. These statistics indicate much unexplained variation. While broad changes in zoonotic EID relative risk are evident in the partial dependence plots, in areas of elevated risk CIs are generally wide enough that quantitative relationships remain uncertain.

Wherever possible, we tried to define and incorporate uncertainty into our model (e.g., correcting for uncertainty in location by sampling EID events from within known areas of occurrence, and correcting for literature-level biases by weighting background samples by our measure of observation effort). Multiple factors contribute to this uncertainty. First, analyses were conducted using gridded data at 1° WGS84 resolution (c. 100 km at the equator), the same resolution used previously4. Our choice of resolution for predictor data sets was constrained by data availability, since all were rescaled to the lowest common spatial resolution. Second, CIs are widest in regions for each variable where fewer grid cells were sampled. Since our weighted model sampled grid cells in proportion to reporting effort, these represent areas where more reporting effort, including ground-truthing studies, may increase confidence. Third, another limitation shared with ref. 4 is the underlying accuracy and suitability of EID event data, which were drawn from a review of published literature. Individual studies carry their own biases, inaccuracies, and different approaches to collecting and documenting data, and this alone adds an unknown amount of imprecision and potential bias to our outcome data set. Finally, our goal of creating a single model, to look for common trends in emerging wildlife origin zoonotic diseases, likely imposes limitations on the specificity of trends we can examine. In reality, different classes of diseases (e.g., viruses versus bacteria) and indeed individual diseases have their own unique biology and ecology, with different drivers and sets of conditions being more or less important in shaping the emergence process27. Because of these limitations, we refrain from making specific (e.g., city by city) interpretations of the model’s output, rather noting broad trends in geographic regions and environment types of interest.

Wide confidence intervals in areas of elevated EID risk suggest areas for future study, and underscore the need for targeted long-term disease surveillance and monitoring in these areas. Collection of more accurate spatiotemporal data on events surrounding disease emergence, including initial emergence events, using a combination of large scale field research (e.g., USAID’s PREDICT project28) and digital disease detection tools29 would help alleviate this issue in the future by generating more consistent data on a larger scale, potentially automatically30. These data sets will aid efforts to better define the point at which a disease becomes “emerging”, and allow the programmatic definition and examination of different definitions of emergence (e.g., first appearance vs. increasing incidence) in testable form31.

Future work may be able to enhance the predictive power of this approach by focusing on even tighter classes of disease, taxonomic groups of pathogens and hosts, or transmission modes, and building models to forecast changes in risk

distribution or to examine more specific mechanistic hypotheses. For example, our model includes a single layer representing total mammal species richness, whereas recent work has shown that the number of zoonotic viruses varies across mammal species and taxa32. Efforts to examine the commonalities of disease emergence may benefit from incorporating host-specific or disease-specific models in a hierarchical approach, allowing certain parameters to vary across diseases, disease classes, or other properties.

Despite shortcomings, our improvements to the earlier model allowed us to find quantitative support for previously only hypothesized factors that increase the risk of EID events. Our findings, therefore, have broad implications for surveillance, monitoring, control, and research on emerging infectious diseases. Like Jones et al.4, we find that EID events are observed predominantly in developed countries, where surveillance is strongest, but that our predicted risk is higher in tropical, developing countries.

Our spatial mapping has direct relevance to ongoing surveillance and pathogen discovery efforts (http://www.globalviromeproject.org/). It shows that the global distribution of zoonotic EID risk (and the presence of EID “hotspots”) is concentrated in tropical regions where wildlife biodiversity is high and land-use change is occurring. These regions are likely to be the most cost effective for surveillance programs targeting wildlife, livestock or people for novel zoonoses, and for pandemic prevention programs that build capacity and infrastructure to pre-empt and control outbreaks28. Further honing the EID risk index within regions and countries might also inform the planning of large land-use change programs such as logging and mining concessions, dam-building, and road development33. These activities carry an intrinsic risk of disease emergence by increasing human or livestock contact with wildlife in new regions or by disrupting disease dynamics in reservoir hosts21, 34, and have been repeatedly linked to outbreaks of novel EIDs. Similarly, the partial dependence plots allow a deeper understanding of the largely non-linear relationships between EID drivers and disease emergence that can be used to design field experiments to test specific and generalizable hypotheses on the drivers of zoonotic disease emergence. These should include

approach will provide a way to identify the fine-scale rules that govern disease emergence and provide a richer understanding of what drives EID risk on-the-ground, a critical extension of this modeling approach.

Methods

Zoonotic EID events as response variable. We followed the definition of an emerging infectious disease and an EID event used in ref. 4: specifically, events documented in the scientific literature denoting the first emergence of a pathogen in a human population where that pathogen was classified as “emerging” due to recent spillover from an animal reservoir, a significant increase in its incidence or geographic distribution in the human population, a marked change in its pathogenicity or virulence, or other factors. In this study we focus only on EID events of wildlife origin (“wildlife zoonoses”) because these represent the majority of EID events in the most recent decade studied, are increasing significantly as a proportion of all EIDs after correcting for reporting bias, and include most of the highest impact EIDs of recent decades (e.g., Ebola viruses, Nipah virus) and almost all recent pandemics (e.g., pandemic influenza viruses, SARS). Data on EID events were derived from an updated version of the database originally used by ref. 4 (Supplementary Data 1), which contained EID events ranging from 1940 to 2004 (n = 335 total, n = 145 for wildlife zoonoses (43.3% of all EIDs)). We updated the database to include EID events for wildlife zoonoses through 2008 (n = 224), following the methodology in ref. 4 so as to include only diseases reported in the peer-reviewed literature, where there is evidence that a disease is emerging for one of the reasons laid out above. In addition, we only included the first emergence of a new disease-causing agent, such that the MERS Coronavirus was included, but not reports of new strains of Ebola virus. For each EID event, data were derived from the literature, if available, for date, location (see below), pathogen genus and species, zoonotic origin and type, and associated or hypothesized drivers, following ref. 4. Location data for initial EID emergence events were variable in their geographic specificity, ranging from precise coordinates to broader regions (e.g., municipalities, counties, districts) or entire continents depending on details reported in the primary literature. A spatial polygon was created for each event that represented the most precise municipal region the EID event was known to have occurred in. All EID event polygons, regardless of precision, were included in our bootstrap resampling framework; removing those with geographic uncertainty (e.g., those with only country-level resolution) may artificially inflate the apparent certainty of our model, and our resampling scheme limits their impact to appropriate levels. Events with precise coordinates were also assigned a polygon for consistency of data format, but rather than using a municipal boundary, the event was assigned a 5 km circular buffer zone. EID polygons were subsampled for model fitting as described below. Because our model matches EID events with decadal population and land use data (described below), we restricted our analyses to decades for which covariate data exist, excluding events before 1970 and leaving n = 147 records for analysis (66% of wildlife zoonosis events).

Table 1 List of predictor layers included in the model

Variable | Unit per grid cell | Type | Source data set | Processing | Temporal resolution
Human population | Population | Human activity | GRUMP | Rescaled | Decadal
Population change | Change in population | Human activity | GRUMP (calculated) | Calculated from rescaled layers | Decadal
Cropland | Proportion | Human activity | HYDE | Rescaled | Decadal
Cropland change | Change in proportion | Human activity | HYDE (calculated) | Calculated from rescaled layers | Decadal
Pasture | Proportion | Human activity | HYDE | Rescaled | Decadal
Pasture change | Change in proportion | Human activity | HYDE (calculated) | Calculated from rescaled layers | Decadal
Urban land | Percentage | Human activity | EarthEnv | Rescaled | Decadal
Managed/cultivated vegetation | Percentage | Human activity | EarthEnv | Rescaled | Static
Mammalian species richness | Count of species | Animals/hosts | Global Mammal Assessment | Reprojected, rescaled | Static
Domestic mammal headcount | Count of animals | Animals/hosts | GLW | Rescaled, summed buffalo, cattle, goat, pig, sheep headcounts | Static
Poultry headcount | Count of animals | Animals/hosts | GLW | Rescaled | Static
Global environmental stratification | Global environmental stratification | Environment | GEnS | Rescaled | Static
Evergreen/deciduous needleleaf trees | Percentage | Environment | EarthEnv | Rescaled | Static
Evergreen broadleaf trees | Percentage | Environment | EarthEnv | Rescaled | Static
Deciduous broadleaf trees | Percentage | Environment | EarthEnv | Rescaled | Static
Mixed/other trees | Percentage | Environment | EarthEnv | Rescaled | Static
Shrubs | Percentage | Environment | EarthEnv | Rescaled | Static
Herbaceous vegetation | Percentage | Environment | EarthEnv | Rescaled | Static
Regularly flooded vegetation | Percentage | Environment | EarthEnv | Rescaled | Static
Reporting effort | Weighted number of mentions in publications | Observation bias | (Internal) | (See Methods) | Static

Explanatory variables. We compiled spatial data layers for 20 predictors in four
broad categories to decompose which factors are associated with zoonotic disease
field sites along land use gradients within EID hotspot countries emergence. These reflected the most frequently hypothesized drivers of zoonotic
where controlled sampling protocols are used to identify how disease emergence and included (Table 1): human presence/activity, animals/hosts,
wildlife biodiversity, known and unknown pathogen diversity the environment, and reporting effort. Explanatory variables came from a variety of
(e.g., using viral family level degenerate primers for PCR35), and data sources, and all were rescaled or transformed to a spatial grid of 1° resolution
(WGS84, c. 110 km at the equator) prior to their use in models. Full details of
human contact with wildlife varies across a landscape. Such an sources, original resolutions and rescaling are presented in Tables 1 and 2.
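The study grid is defined in geographic (WGS84) coordinates, so its 1° cells are not equal in area: one degree of longitude spans roughly 111 km at the equator but shrinks with the cosine of latitude. This is why the “unweighted” models described in the Methods are additionally weighted by land area per grid cell. A minimal sketch, assuming a spherical Earth of mean radius 6,371 km rather than the WGS84 ellipsoid, computes the east-west length of a degree and the area of a 1° cell by latitude:

```python
import math

# Spherical approximation: mean Earth radius. The WGS84 ellipsoid differs slightly.
EARTH_RADIUS_KM = 6371.0


def km_per_degree_longitude(lat_deg: float) -> float:
    """East-west length (km) of one degree of longitude at latitude lat_deg."""
    return math.radians(1.0) * EARTH_RADIUS_KM * math.cos(math.radians(lat_deg))


def cell_area_km2(lat_south: float, size_deg: float = 1.0) -> float:
    """Area (km2) of a size_deg x size_deg cell whose southern edge is lat_south.

    On a sphere, the band between two parallels spanning dlon radians of
    longitude has area R**2 * dlon * (sin(lat_north) - sin(lat_south)).
    """
    lat_north = lat_south + size_deg
    dlon = math.radians(size_deg)
    return EARTH_RADIUS_KM ** 2 * dlon * (
        math.sin(math.radians(lat_north)) - math.sin(math.radians(lat_south))
    )


if __name__ == "__main__":
    for lat in (0, 30, 60):
        print(f"lat {lat:>2}: {km_per_degree_longitude(lat):6.1f} km per degree of longitude, "
              f"1-degree cell area {cell_area_km2(lat):8.0f} km2")
```

At 60° latitude a 1° cell covers roughly half the area of an equatorial one, so sampling cells uniformly would over-represent high-latitude land per unit area.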

NATURE COMMUNICATIONS | 8: 1124 | DOI: 10.1038/s41467-017-00923-8 | www.nature.com/naturecommunications 7
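The EID event data paragraph notes that every event polygon, however coarse, enters the bootstrap framework, and the Methods under “Resampling regimes” specify that each sampled event contributes one presence cell (drawn from the cells its polygon overlaps) and one absence cell (drawn from all cells), both weighted by reporting effort. A minimal sketch of one bootstrap replicate in Python with NumPy follows. The flattened cell indexing, the function names and the uniform fallback for polygons whose cells all have zero reporting effort are assumptions made for illustration, not details of the authors' code.

```python
import numpy as np


def draw_presence_absence(event_cells, effort, rng):
    """Draw one presence cell and one absence cell for a single EID event.

    event_cells: 1-D int array of grid-cell indices overlapped by the event polygon.
    effort: 1-D array of non-negative reporting effort, one value per grid cell.
    Both draws are made with probability proportional to reporting effort.
    """
    w = effort[event_cells]
    if w.sum() > 0:
        presence = rng.choice(event_cells, p=w / w.sum())
    else:
        # Assumption for illustration: fall back to a uniform draw over the polygon.
        presence = rng.choice(event_cells)
    absence = rng.choice(effort.size, p=effort / effort.sum())
    return int(presence), int(absence)


def bootstrap_replicate(events, effort, rng):
    """One bootstrap replicate: resample events with replacement, one pair per draw."""
    picks = rng.integers(0, len(events), size=len(events))
    pairs = [draw_presence_absence(events[i], effort, rng) for i in picks]
    presences = np.array([p for p, _ in pairs])
    absences = np.array([a for _, a in pairs])
    return presences, absences


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    effort = rng.gamma(shape=1.0, scale=1.0, size=100)  # toy effort layer, 100 cells
    # A precisely located event (one cell) and two coarse ones (many cells).
    events = [np.array([3]), np.arange(10, 30), np.arange(50, 90)]
    presences, absences = bootstrap_replicate(events, effort, rng)
    print("presence cells:", presences)
    print("absence cells: ", absences)
```

For a one-cell polygon the presence draw is fixed, so the weighting only matters for coarse polygons, matching the paper's description of reporting effort acting as a prior for uncertain events.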


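The correction described in the Methods under “Factoring reporting bias out” can be checked numerically. The observed risk index is the weighted model's response multiplied by reporting effort, and the true risk index is taken as proportional to observed risk × human population / reporting effort, so the effort terms cancel: the corrected index is proportional to the model response times human population. A minimal sketch, assuming the three layers are NumPy arrays on the same grid (the function names are hypothetical), makes this explicit and leaves cells with zero reporting effort undefined rather than dividing by zero.

```python
import numpy as np


def observed_risk(model_response, reporting_effort):
    """Observed EID risk index: the weighted model's response times reporting effort."""
    return np.asarray(model_response, dtype=float) * np.asarray(reporting_effort, dtype=float)


def true_risk(observed, reporting_effort, population):
    """Bias-corrected index, defined only up to a constant:

        true ∝ observed × population / reporting effort

    Cells with zero reporting effort are returned as NaN instead of dividing by zero.
    """
    observed = np.asarray(observed, dtype=float)
    effort = np.asarray(reporting_effort, dtype=float)
    population = np.asarray(population, dtype=float)
    out = np.full(observed.shape, np.nan)
    ok = effort > 0
    out[ok] = observed[ok] * population[ok] / effort[ok]
    return out


if __name__ == "__main__":
    response = np.array([0.2, 0.5, 0.8])       # toy model response per cell
    effort = np.array([1.0, 4.0, 0.0])         # toy reporting effort per cell
    population = np.array([10.0, 10.0, 10.0])  # toy human population per cell
    obs = observed_risk(response, effort)
    # In cells with positive effort the result equals response * population.
    print(true_risk(obs, effort, population))
```

Only relative values are meaningful here, because every relationship in the correction is a proportionality rather than an equality.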

Table 2 Original resolutions and extents of source data sets

Source data set | Spatial resolution | Temporal resolution and extent
GRUMP (Global Rural–Urban Mapping Project)38 | 0°5′ | 5 years, 1970–2000
HYDE (History Database of the Global Environment)42 | 0°5′ | 10 years, 1900–2000
GMA (Global Mammal Assessment)12 | 300 m | N/A
GLW (Gridded Livestock of the World)47 | 0.05° | N/A
GEnS (Global Environmental Stratification)52 | 0°0′30″ | N/A
EarthEnv54 | 0°0′30″ | N/A
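Table 2 implies quite different aggregation factors when each layer is moved onto the 1° study grid. The sketch below is a minimal illustration assuming that source cells nest exactly inside 1° cells (true for 0°5′, 0.05° and 0°0′30″, but not for the 300 m GMA layer, which would first need reprojecting); it computes those factors and block-aggregates a toy array with NumPy. Summing suits count layers such as population or livestock headcounts, and an unweighted mean suits percentage layers such as HYDE cropland or EarthEnv land cover, ignoring small area differences between sub-cells. Mammal species richness cannot be block-summed, since a species present in several sub-cells would be counted more than once; the Methods instead count the species whose distributions overlap each cell. The dictionary and function names are illustrative, not taken from the authors' code.

```python
from fractions import Fraction

import numpy as np

# Source resolutions from Table 2, in degrees (exact fractions avoid float drift).
SOURCE_RES_DEG = {
    "GRUMP": Fraction(5, 60),        # 0°5′
    "HYDE": Fraction(5, 60),         # 0°5′
    "GLW": Fraction(5, 100),         # 0.05°
    "GEnS": Fraction(30, 3600),      # 0°0′30″
    "EarthEnv": Fraction(30, 3600),  # 0°0′30″
}


def cells_per_side(res_deg: Fraction, target_deg: int = 1) -> int:
    """Number of source cells along one side of a target cell (must nest exactly)."""
    factor = Fraction(target_deg) / res_deg
    if factor.denominator != 1:
        raise ValueError("source grid does not nest in the target grid")
    return int(factor)


def block_aggregate(grid: np.ndarray, k: int, how: str = "sum") -> np.ndarray:
    """Aggregate a 2-D grid into non-overlapping k x k blocks."""
    rows, cols = grid.shape
    if rows % k or cols % k:
        raise ValueError("grid shape must be a multiple of k")
    blocks = grid.reshape(rows // k, k, cols // k, k)
    if how == "sum":   # count layers, e.g. population or livestock headcounts
        return blocks.sum(axis=(1, 3))
    if how == "mean":  # percentage layers (ignores sub-cell area differences)
        return blocks.mean(axis=(1, 3))
    raise ValueError(f"unknown aggregation: {how}")


if __name__ == "__main__":
    for name, res in SOURCE_RES_DEG.items():
        k = cells_per_side(res)
        print(f"{name}: {k} x {k} = {k * k} source cells per 1° cell")
    toy = np.arange(24 * 24, dtype=float).reshape(24, 24)  # toy 2° x 2° patch at 0°5′
    print(block_aggregate(toy, cells_per_side(SOURCE_RES_DEG["HYDE"]), "mean"))
```

A 0°0′30″ layer such as EarthEnv folds 14,400 source cells into each study cell, so the choice between summing and averaging has a large effect on the resulting predictor.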

“Human Activity” data were compiled and eight predictors derived based on the following rationale: (1) Population density likely influences EID risk in two discrete ways. First, as EID events are defined as diseases emerging in the human population, their frequency (before the effects of other predictors) is assumed to be proportional to population density, with the other predictors modifying the per-person risk of EID events. To represent this, we treated human population as a baseline multiplicative factor in our models36. Second, population density may affect transmission dynamics such that EID events in areas of denser population may be more likely to produce outbreaks large enough to be detected37. We used the Global Rural–Urban Mapping Project38 human population data set, which provides gridded estimates of human population every five years for 1970–2000. (2) Population change acts as a proxy for changing demands on ecosystems leading to environmental perturbation, which has been hypothesized to drive disease emergence21. We created a measure for population change by calculating the inter-decadal difference of human population per grid cell. (3) Land-use type represents largely anthropogenic influence on the landscape (as opposed to “land cover” below) and has been hypothesized to play a role in disease emergence and spatial distribution19, 21, 39–41. We used the HYDE data set42, which estimates the percentage of land-use types in each grid cell of a global data set every ten years for 1900–2000, to derive predictors representing the percentage of land used for cropland and the percentage used for pasture. We also include in this category the Urban Land and Managed/Cultivated Vegetation layers from the EarthEnv data set (described below under “Environmental” variables), as they index human impact on the environment. (4) Land-use change has been hypothesized as a key driver of disease emergence by perturbing ecosystems and bringing humans into close proximity with wildlife5, 7, 8, 21, 27. We created metrics of change for cropland and pasture by calculating the between-decade difference in values for each grid cell.

For data sets with multiple temporal layers (human population, cropland, and pasture), we included the intersection of available dates in the different data sets (decades 1970–2000) and calculated inter-decadal change layers by differencing consecutive decades. All presence and absence samples drawn for each event (see below) were matched to the nearest decadal layers (years ending in 5 were rounded up) and to the change layer for the decade they fell in.

“Animal/host” data were represented by two predictors: (1) Mammalian biodiversity. The diversity and prevalence in a host population of potentially zoonotic pathogens in an area are hypothesized to be key factors in the risk of novel pathogen emergence8, 21, 43. However, spatial data on global pathogen diversity do not currently exist, and it is estimated that we have identified less than 1% of mammalian viral diversity35. Consistent with previous studies, we therefore assume that the number of available pathogens in an area is proportional to the diversity (species richness) of wildlife species4, 5, 35, 44. The overwhelming majority of emerging zoonoses have mammalian hosts45, and global biogeographic patterns of human infectious diseases are highly correlated with global patterns of mammalian diversity30. We therefore used mammal biodiversity (species richness), measured as the number of mammal species per grid cell, as a proxy for pathogen species richness. To do this, we used the most up-to-date mammal species distribution maps available, derived from species distribution ranges filtered according to species-specific habitat preferences12. These habitat suitability models reflected species preferences for land cover types, their altitudinal limits, their tolerance to human presence, and their relationship with water bodies. The full-resolution mammal biodiversity data (representing all 5291 terrestrial mammal species)12 were rescaled to the study grid by counting the number of species’ distributions that overlapped each grid cell. (2) Domestic animal density. A number of past EID events with wildlife origin have emerged through farmed or domestic animal intermediate or amplifier hosts (e.g., Hendra and Nipah virus, SARS). In addition, there is growing evidence that the global trend of intensification of livestock production increases the emergence risk of novel wildlife-origin zoonoses, e.g., Nipah virus in Malaysia46, influenza viruses, and others6. We used the Gridded Livestock of the World (GLW) data set47, which contains data for poultry, goat, buffalo, cattle, sheep, and pig headcounts. We summed the mammals into a single predictor (livestock mammal headcount) and retained poultry as a separate predictor.

We analyzed eight predictors from two data sets representing “Environmental” variables: (1) Climate. Climatic factors have been repeatedly hypothesized as important in the global biogeography of human infectious diseases, including EIDs30, 48, 49. Climate may influence disease distribution through enhanced suitability for vectors of wildlife-origin zoonoses (e.g., West Nile virus), more rapid vector reproduction and biting rates, changes in the efficiency or rates of pathogen transmission among hosts and vectors, and changes in the ability of pathogens to persist in the environment, among other factors50, 51. Climate was represented by a single layer in our study, the Global Environmental Stratification52, which uses a quantitative model to stratify the Earth’s surface into zones of similar climate on a single scalar measure, where higher values correspond to warmer, wetter (more tropical) regions. (2) Land cover type. Land cover type is associated with the distribution of terrestrial mammals12 and other taxa53, potentially exposing the humans present to different assemblages of viral species. It is also likely that the types of contact between wildlife and people vary with land cover type. For land cover, we used the EarthEnv data set54, which divides the Earth’s surface into 12 classes. These include different classes of natural ecosystems, as well as urban land and cultivated vegetation (grouped with “Human Activity” above). We excluded barren areas, open water and snow/ice due to a lack of biologically plausible mechanisms for disease emergence. EarthEnv represents each class as a percentage per grid cell.

Reporting effort. The distribution of reported EID events is likely strongly influenced by an inconsistent spatial distribution of detection and reporting of disease outbreaks. Previous studies have used proxies of reporting effort such as the interpolated locations of known sampling sites (“sampling effort”)55; the frequency of countries of residence for all authors of all articles in the Journal of Infectious Diseases (“reporting effort”)4; and PubMed searches for keywords for each country (“reporting bias”)25. Other studies have used occurrence records for a similar class of observations as a surrogate for background sampling effort; for example, in ecology, modeling the distribution of a particular species while using occurrence records from multiple other species to represent background samples56.

We adapted these approaches by deriving an index for reporting effort based on the spatial distribution of toponyms (place names) in peer-reviewed biomedical literature. We wrote a Python package, PubCrawler (see Supplementary Methods for full details), to search the full text of each of the 1,266,085 articles (as of April 2016) in the PubMed Central Open-Access Subset (PMCOAS)57 for toponyms from the GeoNames database58, which includes data on population (if appropriate), country, and geographical coordinates for each toponym. PubCrawler uses a set of heuristics, based on textual and geographic features of the identified toponyms, to minimize the number of false positives and to select amongst ambiguous matches. We selected articles matching terms from the Human Disease Ontology59 and exported the extracted toponyms. After excluding a further round of potentially spurious matches, place name matches were assigned a weight, normalized by article, and then summed to the study grid. To impute missing data (resulting in a number of zero-value grid cells) and smooth noise in the raw output, we fit a Poisson boosted regression tree model (using human population, accessibility, urbanized land, DALY rates, health expenditure, and GDP as predictors), and used this to represent reporting effort in our model. This approach produced a layer that adequately represented the underlying data while achieving a coverage of grid cells similar to that of the other layers.

Statistical framework. We used boosted regression trees (BRT) to model EID occurrence26, 48, 60 and to determine how conditions varied between locations where EID events have been observed and areas where they have not. BRTs handle non-linear relationships and higher-order interactions among many variables more robustly than many other modeling methods, and are robust to monotonic transformations of data26, 60. They fit potentially complex, non-linear relationships by aggregating the predictions of multiple simpler models, and are trained iteratively on random partitions of the data26, 60. In addition, the predictive accuracy of BRTs, as determined by common validation methodologies (e.g., the area under the curve of the receiver operating characteristic (AUC of the ROC) and the True Skill Statistic (TSS)), frequently exceeds that of conventional linear methods26. Unlike conventional models, however, they do not produce confidence intervals or p-values.

Resampling regimes. We employed various resampling techniques to incorporate our measure of reporting effort56, 61, estimate the predictive power of our models, account for spatial uncertainty in EID events15, and generate empirical confidence intervals for effects representing both sampling uncertainty and spatial uncertainty62. Each time an event was sampled, one presence point and one absence point were drawn (artificially fixing overall prevalence at 0.5)15. The presence point was drawn from the grid cells overlapped by that event’s polygon, and the absence point from all grid cells; both were weighted by reporting effort. Weighting presence points by reporting effort made little difference for events with small, precisely specified occurrence polygons; for events with high uncertainty it acted as a prior, specifying that, in the absence of other knowledge, the event was more likely to have been detected where reporting effort was higher.

All replicate BRT models were fit using the R packages dismo and gbm26. The function gbm.step() was called with the parameters tree.complexity = 3 (governing interaction depth), learning.rate = 0.0035 (setting the “shrinkage” applied to individual trees), and n.trees = 35 (governing the initial number of trees fit, as well as the “step size”, or number of trees added at each step of the stagewise fitting process)26. These values were selected through an iterative process, starting with the default parameters, adding tree complexity, and tuning the shrinkage and step size parameters to achieve successful gradient descent consistently across resampling runs, following refs. 26, 62. With the final parameters, the BRTs composing the bootstrap model fit a mean of 1005 trees.

Our main model used a bootstrap resampling regime to fit 1000 replicate models. For each model, 147 events were drawn randomly with replacement from the set of 147 EID events of interest, and for each selected event, 1 presence and 1 absence value were drawn as described above. The fitted models were used to generate Relative Influence box plots and Partial Dependence plots with empirical 90% confidence intervals. The mean of the predictions of these models was used to generate all maps.

To compute validation statistics (described below), we conducted 100 rounds of 10-fold cross-validation15, 62. In each round, a single presence sample and a single absence sample were drawn for each event and assigned randomly to ten groups. Each group in turn was held out, and a model was trained on the remaining groups’ samples. The model’s predictions for the presence and absence samples of the held-out group were used to construct confusion matrices and to calculate the AUC and TSS. This process was repeated 100 times, and the median, 0.05 and 0.95 quantiles for all scores were reported.

Factoring reporting bias out. Following ref. 56, we assumed that the distribution of observed EID events was conditional on the distribution of reporting effort across the globe. We fit our main, “weighted” model with grid cells sampled relative to reporting effort. The model thus produced a response relative to reporting effort (Fig. 4). We multiplied this response by the value of reporting effort in each grid cell to map the index of observed EID event risk (Fig. 3a).

We produced the estimate of the risk index after factoring out reporting bias (Fig. 3b) as follows. We assumed that the optimal distribution of reporting effort for human disease events in a location is proportional to the distribution of the human population. In reality, other unmeasured factors likely affect this. However, given this assumption, we can define reporting bias as proportional to the ratio of reporting effort to the human population (Fig. 4):

$$\text{Reporting bias} \propto \frac{\text{Reporting effort}}{\text{Human population}}$$

When bias is known, it is possible to estimate the true distribution of a phenomenon by “factoring bias out”56. In ecological studies, this generally means dividing by the measured “survey effort”, assuming that the optimal distribution of search effort is uniform across the landscape:

$$\text{True risk index} \propto \frac{\text{Observed risk index}}{\text{Reporting bias}}$$

We posit that, in the case of human disease events, uniform search effort across a landscape is suboptimal, and that it is safer to assume that the optimal distribution of reporting effort would be proportional to the human population. In this case, we remove “bias” by factoring out measured reporting effort and factoring in assumed optimal effort, and obtain a hypothetical map of the true event risk index, thus:

$$\text{True risk index} \propto \text{Observed risk index} \times \frac{\text{Human population}}{\text{Reporting effort}}$$

Model validation and performance. We used multiple tools for model validation and performance. For our bootstrap model, we calculated deviance explained using the gbm.step() function26 and derived the median and empirical 90% CIs by taking the 0.05, 0.5, and 0.95 quantiles of those values across the replicate models. Since this model is fit relative to reporting effort, percentage deviance explained is calculated relative to that variable. For the ten-fold cross-validation runs, we calculated the AUC, a threshold-independent measure of model predictive performance that is commonly used as a validation metric in species distribution modelling63. The AUC can be interpreted as “the probability that the model will rank a randomly chosen presence site higher than a randomly chosen absence site”64, or, more accurately in our application, as a measure of a model’s ability to discriminate EID events from random points56. Because the use of AUC has been criticized for its lack of sensitivity to absolute predicted probability and its inclusion of a priori untenable prediction thresholds13, we also calculated the True Skill Statistic (TSS)15.

Because all test statistics and figures from our main model are relative to the reporting effort measure, we also ran “unweighted” models. We expected these to yield higher cross-validation scores, since we expected reporting effort to be correlated both with some important predictor variables and with the outcome, so that weighting background samples uniformly rather than according to this variable would present a clearer contrast. To avoid bias from the unequal land area of WGS84 grid cells, we additionally weighted our “unweighted” models by land area per grid cell. The figures from these models are presented fully in the Supplementary Information.

Code availability. All data and code used to generate the models are available on GitHub (doi: 10.5281/zenodo.400978)65, as is the code used to generate the reporting effort layer (doi: 10.5281/zenodo.400977)66.

Data availability. The data sets analyzed during this study are included in this published article and its Supplementary Information Files, with the exception of EID Event shape files, which are available from the corresponding author on reasonable request.

Received: 1 June 2016 Accepted: 7 August 2017

References
1. Heymann, D. L. et al. Global health security: the wider lessons from the west African Ebola virus disease epidemic. Lancet 385, 1884–1901 (2015).
2. Morens, D. M. & Fauci, A. S. Emerging infectious diseases in 2012: 20 years after the institute of medicine report. mBio 3, e00494-12 (2012).
3. Pike, J., Bogich, T. L., Elwood, S., Finnoff, D. C. & Daszak, P. Economic optimization of a global strategy to reduce the pandemic threat. Proc. Natl Acad. Sci. USA 111, 18519–18523 (2014).
4. Jones, K. E. et al. Global trends in emerging infectious diseases. Nature 451, 990–993 (2008).
5. Wolfe, N. D., Dunavan, C. P. & Diamond, J. Origins of major human infectious diseases. Nature 447, 279–283 (2007).
6. Jones, B. A. et al. Zoonosis emergence linked to agricultural intensification and environmental change. Proc. Natl Acad. Sci. USA 110, 8399–8404 (2013).
7. Karesh, W. B. et al. Zoonoses 1 Ecology of zoonoses: natural and unnatural histories. Lancet 380, 1936–1945 (2012).
8. Morse, S. Factors in the Emergence of Infectious Diseases. Emerg. Infect. Dis. 1, 7–15 (1995).
9. Coker, R. et al. Towards a conceptual framework to support one-health research for policy on emerging zoonoses. Lancet Infect. Dis. 11, 326–331 (2011).
10. Woolhouse, M., Scott, F., Hudson, Z., Howey, R. & Chase-Topping, M. Human viruses: discovery and emergence. Philos. Trans. R. Soc. B Biol. Sci. 367, 2864–2871 (2012).
11. Brierley, L., Vonhof, M. J., Olival, K. J., Daszak, P. & Jones, K. E. Quantifying global drivers of zoonotic bat viruses: a process-based perspective. Am. Nat. 187, E53–E64 (2016).
12. Rondinini, C. et al. Global habitat suitability models of terrestrial mammals. Philos. Trans. R. Soc. Lond. B Biol. Sci. 366, 2633–2641 (2011).
13. Lobo, J. M., Jiménez-Valverde, A. & Real, R. AUC: a misleading measure of the performance of predictive distribution models. Glob. Ecol. Biogeogr. 17, 145–151 (2008).
14. Allouche, O., Tsoar, A. & Kadmon, R. Assessing the accuracy of species distribution models: prevalence, kappa and the true skill statistic (TSS). J. Appl. Ecol. 43, 1223–1232 (2006).
15. Barbet-Massin, M., Jiguet, F., Albert, C. H. & Thuiller, W. Selecting pseudo-absences for species distribution models: how, where and how many? Methods Ecol. Evol. 3, 327–338 (2012).
16. Weiss, R. A. & McMichael, A. J. Social and environmental risk factors in the emergence of infectious diseases. Nat. Med. 10, S70–S76 (2004).
17. McFarlane, R., Sleigh, A. & McMichael, A. Land-use change and emerging infectious disease on an island continent. Int. J. Environ. Res. Public Health 10, 2699–2719 (2013).
18. Patz, J. A. et al. Unhealthy landscapes: policy recommendations on land use change and infectious disease emergence. Environ. Health Perspect. 112, 1092–1098 (2004).
19. Keesing, F. et al. Impacts of biodiversity on the emergence and transmission of infectious diseases. Nature 468, 647–652 (2010).
20. Myers, N., Mittermeier, R. A., Mittermeier, C. G., da Fonseca, G. A. B. & Kent, J. Biodiversity hotspots for conservation priorities. Nature 403, 853–858 (2000).

21. Murray, K. A. & Daszak, P. Human ecology in pathogenic landscapes: two hypotheses on how land use change drives viral emergence. Curr. Opin. Virol. 3, 79–83 (2013).
22. Schmidt, K. A. & Ostfeld, R. S. Biodiversity and the dilution effect in disease ecology. Ecology 82, 609–619 (2001).
23. Salkeld, D. J., Padgett, K. A. & Jones, J. H. A meta-analysis suggesting that the relationship between biodiversity and risk of zoonotic pathogen transmission is idiosyncratic. Ecol. Lett. 16, 679–686 (2013).
24. Randolph, S. E. & Dobson, A. D. M. Pangloss revisited: a critique of the dilution effect and the biodiversity-buffers-disease paradigm. Parasitology 139, 847–863 (2012).
25. Yang, K. et al. Global distribution of outbreaks of water-associated infectious diseases. PLoS Negl. Trop. Dis. 6, e1483 (2012).
26. Elith, J., Leathwick, J. R. & Hastie, T. A working guide to boosted regression trees. J. Anim. Ecol. 77, 802–813 (2008).
27. Loh, E. H. et al. Targeting transmission pathways for emerging zoonotic disease surveillance and control. Vector Borne Zoonotic Dis. 15, 432–437 (2015).
28. Morse, S. S. et al. Prediction and prevention of the next pandemic zoonosis. Lancet 380, 1956–1965 (2012).
29. Olson, S. H. et al. Drivers of emerging infectious disease events as a framework for digital detection. Emerg. Infect. Dis. 21, 1285–1292 (2015).
30. Murray, K. A. et al. Global biogeography of human infectious diseases. Proc. Natl Acad. Sci. USA 112, 12746–12751 (2015).
31. Funk, S., Bogich, T. L., Jones, K. E., Kilpatrick, A. M. & Daszak, P. Quantifying trends in disease impact to produce a consistent and reproducible definition of an emerging infectious disease. PLoS ONE 8, e69951 (2013).
32. Olival, K. J. et al. Host and viral traits predict zoonotic spillover from mammals. Nature 546, 646–650 (2017).
33. Laurance, W. F. et al. A global strategy for road building. Nature 513, 229–232 (2014).
34. Loh, E. H., Murray, K. A., Nava, A., Aguirre, A. A. & Daszak, P. in Tropical Conservation: Perspectives on Local and Global Priorities (eds Aguirre, A. A. & Sukumar, B.) Ch. 6, 79–88 (Oxford University Press, 2016).
35. Anthony, S. J. et al. A strategy to estimate unknown viral diversity in mammals. mBio 4, e00598-13 (2013).
36. Moffett, A., Shackelford, N. & Sarkar, S. Malaria in Africa: vector species’ niche models and relative risk maps. PLoS ONE 2, e824 (2007).
37. McCallum, H. How should pathogen transmission be modelled? Trends Ecol. Evol. 16, 295–300 (2001).
38. Socioeconomic Data and Applications Center (SEDAC). Global Rural-Urban Mapping Project (GRUMP), v1. Available at: http://sedac.ciesin.columbia.edu/data/collection/grump-v1 (2015).
39. Ostfeld, R. S. & Keesing, F. Biodiversity series: the function of biodiversity in the ecology of vector-borne zoonotic diseases. Can. J. Zool. 78, 2061–2078 (2000).
40. Ostfeld, R. S. & Keesing, F. Effects of host diversity on infectious disease. Annu. Rev. Ecol. Evol. Syst. 43, 157–182 (2012).
41. Bogich, T. L. et al. Preventing pandemics via international development: a systems approach. PLoS Med. 9, e1001354 (2012).
42. Klein Goldewijk, K., Beusen, A., Van Drecht, G. & De Vos, M. The HYDE 3.1 spatially explicit database of human-induced global land-use change over the past 12,000 years. Glob. Ecol. Biogeogr. 20, 73–86 (2011).
43. Lloyd-Smith, J. O. et al. Epidemic dynamics at the human-animal interface. Science 326, 1362–1367 (2009).
44. Dunn, R. R., Davies, T. J., Harris, N. C. & Gavin, M. C. Global drivers of human pathogen richness and prevalence. Proc. R. Soc. B Biol. Sci. 277, 2587–2595 (2010).
45. Woolhouse, M. E. J. & Gowtage-Sequeria, S. Host range and emerging and reemerging pathogens. Emerg. Infect. Dis. 11, 1842–1847 (2005).
46. Pulliam, J. R. C. et al. Agricultural intensification, priming for persistence and the emergence of Nipah virus: a lethal bat-borne zoonosis. J. R. Soc. Interface 9, 89–101 (2011).
47. Robinson, T. P. et al. Mapping the global distribution of livestock. PLoS ONE 9, e96084 (2014).
48. Hay, S. I. et al. Global mapping of infectious disease. Philos. Trans. R. Soc. Lond. B Biol. Sci. 368, 20120250 (2013).
49. Guernier, V., Hochberg, M. E. & Guégan, J.-F. Ecology drives the worldwide distribution of human diseases. PLoS Biol. 2, e141 (2004).
50. Rohr, J. R. et al. Frontiers in climate change-disease research. Trends Ecol. Evol. 26, 270–277 (2011).
51. Kilpatrick, A. M. & Randolph, S. E. Zoonoses 2 Drivers, dynamics, and control of emerging vector-borne zoonotic diseases. Lancet 380, 1946–1955 (2012).
52. Metzger, M. J. et al. A high-resolution bioclimate map of the world: a unifying framework for global biodiversity research and monitoring. Glob. Ecol. Biogeogr. 22, 630–638 (2013).
53. Jenkins, C. N., Pimm, S. L. & Joppa, L. N. Global patterns of terrestrial vertebrate diversity and conservation. Proc. Natl Acad. Sci. USA 110, E2602–E2610 (2013).
54. Tuanmu, M.-N. & Jetz, W. A global 1-km consensus land-cover product for biodiversity and ecosystem modelling. Glob. Ecol. Biogeogr. 23, 1031–1045 (2014).
55. Hopkins, M. E. & Nunn, C. L. A global gap analysis of infectious agents in wild primates. Divers. Distrib. 13, 561–572 (2007).
56. Phillips, S. J. et al. Sample selection bias and presence-only distribution models: implications for background and pseudo-absence data. Ecol. Appl. 19, 181–197 (2009).
57. PubMed Central FTP Service. Available at: https://www.ncbi.nlm.nih.gov/pmc/tools/ftp/ (2017).
58. Wick, M. GeoNames. Available at: http://www.geonames.org (2017).
59. Kibbe, W. A. et al. Disease Ontology 2015 update: an expanded and updated database of human diseases for linking biomedical knowledge through disease data. Nucleic Acids Res. 43, D1071–D1078 (2015).
60. De’ath, G. Boosted trees for ecological modeling and prediction. Ecology 88, 243–251 (2007).
61. Dorazio, R. M. Accounting for imperfect detection and survey bias in statistical analysis of presence-only data. Glob. Ecol. Biogeogr. 23, 1472–1484 (2014).
62. Leathwick, J. R., Elith, J., Francis, M. P., Hastie, T. & Taylor, P. Variation in demersal fish species richness in the oceans surrounding New Zealand: an analysis using boosted regression trees. Mar. Ecol. Prog. Ser. 321, 267–281 (2006).
63. Liu, C., White, M. & Newell, G. Measuring and comparing the accuracy of species distribution models with presence-absence data. Ecography 34, 232–243 (2011).
64. Fawcett, T. An introduction to ROC analysis. Pattern Recogn. Lett. 27, 861–874 (2006).
65. Allen, T. ecohealthalliance/hotspots2: “Global correlates” paper. Available at doi.org/10.5281/zenodo.400978 (2017).
66. Allen, T. & Breit, N. ecohealthalliance/pubcrawler: “Global correlates” paper. Available at doi.org/10.5281/zenodo.400977 (2017).

Acknowledgements
This work was made possible by the generous support of the American people through the United States Agency for International Development (USAID) Emerging Pandemic Threats PREDICT (Cooperative Agreement No. AID-OAA-A-14-00102). The natural language processing software described was sponsored by the Department of Defense, Defense Threat Reduction Agency (Project No. J9CBA14212). The contents are the responsibility of the authors and do not necessarily reflect the views or the policy of USAID or the United States Government, and no official endorsement should be inferred. We thank Liam Brierley (Univ. of Edinburgh) for collating new EID data.

Author contributions
T.A. and K.M. designed the statistical approach, with contributions from K.J.O. and C.Z.-T. The EID database was updated under P.D.’s supervision. T.A. wrote the modeling code and generated the figures, and N.B. and T.A. wrote the code to generate the publication bias layer. C.R. and M.D.M. contributed the mammal species richness data set. T.A., K.M., K.J.O., and P.D. wrote the manuscript, with all authors contributing edits.

Additional information
Supplementary Information accompanies this paper at doi:10.1038/s41467-017-00923-8.

Competing interests: The authors declare no competing financial interests.

Reprints and permission information is available online at http://npg.nature.com/reprintsandpermissions/

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

© The Author(s) 2017
