Schneider Et Al-2019-Methods in Ecology and Evolution
DOI: 10.1111/2041-210X.13288
REVIEW
Correspondence
Florian D. Schneider
Email: [email protected]

Funding information
Deutsche Forschungsgemeinschaft, Grant/Award Number: MA7144/1‐1, KO2209/12‐2, Po362/18‐3 and
WE3081/21‐1; Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung,
Grant/Award Number: 310030E‐173542/1

Handling Editor: David Orme

Abstract
1. Trait‐based approaches are widespread throughout ecological research as they offer great
potential to achieve a general understanding of a wide range of ecological and evolutionary
mechanisms. Accordingly, a wealth of trait data is available for many organism groups, but these
data are underexploited owing to a lack of standardization and to heterogeneity in data formats
and definitions.
2. We review current initiatives and structures developed for standardizing trait data and
discuss the importance of standardization for trait data hosted in distributed open‐access
repositories.
3. In order to facilitate the standardization and harmonization of distributed trait datasets by
data providers and data users, we propose a standardized vocabulary that can be used for storing
and sharing ecological trait data. We discuss potential incentives and challenges for the wide
adoption of such a standard by data providers.
4. The use of a standard vocabulary allows for trait datasets from heterogeneous sources to be
aggregated more easily into compilations and facilitates the creation of interfaces between
software tools for trait‐data handling and analysis. By aiding decentralized trait‐data
standardization, our vocabulary may ease data integration and use of trait data for a broader
ecological research community and enable global syntheses across a wide range of taxa and
ecosystems.
KEYWORDS
data standardization, ecoinformatics, functional ecology, ontologies, semantic web, species
traits
This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium,
provided the original work is properly cited.
© 2019 The Authors. Methods in Ecology and Evolution published by John Wiley & Sons Ltd on behalf of British Ecological Society.
1 | INTRODUCTION

Functional traits are phenotypic (i.e. morphological, physiological, behavioural) characteristics
that are related to the fitness and performance of an organism (McGill, Enquist, Weiher, &
Westoby, 2006; Violle et al., 2007). Recent years have seen a proliferation of trait‐based
research in a wide range of fields: trait data have been used to understand the evolutionary
basis of individual‐level properties (Salguero‐Gómez et al., 2016), global patterns of
biodiversity (Díaz et al., 2016), and the relationship between ecosystem functions and the
functional composition of species assemblages (Bello et al., 2010; Mouillot, Graham, Villéger,
Mason, & Bellwood, 2013). This research provides the mechanistic framework for linking climate
change or anthropogenic land use to biodiversity and its related functions (Allan et al., 2015;
Díaz et al., 2011; Lavorel & Grigulis, 2012). Species traits have been suggested as indicator
variables for monitoring ecosystem health at the individual level, such as changes in body size
in a population of fish (Kissling et al., 2018). Because functional traits allow us to infer the
ecological role of organisms from their apparent features, regardless of their taxonomic identity
(Grime, 2001; Moretti et al., 2017; Villéger, Brosse, Mouchet, Mouillot, & Vanni, 2017), their
measurement is also a promising means of bypassing the taxonomic impediment, i.e. the fact that
most species are as yet undescribed, and little is known of their interactions with other
organisms and their environment.

Despite the importance of trait‐based approaches, fully exploiting their potential relies heavily
on the broad availability and compatibility of trait data to achieve sufficient taxonomic and
regional coverage, both for present‐day taxa and in evolutionary deep time. However, the
different research contexts in which data arise render trait data extremely heterogeneous and
make the task of data compilation time‐consuming and error‐prone. To date, trait data have been
harmonized and compiled into centralized databases only for specific organism groups and regional
scope, often centred around particular research questions (e.g. PanTHERIA, Jones et al., 2009;
TRY, Kattge, Díaz, et al., 2011; AmphiBio, Oliveira, São‐Pedro, Santos‐Barrera, Penone, & Costa,
2017). Less well‐studied taxa and specialized research questions lack the resources for such an
endeavour. Besides initiatives aiming at assembling data, tools to enable the compatibility of
data across databases are being developed. These include software to access trait data from the
Internet (e.g. Ankenbrand, Hohlfeld, Weber, Foerster, & Keller, 2018; Chamberlain, Foster,
Bartomeus, LeBauer, & Harris, 2017), semantic web standards (Page, 2008; Wieczorek et al., 2012)
and thesauri of consensus terms (Garnier et al., 2017; Walls et al., 2012).

Meanwhile, open and reproducible science has become mainstream: publication of research data
without access restrictions, with structured metadata and in accordance with data standards to
enable their reuse, has become the declared goal of an open biodiversity knowledge management
(http://www.bouchoutdeclaration.org/) and is increasingly demanded by journals and public
research funding agencies (Alliance of German Science Organisations, 2010; Royal Society Science
Policy Centre, 2012). As a result, an increasing number of individual research projects publish
their primary data on general‐purpose file hosting services, where no data standards are enforced
upon the uploaded material (Wilkinson et al., 2016). It is thus likely that trait data will
become increasingly available, but a lack of data and metadata standardization will hamper the
efficient reuse and synthesis of published datasets.

In this paper, we review existing initiatives for trait‐data collection and standardization from
the pragmatic view of data providers, data curators and data users, as well as data managers. We
discuss current efforts to make trait data visible, accessible, interoperable and reusable in
downstream data analysis, as demanded by the FAIR guiding principles for scientific data
(Wilkinson et al., 2016). Furthermore, we show how the current deficit in the standardization of
primary data hampers the implementation of interoperability and reuse of trait data. Based on
these considerations, we propose a versatile vocabulary for describing ecological trait datasets,
which builds upon, and is compatible with, existing terminology standards for biodiversity data,
in particular the Darwin Core Standard for biodiversity data (DwC; Wieczorek et al., 2012). Since
a standard vocabulary relies on the adoption by a broad research community, we discuss incentives
for its use and lay out mechanisms for future consensus‐building and community development
towards an accessible and easy‐to‐use ecological trait‐data standard vocabulary.

2 | INITIATIVES FOR TRAIT‐DATA STANDARDIZATION

The need for standardizing trait data arises from the prospective gain of compiling heterogeneous
trait datasets for data synthesis. Often, the scientific scope and focus differ between data
providers, who measure and assess the trait data in the first place, and data users, who reuse
published data for a broader synthesis application. Furthermore, data curators and data managers
are taking up the task of providing compiled and harmonized data and preparing them for future
use and long‐term preservation. Data managers are concerned with the development of complex
digital infrastructures for handling and analysing large amounts of data. These are idealized
roles of researchers who deal with trait‐data standardization throughout the data life cycle. In
this section, we review four types of initiatives that are of relevance for trait‐data
standardization (see Glossary in Table 1 for italicized terms):

1. Initiatives that provide trait datasets which have been assembled out of a particular research
   interest, either by measurement or collated from the literature.
2. Initiatives that aim to harmonize trait data from the literature or from direct measurements
   into data compilations or database infrastructures and make those data widely available on the
   Internet.
3. Initiatives that aim at the standardization and development of consensus measurement methods
   and definitions for traits and provide standard terminologies.
4. Initiatives that aim to combine data (1 & 2) and terminologies (3) into formalized structures
   for knowledge representation to link trait data to a wider set of biodiversity data.
TABLE 1 Glossary of terms from the biodiversity data‐management context as they are used in this paper; draws from Garnier et al. (2017)
Term | Definition
Concept | An idea, notion or object that is made explicit in an information context by a term definition, and referenced to a URI or other accessible reference
Controlled vocabulary | A list of terms that gives all valid consensus terms for a particular context, while no unlisted entries are accepted
Darwin Core Standard (DwC) | Body of terms intended to facilitate the sharing of information about biological diversity; maintained by the Biodiversity Information Standards TDWG (http://rs.tdwg.org/dwc/)
Dataset | A set of measurements and observations, often stored in a data‐table and originating from a single experimental set‐up or study context; can be considered as being internally homogeneous across all data entries
Database | A structured collection of data, usually organized as multiple data tables linked via identifiers into relational databases; usually constructed using a specific database management system, i.e. a software to provide a (offline or online) user interface
File repository | A storage or archiving of datasets on file‐hosting services like Figshare.com, Dryad (datadryad.org), Researchgate.net, or Zenodo.org; online repositories make data available for public access, provide metadata, state conditions of reuse, and (not always) facilitate citations via persistent identifiers, e.g. DOIs (Digital Object Identifiers)
Identifier (ID) | A unique label that relates data entries to information within and across datasets or external items of information; may be used to connect multiple data tables into a database; can be user‐specific or, in form of a URI, point to a globally valid ontology or thesaurus
Metadata | Data documentation of the higher‐level information or instructions; describe the content, context, quality, structure, provenance and accessibility of a data object (Michener, 2006). In the context of trait data, such additional information can move to the body of the primary data table when data are compiled from different sources
Occurrence | The observation context of a single individual, i.e. the existence of an organism at a particular place and time; Sometimes used as synonym of ‘observation’ in data management context
Ontology | A semantic model of the objects and their relationships in a domain of interest (Gruber, 1995); defines terms and concepts in a formal language that provides cross‐references and semantic meaning; commonly published in OWL format for machine readability
Semantic web | An extension of the World Wide Web that aims for machine‐readable meaning of information via well‐defined data standards, ontologies and exchange protocols (Berners‐Lee et al., 2001); the World Wide Web Consortium (W3C) defines standards, i.e. specifications of protocols and technologies for the semantic web (http://www.w3.org/standards/semanticweb/)
Term | A word that names or labels a particular concept as part of the specialized vocabulary of a field.
Terminology | The body of terms and concepts used with a particular application in a subject of study, usually formalized in a thesaurus or ontology
Thesaurus | Controlled vocabulary that provides key terms with their associated concepts and relations for a specific field or domain of interest (Laporte et al., 2013); e.g. may define a hierarchy of broader or narrower terms
Uniform Resource Identifier (URI) | An unambiguous pointer to a unique resource on the Internet; used to refer to single terms of a thesaurus or ontology; Example: http://purl.obolibrary.org/obo/TO_0000391
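The difference between a controlled vocabulary (only listed terms are valid) and a thesaurus (listed terms plus broader/narrower relations) in Table 1 can be made concrete with a small sketch. It reuses the femur‐length hierarchy that the paper gives as an example in Section 2.3; the dictionary layout and the function names `is_valid_term` and `broader_chain` are hypothetical and do not correspond to any existing thesaurus service.

```python
# Hypothetical mini-thesaurus: each term maps to its broader term
# (None marks the top of the hierarchy). Terms follow the paper's
# femur-length example; the structure itself is an assumption.
BROADER = {
    "femur length of first leg, left side": "femur length",
    "femur length": "leg trait",
    "leg trait": "locomotion trait",
    "locomotion trait": None,
}


def is_valid_term(term):
    """Controlled-vocabulary check: unlisted entries are not accepted."""
    return term in BROADER


def broader_chain(term):
    """Return the term followed by all broader terms, most specific first."""
    if not is_valid_term(term):
        raise ValueError(f"'{term}' is not part of the controlled vocabulary")
    chain = []
    while term is not None:
        chain.append(term)
        term = BROADER[term]
    return chain


print(broader_chain("femur length"))
```

In this sketch the validity check alone would constitute a controlled vocabulary; it is the `BROADER` relation that turns the list into a (very small) thesaurus.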
We consider these initiatives separately although they are often developed in conjunction to
serve a particular database project, such as the TRY plant database (Kattge, Díaz, et al., 2011;
Kattge, Ogle, et al., 2011) and the Thesaurus of Plant characteristics (TOP; Garnier et al.,
2017). We show that the degree of trait‐data standardization in existing datasets is highly
variable, and which tools and standards are currently applied to achieve harmonization of data
from multiple, distributed sources. The objective of this review is to raise awareness of the
generic structure of trait data and to help researchers share and publish their own datasets in
an appropriate form.

2.1 | Trait datasets

In the field of comparative biology, morphological traits, such as traits related to flower
shape, leaf and stem structures for plants or wing and beak measurements for birds, as well as
life‐history traits such as Ellenberg values for plants or physiological and reproductive traits
for animals (e.g. feeding biology, dispersal, metabolic rate and body size) have been assessed
for decades and have been published in regular journal articles or books. With the rise of
ecological trait‐based research, measurements and information available from species descriptions
have been compiled into project‐specific datasets that typically comprise a local set of taxa and
a focal set of traits. A plethora of such static datasets has been published alongside scientific
articles, or as standalone data publications (see Kleyer et al., 2008 for a review on plant data;
for animal data, e.g. Gossner et al., 2015 and Appendix S1, Table A1).

Today, the online publication of such data is greatly facilitated by file hosting services (e.g.
Figshare, Zenodo, Researchgate, Data Dryad), which warrant long‐term accessibility, and citability via
DOIs, and govern data sharing via license statements. These platforms offer the hosting of
publicly accessible file repositories at low cost or for free, which makes them attractive for
small and intermediate‐sized research projects that cannot dedicate extra resources for data
management. Most importantly, these platforms enable public hosting of data with very low quality
thresholds regarding metadata documentation and data standardization. Thus, although open for
download, the trait datasets on such data repositories might be stored in variable tabular
structures and labelled following self‐defined terms, which makes extraction and further use
unnecessarily tedious.

For trait data, there are common issues arising from the variability of data structures and
metadata quality. In terms of structure, trait data usually are reported in a species × traits
wide‐table format. In this intuitive data table, each row represents a species (or taxon) for
which multiple traits are reported in columns. Similarly, when reporting raw data, researchers
place observations on individual organisms in rows with multiple trait measurements applied to
the same individual across multiple columns. Covariates on the taxon, the individual specimen
(e.g. sex or life‐stage) or context of observation (e.g. time and place of sampling) would be
placed in additional columns and would further expand the two‐dimensional data table. The
resolution or scope of these covariates varies greatly depending on the research question and
observation context. The column descriptions and terminology applied to taxa and traits are
mostly project‐specific and rarely chosen for compatibility with larger database initiatives.
Variability in the number and meaning of columns in these data tables requires tedious manual
adjustments when merging multiple datasets (Wickham, 2014). Furthermore, metadata provided along
with the primary data vary in their level of detail, e.g. for documenting descriptions of
variables, measurement procedures or sampling context (Kattge, Ogle, et al., 2011). While, in
some datasets, information like geolocation or sampling date and time might be dataset‐level
information, thus qualifying as metadata, in other datasets they might be collected at the level
of individual observations (see section on data compilations below). More importantly, clear
statements on ownership and authorship, terms of use, or internationalization (e.g. separators
and delimiters), are often still neglected in primary trait‐data publications. The task of
harmonizing trait data is taken up by data‐curating initiatives, which compile heterogeneous data
into comprehensive databases (see next section).

2.2 | Data compilation initiatives

In the past two decades, many distributed trait datasets have been aggregated and harmonized into
greater collections with particular taxonomic or regional focus (e.g. Kleyer et al., 2008;
Oliveira et al., 2017, see Appendix S1, Table A1). While these initiatives successfully address
issues of heterogeneity in units or categorical variables, or achieve high taxonomic or
geographic coverage, few of these compilations apply a standardized terminology for taxa or trait
definitions. Additionally, in the process of data aggregation, rich metadata content might be
lost, as the detail in the original files differs, while the reference to the original dataset
becomes obscured, as only aggregated values are reported (e.g. means or medians). Such trait‐data
compilations are often labelled ‘database’, although they do not formally provide data in a
database structure in the strict data‐management sense. Instead, the data are released as static
data tables of raw measurements or aggregate trait values on journal websites or open‐access file
hosting platforms, which may be updated irregularly.

As they deal with much larger amounts of data, initiatives that compile data from natural history
museum collections are traditionally more concerned with standardization. The amount of
morphological measurement data extracted from museum collections and herbaria is likely to
skyrocket in the near future due to digitization efforts supported by new technology for scanning
and pattern recognition (Smith & Blagoderov, 2012, and references therein; Ströbel, Schmelzle,
Blüthgen, & Heethoff, 2018) and citizen science initiatives (e.g. www.markmybird.org). For
example, the VertNet database compiled and harmonized large quantities of vertebrate trait data
from collections; the resulting data are published as versioned data tables which are updated as
new data sources become available (http://vertnet.org, Guralnick et al., 2016).

Specialized online portals have been created to attract data submissions from a defined research
field and take care of data harmonization, thereby greatly facilitating data synthesis. For
example, by aiming for a universal framework for plant traits, the TRY database (Kattge, Díaz, et
al., 2011) attracted more data submissions and downloads than any other trait‐data platform. The
online portal enables selective data download and management of user permissions. For animal
trait data, however, a single unified platform and harmonizing scheme is still lacking.
Nonetheless, initiatives for particular groups of animals do exist. Examples are the BETSI
database on soil invertebrate traits (http://betsi.cesab.org/; Pey et al., 2014), the
Carabids.org web portal (http://www.carabids.org/), the Coral Trait Database (Madin et al.,
2016), or the Global Ants Database (Parr et al., 2017, see Appendix S1, Table A1). The role of
online portals and database initiatives in standardizing data and making them more accessible is
paramount. Trait‐data portals incentivize data submissions by offering increased data visibility
and usage, while providing data‐use policies that secure author attribution and, potentially,
co‐authorship of associated articles. However, maintaining centralized database infrastructures
is costly and requires long‐term funding (Bach et al., 2012).

2.3 | Terminology standards for traits

A major challenge in trait‐data standardization is the lack of widely accepted and unambiguous
trait definitions (Kissling et al., 2018). Previous standard definitions of trait concepts range
from listings of selected definitions in vocabularies, through well‐defined method handbooks and
comprehensive thesauri, to formalized definitions of trait concepts in ontologies. The
initiatives behind method handbooks,
thesauri and ontologies are essential for building community consensus for trait definitions.

Very general classes of traits are defined within the list of GeoBON Essential Biodiversity
Variables (Kissling et al., 2018) aiming for a list of functional indicators for ecosystem
health.

Assigning a detailed and unambiguous methodological protocol for a trait, including the units to
use or the ordinal or factor levels to be assigned, is essential for standardizing its
measurement process. Efforts to develop handbooks for measurement protocols provide such a
methodological standardization for plants (Cornelissen et al., 2003; Perez‐Harguindeguy et al.,
2013) or invertebrates (Moretti et al., 2017), but are of limited use in harmonizing trait data
that pre‐date or ignore this standard (Kattge, Ogle, et al., 2011).

A thesaurus provides a ‘controlled vocabulary designed to clarify the definition and structuring
of key terms and associated concepts in a specific discipline’ (Garnier et al., 2017; Laporte,
Garnier, & Mougenot, 2013). To provide a logical structure for trait terms, Garnier et al. (2017)
suggest the Entity‐Quality model (EQ), where a trait is defined as ‘an entity having a quality’
(for instance for trait ‘femur length’, ‘femur’ is the entity and ‘length’ the quality). In
thesauri, hierarchies of concepts can be formalized by linking each term to broader or narrower
terms, or to synonyms. For example, the definition of ‘femur length of first leg, left side’ is
narrower than ‘femur length’ which is narrower than ‘leg trait’ which is narrower than
‘locomotion trait’. Because thesauri are publicly available, these defined terms can also be
referred to via globally unique Uniform Resource Identifiers (URIs). For example, a measurement
of fruit mass could be linked to the definition of the term within the Thesaurus of Plant
characteristics (TOP, Garnier et al., 2017) via its URI
‘http://top-thesaurus.org/annotationInfo?viz=1&&trait=Fruit_mass’.

In addition to defining terms for human interpretation, ontologies define terms by their
relationship to other defined terms, thereby providing a semantic model of the concepts used
within a domain of research, with the objective of enabling the computational interpretation of
data (Kissling et al., 2018; Walls et al., 2012, 2014). The Plant Trait Ontology (TO) definition
of the concept ‘seed size’ contains references to other globally defined terms: ‘A seed
morphology trait (TO:0000184) which is the size of a seed (PO:0009010)’. Thus, trait definitions
may refer to related terms or synonyms defined in other trait ontologies or other scientific
ontologies, like units as defined by the Units of Measurement Ontology (Gkoutos, Schofield, &
Hoehndorf, 2012). By providing ontologies in a formalized syntax, like Web Ontology Language
(OWL), a machine‐readable web of definitions is spun across the Internet allowing researchers and
search engines to relate independent trait measurements with each other and connect them to the
wider semantic web of online data (Berners‐Lee, Hendler, & Lassila, 2001; Gruber, 1995; Page,
2008; Walls et al., 2012).

Comprehensive trait thesauri have been developed in TOP (which is employed in the TRY database,
Garnier et al., 2017) and in the Thesaurus for Soil Invertebrate Trait‐based Approaches (T‐SITA,
http://t-sita.cesab.org/, Pey et al., 2014). Ontologies of trait definitions have been developed
for plants (e.g. the Plant Ontology, Jaiswal et al., 2005; Walls et al., 2012; the Flora
Phenotype Ontology, Hoehndorf et al., 2016), and for specific animal taxa (e.g. the Hymenoptera
Anatomy Ontology, Yoder, Miko, Seltmann, Bertone, & Deans, 2010; the Vertebrate Trait Ontology,
Park et al., 2013). The UBERON ontology is an integrated cross‐species anatomy ontology for all
animals, which combines concepts from different existing ontologies, with wide application in
biomedical or physiological research (Mungall, Torniai, Gkoutos, Lewis, & Haendel, 2012).

To conclude, there is already a suite of globally available thesauri and ontologies for traits.
However, definitions in some domains are better covered than others (Kissling et al., 2018), and
different curation strategies and measures for peer‐review and community building are employed.
To this end, the OBO Foundry is providing a development platform for (biological) ontologies and
offers review and quality control (Smith et al., 2007, http://www.obofoundry.org/). While defined
vocabularies are increasingly used in biodiversity data management, distributed trait data of
smaller projects published in general‐purpose file servers rarely refer to standard
terminologies. Finding and applying the most suitable and highest‐quality ontology from the range
of available ontologies is not an easy task for ecological researchers. To reduce this effort,
meta‐ontology initiatives, like Ontobee (http://www.ontobee.org/), Bioportal
(https://bioportal.bioontology.org/, Whetzel et al., 2011), or the GFBio Terminology Service
(Karam et al., 2016, https://terminologies.gfbio.org/), provide centralized hosting for trait
ontologies, structured browsing, and harmonized web services for computational access.

2.4 | Trait‐data structures

While trait thesauri and trait ontologies typically define concepts of measurements and
observations for focal groups of organisms, they do not specify the format or structure in which
trait data should be stored and labelled.

A trait dataset typically contains multiple data entries, where each entry describes a trait
value observed on an instance of a scientific taxon. The item on which the value has been
observed can be very variable, ranging from an occurrence of an individual at a specific place
and time in its natural environment or a preserved specimen in a collection (Figure 1a), to a
group of individuals of a specific taxon (Figure 1b), or an entire population of a species
(Figure 1c,d). The reported trait values may be quantitative measurements or qualitative facts.
Quantitative measurements are values obtained either by direct morphological, physiological or
behavioural observations on single specimens (Figure 1a), by aggregating replicated measurements
on multiple entities (Figure 1b) or by estimating the means or ranges for the respective taxon as
reported in the literature or other published sources (e.g. databases, Figure 1c). This
encompasses a wide range of numeric data types, including continuous, binary, integer, intervals
or ratios, as well as categorical (ordinal or nominal) values. Qualitative facts are assignments
of categorical information, often on entire taxa, e.g. of a behavioural or life‐history trait
(Figure 1d).

Beyond these core observations, further information might be available that specify the taxon
concept applied, provide detail on
[Figure 1 image not reproduced; legible panel text: ‘males of taxon y have average body length of 43 mm’; ‘genus z is herbivore’]
FIGURE 1 Types of ecological trait data assume different entities or reported qualities: (a) morphometric or morphological
measurements of individual body features (lengths, areas, volumes, weights) or other quantities related to life history (e.g. reproductive
rates, life spans); (b) aggregated trait values are reported as means taken on multiple measures of organisms of a taxon; (c) quantitative traits
may be extracted from literature or existing databases, referring to the entire taxon (or a subset, e.g. a sex) as the subject of description; (d)
qualitative traits are categorical, ordinal or binary descriptors of the entire species or higher taxonomic level (also called ‘facts’)
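The entries distinguished in Figure 1 are commonly published in the species × traits wide‐table format described in Section 2.1. As a minimal sketch, the following shows how such a table could be unpacked into one entry per reported trait value. The column names (`taxon`, `trait`, `value`, `body_length_mm`, `feeding_guild`) are hypothetical and are not the vocabulary proposed in this paper; the two values echo the statements legible in Figure 1.

```python
# Sketch only: all column names are hypothetical illustrations.

def wide_to_long(rows, id_column="taxon"):
    """Reshape species x traits rows into one record per non-empty trait value."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_column or value in ("", None):
                continue  # skip the identifier column and missing values
            long_rows.append(
                {"taxon": row[id_column], "trait": column, "value": value}
            )
    return long_rows


# A small wide table: one row per taxon, one column per trait.
wide_table = [
    {"taxon": "taxon y (males)", "body_length_mm": 43, "feeding_guild": None},
    {"taxon": "genus z", "body_length_mm": None, "feeding_guild": "herbivore"},
]

long_table = wide_to_long(wide_table)
for record in long_table:
    print(record)
```

In the long layout, a quantitative value (Figure 1c) and a qualitative fact (Figure 1d) share the same three columns, so tables with different trait columns can be concatenated without first aligning their headers.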
the measurement method, or that place the reported measurement in a broader observation context
(including geolocation as well as date and time of sampling). As such data may be useful for
future analysis of the causal reasons of trait variation or to explain noise in measurement data,
they should always be published along with the core data. In most cases, information on place and
time applies to the entire dataset, and thus would be included in the metadata accompanying a
data publication (potentially applying Ecological Metadata Language, EML, KNB, 2011 as a formal
structure). In the case of trait data and depending on the research scope, the information may
also have been collected at the level of the measurement, occurrence or taxon. Geolocation or
date and time would then not be provided as metadata, but as covariate data in additional columns
of the primary dataset. When compiling datasets, it is a key task of data curators to deal with
dataset‐level information and maintain it for downstream analysis by incorporating it into the
compiled data table.

Standard terms for the formal description of the common concepts of biodiversity knowledge have
been provided in the schema for biological collection records (Access to Biological Collection
Data, ABCD; Holetschek, Dröge, Güntsch, & Berendsohn, 2012) or the Darwin Core Standard for
biodiversity data (DwC; Wieczorek et al., 2012). Both DwC and ABCD are ratified standards of the
Biodiversity Information Standards (TDWG, http://www.tdwg.org), which is a global network to
support the development and wide adoption of exchange standards for biodiversity data. These
terms may be used for defining columns in data tables that contain measurement values, units and
categorical levels, taxon names, variables such as sex or life stage, information on time and
date of observation and methodological details (Robertson, Döring, Wieczorek, DeGiovanni, &
Vieglais, 2009). A suite of terminology extensions links to and expands the capacities of DwC
(Wieczorek et al., 2012). Of particular importance for trait data is the ‘MeasurementOrFact’
extension, which typically would be used in database management and bioinformatics to structure
trait observations (Parr et al., 2016).

While the above‐mentioned standards provide terms and concept definitions, and the logical
relationships among them, they do not prescribe an explicit structure for trait data. Based on
the terms of DwC, the Extensible Observation Ontology (OBOE, Madin et al., 2007; Schildhauer et
al., 2016) formalizes observations and measurements into a machine‐readable ontology, thus being
easily integrated into larger database management systems. By applying this scheme for plant
traits, Kattge, Ogle, et al. (2011) propose a generic database structure that covers most
potential use cases of trait‐based ecology. This data structure is built around a central data
table that contains observations of individual plants linked to several measurements of traits
via identifiers. The observations are also linked to a taxonomy and metadata descriptors of the
observation context, like location or experimental treatment. Kissling et al. (2018) discuss
different ontologies (including OBOE) that formalize the structure of observation data and attest
that for the use cases of trait data these ontologies are still difficult to integrate.

The Encyclopedia of Life (EOL) has proposed TraitBank (Parr et al., 2016) as a standard structure
for uploading data on physiological and life‐history traits of all kingdoms of life. It is to
date the most general approach of an integrated structure for trait data. The framework employs
established terms provided by the DwC and the DwC MeasurementOrFact extension (Parr et al.,
2016). Additional layers of information cover bibliographic references, multimedia archives and
ecological interactions. TraitBank invites data submissions to the EOL database in a structured
Darwin Core Archive (DwC‐A, GBIF, 2017), which is a set of simple text files (csv), a file to
specify relationships between these text files (called meta.xml), and a file for metadata
descriptions using EML (called EML.xml; see GBIF, 2017 for specifications; archives can be
validated before upload at https://tools.gbif.org/dwca-validator/).

All of these structures suggest the use of stable URIs to refer to taxon concepts. The
difficulties with keeping taxonomic references intact along with continuous changes in taxonomy
consensus are a central challenge of biodiversity data management and are beyond
the scope of this review (Franz et al., 2016). Initiatives that aim at providing a stable reference while tracking the changing taxon concepts are for instance the Catalogue of Life (https://www.catalogueoflife.org/) or the EDIT Platform for Cybertaxonomy (https://cybertaxonomy.eu/). The GBIF Backbone Taxonomy (GBIF Secretariat, 2017) collects and bundles existing terminologies into a single reference framework.

2.5 | Closing gaps to improve trait‐data reuse

In sum, we attest to a gap between the trait‐data structures developed for data curators and data managers and the data input produced by data providers. Hardly any of the aforementioned standalone or aggregated trait datasets for birds, amphibians, mammals or invertebrates employs the described standard terminologies, ontologies or data standards. As it stands, reusing these data in larger compilations or integrating them into structured database initiatives is error‐prone and labour‐intensive and the potential for a broad synthesis is diminished.

One likely reason for this lack of standardization is the complexity of the task: the proposed data structures are designed for multi‐layered, relational databases rather than for standalone datasets for which a two‐dimensional data table may suffice. In the eyes of the data provider, in most cases, any co‐variate can be appended as extra columns to the dataset. The other reason is lack of awareness of the need for trait‐data standardization among data providers, who are not trained in the demands of biodiversity data management. In addition, complying with what may be non‐intuitive data structures is an investment without clear incentive or immediate pay‐off, and hardly affordable for small and intermediate‐size research projects, especially since funders often do not require these efforts to be included in proposals.

By filling this gap, data‐brokering services (e.g. the German Federation for Biological Data; http://gfbio.org, Diepenbroek et al., 2014; Data Observation Network for Earth, DataONE, Michener et al., 2011) or data management systems for scientific projects (e.g. KNB and its open‐source database back‐end Metacat, https://knb.ecoinformatics.org/; Diversity Workbench, http://diversityworkbench.net; BEXIS2, http://bexis2.uni-jena.de/) are likely to gain importance. These services simplify and direct the standardized upload of research data and descriptive metadata into reliable and interlinked data infrastructures. The goal of such initiatives is to facilitate data reuse by providing standardization of data, for instance by mapping to unambiguous terminologies and ontologies for biodiversity data and clarifying conditions of data reuse.

Another solution for data users to access trait data in a structured way is offered by decentralized tools and tool chains to facilitate the use and analysis of trait data. For instance, the R package traits (Chamberlain et al., 2017) contains functions to extract trait data directly from their source, including Birdlife, EOL TraitBank or BetyDB. The package tr8 provides similar access to plant traits from a list of databases (including LEDA, BiolFlor and Ellenberg values; Bocci, 2015) and aggregates them into a species × traits wide‐table. FENNEC (Ankenbrand et al., 2018) is an online tool or self‐hosted service capable of extracting trait information from multiple sources for a target species community.

A more widespread implementation of ontologies would advance the possibilities to integrate datasets and reduce noise and uncertainty when aggregating data. First, groups of trait researchers must take up the task of developing consensus definitions into semantically defined ontologies that are useful for their use case. Platforms like OBO Foundry can help structure this process. Second, the reference to ontologies and thesauri must be incentivized and facilitated for individual data providers by the development of tools for matching concepts from the available ontologies to their data. Third, frameworks for providing trait data in an unambiguous and machine‐readable structure must be simplified to match the limited resources of small and intermediate research projects. This can be achieved by extending documentation or providing tools for the application of existing ontology frameworks and database structures (e.g. data validator services), and by defining easy‐to‐use standard vocabularies that enable the interoperability of data at minimal effort.

However, no unified and widely adopted terminology for primary trait‐data publications has emerged across the multiple sub‐disciplines of trait‐based research. In the following section, we propose a unified vocabulary for trait data that can serve as a minimal consensus for describing and labelling trait data. The simplicity of this standard terminology will lower the thresholds and offer high pay‐off in the visibility and reuse of published data. By establishing this as a ‘best‐practice’ in trait‐based research, trait data will eventually fulfil the FAIR guiding principles for scientific data (Wilkinson et al., 2016).

3 | INTRODUCING THE ECOLOGICAL TRAIT‐DATA STANDARD VOCABULARY

As a response to the challenges outlined above, we propose a versatile standard vocabulary for trait‐based ecological research. The Ecological Trait‐data Standard Vocabulary (ETS) is accessible at https://terminologies.gfbio.org/terms/ets/pages/ and combines terms of DwC with newly defined terms to cover the variety of trait‐based approaches and their different needs to report measurement detail. Rather than prescribing a data structure or exchange format, the vocabulary is intended as a more inclusive terminology that can be used in three major use cases:

1. by data providers: for publication of standardized primary data on open‐access data repositories, or for labelling project‐specific data for local use and exchange with collaborators, e.g. in two‐dimensional data tables or project databases,
2. by data users and data curators: as a consensus vocabulary when compiling data from distributed sources into aggregate datasets, e.g. to map standardized columns and refer to taxa and trait definitions in a uniform way, and
3. by data managers: in developing data exchange formats between online resources, web services and software tools, e.g. when providing database queries via a web service or defining input and output formats of software packages.

All terms may be applied to describe columns of a data table (Figure 2; see Appendix S2 for best‐practice principles and examples for publishing primary data). By applying these standard terms, data providers can ensure that the description of trait measurements uploaded into public data repositories will be unambiguous. It will facilitate interoperability of published data and enable their reuse for future data aggregation initiatives and data synthesis, while warranting long‐term accessibility.

The definitions of terms are hosted on the GFBio Terminology Service (Karam et al., 2016, https://terminologies.gfbio.org/), providing permanent and redirectable individual URIs and URLs for each term. The service can be accessed programmatically (i.e. via the API; https://terminologies.gfbio.org/api/terminologies/).

Our vocabulary offers three extensions to contain additional information on the context of the observation along with the core data in analogy to DwC extensions (‘Taxon’, ‘Measurement or Fact’, and ‘Occurrence’; see section on extensions below). Further terms are provided for dealing with typical dataset‐level information on authorship and rights of reuse of the data (based on terms of Dublin Core Metadata Initiative, DCMI), as well as for defining one's own trait concepts (see section on metadata below). Aspects not covered by the vocabulary may draw from terms provided by other existing terminologies (in particular DCMI and DwC and its extensions), or be added as user‐defined columns (which should then be clearly specified in the metadata information accompanying the dataset).

[Figure 2, panels (c) and (d). (c) Original names and unambiguous URIs (added as columns to core table); columns: verbatimScientificName, verbatimTraitName, verbatimTraitValue, verbatimTraitUnit, traitID, taxonID, measurementID, occurrenceID; example row: Agonum_ericeti, body_length_cm, 0.587, cm, http://t-sita.cesab.org/BETSI_vizInfo.jsp?trait=Body_length, http://www.gbif.org/species/5755044, 1, 001. (d) Extensions (added as columns, mapped to identifiers), shown for Taxon; columns: taxonID, taxonRank, order; example row: http://www.gbif.org/species/5755044, species, Coleoptera.]

FIGURE 2 Formats used for trait datasets: (a) taxon‐level trait data compiled from literature or aggregated from measurements are often published as a compiled species × traits wide‐table; (b) observation long‐tables are a well‐defined and tidy data format, reporting one single measurement per row and relating it to a standard trait definition and accepted taxon name; (c) additional columns may provide original names for maintaining author‐side continuity, while identifiers refer to taxa and trait concepts via unambiguous URI pointers. Additional identifiers relate each row to other layers of information on (d) the taxon resolution, the individual organism (i.e. occurrence), or the origin of or confidence in the reported measurement or fact.
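The relation between the wide‐table and long‐table formats of Figure 2 can be illustrated with a short sketch in Python. It is only one possible way to reshape a table, not a reference implementation of the ETS. The column names verbatimScientificName, verbatimTraitName, verbatimTraitValue, verbatimTraitUnit and measurementID follow Figure 2c; the core column names scientificName, traitName, traitValue and traitUnit are assumed here, as are the helper names `TRAIT_MAP` and `wide_to_long`. Apart from the Agonum_ericeti body length shown in Figure 2c, all values are invented for illustration.

```python
# Hypothetical wide table: one row per taxon, one column per trait.
# Only the Agonum_ericeti body length is taken from Figure 2c;
# "Taxon_b" and its values are made up for illustration.
wide_rows = [
    {"species": "Agonum_ericeti", "body_length_cm": "0.587", "body_mass_mg": ""},
    {"species": "Taxon_b", "body_length_cm": "1.2", "body_mass_mg": "35"},
]

# Hypothetical lookup from author-side column names to a harmonized
# trait name and unit (assumed names, not official ETS trait concepts).
TRAIT_MAP = {
    "body_length_cm": ("body_length", "cm"),
    "body_mass_mg": ("body_mass", "mg"),
}


def wide_to_long(rows, taxon_column, trait_map):
    """Reshape a species x traits wide table into one row per measurement."""
    long_rows = []
    measurement_id = 0
    for row in rows:
        for column, (trait_name, unit) in trait_map.items():
            raw_value = row.get(column, "")
            if raw_value == "":
                continue  # skip missing measurements instead of emitting empty rows
            measurement_id += 1
            long_rows.append({
                # assumed core columns
                "scientificName": row[taxon_column].replace("_", " "),
                "traitName": trait_name,
                "traitValue": float(raw_value),
                "traitUnit": unit,
                # original entries kept for author-side continuity (Figure 2c)
                "verbatimScientificName": row[taxon_column],
                "verbatimTraitName": column,
                "verbatimTraitValue": raw_value,
                "verbatimTraitUnit": unit,
                # running identifier for each single measurement
                "measurementID": measurement_id,
            })
    return long_rows


long_table = wide_to_long(wide_rows, "species", TRAIT_MAP)
for entry in long_table:
    print(entry["measurementID"], entry["scientificName"],
          entry["traitName"], entry["traitValue"], entry["traitUnit"])
```

Keeping the verbatim columns next to the harmonized ones means the original entries remain traceable after reshaping; URI columns such as traitID and taxonID could be appended in the same way once the corresponding identifiers are known.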
It provides important information that allows for the tracking of potential sources of noise or bias in measured data (e.g. variation in measurement method) or aggregated values (e.g. statistical method), as well as the source of reported facts (e.g. literature source or expert reference).
3. The Occurrence extension contains vocabulary to describe information on the observation context of individual organisms, such as sex, life stage or age. This also includes the method of sampling and preservation, as well as the date and geographical location, which provide an important resource to analyse trait variation due to differences in space and time.

These additional layers of information can either be added as extra columns to the core dataset or kept in separate data sheets, thus avoiding redundancy and duplication of content. A unique identifier links to these other datasheets, encoding single measurements or reported facts (measurementID) or individual organisms of a species (occurrenceID).

The concept of ‘occurrence’ is prone to cause confusion. DwC defines it as ‘An existence of an Organism at a particular place at a particular time’. Thus, any individual observed twice would have two distinct ‘occurrences’. If sampling of an individual is only performed once, this results in any occurrence being semantically identical with the individual organism (i.e. the DwC term ‘organism’). Some data types directly refer to existing global identifiers for occurrence IDs, e.g. a GBIF URI or a stable identifier that references the precise specimen at a particular place and time from which the measurement was taken (Groom, Hyam, & Güntsch, 2017; Güntsch et al., 2017). Also, as ‘occurrence’ is strictly defined by a date‐time event, it may be identical to the common‐sense concept of ‘observation’. As such, data entries for location of sampling (provided in column locationID) and sampling campaigns (eventID), which are often recorded and published along with trait data, are tightly linked to the concept of ‘occurrence’. As occurrence is the narrower term and the key concept for linking an individual organism to a location and sampling event in DwC, and since it is indeed relevant to distinguish between multiple ‘occurrences’ of the same organism in some trait‐based research applications, the ETS sticks to this terminology.

Identifiers can also be used to provide a structure within the measurement data table, e.g. to link rows of measurements on the same individual (by having entries share the same ID in column occurrenceID). Similarly, the values of multivariate measurements can be linked by using the same measurementID for several rows.

The terms of the extensions draw from terms of the DwC extensions of particular relevance for trait data. See the documentation of the ETS for further detail on the use of extensions.

3.4 | Specification of metadata

Dataset‐level information about structure, provenance of data, authorship and data ownership, as well as terms of use should be considered when sharing and working with trait datasets (Kissling et al., 2018; Michener, 2006). In the case of primary measurement data, this information usually applies to the entire trait dataset, and would be stored along with the published data as metadata entered in a template provided by the file hosting service. To facilitate interoperability and computational evaluation of metadata, specific standards for metadata may be provided, e.g. by applying Ecological Metadata Language (EML, KNB, 2011). Whenever data from different sources are compiled into a single dataset, metadata information would become part of the resulting data table, as each data entry would have to maintain reference to the original data provider and conditions of reuse of these data. This can be achieved by appending the metadata terms as columns to the core dataset, or by linking to a secondary data table via an unambiguous datasetID (e.g. a URI pointing to the source DOI) and a descriptive datasetName (e.g. a descriptive name for the source). The ETS metadata vocabulary provides terms for a minimal set of information that should be provided along with trait data. The suggested terms originate from Dublin Core Metadata Initiative (DCMI), and are widely compatible with terms provided by the DataCite Metadata Schema (DataCite Metadata Working Group, 2019). The terms can be extended and complemented by using terms from these resources.

In order to ensure traceability, the metadata of any dataset that employs the ETS should refer to the specific online version that was used to build the dataset, e.g. by entering ‘Schneider, F.D., Jochum, M., Le Provost, G., Penone, C., Ostrowski, A. and Simons, N.K., 2019, Ecological Traitdata Standard Vocabulary v0.10, https://doi.org/10.5281/zenodo.2605377, URL: https://terminologies.gfbio.org/terms/ets/pages/’ in the metadata field conformsTo. Wherever referring to individual terms of the vocabulary in publications or metadata, this should be done via their individual URIs.

4 | DISCUSSION

To serve the demand for the standardization and harmonization of ecological trait data which has arisen from a growing number of distributed datasets of different research contexts, we propose a versatile vocabulary for the publication of new datasets, for the creation of data compilations, and for the exchange and handling of trait data in the context of the semantic web.

Consensus building on how traits are to be used and evaluated is currently under way in several fields of ecological research with their taxonomic focus and project‐specific questions (Garnier et al., 2017; Kissling et al., 2018; Moretti et al., 2017; Pey et al., 2014). Such community discussions on trait definitions and measurement practices are leading to a better quality of data, naturally. However, they still require a stronger linkage into the global biodiversity data initiatives. With our proposal of an Ecological Trait‐data Standard Vocabulary (ETS), we aim to capture the common core concept of trait data in a single resource terminology and provide a starting point for the development of a joint language and terminology around traits as a cross‐sectoral topic of ecological and evolutionary research. To enable the ETS to capture the different approaches in trait‐based research
across fields, we invite researchers to contribute to future versions of the standard vocabulary and develop their own applications and ontologies that interact with it. Development will also aim at linking the initiative to the joint efforts for biodiversity data terminologies, in particular within Biodiversity Information Standards (TDWG).

Data released according to consensus standards, especially if published under open‐access licenses, are more easily reused in compilations and synthesis studies. By providing the ETS, an easy‐to‐use vocabulary for trait‐based research, the investment of time and resources in trait‐data standardization before publication will be mitigated for individual researchers and small research projects. A well‐defined minimal vocabulary for metadata will also ensure that authorship and terms of use are appropriately documented along the data life cycle. However, for these incentives to take effect, data publications and data citations must become viewed as a valid scientific contribution to the community and recognized in the professional evaluation of individual researchers (Costello, 2009; Roche, Kruuk, Lanfear, & Binning, 2015).

At the community level, shifting the task of standardization from the data‐user side to the data‐owner side yields great gain in accuracy and reduces the risk of misinterpretation. For instance, measurement results depend very much on the precise methodology used and often systematic biases could be corrected for when providing an unambiguous definition. On the other hand, plausibility checks and evaluation of statistical methods, e.g. for aggregating trait values to the species level, can only be done in comparison across a wide array of datasets. Currently, these ‘big data’ volumes are only available in centralized databases. However, to establish a best practice of data aggregation, an exploration and evaluation of different methods for quality assessment and quality control should be subject to a community discussion. This is only possible with large quantities of distributed data being available in a harmonized way. The ETS facilitates such a community‐driven comparison.

Without clearly defined terms and concepts, handling of large amounts of trait data by computational assistance systems for scientific analysis (‘e‐Science’) will be massively hampered (Wilkinson et al., 2016). The ETS represents an important building block for a unified mode to ease data exchange between web services and software packages and thus facilitates the development of a software toolchain for the trait‐data lifecycle. Having well‐defined terms is also a key precondition for developing exchange formats between large database initiatives and biodiversity data archives. Even further downstream, readying the primary data for the semantic web via references to ontologies and data standards will ease the application of automatized big‐data mining and machine‐learning techniques.

groups, ecosystem types or regions. These distributed data are heterogeneous in form and description, hampering endeavours to harmonize, compile and analyse these data.

Using a standard vocabulary with globally accessible definitions of terms would allow distributed trait data to be more easily reused and harmonized into aggregated datasets. The biggest challenge in future standardization of trait data may be consensus building for standard terms, the establishment of incentives and the development of tools for a user‐side standardization before the publication of data. This requires significant effort, but it returns great scientific benefit by enabling data‐heavy synthesis for a general understanding of biodiversity and ecosystem functioning.

ACKNOWLEDGEMENTS

Thanks to all respondents to an internal online survey on trait data for the Biodiversity Exploratories project and to Diana Bowler, Klaus Birkhofer, Runa Boeddinghaus, Markus Fischer, Jens Kattge (and the TRY Steering Committee), Felicitas Löffler, Catrin Westphal and two anonymous reviewers for comments on the manuscript drafts and pre‐print, as well as the Ecological Trait‐data Standard vocabulary. We are grateful to the organizers and participants of the Open Traits workshop in New Orleans, USA, in August 2018. We thank the past and present scientific coordinators, local managers and data managers of the Biodiversity Exploratories program for their work, and Markus Fischer, Eduard Linsenmair, Dominik Hessenmöller, Daniel Prati, Ingo Schöning, François Buscot, Ernst‐Detlef Schulze, Wolfgang W. Weisser and the late Elisabeth Kalko for their role in setting up the Biodiversity Exploratories program. The work has been partly funded by the DFG Priority Program 1374 ‘Infrastructure‐Biodiversity‐Exploratories’ (DFG‐Refno. Po362/18‐3, MA7144/1‐1, WE3081/21‐1, KO2209/12‐2); MMG obtained funding from Swiss National Science Foundation (SNF 310030E‐173542/1); MJ was supported by the German Research Foundation within the framework of the Jena Experiment (FOR 1451) and by the Swiss National Science Foundation.

AUTHORS’ CONTRIBUTIONS

F.D.S., A.O., C.P. and N.K.S. conceived the idea and developed the vocabulary for the trait‐data standard with significant contributions of M.J. and G.L.P.; C.P. and F.D.S. curated the living spreadsheet; A.G. and D.F. implemented the vocabulary in the GFBio terminology service; all authors contributed critically to the structure and content of the manuscript and gave final approval for publication.
ORCID

Florian D. Schneider https://orcid.org/0000-0002-1494-5684
David Fichtmueller https://orcid.org/0000-0002-0829-5849
Martin M. Gossner https://orcid.org/0000-0003-1516-6364
Anton Güntsch https://orcid.org/0000-0002-4325-4030
Malte Jochum https://orcid.org/0000-0002-8728-1145
Birgitta König‐Ries https://orcid.org/0000-0002-2382-9722
Gaëtane Le Provost https://orcid.org/0000-0002-1643-6023
Peter Manning https://orcid.org/0000-0002-7940-2023
Andreas Ostrowski https://orcid.org/0000-0002-2033-779X
Caterina Penone https://orcid.org/0000-0002-8170-6659
Nadja K. Simons https://orcid.org/0000-0002-2718-7050

REFERENCES

Allan, E., Manning, P., Alt, F., Binkenstein, J., Blaser, S., Blüthgen, N., … Fischer, M. (2015). Land use intensification alters ecosystem multifunctionality via loss of biodiversity and changes to functional composition. Ecology Letters, 18(8), 834–843. https://doi.org/10.1111/ele.12469
Alliance of German Science Organisations (2010). Principles for the handling of research data. Retrieved from https://www.wissenschaftsrat.de/download/archiv/Allianz-Principles_Research_Data_2010.pdf (accessed date November 9, 2017).
Ankenbrand, M. J., Hohlfeld, S. C. Y., Weber, L., Foerster, F., & Keller, A. (2018). FENNEC ‐ Functional Exploration of Natural Networks and Ecological Communities. bioRxiv, 194308. https://doi.org/10.1101/194308
Bach, K., Schäfer, D., Enke, N., Seeger, B., Gemeinholzer, B., & Bendix, J. (2012). A comparative evaluation of technical solutions for long‐term data repositories in integrative biodiversity research. Ecological Informatics, 11, 16–24. https://doi.org/10.1016/j.ecoinf.2011.11.008
Berners‐Lee, T., Hendler, J., & Lassila, O. (2001). The semantic web. Scientific American, 284(5), 28–37. https://doi.org/10.1038/scientificamerican0501-34
Bocci, G. (2015). tr8: An r package for easily retrieving plant species traits. Methods in Ecology and Evolution, 6(3), 347–350. https://doi.org/10.1111/2041-210X.12327
Chamberlain, S., Foster, Z., Bartomeus, I., LeBauer, D., & Harris, D. (2017). traits: Species trait data from around the web (version 0.3.0). Retrieved from https://cran.r-project.org/web/packages/traits/index.html.
Cornelissen, J. H. C., Lavorel, S., Garnier, E., Díaz, S., Buchmann, N., Gurvich, D. E., … Poorter, H. (2003). A handbook of protocols for standardised and easy measurement of plant functional traits worldwide. Australian Journal of Botany, 51(4), 335–380. https://doi.org/10.1071/BT02124
Costello, M. J. (2009). Motivating online publication of data. BioScience, 59(5), 418–427. https://doi.org/10.1525/bio.2009.59.5.9
DataCite Metadata Working Group. (2019). DataCite metadata schema documentation for the publication and citation of research data. Version 4.2. DataCite e.V. https://doi.org/10.5438/bmjt-bx77
de Bello, F., Lavorel, S., Díaz, S., Harrington, R., Cornelissen, J. H. C., Bardgett, R. D., … Harrison, P. A. (2010). Towards an assessment of multiple ecosystem processes and services via functional traits. Biodiversity and Conservation, 19(10), 2873–2893. https://doi.org/10.1007/s10531-010-9850-9
Díaz, S., Kattge, J., Cornelissen, J. H. C., Wright, I. J., Lavorel, S., Dray, S., … Gorné, L. D. (2016). The global spectrum of plant form and function. Nature, 529(7585), 167–171. https://doi.org/10.1038/nature16489
Diaz, S., Quetier, F., Caceres, D. M., Trainor, S. F., Perez‐Harguindeguy, N., Bret‐Harte, M. S., … Poorter, L. (2011). Linking functional diversity and social actor strategies in a framework for interdisciplinary analysis of nature’s benefits to society. Proceedings of the National Academy of Sciences, 108(3), 895–902. https://doi.org/10.1073/pnas.1017993108
Diepenbroek, M., Glöckner, F. O., Grobe, P., Güntsch, A., Huber, R., König‐Ries, B., Tolksdorf, R. (2014). Towards an integrated biodiversity and ecological research data management and archiving platform: The German Federation for the Curation of Biological Data (GFBio). In GI‐Jahrestagung (pp. 1711–1721).
Franz, N. M., Chen, M., Kianmajd, P., Yu, S., Bowers, S., Weakley, A. S., & Ludäscher, B. (2016). Names are not good enough: Reasoning over taxonomic change in the Andropogon complex. Semantic Web, 7(6), 645–667. https://doi.org/10.3233/SW-160220
Gallagher, R., Falster, D. S., Maitner, B., Salguero-Gomez, R., Vandvik, V., Pearse, W., … Enquist, B. (2019). The open traits network: using open science principles to accelerate trait-based science across the tree of life. EcoEvoRxiv. https://doi.org/10.32942/osf.io/kac45
Garnier, E., Stahl, U., Laporte, M.‐A., Kattge, J., Mougenot, I., Kühn, I., … Klotz, S. (2017). Towards a thesaurus of plant characteristics: An ecological contribution. Journal of Ecology, 105(2), 298–309. https://doi.org/10.1111/1365-2745.12698
GBIF (2017). Darwin Core Archives – How-to Guide, version 2.0, released on 9 May 2011, (contributed by Remsen, D, Braak, K, Döring, M, Robertson, T), Copenhagen: Global Biodiversity Information Facility. Retrieved from https://github.com/gbif/ipt/wiki/DwCAHowToGuide
GBIF Secretariat. (2017). GBIF backbone taxonomy. https://doi.org/10.15468/39omei
Gkoutos, G. V., Schofield, P. N., & Hoehndorf, R. (2012). The Units Ontology: A tool for integrating units of measurement in science. Database, 2012(bas033), 1–7. https://doi.org/10.1093/database/bas033
Gossner, M. M., Simons, N. K., Achtziger, R., Blick, T., Dorow, W. H. O., Dziock, F., … Weisser, W. W. (2015). A summary of eight traits of Coleoptera, Hemiptera, Orthoptera and Araneae, occurring in grasslands in Germany. Scientific Data, 2, 150013. https://doi.org/10.1038/sdata.2015.13
Grime, J. P. (2001). Plant Strategies, Vegetation Processes, and Ecosystem Properties. John Wiley & Sons.
Groom, Q., Hyam, R., & Güntsch, A. (2017). Data management: Stable identifiers for collection specimens. Nature, 546(7656), 33. https://doi.org/10.1038/546033d
Gruber, T. R. (1995). Toward principles for the design of ontologies used for knowledge sharing? International Journal of Human‐Computer Studies, 43(5), 907–928. https://doi.org/10.1006/ijhc.1995.1081
Güntsch, A., Hyam, R., Hagedorn, G., Chagnoux, S., Röpert, D., Casino, A., … Triebel, D. (2017). Actionable, long‐term stable and semantic web compatible identifiers for access to biological collection objects. Database, 2017, https://doi.org/10.1093/database/bax003
Guralnick, R. P., Zermoglio, P. F., Wieczorek, J., LaFrance, R., Bloom, D., & Russell, L. (2016). The importance of digitized biocollections as a source of trait data and a new VertNet resource. Database, 2016(baw158), 1–13. https://doi.org/10.1093/database/baw158
Hoehndorf, R., Alshahrani, M., Gkoutos, G. V., Gosline, G., Groom, Q., Hamann, T., … Weiland, C. (2016). The flora phenotype ontology (FLOPO): Tool for integrating morphological traits and phenotypes of vascular plants. Journal of Biomedical Semantics, 7, 65. https://doi.org/10.1186/s13326-016-0107-8
Holetschek, J., Dröge, G., Güntsch, A., & Berendsohn, W. G. (2012). The ABCD of primary biodiversity data access. Plant Biosystems – an International Journal Dealing with All Aspects of Plant Biology, 146(4), 771–779. https://doi.org/10.1080/11263504.2012.740085
Jaiswal, P., Avraham, S., Ilic, K., Kellogg, E. A., McCouch, S., Pujar, A., … Zapata, F. (2005). Plant Ontology (PO): A controlled vocabulary of plant structures and growth stages. Comparative and Functional Genomics, 6(7‐8), 388–397. https://doi.org/10.1002/cfg.496
Jones, K. E., Bielby, J., Cardillo, M., Fritz, S. A., O'Dell, J., Orme, C. D. L., … Purvis, A. (2009). PanTHERIA: A species‐level database of life history, ecology, and geography of extant and recently extinct mammals. Ecology, 90(9), 2648–2648. https://doi.org/10.1890/08-1494.1
Karam, N., Müller‐Birn, C., Gleisberg, M., Fichtmüller, D., Tolksdorf, R., & Güntsch, A. (2016). A terminology service supporting semantic annotation, integration, discovery and analysis of interdisciplinary research data. Datenbank‐Spektrum, 16(3), 195–205. https://doi.org/10.1007/s13222-016-0231-8
Kattge, J., Díaz, S., Lavorel, S., Prentice, I. C., Leadley, P., Bönisch, G., … Wirth, C. (2011). TRY – A global database of plant traits. Global Change Biology, 17(9), 2905–2935. https://doi.org/10.1111/j.1365-2486.2011.02451.x
Kattge, J., Ogle, K., Bönisch, G., Díaz, S., Lavorel, S., Madin, J., … Wirth, C. (2011). A generic structure for plant trait databases. Methods in Ecology and Evolution, 2(2), 202–213. https://doi.org/10.1111/j.2041-210X.2010.00067.x
Keil, J. M., & Schindler, S. (2018). Comparison and evaluation of ontologies for units of measurement. Semantic Web, 10(1), 33–51. https://doi.org/10.3233/SW-180310
Kissling, W. D., Walls, R., Bowser, A., Jones, M. O., Kattge, J., Agosti, D., … Guralnick, R. P. (2018). Towards global data products of essential biodiversity variables on species traits. Nature Ecology & Evolution, 2(10), 1531–1540. https://doi.org/10.1038/s41559-018-0667-3
Michener, W., Vieglais, D., Vision, T., Kunze, J., Cruse, P., & Janée, G. (2011). DataONE: Data Observation Network for Earth – Preserving data and enabling innovation in the biological and environmental sciences. D‐Lib Magazine, 17(1/2), https://doi.org/10.1045/january2011-michener
Moretti, M., Dias, A. T. C., de Bello, F., Altermatt, F., Chown, S. L., Azcárate, F. M., … Berg, M. P. (2017). Handbook of protocols for standardized measurement of terrestrial invertebrate functional traits. Functional Ecology, 31(3), 558–567. https://doi.org/10.1111/1365-2435.12776
Mouillot, D., Graham, N. A. J., Villéger, S., Mason, N. W. H., & Bellwood, D. R. (2013). A functional approach reveals community responses to disturbances. Trends in Ecology & Evolution, 28(3), 167–177. https://doi.org/10.1016/j.tree.2012.10.004
Mungall, C. J., Torniai, C., Gkoutos, G. V., Lewis, S. E., & Haendel, M. A. (2012). Uberon, an integrative multi‐species anatomy ontology. Genome Biology, 13(1), R5. https://doi.org/10.1186/gb-2012-13-1-r5
Oliveira, B. F., São‐Pedro, V. A., Santos‐Barrera, G., Penone, C., & Costa, G. C. (2017). AmphiBIO, a global database for amphibian ecological traits. Scientific Data, 4, sdata2017123. https://doi.org/10.1038/sdata.2017.123
Page, R. D. M. (2008). Biodiversity informatics: The challenge of linking data and the role of shared identifiers. Briefings in Bioinformatics, 9(5), 345–354. https://doi.org/10.1093/bib/bbn022
Park, C. A., Bello, S. M., Smith, C. L., Hu, Z.‐L., Munzenmaier, D. H., Nigam, R., … Reecy, J. M. (2013). The Vertebrate Trait Ontology: A controlled vocabulary for the annotation of trait data across species. Journal of Biomedical Semantics, 4(1), 13. https://doi.org/10.1186/2041-1480-4-13
Parr, C. L., Dunn, R. R., Sanders, N. J., Weiser, M. D., Photakis, M., Bishop, T. R., … Gibb, H. (2017). GlobalAnts: A new database on the geography of ant traits (Hymenoptera: Formicidae). Insect Conservation and Diversity, 10(1), 5–20. https://doi.org/10.1111/icad.12211
Parr, C. S., Schulz, K. S., Hammock, J., Wilson, N., Leary, P., Rice, J., &
Kleyer, M., Bekker, R. M., Knevel, I. C., Bakker, J. P., Thompson, Corrigan, R. J. (2016). TraitBank: Practical semantics for organism at-
K., Sonnenschein, M., … Peco, B. (2008). The LEDA Traitbase: tribute data. Semantic Web, 7(6), 577–588. https://doi.org/10.3233/
A database of life‐history traits of the Northwest European SW-150190
flora. Journal of Ecology, 96(6), 1266–1274. https://doi. Pérez‐Harguindeguy, N., Díaz, S., Garnier, E., Lavorel, S., Poorter, H.,
org/10.1111/j.1365-2745.2008.01430.x Jaureguiberry, P., … Cornelissen, J. H. C. (2013). New handbook for stan-
KNB (2011). Ecological Metadata Language (EML) specification. dardised measurement of plant functional traits worldwide. Australian
Retrieved from https://knb.ecoinformatics.org/#external//emlpa Journal of Botany, 61(3), 167–234. https://doi.org/10.1071/BT12225
rser/docs/eml-2.1.1/index.html (accessed date November 10, 2017). Pey, B., Laporte, M.‐A., Nahmani, J., Auclerc, A., Capowiez, Y., Caro, G.,
Laporte, M.‐A., Garnier, E., & Mougenot, I. (2013). A faceted search system … Hedde, M. (2014). A thesaurus for soil invertebrate trait‐based ap-
for facilitating discovery‐driven scientific activities: A use case from proaches. PLoS ONE, 9(10), e108985. https://doi.org/10.1371/journ
functional ecology. Semantics for Biodiversity (S4BioDiv 2013), 25. al.pone.0108985
Retrieved from https://hal-lirmm.ccsd.cnrs.fr/docs/00/83/17/57/ Robertson, T., Döring, M., Wieczorek, J., DeGiovanni, R., & Vieglais, D.
PDF/Proceedings_S4BioDiv-2013.pdf#page=27. (2009). Darwin core text guide. Retrieved from http://rs.tdwg.org/
Lavorel, S., & Grigulis, K. (2012). How fundamental plant functional dwc/terms/guides/text/index.htm(accessed date October 30, 2017).
trait relationships scale‐up to trade‐offs and synergies in eco- Roche, D. G., Kruuk, L. E. B., Lanfear, R., & Binning, S. A. (2015). Public
system services. Journal of Ecology, 100(1), 128–140. https://doi. data archiving in ecology and evolution: how well are we doing?
org/10.1111/j.1365-2745.2011.01914.x PLOS Biology, 13(11), e1002295. https://doi.org/10.1371/journ
Madin, J. S., Anderson, K. D., Andreasen, M. H., Bridge, T. C. L., Cairns, al.pbio.1002295
S. D., Connolly, S. R., … Baird, A. H. (2016). The Coral Trait Database, Royal Society Science Policy Centre (2012). Science as an open enterprise.
a curated database of trait information for coral species from the London, UK: The Royal Society. Retrieved from https://royalsocie
global oceans. Scientific Data, 3, 160017. https://doi.org/10.1038/ ty.org/topics-policy/projec ts/science-public-enterprise/report /.
sdata.2016.17 Salguero‐Gómez, R., Jones, O. R., Jongejans, E., Blomberg, S. P., Hodgson,
Madin, J., Bowers, S., Schildhauer, M., Krivov, S., Pennington, D., & Villa, D. J., Mbeau‐Ache, C., … Buckley, Y. M. (2016). Fast–slow continuum
F. (2007). An ontology for describing and synthesizing ecological and reproductive strategies structure plant life‐history variation
observation data. Ecological Informatics, 2(3), 279–296. https://doi. worldwide. Proceedings of the National Academy of Sciences, 113(1),
org/10.1016/j.ecoinf.2007.05.004 230–235. https://doi.org/10.1073/pnas.1506215112
McGill, B. J., Enquist, B. J., Weiher, E., & Westoby, M. (2006). Rebuilding Schildhauer, M., Jones, M. B., Bowers, S., Madin, J., Krivov, S., Pennington,
community ecology from functional traits. Trends in Ecology & Evolution, D., …O’Brien, M. (2016). OBOE: Extensible observation ontol-
21(4), 178–185. https://doi.org/10.1016/j.tree.2006.02.002 ogy. Version 1.1. KNB Data Repository. https://doi.org/10.5063/
Michener, W. K. (2006). Meta‐information concepts for ecological F11C1TTM
data management. Ecological Informatics, 1(1), 3–7. https://doi. Schneider, F., Jochum, M., LeProvost, G., Ostrowski, A., Penone, C., &
org/10.1016/j.ecoinf.2005.08.004 Simons, N. K. (2019). Ecological Trait-data Standard Vocabulary
| Methods in Ecology and Evolu
on
14 SCHNEIDER et al.
(v0.10) (Version v0.10). https://terminologies.gfbio.org/terms/ets/ Nucleic Acids Research, 39(suppl_2), W541–W545. https://doi.
pages/ https://doi.org/10.5281/zenodo.2605377 org/10.1093/nar/gkr469
Smith, B., Ashburner, M., Rosse, C., Bard, J., Bug, W., Ceusters, W., … Wickham, H. (2014). Tidy data. Journal of Statistical Software, 59(10),
Lewis, S. (2007). The OBO Foundry: Coordinated evolution of ontol- 1–23. https://doi.org/10.18637/jss.v059.i10
ogies to support biomedical data integration. Nature Biotechnology, Wieczorek, J., Bloom, D., Guralnick, R., Blum, S., Döring, M., Giovanni,
25(11), 1251–1255. https://doi.org/10.1038/nbt1346 R., … Vieglais, D. (2012). Darwin core: An evolving community‐devel-
Smith, V. S., & Blagoderov, V. (2012). Bringing collections out of the dark. oped biodiversity data standard. PLoS ONE, 7(1), e29715. https://doi.
ZooKeys, 209, 1–6. https://doi.org/10.3897/zookeys.209.3699 org/10.1371/journal.pone.0029715
Ströbel, B., Schmelzle, S., Blüthgen, N., & Heethoff, M. (2018). An au- Wilkinson, M. D., Dumontier, M., Aalbersberg, I. J. J., Appleton, G.,
tomated device for the digitization and 3D modelling of insects, Axton, M., Baak, A., … Mons, B. (2016). The FAIR Guiding Principles
combining extended‐depth‐of‐field and all‐side multi‐view imaging. for scientific data management and stewardship. Scientific Data, 3,
ZooKeys, 759, 1–27. https://doi.org/10.3897/zookeys.759.24584 160018. https://doi.org/10.1038/sdata.2016.18
Villéger, S., Brosse, S., Mouchet, M., Mouillot, D., & Vanni, M. J. (2017). Yoder, M. J., Miko, I., Seltmann, K. C., Bertone, M. A., & Deans, A. R.
Functional ecology of fish: Current approaches and future chal- (2010). A gross anatomy ontology for Hymenoptera. PLoS ONE, 5(12),
lenges. Aquatic Sciences, 79(4), 783–801. https://doi.org/10.1007/ e15991. https://doi.org/10.1371/journal.pone.0015991
s00027-017-0546-z
Violle, C., Navas, M.‐L., Vile, D., Kazakou, E., Fortunel, C., Hummel, I.,
& Garnier, E. (2007). Let the concept of trait be functional!. Oikos,
116(5), 882–892. https://doi.org/10.1111/j.0030-1299.2007.15559.x S U P P O R T I N G I N FO R M AT I O N
Walls, R. L., Athreya, B., Cooper, L., Elser, J., Gandolfo, M. A., Jaiswal, P.,
… Stevenson, D. W. (2012). Ontologies as integrative tools for plant Additional supporting information may be found online in the
science. American Journal of Botany, 99(8), 1263–1275. https://doi. Supporting Information section at the end of the article.
org/10.3732/ajb.1200222
Walls, R. L., Deck, J., Guralnick, R., Baskauf, S., Beaman, R., Blum, S., …
Wooley, J. (2014). Semantics in support of biodiversity knowledge
How to cite this article: Schneider FD, Fichtmueller D,
discovery: An introduction to the biological collections ontology and
related ontologies. PLoS ONE, 9(3), e89606. https://doi.org/10.1371/ Gossner MM, et al. Towards an ecological trait‐data standard.
journal.pone.0089606 Methods Ecol Evol. 2019;00:1–14. https://doi.
Whetzel, P. L., Noy, N. F., Shah, N. H., Alexander, P. R., Nyulas, C., org/10.1111/2041-210X.13288
Tudorache, T., & Musen, M. A. (2011). BioPortal: Enhanced function-
ality via new Web services from the National Center for Biomedical
Ontology to access and use ontologies in software applications.