the functional potential of its
(Meta)metabolomics: (also referred to as metabonomics, mainly in the context of research on single organisms) technologies that measure intra- and/or extracellular metabolites in and around microbial communities.
[Figure 1 plot labels: Reads (%); (D) Jensen–Shannon divergence, metaG and metaT, intra-family versus inter-family and intra-individual versus inter-individual, * marks significant differences; scheme labels: intra-individual over time (t1, t2, t3) and inter-individual (i1 to i5).]
Figure 1. Community-Wide View of the Variability of Encoded and Expressed Functions in the Human Gut Microbiome.
(A) High-level functional profiles of 1267 human gut microbiome metagenomes retrieved from the integrated
gene catalogue (IGC) of the human gut microbiome [25]. (B) High-level functional profiles from metagenomes of our own
smaller integrated multi-omics study [18] annotated using the IGC [25] in comparison to the (C) profiles from metatran-
scriptomes of the same samples. (D) Comparison of intra-individual to inter-individual and intra-family to inter-family
distances (Jensen–Shannon divergence) based on functional metagenomic (MG), metatranscriptomic (MT), and meta-
proteomic (MP) profiles [18]; *P < 0.05, Wilcoxon rank sum test. (E,F) Estimation of power to distinguish functional profiles
from members of different families based on metagenome and metatranscriptome measurements [18] applying limma/
voom assumptions and the statistical model of Bi et al. [29] (E) and van Iterson et al. [28] (F). (G) Summarizing scheme,
illustrating functional potentials with limited variability (middle) and functional expression profiles with greater plasticity
(bottom) within an individual over time (t), compared to (H) the variability between different individuals’ (i) microbiomes’
functional potentials (middle), and functional expression profiles (bottom).
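The distance comparison in panel (D) can be illustrated with a minimal sketch, assuming each functional profile is a vector of non-negative read counts over the same ordered set of functional categories. The logarithm base, the sample names, and the helper `intra_inter_distances` are assumptions made for illustration; the caption does not state whether the divergence or its square root was used, and this is not the pipeline of [18].

```python
import math
from itertools import combinations


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence (base-2 logs, range 0..1) between two profiles.

    p and q are equal-length sequences of non-negative counts or abundances;
    each is normalised to sum to 1 before comparison.
    """
    if len(p) != len(q):
        raise ValueError("profiles must cover the same functional categories")
    sum_p, sum_q = sum(p), sum(q)
    p = [x / sum_p for x in p]
    q = [x / sum_q for x in q]
    m = [(a + b) / 2 for a, b in zip(p, q)]  # mixture distribution

    def kl(a, b):
        # Kullback-Leibler divergence; terms with a == 0 contribute nothing.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def intra_inter_distances(profiles, group_of):
    """Split all pairwise distances into within-group and between-group lists.

    profiles: {sample_name: profile}; group_of: {sample_name: individual or family}.
    """
    intra, inter = [], []
    for a, b in combinations(sorted(profiles), 2):
        d = jensen_shannon_divergence(profiles[a], profiles[b])
        (intra if group_of[a] == group_of[b] else inter).append(d)
    return intra, inter


# Hypothetical toy data: two time points of individual i1, one of i2.
profiles = {"i1_t1": [60, 30, 10], "i1_t2": [58, 32, 10], "i2_t1": [20, 30, 50]}
group_of = {"i1_t1": "i1", "i1_t2": "i1", "i2_t1": "i2"}
intra, inter = intra_inter_distances(profiles, group_of)
```

The two resulting lists could then be compared with a rank-based test such as the Wilcoxon rank sum test named in the caption.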
activity of specific microbial taxa, such that oral species have very low transcript levels in stool
samples while colonic organisms are highly active [16,18]. Resolving gene expression to the
taxon of origin, and relating this to the overall activity of that taxon, should further help in
distinguishing in situ activity from noise in functional profiles. Discovery of compartment-
specific functional features, which are important in the context of health and disease [31],
may therefore be facilitated by metatranscriptomics (but see also Box 1 for a discussion of other
omics technologies).
The observation that metatranscriptomic functional profiles are more variable than might be
inferred based solely on metagenomic information suggests that nonhousekeeping genes,
even those with high genomic copy numbers, are not stably expressed in situ [10,11,16,18].
We have recently developed an approach which allows taxon-specific resolution of expressed
genes [18]. When applying this method to link functional genes to the genomes which encode
them, we observed that functions of interest may be contributed to the community-wide
phenotype by single or multiple microbial populations in the absence of observable differences
in the respective populations’ abundances [18]. The identity of these populations may differ in
different individuals, as the microbiota may have widely divergent taxonomic compositions [18].
The variability observed at the level of gene expression may very well be a reflection of
functional plasticity and a prerequisite for stable community function. Consequently, resolv-
ing functional differences at multiple omic levels to the taxa contributing them is necessary in
order to understand when and how these functions may impact human physiology.
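As an illustration of what resolving a function to its contributing taxa can look like in practice, the following minimal sketch aggregates gene-level expression by function and by population-level genome. The record layout, the function identifiers, and the bin names are hypothetical, and the code is not the method of [18]; it only shows the bookkeeping involved.

```python
from collections import defaultdict


def contributions_by_taxon(records):
    """Share of each function's total expression contributed by each taxon.

    records: iterable of (function, taxon, expression) tuples, one per gene.
    Returns {function: {taxon: fraction of that function's summed expression}}.
    Functions with zero total expression are omitted.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for function, taxon, expression in records:
        totals[function][taxon] += expression

    shares = {}
    for function, per_taxon in totals.items():
        total = sum(per_taxon.values())
        if total > 0:
            shares[function] = {t: v / total for t, v in per_taxon.items()}
    return shares


# Hypothetical example: one function expressed by two populations,
# another expressed by a single population.
records = [
    ("K00001", "binA", 30.0),
    ("K00001", "binB", 10.0),
    ("K00002", "binA", 5.0),
]
shares = contributions_by_taxon(records)
```

Comparing such shares between individuals would show whether the same function is carried by different populations in different hosts, even when its community-wide level is similar.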
The Unknowns
One challenge for microbiome research in relation to elucidating phenotypic impacts on the
host is posed by unknown taxa and functions. The overall proportion of protein-coding
genes in the human microbiome for which a molecular function cannot be predicted is still
generally high (40–70%, depending on the prediction method [18,32,33]), and this proportion
is higher the rarer a microbial gene is in the human population (Figure 2A). It is higher still
for genes encoded in taxa which are not well described or are even uncharacterized
(Figure 2B). In many recent studies, genes without known functions, or those from uncultured
taxa, have been completely ignored, because metagenomic data were analysed by mapping to
annotated reference genomes. These approaches often make inefficient use of the data [34],
are likely to introduce biases in the interpretation [35], and capture neither the large
proportion of horizontally transferred functions in the microbiome [36] nor the strain-
specific functional gene complements [37,38] which make up taxon-specific pangenomes.
Horizontally transferred and strain-specific genes may be essential [39], in particular when
they code for medically relevant functions such as antibiotic resistance [40] or toxins [41]. In this
light, the prediction of functional potential [42,43] or even metabolic outcome [44] based on
rough (i.e., genus-level) taxonomic profiles must be regarded as questionable.
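The relationship between gene rarity and annotation status described above can be sketched as a simple binning exercise. The input layout (frequency of occurrence in percent plus an annotation flag per gene) and the bin edges used in the example are assumptions for illustration only, not the values or the procedure behind Figure 2.

```python
def annotated_fraction_by_bin(genes, edges):
    """Fraction of annotated genes per frequency-of-occurrence bin.

    genes: iterable of (frequency_percent, is_annotated) pairs.
    edges: ascending bin edges; bin i covers the interval (edges[i], edges[i + 1]].
    Returns a list of (lower, upper, n_genes, annotated_fraction) tuples;
    the fraction is None for empty bins.
    """
    counts = [[0, 0] for _ in range(len(edges) - 1)]  # [n_genes, n_annotated]
    for frequency, annotated in genes:
        for i in range(len(edges) - 1):
            if edges[i] < frequency <= edges[i + 1]:
                counts[i][0] += 1
                counts[i][1] += int(annotated)
                break  # each gene falls into at most one bin
    return [
        (edges[i], edges[i + 1], n, (a / n if n else None))
        for i, (n, a) in enumerate(counts)
    ]


# Hypothetical toy data: rare genes are mostly unannotated, common genes annotated.
genes = [(0.05, False), (0.1, False), (0.1, True), (60, True), (80, True)]
result = annotated_fraction_by_bin(genes, edges=[0, 0.11, 51, 100])
```

Genes with a frequency outside all bins (for example exactly 0) are skipped in this sketch.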
[Figure 2 plot labels: annotated genes (%) and number of genes against frequency of occurrence (%), binned from 0 to 0.11 up to 51 to 100; annotation by: FOAM, all; abundance metaT (%) against frequency (%); key: genus known, function predicted.]
Figure 2. Genes of Unknown Function.
frequency of their occurrence according to the integrated gene catalogue (IGC) [25]. Annotations: ‘KO BLAST’: KEGG
orthologous group (KO) annotations included in the IGC [25]; ‘KO HMM’: HMM-based annotations using KOs [18];
‘FOAM’: HMM-based annotations using FOAM [32]; ‘eggNOG’: eggNOG-based [33] annotations included in the IGC [25];
‘MuSt HMM’: HMM-based annotations using KOs, Pfam-A-families, TIGR-families, Swiss-Prot- or MetaCyc enzymes [18];
‘all’: all annotations by either of the named methods. (B) Relationship between the number of annotated genes (by any of
the methods displayed in (A)), their relative frequency of occurrence, and the level of taxonomic assignment in the IGC [25].
(C) Frequency of occurrence [25] and maximum observed expression [18] of genes in the IGC. Pink dots highlight genes
annotated with orthologous groups or protein domains of unknown function.
Ignoring functional unknowns also limits the potential that metagenomic and metatranscrip-
tomic approaches possess in creating new knowledge. For example, approaches to compare
abundances and genomes of uncultured taxa, which contribute approximately 40% of the
metagenomic data, are well established [45,46]. Similarly, collections of orthologous groups
and protein families without known functions have been established [32,47,48], allowing for
cross-sample comparisons. These approaches facilitate the identification of biologically signif-
icant entities, for example, because they are found to be enriched or depleted in individuals with
a disease or consistently highly abundant and/or expressed. For instance, in our recent multi-
omics study, 9% of the differentially abundant transcripts (between families or between
Several experimental approaches to gain knowledge on ‘the dark matter’ of the human
microbiome have been proposed, in addition to the proven combination of classical microbio-
logical techniques with functional genomics. ‘Functional metagenomics’ involving the large-
scale in vitro screening of metagenomic sequences has been developed [49–51], including use
of microfluidics to assay millions of metagenomic variants of apparently similar genes [52].
‘Culturomics’, the combination of miniaturized cultivation and advanced sequencing
approaches, for example, to generate metagenomes from enrichment cultures, allows for
the detailed characterization of organisms that are not culturable at a traditional laboratory scale
[53,54]. The elucidation of unknowns that differ in health and disease, as well as the specific role
they play in microbiome–host interactions, is an important challenge for the coming years.
The above observations are likely a reflection of functional redundancy within the healthy
human microbiome. Functional redundancy can confer resilience [56] and therefore can
stabilize ecosystem functionality during perturbations [57], which, in the context of the human
microbiome, is generally assumed to lead to both stability and health [58]. However, the actual
relationship between functional redundancy and stability has not been studied in the human gut
microbiome, in contrast to other microbial ecosystems [59,60]. It is not even known whether
there is true redundancy, as different genomic contexts may determine the impact of genes [61]
and, within the gut microbiota, the interaction with the host [62,63]. The assumption that
functional redundancy of the microbiome is related to human health is primarily based on an
apparent relationship between taxonomic stability and the maintenance of taxonomic and
functional diversity over time [18,64,65]. However, for the human gut microbiome, it is currently
unclear whether diversity is a prerequisite for stability [66], which has been shown in other
contexts [67–69]. Functional richness has also been suggested to positively impact human
[Figure 3 plot labels: frequency (%); annotations of: single sample, IGC; (C) no. of bins against total MG depth; (D) no. of bins against total MT depth.]
Figure 3. Functional Redundancy in the Human Microbiome
number of unique genes and the frequency of their occurrence. The graph is based on the 1267 human gut
microbiome metagenomes retrieved from the integrated gene catalogue (IGC) [25]. (B) Numbers of genes with the same
functional annotation, based on KEGG orthologous groups (KO) and eggNOG [33] orthologous groups, as published with
the IGC [25]. (C) Relationship between the number of population-level genomes (‘bins’) containing genes annotated with a
function in microbial metabolism and the corresponding cumulative metagenomic (MG) depth of coverage of the genes. (D)
Relationship between the number of population-level genomes (‘bins’) expressing annotated genes and the corresponding
cumulative metatranscriptomic (MT) depth of coverage. (C,D) Graphs are based on one representative sample from a
healthy individual [18].
health [70], and decreased functional diversity has been observed in several diseases [22],
although the observed functional richness may also be influenced by colonic transit time [71]. A
higher metabolic diversity ensures digestibility of a wider range of nutrients [72] and potentially
increases overall energy harvest. Metabolic diversity may also offer a protective potential
against environmental toxic substances [3]. Despite the likely importance of functional diversity,
the exact mechanism by which the human host benefits from redundant, diverse, and/or stable
Key Figure
Roadmap for Using Functional Omics to Create New Knowledge
[Diagram labels: human gut microbiome; metagenome; metatranscriptome; meta’omics; genome collection; gene catalogue; genes; functional annotations; orthologous groups of unknown genes; DUFZX; context-dependent expression patterns; candidate functions with health-related potential; most wanted list for in vitro functional assays; functional assays; targeted experiments, model systems; intervention studies; human omics, clinical and lifestyle data; new functional knowledge.]
Figure 4. Crucial steps are the integration of reference genomes, metagenomic data collections, and de novo gene and genome reconstructions in genome and gene
collections or catalogues. Genes should be linked to functions, taxonomic occurrence, and expression in different hosts. Genes without predicted functions can be
grouped by orthology to enable comparative analyses and derive a list of ‘most wanted’ yet to be determined functions. Genes with functions that are likely to affect
human health and/or display suggestive patterns of expression in different human hosts should be validated in targeted experiments in model systems and human
intervention studies.
This work was supported by Luxembourg National Research Fund (FNR) CORE programme grants (CORE/15/BM/10404093 and CORE/16/BM/11276306) to P.W.
