
Review Article

https://doi.org/10.1038/s42256-020-0217-y

Ensemble deep learning in bioinformatics


Yue Cao1,2, Thomas Andrew Geddes2,3, Jean Yee Hwa Yang1,2 and Pengyi Yang1,2,4 ✉

1School of Mathematics and Statistics, University of Sydney, Sydney, New South Wales, Australia. 2Charles Perkins Centre, University of Sydney, Sydney, New South Wales, Australia. 3School of Environmental and Life Sciences, University of Sydney, Sydney, New South Wales, Australia. 4Computational Systems Biology Group, Children's Medical Research Institute, University of Sydney, Westmead, New South Wales, Australia. ✉e-mail: [email protected]

The remarkable flexibility and adaptability of ensemble methods and deep learning models have led to the proliferation of their application in bioinformatics research. Traditionally, these two machine learning techniques have largely been treated as independent methodologies in bioinformatics applications. However, the recent emergence of ensemble deep learning—wherein the two machine learning techniques are combined to achieve synergistic improvements in model accuracy, stability and reproducibility—has prompted a new wave of research and application. Here, we share recent key developments in ensemble deep learning and look at how their contribution has benefited a wide range of bioinformatics research from basic sequence analysis to systems biology. While the application of ensemble deep learning in bioinformatics is diverse and multifaceted, we identify and discuss the common challenges and opportunities in the context of bioinformatics research. We hope this Review Article will bring together the broader community of machine learning researchers, bioinformaticians and biologists to foster future research and development in ensemble deep learning, and inspire novel bioinformatics applications that are unattainable by traditional methods.

Bioinformatics, an interdisciplinary field of research, is at the centre of modern molecular biology, where computational methods are developed and utilized to transform biological data into knowledge and translate them for biomedical applications. Among the various computational methods utilized in bioinformatics research, machine learning, a branch of artificial intelligence characterized by data-driven model building, has been the key enabling computational technology1. At the forefront of machine learning, ensemble learning and deep learning have independently made a substantial impact on the field of bioinformatics through their widespread applications, from basic nucleotide and protein sequence analysis to systems biology2,3.

Until recently, ensemble and deep learning models have largely been treated as independent methodologies in bioinformatics applications. The fast-growing synergy between these two popular techniques, however, has attracted a new wave of development and application of next-generation machine learning methods referred to as ensemble deep learning (Fig. 1a). The root of ensemble deep learning can be traced back two decades, when ensembles of neural networks were found to reduce generalization error4. However, the recent resurgence of ensemble deep learning models has brought about new ideas, algorithms, frameworks and architectures that substantially enrich the old paradigm. Through its novel application to a wide range of biological and biomedical research, ensemble deep learning is unleashing its power in dealing with key challenges, including small sample size, high-dimensionality, imbalanced class distribution, and noisy and heterogeneous data generated from diverse cellular and biological systems using an array of high-throughput omics technologies. Together, these computational, methodological and technological undertakings and breakthroughs are leading a phenomenal transformation of bioinformatics.

Both ensemble learning and deep learning methods have been extensively studied and reviewed in the context of bioinformatics applications5,6. However, the emergence of ensemble deep learning and its application in bioinformatics has yet to be documented. With the aim of providing a reference point to foster research in the increasingly popular field of ensemble deep learning and its application to various challenges in bioinformatics, in this Review Article we revisit the foundation of ensemble and deep learning, and summarize and categorize the latest developments in ensemble deep learning. This is followed by a survey of ensemble deep learning applications in bioinformatics. We then discuss the remaining challenges and opportunities that we hope will inspire future research and development across multiple disciplines.

Basics of ensemble and deep learning
Ensemble learning refers to a class of strategies where, instead of building a single model, multiple 'base' models are combined to perform tasks such as supervised and unsupervised learning7. Classic ensemble methods for supervised learning fall into three categories: bagging-, boosting- and stacking-based methods. In bagging8, individual base models are trained on subsets of data sampled randomly with replacement (Fig. 1b). In boosting9, models are trained sequentially (Fig. 1c), where subsequent models focus on previously misclassified samples. In stacking, a meta-learner is trained to optimally combine the predictions made by base models10. Like supervised ensemble learning, conventional unsupervised ensemble learning, such as ensemble clustering11, also relies on the generation and integration of base models (Fig. 1d). While variants of these, including the more advanced methods reviewed in the next section, have also been used in ensemble learning, a guiding principle in designing ensemble methods has been 'many heads are better than one'12.
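
To make the three families concrete, the following is a minimal sketch using scikit-learn; the synthetic dataset, estimator choices and all parameter values are illustrative assumptions rather than a prescription from the works cited above.

```python
# Sketch of the three classic supervised ensemble families (Fig. 1b-d
# analogues) using scikit-learn; data and settings are purely illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Bagging: base models trained on bootstrap samples of the data.
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=10)

# Boosting: models trained sequentially, re-weighting misclassified samples.
boosting = AdaBoostClassifier(n_estimators=10)

# Stacking: a meta-learner combines the base model predictions.
stacking = StackingClassifier(
    estimators=[('tree', DecisionTreeClassifier()),
                ('logit', LogisticRegression(max_iter=1000))],
    final_estimator=LogisticRegression(max_iter=1000))

for model in (bagging, boosting, stacking):
    model.fit(X, y)
```
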
Deep learning, a branch of machine learning, is rooted in artificial neural networks13. The most fundamental architecture of deep learning models is the densely connected neural network (DNN), consisting of a series of layers of neurons, each of which is connected to all neurons in the previous layer14. More sophisticated models expand on the basic architectures. In convolutional neural networks (CNNs)15, each layer comprises a series of filters that 'slide over' the output of the previous layer to extract local features across different parts of the input. In recurrent neural networks (RNNs)16, circuits are created to feed the output of a layer back into the same layer along with new input, allowing the model to act on dependencies between upstream and downstream values in a sequence. Variants of RNNs have been proposed to enable more effective learning in long-term dependency tasks, with the two most common ones being long short-term memory (LSTM)17 and gated recurrent unit (GRU)18. In residual neural networks (ResNets)19, shortcuts between upstream and downstream layers are introduced to improve the effectiveness of backpropagation in networks with many hidden layers. In autoencoders20, networks are constructed with an encoder and a decoder that together learn a more efficient latent-space representation of the original higher-dimensional data. Although the difference between traditional neural networks and deep learning may seem elusive, the latter is increasingly defined by its unique architectures and ability to learn complex data representations that are beyond the capacity of classic models21.
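
As a concrete illustration of the encoder–decoder idea, below is a minimal autoencoder sketch in PyTorch; the layer sizes, latent dimension and toy batch are arbitrary assumptions made for illustration.

```python
# Minimal autoencoder sketch: an encoder compresses the input into a
# latent representation and a decoder reconstructs it. Dimensions are
# illustrative placeholders only.
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, n_features=2000, n_latent=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_features, 256), nn.ReLU(),
            nn.Linear(256, n_latent))
        self.decoder = nn.Sequential(
            nn.Linear(n_latent, 256), nn.ReLU(),
            nn.Linear(256, n_features))

    def forward(self, x):
        z = self.encoder(x)      # latent-space representation
        return self.decoder(z)   # reconstruction of the input

model = Autoencoder()
x = torch.randn(8, 2000)          # a toy batch of 8 samples
loss = nn.functional.mse_loss(model(x), x)
loss.backward()
```
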
Fig. 1 | The focus of this Review Article and classic ensemble methods. a, Relationships of artificial intelligence, machine learning, deep learning, ensemble learning and bioinformatics. The red square denotes the focal point of this Review Article. b–d, Classic ensemble learning frameworks including bagging and its variants (b), boosting and its variants (c), and ensemble clustering based on data perturbation (d). X, input data.

Ensemble deep learning
Deep learning is well known for its power to approximate almost any function and increasingly demonstrates predictive accuracy that surpasses human experts. However, deep learning models are not without shortcomings: they often exhibit high variance and may fall into local loss minima during training. Indeed, ensemble methods that combine the output of multiple deep learning models have empirically been shown to achieve better generalizability than a single model22. In addition to simple ensemble approaches such as averaging the output of individual models, combining heterogeneous models enables multifaceted abstraction of data and may lead to better learning outcomes23. In this section, we categorize and summarize the most representative ensemble deep learning strategies for both supervised and unsupervised tasks.

Supervised ensemble deep learning. In this section, we summarize the key ensemble deep learning frameworks for supervised tasks.

Ensemble across multiple models. The aggregation of multiple and often independent deep learning models is the most straightforward application of ensemble deep learning to classification (Fig. 2a). As diversity of individual networks is an essential characteristic of a good ensemble model24, a variety of strategies exist to promote diversity of base networks. One approach is to encourage negative correlation in the classification error of base models25. The key motivation behind promoting negative correlation among base models is to encourage complementary learning of the training data to achieve better generalizability of the ensemble. An alternative approach to increasing base model diversity is multiple choice learning, in which each network is 'specialized' on a particular subset of data during the training step26.
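
A minimal sketch of this multi-model strategy is given below, assuming PyTorch and simple output averaging as the combination function; the base architecture and all dimensions are illustrative, and in practice each base model would be trained (often on perturbed data) before its predictions are combined.

```python
# Ensemble across multiple models (Fig. 2a): several independently
# initialized networks are built (training loop omitted) and their
# softmax outputs averaged. All names and sizes are illustrative.
import torch
import torch.nn as nn

def make_base_model(n_features=100, n_classes=3):
    return nn.Sequential(nn.Linear(n_features, 64), nn.ReLU(),
                         nn.Linear(64, n_classes))

ensemble = [make_base_model() for _ in range(5)]  # diverse random inits

@torch.no_grad()
def ensemble_predict(models, x):
    probs = torch.stack([torch.softmax(m(x), dim=-1) for m in models])
    return probs.mean(dim=0)  # average the base model predictions

x = torch.randn(4, 100)
print(ensemble_predict(ensemble, x).argmax(dim=-1))
```
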
An issue associated with training and storing multiple models is the computational and storage demand involved. To address this, methods that perform knowledge distillation have become increasingly popular27. One such implementation is based on the concept of a teacher–student network framework, where the teacher networks are selected from a pool of pre-trained networks and the student network distils the knowledge of multiple teachers into a single, often simpler network28,29. The testing phase is efficient in both storage and computation, as the samples only need to pass through a single student network.
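
The following sketch illustrates the general teacher–student idea in PyTorch; it is a generic distillation loss with a hypothetical temperature parameter T, not the specific objectives of refs 28,29.

```python
# Sketch of distilling a teacher ensemble into a single student network:
# the softened average of the teachers' outputs serves as the target.
# The temperature T and all tensors are illustrative assumptions.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits_list, T=3.0):
    # Soft targets: average of the teachers' temperature-scaled outputs.
    soft_targets = torch.stack(
        [F.softmax(t / T, dim=-1) for t in teacher_logits_list]).mean(0)
    log_probs = F.log_softmax(student_logits / T, dim=-1)
    # KL divergence between the student and teacher-ensemble distributions.
    return F.kl_div(log_probs, soft_targets, reduction='batchmean') * T * T

# Toy usage with random logits standing in for real model outputs.
student = torch.randn(4, 3)
teachers = [torch.randn(4, 3) for _ in range(5)]
loss = distillation_loss(student, teachers)
```
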
Ensemble within a single model. Ensemble strategies described in the previous section require the training of multiple models. Deep learning models are often computationally costly to train and may take days or even weeks depending on the scale of the dataset and model. Effort has therefore been made to develop 'implicit ensembles', where a single neural network can achieve an effect similar to integrating multiple network models. To this end, a group of techniques focuses on random deactivation of neurons and layers during the training process of a single model. This leads to an implicit ensemble of networks with different architectures (Fig. 2b). For example, the random deactivation of neurons, termed dropout, originally proposed as a regularization strategy30 for addressing model overfitting, is now widely known as an implicit ensemble strategy31,32. This has inspired follow-up works on the random deactivation of building blocks, termed ResBlocks, in ResNets33 and the combined random deactivation of neurons and layers34. Besides random deactivation-based methods, alternative strategies have also been explored. One popular approach is the snapshot ensemble technique, where the key idea is to save multiple versions of a single model during the training process to form an ensemble35. In a snapshot ensemble, a cyclic learning rate scheduler is utilized, where the learning rate is abruptly changed every few epochs to perturb the network and thus may lead to diversity in the snapshots of the model.

Ensemble with model branching. Single-model ensemble approaches greatly reduce training cost compared with ensembles of multiple models. However, such a reduction in computational demand comes potentially at a cost in base model diversity. Since the information captured by the lower layers of neural networks is likely to be similar across models, a group of techniques has emerged with a focus on sharing lower layers followed by 'branching' of additional layers36. These model branching approaches introduce diversity while also reducing the time and computation required to train multiple models (Fig. 2c). Besides reducing computational cost, model branching has also been adapted to address other challenges in training an ensemble. For example, the gradient can be propagated over a shorter path in a branching network, mitigating the vanishing gradient problem37. In the knowledge distillation framework, each branch acts as a student model, ensembled to form a teacher model on the fly to reduce the computationally intensive process of pre-training the teacher model38. The key commonality between these model branching network ensembles is that, by sharing information, the base networks avoid parameter search from scratch and can converge faster.
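
The following PyTorch sketch illustrates the branching pattern in its simplest form—a shared trunk feeding several independent heads whose outputs are averaged; it is a generic illustration, not the specific architectures of refs 36–38.

```python
# Ensemble by model branching (Fig. 2c): lower layers are shared and the
# network diverges into separate higher-level heads whose outputs are
# averaged. Layer sizes and branch count are illustrative.
import torch
import torch.nn as nn

class BranchingNet(nn.Module):
    def __init__(self, n_features=100, n_classes=3, n_branches=3):
        super().__init__()
        # Shared trunk: lower layers capture similar information anyway.
        self.trunk = nn.Sequential(nn.Linear(n_features, 128), nn.ReLU())
        # Independent branches provide the ensemble diversity.
        self.branches = nn.ModuleList(
            nn.Sequential(nn.Linear(128, 64), nn.ReLU(),
                          nn.Linear(64, n_classes))
            for _ in range(n_branches))

    def forward(self, x):
        h = self.trunk(x)
        logits = torch.stack([branch(h) for branch in self.branches])
        return logits.mean(dim=0)  # combine the branch predictions

out = BranchingNet()(torch.randn(4, 100))
```
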


Fig. 2 | Typical ensemble deep learning frameworks in supervised and unsupervised learning. a, Ensemble across multiple models. Each neural network
is trained separately on the dataset, usually perturbed to allow the network to learn from diverse training samples. b, Ensemble within a single model.
Common strategies for creating intrinsic variants of the network include randomly deactivating and bypassing layers (indicated by the curved arrow)
and randomly deactivating neurons (indicated by the close-up). c, Ensemble by model branching. Common strategies include sharing lower layers and
branching out to learn different higher-level features with or without weight sharing. d, Unsupervised ensemble by data perturbation. Each autoencoder
is trained with a perturbed dataset such as bootstrapping. The latent representations are extracted for clustering and combined through a consensus
function. e, Model perturbation-based unsupervised ensemble. Multiple autoencoders each with a different model architecture can be used to learn
diverse representation of the original data. f, Unsupervised ensemble within a single model. Similar to the supervised case, random deactivation of neurons
can be used to create intrinsic variants of the network.

Unsupervised ensemble deep learning. In this section, we summarize the key ensemble deep learning frameworks for unsupervised tasks.

Ensemble across multiple models. Most unsupervised ensemble deep learning methods employ autoencoders, a popular unsupervised network architecture. Similar to the supervised approach, unsupervised ensemble methods can be categorized into those that generate and combine multiple models through data and model perturbation, and those that achieve an implicit ensemble within a single model.
For methods based on data perturbation, strategies akin to bagging in supervised learning are widely used (Fig. 2d). For example, Geddes et al. used random feature projection of the input data to train a set of autoencoders to create a cluster ensemble39. Training a series of unsupervised networks with different hyper-parameters is a common ensemble strategy for methods based on model perturbation (Fig. 2e). An example extending this approach uses different activation functions and a weighting scheme to improve model accuracy40. An alternative to data and model perturbation is to use multi-view clustering when such data are available. Representative examples include multi-view representation learning using deep canonically correlated autoencoders41 and multi-view spectral clustering, where multiple embedding networks were used to represent the original data from different feature sets42.
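
A minimal sketch of the data perturbation idea is shown below; for brevity it substitutes PCA for the autoencoders and a simple co-association matrix for the consensus function, so it illustrates the ensemble logic rather than reproducing the method of ref. 39.

```python
# Data perturbation-based cluster ensemble sketch: each base model sees a
# random subset of features, the latent spaces are clustered, and the
# clusterings are combined through a co-association consensus. PCA stands
# in for an autoencoder here; all data and settings are illustrative.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA  # stand-in for an autoencoder

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2000))        # toy cells-by-genes matrix

co_assoc = np.zeros((X.shape[0], X.shape[0]))
for _ in range(10):                     # ten perturbed base models
    features = rng.choice(X.shape[1], size=200, replace=False)
    latent = PCA(n_components=16).fit_transform(X[:, features])
    labels = KMeans(n_clusters=5, n_init=10).fit_predict(latent)
    co_assoc += labels[:, None] == labels[None, :]

# Consensus: cluster the co-association (agreement) matrix itself.
consensus = KMeans(n_clusters=5, n_init=10).fit_predict(co_assoc / 10)
```
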
Ensemble within a single model. The power of autoencoders in data dimension reduction has motivated research around creating better data representations that are robust to noise in the input data. For example, a denoising autoencoder architecture was introduced in ref. 43, where the values of a random subset of neurons are masked (that is, changed to zero) during each training epoch, forcing the network to overcome the noise introduced to the data. The concept of randomly masking neurons in denoising autoencoders is analogous to the dropout method used in the supervised approach, and hence can be considered an implicit ensemble within a single model, or 'pseudo-ensemble'44, for unsupervised deep learning (Fig. 2f). In this line of research, a recent study exploits the flexibility of the dropout algorithm and embeds it in a more advanced variational autoencoder architecture45. The proposed algorithm employs a novel strategy to learn the dropout parameter, thus alleviating the need for manual tuning. Another extension in this direction is the 'stacked' denoising autoencoder, which uses multiple layers of denoising autoencoders to improve data representation46. The data representation learned by such stacked denoising autoencoders led to substantially improved classification accuracy compared with using the raw input data.
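
The masking corruption at the heart of this approach is simple to express; the PyTorch sketch below uses an arbitrary network and masking probability, illustrating the general recipe of ref. 43 rather than its exact configuration.

```python
# Denoising autoencoder sketch: a random subset of input values is zeroed
# in each training step and the network is trained to reconstruct the
# clean input. Model and masking probability are illustrative.
import torch
import torch.nn as nn

def mask_inputs(x, mask_prob=0.2):
    # Zero out a random subset of input values ('masking' corruption).
    keep = (torch.rand_like(x) > mask_prob).float()
    return x * keep

net = nn.Sequential(nn.Linear(2000, 256), nn.ReLU(),
                    nn.Linear(256, 2000))
x = torch.randn(8, 2000)
reconstruction = net(mask_inputs(x))
loss = nn.functional.mse_loss(reconstruction, x)  # target is the clean input
```
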


Theoretical advances for ensemble deep learning. While early works on the bias–variance trade-off framework laid the theoretical foundation for neural network ensembles47, recent research on ensemble deep learning mostly relies on empirical experiments due to the increasingly specialized ensemble methodologies and complex neural network architectures. Nevertheless, efforts have been made to advance the theoretical foundation of this fast-growing field48. Studies have shown the existence of multiple local minima in training neural networks, where some enjoy better generalizability than others49. This has inspired ensemble techniques such as snapshot methods that take advantage of the diversity of multiple local minima35. Theoretical justification for dropout as a form of averaging has been discussed in ref. 31, where the expectation of the gradient with dropout was shown to be the gradient of the regularized ensemble error. A recent mathematical framework provided a new perspective on dropout by relating it to a form of data augmentation50.
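
As a compact illustration of the averaging view (our paraphrase of the standard argument, not a formula reproduced from refs 31,32): training with unit-retention probability q samples binary masks m over the N units, so a single network implicitly trains the 2^N weight-sharing subnetworks f(x; m ⊙ W), and the usual test-time weight scaling approximates their average prediction:

```latex
% Weight-scaling approximation underlying the ensemble view of dropout:
% the expected prediction over Bernoulli masks is approximated by a
% single forward pass with the weights scaled by the retention rate q.
\mathbb{E}_{m \sim \mathrm{Bernoulli}(q)^{N}}\big[\, f(x;\; m \odot W) \,\big]
\;\approx\; f(x;\; q\,W)
```
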
Bioinformatics applications of ensemble deep learning
This section categorizes representative works in different areas of bioinformatics application (Table 1) and identifies their benefits, such as improving model accuracy, reproducibility, interpretability and model inference.

Sequence analysis. Biological sequence analysis represents one of the fundamental applications of computational methods in molecular biology. RNNs and their variants (for example, LSTM and GRU) are well suited to sequential data. For example, an LSTM/CNN multi-model was trained to extract distinct features to predict the pathogenic potential of DNA sequences51. Compared with DNA sequences, RNA sequences offer an additional layer of information where instructions encoded in genes are transcribed. While traditional methods rely on various manually curated RNA sequence features, ensemble deep learning enables automatic learning from raw data. One example is in predicting the localization of long non-coding RNAs, where multiple sub-networks were used to integrate distinct feature sets to maximize model performance52. In another work, a CNN/RNN ensemble was used to integrate features and raw sequence data to predict different types of translation initiation sites53, overcoming the generalizability issue of traditional methods, which can only predict a specific type of translation initiation site.
instructions encoded in genes are transcribed. While traditional fied by ref. 67, where the authors applied a multi-model approach to
methods rely on various manually curated RNA sequence features, generate initial predictions from RNA-seq gene expression profiles
ensemble deep learning enables automatic learning from raw data. of cancer samples and integrated these predictions using a DNN to
One example is in predicting localization of long non-coding RNAs, produce the final ensemble prediction.
where multiple sub-networks were used to integrate distinct feature In addition to its role in medical research, ensemble deep learn-
sets to maximize model performance52. In another work, a CNN/ ing has been used in a wide range of applications to improve under-
RNN ensemble was used to integrate features and raw sequence data standing of basic biological mechanisms from gene expression data.
to predict different types of translation initiation sites53, overcoming An example is the use of a DNN ensemble to explore the embry-
the generalizability issue of traditional methods that can only pre- onic to fetal transition process, a defining stage where cells lose the
dict a specific type of translational initiation sites. potential for regeneration68. A benefit of training multiple networks
Following transcription, messenger RNAs (mRNAs) are further is that the prediction scores from each network can be further used
translated into proteins that carry out various functions. Similar to to generate an integrative score to determine the transition state of a
RNA sequence analysis, methods relying on ensembles of multiple sample between the embryonic and adult state, a strategy that is not
sub-networks were used to integrate information from multiple possible with a single model. The utility of unsupervised ensemble
features sets to predict DNA binding sites54 and post-translational deep learning has also been demonstrated on the extraction of bio-
modification (PTM) sites55 on protein sequences. The study on logical pathway signatures69. By integrating signatures across 100
PTM site prediction has further demonstrated that features learned autoencoders through consensus clustering, the ensemble model
by ensemble models are ‘transferable’ for predicting different types detected more biological pathways with higher significance than a
of PTMs, a key property for tackling the issue of small sample size single model. Unsupervised deep learning ensembles have also been
in training data. applied to cell type identification in single-cell research. In ref. 39,
A number of ensemble deep learning methods have been developed on this front, such as classifying cancer types using CNV data and a snapshot ensemble model comprising CNNs, LSTMs and convolutional autoencoders58. The use of supervised CNN and LSTM models allows both global and local sequential features to be captured, and further integration with unsupervised convolutional autoencoders enables unsupervised pre-training, an effective component for handling small sample size59. Beyond combining different network architectures, studies have also integrated different genomic data modalities to capture distinct and complementary information. In one study, DNA sequences and their neighbouring cytosine–guanine dinucleotide (CpG) states were used as input into two sub-networks of an ensemble to explore their relationship in predicting DNA methylation states60. This has led to the identification of sequence motifs related to DNA methylation and the effect of their mutation on CpG methylation. In another study, an ensemble network that takes input data either from DNA sequences alone or with the addition of epigenetic information extracted from chromatin immunoprecipitation (ChIP) and deoxyribonuclease (DNase) sequencing was used to predict human immunodeficiency virus type 1 (HIV-1) integration sites61. The ensemble network, comprising CNNs with attention layers62, enabled the discovery of DNA sequence motifs that are important for HIV-1 integration.

Gene expression. Gene expression data, including microarray, RNA-sequencing (RNA-seq) and, recently, single-cell RNA-seq (scRNA-seq) data63–65, have been studied extensively to better understand complex diseases and to identify biomarkers that can guide therapeutic decision making. A recent study on cancer type classification demonstrated how ensemble deep learning can serve as a potential strategy to address the key challenge of reproducibility in biomarker research66. The use of a DNN ensemble in this work allowed the derivation of important genes through consensus ranking across multiple models, resulting in a robust set of biomarkers. Owing to the difficulty of obtaining patient samples, especially for rare diseases and cancer types, another common challenge in analysing gene expression data from cancers and diseases is the small sample size. The use of ensemble learning to mitigate this issue is exemplified by ref. 67, where the authors applied a multi-model approach to generate initial predictions from RNA-seq gene expression profiles of cancer samples and integrated these predictions using a DNN to produce the final ensemble prediction.

In addition to its role in medical research, ensemble deep learning has been used in a wide range of applications to improve understanding of basic biological mechanisms from gene expression data. An example is the use of a DNN ensemble to explore the embryonic-to-fetal transition process, a defining stage at which cells lose the potential for regeneration68. A benefit of training multiple networks is that the prediction scores from each network can be further used to generate an integrative score to determine the transition state of a sample between the embryonic and adult state, a strategy that is not possible with a single model. The utility of unsupervised ensemble deep learning has also been demonstrated in the extraction of biological pathway signatures69. By integrating signatures across 100 autoencoders through consensus clustering, the ensemble model detected more biological pathways with higher significance than a single model. Unsupervised deep learning ensembles have also been applied to cell type identification in single-cell research. In ref. 39, an ensemble of autoencoders was used to generate a diverse set of latent representations of scRNA-seq data for subsequent analysis.
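
The consensus-ranking idea can be sketched in a few lines of NumPy; the importance scores below are random stand-ins (in practice they might come from, for example, per-model gradient-based attributions), so this illustrates the aggregation step only, not the pipeline of ref. 66.

```python
# Consensus ranking of genes across an ensemble: each model scores every
# gene, per-model ranks are averaged, and the top consensus genes are
# retained. Scores here are random placeholders for illustration.
import numpy as np

rng = np.random.default_rng(0)
n_models, n_genes = 10, 2000
scores = rng.random((n_models, n_genes))         # stand-in importance scores

ranks = scores.argsort(axis=1).argsort(axis=1)   # per-model gene ranks
consensus_rank = ranks.mean(axis=0)              # average rank per gene
top_genes = np.argsort(-consensus_rank)[:50]     # 50 top-ranked genes
```
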

Table 1 | Categorization of recent ensemble deep learning methods in bioinformatics application

Supervised learning
  Multiple models:
    DNN — gene expression (refs 66–68); proteomics (ref. 80); systems biology (ref. 83); multi-omics (ref. 93); bioimage informatics (ref. 98)
    CNN — sequence analysis (ref. 54); genome analysis (ref. 61); structural bioinformatics (ref. 75); systems biology (refs 82,85)
    CNN + RNN — sequence analysis (refs 51,53); genome analysis (ref. 60); structural bioinformatics (refs 71,72); proteomics (ref. 79); systems biology (ref. 84); multi-omics (ref. 90)
    CNN + RNN + ResNet — sequence analysis (ref. 55); structural bioinformatics (refs 73,74,76)
    Others — sequence analysis (ref. 52); genome analysis (ref. 58); bioimage informatics (ref. 95)
  Within single model:
    CNN + RNN — genome analysis (ref. 58)
  Model branching:
    CNN — bioimage informatics (refs 96,97)
    CNN + ResNet — bioimage informatics (ref. 94)

Unsupervised learning
  Multiple models:
    Autoencoder — gene expression (refs 39,69); multi-omics (refs 91,92)
    Others — multi-omics (ref. 89)
  Within single model:
    Autoencoder — genome analysis (ref. 58)

Structural bioinformatics. Proteins are the key products of genes, and their functions and mechanisms are largely governed by protein structures encoded in amino acid sequences. Therefore, modelling and characterizing proteins from their primary amino acid sequences to secondary and tertiary structures is essential for understanding and predicting their functions70. RNNs and their architectural variants are specifically designed to capture long- and short-range interactions between sequences, and are hence well suited to decoding the relationship between amino acid sequences and the protein structures they encode. Extending on the use of a single RNN, the ensemble of RNN variants with CNNs is a common hybrid architecture in recent applications that seeks to combine the power of RNNs in analysing sequential data with that of CNNs in extracting local features71,72. The replacement of the CNN with a ResNet73, as well as the addition of residual connections between GRU and CNN modules74, has also been explored to facilitate feature propagation for improved modelling of long-range dependencies between amino acids. In these works, ensemble deep learning not only improved generalizability on independent datasets but also led to the discovery of novel features associated with protein structures.
Besides predicting protein structures, many studies have focused on directly predicting protein functions. An example of an ensemble deep learning application in this domain is illustrated by the work of Zacharaki75, who used an ensemble of CNNs for protein enzymatic function prediction. Specifically, the ensemble is a fusion of two CNNs trained separately on protein properties and amino acid features for extracting complementary information. In another example, Singh et al.76 built an ensemble deep learning model to identify residue conformations crucial to protein folding and function. While the dataset used for model training has an extreme class imbalance (1.4:1,000), the ensemble model, consisting of ResNet and LSTM modules, yielded robust performance on independent test sets without manual generation of a balanced dataset.
Proteomics. While protein structure and function prediction are essential tasks for characterizing individual proteins, technological advances in quantitative mass spectrometry (MS) have now enabled global profiling of the entire proteome in cells, tissues and species77. Computational analysis of such large-volume datasets is transforming our understanding of proteome dynamics in complex systems and diseases78.
Ensemble deep learning has been used as a key technique for addressing various aspects of proteomics data analysis. The work of Zohora et al.79 exemplifies the application of ensemble deep learning to peptide identification from a liquid chromatography–MS (LC-MS) map, a critical step for identifying and quantifying protein abundance. Specifically, a hybrid network architecture comprising both CNN and RNN modules was designed to detect sequential features along the axes during the scan of an MS map. The final model, an ensemble of multiple networks with different parameters, was shown to achieve state-of-the-art results for protein quantification. Another study proposed an ensemble of DNNs for learning from data-independent acquisition (DIA) MS data80. While conventional MS runs select only a few important peptides based on their signal levels (that is, data-dependent acquisition) for subsequent quantification, the DIA approach fragments every single peptide for improved proteome coverage. However, the DIA approach may lead to an increase in co-eluted peptides and therefore higher interference in the data. The ensemble framework was able to quantify the amount of interference between multiple peptides mapped to the same point, thereby removing interference and improving peptide identification confidence and quantification accuracy.


Systems biology. Systems biology aims to map interactions of molecular species, regulatory relationships and mechanisms to understand complex biological systems as a whole81. One key aspect of systems biology is the understanding of what and how biological molecules interact. In recent times, ensemble deep learning has been applied on this front to predict interactions among different biological molecules and entities. The application of an interpretable ensemble of CNN models for predicting binding affinity between peptides and the major histocompatibility complex is an example of ensemble deep learning in this domain82 and has important implications in the clinic. The model demonstrated good generalizability across 30 independent datasets and uncovered binding motifs with literature support. In predicting protein–protein interactions, an ensemble of DNNs trained on S. cerevisiae achieved more accurate results than other machine learning methods83. Subsequently, the model was applied to other datasets generated from different organisms, and the relative accuracy on each dataset was shown to be a good indicator of the evolutionary relationships of those organisms.

Systems biology also extends to the interaction between biological molecules and chemical compounds. In particular, the study of protein and chemical compound interaction in drug development has seen a growing number of ensemble deep learning applications. For example, Karimi et al. proposed an ensemble model that comprised various network modules for compound–protein affinity prediction84. To overcome the limited availability of labelled datasets, the model exploited abundant unlabelled compound and protein data through unsupervised pre-training. This was followed by interaction prediction on labelled data using CNN and RNN modules in the ensemble. In another work on predicting drug and target protein interactions, a CNN-based ensemble model was used to score the likelihood of interaction of randomly selected drug–protein pairs85. The trained model revealed that drugs with similar structures bind to similar target proteins, suggesting potential similarity in the effects of these drugs.
Multi-omics. Multi-omics analysis is a topic closely related to systems biology, where integrative methods are used to understand biological regulation by combining an array of omics data. There is a growing interest in multi-omics studies as it is increasingly recognized that a single type of omics data does not capture the entire landscape of the complex biological networks86.

Many conventional machine learning methods have been proposed to utilize the complementary information present across multiple modalities of omics data87,88. Most conventional approaches, however, do not account for the relationships among different omics layers. To this end, Liang et al. proposed to use an ensemble of deep belief networks to encode gene expression, miRNA expression and DNA methylation data into multiple layers of hidden variables for integrative clustering89, thereby actively exploring regulation across different omics layers. Ensembles of different deep learning architectures have also been utilized to take advantage of the unique characteristics of different data types. Using an ensemble of CNNs and LSTMs, both genomic sequences and their secondary structures can now be integrated for alternative polyadenylation site prediction on pre-mRNAs90. This addressed the gap whereby existing models overlooked RNA secondary structures, despite these being important features of the polyadenylation process. Another application in multi-omics was the use of a novel ensemble of autoencoders wherein a coupling cost was used to encourage the base autoencoders to learn from each other91. This unsupervised model allowed the integration of two vastly different data types—single-cell transcriptomics and electrophysiological profiles—and the identification of common and unique cell types across datasets.
High dimensionality and heterogeneity are both issues associated with the large number of molecular features in multi-omics datasets. The application of autoencoders is popular in dealing with these challenges. In one instance, an ensemble of autoencoders was used to extract lower-dimensional representations and integrate over 450,000 features in pan-cancer classification92. Stacking multiple deep learning models, each handling a different modality of omics data93, is another approach that avoids the feature concatenation that might otherwise exacerbate the issue of high dimensionality in datasets potentially containing tens of thousands of features.
Bioimage informatics. Traditionally, analysis of bioimages is often performed manually by field experts. With the growing number of computer vision applications demonstrating their superior performance over human experts, automatic analysis has become an increasing focus of bioinformatics studies. A primary application of ensemble deep learning in bioimage informatics is the detection of diseases such as cancers in patient images. For instance, to improve the classification of glioma from magnetic resonance images, Lu et al. embedded a branching module into ResNet for integrating multi-scale information obtained from different receptive fields of the original ResNet94. Codella et al. proposed an ensemble model that combined network architectures, including ResNet, CNN and U-Net, to segment and classify skin lesions from dermoscopic images95. It is noteworthy that the proposed model achieved a segmentation result with 95% accuracy, surpassing that of human experts, who exhibit an accuracy of around 91%. To segment cervical cell images, Song et al. performed multi-resolution extraction and colour space transformation of the images to generate diverse feature sets, leading to enhanced segmentation accuracy96.

Besides improving classification and segmentation accuracy, ensemble deep learning methods have also been explored in addressing various other challenges in bioimage analysis. For example, an ensemble network with knowledge distillation and a branching strategy was used to reduce the number of parameters in the model and therefore lower the likelihood of overfitting on small datasets97. To deal with the problem of class imbalance, Yuan et al.98 introduced an iterative regularization approach that, for a given iteration, penalizes misclassification of samples that were correctly classified in previous iterations. This method alleviated the problem of bias in favour of majority classes and preserved correctly classified minority examples.

Challenges and opportunities
The applications we have reviewed here reveal various challenges and opportunities surrounding ensemble deep learning in bioinformatics research. In the following sections, we highlight several key directions in which ensemble deep learning is likely to have increasingly important impacts.
Small sample size. Deep learning is known for its exceptional performance on data with large sample size. While modern omics technologies have enabled the profiling of tens of thousands of molecular species and biological events in a single experiment, the number of samples available is usually small owing to the cost in time and labour. Hence, bioinformatics applications are often confronted with the issue of limited sample size, causing unstable predictions and thus low reproducibility of results.

Fortunately, one essential property of ensemble methods is stability. Leveraging this key property, a number of ensemble deep learning methods have been proposed to specifically address small sample size challenges, opening up the opportunity to utilize deep learning in bioinformatics. While the most popular approach so far has been using pre-trained models, more specialized methods have also been explored. Examples include extracting intermediate features learned by the network to generate additional outputs for integration, thus stabilizing the ensemble prediction99, and encouraging cooperation among individual models through a pairwise loss, thereby reducing the variance caused by small sample size100. These methods represent promising strategies that can be explored in future lines of research.
data noise, have transformed ensemble deep learning into a new
Model interpretability. A common criticism of deep learning force, leading to remarkable and widespread breakthroughs across
models is their lack of interpretability. Besides building an accurate different fields of bioinformatics applications. Nonetheless, many of
model, gaining insight from the model is also critical in bioinfor- the advanced ensemble techniques that harness the power of recent
matics applications, since having an interpretable model of a bio- deep learning architectures remain under-explored in their appli-
logical system may lead to testable hypotheses that can be validated cation to bioinformatics. In addition, the development and appli-
through experiments. cation of models that enable interpretation of biological systems
Several studies reviewed in previous sections have already made are still in their infancy. We hope this Review Article has sparked
notable progress in this direction. For example, attention layers in thoughts on ensemble deep learning across multiple disciplines,
ensemble networks were used to identify motifs of HIV integra- and will inspire future research and applications that embraces the
tion sites61 and drug binding sites84. The stability and reproducibil- myriad of ensemble deep learning strategies to revolutionize bio-
ity offered by ensemble methods such as in feature selection104 are logical and biomedical research.
also making a substantial impact in biomarker discovery105. This is
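
The sampling step behind such approaches can be sketched as follows; the balanced-bootstrap scheme, class ratio and ensemble size below are illustrative, not the exact procedures of refs 53,54.

```python
# Sampling-based ensemble for class imbalance: each base model is trained
# on a balanced set pairing the minority class with an equal-sized random
# sample of the majority class. Data and settings are purely illustrative.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 50))
y = (rng.random(1000) < 0.05).astype(int)       # ~5% positive class

minority = np.where(y == 1)[0]
majority = np.where(y == 0)[0]

balanced_sets = []
for _ in range(10):                              # ten base training sets
    maj = rng.choice(majority, size=minority.size, replace=False)
    idx = np.concatenate([minority, maj])
    balanced_sets.append((X[idx], y[idx]))       # train one model on each
# Each base model is fitted on one balanced set and the predictions are
# combined, for example by majority voting, as in classic bagging.
```
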
Data noise and heterogeneity. Biological systems are inherently heterogeneous and noisy. This is further confounded by technical noise from various sources, including experimental protocol and omics platform. A key characteristic of ensemble methods is their robustness to data noise103, which can facilitate the reproducible extraction of biological signals from noisy and heterogeneous data. The application of methods such as denoising autoencoders also strengthens model robustness43. The integration of ensemble and deep learning methods therefore provides an opportunity to address noise and heterogeneity in biological data.

The development of multi-omics technologies has further contributed to heterogeneity within datasets, in that different molecular species measured across omics platforms must be combined and analysed integratively to understand biological systems holistically. Ensemble deep learning methods such as the multi-model approaches reviewed previously have been demonstrated to be highly effective in combining different omics data for joint inference89 and classification90. Given these intrinsic properties of data generated from biological systems, we expect ensemble deep learning methods to play an increasingly important role in omics data analysis and in integrating large-scale multi-omics data.
Model interpretability. A common criticism of deep learning models is their lack of interpretability. Besides building an accurate model, gaining insight from the model is also critical in bioinformatics applications, since having an interpretable model of a biological system may lead to testable hypotheses that can be validated through experiments.

Several studies reviewed in previous sections have already made notable progress in this direction. For example, attention layers in ensemble networks were used to identify motifs of HIV integration sites61 and drug binding sites84. The stability and reproducibility offered by ensemble methods, such as in feature selection104, are also making a substantial impact in biomarker discovery105. This is evident from the application of ensemble deep learning methods to identify molecular markers for the diagnosis of primary and metastatic cancers66 and to provide insights into normal development and cancers68. As we move from predictive to preventive biomedical research, models that offer biological insight into data will become increasingly desirable.
Choice of network architecture. The choice of network architecture is crucial for achieving optimal performance in a specific domain and application. For example, many studies choose to employ variants of the RNN such as the LSTM, which is suitable for learning sequential information in biological sequences53,72. DNN and CNN architectures, on the other hand, are shown to be suitable for biological applications that handle high-dimensional input61,66. The use of multi-model ensembles makes it possible to exploit the power of hybrid architectures or to combine heterogeneous data types in multi-omics. Examples reviewed include the ResNet/RNN hybrid used to capture the relationship between each layer of features in RNA secondary structure prediction73, and the CNN/LSTM hybrid used to learn both RNA sequences and secondary structures for joint prediction of alternative polyadenylation sites on pre-mRNAs90. While these studies demonstrate the importance and the application of specialized network architectures in bioinformatics, the exponential growth of new network architectures proposed in the computer science literature is likely to lead to many more novel applications in bioinformatics in the coming years.
Computational expense. Deep learning models typically contain large numbers of parameters, and the computational burden of generating an ensemble of multiple deep learning models can be extremely high, especially when working with large-scale omics data. Nevertheless, recent developments in ensemble deep learning have made use of the modularity of deep learning architectures and provided a panel of ensemble strategies and algorithms that enable more efficient model fitting with a substantial reduction in training time. The improvement of computer hardware and technological advances in computing methods such as distributed and federated deep learning106,107 also facilitate the application and deployment of ensemble deep learning on large-scale omics data. Given that the size and complexity of biological data are only expected to soar as technology progresses, the development of more efficient ensemble deep learning algorithms and architectures will be another crucial direction in both machine learning and bioinformatics research.

Future outlook
While the ensemble of neural networks existed long before the deep learning era, the recent development of ensemble deep learning has substantially enriched the field with novel architectures and ensemble strategies that greatly improve model accuracy, reliability and efficiency. These innovations, together with properties such as robustness to small sample size, high-dimensionality and data noise, have transformed ensemble deep learning into a new force, leading to remarkable and widespread breakthroughs across different fields of bioinformatics applications. Nonetheless, many of the advanced ensemble techniques that harness the power of recent deep learning architectures remain under-explored in their application to bioinformatics. In addition, the development and application of models that enable interpretation of biological systems are still in their infancy. We hope this Review Article has sparked thoughts on ensemble deep learning across multiple disciplines, and will inspire future research and applications that embrace the myriad of ensemble deep learning strategies to revolutionize biological and biomedical research.

Received: 21 March 2020; Accepted: 14 July 2020; Published: xx xx xxxx

References
1. Larranaga, P. et al. Machine learning in bioinformatics. Briefings Bioinform. 7, 86–112 (2006).
2. Eraslan, G., Avsec, Ž., Gagneur, J. & Theis, F. J. Deep learning: new computational modelling techniques for genomics. Nat. Rev. Genet. 20, 389–403 (2019).
3. Camacho, D. M., Collins, K. M., Powers, R. K., Costello, J. C. & Collins, J. J. Next-generation machine learning for biological networks. Cell 173, 1581–1592 (2018).
4. Hansen, L. K. & Salamon, P. Neural network ensembles. IEEE Trans. Pattern Anal. Mach. Intell. 12, 993–1001 (1990).
5. Yang, P., Hwa Yang, Y., Zhou, B. B. & Zomaya, A. Y. A review of ensemble methods in bioinformatics. Curr. Bioinform. 5, 296–308 (2010).
6. Min, S., Lee, B. & Yoon, S. Deep learning in bioinformatics. Briefings Bioinform. 18, 851–869 (2017).
7. Dietterich, T. G. Ensemble methods in machine learning. In International Workshop on Multiple Classifier Systems 1–15 (Springer, 2000).
8. Breiman, L. Bagging predictors. Mach. Learn. 24, 123–140 (1996).
9. Schapire, R. E., Freund, Y., Bartlett, P. & Lee, W. S. Boosting the margin: a new explanation for the effectiveness of voting methods. Ann. Stat. 26, 1651–1686 (1998).
10. Wolpert, D. H. Stacked generalization. Neural Netw. 5, 241–259 (1992).
11. Vega-Pons, S. & Ruiz-Shulcloper, J. A survey of clustering ensemble algorithms. Int. J. Pattern Recogn. 25, 337–372 (2011).
12. Altman, N. & Krzywinski, M. Points of significance: ensemble methods: bagging and random forests. Nat. Methods 14, 933–935 (2017).
13. Schmidhuber, J. Deep learning in neural networks: an overview. Neural Netw. 61, 85–117 (2015).
14. Rumelhart, D. E., Hinton, G. E. & Williams, R. J. Learning representations by back-propagating errors. Nature 323, 533–536 (1986).
15. Krizhevsky, A., Sutskever, I. & Hinton, G. E. ImageNet classification with deep convolutional neural networks. In Proc. 26th Int. Conf. Advances in Neural Information Processing Systems 1097–1105 (NIPS, 2012).
16. Williams, R. J. & Zipser, D. A learning algorithm for continually running fully recurrent neural networks. Neural Comput. 1, 270–280 (1989).
17. Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Comput. 9, 1735–1780 (1997).
18. Cho, K. et al. Learning phrase representations using RNN encoder–decoder for statistical machine translation. In Proc. 2014 Conf. Empirical Methods in Natural Language Processing 1724–1734 (EMNLP, 2014).
19. He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proc. 2016 IEEE Conf. Computer Vision and Pattern Recognition 770–778 (IEEE, 2016).
20. Baldi, P. Autoencoders, unsupervised learning, and deep architectures. In Proc. ICML Workshop on Unsupervised and Transfer Learning 37–49 (ICML, 2012).
21. LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).
22. Ju, C., Bibaut, A. & van der Laan, M. The relative performance of ensemble methods with deep convolutional neural networks for image classification. J. Appl. Stat. 45, 2800–2818 (2018).
23. Lee, S., Purushwalkam, S., Cogswell, M., Crandall, D. & Batra, D. Why M heads are better than one: training a diverse ensemble of deep networks. Preprint at https://arxiv.org/abs/1511.06314 (2015).
24. Granitto, P. M., Verdes, P. F. & Ceccatto, H. A. Neural network ensembles: evaluation of aggregation algorithms. Artif. Intell. 163, 139–162 (2005).
25. Liu, Y. & Yao, X. Ensemble learning via negative correlation. Neural Netw. 12, 1399–1404 (1999).
26. Lee, S. et al. Stochastic multiple choice learning for training diverse deep ensembles. In Proc. 30th Int. Conf. Advances in Neural Information Processing Systems 2119–2127 (NIPS, 2016).
27. Hinton, G., Vinyals, O. & Dean, J. Distilling the knowledge in a neural network. Preprint at http://arxiv.org/abs/1503.02531 (2015).
28. Shen, Z., He, Z. & Xue, X. MEAL: multi-model ensemble via adversarial learning. In Proc. AAAI Conf. Artificial Intelligence Vol. 33, 4886–4893 (AAAI, 2019).
29. Parisotto, E., Ba, J. & Salakhutdinov, R. Actor-mimic: deep multitask and transfer reinforcement learning. In Proc. Int. Conf. Learning Representations (ICLR, 2016).
30. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I. & Salakhutdinov, R. Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014).
31. Baldi, P. & Sadowski, P. J. Understanding dropout. In Proc. 27th Int. Conf. Advances in Neural Information Processing Systems 2814–2822 (NIPS, 2013).
32. Hara, K., Saitoh, D. & Shouno, H. Analysis of dropout learning regarded as ensemble learning. In Proc. 25th Int. Conf. Artificial Neural Networks 72–79 (ICANN, 2016).
33. Huang, G., Sun, Y., Liu, Z., Sedra, D. & Weinberger, K. Q. Deep networks with stochastic depth. In Proc. 14th European Conf. Computer Vision 646–661 (Springer, 2016).
34. Singh, S., Hoiem, D. & Forsyth, D. Swapout: learning an ensemble of deep architectures. In Proc. 30th Int. Conf. Advances in Neural Information Processing Systems 28–36 (NIPS, 2016).
35. Huang, G. et al. Snapshot ensembles: train 1, get M for free. Preprint at https://arxiv.org/abs/1704.00109 (2017).
36. Han, B., Sim, J. & Adam, H. BranchOut: regularization for online ensemble tracking with convolutional neural networks. In Proc. IEEE Conf. Computer Vision and Pattern Recognition 3356–3365 (IEEE, 2017).
37. Wang, X., Bao, A., Cheng, Y. & Yu, Q. Multipath ensemble convolutional neural network. IEEE Trans. Emerg. Topics Comput. https://doi.org/10.1109/TETCI.2018.2877154 (2018).
38. Zhu, X., Gong, S. et al. Knowledge distillation by on-the-fly native ensemble. In Proc. 32nd Int. Conf. Advances in Neural Information Processing Systems 7517–7527 (NIPS, 2018).
39. Geddes, T. A. et al. Autoencoder-based cluster ensembles for single-cell RNA-seq data analysis. BMC Bioinform. 20, 660 (2019).
40. Shao, H., Jiang, H., Lin, Y. & Li, X. A novel method for intelligent fault diagnosis of rolling bearings using ensemble deep auto-encoders. Mech. Syst. Signal Process. 102, 278–297 (2018).
41. Wang, W., Arora, R., Livescu, K. & Bilmes, J. On deep multi-view representation learning. In Proc. 32nd Int. Conf. Machine Learning 1083–1092 (ICML, 2015).
42. Huang, Z. et al. Multi-view spectral clustering network. In Proc. 28th Int. Joint Conf. Artificial Intelligence 2563–2569 (IJCAI, 2019).
43. Vincent, P., Larochelle, H., Bengio, Y. & Manzagol, P.-A. Extracting and composing robust features with denoising autoencoders. In Proc. 25th Int. Conf. Machine Learning 1096–1103 (ICML, 2008).
44. Bachman, P., Alsharif, O. & Precup, D. Learning with pseudo-ensembles. In Proc. 28th Int. Conf. Advances in Neural Information Processing Systems 3365–3373 (NIPS, 2014).
45. Antelmi, L., Ayache, N., Robert, P. & Lorenzi, M. Sparse multi-channel variational autoencoder for the joint analysis of heterogeneous data. In Proc. 36th Int. Conf. Machine Learning 302–311 (ICML, 2019).
46. Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y. & Manzagol, P.-A. Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. J. Mach. Learn. Res. 11, 3371–3408 (2010).
47. Geman, S., Bienenstock, E. & Doursat, R. Neural networks and the bias/variance dilemma. Neural Comput. 4, 1–58 (1992).
48. Bengio, Y. Learning deep architectures for AI. Found. Trends Mach. Learn. 2, 1–127 (2009).
49. Keskar, N. S., Nocedal, J., Tang, P. T. P., Mudigere, D. & Smelyanskiy, M. On large-batch training for deep learning: generalization gap and sharp minima. In Proc. 5th Int. Conf. Learning Representations (ICLR, 2017).
50. Zhao, D., Yu, G., Xu, P. & Luo, M. Equivalence between dropout and data augmentation: a mathematical check. Neural Netw. 115, 82–89 (2019).
51. Bartoszewicz, J. M., Seidel, A., Rentzsch, R. & Renard, B. Y. DeePaC: predicting pathogenic potential of novel DNA with reverse-complement neural networks. Bioinformatics 36, 81–89 (2020).
52. Cao, Z., Pan, X., Yang, Y., Huang, Y. & Shen, H.-B. The lncLocator: a subcellular localization predictor for long non-coding RNAs based on a stacked ensemble classifier. Bioinformatics 34, 2185–2194 (2018).
53. Zhang, S., Hu, H., Jiang, T., Zhang, L. & Zeng, J. TITER: predicting translation initiation sites by deep learning. Bioinformatics 33, i234–i242 (2017).
54. Zhang, Y., Qiao, S., Ji, S. & Zhou, J. Ensemble-CNN: predicting DNA binding sites in protein sequences by an ensemble deep learning method. In Proc. 14th Int. Conf. Intelligent Computing 301–306 (ICIC, 2018).
55. He, F. et al. Protein ubiquitylation and sumoylation site prediction based on ensemble and transfer learning. In Proc. 2019 IEEE Int. Conf. Bioinformatics and Biomedicine 117–123 (IEEE, 2019).
56. Feuk, L., Carson, A. R. & Scherer, S. W. Structural variation in the human genome. Nat. Rev. Genet. 7, 85–97 (2006).
57. Portela, A. & Esteller, M. Epigenetic modifications and human disease. Nat. Biotechnol. 28, 1057–1068 (2010).
58. Karim, M. R., Rahman, A., Jares, J. B., Decker, S. & Beyan, O. A snapshot neural ensemble method for cancer-type prediction based on copy number variations. Neural Comput. Appl. https://doi.org/10.1007/s00521-019-04616-9 (2019).
59. Erhan, D. et al. Why does unsupervised pre-training help deep learning? J. Mach. Learn. Res. 11, 625–660 (2010).
60. Angermueller, C., Lee, H. J., Reik, W. & Stegle, O. DeepCpG: accurate prediction of single-cell DNA methylation states using deep learning. Genome Biol. 18, 67 (2017).
61. Hu, H. et al. DeepHINT: understanding HIV-1 integration via deep learning with attention. Bioinformatics 35, 1660–1667 (2019).
62. Bahdanau, D., Cho, K. & Bengio, Y. Neural machine translation by jointly learning to align and translate. Preprint at https://arxiv.org/abs/1409.0473 (2014).
63. Yang, Y. H. & Speed, T. Design issues for cDNA microarray experiments. Nat. Rev. Genet. 3, 579–588 (2002).
64. Ozsolak, F. & Milos, P. M. RNA sequencing: advances, challenges and opportunities. Nat. Rev. Genet. 12, 87–98 (2011).
65. Kolodziejczyk, A. A., Kim, J. K., Svensson, V., Marioni, J. C. & Teichmann, S. A. The technology and biology of single-cell RNA sequencing. Mol. Cell 58, 610–620 (2015).
66. Grewal, J. K. et al. Application of a neural network whole transcriptome-based pan-cancer method for diagnosis of primary and metastatic cancers. JAMA Netw. Open 2, e192597 (2019).
67. Xiao, Y., Wu, J., Lin, Z. & Zhao, X. A deep learning-based multi-model ensemble method for cancer prediction. Comput. Methods Prog. Biomed. 153, 1–9 (2018).
68. West, M. D. et al. Use of deep neural network ensembles to identify embryonic-fetal transition markers: repression of COX7A1 in embryonic and cancer cells. Oncotarget 9, 7796–7811 (2018).
69. Tan, J. et al. Unsupervised extraction of stable expression signatures from public compendia with an ensemble of neural networks. Cell Syst. 5, 63–71 (2017).
70. Lee, D., Redfern, O. & Orengo, C. Predicting protein function from sequence and structure. Nat. Rev. Mol. Cell Biol. 8, 995–1005 (2007).
71. Li, Z. & Yu, Y. Protein secondary structure prediction using cascaded convolutional and recurrent neural networks. In Proc. 25th Int. Joint Conf. Artificial Intelligence 2560–2567 (AAAI, 2016).
72. Torrisi, M., Kaleel, M. & Pollastri, G. Deeper profiles and cascaded recurrent and convolutional neural networks for state-of-the-art protein secondary structure prediction. Sci. Rep. 9, 12374 (2019).
73. Singh, J., Hanson, J., Paliwal, K. & Zhou, Y. RNA secondary structure prediction using an ensemble of two-dimensional deep neural networks and transfer learning. Nat. Commun. 10, 5407 (2019).
74. Zhang, B., Li, J. & Lü, Q. Prediction of 8-state protein secondary structures by a novel deep learning architecture. BMC Bioinform. 19, 293 (2018).
75. Zacharaki, E. I. Prediction of protein function using a deep convolutional neural network ensemble. PeerJ Comput. Sci. 3, e124 (2017).
76. Singh, J. et al. Detecting proline and non-proline cis isomers in protein structures from sequences using deep residual ensemble learning. J. Chem. Inf. Model. 58, 2033–2042 (2018).

77. Walther, T. C. & Mann, M. Mass spectrometry-based proteomics in cell biology. J. Cell Biol. 190, 491–500 (2010).
78. Cox, J. & Mann, M. Quantitative, high-resolution proteomics for data-driven systems biology. Annu. Rev. Biochem. 80, 273–299 (2011).
79. Zohora, F. T. et al. DeepIso: a deep learning model for peptide feature detection from LC-MS map. Sci. Rep. 9, 17168 (2019).
80. Demichev, V., Messner, C. B., Vernardis, S. I., Lilley, K. S. & Ralser, M. DIA-NN: neural networks and interference correction enable deep proteome coverage in high throughput. Nat. Methods 17, 41–44 (2020).
81. Kitano, H. Computational systems biology. Nature 420, 206–210 (2002).
82. Hu, Y. et al. ACME: pan-specific peptide–MHC class I binding prediction through attention-based deep neural networks. Bioinformatics 35, 4946–4954 (2019).
83. Zhang, L., Yu, G., Xia, D. & Wang, J. Protein–protein interactions prediction based on ensemble deep neural networks. Neurocomputing 324, 10–19 (2019).
84. Karimi, M., Wu, D., Wang, Z. & Shen, Y. DeepAffinity: interpretable deep learning of compound–protein affinity through unified recurrent and convolutional neural networks. Bioinformatics 35, 3329–3338 (2019).
85. Hu, S. et al. Predicting drug–target interactions from drug structure and protein sequence using novel convolutional neural networks. BMC Bioinform. 20, 689 (2019).
86. Yang, P. et al. Multi-omic profiling reveals dynamics of the phased progression of pluripotency. Cell Syst. 8, 427–445 (2019).
87. Kim, H. J. et al. Transcriptional network dynamics during the progression of pluripotency revealed by integrative statistical learning. Nucl. Acids Res. 48, 1828–1842 (2020).
88. Ramazzotti, D., Lal, A., Wang, B., Batzoglou, S. & Sidow, A. Multi-omic tumor data reveal diversity of molecular mechanisms that correlate with survival. Nat. Commun. 9, 4453 (2018).
89. Liang, M., Li, Z., Chen, T. & Zeng, J. Integrative data analysis of multi-platform cancer data with a multimodal deep learning approach. IEEE/ACM Trans. Comput. Biol. Bioinform. 12, 928–937 (2014).
90. Arefeen, A., Xiao, X. & Jiang, T. DeepPasta: deep neural network based polyadenylation site analysis. Bioinformatics 35, 4577–4585 (2019).
91. Gala, R. et al. A coupled autoencoder approach for multi-modal analysis of cell types. In Proc. 33rd Int. Conf. Advances in Neural Information Processing Systems 9263–9272 (NIPS, 2019).
92. Zhang, X. et al. Integrated multi-omics analysis using variational autoencoders: application to pan-cancer classification. In Proc. 2019 IEEE Int. Conf. Bioinformatics and Biomedicine 765–769 (IEEE, 2019).
93. Sharifi-Noghabi, H., Zolotareva, O., Collins, C. C. & Ester, M. MOLI: multi-omics late integration with deep neural networks for drug response prediction. Bioinformatics 35, i501–i509 (2019).
94. Lu, Z. et al. The classification of gliomas based on a pyramid dilated convolution resnet model. Pattern Recognit. Lett. 133, 173–179 (2020).
95. Codella, N. C. F. et al. Deep learning ensembles for melanoma recognition in dermoscopy images. IBM J. Res. Dev. 61, 5 (2017).
96. Song, Y. et al. Accurate segmentation of cervical cytoplasm and nuclei based on multiscale convolutional network and graph partitioning. IEEE Trans. Biomed. Eng. 62, 2421–2433 (2015).
97. Rasti, R., Teshnehlab, M. & Phung, S. L. Breast cancer diagnosis in DCE-MRI using mixture ensemble of convolutional neural networks. Pattern Recognit. 72, 381–390 (2017).
98. Yuan, X., Xie, L. & Abouelenien, M. A regularized ensemble framework of deep learning for cancer detection from multi-class, imbalanced training data. Pattern Recognit. 77, 160–172 (2018).
99. Xie, J., Xu, B. & Chuang, Z. Horizontal and vertical ensemble with deep representation for classification. Preprint at https://arxiv.org/abs/1306.2759 (2013).
100. Dvornik, N., Schmid, C. & Mairal, J. Diversity with cooperation: ensemble methods for few-shot classification. In Proc. IEEE Int. Conf. Computer Vision 3723–3731 (IEEE, 2019).
101. Bzdok, D., Nichols, T. E. & Smith, S. M. Towards algorithmic analytics for large-scale datasets. Nat. Mach. Intell. 1, 296–306 (2019).
102. Yang, P. et al. Sample subset optimization techniques for imbalanced and ensemble learning problems in bioinformatics applications. IEEE Trans. Cybern. 44, 445–455 (2014).
103. Yang, P. et al. AdaSampling for positive-unlabeled and label noise learning with bioinformatics applications. IEEE Trans. Cybern. 49, 1932–1943 (2019).
104. Abeel, T., Helleputte, T., Van de Peer, Y., Dupont, P. & Saeys, Y. Robust biomarker identification for cancer diagnosis with ensemble feature selection methods. Bioinformatics 26, 392–398 (2010).
105. Pusztai, L., Hatzis, C. & Andre, F. Reproducibility of research and preclinical validation: problems and solutions. Nat. Rev. Clin. Oncol. 10, 720–724 (2013).
106. Dean, J. et al. Large scale distributed deep networks. In Proc. 26th Int. Conf. Advances in Neural Information Processing Systems 1223–1231 (NIPS, 2012).
107. Smith, V., Chiang, C.-K., Sanjabi, M. & Talwalkar, A. S. Federated multi-task learning. In Proc. 31st Int. Conf. Advances in Neural Information Processing Systems 4424–4434 (NIPS, 2017).

Acknowledgements
P.Y. was supported by an Australian Research Council (ARC) Discovery Early Career Researcher Award (DE170100759) and a National Health and Medical Research Council Investigator Grant (1173469). J.Y.H.Y. and P.Y. were supported by an ARC Discovery Project (DP170100654). Y.C. was supported by a University of Sydney Postgraduate Award. T.A.G. was supported by a postgraduate scholarship from the Research Training Program.

Author contributions
P.Y. conceptualized this work. Y.C. and P.Y. reviewed the literature and drafted the manuscript. All authors wrote and edited the Review Article.

Competing interests
The authors declare no competing interests.

Additional information
Correspondence should be addressed to P.Y.
Reprints and permissions information is available at www.nature.com/reprints.
Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© Springer Nature Limited 2020