
Cite this preprint version of the manuscript as:

M. Mahmud, M.S. Kaiser, A. Hussain, S. Vassanelli, "Applications of Deep Learning
and Reinforcement Learning to Biological Data," IEEE Trans. Neural Netw. Learn.
Syst., 2018, doi: 10.1109/TNNLS.2018.2790388.
Preprint: arXiv:1711.03985v2 [cs.LG], 7 Jan 2018.
© IEEE holds the copyright of this work.

Applications of Deep Learning and Reinforcement Learning to Biological Data

Mufti Mahmud1,*, M. Shamim Kaiser2,*, Amir Hussain3, Stefano Vassanelli1

1 NeuroChip Lab, University of Padova, 35131 Padova, Italy
2 IIT, Jahangirnagar University, Savar, 1342 Dhaka, Bangladesh
3 Division of Computing Science & Maths, University of Stirling, FK9 4LA Stirling, UK

* Co-first and corresponding authors. Emails: [email protected] (M. Mahmud),
[email protected] (M.S. Kaiser)

Abstract
Rapid advances in hardware-based technologies during the past decades have opened up
new possibilities for life scientists to gather multimodal data in various application
domains (e.g., Omics, Bioimaging, Medical Imaging, and [Brain/Body]-Machine
Interfaces), thus generating novel opportunities for the development of dedicated,
data-intensive machine learning techniques. Overall, recent research in Deep learning (DL),
Reinforcement learning (RL), and their combination (deep RL) promises to revolutionize
Artificial Intelligence. The growth in computational power, accompanied by faster and
larger data storage and declining computing costs, has already allowed scientists in
various fields to apply these techniques to datasets that were previously intractable owing
to their size and complexity. This review article provides a comprehensive survey of the
application of DL, RL, and deep RL techniques in mining Biological data. In addition,
we compare the performances of DL techniques when applied to different datasets across
various application domains. Finally, we outline open issues in this challenging research
area and discuss future development perspectives.

Introduction
The need for novel healthcare solutions and continuous efforts in understanding the
biological bases of pathologies have pushed extensive research in the Biological Sciences
over the last two centuries [1]. Recent technological advancements in the Life Sciences
have not only opened up possibilities to study biological systems from a holistic
perspective but also provided unprecedented access to the molecular details of living
organisms [2, 3].
Novel tools for DNA sequencing [4], gene expression [5], bioimaging [6],
neuroimaging [7], and brain-machine interfaces [8] are now available to the scientific
community. However, considering the inherent complexity of biological systems
together with their high dimensionality, diversity, and noise contamination, inferring
meaningful conclusions from these data is a huge challenge [9]. Therefore, novel
instruments that are robust, reliable, reusable, and accurate are required to process and
analyze biological big data [10]. This has encouraged numerous scientists from the life
and computing sciences to embark on a multidisciplinary approach to demystify the
functions and dynamics of living organisms, with remarkable progress in biological and
biomedical research [11]. Thus, many techniques of Artificial Intelligence (AI), in
particular machine learning (ML), have been proposed over time to facilitate
recognition, classification, and prediction of patterns in biological data [12].
Conventional ML techniques can be broadly categorized into two large sets –
supervised and unsupervised. The methods pertaining to the supervised learning
paradigm classify objects in a pool using a set of known annotations/attributes/
features. Instead, the unsupervised learning techniques form groups/clusters among the
objects in a pool by identifying their similarity and then use them for classifying the
unknowns. A third category, reinforcement learning (RL), allows a system to
learn from the experiences it gains through interacting with its environment (see section
1.2 for details).
Popular supervised methods include: Artificial Neural Network (ANN) [13] and its
variants, Support Vector Machines [14] and linear classifiers [15], Bayesian
Statistics [16], k-Nearest Neighbors [17], Hidden Markov Model [18], and Decision
Trees [19]. Also, popular unsupervised methods include: Autoencoders [20], Expectation
Maximization [21], Self-Organizing Maps [22], k-Means [23], and Fuzzy [24] and
Density-based [25] clustering.
Figure 1. A possible representation of the DL, RL, and deep RL frameworks for
biological applications. A–F. The popular DL architectures. G. Schematic diagram of
the learning framework as a part of Artificial Intelligence (AI). Broadly, AI can be
thought to have evolved in two parallel main directions – Expert Systems (ES) and ML.
ES takes expert decisions from given factual data using rule-based inference. ML
extracts features from data mainly through statistical modeling and provides predictive
output when applied to unknown data. DL, being a sub-division of ML, extracts more
abstract features from a larger set of training data, mostly in a hierarchical fashion
resembling the working principle of our brain. The other sub-division, RL, provides a
software agent which gathers experience through interactions with the environment via
some actions and aims to maximize the cumulative performance. H. Possible
applications of AI to biological data.

A large body of evidence shows that the above-mentioned methods and their
respective variants can be successfully applied to Biological data coming from various
sources, e.g., Omics (covering data from genetics and [gen/transcript/epigen/prote/
metabol]omics [26]), Bioimaging (covering data from [sub-]cellular images acquired by
diverse imaging techniques [27]), Medical Imaging (covering data from [medical/clinical/
health] imaging mainly through diagnostic imaging techniques [28]), and [Brain/Body]-
Machine Interfaces or BMI (covering electrical signals generated by the Brain and the
Muscles and acquired using appropriate sensors [29, 30]).
Broadly, AI can be thought to have evolved in two parallel main directions – Expert
Systems and ML (see the schematic diagram in Fig. 1G). Focusing on the latter, ML
extracts features from training dataset(s) and makes models with minimal or no human
intervention. These models provide predicted outputs based on test data. DL,
being a sub-division of ML, extracts more abstract features from a larger set of training
data mostly without human supervision. RL, being the other sub-division of ML, is
inspired by psychology. It provides a software agent which gathers experience based on
interactions with the environment through some actions and aims to maximize the
cumulative performance.
In recent years, DL, RL, and deep RL methods have become poised to reshape the future of
ML [31]. Over the last decade, works pertaining to DL, RL, and deep RL have been
extensively reviewed from different perspectives. In a topical review, Schmidhuber
provided a detailed timeline of significant DL developments (for both supervised and
unsupervised learning), of RL and evolutionary computation, and of DL in feed-forward
and recurrent neural networks (NNs) for RL [32]. Other reviews focus on applications of DL in
health informatics [33], biomedicine [34], and bioinformatics [35]. On the other hand,
Kaelbling et al. discussed RL from the perspective of the trade-off between exploitation
and exploration, RL’s foundation via Markov decision theory, the learning mechanism
using delayed reinforcement, construction of empirical learning models, use of
generalization and hierarchy, and reported some exemplifying RL systems
implementations [36]. Glorennec provided a brief overview of the basics of RL with
explicit descriptions of Q- and Fuzzy Q-learning [37]. With respect to applications in
solving dynamic optimization problems, Gosavi surveyed Q-learning, temporal differences,
semi-Markov decision problems, stochastic games, policy gradients, and hierarchical RL
with detailed underlying mathematics [38]. In addition, Li analyzed recent advances
in deep RL – Deep Q-Network (DQN) with its extensions, asynchronous methods,
policy optimization, reward, and planning – as well as different applications including
games (e.g., AlphaGo), robotics, chatbots, neural architecture design, natural
language processing, personalized web services, healthcare, and finance [39].
Despite the popularity of the topic and application potential to diverse disciplines, a
comprehensive review is missing that focuses on data from different Biological
application domains while providing a performance comparison across techniques. This
review is intended to fill this gap: it provides a brief overview on DL, RL, and deep RL
concepts, followed by state-of-the-art applications of these techniques and performance
comparison between various DL approaches. Finally, it identifies and outlines some
open issues and speculates about future perspectives.
As for the organization of the rest of the article, section 1 provides a conceptual
overview to the DL, RL, and deep RL techniques, thus introducing the reader to the
underlying theory; section 2 contains the state-of-the-art applications of these
techniques to various biological application domains; section 3 presents test results and
performance comparison of DL techniques applied on datasets pertaining to different
application domains; section 4 highlights open issues and hints at future perspectives;
and section 5 concludes the article.

1 Conceptual Overview
1.1 Deep Learning
The core concept of DL is to learn data representations through increasing abstraction
levels. At almost every level, more abstract representations at a higher level are learned
by defining them in terms of less abstract representations at lower levels. This type of
hierarchical learning process is very powerful as it allows a system to comprehend and
learn complex representations directly from the raw data [40], making it useful in many
disciplines [41].
Several DL architectures have been reported in the literature including: Deep Neural
Network (DNN), Recurrent Neural Network
(CNN), Deep Autoencoder (DA), Deep Boltzmann Machine (DBM), Deep Belief
Network (DBN), Deep Residual Network, Deep Convolutional Inverse Graphics
Network, etc. For the sake of brevity, only the ones widely used with Biological data are
briefly summarized below; interested readers are referred to the references mentioned in
each subsection for the concrete mathematical details behind each architecture.

1.1.1 Deep Neural Network


DNN (Fig. 1A) [42] is inspired by the brain's visual input processing mechanism, which
takes place at multiple levels (i.e., starting with cortical area 'V1' and then passing to
area 'V2', and so on) [32]. The standard neural network (NN) is extended to have
multiple hidden layers with nonlinear modules embodied in each hidden layer, allowing it
to learn part-whole hierarchies of representations. Though this formulation has been
successfully used in many applications, the training process is slow and cumbersome.
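As a concrete illustration of such an architecture, consider the following minimal sketch (in PyTorch; the layer sizes are arbitrary placeholders and are not taken from any study cited here):

```python
import torch.nn as nn

# A DNN: a standard NN extended with multiple hidden layers, each
# embodying a nonlinear module (here ReLU). All sizes are placeholders.
dnn = nn.Sequential(
    nn.Linear(100, 64),  # input layer -> first hidden layer
    nn.ReLU(),
    nn.Linear(64, 32),   # deeper layer: more abstract representations
    nn.ReLU(),
    nn.Linear(32, 2),    # output layer (e.g., binary classification scores)
)
```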

1.1.2 Recurrent Neural Network


RNN (Fig. 1B) [43] is an NN model designed to detect structures in streams of data [44].
Unlike a feedforward NN, which performs computations unidirectionally from input to
output, an RNN computes the current state's output depending on the outputs of the
previous states. Due to this 'memory'-like property, despite learning problems related to
vanishing and exploding gradients, RNN gained popularity in many fields involving
streaming data (e.g., text mining, time series, genomes, etc.). In recent years, two main
variants, bidirectional RNN (BRNN) [45] and long short-term memory (LSTM) [46]
have also been applied [47, 48].
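To make the recurrence concrete, the following minimal sketch (in PyTorch; all sizes are placeholders) shows how the hidden state at each time step depends on the previous one, and how an LSTM variant can be swapped in:

```python
import torch
import torch.nn as nn

# The hidden state h_t is computed from the input x_t and the previous
# state h_{t-1}, giving the network its 'memory'-like property.
rnn = nn.RNN(input_size=4, hidden_size=8, batch_first=True)
x = torch.randn(1, 10, 4)   # one sequence of 10 time steps, 4 features each
outputs, h_last = rnn(x)    # outputs: hidden state at every time step

# An LSTM variant, designed to mitigate vanishing/exploding gradients:
lstm = nn.LSTM(input_size=4, hidden_size=8, batch_first=True)
outputs, (h_last, c_last) = lstm(x)
```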

1.1.3 Convolutional Neural Network


CNN (Fig. 1C) [49] is a multilayer NN model [50], inspired by the neurobiology of the
visual cortex, that consists of convolutional layer(s) followed by fully connected layer(s).
Between these two types of layers there may exist subsampling steps. CNNs get the
better of DNNs, which have difficulty in scaling well with multidimensional locally
correlated input data. Therefore, the main application of CNN has been in datasets
where the number of nodes and parameters required to be trained is relatively large
(e.g., image analysis). Exploiting the 'stationary' property of an image, convolution
filters (CF) can learn data-driven kernels. Applying such CF along with a suitable
pooling function reduces the features that are supplied to the fully connected network
for classification. However, in the case of large datasets even this can be daunting; the
problem can be mitigated using sparsely connected networks. Some of the popular CNN configurations
include: AlexNet [51], VGGNet [52], and GoogLeNet [53].
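The following minimal sketch illustrates such a configuration (in PyTorch; the filter counts are placeholders, and the shapes assume, say, 28x28 single-channel inputs rather than any dataset discussed here):

```python
import torch.nn as nn

# Convolutional layer(s) with pooling (subsampling) followed by a fully
# connected layer, as in Fig. 1C.
cnn = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),   # data-driven convolution kernels
    nn.ReLU(),
    nn.MaxPool2d(2),                             # pooling: 28x28 -> 14x14
    nn.Conv2d(8, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),                             # 14x14 -> 7x7
    nn.Flatten(),
    nn.Linear(16 * 7 * 7, 10),                   # fully connected classifier
)
```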

1.1.4 Deep Autoencoder


The DA architecture (Fig. 1D) [54] is obtained by stacking a number of Autoencoders,
which are data-driven (i.e., unsupervised) NN models designed to reduce data dimension
by automatically projecting incoming representations to a lower-dimensional space than
that of the input. In an Autoencoder, an equal number of units are used in the
input/output layers and fewer units in the hidden layers. (Non)linear transformations are
embodied in the hidden layer units to encode the given input into smaller
dimensions [55]. Although it requires a pre-training stage and suffers from a vanishing
error, this architecture is popular for its data compression capability and has many
variants, e.g., Denoising Autoencoder [54], Sparse Autoencoder [56], Variational
Autoencoder [57], and Contractive Autoencoder [58].
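A minimal sketch of this hourglass-shaped architecture (in PyTorch; all dimensions are placeholders) is:

```python
import torch.nn as nn

# Equal numbers of input and output units, fewer hidden units: the encoder
# projects the input to a lower-dimensional code, the decoder reconstructs it.
autoencoder = nn.Sequential(
    nn.Linear(100, 32), nn.ReLU(),  # encoder
    nn.Linear(32, 8), nn.ReLU(),    # low-dimensional code (bottleneck)
    nn.Linear(8, 32), nn.ReLU(),    # decoder mirrors the encoder
    nn.Linear(32, 100),             # reconstruct the 100-dimensional input
)
# Training minimizes a reconstruction loss (e.g., nn.MSELoss()) between the
# input and the output; stacking such models yields a DA as in Fig. 1D.
```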

1.1.5 [Restricted] Boltzmann Machine ([R]BM)
[R]BM is an undirected probabilistic generative model representing specific probability
distributions [59]. It is also considered as nonlinear feature detector. The learning
process of [R]BM is based on optimizing its parameters for a set of given observations to
obtain the best possible fit of the probability distribution through Gibbs sampling (a
Markov Chain Monte Carlo method [60]) [61]. A BM has symmetrical connections among
its units and has one visible layer with (multiple) hidden layers. Usually, the learning
process of a BM is slow and computationally expensive, thus requiring a long time to
reach equilibrium statistics [40]. Restricting the intralayer units of a BM from connecting
among themselves forms a bipartite graph (i.e., an RBM has one visible and one hidden
layer) in which this learning inefficiency is resolved [59]. Stacking multiple RBMs as learning
elements yields the following two DL architectures.
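Before turning to those architectures, the learning step just described can be sketched as one contrastive-divergence (CD-1) update for a binary RBM, i.e., a single Gibbs-sampling step (in NumPy; biases are omitted and all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, lr=0.01):
    """One CD-1 update: a single Gibbs step approximates the gradient of the
    log-likelihood. W couples the visible layer v to the hidden layer h; the
    bipartite structure means there are no intralayer connections."""
    p_h0 = sigmoid(v0 @ W)                              # P(h = 1 | v0)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)  # sample hidden states
    p_v1 = sigmoid(h0 @ W.T)                            # reconstruct visible units
    p_h1 = sigmoid(p_v1 @ W)                            # hidden probabilities again
    # positive phase (data) minus negative phase (reconstruction)
    return W + lr * (np.outer(v0, p_h0) - np.outer(p_v1, p_h1))
```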

Deep Boltzmann Machine DBM (Fig. 1 E) [62] is a stack of undirected RBMs.


Being undirected, there is a feedback process among the layers where feature inference
from higher level units affects the inference of lower level units. Despite this powerful
inference mechanism which allows an input’s alternative interpretations through
concurrent competition at all levels of the model, estimating model parameters from
data remains difficult. Gradient based methods (e.g., persistent contrastive
divergence [63]) fail to explore the model parameters sufficiently [62]. Though this
learning problem is overcome by pretraining each RBM in a layerwise greedy fashion,
with outputs of the hidden variables from lower layers as input to upper layers [59], the
time complexity remains high and may not be suitable for large training datasets [64].

Deep Belief Network DBN (Fig. 1 F) [65] is formed by ordering several RBMs in a
way that one RBM’s latent layer is linked to the subsequent RBM’s visible layer. The
connections of DBN are downward directed to its immediate lower layer, except that the
upper two layers are undirected [65]. Thus, DBN is a hybrid model with the first two
layers as undirected graphical model and the rest being directed generative model. The
different layers are learned in a layerwise greedy fashion and fine-tuned based on the
required output [33]; however, the training procedure is computationally demanding.
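The greedy layerwise procedure can be sketched as follows (in NumPy; `train_rbm` stands for any RBM trainer, e.g., repeated CD-1 updates as sketched earlier, and the layer sizes are placeholders):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pretrain_dbn(data, layer_sizes, train_rbm):
    """Train the first RBM on the data, then use its hidden activations as
    the 'data' for the next RBM, and so on; the stacked weights are then
    fine-tuned for the required output."""
    weights, x = [], data
    for n_hidden in layer_sizes:    # e.g., [64, 32, 16]
        W = train_rbm(x, n_hidden)  # fit one RBM to the current input
        x = sigmoid(x @ W)          # propagate activations to the next layer
        weights.append(W)
    return weights
```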

1.2 Reinforcement Learning


Rooted in behavioral psychology, RL is a distinctive member of the ML family. An RL
problem is solved by learning from new experiences through trial and error. An RL agent
is trained such that its actions in interacting with the environment maximize the
cumulative reward resulting from the interactions. Generally, RL problems are modeled and solved
using Markov Decision Processes (MDP) theory through Monte Carlo (MC) and
dynamic programming (DP) [66].
The learning of an agent is a continuous process where the interactions with the
environment occurs at discrete time steps. In a typical RL cycle (at time t), the agent
receives the environment’s state (i.e., state, st ) and selects an action (at ) to interact.
The environment responds to the action and progresses to a new state (st+1 ). The
reward (rt+1 ), that the agent either receives or not for the selected action, associated to
the transition (st , at , st+1 ) is also determined [66]. Accordingly, after each cycle, the
agent updates the value function V (s) or action-value function Q(s, a) based on certain
policy, where, policy (π) is a function that maps states s ∈ S to actions a ∈ A, i.e.,
π : S → A ⇒ a = π(s) [36].
A possible way to solve the RL problem is to describe the environment as MDP with
a set of state-value function pairs, a set of actions, a policy, and a reward function. The
value function can be separated into the state-value function (V) and the action-value
function (Q). In the state-value function, the expected outcome of being in state s and
following policy π is determined by the sum of the rewards at future time steps with a
given discount factor $\gamma \in [0, 1]$, i.e.,
$$V^\pi(s) = \mathbb{E}_\pi\Big(\sum_{k=0}^{\infty} \gamma^k r_{t+k+1} \,\Big|\, s_t = s\Big).$$
In the action-value function, the expected outcome of being in state s and taking action a
following policy π is determined by the sum of the rewards for each state-action pair, i.e.,
$$Q^\pi(s, a) = \mathbb{E}_\pi\Big(\sum_{k=0}^{\infty} \gamma^k r_{t+k+1} \,\Big|\, s_t = s, a_t = a\Big).$$
The MDP can be solved and the optimum policy can be achieved through DP by:
either starting with an initial policy and improving it iteratively (policy iteration), or
starting with arbitrary value function and recursively refining an estimate of an
improved state-value or action-value function to compute an optimal policy and its
value (value iteration) [67]. In the simplest case, the state-value function for a given
policy can be estimated using the Bellman expectation equation as
$$V^\pi(s) = \mathbb{E}_\pi\big(r_{t+1} + \gamma V^\pi(s_{t+1}) \,\big|\, s_t = s\big).$$
Considering this as a policy evaluation process, an improved and eventually optimal
policy ($\pi^*$) can be achieved by greedily taking actions that maximize the
state-action value. But in scenarios with unknown environments, model-free methods
must be used without an MDP. In such cases, instead of the state-value function, the
action-value function can be maximized to find the optimal policy ($\pi^*$) using a
similar policy evaluation and improvement process, i.e.,
$$Q^\pi(s, a) = \mathbb{E}_\pi\big(r_{t+1} + \gamma Q^\pi(s_{t+1}, a_{t+1}) \,\big|\, s_t = s, a_t = a\big).$$
There are several learning
techniques, e.g., Monte Carlo, Temporal Difference (TD), and
State-Action-Reward-State-Action (SARSA), which describe various aspects of the
model-free policy evaluation and improvement process [68].
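As a concrete example of such model-free learning, the following is a minimal tabular SARSA sketch (in NumPy; the state/action counts and the ε-greedy policy are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 10, 4
Q = np.zeros((n_states, n_actions))  # action-value table Q(s, a)

def epsilon_greedy(s, eps=0.1):
    """Trade off exploration and exploitation when selecting an action."""
    if rng.random() < eps:
        return int(rng.integers(n_actions))  # explore
    return int(np.argmax(Q[s]))              # exploit (greedy)

def sarsa_update(s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """Q(s,a) <- Q(s,a) + alpha * (r + gamma * Q(s',a') - Q(s,a))."""
    td_target = r + gamma * Q[s_next, a_next]  # bootstrap from the next pair
    Q[s, a] += alpha * (td_target - Q[s, a])
```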
However, in real world RL problems, the state-action space is very large and storing
a separate value function for every possible state is cumbersome. In such situations
generalization of the value function through function approximation is required. For
example, a Q-value function approximation is able to generalize to unknown states by
calculating a function $\hat{Q}$ for a given state-action pair $(s, a)$; in the linear case,
$$\hat{Q}(s, a, w) = x(s, a)^\top w \approx Q^\pi(s, a),$$
where the rough approximation of the Q function is obtained from the feature vector
$x(s, a)$ representing the $(s, a)$ pair and the parameter vector $w$, which is updated
using MC or TD learning [69]. This approximation allows the Q function to be improved
by minimizing the loss between the true and approximated values (e.g., using gradient
descent), i.e.,
$$J(w) = \mathbb{E}_\pi\big((Q^\pi(s, a) - \hat{Q}(s, a, w))^2\big).$$
Examples of differentiable function approximators
include: neural network, linear combinations of features, decision tree, nearest neighbor,
Fourier bases, etc. [70].
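A minimal sketch of the linear case (in NumPy; the feature map x(s, a) and the target standing in for $Q^\pi$ are assumptions) is:

```python
import numpy as np

def q_hat(x_sa, w):
    """Linear approximation: Q_hat(s, a, w) = x(s, a)^T w."""
    return x_sa @ w

def gradient_step(w, x_sa, q_target, lr=0.01):
    """One gradient-descent step on J(w) = (Q_target - Q_hat)^2; for a
    linear approximator the gradient with respect to w is x(s, a)."""
    error = q_target - q_hat(x_sa, w)  # (Q_pi - Q_hat)
    return w + lr * error * x_sa
```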

1.3 Deep Reinforcement Learning


The autonomic capability to learn without any feature crafting makes RL a powerful
tool applicable to many disciplines, but it falls short in cases when the data
dimensionality is large and the environment is non-stationary [71]. Also, DL’s capability
to learn complex patterns is sometimes prone to misclassification [72]. To mitigate these
shortcomings, RL algorithms have in recent years been successfully combined with deep
NNs [39], giving rise to novel learning strategies. This integration has been used either in approximating
RL functions using deep NN architectures or in training deep NN using RL.
The first notable example of such an integration is the Deep Q-network (DQN) [31]
which combines Q-learning with deep NN. The DQN agent, when presented with
high-dimensional inputs, can successfully learn policies using RL. The action-value
function is approximated for optimality using a deep CNN. The deep CNN, using
experience replay and a target network, overcomes the instability and divergence
sometimes experienced while approximating the Q-function with a shallow NN.
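A minimal sketch of these DQN ingredients follows (in PyTorch; a small fully connected Q-network stands in for the deep CNN of the original DQN, and all sizes and hyperparameters are placeholders):

```python
import random
from collections import deque

import torch
import torch.nn as nn

q_net = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))
target_net = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))
target_net.load_state_dict(q_net.state_dict())  # periodically re-synced copy
replay = deque(maxlen=10000)  # experience replay: (s, a, r, s', done) tensors

def dqn_loss(batch, gamma=0.99):
    """TD loss on a sampled minibatch; the frozen target network
    stabilizes the bootstrapped targets."""
    s, a, r, s2, done = map(torch.stack, zip(*batch))
    q = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + gamma * target_net(s2).max(1).values * (1 - done)
    return nn.functional.mse_loss(q, target)

# Usage: append transitions to `replay`, then, e.g.,
# loss = dqn_loss(random.sample(replay, k=32))
```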
Another deep RL algorithm is the Double DQN which is an extension of the DQN
algorithm [73]. In certain situations the DQN suffers from substantial overestimations
inherited from the implemented Q-learning, which are overcome by replacing the
Q-learning of the DQN with a double Q-learning algorithm [74]. Double Q-learning learns
two value functions, by assigning each experience randomly to update one of them,
resulting in two sets of weights (a sketch of the target computation follows at the end of
this paragraph). During every update, one set determines the greedy policy while
the other its value. Other deep RL algorithms include: Deep Deterministic Policy
Gradient, Continuous DQN, Asynchronous N-step Q-learning, Dueling network DQN,
Prioritized Experience Replay, Deep SARSA, Asynchronous Advantage Actor-Critic,
and Actor-Critic with Experience Replay [39].
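Returning to the double DQN sketch promised above, the target computation could look like this (in PyTorch; `q_net` and `target_net` are assumed to be defined as in the DQN sketch):

```python
import torch

def double_dqn_target(r, s2, done, q_net, target_net, gamma=0.99):
    """One set of weights selects the greedy action, the other evaluates
    it, curbing the overestimation of plain Q-learning targets."""
    with torch.no_grad():
        a_star = q_net(s2).argmax(dim=1, keepdim=True)        # selection
        q_eval = target_net(s2).gather(1, a_star).squeeze(1)  # evaluation
        return r + gamma * q_eval * (1 - done)
```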

2 Applications to Biological Data


The techniques outlined above, also available as open-source tools (e.g., see [75] for a
mini review on tools based on DL), have been used in mining Biological data. The
applications, as reported in the literature, are provided below for data coming from each
of the application domains.
Table 1 summarizes the state-of-the-art applications of DL and RL to biological data
(see Fig. 1 H). It also reports on individual applications in each of these domains and
the data type on which the methods have been applied.

2.1 Omics
DL and RL methods have been extensively used in Omics (such as genomics,
proteomics, or metabolomics) research to extract features, functions, structure, and
molecular dynamics from raw biological sequence data (e.g., DNA, RNA, and
amino acids). Mining sequence data, specifically, is a challenging task. Different
analyses (e.g., gene expression profiling, splicing junction prediction, sequence specificity
prediction, transcription factor determination, protein-protein interaction evaluation,
etc.) dealing with different types of sequence data have been reported in the literature.
To identify splice junctions at the DNA level, a tedious job to do manually, Lee et al.
proposed a DBN based unsupervised method to perform the auto-prediction [79].
Profiling gene expression (GE) is a demanding job. Chen et al. exploited a DNN based
method for GE profiling on RNA-seq and microarray-based GE Omnibus dataset [83].
The ChIP-seq data were preprocessed, using CNN, into a 2D matrix where each row
denoted a gene’s transcription factor activity profile [92]. Also, somatic point mutation
based cancer classification was performed using DNN [90]. In addition, DA based
methods have been used for feature extraction in cancer diagnosis and classification
(Fakoor et al. used sparse DA method [76]) in combination with related gene
identification (Danaee et al. used stacked Denoising DA [77]) from GE data.
Alipanahi et al. used a deep CNN structure to predict DNA- and RNA-binding
proteins’ ([D/R]BPs) role in alternative splicing and examined the effect of disease
associated genetic variants (GV) on transcription factor binding and GE [93]. Zhang et
al. developed a DNN framework to model structural features of RBPs [84]. Pan et al.
proposed a hybrid CNN-DBN model to predict RBP interaction sites and motifs on
RNAs [82]. Quang et al. proposed a DNN model to annotate and identify pathogenicity
in GV [86].
Identifying the best discriminative genes/microRNAs (miRNAs) is a challenging
task. Ibrahim et al. proposed a group feature selection method from genes/miRNAs
based on expression profile using DBN and active learning [80]. CNN was used to
interpret noncoding genome by annotating them [94]. Also, Zeng et al. employed CNN
to predict the binding between DNA and protein [95]. Zhou et al. proposed a CNN
based approach to identify noncoding GV [96] which was also used by Huang et al. for a
similar purpose [97].
Table 1. Summary of [Deep] [Reinforcement] Learning Applications to Biological Data
(cells list the references applying each method in each application domain).

| App. Dom. | Purpose | Data Type | DA | DBN | DNN | CNN | RNN (ψ) | RL |
|---|---|---|---|---|---|---|---|---|
| Omics | GE, DD, DmsPS, PrSD, DrD, PsDs | SJP/Se, miRNA, Prtn, Gene | [76–78] | [79–82] | [83–91] | [92–98] | [99, 100] | [101–104] |
| Bioimaging | Seg, BCaD, CACI, CCnt | EMI | [105] | | [106, 107] | [108–115] | | |
| Medical Imaging | Seg, ID, DD | [f/s]MRI | [116–118] | [119–122] | [123], [124] (υ) | [125–130] | [131] | |
| | | CT | [132] | | [133] | [134–141] | | |
| | | PET | [117] | [120, 121, 142] | [124] (υ) | [143] | | |
| | | OthI | [144] | [145] | [146] | [138, 147–157] | [131] | [158] |
| BMI | MA | EEG | [159, 160] | [161–168], [169] (γ) | [170, 171] | [172–177] | [178–181] | [182, 183] |
| | MD | EMG | | | | [184, 185] | | |
| | ER | ECG | | [186–188] | [189] | | | |
| | CogS | NS | | | | | | [190–196] |

Legends: App. Dom. – Application Domain; ψ – including LSTM; υ – using DPN; γ – using Sparse-DBN.
Abbreviations in the 'Purpose' column: GE – Gene Expression; DD – Disease Diagnosis (including Alzheimer's/Huntington/Schizophrenia/Mental/Emotional State); DmsPS – DNA methylation state prediction [85], Pathogenicity [86], Sequence assembly [101–103]; PrSD – Protein Structure (binding) Determination; DrD – Drug Discovery; Seg – Segmentation; BCaD – Breast Cancer Detection; CACI – Classification & Analysis of Cell Image; CCnt – Cell Count; ID – Image Denoising; MA – Motor Action decoding; MD – Movement Decoding; ER – Emotion Recognition; CogS – Cognitive State Determination.
Abbreviations in the 'Data Type' column: SJP – Splice Junction Prediction; Se – [DNA/RNA/ChIP/DNase] sequence & microarray GE; Prtn – Protein properties; OthI – Images not used elsewhere (e.g., UlS/EMI/EnI/XRay/CFI/MMM/cardiac MRI); NS – Neural Spikes.

Park et al. proposed an LSTM-based tool to automatically predict miRNA
precursors [99]. Also, Lee et al. presented a deep RNN framework for automatic
miRNA target prediction [100].
DNA methylation (DM) alters the activity of a DNA segment without affecting
the sequence; thus, detecting its state in a sequence is important. Angermueller et al.
used a DNN based method to estimate the DM state by predicting changes in single
nucleotides and uncovering sequence motifs [85].
Proteomics poses many complex computational problems to solve. Estimating
complete protein structures in 3D space from biological sequences is a complex,
NP-hard problem. Alternatively, the problem can be divided into independent
sub-problems (e.g., torsion angles, accessible surface area, dihedral angles, etc.) that are
solved in parallel to estimate the secondary protein structure (2-PS). Predicting
compound-protein interactions (CPI) is very interesting from a drug discovery point of
view but tough to solve.
Heffernan et al. proposed an iterative DNN scheme to solve these sub-problems for
2-PS [87]. Wang et al. utilized a deep CNN to predict 2-PS [98]. Li et al. proposed a DA
learning based model to reconstruct protein structure based on a template [78].
DNN based methods to predict CPI have also been reported [88, 89, 91].
In medicine, model organisms are often used for translational research. Chen et al.
used bimodal DBNs to predict responses of human cells under certain stimuli based on
responses of rat cells obtained with the same stimuli [81].
RL has also been used in omics, for example, Yang et al. used binary particle swarm
optimization and RL to predict bacterial genomes [101], Ralha et al. used RL through a
system called BioAgent to increase the accuracy of biological sequence annotation [102],
and Bocicor et al. solved the problem of DNA fragment assembly using RL based
framework [103]. Zhu et al. proposed a hybrid RL method, with text mining, for
constructing protein-protein interaction networks [104].

2.2 Bioimaging
In biology, DL architectures have been targeted at the pixel level of biological images to
train the NN. Ning et al. used CNN for pixel-wise image segmentation of the nucleus,
cytoplasm, cell, and nuclear membranes using Electron Microscope Images (EMI) [108].
Reduced pixel
noise and better abstract features of biological images can be obtained by adding
multiple layers. Ciresan et al. employed deep convolutional neural networks to identify
mitosis in breast histology images [109], and a similar architecture was also used to
find neuronal membranes and automatically segment neuronal structures in EMI [110].
Xu et al. used a Stacked Sparse DA architecture to identify nuclei in breast cancer
histopathology images [105]. Xu et al. also classified colon cancer images using
Multiple Instance Learning (MIL) on DNN-learnt features [106].
Besides pixel level analysis, DL has also been applied to cell- and tissue-level
analysis. Chen et al. employed DNN in label-free cell classification [107]. Pärnamaa and
Parts used CNN to automatically detect fluorescent protein in various subcellular
localization patterns using microscopy images of yeast [111]. Ferrari et al. used CNNs
to count bacterial colonies in agar plates [112]. Kraus et al. integrated both
segmentation and classification in a model which can be utilized to classify
microscopy images of yeast [113]. Flow cytometry is used in cellular biology to
monitor the different stages of the cell cycle. Eulenberg et al. proposed a deep
flow model, combining non-linear dimension reduction with CNN, to analyze single-cell
flow cytometry images [114]. Furthermore, a CNN architecture was employed to segment
and recognize neural stem cells in images taken by bright-field microscopy [115], and
a DBN was used for analyzing Gold immunochromatographic strips [197].

2.3 Medical Imaging
DL and RL architectures have been widely used in analyzing medical images obtained
from– magnetic resonance ([f/s]MRI), CT scan, positron emission tomography (PET),
radiography/ fundus (e.g., X-ray, CFI), microscope, ultrasound (UlS)– to denoise,
segment, classify, detect anomalies and diseases from these images.
Segmentation is a process of partitioning an image based on some specific patterns.
Sirinukunwattana et al. reported the results of the Gland Segmentation competition
from colon histology images [156]. Kamnitsas et al. proposed 3D dual pathway CNN to
simultaneously process multi-channel MRI and segment lesions related to tumors,
traumatic injuries, and ischemic stroke [130]. Stollenga et al. segmented neuronal
structures from 3D EMI and brain MRI using multi dimensional RNN [131]. Fritscher
et al. used deep CNN for volume segmentation from head-neck region’s CT scans [134].
Havaei et al. segmented brain tumor from MRI using CNN [125], and DNN [123].
Brosch and Tam proposed a DBN based manifold learning method of 3D brain
MRI [119]. Cardiac MRIs were segmented for heart’s left ventricle using DBN [145], and
blood pool and myocardium using CNN [157]. Mansoor et al. automatically segmented
the anterior visual pathway from MRI sequences using a stacked DA model [116].
Lerouge et al. proposed a DNN based method to label CT scans [133].
Success of many medical image analysis methods depends on image denoising.
Gondara proposed a denoising technique utilizing convolutional denoising DA, and
validated it with mammograms and dental radiographs [144]. Agostinelli et al.
presented an adaptive multi-column stacked sparse denoising autoencoder method for
image denoising, which was validated using CT scans of the head [132].
Detecting anomaly in medical images is widely used for disease diagnosis. Several
models were applied to detect Alzheimer’s Disease (AD) and Mild Cognitive
Impairment (MCI) from MRI and PET scans including DA [117, 118], DBM [120],
RBM [121], and multimodal stacked deep polynomial network (MM-SDPN) [124].
Due to its facilitating structure, CNN has been the most popular DL architecture for
image analysis. CNN was applied to classify breast masses from mammograms
(MMM) [151–155], diagnose AD using different neuroimages (e.g., brain MRI [126],
brain CT scans [135], and (f)MRIs [128]), and rheumatoid arthritis from hand
radiographs [150]. CNN was also used extensively: on CT scans to detect– anatomical
structure [136], sclerotic metastases of spine along with colonic polyps and lymph nodes
(LN) [137], thoracoabdominal LN and interstitial lung disease (ILD) [139], pulmonary
nodules [138, 140, 141]; on (f)MRI and diffusion tensor images to extract deep features
for brain tumor patients’ survival time prediction [129]; on MRI to detect
neuroendocrine carcinoma [127]; on UlS images to diagnose breast lesions [138] and
ILD [147]; on CFI to detect hemorrhages [148]; on endoscopy images to diagnose
digestive organ related diseases [149]; and on PET images to identify oesophageal
carcinoma and predict responses to neoadjuvant chemotherapy [143].
In addition, DBN was successfully applied to identify: Attention Deficit
Hyperactivity Disorder [142], and Schizophrenia (SZ) and Huntington Disease from
(f/s)MRI [122]. Also, a DNN based method was proposed to successfully identify the
fetal abdominal standard plane in UlS images [146].
RL was used in segmenting transrectal UlS images to estimate location and volume
of the prostate [158].

2.4 [Brain/Body]-Machine Interfaces


DL and RL methods have been applied to BMI signals (e.g., electroencephalogram,
EEG; electrocardiogram, ECG; electromyogram, EMG) mainly from (brain) function
decoding and anomaly detection perspectives.

Various DL architectures have been used in classifying EEG signals to decode Motor
Imagery (MoI). CNN was applied in the classification pipeline using – augmented
common spatial pattern features which covered various frequency ranges [172]; features
based on combined selective location, time, and frequency attributes which were then
classified using DA [173]; and signal’s dynamic energy representation [174]. DBN was
also employed– in combination with softmax regression to classify signal frequency
information as features [161]; and in conjunction with Ada-boost algorithm to classify
single channels [162]. DNN was used– with variance based common spatial pattern
(CSP) features to classify MoI EEG [171], and to find neural patterns occurring at each
time points in single trials where the input heatmaps were created with layer-wise
relevance propagation technique [170]. In addition, MoI EEG signals were classified by
denoising DA using multifractal attribute features [159].
DBN was used by Li et al. to extract low dimensional latent features as well as
critical channel selection which led to an early framework for affective state classification
using EEG signals [163]. In a similar work, Jia et al. used semi-supervised approach
with an active learning to train DBN and generative RBMs for the classification [164].
Later, using differential entropy as features to train DBN, Zheng et al. examined
dominant frequency bands and channels of EEG in an emotion recognition system [165].
Jirayucharoensak et al. used PCA extracted power spectral densities from each EEG
channel, which were corrected by covariate shift adaptation to reduce non-stationarity,
as features to stacked DA to detect emotion [160]. Tripathi et al. explored DNN (with
Softmax activator and Dropout) and CNN [198] (with Tan Hyperbolic, Max Pooling,
Dropout, and Softplus) for emotion classification from the DEAP dataset using EEG
signals and response face video [175]. Using similar data from the MAHNOB-HCI
dataset, Soleymani et al. detected continuous emotion using RNN-LSTM [178].
A channel-wise CNN and its variant with RBM [176], and AR-model based features with
a sparse-DBN [169], were used to estimate drivers' cognitive states using EEG data.
In another approach to model cognitive events, EEG signals were transformed to
time-lagged multi-spectral images and fed to CNN for learning the spectral and spatial
representations of each image, followed by an adapted RNN (LSTM) to find the
temporal patterns in the image sequence [179].
DBN has been employed in classifying EEG signals for anomaly detection in diverse
scenarios including: online waveform classification [166]; AD diagnosis [167]; integrated
with HMM to understand sleep phases [168]. To detect and predict seizures– CNN was
used through classification of synchronization patterns [177]; RNN predicted specific
signal features related to seizure after being trained with data preprocessed by wavelet
decomposition [180]. Also, a lapse of responsiveness warning system was proposed using
RNN (LSTM) [181].
Using CNN, Park and Lee [185] and Atzori et al. [184] decoded hand movements from
EMG signals.
ECG arrhythmias were successfully detected using DBN [188] and DNN [189]. DBN
was also used to classify ECG signals acquired with two leads [187], and in combination
with nonlinear SVM and Gaussian kernel [186].
RL has also been applied in BMI research. Concentrating mainly on controlling
(prosthetic/robotic) devices, several studies have been reported, including: mapping
neural activity to intended behavior through coadaptive BMI (using TD(λ)) [190] and
symbiotic BMI (using actor-critic) [191], a testbed targeting center-out reaching task in
primates for creating more realistic BMI control models [192], Hebbian RL for adaptive
control by mapping neural states to prosthetic actions [193], BMI for unsupervised
decoding of cortical spikes in multistep goal-directed tracking task (using Q(λ)) [194],
adaptive BMI capable of adjusting to dramatic reorganizing neural activities with
minimal training and stable performance over long duration (using actor-critic) [195],
BMI for efficient nonlinear mapping of neural states to actions through sparsification of
state-action mapping space using quantized attention-gated kernel RL as an
approximator [196]. Also, Lampe et al. proposed a BMI capable of transmitting
imaginary-movement-evoked EEG signals over the Internet to remotely control a robotic
device [182], and Bauer and Gharabaghi combined RL with a Bayesian model to select
dynamic thresholds for improved performance of a restorative BMI [183].

Figure 2. Performance comparison of representative DL techniques when applied
to Omics data in: predicting splice junctions (A), compound-protein interactions (B),
and secondary/tertiary structures of proteins (C); analyzing gene expression data, and
classifying and detecting cancers from them (D); and predicting DNA- and RNA-sequence
specificity (E), RNA binding proteins (F), and micro RNA precursors (G). Ray et al.:
method proposed in [199].

3 Performance Analysis and Comparison
Comparative test results, in the form of performances/accuracies of each DL technique
when applied to data coming from Omics (Fig. 2), Bioimaging (Fig. 3), Medical
Imaging (Fig. 4), and [Brain/Body]-Machine Interfaces (Fig. 5), are summarized below
to facilitate the reader in selecting the appropriate method for their research. The
reported performances can be regarded as a metric to evaluate the strengths/
weaknesses of a particular technique with a given set of parameters on a specific dataset.
It should be noted that several factors (e.g., data pre-processing, network architecture,
feature selection and learning, parameters’ optimization, etc.) collectively determine the
accuracy of a method.
In Figs. 2–5, each group of bars indicates accuracies/performances of comparable DL
or non-DL techniques when applied to same data and reported in an individual study.
And, each bar in a group shows the (mean) performance of different runs of a technique
on either multiple subjects/datasets (for means, error bar is ± standard deviation).

3.1 Omics
Fig. 2A reports that DBN outperforms other methods in predicting splice junction
when applied to: two datasets from Whole Human Genome database (GWH-donor,
GWH-acceptor) and two from UCSC genome database (UCSC-hg19, UCSC-hg38) [79].
In the GWH datasets, DBN based method achieved superior F1-score (0.81 and 0.75)
against SVM based methods with Radial Basis (0.77 and 0.67) and Sigmoid (0.71 and
0.56) Functions, and other splicing techniques like Gene Splicer (0.74 and 0.75) and
Splice Machine (0.77 and 0.73). Also, in the UCSC datasets, DBN achieved the highest
classification accuracy (0.88 and 0.88) in comparison to SVM-RBF (0.868 and 0.867)
and SVM-SF (0.864 and 0.861).
Performance comparison of CPI is shown in Fig. 2B. Tested over two CPI datasets,
a DNN based method (DNN*) achieved superior prediction accuracy (93.2% in dataset1
and 93.8% in dataset2) compared to other methods based on RF (83.9% and 86.6%),
LR (88.3% and 89.9% using LR2 ), and SVM (88.7% and 90.3% using SVM3 ) [88]. In
another study, a similar DNN* was applied to the DUD-E dataset, where it achieved
higher accuracy (99.6%) over RF (99.58%) and CNN (89.5%) based methods [89]. As per the
accuracies reported in [88], the RF based method had lower values in comparison to the
LR and SVM based methods, which had similar values. In contrast, when applied to the
DUD-E dataset (reported in [89]), the RF based method outperforms the CNN based
method. This may be attributed to the fact that classification problems are data
dependent: despite RF being one of the best classifiers [200], its relative performance
varies from dataset to dataset.
In predicting 2-PS, DL based methods outperform other methods (see Fig. 2C).
When applied on two datasets (CASP11 and TS1199), the stacked sparse autoencoder
(StSAE) based method achieved superior prediction accuracy (80.8% and 81.8%) in
comparison to other NN based methods (FFNN: 79.9% and 82%, MSNN: 78.8% and
81%, and PSIPRED: 78.8% and 79.7%) [87]. Another DL method with Conditional
Neural Fields (DCNF), when tested on five different datasets (CullPDB, CB513,
CASP10, CASP11, CAMEO), better predicted the 2-PS (Q8 accuracy: 75.2%,
68.3%, 71.8%, 72.3%, 72.1%) in comparison to other non-template based methods
(SSPro: 66.6%, 63.5%, 64.9%, 65.6%, 63.5%; CNF: 69.7%, 64.9%, 64.8%, 65.1%,
66.2%) [98]. However, when a template of solved protein structures from the PDB was
used, SSPro with template obtained the best accuracy (SSProT: 85.1%, 89.9%, 75.9%,
66.7%, 65.7%).
To annotate GV in identifying pathogenic variants from two datasets (TS and
CVESP in Fig. 2D), a DNN based method performed better (72.2% and 94.6%) than
LR (63.5% and 95.2%) and SVM (63.1% and 93.0%) based methods. Another CNN
based approach to predict DNA sequence accessibility was tested on data from
the ENCODE and Roadmap Epigenomics databases and was reported to outperform the gapped k-mer SVM
method (mean AUC of 0.89 vs. 0.78) [94]. In classifying cancer based on somatic point
mutation using raw TCGA data containing 12 cancer types, a DNN based method
outperformed non-DL methods (60.1% vs. [SVM: 52.7%, kNN: 40.4%, NB: 9.8%]) [90].
To detect breast cancer using GE data from TCGA database, a Stacked Denoising
Autoencoder (StDAE) was employed to extract features. According to the reported
accuracies of non-DL classifiers (ANN, SVM, and SVM-RBF), StDAE outperformed
other feature extraction methods such as PCA and KPCA (SVM-RBF classification
accuracies for StDAE, PCA, and KPCA were 98.26%, 89.13%, and 97.32%,
respectively) [77]. Also, deeply connected genes were better classified with StDAE
extracted features (accuracies- ANN: 91.74%, SVM: 91.74%, and SVM-RBF:
94.78%) [77]. Another study on classifying cancer, using 13 different GE datasets taken
from the literature, reported that the use of PCA in data dimensionality reduction,
before applying SAE, StAE, and StAE-FT for feature extraction, facilitates more
accurate extraction of features (except AC and OV in Fig. 2D) for classification using
SVM with Gaussian kernel [76].
Prediction of the sequence specificities of [D/R]BPs was performed more accurately
using a deep CNN based method in comparison to other non-DL methods that
participated in the DREAM5 challenge (http://dreamchallenges.org/; see also
[201, 202]) [93]. As seen in Fig. 2E, the CNN based method (DeepBind)
outperformed other methods in ChIP AUC values (top two values – DeepBind: 0.726 vs.
BEEML-PBM sec: 0.714) and PBM scores (top two scores – DeepBind: 0.998 vs.
FeatureREDUCE: 0.985) [201, 202].
Moreover, in predicting RBP, DL based methods outperformed non-DL methods as
seen in Fig. 2F. As reported using CLIP AUC values, DBCNN outperforms Ray et al.
(0.825 vs. 0.759) [93], multimodal DBN outperforms GraphProt (0.902 vs. 0.887) [84],
and DBN-CNN hybrid outperforms NMF based methods (0.9 vs. 0.85) [82].
Also, in predicting miRNA precursor (Fig. 2G), RNN with DA outperformed other
non-DL methods (RNNAE: 0.97 vs. [MIL: 0.58, PFM: 0.77, PFM: 0.59; PBM: 0.58, EN:
0.71, and MDS: 0.64]) [100]. And LSTM outperformed SVM methods (LSTM: 0.93 vs.
[Boost-SVM: 0.89, CSHMM: 0.65, SVM-LSS: 0.79, SVM-MFE: 0.86, and RBS: 0.8]) [99].

3.2 Bioimaging
DNN was used in detecting 12 different cellular compartments from microscopy images
and was reported to have achieved classification accuracy of 87% compared to 75% for
RF [111]. The mean performance of the detection was 83.24±5.18% using DNN and
69.85±6.33% using RF (Fig. 3A). In classifying flow cytometry images for cell cycle
phases, Deep CNN with non-linear dimension reduction outperformed boosting
(98.73±0.16% vs. 93.1±0.5%) (Fig. 3A) [114]. DNN, trained using genetic algorithm
with AUC as cost function, performed label free cell classification at higher accuracy
(95.5±0.9%) than SVM with Gaussian kernel (94.4±2.1%), LR (93.5±0.9%), NB
(93.4±1%), and DNN trained with cross entropy (88.7±1.6%) (Fig. 3A) [107].
Colon histopathology images were classified with higher accuracy using DNN and
Multiple Instance Learning (97.44%) compared to K-Means clustering (89.43%) [106]
(Fig. 3B). Deep max-pooling CNN detected mitosis in breast histology images with
higher accuracy (88%) in comparison to statistical feature based classification (70%)
(Fig. 3B) [109]. Using StSAE with Softmax classifier (SMC), nuclei were more
accurately detected from breast cancer histopathology images (88.8 ± 2.7%) when
compared to: other techniques with SMC– CNN (88.3 ± 2.7%), 3-layer SAE (88.3 ±
1.9%), StAE (83.7 ± 1.9%), SAE (84.5 ± 3.8%), AE (83.5 ± 3.3%); SMC alone (78.1 ±
4%); and EM (66.4 ± 4%) (Fig. 3B) [105].

Figure 3. Performance comparison of some DL and conventional ML techniques when
applied to the Bioimaging application domain. A. Performances in classifying electron
microscope images for cell compartments, cell cycles, and cells. B. Performances in
analyzing images to automatically annotate features, and detect mitosis and cell nuclei.

3.3 Medical Imaging


Comparative test results on the performance of various DL/non-DL techniques in
segmenting medical images to detect pathology or organ parts are reported in Fig. 4A.
A multi-scale, dual-pathway, 11-layer, 3D CNN based method with Conditional Random
Fields outperformed an RF method (DSC metric values: 63.0 ± 16.3 vs. 54.8 ± 18.5) when
segmenting brain lesions in MRIs obtained from a TBI database. The classifier's accuracy
improved when 3 similar networks were ensembled (i.e., Ensembled Method, ESM) and
their outputs were averaged (64.5 ± 16.3 vs. 63.0 ± 16.3) [130]. In a similar task, a [two
pathway/cascaded] CNN trained using a two-phase training procedure, with local and
global features, outperformed other methods that participated in the MICCAI-BRATS2013
challenge (see [203]), as reported using Dice coefficients (InputCascadeCNN: 0.88 vs. Tustison: 0.87) [125].
An StAE based method performed similarly to or better than other non-DL methods
(see 'ON' in Fig. 4A; DSC values – StAE: 0.79 vs. [MAF: 0.77, MAF: 0.76, and SVM: 0.73])
in segmenting the optic nerve from MRI data [116]. Several DL methods were evaluated in
identifying glands in colon histology images, and a DCAN based method outperformed
other CNN based methods at the GlaS contest (see [156]) (DCAN: 0.839 vs. [MPCNN1: 0.834,
UCNN-MHF: 0.831, MFR-FCN: 0.833, MPCNN2: 0.819, UCNN-CCL: 0.829]) [156].
Also, left ventricles were segmented from cardiac MRI, where a CNN-StAE based
method outperformed other methods (CNN-StAE: 0.94 vs. [TLVSF: 0.9, DRLS-DBN:
0.89, GMM-DP: 0.89, MF-GVF: 0.86, and TSST-RRDP: 0.88]) [204]. In segmenting
volumetric medical images for blood pool (BP) and myocardium (MC), CNN based
methods outperformed other methods as reported using Dice coefficients– BP (MAF:
0.88, DCNN: 0.93, 3DMRF: 0.87, TVRF: 0.79, 3DUNet: 0.926, 3DDSN: 0.928); and MC
(MAF: 0.75, DCNN: 0.8, 3DMRF: 0.61, TVRF: 0.5, 3DUNet: 0.69, 3DDSN: 0.74) [157].
DL based methods outperformed other methods in denoising MMM & dental
radiographs [144], and brain CT scans [132] (Fig. 4B). StDAE-CNN performed more
accurate denoising in the presence of Gaussian/Poisson noise (GN/PN), as reported
using SSIM scores (Noisy: 0.63, NL Means: 0.62, MedFilt: 0.8, CNNDAEa: 0.89,
CNNDAEb: 0.9) [144]. Adaptive MC-StSDA outperformed MC-StSDA in denoising CT
images, as reported using PSNR values for Gaussian, Salt & Pepper, and Speckle
noise [132].

15/33
[Figure 4 (five panels of bar charts) appears here. In panel C, results are grouped by dataset (FFDM, BCDR, INBreast, DDSM, MIAS).]

Figure 4. Performance comparison of representative DL techniques when applied to Medical Imaging. A. Performance of image segmentation techniques in segmenting tumors (BT: Brain Tumor) and different organ parts (ON: Optic Nerve, GL: Gland, LV: Left Ventricle of heart, BP: Blood Pool, and MC: Myocardium). B. Image denoising techniques to improve image quality in the presence of Gaussian, Poisson, Salt & Pepper, and Speckle noise. C. Detecting anomalies and diseases in mammograms. D. Classification and detection of Alzheimer’s Disease (AD) and Mild Cognitive Impairment (MCI), along with healthy controls (NC). E. Performance of prominent techniques for Lung Nodule Classification (LNC), Organ Classification (OC), Brain Tumor Detection (BTD), Colon Polyp Detection (CPD), and Chemotherapy Response Detection (CTRD).

Abbreviations (A): MS3D(CNN-CRF): Multi-Scale 3D Deep CNN coupled with a 3D fully connected Conditional Random Field; ESM: Ensemble Segmentation Method; ESM-CRF: ESM with CRF; ICCNN: Input Cascade CNN; MAP-MRF: Maximum A Priori estimation with Markov Random Fields; LCCNN: Local Cascade CNN; LPCNN: Local Path CNN; DF-CFCRF: Density Forest with Classification Forest and CRF; MRF-HM: Markov Random Field with Histogram Matching; MCP-LOO: Multi-Channel Patch with Leave-One-Out; StAE: Stacked Autoencoder; MAF: Multi-Atlas Framework; DCAN: Deep Contour-Aware Network; MPCNN[1/2]: Multi-Path CNN [with/without] Border Networks; UCNN-MHF: U-shaped CNN with Morphological Hole-Filling; MFR-FCN: Fully Convolutional Nets with Multi-level Feature Representations; UCNN-CCL: U-shaped CNN with Connected Component Labeling; CNN-StAE: CNN with StAE; TLVSF: 3D Time Left Ventricle Segmentation Framework; DRLS-DBN: Distance Regularized with DBN; GMM-DP: Gaussian-Mixture Model with Dynamic Programming; MF-GVF: Morphological Filtering with Gradient Vector Flow; TSST-RRDP: Topological Stable-State Thresholding with Region Restricted Dynamic Programming; DCNN: Dilated CNN; TVRF: Total Variation RF; 3DUNet: 3D UNet CNN architecture; 3DDSN: 3D Deep Supervision Network.
Abbreviations (B): Noisy: Noisy data; NL-Mean: Non-Local Mean; MedF: Median Filter; CNNDAEa: Stacked Denoising Autoencoders with Convolutional Layers using small dataset; CNNDAEb: CNNDAE using combined dataset; MCStSDA: Multi-Column Stacked Sparse DAE; AMCStSDA: Adaptive MCStSDA; GN: Gaussian Noise; PN: Poisson Noise; SPN: Salt & Pepper Noise; SN: Speckle Noise.
Abbreviations (C): L/U: Labelled/Unlabelled features; CNN3: 2 convolutional layers with 1 fully connected layer; CNN2: 1 convolutional layer with 1 fully connected layer; HGD: Histogram of Gradient Divergence; HOG: Histogram of Oriented Gradients; DeCAF: Deep Convolutional Activation Feature; CNNPT: CNN with Pre-Training; RF-CNNPT: RF with CNNPT; RF-HF: RF with Hand-crafted Features; SFS-LDA: Sequential Forward Selection with LDA; AANN: Auto-Associator NN; SCNN: Soft Clustered NN; SVM-ELM: SVM with Extreme Learning Machine; SSVM: Structured SVM; CNN-LSVM: CNN with Linear SVM; SVM-MLP: SVM with MLP; DWT-GMB: Discrete Wavelet Transform with Gray-Level Co-occurrence Matrix and Back-Propagating NN.
Abbreviations (D): SAE: Sparse Autoencoder; DSAE: Deep Sparse Autoencoder; DBM: Deep Boltzmann Machine; MK-SVM: Multi-Kernel SVM; DSAE-SLR: Deep Sparse Autoencoder with Softmax Logistic Regressor; SAE-3DCNN: SAE with 3-Dimensional CNN; ISML: Inherent Structure-based Multiview Learning; DDL: Deep Learning with Dropout; DSA-3DCNN: Deeply Supervised Adaptable 3-Dimensional CNN; SK-SVM: Single-Kernel SVM; SRBMD: Stacked Restricted Boltzmann Machine with Dropout; SRBMWOD: Stacked Restricted Boltzmann Machine without Dropout; MTL: MultiTask Learning; SDSAE: Stacked Denoising Sparse Autoencoder; MSDPN-SVM: Multimodal Stacked Deep Polynomial Network with SVM; MSDPN-LC: MSDPN with Linear Classifier.
Abbreviations (E): SVM-RBF: SVM with RBF; RT-SVM: Ranklet Transform with SVM; CT-SVM: Curvelet Transform with SVM; StDAE: Stacked Denoising Autoencoder; DAE-BDT: DAE with Binary Decision Tree; QRA: Quantitative Radiomics Approach; MC-CNN: Multi-crop CNN; MTANN: Massive-Training ANN; SCNN: Shallow CNN; LeNet: CNN’s LeNet architecture; RDCNN: Relatively Deep CNN; AlexNet: CNN’s AlexNet architecture; FT-AlexNet: Fine-Tuned AlexNet; SIFT-SVM: Scale Invariant Feature Transform with Locality-Constraint-based Vector Sparse Coding and SVM; CNN-SVM: CNN with SVM; HCNN-ELM: Hybrid CNN with Extreme Learning Machine; HF: Handcrafted Features; HF-SIFT: HF with SIFT; HF2D-CNN: HF with 2D CNN; HFCNN-PCA: HF with 3D CNN and PCA; HoG-SVM: Histogram of Oriented Gradients with SVM; 3/1S-CNN: 3/1 Slices CNN; GB: Gradient Boosting; *: with PCA.

CNN based methods performed very well in detecting breast masses and lesions in
MMM obtained from different datasets (see Fig. 4C). For MMM obtained from the FFDM
database, a CNN trained with labeled and unlabeled features outperformed other
methods (CNN-LU: 87.9%, SVM-LU: 85.4%, ANN-LU: 84%, SVM-L: 82.5%, ANN-L:
81.9%) [153]. In detecting masses in MMM from the BCDR database, a CNN with 2
convolution layers and 1 fully connected layer (CNN3) performed similarly to other
methods (CNN3: 82%, HGD: 83%, HOG: 81%, DeCAF: 82%), while a CNN with 1
convolution layer and 1 fully connected layer performed poorly (CNN2: 78%) [151].
A pre-trained CNN with RF outperformed other methods (e.g., RF with handcrafted
features and sequential forward selection with LDA) while analyzing MMM from the
INBreast database (RF-CNNPT: 95%, CNNPT: 91%, RF-HF: 90%, SFS-LDA:
89%) [155]. In yet another study, a CNN with linear SVM outperformed other methods
on MMM from the DDSM database (CNN-LSVM: 96.7%, SVM-ELM: 95.7%, SSVM: 91.4%,
AANN: 91%, SCNN: 88.8%, SVM: 83.9%) [152]. However, for MMM from the MIAS
database, a DWT with back-propagating NN outperformed its SVM/CNN counterparts
(DWT-GMB: 97.4% vs. SVM-MLP: 93.8%) [152].
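To make the ‘CNN3’ configuration above concrete (two convolutional layers plus one fully connected layer [151]), a hedged PyTorch sketch of such a patch classifier could look as follows; the patch size, filter counts, and class count are our assumptions, not those of the original study:

import torch
import torch.nn as nn

class CNN3(nn.Module):
    """Two conv layers + one fully connected layer, for 2-class (mass vs. normal) patches."""
    def __init__(self, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2),   # 48 -> 24
            nn.Conv2d(16, 32, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2),  # 24 -> 12
        )
        self.classifier = nn.Linear(32 * 12 * 12, n_classes)
    def forward(self, x):
        h = self.features(x)
        return self.classifier(h.flatten(1))

model = CNN3()
logits = model(torch.rand(4, 1, 48, 48))   # four 48x48 mammogram patches
print(logits.shape)                        # torch.Size([4, 2])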
Despite all having been applied to images from the ADNI database, the performances
of the reported methods in detecting and classifying ‘AD vs. NC’ and ‘MCI vs.
NC’ (AD and MCI, in short) varied greatly (see Fig. 4D).
deep-supervised-adaptable 3D-CNN (DSA-3D-CNN) outperformed other DL and
non-DL methods, as reported using their accuracies, in detecting AD and MCI (AD,
MCI– DSA-3DCNN: 99.3%, 94.2% vs. [DSAE: 95.9%, 85.0%; DBM: 95.4%, 85.7%;
SAE-3DCNN: 95.3%, 92.1%; SAE: 94.7%, 86.3%; DSAE-SLR: 91.4%, 82.1%; MK-SVM:
96.0%, 80.3%; ISML: 93.8%, 89.1%; DDL: 91.4%, 77.4%]) [126]. A Stacked RBM with
dropout based method outperformed the same method without dropout and
multi-kernel based SVM method in detecting AD and MCI (AD, MCI– SRBMDO:
91.4%, 77.4% vs. [SRBMWODO: 84.2%, 73.1%; and MK-SVM: 85.3%, 76.9%]) [121]. In
another method, with StAE-extracted features, MK-SVM was more accurate than the
SK-SVM method ([AD, MCI]– MK-SVM: [95.9%, 85.0%] vs. SK-SVM: [85.3%,
76.9%]) [117]. Another method, where features from MRI and PET were fused and
learned using multi-modal stacked Deep Polynomial Network (MStDPN) algorithm,
outperformed other multimodal learning methods in detecting AD and MCI (AD, MCI–
MTL: 95.38%, 82.99%; StSDAE: 91.95%, 83.72%; MStDPN-SVM: 97.13%, 87.24%;
MStDPN-LC: 96.93%, 86.99%) [124].
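The MK-SVM results above rest on combining one kernel per modality (e.g., MRI and PET) into a single kernel. A minimal scikit-learn sketch of this idea on synthetic features, using a precomputed weighted kernel sum (the weights, kernel parameters, and data are illustrative assumptions):

import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 60)                          # AD (1) vs. NC (0) labels
mri = rng.normal(size=(60, 50)) + y[:, None]        # synthetic MRI features
pet = rng.normal(size=(60, 30)) + 0.5 * y[:, None]  # synthetic PET features

# One kernel per modality, combined with weight beta (fixed here; in MK-SVM
# the weights are typically tuned, e.g., by cross-validation).
beta = 0.6
K = beta * rbf_kernel(mri, gamma=0.01) + (1 - beta) * rbf_kernel(pet, gamma=0.01)

train, test = np.arange(40), np.arange(40, 60)
clf = SVC(kernel="precomputed").fit(K[np.ix_(train, train)], y[train])
acc = clf.score(K[np.ix_(test, train)], y[test])    # test rows vs. train columns
print(f"toy accuracy: {acc:.2f}")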
Different techniques reported varying accuracies in detecting a range of anomalies
in different medical images (Fig. 4E). CNN had better accuracy in classifying
interstitial lung disease (ILD) patterns (85.61%) when compared to RF (78.09%),
kNN (73.33%), and SVM-RBF (71.52%) [147]. LNC was performed more accurately using
StDAE (95%) than using RT-SVM (72.5%) and CT-SVM (77%) [138]. A multi-crop CNN
achieved better accuracy (87.14%) than a DAE with binary DT (80.29%) and a
quantitative radiomics based approach (83.21%) in LNC [140]. MTANNs outperformed
CNN variants in LNC (MTANN: 88.06% vs. [SCNN: 77.09%, LeNet: 75.86%, RDCNN: 78.13%,
AlexNet: 76.85%, FT-AlexNet: 77.55%]) [141]. A hierarchical CNN with ELM outperformed
other CNN and SVM methods (HCNN-ELM: 97.23% vs. [SIFT-SVM: 89.79%, CNN: 95%,
CNN-SVM: 97.05%]) in classifying digestive organs [149]. A multi-channel CNN with PCA
and handcrafted features detected BT better (89.85%) in comparison to a 2D-CNN (81.25%),
scale-invariant feature transform (78.35%), and manual classification with handcrafted
features (62.8%) [129]. A 2D CNN based method trained with stochastic gradient descent
outperformed other non-DL methods (AUC values– CNN: 0.93 vs. [HOG-SVM:
0.87, and RF-SVM: 0.76]) in detecting colon polyps from CT colonography [137]. In
addition, a 3Slice-CNN was successfully employed to detect chemotherapy response in
PET images, outperforming other shallow methods (3S-CNN: 73.4 ± 5.3%,
1S-CNN: 66.4 ± 5.9%, GB: 66.7 ± 5.2%, RF: 57.3 ± 7.8%, SVM: 55.9 ± 8.1%,
GB-PCA: 66.7 ± 6.0%, RF-PCA: 65.7 ± 5.6%, SVM-PCA: 60.5 ± 8.0%) [143].
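Several entries in Fig. 4E (e.g., FT-AlexNet [141]) rely on transfer learning, where a network pre-trained on natural images is fine-tuned on the medical task. A hedged torchvision sketch of the usual recipe follows; the two-class head and the choice to freeze the convolutional features are our assumptions, not the cited setup:

import torch
import torch.nn as nn
from torchvision import models

# Load AlexNet with ImageNet weights (older torchvision: pretrained=True).
model = models.alexnet(weights="IMAGENET1K_V1")

for p in model.features.parameters():       # freeze the convolutional feature extractor
    p.requires_grad = False

model.classifier[6] = nn.Linear(4096, 2)    # replace 1000-way head with a 2-way one

opt = torch.optim.Adam([p for p in model.parameters() if p.requires_grad], lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

x = torch.rand(4, 3, 224, 224)              # grayscale patches replicated to 3 channels
ytrue = torch.tensor([0, 1, 0, 1])
opt.zero_grad()
loss = loss_fn(model(x), ytrue)             # one fine-tuning step on the new head
loss.backward()
opt.step()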

3.4 [Brain/Body]-Machine Interfaces


Test results comparing the performance of DL and non-DL methods applied to EEG data
to detect MoI, emotion and affective states, and anomalies are shown in Fig. 5A.
A linear parallel CNN with MLP classified EEG energy dynamics more accurately
(70.6%) than SVM (67.0%), MLP (65.8%), and CNN (69.6%) in detecting MoI from the BCI
competition (BCIC) IV-2a dataset [174]. A CNN classified FCMS of augmented
CSP and SFM features better (68.5% and 69.3%) than filter-bank CSP (67.0%) in
detecting MoI, also from the BCIC IV-2a dataset [172].
[Figure 5 (three panels of bar charts) appears here.]

Figure 5. Accuracy comparison of DL and conventional ML techniques when applied to BMI signals. A. Performance comparison in detecting motor imagery (MID), recognizing emotion and cognitive states (ER), and detecting anomalies (AnD) from EEG signals. B. Accuracies of movement decoding (MD) from EMG signals. C. Accuracies of ECG signal classification (SC).

Abbreviations (A): CSP: Common Spatial Pattern; FCMS: Frequency Complementary feature Map Selection; SFM: full Feature Map; FBCSP: Filter-Bank CSP; DLN: Deep Learning Network implemented using Stacked Autoencoder with Softmax Classifier along with PCA and Covariate Shift Adaptation; CCNN: Channel-wise CNN; CNNR: CCNN with RBM instead of convolutional filter; DBNr: DBN without any prior knowledge; DBNf: DBN with handmade features; HMMf: Gaussian observation hidden Markov model with handmade features.
Abbreviations (B, C): DBN1: DBN and SVM with Gaussian kernel; SVM1: SVM with genetic algorithm; SVM2: SVM with KNN; CA: Cluster Analysis; DBN2: DBN with softmax classifier; SVM3: SVM with Wavelet transformation and ICA; SVM4: SVM with higher-order statistics and Hermite coefficients; SVM5: SVM with ICA; DyBN: Dynamic Bayesian Network; DBN3: DBN with softmax regression; NN1: block-based NN; NN2: feedforward, fully connected ANN with multidimensional particle swarm optimization; MOE: Mixture of Experts.

CNN, StAE, and their combination (CNN-StAE) were tested in classifying MoI from
BCIC IV-2b EEG data. Using time, frequency, and location information as features,
CNN-StAE achieved the best accuracy (77.6 ± 2.1%) in comparison to SVM (72.4 ± 5.7%),
CNN (74.8 ± 2.3%), and StAE (57.7 ± 5.5%) [173].
A DBN with Ada-boost based classifier had higher accuracy (∼81%) than SVM (∼76%)
in classifying hand movements from EEG [162]. Another DBN based method reported
better accuracy (0.84) using frequency representations of EEG (obtained via FFT and
wavelet packet decomposition) than filter-bank CSP (0.80) and CSP (0.76) in classifying
MoI [161]. A DNN based method, with layerwise relevance propagation heatmaps,
performed comparably (75%) to CSP-LDA (82%) in MoI classification [170].
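Since CSP and its filter-bank variant recur throughout these MoI results, a compact NumPy/SciPy sketch of the core CSP computation may help: the spatial filters are generalized eigenvectors of the two classes’ mean covariance matrices (a textbook formulation on toy data, not any cited paper’s code):

import numpy as np
from scipy.linalg import eigh

def csp_filters(trials_a, trials_b, n_pairs=2):
    """CSP spatial filters from two sets of EEG trials (trial, channel, time).
    Returns 2*n_pairs filters maximizing variance for one class vs. the other."""
    def mean_cov(trials):
        return np.mean([np.cov(t) for t in trials], axis=0)  # channel x channel
    Ca, Cb = mean_cov(trials_a), mean_cov(trials_b)
    # Generalized eigenvalue problem: Ca w = lambda (Ca + Cb) w
    vals, vecs = eigh(Ca, Ca + Cb)
    order = np.argsort(vals)                                  # extreme eigenvalues
    picks = np.r_[order[:n_pairs], order[-n_pairs:]]
    return vecs[:, picks].T                                   # (2*n_pairs, channels)

rng = np.random.default_rng(0)
a = rng.normal(size=(20, 8, 250))   # 20 trials, 8 channels, 250 samples (class A)
b = rng.normal(size=(20, 8, 250))
W = csp_filters(a, b)
feats = np.log(np.var(W @ a[0], axis=1))  # standard log-variance CSP features
print(W.shape, feats.shape)               # (4, 8) (4,)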
A DLN was built using StAE with PCA and covariate shift adaptation to classify
valence and arousal states from DEAP EEG data with multichannel PSD as features.
The mean accuracy of the DLN was 52.7 ± 9.7% compared to SVM (40.1 ± 7.9%) [160].
A supervised DBN based method classified affective states more accurately (75.6 ±
4.5%) when compared to SVM (64.7 ± 4.6%) by extracting deep features from
thousands of low level features using DEAP EEG data [163]. A DBN based method,
with differential entropy as features, explored critical frequency bands and channels in
EEG, and classified three emotions (positive, neutral, and negative) with higher
accuracy (86.1%) than SVM (83.9%) [165]. As reported through Az-scores, in predicting
drivers’ drowsy and alert states from EEG data, the CCNN and CNNR methods
outperformed (79.6% and 82.8%, respectively) other DL (CNN: 71.4 ± 7.5%, DNN:
76.5 ± 4.4%) and non-DL (LDA: 52.8 ± 4%, SVM: 50.4 ± 2.5%) methods [176].
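The differential entropy (DE) feature used in [165] has a closed form for a band-limited Gaussian signal, DE = 0.5 log(2*pi*e*sigma^2), so per-band DE features can be sketched in a few lines (the band boundaries and sampling rate below are conventional choices, our assumption):

import numpy as np
from scipy.signal import butter, filtfilt

BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 14),
         "beta": (14, 31), "gamma": (31, 50)}

def de_features(x, fs=200):
    """Differential entropy per EEG band: DE = 0.5*ln(2*pi*e*var)."""
    feats = {}
    for name, (lo, hi) in BANDS.items():
        b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
        xf = filtfilt(b, a, x)                                # band-pass filtered signal
        feats[name] = 0.5 * np.log(2 * np.pi * np.e * np.var(xf))
    return feats

rng = np.random.default_rng(0)
eeg = rng.normal(size=4000)          # 20 s of one channel at 200 Hz
print(de_features(eeg))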
DBN was used to model, classify, and detect anomalies in EEG waveforms. It has
been reported that, using raw data, comparable classification and superior anomaly
detection accuracy (50 ± 3%) can be achieved relative to SVM (48 ± 2%) and KNN
(40 ± 2%) classifiers [166]. Another DBN- and HMM-based method performed sleep stage
classification from raw EEG data (67.4 ± 12.9%) comparably to DBN with HMM and
handmade features (72.2 ± 9.7%) and a Gaussian observation HMM with handmade
features (63.9 ± 10.8%) [168].
Fig. 5B shows the performance of various methods in decoding movements (MD) from
(s)EMG. A CNN based method’s hand-movement classification accuracy on three
sEMG datasets (from the Ninapro database) was comparable to that of other methods (CNN
vs. [kNN, SVM, RF]) – Dataset 1: 66.6 ± 6.4% vs. [60 ± 8%, 62.1 ± 6.1%, 75.3 ±
5.7%]; Dataset 2: 60.3 ± 7.7% vs. [68 ± 7%, 75 ± 5.8%, 75.3 ± 7.8%]; and Dataset 3:
38.1 ± 14.3% vs. [38.8 ± 11.9%, 46.3 ± 7.9%, 46.3 ± 7.9%] [184]. Another,
user-adaptive, method using CNN with deep feature learning decoded movements more
accurately than SVM (95 ± 2% vs. 80 ± 10%) [185].
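Movement decoding from sEMG, as in [184], typically proceeds by windowing the multichannel signal, extracting simple per-window features, and classifying them; a minimal scikit-learn baseline along those lines is sketched below (window length, RMS features, synthetic data, and classifier settings are our illustrative choices):

import numpy as np
from sklearn.ensemble import RandomForestClassifier

def window_features(emg, win=200, step=100):
    """Root-mean-square per channel over sliding windows; emg is (samples, channels)."""
    feats = []
    for start in range(0, len(emg) - win + 1, step):
        seg = emg[start:start + win]
        feats.append(np.sqrt(np.mean(seg ** 2, axis=0)))  # RMS per channel
    return np.array(feats)

rng = np.random.default_rng(0)
# Two fake movements with different channel activation patterns (10 channels).
move_a = rng.normal(size=(4000, 10)) * np.linspace(0.5, 1.5, 10)
move_b = rng.normal(size=(4000, 10)) * np.linspace(1.5, 0.5, 10)
X = np.vstack([window_features(move_a), window_features(move_b)])
y = np.array([0] * (len(X) // 2) + [1] * (len(X) // 2))

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(f"training accuracy: {clf.score(X, y):.2f}")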
Fig. 5C compares the performance of different techniques in classifying ECG signals from
the MIT-BIH arrhythmia database and detecting anomalies in them. A nonlinear SVM
with Gaussian kernel (DBN1) outperformed (98.5%) NN (97.5%), LDA (96.2%), SVM
with genetic algorithm (SVM1: 96.0%), SVM with kNN (SVM2: 98.0%), Wavelet with
PSO (88.8%), and CA (94.3%) in classifying ECG features extracted using DBN [186].
Comparable accuracy in classifying ECG beats was obtained using DBN with softmax
(DBN2: 98.8%) relative to SVM with Wavelet and ICA (SVM3: 99.7%), SVM with
higher-order statistics and Hermite coefficients (SVM4: 98.1%), SVM with ICA (SVM5:
98.8%), DT (96.1%), and Dynamic Bayesian network (DyBN: 98%) [187]. Using DBN
(with contrastive divergence and persistent contrastive divergence learning),
arrhythmias were classified more accurately (98.6%) than with a block-based NN (NN1:
98.1%), a feed-forward NN with PSO (NN2: 97.0%), a mixture of experts (97.6%),
and LDA (95.5%) [188].
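The DBN classifiers above are stacks of restricted Boltzmann machines trained with (persistent) contrastive divergence [188]; the single-step contrastive divergence (CD-1) update for one binary-binary RBM can be sketched in NumPy as follows (a didactic sketch on toy data, not the cited implementation):

import numpy as np

rng = np.random.default_rng(0)
n_vis, n_hid, lr = 64, 16, 0.1            # e.g., 64 samples of a binarized ECG beat
W = 0.01 * rng.normal(size=(n_vis, n_hid))
b_v, b_h = np.zeros(n_vis), np.zeros(n_hid)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0):
    """One contrastive-divergence (CD-1) update from a batch of visible vectors."""
    global W, b_v, b_h
    ph0 = sigmoid(v0 @ W + b_h)                        # hidden probs, positive phase
    h0 = (rng.random(ph0.shape) < ph0).astype(float)   # sample hidden states
    pv1 = sigmoid(h0 @ W.T + b_v)                      # reconstruct visibles
    ph1 = sigmoid(pv1 @ W + b_h)                       # hidden probs, negative phase
    W += lr * (v0.T @ ph0 - pv1.T @ ph1) / len(v0)
    b_v += lr * (v0 - pv1).mean(axis=0)
    b_h += lr * (ph0 - ph1).mean(axis=0)

batch = (rng.random((32, n_vis)) < 0.3).astype(float)  # toy binary "beats"
for epoch in range(10):
    cd1_step(batch)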

4 Open Issues and Future Perspectives


Overall, it is believed that the brain solves problems through reinforcement learning and
neuronal networks organized as hierarchical processing systems. Although the field of AI
has been trying to adopt and implement this strategy in computers since the 1950s,
notable progress has been made only recently, owing to our better understanding of
learning systems, the growth of computational power, the decline of computing costs
and, last but not least, the seamless integration of different technological and technical
breakthroughs. However, there are still situations where these methods fail,
underperforming traditional methods, and they must therefore be improved. Below we
outline what, in our opinion, are the shortcomings of current techniques and the existing
open research challenges, and we speculate about some future perspectives that will
facilitate further development and advancement of the field.
The combined computational capability and flexibility provided by the two
prominent ML methods (i.e., DL and RL) also has limitations [33]. Both methods
require heavy computing power and memory and are therefore hardly worthwhile for
moderate-size datasets. Additionally, the theory of DL is not completely understood,
which makes the high-level outcomes obscure and difficult to interpret, so that the
models are often treated as ‘black boxes’ [206]. In addition, like other ML techniques,
DL is susceptible to misclassification [72] and overclassification [207]. Furthermore,
not every nonlinear approximator can be used to represent action values in RL; some
cause instability or even divergence [31]. Also, bootstrapping makes many RL
algorithms computationally hard and inapplicable to real-time applications, as they are
too slow to converge and, in some cases, too dangerous (e.g., autonomous driving).
Moreover, very few of the existing techniques can harness the potential power of
distributed and parallel computation through cloud computing. Arguably, in cloud,
distributed, and parallel computing, data privacy and security concerns still
prevail [208], and the capability to process, in real time, the gigantic amount of
experimentally acquired data is still underdeveloped [209, 210].
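Regarding the instability of nonlinear action-value approximators noted above, the remedies adopted in [31], experience replay and a periodically synchronized target network, can be sketched in a few lines of PyTorch (a schematic toy fragment with made-up state/action dimensions, not the original implementation; terminal-state handling is omitted):

import random
from collections import deque

import torch
import torch.nn as nn

def mlp(n_in, n_out):
    return nn.Sequential(nn.Linear(n_in, 64), nn.ReLU(), nn.Linear(64, n_out))

q_net, target_net = mlp(4, 2), mlp(4, 2)       # online and frozen target Q-networks
target_net.load_state_dict(q_net.state_dict())
opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)
replay, gamma = deque(maxlen=10000), 0.99      # experience replay buffer

def train_step(batch_size=32):
    if len(replay) < batch_size:
        return
    s, a, r, s2 = map(torch.stack, zip(*random.sample(replay, batch_size)))
    with torch.no_grad():                      # bootstrap from the frozen network
        target = r + gamma * target_net(s2).max(dim=1).values
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    loss = nn.functional.mse_loss(q, target)
    opt.zero_grad()
    loss.backward()
    opt.step()

for _ in range(200):                           # fill the buffer with dummy transitions
    replay.append((torch.rand(4), torch.randint(2, (1,))[0],
                   torch.rand(1)[0], torch.rand(4)))
for step in range(100):
    train_step()
    if step % 20 == 0:                         # periodically sync the target network
        target_net.load_state_dict(q_net.state_dict())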
To mitigate these shortcomings and address the open issues, improving the existing
theoretical foundations of DL on the basis of experimental data is, first of all, crucial
for quantifying the performance of individual NN models [211]. These improvements
should address issues such as the assessment of an individual model’s computational
complexity and learning efficiency in relation to well-defined parameter-tuning
strategies, and the ability to generalize and topologically self-organize based on
data-driven properties. Also, novel data visualization techniques should be incorporated
so that the interpretation of data becomes intuitive and less cumbersome. In terms of
learning strategies, hybrid on- and off-policy updates combined with new advances in
optimization techniques are required. The problems pertaining to the partial
observability of RL environments are yet to be completely solved, and optimal action
selection remains a huge challenge.
As seen in Table 1, there are great opportunities to employ deep RL in Biological
data mining, for example, to derive dynamic information from Biological data coming
from multiple levels, reduce data redundancy, and discover novel biomarkers for
disease detection and prevention. Also, new unsupervised learning methods for deep RL
are required to reduce the need for large sets of labeled data during the training phase.
Multitasking and multiagent learning paradigms should also advance in order to cope
with dynamically changing problems.
In addition, to keep up with the rapid pace of data growth in the biological
application domains, computational infrastructures for distributed and parallel
computing tailored to those applications are needed.

5 Conclusion
The recent technological advancements in the Life Sciences have come with the huge
challenge of mining the multimodal, multidimensional, and complex Biological data
they produce. Triggered by that call, interdisciplinary approaches have resulted in the
development of cutting-edge machine-learning-based analytical tools. The success
stories of artificial neural networks, deep architectures, and reinforcement learning in
making machines intelligent are well known. Furthermore, computational costs have
dropped, computing power has surged, and quasi-unlimited solid-state storage is
available at a reasonable price. These factors have allowed these learning techniques to
be combined to reshape machines’ capability to understand and decipher complex
patterns in Biological data. To facilitate wider deployment of such techniques and to
serve as a reference point for the community, this article provides: a comprehensive
survey of the literature on the techniques’ usability with different Biological data; a
comparative study of the performances of various DL techniques when applied to data
from different application domains, as reported in the literature; and highlights of some
open issues and future perspectives.

Acknowledgment
The authors would like to thank Dr. Pawel Raif and Dr. Kamal Abu-Hassan for useful
discussions during the early stage of the work. This work was supported by the ACSLab
(www.acslab.info).

References
1. W. Coleman, Biology in the nineteenth century: problems of form, function, and
transformation. NY, USA: Cambridge Univ Press, 1977.

2. L. Magner, A history of the life sciences. New York: M. Dekker, 2002.
3. S. Brenner, “History of science. the revolution in the life sciences,” Science, vol.
338, no. 6113, pp. 1427–8, 2012.
4. J. Shendure and H. Ji, “Next-generation DNA sequencing,” Nat. Biotechnol.,
vol. 26, no. 10, pp. 1135–1145, 2008.
5. M. L. Metzker, “Sequencing technologies — the next generation,” Nat. Rev.
Genet., vol. 11, no. 1, pp. 31–46, 2010.
6. R. Vadivambal and D. S. Jayas, Bio-imaging : principles, techniques, and
applications. Boca Raton, FL: CRC Press, 2016.
7. R. A. Poldrack and M. J. Farah, “Progress and challenges in probing the human
brain,” Nature, vol. 526, no. 7573, pp. 371–379, 2015.
8. M. A. Lebedev and M. A. L. Nicolelis, “Brain-machine interfaces: From basic
science to neuroprostheses and neurorehabilitation,” Phys. Rev., vol. 97, no. 2,
pp. 767–837, 2017.
9. V. Marx, “Biology: The big challenges of big data,” Nature, vol. 498, no. 7453,
pp. 255–260, 2013.
10. Y. Li and L. Chen, “Big biological data: Challenges and opportunities,”
Genomics Proteomics Bioinformatics, vol. 12, pp. 187–189, 2014.
11. P. Wickware, “Next-generation biologists must straddle computation and
biology,” Nature, vol. 404, no. 6778, pp. 683–684, 2000.
12. A. L. Tarca and et al., “Machine learning and its applications to biology,” PLoS
Comput. Biol., vol. 3, no. 6, p. e116, 2007.
13. J. Hopfield, “Artificial neural networks,” IEEE Circuits Devices Mag., vol. 4,
no. 5, pp. 3–10, 1988.
14. C. Cortes and V. Vapnik, “Support-vector networks,” Mach. Learn., vol. 20,
no. 3, pp. 273–297, 1995.
15. G. X. Yuan, C. H. Ho, and C. J. Lin, “Recent advances of large-scale linear
classification,” Proc. IEEE, vol. 100, no. 9, pp. 2584–2603, 2012.
16. D. Heckerman, “A tutorial on learning with bayesian networks,” in Learning in
Graphical Models, M. I. Jordan, Ed. Springer Netherlands, 1998, no. 89, pp.
301–354.
17. T. Cover and P. Hart, “Nearest neighbor pattern classification,” IEEE Trans.
Inf. Theory, vol. 13, no. 1, pp. 21–27, 1967.
18. L. Rabiner and B. Juang, “An introduction to hidden Markov models,” IEEE
ASSP Mag., vol. 3, no. 1, pp. 4–16, 1986.
19. R. Kohavi and J. Quinlan, “Data mining tasks and methods: Classification:
Decision-tree discovery,” in Handbook of Data Mining and Knowledge Discovery,
W. Klosgen and J. Zytkow, Eds. New York, USA: Oxford University Press,
2002, pp. 267–276.
20. G. E. Hinton, “Connectionist learning procedures,” Artif. Intell., vol. 40, no. 1-3,
pp. 185–234, 1989.

21. A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum likelihood from
incomplete data via the em algorithm,” J. R. Stat. Soc. Series B Methodol.,
vol. 39, no. 1, pp. 1–38, 1977.
22. T. Kohonen, “Self-organized formation of topologically correct feature maps,”
Biol. Cybernet., vol. 43, no. 1, pp. 59–69, 1982.
23. G. Ball and D. Hall, “ISODATA, a novel method of data anlysis and pattern
classification,” Stanford Research Institute, Stanford, CA, Technical report
NTIS AD 699616, 1965.
24. J. C. Dunn, “A Fuzzy Relative of the ISODATA Process and Its Use in
Detecting Compact Well-Separated Clusters,” J. Cybernet., vol. 3, no. 3, pp.
32–57, 1973.
25. J. Hartigan, Clustering algorithms. NY, USA: J.Wiley & Sons, 1975.
26. M. Libbrecht and W. Noble, “Machine learning applications in genetics and
genomics,” Nat. Rev. Genet., vol. 16, no. 6, pp. 321–332, 2015.
27. A. Kan, “Machine learning applications in cell image analysis,” Immunol. Cell.
Biol., vol. 95, no. 6, pp. 525–530, 2017.
28. B. Erickson, P. Korfiatis, Z. Akkus, and T. Kline, “Machine learning for medical
imaging,” RadioGraphics, vol. 37, pp. 505–515, 2017.
29. C. Vidaurre and et al., “Machine learning-based coadaptive calibration for bcis,”
Neural Comput., vol. 23, no. 3, pp. 791–816, 2010.
30. M. Mahmud and S. Vassanelli, “Processing and analysis of multichannel
extracellular neuronal signals: State-of-the-art and challenges,” Front. Neurosci.,
vol. 10, 2016.
31. V. Mnih and et al., “Human-level control through deep reinforcement learning,”
Nature, vol. 518, no. 7540, pp. 529–533, 2015.
32. J. Schmidhuber, “Deep learning in neural networks: An overview,” Neural
Netw., vol. 61, pp. 85–117, 2015.
33. D. Ravi and et al., “Deep learning for health informatics,” IEEE J. Biomed.
Health Inform., vol. 21, no. 1, pp. 4–21, 2017.
34. P. Mamoshina and et al., “Applications of deep learning in biomedicine,” Mol.
Pharm., vol. 13, no. 5, pp. 1445–1454, 2016.
35. S. Min, B. Lee, and S. Yoon, “Deep learning in bioinformatics,” Brief
Bioinform., 2016, bbw068.
36. L. Kaelbling, M. Littman, and A. Moore, “Reinforcement learning: A survey,” J.
Artif. Intell. Res., vol. 4, pp. 237–285, 1996.
37. P. Y. Glorennec, “Reinforcement learning: an overview,” in Proc. ESIT, 2000,
pp. 17–35.
38. A. Gosavi, “Reinforcement learning: A tutorial survey and recent advances,”
INFORMS J. Comput., vol. 21, no. 2, pp. 178–192, 2009.
39. Y. Li, “Deep reinforcement learning: An overview,” CoRR, vol. abs/1701.07274,
2017.

40. Y. Bengio, “Learning deep architectures for ai,” Found. Trends Mach. Learn.,
vol. 2, no. 1, pp. 1–127, 2009.

41. I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. Cambridge, USA:
MIT Press, 2016.
42. A. M. Saxe and et al., “Exact solutions to the nonlinear dynamics of learning in
deep linear neural nets,” CoRR, vol. abs/1312.6120, 2013.
43. R. Pascanu, C. Gulcehre, K. Cho, and Y. Bengio, “How to construct deep
recurrent neural networks,” in Proc. ICLR, 2014.
44. J. L. Elman, “Finding structure in time,” Cognitive Sci., vol. 14, no. 2, pp.
179–211, 1990.
45. M. Schuster and K. Paliwal, “Bidirectional recurrent neural networks,” IEEE
Tran. Signal Proces., vol. 45, no. 11, pp. 2673–2681, 1997.

46. S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Comput.,
vol. 9, no. 8, pp. 1735–1780, 1997.
47. Z. Lipton, J. Berkowitz, and C. Elkan, “A critical review of recurrent neural
nets for sequence learning,” CoRR, vol. abs/1506.00019, 2015.

48. Y. Lecun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no.
7553, pp. 436–444, 5 2015.
49. T. Wiatowski and H. Bolcskei, “A mathematical theory of deep cnn for feature
extraction,” CoRR, vol. abs/1512.06293, 2015.

50. Y. LeCun and Y. Bengio, “Convolutional networks for images, speech, and time
series,” in The Handbook of Brain Theory and Neural Nets, M. Arbib, Ed.
Cambridge, USA: MIT Press, 1998, pp. 255–258.
51. A. Krizhevsky and et al., “ImageNet classification with deep convolutional
neural networks,” in Proc. NIPS, 2012, pp. 1097–1105.

52. K. Simonyan and A. Zisserman, “Very deep convolutional networks for
large-scale image recognition,” CoRR, vol. abs/1409.1556, 2014.
53. C. Szegedy and et al., “Going deeper with convolutions,” in Proc. CVPR, 2015,
pp. 1–9.

54. P. Vincent and et al., “Stacked denoising autoencoders: Learning useful
representations in a deep network with a local denoising criterion,” J. Mach.
Learn. Res., vol. 11, pp. 3371–3408, 2010.
55. P. Baldi, “Autoencoders, unsupervised learning and deep architectures,” in Proc.
ICUTLW, 2012, pp. 37–50.

56. M. Ranzato and et al., “Efficient learning of sparse representations with an
energy-based model,” in Proc. NIPS, 2006, pp. 1137–1144.
57. D. P. Kingma and M. Welling, “Auto-encoding variational bayes,” CoRR, vol.
abs/1312.6114, 2014.
58. S. Rifai and et al., “Contractive auto-encoders: explicit invariance during
feature extraction,” in Proc. ICML, 2011, pp. 833–840.

59. R. Salakhutdinov and G. E. Hinton, “Deep boltzmann machines,” in Proc.
AISTATS, 2009, pp. 448–455.
60. S. Geman and D. Geman, “Stochastic relaxation, gibbs distributions, and the
bayesian restoration of images,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 6,
no. 6, pp. 721–741, 1984.
61. A. Fischer and C. Igel, “An introduction to restricted boltzmann machines,” in
Proc. CIARP, 2012, pp. 14–36.
62. G. Desjardins, A. C. Courville, and Y. Bengio, “On training deep boltzmann
machines,” CoRR, vol. abs/1203.4416, 2012.
63. T. Tieleman, “Training restricted boltzmann machines using approximations to
likelihood gradient,” in Proc. ICML, 2008, pp. 1064–1071.
64. Y. Guo and et al., “Deep learning for visual understanding: A review,”
Neurocomputing, vol. 187, pp. 27–48, 2016.
65. G. Hinton, S. Osindero, and Y. Teh, “A fast learning algorithm for deep belief
nets,” Neural Comput., vol. 18, no. 7, pp. 1527–1554, 2006.
66. R. S. Sutton and A. G. Barto, Reinforcement Learning : An Introduction.
Cambridge, MA, USA: MIT Press, 1998.
67. D. L. Poole and A. K. Mackworth, Artificial intelligence : foundations of
computational agents. NY, USA: Cambridge University Press, 2017.
68. L. Busoniu and et al., Reinforcement Learning and Dynamic Programming
Using Function Approximators. FL, USA: CRC Press, 2010.
69. R. S. Sutton and et al., “Fast Gradient-descent Methods for Temporal-difference
Learning with Linear Function Approximation,” in Proc. ICML, 2009, pp.
993–1000.
70. T. Schaul, D. Horgan, K. Gregor, and D. Silver, “Universal Value Function
Approximators,” in Proc. ICML, 2015, pp. 1312–1320.
71. F. Woergoetter and B. Porr, “Reinforcement learning,” Scholarpedia, vol. 3,
no. 3, p. 1448, 2008.
72. A. M. Nguyen, J. Yosinski, and J. Clune, “Deep neural networks are easily
fooled: High confidence predictions for unrecognizable images,” in Proc. CVPR,
2015, pp. 427–436.
73. H. van Hasselt, A. Guez, and D. Silver, “Deep reinforcement learning with
Double Q-learning,” CoRR, vol. abs/1509.06461, 2015.
74. H. Hasselt, “Double q-learning,” in Proc. NIPS, 2010, pp. 2613–21.
75. B. J. Erickson and et al., “Toolkits and libraries for deep learning,” J. Digit.
Imaging, vol. 30, no. 4, pp. 400–405, 2017.
76. R. Fakoor, F. Ladhak, A. Nazi, and M. Huber, “Using deep learning to enhance
cancer diagnosis and classification,” in Proc. ICML, 2013.
77. P. Danaee, R. Ghaeini, and D. A. Hendrix, “A deep learning approach for
cancer detection and relevant gene identification,” in Proc. Pac. Symp.
Biocomput., vol. 22, 2016, pp. 219–229.

78. H. Li, “A template-based protein structure reconstruction method using da
learning,” J. Proteomics Bioinform., vol. 9, no. 12, 2016.

79. T. Lee and S. Yoon, “Boosted categorical rbm for computational prediction of
splice junctions,” in Proc. ICML, 2015, pp. 2483–2492.
80. R. Ibrahim, N. A. Yousri, M. A. Ismail, and N. M. El-Makky, “Multi-level
gene/mirna feature selection using deep belief nets and active learning,” in Proc.
IEEE EMBC, Aug 2014, pp. 3957–3960.

81. L. Chen, C. Cai, V. Chen, and X. Lu, “Trans-species learning of cellular
signaling systems with bimodal deep belief networks,” Bioinformatics, vol. 31,
no. 18, pp. 3008–3015, 2015.
82. X. Pan and H.-B. Shen, “RNA-protein binding motifs mining with a new hybrid
deep learning based cross-domain knowledge integration approach,” BMC
Bioinform., vol. 18, no. 1.
83. Y. Chen and et al., “Gene expression inference with deep learning,”
Bioinformatics, vol. 32, no. 12, pp. 1832–1839, 2016.
84. S. Zhang, J. Zhou, H. Hu, H. Gong, and et al., “A deep learning framework for
modeling structural features of rna-binding protein targets,” Nucleic. Acids Res.,
vol. 44, no. 4, p. e32, 2016.
85. C. Angermueller, H. J. Lee, W. Reik, and O. Stegle, “Deepcpg: accurate
prediction of single-cell dna methylation states using deep learning,” Genome
Biol., vol. 18, no. 1, p. 67, 2017.

86. D. Quang and et al., “Dann: a deep learning approach for annotating the
pathogenicity of genetic variants,” Bioinformatics, vol. 31, no. 5, p. 761, 2015.
87. R. Heffernan and et al., “Improving prediction of secondary structure, local
backbone angles, and solvent accessible surface area of proteins by iterative deep
learning,” Sci. Rep., vol. 5, p. 11476, 2015.

88. K. Tian and et al., “Boosting compound-protein interaction prediction by deep
learning,” Methods, vol. 110, pp. 64–72, 2016.
89. F. Wan and J. Zeng, “Deep learning with feature embedding for
compound-protein interaction prediction,” bioRxiv, 2016.
90. Y. Yuan and et al., “DeepGene: an advanced cancer type classifier based on
deep learning and somatic point mutations,” BMC Bioinform., vol. 17, no. 17, p.
476, 2016.
91. M. Hamanaka, K. Taneishi, H. Iwata, J. Ye, J. Pei, J. Hou, and Y. Okuno,
“CGBVS-DNN: Prediction of compound-protein interactions based on deep
learning,” Mol. Inf., vol. 36, no. 1-2, 2017.

92. O. Denas and J. Taylor, “Deep modeling of gene expression regulation in
erythropoiesis model,” in Proc. ICMLRL, 2013.
93. B. Alipanahi, A. Delong, M. T. Weirauch, and B. J. Frey, “Predicting the
sequence specificities of dna- and rna-binding proteins by deep learning,” Nature
Biotechnol., vol. 33, no. 8, pp. 831–838, 2015.

94. D. R. Kelley, J. Snoek, and J. L. Rinn, “Basset: learning the regulatory code of
the accessible genome with deep convolutional neural networks,” Genome Res.,
vol. 26, no. 7, pp. 990–9, 2016.
95. H. Zeng, M. D. Edwards, G. Liu, and D. K. Gifford, “Convolutional neural
network architectures for predicting dna-protein binding,” Bioinformatics,
vol. 32, no. 12, pp. 121–127, 2016.
96. J. Zhou and O. G. Troyanskaya, “Predicting effects of noncoding variants with
deep learning-based sequence model,” Nature Methods, vol. 12, no. 10, pp.
931–934, 2015.
97. Y. Huang, B. Gulko, and A. Siepel, “Fast, scalable prediction of deleterious
noncoding variants from functional and population genomic data,” Nature
Genet., vol. 49, pp. 618–624, 2017.
98. S. Wang, J. Peng, and et al., “Protein secondary structure prediction using deep
convolutional neural fields,” Sci. Rep., vol. 6, 2016.
99. S. Park and et al., “deepMiRGene: Deep neural network based precursor
microrna prediction,” CoRR, vol. abs/1605.00017, 2016.
100. B. Lee, J. Baek, S. Park, and S. Yoon, “deepTarget: End-to-end learning
framework for miRNA target prediction using deep recurrent neural networks,”
CoRR, vol. abs/1603.09123, 2016.
101. L. Chuang and et al., “Operon prediction using particle swarm optimization &
reinforcementlearning,” in Proc. ICTAAI, 2010, pp. 366–72.
102. C. Ralha, H. Schneider, M. Walter, and A. Bazzan, “Reinforcement learning
method for bioagents,” in Proc. SBRN, 2010, pp. 109–114.
103. M. Bocicor and et al., “A reinforcement learning approach for solving the
fragment assembly problem,” in Proc. SYNASC, 2011, pp. 191–198.
104. F. Zhu, Q. Liu, X. Zhang, and B. Shen, “Protein-protein interaction network
constructing based on text mining and reinforcement learning with application
to prostate cancer,” in Proc. BIBM, 2014, pp. 46–51.
105. J. Xu and et al., “Stacked sparse autoencoder (SSAE) for nuclei detection on
breast cancer histopathology images,” IEEE Trans. Med. Imaging, vol. 35, no. 1,
pp. 119–130, 2016.
106. Y. Xu, T. Mo, Q. Feng, P. Zhong, and et al., “Deep learning of feature
representation with multiple instance learning for medical image analysis,” in
Proc. ICASSP, 2014, pp. 1626–1630.
107. C. L. Chen, A. Mahjoubfar, L.-C. Tai, I. K. Blaby, and et al., “Deep learning in
label-free cell classification,” Sci. Rep., vol. 6, no. 1, 2016.
108. F. Ning, D. Delhomme, Y. LeCun, F. Piano, and et al., “Toward automatic
phenotyping of developing embryos from videos,” IEEE Trans. Image Process.,
vol. 14, no. 9, pp. 1360–1371, 2005.
109. D. Ciresan and et al., “Mitosis detection in breast cancer histology images with
deep neural nets,” in Proc. MICCAI, 2013, pp. 411–418.
110. ——, “Deep neural nets segment neuronal membrane in electron microscopy
images,” in Proc. NIPS, 2012, pp. 2843–2851.

111. T. Parnamaa and L. Parts, “Accurate classification of protein subcellular
localization from high-throughput microscopy images using deep learning,” G3,
vol. 7, no. 5, pp. 1385–1392, 2017.
112. A. Ferrari, S. Lombardi, and A. Signoroni, “Bacterial colony counting with
convolutional neural networks in digital microbiology imaging,” Pat. Recogn.,
vol. 61, pp. 629 – 640, 2017.
113. O. Z. Kraus, J. L. Ba, and B. J. Frey, “Classifying and segmenting microscopy
images with deep multiple instance learning,” Bioinfo, vol. 32, no. 12, p. i52,
2016.
114. P. Eulenberg and et al., “Deep learning for imaging flow cytometry: Cell cycle
analysis of jurkat cells,” bioRxiv, 2016.
115. B. Jiang and et al., “Convolutional neural networks in automatic recognition of
trans-differentiated neural progenitor cells under bright-field microscopy,” in
Proc. IMCCC, 2015, pp. 122–126.
116. A. Mansoor, J. Cerrolaza, R. Idrees, and et al., “Deep learning guided
partitioned shape model for anterior visual pathway segmentation,” IEEE Trans.
Med. Imaging, vol. 35, no. 8, pp. 1856–1865, 2016.
117. H.-I. Suk and D. Shen, “Deep learning-based feature representation for ad/mci
classification,” in Proc. MICCAI, 2013, pp. 583–590.
118. B. Shi, Y. Chen, P. Zhang, C. Smith, and J. Liu, “Nonlinear feature
transformation and deep fusion for Alzheimer’s Disease staging analysis,”
Pattern Recognition, vol. 63, pp. 487–498, 2017.
119. T. Brosch and R. Tam, “Manifold learning of brain mris by deep learning,” in
Proc. MICCAI, 2013, pp. 633–640.
120. H.-I. Suk, S.-W. Lee, and D. Shen, “Hierarchical feature representation and
multimodal fusion with deep learning for ad/mci diagnosis,” NeuroImage, vol.
101, pp. 569 – 582, 2014.
121. F. Li, L. Tran, K. H. Thung, S. Ji, D. Shen, and J. Li, “A robust deep model for
improved classification of ad/mci patients,” IEEE J. Biomed. Health. Inform.,
vol. 19, no. 5, pp. 1610–1616, 2015.
122. S. Plis, D. Hjelm, R. Salakhutdinov, and et al., “Deep learning for neuroimaging:
a validation study,” Front. Neurosci., vol. 8, 2014.
123. M. Havaei and et al., “Brain tumor segmentation with deep neural networks,”
Med. Image Anal., vol. 35, pp. 18–31, 2017.
124. J. Shi and et al., “Multimodal neuroimaging feature learning with multimodal
stacked deep polynomial networks for diagnosis of Alzheimer’s disease,” IEEE J.
Biomed. Health Inform., vol. PP, pp. 1–1, 2017.
125. M. Havaei, N. Guizard, and et al., “Deep learning trends for focal brain
pathology segmentation in mri,” in Machine Learning for Health Informatics,
A. Holzinger, Ed. Cham: Springer, 2016, pp. 125–148.
126. E. HosseiniAsl, G. L. Gimelfarb, and A. El-Baz, “Alzheimer’s disease diagnostics
by a deeply supervised adaptable 3d convolutional network,” CoRR, vol.
abs/1607.00556, 2016.

127. J. Kleesiek and et al., “Deep mri brain extraction: A 3d cnn for skull stripping,”
NeuroImage, vol. 129, pp. 460 – 469, 2016.
128. S. Sarraf and G. Tofighi, “Deepad: Alzheimer’s disease classification via deep
convolutional neural nets using mri and fmri,” bioRxiv, 2017.
129. D. Nie, H. Zhang, E. Adeli, L. Liu, and et al., “3d deep learning for multi-modal
imaging-guided survival time prediction of brain tumor patients,” in Proc.
MICCAI, 2016, pp. 212–220.
130. K. Kamnitsas, C. Ledig, V. F. Newcombe, J. Simpson, and et al., “Efficient
multi-scale 3d CNN with fully connected CRF for accurate brain lesion
segmentation,” Med. Image Anal., vol. 36, pp. 61–78, 2017.
131. M. F. Stollenga, W. Byeon, M. Liwicki, and J. Schmidhuber, “Parallel
multi-dimensional lstm, with application to fast biomedical volumetric image
segmentation,” in Proc. NIPS, 2015, pp. 2980–88.
132. F. Agostinelli, M. R. Anderson, and H. Lee, “Adaptive multi-column deep
neural networks with application to robust image denoising,” in Proc. NIPS,
2013, pp. 1493–1501.
133. J. Lerouge and et al., “Ioda: An input/output deep architecture for image
labeling,” Pattern Recogn., vol. 48, no. 9, pp. 2847–58, 2015.
134. K. Fritscher and et al., “Deep neural networks for fast segmentation of 3d
medical images,” in Proc. MICCAI, 2016, pp. 158–165.
135. X. W. Gao, R. Hui, and Z. Tian, “Classification of CT brain images based on
deep learning networks,” Comput. Methods Programs Biomed., vol. 138, pp. 49 –
56, 2017.
136. J. Cho, K. Lee, E. Shin, G. Choy, and S. Do, “Medical image deep learning with
hospital pacs dataset,” CoRR, vol. abs/1511.06348, 2015.
137. H. Roth, L. Lu, J. Liu, and et al., “Improving computer aided detection using
convolutional neural networks and random view aggregation,” IEEE Trans. Med.
Imaging, vol. 35, no. 5, pp. 1170–1181, 2016.
138. J. Cheng and et al., “Computer-aided diagnosis with deep learning architecture:
Applications to breast lesions in us images and pulmonary nodules in ct scans,”
Sci. Rep., vol. 6, p. 24454, 2016.
139. H. Shin, H. Roth, M. Gao, L. Lu, and et al., “Deep convolutional neural
networks for computer-aided detection: CNN architectures, dataset
characteristics & transfer learning,” IEEE Trans. Med. Imaging, vol. 35, no. 5,
pp. 1285–1298, 2016.
140. W. Shen and et al., “Multi-crop convolutional neural networks for lung nodule
malignancy suspiciousness classification,” Pattern Recognit., vol. 61, pp.
663–673, 2017.
141. N. Tajbakhsh and K. Suzuki, “Comparing two classes of end-to-end
machine-learning models in lung nodule detection and classification: Mtanns vs.
cnns,” Pattern Recognit., vol. 63, pp. 476–486, 2017.
142. D. Kuang and L. He, “Classification on adhd with deep learning,” in Proc.
CCBD, 2014, pp. 27–32.

143. P. Ypsilantis and et al., “Predicting response to neoadjuvant chemotherapy with
pet imaging using convolutional neural networks,” PLoS One, vol. 10, no. 9, p.
e0137036, 2015.
144. L. Gondara, “Medical image denoising using convolutional denoising
autoencoders,” in Proc. ICDMW, 2016, pp. 241–246.
145. T. Ngo and et al., “Combining deep learning and level set for the automated
segmentation of the left ventricle of the heart from cardiac cine mr,” Med. Image
Anal., vol. 35, pp. 159–171, 2017.
146. H. Chen, D. Ni, J. Qin, S. Li, and et al., “Standard plane localization in fetal
ultrasound via domain transferred deep neural networks,” IEEE J. Biomed.
Health Inform., vol. 19, no. 5, pp. 1627–1636, Sept 2015.
147. M. Anthimopoulos and et al., “Lung pattern classification for interstitial lung
diseases using a deep convolutional neural network,” IEEE Trans. Med. Imag.,
vol. 35, no. 5, pp. 1207–1216, 2016.
148. M. Grinsven and et al., “Fast cnn training using selective data sampling:
Application to hemorrhage detection in color fundus images,” IEEE Trans. Med.
Imaging, vol. 35, no. 5, pp. 1273–1284, 2016.
149. J. Yu, J. Chen, Z. Xiang, and Y. Zou, “A hybrid convolutional neural networks
with extreme learning machine for wce image classification,” in Proc. ROBIO,
2015, pp. 1822–1827.
150. S. Lee and et al., “Fingernet: Deep learning-based robust finger joint detection
from radiographs,” in Proc. IEEE BioCAS, 2015, pp. 1–4.
151. J. Arevalo and et al., “Representation learning for mammography mass lesion
classification with convolutional neural networks,” Comput. Methods Programs
Biomed., vol. 127, pp. 248–257, 2016.
152. Z. Jiao and et al., “A deep feature based framework for breast masses
classification,” Neurocomputing, vol. 197, pp. 221–231, 2016.
153. W. Sun, T.-L. Tseng, J. Zhang, and W. Qian, “Enhancing deep convolutional
neural network scheme for breast cancer diagnosis with unlabeled data,”
Comput. Med. Imaging Graph., vol. 57, pp. 4–9, 2017.
154. T. Kooi, G. Litjens, B. van Ginneken, A. Gubern-Merida, and et al., “Large
scale deep learning for computer aided detection of mammographic lesions,”
Med. Image Anal., vol. 35, pp. 303–312, 2017.
155. N. Dhungel, G. Carneiro, and A. P. Bradley, “A deep learning approach for the
analysis of masses in mammograms with minimal user intervention,” Med.
Image Anal., vol. 37, pp. 114–128, 2017.
156. K. Sirinukunwattana, J. P. Pluim, H. Chen, X. Qi, and et al., “Gland
segmentation in colon histology images: The glas challenge contest,” Med. Image
Anal., vol. 35, pp. 489–502, 2017.
157. Q. Dou and et al., “3d deeply supervised network for automated segmentation of
volumetric medical images,” Med. Image Anal., vol. 41, pp. 40–54, 2017.
158. F. Sahba, H. R. Tizhoosh, and M. M. Salama, “Application of reinforcement
learning for segmentation of transrectal ultrasound images,” BMC Med. Imaging,
vol. 8, no. 1, p. 8, 2008.

159. J. Li and A. Cichocki, “Deep learning of multifractal attributes from motor
imagery induced eeg,” in Proc. ICONIP, 2014, pp. 503–510.

160. S. Jirayucharoensak, S. Pan-Ngum, and P. Israsena, “Eeg-based emotion
recognition using deep learning network with principal component based
covariate shift adaptation,” Scientific World J., pp. 1–10, 2014.
161. N. Lu, T. Li, X. Ren, and H. Miao, “A deep learning scheme for motor imagery
classification based on restricted boltzmann machines,” IEEE Trans. Neural
Syst. Rehabil. Eng., vol. 25, no. 6, pp. 566–576, 2017.

162. X. An and et al., “A deep learning method for classification of eeg data based on
motor imagery,” in Proc. ICIC, 2014, pp. 203–210.
163. K. Li, X. Li, Y. Zhang, and A. Zhang, “Affective state recognition from eeg with
deep belief networks,” in Proc. BIBM, 2013, pp. 305–310.

164. X. Jia, K. Li, X. Li, and A. Zhang, “A novel semi-supervised deep learning
framework for affective state recognition on eeg signals,” in Proc. IEEE BIBE,
2014, pp. 30–37.
165. W. Zheng and B. Lu, “Investigating critical frequency bands and channels for
EEG-based emotion recognition with deep neural net,” IEEE Trans. Auton.
Mental Develop., vol. 7, no. 3, pp. 162–175, 2015.
166. D. Wulsin and et al., “Modeling electroencephalography waveforms with
semi-supervised deep belief nets: fast classification and anomaly measurement,”
J. Neural Eng., vol. 8, no. 3, p. 036015, 2011.

167. Y. Zhao and L. He, “Deep learning in the eeg diagnosis of alzheimer’s disease,”
in Proc. ACCV, 2015, pp. 340–353.
168. M. Langkvist, L. Karlsson, and A. Loutfi, “Sleep stage classification using
unsupervised feature learning,” AANS, p. 107046, 2012.
169. R. Chai and et al., “Improving eeg-based driver fatigue classification using
sparse-deep belief networks,” Front. Neurosci., vol. 11, 2017.
170. I. Sturm and et al., “Interpretable deep neural networks for single-trial EEG
classification,” J. Neurosci. Methods, vol. 274, pp. 141–145, 2016.
171. S. Kumar and et al., “A deep learning approach for motor imagery eeg signal
classification,” in Proc. APWC on CSE, 2016, pp. 34–39.
172. H. Yang and et al., “On the use of convolutional neural networks and
augmented csp features for multi-class motor imagery of eeg signals
classification,” in Proc. EMBC, 2015, pp. 2620–2623.
173. Y. R. Tabar and U. Halici, “A novel deep learning approach for classification of
EEG motor imagery signals,” J. Neural Eng., vol. 14, no. 1, p. 016003, 2017.
174. S. Sakhavi and et al., “Parallel convolutional-linear neural net for motor imagery
classification,” in Proc. EUSIPCO, 2015, pp. 2786–2790.
175. S. Tripathi, S. Acharya, R. Sharma, S. Mittal, and et al., “Using deep and
convolutional neural networks for accurate emotion classification on deap
dataset,” in Proc. IAAI, 2017, pp. 4746–4752.

176. M. Hajinoroozi, Z. Mao, and Y. Huang, “Prediction of driver’s drowsy and alert
states from eeg signals with deep learning,” in Proc. IEEE CAMSAP, 2015, pp.
493–496.

177. P. Mirowski, D. Madhavan, Y. LeCun, and R. Kuzniecky, “Classification of
patterns of EEG synchronization for seizure prediction,” Clin. Neurophysiol., vol.
120, no. 11, pp. 1927–1940, 2009.
178. M. Soleymani and et al., “Continuous emotion detection using eeg signals and
facial expressions,” in Proc. ICME, 2014, pp. 1–6.

179. P. Bashivan, I. Rish, M. Yeasin, and N. Codella, “Learning representations from
EEG with deep recurrent-convolutional neural networks,” CoRR, vol.
abs/1511.06448, 2015.
180. A. Petrosian and et al., “Recurrent neural network based prediction of epileptic
seizures in intra- and extracranial EEG,” Neurocomputing, vol. 30, no. 1–4, pp.
201 – 218, 2000.
181. P. R. Davidson, R. D. Jones, and M. T. R. Peiris, “Eeg-based lapse detection
with high temporal resolution,” IEEE Trans. Biomed. Eng., vol. 54, no. 5, pp.
832–839, 2007.

182. T. Lampe and et al., “A brain-computer interface for high-level remote control
of an autonomous, reinforcement-learning-based robotic system for reaching and
grasping,” in Proc. IUI, 2014, pp. 83–88.
183. R. Bauer and A. Gharabaghi, “Reinforcement learning for adaptive threshold
control of restorative brain-computer interfaces: a bayesian simulation,” Front.
Neurosci., vol. 9, no. 36, 2015.
184. M. Atzori and et al., “Deep learning with convolutional neural networks applied
to emg data: A resource for the classification of movements for prosthetic
hands,” Front. Neurorobot., vol. 10, p. 9, 2016.
185. K. Park and S. Lee, “Movement intention decoding based on deep learn for
multiuser myoelectric interfaces,” in Proc. IWW-BCI, 2016, p. 2.
186. M. Huanhuan and Z. Yue, “Classification of electrocardiogram signals with dbn,”
in Proc. IEEE CSE, 2014, pp. 7–12.
187. Y. Yan and et al., “A restricted boltzmann machine based two-lead
electrocardiography classification,” in Proc. BSN, 2015, pp. 1–9.
188. Z. Wu and et al., “A novel method for classification of ecg arrhythmias using
dbn,” J. Comp. Intel. Appl., vol. 15, p. 1650021, 2016.
189. M. Rahhal and et al., “Deep learning approach for active classification of ecg
signals,” Inform. Sci., vol. 345, pp. 340–354, 2016.

190. J. DiGiovanna, B. Mahmoudi, J. Fortes, J. C. Principe, and J. C. Sanchez,
“Coadaptive brain-machine interface via reinforcement learning,” IEEE Trans.
Biomed. Eng., vol. 56, no. 1, pp. 54–64, 2009.
191. B. Mahmoudi and J. C. Sanchez, “A symbiotic brain-machine interface through
value-based decision making,” PLoS One, vol. 6, no. 3, 2011.

192. J. C. Sanchez, A. Tarigoppula, J. S. Choi, B. T. Marsh, and et al., “Control of a
center-out reaching task using a reinforcement learning brain-machine interface,”
in Proc. IEEE NER, 2011, pp. 525–528.

193. B. Mahmoudi, E. A. Pohlmeyer, N. W. Prins, S. Geng, and J. C. Sanchez,
“Towards autonomous neuroprosthetic control using hebbian reinforcement
learning,” J. Neural Eng., vol. 10, no. 6, p. 066005, 2013.
194. F. Wang, K. Xu, Q. S. Zhang, Y. W. Wang, and X. Zheng, “A multi-step neural
control for motor brain-machine interface by reinforcement learning,” AMM, vol.
461, pp. 565–569, 2013.
195. E. Pohlmeyer and et al., “Using reinforcement learning to provide stable
brain-machine interface control despite neural input reorganization,” PLoS One,
vol. 9, no. 1, p. e87253, 2014.

196. F. Wang and et al., “Quantized attention-gated kernel reinforcement learning for
brain-machine interface decoding,” IEEE Trans. Neural Netw. Learn. Syst.,
vol. 28, no. 4, pp. 873–886, 2017.
197. N. Zeng, Z. Wang, H. Zhang, W. Liu, and F. E. Alsaadi, “Deep belief networks
for quantitative analysis of a gold immunochromatographic strip,” Cogn.
Comput., vol. 8, no. 4, pp. 684–692, 2016.

198. J. Li, Z. Zhang, and H. He, “Hierarchical convolutional neural networks for
eeg-based emotion recognition,” Cogn. Comput., pp. 1–13, 2017.
199. D. Ray and et al., “A compendium of RNA-binding motifs for decoding gene
regulation,” Nature, vol. 499, no. 7457, pp. 172–177, 2013.

200. M. Fernandez-Delgado and et al., “Do we need hundreds of classifiers to solve
real world classification problems?” J. Mach. Learn. Res., vol. 15, pp.
3133–3181, 2014.
201. D. Marbach and et al., “Wisdom of crowds for robust gene network inference,”
Nat. Meth., vol. 9, no. 8, pp. 796–804, 2012.

202. M. Weirauch and et al., “Evaluation of methods for modeling transcription
factor sequence specificity,” Nat Biotech, vol. 31, no. 2, pp. 126–134, 2013.
203. B. Menze and et al., “The multimodal brain tumor image segmentation
benchmark (brats),” IEEE Trans. Med. Imaging, vol. 34, no. 10, pp. 1993–2024,
2015.
204. M. Avendi, A. Kheradvar, and H. Jafarkhani, “Fully automatic segmentation of
heart chambers in cardiac mri using deep learning,” J. Cardio. M. Reson.,
vol. 18, no. 1, p. P351, 2016.
205. A. Hore and D. Ziou, “Image quality metrics: PSNR vs. SSIM,” in Proc. ICPR,
2010, pp. 2366–2369.
206. D. Erhan and et al., “Understanding representations learned in deep
architectures,” Universite de Montreal, Tech. Rep. 1355, 2010.
207. C. Szegedy and et al., “Intriguing properties of neural networks,” CoRR, vol.
abs/1312.6199, 2013.

208. M. Mahmud and et al., “Service oriented architecture based web application
model for collaborative biomedical signal analysis,” Biomed. Tech. (Berl).,
vol. 57, pp. 780–783, 2012.

209. M. Mahmud and et al., “QSpike tools: a generic framework for parallel batch
preprocessing of extracellular neuronal signals recorded by substrate
microelectrode arrays,” Front. Neuroinform., vol. 8, 2014.
210. M. Mahmud and et al., “A web-based framework for semi-online parallel
processing of extracellular neuronal signals recorded by microelectrode arrays,”
in Proc. MEAMEETING, 2014, pp. 202–203.
211. P. Angelov and A. Sperduti, “Challenges in deep learning,” in Proc. ESANN,
2016, pp. 489–495.
