1803.04311 - Transaction Survey


IEEE COMMUNICATIONS SURVEYS & TUTORIALS 1

Deep Learning in Mobile and Wireless Networking: A Survey
Chaoyun Zhang, Paul Patras, and Hamed Haddadi

arXiv:1803.04311v2 [cs.NI] 17 Sep 2018

C. Zhang and P. Patras are with the Institute for Computing Systems Architecture (ICSA), School of Informatics, University of Edinburgh, Edinburgh, UK. Emails: {chaoyun.zhang, paul.patras}@ed.ac.uk. H. Haddadi is with the Dyson School of Design Engineering at Imperial College London. Email: [email protected].

Abstract—The rapid uptake of mobile devices and the rising popularity of mobile applications and services pose unprecedented demands on mobile and wireless networking infrastructure. Upcoming 5G systems are evolving to support exploding mobile traffic volumes, real-time extraction of fine-grained analytics, and agile management of network resources, so as to maximize user experience. Fulfilling these tasks is challenging, as mobile environments are increasingly complex, heterogeneous, and evolving. One potential solution is to resort to advanced machine learning techniques, in order to help manage the rise in data volumes and algorithm-driven applications. The recent success of deep learning underpins new and powerful tools that tackle problems in this space.
In this paper we bridge the gap between deep learning and mobile and wireless networking research, by presenting a comprehensive survey of the crossovers between the two areas. We first briefly introduce essential background and state-of-the-art in deep learning techniques with potential applications to networking. We then discuss several techniques and platforms that facilitate the efficient deployment of deep learning onto mobile systems. Subsequently, we provide an encyclopedic review of mobile and wireless networking research based on deep learning, which we categorize by different domains. Drawing from our experience, we discuss how to tailor deep learning to mobile environments. We complete this survey by pinpointing current challenges and open future directions for research.

Index Terms—Deep Learning, Machine Learning, Mobile Networking, Wireless Networking, Mobile Big Data, 5G Systems, Network Management.

I. INTRODUCTION

Internet connected mobile devices are penetrating every aspect of individuals’ life, work, and entertainment. The increasing number of smartphones and the emergence of ever-more diverse applications trigger a surge in mobile data traffic. Indeed, the latest industry forecasts indicate that the annual worldwide IP traffic consumption will reach 3.3 zettabytes (1 zettabyte = $10^{15}$ MB) by 2021, with smartphone traffic exceeding PC traffic by the same year [1]. Given the shift in user preference towards wireless connectivity, current mobile infrastructure faces great capacity demands. In response to this increasing demand, early efforts propose to agilely provision resources [2] and tackle mobility management distributively [3]. In the long run, however, Internet Service Providers (ISPs) must develop intelligent heterogeneous architectures and tools that can spawn the 5th generation of mobile systems (5G) and gradually meet more stringent end-user application requirements [4], [5].

The growing diversity and complexity of mobile network architectures have made monitoring and managing the multitude of network elements intractable. Therefore, embedding versatile machine intelligence into future mobile networks is drawing unparalleled research interest [6], [7]. This trend is reflected in machine learning (ML) based solutions to problems ranging from radio access technology (RAT) selection [8] to malware detection [9], as well as the development of networked systems that support machine learning practices (e.g. [10], [11]). ML enables systematic mining of valuable information from traffic data and automatically uncovers correlations that would otherwise have been too complex to extract by human experts [12]. As the flagship of machine learning, deep learning has achieved remarkable performance in areas such as computer vision [13] and natural language processing (NLP) [14]. Networking researchers are also beginning to recognize the power and importance of deep learning, and are exploring its potential to solve problems specific to the mobile networking domain [15], [16].

Embedding deep learning into the 5G mobile and wireless networks is well justified. In particular, data generated by mobile environments are increasingly heterogeneous, as these are usually collected from various sources, have different formats, and exhibit complex correlations [17]. As a consequence, a range of specific problems become too difficult or impractical for traditional machine learning tools (e.g., shallow neural networks). This is because (i) their performance does not improve if provided with more data [18] and (ii) they cannot handle high-dimensional state/action spaces in control problems [19]. In contrast, big data fuels the performance of deep learning, as it removes the need for hand-crafted, expert-defined features and instead employs hierarchical feature extraction. In essence this means information can be distilled efficiently and increasingly abstract correlations can be obtained from the data, while reducing the pre-processing effort. Graphics Processing Unit (GPU)-based parallel computing further enables deep learning to make inferences within milliseconds. This facilitates network analysis and management with high accuracy and in a timely manner, overcoming the runtime limitations of traditional mathematical techniques (e.g. convex optimization, game theory, metaheuristics).

Despite growing interest in deep learning in the mobile networking domain, existing contributions are scattered across different research areas and a comprehensive survey is lacking. This article fills this gap between deep learning and mobile and wireless networking, by presenting an up-to-date survey of research that lies at the intersection between these two fields. Beyond reviewing the most relevant literature, we discuss
the key pros and cons of various deep learning architectures, and outline deep learning model selection strategies, in view of solving mobile networking problems. We further investigate methods that tailor deep learning to individual mobile networking tasks, to achieve the best performance in complex environments. We wrap up this paper by pinpointing future research directions and important problems that remain unsolved and are worth pursuing with deep neural networks. Our ultimate goal is to provide a definitive guide for networking researchers and practitioners who intend to employ deep learning to solve problems of interest.

Survey Organization: We structure this article in a top-down manner, as shown in Figure 1. We begin by discussing work that gives a high-level overview of deep learning, future mobile networks, and networking applications built using deep learning, which help define the scope and contributions of this paper (Section II). Since deep learning techniques are relatively new in the mobile networking community, we provide a basic deep learning background in Section III, highlighting immediate advantages in addressing mobile networking problems. There exist many factors that enable implementing deep learning for mobile networking applications (including dedicated deep learning libraries, optimization algorithms, etc.). We discuss these enablers in Section IV, aiming to help mobile network researchers and engineers in choosing the right software and hardware platforms for their deep learning deployments.

In Section V, we introduce and compare state-of-the-art deep learning models and provide guidelines for model selection toward solving networking problems. In Section VI we review recent deep learning applications to mobile and wireless networking, which we group by different scenarios ranging from mobile traffic analytics to security, and emerging applications. We then discuss how to tailor deep learning models to mobile networking problems (Section VII) and conclude this article with a brief discussion of open challenges, with a view to future research directions (Section VIII).¹

¹ We list the abbreviations used throughout this paper in Table I.

TABLE I: List of abbreviations in alphabetical order.
Acronym         Explanation
5G              5th Generation mobile networks
A3C             Asynchronous Advantage Actor-Critic
AdaNet          Adaptive learning of neural Network
AE              Auto-Encoder
AI              Artificial Intelligence
AMP             Approximate Message Passing
ANN             Artificial Neural Network
ASR             Automatic Speech Recognition
BSC             Base Station Controller
BP              Back-Propagation
CDR             Call Detail Record
CNN or ConvNet  Convolutional Neural Network
ConvLSTM        Convolutional Long Short-Term Memory
CPU             Central Processing Unit
CSI             Channel State Information
CUDA            Compute Unified Device Architecture
cuDNN           CUDA Deep Neural Network library
D2D             Device to Device communication
DAE             Denoising Auto-Encoder
DBN             Deep Belief Network
OFDM            Orthogonal Frequency-Division Multiplexing
DPPO            Distributed Proximal Policy Optimization
DQN             Deep Q-Network
DRL             Deep Reinforcement Learning
DT              Decision Tree
ELM             Extreme Learning Machine
GAN             Generative Adversarial Network
GP              Gaussian Process
GPS             Global Positioning System
GPU             Graphics Processing Unit
GRU             Gated Recurrent Unit
HMM             Hidden Markov Model
HTTP            HyperText Transfer Protocol
IDS             Intrusion Detection System
IoT             Internet of Things
IoV             Internet of Vehicles
ISP             Internet Service Provider
LAN             Local Area Network
LTE             Long-Term Evolution
LSTM            Long Short-Term Memory
LSVRC           Large Scale Visual Recognition Challenge
MAC             Media Access Control
MDP             Markov Decision Process
MEC             Mobile Edge Computing
ML              Machine Learning
MLP             Multilayer Perceptron
MIMO            Multi-Input Multi-Output
MTSR            Mobile Traffic Super-Resolution
NFL             No Free Lunch theorem
NLP             Natural Language Processing
NMT             Neural Machine Translation
NPU             Neural Processing Unit
PCA             Principal Components Analysis
PIR             Passive Infra-Red
QoE             Quality of Experience
RBM             Restricted Boltzmann Machine
ReLU            Rectified Linear Unit
RFID            Radio Frequency Identification
RNC             Radio Network Controller
RNN             Recurrent Neural Network
SARSA           State-Action-Reward-State-Action
SELU            Scaled Exponential Linear Unit
SGD             Stochastic Gradient Descent
SON             Self-Organising Network
SNR             Signal-to-Noise Ratio
SVM             Support Vector Machine
TPU             Tensor Processing Unit
VAE             Variational Auto-Encoder
VR              Virtual Reality
WGAN            Wasserstein Generative Adversarial Network
WSN             Wireless Sensor Network

II. RELATED HIGH-LEVEL ARTICLES AND THE SCOPE OF THIS SURVEY

Mobile networking and deep learning problems have been researched mostly independently. Only recently have crossovers between the two areas emerged. Several notable works paint a comprehensive picture of the deep learning and/or mobile networking research landscape. We categorize these works into (i) pure overviews of deep learning techniques, (ii) reviews of analyses and management techniques in modern mobile networks, and (iii) reviews of works at the intersection between deep learning and computer networking. We summarize these earlier efforts in Table II and in this section discuss the most representative publications in each class.

A. Overviews of Deep Learning and its Applications

The era of big data is triggering wide interest in deep learning across different research disciplines [28]–[31] and a growing number of surveys and tutorials are emerging (e.g.
[Figure 1: block diagram of the survey structure. Panels: Sec. II, Related Work & Our Scope; Sec. III, Deep Learning 101; Sec. IV, Enabling Deep Learning in Mobile & Wireless Networking; Sec. V, Deep Learning: State-of-the-Art; Sec. VI, Deep Learning Driven Mobile & Wireless Networking; Sec. VII, Tailoring Deep Learning to Mobile Networks; Sec. VIII, Future Research Perspectives.]

Fig. 1: Diagrammatic view of the organization of this survey.

[23], [24]). LeCun et al. give a milestone overview of deep learning, introduce several popular models, and look ahead at the potential of deep neural networks [20]. Schmidhuber undertakes an encyclopedic survey of deep learning, likely the most comprehensive thus far, covering the evolution, methods, applications, and open research issues [21]. Liu et al. summarize the underlying principles of several deep learning models, and review deep learning developments in selected applications, such as speech processing, pattern recognition, and computer vision [22].

Arulkumaran et al. present several architectures and core algorithms for deep reinforcement learning, including deep Q-networks, trust region policy optimization, and asynchronous advantage actor-critic [26]. Their survey highlights the remarkable performance of deep neural networks in different control problems (e.g., video gaming, Go board game play, etc.). Similarly, deep reinforcement learning has also been surveyed in [75], where the authors shed more light on applications. Zhang et al. survey developments in deep learning for recommender systems [32], which have potential to play an important role in mobile advertising. As deep learning becomes increasingly popular, Goodfellow et al. provide a comprehensive tutorial of deep learning in a book that covers prerequisite knowledge, underlying principles, and popular applications [18].

B. Surveys on Future Mobile Networks

The emerging 5G mobile networks incorporate a host of new techniques to overcome the performance limitations of current deployments and meet new application requirements. Progress to date in this space has been summarized through surveys, tutorials, and magazine papers (e.g. [4], [5], [38], [39], [47]). Andrews et al. highlight the differences between 5G and prior mobile network architectures, conduct a comprehensive review of 5G techniques, and discuss research challenges facing future developments [38]. Agiwal et al. review new architectures for 5G networks, survey emerging wireless technologies, and point out research problems that
TABLE II: Summary of existing surveys, magazine papers, and books related to deep learning and mobile networking. The symbol D indicates a publication is in the scope of a domain; 7 marks papers that do not directly cover that area, but from which readers may retrieve some related insights. Publications related to both deep learning and mobile networks are shaded.
Columns: Publication; One-sentence summary; Scope, split into Machine learning (Deep learning, Other ML methods) and Mobile networking (Mobile big data, 5G technology).
LeCun et al. [20] A milestone overview of deep learning. D
Schmidhuber [21] A comprehensive deep learning survey. D
Liu et al. [22] A survey on deep learning and its applications. D
Deng et al. [23] An overview of deep learning methods and applications. D
Deng [24] A tutorial on deep learning. D
Goodfellow et al. [18] An essential deep learning textbook. D 7
Simeone [25] An introduction to machine learning for engineers. D D
Arulkumaran et al. [26] A survey of deep reinforcement learning. D 7
Hussein et al. [27] A survey of imitation learning. D D
Chen et al. [28] An introduction to deep learning for big data. D 7 7
Najafabadi [29] An overview of deep learning applications for big data analytics. D 7 7
Hordri et al. [30] A brief survey of deep learning for big data applications. D 7 7
Gheisari et al. [31] A high-level literature review on deep learning for big data analytics. D 7
Zhang et al. [32] A survey and outlook of deep learning for recommender systems. D 7 7
Yu et al. [33] A survey on networking big data. D
Alsheikh et al. [34] A survey on machine learning in wireless sensor networks. D D
Tsai et al. [35] A survey on data mining in IoT. D D
Cheng et al. [36] An introduction to mobile big data and its applications. D 7
Bkassiny et al. [37] A survey on machine learning in cognitive radios. D 7 7
Andrews et al. [38] An introduction and outlook of 5G networks. D
Gupta et al. [5] A survey of 5G architecture and technologies. D
Agiwal et al. [4] A survey of 5G mobile networking techniques. D
Panwar et al. [39] A survey of 5G network features, research progress and open issues. D
Elijah et al. [40] A survey of 5G MIMO systems. D
Buzzi et al. [41] A survey of 5G energy-efficient techniques. D
Peng et al. [42] An overview of radio access networks in 5G. 7 D
Niu et al. [43] A survey of 5G millimeter wave communications. D
Wang et al. [2] 5G backhauling techniques and radio resource management. D
Giust et al. [3] An overview of 5G distributed mobility management. D
Foukas et al. [44] A survey and insights on network slicing in 5G. D
Taleb et al. [45] A survey on 5G edge architecture and orchestration. D
Mach and Becvar [46] A survey on MEC. D
Mao et al. [47] A survey on mobile edge computing. D D
Wang et al. [48] An architecture for personalized QoE management in 5G. D D
Han et al. [49] Insights to mobile cloud sensing, big data, and 5G. D D
Singh et al. [50] A survey on social networks over 5G. 7 D D
Chen et al. [51] An introduction to 5G cognitive systems for healthcare. 7 7 7 D
Chen et al. [52] Machine learning for traffic offloading in cellular networks. D D
Wu et al. [53] Big data toward green cellular networks. D D D
Buda et al. [54] Machine learning aided use cases and scenarios in 5G. D D D
Imran et al. [55] An introduction to big data analysis for self-organizing networks (SON) in 5G. D D D
Keshavamurthy et al. [56] Machine learning perspectives on SON in 5G. D D D
Klaine et al. [57] A survey of machine learning applications in SON. 7 D D D
Jiang et al. [7] Machine learning paradigms for 5G. 7 D D D
Li et al. [58] Insights into intelligent 5G. 7 D D D
Bui et al. [59] A survey of future mobile networks analysis and optimization. 7 D D D
Kasnesis et al. [60] Insights into employing deep learning for mobile data analysis. D D
Alsheikh et al. [17] Applying deep learning and Apache Spark for mobile data analytics. D D
Cheng et al. [61] Survey of mobile big data analysis and outlook. D D D 7
Wang and Jones [62] A survey of deep learning-driven network intrusion detection. D D D 7
Kato et al. [63] Proof-of-concept deep learning for network traffic control. D D
Zorzi et al. [64] An introduction to machine learning driven network optimization. D D D
Fadlullah et al. [65] A comprehensive survey of deep learning for network traffic control. D D D 7
Zheng et al. [6] An introduction to big data-driven 5G optimization. D D D D
TABLE III: Continued from Table II
Columns: Publication; One-sentence summary; Scope, split into Machine learning (Deep learning, Other ML methods) and Mobile networking (Mobile big data, 5G technology).
Mohammadi et al. [66] A survey of deep learning in IoT data analytics. D D D
Ahad et al. [67] A survey of neural networks in wireless networks. D 7 7 D
Mao et al. [68] A survey of deep learning for wireless networks. D D D
Zhou et al. [69] A survey of ML and cognitive wireless communications. D D D D
Chen et al. [70] A tutorial on neural networks for wireless networks. D D D D
Gharaibeh et al. [71] A survey of smart cities. D D D D
Lane et al. [72] An overview and introduction of deep learning-driven mobile sensing. D D D
Ota et al. [73] A survey of deep learning for mobile multimedia. D D D
Mishra et al. [74] A survey of machine learning driven intrusion detection. D D D D
Our work A comprehensive survey of deep learning for mobile and wireless networks. D D D D

remain unsolved [4]. Gupta et al. also review existing work on 5G cellular network architectures, subsequently proposing a framework that incorporates networking ingredients such as Device-to-Device (D2D) communication, small cells, cloud computing, and the IoT [5].

Intelligent mobile networking is becoming a popular research area and related work has been reviewed in the literature (e.g. [7], [34], [37], [54], [56]–[59]). Jiang et al. discuss the potential of applying machine learning to 5G network applications including massive MIMO and smart grids [7]. This work further identifies several research gaps between ML and 5G that remain unexplored. Li et al. discuss opportunities and challenges of incorporating artificial intelligence (AI) into future network architectures and highlight the significance of AI in the 5G era [58]. Klaine et al. present several successful ML practices in Self-Organizing Networks (SONs), discuss the pros and cons of different algorithms, and identify future research directions in this area [57]. Potential exists to apply AI and exploit big data for energy efficiency purposes [53]. Chen et al. survey traffic offloading approaches in wireless networks, and propose a novel reinforcement learning based solution [52]. This opens a new research direction toward embedding machine learning for greening cellular networks.

C. Deep Learning Driven Networking Applications

A growing number of papers survey recent works that bring deep learning into the computer networking domain. Alsheikh et al. identify benefits and challenges of using big data for mobile analytics and propose a Spark based deep learning framework for this purpose [17]. Wang and Jones discuss evaluation criteria, data streaming and deep learning practices for network intrusion detection, pointing out research challenges inherent to such applications [62]. Zheng et al. put forward a big data-driven mobile network optimization framework in 5G networks, to enhance QoE performance [6]. More recently, Fadlullah et al. deliver a survey on the progress of deep learning in a broad range of areas, highlighting its potential application to network traffic control systems [65]. Their work also highlights several unsolved research issues worthy of future study.

Ahad et al. introduce techniques, applications, and guidelines on applying neural networks to wireless networking problems [67]. Although it identifies several limitations of neural networks, this article focuses largely on older neural network models, ignoring recent progress in deep learning and successful applications in current mobile networks. Lane et al. investigate the suitability and benefits of employing deep learning in mobile sensing, and emphasize the potential for accurate inference on mobile devices [72]. Ota et al. report novel deep learning applications in mobile multimedia [73]. Their survey covers state-of-the-art deep learning practices in mobile health and wellbeing, mobile security, mobile ambient intelligence, language translation, and speech recognition. Mohammadi et al. survey recent deep learning techniques for Internet of Things (IoT) data analytics [66]. They comprehensively overview existing efforts that incorporate deep learning into the IoT domain and shed light on current research challenges and future directions. Mao et al. focus on deep learning in wireless networking [68]. Their work surveys state-of-the-art deep learning applications in wireless networks, and discusses research challenges to be solved in the future.

D. Our Scope

The objective of this survey is to provide a comprehensive view on state-of-the-art deep learning practices in the mobile networking area. By this we aim to answer the following key questions:
1) Why is deep learning promising for solving mobile networking problems?
2) What are the cutting-edge deep learning models relevant to mobile and wireless networking?
3) What are the most recent successful deep learning applications in the mobile networking domain?
4) How can researchers tailor deep learning to specific mobile networking problems?
5) Which are the most important and promising directions worthy of further study?

The research papers and books we mentioned previously only partially answer these questions. This article goes beyond these previous works and specifically focuses on the crossovers between deep learning and mobile networking. We cover a range of neural network (NN) structures that are increasingly important and have not been explicitly discussed in earlier
tutorials, e.g., [76]. This includes auto-encoders and Generative Adversarial Networks. Unlike such existing tutorials, we also review open-source libraries for deploying and training neural networks, a range of optimization algorithms, and the parallelization of neural network models and training across large numbers of mobile devices. We also review applications not looked at in other related surveys, including traffic/user analytics, security and privacy, mobile health, etc.

While our main scope remains the mobile networking domain, for completeness we also discuss deep learning applications to wireless networks, and identify emerging application domains intimately connected to these areas. We differentiate between mobile networking, which refers to scenarios where devices are portable, battery powered, potentially wearable, and routinely connected to cellular infrastructure, and wireless networking, where devices are mostly fixed, and part of a distributed infrastructure (including WLANs and WSNs), and serve a single application. Overall, our paper distinguishes itself from earlier surveys in the following respects:

(i) We particularly focus on deep learning applications for mobile network analysis and management, instead of broadly discussing deep learning methods (as, e.g., in [20], [21]) or centering on a single application domain, e.g. mobile big data analysis with a specific platform [17].
(ii) We discuss cutting-edge deep learning techniques from the perspective of mobile networks (e.g., [77], [78]), focusing on their applicability to this area, whilst giving less attention to conventional deep learning models that may be out-of-date.
(iii) We analyze similarities between existing non-networking problems and those specific to mobile networks; based on this analysis we provide insights into both the best deep learning architecture selection strategies and adaptation approaches, so as to exploit the characteristics of mobile networks for analysis and management tasks.

To the best of our knowledge, this is the first time that mobile network analysis and management are jointly reviewed from a deep learning angle. We also provide for the first time insights into how to tailor deep learning to mobile networking problems.

III. DEEP LEARNING 101

We begin with a brief introduction to deep learning, highlighting the basic principles behind computation techniques in this field, as well as key advantages that lead to their success. Deep learning is a sub-branch of ML, which essentially enables an algorithm to make predictions, classifications, or decisions based on data, without being explicitly programmed. Classic examples include linear regression, the k-nearest neighbors classifier, and Q-learning. In contrast to traditional ML tools that rely heavily on features defined by domain experts, deep learning algorithms hierarchically extract knowledge from raw data through multiple layers of nonlinear processing units, in order to make predictions or take actions according to some target objective. The most well-known deep learning models are neural networks (NNs), but only NNs that have a sufficient number of hidden layers (usually more than one) can be regarded as ‘deep’ models. Besides deep NNs, other architectures have multiple layers, such as deep Gaussian processes [79], neural processes [80], and deep random forests [81], and can also be regarded as deep learning structures. The major benefit of deep learning over traditional ML is thus the automatic feature extraction, by which expensive hand-crafted feature engineering can be circumvented. We illustrate the relation between deep learning, machine learning, and artificial intelligence (AI) at a high level in Fig. 2.

[Figure 2: nested sets. AI (examples: rule engines, expert systems, evolutionary algorithms) contains Machine Learning (examples: supervised learning, unsupervised learning, reinforcement learning), which contains Deep Learning (examples: MLP, CNN, RNN). Applications in mobile & wireless networks (our scope) are marked within Deep Learning.]

Fig. 2: Venn diagram of the relation between deep learning, machine learning, and AI. This survey particularly focuses on deep learning applications in mobile and wireless networks.

A. The Evolution of Deep Learning

The discipline traces its origins 75 years back, when threshold logic was employed to produce a computational model for neural networks [82]. However, it was only in the late 1980s that neural networks (NNs) gained interest, as Rumelhart et al. showed that multi-layer NNs could be trained effectively by back-propagating errors [83]. LeCun and Bengio subsequently proposed the now popular Convolutional Neural Network (CNN) architecture [84], but progress stalled due to computing power limitations of systems available at that time. Following the recent success of GPUs, CNNs have been employed to dramatically reduce the error rate in the Large Scale Visual Recognition Challenge (LSVRC) [85]. This has drawn unprecedented interest in deep learning and breakthroughs continue to appear in a wide range of computer science areas.

B. Fundamental Principles of Deep Learning

The key aim of deep neural networks is to approximate complex functions through a composition of simple and predefined operations of units (or neurons). Such an objective function can be almost of any type, such as a mapping between images and their class labels (classification), computing future stock prices based on historical values (regression), or even deciding the next optimal chess move given the current status on the board (control). The operations performed are usually defined by a weighted combination of a specific group of hidden units with a non-linear activation function, depending on the structure of the model. Such operations along with the output units
IEEE COMMUNICATIONS SURVEYS & TUTORIALS 7

are named “layers”. The neural network architecture resembles the perception process in a brain, where a specific set of units is activated given the current environment, influencing the output of the neural network model.

C. Forward and Backward Propagation
In mathematical terms, the architecture of deep neural networks is usually differentiable, therefore the weights (or parameters) of the model can be learned by minimizing a loss function using gradient descent methods through back-propagation, following the fundamental chain rule [83]. We illustrate the principles of the learning and inference processes of a deep neural network in Fig. 3, where we use a two-dimensional (2D) Convolutional Neural Network (CNN) as an example.

Forward propagation: The figure shows a CNN with 5 layers, i.e., an input layer (grey), 3 hidden layers (blue) and an output layer (orange). In forward propagation, a 2D input x (e.g., an image) is first processed by a convolutional layer, which performs the following convolutional operation:
$$h_1 = \sigma(w_1 * x). \tag{1}$$
Here h1 is the output of the first hidden layer, w1 is the convolutional filter and σ(·) is the activation function, aiming at improving the non-linearity and representability of the model. The output h1 is subsequently provided as input to and processed by the following two convolutional layers, which eventually produce a final output y. This could be, for instance, a vector of probabilities for different possible patterns (shapes) discovered in the (image) input. To train the CNN appropriately, one uses a loss function L(w) to measure the distance between the output y and the ground truth y∗. The purpose of training is to find the best weights w, so as to minimize the loss function L(w). This can be achieved through back-propagation with gradient descent.

Backward propagation: During backward propagation, one computes the gradient of the loss function L(w) over the weight of the last hidden layer, and updates the weight by computing:
$$w_4 = w_4 - \lambda \frac{dL(w)}{dw_4}. \tag{2}$$
Here λ denotes the learning rate, which controls the step size of moving in the direction indicated by the gradient. The same operation is performed for each weight, following the chain rule. The process is repeated and eventually the gradient descent will lead to a set w that minimizes L(w).
For other NN structures, the training and inference processes are similar. To help less expert readers, we detail the principles and computational details of various deep learning techniques in Sec. V.

[Figure 3: forward passing (inference) runs from the inputs x through the units of hidden layers 1, 2 and 3 to the outputs y; backward passing (learning) runs in the opposite direction.]
Fig. 3: Illustration of the learning and inference processes of a 4-layer CNN. w(·) denote weights of each hidden layer, σ(·) is an activation function, λ refers to the learning rate, ∗(·) denotes the convolution operation and L(w) is the loss function to be optimized.

TABLE IV: Summary of the benefits of applying deep learning to solve problems in mobile and wireless networks.
Key aspect | Description | Benefits
Feature extraction | Deep neural networks can automatically extract high-level features through layers of different depths. | Reduce expensive hand-crafted feature engineering in processing heterogeneous and noisy mobile big data.
Big data exploitation | Unlike traditional ML tools, the performance of deep learning usually grows significantly with the size of training data. | Efficiently utilize huge amounts of mobile data generated at high rates.
Unsupervised learning | Deep learning is effective in processing un-/semi-labeled data, enabling unsupervised learning. | Handling large amounts of unlabeled data, which are common in mobile systems.
Multi-task learning | Features learned by neural networks through hidden layers can be applied to different tasks by transfer learning. | Reduce computational and memory requirements when performing multi-task learning in mobile systems.

D. Advantages of Deep Learning
We recognize several benefits of employing deep learning to address network engineering problems, as summarized in Table IV. Specifically:
1) It is widely acknowledged that, while vital to the performance of traditional ML algorithms, feature engineering is costly [86]. A key advantage of deep learning is that it can automatically extract high-level features from data that has complex structure and inner correlations. The learning process does not need to be designed by a human, which tremendously simplifies prior feature hand-crafting [20]. The importance of this is amplified in the context of mobile networks, as mobile data is usually generated by heterogeneous sources, is often noisy, and exhibits non-trivial spatial/temporal patterns [17], whose labeling would otherwise require substantial human effort.
2) Secondly, deep learning is capable of handling large amounts of data. Mobile networks generate high volumes of different types of data at fast pace. Training traditional ML algorithms (e.g., Support Vector Machine (SVM) [87] and Gaussian Process (GP) [88]) sometimes requires storing all the data in memory, which is computationally infeasible under big data scenarios. Furthermore, the performance of ML does not grow significantly with
large volumes of data and plateaus relatively fast [18]. In contrast, Stochastic Gradient Descent (SGD) employed to train NNs only requires sub-sets of data at each training step, which guarantees deep learning’s scalability with big data. Deep neural networks further benefit as training with big data prevents model over-fitting.
3) Traditional supervised learning is only effective when sufficient labeled data is available. However, most current mobile systems generate unlabeled or semi-labeled data [17]. Deep learning provides a variety of methods that allow exploiting unlabeled data to learn useful patterns in an unsupervised manner, e.g., Restricted Boltzmann Machine (RBM) [89], Generative Adversarial Network (GAN) [90]. Applications include clustering [91], data distribution approximation [90], un/semi-supervised learning [92], [93], and one/zero shot learning [94], [95], among others.
4) Compressive representations learned by deep neural networks can be shared across different tasks, while this is limited or difficult to achieve in other ML paradigms (e.g., linear regression, random forest, etc.). Therefore, a single model can be trained to fulfill multiple objectives, without requiring complete model retraining for different tasks. We argue that this is essential for mobile network engineering, as it reduces computational and memory requirements of mobile systems when performing multi-task learning applications [96].

[Figure 4, layers from top to bottom: Fast Optimization Algorithms (e.g., SGD, RMSprop, Adam); Dedicated Deep Learning Libraries (e.g., TensorFlow, PyTorch, Caffe2) and Fog Computing (Software) (e.g., Core ML, DeepSense); Distributed Machine Learning Systems (e.g., Gaia, TUX²); Advanced Parallel Computing (e.g., GPU, TPU) and Fog Computing (Hardware) (e.g., nn-X, Kirin 970).]
Fig. 4: Hierarchical view of deep learning enablers. Parallel computing and hardware in fog computing lay foundations for deep learning. Distributed machine learning systems can build upon them, to support large-scale deployment of deep learning. Deep learning libraries run at the software level, to enable fast deep learning implementation. Higher-level optimizers are used to train the NN, to fulfill specific objectives.
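As a rough illustration of the training loop described in Sec. III-C (forward propagation, loss evaluation, backward propagation via the chain rule, and the update w ← w − λ dL/dw), combined with the mini-batch argument of item 2 above, consider the following NumPy sketch. It is not the CNN of Fig. 3: for brevity it assumes a single fully-connected hidden layer with a sigmoid activation, a squared-error loss, and synthetic data. All function names (`make_toy_data`, `train_minibatch_sgd`, and so on) and hyper-parameter values are hypothetical choices made for this example.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(x, w1, w2):
    """Forward propagation: h1 = sigma(x @ w1), y = h1 @ w2 (dense, not convolutional)."""
    h1 = sigmoid(x @ w1)
    return h1, h1 @ w2


def loss(y, y_true):
    """Half the mean squared distance between the output y and the ground truth."""
    return 0.5 * np.mean(np.sum((y - y_true) ** 2, axis=1))


def backward(x, h1, y, y_true, w2):
    """Backward propagation: gradients of the loss w.r.t. w1 and w2 via the chain rule."""
    dy = (y - y_true) / x.shape[0]    # dL/dy
    dw2 = h1.T @ dy                   # dL/dw2
    dh1 = dy @ w2.T                   # dL/dh1
    dz1 = dh1 * h1 * (1.0 - h1)       # derivative of the sigmoid
    dw1 = x.T @ dz1                   # dL/dw1
    return dw1, dw2


def make_toy_data(n=256, seed=1):
    """Synthetic regression data (an assumption for illustration only)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=(n, 2))
    y = np.sin(2.0 * x[:, :1]) + 0.5 * x[:, 1:]
    return x, y


def train_minibatch_sgd(x, y_true, hidden=8, lr=0.2, epochs=200, batch=16, seed=0):
    rng = np.random.default_rng(seed)
    w1 = rng.normal(scale=0.5, size=(x.shape[1], hidden))
    w2 = rng.normal(scale=0.5, size=(hidden, y_true.shape[1]))
    history = [loss(forward(x, w1, w2)[1], y_true)]   # loss before training
    for _ in range(epochs):
        order = rng.permutation(x.shape[0])
        for start in range(0, x.shape[0], batch):
            idx = order[start:start + batch]          # only a sub-set of the data per step
            h1, y = forward(x[idx], w1, w2)
            dw1, dw2 = backward(x[idx], h1, y, y_true[idx], w2)
            w1 -= lr * dw1                            # w <- w - lambda * dL/dw
            w2 -= lr * dw2
        history.append(loss(forward(x, w1, w2)[1], y_true))
    return w1, w2, history
```

Calling `train_minibatch_sgd(*make_toy_data())` returns the learned weights together with the full-data loss recorded before training and after every epoch. Each update touches only `batch` samples, which is why memory use does not grow with the size of the dataset.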
Although deep learning can have unique advantages when addressing mobile network problems, it requires certain system and software support, in order to be effectively deployed in mobile networks. We review and discuss such enablers in the next section.

IV. ENABLING DEEP LEARNING IN MOBILE NETWORKING
5G systems seek to provide high throughput and ultra-low latency communication services, to improve users’ QoE [4]. Implementing deep learning to build intelligence into 5G systems, so as to meet these objectives, is expensive. This is because powerful hardware and software is required to support training and inference in complex settings. Fortunately, several tools are emerging, which make deep learning in mobile networks tangible; namely, (i) advanced parallel computing, (ii) distributed machine learning systems, (iii) dedicated deep learning libraries, (iv) fast optimization algorithms, and (v) fog computing. These tools can be seen as forming a hierarchical structure, as illustrated in Fig. 4; synergies between them exist that make networking problems amenable to deep learning based solutions. By employing these tools, once the training is completed, inferences can be made within millisecond timescales, as already reported by a number of papers for a range of tasks (e.g., [97]–[99]). We summarize these advances in Table V and review them in what follows.

A. Advanced Parallel Computing
Compared to traditional machine learning models, deep neural networks have significantly larger parameter spaces, intermediate outputs, and number of gradient values. Each of these needs to be updated during every training step, requiring powerful computation resources. The training and inference processes involve huge amounts of matrix multiplications and other operations, though they could be massively parallelized. Traditional Central Processing Units (CPUs) have a limited number of cores, thus they only support restricted computing parallelism. Employing CPUs for deep learning implementations is highly inefficient and will not satisfy the low-latency requirements of mobile systems.
Engineers address these issues by exploiting the power of GPUs. GPUs were originally designed for high performance video games and graphical rendering, but new techniques such as Compute Unified Device Architecture (CUDA) [101] and the CUDA Deep Neural Network library (cuDNN) [102] developed by NVIDIA add flexibility to this type of hardware, allowing users to customize their usage for specific purposes. GPUs usually incorporate thousands of cores and perform exceptionally in fast matrix multiplications required for training neural networks. This provides higher memory bandwidth over CPUs and dramatically speeds up the learning process. Recent advanced Tensor Processing Units (TPUs) developed by Google even demonstrate 15-30× higher processing speeds and 30-80× higher performance-per-watt, as compared to CPUs and GPUs [100].
Diffractive neural networks (D²NNs) that completely rely on light communication were recently introduced in [117], to enable zero-consumption and zero-delay deep learning. The D²NN is composed of several transmissive layers, where points on these layers act as neurons in a NN. The structure is trained to optimize the transmission/reflection coefficients,
which are equivalent to weights in a NN. Once trained, transmissive layers will be materialized via 3D printing and they can subsequently be used for inference.

TABLE V: Summary of tools and techniques that enable deploying deep learning in mobile systems.
Technique | Examples | Scope | Functionality | Performance improvement | Energy consumption | Economic cost
Advanced parallel computing | GPU, TPU [100], CUDA [101], cuDNN [102] | Mobile servers, workstations | Enable fast, parallel training/inference of deep learning models in mobile applications | High | High | Medium (hardware)
Dedicated deep learning library | TensorFlow [103], Theano [104], Caffe [105], Torch [106] | Mobile servers and devices | High-level toolboxes that enable network engineers to build purpose-specific deep learning architectures | Medium | Associated with hardware | Low (software)
Fog computing | nn-X [107], ncnn [108], Kirin 970 [109], Core ML [110] | Mobile devices | Support edge-based deep learning computing | Medium | Low | Medium (hardware)
Fast optimization algorithms | Nesterov [111], Adagrad [112], RMSprop, Adam [113] | Training deep architectures | Accelerate and stabilize the model optimization process | Medium | Associated with hardware | Low (software)
Distributed machine learning systems | MLbase [114], Gaia [10], TUX² [11], Adam [115], GeePS [116] | Distributed data centers, cross-server | Support deep learning frameworks in mobile systems across data centers | High | High | High (hardware)

There are also a number of toolboxes that can assist the computational optimization of deep learning on the server side. Spring and Shrivastava introduce a hashing based technique that substantially reduces computation requirements of deep network implementations [118]. Mirhoseini et al. employ a reinforcement learning scheme to enable machines to learn the optimal operation placement over mixture hardware for deep neural networks. Their solution achieves up to 20% faster computation speed than human experts’ designs of such placements [119].
Importantly, these systems are easy to deploy, therefore mobile network engineers do not need to rebuild mobile servers from scratch to support deep learning computing. This makes implementing deep learning in mobile systems feasible and accelerates the processing of mobile data streams.

B. Distributed Machine Learning Systems
Mobile data is collected from heterogeneous sources (e.g., mobile devices, network probes, etc.), and stored in multiple distributed data centers. With the increase of data volumes, it is impractical to move all mobile data to a central data center to run deep learning applications [10]. Running network-wide deep learning algorithms would therefore require distributed machine learning systems that support different interfaces (e.g., operating systems, programming languages, libraries), so as to enable training and evaluation of deep models across geographically distributed servers simultaneously, with high efficiency and low overhead.
Deploying deep learning in a distributed fashion will inevitably introduce several system-level problems, which require satisfying the following properties:
Consistency – Guaranteeing that model parameters and computational processes are consistent across all machines.
Fault tolerance – Effectively dealing with equipment breakdowns in large-scale distributed machine learning systems.
Communication – Optimizing communication between nodes in a cluster and avoiding congestion.
Storage – Designing efficient storage mechanisms tailored to different environments (e.g., distributed clusters, single machines, GPUs), given I/O and data processing diversity.
Resource management – Assigning workloads and ensuring that nodes work in a well-coordinated manner.
Programming model – Designing programming interfaces to support multiple programming languages.
There exist several distributed machine learning systems that facilitate deep learning in mobile networking applications. Kraska et al. introduce a distributed system named MLbase, which makes it possible to intelligently specify, select, optimize, and parallelize ML algorithms [114]. Their system helps non-experts deploy a wide range of ML methods, allowing optimization and running ML applications across different servers. Hsieh et al. develop a geography-distributed ML system called Gaia, which breaks the throughput bottleneck by employing an advanced communication mechanism over Wide Area Networks, while preserving the accuracy of ML algorithms [10]. Their proposal supports versatile ML interfaces (e.g., TensorFlow, Caffe), without requiring significant changes to the ML algorithm itself. This system enables deployments of complex deep learning applications over large-scale mobile networks.

² Note that this is distinct from the Adam optimizer discussed in Sec. IV-D.

Xing et al. develop a large-scale machine learning platform to support big data applications [120]. Their architecture achieves efficient model and data parallelization, enabling parameter state synchronization with low communication cost. Xiao et al. propose a distributed graph engine for ML named TUX², to support data layout optimization across machines and reduce cross-machine communication [11]. They demonstrate remarkable performance in terms of runtime and convergence on a large dataset with up to 64 billion edges. Chilimbi et al. build a distributed, efficient, and scalable system named “Adam”² tailored to the training of deep models [115]. Their architecture demonstrates impressive performance in terms of throughput, delay, and fault tolerance. Another dedicated distributed deep learning system called
GeePS is developed by Cui et al. [116]. Their framework allows data parallelization on distributed GPUs, and demonstrates higher training throughput and faster convergence rate.

C. Dedicated Deep Learning Libraries
Building a deep learning model from scratch can prove complicated to engineers, as this requires definitions of forwarding behaviors and gradient propagation operations at each layer, in addition to CUDA coding for GPU parallelization. With the growing popularity of deep learning, several dedicated libraries simplify this process. Most of these toolboxes work with multiple programming languages, and are built with GPU acceleration and automatic differentiation support. This eliminates the need for hand-crafted definitions of gradient propagation. We summarize these libraries below, and give a comparison among them in Table VI.

TABLE VI: Summary and comparison of mainstream deep learning libraries.
Library | Low-Level Language | Available Interface(s) | Pros | Cons | Mobile Support | Popularity | High-Level Libraries
TensorFlow | C++ | Python, Java, C, C++, Go | Large user community; Well-written documentation; Complete functionality; Provides visualization tools; Multiple interfaces support | Difficult to debug; Package is heavy | Yes | High | Keras, TensorLayer [121], Luminoth
Theano | Python | Python | Flexible; Fast runtime | Difficult to learn; Long compilation time; No longer maintained | No | Low | Keras, Blocks, Lasagne
Caffe(2) | C++ | Python, Matlab | Fast runtime; Multiple platforms support | Small user base; Modest documentation | Yes | Medium | None
(Py)Torch | Lua, C++ | Lua, Python, C, C++ | Easy to build models in; Flexible; Well documented; Easy to debug | Limited resources | Yes | High | None
MXNET | C++ | C++, Python, Matlab, R | Lightweight; Memory-efficient; Fast training | Small user base; Difficult to learn | Yes | Low | Gluon

TensorFlow³ is a machine learning library developed by Google [103]. It enables deploying computation graphs on CPUs, GPUs, and even mobile devices [122], allowing ML implementation on both single and distributed architectures. Although originally designed for ML and deep neural network applications, TensorFlow is also suitable for other data-driven research purposes. Detailed documentation and tutorials for Python exist, while other programming languages such as C, Java, and Go are also supported. Currently it is the most popular deep learning library. Building upon TensorFlow, several dedicated deep learning toolboxes were released to provide higher-level programming interfaces, including Keras⁴, Luminoth⁵ and TensorLayer [121].

Theano is a Python library that allows one to efficiently define, optimize, and evaluate numerical computations involving multi-dimensional data [104]. It provides both GPU and CPU modes, which enables users to tailor their programs to individual machines. Learning Theano is however difficult and building NNs with it involves substantial compiling time. Though Theano has a large user base and a support community, and at some stage was one of the most popular deep learning tools, its popularity is decreasing rapidly, as core ideas and attributes are absorbed by TensorFlow.

Caffe(2) is a dedicated deep learning framework developed by Berkeley AI Research [105] and the latest version, Caffe2,⁶ was recently released by Facebook. Inheriting all the advantages of the old version, Caffe2 has become a very flexible framework that enables users to build their models highly efficiently. It also allows training neural networks on multiple GPUs within distributed systems, and supports deep learning implementations on mobile operating systems, such as iOS and Android. Therefore, it has the potential to play an important role in future mobile edge computing.

(Py)Torch is a scientific computing framework with wide support for machine learning models and algorithms [106]. It was originally developed in the Lua language, but developers later released an improved Python version [123]. In essence PyTorch is a lightweight toolbox that can run on embedded systems such as smart phones, but lacks comprehensive documentation. Since building NNs in PyTorch is straightforward, the popularity of this library is growing rapidly. PyTorch is now officially maintained by Facebook and mainly employed for research purposes.

³ TensorFlow, https://www.tensorflow.org/
⁴ Keras deep learning library, https://github.com/fchollet/keras
⁵ Luminoth deep learning library for computer vision, https://github.com/tryolabs/luminoth
⁶ Caffe2, https://caffe2.ai/

MXNET is a flexible deep learning library that provides interfaces for multiple languages (e.g., C++, Python, Matlab, R, etc.) [124]. It supports different levels of machine learning models, from logistic regression to GANs. MXNET provides fast numerical computation for both single machine and distributed ecosystems. It wraps workflows commonly used in deep learning into high-level functions, such that standard neural networks can be easily constructed without substantial coding effort. However, learning how to work with this toolbox in a short time frame is difficult, hence the number of users who
prefer this library is relatively small. MXNET is the official deep learning framework in Amazon.
Although less popular, there are other excellent deep learning libraries, such as CNTK,⁷ Deeplearning4j,⁸ Blocks,⁹ Gluon,¹⁰ and Lasagne,¹¹ which can also be employed in mobile systems. Selecting among these varies according to specific applications.

D. Fast Optimization Algorithms
The objective functions to be optimized in deep learning are usually complex, as they involve sums of extremely large numbers of data-wise likelihood functions. As the depth of the model increases, such functions usually exhibit high non-convexity with multiple local minima, critical points, and saddle points. In this case, conventional Stochastic Gradient Descent (SGD) algorithms [125] are slow in terms of convergence, which will restrict their applicability to latency constrained mobile systems. To overcome this problem and stabilize the optimization process, many algorithms evolve the traditional SGD, allowing NN models to be trained faster for mobile applications. We summarize the key principles behind these optimizers and make a comparison between them in Table VII. We delve into the details of their operation next.
Fixed Learning Rate SGD Algorithms: Sutskever et al. introduce a variant of the SGD optimizer with Nesterov’s momentum, which evaluates gradients after the current velocity is applied [111]. Their method demonstrates faster convergence rate when optimizing convex functions. Another approach is Adagrad, which performs adaptive learning to model parameters according to their update frequency. This is suitable for handling sparse data and significantly outperforms SGD in terms of robustness [112]. Adadelta improves the traditional Adagrad algorithm, enabling it to converge faster, and does not rely on a global learning rate [126]. RMSprop is a popular SGD based method introduced by G. Hinton. RMSprop divides the learning rate by the root of an exponentially smoothed average of squared gradients and does not require one to set the learning rate for each training step [125].
Adaptive Learning Rate SGD Algorithms: Kingma and Ba propose an adaptive learning rate optimizer named Adam, which incorporates momentum by the first-order moment of the gradient [113]. This algorithm is fast in terms of convergence, highly robust to model structures, and is considered as the first choice if one cannot decide what algorithm to use. By incorporating Nesterov’s momentum into Adam, Nadam applies stronger constraints to the gradients, which enables faster convergence [127].
Other Optimizers: Andrychowicz et al. suggest that the optimization process can even be learned dynamically [128]. They pose the gradient descent as a trainable learning problem, which demonstrates good generalization ability in neural network training. Wen et al. propose a training algorithm tailored to distributed systems [131]. They quantize floating-point gradient values to {-1, 0, +1} during training, which theoretically requires 20 times less gradient communication between nodes. The authors prove that such a gradient approximation mechanism allows the objective function to converge to optima with probability 1, where in their experiments only a 2% accuracy loss is observed on average on GoogLeNet [129] training. Zhou et al. employ a differentially private mechanism to compare training and validation gradients, to reuse samples and keep them fresh [130]. This can dramatically reduce overfitting during training.

E. Fog Computing
The fog computing paradigm presents a new opportunity to implement deep learning in mobile systems. Fog computing refers to a set of techniques that permit deploying applications or data storage at the edge of networks [132], e.g., on individual mobile devices. This reduces the communications overhead, offloads data traffic, reduces user-side latency, and lightens the server-side computational burdens [133], [134]. A formal definition of fog computing is given in [135], where this is interpreted as ’a huge number of heterogeneous (wireless and sometimes autonomous) ubiquitous and decentralized devices [that] communicate and potentially cooperate among them and with the network to perform storage and processing tasks without the intervention of third parties.’ To be more concrete, it can refer to smart phones, wearable devices and vehicles which store, analyze and exchange data, to offload the burden from the cloud and perform more delay-sensitive tasks. Since fog computing involves deployment at the edge, participating devices usually have limited computing resources and battery power. Therefore, special hardware and software are required for deep learning implementation, as we explain next.

⁷ MS Cognitive Toolkit, https://www.microsoft.com/en-us/cognitive-toolkit/
⁸ Deeplearning4j, http://deeplearning4j.org
⁹ Blocks, A Theano framework for building and training neural networks https://github.com/mila-udem/blocks
¹⁰ Gluon, A deep learning library https://gluon.mxnet.io/
¹¹ Lasagne, https://github.com/Lasagne
¹² Qualcomm Helps Make Your Mobile Devices Smarter With New Snapdragon Machine Learning Software Development Kit: https://www.qualcomm.com/news/releases/2016/05/02/qualcomm-helps-make-your-mobile-devices-smarter-new-snapdragon-machine

Hardware: There exist several efforts that attempt to shift deep learning computing from the cloud side to mobile devices. For example, Gokhale et al. develop a mobile coprocessor named neural network neXt (nn-X), which accelerates the execution of deep neural networks in mobile devices, while retaining low energy consumption [107]. Bang et al. introduce a low-power and programmable deep learning processor to deploy mobile intelligence on edge devices [136]. Their hardware only consumes 288 µW but achieves 374 GOPS/W efficiency. A Neurosynaptic Chip called TrueNorth is proposed by IBM [137]. Their solution seeks to support computationally intensive applications on embedded battery-powered mobile devices. Qualcomm introduces a Snapdragon neural processing engine to enable deep learning computational optimization tailored to mobile devices.¹² Their hardware allows developers to execute neural network models on Snapdragon 820 boards to serve a variety of applications. In close collaboration with Google, Movidius develops an embedded neural network computing framework that allows user-customized deep learning
TABLE VII: Summary and comparison of different optimization algorithms.


Optimization algorithm Core idea Pros Cons
• Setting a global learning rate required
Computes the gradient of • Algorithm may get stuck on saddle
SGD [125] mini-batches iteratively and • Easy to implement points or local minima
updates the parameters • Slow in terms of convergence
• Unstable
Introduces momentum to
• Stable
maintain the last gradient
Nesterov’s momentum [111] • Faster learning • Setting a learning rate needed
direction for the next
• Can escape local minima
update
• Still requires setting a global
Applies different learning • Learning rate tailored to each learning rate
Adagrad [112] rates to different parameter • Gradients sensitive to the regularizer
parameters • Handle sparse gradients well • Learning rate becomes very slow in
the late stages
Improves Adagrad, by • Does not rely on a global learning rate • May get stuck in a local minima at
Adadelta [126] applying a self-adaptive • Faster speed of convergence late training
learning rate • Fewer hyper-parameters to adjust
• Learning rate tailored to each
Employs root mean square parameter
RMSprop [125] as a constraint of the • Learning rate do not decrease • Still requires a global learning rate
learning rate dramatically at late training • Not good at handling sparse gradients
• Works well in RNN training
• Learning rate stailored to each
Employs a momentum parameter
mechanism to store an • Good at handling sparse gradients and
Adam [113] • It may turn unstable during training
exponentially decaying non-stationary problems
average of past gradients • Memory-efficient
• Fast convergence
Incorporates Nesterov
Nadam [127] accelerated gradients into • Works well in RNN training —
Adam
Casts the optimization
• Does not require to design the • Require an additional RNN for
Learn to optimize [128] problem as a learning
learning by hand learning in the optimizer
problem using a RNN
Quantizes the gradients • Good for distributed training
Quantized training [129] • Loses training accuracy
into {-1, 0, 1} for training • Memory-efficient
Employs a differential
private mechanism to
• More stable
compare training and
Stable gradient descent [130] • Less overfitting • Only validated on convex functions
validation gradients, to
• Converges faster than SGD
reuse samples and keep
them fresh.

deployments at the edge of mobile networks. Their products can achieve satisfying runtime efficiency, while operating with ultra-low power requirements. More recently, Huawei officially announced the Kirin 970 as a mobile AI computing system on chip.13 Their innovative framework incorporates dedicated Neural Processing Units (NPUs), which dramatically accelerates neural network computing, enabling classification of 2,000 images per second on mobile devices.

Software: Beyond these hardware advances, there are also software platforms that seek to optimize deep learning on mobile devices (e.g., [138]). We compare and summarize all these platforms in Table VIII.14 In addition to the mobile version of TensorFlow and Caffe, Tencent released a lightweight, high-performance neural network inference framework tailored to mobile platforms, which relies on CPU computing.15 This toolbox performs better than all known CPU-based open source frameworks in terms of inference speed. Apple has developed “Core ML”, a private ML framework to facilitate mobile deep learning implementation on iOS 11.16 This lowers the entry barrier for developers wishing to deploy ML models on Apple equipment. Yao et al. develop a deep learning framework called DeepSense dedicated to mobile sensing related data processing, which provides a general machine learning toolbox that accommodates a wide range of edge applications. It has moderate energy consumption and low latency, thus being amenable to deployment on smartphones.

The techniques and toolboxes mentioned above make the deployment of deep learning practices in mobile network applications feasible. In what follows, we briefly introduce several representative deep learning architectures and discuss their applicability to mobile networking problems.
13 Huawei announces the Kirin 970 – new flagship SoC with AI capabilities, http://www.androidauthority.com/huawei-announces-kirin-970-797788/
14 Adapted from https://mp.weixin.qq.com/s/3gTp1kqkiGwdq5olrpOvKw
15 ncnn is a high-performance neural network inference framework optimized for the mobile platform, https://github.com/Tencent/ncnn
16 Core ML: Integrate machine learning models into your app, https://developer.apple.com/documentation/coreml

TABLE VIII: Comparison of mobile deep learning platforms.

Platform | Developer | Mobile hardware supported | Speed | Code size | Mobile compatibility | Open-sourced
TensorFlow | Google | CPU | Slow | Medium | Medium | Yes
Caffe | Facebook | CPU | Slow | Large | Medium | Yes
ncnn | Tencent | CPU | Medium | Small | Good | Yes
CoreML | Apple | CPU/GPU | Fast | Small | Only iOS 11+ supported | No
DeepSense | Yao et al. | CPU | Medium | Unknown | Medium | No

V. DEEP LEARNING: STATE-OF-THE-ART

Revisiting Fig. 2, machine learning methods can be naturally categorized into three classes, namely supervised learning, unsupervised learning, and reinforcement learning. Deep learning architectures have achieved remarkable performance in all these areas. In this section, we introduce the key principles underpinning several deep learning models and discuss their largely unexplored potential to solve mobile networking problems. Technical details of classical models are provided to readers who seek to obtain a deeper understanding of neural networks. The more experienced can continue reading with Sec. VI. We illustrate and summarize the most salient architectures that we present in Fig. 5 and Table IX, respectively.

A. Multilayer Perceptron

The Multilayer Perceptron (MLP) is the initial Artificial Neural Network (ANN) design, which consists of at least three layers of operations [154]. Units in each layer are densely connected, hence a substantial number of weights must be configured. We show an MLP with two hidden layers in Fig. 5(a). Note that usually only MLPs that have more than one hidden layer are regarded as deep learning structures.

Given an input vector x, a standard MLP layer performs the following operation:

\[ \mathbf{y} = \sigma(\mathbf{W} \cdot \mathbf{x} + \mathbf{b}). \tag{3} \]

Here y denotes the output of the layer, W are the weights and b the biases. σ(·) is an activation function, which aims at improving the non-linearity of the model. Commonly used activation functions are the sigmoid,

\[ \mathrm{sigmoid}(x) = \frac{1}{1 + e^{-x}}, \]

the Rectified Linear Unit (ReLU) [155],

\[ \mathrm{ReLU}(x) = \max(x, 0), \]

tanh,

\[ \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}, \]

and the Scaled Exponential Linear Units (SELUs) [156],

\[ \mathrm{SELU}(x) = \lambda \begin{cases} x, & \text{if } x > 0; \\ \alpha e^{x} - \alpha, & \text{if } x \le 0, \end{cases} \]

where the parameters λ = 1.0507 and α = 1.6733 are frequently used. In addition, the softmax function is typically employed in the last layer when performing classification:

\[ \mathrm{softmax}(x_i) = \frac{e^{x_i}}{\sum_{j=0}^{k} e^{x_j}}, \]

where k is the number of labels involved in classification.

The MLP can be employed for supervised, unsupervised, and even reinforcement learning purposes. Although this structure was the most popular neural network in the past, its popularity is decreasing because it entails high complexity (fully-connected structure), modest performance, and low convergence efficiency. MLPs are mostly used as a baseline or integrated into more complex architectures (e.g., the final layer in CNNs used for classification). Building an MLP is straightforward, and it can be employed, e.g., to assist with feature extraction in models built for specific objectives in mobile network applications. The advanced Adaptive learning of neural Network (AdaNet) enables MLPs to dynamically train their structures to adapt to the input [139]. This new architecture can be potentially explored for analyzing continuously changing mobile environments.

B. Boltzmann Machine

Restricted Boltzmann Machines (RBMs) [89] were originally designed for unsupervised learning purposes. They are essentially a type of energy-based undirected graphical model, which includes a visible layer and a hidden layer, and where each unit can only assume binary values (i.e., 0 and 1). The probabilities of these values are given by:

\[ P(h_j = 1 \mid \mathbf{v}) = \frac{1}{1 + e^{-(\mathbf{W} \cdot \mathbf{v} + b_j)}}, \qquad P(v_j = 1 \mid \mathbf{h}) = \frac{1}{1 + e^{-(\mathbf{W}^{T} \cdot \mathbf{h} + a_j)}}, \]

where h, v are the hidden and visible units respectively, W are weights and a, b are biases. The visible units are conditionally independent given the hidden units, and vice versa.

RBMs can be effectively trained using the contrastive divergence algorithm [157] through multiple steps of Gibbs sampling [158]. We illustrate the structure and the training process of an RBM in Fig. 5(b). RBM-based models are usually employed to initialize the weights of a neural network in more recent applications. The pre-trained model can be subsequently fine-tuned for supervised learning purposes using a standard back-propagation algorithm. A stack of RBMs is called a Deep Belief Network (DBN) [140], which performs layer-wise training and achieves superior performance as compared to MLPs in many applications, including time series forecasting [159], ratio matching [160], and speech recognition [161]. Such structures can be even extended to a convolutional architecture, to learn hierarchical spatial representations [141].

C. Auto-Encoders

Auto-Encoders (AEs) are also designed for unsupervised learning and attempt to copy inputs to outputs. The underlying

[Figure 5 panels: (a) Structure of an MLP with 2 hidden layers (blue circles). (b) Graphical model and training process of an RBM; v and h denote visible and hidden variables, respectively. (c) Operating principle of an auto-encoder, which seeks to reconstruct the input from the hidden layer. (d) Operating principle of a convolutional layer. (e) Recurrent layer: $x_{1:t}$ is the input sequence, indexed by time t, $s_t$ denotes the state vector and $h_t$ the hidden outputs. (f) The inner structure of an LSTM layer. (g) Underlying principle of a generative adversarial network (GAN). (h) Typical deep reinforcement learning architecture; the agent is a neural network model that approximates the required function.]

Fig. 5: Typical structure and operation principles of MLP, RBM, AE, CNN, RNN, LSTM, GAN, and DRL.
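
To make Eq. (3) and the activation functions of Sec. V-A concrete, the following minimal Python sketch implements a dense layer, the sigmoid, ReLU and SELU activations, and softmax using only the standard library (tanh is available as `math.tanh`). The function names, the 2-3-2 layer sizes, and the weight values are illustrative assumptions and are not taken from any surveyed work. The softmax subtracts the maximum input purely for numerical stability, which leaves the result unchanged.

```python
import math

# SELU constants quoted in Sec. V-A.
SELU_LAMBDA = 1.0507
SELU_ALPHA = 1.6733


def sigmoid(x):
    """sigmoid(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def relu(x):
    """ReLU(x) = max(x, 0)."""
    return max(x, 0.0)


def selu(x):
    """SELU(x) = lambda * x if x > 0, else lambda * (alpha * e^x - alpha)."""
    if x > 0:
        return SELU_LAMBDA * x
    return SELU_LAMBDA * (SELU_ALPHA * math.exp(x) - SELU_ALPHA)


def softmax(xs):
    """Normalized exponentials; the max is subtracted only for stability."""
    m = max(xs)
    exps = [math.exp(v - m) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dense_layer(W, x, b, activation):
    """One MLP layer, y = activation(W . x + b), with W given as a list of rows."""
    return [
        activation(sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i)
        for row, b_i in zip(W, b)
    ]


# Toy 2-3-2 network with hand-picked (hypothetical) weights.
W1 = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.8]]
b1 = [0.0, 0.1, -0.1]
W2 = [[0.3, -0.5, 0.2], [-0.1, 0.4, 0.6]]
b2 = [0.05, -0.05]

x = [1.0, 2.0]
hidden = dense_layer(W1, x, b1, relu)               # hidden layer with ReLU
logits = dense_layer(W2, hidden, b2, lambda v: v)   # linear output layer
probs = softmax(logits)                             # class probabilities
print(hidden, probs)
```

Each hidden unit here needs one weight per input, which is the dense connectivity that makes the number of MLP weights grow quickly with layer width.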

TABLE IX: Summary of different deep learning architectures. GAN and DRL are shaded, since they are built upon other models.
Model | Learning scenarios | Example architectures | Suitable problems | Pros | Cons | Potential applications in mobile networks
MLP | Supervised, unsupervised, reinforcement | ANN, AdaNet [139] | Modeling data with simple correlations | Naive structure and straightforward to build | High complexity, modest performance and slow convergence | Modeling multi-attribute mobile data; auxiliary or component of other deep architectures
RBM | Unsupervised | DBN [140], Convolutional DBN [141] | Extracting robust representations | Can generate virtual samples | Difficult to train well | Learning representations from unlabeled mobile data; model weight initialization; network flow prediction
AE | Unsupervised | DAE [142], VAE [143] | Learning sparse and compact representations | Powerful and effective unsupervised learning | Expensive to pretrain with big data | Model weight initialization; mobile data dimension reduction; mobile anomaly detection
CNN | Supervised, unsupervised, reinforcement | AlexNet [85], ResNet [144], 3D-ConvNet [145], GoogLeNet [129], DenseNet [146] | Spatial data modeling | Weight sharing; affine invariance | High computational cost; challenging to find optimal hyper-parameters; requires deep structures for complex tasks | Spatial mobile data analysis
RNN | Supervised, unsupervised, reinforcement | LSTM [147], Attention based RNN [148], ConvLSTM [149] | Sequential data modeling | Expertise in capturing temporal dependencies | High model complexity; gradient vanishing and exploding problems | Individual traffic flow analysis; network-wide (spatio-)temporal data modeling
GAN | Unsupervised | WGAN [78], LS-GAN [150] | Data generation | Can produce lifelike artifacts from a target distribution | Training process is unstable (convergence difficult) | Virtual mobile data generation; assisting supervised learning tasks in network data analysis
DRL | Reinforcement | DQN [19], Deep Policy Gradient [151], A3C [77], Rainbow [152], DPPO [153] | Control problems with high-dimensional inputs | Ideal for high-dimensional environment modeling | Slow in terms of convergence | Mobile network control and management.

principle of an AE is shown in Fig. 5(c). AEs are frequently used to learn compact representations of data for dimension reduction [162]. Extended versions can be further employed to initialize the weights of a deep architecture (e.g., the Denoising Auto-Encoder (DAE) [142]), and to generate virtual examples from a target data distribution (e.g., Variational Auto-Encoders (VAEs) [143]).

A VAE typically comprises two neural networks: an encoder and a decoder. The input of the encoder is a data point x (e.g., images) and its functionality is to encode this input into a latent representation space z. Let $f_\Theta(z|x)$ be an encoder parameterized by Θ, with z sampled from a Gaussian distribution; the objective of the encoder is to output the mean and variance of this Gaussian distribution. Similarly, denoting by $g_\Omega(x|z)$ the decoder parameterized by Ω, this accepts the latent representation z as input, and outputs the parameters of the distribution of x. The objective of the VAE is to minimize the reconstruction error of the data and the Kullback-Leibler (KL) divergence between $p(z)$ and $f_\Theta(z|x)$. Once trained, the VAE can generate new data point samples by (i) drawing latent variables $z_i \sim p(z)$ and (ii) drawing a new data point $x_i \sim p(x|z)$.

AEs can be employed to address network security problems, as several research papers confirm their effectiveness in detecting anomalies under different circumstances [163]–[165], which we will further discuss in subsection VI-G. The structures of RBMs and AEs are based upon MLPs, CNNs or RNNs. Their goals are similar, while their learning processes are different. Both can be exploited to extract patterns from unlabeled mobile data, which may be subsequently employed for various supervised learning tasks, e.g., routing [166], mobile activity recognition [167], [168], periocular verification [169] and base station user number prediction [170].

D. Convolutional Neural Network

Instead of employing full connections between layers, Convolutional Neural Networks (CNNs or ConvNets) employ a set of locally connected kernels (filters) to capture correlations between different data regions. Mathematically, for each location $p_y$ of the output y, the standard convolution performs the following operation:

\[ y(p_y) = \sum_{p_G \in G} w(p_G) \cdot x(p_y + p_G), \tag{4} \]

where $p_G$ denotes all positions in the receptive field G of the convolutional filter W. Here the weights W are shared across different locations of the input map. We illustrate the operation of one 2D convolutional layer in Fig. 5(d).

CNNs improve traditional MLPs by leveraging three important ideas, namely, (i) sparse interactions, (ii) parameter sharing, and (iii) equivariant representations [18]. This reduces the number of model parameters significantly and maintains the affine invariance (i.e., recognition results are robust to the affine transformation of objects). Specifically, the sparse interactions imply that the weight kernel has smaller size than the input. It performs moving filtering to produce outputs (with roughly the same size as the inputs) for the current layer. Parameter sharing refers to employing the same kernel to scan the whole input map. This significantly reduces the number of parameters needed, which mitigates the risk of over-fitting. Equivariant representations indicate that convolution operations are invariant in terms of translation, scale, and shape. This is particularly useful for image processing, since essential features may show up at different locations in the image, with various affine patterns.

Owing to the properties mentioned above, CNNs achieve remarkable performance in imaging applications. Krizhevsky et al. [85] exploit a CNN to classify images on the ImageNet dataset [171]. Their method reduces the top-5 error by 39.7% and revolutionizes the imaging classification field. GoogLeNet [129] and ResNet [144] significantly increase the depth of CNN structures, and propose inception and residual learning techniques to address problems such as over-fitting and gradient vanishing introduced by “depth”. Their structure is further improved by the Dense Convolutional Network (DenseNet) [146], which reuses feature maps from each layer, thereby achieving significant accuracy improvements over other CNN based models, while requiring fewer layers. CNNs have also been extended to video applications. Ji et al. propose 3D convolutional neural networks for video activity recognition [145], demonstrating superior accuracy as compared to 2D CNNs. More recent research focuses on learning the shape of convolutional kernels [172], [173]. These dynamic architectures allow the model to focus automatically on important regions in input maps. Such properties are particularly important in analyzing large-scale mobile environments exhibiting clustering behaviors (e.g., surge of mobile traffic associated with a popular event).

Given the high similarity between image and spatial mobile data (e.g., mobile traffic snapshots, users’ mobility, etc.), CNN-based models have huge potential for network-wide mobile data analysis. This is a promising future direction that we further discuss in Sec. VIII.

E. Recurrent Neural Network

Recurrent Neural Networks (RNNs) are designed for modeling sequential data. At each time step, they produce output via recurrent connections between hidden units [18], as shown in Fig. 5(e). Given a sequence of inputs $x = \{x_1, x_2, \cdots, x_T\}$, a vanilla RNN performs the following operations:

\[
\begin{aligned}
s_t &= \sigma_s(W_x x_t + W_s s_{t-1} + b_s), \\
h_t &= \sigma_h(W_h s_t + b_h),
\end{aligned}
\]

where $s_t$ is the state and $h_t$ is the hidden output.

However, gradient vanishing and exploding problems are frequently reported in traditional RNNs, which make them particularly hard to train [174]. The Long Short-Term Memory (LSTM) mitigates these issues by introducing a set of “gates” [147], which has been proven successful in many applications (e.g., speech recognition [175], text categorization [176], and wearable activity recognition [98]). A standard LSTM performs the following operations:

\[
\begin{aligned}
i_t &= \sigma(W_{xi} X_t + W_{hi} H_{t-1} + W_{ci} \odot C_{t-1} + b_i), \\
f_t &= \sigma(W_{xf} X_t + W_{hf} H_{t-1} + W_{cf} \odot C_{t-1} + b_f), \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tanh(W_{xc} X_t + W_{hc} H_{t-1} + b_c), \\
o_t &= \sigma(W_{xo} X_t + W_{ho} H_{t-1} + W_{co} \odot C_t + b_o), \\
H_t &= o_t \odot \tanh(C_t).
\end{aligned}
\]

Here, ‘$\odot$’ denotes the Hadamard product, $C_t$ denotes the cell outputs, $H_t$ are the hidden states, and $i_t$, $f_t$, and $o_t$ are input gates, forget gates, and output gates, respectively. These gates mitigate the gradient issues and significantly improve the RNN. We illustrate the structure of an LSTM in Fig. 5(f).

Sutskever et al. introduce attention mechanisms to RNNs, which achieve outstanding accuracy in tokenized predictions [148]. Shi et al. substitute the dense matrix multiplication in LSTMs with convolution operations, designing a Convolutional Long Short-Term Memory (ConvLSTM) [149]. Their proposal reduces the complexity of traditional LSTM and demonstrates significantly lower prediction errors in precipitation nowcasting (i.e., forecasting the volume of precipitation).

Mobile networks produce massive sequential data from various sources, such as data traffic flows, and the evolution of mobile network subscribers’ trajectories and application latencies. Exploring the RNN family is promising to enhance the analysis of time series data in mobile networks.

F. Generative Adversarial Network

The Generative Adversarial Network (GAN) is a framework that trains generative models using the following adversarial process. It simultaneously trains two models: a generative one G that seeks to approximate the target data distribution from training data, and a discriminative model D that estimates the probability that a sample comes from the real training data rather than the output of G [90]. Both G and D are normally neural networks. The training procedure for G aims to maximize the probability of D making a mistake. The overall objective is solving the following minimax problem [90]:

\[ \min_{G} \max_{D} \; \mathbb{E}_{x \sim P_r(x)}[\log D(x)] + \mathbb{E}_{z \sim P_n(z)}[\log(1 - D(G(z)))]. \]

Both the generator and the discriminator are trained iteratively, each while fixing the other one. Finally G can produce data close to a target distribution (the same as that of the training examples), if the model converges. We show the overall structure of a GAN in Figure 5(g).
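
The GAN minimax objective of Sec. V-F can be evaluated empirically for a fixed generator and discriminator. The sketch below, in plain Python, estimates V(D, G) = E[log D(x)] + E[log(1 − D(G(z)))] from samples. The one-dimensional Gaussian data, the pass-through toy generator, and the two toy discriminators are all assumptions made purely for illustration and do not correspond to any model in the surveyed literature. The example only checks a known property of this objective: an uninformative discriminator with D(x) = 0.5 yields V = −log 4, whereas a discriminator that separates real from generated samples attains a larger value.

```python
import math
import random


def gan_value(D, G, real_samples, noise_samples, eps=1e-12):
    """Monte Carlo estimate of E[log D(x)] + E[log(1 - D(G(z)))]."""
    def clip(p):
        # Keep probabilities strictly inside (0, 1) so the logs are finite.
        return min(max(p, eps), 1.0 - eps)

    real_term = sum(math.log(clip(D(x))) for x in real_samples) / len(real_samples)
    fake_term = sum(math.log(1.0 - clip(D(G(z)))) for z in noise_samples) / len(noise_samples)
    return real_term + fake_term


random.seed(0)
real = [random.gauss(3.0, 1.0) for _ in range(1000)]   # hypothetical "real" data
noise = [random.gauss(0.0, 1.0) for _ in range(1000)]  # latent noise z


def G(z):
    """Toy untrained generator: passes the noise through unchanged."""
    return z


def D_blind(x):
    """Discriminator that cannot tell real from generated samples."""
    return 0.5


def D_informed(x):
    """Logistic discriminator with its threshold between the two distributions."""
    return 1.0 / (1.0 + math.exp(-2.0 * (x - 1.5)))


v_blind = gan_value(D_blind, G, real, noise)
v_informed = gan_value(D_informed, G, real, noise)
print(v_blind, -math.log(4.0), v_informed)
```

In actual training, D is updated to increase this value and G is updated to decrease it, alternating between the two as described above.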

The training process of traditional GANs is highly sensitive to model structures, learning rates, and other hyper-parameters. Researchers are usually required to employ numerous ad hoc ‘tricks’ to achieve convergence. There exist several solutions for mitigating this problem, e.g., Wasserstein Generative Adversarial Network (WGAN) [78] and Loss-Sensitive Generative Adversarial Network (LS-GAN) [150], but research on the theory of GANs remains shallow. Recent work confirms that GANs can promote the performance of some supervised tasks (e.g., super-resolution [177], object detection [178], and face completion [179]) by minimizing the divergence between inferred and real data distributions. Exploiting the unsupervised learning abilities of GANs is promising in terms of generating synthetic mobile data for simulations, or assisting specific supervised tasks in mobile network applications. This becomes more important in tasks where appropriate datasets are lacking, given that operators are generally reluctant to share their network data.

G. Deep Reinforcement Learning

Deep Reinforcement Learning (DRL) refers to a set of methods that approximate value functions (deep Q learning) or policy functions (policy gradient method) through deep neural networks. An agent (neural network) continuously interacts with an environment and receives reward signals as feedback. The agent selects an action at each step, which will change the state of the environment. The training goal of the neural network is to optimize its parameters, such that it can select actions that potentially lead to the best future return. We illustrate this principle in Fig. 5(h). DRL is well-suited to problems that have a huge number of possible states (i.e., environments are high-dimensional). Representative DRL methods include Deep Q-Networks (DQNs) [19], deep policy gradient methods [151], Asynchronous Advantage Actor-Critic (A3C) [77], Rainbow [152] and Distributed Proximal Policy Optimization (DPPO) [153]. These perform remarkably in AI gaming (e.g., Gym17), robotics, and autonomous driving [180]–[183], and have made inspiring deep learning breakthroughs recently.

In particular, the DQN [19] was first proposed by DeepMind to play Atari video games. However, traditional DQN requires several important adjustments to work well. The A3C [77] employs an actor-critic mechanism, where the actor selects the action given the state of the environment, and the critic estimates the value given the state and the action, then delivers feedback to the actor. The A3C deploys different actors and critics on different threads of a CPU to break the dependency of data. This significantly improves training convergence, enabling fast training of DRL agents on CPUs. Rainbow [152] combines different variants of DQNs, and discovers that these are complementary to some extent. This insight improved performance in many Atari games. To solve the step size problem in policy gradient methods, Schulman et al. propose a Distributed Proximal Policy Optimization (DPPO) method to constrain the update step of new policies, and implement this on multi-threaded CPUs in a distributed manner [153]. Based on this method, an agent developed by OpenAI defeated a human expert Dota2 team in a 5v5 match.18

Many mobile networking problems can be formulated as Markov Decision Processes (MDPs), where reinforcement learning can play an important role (e.g., base station on-off switching strategies [184], routing [185], and adaptive tracking control [186]). Some of these problems nevertheless involve high-dimensional inputs, which limits the applicability of traditional reinforcement learning algorithms. DRL techniques broaden the ability of traditional reinforcement learning algorithms to handle high dimensionality, in scenarios previously considered intractable. Employing DRL is thus promising to address network management and control problems under complex, changeable, and heterogeneous mobile environments. We further discuss this potential in Sec. VIII.

17 Gym is a toolkit for developing and comparing reinforcement learning algorithms. It supports teaching agents everything from walking to playing games like Pong or Pinball. https://gym.openai.com/
18 Dota2 is a popular multiplayer online battle arena video game.

VI. DEEP LEARNING DRIVEN MOBILE AND WIRELESS NETWORKS

Deep learning has a wide range of applications in mobile and wireless networks. In what follows, we present the most important research contributions across different mobile networking areas and compare their design and principles. In particular, we first discuss a key prerequisite, that of mobile big data, then organize the review of relevant works into eight subsections, focusing on specific domains where deep learning has made advances. Specifically,

1) Deep Learning Driven Network-Level Mobile Data Analysis focuses on deep learning applications built on network-level mobile big data, including network prediction, traffic classification, and Call Detail Record (CDR) mining.
2) Deep Learning Driven App-Level Mobile Data Analysis shifts the attention towards mobile data analytics on edge devices.
3) Deep Learning Driven Mobility Analysis and Localization sheds light on the benefits of employing deep neural networks to understand the movement patterns of mobile users, or localize users in indoor or outdoor environments.
4) Deep Learning Driven Wireless Sensor Networks discusses important work about deep learning applications in WSNs.
5) Deep Learning Driven Network Control investigates the usage of deep reinforcement learning and deep imitation learning on network optimization, routing, scheduling, resource allocation, and radio control.
6) Deep Learning Driven Network Security presents work that leverages deep learning to improve network security, which we cluster by focus as infrastructure, software, and privacy related.
7) Deep Learning Driven Signal Processing scrutinizes physical layer aspects that benefit from deep learning and reviews relevant work on signal processing.
8) Emerging Deep Learning Driven Mobile Network Applications wraps up this section, presenting

other interesting deep learning applications in mobile networking.

For each domain, we summarize work broadly in tabular form, providing readers with a general picture of individual topics. Most important works in each domain are discussed in more detail in text. Lessons learned are also discussed at the end of each subsection. We give a diagrammatic view of the topics dealt with by the literature reviewed in this section in Fig. 6.

[Figure 6 content, Deep Learning Driven Mobile and Wireless Networks by topic: App-Level Mobile Data Analysis [17], [96], [98], [167], [207]–[257]; Mobility Analysis and User Localization [97], [204], [242], [243], [258]–[275]; Network-Level Mobile Data Analysis [76], [187]–[206]; Wireless Sensor Networks [276]–[286]; Emerging Applications [368]–[372]; Network Control [166], [259], [287]–[298], [298]–[319]; Signal Processing [273], [308], [310], [351]–[367]; Network Security [165], [200], [286], [320]–[344], [344]–[350].]

Fig. 6: Classification of the literature reviewed in Sec. VI.

A. Mobile Big Data as a Prerequisite

The development of mobile technology (e.g. smartphones, augmented reality, etc.) is forcing mobile operators to evolve mobile network infrastructures. As a consequence, both the cloud and edge side of mobile networks are becoming increasingly sophisticated to cater for users who produce and consume huge amounts of mobile data daily. These data can be either generated by the sensors of mobile devices that record individual user behaviors, or from the mobile network infrastructure, which reflects dynamics in urban environments. Appropriately mining these data can benefit multidisciplinary research fields and the industry in areas such as mobile network management, social analysis, public transportation, personal services provision, and so on [36]. Network operators, however, could become overwhelmed when managing and analyzing massive amounts of heterogeneous mobile data. Deep learning is probably the most powerful methodology that can help overcome this burden. We begin therefore by introducing characteristics of mobile big data, then present a holistic review of deep learning driven mobile data analysis research.

Yazti and Krishnaswamy propose to categorize mobile data into two groups, namely network-level data and app-level data [373]. The key difference between them is that the former is obtained through the network infrastructure, while the latter is usually collected by edge mobile devices. We summarize these two types of data and the information they comprise in Table X. Before delving into mobile data analytics, we illustrate the typical data collection process in Figure 7.

[Figure 7 labels: wireless sensor network (sensor nodes, gateways; weather, magnetic field, humidity, noise, reconnaissance, location, temperature, air quality); cellular/WiFi network (base station, BSC/RNC, WiFi modem); sensor data; app/network level mobile data; data storage; mobile data mining/analysis.]

Fig. 7: Illustration of the mobile data collection process in cellular, WiFi and wireless sensor networks. BSC: Base Station Controller; RNC: Radio Network Controller.

Network-level mobile data generated by the networking infrastructure not only deliver a global view of mobile network performance (e.g. throughput, end-to-end delay, jitter, etc.), but also log individual session times, communication types, sender and receiver information, through Call Detail Records (CDRs). Network-level data usually exhibit significant spatio-temporal variations resulting from users’ behaviors [374], which can be utilized for network diagnosis and management, user

mobility analysis and public transportation planning [193]. Some network-level data (e.g. mobile traffic snapshots) can be viewed as pictures taken by ‘panoramic cameras’, which provide a city-scale sensing system for urban sensing.

TABLE X: The taxonomy of mobile big data.

Mobile data | Source | Information
Network-level data | Infrastructure | Infrastructure locations, capability, equipment holders, etc.
Network-level data | Performance indicators | Data traffic, end-to-end delay, QoE, jitter, etc.
Network-level data | Call detail records (CDR) | Session start and end times, type, sender and receiver, etc.
Network-level data | Radio information | Signal power, frequency, spectrum, modulation, etc.
App-level data | Device | Device type, usage, Media Access Control (MAC) address, etc.
App-level data | Profile | User settings, personal information, etc.
App-level data | Sensors | Mobility, temperature, magnetic field, movement, etc.
App-level data | Application | Picture, video, voice, health condition, preference, etc.
App-level data | System log | Software and hardware failure logs, etc.

On the other hand, app-level data is directly recorded by sensors or mobile applications installed in various mobile devices. These data are frequently collected through crowd-sourcing schemes from heterogeneous sources, such as Global Positioning Systems (GPS), mobile cameras and video recorders, and portable medical monitors. Mobile devices act as sensor hubs, which are responsible for data gathering and preprocessing, and subsequently distributing such data to specific locations, as required [36]. App-level data may directly or indirectly reflect users’ behaviors, such as mobility, preferences, and social links [61]. Analyzing app-level data from individuals can help reconstruct one’s personality and preferences, which can be used in recommender systems and user-targeted advertising. Some of these data comprise explicit information about individuals’ identities. Inappropriate sharing and use can raise significant privacy issues. Therefore, extracting useful patterns from multi-modal sensing devices without compromising users’ privacy remains a challenging endeavor.

Compared to traditional data analysis techniques, deep learning embraces several unique features to address the aforementioned challenges [17]. Namely:

1) Deep learning achieves remarkable performance in various data analysis tasks, on both structured and unstructured data. Some types of mobile data can be represented as image-like (e.g. [193]) or sequential data [201].
2) Deep learning performs remarkably well in feature extraction from raw data. This saves tremendous effort of hand-crafted feature engineering, which allows spending more time on model design and less on sorting through the data itself.
3) Deep learning offers excellent tools (e.g. RBM, AE, GAN) for handling unlabeled data, which is common in mobile network logs.
4) Multi-modal deep learning allows learning features over multiple modalities [375], which makes it powerful in modeling with data collected from heterogeneous sensors and data sources.

These advantages make deep learning a powerful tool for mobile data analysis.

B. Deep Learning Driven Network-level Mobile Data Analysis

Network-level mobile data refers broadly to logs recorded by Internet service providers, including infrastructure metadata, network performance indicators and call detail records (CDRs) (see Table XI). The recent remarkable success of deep learning ignites global interest in exploiting this methodology for mobile network-level data analysis, so as to optimize mobile network configurations, thereby improving end-users’ QoE. These works can be categorized into four types: network state prediction, network traffic classification, CDR mining and radio analysis. In what follows, we review work in these directions, which we first summarize and compare in Table XI.

Network State Prediction refers to inferring mobile network traffic or performance indicators, given historical measurements or related data. Pierucci and Micheli investigate the relationship between key objective metrics and QoE [187]. They employ MLPs to predict users’ QoE in mobile communications, based on average user throughput, number of active users in a cell, average data volume per user, and channel quality indicators, demonstrating high prediction accuracy. Network traffic forecasting is another field where deep learning is gaining importance. By leveraging sparse coding and max-pooling, Gwon and Kung develop a semi-supervised deep learning model to classify received frame/packet patterns and infer the original properties of flows in a WiFi network [188]. Their proposal demonstrates superior performance over traditional ML techniques. Nie et al. investigate the traffic demand patterns in wireless mesh networks [189]. They design a DBN along with Gaussian models to precisely estimate traffic distributions.

In [191], Wang et al. propose to use an AE-based architecture and LSTMs to model spatial and temporal correlations of mobile traffic distribution, respectively. In particular, the authors use a global and multiple local stacked AEs for spatial feature extraction, dimension reduction and training parallelism. Compressed representations extracted are subsequently processed by LSTMs, to perform final forecasting. Experiments with a real-world dataset demonstrate superior performance over SVM and the Autoregressive Integrated Moving Average (ARIMA) model. The work in [192] extends mobile traffic forecasting to long time frames. The authors combine ConvLSTMs and 3D CNNs to construct spatio-temporal neural networks that capture the complex spatio-temporal features at city scale. They further introduce a fine-tuning scheme and lightweight approach to blend predictions with historical means, which significantly extends the length of

TABLE XI: A summary of work on network-level mobile data analysis.

Domain | Reference | Applications | Model | Optimizer | Key contribution
Network prediction | Pierucci and Micheli [187] | QoE prediction | MLP | Unknown | Uses NNs to correlate Quality of Service parameters and QoE estimations.
Network prediction | Gwon and Kung [188] | Inferring Wi-Fi flow patterns | Sparse coding + Max pooling | SGD | Semi-supervised learning
Network prediction | Nie et al. [189] | Wireless mesh network traffic prediction | DBN + Gaussian models | SGD | Considers both long-term dependency and short-term fluctuations.
Network prediction | Moyo and Sibanda [190] | TCP/IP traffic prediction | MLP | Unknown | Investigates the impact of learning in traffic forecasting
Network prediction | Wang et al. [191] | Mobile traffic forecasting | AE + LSTM | SGD | Uses an AE to model spatial correlations and an LSTM to model temporal correlation
Network prediction | Zhang and Patras [192] | Long-term mobile traffic forecasting | ConvLSTM + 3D-CNN | Adam | Combines 3D-CNNs and ConvLSTMs to perform long-term forecasting
Network prediction | Zhang et al. [193] | Mobile traffic super-resolution | CNN + GAN | Adam | Introduces the MTSR concept and applies image processing techniques for mobile traffic analysis
Network prediction | Huang et al. [194] | Mobile traffic forecasting | LSTM + 3D-CNN | Unknown | Combines CNNs and RNNs to extract geographical and temporal features from mobile traffic.
Network prediction | Zhang et al. [195] | Cellular traffic prediction | Densely connected CNN | Adam | Uses separate CNNs to model closeness and periods in temporal dependency
Network prediction | Chen et al. [76] | Cloud RAN optimization | Multivariate LSTM | Unknown | Uses mobile traffic forecasting to aid cloud radio access network optimization
Network prediction | Navabi et al. [196] | Wireless WiFi channel feature prediction | MLP | SGD | Infers non-observable channel information from observable features.
Traffic classification | Wang [197] | Traffic classification | MLP, stacked AE | Unknown | Performs feature learning, protocol identification and anomalous protocol detection simultaneously
Traffic classification | Wang et al. [198] | Encrypted traffic classification | CNN | SGD | Employs an end-to-end deep learning approach to perform encrypted traffic classification
Traffic classification | Lotfollahi et al. [199] | Encrypted traffic classification | CNN | Adam | Can perform both traffic characterization and application identification
Traffic classification | Wang et al. [200] | Malware traffic classification | CNN | SGD | First work to use representation learning for malware classification from raw traffic
Traffic classification | Aceto et al. [376] | Mobile encrypted traffic classification | MLP, CNN, LSTM | SGD, Adam | Comprehensive evaluations of different NN architectures and excellent performance
CDR mining | Liang et al. [201] | Metro density prediction | RNN | SGD | Employs geo-spatial data processing, a weight-sharing RNN and parallel stream analytic programming
CDR mining | Felbo et al. [202] | Demographics prediction | CNN | Adam | Exploits the temporal correlation inherent to mobile phone metadata
CDR mining | Chen et al. [203] | Tourists’ next visit location prediction | MLP, RNN | Scaled conjugate gradient descent | LSTM that performs significantly better than other ML approaches
CDR mining | Lin et al. [204] | Human activity chains generation | Input-Output HMM + LSTM | Adam | First work that uses an RNN to generate human activity chains
Others | Xu et al. [205] | Wi-Fi hotspot classification | CNN | Unknown | Combining deep learning with frequency analysis
Others | Meng et al. [206] | QoE-driven big data analysis | CNN | SGD | Investigates trade-off between accuracy of high-dimensional big data analysis and model training speed
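
Several entries in Table XI ([192], [193]) treat city-wide mobile traffic as a sequence of image-like snapshots. The following is a minimal sketch of that data preparation step, assuming traffic records arrive as (timestamp, x, y, volume) tuples on a regular grid; the record layout and the function name `to_snapshots` are illustrative assumptions and are not taken from the surveyed works.

```python
from collections import defaultdict

def to_snapshots(records, grid_w, grid_h, interval):
    """Aggregate (timestamp, x, y, volume) records into time-ordered 2D grids.

    Each grid plays the role of one "image" whose pixels are per-cell traffic
    volumes. Intervals without any record are skipped, not emitted as
    all-zero grids.
    """
    frames = defaultdict(lambda: [[0.0] * grid_w for _ in range(grid_h)])
    for timestamp, x, y, volume in records:
        frames[timestamp // interval][y][x] += volume
    return [frames[key] for key in sorted(frames)]

# Hypothetical measurements: timestamps in seconds, 2x2 grid, 60 s snapshots.
records = [
    (0, 0, 0, 5.0),
    (30, 1, 0, 2.0),
    (70, 0, 1, 4.0),
    (75, 0, 1, 1.0),
]
snapshots = to_snapshots(records, grid_w=2, grid_h=2, interval=60)
```

A sequence produced this way can be fed to models that expect image or video input, which is the property that motivates the use of CNNs, 3D CNNs and ConvLSTMs for this kind of data.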

reliable prediction steps. Deep learning was also employed in [194] and [76], where the authors employ CNNs and LSTMs to perform mobile traffic forecasting. By effectively extracting spatio-temporal features, their proposals gain significantly higher accuracy than traditional approaches, such as ARIMA.
More recently, Zhang et al. propose an original Mobile Traffic Super-Resolution (MTSR) technique to infer network-wide fine-grained mobile traffic consumption given coarse-grained counterparts obtained by probing, thereby reducing traffic measurement overheads [193]. Inspired by image super-resolution techniques, they design a dedicated CNN with multiple skip connections between layers, named deep zipper network, along with a Generative Adversarial Network (GAN) to perform precise MTSR and improve the fidelity of inferred traffic snapshots. Experiments with a real-world dataset show that this architecture can improve the granularity of mobile traffic measurements over a city by up to 100×, while significantly outperforming other interpolation techniques.

Traffic Classification is aimed at identifying specific applications or protocols among the traffic in networks. Wang

recognizes the powerful feature learning ability of deep neural networks and uses a deep AE to identify protocols in a TCP flow dataset, achieving excellent precision and recall rates [197]. Work in [198] proposes to use a 1D CNN for encrypted traffic classification. The authors suggest that this structure works well for modeling sequential data and has lower complexity, thus being promising in addressing the traffic classification problem. Similarly, Lotfollahi et al. present Deep Packet, which is based on a CNN, for encrypted traffic classification [199]. Their framework reduces the amount of hand-crafted feature engineering and achieves great accuracy. More recently, Aceto et al. employ MLPs, CNNs, and LSTMs to perform encrypted mobile traffic classification [376], arguing that deep NNs can automatically extract complex features present in mobile traffic. As reflected by their results, deep learning based solutions obtain superior accuracy over RFs in classifying Android, iOS and Facebook traffic. CNNs have also been used to identify malware traffic, where work in [200] regards traffic data as images and the unusual patterns that malware traffic exhibits are classified by representation learning. Similar work on mobile malware detection will be further discussed in subsection VI-G.

CDR Mining involves extracting knowledge from specific instances of telecommunication transactions such as phone number, cell ID, session start/end time, traffic consumption, etc. Using deep learning to mine useful information from CDR data can serve a variety of functions. For example, Liang et al. propose Mercury to estimate metro density from streaming CDR data, using RNNs [201]. They take the trajectory of a mobile phone user as a sequence of locations; RNN-based models work well in handling such sequential data. Likewise, Felbo et al. use CDR data to study demographics [202]. They employ a CNN to predict the age and gender of mobile users, demonstrating the superior accuracy of these structures over other ML tools. More recently, Chen et al. compare different ML models to predict tourists’ next locations of visit by analyzing CDR data [203]. Their experiments suggest that RNN-based predictors significantly outperform traditional ML methods, including Naive Bayes, SVM, RF, and MLP.

Lessons learned: Network-level mobile data, such as mobile traffic, usually involves essential spatio-temporal correlations. These correlations can be effectively learned by CNNs and RNNs, as they are specialized in modeling spatial and temporal data (e.g., images, traffic series). An important observation is that large-scale mobile network traffic can be processed as sequential snapshots, as suggested in [192], [193], which resemble images and videos. Therefore, potential exists to exploit image processing techniques for network-level analysis. Techniques previously used for imaging, however, usually cannot be directly employed with mobile data. Efforts must be made to adapt them to the particularities of the mobile networking domain. We expand on this future research direction in Sec. VIII-B.

C. Deep Learning Driven App-level Mobile Data Analysis

Triggered by the increasing popularity of Internet of Things (IoT), current mobile devices bundle increasing numbers of applications and sensors that can collect massive amounts of app-level mobile data [377]. Employing artificial intelligence to extract useful information from these data can extend the capability of devices [73], [378], [379], thus greatly benefiting users themselves, mobile operators, and indirectly device manufacturers. Analysis of mobile data therefore becomes an important and popular research direction in the mobile networking domain. Nonetheless, mobile devices usually operate in noisy, uncertain and unstable environments, where their users move fast and change their location and activity contexts frequently. As a result, app-level mobile data analysis becomes difficult for traditional machine learning tools, which perform relatively poorly. Advanced deep learning practices provide a powerful solution for app-level data mining, as they demonstrate better precision and higher robustness in IoT applications [380].
There exist two approaches to app-level mobile data analysis, namely (i) cloud-based computing and (ii) edge-based computing. We illustrate the difference between these scenarios in Fig. 8. As shown in the left part of the figure, cloud-based computing treats mobile devices as data collectors and messengers that constantly send data to cloud servers, via local points of access with limited data preprocessing capabilities. This scenario typically includes the following steps: (i) users query on/interact with local mobile devices; (ii) queries are transmitted to servers in the cloud; (iii) servers gather the data received for model training and inference; (iv) query results are subsequently sent back to each device, or stored and analyzed without further dissemination, depending on specific application requirements. The drawback of this scenario is that constantly sending and receiving messages to/from servers over the Internet introduces overhead and may result in severe latency. In contrast, in the edge-based computing scenario pre-trained models are offloaded from the cloud to individual mobile devices, such that they can make inferences locally. As illustrated in the right part of Fig. 8, this scenario typically consists of the following: (i) servers use offline datasets to pre-train a model; (ii) the pre-trained model is offloaded to edge devices; (iii) mobile devices perform inferences locally using the model; (iv) cloud servers accept data from local devices; (v) the model is updated using these data whenever necessary. While this scenario requires fewer interactions with the cloud, its applicability is limited by the computing and battery capabilities of edge hardware. Therefore, it can only support tasks that require light computations.
Many researchers employ deep learning for app-level mobile data analysis. We group the works reviewed according to their application domains, namely mobile healthcare, mobile pattern recognition, and mobile Natural Language Processing (NLP) and Automatic Speech Recognition (ASR). Table XII gives a high-level summary of existing research

18 Human profile source: https://lekeart.deviantart.com/art/male-body-profile-251793336

[Figure 8: two panels, each spanning edge devices (data collection) and cloud servers, connected by data transmission. Cloud-based panel: 1. Query; 2. Query sent to servers; 3. Inference; 4. Query results; 5.1 Results sent back; 5.2 Results storage and analysis. Edge-based panel: 1. Model pretraining; 2. Model offloading; 3. Local inference; 4. User data transmission; 5. Model update.]

Fig. 8: Illustration of two deployment approaches for app-level mobile data analysis, namely cloud-based (left) and edge-based (right). The cloud-based approach performs inference in the cloud and sends results to edge devices. In contrast, the edge-based approach deploys models on edge devices, which can perform inference locally.
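
The edge-based workflow in the right part of Fig. 8 (pre-training, offloading, local inference, data upload, model update) can be sketched as follows. This is a toy illustration under stated assumptions: the "model" is a single threshold rather than a neural network, JSON stands in for whatever serialization a real system would use, labels are assumed to come from user feedback, and all names (`pretrain`, `offload`, `EdgeDevice`) are hypothetical.

```python
import json

def pretrain(dataset):
    """(i) Server side: place a threshold halfway between the class means."""
    positives = [x for x, label in dataset if label == 1]
    negatives = [x for x, label in dataset if label == 0]
    mean_pos = sum(positives) / len(positives)
    mean_neg = sum(negatives) / len(negatives)
    return {"threshold": (mean_pos + mean_neg) / 2}

def offload(model):
    """(ii) Serialize the pre-trained model for transfer to an edge device."""
    return json.dumps(model)

class EdgeDevice:
    def __init__(self, payload):
        self.model = json.loads(payload)
        self.collected = []

    def infer(self, x):
        """(iii) Local inference: no round trip to the cloud is needed."""
        return int(x > self.model["threshold"])

    def collect(self, x, label):
        """Keep a labelled sample (labels assumed to come from user feedback)."""
        self.collected.append((x, label))

    def upload(self):
        """(iv) Hand the collected samples to the cloud and clear the buffer."""
        samples, self.collected = self.collected, []
        return samples

offline_data = [(1.0, 0), (2.0, 0), (6.0, 1), (7.0, 1)]
initial_model = pretrain(offline_data)
device = EdgeDevice(offload(initial_model))
prediction = device.infer(5.0)
device.collect(3.0, 1)
updated_model = pretrain(offline_data + device.upload())  # (v) model update
```

In the cloud-based alternative, `infer` would instead send each query to the server and wait for the result, which is where the communication overhead and latency discussed in the text arise.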

efforts and we discuss representative work next.

Mobile Health. There is an increasing variety of wearable health monitoring devices being introduced to the market. By incorporating medical sensors, these devices can capture the physical conditions of their carriers and provide real-time feedback (e.g. heart rate, blood pressure, breath status etc.), or trigger alarms to remind users to take medical actions [381]. Liu and Du design a deep learning-driven MobiEar to aid deaf people’s awareness of emergencies [207]. Their proposal accepts acoustic signals as input, allowing users to register different acoustic events of interest. MobiEar operates efficiently on smart phones and only requires infrequent communications with servers for updates. Likewise, Liu et al. develop UbiEar, which runs on the Android platform to assist hard-of-hearing users in recognizing acoustic events, without requiring location information [208]. Their design adopts a lightweight CNN architecture for inference acceleration and demonstrates accuracy comparable to that of traditional CNN models.
Hosseini et al. design an edge computing system for health monitoring and treatment [213]. They use CNNs to extract features from mobile sensor data, which plays an important role in their epileptogenicity localization application. Stamate et al. develop a mobile Android app called cloudUPDRS to manage Parkinson’s symptoms [214]. In their work, MLPs are employed to determine the acceptance of data collected by smart phones, to maintain high-quality data samples. The proposed method outperforms other ML methods such as GPs and RFs. Quisel et al. suggest that deep learning can be effectively used for mobile health data analysis [215]. They exploit CNNs and RNNs to classify lifestyle and environmental traits of volunteers. Their models demonstrate superior prediction accuracy over RFs and logistic regression, over six datasets. As deep learning performs remarkably well in medical data analysis [382], we expect more and more deep learning powered health care devices to emerge to improve physical monitoring and illness diagnosis.

Mobile Pattern Recognition. Recent advanced mobile devices offer people a portable intelligent assistant, which fosters a diverse set of applications that can classify surrounding objects (e.g. [217]–[219], [222]) or users’ behaviors (e.g. [98], [224], [227], [233], [234], [383], [384]) based on patterns observed in the output of the mobile camera or other sensors. We review and compare recent works on mobile pattern recognition in this part.
Object classification in pictures taken by mobile devices is drawing increasing research interest. Li et al. develop DeepCham as a mobile object recognition framework [217]. Their architecture involves a crowd-sourcing labeling process, which aims to reduce the hand-labeling effort, and a collaborative training instance generation pipeline that is built for deployment on mobile devices. Evaluations of the prototype system suggest that this framework is efficient and effective in terms of training and inference. Tobías et al. investigate the applicability of employing CNN schemes on mobile devices for object recognition tasks [218]. They conduct experiments on three different model deployment scenarios, i.e., on GPU, on CPU, and on mobile devices, with two benchmark datasets. The results obtained suggest that deep learning models can be efficiently embedded in mobile devices to perform real-time inference.
Mobile classifiers can also assist Virtual Reality (VR) applications. A CNN framework is proposed in [222] for facial expression recognition when users are wearing head-mounted displays in the VR environment. Rao et al. incorporate a deep

learning object detector into a mobile augmented reality (AR) system [223]. Their system achieves outstanding performance in detecting and enhancing geographic objects in outdoor environments. Further work focusing on mobile AR applications is introduced in [385], where the authors characterize the tradeoffs between accuracy, latency, and energy efficiency of object detection.
Activity recognition is another interesting area that relies on data collected by mobile motion sensors [384], [386]. This refers to the ability to classify, based on data collected via, e.g., video capture, accelerometer readings, or motion Passive Infra-Red (PIR) sensing, the specific actions and activities that a human subject performs. Data collected will be delivered to servers for model training and the model will be subsequently deployed for domain-specific tasks.
Essential features of sensor data can be automatically extracted by neural networks. The first work in this space that is based on deep learning employs a CNN to capture local dependencies and preserve scale invariance in motion sensor data [224]. The authors evaluate their proposal on 3 offline datasets, demonstrating that it yields higher accuracy than statistical methods and Principal Components Analysis

TABLE XII: A summary of works on app-level mobile data analysis.

Subject | Reference | Application | Deployment | Model
Mobile Healthcare | Liu and Du [207] | Mobile ear | Edge-based | CNN
Mobile Healthcare | Liu et al. [208] | Mobile ear | Edge-based | CNN
Mobile Healthcare | Jindal [209] | Heart rate prediction | Cloud-based | DBN
Mobile Healthcare | Kim et al. [210] | Cytopathology classification | Cloud-based | CNN
Mobile Healthcare | Sathyanarayana et al. [211] | Sleep quality prediction | Cloud-based | MLP, CNN, LSTM
Mobile Healthcare | Li and Trocan [212] | Health conditions analysis | Cloud-based | Stacked AE
Mobile Healthcare | Hosseini et al. [213] | Epileptogenicity localisation | Cloud-based | CNN
Mobile Healthcare | Stamate et al. [214] | Parkinson’s symptoms management | Cloud-based | MLP
Mobile Healthcare | Quisel et al. [215] | Mobile health data analysis | Cloud-based | CNN, RNN
Mobile Healthcare | Khan et al. [216] | Respiration surveillance | Cloud-based | CNN
Mobile Pattern Recognition | Li et al. [217] | Mobile object recognition | Edge-based | CNN
Mobile Pattern Recognition | Tobías et al. [218] | Mobile object recognition | Edge-based & cloud-based | CNN
Mobile Pattern Recognition | Teng and Yang [222] | Facial recognition | Cloud-based | CNN
Mobile Pattern Recognition | Rao et al. [223] | Mobile augmented reality | Edge-based | CNN
Mobile Pattern Recognition | Zeng et al. [224] | Activity recognition | Cloud-based | CNN, RBM
Mobile Pattern Recognition | Almaslukh et al. [225] | Activity recognition | Cloud-based | AE
Mobile Pattern Recognition | Li et al. [226] | RFID-based activity recognition | Cloud-based | CNN
Mobile Pattern Recognition | Bhattacharya and Lane [227] | Smart watch-based activity recognition | Edge-based | RBM
Mobile Pattern Recognition | Antreas and Angelov [228] | Mobile surveillance system | Edge-based & cloud-based | CNN
Mobile Pattern Recognition | Ordóñez and Roggen [98] | Activity recognition | Cloud-based | ConvLSTM
Mobile Pattern Recognition | Wang et al. [229] | Gesture recognition | Edge-based | CNN, RNN
Mobile Pattern Recognition | Gao et al. [230] | Eating detection | Cloud-based | DBM, MLP
Mobile Pattern Recognition | Zhu et al. [231] | User energy expenditure estimation | Cloud-based | CNN, MLP
Mobile Pattern Recognition | Sundsøy et al. [232] | Individual income classification | Cloud-based | MLP
Mobile Pattern Recognition | Chen and Xue [233] | Activity recognition | Cloud-based | CNN
Mobile Pattern Recognition | Ha and Choi [234] | Activity recognition | Cloud-based | CNN
Mobile Pattern Recognition | Edel and Köppe [235] | Activity recognition | Edge-based | Binarized-LSTM
Mobile Pattern Recognition | Okita and Inoue [236] | Multiple overlapping activities recognition | Cloud-based | CNN+LSTM
Mobile Pattern Recognition | Alsheikh et al. [17] | Activity recognition using Apache Spark | Cloud-based | MLP
Mobile Pattern Recognition | Mittal et al. [237] | Garbage detection | Edge-based & cloud-based | CNN
Mobile Pattern Recognition | Seidenari et al. [238] | Artwork detection and retrieval | Edge-based | CNN
Mobile Pattern Recognition | Zeng et al. [239] | Mobile pill classification | Edge-based | CNN
Mobile Pattern Recognition | Zeng [241] | Mobile object recognition | Edge-based | Unknown
Mobile Pattern Recognition | Radu et al. [167] | Activity recognition | Edge-based | RBM, CNN
Mobile Pattern Recognition | Wang et al. [242], [243] | Activity and gesture recognition | Cloud-based | Stacked AE
Mobile Pattern Recognition | Cao et al. [245] | Mood detection | Cloud-based | GRU
Mobile Pattern Recognition | Ran et al. [246] | Object detection for AR applications | Edge-based & cloud-based | CNN
Mobile Pattern Recognition | Zhao et al. [257] | Estimating 3D human skeleton from radio frequency signals | Cloud-based | CNN
Mobile NLP and ASR | Siri [247] | Speech synthesis | Edge-based | Mixture density networks
Mobile NLP and ASR | McGraw et al. [248] | Personalised speech recognition | Edge-based | LSTM
Mobile NLP and ASR | Prabhavalkar et al. [249] | Embedded speech recognition | Edge-based | LSTM
Mobile NLP and ASR | Yoshioka et al. [250] | Mobile speech recognition | Cloud-based | CNN
Mobile NLP and ASR | Ruan et al. [251] | Shifting from typing to speech | Cloud-based | Unknown
Mobile NLP and ASR | Georgiev et al. [96] | Multi-task mobile audio sensing | Edge-based | MLP
Others | Ignatov et al. [252] | Mobile images quality enhancement | Cloud-based | CNN
Others | Lu et al. [253] | Information retrieval from videos in wireless network | Cloud-based | CNN
Others | Lee et al. [254] | Reducing distraction for smartwatch users | Cloud-based | MLP
Others | Vu et al. [255] | Transportation mode detection | Cloud-based | RNN
Others | Fang et al. [256] | Transportation mode detection | Cloud-based | MLP
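
Many of the edge-based entries in Table XII depend on shrinking a model until it fits on a phone or watch, for instance by quantizing its parameters. The sketch below shows one common form of this idea, uniform 8-bit quantization with a shared scale and offset. It is an assumption made for illustration and is not claimed to be the scheme used in any of the surveyed works; the weight values and the names `quantize` and `dequantize` are hypothetical.

```python
def quantize(weights, bits=8):
    """Map floats to integer codes in [0, 2**bits - 1] using a shared scale."""
    lo, hi = min(weights), max(weights)
    levels = 2 ** bits - 1
    # Fall back to a scale of 1.0 when all weights are equal (zero range).
    scale = (hi - lo) / levels or 1.0
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo

def dequantize(codes, scale, lo):
    """Recover approximate float weights from the integer codes."""
    return [lo + code * scale for code in codes]

weights = [-0.5, 0.0, 0.25, 0.5]  # hypothetical layer weights
codes, scale, lo = quantize(weights)
restored = dequantize(codes, scale, lo)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

Each weight is then stored in one byte instead of a 32-bit float, at the cost of a reconstruction error of at most half a quantization step per weight. This is the kind of accuracy versus footprint trade-off that edge deployments have to balance.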

(PCA). Almaslukh et al. employ a deep AE to perform human activity recognition by analyzing an offline smart phone dataset gathered from accelerometer and gyroscope sensors [225]. Li et al. consider different scenarios for activity recognition [226]. In their implementation, Radio Frequency Identification (RFID) data is directly sent to a CNN model for recognizing human activities. While their mechanism achieves high accuracy in different applications, experiments suggest that the RFID-based method does not work well with metal objects or liquid containers.
Work in [227] exploits an RBM to predict human activities, given 7 types of sensor data collected by a smart watch. Experiments on prototype devices show that this approach can efficiently fulfill the recognition objective under tolerable power requirements. Ordóñez and Roggen architect an advanced ConvLSTM to fuse data gathered from multiple sensors and perform activity recognition [98]. By leveraging CNN and LSTM structures, ConvLSTMs can automatically compress spatio-temporal sensor data into low-dimensional representations, without heavy data post-processing effort. Wang et al. exploit Google Soli to architect a mobile user-machine interaction platform [229]. By analyzing radio frequency signals captured by millimeter-wave radars, their architecture is able to recognize 11 types of gestures with high accuracy. Their models are trained on the server side, and inferences are performed locally on mobile devices. More recently, Zhao et al. design a 4D CNN framework (3D for the spatial dimension + 1D for the temporal dimension) to reconstruct human skeletons using radio frequency signals [257]. This novel approach resembles a virtual “X-ray”, enabling accurate estimation of human poses without requiring an actual camera.

Mobile NLP and ASR. Recent remarkable achievements obtained by deep learning in Natural Language Processing (NLP) and Automatic Speech Recognition (ASR) are also embraced by applications for mobile devices.
Powered by deep learning, the intelligent personal assistant Siri, developed by Apple, employs deep mixture density networks [387] to fix typical robotic voice issues and synthesize more human-like voice [247]. An Android app released by Google supports mobile personalized speech recognition [248]; it compresses the LSTM model by quantizing its parameters, allowing the app to run on low-power mobile phones. Likewise, Prabhavalkar et al. propose a mathematical RNN compression technique that reduces the size of an LSTM acoustic model by two thirds, while compromising accuracy only negligibly [249]. This allows building both memory- and energy-efficient ASR applications on mobile devices.
Yoshioka et al. present a framework that incorporates a network-in-network architecture into a CNN model, which enables ASR with mobile multi-microphone devices used in noisy environments [250]. Mobile ASR can also accelerate text input on mobile devices: Ruan et al.’s study shows that with the help of ASR, the input rates of English and Mandarin are 3.0 and 2.8 times faster than standard typing on keyboards [251]. More recently, the applicability of deep learning to multi-task audio sensing is investigated in [96], where Georgiev et al. propose and evaluate a novel deep learning modelling and optimization framework tailored to embedded audio sensing tasks. To this end, they selectively share compressed representations between different tasks, which reduces training and data storage overhead, without significantly compromising the accuracy of an individual task. The authors evaluate their framework on a memory-constrained smartphone performing four audio tasks (i.e., speaker identification, emotion recognition, stress detection, and ambient scene analysis). Experiments suggest this proposal can achieve high efficiency in terms of energy, runtime and memory, while maintaining excellent accuracy.

Other applications. Deep learning also plays an important role in other applications that involve app-level data analysis. For instance, Ignatov et al. show that deep learning can enhance the quality of pictures taken by mobile phones. By employing a CNN, they successfully improve the quality of images obtained by different mobile devices, to a digital single-lens reflex camera level [252]. Lu et al. focus on video post-processing under wireless networks [253], where their framework exploits a customized AlexNet to answer queries about detected objects. This framework further involves an optimizer, which instructs mobile devices to offload videos, in order to reduce query response time.
Another interesting application is presented in [254], where Lee et al. show that deep learning can help smartwatch users reduce distraction by eliminating unnecessary notifications. Specifically, the authors use an 11-layer MLP to predict the importance of a notification. Fang et al. exploit an MLP to extract features from high-dimensional and heterogeneous sensor data, including accelerometer, magnetometer, and gyroscope measurements [256]. Their architecture achieves 95% accuracy in recognizing human transportation modes, i.e., still, walking, running, biking, and on vehicle.

Lessons Learned: App-level data is heterogeneous and generated from distributed mobile devices, and there is a trend to offload the inference process to these devices. However, due to computational and battery power limitations, models employed in the edge-based scenario are constrained to light-weight architectures, which are less suitable for complex tasks. Therefore, the trade-off between model complexity and accuracy should be carefully considered [66].
At the same time, app-level data usually contains important user information, and processing it poses significant privacy concerns. Although there have been efforts to preserve user privacy, as we discuss in Sec. VI-G, research in this direction is new, especially in terms of protecting user information in distributed training. We expect more efforts in this direction.

D. Deep Learning Driven Mobility Analysis and Localization

Understanding movement patterns of groups of human beings is becoming crucial for epidemiology, urban planning, public service provisioning, and mobile network resource management [388]. Location-based services and applications

TABLE XIII: A summary of work on deep learning driven mobility analysis and user localization.

Subject | Reference | Application | Model | Key contribution
Mobility analysis | Ouyang et al. [258] | Mobile user trajectory prediction | CNN | Online framework for data stream processing
Mobility analysis | Yang et al. [259] | Social networks and mobile trajectories modeling | RNN, GRU | Multi-task learning
Mobility analysis | Song et al. [260] | City-wide mobility prediction and transportation modeling | Multi-task LSTM | Multi-task learning
Mobility analysis | Zhang et al. [261] | City-wide crowd flows prediction | Deep spatio-temporal residual networks (CNN-based) | Exploitation of spatio-temporal characteristics of mobility events
Mobility analysis | Lin et al. [204] | Human activity chains generation | Input-Output HMM + LSTM | Generative model
 | Subramanian and Sadiq [262] | Mobile movement prediction | MLP | Fewer location updates and lower paging signaling costs
 | Ezema and Ani [263] | Mobile location estimation | MLP | Operates with received signal strength in GSM
 | Shao et al. [264] | CNN driven Pedometer | CNN | Reduced false negatives caused by periodic movements and lower initial response time
User localization | Wang et al. [265] | Indoor fingerprinting | RBM | First deep learning driven indoor localization based on CSI
User localization | Wang et al. [242], [243] | Indoor localization | RBM | Works with calibrated phase information of CSI
User localization | Wang et al. [266] | Indoor localization | CNN | Uses more robust angle of arrival for estimation
User localization | Wang et al. [267] | Indoor localization | RBM | Bi-modal framework using both angle of arrival and average amplitudes of CSI
User localization | Nowicki and Wietrzykowski [268] | Indoor localization | Stacked AE | Requires less system tuning or filtering effort
User localization | Wang et al. [269], [270] | Indoor localization | Stacked AE | Device-free framework, multi-task learning
User localization | Mohammadi et al. [271] | Indoor localization | VAE+DQN | Handles unlabeled data; reinforcement learning aided semi-supervised learning
User localization | Anzum et al. [274] | Indoor localization | Counter propagation neural network | Solves the ambiguity among zones
User localization | Wang et al. [275] | Indoor localization | LSTM | Employ bimodal magnetic field and light intensity data
User localization | Kumar et al. [272] | Indoor vehicles localization | CNN | Focus on vehicles applications
User localization | Zheng and Weng [391] | Outdoor navigation | Developmental network | Online learning scheme; edge-based
User localization | Zhang et al. [97] | Indoor and outdoor localization | Stacked AE | Operates under both indoor and outdoor environments
User localization | Vieira et al. [273] | Massive MIMO fingerprint-based positioning | CNN | Operates with massive MIMO channels

(e.g. mobile AR, GPS) demand precise individual positioning technology [389]. As a result, research on user localization is evolving rapidly and numerous techniques are emerging [390]. In this subsection, we discuss research in this space, which we first summarize in Table XIII.

Mobility Analysis. Since deep learning is able to capture spatial dependencies in sequential data, it is becoming a powerful tool for mobility analysis. The applicability of deep learning for trajectory prediction is studied in [392]. By sharing representations learned by RNN and Gated Recurrent Unit (GRU), the framework can perform multi-task learning on both social networks and mobile trajectories modeling. Specifically, the authors first use deep learning to reconstruct social network representations of users, subsequently employing RNN and GRU models to learn patterns of mobile trajectories with different time granularity. Importantly, these two components jointly share representations learned, which tightens the overall architecture and enables efficient implementation. Ouyang et al. argue that mobility data are normally high-dimensional, which may be problematic for traditional ML models. Instead they build upon deep learning advances and propose an online learning scheme to train a hierarchical CNN architecture, allowing model parallelization for data stream processing [258]. By analyzing usage records, their framework “DeepSpace” predicts individuals’ trajectories with much higher accuracy as compared to naive CNNs, as shown with experiments on a real-world dataset.
Instead of focusing on individual trajectories, Song et al. shed light on mobility analysis at a larger scale [260]. In their work, LSTM networks are exploited to jointly model the city-wide movement patterns of a large group of people and vehicles. Their multi-task architecture demonstrates superior prediction accuracy over vanilla LSTM. City-wide mobility patterns are also studied in [261], where the authors architect deep spatio-temporal residual networks to forecast the movements of crowds. In order to capture the unique characteristics of spatio-temporal correlations associated with human mobility, their framework abandons RNN-based models and constructs three ResNets to extract nearby and distant spatial

dependencies within a city. This scheme learns temporal features and fuses representations extracted by all models for the final prediction. By incorporating external events information, their proposal achieves the highest accuracy among all deep learning and non-deep learning methods studied.
Lin et al. consider generating human movement chains from cellular data, to support transportation planning [204]. In particular, they first employ an input-output Hidden Markov Model (HMM) to label activity profiles for CDR data pre-processing. Subsequently, an LSTM is designed for activity chain generation, given the labeled activity sequences. They further synthesize urban mobility plans using the generative model and the simulation results reveal reasonable fit accuracy. Shao et al. design a sophisticated pedometer using a CNN. By reducing false negative steps caused by periodic movements, their proposal significantly improves the robustness of the pedometer.

User Localization. Deep learning is also playing an important role in user localization. To overcome the variability and coarse-granularity limitations of signal strength based methods, Wang et al. propose a deep learning driven fingerprinting system named “DeepFi” to perform indoor localization based on Channel State Information (CSI) [265]. Their toolbox yields much higher accuracy as compared to traditional methods, including FIFS [393], Horus [394], and Maximum Likelihood [395]. The same group of authors extend their work in [242], [243] and [266], [267], where they update the localization system, such that it can work with calibrated phase information of CSI [242], [243]. They further use more sophisticated CNN [266] and bi-modal structures [267] to improve the accuracy.
Nowicki and Wietrzykowski propose a localization framework that reduces significantly the effort of system tuning or filtering and obtains satisfactory prediction performance [268]. Wang et al. suggest that the objective of indoor localization can be achieved without the help of mobile devices. In [270], the authors employ an AE to learn useful patterns from WiFi signals. By automatic feature extraction, they produce a predictor that can fulfill multiple tasks simultaneously, including indoor localization, activity, and gesture recognition. Kumar et al. use deep learning to address the problem of indoor vehicle localization [272]. They employ CNNs to analyze visual signals and localize vehicles in a car park. This can help driver assistance systems operate in underground environments where the system has limited vision ability.
Most mobile devices can only produce unlabeled position data, therefore unsupervised and semi-supervised learning become essential. Mohammadi et al. address this problem by leveraging DRL and VAE. In particular, their framework envisions a virtual agent in indoor environments [271], which can constantly receive state information during training, including signal strength indicators, current agent location, and the real (labeled data) and inferred (via a VAE) distance to the target. The agent can virtually move in eight directions at each time step. Each time it takes an action, the agent receives a reward signal, identifying whether it moves in the correct direction. By employing deep Q learning, the agent can finally localize a user accurately, given both labeled and unlabeled data.
Beyond indoor localization, there also exist several research works that apply deep learning in outdoor scenarios. For example, Zheng and Weng introduce a lightweight developmental network for outdoor navigation applications on mobile devices [391]. Compared to CNNs, their architecture requires 100 times fewer weights to be updated, while maintaining decent accuracy. This enables efficient outdoor navigation on mobile devices. Work in [97] studies localization under both indoor and outdoor environments. They use an AE to pre-train a four-layer MLP, in order to avoid hand-crafted feature engineering. The MLP is subsequently used to estimate the coarse position of targets. The authors further introduce an HMM to fine-tune the predictions based on temporal properties of data. This improves the estimation accuracy in both indoor and outdoor positioning with Wi-Fi signals.

Lessons learned: Mobility analysis is concerned with the movement trajectory of a single user or large groups of users. The data of interest are essentially time series, but have an additional spatial dimension. CNNs and RNNs are the most successful architectures in such applications (e.g., [204], [258]–[261]), as they can effectively exploit spatial and temporal correlations. Localization, on the other hand, relies on sensors, signal strength, or CSI. These data usually have complex features, therefore large amounts of data are required for learning [268]. As deep learning can extract features in an unsupervised manner, it has become a strong candidate for localization tasks.

E. Deep Learning Driven Wireless Sensor Networks
Wireless Sensor Networks (WSNs) consist of a set of unique or heterogeneous sensors that are distributed over geographical regions. These sensors collaboratively monitor physical or environment status (e.g. temperature, pressure, motion, pollution, etc.) and transmit the data collected to centralized servers through wireless channels (see purple circle in Fig. 7 for an illustration). A WSN typically involves three core tasks, namely sensing, communication and analysis. Deep learning is becoming increasingly popular for WSN data analysis. In what follows, we review works of deep learning driven WSNs. Note that this is distinct from mobile data analysis discussed in subsections VI-B and VI-C, as in this subsection we only focus on WSN applications. Before starting, we summarize the most important works in Table XIV.
There exist two data processing scenarios in WSNs, namely centralized and decentralized. The former simply takes sensors as data collectors, which are only responsible for gathering data and sending these to a central location for processing. The latter assumes sensors have some computational ability and the main server offloads part of the jobs to the edge, each sensor performing data processing individually. Work in [284] focuses on the centralized approach and the authors apply a 3-layer MLP to reduce data redundancy while maintaining essential points for data aggregation. These data are sent to a central server for analysis. In contrast, Li et al. propose to distribute data mining to individual sensors [285]. They partition a

TABLE XIV: A summary of work on deep learning driven WSNs.


Reference | Application | Model | Optimizer | Key contribution
Chuang and Jiang [276] | Node localization | MLP | Unknown | Exploits both received signal strength indicators and hop count to improve accuracy
Bernas and Płaczek [277] | Indoor localization | MLP | Resilient backpropagation | Dramatically reduces the memory consumption of received signal strength map storage
Payal et al. [278] | Node localization | MLP | First-order and second-order gradient descent algorithms | Compares different training algorithms of MLP for WSN localization
Dong et al. [279] | Underwater localization | MLP | RMSprop | Performs WSN localization in underwater environments
Yan et al. [280] | Smoldering and flaming combustion identification | MLP | SGD | Achieves high accuracy in detecting fire in forests using smoke, CO2 and temperature sensors
Wang et al. [281] | Temperature correction | MLP | SGD | Employs deep learning to learn correlation between solar radiation and air temperature error
Lee et al. [282] | Online query processing | CNN | Unknown | Employs adaptive query refinement to enable real-time analysis
Li and Serpen [283] | Self-adaptive WSN | Hopfield network | Unknown | Embeds Hopfield NNs as a static optimizer for the weakly-connected dominating set problem
Khorasani and Naji [284] | Data aggregation | MLP | Unknown | Improves the energy efficiency in the aggregation process
Li et al. [285] | Distributed data mining | MLP | Unknown | Performs data analysis at distributed nodes, reducing the energy consumption by 58.31%
Luo and Nagarajan [286] | Distributed WSN anomaly detection | AE | SGD | Employs distributed anomaly detection techniques to offload computations from the cloud
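
Among the works listed in Table XIV, Lee et al. [282] rely on exponential smoothing to infer missing sensor readings before the data reach the deep learning model. The snippet below is a minimal sketch of that idea only, assuming simple (single-parameter) exponential smoothing; the exact variant used in [282] is not specified in this survey, and the function name `fill_missing` and the smoothing factor `alpha` are hypothetical choices made for illustration.

```python
from typing import List, Optional


def fill_missing(readings: List[Optional[float]], alpha: float = 0.5) -> List[float]:
    """Replace missing readings (None) with the current exponentially smoothed level.

    Assumption: simple exponential smoothing, level = alpha * x + (1 - alpha) * level.
    Observed values are passed through unchanged and only update the level.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    filled: List[float] = []
    level: Optional[float] = None  # smoothed estimate of the signal so far
    for value in readings:
        if value is None:
            if level is None:
                raise ValueError("cannot infer a reading before the first observation")
            filled.append(level)  # the forecast for a gap is the last smoothed level
        else:
            level = value if level is None else alpha * value + (1.0 - alpha) * level
            filled.append(value)
    return filled


if __name__ == "__main__":
    # Temperature trace with one dropped sample (hypothetical values).
    print(fill_missing([20.0, 22.0, None, 24.0], alpha=0.5))
```

A larger `alpha` makes the inferred value follow recent readings more closely, while a smaller one favours the long-term level; which setting is appropriate depends on how quickly the monitored quantity changes.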

deep neural network into different layers and offload layer operations to sensor nodes. Simulations conducted suggest that, by pre-processing with neural networks, their framework obtains high fault detection accuracy, while reducing power consumption at the central server.
Chuang and Jiang exploit neural networks to localize sensor nodes in WSNs [276]. To adapt deep learning models to specific network topology, they employ an online training scheme and correlated topology-trained data, enabling efficient model implementations and accurate location estimation. Based on this, Bernas and Płaczek architect an ensemble system that involves multiple MLPs for location estimation in different regions of interest [277]. In this scenario, node locations inferred by multiple MLPs are fused by a fusion algorithm, which improves the localization accuracy, particularly benefiting sensor nodes that are around the boundaries of regions. A comprehensive comparison of different training algorithms that apply MLP-based node localization is presented in [278]. Experiments suggest that the Bayesian regularization algorithm in general yields the best performance. Dong et al. consider an underwater node localization scenario [279]. Since acoustic signals are subject to loss caused by absorption, scattering, noise, and interference, underwater localization is not straightforward. By adopting a deep neural network, their framework successfully addresses the aforementioned challenges and achieves higher inference accuracy as compared to SVM and generalized least square methods.
Deep learning has also been exploited for identification of smoldering and flaming combustion phases in forests. In [280], Yan et al. embed a set of sensors into a forest to monitor CO2, smoke, and temperature. They suggest that various burning scenarios will emit different gases, which can be taken into account when classifying smoldering and flaming combustion. Wang et al. consider deep learning to correct inaccurate measurements of air temperature [281]. They discover a close relationship between solar radiation and actual air temperature, which can be effectively learned by neural networks.
Missing data or de-synchronization are common in WSN data collection. These may lead to serious problems in analysis due to inconsistency. Lee et al. address this problem by plugging a query refinement component in deep learning based WSN analysis systems [282]. They employ exponential smoothing to infer missing data, thereby maintaining the integrity of data for deep learning analysis without significantly compromising accuracy. To enhance the intelligence of WSNs, Li and Serpen embed an artificial neural network into a WSN, allowing it to agilely react to potential changes following deployment in the field [283]. To this end, they employ a minimum weakly-connected dominating set to represent the WSN topology, and subsequently use a Hopfield recurrent neural network as a static optimizer, to adapt network infrastructure to potential changes as necessary. This work represents an important step towards embedding machine intelligence in WSNs.

Lessons learned: Looking at Table XIV, it is interesting to see that the majority of deep learning practices in WSNs employ MLP models. Since MLP is straightforward to architect and performs reasonably well, it remains a good candidate for WSN applications. On the other hand, since most sensor data collected is sequential, we expect RNN-based models will play a more important role in this area.

F. Deep Learning Driven Network Control
In this part, we turn our attention to mobile network control problems. Owing to its powerful function approximation mechanism, deep learning has made remarkable breakthroughs in improving traditional reinforcement learning [26] and imitation learning [396]. These advances have potential to solve mobile network control problems which are complex and pre-

viously considered intractable [397], [398]. Recall that in reinforcement learning, an agent continuously interacts with the environment to learn the best action. With constant exploration and exploitation, the agent learns to maximize its expected return. Imitation learning follows a different learning paradigm called “learning by demonstration”. This learning paradigm relies on a ‘teacher’ who tells the agent what action should be executed under certain observations during the training. After sufficient demonstrations, the agent learns a policy that imitates the behavior of the teacher and can operate standalone without supervision. For instance, an agent is trained to mimic human behaviour (e.g., in applications such as game play, self-driving vehicles, or robotics), instead of learning by interacting with the environment, as in the case of pure reinforcement learning. This is because in such applications, making mistakes can have fatal consequences [27].
Beyond these two approaches, analysis-based control is gaining traction in mobile networking. Specifically, this scheme uses ML models for network data analysis, and subsequently exploits the results to aid network control. Unlike reinforcement/imitation learning, analysis-based control does not directly output actions. Instead, it extracts useful information and delivers this to an agent, to execute the actions. We illustrate the principles of the three control paradigms in Fig. 9. We review works proposed so far in this space next, and summarize these efforts in Table XV.

[Figure with three panels: Reinforcement Learning, Imitation Learning, Analysis-based Control]
Fig. 9: Principles of three control approaches applied in mobile and wireless networks control, namely reinforcement learning (above), imitation learning (middle), and analysis-based control (below).

Network Optimization refers to the management of network resources and functions in a given environment, with the goal of improving the network performance. Deep learning has recently achieved several successful results in this area. For example, Liu et al. exploit a DBN to discover the correlations between multi-commodity flow demand information and link usage in wireless networks [287]. Based on the predictions made, they remove the links that are unlikely to be scheduled, so as to reduce the size of data for the demand constrained energy minimization. Their method reduces runtime by up to 50%, without compromising optimality. Subramanian and Banerjee propose to use deep learning to predict the health condition of heterogeneous devices in machine to machine communications [288]. The results obtained are subsequently exploited for optimizing health aware policy change decisions.
He et al. employ deep reinforcement learning to address caching and interference alignment problems in wireless networks [289], [290]. In particular, they treat time-varying channels as finite-state Markov channels and apply deep Q networks to learn the best user selection policy. This novel framework demonstrates significantly higher sum rate and energy efficiency over existing approaches. Chen et al. shed light on automatic traffic optimization using a deep reinforcement learning approach [294]. Specifically, they architect a two-level DRL framework, which imitates the Peripheral and Central Nervous Systems in animals, to address scalability problems at datacenter scale. In their design, multiple peripheral systems are deployed on all end-hosts, so as to make decisions locally for short traffic flows. A central system is further employed to decide on the optimization with long traffic flows, which are more tolerant to longer delay. Experiments in a testbed with 32 servers suggest that the proposed design reduces the traffic optimization turn-around time and flow completion time significantly, compared to existing approaches.

Routing: Deep learning can also improve the efficiency of routing rules. Lee et al. exploit a 3-layer deep neural network to classify node degree, given detailed information of the routing nodes [295]. The classification results along with temporary routes are exploited for subsequent virtual route generation using the Viterbi algorithm. Mao et al. employ a DBN to decide the next routing node and construct a software defined router [166]. By considering Open Shortest Path First as the optimal routing strategy, their method achieves up to 95% accuracy, while reducing significantly the overhead and delay, and achieving higher throughput with a signaling interval of 240 milliseconds. A similar outcome is obtained in [259], where the authors employ Hopfield neural networks for routing, achieving better usability and survivability in mobile ad hoc network application scenarios.

Scheduling: There are several studies that investigate schedul-

TABLE XV: A summary of work on deep learning driven network control.


Domain | Reference | Application | Control approach | Model
Network optimization | Liu et al. [287] | Demand constrained energy minimization | Analysis-based | DBN
Network optimization | Subramanian and Banerjee [288] | Machine to machine system optimization | Analysis-based | Deep multi-modal network
Network optimization | He et al. [289], [290] | Caching and interference alignment | Reinforcement learning | Deep Q learning
Network optimization | Masmar and Evans [291] | mmWave Communication performance optimization | Reinforcement learning | Deep Q learning
Network optimization | Wang et al. [292] | Handover optimization in wireless systems | Reinforcement learning | Deep Q learning
Network optimization | Chen and Smith [293] | Cellular network random access optimization | Reinforcement learning | Deep Q learning
Network optimization | Chen et al. [294] | Automatic traffic optimization | Reinforcement learning | Deep policy gradient
Routing | Lee et al. [295] | Virtual route assignment | Analysis-based | MLP
Routing | Yang et al. [259] | Routing optimization | Analysis-based | Hopfield neural networks
Routing | Mao et al. [166] | Software defined routing | Imitation learning | DBN
Routing | Tang et al. [296] | Wireless network routing | Imitation learning | CNN
Scheduling | Zhang et al. [297] | Hybrid dynamic voltage and frequency scaling scheduling | Reinforcement learning | Deep Q learning
Scheduling | Atallah et al. [298] | Roadside communication network scheduling | Reinforcement learning | Deep Q learning
Scheduling | Chinchali et al. [299] | Cellular network traffic scheduling | Reinforcement learning | Policy gradient
Scheduling | Atallah et al. [298] | Roadside communications networks scheduling | Reinforcement learning | Deep Q learning
Scheduling | Wei et al. [300] | User scheduling and content caching for mobile edge networks | Reinforcement learning | Deep policy gradient
Resource allocation | Sun et al. [301] | Resource management over wireless networks | Imitation learning | MLP
Resource allocation | Xu et al. [302] | Resource allocation in cloud radio access networks | Reinforcement learning | Deep Q learning
Resource allocation | Ferreira et al. [303] | Resource management in cognitive communications | Reinforcement learning | Deep SARSA
Resource allocation | Challita et al. [305] | Proactive resource management for LTE | Reinforcement learning | Deep policy gradient
Resource allocation | Ye and Li [304] | Resource allocation in vehicle-to-vehicle communication | Reinforcement learning | Deep Q learning
Radio control | Naparstek and Cohen [306] | Dynamic spectrum access | Reinforcement learning | Deep Q learning
Radio control | O’Shea and Clancy [307] | Radio control and signal detection | Reinforcement learning | Deep Q learning
Radio control | Wijaya et al. [308], [310] | Intercell-interference cancellation and transmit power optimization | Imitation learning | RBM
Radio control | Rutagemwa et al. [309] | Dynamic spectrum assignment | Analysis-based | RNN
Other | Mao et al. [311] | Adaptive video bitrate | Reinforcement learning | A3C
Other | Oda et al. [312], [313] | Mobile actor node control | Reinforcement learning | Deep Q learning
Other | Kim [314] | IoT load balancing | Analysis-based | DBN
Other | Challita et al. [315] | Path planning for aerial vehicle networking | Reinforcement learning | Multi-agent echo state networks
Other | Luo et al. [316] | Wireless online power control | Reinforcement learning | Deep Q learning
Other | Yu et al. [317] | Multiple access for wireless network | Reinforcement learning | Deep Q learning
Other | Xu et al. [318] | Traffic engineering | Reinforcement learning | Deep policy gradient
Other | Liu et al. [319] | Base station sleep control | Reinforcement learning | Deep Q learning
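
Deep Q learning is the most frequent model in Table XV. The sketch below illustrates only the underlying Q-learning update, in tabular form, i.e. without the neural network approximator that the surveyed works use to represent the Q function. The three-state chain environment, the function names, and the hyper-parameters `alpha` and `gamma` are hypothetical choices made for illustration and are not taken from any of the cited papers.

```python
from typing import List


def q_update(q: List[List[float]], state: int, action: int, reward: float,
             next_state: int, alpha: float = 0.5, gamma: float = 0.9,
             terminal: bool = False) -> float:
    """Apply one Q-learning update in place and return the new Q(state, action)."""
    # The bootstrap term is dropped when the transition ends the episode.
    best_next = 0.0 if terminal else max(q[next_state])
    td_target = reward + gamma * best_next
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


def greedy_action(q: List[List[float]], state: int) -> int:
    """Return the action with the highest Q value in the given state."""
    row = q[state]
    return row.index(max(row))


def train_chain(n_sweeps: int = 50) -> List[List[float]]:
    """Toy environment (assumption): states 0 -> 1 -> 2, where state 2 is terminal.

    Action 1 moves one state to the right, action 0 stays in place.
    Reaching state 2 yields reward 1; every other transition yields 0.
    For reproducibility, all state-action pairs are swept in a fixed order
    instead of being sampled by an exploring agent.
    """
    n_states, n_actions = 3, 2
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(n_sweeps):
        for s in range(n_states - 1):
            for a in range(n_actions):
                s_next = s + 1 if a == 1 else s
                done = s_next == n_states - 1
                r = 1.0 if done else 0.0
                q_update(q, s, a, r, s_next, terminal=done)
    return q


if __name__ == "__main__":
    table = train_chain()
    print([greedy_action(table, s) for s in range(2)])
```

In a deep Q learning setting, the table `q` would be replaced by a neural network that maps a state to one Q value per action, and the same temporal-difference target would serve as the regression label.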

ing with deep learning. Zhang et al. introduce a deep Q learning-powered hybrid dynamic voltage and frequency scaling scheduling mechanism, to reduce the energy consumption in real-time systems (e.g. Wi-Fi, IoT, video applications) [297]. In their proposal, an AE is employed to approximate the Q function and the framework performs experience replay [399] to stabilize the training process and accelerate convergence. Simulations demonstrate that this method reduces by 4.2% the energy consumption of a traditional Q learning based method. Similarly, the work in [298] uses deep Q learning for scheduling in roadside communications networks. In particular, interactions between vehicular environments, including the sequence of actions, observations, and reward signals are formulated as an MDP. By approximating the Q value function, the agent learns a scheduling policy that achieves lower latency and busy time, and longer battery life, compared to traditional scheduling methods.

More recently, Chinchali et al. present a policy gradient based scheduler to optimize the cellular network traffic flow [299]. Specifically, they cast the scheduling problem as an MDP and employ RF to predict network throughput, which is subsequently used as a component of a reward function. Evaluations with a realistic network simulator demonstrate that this proposal can dynamically adapt to traffic variations, which enables mobile networks to carry 14.7% more data traffic, while outperforming heuristic schedulers by more than 2×. Wei et al. address user scheduling and content caching simultaneously [300]. In particular, they train a DRL agent,

consisting of an actor for deciding which base station should serve certain content, and whether to save the content. A critic is further employed to estimate the value function and deliver feedback to the actor. Simulations over a cluster of base stations show that the agent can yield low transmission delay.

Resource Allocation: Sun et al. use a deep neural network to approximate the mapping between the input and output of the Weighted Minimum Mean Square Error resource allocation algorithm [400], in interference-limited wireless network environments [301]. By effective imitation learning, the neural network approximation achieves close performance to that of its teacher. Deep learning has also been applied to cloud radio access networks, with Xu et al. employing deep Q learning to determine the on/off modes of remote radio heads given the current mode and user demand [302]. Comparisons with single base station association and fully coordinated association methods suggest that the proposed DRL controller allows the system to satisfy user demand while requiring significantly less energy. Ferreira et al. employ deep State-Action-Reward-State-Action (SARSA) to address resource allocation management in cognitive communications [303]. By forecasting the effects of radio parameters, this framework avoids wasted trials of poor parameters, which reduces the computational resources required.

Radio Control: In [306], the authors address the dynamic spectrum access problem in multichannel wireless network environments using deep reinforcement learning. In this setting, they incorporate an LSTM into a deep Q network, to maintain and memorize historical observations, allowing the architecture to perform precise state estimation, given partial observations. The training process is distributed to each user, which enables effective training parallelization and the learning of good policies for individual users. Experiments demonstrate that this framework achieves double the channel throughput, when compared to a benchmark method.
The work in [307] sheds light on the radio control and signal detection problems. In particular, the authors introduce a radio signal search environment based on the Gym Reinforcement Learning platform. Their agent exhibits a steady learning process and is able to learn a radio signal search policy. Rutagemwa et al. employ an RNN to perform traffic prediction, which can subsequently aid the dynamic spectrum assignment in mobile networks [309]. With accurate traffic forecasting, their proposal improves the performance of spectrum sharing in dynamic wireless environments, as it attains near-optimal spectrum assignments.

Other applications: Deep learning is playing an important role in other network control problems as well. Mao et al. develop the Pensieve system that generates adaptive video bit rate algorithms using deep reinforcement learning [311]. Specifically, Pensieve employs a state-of-the-art deep reinforcement learning algorithm A3C, which takes the bandwidth, bit rate and buffer size as input, and selects the bit rate that leads to the best expected return. The model is trained offline and deployed on an adaptive bit rate server, demonstrating that the system outperforms the best existing scheme by 12%-25% in terms of QoE. Liu et al. apply deep Q learning to reduce the energy consumption in cellular networks [319]. They train an agent to dynamically switch on/off base stations based on traffic consumption in areas of interest. An action-wise experience replay mechanism is further designed for balancing different traffic behaviours. Experiments show that their proposal can significantly reduce the energy consumed by base stations, outperforming naive table-based Q learning approaches.
Kim and Kim link deep learning with the load balancing problem in IoT [314]. The authors suggest that DBNs can effectively analyze network load and process structural configuration, thereby achieving efficient load balancing in IoT. Challita et al. employ a deep reinforcement learning algorithm based on echo state networks to perform path planning for a cluster of unmanned aerial vehicles [315]. Their proposal yields lower delay than a heuristic baseline. Xu et al. employ a DRL agent to learn from network dynamics how to control traffic flow [318]. They advocate that DRL is suitable for this problem, as it performs remarkably well in handling dynamic environments and sophisticated state spaces. Simulations conducted over three network topologies confirm this viewpoint, as the DRL agent significantly reduces the delay, while providing throughput comparable to that of traditional approaches.

Lessons learned: There exist three approaches to network control using deep learning, i.e., reinforcement learning, imitation learning, and analysis-based control. Reinforcement learning requires the agent to interact with the environment, trying different actions and obtaining feedback in order to improve. The agent will make mistakes during training, and usually needs a large number of steps to become smart. Therefore, most works do not train the agent on the real infrastructure, as making mistakes usually can have serious consequences for the network. Instead, a simulator that mimics the real network environments is built and the agent is trained offline using that. This imposes high fidelity requirements on the simulator, as the agent cannot work appropriately in an environment that is different from the one used for training. In contrast, the imitation learning mechanism “learns by demonstration”. It requires a teacher that provides labels telling the agent what it should do under certain circumstances. In the networking context, this mechanism is usually employed to reduce the computational time [166]. Specifically, in some network applications (e.g., routing), computing the optimal solution is time-consuming, which cannot satisfy the delay constraints of mobile networks. To mitigate this, one can generate a large dataset offline, and use an NN agent to learn the optimal actions.
Analysis-based control, on the other hand, is suitable for problems where decisions cannot be based solely on the state of the network environment. One can use an NN to extract additional information (e.g. traffic forecasts), which subsequently aids decisions. For example, the dynamic spectrum assignment can benefit from the analysis-based control.
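
The imitation learning workflow summarized above (generate labels offline with a slow but optimal teacher, then fit a fast student) can be sketched as follows. This is an illustrative toy under stated assumptions, not the method of [166] or of any other cited work: the two-path topology S -> A -> D / S -> B -> D, the integer link weights, and all function names are hypothetical, the teacher is a plain Dijkstra search, and a linear perceptron stands in for the NN agent to keep the example dependency-free.

```python
import heapq
from itertools import product
from typing import Dict, List, Sequence, Tuple

Graph = Dict[str, Dict[str, float]]


def build_graph(w: Sequence[float]) -> Graph:
    """Hypothetical topology with link weights w = (S-A, A-D, S-B, B-D)."""
    return {"S": {"A": w[0], "B": w[2]}, "A": {"D": w[1]}, "B": {"D": w[3]}, "D": {}}


def teacher_next_hop(graph: Graph, source: str = "S", target: str = "D") -> str:
    """Teacher: Dijkstra search returning the first hop of a shortest path."""
    # Heap entries are (cost so far, current node, first hop taken from source).
    heap = [(cost, nbr, nbr) for nbr, cost in graph[source].items()]
    heapq.heapify(heap)
    visited = {source}
    while heap:
        cost, node, first_hop = heapq.heappop(heap)
        if node == target:
            return first_hop
        if node in visited:
            continue
        visited.add(node)
        for nbr, weight in graph[node].items():
            if nbr not in visited:
                heapq.heappush(heap, (cost + weight, nbr, first_hop))
    raise ValueError("target is unreachable")


def make_dataset(max_weight: int = 4) -> List[Tuple[List[int], int]]:
    """Offline dataset: every weight combination, labelled by the teacher."""
    data = []
    for w in product(range(1, max_weight + 1), repeat=4):
        label = 1 if teacher_next_hop(build_graph(w)) == "A" else -1
        data.append((list(w) + [1], label))  # trailing 1 is the bias feature
    return data


def train_student(data: List[Tuple[List[int], int]], max_epochs: int = 2000) -> List[int]:
    """Student: perceptron trained until it reproduces all teacher labels."""
    theta = [0] * len(data[0][0])
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in data:
            score = sum(t * xi for t, xi in zip(theta, x))
            if y * score <= 0:  # misclassified (or undecided): move towards y
                theta = [t + y * xi for t, xi in zip(theta, x)]
                mistakes += 1
        if mistakes == 0:
            break
    return theta


def student_next_hop(theta: Sequence[float], w: Sequence[float]) -> str:
    """Fast inference: one dot product instead of a graph search."""
    score = sum(t * xi for t, xi in zip(theta, list(w) + [1]))
    return "A" if score > 0 else "B"


if __name__ == "__main__":
    dataset = make_dataset()
    weights = train_student(dataset)
    print(student_next_hop(weights, (1, 1, 4, 4)), student_next_hop(weights, (4, 4, 1, 1)))
```

On this toy topology the teacher's decision is linearly separable in the link weights, which is why a perceptron suffices; realistic routing problems do not have this property, which is where the deeper models surveyed here come in.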

G. Deep Learning Driven Network Security excellent false positive and false negative rates. Distributed
attacks are also studied in [324], where the authors focus on
With the increasing popularity of wireless connectivity, an IoT scenario. Another work in [325] employs MLPs to
protecting users, network equipment and data from malicious detect distributed denial of service attacks. By characterizing
attacks, unauthorized access and information leakage becomes typical patterns of attack incidents, the proposed model works
crucial. Cyber security systems guard mobile devices and well in detecting both known and unknown distributed denial
users through firewalls, anti-virus software, and Intrusion of service attacks.
Detection Systems (IDS) [401]. The firewall is an access Martin et al. propose a conditional VAE to identify
security gateway that allows or blocks the uplink and downlink intrusion incidents in IoT [326]. In order to improve detection
network traffic, based on pre-defined rules. Anti-virus software performance, their VAE infers missing features associated
detects and removes computer viruses, worms, Trojans and with incomplete measurements, which are common in IoT
malware. IDSs identify unauthorized and malicious activities, environments. The true data labels are embedded into the
or rule violations in information systems. Each performs decoder layers to assist final classification. Evaluations on the
its own functions to protect network communication, central well-known NSL-KDD dataset [404] demonstrate that their
servers and edge devices. model achieves remarkable accuracy in identifying denial
Modern cyber security systems benefit increasingly of service, probing, remote to user and user to root attacks,
from deep learning [403], since it can enable the system to outperforming traditional ML methods by 0.18 in terms of
(i) automatically learn signatures and patterns from experience F1 score. Hamedani et al. employ MLPs to detect malicious
and generalize to future intrusions (supervised learning); or attacks in delayed feedback networks [327]. The proposal
(ii) identify patterns that are clearly different from regular achieves more than 99% accuracy over 10,000 simulations.
behavior (unsupervised learning). This dramatically reduces
the effort of pre-defined rules for discriminating intrusions. Software level security: Nowadays, mobile devices are carry-
Beyond protecting networks from attacks, deep learning can ing considerable amount of private information. This informa-
also be used for attack purposes, bringing huge potential tion can be stolen and exploited by malicious apps installed on
to steal or crack user passwords or information. In this smartphones for ill-conceived purposes [405]. Deep learning
subsection, we review deep learning driven network security is being exploited for analyzing and detecting such threats.
from three perspectives, namely infrastructure, software, and Yuan et al. use both labeled and unlabeled mobile apps
user privacy. Specifically, infrastructure level security work to train an RBM [330]. By learning from 300 samples,
focuses on detecting anomalies that occur in the physical their model can classify Android malware with remarkable
network and software level work is centred on identifying accuracy, outperforming traditional ML tools by up to 19%.
malware and botnets in mobile networks. From the user Their follow-up research in [331] named Droiddetector further
privacy perspective, we discuss methods to protect from how improves the detection accuracy by 2%. Similarly, Su et al.
to protect against private information leakage, using deep analyze essential features of Android apps, namely requested
learning. To our knowledge, no other reviews summarize permission, used permission, sensitive application program-
these efforts. We summarize these works in Table XVI. ming interface calls, action and app components [332]. They
employ DBNs to extract features of malware and an SVM for
Infrastructure level security: We mostly focus on anomaly classification, achieving high accuracy and only requiring 6
detection at the infrastructure level, i.e. identifying network seconds per inference instance.
events (e.g., attacks, unexpected access and use of data) that Hou et al. attack the malware detection problem from a
do not conform to expected behaviors. Many researchers different perspective. Their research points out that signature-
exploit the outstanding unsupervised learning ability of AEs based detection is insufficient to deal with sophisticated An-
[320]. For example, Thing investigates features of attacks and droid malware [333]. To address this problem, they propose
threats that exist in IEEE 802.11 networks [165]. The author the Component Traversal, which can automatically execute
employs a stacked AE to categorize network traffic into 5 types code routines to construct weighted directed graphs. By em-
(i.e. legitimate, flooding, injection and impersonation traffic), ploying a Stacked AE for graph analysis, their framework
achieving 98.67% overall accuracy. The AE is also exploited in Deep4MalDroid can accurately detect Android malware that
[321], where Aminanto and Kim use an MLP and stacked AE intentionally repackages and obfuscates to bypass signatures
for feature selection and extraction, demonstrating remarkable and hinder analysis attempts to their inner operations. This
performance. Similarly, Feng et al. use AEs to detect abnormal work is followed by that of Martinelli et al., who exploit
spectrum usage in wireless communications [322]. Their ex- CNNs to discover the relationship between app types and
periments suggest that the detection accuracy can significantly extracted syscall traces from real mobile devices [334]. The
benefit from the depth of AEs. CNN has also been used in [335], where the authors draw
Distributed attack detection is also an important issue in inspiration from NLP and take the disassembled byte-code of
mobile network security. Khan et al. focus on detecting flood- an app as a text for analysis. Their experiments demonstrate
ing attacks in wireless mesh networks [323]. They simulate that CNNs can effectively learn to detect sequences of opcodes
a wireless environment with 100 nodes, and artificially inject that are indicative of malware. Chen et al. incorporate location
moderate and severe distributed flooding attacks, to generate a information into the detection framework and exploit an RBM
synthetic dataset. Their deep learning based methods achieve for feature extraction and classification [336]. Their proposal

TABLE XVI: A summary of work on deep learning driven network security.

Level | Reference | Application | Problem considered | Learning paradigm | Model
Infrastructure | Azar et al. [320] | Cyber security applications | Malware classification & Denial of service, probing, remote to user & user to root | Unsupervised & supervised | Stacked AE
Infrastructure | Thing [165] | IEEE 802.11 network anomaly detection and attack classification | Flooding, injection and impersonation attacks | Unsupervised & supervised | Stacked AE
Infrastructure | Aminanto and Kim [321] | Wi-Fi impersonation attacks detection | Flooding, injection and impersonation attacks | Unsupervised & supervised | MLP, AE
Infrastructure | Feng et al. [322] | Spectrum anomaly detection | Sudden signal-to-noise ratio changes in the communication channels | Unsupervised & supervised | AE
Infrastructure | Khan et al. [323] | Flooding attacks detection in wireless mesh networks | Moderate and severe distributed flood attack | Supervised | MLP
Infrastructure | Diro and Chilamkurti [324] | IoT distributed attacks detection | Denial of service, probing, remote to user & user to root | Supervised | MLP
Infrastructure | Saied et al. [325] | Distributed denial of service attack detection | Known and unknown distributed denial of service attack | Supervised | MLP
Infrastructure | Martin et al. [326] | IoT intrusion detection | Denial of service, probing, remote to user & user to root | Unsupervised & supervised | Conditional VAE
Infrastructure | Hamedani et al. [327] | Attacks detection in delayed feedback networks | Attack detection in smart grids using reservoir computing | Supervised | MLP
Infrastructure | Luo and Nagarajany [286] | Anomalies in WSNs | Spikes and burst recorded by temperature and relative humidity sensors | Unsupervised | AE
Infrastructure | Das et al. [328] | IoT authentication | Long duration of signal imperfections | Supervised | LSTM
Infrastructure | Jiang et al. [329] | MAC spoofing detection | Packets from different hardware use same MAC address | Supervised | CNN
Software | Yuan et al. [330] | Android malware detection | Apps in Contagio Mobile and Google Play Store | Unsupervised & supervised | RBM
Software | Yuan et al. [331] | Android malware detection | Apps in Contagio Mobile, Google Play Store and Genome Project | Unsupervised & supervised | DBN
Software | Su et al. [332] | Android malware detection | Apps in Drebin, Android Malware Genome Project, the Contagio Community, and Google Play Store | Unsupervised & supervised | DBN + SVM
Software | Hou et al. [333] | Android malware detection | App samples from Comodo Cloud Security Center | Unsupervised & supervised | Stacked AE
Software | Martinelli [334] | Android malware detection | Apps in Drebin, Android Malware Genome Project and Google Play Store | Supervised | CNN
Software | McLaughlin et al. [335] | Android malware detection | Apps in Android Malware Genome project and Google Play Store | Supervised | CNN
Software | Chen et al. [336] | Malicious application detection at the network edge | Publicly-available malicious applications | Unsupervised & supervised | RBM
Software | Wang et al. [200] | Malware traffic classification | Traffic extracted from 9 types of malware | Supervised | CNN
Software | Oulehla et al. [337] | Mobile botnet detection | Client-server and hybrid botnets | Unknown | Unknown
Software | Torres et al. [338] | Botnet detection | Spam, HTTP and unknown traffic | Supervised | LSTM
Software | Eslahi et al. [339] | Mobile botnet detection | HTTP botnet traffic | Supervised | MLP
Software | Alauthaman et al. [340] | Peer-to-peer botnet detection | Waledac and Storm Bots | Supervised | MLP
User privacy | Shokri and Shmatikov [341] | Privacy preserving deep learning | Avoiding sharing data in collaborative model training | Supervised | MLP, CNN
User privacy | Phong et al. [342] | Privacy preserving deep learning | Addressing information leakage introduced in [341] | Supervised | MLP
User privacy | Osia et al. [343] | Privacy-preserving mobile analytics | Offloading feature extraction from cloud | Supervised | CNN
User privacy | Abadi et al. [344] | Deep learning with differential privacy | Preventing exposure of private information in training data | Supervised | MLP
User privacy | Osia et al. [345] | Privacy-preserving personal model training | Offloading personal data from clouds | Unsupervised & supervised | MLP & Latent Dirichlet Allocation [402]
User privacy | Servia et al. [346] | Privacy-preserving model inference | Breaking down large models for privacy-preserving analytics | Supervised | CNN
User privacy | Hitaj et al. [347] | Stealing information from collaborative deep learning | Breaking the ordinary and differentially private collaborative deep learning | Unsupervised | GAN
User privacy | Hitaj et al. [344] | Password guessing | Generating passwords from leaked password set | Unsupervised | GAN
User privacy | Greydanus [348] | Enigma learning | Reconstructing functions of polyalphabetic cipher | Supervised | LSTM
User privacy | Maghrebi [349] | Breaking cryptographic | Side channel attacks | Supervised | MLP, AE, CNN, LSTM
User privacy | Liu et al. [350] | Password guessing | Employing adversarial generation to guess passwords | Unsupervised | LSTM
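
To make the differentially private SGD of Abadi et al. [344] listed above more concrete, the sketch below shows one update step under our own simplifying assumptions: a logistic regression model, synthetic data, and illustrative values for the clipping norm and noise multiplier (none of these are taken from [344]). Each per-example gradient is clipped, Gaussian noise is added to the sum, and the result is averaged over the batch. The function name dp_sgd_step is hypothetical and no privacy accounting is performed.

```python
import numpy as np


def dp_sgd_step(w, X, y, lr=0.1, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """One differentially private SGD step for logistic regression (illustrative)."""
    rng = np.random.default_rng() if rng is None else rng
    # Per-example gradient of the logistic loss: (sigmoid(x . w) - y) * x
    probs = 1.0 / (1.0 + np.exp(-(X @ w)))
    grads = (probs - y)[:, None] * X                      # shape (batch, dim)
    # Clip each example's gradient so its L2 norm is at most clip_norm.
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    grads = grads / np.maximum(1.0, norms / clip_norm)
    # Add Gaussian noise to the summed gradient, then average over the batch.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=w.shape)
    noisy_mean = (grads.sum(axis=0) + noise) / len(X)
    return w - lr * noisy_mean


rng = np.random.default_rng(0)
X = rng.normal(size=(32, 5))        # synthetic features (assumption)
y = (X[:, 0] > 0).astype(float)     # synthetic binary labels (assumption)
w = np.zeros(5)
for _ in range(100):
    w = dp_sgd_step(w, X, y, rng=rng)
```

Because every per-example gradient is clipped before averaging, no single training sample can move the weights by more than lr * clip_norm / batch_size in one step (before noise), which is the property the added noise is calibrated against.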

improves the performance of other ML methods.

Botnets are another important threat to mobile networks. A botnet is effectively a network that consists of machines compromised by bots. These machines are usually under the control of a botmaster who takes advantage of the bots to harm public services and systems [406]. Detecting botnets is challenging and is now becoming a pressing task in cyber security. Deep learning is playing an important role in this area. For example, Oulehla et al. propose to employ neural networks to extract features from mobile botnet behaviors [337]. They design a parallel detection framework for identifying both client-server and hybrid botnets, and demonstrate encouraging performance. Torres et al. investigate the common behavior patterns that botnets exhibit across their life cycle, using LSTMs [338]. They employ both under-sampling and over-sampling to address the class imbalance between botnet and normal traffic in the dataset, which is common in anomaly detection problems. Similar issues are also studied in [339] and [340], where the authors use standard MLPs to perform mobile and peer-to-peer botnet detection respectively, achieving high overall accuracy.

User privacy level: Preserving user privacy during training and evaluating a deep neural network is another important research issue [407]. Initial research is conducted in [341], where the authors enable user participation in the training and evaluation of a neural network, without sharing their input data. This preserves individuals' privacy while benefiting all users, as they collaboratively improve the model performance. Their framework is revisited and improved in [342], where another group of researchers employ additively homomorphic encryption, to address the information leakage problem ignored in [341], without compromising model accuracy. This significantly boosts the security of the system.

Osia et al. focus on privacy-preserving mobile analytics using deep learning. They design a client-server framework based on the Siamese architecture [408], which accommodates a feature extractor in mobile devices and correspondingly a classifier in the cloud [343]. By offloading feature extraction from the cloud, their system offers strong privacy guarantees. An innovative work in [344] implies that deep neural networks can be trained with differential privacy. The authors introduce a differentially private SGD to avoid disclosure of private information of training data. Experiments on two publicly-available image recognition datasets demonstrate that their algorithm is able to maintain users' privacy, with a manageable cost in terms of complexity, efficiency, and performance. This approach is also useful for edge-based privacy filtering techniques such as Distributed One-class Learning [409].

Servia et al. consider training deep neural networks on distributed devices without violating privacy constraints [346]. Specifically, the authors retrain an initial model locally, tailored to individual users. This avoids transferring personal data to untrusted entities, hence user privacy is guaranteed. Osia et al. focus on protecting users' personal data from the inference perspective. In particular, they break the entire deep neural network into a feature extractor (on the client side) and an analyzer (on the cloud side) to minimize the exposure of sensitive information. Through local processing of raw input data, sensitive personal information is transferred into abstract features, which avoids direct disclosure to the cloud. Experiments on gender classification and emotion detection suggest that this framework can effectively preserve user privacy, while maintaining remarkable inference accuracy.

Deep learning has also been exploited for cyber attacks, including attempts to compromise private user information and guess passwords. In [347], Hitaj et al. suggest that learning a deep model collaboratively is not reliable. By training a GAN, their attacker is able to affect such a learning process and lure the victims to disclose private information, by injecting fake training samples. Their GAN even successfully breaks the differentially private collaborative learning in [344]. The authors further investigate the use of GANs for password guessing. In [410], they design PassGAN, which learns the distribution of a set of leaked passwords. Once trained on a dataset, PassGAN is able to match over 46% of passwords in a different testing set, without user intervention or cryptography knowledge. This novel technique has the potential to revolutionize current password guessing algorithms.

Greydanus breaks a decryption rule using an LSTM network [348]. They treat decryption as a sequence-to-sequence translation task, and train a framework with large enigma pairs. The proposed LSTM demonstrates remarkable performance in learning polyalphabetic ciphers. Maghrebi et al. exploit various deep learning models (i.e. MLP, AE, CNN, LSTM) to construct a precise profiling system and perform side channel key recovery attacks [349]. Surprisingly, deep learning based methods demonstrate overwhelming performance over other template machine learning attacks in terms of efficiency in breaking both unprotected and protected Advanced Encryption Standard implementations.

Lessons learned: Most deep learning based solutions focus on existing network attacks, yet new attacks emerge every day. As these new attacks may have different features and appear to behave 'normally', old NN models may not easily detect them. Therefore, an effective deep learning technique should be able to (i) rapidly transfer the knowledge of old attacks to detect newer ones; and (ii) constantly absorb the features of newcomers and update the underlying model. Transfer learning and lifelong learning are strong candidates to address these problems, as we will discuss in Sec. VII-C. Research in these directions remains shallow, hence we expect more efforts in the future.

H. Deep Learning Driven Signal Processing

Deep learning is also gaining increasing attention in signal processing, in applications including Multi-Input Multi-Output (MIMO) and modulation. MIMO has become a fundamental technique in current wireless communications, both in cellular and WiFi networks. By incorporating deep learning, MIMO performance is intelligently optimized based on environment conditions. Modulation recognition is also evolving to be more accurate, by taking advantage of deep learning. We give an

TABLE XVII: A summary of deep learning driven signal processing.

Domain | Reference | Application | Model
MIMO systems | Samuel et al. [351] | MIMO detection | MLP
MIMO systems | Yan et al. [352] | Signal detection in a MIMO-OFDM system | AE+ELM
MIMO systems | Vieira et al. [273] | Massive MIMO fingerprint-based positioning | CNN
MIMO systems | Neumann et al. [353] | MIMO channel estimation | CNN
MIMO systems | Wijaya et al. [308], [310] | Inter-cell interference cancellation and transmit power optimization | RBM
MIMO systems | O'Shea et al. [354] | Optimization of representations and encoding/decoding processes | AE
MIMO systems | Borgerding et al. [355] | Sparse linear inverse problem in MIMO | CNN
MIMO systems | Fujihashi et al. [356] | MIMO nonlinear equalization | MLP
Modulation | Rajendran et al. [357] | Automatic modulation classification | LSTM
Modulation | West and O'Shea [358] | Modulation recognition | CNN, ResNet, Inception CNN, LSTM
Modulation | O'Shea et al. [359] | Modulation recognition | Radio transformer network
Modulation | O'Shea and Hoydis [360] | Modulation classification | CNN
Modulation | Jagannath et al. [361] | Modulation classification in a software defined radio testbed | MLP
Others | O'Shea et al. [362] | Radio traffic sequence recognition | LSTM
Others | O'Shea et al. [363] | Learning to communicate over an impaired channel | AE + radio transformer network
Others | Ye et al. [364] | Channel estimation and signal detection in OFDM systems | MLP
Others | Liang et al. [365] | Channel decoding | CNN
Others | Lyu et al. [366] | NNs for channel decoding | MLP, CNN and RNN
Others | Dörner et al. [367] | Over-the-air communications system | AE
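
As a point of reference for the MIMO detection entry above (Samuel et al. [351]), whose detection network is obtained by unfolding a projected gradient descent method, the following sketch runs the classical, non-learned projected gradient descent iteration for binary detection. It is a minimal illustration under our own assumptions (a real-valued Gaussian channel, relaxation of the ±1 constraint to the box [-1, 1], a fixed step size and low noise); it is not the learned architecture of [351], and all names and dimensions are hypothetical.

```python
import numpy as np


def pgd_mimo_detect(H, y, n_iters=200):
    """Estimate x in {-1, +1}^K from y = H x + noise via projected gradient descent."""
    HtH, Hty = H.T @ H, H.T @ y
    # Step size 1/L, where L is the largest eigenvalue of H^T H.
    step = 1.0 / np.linalg.eigvalsh(HtH).max()
    x = np.zeros(H.shape[1])
    for _ in range(n_iters):
        grad = HtH @ x - Hty                      # gradient of 0.5 * ||y - H x||^2
        x = np.clip(x - step * grad, -1.0, 1.0)   # project onto the box [-1, 1]^K
    return np.sign(x)                             # round to the binary alphabet


rng = np.random.default_rng(0)
K, N = 4, 16                                      # transmit / receive dimensions (assumption)
H = rng.normal(size=(N, K))                       # synthetic channel matrix
x_true = rng.choice([-1.0, 1.0], size=K)          # transmitted binary vector
y = H @ x_true + 0.01 * rng.normal(size=N)        # received signal with light noise
x_hat = pgd_mimo_detect(H, y)
```

Unfolding, in this context, means treating a fixed number of such iterations as network layers and learning quantities like the step size and the projection from data, rather than fixing them by hand as done here.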

overview of relevant work in this area in Table XVII.

MIMO Systems: Samuel et al. suggest that deep neural networks can be a good estimator of transmitted vectors in a MIMO channel. By unfolding a projected gradient descent method, they design an MLP-based detection network to perform binary MIMO detection [351]. The Detection Network can be implemented on multiple channels after a single training. Simulations demonstrate that the proposed architecture achieves near-optimal accuracy, while requiring light computation without prior knowledge of Signal-to-Noise Ratio (SNR). Yan et al. employ deep learning to solve a similar problem from a different perspective [352]. By considering the characteristic invariance of signals, they exploit an AE as a feature extractor, and subsequently use an Extreme Learning Machine (ELM) to classify signal sources in a MIMO orthogonal frequency division multiplexing (OFDM) system. Their proposal achieves higher detection accuracy than several traditional methods, while maintaining similar complexity.

Vieira et al. show that massive MIMO channel measurements in cellular networks can be utilized for fingerprint-based inference of user positions [273]. Specifically, they design CNNs with weight regularization to exploit the sparsity and information invariance of channel fingerprints, thereby achieving precise position inference. CNNs have also been employed for MIMO channel estimation. Neumann et al. exploit the structure of the MIMO channel model to design a lightweight, approximated maximum likelihood estimator for a specific channel model [353]. Their methods outperform traditional estimators in terms of computation cost and reduce the number of hyper-parameters to be tuned. A similar idea is implemented in [364], where Ye et al. employ an MLP to perform channel estimation and signal detection in OFDM systems.

Wijaya et al. consider applying deep learning to a different scenario [308], [310]. The authors propose to use non-iterative neural networks to perform transmit power control at base stations, thereby preventing degradation of network performance due to inter-cell interference. The neural network is trained to estimate the optimal transmit power at every packet transmission, selecting that with the highest activation probability. Simulations demonstrate that the proposed framework significantly outperforms the belief propagation algorithm that is routinely used for transmit power control in MIMO systems, while attaining a lower computational cost.

More recently, O'Shea et al. bring deep learning to physical layer design [354]. They incorporate an unsupervised deep AE into a single-user end-to-end MIMO system, to optimize representations and the encoding/decoding processes, for transmissions over a Rayleigh fading channel. Experimental results show that the AE system outperforms the Space Time Block Code approach in terms of SNR by approximately 15 dB. In [355], Borgerding et al. propose to use deep learning to recover a sparse signal from noisy linear measurements in MIMO environments. The proposed scheme is evaluated on compressive random access and massive-MIMO channel estimation, where it achieves better accuracy than traditional algorithms and CNNs.

Modulation: West and O'Shea compare the modulation recognition accuracy of different deep learning architectures, including traditional CNN, ResNet, Inception CNN, and LSTM [358]. Their experiments suggest that the LSTM is the best candidate for modulation recognition, since it achieves the highest accuracy. Due to its superior performance, an LSTM is also employed for a similar task in [357]. O'Shea et al. then focus on tailoring deep learning architectures to radio properties. Their prior work is improved in [359], where they architect a novel deep radio transformer network for precise modulation recognition. Specifically, they introduce radio-domain specific parametric transformations into a spatial transformer network, which assists in the normalization of the received signal, thereby achieving superior performance. This framework also demonstrates automatic synchronization

TABLE XVIII: A summary of emerging deep learning driven mobile network applications.

Reference | Application | Model | Key contribution
Gonzalez et al. [368] | Network data monetization | - | A platform named Net2Vec to facilitate deep learning deployment in communication networks.
Kaminski et al. [369] | In-network computation for IoT | MLP | Enables collaborative data processing and reduces latency.
Xiao et al. [370] | Mobile crowdsensing | Deep Q learning | Mitigates vulnerabilities of mobile crowdsensing systems.
Luong et al. [371] | Resource allocation for mobile blockchains | MLP | Employs deep learning to perform monotone transformations of miners' bids and outputs the allocation and conditional payment rules in optimal auctions.
Gulati et al. [372] | Data dissemination in Internet of Vehicles (IoV) | CNN | Investigates the relationship between data dissemination performance and social score, energy level, number of vehicles and their speed.
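
The resource allocation entry above (Luong et al. [371]) concerns auctions among miners. For orientation, a sealed-bid second-price auction, a common baseline mechanism, can be sketched as follows. The miner names and bid values are hypothetical, a single indivisible resource unit is assumed, and this is not the learned mechanism of [371].

```python
def second_price_auction(bids):
    """Return (winner, price) for a sealed-bid second-price auction.

    `bids` maps a bidder name to its bid. The highest bidder wins and pays
    the second-highest bid. Ties go to the bidder inserted first, because
    Python's sort is stable.
    """
    if len(bids) < 2:
        raise ValueError("a second-price auction needs at least two bidders")
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1]
    return winner, price


# Hypothetical bids from three miners for one unit of edge computing resource.
bids = {"miner_a": 5.0, "miner_b": 8.0, "miner_c": 6.5}
winner, price = second_price_auction(bids)
print(winner, price)  # miner_b 6.5
```

The provider's revenue here is the second-highest bid (6.5), not the winning bid (8.0).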

abilities, which reduces the dependency on traditional expert systems and expensive signal analytic processes. In [360], O'Shea and Hoydis introduce several novel deep learning applications for the network physical layer. They demonstrate a proof-of-concept where they employ a CNN for modulation classification and obtain satisfying accuracy.

Other signal processing applications: Deep learning is also adopted for radio signal analysis. In [362], O'Shea et al. employ an LSTM to replace sequence translation routines between radio transmitter and receiver. Although their framework works well in ideal environments, its performance drops significantly when realistic channel effects are introduced. Later, the authors consider a different scenario in [363], where they exploit a regularized AE to enable reliable communications over an impaired channel. They further incorporate a radio transformer network for signal reconstruction at the decoder side, thereby achieving receiver synchronization. Simulations demonstrate that this approach is reliable and can be efficiently implemented.

In [365], Liang et al. exploit noise correlations to decode channels using a deep learning approach. Specifically, they use a CNN to reduce channel noise estimation errors by learning the noise correlation. Experiments suggest that their framework can significantly improve the decoding performance. The decoding performance of MLPs, CNNs and RNNs is compared in [366]. Experiments in different settings suggest that the RNN achieves the best decoding performance, while nonetheless incurring the highest computational overhead.

Lessons learned: Deep learning is beginning to play an important role in signal processing applications and the performance demonstrated by early prototypes is remarkable. Complexity, however, remains an open challenge. We can only expect that deep learning will become increasingly popular in this area.

I. Emerging Deep Learning Applications in Mobile Networks

In this part, we review work that builds upon deep learning in other mobile networking areas, which are beyond the scope of the subjects discussed thus far. These emerging applications open several new research directions, as we discuss next. A summary of these works is given in Table XVIII.

Network Data Monetization: Gonzalez et al. employ unsupervised deep learning to generate real-time accurate user profiles [368] using an on-network machine learning platform called Net2Vec [411]. Specifically, they analyze user browsing data in real time and generate user profiles using product categories. The profiles can be subsequently associated with the products that are of interest to the users and employed for online advertising.

IoT In-Network Computation: Instead of regarding IoT nodes as producers of data or the end consumers of processed information, Kaminski et al. embed neural networks into an IoT deployment and allow the nodes to collaboratively process the data generated [369]. This enables low-latency communication, while offloading data storage and processing from the cloud. In particular, the authors map each hidden unit of a pre-trained neural network to a node in the IoT network, and investigate the optimal projection that leads to the minimum communication overhead. Their framework achieves functionality similar to in-network computation in WSNs and opens new research directions in fog computing.

Mobile Crowdsensing: Xiao et al. argue that there exist malicious mobile users who intentionally provide false sensing data to servers, to save costs and preserve their privacy, which in turn can make mobile crowdsensing systems vulnerable [370]. The authors model the server-users system as a Stackelberg game, where the server plays the role of a leader that is responsible for evaluating the sensing effort of individuals, by analyzing the accuracy of each sensing report. Users are paid based on the evaluation of their efforts, hence cheating users will be punished with zero reward. To design an optimal payment policy, the server employs a deep Q network, which learns from its experience with sensing reports, without requiring specific sensing models. Simulations demonstrate superior performance in terms of sensing quality, resilience to attacks, and server utility, as compared to traditional Q learning based and random payment strategies.

Mobile Blockchain: In [371], Luong et al. shed light on resource management in mobile blockchain networks based on optimal auction. They design an MLP to first conduct monotone transformations of the miners' bids and subsequently output the allocation scheme and conditional payment rules for each miner. Experiments with different settings suggest that the proposed deep learning based framework can deliver much higher profit to the edge computing service provider than the second-price

auction baseline.

Internet of Vehicles (IoV): Gulati et al. extend the success of deep learning to IoV [372]. The authors design a deep learning-based content centric data dissemination approach that comprises three steps, namely (i) performing energy estimation on selected vehicles that are capable of data dissemination; (ii) employing a Wiener process model to identify stable and reliable connections between vehicles; and (iii) using a CNN to predict the social relationship among vehicles. Experiments unveil that the volume of data disseminated is positively related to social score, energy levels, and number of vehicles, while the speed of vehicles has a negative impact on the connection probability.

Lessons learned: Although deep learning is versatile and can be employed in a range of tasks in the mobile and wireless networking domain, deep learning solutions are not universal and may not be suitable for every problem. In general, NNs have low interpretability, whereas interpretability is essential in business applications such as network economics. Secondly, training a deep architecture requires a substantial amount of data. Further, deep neural networks usually have many hyper-parameters and finding their optimal configuration can be difficult. The AutoML platform¹⁹, which employs progressive neural architecture search [412], provides a first solution to this problem.

VII. TAILORING DEEP LEARNING TO MOBILE NETWORKS

Although deep learning performs remarkably in many mobile networking areas, the No Free Lunch (NFL) theorem indicates that there is no single model that can work universally well in all problems [413]. This implies that for any specific mobile and wireless networking problem, we may need to adapt different deep learning architectures to achieve the best performance. In this section, we look at how to tailor deep learning to mobile networking applications from three perspectives, namely, mobile devices and systems, distributed data centers, and changing mobile network environments.

A. Tailoring Deep Learning to Mobile Devices and Systems

The ultra-low latency requirements of future 5G networks demand runtime efficiency from all operations performed by mobile systems. This also applies to deep learning driven applications. However, current mobile devices have limited hardware capabilities, which means that implementing complex deep learning architectures on such equipment may be computationally infeasible, unless appropriate model tuning is performed. To address this issue, ongoing research improves existing deep learning architectures [414], such that the inference process does not violate latency or energy constraints [415], [416]. We outline these works in Table XIX and discuss their key contributions next.

Iandola et al. design a compact architecture for embedded systems named SqueezeNet, which has similar accuracy to that of AlexNet [85], a classical CNN, yet has 50 times fewer parameters [417]. SqueezeNet is also based on CNNs, but its significantly smaller model size (i) allows more efficient training on distributed systems; (ii) reduces the transmission overhead when updating the model at the client side; and (iii) facilitates deployment on resource-limited embedded devices. Howard et al. extend this work and introduce an efficient family of streamlined CNNs called MobileNet, which uses depth-wise separable convolution operations to drastically reduce the number of computations required and the model size [418]. This new design can run with low latency and can satisfy the requirements of mobile and embedded vision applications. The authors further introduce two hyper-parameters, namely width and resolution multipliers, which can help strike an appropriate trade-off between accuracy and efficiency. The ShuffleNet proposed by Zhang et al. improves the accuracy of MobileNet by employing point-wise group convolution and channel shuffle, while retaining similar model complexity [419]. In particular, the authors discover that more groups of convolution operations can reduce the computation requirements.

Zhang et al. focus on reducing the number of parameters of structures with fully-connected layers for mobile multimedia feature learning [420]. This is achieved by applying Tucker decomposition to weight sub-tensors in the model, while maintaining decent reconstruction capability. The Tucker decomposition has also been employed in [425], where the authors seek to approximate a model with fewer parameters, in order to save memory. Mobile optimizations are further studied for RNN models. In [421], Cao et al. use a mobile toolbox called RenderScript²⁰ to parallelize specific data structures and enable mobile GPUs to perform computational accelerations. Their proposal reduces the latency when running RNN models on Android smartphones. Chen et al. shed light on implementing CNNs on iOS mobile devices [422]. In particular, they reduce the model execution latency, through space exploration for data re-usability and kernel redundancy removal. The former alleviates the high bandwidth requirements of convolutional layers, while the latter reduces the memory and computational requirements, with negligible performance degradation.

Rallapalli et al. investigate offloading very deep CNNs from clouds to edge devices, by employing memory optimization on both mobile CPUs and GPUs [423]. Their framework enables deep CNNs with large memory requirements to run at high speed in mobile object detection applications. Lane et al. develop a software accelerator, DeepX, to assist deep learning implementations on mobile devices. The proposed approach exploits two inference-time resource control algorithms, i.e., runtime layer compression and deep architecture decomposition [424]. The runtime layer compression technique controls the memory and computation runtime during the inference phase, by extending model compression principles. This is important in mobile devices, since offloading the inference process to edge devices is more practical with current hardware

¹⁹ AutoML – training high-quality custom machine learning models with minimum effort and machine learning expertise. https://cloud.google.com/automl/
²⁰ Android Renderscript https://developer.android.com/guide/topics/renderscript/compute.html.

TABLE XIX: Summary of works on improving deep learning for mobile devices and systems.
Reference | Methods | Target model
Iandola et al. [417] | Filter size shrinking, reducing input channels and late downsampling | CNN
Howard et al. [418] | Depth-wise separable convolution | CNN
Zhang et al. [419] | Point-wise group convolution and channel shuffle | CNN
Zhang et al. [420] | Tucker decomposition | AE
Cao et al. [421] | Data parallelization by RenderScript | RNN
Chen et al. [422] | Space exploration for data reusability and kernel redundancy removal | CNN
Rallapalli et al. [423] | Memory optimizations | CNN
Lane et al. [424] | Runtime layer compression and deep architecture decomposition | MLP, CNN
Huynh et al. [425] | Caching, Tucker decomposition and computation offloading | CNN
Wu et al. [426] | Parameters quantization | CNN
Bhattacharya and Lane [427] | Sparsification of fully-connected layers and separation of convolutional kernels | MLP, CNN
Georgiev et al. [96] | Representation sharing | MLP
Cho and Brand [428] | Convolution operation optimization | CNN
Guo and Potkonjak [429] | Filters and classes pruning | CNN
Li et al. [430] | Cloud assistance and incremental learning | CNN
Zen et al. [431] | Weight quantization | LSTM
Falcao et al. [432] | Parallelization and memory sharing | Stacked AE
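
As a rough illustration of one entry in Table XIX, the sketch below compares the number of weights in a standard convolution with that of a depth-wise separable convolution, the operation listed for [418]. The function names and the layer shape (3 x 3 kernel, 64 input channels, 128 output channels) are assumptions made for this example and are not taken from the cited work; biases are ignored.

```python
def standard_conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights in a standard kernel x kernel convolution (bias ignored)."""
    return kernel * kernel * c_in * c_out


def separable_conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights in a depth-wise separable convolution (bias ignored)."""
    depthwise = kernel * kernel * c_in  # one kernel x kernel filter per input channel
    pointwise = c_in * c_out            # 1 x 1 convolution that mixes the channels
    return depthwise + pointwise


if __name__ == "__main__":
    k, c_in, c_out = 3, 64, 128         # hypothetical layer shape
    std = standard_conv_params(k, c_in, c_out)
    sep = separable_conv_params(k, c_in, c_out)
    print(f"standard: {std}, separable: {sep}, ratio: {sep / std:.4f}")
```

The ratio of the two counts equals 1/c_out + 1/kernel^2, so for 3 x 3 kernels the separable variant needs roughly one ninth of the weights once the number of output channels is large.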

TABLE XX: Summary of work on model and training parallelism for mobile systems and devices.
Parallelism paradigm | Reference | Target | Core idea | Improvement
Model parallelism | Dean et al. [112] | Very large deep neural networks in distributed systems. | Employs downpour SGD to support a large number of model replicas and a Sandblaster framework to support a variety of batch optimizations. | Up to 12× model training speed up, using 81 machines.
Model parallelism | Teerapittayanon et al. [433] | Neural network on cloud and end devices. | Maps a deep neural network to a distributed setting and jointly trains each individual section. | Up to 20× reduction of communication cost.
Model parallelism | De Coninck et al. [99] | Neural networks on IoT devices. | Distills from a pre-trained NN to obtain a smaller NN that performs classification on a subset of the entire space. | 10ms inference latency on a mobile device.
Model parallelism | Omidshafiei et al. [434] | Multi-task multi-agent reinforcement learning under partial observability. | Deep recurrent Q-networks & cautiously-optimistic learners to approximate action-value function; decentralized concurrent experience replay trajectories to stabilize training. | Near-optimal execution time.
Training parallelism | Recht et al. [435] | Parallelized SGD. | Eliminates overhead associated with locking in distributed SGD. | Up to 10× speed up in distributed training.
Training parallelism | Goyal et al. [436] | Distributed synchronous SGD. | Employs a hyper-parameter-free learning rule to adjust the learning rate and a warmup mechanism to address the early optimization problem. | Trains billions of images per day.
Training parallelism | Zhang et al. [437] | Asynchronous distributed SGD. | Combines the stochastic variance reduced gradient algorithm and a delayed proximal gradient algorithm. | Up to 6× speed up.
Training parallelism | Hardy et al. [438] | Distributed deep learning on edge-devices. | Compression technique (AdaComp) to reduce ingress traffic at parameter servers. | Up to 191× reduction in ingress traffic.
Training parallelism | McMahan et al. [439] | Distributed training on mobile devices. | Users collectively enjoy benefits of shared models trained with big data without centralized storage. | Up to 64.3× training speedup.
Training parallelism | Keith et al. [440] | Data computation over mobile devices. | Secure multi-party computation to obtain model parameters on distributed mobile devices. | Up to 1.98× communication expansion.
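
To make the federated training entry of Table XX ([439]) more tangible, the following minimal sketch mimics the averaging idea on a one-parameter model y = w * x. It is a simplification under stated assumptions, not the authors' implementation: every client takes part in every round, the client data in CLIENTS are invented, and the helper names (local_update, federated_round, train) are hypothetical. Only the parameter w is exchanged; the raw samples stay on the "devices".

```python
def local_update(w, data, lr=0.05, epochs=5):
    """Run a few epochs of SGD on one device for the model y = w * x."""
    for _ in range(epochs):
        for x, y in data:
            grad = 2.0 * (w * x - y) * x  # derivative of (w*x - y)^2 w.r.t. w
            w -= lr * grad
    return w


def federated_round(w_global, clients):
    """One round: each client trains locally, the server averages by data size."""
    total = sum(len(data) for data in clients)
    updates = [(local_update(w_global, data), len(data)) for data in clients]
    return sum(w * n for w, n in updates) / total


def train(clients, rounds=20):
    """Repeat federated rounds starting from w = 0."""
    w = 0.0
    for _ in range(rounds):
        w = federated_round(w, clients)
    return w


# Hypothetical devices whose samples all follow y = 2x.
CLIENTS = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(0.5, 1.0), (1.5, 3.0), (3.0, 6.0)],
    [(2.5, 5.0)],
]

if __name__ == "__main__":
    print(train(CLIENTS))  # approaches 2.0, the slope shared by all clients
```

Weighting each client by its number of samples keeps devices with little data from dominating the shared model. A secure aggregation step such as the one in [440] would sit between the clients and the averaging and is deliberately left out here.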

platforms. Further, the deep architecture decomposition technique designs “decomposition plans” that seek to optimally allocate data and model operations to local and remote processors. By combining these two, DeepX enables maximizing energy and runtime efficiency, under given computation and memory constraints.

Beyond these works, researchers also successfully adapt deep learning architectures through other designs and sophisticated optimizations, such as parameters quantization [426], [431], sparsification and separation [427], representation and memory sharing [96], [432], convolution operation optimization [428], pruning [429], and cloud assistance [430]. These techniques will be of great significance when embedding deep neural networks into mobile systems.

B. Tailoring Deep Learning to Distributed Data Containers

Mobile systems generate and consume massive volumes of mobile data every day. This may involve similar content that is nonetheless distributed around the world. Moving such data to centralized servers to perform model training and evaluation inevitably introduces communication and storage overheads, which does not scale. However, neglecting characteristics embedded in mobile data, which are associated with local culture, human mobility, geographical topology, etc., during model training can compromise the robustness of the model and implicitly the performance of the mobile network applications that build on such models. The solution is to offload model execution to distributed data centers or edge devices, to guarantee good performance, whilst alleviating the burden on the cloud.

As such, one of the challenges facing parallelism, in the context of mobile networking, is that of training neural networks on a large number of mobile devices that are battery powered, have limited computational capabilities and in particular lack GPUs. The key goal of this paradigm is that of training with a large number of mobile CPUs at least as effectively as with GPUs. The speed of training remains important, but becomes a secondary goal.

Generally, there are two routes to addressing this problem, namely, (i) decomposing the model itself, to train (or make inference with) its components individually; or (ii) scaling the training process to perform model updates at different locations associated with data containers. Both schemes allow one to train a single model without requiring all data to be centralized. We illustrate the principles of these two approaches in Fig. 10 and summarize the existing work in Table XX.

Model Parallelism. Large-scale distributed deep learning is first studied in [112], where the authors develop a framework named DistBelief, which enables training complex neural networks on thousands of machines. In their framework, the full model is partitioned into smaller components and distributed over various machines. Only nodes with edges (e.g. connections between layers) that cross boundaries between machines are required to communicate for parameter updates and inference. This system further involves a parameter server, which enables each model replica to obtain the latest parameters during training. Experiments demonstrate that the proposed framework can train significantly faster on a CPU cluster, compared to training on a single GPU, while achieving state-of-the-art classification performance on ImageNet [171].

Teerapittayanon et al. propose deep neural networks tailored to distributed systems, which include cloud servers, fog layers and geographically distributed devices [433]. The authors scale the overall neural network architecture and distribute its components hierarchically from cloud to end devices. The model exploits local aggregators and binary weights, to reduce computational, storage, and communication overheads, while maintaining decent accuracy. Experiments on a multi-view multi-camera dataset demonstrate that this proposal can perform efficient cloud-based training and local inference. Importantly, without violating latency constraints, the deep neural network obtains essential benefits associated with distributed systems, such as fault tolerance and privacy.

De Coninck et al. consider distributing deep learning over IoT for classification applications [99]. Specifically, they deploy a small neural network to local devices, to perform coarse classification, which enables a fast response, with filtered data sent to central servers. If the local model fails to classify, the larger neural network in the cloud is activated to perform fine-grained classification. The overall architecture maintains good accuracy, while significantly reducing the latency typically introduced by large model inference.

Decentralized methods can also be applied to deep reinforcement learning. In [434], Omidshafiei et al. consider a multi-agent system with partial observability and limited communication, which is common in mobile systems. They combine a set of sophisticated methods and algorithms, including hysteresis learners, a deep recurrent Q network, concurrent experience replay trajectories and distillation, to enable multi-agent coordination using a single joint policy under a set of decentralized partially observable MDPs. Their framework can potentially play an important role in addressing control problems in distributed mobile systems.

Training Parallelism is also essential for mobile systems, as mobile data usually come asynchronously from different sources. However, training models effectively while maintaining consistency, fast convergence, and accuracy remains challenging [441].

A practical method to address this problem is to perform asynchronous SGD. The basic idea is to enable the server that maintains a model to accept delayed information (e.g. data, gradient updates) from workers. At each update iteration, the server only needs to wait for a smaller number of workers. This is essential for training a deep neural network over distributed machines in mobile systems. Asynchronous SGD is first studied in [435], where the authors propose a lock-free parallel SGD named HOGWILD, which demonstrates significantly faster convergence than locking counterparts. The Downpour SGD in [112] improves the robustness of the training process when worker nodes break down, as each model replica requests the latest version of the parameters. Hence a small number of machine failures will not have a significant impact on the training process. A similar idea has been employed in [436], where Goyal et al. investigate the usage of a set of techniques (i.e. learning rate adjustment, warm-up, batch normalization), which offer important insights into training large-scale deep neural networks on distributed systems. Eventually, their framework can train a network on ImageNet within 1 hour, which is impressive in comparison with traditional algorithms.

Zhang et al. argue that most asynchronous SGD algorithms suffer from slow convergence, due to the inherent variance of stochastic gradients [437]. They propose an improved SGD with variance reduction to speed up the convergence. Their algorithm outperforms other asynchronous SGD approaches in terms of convergence, when training deep neural networks on the Google Cloud Computing Platform. The asynchronous method has also been applied to deep reinforcement learning. In [77], the authors create multiple environments, which allows agents to perform asynchronous updates to the main structure. The new A3C algorithm breaks the sequential dependency and speeds up the training of the traditional Actor-Critic algorithm significantly. In [438], Hardy et al. further study distributed deep learning over cloud and edge devices. In particular, they propose a training algorithm, AdaComp, which compresses worker updates of the target model. This significantly reduces the communication overhead between cloud and edge, while retaining good fault tolerance.

Federated learning is an emerging parallelism approach that enables mobile devices to collaboratively learn a shared model, while retaining all training data on individual devices [439], [442]. Beyond offloading the training data from central servers, this approach performs model updates with a Secure Aggregation protocol [440], which decrypts the average updates only if

[Figure 10 panels: Model Parallelism; Training Parallelism. Labels: Machine/Device 1 to 4, Asynchronous SGD, Data Collection, Time.]

Fig. 10: The underlying principles of model parallelism (left) and training parallelism (right).
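
The training-parallelism panel of Fig. 10 can be mimicked with a toy, single-process simulation of asynchronous SGD, in which the server applies every gradient as soon as it arrives, even though it was computed on parameters that are a few updates old. This is a minimal sketch under stated assumptions rather than any of the surveyed systems: the round-robin arrival order, the fixed staleness, the data in SHARDS and all function names are invented for illustration, and the model is the one-parameter regression y = w * x.

```python
from collections import deque


def shard_gradient(w, shard):
    """Mean gradient of the squared error (w*x - y)^2 over one worker's shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)


def async_sgd(shards, steps=300, lr=0.02, staleness=2):
    """Apply each gradient on arrival, although it was computed on parameters
    that are up to `staleness` server updates old."""
    w = 0.0
    # history[0] is the oldest retained parameter value, history[-1] the newest.
    history = deque([w] * (staleness + 1), maxlen=staleness + 1)
    for t in range(steps):
        worker = t % len(shards)   # round-robin arrival order (assumption)
        stale_w = history[0]       # parameters this worker read earlier
        w -= lr * shard_gradient(stale_w, shards[worker])
        history.append(w)          # the oldest entry drops off automatically
    return w


# Hypothetical shards held by three workers; all samples follow y = 2x.
SHARDS = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(0.5, 1.0), (1.5, 3.0)],
    [(1.0, 2.0), (1.5, 3.0)],
]

if __name__ == "__main__":
    print(async_sgd(SHARDS))  # approaches 2.0 despite the stale gradients
```

With a small learning rate the delayed updates still converge; larger learning rates or longer delays make the iteration oscillate or diverge, which is one reason staleness-aware variants exist.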

enough users have participated, without inspecting individual updates.

C. Tailoring Deep Learning to Changing Mobile Network Environments

Mobile network environments often exhibit changing patterns over time. For instance, the spatial distributions of mobile data traffic over a region may vary significantly between different times of the day [443]. Applying a deep learning model in changing mobile environments requires lifelong learning ability to continuously absorb new features, without forgetting old but essential patterns. Moreover, new smartphone-targeted viruses are spreading fast via mobile networks and may severely jeopardize users’ privacy and business profits. These pose unprecedented challenges to current anomaly detection systems and anti-virus software, as such tools must react to new threats in a timely manner, using limited information. To this end, the model should have transfer learning ability, which can enable the fast transfer of knowledge from pre-trained models to different jobs or datasets. This will allow models to work well with limited threat samples (one-shot learning) or limited metadata descriptions of new threats (zero-shot learning). Therefore, both lifelong learning and transfer learning are essential for applications in ever changing mobile network environments. We illustrate these two learning paradigms in Fig. 11 and review essential research in this subsection.

Deep Lifelong Learning mimics human behaviors and seeks to build a machine that can continuously adapt to new environments and retain as much knowledge as possible from previous learning experience [444]. There exist several research efforts that adapt traditional deep learning to lifelong learning. For example, Lee et al. propose a dual-memory deep learning architecture for lifelong learning of everyday human behaviors over non-stationary data streams [445]. To enable the pre-trained model to retain old knowledge while training with new data, their architecture includes two memory buffers, namely a deep memory and a fast memory. The deep memory is composed of several deep networks, which are built when the amount of data accumulated from an unseen distribution reaches a threshold. The fast memory component is a small neural network, which is updated immediately when coming across a new data sample. These two memory modules enable continuous learning without forgetting old knowledge. Experiments on a non-stationary image data stream demonstrate the effectiveness of this model, as it significantly outperforms other online deep learning algorithms. The memory mechanism has also been applied in [446]. In particular, the authors introduce a differentiable neural computer, which allows neural networks to dynamically read from and write to an external memory module. This enables lifelong lookup and forgetting of knowledge from external sources, as humans do.

Parisi et al. consider a different lifelong learning scenario in [447]. They abandon the memory modules in [445] and design a self-organizing architecture with recurrent neurons for processing time-varying patterns. A variant of the Growing When Required network is employed in each layer, to predict neural activation sequences from the previous network layer. This allows learning time-varying correlations between inputs and labels, without requiring a predefined number of classes. Importantly, the framework is robust, as it tolerates missing and corrupted sample labels, which are common in mobile data.

Another interesting deep lifelong learning architecture is presented in [448], where Tessler et al. build a DQN agent that can retain learned skills in playing the famous computer game Minecraft. The overall framework includes a pre-trained model, Deep Skill Network, which is trained a priori on various sub-tasks of the game. When learning a new task, the old knowledge is maintained by incorporating reusable skills through a Deep Skill module, which consists of a Deep Skill Network array and a multi-skill distillation network. These allow the agent to selectively transfer knowledge to

[Figure 11 panels: Deep Lifelong Learning; Deep Transfer Learning. Labels: Learning Task 1 to n, Knowledge 1 to n, Knowledge Base, Deep Learning Model, Model 1 to n, Knowledge Transfer, Support, Time.]

Fig. 11: The underlying principles of deep lifelong learning (left) and deep transfer learning (right). Lifelong learning retains the knowledge learned while transfer learning exploits labeled data of one domain to learn in a new target domain.
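
The transfer learning panel of Fig. 11 can be illustrated with a deliberately tiny example: a model is first trained on a data-rich source task, then most of it is frozen and only a small part is re-fitted on a handful of target samples. The sketch below is one possible reading under stated assumptions and is not drawn from any of the cited works; the linear model y = w * x + b, the SOURCE and TARGET data, and the function names are all hypothetical.

```python
def fit(data, w, b, lr=0.05, epochs=500, train_w=True, train_b=True):
    """Full-batch gradient descent for y = w * x + b on a list of (x, y) pairs."""
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2.0 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2.0 * (w * x + b - y) for x, y in data) / n
        if train_w:
            w -= lr * grad_w
        if train_b:
            b -= lr * grad_b
    return w, b


# Source task with plenty of data: y = 2x + 1 (hypothetical).
SOURCE = [(k / 10.0, 2.0 * (k / 10.0) + 1.0) for k in range(-20, 21)]
# Target task with only two samples: same slope, shifted offset (y = 2x + 3).
TARGET = [(0.5, 4.0), (1.0, 5.0)]


def pretrain_and_transfer():
    """Pre-train on SOURCE, then freeze the slope and re-fit only the offset."""
    w_src, b_src = fit(SOURCE, w=0.0, b=0.0)
    w_tgt, b_tgt = fit(TARGET, w=w_src, b=b_src, train_w=False)
    return (w_src, b_src), (w_tgt, b_tgt)


if __name__ == "__main__":
    print(pretrain_and_transfer())
```

Freezing the slope plays the role of reusing pre-trained knowledge, so the two target samples only have to determine a single remaining parameter. In a deep network the frozen part would typically be the early layers and the re-fitted part the final layers.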

solve a new task. Experiments demonstrate that their proposal significantly outperforms traditional double DQNs in terms of accuracy and convergence. This technique has potential to be employed in solving mobile networking problems, as it can continuously acquire new knowledge.

Deep Transfer Learning: Unlike lifelong learning, transfer learning only seeks to use knowledge from a specific domain to aid learning in a target domain. Applying transfer learning can accelerate the new learning process, as the new task does not need to be learned from scratch. This is essential to mobile network environments, as they must respond agilely to new network patterns and threats. A number of important applications emerge in the computer network domain [57], such as Web mining [449], caching [450] and base station sleep strategies [184].

There exist two extreme transfer learning paradigms, namely one-shot learning and zero-shot learning. One-shot learning refers to a learning method that gains as much information as possible about a category from only one or a handful of samples, given a pre-trained model [451]. On the other hand, zero-shot learning does not require any sample from a category [452]. It aims at learning a new distribution given a meta description of the new category and correlations with existing training data. Though research towards deep one-shot learning [94], [453] and deep zero-shot learning [454], [455] is in its infancy, both paradigms are very promising in detecting new threats or traffic patterns in mobile networks.

VIII. FUTURE RESEARCH PERSPECTIVES

As deep learning is achieving increasingly promising results in the mobile networking domain, several important research issues remain to be addressed in the future. We conclude our survey by discussing these challenges and pinpointing key mobile networking research problems that could be tackled with novel deep learning tools.

A. Serving Deep Learning with Massive High-Quality Data

Deep neural networks rely on massive and high-quality data to achieve good performance. When training a large and complex architecture, data volume and quality are very important, as deeper models usually have a huge set of parameters to be learned and configured. This issue remains true in mobile network applications. Unfortunately, unlike in other research areas such as computer vision and NLP, high-quality and large-scale labeled datasets are still lacking for mobile network applications, because service providers and operators keep the data collected confidential and are reluctant to release datasets. While this makes sense from a user privacy standpoint, to some extent it restricts the development of deep learning mechanisms for problems in the mobile networking domain. Moreover, mobile data collected by sensors and network equipment are frequently subject to loss, redundancy, mislabeling and class imbalance, and thus cannot be directly employed for training purposes.

To build an intelligent 5G mobile network architecture, efficient and mature streamlining platforms for mobile data processing are in demand. This requires a considerable amount of research effort for data collection, transmission, cleaning, clustering, transformation, and anonymization. Deep learning applications in the mobile network area can only advance if researchers and industry stakeholders release more datasets, with a view to benefiting a wide range of communities.

B. Deep Learning for Spatio-Temporal Mobile Data Mining

Accurate analysis of mobile traffic data over a geographical region is becoming increasingly essential for event localization, network resource allocation, context-based advertising and urban planning [443]. However, due to the mobility of smartphone users, the spatio-temporal distribution of mobile traffic [456] and application popularity [457] are difficult to understand (see the example city-scale traffic snapshot in Fig. 12). Recent research suggests that data collected by mobile sensors (e.g. mobile traffic) over a city can be regarded as pictures taken by panoramic cameras, which provide a

city-scale sensing system for urban surveillance [459]. These traffic sensing images enclose information associated with the movements of individuals [374].

[Figure 12 panels: 3D Mobile Traffic Surface; 2D Mobile Traffic Snapshot.]

Fig. 12: Example of a 3D mobile traffic surface (left) and 2D projection (right) in Milan, Italy. Figures adapted from [193] using data from [458].

From both spatial and temporal perspectives, we recognize that mobile traffic data have important similarities with videos or speech, which is an analogy made recently also in [193] and exemplified in Fig. 13. Specifically, both videos and the large-scale evolution of mobile traffic are composed of sequences of “frames”. Moreover, if we zoom into a small coverage area to measure long-term traffic consumption, we can observe that a single traffic consumption series looks similar to a natural language sequence. These observations suggest that, to some extent, well-established tools for computer vision (e.g. CNN) or NLP (e.g. RNN, LSTM) are promising candidates for mobile traffic analysis.

[Figure 13 panels: Mobile Traffic Evolutions (frames at t and t+s, Latitude vs. Longitude) alongside Video; Mobile Traffic Snapshot (Latitude vs. Longitude) alongside Image; Mobile Traffic Series (Traffic volume vs. Time, obtained by Zooming) alongside Speech Signal (Amplitude vs. Frequency).]

Fig. 13: Analogies between mobile traffic data consumption in a city (left) and other types of data (right).

Beyond these similarities, we observe several properties of mobile traffic that make it unique in comparison with images or language sequences. Namely,
1) The values of neighboring ‘pixels’ in fine-grained traffic snapshots are not significantly different in general, while this happens quite often at the edges of natural images.
2) Single mobile traffic series usually exhibit some periodicity (both daily and weekly), yet this is not a feature seen among video pixels.
3) Due to user mobility, traffic consumption is more likely to stay or shift to neighboring cells in the near future, which is less likely to be seen in videos.

Such spatio-temporal correlations in mobile traffic can be exploited as prior knowledge for model design. We recognize several unique advantages of employing deep learning for mobile traffic data mining:
1) CNN structures work well in imaging applications, thus can also serve mobile traffic analysis tasks, given the analogies mentioned before.
2) LSTMs capture well temporal correlations in time series data such as natural language; hence this structure can also be adapted to traffic forecasting problems.
3) GPU computing enables fast training of NNs and together with parallelization techniques can support low-latency mobile traffic analysis via deep learning tools.

In essence, we expect deep learning tools tailored to mobile networking will overcome the limitations of traditional regression and interpolation tools such as Exponential Smoothing [460], Autoregressive Integrated Moving Average model [461], or uniform interpolation, which are commonly used in operational networks.

C. Deep Unsupervised Learning in Mobile Networks

We observe that current deep learning practices in mobile networks largely employ supervised learning and reinforcement learning. However, mobile networks generate considerable amounts of unlabeled data every day, and data labeling is costly and requires domain-specific knowledge. To facilitate the analysis of raw mobile network data, unsupervised learning becomes essential in extracting insights from unlabeled data [462], so as to optimize the mobile network functionality to improve QoE.

The potential of a range of unsupervised deep learning tools including AE, RBM and GAN remains to be further explored. In general, these models require light feature engineering and are thus promising for learning from heterogeneous and unstructured mobile data. For instance, deep AEs work well for unsupervised anomaly detection [463]. Though less popular, RBMs can perform layer-wise unsupervised pre-training, which can accelerate the overall model training process. GANs are good at imitating data distributions, thus could be employed to mimic real mobile network environments. Recent research reveals that GANs can even protect communications by crafting custom cryptography to avoid eavesdropping [464]. All these tools require further research to fulfill their full potential in the mobile networking domain.

D. Deep Reinforcement Learning for Mobile Network Control

Many mobile network control problems have been solved by constrained optimization, dynamic programming and game theory approaches. Unfortunately, these methods either make strong assumptions about the objective functions (e.g. function

convexity) or data distribution (e.g. Gaussian or Poisson dis- [2] Ning Wang, Ekram Hossain, and Vijay K Bhargava. Backhauling 5G
tributed), or suffer from high time and space complexity. As small cells: A radio resource management perspective. IEEE Wireless
Communications, 22(5):41–49, 2015.
mobile networks become increasingly complex, such assump- [3] Fabio Giust, Luca Cominardi, and Carlos J Bernardos. Distributed
tions sometimes turn unrealistic. The objective functions are mobility management for future 5G networks: overview and analysis
further affected by their increasingly large sets of variables, of existing approaches. IEEE Communications Magazine, 53(1):142–
149, 2015.
that pose severe computational and memory challenges to [4] Mamta Agiwal, Abhishek Roy, and Navrati Saxena. Next generation
existing mathematical approaches. 5G wireless networks: A comprehensive survey. IEEE Communications
In contrast, deep reinforcement learning does not make Surveys & Tutorials, 18(3):1617–1655, 2016.
[5] Akhil Gupta and Rakesh Kumar Jha. A survey of 5G network:
strong assumptions about the target system. It employs func- Architecture and emerging technologies. IEEE access, 3:1206–1232,
tion approximation, which explicitly addresses the problem 2015.
of large state-action spaces, enabling reinforcement learning [6] Kan Zheng, Zhe Yang, Kuan Zhang, Periklis Chatzimisios, Kan Yang,
and Wei Xiang. Big data-driven optimization for mobile networks
to scale to network control problems that were previously toward 5G. IEEE network, 30(1):44–51, 2016.
considered hard. Inspired by remarkable achievements in [7] Chunxiao Jiang, Haijun Zhang, Yong Ren, Zhu Han, Kwang-Cheng
Atari [19] and Go [465] games, a number of researchers begin Chen, and Lajos Hanzo. Machine learning paradigms for next-
generation wireless networks. IEEE Wireless Communications,
to explore DRL to solve complex network control problems, 24(2):98–105, 2017.
as we discussed in Sec. VI-F. However, these works only [8] Duong D Nguyen, Hung X Nguyen, and Langford B White. Rein-
scratch the surface and the potential of DRL to tackle mobile forcement learning with network-assisted feedback for heterogeneous
network control problems remains largely unexplored. For rat selection. IEEE Transactions on Wireless Communications, 2017.
[9] Fairuz Amalina Narudin, Ali Feizollah, Nor Badrul Anuar, and Ab-
instance, as DeepMind trains a DRL agent to reduce Google’s dullah Gani. Evaluation of machine learning classifiers for mobile
data centers cooling bill,21 DRL could be exploited to extract malware detection. Soft Computing, 20(1):343–357, 2016.
rich features from cellular networks and enable intelligent [10] Kevin Hsieh, Aaron Harlap, Nandita Vijaykumar, Dimitris Konomis,
Gregory R Ganger, Phillip B Gibbons, and Onur Mutlu. Gaia: Geo-
on/off base stations switching, to reduce the infrastructure’s distributed machine learning approaching LAN speeds. In USENIX
energy footprint. Such exciting applications make us believe Symposium on Networked Systems Design and Implementation (NSDI),
that advances in DRL that are yet to appear can revolutionize pages 629–647, 2017.
[11] Wencong Xiao, Jilong Xue, Youshan Miao, Zhen Li, Cheng Chen,
the autonomous control of future mobile networks. Ming Wu, Wei Li, and Lidong Zhou. Tux2: Distributed graph com-
putation for machine learning. In USENIX Symposium on Networked
Systems Design and Implementation (NSDI), pages 669–682, 2017.

E. Summary

Deep learning is playing an increasingly important role in
the mobile and wireless networking domain. In this paper, we
provided a comprehensive survey of recent work that lies at
the intersection between deep learning and mobile networking.
We summarized both basic concepts and advanced principles
of various deep learning models, then reviewed work specific
to mobile networks across different application scenarios. We
discussed how to tailor deep learning models to general mobile
networking applications, an aspect overlooked by previous
surveys. We concluded by pinpointing several open research
issues and promising directions, which may lead to valuable
future research results. Our hope is that this article will become
a definitive guide for researchers and practitioners interested in
applying machine intelligence to complex problems in mobile
network environments.

ACKNOWLEDGEMENT

We would like to thank Zongzuo Wang for sharing valuable
insights on deep learning, which helped improve the quality
of this paper. We also thank the anonymous reviewers, whose
detailed and thoughtful feedback helped us give this survey
more depth and a broader scope.

21 DeepMind AI Reduces Google Data Center Cooling Bill by 40%,
https://deepmind.com/blog/deepmind-ai-reduces-google-data-centre-cooling-bill-40/
Computing: Adjunct, pages 185–188, 2016. experience estimation in mobile communications. IEEE MultiMedia,
[168] Valentin Radu, Catherine Tong, Sourav Bhattacharya, Nicholas D Lane, 23(4):42–49, 2016.
Cecilia Mascolo, Mahesh K Marina, and Fahim Kawsar. Multimodal [188] Youngjune Gwon and HT Kung. Inferring origin flow patterns in Wi-Fi
deep learning for activity and context recognition. Proceedings of the with deep learning. In ICAC, pages 73–83, 2014.
ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, [189] Laisen Nie, Dingde Jiang, Shui Yu, and Houbing Song. Network traffic
1(4):157, 2018. prediction based on deep belief network in wireless mesh backbone
[169] Ramachandra Raghavendra and Christoph Busch. Learning deeply cou- networks. In Wireless Communications and Networking Conference
pled autoencoders for smartphone based robust periocular verification. (WCNC), IEEE, pages 1–5, 2017.
In Image Processing (ICIP), IEEE International Conference on, pages [190] Vusumuzi Moyo et al. The generalization ability of artificial neural
325–329, 2016. networks in forecasting TCP/IP traffic trends: How much does the size
[170] Jing Li, Jingyuan Wang, and Zhang Xiong. Wavelet-based stacked of learning rate matter? International Journal of Computer Science
denoising autoencoders for cell phone base station user number predic- and Application, 2015.
tion. In IEEE International Conference on Internet of Things (iThings) [191] Jing Wang, Jian Tang, Zhiyuan Xu, Yanzhi Wang, Guoliang Xue, Xing
and IEEE Green Computing and Communications (GreenCom) and Zhang, and Dejun Yang. Spatiotemporal modeling and prediction in
IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE cellular networks: A big data enabled deep learning approach. In
Smart Data (SmartData), pages 833–838, 2016. INFOCOM–36th Annual IEEE International Conference on Computer
[171] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Communications, 2017.
Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, [192] Chaoyun Zhang and Paul Patras. Long-term mobile traffic forecasting
Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. ImageNet Large using deep spatio-temporal neural networks. In Proceedings of the
Scale Visual Recognition Challenge. International Journal of Computer Eighteenth ACM International Symposium on Mobile Ad Hoc Network-
Vision (IJCV), 115(3):211–252, 2015. ing and Computing, pages 231–240. ACM, 2018.
[172] Junmo Kim Yunho Jeon. Active convolution: Learning the shape [193] Chaoyun Zhang, Xi Ouyang, and Paul Patras. ZipNet-GAN: Inferring
of convolution for image classification. In Proceedings of the IEEE fine-grained mobile traffic patterns via a generative adversarial neural
Conference on Computer Vision and Pattern Recognition, 2017. network. In Proceedings of the 13th ACM Conference on Emerging
[173] Jifeng Dai, Haozhi Qi, Yuwen Xiong, Yi Li, Guodong Zhang, Han Hu, Networking Experiments and Technologies. ACM, 2017.
and Yichen Wei. Deformable convolutional networks. In Proceedings [194] Huang, Chih-Wei and Chiang, Chiu-Ti and Li, Qiuhui. A study of
of the IEEE International Conference on Computer Vision, pages 764– deep learning networks on mobile traffic forecasting. In Personal,
773, 2017. Indoor, and Mobile Radio Communications (PIMRC), 28th Annual
[174] Yoshua Bengio, Patrice Simard, and Paolo Frasconi. Learning long- International Symposium on, pages 1–6. IEEE, 2017.
term dependencies with gradient descent is difficult. IEEE transactions [195] Zhang, Chuanting and Zhang, Haixia and Yuan, Dongfeng and Zhang,
on neural networks, 5(2):157–166, 1994. Minggao. Citywide cellular traffic prediction based on densely con-
[175] Alex Graves, Navdeep Jaitly, and Abdel-rahman Mohamed. Hybrid nected convolutional neural networks. IEEE Communications Letters,
speech recognition with deep bidirectional LSTM. In Automatic Speech 2018.
Recognition and Understanding (ASRU), IEEE Workshop on, pages [196] Navabi, Shiva and Wang, Chenwei and Bursalioglu, Ozgun Y and
273–278, 2013. Papadopoulos, Haralabos. Predicting wireless channel features using
[176] Rie Johnson and Tong Zhang. Supervised and semi-supervised text neural networks. arXiv preprint arXiv:1802.00107, 2018.
categorization using LSTM for region embeddings. In International [197] Zhanyi Wang. The applications of deep learning on traffic identifica-
Conference on Machine Learning, pages 526–534, 2016. tion. BlackHat USA, 2015.
[198] Wei Wang, Ming Zhu, Jinlin Wang, Xuewen Zeng, and Zhongzhen Yang. End-to-end encrypted traffic classification with one-dimensional convolution neural networks. In Intelligence and Security Informatics (ISI), IEEE International Conference on, pages 43–48, 2017.
[199] Mohammad Lotfollahi, Ramin Shirali, Mahdi Jafari Siavoshani, and Mohammdsadegh Saberian. Deep packet: A novel approach for encrypted traffic classification using deep learning. arXiv preprint arXiv:1709.02656, 2017.
[200] Wei Wang, Ming Zhu, Xuewen Zeng, Xiaozhou Ye, and Yiqiang Sheng. Malware traffic classification using convolutional neural network for representation learning. In Information Networking (ICOIN), International Conference on, pages 712–717. IEEE, 2017.
[201] Victor C Liang, Richard TB Ma, Wee Siong Ng, Li Wang, Marianne Winslett, Huayu Wu, Shanshan Ying, and Zhenjie Zhang. Mercury: Metro density prediction with recurrent neural network on streaming CDR data. In Data Engineering (ICDE), IEEE 32nd International Conference on, pages 1374–1377, 2016.
[202] Bjarke Felbo, Pål Sundsøy, Alex 'Sandy' Pentland, Sune Lehmann, and Yves-Alexandre de Montjoye. Using deep learning to predict demographics from mobile phone metadata. International Conference on Learning Representations (ICLR), workshop track, 2016.
[203] Nai Chun Chen, Wanqin Xie, Roy E Welsch, Kent Larson, and Jenny Xie. Comprehensive predictions of tourists’ next visit location based on call detail records using machine learning and deep learning methods. In Big Data (BigData Congress), IEEE International Congress on, pages 1–6, 2017.
[204] Ziheng Lin, Mogeng Yin, Sidney Feygin, Madeleine Sheehan, Jean-Francois Paiement, and Alexei Pozdnoukhov. Deep generative models of urban mobility. IEEE Transactions on Intelligent Transportation Systems, 2017.
[205] Chang Xu, Kuiyu Chang, Khee-Chin Chua, Meishan Hu, and Zhenxiang Gao. Large-scale Wi-Fi hotspot classification via deep learning. In Proceedings of the 26th International Conference on World Wide Web Companion, pages 857–858. International World Wide Web Conferences Steering Committee, 2017.
[206] Qianyu Meng, Kun Wang, Bo Liu, Toshiaki Miyazaki, and Xiaoming He. QoE-based big data analysis with deep learning in pervasive edge environment. In International Conference on Communications (ICC), pages 1–6. IEEE, 2018.
[207] Sicong Liu and Junzhao Du. Poster: Mobiear-building an environment-independent acoustic sensing platform for the deaf using deep learning. In Proceedings of the 14th Annual International Conference on Mobile Systems, Applications, and Services Companion, pages 50–50. ACM, 2016.
[208] Liu Sicong, Zhou Zimu, Du Junzhao, Shangguan Longfei, Jun Han, and Xin Wang. Ubiear: Bringing location-independent sound awareness to the hard-of-hearing people with smartphones. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 1(2):17, 2017.
[209] Vasu Jindal. Integrating mobile and cloud for PPG signal selection to monitor heart rate during intensive physical exercise. In Proceedings of the International Workshop on Mobile Software Engineering and Systems, pages 36–37. ACM, 2016.
[210] Edward Kim, Miguel Corte-Real, and Zubair Baloch. A deep semantic mobile application for thyroid cytopathology. In Proc. SPIE, volume 9789, page 97890A, 2016.
[211] Aarti Sathyanarayana, Shafiq Joty, Luis Fernandez-Luque, Ferda Ofli, Jaideep Srivastava, Ahmed Elmagarmid, Teresa Arora, and Shahrad Taheri. Sleep quality prediction from wearable data using deep learning. JMIR mHealth and uHealth, 4(4), 2016.
[212] Honggui Li and Maria Trocan. Personal health indicators by deep learning of smart phone sensor data. In Cybernetics (CYBCONF), 2017 3rd IEEE International Conference on, pages 1–5, 2017.
[213] Mohammad-Parsa Hosseini, Tuyen X Tran, Dario Pompili, Kost Elisevich, and Hamid Soltanian-Zadeh. Deep learning with edge computing for localization of epileptogenicity using multimodal rs-fMRI and EEG big data. In Autonomic Computing (ICAC), IEEE International Conference on, pages 83–92, 2017.
[214] Cosmin Stamate, George D Magoulas, Stefan Küppers, Effrosyni Nomikou, Ioannis Daskalopoulos, Marco U Luchini, Theano Moussouri, and George Roussos. Deep learning parkinson’s from smartphone data. In Pervasive Computing and Communications (PerCom), IEEE International Conference on, pages 31–40, 2017.
[215] Tom Quisel, Luca Foschini, Alessio Signorini, and David C Kale. Collecting and analyzing millions of mhealth data streams. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1971–1980. ACM, 2017.
[216] Usman Mahmood Khan, Zain Kabir, Syed Ali Hassan, and Syed Hassan Ahmed. A deep learning framework using passive WiFi sensing for respiration monitoring. In Global Communications Conference (GLOBECOM), pages 1–6. IEEE, 2017.
[217] Dawei Li, Theodoros Salonidis, Nirmit V Desai, and Mooi Choo Chuah. Deepcham: Collaborative edge-mediated adaptive deep learning for mobile object recognition. In Edge Computing (SEC), IEEE/ACM Symposium on, pages 64–76, 2016.
[218] Luis Tobías, Aurélien Ducournau, François Rousseau, Grégoire Mercier, and Ronan Fablet. Convolutional neural networks for object recognition on mobile devices: A case study. In Pattern Recognition (ICPR), 23rd International Conference on, pages 3530–3535. IEEE, 2016.
[219] Parisa Pouladzadeh and Shervin Shirmohammadi. Mobile multi-food recognition using deep learning. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), 13(3s):36, 2017.
[220] Ryosuke Tanno, Koichi Okamoto, and Keiji Yanai. DeepFoodCam: A DCNN-based real-time mobile food recognition system. In Proceedings of the 2nd International Workshop on Multimedia Assisted Dietary Management, pages 89–89. ACM, 2016.
[221] Pallavi Kuhad, Abdulsalam Yassine, and Shervin Shirmohammadi. Using distance estimation and deep learning to simplify calibration in food calorie measurement. In Computational Intelligence and Virtual Environments for Measurement Systems and Applications (CIVEMSA), IEEE International Conference on, pages 1–6, 2015.
[222] Teng Teng and Xubo Yang. Facial expressions recognition based on convolutional neural networks for mobile virtual reality. In Proceedings of the 15th ACM SIGGRAPH Conference on Virtual-Reality Continuum and Its Applications in Industry-Volume 1, pages 475–478. ACM, 2016.
[223] Jinmeng Rao, Yanjun Qiao, Fu Ren, Junxing Wang, and Qingyun Du. A mobile outdoor augmented reality method combining deep learning object detection and spatial relationships for geovisualization. Sensors, 17(9):1951, 2017.
[224] Ming Zeng, Le T Nguyen, Bo Yu, Ole J Mengshoel, Jiang Zhu, Pang Wu, and Joy Zhang. Convolutional neural networks for human activity recognition using mobile sensors. In Mobile Computing, Applications and Services (MobiCASE), 6th International Conference on, pages 197–205. IEEE, 2014.
[225] Bandar Almaslukh, Jalal AlMuhtadi, and Abdelmonim Artoli. An effective deep autoencoder approach for online smartphone-based human activity recognition. International Journal of Computer Science and Network Security (IJCSNS), 17(4):160, 2017.
[226] Xinyu Li, Yanyi Zhang, Ivan Marsic, Aleksandra Sarcevic, and Randall S Burd. Deep learning for RFID-based activity recognition. In Proceedings of the 14th ACM Conference on Embedded Network Sensor Systems CD-ROM, pages 164–175. ACM, 2016.
[227] Sourav Bhattacharya and Nicholas D Lane. From smart to deep: Robust activity recognition on smartwatches using deep learning. In Pervasive Computing and Communication Workshops (PerCom Workshops), IEEE International Conference on, pages 1–6, 2016.
[228] Antreas Antoniou and Plamen Angelov. A general purpose intelligent surveillance system for mobile devices using deep learning. In Neural Networks (IJCNN), International Joint Conference on, pages 2879–2886. IEEE, 2016.
[229] Saiwen Wang, Jie Song, Jaime Lien, Ivan Poupyrev, and Otmar Hilliges. Interacting with Soli: Exploring fine-grained dynamic gesture recognition in the radio-frequency spectrum. In Proceedings of the 29th Annual Symposium on User Interface Software and Technology, pages 851–860. ACM, 2016.
[230] Yang Gao, Ning Zhang, Honghao Wang, Xiang Ding, Xu Ye, Guanling Chen, and Yu Cao. ihear food: Eating detection using commodity bluetooth headsets. In Connected Health: Applications, Systems and Engineering Technologies (CHASE), IEEE First International Conference on, pages 163–172, 2016.
[231] Jindan Zhu, Amit Pande, Prasant Mohapatra, and Jay J Han. Using deep learning for energy expenditure estimation with wearable sensors. In E-health Networking, Application & Services (HealthCom), 17th International Conference on, pages 501–506. IEEE, 2015.
[232] Pål Sundsøy, Johannes Bjelland, B Reme, A Iqbal, and Eaman Jahani. Deep learning applied to mobile phone data for individual income classification. ICAITA doi, 10, 2016.
[233] Yuqing Chen and Yang Xue. A deep learning approach to human activity recognition based on single accelerometer. In Systems, Man, and Cybernetics (SMC), IEEE International Conference on, pages 1488–1492, 2015.
[234] Sojeong Ha and Seungjin Choi. Convolutional neural networks for human activity recognition using multiple accelerometer and gyroscope sensors. In Neural Networks (IJCNN), International Joint Conference on, pages 381–388. IEEE, 2016.
[235] Marcus Edel and Enrico Köppe. Binarized-BLSTM-RNN based human activity recognition. In Indoor Positioning and Indoor Navigation (IPIN), International Conference on, pages 1–7. IEEE, 2016.
[236] Tsuyoshi Okita and Sozo Inoue. Recognition of multiple overlapping activities using compositional cnn-lstm model. In Proceedings of the ACM International Joint Conference on Pervasive and Ubiquitous Computing and Proceedings of the ACM International Symposium on Wearable Computers, pages 165–168. ACM, 2017.
[237] Gaurav Mittal, Kaushal B Yagnik, Mohit Garg, and Narayanan C Krishnan. Spotgarbage: smartphone app to detect garbage using deep learning. In Proceedings of the 2016 International Joint Conference on Pervasive and Ubiquitous Computing, pages 940–945. ACM, 2016.
[238] Lorenzo Seidenari, Claudio Baecchi, Tiberio Uricchio, Andrea Ferracani, Marco Bertini, and Alberto Del Bimbo. Deep artwork detection and retrieval for automatic context-aware audio guides. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), 13(3s):35, 2017.
[239] Xiao Zeng, Kai Cao, and Mi Zhang. Mobiledeeppill: A small-footprint mobile deep learning system for recognizing unconstrained pill images. In Proceedings of the 15th Annual International Conference on Mobile Systems, Applications, and Services, pages 56–67. ACM, 2017.
[240] Han Zou, Yuxun Zhou, Jianfei Yang, Hao Jiang, Lihua Xie, and Costas J Spanos. Deepsense: Device-free human activity recognition via autoencoder long-term recurrent convolutional network. 2018.
[241] Xiao Zeng. Mobile sensing through deep learning. In Proceedings of the Workshop on MobiSys Ph. D. Forum, pages 5–6. ACM, 2017.
[242] Xuyu Wang, Lingjun Gao, and Shiwen Mao. PhaseFi: Phase fingerprinting for indoor localization with a deep learning approach. In Global Communications Conference (GLOBECOM), IEEE, pages 1–6, 2015.
[243] Xuyu Wang, Lingjun Gao, and Shiwen Mao. CSI phase fingerprinting for indoor localization with a deep learning approach. IEEE Internet of Things Journal, 3(6):1113–1123, 2016.
[244] Chunhai Feng, Sheheryar Arshad, Ruiyun Yu, and Yonghe Liu. Evaluation and improvement of activity detection systems with recurrent neural network. In International Conference on Communications (ICC), pages 1–6. IEEE, 2018.
[245] Bokai Cao, Lei Zheng, Chenwei Zhang, Philip S Yu, Andrea Piscitello, John Zulueta, Olu Ajilore, Kelly Ryan, and Alex D Leow. Deepmood: Modeling mobile phone typing dynamics for mood detection. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 747–755. ACM, 2017.
[246] Xukan Ran, Haoliang Chen, Xiaodan Zhu, Zhenming Liu, and Jiasi Chen. Deepdecision: A mobile deep learning framework for edge video analytics. In INFOCOM. IEEE, 2018.
[247] Siri Team. Deep Learning for Siri’s Voice: On-device Deep Mixture Density Networks for Hybrid Unit Selection Synthesis. https://machinelearning.apple.com/2017/08/06/siri-voices.html, 2017. [Online; accessed 16-Sep-2017].
[248] Ian McGraw, Rohit Prabhavalkar, Raziel Alvarez, Montse Gonzalez Arenas, Kanishka Rao, David Rybach, Ouais Alsharif, Haşim Sak, Alexander Gruenstein, Françoise Beaufays, et al. Personalized speech recognition on mobile devices. In Acoustics, Speech and Signal Processing (ICASSP), IEEE International Conference on, pages 5955–5959, 2016.
[249] Rohit Prabhavalkar, Ouais Alsharif, Antoine Bruguier, and Ian McGraw. On the compression of recurrent neural networks with an application to LVCSR acoustic modeling for embedded speech recognition. In Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on, pages 5970–5974, 2016.
[250] Takuya Yoshioka, Nobutaka Ito, Marc Delcroix, Atsunori Ogawa, Keisuke Kinoshita, Masakiyo Fujimoto, Chengzhu Yu, Wojciech J Fabian, Miquel Espi, Takuya Higuchi, et al. The NTT CHiME-3 system: Advances in speech enhancement and recognition for mobile multi-microphone devices. In IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), pages 436–443, 2015.
[251] Sherry Ruan, Jacob O Wobbrock, Kenny Liou, Andrew Ng, and James Landay. Speech is 3x faster than typing for english and mandarin text entry on mobile devices. arXiv preprint arXiv:1608.07323, 2016.
[252] Andrey Ignatov, Nikolay Kobyshev, Radu Timofte, Kenneth Vanhoey, and Luc Van Gool. DSLR-quality photos on mobile devices with deep convolutional networks. In the IEEE Int. Conf. on Computer Vision (ICCV), 2017.
[253] Zongqing Lu, Noor Felemban, Kevin Chan, and Thomas La Porta. Demo abstract: On-demand information retrieval from videos using deep learning in wireless networks. In Internet-of-Things Design and Implementation (IoTDI), IEEE/ACM Second International Conference on, pages 279–280, 2017.
[254] Jemin Lee, Jinse Kwon, and Hyungshin Kim. Reducing distraction of smartwatch users with deep learning. In Proceedings of the 18th International Conference on Human-Computer Interaction with Mobile Devices and Services Adjunct, pages 948–953. ACM, 2016.
[255] Toan H Vu, Le Dung, and Jia-Ching Wang. Transportation mode detection on mobile devices using recurrent nets. In Proceedings of the ACM on Multimedia Conference, pages 392–396. ACM, 2016.
[256] Shih-Hau Fang, Yu-Xiang Fei, Zhezhuang Xu, and Yu Tsao. Learning transportation modes from smartphone sensors based on deep neural network. IEEE Sensors Journal, 17(18):6111–6118, 2017.
[257] Mingmin Zhao, Yonglong Tian, Hang Zhao, Mohammad Abu Alsheikh, Tianhong Li, Rumen Hristov, Zachary Kabelac, Dina Katabi, and Antonio Torralba. RF-based 3D skeletons. In Proceedings of the Conference of the ACM Special Interest Group on Data Communication (SIGCOMM), pages 267–281. ACM, 2018.
[258] Xi Ouyang, Chaoyun Zhang, Pan Zhou, and Hao Jiang. DeepSpace: An online deep learning framework for mobile big data to understand human mobility patterns. arXiv preprint arXiv:1610.07009, 2016.
[259] Hua Yang, Zhimei Li, and Zhiyong Liu. Neural networks for MANET AODV: an optimization approach. Cluster Computing, pages 1–9, 2017.
[260] Xuan Song, Hiroshi Kanasugi, and Ryosuke Shibasaki. DeepTransport: Prediction and simulation of human mobility and transportation mode at a citywide level. In IJCAI, pages 2618–2624, 2016.
[261] Junbo Zhang, Yu Zheng, and Dekang Qi. Deep spatio-temporal residual networks for citywide crowd flows prediction. In Thirty-First Association for the Advancement of Artificial Intelligence (AAAI) Conference on Artificial Intelligence, 2017.
[262] J Venkata Subramanian and M Abdul Karim Sadiq. Implementation of artificial neural network for mobile movement prediction. Indian Journal of science and Technology, 7(6):858–863, 2014.
[263] Longinus S Ezema and Cosmas I Ani. Artificial neural network approach to mobile location estimation in gsm network. International Journal of Electronics and Telecommunications, 63(1):39–44, 2017.
[264] Wenhua Shao, Haiyong Luo, Fang Zhao, Cong Wang, Antonino Crivello, and Muhammad Zahid Tunio. DePedo: Anti periodic negative-step movement pedometer with deep convolutional neural networks. In International Conference on Communications (ICC), pages 1–6. IEEE, 2018.
[265] Xuyu Wang, Lingjun Gao, Shiwen Mao, and Santosh Pandey. DeepFi: Deep learning for indoor fingerprinting using channel state information. In Wireless Communications and Networking Conference (WCNC), IEEE, pages 1666–1671, 2015.
[266] Xuyu Wang, Xiangyu Wang, and Shiwen Mao. CiFi: Deep convolutional neural networks for indoor localization with 5 GHz Wi-Fi. In 2017 IEEE International Conference on Communications (ICC), pages 1–6.
[267] Xuyu Wang, Lingjun Gao, and Shiwen Mao. BiLoc: Bi-modal deep learning for indoor localization with commodity 5GHz WiFi. IEEE Access, 5:4209–4220, 2017.
[268] Michał Nowicki and Jan Wietrzykowski. Low-effort place recognition with WiFi fingerprints using deep learning. In International Conference Automation, pages 575–584. Springer, 2017.
[269] Xiao Zhang, Jie Wang, Qinghua Gao, Xiaorui Ma, and Hongyu Wang. Device-free wireless localization and activity recognition with deep learning. In Pervasive Computing and Communication Workshops (PerCom Workshops), IEEE International Conference on, pages 1–5, 2016.
[270] Jie Wang, Xiao Zhang, Qinhua Gao, Hao Yue, and Hongyu Wang. Device-free wireless localization and activity recognition: A deep learning approach. IEEE Transactions on Vehicular Technology, 66(7):6258–6267, 2017.
[271] Mehdi Mohammadi, Ala Al-Fuqaha, Mohsen Guizani, and Jun-Seok Oh. Semi-supervised deep reinforcement learning in support of IoT and smart city services. IEEE Internet of Things Journal, 2017.
[272] Anil Kumar Tirumala Ravi Kumar, Bernd Schäufele, Daniel Becker, Oliver Sawade, and Ilja Radusch. Indoor localization of vehicles using deep learning. In IEEE 17th International Symposium on World of Wireless, Mobile and Multimedia Networks (WoWMoM), pages 1–6.
[273] Joao Vieira, Erik Leitinger, Muris Sarajlic, Xuhong Li, and Fredrik Tufvesson. Deep convolutional neural networks for massive MIMO fingerprint-based positioning. In 28th Annual International Symposium on Personal, Indoor and Mobile Radio Communications. IEEE–Institute of Electrical and Electronics Engineers Inc., 2017.
[274] Nafisa Anzum, Syeda Farzia Afroze, and Ashikur Rahman. Zone-based indoor localization using neural networks: A view from a real testbed. In International Conference on Communications (ICC), pages 1–7. IEEE, 2018.
[275] Xuyu Wang, Zhitao Yu, and Shiwen Mao. DeepML: Deep LSTM for indoor localization with smartphone magnetic and light sensors. In International Conference on Communications (ICC), pages 1–6. IEEE, 2018.
[276] Po-Jen Chuang and Yi-Jun Jiang. Effective neural network-based node localisation scheme for wireless sensor networks. IET Wireless Sensor Systems, 4(2):97–103, 2014.
[277] Marcin Bernas and Bartłomiej Płaczek. Fully connected neural networks ensemble with signal strength clustering for indoor localization in wireless sensor networks. International Journal of Distributed Sensor Networks, 11(12):403242, 2015.
[278] Ashish Payal, Chandra Shekhar Rai, and BV Ramana Reddy. Analysis of some feedforward artificial neural network training algorithms for developing localization framework in wireless sensor networks. Wireless Personal Communications, 82(4):2519–2536, 2015.
[279] Yuhan Dong, Zheng Li, Rui Wang, and Kai Zhang. Range-based localization in underwater wireless sensor networks using deep neural network. In IPSN, pages 321–322, 2017.
[280] Xiaofei Yan, Hong Cheng, Yandong Zhao, Wenhua Yu, Huan Huang, and Xiaoliang Zheng. Real-time identification of smoldering and flaming combustion phases in forest using a wireless sensor network-based multi-sensor system and artificial neural network. Sensors, 16(8):1228, 2016.
[281] Baowei Wang, Xiaodu Gu, Li Ma, and Shuangshuang Yan. Temperature error correction based on BP neural network in meteorological wireless sensor network. International Journal of Sensor Networks, 23(4):265–278, 2017.
[282] Ki-Seong Lee, Sun-Ro Lee, Youngmin Kim, and Chan-Gun Lee. Deep learning–based real-time query processing for wireless sensor network. International Journal of Distributed Sensor Networks, 13(5):1550147717707896, 2017.
[283] Jiakai Li and Gursel Serpen. Adaptive and intelligent wireless sensor networks through neural networks: an illustration for infrastructure adaptation through hopfield network. Applied Intelligence, 45(2):343–362, 2016.
[284] Fereshteh Khorasani and Hamid Reza Naji. Energy efficient data aggregation in wireless sensor networks using neural networks. International Journal of Sensor Networks, 24(1):26–42, 2017.
[285] Chunlin Li, Xiaofu Xie, Yuejiang Huang, Hong Wang, and Changxi Niu. Distributed data mining based on deep neural network for wireless sensor network. International Journal of Distributed Sensor Networks, 11(7):157453, 2015.
[286] Tie Luo and Sai G Nagarajan. Distributed anomaly detection using autoencoder neural networks in WSN for IoT. In International Conference on Communications (ICC), pages 1–6. IEEE, 2018.
[287] Lu Liu, Yu Cheng, Lin Cai, Sheng Zhou, and Zhisheng Niu. Deep learning based optimization in wireless network. In 2017 IEEE International Conference on Communications (ICC), pages 1–6.
[288] Shivashankar Subramanian and Arindam Banerjee. Poster: Deep learning enabled M2M gateway for network optimization. In Proceedings of the 14th Annual International Conference on Mobile Systems, Applications, and Services Companion, pages 144–144. ACM, 2016.
[289] Ying He, Chengchao Liang, F Richard Yu, Nan Zhao, and Hongxi Yin. Optimization of cache-enabled opportunistic interference alignment wireless networks: A big data deep reinforcement learning approach. In 2017 IEEE International Conference on Communications (ICC), pages 1–6.
[290] Ying He, Zheng Zhang, F Richard Yu, Nan Zhao, Hongxi Yin, Victor CM Leung, and Yanhua Zhang. Deep reinforcement learning-based optimization for cache-enabled opportunistic interference alignment wireless networks. IEEE Transactions on Vehicular Technology, 2017.
[293] Ziqi Chen and David B Smith. Heterogeneous machine-type communications in cellular networks: Random access optimization by deep reinforcement learning. In International Conference on Communications (ICC), pages 1–6. IEEE, 2018.
[294] Li Chen, Justinas Lingys, Kai Chen, and Feng Liu. AuTO: scaling deep reinforcement learning for datacenter-scale automatic traffic optimization. In Proceedings of the Conference of the ACM Special Interest Group on Data Communication (SIGCOMM), pages 191–205. ACM, 2018.
[295] YangMin Lee. Classification of node degree based on deep learning and routing method applied for virtual route assignment. Ad Hoc Networks, 58:70–85, 2017.
[296] Fengxiao Tang, Bomin Mao, Zubair Md Fadlullah, Nei Kato, Osamu Akashi, Takeru Inoue, and Kimihiro Mizutani. On removing routing protocol from future wireless networks: A real-time deep learning approach for intelligent traffic control. IEEE Wireless Communications, 2017.
[297] Qingchen Zhang, Man Lin, Laurence T Yang, Zhikui Chen, and Peng Li. Energy-efficient scheduling for real-time systems based on deep Q-learning model. IEEE Transactions on Sustainable Computing, 2017.
[298] Ribal Atallah, Chadi Assi, and Maurice Khabbaz. Deep reinforcement learning-based scheduling for roadside communication networks. In Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks (WiOpt), 15th International Symposium on, pages 1–8. IEEE, 2017.
[299] Sandeep Chinchali, Pan Hu, Tianshu Chu, Manu Sharma, Manu Bansal, Rakesh Misra, Marco Pavone, and Sachin Katti. Cellular network traffic scheduling with deep reinforcement learning. In National Conference on Artificial Intelligence (AAAI), 2018.
[300] Yifei Wei, Zhiqiang Zhang, F Richard Yu, and Zhu Han. Joint user scheduling and content caching strategy for mobile edge networks using deep reinforcement learning. In International Conference on Communications Workshops (ICC Workshops). IEEE, 2018.
[301] Haoran Sun, Xiangyi Chen, Qingjiang Shi, Mingyi Hong, Xiao Fu, and Nikos D Sidiropoulos. Learning to optimize: Training deep neural networks for wireless resource management. In Signal Processing Advances in Wireless Communications (SPAWC), 18th International Workshop on, pages 1–6. IEEE, 2017.
[302] Zhiyuan Xu, Yanzhi Wang, Jian Tang, Jing Wang, and Mustafa Cenk Gursoy. A deep reinforcement learning based framework for power-efficient resource allocation in cloud RANs. In 2017 IEEE International Conference on Communications (ICC), pages 1–6.
[303] Paulo Victor R Ferreira, Randy Paffenroth, Alexander M Wyglinski, Timothy M Hackett, Sven G Bilén, Richard C Reinhart, and Dale J Mortensen. Multi-objective reinforcement learning-based deep neural networks for cognitive space communications. In Cognitive Communications for Aerospace Applications Workshop (CCAA), pages 1–8. IEEE, 2017.
[304] Hao Ye and Geoffrey Ye Li. Deep reinforcement learning for resource allocation in V2V communications. In International Conference on Communications (ICC), pages 1–6. IEEE, 2018.
[305] Ursula Challita, Li Dong, and Walid Saad. Proactive resource management for LTE in unlicensed spectrum: A deep learning perspective. IEEE Transactions on Wireless Communications, 2018.
[306] Oshri Naparstek and Kobi Cohen. Deep multi-user reinforcement learning for dynamic spectrum access in multichannel wireless networks. arXiv preprint arXiv:1704.02613, 2017.
[307] Timothy J O’Shea and T Charles Clancy. Deep reinforcement learning radio control and signal detection with KeRLym, a gym RL agent. arXiv preprint arXiv:1605.09221, 2016.
[308] Michael Andri Wijaya, Kazuhiko Fukawa, and Hiroshi Suzuki. Intercell-interference cancellation and neural network transmit power optimization for MIMO channels. In Vehicular Technology Conference (VTC Fall), IEEE 82nd, pages 1–5, 2015.
[309] Humphrey Rutagemwa, Amir Ghasemi, and Shuo Liu. Dynamic spectrum assignment for land mobile radio with deep recurrent neural networks. In International Conference on Communications Workshops (ICC Workshops). IEEE, 2018.
[310] Michael Andri Wijaya, Kazuhiko Fukawa, and Hiroshi Suzuki. Neural network based transmit power control and interference cancellation for MIMO small cell networks. IEICE Transactions on Communications, 99(5):1157–1169, 2016.
[291] Faris B Mismar and Brian L Evans. Deep reinforcement learning
for improving downlink mmwave communication performance. arXiv [311] Hongzi Mao, Ravi Netravali, and Mohammad Alizadeh. Neural adap-
preprint arXiv:1707.02329, 2017. tive video streaming with pensieve. In Proceedings of the Conference
[292] Wang, Zhi and Li, Lihua and Xu, Yue and Tian, Hui and Cui, Shuguang. of the ACM Special Interest Group on Data Communication, pages
Handover optimization via asynchronous multi-user deep reinforcement 197–210. ACM, 2017.
learning. In International Conference on Communications (ICC), pages [312] Tetsuya Oda, Ryoichiro Obukata, Makoto Ikeda, Leonard Barolli, and
1–6. IEEE, 2018. Makoto Takizawa. Design and implementation of a simulation system
based on deep Q-network for mobile actor node control in wireless [332] Xin Su, Dafang Zhang, Wenjia Li, and Kai Zhao. A deep learning
sensor and actor networks. In Advanced Information Networking and approach to Android malware feature learning and detection. In
Applications Workshops (WAINA), 31st International Conference on, Trustcom/BigDataSE/ISPA, IEEE, pages 244–251, 2016.
pages 195–200. IEEE, 2017. [333] Shifu Hou, Aaron Saas, Lifei Chen, and Yanfang Ye. Deep4MalDroid:
[313] Tetsuya Oda, Donald Elmazi, Miralda Cuka, Elis Kulla, Makoto Ikeda, A deep learning framework for Android malware detection based on
and Leonard Barolli. Performance evaluation of a deep Q-network linux kernel system call graphs. In Web Intelligence Workshops (WIW),
based simulation system for actor node mobility control in wireless IEEE/WIC/ACM International Conference on, pages 104–111, 2016.
sensor and actor networks considering three-dimensional environment. [334] Fabio Martinelli, Fiammetta Marulli, and Francesco Mercaldo. Eval-
In International Conference on Intelligent Networking and Collabora- uating convolutional neural network for effective mobile malware
tive Systems, pages 41–52. Springer, 2017. detection. Procedia Computer Science, 112:2372–2381, 2017.
[314] Hye-Young Kim and Jong-Min Kim. A load balancing scheme based [335] Niall McLaughlin, Jesus Martinez del Rincon, BooJoong Kang,
on deep-learning in IoT. Cluster Computing, 20(1):873–878, 2017. Suleiman Yerima, Paul Miller, Sakir Sezer, Yeganeh Safaei, Erik
[315] Challita, Ursula and Saad, Walid and Bettstetter, Christian. Deep Trickel, Ziming Zhao, Adam Doupé, et al. Deep android malware
reinforcement learning for interference-aware path planning of cellular detection. In Proceedings of the Seventh ACM on Conference on Data
connected UAVs. In Proc. of International Conference on Communi- and Application Security and Privacy, pages 301–308. ACM, 2017.
cations (ICC). Kansas City, MO, USA, 2018. [336] Yuanfang Chen, Yan Zhang, and Sabita Maharjan. Deep learning for
[316] Luo, Changqing and Ji, Jinlong and Wang, Qianlong and Yu, Lixing secure mobile edge computing. arXiv preprint arXiv:1709.08025, 2017.
and Li, Pan. Online power control for 5G wireless communications: [337] Milan Oulehla, Zuzana Komínková Oplatková, and David Malanik.
A deep Q-network approach. In International Conference on Commu- Detection of mobile botnets using neural networks. In Future Tech-
nications (ICC), pages 1–6. IEEE, 2018. nologies Conference (FTC), pages 1324–1326. IEEE, 2016.
[317] Yu, Yiding and Wang, Taotao and Liew, Soung Chang. Deep- [338] Pablo Torres, Carlos Catania, Sebastian Garcia, and Carlos Garcia
reinforcement learning multiple access for heterogeneous wireless Garino. An analysis of recurrent neural networks for botnet detection
networks. In International Conference on Communications (ICC), behavior. In Biennial Congress of Argentina (ARGENCON), IEEE,
pages 1–7. IEEE, 2018. pages 1–6, 2016.
[318] Xu, Zhiyuan and Tang, Jian and Meng, Jingsong and Zhang, Weiyi [339] Meisam Eslahi, Moslem Yousefi, Maryam Var Naseri, YM Yussof,
and Wang, Yanzhi and Liu, Chi Harold and Yang, Dejun. Experience- NM Tahir, and H Hashim. Mobile botnet detection model based on
driven networking: A deep reinforcement learning based approach. In retrospective pattern recognition. International Journal of Security and
INFOCOM. IEEE, 2018. Its Applications, 10(9):39–+, 2016.
[319] Liu, Jingchu and Krishnamachari, Bhaskar and Zhou, Sheng and Niu, [340] Mohammad Alauthaman, Nauman Aslam, Li Zhang, Rafe Alasem,
Zhisheng. DeepNap: Data-driven base station sleeping operations and MA Hossain. A p2p botnet detection scheme based on decision
through deep reinforcement learning. IEEE Internet of Things Journal, tree and adaptive multilayer neural networks. Neural Computing and
2018. Applications, pages 1–14, 2016.
[320] Mahmood Yousefi-Azar, Vijay Varadharajan, Len Hamey, and Uday [341] Reza Shokri and Vitaly Shmatikov. Privacy-preserving deep learning.
Tupakula. Autoencoder-based feature learning for cyber security ap- In Proceedings of the 22nd ACM SIGSAC conference on computer and
plications. In Neural Networks (IJCNN), International Joint Conference communications security, pages 1310–1321. ACM, 2015.
on, pages 3854–3861. IEEE, 2017. [342] Yoshinori Aono, Takuya Hayashi, Lihua Wang, Shiho Moriai, et al.
Privacy-preserving deep learning: Revisited and enhanced. In Inter-
[321] Muhamad Erza Aminanto and Kwangjo Kim. Detecting impersonation
national Conference on Applications and Techniques in Information
attack in WiFi networks using deep learning approach. In Interna-
Security, pages 100–110. Springer, 2017.
tional Workshop on Information Security Applications, pages 136–147.
[343] Seyed Ali Osia, Ali Shahin Shamsabadi, Ali Taheri, Hamid R Rabiee,
Springer, 2016.
Nic Lane, and Hamed Haddadi. A hybrid deep learning architecture for
[322] Qingsong Feng, Zheng Dou, Chunmei Li, and Guangzhen Si. Anomaly
privacy-preserving mobile analytics. arXiv preprint arXiv:1703.02952,
detection of spectrum in wireless communication via deep autoencoder.
2017.
In International Conference on Computer Science and its Applications,
[344] Martín Abadi, Andy Chu, Ian Goodfellow, H Brendan McMahan, Ilya
pages 259–265. Springer, 2016.
Mironov, Kunal Talwar, and Li Zhang. Deep learning with differential
[323] Muhammad Altaf Khan, Shafiullah Khan, Bilal Shams, and Jaime privacy. In Proceedings of the ACM SIGSAC Conference on Computer
Lloret. Distributed flood attack detection mechanism using artificial and Communications Security, pages 308–318. ACM, 2016.
neural network in wireless mesh networks. Security and Communica- [345] Seyed Ali Osia, Ali Shahin Shamsabadi, Ali Taheri, Kleomenis Kat-
tion Networks, 9(15):2715–2729, 2016. evas, Hamed Haddadi, and Hamid R Rabiee. Private and scalable
[324] Abebe Abeshu Diro and Naveen Chilamkurti. Distributed attack personal data analytics using a hybrid edge-cloud deep learning. IEEE
detection scheme using deep learning approach for Internet of Things. Computer Magazine Special Issue on Mobile and Embedded Deep
Future Generation Computer Systems, 2017. Learning, 2018.
[325] Alan Saied, Richard E Overill, and Tomasz Radzik. Detection of [346] Sandra Servia-Rodriguez, Liang Wang, Jianxin R Zhao, Richard
known and unknown DDoS attacks using artificial neural networks. Mortier, and Hamed Haddadi. Personal model training under privacy
Neurocomputing, 172:385–393, 2016. constraints. In Proceedings of the 3rd ACM/IEEE International Con-
[326] Manuel Lopez-Martin, Belen Carro, Antonio Sanchez-Esguevillas, and ference on Internet-of-Things Design and Implementation, Apr 2018.
Jaime Lloret. Conditional variational autoencoder for prediction and [347] Briland Hitaj, Giuseppe Ateniese, and Fernando Perez-Cruz. Deep
feature recovery applied to intrusion detection in IoT. Sensors, models under the GAN: information leakage from collaborative deep
17(9):1967, 2017. learning. In Proceedings of the ACM SIGSAC Conference on Computer
[327] Hamedani, Kian and Liu, Lingjia and Atat, Rachad and Wu, Jinsong and Communications Security, pages 603–618. ACM, 2017.
and Yi, Yang. Reservoir computing meets smart grids: attack detection [348] Sam Greydanus. Learning the enigma with recurrent neural networks.
using delayed feedback networks. IEEE Transactions on Industrial arXiv preprint arXiv:1708.07576, 2017.
Informatics, 14(2):734–743, 2018. [349] Houssem Maghrebi, Thibault Portigliatti, and Emmanuel Prouff. Break-
[328] Das, Rajshekhar and Gadre, Akshay and Zhang, Shanghang and Kumar, ing cryptographic implementations using deep learning techniques. In
Swarun and Moura, Jose MF. A deep learning approach to IoT International Conference on Security, Privacy, and Applied Cryptog-
authentication. In International Conference on Communications (ICC), raphy Engineering, pages 3–26. Springer, 2016.
pages 1–6. IEEE, 2018. [350] Liu, Yunyu and Xia, Zhiyang and Yi, Ping and Yao, Yao and Xie,
[329] Jiang, Peng and Wu, Hongyi and Wang, Cong and Xin, Chunsheng. Tiantian and Wang, Wei and Zhu, Ting. GENPass: A general deep
Virtual MAC spoofing detection through deep learning. In International learning model for password guessing with PCFG rules and adversarial
Conference on Communications (ICC), pages 1–6. IEEE, 2018. generation. In International Conference on Communications (ICC),
[330] Zhenlong Yuan, Yongqiang Lu, Zhaoguo Wang, and Yibo Xue. Droid- pages 1–6. IEEE, 2018.
Sec: deep learning in Android malware detection. In ACM SIGCOMM [351] Neev Samuel, Tzvi Diskin, and Ami Wiesel. Deep MIMO detection.
Computer Communication Review, volume 44, pages 371–372. ACM, arXiv preprint arXiv:1706.01151, 2017.
2014. [352] Xin Yan, Fei Long, Jingshuai Wang, Na Fu, Weihua Ou, and Bin Liu.
[331] Zhenlong Yuan, Yongqiang Lu, and Yibo Xue. Droiddetector: Android Signal detection of MIMO-OFDM system based on auto encoder and
malware characterization and detection using deep learning. Tsinghua extreme learning machine. In Neural Networks (IJCNN), International
Science and Technology, 21(1):114–123, 2016. Joint Conference on, pages 1602–1606. IEEE, 2017.
[353] David Neumann, Wolfgang Utschick, and Thomas Wiese. Deep [373] Demetrios Zeinalipour Yazti and Shonali Krishnaswamy. Mobile big
channel estimation. In 21th International ITG Workshop on Smart data analytics: research, practice, and opportunities. In Mobile Data
Antennas; Proceedings of, pages 1–6. VDE, 2017. Management (MDM), 15th International Conference on, volume 1,
[354] Timothy J O’Shea, Tugba Erpek, and T Charles Clancy. Deep learning pages 1–2. IEEE, 2014.
based MIMO communications. arXiv preprint arXiv:1707.07980, 2017. [374] Diala Naboulsi, Marco Fiore, Stephane Ribot, and Razvan Stanica.
[355] Mark Borgerding, Philip Schniter, and Sundeep Rangan. AMP-inspired Large-scale mobile traffic analysis: a survey. IEEE Communications
deep networks for sparse linear inverse problems. IEEE Transactions Surveys & Tutorials, 18(1):124–161, 2016.
on Signal Processing, 2017. [375] Jiquan Ngiam, Aditya Khosla, Mingyu Kim, Juhan Nam, Honglak Lee,
[356] Fujihashi, Takuya and Koike-Akino, Toshiaki and Watanabe, Takashi and Andrew Y Ng. Multimodal deep learning. In Proceedings of the
and Orlik, Philip V. Nonlinear equalization with deep learning 28th international conference on machine learning (ICML-11), pages
for multi-purpose visual MIMO communications. In International 689–696, 2011.
Conference on Communications (ICC), pages 1–6. IEEE, 2018. [376] Aceto, Giuseppe and Ciuonzo, Domenico and Montieri, Antonio and
[357] Sreeraj Rajendran, Wannes Meert, Domenico Giustiniano, Vincent Pescapé, Antonio. Mobile encrypted traffic classification using deep
Lenders, and Sofie Pollin. Deep learning models for wireless signal learning. In 2nd Network Traffic Measurement and Analysis Confer-
classification with distributed low-cost spectrum sensors. IEEE Trans- ence. IEEE, 2018.
actions on Cognitive Communications and Networking, 2018. [377] Ala Al-Fuqaha, Mohsen Guizani, Mehdi Mohammadi, Mohammed
[358] Nathan E West and Tim O’Shea. Deep architectures for modulation Aledhari, and Moussa Ayyash. Internet of things: A survey on enabling
recognition. In Dynamic Spectrum Access Networks (DySPAN), IEEE technologies, protocols, and applications. IEEE Communications Sur-
International Symposium on, pages 1–6, 2017. veys & Tutorials, 17(4):2347–2376, 2015.
[359] Timothy J O’Shea, Latha Pemula, Dhruv Batra, and T Charles Clancy. [378] Suranga Seneviratne, Yining Hu, Tham Nguyen, Guohao Lan, Sara
Radio transformer networks: Attention models for learning to synchro- Khalifa, Kanchana Thilakarathna, Mahbub Hassan, and Aruna Senevi-
nize in wireless systems. In Signals, Systems and Computers, 50th ratne. A survey of wearable devices and challenges. IEEE Communi-
Asilomar Conference on, pages 662–666, 2016. cations Surveys & Tutorials, 2017.
[360] Timothy O’Shea and Jakob Hoydis. An introduction to deep [379] He Li, Kaoru Ota, and Mianxiong Dong. Learning IoT in edge:
learning for the physical layer. IEEE Transactions on Cognitive Deep learning for the Internet of Things with edge computing. IEEE
Communications and Networking, 3(4):563–575, 2017. Network, 32(1):96–101, 2018.
[361] Jagannath, Jithin and Polosky, Nicholas and O’Connor, Daniel and [380] Nicholas D Lane, Sourav Bhattacharya, Petko Georgiev, Claudio For-
Theagarajan, Lakshmi N and Sheaffer, Brendan and Foulke, Svetlana livesi, and Fahim Kawsar. An early resource characterization of deep
and Varshney, Pramod K. Artificial neural network based automatic learning on wearables, smartphones and internet-of-things devices.
modulation classification over a software defined radio testbed. In In Proceedings of the International Workshop on Internet of Things
International Conference on Communications (ICC), pages 1–6. IEEE, towards Applications, pages 7–12. ACM, 2015.
2018. [381] Daniele Ravì, Charence Wong, Fani Deligianni, Melissa Berthelot,
Javier Andreu-Perez, Benny Lo, and Guang-Zhong Yang. Deep
[362] Timothy J O’Shea, Seth Hitefield, and Johnathan Corgan. End-to-end
learning for health informatics. IEEE journal of biomedical and health
radio traffic sequence recognition with recurrent neural networks. In
informatics, 21(1):4–21, 2017.
Signal and Information Processing (GlobalSIP), IEEE Global Confer-
[382] Riccardo Miotto, Fei Wang, Shuang Wang, Xiaoqian Jiang, and Joel T
ence on, pages 277–281, 2016.
Dudley. Deep learning for healthcare: review, opportunities and
[363] Timothy J O’Shea, Kiran Karra, and T Charles Clancy. Learning to
challenges. Briefings in Bioinformatics, page bbx044, 2017.
communicate: Channel auto-encoders, domain specific regularizers, and
[383] Charissa Ann Ronao and Sung-Bae Cho. Human activity recognition
attention. In Signal Processing and Information Technology (ISSPIT),
with smartphone sensors using deep learning neural networks. Expert
IEEE International Symposium on, pages 223–228, 2016.
Systems with Applications, 59:235–244, 2016.
[364] Ye, Hao and Li, Geoffrey Ye and Juang, Biing-Hwang. Power of deep [384] Jindong Wang, Yiqiang Chen, Shuji Hao, Xiaohui Peng, and Lisha Hu.
learning for channel estimation and signal detection in OFDM systems. Deep learning for sensor-based activity recognition: A survey. Pattern
IEEE Wireless Communications Letters, 7(1):114–117, 2018. Recognition Letters, 2018.
[365] Liang, Fei and Shen, Cong and Wu, Feng. Exploiting noise corre- [385] Xukan Ran, Haoliang Chen, Zhenming Liu, and Jiasi Chen. Delivering
lation for channel decoding with convolutional neural networks. In deep learning to mobile devices via offloading. In Proceedings of the
International Conference on Communications (ICC), pages 1–6. IEEE, Workshop on Virtual Reality and Augmented Reality Network, pages
2018. 42–47. ACM, 2017.
[366] Lyu, Wei and Zhang, Zhaoyang and Jiao, Chunxu and Qin, Kangjian [386] Vishakha V Vyas, KH Walse, and RV Dharaskar. A survey on human
and Zhang, Huazi. Performance evaluation of channel decoding with activity recognition using smartphone. International Journal, 5(3),
deep neural networks. In International Conference on Communications 2017.
(ICC), pages 1–6. IEEE, 2018. [387] Heiga Zen and Andrew Senior. Deep mixture density networks
[367] Sebastian Dorner, Sebastian Cammerer, Jakob Hoydis, and Stephan ten for acoustic modeling in statistical parametric speech synthesis. In
Brink. On deep learning-based communication over the air. In Signals, Acoustics, Speech and Signal Processing (ICASSP), IEEE International
Systems, and Computers, 51st Asilomar Conference on, pages 1791– Conference on, pages 3844–3848, 2014.
1795. IEEE, 2017. [388] Kai Zhao, Sasu Tarkoma, Siyuan Liu, and Huy Vo. Urban human
[368] Roberto Gonzalez, Alberto Garcia-Duran, Filipe Manco, Mathias mobility data mining: An overview. In Big Data (Big Data), IEEE
Niepert, and Pelayo Vallina. Network data monetization using Net2Vec. International Conference on, pages 1911–1920, 2016.
In Proceedings of the SIGCOMM Posters and Demos, pages 37–39. [389] Shixiong Xia, Yi Liu, Guan Yuan, Mingjun Zhu, and Zhaohui Wang.
ACM, 2017. Indoor fingerprint positioning based on Wi-Fi: An overview. ISPRS
[369] Nicholas Kaminski, Irene Macaluso, Emanuele Di Pascale, Avishek International Journal of Geo-Information, 6(5):135, 2017.
Nag, John Brady, Mark Kelly, Keith Nolan, Wael Guibene, and Linda [390] Pavel Davidson and Robert Piche. A survey of selected indoor
Doyle. A neural-network-based realization of in-network computation positioning methods for smartphones. IEEE Communications Surveys
for the Internet of Things. In 2017 IEEE International Conference on & Tutorials, 2016.
Communications (ICC), pages 1–6. [391] Zejia Zhengj and Juyang Weng. Mobile device based outdoor nav-
[370] Liang Xiao, Yanda Li, Guoan Han, Huaiyu Dai, and H Vincent Poor. igation with on-line learning neural network: A comparison with
A secure mobile crowdsensing game with deep reinforcement learning. convolutional neural network. In Proceedings of the IEEE Conference
IEEE Transactions on Information Forensics and Security, 2017. on Computer Vision and Pattern Recognition Workshops, pages 11–18,
[371] Luong, Nguyen Cong and Xiong, Zehui and Wang, Ping and Niyato, 2016.
Dusit. Optimal auction for edge computing resource management in [392] Cheng Yang, Maosong Sun, Wayne Xin Zhao, Zhiyuan Liu, and
mobile blockchain networks: A deep learning approach. In Interna- Edward Y Chang. A neural network approach to jointly modeling social
tional Conference on Communications (ICC), pages 1–6. IEEE, 2018. networks and mobile trajectories. ACM Transactions on Information
[372] Gulati, Amuleen and Aujla, Gagangeet Singh and Chaudhary, Rajat Systems (TOIS), 35(4):36, 2017.
and Kumar, Neeraj and Obaidat, Mohammad S. Deep learning-based [393] Jiang Xiao, Kaishun Wu, Youwen Yi, and Lionel M Ni. FIFS:
content centric data dissemination scheme for Internet of Vehicles. In Fine-grained indoor fingerprinting system. In 2012 21st International
International Conference on Communications (ICC), pages 1–6. IEEE, Conference on Computer Communications and Networks (ICCCN),
2018. pages 1–7.
[394] Moustafa Youssef and Ashok Agrawala. The Horus WLAN location mobile and embedded devices. IEEE Pervasive Computing, 16(3):82–
determination system. In Proceedings of the 3rd international confer- 88, 2017.
ence on Mobile systems, applications, and services, pages 205–218. [416] Jie Tang, Dawei Sun, Shaoshan Liu, and Jean-Luc Gaudiot. Enabling
ACM, 2005. deep learning on IoT devices. Computer, 50(10):92–96, 2017.
[395] Mauro Brunato and Roberto Battiti. Statistical learning theory for loca- [417] Forrest N Iandola, Song Han, Matthew W Moskewicz, Khalid Ashraf,
tion fingerprinting in wireless LANs. Computer Networks, 47(6):825– William J Dally, and Kurt Keutzer. SqueezeNet: AlexNet-level accu-
845, 2005. racy with 50x fewer parameters and< 0.5 MB model size. International
[396] Jonathan Ho and Stefano Ermon. Generative adversarial imitation Conference on Learning Representations (ICLR), 2017.
learning. In Advances in Neural Information Processing Systems, pages [418] Andrew G Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko,
4565–4573, 2016. Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam.
[397] Michele Zorzi, Andrea Zanella, Alberto Testolin, Michele De Filippo Mobilenets: Efficient convolutional neural networks for mobile vision
De Grazia, and Marco Zorzi. COBANETS: A new paradigm for applications. arXiv preprint arXiv:1704.04861, 2017.
cognitive communications systems. In Computing, Networking and [419] Xiangyu Zhang, Xinyu Zhou, Mengxiao Lin, and Jian Sun. ShuffleNet:
Communications (ICNC), International Conference on, pages 1–7. An extremely efficient convolutional neural network for mobile devices.
IEEE, 2016. In The IEEE Conference on Computer Vision and Pattern Recognition
[398] Mehdi Roopaei, Paul Rad, and Mo Jamshidi. Deep learning control (CVPR), June 2018.
for complex and large scale cloud systems. Intelligent Automation & [420] Qingchen Zhang, Laurence T Yang, Xingang Liu, Zhikui Chen, and
Soft Computing, pages 1–3, 2017. Peng Li. A tucker deep computation model for mobile multimedia
[399] Tom Schaul, John Quan, Ioannis Antonoglou, and David Silver. Pri- feature learning. ACM Transactions on Multimedia Computing, Com-
oritized experience replay. arXiv preprint arXiv:1511.05952, 2015. munications, and Applications (TOMM), 13(3s):39, 2017.
[400] Qingjiang Shi, Meisam Razaviyayn, Zhi-Quan Luo, and Chen He. An [421] Qingqing Cao, Niranjan Balasubramanian, and Aruna Balasubrama-
iteratively weighted MMSE approach to distributed sum-utility maxi- nian. MobiRNN: Efficient recurrent neural network execution on
mization for a MIMO interfering broadcast channel. IEEE Transactions mobile GPU. 2017.
on Signal Processing, 59(9):4331–4340, 2011. [422] Chun-Fu Chen, Gwo Giun Lee, Vincent Sritapan, and Ching-Yung Lin.
[401] Anna L Buczak and Erhan Guven. A survey of data mining and Deep convolutional neural network on iOS mobile devices. In Signal
machine learning methods for cyber security intrusion detection. IEEE Processing Systems (SiPS), IEEE International Workshop on, pages
Communications Surveys & Tutorials, 18(2):1153–1176, 2016. 130–135, 2016.
[402] David M Blei, Andrew Y Ng, and Michael I Jordan. Latent dirichlet [423] S Rallapalli, H Qiu, A Bency, S Karthikeyan, R Govindan, B Man-
allocation. Journal of machine Learning research, 3(Jan):993–1022, junath, and R Urgaonkar. Are very deep neural networks feasible on
2003. mobile devices. IEEE Trans. Circ. Syst. Video Technol, 2016.
[403] Donghwoon Kwon, Hyunjoo Kim, Jinoh Kim, Sang C Suh, Ikkyun [424] Nicholas D Lane, Sourav Bhattacharya, Petko Georgiev, Claudio For-
Kim, and Kuinam J Kim. A survey of deep learning-based network livesi, Lei Jiao, Lorena Qendro, and Fahim Kawsar. DeepX: A software
anomaly detection. Cluster Computing, pages 1–13, 2017. accelerator for low-power deep learning inference on mobile devices.
[404] Mahbod Tavallaee, Ebrahim Bagheri, Wei Lu, and Ali A Ghorbani. In Information Processing in Sensor Networks (IPSN), 15th ACM/IEEE
A detailed analysis of the kdd cup 99 data set. In Computational International Conference on, pages 1–12, 2016.
Intelligence for Security and Defense Applications, 2009. CISDA 2009. [425] Loc N Huynh, Rajesh Krishna Balan, and Youngki Lee. DeepMon:
IEEE Symposium on, pages 1–6, 2009. Building mobile GPU deep learning models for continuous vision ap-
[405] Kimberly Tam, Ali Feizollah, Nor Badrul Anuar, Rosli Salleh, and plications. In Proceedings of the 15th Annual International Conference
Lorenzo Cavallaro. The evolution of android malware and Android on Mobile Systems, Applications, and Services, pages 186–186. ACM,
analysis techniques. ACM Computing Surveys (CSUR), 49(4):76, 2017. 2017.
[406] Rafael A Rodríguez-Gómez, Gabriel Maciá-Fernández, and Pedro [426] Jiaxiang Wu, Cong Leng, Yuhang Wang, Qinghao Hu, and Jian Cheng.
García-Teodoro. Survey and taxonomy of botnet research through life- Quantized convolutional neural networks for mobile devices. In
cycle. ACM Computing Surveys (CSUR), 45(4):45, 2013. Proceedings of the IEEE Conference on Computer Vision and Pattern
[407] Menghan Liu, Haotian Jiang, Jia Chen, Alaa Badokhon, Xuetao Wei, Recognition, pages 4820–4828, 2016.
and Ming-Chun Huang. A collaborative privacy-preserving deep [427] Sourav Bhattacharya and Nicholas D Lane. Sparsification and sepa-
learning system in distributed mobile environment. In Computational ration of deep learning layers for constrained resource inference on
Science and Computational Intelligence (CSCI), International Confer- wearables. In Proceedings of the 14th ACM Conference on Embedded
ence on, pages 192–197. IEEE, 2016. Network Sensor Systems CD-ROM, pages 176–189. ACM, 2016.
[408] Sumit Chopra, Raia Hadsell, and Yann LeCun. Learning a similarity [428] Minsik Cho and Daniel Brand. Mec: Memory-efficient convolution for
metric discriminatively, with application to face verification. In deep neural network. arXiv preprint arXiv:1706.06873, 2017.
Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE [429] Jia Guo and Miodrag Potkonjak. Pruning filters and classes: Towards
Computer Society Conference on, volume 1, pages 539–546, 2005. on-device customization of convolutional neural networks. In Proceed-
[409] Ali Shahin Shamsabadi, Hamed Haddadi, and Andrea Cavallaro. Dis- ings of the 1st International Workshop on Deep Learning for Mobile
tributed one-class learning. In IEEE International Conference on Image Systems and Applications, pages 13–17. ACM, 2017.
Processing (ICIP), 2018. [430] Shiming Li, Duo Liu, Chaoneng Xiang, Jianfeng Liu, Yingjian Ling,
[410] Briland Hitaj, Paolo Gasti, Giuseppe Ateniese, and Fernando Perez- Tianjun Liao, and Liang Liang. Fitcnn: A cloud-assisted lightweight
Cruz. PassGAN: A deep learning approach for password guessing. convolutional neural network framework for mobile devices. In Em-
arXiv preprint arXiv:1709.00440, 2017. bedded and Real-Time Computing Systems and Applications (RTCSA),
[411] Roberto Gonzalez, Filipe Manco, Alberto Garcia-Duran, Jose Mendes, IEEE 23rd International Conference on, pages 1–6, 2017.
Felipe Huici, Saverio Niccolini, and Mathias Niepert. Net2Vec: Deep [431] Heiga Zen, Yannis Agiomyrgiannakis, Niels Egberts, Fergus Hender-
learning for the network. In Proceedings of the Workshop on Big Data son, and Przemysław Szczepaniak. Fast, compact, and high quality
Analytics and Machine Learning for Data Communication Networks, LSTM-RNN based statistical parametric speech synthesizers for mobile
pages 13–18, 2017. devices. arXiv preprint arXiv:1606.06061, 2016.
[412] Liu, Chenxi and Zoph, Barret and Shlens, Jonathon and Hua, Wei and [432] Gabriel Falcao, Luís A Alexandre, J Marques, Xavier Frazão, and
Li, Li-Jia and Fei-Fei, Li and Yuille, Alan and Huang, Jonathan and Joao Maria. On the evaluation of energy-efficient deep learning
Murphy, Kevin. Progressive neural architecture search. arXiv preprint using stacked autoencoders on mobile gpus. In Parallel, Distributed
arXiv:1712.00559, 2017. and Network-based Processing (PDP), 25th Euromicro International
[413] David H Wolpert and William G Macready. No free lunch theorems for Conference on, pages 270–273. IEEE, 2017.
optimization. IEEE transactions on evolutionary computation, 1(1):67– [433] Surat Teerapittayanon, Bradley McDanel, and HT Kung. Distributed
82, 1997. deep neural networks over the cloud, the edge and end devices. In
[414] Yu Cheng, Duo Wang, Pan Zhou, and Tao Zhang. A survey of Distributed Computing Systems (ICDCS), IEEE 37th International
model compression and acceleration for deep neural networks. arXiv Conference on, pages 328–339, 2017.
preprint arXiv:1710.09282, 2017. To appear in IEEE Signal Processing [434] Shayegan Omidshafiei, Jason Pazis, Christopher Amato, Jonathan P
Magazine. How, and John Vian. Deep decentralized multi-task multi-agent
[415] Nicholas D Lane, Sourav Bhattacharya, Akhil Mathur, Petko Georgiev, reinforcement learning under partial observability. arXiv preprint
Claudio Forlivesi, and Fahim Kawsar. Squeezing deep learning into arXiv:1703.06182, 2017.
[435] Benjamin Recht, Christopher Re, Stephen Wright, and Feng Niu. [456] Huandong Wang, Fengli Xu, Yong Li, Pengyu Zhang, and Depeng Jin.
Hogwild: A lock-free approach to parallelizing stochastic gradient Understanding mobile traffic patterns of large scale cellular towers in
descent. In Advances in neural information processing systems, pages urban environment. In Proc. ACM IMC, pages 225–238, 2015.
693–701, 2011. [457] Cristina Marquez, Marco Gramaglia, Marco Fiore, Albert Banchs,
[436] Priya Goyal, Piotr Dollár, Ross Girshick, Pieter Noordhuis, Lukasz Cezary Ziemlicki, and Zbigniew Smoreda. Not all Apps are created
Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, and Kaim- equal: Analysis of spatiotemporal heterogeneity in nationwide mobile
ing He. Accurate, large Minibatch SGD: Training imagenet in 1 hour. service usage. In Proceedings of the 13th ACM Conference on
arXiv preprint arXiv:1706.02677, 2017. Emerging Networking Experiments and Technologies. ACM, 2017.
[437] Shuai Zheng, Ruiliang Zhang, and James T Kwok. Asynchronous distributed [458] Gianni Barlacchi, Marco De Nadai, Roberto Larcher, Antonio Casella,
semi-stochastic gradient optimization. In AAAI, 2016. Cristiana Chitic, Giovanni Torrisi, Fabrizio Antonelli, Alessandro
[438] Corentin Hardy, Erwan Le Merrer, and Bruno Sericola. Distributed Vespignani, Alex Pentland, and Bruno Lepri. A multi-source dataset of
deep learning on edge-devices: feasibility via adaptive compression. In urban life in the city of Milan and the province of Trentino. Scientific
Network Computing and Applications (NCA), IEEE 16th International data, 2, 2015.
Symposium on, pages 1–8. IEEE, 2017. [459] Liang Liu, Wangyang Wei, Dong Zhao, and Huadong Ma. Urban
[439] Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and resolution: New metric for measuring the quality of urban sensing.
Blaise Aguera y Arcas. Communication-efficient learning of deep net- IEEE Transactions on Mobile Computing, 14(12):2560–2575, 2015.
works from decentralized data. In Proceedings of the 20th International [460] Denis Tikunov and Toshikazu Nishimura. Traffic prediction for mobile
Conference on Artificial Intelligence and Statistics, volume 54, pages network using Holt-Winters exponential smoothing. In Proc. SoftCOM,
1273–1282, Fort Lauderdale, FL, USA, 20–22 Apr 2017. Split–fDubrovnik, Croatia, September 2007.
[440] Keith Bonawitz, Vladimir Ivanov, Ben Kreuter, Antonio Marcedone, [461] Hyun-Woo Kim, Jun-Hui Lee, Yong-Hoon Choi, Young-Uk Chung, and
H. Brendan McMahan, Sarvar Patel, Daniel Ramage, Aaron Segal, and Hyukjoon Lee. Dynamic bandwidth provisioning using ARIMA-based
Karn Seth. Practical secure aggregation for privacy preserving machine traffic forecasting for Mobile WiMAX. Computer Communications,
learning. Cryptology ePrint Archive, Report 2017/281, 2017. https: 34(1):99–106, 2011.
//eprint.iacr.org/2017/281. [462] Muhammad Usama, Junaid Qadir, Aunn Raza, Hunain Arif, Kok-
[441] Suyog Gupta, Wei Zhang, and Fei Wang. Model accuracy and runtime Lim Alvin Yau, Yehia Elkhatib, Amir Hussain, and Ala Al-Fuqaha.
tradeoff in distributed deep learning: A systematic study. In Data Unsupervised machine learning for networking: Techniques, applica-
Mining (ICDM), IEEE 16th International Conference on, pages 171– tions and research challenges. arXiv preprint arXiv:1709.06599, 2017.
180, 2016. [463] Chong Zhou and Randy C Paffenroth. Anomaly detection with
[442] B McMahan and Daniel Ramage. Federated learning: Collaborative robust deep autoencoders. In Proceedings of the 23rd ACM SIGKDD
machine learning without centralized training data. Google Research International Conference on Knowledge Discovery and Data Mining,
Blog, 2017. pages 665–674. ACM, 2017.
[443] Angelo Fumo, Marco Fiore, and Razvan Stanica. Joint spatial and [464] Martín Abadi and David G Andersen. Learning to protect com-
temporal classification of mobile traffic demands. In Conference on munications with adversarial neural cryptography. arXiv preprint
Computer Communications, pages 1–9. IEEE, 2017. arXiv:1610.06918, 2016.
[465] Silver David, Julian Schrittwieser, Karen Simonyan, Ioannis
[444] Zhiyuan Chen and Bing Liu. Lifelong machine learning. Synthesis
Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas
Lectures on Artificial Intelligence and Machine Learning, 10(3):1–145,
Baker, Matthew Lai, Adrian Bolton, Yutian Chen, Timothy Lillicrap,
2016.
Fan Hui, Laurent Sifre, George van den Driessche, Graepel Thore,
[445] Sang-Woo Lee, Chung-Yeon Lee, Dong-Hyun Kwak, Jiwon Kim,
and Demis Hassabis. Mastering the game of Go without human
Jeonghee Kim, and Byoung-Tak Zhang. Dual-memory deep learning
knowledge. Nature, 550(7676):354–359, 2017.
architectures for lifelong learning of everyday human behaviors. In
IJCAI, pages 1669–1675, 2016.
[446] Alex Graves, Greg Wayne, Malcolm Reynolds, Tim Harley, Ivo Dani-
helka, Agnieszka Grabska-Barwińska, Sergio Gómez Colmenarejo,
Edward Grefenstette, Tiago Ramalho, John Agapiou, et al. Hybrid
computing using a neural network with dynamic external memory.
Nature, 538(7626):471–476, 2016.
[447] German I Parisi, Jun Tani, Cornelius Weber, and Stefan Wermter.
Lifelong learning of human actions with deep neural network self-
organization. Neural Networks, 2017.
[448] Chen Tessler, Shahar Givony, Tom Zahavy, Daniel J Mankowitz, and
Shie Mannor. A deep hierarchical approach to lifelong learning in
Minecraft. In AAAI, pages 1553–1561, 2017.
[449] Daniel López-Sánchez, Angélica González Arrieta, and Juan M Cor-
chado. Deep neural networks and transfer learning applied to multime-
dia web mining. In Distributed Computing and Artificial Intelligence,
14th International Conference, volume 620, page 124. Springer, 2018.
[450] Ejder Baştuğ, Mehdi Bennis, and Mérouane Debbah. A transfer
learning approach for cache-enabled wireless networks. In Modeling
and Optimization in Mobile, Ad Hoc, and Wireless Networks (WiOpt),
13th International Symposium on, pages 161–166. IEEE, 2015.
[451] Li Fei-Fei, Rob Fergus, and Pietro Perona. One-shot learning of
object categories. IEEE transactions on pattern analysis and machine
intelligence, 28(4):594–611, 2006.
[452] Mark Palatucci, Dean Pomerleau, Geoffrey E Hinton, and Tom M
Mitchell. Zero-shot learning with semantic output codes. In Advances
in neural information processing systems, pages 1410–1418, 2009.
[453] Oriol Vinyals, Charles Blundell, Tim Lillicrap, Daan Wierstra, et al.
Matching networks for one shot learning. In Advances in Neural
Information Processing Systems, pages 3630–3638, 2016.
[454] Soravit Changpinyo, Wei-Lun Chao, Boqing Gong, and Fei Sha.
Synthesized classifiers for zero-shot learning. In Proceedings of the
IEEE Conference on Computer Vision and Pattern Recognition, pages
5327–5336, 2016.
[455] Junhyuk Oh, Satinder Singh, Honglak Lee, and Pushmeet Kohli. Zero-
shot task generalization with multi-task deep reinforcement learning.
arXiv preprint arXiv:1706.05064, 2017.
