End-to-End Learning From Spectrum Data: A Deep Learning Approach For Wireless Signal Identification in Spectrum Monitoring Applications
Received February 16, 2018, accepted March 14, 2018, date of publication March 26, 2018, date of current version April 23, 2018.
Digital Object Identifier 10.1109/ACCESS.2018.2818794
ABSTRACT This paper presents end-to-end learning from spectrum data—an umbrella term for new
sophisticated wireless signal identification approaches in spectrum monitoring applications based on deep
neural networks. End-to-end learning makes it possible to: 1) automatically learn features directly from simple wireless
signal representations, without requiring the design of hand-crafted expert features such as higher order cyclic
moments, and 2) train wireless signal classifiers in one end-to-end step, which eliminates the need for complex
multi-stage machine learning processing pipelines. The purpose of this paper is to present the conceptual
framework of end-to-end learning for spectrum monitoring and systematically introduce a generic methodology
to easily design and implement wireless signal classifiers. Furthermore, we investigate the importance of
the choice of wireless data representation for various spectrum monitoring tasks. In particular, two case studies
are elaborated: 1) modulation recognition and 2) wireless technology interference detection. For each case
study three convolutional neural networks are evaluated for the following wireless signal representations:
temporal IQ data, the amplitude/phase representation, and the frequency domain representation. From our
analysis, we show that the wireless data representation impacts the accuracy depending on the specifics and
similarities of the wireless signals that need to be differentiated, with different data representations resulting
in accuracy variations of up to 29%. Experimental results show that using the amplitude/phase representation
for recognizing modulation formats can lead to performance improvements of up to 2% and 12% for medium
to high SNR compared to IQ and frequency domain data, respectively. For the task of detecting interference,
the frequency domain representation outperformed the amplitude/phase and IQ data representations by up to 20%.
INDEX TERMS Big spectrum data, spectrum monitoring, end-to-end learning, deep learning, convolutional
neural networks, wireless signal identification, IoT.
2169-3536 © 2018 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information. VOLUME 6, 2018, page 18484.
M. Kulin et al.: End-to-End Learning From Spectrum Data: Deep Learning Approach for Wireless Signal Identification
It is indisputable that monitoring and understanding the spectrum resource usage will become a critical asset for 5G in order to improve and regulate the radio spectrum utilization. However, monitoring the spectrum use in such a complex wireless system requires distributed sensing over a wide frequency range, resulting in a radio spectrum data deluge [3]. Extracting meaningful information about the spectrum usage from massive and complex spectrum datasets requires sophisticated and advanced algorithms. This paves the way for new innovative spectrum access schemes and the development of novel identification mechanisms that will provide awareness about the radio environment. For instance, technology identification, modulation type recognition and interference source detection are essential for interference mitigation strategies to continue effective use of the scarce spectral resources and enable the coexistence of heterogeneous wireless networks.

In this paper, we investigate end-to-end learning from spectrum data as a unified approach to tackle various challenges related to the problems of inefficient spectrum management, utilization and regulation that the next generation of wireless networks is facing. Whether the goal is to recognize a technology or a particular modulation type, or to identify the interference source or an interference-free frequency channel, we argue that the various problems may be treated as a generic problem type that we refer to as wireless signal identification, which is a natural target for machine learning classification techniques. The term end-to-end implies that the process of extracting wireless signal features and learning a wireless signal classifier consists of a single learning procedure. More generally, end-to-end learning refers to processing architectures where the entire pipeline, connecting the input (i.e. the data representation of a sensed wireless signal) to the desired output (i.e. the predicted type of signal), is learned purely from data [4].

A. SCOPE AND CONTRIBUTIONS
This paper provides a comprehensive introduction to end-to-end learning from spectrum data. The main contributions of this paper are as follows:
• Potential end-to-end learning use cases for spectrum monitoring are identified. In particular, two categories are presented. The first category comprises use cases where detecting spectral opportunities and spectrum sharing is necessary, such as in cognitive radio and emerging cognitive IoT networks. The second comprises scenarios where detecting radio emitters is needed, such as in spectrum regulation.
• To set a preliminary background on this interdisciplinary topic, a brief introduction to machine learning/deep learning is provided and their role for spectrum monitoring is discussed. Then, a reference model for deep learning for spectrum monitoring applications is defined.
• A conceptual framework for end-to-end learning is proposed, followed by a comprehensive overview of the methodology for collecting spectrum data, designing wireless signal representations, forming training data and training deep neural networks for wireless signal classification tasks.
• To demonstrate the approach, experiments are carried out for two case studies, (i) modulation recognition and (ii) wireless technology interference detection, that demonstrate the impact of the choice of wireless data representation on the presented results. For modulation recognition, the following modulation techniques are considered: BPSK (binary phase shift keying), QPSK (quadrature phase shift keying), m-PSK (phase shift keying, for m = 8), m-QAM (quadrature amplitude modulation, for m = 16 and 64), CPFSK (continuous phase frequency shift keying), GFSK (Gaussian frequency shift keying) and m-PAM (pulse amplitude modulation, for m = 4). For wireless technology identification, three representative technologies operating in the unlicensed bands are analysed: IEEE 802.11b/g, IEEE 802.15.4 and IEEE 802.15.1.

The rest of the paper is organized as follows. The remainder of Section I presents related work. Section II presents motivating scenarios for the proposed approach. Section III introduces basic concepts related to machine learning/deep learning, concluded with a high-level processing pipeline for their application to spectrum monitoring scenarios. Section IV presents the end-to-end learning methodology for wireless signal classification. In Section V the methodology is applied to two scenarios and experimental results are discussed. Section VI discusses open challenges related to the implementation and deployment of future end-to-end spectrum management systems. Section VII concludes the paper.
i.e. the feature extractor $\phi$ transforms the data vector $d \in \mathbb{R}^d$ into a new form, $x \in \mathbb{R}^n$, more suitable for making predictions. The importance of feature engineering highlights the bottleneck of machine learning algorithms: their inability to automatically extract the discriminative information from data.

Feature learning is a branch of machine learning that moves the concept of learning from "learning the model" to "learning the features". One popular feature learning method is deep learning. In particular, this paper focuses on convolutional neural networks (CNN).

Convolutional neural networks perform feature learning via non-linear transformations implemented as a series of nested layers. The input data is a multidimensional data array, called a tensor, that is presented at the visible layer. This is typically a grid-like topological structure, e.g. time-series data, which can be seen as a 1D grid taking samples at regular time intervals, pixels in images with a 2D layout, a 3D structure of videos, etc. Then a series of hidden layers extract several abstract features. Those layers are "hidden" because their values are not given. Instead, the deep learning model must determine which data representations are useful for explaining the relationships in the observed data. Each layer consists of several kernels that perform a convolution over the input; therefore, they are also referred to as convolutional layers. Kernels are feature detectors that convolve over the input and produce a transformed version of the data at the output. Those are banks of finite impulse response filters as seen in signal processing, just learned on a hierarchy of layers. The filters are usually multidimensional arrays of parameters that are learnt by the learning algorithm [24] through a training process called backpropagation.

For instance, given a two-dimensional input $x$, a two-dimensional kernel $h$ computes the 2D convolution by

$$(x * h)_{i,j} = x[i,j] * h[i,j] = \sum_{n}\sum_{m} x[n,m]\, h[i-n,\, j-m] \tag{12}$$

i.e. the dot product between their weights and a small region they are connected to in the input.

After the convolution, a bias term is added and a point-wise nonlinearity $g$ is applied, forming a feature map at the filter output. If we denote the $l$-th feature map at a given convolutional layer as $h^l$, whose filters are determined by the coefficients or weights $W^l$, the input $x$ and the bias $b_l$, then the feature map $h^l$ is obtained as follows

$$h^l_{i,j} = g\big((W^l * x)_{ij} + b_l\big) \tag{13}$$

where $*$ is the 2D convolution defined by Equation 12, while $g(\cdot)$ is the activation function. Typically, the rectifier activation function is used for CNNs, which is defined by $g(x) = \max(0, x)$. Kernels using the rectifier are called ReLU (Rectified Linear Unit) and have been shown to greatly accelerate the convergence during the training process compared to other activation functions. Other common activation functions are the hyperbolic tangent function (tanh), $g(x) = \frac{2}{1+e^{-2x}} - 1$, and the sigmoid activation $g(x) = \frac{1}{1+e^{-x}}$.

In order to form a richer representation of the input signal, commonly, multiple filters are stacked so that each hidden layer consists of multiple feature maps, $\{h^{(l)},\ l = 0, \ldots, L\}$ (e.g., $L = 64, 128, \ldots$). The number of filters per layer is a tunable parameter or hyper-parameter. Other tunable parameters are the filter size, the number of layers, etc. The selection of values for hyper-parameters may be quite difficult, and finding them is commonly as much an art as it is a science. An optimal choice may only be feasible by trial and error. The filter sizes are selected according to the input data size so as to have the right level of granularity that can create abstractions at the proper scale. For instance, for a 2D square matrix input, such as spectrograms, common choices are 3 × 3, 5 × 5, 9 × 9, etc. For a wide matrix, such as a real-valued representation of the complex I and Q samples of the wireless signal in $\mathbb{R}^{2 \times N}$, suitable filter sizes may be 1 × 3, 2 × 3, 2 × 5, etc.

The penultimate layer in a CNN consists of neurons that are fully-connected with all feature maps in the preceding layer. Therefore, these layers are called fully-connected or dense layers. The very last layer is a softmax classifier, which computes the posterior probability of each class label over $K$ classes as

$$\hat{y}_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}, \quad i = 1, \ldots, K \tag{14}$$

That is, the scores $z_i$ computed at the output layer, also called logits, are translated into probabilities. A loss function, $l$, is calculated on the last fully-connected layer that measures the difference between the estimated probabilities, $\hat{y}_i$, and the one-hot encoding of the true class labels, $y_i$. The CNN parameters, $\theta$, are obtained by minimizing the loss function on the training set $\{x_i, y_i\}_{i \in S}$ of size $m$,

$$\min_{\theta} \sum_{i \in S} l(\hat{y}_i, y_i) \tag{15}$$

where $l(\cdot)$ is typically the mean squared error $l(y, \hat{y}) = \|y - \hat{y}\|_2^2$ or the categorical cross-entropy $l(y, \hat{y}) = \sum_{i=1}^{m} y_i \log(\hat{y}_i)$, for which a minus sign is often added in front to get the negative log-likelihood.

To control over-fitting, typically regularization is used in combination with dropout, which is a new, extremely effective technique that "drops out" a random set of activations in a layer. Each unit is retained with a fixed probability $p$, typically chosen using a validation set, or set to 0.5, which has been shown to be close to optimal for a wide range of applications [25].

C. DEEP LEARNING FROM SPECTRUM DATA
Intelligence capabilities will be of paramount importance in the development of future wireless communication systems to allow them to observe, learn and respond to their complex and dynamic operating environment. Figure 2 shows a processing pipeline for realizing intelligent behaviour using deep learning in an end-to-end learning from spectrum data setup. The pipeline consists of:
FIGURE 3. End-to-end learning processing chain to obtain radio spectrum feature vectors.
sent to the digital to analog converter module (D/A) where the waveform is transformed into an analog continuous time signal, $s_b(t)$. The resulting signal is a baseband signal that is frequency shifted by the carrier frequency $f_c$ to produce the wireless signal $s(t)$ that is defined by

$$s(t) = \Re\{s_b(t)\, e^{j 2\pi f_c t}\} = \Re\{s_b(t)\}\cos(2\pi f_c t) - \Im\{s_b(t)\}\sin(2\pi f_c t) \tag{16}$$

where $s(t)$ is a real-valued bandpass signal with center frequency $f_c$, while $s_b(t) = \Re\{s_b(t)\} + j\,\Im\{s_b(t)\}$ is the baseband complex envelope of $s(t)$.

2) WIRELESS CHANNEL
The wireless channel is characterised by the variations of the channel strength over time and over frequency. The variations are modeled as (i) large-scale fading, which characterizes the path loss of the channel as a function of distance and shadowing by large objects such as buildings and hills, and (ii) small-scale fading, which models constructive and destructive interference of the multiple propagation paths between the transmitter and receiver. The channel effects can be modeled as a linear time-varying system described by a complex finite impulse response (FIR) filter $h(t, \tau)$. If $r(t)$ is the signal at the channel output, the input/output relation is given by:

$$r(t) = s(t) * h(t, \tau) \tag{17}$$

and hardware imperfections of the transmitter and receiver. Typical hardware related impairments are:
• Noise caused by the resistive components such as the receiver antenna. This thermal noise may be modelled as additive white Gaussian noise (AWGN), $n \sim \mathcal{N}(0, \sigma^2)$.
• Frequency offset caused by the slightly different local oscillator (LO) signal frequencies at the transmitter, $f_c$, and receiver, $f_c'$.
• Phase noise, $\varphi(t)$, caused by the frequency drift in the LOs used to demodulate the received wireless signal. It causes the angle of the LO signals to drift around its intended instantaneous phase $2\pi f_c t$.
• Timing drift caused by the difference in sample rates at the receiver and transmitter.

The received wireless signal model can be given by $r(t) = \Re\{r_b(t)\, e^{j 2\pi f_c t}\}$, where $r_b(t)$ is the baseband complex envelope defined by

$$r_b(t) = \frac{1}{2}\big(s_b(t) * h_b(t, \tau)\big)\, e^{j 2\pi (f_c' - f_c) t + \varphi(t)} + n(t) \tag{18}$$

where $h_b(t, \tau)$ is the baseband channel equivalent with $l$ distinct propagation paths, each characterised by a time varying path attenuation $\alpha_i(t, \tau_i)$ and path delay $\tau_i$, given by

$$h_b(t, \tau) = \sum_{i=0}^{l} \alpha_i(t, \tau)\, e^{j 2\pi f_c \tau_i(t)}\, \delta(\tau - \tau_i(t)) \tag{19}$$
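As a rough illustration, a discrete-time version of the received-signal model in Equations 18 and 19 can be simulated in a few lines of NumPy. This is a minimal sketch rather than the channel simulator used for the datasets: the function name `apply_impairments`, the static tap vector standing in for the baseband channel, the random-walk phase noise model and all parameter values are assumptions made for illustration, and timing drift is omitted.

```python
import numpy as np

def apply_impairments(s_b, fs, freq_offset_hz, snr_db, taps,
                      phase_noise_std=0.0, seed=0):
    """Hypothetical helper: apply multipath, LO offset, phase noise and AWGN."""
    rng = np.random.default_rng(seed)
    n = np.arange(len(s_b))
    # Multipath: FIR filtering with complex taps (a time-invariant stand-in
    # for the baseband channel h_b of Equation 19).
    faded = np.convolve(s_b, taps, mode="full")[:len(s_b)]
    # Phase noise as a Gaussian random walk (an assumption, not from the text).
    phi = np.cumsum(rng.normal(0.0, phase_noise_std, len(s_b)))
    # Frequency offset between transmitter and receiver LOs plus phase noise.
    rotated = 0.5 * faded * np.exp(1j * (2 * np.pi * freq_offset_hz * n / fs + phi))
    # Complex AWGN scaled so that signal power / noise power matches snr_db.
    sig_power = np.mean(np.abs(rotated) ** 2)
    noise_power = sig_power / 10 ** (snr_db / 10)
    noise = np.sqrt(noise_power / 2) * (rng.standard_normal(len(s_b))
                                        + 1j * rng.standard_normal(len(s_b)))
    return rotated + noise

# Usage: 4096 random QPSK symbols through a two-path channel at 10 dB SNR.
rng = np.random.default_rng(1)
bits = rng.integers(0, 2, size=(4096, 2))
s_b = ((2 * bits[:, 0] - 1) + 1j * (2 * bits[:, 1] - 1)) / np.sqrt(2)
r_b = apply_impairments(s_b, fs=1e6, freq_offset_hz=500.0, snr_db=10.0,
                        taps=np.array([1.0, 0.3 + 0.2j]), phase_noise_std=0.01)
```

Setting `taps=np.array([1.0])`, a zero frequency offset and a very high SNR reduces the output to the scaled input, which is a convenient sanity check before adding impairments one at a time.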
the analog to digital (A/D) converter, which samples the continuous-time signal at a rate $f_s = 1/T_s$ samples per second and generates the discrete version $r_n$. The discrete signal $r_n = r[nT_s]$ consists of two components, the in-phase, $r_I$, and quadrature component, $r_Q$, i.e.

$$r_n := r[n] = r_I[n] + j\, r_Q[n] \tag{20}$$

Suppose we sample for a period $T$ and collect a batch of $N$ samples. The signal samples $r[n] \in \mathbb{C}$, $n = 0, \ldots, N-1$, are a time-series of complex raw samples which may be represented as a data vector. The $k$-th data vector can be denoted as

$$r_k = \big[r[0], \ldots, r[N-1]\big]^T \tag{21}$$

These data vectors $r_k$ are windowed or segmented representations of the received continuous sample stream, similar to what is seen in audio signal processing. They carry information for assessing which type of wireless signal is sensed. This may be the type of modulation, the type of wireless technology, interferer, etc.

C. WIRELESS SIGNAL REPRESENTATION
After collecting the $k$-th data vector, the ML receiver baseband processing chain transforms it into a new representation suitable for training. That is, the $k$-th data vector $r_k \in \mathbb{C}^N$ is translated into the $k$-th feature vector $x_k \in \mathbb{R}^N$

$$r_k \mapsto x_k \tag{22}$$

This paper considers three simple data representations. The first is a real-valued equivalent of the raw complex temporal wireless signal inspired by the results in [9]. The second is based on the amplitude and phase of the raw wireless signal, similar to the one used in the work of Selim et al. [10] for identifying radar signals. The last is a frequency domain representation inspired by the work of Danev and Capkun [28], which showed that frequency-based features outperform their time-based equivalents for wireless device identification. Each data representation snapshot has a fixed length of $N$ that can be used for deep learning to extract high level features for wireless signal identification.

Transformation 2 (A/φ vector): The A/φ vector is a mapping from the raw complex data vector $r_k \in \mathbb{C}^N$ into two real-valued vectors, one that represents its phase, $\phi$, and one that represents its magnitude $A$, i.e.

$$x_k^{A/\phi} = \begin{bmatrix} x_A^T \\ x_\phi^T \end{bmatrix} \tag{26}$$

where $x_k^{A/\phi} \in \mathbb{R}^{2 \times N}$, and the phase, $x_\phi \in \mathbb{R}^N$, and magnitude vectors, $x_A \in \mathbb{R}^N$, have the elements

$$x_{\phi_n} = \arctan\!\left(\frac{r_{q_n}}{r_{i_n}}\right), \qquad x_{A_n} = \left(r_{q_n}^2 + r_{i_n}^2\right)^{1/2}, \qquad n = 0, \ldots, N-1 \tag{27}$$

In short, this may be written as

$$f : \mathbb{C}^N \to \mathbb{R}^{2 \times N} \tag{28}$$

$$r_k \mapsto x_k^{A/\phi} \tag{29}$$

Transformation 3 (FFT vector): The FFT vector is a mapping from the raw time-domain complex data vector $r_k \in \mathbb{C}^N$ into its frequency-domain representation vector consisting of two sets of real-valued data vectors, one that carries the real component of its complex FFT, $x_{F_{re}}$, and one that holds the imaginary component of its FFT, $x_{F_{im}}$. That is

$$x_k^{F} = \begin{bmatrix} x_{F_{re}}^T \\ x_{F_{im}}^T \end{bmatrix} \tag{30}$$

The translation to the frequency domain is performed by a Fast Fourier Transform (FFT) denoted by $\mathcal{F}$ so that

$$\mathcal{F} : r_k \mapsto w \tag{31}$$

$$x_{F_{re}} = \Re\{w\} \tag{32}$$

$$x_{F_{im}} = \Im\{w\} \tag{33}$$

Here, $w \in \mathbb{C}^N$ and $x_{F_{re}}, x_{F_{im}} \in \mathbb{R}^N$, while $\Re\{\cdot\}$ and $\Im\{\cdot\}$ can be conceived as operators giving the real and imaginary parts of a complex vector, respectively. Thus, the resulting FFT data vector is $x_k^F \in \mathbb{R}^{2 \times N}$. In short, this may be denoted as
The motivation behind using these three transformations is
to train three deep learning models where: one will explore
the raw data to discover the patterns and temporal features
solely from raw samples, one will see the amplitude and
phase information in the time domain, while the third will
see the frequency domain representation to perform feature
extraction in the frequency space.
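The three transformations can be sketched in NumPy as follows. This is an illustrative sketch, not the authors' implementation: the function names are hypothetical, the row ordering of the IQ representation (in-phase first, quadrature second) is assumed, and `np.arctan2` is used in place of the plain arctangent of Equation 27 so that the phase keeps its quadrant and a zero in-phase sample does not cause a division by zero.

```python
import numpy as np

def iq_vector(r_k):
    """2 x N real-valued view of the raw samples: in-phase row, quadrature row."""
    return np.stack([r_k.real, r_k.imag])

def amp_phase_vector(r_k):
    """2 x N amplitude/phase view: magnitude row, phase row (Equation 26)."""
    return np.stack([np.abs(r_k), np.arctan2(r_k.imag, r_k.real)])

def fft_vector(r_k):
    """2 x N frequency-domain view: real and imaginary parts of the FFT."""
    w = np.fft.fft(r_k)
    return np.stack([w.real, w.imag])

# Usage: one snapshot of N = 128 complex samples (random placeholder data).
rng = np.random.default_rng(0)
r_k = rng.standard_normal(128) + 1j * rng.standard_normal(128)
x_iq, x_ap, x_f = iq_vector(r_k), amp_phase_vector(r_k), fft_vector(r_k)
```

All three outputs have the shape (2, 128), so they can be fed to the same network input layer.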
FIGURE 5. Constellation diagram, Amplitude and Phase signal time plot for various modulation schemes. (a) BPSK. (b) QPSK. (c) 8PSK. (d) QAM16. (e) QAM64. (f) CPFSK. (g) GFSK. (h) PAM4.

We investigate how the choice of data representation influences the classification accuracy. The data representations have been carefully designed so that all of them create a vector of the same dimension and type in $\mathbb{R}^{2 \times N}$. The reason for that is to obtain a unified vector shape, which makes it possible to use the same CNN architecture for training on all three data representations and for different use cases.
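Because every representation has the shape 2 × N, a single network definition can consume any of them. The following NumPy sketch illustrates this with one convolutional layer (Equations 12 and 13), a dense softmax layer (Equation 14) and the cross-entropy loss. It is a toy forward pass under assumed sizes (four 1 × 3 filters, K = 11 classes, random untrained weights), not the architecture evaluated in this paper, and all function names are hypothetical.

```python
import numpy as np

def conv2d_valid(x, w):
    """2D convolution of Equation 12 (kernel flipped), 'valid' region only."""
    kh, kw = w.shape
    w_flipped = w[::-1, ::-1]
    out = np.empty((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w_flipped)
    return out

def relu(a):
    return np.maximum(0.0, a)

def softmax(z):
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / np.sum(e)

def forward(x, filters, biases, dense_w, dense_b):
    """Feature maps g(W * x + b), flattened and passed to a softmax layer."""
    maps = [relu(conv2d_valid(x, w) + b) for w, b in zip(filters, biases)]
    features = np.concatenate([m.ravel() for m in maps])
    return softmax(dense_w @ features + dense_b)

def cross_entropy(y_onehot, y_hat):
    """Negative log-likelihood of the true class."""
    return -np.sum(y_onehot * np.log(y_hat))

# Usage with hypothetical sizes: N = 128 samples, 4 filters of size 1x3, K = 11.
rng = np.random.default_rng(0)
N, K = 128, 11
x = rng.standard_normal((2, N))  # stands in for any of the 2 x N representations
filters = [rng.standard_normal((1, 3)) * 0.1 for _ in range(4)]
biases = np.zeros(4)
dense_w = rng.standard_normal((K, 4 * 2 * (N - 2))) * 0.01
dense_b = np.zeros(K)
y_hat = forward(x, filters, biases, dense_w, dense_b)
loss = cross_entropy(np.eye(K)[3], y_hat)  # true class index 3 chosen arbitrarily
```

In practice the weights would be learned by backpropagation in a deep learning framework; the explicit loops here only make the arithmetic of the equations visible.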
D. WIRELESS SIGNAL CLASSIFICATION
The problem of identifying the wireless signals from spectrum data can be treated as a data-driven machine learning classification problem. In order to apply ML techniques to this setup, as described in Section III-A, the wireless communication problem has to be formulated as a parametric estimation problem where certain parameters are unknown and need to be estimated.

Given a set of $K$ wireless signals to be detected, the problem of identifying a signal from this set turns into a $K$-class classification problem. Suppose a data measurement point knows the transmitted signal type (e.g. modulation type, interfering emitter type, etc.) for a time period $t = [0, T)$ (i.e. a "training period") and collects several complex baseband time series of $n$ measurements for each signal type into a data vector $r_k$, as described in Section IV-B. In total, $m$ snapshots for the data vectors $r_k$ are collected. These data vectors capture emitted signals that contain distinctive features. In order to extract these features, each data vector is transformed into a feature vector, $x_k$, according to the data transformations introduced in Section IV-C and the results are stacked into an observation matrix $X \in \mathbb{R}^{m \times n}$. Each data vector is further annotated with the corresponding wireless signal type in the form of a discrete one-hot encoded vector $y_k \in \mathbb{R}^K$, $k = 1, \ldots, m$. The obtained data pairs, $\{(x_k, y_k),\ k = 1, \ldots, m\}$, form a dataset suitable to estimate the parameters, $\theta$, that characterize the wireless signal classifier, $f$.

It is instructive to note that the training phase presumes a priori information about the type of wireless signal that was used at the transmitter. However, once the classifier is trained this information will no longer be necessary and the signals may be automatically identified by the model. That is, for the $i$-th spectrum data vector input, $x_i$, the predictor's last layer can automatically output an estimate of the probability $P(y_i = k \mid x_i; \theta)$, where $k$ ranges from 0 to $K - 1$. That is, a score for each class. Finally, the predicted class is then the one with the highest score, i.e.

$$\hat{y}_i = \arg\max_{k} P(y_i = k \mid x_i; \theta)$$

FIGURE 6. Frequency magnitude spectrum for various modulation schemes. (a) BPSK. (b) QPSK. (c) 8PSK. (d) QAM16. (e) QAM64. (f) CPFSK. (g) GFSK. (h) PAM4.

V. EVALUATION SETUP
To evaluate end-to-end learning from spectrum data, we train CNN wireless signal classifiers for two use cases: (i) Radio signal modulation recognition and (ii) Wireless interference identification, for different wireless data representations.

Radio signal modulation recognition relates to the problem of identifying the modulation structure of the received wireless signal in spectrum monitoring tasks, as a step towards understanding what type of communication scheme and emitter is present. Modulation recognition is vital for radio spectrum regulation and in dynamic spectrum access applications.

Wireless interference identification is the task of identifying the type of coexisting wireless emitter that is operating in the same frequency band. This is essential for effective interference mitigation and coexistence management in unlicensed frequency bands such as, for example, the 2.4 GHz industrial, scientific and medical (ISM) band shared by heterogeneous wireless communication systems.

For each task the CNNs were trained on three characteristic data representations: IQ vectors, Amplitude/Phase vectors and FFT vectors, as introduced in Section IV-C. As a result

A. DATASETS DESCRIPTION
1) RADIO MODULATION RECOGNITION
To evaluate end-to-end learning for radio modulation type identification, we consider measurements of the received wireless signal for various modulation formats from the "RadioML 2016.10a Modulation" dataset [9]. Specifically, for all experiments performed in this paper we used labelled data vectors for the following modulation formats: BPSK, QPSK, 8-PSK, 16-QAM, 64-QAM, CPFSK, GFSK, 4-PAM, WBFM, AM-DSB, AM-SSB. The data vectors, $x_k$, were collected at a sampling rate of 1 MS/s in $N = 128$ sample batches, each containing between 8 and 16 symbols corrupted by random noise, time offset, phase, and wireless channel distortions as described by the channel model in IV-A. One-hot encoding is used to create a discrete set of 11 class labels corresponding to the 11 considered modulations, so that the response variable forms a binary 11-vector $y_k \in \mathbb{R}^{11}$. The task of modulation recognition is then an 11-class classification problem. In total, 220,000 data vectors $x_k \in \mathbb{R}^{2 \times 128}$ consisting of I and Q samples are used.

2) WIRELESS INTERFERENCE IDENTIFICATION IN ISM BANDS
The rise of heterogeneous wireless technologies operating in the unlicensed ISM bands has caused severe communication challenges due to cross-technology interference, which adversely affects the performance of wireless networks. To tackle these challenges, novel agile methods that can assess the channel conditions are needed. We showcase end-to-end learning as a promising approach that can determine whether communication is feasible over the wireless link by accurately identifying cross-technology interference. Specifically, the "Wireless interference" dataset [12] is used, which consists of measurements gathered from standardized wireless communication systems based on the IEEE 802.11b/g (WiFi), IEEE 802.15.4 (Zigbee) and IEEE 802.15.1 (Bluetooth) standards, operating in the 2.4 GHz frequency band. The dataset is labelled according to the allocated frequency channel and the corresponding wireless technology, resulting in 15 different classes. Compared to the modulation recognition dataset, this dataset consists of measurements gathered assuming a communication channel model with fewer channel impairments. In particular, a flat fading channel with additive white Gaussian noise was assumed. I and Q samples were collected at a sampling rate of 10 MS/s in batches of 128 each, thereby capturing 1 to 12 symbols for each utilized wireless technology depending on the symbol duration. In total, 225,225 snapshots were collected.
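For concreteness, the one-hot label encoding used for both datasets and the argmax decision rule of Section IV-D can be sketched as below. The class ordering in `MODULATIONS` and the helper names `one_hot` and `predict` are assumptions for illustration; the datasets themselves may index their classes differently.

```python
import numpy as np

# The 11 modulation labels listed above; the index order is an assumption.
MODULATIONS = ["BPSK", "QPSK", "8-PSK", "16-QAM", "64-QAM", "CPFSK",
               "GFSK", "4-PAM", "WBFM", "AM-DSB", "AM-SSB"]

def one_hot(labels, classes):
    """Map each label to a binary vector with a single 1 at its class index."""
    idx = np.array([classes.index(label) for label in labels])
    y = np.zeros((len(labels), len(classes)))
    y[np.arange(len(labels)), idx] = 1.0
    return y

def predict(probs, classes):
    """Pick, for every row of class probabilities, the highest-scoring class."""
    return [classes[k] for k in np.argmax(probs, axis=1)]

# Usage: encode two labels, then decode them again with the argmax rule.
y = one_hot(["QPSK", "GFSK"], MODULATIONS)  # shape (2, 11)
predicted = predict(y, MODULATIONS)
```

The same two helpers apply unchanged to the 15-class interference dataset once a list of its class names is supplied.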
The intermediate statistics are accumulated over all instances in the test set and used to derive three further performance metrics: precision (P), recall (R) and F1 score:

$$P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN} \tag{40}$$

$$F1\ \text{score} = 2 \times \frac{\text{precision} \times \text{recall}}{\text{precision} + \text{recall}} \tag{41}$$

Precision, recall and F1 score are per-class performance metrics. In order to obtain one measure that quantifies the overall performance of the classifier, multiple per-class performance measures are combined using a prevalence-weighted macro-average across the class metrics, $P_{avg}$, $R_{avg}$ and $F1_{avg}$. For a detailed overview of the per-class performance the confusion matrix is used.

TABLE 2. Performance comparison for the trained CNN signal classifier models for three SNR scenarios.

E. NUMERICAL RESULTS
1) CLASSIFICATION PERFORMANCE
The CNN network described in Table 1 is trained on three data representations for two wireless signal identification problems. Table 2 provides the averaged performance for the six classifiers, that is, the prevalence-weighted macro-average of precision, recall and F1 score under three SNR scenarios: high (SNR = 18 dB), medium (SNR = 0 dB) and low (SNR = −8 dB).

We observe that the models for interference classification show better performance compared to the modulation recognition case. For high SNR conditions, the $\mathrm{CNN}^{IF}$ models achieve a $P_{avg}$, $R_{avg}$ and $F1_{avg}$ between 0.98 and 0.99. For medium SNR the metrics are in the range of 0.94 to 0.99, while under low SNR conditions the performance slightly degrades to 0.81-0.90. The $\mathrm{CNN}^{M}$ models show less robustness to varying SNR conditions, and in general achieve lower classification performance for all scenarios. In particular, under high SNR conditions, depending on the used data representation, the achieved $P_{avg}$, $R_{avg}$ and $F1_{avg}$ are in the range of 0.67-0.86. For medium SNR, the performance degrades more than for the $\mathrm{CNN}^{IF}$ models, with a $P_{avg}$, $R_{avg}$ and $F1_{avg}$ in the range of 0.59-0.75. Under low SNR, the $\mathrm{CNN}^{M}$ models show poor performance with the metric values in the range of 0.22-0.36.

This may be explained by the different channel models used for generating the datasets for the two case studies, and the type of signals that need to be discriminated in each problem. For instance, for the IF case a simple channel model with flat fading was considered, while for modulation recognition the channel model was a time-varying multipath fading channel and other transceiver impairments were also taken into account. Hence, the modulation recognition dataset used a more realistic channel model in the data collection process. However, this impacts the classification performance because it is more challenging to design a robust signal classifier for this case compared to the channel condition considered in the IF classification problem. Furthermore, the signals that are classified for IF detection have different characteristics by design. In particular, they use different medium access schemes, channel bandwidths and modulation techniques, which makes it easier for the classifier to differentiate them. In contrast, the selected modulation recognition signals are more similar to each other, because subsets of modulations are based on similar design principles (e.g. all are single carrier modulations).

To understand the results better, confusion matrices for the $\mathrm{CNN}^{M}_{IQ}$, $\mathrm{CNN}^{M}_{A/\phi}$ and $\mathrm{CNN}^{M}_{F}$ models are presented in Figure 7 for the case of SNR = 6 dB. It can be seen that the classifiers show good performance by discriminating AM-DSB, AM-SSB, BPSK, CPFSK, GFSK and PAM4 with high accuracy for all three data representations. The main discrepancies are that of QAM16 misclassified as QAM64, which can be explained by the underlying dataset. QAM16 is a subset of QAM64, making it difficult for the classifier to differentiate them. It can be further noticed that the amplitude/phase information helped the model better discriminate QAM16/QAM64, leading to a clearer diagonal for the $\mathrm{CNN}^{M}_{A/\phi}$ compared to $\mathrm{CNN}^{M}_{IQ}$. There are further difficulties in separating AM-DSB and WBFM signals. This confusion may be caused by periods of absence of the signal, as the modulated signals were created from real audio streams. In case of using the frequency spectrum data, it can be noticed that the $\mathrm{CNN}^{M}_{F}$ classifier confuses mostly QPSK, 8PSK, QAM16 and QAM64, which is due to their similarities in the frequency domain after channel distortions, making the received symbols indiscernible from each other.

2) NOISE SENSITIVITY
In this section, we evaluate the detection performance for the CNN signal classifiers under different noise levels. This makes it possible to investigate the communication range over which the classifiers can be effectively used. To estimate the sensitivity to noise the same testing sets were used labelled with SNR
classifiers, where the CNN^IF_F showed best performance during all SNR scenarios. In
particular, for low SNR scenarios significant improvements can be noticed compared to
the CNN^IF_A/φ and CNN^IF_IQ models, with a performance gain of at least ∼4 dB and a
classification accuracy improvement of at least ∼9%. Schmidt et al. [12] used IQ and FFT
data representations and reported similar results as our CNN^IF_IQ and CNN^IF_F models.
However, again we noticed that the amplitude/phase representation is beneficial for
discriminating signals compared to raw IQ data. But the IF identification classifier
performed best on FFT data representations. This may be explained by the fact that the
wireless signals from the ISM band standards (ZigBee, WiFi and Bluetooth) have more
expressive features in the frequency domain, as they have different frequency spectrum
characteristics in terms of bandwidth and modulation/spreading method.
Examples of other existing research attempts that study the application of CNNs to radio
signal identification are [10] and [11]. Selim et al. [10] trained a CNN with
5 convolutional and 2 fully connected layers to identify radar signals based on amplitude
and phase shift data. Compared to the methodology presented in our work, Selim et al. [10]
solved a binary classification problem, and as such the model is evaluated using the
probability of radar pulse detection as the metric. Akeret et al. [11] train a CNN based
on the U-Net [31] architecture to detect RF interference in radio astronomy applications.
They use different performance metrics, such as the area under the curve (AUC) and the
receiver operating characteristic (ROC) curve, without a noise sensitivity performance
analysis (model accuracy vs. SNR).

3) TAKEAWAYS
End-to-end learning is a powerful tool for data-driven spectrum monitoring applications.
It can be applied to various wireless signals to effectively detect the presence of radio
emitters in a unified way without requiring design of expert features. Experiments have
shown that the performance of wireless signal classifiers depends on the data
representation used. This suggests that investigating several data representations is
important to arrive at accurate wireless signal classifiers for a particular task.
Furthermore, the choice of data representation depends on the specifics of the problem,
i.e. the considered wireless signal types for classification. Signals within a dataset
that exhibit similar characteristics in one data representation are more difficult to
discriminate, which puts a higher burden on the model learning procedure. Choosing the
right wireless data representation can notably increase the classification performance,
for which domain knowledge about the specifics of the underlying signals targeted in the
spectrum monitoring application can assist. Additionally, the performance of the
classifier can be improved by increasing the quality of the wireless signal dataset, by
adding more training examples, more variation among the examples (e.g. varying channel
conditions), and tuning the model hyper-parameters.

VI. OPEN CHALLENGES
Despite the encouraging research results, a deep learning-based end-to-end learning
framework for spectrum utilization optimization is still in its infancy. In the following
we discuss some of the most important challenges posed by this exciting interdisciplinary
field.

A. SCALABLE SPECTRUM MONITORING
The first requirement for a cognitive spectrum monitoring framework is to have an
infrastructure that will support scalable spectrum data collection, transfer and storage.
In order to obtain a detailed overview of the spectrum use, the end-devices will be
required to perform distributed spectrum sensing [32] over a wide frequency range and
cover the area of interest. In order to limit the data overhead caused by huge amounts of
I and Q samples that are generated by monitoring devices, the predictive models can be
pushed to the end devices themselves. Recently, Rajendran et al. [33] proposed
Electrosense, an initiative for large-scale spectrum monitoring in different regions of
the world using low-cost sensors and providing the processed spectrum data as open
spectrum data. Access to large datasets is crucial for evaluating research advances and
enabling a playground for wireless communication researchers interested in acquiring a
deeper knowledge of spectrum usage and in extracting meaningful knowledge that can be used
to design better wireless communication systems.

B. SCALABLE SPECTRUM LEARNING
The heterogeneity of technologies operating in different radio bands requires continuous
monitoring of multiple frequency bands, making the volume and velocity of radio spectrum
data several orders of magnitude higher compared to the typical data seen in other
wireless communication systems such as wireless sensor networks (e.g. temperature,
humidity reports, etc.). In order to handle this large volume of data and extract
meaningful information over the entire spectrum, a scalable platform for processing,
analysing and learning from big spectrum data has to be designed and implemented [3], [34].
Efficient data processing and storage systems and algorithms for massive spectrum data
analytics [35] are needed to extract valuable information from such data and incorporate
it into the spectrum decision/policy process in real-time.

C. FLEXIBLE SPECTRUM MANAGEMENT
One of the main communication challenges for 5G will be inter-cell and cross-technology
interference. To support spectrum decisions and policies in such a complex system, 5G
networks need to support an architecture for flexible spectrum management.
Software-ization at the radio level will be a key enabler for flexible spectrum
management as it allows automation for the collection of spectrum data, flexible control
and reconfiguration of cognitive radio elements and parameters. There are several
individual works that focused on this issue. Some initiatives for embedded devices are
WiSCoP [36],
Atomix [37] and [38]. Recently, there is also a growing interest in academia and industry
to apply Software Defined Networking (SDN) and Network Function Virtualization (NFV)
to wireless networks [39]. Initiatives such as SoftAir [40], Cloud RAN [41],
OpenRadio [42] and several others are still at the conceptual or prototype level. To
bring flexible spectrum management strategies to realization and commercial deployment,
a great deal of standardization effort is still required.

D. SPECTRUM PRIVACY
The introduction of intelligent wireless systems raises several privacy issues. The
spectrum will be monitored via heterogeneous radios including wireless sensor networks
(WSNs), radio-frequency identification (RFID), cellular phones and others, which may lead
to misuse of the applications and cause severe privacy-related threats. Therefore, privacy
is required at the spectrum data collection level. As spectrum data may be shared along
the way, privacy has to be maintained also at data sharing levels. Thus, data
anonymization, restricted data access, proper authentication and strict control of
intelligent radio users are required.

VII. CONCLUSION
This paper presents a comprehensive and systematic introduction to end-to-end learning
from spectrum data - a deep learning based unified approach for realizing various
wireless signal identification tasks, which are the main building blocks of spectrum
monitoring systems. The approach develops around the systematic application of deep
learning techniques to obtain accurate wireless signal classifiers in an end-to-end
learning pipeline. In particular, convolutional neural networks (CNNs) lend themselves
well to this setting, because they consist of many layers of processing units capable of
(i) automatically extracting non-linear and more abstract wireless signal features that
are invariant to local spectral and temporal variations, and (ii) training wireless signal
classifiers that can outperform traditional approaches.
With the aim to raise awareness of the potential of this emerging interdisciplinary
research area, first, machine learning, deep learning and CNNs were briefly introduced and
a reference model for their application to spectrum monitoring scenarios was proposed.
Then, a framework for end-to-end learning from spectrum data was presented. In particular,
wireless data collection and the design of wireless signal features and classifiers
suitable for several wireless signal identification tasks are elaborated. Three common
wireless signal representations were defined: the raw IQ temporal wireless signal, the
time domain amplitude and phase information data, and the spectral magnitude
representation. The presented methodology was validated on two active wireless signal
identification research problems: (i) modulation recognition, crucial for dynamic spectrum
access applications, and (ii) wireless interference identification, essential for
effective interference mitigation strategies in unlicensed bands. Experiments have shown
that CNNs are promising feature learning and function approximation techniques,
well-suited for different wireless signal classification problems. Furthermore, the
presented results indicated that for the wireless communication domain investigating
different wireless data representations is important to determine the right representation
that exhibits discriminative characteristics for the signals that need to be classified.
Specifically, in the modulation recognition case study for medium-high SNR the CNN model
trained on amplitude/phase representations outperformed the other two models with a 2% and
10% performance improvement, while for low SNR conditions the model trained on IQ data
representations showed best performance. For the task of detecting interference, the model
trained on FFT data outperformed amplitude/phase and IQ data representation models by up
to 20% for low SNR conditions, while for medium-high SNR the improvement in classification
accuracy was up to 5%.
These results demonstrate the importance of choosing both the correct data representation
and the correct machine learning approach, both of which are systematically introduced in
this paper. By following the proposed methodology, deeper insights can be obtained
regarding the optimality of data representations for different research domains. As such,
we envisage this paper to empower and guide machine learning/signal processing
practitioners and wireless engineers to design new innovative research applications of
end-to-end learning from spectrum data that address issues related to cross-technology
coexistence, inefficient spectrum utilization and regulation.

ACKNOWLEDGEMENTS
The authors would like to thank Associate Prof. Gerard J. M. Janssen for his insightful
comments and Schmidt et al. [12] for sharing the “Wireless interference” dataset.

REFERENCES
[1] M. Höyhtyä et al., “Spectrum occupancy measurements: A survey and use of interference maps,” IEEE Commun. Surveys Tuts., vol. 18, no. 4, pp. 2386–2414, 4th Quart., 2016.
[2] A. Hithnawi, H. Shafagh, and S. Duquennoy, “Understanding the impact of cross technology interference on IEEE 802.15.4,” in Proc. 9th ACM Int. Workshop Wireless Netw. Testbeds, Experim. Eval. Characterization, 2014, pp. 49–56.
[3] G. Ding, Q. Wu, J. Wang, and Y.-D. Yao. (2014). “Big spectrum data: The new resource for cognitive wireless networking.” [Online]. Available: https://arxiv.org/abs/1404.6508
[4] U. Müller, J. Ben, E. Cosatto, B. Flepp, and Y. L. Cun, “Off-road obstacle avoidance through end-to-end learning,” in Proc. Adv. Neural Inf. Process. Syst., 2006, pp. 739–746.
[5] E. Axell, G. Leus, E. G. Larsson, and H. V. Poor, “Spectrum sensing for cognitive radio: State-of-the-art and recent advances,” IEEE Signal Process. Mag., vol. 29, no. 3, pp. 101–116, May 2012.
[6] K. Kim, I. A. Akbar, K. K. Bae, J.-S. Um, C. M. Spooner, and J. H. Reed, “Cyclostationary approaches to signal detection and classification in cognitive radio,” in Proc. 2nd IEEE Int. Symp. New Frontiers Dyn. Spectr. Access Netw. (DySPAN), Apr. 2007, pp. 212–215.
[7] A. Fehske, J. Gaeddert, and J. H. Reed, “A new approach to signal classification using spectral correlation and neural networks,” in Proc. 1st IEEE Int. Symp. New Frontiers Dyn. Spectr. Access Netw. (DySPAN), Nov. 2005, pp. 144–150.
[8] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Proc. Adv. Neural Inf. Process. Syst., 2012, pp. 1097–1105.
[9] T. J. O’Shea, J. Corgan, and T. C. Clancy, “Convolutional radio modulation recognition networks,” in Proc. Int. Conf. Eng. Appl. Neural Netw., 2016, pp. 213–226.
[10] A. Selim, F. Paisana, J. A. Arokkiam, Y. Zhang, L. Doyle, and L. A. DaSilva. (2017). “Spectrum monitoring for radar bands using deep convolutional neural networks.” [Online]. Available: https://arxiv.org/abs/1705.00462
[11] J. Akeret, C. Chang, A. Lucchi, and A. Refregier, “Radio frequency interference mitigation using deep convolutional neural networks,” Astron. Comput., vol. 18, pp. 35–39, Jan. 2017.
[12] M. Schmidt, D. Block, and U. Meier. (2017). “Wireless interference identification with convolutional neural networks.” [Online]. Available: https://arxiv.org/abs/1703.00737
[13] S. Rajendran, W. Meert, D. Giustiniano, V. Lenders, and S. Pollin. (2017). “Distributed deep learning models for wireless signal classification with low-cost spectrum sensors.” [Online]. Available: https://arxiv.org/abs/1707.08908
[14] T. O’Shea and J. Hoydis, “An introduction to deep learning for the physical layer,” IEEE Trans. Cogn. Commun. Netw., vol. 3, no. 4, pp. 563–575, Dec. 2017.
[15] S. Yao, S. Hu, Y. Zhao, A. Zhang, and T. Abdelzaher, “Deepsense: A unified deep learning framework for time-series mobile sensing data processing,” in Proc. 26th Int. Conf. World Wide Web, 2017, pp. 351–360.
[16] D. Chen, S. Yin, Q. Zhang, M. Liu, and S. Li, “Mining spectrum usage data: A large-scale spectrum measurement study,” in Proc. 15th Annu. Int. Conf. Mobile Comput. Netw., 2009, pp. 1–12.
[17] S. Haykin, “Cognitive radio: Brain-empowered wireless communications,” IEEE J. Sel. Areas Commun., vol. 23, no. 2, pp. 201–220, Feb. 2005.
[18] I. F. Akyildiz, W.-Y. Lee, M. C. Vuran, and S. Mohanty, “A survey on spectrum management in cognitive radio networks,” IEEE Commun. Mag., vol. 46, no. 4, pp. 40–48, Apr. 2008.
[19] J. Gubbi, R. Buyya, S. Marusic, and M. Palaniswami, “Internet of Things (IoT): A vision, architectural elements, and future directions,” Future Generat. Comput. Syst., vol. 29, no. 7, pp. 1645–1660, 2013.
[20] S. Gollakota, F. Adib, D. Katabi, and S. Seshan, “Clearing the RF smog: Making 802.11n robust to cross-technology interference,” ACM SIGCOMM Comput. Commun. Rev., vol. 41, no. 4, pp. 170–181, 2011.
[21] A. A. Khan, M. H. Rehmani, and A. Rachedi, “Cognitive-radio-based Internet of Things: Applications, architectures, spectrum related functionalities, and future research directions,” IEEE Wireless Commun., vol. 24, no. 3, pp. 17–25, Jun. 2017.
[22] Q. Wu et al., “Cognitive Internet of Things: A new paradigm beyond connection,” IEEE Internet Things J., vol. 1, no. 2, pp. 129–143, Apr. 2014.
[23] G. Staple and K. Werbach, “The end of spectrum scarcity [spectrum allocation and utilization],” IEEE Spectr., vol. 41, no. 3, pp. 48–52, Mar. 2004.
[24] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. Cambridge, MA, USA: MIT Press, 2016. [Online]. Available: http://www.deeplearningbook.org
[25] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A simple way to prevent neural networks from overfitting,” J. Mach. Learn. Res., vol. 15, no. 1, pp. 1929–1958, 2014.
[26] M. Kulin, C. Fortuna, E. De Poorter, D. Deschrijver, and I. Moerman, “Data-driven design of intelligent wireless networks: An overview and tutorial,” Sensors, vol. 16, no. 6, p. 790, 2016.
[27] C. Jiang, H. Zhang, Y. Ren, Z. Han, K.-C. Chen, and L. Hanzo, “Machine learning paradigms for next-generation wireless networks,” IEEE Wireless Commun., vol. 24, no. 2, pp. 98–105, Apr. 2017.
[28] B. Danev and S. Capkun, “Transient-based identification of wireless sensor nodes,” in Proc. Int. Conf. Inf. Process. Sensor Netw., 2009, pp. 25–36.
[29] F. Chollet et al. (2015). Keras. [Online]. Available: https://github.com/fchollet/keras
[30] D. P. Kingma and J. Ba. (2014). “Adam: A method for stochastic optimization.” [Online]. Available: https://arxiv.org/abs/1412.6980
[31] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” in Proc. Int. Conf. Med. Image Comput. Comput.-Assist. Intervent, 2015, pp. 234–241.
[32] W. Liu et al., “Heterogeneous spectrum sensing: challenges and methodologies,” EURASIP J. Wireless Commun. Netw., vol. 2015, p. 70, Dec. 2015.
[33] S. Rajendran et al. (2017). “Electrosense: Open and big spectrum data.” [Online]. Available: https://arxiv.org/abs/1703.09989
[34] A. Zaslavsky, C. Perera, and D. Georgakopoulos. (2013). “Sensing as a service and big data.” [Online]. Available: https://arxiv.org/abs/1301.0159
[35] A. Sandryhaila and J. M. F. Moura, “Big data analysis with signal processing on graphs: Representation and processing of massive data sets with irregular structure,” IEEE Signal Process. Mag., vol. 31, no. 5, pp. 80–90, Sep. 2014.
[36] T. Kazaz, X. Jiao, M. Kulin, and I. Moerman, “Demo: WiSCoP - Wireless sensor communication prototyping platform,” in Proc. Int. Conf. Embedded Wireless Syst. Netw., 2017, pp. 246–247.
[37] M. Bansal, A. Schulman, and S. Katti, “Atomix: A framework for deploying signal processing applications on wireless infrastructure,” in Proc. NSDI, 2015, pp. 173–188.
[38] T. Kazaz, C. Van Praet, M. Kulin, P. Willemen, and I. Moerman, “Hardware accelerated SDR platform for adaptive air interfaces,” in Proc. Workshop Future Radio Technol. (ETSI) Air Interfaces, 2016, pp. 1–26.
[39] Z. Zaidi, V. Friderikos, Z. Yousaf, S. Fletcher, M. Dohler, and H. Aghvami. (2017). “Will SDN be part of 5G?” [Online]. Available: https://arxiv.org/abs/1708.05096
[40] I. F. Akyildiz, P. Wang, and S.-C. Lin, “Softair: A software defined networking architecture for 5G wireless systems,” Comput. Netw., vol. 85, pp. 1–18, Jul. 2015.
[41] A. Checko et al., “Cloud RAN for mobile networks - A technology overview,” IEEE Commun. Surveys Tuts., vol. 17, no. 1, pp. 405–426, 1st Quart., 2015.
[42] M. Bansal, J. Mehlman, S. Katti, and P. Levis, “Openradio: A programmable wireless dataplane,” in Proc. 1st Workshop Hot Topics Softw. Defined Netw., 2012, pp. 109–114.
[43] A.-J. Van Der Veen, E. F. Deprettere, and A. L. Swindlehurst, “Subspace-based signal analysis using singular value decomposition,” Proc. IEEE, vol. 81, no. 9, pp. 1277–1308, Sep. 1993.

MERIMA KULIN received the M.Sc. degree (summa cum laude) in electrical engineering from
the Department for Telecommunications, University of Sarajevo, in 2012. She is currently
pursuing the Ph.D. degree with Ghent University. In 2013, she joined JSC Elektroprivreda,
as an Expert Associate for ICT Management and Maintenance. In 2015, she started her
research activities with the Department of Information Technology (INTEC), Ghent
University. She was actively involved in the EU H2020 WiSHFUL, eWINE, and SBO SAMURAI
research projects. Her main research interests include Internet of Things, network
architectures and protocols, cognitive radio networks, data mining, machine learning, and
self-learning networks.

TARIK KAZAZ received the M.Sc. degree (cum laude) in electrical engineering from the
Department for Telecommunications, University of Sarajevo, in 2012. He is currently
pursuing the Ph.D. research with the Circuits and Systems Research Group, Delft University
of Technology. In 2013, he joined BH Mobile, where he was a Radio Access Network Engineer,
while at the same time he was a part-time Teaching Assistant with the Faculty of
Electrical Engineering, University of Sarajevo. In 2015, he joined the Department of
Information Technology, Ghent University, as a Ph.D. Researcher. He was active in several
national and international research projects, including EU H2020 ORCA, WiSHFUL, iMinds’
IoT Strategic Research Program, and NWO SuperGPS. His main research interests are wireless
networks, signal processing for communications, software-defined radio and cognitive
radio, and hardware-software co-design.

INGRID MOERMAN received the degree in electrical engineering and the Ph.D. degree from
Ghent University, in 1987 and 1992, respectively. She was a part-time Professor with Ghent
University in 2000. She is a Staff Member at IDLab, a core research group of imec with
research activities embedded in Ghent University and University of Antwerp. She is
coordinating the research activities on mobile and wireless networking. She is leading a
research team of about 30 members at Ghent University. She has a longstanding experience
in running and coordinating national and EU research funded projects. At the European
level, she is in particular very active in the Future Connectivity Systems research area,
where she has coordinated and is coordinating several FP7/H2020 projects (CREW, WiSHFUL,
eWINE, and ORCA). She has authored or co-authored over 700 publications in international
journals or conference proceedings. Her main research interests include Internet of
Things, low power wide area networks, high-density wireless access networks, collaborative
and cooperative networks, intelligent cognitive radio networks, real-time software-defined
radio, flexible hardware/software architectures for radio/network control and management,
and experimentally supported research.

ELI DE POORTER received the master’s degree in computer science engineering from Ghent
University, Belgium, in 2006, and the Ph.D. degree from the Department of Information
Technology, Ghent University, in 2011. After obtaining his Ph.D., he received the FWO
Post-Doctoral Research Grant and is currently a Professor at the same research group,
where he is currently involved in and/or research coordinator of several national and
international projects. He is currently a Professor with Ghent University. He has authored
or co-authored over 100 papers in international journals or the proceedings of
international conferences. His main research interests include wireless network protocols,
IoT, network architectures, wireless sensor and ad hoc networks, future Internet, machine
learning and self-learning networks, and indoor localization. He is part of the program
committee of several conferences.