Paper 2

Chinese Journal of Aeronautics, (2022), 35(9): 35–48
Chinese Society of Aeronautics and Astronautics

& Beihang University
Chinese Journal of Aeronautics
[email protected]
www.sciencedirect.com
Large-scale real-world radio signal recognition with

deep learning
Ya TU a, Yun LIN a,*, Haoran ZHA a, Ju ZHANG b, Yu WANG c, Guan GUI c,
Shiwen MAO d
a
College of Information and Communication Engineering, Harbin Engineering University, Harbin 150001, China
b
College of Electronic Science and Engineering, National University of Defense Technology, Changsha 410073, China
c
College of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003,
China
d
Department of Electrical and Computer Engineering, Auburn University, Auburn 36849, USA
Received 27 February 2021; revised 6 April 2021; accepted 24 May 2021

Available online 13 October 2021
KEYWORDS Abstract In the past ten years, many high-quality datasets have been released to support the rapid
Signal recognition; development of deep learning in the fields of computer vision, voice, and natural language process-
Radio signal dataset; ing. Nowadays, deep learning has become a key research component of the Sixth-Generation wire-
Automatic Dependent less systems (6G) with numerous regulatory and defense applications. In order to facilitate the
Surveillance-Broadcast application of deep learning in radio signal recognition, in this work, a large-scale real-world radio
(ADS-B); signal dataset is created based on a special aeronautical monitoring system - Automatic Dependent
Deep learning; Surveillance-Broadcast (ADS-B). This paper makes two main contributions. First, an automatic
Recognition benchmark data collection and labeling system is designed to capture over-the-air ADS-B signals in the open
and real-world scenario without human participation. Through data cleaning and sorting, a
high-quality dataset of ADS-B signals is created for radio signal recognition. Second, we conduct
an in-depth study on the performance of deep learning models using the new dataset, as well as
comparison with a recognition benchmark using machine learning and deep learning methods.
Finally, we conclude this paper with a discussion of open problems in this area.
Ó 2021 Chinese Society of Aeronautics and Astronautics. Production and hosting by Elsevier Ltd. This is
an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
1. Introduction
* Corresponding author.
In recent years, with the rapid development of algorithms,
computation technologies, and the number of datasets, deep
E-mail address: [email protected] (Y. LIN).
learning has made significant progress in computer vision,
Peer review under responsibility of Editorial Committee of CJA.
voice and natural language analysis.1 In the implementation
of deep learning, good-quality data collection plays a crucial
role as the high performance of deep learning is mostly data-
Production and hosting by Elsevier driven. For example, ImageNet was the first high-quality data-
https://doi.org/10.1016/j.cja.2021.08.016
1000-9361 Ó 2021 Chinese Society of Aeronautics and Astronautics. Production and hosting by Elsevier Ltd.
This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
36 Y. TU et al.
set including more than 14 million images in 20,000 categories, learning. Ma et al.17 proposed a novel AMC method by using
which drove the advances of deep learning in computer vision. Cyclic Correntropy Spectrum (CCES) and deep Residual neu-
After that, the general and robust image features in ImageNet ral Network (ResNet). Experimental results confirm that the
have nurtured enormous successful computer vision deep proposed algorithm outperforms the existing designs with
learning models, such as AlexNet, VGG (Visual Geometry much higher classification accuracy. Furthermore, some
Group), and Deep Residual Network.2 In the field of speech, researchers are not only satisfied with accurate signal recogni-
the CSTR VCTK Corpus dataset consists of speech data col- tion but also consider few labeled data to achieve faster recog-
lected from 110 English speakers with various accents. This nition. Tu et al.18 applied Semi-Supervised Generative
dataset paved way for remarkable real-time Text-to-Speech Adversarial Networks (SSGAN) to deal with unlabeled signal
(TTS) deep learning models such as WaveNet and Deep data and confirmed that the semi-supervised learning method
Voice.3 The Bidirectional Encoder Representations from is an effective way to exploit unlabeled data effectively to
Transformers (BERT) has achieved great success in natural reduce over-fitting in deep learning. Wang et al.19 proposed a
language processing with the help of BooksCorpus (including novel deep learning based lightweight AMC (LightAMC)
800 M words) and English Wikipedia (including 2500 M method with smaller model sizes and faster computational
words).4 On some test datasets, BERT even outperforms aver- speed. The LightAMC method can effectively reduce model
age bilingual human translators. sizes and accelerate computation with the slight performance
Deep learning has become one of the most important loss. Lin et al.20 proposed a new filter-level pruning technique
research topics in the Sixth-Generation (6G) wireless commu- based on Activation Maximization (AM) that omits the less
nications,5,6 which has been widely used in Multi-Input Multi- important convolutional filter. The proposed method can
Output (MIMO),7 resource management8 and ultra-reliable achieve equal or higher classification accuracy than conven-
and low-latency communications.9 Automatic Modulation tional methods.
Classification (AMC) is considered as one of the most impor- According to the above survey, various deep learning based
tant techniques in 6G scenario. Hence, many researchers pay AMC methods have been proposed to improve the signal
attention to get a better recognition performance based on recognition performance. But there still exist some problems.
deep learning.5,6 Since the Convolutional Neural Network Firstly, the dataset is often generated from the simulation sys-
(CNN) is not good at learning temporal information in time tem, such as MATLAB, GNURadio, and Python. Secondly,
series, researchers have proposed other deep learning methods the number of signal categories is limited. Thirdly, the real
to dip the signal persistent features. Rajendran et al.10 pro- radio signal environment is not considered. Actually, all the
posed a new data-driven model for automatic modulation clas- defects are due to the lack of a large-scale and high-quality
sification based on Long Short-Term Memory (LSTM), which radio signal dataset in the real world. However, producing a
is useful in classifying modulation signals with different sym- good dataset requires a lot of time, money and manpower.
bol rates. The achieved accuracy of 75% on an input sample Due to the unique characteristics of radio signal, it is very dif-
length of 64 for which it was not trained, substantiates the rep- ferent to label a large-scale and high-quality dataset with the
resentation power of the model. Afan and Fan11 demonstrated manual operation. Hence, in this paper, an automatic collec-
a novel method for the AMC based on autoencoder network, tion and labeling system is designed to capture radio signals
which is trained by a nonnegativity constraint algorithm. The in the real world without human participation. The main con-
results indicate that the Autoencoder with Nonnegativity Con- tributions of this paper are summarized as follows.
straint (ANC) improves the sparsity and minimizes the recon- (1) The Automatic Dependent Surveillance-Broadcast
struction error in comparison with the conventional sparse (ADS-B) system is chosen as the signal source from the real
autoencoder. Huang et al.12 proposed a novel Gated recurrent world. An automatic collection and labeling system is designed
residual neural Network (GrrNet) for feature-based AMC, to capture over-the-air ADS-B signals. By means of data clean-
where the amplitude and phase of the received signal are uti- ing, a large-scale real-world radio signal dataset is acquired,
lized as the inputs of GrrNet. Simulations are conducted to which includes 426613 pieces of long signals from 1661 cate-
verify the classification performance and robustness of the pro- gories of airplanes, and 167234 pieces of short signals from
posed GrrNet and it is shown that GrrNet outperforms other 1713 categories of airplanes.
recent deep learning based AMC methods. Zhang et al.13 (2) Numerous experiments are carried out to demonstrate
addressed the AMC based on CNN-LSTM which is a dual- the effectiveness and accuracy of the dataset under different
stream structure by combining the advantages of CNN and scenarios. A rigorous benchmark is provided based on machin-
LSTM. The experimental results not only demonstrate the ing learning and deep learning techniques. The dataset will be
superior performance of the proposed method compared with released to the research community, which will catalyze the
the existing state-of-the-art methods, but also reveal the poten- development of many new algorithms, models, and evaluation
tial of deep learning based approaches for AMC. Tu et al.14 methods.
designed several key building blocks for Complex-Valued Neu- The rest of this paper is organized as follows. In Section 2,
ral Network (CVNN) for AMC. Additionally, signal recogni- we introduce the work on ADS-B signals and the model of the
tion domain knowledge is also taken into consideration. Peng radio channel used in this paper. In Section 3, we present the
et al.15 investigated several approaches to denote the complex general framework of the ADS-B dataset generation process
signals into images with grid like topologies and utilized two and highlight the key elements in this framework. In Section 4,
CNN models (AlexNet and GoogLeNet) to learn features we introduce the signal recognition models based on three dif-
from these images for AMC. Similarly, Lin et al.16 proposed ferent deep learning algorithms. In Section 5, we introduce a
contour stella image method, which can convey deep level sta- series of experiments conducted with the dataset. Finally, we
tistical information by dot density in constellation diagram. conclude this paper in Section 6 with a discussion of future
This work bridges the gap between signal recognition and deep work.
Large-scale real-world radio signal recognition with deep learning 37
2. Related work conservation of phase, variability of size, and invariance of

time change. These benefits mean that during transformation,
2.1. Related datasets a lot of details in the signal are retained, which are particularly
useful for the classification and recognition of radiation source
devices. In addition, high-order spectral analysis can, techni-
In recent years, several datasets have been proposed in the field
cally, minimize the Gaussian noise. However, since the trans-
of radio signal recognition, which are shown as follows:
formation of the bispectrum is not conducive to detection
(1) RadioML 2016.10A and RadioML 2016.04C. The
and identification, it generally absorbs a number of energies
RadioML 2016.10A21 and RadioML 2016.04C datasets22 pro-
for two-dimensional picture recognition.
vide 170163 and 220000 labeled In-Phase and Quadrature (I/
The two-dimensional signal image, then, will first be con-
Q) samples (two-dimensional data), respectively. Designed
verted by the line integral bispectrum method to a one-
using GNU Radio, each synthetic dataset consists of 11 mod-
dimensional signal. Axial Integral Bispectrum (AIB), Circum-
ulations (8 digital and 3 analog). The signals are synthesized
ferential Integral Bispectrum (CIB), Radial Integral Bispec-
for evaluating output under distinct signal and noise scenarios
trum (RIB) and Rectangular Integral Bispectrum (RIB) are
at various SNR levels with mild Local Oscillators (LO) drift,
four types of well-integrated bispectrum algorithms. First, it
light fading, and multiple distinct labelled SNR increments.
is possible to express the signal bispectrum of xðtÞ as follows:
(2) RadioML 2018.01A. RadioML 2018.01A23 is a rich
modulation classification dataset containing more than 2.5 X
1 X
1
million radio signals covering up to 24 analog and digital mod- Bx ðx1 ; x2 Þ ¼ C3x ðs1 ; s2 Þeiðx1 s1 þx2 s2 Þt ð1Þ
ulation formats in a wide range of SNR [20: 2: 30] dB. This s1 ¼1 s2 ¼1
dataset contains both artificial virtual channel effects and over- where
the-air recordings of 24 tested forms of optical and analog Z 1
modulation. C3x ðs1 ; s2 Þ ¼ x ðtÞxðt þ s1 Þxðt þ s2 Þdt ð2Þ
(3) ORACLE. The Oracle dataset24 refers to a specific radio 1
signal from a wide pool of machines that are bit-like (i.e. with
the same hardware, protocol, physical address, and MAC ID) ¼ Efx ðtÞxðt þ s1 Þxðt þ s2 Þg
utilizing only physical layer IQ samples. ORACLE follows two Among the four bispectrum algorithms, we choose AIB in
approaches: (A) it trains a CNN to identify special hardware- this paper, which can be described as follows:
centric signatures embedded in the transmitter radio chain Z
(e.g., IQ imbalance, DC offsets, etc.); (B) it uses a receiver- 1 1
AIBðxÞ ¼ Bðx1 ; x2 Þdx2 ð3Þ
feedback to insert modifications in the transmitter chain to 2p 1
conduct channel-independent fingerprinting of Radio Fre- Z 1
1
quency (RF). ¼ Bðx1 ; x2 Þdx1
Datasets are an integral part of the field of machine learn- 2p 1
ing. Major advances in this field can result from advances in From the Fourier transform projection, it can be shown
learning algorithms (such as deep learning), computer hard- that AIB can be considered as the axial portion Fourier trans-
ware, and less intuitively, the availability of high-quality train- form of the signal’s third-order correlation function.
ing datasets. Many major AI breakthroughs have actually (2) Machine learning based classifier
been constrained by the availability of high-quality training A variety of lightweight machine learning models or ana-
datasets, but not by algorithmic advances. We have seen that lytic decision processes can be used to classify the signal after
numerous awesome datasets have been instrumental in mapping statistical features to a class name. Popular methods
advancing computer vision, NLPs and deep learning research. include Support Vector Machine (SVM),26 Decision Trees
However, there are several datasets for radio signal recogni- (DTree),27 and K-Nearest Neighbors (KNN),28 which are
tion. But we believe that the high-quality and real-world briefly introduced in the following:
labeled training datasets are still absent. (A) SVM is based on the minimization of systemic risk and
These existing datasets have been widely used in research, has strengths over generative learning. For example, where the
but they may not be sufficient for many practical scenarios. ultimate aim is to obtain a classifier, not the distributions, the
First, most categories of these datasets are generated by simu- Vapnik-Chervonenkis theory29 provides relevant objections
lation, which could be very different from that captured from against attempting to estimate probability distributions in gen-
real applications. Second, the number of signal categories in erative learning.
these datasets may not be sufficient, which limits their applica- (B) DTree is a decision support mechanism that utilizes a
tions in AMC. Third, these datasets are hard to extend for decision-making tree-like paradigm and its future effects,
emerging wireless systems, and thus they could be out-of- namely the implications of chance events, resource costs, and
date in the near future. The new radio signal dataset proposed utility. It is an efficient way of expressing an algorithm that
in this paper will address all these issues. includes only conditional statements of power.
(C) KNN is a type of instance learning, or lazy learning,
2.2. Benchmark recognition approach where the function is only locally approximated and all calcu-
lation is postponed before evaluation of the function. Since
(1) Statistical features this algorithm relies on distance for classification, if the fea-
In this paper, the bispectrum method25 is used to extract the tures come in vastly different scales, normalizing the training
statistical features. The bispectral transform has the benefits of data can improve its accuracy dramatically.
38 Y. TU et al.
2.3. Deep learning and designed for open-source. Therefore, for scientific
research, everyone can receive and collect ADS-B signals with-
To optimize wide parametric neural network models, deep out causing security and privacy concerns.
learning depends entirely on Stochastic Gradient Descent Two parts, including the preamble and the data block, con-
(SGD). The key strategy over the years stays relatively sist of the ADS-B signal, which are shown in Fig. 1. The data
unchanged. Neural networks consist of a set of layers that block uses the pulse position modulation method: a high pulse
map the input h0 of each layer to output h1 utilizing dense following a low pulse represents ‘‘1”, and a low pulse following
parametric matrix operations accompanied by non-linearities. a high pulse means ‘‘0”.
This can be represented clearly as follows: In practice, the lengths of received ADS-B signals are dif-
ferent. In the preamble, there are 8 microseconds of signal
h1 ¼ maxð0; h0 W þ bÞ ð4Þ header with a total of 4 pulses at fixed positions, which can
where the weights, W, have the dimension jh0 h1 j, the bias, b, be used to detect and synchronize the ADS-B signal.30 In
has the dimension jh1 j (both constituting h), and max is applied the data block, there are two different data formats. The long
element-wise per output jh1 j (i.e., as the Rectified Linear Unit data format is 112 bits, and the short data format is 56 bits.
(ReLU) activation function). From Fig. 1, it can be seen that the long and short data for-
Traditionally, preparation leverages a loss function (L). In mats have the same structure in the first 32 bits. The ICAO
this case (for supervised classification), categorical cross- code appears both in long and short signals, which is used
entropy of one-hot recognized class labels yi (an all zero vec- to identify the airplane. Therefore, the ICAO code will be
tor, except for a one value at the class index i of the correct used as the unique identification of the airplane to labels
class) and predicted class values ybi , is used. the signals in the dataset. Long and short signals will both
be labeled in the dataset, with increased amount of signal
1 X
N
types.
Lðy; ybÞ ¼ ½y lg ð ybi Þ þ ð1 yi Þlgð1 ybi Þ ð5Þ
N i¼0 i
3.2. Radio signal capture
For each epoch n within the network (fðx; hn Þ), back prop-
agation of loss gradients can be used to iteratively update net- 3.2.1. Overall architecture of capture system
work weights (h) until validation loss no longer decreases.
Back propagation of loss gradients can be used to itera- The overall system architecture of the proposed signal capture
tively update network weights (h) for each epoch n within system is shown in Fig. 2. The Software Defined Radio (SDR)
the network (fðx; hn Þ) until validation loss no longer decreases. device is used to detect and capture the baseband I/Q data of
The essential optimization method is as follows: ADS-B signal, and an automatic decoding algorithm is used to
obtain the individual identity (ID) of the airplane. Then, an
@Lðy; fðx; hÞÞ automatic clustering and labeling algorithm is used to label
hnþ1 ¼ hn g ð6Þ
@hn the baseband I/Q data with the corresponding airplane ID.
In this way, a dataset will be obtained after continuous collec-
tion of ADS-B signals over a period of time. The entire system
3. Dataset generation approach
works well without requiring human participation, which will
greatly save the cost, time, and human power for constructing
3.1. Radio signal source
a dataset.
As mentioned above, the new radio signal dataset should have 3.2.2. Hardware setup of capture system
three important advantages. First, it should be collected in a
The hardware configuration of the proposed signal collecting
real-world scenario with a large number of categories. Second,
system is shown as follows:
it should not be complicated to acquire and at a low expense.
(1) Signal collecting device: SM200B is a SDR platform,
Finally, it should be easy to update in the future. Therefore,
which is produced by Signal Hound, Inc.. The collecting
the source of radio signal is very important for the dataset.
parameters are summarized in Table 1.
Fortunately, the ADS-B system is a good fit for the applica-
(2) Signal processing device: An HP Laptop computer is
tion. The ADS-B system is widely used to monitor the status
used for signal decoding, labeling, and storage. The laptop
of airplanes, which is very important for the safety of air traf-
configuration is Intel(R) Core (TM) i7-10750H CPU
fic. The advantages of ADS-B include:
@ 2.60 GHz, 32 GB RAM, and 256 GB SSD Hard Disk.
(1) Large Scale. The International Civil Aviation Organiza-
(3) Antennas device: A 1090 MHz omnidirectional antenna
tion (ICAO) requires that every airplane installs the ADS-B
is used to collect over-the-air ADS-B signal.
system, and every airplane should periodically transmit over-
As shown in Fig. 3, in the actual collecting environment, we
the-air ADS-B signal. Therefore, there are a lot of ADS-B sig-
select an open and unobstructed place in the acquisition envi-
nals from different airplane categories in the open radio
ronment to avoid the influence of the surrounding environ-
environment.
ment on the received signal fingerprint to the greatest extent.
(2) Easy labeling. The ADS-B system follows a standard
and open protocol (DO-260B), which is designed by the ICAO.
3.2.3. Software algorithm of capture system
Therefore, it is easy to automatically collect and label the radio
signal without human participation. The software algorithm comprises five parts, including resam-
(3) Open Source. The ADS-B system works at the fre- pling, header search, information decoding, CRC check, and
quency of 1090 MHz, which is a passive receiving system positioning and labeling.
Fig. 1 Structure of ADS-B signal.
Fig. 2 Overall architecture of capture system.
Table 1 SM200B parameters during recording.

Parameter Value
Sampling frequency (Mbps) 50
IF (Hz) 0 (baseband)
Center frequency (MHz) 1090
Bandwidth (MHz) 10
Gain (dB) 30
(1) Resampling. This is used to reduce the amount of data,

increase the rate of data processing, and extend the adaptabil-
ity of different processing devices.
(2) Header Search. This is to detect the beginning of the
ADS-B signal and synchronize the signal by finding the peak
value of the cross-correlation output in a certain interval. Fig. 3 Actual collecting environment.
(3) Information Decoding. According to the DO-260B stan-
dard, the received ADS-B signal can be decoded to obtain
specific information, which includes the airplane ID, status,
and storage systems to identify unintentional data
position, and so on. Such information will be used for labeling
adjustments.
the radio signal.
(5) Positioning and Labeling. The resampling factor Dfactor
(4) CRC Check. Cyclic Redundancy Check (CRC) is an
and the position index LD of the header are used to determine
error-detection code that is widely used in digital networks
the starting position of the radio signal, i.e., Lstart ¼ Dfactor LD .
40 Y. TU et al.
According to the signal length, the ending position of the radio Four ADS-B signals from different airplanes are shown in
signal can be easily calculated. Finally, a clustering method is Fig. 5.
used to label the ADS-B signal with the corresponding air-
plane ID. A high-quality radio signal dataset can be obtained 4. Signal recognition models
without human operation in this way.
The overall dataset preprocessing procedure is summarized 4.1. CNN-based model
in Algorithm 1.
It is well understood what deep neural networks actually learn
as discriminating features in CV applications. For example, in
Algorithm 1. Software algorithm. CNN, the first layers are trained to detect small-scale ‘‘edges”,
Input: I/Q signal zBI ðnÞ, zBQ ðnÞ, and the front-end sampling which become increasingly complicated as the network deep-
rate1=Ts ens. CNN also has the benefit of being shift-invariant: con-
1.Calculate the resampling factor Dfactor and obtain the received verted weights detect patterns in arbitrary positions in the
I/Q signal zBIR ðnÞ,zBQR ðnÞ sequence in each layer, and the presence of a signal is passed
qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
2. Extract the envelope signal aðnÞ ¼ zBIR ðnÞ2 þ zBQR ðnÞ2 to a higher layer by a max-pool layer. This is precisely the
3. Calculate the correlation coefficient Rs1;s2 ðsÞ to find signal property that makes it excellent for these networks to detect.
header position index LD The convolution operation can be written as
4. Decode the received signal and get the airplane information XnH XnW Xnc
5. Check the effectiveness of decoding ADS-B information convðI; KÞx;y ¼ i¼1 j¼1 k¼1
Ki;j;k Ixþi1;yþj1;k ð7Þ
6. Combine the airplane ID with the received I/Q signal, and
store them to the database
where I and K denote the input and convolutional layer
Output: A high-quality radio signal dataset weights, respectively, and H; W; andC denote the input’s
height, width, and channel, respectively.
However, in the wireless domain, CNN does not operate on
images but on samples of I/Q. In the I/Q plane, various radio
signal waveforms present various transition patterns. For
3.3. Statistics and visualization of dataset
example, transitions between (1, 0) and (–1, 0) are typical for
BPSK signals, but they do not appear in QPSK signals, which
Data collection lasted for nearly one month. Tens of thou- have a constellation that is significantly different. This can
sands of signals have been acquired, but we should choose constitute a unique ‘‘signature” of the signal that the CNN fil-
the high-quality data. Therefore, after data cleansing, 26613 ters can ultimately learn.
pieces of long signals in 1661 categories of airplanes, and The structure and parameters of the CNN used in this
167234 pieces of short signals in 1713 categories of airplanes paper are provided in Table 2.
have been chosen for the dataset. The detailed quantity distri-
bution of the captured ADS-B signals is shown in Fig. 4. In 4.2. LSTM-based model
order to ensure the richness and balance of signal samples,
530 categories of long signals and 198 categories of short sig-
Recurrent Neural Network (RNN) is an advanced deep learn-
nals have been selected to build the dataset. Every category
ing model for extracting temporal features, and it is generally
consists of 200 to 600 signal samples. In the following experi-
applied to model language sequences and multimodal time ser-
ments, we will randomly select 80% of the data as the training
ies. Different from CNN, whose input dimensionality is fixed,
set, 10% of the data as the validation set, and 10% of the data
RNN is suitable to analyze sequential data with variable
as the test set.
lengths. In the wireless domain, RNN not only considers the
At a sampling rate of 50 MHz, the long signal length is 6000
current transition patterns in I/Q samples, but also looks back
sampling points, and the short signal is 3000 sampling points.
into historical transition patterns. In this way, RNN performs
Fig. 4 Quantity distribution of the captured ADS-B signals.

Fig. 5 Four types of ADS-B signal of different airplanes from dataset.
LSTM structure, with a simple recurrent unit more effective

Table 2 Structure and parameters of convolution neural
than the RNN model.
network.
The structure and parameters of the RNN used in this
Layer Dimension Activation paper are provided in Table 3.
Input 1 Signal length 2
Convolution 2 Signal length 50 ReLU 4.3. CVNN-based model
Batch normalization 2 Signal length 50
Max pooling 2 Signal length/2 50
Complex-valued signals are observed in a wide variety of tech-
Convolution 2 Signal length/2 75 ReLU
nologies, such as wireless networking, sensor array signal pro-
Batch normalization 2 Signal length/2 75
Max pooling 2 Signal length/4 75 cessing, as well as biomedical research and physics.31 The
Convolution 2 Signal length/4 100 ReLU outcome of the operation of Complex-Valued Neural Net-
Batch normalization 2 Signal length/4 100 works (CVNN) contains values that, as a consequence of the
Max pooling 2 Signal length/8 100 complex operation, carry features of both I/Q components.
Convolution 2 Signal length/8 150 ReLU Let the complex kernel be A þ jB and the complex input signal
Batch normalization 2 Signal length/8 150 be X þ jY. We can store the outcome as
Max pooling 2 Signal length/16 150 (1) Separated channel: XA YB
Convolution 2 Signal length/16 200 ReLU (2) Mixed channel: XB þ YA
Batch normalization 2 Signal length/16 200
Mathematically, the outcome of this operation is still real-
valued in one channel and imaginary in the other channel. To
Convolution 2 Signal length/32 300 ReLU
Batch normalization 2 Signal length/32 300 this end, the mixed channel will allow the CVNN to learn sig-
Average pooling 2 Signal length/64 300 nal coherence information. It has been shown that the CVNN
Flatten 2 Signal length/64 300 outperforms the Real-Valued Neural Network (RVNN) in
Output 1 classes number Softmax high signal coherence regions (e.g., the high SNR region).14
The structure and parameters of the CVNN used in this
paper are provided in Table 4.
5. Recognition performance analysis

well by using more sequence features. Hence, the simple RNN
can written as Signal recognition efficiency of deep learning approaches
would be greatly influenced by the architecture, deployment,
ht ¼ rh ðWh xt þ Uh ht1 þ bh Þ ð8Þ

yt ¼ rh Wy ht þ by ð9Þ
Table 3 Structure and parameters of LSTM.
where xt is the input vector, ht is the hidden layer vector,
Wh ; Wy ; Uh are RNN weight parameters, b bh , by denotes Layer Dimension Activation
RNN bias parameters, and rh denotes the activation function. Input 1 Signal length 2
Usually, RNN models are trained with Back Propagation LSTM 1 Signal length 128 ReLU
Through Time (BPTT), which may result in gradient explosion LSTM 1 Signal length 128 ReLU
or gradient disappearance. In order to address the problem, LSTM 1 Signal length 128 ReLU
several variants of RNN have been developed, such as LSTM, Flatten 1(Signal length 128)
which is one of the most well-known and the most widely used Dense 1 1024 ReLU
Dense 1 1024 ReLU
models. However, LSTM has a high computing complexity,
Output 1 classes number Softmax
while Gated Recurrent Unit (GRU) greatly simplifies the
42 Y. TU et al.
tion, the maximum training epoch is 1000 and the batch size is
Table 4 Structure and parameters of CVNN.
256 for all the deep learning models in the following
Layer Dimension Activation experiments.
Input 1 Signal length 2 Early stopping is a way to define an unspecified large num-
Complex convolution 1 Signal length 50 CReLU ber of training epochs and avoid training until the output of
Complex batch 1 Signal length 50 the model finishes improving on a validation dataset hold-
normalization out. Early stopping is scheduled to end the training phase in
Max pooling 1 Signal length/2 50 this experiment when there is no change across 20 epochs.
Complex convolution 1 Signal length/2 75 CReLU The rate of learning controls how easily the model is
Complex batch 1 Signal length/2 75
adapted to the problem. When smaller adjustments are made
normalization
to the weights in each update, smaller learning rates need more
Complex convolution 1 Signal length/ CReLU training epochs, whereas larger learning rates contribute to fas-
4 100 ter improvements which need less training epochs. The initial
Complex batch 1 Signal length/ learning rate is 0.001 for all the models. The decreased learning
normalization 4 100 rate enables our models to descend into areas of the loss land-
Max pooling 1 Signal length/ scape that are ‘‘more optimal.” The learning rate decreasing
8 100 scheme in the experiments is as follows:
Complex convolution 1 Signal length/ CReLU 8 3
8 150 < 10 ; epoch 15
>
Complex batch 1 Signal length/ a ¼ 104 ; 15 < epoch 40 ð10Þ
normalization 8 150 >
: 5
Max pooling 1 Signal length/
10 ; Otherwise
16 150 During the training process, the optimizer will tweak and
Complex convolution 1 Signal length/ CReLU change the parameters (i.e., the weights) of the model to try
16 200
and minimize the loss function and make predictions as correct
Batch normalization 1 Signal length/
and optimized as possible. In our experiments, (stochastic gra-
16 200
Max pooling 1 Signal length/ dient descent) SGD with momentum is chosen for all the deep
32 200 learning models.
Complex convolution 1 Signal length/ CReLU Considering the effectiveness of the imbalanced sample cat-
32 300 egories, we choose mean Average Precision (mAP) to reduce
Complex batch 1 Signal length/ the influence from imbalance dataset. The mAP metrics we
normalization 32 300 used is defined as
Average pooling 1 Signal length/ PQ
64 300 q¼1 PðqÞ
Flatten 1 Signal length/
mAP ¼ ð11Þ
Q
64 300
Output 1 classes number Softmax where Q is the number of queries and PðqÞ is precis for each
query.
5.2. Recognition performance of different classifiers

training, and data consideration. Therefore, in this section,
several experiments are conducted to evaluate the most com- In this part, three different machine learning and three differ-
mon design parameters, which will greatly impact the recogni- ent deep learning methods are selected to compare their recog-
tion performance, including different classifier, signal nition performance using the ADS-B signal dataset, and the
categories, sampling rate, and signal impairments. comparison results are presented in Fig. 6.
(1) BiSpec+KNN. The KNN classifier combined with
5.1. Dataset and deep learning configuration BiSpectrum features. The number of neighbors is set to 5,
the uniform weight function is applied, and minkowski dis-
In the following experiments, long and short signals are sepa- tance is adopted as the distance metric.
rated to evaluate the recognition performance. The entire data- (2) BiSpec+Dtree. The decision tree classifier combined
set will be separated into three non-overlapping parts, which with BiSpectrum features. The Gini coefficient is chosen to
include the training set (80% of the dataset), the validation measure the quality of a split, the minimum number of samples
set (10% of the dataset), and the testing set (10% of the data- required to split an internal node is two, and the minimum
set). Therefore, there are 144550 and 50484 samples in the long number of samples required to be at a leaf node is one.
and short signal training sets, respectively; 18068 and 6311 (3) BiSpec+SVM. The SVM classifier combined with
samples in the long and short signal validation sets, respec- BiSpectrum features. The radial basis function is chosen as
tively; 18068 and 6311 samples in the long and short signal test the kernel, and regularization is l2 regularization, whose
sets, respectively. strength is 1.0.
One epoch means that a whole dataset is transferred for- (4) CNN. The CNN-based deep learning algorithm intro-
ward and backward through the neural network once, and duced in Section 4.
batch size is the total number of training examples present in (5) LSTM. The LSTM-based deep learning algorithm intro-
a single batch. To balance the performance and time consump- duced in Section 4.
Fig. 6 Recognition performance at different SNR levels.
(6) CVNN. The CVNN-based deep learning algorithm selected from the ADS-B signal dataset. To eliminate the ran-
introduced in Section 4. domness of selecting categories, 20 experiments are conducted
According to Fig. 6, several observations can be made as and the average results are presented. Then the deep learning
follows: models mentioned in Section 4 are applied to test their recog-
(1) The recognition performance of deep learning is much nition performance under different SNR levels, and the result
better than that of machine learning. This is due to different is shown in Fig. 7.
feature extraction mechanisms. With the help of extra layers, According to Fig. 7, some conclusions can be drawn:
deep learning can extract robust and accurate features auto- (1) With the increase of signal categories, the recognition
matically without human participation. However, machine rate will decrease in low SNR. However, the recognition rate
learning should use the handcrafted features, which have a lim- will also improve with the increase of SNR. Therefore, the
ited capacity to obtain correct real-world signal features. recognition rate of different category numbers will become
(2) The recognition rate of deep learning will be improved stable at the same level in high SNR. It indicates that deep
with increased SNR, but it will become stable in the high learning method has a better capacity of feature extraction
SNR region. According to Fig. 6, when the SNR is higher than and finds a better decision boundary in high SNR, even at a
0 dB, the performance rates of all the deep learning models will large category number.
exceed 90%. (2) CVNN model is proved to be the best recognition
(3) The CVNN achieves the highest recognition rate among method for both long and short ADS-B signal at any category
all the models at high SNR. The reason is that the correlation number. It indicates that we should fully use I/Q information
information from the I/Q channels is not polluted by noise at to improve the recognition rate in the future.
high SNR levels. Therefore, according to the correlation infor-
mation from the I/Q channels, the CVNN can acquire more 5.4. Recognition performance at various sampling rates
information to extract features, which lead to better recogni-
tion performance at high SNR. A higher sampling rate requires a higher performance recei-
(4) We compared three different machine learning and three ver. However, a higher performance receiver may not always
different deep learning methods. The accuracy that we be available. An experiment is conducted to find out how
achieved from CVNN, RNN, and CNN is approximate to sampling rate affects the ADS-B recognition performance.
99.8%, 97.2% and 96% respectively at 20 dB for long and In the experiment, MATLAB is used to resample the original
short ADS-B signal, which implies that this dataset is high- signal at 50 MHz, and then resample the long and short sig-
quality labeled and the noise in the signal is very low. In this nals with sampling rates of 50 MHz, 40 MHz, 30 MHz, 20
way, the reader can consider that this dataset is clean and MHz, and 10 MHz. To guarantee that we obtain the inter-
add any influence on it. ested band, we used an Anti-Aliasing Filter (AAF) which is
a filter used before a signal sampler restricts the bandwidth
5.3. Recognition performance under different numbers of of a signal.
categories The recognition performance under different sampling rates
are presented in Fig. 8. It indicates that the sample rate has a
It is common to receive hundreds to thousands of categories of great influence on recognition rate for both long and short sig-
ADS-B signals. Therefore, it is crucial to explore how the num- nal. Because the information of original signals will be lost in
ber of categories of ADS-B signals influences the recognition sample acquisition with a low sampling rate, according to
performance. In this experiment, the numbers of categories Fig. 8, the good recognition rate can be obtained with the sam-
are Nc ¼ 530, 424, 318, 212, and 106 (corresponding to ple rate higher than 40 MHz, and we know that the signal
100%, 80%, 60%, 40%, and 20% of long signals) and Nc ¼ bandwidth of ADS-B signal is 10 MHz. Therefore, we can
198, 158, 119, 79, and 40 (corresponding to 100%, 80%, draw a conclusion that the sample rate should be four times
60%, 40%, and 20% of short signals), which are randomly higher than the bandwidth of original signal.
44 Y. TU et al.
Fig. 7 Recognition performance at different categories number for different models.
5.5. Recognition performance under various signal impairments is important to analyze how well-trained classifiers do and
compare their performance under those transmission
Complex and various channels lead to the impairments of sig- impairments.
nal in real world, whereas the Additive White Gaussian Noise One of the several non-ideal situations that may influence
(AWGN) is commonly utilized in simulation and modeling. It the baseband receiver design is Carrier Frequency Offset
Fig. 8 Recognition performance at different sampling rates with different models.
(CFO). CFO usually happens where the local oscillator signal Like the carrier frequencies, transmitter and receiver sam-
in the receiver is not compatible with the carrier in the pling frequencies used by the Digital to Analog Converter
obtained signal for down-conversion. Two significant reasons (DAC) and Analog-to-Digital Converter (ADC) are generally
may be related to this phenomenon: the frequency mis-match slightly mismatched. Such impairment is known as the Sam-
between the transmitter and the oscillators of the receiver; pling Rate Offset (SRO) and will also degrade the system
the Doppler effect when the transmitter and/or receiver pass. performance.
46 Y. TU et al.
Fig. 9 Recognition performance under different signal impairments with different models.
In this experiment, common signal impairments, such as way, a high-performance receiver is needed to mitigate such
SRO and CFO, are considered. For long and short ADS-B sig- influence.
nals, SRO is set to 1 KHz or 10 KHz, and CFO to 1 KHz or 10 (2) Due to CFO, we suppose that the received frame at right
KHz. The experimental results are presented in Fig. 9. frequency fc is down-converted with a local carrier frequency
The following observations can be made from the results in ð1 þ eÞfc . The resulting baseband samples are subject to phase
Fig. 9: rotation. The waveform of ADS-B signal will be distorted non-
(1) It is obvious that signal impairments have a great linearly and make it hard to recognize when facing worse CFO.
impact on the recognition performance, and a higher offset (3) Due to SRO, it is supposed that the sampling duration is
0
leads to a larger decline of the recognition performance. In this Ts ¼ ð1 eÞTs . Hence, if the sampling e offset is positive (or
negative), we obtain the history or future sample in the frame, the future. First, the category and number of signals are still
which is taken at time sn ¼ neTs earlier (or later) than it not enough, which should be further replenished. Second, we
should. The ideal feature like pike and high frequency chang- only capture the ADS-B signal in one city, the location should
ing in the waveform will be lost. It will be more difficult for change to several different cities, and we can even design a dis-
deep learning method to learn the correct features in ADS-B tributed capturing system. Third, we only label the airplane ID
signal. with the ADS-B signal, and more tag should be used, such as
(4) The impairments of signal cannot be avoided in real SNR, speed, and altitude. Fourth, the evaluation method is
world, and they will make a great influence on the recognition not perfect, a more comprehensive target should be proposed.
performance. Therefore, first, we should try to collect much Finally, a lot of deep learning methods should be included in
larger data from real world, which includes different signal future recognition task. Fifth, the long tail concept has found
impairments. Deep learning can learn to deal with the complex some ground for signal classification, such as minor class
conditions. Second, we should try to find better signal process- ADS-B signal recognition, zero-shot learning, and supervised
ing method to reduce the impairments before inputting into learning. We will consider long-tail application in this dataset,
deep learning models. especially for those ADS-B categories that have less than 50
samples.
6. Conclusions
Declaration of Competing Interest
In this paper, a large-scale and real-world dataset was pre-
sented, which was designed to advance the deep learning appli- The authors declare that they have no known competing
cation in radio signal recognition. We also presented the key financial interests or personal relationships that could have
part about how to collect the dataset without human participa- appeared to influence the work reported in this paper.
tion. Benchmark results were also presented using the dataset
under different radio signal recognition scenarios, such as dif- Acknowledgements
ferent classifiers, different numbers of categories, different
sample rates, and different signal impairments. As shown in This work was supported by the National Natural Science
the results, the deep learning methods have a great potential Foundation of China (No. 61771154) and the Fundamental
in signal recognition. Finally, this dataset will be released to Research Funds for the Central Universities, China (No.
the research community, which will catalyze the development 3072021CF0815). This work was also supported by the Key
of new algorithms, models, and evaluation studies. Laboratory of Advanced Marine Communication and Infor-
In the future, this dataset would become a primary refer- mation Technology, Ministry of Industry and Information
ence for a wide variety of academic studies on signal process- Technology, Harbin Engineering University, Harbin, China.
ing, including the following potential applications:
(1) A benchmark dataset. The current benchmark datasets References
in radio signal processing, such as RadioML 2016.10A and
RadioML 2018.01A, have played a critical role in advancing 1. Pouyanfar S, Sadiq S, Yan Y, et al. A survey on deep learning.
signal recognition research. The proposed high-quality, rich- ACM Comput Surv 2019;51(5):1–36.
diversity, large-scale, and real-world radio signal dataset will 2. Guo J, He H, He T, et al. GluonCV and GluonNLP: Deep
become a new and valuable benchmark dataset for future learning in computer vision and natural language processing.
arXiv preprint: 1907.04433, 2019.
research in the field.
3. Alam M, Samad MD, Vidyaratne L, et al. Survey on deep neural
(2) Automatic data labeling. In this paper, an automatic networks in speech and vision systems. Neurocomputing
data labeling framework for radio signals is proposed. Addi- 2020;417:302–21.
tional ADS-B signal data could be added in the dataset with- 4. Otter DW, Medina JR, Kalita JK. A survey of the usages of deep
out much cost. The proposed system will be a useful tool to learning for natural language processing. IEEE Trans Neural Netw
collect other types of signal in the future. Learn Syst 2021;32(2):604–24.
(3) Training resources for other ADS-B tasks. This dataset 5. Kato N, Mao B, Tang F, et al. Ten challenges in advancing
could serve as a pretrain dataset for other tasks of the ADS-B machine learning technologies toward 6G. IEEE Wirel Commun
system, such as ADS-B demodulation and separation. This is 2020;27(3):96–103.
mainly due to the rich, diverse, and robust features captured 6. Gui G, Liu M, Tang F, et al. 6G: Opening new horizons for
integration of comfort, security, and intelligence. IEEE Wirel
in this dataset. One interesting research direction could be:
Commun 2020;27(5):126–32.
transferring the deep learning models to conduct few-shot 7. Dong P, Zhang H, Li GY, et al. Deep CNN-based channel
learning about ADS-B signals in other scenarios. estimation for mmWave massive MIMO systems. IEEE J Sel Top
(4) This paper’s dataset and codes have been open in Signal Process 2019;13(5):989–1000.
‘‘https://gitee.com/heu-linyun”. We hope it can be used to 8. Liang F, Shen C, Yu W, et al. Towards optimal power control via
attract more researchers to join the fields, and better deep ensembling deep neural networks. IEEE Trans Commun 2020;68
learning method can be provided to improve the recognition (3):1760–76.
performance. 9. She C, Dong R, Gu Z, et al. Deep learning for ultra-reliable and
However, there are also some improvements that should be low-latency communications in 6G networks. IEEE Netw 2020;34
continued with more effort. In wireless communications, fad- (5):219–25.
10. Rajendran S, Meert W, Giustiniano D, et al. Deep learning models
ing is variation of the attenuation of a signal with common
for wireless signal classification with distributed low-cost spectrum
various variables. To give a comprehensive benchmark result sensors. IEEE Trans Cogn Commun Netw 2018;4(3):433–45.
about our dataset, we will consider channel fading factor in
48 Y. TU et al.
11. Afan A, Fan YY. Automatic modulation classification using deep 22. O’Shea TJ, Corgan J, Clancy TC. Unsupervised representation
learning based on sparse autoencoders with nonnegativity con- learning of structured radio communication signals. 2016 First
straints. IEEE Signal Process Lett 2017;24(11):1626–30. International Workshop on Sensing, Processing and Learning for
12. Huang S, Dai R, Huang J, et al. Automatic modulation Intelligent Machines (SPLINE); 2016 Jul 6-8; Aalborg, Denmark;
classification using gated recurrent residual network. IEEE Inter- Piscataway: IEEE Press; 2016: p. 1–5.
net Things J 2020;7(8):7795–807. 23. O’Shea TJ, Roy T, Clancy TC. Over-the-air deep learning based
13. Zhang Z, Luo H, Wang C, et al. Automatic modulation radio signal classification. IEEE J Sel Top Signal Process 2018;12
classification using CNN-LSTM based dual-stream structure. (1):168–79.
IEEE Trans Veh Technol 2020;69(11):13521–31. 24. Sankhe K, Belgiovine M, Zhou F, et al. No radio left behind:
14. Tu Ya, Lin Y, Hou C, et al. Complex-valued networks for radio fingerprinting through deep learning of physical-layer
automatic modulation classification. IEEE Trans Veh Technol hardware impairments. IEEE Trans Cogn Commun Netw 2020;6
2020;69(9):10085–9. (1):165–78.
15. Peng S, Jiang H, Wang H, et al. Modulation classification based 25. Lin Y, Jia JC, Wang S, et al. Wireless device identification based
on signal constellation diagrams and deep learning. IEEE Trans on radio frequency fingerprint features. ICC 2020–2020 IEEE
Neural Netw Learn Syst 2019;30(3):718–27. International Conference on Communications (ICC); 2020 Jun 7-11;
16. Lin Y, Tu Ya, Dou Z, et al. Contour stella image and deep Dublin, Ireland. Piscataway: IEEE Press; 2020. p.1–6.
learning for signal recognition in the physical layer. IEEE Trans 26. Muller FCBF, Cardoso C, Klautau A. A front end for discrim-
Cogn Commun Netw 2021;7(1):34–46. inative learning in automatic modulation classification. IEEE
17. Ma JT, Lin SC, Gao HJ, et al. Automatic modulation classifica- Commun Lett 2011;15(4):443–5.
tion under non-Gaussian noise: A deep residual learning 27. Meng F, Chen P, Wu L, et al. Automatic modulation classifica-
approach. ICC 2019 - 2019 IEEE International Conference on tion: a deep learning enabled approach. IEEE Trans Veh Technol
Communications (ICC); 2019 May 20-24; Shanghai, China. 2018;67(11):10760–72.
Piscataway: IEEE Press; 2019.p.1–6 28. Aslam MW, Zhu ZC, Nandi AK. Automatic modulation classi-
18. Tu Y, Lin Y, Wang J. Semi-supervised learning with generative fication using combination of genetic programming and KNN.
adversarial networks on digital signal modulation recognition. IEEE Trans Wirel Commun 2012;11(8):2742–50.
Comp, Mater Continua 2018;55(2):243–54. 29. Editorial G. Vapnik-Chervonenkis (VC) learning theory and its
19. Wang Yu, Yang J, Liu M, et al. LightAMC: Lightweight applications. IEEE Trans Neural Netw 1999;10(5):985–7.
automatic modulation classification via deep learning and com- 30. Haoran Z, Sen W, Shihao W, et al. A method using blind source
pressive sensing. IEEE Trans Veh Technol 2020;69(3):3491–5. separation to improve the decoding efficiency of space-based
20. Lin Y, Tu Y, Dou Z. An improved neural network pruning ADS-B receiver. Int J Performability Eng 2020;16(6):950.
technology for automatic modulation classification in edge 31. Gopalakrishnan S, Cekic M, Madhow U. Robust wireless
devices. IEEE Trans Veh Technol 2020;69(5):5703–6. fingerprinting via complex-valued neural networks. 2019 IEEE
21. O’Shea TJ, Corgan J, Clancy TC. Convolutional radio modulation Global Communications Conference (GLOBECOM); 2019 Dec 9-
recognition networks. Engineering applications of neural net- 13; Waikoloa, USA. Piscataway: IEEE Press; 2019. p. 1–6.
works. Cham: Springer International Publishing; 2016. p. 213–26.

Paper 2

Uploaded by

Copyright:

Available Formats

Paper 2

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Paper 2

Uploaded by

Copyright:

Available Formats

Chinese Journal of Aeronautics, (2022), 35(9): 35–48

Chinese Society of Aeronautics and Astronautics

Large-scale real-world radio signal recognition with

Received 27 February 2021; revised 6 April 2021; accepted 24 May 2021

2. Related work conservation of phase, variability of size, and invariance of

Fig. 1 Structure of ADS-B signal.

Fig. 2 Overall architecture of capture system.

Table 1 SM200B parameters during recording.

(1) Resampling. This is used to reduce the amount of data,

Fig. 4 Quantity distribution of the captured ADS-B signals.

Fig. 5 Four types of ADS-B signal of different airplanes from dataset.

LSTM structure, with a simple recurrent unit more effective

5. Recognition performance analysis

5.2. Recognition performance of different classifiers

Fig. 6 Recognition performance at different SNR levels.

Fig. 7 Recognition performance at different categories number for different models.

Fig. 8 Recognition performance at different sampling rates with different models.

You might also like