Research Article On Sybil Attack Detection
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TII.2020.2963962, IEEE Transactions on Industrial Informatics
JOURNAL OF LATEX CLASS FILES, VOL. XX, NO. XX, MARCH 2019 1
Songlin Chen is with the National Key Laboratory of Science and Technology on Communications, University of Electronic Science and Technology of China (UESTC), Chengdu 611731, China, email: [email protected].
Zhibo Pang is with Department of Automotive Solutions, ABB Corporate Research, Västerås 72226, Sweden, email: [email protected].
Kan Yu is with La Trobe University, Bundoora, VIC, 3086, Australia, email: [email protected].
Yueming Lu is with Beijing University of Posts and Telecommunications, Beijing, 100876, China, email: [email protected].
Hong Wen and Tengyue Zhang are with Department of Aeronautics and Astronautics, University of Electronic Science and Technology of China (UESTC), Chengdu 611731, China, emails: [email protected] and [email protected].
Corresponding author: Hong Wen.

Abstract—In this paper, a scheme to detect both clone and Sybil attacks by using channel-based machine learning is proposed. To identify malicious attacks, channel responses between sensor peers have been explored as a form of fingerprints with spatial and temporal uniqueness. Moreover, the machine learning based method is applied to provide a more accurate authentication rate. Specifically, by combining with edge devices, we apply a threshold detection method based on channel differences to provide offline training sample sets with labels for the machine learning algorithm, which avoids manually generating labels. Therefore, our proposed scheme is lightweight for resource constrained industrial wireless devices, since only online decision making is required. Extensive simulations and experiments were conducted in real industrial environments. Both results show that the authentication accuracy rate of our strategy with an appropriate threshold can achieve 84% without manual labeling.

Index Terms—Cyber physical security, physical layer authentication, supervised machine learning.

I. INTRODUCTION

INDUSTRIAL wireless networks offer unprecedented opportunities for Industry 4.0. However, industrial wireless communications are vulnerable to probing free attacks (PFAs), which are not possible in wired communication systems [?]. Security is of vital importance, especially in industrial control systems. If malicious information or commands are sent to control devices by attackers, unexpected results may occur, such as an interruption in the industrial process, leading to economic losses or even safety accidents. For example, an attacker can launch a clone node attack in an unsupervised industrial wireless network. The attacker first hijacks a legitimate node and extracts its ID, key and other confidential information. Subsequently, a large number of cloned nodes are deployed by the attacker in the industrial wireless network, which seriously threaten industrial network security by amending routing information, collecting sensitive information or interrupting key distribution. In addition, a Sybil attack is another type of security threat to industrial wireless networks, where a malicious node claims multiple fake identities. Since the industrial control center may not be able to distinguish those multiple fake identities, the malicious node will be able to attack industrial networks.

Owing to the resource constraints of wireless sensor network devices, a lightweight and effective detection method is needed in industrial wireless networks. Especially for clone attacks, as attackers are able to obtain all sensitive information from a compromised legitimate node, it is difficult to detect and deter these attacks by relying on cryptographic algorithms. In [?], [?], [?], [?], [?], [?], [?], physical layer authentication based on generalized channel responses with spatial variability was put forward, which considers the correlation of the time, frequency and spatial domains. On account of this, we propose an attack detection method based on channel differences. To the best of our knowledge, the existing methods are not able to detect both clone and Sybil attacks at once. In addition, in order to further improve the detection rate, a spoofing attack detection strategy combined with a machine learning algorithm is proposed in [?]. However, the offline training of the machine learning model needs strong computing power. Moreover, another premise of this work is that the machine learning algorithm needs to be optimized by providing samples with labels in the offline training phase. Previous studies about machine learning have focused on artificial injection of label samples, and it is difficult to obtain attack samples in advance [?].

To further improve network efficiency, edge computing or fog computing is introduced into industrial wireless networks to meet the low-latency needs of industrial applications such as real-time monitoring and control for critical industrial devices [?], [?], [?]. Both edge and fog computing employ additional devices to process data from nearby devices, on top of traditional centralized cloud platform data processing and analysis methods, and can thus relieve the traffic burden and enable ultra-low latency response. Regarding industrial edge networks, a lot of studies focus on optimizing resource allocation by utilizing edge computing networks and security protection [?], [?], [?], [?]. By this means, the edge computing can provide the proximal support platform that makes use
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.
of machine learning based methods to improve the channel state information (CSI) based attack detection rate. Therefore, in this paper, we propose an automated labeling and learning method for physical layer authentication to detect clone and Sybil attacks in edge computing industrial wireless networks. Firstly, we use channel differences and threshold detection to provide offline training sample sets with labels for the machine learning classification algorithm. Secondly, a support vector machine (SVM) algorithm is utilized to realize online decision of two types of attack detection at once, since the SVM algorithm has better performance under small size offline training sample sets compared with other machine learning algorithms. Our main contributions are summarized as follows.

• We propose a physical layer authentication strategy based on channel differences to detect clone attacks and Sybil attacks simultaneously in industrial wireless environments.
• Compared with existing machine learning based physical layer authentication schemes, the labels of offline training sample sets can be generated automatically without manual operation by our proposed strategy.
• The simulations of both clone and Sybil attack detection are conducted by using open datasets from the National Institute of Standards and Technology (NIST). Field validations are carried out in real industrial environments to verify the feasibility of our proposed method [?].

This paper is organized as follows: Section II reviews related work. Attack modeling and analysis are introduced in Section III. In Section IV the proposed scheme is described in detail, followed by simulation and experimental verification of our proposed strategy in Sections V and VI. Finally, we conclude the paper and discuss future work in Section VII.

II. RELATED WORK

A large number of research efforts have been devoted to the detection of clone and Sybil attacks by utilizing physical layer authentication. CSI based Sybil attack detection in wireless networks was proposed in [?]. This approach is an enhanced physical layer authentication scheme to detect Sybil attacks by exploiting the spatial variability of radio channels. Physical layer authentication is based on judging the channel information difference of nodes, which is caused by their different spatial locations. In [?], the feature-based physical layer authentication protocol is highly effective in detecting Sybil attacks for mission-critical machine type communication (MTC) applications. The existing physical layer authentication methods are mainly studied from three perspectives. Firstly, appropriate channel estimation algorithms are selected to improve the accuracy of CSI. Because the channel differences of communication peers need to be compared, it is essential to obtain accurate CSI through channel estimation. Secondly, a better channel difference is constructed between different nodes, which serves as the test statistic in the binary hypothesis testing. A variety of channel difference test statistics have been proposed by other researchers. For instance, the channel-based likelihood ratio test and the sequential probability ratio test are exploited for authentication in smart systems in [?]. Thirdly, a suitable threshold also affects the performance of channel-based physical layer authentication. Practical user authentication leveraging a CSI based threshold method is studied in [?] for spoofing detection. However, it is prone to multi-path effects and Doppler frequency shift. At present, in order to improve the detection accuracy, some researchers have studied machine learning algorithms based on CSI. In [?], [?], various supervised learning algorithms are applied to detect spoofing attacks. Q-learning and game theory are utilized to adjust and determine the threshold in channel authentication. All of these approaches need offline training sample sets with labels. Those are provided to the machine learning algorithm for continuous optimization of the learning model.

To the best of our knowledge, none of the previous studies is able to detect clone and Sybil attacks at the same time. Moreover, the existing CSI-based attack detection methods combined with machine learning algorithms need offline training samples, requiring labels with attack identities or legitimate identities. However, the existing methods need to label samples manually. They fail to satisfy the requirement of offline sample training for the detection model without knowing the label attributes of attack nodes beforehand [?]. To overcome this drawback, we propose an automated labeling and learning strategy for physical layer authentication. This strategy utilizes the channel difference threshold method to generate labels of learning samples for the offline training sample sets when the machine learning algorithm is trained offline. In addition, our method can also detect clone and Sybil attacks at the same time in wireless industrial networks.

III. ATTACK MODELING AND ANALYSIS

In this section, we first introduce industrial wireless edge networks, and then give an overview of clone and Sybil attacks in industrial wireless edge networks.

A. Industrial Wireless Edge Networks (IWEN)

An industrial wireless edge network is regarded as a bridge between industrial wireless devices and the remote cloud. It can provide real-time edge intelligent services to meet the critical needs of industry digitization in terms of agile connectivity, real-time services, data optimization, application intelligence, security and privacy protection [?]. In order to improve the work efficiency of assembly lines in automotive factories, a local wireless autonomous network with an industrial edge computing node is established. The network consists of local industrial wireless sensors for industrial environment detection and wireless control nodes (e.g. mechanical arms, unmanned logistics vehicles, robots, etc). An industrial edge computing node can work in real-time and efficiently with other local terminal devices. Moreover, this node ensures a smooth and efficient collaboration among devices by dispatching the operation status of all kinds of terminals reasonably, thus realizing a real-time and efficient operation in industrial environments [?]. In addition, the industrial edge computing node is able to work with remote cloud computing. The remote cloud
[Figure 1 labels: edge computing node E1, mechanical arm node, PC node, sensor node, automated guided vehicle M1, attacker, malicious node, clone node C1, and legitimate nodes.]
Fig. 1. Clone attack in IWEN: the attacker replicates the clone node C1 based on M1 in large quantities and deploys C1 in different positions in the industrial control system based on edge computing.
[Figure 2 labels: edge computing node E1, mechanical arm node, PC node, sensor node, automated guided vehicle M1, attacker, malicious node, Sybil nodes S1, S2, S3, and legitimate nodes.]
Fig. 2. Sybil attack in IWEN: the attacker deploys multiple Sybil nodes, including S1, S2, S3, in the industrial control system, but their true position is M1.
computing is responsible for the analysis of non-real-time and long-period data, while the industrial edge computing node focuses on real-time, short-period data analysis. An industrial edge computing network can support real-time responses and high mobility in an industrial operating environment. Clone and Sybil attacks may occur in the industrial wireless edge network. Fortunately, industrial edge computing is well suited to providing sufficient computing resources and to running machine learning algorithms for channel identification based attack detection in a proximal location.

B. Overview of Clone Attacks and Sybil Attacks

We consider that an industrial edge computing node controls multiple wireless device nodes as the local real-time decision center in an edge computing based industrial control environment. There are various wireless nodes in the wireless network, such as manipulators, wireless personal computers, temperature sensors, automatic steering vehicles, and wireless control nodes, as shown in Fig. 1. The industrial edge computing node communicates with multiple nodes to achieve the coordination and integration of all kinds of wireless sensors. It needs to ensure the network security of all accessing wireless nodes and provide resources for various devices to ensure stable operation.

1) Clone attack scenario: The scenario of clone attacks in industrial control systems based on edge computing is shown in Fig. 1. When an attacker hijacks the legitimate node M1 (automated guided vehicle), the attacker is able to access all sensitive information related to this node (e.g. key, control information, data information, ID information, etc). Therefore, the attacker is able to replicate a clone node C1 based on the information extracted from M1 and deploy C1 in a different location in this industrial wireless edge network. Then the clone node C1 is able to act as the legitimate node M1 and participate in data interaction with the industrial edge computing node E1. Meanwhile, the other nodes L1,..., L5 are legitimate nodes.

A clone attack will bring a series of serious consequences. For example, when a great number of clone nodes exist in the industrial wireless edge network, they are able to launch a denial of service (DoS) attack by occupying the communication channel at all times and sending access requests to the industrial edge computing node continuously to prevent the industrial edge computing node from receiving any information from legitimate nodes. Moreover, by broadcasting malicious information in the network, attackers can inject malicious data or change transmission information of legitimate nodes, leading to a collapse or breakdown of the industrial edge computing node.

2) Sybil attack scenario: As the local real-time processing center, the industrial edge computing node needs to interact with other nodes, as shown in Fig. 2. In a Sybil attack, malicious nodes pretend to be other nodes or claim fake IDs in the industrial wireless network. As shown in Fig. 2, an attacker deploys multiple Sybil nodes, including S1, S2, S3, in the industrial control system, but their true location is M1. Meanwhile, L1,..., L8 are legitimate nodes. The edge control node mistakenly believes that S1, S2 and S3 are located in different locations in the network topology, but in fact, they are in the same geographical location. For example, if nodes S1 and S2 are generated from the node M1 and they are not in the positions being claimed in Fig. 2, the nodes S1, S2 and M1 are physically in the same position.

Sybil attackers fake multiple identities to communicate with the industrial edge computing node, affecting the communication network of other legitimate nodes, thus intercepting and tampering with information, decreasing network availability and undermining data integrity. In particular, Sybil nodes may continuously send a huge amount of messages to the industrial edge computing node by imitating legitimate devices or with fake IDs to request network and storage resources. Consequently, due to the network resource exhaustion, the industrial wireless edge network may collapse, leading to more serious economic losses or safety accidents.

IV. AUTOMATED LABELING AND LEARNING FOR PHYSICAL LAYER AUTHENTICATION

We propose an automated labeling and learning scheme for physical layer authentication to detect both clone and Sybil attacks, consisting of the physical layer channel response extraction, physical layer authentication scheme and channel-based machine learning scheme.
B. Physical Layer Channel Response Extraction

The extraction of CSI is undertaken by legitimate receivers. The signal model of the legitimate receiver can be given by

\[ r(t) = h\,x(t) + n(t) \tag{1} \]

where $t$ is the time slot, i.e., the time interval between successive data frames, $h$ is the time-domain channel matrix of channel coefficients, $x$ is a pilot signal known to both transmitters and receivers and used to estimate channel information, and $n(t)$ is additive white Gaussian noise with variance $\sigma^2$. The channel frequency response generated by the receiver through channel estimation is as follows

\[ \hat{H}_k = R X^{-1} = H_k + N X^{-1} \tag{2} \]

where $H_k$ is the channel frequency response, and $R$, $N$ and $X$ are obtained by the discrete Fourier transform of the time-domain $r$, $n$ and $x$. LS and MMSE methods can be used to estimate the channel response $\hat{H}_k$ from pilots [?]. According to [?], $\hat{H}_k$ can be expressed by the following

\[ \hat{H}_k(n) = [\hat{H}_k(n,1), \hat{H}_k(n,2), \ldots, \hat{H}_k(n,M)]^T \tag{3} \]

where $k$ is the frame index, $n$ is the symbol index, and $M$ is the dimension of channel information.

1) Channel Identification for Clone Detection: The channel identification algorithms for clone attacks are launched after the communication and interaction between the industrial edge computing node and the terminal nodes to be tested. The industrial edge computing node initially interacts with terminal nodes. Specifically, after receiving identity information from each node, the industrial edge computing node makes a preliminary assessment based on the received information. Clone attacks may occur if some nodes have the same ID. In this case, the industrial edge computing node uses the pilot signal to estimate the channel and generate the CSI $\hat{H}_k(n)$, which can be given by

\[ \hat{H}_k(n) = [\hat{H}_k(n,1), \hat{H}_k(n,2), \ldots, \hat{H}_k(n,M)]^T \tag{6} \]

Assume that the industrial edge computing node obtains the CSI $\hat{H}_k$ of a node to be tested at time $t_1$. The CSI $\hat{H}_{k+1}$ obtained at $t_2$ is then compared with $\hat{H}_k$ to determine whether the channel information received at different times comes from the same node. The clone attack detection problem can be considered as a binary hypothesis test. In the null hypothesis $H_0$, the message of the transmitting node comes from the same location, thus there is no clone attack. Otherwise, the alternative hypothesis $H_1$ is that the transmitting node comes from different geographical locations,
and there are clone attacks on the node. The binary hypothesis testing can be described as

\[
\begin{cases}
H_0: \hat{H}_k \longrightarrow \hat{H}_{k+1} & \text{No existing clone attack} \\
H_1: \hat{H}_k \longmapsto \hat{H}_{k+1} & \text{Existing clone attack}
\end{cases} \tag{7}
\]

where $H_0$ indicates no clone attacks occurring, while $H_1$ means clone attacks exist. To determine the similarity of the CSI $\hat{H}$, it is essential to compare the difference between the channel matrices of two consecutive data frames $\hat{H}_k$, $\hat{H}_{k+1}$ with a threshold. We adopted the Euclidean distance as the test statistic, which can be given by

\[ T_{\mathrm{clone}} = \mathrm{diff}(\hat{H}_{k+1}, \hat{H}_k) = \frac{\|\hat{H}_{k+1} - \hat{H}_k\|_2}{\|\hat{H}_k\|_2} \underset{H_1}{\overset{H_0}{\lessgtr}} \eta_1 \tag{8} \]

where $\hat{H}_k$ is the CSI of a legitimate node and $\hat{H}_{k+1}$ is the CSI of a node to be tested. When the CSI difference is less than the clone threshold $\eta_1$, the channel matrices are very similar, which determines that the current transmitter is legitimate. Conversely, when the CSI difference is greater than the clone threshold $\eta_1$, the channel matrices are considered to be different, which determines that the current transmitter is a clone node with the same ID in a different location.

2) Channel Identification for Sybil Detection: In a Sybil attack, malicious nodes pretend to be multiple other nodes and have multiple IDs in the industrial wireless edge network. When the industrial edge computing node communicates and interacts with terminal nodes, the channel identification algorithms for Sybil attacks are launched. Specifically, the industrial edge computing node, according to the received response signals $R$ from each node, determines that Sybil attacks may occur if some terminal nodes have different IDs. In this context, the industrial edge computing node uses the pilot signal to estimate the channel and generate the CSI $\hat{H}_k(n)$, which can be given by

\[ \hat{H}_k(n) = [\hat{H}_k(n,1), \hat{H}_k(n,2), \ldots, \hat{H}_k(n,M)]^T \tag{9} \]

Assume that the industrial edge computing node obtains the CSI $\hat{H}_k$ of a node to be tested at time $t_1$. The CSI $\hat{H}_{k+1}$ obtained at $t_2$ is then compared with $\hat{H}_k$ to determine whether the channel information received at different times comes from the same node. The Sybil attack detection problem can be considered as a binary hypothesis test. In the null hypothesis $H_0$, the message of the transmitting node comes from different locations, thus there is no Sybil attack. Otherwise, the alternative hypothesis $H_1$ indicates that the transmitting nodes are in the same location and Sybil attacks may exist. Hypothesis testing can be expressed as:

\[
\begin{cases}
H_0: \hat{H}_k \longmapsto \hat{H}_{k+1} & \text{No existing Sybil attack} \\
H_1: \hat{H}_k \longrightarrow \hat{H}_{k+1} & \text{Existing Sybil attack}
\end{cases} \tag{10}
\]

To determine the similarity of the channel matrices, it is essential to compare the CSI difference between the channel matrices of two consecutive data frames with a threshold. Again we adopted the Euclidean distance as the test statistic:

\[ T_{\mathrm{Sybil}} = \mathrm{diff}(\hat{H}_{k+1}, \hat{H}_k) = \frac{\|\hat{H}_{k+1} - \hat{H}_k\|_2}{\|\hat{H}_k\|_2} \underset{H_0}{\overset{H_1}{\lessgtr}} \eta_2 \tag{11} \]

where $\hat{H}_k$ is the CSI of a node to be tested at time $k$, and $\hat{H}_{k+1}$ is the CSI of a node to be tested at time $k+1$. When the CSI difference between the two is less than the Sybil threshold $\eta_2$, the channel matrices are very similar, which indicates that nodes with different IDs are in the same location. Thus, it is determined that the current transmitter is a Sybil attack node.

D. Support Vector Machine

In machine learning, SVM is a supervised learning model with associated learning algorithms that analyze data for classification and regression analysis. We adopt SVM as our attack detection algorithm for two main reasons. First, attack detection is a two-class problem with only two outcomes: either attacks exist or they do not. Second, due to the limited sample size of the open datasets provided by NIST, we need a machine learning algorithm that achieves good classification results with small offline training sample sets. The optimization model is given by

\[
\begin{aligned}
\max_{\alpha}\ & \sum_{n=1}^{N} \alpha_n - \frac{1}{2} \sum_{n=1}^{N} \sum_{m=1}^{N} \alpha_n \alpha_m y_n y_m k(x_n, x_m) \\
\text{s.t.}\ & \sum_{n=1}^{N} \alpha_n y_n = 0, \\
& \alpha_n \geq 0, \quad (n = 1, \ldots, N).
\end{aligned} \tag{12}
\]

where $k(x, x') = \phi(x)^T \phi(x')$ is the kernel function. In this paper, the offline training sample set $S$ comprises $n$ input vectors $\hat{H}_1, \ldots, \hat{H}_n$ with corresponding target labels $y_1, \ldots, y_n$, where $y_i \in \{-1, 1\}$. $S$ can be given by

\[ S = \{(x_1 = \hat{H}_1, y_1), \ldots, (x_n = \hat{H}_n, y_n)\} \tag{13} \]

The offline training sample set $S$ is utilized to obtain a value for $\alpha$, and then $w^* = \sum_{n=1}^{N} \alpha_n^* y_n \phi(x_n)$ via Eq. 12 [?]. We can determine $b^*$ from any support vector $x_n$ satisfying $y_n \left( \sum_{m \in S} \alpha_m^* y_m k(x_n, x_m) + b^* \right) = 1$, $y_n \in \{-1, 1\}$, where $S$ here denotes the set of indices of the support vectors [?]. By multiplying by $y_n$, making use of $y_n^2 = 1$, and averaging over all support vectors, we obtain a more stable solution $b^*$ given by [?],

\[ b^* = \frac{1}{N_S} \sum_{n \in S} \left( y_n - \sum_{m \in S} \alpha_m^* y_m k(x_n, x_m) \right) \tag{14} \]

where $N_S$ is the total number of support vectors. According to $w^*$ and $b^*$, the final SVM offline model is given by

\[ f(x) = \mathrm{sign}\left( \sum_{n=1}^{N} \alpha_n^* y_n k(x, x_n) + b^* \right) \tag{15} \]
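As an illustration of how Eq. 14 and Eq. 15 are evaluated once the multipliers are known, the following is a minimal Python sketch. It assumes that the dual problem of Eq. 12 has already been solved by some quadratic programming routine, so the multipliers are given rather than computed. The function names, the linear kernel and the two-point toy data set are hypothetical choices made for this example and are not taken from the paper. The features are real-valued vectors here, whereas real CSI would first have to be mapped to such vectors.

```python
def linear_kernel(x, z):
    """Assumed kernel for this toy example: k(x, z) = x^T z."""
    return sum(a * b for a, b in zip(x, z))


def bias(support_x, support_y, alphas, kernel):
    """Eq. 14: average of y_n - sum_m alpha_m* y_m k(x_n, x_m) over the support vectors."""
    total = 0.0
    for x_n, y_n in zip(support_x, support_y):
        expansion = sum(a * y * kernel(x_n, x_m)
                        for a, y, x_m in zip(alphas, support_y, support_x))
        total += y_n - expansion
    return total / len(support_x)


def decide(x, support_x, support_y, alphas, b, kernel):
    """Eq. 15: sign of the kernel expansion plus bias.

    Only support vectors are summed, because alpha_n* = 0 for all other samples.
    """
    score = sum(a * y * kernel(x, x_n)
                for a, y, x_n in zip(alphas, support_y, support_x)) + b
    return 1 if score >= 0 else -1


# Toy problem (hypothetical): one legitimate sample (+1) and one attack sample (-1).
# For these two points and the linear kernel, the dual optimum of Eq. 12 is
# alpha_1* = alpha_2* = 0.5, which can be checked by hand.
support_x = [[1.0, 0.0], [-1.0, 0.0]]
support_y = [1, -1]
alphas = [0.5, 0.5]

b_star = bias(support_x, support_y, alphas, linear_kernel)
label = decide([0.3, 2.0], support_x, support_y, alphas, b_star, linear_kernel)
print(b_star, label)
```

Passing the kernel as an argument keeps the sketch independent of the kernel choice, which the text above leaves open.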
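The threshold tests of Eq. 8 and Eq. 11, which drive the automated labeling, can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the CSI vectors are plain lists of complex numbers, the test statistic is taken to be the ratio of Euclidean norms, and the thresholds, node IDs and CSI values below are made-up illustrative numbers. The names `channel_difference` and `auto_label` are hypothetical.

```python
import math


def channel_difference(h_prev, h_next):
    """Normalized Euclidean distance ||h_next - h_prev||_2 / ||h_prev||_2."""
    num = math.sqrt(sum(abs(b - a) ** 2 for a, b in zip(h_prev, h_next)))
    den = math.sqrt(sum(abs(a) ** 2 for a in h_prev))
    return num / den


def auto_label(id_i, h_i, id_j, h_j, eta_clone, eta_sybil):
    """Return (label, verdict) for node j compared against node i.

    Label -1 marks an attack sample and +1 a legitimate sample.
    """
    t = channel_difference(h_i, h_j)
    if id_i == id_j:
        # Same ID: a large channel difference means a different location (clone, Eq. 8).
        return (-1, "clone") if t > eta_clone else (1, "legitimate")
    # Different IDs: a small channel difference means the same location (Sybil, Eq. 11).
    return (-1, "sybil") if t < eta_sybil else (1, "legitimate")


# Illustrative CSI vectors (hypothetical values, M = 3).
h_ref = [1 + 1j, 0.5 - 0.2j, -0.3 + 0.8j]
h_near = [1.05 + 0.95j, 0.48 - 0.22j, -0.31 + 0.83j]   # almost the same channel
h_far = [-0.6 + 0.2j, 1.1 + 0.9j, 0.7 - 0.4j]          # clearly different channel

ETA_CLONE = 0.5   # assumed threshold eta_1
ETA_SYBIL = 0.5   # assumed threshold eta_2

print(auto_label("M1", h_ref, "M1", h_far, ETA_CLONE, ETA_SYBIL))
print(auto_label("M1", h_ref, "S1", h_near, ETA_CLONE, ETA_SYBIL))
```

The labels produced this way are what an SVM would later consume as training targets, so no manual labeling step is needed in this sketch.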
Algorithm 1 Automated labeling for physical-layer authentication
Input: clone threshold $\eta_1$, Sybil threshold $\eta_2$, the feedback signal of node $i$, $R_i = \{Pilot, ID_i\}$, the feedback signal of node $j$, $R_j = \{Pilot, ID_j\}$.
Output: offline training samples.
1: If $ID_i = ID_j$
2:   Calculate CSI and channel difference $T_{\mathrm{clone}}$ between nodes $i$ and $j$, via Eq. 6, 8;
3:   If $T_{\mathrm{clone}} > \eta_1$
4:     Insert clone attack labels $y_{kj} = -1$ into channel samples of node $j$;
5:     Issue a warning that a clone attack occurs;
6:   Else insert legitimate node labels $y_{kj} = +1$ into channel samples of node $j$;
7:   End if.
8: Else if $ID_i \neq ID_j$
9:   Calculate CSI and channel difference $T_{\mathrm{Sybil}}$ between nodes $i$ and $j$, via Eq. 9, 11;
10:  If $T_{\mathrm{Sybil}} < \eta_2$
11:    Insert Sybil attack labels $y_{kj} = -1$ into channel samples of node $j$;
12:    Issue a warning that a Sybil attack occurs;
13:  Else insert legitimate node label $y_{kj} = +1$ for channel samples of node $j$;
14:  End if.
15: End if.
16: Output offline training samples including clone attack samples, Sybil attack samples, and legitimate node samples.

Algorithm 2 Machine learning for physical layer authentication
Input: the labeled node channel information samples $S = \{[\hat{H}_{ki}, y_{ki} = -1], \ldots, [\hat{H}_{kj}, y_{kj} = +1]\}$, target authentication accuracy rate $G$.
Output: online authentication model.
1: Divide $S$ into training sample set $S_1$ and testing sample set $S_2$;
2: Calculate the offline authentication model based on SVM, via $S_1$ and Eq. 12, Eq. 14. Specifically, $\hat{H}_{ki}$ is put into the kernel function $K(\cdot)$, and $y_{ki}$ is put into Eq. 14;
3: Calculate the authentication accuracy rate $A$ of the offline authentication model based on SVM, via $S_2$ and Eq. 15, Eq. 18;
4: If $A > G$
5:   Return online authentication model = offline authentication model.
6: Else return to step 2 for optimizing the offline authentication model;
7: End if.
8: Output online authentication model.

E. Channel-based Machine Learning Scheme

We hypothesize that attackers launch Sybil attacks and clone attacks in the industrial wireless edge network. As a result, there exist three kinds of nodes: legitimate nodes, clone nodes and Sybil nodes. Meanwhile, each node has its own ID. The industrial edge computing node performs automated labeling for physical layer authentication via Algo. 1. The industrial edge computing node classifies the nodes according to their ID. If nodes have the same identity, the industrial edge computing node needs to consider whether these nodes are legitimate nodes or clone ones, and then conducts the channel authentication. The industrial edge computing node extracts the CSI of each node to obtain the channel difference $T_{\mathrm{clone}}$, and compares it with the clone threshold value $\eta_1$ to determine whether a clone attack occurs. If a clone attack occurs, i.e., $T_{\mathrm{clone}} > \eta_1$, the industrial edge computing node injects the clone attack label $y_{ki} = -1$ into the channel response vectors of the clone attack node, $[\hat{H}_{ki}, y_{ki} = -1]$, to form a clone attack node sample set. Otherwise, it injects the legitimate label $y_{ki} = +1$ to form a legitimate node sample set. Offline training sample sets are given by $S = \{[\hat{H}_{ki}, y_{ki} = -1], \ldots, [\hat{H}_{kj}, y_{kj} = +1]\}$. In the other situation, if these nodes have different IDs, the industrial edge computing node begins to consider whether these nodes are legitimate nodes or Sybil nodes, and then also performs the physical layer authentication process. The industrial edge computing node extracts the CSI of each node and compares their channel differences at different times. It compares the difference $T_{\mathrm{Sybil}}$ with the Sybil threshold $\eta_2$ to determine whether a Sybil attack has occurred at this node. If a Sybil attack occurs, i.e., $T_{\mathrm{Sybil}} < \eta_2$, the industrial edge computing node injects Sybil attack labels $y_{kj} = -1$ into the samples of the Sybil attack node to form a sample set of Sybil attack nodes. Otherwise, the industrial edge computing node injects legitimate labels $y_{kj} = +1$ to form a sample set of legitimate nodes. Offline training sample sets are given by $S = \{[\hat{H}_{ki}, y_{ki} = -1], \ldots, [\hat{H}_{kj}, y_{kj} = +1]\}$. Subsequently, the industrial edge computing node uses the labeled node channel information samples to form the training and testing sample sets $S$ required by the machine learning algorithm. According to Algo. 2, the industrial edge computing node implements the CSI-based machine learning algorithm, and optimizes the model through the sample sets $S$ to achieve the target authentication accuracy rate $G$. If the target is not met, the optimization returns to step 2 and iterates until the target $G$ is achieved. Finally, an online attack detection model is generated to detect clone attack and Sybil attack nodes.

V. SIMULATIONS ON REAL INDUSTRIAL OPEN DATASET

A. Measurement Setup

The industrial environment chosen for this measurement is an automotive factory of 400 m × 400 m × 12 m, which is full of metallic equipment and obstacles. The channel information used in this work is based on the NIST dataset, which was acquired by the channel sounder system composed of the NIST TX and RX channel sounders [?]. The transmitter repeatedly sends the pseudo-noise (PN) code sequence of a set of digital symbols, which are modulated by binary phase shift keying (BPSK) and converted up
to the radio frequency carrier frequency. Through the power amplifier, the signals propagate through the factory to the TX and RX channel sounder. The receiver converts, digitizes and stores the received signal locally. Balancing steps are performed during post-processing to eliminate hardware impairments. The channel measurement equipment keeps moving during the measurements, following a loop along a non-line-of-sight path with rich multipath. The parameter configuration of the channel measurement system includes the center frequency, antennas, and power. The center frequency is 2.245 GHz, omni-directional antennas are used for both receiving and transmitting, and the polarization modes are Cross pol and V pol, respectively. The receiving antenna gain is -4.2 dBi, the transmitting antenna gain is 2.9 dBi, and the transmitting power is 1.5 W. The sampling rate is 80 MHz. In this scenario, 106 channel measurement positions were measured, with 300 records for each measurement position [?].

Fig. 4. The simulation experiment of clone attack under automotive assembly, utilizing the open dataset from NIST. (Diagram labels: industrial edge computing node, captured node, clone node, legitimate node, attacker; area 167.2 m × 121.6 m.)
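The sounding principle described above (a repeated PN sequence, BPSK mapping, and correlation at the receiver) can be illustrated with a small, noise-free sketch. Everything specific here is an assumption made for illustration: the 127-chip sequence from a 7-stage shift register, the three hypothetical multipath taps in `TAPS`, and the use of circular convolution to model the repeated transmission. It is not the processing chain of the NIST sounder.

```python
def pn_sequence():
    """127-bit maximal-length sequence from a 7-stage shift register.

    The feedback (XOR of the last two stages) is an illustrative choice.
    """
    state = [1, 0, 0, 0, 0, 0, 0]
    bits = []
    for _ in range(2 ** 7 - 1):
        bits.append(state[-1])
        feedback = state[-1] ^ state[-2]
        state = [feedback] + state[:-1]
    return bits


def bpsk(bits):
    """Map bit 0 to +1.0 and bit 1 to -1.0."""
    return [1.0 - 2.0 * b for b in bits]


def multipath(symbols, taps):
    """Circular convolution with sparse taps {delay: complex gain}."""
    n = len(symbols)
    return [sum(g * symbols[(i - d) % n] for d, g in taps.items())
            for i in range(n)]


def estimate_cir(received, symbols):
    """Correlate the received signal with the known PN symbols at every lag."""
    n = len(symbols)
    return [sum(received[i] * symbols[(i - k) % n] for i in range(n)) / n
            for k in range(n)]


PN = bpsk(pn_sequence())
TAPS = {0: 1.0 + 0.0j, 3: 0.4 - 0.3j, 9: 0.2j}  # hypothetical channel taps
estimate = estimate_cir(multipath(PN, TAPS), PN)

# The three largest correlation peaks indicate the path delays.
strongest = sorted(range(len(estimate)), key=lambda k: -abs(estimate[k]))[:3]
print(sorted(strongest))
```

Because the circular autocorrelation of such a sequence is 127 at zero lag and -1 elsewhere, each estimated tap equals the true tap up to a small bias of order 1/127, which is why a long PN sequence is a convenient probing signal.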
The receiver operating characteristic (ROC) curve is the
different clone thresholds to provide labeled offline training sample sets for the machine learning models. The experiment uses different clone thresholds to attach attack labels to node samples of different sizes. A node under attack is marked as -1, and the other nodes, which are not attacked, are marked as +1. The purpose is to form offline training samples for machine learning. Subsequently, the machine learning algorithm uses the offline training sample sets to generate the final clone attack detection model that satisfies our set authentication accuracy rate. Finally, the online decision of the model is carried out to evaluate the performance of the system. The simulation results show that the offline training sample sets generated by different clone thresholds affect the clone attack detection model based on machine learning of CSI. The area under the curve (AUC) and the authentication accuracy rate are the indexes used to measure the classification performance of machine learning. Fig. 5(b) shows that the offline training samples generated under four different clone thresholds have no significant impact on the attack detection rate of the CSI-based SVM.

Fig. 6. The experiment of Sybil attacks under automotive assembly, utilizing open datasets from NIST. S1 and S2 are generated from M1 and their true position is at M1; S3 and S4 are generated from M2 and their true position is at M2. (Diagram labels: industrial edge computing node, malicious node, legitimate node, Sybil node, attacker; area 167.2 m × 121.6 m.)
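The offline labeling, training and evaluation loop of the clone attack simulation can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the two fixed channel responses `H_LEGIT` and `H_CLONE`, the noise level, the normalized distance used in `channel_difference` (a stand-in for Tclone, whose exact definition is given by the paper's own equation), the threshold value 0.25, and the plain sub-gradient trainer that replaces a library SVM are all hypothetical choices.

```python
import cmath
import random

rng = random.Random(7)
N_SUB = 8  # hypothetical number of subcarriers per channel response

# Hypothetical fixed channel responses at the legitimate and clone locations.
H_LEGIT = [cmath.exp(0.3j * k) for k in range(N_SUB)]
H_CLONE = [0.8 * cmath.exp(1j * (1.1 * k + 0.7)) for k in range(N_SUB)]


def observe(channel, noise_std=0.05):
    """One noisy channel estimate for a node at the given location."""
    return [h + complex(rng.gauss(0, noise_std), rng.gauss(0, noise_std))
            for h in channel]


def channel_difference(h_new, h_ref):
    """Normalized squared distance in [0, 1] (assumed stand-in for Tclone)."""
    num = sum(abs(a - b) ** 2 for a, b in zip(h_new, h_ref))
    den = 2.0 * (sum(abs(a) ** 2 for a in h_new) + sum(abs(b) ** 2 for b in h_ref))
    return num / den


def threshold_label(h_new, h_ref, eta):
    """Automated label: -1 (attack) if the difference exceeds eta, else +1."""
    return -1 if channel_difference(h_new, h_ref) > eta else +1


def features(h):
    """Stack real and imaginary parts for a real-valued linear classifier."""
    return [z.real for z in h] + [z.imag for z in h]


def train_linear_svm(xs, ys, lr=0.05, lam=0.01, epochs=20):
    """Sub-gradient descent on the L2-regularized hinge loss."""
    w, b = [0.0] * len(xs[0]), 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = ys[i] * (sum(wj * xj for wj, xj in zip(w, xs[i])) + b)
            w = [wj * (1.0 - lr * lam) for wj in w]
            if margin < 1.0:
                w = [wj + lr * ys[i] * xj for wj, xj in zip(w, xs[i])]
                b += lr * ys[i]
    return w, b


def score(model, x):
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def make_set(n_per_class):
    """Channel estimates plus true identities (known only in simulation)."""
    hs = ([observe(H_LEGIT) for _ in range(n_per_class)]
          + [observe(H_CLONE) for _ in range(n_per_class)])
    return hs, [+1] * n_per_class + [-1] * n_per_class


reference = observe(H_LEGIT)  # stored estimate of the legitimate node
train_h, _ = make_set(100)
auto_labels = [threshold_label(h, reference, eta=0.25) for h in train_h]
model = train_linear_svm([features(h) for h in train_h], auto_labels)

# Online decision on fresh samples, scored against the true identities.
test_h, test_truth = make_set(100)
scores = [score(model, features(h)) for h in test_h]
accuracy = sum((1 if s >= 0 else -1) == t
               for s, t in zip(scores, test_truth)) / len(scores)
pos = [s for s, t in zip(scores, test_truth) if t == +1]
neg = [s for s, t in zip(scores, test_truth) if t == -1]
auc = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg) / (len(pos) * len(neg))
print(f"accuracy={accuracy:.3f}, AUC={auc:.3f}")
```

The classifier never sees the true identities during training; it only sees the labels produced by the threshold rule, so a poorly chosen threshold would feed wrong labels into the model. In this toy setting the two locations are far apart in channel space, which is why the result is insensitive to the exact threshold.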
E. Simulation of Sybil Attack Scenario

We also adopt the NIST channel information dataset from the automotive factory to simulate Sybil attacks. The legitimate nodes and an industrial edge computing node are set up at multiple mobile locations in this factory. If an attacker compromises a legitimate node, the node becomes a malicious node in the industrial wireless edge network. The malicious node masquerades as another node or claims fake IDs. As shown in Fig. 6, the black solid circles are Sybil nodes generated from the hexagonal malicious nodes, in which S1 and S2 are generated from M1, and S3 and S4 are generated from M2. The true position of S1 and S2 is the position of M1. The true position of S3 and S4 is the position of M2. The simulation area is 167.2 m long and 121.6 m wide. The numbers of nodes in the three simulations are 30, 60 and 90, respectively, where we guarantee that Sybil nodes account for 20% of all nodes.

F. Performance of Sybil Attack Detection

In order to satisfy the condition that Sybil nodes account for 20% of all nodes, we assume that a fixed channel measurement location is a malicious node being attacked, and that other measurement locations are legitimate nodes impersonated by Sybil nodes or claimed as counterfeit IDs. We deploy 30, 60 and 90 nodes in the network, and keep the number of Sybil attack nodes at 6, 12 and 18, respectively. The ROC curves in the three cases are shown in Fig. 7(a). Each ROC curve is obtained by sweeping the Sybil threshold from 0 to 1 with an interval of 0.1, and represents the detection performance of the system. The simulation results show that the threshold attack detection based on CSI differences performs better when the total number of deployed nodes is small than when it is large. Since the channel measurement information when devices are moving is different from that when devices remain static, the threshold method is a more appropriate choice when deploying a small number of nodes in the industrial wireless network.

Fig. 7. (a) The ROC of the channel threshold method for detecting Sybil attacks under different node densities (30, 60 and 90 nodes; true positive rate versus false positive rate); (b) the accuracy of the CSI-SVM physical authentication scheme for detecting Sybil attacks (AUC and accuracy versus threshold).

Another simulation is implemented to verify our proposed strategy. SVM is applied as the attack detection method based on CSI machine learning. First, the Sybil threshold method is utilized for attack detection in the mobile environment, so that it can generate labels from small-scale samples for offline training. Furthermore, SVM is suitable for classification under small sample conditions. As shown in Fig. 7(b), we use different Sybil thresholds, 0.6, 0.7, 0.8 and 0.9, respectively, to attach labels to the offline training sample set of the initial machine learning algorithm. If the node is under a Sybil attack, it is marked as -1. Otherwise, it is marked as +1. Then, the machine learning algorithm utilizes the four different offline training sample sets to optimize the SVM model, and forms a Sybil attack detection model. This model satisfies the condition that the target authentication accuracy rate should be higher than 90%. Finally, a new sample set is employed to test the model online and evaluate its performance. As shown in Fig. 7(b), four CSI-based SVM models for Sybil attack detection are developed from the offline training sample sets generated at the different Sybil thresholds. The four attack detection results show that the authentication accuracy rate of attack detection can reach 100%, and the AUC increases with the increase of the threshold value. The selection of the Sybil threshold determines the final performance of our proposed
strategy, and also verifies the feasibility of our proposed
of learning labels.

Fig. 8. Engineering training center for real experiment environment and test platforms. (Panels A, B, C.)

Fig. 9. The layout of the experiment factory with an area of 48.3 m × 38.8 m, where the 6 nodes to be tested are located. (Diagram labels: numbered node positions; West, South, North, East; legend: concrete pillar, door, wooden chairs, metal machines.)
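As a recap of the threshold sweep used in Section V-F, the sketch below computes ROC points and the AUC for thresholds from 0 to 1 in steps of 0.1. The channel difference values are illustrative numbers, not taken from the NIST dataset: 6 Sybil identities among 30 nodes mirror the 20% ratio of the smallest scenario. The decision rule (flag a pair of identities as Sybil when their normalized channel difference falls below the threshold) is an assumption, motivated by the observation that identities forged by one device share the same physical channel.

```python
def roc_points(sybil_diffs, legit_diffs, thresholds):
    """Return one (false positive rate, true positive rate) pair per threshold."""
    points = []
    for eta in thresholds:
        tpr = sum(d < eta for d in sybil_diffs) / len(sybil_diffs)
        fpr = sum(d < eta for d in legit_diffs) / len(legit_diffs)
        points.append((fpr, tpr))
    return points


def auc_trapezoid(points):
    """Area under the ROC curve by the trapezoidal rule."""
    pts = sorted(points)
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


# Illustrative normalized channel differences: 6 Sybil and 24 legitimate nodes.
sybil = [0.05 + 0.06 * i for i in range(6)]    # 0.05 ... 0.35
legit = [0.36 + 0.025 * i for i in range(24)]  # 0.36 ... 0.935
thresholds = [i / 10 for i in range(11)]       # 0, 0.1, ..., 1

roc = roc_points(sybil, legit, thresholds)
auc = auc_trapezoid(roc)
print(f"AUC = {auc:.3f}")  # AUC = 0.993 for these illustrative values
```

With a coarse grid of 11 thresholds the ROC curve is a polyline, so the AUC depends on how finely the threshold is swept as well as on how well the two groups separate.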
VI. EXPERIMENTAL VERIFICATION

A. Experimental Setup for Verification

In order to verify the effects of the strategy in practice, we conducted attack detection experiments in an engineering training center. The engineering training center contains a lot of metal equipment and can therefore faithfully reproduce the reflection and scattering environment of an industrial wireless network, as shown in Fig. 8, A, B, C. The engineering training center is considered to be large, with outer dimensions of approximately 48.3 m × 38.8 m and a ceiling height of approximately 6.5 m. A map of the experimental site floor is shown in Fig. 9. A total of six Universal Software Radio Peripheral (USRP) platforms provided by National Instruments (NI) are employed. The equipment is numbered into six nodes, one of which is taken as an industrial edge computing node, and the rest are simulated terminals. The industrial edge computing node is statically located at position 4, and the other nodes are also statically placed at positions 1, 2, 3, 5 and 6, respectively, as shown in Fig. 9. As shown in Fig. 8, each node is an 8 × 8 MIMO transceiver. Its specific configuration parameters are as follows. The center frequency is 3.5 GHz. The signal bandwidth is 2 MHz. The number of MIMO subcarriers is 128, and the transmission power is 15 dBm. The other nodes numbered 2, 3, 4, 5 and 6 can establish communication links with the industrial edge computing node. Subsequently, clone attack detection and Sybil attack detection are implemented.

1) The verification of clone attack detection: Node 6 initiates clone attacks: it intercepts all information of node 5, including its ID and key, and then deceives the industrial edge computing node. The industrial edge computing node detects the attacks and evaluates the effect through the automated labeling and learning scheme for physical layer authentication. Fig. 10(a) shows the channel differences between the clone node and the legitimate node after normalization to the range from 0 to 1 according to Eq. 8. This figure shows that the channel difference of the clone node is greater than that of the legitimate node. By setting different clone thresholds, the strategy generates labeled offline training sample sets for the machine learning algorithm. The machine learning algorithm utilizes a linear SVM, where the kernel size is set automatically and 5-fold cross validation is used. When the authentication accuracy rate of the machine learning algorithm exceeds the 90% target, the model after offline training is output as the final authentication model. Then the model is tested. Node 6 is taken as the clone attack node once again. It clones legitimate node 5 and launches 1000 clone attacks on the industrial edge computing node. The attack detection results of the authentication models generated under different clone threshold settings are shown in Fig. 10(b). The results show that when the clone threshold is 0.6, AC can reach 75%. Furthermore, the AUC is 0.531, which is higher than that obtained with the other clone thresholds. The results reflect the impact of the labeling accuracy in offline training on the machine learning algorithm. The accuracy of labeling in the offline training set needs to be further improved.

Fig. 10. (a) The channel difference of clone and legitimate nodes (channel difference versus time slots); (b) The result of the SVM scheme for clone attack detection (AUC and accuracy versus threshold).

2) The verification of Sybil attack detection: Node 3, with multiple identities including nodes 1, 2 and 6, initiates a Sybil attack. The industrial edge computing node needs to judge whether a Sybil attack occurs at node 3. Then we implement the proposed Sybil attack detection strategy. Fig. 11(a) describes the channel differences between the Sybil attack node and the legitimate node after normalizing the differences from 0 to
1 through Eq. 11. The figure shows the channel characteristics of the Sybil node, and the channel difference at different times is smaller than that of the legitimate node. Subsequently, the automated labeling and learning process is implemented. Different Sybil thresholds are set to build a labeled offline training sample set. The machine learning algorithm also employs the linear SVM, where the kernel size is set automatically and 5-fold cross validation is used. When the authentication rate of the machine learning algorithm in offline training is larger than 90%, the model is regarded as the final authentication model. Then the authentication model is tested in practice. A Sybil attack was launched at node 3 once again, and 1000 attacks were carried out simultaneously. Under different Sybil thresholds, the attack detection results of the strategy are shown in Fig. 11(b). When the Sybil threshold is set to 0.7, the attack detection effect of the generated authentication model is the best, reaching 84%, compared with the other thresholds. This experiment shows the feasibility of the strategy. On the other hand, it shows again that the accuracy of the labeled offline training set is the key to obtaining a high authentication accuracy rate of the machine learning model.

Fig. 11. (a) The channel difference of Sybil and legitimate nodes (versus time slots); (b) The result of the SVM scheme for Sybil attack detection (value versus threshold).

VII. CONCLUSIONS AND FUTURE WORK

In this paper, we discussed physical layer channel information combined with the machine learning algorithm for detecting clone and Sybil attacks. In order to automatically label the offline training sample sets in the initial stage of the strategy, we proposed a channel difference threshold detection method to label learning samples. Thus, we solve the problem that physical layer authentication methods based on machine learning lack learning samples. More importantly, we verify the feasibility of the strategy by conducting simulations with the wireless network CSI dataset published for the industrial environment and experiments in a practical industrial environment.

For future work, one direction is to improve the accuracy of the offline training sample labels for machine learning through a better channel difference threshold method. Moreover, better channel characteristics are required to improve the accuracy of channel identification. Last but not least, we will pay close attention to the Advanced Persistent Threat (APT) attack, which is a kind of sustained and effective attack that is hard to detect with encryption-based schemes. CSI based physical layer authentication will be explored for APT attack detection,

ACKNOWLEDGMENT

This work is supported by NSFC (No. 61572114), National major R & D program (2018YFB0904900, 2018YFB0904905), Sichuan sci & tech service development project (No. 18KJFWSF0368) and Sichuan sci & tech basic research condition platform project (No. 2018TJPT0041).

Songlin Chen (S’17) is currently pursuing his Ph.D. degree in communication and information system with the National Key Laboratory of Science and Technology on Communications, University of Electronic Science and Technology of China (UESTC), Chengdu, China. He became a student member of IEEE in 2017. His current main interests lie in wireless communication system security, physical layer security, artificial intelligence and industrial communications.

Zhibo Pang (M’13–SM’15) received MBA from University of Turku and PhD from the Royal Institute of Technology (KTH). He is currently a Senior Principal Scientist at ABB Corporate Research Sweden, Adjunct Professor at University of Sydney, and Affiliated Faculty and PhD Supervisor at KTH. He is a Senior Member of IEEE and Co-Chair of the Technical Committee on Industrial Informatics. He is Associate Editor of IEEE Transactions on Industrial Informatics, IEEE Journal of Biomedical and Health Informatics, and IEEE Journal of Emerging and Selected Topics in Industrial Electronics, Guest Editor of Proceedings of the IEEE, IEEE Internet of Things Journal, and IEEE Reviews in Biomedical Engineering, etc. He has 60+ patents and 60+ refereed journal papers and 50+ conference papers. He was Invited Speaker at the Gordon Research Conference on Advanced Health Informatics (AHI2018), and General Chair of IEEE ES2017. He was awarded the “2016 Inventor of the Year Award” and “2018 Inventor of the Year Award” by ABB Corporate Research Sweden which is the only award for individuals and only one winner per year out of 300+ researchers.

Hong Wen (M’08–SM’13) received her M.Sc. degree from the Sichuan University, Chengdu, in 1997, and the Ph.D. degrees from Southwest Jiaotong University and University of Waterloo in 2004 and 2018. From 2008 to 2009, she was a Visiting Scholar and a Post-Doctoral Fellow with the Electrical and Computer Engineering Department, University of Waterloo, Waterloo, ON, Canada. She is currently a Professor with UESTC. Her current research interest includes communication systems and security.