
Computers & Security 127 (2023) 103097

Contents lists available at ScienceDirect

Computers & Security


journal homepage: www.elsevier.com/locate/cose

2DF-IDS: Decentralized and differentially private federated learning-based intrusion detection system for industrial IoT

Othmane Friha a,∗, Mohamed Amine Ferrag b, Mohamed Benbouzid c, Tarek Berghout d, Burak Kantarci e, Kim-Kwang Raymond Choo f

a Networks and Systems Laboratory (LRS), Badji Mokhtar-Annaba University, B.P. 12, Annaba 23000, Algeria
b Artificial Intelligence & Digital Science Research Center, Technology Innovation Institute, United Arab Emirates
c UMR CNRS 6027 IRDL, University of Brest, Brest, France
d Laboratory of Automation and Manufacturing Engineering, University of Batna 2, Batna, Algeria
e School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, ON, Canada
f Department of Information Systems and Cyber Security, University of Texas at San Antonio, San Antonio, TX 78249, USA

a r t i c l e i n f o

Article history:
Received 1 September 2022
Revised 30 November 2022
Accepted 9 January 2023
Available online 10 January 2023

Keywords:
Cybersecurity
Privacy
Intrusion detection
Industry 4.0
Decentralized federated learning
Differential privacy
IoT/IIoT security
Post-Quantum cryptography

a b s t r a c t

Advanced technologies, such as the Internet of Things (IoT) and Artificial Intelligence (AI), underpin many of the innovations in Industry 4.0. However, the interconnectivity and open nature of such systems in smart industrial facilities can also be targeted and abused by malicious actors, which reinforces the importance of cyber security. In this paper, we present a secure, decentralized, and Differentially Private (DP) Federated Learning (FL)-based IDS (2DF-IDS) for securing smart industrial facilities. The proposed 2DF-IDS comprises three building blocks, namely: a key exchange protocol (for securing the communicated weights among all peers in the system), a differentially private gradient exchange scheme (to achieve improved privacy of the FL approach), and a decentralized FL approach (that mitigates the single point of failure/attack risk associated with the aggregation server in the conventional FL approach). We evaluate our proposed system through detailed experiments using a real-world IoT/IIoT dataset, and the results show that the proposed 2DF-IDS system can identify different types of cyber attacks in an Industrial IoT system with high performance. For instance, the proposed system achieves comparable performance (94.37%) with the centralized learning approach (94.37%) and outperforms the FL-based approach (93.91%) in terms of accuracy. The proposed system is also shown to improve the overall performance by 12%, 13%, and 9% in terms of F1-score, recall, and precision, respectively, under strict privacy settings when compared to other competing FL-based IDS solutions.

© 2023 Elsevier Ltd. All rights reserved.

1. Introduction

Next-generation networking technologies, cooperation of physical and virtual environments, enhanced interoperability, intelligent management, and decision support are essential in enabling the fourth industrial revolution, or Industry 4.0. One of the successfully adopted technologies is the Internet of Things (IoT), partly due to its capability in supporting interconnectivity among densely clustered heterogeneous objects. The deployment of various intelligent devices (e.g., IoT devices) with heterogeneous specifications and capabilities facilitates the search and exchange of data across various industrial applications while adding a ubiquitous digital aspect by involving society and industries, thus enabling a series of interactions between the different components of industrial systems. Industrial Automation and Control Systems (IACS) operating across a wide range of industries, sometimes referred to as Cyber-Physical Systems (CPS), have traditionally been isolated from legacy digital networked environments (Boyes et al., 2018). As IoT technology is widely adopted and implemented, changes are being made to industrial system architectures, specifically increased connectivity and enhanced smartness across industrial systems, a domain known as the Industrial IoT (IIoT). Figure 1 illustrates a high-level abstraction of a simplified IIoT ecosystem, where each layer contains different types of technologies with a specific purpose within the overall ecosystem (e.g., from sensing and actuation through networking right up to highly complex calculations).

∗ Corresponding author.
E-mail addresses: [email protected] (O. Friha), mohamed.[email protected] (M.A. Ferrag), [email protected] (M. Benbouzid), [email protected] (T. Berghout), [email protected] (B. Kantarci), [email protected] (K.-K.R. Choo).

https://doi.org/10.1016/j.cose.2023.103097
0167-4048/© 2023 Elsevier Ltd. All rights reserved.
O. Friha, M.A. Ferrag, M. Benbouzid et al. Computers & Security 127 (2023) 103097

Fig. 1. High-level illustration of layered IIoT ecosystem.

The IIoT market is estimated to reach more than $100 billion by 2026¹, and the number of operable IoT devices to reach 500 billion by 2030, as reported by Cisco. Although the advantages provided by IIoT technologies are significant, they also bring with them inherent and considerable security risks, putting the industrial sector in a more vulnerable position than ever before and making it no surprise that it has become a target of attacks. A recent Private Industry Notification (PIN)² from the FBI's Cyber Division highlights that cyberattacks against the industrial food sector are on the rise, resulting in operational disruptions and economic losses. The massive growth of IoT requires the enforcement of appropriate security and privacy policies that prevent potential threats to the security and privacy of the industrial system, as some IoT-related threats can cause greater damage in an IoT environment, and conventional security measures may not always be sufficient (Sengupta et al., 2020). IoT security accounts for a key bottleneck that constrains the success of IIoT deployment. Clearly then, unless there is adequate security, IIoT is unlikely to fully achieve its anticipated potential. Therefore, recent years have witnessed a huge growth in IIoT security research.

One of the hot-spot security research areas includes the development of Intrusion Detection Systems (IDSs), which can be characterized as specialized security systems designed to continuously track and assess incidents within computer networks and systems for signs of security breaches. IDSs operate under two main strategies. A signature-based IDS maintains predefined attack signatures, but the downside of this technique is that the IDS cannot detect novel or unknown attacks. Anomaly-based IDSs employ Machine Learning (ML) to construct a model that can identify both attack and normal behaviors. Although the latter can circumvent the problem of detecting new attacks, it introduces other problems. For example, traditional learning-based IDSs typically require centralized training data. This, in turn, leads to various challenges, such as privacy issues and a considerable network and power consumption overhead (Ferrag et al., 2021a). Federated Learning (FL) is an emerging concept that supports collaborative learning capabilities that preserve privacy and minimize training costs (McMahan et al., 2017) by allowing devices to jointly train a distributed model using an aggregation server, while retaining all learning data on the device, thereby isolating machine learning capabilities from centralized storage. There has also been significant interest in cybersecurity issues related to industrial IoT ecosystems, with a particular focus on FL-based IDSs, in recent years (Ferrag et al., 2021b).
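The FedAvg-style aggregation that underpins conventional FL (local training followed by server-side averaging of client weights, weighted by local sample counts, per McMahan et al., 2017) can be sketched as follows. This is a minimal illustrative sketch; the function and variable names are ours, not from the paper.

```python
# Minimal sketch of FedAvg-style aggregation: the server averages client
# weight vectors proportionally to each client's local sample count.
# Names and values are illustrative only.

def fedavg(client_weights, client_sizes):
    """Weighted average of client weight vectors (lists of floats)."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]

# Three clients with different amounts of local data:
weights = [[1.0, 0.0], [3.0, 2.0], [2.0, 4.0]]
sizes = [10, 30, 60]
global_w = fedavg(weights, sizes)
print(global_w)  # [2.2, 3.0]
```

Note that the client holding the most data (60 samples) pulls the global model toward its own weights, which is exactly the behavior the weighted sum in FedAvg is designed to produce.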
1.1. Motivations

¹ https://www.marketsandmarkets.com/Market-Reports/industrial-internet-of-things-market-129733727.html
² https://www.ic3.gov/Media/News/2021/210907.pdf

Our proposed system is intended to be implemented across different smart factories belonging to different industrial organizations to collaboratively shape defensive cybersecurity models

Fig. 2. High-level illustration of 2DF-IDS.

without being subject to any conflict of interest. Similarly, sharing data between parts of the same organization will not be a problem, as there will be a certain level of trust between them. As a result, retaining cybersecurity intelligence from other organizations under the same defense policy with a lower level of confidence leads to a potential increase in compromise and competitiveness. Our proposal is specifically designed to support such scenarios, where mutual-trust training is very necessary in unreliable environments. However, our system does not follow traditional FL rules by involving a centralized entity to aggregate updates from participating parties; instead, it implements a peer-to-peer approach with different layers of security to combat security issues from both external and participating parties.

1.2. Contributions

Most existing systems suffer from various shortfalls, such as having limited instances of high-quality cyber attacks, as information owners are not prepared to make such sensitive information about their critical systems public, rendering the model construction effort extremely difficult. In addition, the exchange of parameters is rarely highly secure, either from external or internal threats. Moreover, relying on a centralized server-based architecture can introduce new threats to the ecosystem. In the high-level network design part of Fig. 2, we can see four smart factories equipped with 2DF-IDS. When we zoom in on one of them (as illustrated in the high-level IIoT system design part), we can see the adopted architecture of a smart factory as a whole, and also the links with other smart factories. Let us assume that their references are SF1, SF2, SF3, and SF4, and that SF1 has been the subject of AttackType1 in the past, meaning that SF1 has intelligence on AttackType1 along with the benign profile of its operations, network flows, and all normal profiles. The same can also be generalized to the other smart factories (AttackType2 targeting SF2, and so on). By using 2DF-IDS, the smart factory SF1 can learn to detect all attack types (AttackType1..AttackType4) without sharing its data, without the need for other smart factories' data, and without necessarily being the subject of every attack type to recognize it. In addition, the proposed system is intended to eliminate the single point of failure vulnerability represented by the conventional FL aggregation server. We do this by allowing participating parties to directly train a model without a centralized entity that controls the training sessions and can be compromised


either in a normal way (caused by failure) or maliciously (caused by cyber-attacks). Moreover, FL is found to be vulnerable to model extraction and model inversion attacks, where attackers can gain a certain level of confidence about the existence of private data from the exchanged gradients (Wang et al., 2019). To tackle this, we propose the use of a quantum-resistant secure exchange for protection against external parties and DP for protection against participating parties. Furthermore, the system we propose is designed to ensure friendly resource utilization and low network usage, unlike centralized learning, where uploading training data can cause privacy and performance issues due to the amount of data that needs to be exchanged before every centralized training session.

In this paper, we demonstrate that by addressing the above problems, FL-based IDSs can be made more secure, efficient, and privacy-enhancing. Moreover, a detailed performance analysis, as well as a comparative analysis between the proposed system, centralized learning, FL, and state-of-the-art works, validates our claims, with a slight degradation of up to 0.47% in accuracy compared to the base case (centralized learning), and an outperformance of up to 12%, 13%, and 9% in F1-score, recall, and precision within tough privacy settings compared to other FL-based IDS solutions. This work has three primary contributions, as laid out below:

• We present 2DF-IDS, a secure, decentralized, and differentially private FL-based IDS for securing IoT and IIoT environments.
• We enforce the privacy of participating clients in two ways: a quantum-resilient Ring-Learning With Errors (R-LWE) key exchange protocol for outbound protection, and differential privacy for inbound protection, to prevent privacy leakage from clients' local gradients.
• We consider a fully decentralized aggregation scheme in 2DF-IDS to eliminate the single point of failure threat introduced by the aggregation server in conventional FL.

The rest of the paper is organized as follows: In Section 2, we present the related works. In Section 3, we describe the concept of the proposed system, starting with a high-level overview, followed by a description of its constituent parts and the system stages. In Section 4, we perform a series of experimental evaluations to assess the performance of the proposed system, along with a side-by-side comparative summary with other competing approaches. Finally, Section 5 concludes this paper.

2. Related works

Research efforts related to FL-based cyber security in IoT and IIoT environments have emerged in the last few years after the introduction of the FederatedAveraging (FedAvg) algorithm proposed by McMahan et al., which aims to aggregate the weights of multiple locally trained machine learning models into a global model (McMahan et al., 2017). Since then, different studies have been conducted in this field (Ferrag et al., 2021a; Friha et al., 2022). Nguyen et al. (2019) proposed an FL-based IDS built upon an automated technique for type-specific IoT devices; the system was tested on its ability to detect Mirai-infected devices. The accuracy, training time, and False Alarm Rate (FAR) reported on a real testbed indicated the feasibility of the system. However, only malware-based threats (and precisely Mirai) are considered; other types of IoT/IIoT threats and attacks are missing. Schneble and Thamilarasu (2019) implemented and evaluated an FL-based IDS using a collection of actual patients' data. The proposed IDS was designed to secure Medical Cyber-Physical Systems (MCPS). However, the only considered threat model was Denial of Service (DoS), along with data injection and modification. Song et al. (2020) proposed a dedicated defense mechanism that secures FL-based schemes against adversarial attacks targeting IIoT applications within the cloud. This system can gather defense intelligence about attack instances from various sources and allows sharing it between IIoT devices that are implemented in a cloud-based architecture. Hao et al. (2019) implemented a Privacy-Enhanced FL (PEFL) scheme to build a privacy-preserving IDS for industrial artificial intelligence environments. The authors proposed a secure aggregation protocol based on homomorphic encryption and protected training data privacy using Differential Privacy (DP) by implementing a distributed Gaussian mechanism. The performance evaluation was carried out on the MNIST dataset. Zhang et al. (2020) presented a blockchain-based FL platform architecture for IIoT ecosystem failure detection. The proposed scheme checks the data integrity of customers and implements a Centroid Distance Weighted Federated Averaging (CDW FedAvg) algorithm that accounts for the positive-class to negative-class distance of each customer dataset. Khan et al. (2021) proposed an FL-based security system designed to recognize supply chain 4.0 intrusions, called DFF-SC4N. In the proposed system, rounds of communication are employed in an FL manner with Recurrent Managed Units (RMUs), sharing just the learned metrics while maintaining the data undisturbed on the local server. Taheri et al. (2020) designed an FL-based architecture, named Fed-IIoT, that is intended to discover Android malware apps in IIoT domains. The system architecture consists of two main sections: 1) a participant side, in which data is generated from two different poisoning threats using a Generative Adversarial Network (GAN) and a federated GAN; and 2) a server side that seeks to supervise the aggregated model and build a robust collaborative training model. In addition, the communications were secured using a Paillier cryptosystem to preserve the security of exchanged parameters during the training process. Li et al. (2020) proposed DeepFed, an FL-based IDS designed for securing Industrial Cyber-Physical Systems (CPSs). The system was evaluated using a real industrial CPS dataset and three sets of learner clients, namely 3, 5, and 7 clients, along with an aggregation server and a trusted authority. The threat model considered includes DoS, command and response injection, reconnaissance, and FL-related eavesdropping attacks. Mothukuri et al. (2021) proposed an FL-based anomaly detection approach to identify IoT network intrusions. The authors trained Gated Recurrent Unit (GRU) models while preserving the local IoT device data privacy, by only sharing the learned weights with the aggregation server.

2.1. Limitations

The aforementioned works leverage FL's key advantage, which involves local training without sharing private data, thus protecting client data from being compromised by malicious parties. Nevertheless, FL brings its own challenges (Kairouz et al., 2019). For instance, data privacy can still be compromised to some extent by adversaries analyzing the variation between FL clients' associated training weights (Wang et al., 2019), and private and sensitive training data can be unintentionally memorized by generative sequence models (Carlini et al., 2019), making it possible to recover some information on the training data from the model. In addition, the aggregation server introduces a Single Point Of Failure (SPOF) threat to the entire network architecture. Furthermore, the network and computational overheads brought in when relying on external modules such as current blockchains are not well suited for certain time-critical and real-time security applications. It is evident from the preceding analysis that certain gaps are highlighted in the literature, including issues of privacy, constrained architectures, and limited threat models addressing only a limited scope of the attack vectors, all requiring effective treatment to secure the IIoT sector.

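The gradient-leakage concern raised above can be made concrete for a single linear neuron: the gradient with respect to each weight is the corresponding input scaled by the gradient with respect to the bias, so anyone who sees the raw gradients can divide them to recover the raw input. The toy sketch below illustrates this; values and names are ours, chosen for illustration only.

```python
# Toy illustration of input reconstruction from shared gradients for one
# linear neuron y = w.x + b with squared loss L = (y - t)^2:
#   dL/dw_i = 2(y - t) * x_i   and   dL/db = 2(y - t),
# so dL/dw_i / dL/db = x_i, i.e. the raw input leaks from gradients alone.

def gradients(w, b, x, t):
    y = sum(wi * xi for wi, xi in zip(w, x)) + b
    delta = 2 * (y - t)                   # dL/db
    return [delta * xi for xi in x], delta

w, b = [0.25, -0.5, 1.0], 0.5
x_private = [2.0, 4.0, -8.0]              # data the client never shares
g_w, g_b = gradients(w, b, x_private, t=-9.5)

x_recovered = [gw / g_b for gw in g_w]    # attacker sees only g_w, g_b
print(x_recovered)  # [2.0, 4.0, -8.0] -- the private input, exactly
```

This is the same observation the paper later attributes to Aono et al. (2017) in its threat model, and it is one motivation for adding differential privacy to the exchanged gradients.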

3. 2DF-IDS: Proposed approach

In this section, alongside a credible threat model, we provide a detailed description and construction workflow of the proposed system by describing the operational models, constituent blocks, and the proposed methodology.

Table 1
Notations used in the proposed system.

Notation      Description
T             Global iterations
E             Clients' local iterations
PG            Connected graph
N_k^t         Neighborhood of client k at time t
D_k           Local private dataset for client k
w_k^t         Client k parameters at time t
K             Total number of clients
η             Learning rate
L(.)          Loss function
g(x_i)        Gradient computed on x_i
C             Gradient norm bound
σ             DP noise scale
R             Ring (R := Z[X]/(f(X)))
R_q           Quotient ring (R/qR)
χ             β-bounded Gaussian distribution over R_q
a ← R_q       Uniformly random sampling
s_k           Initial secret key for user k
P_k           Temporary public key for user k
Sig(.)        Signal function
Rec(.)        Reconciliation function
LK_k          Initial shared session key
SK            Ephemeral key
H(.)          One-way hash function
KeyGen(.)     Key generation function
Enc(.)        Symmetric encryption function
Dec(.)        Symmetric decryption function
(ε, δ)        Privacy cost

3.1. High-level system overview

Figure 2 presents a high-level and thorough illustration of the proposed 2DF-IDS. The remainder of this section discusses each part comprehensively.

3.1.1. Objectives
Research on the development of FL-based IDSs is currently a hot topic of investigation (Ferrag et al., 2021a; Friha et al., 2022), given the potential for privacy preservation (at certain levels) with accurate model training in the former, and efficient profiling of both benign and malicious patterns in the latter. The main goals of our proposed system are to ensure the above features and, in addition, to introduce extra security layers and efficiency features to further improve security, privacy, and reliability.

3.1.2. Threat model
The proposed 2DF-IDS is designed around the assumption of "honest but curious" clients. This means that participating clients actually follow all the steps, with the possibility of taking a closer look at the parameters of other peers. We denote by A an adversary that is either a malicious entity with harmful intentions or a curious in-system user who does not intend to cause damage but is curious enough to explore. While it is possible that an adversary A could try to poison the learning process, the existence of multiple honest entities will greatly reduce the adversary's effect. We consider the following threat model:

• SPOF Threat: The aggregation server can stretch the attack surface against FL-based systems, given that it serves as the principal entity on which the training depends, and thus represents a single point of failure for the entire operation from a security perspective. Adversary A can shut down the entire training process by simply focusing on compromising the aggregation server. This type of attack has the potential to cause significant damage when the FL-based training process is used to train an IDS, potentially delaying the creation and deployment of the model and thus allowing more time to launch an attack successfully.
• Private Data Reconstruction: It has been proved that there are certain conditions under which training data privacy is threatened in FL settings, either unintentionally (Carlini et al., 2019) or intentionally (Wang et al., 2019). For example, Aono et al. (2017) demonstrated that individual users' private training data could be retrieved from the exchanged gradients. To reconstruct a given sample x_k, the attacker makes use of both the model weight parameters w_k and the biases b_k, using w_k / b_k.
• Quantum-based Crypto-analysis Attacks: A major threat to existing public-key methods, which are primarily grounded upon the two main hard problems of the prime number factorization problem and the discrete logarithm problem, lies in the possibility of being broken by quantum computers, as demonstrated by Peter Shor's proposal of a polynomial-time quantum scheme for the factorization problem (Shor, 1994). This has the obvious implication that all online workloads are handled under the assumption that factoring integers containing a fairly big number of digits is virtually impossible, although hypothetically this is not the case. This also implies that the exchange of model parameters during FL training sessions secured using such cryptosystems is also likely to be compromised in the future.

3.1.3. Proposed model
As illustrated in Fig. 2, the proposed model is as follows:

• Network Model: a group of smart factories interconnected³ by an unsecured communication channel. The first objective is to use this channel and create a secure version of it so that the transmitted data is protected.
• System Model: each smart factory is classified as a group of overlapping and interoperating technologies, as illustrated earlier in the paper. In addition, each contains an event logging mechanism, a local private database, and an instance of 2DF-IDS, also referred to as a client. We also consider an adversary A that launches different types of cyberattacks against different smart factories. For the training stage, each client exchanges the model learned on its local data with all other interested clients in a collaborative and decentralized learning process. The shared parameters must first be protected from both outsiders (those on the insecure channel) and insiders (those within the secure communications). The collaboratively trained model will be able to effectively detect all kinds of attacks that have targeted the group, since the experience of all clients is shared among all 2DF-IDS instances.

³ The term "interconnected" does not necessarily mean a direct connection; it can be a connection through the Internet.

3.2. 2DF-IDS: Building blocks

In this part, we present the constituent elements used in the construction of 2DF-IDS. Table 1 gives the notations used.

3.2.1. Deep learning model
Deep Learning (DL) (LeCun et al., 2015) operates through multiple layers (hence the term "deep") to derive high-level


Fig. 3. (a) Vanilla FL vs. (b) Decentralized FL, from the standpoint of client k.
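The server-less aggregation contrasted in Fig. 3(b) can be illustrated with a toy consensus round in which every peer moves toward the average of its neighbors' weights. This is a minimal sketch of the idea only: scalar "weights", the ring topology, and all names are ours, and we read the decentralized update as an increment to each client's parameters, which is the usual consensus form.

```python
# Sketch of one decentralized averaging round (no aggregation server):
# every client applies w_k <- w_k + (1/|N_k|) * sum_n (w_n - w_k) using
# only its neighbors' weights. Repeated over a connected graph, all
# clients converge to a consensus (the centroid of the initial models).

def consensus_round(weights, neighbors):
    new = {}
    for k, w_k in weights.items():
        nbrs = neighbors[k]
        delta = sum(weights[n] - w_k for n in nbrs) / len(nbrs)
        new[k] = w_k + delta          # synchronous update for all peers
    return new

# Four peers on a ring topology (SF1..SF4, scalar "weights" for brevity):
weights = {"SF1": 0.0, "SF2": 4.0, "SF3": 8.0, "SF4": 4.0}
neighbors = {"SF1": ["SF2", "SF4"], "SF2": ["SF1", "SF3"],
             "SF3": ["SF2", "SF4"], "SF4": ["SF3", "SF1"]}

for _ in range(20):
    weights = consensus_round(weights, neighbors)
print(weights)  # every client ends at the centroid, 4.0
```

No single node ever holds a privileged role here, which is the property the paper relies on to remove the aggregation server's single point of failure.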

characteristics from the input data in a stepwise fashion. This capability has proven tremendously beneficial in providing efficient data representations to build improved models. In our system, we employ a Deep Neural Network (DNN) (to be more specific, a Multilayer Perceptron (MLP)), which is composed of an input layer (h^(0)(x)), multiple hidden layers (h^(k)(x)), and an output layer (h^(L+1)(x)), as presented in Eq. (1):

h^(0)(x) = x
h^(k)(x) = g(W^(k) h^(k−1)(x) + b^(k)), k = 1, ..., L        (1)
h^(L+1)(x) = g(W^(L+1) h^(L)(x) + b^(L+1)), k = L + 1

where x is the input entry, L is the number of hidden layers, g(.) is the activation function, and b is the bias. While there exist different kinds of neural networks, they still share the same main ingredients (neurons, weights, biases, and functions). Each artificial neuron is a function given by f_j(x) = g(<w_j, x> + b_j), where x is the characteristics input record [x = (x_1, x_2, ..., x_d)] and w_j is the vector of connection weights [w_j = (w_{j,1}, ..., w_{j,d})]. For the hidden layers' activation function, we utilize the Rectified Linear Unit (ReLU), given by [g(x) = max(0, x)], since it is significantly less prone to the vanishing gradients problem when compared to Sigmoid and Tanh, and therefore makes the training process much more efficient. However, since we are dealing with a multi-class classification problem, we employ SoftMax for the last layer, given by σ(z)_i = e^{z_i} / Σ_{j=1}^{K} e^{z_j}, where z is the input vector and K is the number of classes.

3.2.2. Decentralized FL approach
We consider a training algorithm that manages and generates the information exchanged between the different participants in the network (Lalitha et al., 2019) and that sustains high connectivity across time, where every client can train a model on its private data D_k. Messages exchanged between the connected clients, including the trained weights, are represented by W_k. We have a connected graph PG of peer-to-peer clients, given by PG = (V, E), where V is the clients' vector of K elements (K is the total number of clients) and E is the set of paired elements in the graph, where an edge (k, n) ∈ E represents a bidirectional connection between client k and client n. The neighborhood of client k, representing all clients {[1, ..., N], N ≤ K} connected to k, with which it can exchange information at a given time t, is denoted by N_k^t. In FedAvg (or vanilla FL) (McMahan et al., 2017), every client computes the average gradient on its local data and updates its model weights using [w_k − η∇L(w_k, x)], where η is the learning rate and L(.) is the loss function. Then the aggregation server computes the new weights for the global model at a given global learning round using [Σ_{k=1}^{K} (n_k/n) w_k^{t+1}], where n_k is the local sample number for client k. In terms of network topology, the vanilla FL algorithm represents a star network with the server at the center, with C·K elements at a given round, where C is the proportion of clients used for training at that round. If client k has the ability to collect the full weights from all its connected neighbors in N_k^t, then it can simply update its model parameters using Eq. (2) (Zhang et al., 2021), where every client k generates its own model w_k by averaging the resultant differences to its neighbors:

w_k = (1/|N_k^t|) Σ_{n ∈ N_k^t} (w_n − w_k)        (2)

The above method is guaranteed to eventually reach the centroid of all client models, since there exists at least one path between every pair of clients (Zhang et al., 2021), so it is possible for the clients to train a model and reach a consensus without using the aggregation server. In Fig. 3, we present a comprehensive illustration of the main differences between aggregation server-based FL (vanilla FL) and aggregation server-less FL (decentralized FL). As we can see, the main difference between the two approaches is that there is no central server that orchestrates the training process in the decentralized FL approach, and consequently the aggregation techniques used differ.

3.2.3. DP-Based privacy preserving
Since model parameters are shared among the system peers, models are not allowed to exhibit any private details within the


Fig. 4. Differentially Private Stochastic Gradient Descent schematic illustration.
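The per-step computation that Fig. 4 illustrates (per-sample gradient clipping followed by calibrated Gaussian noise, in the style of Abadi et al.'s DP-SGD) can be sketched as follows. This is a minimal pure-Python sketch in which the per-sample gradients are assumed to be already computed; the function names and values are ours.

```python
import math, random

# Sketch of one DP-SGD aggregation step: clip each per-sample gradient
# to l2-norm at most C, sum, add Gaussian noise N(0, sigma^2 * C^2), and
# average over the group size B. Per-sample gradients are given as input.

def dp_sgd_step(per_sample_grads, C, sigma, rng):
    B = len(per_sample_grads)
    dim = len(per_sample_grads[0])
    summed = [0.0] * dim
    for g in per_sample_grads:
        norm = math.sqrt(sum(x * x for x in g))
        scale = 1.0 / max(1.0, norm / C)        # clip to norm bound C
        for i in range(dim):
            summed[i] += g[i] * scale
    return [(summed[i] + rng.gauss(0.0, sigma * C)) / B  # add noise, average
            for i in range(dim)]

rng = random.Random(0)
grads = [[6.0, 8.0], [0.5, 1.0]]                # l2 norms 10.0 and ~1.12
noisy = dp_sgd_step(grads, C=5.0, sigma=0.0, rng=rng)  # sigma=0: no noise
print(noisy)  # [1.75, 2.5]: first gradient clipped to norm 5, second kept
```

With sigma set to zero the clipping effect is visible in isolation; in an actual DP-SGD run sigma is positive and its value, together with C and the sampling rate, determines the accumulated privacy cost (ε, δ).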

trained data. DP (Dwork et al., 2006; Song et al., 2013) is a mechanism that involves making computations on large datasets while restricting the leakage of specific and particular information. In other words, a DP-satisfying scheme quantified by (ε, δ) privacy settings guarantees that a given input will not influence its outcome or the overall dataset statistics. In a formal definition, a randomized scheme S is (ε, δ)-private if, for adjacent datasets d, d′ ∈ D, which differ in no more than one sample, and for each subset R of the output range, it holds that Pr(S(d) ∈ R) ≤ e^ε · Pr(S(d′) ∈ R) + δ.

The first naive way to protect private training data would involve operating solely on the end parameters resulting from the training process. However, the over-conservative addition of noise to the parameters has the potential to completely ruin the usefulness of the learned model (Dwork et al., 2006). Abadi et al. (2016) provided Differentially Private Stochastic Gradient Descent (DP-SGD), a DP version of the SGD optimization algorithm that maintains DP whenever the model parameters are updated. DP-SGD protects the privacy of the training dataset by manipulating the parameter gradients used by the model when updating its weights, instead of manipulating the data directly. To achieve this, during every step, DP-SGD computes the gradient at time t using:

g_t(x_i) = ∇_{W_t} L(W_t, x_i)        (3)

for a random subset of samples {x_0, x_1, ..., x_n}, where W denotes the parameters and L(.) is the loss function. It then clips the l2 norm of each per-sample gradient by dividing the gradient at time t by:

max(1, ||g_t(x_i)||_2 / C)        (4)

where C is the norm bound, yielding the clipped gradient ḡ_t(x_i). After computing the average, the noise is added using:

g̃_t = (1/B) (Σ_i ḡ_t(x_i) + N(0, σ²C²I))        (5)

where B is the group size, N is the Gaussian distribution, and σ is the noise scale. DP-SGD then updates the parameters using W_t − η g̃_t, where η is the learning rate, and, in the end, calculates the overall privacy cost (ε, δ). Figure 4 presents a schematic illustration of DP-SGD. Although this so-called micro-batching technique actually produces accurate per-sample gradients, in practice it tends to take a long time to complete and makes limited use of hardware accelerators. Vectorized computation is a newer technique (Yousefpour et al., 2021) intended to overcome the DP-SGD speed problem by deriving the per-sample gradient method and constructing a vectorized variant of it. Using the loss-versus-weight derivative according to the chain rule equation ∂y/∂x = (∂y/∂u)(∂u/∂x), per-sample gradients are obtained with [∂L_B/∂W_{i,j} = Σ_x (∂L_B/∂Y_i(x)) X_j(x)], where W denotes the weights matrix, L_B is the loss of the batch B, and x is the batch element, while Y and X denote the output and input matrices, respectively. The previous equation is intended for one linear layer; a more general formula for a layered neural network is given by [∂L_B/∂W_{i,j}^(l) = Σ_x (∂L_B/∂G_i^(l)(x)) G_j^(l−1)(x)], where i is the neuron, l is the layer, G is the activation function, and ∂L_B/∂G_i^(l)(x) is the highway gradient (Yousefpour et al., 2021).

3.2.4. Secure key exchange protocol
Although DP ensures the privacy of training data located at the client end, it cannot provide any additional protection against other types of attacks over unsecured networks. Therefore, we propose the use of a secure key exchange protocol between peers in the 2DF-IDS system to secure the transmission channels. Given that existing public key methods, mainly based on the two hard problems of the prime number factorization problem and the discrete logarithm problem (such as

7
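The per-step clip-average-noise recipe of Eqs. (3)–(5) can be sketched in a few lines of NumPy (an illustrative sketch only, with our own function names; the actual implementation in this work relies on the Opacus library):

```python
import numpy as np

def dp_sgd_step(per_sample_grads, C, sigma, lr, w, rng):
    """One DP-SGD update: clip each per-sample gradient to l2-norm C
    (Eq. 4), sum, add Gaussian noise N(0, sigma^2 C^2 I), average over
    the batch (Eq. 5), then take a gradient step with learning rate lr."""
    B = len(per_sample_grads)
    clipped = [g / max(1.0, np.linalg.norm(g) / C) for g in per_sample_grads]
    noise = rng.normal(0.0, sigma * C, size=w.shape)
    g_noisy = (np.sum(clipped, axis=0) + noise) / B
    return w - lr * g_noisy
```

With sigma = 0 the update reduces to plain clipped SGD; raising sigma trades model accuracy for a tighter (ε, δ) privacy cost.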
O. Friha, M.A. Ferrag, M. Benbouzid et al. Computers & Security 127 (2023) 103097

Hellman, RSA, and elliptic curve cryptography), are very likely to be broken by quantum computers, as shown by Shor's quantum algorithms (Shor, 1999). Post-quantum cryptography consists in designing cryptosystems whose reliability against adversaries equipped with quantum computers is maintained. Currently, the majority of post-quantum-based solutions belong to six different families. All represent a distinct class of mathematical problems, which are hard to address even by quantum computers.

Lattice-based hard problems are one of these families that rely on worst-case problems for security. While lattice-based cryptography is getting widely applied in practice, one specific class of problems used in many recent implementations is of special interest, namely Learning With Errors (LWE) (Regev, 2009) and its even more efficient ring-based variant, Ring-LWE (Lyubashevsky et al., 2010). Within a formal definition, given the polynomial quotient ring R of degree n over Z, where R := Z[X]/(f(X)), n is a power of 2, and f(X) is irreducible, let Rq := R/qR = Zq[X]/(f(X)) be the quotient ring with a prime integer modulus q, and let χ denote an error distribution over short elements of the ring R. Then, the Ring-LWE search problem lies in recovering a uniformly random secret s, by having independent samples of the form:

(a_i, b_i = s · a_i + e_i) ∈ Rq × Rq    (6)

where every a_i ∈ Rq and s ∈ Rq are sampled uniformly at random, and every e_i ← χ is drawn according to the explicit (Gaussian) error distribution χ. Additionally, the Ring-LWE decision problem is the challenge of differentiating samples of the above form from other samples that are uniformly random over {Rq × Rq}. Given that it is hard for quantum algorithms to approximate the Shortest Vector Problem (approx-SVP) in polynomial time in the worst case on ideal lattices, Ring-LWE is known to be as hard as approx-SVP on any ideal lattice (Lyubashevsky et al., 2010). We employ Ring-LWE together with the AES cryptosystem for exchanging a shared group training session key. The Ring-LWE-based key exchange part relies on the DXL-KE protocol proposed by Ding et al. (2012), which introduces a signal function to specify whether or not an input is part of a fixed set, denoted by Sig(.) and defined as follows: having Zq = {−(q−1)/2, ..., (q−1)/2}, together with the middle subset E := {−⌊q/4⌋, ..., ⌊q/4⌋}, the complement of E's characteristic function Sig(a) returns 0 if a ∈ E, and returns 1 otherwise. In addition, the reconciliation function used to derive the Ring-LWE key is denoted by Rec(.) and defined as follows: given any a, b ∈ Zq, s.t. (a − b) mod 2 = 0 and |a − b| < τ, where τ is the system's error tolerance, we have [Rec(a, ς) = (a + ς · (q−1)/2 mod q) mod 2 : Zq × {0, 1} → {0, 1}]; in other words, where ς = Sig(b), it holds that Rec(a, ς) = Rec(b, ς). Ding et al. (2018) demonstrated that the DXL-KE protocol is vulnerable to key reuse attacks. To address this, we emphasize stricter requirements, including:

• Using the Ring-LWE shared key as a simple input to the AES cryptosystem to obtain the final shared secret group key
• Public and secret settings must be changed for each training session
• Public and secret settings and keys must never be rechosen for a new training session

The last two conditions require the use of a randomization function when choosing q, n, and the ring R.

3.3. 2DF-IDS: Secure, decentralized, and DP training

The proposed system aims to train a shared global model in a secure, decentralized, and DP-based FL approach. There are two main phases in the 2DF-IDS system: the first is the session group key exchange protocol, where we use the Ring-LWE group key exchange protocol of Ding et al. (2012), together with the AES cryptosystem, to generate an ephemeral key (as presented in Algorithm 1), which will be used in the second phase to secure the communication channel for collaborative model training in a decentralized and DP-based FL approach (Algorithm 2). Table 1 presents the list of notations used. We assume a total of K clients, and a decentralized secure system for public parameter setting broadcasting.⁴ The 2DF-IDS training process steps are as follows:

⁴ We consider using a decentralized system as well for setting the public parameter, specifically a blockchain application enabling smart contracts.

Algorithm 1 Group Training Session key exchange protocol.
Input: K, n, q, R, χ, a    ▷ R: the used ring, χ: the discrete Gaussian distribution on Rq, and a: the uniformly random public sample from Rq.
Output: Symmetric group session key SK
Require: n a power of 2, f(x) = x^n + 1, a ← Rq
1: for k = 0, ..., K − 1 in parallel do    ▷ Clients Initialization
2:   Randomly Sample. s_k ← χ
3:   Randomly Sample. e_k^0 ← χ
4:   Compute. P_k^0 = a·s_k + 2e_k^0
5:   Send P_k^0 to client k + 1.
6: end for
7: for k = 0, ..., K − 1 do    ▷ Group-SK negotiation
8:   for c = 1, ..., K − 2 do
9:     Randomly Sample. e_k^c ← χ
10:    Set. z = (k + c) mod K
11:    Compute. P_k^c = s_z · P_k^{c−1} + 2e_k^c
12:    Send P_k^c to client (z + 1) mod K.
13:   end for
14: end for
15: for k = 0, ..., K − 1 do    ▷ Ephemeral key Generation
16:   Set. nd = K − 2
17:   Set. np = (k == 0) ? 1 : k + 1
18:   Randomly Sample. e_k^r ← χ
19:   Compute. LK_k = s_k · P_np^nd + 2e_k^r
20:   if k == 0 then
21:     Compute. ς = Sig(LK_k)
22:     Compute. TK_k = Rec(LK_k, ς)
23:     Broadcast. ς
24:   else
25:     Compute. TK_k = Rec(LK_k, ς)
26:   end if
27:   SK = KeyGen(TK_k)    ▷ AES Ephemeral key
28: end for

3.3.1. Initialization
The initialization stage includes setting up the public environment. We consider initialization at two different levels: 1) global system initialization and 2) initialization of participating clients.

• Global System Initialization: The public system parameters needed include: the total number of participating clients denoted by K, default parameter w0, global training epochs T, local training epochs E, learning rate η, and an integer n that is a power of 2 used as a dimension, defining f(x) = x^n + 1 and considering the ring R, where R := Z[x]/(f(x)); choosing a prime integer q, defining Rq = Zq[x]/(f(x)); sampling the Discrete Gaussian distribution on Rq, denoted by χ, with norm at most β; and choosing a public sample a ← Rq uniformly at random.
• Clients Initialization: Every client k sets its initial parameter w_k^0 to the default parameter w0, samples a uniformly random
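The signal and reconciliation functions above can be illustrated on plain integers modulo q (a toy sketch of the scalar case only, not the full ring-polynomial arithmetic; the modulus q = 12289 is just an example value of our own choosing):

```python
Q = 12289  # example odd prime modulus; illustrative only

def centered(x, q=Q):
    """Map x into the centered residue set Zq = {-(q-1)/2, ..., (q-1)/2}."""
    r = x % q
    return r - q if r > (q - 1) // 2 else r

def sig(a, q=Q):
    """Signal function Sig(.): 0 if a lies in the middle set E, 1 otherwise."""
    a = centered(a, q)
    return 0 if -(q // 4) <= a <= q // 4 else 1

def rec(a, s, q=Q):
    """Reconciliation Rec(a, s) = (a + s*(q-1)/2 mod q) mod 2."""
    return ((a + s * ((q - 1) // 2)) % q) % 2
```

For two values a, b that differ by a small even error, both parties compute the same bit, Rec(a, Sig(b)) = Rec(b, Sig(b)); this is how the clients derive a common key from noisy shared values.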

s_k, e_k^0 ← χ, where s_k is the initial secret key and e_k^0 is the sampled error. Then, every client computes its public key using P_k^0 = a·s_k + 2e_k^0, and sends it to the next client. It has been mentioned earlier that the network is considered as a connected graph PG of P2P clients, and the client positions can be managed by the decentralized global system, using client identities and chained lists to logically re-form the graph into a cyclic group for that task.

3.3.2. Group-SK agreement
At this stage, all clients agree on a common training session key. This phase is divided into two stages. The first is where all clients run the Ring-LWE-based key exchange protocol, where they get the initial group-SK, which serves as the input to the next cryptosystem, namely AES, wherein the output will be the ephemeral key used to secure the training session.

• Initial Group-SK Negotiation: for every client k, and given c ∈ [1, ..., K − 2], randomly sample e_k^c ← χ and compute P_k^c = s_z · P_k^{c−1} + 2e_k^c, where z = (k + c) mod K. Then send it to the next client. After that, every client samples another uniformly random set given by e_k^r ← χ, and computes LK_k. The first client uses the two functions discussed above, namely the signal (Sig(.)) and reconciliation (Rec(.)) functions, to compute the temporary group secret key TK_k and to broadcast ς to the rest of the clients, which use Rec(LK_k, ς) to obtain the same TK_k. It is proven that, with overwhelming probability, all clients k will end up with the exact same key TK_k with value a·∏_{k=0}^{K−1} s_k + ψ, if the condition given by q/4 − 2 ≥ n^K β^{K+1} · 4^K ≥ |ψ_0 − ψ_{k∈[1,...,K−1]}| is met (Ding et al., 2012).
• Ephemeral key Generation: The obtained key LK_k from the previous step will be used by all clients to generate an ephemeral key SK, using the AES cryptosystem. We also use four functions, namely: a one-way hash function H(.); a key generation function, denoted by KeyGen(.) and defined by KeyGen(x) = H(x||K), which is used to generate the ephemeral key SK; a symmetric encryption function, denoted by Enc(.) and defined by C = Enc(M, KS); and a symmetric decryption function, denoted by Dec(.) and defined by M = Dec(C, KS), where M is the plaintext and C is its ciphertext.

3.3.3. Model training
By the time all clients k ∈ [0, ..., K − 1] have agreed on the group session key SK, they are ready to start the shared model learning process. This phase consists of two steps, namely local client training and decentralized aggregation, as outlined in Algorithm 2.

Algorithm 2 Secure, Decentralized, and DP-based FL.
Input: SK, PG, {N_k^t}_{k∈K}, K, w0, T, E, {D_k}_{k∈K}, η, C, σ
Output: Trained model w^T, and overall privacy cost (ε, δ).
Require: Prior execution of Alg. 1
1: Set. w_k^0 = w0 for k ∈ [0, ..., K]
2: for t = 0, ..., T do    ▷ Local Training
3:   for k = 0, ..., K − 1 in parallel do
4:     for i = 1, ..., E do
5:       Sample. D_k^i ← D_k, with Prob. |D_k^i|/|D_k|
6:       Set. MB = size(D_k^i)
7:       for each d_k ∈ D_k^i do
8:         Compute. g_k^t(d_k) = ∇_{w_k^t} L(w_k^t, d_k)
9:         Clip. g_k^t(d_k) = g_k^t(d_k) / max(1, ||g_k^t(d_k)||_2 / C)
10:       end for
11:       Compute. g_k^t = (1/MB) (Σ_{d_k∈D_k^i} g_k^t(d_k) + N(0, σ²C²))
12:       Compute. w_k^{t+1} = w_k^t − η · g_k^t
13:       Compute. cw_k^{t+1} = Enc(w_k^{t+1}, SK)
14:       Send. cw_k^{t+1} to all neighbors n ∈ N_k^t
15:     end for
16:     while No consensus do    ▷ Global Training
17:       for k = 0, ..., K − 1 do
18:         for each n ∈ N_k^t do
19:           Compute. NW_k^t[n] = Dec(cw_n^{t+1}, SK)
20:         end for
21:         Compute. w_k^{t+1} = (1/|N_k^t|) Σ_{n∈N_k^t} (NW_k^t[n] − w_k^{t+1})
22:       end for
23:     end while
24:   end for
25: end for

• Local DP-model Training: Every client k starts by setting its initial model weights w_k^0 to the default settings w0. After that, it starts the local model training on its private dataset D_k, by sampling a mini-batch D_k^i, where i is the local epoch index from E. Then, it computes the per-sample gradient g_k^t(d_k) for each sample d_k, using ∇_{w_k^t} L(w_k^t, d_k), which is then clipped in l2 norm according to the clipping bound C, to control the effects of individual training data on updates (Abadi et al., 2016). After completing the local epochs update, client k introduces the Gaussian noise N(0, σ²C²), where σ denotes the noise scale, and computes the new local model parameters w_k^{t+1}. Client k at this step needs to securely share the resulting weights with its neighbors, denoted by N_k^t. To do this, it encrypts the payload using Enc(w_k^{t+1}, SK) and sends the resulting cipher cw_k^{t+1} to every n ∈ N_k^t.
• Decentralized Aggregation: After client k receives all neighbors' ciphered new models, k decrypts each one of them using Dec(cw_n^{t+1}, SK), and stores the parameters in a table denoted by NW_k^t[n]. In the end, k executes the consensus protocol and aggregates all weights using (1/|N_k^t|) Σ_{n∈N_k^t} (NW_k^t[n] − w_k^{t+1}) to obtain the new w_k^{t+1}. The consensus protocol enforces the repetition of the last step until all clients reach identical model parameters before beginning the next local update cycle (Zhang et al., 2021).

4. Performance evaluation

In this section, we begin with a detailed description of the used dataset in conjunction with all experimental configurations, followed by an evaluation of our proof-of-concept implementation of the 2DF-IDS system. In addition, we demonstrate the effectiveness of our proposed system by comparing it against various state-of-the-art studies.

4.1. Dataset description

We used the Edge-IIoTset dataset (Ferrag et al., 2022), which is a recent, comprehensive, and realistic IoT/IIoT-based applications dataset specifically designed for cybersecurity purposes. The dataset was generated on a sophisticated and realistic multi-layer testbed that includes the following layers: IoT and IIoT perception, edge computing, Software-Defined Networking (SDN), Fog computing, Blockchain network, Network Function Virtualization (NFV), and cloud computing. The IoT/IIoT layer is powered by more than ten types of industrial devices. The dataset provides 61 highly important features out of a total of 1176 features, aggregated from different sources (alerts, logs, network packets, system events, etc.).
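The effect of the consensus loop in Algorithm 2 can be illustrated with a simplified gossip-averaging sketch on a cyclic P2P group (illustrative only: it uses a plain neighbor-averaging rule rather than the paper's exact aggregation update and consensus protocol, and the "models" are one-element vectors):

```python
import numpy as np

def consensus_round(weights, neighbors):
    """Each client replaces its parameters with the mean of its own and
    its neighbors' parameters; on a connected graph, repeated rounds
    drive all clients toward identical parameters."""
    return [np.mean([weights[k]] + [weights[n] for n in neighbors[k]], axis=0)
            for k in range(len(weights))]

K = 4
ring = {k: [(k - 1) % K, (k + 1) % K] for k in range(K)}  # cyclic P2P group
weights = [np.array([float(k)]) for k in range(K)]  # diverging local models
for _ in range(100):
    weights = consensus_round(weights, ring)
```

Starting from different local parameters, all clients converge to the same value (here the global mean), mirroring the requirement that all clients reach identical model parameters before the next local update cycle.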

Edge-IIoTset incorporates various types of network protocols, including traffic from IP, ARP, ICMP, HTTP, TCP, UDP, DNS, MQTT, and Modbus/TCP. The normal (or benign) traffic represents 72.8% of the total samples used, with about 1,615,643 samples. Another strength of the Edge-IIoTset is the considerable number of attack types included, which represents a total of 14 different IoT/IIoT-targeted attacks, categorized as follows:

• DoS/DDoS: attacks in this category include four attack types with the following statistics: TCP SYN flood DDoS with 50,062 samples (2.3%), UDP flood DDoS with 121,568 samples (5.5%), HTTP flood DDoS with 49,911 samples (2.2%), and ICMP flood DDoS with 116,436 samples (5.2%).
• Information Gathering: attacks in this category include three attack types with the following statistics: network port scanning with 22,564 samples (1.0%), operating system fingerprinting with 1001 samples (0.04%), and vulnerability scanning with 50,110 samples (2.3%).
• Man-in-the-middle: attacks in this category include ARP and DNS spoofing with 1214 samples (0.1%).
• Injection: attacks in this category include three attack types with the following statistics: SQL injection with 51,203 samples (2.3%), XSS with 15,915 samples (0.7%), and uploading attacks with 37,634 samples (1.7%).
• Malware: attacks in this category include three attack types with the following statistics: ransomware with 10,925 samples (0.5%), backdoor with 24,862 samples (1.1%), and password cracking attacks with 50,153 samples (2.3%).

4.1.1. Pre-processing
The dataset contains more than 20 million records, with both benign (more than 11 million) and malicious (more than 9 million) records. The dataset files published by the authors⁵,⁶ contain selected CSV files for DNN training. We used the CSV file named DNN-EdgeIIoT-dataset.csv and applied the following pre-processing steps: 1) Data Cleaning: removing duplicated rows and missing values, and dropping overfitting-causing features such as ip.src_host and ip.dst_host; 2) Data Encoding: dummy encoding categorical data, and normalizing numeric data using the Z-score normalization defined by (x − μ)/σ, where the feature value is x, the mean is μ, and the standard deviation is σ; 3) Oversampling minority classes with SMOTE; and 4) Dataset Splitting. The statistics of the normal/attack records used are visualized in Fig. 5. In Fig. 6 we provide a t-Distributed Stochastic Neighbor Embedding (t-SNE) plot of both training and test sets.

4.2. Experimental settings

In order to evaluate the performance of the proposed 2DF-IDS system, we conducted two different sets of experiments on the Google Colaboratory platform and implemented the system using the Python 3 programming language. 2DF-IDS utilizes well-known libraries, including NumPy for multi-dimensional array manipulation, Pandas for data structures, Scikit-learn with PyTorch as the main machine learning framework, SMOTE for minority class oversampling, and Opacus for enhanced and fast DP. In Table 2 we provide the different setting values used in the conducted experiments. Figure 7 plots the accuracy obtained when training with different learning rates (η), precisely 0.1, 0.01, and 0.0001. As shown, η = 0.01 obtained the best accuracy with 94.84%. We employ η = 0.01 for all remaining experiments.

Table 2
Settings used in the experiments.

Subject    | Parameters              | Values
Classifier | Hidden nodes            | 215
Classifier | Hidden layers           | 3
Classifier | Nodes per layer         | 128, 64, 32
Classifier | Learning rate           | [0.1, 0.01, 0.0001]
Classifier | Regularization          | L2
Classifier | Loss function           | CrossEntropyLoss
Classifier | Activation function     | ReLU
Classifier | Batch size              | 1000
Classifier | Classification function | SoftMax
FL         | Client sets             | [20, 40, 80]
FL         | Data distribution       | Non-IID
FL         | Local epochs            | 1
FL         | Global epochs           | 30
FL         | Batch size              | 100
DP         | ε                       | [0.2, 0.5, 1.0]
DP         | δ                       | 2e−7
DP         | C                       | 1.2

4.2.1. Experiments
The conducted experiments are described below:

• First Set: In these experiments, we evaluate the performance of our proposed system in comparison with the centralized data training version (training on the whole dataset in a centralized approach) and the vanilla FL version (server-based aggregation), considering a multi-class classification problem. We did not use DP for these experiments, in order to evaluate the decentralized algorithm in comparison with the centralized data-based system and the server-based FL system. For each experiment (except the centralized data training version), we repeat the experiments against three different sets of clients, specifically 20, 40, and 80 clients. The data distribution technique used for sampling private local data for each client is Non-Independent and Identically Distributed (Non-IID), which is closer to real-life settings.
• Second Set: In these experiments, we repeat the previous set of experiments, but this time we introduce DP in every experiment using three different fixed ε values, namely [0.2, 0.5, 1.0]. We also use the same sets of clients as in the previous experiments.

4.2.2. Performance metrics
When examining the performance of IDSs, typically the following metrics are involved:

• True Positive (TP): correctly classified attack samples.
• False Negative (FN): wrongly classified attack samples.
• True Negative (TN): correctly classified benign samples.
• False Positive (FP): wrongly classified benign samples.
• Accuracy, given by: (TP + TN) / (TP + TN + FP + FN)
• Precision, given by: TP / (TP + FP)
• Recall, given by: TP / (TP + FN)
• F1-Score, given by: 2 · (Precision · Recall) / (Precision + Recall)
• Macro-averaged score (Macro-Avg): used when all classes are expected to be handled equally, to measure the overall performance of the classifier across the class labels. In this case, the aim is to treat all classes equally, regardless of their support values.
• Weighted-averaged score (Weighted-Avg): used when there is a class imbalance. It is calculated by weighting the score of each class label by the number of true instances in the averaging. In other words, the contribution of each class to the average is weighted by its size.

⁵ https://ieee-dataport.org/documents/edge-iiotset-new-comprehensive-realistic-cyber-security-dataset-iot-and-iiot-applications
⁶ https://www.kaggle.com/mohamedamineferrag/edgeiiotset-cyber-security-dataset-of-iot-iiot/version/2
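The difference between the macro- and weighted-averaged scores can be made concrete with a small pure-Python example (illustrative toy labels of our own; the experiments themselves use scikit-learn's implementations):

```python
from collections import Counter

def per_class_f1(y_true, y_pred, cls):
    """F1 for one class, computed from its TP/FP/FN counts."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

def macro_weighted_f1(y_true, y_pred):
    """Macro-Avg: unweighted mean of per-class F1 scores.
    Weighted-Avg: mean weighted by each class's support."""
    classes = sorted(set(y_true))
    support = Counter(y_true)
    f1s = {c: per_class_f1(y_true, y_pred, c) for c in classes}
    macro = sum(f1s.values()) / len(classes)
    weighted = sum(f1s[c] * support[c] for c in classes) / len(y_true)
    return macro, weighted

# Toy imbalanced labels: class 0 plays the role of the majority Normal class
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 0, 0, 0, 0, 1, 0, 2]
macro, weighted = macro_weighted_f1(y_true, y_pred)
```

On this imbalanced toy example the weighted F1 exceeds the macro F1, because the majority class dominates the weighted average while macro averaging treats every class equally.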

Fig. 5. Classes distribution visualization: (a) Train set, and (b) Test set.

4.3. Numerical results and comparisons

In this section, we present the numerical results of the experiments mentioned above. We classify the results from three perspectives, namely model training efficiency, per-class performance, and comparative analysis of the results.

4.3.1. Models training performances
The learning performance related to the first set of experiments is shown in Fig. 8. As we can see, both the validation accuracy of the 2DF-IDS model and its loss history get better with each learning epoch, as in the centralized and FL learning, which means that all client sets are benefiting from the knowledge of their peers and improving the overall shared model. The learning performance with the introduction of DP in the second set of experiments is presented in Fig. 9, and as we can see, there is a relative degradation in performance by all the training approaches (centralized learning, FL, and 2DF-IDS) compared to the previous case, which is a logical and well-known effect of the introduced noise, which in turn brings a trade-off between model

Fig. 6. t-SNE applied to (a) training set and (b) test set. (0: Normal, 1: Backdoor, 2: Vulnerability_scanner, 3: DDoS_ICMP, 4: Password, 5: Port_Scanning, 6: DDoS_UDP, 7:
Uploading, 8: DDoS_HTTP, 9: SQL_injection, 10: Ransomware, 11: DDoS_TCP, 12: XSS, 13: MITM, 14: Fingerprinting).

performance and privacy. Nevertheless, the 2DF-IDS model managed to provide better performance than the FL model in some specific settings, as reported in Table 3. For instance, in the settings of K = 40, ε = 1.0, the FL model's validation accuracy after 30 training epochs reached 86.79%, while in the same settings the 2DF-IDS model achieved 90.82%; another instance is in the settings of K = 20, ε = 0.5, where the FL model reached 90.99%, while 2DF-IDS achieved 91.69%. Successfully classifying the Benign class in an IDS is critical, since false positives can lead to unnecessary confusion and unwanted notifications. For our proposed IDS models, we have a True Positive Rate (TPR) of 100% and a False Positive Rate (FPR) of 0% for the Benign class in all models. In Table 4 we present the performance of the different approaches in different settings, in terms of precision, recall, and F1 score (Macro and Weighted Avg.). The results for FL and 2DF-IDS were obtained using the client sets of K = 40. We also demonstrate the advantage of 2DF-IDS over FL in different experimental settings; for example, when ε = 0.5, the Macro F1 score for the FL model was 32%, whereas 2DF-IDS reached 39%.

4.3.2. Per-class performances
In this part, we analyze the per-class performance of the three approaches. Starting with Fig. 10, we provide the confusion matrices of four models, namely: the centralized model without the addition of DP in Fig. 10a, compared to when we add a strict privacy budget (ε = 0.2) in Fig. 10b. In addition, the per-class performance of the FL model with K = 80 (Fig. 10c) is compared to that of the 2DF-IDS (Fig. 10d) under the same parameters. We

Fig. 7. Comparisons of best model accuracies with different values of η.

Fig. 8. Learning Performance: DP-disabled.

Table 3
Global accuracy of the different models for the multi-class classification after 30 training epochs.

Model       | K  | No-DP  | ε = 1.0 | ε = 0.5 | ε = 0.2
Centralized | -  | 94.84% | 94.29%  | 94.22%  | 94.05%
FL          | 20 | 94.27% | 90.90%  | 90.99%  | 88.87%
FL          | 40 | 93.91% | 86.79%  | 87.93%  | 83.33%
FL          | 80 | 93.96% | 80.94%  | 80.80%  | 79.89%
2DF-IDS     | 20 | 93.17% | 91.95%  | 91.69%  | 84.64%
2DF-IDS     | 40 | 94.37% | 90.82%  | 90.40%  | 78.00%
2DF-IDS     | 80 | 93.78% | 79.13%  | 77.95%  | 77.95%

Table 4
Comparisons between different models' performance in terms of Precision (Pr.), Recall (Re.), and F1-score (F1.).

Model       | DP      | Macro-Avg Pr. | Re. | F1. | Weighted-Avg Pr. | Re. | F1.
Centralized | No-DP   | 79% | 84% | 79% | 96% | 95% | 95%
Centralized | ε = 1.0 | 83% | 76% | 76% | 95% | 94% | 94%
Centralized | ε = 0.5 | 82% | 76% | 77% | 95% | 94% | 94%
Centralized | ε = 0.2 | 68% | 65% | 65% | 94% | 94% | 94%
FL          | No-DP   | 79% | 74% | 73% | 95% | 94% | 94%
FL          | ε = 1.0 | 32% | 35% | 31% | 85% | 87% | 85%
FL          | ε = 0.5 | 38% | 38% | 32% | 86% | 88% | 85%
FL          | ε = 0.2 | 19% | 27% | 21% | 79% | 83% | 80%
2DF-IDS     | No-DP   | 82% | 77% | 77% | 95% | 94% | 94%
2DF-IDS     | ε = 1.0 | 41% | 48% | 43% | 88% | 91% | 89%
2DF-IDS     | ε = 0.5 | 39% | 47% | 39% | 88% | 90% | 88%
2DF-IDS     | ε = 0.2 | 10% | 13% | 12% | 75% | 78% | 76%

Fig. 9. Learning Performance: (a) ε = 1.0, (b) ε = 0.5, and (c) ε = 0.2.

notice from the first comparison that the introduced noise has affected the per-class performance of the centralized model, especially in the minority classes.

Moreover, we note that the performances of FL and 2DF-IDS under the same experimental conditions related to client sets (K = 80) are closer to each other than in the previous comparison. However, the comparison demonstrates significant differences in the detection of password attacks (84% for 2DF-IDS and 65% for FL), uploading attacks (48% for 2DF-IDS and 28% for FL), DDoS-HTTP attacks (69% for 2DF-IDS and 82% for FL), and SQL injection attacks (14% for 2DF-IDS and 43% for FL). When noise is introduced to ensure strong privacy preservation, 2DF-IDS outperforms FL in the majority of cases, as presented in Table 4.

Table 5 provides a per-class performance evaluation and comparisons. Results for FL and 2DF-IDS were provided using K = 40. The most noticeable observation is the impact of a strict privacy budget on the performance of all models; however, this constraint has less impact on the majority classes than on the minority classes. For example, the benign class, which represents the majority class with more than one million records, achieved 100% precision, recall, and F1 score in all DP parameters. Furthermore, these

Fig. 10. Confusion Matrices: (a) Centralized, (b) Centralized [ = 0.2], (c) FL [K = 80], and (d) 2DF [K = 80].

results demonstrate the effectiveness of 2DF-IDS over FL in the majority of the configuration parameters, as previously discussed.

4.3.3. 2DF-IDS and FL performance comparisons
The main observation from the previous results is that there is a clear trade-off between performance, privacy budget, and the number of clients involved in the training process in all approaches. In other words, the more clients involved (and, precisely, the fewer per-class samples in our case) with more stringent DP requirements, the lower the performance. In addition, we have demonstrated the advantages of 2DF-IDS over FL in different contexts. Figure 11 presents another angle of comparison, considering the loss of individual clients in a training session with K = 80 and ε = 0.2. The key observation is that in FL all clients have closer loss values at the beginning of the training, while they tend to get more distant towards the end; the exact opposite happens in 2DF-IDS: when the training starts, all clients have a default model but different local data, and each client tries the default model on its data, resulting in different losses per client (i.e., per data). When decentralized aggregation starts, the models of all clients tend to get much closer in each epoch, as reflected in the results with closer losses at the end. In other words, the resulting 2DF-IDS model tends to be more powerful, since all clients have closer losses using different data, implying that the model has successfully achieved a robust representation of all clients' data without compromising its security or privacy.

• Communication overhead: Unlike the client-server (or star) network topology used by the centralized FL approach, which involves a centralized aggregation server used to aggregate models from all participating clients, 2DF-IDS adopts a peer-to-peer (or fully connected) network topology that enables each participant node to communicate with all other participants. The client-server architecture is useful given the moderate complexity of its implementation. The network has to handle only 2·K·T network exchanges, where K is the total number of participating clients, T is the number of FL global iterations, and 2 accounts for the client model upload and the server model download operations. Yet, in many situations, it can be undesirable for a variety of reasons. For instance, with a large number of clients, the aggregation server node can become a communication bottleneck: when it has limited communication capacity and receives too many requests from a lot of clients, this can lead to network congestion and thus a degradation in the service quality. In addition, the server node can be considered a reliability bottleneck, so that if the server node fails, the entire learning session crashes. Eliminating these problems entails the elimination of the server node, going to the Peer-to-Peer (P2P) topology, where each node acts like the aggregation server of the server-client architecture. However, the communication costs associated with the necessity of having all nodes communicating during each iteration are considerably higher than in the client-server architecture. Even if we ignore the exchange of private keys, which takes place only once per training session, the network in this case has to handle T·[K(K − 1)] network exchanges. For the centralized FL approach, the communication cost C_FL related to training a model can be formulated as:

C_FL = Σ_{t=0}^{T} ( Σ_{k=1}^{K} S_{w_k^t} + K · S_{w_cs^t} )    (7)
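The two message-count formulas can be checked with a short sketch (illustrative; it counts model exchanges per training session and ignores key-exchange messages and message sizes):

```python
def fl_messages(K, T):
    """Centralized FL: per global round, K model uploads plus K downloads."""
    return 2 * K * T

def p2p_messages(K, T):
    """Fully connected P2P FL: per round, each of the K clients sends its
    model to its K - 1 peers."""
    return T * K * (K - 1)
```

For K = 20 clients and T = 30 rounds, the centralized topology handles 2·20·30 = 1200 exchanges, while the fully connected P2P topology handles 30·20·19 = 11,400, illustrating the communication trade-off discussed above.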

Table 5
Per-class performance: Precision, Recall, and F1-score.

Class                   DP settings   Precision               Recall                  F1-score
                                      CL    FL     2DF-IDS    CL    FL     2DF-IDS    CL    FL     2DF-IDS

Normal                  No-DP         100%  100%   100%       100%  100%   100%       100%  100%   100%
                        ε = 1.0       100%  100%   100%       100%  100%   100%       100%  100%   100%
                        ε = 0.5       100%  100%   100%       100%  100%   100%       100%  100%   100%
                        ε = 0.2       100%  100%   100%       100%  100%   100%       100%  100%   100%
Backdoor                No-DP         100%  100%   100%       95%   95%    95%        98%   97%    97%
                        ε = 1.0       99%   00%    71%        97%   00%    97%        98%   00%    82%
                        ε = 0.5       99%   100%   61%        96%   0.02%  100%       98%   0.03%  76%
                        ε = 0.2       100%  00%    00%        94%   00%    00%        97%   00%    00%
Vulnerability Scanning  No-DP         97%   95%    95%        85%   84%    84%        91%   89%    89%
                        ε = 1.0       95%   30%    52%        84%   43%    100%       89%   35%    68%
                        ε = 0.5       95%   43%    62%        84%   99%    99%        89%   60%    76%
                        ε = 0.2       93%   20%    00%        85%   100%   00%        89%   33%    00%
Password                No-DP         50%   40%    43%        58%   64%    84%        53%   49%    57%
                        ε = 1.0       43%   00%    22%        77%   00%    0.09%      55%   00%    0.13%
                        ε = 0.5       45%   37%    00%        49%   76%    00%        47%   50%    00%
                        ε = 0.2       45%   00%    00%        33%   00%    00%        38%   00%    00%
DDoS ICMP               No-DP         100%  100%   100%       100%  100%   100%       100%  100%   100%
                        ε = 1.0       100%  98%    97%        100%  55%    94%        100%  70%    95%
                        ε = 0.5       100%  91%    98%        100%  51%    92%        100%  66%    95%
                        ε = 0.2       99%   00%    0.01%      99%   00%    0.01%      99%   00%    0.01%
Port Scanning           No-DP         54%   77%    76%        87%   23%    23%        66%   35%    35%
                        ε = 1.0       82%   00%    00%        23%   00%    00%        35%   00%    00%
                        ε = 0.5       77%   00%    00%        22%   00%    00%        35%   00%    00%
                        ε = 0.2       73%   00%    00%        23%   00%    00%        35%   00%    00%
DDoS UDP                No-DP         100%  100%   100%       100%  99%    100%       100%  99%    100%
                        ε = 1.0       100%  79%    94%        100%  99%    99%        100%  88%    97%
                        ε = 0.5       99%   78%    92%        100%  99%    99%        100%  87%    96%
                        ε = 0.2       99%   63%    56%        100%  100%   100%       100%  77%    72%
Uploading               No-DP         68%   83%    62%        54%   30%    44%        60%   44%    52%
                        ε = 1.0       65%   00%    00%        41%   00%    00%        50%   00%    00%
                        ε = 0.5       62%   00%    00%        42%   00%    00%        50%   00%    00%
                        ε = 0.2       66%   00%    00%        39%   00%    00%        49%   00%    00%
DDoS HTTP               No-DP         95%   72%    92%        83%   91%    82%        89%   80%    87%
                        ε = 1.0       89%   92%    70%        83%   43%    43%        86%   58%    53%
                        ε = 0.5       88%   80%    78%        83%   43%    15%        85%   56%    25%
                        ε = 0.2       85%   43%    00%        83%   31%    00%        84%   36%    00%
SQL injection           No-DP         51%   46%    56%        52%   44%    19%        51%   45%    29%
                        ε = 1.0       50%   24%    37%        27%   90%    73%        35%   38%    49%
                        ε = 0.5       45%   00%    36%        53%   00%    98%        49%   00%    53%
                        ε = 0.2       43%   00%    00%        69%   00%    00%        53%   00%    00%
Ransomware              No-DP         100%  100%   100%       98%   89%    90%        99%   94%    95%
                        ε = 1.0       100%  00%    00%        88%   00%    00%        93%   00%    00%
                        ε = 0.5       100%  00%    00%        88%   00%    00%        93%   00%    00%
                        ε = 0.2       96%   00%    00%        88%   00%    00%        91%   00%    00%
DDoS TCP                No-DP         98%   75%    75%        70%   98%    98%        82%   85%    85%
                        ε = 1.0       75%   62%    68%        96%   93%    100%       84%   74%    81%
                        ε = 0.5       75%   45%    56%        96%   99%    96%        84%   61%    71%
                        ε = 0.2       74%   57%    00%        95%   77%    00%        84%   66%    00%
XSS                     No-DP         50%   65%    47%        85%   21%    78%        63%   32%    59%
                        ε = 1.0       48%   00%    00%        78%   00%    00%        59%   00%    00%
                        ε = 0.5       48%   00%    00%        77%   00%    00%        59%   00%    00%
                        ε = 0.2       48%   00%    00%        67%   00%    00%        56%   00%    00%
MITM                    No-DP         100%  100%   100%       100%  100%   100%       100%  100%   100%
                        ε = 1.0       100%  00%    00%        100%  00%    00%        100%  00%    00%
                        ε = 0.5       100%  00%    00%        100%  00%    00%        100%  00%    00%
                        ε = 0.2       00%   00%    00%        00%   00%    00%        00%   00%    00%
Fingerprinting          No-DP         21%   27%    88%        97%   67%    60%        35%   39%    71%
                        ε = 1.0       100%  00%    00%        43%   00%    00%        60%   00%    00%
                        ε = 0.5       93%   00%    00%        54%   00%    00%        68%   00%    00%
                        ε = 0.2       00%   00%    00%        00%   00%    00%        00%   00%    00%

Centralized Learning (CL).
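For reference, per-class precision, recall, and F1-score of the kind reported in Table 5 are derived from the per-class true positives, false positives, and false negatives. The following minimal Python sketch (using toy labels, not the paper’s data) illustrates the computation:

```python
from collections import Counter

def per_class_prf(y_true, y_pred):
    """Per-class (precision, recall, F1) from two label lists."""
    classes = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # p predicted, but true class was t
            fn[t] += 1          # t missed
    out = {}
    for c in classes:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        out[c] = (prec, rec, f1)
    return out

# Toy two-class example ("Normal" vs. "DDoS" traffic labels)
truth = ["Normal", "Normal", "DDoS", "DDoS", "DDoS"]
pred  = ["Normal", "DDoS",   "DDoS", "DDoS", "Normal"]
metrics = per_class_prf(truth, pred)
```

In multi-class IDS evaluation, these quantities are computed one-vs-rest for each attack class, which is why a class with no correct predictions (e.g., several classes under ε = 0.2) collapses to 0% across all three metrics.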

In the above equation, $S_{w_k^t}$ is the size of the model $w$ of a given client $k$ at the global epoch $t$, and $S_{w_{cs}^t}$ is the size of the aggregated model $w$ from the centralized server $cs$ at the global epoch $t$. For the 2DF-IDS, the communication cost $C_{2DF\text{-}IDS}$ related to training a model can be formulated as:

$$C_{2DF\text{-}IDS} = \sum_{t=0}^{T} \sum_{k=1}^{K} S_{w_k^t} \cdot (K - 1) \quad (8)$$

In the above equation, $S_{w_k^t}$ is the size of the model $w$ of a given client $k$ at the global epoch $t$, which must be communicated to all $K - 1$ other peer clients. Experimental evaluations of the communication costs using the above formulations yielded the following results: for the centralized FL with $K = 20$ and $T = 30$, the total overhead is 105 MB (the total size of all models exchanged during a single training session), while for the 2DF-IDS the total overhead is 1 GB.

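As an illustration, Eqs. (7) and (8) can be evaluated numerically. The per-update model size used below (roughly 87.5 KB) is a hypothetical value chosen only so that the centralized total lands near the 105 MB reported above; it is not a figure taken from the paper:

```python
K = 20        # number of clients
T = 30        # number of global training epochs
S = 87.5e3    # assumed size of one model update, in bytes (illustrative)

# Eq. (7): centralized FL -- per epoch, K clients upload their models
# and the server broadcasts the aggregated model back to all K clients.
c_fl = sum(K * S + K * S for _ in range(T))

# Eq. (8): 2DF-IDS -- per epoch, each of the K peers sends its model
# to the K - 1 other peers.
c_2df = sum(K * (K - 1) * S for _ in range(T))

# Traffic that a single 2DF-IDS peer must send over the whole session.
per_client_2df = (K - 1) * S * T

print(f"centralized FL total : {c_fl / 1e6:.1f} MB")       # ~105 MB
print(f"2DF-IDS total        : {c_2df / 1e6:.1f} MB")      # ~1 GB
print(f"2DF-IDS per client   : {per_client_2df / 1e6:.1f} MB")  # ~50 MB
```

The sketch reproduces the asymmetry discussed in the text: the peer-to-peer scheme moves roughly an order of magnitude more data in aggregate, but spreads it so that each peer handles only a small share instead of one server handling everything.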

Table 6
Comparisons between 2DF-IDS and state-of-the-art IDSs.

System                           Main Focus         Used Dataset              Classifier   FL Settings                                        Privacy Settings         C.S.
                                                                                           K                      D.D.     Decentralized      DP   ε                   E.C.  P.Q.

Nguyen et al. (2019)             IoT-based Systems  Collected                 RNN-GRU      [5, 9, 15]             Non-IID  ✗                  ✗    ✗                   ✗     ✗
Schneble and Thamilarasu (2019)  CPS (Medical)      MIMIC                     MLP          [2, 4, 8, 16, 32, 64]  N/A      ✗                  ✗    ✗                   ✗     ✗
Song et al. (2020)               Industrial IoT     MNIST, CIFAR10            CNN          [10, 20,...,50]        Non-IID  ✗                  ✗    ✗                   ✗     ✗
Hao et al. (2019)                Industrial AI      MNIST                     CNN          [20, 30,...,100]       Non-IID  ✗                  ✓    [0.5, 2]            ✓     ✓
Zhang et al. (2020)              Industrial IoT     Collected                 SGD          4                      Non-IID  ✓                  ✗    ✗                   ✓     ✗
Khan et al. (2021)               Supply Chain 4.0   TON_IoT                   GRU          N/A                    Non-IID  ✗                  ✗    ✗                   ✗     ✗
Taheri et al. (2020)             Industrial IoT     Drebin, Genome, Contagio  CNN          [5, 6,...,15]          Non-IID  ✗                  ✗    ✗                   ✗     ✗
Li et al. (2020)                 CPS (Industrial)   GasPipeline               CNN-GRU      [3, 5, 7]              IID      ✗                  ✗    ✗                   ✓     ✗
Mothukuri et al. (2021)          IoT-based Systems  Modbus network            GRU          N/A                    Non-IID  ✗                  ✗    ✗                   ✓     ✗
Ours                             Industrial IoT     Edge-IIoTset              DNN          [20, 40, 80]           Non-IID  ✓                  ✓    [0.2, 0.5, 1.0]     ✓     ✓

Data Distribution (D.D.); Communication Security (C.S.); Encrypted Communications (E.C.); Post-Quantum (P.Q.).
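The DP and ε columns above correspond to gradient perturbation in the style of DP-SGD (Abadi et al., 2016): each gradient is clipped to a bounded L2 norm and Gaussian noise calibrated to an (ε, δ) budget is added before the update is shared. A minimal sketch of that clip-and-noise step, with all parameter values illustrative rather than the paper’s settings:

```python
import math
import random

def sanitize(grad, clip=1.0, eps=0.5, delta=1e-5):
    """Clip a gradient vector to L2 norm `clip`, then add Gaussian noise
    calibrated for (eps, delta)-DP via the Gaussian mechanism, with
    sensitivity equal to the clipping norm. Parameter values here are
    illustrative only."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip / norm) if norm > 0 else 1.0
    clipped = [g * scale for g in grad]
    # Standard Gaussian-mechanism calibration: sigma grows as eps shrinks.
    sigma = clip * math.sqrt(2 * math.log(1.25 / delta)) / eps
    return [g + random.gauss(0.0, sigma) for g in clipped]

random.seed(0)
noisy = sanitize([3.0, 4.0], clip=1.0, eps=0.5)  # ||[3, 4]|| = 5, clipped to 1
```

Because the noise scale σ is inversely proportional to ε, tightening the budget from ε = 1.0 to ε = 0.2 multiplies the injected noise fivefold, which is consistent with the sharp per-class degradation visible in Table 5 at ε = 0.2.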

However, the aggregation server network for the centralized FL must handle the total communication cost, i.e., all 105 MB in this experiment, while for the 2DF-IDS, each client network must handle only 50 MB during the entire training session.

Fig. 11. Individual Clients Loss with [K = 80, ε = 0.2], for (a) FL and (b) 2DF-IDS.

4.3.4. 2DF-IDS and state-of-the-art FL-based IDS works comparisons

Table 6 provides a comparative summary between 2DF-IDS and several other FL-based IDS works. Regarding the datasets used, our study employs a recent peer-reviewed dataset (created in 2022) with up-to-date protocol versions and recent complex cyber attacks, while the other studies either collected their own data (Nguyen et al., 2019; Zhang et al., 2020) or used relatively old datasets. For the classifier employed, we used a DNN model with a lightweight and cost-effective design (as presented in Table 2). This provides the ability to communicate the various parameters of the model while saving communication costs and maintaining efficient performance. For example, using a CNN (as in Hao et al., 2019; Li et al., 2020; Song et al., 2020; Taheri et al., 2020) increases the number of trainable parameters in the model (and thus the size of the model), given the introduction of convolution and pooling layers. For the FL parameters, our system used three client sets (20, 40, and 80), while some studies used only one set of clients (4 clients in Zhang et al., 2020), others used small sets (only 3, 5, and 7 in Li et al., 2020), or used various sets whose maximum set size is relatively small (15 in Taheri et al., 2020, 50 in Song et al., 2020, and 64 in Schneble and Thamilarasu, 2019). For the data distribution technique, we used Non-IID, which is closer to real-life settings, whereas some studies, including Li et al. (2020), employed an IID technique.

For the learning approach, while the majority of the studies have adopted centralized FL, we consider a decentralized learning approach, given the aforementioned advantages of this technique (including bypassing the SPOF). Zhang et al. (2020) employed a decentralized learning approach using the blockchain. Although this method can also avoid SPOF threats, it has its own disadvantages, including the necessity of invoking the consensus mechanism for each update, the time needed for read and write operations, and additional network overhead. For many real-time security-critical applications, the computational and network overheads involved in using external assets, such as blockchain, are not practical. Protecting the clients’ private data has not been considered in the majority of works. While Hao et al. (2019) considered this aspect, the privacy budgets used were (ε = 0.5 or ε = 2), unlike our work, which uses stricter and more varied privacy budgets with ε = 0.2, ε = 0.5, and ε = 1.0, given that (ε, δ)-DP ensures that the privacy loss’s absolute value will be bounded by ε with a probability of 1 − δ (Dwork et al., 2014). In addition, with ε = 0.5, their model achieved a maximum accuracy of 90.8%, while with 2DF-IDS we achieved a maximum accuracy of 91.69%, along with the advantages of the decentralized nature of 2DF-IDS, compared to the centralized FL approach on which the other system is founded. Different works employed communication security for protecting the gradient exchange (Li et al., 2020; Mothukuri et al., 2021; Zhang et al., 2020); however, only one approach


(Hao et al., 2019) considered the resistance to quantum attacks. Nevertheless, unlike their work, we do not rely on an aggregation server and a key generation center, which can also open vulnerabilities inside the overall system. The 2DF-IDS framework is designed not only to provide different layers of security to protect FL-based decentralized IDSs, but also to deliver enhanced performance in detecting cyber attacks.

4.4. Discussion

As reported in the previous section, numerical results obtained from a series of experiments demonstrated the robustness of 2DF-IDS. Although the proposed system was found to be effective, we believe that an improvement in the security of the proposed system will be in terms of federated poisoning attacks. In this type of attack, malicious parties train malicious models on poisoned data (by injecting false or incorrect information into the training data, which is called data poisoning; Szegedy et al., 2013) and engage in the decentralized learning session. As a result, the final overall model can produce undesirable and potentially damaging results over time as it learns from malicious updates. This is particularly dangerous when building security models, as the attacker can cause the resulting system to learn sample classes of attacks as benign. One solution to mitigate this type of attack could be the introduction of a sophisticated authentication mechanism that verifies the identity of each participating client before granting permission to join the training session, but this is not fully guaranteed because the authenticated participating clients can also be compromised. The defensive mechanism would have to be incorporated into each client’s aggregation algorithm, which is worth investigating in future work.

Introducing a significant amount of noise to shield the data reduces accuracy, and vice versa (Dwork et al., 2014). This trade-off between privacy and accuracy is recognized as a major limitation of privacy-enhanced FL-based solutions, as we have seen in the performance evaluations. The amount of privacy one can trade for high accuracy, or vice versa, is viewed as a design choice with respect to the implementation situation at hand. This limitation is further amplified when fewer samples are given to a large number of clients. This challenge can be managed by adding more samples per client (if data is available), or by data augmentation techniques, where synthetic samples are generated for each client, including adversarial ML approaches such as FAug (Jeong et al., 2018) and FedHome (Wu et al., 2020). However, to some extent, the noise introduced and the resulting performance will always be a matter of trade-off and implementation choices.

5. Conclusion

The progressive deployment of complex technologies in the industrial sector is the engine that drives Industry 4.0. Yet, it is also the leading source of security vulnerabilities and the primary target of associated cyberattacks in this sector. In this paper, we presented 2DF-IDS, designed to secure industrial smart IoT systems against a wide range of cyber threats. Besides, the proposed approach enhances and secures the FL training process by protecting the exchanged data from malicious parties outside the training groups, and also by securing the process itself from the participating entities. In addition, the proposed system is fully decentralized, which eliminates the risk of compromising the aggregation server and interrupting the entire training process that exists in the regular FL approach. The experimental performance evaluation of 2DF-IDS demonstrated its robust operational performance in recognizing different types of cyber threats to industrial IoT systems, as well as its benefits over existing state-of-the-art approaches, with performance close to the baseline study (centralized learning) in terms of accuracy (up to 0.47% deviation), and a 12%, 13%, and 9% enhancement in F1-score, recall, and precision, respectively, under strict privacy settings, over other leading FL-based IDS solutions.

The study successfully demonstrated that secure and privacy-enforced decentralized learning is better than centralized and FL approaches from a variety of angles, including secure gradient exchange and the absence of supporting centralized entities to worry about in the event of attacks. However, it has some limitations in terms of the inherent trade-off between privacy and performance, a known problem in privacy-enhanced FL systems. This is an issue worth investigating in future work.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

CRediT authorship contribution statement

Othmane Friha: Conceptualization, Methodology, Software, Writing – original draft. Mohamed Amine Ferrag: Conceptualization, Methodology, Software, Writing – original draft, Supervision, Writing – review & editing. Mohamed Benbouzid: Supervision, Writing – review & editing. Tarek Berghout: Supervision, Writing – review & editing. Burak Kantarci: Supervision, Writing – review & editing. Kim-Kwang Raymond Choo: Supervision, Conceptualization, Writing – review & editing.

Data availability

The data used in this research is publicly available.

References

Abadi, M., Chu, A., Goodfellow, I., McMahan, H.B., Mironov, I., Talwar, K., Zhang, L., 2016. Deep learning with differential privacy. In: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp. 308–318.
Aono, Y., Hayashi, T., Wang, L., Moriai, S., et al., 2017. Privacy-preserving deep learning via additively homomorphic encryption. IEEE Trans. Inf. Forensics Secur. 13 (5), 1333–1345.
Boyes, H., Hallaq, B., Cunningham, J., Watson, T., 2018. The industrial internet of things (IIoT): an analysis framework. Comput. Ind. 101, 1–12.
Carlini, N., Liu, C., Erlingsson, Ú., Kos, J., Song, D., 2019. The secret sharer: evaluating and testing unintended memorization in neural networks. In: 28th USENIX Security Symposium (USENIX Security 19), pp. 267–284.
Ding, J., Fluhrer, S., Rv, S., 2018. Complete attack on RLWE key exchange with reused keys, without signal leakage. In: Australasian Conference on Information Security and Privacy. Springer, pp. 467–486.
Ding, J., Xie, X., Lin, X., 2012. A simple provably secure key exchange scheme based on the learning with errors problem. Cryptol. ePrint Arch.
Dwork, C., McSherry, F., Nissim, K., Smith, A., 2006. Calibrating noise to sensitivity in private data analysis. In: Theory of Cryptography Conference. Springer, pp. 265–284.
Dwork, C., Roth, A., et al., 2014. The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9 (3–4), 211–407.
Ferrag, M.A., Friha, O., Hamouda, D., Maglaras, L., Janicke, H., 2022. Edge-IIoTset: a new comprehensive realistic cyber security dataset of IoT and IIoT applications for centralized and federated learning. IEEE Access.
Ferrag, M.A., Friha, O., Maglaras, L., Janicke, H., Shu, L., 2021. Federated deep learning for cyber security in the internet of things: concepts, applications, and experimental analysis. IEEE Access 9, 138509–138542.
Ferrag, M.A., Shu, L., Friha, O., Yang, X., 2021. Cyber security intrusion detection for agriculture 4.0: machine learning-based solutions, datasets, and future directions. IEEE/CAA J. Autom. Sin. 9 (3), 407–436.
Friha, O., Ferrag, M.A., Shu, L., Maglaras, L., Choo, K.-K.R., Nafaa, M., 2022. FELIDS: federated learning-based intrusion detection system for agricultural internet of things. J. Parallel Distrib. Comput.
Hao, M., Li, H., Luo, X., Xu, G., Yang, H., Liu, S., 2019. Efficient and privacy-enhanced federated learning for industrial artificial intelligence. IEEE Trans. Ind. Inf. 16 (10), 6532–6542.
Jeong, E., Oh, S., Kim, H., Park, J., Bennis, M., Kim, S.-L., 2018. Communication-efficient on-device machine learning: federated distillation and augmentation under non-IID private data. arXiv preprint arXiv:1811.11479.
Kairouz, P., McMahan, H.B., Avent, B., Bellet, A., Bennis, M., Bhagoji, A.N., Bonawitz, K., Charles, Z., Cormode, G., Cummings, R., et al., 2019. Advances and open problems in federated learning. arXiv preprint arXiv:1912.04977.


Khan, I.A., Moustafa, N., Pi, D., Hussain, Y., Khan, N.A., 2021. DFF-SC4N: a deep federated defence framework for protecting supply chain 4.0 networks. IEEE Trans. Ind. Inf.
Lalitha, A., Kilinc, O.C., Javidi, T., Koushanfar, F., 2019. Peer-to-peer federated learning on graphs. arXiv preprint arXiv:1901.11173.
LeCun, Y., Bengio, Y., Hinton, G., 2015. Deep learning. Nature 521 (7553), 436–444.
Li, B., Wu, Y., Song, J., Lu, R., Li, T., Zhao, L., 2020. DeepFed: federated deep learning for intrusion detection in industrial cyber–physical systems. IEEE Trans. Ind. Inf. 17 (8), 5615–5624.
Lyubashevsky, V., Peikert, C., Regev, O., 2010. On ideal lattices and learning with errors over rings. In: Annual International Conference on the Theory and Applications of Cryptographic Techniques. Springer, pp. 1–23.
McMahan, B., Moore, E., Ramage, D., Hampson, S., y Arcas, B.A., 2017. Communication-efficient learning of deep networks from decentralized data. In: Artificial Intelligence and Statistics. PMLR, pp. 1273–1282.
Mothukuri, V., Khare, P., Parizi, R.M., Pouriyeh, S., Dehghantanha, A., Srivastava, G., 2021. Federated learning-based anomaly detection for IoT security attacks. IEEE Internet Things J.
Nguyen, T.D., Marchal, S., Miettinen, M., Fereidooni, H., Asokan, N., Sadeghi, A.-R., 2019. DÏoT: a federated self-learning anomaly detection system for IoT. In: 2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS). IEEE, pp. 756–767.
Regev, O., 2009. On lattices, learning with errors, random linear codes, and cryptography. J. ACM (JACM) 56 (6), 1–40.
Schneble, W., Thamilarasu, G., 2019. Attack detection using federated learning in medical cyber-physical systems. In: Proceedings of the 28th International Conference on Computer Communications and Networks (ICCCN), Vol. 29. Valencia, Spain.
Sengupta, J., Ruj, S., Bit, S.D., 2020. A comprehensive survey on attacks, security issues and blockchain solutions for IoT and IIoT. J. Netw. Comput. Appl. 149, 102481.
Shor, P.W., 1994. Algorithms for quantum computation: discrete logarithms and factoring. In: Proceedings 35th Annual Symposium on Foundations of Computer Science. IEEE, pp. 124–134.
Shor, P.W., 1999. Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer. SIAM Rev. 41 (2), 303–332.
Song, S., Chaudhuri, K., Sarwate, A.D., 2013. Stochastic gradient descent with differentially private updates. In: 2013 IEEE Global Conference on Signal and Information Processing. IEEE, pp. 245–248.
Song, Y., Liu, T., Wei, T., Wang, X., Tao, Z., Chen, M., 2020. FDA3: federated defense against adversarial attacks for cloud-based IIoT applications. arXiv e-prints, arXiv–2006.
Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., Fergus, R., 2013. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199.
Taheri, R., Shojafar, M., Alazab, M., Tafazolli, R., 2020. Fed-IIoT: a robust federated malware detection architecture in industrial IoT. IEEE Trans. Ind. Inf. 17 (12), 8442–8452.
Wang, Z., Song, M., Zhang, Z., Song, Y., Wang, Q., Qi, H., 2019. Beyond inferring class representatives: user-level privacy leakage from federated learning. In: IEEE INFOCOM 2019 - IEEE Conference on Computer Communications. IEEE, pp. 2512–2520.
Wu, Q., Chen, X., Zhou, Z., Zhang, J., 2020. FedHome: cloud-edge based personalized federated learning for in-home health monitoring. IEEE Trans. Mob. Comput.
Yousefpour, A., Shilov, I., Sablayrolles, A., Testuggine, D., Prasad, K., Malek, M., Nguyen, J., Ghosh, S., Bharadwaj, A., Zhao, J., et al., 2021. Opacus: user-friendly differential privacy library in PyTorch. arXiv preprint arXiv:2109.12298.
Zhang, W., Lu, Q., Yu, Q., Li, Z., Liu, Y., Lo, S.K., Chen, S., Xu, X., Zhu, L., 2020. Blockchain-based federated learning for device failure detection in industrial IoT. IEEE Internet Things J. 8 (7), 5926–5937.
Zhang, Z., Zhou, M., Niu, K., Abdallah, C., 2021. The effect of training parameters and mechanisms on decentralized federated learning based on MNIST dataset. arXiv preprint arXiv:2108.03508.

Othmane Friha received the master’s degree in computer science from Badji Mokhtar - Annaba University, Algeria, in 2018. He is currently working toward the PhD degree at the University of Badji Mokhtar - Annaba, Algeria. His current research interests include network and computer security, the Internet of Things, and applied cryptography.

Mohamed Amine Ferrag received the Bachelor’s, Master’s, PhD, and Habilitation degrees in computer science from Badji Mokhtar Annaba University, Annaba, Algeria, in June 2008, June 2010, June 2014, and April 2019, respectively. From 2014 to 2022, he was an Associate Professor with the Department of Computer Science, Guelma University, Algeria. From 2019 to 2022, he was a Visiting Senior Researcher with the NAU-Lincoln Joint Research Center of Intelligent Engineering, Nanjing Agricultural University, China. Since 2022, he has been the Lead Researcher with the Artificial Intelligence & Digital Science Research Center, Technology Innovation Institute, Abu Dhabi, United Arab Emirates. His research interests include wireless network security, network coding security, applied cryptography, blockchain technology, and AI for cyber security. He has published over 100 papers in international journals and conferences in the above areas. He has been conducting several research projects with international collaborations on these topics. He was a recipient of the 2021 IEEE TEM Best Paper Award. He is featured in Stanford University’s list of the world’s Top 2% scientists for the years 2020, 2021, and 2022. He is a Senior Member of the Institute of Electrical and Electronics Engineers (IEEE) and a member of the Association for Computing Machinery (ACM).

Mohamed Benbouzid received the BSc degree in electrical engineering from the University of Batna, Batna, Algeria, in 1990, the MSc and PhD degrees in electrical and computer engineering from the National Polytechnic Institute of Grenoble, Grenoble, France, in 1991 and 1994, respectively, and the Habilitation à Diriger des Recherches degree from the University of Picardie Jules Verne, Amiens, France, in 2000. After receiving the PhD degree, he joined the Professional Institute of Amiens, University of Picardie Jules Verne, where he was an Associate Professor of electrical and computer engineering. Since September 2004, he has been with the University of Brest, Brest, France, where he is currently a Full Professor of electrical engineering. He is also a Distinguished Professor and a 1000 Talent Expert with the Shanghai Maritime University, Shanghai, China. His main research interests and experience include the analysis, design, and control of electric machines, variable-speed drives for traction, propulsion, and renewable energy applications, and fault diagnosis of electric machines. He is a Fellow of the IET, the Editor-in-Chief of the International Journal on Energy Conversion and of the Applied Sciences (MDPI) Section on Electrical, Electronics and Communications Engineering, and the Subject Editor for the IET Renewable Power Generation.

Tarek Berghout received the MSc and PhD degrees in industrial informatics and manufacturing from the University of Batna 2, Algeria, in 2015 and 2021, respectively. His research interests include condition monitoring of industrial processes using machine learning tools, while his main interests include artificial neural networks, deep learning, least squares methods, and the extreme learning machine.

Burak Kantarci is an associate professor, and the founding director of the Smart Connected Vehicles Innovation Centre and the Next Generation Communications and Computing Networks Research Lab at the University of Ottawa. He holds a PhD in computer engineering. He is an author/co-author of more than 200 publications, and has actively contributed to the field of mobile crowdsensing (MCS) for IoT systems, as well as AI-backed access control, authentication, and machine-learning-backed intrusion detection solutions in sensing environments. He served as the Chair of the IEEE Communications Systems Integration and Modeling Technical Committee; an Editor of IEEE Communications Surveys & Tutorials, the IEEE Internet of Things Journal, and Elsevier’s Vehicular Communications; and an Associate Editor for IEEE Networking Letters and the Journal of Cybersecurity and Privacy. He served as a Distinguished Speaker of the Association for Computing Machinery (ACM) in 2019–2021.

Kim-Kwang Raymond Choo received the PhD degree in Information Security in 2006 from Queensland University of Technology, Australia. He currently holds the Cloud Technology Endowed Professorship at The University of Texas at San Antonio. He is the founding co-editor-in-chief of ACM Distributed Ledger Technologies: Research & Practice, and the founding chair of the IEEE Technology and Engineering Management Society’s Technical Committee (TC) on Blockchain and Distributed Ledger Technologies. He is the recipient of the IEEE Systems, Man, and Cybernetics TC on Homeland Security Research and Innovation Award in 2022, the IEEE Hyper-Intelligence TC Award for Excellence in Hyper-Intelligence Systems (Technical Achievement Award) in 2022, the IEEE TC on Secure and Dependable Measurement Mid-Career Award in 2022, and the 2019 IEEE Technical Committee on Scalable Computing Award for Excellence in Scalable Computing (Middle Career Researcher). He has also received the IEEE Computer Society’s Bio-Inspired Computing Special TC Outstanding Paper Award for 2021, and best paper awards from the IEEE Systems Journal in 2021, IEEE Consumer Electronics Magazine for 2020, EURASIP Journal on Wireless Communications and Networking in 2019, IEEE TrustCom 2018, and ESORICS 2015.
