
Knowledge-Based Systems 195 (2020) 105648


Knowledge-Based Systems
journal homepage: www.elsevier.com/locate/knosys

A novel intrusion detection system based on an optimal hybrid kernel extreme learning machine✩

Lu Lv, Wenhai Wang, Zeyin Zhang, Xinggao Liu
State Key Laboratory of Industrial Control Technology, College of Control Science & Engineering, Zhejiang University, 310027 Hangzhou, PR China

Article history: Received 23 August 2019; Received in revised form 7 February 2020; Accepted 10 February 2020; Available online 13 February 2020

Keywords: Intrusion detection system; Extreme learning machine; Gravitational search algorithm; Differential evolution; Kernel principal component analysis

Abstract: Intrusion detection is a challenging technology in the area of cyberspace security for protecting a system from malicious attacks. A novel, accurate and effective misuse intrusion detection system that relies on specific attack signatures to distinguish between normal and malicious activities is therefore presented to detect various attacks, based on an extreme learning machine with a hybrid kernel function (HKELM). First, the derivation and proof of the proposed hybrid kernel are given. A combination of the gravitational search algorithm (GSA) and the differential evolution (DE) algorithm is employed to optimize the parameters of the HKELM, which improves its global and local optimization abilities when predicting attacks. In addition, the kernel principal component analysis (KPCA) algorithm is introduced for dimensionality reduction and feature extraction of the intrusion detection data. Then, a novel intrusion detection approach, KPCA-DEGSA-HKELM, is obtained. The proposed approach is eventually applied to the classic benchmark KDD99 dataset, the real modern UNSW-NB15 dataset and the industrial intrusion detection dataset from the Tennessee Eastman process. The numerical results validate both the high accuracy and the time-saving benefit of the proposed approach.

© 2020 Elsevier B.V. All rights reserved.

1. Introduction

With the increasing development of network technology, particularly with the popularity of the Internet, the problem of cyber security has been the focus of a growing number of people [1,2]. As a new security defense technology, the intrusion detection system (IDS) can actively protect a network system from illegal external attacks. An IDS is designed to ensure the security of systems and can promptly detect abnormal phenomena [3]. Furthermore, an IDS can improve the reliability and security of systems by detecting and responding to various malicious behaviors. Generally, IDSs can be classified into two categories: anomaly detection systems (profile-based detection systems) and misuse detection systems (signature-based detection systems). Anomaly detection systems target behavior that deviates from a normal profile of the system, while misuse detection systems target behavior that matches a known attack scenario [4,5]. Even if anomaly detection systems perform better in detecting unknown attacks, they usually yield a high false alarm rate. This limitation is addressed by misuse detection systems, which rely on specific attack signatures to distinguish between normal and malicious activities. However, these systems are directly influenced by the freshness of the detection rules; thus, improving the detection accuracy and learning speed of intrusion detection systems remains a challenging task [6].

Recently, considerable work has been done in the area of intrusion detection (ID), which attempts to design anomaly and/or misuse detection systems to detect malicious attacks with a high detection rate and a low false alarm rate. Tsang et al. [7] proposed an effective anomaly detection approach to extract both accurate and interpretable fuzzy rules from network traffic data for classification. The fuzzy rule-based IDS was based on an agent-based evolutionary framework and carried out a genetic feature selection for dimensionality reduction. Principal component analysis (PCA) was utilized successfully in ID by Salo et al. [8]. PCA can extract the most significant features by mapping the input dataset into an uncorrelated subspace. The k-nearest neighbor (KNN) method was used as a basic classifier to detect malicious attacks, which provided effective misuse detection systems with a high accuracy and detection rate [9,10]. Xiang et al. [11] introduced a novel multilevel hybrid classifier based on Bayesian clustering and decision trees, and they adopted this method for the IDS. An intelligent signature-based detection system called Dendron was presented by Dimitrios et al. [6] and classified various types of attacks; this methodology combined the advantages of both decision trees and genetic algorithms to obtain accurate detection rules.

✩ No author associated with this paper has disclosed any potential or pertinent conflicts which may be perceived to have impending conflict with this work. For full disclosure statements refer to https://doi.org/10.1016/j.knosys.2020.105648.
∗ Corresponding author. E-mail address: [email protected] (X. Liu).
Chan et al. [12] proposed a policy-enhanced fuzzy model with adaptive neuro-fuzzy inference system features, which can counter SOAP-related attacks with a high detection accuracy and a low false positive rate. Furthermore, they also developed fuzzy associative rules to effectively counter SOAP-related and XML-related attacks in Web services and e-commerce applications [13,14]. Wathiq et al. [15] proposed a method called real-time multi-agent system for an adaptive intrusion detection system (RTMAS-AIDS) to allow the IDS to adapt to unknown attacks in real time, and this method applied a hybrid support vector machine (SVM) and extreme learning machine (ELM) to classify normal behavior and known attacks. An effective SVM-based ID algorithm was presented by Tao et al. [16] to identify intrusions, which obtained great results. In addition, improved ELM and SVM algorithms have been widely used for ID applications [17–20]. These advances have achieved great performance in detecting and reporting malicious attacks. Nevertheless, better accuracy and efficiency of the prediction model remain the first purpose of an IDS.

The objective of this paper is to provide an accurate and effective misuse intrusion detection system with machine learning techniques that relies on specific attack signatures to distinguish between normal and malicious activities with a high accuracy and a fast learning speed. It is well known that the determination of a suitable configuration for a particular dataset is a demanding problem in machine learning. Therefore, numerous researchers have attempted to find the optimal parameters of machine learning models. Aburomman and Reaz [21] applied the particle swarm optimization (PSO) algorithm to the SVM-KNN ensemble method to create a classifier with a better accuracy for ID. The authors also proposed a novel weighted SVM multiclass classifier based on differential evolution (DE) for the IDS [22]. ELMs are a popular area of research for detecting possible intrusions and attacks [23]. Ku and Zheng [24] proposed an improved learning algorithm named self-adaptive differential evolution extreme learning machine with Gaussian kernel for classifying and detecting intrusions. Bostani and Sheikhan [25] introduced a hybrid binary gravitational search algorithm (GSA) for feature selection in IDSs, and this method can find a good subset of features and achieve a high accuracy and detection rate. GSA is a heuristic optimization method with few parameters to be determined, which has the benefits of a high convergence rate and a strong local optimization ability. Meanwhile, the DE algorithm is a heuristic optimization method with a strong global optimization ability, which has the benefit of great adaptability [26,27]. Unfortunately, both the GSA and DE algorithms have their own disadvantages: the former is easily trapped in a local optimum, and the latter's local optimization ability is relatively weak.

An effective approach to deal with ID problems is therefore presented. First, considering the timeliness requirement of ID problems, an ELM method is selected as the basic model in the current work. Furthermore, to improve the accuracy of the ELM method, a hybrid kernel function combining the radial basis function (RBF) kernel with the polynomial kernel is derived and introduced into the ELM model. The proof that the proposed hybrid kernel function satisfies Mercer's theorem is also given. Second, taking advantage of both the GSA and DE algorithms, a hybrid differential evolution combined with gravitational search algorithm (DEGSA) is proposed to optimize the parameters of the proposed model, which improves both the local and global optimization abilities over those of the individual algorithms. Third, kernel principal component analysis (KPCA) is introduced for the dimensionality reduction and feature extraction of the nonlinear ID data. The significance of this paper is summarized as follows.

• A new HKELM method with a hybrid kernel function is proposed that improves both the generalization and learning abilities of the KELM method, aiming at providing accurate and efficient misuse intrusion detection methods. Furthermore, the proof of the proposed hybrid kernel function is provided in detail.
• In the context of DE and GSA, a hybrid algorithm, DEGSA, is proposed that combines the benefits of DE and GSA with the aim of improving both the local and global optimization abilities for detecting attacks.
• The KPCA algorithm is introduced for the dimensionality reduction and feature extraction of the intrusion detection data. Then, an effective intrusion detection approach, KPCA-DEGSA-HKELM, is obtained.
• The proposed approach is compared with other literature methods in an extensive testbed comprising three intrusion detection datasets, namely, the classic benchmark KDD99 dataset [28], the real modern UNSW-NB15 dataset [29] and the industrial intrusion detection dataset from the TE process [30]. These datasets include both host-based and network-based attacks from different platforms, which can demonstrate the effectiveness of the proposed method.
• The proposed approach is evaluated and compared with other literature methods using several classification evaluation metrics. The experimental results show that the proposed approach is superior to other methods in terms of the accuracy (Acc), mean accuracy (MAcc), mean F-score (MF) and attack accuracy (AAcc) evaluation metrics. Furthermore, the proposed approach outperforms the CPSO-SVM method in terms of all the overall evaluation metrics while achieving a higher computational efficiency with less training and testing time.

The rest of this paper is organized as follows. In Section 2, the extreme learning machine with hybrid kernel function (HKELM) approach is proposed, DEGSA is presented to optimize the parameters of the HKELM model, and the KPCA algorithm is introduced for feature extraction. Section 3 outlines the implementation of the proposed algorithms. The experiment environment and model evaluation metrics are illustrated in Section 4. In Section 5, the experimental results are provided to validate the accuracy and efficiency of the proposed approach. Finally, Section 6 contains some concluding remarks.

2. Approach description

2.1. Extreme learning machine with hybrid kernel function (HKELM)

2.1.1. Extreme learning machine (ELM)

An ELM is an effective feedforward neural network with a single hidden layer, as presented by Huang et al. [31,32]. Traditional neural networks need to set a large number of parameters to train the network and, moreover, easily fall into local optima. Nevertheless, the ELM only needs to set the number of hidden nodes in the network, without adjusting the weights of the input layer or the biases of the hidden layer, and it more easily reaches a global optimum [33]. Therefore, the ELM has a faster convergence rate and is more efficient in terms of learning performance. The network structure of the ELM is shown in Fig. 1.

For the given training dataset T_0 = {(x_j, tar_j), j = 1, ..., N}, where x_j = [x_{j1}, ..., x_{jn}] ∈ R^n is the input feature vector and tar_j = [tar_{j1}, ..., tar_{jm}] ∈ R^m is the corresponding target vector, the goal is to obtain the optimal model for further testing tasks. In Fig. 1, y_j = [y_{j1}, ..., y_{jm}] ∈ R^m is the output vector obtained via the ELM network. Then, the ELM model can be expressed by the following formula:

y_j = \sum_{i=1}^{l} \beta_i g_i(x_j) = \sum_{i=1}^{l} \beta_i g(\alpha_i \cdot x_j + c_i), \quad j = 1, \ldots, N    (1)

where α_i is the weight vector between the input layer and the hidden layer, β_i is the weight vector between the hidden layer and the output layer, c_i is the bias of the ith hidden node, and g(·) is the activation function of the hidden layer. The node parameters α_i and c_i of the hidden layer are randomly assigned; as a consequence, only the number of hidden layer nodes l needs to be determined in the ELM model.
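As a toy sketch of this setup (an illustration under assumed sizes and a sigmoid activation, not the authors' implementation), the random hidden layer of Eq. (1) and the least-squares fit of the output weights β can be written as:

```python
import numpy as np

# Illustrative ELM sketch for Eq. (1): the hidden-layer parameters
# alpha_i and c_i are drawn at random and never trained; only the
# output weights beta are fitted, via the Moore-Penrose pseudoinverse.
# All sizes here are made-up toy values.
rng = np.random.default_rng(0)
n, m, l, N = 4, 3, 20, 100           # input dim, output dim, hidden nodes, samples

X = rng.normal(size=(N, n))          # inputs x_j
T = rng.normal(size=(N, m))          # targets tar_j
alpha = rng.normal(size=(n, l))      # random input weights alpha_i
c = rng.normal(size=(1, l))          # random hidden biases c_i

H = 1.0 / (1.0 + np.exp(-(X @ alpha + c)))   # H[j, i] = g(alpha_i . x_j + c_i)
beta = np.linalg.pinv(H) @ T                 # least-squares output weights
Y = H @ beta                                 # Eq. (1): network outputs y_j
print(Y.shape)                               # (100, 3)
```

Because only β is solved for, training reduces to one matrix decomposition, which is the source of the ELM's speed advantage noted above.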
Fig. 1. Network structure of the ELM model.

If the error between the output y and the target tar can be approximated to zero, then the following equation can be obtained:

\sum_{j=1}^{N} \| tar_j - y_j \| = 0    (2)

Combining Eq. (1) with Eq. (2), there exist β_i, α_i and c_i that satisfy:

\sum_{i=1}^{l} \beta_i g(\alpha_i \cdot x_j + c_i) = tar_j, \quad j = 1, \ldots, N    (3)

Eq. (3) can be converted into matrix form as follows,

\underbrace{\begin{bmatrix} g(\alpha_1 \cdot x_1 + c_1) & \cdots & g(\alpha_l \cdot x_1 + c_l) \\ \vdots & \ddots & \vdots \\ g(\alpha_1 \cdot x_N + c_1) & \cdots & g(\alpha_l \cdot x_N + c_l) \end{bmatrix}}_{H_{N \times l} = [h(x_1), \ldots, h(x_N)]^T} \cdot \underbrace{\begin{bmatrix} \beta_1^T \\ \vdots \\ \beta_l^T \end{bmatrix}}_{\beta_{l \times m}} = \underbrace{\begin{bmatrix} tar_1^T \\ \vdots \\ tar_N^T \end{bmatrix}}_{T_{N \times m}}    (4)

that is,

H \beta = T    (5)

where T is the output matrix of the target and H and β are the output matrix and the weight matrix of the hidden layer, respectively. Therefore, the weight matrix of the hidden layer β can be calculated by the following equation:

\beta = H^{+} T    (6)

where H^+ is the Moore–Penrose generalized inverse of the matrix H, which can be obtained as follows,

H^{+} = H^T (H H^T)^{-1}    (7)

2.1.2. HKELM approach

The prediction accuracy of the ELM model may be relatively low when the model is applied to unknown testing datasets. As a result, the kernel parameter I/C was introduced into H H^T by Huang et al. to improve the generalization ability of the ELM model, yielding the kernel extreme learning machine (KELM). The output function of the KELM can be expressed as follows [34],

f(x) = h(x) \beta = h(x) H^T \left( \frac{I}{C} + H H^T \right)^{-1} T    (8)

where the positive constant C is the penalty parameter and I is the identity matrix. The kernel function of the KELM is defined as follows,

\Omega_{KELM} = H H^T, \quad \Omega_{KELM_{i,j}} = h(x_i) h(x_j) = K(x_i, x_j)    (9)

Therefore, the model function of the KELM can be written as:

f(x) = \left[ K(x, x_1), \ldots, K(x, x_N) \right] \left( \frac{I}{C} + \Omega_{KELM} \right)^{-1} T    (10)

The selection of the kernel function can greatly influence the performance of the KELM model. Consequently, it is important to find an appropriate kernel function for the KELM model. The polynomial kernel function and the radial basis function (RBF) kernel function are two common kernel functions, which are combined as the hybrid kernel of the KELM in the current work.

2.1.2.1. Polynomial kernel function. The expression of the polynomial kernel function is stated as follows,

K_{poly}(x, x_i) = (x \cdot x_i + b)^p    (11)

where b and p are the constant and exponent parameters of the polynomial kernel function, respectively.

The polynomial kernel function is a typical global kernel function, which means that its corresponding KELM model possesses a strong generalization ability and a weak learning ability [35,36]. Fig. 2 demonstrates the curves of the polynomial kernel function with different b and p, where the test point is selected as x_i = 0.2. In Fig. 2(a), the value of parameter p is set as p = 2, while the value of parameter b changes from 0.2 to 1.0. In Fig. 2(b), the value of parameter b is set as b = 1, while the value of p changes from 1 to 5. As Fig. 2 indicates, the output of the polynomial kernel function increases with the input. Furthermore, the sample points both near and far away from the test point influence the output of the kernel function, which verifies the strong generalization ability of the polynomial kernel function. However, no apparent response peak appears at the test point, revealing the weak learning ability of the polynomial kernel function.

2.1.2.2. RBF kernel function. The expression of the RBF kernel function is indicated as follows,

K_{RBF}(x, x_i) = \exp\left( -\frac{\|x - x_i\|^2}{2\sigma^2} \right) = \exp\left( -\frac{\|x - x_i\|^2}{a} \right)    (12)

where a = 2σ² is the exponent parameter of the RBF kernel function.

The RBF kernel function is a typical local kernel function, which means that the corresponding KELM model has a strong learning ability and a weak generalization ability [35,36].
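The global-versus-local contrast between the two base kernels of Eqs. (11) and (12) can be seen numerically; a small sketch with illustrative parameter values (b = 1, p = 2 echo the figure settings quoted in the text, a is chosen arbitrarily):

```python
import numpy as np

# Sketch of the polynomial kernel (Eq. (11)) and the RBF kernel
# (Eq. (12)) on scalar inputs; parameter values are illustrative.
def k_poly(x, xi, b=1.0, p=2):
    return (x * xi + b) ** p             # global kernel

def k_rbf(x, xi, a=0.18):
    return np.exp(-((x - xi) ** 2) / a)  # local kernel, a = 2*sigma^2

xi = 0.2                                  # test point, as in Figs. 2-3
for x in (0.1, 0.2, 2.0):
    # distant points still contribute to the polynomial kernel, while
    # the RBF response decays toward zero away from the test point
    print(x, round(k_poly(x, xi), 3), round(k_rbf(x, xi), 3))
```

Running this shows the RBF response at x = 2.0 is essentially zero while the polynomial response keeps growing, matching the local/global characterization above.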
Fig. 3 demonstrates the curves of the RBF kernel function with different a. In Fig. 3, the value of a changes from 0.02 to 0.50, which determines the width of the RBF kernel function. Unlike the polynomial kernel function, only the sample points near the test point can affect the output of the RBF kernel function, which indicates the poor generalization ability of the RBF kernel function. As seen from Fig. 3, the closer the sample point is to the test point, the stronger the learning ability will be. Compared with Fig. 2, the learning ability of the RBF kernel function is superior to that of the polynomial kernel function. Fig. 3 also shows that as the value of a decreases, the learning ability of the RBF kernel function increases while the generalization ability decreases.

Fig. 2. Curves of the polynomial kernel function with different b and p.

Fig. 3. Curves of the RBF kernel function with different a.

2.1.2.3. Hybrid kernel function. The generalization ability of the polynomial kernel function is superior to that of the RBF kernel function, while its learning ability is poorer. Therefore, to improve both the generalization and learning abilities of the KELM, a combination of the two kernel functions is proposed as the hybrid kernel, which takes advantage of both kernels. In the current work, the linear weighting method is utilized, and the equation of the new hybrid function is expressed as follows,

K_{hybrid}(x, x_i) = w \cdot K_{RBF} + (1 - w) \cdot K_{poly}, \quad w \in [0, 1]    (13)

that is,

K_{hybrid}(x, x_i) = w \cdot \exp\left( -\frac{\|x - x_i\|^2}{a} \right) + (1 - w) \cdot (x \cdot x_i + b)^p, \quad w \in [0, 1]    (14)

where the constant w is the weight coefficient of the hybrid function K_{hybrid}.

Proposition 1. As the RBF kernel function K_{RBF} and the polynomial kernel function K_{poly} satisfy Mercer's theorem, the proposed function K_{hybrid} is also a kernel function satisfying Mercer's theorem.

Proof. K_{RBF} and K_{poly} are kernel functions; thus, their Gram matrices K_{RBF} and K_{poly} are positive semidefinite, that is, for any vector λ, the following conditions are satisfied:

\lambda^T K_{RBF} \lambda \ge 0, \quad \lambda^T K_{poly} \lambda \ge 0    (15)

Then, the following expressions can be obtained:

w \lambda^T K_{RBF} \lambda \ge 0, \ (1 - w) \lambda^T K_{poly} \lambda \ge 0 \ \Rightarrow \ \lambda^T (w K_{RBF}) \lambda \ge 0, \ \lambda^T [(1 - w) K_{poly}] \lambda \ge 0    (16)

Therefore, w K_{RBF} and (1 − w) K_{poly} are positive semidefinite matrices. According to Eq. (13), λ^T K_{hybrid} λ can be rewritten as follows,

\lambda^T K_{hybrid} \lambda = \lambda^T [w \cdot K_{RBF} + (1 - w) \cdot K_{poly}] \lambda = \underbrace{\lambda^T (w \cdot K_{RBF}) \lambda}_{\ge 0} + \underbrace{\lambda^T [(1 - w) \cdot K_{poly}] \lambda}_{\ge 0}    (17)

Combining Eqs. (16) and (17), the following expression can be obtained:

\lambda^T K_{hybrid} \lambda \ge 0    (18)

It can be observed from Eq. (18) that K_{hybrid} is a positive semidefinite matrix, that is, K_{hybrid} satisfies Mercer's theorem; thus, the proposed function K_{hybrid} is a kernel function, which can be applied to improve the performance of the KELM approach. The proof is completed.
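The conclusion of Proposition 1 can also be checked numerically; the sketch below (illustrative parameter values, random sample vectors, not the authors' code) builds the Gram matrix of the hybrid kernel of Eq. (14) and confirms that its eigenvalues are non-negative up to round-off:

```python
import numpy as np

# Sketch of the hybrid kernel of Eq. (14) plus a numerical check of
# Proposition 1: the Gram matrix of the convex combination stays
# positive semidefinite. Parameter values are illustrative only.
def k_hybrid(x, xi, w=0.5, a=0.18, b=1.0, p=2):
    rbf = np.exp(-np.sum((x - xi) ** 2) / a)
    poly = (np.dot(x, xi) + b) ** p
    return w * rbf + (1.0 - w) * poly     # Eq. (13)

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 4))              # 30 random sample vectors
K = np.array([[k_hybrid(u, v) for v in X] for u in X])

eigs = np.linalg.eigvalsh(K)              # eigenvalues of the Gram matrix
print(eigs.min() >= -1e-8)                # True: K is (numerically) PSD
```

This mirrors the λ^T K λ ≥ 0 argument of the proof: a non-negative combination of positive semidefinite Gram matrices remains positive semidefinite.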
Fig. 4. Curves of the hybrid kernel function with different w.

Fig. 4 illustrates the curves of the hybrid kernel function with different w. The values of parameters a, b and p are set as a = 0.18, b = 1 and p = 2, respectively, while the value of parameter w changes from 0 to 1. It is observed from Fig. 4 that the larger the value of w is, the stronger the learning ability of the hybrid kernel function, while the corresponding generalization ability is weakened. Moreover, all the parameters of K_{hybrid} can affect the performance of the HKELM model. Therefore, it is important to determine the optimal parameters a_opt, b_opt, p_opt and w_opt of the hybrid kernel function, which achieve both great learning and generalization abilities in the resulting HKELM model.

2.2. Differential evolution combined with gravitational search algorithm (DEGSA)

2.2.1. Differential evolution algorithm

The differential evolution (DE) algorithm is a particle-based global optimization algorithm, which was proposed by Storn and Price in 1997 [37]. Compared to other evolutionary algorithms, the DE algorithm exhibits a strong global search capacity and robustness [38,39].

An optimization problem within a D-dimensional space is shown as follows,

\min f(X_1, X_2, \ldots, X_N)    (19)

subject to,

X_i^L \le X_i \le X_i^U, \quad i = 1, \ldots, N    (20)

where X_i = [x_{i,1}, ..., x_{i,D}], i = 1, ..., N is a candidate solution and X_i^L and X_i^U denote the minimum and maximum of X_i, respectively. Then, the DE algorithm can be described as follows,

Step 1. Initialization. The boundary constraints of the search space are given as follows,

x_{i,j}^L \le x_{i,j}(0) \le x_{i,j}^U, \quad i = 1, \ldots, N; \ j = 1, \ldots, D    (21)

where x_{i,j}(0) is the initial population, which should cover the entire space. Then, the initial value is generated by

x_{i,j}(0) = x_{i,j}^L + rand(0, 1) \cdot (x_{i,j}^U - x_{i,j}^L)    (22)

where rand(0, 1) is a random number between 0 and 1.

Step 2. Mutation. The mutation operation is utilized to generate a mutant vector, V_i(t) = [v_{i,1}(t), ..., v_{i,D}(t)], and the mutation rule is executed by the following equation:

V_i(t + 1) = X_{r1}(t) + F \times (X_{r2}(t) - X_{r3}(t)), \quad 1 \le i \ne r1 \ne r2 \ne r3    (23)

where t denotes the tth generation and F is the scaling factor applied to the difference of the two vectors.

Step 3. Crossover. A trial vector, U_i(t) = [u_{i,1}(t), ..., u_{i,D}(t)], is created through the crossover operation. The corresponding process can be described as follows,

u_{i,j}(t + 1) = \begin{cases} v_{i,j}(t + 1), & \text{if } rand(0, 1) \le CR \\ x_{i,j}(t), & \text{otherwise} \end{cases}    (24)

where CR is the crossover rate, which can assist the model in avoiding local optima and maintains the diversity of the population [26].

Step 4. Selection. The fitness of U_i(t) is compared with that of X_i(t), and the better result is selected by the DE algorithm. In other words, if the fitness of the population after crossover improves, then the results are updated, that is,

X_i(t + 1) = \begin{cases} U_i(t), & \text{if } fit(U_i(t)) \le fit(X_i(t)) \\ X_i(t), & \text{otherwise} \end{cases}    (25)

where fit(·) is the fitness value, which is calculated by the following equation:

fit(x) = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 }    (26)

where y_i and ŷ_i are the measured and predicted results, respectively.

2.2.2. Gravitational search algorithm

The gravitational search algorithm (GSA) was proposed by Esmat et al. in 2009 [40]. In GSA, the agents are regarded as objects, and all objects tend to move towards the objects with heavier masses. This algorithm realizes the communication between objects through gravitational force, thus guiding all the objects to the optimal solution in the search space [41,42].

Suppose the number of objects is N; the position of the ith object can be defined as follows,

x_i = (x_i^1, \ldots, x_i^d, \ldots, x_i^D), \quad i = 1, \ldots, N    (27)

where x_i^d denotes the position of the ith object in the dth dimension.

The force acting on the ith object from the jth object is described as,

F_{ij}^d = G(t) \frac{M_{pi}(t) M_{aj}(t)}{B_{ij}(t) + \varepsilon} \left( x_j^d(t) - x_i^d(t) \right)    (28)

where M_{pi}(t) and M_{aj}(t) are the gravitational masses related to the ith object and the jth object, respectively. B_{ij}(t) is the Euclidean distance between the ith object and the jth object, and ε is a small constant. G(t) is the gravitational constant, namely,

G(t) = G_0 \exp\left( -\gamma \frac{t}{iter_{max}} \right)    (29)

where γ is the descending coefficient, G_0 is the initial gravitational constant, and iter_max is the maximum number of iterations.

In GSA, the total force that acts on the ith object is calculated as,

F_i^d(t) = \sum_{j \in K_{best}, j \ne i} rand_j F_{ij}^d(t)    (30)
where rand_j is a random number between 0 and 1 and K_best is the set of the first k objects with the best fitness values.

By the law of motion, the acceleration of the ith object is shown as follows,

ac_i^d(t) = \frac{F_i^d(t)}{M_i(t)}    (31)

where M_i(t) is the inertial mass of the ith object.

The next velocity of an object is the sum of the current velocity and its acceleration. Therefore, the updates of the velocity and position of the ith object can be described as,

v_i^d(t + 1) = rand_i \times v_i^d(t) + ac_i^d(t)    (32)

x_i^d(t + 1) = x_i^d(t) + v_i^d(t + 1)    (33)

Assuming that the gravitational mass equals the inertial mass, namely,

M_{ai} = M_{pi} = M_{ii} = M_i, \quad i = 1, \ldots, N    (34)

then the updates of the gravitational and inertial masses are indicated as follows,

m_i(t) = \frac{fit_i(t) - worst(t)}{best(t) - worst(t)}    (35)

M_i(t) = \frac{m_i(t)}{\sum_{j=1}^{N} m_j(t)}    (36)

where fit_i(t) is the fitness value. For a minimum problem, best(t) and worst(t) are defined by the following equations:

best(t) = \min_{j \in \{1, \ldots, NP\}} fit_j(t)    (37)

worst(t) = \max_{j \in \{1, \ldots, NP\}} fit_j(t)    (38)

For a maximum problem, best(t) and worst(t) are defined as,

best(t) = \max_{j \in \{1, \ldots, NP\}} fit_j(t)    (39)

worst(t) = \min_{j \in \{1, \ldots, NP\}} fit_j(t)    (40)

In the current work, the intrusion detection problem is regarded as a minimum problem; that is, Eqs. (37) and (38) are used for the intrusion detection problem.

2.2.3. The proposed DEGSA

The global optimization ability of the DE algorithm is strong, as it can accurately find the global optima of the search space with the differential information [43]. Nevertheless, the local optimization ability of the DE algorithm is relatively weak. On the contrary, the local optimization ability of GSA is strong, while its global optimization ability is relatively weak. As the iteration progresses, GSA requires more time to reach the optimal solution due to the emergence of a large number of inertial mass objects. Therefore, GSA and the DE algorithm are combined to improve both the local and global optimization abilities of the proposed DEGSA. In the current work, GSA and the DE algorithm are performed alternately, i.e., GSA is implemented at the odd generations, while the DE algorithm is applied at the even generations. The introduction of the DE algorithm increases the diversity of the original GSA, which assists DEGSA in exploring the search space smartly while avoiding becoming stuck at the local optima of the individual approaches [43,44]. The flowchart of DEGSA is given graphically in Fig. 5.

Based on the above optimization procedures of DEGSA, a detailed description of the proposed algorithm is summarized as follows,

(1) Initialize the number of objects N and the position x_i and velocity v_i of the ith object. Determine the parameters of DEGSA, e.g., the descending coefficient γ, the initial gravitational constant G_0, etc. Set the maximal number of iterations iter_max and let the initial iteration be t = 1.
(2) Calculate the fitness value according to Eq. (26).
(3) If the iteration number t is odd, go to step (4); otherwise, go to step (5).
(4) Activate GSA.
  (a) Calculate the gravitational constant G according to Eq. (29).
  (b) For the minimum problem, calculate the best and worst fitness values best(t) and worst(t) according to Eqs. (37)–(38).
  (c) Calculate the total force on the ith object F_i^d(t) according to Eq. (30), obtain the acceleration of the ith object ac_i^d(t) according to Eq. (31), and calculate the inertial mass of the ith object M_i(t) according to Eqs. (35)–(36).
  (d) Update the velocity and position of the ith object according to Eqs. (32)–(33).
  (e) If i ≤ N, go back to step (4)-(c); else, go to step (6).
(5) Activate the DE algorithm.
  (a) Generate a mutant vector V_i(t) according to Eq. (23).
  (b) Generate a trial vector U_i(t) according to Eq. (24).
  (c) If fit(U_i(t)) ≤ fit(X_i(t)), go to step (5)-(d); otherwise, go back to step (5)-(a).
  (d) If i ≤ N, go back to step (5)-(a); else, go to step (6).
(6) Update the parameters according to the new fitness.
(7) If t ≤ iter_max, go back to step (3); otherwise, go to step (8).
(8) Output the updated solutions as the optimal parameters.

Then, the proposed DEGSA is finished.

2.3. Kernel principal component analysis (KPCA)

Principal component analysis (PCA) is a classic approach utilized for feature extraction and dimensionality reduction [45]. PCA can deal well with linear relationships between variables; however, when dealing with nonlinear relations, the contribution rate of each principal component is too scattered. As a result, the comprehensive variables that can effectively represent the original samples cannot be found precisely, leading to the low effectiveness of the PCA method. The KPCA approach was proposed by Scholkopf et al. [46] as an improvement to the original PCA method and can deal with nonlinear problems effectively. KPCA maps the nonlinear training samples X = [x_1, ..., x_n]^T ∈ R^{n×d} into a high-dimensional feature space Γ through the nonlinear mapping function Φ [47], namely,

\Phi: R^{n \times d} \to \Gamma, \quad X \mapsto \Phi(X)    (41)

The data that are inseparable in the input space become separable in the high-dimensional feature space Γ by using the simple nonlinear mapping function Φ; then, the PCA method is utilized to extract the features in Γ, which realizes the separation of the nonlinear training samples. The ith feature after the KPCA transformation can be expressed as follows,

q_i = \frac{1}{\sqrt{\mu_i}} \sigma_i^T \left[ k(x_1, x_{new}), \ldots, k(x_n, x_{new}) \right]^T, \quad i = 1, \ldots, p    (42)

where μ_i, i = 1, ..., p are the p largest positive eigenvalues satisfying μ_1 ≥ μ_2 ≥ ··· ≥ μ_p, and σ_i, i = 1, ..., p are the corresponding eigenvectors. k(x_n, x_new) is the kernel function of x_n and a new column vector sample x_new, which calculates the inner product of x_n and x_new in the high-dimensional feature space Γ.
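A minimal sketch of the projection step of Eq. (42), under assumed data and an RBF kernel (not the authors' code): the Gram matrix is centered first, a standard implementation detail the text does not spell out, and centering of the new sample's kernel vector is omitted here for brevity.

```python
import numpy as np

# KPCA sketch in the spirit of Eq. (42): project a new sample onto the
# p leading eigenvectors of the (centered) RBF Gram matrix.
# All values are illustrative.
def rbf(u, v, a=1.0):
    return np.exp(-np.sum((u - v) ** 2) / a)

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 5))                   # training samples x_1..x_n
n = len(X)

K = np.array([[rbf(u, v) for v in X] for u in X])
J = np.eye(n) - np.ones((n, n)) / n            # centering matrix
Kc = J @ K @ J

mu, sigma = np.linalg.eigh(Kc)                 # eigenvalues, ascending
mu, sigma = mu[::-1], sigma[:, ::-1]           # sort so mu_1 >= mu_2 >= ...

p = 3                                           # keep the p leading components
x_new = rng.normal(size=5)
k_new = np.array([rbf(u, x_new) for u in X])
q = (sigma[:, :p].T @ k_new) / np.sqrt(mu[:p])  # Eq. (42): q_i
print(q.shape)                                  # (3,)
```

In the proposed pipeline, this reduced vector q, rather than the raw connection record, is what the HKELM classifier receives.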
L. Lv, W. Wang, Z. Zhang et al. / Knowledge-Based Systems 195 (2020) 105648 7

Fig. 5. Flowchart of DEGSA.

where µi, i = 1, . . . , p are the p largest positive eigenvalues satisfying µ1 ≥ µ2 ≥ · · · ≥ µp, and σi, i = 1, . . . , p are the corresponding eigenvectors. k(xn, xnew) is the kernel function of xn and a new column vector sample xnew, which calculates the inner product of xn and xnew in the high-dimensional feature space Γ.

3. Algorithm outline

The ID process based on the KPCA-DEGSA-HKELM approach is demonstrated in Fig. 6 and described in detail as follows.

Algorithm. The proposed KPCA-DEGSA-HKELM approach
Step 1: Input: the ID dataset and the initial parameters of the KPCA-DEGSA-HKELM model.
Step 2: Perform dimensionality reduction and feature extraction on the input dataset by using the KPCA method.
Step 3: Train the HKELM model with the preprocessed training dataset and optimize the model parameters of the HKELM with the hybrid algorithm DEGSA. When the number of iterations reaches the maximum value iter_max, the optimization process stops and the optimal parameters are obtained.
Step 4: Evaluate the obtained optimal HKELM model with the testing dataset.
Step 5: Output: the evaluation metrics of the ID problem.

Table 1
The definitions of evaluation metrics for the multiclass ID problem.
Name                Meaning                                                               Notation
Precision           The proportion of instances predicted as the ith class that           Pi (i = 0, 1, . . . , c − 1)
                    are correctly classified as the ith class
Recall              The proportion of instances actually belonging to the ith class       Ri (i = 0, 1, . . . , c − 1)
                    that are correctly classified as the ith class
F-score             The balance between the precision and the recall                      Fi (i = 0, 1, . . . , c − 1)
Accuracy            The frequency of correct decisions                                    Acc
Mean accuracy       The average recall among all the classes of the dataset               MAcc
Mean F-score        The average F-score among all the classes of the dataset              MF
Attack accuracy     The accuracy rate for the attack classes                              AAcc
False attack rate   The frequency of falsely predicting a normal instance as an attack    FAR
False normal rate   The frequency of falsely predicting an attack instance as normal      FNR
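Steps 1-5 can be sketched end to end on toy data. Everything below is an illustrative stand-in (a plain linear projection in place of KPCA, random search in place of DEGSA, an RBF-kernel ELM in place of HKELM), not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf_matrix(A, B, a):
    # Pairwise RBF kernel matrix with width a.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * a ** 2))

def kelm_train(X, y, a, C=10.0):
    # Standard kernel-ELM closed form: beta = (I/C + Omega)^(-1) T.
    T = np.eye(2)[y]                      # one-hot targets
    return np.linalg.solve(np.eye(len(X)) / C + rbf_matrix(X, X, a), T)

def kelm_accuracy(Xtrain, beta, a, X, y):
    scores = rbf_matrix(X, Xtrain, a) @ beta
    return (scores.argmax(axis=1) == y).mean()

# Step 1: input a toy two-class "ID" dataset
X = rng.normal(size=(80, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
Xtr, ytr, Xte, yte = X[:60], y[:60], X[60:], y[60:]

# Step 2: dimensionality reduction (plain PCA here, as a KPCA placeholder)
W = np.linalg.svd(Xtr - Xtr.mean(axis=0), full_matrices=False)[2][:3].T
Xtr_r, Xte_r = Xtr @ W, Xte @ W

# Step 3: search over the kernel width (random search as a DEGSA placeholder)
best_a = max((rng.uniform(0.1, 5.0) for _ in range(20)),
             key=lambda a: kelm_accuracy(Xtr_r, kelm_train(Xtr_r, ytr, a), a, Xtr_r, ytr))

# Steps 4-5: evaluate the tuned model on the held-out split and report
beta = kelm_train(Xtr_r, ytr, best_a)
acc = kelm_accuracy(Xtr_r, beta, best_a, Xte_r, yte)
print(acc)
```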
4. Environment and model evaluation

All the experiments in the current work are executed on a PC with a 2.6 GHz Intel Core i5 processor and 8.0 GB of RAM. To validate the effectiveness of the KPCA-DEGSA-HKELM classification model, several multiclass classification evaluation metrics are utilized to gauge the models. In the current work, the precision, recall and F-score are used as the evaluation metrics of each class; moreover, the accuracy, mean accuracy, mean F-score, attack accuracy, false attack rate and false normal rate are utilized as the overall evaluation metrics. The detailed definitions of the evaluation metrics are provided in Table 1. In Table 1, the different categories of network connection in the experiments are denoted as normal (labeled 0) and attacks (labeled 1, 2, . . . , c − 1), where c denotes the number of categories of network connection. Take the KDD99 dataset as an example, which contains the four attack categories DoS, PRB, U2R and R2L. A confusion matrix of a classification experiment is shown in Table 2, where Nij represents the number of instances of the ith kind of network connection that are predicted as the jth kind.

Then, all the evaluation metrics can be calculated as follows,

Pi = TPi / (TPi + FPi) = Nii / Σ_{j=0}^{c−1} Nji,  i = 0, 1, . . . , c − 1   (43)
Fig. 6. ID process based on the KPCA-DEGSA-HKELM approach.

Table 2
An example of a confusion matrix.
Actual class     Classified class: Normal (0)   DoS (1)   PRB (2)   U2R (3)   R2L (4)
Normal (0) N00 N01 N02 N03 N04
DoS (1) N10 N11 N12 N13 N14
PRB (2) N20 N21 N22 N23 N24
U2R (3) N30 N31 N32 N33 N34
R2L (4) N40 N41 N42 N43 N44

Ri = TPi / (TPi + FNi) = Nii / Σ_{j=0}^{c−1} Nij,  i = 0, 1, . . . , c − 1   (44)

Fi = 2 · Ri · Pi / (Ri + Pi),  i = 0, 1, . . . , c − 1   (45)

Acc = Σ_{i=0}^{c−1} TPi / Σ_{i=0}^{c−1} (TPi + FNi) = Σ_{i=0}^{c−1} Nii / Σ_{i=0}^{c−1} Σ_{j=0}^{c−1} Nij   (46)

MAcc = (1/c) Σ_{i=0}^{c−1} TPi / (TPi + FNi) = (1/c) Σ_{i=0}^{c−1} Ri   (47)

MF = (1/c) Σ_{i=0}^{c−1} Fi   (48)

AAcc = (1/(c − 1)) Σ_{i=1}^{c−1} TPi / (TPi + FNi) = (1/(c − 1)) Σ_{i=1}^{c−1} Ri   (49)

FAR = FN0 / (TP0 + FN0) = 1 − R0   (50)

FNR = FP0 / (FP0 + Σ_{i=1}^{c−1} TPi) = Σ_{j=1}^{c−1} Nj0 / (Σ_{j=1}^{c−1} Nj0 + Σ_{i=1}^{c−1} Nii)   (51)

where c = 5, 10 and 5 denote the number of categories for the KDD99 dataset, UNSW-NB15 dataset and TE dataset, respectively; true positive (TPi) is the number of correctly classified instances of the ith class; false positive (FPi) is the number of instances with an actual class other than the ith but incorrectly classified as the ith class; and false negative (FNi) is the number of instances with the ith class as the actual class but incorrectly classified as another class.

Fig. 7. F-scores and mean F-scores for the Poly_KELM, RBF_KELM and HKELM methods with the KDD99 dataset.

5. Case study

5.1. Case 1: Intrusion detection of the KDD99 dataset

5.1.1. Dataset description
The KDD99 dataset is selected as a standard benchmark database to evaluate the effectiveness of the proposed models. In the KDD99 dataset, each instance consists of 41 features and a label, where the label belongs to either the normal or a specific attack type. As Table 3 shows, there are 23 types of attacks in total, which can be divided into four major attack categories [28]: DoS (denial of service), PRB (probing), U2R (user to root) and R2L (remote to local).

As the full dataset of KDD99 (18 M; 743 M uncompressed) is cumbersome for training machine learning algorithms, the 10% subset of the dataset (2.1 M; 75 M uncompressed) is used by the majority of researchers. Thus, this subset, which maintains the initial characteristics of the full dataset, is chosen as the experiment dataset in the current work. Table 4 provides the detailed information of the instances in the datasets, where the training and two testing datasets are denoted as T0, T1 and T2, respectively. The training dataset T0 together with the testing dataset T1 are used to demonstrate the effectiveness of the proposed HKELM, DEGSA-HKELM and KPCA-DEGSA-HKELM models. Furthermore, to be consistent with other literature works such as the KDD99 winner [48], another testing dataset T2 is applied to compare the performance of the proposed KPCA-DEGSA-HKELM model with the models of other research.

5.1.2. Experimental results and discussion
Firstly, the training dataset T0 and testing dataset T1 are used to demonstrate the performance of the proposed HKELM, DEGSA-HKELM and KPCA-DEGSA-HKELM models.
(1) HKELM
When the model parameters are set as a = 100, b = 15, p = 4, and w = 0.9, the confusion matrices of the Poly_KELM,
Table 3
Attack categories of the KDD99 dataset.
Class Meaning Attacks of KDD99
DoS Denial of service back, land, neptune, pod, smurf, teardrop
PRB Surveillance and other means of probing ipsweep, nmap, portsweep, satan
U2R Unauthorized access to local (root) privileges buffer_overflow, loadmodule, perl, rootkit
R2L Unauthorized access from a remote machine ftp_write, guess_passwd, imap, multihop, phf, spy, warezclient, warezmaster

Table 4
The instances of training and testing datasets for the KDD99 dataset.
Class     Number of instances: Training dataset T0   Testing dataset T1   Testing dataset T2   10% KDD99 dataset
Normal 200 1000 10000 97278
DoS 60 500 40000 391458
PRB 40 500 400 4107
U2R 30 52 52 52
R2L 60 1000 100 1126
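The per-class and overall metrics defined in Eqs. (43)-(51) can be computed directly from a confusion matrix such as the ones reported below. A sketch, assuming class 0 denotes normal as in Table 2:

```python
import numpy as np

def id_metrics(N):
    # N[i, j]: count of actual class i predicted as class j; class 0 = normal.
    N = np.asarray(N, dtype=float)
    P = np.diag(N) / N.sum(axis=0)      # Eq. (43): per-class precision
    R = np.diag(N) / N.sum(axis=1)      # Eq. (44): per-class recall
    F = 2 * R * P / (R + P)             # Eq. (45): per-class F-score
    return {
        "Acc": np.trace(N) / N.sum(),   # Eq. (46)
        "MAcc": R.mean(),               # Eq. (47)
        "MF": F.mean(),                 # Eq. (48)
        "AAcc": R[1:].mean(),           # Eq. (49): mean recall over attacks
        "FAR": 1 - R[0],                # Eq. (50): normal flagged as attack
        # Eq. (51): attacks predicted as normal, over those counts plus the
        # correctly classified attacks
        "FNR": N[1:, 0].sum() / (N[1:, 0].sum() + np.diag(N)[1:].sum()),
        "P": P, "R": R, "F": F,
    }
```

Feeding in the HKELM confusion matrix of Table 5(c) reproduces the corresponding Table 6 row (Acc 93.25%, MAcc 92.19%, MF 88.08%, AAcc 91.12%, FNR 5.81%).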

RBF_KELM and proposed HKELM methods are given in Table 5(a)∼(c).

Table 5
Confusion matrices of the KDD99 dataset T1.
(a) By the Poly_KELM method (parameters: b = 15, p = 4)
Actual class    Classified class: Normal   DoS    PRB    U2R    R2L      Recall (%)
Normal          940    0      0      31     29     94.00
DoS             1      497    2      0      0      99.40
PRB             48     0      411    16     25     82.20
U2R             6      0      0      41     5      78.85
R2L             67     1      0      17     915    91.50
Precision (%)   89.27  99.80  99.54  48.42  96.00

(b) By the RBF_KELM method (parameter: a = 100)
Actual class    Classified class: Normal   DoS    PRB    U2R    R2L      Recall (%)
Normal          978    0      2      4      16     97.80
DoS             3      463    34     0      0      92.60
PRB             55     15     427    0      3      85.40
U2R             10     0      0      33     9      63.46
R2L             64     7      1      7      921    92.10
Precision (%)   88.11  95.46  92.03  75.00  97.05

(c) By the HKELM method (parameters: a = 100, b = 15, p = 4, w = 0.9)
Actual class    Classified class: Normal   DoS    PRB    U2R    R2L      Recall (%)
Normal          965    0      0      23     12     96.50
DoS             2      496    2      0      0      99.20
PRB             32     0      429    15     24     85.80
U2R             4      0      0      46     2      88.46
R2L             78     1      0      11     910    91.00
Precision (%)   89.27  99.80  99.54  48.42  95.99

Table 6
Comparison of the overall evaluation metrics with the KDD99 dataset T1.
Method      Acc (%)   MAcc (%)   MF (%)   AAcc (%)   FNR (%)
Poly_KELM   91.87     89.19      85.15    87.99      6.14
RBF_KELM    92.46     86.27      87.71    83.39      6.68
HKELM       93.25     92.19      88.08    91.12      5.81

To demonstrate the performance of Poly_KELM, RBF_KELM and HKELM, a more visual illustration is shown in Fig. 7, which gives the comparisons among the three methods in terms of the F-score and mean F-score for each class. In Fig. 7, the F-scores of normal and PRB and the mean F-score obtained by the HKELM method are higher than those of the other two methods. In addition, the overall evaluation metrics of the three methods are shown in Table 6. It is clear that the HKELM method is better than the other two methods for all the overall evaluation metrics.

For the proposed HKELM method, the appropriate values of the hybrid kernel function parameters a, b, p and w in Eq. (14) need to be selected. By utilizing the HKELM method constructed upon the training dataset T0 and testing dataset T1, the testing results with different values of a, b, p and w are shown in Fig. 8(a)∼(d). As shown in Fig. 8, the values of these four parameters affect the accuracy of the classification; therefore, it is significant to determine an appropriate set of the four parameter values.

(2) DEGSA-HKELM
It is difficult to quickly and precisely find the optimal parameter values via human experience. As a result, the swarm intelligent hybrid optimization algorithm DEGSA is introduced to adaptively obtain the optimal parameter values in the current work. Meanwhile, the DE-HKELM and GSA-HKELM methods are considered for comparison with the DEGSA-HKELM method proposed in the paper. In the current work, the internal parameters of the DE branch of the DEGSA-HKELM method are consistent with those of the DE-HKELM method and are set as follows: the population size is NDE = 10, the maximum number of iterations is iter_max = 20, the lower bound of the scaling parameter F in Eq. (23) is FL = 0.2, the upper bound of the scaling parameter F in Eq. (23) is FU = 0.8, and the crossover rate in Eq. (24) is CR = 0.2. In addition, the internal parameters of the GSA branch of the DEGSA-HKELM method are consistent with those of the GSA-HKELM method and are set as follows: the number of agents is NGSA = 20, the maximum number of iterations is iter_max = 20, the small constant in Eq. (28) is ε = 2^−52, the initial gravitational constant in Eq. (29) is G0 = 300, and the descending coefficient in Eq. (29) is γ = 20.

Table 7 shows the detailed F-score of each class obtained by DE-HKELM, GSA-HKELM and DEGSA-HKELM. From Table 7, the F-score of each class obtained by DEGSA-HKELM is higher than that of the other two methods. In particular, the F-score of U2R obtained by DE-HKELM is 77.78%, by GSA-HKELM is 81.48% and by DEGSA-HKELM is 85.46%; DEGSA-HKELM improves the F-score of U2R by approximately 7.68% and 3.98% compared to that of DE-HKELM and GSA-HKELM, respectively. Furthermore, the mean F-score obtained by DE-HKELM is 92.77%, by GSA-HKELM is 93.61% and by DEGSA-HKELM is 94.86%. DEGSA-HKELM achieves percentage increases of approximately 2.09% and 1.25% in the mean F-score when compared to that of DE-HKELM and GSA-HKELM, respectively.
Fig. 8. Testing results for the KDD99 dataset with different values of a, b, p, and w.
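Eq. (14) itself lies outside this excerpt. A plausible form consistent with the four parameters a, b, p and w, namely a weighted mix of an RBF kernel (width a) and a polynomial kernel (offset b, degree p) with mixing weight w, together with the standard KELM closed-form training it plugs into, can be sketched as follows (the kernel form is an assumption, not the authors' exact Eq. (14); the default parameter values are the ones quoted for Table 5(c)):

```python
import numpy as np

def hybrid_kernel(x, y, a=100.0, b=15.0, p=4, w=0.9):
    # Assumed weighted RBF + polynomial combination.
    rbf = np.exp(-np.sum((x - y) ** 2) / (2 * a ** 2))
    poly = (np.dot(x, y) + b) ** p
    return w * rbf + (1 - w) * poly

def hkelm_train(X, T, C=1.0, kernel=hybrid_kernel):
    # Standard KELM output weights: beta = (I/C + Omega)^(-1) T,
    # with Omega[i, j] = k(x_i, x_j).
    Omega = np.array([[kernel(xi, xj) for xj in X] for xi in X])
    return np.linalg.solve(np.eye(len(X)) / C + Omega, T)

def hkelm_predict(X, beta, x_new, kernel=hybrid_kernel):
    return np.array([kernel(x, x_new) for x in X]) @ beta
```

With a large regularization constant C the model interpolates its training targets, which is an easy sanity check on the closed form.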
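The DE and GSA settings listed above drop directly into the standard building blocks of the two algorithms. Since Eqs. (23)-(24) and (29) are not reproduced in this excerpt, the DE/rand/1 trial construction and the exponential decay of the gravitational constant below are assumed common forms, not the authors' exact ones:

```python
import math
import random

FL, FU, CR = 0.2, 0.8, 0.2              # DE settings quoted in the text
G0, gamma, iter_max = 300.0, 20.0, 20   # GSA settings quoted in the text

def gravitational_constant(t):
    # Common GSA schedule G(t) = G0 * exp(-gamma * t / iter_max);
    # assumed to correspond to Eq. (29).
    return G0 * math.exp(-gamma * t / iter_max)

def de_trial(pop, i):
    # DE/rand/1 mutation with F drawn from [FL, FU], then binomial
    # crossover with rate CR; assumed to correspond to Eqs. (23)-(24).
    r1, r2, r3 = random.sample([j for j in range(len(pop)) if j != i], 3)
    F = random.uniform(FL, FU)
    jrand = random.randrange(len(pop[i]))  # guarantees one mutated gene
    return [pop[r1][j] + F * (pop[r2][j] - pop[r3][j])
            if (random.random() < CR or j == jrand) else pop[i][j]
            for j in range(len(pop[i]))]
```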
Table 7
The detailed F-score of each class obtained by DE-HKELM, GSA-HKELM and DEGSA-HKELM with the KDD99 dataset T1.
Class    DE-HKELM (%)   GSA-HKELM (%)   DEGSA-HKELM (%)
Normal   94.52          94.70           95.52
DoS      99.20          99.20           99.40
PRB      97.24          97.24           97.45
U2R      77.78          81.48           85.46
R2L      95.11          95.43           96.50
Mean     92.77          93.61           94.86

In addition, to indicate the performance of DEGSA more intuitively, the F-scores and mean F-scores of DE-HKELM, GSA-HKELM and DEGSA-HKELM are shown in Fig. 9. The column bars containing slashes, horizontal lines and pure black present the F-scores obtained by DE-HKELM, GSA-HKELM and DEGSA-HKELM, respectively. It is evident that the F-scores for the five classes and the overall mean F-score obtained by DEGSA-HKELM are higher than those of the other two methods, especially the F-score of the U2R class.

Fig. 9. F-scores and mean F-scores for DE-HKELM, GSA-HKELM and DEGSA-HKELM with the KDD99 dataset.

Table 8
Comparison of the overall evaluation metrics for the KDD99 dataset T1.
Method        Acc (%)   MAcc (%)   MF (%)   AAcc (%)   FNR (%)
DE-HKELM      95.61     93.09      92.77    91.59      5.21
GSA-HKELM     95.84     93.96      93.61    92.68      5.01
DEGSA-HKELM   96.59     95.58      94.86    94.70      4.12

Table 8 gives the comparison of the overall evaluation metrics for DE-HKELM, GSA-HKELM and DEGSA-HKELM. The accuracy, mean accuracy, mean F-score and attack accuracy of DEGSA-HKELM are higher than those of DE-HKELM and GSA-HKELM, while the false normal rate of DEGSA-HKELM is lower than that of the other two methods. These results show that the hybrid optimization algorithm DEGSA is superior to DE and GSA in determining the optimal parameters to improve the performance of the HKELM method.

(3) KPCA-DEGSA-HKELM
The KPCA algorithm is introduced to reduce the impact of meaningless or less important features on the classification results and computational efficiency. The comparisons between DEGSA-HKELM and KPCA-DEGSA-HKELM are shown in Table 9. In Table 9, the accuracy, mean accuracy and attack accuracy of KPCA-DEGSA-HKELM are 96.69%, 95.66% and 94.85%, which are higher than those of DEGSA-HKELM with 96.59%, 95.58% and 94.70%, respectively. In addition, the false normal rate denotes the probability that the actual attack records are misclassified as normal records, which has significant meaning in ID systems. The false normal rate of KPCA-DEGSA-HKELM is 3.73%, which achieves a percentage decrease of 9.47% compared to that of DEGSA-HKELM with 4.12%. Furthermore, the testing time of DEGSA-HKELM is 0.033581 s, and that of KPCA-DEGSA-HKELM is 0.012569 s. In detail, KPCA-DEGSA-HKELM achieves a percentage decrease of 62.57% in time compared to that of DEGSA-HKELM, which indicates the great computational efficiency of the proposed KPCA-DEGSA-HKELM approach.

(4) Comparisons and discussion
The proposed KPCA-DEGSA-HKELM approach is also tested with the testing dataset T2, the results of which are comparable to those of other literature works. Table 10 gives the comparison among the solutions achieved by the proposed approach and those of the literature methods.

As shown in Table 10, the current work has superior precision for DoS and U2R attacks when compared with that of other methods. Moreover, the accuracy (Acc), mean accuracy (MAcc), mean F-score (MF) and attack accuracy (AAcc) evaluation metrics of the proposed KPCA-DEGSA-HKELM are 99.00%, 95.38%, 87.21% and 94.47%, respectively, which are higher than those of the other methods. The mean F-score of the current work is 87.21%, that of KDDwinner [48] is 58.87%, CSVAC [5] is 66.20%, CPSO-SVM [18] is 71.28% and Dendron [6] is 85.77%; the mean F-score of the current work is improved by approximately 28.34%, 21.01%, 15.93% and 1.44% over that of KDDwinner, CSVAC, CPSO-SVM and Dendron, respectively. The attack accuracy of the current work is 94.47%, that of KDDwinner [48] is 83.74%, CSVAC [5] is 69.83%, CPSO-SVM [18] is 92.62% and Dendron [6] is 87.50%; the attack accuracy of the current work is improved by approximately 10.73%, 24.64%, 1.85% and 6.97% compared to that of KDDwinner, CSVAC, CPSO-SVM and Dendron, respectively. As a result, the proposed approach has the ability to improve the performance on the KDD99 problem.

In addition, to demonstrate the advantage in computational efficiency of the current work, the results (accuracy, training and testing time) of CPSO-SVM [18] and the proposed KPCA-DEGSA-HKELM with the same training dataset T0 are shown in Table 11. It is noted that all the results of CPSO-SVM and the current work are obtained on the same computational platform. In Table 11, the accuracies of the current work with testing datasets T1 and T2 are 96.69% and 99.00%, respectively, which are higher than those of the CPSO-SVM method. Furthermore, the training and testing times of CPSO-SVM with the testing dataset T1 are 19.915382 s and 0.041214 s, and those of KPCA-DEGSA-HKELM are 13.204581 s and 0.012569 s. Specifically, KPCA-DEGSA-HKELM achieves savings of 33.70% in training time and 69.50% in testing time when compared to those of CPSO-SVM. In addition, the training and testing times of CPSO-SVM with the testing dataset T2 are 20.047230 s and 0.426238 s, while those of KPCA-DEGSA-HKELM are 14.830142 s and 0.168058 s, respectively. In detail, KPCA-DEGSA-HKELM achieves savings of 26.02% in training time and 60.57% in testing time when compared to those of CPSO-SVM. The results of Table 11 reveal the time-saving benefits of KPCA-DEGSA-HKELM for the KDD99 dataset.

5.2. Case 2: Intrusion detection with the UNSW-NB15 dataset

5.2.1. Dataset description
UNSW-NB15 was proposed by Nour et al. [29] as a modernized dataset reflecting contemporary network traffic characteristics and new low-footprint attack scenarios. This dataset was created at the Australian Centre for Cyber Security (ACCS) utilizing the IXIA tool, which generated a modern representative of real modern normal and synthetic abnormal network traffic in the synthetic environment. The UNSW-NB15 dataset is quite different from the previous KDD99 dataset and reflects a more recent and complex threat environment. The UNSW-NB15 dataset contains approximately 2540044 data instances, and each instance consists of 49 features. As Table 12 shows, this dataset includes 9 types of attack in total.
Table 9
Comparison of the overall evaluation metrics between DEGSA-HKELM and KPCA-DEGSA-HKELM with the KDD99 dataset T1 .
Method                Acc (%)   MAcc (%)   MF (%)   AAcc (%)   FAR (%)   FNR (%)   Testing time (s)
DEGSA-HKELM 96.59 95.58 94.86 94.70 0.90 4.12 0.033581
KPCA-DEGSA-HKELM 96.69 95.66 94.38 94.85 1.10 3.73 0.012569

Table 10
Comparison of the solutions achieved by the proposed approach with those of other literature methods with the KDD99 dataset T2 .
Method             Precision (%): Normal   DoS   PRB   U2R   R2L     Acc (%)   MAcc (%)   MF (%)   AAcc (%)   FAR (%)
KDDwinner [48] 99.45 97.12 83.32 13.16 8.40 92.71 81.91 58.87 83.74 25.39
SVM [49] 99.30 99.50 97.50 19.70 28.80 95.70 N/A N/A N/A 0.70
CSVAC [5] 99.91 99.72 65.74 42.59 20.47 94.86 75.26 66.20 69.83 3.04
CPSO-SVM [18] 96.87 99.98 63.61 11.08 50.27 98.05 93.45 71.28 92.62 3.26
RTMAS-AIDS [15] 97.89 99.79 91.86 24.68 35.90 95.86 N/A N/A N/A 2.13
Dendron [6] 99.36 99.12 82.83 52.63 79.54 98.85 89.85 85.77 87.50 0.75
Current work 96.35 99.99 89.57 54.55 68.15 99.00 95.38 87.21 94.47 0.94

Table 11
The results of the CPSO-SVM method and the current work for different KDD99 testing datasets.
Testing dataset    Acc (%): CPSO-SVM, Current work    Training time (s): CPSO-SVM, Current work, Time saved    Testing time (s): CPSO-SVM, Current work, Time saved
T1 94.35 96.69 19.915382 13.204581 33.70% 0.041214 0.012569 69.50%
T2 98.05 99.00 20.047230 14.830142 26.02% 0.426238 0.168058 60.57%

The results are obtained on the same computational platform.
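The "time saved" percentages in Table 11 follow directly from the raw timings; a quick check (the helper name is illustrative):

```python
def pct_saved(baseline_s, ours_s):
    # Percentage reduction relative to the baseline timing.
    return (1 - ours_s / baseline_s) * 100

# Timings from Table 11 (seconds)
print(round(pct_saved(19.915382, 13.204581), 2))  # T1 training: 33.7
print(round(pct_saved(0.041214, 0.012569), 2))    # T1 testing: 69.5
```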

Table 12
Attack categories of the UNSW-NB15 dataset.
Class Meaning
Generic A technique that works against all block-ciphers (with a given
block and key size), without consideration of the structure of the
block-cipher
Exploits The attacker knows of a security problem within an operating
system or a piece of software and leverages that knowledge by
exploiting the vulnerability
Fuzzers Attempting to cause a program or network to suspend by
feeding it randomly generated data
DoS A malicious attempt to make a server or a network resource
unavailable to users, usually by temporarily interrupting or
suspending the services of a host connected to the Internet
Reconnaissance Contains all strikes that can simulate attacks that gather
information
Analysis         Contains different attacks, including port scans, spam and HTML
                 file penetrations
Backdoors        A technique in which a system security mechanism is bypassed
                 stealthily to access a computer or its data
Shellcode        A small piece of code used as the payload in the exploitation of a
                 software vulnerability
Worms            An attack that replicates itself to spread to other computers; it often
                 uses a computer network to spread, relying on the security
                 failures of the target computer to access it

Fig. 10. F-scores and mean F-scores for CPSO-SVM, Dendron and KPCA-DEGSA-HKELM with the UNSW-NB15 dataset.

The UNSW-NB15 dataset has been divided into two subsets, namely, the training set and testing set [29], which contain 175341 and 82332 records, respectively. It is noted that the partitioned dataset has only 43 features with the class label, removing 6 features from the original dataset. The training set and testing set are used by the majority of researchers; therefore, they are chosen to be the experiment datasets in the current work. Table 13 indicates the detailed information of the datasets,
Table 13
The instances of training and testing datasets for the UNSW-NB15 dataset.
Class            Number of instances: UNSW-NB15 training set   UNSW-NB15 testing set   Training dataset D0   Testing dataset D1
Normal 56000 37000 3420 30786
Generic 40000 18871 366 3291
Exploits 33393 11132 760 6849
Fuzzers 18184 6062 485 4353
DoS 12264 4089 172 1546
Reconnaissance 10491 3496 270 2433
Analysis 2000 677 45 401
Backdoors 1746 583 40 306
Shellcode 1133 378 40 338
Worms 130 44 22 22
Total 175341 82332 5620 50325

Table 14
Confusion matrix of the UNSW-NB15 dataset D1 .
Actual class   Classified class: Norm.   Gene.   Expl.   Fuzz.   DoS   Reco.   Anal.   Back.   Shell.   Worms     Recall (%)
Norm. 30043 7 208 310 6 52 17 6 134 3 97.59
Gene. 44 3014 126 42 30 14 1 0 20 0 91.58
Expl. 363 15 5895 176 136 118 9 12 121 4 86.07
Fuzz. 958 8 82 3107 32 66 6 3 90 1 71.38
DoS 121 28 648 141 395 98 17 7 90 1 25.55
Reco. 102 6 172 38 37 1976 6 0 96 0 81.22
Anal. 54 0 69 62 70 31 101 0 13 1 25.19
Back. 30 0 47 68 56 34 15 41 15 0 13.40
Shell. 77 0 4 11 0 31 0 0 215 0 63.61
Worms 1 0 12 0 0 0 0 0 0 9 40.91
Precision (%) 94.50 97.92 81.16 78.56 51.84 81.65 58.72 59.42 27.08 47.37

Norm.: Normal, Gene.: Generic, Expl.: Exploits, Fuzz.: Fuzzers, Reco.: Reconnaissance, Anal.: Analysis, Back.: Backdoors, Shell.: Shellcode.

where the training and testing datasets are denoted as D0 and D1, respectively. D0 and D1 are applied to compare the performance of the proposed KPCA-DEGSA-HKELM model with other literature methods.

Table 15
The detailed F-score of each class obtained by KPCA-DEGSA-HKELM and the other literature methods with the UNSW-NB15 dataset.
Class            CPSO-SVM [18] (%)   Dendron [6] (%)   Current work (%)
Normal           92.81               95.58             96.02
Generic          87.45               88.96             94.65
Exploits         74.21               76.22             83.55
Fuzzers          44.94               68.84             74.80
DoS              20.23               16.76             34.23
Reconnaissance   56.15               53.42             81.43
Analysis         15.89               31.48             35.25
Backdoors        12.41               28.01             21.87
Shellcode        32.99               22.32             37.99
Worms            27.83               6.56              43.90
Mean             46.49               48.81             60.37

5.2.2. Experimental results and discussion
The confusion matrix derived from the testing process of the proposed KPCA-DEGSA-HKELM approach is given in Table 14. Table 15 shows the detailed F-score of each class obtained by CPSO-SVM [18], Dendron [6] and the proposed KPCA-DEGSA-HKELM approach. From Table 15, the F-scores for the classes Normal, Generic, Exploits, Fuzzers, DoS, Reconnaissance, Analysis, Shellcode and Worms achieved by the KPCA-DEGSA-HKELM approach are higher than those of the other two methods, and the F-score of the Backdoors class obtained by the KPCA-DEGSA-HKELM approach is higher than that of the CPSO-SVM method. Particularly, the F-scores for the DoS, Reconnaissance and Worms classes are improved by over 10% when compared with those of the other two methods. Moreover, the mean F-score provided by KPCA-DEGSA-HKELM is 60.37%, by CPSO-SVM is 46.49% and by Dendron is 48.81%. The KPCA-DEGSA-HKELM approach achieves percentage increases of approximately 13.88% and 11.56% in mean F-score when compared to that of CPSO-SVM and Dendron, respectively.

In addition, to indicate the performance of KPCA-DEGSA-HKELM more intuitively, the F-scores and mean F-scores of CPSO-SVM, Dendron and KPCA-DEGSA-HKELM are shown in Fig. 10. The column bars containing slashes, horizontal lines and pure black are the F-scores obtained by CPSO-SVM, Dendron and KPCA-DEGSA-HKELM, respectively. It is apparent that the F-scores for the nine classes and the overall mean F-score obtained by KPCA-DEGSA-HKELM are higher than those of the other two methods.

Table 16 gives the comparison of the overall evaluation metrics for the proposed approach and other literature methods. The accuracy (Acc), mean accuracy (MAcc), mean F-score (MF) and attack accuracy (AAcc) of the current work are higher than those of other literature methods, while the false attack rate (FAR) of the current work is lower than that of other literature methods. In particular, the accuracy of the proposed KPCA-DEGSA-HKELM approach is 89.01%, which represents a 3.45% increase over the suboptimal result of 85.56% achieved by DT [50]. Therefore, the proposed approach has the ability to improve the performance on the UNSW-NB15 problem.

In addition, to demonstrate the advantage in computational efficiency of the current work, the results (accuracy, training and testing time) of CPSO-SVM [18] and the proposed KPCA-DEGSA-HKELM with the same training dataset D0 are shown in Table 17. As Table 17 indicates, the accuracy of the current work is 89.01%, which yields a percentage increase of 7.95% over that of the CPSO-SVM method with 81.06%. Furthermore, the training and testing times of CPSO-SVM are 114.226274 s and 14.430140 s, while those of KPCA-DEGSA-HKELM are 43.306235 s and 2.567050 s, respectively. Specifically, KPCA-DEGSA-HKELM achieves savings of 62.09% in training time and 82.21% in testing time,

Fig. 11. Flowchart of the TE process.

when compared with those of CPSO-SVM. The results of Table 17 indicate the time-saving benefits of KPCA-DEGSA-HKELM for the UNSW-NB15 dataset.

Table 16
Comparison of overall evaluation metrics for the proposed approach and other literature methods with the UNSW-NB15 dataset.
Method          Acc (%)   MAcc (%)   MF (%)   AAcc (%)   FAR (%)
CPSO-SVM [18]   81.06     49.98      46.49    44.99      5.16
ANN [50]        81.34     N/A        N/A      N/A        21.13
NB [50]         82.07     N/A        N/A      N/A        18.56
DT [50]         85.56     N/A        N/A      N/A        15.78
GALR-DT [51]    81.42     N/A        N/A      N/A        6.39
MP [6]          73.89     27.91      26.61    20.57      4.56
C4.5 [6]        85.15     49.33      48.79    44.14      2.54
Dendron [6]     84.33     52.21      48.81    47.19      2.61
CAI [23]        82.74     N/A        N/A      N/A        36.46
Current work    89.01     59.65      60.37    55.43      2.41

Table 17
The results of the CPSO-SVM method and current work with the testing dataset D1.
Method                         Acc (%)   Training time (s)   Testing time (s)
CPSO-SVM [18]                  81.06     114.226274          14.430140
Current work                   89.01     43.306235           2.567050
Accuracy improved/time saved   7.95%     62.09%              82.21%
The results are obtained on the same computational platform.

5.3. Case 3: Intrusion detection with the industrial TE process

To further demonstrate the ID performance of the proposed KPCA-DEGSA-HKELM approach in a real and complex environment, an industrial simulation experiment platform with a nonlinear and complex multicomponent TE process is built. This platform simulates the continuous chemical system and attack activities in the real TE process to obtain the TE intrusion data.

Table 18
Attack categories of the TE intrusion dataset.
Attack             Target
Step               C header pressure loss—reduced availability (stream 4)
Random variation   A, B, and C feed compositions (stream 4)
Slow drift         Reaction kinetics
Sticking           Reactor cooling water valve

5.3.1. Intrusion simulation experiment of the TE process
The revised TE process model was created by Bathelt et al. [52], which provided a real chemical simulation platform. Five major units, namely the reactor, product condenser, vapor–liquid separator, product stripper and recycle compressor, constitute the TE process [53]. The revised TE process contains four reactants named A, C, D and E as well as an inert component, B. The setup produces two liquid products named G and H and a byproduct named F through a reaction system composed of four irreversible chemical reactions, and the flowchart of the TE process is shown in Fig. 11 [26,52].

The intrusion experiment of the revised TE process was conducted in operating mode 1, which seems to be the most commonly used mode in the literature [30]. The process was first run for 40 h under normal operating conditions (Phase I), and an attack (including the step, random variation, slow drift and sticking attack categories) was then introduced to the process for 100 h (Phase II). Taking the slow drift attack as an example, the corresponding plots of the reactor pressure and G product quality are shown in Figs. 12 and 13, respectively. Figs. 12 and 13 illustrate that the process is in control in Phase I; however, when introducing a slow drift attack at 40 h, the process tends to become out of control in Phase II. In the practical industrial process, fluctuations in process variables, such as the reactor pressure, will cause a significant reduction in product quality and even severe damage to the equipment. As a result, it is significant to propose an effective ID system for detecting malicious activities in industrial processes.

Finally, the values of the 41 measured variables and 12 manipulated variables were collected in sequence during the continuous
Fig. 12. Plot of the reactor pressure during both Phase I and Phase II.

Fig. 13. Plot of the G product quality during both Phase I and Phase II.

Table 19
The instances of training and testing datasets for the TE intrusion dataset.
Class              Training dataset E0   Testing dataset E1
Normal             300                   10000
Step               300                   10000
Random variation   300                   10000
Slow drift         300                   10000
Sticking           300                   10000
Total              1500                  50000

operation of the process with a sampling time of 0.01 h, and those 52 variables are used as feature variables in the TE intrusion dataset. Table 18 describes the four major attack categories of the TE intrusion dataset. Table 19 provides the detailed information of the instances in the datasets, where the training and testing datasets are denoted as E0 and E1, respectively.

5.3.2. Experimental results and discussion
The proposed KPCA-DEGSA-HKELM approach and the CPSO-SVM method are tested with training dataset E0 and testing dataset E1. Table 20 presents the comparison of the solutions between the proposed approach and the CPSO-SVM method. As Table 20 shows, the current work exhibits F-scores for the five classes that are superior to those of CPSO-SVM. Moreover, the accuracy (Acc), mean F-score (MF) and attack accuracy (AAcc) evaluation metrics of the proposed KPCA-DEGSA-HKELM are 95.82%, 95.90% and 95.16%, respectively, which are higher than those of CPSO-SVM, while the false attack rate (FAR) of the current work is lower than that of CPSO-SVM. As a result, the proposed approach presents the ability to improve the performance on the TE intrusion problem.

In addition, to demonstrate the advantage in computational efficiency of the current work, the results (accuracy, training and testing time) of CPSO-SVM [18] and the proposed KPCA-DEGSA-HKELM with the same training dataset E0 are shown in Table 21. As Table 21 shows, the accuracy of the current work is 95.82%, which achieves a percentage increase of 5.11% when compared to that of the CPSO-SVM method with 90.71%. Furthermore, the training and testing times of CPSO-SVM are 54.184571 s and 0.252235 s, while those of KPCA-DEGSA-HKELM are 21.283896 s and 0.128424 s, respectively. In detail, KPCA-DEGSA-HKELM achieves savings of 60.72% in training time and 49.09% in testing time when compared to those of CPSO-SVM. The results in Table 21 demonstrate the time-saving benefits of KPCA-DEGSA-HKELM for the TE intrusion dataset.

6. Conclusions

An effective IDS named KPCA-DEGSA-HKELM is proposed that can detect malicious attacks successfully. For the classic benchmark KDD99 dataset, the HKELM model, based on the presented hybrid kernel function, outperforms Poly_KELM with the polynomial kernel and RBF_KELM with the RBF kernel, as gauged by all the overall evaluation metrics. The hybrid optimization algorithm DEGSA, which combines the advantages of both GSA and the DE algorithm, is employed to search for the optimal parameters of the HKELM model. To show the performance of DEGSA-HKELM, DE-HKELM and GSA-HKELM are developed and evaluated. The three models are applied to the training dataset T0 and testing dataset T1, and the results indicate that DEGSA-HKELM achieves percentage increases in the mean F-score of approximately 2.09% and 1.25% compared to that of DE-HKELM and GSA-HKELM, respectively. In addition, the KPCA algorithm is introduced for the dimensionality reduction and feature extraction of the ID data. Then, the KPCA-DEGSA-HKELM approach is carried out with the training dataset T0 and testing dataset T2. Compared with other literature results, the mean F-score of the current work is improved by approximately 28.34%, 21.01%, 15.93% and 1.44% over that of KDDwinner, CSVAC, CPSO-SVM and Dendron, respectively. For the real modern UNSW-NB15 dataset, all the overall evaluation metrics of the current work are higher than those of other literature methods, while the false attack rate (FAR) of the current work is lower than that of other literature methods. In particular, the accuracy of the proposed KPCA-DEGSA-HKELM approach achieves a 3.45% percentage increase over the suboptimal result of 85.56% obtained by DT. For the industrial TE intrusion dataset, the accuracy of the current work achieves a percentage increase of 5.11% when compared to that of the CPSO-SVM method. Furthermore, KPCA-DEGSA-HKELM achieves a higher computational efficiency with savings of 60.57%, 82.21% and 49.09% in testing time compared to that of CPSO-SVM for the KDD99, UNSW-NB15 and TE intrusion datasets, respectively. The experimental results for the three ID datasets demonstrate the effectiveness and efficiency benefits of the current work.
16 L. Lv, W. Wang, Z. Zhang et al. / Knowledge-Based Systems 195 (2020) 105648
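To connect these conclusions to the model itself: HKELM replaces the random-feature output layer of an ELM with a kernel matrix, and its hybrid kernel blends a polynomial kernel and an RBF kernel through the weight coefficient w listed in the Notation. The sketch below is illustrative only; the convex combination K = w·K_poly + (1 − w)·K_RBF and the parameter roles (w, a, b, p, C) follow common hybrid-kernel KELM practice in the spirit of [34–36], not the paper's verbatim equations.

```python
import numpy as np

def hybrid_kernel(X1, X2, w=0.5, a=1.0, b=1.0, p=2):
    """Assumed hybrid kernel: w * polynomial + (1 - w) * RBF."""
    poly = (X1 @ X2.T + b) ** p                           # polynomial part
    sq_dist = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    rbf = np.exp(-sq_dist / (2.0 * a ** 2))               # RBF part, width a
    return w * poly + (1.0 - w) * rbf

def kelm_train(X, T, C=1e6, **kernel_params):
    """Output weights beta = (I / C + K)^-1 T, as in kernel ELM [34]."""
    K = hybrid_kernel(X, X, **kernel_params)
    return np.linalg.solve(np.eye(len(X)) / C + K, T)

def kelm_predict(X_new, X_train, beta, **kernel_params):
    """Decision values for new samples; predicted class = argmax per row."""
    return hybrid_kernel(X_new, X_train, **kernel_params) @ beta
```

With the penalty parameter C large, the training-set predictions approach the target matrix T; C and the kernel parameters are exactly the quantities DEGSA searches over in the proposed approach.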

Table 20
Comparison of the solutions between the proposed approach and CPSO-SVM with the TE intrusion dataset.

               F-score (%)
Method         Normal  Step   Random  Slow   Sticking   Acc (%)  MF (%)  AAcc (%)  FAR (%)
CPSO-SVM [18]  88.57   95.99  90.19   87.40  91.98      90.71    90.83   90.78     9.58
Current work   91.28   99.09  95.54   97.41  96.21      95.82    95.90   95.16     1.53

Random: Random variation; Slow: Slow drift.
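The MF column of Table 20 is the unweighted mean of the five per-class F-scores, which can be checked directly from the displayed values (the current-work mean computed from the two-decimal entries lands at 95.91 rather than the reported 95.90, a rounding artifact of the truncated inputs):

```python
# Per-class F-scores (%) copied from Table 20.
f_cpso = [88.57, 95.99, 90.19, 87.40, 91.98]  # CPSO-SVM [18]
f_ours = [91.28, 99.09, 95.54, 97.41, 96.21]  # current work

mean_f = lambda scores: sum(scores) / len(scores)
print(round(mean_f(f_cpso), 2))  # 90.83, matching the MF column
print(round(mean_f(f_ours), 2))  # 95.91 (95.90 reported from unrounded scores)
```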

Table 21
Results of the CPSO-SVM method and current work with the testing dataset E1.

Method                        Acc (%)  Training time (s)  Testing time (s)
CPSO-SVM [18]                 90.71    54.184571          0.252235
Current work                  95.82    21.283896          0.128424
Accuracy improved/time saved  5.11%    60.72%             49.09%

The results are obtained on the same computational platform.
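Note that the last row of Table 21 mixes two kinds of figures: the accuracy entry is an absolute percentage-point gain, while the time entries are relative reductions, saving = (t_CPSO-SVM − t_current) / t_CPSO-SVM. Both reproduce from the table's values:

```python
acc_cpso, acc_ours = 90.71, 95.82                # Acc (%) from Table 21
train_cpso, train_ours = 54.184571, 21.283896    # training times (s)
test_cpso, test_ours = 0.252235, 0.128424        # testing times (s)

saving = lambda old, new: 100.0 * (old - new) / old  # relative reduction (%)
print(round(acc_ours - acc_cpso, 2))                 # 5.11 percentage points
print(round(saving(train_cpso, train_ours), 2))      # 60.72
print(round(saving(test_cpso, test_ours), 2))        # 49.09
```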

Notation

a       Exponent parameter of the RBF kernel function
b       Constant parameter of the polynomial kernel function
B       Euclidean distance
C       Penalty parameter
fit     Fitness value
G       Gravitational constant
H       Output matrix of the hidden layer
I       Identity matrix
K       Kernel function
M       Gravitational mass
p       Exponent parameter of the polynomial kernel function
tar     Target vector
T       Output matrix of the target
T0      Training dataset
T1, T2  Testing datasets
U       Trial vector
V       Mutant vector
w       Weight coefficient of the hybrid kernel function
x       Input feature vector
y       Output vector
α       Weight vector between the input layer and hidden layer
β       Weight matrix of the hidden layer
γ       Descending coefficient
Γ       High-dimensional feature space
Φ       Nonlinear mapping function

CRediT authorship contribution statement

Lu Lv: Methodology, Software, Writing - original draft. Wenhai Wang: Supervision, Project administration. Zeyin Zhang: Supervision, Project administration. Xinggao Liu: Supervision, Writing - review & editing, Funding acquisition.

Acknowledgments

This work is supported by the National Key R&D Program of China (grant number 2018YFB2004200), the Zhejiang Provincial Natural Science Foundation, PR China (grant number LY18D060002), and the National Natural Science Foundation of China (grant number 61590921); their support is gratefully acknowledged.

References

[1] H.W. Wang, J. Gu, S.S. Wang, An effective intrusion detection framework based on SVM with feature augmentation, Knowl.-Based Syst. 136 (2017) 130–139, http://dx.doi.org/10.1016/j.knosys.2017.09.014.
[2] A.L. Buczak, E. Guven, A survey of data mining and machine learning methods for cyber security intrusion detection, IEEE Commun. Surv. Tutor. 18 (2016) 1153–1176, http://dx.doi.org/10.1109/comst.2015.2494502.
[3] R. Yahalom, A. Steren, Y. Nameri, M. Roytman, A. Porgador, Y. Elovici, Improving the effectiveness of intrusion detection systems for hierarchical data, Knowl.-Based Syst. 168 (2019) 59–69, http://dx.doi.org/10.1016/j.knosys.2019.01.002.
[4] T. Aldwairi, D. Perera, M.A. Novotny, An evaluation of the performance of restricted Boltzmann machines as a model for anomaly network intrusion detection, Comput. Netw. 144 (2018) 111–119, http://dx.doi.org/10.1016/j.comnet.2018.07.025.
[5] W.Y. Feng, Q.L. Zhang, G.Z. Hu, J.X.J. Huang, Mining network data for intrusion detection through combining SVMs with ant colony networks, Future Gener. Comput. Syst. 37 (2014) 127–140, http://dx.doi.org/10.1016/j.future.2013.06.027.
[6] D. Papamartzivanos, F.G. Marmol, G. Kambourakis, Dendron: Genetic trees driven rule induction for network intrusion detection systems, Future Gener. Comput. Syst. 79 (2018) 558–574, http://dx.doi.org/10.1016/j.future.2017.09.056.
[7] C.H. Tsang, S. Kwong, H.L. Wang, Genetic-fuzzy rule mining approach and evaluation of feature selection techniques for anomaly intrusion detection, Pattern Recognit. 40 (2007) 2373–2391, http://dx.doi.org/10.1016/j.patcog.2006.12.009.
[8] F. Salo, A.B. Nassif, A. Essex, Dimensionality reduction with IG-PCA and ensemble classifier for network intrusion detection, Comput. Netw. 148 (2019) 164–175, http://dx.doi.org/10.1016/j.comnet.2018.11.010.
[9] C.F. Tsai, C.Y. Lin, A triangle area based nearest neighbors approach to intrusion detection, Pattern Recognit. 43 (2010) 222–229, http://dx.doi.org/10.1016/j.patcog.2009.05.017.
[10] Y. Li, L. Guo, An active learning based TCM-KNN algorithm for supervised network intrusion detection, Comput. Secur. 26 (2007) 459–467, http://dx.doi.org/10.1016/j.cose.2007.10.002.
[11] C. Xiang, P.C. Yong, L.S. Meng, Design of multiple-level hybrid classifier for intrusion detection system using Bayesian clustering and decision trees, Pattern Recognit. Lett. 29 (2008) 918–924, http://dx.doi.org/10.1016/j.patrec.2008.01.008.
[12] G.Y. Chan, C.S. Lee, S.H. Heng, Policy-enhanced ANFIS model to counter SOAP-related attacks, Knowl.-Based Syst. 35 (2012) 64–76, http://dx.doi.org/10.1016/j.knosys.2012.04.013.
[13] G.Y. Chan, C.S. Lee, S.H. Heng, Discovering fuzzy association rule patterns and increasing sensitivity analysis of XML-related attacks, J. Netw. Comput. Appl. 36 (2013) 829–842, http://dx.doi.org/10.1016/j.jnca.2012.11.006.
[14] G.Y. Chan, C.S. Lee, S.H. Heng, Defending against XML-related attacks in e-commerce applications with predictive fuzzy associative rules, Appl. Soft. Comput. 24 (2014) 142–157, http://dx.doi.org/10.1016/j.asoc.2014.06.053.
[15] W.L. Al-Yaseen, Z.A. Othman, M.Z.A. Nazri, Real-time multi-agent system for an adaptive intrusion detection system, Pattern Recognit. Lett. 85 (2017) 56–64, http://dx.doi.org/10.1016/j.patrec.2016.11.018.
[16] P.Y. Tao, Z. Sun, Z.X. Sun, An improved intrusion detection algorithm based on GA and SVM, IEEE Access 6 (2018) 13624–13631, http://dx.doi.org/10.1109/access.2018.2810198.
[17] J.P. Liu, J.Z. He, W.X. Zhang, T.Y. Ma, Z.H. Tang, J.P. Niyoyita, W.H. Gui, ANID-SEoKELM: Adaptive network intrusion detection based on selective ensemble of kernel ELMs with random features, Knowl.-Based Syst. 177 (2019) 104–116, http://dx.doi.org/10.1016/j.knosys.2019.04.008.
[18] F.J. Kuang, S.Y. Zhang, Z. Jin, W.H. Xu, A novel SVM by combining kernel principal component analysis and improved chaotic particle swarm optimization for intrusion detection, Soft Comput. 19 (2015) 1187–1199, http://dx.doi.org/10.1007/s00500-014-1332-7.
[19] A.I. Saleh, F.M. Talaat, L.M. Labib, A hybrid intrusion detection system (HIDS) based on prioritized k-nearest neighbors and optimized SVM classifiers, Artif. Intell. Rev. 51 (2019) 403–443, http://dx.doi.org/10.1007/s10462-017-9567-1.
[20] M.R.G. Raman, N. Somu, K. Kirthivasan, R. Liscano, V.S.S. Sriram, An efficient intrusion detection system based on hypergraph - Genetic algorithm for parameter optimization and feature selection in support vector machine, Knowl.-Based Syst. 134 (2017) 1–12, http://dx.doi.org/10.1016/j.knosys.2017.07.005.
[21] A.A. Aburomman, M.B.I. Reaz, A novel SVM-kNN-PSO ensemble method for intrusion detection system, Appl. Soft. Comput. 38 (2016) 360–372, http://dx.doi.org/10.1016/j.asoc.2015.10.011.
[22] A.A. Aburomman, M.B. Reaz, A novel weighted support vector machines multiclass classifier based on differential evolution for intrusion detection systems, Inf. Sci. 414 (2017) 225–246, http://dx.doi.org/10.1016/j.ins.2017.06.007.
[23] C.R. Wang, R.F. Xu, S.J. Lee, C.H. Lee, Network intrusion detection using equality constrained-optimization-based extreme learning machines, Knowl.-Based Syst. 147 (2018) 68–80, http://dx.doi.org/10.1016/j.knosys.2018.02.015.
[24] J.H. Ku, B. Zheng, Intrusion detection based on self-adaptive differential evolution extreme learning machine with Gaussian kernel, in: G. Chen, H. Shen, M. Chen (Eds.), Parallel Archit. Algorithm Program, Paap 2017, 2017, pp. 13–24, http://dx.doi.org/10.1007/978-981-10-6442-5_2.
[25] H. Bostani, M. Sheikhan, Hybrid of binary gravitational search algorithm and mutual information for feature selection in intrusion detection systems, Soft Comput. 21 (2017) 2307–2324, http://dx.doi.org/10.1007/s00500-015-1942-8.
[26] S.M. He, L. Xiao, Y.L. Wang, X.G. Liu, C.H. Yang, J.G. Lu, W.H. Gui, Y.X. Sun, A novel fault diagnosis method based on optimal relevance vector machine, Neurocomputing 267 (2017) 651–663, http://dx.doi.org/10.1016/j.neucom.2017.06.024.
[27] X. Qiu, K.C. Tan, J.X. Xu, Multiple exponential recombination for differential evolution, IEEE Trans. Cybern. 47 (2017) 995–1006, http://dx.doi.org/10.1109/tcyb.2016.2536167.
[28] W. Lee, S.J. Stolfo, A framework for constructing features and models for intrusion detection systems, ACM Trans. Inf. Syst. Secur. 3 (2000) 227–261, http://dx.doi.org/10.1145/382912.382914.
[29] N. Moustafa, J. Slay, UNSW-NB15: A comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set), in: IEEE Military Communications and Information Systems Conference, MilCIS, 2015, http://dx.doi.org/10.1109/MilCIS.2015.7348942.
[30] F. Capaci, E. Vanhatalo, M. Kulahci, The revised Tennessee Eastman process simulator as testbed for SPC and DoE methods, Qual. Eng. 31 (2019) 212–229, http://dx.doi.org/10.1080/08982112.2018.1461905.
[31] G.B. Huang, Q.Y. Zhu, C.K. Siew, Extreme learning machine: A new learning scheme of feedforward neural networks, in: 2004 IEEE Int. Jt. Conf. Neural Networks, Vols. 1–4, Proc, 2004, pp. 985–990, http://dx.doi.org/10.1109/IJCNN.2004.1380068.
[32] G.B. Huang, Q.Y. Zhu, C.K. Siew, Extreme learning machine: Theory and applications, Neurocomputing 70 (2006) 489–501, http://dx.doi.org/10.1016/j.neucom.2005.12.126.
[33] J. Wu, Y. Zhu, Z.C. Wang, Z.J. Song, X.G. Liu, W.H. Wang, Z.Y. Zhang, Y.S. Yu, Z.P. Xu, T.J. Zhang, J.H. Zhou, A novel ship classification approach for high resolution SAR images based on the BDA-KELM classification model, Int. J. Remote Sens. 38 (2017) 6457–6476, http://dx.doi.org/10.1080/01431161.2017.1356487.
[34] G.B. Huang, H.M. Zhou, X.J. Ding, R. Zhang, Extreme learning machine for regression and multiclass classification, IEEE Trans. Syst. Man Cybern. B 42 (2012) 513–529, http://dx.doi.org/10.1109/tsmcb.2011.2168604.
[35] G.F. Smits, E.M. Jordaan, Improved SVM regression using mixtures of kernels, in: Proceeding 2002 Int. Jt. Conf. Neural Networks, Vols. 1–3, 2002, pp. 2785–2790, http://dx.doi.org/10.1109/IJCNN.2002.1007589.
[36] Z.D. Tian, S.J. Li, Y.H. Wang, X.D. Wang, Wind power prediction method based on hybrid kernel function support vector machine, Wind Eng. 42 (2018) 252–264, http://dx.doi.org/10.1177/0309524x17737337.
[37] R. Storn, K. Price, Differential evolution - A simple and efficient heuristic for global optimization over continuous spaces, J. Global Optim. 11 (1997) 341–359, http://dx.doi.org/10.1023/a:1008202821328.
[38] S. Das, P.N. Suganthan, Differential evolution: A survey of the state-of-the-art, IEEE Trans. Evol. Comput. 15 (2011) 4–31, http://dx.doi.org/10.1109/tevc.2010.2059031.
[39] A.K. Qin, V.L. Huang, P.N. Suganthan, Differential evolution algorithm with strategy adaptation for global numerical optimization, IEEE Trans. Evol. Comput. 13 (2009) 398–417, http://dx.doi.org/10.1109/tevc.2008.927706.
[40] E. Rashedi, H. Nezamabadi-Pour, S. Saryazdi, GSA: A gravitational search algorithm, Inform. Sci. 179 (2009) 2232–2248, http://dx.doi.org/10.1016/j.ins.2009.03.004.
[41] E. Rashedi, H. Nezamabadi-pour, S. Saryazdi, BGSA: binary gravitational search algorithm, Nat. Comput. 9 (2010) 727–745, http://dx.doi.org/10.1007/s11047-009-9175-3.
[42] M. Zhang, X.G. Liu, Z.Y. Zhang, A soft sensor for industrial melt index prediction based on evolutionary extreme learning machine, Chin. J. Chem. Eng. 24 (2016) 1013–1019, http://dx.doi.org/10.1016/j.cjche.2016.05.030.
[43] M. Seyedmahmoudian, R. Rahmani, S. Mekhilef, A.M.T. Oo, A. Stojcevski, T.K. Soon, A.S. Ghandhari, Simulation and hardware implementation of new maximum power point tracking technique for partially shaded PV system using hybrid DEPSO method, IEEE Trans. Sustain. Energy 6 (2015) 850–862, http://dx.doi.org/10.1109/tste.2015.2413359.
[44] W.J. Zhang, X.F. Xie, DEPSO: Hybrid particle swarm with differential evolution operator, in: 2003 IEEE Int. Conf. Syst. Man Cybern. Vols 1–5, Conf. Proc, 2003, pp. 3816–3821, http://dx.doi.org/10.1109/ICSMC.2003.1244483.
[45] I.T. Jolliffe, Principal Component Analysis, 2006.
[46] B. Scholkopf, A. Smola, K.R. Muller, Nonlinear component analysis as a kernel eigenvalue problem, Neural Comput. 10 (1998) 1299–1319, http://dx.doi.org/10.1162/089976698300017467.
[47] L.L. Guo, P. Wu, J.F. Gao, S.W. Lou, Sparse kernel principal component analysis via sequential approach for nonlinear process monitoring, IEEE Access 7 (2019) 47550–47563, http://dx.doi.org/10.1109/access.2019.2909986.
[48] C. Elkan, Results of the KDD'99 classifier learning, ACM SIGKDD Explor. Newsl. 1 (2000) 63–64, http://dx.doi.org/10.1145/846183.846199.
[49] S.J. Horng, M.Y. Su, Y.H. Chen, T.W. Kao, R.J. Chen, J.L. Lai, C.D. Perkasa, A novel intrusion detection system based on hierarchical clustering and support vector machines, Expert Syst. Appl. 38 (2011) 306–313, http://dx.doi.org/10.1016/j.eswa.2010.06.066.
[50] N. Moustafa, J. Slay, The evaluation of network anomaly detection systems: Statistical analysis of the UNSW-NB15 data set and the comparison with the KDD99 data set, Int. J. Inf. Secur. 25 (2016) 18–31, http://dx.doi.org/10.1080/19393555.2015.1125974.
[51] C. Khammassi, S. Krichen, A GA-LR wrapper approach for feature selection in network intrusion detection, Comput. Secur. 70 (2017) 255–277, http://dx.doi.org/10.1016/j.cose.2017.06.005.
[52] A. Bathelt, N.L. Ricker, M. Jelali, Revision of the Tennessee Eastman process model, in: 2015 IFAC Symposium on Advanced Control of Chemical Processes ADCHEM, 48, 2015, pp. 309–314, http://dx.doi.org/10.1016/j.ifacol.2015.08.199.
