Ilorin Journal of Computer Science and Information Technology
Vol. 5, No. 1 (2022)
© Department of Computer Science, University of Ilorin
ISSN: 2141-3959 (print)
A Metaheuristic Approach to Network Intrusion Detection
Modinat A. Mabayoje1*, Kayode S. Adewole2, Omonigho E. Ekeruvwe3, Jumoke F. Ajao4,
Akinyemi O. Akinrotimi5, Abdullateef O. Balogun6
1,2,3,5,6 Department of Computer Science, University of Ilorin, Ilorin, Nigeria.
[email protected],
[email protected],
[email protected],
[email protected],
[email protected]
4 Kwara State University, Malete, Nigeria.
[email protected]
*Corresponding author
Abstract
Context: It is crucial to prevent intrusions in networks; hence, developing an intrusion detection system with a robust
detection mechanism is important. Several studies have been conducted in the domain of intrusion detection. However,
some of them suffer from high false-alarm rates owing to the use of raw datasets containing redundancy. Objective: This
paper, therefore, proposes a multi-level dimensionality reduction framework based on meta-heuristic optimization and
Principal Component Analysis (PCA). Method: In this research, PCA was applied for feature extraction. Genetic Algorithm
and Particle Swarm Optimization (GA-PSO) algorithms were used for feature selection to extract the most discriminative
features for the intrusion detection model. In the classification phase, Artificial Neural Network (ANN) and Support
Vector Machine (SVM) algorithms were used to develop the intrusion detection model, using the kddcup.data_10_percent
dataset. Result: Experimental results reveal that the proposed framework achieved an accuracy of 99.7% and an ROC area
of 99.9%, while the time required to build the model is 0.23 seconds. Conclusion: To a very high extent, the incidence of
high false alarms is reduced through the GA-PSO-based feature selection method.
Keywords: Genetic Algorithm, Principal Component Analysis, Particle Swarm Optimization, Support Vector Machine,
Intrusion Detection System.
A Metaheuristic Approach to Network Intrusion Detection
Mabayoje et. al.
1. Introduction
Wireless Sensor Networks (WSNs) are vulnerable to
several security threats because of characteristics such
as their fragile nature, dynamism, fault tolerance,
scalability and distributed deployment. Nowadays,
WSNs play increasingly important roles in the
economy and have therefore become prime targets of
intruders (Rezaeipanah, Mojarad & Sechin Matoori,
2021). In recent research on security threats against
WSNs, Principal Component Analysis (PCA) has been
used for feature selection, with features chosen from a
subset of the leading principal components. This
procedure tends to omit some informative features and
to include insignificant ones in the feature subset. The
technique may therefore not be viable, because some
important features, which may be more sensitive and
useful for classification, can be discarded (Xue et al.,
2012). An adaptation is thus necessary, and Particle
Swarm Optimization (PSO) is proposed and
implemented for optimal feature selection.
Artificial Neural Network (ANN), which simulates the
way the brain analyses information, and Support
Vector Machine (SVM), which performs classification,
regression and outlier detection, are used for
classification, given their proven effectiveness in
detecting diverse types of anomalies and intrusions
(Mohammadi, 2021; Kocher & Kumar, 2021). PSO is a
powerful and efficient global search method that is
well suited to feature selection problems. It is simple
to implement because only a few parameters are
required, and it can search large spaces. It is also
computationally inexpensive and provides a better
representation.
Private communication on the Internet and other
systems is usually exposed to the threat of intrusion
and abuse. As such, intrusion detection systems have
become an essential part of computer resource and
system security. Different methodologies are used for
intrusion detection; however, none of the systems
proposed so far is without faults.
Many previous works on intrusion detection and
prevention have not seriously addressed the issue of
multi-level feature selection. Existing techniques
suffer from issues such as high computational cost,
complex classifier architectures and high memory use
(Kumar, 2021). Studies have shown that when PCA is
used for feature selection, with features chosen from
the most important principal components, some
important features may be missed and insignificant
features included in the feature subset. The technique
may therefore not be viable, because it neglects
certain features that may be more sensitive and
important to the classifier.
In this work, PSO is used for feature selection, while
ANN and SVM are used for classification. PSO is a
powerful and efficient global search method that is
simple to use because only a few parameters are
needed. It is capable of searching very wide spaces, is
computationally inexpensive and offers a better
representation. In recent times, PSO has been used to
develop intrusion detection mechanisms for Internet of
Things (IoT) applications so as to foster security,
integrity and reliability (Liu et al., 2021; Kan et al.,
2021). A genetic algorithm is applied to search the
principal space for feature subset selection (Metawa et
al., 2017; Mirjalili, 2019). This strategy improves on
previous approaches, although the genetic algorithm
still has shortcomings, such as an inability to handle
certain problem variants and a tendency to miss the
global optimum. Hence, optimal feature selection
based on the meta-heuristic approach is considered
important for enhancing classifier performance.
The remainder of this paper is organized as follows:
Section 2 discusses related work in network intrusion
detection, Section 3 outlines the methodology,
Section 4 discusses the results of the research, and
Section 5 presents the conclusion.
2. Related Works
An intrusion is a set of activities that attempts to
compromise the integrity, confidentiality or
availability of computer system resources (Zamboni,
2001). An intruder, therefore, can be characterized as
a system, program or individual who attempts to break
into a computer to carry out an illegal activity
(Graham, 2000). The act of identifying intrusions, i.e.,
activities that attempt to reduce or totally eliminate the
confidentiality and integrity of the information on a
system, is often called intrusion detection (Zamboni,
2001). This can be handled by deploying an intrusion
detection system, that is, a device or software
application that monitors network and/or system
activities for malicious actions and produces reports
(Scarfone et al., 2007).
Gong, Zhong, Yu and Hu (2018) used a genetic
algorithm to detect anomalous network behaviours
based on information theory. Network features related
to attacks are identified using the mutual information
between network features and intrusion types; these
features are then used to derive linear-structure rules
with a Genetic Algorithm (GA). The approach of using
mutual information and the resulting linear rules is
highly effective owing to its reduced complexity and
higher detection rate. However, it considers only
discrete features. Gong, Zulkernine and Abolmaesumi
(2005) presented an implementation of a GA-based
approach to network intrusion detection and described
its software implementation. The method derives a set
of classification rules and uses a support-confidence
framework as the fitness function. Yet another effort is
that of Abdullah et al. (2009), who presented a genetic
algorithm-based performance evaluation for network
intrusion detection. The approach uses information
theory to filter the traffic data.
The use of genetic algorithms for intrusion detection
can be traced back to Crosbie and Spafford (1995),
who applied multiple-agent technology and Genetic
Programming (GP) to detect network anomalies. The
researchers used GP to evolve agents that detect
anomalous network behaviour, and each agent
monitors one parameter of the network audit data. The
proposed methodology has an advantage when many
autonomous agents are used. However, communication
among the agents is problematic, and the procedure
can be very difficult if the agents are not properly
initialized.
Kolias et al. (2011) presented a detailed comparison of
several intrusion detection approaches based on swarm
intelligence. The main focus was on analysing the
efficiency of each technique in the area of intrusion
detection.
Anand et al. (2012) proposed a rule-based feature
selection algorithm to remove redundant attributes and
to select a sensitive feature set that is relevant to the
intrusion analysis engine in wireless sensor networks.
The focus was only on denial-of-service attacks.
Further, a multiclass Support Vector Machine (SVM)
is extended to improve classification accuracy. In
Hassanzadeh et al. (2014), a monitoring strategy was
used for intrusion detection in Wireless Mesh Networks
(WMN), based on two classes: traffic-agnostic and
traffic-aware monitoring. The results show optimal
performance in terms of intrusion detection rate and
resource utilization in WMN. Fawzy et al. (2013)
addressed outlier detection problems in wireless sensor
networks and proposed an outlier detection and
classification mechanism for sensor networks. The
results showed an improvement in the classification
process.
Staudemeyer et al. (2014) proposed a feature selection
mechanism based on custom feature preprocessing.
This strategy works on the variance: features with
higher variance are selected, while those with lower
variance are discarded. This procedure may miss many
essential features. Othman et al. (2014), however,
proposed a feature selection algorithm based on
record-to-record travel, with SVM applied for
classification. In the same vein, Halim et al. (2021)
proposed a technique that preserves much of the
important information in the dataset being analysed
while using a small number of features, as a solution to
network security and intrusion detection problems.
The study used an enhanced Genetic Algorithm
(GA)-based feature selection method, named GA-based
Feature Selection (GbFS), to increase classifier
accuracy. It introduces parameter tuning for GA-based
feature selection along with a novel fitness function.
The improved technique was tested on three
benchmark network traffic datasets and compared with
standard feature selection methods. Results show that
GbFS improved accuracy, giving a maximum accuracy
of 99.80%.
Onah et al. (2021) developed a Genetic Algorithm
Wrapper-Based feature selection and Naïve Bayes for
Anomaly Detection Model (GANBADM) in a Fog
Environment that eliminates extraneous attributes to
reduce time complexity while also producing an
enhanced model that predicts results with greater
accuracy, using the NSL-KDD dataset. The results
showed that the developed system has a higher overall
performance, with 99.73% accuracy and a false
positive rate as low as 0.6%, and that GANBADM
performs better than similar approaches reported in the
literature.
Gamal, Abbas and Sadek (2020) proposed the use of a
Genetic Algorithm (GA) together with Reinforcement
Learning (RL) and threat intelligence to counter
cross-site scripting (XSS) attacks. For validation, the
approach was applied to a real dataset of XSS attacks.
The results show better performance than the
approaches reported in the literature. In addition, the
method is flexible to changes in XSS payloads, its
results are more understandable to end users, and it
improves as the number of attacks increases. Alimi et
al. (2021) developed an IDS model based on the
hybridization of particle swarm optimization (PSO)
and a back-propagation neural network (BPNN) to
classify intrusions in water system infrastructure. PSO
is used to optimize the parameters of the BPNN,
increasing classification efficiency. The iTrust Lab's
secure water treatment dataset was used to validate the
procedure. The experimental results revealed that, on
prominent classification metrics, the results achieved
with the developed BPNN-PSO model (97% accuracy
and 98.7%) are better than those obtained with other
methods, including models from related works. The
proposed model is therefore deemed able to meet the
requirements of cyberattack and intrusion detection in
practical water distribution infrastructure.
2.1 Theoretical Background
This section looks at the genetic algorithm and other
related concepts applied to intrusion detection, as well
as the parameters of the genetic algorithm.
2.1.1 Genetic Algorithm
A Genetic Algorithm (GA) is a programming
technique that mimics biological evolution as a
problem-solving strategy. It is based on Darwin's
principle of evolution and survival of the fittest,
optimizing a population of candidate solutions
towards a predefined fitness. GAs are a class of
computational models inspired by evolution and
natural selection: they map a problem in a specific
domain onto a model using a chromosome-like data
structure and evolve the chromosomes using selection,
recombination and mutation operators. The range of
applications that can make use of the genetic
algorithm is very wide.
2.1.2 Principal Component Analysis
Principal Component Analysis (PCA) is a statistical
method commonly used for data analysis. It is a very
useful technique for feature selection. PCA transforms
raw features into principal features so that the features
are more distinct and their importance can be
visualized. This method has been used in various
areas. Here, features are chosen based on eigenvalues:
features with larger eigenvalues are chosen, while
features with small eigenvalues are disregarded.
2.1.3 Particle Swarm Optimization
Particle Swarm Optimization (PSO) is a swarm-based
stochastic optimization algorithm developed by
Eberhart and Kennedy (1995). As a population-based
method, PSO simulates the social behaviour of
animals. PSO is closely related to two lines of
research: evolutionary algorithms, which, like PSO,
use a population (swarm) to search a wide region of
the solution space of the objective function
simultaneously; and artificial-life algorithms.
2.1.4 Artificial Neural Network
Artificial Neural Networks (ANNs), also simply
referred to as neural networks, are computing systems
inspired by the biological neural networks that
constitute the brains of animals (van den Bergh, 2001;
Basheer and Hajmeer, 2020). In practice, ANN refers
to the part of artificial intelligence that simulates the
functioning of the human brain. ANNs are made up of
interconnected processing units, with inputs and
outputs.
2.1.5 Support Vector Machine
The support vector machine (SVM) is a supervised
machine learning model that uses classification
algorithms for two-group classification problems.
Given sets of labelled training data for each category,
an SVM model can classify new examples, for
instance new text in a text classification problem.
More specifically, an SVM builds a hyperplane or set
of hyperplanes in a high- or infinite-dimensional
space, which can be used for classification or for other
tasks such as outlier detection. Intuitively, a good
separation is achieved by the hyperplane with the
largest distance to the nearest training-data point of
any class, also known as the functional margin. This is
because, in general, the larger the margin, the lower
the generalization error of the classifier (Trevor et al.,
2008).
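To make the margin idea concrete, the sketch below evaluates, for a fixed linear hyperplane w·x + b = 0, the signed distance of each labelled point and returns the smallest one, which is the quantity an SVM maximizes. It does not train an SVM; the function name, the toy points and the two candidate hyperplanes are hypothetical values chosen for illustration only.

```python
import math

def geometric_margin(w, b, points, labels):
    """Smallest signed distance y * (w.x + b) / ||w|| over all labelled points.

    A positive value means every point lies on its correct side of the
    hyperplane; an SVM chooses w and b to make this value as large as possible.
    """
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    distances = []
    for x, y in zip(points, labels):
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        distances.append(y * score / norm_w)
    return min(distances)

# Hypothetical 2-D toy data: label +1 = attack, -1 = normal.
points = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, 0.0)]
labels = [1, 1, -1, -1]

# Two hypothetical candidate hyperplanes that both separate the data.
print(geometric_margin((1.0, 1.0), -2.0, points, labels))
print(geometric_margin((1.0, 0.0), -1.0, points, labels))
```

Both hyperplanes separate the toy data, but the first has a margin of about 1.41 and the second a margin of 1.0, so a maximum-margin classifier would prefer the first.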
3. Methodology
Figure 1 shows the methodology employed in this
paper. Each of the components is discussed in the
subsequent sections.
Figure 1: Architecture of proposed system (components: datasets, preprocessing, feature extraction (PCA), feature selection (GA-PSO), classification (training and testing), evaluation)
3.1 Dataset
The 10% subset of the KDD Cup 99 intrusion detection
dataset, which is derived from the 1998 DARPA
intrusion detection evaluation, was used to implement
our algorithm. The dataset provides designers of
intrusion detection systems (IDS) with a benchmark on
which different systems can be evaluated. The dataset
can be downloaded from https://www.kdd.org/kddcup/view/kdd-cup-1999/Data.
3.2 Preprocessing
Preprocessing of raw features is vital because raw
features can confuse the classifier, resulting in false
alarms. Also, some features increase computational
and memory costs without being exploited by the
classification methods. The raw feature set from the
KDD Cup dataset is expressed as:
$$rf = \{rx_1, rx_2, rx_3, rx_4, \ldots, rx_l\} \qquad (1)$$
where $l$ is the number of features in the raw dataset
(41 for KDD Cup 99). The symbolic features are
removed from the raw feature set, as these features
increase overhead with no benefit to the learning
process.
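As a minimal sketch of this step, the function below parses one comma-separated KDD Cup 99 record and drops the symbolic features. It assumes the standard KDD Cup 99 column order (41 features followed by the class label), in which the symbolic features protocol_type, service and flag occupy the second to fourth columns; the function name, the binary normal/attack labelling and the sample record are illustrative assumptions, not the authors' code.

```python
SYMBOLIC_COLUMNS = {1, 2, 3}  # protocol_type, service, flag (0-based indices)
NUM_FEATURES = 41             # KDD Cup 99 records: 41 features + 1 label

def preprocess_record(line):
    """Return (numeric_features, is_attack) for one KDD Cup 99 CSV record."""
    fields = line.strip().split(",")
    if len(fields) != NUM_FEATURES + 1:
        raise ValueError(f"expected {NUM_FEATURES + 1} fields, got {len(fields)}")
    features, label = fields[:NUM_FEATURES], fields[NUM_FEATURES]
    # Drop the symbolic columns and convert the remaining ones to floats.
    numeric = [float(v) for i, v in enumerate(features) if i not in SYMBOLIC_COLUMNS]
    # Labels end with a period, e.g. "normal." or "smurf."; anything other
    # than "normal" is treated as an attack (binary normal/attack setting).
    is_attack = label.rstrip(".") != "normal"
    return numeric, is_attack

# An illustrative record in the KDD Cup 99 format.
record = ("0,tcp,http,SF,181,5450,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,8,8,"
          "0.00,0.00,0.00,0.00,1.00,0.00,0.00,9,9,1.00,0.00,0.11,0.00,0.00,"
          "0.00,0.00,0.00,normal.")
features, is_attack = preprocess_record(record)
print(len(features), is_attack)  # 38 False
```

Removing the three symbolic columns leaves 38 numeric features per record, which can then be passed to feature extraction.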
3.3 Feature Extraction using Principal Component Analysis (PCA)
PCA is an approach commonly used for data analysis
and is a very useful technique for feature selection.
PCA transforms raw features into principal features so
that the features are more distinct and their
importance can be visualized. This method has been
used in various areas. Here, features are chosen based
on eigenvalues: features with higher eigenvalues are
chosen and features with lower eigenvalues are
disregarded. The flow of PCA for feature extraction is
illustrated in Figure 2.
Figure 2: Flow of PCA for feature extraction (mean calculation, find deviation, find covariance, obtain eigenvalues and eigenvectors)
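The four PCA steps named in Figure 2 (mean calculation, deviation, covariance, eigen-decomposition) can be sketched with NumPy as follows. This is an illustrative implementation rather than the authors' code: the function name, keeping a fixed number k of components, and the random toy data with one deliberately redundant feature are assumptions.

```python
import numpy as np

def pca_extract(X, k):
    """Project X (n_samples x n_features) onto its k leading principal components."""
    # 1. Mean calculation: per-feature mean.
    mean = X.mean(axis=0)
    # 2. Find deviation: centre the data.
    D = X - mean
    # 3. Find covariance: feature-by-feature covariance matrix.
    C = np.cov(D, rowvar=False)
    # 4. Obtain eigenvalues and eigenvectors (eigh, since C is symmetric).
    eigvals, eigvecs = np.linalg.eigh(C)
    # eigh returns eigenvalues in ascending order; keep the k largest.
    order = np.argsort(eigvals)[::-1][:k]
    components = eigvecs[:, order]
    return D @ components, eigvals[order]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[:, 1] = 2.0 * X[:, 0] + 0.01 * rng.normal(size=200)  # a redundant feature
Z, top_eigvals = pca_extract(X, k=3)
print(Z.shape)      # (200, 3)
print(top_eigvals)  # the three largest eigenvalues, in descending order
```

The projected features in Z are mutually uncorrelated, and their variances equal the retained eigenvalues, which is why discarding low-eigenvalue components removes the least informative directions.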
3.4 Feature Selection using Genetic Algorithms
Genetic algorithms (GAs) are part of evolutionary
computing, a rapidly growing area of artificial
intelligence. GAs are search algorithms based on the
principles of natural selection and genetics. In GAs, a
population of chromosomes represents candidate
solutions to the problem. Every chromosome is
represented with a fixed-length bit string. The initial
population of chromosomes is created by assigning 0s
and 1s at random. In this encoding scheme, each
chromosome is a bit string whose length is determined
by the number of principal components in the principal
space. A component whose bit is 1 is selected, and a
component whose bit is 0 is not. Each chromosome
therefore represents a candidate solution, i.e., a subset
of the principal components. The population evolves
by searching for the optimal solution using genetic
operators. GAs have two major problems: local optima
and high computational cost (Hamamoto, 2018). The
flow of the GA for feature selection, which depicts the
algorithm applied, is shown in Figure 3.
Figure 3: Flow of GA for feature selection (first population creation, population evaluation, criterion met?, selection, crossover, mutation, return best individuals)
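A compact sketch of the GA loop in Figure 3, using the bit-string encoding described above (1 = principal component kept, 0 = dropped), is shown below. Every detail the text leaves open is an assumption: the fitness function is a hypothetical stand-in that rewards a known set of "useful" components and penalizes subset size (in the paper the fitness would come from classifier performance), and binary tournament selection, single-point crossover, bit-flip mutation, elitism and all parameter values are illustrative choices.

```python
import random

def ga_select(n_bits, fitness, pop_size=20, generations=40,
              crossover_rate=0.9, mutation_rate=0.02, seed=1):
    """Evolve bit-string chromosomes (1 = keep component) and return the best."""
    rng = random.Random(seed)
    # First population creation: random 0/1 bit strings of fixed length.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)  # population evaluation

    def select():
        # Selection: binary tournament on the current population.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):  # stopping criterion: fixed generation budget
        children = []
        while len(children) < pop_size:
            p1, p2 = select(), select()
            if rng.random() < crossover_rate:
                # Crossover: single point, swapping the tails of the parents.
                cut = rng.randint(1, n_bits - 1)
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            else:
                c1, c2 = p1[:], p2[:]
            for child in (c1, c2):
                # Mutation: flip each bit with a small probability.
                for i in range(n_bits):
                    if rng.random() < mutation_rate:
                        child[i] = 1 - child[i]
                children.append(child)
        pop = children[:pop_size]
        best = max(pop + [best], key=fitness)  # elitism: never lose the best
    return best  # return best individual


# Hypothetical fitness: components 0, 2 and 5 are "useful"; each kept bit costs 0.1.
USEFUL = {0, 2, 5}

def toy_fitness(bits):
    return sum(bits[i] for i in USEFUL) - 0.1 * sum(bits)

best = ga_select(10, toy_fitness)
print(best, toy_fitness(best))
```

Because the best chromosome is carried over between generations, the returned subset is never worse than the best one in the initial random population.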
3.5 Feature Selection using Particle Swarm Optimization (PSO)
Particle Swarm Optimization (PSO) is a
population-based method created by Eberhart and
Kennedy (1995). PSO is an effective and well-regarded
global search method. It is an appropriate algorithm
for feature selection problems for the following
reasons: simple encoding of features, global search
capability, reasonable computational cost, few
parameters and easier implementation. For these
reasons, PSO is applied to feature selection. Its flow
for the feature selection process is shown in Figure 4.
Figure 4: Flow of PSO for feature selection (swarm initialization, particle evaluation, update velocity of a particle, update position of a particle, update fitness of a particle (pbest), update fitness (gbest), criterion met?, return gbest and its value)
The principal space is the search space in which a
subset of principal components (principal features) is
explored and selected using PSO. The particles
represent candidate solutions in the search space and
form a population, also called a swarm. The swarm is
created by assigning 0s and 1s at random. For each
particle, a principal component whose bit is 1 is
selected and one whose bit is 0 is disregarded. In this
way, each particle represents a different subset of
principal components. The swarm is initialized
randomly and then moves through the search
(principal) space, looking for the optimal subset of
features by updating each particle's position and
velocity.
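The text does not give the PSO update equations, so the sketch below assumes a common binary PSO formulation: each velocity component is updated as v = w·v + c1·r1·(pbest − x) + c2·r2·(gbest − x), and each bit is then set to 1 with probability sigmoid(v). The inertia weight w, the coefficients c1 and c2, the velocity clamp v_max, the function name and the toy fitness are all assumptions for illustration, not the authors' settings.

```python
import math
import random

def binary_pso(n_bits, fitness, n_particles=15, iterations=50,
               w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=2):
    """Binary PSO over bit strings; returns (gbest, gbest_value)."""
    rng = random.Random(seed)
    # Swarm initialization: random bit positions, zero velocities.
    X = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(n_particles)]
    V = [[0.0] * n_bits for _ in range(n_particles)]
    pbest = [x[:] for x in X]
    pbest_val = [fitness(x) for x in X]
    g = max(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iterations):  # stopping criterion: fixed iteration budget
        for i in range(n_particles):
            for d in range(n_bits):
                # Update velocity of the particle, clamped to [-v_max, v_max].
                v = (w * V[i][d]
                     + c1 * rng.random() * (pbest[i][d] - X[i][d])
                     + c2 * rng.random() * (gbest[d] - X[i][d]))
                V[i][d] = max(-v_max, min(v_max, v))
                # Update position: bit becomes 1 with probability sigmoid(v).
                X[i][d] = 1 if rng.random() < 1.0 / (1.0 + math.exp(-V[i][d])) else 0
            value = fitness(X[i])
            if value > pbest_val[i]:  # update personal best (pbest)
                pbest[i], pbest_val[i] = X[i][:], value
                if value > gbest_val:  # update global best (gbest)
                    gbest, gbest_val = X[i][:], value
    return gbest, gbest_val  # return gbest and its value


# Hypothetical fitness: components 0, 2 and 5 are "useful"; each kept bit costs 0.1.
USEFUL = {0, 2, 5}

def toy_fitness(bits):
    return sum(bits[i] for i in USEFUL) - 0.1 * sum(bits)

gbest, value = binary_pso(10, toy_fitness)
print(gbest, value)
```

The sigmoid mapping turns a real-valued velocity into a probability of keeping a component, which is one standard way of adapting PSO to the 0/1 subset encoding described above.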
3.6 Classification
This section outlines the classification method, which
is a technique for grouping a given set of data into
classes. This technique can be performed on both
structured and unstructured data. The process begins
by predicting the class of given data points.
3.6.1 Artificial Neural Network
An artificial neural network is made up of numerous
neurons connected according to a particular network
architecture (Mukherjee, 1994). The goal of the neural
network is to transform inputs into meaningful
outputs. The output is determined by the
characteristics of the nodes and the weights associated
with the interconnections among neurons. By
adjusting the connections between the nodes, the
network can adapt to the desired outputs. It resembles
the brain in two respects:
1) Knowledge is acquired by the network from its
environment through a learning process.
2) Interneuron connection strengths, known as synaptic
weights, are used to store the acquired knowledge.
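Both properties can be seen in the simplest possible network, a single neuron trained with the classic perceptron learning rule: knowledge is acquired by adjusting weights whenever the output is wrong, and the learned weights then store that knowledge. This is a hypothetical illustration on a linearly separable toy problem (logical AND); the paper's ANN architecture and training algorithm are not specified in this section and are not reproduced here.

```python
def train_perceptron(samples, labels, lr=1.0, epochs=20):
    """Perceptron learning rule for a single neuron; returns (weights, bias)."""
    w = [0.0] * len(samples[0])  # synaptic weights store the learned knowledge
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            out = predict(w, b, x)
            err = y - out  # learning: weights change only on mistakes
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

def predict(w, b, x):
    """Step activation: fire (1) if the weighted sum exceeds zero."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 0, 0, 1]  # toy target: logical AND
w, b = train_perceptron(X, y)
print([predict(w, b, x) for x in X])  # [0, 0, 0, 1]
```

Multi-layer networks generalize the same idea, replacing the step function with differentiable activations and the perceptron rule with gradient-based training.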
3.6.2 Support Vector Machine
The support vector machine uses a portion of the data
to train the system, finding a set of support vectors
that represent the training data. These support vectors
form an SVM model. Based on this model, the SVM
works with PSO for optimization and feature subset
selection, which further improves the SVM model.
The SVM is then used to classify a given unseen
dataset.
3.6.3 Detection Rate
Detection rate (DR) is calculated as the ratio between
the number of correctly detected intrusions and the
total number of intrusions, that is:
$$\mathrm{DR} = \frac{\#\mathrm{TruePositive}}{\#\mathrm{FalseNegative} + \#\mathrm{TruePositive}} \qquad (2)$$
3.6.4 False Positive Rate
False positive rate (FPR) is calculated as the ratio
between the number of normal connections that are
incorrectly classified as intrusions and the total
number of normal connections, that is:
$$\mathrm{FPR} = \frac{\#\mathrm{FalsePositive}}{\#\mathrm{TrueNegative} + \#\mathrm{FalsePositive}} \qquad (3)$$
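Equations (2) and (3) translate directly into code. The sketch below counts true/false positives and negatives from label lists, treating "attack" as the positive class, and then applies both formulas; the helper names and the ten hypothetical predictions are illustrative only.

```python
def confusion_counts(y_true, y_pred, positive="attack"):
    """Return (TP, FN, FP, TN) with `positive` as the intrusion class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fn, fp, tn

def detection_rate(tp, fn):
    """Equation (2): share of intrusions that are detected."""
    return tp / (fn + tp)

def false_positive_rate(fp, tn):
    """Equation (3): share of normal connections flagged as intrusions."""
    return fp / (tn + fp)

# Hypothetical predictions for 10 connections (4 attacks, 6 normal).
y_true = ["attack"] * 4 + ["normal"] * 6
y_pred = ["attack", "attack", "attack", "normal",
          "normal", "normal", "normal", "normal", "normal", "attack"]
tp, fn, fp, tn = confusion_counts(y_true, y_pred)
print(detection_rate(tp, fn))       # 0.75
print(false_positive_rate(fp, tn))  # 0.16666666666666666
```

Here three of the four attacks are detected (DR = 0.75) and one of the six normal connections is misflagged (FPR = 1/6).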
4. Results and Discussion
In the testing stage, for each test record, an initial
population is created using the record and random
mutations in various features. This population is
compared with every chromosome generated in the
training stage. The part of the population that is less
closely related to all the training data than the rest is
removed. Crossover and mutation are applied to the
remainder of the population, which becomes the
population of a new generation. The procedure
continues until the generation size reduces to one. The
class of the training chromosome that is the closest
relative of the sole surviving chromosome of the test
record is returned as the predicted type.
4.1 Results Based on All Features
This experiment used "kddcup.data_10_percent" as the
training dataset and "corrected" as the testing dataset.
In this case, the training set comprises 44,104 records,
of which 3,232 are normal connection records and
40,872 are attack connection records, while the test set
consists of 55,894 records, of which 55,036 are normal
connection records and 858 are attack connection
records. Table 1 reports the per-class classification
results obtained with all features.
Table 1: Classification results based on all features
Class | TP Rate | FP Rate | F-Measure | MCC | Precision | Recall | ROC Area | PRC Area
Normal | 0.985 | 0.073 | 0.964 | 0.918 | 0.945 | 0.985 | 0.959 | 0.928
Attack | 0.927 | 0.015 | 0.952 | 0.918 | 0.979 | 0.927 | 0.958 | 0.961
Weighted Avg | 0.959 | 0.048 | 0.959 | 0.918 | 0.960 | 0.959 | 0.959 | 0.943
Figure 5: Classification results based on all features (TP rate, FP rate, F-measure, MCC, precision, recall, ROC area and PRC area for the Normal and Attack classes)
This classification examines how accurately the full
feature set allows the dataset to be classified and what
errors it produces. The total time required to build
this model, an important parameter when comparing
classification algorithms, is 0.23 seconds.
Table 2: Standard metrics for system evaluation
Actual Class | Instances (Predicted Label) | Percentage (%)
Normal | 95908 | 95.9099
Intrusion | 4090 | 4.0901
Kappa statistic | 0.9166
Mean absolute error | 0.0412
Root mean squared error | 0.201
Relative absolute error | 8.3484%
Root relative squared error | 40.4829%
Total Number of Instances | 99998
4.2 Results Based on Principal Component Analysis
This experiment used "kddcup.data_10_percent" as the
training dataset and "corrected" as the testing dataset.
In this case, the training set comprises 44,104 records,
of which 419 are normal connection records and
43,685 are attack connection records, while the test set
contains 55,894 records, of which 55,617 are normal
connection records and 277 are attack connection
records. Table 3 reports the per-class classification
results obtained with the PCA features.
Table 3: Classification results based on PCA using multi-class classifier
Class | TP Rate | FP Rate | F-Measure | MCC | Precision | Recall | ROC Area | PRC Area
Attack | 0.990 | 0.005 | 0.992 | 0.986 | 0.994 | 0.990 | 0.997 | 0.996
Normal | 0.995 | 0.010 | 0.994 | 0.986 | 0.993 | 0.995 | 0.997 | 0.996
Weighted Avg | 0.993 | 0.007 | 0.993 | 0.986 | 0.993 | 0.993 | 0.997 | 0.996
Figure 6: Classification results based on principal component analysis (TP rate, FP rate, F-measure, MCC, precision, recall, ROC area and PRC area for the Attack and Normal classes)
In this classification, we examine the performance of
the principal components with a multi-class classifier,
in terms of the accuracy achieved and the errors made
in classifying the dataset. The total time required to
build this model, a crucial parameter when comparing
classification algorithms, is 19.16 seconds.
Table 4: Standard metrics for system evaluation
Actual Class | Instances (Predicted Label) | Percentage (%)
Normal | 99302 | 99.304
Intrusion | 696 | 0.696
Kappa statistic | 0.9859
Mean absolute error | 0.021
Root mean squared error | 0.0858
Relative absolute error | 4.2648%
Root relative squared error | 17.2826%
Total Number of Instances | 99998
4.3 Results Based on Genetic Algorithm
This experiment used "kddcup.data_10_percent" as the
training dataset and "corrected" as the testing dataset.
In this case, the training set comprises 44,104 records,
of which 303 are normal connection records and
43,801 are attack connection records. The test set
contains 55,894 records, of which 55,855 are normal
connection records and 39 are attack connection
records. Table 5 reports the per-class classification
results obtained with the GA-selected features.
Table 5: Classification results based on GA using Bagging
Class | TP Rate | FP Rate | F-Measure | MCC | Precision | Recall | ROC Area | PRC Area
Attack | 0.993 | 0.001 | 0.996 | 0.993 | 0.999 | 0.993 | 0.999 | 0.999
Normal | 0.999 | 0.007 | 0.994 | 0.986 | 0.995 | 0.999 | 0.999 | 0.999
Weighted Avg | 0.997 | 0.004 | 0.997 | 0.993 | 0.997 | 0.997 | 0.999 | 0.999
Figure 7: Classification results based on GA (TP rate, FP rate, F-measure, MCC, precision, recall, ROC area and PRC area for the Attack and Normal classes)
In this classification, the performance of the genetic
algorithm using bagging was examined in terms of the
accuracy achieved and the errors made in classifying
the dataset. The total time required to build this
model, a crucial parameter when comparing
classification algorithms, is 10.08 seconds.
Table 6: Standard metrics for system evaluation
Actual Class | Instances (Predicted Label) | Percentage (%)
Normal | 99656 | 99.658
Intrusion | 342 | 0.342
Kappa statistic | 0.9931
Mean absolute error | 0.0062
Root mean squared error | 0.0556
Relative absolute error | 1.2497%
Root relative squared error | 11.1959%
Total Number of Instances | 99998
4.4 Results Based on Genetic Algorithm and Particle Swarm Optimization (GA-PSO)
This experiment used "kddcup.data_10_percent" as the
training dataset and "corrected" as the testing dataset.
In this case, the training set comprises 44,104 records,
of which 305 are normal connection records and
43,799 are attack connection records. The test set
contains 55,894 records, of which 55,831 are normal
connection records and 63 are attack connection
records. Table 7 reports the per-class classification
results obtained with the GA-PSO-selected features.
Table 7: Classification results based on Genetic Algorithm and Particle Swarm Optimization (GA-PSO)
Class | TP Rate | FP Rate | F-Measure | MCC | Precision | Recall | ROC Area | PRC Area
Attack | 0.993 | 0.001 | 0.996 | 0.993 | 0.999 | 0.993 | 0.999 | 0.998
Normal | 0.999 | 0.007 | 0.997 | 0.993 | 0.995 | 0.999 | 0.999 | 0.996
Weighted Avg | 0.996 | 0.004 | 0.996 | 0.993 | 0.996 | 0.996 | 0.999 | 0.997
Figure 8: Classification results based on GA-PSO (TP rate, FP rate, F-measure, MCC, precision, recall, ROC area and PRC area for the Attack and Normal classes)
In this classification, the performance of the GA-PSO
model was examined in terms of the accuracy
achieved and the errors made in classifying the
dataset. The total time required to build this model, a
crucial parameter when comparing classification
algorithms, is 2.31 seconds.
Table 8: Standard metrics for system evaluation
Actual Class | Instances (Predicted Label) | Percentage (%)
Normal | 99630 | 99.632
Intrusion | 368 | 0.368
Kappa statistic | 0.9925
Mean absolute error | 0.0061
Root mean squared error | 0.0588
Relative absolute error | 1.2366%
Root relative squared error | 11.8515%
Total Number of Instances | 99998
4.5
Models Comparison
This study examined the accuracy and error of different classification methods on the dataset. To keep the evaluation simple, two standard intrusion detection metrics, detection rate and false positive rate, were used alongside the classical accuracy measure.
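A minimal sketch of the two metrics, assuming attack is treated as the positive class: detection rate is TP/(TP+FN) and false positive rate is FP/(FP+TN). The counts in the usage line are hypothetical and are not taken from the experiments.

```python
def detection_metrics(tp, fn, fp, tn):
    """Detection rate, false positive rate and accuracy, with 'attack' as the positive class."""
    detection_rate = tp / (tp + fn)       # fraction of attacks that are flagged
    false_positive_rate = fp / (fp + tn)  # fraction of normal traffic wrongly flagged
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return detection_rate, false_positive_rate, accuracy


# Hypothetical counts: 100 attacks (90 caught) and 100 normal connections (5 false alarms).
dr, fpr, acc = detection_metrics(tp=90, fn=10, fp=5, tn=95)
print(f"DR={dr:.3f} FPR={fpr:.3f} accuracy={acc:.3f}")  # DR=0.900 FPR=0.050 accuracy=0.925
```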
According to Table 7, the proposed framework achieved an accuracy of 99.7% and a ROC area of 99.9%, and the time taken to build the GA-based model was 0.23 seconds.
5. Conclusion
This paper presented a strategy for using a genetic algorithm to develop intrusion detection systems. It gave a brief overview of Intrusion Detection Systems (IDS), genetic algorithms and related detection techniques. To implement the new system and measure its performance, this study used the standard KDD99 benchmark dataset and obtained a reasonable detection rate. To quantify the fitness of a chromosome, the study used a standard deviation equation with distance. If a better equation or heuristic can be used in this detection procedure, we believe the detection rate and the process will improve to a great degree. In particular, the false positive rate will most likely be much lower.
Consequently, the designed intrusion detection system can be improved through further statistical analysis with better and more complex equations. The system design is also presented, and the factors affecting the GA are addressed in detail. This use of the genetic algorithm is distinctive because it considers both temporal and spatial information of network connections when encoding the problem. It should therefore be more helpful for intrusion detection. The results of this experiment on the proposed framework show an accuracy of 0.946 and a ROC area of 0.959. From these results, we conclude that the designed framework is efficient compared with existing techniques.
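The framework uses the genetic algorithm for feature selection, so the sketch below shows a generic bit-string GA of that kind. It is not the authors' implementation. Each chromosome is a 0/1 mask over the features, and the GA uses:

- binary tournament selection;
- one-point crossover;
- bit-flip mutation;
- elitism.

The standard deviation fitness with distance mentioned above is not specified in this section. The fitness is therefore a caller-supplied function, and `toy_fitness` with its `TARGET` mask is purely hypothetical. A real run would instead score a classifier trained on the selected features.

```python
import random


def genetic_feature_selection(fitness, n_features, pop_size=20, generations=30,
                              crossover_rate=0.9, mutation_rate=0.05, seed=0):
    """Generic bit-string GA; each chromosome is a 0/1 mask (1 = keep feature).

    Requires n_features >= 2 so that one-point crossover has an interior cut point.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]

    def tournament():
        # Binary tournament selection: the fitter of two random chromosomes wins.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    best = max(pop, key=fitness)
    for _ in range(generations):
        next_pop = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(next_pop) < pop_size:
            p1, p2 = tournament(), tournament()
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, n_features)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation with a small per-gene probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_pop.append(child)
        pop = next_pop
        best = max(pop, key=fitness)
    return best


# Hypothetical fitness: agreement with a made-up set of "useful" features.
TARGET = [1, 1, 0, 0, 1, 0, 0, 0]


def toy_fitness(mask):
    return sum(m == t for m, t in zip(mask, TARGET))


best_mask = genetic_feature_selection(toy_fitness, n_features=len(TARGET))
print(best_mask, toy_fitness(best_mask))
```

Because of elitism, the best fitness never decreases from one generation to the next. In a GA-PSO pipeline, the selected mask would then be passed on to the next stage.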
Future work will involve creating a standard test dataset for the genetic algorithm proposed in this paper and then applying it in a test environment. The parameter settings for the genetic algorithm should be determined through these experiments. Incorporating knowledge from various security sensors into a standard rule base is another promising direction for this study.
References
Ahmad, I. (2014). Enhancing MLP performance in intrusion detection using optimal feature subset selection based on genetic principal components. Applied Mathematics & Information Sciences.
Alimi, O., Ouahada, K., Abu-Mahfouz, A., Rimer, S., & Alimi, K. (2021). Intrusion detection for water distribution systems based on an hybrid particle swarm optimization with back propagation neural network. In 2021 IEEE AFRICON (pp. 1-5). doi: 10.1109/AFRICON51333.2021.9570951.
Basheer, I., & Hajmeer, M. N. (2020). Artificial neural networks: Fundamentals, computing, design, and application. Journal of Microbiological Methods. doi: 10.1016/S0167-7012(00)00201-3.
Bobor, V. (2006). Efficient intrusion detection system architecture based on neural networks and genetic algorithms. Department of Computer and Systems Sciences, Stockholm University/Royal Institute of Technology, KTH/DSV.
Brad, L. M., & Shaw, M. J. (1996). Genetic algorithms with dynamic niche sharing for multimodal function optimization. In Proceedings of IEEE International Conference on Evolutionary Computation (pp. 786-791). IEEE.
Crosbie, M., & Spafford, E. (1995). Applying genetic programming to intrusion detection. In Working Notes for the AAAI Symposium on Genetic Programming (pp. 1-8). Cambridge, MA: MIT Press.
Gamal, M., Abbas, H., & Sadek, R. (2020, April). Hybrid approach for improving intrusion detection based on deep learning and machine learning techniques. In The International Conference on Artificial Intelligence and Computer Vision (pp. 225-236). Springer, Cham.
Gong, R. H., Zulkernine, M., & Abolmaesumi, P. (2005). A software implementation of a GA-based approach to network intrusion detection. http://www.cse.msu.edu/~cse848/2011/Student_papers/Tavon_Pourboghrat.pdf
Gong, Z., Zhong, P., Yu, Y., & Hu, W. (2018). Diversity-promoting deep structural metric learning for remote sensing scene classification. IEEE Transactions on Geoscience and Remote Sensing, 56.
Graham, R. (2000). FAQ: Network intrusion detection systems. Retrieved from http://www.robertgraham.com/pubs/network-intrusion-detection.html
Halim, Z., Yousaf, M. N., Waqas, M., Sulaiman, M., Abbas, G., Hussain, M., ... & Hanif, M. (2021). An effective genetic algorithm-based feature selection method for intrusion detection systems. Computers & Security, 110, 102448.
Hamamoto, A. H., Carvalho, L. F., Sampaio, L. D. H., Abrão, T., & Proença Jr, M. L. (2018). Network anomaly detection system using genetic algorithm and fuzzy logic. Expert Systems with Applications, 92, 390-402.
Iftikhar, A. (2015). Feature selection using particle swarm optimization in intrusion detection. International Journal of Distributed Sensor Networks, article 9, pp. 9. Hindawi Limited.
Kan, X., Fan, Y., Fang, Z., Cao, L., Xiong, N., Yang, D., & Li, X. (2021). A novel IoT network intrusion detection approach based on adaptive particle swarm optimization convolutional neural network. Information Sciences, 568, 147-162.
Kennedy, J., Eberhart, R. C., & Shi, Y. (1995). Swarm intelligence. San Francisco, CA, USA.
Kocher, G., & Kumar, G. (2021). Machine learning and deep learning methods for intrusion detection systems: Recent developments and challenges. Soft Computing, 25, 9731-9763. https://doi.org/10.1007/s00500-021-05893-0
Metawa, N., Hassan, M. K., & Elhoseny, M. (2017). Genetic algorithm based model for optimizing bank lending decisions. Expert Systems with Applications, 80, 75-82.
Mirjalili, S. (2019). Genetic algorithm. In Evolutionary algorithms and neural networks (pp. 43-55). Springer, Cham.
Mohammadi, M., Rashid, T., Karim, S., Aldalwie, A., Tho, Q., Bidaki, M., & Hosseinzadeh, M. (2021). A comprehensive survey and taxonomy of the SVM-based intrusion detection systems. Journal of Network and Computer Applications, 178, 102983.
Mukherjee, B., Heberlein, L. T., & Levitt, K. N. (1994). Network intrusion detection. IEEE Network, 8(3), 26-41.
Onah, J. O., Abdullahi, M., Hassan, I. H., & Al-Ghusham, A. (2021). Genetic algorithm based feature selection and Naïve Bayes for anomaly detection in fog computing environment. Machine Learning with Applications, 6, 100156.
Rezaeipanah, A., Mojarad, M., & Sechin Matoori, S. (2021). Intrusion detection in computer networks through combining particle swarm optimization and decision tree algorithms. Journal of Business Data Science Research, 1(1), 14-22.
Scarfone, K., & Mell, P. (2007). Guide to intrusion detection and prevention systems (IDPS) (NIST Special Publication (SP) 800-94 Rev. 1 (Draft)). National Institute of Standards and Technology.
Sinclair, C., Pierce, L., & Matzner, S. (1999). An application of machine learning to network intrusion detection. In Proceedings 15th Annual Computer Security Applications Conference (ACSAC'99) (pp. 371-377). IEEE.
Staudemeyer, R., & Omlin, C. (2014). Extracting salient features for network intrusion detection using machine learning methods. South African Computer Journal, 52(1), 82-96.
Wang, L., & Xiao, Y. (2006). A survey of energy-efficient scheduling mechanisms in sensor networks. Mobile Networks and Applications, 11(5), 723-740.
Xue, B., Zhang, M., & Browne, W. N. (2013). Particle swarm optimization for feature selection in classification: A multi-objective approach. IEEE Transactions on Cybernetics, 43(6), 1656-1671.
Zamboni, D. (2001). Using internal sensors for computer intrusion detection. Center for Education and Research in Information Assurance and Security, Purdue University.