A Machine Learning-Based Intrusion Detection
Hosted by Alexandria University
a School of IT & Computing, American University of Nigeria, Nigeria
b Department of Computer Science, Lagos State Polytechnic, Ikorodu, Nigeria
c Department of Computer Science and Communication, Østfold University College, Halden, Norway
KEYWORDS
Intrusion Detection System; Machine Learning; Internet of Things; Min-max Normalization; UNSWNB-15; Principal Component Analysis; Cat boost; XgBoost

Abstract
The Internet of Things (IoT) refers to the collection of all those devices that could connect to the Internet to collect and share data. The number and variety of such devices continue to grow tremendously and, together with the proliferation of Internet connections and the advent of new technologies such as the IoT, pose new privacy and security risks. Various and sophisticated intrusions are driving the IoT paradigm into computer networks. Companies are increasing their investment in research to improve the detection of these attacks. By comparing the highest rates of accuracy, institutions are picking intelligent procedures for testing and verification. The adoption of IoT in different sectors, including health, has also continued to increase in recent times, and IoT applications have become well known among technology researchers and developers. Unfortunately, the striking challenge of IoT is the privacy and security issues resulting from the energy limitations and scalability of IoT devices. Therefore, how to address the security and privacy challenges of IoT remains an important problem in the computer security field. This paper proposes a machine learning-based intrusion detection system (ML-IDS) for detecting IoT network attacks. The primary objective of this research is to apply supervised ML algorithms to build an IDS for IoT. In the first stage of this research methodology, feature scaling was done using the Minimum-maximum (min–max) concept of normalization on the UNSW-NB15 dataset to limit information leakage on the test data. This dataset is a mixture of contemporary attacks and normal activities of network traffic grouped into nine different attack types. In the next stage, dimensionality reduction was performed with Principal Component Analysis (PCA). Lastly, six proposed machine learning models were used for the analysis. The experimental results of our findings were evaluated in terms of validation dataset, accuracy, the area under the curve, recall, F1, precision, kappa, and Matthews correlation coefficient (MCC). The findings were also benchmarked with the existing works, and our results were competitive with an accuracy of 99.9% and MCC of 99.97%.

* Corresponding author.
E-mail addresses: [email protected] (Y. Kayode Saheed), [email protected] (S. Misra), [email protected] (M. Kristiansen Holone), [email protected] (R. Colomo-Palacios).
Peer review under responsibility of Faculty of Engineering, Alexandria University.
https://doi.org/10.1016/j.aej.2022.02.063
1110-0168 © 2022 THE AUTHORS. Published by Elsevier BV on behalf of Faculty of Engineering, Alexandria University.
This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
persed system for IoT applications in smart cities. Using the ECDSA method, the authors [25] offer a key exchange protocol for cluster-based wireless sensor networks in an IoT environment that improves the probability of key exchange between sensor nodes and CH. It increases key management performance in terms of key sharing, but the computational complexity is substantially raised by the rekeying procedure. The study [26] suggested a hash key-based key management mechanism for cluster-based wireless sensor networks that employs random key pre-distribution for WSN in IoT. Three performance indicators were used in the study: packet loss rate, energy usage, and latency. Although the effort enhances performance by establishing a secure link for one-hop and multi-hop communication, the network's cluster heads are not movable. In 6LoWPAN and IEEE 802.15.4, the researchers [27] provided a system for detecting denial of service threats. For simulation, it is tested and built using the Ebbits network adapter platform and Contiki OS. The security manager component of Ebbits includes a DoS protection manager. It collects intrusion detection warnings via a network-based intrusion detection system built on Suricata, an open source intrusion detection system.

Previous research looked at solutions from the standpoints of IoT security threats and practices, as well as different machine learning algorithms, datasets, and implementation tools. Additionally, some have focused on encryption and cryptography methods [28], which are dependent on key management techniques. Because keys are shared between sensor nodes, there are still key management difficulties in encrypted solutions [29]. Many key management systems [30] are probabilistic when it comes to key sharing in a clustered environment. A probabilistic key management method cannot guarantee that two nodes in separate clusters will be able to establish a shared key; if some of the neighbor nodes are unable to do so, they will be unable to participate in the network. Furthermore, because the same key is used by multiple nodes, the network is at higher risk if any of them is compromised. Hence, we proposed an ML-based IDS for detecting IoT network attacks. As a result, communication overhead is reduced, and there is no need for a foreign key among Cluster-Head (CH) and Cluster-Node (CN) for secret communication whenever the cluster node transfers to a new region. We investigate an ML-based IDS because ML-based models work well in enhancing scalability and minimizing energy consumption.

IoT devices are anticipated to become more prevalent than smartphones and to have access to the most up-to-date complex information, such as confidential information [31]. As a result, the number of attacks will increase, and the attack predictor variables will increase [32]. Another major challenge of IoT in the health care industry will be to provide network safety concerning possible attacks in health care systems [33]. IDS is a technology established to address network security. Because IoT applications play an important role in our lives, it is important to advance IoT machine learning-based IDS capable of detecting attacks.

The IoT is a novel cohort of IT, which is presently a hot topic of research for organizations, citizens, and governments around the world, with its security issues gaining more consideration [34]. IDS technology is a significant technique to safeguard the network's security, which is presently a popular topic in IoT security [35]. Hence, this paper aims to present IDS-based machine learning algorithms for detecting IoT attacks. The central contributions of this research are as follows.

- To use the Minimum-Maximum (Min-Max) normalization technique to ensure that all the feature values are on the same scale.
- To adopt PCA for dimensionality reduction to transform the data into principal components.
- To design and implement an IDS that satisfies IoT protocol requirements with the UNSWNB-15 dataset, as against a dataset obtained in a conventional network that suffers from numerous issues.
- To develop several lightweight IDS models for IoT networks that are efficient.
- To compare the performance of the proposed models with existing techniques.

This paper is organized as follows. Section 2 discusses the related work. Section 3 presents the proposed methodology. Section 4 reports the results and discussion. Section 5 concludes the paper.

2. Related work

In this section, the past studies of IDS in IoT are presented. Diro et al. [24] proposed a distributed deep learning-based IoT attack detection system. The results of their work gave 96% accuracy. Hoda et al. [36] proposed an IDS with low-capacity devices for IoT applications. The experimental results of their work achieved 99.4% for Denial of Service. Information about the dataset used for the analysis was not provided. The study of [37] utilized the NSL-KDD dataset with the deep learning (DL) method in cybersecurity. This work used a self-taught DL approach in which sparse auto-encoders were used to perform unsupervised feature learning on training data. The learned characteristics were used to classify the labeled test dataset into attack and normal. The authors evaluated performance using the n-fold cross-validation methodology, and the obtained result seemed to be reasonable. Two recent publications can also be referred to that focus on applying ML approaches to issues with security in IoT architectures using the KDD99 dataset.

Bostani and Sheikhan [38] introduced an altered K-means strategy for shrinking the training dataset and balancing the data used to train ELMs and SVMs. The experimental findings of the proposed model gave 96.02 percent accuracy and a 5.92 percent false alarm rate. The clustering with self-Organized Ant Colony Networks (CSOACN) was used to categorize network traffic as benign or usual traffic.

The authors [24] described attack detection using a fog-to-things design. The authors conducted a comparison between a shallow and a deep neural network utilizing a free online dataset.

The fundamental objective of this effort was to spot four distinct types of attack and abnormality. The system achieved 98.27 percent accuracy with a DNN model and 96.75 percent accuracy with a shallow NN model. According to the authors [38], wireless sensor networks and the Internet, the IoT's primary units, are insecure, making the IoT suffer from various assaults. The same researchers suggest a new architecture for real-time-based intrusion detection, consisting of the anomaly-based intrusion detection components and requirements for spotting two types of routing assaults referred to as selective and collectors routing assaults in the IoT. When the selective attack and collector attack were conducted concurrently, the suggested hybrid real-time technique produced a true-positive of 76.19% and a false positive of 5.92%.

Anthi et al. [11] proposed an IoT-based intrusion detection system. The authors successfully used several machine learning models to recognize network monitoring probing and simple kinds of DoS attacks for this purpose. The data set is generated by capturing network traffic for four (4) successive days with the software known as Wireshark. Weka was utilized to apply machine learning classifiers.

The study [39] proposed an intrusion detection model that uses a two-dimension reduction and classification module with two tiers. Additionally, this model was created to detect malicious behavior such as R2L and U2R attacks. The PCA and LDA were employed to decrease the dimensions. The entire experiment was conducted using the NSL-KDD dataset. NB and the Certainty Factor version of K-NN were used in the two-tier classification module to detect suspicious activities. Kozik et al. [40] demonstrated a cloud-based classification-based threat detection solution. In that paper, an ELM scaled in the Apache Spark cloud infrastructure is used to analyze simulated Netflow structured data. The authors in [41] proposed different emerging technologies for the treatment, study, and investigation of victims with Covid-19. The result of their findings showed that technologies like Big Data, AI, and IoT would be essential for the treatment of people sick with Covid-19. The work in [42] proposes a platform based on the IoT to identify and monitor Covid-19 occurrence. They used the ML algorithms SVM, Decision Stump, K-NN, NN, Decision Table, NB, ZeroR, and OneR. The experimental results of their findings revealed that five (5) of these classification algorithms gave an accuracy of more than 90%. The study in [43] proposed a platform for healthcare professionals in the Covid-19 epidemic by utilizing artificial intelligence and the Internet of things. The authors found out that the challenges faced by health care workers can be reduced with the adoption of IoT. Table 1 depicts existing methods for IoT attacks classification using different ML strategies.

The majority of research in the literature focused on potential solutions from the perspectives of IoT standards and technologies, architecture types, IoT security threats and practices, dissimilar machine learning approaches, datasets, and tools for implementation.

Unlike the previous efforts, in this paper, we study IDS approaches for resource-constrained devices in the network. The distinction is that the technique will perform feature scaling with the min–max method and do classification independently using its chosen features, which are chosen by PCA. Table 1 explains and summarizes the existing methods. The majority of the classifiers use ELM, K-means, LDA, and SVM algorithms for feature selection, as shown in the table. The majority of existing studies utilize the NSLKDD and KDDCUP99 datasets to conduct experiments. These are older datasets in terms of identifying R2L, DoS, probing, and U2R assaults, whereas the UNSW-NB15 dataset, which is the most recent and captures a wireless network in terms of detecting exploits, DoS, generic, fuzzers, reconnaissance, backdoors, and worms, will be used for this research.

2.1. Motivation of the present work

The IoT is the driving force behind home automation, improved manufacturing, modern healthcare, and smart cities. Our businesses, communities, government machinery, and Critical National Infrastructure are all being pushed toward the formation of a linked knowledge-based society by CNI. IoT devices and smart technologies are critical in smart homes, intelligent transportation, health care, smart cities, and smart grids. In the sphere of IoT, anomaly and attack detection in the IoT ecosystem is a growing concern. Threats and attacks against IoT infrastructure are increasing in lockstep with the increasing utilization of IoT infrastructure across all domains. Thousands of assaults are known to emerge regularly as a result of the addition of multiple protocols, primarily from IoT. The majority of these assaults are minor variations of previously identified cyberattacks. This shows that even advanced techniques like cryptography have a hard time identifying even tiny mutations of threats with time. The success of ML in numerous big data sectors has sparked interest in cybersecurity. Because of improvements in CPU characteristics, the application of ML has become practicable. In this research, we have adopted the Min-max approach for feature scaling and PCA for FS. However, there are other feature scaling methods such as the z-score technique, and FS methods like LDA. The z-score technique necessitates the knowledge of the standard deviation, which isn't always possible, while the LDA is sensitive to outliers. As a result, we decided to employ Min-max in the first phase and PCA in the second.

3. Proposed model

This section discusses the proposed IDS for detecting attacks on IoT applications. The data generated from a smartphone and sensors are uploaded to the cloud. The data in the cloud is not safe as this data is vulnerable to attack. The attack can be launched directly on the cloud or via transmission. The dataset employed in this research is the UNSW-NB15 dataset [46]. This dataset contains up-to-date attack types and was released recently [47]. In the first stage of this research methodology, feature scaling was done using the min–max concept of normalization on the UNSWNB15 dataset to limit information leakage on the test data. In the next stage, dimensionality reduction was performed with PCA. Data preprocessing was the first analysis performed after the acquisition and loading of the dataset. Data preprocessing is vital as it helps in eliminating outliers and removing redundant attributes. The Normalization method with the Min-max technique was used for data preprocessing. The output of the min–max is fed into the feature selection algorithm known as PCA. The PCA selected ten (10) important components out of the forty-nine attributes in the dataset. The XGBoost, CatBoost, KNN, SVM, QDA, and NB classifiers are then trained on the reduced dataset. The architecture of our proposed model is shown in Fig. 1.

3.1. Data preprocessing

Data preprocessing can be seen as a significant task in the ML field as it helps to eliminate defects in the dataset [48]. This is the first stage of the proposed methodology. It is intended to change the raw IoT network attack data into a format that is effective to use for further analysis [49].

3.1.1. Normalization

Normalization is a feature scaling technique in which the aim is to have all the values of the attributes on the same scale. There are various normalization methods, including standardized moment, z-score normalization, and min–max normalization [50]. We used the min–max normalization technique in this paper.

3.1.2. Min-max normalization technique

According to [51], attributes are normalized in the range [0,1] based on the equation.
K is the variance-covariance matrix of the observations. To solve equation (3), the constraint $b_1^{\top} b_1 = 1$ must be imposed using a Lagrange multiplier:

$$Z = b_1^{\top} K b_1 - v\,(b_1^{\top} b_1 - 1) \qquad (4)$$

Maximizing equation (4) entails differentiating it with respect to $b_1$ and setting the derivative to zero:

$$\frac{\partial Z}{\partial b_1} = 2 K b_1 - 2 v b_1 = 0 \qquad (5)$$

Equation (5) gives $K b_1 = v b_1$, where $b_1$ is an eigenvector of $K$ and $v$ is the corresponding eigenvalue.

Xgboost is one of the recently introduced ensemble machine-learning algorithms [62], a highly effective tree boosting method that has been used to generate state-of-the-art results in different applications. Xgboost uses ensembles of trees to perform feature selection by ranking feature importance [63]. The feature importance selected by the Xgboost in this paper is shown in Fig. 2.

Quadratic discriminant analysis (QDA) is the next-generation classifier in the family of discriminant analysis. QDA gives better results than LDA [73]. It separates observations using a quadratic function [74]. In this paper, the QDA was used for the classification of the ten components selected by PCA. The calibration plot of QDA in the analysis is shown in Fig. 5.

4. Naïve Bayes

NB is a simple, extremely scalable [75] classifier that is grounded on the Bayes Theorem [76]. NB is used to predict the probability of a class belonging to either normal or attack classes [77]. It operates easily in training and classification phases. NB assumes that all attributes in the vector are equally important and independent [78]. In this paper, the NB was used for the classification of the ten components selected by PCA. The calibration plot of NB in the experimental analysis is shown in Fig. 6.

Table 2 Cat Boost Algorithm.
1: Input: {Y_i, Z_i}_{i=1}^{n}, J
2: Output: r [1, u]
3: M_j ← 0 for L = 1...u;
4: For r ← 1 to J do
5:   For l ← 1 to u do
6:     T_l ← s_l − M_{(j)−1}(Y_j);
7:   For L ← 1 to u do
8:     k M ← Learning model ((x, t), r(1) l);
9:     M_l ← M_l + k M;
10: Return M_u

5. Results and discussion

The experimental analysis findings are reported and discussed in this section. This paper adopted the UNSW-NB15 dataset. The UNSW-NB15 was formed by using the IXIA PerfectStorm tool to extract a mixture of contemporary attacks and normal activities of network traffic [47]. The attacks of UNSW-NB15 were grouped into nine different attack types. They are Analysis, Backdoor, DoS, Exploit, Generic, Reconnaissance, Fuzzers, Shellcode, and Worm. The dataset was split into two, where 75% of the dataset was used for training the model, and 25% was used for testing the model.

5.1. Performance measure

The performance of the model was evaluated in terms of the accuracy, area under the curve, recall, precision, F1, kappa, and Matthews Correlation Coefficient. The experimental results of the proposed models are presented in Table 3.

To corroborate the results of our findings, the decision boundary of each of the models was also generated. The decision boundary gives meaningful insight into how each of the models has learned the task. The decision boundary of XgBoost gives insight into how XgBoost learns the classification task, as demonstrated in Fig. 7.

The decision boundary of KNN gives insight into how KNN learns the classification task, as demonstrated in Fig. 8.

The decision boundary of SVM gives insight into how SVM learns the classification task, as demonstrated in Fig. 9.

The decision boundary of QDA gives insight into how QDA learns the classification task, as demonstrated in Fig. 10.

The decision boundary of NB gives insight into how NB learns the classification task, as demonstrated in Fig. 11.

In addition, the results of our findings were compared against recent studies that applied IDS to IoT. Rathore and Park [79] proposed a semi-supervised machine learning technique for IoT. The experimental analysis of their work was done on NSL-KDD. The dataset used in their work suffers from numerous issues, according to the authors in [80]. We think that this dataset should not be adopted for IoT as it was obtained from a conventional network [32]. The majority of datasets utilized in previous papers lack real-world features. This is why the majority of anomaly-based intrusion detection systems in IoT are unsuitable for use in a production environment. Additionally, they are incapable of adjusting to the continual changes in network architecture. This requires us to implement an IDS that satisfies IoT protocol requirements like the Wireless low-power personal area network. Therefore, an IDS that is made for the IoT environment should work under high-speed settings, big data capacity, low power consumption, and processing. The authors in [81] also adopted the NSL-KDD dataset. This same dataset limitation was the major problem in their work. The work in [82] suffers in terms of accuracy and the dataset used for the experimental analysis, which does not reflect contemporary attacks and is not a good fit for the IoT environment. The study in [34] also adopted a dataset that is of a lower standard than NSLKDD. The main issues in their work were the dataset employed for analysis and low accuracy. The authors in [84]-[88] evaluated the performances of their models in terms of accuracy, precision, and F1 without considering the MCC. However, we extend the performance metrics in our proposed study by introducing the MCC. Our proposed methods outperformed all reported studies, as shown in Table 4, in terms of validation dataset, accuracy, precision, F1 score, and MCC.

Our proposed PCA-XgBoost achieved the highest accuracy, precision, F1, and MCC of all the proposed methods, as revealed in Table 3 and Fig. 12. The PCA-Cat Boost also outperformed other proposed methods with an accuracy of 99.99, precision of 1, F1 of 99.99, and MCC of 99.97.

From Fig. 12, we observed that the PCA-KNN also gave an accuracy of 99.98, precision of 1, F1 of 99.99, and MCC of 99.96. Our proposed PCA-SVM gave an accuracy of 99.98, precision of 0, F1 of 1, and MCC of 99.96. The proposed PCA-QDA gave an accuracy of 99.97, precision of 99.99, F1 of 99.98, and MCC of 99.94. Lastly, the proposed PCA-NB gave an accuracy of 97.14, precision of 96.72, F1 of 97.94, and MCC of 93.41. In the experimental testbed situation, we discover that our proposed models surpass the previous detection technique in terms of accuracy and MCC. It is critical to highlight that our detection system was trained before these experiments, and so the training time remains constant. These results demonstrate that our suggested IDS performs well on simulated network traffic. As a result, we can conclude that our suggested system is capable of reliably detecting security assaults in a variety of network attack scenarios.

5.2. Threats to validity

This section looks at potential challenges to the validity of the verification results acquired in this research.

5.2.1. Internal validity

Internal validity is the degree to which reported results accurately reflect the reality in the population under investigation and are not the product of methodological flaws. There are two key aspects to consider here.

5.2.2. Instrumentation

This refers to discrepancies caused by changes in an instrument's calibration, as well as changes in the scorers, observers, or the device itself. The accuracy, validation dataset, precision, MCC, and F1 metrics used in the validation are all well-known methodologies. As a result, there have been no modifications that could have caused the evaluation results to be incorrect.

5.2.3. Selection

Any factor besides the system that causes posttest disparities is referred to as a selection threat. Therefore, the situation in which the feature scaling is not performed and the data is not on the same scale can be a factor in this work.
[Figure: bar chart comparing Accuracy, Precision, F1 Score, and MCC for the proposed NB, QDA, SVM, KNN, CatBoost, and XgBoost models against L. Deng et al. (2019), Mohammadi & Sabokrou,…, Almiani et al. (2020), Meidan et al. (2017), and Rathore & Park (2018).]
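Most of the metrics named in Section 5.1 (accuracy, precision, recall, F1, kappa, and MCC) can be derived from the four counts of a binary confusion matrix. The sketch below is a minimal illustration with hypothetical counts, not the paper's results; the function name `binary_metrics` is an assumption, and the sketch assumes no denominator is zero. The area under the curve is omitted because it needs ranked scores rather than counts.

```python
import math


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute common binary classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)

    # Cohen's kappa: observed agreement corrected for agreement expected by chance.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total ** 2
    kappa = (accuracy - expected) / (1 - expected)

    # Matthews correlation coefficient: uses all four cells of the matrix.
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denominator

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "kappa": kappa,
        "mcc": mcc,
    }


# Hypothetical counts for an attack-vs-normal classifier on 200 test records.
example = binary_metrics(tp=90, fp=10, fn=5, tn=95)
for name, value in example.items():
    print(f"{name}: {value:.4f}")
```

Because MCC uses true negatives as well as true positives, it can differ noticeably from accuracy and F1 when the classes are imbalanced, which is one reason to report it alongside them.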
5.3. Construct validity

Construct validity is the degree to which the instrument 'interacts' in a way that is consistent with conceptual assumptions, and how effectively the instrument's scores are reflective of the complex framework. This threat stems from the question of whether the experiment accurately replicates the real-world occurrences to be examined. The evaluation criteria in terms of accuracy are very high, indicating the proposed model is consistent.

This has to do with our ability to apply the findings of this research to real-world situations. This threat raises the question, "Can this effect be generalized to various populations, situations, treatment, and measurement attributes?".

On the UNSW-NB15 data, the proposed ML technique for IoT network threat detection was applied and confirmed. The findings are consistent with what has been found in the literature. Validation will be done in an industry context or on the IoT botnet dataset in the future.

6. Conclusion and future work

In this paper, we examined the viability of deploying machine-learning-based intrusion detection in resource-constrained IoT environments. To that aim, we built an intelligent IDS capable of detecting abnormal behavior on insecure IoT networks by combining feature dimensionality reduction and machine learning methods. We evaluated our scheme's performance using the UNSW-NB15 dataset to determine the optimal approach for machine learning-based IDS. Security concerns have become a major roadblock to the development of the IoT. Security detection tasks could be handled by machine learning-based IDS. The PCA algorithm was used for dimensionality reduction to select ten components. The model was evaluated on a recent dataset, UNSW-NB15, that supports contemporary attacks and is very appropriate for IoT applications. Also, communication overhead is reduced in the proposed model, and there is no need for a foreign key as required in encryption methods for IoT network security. Our results revealed that the suggested approach achieves higher F1 scores, indicating a stronger overall detection performance. Based on the experimental results from network simulations and testbed implementations, we can infer that using machine learning techniques for successful anomaly detection in the IoT environment is both realistic and practicable. The comparison of the proposed PCA-XgBoost, PCA-Cat Boost, PCA-KNN, PCA-SVM, PCA-QDA, and PCA-NB with existing studies demonstrates outstanding accuracy and can address the issue of labeled data in IoT applications. The experimental findings of our work were superior to the state-of-the-art in terms of validation dataset, precision, F1, MCC, and accuracy, with two of our proposed models attaining 99.99%. The architecture can be organized with smart cities, smart homes, and healthcare devices as a unit that detects attacks in the IoT settings. The future work will be to adopt an ensemble model with a novel dataset suitable for the IoT environment. In addition, this approach can be enhanced in the future by incorporating deep learning models. More specifically, the BoT-IoT dataset will be utilized for experimental analysis and compared to the UNSWNB-15 with a deep learning model for network traffic classification.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

This paper is partially funded by the Research Council of Norway (RCN) in the INTPART program under the project "Reinforcing Competence in Cybersecurity of Critical Infrastructures: A Norway-US Partnership (RECYCIN)" with the project number #309911.

References

[1] B.B. Zarpelão, R.S. Miani, C.T. Kawakani, S.C. de Alvarenga, A survey of intrusion detection in Internet of Things, J. Netw. Comput. Appl. 84 (2017) 25–37, https://doi.org/10.1016/j.jnca.2017.02.009.
[2] D. Miorandi, S. Sicari, F. De Pellegrini, I. Chlamtac, Internet of things: Vision, applications and research challenges, Ad Hoc Networks 10 (7) (2012) 1497–1516, https://doi.org/10.1016/j.adhoc.2012.02.016.
[3] A.B. Feroz Khan, G. Anandharaj, A Multi-layer Security approach for DDoS detection in Internet of Things, Int. J. Intell. Unmanned Syst. 9 (3) (2020) 178–191, https://doi.org/10.1108/IJIUS-06-2019-0029.
[4] Cisco Delivers Vision of Fog Computing to Accelerate Value from Billions of Connected Devices | The Network, https://newsroom.cisco.com/press-release-content?type=webcontent&articleId=1334100 (accessed Nov. 30, 2020).
[5] S. Sicari, A. Rizzardi, L.A. Grieco, A. Coen-Porisini, Security, privacy and trust in Internet of things: The road ahead, Comput. Networks 76 (2015) 146–164, https://doi.org/10.1016/j.comnet.2014.11.008.
[6] J. Jin, J. Gubbi, S. Marusic, M. Palaniswami, An information framework for creating a smart city through internet of things, IEEE Internet Things J. 1 (2) (2014) 112–121, https://doi.org/10.1109/JIOT.2013.2296516.
[7] D. Singh, G. Tripathi, A.J. Jara, A survey of Internet-of-Things: Future vision, architecture, challenges and services, 2014 IEEE World Forum Internet Things, WF-IoT 2014 (2014) 287–292, https://doi.org/10.1109/WF-IoT.2014.6803174.
[8] C. Perera, C.H. Liu, S. Jayawardena, M. Chen, A Survey on Internet of Things from Industrial Market Perspective, IEEE Access 2 (2015) 1660–1679, https://doi.org/10.1109/ACCESS.2015.2389854.
[9] H.A. Abdul-Ghani, D. Konstantas, A comprehensive study of security and privacy guidelines, threats, and countermeasures: An IoT perspective, J. Sens. Actuator Networks 8 (2) (2019), https://doi.org/10.3390/jsan8020022.
[10] V. Adat, B.B. Gupta, Security in Internet of Things: issues, challenges, taxonomy, and architecture, Telecommun. Syst. 67 (3) (2018) 423–441, https://doi.org/10.1007/s11235-017-0345-9.
[11] E. Anthi, L. Williams, P. Burnap, Pulse: An adaptive intrusion detection for the internet of things, IET Conf. Publ. 2018 (CP740) (2018) 1–4, https://doi.org/10.1049/cp.2018.0035.
[12] S. Cirani, G. Ferrari, L. Veltri, Enforcing security mechanisms in the IP-based internet of things: An algorithmic overview, Algorithms 6 (2) (2013) 197–226, https://doi.org/10.3390/a6020197.
[13] C. Thirumalai, S. Mohan, G. Srivastava, An efficient public key secure scheme for cloud and IoT security, Comput. Commun. 150 (2020) 634–643, https://doi.org/10.1016/j.comcom.2019.12.015.
[14] A. Riahi Sfar, E. Natalizio, Y. Challal, Z. Chtourou, A roadmap for security challenges in the internet of things, Digit Commun Netw 4 (2) (2018) 118–137.
[15] M. Zolanvari, M.A. Teixeira, L. Gupta, K.M. Khan, R. Jain, Machine Learning-Based Network Vulnerability Analysis of Industrial Internet of Things, IEEE Internet Things J. 6 (4) (2019) 6822–6834, https://doi.org/10.1109/JIOT.2019.2912022.
[16] A. Alrawais, A. Alhothaily, C. Hu, X. Cheng, Fog Computing for the Internet of Things: Security and Privacy Issues, IEEE Internet Comput. 21 (2) (2017) 34–42, https://doi.org/10.1109/MIC.2017.37.
[28] J.H. Anajemba, Y. Tang, C. Iwendi, A. Ohwoekevwo, G. Srivastava, O. Jo, Realizing efficient security and privacy in IoT networks, Sensors (Switzerland) 20 (9) (2020) 1–24, https://doi.org/10.3390/s20092609.
[29] A.B. Feroz Khan, G. Anandharaj, A cognitive key management technique for energy efficiency and scalability in securing the sensor nodes in the IoT environment: CKMT, SN Appl. Sci. 1 (12) (2019), https://doi.org/10.1007/s42452-019-1628-4.
[30] P. Vijayakumar, M. Azees, V. Chang, J. Deborah, B. Balusamy, Computationally efficient privacy preserving authentication and key distribution techniques for vehicular ad hoc networks, Cluster Comput. 20 (3) (2017) 2439–2450, https://doi.org/10.1007/s10586-017-0848-x.
[31] Y.K. Saheed, M.O. Arowolo, Efficient Cyber Attack Detection on the Internet of Medical Things-Smart Environment Based on Deep Recurrent Neural Network and Machine Learning Algorithms, IEEE Access 9 (2021) 161546–161554, https://doi.org/10.1109/ACCESS.2021.3128837.
[32] A. Khraisat, I. Gondal, P. Vamplew, J. Kamruzzaman, and A.
[17] Y. K. Saheed, ‘‘Performance Improvement of Intrusion Alazab, ‘‘A novel ensemble of hybrid intrusion detection system
Detection System for Detecting Attacks on Internet of Things for detecting internet of things attacks,” Electron., vol. 8, no. 11,
and Edge of Things,” in Artificial Intelligence for Cloud and Edge 2019, doi: 10.3390/electronics8111210.
Computing. Internet of Things (Technology, Communications and [33] J. John, M.S. Varkey, M. Selvi, Security attacks in s-wbans on
Computing), S. Misra, T. K. A., V. Piuri, and L. Garg, Eds. iot based healthcare applications, Int. J. Innov. Technol. Explor.
Springer, Cham, 2022, pp. 321–339. Eng. 9 (1) (2019) 2088–2097, https://doi.org/10.35940/ijitee.
[18] A.P. Kelton, J.P. Papa, C.O. Lisboa, R. Munoz, V.H.C. De, A4242.119119.
Internet of Things : A survey on machine learning-based [34] L. Deng, D. Li, X. Yao, D. Cox, H. Wang, Mobile network
intrusion detection approaches, Comput. Networks 151 (2019) intrusion detection for IoT system based on transfer learning
147–157, https://doi.org/10.1016/j.comnet.2019.01.023. algorithm, Cluster Comput. 22 (2019) 9889–9904, https://doi.
[19] W. Wu, H. Zhang, S. Pirbhulal, S.C. Mukhopadhyay, Y.T. org/10.1007/s10586-018-1847-2.
Zhang, Assessment of Biofeedback Training for Emotion [35] A. Adnan, A. Muhammed, A.A.A. Ghani, A. Abdullah, F.
Management Through Wearable Textile Physiological Hakim, An intrusion detection system for the internet of things
Monitoring System, IEEE Sens. J. 15 (12) (2015) 7087–7095, based on machine learning: Review and challenges, Symmetry
https://doi.org/10.1109/JSEN.2015.2470638. (Basel) 13 (6) (2021) 1–13, https://doi.org/10.3390/sym13061011.
[20] D. Pasini, S. Mastrolembo Ventura, S. Rinaldi, P. Bellagente, A. [36] E. Hodo et al., ‘‘Threat analysis of IoT networks using artificial
Flammini, and A. L. C. Ciribini, ‘‘Exploiting internet of things neural network intrusion detection system,” 2016 Int. Symp.
and building information modeling framework for management Networks, Comput. Commun. ISNCC 2016, pp. 4–9, 2016, doi:
of cognitive buildings,” IEEE 2nd Int. Smart Cities Conf. 10.1109/ISNCC.2016.7746067.
Improv. Citizens Qual. Life, ISC2 2016 - Proc., vol. 40545387, [37] Q. Niyaz, W. Sun, A.Y. Javaid, M. Alam, ‘‘A deep learning
no. 40545387, 2016, doi: 10.1109/ISC2.2016.7580817. approach for network intrusion detection system”, EAI Int,
[21] W. Wu, S. Pirbhulal, H. Zhang, S.C. Mukhopadhyay, Conf. Bio-inspired Inf. Commun. Technol. (2015), https://doi.
Quantitative Assessment for Self-Tracking of Acute Stress org/10.4108/eai.3-12-2015.2262516.
Based on Triangulation Principle in a Wearable Sensor [38] H. Bostani, M. Sheikhan, Hybrid of anomaly-based and
System, IEEE J. Biomed. Heal. Informatics 23 (2) (2019) 703– specification-based IDS for Internet of Things using
713, https://doi.org/10.1109/JBHI.2018.2832069. unsupervised OPF based on MapReduce approach, Comput.
[22] E. Kabir, J. Hu, H. Wang, G. Zhuo, A novel statistical Commun. 98 (2017) 52–71, https://doi.org/10.1016/
technique for intrusion detection systems, Futur. Gener. j.comcom.2016.12.001.
Comput. Syst. 79 (2018) 303–318, https://doi.org/10.1016/ [39] H.H. Pajouh, R. Javidan, R. Khayami, A. Dehghantanha, K.-
j.future.2017.01.029. K. Choo, A Two-Layer Dimension Reduction and Two-Tier
[23] M. Ahmed, A. Naser Mahmood, J. Hu, A survey of network Classification Model for Anomaly-Based Intrusion Detection in
anomaly detection techniques, J. Netw. Comput. Appl. 60 IoT Backbone Networks, IEEE Trans. Emerg. Top. Comput. 7
(2016) 19–31, https://doi.org/10.1016/j.jnca.2015.11.016. (2) (2019) 314–323, https://doi.org/10.1109/
[24] A.A. Diro, N. Chilamkurti, Distributed Attack Detection TETC.2016.2633228.
Scheme using Deep Learning Approach for Internet of Things, [40] R. Kozik, M. Choraś, M. Ficco, F. Palmieri, A scalable
Futur. Gener. Comput. Syst. 82 (2018) 761–768, https://doi.org/ distributed machine learning approach for attack detection in
10.1016/j.future.2017.08.043. edge computing environments, J. Parallel Distrib. Comput. 119
[25] S.R. Nabavi, S.M. Mousavi, ‘‘A Novel Cluster-based Key (2018) 18–26, https://doi.org/10.1016/j.jpdc.2018.03.006.
Management Scheme to Improve Scalability in Wireless Sensor [41] M. Tsikala Vafea, E. Atalla, J. Georgakas, F. Shehadeh, E.K.
Networks” 16 (7) (2016) 150–156. Mylona, M. Kalligeros, E. Mylonakis, Emerging Technologies
[26] S.D. Babar, P.N. Mahalle, A Hash Key-Based Key for Use in the Study, Diagnosis, and Treatment of Patients with
Management Mechanism for Cluster-Based Wireless Sensor COVID-19, Cell. Mol. Bioeng. 13 (4) (2020) 249–257, https://
Network, J. Cyber Secur. Mobil. 5 (2017) 73–88, https://doi.org/ doi.org/10.1007/s12195-020-00629-w.
10.13052/2245-1439.524. [42] M. Otoom, N. Otoum, M.A. Alzubaidi, Y. Etoom, R. Banihani,
[27] P. Kasinathan, C. Pastrone, M.A. Spirito, M. Vinkovits, Denial- Biomedical Signal Processing and Control An IoT-based
of-Service detection in 6LoWPAN based Internet of Things, Int. framework for early identification and monitoring of COVID-
Conf. Wirel. Mob. Comput. Netw. Commun. (2013) 600–607, 19 cases, Biomed. Signal Process. Control 62 (2020) 102149,
https://doi.org/10.1109/WiMOB.2013.6673419. https://doi.org/10.1016/j.bspc.2020.102149.
9408 Y. Kayode Saheed et al.
[43] S. Kumar, R.D. Raut, B.E. Narkhede, A proposed collaborative [59] S. Bhattacharya et al., ‘‘A novel PCA-firefly based XGBoost
framework by using artificial intelligence-internet of things (AI- classification model for intrusion detection in networks using
IoT) in COVID-19 pandemic situation for healthcare workers, GPU,” Electron., vol. 9, no. 2, 2020, doi:
Int. J. Healthc. Manag. 13 (4) (2020) 337–345, https://doi.org/ 10.3390/electronics9020219.
10.1080/20479700.2020.1810453. [60] S. Velliangiri, ‘‘A hybrid BGWO with KPCA for intrusion
[44] Y. Feng, J. Zhong, C.X. Ye, Z.F. Wu, ‘‘Clustering based on self- detection,” J. Exp. Theor. Artif. Intell., vol. 32, no. 1, pp. 165–
organizing ant colony networks with application to intrusion 180, 2020, doi: 10.1080/0952813X.2019.1647558.
detection”, Proc. - ISDA 2006 Sixth Int, Conf. Intell. Syst. Des. [61] D. Gonzalez-Cuautle et al., ‘‘Synthetic minority oversampling
Appl. 2 (2006) 1077–1080, https://doi.org/10.1109/ technique for optimizing classification tasks in botnet and
ISDA.2006.253761. intrusion-detection-system datasets,” Appl. Sci., vol. 10, no. 3,
[45] Y.W. Chen, J.P. Sheu, Y.C. Kuo, N. Van Cuong, ‘‘Design and 2020, doi: 10.3390/app10030794.
implementation of IoT DDoS attacks detection system based on [62] C. Hu, J. Yan, and C. Wang, ‘‘Advanced Cyber-Physical Attack
machine learning”, 2020 Eur, Conf. Networks Commun. Classification with Extreme Gradient Boosting for Smart
EuCNC 2020 (2020) 122–127, https://doi.org/10.1109/ Transmission Grids,” IEEE Power Energy Soc. Gen. Meet.,
EuCNC48522.2020.9200909. vol. 2019-Augus, 2019, doi: 10.1109/
[46] M. Ahmad, Q. Riaz, M. Zeeshan, H. Tahir, S.A. Haider, M.S. PESGM40551.2019.8973679.
Khan, Intrusion detection in internet of things using supervised [63] A. Husain, A. Salem, C. Jim, and G. Dimitoglou, ‘‘Development
machine learning based on application and transport layer of an Efficient Network Intrusion Detection Model Using
features using UNSW-NB15 data-set, Eurasip J. Wirel. Extreme Gradient Boosting (XGBoost) on the UNSW-NB15
Commun. Netw. 1 (2021) 2021, https://doi.org/10.1186/s13638- Dataset,” 2019 IEEE 19th Int. Symp. Signal Process. Inf.
021-01893-8. Technol. ISSPIT 2019, 2019, doi: 10.1109/
[47] N. Moustafa and J. Slay, ‘‘The significant features of the ISSPIT47144.2019.9001867.
UNSW-NB15 and the KDD99 data sets for Network Intrusion [64] A. V. Dorogush, V. Ershov, and A. Gulin, ‘‘CatBoost: Gradient
Detection Systems,” Proc. - 2015 4th Int. Work. Build. Anal. boosting with categorical features support,” arXiv, pp. 1–7,
Datasets Gather. Exp. Returns Secur. BADGERS 2015, pp. 25– 2018.
31, 2017, doi: 10.1109/BADGERS.2015.14. [65] T. Al-hadhrami and F. Mohammed, Advances on Smart and Soft
[48] E.A. Felix, S.P. Lee, Systematic literature review of Computing. 2020.
preprocessing techniques for imbalanced data, IET Softw. 13 [66] G. Kavitha, N.M. Elango, An approach to feature selection in
(6) (2019) 479–496, https://doi.org/10.1049/iet-sen.2018.5193. intrusion detection systems using machine learning algorithms,
[49] Y. Zhou, G. Cheng, S. Jiang, M. Dai, Building an efficient Int. J. e-Collaboration 16 (4) (2020) 48–58, https://doi.org/
intrusion detection system based on feature selection and 10.4018/IJeC.2020100104.
ensemble classifier, Comput. Networks 174 (October) (2019) [67] G. Serpen, E. Aghaei, Host-based misuse intrusion detection
2020, https://doi.org/10.1016/j.comnet.2020.107247. using PCA feature extraction and kNN classification algorithms,
[50] S. Jain, S. Shukla, R. Wadhvani, Dynamic selection of Intell. Data Anal. 22 (5) (2018) 1101–1114, https://doi.org/
normalization techniques using data complexity measures, 10.3233/IDA-173493.
Expert Syst. Appl. 106 (2018) 252–262, https://doi.org/10.1016/ [68] N. Moustafa, J. Slay, ‘‘A hybrid feature selection for network
j.eswa.2018.04.008. intrusion detection systems: Central points and association
[51] S. Agarwal, Data mining: Data mining concepts and techniques. rules” arXiv (2017) 5–13, https://doi.org/10.4225/75/
2014. 57a84d4fbefbb.
[52] H. Alazzam, A. Sharieh, K.E. Sabri, A feature selection [69] A.A. Salih, M.B. Abdulrazaq, Combining Best Features
algorithm for intrusion detection system based on Pigeon Selection Using Three Classifiers in Intrusion Detection
Inspired Optimizer, Expert Syst. Appl. 148 (2020) 113249, System, 2019 Int Conf. Adv. Sci. Eng. ICOASE 2019 (2019)
https://doi.org/10.1016/j.eswa.2020.113249. 94–99, https://doi.org/10.1109/ICOASE.2019.8723671.
[53] S. Maza, M. Touahria, Feature selection for intrusion detection [70] W. Wang, X. Du, N. Wang, Building a Cloud IDS Using an
using new multi-objective estimation of distribution algorithms, Efficient Feature Selection Method and SVM, IEEE Access 7
Appl. Intell. 49 (12) (2019) 4237–4257, https://doi.org/10.1007/ (2019) 1345–1354, https://doi.org/10.1109/
s10489-019-01503-7. ACCESS.2018.2883142.
[54] F.H. Almasoudy, W.L. Al-Yaseen, A.K. Idrees, Differential [71] M. Al-Qatf, Y. Lasheng, M. Al-Habib, K. Al-Sabahi, Deep
Evolution Wrapper Feature Selection for Intrusion Detection Learning Approach Combining Sparse Autoencoder with SVM
System, Procedia Comput. Sci. 167 (2019) (2020) 1230–1239, for Network Intrusion Detection, IEEE Access 6 (2018) 52843–
https://doi.org/10.1016/j.procs.2020.03.438. 52856, https://doi.org/10.1109/ACCESS.2018.2869577.
[55] Y.K. Saheed, F.E. Hamza-Usman, Feature Selection with IG-R [72] W. Feng, J. Sun, L. Zhang, C. Cao, Q. Yang, A support vector
for Improving Performance of Intrusion Detection System, Int. machine based naive Bayes algorithm for spam filtering, 2016
J. Commun. Networks Inf. Secur 12 (3) (2020) 338–344. IEEE 35th Int. Perform. Comput. Commun. Conf. IPCCC 2016
[56] A. Yulianto, P. Sukarno, and N. A. Suwastika, ‘‘Improving (2017), https://doi.org/10.1109/PCCC.2016.7820655.
AdaBoost-based Intrusion Detection System (IDS) Performance [73] P. P. S. Saputra, F. D. Murdianto, R. Firmansyah, and K.
on CIC IDS 2017 Dataset,” J. Phys. Conf. Ser., vol. 1192, no. 1, Widarsono, ‘‘Combination of Quadratic Discriminant Analysis
2019, doi: 10.1088/1742-6596/1192/1/012018. and Daubechis Wavelet for Classification Level of Misalignment
[57] R. Abdulhammed, H. Musafer, A. Alessa, M. Faezipour, A. on Induction Motor,” Proceeding - 2019 Int. Symp. Electron.
Abuzneid, Features Dimensionality Reduction Approaches for Smart Devices, ISESD 2019, pp. 1–5, 2019, doi: 10.1109/
Machine Learning Based Network Intrusion Detection, ISESD.2019.8909431.
Electronics 8 (3) (2019) 322, https://doi.org/ [74] Y. Saheed, O. Longe, U. A. Baba, S. Rakshit, and N. R.
10.3390/electronics8030322. Vajjhala, ‘‘An Ensemble Learning Approach for Software
[58] J. Gao, S. Chai, B. Zhang, and Y. Xia, ‘‘Research on network Defect Prediction in Developing Quality Software Product.,”
intrusion detection based on incremental extreme learning in Advances in Computing and Data Sciences., M. Singh, V.
machine and adaptive principal component analysis,” Tyagi, P. K. Gupta, J. Flusser, T. Ören, and V. R. Sonawane,
Energies, vol. 12, no. 7, 2019, doi: 10.3390/en12071223. Eds. Springer, Cham, 2021.
A machine learning-based intrusion detection for detecting internet 9409
[75] M.O. Mughal, S. Kim, S. Member, ‘‘Signal Classification and [81] M. Almiani, A. AbuGhazleh, A. Al-Rahayfeh, S. Atiewi, A.
Jamming Detection in Wide-band Radios Using Na ¨ ıve, Bayes Razaque, Deep recurrent neural network for IoT intrusion
Classifier” 14 (8) (2018) 8–11, https://doi.org/10.1109/ detection system, Simul. Model. Pract. Theory 101 (2020),
LCOMM.2018.2830769. https://doi.org/10.1016/j.simpat.2019.102031 102031.
[76] S.M. Kasongo, Y. Sun, A deep learning method with filter based [82] B. Mohammadi, M. Sabokrou, ‘‘End-to-End Adversarial
feature engineering for wireless intrusion detection system, IEEE Learning for Intrusion Detection in Computer Networks”
Access 7 (2019) 38597–38607, https://doi.org/10.1109/ arXiv (2019) 270–273.
ACCESS.2019.2905633. [83] Y. Meidan et al., ‘‘Detection of Unauthorized IoT Devices
[77] J. Manhas, S. Kotwal, Implementation of Intrusion Detection Using Machine Learning Techniques,” arXiv, 2017.
System for Internet of Things Using Machine Learning [84] C. Liang, B. Shanmugam, S. Azam, M. Jonkman, F. De Boer,
Techniques, Multimedia Security. Algorithms Intelligent G. Narayansamy, Intrusion Detection System for Internet of
Systems (2021), https://doi.org/10.1007/978-981-15-8711-5_11. Things based on a Machine Learning approach, 2019 Int. Conf.
[78] L.I. Li, W. Jiang, X. Li, K.L. Moser, Z. Guo, L. Du, Q. Wang, Vis. Towar. Emerg. Trends Commun. Netw. (2019) 1–6.
E.J. Topol, Q. Wang, S. Rao, A robust hybrid between genetic [85] S. Fenanir, F. Semchedine, A. Baadache, A Machine Learning-
algorithm and support vector machine for extracting an optimal Based Lightweight Intrusion Detection System for the Internet
feature gene subset, Genomics 85 (1) (2005) 16–23, https://doi. of Things, Rev. d’Intelligence Artif. 33 (3) (2019) 203–211.
org/10.1016/j.ygeno.2004.09.007. [86] S.S. Abul Basar, Haoxiang Wang, Hybrid Intrusion Detection
[79] S. Rathore, J.H. Park, Semi-supervised learning based System for Internet of Things (IoT), J. ISMAC 2 (4) (2020) 190–
distributed attack detection framework for IoT, Appl. Soft 199, https://doi.org/10.36548/jismac.2020.4.002.
Comput. J. 72 (2018) 79–89, https://doi.org/10.1016/j. [87] G. Thamilarasu and S. Chawla, ‘‘Towards deep-learning-driven
asoc.2018.05.049. intrusion detection for the internet of things,” Sensors
[80] J. Mchugh, Testing Intrusion Detection Systems: A Critique of (Switzerland), vol. 19, no. 9, 2019, doi: 10.3390/s19091977.
the 1998 and 1999 DARPA Intrusion Detection System [88] S.U. Jan, S. Ahmed, V. Shakhov, I. Koo, Toward a Lightweight
Evaluations as Performed by Lincoln Laboratory, ACM Intrusion Detection System for the Internet of Things, IEEE
Trans. Inf. Syst. Secur. 3 (4) (2000) 262–294, https://doi.org/ Access 7 (2019) 42450–42471, https://doi.org/10.1109/
10.1145/382912.382923. ACCESS.2019.2907965.