
Measurement: Sensors 31 (2024) 100991

Contents lists available at ScienceDirect

Measurement: Sensors
journal homepage: www.sciencedirect.com/journal/measurement-sensors

Prevention and detection of DDOS attack in virtual cloud computing environment using Naive Bayes algorithm of machine learning
Yongqiang Shang
Xinyang Agriculture and Forestry University, Department of Information Engineering Department, Xinyang, Henan, 464000, China

ARTICLE INFO

Keywords:
Machine learning
Cyber attack
Virtual cloud computing environment
Cloud computing
Naive Bayes

ABSTRACT

The popularity of cloud computing, with its incredible scalability and accessibility, has already welcomed a new era of innovation. Consumers who subscribe to a cloud-based service and use the associated pay-as-you-go features have unlimited access to its applications and technologies. In addition to lowering prices, this notion also increased the reliability and accessibility of the offerings. One of the most crucial aspects of cloud technology is the on-demand viewing of personal services, which is also one of its most significant advantages. Apps that are cloud-based are available on demand from anywhere in the world at a reduced cost. Although it causes its users pain with safety concerns, cloud computing can thrive because of its fantastic instantaneous services. There are various violations, but they all accomplish something similar: taking the systems offline. Distributed denial of service attacks are among the most harmful forms of online assault. Aiming at fast and accurate DDoS (Distributed Denial of Service) attack detection, this research introduces the DDoS attack and a method to defend against it, making the system more resistant to such attacks. In this scenario, numerous hosts are used to carry out a distributed denial of service assault against cloud-based web pages, sending possibly millions or even trillions of packets. It uses an OS like ParrotSec to pave the way for the attack and make it possible. In the last phase, the most effective algorithms, such as Naive Bayes and Random Forest, are used for detection and mitigation. Another major topic was studying the many cyber attacks that can be launched against cloud computing.

1. Introduction

A DDoS attack is a distributed attack mode in which an attacker controls a large number of attack machines and sends DoS attack instructions to them. In the latest Internet security report, DDoS attacks remain one of the major cybersecurity threats. The inexpensive pricing and "pay-as-you-go" access to computational features and amenities on demand make cloud-based services a formidable competitor to the conventional IT solutions available in prior eras. The use of cloud computing is gaining popularity rapidly. Whether entirely or largely, governments and companies have moved their IT infrastructures onto the cloud. Cloud-based infrastructure offers various advantages compared to traditional on-site infrastructures. The removal of expenses associated with operation and impairment, as well as the accessibility of materials on request, are only a few of the advantages. However, there are many concerns that cloud consumers have, and the research addresses these issues. The majority of these inquiries centre on safeguarding operational concepts and information. Many security-related attacks can be prevented in conventional IT systems that do not use cloud computing. Focused cloud-based crimes are already using their innovations. Many security vulnerabilities in cloud computing are unique compared to their predecessors in non-cloud computing environments because data and business logic are stored on an external cloud server that lacks accessible oversight. The denial-of-service (DoS) assault is one technique that has been in the spotlight recently. Denial-of-service incidents are directed at the server rather than the people it supports. DoS attackers attempt to flood live servers by masquerading as genuine users to overload the service's capacity to handle incoming inquiries [1]. Cloud computing is an Internet-based service that enables users to access configurable computing resource sharing pools (including server, storage, application software, services, networks, etc.) to achieve online access to computing resources on demand. As a mixture of emerging technologies and business models, cloud computing has developed rapidly in recent years due to its advantages of super-large scale, virtualization, high reliability, good scalability and on-demand services. To overcome this issue, multiple inquiries are sent to the server simultaneously. The term "distributed denial of service," or DDoS, refers to a variation on the classic


https://doi.org/10.1016/j.measen.2023.100991
Received 25 July 2023; Received in revised form 13 November 2023; Accepted 18 December 2023
Available online 20 December 2023
2665-9174/© 2024 The Author. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
"denial of service" that uses numerous computers to attack and impair one service at a time simultaneously. Among the most important and possibly catastrophic risks, among many others, is the growing number of distributed denial of service attacks observed. A quarter or more of the world's organizations have experienced a distributed denial of service attack. The authors show great foresight in predicting that DDoS attacks will increasingly focus on cloud-based assets and amenities. Multiple assaults in the past two years corroborate the paper's predictions about future attacks. There have been many attacks recently, but only a few have gained widespread notoriety and interest from scientists. In 2015, Lizard Squad hacked Microsoft and Sony's cloud-based gaming systems, causing both firms to shut down their services on Christmas Day. Distributed denial of service attacks hit Rackspace, a cloud computing services provider, hard. Another massive distributed denial of service attack was launched against Amazon EC2 cloud servers, serving as a magnificent example of an attack. Company activities were severely disrupted, money was lost, and there were immediate and long-term effects on the attacked businesses. In recent years, DDoS attacks have become more frequent, the botnets used by attackers have become larger, and the network traffic usage has reached a height of 1000G. For cloud computing platforms, DDoS attacks from outside are similar to DDoS attacks from traditional networks. According to the basic principle and characteristics of DDoS attack, the defense is mainly divided into four stages: detection (Detecting), analysis (Analyzing), defense (Resisting) and counterattack (Counterattack). The detection and analysis technology is the key to the successful defense against DDoS attacks. According to research published by Verisign iDefense Security Intelligence Solutions, distributed denial of service (DDoS) assaults have been particularly damaging to the internet and SaaS (Software as a Service) business throughout the past few quarters. More than 75 % of known countermeasures against DDoS assaults utilized services provided by the cloud [2]. "Financial damages" refers to one of the worst possible results of a Distributed Denial of Service attack in the cloud. The median price of a distributed denial of service assault is put at $482,000, according to some estimates. Some of the financial losses suffered in Q1 2015 have been detailed in new disclosures from Neustar. Studies show that, on average, more than $72K is stolen in a single hour. Distributed denial of service (DDoS) attacks take on new significance in cloud computing. This variation directly results from the operational difficulties introduced by an assault on the victim network [3]. In the environment of cloud computing, DoS attack technology is also undergoing new changes, and is manifested in a variety of forms. The attack may come from outside the server cluster or from inside the server cluster. At present, the more popular attack method is for the attacker to attack a specific cloud computing platform or server cluster. This attack method causes great harm and takes various forms, and it is difficult to quickly carry out fault positioning and troubleshooting.

Clouds that provide Infrastructure as a Service (IaaS) to their clients contain virtual machines (VMs) that host the amenities for the clients. The flexibility and on-demand nature of the cloud is made possible by the abstraction of servers. It allows virtual machines to acquire and distribute capabilities on the fly as needed. The advantages of cloud computing, such as upon-request processing and easily accessible assets, have contributed significantly to its recent meteoric rise. As a result, the cloud can now support a more significant number of virtual machines (VMs) with a far greater capacity to meet their resource requirements. This is because a cloud-based virtual machine can access infinite resources. A Distributed Denial of Service (DDoS) assault, also known as an Economic Denial of Sustainability (EDoS) attack or a Fraudulent Resource Consumption (FRC) attack, is the result of this "adaptability" or "auto-scaling," which results in financial losses. Data center-distributed denial of service attacks are the focus of this study. We also define these assaults, compare them to more conventional DDoS attacks, and analyze and classify the numerous developments in this field. We will provide a comprehensive taxonomy of these functions to make this analysis more approachable. The popularization of computer networks has changed the way we work and live, and the promotion of cloud computing in recent years has provided a more convenient platform for network resource sharing. A cloud computing platform organically integrates computer infrastructure (including server resources, storage resources, network resources, etc.) by means of virtualization technology, so as to realize resource sharing among multiple users and greatly reduce the cost of using resources, making it possible to provide cheap and high-performance services for users.

1.1. DDoS attack and cloud features

A denial of service (DoS) attack is a destructive attack on the target server through abnormal methods, resulting in its inability to provide services to normal network users. Currently, distributed denial of service (DDoS) assaults have achieved much accomplishment in cloud computing, where hackers make use of the "pay-as-you-go" model. Many factors contribute to cloud computing's meteoric rise in popularity, but these three features stand out as particularly crucial. On the other hand, DDoS attackers have found that the same set of features dramatically aids them in achieving the objectives of their cyber attacks. In the sections that follow, we will examine each of these features more closely. Fig. 1 describes the cloud architecture which was affected by DDoS attack [5]. There are many methods and forms of DoS attack, which are summarized in the following situations: illegal occupation and consumption of computer resources such as CPU, network bandwidth and storage space; changing or even destroying the configuration information of the target server; changing or even destroying the key node equipment in the physical network; and accessing the services by programming.

Fig. 1. Cloud architecture DDoS attacks.

1.1.1. Automatic sizing

Physical virtualization provides the capacity to scale down, up, and re-resource a live VM. A VM's processing power, primary memory, storage area, and data transfer capacity can all be increased as needed, thanks to these features. When some of the assigned resources are not being used or needed, this can be utilized to free up some of those capabilities. Multiple vendors of services employ this method of resource distribution, which is made practical by automatic scaling and web-based tools. This allows those who use the cloud to calculate their needed facilities using utilization rates or similar metrics. It is possible to extend this functionality to automatically deploy new virtual machines (VMs) on top of existing physical servers and remove them when they are no longer needed. Upward scaling, which refers to adding more machines, and horizontal scaling, which refers to adding more data centres or clouds, are two of the most crucial computing features for


utility purposes. Distributing an application across multiple cloud-hosted physical servers is one way to increase its capacity. High-speed connections and ample storage space are the two most essential factors in determining adaptability. The virtualization of OSes is crucial when contemplating the scalability of virtual machines (VMs). The process of replicating a virtual machine and then releasing it is quick. To alleviate strain, duplicate virtual machines might be launched on different servers [4]. This action can be taken at any time when it is required. Streaming virtual machine deployments are an additional significant expansion accelerator because they allow migrating an active virtual server to a different nation's more comprehensive hardware server with practically no interruption. This guarantees ongoing adaptability, which is further strengthened in this manner.

1.1.2. Pay-as-you-go reporting

Upon request, utility services have grown in popularity due to their convenience and the simplified resources reporting and invoicing they provide. Customers of cloud computing services can take advantage of the "Pay-as-you-go" model without making any upfront financial commitments for resources. The administrator of a virtual machine (VM) may want to dynamically adjust the number of resources that are accessible, either by adding more or taking them away [23]. Another perk of adopting a cloud-based system is that you can get more use out of your hardware without worrying about things like electricity, space, cooling, and maintenance. DDoS attacks in the cloud are only possible to comprehend with a firm grasp of the financial aspects of doing so. Since most cloud instances are billed hourly, the minimum possible time frame for accounting is typically 60 min. Funds could be allocated in three ways: a predetermined amount, a pay-as-you-go system, or auctions. The size and volume of data transferred in and out of a computer network also determine its usefulness. The "pay as you go" models are experimental and still in the prototype phase [6].

1.1.3. Multi-tenancy

Multi-tenancy allows several Virtual Machines (VMs) belonging to different VM proprietors to coexist on just one hardware system. Increasing hardware utilization and, consequently, one's return on investment (ROI) can be accomplished through multi-tenancy. On the same physical server, a single user may want to run multiple instances of the same program or entirely distinct ones using different virtual machines.

1.2. Cloud-based DDoS attack situation

The attack depicted in Fig. 1 is very normal. The cloud requires enormous computers that can service multiple users in a standardized setting. An attacker's purpose may not always be limited to a "Denial of Service" but can include reducing the profitability of cloud subscribers. How to prevent assaults like this has been a hot topic since the inception of cloud computing. The term "Fraudulent Resource Consumption" (FRC) attack has been used in many other works to characterize this type of attack. Distributed denial of service attacks target web pages, and hackers plant bots and trojans on compromised systems all over the Internet. A DDoS attack will be executed as an EDoS attack if the target service is hosted in the cloud. "Booters" are businesses that connect their clients with a botnet to launch distributed denial of service (DDoS) attacks on their rivals' web pages. Attacks like these can be spurred on by everything from commercial competition to political rivalry to ransom to full-out cyber war between nations [22]. Because a DDoS attack works by consuming system resources until the system cannot provide normal services, network managers can optimize and reinforce the system to improve its tolerance of DDoS attacks, and even block some DDoS attack packets. Firstly, improve the network planning and design scheme to eliminate the unreasonable factors of network structure; then remediate the security vulnerabilities and hidden dangers in the network system; lastly, scan the key network equipment, such as firewalls and routers, to find the bottlenecks of the network equipment and optimize performance. The approach to cloud computing provides customers with several opportunities and benefits; nevertheless, DDoS attackers also have access to these features and may find them helpful. In order to accomplish "Denial of Service," an attacker launching a DDoS attack will send out a flood of fake inquiries. Fig. 2 describes the classification, prevention and mitigation of DDoS attacks [7]. Although DDoS attack technology is varied, it has many similarities to the phenomena caused by the system. Therefore, through the implementation of a distributed detection system, we will strive to find the behavior of DDoS attacks as early as possible, and accurately locate the source and characteristics of attacks. Through the network abnormal traffic analysis system and DDoS detection tool, we can promptly find abnormal traffic and DDoS behavior in the network, find problems in time, and improve the overall detection and analysis ability of the system.

However, the targeted system must expend many resources to counter this hack. This "overload" condition would be seen as feedback by the "auto-scaling" function, which would then add more CPUs (or other resources) to the VM's existing amount of readily available assets. First, a virtual machine will enter its "normal load VM" phase. Let us assume the DDoS attack has commenced, and the VM is now overburdened as an immediate consequence of the attack. As soon as the cloud detects an overload, its auto-scaling features will kick in, and it will choose among the many methods described in the literature for allocating resources to virtual machines, migrating them, and relocating them. When a virtual machine (VM) gets overloaded, it can be given more resources, transferred to a server with more available resources, or have a copy of itself launched on a separate server [20]. If there is no countermeasure to halt this procedure, further resources will be added. This can continue until the service provider makes a payment or the cloud service provider exhausts all available resources, whichever comes first. The eventual outcome of this is "Service Denial" [8]. The vast majority of DDoS attacks are organized and premeditated destructive acts. Relying solely on a technical department or an enterprise, it is impossible to completely solve the problem of security protection, let alone to quickly track and locate the source of attacks. For the defense of DDoS attacks, cloud computing platform suppliers, communication operators and government departments need to establish a cooperation mechanism to complete security defense.

Consequently, this results in billing for resources only when used, which raises the risk of incurring financial losses over the set limit. To keep things manageable, we might run virtual machines with a static resource profile, in which case the SLA will not cover the provisioning of extra resources on demand. A "Denial of Service" (DoS) attack would immediately wipe out the cloud's valuable features in this situation. Fig. 3 is the description of tiers of cloud-based DDoS defense.

2. Related works

DDoS attacks, which target computer systems, are becoming increasingly common. DDoS perpetrators have expanded their reach into practically every area of technology, especially the cloud, the IoT, and the edge. Distributed denial of service (DDoS) attacks flood the targeted machine or host with so much traffic that it crashes or exhausts all available resources (including the network). Multiple strategies for defense have been proposed, but they have yet to be successful due to attackers' ability to educate themselves to employ recently discovered computerized ways of attack. Because of this, we presented a machine-learning-based approach to spotting distributed denial-of-service (DDoS) attacks in the cloud. K Nearest Neighbour, Random Forest, and Naive Bayes are three different categorization machine learning techniques that help the system detect a distributed denial of service attack with a 99.75% success rate. In our study, we offer a machine learning-based approach to identifying and blocking DDoS attacks against servers situated in the cloud. Data mining for relevant statistics


Fig. 2. Cloud DDoS classification, prevention, and mitigation.

Fig. 3. Several tiers of cloud-based DDoS defence.

largely determined our recommended technique's efficacy. Table 1 illustrates the results, from which it can be deduced that the suggested method has an excellent success rate (about 99.78 %) in identifying DDoS attacks while producing few errors. Since we focused primarily on the supervised learning method in this study, future studies may investigate unsupervised or reinforcement learning methods [9].

Distributed denial of service (DDoS) attacks are more challenging to execute on the public internet than on a conventional network. There is more than one threat to the cloud, and its surroundings are under attack from several directions. Existing machine learning techniques, such as neural classifiers, can be used to identify DDoS attacks. This research aims to shed light on the results of an investigation into distributed denial of service (DDoS) attacks in cloud settings. The number of false positives rises when artificial intelligence methods are applied for detection. The ANN, SVM, kNN, J48, Feature rank and Feature selection algorithms frequently detect Distributed Denial of Service (DDoS) attacks in a cloud context [10].

The goal of this research was to examine several works associated with the identification of network assaults in both traditional and cloud-based infrastructures. In the following paper, we will examine the wide variety of attacks that could occur in a cloud environment. There is sometimes a conflation between the terms "bandwidth reduction" and "resource reduction" when describing the impacts of distributed denial of service (DDoS) attacks. Most distributed denial of service (DDoS) assaults in the cloud are SYN Flood or Flash Crowd assaults. The analysis found that TCP denial of service low-rate assaults and performance decreases are two of the most prevalent attack categories [11].

To spot distributed denial of service (DDoS) attacks, researchers are trying out various machine-learning algorithms, some of which have shown greater precision than others. In experiments, real-time network logs, KDD, NSL-KDD, and CIDDS datasets were used to identify network attacks. Linear regression and logistic regression algorithms, also used to predict DDoS attacks, have been found to have high false positive rates when implemented in several databases. To improve precision and recognition rates, however, it is constantly necessary to increase the number of records used for training and testing the dataset, which is a difficult task in and of itself. DDoS assaults cover a wide array of topics. Therefore, researchers can use many machine-learning techniques and classifiers in future studies. Furthermore, regression analysis has received more usage in recently released literature [12]. As a potential research strategy, we can reduce dimensionality and then use the remaining data for regression evaluation.

The number of people using online resources has increased recently due to the COVID-19 outbreak. As a direct consequence, there has been an increase in the number of end users subscribing to various cloud-based applications, which provide various services to the end user. DDoS assaults, on the other hand, are aimed at interrupting cloud computing services' availability and processing power. This has the effect of negatively impacting both the performance and accessibility of cloud computing resources. There is currently no reliable method for detecting or filtering DDoS attacks, so they are a reliable tool for anyone looking to launch cyberattacks. Recently, scientists have started experimenting with machine learning (ML) techniques to develop effective ML-based tactics to detect distributed denial of service (DDoS) attacks

Table 1
Dataset snapshot.

No.  Time      Source       Target       Protocol  Length  Information
1    113.6020  11.0.2.20    194.172.8.2  DNS       77      Standard query 0x0aa0 A
2    113.6037  11.0.2.20    11.0.2.20    DNS       174     Standard query response 0x375e AAAA
3    113.6039  91.32.134.1  11.0.2.20    TCP       66      82 > 59,501 (ACK) Seq = 310 Ack = 18 Win = 66,646 Len = 0
4    113.6039  91.32.134.1  11.0.2.20    TCP       66      82 > 59,523 (ACK) Seq = 310 Ack = 18 Win = 66,646 Len = 0
5    113.6039  91.32.134.1  11.0.2.20    TCP       66      82 > 59,547 (ACK) Seq = 310 Ack = 18 Win = 66,646 Len = 0
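
As a rough illustration of how packet records like those in Table 1 might feed a Naive Bayes detector, the sketch below aggregates packets per source address into two features and scores each source with a Gaussian Naive Bayes rule. This is not the paper's implementation: the feature choice (packet count and mean packet length per source), the Gaussian likelihood, the training values, the traffic mix, and the function names `per_source_features`, `fit_gaussian_nb`, and `predict` are all assumptions made for illustration.

```python
import math
from collections import defaultdict

def per_source_features(packets):
    """Aggregate (source, length) records into (packet_count, mean_length) per source."""
    lengths = defaultdict(list)
    for source, length in packets:
        lengths[source].append(length)
    return {src: (len(v), sum(v) / len(v)) for src, v in lengths.items()}

def fit_gaussian_nb(samples, labels):
    """Estimate a class prior plus a per-feature mean and variance for each class."""
    grouped = defaultdict(list)
    for x, y in zip(samples, labels):
        grouped[y].append(x)
    model = {}
    for y, rows in grouped.items():
        stats = []
        for column in zip(*rows):
            mean = sum(column) / len(column)
            # A small constant keeps the variance strictly positive.
            var = sum((v - mean) ** 2 for v in column) / len(column) + 1e-6
            stats.append((mean, var))
        model[y] = (len(rows) / len(samples), stats)
    return model

def predict(model, x):
    """Return the class with the highest log posterior, treating features as independent."""
    def log_posterior(prior, stats):
        total = math.log(prior)
        for value, (mean, var) in zip(x, stats):
            total += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        return total
    return max(model, key=lambda y: log_posterior(*model[y]))

# Hypothetical labelled windows: (packets per window, mean packet length in bytes).
train_x = [(3, 420.0), (5, 380.0), (4, 510.0), (250, 66.0), (310, 70.0), (280, 64.0)]
train_y = ["normal", "normal", "normal", "attack", "attack", "attack"]
model = fit_gaussian_nb(train_x, train_y)

# Made-up traffic shaped like Table 1 rows: (source address, length).
packets = [("11.0.2.20", 77), ("11.0.2.20", 174)] + [("91.32.134.1", 66)] * 300
features = per_source_features(packets)
verdicts = {src: predict(model, feat) for src, feat in features.items()}
```

With these made-up numbers, the source that sends 300 short packets is labelled "attack" and the low-rate source is labelled "normal". Working in log space avoids numerical underflow when many per-feature likelihoods are multiplied together.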


[13]. In this scenario, we offer a method for detecting distributed denial of service (DDoS) assaults in a cloud computing environment by combining big data with deep learning methods. The proposed method employs big data sparking innovation to examine many incoming packets and a deep learning machine learning algorithm to filter fraudulent transmissions. Both of these technologies are used to make the methodology more effective. The testing and training phases were done with the KDDCUP99 dataset, and the final result attained a precision of 99.82 %. Even if the number of people using smart devices proliferates, the computing power and resources available in these devices still need improvement.

The cloud-based system offers multiple solutions for overcoming the issue of scarce resources by allowing for their cooperative use. The cloud computing platform is periodically targeted by attackers while being susceptible to a wide range of cyber threats. As such, we provide access to a DDoS warning system that is capable of detecting the DDoS attack in a timely and accurate fashion. To avoid malicious or undesirable communications from reaching a cloud computing environment, we offer an approach that employs big data and deep learning techniques. This is achieved by employing these methods. We hope to eventually implement our suggested approach along with additional methods to enhance its overall functioning and test its usefulness on a wide range of datasets [14].

Additionally, a more effective DDoS attack avoidance mechanism might be constructed and recommended as a future work of this study in order to manage DDoS attacks in a cloud computing environment in an efficient manner. The examination of various DDoS prevention strategies that have been used in the past, as well as those that are considered state-of-the-art, is the only purpose of this work. The scope of future study may be expanded to include the presentation of a novel and effective DDoS prevention method to deal with the attacks [15].

The term "cloud computing" describes a new and attractive model for administering and distributing offerings over the World Wide Web. Because of this, information retention strategies are changing across the IT environment. Data security must be considered when handling massive amounts of data storage. Intruders pose one of the biggest challenges to data security in the modern Internet environment. The resources, data, and applications stored on the public internet are vulnerable to assault due to the system's connection. Intrusion Detection Systems (IDS) are employed in the cloud to monitor malicious behavior on both the network and the host systems. Because it creates so much illicit information online, detecting a Distributed Denial of Service (DDoS) attack is challenging for Intrusion Detection Systems (IDS). Cybersecurity analytics can aid in the detection of intrusions through the use of methods for data mining. Many distinct approaches have been developed with machine learning methods as their foundation [16].

Selecting features is another effective method for decreasing the dataset's dimensionality. This research proposes two distinct approaches for utilizing the dataset generated via NSL-KDD. Learning Vector Quantization (LVQ) is a filtering technique that comes first. The second technique is dimensionality reduction by principal component analysis (PCA). Naive Bayes (NB), Support Vector Machine (SVM), and Decision Tree (DT) categorization were applied to the characteristics chosen from each technique, and the results were compared for their ability to identify DDoS attacks [17]. The results show that the LVQ-based DT method is superior to the alternatives when it comes to spotting attacks. Unauthorized access to confidential data must be detected as the first step in securing that information [18].

The NSL-KDD standard is the foundation for a cloud-based intrusion detection system. In this study, we explore data pertaining to distributed denial of service attacks. LVQ, PCA, and other feature selection methods were used to classify the attacks using machine learning techniques such as neural networks, support vector machines, and decision trees. In order to properly categorize DDoS attacks, it was necessary to look at how well various techniques worked [19–21]. The PCA selected 21 suggest that LVQ-based feature selection in the DT model may be more accurate than the other methods in identifying attacks. As mentioned earlier, the model also outperforms its predecessors in terms of accuracy, recall, specificity, and f-score.

3. Materials and methods

3.1. Naive Bayes algorithm

The premise that the most straightforward answers often turn out to be the most enlightening is evident in Naive Bayes and may be demonstrated in practice in daily situations. Machine learning has come a long way in recent years, but its continued development shows that it can still be kept very straightforward without compromising efficiency, accuracy, or dependability. It serves many functions and has particular strength in resolving problems associated with natural language processing (NLP). In machine learning, the naive Bayes technique is a standard statistical methodology used to solve classification problems based on the Bayes Theorem. To clarify any lingering questions, the following paragraphs will thoroughly explain the Naive Bayes algorithm and its core concepts. The speed with which an NB model may be built makes it particularly useful when dealing with vast amounts of data. The Naive Bayes approach has been widely used because of its simplicity and ability to outperform more complex classification techniques. The foundation of a Bayesian classification is the assumption that indicators can be treated separately. A Naive Bayes classifier assumes that the presence of one feature in a class does not influence the presence of any other feature, which simplifies things.

The Naive Bayes classifier is a popular supervised machine learning approach in applications like text classification. Since it models the distribution of inputs for a given class or category, it belongs to the group of learning algorithms known as generative learning approaches. To be successful, this tactic relies on the assumption that the input data's attributes are conditionally independent given the class. This allows for fast and accurate recommendation generation by the system.

Naive Bayes classifiers, which implement Bayes' statistical theorem, are often thought of as being used for more fundamental probabilistic categorization tasks. This theorem incorporates empirical evidence and supplementary context when determining a hypothesis's credibility. In order to function, the naive Bayes classifier relies on the assumption that the input data's attributes are unrelated to one another. Contrarily, real-world scenarios usually play out differently. Although based on an unduly naive premise, the Naive Bayes classifier sees widespread application. This is because it serves its purpose well and has proven highly efficient in several practical settings.

One of the simplest Bayesian network models, naive Bayes classifiers, can achieve high levels of reliability when used in conjunction with kernel density estimation. Despite their simplicity, they are used less than other Bayesian network models. When the distribution pattern of the input data is not given, using a kernel function to approximate the probability density function of the input data can help the classifier operate better. The purpose of developing this strategy was to raise efficiency. This proves that the naive Bayes classifier is an effective machine learning technique for various purposes, including but not limited to text categorization, spam filtering, and sentiment analysis. Thomas Bayes is credited with developing the method for predicting a probability given a set of known probabilities currently known as Bayes' Theorem. Fig. 4 is the layout of Naive Bayes.

3.2. Understanding Naive Bayes and machine learning

Machine learning has two main branches: supervised learning and unsupervised learning. Classification and regression are two subsets of supervised learning that can be distinguished here. Classification is where the Naive Bayes method excels. The naive Bayes method was used
features from a possible 42, while the LVQ selected only 20. The results for face recognition. People’s faces and other features, like their noses,


mouths, eyes, etc., can be recognized using this classification method. In meteorology, it can be used to predict whether the upcoming weather will be good or bad. Doctors can make accurate diagnoses with the help of the classifier, for example by assessing a patient's likelihood of developing cancer, cardiovascular disease, or other disorders. Using a Naive Bayes classifier, Google News can decide whether a news piece is about politics, the world, or any other topic. The Naive Bayes classifier has the advantages of being simple, easily implemented, and requiring little training data. It handles both continuous and discrete data. It remains stable even with many predictors and data points. It is fast, can be used to make predictions in real time, and is insensitive to irrelevant features.

Fig. 4. Procedure for Naive Bayes.

4. Proposed method

Gathering relevant data should be the initial step. By collecting relevant data, we can locate and exploit several security holes in the victim's computers in our attack. All available information regarding running services, open and closed ports, and other security holes is compiled during the information-gathering phase. Here, the attacker has a better chance of learning the weak spots of the victim, making further attacks much simpler. The cloud service provider assigns a different port number to each of its services. For example, FTP normally uses port 21 (FTPS with implicit TLS uses port 990), and HTTP uses port 80. Ports 20 through 23 are assigned to FTP data, FTP control, SSH, and Telnet.

In conclusion, gathering information is a procedure that provides an attacker with all the necessary data to launch a successful attack on any target system. In order to learn more about a network, we can employ the Nmap scanner. It needs only the target machine's IP address; it then performs a full system scan, revealing the targeted system's activity, services, open ports, and so on. This implies that when the exposed connection is found, whatever is occurring at that moment can be shown, regardless of what OS the other system is using. We would probably come up with an attack plan, and that plan would involve a distributed denial of service attack, which would involve methods like the "ping of death." A distributed denial of service (DDoS) assault is one of the most damaging types of cyberattacks since it disrupts the entire system. Due to the flood of packets caused by the DDoS assault, all services are either momentarily or completely inaccessible. ParrotSec, like Kali and Ubuntu, can be managed via a command line interface, with the shell or terminal serving as the primary interface for entering these instructions. Since ParrotSec handles everything, you can type "PING IP" into the console, and it will be carried out. Since the victim site would receive over 65 thousand packets, all services would be taken down. This is how an assault could be generated. The subsequent stage is detection. In this case, the target is a website hosted in the cloud, and Nmap is used to scan the entire site in order to locate any security flaws. This would lead to the exposure of any underlying problems. After the exposed ports have been identified, a Python script comprising a distributed denial of service attack will be created and run.

Wireshark thoroughly analyzes each incoming packet. After finishing the thorough packet analysis, a large data set was produced, which can then be passed to a classifier. The experimental setting demonstrates that both the random forest and the naive Bayes classifier, both of which are well-known, produce excellent results. While various other classifiers may be used for detection (support vector machines, k-nearest neighbors, k-means, etc.), Naive Bayes is still the most effective in this setting.

In this work, naive Bayes is applied to the problem of predicting application-layer packets during distributed denial-of-service attacks. Notwithstanding its apparent simplicity, the Naive Bayes algorithm can make precise forecasts using the current data. The data set under consideration was trained with naive Bayes, and then a fresh data set was built using the cross-validation technique with 65 folds. This was done so that we could figure out where the files were coming from and where they were going. The true positive rate, false alarm rate, false negative rate, and many more are just some of the metrics that may be derived from this fresh data set. Naive Bayes, a technique for making predictions, produces a mix of correct and incorrect results. A false negative is considered an alarm for the benefit of internet consumers. Naive Bayes and random forest both correctly identified the true positives as ordinary packets, whereas the false negatives were classified as DDoS attacks.

5. Experimentation & results

5.1. Data pre-processing

In data mining, pre-processing is among the most effective steps. It streamlines complex information into something everyone can understand. Because real-time data is often unreliable and vague, it must be pre-processed before it becomes useful information. Weka includes numerous options for preprocessing filters. A single filter, such as normalization, is chosen from the available options. Data standardization, here meaning making the data non-redundant, refers to removing superfluous or duplicate information from a dataset.

5.2. Training data set

Training data is collected in order to build a machine-learning model, since programming a learning algorithm typically requires data to train it. The training data is a subset of the full dataset, which is used for both training and evaluation. Separating the dataset into training and testing sets is an essential first step when developing a machine learning-based model. The trained model is then needed to generate forecasts against newly acquired data.

5.3. Prediction algorithm

Once the data set has been built and validated, various algorithms can be applied to predict the outcome of interest. In this particular scenario, the task is to identify whether packets belong to a DDoS attack or not.
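The steps in Sections 5.1 to 5.3 can be sketched in a few lines of code. The example below is only an illustration under stated assumptions: it is written in plain Python rather than Weka or MATLAB, it uses synthetic records with two hypothetical features (packet length and packets per second), it swaps in a Gaussian Naive Bayes classifier with a single train/test split instead of cross-validation, and it treats "ddos" as the positive class when computing the true positive, false positive, and false negative rates. All function names are illustrative and do not come from the study.

```python
import math
import random
from collections import defaultdict

def deduplicate(rows):
    """Section 5.1: drop exact duplicate records, keeping the first copy."""
    seen, unique = set(), []
    for features, label in rows:
        key = (tuple(features), label)
        if key not in seen:
            seen.add(key)
            unique.append((list(features), label))
    return unique

def min_max_normalize(rows):
    """Section 5.1: rescale each feature column to the range [0, 1]."""
    columns = list(zip(*(features for features, _ in rows)))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    def scale(features):
        return [(v - lo) / (hi - lo) if hi > lo else 0.0
                for v, lo, hi in zip(features, lows, highs)]
    return [(scale(features), label) for features, label in rows]

def split(rows, test_fraction=0.3, seed=0):
    """Section 5.2: shuffle, then split into training and testing sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def fit(train):
    """Estimate a log prior plus per-feature mean and variance for each class."""
    grouped = defaultdict(list)
    for features, label in train:
        grouped[label].append(features)
    model = {}
    for label, members in grouped.items():
        n = len(members)
        means = [sum(col) / n for col in zip(*members)]
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9  # avoid zero variance
                     for col, m in zip(zip(*members), means)]
        model[label] = (math.log(n / len(train)), means, variances)
    return model

def predict(model, features):
    """Section 5.3: return the class with the highest log posterior."""
    def log_posterior(label):
        log_prior, means, variances = model[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(features, means, variances))
    return max(model, key=log_posterior)

def rates(model, test, positive="ddos"):
    """Compute true positive, false positive, and false negative rates."""
    tp = fp = tn = fn = 0
    for features, label in test:
        flagged = predict(model, features) == positive
        if label == positive:
            tp, fn = tp + flagged, fn + (not flagged)
        else:
            fp, tn = fp + flagged, tn + (not flagged)
    return {"tpr": tp / max(1, tp + fn),
            "fpr": fp / max(1, fp + tn),
            "fnr": fn / max(1, tp + fn)}

# Synthetic records (assumption): [packet length in bytes, packets per second].
rng = random.Random(42)
normal = [([rng.gauss(500, 100), max(0.0, rng.gauss(10, 3))], "normal")
          for _ in range(200)]
attack = [([rng.gauss(60, 10), max(0.0, rng.gauss(900, 100))], "ddos")
          for _ in range(200)]
records = normal + attack + normal[:20]  # append 20 duplicates on purpose

clean = min_max_normalize(deduplicate(records))
train, test = split(clean)
model = fit(train)
metrics = rates(model, test)
print(len(records), len(clean), metrics)
```

Because the two synthetic classes are deliberately well separated, the rates printed here say nothing about performance on real captured traffic; the sketch only shows how the pre-processing, splitting, and prediction stages fit together.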


5.4. Prediction of Naive Bayes

The percentages of true positives and false positives are displayed in this figure.

The percentage of false positives is taken as an indicator of a distributed denial of service (DDoS) attack or of fake data packets. In contrast, the proportion of true positives represents normal traffic. In this case, the mean for genuine packets is 0.973, while the mean for fraudulent transmissions is approximately 0.05.

5.5. Proposed formula for Naive Bayes

$$P(x \mid y) = \frac{P(y \mid x)\, P(x)}{P(y)}$$

where P(y|x) is the conditional probability of the predictor y given the class x, P(x) is the prior probability of the class, P(y) is the prior probability of the predictor, and P(x|y) is the posterior probability of the class x given the predictor y.

5.6. Basic theory

5.6.1. Three-way handshake

The communication model between machines is depicted in Fig. 2, and it must be followed for the communication to succeed. This protocol is called the three-way handshake. In this exchange, the two parties are the server and the attacker. When establishing a standard TCP connection, the attacker, acting as the client, contacts the server by sending a SYN packet. In response, the server allocates a buffer for the client and sends back a SYN packet together with an ACK. At this stage, the connection is in a state referred to as "half-open," and the server waits for an ACK from the attacker to complete the connection setup. Once that ACK arrives, the connection is established and the three-way handshake is complete.

TCP SYN Flood attacks, on the other hand, exploit this three-way handshake by saturating the server with an excessive number of SYN requests. TCP SYN Flood is a prominent example of a denial of service (DoS) attack. For a packet capture program to identify a TCP SYN Flood, it must monitor a copy of the server's traffic over a prolonged connection. The characteristic signs of a TCP SYN Flood typically coincide with the arrival of incoming IP addresses at the server. IP addresses that appear repeatedly at the server within a predetermined period are counted and used to derive the features of a DDoS attack.

5.6.2. Naive Bayes algorithm

Bayes' theorem is a simple way to calculate conditional probabilities. A conditional probability quantifies the likelihood of one event given the assumption, premise, assertion, or fact that a second event has already occurred, that is, the chance of something happening after something else has happened. The posterior probability can be computed using the formula below.

$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$

Here P(A|B) is the conditional probability of A given that B is true, and P(B|A) is the conditional probability of B given that A is true. P(A) stands for the probability of event A, and P(B) stands for the probability of event B. We used the output of the packet-capturing software, namely the IP address and the packet length obtained, as the computational input. We did the maths using the Naive Bayes method and the Gaussian distribution. After the computation, the outcomes are plotted in two dimensions. The Gaussian Naive Bayes approach, which requires the calculation of the mean and standard deviation, is applied once the quantitative input has been gathered. Table 1 shows a sample of the dataset format.

5.6.3. MATLAB classification using the Naive Bayes algorithm

MATLAB is the application we employ for classification because it is not only user-friendly but also highly effective in producing clear visual output. For data analysis, a tool built into MATLAB allows users to perform Naive Bayes classification. Using this method, we can also classify network traffic as either K, L, or Q to gain further insight into the type of data transmitted over an internet connection. This concept will be challenging to grasp for a significant number of individuals. The MATLAB script for the Naive Bayes classification and its parameters are displayed in the following figure. The results of classifying the data obtained from the system are shown in the figure. The nonlinear boundary drawn as a blue line delimits the normal class, whose members are shown as green circles. The other class, shown as red squares, represents threats. Fig. 5 shows the DDoS attack detection using MATLAB.

6. Conclusion

The key goals of this study are to learn how to recognize and prevent distributed denial-of-service attacks. The first and most crucial step is determining which ports can be exploited. Nevertheless, this approach is not risk-free because susceptible ports are more likely to be exploited. Given ParrotSec's track record for stability and performance, we decided it would be the ideal choice for our company's computer system. Since a DDoS attack involves sending one million separate packets toward the target, starting with a live website on the internet would be best. The targeted website was taken offline after it became clear that an assault had happened. Machine learning is useful in this detection process as well. Using this data, the most popular and accessible tool, "weka," is trained, employing pre-processing techniques and the "discretize" filter to achieve the desired effect. The following phase is therefore not only interesting but also useful for both forecasting and detection. We employed both methods and compared the findings on the same platform, and we found that the naive Bayes method provides the most trustworthy conclusions. PCA selected 21 features from the possible 42 features, while LVQ selected only 20 features. The results suggest that LVQ-based feature selection in the DT model may be more accurate than other methods in identifying attacks. As mentioned earlier, the model also outperformed the previous models in terms of accuracy, recall, specificity, and f-score. It was shown that the naive Bayes model had significantly better predictive power than the random forest model. There is a chance that a false positive warning will be triggered for packet transmissions within a network. It was also demonstrated that the Naive Bayes algorithm outperformed the random forest technique in identifying the false and true rates of transmissions. Detection is not carried out in real time. Although attacks can be detected, real-time alarms cannot be realized in a high-security cluster environment, so the feasibility of real-time monitoring on the Hadoop platform should be studied further.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.


Fig. 5. Categorization outcome using MATLAB module.

Data availability

No data was used for the research described in the article.

Acknowledgement

The study was supported by Key R&D and Promotion Special Project (Science and Technology Research) in Henan Province (232102210146).

References

[1] X. Jing, Z. Yan, W. Pedrycz, Security data collection and data analytics in the internet: a survey, IEEE Commun. Surv. Tutorials 21 (1) (2019) 586–618.
[2] K.J. Singh, K. Thongam, T. De, Detection and differentiation of application layer DDoS attack from flash events using fuzzy-GA computation, IET Inf. Secur. 12 (6) (2018) 502–512.
[3] T. Subbulakshmi, S. Mercy Shalinie, C. Suneel Reddy, A. Ramamoorthi, Detection and classification of DDoS attacks using fuzzy inference system, Commun. Comput. Inf. Sci. 89 CCIS (2010) 242–252.
[4] N. Tabassum, M.S. Khan, S. Abbas, T. Alyas, A. Athar, M.A. Khan, EAI Endorsed Transactions Intelligent reliability management in hyper-convergence cloud infrastructure using fuzzy inference system, vol. 4, no. 5, pp. 1–12.
[5] A.K. Soliman, C. Salama, H.K. Mohamed, Detecting DNS reflection amplification DDoS attack originating from the cloud, in: Proc. - 2018 13th Int. Conf. Comput. Eng. Syst. ICCES 2018, 2019, pp. 145–150.
[6] P. Arun Raj Kumar, S. Selvakumar, Detection of distributed denial of service attacks using an ensemble of adaptive and hybrid neuro-fuzzy systems, Comput. Commun. 36 (3) (2013) 303–319.
[7] A.S. Boroujerdi, S. Ayat, A robust ensemble of neuro-fuzzy classifiers for DDoS attack detection, in: Proc. 2013 3rd Int. Conf. Comput. Sci. Netw. Technol. ICCSNT 2013, 2014, pp. 484–487.
[8] L. Kwiat, C.A. Kamhoua, K.A. Kwiat, J. Tang, Risks and benefits: game-theoretical analysis and algorithm for virtual machine security management in the cloud, Assur. Cloud Comput. (2018) 49–80.
[9] H.S. Mondal, M.T. Hasan, M.B. Hossain, M.E. Rahaman, R. Hasan, Enhancing secure cloud computing environment by Detecting DDoS attack using fuzzy logic, in: 3rd Int. Conf. Electr. Inf. Commun. Technol. EICT 2017, 2018-Janua, 2018, pp. 1–4. December.
[10] P. Mishra, E.S. Pilli, V. Varadharajan, U. Tupakula, Intrusion detection techniques in cloud environment: a survey, J. Netw. Comput. Appl. 77 (October 2016) (2017) 18–47.
[11] R. Biswas, J. Wu, Filter assignment policy against distributed denial-of-service attack, Proc. Int. Conf. Parallel Distrib. Syst. - ICPADS 2018– Decem (2019) 537–544.
[12] S. Abbas, T. Alyas, A. Athar, M.A. Khan, A. Fatima, W.A. Khan, EAI Endorsed Transactions Cloud Services Ranking by Measuring Multiple Parameters Using AFIS, 2014, pp. 1–7.
[13] K. Iqbal, M. Adnan, S. Abbas, Z. Hasan, A. Fatima, Intelligent transportation system (ITS) for smart-cities using mamdani fuzzy inference system, Int. J. Adv. Comput. Sci. Appl. 9 (2) (2018) 94–105.
[14] R.L. Neupane, T. Neely, P. Calyam, N. Chettri, M. Vassell, R. Durairajan, Intelligent defense using pretense against targeted attacks in cloud platforms, Future Generat. Comput. Syst. 93 (2019) 609–626.
[15] T. Alyas, M.S. Khan, Intelligent reliability management in software based cloud ecosystem using AGI 17 (12) (2017) 134–139.


[16] N.S. Naz, S. Abbas, M. Adnan, B. Abid, N. Tariq, M. Farrukh, Efficient load balancing in cloud computing using multi-layered mamdani fuzzy inference expert system, Int. J. Adv. Comput. Sci. Appl. 10 (3) (2019) 569–577.
[17] Rudol, Implementasi keamanan jaringan komputer pada virtual private network (vpn) menggungakan, Implementasi Keamanan Jar. Komput. Pada Virtual Priv. Netw. Menggungakan Ipsec 2 (1) (2017) 65–68.
[18] W. Alosaimi, M. Alshamrani, K. Al-Begain, Simulation-based study of distributed denial of service attacks prevention in the cloud, Proc. - NGMAST 2015 9th Int. Conf. Next Gener. Mob. Appl. Serv. Technol. (2016) 60–65.
[19] N.C.S.N. Iyengar, G. Ganapathy, Chaotic theory based defensive mechanism against distributed Denial of Service Attack in cloud computing environment, Int. J. Secur. its Appl. 9 (9) (2015) 197–212.
[20] S.A. Miller, O. Behalf, C. America, CASE STUDY HYPERCONVERGENCE VS CLOUD, 2017, pp. 134–139.
[21] T. Alyas, M.S. Khan, Intelligent reliability management in software based cloud ecosystem using AGI 17 (12) (2017) 134–139.
[22] R.E. Spiridonov, V.D. Cvetkov, O.M. Yurchik, Data Mining for Social Networks Open Data Analysis, 2017, pp. 395–396.
[23] L. Wang, Y. Ma, J. Yan, V. Chang, A.Y. Zomaya, pipsCloud: high performance cloud computing for remote sensing big data management and processing, Future Generat. Comput. Syst. 78 (2018) 353–368.
