

Network Traffic Anomaly Detection using Machine Learning Approaches

Ashlesha Shetty, CSE, SCEM, Mangalore, India
Sohan Rai, CSE, SCEM, Mangalore, India
Sunil Kumar, CSE, SCEM, Mangalore, India
Vaishnavi D Rane, CSE, SCEM, Mangalore, India

Abstract—The rise of IoT technology and the surge in wireless networking devices have triggered a significant uptick in network attacks originating from diverse sources. Safeguarding networks has become paramount, making Intrusion Detection Systems (IDS) pivotal. These systems are geared to identify abnormal behaviors or unauthorized usage, offering more intricate security measures than simple access control barriers. While IDS functions at endpoint or network levels, the progression to intrusion prevention systems allows real-time responses to breaches. Detailed visibility into network traffic is essential for an accurate IDS, enabling detection of internal threats and access control breaches. Traditionally, IDS relied on rule-based and signature-based approaches, which are effective in reducing false positives but limited in detecting new attacks. The current landscape demands a data-driven approach, because escalating connectivity has led to a surge in attacks. This paper uses the KDD dataset to train an unsupervised machine learning algorithm, Isolation Forest, while addressing the dataset's imbalances and redundancies. The model aims to detect outliers and potential attacks within network traffic and is evaluated through anomaly scores.

I. INTRODUCTION

The number of networking devices is growing rapidly, especially within workplaces handling sensitive data communication. The surge in unknown attacks, originating both internally and externally, underscores the critical need to ensure secure network access for customers and users while safeguarding against attacks. Intrusion Detection Systems (IDS) offer a key strategy in thwarting digital attacks, functioning as a vital component of system security. They scrutinize data traffic to identify interruptions or anomalies; this study focuses on an anomaly-based IDS.

Anomaly detection systems treat abnormal behavior as potentially malicious. They employ machine learning models trained to recognize normal behavior and flag deviations as alerts, an approach that fits the challenges described above. Although IDS have existed for decades, early models relying on heuristics and thresholds managed false positives and negatives but struggled to detect new attacks.

Given the surge in wireless devices and cloud computing, the frequency of attacks has soared, necessitating a shift toward data-driven strategies for companies. However, machine learning-based approaches face challenges of their own, such as flagging normal instances as anomalies (false positives), which wastes time and resources.

II. LITERATURE SURVEY

A. Using Machine Learning to Analyze Network Traffic Anomalies

Anastasia Khudoyarova et al. [1] studied the application of machine learning methods, as well as spectral and statistical methods, for real-time network traffic anomaly detection. They determine the strengths and weaknesses of the existing methods and compare them in terms of efficiency. They then build a system of criteria to ensure the most efficient anomaly detection while meeting the specified system performance and resource consumption requirements.

B. Anomaly Detection in Network Traffic Using Unsupervised Machine Learning Approach

Aditya Vikram et al. [2] used the KDD data set to train the unsupervised machine learning algorithm called Isolation Forest. The data set is highly imbalanced and contains various attacks such as DoS, Probe, U2R, and R2L. Because the data set suffers from redundant values and class imbalance, data preprocessing was performed first and unsupervised learning was then applied. For this network-traffic anomaly detection model, Isolation Forest was used to detect outliers and probable attacks, and the results were evaluated using the anomaly score.

C. Machine Learning Mechanisms for Network Anomaly Detection System

Sweety Singh et al. [3] presented a comparative study of several machine learning methods using the NSL-KDD dataset. The simulated results were compared by implementing the Naïve Bayes (NB) classifier, the Support Vector Machine (SVM), and the Decision Tree classifier on the NSL-KDD dataset. Recursive Feature Elimination (RFE) and Principal Component Analysis (PCA) were used to select appropriate features from among all features present in the dataset, in order to improve the accuracy and processing speed of the IDS.

D. Anomaly Detection Method Based on Clustering Undersampling and Ensemble Learning

Wenming Huan et al. [4] used an undersampling method based on clustering to process imbalanced data sets.
The number of clusters in the normal flow samples is set to the number of abnormal flow samples, and the sample points nearest to each cluster center are retained, which achieves the under-sampling. Fusing clustering-based undersampling with the AdaBoost algorithm makes the algorithm pay more attention to samples in the data set that are difficult to judge, and further improves network traffic anomaly detection. The K-RUSboost algorithm is simulated on the Moore data set and the ISCXVPN2016 data set.

E. Data Analysis for Anomaly Detection to Secure Rail Network

Huaqun Guo et al. [5] focus on data analysis for anomaly detection with Wireshark and a packet analysis system. An alert function is also developed to raise an alert when an abnormality occurs. Rail network traffic data were captured and analyzed to obtain their network features, which are then used to detect abnormalities. To improve efficiency, a packet analysis system is introduced to receive the network flow and analyze the data automatically. Providing two detection methods, i.e., Wireshark detection and the packet analysis system, together with the alert function facilitates the timely detection of abnormalities and the triggering of alerts in the rail network.

III. DESIGN METHODOLOGY AND IMPLEMENTATION

Fig. 1. Figure 2

The block diagram depicted in Figure 2 illustrates the mechanics of anomaly detection using the Isolation Forest algorithm. In unsupervised anomaly detection, this algorithm takes a distinctive approach: it constructs a forest of random trees and then examines the average depth required to isolate individual data points. Several parameters are used in constructing and instantiating the Isolation Forest model. Among them, the contamination parameter plays a key role, because it controls the threshold of the decision function that determines when a scored data point is deemed an outlier.

The time complexity of the Isolation Forest is O(number of samples * n-estimators * log(sample size)). Because this is linear in the number of samples, the Isolation Forest is efficient and well suited to real-time anomaly detection applications.

When the Isolation Forest object is instantiated, specific parameters matter. n-estimators sets the number of trees (base estimators) built for estimation and outlier detection, while max-sample sets the number of training data points sampled for each tree. The contamination parameter represents the proportion of outliers in the dataset and thereby influences the threshold for identifying anomalous data points. In the context of the current implementation, a contamination factor of 1

Evaluation metrics are essential for gauging the efficacy and robustness of the Isolation Forest model. In this implementation, the anomaly score and the AUC score were used as the key metrics, giving a clear indication of the model's performance. The flowchart in Figure 3 provides a visual guide to the outlier detection process.

Fig. 2. Figure 3

Training the model required a careful division between training and test data, a practice conducive to enhancing algorithmic performance. Because the Isolation Forest is unsupervised and needs no target labels during training, the various arguments were analyzed carefully in order to initialize the Isolation Forest model appropriately. The Isolation Forest also stands out in anomaly detection for its speed compared with alternative algorithms, which reinforces its standing as an effective and fast solution.

IV. PROPOSED SYSTEM

The proposed system extends the existing machine-learning model for anomaly detection in network security through a comprehensive strategy. Building on the foundation laid by the current model, which is characterized by a high AUC score and carefully tuned parameters, the proposed enhancements aim to strengthen the system against the ever-evolving landscape of cybersecurity threats.

First, the system will implement dynamic parameter optimization mechanisms, allowing the model to adapt in real time to shifting network conditions. This entails automating the tuning of critical parameters, such as "n-estimators" and "contamination", based on the characteristics of incoming data streams.

Ensemble techniques will be explored to complement the existing model's predictive ability. By combining multiple anomaly detection algorithms or models, the system seeks to provide a more nuanced and robust approach to identifying and mitigating diverse threats.

Feature engineering will be a key focus, delving deeper into network traffic patterns to extract more relevant and discriminative features. This analysis aims to uncover new indicators of anomalous behavior, enhancing the model's ability to discern subtle deviations from normal network activity.

Real-time learning capabilities will be integrated into the system to facilitate continuous model updates. This adaptive learning approach keeps the model attuned to emerging threats without the need for periodic, resource-intensive retraining sessions.

Additionally, the proposed system will incorporate external threat intelligence feeds, providing valuable context to the anomaly detection process. This integration improves the system's understanding of potential threats, enabling it to make more informed decisions and reduce false positive rates.

Addressing scalability challenges is paramount, and the system will be optimized to handle larger datasets and increased traffic volumes. This may involve parallel processing, distributed computing, or the use of cloud resources to ensure efficient scaling.

A user-friendly interface will be developed, featuring visualization tools and real-time dashboards tailored for security analysts. This interface lets analysts interpret and act upon the system's insights swiftly and effectively.

The proposed system will establish a continuous evaluation process, incorporating feedback loops from security analysts. This iterative approach ensures ongoing refinement and optimization based on real-world insights, enhancing the system's adaptability and effectiveness.

Furthermore, strict adherence to security and privacy compliance standards will be prioritized, encompassing robust encryption, stringent access controls, and alignment with regulatory frameworks such as GDPR.

Comprehensive documentation, coupled with training initiatives for security personnel, will ensure the successful deployment and management of the enhanced anomaly detection system. A well-informed and trained team is crucial for leveraging the system's full potential and navigating the complex landscape of network security challenges. In summary, the proposed system integrates dynamic adaptability, ensemble techniques, advanced feature engineering, and real-time learning to create a sophisticated, responsive, and user-friendly anomaly detection system tailored to the evolving nature of network security threats.

V. CONCLUSION

In response to the inherent challenges posed by highly imbalanced data, an unsupervised machine-learning model was constructed. The resulting AUC score, a formidable 98.3

This model takes shape within a dynamic cybersecurity landscape characterized by a growing array of sophisticated network attacks. Recognizing the imperative of real-time threat detection, organizations are investing in the development of highly efficient Intrusion Detection Systems (IDS). Anomaly detection is a promising approach in this endeavor, owing to its efficiency in training and its capacity to minimize false positives and false negatives.

The implementation process not only yielded commendable results but also provided valuable insights into opportunities for refining the anomaly detection process. Experimentation with diverse parameter values emerged as a key strategy for optimizing model performance. Moreover, dataset quality had a significant impact on outcomes: a more comprehensive and clean dataset consistently correlated with improved anomaly detection results, underscoring the close relationship between data quality and model efficacy.

Within this evolving landscape, the contamination parameter's role becomes even more pronounced, because it determines the proportion of data points the model identifies as anomalies. It serves as a critical lever for fine-tuning the model's performance, emphasizing the importance of parameter tuning in anomaly detection frameworks.

However, it is crucial to acknowledge the relatively nascent status of machine learning and deep learning applications in the network security domain. Challenges, particularly related to scalability and efficiency, persist as the field continues to mature. Ongoing efforts are dedicated to addressing these challenges and to refining the application of these technologies in fortifying network security infrastructures.
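
The Isolation Forest procedure described in Section III (random trees, average isolation depth, anomaly score, AUC) can be illustrated with a minimal sketch. It is a from-scratch stand-in written with the Python standard library only and is not the implementation used in this work. The data are synthetic, and all function names, the seed values, and the settings n_estimators=100 and max_samples=64 are assumptions chosen for illustration.

```python
import math
import random


def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Split on a random feature at a random threshold until points are isolated."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # simplification: stop instead of trying another feature
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {"dim": dim, "split": split,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}


def path_length(point, node, depth=0):
    """Depth at which the point lands, plus a correction for unsplit leaves."""
    if "split" not in node:
        return depth + c_factor(node["size"])
    child = node["left"] if point[node["dim"]] < node["split"] else node["right"]
    return path_length(point, child, depth + 1)


def fit_forest(data, n_estimators=100, max_samples=64, seed=0):
    """Build n_estimators trees, each on max_samples randomly drawn points."""
    rng = random.Random(seed)
    psi = min(max_samples, len(data))
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(data, psi), 0, max_depth, rng)
             for _ in range(n_estimators)]
    return trees, psi


def anomaly_scores(data, trees, psi):
    """Score in (0, 1); a shorter average path gives a score closer to 1."""
    scores = []
    for p in data:
        mean_depth = sum(path_length(p, t) for t in trees) / len(trees)
        scores.append(2.0 ** (-mean_depth / c_factor(psi)))
    return scores


def auc(scores, labels):
    """Probability that a random attack outscores a random normal point."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Synthetic stand-in for traffic features: a dense "normal" cluster plus
# a few scattered far-away points playing the role of attacks.
rng = random.Random(42)
normal = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(300)]
attacks = [(rng.choice([-1, 1]) * rng.uniform(5, 9),
            rng.choice([-1, 1]) * rng.uniform(5, 9)) for _ in range(15)]
data = normal + attacks
labels = [0] * len(normal) + [1] * len(attacks)

trees, psi = fit_forest(data, n_estimators=100, max_samples=64, seed=1)
scores = anomaly_scores(data, trees, psi)
print("AUC:", round(auc(scores, labels), 3))
```

Labels are used only to compute the AUC, never to build the trees, which matches the unsupervised setting described in the text. Scoring visits each point once per tree to a depth of at most log2(max_samples), which corresponds to the complexity stated in Section III.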
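
The effect of the contamination parameter noted in the conclusion can be illustrated independently of the detector. The sketch below is a hedged example, not the evaluation carried out in this work: it uses synthetic data, a stand-in anomaly score (Euclidean distance from the coordinate-wise median rather than Isolation Forest scores), and hypothetical contamination values of 0.01, 0.05, and 0.10. It shows how the assumed outlier proportion sets the score cut-off and therefore trades precision against recall.

```python
import math
import random

# Synthetic data: 950 "normal" points and 50 "attack" points (5% of the total).
rng = random.Random(7)
normal = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(950)]
attack = [(rng.gauss(6, 1), rng.gauss(6, 1)) for _ in range(50)]
points = normal + attack
labels = [0] * len(normal) + [1] * len(attack)


def median(values):
    """Median of a list of numbers."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return 0.5 * (ordered[mid - 1] + ordered[mid])


# Stand-in anomaly score (an assumption): distance from the coordinate-wise median.
center = (median([p[0] for p in points]), median([p[1] for p in points]))
scores = [math.dist(p, center) for p in points]


def sweep(scores, labels, contamination):
    """Flag the top `contamination` fraction of scores and report the outcome."""
    k = max(1, round(contamination * len(scores)))
    cutoff = sorted(scores, reverse=True)[k - 1]
    flagged = [s >= cutoff for s in scores]
    true_pos = sum(f and y == 1 for f, y in zip(flagged, labels))
    precision = true_pos / sum(flagged)
    recall = true_pos / sum(labels)
    return sum(flagged), precision, recall


results = {c: sweep(scores, labels, c) for c in (0.01, 0.05, 0.10)}
for c, (n_flagged, precision, recall) in results.items():
    print(f"contamination={c:.2f} flagged={n_flagged} "
          f"precision={precision:.2f} recall={recall:.2f}")
```

Setting the value too low leaves attacks undetected, while setting it too high flags normal traffic; this is why the parameter is treated as a tuning lever.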
REFERENCES
[1] Khudoyarova, Anastasia, Mikhail Burlakov, and Mikhail Kupriyashin. "Using Machine Learning to Analyze Network Traffic Anomalies." In 2021 IEEE Conference of Russian Young Researchers in Electrical and Electronic Engineering (ElConRus), pp. 2344-2348. IEEE, 2021.
[2] Vikram, Aditya. "Anomaly detection in network traffic using unsupervised machine learning approach." In 2020 5th International Conference on Communication and Electronics Systems (ICCES), pp. 476-479. IEEE, 2020.
[3] Singh, Sweety, and Subhasish Banerjee. "Machine learning mechanisms for network anomaly detection system: A review." In 2020 International Conference on Communication and Signal Processing (ICCSP), pp. 0976-0980. IEEE, 2020.
[4] Huan, Wenming, Haitao Lin, Haixue Li, Yan Zhou, and Yiming Wang. "Anomaly detection method based on clustering undersampling and ensemble learning." In 2020 IEEE 5th Information Technology and Mechatronics Engineering Conference (ITOEC), pp. 980-984. IEEE, 2020.
[5] Guo, Huaqun, Xiaoyi Shen, Wang Ling Goh, and Luying Zhou. "Data Analysis for Anomaly Detection to Secure Rail Network." In 2018 International Conference on Intelligent Rail Transportation (ICIRT), pp. 1-5. IEEE, 2018.
