
Network Traffic Anomaly Detection using
Machine Learning Approaches
Kriangkrai Limthong∗ and Thidarat Tawsook†
∗ Graduate University for Advanced Studies (Sokendai), Tokyo 101-8430, Japan
† Computer Engineering Department, Bangkok University, Pathumthani 12120, Thailand
[email protected], [email protected]

Abstract—One of the biggest challenges for both network administrators and researchers is detecting anomalies in network traffic. If they had a tool that could accurately and expeditiously detect these anomalies, they could prevent many of the serious problems caused by them. We conducted experiments to study the relationship between interval-based features of network traffic and several types of network anomalies by using two well-known machine learning algorithms: naïve Bayes and k-nearest neighbor. Our findings will help researchers and network administrators to select effective interval-based features for each particular type of anomaly, and to choose a proper machine learning algorithm for their own network system.

Index Terms—anomaly detection, time interval, network traffic analysis, machine learning, naïve Bayes, nearest neighbor

I. INTRODUCTION

Anomalies in network traffic are major symptoms of computer security problems and network congestion. Network administrators and researchers have long been trying to find a method that can accurately and expeditiously perceive anomalies in network traffic. Generally, we can classify anomaly detection methods into two major groups [1]: signature-based methods and statistical-based methods.

Signature-based methods monitor and compare network packets or connections with predetermined patterns known as signatures. This technique is simple and processes the audit data efficiently. Although the false positive rate of these techniques can be low, comparing packets or connections against large sets of signatures is a time-consuming task with limited predictive capability. Signature-based methods cannot detect novel anomalies that are not defined in the signatures, so administrators frequently have to update the system signatures.

Statistical-based methods, by contrast, have the ability to learn the behavior of network traffic and can potentially detect novel anomalies. The machine learning approach is a statistical-based method with a high capability to automatically learn to recognize complex patterns and make intelligent decisions based on data [2]. There are two basic types of machine learning techniques: unsupervised algorithms and supervised algorithms.

Unsupervised algorithms take a set of unlabeled data as input and attempt to find anomalies under the assumption that the large majority of the data is normal network traffic [3]. However, this assumption does not hold in many cases, such as during denial of service (DoS) attacks or when there are accidents, outages, or misconfigurations. Supervised algorithms, on the other hand, can cover and detect a wide range of network anomalies that unsupervised algorithms cannot. The major assumption of supervised algorithms is that anomalous network traffic is statistically different from normal network traffic. There have been many studies on supervised anomaly detection [4], such as those based on Bayesian networks [5], k-nearest neighbor [6], and support vector machines [7]. Nevertheless, a comparison of the performance of these algorithms has not yet been conducted.

Almost all of the previous studies used packet-based or connection-based features, which suffer from a scalability issue as the number of packets or connections increases. For example, if network traffic consists of 10 packets over 10 seconds, the processing time on packet-based features is 10 units, the same as with interval-based features. By contrast, if the traffic consists of 1,000 packets over the same 10 seconds, the processing time on packet-based features grows to 1,000 units while the interval-based features still require only 10 units. For this reason, we investigate alternative features, called interval-based features, that address the scalability problem of packet-based and connection-based features.

The contributions of this work are to study and compare nine different interval-based features by using two well-known machine learning algorithms, namely naïve Bayes and k-nearest neighbor.

II. MATERIALS AND METHODS

The naïve Bayes algorithm is a simple probabilistic learning algorithm based on applying Bayes' theorem [8]; it has been applied to many domains such as document classification and face recognition. The k-nearest neighbor algorithm classifies objects based on the closest training examples in the feature space [9]; it is one of the simplest learning algorithms. The two algorithms have different strengths and weaknesses, and in particular dissimilar time consumption during the training and testing phases.
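The paper itself publishes no code, but the two classifiers are standard. As a minimal sketch, assuming scikit-learn and purely synthetic per-interval feature vectors (both our own choices, not the authors'), the comparison looks like this:

```python
# Minimal sketch only: the paper gives no implementation. scikit-learn and
# the synthetic data below are assumptions, used to show the two classifier
# families being compared on per-interval feature vectors.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Hypothetical training set: one row per time interval, one column per
# interval-based feature (e.g., packets and bytes); label 1 = anomalous.
X_train = rng.poisson(lam=50, size=(500, 2)).astype(float)
y_train = np.zeros(500, dtype=int)
X_train[:25] *= 20           # a few injected high-volume intervals
y_train[:25] = 1
X_test = rng.poisson(lam=50, size=(100, 2)).astype(float)

nb = GaussianNB().fit(X_train, y_train)                          # eager learner
knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)  # lazy learner

print("naive Bayes:", nb.predict(X_test)[:10])
print("3-NN:       ", knn.predict(X_test)[:10])
```

GaussianNB does nearly all of its work at training time, while the k-nearest neighbor classifier defers distance computation to the testing phase; this is the dissimilar time consumption the section refers to.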

TABLE I
CHARACTERISTICS OF SELECTED ANOMALIES

Source          #SrcAddr  #DstAddr  #SrcPort  #DstPort    #Packet  Packet Size (Bytes)  Occurrence  #AvgPacket  %Anomaly
                                                                   Min:Avg:Max          (Seconds)   per Second

back
  Week 2 Fri           1         1     1,013         1     43,724  60:1,292.31:1,514           651       67.16      0.75
  Week 3 Wed           1         1       999         1     43,535  60:1,297.29:1,514         1,064       40.92      1.23
ipsweep
  Week 3 Wed           1     2,816         1       104      5,657  60:60.26:118                132       42.86      0.15
  Week 6 Thurs         5     1,779         2       105      5,279  60:67.75:118              4,575        1.15      5.30
neptune
  Week 5 Thurs         2         1    26,547     1,024    205,457  60:60:60                  3,143       65.37      3.64
  Week 6 Thurs         2         1    48,932     1,024    460,780  60:60:118                 6,376       72.27      7.38
  Week 7 Fri           2         1    25,749     1,024    205,600  60:60:60                  3,126       65.77      3.62
portsweep
  Week 5 Tues          1         1         1     1,024      1,040  60:60:60                  1,024        1.02      1.19
  Week 5 Thurs         1         1         1     1,015      1,031  60:60:60                  1,015        1.02      1.17
  Week 6 Thurs         2         2         2     1,024      1,608  60:60:60                  1,029        1.56      1.19
smurf
  Week 5 Mon       7,428         1         1         1  1,931,272  14:1,066:1,066            1,868    1,033.87      2.16
  Week 5 Thurs     7,428         1         1         1  1,932,325  14:1,066:1,066            1,916    1,008.52      2.22
  Week 6 Thurs     7,428         1         1         1  1,498,073  1,066:1,066:1,066         1,747      857.51      2.02

A. Data Sets

We acquired three-month data traces of an anomaly-free network from an edge router of the Internet service center at Kasetsart University, Thailand. This center serves college students, educators, and researchers so that they can obtain useful information for their studies from the Internet. There are about 1,300 users per day, and the service time is between 8:30 and 24:00 on every weekday. The users cannot change or install any software on the client computers, and the administrators provide appropriate software for all ordinary users. Moreover, the administrators regularly update the virus signatures of the anti-virus software installed on all of the clients. At the end of every day, all the clients automatically revert their operating system and all software back to the initial state, so we can guarantee that all of them are clean and anomaly-free.

We selected 39 days of clean data traces to train all the classifiers in the training phase and another 16 days to combine with several types of anomalies. The selected anomalies are from the Lincoln Laboratory at the Massachusetts Institute of Technology [10], [11]. These anomalies were provided for researchers who would like to evaluate and compare the efficiency of their own anomaly detection methods.

We selected five different types of anomalies with the characteristics listed in Table I. The main criteria we took into account are the number of source and destination addresses, the number of source and destination ports, and the average number of packets per second. The back attack is a denial of service attack against the Apache web server through port 80, where a client requests a URL containing many backslashes. The ipsweep attack is a surveillance sweep performing either a port sweep or ping on multiple IP addresses. The neptune attack is a SYN flood denial of service attack on one or more destination ports. The portsweep attack is a surveillance sweep through many ports to determine which services are supported on a single host. The smurf attack is an amplified attack using an ICMP echo reply flood.

There are two reasons why we selected network data traces from different sources. First, although the data traces from MIT contain both normal and anomalous network traffic, we need real and clean network traffic to train the classifiers, because making an effective decision with all the classifiers depends on the training data. Second, the selected anomalies are test bed data that anyone can use to evaluate detection methods on their own network traffic.

B. Evaluation

The measure of accuracy that we use for evaluation is the F-measure [12], which takes into consideration both the precision and recall [13] of the test to compute the score. We use precision, recall, and F-measure on a per-interval basis. All the measures can be calculated from the following four parameters: the true positive (TP), which is the number of anomalous intervals correctly detected; the false positive (FP), which is the number of normal intervals wrongly detected as anomalous; the false negative (FN), which is the number of anomalous intervals not detected; and the true negative (TN), which is the number of normal intervals correctly detected. All of the parameters are defined in Table II. From these parameters, the precision, recall, and F-measure are derived by using Eqs. (1)-(3), respectively:

precision = TP / (TP + FP),    (1)

recall = TP / (TP + FN),    (2)

F-measure = 2 × (precision × recall) / (precision + recall).    (3)

TABLE II
INTERVAL-BASED EVALUATION

                               Test Result
Actual Status     Anomaly               Normal
Anomaly           True Positive (TP)    False Negative (FN)
Normal            False Positive (FP)   True Negative (TN)

In Eq. (1), the precision, or positive predictive value, is the percentage of detected intervals that are actually anomalous. In Eq. (2), the recall, or sensitivity, is the percentage of the actual anomalous intervals that are detected. Equation (3) gives the F-measure, which is the harmonic mean of the precision and recall. We used the F-measure as a single measure of classifier performance because it represents the performance of anomaly detection even better than an accuracy value or a receiver operating characteristic (ROC).
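Rendered as code, Eqs. (1)-(3) applied to per-interval 0/1 labels become the following small Python helper (our own sketch, not from the paper):

```python
def interval_scores(actual, predicted):
    """Precision, recall, and F-measure per Eqs. (1)-(3).

    `actual` and `predicted` are equal-length sequences of 0/1 flags,
    one entry per time interval (1 = anomalous interval).
    """
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)

    precision = tp / (tp + fp) if tp + fp else 0.0   # Eq. (1)
    recall = tp / (tp + fn) if tp + fn else 0.0      # Eq. (2)
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)     # Eq. (3)
    return precision, recall, f_measure

# e.g., 6 intervals: 2 真 anomalies, detector flags 3 intervals
print(interval_scores([0, 1, 0, 1, 0, 0], [0, 1, 1, 0, 0, 1]))
# -> (0.333..., 0.5, 0.4)
```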
Fig. 1. F-measure of Packet Feature (f1) with Naïve Bayes. [Figure: surface plot of F-measure against the discriminant value (σ) and the interval length in seconds.]

TABLE III
INTERVAL-BASED NETWORK TRAFFIC FEATURES

f#   Feature    Description
f1   Packet     Number of packets
f2   Byte       Sum of packet sizes
f3   Flow       Number of flows
f4   SrcAddr    Number of source addresses
f5   DstAddr    Number of destination addresses
f6   SrcPort    Number of source ports
f7   DstPort    Number of destination ports
f8   ΔAddr      |SrcAddr − DstAddr|
f9   ΔPort      |SrcPort − DstPort|
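The paper does not spell out how these nine features are extracted, so the following is a hedged sketch of one plausible implementation: it bins simplified packet records into fixed-length intervals and computes f1-f9 per bin, approximating a flow as a distinct (source address, destination address, source port, destination port) tuple since the paper leaves its flow definition unstated. It also makes the scalability argument of Section I concrete: everything downstream of this step scales with the number of intervals rather than the number of packets.

```python
from collections import defaultdict

def interval_features(packets, interval=10):
    """Approximate the nine Table III features per `interval`-second bin.

    `packets` is an iterable of simplified records:
    (timestamp, src_addr, dst_addr, src_port, dst_port, size_bytes).
    The flow notion (distinct 4-tuple per bin) is our assumption.
    """
    bins = defaultdict(list)
    for ts, sa, da, sp, dp, size in packets:
        bins[int(ts // interval)].append((sa, da, sp, dp, size))

    features = {}
    for b, pkts in sorted(bins.items()):
        src_addrs = {p[0] for p in pkts}
        dst_addrs = {p[1] for p in pkts}
        src_ports = {p[2] for p in pkts}
        dst_ports = {p[3] for p in pkts}
        features[b] = {
            "f1_packet": len(pkts),                    # number of packets
            "f2_byte": sum(p[4] for p in pkts),        # sum of packet sizes
            "f3_flow": len({p[:4] for p in pkts}),     # distinct 4-tuples
            "f4_srcaddr": len(src_addrs),
            "f5_dstaddr": len(dst_addrs),
            "f6_srcport": len(src_ports),
            "f7_dstport": len(dst_ports),
            "f8_daddr": abs(len(src_addrs) - len(dst_addrs)),
            "f9_dport": abs(len(src_ports) - len(dst_ports)),
        }
    return features
```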
III. PRELIMINARY RESULTS

Due to the characteristics of the selected anomalies, we focused on the nine interval-based features of network traffic listed in Table III. The results include the outcomes of the naïve Bayes and k-nearest neighbor classifications, which depict the comparison of the F-measure values across all nine features.

A. Experiment 1: Naïve Bayes Classification

In the first experiment, we varied the discriminant value and the time interval value in the naïve Bayes learning algorithm to find the best F-measure value. First, we used the testing data of the back attacks with the Packet feature (f1). Figure 1 shows the F-measure values on the back attacks using the Packet feature (f1). Then, we located the best F-measure values from this figure. Second, we switched from the Packet feature (f1) to the Byte feature (f2), and so on. After that, we changed the testing data from the back attacks to the ipsweep attacks. We went through the same process as for the back attacks and repeated it for all types of anomalies. Next, we compared the precision, recall, and F-measure values on all the features for each type of anomaly, as shown in Fig. 2. The x-axis indicates the features (f1-f9) listed in Table III. The results show the effective features for a particular type of anomaly using the naïve Bayes algorithm.

Fig. 2. Feature Comparison of Naïve Bayes Algorithm. [Figure: precision, recall, and F-measure for features f1-f9, one panel per anomaly type: Back, IpSweep, Neptune, PortSweep, and Smurf attacks.]
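In effect, Experiment 1 is a grid search over the time interval and the discriminant value. The sketch below reconstructs the inner loop under our own assumptions: a Gaussian naïve Bayes model (the paper does not name its variant) and a threshold on the anomaly posterior standing in for the discriminant value; the outer loop over interval lengths is omitted.

```python
# Hedged reconstruction of the Experiment 1 inner loop. GaussianNB and a
# posterior threshold standing in for the "discriminant value" are our
# assumptions, not the authors' stated method.
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import f1_score

def best_nb_threshold(X_train, y_train, X_test, y_test, thresholds):
    """Return (best F-measure, threshold) for one fixed interval length.

    Labels are assumed to be 0 (normal) / 1 (anomalous), one per interval.
    The loop over interval lengths (the second axis of Fig. 1) is omitted
    for brevity.
    """
    scores = GaussianNB().fit(X_train, y_train).predict_proba(X_test)[:, 1]
    best_f, best_t = -1.0, None
    for t in thresholds:
        f = f1_score(y_test, (scores >= t).astype(int), zero_division=0)
        if f > best_f:
            best_f, best_t = f, t
    return best_f, best_t
```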
B. Experiment 2: k-Nearest Neighbor Classification

In this experiment, we set k = 3 and then varied the distance value and the time interval value in the k-nearest neighbor algorithm to find the highest F-measure value. If the number of training examples within the distance value of a testing example is equal to or greater than 3, we classify the testing example as normal. Like in the prior experiment, we first focused on the testing data of the back attacks using the Packet feature (f1). Then, we spotted the highest F-measure value for the back attacks with the Packet feature (f1) from Fig. 3. Next, we moved on to different features until all nine features were covered. After that, we changed the testing data to different types of anomalies, just like we did with the naïve Bayes algorithm. Figure 4 shows a comparison of the precision, recall, and F-measure values for all the features of each type of anomaly. The main objective of this experiment was to compare the results from the k-nearest neighbor algorithm with those from the naïve Bayes algorithm.
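The stated decision rule (a testing interval is normal when at least k = 3 training examples lie within the chosen distance) translates directly to code. A minimal sketch, assuming Euclidean distance and a training set of clean intervals, neither of which the paper specifies:

```python
import numpy as np

def knn_is_anomaly(train_X, test_x, distance, k=3):
    """Experiment 2 rule as we read it: a testing interval is normal when
    >= k training intervals fall within `distance` of it.

    Euclidean distance is an assumption; the paper does not state the
    metric. `train_X` holds clean (anomaly-free) training intervals.
    """
    dists = np.linalg.norm(train_X - test_x, axis=1)
    return np.count_nonzero(dists <= distance) < k   # True -> anomalous

# Hypothetical usage on 1-D Packet-feature (f1) intervals:
train = np.array([[48.0], [50.0], [52.0], [49.0]])
print(knn_is_anomaly(train, np.array([51.0]), distance=5.0))   # False (normal)
print(knn_is_anomaly(train, np.array([500.0]), distance=5.0))  # True (anomaly)
```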
Fig. 3. F-measure of Packet Feature (f1) with k-Nearest Neighbor. [Figure: surface plot of F-measure against the distance value and the interval length in seconds.]

Fig. 4. Feature Comparison of k-Nearest Neighbor Algorithm. [Figure: precision, recall, and F-measure for features f1-f9, one panel per anomaly type: Back, IpSweep, Neptune, PortSweep, and Smurf attacks.]

IV. DISCUSSION

We found from the preliminary results that both experiments on the two algorithms produced virtually the same results. The effective features for the back attack were the Byte feature (f2) and the Packet feature (f1) for the two algorithms, respectively. The feasible features for the ipsweep attack were the DstAddr feature (f5), the DstPort feature (f7), and the ΔAddr feature (f8). All of the features can be used for the neptune attack except the SrcAddr feature (f4), the DstAddr feature (f5), and the ΔAddr feature (f8). For the portsweep attack, the only feature that produced the highest F-measure value was the DstPort feature (f7). The SrcAddr feature (f4) and the ΔAddr feature (f8) can be used for the smurf attack, as can the Packet feature (f1), the Byte feature (f2), and the Flow feature (f3). Surprisingly, we found that with almost all the features, the k-nearest neighbor algorithm showed higher recall values than the naïve Bayes algorithm for the same types of anomalies.

V. CONCLUSION

We conducted experiments using machine learning algorithms to answer the question of which interval-based features of network traffic are feasible for detecting anomalies, and searched for a technique that might improve the accuracy of detecting anomalies. We selected nine interval-based features of network traffic, five types of test bed anomalies, and two machine learning algorithms to study and evaluate the accuracy of each feature. The preliminary results revealed the more practical features for each of the anomaly types, although these results are only from the naïve Bayes and k-nearest neighbor algorithms. Our next steps are to perform the same experiments with the support vector machine and to apply a signal processing technique, the wavelet transform, to the feature extraction step in order to improve the accuracy of anomaly detection.

ACKNOWLEDGMENTS

We gratefully acknowledge the funding from the Faculty Members Development Scholarship Program of Bangkok University, Thailand.

REFERENCES

[1] V. Chandola, A. Banerjee, and V. Kumar, "Anomaly detection: A survey," ACM Comput. Surv., vol. 41, pp. 15:1–15:58, July 2009.
[2] C. Sinclair, L. Pierce, and S. Matzner, "An application of machine learning to network intrusion detection," in Proceedings of the 15th Annual Computer Security Applications Conference (ACSAC '99), 1999, pp. 371–377.
[3] K. Leung and C. Leckie, "Unsupervised anomaly detection in network intrusion detection using clusters," in Proceedings of the Twenty-eighth Australasian Conference on Computer Science - Volume 38, ser. ACSC '05. Darlinghurst, Australia: Australian Computer Society, Inc., 2005, pp. 333–342.
[4] P. Laskov, P. Düssel, C. Schäfer, and K. Rieck, "Learning intrusion detection: Supervised or unsupervised?" in Image Analysis and Processing - ICIAP 2005, ser. Lecture Notes in Computer Science, F. Roli and S. Vitulano, Eds. Springer Berlin / Heidelberg, 2005, vol. 3617, pp. 50–57.
[5] D. Barbará, J. Couto, S. Jajodia, and N. Wu, "ADAM: a testbed for exploring the use of data mining in intrusion detection," SIGMOD Rec., vol. 30, pp. 15–24, December 2001.
[6] L. Kuang, "DNIDS: A dependable network intrusion detection system using the CSI-KNN algorithm," 2007.
[7] L. Khan, M. Awad, and B. Thuraisingham, "A new intrusion detection system using support vector machines and hierarchical clustering," The VLDB Journal, vol. 16, pp. 507–521, October 2007.
[8] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd ed. Wiley-Interscience, 2001.
[9] G. Shakhnarovich, T. Darrell, and P. Indyk, Nearest-Neighbor Methods in Learning and Vision: Theory and Practice (Neural Information Processing). The MIT Press, 2006.
[10] R. Lippmann, D. Fried, I. Graf, J. Haines, K. Kendall, D. McClung, D. Weber, S. Webster, D. Wyschogrod, R. Cunningham, and M. Zissman, "Evaluating intrusion detection systems: the 1998 DARPA off-line intrusion detection evaluation," vol. 2, 2000, pp. 12–26.
[11] R. Lippmann, J. W. Haines, D. J. Fried, J. Korba, and K. Das, "The 1999 DARPA off-line intrusion detection evaluation," Computer Networks, vol. 34, no. 4, pp. 579–595, 2000.
[12] C. J. van Rijsbergen, Information Retrieval. Newton, MA, USA: Butterworth-Heinemann, 1979.
[13] J. Davis and M. Goadrich, "The relationship between precision-recall and ROC curves," in ICML '06: Proceedings of the 23rd International Conference on Machine Learning. New York, NY, USA: ACM, 2006, pp. 233–240.