Towards Generating Real-Life Datasets For Network Intrusion Detection
1 Department of Computer Science and Engineering, Kaziranga University, Jorhat-785006, Assam, India (Email: [email protected])
2 Department of Computer Science and Engineering, Tezpur University, Tezpur-784028, Assam, India (Email: [email protected])
3 Department of Computer Science, University of Colorado at Colorado Springs, CO 80918, USA (Email: [email protected])
(Received February 5, 2015; revised and accepted Apr. 20 & May 9, 2015)
The DARPA 1998 dataset [26] is commonly used for performance evaluation of anomaly detection systems [24], so that one method can be compared against others.

• Parameter tuning: To properly obtain the model to classify normal from malicious traffic, it is necessary to tune model parameters. Network anomaly detection assumes the normality model to identify malicious traffic. For example, Cemerlic et al. [9] and Thomas et al. [44] use the attack-free part of the DARPA 1999 dataset for training to estimate parameter values.

• Dimensionality or the number of features: An optimal set of features or attributes should be used to represent normal as well as all possible attack instances.

1.2 Requirements

Although good datasets are necessary for validating and evaluating IDSs, generating such datasets is a time consuming task. A dataset generation approach should meet the following requirements.

• Real world: A dataset should be generated by monitoring the daily situation in a realistic way, such as the daily network traffic of an organization.

• Completeness in labelling: The labelling of traffic as benign or malicious must be backed by proper evidence for each instance. The aim these days should be to provide labelled datasets at both packet and flow levels for each piece of benign and malicious traffic.

• Correctness in labelling: Given a dataset, the labelling of each traffic instance must be correct. This means that our knowledge of the security events represented by the data has to be certain.

• Sufficient trace size: The generated dataset should be unbiased in terms of size in both benign and malicious traffic instances.

• Concrete feature extraction: Extraction of an optimal set of concrete features when generating a dataset is important because such features play an important role when validating a detection mechanism.

• Diverse attack scenarios: With the increasing frequency, size, variety and complexity of attacks, intrusion threats have become more complex, including the selection of targeted services and applications. When contemplating attack scenarios for dataset generation, it is important to tilt toward a diverse set of recent, multi-step attacks.

• Ratio between normal and attack traffic: Most benchmark datasets are biased because the proportions of normal and attack traffic are not the same. This is because normal traffic is usually much more common than anomalous traffic. However, the evaluation of an intrusion detection method or system using biased datasets may not be fit for real-time deployment in certain situations. Most existing datasets have been created based on the following assumptions.

  – Anomalous traffic is statistically different from normal traffic [13].

  – The majority of network traffic instances is normal [36].

However, unlike most traditional intrusions, DDoS attacks do not follow these assumptions because they change the network traffic rate dynamically and employ multi-stage attacks. A DDoS dataset must reflect this fact.

1.3 Motivation and Contributions

By considering the aforementioned requirements, we propose a systematic approach for generating real-life network intrusion datasets at both packet and flow levels, with a view to analyzing, testing and evaluating network intrusion detection methods and systems with a clear focus on anomaly based detectors. The following are the major contributions of this paper.

• We present guidelines for real-life intrusion dataset generation.

• We discuss systematic generation of both normal and attack traffic.

• We extract features from the captured network traffic, such as basic, content-based, time-based, and connection-based features, using a distributed feature extraction framework.

• We generate three categories of real-life intrusion datasets, viz., (i) the TUIDS (Tezpur University Intrusion Detection System) intrusion dataset, (ii) the TUIDS coordinated scan dataset, and (iii) the TUIDS DDoS dataset. These datasets are available for the research community to download for free.

1.4 Organization of the Paper

The remainder of the paper is organized as follows. Section 2 discusses prior datasets and their characteristics. Section 3 is dedicated to the discussion of a systematic approach to generate real-life datasets for intrusion detection with a focus on network anomaly detectors. Finally, Section 4 presents observations and concluding remarks.

2 Existing Datasets

As discussed earlier, datasets play an important role in the testing and validation of network anomaly detection methods or systems. A good quality dataset not only allows us to identify the ability of a method or a system to detect anomalous behavior, but also allows us to assess its potential effectiveness when deployed in real operating environments. Several
International Journal of Network Security, Vol.17, No.6, PP.683-701, Nov. 2015 685
datasets are publicly available for testing and evaluation of network anomaly detection methods and systems. A taxonomy of network intrusion datasets is shown in Figure 1. We briefly discuss each of them below.

• Denial of Service (DoS): An attacker attempts to prevent valid users from using a service provided by a system. Examples include SYN flood, smurf and teardrop attacks.

• Remote to Local (r2l): Attackers try to gain entrance to a victim machine without having an account on it. An example is the password guessing attack.
[Table 2: List of attacks and corresponding services in the KDDcup99 dataset — e.g., DoS: smurf, teardrop, back, land, ping of death, SYN flood, mailbomb, udpstorm, apache2, process table; Probe: ipsweep, nmap, satan, saint, mscan; r2l: dictionary, guest, imap, sendmail, ftp-write, phf, multihop, spy, xlock, xsnoop, named; u2r: rootkit, loadmodule, perl, ffbconfig, fdformat — together with the services they target (telnet, rlogin, smtp, dns, ftp, http, icmp, any TCP) and their instance counts. Table 3 gives the corresponding counts for the KDD99, Corrected KDD and NSL-KDD datasets.]
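The attack-to-category assignment summarized in Table 2 can be expressed as a small lookup. Below is a minimal sketch over a subset of the attacks named in the table; the category assignments follow the standard KDDcup99 taxonomy, and the helper name `attack_category` is ours:

```python
# Map KDDcup99 attack names to the four standard categories
# (a subset of the attacks listed in Table 2).
KDD_CATEGORIES = {
    "dos": {"smurf", "teardrop", "back", "land", "ping of death",
            "syn flood", "mailbomb", "udpstorm", "apache2", "process table"},
    "probe": {"ipsweep", "nmap", "satan", "saint", "mscan"},
    "r2l": {"dictionary", "guest", "imap", "sendmail", "ftp-write",
            "phf", "multihop", "spy", "xlock", "xsnoop", "named"},
    "u2r": {"rootkit", "loadmodule", "perl", "ffbconfig", "fdformat"},
}

def attack_category(name):
    """Return the KDDcup99 category of an attack name, or 'unknown'."""
    name = name.lower()
    for category, attacks in KDD_CATEGORIES.items():
        if name in attacks:
            return category
    return "unknown"
```

Such a mapping is what turns raw per-attack labels into the coarse DoS/Probe/r2l/u2r classes that most evaluations report.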
traces are anonymized backbone traces without their payload. The CAIDA DDoS 2007 attack dataset contains one hour of anonymized traffic traces from DDoS attacks on August 4, 2007, which attempted to consume a large amount of network resources when connecting to Internet servers. The traffic traces contain only attack traffic to the victim and responses from the victim, split into 5-minute segments. All traffic traces are in pcap (tcpdump) format. The creators removed non-attack traffic as much as possible when creating the CAIDA DDoS 2007 dataset.

2.2.6 LBNL Dataset

LBNL's internal enterprise traffic traces are full header network traces without payload [23]. This dataset suffers from heavy anonymization to the extent that scanning traffic was extracted and separately anonymized to remove any information which could identify individual IPs. The background and attack traffic in the LBNL dataset are described below.

• LBNL background traffic: This dataset can be obtained from the Lawrence Berkeley National Laboratory (LBNL) in the US. Traffic in this dataset is comprised of packet level incoming, outgoing and internally routed traffic streams at the LBNL edge routers. Traffic was anonymized using the tcpmkpub tool [35]. The main applications observed in the internal and external traffic are Web, email and name services. Other applications like Windows services, network file services and backup were used by internal hosts. The details of each service, information on each packet and other relevant descriptions are given in [34]. The background network traffic statistics of the LBNL dataset are given in Table 4.

• LBNL attack traffic: This dataset identifies attack traffic by isolating scans in aggregate traffic traces. Scans are identified by flagging those hosts which unsuccessfully probe more than 20 hosts, out of which 16 hosts are probed in ascending or descending IP order [35]. Malicious traffic mostly consists of failed incoming TCP SYN requests, i.e., TCP port scans targeted towards LBNL hosts. However, there are also some outgoing TCP scans in the dataset. Most UDP traffic observed in the data (incoming and outgoing) is comprised of successful connections, i.e., host replies for the received UDP flows. Clearly, the attack rate is significantly lower than the background traffic rate. Details of the attack traffic in this dataset are shown in Table 4.

2.2.7 Endpoint Dataset

Complexity and privacy were two main reservations of the participants of the endpoint data collection study. To address these reservations, the dataset creators developed a custom multi-threaded MS Windows tool using the Winpcap API [7] for data collection. To reduce packet logging complexity at the endpoints, they only logged very elementary session-level information (bidirectional communication between two IP addresses on different ports) for the TCP and UDP packets. To ensure user privacy, an anonymization policy was used to anonymize all traffic instances. The background and attack traffic for the endpoint datasets are described below.

• Endpoint background traffic: In the endpoint context, we see in Table 5 that home computers generate significantly higher traffic volumes than office and university computers because: (i) they are generally shared between multiple users, and (ii) they run peer-to-peer and multimedia applications. The large traffic volumes of home computers are also evident from their high mean number of sessions per second. To generate attack traffic, the developers infect Virtual Machines (VMs) at the endpoints with different malware, viz., Zotob.G, Forbot-FU, Sdbot-AFR, Dloader-NY, So-Big.E@mm, MyDoom.A@mm, Blaster, Rbot-AQJ, and RBOT.CCC. Details of the malware can be found in [42]. Characteristics of the attack traffic in this dataset are given in Table 6. These malware have diverse scanning rates and attack ports or applications.

• Endpoint attack traffic: The attack traffic logged at the endpoints is mostly comprised of outgoing port scans. Note that this is the opposite of the LBNL dataset, in which most attack traffic is inbound. Moreover, the attack traffic rates at the endpoints are generally much higher than the background traffic rates of the LBNL datasets. This diversity in attack direction and rates provides a sound basis for performance comparison among scan detectors. For each malware, attack traffic of 15-minute duration was inserted in the background traffic of each endpoint at a random time instance. This operation was repeated to insert 100 non-overlapping attacks of each worm inside each endpoint's background traffic.

2.3 Real-life Datasets

We discuss three real-life datasets created by collecting network traffic on several consecutive days. The details include both normal as well as attack traffic in appropriate proportions in the authors' respective campus networks (i.e., testbeds).

2.3.1 UNIBS Dataset

The UNIBS packet traces [45] were collected on the edge router of the campus network of the University of Brescia in Italy, on three consecutive working days. The dataset includes traffic collected and stored using 20 workstations, each running the GT (Ground Truth) client daemon. The dataset creators collected the traffic by running tcpdump on the faculty router, which was a dual Xeon Linux box that connected the local network to the Internet through a dedicated 100Mb/s uplink. They captured and stored the traces on a dedicated disk of a workstation connected to the router through a dedicated ATA controller.

2.3.2 ISCX-UNB Dataset

The ISCX-UNB dataset [37] is built on the concept of profiles that include the details of intrusions. The datasets were col-
Table 4: Background and attack traffic information for the LBNL datasets

Date        Duration (mins)  LBNL hosts  Remote hosts  Background traffic rate (packets/sec)  Attack traffic rate (packets/sec)
10/04/2004  10               4,767       4,342         8.47                                    0.41
12/15/2004  60               5,761       10,478        3.5                                     0.061
12/16/2004  60               5,210       7,138         243.83                                  72
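The scan-labelling rule used for the LBNL attack traffic (a host is flagged if it unsuccessfully probes more than 20 destinations, 16 of which are probed in ascending or descending IP order) can be sketched as follows. The input format, an ordered list of failed-probe destination IPs per source host, is our assumption, as are the function names:

```python
from ipaddress import IPv4Address

def longest_monotonic_run(ips):
    """Length of the longest strictly ascending or descending run of IPs."""
    ints = [int(IPv4Address(ip)) for ip in ips]
    best = inc = dec = 1 if ints else 0
    for prev, cur in zip(ints, ints[1:]):
        inc = inc + 1 if cur > prev else 1  # extend/reset ascending run
        dec = dec + 1 if cur < prev else 1  # extend/reset descending run
        best = max(best, inc, dec)
    return best

def is_scanner(failed_probe_ips, min_hosts=20, min_ordered=16):
    """Apply the LBNL heuristic to one source host's failed probes."""
    if len(set(failed_probe_ips)) <= min_hosts:
        return False
    return longest_monotonic_run(failed_probe_ips) >= min_ordered
```

A host probing 25 consecutive addresses in order is flagged, while a host whose failed probes never form an ordered run of 16 is not, even if it contacts many destinations.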
among multiple hosts for individual actions which may be synchronized. We use the rnmap10 tool to launch coordinated scans in our testbed network during the collection of traffic.

3.3.4 Scenario 4: User to Root Using Brute Force ssh

These attacks are very common against networks as they tend to break into accounts with weak username and password combinations. This attack has been designed with the goal of acquiring an SSH account by running a brute force dictionary attack against our central server. We use the brutessh11 tool and a customized dictionary list. The dictionary consists of over 6100 alphanumeric entries of varying length. We executed the attack for 60 minutes, during which superuser credentials were returned from the server. This ID and password combination was used to download other users' credentials immediately.

10 http://rnmap.sourceforge.net/
11 http://www.securitytube-tools.net/
Figure 5: (a) Composition of protocols and (b) average throughput during the last hour of data capture for the TUIDS intrusion dataset, as seen in our lab's traffic
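The protocol shares plotted in Figure 5(a) are simple percentage compositions over per-protocol byte counts. A minimal sketch, using the TCP/UDP/ICMP sizes later tabulated in Table 14 as illustrative input (the function name is ours):

```python
def protocol_composition(sizes_mb):
    """Percentage share of each protocol, given sizes in MB."""
    total = sum(sizes_mb.values())
    return {proto: round(100.0 * size / total, 2)
            for proto, size in sizes_mb.items()}

# TCP/UDP/ICMP sizes (MB) for the TUIDS intrusion dataset (Table 14)
tuids_sizes = {"TCP": 49049.29, "UDP": 14940.53, "ICMP": 2798.43}
print(protocol_composition(tuids_sizes))
# → {'TCP': 73.44, 'UDP': 22.37, 'ICMP': 4.19}
```

The computed shares reproduce the percentages reported for the TCP/UDP/ICMP breakdown of the captured traffic.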
The extent and scope of network traffic capture become relevant in situations where the information contained in the traces may breach the privacy of individuals or organizations. In order to prevent privacy issues, almost all publicly available datasets remove any identifying information such as payload, protocol, destination and flags. In addition, the data is anonymized where necessary: header information is cropped or flows are just summarized.

In addition to anomalous traffic, traces must contain background traffic. Most captured datasets provide little control over the anomalous activities included in the traces. However, a major concern with evaluating anomaly based detection approaches is the requirement that anomalous traffic must be present at a certain scale. Anomalous traffic also tends to become outdated with the introduction of more sophisticated attacks. So, we have generated more up-to-date datasets that reflect current trends and are tailored to evaluate particular characteristics of detection mechanisms.

As discussed earlier, several datasets are available for evaluating an IDS. Network intrusion detection researchers evaluate detection methods using intrusion datasets to demonstrate how their methods can handle recent attacks and network environments. We have used our datasets to evaluate several network intrusion detection methods: an outlier-based network anomaly detection approach (NADO) [4], an unsupervised method [3, 6], an adaptive outlier-based coordinated scan detection approach (AOCD) [5], and a multi-level hybrid IDS (MLH-IDS) [15]. We found better results in almost all the experiments when we used the TUIDS datasets, in terms of false positive rate, true positive rate and F-measure.

3.8 Comparison with Other Relevant Work

Our approach differs from other works as follows.

• The NSL-KDD [32] dataset is an enhanced version of the KDDcup99 intrusion dataset prepared by Tavallaee et al. [43]. It removes repeated traffic records from the old KDDcup99 dataset, but it is too old to evaluate a recently developed detection method or system. In contrast, our datasets are prepared using diverse attack scenarios incorporating recent attacks, and they contain both packet and flow level information that helps detect attacks more effectively in high speed networks.

• Song et al. [39] prepared the KU dataset and used it to evaluate an unsupervised network anomaly detection method. This dataset contains 17 different features at packet level only. In contrast, we present a systematic approach to generate real-life network intrusion datasets and have prepared three different categories of datasets at both packet and flow levels.

• Like Shiravi et al. [37], our approach considers recently developed attacks and attacks on network layers when generating the datasets. Shiravi et al. concentrate mostly on application-layer attacks. They build profiles for different real-world attack scenarios and use them to generate traffic that follows the same behavior while generating the dataset at packet level. In comparison, we generate three different categories of datasets at both packet and flow levels for the research community to evaluate detection methods or systems. Since we have extracted a larger number of features at both packet and flow levels, our
datasets help identify individual attacks more effectively in high speed networks.

4 Observations and Conclusion

Several questions may be raised with respect to what constitutes a perfect dataset when dealing with the dataset generation task. These include the qualities of normal, anomalous or realistic traffic included in the dataset. We provide a path and a template to generate a dataset that simultaneously exhibits the appropriate levels of normality, anomalousness and realism while avoiding the various weak points of currently available datasets, pointed out earlier. Quantitative measurements can be obtained only when specific methods are applied to the dataset.

The following are the major observations and requirements when generating an unbiased real-life dataset for intrusion detection.

• The dataset should not exhibit any unintended property in either normal or anomalous traffic.

• The dataset should be labelled properly.

• The dataset should cover all possible current network scenarios.

• The dataset should be entirely nonanonymized.

• In most benchmark datasets, the two basic assumptions described in Section 1 are valid, but this bias should be avoided as much as possible.
Table 14: TUIDS dataset traffic composition

(a) Total traffic composition
Protocol  Size (MB)  (%)
IP        66784.29   99.99
ARP       3.96       0.005
IPv6      0.00       0.00
IPX       0.00       0.00
STP       0.00       0.00
Other     0.00       0.00

(b) TCP/UDP/ICMP traffic composition
Protocol  Size (MB)  (%)
TCP       49049.29   73.44
UDP       14940.53   22.37
ICMP      2798.43    4.19
ICMPv6    0.00       0.00
Other     0.00       0.00

[9] A. Cemerlic, L. Yang, and J. M. Kizza, "Network intrusion detection based on Bayesian networks," in Proceedings of the 20th International Conference on Software Engineering and Knowledge Engineering, pp. 791–794, San Francisco, USA, 2008.
[10] A. Dainotti and A. Pescape, "PLAB: A packet capture and analysis architecture," 2004. (http://traffic.comics.unina.it/software/ITG/D-ITGpublications/TR-DIS-122004.pdf)
[11] DEFCON, The SHMOO Group, 2011. (http://cctf.shmoo.com/)
[12] L. Delooze, Applying Soft-Computing Techniques to Intrusion Detection, Ph.D. Thesis, Computer Science Department, University of Colorado, Colorado Springs, 2005.
[13] D. E. Denning, "An intrusion-detection model," IEEE Transactions on Software Engineering, vol. 13, pp. 222–232, Feb. 1987.
[14] A. A. Ghorbani, W. Lu, and M. Tavallaee, "Network attacks," in Network Intrusion Detection and Prevention, pp. 1–25, Springer-Verlag, 2010.
[15] P. Gogoi, D. K. Bhattacharyya, B. Bora, and J. K. Kalita, "MLH-IDS: A multi-level hybrid intrusion detection method," The Computer Journal, vol. 57, pp. 602–623, May 2014.
[16] P. Gogoi, M. H. Bhuyan, D. K. Bhattacharyya, and J. K. Kalita, "Packet and flow-based network intrusion dataset," in Proceedings of the 5th International Conference on Contemporary Computing, LNCS-CCIS 306, pp. 322–334, Springer, 2012.
Table 15: Distribution of normal and attack connection instances in real time packet and flow level TUIDS datasets
Dataset type
Connection type Training dataset Testing dataset
(a) TUIDS intrusion dataset
Packet level
Normal 71785 58.87% 47895 55.52%
DoS 42592 34.93% 30613 35.49%
Probe 7550 6.19% 7757 8.99%
Total 121927 - 86265 -
Flow level
Normal 23120 43.75% 16770 41.17%
DoS 21441 40.57% 14475 35.54%
Probe 8282 15.67% 9480 23.28%
Total 52843 - 40725 -
(b) TUIDS coordinated scan dataset
Packet level
Normal 65285 90.14% 41095 84.95%
Probe 7140 9.86% 7283 15.05%
Total 72425 - 48378 -
Flow level
Normal 20180 73.44% 15853 65.52%
Probe 7297 26.56% 8357 34.52%
Total 27477 - 24210 -
(c) TUIDS DDoS dataset
Packet level
Normal 46513 68.62% 44328 60.50%
Flooding attacks 21273 31.38% 28936 39.49%
Total 67786 - 73264 -
Flow level
Normal 27411 57.67% 28841 61.38%
Flooding attacks 20117 42.33% 18150 38.62%
Total 47528 - 46991 -
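The per-class counts in Table 15 can be cross-checked against the reported totals programmatically. A small sketch, using the training counts from panel (a); the function name is ours:

```python
def class_shares(counts):
    """Return the total instance count and each class's percentage share."""
    total = sum(counts.values())
    return total, {label: 100.0 * n / total for label, n in counts.items()}

# Training counts from Table 15(a), TUIDS intrusion dataset
packet_train = {"Normal": 71785, "DoS": 42592, "Probe": 7550}
flow_train = {"Normal": 23120, "DoS": 21441, "Probe": 8282}

pkt_total, pkt_shares = class_shares(packet_train)
flow_total, flow_shares = class_shares(flow_train)
assert pkt_total == 121927 and flow_total == 52843  # match Table 15 totals
```

The computed shares (e.g., roughly 58.9% normal at packet level and 43.8% at flow level) agree with the percentages reported in the table up to rounding.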
and Computer Applications, vol. 36, no. 2, pp. 567–581, 2013.
[26] R. P. Lippmann, D. J. Fried, I. Graf, et al., "Evaluating intrusion detection systems: The 1998 DARPA offline intrusion detection evaluation," in Proceedings of the DARPA Information Survivability Conference and Exposition, pp. 12–26, 2000.
[27] M. V. Mahoney and P. K. Chan, "An analysis of the 1999 DARPA/Lincoln laboratory evaluation data for network anomaly detection," in Proceedings of the 6th International Symposium on Recent Advances in Intrusion Detection, pp. 220–237, 2003.
[28] S. McCanne and V. Jacobson, "The BSD packet filter: A new architecture for user level packet capture," in Proceedings of the Winter 1993 USENIX Conference, pp. 259–269, 1993.
[29] J. McHugh, "Testing intrusion detection systems: A critique of the 1998 and 1999 DARPA intrusion detection system evaluations as performed by Lincoln Laboratory," ACM Transactions on Information and System Security, vol. 3, pp. 262–294, Nov. 2000.
[30] P. Mell, V. Hu, R. Lippmann, J. Haines, and M. Zissman, An Overview of Issues in Testing Intrusion Detection Systems, 2003. (http://citeseer.ist.psu.edu/621355.html)
[31] Z. Muda, W. Yassin, M. N. Sulaiman, and N. I. Udzir, "A K-means and naive Bayes learning approach for better intrusion detection," Information Technology Journal, vol. 10, no. 3, pp. 648–655, 2011.
[32] NSL-KDD, NSL-KDD Data Set for Network-based Intrusion Detection Systems, Mar. 2009. (http://iscx.cs.unb.ca/NSL-KDD/)
[33] M. E. Otey, A. Ghoting, and S. Parthasarathy, "Fast distributed outlier detection in mixed-attribute data sets," Data Mining and Knowledge Discovery, vol. 12, no. 2-3, pp. 203–228, 2006.
[34] R. Pang, M. Allman, M. Bennett, J. Lee, V. Paxson, and B. Tierney, "A first look at modern enterprise traffic," in Proceedings of the 5th ACM SIGCOMM Conference on Internet Measurement, pp. 2, Berkeley, USA, 2005.
[35] R. Pang, M. Allman, V. Paxson, and J. Lee, "The devil and packet trace anonymization," SIGCOMM Computer Communication Review, vol. 36, no. 1, pp. 29–38, 2006.
[36] L. Portnoy, E. Eskin, and S. Stolfo, "Intrusion detection with unlabeled data using clustering," in Proceedings of the ACM CSS Workshop on Data Mining Applied to Security, pp. 5–8, 2001.
[37] A. Shiravi, H. Shiravi, M. Tavallaee, and A. A. Ghorbani, "Towards developing a systematic approach to generate benchmark datasets for intrusion detection," Computers & Security, vol. 31, no. 3, pp. 357–374, 2012.
[38] J. Song, H. Takakura, and Y. Okabe, "Description of Kyoto University benchmark data," pp. 1–3, 2006. (http://www.takakura.com/Kyoto data/BenchmarkData-Description-v5.pdf)
[39] J. Song, H. Takakura, Y. Okabe, and K. Nakao, "Toward a more practical unsupervised anomaly detection system," Information Sciences, vol. 231, pp. 4–14, Aug. 2013.
[40] A. Sperotto, R. Sadre, F. Vliet, and A. Pras, "A labeled data set for flow-based intrusion detection," in Proceedings of the 9th IEEE International Workshop on IP Operations and Management, pp. 39–50, Venice, Italy, 2009.
[41] S. J. Stolfo, W. Fan, W. Lee, A. Prodromidis, and P. K. Chan, "Cost-based modeling for fraud and intrusion detection: Results from the JAM project," in Proceedings of the IEEE DARPA Information Survivability Conference and Exposition, vol. 2, pp. 130–144, USA, 2000.
[42] symantec.com, Symantec Security Response, June 2015. (http://securityresponse.symantec.com/avcenter)
[43] M. Tavallaee, E. Bagheri, W. Lu, and A. A. Ghorbani, "A detailed analysis of the KDD CUP 99 data set," in Proceedings of the 2nd IEEE International Conference on Computational Intelligence for Security and Defense Applications, pp. 53–58, USA, 2009.
[44] C. Thomas, V. Sharma, and N. Balakrishnan, "Usefulness of DARPA dataset for intrusion detection system evaluation," in Proceedings of the Data Mining, Intrusion Detection, Information Assurance, and Data Networks Security, SPIE 6973, Orlando, FL, 2008.
[45] UNIBS, University of Brescia Dataset, 2009. (http://www.ing.unibs.it/ntw/tools/traces/)
[46] J. Xu and C. R. Shelton, "Intrusion detection using continuous time Bayesian networks," Journal of Artificial Intelligence Research, vol. 39, pp. 745–774, 2010.
[47] G. Zhang, S. Jiang, G. Wei, and Q. Guan, "A prediction-based detection algorithm against distributed denial-of-service attacks," in Proceedings of the ACM International Conference on Wireless Communications and Mobile Computing: Connecting the World Wirelessly, pp. 106–110, Leipzig, Germany, 2009.
[48] Y. F. Zhang, Z. Y. Xiong, and X. Q. Wang, "Distributed intrusion detection based on clustering," in Proceedings of the International Conference on Machine Learning and Cybernetics, vol. 4, pp. 2379–2383, Aug. 2005.

Monowar H. Bhuyan is an assistant professor in the Department of Computer Science and Engineering at Kaziranga University, Jorhat, Assam, India. He received his Ph.D. in Computer Science & Engineering from Tezpur University (a Central University) in February 2014. He is a life member of IETE, India. His research areas include data mining, cloud security, and computer and network security. He has published 20 papers in international journals and refereed conference proceedings. He serves as a programme committee member/referee for several international conferences/journals.

Dhruba K. Bhattacharyya received his Ph.D. in Computer Science from Tezpur University in 1999. Currently, he is a Professor in the Computer Science & Engineering Department at Tezpur University. His research areas include data mining, network security and bioinformatics. Prof. Bhattacharyya has published more than 220 research papers in leading international journals and conference proceedings. He has also written/edited 10 books. He is on the editorial boards of several international journals and on the programme committees/advisory bodies of several international conferences/workshops.