Network Intrusion Detection System Using Single Level Multi-Model Decision Trees
Bachelor of Technology
in
DECLARATION
I hereby declare that the project entitled “Network Intrusion Detection System
Using Single Level Multi-Model Decision Trees”, submitted by our team for
the award of the degree of Bachelor of Technology in Information Security
Management to VIT, is a record of bona fide work carried out by our team under
the supervision of Prof. Amutha Prabakar.
I further declare that the work reported in this project has not been
submitted and will not be submitted, either in part or in full, for the
award of any other degree or diploma in this institute or any other
institute or university.
Place: Vellore
TABLE OF CONTENTS
1. Abstract
2. Introduction
3. Literature review
4. Technical Specifications
5. Attacks
6. Methodology
7. Architecture Diagram
8. Implementation
9. Conclusion
10. References
ABSTRACT
An intrusion detection system (IDS) is a software application that monitors a
network or its systems for malicious activity or policy violations. Any
intrusion or protocol violation is reported to an administrator or another
authorized person. Intrusion detection thus helps to catch malicious activity
that could harm or damage important data or services.
INTRODUCTION
An intrusion detection system allows packets to pass and then, based on the
behaviour of those packets in the system, stops further incoming packets. An
advantage of an intrusion detection system is that it is very fast, so only a
small number of infected packets are able to enter the system. The entry of
such packets is stopped early, which saves the system from much damage.
In today’s world, everyone is connected to the Internet, almost everything is
done online, and devices constantly exchange data packets from one PC to
another. Several types of attacks have evolved as a result. They can not only
harm the private data of an organisation but also disrupt services, as in the
case of a Denial of Service attack. A system that can detect such malicious
packets and protect against these losses is therefore the need of the hour.
LITERATURE REVIEW
TITLE (STUDY) | AUTHORS | YEAR OF PUBLICATION | METHODOLOGY
identifying smurf attacks with an accuracy of 98.6161%
An effective intrusion detection framework based on SVM with feature augmentation | Wang | 2017 | Proposed an SVM-based intrusion detection approach that preprocesses the data by replacing the original feature values with the logarithms of their marginal density ratios, which exploits the classification information contained in each feature. The result was compact, high-quality data, which improved detection performance and reduced the training time required for the SVM detection model.
A Deep Learning Approach for Intrusion Detection Using Recurrent Neural Networks | Yin, et al | 2017 | Investigated how to model an IDS with a deep learning approach based on recurrent neural networks (RNN-IDS), because of their potential for extracting better representations of the data and building better models. They preprocessed the dataset with a numericalization step, because the input to RNN-IDS must be a numeric matrix. The results showed that RNN-IDS achieves a high accuracy and detection rate with a low false positive rate compared with conventional classification techniques.
TECHNICAL SPECIFICATIONS
In this project, a multi-model decision tree classifier is proposed as an
approach for intrusion detection/prevention systems.
NumPy: A Python library with many built-in functions for working on
single- and multi-dimensional arrays.
Label Encoding: Converting categorical labels into numeric form so that they
become machine-readable.
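As a minimal sketch of what label encoding does, the following plain-Python example maps each distinct category to an integer. The helper names and the sample values are illustrative assumptions, not taken from the project code; the sorted ordering is chosen to mirror how a typical label encoder numbers its classes.

```python
def fit_label_encoder(values):
    """Map each distinct category to an integer, in sorted order."""
    return {category: index for index, category in enumerate(sorted(set(values)))}


def label_encode(values, mapping):
    """Replace every category by its integer code."""
    return [mapping[value] for value in values]


# Illustrative values only (hypothetical sample of a categorical column).
protocol_type = ["tcp", "udp", "icmp", "tcp"]

mapping = fit_label_encoder(protocol_type)
encoded = label_encode(protocol_type, mapping)

print(mapping)  # {'icmp': 0, 'tcp': 1, 'udp': 2}
print(encoded)  # [1, 2, 0, 1]
```

The mapping is fitted once and then reused, so that the same category always receives the same number in both the training and the testing data.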
Attacks against which the IDS will provide protection:
1) U2R: A U2R (user-to-root) attack means unauthorized access to local root
privileges. The attacker first gains legitimate access to a local machine
and then exploits some vulnerability in the victim’s system to illegally
obtain root privileges.
4) Probing: This is also a cyber attack where the attacker tries to steal
Many similar attacks have been grouped into these four categories so that the
project can be extended to detect a larger number of attacks.
METHODOLOGY
MODULES USED
Feature selection
Here we use Information Gain (IG) to choose the relevant features from the
dataset. It is computed for each attack class independently. The features are
ranked by their information gain, and any feature whose value is below a
threshold is eliminated. We partition the training dataset into 4 datasets, so
that each dataset consists of the records belonging to one attack class
together with a portion of the records of the original dataset. The dataset for
each attack class is then passed separately as input to the method used to
compute the attack class.
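The ranking step described above can be sketched as follows. This is a minimal illustration that assumes categorical feature values (continuous features would first need to be binned); the toy records, the `threshold` of 0.1 and the function names are assumptions made for the example and do not come from the project.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after grouping by feature value."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(group) / total * entropy(group) for group in groups.values())
    return entropy(labels) - remainder


def select_features(columns, labels, threshold):
    """Rank features by information gain and keep those at or above the threshold."""
    gains = {name: information_gain(values, labels) for name, values in columns.items()}
    ranked = sorted(gains.items(), key=lambda item: item[1], reverse=True)
    return [name for name, gain in ranked if gain >= threshold]


# Toy dataset for one attack class versus normal traffic (hypothetical records).
labels = ["attack", "attack", "normal", "normal"]
columns = {
    "flag": ["S0", "S0", "SF", "SF"],                # separates the classes perfectly
    "protocol_type": ["tcp", "udp", "tcp", "udp"],   # carries no class information
}

selected = select_features(columns, labels, threshold=0.1)
print(selected)  # ['flag']
```

Running the same function once per attack-class dataset would give each class its own feature list, which is the per-class behaviour the paragraph describes.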
Obtaining the best features
Developing a classifier
A supervised machine learning model is used here to classify labelled data into
a particular class. The classification algorithm used in our project is the
decision tree classifier. A decision tree is an algorithm that takes a decision
at each node of the tree and is widely used for regression and classification.
We chose decision trees because they can be trained very easily. In most cases
they are also more efficient than many other classification algorithms in ML,
such as K-Nearest Neighbours.
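To illustrate what "taking a decision at a node" means, the sketch below searches one numeric feature for the threshold that best separates two classes. It assumes Gini impurity as the split criterion, which the report does not state, and the sample values for `srv_count` are invented for the example.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted impurity) of the best `value <= threshold` test."""
    best = (None, gini(labels))
    # The largest value is skipped because it would leave the right branch empty.
    for threshold in sorted(set(values))[:-1]:
        left = [label for value, label in zip(values, labels) if value <= threshold]
        right = [label for value, label in zip(values, labels) if value > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical records: connection counts and their classes.
srv_count = [1, 2, 3, 250, 300, 511]
labels = ["normal", "normal", "normal", "dos", "dos", "dos"]

threshold, impurity = best_split(srv_count, labels)
print(threshold, impurity)  # 3 0.0
```

A full decision tree repeats this search over every feature at every node and recurses into the two branches until the nodes are pure or a stopping rule applies.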
ARCHITECTURE DIAGRAM
Implementation
Dataset Description
The columns we use from the given dataset are as follows:
This is how the datasets look for training and testing:
The table dimensions are:
Training dataset:
Testing dataset
Here we have separated the categorical data columns, which are protocol_type,
service, flag and label, and counted the unique values of each for the training
dataset and the testing dataset.
Training dataset
The testing dataset has 6 fewer categories in the feature service than the
training dataset.
Then we imported LabelEncoder and OneHotEncoder to transform the categorical
values into binary values. We took the 3 categorical feature columns and formed
a new data frame from them.
Next, we gave a unique name to each unique category of the 3 feature columns so
that the resulting columns are easy to identify.
We then label encoded the 3 columns: LabelEncoder numbers the unique categories
and transforms each value in a column to its corresponding number, for both the
training and the test dataset.
After that we performed one-hot encoding on the label-encoded dataset to
transform it into binary form. In one-hot encoding, each category value is
converted into a new column and assigned a 1 or 0.
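The effect of one-hot encoding can be sketched without any library, as below. The sample `flag` values, the helper name and the `flag_` column prefix are assumptions chosen for illustration; they stand in for the unique column names mentioned earlier.

```python
def one_hot_encode(values, categories):
    """Return one row per value, with a 1 in the column of its category and 0 elsewhere."""
    return [[1 if value == category else 0 for category in categories] for value in values]


# Illustrative values only (hypothetical sample of the flag column).
flag = ["SF", "S0", "REJ", "SF"]

categories = sorted(set(flag))                        # ['REJ', 'S0', 'SF']
column_names = ["flag_" + category for category in categories]
rows = one_hot_encode(flag, categories)

print(column_names)  # ['flag_REJ', 'flag_S0', 'flag_SF']
print(rows)          # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 0, 1]]
```

Each row contains exactly one 1, so a single categorical column with k categories becomes k binary columns.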
The service column has a number of categories that appear only in the training
dataset. Here we have fetched them; these are the 6 categories that are missing
from the testing dataset.
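One way to find such missing categories is a set difference between the two columns. The sketch below also appends an all-zero column for each missing category, so that both encoded datasets end up with the same width; that padding step is an assumption about how the mismatch is handled, and the service values shown are placeholders rather than the actual 6 categories.

```python
def missing_categories(train_values, test_values):
    """Categories that occur in the training column but never in the test column."""
    return sorted(set(train_values) - set(test_values))


def add_missing_columns(rows, column_names, missing_names):
    """Append an all-zero column to every row for each missing column name."""
    new_names = column_names + missing_names
    new_rows = [row + [0] * len(missing_names) for row in rows]
    return new_rows, new_names


# Hypothetical samples of the service column.
train_service = ["http", "ftp", "smtp", "red_i"]
test_service = ["http", "smtp"]

missing = missing_categories(train_service, test_service)
print(missing)  # ['ftp', 'red_i']

# One-hot encoded test data before padding (two rows, two columns).
test_columns = ["service_http", "service_smtp"]
test_rows = [[1, 0], [0, 1]]

test_rows, test_columns = add_missing_columns(
    test_rows, test_columns, ["service_" + name for name in missing]
)
print(test_columns)  # ['service_http', 'service_smtp', 'service_ftp', 'service_red_i']
print(test_rows)     # [[1, 0, 0, 0], [0, 1, 0, 0]]
```

In practice the padded test columns would also need to be put in the same order as the training columns before a model is applied.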
After removing the categorical columns from the original data frame and adding
back the new data frame of binary-encoded data, we get the following dimensions:
After that, feature selection is done using RFE (recursive feature elimination)
to select the 13 best features from the 122 features.
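The elimination loop behind RFE can be sketched as follows: score the remaining features, drop the weakest one, and repeat until the desired number is left. This is only an outline of the idea. In the project the scores would presumably come from the fitted estimator; here a simple stand-in scorer (the gap between class means) is assumed, and apart from srv_count the feature names and all values are hypothetical.

```python
def recursive_feature_elimination(columns, labels, n_keep, importance_fn):
    """Drop the least important feature, re-score the rest, repeat until n_keep remain."""
    remaining = dict(columns)
    while len(remaining) > n_keep:
        scores = importance_fn(remaining, labels)  # re-computed in every round
        weakest = min(scores, key=scores.get)
        del remaining[weakest]
    return list(remaining)


def mean_gap_importance(columns, labels):
    """Stand-in scorer: absolute gap between the two class means of each feature."""
    scores = {}
    for name, values in columns.items():
        positives = [v for v, label in zip(values, labels) if label == 1]
        negatives = [v for v, label in zip(values, labels) if label == 0]
        scores[name] = abs(sum(positives) / len(positives) - sum(negatives) / len(negatives))
    return scores


# Hypothetical data: 1 marks an attack record, 0 a normal record.
labels = [1, 1, 0, 0]
columns = {
    "srv_count": [400, 380, 10, 20],
    "feature_b": [0.1, 0.2, 0.9, 1.0],
    "feature_c": [0.5, 0.5, 0.5, 0.5],  # constant, so it carries no information
}

kept = recursive_feature_elimination(columns, labels, n_keep=2, importance_fn=mean_gap_importance)
print(kept)  # ['srv_count', 'feature_b']
```

With `n_keep=13` and 122 input columns, the loop would run 109 rounds and remove one feature in each.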
The selected features are:
Then we built the model using the decision tree classifier and obtained the
following results:
Dos ATTACK
Probe ATTACK
R2L ATTACK
U2R ATTACK
Dos attack
Probe attack
R2L attack
U2R attack
Conclusion
We have built an intrusion detection system that uses a decision tree
classifier to detect an attack and classify it into a particular attack class.
The data was preprocessed so as to decrease the computation time and analyse
only the selected features. Many network parameters, such as srv_count, have
been taken into account, which helps our IDS detect attacks more easily and
effectively. The confusion matrices that were calculated show a very low number
of false positives and a very high number of true positives. Evaluation
measures such as accuracy and precision were also calculated, and the accuracy
was found to be around 99%, which is very good and better than many other
existing IDSs.
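For reference, the confusion-matrix counts and the two evaluation measures named above can be derived as in the following sketch. The labels and predictions are toy values invented for the example and are not the project's results; the function names are likewise assumptions.

```python
def confusion_counts(y_true, y_pred, positive):
    """Return (TP, FP, FN, TN), treating `positive` as the attack class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Share of all records that were classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def precision(tp, fp):
    """Share of predicted attacks that really were attacks."""
    return tp / (tp + fp) if (tp + fp) else 0.0


# Toy labels and predictions (hypothetical).
y_true = ["dos", "dos", "normal", "normal", "dos", "normal"]
y_pred = ["dos", "dos", "normal", "dos", "dos", "normal"]

tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive="dos")
print(tp, fp, fn, tn)  # 3 1 0 2
print(precision(tp, fp))  # 0.75
```

Because one model is trained per attack class, the same computation would be repeated with each of the four attack classes as the positive class.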
References
• H. S. Hota and A. K. Shrivas, “Decision Tree Techniques Applied on
NSL-KDD Data and Its Comparison with Various Feature Selection
Techniques,” in Advanced Computing, Networking and Informatics,
Volume 1: Advanced Computing and Informatics, Proceedings of the
Second International Conference on Advanced Computing, Networking
and Informatics (ICACNI-2014), 2014, pp. 205–211.