Cse 3045

CSE 3045
MATHEMATICAL MODELING FOR DATA SCIENCE

THEORY DA-1
Name – Akhil Rajan
Reg No- 20BDS0344
Intrusion Detection Systems for Security
An intrusion is a type of attack that attempts to bypass the security mechanisms of a computer
system. Intrusion detection is the process of monitoring and analyzing the events occurring in a
computer system in order to detect signs of security problems. There are two main strategies for
IDS: misuse detection and anomaly detection. Misuse detection attempts to match patterns and
signatures of already known attacks in the network traffic. A constantly updated database is
usually used to store the signatures of known attacks. It cannot detect new attacks until it has
been trained on them. Anomaly detection attempts to identify behavior that does not conform to
normal behavior. This technique is based on the detection of traffic anomalies. Anomaly detection
systems are adaptive in nature: they can deal with new attacks, but they cannot identify the
specific type of attack.
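The contrast between the two strategies can be illustrated with a minimal Python sketch. The signature set, the monitored quantity (failed logins per minute), the baseline values, and the 3-standard-deviation threshold are all assumptions made up for illustration; they are not taken from any real IDS.

```python
from statistics import mean, stdev

# Hypothetical signature database: patterns of already known attacks.
KNOWN_SIGNATURES = {"' OR 1=1 --", "/etc/passwd", "<script>"}

def misuse_detect(payload):
    """Misuse detection: flag a payload that contains a known signature."""
    return any(sig in payload for sig in KNOWN_SIGNATURES)

def anomaly_detect(value, baseline, k=3.0):
    """Anomaly detection: flag a value that lies more than k standard
    deviations away from the mean of the normal (baseline) observations."""
    mu, sigma = mean(baseline), stdev(baseline)
    return abs(value - mu) > k * sigma

# Made-up baseline of normal behavior, e.g. failed logins per minute.
normal_rates = [2, 3, 1, 2, 4, 3, 2, 3]

known_attack = misuse_detect("GET /index.php?id=' OR 1=1 --")
novel_attack_missed = misuse_detect("GET /index.php?id=unseen-exploit")
novel_attack_flagged = anomaly_detect(40, normal_rates)
normal_flagged = anomaly_detect(3, normal_rates)
```

In this sketch the signature matcher names the attack it recognises but misses the unseen payload, while the anomaly check flags the unusual rate without saying which attack caused it.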
(i) Bayesian Classifier –
The naïve Bayes model is a heavily simplified Bayesian probability model. The naïve Bayes
classifier operates on a strong independence assumption.
This means that, given the class, the probability of one attribute does not affect the probability
of any other. Given a series of n attributes, the naïve Bayes classifier makes 2n! independence
assumptions. Nevertheless, the results of the naïve Bayes classifier are often correct. Earlier work
examines the circumstances under which the naïve Bayes classifier performs well.
It states that the error is a result of three factors: training data noise, bias, and variance. Training
data noise can only be minimized by choosing good training data. The training data must be
divided into various groups by the machine learning algorithm.
Bias is the error due to groupings in the training data being too large. Variance is the error due
to those groupings being too small.
An improved Bayesian classification algorithm introduces a comprehensive weighting coefficient.
This algorithm adds the coefficient to the traditional naïve Bayes classification model. The
comprehensive weighting coefficient combines covariance theory and attribute weighting. It
addresses a gap in earlier work, which considered only the frequency relationship of attributes
and ignored the impact of the attribute values themselves on classification. In this way it refines
the original concise and efficient algorithm.
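The independence assumption behind the naïve Bayes classifier can be sketched in Python as follows. This is a minimal illustration of the traditional (unweighted) model only, not the weighted variant; the toy connection records, the attribute choice (protocol, flag), and the use of add-one smoothing are assumptions.

```python
from collections import Counter, defaultdict
from math import log

def train_nb(rows, labels):
    """Count class frequencies and per-attribute value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    values_seen = defaultdict(set)       # attribute index -> distinct values
    for row, label in zip(rows, labels):
        for i, v in enumerate(row):
            value_counts[(i, label)][v] += 1
            values_seen[i].add(v)
    return class_counts, value_counts, values_seen

def predict_nb(model, row):
    """Pick the class maximising log P(c) + sum_i log P(x_i | c).
    Summing one term per attribute is the independence assumption."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n_c in class_counts.items():
        score = log(n_c / total)
        for i, v in enumerate(row):
            # Add-one (Laplace) smoothing avoids log(0) for unseen values.
            count = value_counts[(i, label)][v]
            score += log((count + 1) / (n_c + len(values_seen[i])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy connection records (protocol, flag); made-up data for illustration.
rows = [("tcp", "SF"), ("tcp", "SF"), ("udp", "SF"),
        ("tcp", "S0"), ("tcp", "S0"), ("icmp", "S0")]
labels = ["normal", "normal", "normal", "attack", "attack", "attack"]

model = train_nb(rows, labels)
pred_normal = predict_nb(model, ("udp", "SF"))
pred_attack = predict_nb(model, ("icmp", "S0"))
```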
(ii) Decision Trees –
Intrusion detection can be considered a classification problem in which each
connection or user is identified either as one of the attack types or as normal, based on
some existing data.
Decision trees can solve this classification problem of intrusion detection, as they
learn the model from the data set and can classify a new data item into one of the
classes specified in the data set.
Decision trees can be used for misuse intrusion detection, as they can learn a model
based on the training data and can predict future data as one of the attack types or
as normal based on the learned model.
Decision trees work well with large data sets. This is important because large amounts of
data flow across computer networks. The high performance of decision trees makes
them useful in real-time intrusion detection.
Decision trees construct easily interpretable models, which a security
officer can inspect and edit. These models can also be used in rule-based models
with minimal processing.
The generalization accuracy of decision trees is another useful property for an intrusion
detection model. After the intrusion detection models are built, there will always be some
new attacks on the system that are small variations of known attacks. The
ability to detect these new intrusions comes from the generalization accuracy of
decision trees.
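As a rough illustration of how a decision tree learns a model from labeled connections and then classifies new ones, here is a minimal ID3-style sketch in Python. The attribute names, the toy records, and the choice of information gain as the split criterion are assumptions; production tree learners add pruning and numeric splits.

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_attribute(rows, labels, attrs):
    """Return the attribute with the highest information gain."""
    def gain(a):
        remainder = 0.0
        for value in {r[a] for r in rows}:
            subset = [l for r, l in zip(rows, labels) if r[a] == value]
            remainder += len(subset) / len(labels) * entropy(subset)
        return entropy(labels) - remainder
    return max(attrs, key=gain)

def build_tree(rows, labels, attrs):
    """A leaf is a class label; a node is (attribute, {value: subtree}, majority)."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority
    a = best_attribute(rows, labels, attrs)
    children = {}
    for value in {r[a] for r in rows}:
        idx = [i for i, r in enumerate(rows) if r[a] == value]
        children[value] = build_tree([rows[i] for i in idx],
                                     [labels[i] for i in idx],
                                     [x for x in attrs if x != a])
    return (a, children, majority)

def classify(tree, row):
    while isinstance(tree, tuple):
        a, children, majority = tree
        tree = children.get(row[a], majority)  # unseen value: fall back to majority
    return tree

# Toy training data; attribute names and values are made up for illustration.
rows = [{"protocol": "tcp", "flag": "SF"}, {"protocol": "udp", "flag": "SF"},
        {"protocol": "tcp", "flag": "SF"}, {"protocol": "tcp", "flag": "S0"},
        {"protocol": "icmp", "flag": "S0"}, {"protocol": "tcp", "flag": "S0"}]
labels = ["normal", "normal", "normal", "attack", "attack", "attack"]

tree = build_tree(rows, labels, ["protocol", "flag"])
pred_a = classify(tree, {"protocol": "udp", "flag": "S0"})
pred_b = classify(tree, {"protocol": "icmp", "flag": "SF"})
```

The learned tree here splits on `flag` alone, so it reads directly as a rule ("flag S0 means attack"), which is the interpretability property described above. It also classifies protocol and flag combinations that never appeared together in the training data.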
(iii) Random Forest –

Random Forest is an ensemble method that predicts based on the results of a
collection of Decision Trees. Resampling using the bootstrap approach is used for the
creation of each tree in the “forest.” Also, at each node split a subset of features is
selected randomly, and the split variable is chosen from this subset. The
predicted value is the majority vote for classification and the average for
regression. The approach is due to Breiman (2001), building on prior ideas
of Amit and Geman (1997) and Ho (1995, 1998). Essentially, there are two
tuning parameters for Random Forest models: mtry – the number of randomly
selected features to consider in each split; and ntree – the number of trees in the
model. There is a tradeoff in mtry: large values increase the correlation among trees
but improve the strength (accuracy) of each tree (Breiman 2001). A bootstrapped
subset of the training dataset is created to train each tree in the “forest.”
As a result, on average, each tree makes use of around two-thirds of the training
dataset. The unused elements are called the Out-Of-Bag (OOB) samples. The OOB
samples can be used for validation: each tree predicts over its respective
OOB samples, and the final result is an average over the trees’ outcomes. The OOB
samples can also be used to estimate the importance of each variable and to rank the
variables. For each tree and its respective OOB samples, the procedure computes the
accuracy over this set, randomly permutes one variable across the samples, and recomputes
the accuracy on the permuted set. Performing this for all trees and averaging the drop in
accuracy for each variable gives a relevance comparison metric, usually referred to as the
Variable Importance Measure (VIM) or Permutation Importance Index (PIM).
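The bootstrap and permutation steps can be sketched in Python. This is an illustration under stated assumptions, not a Random Forest implementation: the "tree" is a hard-coded rule standing in for one trained tree, the OOB samples are synthetic, and a real forest would repeat the permutation step for every tree and average the results. The sampling part shows why roughly two-thirds of the data is used per tree: the in-bag fraction tends to 1 - 1/e, about 0.632.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Bootstrap: draw n row indices with replacement; rows never drawn are OOB.
n = 1000
in_bag = {rng.randrange(n) for _ in range(n)}
in_bag_fraction = len(in_bag) / n  # tends to 1 - 1/e, about 0.632
oob_fraction = 1 - in_bag_fraction

def accuracy(model, X, y):
    return sum(model(x) == t for x, t in zip(X, y)) / len(y)

def permutation_importance(model, X, y, feature, rng):
    """Drop in accuracy after shuffling one feature column across samples."""
    base = accuracy(model, X, y)
    column = [x[feature] for x in X]
    rng.shuffle(column)
    X_perm = [x[:feature] + [v] + x[feature + 1:] for x, v in zip(X, column)]
    return base - accuracy(model, X_perm, y)

# Stand-in for one trained tree: a fixed rule that only looks at feature 0.
def tree(x):
    return int(x[0] > 0.5)

# Hypothetical OOB samples: feature 0 decides the label, feature 1 is noise.
X_oob = [[rng.random(), rng.random()] for _ in range(200)]
y_oob = [int(x[0] > 0.5) for x in X_oob]

importance_0 = permutation_importance(tree, X_oob, y_oob, 0, rng)
importance_1 = permutation_importance(tree, X_oob, y_oob, 1, rng)
```

Permuting the informative feature lowers the accuracy noticeably, while permuting the noise feature leaves it unchanged, so the first feature ranks higher.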
(iv) Support Vector Machines –

Support Vector Machines (SVM) have become one of the popular ML algorithms used
for intrusion detection because of their good generalization and their ability to
overcome the curse of dimensionality.
As several researchers have noted, the number of dimensions still affects the
performance of SVM-based IDS. Another issue raised is that SVM treats every
feature of the data equally. In real intrusion detection datasets, many features are
redundant or less important. It would be better to consider feature weights during
SVM training. One proposed framework incorporates Information Gain Ratio (IGR) and the
K-means algorithm into SVM for intrusion detection. In the proposed framework, the features
of the NSL-KDD dataset are ranked using IGR, and feature subset selection is then done using
the K-means algorithm.
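The feature-ranking step can be sketched in Python. This is only a minimal illustration of the information gain ratio on categorical features; the feature names and toy values are assumptions rather than the actual NSL-KDD data, and the subsequent K-means selection and SVM training are not shown.

```python
from collections import Counter
from math import log2

def entropy(values):
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def gain_ratio(feature, labels):
    """Information gain of a feature about the labels, divided by the
    feature's own entropy (the split information)."""
    n = len(labels)
    remainder = 0.0
    for v in set(feature):
        subset = [l for f, l in zip(feature, labels) if f == v]
        remainder += len(subset) / n * entropy(subset)
    info_gain = entropy(labels) - remainder
    split_info = entropy(feature)
    # A constant feature carries no information; avoid dividing by zero.
    return info_gain / split_info if split_info > 0 else 0.0

# Toy categorical features (hypothetical names and values).
features = {
    "protocol": ["tcp", "udp", "tcp", "tcp", "icmp", "tcp"],
    "land":     ["0", "0", "0", "0", "0", "0"],
    "flag":     ["SF", "SF", "SF", "S0", "S0", "S0"],
}
labels = ["normal", "normal", "normal", "attack", "attack", "attack"]

scores = {name: gain_ratio(col, labels) for name, col in features.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
```

On this toy data the ranking puts `flag` first, `protocol` second, and the constant `land` feature last, which matches the intuition that redundant features should receive little weight.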
The parameters for SVM are selected by a swarm intelligence algorithm (Particle
Swarm Optimization or Artificial Bee Colony). The results of experiments
demonstrate that applying SVM in an Intrusion Detection System can be an effective
and efficient way of detecting intrusions.
(v) Back Propagation Neural Networks –

Intrusion detection is a critical component of secure information systems. Current
intrusion detection systems (IDS), especially NIDS (Network Intrusion Detection
Systems), examine all data features to detect intrusions. However, some of the features
may be redundant or contribute little to the detection process, and therefore they have
an unnecessary negative impact on system performance.
One proposed intrusion detection model is computationally efficient and effective, based on
feature selection and a back-propagation neural network (BPNN). Firstly, the issue of
identifying important input features based on independent component analysis (ICA)
is addressed, because eliminating the insignificant and/or useless inputs
simplifies the problem and therefore results in faster and more accurate detection.
Secondly, a classic BPNN is used to learn and detect intrusions using the selected
important features. Experimental results on the well-known KDD Cup 1999 dataset
demonstrate that the proposed model is effective and can further improve performance
by reducing the computational cost without obvious deterioration of detection
performance.
A related methodology develops a novel intrusion detection system (IDS) using
back propagation neural networks (BPN). The main function of an Intrusion Detection
System is to protect resources from threats. It analyzes and predicts the
behaviours of users, and these behaviours are then classified as an attack or as
normal behaviour. Several techniques exist at present to provide
more security to the network, but most of these techniques are static. The
proposed method is tested on a benchmark intrusion dataset to verify its feasibility and
effectiveness. Results show that choosing good attributes and samples affects not only
the performance but also the overall execution efficiency. The
proposed method can significantly reduce the training time required. Additionally, the
training results are good. It provides a powerful tool to help supervisors analyze,
model and understand the complex attack behavior of electronic crime.
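The learning step of a back-propagation network can be sketched in plain Python. This is a minimal one-hidden-layer network trained by batch gradient descent on made-up two-feature data; the layer size, learning rate, epoch count, and data are assumptions, and the ICA-based feature selection and the KDD Cup 1999 data mentioned above are not reproduced.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_bpnn(X, y, hidden=3, lr=0.5, epochs=500, seed=0):
    """Train a one-hidden-layer network with batch gradient descent.
    Returns the mean squared error recorded at each epoch."""
    rng = random.Random(seed)
    n_in = len(X[0])
    # w1[j]: weights from the inputs to hidden unit j; last entry is the bias.
    w1 = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(hidden)]
    # w2: weights from the hidden units to the output; last entry is the bias.
    w2 = [rng.uniform(-1, 1) for _ in range(hidden + 1)]
    losses = []
    for _ in range(epochs):
        g1 = [[0.0] * (n_in + 1) for _ in range(hidden)]
        g2 = [0.0] * (hidden + 1)
        loss = 0.0
        for x, t in zip(X, y):
            # Forward pass.
            xb = x + [1.0]
            h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w1]
            hb = h + [1.0]
            out = sigmoid(sum(w * v for w, v in zip(w2, hb)))
            loss += (out - t) ** 2
            # Backward pass: propagate the output error to the hidden layer.
            d_out = 2 * (out - t) * out * (1 - out)
            for j in range(hidden + 1):
                g2[j] += d_out * hb[j]
            for j in range(hidden):
                d_h = d_out * w2[j] * h[j] * (1 - h[j])
                for i in range(n_in + 1):
                    g1[j][i] += d_h * xb[i]
        # Gradient descent step on the averaged gradients.
        m = len(X)
        for j in range(hidden + 1):
            w2[j] -= lr * g2[j] / m
        for j in range(hidden):
            for i in range(n_in + 1):
                w1[j][i] -= lr * g1[j][i] / m
        losses.append(loss / m)
    return losses

# Made-up, already scaled features; label 1 = attack, 0 = normal.
X = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.8, 0.9], [0.9, 0.7], [0.7, 0.8]]
y = [0, 0, 0, 1, 1, 1]

losses = train_bpnn(X, y)
```

The training error recorded in `losses` falls as the weights are adjusted. With fewer input features, the inner loops over `n_in` shrink, which is the computational saving that feature selection aims for.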
