
An Immune Inspired Multilayer IDS

2011, IJCSIS


(IJCSIS) International Journal of Computer Science and Information Security, Vol. 9, No. 10, October 2011

Najlaa Badie Aldabagh, Computer Sciences, College of Computer Sciences and Mathematics, Mosul University, Mosul, Iraq
Mafaz Muhsin Khalil Alanezi, Computer Sciences, College of Computer Sciences and Mathematics, Mosul University, Mosul, Iraq

Abstract: The use of artificial immune systems in intrusion detection is an appealing concept for two reasons. Firstly, the human immune system provides the human body with a high level of protection from invading pathogens, in a robust, self-organized and distributed manner. Secondly, current techniques used in computer security are not able to cope with the dynamic and increasingly complex nature of computer systems and their security. The objective of our system is to combine several immunological metaphors in order to develop a formidable IDS. The inspiration comes from: (1) adaptive immunity, which is characterized by learning, adaptability, and memory and is broadly divided into two branches, humoral and cellular immunity; and (2) the analogy of the human immune system's multilevel defense, which can be extended to the intrusion detection system itself. This is also the objective of intrusion detection, which needs multiple detection mechanisms to obtain a very high detection rate with a very low false alarm rate.

Dasgupta et al. [2, 3] describe the use of several types of detectors, analogous to T helper cells, T suppressor cells, B cells and antigen-presenting cells, on two types of data (binary and real) to detect anomalies in time series data generated by the Mackey-Glass equation.

The NSL-KDD data sets provide a platform for testing intrusion detection systems, generating both background traffic and intrusions with provisions for multiple interleaved streams of activity [4].
These provide a (more or less) repeatable environment in which real-time tests of an intrusion detection system can be performed. The data set contains records, each of which has 41 features and is labeled as either normal or an attack, with exactly one specific attack type. The data set contains 24 attack types, which fall into four main categories: DoS, U2R, R2L, and Probing [24, 26]. The data set is available at [25].

II. IMMUNITY IDS OVERVIEW

In computer security there is no single component or application that can keep a computer system completely secure. For this reason a multilevel defense approach to computer security is recommended. The biological immune system employs a multilevel defense against invaders through nonspecific (innate) and specific (adaptive) immunity. Intrusion detection likewise needs multiple detection mechanisms to obtain a very high detection rate with a very low false alarm rate.

The objective of our system is to combine several immunological metaphors in order to develop a formidable IDS. The inspiration comes from: (1) adaptive immunity, which is characterized by learning, adaptability, and memory and is broadly divided into two branches, humoral and cellular immunity; and (2) the analogy of the human immune system's multilevel defense, which can be extended to the intrusion detection system itself.

The IDS is designed with three phases: an Initialization and Preprocessing phase, a Training phase, and a Testing phase. The Training phase has two defense layers. The first layer is Cellular immunity (T & B cells reproduction), where the ALCs attempt to identify the attack. If this layer is unable to identify the attack, the second layer, Humoral immunity (Complement System), which is a more complex level of detection within the IDS, is enabled.
The complement system represents a chief component of innate immunity; it not only participates in inflammation but also acts to enhance the adaptive immune response [23]. All memory ALCs obtained from the Training phase layers are used in the Testing phase to detect attacks. This multilevel approach could provide more specific levels of defense and response to attacks or intrusions.

Keywords: Artificial Immune System (AIS); Clonal Selection Algorithm (CLONA); Immune Complement Algorithm (ICA); Negative Selection (NS); Positive Selection (PS); NSL-KDD dataset.

http://sites.google.com/site/ijcsis/ ISSN 1947-5500

I. INTRODUCTION

When designing an intrusion detection system it is desirable to have an adaptive system. The system should be able to recognize attacks it has not seen before and then respond appropriately. This kind of adaptive approach is used in anomaly detection, although where the adaptive immune system is specific in its defense, anomaly detection is nonspecific. Anomaly detection identifies behavior that differs from “normal” but is unable to identify the specific type of behavior, or the specific attack. However, the adaptive nature of the adaptive immune system and its memory capabilities make it a useful inspiration for an intrusion detection system [1]. On subsequent exposure to the same pathogen, memory cells are already present and ready to be activated to defend the body.

It is important for an intrusion detection system to be adaptive. New attacks are always being generated, so an IDS should be able to recognize these attacks. It should then be able to use the information gathered through the recognition process so that it can quickly identify the attacks in the future [1].
The problem with anomaly detection systems is that normal activity is often classified as intrusive, so the system continuously raises alarms. The co-operation and co-stimulation between cells in the immune system ensure that an immune response is not initiated unnecessarily, thus providing some regulation of the immune response. Implementing an error-checking process based on cooperation between two levels of detectors could reduce the level of false positive alerts in an intrusion detection system.

The algorithm works on similar principles: it generates detectors and eliminates the ones that detect self, so that the remaining detectors can detect any non-self.

The initial exposure to an antigen (Ag) that stimulates an adaptive immune response is handled by a small number of low-affinity lymphocytes. This process is called the primary response, and it is what happens in the Training phase. Memory cells with high affinity for the encountered antigen, however, are produced as a result of this response through proliferation, somatic hypermutation, and selection. A second encounter with the same antigen therefore induces a heightened state of immune response due to the presence of memory cells associated with the first infection. This process is called the secondary response, and it is what happens in the Testing phase. Compared with the primary response, the secondary response is characterized by a shorter lag phase and a lower dose of antigen required to cause the response, which can be noticed in the run speed of these two phases.

The overall diagram of the Immunity-Inspired IDS is shown in figure (1). Note that the terms ALCs and detectors have the same meaning in this system.

For a relevant feature, the information gain will be close to 1. Since information gain is calculated for discrete features, continuous features are discretized with the emphasis on providing sufficient discrete values for detection [20].
The 10 most significant features the system obtained are: duration, src_bytes, dst_bytes, hot, num_compromised, num_root, count, srv_count, dst_host_count, dst_host_srv_count.

a) Information Gain

Let S be a set of training samples with their corresponding labels. Suppose there are m classes (here m = 2), the training set contains $s_i$ samples of class i, and s is the total number of samples in the training set. The expected information needed to classify a given sample is calculated by [20, 21]:

$$I(s_1, \ldots, s_m) = -\sum_{i=1}^{m} \frac{s_i}{s} \log_2 \frac{s_i}{s} \tag{1}$$

A feature F with values $\{f_1, f_2, \ldots, f_v\}$ can divide the training set into v subsets $\{S_1, S_2, \ldots, S_v\}$, where $S_j$ is the subset which has the value $f_j$ for feature F. Furthermore, let $S_j$ contain $s_{ij}$ samples of class i. The entropy of the feature F is

$$E(F) = \sum_{j=1}^{v} \frac{s_{1j} + \cdots + s_{mj}}{s} \, I(s_{1j}, \ldots, s_{mj}) \tag{2}$$

The information gain for F can be calculated as:

$$Gain(F) = I(s_1, \ldots, s_m) - E(F) \tag{3}$$

b) Univariate discretization process

Discrete values offer several advantages over continuous ones, such as data reduction and simplification. Quality discretization of continuous attributes is an important problem that affects the speed, accuracy, and understandability of classification models [22]. Discretization can be univariate or multivariate. Univariate discretization quantifies one continuous feature at a time, while multivariate discretization considers multiple features simultaneously. We mainly consider univariate (typical) discretization in this paper. A typical discretization process broadly consists of four steps [22]:
• Sort the values of the attribute to be discretized.
• Determine a cut-point for splitting or adjacent intervals for merging.
• Split or merge intervals of continuous values, according to some criterion.
• Stop at some point.

Since information gain is calculated for discrete features, continuous features should be discretized [20, 22]. To this end, continuous features are partitioned into equal-sized partitions by utilizing equal frequency intervals.
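The information-gain computation described in a) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes the usual base-2 entropy form for the expected information, and the function names `expected_information` and `information_gain` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def expected_information(labels):
    """I(s_1..s_m): entropy of the class labels, in bits (assumed log base 2)."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Gain(F) = I(s_1..s_m) - E(F) for one discrete feature F."""
    # Group the class labels by the value the feature takes (subsets S_j).
    subsets = defaultdict(list)
    for value, label in zip(feature_values, labels):
        subsets[value].append(label)
    total = len(labels)
    # E(F): entropy of each subset, weighted by the subset's share of samples.
    entropy_f = sum(len(sub) / total * expected_information(sub)
                    for sub in subsets.values())
    return expected_information(labels) - entropy_f


labels = ["normal", "normal", "attack", "attack"]
separating = information_gain([0, 0, 1, 1], labels)   # feature splits classes
useless = information_gain([0, 1, 0, 1], labels)      # feature tells nothing
```

With two balanced classes, a feature that separates them perfectly reaches a gain of 1 bit and an uninformative feature gives 0, which matches the "close to 1" behavior described for relevant features.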
A. Initialization and Preprocessing phase

This phase has the following operations:

1) Preprocessing the NSL dataset

The data are partitioned into two classes, normal and attack, where attack is the collection of all 22 different attacks belonging to the four classes described in section I. The label of each data instance in the original data set is replaced by either `normal' for normal connections or `anomalous' for attacks. Because the 41 features are so many, it is necessary to reduce the dimensionality of the data set and discard the irrelevant attributes. Therefore, the information gain of each attribute is calculated, and the attributes with low information gain are removed from the data set. The information gain of an attribute indicates the statistical relevance of this attribute to the classification [21]. Based on the entropy of a feature, information gain measures the relevance of a given feature, in other words its role in determining the class label. If the feature is relevant, in other words highly useful for an accurate determination, the calculated entropies will be close to 0 and the information gain will be close to 1.

In the equal frequency intervals method, the feature space is partitioned into an arbitrary number of partitions where each partition contains the same number of data points. That is to say, the range of each partition is adjusted to contain N dataset instances. If a value occurs more than N times in a feature space, it is assigned a partition of its own. In the “21% NSL” dataset, certain classes such as denial of service attacks and normal connections occur in the magnitude of thousands, whereas other classes such as R2L and U2R attacks occur in the magnitude of tens or hundreds. Therefore, to provide sufficient resolution for the minor classes, N is set to 10 [20].
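The equal frequency rule can be illustrated with the following Python sketch. It is an assumption-laden reading of the description, not the authors' implementation: equal values are always kept in the same partition, a partition is closed as soon as it holds at least N instances, and the name `equal_frequency_bins` is hypothetical.

```python
from collections import Counter


def equal_frequency_bins(values, n_per_bin=10):
    """Map each distinct value to a partition index.

    Partitions are filled in sorted order until they hold about n_per_bin
    instances; a value occurring more than n_per_bin times gets a partition
    of its own.
    """
    counts = Counter(values)
    bin_of = {}
    current_bin, filled = 0, 0
    for v in sorted(counts):
        if counts[v] > n_per_bin:
            if filled > 0:          # close the partly filled partition first
                current_bin += 1
                filled = 0
            bin_of[v] = current_bin  # frequent value: its own partition
            current_bin += 1
            continue
        bin_of[v] = current_bin
        filled += counts[v]
        if filled >= n_per_bin:      # partition is full, start a new one
            current_bin += 1
            filled = 0
    return bin_of


# Small example with N = 2 so the behavior is easy to follow.
demo = equal_frequency_bins([1, 2, 3, 4, 5, 5, 5, 5, 5, 6, 7], n_per_bin=2)
```

In the example, the value 5 occurs five times and therefore receives its own partition, while the remaining values are grouped two instances at a time.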
The result of this step is the set of highest-gain feature indexes, which are used later in preprocessing the training and testing files.

2) Self and NonSelf Antigens

As mentioned in chapter 2, each record of the NSL or KDD dataset contains 41 features and is labeled as either normal or an attack; these are treated here as Self and NonSelf respectively. The dataset used in the training phase of the system contains about 200 records drawn from the normal and attack records, and the attack records include every type of attack in the original dataset. This rule is applied to both the NSL and KDD datasets. The whole “21% NSL” test dataset, however, is used when testing the system in the testing phase. In both the training and testing phases, the system applies two steps to each file before it is used: selecting the highest-gain feature indexes and converting each continuous feature to discrete.

Normalization thus keeps attributes with initially large ranges from outweighing attributes with initially smaller ranges [9]. There are many methods for data normalization, including min-max normalization, z-score normalization, logarithmic normalization and normalization by decimal scaling [8, 9].

Min-max normalization: Min-max normalization performs a linear transformation on the original data. Suppose that $min_a$ and $max_a$ are the minimum and the maximum values for feature A. Min-max normalization maps a value v of A to v' in the range $[new\_min_a, new\_max_a]$ by computing [9]:

$$v' = \frac{v - min_a}{max_a - min_a} \, (new\_max_a - new\_min_a) + new\_min_a \tag{4}$$

In the case where the range is [0-1] the equation becomes:

$$v' = \frac{v - min_a}{max_a - min_a} \tag{5}$$

In order to generalize all the comparisons (NS & PS) done in the IIDS, and to simplify the choice of threshold values, the calculated affinities between each ALC and all Ags are normalized into the range [1-100] in the case of Th and B cells, and into the range [0-1] in the case of Ts cells and CDs.

5) Detector Generation Mechanism

All NonSelf (attack) records in the training file are considered as the initial detectors (or ALCs); the training phase then eliminates those that match self samples.
There are three types of detectors (integer, string, real). The output of this step is a specified number of detectors of every type, and their length equals the length of the Self and NonSelf patterns, which is the number of gain indexes.

3) Antigens Presentation

T cells and B cells are assumed to recognize antigens in different ways. In the biological immune system, T cells can only recognize internal features (peptides) processed from foreign protein. In our system, T cell recognition is defined as bit-level recognition (real, integer). This is a low-level recognition scheme. B cells, however, can only recognize surface features of antigens. Because of the large size and complexity of most antigens, only parts of the antigen, discrete sites called epitopes, get bound to B cells. B-cell recognition is therefore proposed as a higher-level recognition (string) at different non-contiguous (occasionally contiguous) positions of antigen strings. Different data types are thus used for each ALC in order to compose several detection levels. In order to present the self and nonself antigens to the ALCs, they are also converted to suit the different data types of the ALCs: integer for T-helper cells, string for B cells, and real [0-1] for T-suppressor cells. Real values must be in the range [0-1], so normalization is used for the conversion.

6) Affinity Measure by Matching Rules

In several of the next steps the affinity needs to be calculated between (ALCs & Self patterns) and (ALCs & NonSelf Ags), so the matching rules are determined depending on the data type.
• The affinity between a Th ALC (integer) and NonSelf Ags or Self patterns is measured by Landscape-affinity matching (Physical matching rule) [11, 12, 10]. Physical matching gives an indication of the similarity between two patterns, i.e. a higher affinity value between an ALC and a NonSelf Ag implies a stronger affinity.
(6)

• The affinity between a Ts ALC (real) and NonSelf Ags or Self patterns is measured by Euclidean distance [11, 13, 12]. The Euclidean distance gives an indication of the difference between two patterns, i.e. a lower affinity value between an ALC and a NonSelf Ag implies a stronger affinity.

$$D = \sqrt{\sum_{i=1}^{L} (x_i - y_i)^2} \tag{7}$$

where x and y are the two patterns being compared and L is their length.

• The affinity between a B ALC (string) and NonSelf Ags or Self patterns is measured by the R-contiguous string matching rule. If x and y are equal-length strings defined over a finite alphabet, match(x, y) is true if x and y agree in at least r contiguous locations [11, 14, 12, 15]. R-contiguous string matching gives an indication of the similarity between two patterns, i.e. a higher affinity value between an ALC and a NonSelf Ag implies a stronger affinity.

4) Normalization

Data transformation such as normalization may improve the accuracy and efficiency of classification algorithms involving neural networks, mining algorithms, or distance measurements such as nearest-neighbor classification and clustering. Such methods provide better results if the data to be analyzed have been normalized, that is, scaled to specific ranges such as [0-1] [8, 9]. If the neural network backpropagation algorithm is used for classification mining, normalizing the input values for each attribute measured in the training samples will help speed up the learning phase. For distance-based methods, normalization helps prevent attributes with initially large ranges from outweighing attributes with initially smaller ranges [9].

The selected percent value ranges over [10-100%], and Maxgeneration is the maximum number of generations used in the random generation of ALCs in the initialization and generation phase.

• Sorting Affinity: The affinity is measured here between all cloned ALCs and NonSelf Ags. All ALCs are then sorted in descending order of their affinity with NonSelf Ags.

B.
Training Phase

Here the system is trained by a series of recognition operations between the previously generated detectors and the self and nonself Ags to constitute multilevel recognition, which makes the recognition system more robust and ensures efficient detection.

1) First Layer-Cellular immunity (T & B cells reproduction)

Both B cells and T cells undergo proliferation and selection and exhibit immunological memory once they have recognized and responded to an Ag. All of the system's ALCs progress through the following stages:

a) Clonal and Expansion

Clonal selection in AIS is the selection of a set of ALCs with the highest calculated affinity with a NonSelf pattern. The selected ALCs are then cloned and mutated in an attempt to obtain a higher binding affinity with the presented NonSelf pattern. The mutated clones compete with the existing set of ALCs for survival, based on the calculated affinity between the mutated clones and the NonSelf pattern, in order to be exposed to the next NonSelf pattern.

• Clonal Operator

The previously selected ALCs are now cloned in order to expand the number of ALCs in the training phase; the ALC that has the higher affinity with NonSelf Ags will have the higher clonal rate. The clonal rate is calculated for each of the selected ALCs [16]:

$$TotalCloneALC = \sum_{i=1}^{n} ClonalRateALC_i \tag{9}$$

where $ClonalRateALC_i = Round(Kscale / i)$ or $ClonalRateALC_i = Round(Kscale \times i)$. The choice between the two equations for $ClonalRateALC_i$ depends on how many clones are required. Kscale is the clonal rate, Round() is the operator that rounds the value in parentheses toward its closest integer value, and TotalCloneALC is the total number of cloned cells.

• Affinity Maturation (Somatic hypermutation)

After clones are produced from the selected ALCs, these clones are altered by a simple mutation operator to provide some initial diversity over the ALC population. The process of affinity maturation plays an important role in the adaptive immune response.
From the viewpoint of evolution, a remarkable characteristic of the affinity maturation process is its controlled nature. That is to say, the hypermutation rate to be applied to every immune cell receptor depends on its antigenic affinity. By computationally simulating this process, one can produce powerful algorithms that perform a search akin to local search around each candidate solution. The system takes into account this important aspect of mutation in the immune system: it is inversely proportional to the antigenic affinity [5]. Without mutation the system is only capable of manipulating the ALC material that was present in the initial population [6]. For Th and B ALCs, the system calculates the mutation rate of each ALC from its affinity with NonSelf Ags, where a higher affinity (similarity) gives a lower mutation rate. In the Ts case, one can evaluate the relative affinity of each candidate ALC by scaling (normalizing) the affinities. The inverse of an exponential function can be used to establish a relationship between the hypermutation rate α(.) and the normalized affinity D*, as described in the next equation. In some cases it might be interesting to re-scale α to an interval such as [0-1] [5].

$$\alpha(D^*) = \exp(-\rho D^*) \tag{10}$$

• Selection Mechanism

The selection of cells for cloning in the immune system is proportional to their affinities with the selective antigens. Affinity-proportionate selection can thus be performed probabilistically using algorithms like roulette wheel selection, or other evolutionary selection mechanisms can be used, such as elitist selection, rank-based selection, biclassist selection, and tournament selection [5]. Here the system uses elitist selection because it needs to remember good detectors and discard bad ones if it is to make progress towards the optimum. A very simple selector would be to select the top N detectors from each population for progression to the next population.
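The top-N selector, the rank-based clonal rate of equation (9) and the hypermutation rate of equation (10) can be sketched together in Python. This is only an illustration under stated assumptions: the function names are hypothetical, only the Round(Kscale / i) variant of the clonal rate is shown, ties in Round() are assumed to round half up, and D* is assumed to be the affinity divided by the largest affinity.

```python
import math


def elitist_select(detectors, affinities, top_n):
    """Keep the top_n detectors with the highest affinity (elitist selection)."""
    ranked = sorted(zip(detectors, affinities), key=lambda p: p[1], reverse=True)
    return [detector for detector, _ in ranked[:top_n]]


def clonal_rates(n_selected, k_scale):
    """Clonal rate per rank i = 1..n, using the Round(Kscale / i) variant."""
    # floor(x + 0.5) rounds half up; the rounding of ties is an assumption.
    return [math.floor(k_scale / i + 0.5) for i in range(1, n_selected + 1)]


def hypermutation_rate(affinity, max_affinity, rho=1.0):
    """alpha(D*) = exp(-rho * D*), with D* = affinity / max_affinity (assumed)."""
    d_star = affinity / max_affinity
    return math.exp(-rho * d_star)


selected = elitist_select(["a", "b", "c"], [0.2, 0.9, 0.5], top_n=2)
rates = clonal_rates(n_selected=4, k_scale=10)
total_clones = sum(rates)  # TotalCloneALC in equation (9)
```

The best-ranked detector receives the most clones, and the mutation rate decays smoothly from 1 as the normalized affinity grows, with ρ controlling how fast it decays.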
This would work up to a point, but any detectors which have very high affinity will always make it through to the next population. This concept is known as elitism. To apply this idea, four selected percent values are specified, which determine the percentage of each type of ALC that will be selected for the Clonal and Expansion operations:

$$SelectedALCNo = \frac{ALCsize \times selectALCpercent}{Maxgeneration} \tag{8}$$

where SelectedALCNo is the number of ALCs that will be selected for cloning, ALCsize is the number of ALCs that survived NS and PS in the initialization and generation phase, and selectALCpercent is a selected percent value in the range [10-100%].

In equation (10), ρ is a parameter that controls the smoothness of the inverse exponential, and D* is the normalized affinity, which can be determined by D* = D/Dmax. The inverse means that a lower affinity (difference) has a higher mutation rate.

Mutators generally are not as complicated; they tend to just choose a random point on the ALC and perturb this allele (part of a gene) either completely randomly or by some given amount [6]. To control the mutation operator, the mutation rate is calculated as described above, and it determines the number of alleles of the ALC that will be mutated. The hypermutation operator for each type of shape-space is as follows:
– Integer shape-space (Th): when the mutation rate of the current Th-ALC is high enough, randomly choose allele positions in the ALC and replace them with random integer values. Another option is inversive mutation, which may occur between one or more pairs of alleles.
– String shape-space (B): when the mutation rate of the current B-ALC is high enough, randomly choose allele positions in the ALC. Here an allele has a length equal to the R string, so all or part of the allele's characters may be changed to other characters.
– Real shape-space (Ts): randomly choose allele positions in the ALC, and generate a random real number to be added to or subtracted from a given allele:

$$m' = m + \alpha(D^*) \, N(0, \sigma) \tag{11}$$

where m is the allele, m' is its mutated version, and α(D*) is a function that accounts for affinity-proportional mutation.

The disclosure of detectors at one site provides no information about detectors at different sites.
– The self set and the detector set are mutually protective: detectors can monitor self data as well as themselves for change.

The negative selection (NS) based AIS for detecting intrusions or viruses was the first successful piece of work using the immunity concept for detecting harmful autonomous agents in the computing environment. The steps of the NS algorithm are applied here:
– Generate three types of ALCs (Th, Ts, B), and present them together with the set of Self (normal record) patterns to the NS mechanism.
– For all the ALCs generated, compute the affinity between each ALC and all Self patterns. The choice of matching rule to measure the affinity depends on the data type representation of the ALCs.
– If the ALC does not match any Self pattern according to the threshold comparison, it survives and enters the next step; ALCs that match any Self pattern are discarded. Each type of ALC has its own threshold value specifically for NS.
– Go to the first step until the maximum number of generations of ALCs is reached.

Here, however, NS is done between the three types of mutated ALCs and the Self patterns, because some ALCs may match Self patterns after mutation.

• Positive Selection

The mutated ALCs that survived the previous negative selection are presented here to the NonSelf Ags (attack records) in order to distinguish which detectors can detect them. Some ALCs may no longer match NonSelf Ags after mutation, so there is no need to keep them.
The steps of the PS algorithm are applied here:
– Present the three types of ALCs (Th, Ts, B) that survived NS together with the set of NonSelf Ags to the PS mechanism.
– For all the ALCs, compute the affinity between each ALC and all NonSelf Ags. The choice of matching rule to measure the affinity depends on the data type representation of the ALCs.
– If the ALC matches all NonSelf Ags according to the threshold comparison, it survives and enters the Training Phase; ALCs that fail to match are discarded. Each type of ALC has its own threshold value specifically for PS.
– Go to the first step until PS has been applied to all ALCs.

• Negative Selection

A number of features of the NS algorithm distinguish it from other intrusion detection approaches. They are as follows [4]:
– No prior knowledge of intrusions is required: this permits the NS algorithm to detect previously unknown intrusions.
– Detection is probabilistic, but tunable: the NS algorithm allows a user to tune an expected detection rate by setting the number of generated detectors, which is appropriate in terms of generation, storage and monitoring costs.
– Detection is inherently distributable: each detector can detect an anomaly independently without communication between detectors.
– Detection is local: each detector can detect any change on small sections of data. This contrasts with the other classical change detection approaches, such as checksum methods, which need an entire data set for detection. In addition, the detection of an individual detector can pinpoint where a change arises.
– The detector set at each site can be unique: this increases the robustness of the IDS. When one host is compromised, this does not offer an intruder an easier opportunity to compromise the other hosts. This is because the disclosure of detectors at one site provides no information about detectors at different sites.

• Immune Memory

Save all ALCs that survived NS and PS in text files, one text file for each type of ALC (Th, Ts, B).
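The negative and positive selection steps can be sketched as two filters over the detector set. This is a minimal illustration rather than the authors' implementation: `percent_match` is a hypothetical stand-in similarity (it is not the Landscape-affinity, Euclidean or R-contiguous rule), affinities are assumed to be similarities where a higher value means a closer match, and the thresholds 60 and 70 are illustrative values only.

```python
def percent_match(a, b):
    """Placeholder similarity: percentage of positions where a and b agree."""
    same = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * same / len(a)


def negative_selection(detectors, self_patterns, threshold, affinity=percent_match):
    """Keep detectors whose affinity with every Self pattern is below threshold."""
    return [d for d in detectors
            if all(affinity(d, s) < threshold for s in self_patterns)]


def positive_selection(detectors, nonself_ags, threshold, affinity=percent_match):
    """Keep detectors whose affinity with every NonSelf Ag reaches threshold."""
    return [d for d in detectors
            if all(affinity(d, ag) >= threshold for ag in nonself_ags)]


self_patterns = [(1, 1, 1, 1)]
nonself_ags = [(5, 5, 5, 5), (5, 5, 5, 6)]
candidates = [(5, 5, 5, 5), (1, 1, 1, 5), (5, 5, 2, 2)]

after_ns = negative_selection(candidates, self_patterns, threshold=60)
memory = positive_selection(after_ns, nonself_ags, threshold=70)
```

In this toy run, the second candidate is removed by negative selection because it resembles the Self pattern, and the third is removed by positive selection because it does not match the attack records closely enough; only the first would be saved as a memory detector.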
Here the system produces memory cells to protect against the reoccurrence of the same antigens. Memory cells enable the immune system's response to previously encountered antigens (known as the secondary response), which is known to be more efficient and faster than the response of non-memory cells to new antigens. In an individual these cells are long-lived, often lasting for many years or even for the individual's lifetime.

For positive selection of complement detectors (CDs), if the affinity between a CD and all NonSelf Ags does not exceed a threshold, the detector detects successfully; otherwise it does not.
– Immune Memory: if there are successful CDs, store all CDs that can detect NonSelf Ags in PS in a text file and go to the stopping condition (having CDsn optimal complement detectors); otherwise continue.
– Sorting CDs: according to the affinities calculated in the previous PS step, sort all the successful CD individuals in A0NS by ascending affinity (the higher affinity is the lower value, because this affinity is a difference value).
– Merge Population: first put A0NS in the population and then append A0PS after it.

2) Second Layer-Humoral immunity (Complement System)

This layer is automatically activated when the first layer terminates. It simulates the classical pathway of the complement system, which is activated by recognition between antigen and antibody (here, detectors). The classical pathway is composed of three phases: the Identify phase, the Activate phase and the Membrane attack phase. These phases and all their steps are called the Immune Complement Algorithm (ICA), described in detail in [23]. In this system the complement detectors go through the ICA steps with several additional steps designed for this purpose. The objective of the ICA is to continue generating, cleaving, and binding the CD individuals until the optimal CD individuals are found.
The system's ICA is summarized here in the following four phases:

• ICA: Initialization phase
– Take the NonSelf Ags as the initial population A0, which has a fixed number of complement detectors (CDs) as individuals; their data type is real in the range [0-1].
– Stopping conditions: if the current population contains the desired number of optimal detectors (CDsn) or the maximum generation has been reached, stop; otherwise continue.
– Define the following operators:
1. Cleave operator OC: a CD individual that is cleaved, according to a cleave probability Pc, is split into two sub-individuals, a1 and a2.
2. Bind operator OB: there are two ways of binding individuals a and b:
– Positive bind operator OPB: a new individual c = OPB(a, b)
– Reverse bind operator ORB: a new individual c = ORB(b, a)

• ICA: Active phase
– Divide the population into At1 and At2 using the Div active variable. At1 is a Cleave Set, and At2 is a Bind Set.
– For each individual in At1, apply the cleave operator OC to produce two sub-individuals a1 and a2. Then take the second sub-individual a2 of all CD individuals in At1 and bind them into one remainder cleave set bt with the positive bind operator OPB.

• ICA: Membrane attack process
– Using the reverse bind operator ORB, bind bt with each CD individual of At2 to get a membrane attack complex set Ct.
– For each CD individual of Ct, recode it to the code length of the initial CD individual, which gives a new set C'.
– Create a random population of complement individuals D, then join them to C' to form a new set E = C' ∪ D. For the next loop A0 is replaced with E.
– If the iteration steps are not finished, go to the stopping condition.

C. Testing Phase

This phase tests the immune memory of ALCs created in the training phase. Here the memory ALCs meet all types of antigens, Self and NonSelf. It is important to note that the memory ALCs have not encountered these new Ags in the past.
The Testing phase uses Positive Selection to decide whether an Ag is Self or NonSelf (i.e. a normal or attack record) by calculating the affinity between the ALCs and the new Ags and comparing it with the testing thresholds, as in the Affinity Measure by Matching Rules section. If an Ag matches any one of the ALCs it is considered an anomaly, i.e. a NonSelf Ag (attack); otherwise it is Self (normal).

• ICA: Identify Phase
– Negative Selection: for each complement detector in the current population, apply NS with the Self patterns; a complement detector that matches any Self pattern is discarded. The Euclidean distance is used here, which gives an indication of the difference between the two patterns, i.e. if the affinity between a CD and all Self patterns exceeds a threshold, the detector survives; otherwise it is discarded.
– Split Population: isolate the CDs that survived NS (A0NS) from the CDs that were discarded (A0PS).
– Positive Selection: for each complement detector in A0NS, apply PS with the NonSelf Ags; a complement detector that matches all NonSelf Ags survives. The Euclidean distance is used here, which gives an indication of the difference between the two patterns.

Figure (1): The overall diagram of Immunity IDS.

Performance Measurement

In learning from extremely imbalanced data, the overall classification accuracy is often not an appropriate measure of performance. Metrics such as true negative rate, true positive rate, weighted accuracy, G-mean, precision, recall, and F-measure are used to evaluate the performance of learning algorithms on imbalanced data. These metrics have been widely used for comparison and performance evaluation of classifications.
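Several of the metrics named in the Performance Measurement discussion can be computed directly from the four confusion-matrix counts: true positives (TP, attacks flagged as attacks), false negatives (FN), false positives (FP) and true negatives (TN). The Python sketch below uses the standard textbook definitions; the function name `imbalance_metrics` and the example counts are hypothetical, and the code assumes none of the denominators is zero.

```python
import math


def imbalance_metrics(tp, fn, fp, tn):
    """Standard confusion-matrix metrics; assumes no denominator is zero."""
    tpr = tp / (tp + fn)            # true positive rate (recall)
    tnr = tn / (tn + fp)            # true negative rate
    precision = tp / (tp + fp)
    f_measure = 2 * precision * tpr / (precision + tpr)
    g_mean = math.sqrt(tpr * tnr)   # geometric mean of TPR and TNR
    return {
        "tpr": tpr,
        "tnr": tnr,
        "precision": precision,
        "f_measure": f_measure,
        "g_mean": g_mean,
    }


# Illustrative counts only, not results from the paper.
scores = imbalance_metrics(tp=80, fn=20, fp=10, tn=90)
```

G-mean and F-measure are useful on imbalanced data because both drop sharply when either the attack class or the normal class is poorly recognized, even if overall accuracy stays high.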
The 10 of the 41 features identified as most significant, all of them continuous, are: 1, 5, 6, 10, 13, 16, 23, 24, 32, 33.
– Save the indexes of these significant features in a text file, to use them later when preprocessing the training and testing files.

1.3. Antigens Presentation
– For both the training and testing files, apply the preprocessing operations to their 10 significant features.
– Convert all input Self & NonSelf Ags to (integer, real, string).
– Apply Min-Max normalization only to those that have real values, to bring them into the range [0-1].

1.4. Detector Generation
– Get NonSelf Ags as the initial Th, Ts, and B ALCs; their length is ALClength = MaxFeature.
– Convert them to the 3 types of ALCs (integer, real, string).

2. Training Phase
Input: 200 NSL records (60 normal, 140 attacks of every type);

2.1. First Layer-Cellular immunity (T & B cells reproduction) - Clonal and Expansion
For (all ALC types) do /* Calculate the number of ALCs selected for the cloning operation */
    SelectThNo = (Th_size × SelectTh) / 100;
    SelectTsNo = (Ts_size × SelectTs) / 100;
    SelectBNo = (B_size × SelectB) / 100;
For (all ALC types) do /* Th is shown as an example */
    While (Th_size < MaxThsize) Λ (generate < MaxgenerationALC)
        Calculate the affinity between each ALC and all NonSelf Ags;
        Sort the ALCs in ascending or descending order (depending on whether the affinity measures similarity or difference), according to their affinity;
        Select the SelectThNo ALCs with the highest affinity to all NonSelf Ags as subset A;
        Calculate the clonal rate for each ALC in A, according to its affinity;
        Create clones C as the set of clones of each ALC in A;
        Normalize the SelectThNo highest-affinity ALCs;
        Calculate the mutation rate for each ALC in C, according to its normalized affinity;
        Mutate each ALC in C according to its mutation rate, at a randomly selected allele number, giving the set of mutated clones C';
        /* Apply NS between the mutated ALCs C' and the Self patterns */
        For (all Self patterns) do NS
            Calculate affinity by Landscape-affinity rule between current
            Th-ALC & all Self patterns;
            Normalize affinities in range [1-100]
            If (all affinity < ThNS)
                /* Apply PS between the mutated ALCs that survived NS and the NonSelf Ags */
                For (all NonSelf Ags) do PS
                    Calculate affinity by Landscape-affinity rule between current Th-ALC & all NonSelf Ags;
                    Normalize affinities in range [1-100]
                    If (all affinity >= ThPS)
                        Th-ALC survives; save it in file "Thmem.txt";
                        Th_size = Th_size + 1;
                    Else
                        Discard current Th-ALC;
                        Go to next Th-ALC
                    End If
        Add the mutated ALCs that survived NS & PS to "Thmem.txt", as Secondary response;
        generate++;
    End While
End For
Call Complement System to activate it;

2.2. Second Layer-Humoral immunity (Complement System)
2.2.A. ICA: Initialization phase
Get NonSelfs as an initial real [0-1] population A0 whose number of CDs equals PopSize.
Stop: if the current population contains CDsn optimal detectors or MaxgenerationCDs generations have been reached.

classifications. All of them are based on the confusion matrix shown in Table (1) [7, 17, 18, 19].

Table (1): The confusion matrix.
                  predicted positives    predicted negatives
real positives    TP                     FN
real negatives    FP                     TN

Where TP (true positive) denotes attack records identified as attack; TN (true negative), normal records identified as normal; FP (false positive), normal records identified as attack; FN (false negative), attack records identified as normal [3, 17, 18].

III. IMMUNITY-INSPIRED IDS PSEUDO CODE
Each phase or layer of the algorithm and its iterative processes are given below:

1. Initialization and Preprocessing phase
1.1. Set all parameters that have constant values:
– Threshold of NS: ThNS = 60, TsNS = 0.2, TbNS = 30, TcompNS = 0.25;
– Threshold of PS: ThPS = 80, TsPS = 0.15, TbPS = 70, TcompPS = 0.15;
– Threshold of Test PS: ThTest = 20, TsTest = 0.1, TbTest = 80, TcompTest = 0.05;
– Generation: MaxgenerationALC = 500, MaxThsize = 50, MaxTssize = 50, MaxBsize = 25.
– Clonal & Expansion: selectTh = 50%, selectTs = 50%, selectB = 100%;
– Complement System: MaxgenerationCDs = 1000, PopSize = NonSelfno., CDlength = 10, Div = 70%, CDno = 50;
– Others: MaxFeature = 10, Interval = 10, classes = 2, ALClength = 10, R-contiguous R = 1, ρ = 2 (the parameter that controls the smoothness of the exponential used in mutation);
– Classes:
• Normalize class: contains all functions and operations needed to perform min-max normalization in the ranges [0-1] and [1-100].
• Cleave-Bind class: contains the Cleave() function OC, the PositiveBind() function OPB, and the ReverseBind() function ORB.
– Input files for Training phase: an NSL or KDD file containing 200 records (60 normal, 140 attack from all attack types).
– Input files for Testing phase: files containing 20% of the KDD or NSL datasets.

1.2. Preprocessing and Information Gain
– Use the 21%NSL dataset file to calculate the following:
– Split the dataset into two classes, normal and attack.
– Convert alphabetic features to numeric.
– Convert all continuous features to discrete, for each class separately:
    For each one of the 41 features Do
        Sort the feature's space values;
        Partition the feature space by the specified Interval number, so that each partition contains the same number of data;
        Find the minimum and maximum values;
        Find the initial assignment value V = (maximum - minimum) / Interval no.;
        Assign each interval i the value Vi = Σi V;
        If a value occurs more than Interval size times in a feature space, it is assigned a partition of its own;
– Calculate Information Gain for every feature in both classes by applying the equations in section 4.3.1.1.
– By selecting the most significant features (MaxFeature = 10), i.e. those with the largest information gain, the system obtained the same features for both classes (normal and attack), but in a different order.
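The discretization and feature-ranking steps of 1.2 can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: the equations of section 4.3.1.1 are not reproduced here, so the sketch uses the standard Shannon-entropy definition of information gain; it cuts each feature at equal-frequency quantiles, computes the gain against the normal/attack label rather than per class, and omits the rule that gives a very frequent value a partition of its own. All function names are hypothetical.

```python
import bisect
import math
from collections import Counter


def equal_frequency_cuts(values, intervals):
    """Cut points that split the sorted values into roughly equal-sized bins."""
    ordered = sorted(values)
    n = len(ordered)
    return sorted({ordered[(k * n) // intervals] for k in range(1, intervals)})


def discretize(values, cuts):
    """Replace each continuous value by the index of its interval."""
    return [bisect.bisect_right(cuts, v) for v in values]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(bins, labels):
    """Entropy of the labels minus the entropy left after splitting by bin."""
    groups = {}
    for b, y in zip(bins, labels):
        groups.setdefault(b, []).append(y)
    total = len(labels)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def top_features(columns, labels, intervals, k):
    """Indexes of the k features with the largest information gain."""
    gains = []
    for index, values in enumerate(columns):
        bins = discretize(values, equal_frequency_cuts(values, intervals))
        gains.append((information_gain(bins, labels), index))
    return [index for _, index in sorted(gains, reverse=True)[:k]]


labels = ["normal"] * 4 + ["attack"] * 4
informative = [0.1, 0.2, 0.3, 0.4, 0.9, 1.0, 1.1, 1.2]
uninformative = [1, 2, 1, 2, 1, 2, 1, 2]
selected = top_features([uninformative, informative], labels, intervals=2, k=1)
```

In this toy input the second column separates the two classes perfectly and the first carries no class information, so only the second would be kept; with the paper's settings the call would use intervals = 10 and k = MaxFeature = 10 over the 41 features.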
    For (all Ag types) do PS
        Calculate the affinity by Landscape-affinity rule between each Ag and the current Thmemory ALC;
        Normalize affinities in range [1-100]
        If (affinity > ThNS)
            Thmemory ALC detects a NonSelf Ag;
            Record Ag name;
            TP = TP + 1; /* number of detected Ags */
        Else
            FP = FP + 1;
/* Repeat the above for TsMemory, BMemory, and CDMemory. */

3.1. Performance Measurement
TN = normalAg - FP;
FN = attackAg - TP;
DetectionRate = TP / (TP + FN);
FalseAlarmRate = FP / (TN + FP);
ACY = (TP + TN) / (TP + TN + FP + FN);
Gmean = DetectionRate × (1 - FalseAlarmRate);
Precision = TP / (TP + FP);
Recall = TP / (TP + FN);
F-measure = (2 * Precision * Recall) / (Precision + Recall);

Assign a random real value in [0.5-1] as the cleave probability Pc;

2.2.B. ICA: Identify Phase
While ((CD_size < CDsn) Λ (generate <= MaxgenerationCDs))
    For (each CD in Population A0) do
        For (all Self patterns) do NS
            Calculate affinity by Euclidean distance between current CD & all Self patterns;
            Normalize affinities in range [0-1]
            If (all affinity > TcompNS)
                Put current CD in A0NS sub-population;
            Else
                Put current CD in A0Rem sub-population;
    End For
    For (each CD in Population A0NS) do
        For (all NonSelf Ags) do PS
            Calculate affinity by Euclidean distance between current CD & all NonSelf Ags;
            If (all affinity <= TcompPS)
                Save it in file "CDmem.txt";
                CD_size = CD_size + 1
            Else
                Discard current CD;
    End For
    Sort all CDs in A0NS in ascending order of their affinity with the NonSelf Ags, and put them in At;
    Append A0Rem to the end of At;

2.2.C. ICA: Active phase
    Divide At into At1 and At2 depending on the Div active variable; /* At1 is a cleave set, At2 is a bind set */
    For (each CD individual in At1) do
        Apply the cleave operator to the CD with cleave probability Pc to produce two sub-individuals a1 and a2, OC(CD, Pc, a1, a2);
    For (all sub-individuals a2) do
        Bind them into one remainder cleave set bt by the positive bind operator OPB, bt = OPB(a2i, …, a2n);

2.2.D.
ICA: Membrane attack process
    For (each CD individual ai in At2) do
        Bind bt with the current individual of At2 by the reverse bind operator ORB, to obtain the membrane attack complex set Ct, Ct = ORB(bt, ai);
    For (each individual ci in Ct) do
        Recode it to the initial CDlength = 10 to get a new set C; /* different strategies may be used for this purpose */
    Create a random population of CD individuals as a set D;
    Join C and D into one set E and consider it the new population; E = C ∪ D, A0 = E;
    Generate++;
End While

3. Testing Phase
Input: 21%NSL dataset;
Initialize: FP, FN, TP, TN, DetectionRate, FalseAlarmRate, ACY, Gmean.
/* Count normalAg & attackAg, only for the purpose of calculating the performance measurements */
For (each record in input file) do
    If (record type is normal)
        normalAg = normalAg + 1;
    Else
        attackAg = attackAg + 1;
/* Antigens Presentation */
Convert all input Self & NonSelf Ags to (integer, real, string).
Apply Min-Max normalization only to those that have real values, to bring them into the range [0-1].
Read ThMemory ALCs;
Read TsMemory ALCs;
Read BMemory ALCs;
Read CDMemory Detectors;
/* Apply PS between all input Ags (Self & NonSelf, i.e. normal & attack) and all memory ALCs */
For (all Thmemory ALCs) do /* Th is shown as an example */

IV. SYSTEM PROPERTIES
The special properties of Immunity IDS are:
– The small size of the training data: about 200 NSL records (60 normal, 140 attack of different types).
– The speed of the system: training takes about 1 minute because of the small size of the training data, and testing takes a few minutes, depending on the number of memory ALCs.
– The test results differ after each training operation, because they depend on the random mutation of the ALCs.
– The number of memory ALCs depends on the number of retraining runs, or on what the system requires.
– The system permits deleting all memory contents to start a new training; alternatively, for every new training after the first one, the resulting ALCs are added to memory alongside the previous ones.
– The detection rate is high even with the small number of memory ALCs produced from one training.
– To apply the Immunity IDS in practice, the best result of one or more trainings is chosen, to obtain the optimal outcome.
– The threshold values were determined through many experiments until suitable values were found.
– The IIDS was implemented in the C# language.

V. Experimental Results
1) Several series of experiments were performed with 175 detectors (memory ALCs). Table (2) shows the test results of 10 training operations performed in series on 200 records and tested on the "NSLTest-21.txt" file, which contains 9698 attack records and 2152 normal records.
2) Comparison of performance (ACY) between single-level detection and multilevel detection. ACY is chosen because it includes both TPR and TNR. Table (3) and Figure (2) show the test results of 5 training operations, also performed in series, on the "NSLTest%.txt" file. Notice that the CDs have the highest accuracy and the B cells the lowest. Although the accuracy of IIDS is lower than that of CD, IIDS has the higher detection rate; this is due to the effect of false alarms.

Table (2): Results of test experiments.
Train no.   TP     TN     FP    FN    TPR    FAR    ACY    g_m.   Prec.  F-m.
1           8748   2108   44    950   0.9    0.02   0.92   0.88   0.99   0.94
2           8893   1871   281   805   0.92   0.13   0.91   0.80   0.97   0.94
3           8748   2123   29    950   0.9    0.01   0.92   0.89   1      0.95
4           8730   2146   6     968   0.9    0      0.92   0.9    1      0.95
5           8800   1971   181   898   0.91   0.08   0.91   0.84   0.98   0.94
6           8788   2014   138   910   0.91   0.06   0.91   0.85   0.98   0.94
7           8802   2007   145   896   0.91   0.07   0.91   0.85   0.98   0.94
8           8817   2046   106   881   0.91   0.05   0.92   0.86   0.99   0.95
9           8833   2002   150   865   0.91   0.07   0.91   0.85   0.98   0.94
10          8869   1963   189   829   0.91   0.09   0.91   0.83   0.98   0.94
FAR is the false alarm rate, FP / (TN + FP).

Table (3): Accuracy of IIDS and each type of ALCs.
ACY     Train 1   Train 2   Train 3   Train 4   Train 5
IIDS    0.91      0.91      0.91      0.91      0.91
Ts      0.73      0.78      0.74      0.74      0.77
Th      0.84      0.84      0.84      0.84      0.84
B       0.22      0.21      0.22      0.25      0.30
CD      0.92      0.92      0.92      0.92      0.92

[Figure: accuracy (vertical axis, 0 to 1) against train no. (horizontal axis, 1 to 5) for IIDS, Th, Ts, B, and CD]
Figure (2): Accuracy curve comparing the single-level detection (Th, Ts, B, CD) and multilevel (IIDS).

REFERENCES
[1] M. Middlemiss, "Framework for Intrusion Detection Inspired by the Immune System", The Information Science Discussion Paper Series, July 2005.
[2] D. Dasgupta, S. Yu, and N. S. Majumdar, "MILA - multilevel immune learning algorithm", in E. Cantu-Paz et al., eds.: Genetic and Evolutionary Computation Conference, Chicago, USA, Springer-Verlag, 2003, pp. 183–194.
[3] D. Dasgupta, S. Yu, and N. S. Majumdar, "MILA - multilevel immune learning algorithm and its application to anomaly detection", DOI 10.1007/s00500-003-0342-7, Springer-Verlag, 2003.
[4] Jungwon Kim, Peter J. Bentley, Uwe Aickelin, Julie Greensmith, Gianni Tedesco, and Jamie Twycross, "Immune System Approaches to Intrusion Detection - A Review", Editorial Manager(tm) for Natural Computing, 2006.
[5] L. N. de Castro and J. Timmis, "Artificial Immune Systems: A New Computational Intelligence Approach", book, Springer, 2002.
[6] Edward Keedwell and Ajit Narayanan, "Intelligent Bioinformatics: The application of artificial intelligence techniques to bioinformatics problems", book, John Wiley & Sons Ltd, 2005.
[7] Yanfang Ye, Dingding Wang, Tao Li, and Dongyi Ye, "An intelligent PE-malware detection system based on association mining", J Comput Virol (2008) 4:323–334, Springer-Verlag France, 2008.
[8] Adel Sabry Issa, "A Comparative Study among Several Modified Intrusion Detection System Techniques", Master Thesis, University of Duhok, 2009.
[9] Luai Al Shalabi, Zyad Shaaban, and Basel Kasasbeh, "Data Mining: A Preprocessing Engine", Journal of Computer Science 2 (9): 735-739, ISSN 1549-3636, Science Publications, 2006.
[10] Paul K. Harmer, Paul D. Williams, Gregg H. Gunsch, and Gary B. Lamont, "An Artificial Immune System Architecture for Computer Security Applications", IEEE Transactions on Evolutionary Computation, vol. 6, no. 3, June 2002.
[11] Dipankar Dasgupta and Luis Fernando Niño, "Immunological Computation Theory and Applications", book, 2009.
[12] Zhou Ji and Dipankar Dasgupta, "Revisiting Negative Selection Algorithms", Massachusetts Institute of Technology, 2007.
[13] Thomas Stibor, "On the Appropriateness of Negative Selection for Anomaly Detection and Network Intrusion Detection", PhD thesis, 2006.
[14] Rune Schmidt Jensen, "Immune System for Virus Detection and Elimination", IMM-thesis-2002.
[15] Fernando Esponda, Stephanie Forrest, and Paul Helman, "A Formal Framework for Positive and Negative Detection Schemes", IEEE, 2002.
[16] A. H. Momeni Azandaryani and M. R. Meybodi, "A Learning Automata Based Artificial Immune System for Data Classification", Proceedings of the 14th International CSI Computer Conference, IEEE, 2009.
[17] Chao Chen, Andy Liaw, and Leo Breiman, "Using Random Forest to Learn Imbalanced Data", Department of Statistics, UC Berkeley, 2004.
[18] Yuchun Tang, Sven Krasser, Paul Judge, and Yan-Qing Zhang, "Fast and Effective Spam Sender Detection with Granular SVM on Highly Imbalanced Mail Server Behavior Data" (Invited Paper), Secure Computing Corporation, North Point Parkway, 2006.
[19] Jamie Twycross, Uwe Aickelin, and Amanda Whitbrook, "Detecting Anomalous Process Behaviour using Second Generation Artificial Immune Systems", University of Nottingham, UK, 2010.
[20] H. Güneş Kayacık, A. Nur Zincir-Heywood, and Malcolm I. Heywood, "Selecting Features for Intrusion Detection: A Feature Relevance Analysis on KDD 99 Intrusion Detection Datasets", 6050 University Avenue, Halifax, Nova Scotia, B3H 1W5, 2006.
[21] Feng Gu, Julie Greensmith, and Uwe Aickelin, "Further Exploration of the Dendritic Cell Algorithm: Antigen Multiplier and Time Windows", University of Nottingham, UK, 2007.
[22] Prachya Pongaksorn, Thanawin Rakthanmanon, and Kitsana Waiyamai, "DCR: Discretization using Class Information to Reduce Number of Intervals", Data Analysis and Knowledge Discovery Laboratory (DAKDL), P. Lenca and S. Lallich (Eds.): QIMIE/PAKDD 2009.
[23] Chen Guangzhu, Li Zhishu, Yuan Daohua, Nimazhaxi, and Zhai Yusheng, "An Immune Algorithm based on the Complement Activation Pathway", IJCSNS International Journal of Computer Science and Network Security, vol. 6, no. 1A, January 2006.
[24] J. McHugh, "Testing intrusion detection systems: a critique of the 1998 and 1999 DARPA intrusion detection system evaluations as performed by Lincoln Laboratory", ACM Transactions on Information and System Security, vol. 3, no. 4, pp. 262–294, 2000.
[25] The NSL-KDD Data Set, http://nsl.cs.unb.ca/NSL-KDD.
[26] M. Tavallaee, E. Bagheri, W. Lu, and A. Ghorbani, "A Detailed Analysis of the KDD CUP 99 Data Set", submitted to the Second IEEE Symposium on Computational Intelligence for Security and Defense Applications (CISDA), 2009.

http://sites.google.com/site/ijcsis/ ISSN 1947-5500