GAA UNIT 5 - Part A

Download as pdf or txt
Download as pdf or txt
You are on page 1of 41

Genetic Algorithms and its

Applications

UNIT V

Presenter
Dr.Siron Anita Susan T
AP/SRMIST
Feature Selection

• Feature selection is the process of selecting a subset of relevant


features (variables, predictors) for use in model construction.
• Feature selection is a process that chooses a subset of features from the
original features so that the feature space is optimally reduced
according to a certain criterion.
• Each feature used in classification increases cost and running time,
motivating the design of systems with small feature sets.
• There is a need to balance a small feature set with the requirement to
achieve high recognition rates, especially under difficult conditions.
• Optimal Feature Subset Techniques: Techniques have been developed to
find an “optimal” subset of features from a larger set, falling into two main
categories:
• Independent Selection: Features are selected independent of their effect on
classification performance. This involves identifying transformations to preserve
information and remove redundancy and noise.
• Performance-based Selection: A subset of features is selected to maintain classifier
performance, accounting for dependencies between features.
• Traditional machine learning indirectly addresses “optimal” feature
selection through biases for simple classification rules, leading to efficient
induction procedures for individual rules (trees) with a few features.
• Individual rules (trees) in machine learning can use different sets of
features, resulting in larger cumulative feature sets than acceptable for
image classification problems.
• Overfitting and Pruning: Traditional machine learning algorithms tend to
overfit training data, especially noisy data, necessitating ad hoc truncating
(pruning) procedures to simplify the induced rules (trees).
• Opportunity for Improvement: There is significant potential to improve the
usefulness of traditional machine learning techniques for automatic feature
selection in image processing.
Feature Selection in Machine learning using GA

• Genetic algorithms (GAs) are known for their ability to efficiently search large,
unknown spaces.
• GAs are relatively insensitive to noise, making them robust for feature selection.
• Due to their noise insensitivity, GAs are an excellent choice for a robust feature
selection strategy.
• Using GAs for feature selection can enhance the performance of texture
classification systems.
• Genetic algorithms (GAs) are a form of inductive learning strategy.
• Adaptive Search Techniques: GAs are adaptive search techniques that show
significant improvement over random and local search methods.
• Exploitation of Information: GAs exploit accumulating information about an
initially unknown search space to bias searches towards promising subspaces.
Cont…
• GAs are domain-independent search techniques, suitable for applications where domain
knowledge and theory are hard to provide.
• Key Issues in Application: The main issues in applying GAs are selecting an appropriate
representation and an adequate evaluation function.
• Feature Selection Representation: In feature selection, the focus is on representing the
space of all possible subsets of a given feature set.
• Binary Representation: The simplest form of representation in feature selection is
binary, where each feature is a binary gene.
• Fixed-Length Binary String: Each individual in the population is a fixed-length binary
string, representing a subset of the given feature set.
• Binary Feature Vector: An individual of length l corresponds to an l-dimensional binary
feature vector X.
• Inclusion or Elimination: Each bit in the binary feature vector indicates whether a feature
is included (1) or eliminated (0).
Evaluation Function

• Choosing an appropriate evaluation function is crucial for the successful


application of genetic algorithms (GAs).
• Implementing a more performance-oriented fitness function tailored for GAs is
essential.
• The fitness function must properly assess the decision rules generated by the
AQ algorithm.
• AQ- focuses on generating decision rules from a given set of training examples
• Each testing example is classified using the AQ-generated rules.
• If a testing example is classified correctly, it is considered a correct recognition.
• The overall fitness function is evaluated by:
• Adding the weighted sum of the match scores of all correct recognitions.
• Subtracting the weighted sum of the match scores of all incorrect
recognitions.
Cont….

• after the value of F was normalized


to the range [-100, 100], the
subtraction ensures that the final
evaluation is always positive with
lower values representing better
classification performance.
Designing Texture Filters with Genetic Algorithms
• Various techniques have been employed for texture-based segmentation, often involving
texture descriptor categories and clustering during a training phase.
• Traditional methods of texture feature extraction are based on statistical or structural
models.
• Statistical Model: Defines texture by characteristic relationships between image elements,
typically determined from tonal values. i.e analyze the distribution and relationship of pixel
intensities to capture texture patterns.
• structural models focus on the arrangement and repetition of basic texture elements called
primitives
Comparators for Texture Feature Extraction:
• Grey Level Spatial Dependency (GLSD) Matrices: Also known as co-occurrence
matrices.
• Sum and Difference Histograms: A simplified approach for texture feature extraction.
• Texture Energy in the Spatial Domain: Derived by convolution, as described by Laws.
• Fractal-Based Methods: Utilizes fractals for texture analysis.
Flow chart of
proposed genetic
algorithm
Texture Discrimination Using Genetic Algorithms

• Goal: Design a mask to maximize inter-class differences and minimize


intra-class differences when correlated with the Fourier spectrum of
given patterns.

Steps for Correlation-Based Optimization Using GA


• Patch Selection: Select rectangular patches from a given image as
training templates representing each class of texture to be detected.

• Fourier Transform (FT):


Perform FT on each patch.
Logarithmically transform the resulting spectra to reduce the range of values, simplifying binary encoding for GAs.

• Objective Function Evaluation:


Evaluate an objective function based on the responses of the correlation of the mask with the Fourier transformed
patches.

• Optimization with Genetic Algorithm (GA):


Apply a GA to optimize the objective function over the set of all possible masks.
Cont..
The Objective Function
• The aim is to design a filter which when correlated with all the members of class A will produce a high
response and when correlated with the members of class B will produce a low response.
• As a starting point the objective function should incorporate the terms:
Filter Realisation Using Genetic Algorithms

• Parameters are mapped to a finite-length symbol string using an appropriate conversion


alphabet.
• The chromosome is two-dimensional, representing a matrix output of a 2D FFT with n×n
dimensions.
• Fourier spectra are transformed logarithmically and quantized to a minimal number of bits. Each
gene is encoded in binary.
• Eight harmonics are used to balance robustness and sensitivity in evaluating the correlation
coefficient.

Genetic Operators:

• Selection: Enhanced by pre-computing the number of offspring per chromosome based on


fitness.
• Crossover: Implemented by exchanging rectangular segments of the chromosome with
breakpoints aligned on gene boundaries.
• Mutation: Occurs with a 0.5% probability, acting as a logical NOT on the value of the bit.
Cont…
Choice of GA Parameters
• Population Size: Limited to 100 individuals.
• Generations: 100 generations of evolution allowed.
• Crossover Probability: Typically set at 0.8 with 2 breakpoints in each dimension.
• Encoding: Logarithm of the real part of spectral magnitude encoded to 8 bits.
• Crossover Strategy: Switching from 1-breakpoint to nnn-breakpoints reduces the
number of generations needed for convergence.
Implementation of the System
• Initial Setup: Start with an image containing regions classified into two classes. Select
32×32 pixel training patches as class members.
• Initialization: First generation initialized with random values.
• GA Optimization: Design a discriminating filter using GA optimization.
• Filter Application: The filter is used to find additional regions of the classes in the same
or different images.
Cont…
Post-Processing the Results of Texture-Based Segmentation
• Segmentation Output: GA-designed filter output may result in
incompletely segmented images.
• Improving Segmentation: Use a maximum likelihood decision rule and
a 5×5 median filter to enhance segmentation.
• Enclosing Contour Derivation: Refine the boundary using strong edge
information to accurately enclose the region of contrasting texture.
System Results
• Result Example: Figure shows the result of the GA-based texture filter
using 32×32 pixel training segments.
• System Description: The system is based on spectral frequency
properties for texture discrimination and exploits well-established
Fourier spectral properties.
Genetic Algorithm Based Knowledge Acquisition on Image Processing

• The Internet enables easy and immediate acquisition of large numbers of digital color images, such as
daily growth of plants in remote fields.
• Detailed information concerning shape, growth rate, and leaf colors of plants can be obtained from
these images.
• Vast quantities of image data increase the time spent extracting detailed information due to the
need for human aid and empirical knowledge of image processing.
• The extraction procedure involves segmenting images of objects and deriving their outlines or areas,
often through routine and trial and error performed by hand.
• Expert systems and automated image processing systems have been studied in various engineering
areas.
• Genetic algorithms are suitable for selecting filtering algorithms and adjusting parameters for
segmenting target components in images due to their optimization and automation by trial and error.
• Researchers have applied GAs to obtain optimal image processing transformations, mapping the
original image into the target.
• Software based on GAs can be used for segmenting images of plants and acquiring knowledge on the
operations involved.
Image Segmentation Strategy

• Numerous efficient filtering algorithms Filtering algorithms used as phenotypes


exist for tasks such as noise elimination
and edge enhancement.
• Implementing all filtering algorithms is
impractical due to increased processing
time.
• Based on empirical knowledge, several
commonly used filtering algorithms are
selected for implementation in the
developed algorithm (Table).
• Thresholding and reversion algorithms
are performed on focused pixels in serial
order.
• Other algorithms use spatialmask
operators.
Cont…
Common Procedures for Segmentation (Figure):
• Smoothing (SM): Averages color of component areas using smoothing.
• Thresholding on Hue (TH): Enhances target components and converts the Flow chart of a typical
image to monochrome. procedure for enhancing
• Edge Enhancement (EE): Differentiation used to outline target features. targets components
• Binarization (TB): Converts the entire monochrome image to binary.
• Reversion (RV): Occasionally enhances binarized pixels.
• Fusion Operations: Noise reduction through expansion (EF) and contraction
(CF), which may be repeated.

• The image is converted to a binarized image with defined target components.


• The HSI model is used instead of the RGB model for efficiency in segmenting
plants.
• All operations are performed after converting pixel values from RGB to HSI.
• A median operator with a 3×3 mask is applied to the hue of pixels.
Cont…
• One operator works on hue, the other on brightness.
• Substitutes null for pixel bits within a defined range and unity for those
outside.
• A Sobel operator with a 3×3 mask is used on brightness values.
• Converts images to monochrome by substituting null for pixel saturation.
• Contraction: Replaces a pixel with black if more than one neighboring
pixel is black.
• Expansion: Replaces a pixel with white if more than one neighboring
pixel is white.
• A binary drawing image representing target components with white
pixels and remainders with black pixels is required for fitness evaluation.
Object Localization in Images Using Genetic Algorithm
• Genetic algorithms (GAs) are applied to object registration in medical images, specifically for
blood cell detection, localization, and recognition.
• This approach is appropriate due to the characteristics of blood cell images.
• Object registration involves detecting, localizing, and recognizing objects in images.
• Challenges include overlapping objects against a uniform background and identifying
different classes based on morphology, color, texture, etc.
Digital Processing:
• Major goals: Object detection, localization, recognition, and classification.
• Additional interests: Detailed characterization (size, color, direction, scaling, shift, rotation).
• Image search for specific object prototypes with a required similarity level.
• Digital Image Processing:
• Offers various approaches for processing images.
• Choice of the most suitable method is not fully automated.
• Application of GAs to localize objects in blood cell images.

• The problem of object registration is critical in blood and serum analysis.


• Automated solutions exist, but quality and critical nature often require manual examination
by experts.
Cont…
Genetic Algorithms:
• Non-linear optimization method: Seeks optimal solutions via non-exhaustive, randomly generated solutions.
• Controlled randomness: Parameters control randomness, making GAs flexible and robust.
• Disadvantages: Not suitable for real-time applications, long convergence times, unpredictable convergence
time.
• Advantages: Strong optimization tool, ongoing research combining GAs with fuzzy logic and neural networks.
Mechanisms of Genetic Algorithms:
1. Parent Selection:
1. Methods include roulette, classification, constant situation, proportional forms, and elitist choice.
2. Genetic Operations:
1. Crossover: Combines parents to produce offspring with improved characteristics.
2. Mutation: Introduces variations to improve offspring.
3. Parent Replacement:
1. Generational Replacement: Replaces entire parent generation with offspring.
2. Steady State Reproduction: Gradual replacement of parents by offspring.
• Application to Blood Cell Images:
• GAs are well-suited to the morphology of blood cells.
• Results indicate the effectiveness of GAs in processing these medical images.
Feature Selection in Data-Mining
• A 2-phase approach using a specific genetic algorithm (GA) is employed to discover features and factors in large
databases.
• This heuristic approach is chosen due to the large number of features to consider.
Data Representation:
• Data consists of pairs of affected individuals from the same family, indicating their similarity at given points (loci) of
their chromosomes.
• This is represented in a matrix where each column represents a locus and each row represents a pair of individuals.
Objectives:
• Identify the most relevant associations of features (loci).
• Classify individuals based on the similarities according to the identified associations.
• Phase 1: Feature Selection Using Genetic Algorithm (GA):
• A GA is used to address the feature selection problem.
• Advanced mechanisms introduced in the GA for this specific problem:
• Sharing: Ensures diversity in the population.
• Random Immigrant: Introduces new individuals to maintain diversity.
• Dedicated Genetic Operators: Customized operators tailored to the problem.
• Particular Distance Operator: Defined to measure distances between individuals.
• Phase 2: Clustering Using K-means Algorithm:
• Clustering based on features selected in Phase 1.
• K-means Algorithm: Popular clustering algorithm used to group individuals with similar features.
The Clustering Phase: Use of K-Means Algorithm
• The k-means algorithm is an iterative method for clustering
data.
• The algorithm requires an initial classification or partitioning of
the data.

• Algorithm Steps:
• Compute Cluster Centers: Calculate the center (mean) of
each cluster.
• Reassign Objects: Assign each object to the cluster with the
nearest center using the Hamming distance.
• Repeat Cycle: This process is repeated for a specified number
of iterations or until the cluster assignments do not change
during an iteration.

• The procedure is initialized by randomly selecting initial cluster


centers.

Application:
Classical k-Means Algorithm: Implemented due to the reduced number of features.
Versatility: The approach can be applied to any large databases depending on the user's application.
Genetic Algorithm Based Fuzzy Data Mining to Intrusion Detection

• The widespread use of computer networks and the rise of e-commerce have made computer
network security a global priority.
• Intrusion Detection Categories:
• Misuse Detection: Recognizes attacks based on known patterns reported by experts. Vulnerable
to new or masked attack patterns.
• Anomaly Detection: Identifies intrusions by detecting deviations from normal behavior
patterns.
• AI Techniques in Intrusion Detection:
• Rule-Based Expert Systems: Use if-then rules based on expert knowledge of known attacks
(e.g., SRI’s IDES). The rule acquisition process is tedious and prone to errors.
• Machine Learning Techniques: Automate learning of attack patterns (e.g., TIM, neural network-
based systems).
• Data Mining in Intrusion Detection:
• Challenges: Rules derived directly from audit data may miss slight deviations or cause false
alarms due to minor changes in normal behavior.
• Fuzzy Logic Integration: Helps address these challenges by smoothing the separation between
normality and abnormality, providing a degree of normality or abnormality.
Cont…
Prototype Intelligent Intrusion Detection System (IIDS):
• Combines Two Approaches:
• Anomaly-based detection using fuzzy data mining.
• Misuse detection using traditional rule-based expert systems.
• Inputs: Uses both network traffic and system audit data.
Use of Genetic Algorithms:

• To tune fuzzy membership functions for improved performance.


• To select the most informative features from audit data for the data mining component.

Advantages of Fuzzy Logic in Intrusion Detection:

• Handles quantitative features well.


• Smooths abrupt separations between normal and abnormal behavior.
• Provides degrees of normality or abnormality for measurements.
System Goals and Preliminary Architecture

• Long-Term Goal: Design and build an intelligent


intrusion detection system that is:

• Accurate (low false negatives and false positives)


• Flexible
• Not easily fooled by small variations in intrusion
patterns
• Adaptive to new environments
• Modular with both misuse and anomaly
detection components
• Distributed
• Real-time

• System Architecture: Developed with the above


goals in mind.

Architecture of IIDS
Cont…
Machine Learning Component:
• Integrates fuzzy logic with association rules and frequency episodes to learn normal
patterns of system behavior.
• Stores normal behavior as sets of fuzzy association rules and fuzzy frequency episodes.

Anomaly Intrusion Detection Module:


• Extracts patterns from observed audit trails and compares them with stored normal
patterns.
• Alarms an intrusion if the similarity is below a specified threshold.
Misuse Intrusion Detection Modules:
• Use rules written in FuzzyCLIPS to match known attack patterns or suspicious behavior.
• Fuzzy logic makes the system's rules more flexible and less brittle.

Adaptability:
• The machine learning component allows adaptation to new environments.
Cont…
Modular Structure:
• Detection methods implemented as a set of intrusion detection modules.
• Modules can address one or multiple types of intrusions.
• Modules can cooperate to detect intrusions in a loosely coupled manner.
• Different modules may use different methods (e.g., rule-based expert system, neural
network classifier).
• Modular structure eases future system expansion.
Decision-Making Module:
• Decides whether to activate an intrusion detection module (misuse or anomaly).
• Integrates evaluation results provided by the intrusion detection modules.
Communication Module:
• Acts as the bridge between the intrusion detection sentries and the decision-making
module.
• Pre-processes audit data and sends results to the communication module.
• Provides feedback to the sentries.
Very Large Scale Integration (VLSI)

Objective:
The objective of VLSI (Very Large Scale Integration) testing is to create a
collection of test vectors (sequences of input stimuli) that effectively detect
and diagnose manufacturing defects within integrated circuits (ICs).
•Stuck at fault modeling is the widely used fault modeling method in VLSI
Testing.
•Here nodes are assumed to be stuck at either “0” or “1”, for the purpose
fault modeling.
• Testing methodology for a digital circuit
Very Large Scale Integration (VLSI)

• Test vectors are encoded as Binary bit stream.


•Fitness function gives the number of faults covered by each test
vector
VLSI Macro Cell Layout Using Hybrid GA
• Genetic algorithms are effective for solving combinatorial optimization problems.
• The algorithm's "blindness" during the search process in the discrete space of encoding must be
addressed.
• Problem-specific genotype encoding is recommended to ensure the search reaches feasible points after genetic
operators are applied.
• Hybrid techniques, incorporating knowledge-based methods, can enhance the algorithm's performance
during initial individual creation and optimization.
• A novel hybrid genetic algorithm is introduced for solving the macrocell placement problem in VLSI design.
• VLSI chip design involves several steps, including specification, functional design, circuit design, physical
design, and fabrication.
• Macro-cell layout generation is a specific task within the physical design cycle, involving partitioning the circuit
and grouping components into functional units, the macro-cells..
VLSI Macro Cell Layout Using Hybrid GA

• Description of macrocells as rectangular blocks with terminals (pins) for signal transmission.
• Definition of nets connecting terminals for signal transmission, some routed to pads for chip I/O.

• Importance of layout in defining cell positions.


VLSI Macro Cell Layout Using Hybrid GA
• Major objectives: minimization of chip area and interconnection wire length.
• Complexity of the problem due to the exponential increase in possible placements
with the number of blocks.
• Introduction of a hybrid genetic algorithm with a genotype representation based
on binary trees and genetic operators that work directly on this tree structure.

Problem Description
• Inputs of the placement problem are
• a set of blocks with fixed geometries and fixed pin positions
• a set of nets specifying the interconnections between pins of blocks
• a set of pads (external pins) with fixed positions
• a set of user constraints, e.g., block positions/orientations, critical nets, if any
VLSI Macro Cell Layout Using Hybrid GA

• The objective function, which measures the quality of the resulting placement, can be
expressed as follows,

• E =1/(C1ChipArea+C2WireLength)
where C1,C2 are the corresponding weights.
Genetic Layout Optimization
• The Hybrid Genetic Algorithm
• Design of a Hybrid Genetic Algorithm (HGA) incorporating heuristics to enhance
the quality of offspring produced by crossover.

• Initialization of the population with randomly generated individuals.


• Offspring generation through crossover between two randomly selected parents.
• Utilization of layout improvement heuristics, such as RemoveSharp and LocalOpt,
to refine the offspring towards a local maximum.
• Comparison of the fitness of the offspring's layout with the fitness of its parent
layouts:
□ If the offspring's layout has higher fitness than either parent, the parent with
lower fitness is replaced by the offspring in the population.

□ If the offspring's layout has lower fitness than both parents, it is discarded.
The Hybrid Genetic Algorithm
• Mutation process:
□ Random number generation within the range of 0 to 1.
□ If the generated number is less than the specified probability for the mutation
operator, a layout is randomly selected and replaced with a randomized version in the
population.

□ Step 1 : Initialize population randomly


□ Step 2 : Apply RemoveSharp algorithm to all layouts in the initial population
□ Apply LocalOpt algorithm to all layouts in the initial population
□ Step 3 : Select two parents randomly
□ Apply Crossover between parents and generate an offspring
□ Apply RemoveSharp algorithm to offspring
□ Apply LocalOpt algorithm to offspring
□ If Fitness(offspring) > Fitness (any one of the parents) then replace the
□ weaker parent by the offspring
□ Step 4 : Mutate any one randomly selected layout from population
□ Step 5 : Repeat steps 3 and 4 until end of specified number of iterations.
The Hybrid Genetic Algorithm
• Genotype Representation
• Genetic Operators
• Optimization process requires changing the placement of blocks in the layout.
• Genetic operators directly manipulate the tree structure:
• Crossover combines subtrees of parents.
• Mutation modifies the tree of an individual.
• Crossover operator combines subtrees from two parents to form an offspring.
• Resulting layout may be incomplete, requiring the addition of missing blocks.
• Mutation operator modifies by exchanging simple blocks, a block with a meta-block,
or two meta-blocks.
The Hybrid Genetic Algorithm
The Hybrid Genetic Algorithm
Simulation:
• Hybrid genetic algorithm tested on real-life circuits from the MCNC benchmark suite.
• Benchmarks maintained by North Carolina's Microelectronics, Computing and Networking Center.

• Characteristics of circuits provided in Table


• Analysis of heuristics:
• Parameters for optimal performance of the hybrid genetic algorithm:
• RemoveSharp (m): 5
• LocalOpt (q): 6
• Probability of Population size: 50
• Mutation operator: 0.02
• Number of Iterations: 10000.
The Hybrid Genetic Algorithm
The Hybrid Genetic Algorithm

The classical layout problem discussed in problem description; in which wirelength is also a factor in
fitness value is analyzed. But due to technological progress, technologies like Over-The-Cell routing are
now used so there is no need to determine wirelength and to add routing space (through which wires are
routed) to the layout.
Therefore in the fitness function introduced in problem description, C1 is assigned one, and C2 is
assigned zero.
The result of using hybrid GA is shown in Table 10.12.
Self Study:
• GA in network synthesis
• Fuzzy based speed control of business DC motor.
• Application in wireless network for topology planning
• GA application in ATM network

You might also like