Ke-Lin Du · M.N.S. Swamy
Search and Optimization by Metaheuristics: Techniques and Algorithms Inspired by Nature

To My Parents
M.N.S. Swamy
Contents

1 Introduction
  1.1 Computation Inspired by Nature
  1.2 Biological Processes
  1.3 Evolution Versus Learning
  1.4 Swarm Intelligence
    1.4.1 Group Behaviors
    1.4.2 Foraging Theory
  1.5 Heuristics, Metaheuristics, and Hyper-Heuristics
  1.6 Optimization
    1.6.1 Lagrange Multiplier Method
    1.6.2 Direction-Based Search and Simplex Search
    1.6.3 Discrete Optimization Problems
    1.6.4 P, NP, NP-Hard, and NP-Complete
    1.6.5 Multiobjective Optimization Problem
    1.6.6 Robust Optimization
  1.7 Performance Indicators
  1.8 No Free Lunch Theorem
  1.9 Outline of the Book
  References
2 Simulated Annealing
  2.1 Introduction
  2.2 Basic Simulated Annealing
  2.3 Variants of Simulated Annealing
  References
3 Genetic Algorithms
  3.1 Introduction to Evolutionary Computation
    3.1.1 Evolutionary Algorithms Versus Simulated Annealing
  3.2 Terminologies of Evolutionary Computation
  3.3 Encoding/Decoding
  3.4 Selection/Reproduction
  3.5 Crossover
  3.6 Mutation
  3.7 Noncanonical Genetic Operators
  3.8 Exploitation Versus Exploration
  3.9 Two-Dimensional Genetic Algorithms
  3.10 Real-Coded Genetic Algorithms
  3.11 Genetic Algorithms for Sequence Optimization
  References
4 Genetic Programming
  4.1 Introduction
  4.2 Syntax Trees
  4.3 Causes of Bloat
  4.4 Bloat Control
    4.4.1 Limiting on Program Size
    4.4.2 Penalizing the Fitness of an Individual with Large Size
    4.4.3 Designing Genetic Operators
  4.5 Gene Expression Programming
  References
5 Evolutionary Strategies
  5.1 Introduction
  5.2 Basic Algorithm
  5.3 Evolutionary Gradient Search and Gradient Evolution
  5.4 CMA Evolutionary Strategies
  References
6 Differential Evolution
  6.1 Introduction
  6.2 DE Algorithm
  6.3 Variants of DE
  6.4 Binary DE Algorithms
  6.5 Theoretical Analysis on DE
  References
7 Estimation of Distribution Algorithms
  7.1 Introduction
  7.2 EDA Flowchart
  7.3 Population-Based Incremental Learning
  7.4 Compact Genetic Algorithms
  7.5 Bayesian Optimization Algorithm
  7.6 Convergence Properties
  7.7 Other EDAs
    7.7.1 Probabilistic Model Building GP
  References
Index
Abbreviations
Ab Antibody
ABC Artificial bee colony
AbYSS Archive-based hybrid scatter search
ACO Ant colony optimization
ADF Automatically defined function
AI Artificial intelligence
aiNet Artificial immune network
AIS Artificial immune system
BBO Biogeography-based optimization
BFA Bacterial foraging algorithm
BMOA Bayesian multiobjective optimization algorithm
CCEA Cooperative coevolutionary algorithm
cGA Compact GA
CLONALG Clonal selection algorithm
CMA Covariance matrix adaptation
C-MOGA Cellular multiobjective GA
COMIT Combining optimizers with mutual information trees algorithm
COP Combinatorial optimization problem
CRO Chemical reaction optimization
CUDA Compute unified device architecture
DE Differential evolution
DEMO DE for multiobjective optimization
DMOPSO Dynamic population multiple-swarm multiobjective PSO
DNA Deoxyribonucleic acid
DOP Dynamic optimization problem
DSMOPSO Dynamic multiple swarms in multiobjective PSO
DT-MEDA Decision-tree-based multiobjective EDA
EA Evolutionary algorithm
EASEA Easy specification of EA
EBNA Estimation of Bayesian networks algorithm
EDA Estimation of distribution algorithm
EGNA Estimation of Gaussian networks algorithm
ELSA Evolutionary local selection algorithm
1 Introduction

This chapter introduces background material on global optimization and the concept of metaheuristics. Basic definitions of optimization, swarm intelligence, biological processes, evolution versus learning, and the no-free-lunch theorem are described. We hope this chapter will arouse your interest in reading the other chapters.
1.1 Computation Inspired by Nature

Nature is the primary source of inspiration for new computational paradigms. For instance, Wiener's cybernetics was inspired by feedback control processes observable in biological systems. Changes in nature, from the microscopic scale to the ecological scale, can be treated as computations, and natural processes tend toward equilibria that can be viewed as optimal. Such analogies can be exploited to find useful solutions for search and optimization. Examples of natural computing paradigms are artificial neural networks [43], simulated annealing (SA) [37], genetic algorithms [30], swarm intelligence [22], artificial immune systems [16], DNA-based molecular computing [1], quantum computing [28], membrane computing [51], and cellular automata (von Neumann 1966).
From bacteria to humans, biological entities engage in social interaction ranging from altruistic cooperation to conflict. Swarm intelligence borrows the idea of the collective behavior of biological populations. Cooperative problem-solving is an approach that achieves a certain goal by the cooperation of a group of autonomous entities. Cooperation mechanisms are common in agent-based computing paradigms, whether biologically based or not. Cooperative behavior has inspired research in biology, economics, and multi-agent systems. This approach is based on the notion of the payoffs associated with pursuing certain strategies.
Game theory studies situations of competition and cooperation between multiple parties. The discipline started with von Neumann's study on zero-sum games [48]. It has many applications in strategic warfare, economic and social problems, animal behavior, and political voting.
Evolutionary computation, DNA computing, and membrane computing depend on knowledge of the microscopic cell structure of life. Evolutionary computation evolves a population of individuals over generations: it generates offspring by mutation and recombination, and selects the fittest to survive into each new generation. DNA computing and membrane computing are emerging computational paradigms at the molecular level.
Quantum computing is characterized by principles of quantum mechanics, combined with computational intelligence [46]. Quantum mechanics is a mathematical framework, or set of rules, for the construction of physical theories.
All effective formal behaviors can be simulated by Turing machines. For physical devices used for computational purposes, it is widely assumed that all physical machine behaviors can be simulated by Turing machines. When a computational model computes the same class of functions as the Turing machine, but potentially faster, it is called a super-Turing model. Hypercomputation refers to computation that goes beyond the Turing limit, in the sense of super-Turing computation. While Deutsch's (1985) universal quantum computer is a super-Turing model, it is not hypercomputational. The physicality of hypercomputational behavior is considered in [55] from first principles, by showing that quantum theory can be reformulated in a way that explains why physical behaviors can be regarded as computing something in the standard computational state-machine sense.
1.2 Biological Processes
Figure 1.2 displays a chromosome, its DNA makeup, and identifies one gene.
The genome directs the construction of a phenotype, especially because the genes
specify sequences of amino acids which, when properly folded, become proteins. The
phenotype contains the genome. It provides the environment necessary for survival,
maintenance, and replication of the genome.
Heredity is relevant to information theory as a communication process [5]. The conservation of genomes over intervals on the geological timescale and the existence of mutations over shorter intervals can be reconciled by assuming that genomes possess intrinsic error-correction codes. The constraints incurred by DNA molecules result in a nested structure. Genomic codes resemble modern codes, such as low-density parity-check (LDPC) codes or turbo codes [5]. The high redundancy of genomes achieves good error-correction performance by simple means, and DNA is a cheap material.
In AI, some of the most important components are the processes of memory formation, filtering, and pattern recognition. In biological systems, as in the human brain, a model can be constructed of a network of neurons that fire signals with different time-sequence patterns for various input signals. The unit pulse is called an action potential, involving a depolarization of the cell membrane and the subsequent repolarization to the resting potential. The physical basis of this unit pulse is the active transport of ions by chemical pumps [29]. The learning process is achieved by exploiting the plasticity of the weights with which the neurons are connected to one another. In biological nervous systems, the input data are first processed locally and then sent to the central nervous system [33]. This preprocessing is partly to avoid overburdening the central nervous system.
Connectionist systems (neural networks) are mainly based on a single brain-like connectionist principle of information processing, where learning and information exchange occur in the connections. In [36], the connectionist paradigm is extended to integrative connectionist learning systems that integrate into their structure and learning algorithms principles from different hierarchical levels of information processing in the brain, including the neuronal, genetic, and quantum levels. Spiking neural networks are used as the basic connectionist learning model.
1.3 Evolution Versus Learning
The adaptation of creatures to their environments results from the interaction of two processes, namely, evolution and learning. Evolution is a slow stochastic process at the population level that determines the basic structures of a species. Evolution operates on the genetic encoding of biological entities, rather than on the entities themselves. At the other extreme, learning is a process of gradually improving an individual's capability to adapt to its environment by tuning the structure of the individual.
Evolution is based on the Darwinian model, also called the principle of natural
selection or survival of the fittest, while learning is based on the connectionist model
of the human brain. In the Darwinian evolution, knowledge acquired by an individual
during the lifetime cannot be transferred into its genome and subsequently passed
on to the next generation. Evolutionary algorithms (EAs) are stochastic search meth-
ods that employ a search technique based on the Darwinian model, whereas neural
networks are learning methods based on the connectionist model.
Combinations of learning and evolution, embodied by evolving neural networks,
have better adaptability to a dynamic environment [39,66]. Evolution and learning
can interact in the form of the Lamarckian evolution or be based on the Baldwin
effect. Both processes use learning to accelerate evolution.
The Lamarckian strategy allows traits acquired during an individual's life to be written back into the genetic code so that the offspring can inherit its characteristics. Everything an individual learns during its life is encoded back into the chromosome and remains in the population. Although Lamarckian evolution is biologically implausible, EAs, as artificial biological systems, can benefit from the Lamarckian theory. Ideas and knowledge are passed from generation to generation, and the Lamarckian theory can be used to characterize the evolution of human cultures. Lamarckian evolution has proved effective in computer applications. Nevertheless, it has been pointed out that the Lamarckian strategy distorts the population so that the schema theorem no longer applies [62].
The Baldwin effect is biologically more plausible. In the Baldwin effect, learning
has an indirect influence, that is, learning makes individuals adapt better to their envi-
ronments, thus increasing their reproduction probability. In effect, learning smoothes
the fitness landscape and thus facilitates evolution [27]. On the other hand, learning
has a cost, thus there is evolutionary pressure to find instinctive replacements for
learned behaviors. When a population evolves a new behavior, there will be a selective pressure in favor of learning in the early phase, and in favor of instinct in the later phase. Strong bias is analogous to instinct,
and weak bias is analogous to learning [60]. The Baldwin effect only alters the fitness
landscape and the basic evolutionary mechanism remains purely Darwinian. Thus,
the schema theorem still applies to the Baldwin effect [59].
A parent cannot pass its learned traits to its offspring; instead, only the fitness after learning is retained. In other words, the learned behaviors become instinctive behaviors in subsequent generations, and there is no direct alteration of the genotype. The acquired traits finally come under direct genetic control after many generations; this is known as genetic assimilation. The Baldwin effect is purely Darwinian, not
Lamarckian in its mechanism, although it has consequences that are similar to those
of the Lamarckian evolution [59]. A computational model of the Baldwin effect is
presented in [27].
Hybridization of EAs and local search can be based either on the Lamarckian
strategy or on the Baldwin effect. Local search corresponds to the phenotypic plas-
ticity in biological evolution. The hybrid methods based on the Lamarckian strategy
and the Baldwin effect are very successful with numerous implementations.
1.4 Swarm Intelligence

The definition of swarm intelligence was introduced in 1989, in the context of cellular
robotic systems [6]. Swarm intelligence is a collective intelligence of groups of
simple agents [8]. Swarm intelligence deals with collective behaviors of decentralized
and self-organized swarms, which result from the local interactions of individual
components with one another and with their environment [8]. Although there is
normally no centralized control structure dictating how individual agents should
behave, local interactions among such agents often lead to the emergence of global
behavior.
Most species of animals show social behaviors. Biological entities often engage in a rich repertoire of social interaction that can range from altruistic cooperation to open conflict. Well-known examples of swarms are bird flocks, herds of quadrupeds, and fish schools among vertebrates, bacterial colonies, and colonies of social insects such as termites, ants, bees, and cockroaches, all of which exhibit collective behavior. Through flocking, individuals gain a number of advantages, such as reduced chances of being captured by predators, following migration routes in a precise and robust way through collective sensing, improved energy efficiency during travel, and opportunities for mating.
The concept of individual–organization [57] has been widely used to understand the collective behavior of animals. The principle of individual–organization indicates that simple repeated interactions between individuals can produce complex behavioral patterns at the group level [57]. The agents of these swarms behave without supervision, and each agent behaves stochastically owing to its perception of, and influence on, its neighborhood and the environment. The behaviors can be accurately described in terms of individuals following simple sets of rules. The existence of collective memory in animal groups [15] establishes that the previous history of the group structure influences the collective behavior in future stages.
Grouping individuals often have to make rapid decisions about where to move
or what behavior to perform, in uncertain or dangerous environments. Groups are
often composed of individuals that differ with respect to their informational status,
and individuals are usually not aware of the informational state of others. Some
animal groups are based on a hierarchical structure according to a fitness principle
known as dominance. The top member of the group leads all members of that group,
e.g., in the cases of lions, monkeys, and deer. Such animal behaviors lead to stable
groups with better cohesion properties among individuals [9]. Some animals, such as birds, fish, and sheep, live in groups but have no leader. These animals have no global knowledge about their group or environment; instead, they move through the environment by exchanging information with their adjacent members.
Different swarm intelligence systems have inspired several approaches, including particle swarm optimization (PSO) [21], based on the movement of bird flocks and fish schools; the immune algorithm, inspired by the immune systems of mammals; bacterial foraging optimization [50], which models the chemotactic behavior of Escherichia coli; ant colony optimization (ACO) [17], inspired by the foraging behavior of ants; and artificial bee colony (ABC) [35], based on the foraging behavior of honeybee swarms.
Unlike EAs, which are primarily competitive within the population, PSO and ACO adopt a more cooperative strategy. They can be treated as ontogenetic, since the population resembles a multicellular organism optimizing its performance by adapting to its environment.
Many population-based metaheuristics are actually social algorithms. The cultural algorithm [53] was introduced for modeling social evolution and learning. Ant colony optimization is a metaheuristic inspired by the behavior of ant colonies in finding the shortest path to food sources. Particle swarm optimization is inspired by the social behavior and movement dynamics of insect swarms, bird flocking, and fish schooling. Artificial immune systems are inspired by biological immune systems, and exploit their characteristics of learning and memory to solve optimization problems. The society and civilization method [52] utilizes the intra- and intersociety interactions within a society and the civilization model.
These aspects are not only general to biologically inspired natural computing, but also applicable to all agent-based paradigms.
In biological populations, there is a continuous interplay between individuals of the same species and individuals of different species. Such ecological interplay is observed in symbiosis, host–parasite systems, and prey–predator systems, in which two organisms mutually support each other, one exploits the other, or they fight against each other. For instance, symbiosis between plants and fungi is very common, where the fungus invades and lives among the cortex cells of the secondary roots and, in turn, helps the host plant absorb minerals from the soil. Cleaning symbiosis is common in fish.
Natural selection has a tendency to eliminate animals having poor foraging strategies
and favor the ones with successful foraging strategies to propagate their genes. After
many generations, poor foraging strategies are either eliminated or shaped into good
ones.
Foraging can be modeled as an optimization process in which an animal seeks to maximize the energy obtained per unit time spent foraging, or to maximize the long-term average rate of energy intake, under the constraints of its own physiology and environment. Optimization models are also valid for social foraging, where groups of animals forage cooperatively.
Some animals forage as individuals and others forage in groups with a type of collective intelligence. Although an animal needs communication capabilities to perform social foraging, it can then exploit, in essence, the sensing capabilities of the whole group. The group can catch large prey, and individuals can obtain protection from predators while in a group.
In general, a foraging strategy involves finding a patch of food, deciding whether
to proceed and search for food, and when to leave the patch. There are predators and
risks, energy required for travel, and physiological constraints (sensing, memory,
cognitive capabilities). Foraging scenarios can be modeled and optimal policies can
be found using dynamic programming. Search and optimal foraging decision-making
of animals can be one of three basic types: cruise (e.g., tunafish and hawks), saltatory
(e.g., birds, fish, lizards, and insects), and ambush (e.g., snakes and lions). In cruise
search, an animal searches the perimeter of a region; in an ambush, it sits and waits;
in saltatory search, an animal typically moves in some directions, stops or slows
down, looks around, and then changes direction over a whole region.
1.5 Heuristics, Metaheuristics, and Hyper-Heuristics

By searching over a large set of feasible solutions, metaheuristics can often find good solutions with less computational effort than calculus-based methods or simple heuristics can.
Metaheuristics can be single-solution-based or population-based. Single-solution-based metaheuristics maintain a single solution at any time and comprise local search-based metaheuristics such as SA, Tabu search, iterated local search [40,42], guided local search [61], pattern search or random search [31], the Solis–Wets algorithm [54], and variable neighborhood search [45]. In population-based metaheuristics, a number of solutions are updated iteratively until the termination condition is satisfied. Population-based metaheuristics are generally categorized into EAs and swarm-based algorithms. Single-solution-based metaheuristics are regarded as more exploitation-oriented, whereas population-based metaheuristics are more exploration-oriented.
The idea of hyper-heuristics can be traced back to the early 1960s [23]. Hyper-
heuristics can be thought of as heuristics to choose heuristics or as search algorithms
that explore the space of problem solvers. A hyper-heuristic is a heuristic search
method that seeks to automate the process of selecting, combining, generating, or
adapting several simpler heuristics to efficiently solve hard search problems. The low-
level heuristics are simple local search operators or domain-dependent heuristics,
which operate directly on the solution space for a given problem instance. Unlike
metaheuristics that search in a space of problem solutions, hyper-heuristics always
search in a space of low-level heuristics.
Heuristic selection and heuristic generation are currently the two main method-
ologies in hyper-heuristics. In the first method, the hyper-heuristic chooses heuristics
from a set of known domain-dependent low-level heuristics. In the second method,
the hyper-heuristic evolves new low-level heuristics by utilizing the components
of the existing ones. Hyper-heuristics can be based on genetic programming [11] or grammatical evolution [10], which makes them excellent candidates for heuristic generation.
Several Single-Solution-Based Metaheuristics
Search strategies that randomly generate initial solutions and perform a local search are also called multi-start descent search methods. However, randomly creating an initial solution and performing a local search often results in low solution quality, as the complete search space is searched uniformly and the search cannot focus on promising areas of the search space.
Variable neighborhood search [45] combines local search strategies with dynamic
neighborhood structures subject to the search progress. The local search is an inten-
sification step focusing the search in the direction of high-quality solutions. Diver-
sification is a result of changing neighborhoods. By changing neighborhoods, the
method can easily escape from local optima. With an increasing cardinality of the
neighborhoods, diversification gets stronger as the shaking steps can choose from a
larger set of solutions and local search covers a larger area of the search space.
Guided local search [61] uses a similar principle and dynamically changes the
fitness landscape subject to the progress that is made during the search so that local
search can escape from local optima. The neighborhood structure remains constant.
It starts from a random solution x0 and performs a local search returning the local
optimum x1 . To escape the local optimum, a penalty is added to the fitness function
f such that the resulting fitness function h allows local search to escape. A new local
search is started from x1 using the modified fitness function h. Search continues until
a termination criterion is met.
Iterated local search [40,42] connects otherwise unrelated local search phases, as it creates initial solutions not randomly but from solutions found in previous local search runs. If the perturbation steps are too small, the search cannot escape from a local optimum. If the perturbation is too strong, the search behaves like a multi-start descent search method. The modification step as well as the acceptance criterion can depend on the search history.
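As a rough sketch of this scheme (an illustration added here, not the book's implementation), the following Python fragment assumes a user-supplied local search routine and perturbation operator; the one-dimensional test function, the hill-climbing inner search, and the perturbation strength are all arbitrary choices.

```python
import random

def iterated_local_search(f, x0, local_search, perturb, n_iters=100):
    # ILS skeleton: perturb the incumbent, re-run local search, keep improvements.
    best = local_search(f, x0)
    for _ in range(n_iters):
        candidate = local_search(f, perturb(best))
        if f(candidate) < f(best):          # simple acceptance criterion
            best = candidate
    return best

def hill_climb(f, x, step=0.01, max_steps=1000):
    # A crude one-dimensional descent used here as the inner local search.
    for _ in range(max_steps):
        moved = False
        for nx in (x - step, x + step):
            if f(nx) < f(x):
                x, moved = nx, True
        if not moved:
            break
    return x

f = lambda x: x**4 - 3 * x**2 + x           # a function with two local minima
x_best = iterated_local_search(
    f, x0=2.0, local_search=hill_climb,
    perturb=lambda x: x + random.uniform(-2.0, 2.0))  # perturbation strength matters
print(x_best, f(x_best))
```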
1.6 Optimization
Figure 1.3 The landscape of the Rosenbrock function f(x) with two variables x1, x2 ∈ [−204.8, 204.8]. The spacing of the grid is set to 1. There are many local minima, and the global minimum 0 is at (1, 1).

Simplex search and pattern search are two examples of effective direct search
methods.
Typical nonderivative methods for multivariable functions are random-restart
hill-climbing, random search, many heuristic and metaheuristic methods, and their
hybrids. Hill-climbing attempts to optimize a discrete or continuous function for
a local optimum. When operating on continuous space, it is called gradient ascent.
Other nonderivative search methods include univariate search parallel to an axis (i.e., the coordinate search method), the sequential simplex method, and acceleration methods in direct search such as the Hooke–Jeeves method, Powell's method, and Rosenbrock's
method. Interior-point methods represent state-of-the-art techniques for solving lin-
ear, quadratic, and nonlinear optimization programs.
1.6.1 Lagrange Multiplier Method

The Lagrange multiplier method can be used to analytically solve a continuous function optimization problem subject to equality constraints [24]. By introducing the
Lagrangian formulation, the dual problem associated with the primal problem is
obtained, based on which the optimal values of the Lagrange multipliers can be
found.
Let f(x) be the objective function and h_i(x) = 0, i = 1, . . . , m, be the constraints. The Lagrange function can be constructed as
$$L(\mathbf{x}; \lambda_1, \ldots, \lambda_m) = f(\mathbf{x}) + \sum_{i=1}^{m} \lambda_i h_i(\mathbf{x}), \qquad (1.1)$$
where λ_i, i = 1, . . . , m, are called the Lagrange multipliers.
The constrained optimization problem is converted into an unconstrained optimization problem: optimize L(x; λ_1, . . . , λ_m). By setting
$$\frac{\partial}{\partial \mathbf{x}} L(\mathbf{x}; \lambda_1, \ldots, \lambda_m) = \mathbf{0}, \qquad (1.2)$$
$$\frac{\partial}{\partial \lambda_i} L(\mathbf{x}; \lambda_1, \ldots, \lambda_m) = 0, \quad i = 1, \ldots, m, \qquad (1.3)$$
and solving the resulting set of equations, we can obtain the x position at the extremum of f(x) under the constraints.
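As a small worked example (added here for illustration; it is not taken from the text), consider minimizing f(x) = x_1^2 + x_2^2 subject to the single constraint h_1(x) = x_1 + x_2 − 1 = 0. The Lagrange function is
$$L(\mathbf{x}; \lambda_1) = x_1^2 + x_2^2 + \lambda_1 (x_1 + x_2 - 1).$$
Conditions (1.2) and (1.3) give 2x_1 + λ_1 = 0, 2x_2 + λ_1 = 0, and x_1 + x_2 − 1 = 0, whose solution is x_1 = x_2 = 1/2 with λ_1 = −1, so the constrained minimum of f is 1/2.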
To deal with constraints, the Karush-Kuhn-Tucker (KKT) theorem, as a gener-
alization to the Lagrange multiplier method, introduces a slack variable into each
inequality constraint before applying the Lagrange multiplier method. The conditions
derived from the procedure are known as the KKT conditions [24].
Simplex Search
Simplex search is a group-based deterministic local search method capable of explor-
ing the objective space very fast. Thus many EAs use simplex search as a local search
method after mutation.
A simplex is a collection of n + 1 points in n-dimensional space. In an optimization
problem involving n variables, the simplex method searches for an optimal solution by evaluating a set of n + 1 points. The method continuously forms new simplices
by replacing the point having the worst performance in a simplex with a new point.
The new point is generated by reflection, expansion, and contraction operations.
In a multidimensional space, the subtraction of two vectors means a new vector
starting at one vector and ending at the other, like x2 − x1 . We often refer to the
subtraction of two vectors as a direction. Addition of two vectors can be implemented
in a triangular way, moving the start of one vector to the end of the other to form
another vector. The expression x3 + (x2 − x1 ) can be regarded as the destination of
a moving point that starts at x3 and has a length and direction of x2 − x1 .
For every new simplex, the vertices are ranked according to their objective values. Then simplex search repeats reflection, expansion, contraction, and shrinking in a very efficient and deterministic way. The vertices of the simplex will move toward the optimal point and the simplex will become smaller and smaller. Stopping criteria can be a predetermined maximal number of iterations, the length of the edges, or the improvement rate of the best vertex.
Simplex search for minimization is shown in Algorithm 1.1. The coefficients for
the reflection, expansion, contraction, and shrinking operations are typically selected
as α = 1, β = 2, γ = −1/2, and δ = 1/2. The initial simplex is important. The
search may easily get stuck for too small an initial simplex. This simplex should be
selected depending on the nature of the problem.
Algorithm 1.1 (Simplex search for minimization).
1. Initialize parameters. Randomize the set of individuals x_i.
2. Repeat:
   a. Find the worst and best individuals as x_h and x_l. Calculate the centroid x̄ of all x_i, i ≠ h.
   b. Enter reflection mode: x_r = x̄ + α(x̄ − x_h).
   c. If f(x_l) < f(x_r) < f(x_h), set x_h ← x_r;
      else if f(x_r) < f(x_l), enter expansion mode: x_e = x̄ + β(x̄ − x_h);
         if f(x_e) < f(x_l), set x_h ← x_e; else set x_h ← x_r;
      else if f(x_r) > f(x_i), ∀i ≠ h, enter contraction mode: x_c = x̄ + γ(x̄ − x_h);
         if f(x_c) < f(x_h), set x_h ← x_c;
         else enter shrinking mode: x_i = x_l + δ(x_i − x_l), ∀i ≠ l;
   until the termination condition is satisfied.
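For readers who want to try simplex search without coding Algorithm 1.1 from scratch, the following Python sketch (assuming SciPy is available; it is not part of the original text) runs SciPy's Nelder–Mead implementation on the two-variable Rosenbrock function of Figure 1.3.

```python
import numpy as np
from scipy.optimize import minimize

def rosenbrock(x):
    # Two-variable Rosenbrock function; the global minimum 0 is at (1, 1).
    return 100.0 * (x[1] - x[0] ** 2) ** 2 + (1.0 - x[0]) ** 2

x0 = np.array([-1.2, 1.0])                       # an arbitrary starting point
result = minimize(rosenbrock, x0, method="Nelder-Mead",
                  options={"xatol": 1e-8, "fatol": 1e-8, "maxiter": 2000})
print(result.x, result.fun)                      # close to (1, 1) and 0
```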
1.6.3 Discrete Optimization Problems

A discrete (combinatorial) optimization problem minimizes an objective function f : X → R over a search space X ⊂ R^N defined by a finite set of N discrete decision variables x = (x1, x2, . . . , xN)^T, subject to a set of constraints on x. The space X is constructed according to all the constraints imposed on the problem.
Definition 1.2 (Feasible solution). A vector x that satisfies the set of constraints for
an optimization problem is called a feasible solution.
The traveling salesman problem (TSP) is perhaps the most famous COP. Given a set of points, either nodes on a graph or cities on a map, find the shortest possible tour that visits every point exactly once and then returns to its starting point. There are (n − 1)!/2 possible tours for an n-city symmetric TSP. TSP arises in numerous applications, from the routing of wires on a printed circuit board (PCB) and VLSI circuit design to fast food delivery.
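To make the (n − 1)!/2 count concrete, the following sketch (illustrative only; the city coordinates are invented) enumerates every tour of a tiny symmetric instance by fixing the starting city and trying all permutations of the remaining cities; each distinct tour is visited twice, once per direction.

```python
from itertools import permutations
from math import dist

# Hypothetical coordinates of six cities; any small symmetric instance will do.
cities = [(0, 0), (2, 1), (3, 4), (1, 5), (-1, 3), (-2, 1)]
n = len(cities)

def tour_length(order):
    # Length of the closed tour 0 -> order[0] -> ... -> order[-1] -> 0.
    path = [0, *order, 0]
    return sum(dist(cities[a], cities[b]) for a, b in zip(path, path[1:]))

best = min(permutations(range(1, n)), key=tour_length)   # (n-1)! orderings
print(best, tour_length(best))
```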
Multiple traveling salesmen problem (MTSP) generalizes TSP using more than
one salesman. Given a set of cities and a depot, m salesmen must visit all cities
according to the constraints that the route formed by each salesman must start and
end at the depot, that each intermediate city must be visited once and by a single
salesman, and that the cost of the routes must be minimum. TSP with a time window
is a variant of TSP in which each city is visited within a given time window.
The vehicle routing problem concerns the transport of items between depots and
customers by means of a fleet of vehicles. It can be used for logistics and public
services, such as milk delivery, mail or parcel pick-up and delivery, school bus
routing, solid waste collection, dial-a-ride systems, and job scheduling. Two well-
known routing problems are TSP and MTSP.
The location-allocation problem is defined as follows. Given a set of facilities,
each of which serves a certain number of nodes on a graph, the objective is to place
the facilities on the graph so that the average distance between each node and its
serving facility is minimized.
1.6.4 P, NP, NP-Hard, and NP-Complete

An issue related to the efficiency and efficacy of an algorithm is how hard the problem
itself is. The optimization problem is first transformed into a decision problem.
Problems that can be solved using a polynomial-time algorithm are tractable. A
polynomial-time algorithm has an upper bound O(nk ) on its running time, where k is
a constant and n is the problem size (input size). Usually, tractable problems are easy
to solve as running time increases relatively slowly with n. In contrast, problems are
intractable if they cannot be solved by a polynomial-time algorithm and there is a lower bound of Ω(k^n) on the running time, where k > 1 is a constant and n is the input size.
The complexity class P (standing for polynomial time complexity) is defined as
the set of decision problems that can be solved by a deterministic Turing machine
using an algorithm with worst-case polynomial time complexity. P problems are
usually easy as there are algorithms that solve them in polynomial time.
The class NP (standing for nondeterministic polynomial time complexity) is the
set of all decision problems that can be verified by a nondeterministic Turing machine
using a nondeterministic algorithm in worst-case polynomial time. Although nonde-
terministic algorithms cannot be executed directly on conventional computers, this
concept is important and helpful for the analysis of the computational complexity
of problems. All problems in P also belong to the class NP, i.e., P ⊆ NP. There are
also problems where correct solutions cannot be verified in polynomial time.
All decision problems in P are tractable. Those problems that are in NP, but not in
P, are difficult as no polynomial-time algorithms exist for them. There are problems
in NP where no polynomial algorithm is available and which can be transformed into
one another with polynomial effort. A problem is said to be NP-hard if every problem in NP is polynomial-time reducible to it; equivalently, an algorithm for solving an NP-hard problem can be translated into an algorithm for solving any problem in NP. Therefore, NP-hard problems are at least as hard as any problem in NP, but are not necessarily in NP themselves.
The set of NP-complete problems is a subset of NP [14]. A decision problem A is
said to be NP-complete, if A is in NP and A is also NP-hard. NP-complete problems
are the hardest problems in NP; they all have the same complexity in the sense that any one of them can be reduced to any other in polynomial time. They are difficult, as no polynomial-time algorithms are known for them. Decision problems that are not in NP
are even more difficult. The relationship between all these classes is illustrated in
Figure 1.4.
1.6 Optimization 17
Most practical COPs are NP-complete or NP-hard. At present, no algorithm with polynomial time complexity can guarantee that an optimal solution will be found.
1.6.5 Multiobjective Optimization Problem

A general multiobjective optimization problem (MOP) minimizes a vector of k objectives F(x) = (f1(x), f2(x), . . . , fk(x))^T subject to inequality constraints gi(x) ≤ 0, i = 1, 2, . . . , m, and equality constraints
$$h_j(\mathbf{x}) = 0, \quad j = 1, 2, \ldots, p, \qquad (1.7)$$
where x = (x1, x2, . . . , xn)^T ∈ R^n, the objective functions fi : R^n → R, i = 1, . . . , k, and gi, hj : R^n → R, i = 1, . . . , m, j = 1, . . . , p, are the constraint functions of the problem.
Objectives conflict when increasing the quality of one objective tends to simultaneously decrease the quality of another objective. The solution to
an MOP is not a single optimal solution, but a set of solutions representing the best
trade-offs among the objectives.
In order to optimize a system with conflicting objectives, the weighted sum of
these objectives is usually used as the compromise of the system
$$F(\mathbf{x}) = \sum_{i=1}^{k} w_i \bar{f}_i(\mathbf{x}), \qquad (1.8)$$
where $\bar{f}_i(\mathbf{x}) = f_i(\mathbf{x}) / |\max(f_i(\mathbf{x}))|$ are the normalized objectives, and $\sum_{i=1}^{k} w_i = 1$.
For many problems, there are difficulties in normalizing the individual objectives,
and also in selecting the weights. The lexicographic order optimization is based on
the ranking of the objectives in terms of their importance.
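The following Python sketch (added for illustration; the two objectives, the weights, and the sampling box are arbitrary choices) evaluates the weighted-sum scalarization (1.8), estimating the normalizing maxima by random sampling:

```python
import numpy as np

def f1(x):
    return (x[0] - 1.0) ** 2 + x[1] ** 2     # toy objective, minimum at (1, 0)

def f2(x):
    return x[0] ** 2 + (x[1] - 1.0) ** 2     # toy objective, minimum at (0, 1)

def weighted_sum(x, weights, f_max):
    # Normalize each objective by the magnitude of its estimated maximum,
    # then combine with weights that sum to one, as in (1.8).
    objs = np.array([f1(x), f2(x)]) / np.abs(f_max)
    return float(np.dot(weights, objs))

rng = np.random.default_rng(0)
box = rng.uniform(-2.0, 2.0, size=(5000, 2))            # samples in [-2, 2]^2
f_max = np.array([max(f1(p) for p in box), max(f2(p) for p in box)])
best = min(box, key=lambda x: weighted_sum(x, np.array([0.5, 0.5]), f_max))
print(best)   # a compromise between the individual minima (1, 0) and (0, 1)
```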
Definition 1.7 (Pareto optimal frontier). The Pareto optimal frontier P∗ is the set in R^n formed by all Pareto optimal solutions: P∗ = {x ∈ F | x is Pareto optimal}.
Obtaining the Pareto front of an MOP is the main goal of multiobjective optimization. A good approximation must contain a limited number of points, which should be as close as possible to the exact Pareto front and uniformly spread, so that no regions of the front are left unexplored.
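A minimal Python sketch of Pareto dominance and nondominated filtering for minimization (an added illustration; it uses the objective vectors of Problem 1.5 as test data) is as follows:

```python
def dominates(a, b):
    # For minimization, a dominates b if a is no worse in every objective
    # and strictly better in at least one.
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def nondominated(points):
    # Keep only the points not dominated by any other point.
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

# Objective vectors of Problem 1.5: f(x1)=(1,1), f(x2)=(1,2), f(x3)=(2,1), f(x4)=(2,2).
points = [(1, 1), (1, 2), (2, 1), (2, 2)]
print(nondominated(points))   # [(1, 1)]: only the first point is nondominated
```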
An illustration of Pareto optimal solutions for a two-dimensional problem with two objectives is given in Figure 1.5. The upper border from point A to point B of the domain X, denoted P∗, contains all Pareto optimal solutions. The frontier from f_A to f_B along the lower border of the domain Y, denoted PF∗, is the Pareto frontier in the objective space. For the two points a and b shown in the figure, the mapping f_a dominates f_b.

Figure 1.5 An illustration of Pareto optimal solutions for a two-dimensional problem with two objectives. X ⊂ R^n is the domain of x, and Y ⊂ R^m is the domain of f(x).
1.7 Performance Indicators

• Best-so-far (BSF) records the best solution found by the algorithm thus far for each generation in every run. The BSF index is monotonic.
• Best-of-current-population (BCP) records the best solution in each generation in every run. The mean best fitness (MBF) is the average of the final BCP or final BSF over multiple runs.
• Average-of-current-population (ACP) records the average solution in each generation in every run.
• Worst-of-current-population (WCP) records the worst solution in each generation in every run.
After many runs with random initial settings, we can draw conclusions about an algorithm by applying statistical descriptions, e.g., statistical visualization, descriptive statistics, and statistical inference.
Statistical visualization uses graphs to describe and compare algorithms. The box
plot is widely used for this purpose. Suppose we run an algorithm on a problem 100
times and get 100 values of a performance indicator. We can rank the 100 numbers
in ascending order. On each box, the central mark is the median, the lower and upper
edges are the 25th and 75th percentiles, the whiskers extend to the most extreme
data points not considered outliers, and outliers are plotted individually with the symbol +. The interquartile range (IQR) is the distance between the lower and upper edges of the box. Any data point that lies more than 1.5 IQR below the lower quartile or more than 1.5 IQR above the upper quartile is considered an outlier. Two lines called whiskers are plotted to indicate the smallest number that is not a lower outlier and the largest number that is not an upper outlier. The default 1.5 IQR corresponds to approximately ±2.7σ and 99.3% coverage if the data are normally distributed.
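A plot in this style can be produced from the final BSF values of the runs. The sketch below (an added illustration assuming NumPy and matplotlib are installed; the data are synthetic) shows the call:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Synthetic final BSF values of 100 independent runs of two algorithms,
# made up purely to illustrate the plotting call.
bsf_alg1 = rng.normal(loc=-1.0, scale=1.0, size=100)
bsf_alg2 = rng.normal(loc=-0.5, scale=0.4, size=100)

plt.boxplot([bsf_alg1, bsf_alg2], labels=["Algorithm 1", "Algorithm 2"],
            whis=1.5)          # whiskers at 1.5 IQR; points beyond them are outliers
plt.ylabel("BSF")
plt.show()
```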
The box plot for the BSF performance of two algorithms is illustrated in Figure 1.7. Algorithm 2 has a larger median BSF and a smaller IQR, that is, better average performance along with smaller variance; thus it outperforms Algorithm 1. Also, for the evolving process over many runs, the convergence graph illustrating the performance versus the number of fitness evaluations (NOFE) is quite useful.
Graphs are easy to understand. When the difference between different algorithms
is small, one has to calculate specific numbers to describe and compare the perfor-
mance. The most often used descriptive statistics are the mean and variance (or standard deviation) of the performance indicators. Statistical inference includes parameter estimation, hypothesis testing, and many other techniques.
1.8 No Free Lunch Theorem

Before the no free lunch theorem [63,64] was proposed in 1995, people intuitively believed that there exist universally beneficial algorithms for search, and many people actually made efforts to design such algorithms. The no free lunch theorem asserts that there is no universally beneficial algorithm.
The original no free lunch theorem for optimization states that no search algorithm
is better than another in locating an extremum of a function when averaged over the
set of all possible discrete functions. That is, all search algorithms achieve the same
performance as random enumeration, when evaluated over the set of all functions.
Theorem 1.1 (No free lunch theorem). Given the set of all functions F and a set
of benchmark functions F1 , if algorithm A1 is better on average than algorithm A2
on F1 , then algorithm A2 must be better than algorithm A1 on F \ F1 .
When there is no structural knowledge at all, all algorithms have equal performance. The no free lunch theorem holds for non-revisiting algorithms with no problem-specific knowledge. Intuitively, it holds because of deceptive functions and random functions. Deceptive functions lead a hill-climber away from the optimum, and for random functions the search for the optimum has no guidance at all. For both classes of functions, optimization is like finding a needle in a haystack.
No free lunch theorem is concerned with the average performance for solving
all problems. In applications, such a scenario is hardly ever realistic since there is
almost always some knowledge about typical solutions. Practical problems always
contain priors such as smoothness, symmetry, and i.i.d. samples. The performance
of any algorithm is determined by the knowledge concerning the cost function. Thus, it is meaningless to evaluate the performance of an algorithm without specifying the prior knowledge, and developing search algorithms actually means building special-purpose methods to solve application-specific problems. For example, there are potentially
free lunches for coevolutionary approaches [65].
The no free lunch theorem was later extended to coding methods, cross-validation [67],
early stopping [12], avoidance of overfitting, and noise prediction [41]. Again, it has
been asserted that no one method is better than the others for all problems.
• At the level of immune systems, the whole immune system of the biological entity
is working together to protect the body from damage by antigens. Artificial immune
systems simulate the activity of the immune system.
• At the level of the brain, cognitive processes take place in a life-long incremental
multiple task/multiple modalities learning mode, such as language and reasoning,
and global information processes are manifested, such as consciousness. Tabu
search, fuzzy logic, and reasoning simulate how humans think.
• At the level of a population of individuals, species evolve through evolution via
changing the genetic DNA code. Evolutionary algorithms are inspired by this idea.
• At the level of a population of individuals, individuals interact with one another
by social behaviors. Swarm intelligence contains a large class of algorithms that
simulate the social behaviors of a wide range of animals, from bacteria and insects to fishes, mammals, and humans.
There are also many algorithms inspired by various natural phenomena, such as rivers, tornadoes, and plant reproduction, or by physical laws. Building computational
models that integrate principles from different levels may be efficient for solving
complex problems.
In the subsequent chapters, we respectively introduce optimization methods inspired by physical annealing, biological evolution, Bayesian inference, cultural
propagation, swarming of animals, artificial immune systems, ant colony, bee for-
aging, bacteria foraging, music harmony, quantum mechanics, DNA and molecular
biology, human strategies for problem-solving, and numerous other natural phenom-
ena.
In addition to the specific metaheuristics-based methods, we have also described
some general topics that are common to all optimization problems. The topics treated
are dynamic optimization, multimodal optimization, constrained optimization, mul-
tiobjective optimization, and coevolution.
Recurrent neural network models are also used for solving discrete as well as con-
tinuous optimization in the form of quadratic programming. Reinforcement learning
is a metaheuristic dynamic programming method for solving Markov and semi-
Markov decision problems. Since these neural network methods are useful for a
particular class of optimization problems, we do not treat them in this book. Inter-
ested readers are referred to the textbook entitled Neural Networks and Statistical
Learning by the same authors [19].
Problems
subject to
x1 + 2x2 − x3 + x4 = 2,
2x1 − x2 + x3 + x4 = 4
by using the Lagrange multiplier method.
1.3 Consider the function f (x) = x 3 + 4x 2 + 3x + 1.
(a) Compute its gradient.
(b) Find all its local and global maxima/minima.
1.4 Given a set of points and a multiobjective optimization problem, judge the state-
ment that one point always dominates the others.
1.5 Given four points and their objective function values for multiobjective mini-
mization:
f1 (x1 ) = 1, f2 (x1 ) = 1,
f1 (x2 ) = 1, f2 (x2 ) = 2,
f1 (x3 ) = 2, f2 (x3 ) = 1,
f1 (x4 ) = 2, f2 (x4 ) = 2,
1) Which point dominates all the others?
2) Which point is nondominated?
3) Which point is Pareto optimal?
1.6 Apply exhaustive search to find the Pareto set and Pareto front of the problem
min{sin(x1 + x2 ), sin(x1 − x2 )},
where x1 , x2 ∈ (0, π], and the search resolution is 0.02.
1.7 What are the path, adjacent, ordinal, and matrix representations of the path
1 → 2 → 3 → 4 → 5?
1.8 MATLAB Global Optimization Toolbox provides MultiStart solver for find-
ing multiple local minima of smooth problems by using efficient gradient-based
local solvers. Try MultiStart solver on a benchmark function given in the
Appendix. Test the influence of different parameters.
1.9 Implement the patternsearch solver of MATLAB Global Optimization
Toolbox for solving a benchmark function given in the Appendix. Test the influ-
ence of different parameters.
References
1. Adleman LM. Molecular computation of solutions to combinatorial problems. Science.
1994;266:1021–4.
2. Auger A, Teytaud O. Continuous lunches are free plus the design of optimal optimization
algorithms. Algorithmica. 2010;57:121–46.
3. Banks A, Vincent J, Phalp K. Natural strategies for search. Nat Comput. 2009;8:547–70.
4. Barnard CJ, Sibly RM. Producers and scroungers: a general model and its application to captive
flocks of house sparrows. Anim Behav. 1981;29:543–50.
5. Battail G. Heredity as an encoded communication process. IEEE Trans Inf Theory.
2010;56(2):678–87.
6. Beni G, Wang J. Swarm intelligence in cellular robotics systems. In: Proceedings of NATO Advanced Workshop on Robots and Biological Systems, Toscana, Italy, June 1989, p. 703–712.
7. Bishop CM. Neural networks for pattern recognition. New York: Oxford Press; 1995.
8. Bonabeau E, Dorigo M, Theraulaz G. Swarm intelligence: from natural to artificial systems.
New York: Oxford University Press; 1999.
9. Broom M, Koenig A, Borries C. Variation in dominance hierarchies among group-living ani-
mals: modeling stability and the likelihood of coalitions. Behav Ecol. 2009;20:844–55.
10. Burke EK, Hyde MR, Kendall G. Grammatical evolution of local search heuristics. IEEE Trans
Evol Comput. 2012;16(3):406–17.
11. Burke EK, Hyde MR, Kendall G, Ochoa G, Ozcan E, Woodward JR. Exploring hyper-heuristic
methodologies with genetic programming. In: Mumford CL, Jain LC, editors. Computational
intelligence: collaboration, fusion and emergence. Berlin, Heidelberg: Springer; 2009. p. 177–
201.
12. Cataltepe Z, Abu-Mostafa YS, Magdon-Ismail M. No free lunch for early stopping. Neural
Comput. 1999;11:995–1009.
13. Clark CW, Mangel M. Foraging and flocking strategies: information in an uncertain environment.
Am Nat. 1984;123(5):626–41.
14. Cook SA. The complexity of theorem-proving procedures. In: Proceedings of the 3rd ACM
symposium on theory of computing, Shaker Heights, OH, USA, May 1971, p. 151–158.
15. Couzin ID, Krause J, James R, Ruxton GD, Franks NR. Collective memory and spatial sorting
in animal groups. J Theoret Biol. 2002;218:1–11.
16. de Castro LN, Timmis J. Artificial immune systems: a new computational intelligence approach.
Springer; 2002.
17. Dorigo M, Maniezzo V, Colorni A. Ant system: an autocatalytic optimizing process. Technical
Report 91-016, Politecnico di Milano, Milan, Italy, 1991.
18. Dorigo M, Maniezzo V, Colorni A. The ant system: optimization by a colony of cooperating
agents. IEEE Trans Syst, Man, Cybern Part B. 1996;26(1):29–41.
19. Du K-L, Swamy MNS. Neural networks and statistical learning. London: Springer; 2014.
20. Duenez-Guzman EA, Vose MD. No free lunch and benchmarks. Evol Comput. 2013;21(2):293–
312.
21. Eberhart R, Kennedy J. A new optimizer using particle swarm theory. In: Proceedings of the
6th International symposium on micro machine and human science, Nagoya, Japan, October
1995, p. 39–43.
22. Engelbrecht AP. Fundamentals of computational swarm intelligence. New Jersey: Wiley; 2005.
23. Fisher H, Thompson GL. Probabilistic learning combinations of local job shop scheduling rules.
In: Muth JF, Thompson GL, editors. Industrial scheduling. New Jersey: Prentice Hall;1963. p.
225–251.
24. Fletcher R. Practical methods of optimization. New York: Wiley; 1991.
25. Glover F. Future paths for integer programming and links to artificial intelligence. Comput
Oper Res. 1986;13:533–49.
26. Glover F, Laguna M, Marti R. Scatter search. In: Ghosh A, Tsutsui S, editors. Advances in
evolutionary computing: theory and applications. Berlin: Springer;2003. p. 519–537.
27. Hinton GE, Nowlan SJ. How learning can guide evolution. Complex Syst. 1987;1:495–502.
28. Hirvensalo M. Quantum computing. Springer. 2004.
29. Hodgkin AL, Huxley AF. Quantitative description of membrane current and its application to
conduction and excitation in nerve. J Physiol. 1952;117:500–44.
30. Holland JH. Outline for a logical theory of adaptive systems. J ACM. 1962;9(3):297–314.
31. Hooke R, Jeeves TA. “Direct search” solution of numerical and statistical problems. J ACM.
1961;8(2):212–29.
32. Hopfield JJ, Tank DW. Neural computation of decisions in optimization problems. Biol Cybern.
1985;52:141–52.
33. Hoppe W, Lohmann W, Markl H, Ziegler H. Biophysics. New York: Springer; 1983.
34. Igel C, Toussaint M. A no-free-lunch theorem for non-uniform distributions of target functions.
J Math Model Algorithms. 2004;3(4):313–22.
35. Karaboga D. An idea based on honey bee swarm for numerical optimization. Technical Report
TR06, Erciyes University, Kayseri, Turkey. 2005.
36. Kasabov N. Integrative connectionist learning systems inspired by nature: current models,
future trends and challenges. Natural Comput. 2009;8:199–218.
37. Kirkpatrick S, Gelatt CD Jr, Vecchi MP. Optimization by simulated annealing. Science.
1983;220:671–80.
38. Kleene SC. Introduction to metamathematics. Amsterdam: North Holland; 1952.
39. Ku KWC, Mak MW, Siu WC. Approaches to combining local and evolutionary search for
training neural networks: a review and some new results. In: Ghosh A, Tsutsui S, editors.
Advances in evolutionary computing: theory and applications. Berlin: Springer; 2003. p. 615–
641.
40. Lourenco HR, Martin O, Stutzle T. Iterated local search: framework and applications. In:
Handbook of metaheuristics, 2nd ed. New York: Springer. 2010.
41. Magdon-Ismail M. No free lunch for noise prediction. Neural Comput. 2000;12:547–64.
42. Martin O, Otto SW, Felten EW. Large-step Markov chains for the traveling salesman problem.
Complex Syst. 1991;5:299–326.
43. McCulloch WS, Pitts W. A logical calculus of the ideas immanent in nervous activity. Bull
Math Biophys. 1943;5:115–33.
44. Mirjalili S, Lewis A, Mostaghim S. Confidence measure: a novel metric for robust meta-
heuristic optimisation algorithms. Inf Sci. 2015;317:114–42.
45. Mladenovic N, Hansen P. Variable neighborhood search. Comput Oper Res. 1997;24:1097–
100.
46. Moore M, Narayanan A. Quantum-inspired computing. Technical Report, Department of Com-
puter Science, University of Exeter, Exeter, UK. 1995.
47. Nelder JA, Mead R. A simplex method for function minimization. Comput J. 1965;7:308–13.
48. von Neumann J. Zur Theorie der Gesellschaftsspiele. Math Ann. 1928;100:295–320.
49. Ozcan E, Bilgin B, Korkmaz EE. A comprehensive analysis of hyper-heuristics. Intell Data
Anal. 2008;12(1):3–23.
50. Passino KM. Biomimicry of bacterial foraging for distributed optimisation and control. IEEE
Control Syst Mag. 2002;22(3):52–67.
51. Paun G. Membrane computing: an introduction. Berlin: Springer; 2002.
52. Ray T, Liew KM. Society and civilization: an optimization algorithm based on simulation of
social behavior. IEEE Trans Evol Comput. 2003;7:386–96.
53. Reynolds RG. An introduction to cultural algorithms. In: Proceedings of the 3rd Annual con-
ference on evolutionary programming, San Diego, CA, USA. New Jersey: World Scientific;
1994. p. 131–139
54. Solis FJ, Wets RJ. Minimization by random search techniques. Math Oper Res. 1981;6:19–30.
55. Stannett M. The computational status of physics: a computable formulation of quantum theory.
Nat Comput. 2009;8:517–38.
56. Storn R, Price K. Differential evolution—a simple and efficient heuristic for global optimization
over continuous spaces. J Glob Optim. 1997;11:341–59.
57. Sumpter DJT. The principles of collective animal behaviour. Philos Trans R Soc B. 2006;361(1465):5–22.
58. Talbi E-G. Metaheuristics: from design to implementation. Hoboken, NJ: Wiley; 2009.
28 1 Introduction
59. Turney P. Myths and legends of the Baldwin effect. In: Proceedings of the 13th international
conference on machine learning, Bari, Italy, July 1996, p. 135–142.
60. Turney P. How to shift bias: lessons from the Baldwin effect. Evol Comput. 1997;4(3):271–95.
61. Voudouris C, Tsang E. Guided local search. Technical Report CSM-247, University of Essex,
Colchester, UK. 1995.
62. Whitley D, Gordon VS, Mathias K. Lamarckian evolution, the Baldwin effect and function
optimization. In: Proceedings of the 3rd Conference on parallel problem solving from nature
(PPSN III), Jerusalem, Israel, October 1994. p. 6–15.
63. Wolpert DH, Macready WG. No free lunch theorems for search. Technical Report SFI-TR-95-
02-010, Santa Fe Institute, Sante Fe, NM, USA. 1995.
64. Wolpert DH, Macready WG. No free lunch theorems for optimization. IEEE Trans Evol Com-
put. 1997;1(1):67–82.
65. Wolpert DH, Macready WG. Coevolutionary free lunches. IEEE Trans Evol Comput.
2005;9(6):721–35.
66. Yao X. Evolving artificial neural networks. Proc IEEE. 1999;87(9):1423–47.
67. Zhu H. No free lunch for cross validation. Neural Comput. 1996;8(7):1421–6.
68. Zimmermann HJ, Sebastian HJ. Intelligent system design support by fuzzy-multi-criteria deci-
sion making and/or evolutionary algorithms. In: Proceedings of IEEE international conference
on fuzzy systems, Yokohama, Japan, March 1995. p. 367–374.
69. Zitzler E, Thiele L, Laumanns M, Fonseca CM, da Fonseca VG. Performance assessment of
multiobjective optimizers: an analysis and review. IEEE Trans Evol Comput. 2003;7:117–32.
2 Simulated Annealing
2.1 Introduction
Importance sampling has been used in statistical physics to choose sample states of a particle-system model so as to efficiently estimate some physical quantities; it probabilistically favors states with lower energies, and it underlies SA.
SA is a general-purpose, serial algorithm for finding a global minimum of a continuous function. It is also a popular Monte Carlo algorithm for any optimization problem, including COPs. The solutions obtained by this technique are close to the global minimum within a polynomial upper bound on the computational time and are independent of the initial conditions. Some parallel SA algorithms have been proposed that aim to improve the accuracy of the solutions by exploiting parallelism [5].
The probability of uphill moves in the energy function (ΔE > 0) is large at high T and low at low T. SA allows uphill moves in a controlled fashion: it attempts to improve on greedy local search by occasionally taking a risk and accepting a worse solution. SA can be performed as Algorithm 2.1 [10].
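As a concrete illustration, the following minimal Python sketch realizes the acceptance rule just described (the Metropolis criterion) together with a simple geometric cooling schedule; the objective function, neighborhood move, and schedule constants are placeholders chosen for illustration, not settings prescribed by Algorithm 2.1.

```python
import math
import random

def simulated_annealing(f, x0, neighbor, T0=1.0, alpha=0.95, moves_per_T=50, T_min=1e-4):
    """Minimal SA sketch: Metropolis acceptance with a geometric cooling schedule."""
    x, fx = x0, f(x0)
    best, fbest = x, fx
    T = T0
    while T > T_min:
        for _ in range(moves_per_T):
            y = neighbor(x)                      # candidate solution
            dE = f(y) - fx                       # change in energy (objective)
            # Downhill moves are always accepted; uphill moves with prob. exp(-dE/T)
            if dE <= 0 or random.random() < math.exp(-dE / T):
                x, fx = y, fx + dE
                if fx < fbest:
                    best, fbest = x, fx
        T *= alpha                               # geometric cooling
    return best, fbest

# Usage: minimize a one-dimensional quadratic with Gaussian neighborhood moves
if __name__ == "__main__":
    print(simulated_annealing(lambda x: (x - 3.0) ** 2, 0.0,
                              lambda x: x + random.gauss(0.0, 0.5)))
```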
However, due to its Monte Carlo nature, SA would require for some problems
even more iterations than complete enumeration in order to guarantee convergence to
an exact solution. For example, for an n-city TSP, SA using the logarithmic cooling schedule needs a computational complexity of $O(n^{2n-1})$, which is far more than the $O((n-1)!)$ of complete enumeration and the $O(n^2 2^n)$ of dynamic programming [1].
Thus, one has to apply heuristic fast cooling schedules to improve the convergence
speed.
Figure 2.2 The evolution of a random run of SA (the simulannealbnd solver): the best and the current function values versus the iteration number (final current function value −7.7973e−05)
After restricting the search space to [−10, 10]^2 and selecting a random initial point x_0 ∈ [−0.5, 0.5]^2, a random run gives f(x) = −0.9997 at (3.1347, 3.1542) after 1597 function evaluations. The evolution of the simulannealbnd solver is given in Figure 2.2. These results are very close to the global minimum.
Standard SA is a stochastic search method, and its convergence to the global optimum is very slow under a cooling schedule that reliably guarantees convergence. Many methods, such as Cauchy annealing [18], simulated reannealing [9], generalized SA [19], and SA with known global value [13], have been proposed to accelerate the SA search. There are also global optimization methods that make use of the idea of annealing [15,17].
Cauchy annealing [18] replaces the Boltzmann distribution with the Cauchy distribution, also known as the Cauchy–Lorentz distribution. The infinite variance provides a better ability to escape from local minima and allows the use of faster schedules, such as T decreasing according to $T(t) = T_0/t$.
In simulated reannealing [9], T decreases exponentially with t:
$$T = T_0\, e^{-c_1 t / J}, \qquad (2.5)$$
where the constant c1 > 0, and J is the dimension of the input space. The intro-
duction of reannealing also permits adaptation to changing insensitivities in the
multidimensional parameter space.
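For comparison, this small sketch tabulates the cooling schedules mentioned in this chapter: a logarithmic (Boltzmann-type) schedule, the fast Cauchy schedule T(t) = T0/t, and the exponential reannealing schedule of (2.5); the values of T0, c1, and J are illustrative assumptions.

```python
import math

def boltzmann(T0, t):
    """Logarithmic (Boltzmann) schedule with guaranteed asymptotic convergence."""
    return T0 / math.log(t + 2)

def cauchy(T0, t):
    """Fast (Cauchy) schedule: T decreasing as T0/t."""
    return T0 / (t + 1)

def reannealing(T0, t, c1=1.0, J=2):
    """Exponential schedule of (2.5): T = T0*exp(-c1*t/J), with J the input dimension."""
    return T0 * math.exp(-c1 * t / J)

if __name__ == "__main__":
    T0 = 10.0
    for t in (1, 10, 100, 1000):
        print(t, round(boltzmann(T0, t), 4), round(cauchy(T0, t), 4),
              round(reannealing(T0, t), 6))
```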
Generalized SA [19] generalizes both Cauchy annealing [18] and Boltzmann
annealing [10] within a unified framework inspired by the generalized thermostatis-
tics. Opposition-based SA [20] improves SA in accuracy and convergence rate using
opposite neighbors.
An SA algorithm under the simplifying assumption of known global value [13]
is the same as Algorithm 2.1 except that at each iteration a uniform random point is
generated over a sphere whose radius depends on the difference between the current
function value E (x(t)) and the optimal value E ∗ , and T is also decided by this
difference. The algorithm has guaranteed convergence and an upper bound for the
expected first hitting time, namely, the expected number of iterations before reaching
the global optimum value within a given accuracy [13].
The idea of annealing is a general optimization principle, which can be extended
using fuzzy logic. In the fuzzy annealing scheme [15], fuzzification is performed by
adding an entropy term. The fuzziness at the beginning of the entire procedure is
used to prevent the optimization process getting stuck at an inferior local optimum.
Fuzziness is reduced step by step. The fuzzy annealing scheme results in an increase
in the computation speed by a factor of one hundred or more compared to SA [15].
Since SA works by simulating from a sequence of distributions scaled with dif-
ferent temperatures, it can be regarded as Markov chain Monte Carlo (MCMC) with
a varying temperature. The stochastic approximation Monte Carlo (SAMC) algorithm [12] has a remarkable self-adjusting mechanism: if a proposal is rejected, the weight of the subregion that the current sample belongs to is adjusted to a larger value, and thus a proposal to jump out of the current subregion is less likely to be rejected in the next iteration. Annealing SAMC [11] is a space-annealing version of SAMC. Under mild conditions, it converges weakly at a rate of $O(1/\sqrt{t})$ toward a neighboring set (in the space of energy) of the global minimizers.
Reversible jump MCMC [7] is a framework for the construction of reversible
Markov chain samplers that jump between parameter subspaces of differing dimen-
sionality. The measure of interest occurs as the stationary measure of the chain. This
iterative algorithm does not depend on the initial state. At each step, a transition
from the current state to a new state is accepted with a probability. This acceptance
ratio is computed so that the detailed balance condition is satisfied, under which
the algorithm converges to the measure of interest. The proposal kernel can be decomposed into several kernels, each corresponding to a reversible move. In order for the underlying sampler to be able to jump between different dimensions, the moves used are the birth, death, split, merge, and perturb moves, each selected with a probability of 0.2 [2]. SA combined with the reversible jump MCMC method [2] has proven convergence.
SA performs a random search on the energy surface. Deterministic annealing [16,17] is a deterministic method that replaces stochastic simulation by the use of expectation: randomness is incorporated into the energy or cost function, which is then deterministically optimized at a sequence of decreasing temperatures.
Problems
References
1. Aarts E, Korst J. Simulated annealing and Boltzmann machines. Chichester: Wiley; 1989.
2. Andrieu A, de Freitas JFG, Doucet A. Robust full Bayesian learning for radial basis networks.
Neural Comput. 2001;13:2359–407.
3. Azencott R. Simulated annealing: parallelization techniques. New York: Wiley; 1992.
4. Cerny V. Thermodynamical approach to the traveling salesman problem: an efficient simulation
algorithm. J Optim Theory Appl. 1985;45:41–51.
5. Czech ZJ. Three parallel algorithms for simulated annealing. In: Proceedings of the 4th inter-
national conference on parallel processing and applied mathematics, Naczow, Poland. London:
Springer; 2001. p. 210–217.
6. Geman S, Geman D. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration
of images. IEEE Trans Pattern Anal Mach Intell. 1984;6:721–41.
7. Green PJ. Reversible jump Markov chain Monte Carlo computation and Bayesian model deter-
mination. Biometrika. 1995;82:711–32.
8. Hajek B. Cooling schedules for optimal annealing. Math Oper Res. 1988;13(2):311–29.
9. Ingber L. Very fast simulated re-annealing. Math Comput Model. 1989;12(8):967–73.
10. Kirkpatrick S, Gelatt CD Jr, Vecchi MP. Optimization by simulated annealing. Science.
1983;220:671–80.
11. Liang F. Annealing stochastic approximation Monte Carlo algorithm for neural network train-
ing. Mach Learn. 2007;68:201–33.
12. Liang F, Liu C, Carroll RJ. Stochastic approximation in Monte Carlo computation. J Am Stat
Assoc. 2007;102:305–20.
13. Locatelli M. Convergence and first hitting time of simulated annealing algorithms for contin-
uous global optimization. Math Methods Oper Res. 2001;54:171–99.
14. Metropolis N, Rosenbluth A, Rosenbluth M, Teller A, Teller E. Equations of state calculations
by fast computing machines. J Chem Phys. 1953;21(6):1087–92.
15. Richardt J, Karl F, Muller C. Connections between fuzzy theory, simulated annealing, and
convex duality. Fuzzy Sets Syst. 1998;96:307–34.
16. Rose K, Gurewitz E, Fox GC. A deterministic annealing approach to clustering. Pattern Recog-
nit Lett. 1990;11(9):589–94.
17. Rose K. Deterministic annealing for clustering, compression, classification, regression, and
related optimization problems. Proc IEEE. 1998;86(11):2210–39.
18. Szu HH, Hartley RL. Nonconvex optimization by fast simulated annealing. Proc IEEE.
1987;75:1538–40.
19. Tsallis C, Stariolo DA. Generalized simulated annealing. Phys A. 1996;233:395–406.
20. Ventresca M, Tizhoosh HR. Simulated annealing with opposite neighbors. In: Proceedings
of the IEEE symposium on foundations of computational intelligence (SIS 2007), Honolulu,
Hawaii, 2007. p. 186–192.
21. Xavier-de-Souza S, Suykens JAK, Vandewalle J, Bolle D. Coupled simulated annealing. IEEE
Trans Syst Man Cybern Part B. 2010;40(2):320–35.
3 Genetic Algorithms
Evolutionary algorithms (EAs) are the most influential metaheuristics for optimiza-
tion. Genetic algorithm (GA) is the most popular form of EA. In this chapter, we
first give an introduction to evolutionary computation. A state-of-the-art description
of GA is then presented.
pieces. Mutations can be caused by copying errors in the genetic material during cell
division and by external environment factors. Although the overwhelming majority
of mutations have no real effect, some can cause disease in organisms due to partially
or fully nonfunctional proteins arising from the errors in the protein sequence.
The procedure of a typical EA (in the form of GA) is given by Algorithm 3.1.
The initial population is usually generated randomly, while the populations of subsequent generations are produced by a selection/reproduction procedure. The search
process of an EA will terminate when a termination criterion is met. Otherwise a
new generation will be produced and the search process continues. The termination
criterion can be selected as a maximum number of generations, or the convergence
of the genotypes of the individuals. Convergence of the genotypes occurs when all
the values in the same positions of all the strings are identical, and crossover has no
effect for further processes. Phenotypic convergence without genotypic convergence
is also possible. For a given problem, the objective values are required to be mapped into fitness values so that the fitness values are always greater than zero.
1. Set t = 0.
2. Randomize initial population P (0).
3. Repeat:
a. Evaluate fitness of each individual of P (t).
b. Select individuals as parents from P (t) based on fitness.
c. Apply search operators (crossover and mutation) to parents, and generate
P (t + 1).
d. Set t = t + 1.
until termination criterion is satisfied.
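A minimal Python sketch of this generational loop for binary chromosomes is given below; roulette-wheel selection, one-point crossover, and bit-flip mutation are used as representative operators, and the onemax fitness function and parameter values are placeholders.

```python
import random

def run_ga(fitness, chrom_len, pop_size=30, pc=0.8, pm=0.01, max_gen=100):
    """Sketch of the generational loop of Algorithm 3.1 for binary strings."""
    pop = [[random.randint(0, 1) for _ in range(chrom_len)] for _ in range(pop_size)]
    for _ in range(max_gen):                                   # step 3
        fits = [fitness(c) for c in pop]                       # step 3a: evaluate fitness
        new_pop = []
        while len(new_pop) < pop_size:
            p1, p2 = random.choices(pop, weights=fits, k=2)    # step 3b: fitness-proportional selection
            c1, c2 = p1[:], p2[:]
            if random.random() < pc:                           # step 3c: one-point crossover
                cut = random.randint(1, chrom_len - 1)
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            for c in (c1, c2):                                 # step 3c: bit-flip mutation
                new_pop.append([b ^ 1 if random.random() < pm else b for b in c])
        pop = new_pop[:pop_size]                               # step 3d: next generation
    return max(pop, key=fitness)

if __name__ == "__main__":
    onemax = lambda c: sum(c) + 1      # +1 keeps all fitness values positive
    print(run_ga(onemax, chrom_len=20))
```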
EAs are directed stochastic global search methods. They employ a structured, yet ran-
domized, parallel multipoint search strategy that is biased toward reinforcing search
points of high fitness. The evaluation function must be calculated for all the individ-
uals of the population, thus resulting in a high computation load. The high computa-
tional cost can be reduced by introducing learning into EAs, depending on the prior
knowledge of a given optimization problem.
EAs can be broadly divided into genetic algorithms (GAs) [46,47], evolution
strategies (ESs) [75], evolutionary programming [25], genetic programming (GP)
[55], differential evolution (DE) [88], and estimation of distribution algorithms
(EDAs) [67].
C++ code libraries for EAs are available such as Wall’s GALib at http://lancet.
mit.edu/ga/, and EOlib at http://eodev.sourceforge.net.
Markov chain Monte Carlo (MCMC) methods are often used to sample from
intractable target distributions. SA is an instance of MCMC. The stochastic process
of EAs is basically similar to that of MCMC algorithms: Both are Markov chains with
fixed transition matrices between individual states, for instance, transition matrices
given by mutation and recombination operators for EAs and by perturbation operators
for MCMC. MCMC uses a single chain whereas EAs use a population of individuals
that interact.
In SA, at each search step, the choice between accepting the candidate and keeping the current solution is controlled by a random function. In an EA, this is achieved by the crossover and mutation operations. The
capability of an EA to converge to a premature local minimum or a global optimum is
usually controlled by suitably selecting the probabilities of crossover and mutation.
This is comparable to the controlled lowering of the temperature in SA. Thus, SA
can be viewed as a subset of EAs with a population of one individual and a changing
mutation rate.
SA is too slow for practical use. EAs are much more effective in finding the
global minimum due to their simplicity and parallel nature. The inherent parallel
property also offsets their high computational cost. Combination of SA and EAs
inherits the parallelization of EAs and avoids the computational bottleneck of EAs
by incorporating elements of SA. The hybrid retains the best properties of both
paradigms. Many efforts in the synergy of the two approaches have been made in
the past decade [12,100].
Guided evolutionary SA [100] incorporates the idea of SA into the selection
process of evolutionary computation in place of arbitrary heuristics. The hybrid
method is practically a number of parallel SA processes. Genetic SA [12] provides
a completely parallel, easily scalable hybrid GA/SA method. The hybrid method
combines the recombinative power of GA and annealing schedule of SA. Population
MCMC [21,56] simulates several (Markov) chains in parallel. The MCMC chains
interact through recombination and selection.
Some terminologies used in the evolutionary computation literature are listed below. They are analogous to their biological counterparts.
In biology, genes are entities that parents pass to offspring during reproduction.
These entities encode information essential for the construction and regulation of pro-
teins and other molecules that determine the growth and functioning of the organism.
Definition 3.4 (Allele). The biological definition for an allele is any one of a number
of alternative forms of the same gene occupying a given position called a locus on
a chromosome. The gene’s position in the chromosome is called locus.
Alleles are the smallest information units in a chromosome. In nature, alleles exist
pairwise, whereas in EAs an allele is represented by only one symbol and it indicates
the value of a gene.
Figure 3.1 illustrates the differences between chromosome, gene, and allele.
The fitness function is a particular type of objective function that quantifies the opti-
mality of a solution, i.e., a chromosome, in an EA. It is used to map an individual’s
chromosome into a positive number. Fitness is the value of the objective function
for a chromosome x i , namely f (x i ). The fitness function is used to convert the
phenotype’s parameter values into the fitness.
Natural selection is different from artificial selection. Genetic drift and genetic
flow are two other mechanisms in biological evolution. Genetic flow, also known as
genetic migration, is the migration of genes from one population to another.
The genes of a new generation are a sampling from the genes of the successful
individuals of the previous one, but with some statistical error. Genetic drift is the
cumulative effect over time of this sampling error on the allele frequencies in the
population, and traits that do not affect reproductive fitness change in a population
over time. Like selection, genetic drift acts on populations, altering allele frequencies
and the predominance of traits. It occurs most rapidly in small populations and can cause some alleles to become extinct or to become the only alleles in the population, thus reducing the genetic diversity of finite populations.
3.3 Encoding/Decoding
A chromosome encoding the variables $x_1, x_2, \ldots, x_n$ is the concatenation of their binary codes:
$$\underbrace{01\cdots00}_{x_1}\;\underbrace{10\cdots10}_{x_2}\;\cdots\;\underbrace{10\cdots11}_{x_n}. \qquad (3.1)$$
If the chromosome is $l$ bits long, it has $2^l$ possible values. If the variable $x_i$ is in the range $[x_i^-, x_i^+]$ with a coding $s_{l_i} \cdots s_2 s_1$, where $l_i$ is its bit length in the chromosome and $s_i \in \{0, 1\}$, then the decoding function is given by
$$x_i = x_i^- + \frac{x_i^+ - x_i^-}{2^{l_i} - 1} \sum_{j=0}^{l_i - 1} s_{j+1}\, 2^{j}. \qquad (3.2)$$
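The decoding function (3.2) amounts to a weighted bit sum followed by an affine scaling. The sketch below assumes the substring is stored with the least significant bit first (bits[j] playing the role of s_{j+1}); the example gene and bounds are arbitrary.

```python
def decode(bits, lo, hi):
    """Decode a binary substring into a real value in [lo, hi] using (3.2).
    bits[j] plays the role of s_{j+1}, i.e., the least significant bit comes first."""
    li = len(bits)
    integer_value = sum(s * 2 ** j for j, s in enumerate(bits))
    return lo + (hi - lo) * integer_value / (2 ** li - 1)

# Example: an 8-bit gene decoded into the range [-5, 5]
print(decode([1, 0, 1, 1, 0, 0, 1, 0], -5.0, 5.0))   # about -1.9804 (integer value 77)
```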
In binary coding, there is the so-called Hamming cliffs phenomenon, where large
Hamming distances between the binary codes of adjacent integers occur. Gray coding
is another approach to encoding the parameters into bits. The decimal value of a
Gray-encoded integer variable increases or decreases by 1 if only one bit is changed.
However, the Hamming distance does not monotonically increase with the difference
in integer values.
For a long period, Gray encoding was believed to outperform binary encoding
in GA. However, based on a Markov chain analysis of GA, there is little difference
between the performance of binary and Gray codings for all possible functions [10].
Also, Gray coding does not necessarily improve the performance for functions that
have fewer local minima in the Gray representation than in the binary representation.
This reiterates the no free lunch theorem, namely, no representation is superior for
all classes of problems.
Example 3.1: The conversion from binary coding to Gray coding is formulated as
$$g_i = \begin{cases} b_1, & i = 1, \\ b_i \oplus b_{i-1}, & i > 1, \end{cases} \qquad (3.3)$$
where gi and bi are, respectively, the ith Gray code bit and the ith binary code bit,
which are numbered from 1 to n starting on the left, and ⊕ denotes addition mod 2,
i.e., exclusive-or. Gray coding can be converted into binary coding by
$$b_i = \bigoplus_{j=1}^{i} g_j, \qquad (3.4)$$
where the summation denotes summation mod 2. As an example, we can check the equivalence between the binary code 1011011011 and the Gray code 1110110110.
From the two equations, the most significant i bits of the binary code determine the
most significant i bits of the Gray code and vice versa.
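The two conversions (3.3) and (3.4) can be checked in a few lines of Python; the sketch below verifies the binary/Gray pair quoted in this example.

```python
def binary_to_gray(b):
    """(3.3): g_1 = b_1 and g_i = b_i XOR b_{i-1} for i > 1 (bits numbered from the left)."""
    return [b[0]] + [b[i] ^ b[i - 1] for i in range(1, len(b))]

def gray_to_binary(g):
    """(3.4): b_i is the mod-2 sum (XOR) of g_1, ..., g_i."""
    b, acc = [], 0
    for gi in g:
        acc ^= gi
        b.append(acc)
    return b

binary = [int(c) for c in "1011011011"]
gray = [int(c) for c in "1110110110"]
assert binary_to_gray(binary) == gray
assert gray_to_binary(gray) == binary
print("conversion verified")
```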
3.4 Selection/Reproduction
Selection embodies the principle of survival of the fittest, which provides a driving
force in GA. Selection is based on the fitness of the individuals. From a population
P (t), those individuals with strong fitness will be selected for reproduction so as to
generate a population of the next generation, P (t + 1). Chromosomes with larger
fitness are selected and are assigned a higher probability of reproduction.
Sampling chromosomes from the sample space can be done in a stochastic manner, a deterministic manner, or a mixture of the two. The roulette-wheel selection [47] is
a stochastic selection method, while the ranking selection [33] and the tournament
selection [31] are mixed mode selection methods.
Other approaches that incorporate mating preferences into evolutionary systems
are correlative tournament selection [62] and seduction [76].
Roulette-Wheel Selection
The roulette-wheel or proportional selection [31,47] is a simple and popular selection
scheme. Segments of the roulette wheel are allocated to individuals of the population
in proportion to the individuals’ relative fitness scores. Selection of parents is carried
out by successive spins of the roulette wheel, and an individual's probability of being selected is proportional to its fitness:
$$P_i = \frac{f(\boldsymbol{x}_i)}{\sum_{j=1}^{N_P} f(\boldsymbol{x}_j)}, \quad i = 1, 2, \ldots, N_P. \qquad (3.5)$$
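A direct way to implement (3.5) is one cumulative-sum pass per spin of the wheel, as in the following sketch; the population and fitness values are illustrative.

```python
import random

def roulette_wheel(population, fitness_values):
    """Select one individual with probability proportional to its fitness, as in (3.5)."""
    total = sum(fitness_values)
    r = random.uniform(0.0, total)            # one spin of the wheel
    acc = 0.0
    for individual, f in zip(population, fitness_values):
        acc += f
        if acc >= r:
            return individual
    return population[-1]                     # guard against rounding errors

# Example: individual 'c' (fitness 5 out of a total of 10) is selected about half the time
pop, fits = ["a", "b", "c", "d"], [1.0, 2.0, 5.0, 2.0]
counts = {x: 0 for x in pop}
for _ in range(10000):
    counts[roulette_wheel(pop, fits)] += 1
print(counts)
```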
Elitism Strategy
The elitism strategy for selecting the individual with best fitness can improve the
convergence of GA [78]. The elitism strategy always copies the best individual of a
generation to the next generation. Although elitism may increase the possibility of
premature convergence, it improves the performance of GA in most cases and thus,
is integrated in most GA implementations [18].
Truncation selection is also an elitism strategy. It ranks all the individuals in the
current population according to their fitness and selects the best ones as parents.
Truncation selection is used as the basic selection scheme in ES and is also used in
breeder GA [68]. Breeder GA [68] was designed according to the methods used in
livestock breeding, and is based on artificial selection. Stud GA [52] uses the fittest
individual (the stud) in the population as one of the parents in all recombination
operations. Only one parent is selected stochastically.
Fitness-Uniform Selection/Deletion
Fitness-uniform selection and fitness-uniform deletion [49] achieve a population
which is uniformly distributed across fitness values, thus diversity is always preserved
in the population. Fitness-uniform selection generates selection pressure toward
sparsely populated fitness regions, not necessarily toward higher fitness. Fitness-
uniform deletion always deletes those individuals with very commonly occurring
fitness values. As fitness-uniform deletion is only a deletion scheme, EA still requires
a selection scheme. However, within a given fitness level genetic drift can occur,
although the presence of many individuals in other fitness levels to breed with will
reduce this effect.
Multikulti Selection
In natural mate selection, preferring somewhat different individuals has been shown to increase the offspring's resistance to infection and thus its fitness. Multikulti methods [2] choose the individuals that are going to be sent to
other nodes based on the principle of multiculturality in an island model. In general,
multikulti policies outperform the usual migration policy of sending the best or a
random individual; however, the size of this advantage tends to be greater as the
number of nodes increases [2].
Replacement Strategies
The selection procedure needs to decide as to how many individuals in one population
will be replaced by the newly generated individuals so as to produce the population
for the new generation. Thus, the selection mechanism is split into two phases,
namely, parental selection and replacement strategy. There are many replacement
strategies such as the complete generational replacement, replace-random, replace-
worst, replace-oldest, and deletion by kill tournament [85]. In the crowding strategy
[20], an offspring replaces one of the parents whom it most resembles using the
similarity measure of the Hamming distance. These replacement strategies may result
in a situation where the best individuals in a generation may fail to reproduce. Elitism
strategy cures the problem by storing the best individuals obtained so far [18].
Statistically, the selective pressures of the different replacement strategies are ranked as: replace worst > kill tournament > age-based replacement ≈ replace random.
Elitism increases the selective pressure. Elitism can be combined with the kill tour-
nament, the age-based replacement, and the replace random rule. One can define a
probability for replacement so that the individual selected by the replacement rule
will have a chance to survive. This technique decreases the selective pressure.
3.5 Crossover
Figure 3.2 Illustration of crossover operators. a One-point crossover. b Two-point crossover.
c Multipoint crossover. d Uniform crossover. For multipoint crossover and uniform crossover,
the exchange between crossover points takes place at a fixed probability.
Multipoint Crossover
Multipoint crossover treats each string as a ring of bits divided by m crossover points
into m segments, and each segment is exchanged at a fixed probability.
Uniform Crossover
Uniform crossover exchanges bits of a string rather than segments. Individual bits in
the parent chromosomes are compared, and each of the nonmatching bits is proba-
bilistically swapped with a fixed probability, typically 0.5. The operator is unbiased
with respect to defining length. In half-uniform crossover [23], exactly half of the
nonmatching bits are swapped.
One-point and two-point crossover operations preserve schemata due to low dis-
ruption rates. In contrast, uniform crossover swaps are more exploratory, but have
a high disruptive nature. Uniform crossover is more suitable for small populations,
while two-point crossover is better for large populations. Two-point crossover per-
forms consistently better than one-point crossover [90].
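The following sketch contrasts two-point and uniform crossover on bit strings; the swap probability of 0.5 for uniform crossover follows the typical setting mentioned above, and the all-zeros/all-ones parents are chosen only to make the exchanged positions visible.

```python
import random

def two_point_crossover(p1, p2):
    """Exchange the segment between two randomly chosen cut points."""
    i, j = sorted(random.sample(range(1, len(p1)), 2))
    return p1[:i] + p2[i:j] + p1[j:], p2[:i] + p1[i:j] + p2[j:]

def uniform_crossover(p1, p2, swap_prob=0.5):
    """Swap each nonmatching bit independently with probability swap_prob."""
    c1, c2 = p1[:], p2[:]
    for k in range(len(p1)):
        if p1[k] != p2[k] and random.random() < swap_prob:
            c1[k], c2[k] = p2[k], p1[k]
    return c1, c2

p1, p2 = [0] * 10, [1] * 10
print(two_point_crossover(p1, p2))
print(uniform_crossover(p1, p2))
```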
When all the chromosomes are very similar or even the same in the population, it
is difficult to generate a new structure by crossover only and premature convergence
takes place. Mutation operation can introduce genetic diversity into the population.
This prevents premature convergence from happening when all the individuals in the
population become very similar.
3.6 Mutation
Mutation is a unary operator that requires only one parent to generate an offspring.
A mutation operator typically selects a random position of a random chromosome
and replaces the corresponding gene or bit by other information. Mutation helps to
regain the lost alleles into the population.
Mutations can be classified into point mutations and large-scale mutations. Point
mutations are changes to a single position, which can be substitutions, deletions,
or insertions of a gene or a bit. Large-scale mutations can be similar to the point
mutations, but operate in multiple positions simultaneously, or at one point with
multiple genes or bits, or even on the chromosome scale. Functionally, mutation
introduces the necessary amount of noise to perform hill-climbing.
Inversion and rearrangement operators are also large-scale mutation operators.
Inversion operator [47] picks up a portion between two randomly selected positions
within a chromosome and then reverses it. Swap is the most primitive reordering
operator, based on which many new unary operators including inversion can be
derived. The rearrangement operator reshuffles a portion of a chromosome such
that the juxtaposition of the genes or bits is changed. Some mutation operations are
illustrated in Figure 3.3.
Uniform bit-flip mutation is a popular mutation for binary string representations.
It independently changes each bit of a chromosome with a probability of p. Typ-
ically, p = 1/L for a string of L bits. This in expectation changes one bit in each
chromosome. The probability distribution of fitness values after the operation can
be exactly computed as a polynomial in p [14].
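A sketch of uniform bit-flip mutation with the typical rate p = 1/L is given below; the small experiment checks empirically that about one bit per chromosome is changed in expectation.

```python
import random

def bit_flip_mutation(chromosome, p=None):
    """Flip each bit independently with probability p (default 1/L)."""
    L = len(chromosome)
    p = 1.0 / L if p is None else p
    return [b ^ 1 if random.random() < p else b for b in chromosome]

# Empirical check: on average about one bit per chromosome is changed
L, trials = 50, 10000
parent = [0] * L
flips = sum(sum(bit_flip_mutation(parent)) for _ in range(trials))
print(flips / trials)   # close to 1.0
```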
Figure 3.3 Illustration of some mutation operations (e.g., substitution of a single gene and inversion of a segment)
A high mutation rate can reduce genetic search to random search. It may change the value of an important bit, and thus delay the convergence toward a good solution or slow down convergence in the final stage of the iterations.
In simple GA, mutation is typically selected as a substitution operation that changes
one random bit in the chromosome at a time. An empirically derived formula that
can be used as the probability of mutation $P_m$ at a starting point is $P_m = \frac{1}{\sqrt{T\, l}}$, for a total number of T generations and a string length of l [80].
The random nature of mutation and its low probability of occurrence leads to slow
convergence of GA. The search process can be expedited by using the directed muta-
tion technique [6] that deterministically introduces new points into the population
by using gradient or extrapolation of the information acquired so far.
It is commonly agreed that crossover plays a more important role if the population
size is large, and mutation is more important if the population size is small [69].
In addition to traditional mutation operators, hill-climbing and bit climber are
two well-known local search operators, which can be treated as mutation operators.
Hill-climbing operators [65] find an alternative similar individual that represents a
local minimum close to the original individual in the solution space. The bit climber
[17] is a simple stochastic bit-flipping operator. The fitness is computed for an initial
string. A bit of the string is randomly selected and flipped, and the fitness is computed
at the new point. If the fitness is lower than its earlier value, the new string is updated
as the current string. The operation repeats until no bit flip improves the fitness. The
bit-based descent algorithm is several times faster than an efficient GA [17].
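A possible rendering of the bit climber as a descent operator over bit strings is sketched below; the pass-based stopping rule (stop after a full sweep with no improving flip) is one simple way to realize "until no bit flip improves the fitness", and the objective used in the example is arbitrary.

```python
import random

def bit_climber(objective, x):
    """Bit-based descent: flip bits one at a time, keeping a flip only if it lowers the objective."""
    x, fx = x[:], objective(x)
    improved = True
    while improved:
        improved = False
        for i in random.sample(range(len(x)), len(x)):   # visit bit positions in random order
            x[i] ^= 1                                    # flip one bit
            fy = objective(x)
            if fy < fx:
                fx, improved = fy, True                  # keep the improving flip
            else:
                x[i] ^= 1                                # undo the flip
    return x, fx

# Example: minimize the number of ones (a trivial objective, for illustration only)
start = [random.randint(0, 1) for _ in range(16)]
print(bit_climber(sum, start))
```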
the recipient chromosome. Transduction involves the transfer of genes from a donor
bacterium to a recipient one by a bacteriophage, namely, a virus whose hosts are
bacteria. In contrast with transduction, in conjugation, the absence of a bacteriophage
requires a direct physical contact between the donor bacterium and the recipient one.
Gene transfer operation [70] allows the transfer of a segment between the bacteria
in the population. Bacterial EA [70] substitutes the classical crossover with the gene
transfer operation. Each bacterium represents a solution for the original problem. A
segment of a bacterium is transferred to a destination bacterium, and those genes in
the destination bacterium that appear in the segment from the source bacterium are
removed after the transfer.
Based on a microbial tournament, microbial GA [39] is a minimal steady-state GA
implementation. Thus, once two parent chromosomes are chosen at random from a
population, the winner is unchanged, while the loser or less fit chromosome is infected
by a copy of a segment of the winner’s chromosome and further mutated. This form of
recombination is inspired by bacterial conjugation. A conjugation operator simulating
the genetic mechanism exhibited by bacterial colonies is introduced in [73].
Jumping-Gene Transposition
The jumping-gene (transposon) phenomenon is the transposition of genes in the genome, first discovered in maize plants. Jumping genes can move around
the genome in two ways: cut-and-paste transposon and copy-and-paste (replicate)
transposon. Cut-and-paste cuts a piece of DNA and pastes it somewhere else. Copy-
and-paste means that the genes remain at the same location while the message in
the DNA is copied into RNA and then copied back into DNA at another place in the
genome. The jump of genes enables a transposition of gene(s) to be induced in the
same chromosome or even to other chromosomes.
Transposition operator [11,39,84] is a genetic operator that mimics the jumping-
gene phenomenon. It enables the gene mobility within the same chromosome, or
even to a different chromosome. Transposons resemble computer viruses: They are
the autonomous programs, which are transmissible from one site to another on the
same or another chromosome, or from parent to offspring in the reproduction process.
These autonomous parasitic programs cooperate with the host genetic programs, thus
realizing a process of self-replication.
Crossover for Variable-Length GAs
The complexity of the human genome was not present at the beginning of evolution; rather, it is generally believed that life started off in a simple form and gradually increased its organismal complexity through evolution. Variable-length GAs operate
within a variable parameter space. Consequently, they are usually applied to design
problems, where the phenotype can have a variable number of components and the
problem is incremental in nature.
Messy GA [34] utilizes a variable-length representation. In messy GA, the
crossover operator is implemented by cutting and splicing. Each parent genome
is first cut into two strings at a random point, obtaining four strings. The strings are
For EAs, two fundamental processes that drive the evolution of a population are the
exploration process and the exploitation process. Exploitation means taking advan-
tage of the information already obtained, while exploration means searching differ-
ent regions of the search space. Exploitation is achieved by the selection procedure,
while exploration is achieved by genetic operators, which preserve genetic diversity
in the population. The two objectives are conflicting: increasing the selective pres-
sure leads to decreasing diversity, while keeping the diversity can result in delayed
convergence.
GA often converges rather prematurely before the optimal solution is found. To
prevent premature convergence, an appropriate diversity in the population has to be
maintained. Otherwise, the entire population tends to be very similar, and crossover
will be useless and GA reduces to parallel mutation climbing. The trade-off between
exploitation (convergence) and exploration (diversity) controls the performance of
GA and is determined by the choice of the control parameters, namely, the probability
of crossover Pc , the probability of mutation Pm , and the population size N P . Some
trade-offs are made for selecting the optimal control parameters:
These control parameters depend on one another, and their choices depend on the
nature of the problem. In GA practice, for small N P one can select relatively large Pm
and Pc , while for large N P smaller Pc and Pm are desirable. Empirical results show
that GA with N P = 20 – 30, Pc = 0.75 – 0.95, and Pm = 0.005 – 0.01 performs
well [80]. When crossover is not used, GA can start with large Pm , decreasing toward
the end of the run. In [66], the optimal Pm is analytically derived as Pm = 1/L for
a string length L.
It is concluded from a systematic benchmark investigation on the seven parameters
of GA in [64] that crossover most significantly influenced the success of GA, followed
by mutation rate and population size and then by rerandomization point and elite
strategy. Selection method and the representation precision for numerical values had
the least influence.
Adapting Control Parameters
Adaptation of control parameters is necessary for the best search process. At the
beginning of a search process, GA should have more emphasis on exploration, while
at a later stage more emphasis should be on exploitation.
Increasing Pm and Pc promotes exploration at the expense of exploitation. A
simple method to adapt Pm is implemented by linearly decreasing Pm with the
number of generations, t. Pm can also be modified by [44]
$$P_m(t) = \frac{\alpha_0\, e^{-\gamma_0 t/2}}{N_P \sqrt{l}}, \qquad (3.7)$$
where the constants α0 > 0, γ0 ≥ 0, and l is the length of the chromosome. In [80],
α0 = 1.76 and γ0 = 0.
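As a worked instance of (3.7), with α0 = 1.76 and γ0 = 0 from [80] the rate stays constant at 1.76/(N_P √l); the short computation below also shows how a positive γ0 (an illustrative value, not one taken from the cited works) makes the rate decay with the generation number.

```python
import math

def adaptive_pm(t, n_pop, chrom_len, alpha0=1.76, gamma0=0.0):
    """Mutation-rate schedule (3.7): Pm(t) = alpha0 * exp(-gamma0*t/2) / (n_pop * sqrt(chrom_len))."""
    return alpha0 * math.exp(-gamma0 * t / 2.0) / (n_pop * math.sqrt(chrom_len))

print(adaptive_pm(0, n_pop=30, chrom_len=100))       # about 0.0059, constant when gamma0 = 0
for t in (0, 10, 50):
    print(t, adaptive_pm(t, 30, 100, gamma0=0.05))   # decaying rate for a positive gamma0
```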
In [87], a fitness-based rule is used to assign mutation and recombination rates,
with higher rates being assigned to those genotypes that are most different in fitness
from the fittest individual in the population. This results in a reduced probability of
crossover for the best solutions available in an attempt to protect them. When all
the individuals in the population are very similar, the exploration drive will be lost.
Rank GA [9] is obtained by assigning the mutation rate through a ranking of the
population by fitness. This protects only the current maximal fitness found, while
the rest perform random walks with different step sizes. The worst individuals will
undergo the most changes.
Dynamic control of GA parameters can be based on fuzzy logic techniques [40,
41,57]. In [57], the population sizes, and crossover and mutation rates are determined
from average and maximum fitness values and differentials of the fitness value by
fuzzy reasoning.
Controlling Diversity
The genetic diversity of the population can be easily improved so as to prevent
premature convergence by adapting the size of the population [1,34] and using partial
restart [23]. Partial restart is a simple approach to maintain genetic diversity [23]. It
can be implemented by a fixed restart schedule at a fixed number of generations, or
implemented when premature convergence occurs.
Periodic population reinitialization can increase the diversity of the population.
One methodology combining the effects of the two strategies is saw-tooth GA [54],
which follows a saw-tooth population scheme with a specific amplitude and period
of variation.
Duplicate removal can enhance the diversity substantially. The uniqueness opera-
tor [63] allows a child to be inserted into the population only if its Hamming distance
to all members of the population is greater than a threshold. Analysis of an EA with
N P > 1 using uniform bit mutation but no crossover [28] shows that the duplicate
removal method changes the time complexity of optimizing a plateau function from
exponential to polynomial. Each child, however, has to be compared with all the solutions in the current population.
Diversity-guided EA [94] uses the distance-to-average-point measure to alternate
between phases of exploration (mutation) and phases of exploitation (recombination
and selection). The diversity-guided EA has shown remarkable results not only in
terms of fitness, but also in terms of saving a substantial amount of fitness evaluations
compared to simple EA.
Since the selection operator has a tendency to reduce the population variance,
population variance can be increased by the variation operator to maintain adequate
diversity in the population. A variation operator [5] is a combination of the recombination and the mutation operators. For such an operator, the population mean decision-variable vector should remain the same before and after the operator is applied.
Varying Population Size
Population sizing schemes for EAs may rely on the population sizing theory [60],
or include the concepts of age, lifetime, and competition among species for limited
resources. In [51], a thorough analysis of the role of the offspring population size
in an EA is presented using a simplified, but still realistic EA. The result suggests a
simple way to dynamically adapt this parameter when necessary.
Messy GA [34] starts with a large initial population and halves it at regular intervals
during the primordial stage. In the primordial stage only a selection operation is
applied. This helps the population to get enriched with good building blocks. Fast
messy GA [35] is an improved version of messy GA.
GENITOR [96] employs an elitist selection that is a deterministic, rank-based
selection method so that the best N P individuals found so far are preserved by
using a crossgenerational competition. Crossover produces only one offspring that
immediately enters the population. Offspring do not replace their parents, except for
those least-fit individuals in the population. This selection strategy is similar to the
(λ + μ) strategy of ES.
CHC algorithm [23] stands for crossgenerational elitist selection, heterogeneous
recombination, and cataclysmic mutation. Like GENITOR, it also borrows from the
(λ + μ) strategy of ES. Incest prevention is introduced so that similar individuals
are prevented from mating. Half-uniform crossover is applied, and mutation is not
performed. Diversity is reintroduced by restarting partial population whenever con-
vergence is detected. This is implemented by taking the best individual found so far as a template, randomly flipping a fixed proportion of its bits, and introducing the better offspring into the population.
Parameterless population pyramid [36] is an efficient, general, parameterless
evolutionary approach without user-specified parameters. It replaces the genera-
tional model with a pyramid of multiple populations that are iteratively created and
expanded. The approach scales to the difficulty of the problem when combined with
local search, advanced crossover, and addition of diversity.
Aging
Aging provides a mechanism to make room for the development of the next genera-
tion. Aging is a general mechanism to increase genetic diversity. An optimal lifespan
plays an important role in improving the effectiveness of evolution. For intelligent species that are able to learn from experience, aging prevents older individuals from accumulating so much experience that they are always the superior competitors.
Aging is often used by assigning age 0 to each new offspring. The age is increased
by 1 in each generation. In selection for replacement the age is taken into account:
Search points exceeding a predefined maximal age are removed from the collection
of search points.
GA with varying population size [1] does not use any of the selection mechanisms discussed earlier, but introduces the concept of the age of a chromosome, measured in the number of generations.
In cohort GA [48], a string of high fitness produces offspring quickly, while a
string of low fitness may have to wait a long time before reproducing. All strings
can have the same number of offspring, say two, at the time they reproduce. To
implement this delayed-reproduction idea, the population of cohort GA is divided
into an ordered set of nonoverlapping subpopulations called cohorts. Reproduction
is carried out by cycling through the cohorts in the given order.
Figure 3.4 The evolution of a random run of simple GA: the maximum and average objectives
pairs of matrices [13], and an unbiased crossover operator called UNBLOX (uniform
block crossover) [8]. UNBLOX is a two-dimensional wraparound crossover and can
sample all the matrix positions equally. The convergence rates of two-dimensional
GAs are higher than that of simple GA for bitmaps [8].
Figure 3.5 The evolution of a random run of the real-coded GA with the elitism strategy: the maximum and average objectives
Figure 3.6 The evolution of a random run of simple GA: the minimum and average objectives (both the best and the mean fitness reach −1)
For sequence optimization problems such as scheduling and TSP, permutation encod-
ing is a natural representation for a set of symbols, and each symbol can be identified
by a distinct integer. This representation avoids missing or duplicate alleles [37].
Genetic operators should be defined so that infeasible solutions do not occur or
a way is viable for repairing or rejecting infeasible solutions. Genetic operators for
reordering a sequence of symbols can be unary operators such as inversion and swap,
or binary operators that combine features of inversion and crossover, such as partial
matched crossover, order crossover, and cycle crossover [31], edge recombination
[97], as well as intersection and union [26].
The random keys representation [4] encodes each symbol with a random num-
ber in (0, 1). A random key for a variable is a real-valued number in the interval
(0,1). By sorting the random keys in a descending or ascending order, we can get a
decoded solution. For example, assume that we are solving a TSP of 5 cities, with
the chromosome for a route encoded as (0.52, 0.40, 0.81, 0.90, 0.23). If the genes
are sorted in a descending order, the largest random key is 0.90, so the fourth city is
the beginning of the route, and the whole route can be 4 → 3 → 1 → 2 → 5. This
representation avoids infeasible offspring by representing solutions in a soft manner,
such that real-coded GA and the ES can be applied directly for sequence optimization
problems. The random keys representation is simple and robust, and always allows
simple crossover operations to generate feasible solutions. Ordering messy GA [53]
is specialized for solving sequence optimization problems. It uses the mechanics of
fast messy GA [35] and represents the solutions using random keys.
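Decoding a random-keys chromosome is a single sort, as the following sketch shows; it reproduces the five-city example above.

```python
def decode_random_keys(keys, descending=True):
    """Return the visiting order implied by sorting the random keys."""
    order = sorted(range(len(keys)), key=lambda i: keys[i], reverse=descending)
    return [i + 1 for i in order]          # 1-based city indices

keys = [0.52, 0.40, 0.81, 0.90, 0.23]
print(decode_random_keys(keys))            # [4, 3, 1, 2, 5]
```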
Biased random key GA [22] is a variation of random keys GA, but differs in the
way crossover is performed. In biased random key GA, the population is divided
into a small elite subpopulation and a nonelite subpopulation. To generate the off-
spring, biased random key GA selects one parent from the elite subpopulation and
the other parent from the nonelite subpopulation. Thus, the offspring has a higher probability of inheriting the keys of its elite parent.
Coding Spanning Trees
Many combinatorial problems seek solutions that either are or are derived from
spanning trees. For the minimum spanning tree (MST) problem, polynomial time
algorithms exist for identifying an optimal solution. Other problems, such as the
optimal communications spanning tree problem and the degree-constrained MST
problem have been shown to be NP-hard.
The concept of random keys [4] has been transferred from scheduling and ordering
problems to the encoding of trees. A tree is an undirected, fully connected graph with
no cycles. One of the most common representation schemes for networks is the char-
acteristic vector representation. Simple GAs with network random keys (NetKeys)
significantly outperform their counterparts using characteristic vectors and are much
faster for solving complex tree problems [77]. For NetKeys [77], a chromosome
assigns to each edge on the network a rating of its importance, which is referred to as
a weight, a real number in [0, 1]. A spanning tree is decoded from the chromosome
by adding edges from the network to an initially empty graph in order of importance,
ignoring edges that introduce cycles. Once n − 1 edges have been added, a span-
ning tree has been identified. NetKeys has high computational complexity. Since
the chromosome has length e = |E|, E being the set of edges, the time required for
crossover and mutation is O(e). Decoding is even more complex, since it requires identifying an MST on the problem network.
With a direct tree representation, the identity of all n − 1 edges in the spanning
tree can be identified directly from its chromosome. One example is the predecessor
code [71]. A node is designated as the root node of the tree and, for each node, the immediate predecessor pi on the path from the root to the present node is recorded. A
spanning tree T = (V, E) is encoded as the vector P = { p1 , p2 , . . . , pn−1 }, where
(i, pi ) ∈ E and one node of V is designated as the root node. Although the code does not exclu-
sively encode spanning trees, it does ensure that each node belongs to at least one
edge and that no edge is represented more than twice.
The Dandelion code [92] represents each tree on n vertices as a string of (n − 2)
integers from the set [1, n]. The implementation of the Dandelion mapping has
O(n) complexity. Although the direct tree coding, which exhibits perfect heritability,
achieves the best results in the fewest generations, with NetKeys being a close second,
the Dandelion code is a strong alternative, particularly for very large networks, since
the Dandelion code is computationally the most efficient coding scheme for spanning
trees and locality seems to improve as the problem size increases. The decoding and
encoding algorithms for the Dandelion code may both be implemented in O(n) time
[72], and the locality is high.
Example 3.5: Consider the TSP for 30 randomly generated cities in the United States,
plotted in Figure 3.7.
When using the GA solver, the city sequence is coded as a custom data type, and the corresponding creation, crossover, and mutation functions are provided in the MATLAB Global Optimization Toolbox. We set the population size to 50 and the number of generations to 400. Each initial solution is a random permutation of all the cities.
The evolution of a random run is illustrated in Figure 3.8. The final optimal route
length is 4.096 obtained at the 391st generation, with 19600 fitness evaluations.
Figure 3.8 The GA evolution of the TSP: a the optimal solution, b minimum and average route lengths
Problems
3.12 For a binary GA with population size N, mutation rate pm, and a chromosome length of n bits, what is the probability that no bit is mutated in the entire population in one generation?
3.13 Explain why the crossover operators defined for binary GA are not suitable for
real-coded GA.
3.14 Implement fitness transform for maximizing a function with negative objective
value.
3.15 Implement the ga solver for solving a benchmark function using both the
binary mode and the real-coded mode.
3.16 Implement the ga solver for solving the knapsack problem in the Appendix.
References
1. Arabas J, Michalewicz Z, Mulawka J. GAVaPS—a genetic algorithm with varying population
size. In: Proceedings of the 1st IEEE international conference on evolutionary computation,
Orlando, FL, USA, June 1994. p. 73–78.
2. Araujo L, Merelo JJ. Diversity through multiculturality: assessing migrant choice policies in
an island model. IEEE Trans Evol Comput. 2011;15(4):456–69.
3. Ballester PJ, Carter JN. An effective real-parameter genetic algorithm with parent centric
normal crossover for multimodal optimisation. In: Proceedings of genetic and evolutionary
computation conference (GECCO), Seattle, WA, USA, June 2004. p. 901–913.
4. Bean J. Genetic algorithms and random keys for sequencing and optimization. ORSA J Comput.
1994;6(2):154–60.
5. Beyer H-G, Deb K. On self-adaptive features in real-parameter evolutionary algorithms. IEEE
Trans Evol Comput. 2001;5(3):250–70.
6. Bhandari D, Pal NR, Pal SK. Directed mutation in genetic algorithms. Inf Sci. 1994;79:251–
70.
7. Burke DS, De Jong KA, Grefenstette JJ, Ramsey CL, Wu AS. Putting more genetics into
genetic algorithms. Evol Comput. 1998;6(4):387–410.
8. Cartwright HM, Harris SP. The application of the genetic algorithm to two-dimensional
strings: the source apportionment problem. In: Forrest S, editor, Proceedings of the 5th inter-
national conference on genetic algorithms, Urbana-Champaign, IL, USA, June 1993. San
Mateo, CA: Morgan Kaufmann; 1993. p. 631.
9. Cervantes J, Stephens CR. Limitations of existing mutation rate heuristics and how a rank
GA overcomes them. IEEE Trans Evol Comput. 2009;13(2):369–97.
10. Chakraborty UK, Janikow CZ. An analysis of Gray versus binary encoding in genetic search.
Inf Sci. 2000;156:253–69.
11. Chan TM, Man KF, Kwong S, Tang KS. A jumping gene paradigm for evolutionary multiob-
jective optimization. IEEE Trans Evol Comput. 2008;12(2):143–59.
12. Chen H, Flann NS, Watson DW. Parallel genetic simulated annealing: a massively parallel
SIMD algorithm. IEEE Trans Parallel Distrib Syst. 1998;9(2):126–36.
13. Cherkauer KJ. Genetic search for nearest-neighbor exemplars. In: Proceedings of the 4th
midwest artificial intelligence and cognitive science society conference, Utica, IL, USA,
1992. p. 87–91.
14. Chicano F, Sutton AM, Whitley LD, Alba E. Fitness probability distribution of bit-flip muta-
tion. Evol Comput. 2015;23(2):217–48.
15. Chuang Y-C, Chen C-T, Hwang C. A real-coded genetic algorithm with a direction-based
crossover operator. Inf Sci. 2015;305:320–48.
16. Civicioglu P. Backtracking search optimization algorithm for numerical optimization prob-
lems. Appl Math Comput. 2013;219:8121–44.
17. Davis L. Bit-climbing, representational bias, and test suite design. In: Proceedings of the 4th
international conference on genetic algorithms, San Diego, CA, USA, July 1991. San Mateo,
CA: Morgan Kaufmann; 1991. p. 18–23.
18. Davis L, Grefenstette JJ. Concerning GENESIS and OOGA. In: Davis L, editor. Handbook
of genetic algorithms. New York: Van Nostrand Reinhold; 1991. p. 374–377.
19. Deb K, Anand A, Joshi D. A computationally efficient evolutionary algorithm for real-
parameter optimization. Evol Comput. 2002;10(4):371–95.
20. De Jong K. An analysis of the behavior of a class of genetic adaptive systems. PhD Thesis,
University of Michigan, Ann Arbor, MI, USA, 1975.
21. Drugan MM, Thierens D. Recombination operators and selection strategies for evolutionary
Markov Chain Monte Carlo algorithms. Evol Intel. 2010;3(2):79–101.
22. Ericsson M, Resende MGC, Pardalos PM. A genetic algorithm for the weight setting problem
in OSPF routing. J Comb Optim. 2002;6:299–333.
23. Eshelman LJ. The CHC adaptive search algorithm: How to have safe search when engaging
in nontraditional genetic recombination. In: Rawlins GJE, editor. Foundations of genetic
algorithms. San Mateo, CA: Morgan Kaufmann; 1991. p. 265–283.
24. Eshelman LJ, Schaffer JD. Real-coded genetic algorithms and interval-schemata. In: Whitley
LD, editor, Foundations of genetic algorithms 2. San Mateo, CA: Morgan Kaufmann; 1993.
p. 187–202.
25. Fogel L, Owens J, Walsh M. Artificial intelligence through simulated evolution. New York:
Wiley; 1966.
26. Fox BR, McMahon MB. Genetic operators for sequencing problems. In: Rawlins GJE, editor.
Foundations of genetic algorithms. San Mateo, CA: Morgan Kaufmann; 1991. p. 284–300.
27. Frantz DR. Non-linearities in Genetic Adaptive Search. PhD Thesis, University of Michigan,
Ann Arbor, MI, USA, 1972.
28. Friedrich T, Hebbinghaus N, Neumann F. Rigorous analyses of simple diversity mechanisms.
In: Proceedings of genetic and evolutionary computation conference (GECCO), London, UK,
July 2007. p. 1219–1225.
29. Galan SF, Mengshoel OJ, Pinter R. A novel mating approach for genetic algorithms. Evol
Comput. 2012;21(2):197–229.
30. Garcia-Martinez C, Lozano M, Herrera F, Molina D, Sanchez AM. Global and local real-
coded genetic algorithms based on parent-centric crossover operators. Eur J Oper Res.
2008;185:1088–113.
31. Goldberg DE. Genetic algorithms in search, optimization, and machine learning. Reading,
MA, USA: Addison-Wesley; 1989.
32. Goldberg D. A note on Boltzmann tournament selection for genetic algorithms and population-
oriented simulated annealing. Complex Syst. 1990;4(4):445–60.
33. Goldberg DE, Deb K. A comparative analysis of selection schemes used in genetic algo-
rithms. In: Rawlins GJE, editor. Foundations of genetic algorithms. San Mateo, CA: Morgan
Kaufmann; 1991. p. 69–93.
34. Goldberg DE, Deb K, Korb B. Messy genetic algorithms: motivation, analysis, and first results.
Complex Syst. 1989;3:493–530.
35. Goldberg DE, Deb K, Kargupta H, Harik G. Rapid, accurate optimization of difficult problems
using fast messy genetic algorithms. In: Proceedings of the 5th international conference on
genetic algorithms, Urbana-Champaign, IL, USA, June 1993. p. 56–64.
36. Goldman BW, Punch WF. Fast and efficient black box optimization using the parameter-less
population pyramid. Evol Comput. 2015;23(2):451–79.
37. Grefenstette JJ, Gopal R, Rosmaita BJ, Gucht DV. Genetic algorithms for the traveling sales-
man problem. In: Proceedings of the 1st international conference on genetic algorithms and
their applications, Pittsburgh, PA, USA, July 1985. Mahwah, NJ: Lawrence Erlbaum Asso-
ciates; 1985. p. 160–168.
38. Harvey I. The SAGA cross: the mechanics of crossover for variable-length genetic algorithms.
In: Proceedings of the 2nd conference on parallel problem solving from nature (PPSN II),
Brussels, Belgium, Sept 1992. Amsterdam, The Netherlands: North Holland; 1992. p. 269–
278.
39. Harvey I. The microbial genetic algorithm. In: Proceedings of 10th european conference on
advances in artificial life: Darwin meets von Neumann, Budapest, Hungary, Sept 2009, Part
II, p. 126–133.
40. Herrera F, Lozano M. Adaptation of genetic algorithm parameters based on fuzzy logic con-
trollers. In: Herrera F, Verdegay JL, editors. Genetic algorithms and soft computing. Berlin:
Physica-Verlag; 1996. p. 95–125.
41. Herrera F, Lozano M. Fuzzy adaptive genetic algorithms: design, taxonomy, and future direc-
tions. Soft Comput. 2003;7:545–62.
42. Herrera F, Lozano M, Verdegay JL. Fuzzy connectives based crossover operators to model
genetic algorithms population diversity. Fuzzy Sets Syst. 1997;92(1):21–30.
43. Herrera F, Lozano M, Sanchez AM. A taxonomy for the crossover operator for real-coded
genetic algorithms: an experimental study. Int J Intell Syst. 2003;18(3):309–38.
44. Hesser J, Manner R. Towards an optimal mutation probability for genetic algorithms. In:
Proceedings of the 1st workshop on parallel problem solving from nature (PPSN I), Dortmund,
Germany, Oct 1990. p. 23–32.
45. Hillis WD. Co-evolving parasites improve simulated evolution as an optimization procedure.
Physica D. 1990;42:228–34.
46. Holland JH. Outline for a logical theory of adaptive systems. J ACM. 1962;9(3):297–314.
47. Holland J. Adaptation in natural and artificial systems. Ann Arbor, Michigan: University of
Michigan Press; 1975.
48. Holland JH. Building blocks, cohort genetic algorithms and hyperplane-defined functions.
Evol Comput. 2000;8(4):373–91.
49. Hutter M, Legg S. Fitness uniform optimization. IEEE Trans Evol Comput. 2006;10(5):568–
89.
50. Hutt B, Warwick K. Synapsing variable-length crossover: meaningful crossover for variable-
length genomes. IEEE Trans Evol Comput. 2007;11(1):118–31.
51. Jansen T, De Jong KA, Wegener I. On the choice of the offspring population size in evolu-
tionary algorithms. Evol Comput. 2005;13(4):413–40.
52. Khatib W, Fleming PJ. The stud GA: a mini revolution? In: Eiben A, Back T, Schoenauer
M, Schwefel H, editors. Proceedings of the 5th international conference on parallel problem
solving from nature (PPSN V). Amsterdam: The Netherlands; 1998. p. 683–691.
53. Knjazew D, Goldberg DE. OMEGA—Ordering messy GA: Solving permutation problems
with the fast messy genetic algorithm and random keys. In: Proceedings of genetic and evo-
lutionary computation conference (GECCO), Las Vegas, NV, USA, July 2000. p. 181–188.
54. Koumousis VK, Katsaras CP. A saw-tooth genetic algorithm combining the effects of vari-
able population size and reinitialization to enhance performance. IEEE Trans Evol Comput.
2006;10(1):19–28.
55. Koza JR. Genetic programming: On the programming of computers by means of natural
selection. Cambridge, MA: MIT Press; 1992.
56. Laskey KB, Myers JW. Population Markov chain Monte Carlo. Mach Learn. 2003;50:175–96.
57. Lee MA, Takagi H. Dynamic control of genetic algorithms using fuzzy logic techniques. In:
Proceedings of the 5th international conference on genetic algorithms (ICGA’93), Urbana,
IL, USA, July 1993. p. 76–83.
58. Lee CY. Entropy-Boltzmann selection in the genetic algorithms. IEEE Trans Syst Man Cybern
Part B. 2003;33(1):138–42.
59. Leung FHF, Lam HK, Ling SH, Tam PKS. Tuning of the structure and parameters of a neural
network using an improved genetic algorithm. IEEE Trans Neural Networks. 2003;14(1):79–
88.
60. Lobo FG, Lima CF. A review of adaptive population sizing schemes in genetic algorithms.
In: Proceedings of genetic and evolutionary computation conference (GECCO), Washington,
DC, USA, June 2005. p. 228–234.
61. Mathias K, Whitley LD. Changing representations during search: a comparative study of delta
coding. Evol Comput. 1995;2(3):249–78.
62. Matsui K. New selection method to improve the population diversity in genetic algorithms.
In: Proceedings of the 1999 IEEE International conference on systems, man, and cybernetics,
Tokyo, Japan, Oct 1999. p. 625–630.
63. Mauldin ML. Maintaining diversity in genetic search. In: Proceedings of the 4th national
conference on artificial intelligence (AAAI-84), Austin, TX, USA, Aug 1984. p. 247–250.
64. Mills KL, Filliben JJ, Haines AL. Determining relative importance and effective settings for
genetic algorithm control parameters. Evol Comput. 2015;23(2):309–42.
65. Muhlenbein H. Parallel genetic algorithms, population genetics and combinatorial optimiza-
tion. In: Proceedings of the 3rd international conference on genetic algorithms, Fairfax, VA,
USA, June 1989. San Mateo, CA: Morgan Kaufman; 1989. p. 416–421.
66. Muhlenbein H. How genetic algorithms really work: mutation and hill climbing. In: Manner
R, Manderick B, editors. Proceedings of the 2nd conference on parallel problem solving
from nature (PPSN II), Brussels, Belgium, Sept 1992. Amsterdam, The Netherlands: North
Holland; 1992. pp. 15–25.
67. Muhlenbein H, Paab G. From recombination of genes to the estimation of distributions. I.
Binary parameters. In: Proceedings of the 4th International conference on parallel problem
solving from nature (PPSN IV), Berlin, Germany, Sept 1996. p. 178–187.
68. Muhlenbein H, Schlierkamp-Voosen D. Predictive models for the breeder genetic algorithm:
continuous parameter optimization. Evol Comput. 1994;1(4):25–49.
69. Muhlenbein H, Schlierkamp-Voosen D. Analysis of selection, mutation and recombination in
genetic algorithms. In: Banzhaf W, Eeckman FH, editors. Evolution and biocomputation:
computational models of evolution. Berlin: Springer; 1995. p. 142–68.
70. Nawa NE, Furuhashi T. Fuzzy systems parameters discovery by bacterial evolutionary algo-
rithms. IEEE Trans Fuzzy Syst. 1999;7:608–16.
71. Palmer CC, Kershenbaum A. An approach to a problem in network design using genetic
algorithms. Networks. 1995;26:151–63.
72. Paulden T, Smith DK. From the Dandelion code to the Rainbow code: a class of bijective
spanning tree representations with linear complexity and bounded locality. IEEE Trans Evol
Comput. 2006;10(2):108–23.
73. Perales-Gravan C, Lahoz-Beltra R. An AM radio receiver designed with a genetic algorithm
based on a bacterial conjugation genetic operator. IEEE Trans Evol Comput. 2008;12(2):129–
42.
74. Potter MA, De Jong KA. Cooperative coevolution: an architecture for evolving coadapted
subcomponenets. Evol Comput. 2000;8(1):1–29.
75. Rechenberg I. Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der
biologischen Evolution. Freiburg, Germany: Formman Verlag; 1973.
76. Ronald E. When selection meets seduction. In: Proceedings of the 6th international conference
on genetic algorithms, Pittsburgh, PA, USA, July 1995. p. 167–173.
77. Rothlauf F, Goldberg DE, Heinzl A. Network random keys—a tree network representation
scheme for genetic and evolutionary algorithms. Evol Comput. 2002;10(1):75–97.
78. Rudolph G. Convergence analysis of canonical genetic algorithms. IEEE Trans Neural Net-
works. 1994;5(1):96–101.
79. Satoh H, Yamamura M, Kobayashi S. Minimal generation gap model for GAs considering
both exploration and exploitation. In: Proceedings of the 4th International conference on
soft computing (Iizuka’96): Methodologies for the conception, design, and application of
intelligent systems, Iizuka, Fukuoka, Japan, Sept 1996. p. 494–497.
80. Schaffer JD, Caruana RA, Eshelman LJ, Das R. A study of control parameters affecting
online performance of genetic algorithms for function optimisation. In: Proceedings of the
3rd international conference on genetic algorithms, Fairfax, VA, USA, June 1989. San Mateo,
CA: Morgan Kaufmann; 1989. p. 70–79.
81. Schraudolph NN, Belew RK. Dynamic parameter encoding for genetic algorithms. Mach
Learn. 1992;9(1):9–21.
82. Schwefel HP. Numerical optimization of computer models. Chichester: Wiley; 1981.
83. Sharma SK, Irwin GW. Fuzzy coding of genetic algorithms. IEEE Trans Evol Comput.
2003;7(4):344–55.
84. Simoes AB, Costa E. Enhancing transposition performance. In: Proceedings of congress on
evolutionary computation (CEC), Washington, DC, USA, July 1999. p. 1434–1441.
85. Smith J, Vavak F. Replacement strategies in steady state genetic algorithms: static environ-
ments. In: Banzhaf W, Reeves C, editors. Foundations of genetic algorithms 5. CA: Morgan
Kaufmann; 1999. p. 219–233.
86. Sokolov A, Whitley D. Unbiased tournament selection. In: Proceedings of the conference
on genetic and evolutionary computation (GECCO), Washington, DC, USA, June 2005. p.
1131–1138.
87. Srinivas M, Patnaik LM. Adaptive probabilities of crossover and mutation in genetic algo-
rithms. IEEE Trans Syst Man Cybern. 1994;24(4):656–67.
88. Storn R, Price K. Differential evolution–a simple and efficient adaptive scheme for global
optimization over continuous spaces. Technical Report TR-95-012, International Computer
Science Institute, Berkeley, CA, March 1995.
89. Streifel RJ, Marks RJ II, Reed R, Choi JJ, Healy M. Dynamic fuzzy control of genetic algorithm
parameter coding. IEEE Trans Syst Man Cybern Part B. 1999;29(3):426–33.
90. Syswerda G. Uniform crossover in genetic algorithms. In: Proceedings of the 3rd international
conference on genetic algorithms, Fairfax, VA, USA, June 1989. San Francisco: Morgan
Kaufmann; 1989. p. 2–9.
91. Syswerda G. Simulated crossover in genetic algorithms. In: Whitley LD, editor. Foundations
of genetic algorithms 2, San Mateo, CA: Morgan Kaufmann; 1993. p. 239–255.
92. Thompson E, Paulden T, Smith DK. The Dandelion code: a new coding of spanning trees for
genetic algorithms. IEEE Trans Evol Comput. 2007;11(1):91–100.
93. Tsutsui S, Yamamura M, Higuchi T. Multi-parent recombination with simplex crossover in
real coded genetic algorithms. In: Proceedings of the genetic and evolutionary computation
conference (GECCO), Orlando, FL, USA, July 1999. San Mateo, CA: Morgan Kaufmann;
1999. p. 657–664.
94. Ursem RK. Diversity-guided evolutionary algorithms. In: Proceedings of the 7th conference
on parallel problem solving from nature (PPSN VII), Granada, Spain, Sept 2002. p. 462–471.
95. Voigt HM, Muhlenbein H, Cvetkovic D. Fuzzy recombination for the breeder genetic algo-
rithm. In: Eshelman L, editor. Proceedings of the 6th international conference on genetic
algorithms, Pittsburgh, PA, USA, July 1995. San Mateo, CA: Morgan Kaufmann; 1995. p.
104–111.
96. Whitley D. The GENITOR algorithm and selective pressure. In: Proceedings of the 3rd inter-
national conference on genetic algorithms, Fairfax, VA, USA, June 1989. San Mateo, CA:
Morgan Kaufmann; 1989. p. 116–121.
97. Whitley D, Starkweather T, Fuquay D. Scheduling problems and traveling salesmen: the
genetic edge recombination operator. In: Proceedings of the 3rd international conference on
genetic algorithms, Fairfax, VA, USA, June 1989. San Mateo, CA: Morgan Kaufmann; 1989.
p. 133–140.
98. Wright AH. Genetic algorithms for real parameter optimization. In: Rawlins G, editor. Foun-
dations of genetic algorithms. San Mateo, CA: Morgan Kaufmann; 1991. p. 205–218.
99. Yao X, Liu Y, Liang KH, Lin G. Fast evolutionary algorithms. In: Ghosh S, Tsutsui S, editors.
Advances in evolutionary computing: theory and applications. Berlin, Springer; 2003. p. 45–9.
100. Yip PPC, Pao YH. Combinatorial optimization with use of guided evolutionary simulated
annealing. IEEE Trans Neural Networks. 1995;6(2):290–5.
101. Yukiko Y, Nobue A. A diploid genetic algorithm for preserving population diversity—pseudo-
meiosis GA. In: Parallel problem solving from nature (PPSN III), Vol. 866 of the series Lecture
Notes in Computer Science. Berlin: Springer; 1994. p. 36–45.
Genetic Programming
4
4.1 Introduction
Symbolic regression via GP has advantages over neural networks and SVMs in
terms of representation complexity, interpretability, and generalizing behavior. An
approach to generating data-driven regression models is proposed in [37]. These models are obtained as solutions of a two-objective GP process that seeks both low model error and low expressional complexity. At every generation, Pareto optimization of the goodness of fit and the expressional complexity is alternated with Pareto optimization of the goodness of fit and the order of nonlinearity.
Grammatical evolution [23] is a grammar-based form of GP. Rather than representing programs as parse trees, it uses a linear genome representation in the form of a variable-length binary string. Grammatical evolution uses algorithmic maps to derive a phenotype from a genome, and uses a GA to search the space of structures specified by some context-free or attribute grammar. Christiansen grammar evolution [24] extends grammatical evolution by replacing context-free grammars with Christiansen grammars to improve performance. Grammatical evolution takes into account only syntactic restrictions when generating valid individuals, while Christiansen grammar evolution adds semantics to ensure that the generated individuals are both syntactically and semantically valid.
The inclusion of automatically defined functions (ADFs) in GP is widely adopted
by the GP research community. ADFs are reusable subroutines that are simultane-
ously evolved with the GP program, and are capable of exploiting any modularity
present in a problem to improve the performance of GP. However, the output of each
ADF is determined by evolution.
Gene expression programming (http://www.gepsoft.com/) [9] is a genotype/
phenotype GA for the creation of computer programs. In gene expression program-
ming, the genome is a symbol string of constant length, which may contain one or
more genes linked through a linking function. Thus, the algorithm distinguishes the
expression of genes (phenotype) from their representation (genotype). Gene expression programming is reported to considerably outperform GP [9].
Cartesian GP uses directed graphs to represent programs, rather than trees. This
allows implicit reuse of nodes, as a node can be connected to the output of any
previous node in the graph. This is an advantage over tree-based GP representations
(without ADFs), where identical subtrees have to be constructed independently. Even
though Cartesian GP does not have ADFs, it performs better than GP with ADFs on a
number of problems. Embedded Cartesian GP [38] implements a form of ADF based
on the evolutionary module acquisition approach, which is capable of automatically
acquiring and evolving modules.
Figure 4.1 Syntax tree for f(x) = x^2 + sin(x/3).
(Illustrations of crossover, in which two parent trees exchange subtrees to produce two children, and of mutation, in which a parent tree produces a child.)
Example 4.1: GP has been used for generating nonlinear input–output models that
are linear in parameters. The models are represented in a tree structure [20]. For
linear-in-parameters models, the model complexity can be controlled by the orthogonal least squares (OLS) method: the model terms are sorted by their error reduction ratios, and the subtree with the least error reduction ratio is eliminated from the tree.
The MATLAB GP-OLS Toolbox provides an efficient and fast method for data-based identification of nonlinear models. Instead of the mean square error (MSE), the fitness function is defined as the correlation coefficient between the measured and the calculated output values, multiplied by a penalty factor controlling the model complexity. OLS introduces the error reduction ratio, which measures the decrease in the output variance contributed by a given term. The user only needs to specify the input–output data, the set of variables (or the maximum model order) at the terminal nodes, the set of mathematical operators at the internal nodes, and a few GP parameters.
We consider the nonlinear input–output model with linear parameters:
y(k) = 0.5u(k − 1)^2 + 0.6y(k − 1) − 0.6y(k − 2) − 0.2,
where u(k) and y(k) are the input and the output variables of the model at the kth
sample time.
This model is first used to generate the input and output data, as plotted in
Figure 4.4. Notice that the input and the output are polluted by 6 % and 3 % Gaussian
noise, respectively.
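A minimal Python sketch of this data-generation step is given below. The excitation signal u(k) and the interpretation of the 6 % and 3 % noise levels as fractions of the respective signal standard deviations are assumptions made only for illustration, since the example does not specify them.

import numpy as np

rng = np.random.default_rng(0)

def generate_data(n_samples=1000, input_noise=0.06, output_noise=0.03):
    # Simulate y(k) = 0.5 u(k-1)^2 + 0.6 y(k-1) - 0.6 y(k-2) - 0.2 driven by a
    # random excitation u (assumed uniform in [-1, 1]), then corrupt the records
    # with Gaussian noise of 6 % and 3 % of the signal standard deviations.
    u = rng.uniform(-1.0, 1.0, n_samples)
    y = np.zeros(n_samples)
    for k in range(2, n_samples):
        y[k] = 0.5 * u[k - 1] ** 2 + 0.6 * y[k - 1] - 0.6 * y[k - 2] - 0.2
    u_noisy = u + input_noise * np.std(u) * rng.standard_normal(n_samples)
    y_noisy = y + output_noise * np.std(y) * rng.standard_normal(n_samples)
    return u_noisy, y_noisy

u, y = generate_data()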
During the evolution, the function set F contained the basic arithmetic operations
F = {+, −, ∗}, and the terminal set T contained the arguments T = {u(k − 1),
u(k − 2), y(k − 1), y(k − 2)}. The GP parameters are set as follows: population size NP = 50, maximum tree depth 5, maximum number of generations 200, tournament selection of size 2, one-point crossover with pc = 0.8, point mutation with pm = 0.4, elitist replacement, and a generation gap of 0.9.
For ten random runs, the algorithm found the perfect model structure five times. For one random run, we obtained a best fitness of 0.7596, a best MSE of 0.7632, and the evolved model
y(k) = 0.5074u(k − 1)^2 + 0.4533y(k − 1) − 0.4586y(k − 2) − 0.2041.
That is, the GP-OLS method can correctly identify the model structure of nonlinear systems. The evolution of the fitness and the MSE is shown in Figure 4.5.
Figure 4.4 The input and output data for model identification.
Figure 4.5 The evolution of the fitness and the MSE.

4.3 Causes of Bloat
In GP, code bloat is almost inevitable [12,26]. Programs that are much larger than they need to be may overfit the training data, reducing the performance on unseen data.
Classical theories for explaining bloat are mainly based on the concept of introns,
areas of code that can be removed without altering the fitness value of the solution.
Introns in biology are noncoding regions of the DNA, that is, those that eventually do
not end up as part of a protein. Explicitly defined introns [21] control the probability
of particular nodes being chosen as the crossover point in an attempt to prevent
destructive crossover. Increasing the number of nodes of the tree makes it more
difficult to destroy with crossover.
The hitchhiking theory [35] demonstrated that random selection in conjunction with standard subtree crossover does not cause code growth, and therefore concluded that fitness is the cause of the size increase.
The removal bias theory [33] states that, assuming that redundant data are closer to
the leaves than to the root and applying crossover to redundant data does not modify
the fitness of a solution, evolution will favor the replacement of small branches.
Since there is no bias for insertion, small branches will be replaced by average-size branches, leading to bigger trees. In [15], experimental evidence argues against the claim that crossover between introns causes the bloat problem; instead, a generalization of the removal bias theory is used to explain the code growth.
In [26,29], a size evolution equation is developed, which provides an exact for-
malization of the dynamics of average program size. Also, the crossover bias theory
[5,29] states that while the mean size of programs is unaffected by crossover, higher
moments of the distribution are. The population evolves toward a distribution where
small programs have a higher frequency than longer ones.
Several non-intron theories of bloat have been proposed. The program search
space theory [14] relies on the idea that above a certain size, the distribution of
fitness does not vary with size. Since in the search space there are more big tree
structures than small ones, during the search process GP will tend to find bigger
trees. In [27], it is argued that GP will tend to produce larger trees simply because
there are more large programs than small ones within the search space. The theory of modification point depth [16] argues that if deeper points are selected for crossover, then crossover is less likely to significantly modify the fitness. Therefore, there is a bias for larger trees, which have deeper modification points.
One straightforward bloat control approach constrains the evolving population with a maximum allowed depth, or size, of the trees. A limit can be placed on either the number of nodes or the depth of the tree [12]. Children whose size exceeds the limit are rejected, and copies of their parents are placed in the population in their stead. In [25], newly created programs do not
enter the population until after a number of generations proportional to their size,
the idea being to give smaller programs a chance to spread through the population
before being overwhelmed by their larger brethren.
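As a concrete illustration of such a rejection scheme, the following sketch enforces a depth limit on offspring; the nested-tuple tree representation and the particular limit value are assumptions made only for this example.

import copy

MAX_DEPTH = 17  # assumed limit; any problem-dependent value can be used

def tree_depth(node):
    # A program tree is represented as a nested tuple (function, child, ...);
    # a bare terminal counts as depth 1.
    if not isinstance(node, tuple):
        return 1
    return 1 + max(tree_depth(child) for child in node[1:])

def accept_offspring(child, parent, max_depth=MAX_DEPTH):
    # Reject an oversized child and keep a copy of its parent instead.
    return copy.deepcopy(parent) if tree_depth(child) > max_depth else child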
Augmenting any bloat control method with a size limit never hurts [19]. However, the population will quickly converge to the size limit, leading to premature convergence, and it is very difficult to set a good limit without prior knowledge. The dynamic
limits approach [31] refines the hard-limiting approach based on fitness. Bloat con-
trol methods based on operator equalization [6,32] eliminate bloat by biasing the
search toward a predefined size distribution.
Several bloat control schemes act on the genetic operators, such as the crossover operator [2,14], or on the selection strategy, eliminating larger trees [19,25].
An editing operator [12] periodically simplifies the trees, eliminating the subtrees that do not contribute anything to the final solution. In [7], a mutation operator that performs algebraic simplification of the tree expression is introduced.
4.5 Gene Expression Programming

In nature, the phenotype has multiple levels of complexity: tRNAs, proteins, ribosomes, cells, and the organism itself, all of which are products of expression and
are ultimately encoded in the genome. The expression of the genetic information
starts with transcription (the synthesis of RNA) and, for protein genes, proceeds
with translation (the synthesis of proteins).
Gene expression programming (GEP) [9] incorporates both the linear fixed-length
chromosomes of GA type and the expression trees of different sizes and shapes
similar to the parse trees of GP. The chromosomes have fixed length and are composed
of one or more equal-size genes structurally organized in a head and a tail. Since the
expression trees are totally encoded in the linear chromosomes of fixed length, the
genotype and phenotype are finally separated from each other. Thus, the phenotype
consists of the same kind of ramified structure used in GP.
In GEP, from the simplest individual to the most complex, the expression of
genetic information starts with translation, the transfer of information from a gene
into an expression tree. There is no need for transcription: the message in the gene is
directly translated into an expression tree. The expression trees are the expression of
a totally autonomous genome. Only the genome is passed on to the next generation,
and the modified simple linear structure will grow into an expression tree.
The chromosomes function as a genome and are subjected to modification by
means of mutation, transposition, root transposition, gene transposition, gene recom-
bination, and one- and two-point recombination. The chromosomes encode expres-
sion trees which are the object of selection.
Karva language is used to read and express the information encoded in the chro-
mosomes. A K-expression, given in terms of an open reading frame (ORF), is in fact the phenotype of a chromosome; it corresponds to the straightforward reading of the expression tree from left to right and from top to bottom, so the genotype is easily inferred from the phenotype. The length of an ORF is variable and may be equal to or less than the length of a gene. The remaining noncoding regions in the genes allow modification of the genome by any genetic operator without restrictions, always producing syntactically correct programs.
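A minimal sketch of this breadth-first translation from a gene (K-expression) to an expression tree is given below; the function set, the arities, and the example gene are made up purely for illustration.

ARITY = {'+': 2, '-': 2, '*': 2, '/': 2}   # assumed function set; terminals have arity 0

def karva_to_tree(gene):
    # Read the K-expression from left to right and fill the tree level by level
    # (top to bottom); symbols beyond the ORF remain unread (noncoding region).
    pos = 0
    def take():
        nonlocal pos
        s = gene[pos]
        pos += 1
        return s
    root = [take()]
    level = [root]
    while level:
        next_level = []
        for node in level:
            for _ in range(ARITY.get(node[0], 0)):
                child = [take()]
                node.append(child)
                next_level.append(child)
        level = next_level
    def freeze(node):
        return node[0] if len(node) == 1 else (node[0],) + tuple(freeze(c) for c in node[1:])
    return freeze(root)

# The gene '+*+abcab' decodes to ('+', ('*', 'a', 'b'), ('+', 'c', 'a')).
print(karva_to_tree('+*+abcab'))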
However, experiments show that GEP does not have a better performance than
other GP techniques [22].
Self-learning GEP [39] features a chromosome representation in which each chro-
mosome is embedded with subfunctions that can be deployed to construct the final
solution. The subfunctions are self-learned or self-evolved during the evolutionary
search. Self-learning GEP is simple and generic, and it has far fewer control parameters than GEP.
Problems
References
1. Alfaro-Cid E, Merelo JJ, Fernandez de Vega F, Esparcia-Alcazar AI, Sharman K. Bloat con-
trol operators and diversity in genetic programming: a comparative study. Evol Comput.
2010;18(2):305–32.
2. Blickle T, Thiele L. Genetic programming and redundancy. In: Hopf J, editor. Proceedings
of KI-94 workshop on genetic algorithms within the framework of evolutionary computation.
Saarbrucken, Germany, September 1994. p. 33–38.
3. Crawford-Marks R, Spector L. Size control via size fair genetic operators in the PushGP genetic
programming system. In: Proceedings of the genetic and evolutionary computation conference
(GECCO), New York, USA, July 2002. pp. 733–739.
4. Daida JM, Li H, Tang R, Hilss AM. What makes a problem GP-hard? validating a hypothesis
of structural causes. In: Cantu-Paz E, et al., editors. Proceedings of genetic and evolutionary
computation conference (GECCO), Chicago, IL, USA; July 2003. p. 1665–77.
5. Dignum S, Poli R. Generalisation of the limiting distribution of program sizes in tree-based
genetic programming and analysis of its effects on bloat. In: Proceedings of the 9th annual
conference on genetic and evolutionary computation (GECCO), London, UK, July 2007. p.
1588–1595.
6. Dignum S, Poli R. Operator equalisation and bloat free GP. In: Proceedings of the 11th European
conference on genetic programming (EuroGP), Naples, Italy, March 2008. p. 110–121.
7. Ekart A. Shorter fitness preserving genetic programs. In: Proceedings of the 4th European con-
ference on artificial evolution (AE’99), Dunkerque, France, November 1999. Berlin: Springer;
2000. p. 73–83.
8. Ekart A, Nemeth SZ. Selection based on the Pareto nondomination criterion for controlling
code growth in genetic programming. Genet Program Evol Mach. 2001;2(1):61–73.
9. Ferreira C. Gene expression programming: a new adaptive algorithm for solving problems.
Complex Syst. 2001;13(2):87–129.
10. Hoai NX, McKay RIB, Essam D. Representation and structural difficulty in genetic program-
ming. IEEE Trans Evol Comput. 2006;10(2):157–66.
11. Kinzett D, Johnston M, Zhang M. Numerical simplification for bloat control and analysis of
building blocks in genetic programming. Evol Intell. 2009;2:151–68.
12. Koza JR. Genetic programming: on the programming of computers by means of natural selec-
tion. Cambridge: MIT Press; 1992.
13. Langdon WB. Size fair and homologous tree genetic programming crossovers. Genet Program
Evol Mach. 2000;1:95–119.
14. Langdon WB, Poli R. Fitness causes bloat. In: Proceedings of the world conference on soft
computing in engineering design and manufacturing, London, UK, June 1997. p. 13–22.
15. Luke S. Code growth is not caused by introns. In: Proceedings of the genetic and evolutionary
computation conference (GECCO’00), Las Vegas, NV, USA, July 2000. p. 228–235.
16. Luke S. Modification point depth and genome growth in genetic programming. Evol Comput.
2003;11(1):67–106.
17. Luke S, Panait L. Lexicographic parsimony pressure. In: Proceedings of the genetic and evo-
lutionary computation conference (GECCO), New York, USA, July 2002. p. 829–836.
18. Luke S, Panait L. Fighting bloat with nonparametric parsimony pressure. In: Proceedings of
the 7th international conference on parallel problem solving from nature (PPSN VII), Granada,
Spain, September 2002. p. 411–421.
19. Luke S, Panait L. A comparison of bloat control methods for genetic programming. Evol
Comput. 2006;14(3):309–44.
20. Madar J, Abonyi J, Szeifert F. Genetic programming for the identification of nonlinear input-
output models. Ind Eng Chem Res. 2005;44(9):3178–86.
21. Nordin P, Francone F, Banzhaf W. Explicitly defined introns and destructive crossover in genetic
programming. In: Rosca JP, editor. Proceedings of the workshop on genetic programming: from
theory to real-world applications, Tahoe City, July 1995. p. 6–22.
22. Oltean M, Grosan C. A comparison of several linear genetic programming techniques. Complex
Syst. 2003;14(4):285–314.
23. O’Neill M, Ryan C. Grammatical evolution. IEEE Trans Evol Comput. 2001;5(4):349–58.
24. Ortega A, de la Cruz M, Alfonseca M. Christiansen grammar evolution: grammatical evolution
with semantics. IEEE Trans Evol Comput. 2007;11(1):77–90.
25. Panait L, Luke S. Alternative bloat control methods. In: Proceedings of genetic and evolutionary
computation conference (GECCO), Seattle, WA, USA, June 2004. p. 630–641.
26. Poli R. General schema theory for genetic programming with subtree-swapping crossover. In:
Proceedings of the 4th European conference on genetic programming (EuroGP), Lake Como,
Italy, April 2001. p. 143–159.
27. Poli R. A simple but theoretically-motivated method to control bloat in genetic programming.
In: Proceedings of the 6th European conference on genetic programming (EuroGP), Essex,
UK, April 2003. p. 204–217.
28. Poli R, Langdon WB. Genetic programming with one-point crossover. In: Chawdhry PK, Roy
R, Pant RK, editors. Soft computing in engineering design and manufacturing, Part 4. Berlin:
Springer; 1997. p. 180–189.
29. Poli R, McPhee NF. General schema theory for genetic programming with subtree-swapping
crossover: Part II. Evol Comput. 2003;11(2):169–206.
30. Poli R, McPhee NF. Parsimony pressure made easy. In: Proceedings of the 10th annual confer-
ence on genetic and evolutionary computation (GECCO’08), Atlanta, GA, USA, July 2008. p.
1267–1274.
31. Silva S, Costa E. Dynamic limits for bloat control in genetic programming and a review of past
and current bloat theories. Genet Program Evol Mach. 2009;10(2):141–79.
32. Silva S, Dignum S. Extending operator equalisation: fitness based self adaptive length distribu-
tion for bloat free GP. In: Proceedings of the 12th European conference on genetic programming
(EuroGP), Tubingen, Germany, April 2009. p. 159–170.
33. Soule T, Foster JA. Removal bias: a new cause of code growth in tree based evolutionary pro-
gramming. In: Proceedings of the IEEE international conference on evolutionary computation,
Anchorage, AK, USA, May 1998. p. 781–786.
34. Syswerda G. A study of reproduction in generational and steady state genetic algorithms.
In: Rawlings GJE, editor. Foundations of genetic algorithms. San Mateo: Morgan Kaufmann;
1991. p. 94–101.
35. Tackett WA. Recombination, selection and the genetic construction of genetic programs. PhD
thesis, University of Southern California, Los Angeles, CA, USA, 1994.
36. Trujillo L. Genetic programming with one-point crossover and subtree mutation for effective
problem solving and bloat control. Soft Comput. 2011;15:1551–67.
37. Vladislavleva EJ, Smits GF, den Hertog D. Order of nonlinearity as a complexity measure for
models generated by symbolic regression via Pareto genetic programming. IEEE Trans Evol
Comput. 2009;13(2):333–49.
38. Walker JA, Miller JF. The automatic acquisition, evolution and reuse of modules in Cartesian
genetic programming. IEEE Trans Evol Comput. 2008;12(4):397–417.
39. Zhong J, Ong Y, Cai W. Self-learning gene expression programming. IEEE Trans Evol Comput.
2016;20(1):65–80.
Evolutionary Strategies
5
The evolutionary strategy (ES) paradigm is one of the most successful EA paradigms. Evolutionary gradient search and gradient evolution are two methods that use an EA to construct gradient information for directing the search efficiently. Covariance matrix adaptation (CMA) ES [11] improves the search efficiency by assuming that the local solution landscape around the current point has a quadratic shape.
5.1 Introduction
ES [20,22] is another popular EA. It was originally developed for numerical optimization problems [22] and was later extended to discrete optimization problems [13]. The objective parameters x and the strategy parameters σ are directly encoded into the chromosome using a regular numerical representation, and thus no encoding or decoding is necessary.
Evolutionary programming [9] was presented for evolving artificial intelligence
for predicting changes in an environment, which was coded as a sequence of symbols
from a finite alphabet. Each chromosome is encoded as a finite state machine. The
approach was later generalized for solving numerical optimization problems based
on Gaussian mutation [8]. Evolutionary programming is very similar to ES with the
(λ + λ) strategy, but it does not use crossover, and it uses probabilistic competition
for selection.
Unlike GA, the primary search operator in ES is mutation. There are some major
differences between ES and GA.
• Selection procedure. In ES, selection is deterministic, keeping only the best individuals, while the selection procedure in GA is random and the chances of selection and mating are proportional to an individual's fitness.
• Relative order of selection and genetic operations. In ES, the selection procedure
is implemented after crossover and mutation, while in GA, it is carried out before
crossover and mutation are applied.
• Adaptation of control parameters. In ES, the strategy parameters σ are evolved
automatically by encoding them into chromosomes. In contrast, the control para-
meters in GA are problem-specific and need to be prespecified.
• Function of mutation. In GA, mutation is used to regain the lost genetic diversity,
while in ES, mutation functions as a hill-climbing search operator with adaptive
step size σ. Due to the nature of the normal distribution, the tail of the Gaussian mutation offers a chance of escaping from a local optimum.
Other differences are embodied in the encoding methods and genetic operators.
However, the line between the different evolutionary computation methods is now being blurred, since each method has been improved by borrowing ideas from the other. For example, CHC [7] has properties of both GA and ES.
For continuous function optimization, it is generally known that evolutionary programming or ES works better than GA [2]. CMA-ES is among the best-performing direct search strategies for real-valued black-box optimization of unconstrained problems, based on the results of the 2009 and 2010 GECCO black-box optimization benchmarking workshops.
When the gradient is zero, the curve at that point is flat; such a point is called an extreme or stationary point, and an optimal solution is located at an extreme point.
Classical gradient methods provide a fast and reliable search on a differentiable solution landscape, but may be trapped at a locally optimal solution. Evolutionary gradient search [21] uses an EA to construct gradient information on a nondifferentiable landscape; it was later extended to optimization in noisy environments [1]. It uses self-adaptive control for mutation, i.e., the chromosome is coded as x = (x1, x2, . . . , xn, σ). Multidirectional searches are carried out, and the gradient direction is calculated from the evolutionary movement rather than from the movement of a single solution. A centered differencing approach is used for gradient estimation.
Evolutionary gradient search works in the sense of (1, λ)-ES: it operates on only one individual. From the current point x, the method generates λ new individuals t1, . . . , tλ using normal mutation, and calculates their fitness values f(t1), . . . , f(tλ). The estimated gradient is given by
g = \sum_{i=1}^{\lambda} \left( f(t_i) - f(x) \right) (t_i - x),    (5.4)

which is normalized as e = g/\|g\|.
Evolutionary gradient search generates two trial points:

x_1 = x + (\sigma \psi) e, \quad x_2 = x + (\sigma/\psi) e,    (5.5)

where ψ > 1 is a factor. The new individual is given by

x' = x + \sigma' e,    (5.6)

with

\sigma' = \begin{cases} \sigma\psi, & \text{if } f(x_1) > f(x_2) \\ \sigma/\psi, & \text{if } f(x_1) \le f(x_2) \end{cases}.    (5.7)
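A minimal sketch of one evolutionary gradient search step, following Eqs. (5.4)–(5.7), is given below. As written above, the update moves along +e, so f is assumed to be maximized; the values of λ and ψ are illustrative only.

import numpy as np

rng = np.random.default_rng(1)

def egs_step(f, x, sigma, lam=10, psi=1.8):
    fx = f(x)
    # lam mutants around x; Eq. (5.4) combines their fitness differences
    # into an estimated gradient direction.
    trials = [x + sigma * rng.standard_normal(x.shape) for _ in range(lam)]
    g = sum((f(t) - fx) * (t - x) for t in trials)
    e = g / (np.linalg.norm(g) + 1e-12)                          # normalized direction
    x1 = x + (sigma * psi) * e                                   # long trial step, Eq. (5.5)
    x2 = x + (sigma / psi) * e                                   # short trial step
    sigma_new = sigma * psi if f(x1) > f(x2) else sigma / psi    # Eq. (5.7)
    return x + sigma_new * e, sigma_new                          # Eq. (5.6)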
Gradient evolution [14] is a population-based metaheuristic method. Similar to
evolutionary gradient search, gradient evolution uses a gradient estimation approach
that is based on a centered differencing approach. Its population comprises a num-
ber of vectors that represent possible solutions. Gradient evolution searches for the
optimal solution over several iterations. In each iteration, all vectors are updated
using three operators: vector updating, jumping, and refreshing. Gradient evolution
algorithm uses an elitist strategy. Gradient evolution performs better than, or as well
as, PSO, DE, ABC, and continuous GA, for most of the benchmark problems tested.
The updating rule of gradient evolution is derived from a gradient estimation method. It modifies the updating rule of individual-based search, which is inspired by a Taylor series expansion, and the search direction is determined by the Newton–Raphson method. Vector jumping and refreshing help to avoid local optima. The algorithm simply sets a jumping rate to determine whether or not a vector must jump, and only a chosen vector jumps in a different direction. Vector refreshing is performed when a vector does not move to another location after multiple iterations.
Example 5.2: The Easom function is treated in Example 2.1 and Example 3.4. Here
we solve this same problem using ES with the same ES settings given in Example
5.1. The global minimum value is −1 at x = (π, π)T .
For a random run, we have f (x) = −1.0000 at (3.1416, 3.1413) with 9000 func-
tion evaluations. All the individuals converge toward the global optimum. For 10
random runs, the solver always converged to the global optimum within 100 gener-
ations. The evolution of a random run is illustrated in Figure 5.2.
Figure 5.1 The evolution of a random run of ES for the Rosenbrock function: the minimum and average objectives.
Figure 5.2 The evolution of a random run of ES for the Easom function: the minimum and average objectives.
From this example and Example 5.1, it is concluded that the ES implementation gives better results than SA and GA for both the Rosenbrock and the Easom functions.
where δ is a global step size, z is a random vector whose elements are drawn from a
normal distribution N(0, 1), and the columns of the rotation matrix B are the eigen-
vectors of the covariance matrix C of the distribution of mutation points. The step
size δ is also adaptive. CMA implements PCA of the previously selected mutation
steps to determine the new mutation distribution [11].
By suitably defining mutation operators, ES can evolve significantly faster. The CMA-based mutation operator makes ES two orders of magnitude faster than conventional ES [10–12]. CMA implements the concepts of derandomization and cumulation for self-adaptation of the mutation distribution [11].
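For illustration, a correlated mutation of this kind can be sampled through the eigendecomposition of C, as sketched below. Scaling the standard normal vector z by the square roots of the eigenvalues before rotating it by B is the standard way to realize a sample from N(0, C); the symbols follow the description above, and this is only a sampling sketch, not the full CMA-ES update.

import numpy as np

rng = np.random.default_rng(2)

def cma_style_mutation(x, C, delta):
    # C = B diag(w) B^T; B rotates an axis-aligned sample into the coordinate
    # system of the mutation distribution, so that delta * y ~ N(0, delta^2 C).
    w, B = np.linalg.eigh(C)
    z = rng.standard_normal(len(x))
    y = B @ (np.sqrt(np.maximum(w, 0.0)) * z)
    return x + delta * y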
In CMA-ES (https://www.lri.fr/~hansen/), not only is the step size of the mutation operator adjusted at each generation, but so is the step direction. Heuristics for
setting search parameters, detecting premature convergence, and a restart strategy can
also be introduced into CMA-ES. CMA is one of the best real-parameter optimization
algorithms.
In [12], the original CMA-ES [11] is modified to adapt the covariance matrix
by exploiting more of the information contained in larger populations. Instead of
updating the covariance matrix with rank-one information, higher rank information
is included. This reduces the time complexity from O(n^2) to O(n) for a problem dimension of n.
BI-population CMA-ES with alternative restart strategy combines two modes of
parameter settings for each restart [17]. It is the winner of the competition on real-
parameter single objective optimization at IEEE CEC-2013.
Limited memory CMA-ES [18] is an alternative to the limited memory BFGS (L-BFGS) method. Inspired by L-BFGS, limited memory CMA-ES samples candidate solutions according to a covariance matrix reproduced from m direction vectors selected during the optimization process. Limited memory CMA-ES outperforms CMA-ES and its large-scale versions on nonseparable ill-conditioned problems by a factor that increases with the problem dimension. The algorithm demonstrates a performance comparable to that of L-BFGS on nontrivial large-scale optimization problems.
Mixed integer evolution strategies [16] are natural extensions of ES for mixed integer optimization problems whose parameter vectors consist of continuous variables as well as nominal discrete and integer variables. They use specialized mutation operators tailored for the mixed parameter classes. For each type of variable,
mutation operators tailored for the mixed parameter classes. For each type of variable,
the choice of mutation operators is governed by a natural metric for this variable type,
maximal entropy, and symmetry considerations. All distributions used for mutation
can be controlled in their shape by means of scaling parameters, allowing self-
adaptation to be implemented. Global convergence of the method is proved on a
very general class of problems.
The evolution path technique employed by CMA-ES is a fine example of exploit-
ing history. History was also used in developing efficient EAs that adaptively mutate
and never revisit [24]. An archive is used to store all the solutions that have been
explored before. It constitutes an adaptive mutation operator that has no parameter.
The algorithm has superior performance over CMA-ES.
Problems
5.1 Find out the global search mechanism, the convergence mechanism, and the
up-hill mechanism of ES.
5.2 Explain how an elitist GA is similar to (μ + λ)-ES.
5.3 Minimize the 10-dimensional Rastrigin function on the domain [−5.12, 5.12]
using (μ + λ)-ES with μ = 10 and λ = 10. Set the standard deviation of the
mutation in each dimension to 0.02.
(1) Record the best individual at each generation for 100 generations.
(2) Run 50 simulations.
(3) Plot the average minimum cost values as a function of generation number.
References
1. Arnold D, Salomon R. Evolutionary gradient search revisited. IEEE Trans Evol Comput.
2007;11(4):480–95.
2. Back T, Schwefel H. An overview of evolutionary algorithms for parameter optimization. Evol
Comput. 1993;1(1):1–23.
3. Beyer H-G. Toward a theory of evolution strategies: self-adaptation. Evol Comput.
1995;3(3):311–47.
4. Beyer H-G. Convergence analysis of evolutionary algorithms that are based on the paradigm
of information geometry. Evol Comput. 2014;22(4):679–709.
5. Beyer H-G, Melkozerov A. The dynamics of self-adaptive multi-recombinant evolution strate-
gies on the general ellipsoid model. IEEE Trans Evol Comput. 2014;18(5):764–78.
6. Beyer H-G, Hellwig M. The dynamics of cumulative step size adaptation on the ellipsoid
model. Evol Comput. 2016;24:25–57.
7. Eshelman LJ. The CHC adaptive search algorithm: how to have safe search when engaging in
nontraditional genetic recombination. In: Rawlins GJE, editor. Foundations of genetic algo-
rithms. San Mateo, CA: Morgan Kaufmann; 1991. p. 265–283.
8. Fogel DB. An analysis of evolutionary programming. In: Proceedings of the 1st annual con-
ference on evolutionary programming, La Jolla, CA, May 1992. p. 43–51.
9. Fogel L, Owens J, Walsh M. Artificial intelligence through simulated evolution. New York:
Wiley; 1966.
10. Hansen N, Ostermeier A. Adapting arbitrary normal mutation distributions in evolution strate-
gies: the covariance matrix adaptation. In: Proceedings of IEEE international conference on
evolutionary computation, Nagoya, Japan, 1996. p. 312–317.
11. Hansen N, Ostermeier A. Completely derandomized self-adaptation in evolution strategies.
Evol Comput. 2001;9(2):159–95.
12. Hansen N, Muller SD, Koumoutsakos P. Reducing the time complexity of the derandomized
evolution strategy with covariance matrix adaptation (CMA-ES). Evol Comput. 2003;11(1):1–
18.
13. Herdy M. Application of the evolution strategy to discrete optimization problems. In: Schwefel HP, Manner R, editors. Parallel problem solving from nature, Lecture notes in computer science, vol. 496. Berlin: Springer; 1991. p. 188–192.
14. Kuo RJ, Zulvia FE. The gradient evolution algorithm: a new metaheuristic. Inf Sci.
2015;316:246–65.
15. Lee CY, Yao X. Evolutionary programming using mutations based on the Levy probability
distribution. IEEE Trans Evol Comput. 2004;8(1):1–13.
16. Li R, Emmerich MTM, Eggermont J, Back T, Schutz M, Dijkstra J, Reiber JHC. Mixed integer
evolution strategies for parameter optimization. Evol Comput. 2013;21(1):29–64.
17. Loshchilov I. CMA-ES with restarts for solving CEC 2013 benchmark problems. In: Proceed-
ings of IEEE congress on evolutionary computation (CEC 2013), Cancun, Mexico, June 2013.
p. 369–376.
18. Loshchilov I. LM-CMA: an alternative to L-BFGS for large scale black-box optimization. Evol
Comput. 2016.
19. Ostermeier A, Gawelczyk A, Hansen N. Step-size adaptation based on non-local use of selection
information. In: Parallel problem solving from nature (PPSN III), Lecture notes in computer
science, vol. 866. Berlin: Springer; 1994. p. 189–198.
20. Rechenberg I. Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der
biologischen Evolution. Freiburg, Germany: Formman Verlag; 1973.
21. Salomon R. Evolutionary algorithms and gradient search: similarities and differences. IEEE
Trans Evol Comput. 1998;2(2):45–55.
22. Schwefel HP. Numerical optimization of computer models. Chichester: Wiley; 1981.
23. Yao X, Liu Y, Lin G. Evolutionary programming made faster. IEEE Trans Evol Comput.
1999;3(2):82–102.
24. Yuen SY, Chow CK. A genetic algorithm that adaptively mutates and never revisits. IEEE Trans
Evol Comput. 2009;13(2):454–72.
Differential Evolution
6
Differential evolution (DE) is a popular, simple yet efficient EA for solving real-
parameter global optimization problems [30]. DE is an elitist EA. It creates new
candidate solutions by a multiparent reproduction strategy. DE uses the directional
information from the current population for each individual to form a simplex-like
triangle.
6.1 Introduction
6.2 DE Algorithm
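For orientation, a minimal Python sketch of the classical DE/rand/1/bin scheme (differential mutation, binomial crossover, and greedy selection) for minimization is given below; the population size and the settings F = 0.5 and Cr = 0.9 are only illustrative defaults, not prescribed values.

import numpy as np

rng = np.random.default_rng(3)

def de_rand_1_bin(f, lower, upper, pop_size=30, F=0.5, Cr=0.9, generations=200):
    # Classical DE/rand/1/bin for minimizing f over the box [lower, upper].
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    dim = lower.size
    pop = lower + rng.random((pop_size, dim)) * (upper - lower)
    fit = np.array([f(x) for x in pop])
    for _ in range(generations):
        for i in range(pop_size):
            r1, r2, r3 = rng.choice([j for j in range(pop_size) if j != i], 3, replace=False)
            v = np.clip(pop[r1] + F * (pop[r2] - pop[r3]), lower, upper)   # mutation
            cross = rng.random(dim) < Cr
            cross[rng.integers(dim)] = True        # force at least one mutated component
            u = np.where(cross, v, pop[i])         # binomial crossover
            fu = f(u)
            if fu <= fit[i]:                       # greedy (elitist) selection
                pop[i], fit[i] = u, fu
    best = int(np.argmin(fit))
    return pop[best], fit[best]

# Example: minimize the sphere function in 10 dimensions.
x_best, f_best = de_rand_1_bin(lambda x: float(np.sum(x ** 2)), [-5.0] * 10, [5.0] * 10)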
Figure 6.1 The landscape of Rastrigin function f(x) with two variables.
Figure 6.2 The evolution of a random run of DE for Rastrigin function: the minimum and average objectives.
6.3 Variants of DE
DE basically outperforms PSO and other EAs in terms of the solution quality [35].
It still has the problems of slow and/or premature convergence.
By dynamically controlling F and/or Cr using fuzzy logic controllers, fuzzy adap-
tive DE [18] converges much faster than DE, particularly when the dimensionality
best vector xtbest of the entire population at current generation G for mutating a
population member.
DE Markov chain [34], as a population MCMC algorithm, solves an important
problem in MCMC, namely, that of choosing an appropriate scale and orientation
for the jumping distribution. In DE Markov chain, the jumps are simply a fixed
multiple of the differences of two random parameter vectors that are currently in
the population. The selection process of DE Markov chain works via the Metropolis
ratio which defines the probability with which a proposal is accepted.
JADE [45] implements a mutation strategy DE/current-to-pbest with optional
external archive and updates control parameters in an adaptive manner. DE/current-
to-pbest is a generalization of classical DE/current-to-best, while the optional archive
operation utilizes historical data to provide information of progress direction. Both
operations diversify the population and improve the convergence performance.
Current-to-pbest utilizes the information of multiple best solutions to balance the
greediness of the mutation and the diversity of the population. JADE is better than,
or at least comparable to, other DE algorithms, canonical PSO, and other EAs in
terms of convergence performance.
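A minimal sketch of the DE/current-to-pbest/1 mutation in the spirit of JADE is given below; the adaptation of F and Cr and the archive maintenance of JADE are omitted, minimization is assumed, and the parameter values are illustrative.

import numpy as np

rng = np.random.default_rng(4)

def current_to_pbest(pop, fit, archive, i, F=0.5, p=0.1):
    # v_i = x_i + F (x_pbest - x_i) + F (x_r1 - x_r2), where x_pbest is one of the
    # 100p% best individuals (smallest fitness) and x_r2 is drawn from the union
    # of the population and the archive.
    pop_size = len(pop)
    n_best = max(1, int(np.ceil(p * pop_size)))
    pbest = pop[rng.choice(np.argsort(fit)[:n_best])]
    r1 = rng.choice([j for j in range(pop_size) if j != i])
    union = np.vstack([pop, archive]) if len(archive) else pop
    r2 = rng.integers(len(union))
    return pop[i] + F * (pbest - pop[i]) + F * (pop[r1] - union[r2])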
Geometric DE is a formal generalization of traditional DE that can be used to
derive specific DE algorithms for both continuous and combinatorial spaces retain-
ing the same geometric interpretation of the dynamics of DE search across represen-
tations. Specific geometric DE algorithms are derived for search spaces associated with binary strings, permutations, vectors of permutations, and genetic programs [20].
In [7], switched parameter DE modifies basic DE by switching the values of
the scale factor (mutation step size) and crossover rate in a uniformly random way
between two extreme corners of their feasible ranges for different individuals. Each
individual is mutated either by DE/rand/1 scheme or by DE/best/1 scheme. The
individual is subjected to that mutation strategy which was responsible for its last
successful update. Switched parameter DE achieves very competitive results against
the best known algorithms under the IEEE CEC 2008 and 2010 competitions.
In DE, the use of different mutation and crossover strategies with different parame-
ter settings can be appropriate during different stages of the evolution. In evolving
surrogate model-based DE method [19], a surrogate model, which is constructed
based on the population members of the current generation, is used to assist DE
in order to generate competitive offspring using the appropriate parameter setting
during different stages of the evolution. From the generated offspring members, a
competitive offspring is selected based on the surrogate model evaluation. A Krig-
ing model is employed to construct the surrogate. Evolving surrogate model-based
DE performs statistically similar or better than the state-of-the-art self-adaptive DE
algorithms.
Standard DE and its variants typically operate in the continuous space. Several DE
algorithms have been proposed for binary and discrete optimization problems.
In artificial immune system-based binary DE [16], the scaling factor is treated as
a random bit-string and the trial individuals are generated by Boolean operators. A
modified binary DE algorithm [37] improves the Boolean mutation operator based
on the binary bit-string framework. In binary-adapted DE [14], the scaling factor
is regarded as the probability of the scaled difference bit to take on one. Stochastic
diffusion binary DE [28] hybridizes binary-adapted DE [14] with ideas extracted
from stochastic diffusion search. These binary DE algorithms discard the updating
formulas of standard DE and generate new individuals based on different Boolean operators.
Angle modulated DE is a binary DE inspired by angle modulated PSO. In angle modulated DE [23], standard DE is adopted to update the four real-coded parameters of the angle modulation function, which is sampled to generate the binary-coded solutions until the global best solution is found. Thus, angle modulated DE actually works in continuous space.
In discrete binary DE [5], the sigmoid function used in discrete binary PSO [17]
is directly taken to convert the real individuals to bit strings. Discrete binary DE
searches in the binary space directly, but it is very sensitive to the setting of the
control parameters. Moreover, the value transformed by the sigmoid function is not
symmetrical in discrete binary DE, which reduces the global searching ability.
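For illustration, the sigmoid-based conversion can be sketched as follows: each real-valued component of a mutant vector is squashed to (0, 1) and interpreted as the probability of the corresponding bit being 1; the surrounding DE loop is omitted.

import numpy as np

rng = np.random.default_rng(5)

def to_bits(v):
    # Squash each component to (0, 1) and sample the corresponding bit.
    prob = 1.0 / (1.0 + np.exp(-np.asarray(v, dtype=float)))
    return (rng.random(prob.shape) < prob).astype(int)

print(to_bits([4.0, -4.0, 0.0]))   # large positive components almost surely map to 1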
Another modified binary DE [38] develops a probability estimation operator to generate individuals. The method preserves the updating strategy of DE. The probability estimation operator is utilized to build the probability model for generating binary-coded mutated individuals; it better maintains population diversity and is robust to the setting of the parameters. It outperforms discrete binary DE [5], modified binary DE [37], discrete binary PSO [17], and the binary ant system in terms of accuracy and convergence speed.
6.5 Theoretical Analysis on DE

Some theoretical results on DE are provided in [42,43], where the influence of the
mutation and crossover operators and their parameters on the expected population
variance is theoretically analyzed. In case of applying mutation and recombination
but no selection, the expected population variance of DE is shown to be greater than
that of ES [42].
In [44], the influence of the crossover rate on the distribution of the number of
mutated components and on the mutation probability is theoretically analyzed for
several variants of crossover, including binomial and exponential strategies in DE.
The behavior of exponential crossover variants is more sensitive to the problem size
than that of its binomial crossover counterparts.
Problems
References
1. Ahandani MA, Alavi-Rad H. Opposition-based learning in the shuffled differential evolution
algorithm. Soft Comput. 2012;16:1303–37.
2. Ahandani MA, Shirjoposht NP, Banimahd R. Three modified versions of differential evolution
algorithm for continuous optimization. Soft Comput. 2010;15:803–30.
3. Brest J, Greiner S, Boskovic B, Mernik M, Zumer V. Self-adapting control parameters in
differential evolution: a comparative study on numerical benchmark problems. IEEE Trans
Evol Comput. 2006;10(6):646–57.
4. Brest J, Maucec MS. Population size reduction for the differential evolution algorithm. Appl
Intell. 2008;29:228–47.
5. Chen P, Li J, Liu Z. Solving 0-1 knapsack problems by a discrete binary version of differential evolution. In: Proceedings of the second international symposium on intelligent information technology application, Shanghai, China, Dec 2008. p. 513–516.
28. Salman AA, Ahmad I, Omran MGH. A metaheuristic algorithm to solve satellite broadcast
scheduling problem. Inf Sci. 2015;322:72–91.
29. Sarker RA, Elsayed SM, Ray T. Differential evolution with dynamic parameters selection for
optimization problems. IEEE Trans Evol Comput. 2014;18(5):689–707.
30. Storn R, Price K. Differential evolution—a simple and efficient adaptive scheme for global
optimization over continuous spaces. International Computer Science Institute, Berkeley, CA,
Technical Report TR-95-012, March 1995.
31. Storn R, Price KV. Differential evolution—a simple and efficient heuristic for global optimiza-
tion over continuous spaces. J Global Optim. 1997;11(4):341–59.
32. Storn R, Price KV, Lampinen J. Differential evolution—a practical approach to global opti-
mization. Berlin, Germany: Springer; 2005.
33. Sutton AM, Lunacek M, Whitley LD. Differential evolution and non-separability: using selec-
tive pressure to focus search. In: Proceedings of the 9th annual conference on GECCO, July
2007. p. 1428–1435.
34. Ter Braak CJF. A Markov chain Monte Carlo version of the genetic algorithm differential
evolution: Easy Bayesian computing for real parameter spaces. Stat Comput. 2006;16:239–49.
35. Vesterstrom J, Thomson R. A comparative study of differential evolution, particle swarm opti-
mization, and evolutionary algorithms on numerical benchmark problems. In: Proceedings of
IEEE congress on evolutionary computation (CEC), Portland, OR, June 2004. p. 1980–1987.
36. Wang H, Rahnamayan S, Sun H, Omran MGH. Gaussian bare-bones differential evolution.
IEEE Trans Cybern. 2013;43(2):634–47.
37. Wu CY, Tseng KY. Topology optimization of structures using modified binary differential
evolution. Struct Multidiscip Optim. 2010;42:939–53.
38. Wang L, Fu X, Mao Y, Menhas MI, Fei M. A novel modified binary differential evolution
algorithm and its applications. Neurocomputing. 2012;98:55–75.
39. Zaharie D. Control of population diversity and adaptation in differential evolution algorithms.
In: Proceedings of MENDEL 2003, Brno, Czech, June 2003. p. 41–46.
40. Yang Z, He J, Yao X. Making a difference to differential evolution. In: Advances in metaheuris-
tics for hard optimization. Berlin: Springer; 2007. p. 415–432.
41. Yang Z, Tang K, Yao X. Self-adaptive differential evolution with neighborhood search. In:
Proceedings of IEEE congress on evolutionary computation (CEC), Hong Kong, June 2008.
p. 1110–1116.
42. Zaharie D. On the explorative power of differential evolution. In: Proceedings of 3rd interna-
tional workshop on symbolic numerical algorithms and scientific computing, Oct 2001. http://
web.info.uvt.ro/~dzaharie/online?papers.html.
43. Zaharie D. Critical values for the control parameters of differential evolution algorithms. In:
Proceedings of the 8th international mendel conference on soft computing, 2002. p. 62–67.
44. Zaharie D. Influence of crossover on the behavior of differential evolution algorithms. Appl
Soft Comput. 2009;9(3):1126–38.
45. Zhang J, Sanderson AC. JADE: adaptive differential evolution with optional external archive.
IEEE Trans Evol Comput. 2009;13(5):945–58.
46. Zhang X, Yuen SY. A directional mutation operator for differential evolution algorithms. Appl
Soft Comput. 2015;30:529–48.
Estimation of Distribution Algorithms
7
7.1 Introduction
EDAs [28,37] are also called probabilistic model-building GAs [41] and iterated
density-estimation EAs (IDEAs) [7]. They borrow two concepts from evolutionary
computation: population-based search, and exploration by combining and perturbing
promising solutions. They also use probabilistic models from machine learning to
guide exploration of the search space. EDAs usually differ in the representation of
candidate solutions, the class of probabilistic models, or the procedures for learning
and sampling the probabilistic models. EDAs can also cope with noisy fitness information.
EDAs have the ability to uncover the hidden regularities of problems and then
exploit them for effective search. EDA uses a probabilistic model to estimate the
distribution of promising solutions, and to further guide the exploration of the search
space. Estimating the probability distribution from data corresponds to tuning the
model for the inductive search bias. The probabilistic model is further employed to
generate new points.
In EDAs, classical genetic operators are replaced by the estimation of a prob-
abilistic model and its simulation in order to generate the next population. EDAs
perform two steps: building a probabilistic model from promising solutions found so
far, and then using this model to generate new individuals to replace the old popula-
tion. EDAs often require fewer fitness evaluations than EAs. A population is usually
not maintained between generations. A drawback of EDAs is that the computational
complexity increases rapidly with increasing dimensionality.
The first EDA for real-valued random variables was an adaptation of binary PBIL
[49,52]. Unsupervised estimation of Bayesian network algorithm [44] is for effective
and efficient globally multimodal problem optimization. It uses a Bayesian network
for data clustering in order to factorize the joint probability distribution for the
individuals selected at each iteration.
ACO belongs to EDAs. EDAs and ACO are very similar and differ mainly in the
way the probabilistic model is updated [14,34].
Mateda-2.0 (http://www.jstatsoft.org/article/view/v035i07) is a MATLAB pack-
age for the implementation and analysis of EDAs.
The general EDA procedure can be summarized as follows.

1. Set t = 1. Initialize the probability model p(x, t) to some prior (e.g., a uniform distribution).
2. Repeat:
   a. Sampling step: Generate a population P(t) of N_P individuals by sampling the model.
   b. Evaluation step: Determine the fitness of the individuals in the population.
   c. Selection step: Create an improved data set by selecting M ≤ N_P points.
   d. Learning step: Create a new model p(x, t) from the old model and the improved data set.
   e. Generate O(t) by generating N_P new points from the distribution p(x, t).
   f. Incorporate O(t) into P(t).
   g. Set t = t + 1.

   until termination criteria are met.
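As an illustration only (not code from the text), the following minimal Python sketch instantiates this loop with a univariate binary marginal model, truncation selection, and the OneMax fitness; all names and parameter values are chosen purely for the example.

```python
import numpy as np

def univariate_eda(n_bits=30, pop_size=100, n_select=50, generations=50, seed=0):
    """Minimal univariate EDA (UMDA-style) loop on the OneMax problem."""
    rng = np.random.default_rng(seed)
    p = np.full(n_bits, 0.5)                      # probability model p(x, t), uniform prior
    for _ in range(generations):
        # Sampling step: draw a population from the current model
        pop = (rng.random((pop_size, n_bits)) < p).astype(int)
        # Evaluation step: OneMax fitness = number of ones
        fitness = pop.sum(axis=1)
        # Selection step: keep the best n_select individuals (truncation selection)
        selected = pop[np.argsort(-fitness)[:n_select]]
        # Learning step: re-estimate the univariate marginals from the selected set;
        # clipping keeps individual bits from fixating prematurely
        p = selected.mean(axis=0).clip(0.05, 0.95)
    return p

if __name__ == "__main__":
    print(univariate_eda().round(2))
```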
Figure 7.1 The evolution of a random run of PBIL for the Ackley function: the minimum and average objectives.
Example 7.1:
We now minimize the Ackley function of two variables (n = 2):

$$\min_{\boldsymbol{x}} f(\boldsymbol{x}) = 20 + e - 20 \exp\left(-0.2\sqrt{\frac{1}{n}\sum_{i=1}^{n} x_i^2}\right) - \exp\left(\frac{1}{n}\sum_{i=1}^{n} \cos(2\pi x_i)\right).$$
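For reference, a small Python definition of the Ackley function used in this example and in Problems 7.2 and 7.4 (a straightforward transcription of the formula above, not code from the text):

```python
import numpy as np

def ackley(x):
    """Ackley function; the global minimum is f(0, ..., 0) = 0."""
    x = np.asarray(x, dtype=float)
    n = x.size
    sum_sq = np.sum(x ** 2)
    sum_cos = np.sum(np.cos(2.0 * np.pi * x))
    return 20.0 + np.e - 20.0 * np.exp(-0.2 * np.sqrt(sum_sq / n)) - np.exp(sum_cos / n)

print(ackley([0.0, 0.0]))   # 0.0 (global minimum)
print(ackley([1.0, 1.0]))   # about 3.63
```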
Compact GAs (cGAs) [24] evolve a probability vector that describes the hypothetical distribution of a population of solutions in the search space, so as to mimic the first-order behavior of simple GA with uniform crossover. The approach was primarily inspired by the random walk model, proposed to estimate GA convergence on a class of problems where there is no interaction between the building blocks constituting the solution. cGA iteratively processes the probability vector with updating mechanisms that mimic the typical selection and recombination operations performed in standard GA, and is almost equivalent to simple GA with binary tournament selection and uniform crossover on a number of test problems [24].
Elitism-based cGAs [2] are EDAs for solving difficult optimization problems without compromising on memory and computation costs. The idea is to deal with the issues connected with the lack of memory by allowing a selection pressure that is high enough to offset the disruptive effect of uniform crossover. The analogies between cGAs and (1 + 1)-ES are discussed in [2], where a mathematical model of ES is also extended to cGAs, yielding useful analytical performance results.
cGA represents the population by means of a vector of probabilities p_i ∈ [0, 1], i = 1, . . . , l, for the l alleles needed to represent the solutions. Each p_i measures the proportion of individuals in the simulated population that have a one in the ith locus. By treating these values as probabilities, new individuals can be generated and, based on their fitness, the probability vector is updated so as to favor the generation of better individuals.
The probabilities pi are initially set to 0.5 for a randomly generated population. At
each iteration cGA generates two individuals on the basis of the current probability
vector and compares their fitness. Let W be the individual with better fitness and L
the individual with worse fitness. The probability vector at step k + 1 is updated by
$$p_i^{k+1} = \begin{cases} p_i^k + \frac{1}{N}, & \text{if } w_i = 1 \text{ and } l_i = 0,\\ p_i^k - \frac{1}{N}, & \text{if } w_i = 0 \text{ and } l_i = 1,\\ p_i^k, & \text{if } w_i = l_i, \end{cases} \qquad (7.4)$$
where N is the size of the simulated population and w_i (or l_i) is the value of the ith allele of W (or L). cGA stops when all entries of the probability vector p are equal to zero or one; the converged vector then represents the final solution.
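A compact Python sketch of this procedure on the OneMax problem, following update rule (7.4); the test problem, simulated population size, and iteration cap are illustrative choices, not taken from the text:

```python
import numpy as np

def cga_onemax(n_bits=20, n_virtual=100, max_iters=200000, seed=0):
    """Minimal compact GA: two sampled individuals per iteration, rule (7.4)."""
    rng = np.random.default_rng(seed)
    p = np.full(n_bits, 0.5)               # probability of a 1 at each locus
    onemax = lambda x: int(x.sum())        # fitness: number of ones
    for _ in range(max_iters):
        a = (rng.random(n_bits) < p).astype(int)
        b = (rng.random(n_bits) < p).astype(int)
        winner, loser = (a, b) if onemax(a) >= onemax(b) else (b, a)
        # rule (7.4): shift p_i toward the winner wherever the two individuals differ
        p = np.clip(p + (winner - loser) / n_virtual, 0.0, 1.0)
        # converged (up to floating-point tolerance); p then encodes the solution
        if np.all((p < 1e-9) | (p > 1.0 - 1e-9)):
            break
    return p

print(cga_onemax())
```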
Since cGA mimics the first-order behavior of standard GA, it is basically a 1-bit
optimizer and ignores the interactions among the genes. To solve problems with
higher-order building blocks, GAs with both higher selection pressure and larger
population sizes have to be exploited to help cGA to converge to better solutions [24].
cGA can be used to quickly assess the difficulty of a problem. A problem is easy if it can be solved by cGA with a low selection rate; the higher the selection rate required to solve the problem, the more difficult the problem is. Given a simulated population of N individuals, cGA updates the probability vector in steps of 1/N. Only log2 N bits are needed to store the finite set of values that each p_i can take. cGA therefore requires l log2 N bits, compared to the Nl bits needed by simple GA, hence reducing the memory requirement.
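As a rough illustration (the numbers here are chosen only for concreteness): with l = 100 alleles and a simulated population of N = 256, cGA stores 100 × log2 256 = 800 bits, whereas a simple GA holding the full population needs 100 × 256 = 25,600 bits.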
Real-valued cGA [32] works directly with real-valued chromosomes. For an opti-
mization problem with m real-valued variables, it uses as probability vector an m × 2
matrix describing the mean and the standard deviation of the distribution of each gene
in the hypothetical population. New variants of the update rules are then introduced
to evolve the probability vector in a way that mimics binary-coded cGA.
BOA employs general probabilistic models for discrete variables [41]. It utilizes
techniques for modeling multivariate data by Bayesian networks so as to estimate
the joint probability distribution of promising solutions. The method is very effec-
tive even on large decomposable problems with loose and tight linkage of building
blocks. The superior subsolutions are identified as building blocks. Theoretically
and empirically, BOA finds the optimal solution with subquadratic scaleup behavior.
BOA realizes probabilistic building-block crossover that approximates population-
wise building-block crossover by a probability distribution estimated on the basis of
proper decomposition [41,42].
Real-coded BOA [1] employs a Bayesian factorization that estimates a joint prob-
ability distribution for multivariate variables by a product of univariate conditional
distributions of each random variable. It deals with real-valued optimization by
evolving a population of promising solutions such that new offspring are generated
in line with the estimated probabilistic models of superior parent population. An ini-
tial population is generated at random. Superior solutions are selected by a method
such as tournament or truncation. A probabilistic model is learned from the selected
solutions by exploiting an information metric. New solutions are drawn by sampling
the learned model. The procedure iterates until some termination criteria are satisfied.
Real-coded BOA empirically solves numerical optimization problems of bounded
difficulty with subquadratic scaleup behavior [1]. A theoretical analysis shows that
real-coded BOA finds the optimal solution with a subquadratic (in problem size)
scalability for uniformly scaled decomposable problems [3]. The analytical models
of real-coded BOA have been verified by experimental studies. The analysis has been
extended for exponentially scaled problems, and the quasi-quadratic scalability has
also found experimental support.
The behaviors of PBIL with elitist selection in discrete space have been studied in
[22,26]. For a sufficiently small learning rate, PBIL can be modeled as a discrete dynamical system, and the local optima of an injective function with respect to Hamming distance are stable fixed points of PBIL [22]. The dynamic behavior of UMDA
is shown to be very similar to that of GA with uniform crossover [35].
UMDA and PBIL can locate the optimum of a linear function, but cannot solve
problems with nonlinear variable interactions [22,26]. These results suggest that
EDAs using only first-order statistics have very limited ability to find global optimal
solutions.
PBIL and cGA are modeled by a Markov process and the behavior is approximated
using an ordinary differential equation, which, with sufficiently small learning rates,
converges to local optima of the function to be optimized, with respect to Hamming
distance [46,48]. Bounds on the probability of convergence to the optimal solution
are obtained in [45] for cGA and PBIL. Moreover, a sufficient condition for conver-
gence to the optimal solution is given, and a range of possible values for algorithmic
parameters is computed, at which the algorithm converges to the optimal solution
with a predefined confidence level.
The dynamic behaviors of the limit models of UMDA and FDA with tournament
selection are studied in [63] for discrete optimization problems. The local optima
with respect to the Hamming distance are asymptotically stable. The limit model
of UMDA can be trapped at any local optimum for some initial probability models.
In the case of an additively decomposable objective function, FDA can converge to
the global optimal solution [63]. Based on the dynamic analysis of the distributions
of infinite population in EDAs, FDA under proportional selection converges to the
global optimum for optimization of continuous additively decomposable functions
with overlaps [64].
In addition to convergence time, the time complexity of EDAs can be measured
by the first hitting time. The first hitting time of cGA with population size 2 is
analyzed in [17] by employing drift analysis and Chernoff bounds on linear pseudo-
boolean functions. For pseudo-boolean injective functions, a worst-case mean first hitting time that is exponential in the problem size is proved for four commonly used EDAs using an analytical Markov chain framework [21].
In [13], a classification of problem hardness for EDAs and the corresponding prob-
ability conditions are proposed based on the first hitting time measure. An approach
to analyzing the first hitting time of EDAs with finite populations is also introduced and applied to UMDA with truncation selection, using discrete dynamical systems and Chernoff bounds, on two unimodal problems.
For EDAs, theoretical results on convergence are available based on infinite pop-
ulation assumption [12,22,38,64].
In consideration of the premature convergence phenomenon, the dynamics of EDAs are analyzed in [55] in terms of Markov chains, and it is shown that general EDAs cannot satisfy two necessary conditions for being effective search algorithms. In the case of UMDA, the global optimum is found only if the population size is sufficiently large.
When the initial configuration is fixed and the learning rate is close to zero, a unified convergence behavior of PBIL is derived in [30] based on its weak convergence property; the results are further generalized to the case where the individuals are randomly selected from the population.
outperforms PBIL in solution quality, but at a higher computational cost; its perfor-
mance is comparable to that of MA-SW-Chains [33], the winner of CEC’2010.
Source code of various EDAs can be downloaded from the following sources:
extended cGA [23] (C++), BOA (C++), BOA with decision graphs (http://www-
illigal.ge.uiuc.edu); adaptive mixed BOA (http://jiri.ocenasek.com), real-coded
BOA (http://www.evolution.re.kr), naive multiobjective mixture-based IDEA,
normal IDEA-induced chromosome elements exchanger, normal IDEA (http://
homepages.cwi.nl/~bosman). There are also Java applets for several real-valued and
permutation EDAs (http://www2.hannan-u.ac.jp/~tsutsui/research-e.html).
Problems
7.1 Given a uniform random variable x in (0, 1), find the function y(x) with the pdf
$$p(y) = \begin{cases} 3a, & 0 < y < 3/4,\\ a, & 3/4 < y < 1. \end{cases}$$
Solve for a so that p(y) is a valid pdf.
7.2 Plot Ackley function of two variables.
7.3 Write the algorithmic flowchart of PBIL.
7.4 Use PBIL to minimize the 10-dimensional Ackley function, using eight bits per dimension. Run for 30 generations using N_P = 200, α = 0.005, and μ = 2.
7.5 Write the algorithmic flowchart of cGA.
7.6 Download Mateda-2.0 package and learn by running the examples. Then use it
for optimizing a general benchmark in the Appendix.
References
1. Ahn CW, Goldberg DE, Ramakrishna RS. Real-coded Bayesian optimization algorithm: bring-
ing the strength of BOA into the continuous world. In: Proceedings of genetic and evolutionary
computation conference (GECCO), Seattle, WA, USA, June 2004. p. 840–851.
2. Ahn CW, Ramakrishna RS. Elitism based compact genetic algorithms. IEEE Trans Evol Com-
put. 2003;7(4):367–85.
3. Ahn CW, Ramakrishna RS. On the scalability of real-coded Bayesian optimization algorithm.
IEEE Trans Evol Comput. 2008;12(3):307–22.
4. Baluja S. Population-based incremental learning: a method for integrating genetic search based
function optimization and competitive learning. Technical Report CMU-CS-94-163, Computer
Science Department, Carnegie Mellon University, Pittsburgh, PA, 1994.
5. Baluja S, Caruana R. Removing the genetics from the standard genetic algorithm. In: Prieditis
A, Russel S, editors. Proceedings of the 12th international conference on machine learning.
San Mateo, CA: Morgan Kaufmann; 1995. p. 38–46.
6. Baluja S, Davies S. Fast probabilistic modeling for combinatorial optimization. In: Proceedings
of the 15th national conference on artificial intelligence (AAAI-98), Madison, WI, 1998. p.
469–476.
7. Bosman PAN, Thierens D. An algorithmic framework for density estimation based evolutionary
algorithms. Technical Report UU-CS-1999-46, Utrecht University, 1999.
8. Bosman PAN, Thierens D. Expanding from discrete to continuous estimation of distribution
algorithms: The IDEA. In: Proceedings of parallel problem solving from nature (PPSN VI),
vol. 1917 of Lecture Notes in Computer Science. Springer: Berlin; 2000. p. 767–776.
9. Bosman PAN, Thierens D. Advancing continuous IDEAs with mixture distributions and factor-
ization selection metrics. In: Proceedings of genetic and evolutionary computation conference
(GECCO-2001). San Francisco, CA; 2001. p. 208–212.
10. Ceberio J, Mendiburu A, Lozano JA. Introducing the Mallows model on estimation of distribu-
tion algorithms. In: Proceedings of international conference on neural information processing
(ICONIP), Shanghai, China, Nov 2011. p. 461–470.
11. Ceberio J, Irurozki E, Mendiburu A, Lozano JA. A distance-based ranking model estimation
of distribution algorithm for the flowshop scheduling problem. IEEE Trans Evol Comput.
2014;18(2):286–300.
12. Chen T, Tang K, Chen G, Yao X. On the analysis of average time complexity of estimation of
distribution algorithms. In: Proceedings of IEEE congress on evolutionary computation (CEC),
Singapore, Sept 2007. p. 453–460.
13. Chen T, Tang K, Chen G, Yao X. Analysis of computational time of simple estimation of
distribution algorithms. IEEE Trans Evol Comput. 2010;14(1):1–22.
14. Cordon O, de Viana IF, Herrera F, Moreno L. A new ACO model integrating evolutionary com-
putation concepts: the best-worst ant system. In: Proceedings of second international workshop
ant algorithms (ANTS’2000): from ant colonies to artificial ants, Brussels, Belgium, 2000.
p. 22–29.
15. de Bonet JS, Isbell Jr CL, Viola P. MIMIC: finding optima by estimating probability densities.
In: Mozer MC, Jordan MI, Petsche T, editors. Advances in neural information processing
systems, vol. 9. Cambridge, MA: MIT Press; 1997. p. 424–424.
16. Dong W, Chen T, Tino P, Yao X. Scaling up estimation of distribution algorithms for continuous
optimization. IEEE Trans Evol Comput. 2013;17(6):797–822.
17. Droste S. A rigorous analysis of the compact genetic algorithm for linear functions. Nat Comput.
2006;5(3):257–83.
18. Etxeberria R, Larranaga P. Global optimization using Bayesian networks. In: Proceedings of
2nd symposium on artificial intelligence (CIMAF-99), Habana, Cuba, 1999. p. 332–339.
19. Gallagher M, Frean M, Downs T. Real-valued evolutionary optimization using a flexible prob-
ability density estimator. In: Proceedings of genetic and evolutionary computation conference
(GECCO), Orlando, Florida, July 1999. p. 840–846.
20. Gallagher JC, Vigraham S, Kramer G. A family of compact genetic algorithms for intrinsic
evolvable hardware. IEEE Trans Evol Comput. 2004;8:111–26.
21. Gonzalez C. Contributions on theoretical aspects of estimation of distribution algorithms.
Doctoral Dissertation, Department of Computer Science and Artificial Intelligence, University
of Basque Country, Donostia, San Sebastian, Spain, 2005.
22. Gonzalez C, Lozano JA, Larranaga P. Analyzing the PBIL algorithm by means of discrete
dynamical systems. Complex Syst. 2000;12(4):465–79.
23. Harik G. Linkage learning via probabilistic modeling in the ECGA. Berlin, Germany: Springer;
1999.
24. Harik GR, Lobo FG, Goldberg DE. The compact genetic algorithm. IEEE Trans Evol Comput.
1999;3(4):287–97.
25. Hasegawa Y, Iba H. A Bayesian network approach to program generation. IEEE Trans Evol
Comput. 2008;12(6):750–63.
26. Hohfeld M, Rudolph G. Towards a theory of population-based incremental learning. In: Pro-
ceedings of the 4th IEEE conference on evolutionary computation, Indianapolis, IN, 1997. p.
1–5.
27. Khan IH. A comparative study of EAG and PBIL on large-scale global optimization problems.
Appl Comput Intell Soft Comput. 2014; Article ID 182973:10 p.
28. Larranaga P, Lozano JA, editors. Estimation of distribution algorithms: a new tool for evolu-
tionary computation. Norwell, MA: Kluwer Academic Press; 2001.
29. Larranaga P, Lozano JA, Bengoetxea E. Estimation of distribution algorithms based on multi-
variate normal and gaussian networks. Department of Computer Science and Artificial Intelli-
gence, University of Basque Country, Vizcaya, Spain, Technical Report KZZA-1K-1-01, 2001.
30. Li H, Kwong S, Hong Y. The convergence analysis and specification of the population-based
incremental learning algorithm. Neurocomputing. 2011;74:1868–73.
31. Looks M, Goertzel B, Pennachin C. Learning computer programs with the Bayesian optimiza-
tion algorithm. In: Proceedings of genetic and evolutionary computation conference (GECCO),
Washington, DC, 2005, vol. 2, p. 747–748.
32. Mininno E, Cupertino F, Naso D. Real-valued compact genetic algorithms for embedded micro-
controller optimization. IEEE Trans Evol Comput. 2008;12(2):203–19.
33. Molina D, Lozano M, Herrera F. MA-SW-Chains: memetic algorithm based on local search
chains for large scale continuous global optimization. In: Proceedings of the IEEE world
congress on computational intelligence (WCCI’10), Barcelona, Spain, July 2010, p. 1–8.
34. Monmarche N, Ramat E, Dromel G, Slimane M, Venturini G. On the Similarities between
AS, BSC and PBIL: toward the birth of a new meta-heuristic. Technical Report 215, Ecole
d’Ingenieurs en Informatique pour l’Industrie (E3i), Universite de Tours, France, 1999.
35. Muhlenbein H. The equation for response to selection and its use for prediction. Evol Comput.
1998;5:303–46.
36. Muhlenbein H, Mahnig T. FDA—a scalable evolutionary algorithm for the optimization of
additively decomposed function. Evol Comput. 1999;7(4):353–76.
37. Muhlenbein H, Paab G. From recombination of genes to the estimation of distributions. I.
Binary parameters. In: Voigt H-M, Ebeling W, Rechenberg I, Schwefel H-P, editors. Parallel
problem solving from nature (PPSN IV), Lecture Notes in Computer Science 1141. Berlin:
Springer; 1996. p. 178–187.
38. Muhlenbein H, Schlierkamp-Voosen D. Predictive models for the breeder genetic algorithm,
i: continuous parameter optimization. Evol Comput. 1993;1(1):25–49.
39. Muhlenbein H, Mahnig T, Rodriguez AO. Schemata, distributions, and graphical models in
evolutionary optimization. J Heuristics. 1999;5(2):215–47.
40. Ocenasek J, Schwarz J. Estimation of distribution algorithm for mixed continuous-discrete opti-
mization problems. In: Proceedings of the 2nd euro-international symposium on computational
intelligence, Kosice, Slovakia, 2002. p. 115–120.
41. Pelikan M. Bayesian optimization algorithm: from single level to hierarchy. PhD thesis, Uni-
versity of Illinois at Urbana-Champaign, Urbana, IL, 2002. Also IlliGAL Report No. 2002023.
42. Pelikan M, Goldberg DE, Cantu-Paz E. BOA: the Bayesian optimization algorithm. In: Pro-
ceedings of genetic and evolutionary computation conference, Orlando, FL, 1999. p. 525–532.
43. Pelikan M, Muhlenbein H. The bivariate marginal distribution algorithm. In: Roy R, Furuhashi
T, Chawdhry PK, editors. Advances in soft computing: engineering design and manufacturing.
London, U.K.: Springer; 1999. p. 521–53.
44. Pena JM, Lozano JA, Larranaga P. Globally multimodal problem optimization via an estimation
of distribution algorithm based on unsupervised learning of Bayesian networks. Evol Comput.
2005;13(1):43–66.
45. Rastegar R. On the optimal convergence probability of univariate estimation of distribution
algorithms. Evol Comput. 2011;19(2):225–48.
46. Rastegar R, Hariri A. A step forward in studying the compact genetic algorithm. Evol Comput.
2006;14(3):277–89.
47. Ratle A, Sebag M. Avoiding the bloat with probabilistic grammar-guided genetic programming.
In: Proceedings of the 5th international conference on artificial evolution, Creusot, France,
2001. p. 255–266.
48. Rastegar R, Hariri A. The population-based incremental learning algorithm converges to local
optima. Neurocomputing. 2006;69:1772–5.
49. Rudlof S, Koppen M. Stochastic hill climbing with learning by vectors of normal distributions.
In: Furuhashi T, editor. Proceedings of the 1st Online Workshop on Soft Computing (WSC1).
Nagoya, Japan: Nagoya University; 1996. p. 60–70.
50. Salustowicz R, Schmidhuber J. Probabilistic incremental program evolution. Evol. Comput.
1997;5(2):123–41.
51. Sastry K, Goldberg DE. Probabilistic model building and competent genetic programming. In:
Riolo RL, Worzel B, editors. Genetic programming theory and practice, ch. 13. Norwell, MA:
Kluwer; 2003. p. 205–220.
52. Sebag M, Ducoulombier A. Extending population–based incremental learning to continuous
search spaces. In: Eiben AE et al, editors. Parallel problem solving from nature (PPSN) V.
Berlin: Springer; 1998. p. 418–427.
53. Shan Y, McKay RI, Abbass HA, Essam D. Program evolution with explicit learning: a new
framework for program automatic synthesis. In: Proceedings of 2003 congress on evolutionary
computation (CEC), Canberra, Australia, 2003. p. 1639–1646.
54. Shan Y, McKay RI, Baxter R, Abbass H, Essam D, Hoai NX. Grammar model-based program
evolution. In: Proceedings of 2004 IEEE congress on evolutionary computation, Portland, OR,
2004. p. 478–485.
55. Shapiro JL. Drift and scaling in estimation of distribution algorithms. Evol Comput.
2005;13(1):99–123.
56. Sun J, Zhang Q, Tsang E. DE/EDA: a new evolutionary algorithm for global optimization. Inf
Sci. 2005;169:249–62.
57. Syswerda G. Simulated crossover in genetic algorithms. In: Whitley DL, editor. Foundations
of genetic algorithms 2. San Mateo, CA: Morgan Kaufmann; 1993. p. 239–255.
58. Tu Z, Lu Y. A robust stochastic genetic algorithm (StGA) for global numerical optimization.
IEEE Trans Evol Comput. 2004;8(5):456–70.
59. Tsutsui S. Probabilistic model-building genetic algorithms in permutation representation
domain using edge histogram. In: Proceedings of the 7th international conference on parallel
problem solving from nature (PPSN VII), Granada, Spain, September 2002. p. 224–233.
60. Tsutsui S. Node histogram vs. edge histogram: a comparison of probabilistic model-building
genetic algorithms in permutation domains. In: Proceedings of IEEE congress on evolutionary
computation (CEC), Vancouver, BC, Canada, July 2006. p. 1939–1946.
The behavior of EAs is often analyzed by using the schema-based approach [51],
Markov chain models [79], and infinite population models [91].
Schema Theorem
The two most important theoretical foundations of GA are Holland’s schema theorem
[50] and Goldberg’s building-block hypothesis [40]. The convergence analysis of
simple GA is based on the concept of schema [50]. A schema is a bit pattern that
functions as a set of binary strings.
A schema is a similarity template describing a subset of strings with the same bits
(0 or 1) at certain positions. A schema h = (h 1 , h 2 , . . . , h l ) is defined as a ternary
string of length l, where h i ∈ {0, 1, ∗}, with ∗ denoting the do-not-care symbol. The
size or order o(h) of a schema h is defined as the number of fixed positions (0s or
1s) in the string. A position in a schema is fixed if there is either a 0 or a 1 in this
position. The defining length δ(h) of a schema h is defined as the maximum distance
between any two fixed bits. The fitness of a schema is defined as the average fitness of all individuals x in the population that are instances of the schema h. The instances of a schema h are all genotypes x^g that match h at every fixed position. For example, x^g = 01101 and x^g = 01100 are instances of h = 0∗1∗∗. The number of genotypes that are instances of a schema of order o(h) is 2^{l−o(h)}.
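A small Python sketch of these definitions (the helper names are illustrative, not from the text):

```python
def order(schema):
    """Order o(h): the number of fixed (non-*) positions."""
    return sum(c != '*' for c in schema)

def defining_length(schema):
    """Defining length delta(h): distance between the outermost fixed positions."""
    fixed = [i for i, c in enumerate(schema) if c != '*']
    return fixed[-1] - fixed[0] if fixed else 0

def is_instance(x, schema):
    """True if the bit string x matches the schema at every fixed position."""
    return all(s == '*' or s == b for b, s in zip(x, schema))

h = '0*1**'
print(order(h), defining_length(h))                      # 2 2
print(is_instance('01101', h), is_instance('11100', h))  # True False
```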
The schema theory for GA [50,51] aims to predict the expected numbers of
solutions in a given schema (a subset of the search space) at the next generation,
in terms of quantities measured at the current generation. According to the schema
theorem, schemata with high fitness and small defining lengths grow exponentially
with time. Thus, GA simultaneously processes a large number of schemata. For a
population of N_P individuals, GA implicitly evaluates approximately N_P^3 schemata
in one generation [40]. This is called implicit parallelism. The theorem holds for all
schemata represented in the population.
The schema theorem works for GP as well, based on the idea of defining a schema
as the subspace of all trees that contain a predefined set of subtrees [59,82]. A schema
theorem for GP was derived in the presence of fitness-proportionate selection and
crossover in [82].
The exact schema theorems for GA and GP have been derived for exactly predict-
ing the expected characteristics of the population at the next generation [85,107]. The
schema theorem based on the concept of effective fitness [107] shows that schemata
of higher than average effective fitness receive an exponentially increasing number
of trials over time. However, generically there is no preference for short, low-order
schemata [107]. Based on the theory proposed in [107], a macroscopic exact schema
theorem for GP with one-point crossover is provided in [85]. These schema theorems
have also been written for standard GP with subtree crossover [87,88].
A simpler definition of the schema of GP given in [86] is close to the original
concept of schema in GA. Along with one-point crossover and point mutation, this
concept of schema has been used to derive an improved schema theorem for GP that
describes the propagation of schemata from one generation to the next [86].
An exact microscopic model for the dynamics of a GA with generalized recom-
bination is presented in [106]. It is shown that the schema dynamics have the same
functional form as that of strings and a corresponding exact schema theorem is
derived.
However, the schema theorem has attracted considerable criticism. The schema growth inequality provides only a lower bound for the one-generation transition of GA. Over multiple generations, the predictions of the schema theorem may be useless or misleading due to the inexactness of this inequality [43].
Building-Block Hypothesis
The building-block hypothesis [40] is the assumption that strings with high fitness can
be located by sampling building blocks with high fitness and combining the building
blocks effectively. This is given in Theorem 8.2.
Many attempts have been made to characterize the dynamics of EAs. This helps
to understand the conditions for EAs to converge to the global optimum.
Markov chains are widely used mathematical models for the theoretical analysis of
EAs [21,28,79,96,97]. An EA is characterized as a Markov chain with the current
population being the state variables, because the state of the (t + 1)th generation
often depends only on the tth generation. Convergence is analyzed in the sense of
are derived based on the infinite population model under proportional selection and
uniform crossover but no elitist selection. The result is then extended to the finite
population model.
A rigorous runtime analysis of a nonelitist EA with linear ranking selection
is presented by using an analytical tool called multi-type branching processes in [65].
The results point out situations where a correct balance between selection pressure
and mutation rate is essential for finding the optimal solution in polynomial time.
Building on known results on the performance of the (1+1) EA, an analysis of the
performance of the (1 + λ) EA has been presented for different offspring population
size λ [53]. A simple way is suggested to dynamically adapt this parameter when
necessary.
In [108], a method for establishing lower bounds on the expected running time of
EAs is presented. It is based on fitness-level partitions and an additional condition
on transition probabilities between fitness levels. It yields exact or near-exact lower
bounds for all functions with a unique optimum.
For the above two problems, guided search methods perform worse than many
other methods, since the fitness landscape leads the search method away from the
optimal solution. For such problems, random search is most likely the most efficient approach.
GA-deceptive functions are a class of functions where low-order building blocks are misleading, and their combinations cannot generate higher-order building blocks. Deceptive problems remain hard, and because of them the building-block hypothesis has faced strong criticism [43]. When the global optimum is surrounded by a region of the landscape with low average payoff, it is highly unlikely to be found by GA, and thus GA may converge to a suboptimal solution. For deceptive functions, the fitness of an individual of the population is not correlated with the expected ability of its representational components. Messy GA [41] was specifically designed to handle bounded deceptive problems. In [43], the static building-block hypothesis was proposed as the underlying assumption for defining deception, and augmented GAs for deceptive problems were also proposed.
Through deception, objective functions may actually prevent the objective from being reached; they may actively misdirect search toward dead ends. Novelty search [64] circumvents deception and also yields a perspective on open-ended evolution. It simply explores the search space by seeking behavioral novelty and ignoring the objective, even in an objective-based problem. In maze navigation and biped walking tasks, novelty search significantly outperforms objective-based search.
In general, parallel EAs are capable of finding higher quality solutions than sequential EAs due to better diversity.
Figure 8.3 illustrates panmictic EA, master–slave EA, island EA, and cellular EA.
Various hierarchical EAs can be obtained by hybridizing these models, producing such models as the island–master–slave hybrid, island–cellular hybrid, and island–island
hybrid.
Figure 8.4 illustrates pool-based EA. The pool is a shared global array of n tasks.
Each of the p processors processes a segment of size u.
Figure 8.5 illustrates coevolutionary EA. Each of the p processors handles one
dimension of the decision variable, and the final solution is obtained by assembling
these components. Each processor treats one variable as the primary variable, and
the other variables as secondary variables.
Scheduling in distributed systems, such as grid computing, is a challenging task in terms of time. Energy saving is also a promising objective for meta-schedulers.
Energy consumption and execution time can be optimized simultaneously using
multiobjective optimization [9].
When the fitness evaluation costs are not relatively high, employing a master–slave model may become inefficient, in that communications occupy a large proportion of the time.
Another approach is a coarse-grained master–slave model in which each slave
processor contains a subpopulation, while the master receives the best individual
from each slave and sends the global best information to all the slaves [122]. The master conducts a basic EA for global search, whereas the slaves execute local search by considering the individuals received from the master as neighborhood centers.
In a master–slave algorithm, synchronization plays a vital role in algorithm per-
formance on load-balanced problems, while asynchronous distributed EAs are more
efficient for load-imbalanced problems [102]. The speedup and efficiency of master–
slave distributed EAs may be limited by the master’s performance and by the com-
munication speed between the master and the slaves. In a master–slave model, with
increasing number of slave nodes, the speedup will eventually become poor when
the master saturates. The master–slave distributed EAs are fault-tolerant unless the
master node fails.
The island model is a well-known way to parallelize EAs [4]. The population is
split into smaller subpopulations, which evolve independently for certain periods of
time and periodically exchange solutions through a process called migration. The
approach can execute an existing EA within each deme. To promote information
sharing, a migration mechanism periodically exports some of the best individuals to other nodes according to a predefined topology.
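A minimal sketch of ring-topology migration, assuming each island is a NumPy array whose last column holds the (maximized) fitness; the topology, migrant count, and data layout are illustrative assumptions, not taken from the text:

```python
import numpy as np

def migrate_ring(islands, n_migrants=2):
    """Each island sends copies of its best individuals to the next island in the
    ring, where they replace that island's worst individuals."""
    k = len(islands)
    for i in range(k):
        src, dst = islands[i], islands[(i + 1) % k]
        best = src[np.argsort(src[:, -1])[-n_migrants:]].copy()   # highest fitness
        worst_idx = np.argsort(dst[:, -1])[:n_migrants]           # lowest fitness
        dst[worst_idx] = best
    return islands

rng = np.random.default_rng(0)
islands = [rng.random((5, 3)) for _ in range(3)]   # 3 islands, 5 individuals each
migrate_ring(islands)
```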
Using coarse-grained parallelization can have several advantages. This approach
introduces very little overhead, compared to parallelizing function evaluations,
because the amount of communication between different machines is very low. Fur-
thermore, the effort of managing a small population can be much lower than that of
managing a large, panmictic population, as some operations require time that grows
superlinearly with the population size. Also, a small population is more likely to fit
into a cache than a big one. For EA speedups, a linear speedup in the size of the
population or even a superlinear speedup have been reported [2,48], which means
that the total execution time across all machines may be even lower than that for
its sequential counterpart. Diversity is also an advantage, since the subpopulations
evolve independently for certain periods of time. An island distributed EA is often
synchronous, in that the best individual on each island propagates to all the other islands at a specific generation interval [127]. In asynchronous island models, an island
can receive migrated information as soon as it is ready.
A rigorous runtime analysis for island models is performed in [62]. A simple island
model with migration finds a global optimum in polynomial time, while panmictic
populations as well as island models without migration need exponential time, with
very high probability.
GENITOR II [117] is a coarse-grained parallel version of GENITOR. Individuals
migrate at fixed intervals to neighboring nodes, and immigrants replace the worst
individuals in the target deme. In an asynchronous parallel GA [74], each individual
of the population improves its fitness by hill-climbing.
In the parallel DE scheme [112], an entire subpopulation is mapped to a proces-
sor using island model, allowing different subpopulations to evolve independently
toward a solution. It is organized around one master node and m subpopulations
running each on one node, and organized as a unidirectional ring, as shown in
Figure 8.6. Migrating individuals pass through the master node. This method
is improved in [116].
In religion-based EAs [113], individuals are allowed to move around and interact
with one another as long as they do not violate the religion membership rules. Mating
is prohibited among individuals of different religions and exchange of individuals
between religions is provided only via religious conversion. Briefly, the religious
rules include commitments to reproduce, to believe in no other religion and to con-
vert nonbelievers. Like other structured-population GAs, genetic information spreads only slowly, due to the spatial topology of the population model.
In multiagent GA [128], each agent represents a candidate solution, and has its
own purpose and behaviors and can also use knowledge. An agent interacts with its
neighbors by transferring information. In this manner, the information is diffused to
the whole agent lattice. Four evolutionary operators are designed: The neighborhood
competition operator and the neighborhood orthogonal crossover operator realize
the behaviors of competition and cooperation, respectively; the mutation operator
and the self-learning (local search) operator realize the behaviors of making use of
knowledge. Theoretical analysis shows that multiagent GA converges to the global
optimum. Multiagent GA can find high-quality solutions at a computational cost
better than a linear complexity. Similar ideas are implemented in multiagent EA for
constraint satisfaction problems [67] and in multiagent EA for COPs [68].
The behavior of a three-dimensional cellular GA has been analyzed for different grid shapes and selection rates to investigate their influence on performance; based on this analysis, convergence-speed-guided three-dimensional cellular GA [7] dynamically balances the exploration and exploitation processes. A diversity speed measure is used to guide the algorithm.
Cooperative coevolution has been introduced into EAs for solving increasingly com-
plex optimization problems through a divide-and-conquer paradigm. In the cooper-
ative coevolution model [89,90], each subcomponent is evolved in a genetically
isolated subpopulation (species). These species cohabit in an ecosystem where each
of them occupies a niche. These species collaborate with one another. Species are
evolved in separate instances of an EA executing in parallel. The individuals are eval-
uated in collaboration with the best individuals of the other species. Credit assign-
ment at the species level is defined in terms of the fitness of the complete solutions
in which the species members participate. The evolution of each species is handled
by a standard EA.
A key issue in cooperative coevolution is the task of problem decomposition. An
automatic decomposition strategy called differential grouping [80] can uncover the
underlying interaction structure of the decision variables and form subcomponents
such that the interdependence between them is kept to a minimum. In [38], the inter-
dependencies among variables are captured by a fast search operator, and problem
decomposition is then performed.
Another key issue involved is the optimization of the subproblems. In [38], a
cross-cluster mutation strategy is utilized to enhance exploitation and exploration.
More specifically, each operator is identified as exploitation-biased or exploration-
biased. The population is divided into several clusters. For individuals within each
cluster, exploitation-biased operators are applied. For individuals among different
clusters, exploration-biased operators are applied. These operators are incorporated
into DE. A cooperative coevolution GP is given in [60].
Although clusters, computing grids [35], and peer-to-peer networks [119] have been widely used as physical platforms for distributed algorithms, the implementation of distributed EAs on cloud platforms has received increasing attention since 2008.
Cloud computing is an emerging technology that is now a commercial reality. Cloud
computing represents a pool of virtualized computer resources. It utilizes virtualiza-
tion and autonomic computing techniques to realize dynamic resource allocations.
MapReduce [30] is a programming model for accessing and processing of scal-
able data with parallel and distributed algorithms. It has been applied in various web-
scale and cloud computing applications. Hadoop is a popular Java-based open-source
clone of Google’s private MapReduce infrastructure. The MapReduce infrastructure
provides all the functional components including communications, load balancing,
fault-tolerance, resource allocation, and file distribution. A user needs only to imple-
ment the map and the reduce functions. Thus, the user needs to focus on the problem
and algorithm only.
As an on-demand computing paradigm, cloud computing is well suited to building highly scalable and cost-effective distributed EA systems for solving problems with variable demands. The cloud computing paradigm prefers availability to efficiency, and hence is more suitable for business and engineering applications. The speedup and distributed efficiency of distributed EAs deployed on clouds
tions. The speedup and distributed efficiency of distributed EAs deployed on clouds
are lower than those deployed on clusters and computing grids, due to the higher
communication overhead. As a pool-based model, the set of participating proces-
sors can be dynamically changed, which enables the algorithms to achieve superior
performance.
Studies in cloud storage are mainly related to content delivery or designing data
redundancy schemes to ensure information integrity. The public FluidDB platform
is a structured storage system. It is an ideal candidate for acting as the substrate of a
persistent or pool-based EA, leading to fluid EA [71].
A cloud-computing-based EA uses a synchronous storage service as pool for
information exchange among population of solutions [72]. It uses free cloud storage
as a medium for holding distributed evolutionary computation, in a parasitic way.
In parasitic computing [11], one machine forces target computers to solve a piece
of a complex computational task merely by engaging them in standard communica-
tion, and the target computers are unaware of having performed computation for a
commanding node.
The computing power of GPUs can also be used to implement other distributed
population-based metaheuristic models such as the fine-grain parallel fitness evalu-
ation [16], and parallel implementations of ACO [10], and PSO [18].
A comparison of these distributed EAs is given in Table 8.1.
8.4 Coevolution
Cooperative Coevolution
Cooperative coevolution is inspired by the ecological relationship of symbiosis,
where different species live together in a mutually beneficial relationship. The rela-
tion between butterflies and plants is an example of coevolution. Cooperative coevolutionary algorithms apply a divide-and-conquer approach to simplify the search. Such an algorithm divides an optimization prob-
lem into many modules, evolves each module separately using a species, and then
combines them together to form the whole system [90]. The fitness of an individual
depends on its ability to collaborate with individuals from other species. Each pop-
ulation evolves individuals representing a component of the final solution. Thus, a
full solution is obtained by joining an individual chosen from each population. In
this way, increases in a collaborative fitness value are shared between individuals of
all the populations of the algorithm [90].
Cooperative coevolutionary algorithm [89] decomposes a high-dimensional prob-
lem into multiple lower dimensional subproblems, and tackles each subproblem sep-
arately by a subpopulation. An overall solution can be derived from a combination
of subsolutions, which are evolved from individual subpopulations. The cooperative
coevolution framework is applied to PSO in [114].
The method performs poorly on nonseparable problems, because the interde-
pendencies among different variables cannot be captured well enough. Existing
algorithms perform poorly on nonseparable problems with 100 or more real-valued
variables [123]. Theoretical and empirical arguments show that cooperative coevo-
lutionary algorithms tend to converge to suboptimal solutions in the search space.
An extended formal model for cooperative coevolutionary algorithms, under specific
conditions, can be guaranteed to converge to the globally optimal solution [83].
Teacher-learner type coevolutionary algorithms are a popular approach for imple-
menting active learning, where active learning is divided into two complementary
subproblems: one population infers models using a dynamic dataset while the second
adds to the dataset by designing experiments that disambiguate the current candidate
models. Each EA leverages the advancements in its counterpart to achieve superior
results in a unified active learning framework [15].
Competitive Coevolution
Competitive coevolution resembles predator–prey or host–parasite interactions,
where predators (or hosts) implement the potential solutions to the optimization prob-
lem, while the prey (or parasites) determine individual fitness. In competitive coevolutionary
optimization, there are usually two independently evolving populations of hosts and
parasites, and an inverse fitness interaction exists between the two subpopulations.
To survive, the losing subpopulation adapts to counter the winning subpopulation in
order to become the new winner. The individuals of each population compete with
one another. This competition is usually represented by a decrease in the fitness value
of an individual when the fitness value of its antagonist increases [95].
Cooperative–Competitive Coevolution
Cooperative–competitive coevolution paradigm, which tries to achieve the advan-
tages of cooperation and competition at different levels of the model, has been suc-
cessfully employed in dynamic multiobjective optimization [39]. Multiple-species
models have also been used to evolve coadapted subcomponents. Because the host
and parasite species are genetically isolated and only interact through their fitness
functions, they are full-fledged species in a biological sense.
By a security strategy, a predator’s fitness is assigned with respect to the prey that
performs best against it, that is, by maximizing
$$G(s) = \min_{x \in P_X} f(x, s). \qquad (8.5)$$
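Assuming P_X denotes the current prey population (as suggested by the surrounding text; the toy payoff function below is invented purely for illustration), a predator's security-strategy fitness can be computed as:

```python
def security_fitness(predator, prey_population, f):
    """Worst-case payoff of a predator s over the prey population P_X, as in (8.5)."""
    return min(f(x, predator) for x in prey_population)

# toy payoff: the predator s scores higher the closer it is to a prey x
f = lambda x, s: -abs(x - s)
prey = [0.1, 0.5, 0.9]
for s in [0.0, 0.5, 1.0]:
    print(s, security_fitness(s, prey, f))   # s = 0.5 has the best worst case (-0.4)
```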
In alternating coevolutionary GA [12], the evolution of the two populations is
staggered. It assigns fitness to preys and predators alternatively. The populations are
randomly initialized and evaluated against each other. The algorithm then fixes the
prey population, while it evolves the predator population for several generations.
Then, it fixes the predator population and evolves the prey population for several
generations. This process repeats a fixed number of times. Parallel coevolutionary
GA [45] is similar to alternating coevolutionary GA except that the two populations
evolve simultaneously. The two populations are randomly initialized and evaluated
against each other. There is no fixed fitness landscape. Alternating coevolutionary
PSO [103] is the same as alternating coevolutionary GA except that it is implemented
using PSO. In [61], an approach based on coevolutionary PSO is applied to solve
minimax problems. Two populations of independent PSO using Gaussian distribution
are evolved: one for the variable vector and the other for the Lagrange multiplier
vector. A method of solving general minimax optimization problems using GA is
proposed in [26].
Many design tasks require human evaluation, for example artistic or aesthetic design, control for virtual reality or comfort, signal processing to increase visibility or audibility, and applications in engineering, edutainment, and other fields. For
domains in which fitness is subjective or difficult to formalize (e.g., for aesthetic
appeal), interactive evolutionary computation (IEC) is an approach to evolutionary
computation in which human evaluation replaces the fitness function. IEC is applied
to human–computer interaction to optimize a target system based on human subjec-
tive evaluation [111].
Genetic art encompasses a variety of digital media, including images, movies,
three-dimensional models, and music [14]. GenJam is an IEC system for evolving
improvisational jazz music [14]. IEC has been applied to police face sketching [19].
In [44], IEC is applied to particle system effects for generating special effects in
computer graphics.
A typical IEC application presents the current generation of solutions, which may
be in the form of sounds or images, to the user. The user interactively gives his or
her subjective evaluations as numerical inputs, based on which EA generates new
parameters for candidate solutions as the next generation of solutions. The parameters
of the target system are optimized toward each user’s preference by iterating this
procedure. In each generation, the user selects the most promising designs, which
are then mated and mutated to create the next generation. This initial population is
evolved through a process similar to domesticated animal and plant breeding.
IEC is often limited by human fatigue. An IEC process usually lasts dozens
of generations for a single user [111]. Collaborative interactive evolution systems
[110] involve multiple users in one IEC application, working to create products with
broader appeal and greater significance. Users vote on a particular individual selected
by the system. To overcome user fatigue, the system combines these inputs to form
a fitness function for GA. GA then evolves an individual to meet the combined user
requirements. Imagebreeder system (http://www.imagebreeder.com/) also offers an
online community coupled with an IEC client for evolving images.
using the model as a surrogate for an expensive fitness function. Fitness models have
also been applied to handle noisy fitness functions, smooth multimodal landscapes,
and define a continuous fitness in domains that lack an explicit fitness (e.g., evolving
art and music).
Fitness Inheritance
An approach to function approximation is fitness inheritance. By fitness inheritance,
an offspring inherits a fitness value from its parents rather than through function
evaluations. An individual is evaluated indirectly by interpolating the fitness of their
parents. In [104], fitness inheritance can be implemented by taking the average fitness
of the two parents or by taking a weighted average. Convergence time and population
sizing of EAs with fitness inheritance are derived for single-objective GAs in [100].
In [99], the fitness of a child is the weighted sum of its parents' fitnesses; a fitness and an associated reliability value are assigned to each new individual, which is evaluated using the true fitness function only if the reliability value is below a certain threshold.
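A minimal sketch of this idea; the helper names, the inheritance probability, and the toy true fitness are hypothetical and not taken from [99] or [104]:

```python
import random

def inherited_fitness(f1, f2, w=0.5):
    """Weighted average of the parents' fitness values."""
    return w * f1 + (1.0 - w) * f2

def evaluate_child(child_x, parent_fits, true_fitness, p_inherit=0.8):
    """With probability p_inherit the child inherits its fitness from its parents;
    otherwise the expensive true fitness function is called."""
    if random.random() < p_inherit:
        return inherited_fitness(*parent_fits), True    # inherited estimate
    return true_fitness(child_x), False                  # exact evaluation

fit, inherited = evaluate_child(1.5, (3.0, 5.0), true_fitness=lambda x: x ** 2)
print(fit, inherited)
```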
An exact evaluation of fitness may not be necessary as long as a proper rank is
approximately preserved. By using fitness granulation via an adaptive fuzzy simi-
larity analysis, the number of fitness evaluations can be reduced [1]. An individual’s
fitness is computed only if it has insufficient similarity to a pool of fuzzy granules
whose fitness has already been computed.
Fitness Approximation by Metamodeling
Fitness approximation can be obtained through metamodeling [81,92,93]. Data collected for all previously evaluated points can be used during the evolution to build metamodels, and the cost of training a metamodel depends on its type and the training set size. Many statistical models, such as Bayesian interpolation and neural network models [81], can be used to construct surrogate models.
Screening methods also consider the confidence of the predicted output [34,57,
92]. Among the previously evaluated points, the less promising generation members
are screened out, and expensive evaluations are only necessary for the most promising
population members. For multimodal problems and in multiobjective optimization,
the confidence information provided by Bayesian interpolation should be used in
order to boost evaluations toward less explored regions. In EAs assisted by local
Bayesian interpolation [34], predictions and their confidence intervals predicted by
Bayesian interpolation are used by EA. It selects the promising members in each
generation and carries out exact, costly evaluations only for them.
In [129], a data parallel Gaussian process-based global surrogate model and a
Lamarckian evolution-based neural network local metamodel are combined in a
hierarchical framework to accelerate convergence.
Efficient global optimization [57] makes use of a Gaussian process to model the
search landscape from solutions visited during the search. It does not just choose the
solution that the model predicts would minimize the cost. Rather, it automatically
balances exploitation and exploration. The method uses a closed-form expression
for the expected improvement, and it is thus possible to search the decision space
globally for the solution that maximizes this.
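For a Gaussian process posterior with mean μ(x) and standard deviation σ(x), and best cost found so far f_min, the standard closed-form expected improvement for minimization is (stated here for completeness; the book's exact notation may differ):

$$\mathrm{EI}(\boldsymbol{x}) = \left(f_{\min} - \mu(\boldsymbol{x})\right)\Phi(z) + \sigma(\boldsymbol{x})\,\phi(z), \qquad z = \frac{f_{\min} - \mu(\boldsymbol{x})}{\sigma(\boldsymbol{x})},$$

where Φ and φ are the standard normal cumulative distribution and density functions, and EI(x) = 0 when σ(x) = 0. Candidate points are then chosen to maximize EI(x) over the decision space.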
can be resolved by sorting the strings appropriately before crossover. When evolving
the architecture of the network, crossover is usually avoided and only mutations are
adopted.
Coding of network parameters is critical in view of the convergence speed of
search. Each instance of the neural network is encoded by the concatenation of all
the network parameters in one chromosome. A heuristic concerning the order of the
concatenation of the network parameters is to put connection weights terminating at
the same unit together.
The architecture of a neural network is referred to as its topological structure,
i.e., connectivity. Given certain performance criteria, such as minimal training error
and lowest network complexity, the performance levels of all architectures form a
discrete surface in the space due to a discrete number of nodes. The performance
surface is nondifferentiable and multimodal.
Direct and indirect encodings are used for encoding the architecture. For direct
encoding, every connection of the architecture is encoded into the chromosome. For
indirect encoding, only the most important parameters of the architecture, such as
the number of hidden layers and the number of hidden units in each hidden layer, are
encoded. Only the architecture of a network is evolved, whereas other parameters of
the architecture such as the connection weights have to be learned after a near-optimal
architecture is found.
In direct encoding, each parameter c_{ij}, the connectivity from node i to node j, can be represented by a bit denoting the presence or absence of a connection. An architecture of N_n nodes is represented by an N_n × N_n matrix C = [c_{ij}]. If c_{ij} is represented
by real-valued connection weights, both the architecture and connection weights
are evolved simultaneously. The binary string representing the architecture is the
concatenation of all the rows of the matrix. For a feedforward network, only the
upper triangle of the matrix will have nonzero entries, and thus only this part of the
connectivity matrix needs to be encoded into the chromosome. As an example, a
2-2-1 feedforward network is shown in Figure 8.8. Only the upper triangle of the
connectivity matrix is encoded in the chromosome, and we get “0110 110 01 1.”
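A short Python sketch reproducing this encoding for the 2-2-1 network of Figure 8.8 (the zero-based node indexing and helper name are assumptions made for illustration):

```python
import numpy as np

def encode_architecture(C):
    """Concatenate the strict upper triangle of the connectivity matrix, row by row,
    into a binary chromosome string."""
    n = C.shape[0]
    return ''.join(str(C[i, j]) for i in range(n) for j in range(i + 1, n))

# 2-2-1 feedforward network: nodes 0,1 inputs; 2,3 hidden; 4 output
C = np.zeros((5, 5), dtype=int)
for i, j in [(0, 2), (0, 3), (1, 2), (1, 3), (2, 4), (3, 4)]:
    C[i, j] = 1
print(encode_architecture(C))   # '0110110011', i.e. "0110 110 01 1"
```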
To evaluate the fitness of a chromosome, the chromosome must first be converted back into a neural network, which is then initialized with random weights and trained. The training error is used to measure the fitness. In this way, EAs can explore all possible connectivities.
The direct encoding scheme suffers from poor scalability: a large network requires a very large matrix, and thus the computation time of the evolution increases. Prior knowledge can be used to reduce the size of the matrix. For example, in multilayer perceptrons two adjacent layers are fully connected, and therefore the architecture can be encoded by the number of hidden layers and the number of hidden units in each layer. This leads to indirect encoding.
Indirect encoding can effectively reduce the chromosome length of the architec-
ture by encoding only some characteristics of the architecture. The details of each
connection are either predefined or specified by some rules. Indirect encoding may
not be very good at finding a compact network with good generalization ability.
Each network architecture may be encoded by a chromosome consisting of a set
of parameters such as the number of hidden layers, the number of hidden nodes in
each layer, and the number of connections between two layers. In this case, EAs can
only search a limited subset of the whole feasible architecture space. This parametric
representation method is most suitable when the type of architecture is known.
One major problem with the evolution of architectures without connection weights
is noisy fitness evaluation [124]. The noise is dependent on the random initialization
of the weights and the training algorithm used. The noise identified is caused by
the one-to-many mapping from genotypes to phenotypes. This drawback can be
alleviated by simultaneously evolving network architectures and connection weights.
The activation function for each neuron can be evolved by symbolic regression
among some popular nonlinear functions such as the Heaviside, sigmoidal, and
Gaussian functions during the learning period.
Example 8.3: We consider the iris classification problem. The iris data set has 150
patterns belonging to 3 classes, shown in Figure 8.9. Each pattern has four numeric
properties. We use a 4-4-3 multilayer perceptron to learn this problem, with three dis-
crete values representing different classes. The logistic sigmoidal function is selected for the hidden neurons and a linear function is used for the output neurons. We use a GA to train the neural network, hoping to find a globally optimal solution for the weights.
There are a total of 28 weights in the network, which are encoded as a string of 28 numbers. The fitness function is defined as $f = \frac{1}{1+E}$, where E is the training error, that is, the mean squared error. Real encoding is employed. A fixed population size of 20 is used. The selection scheme is roulette-wheel selection. Only mutation is employed: one random gene of a chromosome is mutated by adding Gaussian
Figure 8.9 The iris data set: scatter plots of the patterns, (a) in the (x1, x2) plane and (b) in the (x3, x4) plane.
Figure 8.10 The evolution of the real-coded GA for training a 4-4-3 MLP: the best and average fitness. t corresponds to the number of generations.
noise with variance $\sigma = \sigma_0\left(1 - \frac{t}{T}\right) + \sigma_1$. The initial population is randomly generated with all genes of the chromosomes as random numbers in (0, 1). $\sigma_0$ and $\sigma_1$ are selected as 10 and 0.5, respectively. An elitism strategy is adopted. The results of a typical random run are shown in Figures 8.10 and 8.11. The computation time is 461.66 s for 500 generations. Although the training error is relatively large, E = 2.9171, the rate of correct classification on the training examples is 96.67%.
In the above implementation, the selection of variance σ is of vital importance. In
ESs, σ itself is evolved, and some other measures beneficial to numerical optimization
are also used.
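A minimal sketch of the GA loop described in this example is given below. The iris data and the MLP forward pass are omitted; trainError is a hypothetical stand-in for the routine that decodes the 28 weights, evaluates the 4-4-3 network on the training set, and returns the mean squared error E.

```matlab
% Sketch: real-coded GA with roulette-wheel selection, single-gene Gaussian
% mutation with a decaying noise scale, and elitism, as described in Example 8.3.
trainError = @(w) sum(w.^2) / numel(w);   % placeholder objective (assumption)

NP = 20; n = 28; T = 500;                 % population size, genes, generations
sigma0 = 10; sigma1 = 0.5;
pop = rand(NP, n);                        % initial genes in (0, 1)

for t = 1:T
  E   = arrayfun(@(i) trainError(pop(i,:)), 1:NP);
  fit = 1 ./ (1 + E);                     % fitness f = 1/(1 + E)
  [~, best] = max(fit);

  % Roulette-wheel selection of the next generation.
  prob = fit / sum(fit);
  cp   = cumsum(prob);
  idx  = arrayfun(@(r) find(cp >= r, 1), rand(1, NP));
  newpop = pop(idx, :);

  % Mutate one random gene of each chromosome; sigma is the decaying noise scale.
  sigma = sigma0 * (1 - t/T) + sigma1;
  for i = 1:NP
    j = randi(n);
    newpop(i, j) = newpop(i, j) + sigma * randn;
  end

  newpop(1, :) = pop(best, :);            % elitism: keep the best chromosome
  pop = newpop;
end
```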
Figure 8.11 The evolution of real-coded GA for training a 4-4-3 MLP: the training error. t corre-
sponds to the number of generations.
References
1. Akbarzadeh-T M-R, Davarynejad M, Pariz N. Adaptive fuzzy fitness granulation for evolu-
tionary optimization. Int J Approx Reason. 2008;49:523–38.
2. Alba E. Parallel evolutionary algorithms can achieve superlinear performance. Inf Process
Lett. 2002;82(1):7–13.
3. Alba E, Dorronsoro B. The exploration/exploitation tradeoff in dynamic cellular evolutionary
algorithms. IEEE Trans Evol Comput. 2005;9(2):126–42.
4. Alba E, Tomassini M. Parallelism and evolutionary algorithms. IEEE Trans Evol Comput.
2002;6(5):443–62.
5. Al-Madi NA. De Jong’s sphere model test for a human community based genetic algorithm
model (HCBGA). Int J Adv Comput Sci Appl. 2014;5(1):166–72.
6. Al-Madi NA, Khader AT. A social based model for genetic algorithms. In: Proceedings of the
3rd international conference on information technology (ICIT), Amman, Jordan, May 2007.
p. 23–27
7. Al-Naqi A, Erdogan AT, Arslan T. Adaptive three-dimensional cellular genetic algorithm for
balancing exploration and exploitation processes. Soft Comput. 2013;17:1145–57.
8. Arora R, Tulshyan R, Deb K. Parallelization of binary and real-coded genetic algorithms on
GPU using CUDA. In: Proceedings of IEEE world congress on computational intelligence,
Barcelona, Spain, July 2010. p. 3680–3687.
9. Arsuaga-Rios M, Vega-Rodriguez MA. Multiobjective energy optimization in grid systems
from a brain storming strategy. Soft Comput. 2015;19:3159–72.
10. Bai H, Ouyang D, Li X, He L, Yu H. MAX-MIN ant system on GPU with CUDA. In:
Proceedings of the IEEE 4th international conference on innovative computing, information
and control (ICICIC), Kaohsiung, Taiwan, Dec 2009. p. 801–804.
11. Barabasi AL, Freeh VW, Jeong H, Brockman JB. Parasitic computing. Nature.
2001;412(6850):894–7.
12. Barbosa HJC. A genetic algorithm for min-max problems. In: Proceedings of the 1st inter-
national conference on evolutionary computation and applications, Moscow, Russia, 1996. p.
99–109.
13. Beyer H-G. An alternative explanation for the manner in which genetic algorithms operate.
Biosystems. 1997;41(1):1–15.
14. Biles J. Genjam: a genetic algorithm for generating jazz solos. In: Proceedings of international
computer music conference, Arhus, Denmark, 1994. p. 131–137.
15. Bongard J, Zykov V, Lipson H. Resilient machines through continuous self-modeling. Science.
2006;314(5802):1118–21.
16. Bozejko W, Smutnicki C, Uchronski M. Parallel calculating of the goal function in meta-
heuristics using GPU. In: Proceedings of the 9th international conference on computational
science, Baton Rouge, LA, USA, May 2009, vol. 5544 of Lecture Notes in Computer Science.
Berlin: Springer; 2009. p. 1014–2023.
17. Brownlee AEI, McCall JAW, Zhang Q. Fitness modeling with Markov networks. IEEE Trans
Evol Comput. 2013;17(6):862–79.
18. Calazan RM, Nedjah N, De Macedo Mourelle L. Parallel GPU-based implementation of high
dimension particle swarm optimizations. In: Proceedings of the IEEE 4th Latin American
symposium on circuits and systems (LASCAS), Cusco, Peru, Feb 2013. p. 1–4.
19. Caldwell C, Johnston VS. Tracking a criminal suspect through “face-space” with a genetic
algorithm. In: Proceedings of the 4th international conference on genetic algorithms, San
Diego, CA, USA, July 1991. San Diego, CA: Morgan Kaufmann; 1991. p. 416–421
20. Candan C, Dreo J, Saveant P, Vidal V. Parallel divide-and-evolve: experiments with Open-MP
on a multicore machine. In: Proceedings of GECCO, Dublin, Ireland, July 2011. p. 1571–
1578.
21. Cerf R. Asymptotic convergence of genetic algorithms. Adv Appl Probab. 1998;30(2):521–50.
22. Cheang SM, Leung KS, Lee KH. Genetic parallel programming: design and implementation.
Evol Comput. 2006;14(2):129–56.
23. Collet P, Lutton E, Schoenauer M, Louchet J. Take it EASEA. In: Proceedings of the 6th
international conference on parallel problem solving from nature (PPSN VI), Paris, France,
Sept 2000, vol. 1917 of Lecture Notes in Computer Science. London: Springer; 2000. p.
891–901
24. Collins RJ, Jefferson DR. Selection in massively parallel genetic algorithms. In: Belew RK,
Booker LB, editors. Proceedings of the 4th international conference on genetic algorithms,
San Diego, CA, USA, July 1991. San Diego, CA: Morgan Kaufmann; 1991. p. 249–256.
25. Corno F, Reorda M, Squillero G. The selfish gene algorithm: a new evolutionary optimization
strategy. In: Proceedings of the 13th annual ACM symposium on applied computing (SAC),
Atlanta, Georgia, USA, 1998. p. 349–355.
26. Cramer AM, Sudhoff SD, Zivi EL. Evolutionary algorithms for minimax problems in robust
design. IEEE Trans Evol Comput. 2009;13(2):444–53.
27. Dawkins R. The selfish gene. Oxford: Oxford University Press; 1989.
28. De Jong K. An analysis of the behavior of a class of genetic adaptive systems. PhD Thesis,
University of Michigan, Ann Arbor, 1975.
29. de Veronese PL, Krohling RA. Differential evolution algorithm on the GPU with C-CUDA.
In: Proceedings of IEEE world congress on computational intelligence, Barcelona, Spain,
July 2010. p. 1878–1884.
30. Dean J, Ghemawat S. MapReduce: simplified data processing on large clusters. In: Proceed-
ings of the 6th symposium on operating system design and implementation (OSDI), San
Francisco, CA, 2004. p. 137–147.
31. Droste S, Jansen T, Wegener I. On the analysis of the (1+1) evolutionary algorithm. Theor
Comput Sci. 2002;276:51–81.
32. Du K-L, Swamy MNS. Neural networks and statistical learning. London: Springer; 2014.
33. Eiben AE, Aarts EHL, Van Hee KM. Global convergence of genetic algorithms: a Markov
chain analysis. In: Proceedings of the 1st workshop on parallel problem solving from nature
(PPSN I), Dortmund, Germany, Oct 1990. Berlin: Springer; 1991. p. 3–12.
34. Emmerich MTM, Giannakoglou KC, Naujoks B. Single- and multiobjective evolutionary
optimization assisted by Gaussian random field metamodels. IEEE Trans Evol Comput.
2006;10(4):421–39.
35. Ewald G, Kurek W, Brdys MA. Grid implementation of a parallel multiobjective genetic algo-
rithm for optimized allocation of chlorination stations in drinking water distribution systems:
Chojnice case study. IEEE Trans Syst Man Cybern Part C. 2008;38(4):497–509.
36. Fok K-L, Wong T-T, Wong M-L. Evolutionary computing on consumer graphics hardware.
IEEE Intell Syst. 2007;22:69–78.
37. Folino G, Pizzuti C, Spezzano G. A scalable cellular implementation of parallel genetic
programming. IEEE Trans Evol Comput. 2003;7(1):37–53.
38. Ge H, Sun L, Yang X, Yoshida S, Liang Y. Cooperative differential evolution with fast variable
interdependence learning and cross-cluster mutation. Appl Soft Comput. 2015;36:300–14.
39. Goh C-K, Tan KC. A competitive-cooperative coevolutionary paradigm for dynamic multi-
objective optimization. IEEE Trans Evol Comput. 2009;13(1):103–27.
40. Goldberg DE. Genetic algorithms in search, optimization, and machine learning. Reading,
MA, USA: Addison-Wesley; 1989.
41. Goldberg DE, Deb K, Korb B. Messy genetic algorithms: motivation, analysis, and first results.
Complex Syst. 1989;3:493–530.
42. Gong Y-J, Chen W-N, Zhan Z-H, Zhang J, Li Y, Zhang Q, Li J-J. Distributed evolutionary
algorithms and their models: a survey of the state-of-the-art. Appl Soft Comput. 2015;34:286–
300.
43. Grefenstette JJ. Deception considered harmful. In: Whitley LD, editor. Foundations of genetic
algorithms, vol. 2. Morgan Kaufmann: San Mateo, CA; 1993. p. 75–91.
44. Hastings EJ, Guha RK, Stanley KO. Interactive evolution of particle systems for computer
graphics and animation. IEEE Trans Evol Comput. 2009;13(2):418–32.
45. Herrmann JW. A genetic algorithm for minimax optimization problems. In: Proceedings of
the congress on evolutionary computation (CEC), Washington DC, July 1999, vol. 2. p. 1099–
1103.
46. He J, Yao X. Drift analysis and average time complexity of evolutionary algorithms. Artif
Intell. 2001;127:57–85.
47. He J, Yao X. From an individual to a population: an analysis of the first hitting time of
population-based evolutionary algorithms. IEEE Trans Evol Comput. 2002;6(5):495–511.
48. He J, Yao X. Analysis of scalable parallel evolutionary algorithms. In: Proceedings of the
IEEE congress on evolutionary computation (CEC), Vancouver, BC, Canada, July 2006. p.
120–127.
49. He J, Yu X. Conditions for the convergence of evolutionary algorithms. J Syst Arch.
2001;47(7):601–12.
50. Holland J. Adaptation in natural and artificial systems. Ann Arbor, Michigan: University of
Michigan Press; 1975.
51. Holland JH. Building blocks, cohort genetic algorithms and hyperplane-defined functions.
Evol Comput. 2000;8(4):373–91.
52. Horn J. Finite Markov chain analysis of genetic algorithms with niching. In: Proceedings of
the 5th international conference on genetic algorithms, Urbana, IL, July 1993. San Francisco,
CA: Morgan Kaufmann Publishers; 1993. p. 110–117
53. Jansen T, De Jong KA, Wegener I. On the choice of the offspring population size in evolu-
tionary algorithms. Evol Comput. 2005;13(4):413–40.
54. Jansen T, Wegener I. The analysis of evolutionary algorithms—a proof that crossover really
can help. Algorithmica. 2002;33:47–66.
55. Jin H, Frumkin M, Yan J. The OpenMP implementation of NAS parallel benchmarks and its
performance. MRJ Technology Solutions, NASA Contract NAS2-14303, Moffett Field, CA,
Oct 1999.
56. Jin Y, Sendhoff B. Reducing fitness evaluations using clustering techniques and neural network
ensembles. In: Proceedings of genetic and evolutionary computation, Seattle, WA, USA, July
2004. p. 688–699.
57. Jones DR, Schonlau M, Welch WJ. Efficient global optimization of expensive black-box
functions. J Global Optim. 1998;13(4):455–92.
58. Kim H-S, Cho S-B. An efficient genetic algorithms with less fitness evaluation by clustering.
In: Proceedings of IEEE congress on evolutionary computation (CEC), Seoul, Korea, May
2001. p. 887–894.
59. Koza JR. Genetic programming: on the programming of computers by means of natural
selection. Cambridge, MA: MIT Press; 1992.
60. Krawiec K, Bhanu B. Coevolution and linear genetic programming for visual learning. In:
Proceedings of genetic and evolutionary computation conference (GECCO), Chicago, Illinois,
USA, vol. 2723 of Lecture Notes of Computer Science. Berlin: Springer; 2003. p. 332–343
61. Krohling RA, Coelho LS. Coevolutionary particle swarm optimization using Gaussian distri-
bution for solving constrained optimization problems. IEEE Trans Syst Man Cybern Part B.
2006;36(6):1407–16.
62. Lassig J, Sudholt D. Design and analysis of migration in parallel evolutionary algorithms.
Soft Comput. 2013;17:1121–44.
63. Lastra M, Molina D, Benitez JM. A high performance memetic algorithm for extremely
high-dimensional problems. Inf Sci. 2015;293:35–58.
64. Lehman J, Stanley KO. Abandoning objectives: evolution through the search for novelty
alone. Evol Comput. 2011;19(2):189–223.
65. Lehre PK, Yao X. On the impact of mutation-selection balance on the runtime of evolutionary
algorithms. IEEE Trans Evol Comput. 2012;16(2):225–41.
66. Leung Y, Gao Y, Xu Z-B. Degree of population diversity: a perspective on premature con-
vergence in genetic algorithms and its Markov chain analysis. IEEE Trans Neural Netw.
1997;8(5):1165–76.
67. Liu J, Zhong W, Jiao L. A multiagent evolutionary algorithm for constraint satisfaction prob-
lems. IEEE Trans Syst Man Cybern Part B. 2006;36(1):54–73.
68. Liu J, Zhong W, Jiao L. A multiagent evolutionary algorithm for combinatorial optimization
problems. IEEE Trans Syst Man Cybern Part B. 2010;40(1):229–40.
69. Mallipeddi R, Lee M. An evolving surrogate model-based differential evolution algorithm.
Appl Soft Comput. 2015;34:770–87.
70. Manderick B, Spiessens P. Fine-grained parallel genetic algorithms. In: Schaffer JD, editor.
Proceedings of the 3rd international conference on genetic algorithms, Fairfax, Virginia, USA,
June 1989. San Mateo, CA: Morgan Kaufmann; 1989. p. 428–433.
71. Merelo-Guervos JJ. Fluid evolutionary algorithms. In: Proceedings of IEEE congress on
evolutionary computation, Barcelona, Spain, July 2010. p. 1–8.
72. Meri K, Arenas MG, Mora AM, Merelo JJ, Castillo PA, Garcia-Sanchez P, Laredo JLJ. Cloud-
based evolutionary algorithms: an algorithmic study. Natural Comput. 2013;12(2):135–47.
73. Meyer-Spradow J, Loviscach J. Evolutionary design of BRDFs. In: Chover M, Hagen H, Tost
D, editors. Eurographics 2003 short paper proceedings. Spain: Granada; 2003. p. 301–6.
74. Muhlenbein H. Parallel genetic algorithms, population genetics and combinatorial optimiza-
tion. In: Schaffer JD, editor. Proceedings of the 3rd international conference on genetic
algorithms, Fairfax, Virginia, USA, June 1989. San Mateo, CA: Morgan Kaufman; 1989.
p. 416–421.
75. Muhlenbein H, Schomisch M, Born J. The parallel genetic algorithm as a function optimizer.
In: Proceedings of the 4th international conference on genetic algorithms, San Diego, CA,
July 1991. p. 271–278.
76. Munawar A, Wahib M, Munawar A, Wahib M. Theoretical and empirical analysis of a GPU
based parallel Bayesian optimization algorithm. In: Proceedings of IEEE international confer-
ence on parallel and distributed computing, applications and technologies, Higashi Hiroshima,
Japan, Dec 2009. p. 457–462.
77. Nara K, Takeyama T, Kim H. A new evolutionary algorithm based on sheep flocks hered-
ity model and its application to scheduling problem. In: Proceedings of IEEE international
conference on systems, man, and cybernetics, Tokyo, Japan, Oct 1999, vol. 6. p. 503–508.
78. Niwa T, Iba H. Distributed genetic programming: empirical study and analysis. In: Proceedings
of the 1st annual conference on genetic programming, Stanford University, CA, USA, July
1996. p. 339–344.
79. Nix AE, Vose MD. Modeling genetic algorithms with Markov chains. Ann Math Artif Intell.
1992;5:79–88.
80. Omidvar MN, Li X, Mei Y, Yao X. Cooperative co-evolution with differential grouping for
large scale optimization. IEEE Trans Evol Comput. 2014;18(3):378–93.
81. Ong YS, Nair PB, Kean AJ. Evolutionary optimization of computationally expensive problems
via surrogate modeling. AIAA J. 2003;41(4):687–96.
82. O’Reilly UM, Oppacher F. The troubling aspects of a building-block hypothesis for genetic
programming. In: Whitley LD, Vose MD, editors. Foundations of genetic algorithm 3. San
Francisco, CA: Morgan Kaufmann; 1995. p. 73–88
83. Panait L. Theoretical convergence guarantees for cooperative coevolutionary algorithms. Evol
Comput. 2010;18(4):581–615.
84. Poli R. Parallel distributed genetic programming. In: Come D, Dorigo M, Glover F, editors.
New ideas in optimization. New York: McGraw-Hill; 1999.
85. Poli R. Exact schema theory for GP and variable-length GAs with one-point crossover. Genetic
Progr Evol Mach. 2001;2:123–63.
86. Poli R, Langdon WB. Schema theory for genetic programming with one-point crossover and
point mutation. Evol Comput. 2001;6(3):231–52.
87. Poli R, McPhee NF. General schema theory for genetic programming with subtree-swapping
crossover: part i. Evol Comput. 2003;11(1):53–66.
88. Poli R, McPhee NF. General schema theory for genetic programming with subtree-swapping
crossover: part ii. Evol Comput. 2003;11(2):169–206.
89. Potter MA, de Jong KA. A cooperative coevolutionary approach to function optimization.
In: Proceedings of the 3rd conference on parallel problem solving from nature (PPSN III),
Jerusalem, Israel, Oct 1994. Berlin: Springer; 1994. p. 249–257.
90. Potter MA, De Jong KA. Cooperative coevolution: an architecture for evolving coadapted
subcomponents. Evol Comput. 2000;8(1):1–29.
91. Qi X, Palmieri F. Theoretical analysis of evolutionary algorithms with an infinite population
size in continuous space, part 1: basic properties of selection and mutation. IEEE Trans Neural
Netw. 1994;5(1):102–19.
92. Ratle A. Accelerating the convergence of evolutionary algorithms by fitness landscape approx-
imation. In: Parallel problem solving from nature (PPSN V), 1998. p. 87–96.
93. Regis RG, Shoemaker CA. Local function approximation in evolutionary algorithms for the
optimization of costly functions. IEEE Trans Evol Comput. 2004;8(5):490–505.
94. Reza A, Vahid Z, Koorush Z. MLGA: a multilevel cooperative genetic algorithm. In: Pro-
ceedings of the IEEE 5th international conference on bio-inspired computing: theories and
applications (BIC-TA), Changsha, China, Sept 2010. p. 271–277.
95. Rosin C, Belew R. New methods for competitive coevolution. Evol Comput. 1997;5(1):1–29.
96. Rudolph G. Convergence analysis of canonical genetic algorithm. IEEE Trans Neural Netw.
1994;5(1):96–101.
97. Rudolph G. Finite Markov chain results in evolutionary computation: a tour d’horizon. Fun-
damenta Informaticae. 1998;35:67–89.
98. Rudolph G. Self-adaptive mutations may lead to premature convergence. IEEE Trans Evol
Comput. 2001;5:410–4.
99. Salami M, Hendtlass T. A fast evaluation strategy for evolutionary algorithms. Appl Soft
Comput. 2003;2(3):156–73.
100. Sastry K, Goldberg DE, Pelikan M. Don’t evaluate, inherit. In: Proceedings of genetic evolu-
tionary computation conference (GECCO), San Francisco, CA, USA, July 2001. p. 551–558.
101. Schmidt MD, Lipson H. Coevolution of fitness predictors. IEEE Trans Evol Comput.
2008;12(6):736–49.
102. Schutte JF, Reinbolt JA, Fregly BJ, Haftka RT, George AD. Parallel global optimization with
the particle swarm algorithm. Int J Numer Methods Eng. 2004;61(13):2296–315.
103. Shi Y, Krohling RA. Co-evolutionary particle swarm optimization to solve min-max problems.
In: Proceedings of the congress on evolutionary computation (CEC), Honolulu, HI, May 2002,
vol. 2. p. 1682–1687.
104. Smith RE, Dike BA, Stegmann SA. Fitness inheritance in genetic algorithms. In: Proceedings
of ACM symposium on applied computing, Nashville, Tennessee, USA, 1995. p. 345–350.
105. Smith J, Vavak F. Replacement strategies in steady state genetic algorithms: static environ-
ments. In: Banzhaf W, Reeves C, editors. Foundations of genetic algorithms, vol. 5. CA:
Morgan Kaufmann; 1999. p. 219–233.
106. Stephens CR, Poli R. Coarse-grained dynamics for generalized recombination. IEEE Trans
Evol Comput. 2007;11(4):541–57.
107. Stephens CR, Waelbroeck H. Schemata evolution and building blocks. Evol Comput.
1999;7:109–29.
108. Sudholt D. A new method for lower bounds on the running time of evolutionary algorithms.
IEEE Trans Evol Comput. 2013;17(3):418–35.
109. Sudholt D. How crossover speeds up building-block assembly in genetic algorithms. Evol
Comput 2016.
110. Szumlanski SR, Wu AS, Hughes CE. Conflict resolution and a framework for collaborative
interactive evolution. In: Proceedings of the 21st national conference on artificial intelligence
(AAAI), Boston, Massachusetts, USA, July 2006. p. 512–517.
111. Takagi H. Interactive evolutionary computation: fusion of the capacities of EC optimization
and human evaluation. Proc IEEE. 2001;89(9):1275–96.
112. Tasoulis DK, Pavlidis NG, Plagianakos VP, Vrahatis MN. Parallel differential evolution. In:
Proceedings of the IEEE congress on evolutionary computation, Portland, OR, USA, June
2004. p. 2023–2029.
113. Thomsen R, Rickers P, Krink T. A religion-based spatial model for evolutionary algorithms.
In: Proceedings of the 6th international conference on parallel problem solving from nature
(PPSN VI), Paris, France, September 2000, vol. 1917 of Lecture Notes in Computer Science.
London: Springer; 2000. p. 817–826.
114. van den Bergh F, Engelbrecht A. A cooperative approach to particle swarm optimization.
IEEE Trans Evol Comput. 2004;8(3):225–39.
115. Vose M, Liepins G. Punctuated equilibria in genetic search. Complex Syst. 1991;5:31–44.
116. Weber M, Neri F, Tirronen V. Distributed differential evolution with explorative-exploitative
population families. Genetic Progr Evol Mach. 2009;10:343–471.
117. Whitley D, Starkweather T. GENITOR II: a distributed genetic algorithm. J Exp Theor Artif
Intell. 1990;2(3):189–214.
118. Whitley D, Yoo NW. Modeling simple genetic algorithms for permutation problems. In:
Whitley D, Vose M, editors. Foundations of genetic algorithms, vol. 3. San Mateo, CA:
Morgan Kaufmann; 1995. p. 163–184.
119. Wickramasinghe W, van Steen M, Eiben A. Peer-to-peer evolutionary algorithms with adaptive
autonomous selection. In: Proceedings of the 9th annual conference on genetic and evolu-
tionary computation (GECCO), London, U.K., July 2007. p. 1460–1467.
120. Wong M-L, Cui G. Data mining using parallel multiobjective evolutionary algorithms on
graphics hardware. In: Sobrevilla P, editors. Proceedings of IEEE world congress on compu-
tational intelligence, Barcelona, Spain, July 2010. p. 3815–3822.
121. Wong M-L, Wong T-T, Fok K-L. Parallel evolutionary algorithms on graphics processing
unit. In: Proceedings of the IEEE congress on evolutionary computation, Edinburgh, UK,
Sept 2005. p. 2286–2293.
122. Xu L, Zhang F. Parallel particle swarm optimization for attribute reduction. In: Proceedings
of the 8th ACIS international conference on software engineering, artificial intelligence, net-
working, and parallel/distributed computing, Qingdao, China, July 2007, vol. 1. p. 770–775.
123. Yang Z, Tang K, Yao X. Large scale evolutionary optimization using cooperative coevolution.
Inf Sci. 2008;178(15):2985–99.
124. Yao X, Liu Y. A new evolutionary system for evolving artificial neural networks. IEEE Trans
Neural Netw. 1997;8(3):694–713.
125. Yuen SY, Cheung BKS. Bounds for probability of success of classical genetic algorithm based
on Hamming distance. IEEE Trans Evol Comput. 2006;10(1):1–18.
126. Yu Y, Zhou Z-H. A new approach to estimating the expected first hitting time of evolutionary
algorithms. Artif Intell. 2008;172(15):1809–32.
127. Zhang C, Chen J, Xin B. Distributed memetic differential evolution with the synergy of
Lamarckian and Baldwinian learning. Appl Soft Comput. 2013;13(5):2947–59.
128. Zhong W, Liu J, Xue M, Jiao L. A multiagent genetic algorithm for global numerical opti-
mization. IEEE Trans Syst Man Cybern Part B. 2004;34(2):1128–41.
129. Zhou Z, Ong YS, Nair PB, Keane AJ, Lum KY. Combining global and local surrogate models
to accelerate evolutionary optimization. IEEE Trans Syst Man Cybern Part C. 2007;37(1):
66–76.
9 Particle Swarm Optimization
PSO can locate the region of the optimum faster than EAs, but once in this region
it progresses slowly due to the fixed velocity stepsize. Almost all variants of PSO
try to solve the stagnation problem. This chapter is dedicated to PSO as well as its
variants.
9.1 Introduction
The notion of employing many autonomous particles that act together in simple ways
to produce seemingly complex emergent behavior was initially considered to solve
the problem of rendering images in computer animations [79]. A particle system
stochastically generates a series of moving points. Each particle is assigned an initial
velocity vector. It may also have additional characteristics such as color, texture, and
limited lifetime. Iteratively, velocity vectors are adjusted by some random factor. In
computer graphics and computer games, particle systems are ubiquitous and are the
de facto method for producing animated effects such as fire, smoke, clouds, gunfire,
water, cloth, explosions, magic, lighting, electricity, flocking, and many others. They
are defined by a set of points in space and a set of rules guiding their behavior
and appearance, e.g., velocity, color, size, shape, transparency, and rotation. This
decouples the creation of new complex effects from mathematics and programming.
Today, particle systems are even more popular in global optimization.
PSO originates from studies of synchronous bird flocking, fish schooling, and bees
buzzing [22,44,45,59,83]. It evolves populations or swarms of individuals called
particles. Particles work under social behavior in swarms. PSO finds the global
best solution by simply adjusting the moving vector of each particle according to
its personal best (cognition aspect) and the global best (social aspect) positions of
particles in the entire swarm at each iteration.
Compared with ant colony algorithms and EAs, PSO requires only primitive
mathematical operators, less computational bookkeeping and generally fewer lines
of code, and thus it is computationally inexpensive in terms of both memory require-
ments and speed. PSO is popular due to its simplicity of implementation and its
ability to quickly converge to a reasonably acceptable solution.
If the gbest position found so far is suboptimal, the swarm can easily stagnate around it without any pressure to continue
exploration. This can be seen from (9.1). If x i (t) = x i∗ (t) = x g (t), then the velocity
update will depend only on the value of αv i (t). If their previous velocities v i (t) are
very close to zero, then all the particles will stop moving once they catch up with
the gbest particle. Even worse, the gbest point may not be a local minimum. This
phenomenon is referred to as stagnation. To avoid stagnation, reseeding or partial
restart is introduced by generating new particles at distinct places of the search space.
Almost all variants of PSO try to solve the local optimum or stagnation problem.
PSO can locate the region of the optimum faster than EAs. However, once in this
region it progresses slowly due to the fixed velocity stepsize. Linearly decreasing
weight PSO (LDWPSO) [83] effectively balances the global and local search abilities
of the swarm by introducing a linearly decreasing inertia weight on the previous
velocity of the particle into (9.1):
$$\boldsymbol{v}_i(t+1) = \alpha \boldsymbol{v}_i(t) + c_1 r_1 \left(\boldsymbol{x}_i^*(t) - \boldsymbol{x}_i(t)\right) + c_2 r_2 \left(\boldsymbol{x}^g(t) - \boldsymbol{x}_i(t)\right), \qquad (9.3)$$
where α is called the inertia weight, and the positive constants c1 and c2 are, respec-
tively, cognitive and social parameters. Typically, c1 = 2.0, c2 = 2.0, and α gradu-
ally decreases from αmax to αmin :
$$\alpha(t) = \alpha_{\max} - (\alpha_{\max} - \alpha_{\min}) \frac{t}{T}, \qquad (9.4)$$
T being the maximum number of iterations. One can select αmax = 1 and αmin = 0.1.
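As an illustration, a minimal MATLAB sketch of the LDWPSO update in (9.3) and (9.4) is given below; the objective function, bounds, and parameter values are illustrative assumptions, not settings from the book.

```matlab
% Sketch: linearly decreasing weight PSO (LDWPSO) on a d-dimensional problem.
fun = @(x) sum(x.^2);                 % placeholder objective (assumption)
d = 2; NP = 30; T = 200;
c1 = 2.0; c2 = 2.0; amax = 1.0; amin = 0.1;
lb = -10; ub = 10;

x  = lb + (ub - lb) * rand(NP, d);    % positions
v  = zeros(NP, d);                    % velocities
px = x;                               % personal best positions
pf = arrayfun(@(i) fun(x(i,:)), 1:NP)';
[gf, gi] = min(pf); gx = px(gi, :);   % global best

for t = 1:T
  alpha = amax - (amax - amin) * t / T;   % inertia weight, Eq. (9.4)
  r1 = rand(NP, d); r2 = rand(NP, d);
  v = alpha * v + c1 * r1 .* (px - x) ...
      + c2 * r2 .* (repmat(gx, NP, 1) - x);   % velocity update, Eq. (9.3)
  x = x + v;                                  % position update
  f = arrayfun(@(i) fun(x(i,:)), 1:NP)';
  improved = f < pf;                          % update personal bests
  px(improved, :) = x(improved, :); pf(improved) = f(improved);
  [gf, gi] = min(pf); gx = px(gi, :);         % update global best
end
```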
The flowchart of PSO is given by Algorithm 9.1.
Center PSO [57] introduces a center particle into LDWPSO; this particle is updated as the swarm center at every iteration. The center particle has no velocity, but it is involved in all other operations in the same way as the ordinary particles, such as fitness evaluation and competition for the best particle; only the velocity calculation is excluded. All particles oscillate around the swarm center and gradually converge toward it. The center particle often becomes the gbest of the swarm during the run. Therefore, it has more opportunities to guide the search of the whole swarm, and it influences the performance greatly. Center PSO achieves not only better solutions but also faster convergence than LDWPSO does.
PSO, DE, and CMA-ES are compared using certain fitness landscapes evolved
with GP in [52]. DE may get stuck in local optima most of the time for some problem
landscapes. However, over similar landscapes PSO will always find the global optima
correctly within a maximum time bound. DE sometimes has a limited ability to move
its population large distances across the search space if the population is clustered
in a limited portion of it.
Instead of applying inertia to the velocity memory, constriction PSO [22] applies
a constriction factor χ to control the magnitude of velocities:
$$\boldsymbol{v}_i(t+1) = \chi\left\{\boldsymbol{v}_i(t) + \varphi_1 r_1\left(\boldsymbol{x}_i^*(t) - \boldsymbol{x}_i(t)\right) + \varphi_2 r_2\left(\boldsymbol{x}^g(t) - \boldsymbol{x}_i(t)\right)\right\}, \qquad (9.5)$$
$$\chi = \frac{2}{\left|2 - \varphi - \sqrt{\varphi^2 - 4\varphi}\right|}, \qquad (9.6)$$
Algorithm 9.1 (PSO).
1. Set t = 1. Initialize each particle in the population by randomly selecting values for its position $\boldsymbol{x}_i$ and velocity $\boldsymbol{v}_i$, $i = 1, \ldots, N_P$.
2. Repeat:
   a. Calculate the fitness value of each particle i. If the fitness value of particle i is greater than its best fitness value found so far, revise $\boldsymbol{x}_i^*(t)$.
   b. Determine the location of the particle with the highest fitness and revise $\boldsymbol{x}^g(t)$ if necessary.
   c. For each particle i, calculate its velocity according to (9.1) or (9.3).
   d. Update the location of each particle i according to (9.2).
   e. Set t = t + 1.
   until stopping criteria are met.
where ϕ = ϕ1 + ϕ2 > 4. With this formulation, the velocity limit vmax is no longer
necessary, and the algorithm could guarantee convergence without clamping the
velocity. It is suggested that ϕ = 4.1 (c1 = c2 = 2.05) and χ = 0.729 [27]. When
α = χ and ϕ1 + ϕ2 > 4, the constriction and inertia approaches are algebraically
equivalent and improved performance could be achieved across a wide range of
problems [27]. Constriction PSO has faster convergence than LDWPSO, but it is
prone to be trapped in local optima for multimodal functions.
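As a small numerical check (illustrative only), substituting the suggested value $\varphi = 4.1$ into (9.6) recovers the commonly quoted constriction factor:

```matlab
phi = 4.1;                                    % phi = phi1 + phi2 > 4
chi = 2 / abs(2 - phi - sqrt(phi^2 - 4*phi)); % constriction factor, Eq. (9.6)
disp(chi)                                     % approximately 0.7298
```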
Bare-bones PSO [42], as the simplest version of PSO, eliminates the velocity equation
of PSO and uses a Gaussian distribution based on pbest and gbest to sample the
search space. It does not use the inertia weight, acceleration coefficient or velocity.
The velocity update equation (9.3) is not used and a Gaussian distribution with the
global and local best positions is used to update the particles’ positions. Bare-bones
PSO has the following update equations:
$$x_{i,j}(t+1) = g_{i,j}(t) + \sigma_{i,j}(t)\, N(0,1), \qquad (9.7)$$
$$g_{i,j}(t) = 0.5\left(x_{i,j}^*(t) + x_j^g(t)\right), \qquad (9.8)$$
$$\sigma_{i,j}(t) = \left|x_{i,j}^*(t) - x_j^g(t)\right|, \qquad (9.9)$$
where subscripts i, j denote the ith particle and jth dimension, respectively, N (0, 1)
is the Gaussian distribution with zero mean and unit variance. The method can be
derived from basic PSO [68]. An alternative version sets $x_{i,j}(t+1)$ according to (9.7) with 50% chance, and to the previous best position $x_{i,j}^*(t)$ with 50% chance. Bare-bones
PSO still suffers from the problem of premature convergence.
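A minimal sketch of the bare-bones sampling in (9.7)–(9.9) for a single particle is given below; the position vectors are illustrative values.

```matlab
% Sketch: bare-bones PSO position update for one particle (illustrative vectors).
pbest = [1.0 2.0 3.0];                  % personal best position (assumption)
gbest = [2.0 1.5 2.5];                  % global best position (assumption)

g     = 0.5 * (pbest + gbest);          % Eq. (9.8): midpoint
sigma = abs(pbest - gbest);             % Eq. (9.9): per-dimension spread
xnew  = g + sigma .* randn(size(g));    % Eq. (9.7): Gaussian sampling

% Alternative version: with 50% chance keep the previous personal best component.
mask = rand(size(g)) < 0.5;
xnew(mask) = pbest(mask);
```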
It is shown in [15] that during stagnation in PSO, the points sampled by the leader
particle lie on a specific line. The condition under which particles stick to exploring
one side of the stagnation point only is obtained, and the case where both sides
are explored is also given. Information about the gradient of the objective function
during stagnation in PSO is also obtained.
Under the generalized theoretical deterministic PSO model, conditions for particle
convergence to a point are derived in [20]. The model greatly weakens the stagnation
assumption, by assuming that each particle’s personal best and neighborhood best
can occupy an arbitrarily large number of unique positions.
In [21], an objective function is designed for assumption-free convergence analysis
of some PSO variants. It is found that canonical particle swarm’s topology does not
have an impact on the parameter region needed to ensure convergence. The parameter
region needed to ensure convergent particle behavior has been empirically obtained
for fully informed PSO, bare-bones PSO, and standard PSO 2011.
The issues associated with PSO are the stagnation of particles in some points in
the search space, inability to change the value of one or more decision variables,
poor performance in case of small swarm, lack of guarantee to converge even to
a local optimum, poor performance for an increasing number of dimensions, and
sensitivity to the rotation of the search space. A general form of velocity update rule
for PSO proposed in [10] guarantees to address all of these issues if the user-definable
function f satisfies the two conditions: (i) f is designed in such a way that for any
input vector x in the search space, there exists a region A which contains x and f (x)
can be located anywhere in A, and (ii) f is invariant under any affine transformation.
Example 9.1: We revisit the optimization problem treated in Example 2.1. The
Easom function is plotted in Figure 2.1. The global minimum value is −1 at
x = (π, π)T .
MATLAB Global Optimization Toolbox provides a PSO solver, particleswarm. Using the default parameter settings, the particleswarm solver can always find the global optimum very rapidly over ten random runs for the range $[-100, 100]^2$. This is because all the initial individuals, which are randomly selected in (0, 1), are very close to the global optimum.
A fair evaluation of PSO is to set the initial population randomly from the entire
domain. We select an initial population size of 40 and other default parameters. For
20 random runs, the solver converged 19 times for a maximum of 100 generations.
For a random run, we have f (x) = −1.0000 at (3.1416, 3.1416) with 2363 function
evaluations, and all the individuals converge toward the global optimum. The evolu-
tion of a random run is illustrated in Figure 9.1. For this problem, we conclude that
the particleswarm solver outperforms ga and simulannealbnd solvers.
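The solver call used for this fair evaluation can be sketched as follows; particleswarm and optimoptions are from the Global Optimization Toolbox, specifying finite bounds makes the initial swarm span the whole domain, and option names may differ slightly across MATLAB releases.

```matlab
% Sketch: minimizing the Easom function with the particleswarm solver
% (requires the Global Optimization Toolbox).
easom = @(x) -cos(x(1)) * cos(x(2)) * exp(-((x(1) - pi)^2 + (x(2) - pi)^2));

lb = [-100 -100]; ub = [100 100];
opts = optimoptions('particleswarm', 'SwarmSize', 40, 'MaxIterations', 100);
[xbest, fbest] = particleswarm(easom, 2, lb, ub, opts);
% Expected result: xbest close to (pi, pi) with fbest close to -1.
```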
Figure 9.1 The evolution of a random run of PSO: the minimum and average objectives.
9.3 PSO Variants Using Different Neighborhood Topologies
A key feature of PSO is social information sharing among the neighborhood. Typical neighborhood topologies are the von Neumann neighborhood, gbest, and lbest, as shown in Figure 9.2. The simplest neighbor structure might be the ring structure. Basic PSO uses the gbest topology, in which the neighborhood consists of the whole swarm, meaning that all the particles have the information of the globally found best solution. Every particle is a neighbor of every other particle.
The lbest neighborhood has a ring lattice topology: each particle forms a neighborhood consisting of itself and its two or more immediate neighbors. These neighbors may not be close to the generating particle in terms of objective function values or positions; instead, they are chosen by their adjacent indices.
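As a small illustrative sketch (not from the book), an lbest ring neighborhood with one neighbor on each side can be built purely from particle indices:

```matlab
% Sketch: index-based lbest ring topology with one neighbor on each side.
NP = 8;                                   % swarm size (illustrative)
nbrs = cell(NP, 1);
for i = 1:NP
  left  = mod(i - 2, NP) + 1;             % wrap-around left neighbor
  right = mod(i, NP) + 1;                 % wrap-around right neighbor
  nbrs{i} = [left, i, right];             % neighborhood: itself and two neighbors
end
% The lbest of particle i is the best personal best among nbrs{i},
% regardless of how close those particles are in the search space.
```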
Particle updates can be performed in either a synchronous or an asynchronous model. In general, the asynchronous model has faster convergence speed
than synchronous PSO, yet at the cost of getting trapped by rapidly attracting all parti-
cles to a deceitful solution. Random asynchronous PSO is a variant of asynchronous
PSO where particles are selected at random to perform their operations. Random
asynchronous PSO has the best general performance in large neighborhoods, while
synchronous PSO has the best one in small neighborhoods [77].
In fitness-distance-ratio-based PSO (FDR-PSO) [74], each particle utilizes an
additional information of the nearby higher fitness particle that is selected according
to fitness–distance ratio, i.e., the ratio of fitness improvement over the respective
weighted Euclidean distance. The algorithm moves particles toward nearby particles
of higher fitness, instead of attracting each particle toward just the gbest position. This
combats the problem of premature convergence observed in PSO. Concurrent PSO
[6] avoids the possible crosstalk effect of pbest and gbest with nbest in FDR-PSO
by concurrently simulating modified PSO and FDR-PSO algorithms with frequent
message passing between them.
To avoid stagnation and to keep the gbest particle moving until it has reached a
local minimum, guaranteed convergence PSO [87] uses a different velocity update
equation for the x g particle, which causes the particle to perform a random search
around x g within a radius defined by a scaling factor. Its ability to operate with small
swarm sizes makes it an enabling technique for parallel niching solutions.
For large parameter optimization problems, orthogonal PSO [35] uses an intel-
ligent move mechanism, which applies orthogonal experimental design to adjust a
velocity for each particle by using a divide and conquer approach in determining the
next move of particles.
In [14], basic PSO and Michigan PSO are used to solve the problem of prototype
placement for nearest prototype classifiers. In the Michigan approach, a member of
the population only encodes part of the solution, and the whole swarm is the potential
solution to the problem. This reduces the dimension of the search space. Adaptive
Michigan PSO [14] uses modified PSO equations with both particle competition
and cooperation between the closest neighbors and a dynamic neighborhood. The
Michigan PSO algorithms introduce a local fitness function to guide the particles’
movement and dynamic neighborhoods that are calculated on each iteration.
Diversity can be maintained by relocating the particles when they are too close
to each other [60] or using some collision-avoiding mechanisms [8]. In [71], trans-
formations of the objective function through deflection and stretching are used to
overcome local minimizers and a repulsion source at each detected minimizer is
used to repel particles away from previously detected minimizers. This combina-
tion is able to find as many global minima as possible by preventing particles from
moving to a previously discovered minimal region.
In [30], PSO is used to improve simplex search. Clustering-aided simplex PSO
[40] incorporates simplex method to improve PSO performance. Each particle in
PSO is regarded as a point of the simplex. On each iteration, the worst particle is
replaced by a new particle generated by one iteration of the simplex method. Then,
all particles are again updated by PSO. PSO and simplex methods are performed
iteratively.
Adaptive PSO [93] first performs a real-time procedure that evaluates the population distribution and particle fitness to identify, in each generation, one of four defined evolutionary states: exploration, exploitation, convergence, and jumping out. It enables the automatic control of algorithmic parameters at run time to
improve the search efficiency and convergence speed. Then, an elitist learning strat-
egy is performed when the evolutionary state is classified as convergence state. The
strategy will act on the gbest particle to jump out of the likely local optima. Adaptive
PSO substantially enhances the performance of PSO in terms of convergence speed,
global optimality, solution accuracy, and algorithm reliability.
Chaotic PSO [2] utilizes chaotic maps for parameter adaptation which can improve
the search ability of basic PSO. Frankenstein’s PSO [25] combines a number of algo-
rithmic components such as time-varying population topology, the velocity updating
mechanism of fully informed PSO [64], and decreasing inertia weight, showing
advantages in terms of optimization speed and reliability. Particles are initially con-
nected with a fully connected topology, which is reduced over time with a certain pattern.
Comprehensive learning PSO (http://www.ntu.edu.sg/home/epnsugan) [55] uses
all other particles’ pbest information to update a particle’s velocity. It learns each
dimension of a particle from just one particle’s historical best information, while
each particle learns from different particles’ historical best information for different
dimensions for a few generations. This strategy helps to preserve the diversity to
discourage premature convergence. The method outperforms PSO with inertia weight
[83] and PSO with constriction factor [22] in solving multimodal problems.
Inspired by the social behavior of clan, clan PSO [13] divides the PSO population
into several clans. Each clan will first perform the search and the particle with the best
fitness is selected as the clan leader. The leaders then meet to adjust their position.
Dynamic clan PSO [7] allows particles in one clan to migrate to another clan.
Motivated by the social phenomenon in which multiple good exemplars assist the crowd to progress better, example-based learning PSO [37] employs an example set of multiple gbest particles to update the particles' positions.
Charged PSO [8] utilizes an analogy of electrostatic energy, where some mutually
repelling particles orbit a nucleus of neutral particles. This nucleus corresponds to
a basic PSO swarm. The particles with identical charges produce a repulsive force
between them. The neutral particles allow exploitation while the charged particles
enforce separation to maintain exploration.
Random black hole PSO [95] is a PSO algorithm based on the concept of black
holes in physics. In each dimension of a particle, a black hole located nearest to the best particle of the swarm in the current generation is randomly generated, and then particles of the swarm are randomly pulled into the black hole with a probability p. This helps the algorithm escape from local minima and substantially speeds up the evolution toward the global optimum.
Social learning plays an important role in behavior learning among social animals.
In contrast to individual learning, social learning allows individuals to learn behaviors
from others without the cost of individual trials and errors. Social learning PSO [18]
introduces social learning mechanisms into PSO. Each particle learns from any of
the better particles (termed demonstrators) in the current swarm. Social learning
PSO adopts a dimension-dependent parameter control method. It performs well on
low-dimensional problems and is promising for solving large-scale problems as well.
In [5], agents in the swarm are categorized into explorers and settlers, which can
dynamically exchange their role in the search process. This particle task differen-
tiation is achieved through a different way of adjusting the particle velocities. The
coefficients of the cognitive and social component of the stochastic acceleration as
well as the inertia weight are related to the distance of each particle from the gbest
position found so far. This particle task differentiation enhances the local search
ability of the particles close to the gbest and improves the exploration ability of the
particles far from the gbest.
PSO lacks mechanisms which add diversity to exploration in the search process.
Inspired by the collective response behavior of starlings, starling PSO [65] introduces
a mechanism to add diversity into PSO. This mechanism consists of initialization,
identifying seven nearest neighbors, and orientation change.
9.5 PSO and EAs: Hybridization
In PSO, the particles move through the solution space via perturbations of their
position, which are influenced by other particles, whereas in EAs, individuals breed
with one another to produce new individuals. Compared to EAs, PSO is easy to
implement and there are few parameters to adjust. In PSO, every particle remembers
its pbest and gbest, thus having a more effective memory capability than EAs have.
PSO is also more efficient in maintaining the diversity of the swarm, since all the
particles use the information related to the most successful particle in order to improve
themselves, whereas in EAs only the good solutions are saved.
Hybridization of EAs and PSO is usually implemented by incorporating genetic
operators into PSO to enhance the performance of PSO: to keep the best particles
[4], to increase the diversity, and to improve the ability to escape local minima [61].
In [4], a tournament selection process is applied to replace each poorly performing
particle’s velocity and position with those of better performing particles. In [61], basic
PSO is combined with arithmetic crossover. The hybrid PSOs combine the velocity
and position update rules with the ideas of breeding and subpopulations. The swarm
is divided into subpopulations, and a breeding operator is used within a subpopulation
or between the subpopulations to increase the diversity of the population. In [82], the
standard velocity and position update rules of PSO are combined with the concepts
of selection, crossover, and mutation. A breeding ratio is employed to determine
the proportion of the population that undergoes breeding procedure in the current
generation and the portion to perform regular PSO operation. Grammatical swarm
adopts PSO coupled to a grammatical evolution genotype–phenotype mapping to
generate programs [67].
Evolutionary self-adapting PSO [63] grants a PSO scheme with an explicit selec-
tion procedure and with self-adapting properties for its parameters. This selection
acts on the weights or parameters governing the behavior of a particle, and a particle movement operator is introduced to generate diversity.
In [39], mutation, crossover, and elitism are incorporated into PSO. The upper-
half of the best-performing individuals, known as elites, are regarded as a swarm
and enhanced by PSO. The enhanced elites constitute half of the population in the
new generation, while crossover and mutation operations are applied to the enhanced
elites to generate the other half.
AMALGAM-SO [90] implements self-adaptive multimethod search using a sin-
gle universal genetic operator for population evolution. It merges the strengths of
CMA-ES, GA, and PSO for population evolution during each generation and imple-
ments a self-adaptive learning strategy to automatically tune the number of offspring.
The method scales well with increasing number of dimensions, converges in the close
proximity of the global minimum for functions with noise induced multimodality,
and is designed to take full advantage of the power of distributed computer networks.
Time-varying acceleration coefficients (TVAC) [78] are introduced to efficiently
control the local search and convergence to the global optimum, in addition to the
time-varying inertia weight factor in PSO. Mutated PSO with TVAC adds a pertur-
bation to a randomly selected modulus of the velocity vector of a random particle by
predefined probability. Self-organizing hierarchical PSO with TVAC considers only
the social and cognitive parts, but eliminates the inertia term in the velocity update
rule. Particles are reinitialized whenever they are stagnated in the search space, or
any component of a particle’s velocity vector becomes very close to zero.
Generate a random number r for each bit. If r < Ti,d , then xid is interpreted as 1;
otherwise, as 0. The velocity term is limited to |vi,d | < Vmax . To prevent Ti,d from
approaching 0 or 1, one can force Vmax = 4 [44].
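A minimal sketch of this bit-sampling rule is given below; T_{i,d} is taken here to be the logistic sigmoid of the velocity, as in the standard discrete binary PSO (an assumption, since the definition of T lies outside the excerpt above).

```matlab
% Sketch: binary PSO bit interpretation with a sigmoid-transformed velocity.
Vmax = 4;                              % clamp to keep T away from 0 and 1
v = [-6.2 0.3 2.7 5.0];                % example velocity components (illustrative)
v = max(min(v, Vmax), -Vmax);          % enforce |v| <= Vmax
T = 1 ./ (1 + exp(-v));                % assumed definition of T (logistic sigmoid)
r = rand(size(v));                     % one random number per bit
x = double(r < T);                     % bit is 1 if r < T, otherwise 0
```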
Based on the discrete PSO proposed in [43], multiphase discrete PSO [3] is for-
mulated by using an alternative velocity update technique, which incorporates hill
climbing using random stepsize in the search space. The particles are divided into
groups that follow different search strategies. A discrete PSO algorithm is proposed
in [56] for flowshop scheduling, where the particle and velocity are redefined, an
efficient approach is developed to move a particle to the new sequence, and a local
search scheme is incorporated.
Jumping PSO [62] is a discrete PSO inspired from frogs. The positions x i of
particles jump from one solution to another. It does not consider any velocity. Each
particle has three attractors: its own best position, the best position of its social
neighborhood, and the gbest position. A jump approaching an attractor consists of
changing a feature of the current solution by a feature of the attractor.
9.7 Multi-swarm PSOs
Multiple swarms in PSO explore the search space together to attain the objective of
finding the optimal solutions. This resembles many bird species joining to form a
flock in a geographical region, to achieve certain foraging behaviors that benefit one
another. Each species has different food preferences. This corresponds to multiple
swarms locating possible solutions in different regions of the solution space. This
is also similar to people all over the world: In each country, there is a different
lifestyle that is best suited to the ethnic culture. A species can be defined as a group
of individuals sharing common attributes according to some similarity metric.
Multi-swarm PSO is used for solving multimodal problems and combating PSO’s
tendency in premature convergence. It typically adopts a heuristically chosen number
of swarms with a fixed swarm size throughout the search process. Multi-swarm PSO
is also used to locate and track changing optima in a dynamic environment.
Based on guaranteed convergence PSO [87], niching PSO [11] creates a subswarm
from a particle and its nearest spatial neighbor, if the variance in that particle’s fitness
is below a threshold. Niching PSO initially sets up subswarm leaders by training the
main swarm utilizing the basic PSO using no social information (c2 = 0). Niches are
then identified and a subswarm radius is set. As optimization progresses, particles
are allowed to join subswarms, which are in turn allowed to merge. Once the velocity
has minimized, they converge to their subswarm optimum.
In turbulent PSO [19], the population is divided into two subswarms: one sub-
swarm following the gbest, while the other moving in the opposite direction. The
particles’ positions are dependent on their lbest, their corresponding subswarm’s
best, and the gbest collected from the two subswarms. If the gbest has not improved
for fifteen successive iterations, the worst particles of a subswarm are replaced by the
best ones from the other subswarm, and the subswarms switch their flight directions.
Turbulent PSO avoids premature convergence by replacing the velocity memory by
a random turbulence operator when a particle exceeds it. Fuzzy adaptive turbulent
PSO [58] is a hybrid of turbulent PSO with a fuzzy logic controller to adaptively
regulate the velocity parameters.
Speciation-based PSO [54] uses spatial speciation for locating multiple local
optima in parallel. Each species is grouped around a dominating particle called the
species seed. At each iteration, species seeds are identified from the entire popula-
tion, and are then adopted as neighborhood bests for these individual species groups
separately. Dynamic speciation-based PSO [69] modifies speciation-based PSO for
tracking multiple optima in the dynamic environment by comparing the fitness of
each particle’s current lbest with its previous record to continuously monitor the
moving peaks, and by using a predefined species population size to quantify the
crowdedness of species before they are reinitialized randomly in the solution space
to search for new possible optima.
In adaptive sequential niche PSO [94], the fitness values of the particles are mod-
ified by a penalty function to prevent all subswarms from converging to the same
optima. A niche radius is not required. It can sequentially find all the optimal solutions of a multimodal function.
In [48], the swarm population is clustered into a certain number of clusters. Then,
a particle’s lbest is replaced by its cluster center, and the particles’ gbest is replaced
by the neighbors’ best. This approach has improved the diversity and exploration of
PSO. In [72], in order to solve multimodal problems, clustering is used to identify
the niches in the swarm population and then to restrict the neighborhood of each
particle to the other particles in the same cluster in order to perform a local search
for any local minima located within the clusters.
In [9], the population of particles is split into a set of interacting swarms,
which interact locally by an exclusion parameter and globally through a new anti-
convergence operator. Each swarm maintains diversity either by using charged or
quantum particles. Quantum swarm optimization (QSO) builds on the atomic pic-
ture of charged PSO, and uses a quantum analogy for the dynamics of the charged
particles. Multi-QSO uses multiple swarms [9].
In multigrouped PSO [81], N solutions of a multimodal function can be searched
with N groups. A repulsive velocity component is added to the particle update equa-
tion, which will push the intruding particles out of the other group’s gbest radius.
The predefined radius is allowed to increase linearly during the search process to
avoid several groups from settling on the same peak.
When multi-swarms are used for enhancing diversity of PSO, each swarm per-
forms a PSO paradigm independently. After some predefined generations, the swarms
will exchange information based on a diversified list of particles. Some strategies
for information exchange between two or more swarms are given in [28,75]. In [28],
two subswarms are updated independently for a certain interval, and then, the best
particles (information) in each subswarm are exchanged. In [75], swarm population
is initially clustered into a predefined number of swarms. Particles’ positions are first
updated using a PSO equation where three levels of communications are facilitated,
namely, personal, global, and neighborhood levels. At every iteration, the particles in
a swarm are divided into two sets: One set of particles is sent to another swarm, while
the other set of particles will be replaced by the individuals from other swarms [75].
In TRIBES, the swarm is divided into tribes whose structure adapts during the search. Tribes may benefit from the removal of their weakest member, or from the addition of a new
member. The best particles of the tribes are exchanged among all the tribes. Relation-
ships between particles in a tribe are similar to those defined in global PSO. TRIBES
is efficient in quickly finding a good region of the landscape, but less efficient for
local refinement.
Problems
9.1 Explain why, in basic PSO with a neighborhood structure, a larger neighborhood size leads to faster convergence, whereas in fully informed PSO the opposite is true.
9.2 Apply the particleswarm solver of the MATLAB Global Optimization Toolbox to a benchmark function. Test the influence of different parameter settings.
References
1. Akat SB, Gazi V. Decentralized asynchronous particle swarm optimization. In: Proceedings of
the IEEE swarm intelligence symposium, St. Louis, MO, USA, September 2008. p. 1–8.
2. Alatas B, Akin E, Ozer AB. Chaos embedded particle swarm optimization algorithms.
Chaos Solitons Fractals. 2009;40(5):1715–34.
3. Al-kazemi B, Mohan CK. Multi-phase discrete particle swarm optimization. In: Proceedings
of the 4th international workshop on frontiers in evolutionary algorithms, Kinsale, Ireland,
January 2002.
4. Angeline PJ. Using selection to improve particle swarm optimization. In: Proceedings of IEEE
congress on evolutionary computation, Anchorage, AK, USA, May 1998. p. 84–89.
5. Ardizzon G, Cavazzini G, Pavesi G. Adaptive acceleration coefficients for a new search diver-
sification strategy in particle swarm optimization algorithms. Inf Sci. 2015;299:337–78.
6. Baskar S, Suganthan P. A novel concurrent particle swarm optimization. In: Proceedings of
IEEE congress on evolutionary computation (CEC), Beijing, China, June 2004. p. 792–796.
7. Bastos-Filho CJA, Carvalho DF, Figueiredo EMN, de Miranda PBC. Dynamic clan particle
swarm optimization. In: Proceedings of the 9th international conference on intelligent systems
design and applications (ISDA’09), Pisa, Italy, November 2009. p. 249–254.
8. Blackwell TM, Bentley P. Don’t push me! Collision-avoiding swarms. In: Proceedings of
congress on evolutionary computation, Honolulu, HI, USA, May 2002, vol. 2. p. 1691–1696.
9. Blackwell T, Branke J. Multiswarms, exclusion, and anti-convergence in dynamic environ-
ments. IEEE Trans Evol Comput. 2006;10(4):459–72.
10. Bonyadi MR, Michalewicz Z. A locally convergent rotationally invariant particle swarm opti-
mization algorithm. Swarm Intell. 2014;8:159–98.
11. Brits R, Engelbrecht AP, van den Bergh F. A niching particle swarm optimizer. In: Proceedings
of the 4th Asia-Pacific conference on simulated evolution and learning, Singapore, November
2002. p. 692–696.
12. Carlisle A, Dozier G. An off-the-shelf PSO. In: Proceedings of workshop on particle swarm
optimization, Indianapolis, IN, USA, January 2001. p. 1–6.
13. Carvalho DF, Bastos-Filho CJA. Clan particle swarm optimization. In: Proceedings of IEEE
congress on evolutionary computation (CEC), Hong Kong, China, June 2008. p. 3044–3051.
14. Cervantes A, Galvan IM, Isasi P. AMPSO: a new particle swarm method for nearest neighbor-
hood classification. IEEE Trans Syst Man Cybern Part B. 2009;39(5):1082–91.
15. Chatterjee S, Goswami D, Mukherjee S, Das S. Behavioral analysis of the leader particle during
stagnation in a particle swarm optimization algorithm. Inf Sci. 2014;279:18–36.
16. Chen H, Zhu Y, Hu K. Discrete and continuous optimization based on multi-swarm coevolution.
Nat Comput. 2010;9:659–82.
17. Chen W-N, Zhang J, Lin Y, Chen N, Zhan Z-H, Chung HS-H, Li Y, Shi Y-H. Particle swarm
optimization with an aging leader and challengers. IEEE Trans Evol Comput. 2013;17(2):241–
58.
18. Cheng R, Jin Y. A social learning particle swarm optimization algorithm for scalable optimiza-
tion. Inf Sci. 2015;291:43–60.
19. Chen G, Yu J. Two sub-swarms particle swarm optimization algorithm. In: Advances in natural
computation, vol. 3612 of Lecture notes in computer science. Berlin: Springer; 2005. p. 515–
524.
20. Cleghorn CW, Engelbrecht AP. A generalized theoretical deterministic particle swarm model.
Swarm Intell. 2014;8:35–59.
21. Cleghorn CW, Engelbrecht AP. Particle swarm variants: standardized convergence analysis.
Swarm Intell. 2015;9:177–203.
22. Clerc M, Kennedy J. The particle swarm-explosion, stability, and convergence in a multidi-
mensional complex space. IEEE Trans Evol Comput. 2002;6(1):58–73.
23. Clerc M. Particle swarm optimization. In: International scientific and technical encyclopaedia.
Hoboken: Wiley; 2006.
24. Coelho LS, Krohling RA. Predictive controller tuning using modified particle swarm optimi-
sation based on Cauchy and Gaussian distributions. In: Proceedings of the 8th online world
conference soft computing and industrial applications, Dortmund, Germany, September 2003.
p. 7–12.
25. de Oca MAM, Stutzle T, Birattari M, Dorigo M. Frankenstein’s PSO: a composite particle
swarm optimization algorithm. IEEE Trans Evol Comput. 2009;13(5):1120–32.
26. de Oca MAM, Stutzle T, Van den Enden K, Dorigo M. Incremental social learning in particle
swarms. IEEE Trans Syst Man Cybern Part B. 2011;41(2):368–84.
27. Eberhart RC, Shi Y. Comparing inertia weights and constriction factors in particle swarm
optimization. In: Proceedings of IEEE congress on evolutionary computation (CEC), La Jolla,
CA, USA, July 2000. p. 84–88.
28. El-Abd M, Kamel MS. Information exchange in multiple cooperating swarms. In: Proceedings
of IEEE swarm intelligence symposium, Pasadena, CA, USA, June 2005. p. 138–142.
29. Esquivel SC, Coello CAC. On the use of particle swarm optimization with multimodal func-
tions. In: Proceedings of IEEE congress on evolutionary computation (CEC), Canberra, Aus-
tralia, 2003. p. 1130–1136.
30. Fan SKS, Liang YC, Zahara E. Hybrid simplex search and particle swarm optimization for the
global optimization of multimodal functions. Eng Optim. 2004;36(4):401–18.
31. Fernandez-Martinez JL, Garcia-Gonzalo E. Stochastic stability analysis of the linear continuous
and discrete PSO models. IEEE Trans Evol Comput. 2011;15(3):405–23.
32. Hakli H, Uguz H. A novel particle swarm optimization algorithm with Levy flight. Appl Soft
Comput. 2014;23:333–45.
33. He S, Wu QH, Wen JY, Saunders JR, Paton RC. A particle swarm optimizer with passive
congregation. Biosystems. 2004;78:135–47.
34. Higashi N, Iba H. Particle swarm optimization with Gaussian mutation. In: Proceedings of
IEEE swarm intelligence symposium, Indianapolis, IN, USA, April 2003. p. 72–79.
35. Ho S-Y, Lin H-S, Liauh W-H, Ho S-J. OPSO: orthogonal particle swarm optimization and its
application to task assignment problems. IEEE Trans Syst Man Cybern Part A. 2008;38(2):288–
98.
36. Hsieh S-T, Sun T-Y, Liu C-C, Tsai S-J. Efficient population utilization strategy for particle
swarm optimizer. IEEE Trans Syst Man Cybern Part B. 2009;39(2):444–56.
37. Huang H, Qin H, Hao Z, Lim A. Example-based learning particle swarm optimization for
continuous optimization. Inf Sci. 2012;182:125–38.
38. Janson S, Middendorf M. A hierarchical particle swarm optimizer and its adaptive variant.
IEEE Trans Syst Man Cybern Part B. 2005;35(6):1272–82.
39. Juang C-F. A hybrid of genetic algorithm and particle swarm optimization for recurrent network
design. IEEE Trans Syst Man Cybern Part B. 2004;34(2):997–1006.
40. Juang C-F, Chung I-F, Hsu C-H. Automatic construction of feedforward/recurrent fuzzy
systems by clustering-aided simplex particle swarm optimization. Fuzzy Sets Syst.
2007;158(18):1979–96.
41. Kadirkamanathan V, Selvarajah K, Fleming PJ. Stability analysis of the particle dynamics in
particle swarm optimizer. IEEE Trans Evol Comput. 2006;10(3):245–55.
42. Kennedy J. Bare bones particle swarms. In: Proceedings of IEEE swarm intelligence sympo-
sium, Indianapolis, IN, USA, April 2003. p. 80–87.
43. Kennedy J, Eberhart RC. A discrete binary version of the particle swarm algorithm. In: Pro-
ceedings of IEEE conference on systems, man, and cybernetics, Orlando, FL, USA, October
1997. p. 4104–4109.
44. Kennedy J, Eberhart RC. Swarm intelligence. San Francisco, CA: Morgan Kaufmann; 2001.
45. Kennedy J, Eberhart R. Particle swarm optimization. In: Proceedings of IEEE international
conference on neural networks, Perth, WA, USA, November 1995, vol. 4. p. 1942–1948.
46. Kennedy J, Mendes R. Population structure and particle swarm performance. In: Proceedings
of congress on evolutionary computation, Honolulu, HI, USA, May 2002. p. 1671–1676.
47. Kennedy J. Small worlds and mega-minds: Effects of neighborhood topology on particle swarm
performance. In: Proceedings of congress on evolutionary computation (CEC), Washington,
DC, USA, July 1999. p. 1931–1938.
48. Kennedy J. Stereotyping: improving particle swarm performance with cluster analysis. In:
Proceedings of congress on evolutionary computation (CEC), La Jolla, CA, July 2000. p.
1507–1512.
49. Kennedy J. The particle swarm: social adaptation of knowledge. In: Proceedings of IEEE
international conference on evolutionary computation, Indianapolis, USA, April 1997. p. 303–
308.
50. Koh B-I, George AD, Haftka RT, Fregly BJ. Parallel asynchronous particle swarm optimization.
Int J Numer Methods Eng. 2006;67:578–95.
51. Krohling RA. Gaussian swarm: a novel particle swarm optimization algorithm. In: Proceedings
of IEEE conference cybernetics and intelligent systems, Singapore, December 2004. p. 372–
376.
52. Langdon WB, Poli R. Evolving problems to learn about particle swarm optimizers and other
search algorithms. IEEE Trans Evol Comput. 2007;11(5):561–78.
53. Lanzarini L, Leza V, De Giusti A. Particle swarm optimization with variable population size.
In: Proceedings of the 9th international conference on artificial intelligence and soft computing,
Zakopane, Poland, June 2008, vol. 5097 of Lecture notes in computer science. Berlin: Springer;
2008. p. 438–449.
54. Li X. Adaptively choosing neighbourhood bests using species in a particle swarm optimizer for
multimodal function optimization. In: Proceedings of genetic and evolutionary computation
conference (GECCO), Seattle, WA, USA, June 2004. p. 105–116.
55. Liang JJ, Qin AK, Suganthan PN, Baskar S. Comprehensive learning particle swarm optimizer
for global optimization of multimodal functions. IEEE Trans Evol Comput. 2006;10(3):281–
95.
56. Liao C-J, Tseng C-T, Luarn P. A discrete version of particle swarm optimization for flowshop
scheduling problems. Comput Oper Res. 2007;34:3099–111.
57. Liu Y, Qin Z, Shi Z, Lu J. Center particle swarm optimization. Neurocomputing. 2007;70:672–
9.
58. Liu H, Abraham A. Fuzzy adaptive turbulent particle swarm optimization. In: Proceedings of
the 5th international conference on hybrid intelligent systems (HIS’05), Rio de Janeiro, Brazil,
November 2005. p. 445–450.
59. Loengarov A, Tereshko V. A minimal model of honey bee foraging. In: Proceedings of IEEE
swarm intelligence symposium, Indianapolis, IN, USA, May 2006. p. 175–182.
60. Lovbjerg M, Krink T. Extending particle swarm optimisers with self-organized criticality. In:
Proceedings of congress on evolutionary computation (CEC), Honolulu, HI, USA, May 2002.
p. 1588–1593.
61. Lovbjerg M, Rasmussen TK, Krink T. Hybrid particle swarm optimiser with breeding and sub-
populations. In: Proceedings of genetic and evolutionary computation conference (GECCO),
Menlo Park, CA, USA, August 2001. p. 469–476.
62. Martinez-Garcia FJ, Moreno-Perez JA. Jumping frogs optimization: a new swarm method for
discrete optimization. Technical Report DEIOC 3/2008, Department of Statistics, O.R. and
Computing, University of La Laguna, Tenerife, Spain, 2008.
63. Miranda V, Fonseca N. EPSO—Best of two worlds meta-heuristic applied to power system
problems. In: Proceedings of IEEE congress on evolutionary computation, Honolulu, HI, USA,
May 2002. p. 1080–1085.
64. Mendes R, Kennedy J, Neves J. The fully informed particle swarm: simpler, maybe better.
IEEE Trans Evol Comput. 2004;8(3):204–10.
65. Netjinda N, Achalakul T, Sirinaovakul B. Particle swarm optimization inspired by starling flock
behavior. Appl Soft Comput. 2015;35:411–22.
66. Niu B, Zhu Y, He X. Multi-population cooperative particle swarm optimization. In: Proceedings
of European conference on advances in artificial life, Canterbury, UK, September 2005. p. 874–
883.
67. O’Neill M, Brabazon A. Grammatical swarm: the generation of programs by social program-
ming. Nat Comput. 2006;5:443–62.
68. Pan F, Hu X, Eberhart RC, Chen Y. An analysis of bare bones particle swarm. In: Proceedings
of the IEEE swarm intelligence symposium, St. Louis, MO, USA, September 2008. p. 21–23.
69. Parrott D, Li X. Locating and tracking multiple dynamic optima by a particle swarm model
using speciation. IEEE Trans Evol Comput. 2006;10(4):440–58.
70. Parsopoulos KE, Vrahatis MN. UPSO: a unified particle swarm optimization scheme. In: Pro-
ceedings of the international conference of computational methods in sciences and engineering,
2004. The Netherlands: VSP International Science Publishers; 2004. pp. 868–873.
71. Parsopoulos KE, Vrahatis MN. On the computation of all global minimizers through particle
swarm optimization. IEEE Trans Evol Comput. 2004;8(3):211–24.
72. Passaro A, Starita A. Clustering particles for multimodal function optimization. In: Proceedings
of ECAI workshop on evolutionary computation, Riva del Garda, Italy, 2006. p. 124–131.
73. Pedersen MEH, Chipperfield AJ. Simplifying particle swarm optimization. Appl Soft Comput.
2010;10(2):618–28.
74. Peram T, Veeramachaneni K, Mohan CK. Fitness-distance-ratio based particle swarm opti-
mization. In: Proceedings of the IEEE swarm intelligence symposium, Indianapolis, IN, USA,
April 2003. p. 174–181.
75. Pulido GT, Coello CAC. Using clustering techniques to improve the performance of a par-
ticle swarm optimizer. In: Proceedings of genetic and evolutionary computation conference
(GECCO), Seattle, WA, USA, June 2004. p. 225–237.
76. Qin Q, Cheng S, Zhang Q, Li L, Shi Y. Biomimicry of parasitic behavior in a coevolutionary par-
ticle swarm optimization algorithm for global optimization. Appl Soft Comput. 2015;32:224–
40.
77. Rada-Vilela J, Zhang M, Seah W. A performance study on synchronicity and neighborhood
size in particle swarm optimization. Soft Comput. 2013;17:1019–30.
78. Ratnaweera A, Halgamuge SK, Watson HC. Self-organizing hierarchical particle swarm opti-
mizer with time-varying acceleration coefficients. IEEE Trans Evol Comput. 2004;8(3):240–
55.
79. Reeves WT. Particle systems—a technique for modeling a class of fuzzy objects. ACM Trans
Graph. 1983;2(2):91–108.
80. Secrest BR, Lamont GB. Visualizing particle swarm optimization - Gaussian particle swarm
optimization. In: Proceedings of the IEEE swarm intelligence symposium, Indianapolis, IN,
USA, April 2003. p. 198–204.
81. Seo JH, Lim CH, Heo CG, Kim JK, Jung HK, Lee CC. Multimodal function optimization
based on particle swarm optimization. IEEE Trans Magn. 2006;42(4):1095–8.
82. Settles M, Soule T. Breeding swarms: a GA/PSO hybrid. In: Proceedings of genetic and evo-
lutionary computation conference (GECCO), Washington, DC, USA, June 2005. p. 161–168.
83. Shi Y, Eberhart RC. A modified particle swarm optimizer. In: Proceedings of IEEE congress
on evolutionary computation, Anchorage, AK, USA, May 1998. p. 69–73.
84. Silva A, Neves A, Goncalves T. An heterogeneous particle swarm optimizer with predator
and scout particles. In: Proceedings of the 3rd international conference on autonomous and
intelligent systems (AIS 2012), Aveiro, Portugal, June 2012. p. 200–208.
85. Stacey A, Jancic M, Grundy I. Particle swarm optimization with mutation. In: Proceedings of
IEEE congress on evolutionary computation (CEC), Canberra, Australia, December 2003. p.
1425–1430.
86. Suganthan PN. Particle swarm optimizer with neighborhood operator. In: Proceedings of IEEE
congress on evolutionary computation (CEC), Washington, DC, USA, July 1999. p. 1958–1962.
87. van den Bergh F, Engelbrecht AP. A new locally convergent particle swarm optimizer. In: Pro-
ceedings of IEEE conference on systems, man, and cybernetics, Hammamet, Tunisia, October
2002, vol. 3. p. 96–101.
88. van den Bergh F, Engelbrecht AP. A cooperative approach to particle swarm optimization.
IEEE Trans Evol Comput. 2004;8(3):225–39.
89. van den Bergh F, Engelbrecht AP. A study of particle swarm optimization particle trajectories.
Inf Sci. 2006;176(8):937–71.
90. Vrugt JA, Robinson BA, Hyman JM. Self-adaptive multimethod search for global optimization
in real-parameter spaces. IEEE Trans Evol Comput. 2009;13(2):243–59.
91. Wang H, Liu Y, Zeng S, Li C. Opposition-based particle swarm algorithm with Cauchy muta-
tion. In: Proceedings of the IEEE congress on evolutionary computation (CEC), Singapore,
September 2007. p. 4750–4756.
92. Yang C, Simon D. A new particle swarm optimization technique. In: Proceedings of the 18th
IEEE international conference on systems engineering, Las Vegas, NV, USA, August 2005. p.
164–169.
93. Zhan Z-H, Zhang J, Li Y, Chung HS-H. Adaptive particle swarm optimization. IEEE Trans
Syst Man Cybern Part B. 2009;39(6):1362–81.
94. Zhang J, Huang DS, Lok TM, Lyu MR. A novel adaptive sequential niche technique for
multimodal function optimization. Neurocomputing. 2006;69:2396–401.
95. Zhang J, Liu K, Tan Y, He X. Random black hole particle swarm optimization and its application.
In: Proceedings on IEEE international conference on neural networks and signal processing,
Nanjing, China, June 2008. p. 359–365.
10 Artificial Immune Systems
EAs and PSO tend to converge to a single optimum and hence progressively lose
diversity. This is not the case for artificial immune systems (AISs). AISs are based on
four main immunological theories, namely, clonal selection, immune networks, neg-
ative selection, and danger theory. This chapter introduces four immune algorithms
inspired by the four immunological theories.
10.1 Introduction
Artificial immune system (AIS) is inspired by ideas gleaned from the biological
immune system. The immune system is a collection of defense mechanisms in a
living body that protects the body from disease by detecting, identifying, and killing
pathogens and tumor cells. It discriminates between the host organism’s own mole-
cules and external pathogenic molecules. It has inherent mechanisms for maintaining
and boosting the diversity of the immune repertoire.
The immune system of vertebrates protects living bodies against the invasion of
various foreign substances (called antigens or pathogens) such as viruses, harmful
bacteria, parasites and fungi, and eliminates debris and malfunctioning cells. This
job does not depend upon prior knowledge of these pathogens. The immune sys-
tem has a memory of previously encountered pathogens in the form of memory
cells. Immune response then quickly destroys the nonself-cells and stores memory
for similar intruders. This protection property, along with the distributed and self-
organized nature, has made the immune system particularly useful within computer
science and for intrusion detection.
The immune system is made up of some organs (e.g., thymus, spleen, lymph
nodes) and a huge number of cells ($10^{12}$–$10^{13}$ in a human being) of different types.
Like the neural system, the immune system has a high degree of robustness. The two
basic components of the immune system are two types of white blood cells, called
B lymphocytes (B cells) and T lymphocytes (T cells).
B lymphocytes are blood cells produced by bone marrow and migrate to the
spleen, where they mature and differentiate into mature B lymphocytes, which are
then released into the blood and lymph systems. T cells are also produced in the
bone marrow, but migrate to and mature in the thymus. Both B and T cells can
encounter antigens, proliferate and evolve, and mature into fully functional cells.
Cell-mediated immunity is mediated by T lymphocytes, and humoral immunity is
mediated by secreted antibodies produced in B lymphocytes.
Roughly $10^7$ distinct types of B lymphocytes exist in a human body. B lymphocytes
generate a Y-shaped molecular structure called antibody (Ab) on their surfaces to
recognize and bind to foreign cells (antigens) or malfunctioning self-cells. After
maturation, they have B cell receptors of one specific type on their membrane, called
immunoglobulin (Ig) receptors. When a B cell encounters its specific antigen for the
first time through its Ig receptors and receives additional signals from a helper T cell,
it further differentiates into an effector cell called a plasma cell. Plasma cells, instead
of having Ig receptors, produce antibodies, which lock onto the antigens. Antibodies
bind to antigens on the surfaces of invading pathogens and trigger their destruction.
Phagocytes then destroy the antibody-bound pathogens.
T lymphocytes regulate the production of antibodies from B lymphocytes. T cells
are produced through negative selection. They express unique T-cell receptors. When a T-cell receptor encounters an antigen presented by a major histocompatibility complex (MHC) molecule, the T cell proliferates and produces memory cells.
Lymphocytes normally stay in a passive state until they encounter antigens. After
an infection, the antigen leaves a genetic blueprint memory on B or T lymphocytes
so that each lymphocyte recognizes one type of antigen. Some cloned B cells can
differentiate into B memory cells. Adaptive cells that are not stimulated by any
antigen are eliminated. This phenomenon is called immunological memory. Memory
cells circulate through the body. They live long by costimulating one another in a
way that mimics the presence of the antigen.
When exposed to an antigenic stimulus, B lymphocytes differentiate into plasma cells that are capable of producing high-affinity antibodies for the specific antigen. These new cloned cells undergo high-rate somatic mutation (hypermutation) that will
promote their genetic variation; a mechanism of selective pressure will result in the
survival of cells with increased affinity. An antibody recognizes and eliminates a
specific type of antigen. The damaged antigens are eliminated by scavenger cells
called macrophages. The immune system is required to recognize all cells (or mole-
cules) within the body as self or nonself. A simplified view of the immune system is
illustrated in Figure 10.1.
Memory can also be developed in an artificial manner by means of vaccination.
Vaccines are attenuated live virus or dead pathogenic cells that can activate the
immune system to develop resistance to particular pathogen groups. When a vaccine
is administered, the immune system detects it, develops resistance against the pathogen it contains, and forms memory cells. These memory cells recognize real pathogens
and defend the body before severe damage results.
Figure 10.1 Principle of the immune system: Red shape stands for the antigen, blue ones for
immune system detectors, and green one denotes the antibody. Antigens are eliminated by general-
purpose scavenger cells (macrophages). Reproduced from Figure 1 in [15].
10.2 Immunological Theories
Four main immunological theories are clonal selection [1,3], immune networks [21],
negative selection, and danger theory [23]. The learning and memory mechanisms of
the immune system typically take clonal selection and immune network theories as a
basis, whereas the selection of detectors for identifying anomalous entities is based
on the negative selection theory. The biological immune system has the features of
immunological memory and immunological tolerance.
Artificial immune networks [2,10,29] employ two types of dynamics. The short-
term dynamics govern the concentration of a fixed set of lymphocyte clones and
the corresponding immunoglobulins. The metadynamics govern the recruitment of
new species from an enormous pool of lymphocytes freshly produced by the bone
marrow. The short-term dynamics correspond to a set of cooperating or competing
agents, while the metadynamics refine the results of the short-term dynamics. In this
sense, the short-term dynamics resemble evolution and the metadynamics resemble
learning.
Clonal Selection Theory
In clonal selection theory [3], when immune cells are stimulated by antigens, clonal
proliferation occurs; a large number of clones are generated, and then these clones
differentiate into effector cells and memory cells. Effector cells generate a large number
of antibodies, which duplicate and mutate to make affinities gradually increase and
eventually reach affinity maturation. Clonal selection theory simulates the evolution
of immune cells, which can learn and memorize the modes of antigens.
The antibodies with good affinity value are selected as parents and are led to
proliferation by producing multiple offspring in an asexual manner (mitosis). In
immunology, cloning corresponds to asexual reproduction so that multiple identical
cells can be obtained from a parent by mitosis. These offspring are copies of parent
antibodies, but further undergo affinity maturation. An offspring replaces the parent
only if it has improved its fitness value.
Clonal selection theory describes the basic features of an immune response to an
antigenic stimulus. The clonal operation is a random map on the antibody population induced by affinity, and consists of four steps, namely cloning, clonal crossover, clonal mutation, and clonal selection.
Immune Networks
Immune network theory [21] states that the B cells are interconnected to form a
network. It is an important complement to clonal selection theory. When a B cell
is stimulated by an antigen, the stimulated cell activates other B cells in the net-
work through its paratopes [26]. Cells with close resemblance to one another are
suppressed, and new cells are generated to replace lost cells. Thus, the network can
maintain population diversity and equilibrium. B cells can edit their receptors by
randomly changing the genetic orientation of their receptors. The change may result
in higher affinity between the antigen epitope and the B cell antibody. When a B
cell is first activated, it increases in number; this stimulates the neighboring cell to
suppress the first stimulated antibody. Differential equations are designed to accom-
modate idiotypic interactions, in consideration of antigenic recognition, death of
unstimulated cells, and influx of new cells [12].
Idiotypic network theory is derived from immune network theory. It postulates
that the immune system can be seen as a network in which interactions occur not only between antigens and antibodies, but also between antibodies themselves.
This induces either stimulating or suppressive immune responses. These result in a
series of immunological behaviors, including tolerance and memory emergence.
Negative Selection
Negative selection is a way of differentiating self from nonself. The immune system
destroys all the generated antibodies that are similar to self, in order to avoid self-destructive
immune responses. Negative selection is performed in the thymus, where all T cells
that recognize self-cells are excluded, whereas T cells having less affinity to self-cells
are tolerated and released to the system. The negative selection algorithm mimics
this biological process of generating mature T cells and self-/nonself-discrimination.
This allows the immune system to detect previously unseen harmful cells.
Danger Theory
Danger theory [23,24], proposed by Matzinger in 1994, argues that the objective of
the immune system is not to discriminate between self and nonself, but to react to signs
of damage to the body. It explains how the immune system is able to distinguish nonself antigens from self-antigens: nonself antigens cause biochemical reactions that deviate from the body's normal behavior, and these reactions produce danger signals of different levels. Danger theory thus introduces the environmental state of the body, and can explain some immune phenomena, such as autoimmune diseases.
Danger theory states that the immune system will only respond when damage
is indicated and is actively suppressed otherwise. The immune system is triggered
by a danger signal produced by a necrotic cell which unexpectedly dies due to a
pathogenic infection. When a cell is infected, it establishes a danger zone around
itself to mitigate and localize the impact of the attack. In principle, danger theory
views all cells in the human body as antigens. It relies on the function of dendrite
cells, a family of cells known as macrophages. In nature, dendritic cells are the
intrusion detection agents of the human body, monitoring the tissue and organs for
potential invaders in the form of pathogens.
Signals are collected by dendritic cells from their local environment. Dendritic
cells combine molecular information, interpret this information for the T cells, and control the activation state of T cells in the lymph nodes. A dendritic cell has three states, namely the immature, semi-mature, and mature states.
The immune system produces danger signals in the form of molecules based
on the environmental changes. These molecules are released as a by-product of
unplanned cell death, necrosis. By combining the signals from the tissue, these
cells produce their own output signals to instruct the responder cells of the immune
system to deal with the potential damage. The danger signal creates a danger zone
around itself and immune cells within this zone will be activated to participate in
the immune response. Danger signals are indicators of abnormality. The PAMP
(pathogenic associated molecular patterns) signals are a class of molecules expressed
exclusively by microorganisms such as bacteria. They are processed as environmental
input and are a strong indicator that a non-host-based entity is present. Safe signals
are released as a result of normal, planned cell death during healthy tissue function; this form of cell death is termed apoptosis.
At the beginning of the detection process, the dendritic cells are immature cells residing in the tissue. A dendritic cell collects antigen (body cell protein) paired with the three signal categories from the tissue. Based on the collected input, the dendritic cell evolves from the immature state into either a semi-mature state (apoptotic death) or a mature state (necrotic death). At this phase, the dendritic cell migrates from the tissue to a lymph node. Reaching the mature state indicates that the cell has experienced mostly danger signals throughout its life span and that a harmful antigen has been detected; a danger zone is then established. In the mature state, T cells are activated to release antibodies. A semi-mature state indicates that apoptotic death has occurred as part of normal cell function; semi-mature dendritic cells cannot activate T cells, which are tolerized to the presented antigen.
1. Set t = 0.
2. Initialize a population P of cells (antibodies) and a memory set M = ∅.
3. Repeat:
   a. Selection. Select the n best cells (antibodies) to generate a new population Pn according to the affinity principle.
   b. Cloning. Reproduce a population of clones C from the population Pn. More offspring are produced for higher affinity cells.
   c. Maturation. Hypermutate the cloned cells to create the population C*.
   d. Reselection. Reselect the improved cells from C* and update the memory set M.
   e. Diversity introduction. Replace the Nd lowest-affinity cells in P with Nd newly generated cells.
   f. Set t = t + 1.
until the termination criterion is satisfied.
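As a concrete illustration, the following is a minimal Python sketch of the clonal selection loop above, applied to the two-dimensional Rastrigin function (as in Example 10.1 below); the clone-count formula, the hypermutation schedule, and the parameter names are plausible choices rather than the exact settings of the example.

```python
import numpy as np

def rastrigin(x):
    # f(x) = 10*n + sum(x_i^2 - 10*cos(2*pi*x_i)); global minimum 0 at x = 0
    return 10 * x.size + np.sum(x**2 - 10 * np.cos(2 * np.pi * x))

def clonalg(dim=2, pop_size=50, n_best=20, clone_factor=0.7,
            n_new=5, iters=100, bounds=(-5.12, 5.12), seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    pop = rng.uniform(lo, hi, (pop_size, dim))
    for _ in range(iters):
        fit = np.array([rastrigin(p) for p in pop])
        order = np.argsort(fit)                      # minimization: lower is better
        best, best_fit = pop[order[:n_best]], fit[order[:n_best]]
        clones, clone_fit = [], []
        for rank, (cell, f) in enumerate(zip(best, best_fit)):
            n_clones = max(1, int(clone_factor * pop_size / (rank + 1)))
            for _ in range(n_clones):
                # hypermutation: the perturbation shrinks as the cost approaches zero
                alpha = np.exp(-2.0 / (1.0 + f))
                child = np.clip(cell + alpha * rng.normal(0.0, 1.0, dim), lo, hi)
                clones.append(child)
                clone_fit.append(rastrigin(child))
        clones, clone_fit = np.array(clones), np.array(clone_fit)
        keep = clones[np.argsort(clone_fit)[:pop_size - n_new]]
        newcomers = rng.uniform(lo, hi, (n_new, dim))   # diversity introduction
        pop = np.vstack([keep, newcomers])
    fit = np.array([rastrigin(p) for p in pop])
    return pop[np.argmin(fit)], fit.min()

x_best, f_best = clonalg()
print(x_best, f_best)
```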
Example 10.1: Revisit the Rastrigin function treated in Example 6.1 and Example 14.1. The global optimum is $f(x) = 0$ at $x^* = 0$.
We implement CLONALG with the following parameters: the population size is set as 50, the best population size as 20, the clone size factor as 0.7, and the maximum number of iterations as 100. The initial population is randomly generated from the entire domain. For a random run, we have the optimum solution $f(x) = 1.2612 \times 10^{-6}$ at $(-0.1449 \times 10^{-4}, -0.7840 \times 10^{-4})$. For 10 random runs, the solver always
converged toward a point very close to the global optimum. The evolution of a random
run is illustrated in Figure 10.2. The average cost is the mean of the 20 best solutions,
and it is very close to the optimum solution. CLONALG has very good diversity,
since the clone and mutation operations are applied on the 20 best solutions for
each generation. It continuously searches for the global optimum even after many
iterations.
Figure 10.2 The evolution of a random run of CLONALG for Rastrigin function: the minimum
and average objectives.
The aiNet (artificial immune network) [9] combines CLONALG with immune net-
work theory for solving optimization problems. It is a connectionist, competitive and
constructive network, where the antibodies correspond to the network nodes and the
antibody concentration and affinity are their states. Learning is responsible for the
changes in antibody concentration and affinity. The decision as to which node is to
be cloned, suppressed, or maintained depends on the interaction established by the
immune network using an affinity measure. Learning aims at building a memory set
that recognizes and represents the antigenic spatial distribution. The nodes work as
internal images of ensembles of patterns, and the connection strengths describe the
similarities among these ensembles.
Optimized aiNet (opt-aiNet) [8] adapts aiNet for multimodal optimization prob-
lems by locating and maintaining a memory of multiple optimal solutions. It can
dynamically adjust the population size and maintain stable local optima solutions.
Opt-aiNet represents cells by real-valued vectors in the search space. The initial
population goes through fitness evaluation, cloning, and mutation operations. After
these operations, fitter antibodies from each clone are selected and passed to form
the memory set. This process is repeated until the available population stabilizes
in the local search. When this population reaches a stable state, the cells interact
with one another in a network form, and redundant cells are suppressed: the affinity between two cells is determined by their Euclidean distance, and of two cells whose distance is smaller than the suppression threshold, the worse one is eliminated. Afterward, new antibodies are introduced into the system to encourage exploration of the decision space. Opt-aiNet moves the selection process to the clone level by selecting the elitist cell from each clone. Roughly,
the computational complexity of the algorithm is quadratic in the number of cells in
the network. Opt-aiNet algorithm is described in Algorithm 10.2.
As a member of aiNet family, omni-aiNet [4] presents self-maintenance of diver-
sity in the population, simultaneous search for multiple high-quality solutions, and
dynamical adjustment of its population by adapting to the optimization problem.
The dopt-aiNet algorithm [11] enhances the diversity of the population, and refines
individual solutions to suit dynamic optimization. It introduces a golden-section line search procedure for choosing the best mutation step size, and uses two mutation operators, namely, one-dimensional mutation and gene duplication.
Danger theory-based immune network algorithm [32], named dt-aiNet, introduces
danger theory into aiNet algorithm in order to increase the solution quality and the
population diversity.
1. Set t = 0.
2. Initialize a population P with N cells. Initialize Nc, Ns, σs.
3. Repeat:
   a. for each cell do
      Generate Nc clones.
      Mutate the clones.
      Determine the fitness of each clone.
      Select the best cell among the clones and the parent cell to form the new population.
      end for
   b. Determine the average fitness of the new population.
   c. if clone suppression should be made (t mod Ns == 0)
      Determine the affinity (distance) among all cells.
      Suppress cells according to threshold σs.
      Introduce randomly generated cells.
      end if
   d. Set t = t + 1.
until the termination criterion is satisfied.
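A minimal sketch of the network suppression in step c of the algorithm above, assuming the cells are stored as rows of a NumPy array, lower objective values are better, and σs is the suppression threshold; the function name is illustrative.

```python
import numpy as np

def suppress(cells, fitness, sigma_s):
    """Among cells closer to each other than sigma_s, keep only the best one."""
    order = np.argsort(fitness)            # best (lowest objective) first
    kept = []
    for idx in order:
        # a cell survives only if no already-kept (better) cell lies within sigma_s
        if all(np.linalg.norm(cells[idx] - cells[j]) >= sigma_s for j in kept):
            kept.append(idx)
    return cells[kept], fitness[kept]
```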
Negative selection algorithm [14] is inspired from the negative selection mecha-
nism with the ability to detect unknown antigens. An efficient implementation of the algorithm (for binary strings) runs in linear time in the number of self input patterns [14].
At the beginning, the algorithm treats the profiled normal patterns as self-patterns, which represent the typical properties of the data stream to be protected. Then, it generates a number of random patterns (called detectors) and compares them to each self-pattern to check whether a detector recognizes a self-pattern. If a detector matches a self-pattern, it is discarded; otherwise it is kept as a detector pattern. This process is repeated until sufficient detectors are accumulated. In the monitoring phase, if a detector pattern matches any newly profiled pattern, an anomaly must have occurred, since the data have been corrupted or altered. It is, however, difficult to determine detectors that cover all of the data to be protected.
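A minimal Python sketch of the detector generation and monitoring phases just described, for binary strings; the r-contiguous-bits matching rule and the parameter values are illustrative assumptions, not the specific implementation of [14].

```python
import random

def matches(a, b, r):
    """r-contiguous-bits rule: True if a and b agree on r consecutive positions."""
    run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        if run >= r:
            return True
    return False

def generate_detectors(self_set, n_detectors, length, r, seed=0):
    rng = random.Random(seed)
    detectors = []
    while len(detectors) < n_detectors:
        cand = [rng.randint(0, 1) for _ in range(length)]
        # censoring: discard candidates that match any self pattern
        if not any(matches(cand, s, r) for s in self_set):
            detectors.append(cand)
    return detectors

def monitor(pattern, detectors, r):
    # an anomaly is flagged if any detector matches the newly profiled pattern
    return any(matches(pattern, d, r) for d in detectors)
```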
The negative selection algorithm has been applied to anomaly detection, such as
detecting security violations in computer networks [14,15]. In [20], AIS is applied
to computer security in the form of a network intrusion detection system.
Receptor density algorithm [25] is an AIS developed from models of the immuno-
logical T cell and the T-cell receptor’s ability to contribute to T-cell discrimination. It
is an anomaly detection system for generation of clean signatures. Stochastic analy-
sis of the T-cell mechanism modeling results in a hypothesis for T-cell activation,
which is abstracted to a simplified model retaining key mechanisms. The algorithm
places a receptor at each discretized location within the spectrum. At each time step,
each receptor takes an input and produces a binary classification on whether that
location is considered anomalous.
Danger theory provides inspiration for a robust, highly distributed, adaptive, and
autonomous detection mechanism for early outbreak notification with excellent
detection results. Dendritic cell algorithm [17,19] is a population-based algorithm
inspired by the function of the dendritic cells of the human immune system. It incor-
porates the principles of danger theory in immunology. The algorithm is a multi-
sensor data fusion and correlation algorithm that can perform anomaly detection on
time series datasets.
Dendritic cell algorithm does not require a training phase and knowledge of nor-
mality and anomaly is acquired through statistical analysis. It has a linear computa-
tional complexity, making it ideal for anomaly detection tasks, which require high
detection speed. Dendritic cell algorithm has shown a high detection rate and a low
rate of false alarms.
Each dendritic cell in the population has a set of instructions which is followed
each time a dendritic cell is updated. Each dendritic cell performs its own antigen
sampling and signal collection. It is capable of combining multiple data streams and
can add context to data suspected as anomalous. Diversity is generated by migration
of the dendritic cells. Each dendritic cell can perform fusion of signal input to produce
its own signal output. The assessment of the signal output of the entire population is
used to perform correlation with suspect data items.
In the dendritic cell algorithm, three types of signals are used. The PAMP signal is a confident indicator of anomaly. The danger signal is an indicator of a potential abnormality. The safe signal is a confident indicator of normal, predictable, or steady-state system behavior.
Predefined weights are incorporated for each signal category. The output signals are
used to evaluate the status of the monitored system. By defining the danger zone
to calculate danger signals for each antibody, the algorithm adjusts antibodies’ con-
centrations through its own danger signals and then triggers immune responses of
self-regulation.
The input data is mapped to the underlying problem domain. Signals are rep-
resented as vectors of real-valued numbers. Antigens are categorical values repre-
senting what is to be classified within a problem domain. The algorithm aims to
incorporate a relationship to identify antigens that are responsible for the anomalies
reflected by signals. The algorithm first identifies whether anomalies occurred in
the past based on the input data. Then it correlates the identified anomalies with the
potential causes, generating an anomaly scene per suspect.
The dendrite cell acts as an agent that is responsible for collecting antigen cou-
pled with its three context signals. The antigen represents a record contained in the dataset, and the signals represent the normalized values of the selected attributes. Each
dendrite cell accumulates the changes that occur in the monitored system and deter-
mines which antigen causes the changes. All input signals are transformed into three output signals, namely, the immature (co-stimulatory molecules), mature, and semi-mature states:
$$O_j(x) = \frac{\sum_{i=0}^{3} W_{ij} I_{ij}(x)}{\sum_{i=0}^{3} |W_{ij}|}, \qquad (10.1)$$
where $W = [W_{ij}]$ is the weight matrix, $I = [I_{ij}]$ is the input signal matrix, $O$ is the output signal vector, $i$ is the input signal category, and $j$ is the output signal category.
The dendritic cell samples input signals and antigens multiple times. This is analogous to sampling a series of suspected antigens in the human body, such that the dendritic cell will hold several antigens until it matures. Throughout the sampling process, the experience of each cell accumulates and is documented in the immature ($O_1$), mature ($O_2$), and semi-mature ($O_3$) output signals. The sampling process stops when the cell is ready to migrate. This occurs when $O_1$ reaches the migration threshold, and the cell is then removed from the population for antigen presentation. After migration, the outputs $O_2$ and $O_3$ are compared in order to derive a context for the presented item. The antigen is treated as mature if $O_2 > O_3$ or semi-mature if $O_2 < O_3$. Then, the migrated dendritic cell is replaced with a new cell, which restarts sampling and returns to the population. This process is iterated several times.
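A minimal sketch of the signal fusion in (10.1) together with the migration and context decision just described, for a single dendritic cell; the weight matrix, the fourth (inflammation-like) signal row, and the migration threshold are illustrative placeholders rather than values from the literature.

```python
import numpy as np

# rows: input signal categories (e.g., PAMP, danger, safe, inflammation)
# columns: output signals (immature/CSM, mature, semi-mature); values are placeholders
W = np.array([[2.0,  2.0,  1.0],
              [1.0,  1.0,  0.5],
              [2.0, -3.0,  3.0],
              [1.0,  1.0,  1.0]])

def fuse(I):
    """Equation (10.1): O_j = sum_i W_ij * I_ij / sum_i |W_ij|."""
    return (W * I).sum(axis=0) / np.abs(W).sum(axis=0)

def run_cell(signal_stream, migration_threshold=10.0):
    csm = o_mat = o_semi = 0.0
    for I in signal_stream:                 # I: 4x3 input signal matrix per sample
        o = fuse(np.asarray(I, dtype=float))
        csm, o_mat, o_semi = csm + o[0], o_mat + o[1], o_semi + o[2]
        if csm >= migration_threshold:      # cell migrates and presents its context
            return "mature" if o_mat > o_semi else "semi-mature"
    return "undecided"
```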
A prototype dendritic cell algorithm [17] has been applied to a binary classification
problem which can perform two-class discrimination on an ordered dataset, using
a time stamp as antigen and a combination of features forming the three signal
categories. Deterministic dendritic cell algorithm [18] provides a controllable system
by removing a large amount of randomness from the algorithm.
Problems
References
1. Ada GL, Nossal GJV. The clonal selection theory. Sci Am. 1987;257(2):50–7.
2. Atlan H, Cohen IR. Theories of immune networks. Berlin: Springer; 1989.
3. Burnet FM. The clonal selection theory of acquired immunity. Cambridge, UK: Cambridge
University Press; 1959.
4. Coelho GP, Von Zuben FJ. Omni-aiNet: an immune-inspired approach for omni optimiza-
tion. In: Proceedings of the 5th international conference on artificial immune systems, Oeiras,
Portugal, Sept 2006. p. 294–308.
5. Cutello V, Nicosia G, Pavone M. An immune algorithm with stochastic aging and Kullback
entropy for the chromatic number problem. J Combinator Optim. 2007;14(1):9–33.
6. Dasgupta D. Advances in artificial immune systems. IEEE Comput Intell Mag. 2006;1(4):40–9.
7. de Castro PAD, Von Zuben FJ. BAIS: a Bayesian artificial immune system for the effective
handling of building blocks. Inf Sci. 2009;179(10):1426–40.
8. de Castro LN, Timmis J. An artificial immune network for multimodal function optimization.
In: Proceedings of IEEE congress on evolutionary computation, Honolulu, HI, USA, May
2002, vol. 1, p. 699–704.
9. de Castro LN, Von Zuben FJ. aiNet: an artificial immune network for data analysis. In: Abbass
HA, Sarker RA, Newton CS, editors. Data mining: a heuristic approach. Hershey, USA: Idea
Group Publishing; 2001. p. 231–259.
10. de Castro LN, Von Zuben FJ. Learning and optimization using the clonal selection principle.
IEEE Trans Evol Comput. 2002;6(3):239–51.
11. de Franca FO, Von Zuben FJ, de Castro LN. An artificial immune network for multimodal
function optimization on dynamic environments. In: Proceedings of genetic and evolutionary
computation conference (GECCO), Washington, DC, USA, June 2005. p. 289–296.
12. Engelbrecht AP. Computational intelligence: an introduction. New York: Wiley; 2007.
13. Ferreira C. Gene expression programming: a new adaptive algorithm for solving problems.
Complex Syst. 2001;13(2):87–129.
14. Forrest S, Perelson AS, Allen L, Cherukuri R. Self-nonself discrimination in a computer. In:
Proceedings of IEEE symposium on security and privacy, Oakland, CA, USA, May 1994. p.
202–212.
15. Forrest S, Hofmeyr SA, Somayaji A. Computer immunology. Commun ACM. 1997;40(10):88–
96.
16. Garret SM. Parameter-free, adaptive clonal selection. In: Proceedings of IEEE congress on
evolutionary computation (CEC), Portland, OR, June 2004. p. 1052–1058.
17. Greensmith J, Aickelin U. Dendritic cells for SYN scan detection. In: Proceedings of genetic
and evolutionary computation conference (GECCO), London, UK, July 2007. p. 49–56.
18. Greensmith J, Aickelin U. The deterministic dendritic cell algorithm. In: Proceedings of the
7th International conference on artificial immune systems (ICARIS), Phuket, Thailand, August
2008. p. 291–303.
19. Greensmith J, Aickelin U, Cayzer S. Introducing dendritic cells as a novel immune-inspired
algorithm for anomaly detection. In: Proceedings of the 4th international conference on artificial
immune systems (ICARIS), Banff, Alberta, Canada, Aug 2005. p. 153–167.
20. Hofmeyr SA, Forrest S. Architecture for an artificial immune system. Evol Comput.
2000;8(4):443–73.
21. Jerne NK. Towards a network theory of the immune system. Annales d’Immunologie (Paris).
1974;125C:373–89.
22. Jiao L, Wang L. A novel genetic algorithm based on immunity. IEEE Trans Syst Man Cybern
Part A. 2000;30(5):552–61.
23. Matzinger P. Tolerance, danger and the extended family. Annu Rev Immunol. 1994;12:991–
1045.
24. Matzinger P. The danger model: a renewed sense of self. Science. 2002;296(5566):301–5.
25. Owens NDL, Greensted A, Timmis J, Tyrrell A. T cell receptor signalling inspired kernel
density estimation and anomaly detection. In: Proceedings of the 8th international conference
on artificial immune systems (ICARIS), York, UK, Aug 2009. p. 122–135.
26. Perelson AS. Immune network theory. Immunol Rev. 1989;110:5–36.
27. Smith RE, Forrest S, Perelson AS. Population diversity in an immune system model: implica-
tions for genetic search. In: Whitley LD, editor. Foundations of genetic algorithms, vol. 2. San
Mateo, CA: Morgan Kaufmann Publishers; 1993. p. 153–165.
28. Tang T, Qiu J. An improved multimodal artificial immune algorithm and its convergence
analysis. In: Proceedings of world congress on intelligent control and automation, Dalian,
China, June 2006. p. 3335–3339.
29. Varela F, Sanchez-Leighton V, Coutinho A. Adaptive strategies gleaned from immune networks:
Viability theory and comparison with classifier systems. In: Goodwin B, Saunders PT, editors.
Theoretical biology: epigenetic and evolutionary order (a Waddington Memorial Conference).
Edinburgh, UK: Edinburgh University Press; 1989. p. 112–123.
30. Woldemariam KM, Yen GG. Vaccine-enhanced artificial immune system for multimodal func-
tion optimization. IEEE Trans Syst Man Cybern Part B. 2010;40(1):218–28.
31. Xu X, Zhang J. An improved immune evolutionary algorithm for multimodal function opti-
mization. In: Proceedings of the 6th international conference on natural computing, Haikou,
China, Aug 2007. p. 641–646.
32. Zhang R, Li T, Xiao X, Shi Y. A danger-theory-based immune network optimization algorithm.
Sci World J. 2013: Article ID 810320, 13 p.
11 Ant Colony Optimization
Ants are capable of finding the shortest path between the food and the colony using
a pheromone-laying mechanism. ACO is a metaheuristic optimization approach
inspired by this foraging behavior of ants. This chapter is dedicated to ACO.
11.1 Introduction
Eusociality has evolved independently among the hymenoptera insects (ants and
bees), and among the isoptera insects (termites). These two orders of social insects
have almost identical social structures: populous colonies consisting of sterile work-
ers, often differentiated into castes that are the offspring of one or a few reproductively
competent individuals. This type of social structure is similar to a superorganism, in
which the colony has many attributes of an organism, including physiological and
structural differentiation, coordinated and goal-directed action.
Many ant species display collective foraging behavior. Two representative strategies of ponerine ants are the army-ant-style foraging of the genus Leptogenys and the partitioned space search of Pachycondyla apicalis.
Termite swarms are organized through a complex language of tactile and chem-
ical signals between individual members. These drive the process of recruitment in
response to transient perturbation of the environment. A termite can either experience
a perturbation directly, or is informed of it by other termites. The structures as well
as their construction of the mound of Macrotermes have been made clear in [22].
Swarm cognition in these termites is in the form of extended cognition, whereby
the swarm’s cognitive abilities arise both from interaction among agents within a
swarm, and from the interaction of the swarm with the environment, mediated by
the mound’s dynamic architecture.
Ants are capable of finding the shortest path between the food and the colony (nest)
due to a simple pheromone-laying mechanism. Inspired by the foraging behavior of
ants, ACO is a metaheuristic approach for solving discrete or continuous optimization
problems [1,2,4–6]. Unlike in EAs, PSO, and multiagent systems, where agents do not communicate with one another through the environment, agents in an ant-colony system communicate with one another via pheromone. The optimization is the result of the collective work of
all the ants in the colony.
Ants use their pheromone trails as a medium for communicating information.
All the ants secrete pheromone and contribute to the pheromone reinforcement, and
old trails will vanish due to evaporation. The pheromone builds up on the traversed
links between nodes. An ant selects a link probabilistically based on the intensity of
the pheromone. Ant-Q [3,8] merges ant-colony system with reinforcement learning
such as Q-learning to update the amount of pheromone on the succeeding link. Ants
in the ant-colony system use only one kind of pheromone for their communication,
while natural ants also use haptic information for communication and possibly learn
the environment with their micro brain.
In ACO, simulated ants walk around the graph representing the problem to solve.
ACO has an advantage over SA and GA when the graph changes dynamically. ACO
has been extended to continuous domains without any major conceptual change
to its structure, and has been applied to continuous and mixed discrete-continuous problems [18,19].
11.2 Ant-Colony Optimization
In ACO, the problem is represented as a construction graph whose edges are associated with pheromone trails. Edges can also have an associated heuristic value to represent a
priori information about the problem instance definition or runtime information pro-
vided by a source different from the ants. Once all ants have completed their tours
at the end of each generation, the algorithm updates the pheromone trails. Different
ACO algorithms arise from different pheromone update rules.
The probability for ant $k$ at node $i$ moving to node $j$ at generation $t$ is defined by [5]
$$P_{i,j}^k(t) = \frac{\tau_{i,j}(t)\, d_{i,j}^{-\beta}}{\sum_{u \in J_i^k} \tau_{i,u}(t)\, d_{i,u}^{-\beta}}, \quad j \in J_i^k, \qquad (11.1)$$
where $\tau_{i,j}$ is the intensity of the pheromone on edge $i \to j$, $d_{i,j}$ is the distance between nodes $i$ and $j$, $J_i^k$ is the set of nodes that remain to be visited by ant $k$ positioned at node $i$ to make the solution feasible, and $\beta > 0$. A tabu list is used to save the nodes already visited during each generation. When a tour is completed, the tabu list is used to compute the ant's current solution.
Once all the ants have built their tours, the pheromone is updated on all edges $i \to j$ according to a global pheromone updating rule
$$\tau_{i,j}(t+1) = (1-\rho)\,\tau_{i,j}(t) + \sum_{k=1}^{N_P} \tau_{i,j}^k(t), \qquad (11.2)$$
where $\tau_{i,j}^k$ is the intensity of the pheromone laid on edge $i \to j$ by ant $k$, taking the value $1/L_k$ if ant $k$ passes edge $i \to j$ and 0 otherwise, $\rho \in (0,1)$ is a pheromone decay parameter, $L_k$ is the length of the tour performed by ant $k$, and $N_P$ is the number of ants. Consequently, a shorter tour gets a higher reinforcement. Each edge has a
long-term memory to store the pheromone intensity. In ACO, pheromone evaporation
provides an effective strategy to avoid rapid convergence to local optima and to favor
the exploration of new areas of the search space.
Finally, a pheromone renewal is again implemented by
$$\tau_{i,j}(t+1) \leftarrow \max\{\tau_{\min},\, \tau_{i,j}(t+1)\} \quad \forall (i,j). \qquad (11.3)$$
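A minimal Python sketch of one generation of the basic ant system, implementing the transition rule (11.1), the global update (11.2), and the lower bound (11.3) for a symmetric TSP distance matrix D; the parameter values and the symmetric pheromone deposit are illustrative choices.

```python
import numpy as np

def ant_system_step(D, tau, beta=5.0, rho=0.7, tau_min=1e-4, n_ants=40, rng=None):
    """One generation: each ant builds a tour by (11.1), then (11.2)-(11.3) update tau."""
    rng = rng or np.random.default_rng()
    n = D.shape[0]
    eta = 1.0 / (D + np.eye(n))            # heuristic 1/d; diagonal padded to avoid division by zero
    tours, lengths = [], []
    for _ in range(n_ants):
        tour = [rng.integers(n)]
        unvisited = set(range(n)) - {tour[0]}
        while unvisited:
            i = tour[-1]
            J = np.array(sorted(unvisited))
            w = tau[i, J] * eta[i, J] ** beta          # numerator of (11.1)
            j = rng.choice(J, p=w / w.sum())
            tour.append(j)
            unvisited.remove(j)
        L = sum(D[tour[k], tour[(k + 1) % n]] for k in range(n))
        tours.append(tour)
        lengths.append(L)
    tau *= (1.0 - rho)                                  # evaporation term of (11.2)
    for tour, L in zip(tours, lengths):
        for k in range(n):
            a, b = tour[k], tour[(k + 1) % n]
            tau[a, b] += 1.0 / L                        # deposit 1/L_k on traversed edges
            tau[b, a] += 1.0 / L
    np.maximum(tau, tau_min, out=tau)                   # lower bound (11.3)
    return tours, lengths
```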
Ant-colony system [4] improves on ant system [5]. It applies a pseudorandom-
proportional state transition rule. The global pheromone updating rule is applied only
to edges that belong to the best ant tour, while in ant system, the pheromone update
is performed at a global level by every ant. Ant-colony system also applies a local
pheromone updating rule during the construction of a solution, which is performed
by every ant every time node j is added to the path being built.
Max–min ant system [20] improves ant system by introducing explicit maximum
and minimum trail strengths on the arcs to alleviate the problem of early stagnation.
In both max–min ant system and ant-colony system, only the best ant updates the
trails in each iteration. The two algorithms differ mainly in the way a premature
stagnation of the search is prevented.
A convergence proof to the global optimum, which is applicable to a class of ACO
algorithms that constrain all pheromone values to be not smaller than a given positive lower
bound, is given in [21]. This lower bound prevents the probability of generating any
solution becoming zero. This proof is applicable directly to ant-colony system [4]
and max–min ant system [20]. A short convergence proof for a class of ACO is given
in [21].
In [14], the dynamics of ACO algorithms are analyzed for certain types of per-
mutation problems using a deterministic pheromone update model that assumes an
average expected behavior of the algorithms. In [16], a runtime analysis of a simple ACO algorithm is presented. By deriving lower bounds on the tails of sums of
independent Poisson trials, the effect of the evaporation factor is almost completely
determined and a transition from exponential to polynomial runtime is proved. In
[11], an analysis of ACO convergence time is made based on the absorbing Markov
chain model, and the relationship between convergence time and pheromone rate is
established.
Example 11.1: Consider the TSP for Berlin52 benchmark in TSPlib. Berlin52 pro-
vides coordinates of 52 locations in Berlin, Germany. The length of the optimal tour
is 7542 when using Euclidean distances. In this example, we implement max–min
ant system. The parameters are selected as β = 5, ρ = 0.7. We set the population
size as 40 and the number of iterations as 1000. The best result obtained is 7544.4.
For a random run, the optimal solution is illustrated in Figure 11.1, and the evolution
of a random run is illustrated in Figure 11.2.
1. Set t = 0.
2. Initialize the pheromone matrix T(0) and the number of ants NP.
3. sbest ← Null.
4. Repeat:
   a. Initialize the set of solutions obtained by the ants: Ss(t) ← ∅.
   b. for k = 1, ..., NP do
      i. Ant k builds a solution s ∈ S:
         S ← {1, 2, ..., n}.
         for i = 1 to n do
            Choose item j ∈ S with probability pij.
            S ← S \ {j}.
            Build s from the selected items.
         end for
      ii. if f(s) ≤ f(sbest) or sbest = Null, sbest ← s.
      iii. Ss(t) ← Ss(t) ∪ {s}.
      end for
   c. Update the pheromone T(t) according to Ss(t) and sbest:
      for all (i, j): τij ← (1 − ρ)τij + Δτij.
   d. Set t = t + 1.
until the termination condition is satisfied.
Figure 11.1 The best tour found for the Berlin52 instance.
Figure 11.2 The evolution of a random run of max–min ant system for Berlin52: tour length versus iteration.
API [15] simulates the foraging behavior of Pachycondyla apicalis ants, which
use visual landmarks but not pheromones to memorize the positions and search the
neighborhood of the hunting sites.
Continuous ACO [1,23] generally hybridizes with other algorithms for maintain-
ing diversity. Pheromones are placed on the points in the search space. Each point is
a complete solution, indicating a region for the ants to perform local neighborhood
search. Continuous interacting ant-colony algorithm [7] uses both the pheromone
information and the ants’ direct communications to accelerate the diffusion of infor-
mation. Continuous orthogonal ant-colony algorithm [10] adopts an orthogonal
design method and a global pheromone modulation strategy to enhance the search
accuracy and efficiency.
By analyzing the relationship between the position distribution and the food source
in the process of ant-colony foraging, a distribution model of ant-colony foraging
is proposed in [13], based on which a continuous domain optimization algorithm is
implemented.
Traditional ACO is extended for solving both continuous and mixed discrete–
continuous optimization problems in [18]. ACOR [19] is an implementation of con-
tinuous ACO. In ACOR, an archive of the k best solutions, each with n variables, is maintained and used to construct normal probability density functions, from which the ants sample m new solutions. The m newly generated solutions then replace the worst solutions in the archive. In ACOR, the construction of new solutions by the ants is performed incrementally, variable by variable: an ant generates one variable value at a time, just as it generates one step of a tour in TSP. For a problem with n variables, an ant thus needs n steps to construct a solution, just as it needs n steps to construct a Hamiltonian cycle in TSP. ACOR is quite similar
to CMA and EDA. Similar realizations of this type are reported in [17].
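The following fragment gives a rough sketch of this archive-based sampling (a simplified reading of continuous ACO, not the exact formulation of [19]): a guiding archive solution is selected according to its weight, and each variable of the new solution is drawn from a normal distribution centered on the corresponding value of that solution, with a spread taken from the archive. The parameter name xi and the weight vector are assumptions of this sketch.

```python
import numpy as np

def acor_sample(archive, weights, xi=0.85, rng=None):
    """Sample one new solution, variable by variable, from a solution archive.

    archive: (k, n) array of the k best solutions kept so far.
    weights: (k,) selection probabilities (better solutions get larger weights).
    xi:      width parameter controlling the standard deviation (assumed name).
    """
    rng = rng or np.random.default_rng()
    k, n = archive.shape
    l = rng.choice(k, p=weights)                  # choose one guiding archive solution
    x_new = np.empty(n)
    for j in range(n):                            # build the solution variable by variable
        sigma = xi * np.sum(np.abs(archive[:, j] - archive[l, j])) / (k - 1)
        x_new[j] = rng.normal(archive[l, j], sigma)
    return x_new
```

Calling acor_sample m times, evaluating the results, and keeping the k best of the union with the old archive reproduces the archive update described above.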
Problems
11.1 Given an ant-colony system with four cities, and that the kth ant is in city 1 and
$$P_{11}^k = 0, \quad P_{12}^k = 1/4, \quad P_{13}^k = 1/4, \quad P_{14}^k = 1/2.$$
What is the probability of the kth ant proceeding to each of the four cities?
11.2 TSP consists in finding a Hamiltonian circuit of minimum cost on an edge-weighted graph G = (N, E), where N is the set of nodes and E is the set of edges. Let x_ij(s) be a binary variable taking the value 1 if edge ⟨i, j⟩ is included in the tour, and 0 otherwise. Let c_ij be the cost associated with edge ⟨i, j⟩. The goal is to find such a tour that minimizes the function
$$f(s) = \sum_{i \in N} \sum_{j \in N} c_{ij} x_{ij}(s).$$
References
1. Bilchev G, Parmee IC. The ant colony metaphor for searching continuous design spaces. In:
Fogarty TC, editor. Proceedings of AISB workshop on evolutionary computing, Sheffield, UK,
April 1995, vol. 993 of Lecture notes in computer science. London: Springer; 1995. p. 25–39.
2. Dorigo M, Di Caro G, Gambardella LM. Ant algorithms for discrete optimization. Artif Life.
1999;5(2):137–72.
3. Dorigo M, Gambardella LM. A study of some properties of Ant-Q. In: Proceedings of the 4th
international conference on parallel problem solving from nature (PPSN IV), Berlin, Germany,
September 1996. p. 656–665.
4. Dorigo M, Gambardella LM. Ant colony system: a cooperative learning approach to the trav-
eling salesman problem. IEEE Trans Evol Comput. 1997;1(1):53–66.
5. Dorigo M, Maniezzo V, Colorni A. Positive feedback as a search strategy. Technical Report 91-016, Dipartimento di Elettronica, Politecnico di Milano, Milan, Italy; 1991.
6. Dorigo M, Stutzle T. Ant colony optimization. Cambridge: MIT Press; 2004.
7. Dreo J, Siarry P. Continuous interacting ant colony algorithm based on dense heterarchy. Future
Gener Comput Syst. 2004;20(5):841–56.
8. Gambardella LM, Dorigo M. Ant-Q: a reinforcement learning approach to the traveling sales-
man problem. In: Proceedings of the 12th international conference on machine learning, Tahoe
City, CA, USA, July 1995. p. 252–260.
9. Hu X-M, Zhang J, Chung HS-H, Li Y, Liu O. SamACO: variable sampling ant colony
optimization algorithm for continuous optimization. IEEE Trans Syst Man Cybern Part B.
2010;40:1555–66.
10. Hu X-M, Zhang J, Li Y. Orthogonal methods based ant colony search for solving continuous
optimization problems. J Comput Sci Technol. 2008;23(1):2–18.
11. Huang H, Wu C-G, Hao Z-F. A pheromone-rate-based analysis on the convergence time of
ACO algorithm. IEEE Trans Syst Man Cybern Part B. 2009;39(4):910–23.
12. Liao T, Socha K, Montes de Oca MA, Stutzle T, Dorigo M. Ant colony optimization for
mixed-variable optimization problems. IEEE Trans Evol Comput. 2013;18(4):503–18.
13. Liu L, Dai Y, Gao J. Ant colony optimization algorithm for continuous domains based on
position distribution model of ant colony foraging. Sci World J. 2014; 2014:9 p. Article ID
428539.
14. Merkle D, Middendorf M. Modeling the dynamics of ant colony optimization. Evol Comput.
2002;10(3):235–62.
15. Monmarche N, Venturini G, Slimane M. On how Pachycondyla apicalis ants suggest a new
search algorithm. Future Gener Comput Syst. 2000;16(9):937–46.
16. Neumann F, Witt C. Runtime analysis of a simple ant colony optimization algorithm. In:
Proceedings of the 17th international symposium on algorithms and computation, Kolkata,
India, December 2006. vol. 4288 of Lecture notes in computer science. Berlin: Springer; 2006.
p. 618–627.
17. Pourtakdoust SH, Nobahari H. An extension of ant colony system to continuous optimization
problems. In: Proceedings of the 4th international workshop on ant colony optimization and
swarm intelligence (ANTS 2004), Brussels, Belgium, September 2004. p. 294–301.
18. Socha K. ACO for continuous and mixed-variable optimization. In: Proceedings of the 4th
international workshop on ant colony optimization and swarm intelligence (ANTS 2004),
Brussels, Belgium, September 2004. p. 25–36.
19. Socha K, Dorigo M. Ant colony optimization for continuous domains. Eur J Oper Res.
2008;185(3):1115–73.
20. Stutzle T, Hoos HH. The MAX-MIN ant system and local search for the traveling salesman
problem. In: Proceedings of IEEE international conference on evolutionary computation (CEC),
Indianapolis, IN, USA, April 1997. p. 309–314.
21. Stutzle T, Dorigo M. A short convergence proof for a class of ant colony optimization algo-
rithms. IEEE Trans Evol Comput. 2002;6(4):358–65.
22. Turner JS. Termites as models of swarm cognition. Swarm Intell. 2011;5:19–43.
23. Wodrich M, Bilchev G. Cooperative distributed search: the ants’ way. Control Cybern.
1997;26(3):413–46.
Bee Metaheuristics
12
This chapter introduces various algorithms that are inspired by the foraging, mating,
fertilization, and communication behaviors of honey bees. Artificial bee colony
(ABC) algorithm and marriage in honeybees optimization are described in detail.
12.1 Introduction
In nature, although each bee performs only a single task, the colony as a whole can complete complex work, such as hive building and pollen harvesting, through a variety of forms of communication between bees, such as the waggle dance and specific odors [51]. A number of optimization algorithms are inspired by the intelligent behavior of
honey bees, such as artificial bee colony (ABC) [27], bee colony optimization [57],
bees algorithm [48], and bee nectar search optimization [7].
A bee performing the waggle dance crawls along a straight line while swinging its abdomen, then turns and loops back, tracing a figure eight; the angle between the direction of gravity and the central axis of the dance equals the angle between the sun and the food source. The waggle dance can also convey information about the distance and direction of the food sources. The nature and duration of a waggle dance depend on the nectar content of the food source. Based on the information delivered by the waggle dance, each bee in the hive selects a food source to search for nectar, or investigates new food sources around the hive [54]. Through this kind of information exchange and learning, the colony tends to concentrate on the most profitable nectar sources. Following a visit to a nectar-rich inflorescence, a bee will fly a short distance to the next inflorescence while maintaining its direction; this is believed to avoid revisiting a site that it has depleted. When an inflorescence provides poor rewards, the bee will extend its flight and increase its turn angles to move away from the area.
Initially, some scout bees search the region around the hive for food. After the
search, they return to the hive and inform other bees of the locations, quantity and
quality of food sources. In case they have discovered nectar, they will dance in the
so-called dance floor area of the hive, to advertise food locations so as to encourage
the other bees to follow them. If a bee decides to leave the hive and collect nectar,
it will follow one of the dancing scout bees to the destination. Upon arriving at the
food source, the foraging bee takes a load of nectar and returns to the hive, passing
the nectar to a food storer. It can abandon the food location and return to its role of an
uncommitted follower, or continue with the foraging behavior, or recruit other bees
by dancing before returning to the food location. Several bees may attempt to recruit
other bees at the dance floor area simultaneously. The process continues repeatedly,
while bees accumulate nectar and explore new areas with potential food sources.
The essential components of a colony are food sources, unemployed foragers and
employed foragers [27]. Unemployed foragers can be either onlookers or scouts.
They are continually looking for a food source to exploit. Scout bees perform exploration, whereas employed and onlooker bees perform exploitation.
• Employed bees are those that are presently exploiting a food source. They bring
loads of nectar from the food sources to the hive and share the information (via
waggle dance) about food sources with onlooker bees. They carry information
about a particular source, and share this information with certain probability.
• Onlookers are those that search for a better food source in the neighborhood of
the memorized food sources based on the information from the employed bees.
Onlookers wait in the dance area of the hive for the information from the employed
bees about the food sources. They watch the dance of the employed bees, and then
choose a food source.
• Scout bees are those that are randomly looking for new food sources in the vicinity
of the hive without any prior knowledge. The percentage of scout bees varies from 5 to 30 % according to the information available in the hive [51].
Onlooker bees observe numerous dances before choosing a food source with a prob-
ability proportional to the nectar content of that food source. Therefore, good food
sources attract more bees than bad ones. Whenever a bee, whether it is a scout or
an onlooker, finds a food source it becomes employed. Whenever a food source is
completely exhausted, all the employed bees associated with it leave, and can again
become scouts or onlookers.
A typical bee colony is composed of the queen, drones (male bees), and workers (female bees). The queen lives for a couple of years and is the only mother of the colony. She is the only bee capable of laying eggs. Drones are produced from
unfertilized eggs and are the fathers of the colony; their number is around a couple of hundred. Worker bees are produced from fertilized eggs, and they perform all the tasks in the colony, such as feeding the colony and the queen, maintaining broods, building combs, and collecting food; their number is around 10–60 thousand.
The mating flight happens only once during the life of the queen. Mating starts with the dance of the queen; drones follow and mate with the queen during the flight. Whether a drone mates with the queen depends on the queen's speed and the drone's fitness. The sperm of the drones is stored in the spermatheca of the queen, where the gene pool of future generations is created. The queen has a certain amount of energy at the start of the flight and returns to the nest when her energy falls to a minimum or when her spermatheca is full. After going back to the nest, broods are generated, and these are improved by the worker bees via crossover and mutation. The queen lays approximately two thousand fertilized eggs a day (two hundred thousand a year). After the spermatheca is discharged, she lays unfertilized eggs [45].
ABC associates employed bees with food sources (solutions). Unlike in real bee colonies, there is a one-to-one correspondence between employed bees and food sources; that is, the number of food sources equals the number of employed bees.
Example 12.1: The Easom function is treated in Examples 2.1, 3.4, and 5.2. Here
we solve this same problem by using ABC. The global minimum value is −1 at
x = (π, π )T .
By setting the maximum number of search cycles as 200, the bee colony size as
100, the local search abandoning limit as 2000, the implementation always finds a
solution close to the global optimum.
For a random run, we have f (x) = −0.9988 at (3.1687, 3.1486) with 9000 func-
tion evaluations. All the individuals converge toward the global optimum. For 10
random runs, the solver always converged to the global optimum within 200 search
cycles. For a random run, the evolution is shown in Figure 12.1, and the evolution
of the best solution at each cycle is shown in Figure 12.2. Note that in Figure 12.2, we show only a small region of the domain for illustration purposes.
Figure 12.1 The evolution of a random run of ABC for the Easom function: the minimum and
average objectives.
Figure 12.2 The evolution of the best solution at each cycle for a random run of ABC for the
Easom function.
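For readers who want to reproduce such a run, the following Python sketch applies a plain ABC loop to the Easom function, using the standard one-dimension neighborhood move, fitness-based onlooker selection, and scout replacement after a trial limit. The default parameter values below are illustrative and are not those used in Example 12.1.

```python
import numpy as np

def easom(x):
    # Easom function; global minimum -1 at (pi, pi).
    return -np.cos(x[0]) * np.cos(x[1]) * np.exp(-((x[0] - np.pi)**2 + (x[1] - np.pi)**2))

def abc_minimize(f, bounds, n_food=50, limit=100, cycles=200, rng=None):
    rng = rng or np.random.default_rng()
    lo, hi = np.array(bounds[0], float), np.array(bounds[1], float)
    dim = lo.size
    foods = rng.uniform(lo, hi, (n_food, dim))       # one food source per employed bee
    fit = np.array([f(x) for x in foods])
    trials = np.zeros(n_food, int)

    def neighbor(i):
        # Standard ABC move: perturb one randomly chosen dimension relative to
        # another food source k != i.
        k = rng.choice([j for j in range(n_food) if j != i])
        j = rng.integers(dim)
        v = foods[i].copy()
        v[j] += rng.uniform(-1, 1) * (foods[i][j] - foods[k][j])
        return np.clip(v, lo, hi)

    def greedy(i, v):
        fv = f(v)
        if fv < fit[i]:
            foods[i], fit[i], trials[i] = v, fv, 0
        else:
            trials[i] += 1

    for _ in range(cycles):
        for i in range(n_food):                      # employed bee phase
            greedy(i, neighbor(i))
        prob = (fit.max() - fit + 1e-12)             # onlookers prefer better (lower) objectives
        prob /= prob.sum()
        for i in rng.choice(n_food, n_food, p=prob): # onlooker bee phase
            greedy(i, neighbor(i))
        worn = np.argmax(trials)                     # scout phase: abandon an exhausted source
        if trials[worn] > limit:
            foods[worn] = rng.uniform(lo, hi)
            fit[worn] = f(foods[worn])
            trials[worn] = 0
    best = np.argmin(fit)
    return foods[best], fit[best]

# Example: x_best, f_best = abc_minimize(easom, ([-100, -100], [100, 100]))
```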
Due to roulette-wheel selection in the onlooker phase, ABC suffers from some inher-
ent drawbacks like slow or premature convergence when dealing with certain com-
plex models [28,54]. Boltzmann selection mechanism is employed instead in [24]
for improving the convergence ability of ABC.
Intermediate ABC [53] modifies the structure of ABC. The potential food sources are generated using intermediate positions between uniformly generated random positions and positions generated by opposition-based learning. Intermediate ABC is further modified by guiding the bees toward the best food location in the population to improve the convergence rate.
Hybrid simplex ABC [25] combines ABC with Nelder–Mead simplex method to
solve inverse analysis problems. Memetic ABC proposed in [20] hybridizes ABC
with two local search heuristics: the Nelder-Mead algorithm and the random walk
with direction exploitation.
Interactive ABC [58] introduces the Newtonian law of universal gravitation into the onlooker phase of ABC, again in order to modify roulette-wheel selection. Gbest-
guided ABC [62] incorporates the gbest solution into the solution search equation. In
[11], different chaotic maps are used for parameter adaptation in order to improve the
convergence characteristics and to prevent ABC from getting stuck in local solutions.
In ABC, only one dimension of the food source position is updated by the
employed or onlooker bees. In order to accelerate the convergence, in ABC with
modification rate [4], a control parameter called modification rate (in [0, 1]) is intro-
duced to decide whether a dimension will be updated. If a random number drawn for dimension j is less than the modification rate, that dimension is modified; at least one dimension is always updated. A lower modification rate may cause solutions to improve slowly, while a higher value may cause too much diversification in the population.
The undirected random search in ABC causes slow convergence of the algorithm
to the optimum or near optimum. Directed ABC [35] adds directional information for
each dimension of each food source position to ABC. The directions of information
for all dimensions are initially set to 0. If the new solution is better than the old one, the direction information is updated: if the previous value of the dimension is less than the current value, the direction information of this dimension is set to −1; otherwise it is set to 1. If the new solution is worse than the old one, the direction information of the dimension is set to 0. The direction information
of each dimension of each food source position is used. Directed ABC is better than
ABC and ABC with modification rate in terms of solution quality and convergence
rate.
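A minimal sketch of this bookkeeping is given below. The sign convention for recording a successful move follows the description above, while the way the stored sign is applied to the next move, and all variable names, are illustrative assumptions rather than the exact scheme of [35].

```python
import numpy as np

def directed_move(x_i, x_k, direction_i, rng=None):
    """Propose a new position for food source x_i using stored direction information.

    direction_i[j] in {-1, 0, +1} for each dimension j; 0 falls back to the
    undirected ABC move, while +1/-1 bias the sign of the step (illustrative use).
    """
    rng = rng or np.random.default_rng()
    j = rng.integers(x_i.size)                    # only one dimension is updated
    v = x_i.copy()
    diff = x_i[j] - x_k[j]
    if direction_i[j] == 0:
        v[j] += rng.uniform(-1, 1) * diff         # undirected random search
    else:
        v[j] += direction_i[j] * rng.uniform(0, 1) * abs(diff)
    return v, j

def update_direction(direction_i, j, old_val, new_val, improved):
    # Bookkeeping described in the text: remember the sign after a success,
    # reset to undirected search (0) after a failure.
    if improved:
        direction_i[j] = -1 if old_val < new_val else 1
    else:
        direction_i[j] = 0
```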
ABC is excellent in exploration but poor in exploitation. Gaussian bare-bones
ABC [61] designs a search equation based on utilizing the global best solution.
The generalized opposition-based learning strategy is employed to generate new
food sources for scout bees. In [40], exploitation is improved by integrating the
information of previous best solution into the search equation for employed bees
and global best solution into the update equation for onlooker bees. S-type adaptive
scaling factors are introduced in the search equation of employed bees. The search
policy of scout bees is modified to update food source in each cycle in order to
increase diversity and stochasticity of the bees.
In [8], ABC is modified by replacing the employed bee operator by a hill-climbing optimizer, controlled by a hill-climbing rate, in order to strengthen its exploitation capability. The algorithm is applied to the nurse rostering problem.
ABC uses a differential position update rule. When food sources cluster around similar points in the search space, this rule can cause stagnation during the search process. Distribution-based update rule for ABC [9]
uses the mean and standard deviation of the selected two food sources to obtain a
new candidate solution. This effectively overcomes stagnation behavior.
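A minimal sketch of such a distribution-based candidate generation, assuming the mean and standard deviation are computed directly from the two selected food sources (a simplification of [9], with our own function and variable names):

```python
import numpy as np

def distribution_update(food_i, food_k, food_m, rng=None):
    """Generate a candidate from the statistics of two selected food sources.

    food_k and food_m are two food sources selected from the population;
    food_i is the source being updated (kept for the unchanged dimensions).
    """
    rng = rng or np.random.default_rng()
    j = rng.integers(food_i.size)                     # modify one randomly chosen dimension
    mean = 0.5 * (food_k[j] + food_m[j])
    std = np.std([food_k[j], food_m[j]]) + 1e-12      # avoid a degenerate zero-width Gaussian
    candidate = food_i.copy()
    candidate[j] = rng.normal(mean, std)              # sample instead of a differential move
    return candidate
```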
Rosenbrock ABC [26] combines Rosenbrock’s rotational direction method with
ABC. In [18], two variants of ABC apply new methods for the position update of
the artificial bees. An improved version of ABC [50] uses mutation based on Levy
probability distributions.
In [29], ABC is extended for solving constrained optimization problems. In [13],
an improved version of ABC is proposed for constrained optimization problems. In
[43], an algorithm is introduced based on ABC to solve constrained real-parameter
optimization problems, in which a dynamic tolerance control mechanism for equality
constraints is added to the algorithm in order to facilitate the approach to the fea-
sible region of the search space. In a modified ABC algorithm [55] for constrained
problems, a smart bee having memory is employed to keep the location and quality
of food sources.
Quick ABC [32] models the behavior of onlooker bees more accurately and improves the local search ability of standard ABC; its performance is analyzed on a set of benchmark problems as a function of the neighborhood radius. ABC with memory [38] equips the artificial bees with a memory mechanism so that they can memorize their previous successful foraging experiences. ABC with memory outperforms ABC and quick ABC.
Opposition-based Levy flight ABC [52] incorporates Levy flight random-walk-
based local search strategy with ABC along with opposition-based learning strategy.
It outperforms basic ABC, gbest-guided ABC [62], best-so-far ABC [10] and a
modified ABC [4] in most of the experiments.
Binary versions of ABC are available for binary optimization problems [34,47].
Discrete ABC [34] uses a differential expression which employs a measure of dis-
similarity between binary vectors in place of the vector subtraction operator used in
ABC. In [47], the binary ABC is based on genetic operators such as crossover and swap; by integrating the neighborhood searching mechanism of basic ABC, it improves the global–local search ability of basic ABC in the binary domain.
In [37], concepts of inertia weight and acceleration coefficients from PSO have
been utilized to improve the search process of ABC. In [31], a combinatorial ABC
is introduced for traveling salesman problems. ABC programming is applied to
symbolic regression in [33].
In another ABC for binary optimization [36], artificial bees work on the continuous
solution space, and the obtained food source position is converted to binary values
before the objective function is evaluated.
where α ∈ [0, 1] and γ is the amount of energy reduction in each pass. The algorithm
[1] is shown in Algorithm 12.2.
During the backward pass, all bees share their solutions using waggle dance. Each
bee decides, with certain probability, whether to keep its solution or not: a bee with
better solution has a higher chance of keeping and advertising its solution. The bees
that are loyal to their partial solutions are called recruiters. Every remaining bee has
to decide whether to continue to explore its own solution in the next forward pass or
to start exploring the neighborhood of one of the solutions advertised. The followers
have to choose a bee to follow and adopt its solution. Selection of a recruiter is made
probabilistically. Once a solution is abandoned, the bee becomes uncommitted, and
has to select one of the advertised solutions probabilistically, in such a way that
better advertised solutions have higher chances to be chosen for further exploration.
Within each backward pass, all bees are divided into two groups (R recruiters and the remaining uncommitted bees). The number of components is calculated in such a way that one
iteration of bee colony optimization is completed after NC forward/backward passes.
At the end of the forward pass the new (partial or complete) solution is generated
for each bee.
Bee colony optimization is good at exploration but weak at exploitation. Weighted
bee colony optimization [44] improves the exploitation power by allowing the bees
to search in the solution space deliberately while considering policies to share the
attained information about the food sources heuristically. It considers global and
local weights for each food source, where the former is the rate of popularity of a
given food source in the swarm and the latter is the relevancy of a food source to a
category label. To preserve diversity in the population, new policies are embedded in
the recruiter selection stage to ensure that uncommitted bees follow the most similar
committed ones.
Other approaches that simulate the behavior of bees are virtual bee algorithm [60],
beehive algorithm [59], bee swarm optimization [19], bees algorithm [48], honey bee
colony algorithm [15], beehive model [46], and honey bee social foraging algorithm
[49].
Virtual bee algorithm [60] associates the population of bees with a memory and a food source, and the bees communicate via a waggle dance procedure. A swarm of virtual bees is generated and allowed to move randomly in the phase space; the bees interact when they find some target nectar. Nectar sources
correspond to the encoded values of the function. The solution can be obtained from
the intensity of bee interactions.
Bees algorithm [48] mimics the foraging behavior of honey bees. It performs
neighborhood search combined with random search and can be used for both combi-
natorial and continuous optimization. A population of initial solutions (food sources)
are randomly generated. Then, the bees are assigned to the solutions based on their
fitness function. The bees return to the hive and based on their food sources, a number
of bees are assigned to the same food source in order to find a better neighborhood
solution. Each bee is represented as an individual whose behavior is regulated by a
behavior-control structure.
Beehive algorithm [59] is inspired by the communication in the hive of honey bees and has been applied to routing in networks. It uses a protocol inspired by the dance language and foraging behavior of bees: the behavior of each bee is determined by the internal and external information available to it and by its motivational state, according to a set of rules that is identical for every bee. Since the perceptible environment differs for bees at different spatial locations, their behavior also differs. Bees can also show different behaviors owing to differences in their foraging experience and/or their motivational state.
Bee swarm optimization [5,19] uses a modified formula for different phases of
ABC. Different types of flying patterns are introduced to maintain proper balance
between global and local search by providing diversity into the swarm of bees.
Penalty and repulsion factors are introduced to mitigate stagnation. In bees swarm
optimization, initially a bee finds an initial solution (food source) and from this
solution the other solutions are produced with certain strategies. Then, every bee is assigned to a solution, and when the bees finish their search, they communicate among themselves with a waggle dance strategy, and the best solution becomes the new reference solution. A tabu list is used to avoid cycling.
Bee collecting pollen algorithm [41] is a metaheuristic optimization algorithm
for discrete problems such as TSP, inspired by the pollen-collecting behavior of
honeybees.
Wasp swarm optimization [16,17] is a heuristic stochastic method for solving dis-
crete optimization problems. It mimics the behavior of a wasp colony, in particular,
the assignment of resources to individual wasps is based on their social status. For
example, if the colony has to fight a war against an enemy colony, then wasp sol-
diers will receive more food than others. Generally, the method assigns resources
to individual solution components stochastically, where the probabilities depend
on the strength of each option. The function for computing this strength is highly
application-dependent. In [16], a stochastic tournament mechanism is used to pick a
solution based on the probabilities calculated from the given strengths. The algorithm
needs to decide the application-specific strength function and the way to stochasti-
cally pick options.
Problems
References
1. Abbass HA. MBO: Marriage in honey bees optimization—a haplometrosis polygynous swarm-
ing approach. In: Proceedings of the IEEE congress on evolutionary computation (CEC2001),
Seoul, Korea, May 2001. p. 207–214.
2. Afshar A, Bozog Haddad O, Marino MA, Adams BJ. Honey-bee mating optimization (HBMO)
algorithm for optimal reservoir operation. J Frankl Inst. 2007;344:452–462.
3. Akay B, Karaboga D. Parameter tuning for the artificial bee colony algorithm. In: Proceedings
of the 1st international conference on computational collective intelligence (ICCCI): Semantic
web, social networks and multiagent systems, Wroclaw, Poland, October 2009. p. 608–619.
4. Akay B, Karaboga D. A modified artificial bee colony algorithm for real-parameter optimiza-
tion. Inf Sci. 2012;192:120–42.
5. Akbari R, Mohammadi A, Ziarati K. A novel bee swarm optimization algorithm for numerical
function optimization. Commun Nonlinear Sci Numer Simul. 2010;15:3142–55.
6. Alam MS, Ul Kabir MW, Islam MM. Self-adaptation of mutation step size in artificial bee
colony algorithm for continuous function optimization. In: Proceedings of the 13th international
conference on computer and information technology (ICCIT), Dhaka, Bangladesh, December
2010. p. 69–74.
7. Alfonso W, Munoz M, Lopez J, Caicedo E. Optimización de funciones inspirada en el compor-
tamiento de búsqueda de néctar en abejas. In: Congreso Internacional de Inteligenicia Com-
putacional (CIIC2007), Bogota, Colombia, September 2007.
8. Awadallah MA, Bolaji AL, Al-Betar MA. A hybrid artificial bee colony for a nurse rostering
problem. Appl Soft Comput. 2015;35:726–39.
9. Babaoglu I. Artificial bee colony algorithm with distribution-based update rule. Appl Soft
Comput. 2015;34:851–61.
10. Banharnsakun A, Achalakul T, Sirinaovakul B. The best-so-far selection in artificial bee colony
algorithm. Appl Soft Comput. 2011;11(2):2888–901.
11. Bilal A. Chaotic bee colony algorithms for global numerical optimization. Expert Syst Appl.
2010;37:5682–7.
12. Brajevic I, Tuba M, Subotic M. Improved artificial bee colony algorithm for constrained prob-
lems. In: Proceedings of the 11th WSEAS International conference on evolutionary computing,
world scientific and engineering academy and society (WSEAS), Stevens Point, WI, USA, June
2010. p. 185–190.
13. Brajevic I, Tuba M, Subotic M. Performance of the improved artificial bee colony algorithm
on standard engineering constrained problems. Int J Math Comput Simul. 2011;5(2):135–43.
14. Chang HS. Converging marriage in honey-bees optimization and application to stochastic
dynamic programming. J Glob Optim. 2006;35(3):423–41.
15. Chong CS, Low MYH, Sivakumar AI, Gay KL. A bee colony optimization algorithm to job
shop scheduling. In: Proceedings of the winter simulation conference, Monterey, CA, USA,
December 2006. p. 1954–1961.
16. Cicirello VA, Smith SF. Improved routing wasps for distributed factory control. In: Proceedings
of IJCAI workshop on artificial intelligence and manufacturing, Seattle, WA, USA, August
2001. p. 26–32.
17. Cicirello VA, Smith SF. Wasp-like agents for distributed factory coordination. Auton Agents
Multi-Agent Syst. 2004;8:237–66.
18. Diwold K, Aderhold A, Scheidler A, Middendorf M. Performance evaluation of artificial bee
colony optimization and new selection schemes. Memetic Comput. 2011;3:149–62.
19. Drias H, Sadeg S, Yahi S. Cooperative bees swarm for solving the maximum weighted satisfi-
ability problem. In: Computational intelligence and bioinspired systems, vol. 3512 of Lecture
notes in computer science. Berlin: Springer; 2005. p. 318–325.
20. Fister I, Fister Jr I, Zumer JB. Memetic artificial bee colony algorithm for large-scale global
optimization. In: Proceedings of IEEE congress on evolutionary computation (CEC), Brisbane,
Australia, June 2012. p. 1–8.
21. Gao W, Liu S. Improved artificial bee colony algorithm for global optimization. Inf Process
Lett. 2011;111(17):871–82.
22. Gao WF, Liu SY. A modified artificial bee colony algorithm. Comput Oper Res.
2012;39(3):687–97.
23. Haddad OB, Afshar A, Marino MA. Honey-bees mating optimization (HBMO) algo-
rithm: a new heuristic approach for water resources optimization. Water Resour Manage.
2006;20(5):661–80.
24. Haijun D, Qingxian F. Artificial bee colony algorithm based on Boltzmann selection policy.
Comput Eng Appl. 2009;45(31):53–5.
25. Kang F, Li J, Xu Q. Structural inverse analysis by hybrid simplex artificial bee colony algo-
rithms. Comput Struct. 2009;87(13):861–70.
26. Kang F, Li J, Ma Z. Rosenbrock artificial bee colony algorithm for accurate global optimization
of numerical functions. Inf Sci. 2011;181:3508–31.
27. Karaboga D. An Idea based on honey bee swarm for numerical optimization. Technical Report,
Erciyes University, Engineering Faculty Computer Engineering Department, Erciyes, Turkey,
2005.
28. Karaboga D, Akay B. A comparative study of artificial bee colony algorithm. Appl Math
Comput. 2009;214:108–32.
29. Karaboga D, Basturk B. A powerful and efficient algorithm for numerical function optimization:
artificial bee colony (ABC) algorithm. J Glob Optim. 2007;39(3):459–71.
30. Karaboga D, Basturk B. On the performance of artificial bee colony (ABC) algorithm. Appl
Soft Comput. 2008;8(1):687–97.
31. Karaboga D, Gorkemli B. A combinatorial artificial bee colony algorithm for traveling salesman
problem. In: Proceedings of international symposium on innovations in intelligent systems and
applications (INISTA), Istanbul, Turkey, June 2011. p. 50–53.
32. Karaboga D, Gorkemli B. A quick artificial bee colony (qABC) algorithm and its performance
on optimization problems. Appl Soft Comput. 2014;23:227–38.
33. Karaboga D, Ozturk C, Karaboga N, Gorkemli B. Artificial bee colony programming for
symbolic regression. Inf Sci. 2012;209:1–15.
34. Kashan MH, Nahavandi N, Kashan AH. DisABC: a new artificial bee colony algorithm for
binary optimization. Appl Soft Comput. 2012;12:342–52.
35. Kiran MS, Findik O. A directed artificial bee colony algorithm. Appl Soft Comput.
2015;26:454–62.
36. Kiran MS. The continuous artificial bee colony algorithm for binary optimization. Appl Soft
Comput. 2015;33:15–23.
37. Li G, Niu P, Xiao X. Development and investigation of efficient artificial bee colony algorithm
for numerical function optimization. Appl Soft Comput. 2012;12:320–32.
38. Li X, Yang G. Artificial bee colony algorithm with memory. Appl Soft Comput. 2016;41:362–
72.
39. Liu Y, Passino KM. Biomimicry of social foraging bacteria for distributed optimization: models,
principles, and emergent behaviors. J Optim Theor Appl. 2002;115(3):603–28.
40. Liu J, Zhu H, Ma Q, Zhang L, Xu H. An artificial bee colony algorithm with guide of global and
local optima and asynchronous scaling factors for numerical optimization. Appl Soft Comput.
2015;37:608–18.
41. Lu X, Zhou Y. A novel global convergence algorithm: bee collecting pollen algorithm. In:
Proceedings of the 4th international conference on intelligent computing, Shanghai, China,
September 2008, vol. 5227 of Lecture notes in computer science. Berlin: Springer; 2008. p.
518–525.
42. Lucic P, Teodorovic D. Computing with bees: attacking complex transportation engineering
problems. Int J Artif Intell Tools. 2003;12:375–94.
43. Mezura-Montes E, Velez-Koeppel RE. Elitist artificial bee colony for constrained real-
parameter optimization. In: Proceedings of IEEE congress on evolutionary computation (CEC),
Barcelona, Spain, July 2010. p. 1–8.
44. Moayedikia A, Jensen R, Wiil UK, Forsati R. Weighted bee colony algorithm for discrete opti-
mization problems with application to feature selection. Eng Appl Artif Intell. 2015;44:153–67.
45. Moritz RFA, Southwick EE. Bees as super-organisms. Berlin, Germany: Springer; 1992.
46. Navrat P. Bee hive metaphor for web search. In: Proceedings of the international conference
on computer systems and technologies (CompSysTech), Veliko Turnovo, Bulgaria, 2006. p.
IIIA.12.
47. Ozturk C, Hancer E, Karaboga D. A novel binary artificial bee colony algorithm based on
genetic operators. Inf Sci. 2015;297:154–70.
48. Pham DT, Kog E, Ghanbarzadeh A, Otri S, Rahim S, Zaidi M. The bees algorithm—a novel tool
for complex optimisation problems. In: Proceedings of the 2nd international virtual conference
on intelligent production machines and systems (IPROMS), Cardiff, UK, July 2006. p. 454–
459.
49. Quijano N, Passino KM. Honey bee social foraging algorithms for resource allocation, Part i:
algorithm and theory; part ii: application. In: Proceedings of the American control conference,
New York, NY, USA, July 2007. p. 3383–3388, 3389–3394.
50. Rajasekhar A, Abraham A, Pant M. Levy mutated artificial bee colony algorithm for global opti-
mization. In: Proceedings of IEEE international conference on systems, man and cybernetics,
Anchorage, AK, USA, October 2011. p. 665–662.
51. Seeley TD. The wisdom of the hive: the social physiology of honey bee colonies. Massachusetts:
Harvard University Press; 1995.
52. Sharma H, Bansal JC, Arya KV. Opposition based Levy flight artificial bee colony. Memetic
Comput. 2013;5:213–27.
53. Sharma TK, Pant M. Enhancing the food locations in an artificial bee colony algorithm. Soft
Comput. 2014;17:1939–65.
54. Singh A. An artificial bee colony algorithm for the leaf-constrained minimum spanning tree
problem. Applied Soft Comput. 2009;9(2):625–31.
55. Stanarevic N, Tuba M, Bacanin N. Enhanced artificial bee colony algorithm performance. In:
Proceedings of the 14th WSEAS international conference on computers, world scientific and
engineering academy and society (WSEAS). Stevens Point, WI, USA, June 2010. p. 440–445.
56. Teo J, Abbass HA. A true annealing approach to the marriage in honey-bees optimization
algorithm. Int J Comput Intell Appl. 2003;3:199–208.
57. Teodorovic D, Dell’Orco M. Bee colony optimization—a cooperative learning approach to
complex transportation problems. In: Proceedings of the 10th meeting of the EURO working
group on transportation, Poznan, Poland, September 2005. p. 51–60.
58. Tsai P-W, Pan J-S, Liao B-Y, Chu S-C. Enhanced artificial bee colony optimization. Int J
Innovative Comput Inf Control. 2009;5(12):5081–92.
59. Wedde HF, Farooq M, Zhang Y. BeeHive: an efficient fault-tolerant routing algorithm inspired
by honey bee behavior. In: Dorigo M, editors. Ant colony optimization and swarm intelligence,
vol. 3172 of Lecture notes in computer science. Berlin: Springer; 2004. pp. 83–94.
60. Yang XS. Engineering optimizations via nature-inspired virtual bee algorithms. In: Mira J,
lvarez JR, editors. Artificial intelligence and knowledge engineering applications: a bioinspired
approach, vol. 3562 of Lecture notes in computer science. Berlin: Springer; 2005. pp. 317–323.
61. Zhou X, Wu Z, Wang H, Rahnamayan S. Gaussian bare-bones artificial bee colony algorithm.
Soft Comput. 2016: 1–18. doi:10.1007/s00500-014-1549-5.
62. Zhu G, Kwong S. Gbest-guided artificial bee colony algorithm for numerical function opti-
mization. Appl Math Comput. 2010;217:3166–73.
Bacterial Foraging Algorithm
13
This chapter describes the bacterial foraging algorithm, which is inspired by the social foraging behavior of Escherichia coli present in the human intestine. Several algorithms inspired
by molds, algae, and tumor cells are also introduced.
13.1 Introduction
The social foraging behavior of Escherichia coli present in the human intestine and of M. xanthus bacteria was explained in [9]. Through social foraging, both species of bacteria are able to climb noisy nutrient gradients. The foraging behavior is modeled
as an optimization process where bacteria seek to maximize the energy intake per
unit time spent for foraging, considering all the constraints presented by their own
physiology and environment. Bacterial foraging algorithm [9,14] is a population-
based stochastic optimization technique inspired by the behavior of Escherichia coli
bacteria that forage for food. Bacterial chemotaxis algorithm [11] tackles optimiza-
tion problems by employing the way in which bacteria react to chemoattractants in
concentration gradients.
Bacterial foraging behavior is known as bacterial chemotaxis. Chemotaxis, a cell
movement in response to gradients of chemical concentrations present in the envi-
ronment, is a survival strategy that allows bacteria to search for nutrients and avoid
noxious environments. The chemotactical behavior of bacteria as an optimization
process was modeled in the early 1970s [2].
The chemotactical behavior of bacteria is modeled by making the following
assumptions [3]. (1) The path of a bacterium is a sequence of straight-line trajec-
tories joined by instantaneous turns, each trajectory being characterized by speed,
direction, and duration. (2) All trajectories have the same constant speed. (3) When
a bacterium turns, its choice of a new direction is governed by a probability distri-
bution, which is azimuthally symmetric about the previous direction. (4) The angle
event of elimination–dispersal in the real bacterial population, where all the bacteria
in a region are killed or a group is dispersed into a new part of the environment.
In summary, the chemotactical strategy of Escherichia coli can be given as follows [14]. If a bacterium finds a neutral environment or an environment without gradients, it alternately tumbles and swims; if it finds a nutrient gradient, the bacterium spends more time swimming and less time tumbling, so the directions of movement are biased toward increasing nutrient gradients; if it finds a negative gradient or noxious substances, it swims to better environments or runs away from dangerous places.
Overall, bacterial foraging algorithm is a very effective search approach for global
optimization problems [4,14]. However, it is relatively complex and more computa-
tion time might be needed [9].
In bacterial foraging algorithm, a set of bacteria tries to reach an optimum cost
by following four stages: chemotaxis, swarming, reproduction, and elimination and
dispersal. All the stages are continuous and they are repeated until the end of bacteria
life.
At the beginning, each bacterium produces a solution iteratively for a set of para-
meters. In the chemotaxis phase, the step size of bacterium movement determines the
performance of the algorithm both in terms of the convergence speed and the accu-
racy. In the swarming stage, each bacterium signals another bacterium via attractants
to swarm together. This is the cell-to-cell signaling stage. During the process of reach-
ing toward the best food location, the bacterium which has searched the optimum
path produces an attraction signal to other bacteria to swarm to the desired location.
In the reproduction stage, all the bacteria are sorted and grouped into two classes.
The first half of the bacteria with high fitness is cloned to inherit their good features.
Each bacterium splits into two bacteria, which are placed at the same location; the
other half are eliminated from the population. In the elimination and dispersal stage,
any bacterium from the total set can be either eliminated or dispersed to randomly
distribute within the search area to search for other better nutrient location. This
stage prevents the bacteria from attaining the local optimum.
Let x be the position of a bacterium and J (x) be the value of the objective
function. The conditions J (x) < 0, J (x) = 0, and J (x) > 0 indicate whether
the bacterium at location x is in nutrient-rich, neutral, and noxious environments,
respectively. Chemotaxis tries to find lower values of J (x), and avoids positions x
where J (x) ≥ 0.
The chemotaxis process simulates the movement of the bacteria via swimming and tumbling. The chemotactic movement can be represented by
$$x_i^{j+1,k,l} = x_i^{j,k,l} + C_i \frac{\Delta_i}{\sqrt{\Delta_i^T \Delta_i}}, \qquad (13.1)$$
where $x_i^{j,k,l}$ is the position of the ith bacterium at the jth chemotaxis, kth reproduction, and lth elimination–dispersal stage, the step size $C_i$ is taken in the random direction specified by the tumble (swim), and $\Delta_i$ is a random vector with each entry lying in [−1, 1].
A mathematical analysis of the chemotactic step in bacterial foraging algorithm is
performed based on gradient descent approach in [4]. The stability and convergence
behavior of the dynamics is analyzed according to Lyapunov stability theorems. The
analysis suggests that chemotaxis employed in standard bacterial foraging algorithm
usually results in sustained oscillation in the vicinity of the global minimum. The
step size can be made adaptive to avoid oscillation: a high nutrient value corresponds
to a large step size, and in the vicinity of the optima the step size can be reduced.
During the movements, cells release signals to other cells to swarm, depending on whether they find a nutrient-rich environment or avoid a noxious environment. A time-varying term associated with the number of bacteria N_P and the number of variables p is added to the actual objective function.
The swarming pattern of the cell-to-cell attraction and repellence in bacterial
foraging algorithm reduces the precision of optimization. Bacteria in the local optima
may attract those in global optimum and thus lower the convergence speed. Fast
bacterial swarming algorithm [10] assumes that bacteria have the ability, similar to
that of birds to follow the best bacteria in the optimization domain. The position of
each bacterium is updated by
$$x_i^{j+1,k,l} = x_i^{j,k,l} + C_i \left(x_*^{j,k,l} - x_i^{j,k,l}\right), \quad \text{if } J_i^{j,k,l} > J_{\min}, \qquad (13.2)$$
where $x_*^{j,k,l}$ is the best position among the bacteria at the moment, and $J_i^{j,k,l}$ is the health status of the ith bacterium at the jth chemotaxis, kth reproduction, and lth elimination–dispersal stage.
To accelerate the convergence speed near optima, the chemotactic step size C is made adaptive in [4]:
$$C = \frac{1}{\psi + \frac{\lambda}{|J(x) - J^*|}}, \qquad (13.3)$$
where λ is a positive constant (typically λ = 400), ψ ∈ [0, 1], and J* is the fitness of the global best bacterium. When the distance between the two fitness values is much smaller than λ, C ≈ 1/λ. In [13], the step size of bacteria movement is
dynamically adjusted by using linear and nonlinear relationships based on the index
of iteration, index of bacteria, and fitness cost.
At the reproduction stage, the population is sorted according to the accumulated cost; the N_P/2 least healthy bacteria die, and the remaining N_P/2 healthier bacteria are used for asexual reproduction, each being split into two bacteria placed at the same location as their parent and keeping the same value. That is, after N_c chemotactic steps, the fitness value of the ith bacterium in the chemotactic loop is accumulated and calculated by
$$J_i^{\text{health}} = \sum_{j=1}^{N_c+1} J_i^{j,k,l}. \qquad (13.4)$$
Figure 13.1 The evolution of a random run of bacterial foraging algorithm for Rastrigin function:
the minimum objective at each iteration.
To improve the global search ability, after N_re steps of reproduction, an elimination–dispersal event is applied to the algorithm. Each bacterium is eliminated and dispersed to a random position in the search space according to the probability P_ed and its health status. Some bacteria are eliminated at random with a small probability (commonly set to 0.25), while the new replacements are randomly
initialized over the search space.
A few variants of the classical algorithm as well as hybridizations of bacterial
foraging algorithm with other naturally inspired algorithms have been introduced in
[4,14]. New versions of bacterial foraging algorithm have been proposed in [1,17].
Quantum-inspired bacterial foraging algorithm [6] applies several quantum comput-
ing principles, and a mechanism is proposed to encode and observe the population.
Example 13.1: We revisit the Rastrigin function considered in Example 6.1. The
global optimum is f (x) = 0 at x ∗ = 0.
We now find the global optimum by using bacterial foraging algorithm. The pop-
ulation size is selected as 40, the numbers of reproduction steps, chemotactic steps
and swarming steps are all set as 20, C = 0.001, Ped = 0.8, and the maximum
number of iterations is 100. The initial population is randomly generated from the
entire domain.
For a random run, we have f(x) = 0.0107 at (−0.0050, 0.0054) at the end of the iterations, and the evolution is illustrated in Figure 13.1. For 10 random runs, the solver found a solution near the global optimum by the end of the iterations in only three runs. The performance is undesirable compared to that of the other methods, as the
algorithm lacks an elitism strategy to retain the best solutions found thus far.
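To make the chemotaxis loop concrete, the following Python fragment sketches a stripped-down bacterial foraging run on the Rastrigin function. Only tumble-and-swim chemotaxis is implemented; swarming, reproduction, and elimination–dispersal are omitted for brevity, and the step size and loop counts are illustrative rather than the settings used in Example 13.1.

```python
import numpy as np

def rastrigin(x):
    return 10 * x.size + np.sum(x**2 - 10 * np.cos(2 * np.pi * x))

def chemotaxis(f, dim=2, n_bact=40, n_chem=100, n_swim=4, step=0.05,
               lo=-5.12, hi=5.12, rng=None):
    """Chemotaxis-only bacterial foraging: each bacterium tumbles, then keeps
    swimming in the same direction as long as the objective keeps improving."""
    rng = rng or np.random.default_rng()
    pos = rng.uniform(lo, hi, (n_bact, dim))
    cost = np.array([f(p) for p in pos])
    for _ in range(n_chem):
        for i in range(n_bact):
            delta = rng.uniform(-1, 1, dim)
            direction = delta / np.sqrt(delta @ delta)   # Eq. (13.1): normalized tumble direction
            for _ in range(n_swim):                      # swim while the move improves the cost
                trial = np.clip(pos[i] + step * direction, lo, hi)
                if f(trial) < cost[i]:
                    pos[i], cost[i] = trial, f(trial)
                else:
                    break
    best = np.argmin(cost)
    return pos[best], cost[best]

# Example: x_best, f_best = chemotaxis(rastrigin)
```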
CEC05, it has balanced search performance, arising from the contribution of adap-
tation and evolutionary process, semi-random selection while choosing the source
of light in order to avoid local minima, and balancing of helical movement methods.
Artificial algae correspond to solutions in the problem space. Artificial algae
algorithm has three control parameters (energy loss, adaptation parameter, and shear
force). Energy loss parameter determines the number of new candidate solutions
of algal colonies produced at each iteration. Each algal colony can produce new
candidate solutions in direct proportion to its energy (the success achieved in the
previous iteration). A small energy loss parameter corresponds to a high local search
capability, whereas a high parameter leads to a high global search ability. It uses an
adaptive energy loss parameter.
Similar to the real algae, artificial algae can move toward the source of light to
photosynthesize with helical swimming, and they can adapt to the environment, are
able to change the dominant species, and can reproduce by mitotic division. The
algorithm is composed of three basic parts: evolutionary process, adaptation, and
helical movement. In the adaptation process, in each iteration, an insufficiently grown algal colony tries to resemble the biggest algal colony in the environment. This process changes the starvation level; the starvation value increases with time when an algal cell receives insufficient light. In the evolutionary process, a single algal cell of the smallest algal colony dies and is replaced by a replicated algal cell of the biggest algal colony; this process achieves fine-tuning to find the global
optimum. Helical movement is applied to produce a new candidate solution. The
algorithm employs a greedy selection process between the candidate and the current
solutions.
The whole population is composed of algal colonies. An algal colony is a group
of algal cells living together. Under sufficient nutrient conditions, if the algal colony
receives enough light, it grows and reproduces itself to generate two new algal cells,
similar to the real mitotic division. When a single algal cell is divided to produce two
new algal cells, they live adjacently. An algal colony behaves like a single cell, moves
together, and cells in the colony may die under unsuitable life conditions. An external force such as a shear force may break up the colony, and each separated part becomes a new colony as life proceeds. An algal colony not receiving enough light survives for
a while but eventually dies. An algal colony providing good solutions grows more
as the amount of nutrient obtained is high. In a randomly selected dimension, algal
cell of the smallest algal colony dies and algal cell of the biggest colony replicates
itself.
Algal cells and colonies generally swim and try to stay close to the water surface, because adequate light for survival is available there. They swim helically in the liquid with their flagella, which provide forward movement. As the friction surface of a growing algal cell gets larger, the frequency of its helical movements increases, which increases its local search ability. Each algal cell can move proportionally to its
energy. The energy of an algal cell is directly proportional to the amount of nutrient
uptake at the time. The gravity restricting the movement is set as 0 and viscous drag
is displayed as shear force, which is proportional to the size of algal cell.
Problem
References
1. Abraham A. A synergy of differential evolution and bacterial foraging optimization for global
optimization. Neural Netw World. 2007;17(6):607–26.
2. Bremermann H. Chemotaxis and optimization. J Franklin Inst. 1974;297:397–404.
3. Dahlquist FW, Elwell RA, Lovely PS. Studies of bacterial chemotaxis in defined concentration
gradients—a model for chemotaxis toward l-serine. J Supramol Struct. 1976;4:329–42.
4. Dasgupta S, Das S, Abraham A, Biswas A. Adaptive computational chemotaxis in bacterial
foraging optimization: an analysis. IEEE Trans Evol Comput. 2009;13(4):919–41.
5. Eisenbach M. Chemotaxis. London: Imperial College Press; 2004.
6. Huang S, Zhao G. A comparison between quantum inspired bacterial foraging algorithm and
Ga-like algorithm for global optimization. Int J Comput Intell Appl. 2012;11(3):19. Paper no.
1250016.
7. Hughes BD. Random walks and random environments. London: Oxford University Press;
1996.
8. Li WW, Wang H, Zou ZJ, Qian JX. Function optimization method based on bacterial colony
chemotaxis. J Circ Syst. 2005;10:58–63.
9. Liu Y, Passino KM. Biomimicry of social foraging bacteria for distributed optimization: models,
principles and emergent behaviors. J Optim Theory Appl. 2002;115(3):603–28.
10. Mi H, Liao H, Ji Z, Wu QH. A fast bacterial swarming algorithm for high-dimensional function
optimization. In: Proceedings of IEEE world congress on computational intelligence, Hong
Kong, China, June 2008. p. 3135–3140.
11. Muller SD, Marchetto J, Airaghi S, Kournoutsakos P. Optimization based on bacterial chemo-
taxis. IEEE Trans Evol Comput. 2002;6:16–29.
12. Nakagaki T, Kobayashi R, Nishiura Y, Ueda T. Obtaining multiple separate food sources:
behavioural intelligence in the Physarum plasmodium. Proc R Soc B: Biol Sci. 2004;271:2305–
10.
13. Nasir ANK, Tokhi MO, Abd Ghani NM. Novel adaptive bacteria foraging algorithms for global
optimization. Appl Comput Intell Soft Comput. 2014:7. Article ID 494271.
14. Passino KM. Biomimicry of bacterial foraging for distributed optimization and control. IEEE
Control Syst Mag. 2002;22(3):52–67.
15. Segall J, Block S, Berg H. Temporal comparisons in bacterial chemotaxis. Proc Natl Acad Sci
U S A. 1986;83(23):8987–91.
16. Tang D, Dong S, Jiang Y, Li H, Huang Y. ITGO: invasive tumor growth optimization algorithm.
Appl Soft Comput. 2015;36:670–98.
17. Tang WJ, Wu QH. Bacterial foraging algorithm for dynamic environments. In: Proceedings
of the IEEE congress on evolutionary computation (CEC), Vancouver, Canada, July 2006. p.
1324–1330.
18. Tero A, Kobayashi R, Nakagaki T. A mathematical model for adaptive transport network in
path finding by true slime mold. J Theor Biol. 2007;244:553–64.
19. Tero A, Yumiki K, Kobayashi R, Saigusa T, Nakagaki T. Flow-network adaptation in Physarum
amoebae. Theory Biosci. 2008;127:89–94.
20. Uymaz SA, Tezel G, Yel E. Artificial algae algorithm (AAA) for nonlinear global optimization.
Appl Soft Comput. 2015;31:153–71.
21. Zhang X, Wang Q, Chan FTS, Mahadevan S, Deng Y. A Physarum polycephalum optimization
algorithm for the bi-objective shortest path problem. Int J Unconv Comput. 2014;10:143–62.
Harmony Search
14
14.1 Introduction
Harmony search uses five parameters, including three core parameters such as the
size of harmony memory (HMS), the harmony memory considering rate (PHMCR ),
and the maximum number of iterations or improvisations (NI), and two optional ones
such as the pitch adjustment rate (PAR), and the adjusting bandwidth (BW) or fret
width (FW). HMS is similar to the population size in GA. PHMCR ∈ (0, 1) is the rate
of choosing one value from the harmony memory, while 1 − PHMCR is the rate of
randomly selecting one value from the domain. The number of improvisations (NI)
corresponds to the number of iterations. PAR decides whether the decision variables
are to be adjusted to a neighboring value. In [8], three PARs are used for moving
rates to the nearest, second nearest, and third nearest cities. The number of musicians
N is equal to the number of variables in the optimization function. In [12], fret width
is introduced to replace the static valued bandwidth, making the algorithm adaptive
to the variance in the variable range and suitable for solving real-valued problems.
Generating a new harmony is called improvisation. Harmony search generates a
new vector that encodes a candidate solution, after considering a selection of existing
quality vectors. It is an iterative improvement method initiated with a number of
provisional solutions that are stored in the harmony memory. At each iteration, a
new solution (harmony) x is generated that is based on three operations: memory
consideration for exploitation, random consideration for diversification, and pitch
adjustment for local search. A new harmony is then evaluated against an objective
function, and replaces the worst harmony in the harmony memory, only if its fitness
is better than that of the worst harmony. This process is repeated until an acceptable
solution is obtained.
Consider four decision variables, each of which has stored experience values in
the harmony memory as follows: x1 : {10, 20, 4}, x2 : {133, 50, 60}, x3 : {100, 23,
393}, and x4 : {37, 36, 56}. In an iteration, if x1 is assigned 20 from its memory, x2 is
adjusted from the value 133 stored in its memory to be 28, x3 is assigned 23 from its
memory, and x4 is assigned 67 from its feasible range x4 ∈ [0, 100]. The objective
function of a constructed solution (20, 28, 23, 67) is evaluated. If the new solution
is better than the worst solution in the harmony memory, then it replaces the worst
solution. This process is repeated until an optimal solution is reached.
In basic harmony search, randomly generated feasible solutions are initialized in the
harmony memory. In each iteration, the algorithm aims at improvising the harmony
memory. Harmony search algorithm can be summarized in four steps: initialization
of the harmony memory, improvisation of a new harmony, inclusion of the newly
generated harmony in the harmony memory if its fitness improves the worst fitness
value in the harmony memory, and loop until a termination criterion is satisfied.
The first step is initialization of the control parameters: HMS, HMCR, PAR, BW,
NI. Randomly generate feasible solution vectors from the solution xt obtained from
tabu search. The harmony memory is initialized with the solution obtained from
tabu search plus HMS − 1 solutions that are randomly chosen in the neighborhood
of xt :
$$x^i = x_t + \text{rand}(-0.5, 0.5), \quad i = 1, 2, \ldots, \text{HMS}. \qquad (14.1)$$
Then the solution is sorted by the objective function as
$$\mathbf{HM} = \begin{bmatrix} x_1^1 & \cdots & x_j^1 & \cdots & x_n^1 \\ \vdots & & \vdots & & \vdots \\ x_1^i & \cdots & x_j^i & \cdots & x_n^i \\ \vdots & & \vdots & & \vdots \\ x_1^{\text{HMS}} & \cdots & x_j^{\text{HMS}} & \cdots & x_n^{\text{HMS}} \end{bmatrix}, \qquad (14.2)$$
where n is the number of variables.
The next step is to generate new solutions. A new solution x^i can be obtained by choosing values from the harmony memory with probability P_HMCR, or generated randomly in the feasible search space with probability 1 − P_HMCR. P_HMCR can be selected as 0.9. The chosen value is then adjusted by a random amount with probability PAR, and remains unchanged with probability 1 − PAR. PAR can be selected as 0.8. In pitch adjustment, the solution is changed slightly within the neighborhood space of the solution.
The harmony memory is then updated. The new solution xi is substituted for
the worst solution in the harmony memory, if it outperforms the worst one. New
solutions are generated and the harmony memory is updated, until the stopping
criterion is satisfied. The flowchart of harmony search is given in Algorithm 14.1.
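As a concrete illustration, the following MATLAB fragment sketches one improvisation and memory update for a real-valued minimization problem. It is a minimal sketch, not the book's accompanying code; n is the number of variables, and HM (an HMS-by-n matrix), fHM (its objective values), fobj (the objective handle), lb, and ub (variable bounds) are assumed names.

% One improvisation of basic harmony search (illustrative sketch, minimization)
xnew = zeros(1, n);
for j = 1:n
    if rand < HMCR                          % memory consideration
        xnew(j) = HM(randi(HMS), j);
        if rand < PAR                       % pitch adjustment
            xnew(j) = xnew(j) + BW*(2*rand - 1);
        end
    else                                    % random consideration
        xnew(j) = lb(j) + rand*(ub(j) - lb(j));
    end
end
xnew = min(max(xnew, lb), ub);              % keep the new harmony feasible
fnew = fobj(xnew);
[fworst, iworst] = max(fHM);                % worst harmony in the memory
if fnew < fworst                            % replace the worst if improved
    HM(iworst, :) = xnew;
    fHM(iworst) = fnew;
end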
GA considers only two vectors for generating a new solution or offspring, whereas
harmony search takes into account, componentwise and on a probabilistic basis, all
the existing solutions (melodies) in the harmony memory. Harmony search is able
to infer new solutions merging the characteristics of all individuals by simply tuning
the values of its probabilistic parameters. Moreover, it operates independently on each
constituent variable (note) of a solution vector (harmony), to which stochastic opera-
tors for fine-tuning and randomization are applied. The convergence rate of harmony
search and the quality of the produced solutions are not dramatically affected by
the initial values of the constituent melodies in the harmony memory. In addition,
harmony search utilizes a probabilistic gradient which does not require the derivative
of the fitness function to be analytically solvable, nor even differentiable over the
whole solution space. Instead, the probabilistic gradient converges to progressively
better solutions iteration by iteration. Harmony search performs satisfactorily in both
continuous and discrete optimization problems. It is able to handle both decimal and
binary alphabets without modifying the definition of the original HMCR and PAR
parameters of the algorithm.
There are some improved and hybridized variants of harmony search. The HMCR and
PAR parameters help harmony search in searching for globally and locally improved
solutions, respectively. Harmony search is not successful in performing local search
in numerical optimization [19].
Improved harmony search [19] dynamically adjusts the parameters PAR and BW
with regard to search iterations. It linearly adjusts PAR from its minimum to the max-
imum, while exponentially decreases BW from its maximum value to its minimum,
as iteration proceeds.
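In the notation used here, the schedules of [19] can be written as

PAR(t) = PAR_min + (PAR_max − PAR_min) t / NI,
BW(t) = BW_max exp( (t / NI) ln(BW_min / BW_max) ),

where t is the current improvisation (iteration) count; PAR thus increases linearly from PAR_min to PAR_max, while BW decreases exponentially from BW_max to BW_min.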
Global-best harmony search [23] hybridizes PSO concept with harmony search
operators. The pitch adjustment operator is modified to improve the convergence rate,
such that the new improvised harmony is directly selected from the best solution in
the harmony memory. PAR is dynamically updated. Instead of making a random
change in the generated solution after the harmony memory consideration phase, the
solution is replaced with the best solution in harmony memory with the probability
of PAR. Improved global-best harmony search [21] combines a novel improvisation
scheme with an existing PAR and BW updating mechanism.
Figure 14.1 The evolution of a random run of harmony search for the Rastrigin function: the minimum and average objectives.
Example 14.1: Revisit the Rastrigin function treated in Example 6.1. The global
optimum is f (x) = 0 at x∗ = 0.
We now find the global optimum by using the improved harmony search [19]. We
select HMCR = 0.9, PAR linearly decreasing from 0.9 to 0.3, and BW exponentially
decreasing from 0.5 to 0.2. The harmony memory size is selected as 50, and the
maximum number of iterations is 100. The initial harmonies are randomly generated
from the entire domain.
For 10 random runs, the solver always converged to the global optimum. For a
random run, it gives the optimum solution: f (x) = 6.0243 × 10−6 at (−0.1232 ×
10−3 , −0.1232 × 10−3 ), and all the individuals converged toward the global opti-
mum. The evolution of a random run is illustrated in Figure 14.1.
In music, harmony is the use of simultaneous pitches or chords, and is the vertical
aspect of music space. Melodic line is the horizontal aspect, as shown in Figure 14.2.
Melody is a linear succession of individual pitches. Figure 14.3 illustrates the melody
search model.
Melody search [2], an improved version of the harmony search method, mimics
the processes of group improvisation for finding the best series of
pitches within a melody. In such a group, the music players improvise the melody
differently and lead one another, so that the group as a whole achieves the best
subsequence of pitches faster.
In melody search, each melodic pitch corresponds to a decision variable, and each
melody is generated by a player and corresponds to a solution of the problem. Each
player produces a series of subsequent pitches within their possible ranges; if the
succession of pitches makes a good melody, that experience is stored into the player
memory. Unlike harmony search, which uses a single harmony memory, melody search
employs several memories, called player memories.
Figure 14.2 Harmony as the vertical aspect and melody as the horizontal aspect of music space.
Figure 14.3 The melody search model: melodies 1, 2, and 3, improvised by players 1, 2, and 3.
References
1. Al-Betar MA, Doush IA, Khader AT, Awadallah MA. Novel selection schemes for harmony
search. Appl Math Comput. 2012;218:6095–117.
2. Ashrafi SM, Dariane AB. A novel and effective algorithm for numerical optimization: melody
search. In: Proceedings of the 11th international conference on hybrid intelligent systems (HIS),
Malacca, Malaysia, Dec 2011. p. 109–114.
3. Ashrafi SM, Dariane AB. Performance evaluation of an improved harmony search algorithm
for numerical optimization: melody Search (MS). Eng Appl Artif Intell. 2013;26:1301–21.
4. Castelli M, Silva S, Manzoni L, Vanneschi L. Geometric selective harmony search. Inf Sci.
2014;279:468–82.
5. Chakraborty P, Roy GG, Das S, Jain D, Abraham A. An improved harmony search algorithm
with differential mutation operator. Fundamenta Informaticae. 2009;95(4):401–26.
6. Das S, Mukhopadhyay A, Roy A, Abraham A, Panigrahi BK. Exploratory power of the harmony
search algorithm: analysis and improvements for global numerical optimization. IEEE Trans
Syst Man Cybern Part B. 2011;41(1):89–106.
7. Fesanghary M, Mahdavi M, Minary-Jolandan M, Alizadeh Y. Hybridizing harmony search
algorithm with sequential quadratic programming for engineering optimization problems.
Comput Methods Appl Mech Eng. 2008;197:3080–91.
8. Geem ZW, Tseng C, Park Y. Harmony search for generalized orienteering problem: best touring
in China. In: Wang L, Chen K, Ong Y editors. Advances in natural computation, vol. 3412 of
Lecture Notes in Computer Science. Berlin: Springer; 2005. p. 741–750.
9. Geem ZW, Kim JH, Loganathan GV. A new heuristic optimization algorithm: harmony search.
Simulation. 2001;76(2):60–8.
10. Geem ZW, Kim JH, Loganathan GV. Harmony search optimization: application to pipe network
design. Int J Model Simul. 2002;22:125–33.
11. Geem ZW. Novel derivative of harmony search algorithm for discrete design variables. Appl
Math Comput. 2008;199(1):223–30.
12. Geem ZW. Recent advances in harmony search algorithm. Berlin: Springer; 2010.
13. Geem ZW, Sim K-B. Parameter-setting-free harmony search algorithm. Appl Math Comput.
2010;217(8):3881–9.
14. Hasannebi O, Erdal F, Saka MP. Adaptive harmony search method for structural optimization.
ASCE J Struct Eng. 2010;136(4):419–31.
15. Karimi M, Askarzadeh A, Rezazadeh A. Using tournament selection approach to improve har-
mony search algorithm for modeling of proton exchange membrane fuel cell. Int J Electrochem
Sci. 2012;7:6426–35.
16. Khalili M, Kharrat R, Salahshoor K, Sefat MH. Global dynamic harmony search algorithm:
GDHS. Appl Math Comput. 2014;228:195–219.
17. Lee KS, Geem ZW. A new structural optimization method based on the harmony search algo-
rithm. Comput Struct. 2004;82:781–98.
18. Lee KS, Geem ZW. A new meta-heuristic algorithm for continuous engineering optimization:
harmony search theory and practice. Comput Methods Appl Mech Eng. 2005;194:3902–33.
19. Mahdavi M, Fesanghary M, Damangir E. An improved harmony search algorithm for solving
optimization problems. Appl Math Comput. 2007;188(2):1567–79.
20. Maheri MR, Narimani MM. An enhanced harmony search algorithm for optimum design of
side sway steel frames. Comput Struct. 2014;136:78–89.
21. Mohammed EA. An improved global-best harmony search algorithm. Appl Math Comput.
2013;222:94–106.
22. Mora-Gutierrez RA, Ramirez-Rodriguez J, Rincon-Garcia EA. An optimization algorithm
inspired by musical composition. Artif Intell Rev. 2014;41:301–15.
23. Omran MGH, Mahdavi M. Global-best harmony search. Appl Math Comput. 2008;198(2):643–
56.
24. Pan QK, Suganthan PN, Liang JJ, Tasgetiren MF. A local-best harmony search algorithm with
dynamic subpopulations. Eng Optim. 2010;42(2):101–17.
25. Pan QK, Suganthan PN, Tasgetiren MF, Liang JJ. A self-adaptive global best harmony search
algorithm for continuous optimization problems. Appl Math Comput. 2010;216:830–48.
26. Wang CM, Huang YF. Self-adaptive harmony search algorithm for optimization. Expert Syst
Appl. 2010;37:2826–37.
27. Yadav P, Kumar R, Panda SK, Chang CS. An intelligent tuned harmony search algorithm for
optimization. Inf Sci. 2012;196:47–72.
28. Zou D, Gao L, Wu J, Li S. Novel global harmony search algorithm for unconstrained problems.
Neurocomputing. 2010;73:3308–18.
15 Swarm Intelligence
Similar to PSO, the firefly algorithm [83] is inspired by the ability of fireflies to emit
light (bioluminescence) in order to attract other fireflies for mating. It was
first proposed for multimodal continuous optimization [83]. A further study on the
firefly algorithm is presented for constrained continuous optimization problems in
[45]. In [69] a discrete firefly algorithm is presented to minimize makespan for
flowshop scheduling problems.
A firefly’s flash mainly acts as a signal to attract mating partners and potential
prey. Flashes also serve as a protective warning mechanism. In firefly algorithm [83],
a firefly will be attracted to other fireflies regardless of their sex. Its attractiveness
is proportional to its brightness, and they both decrease as the distance increases. If
there is no brighter one than a particular firefly, it will move randomly. The brightness
of a firefly is affected by the landscape of the objective function.
The attractiveness of a firefly is determined by its light intensity I , which can be
defined by the fitness function f (x). The attractiveness may be calculated by
β(r) = β0 e−γ r² ,    (15.4)
where r is the distance between any two fireflies, β0 is the initial attractiveness at
r = 0, and γ is an absorption coefficient, which controls the decrease in the intensity
of light.
A less attractive firefly i moves toward a more attractive firefly j by
x_i = x_i + β0 e−γ ‖x_j − x_i‖² (x_j − x_i) + α(rand − 0.5),    (15.5)
where α ∈ [0, 1], and rand ∈ (0, 1) is a uniformly distributed random number.
Typically, γ = 0.8, α = 0.01, and β0 = 1.
Firefly algorithm is implemented as follows. For all the NP fireflies: if intensity
I_j < I_i, move firefly j toward i; update attractiveness and light intensity. The
algorithm repeats until the termination criterion is satisfied.
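A minimal MATLAB sketch of this loop, for minimization, is given below. It is only illustrative: X (an NP-by-n matrix of positions), fobj, alpha, beta0, gam, and MaxIter are assumed names, and bound handling is omitted.

% Minimal firefly algorithm loop (illustrative sketch, minimization)
n = size(X, 2);
I = zeros(NP, 1);
for k = 1:NP, I(k) = fobj(X(k, :)); end      % light intensity = objective value
for t = 1:MaxIter
    for i = 1:NP
        for j = 1:NP
            if I(j) < I(i)                   % firefly j is brighter (better) than i
                r2 = sum((X(j,:) - X(i,:)).^2);
                beta = beta0*exp(-gam*r2);                   % attractiveness, cf. (15.4)
                X(i,:) = X(i,:) + beta*(X(j,:) - X(i,:)) ...
                         + alpha*(rand(1,n) - 0.5);          % move i toward j, cf. (15.5)
                I(i) = fobj(X(i,:));
            end
        end
    end
end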
Firefly movement is based on the local optima, but is not influenced by the global
optima. Thus the exploration rate of firefly algorithm is very limited. Fuzzy firefly
algorithm [26] increases the exploration and improves the global search of fire-
fly algorithm. In each iteration, the global optima and some brighter fireflies have
influence on the movement of fireflies. The effect of each firefly depends on its
attractiveness, which is selected as a fuzzy variable.
Eagle strategy [89] is a two-stage hybrid search method for stochastic optimiza-
tion. It combines the random search using Levy walk with firefly algorithm in an
iterative manner.
Free search [63] is inspired from the animals’ behavior and operates on a set of
solutions called population. In free search, each animal has original peculiarities
called sense and mobility. The sense is an ability of the animal for orientation within
the search space, and it is used for selection of location for the next step. The sen-
sibility varies during the optimization process. The animal can select any location
marked with pheromone, which fits its sense. During the exploration walk, the ani-
mals step within the neighbor space. The neighbor space also varies for the different
animals. Therefore, the probability for access to any location of the search space
is nonzero. During the exploration, each animal achieves some favor (an objective
function value) and distributes a pheromone in amount proportional to the amount
of the found favor. The pheromone is fully replaced with a new one after each walk.
Particularly, the animals in the algorithm are mobile. Each animal can operate either
with small precise steps for local search or with large steps for global exploration.
Moreover, the individual decides how to search personally.
Animal migration optimization [42] is a heuristic optimization method inspired
by the ubiquitous animal migration behavior, such as birds, mammals, fish, reptiles,
amphibians, insects, and crustaceans. In the first process, the algorithm simulates how
the groups of animals move from the current position to the new position. During
this process, each individual should obey three main rules. In the latter process, the
algorithm simulates how some animals leave the group and some join the group
during the migration.
With probability Pa ∈ [0, 1], the nests discovered by the host bird are abandoned and removed from
the population, and they are replaced by new nests (with new random solutions).
Levy flights algorithm is a stochastic algorithm for global optimization [62]. It is a
random walk that is characterized by a series of straight jumps chosen from a heavy-
tailed probability density function [77]. Unlike Gaussian and Cauchy distributions,
Levy distribution is nonsymmetrical, and has infinite variance with an infinite mean.
The foraging path of an animal commonly determines the next move based on the current
state and the transition probability to the next state. The flight behavior of many birds
and insects has the characteristics of Levy flights.
When generating a new solution x_i(t + 1) for the ith cuckoo, a Levy flight is per-
formed:
x_i(t + 1) = x_i(t) + α Levy(λ),    (15.6)
where α > 0 is the step size, and the random step length is drawn from a Levy
distribution u = t^(−λ), λ ∈ (1, 3], which has an infinite variance with an infinite
mean. This escapes local minima more easily than Gaussian random steps do.
Cuckoo search has three control parameters: Pa (the probability of worse
nests being abandoned), the step size α, and the random step length λ. The optimal solutions
obtained by cuckoo search are far better than the best solutions obtained by PSO or
GA [88]. The algorithm flowchart is given in Algorithm 15.2.
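One common way to draw the Levy step in (15.6) is Mantegna's algorithm, sketched below in MATLAB. The names x, xnew, alpha, and n (the problem dimension) are assumptions for illustration; the exponent beta plays the role of the heavy-tail index of the Levy distribution.

% One Levy-flight move for a cuckoo (illustrative sketch, Mantegna's algorithm)
beta  = 1.5;                                        % a commonly used heavy-tail index
sigma = (gamma(1+beta)*sin(pi*beta/2) / ...
        (gamma((1+beta)/2)*beta*2^((beta-1)/2)))^(1/beta);
u = randn(1, n)*sigma;
v = randn(1, n);
step = u ./ abs(v).^(1/beta);                       % heavy-tailed random step lengths
xnew = x + alpha*step;                              % new candidate position, cf. (15.6)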
In [19], cuckoo search is enhanced with multimodal optimization capacities by
incorporating a memory mechanism to register potential local optima according
to their fitness value and the distance to other potential solutions, modifying the
individual selection strategy to accelerate the detection process of new local minima,
and including a depuration procedure to cyclically eliminate duplicated memory
elements.
Example 15.1: We revisit Rosenbrock function treated in Examples 3.3 and 5.1. The
function has the global minimum f (x) = 0 at xi = 1, i = 1, . . . , n. The landscape
of this function is shown in Figure 1.3.
We apply cuckoo search algorithm to solve this problem. The implementation
sets the number of nests (solutions) as 30, the maximum number of iterations as
1000, Pa = 0.25, and selects the initial nests randomly from the entire domain.
For a random run, we have f (x) = 3.0920 × 10−4 at (0.9912, 0.9839) with 60000
function evaluations. All the individuals converge toward the global optimum. For
10 random runs, the solver always converged toward a point very close to the global
optimum. The evolution of a random run is illustrated in Figure 15.1.
Figure 15.1 The minimum objective of a random run of cuckoo search for Rosenbrock function.
Bats are the only volitant mammals in the world. There are nearly 1,000 species
of bats. Many bats have echolocation (https://askabiologist.asu.edu/echolocation);
they can emit a very loud and short sound pulse and receive the echo reflected
from the surrounding objects by their extraordinary big auricle. The emitted pulse
could be as loud as 110 dB in the ultrasonic region. The loudness varies from the
loudest when searching for prey to a quieter base when homing toward the
prey. This echo is then analyzed in their brain, from which they can discriminate
direction for their flight pathway and also distinguish different insects and obstacles,
to hunt prey and avoid collisions effectively. A natural bat increases the rate of pulse
emission and decreases the loudness when it finds prey [7]. The echolocation
signal can simultaneously serve a communication function, allowing for social
communication in bat populations.
Bat algorithm [84,85] is a metaheuristic optimization method inspired by the
echolocation or biosonar behavior of bats. In the algorithm, all bats navigate by
using echolocation to sense distance and detect the surroundings. Bats fly randomly
with velocity v i at position x i with a fixed frequency f min , varying wavelength λ,
and loudness A0 to search for prey. They automatically adjust the wavelength of
their emitted pulses and adjust the rate of pulse emission r ∈ [0, 1], depending on
the proximity of their target. Typically, the rate of pulse emission r increases and the
loudness A decreases when the population draws nearer to the local optimum. The
loudness varies from a positive large value A0 to a minimum value Amin .
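In the original formulation [84], this behavior is commonly modeled by updating the loudness and pulse rate of bat i whenever a new solution is accepted, for instance as

A_i(t + 1) = α A_i(t),    r_i(t + 1) = r_i^0 [1 − exp(−γ t)],

where 0 < α < 1 and γ > 0 are constants (unrelated to the firefly parameters above) and r_i^0 is a reference (maximum) pulse rate, so that the loudness decreases toward A_min while the pulse rate increases as the search proceeds.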
Apart from the population size N P and maximum iteration number, the algorithm
employs two control parameters: pulse rate and loudness. The pulse rate regulates
an improvement of the best solution, while the loudness influences an acceptance of
the best solution.
Bat algorithm controls the magnitude and direction of each bat's movement by adjusting
the frequency of the bat, which then moves to a new location. To some extent, PSO
is a special case of bat algorithm. Bat algorithm utilizes a balanced combination of
PSO and the local/global search mode controlled by loudness A and pulse rate r .
Each bat in the population represents a candidate solution x i , i = 1, . . . , N P .
Bat algorithm consists of initialization, variation operation, local search, solution
evaluation, and replacement steps.
In the initialization step, the algorithm parameters are initialized. Then, an initial
population of N P solutions (bats) x i is generated randomly. Next, this population is
evaluated, and the best solution is determined as x best .
The variation operator moves the virtual bats in the search space. In local search,
the current best solution is improved by a random-walk direct exploitation heuris-
tic. The replacement step replaces the current solution with the newly generated
solution according to some probability. A local search is launched with a proba-
bility given by the pulse rate r. The probability of accepting and saving the new best
solution depends on the loudness A.
x_i(t) = x_i(t − 1) + v_i(t),    (15.7)
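In the standard presentation of the bat algorithm [84], the position update (15.7) is driven by a frequency-tuned velocity; a common form is

f_i = f_min + (f_max − f_min) β,    v_i(t) = v_i(t − 1) + (x_i(t − 1) − x*) f_i,

where β ∈ [0, 1] is a uniform random number and x* is the current global best location.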
Spiders are air-breathing arthropods having eight legs and chelicerae with fangs.
Most of them detect prey by sensing vibrations on their webs. Some social species,
e.g., Mallos gregalis and Oecobius civitas, live in groups and interact with others in
1. Initialization.
Set t = 1.
Set bat population N P .
Set loudness Ai , pulse frequency f i at x i , pulse rate ri .
Initialize x i , v i .
2. Repeat
a. Generate new solutions by adjusting frequency, and updating velocities and locations/
solutions.
b. for bat i:
if (rand > ri )
Select a solution among the best solutions.
Generate a local solution around the selected best solution.
end if
Generate a new solution by flying randomly.
if (rand < Ai and f (x i ) < f (x ∗ ))
Accept the new solution.
Increase ri and reduce Ai .
end if
end for
c. Rank the bats and find the current best x ∗ .
d. Set t = t + 1.
until termination criterion is met.
the same group. Spiders have accurate senses of vibration. They can separate different
vibrations and sense their respective intensities. The social spiders passively receive
vibrations generated by other spiders on the same web to have a clear view of the
web. The foraging behavior of the social spider can be described as the cooperative
movement of the spiders toward the food source.
Social spider optimization [18] is a swarm algorithm imitating the mating behavior
of social spiders. A group of spiders interact with one another based on the biological
laws of the cooperative colony. The algorithm considers the gender of the spiders.
Depending on gender, each individual is conducted by a set of different evolutionary
operators which mimic different cooperative behaviors that are typically found in
the colony.
Social spider algorithm [92] solves global optimization problems, imitating the
information-sharing foraging strategy of social spiders, utilizing the vibrations on
the spider web to determine the positions of preys. The search space is formulated
as a hyper-dimensional spider web, on which each position represents a feasible
solution. The web also serves as the transmission medium of the vibrations generated
by the spiders. Each spider on the web holds a position; the fitness of the solution
is based on the objective function and is represented by the potential of finding a
food source at that position. The spiders can move freely on the web. When a spider
moves to a new position, it generates a vibration which is propagated over the web.
The intensity of the vibration is correlated with the fitness of the position. In this
way, the spiders on the same web share their personal information with others to
form a collective social knowledge.
Monkey Algorithm
Monkey algorithm [94] is a swarm intelligence algorithm put forward for solv-
ing large-scale, multimodal optimization problems. The method derives from the sim-
ulation of the mountain-climbing processes of monkeys. It consists of three processes:
the climb process, the watch–jump process, and the somersault process. In the original
monkey algorithm, most of the computation time lies in using the climb process to
search for locally optimal solutions.
the capture group makes use of their keen visions to exploit neighborhood of the
current best food source found by the search group. The randomness and fuzziness
of the foraging behavior of fruit fly swarm during the olfactory phase are described
by a normal cloud model. The algorithm outperforms, or performs similarly to, PSO
and DE.
Antlion optimizer (http://www.alimirjalili.com/ALO.html) [54] is a population-
based global optimization metaheuristic that mimics the hunting mechanism of
antlions in nature. Five main steps of hunting prey such as the random walk of
antlions, building traps, entrapment of antlions in traps, catching preys, and rebuild-
ing traps are implemented.
Moths fly at night in search of food by maintaining a fixed angle with respect
to the moon, a very effective mechanism called transverse orientation for travel-
ing in a straight line over long distances. However, these insects are trapped in a
useless/deadly spiral path around artificial lights. Moth flame optimization (http://
www.alimirjalili.com/MFO.html) [55] is a population-based metaheuristic optimiza-
tion method inspired by the navigation strategy of moths.
The idea underlying all swarm intelligence algorithms is similar. Shuffled frog leap-
ing algorithm, group search optimizer, firefly algorithm, ABC and the gravitational
search algorithm are all algorithmically identical to PSO under certain conditions
[47]. However, their implementation details result in notably different performance
levels.
More and more emerging computational paradigms are inspired by the metaphor
of nature. This section gives an introduction to some of them.
Amorphous Computing
Amorphous computing [1,64,79] presents a computational paradigm that consists
of a set of tiny, independent and self-powered processors or robots that can com-
municate wirelessly over a limited distance. Such systems should also be compared
to so-called population protocols [3]. The underlying model considers
anonymous finite-state agents computing a predicate of the multiset of their inputs
via two-way or one-way interactions in the all-pairs family of communication net-
works.
Hyper-Spherical Search
Hyper-spherical search [33] is a population-based metaheuristic. Population indi-
viduals are particles and hyper-sphere centers that all together form particle sets.
Searching the hyper-sphere inner space made by the hyper-sphere center and its
particle is the basis of the algorithm. The algorithm hopefully converges to a state
at which there exists only one hyper-sphere center, and its particles are at the same
position and have the same cost function value as the hyper-sphere center.
Weighted superposition attraction algorithm [8] is a swarm-based metaheuristic
for global optimization, based on the superposition principle in combination with
the attracted movement of agents, both of which are observable in many systems; it attempts
to model and simulate the dynamically changing superposition that arises from the dynamic
nature of the system and the attracted movement of agents.
Problems
15.1 Give some specific conditions under which the firefly algorithm can be con-
sidered as a special case of the PSO algorithm.
15.2 Run the accompanying MATLAB code of firefly algorithm to find the global
minimum of Schwefel function in the Appendix. Investigate how to improve
the result by adjusting the parameters.
15.3 Run the accompanying MATLAB code of bat algorithm to find the global
minimum of Griewank function in the Appendix. Understand the principle
of the algorithm.
15.4 Run the accompanying MATLAB code of gray wolf optimizer to find the
global minimum of Schwefel function in the Appendix. Compare its perfor-
mance with that of the algorithms in Problems 15.2 and 15.3.
15.5 Run the accompanying MATLAB code of collective animal behavior algo-
rithm to find the global minimum of Griewank function in the Appendix.
Understand the principle of the algorithm.
References
1. Abelson H, Allen D, Coore D, Hanson C, Homsy G, Knight TF Jr, Nagpal R, Rauch E, Sussman GJ,
Weiss R. Amorphous computing. Commun ACM. 2000;43(5):74–82.
2. Al-Madi N, Aljarah I, Ludwig SA. Parallel glowworm swarm optimization clustering algo-
rithm based on MapReduce. In: Proceedings of IEEE symposium on swarm intelligence (SIS),
Orlando, FL, December 2014. p. 1–8.
3. Angluin D, Aspnes J, Eisenstat D, Ruppert E. The computational power of population protocols.
Distrib Comput. 2007;20(4):279–304.
4. Askarzadeh A, Rezazadeh A. A new heuristic optimization algorithm for modeling of proton
exchange membrane fuel cell: bird mating optimizer. Int J Energ Res. 2013;37(10):1196–204.
5. Bansal JC, Sharma H, Jadon SS, Clerc M. Spider monkey optimization algorithm for numerical
optimization. Memetic Comput. 2014;6(1):31–47.
6. Bastos-Filho CJA, Nascimento DO. An enhanced fish school search algorithm. In: Proceed-
ings of 2013 BRICS congress on computational intelligence and 11th Brazilian congress on
computational intelligence, Ipojuca, Brazil, September 2013. p. 152–157.
7. Bates ME, Simmons JA, Zorikov TV. Bats use echo harmonic structure to distinguish their
targets from background clutter. Science. 2011;333(6042):627–30.
8. Baykasoglu A, Akpinar S. Weighted Superposition Attraction (WSA): a swarm intelligence
algorithm for optimization problems - part 1: unconstrained optimization; part 2: constrained
optimization. Appl Soft Comput. 2015;37:396–415.
9. Bishop JM. Stochastic searching networks. Proceedings of IEE conference on artificial neural
networks, London, UK, October 1989. p. 329–331.
10. Brabazon A, Cui W, O’Neill M. The raven roosting optimisation algorithm. Soft Comput.
2016;20(2):525–45.
11. Buttar AS, Goel AK, Kumar S. Evolving novel algorithm based on intellectual behavior of
wild dog group as optimizer. In: Proceedings of IEEE symposium on swarm intelligence (SIS),
Orlando, FL, December 2014. p. 1–7.
12. Cai X, Fan S, Tan Y. Light responsive curve selection for photosynthesis operator of APOA.
Int J Bio-Inspired Comput. 2012;4(6):373–9.
13. Caraveo C, Valdez F, Castillo O. A new bio-inspired optimization algorithm based on the self-
defense mechanisms of plants. In: Design of intelligent systems based on fuzzy logic, neural
networks and nature-inspired optimization, vol. 601 of studies in computational intelligence.
Berlin: Springer; 2015. p. 211–218.
14. Chen Z. A modified cockroach swarm optimization. Energ Procedia. 2011;11:4–9.
15. Chen Z, Tang H. Cockroach swarm optimization. In: Proceedings of the 2nd international
conference on computer engineering and technology (ICCET’10). April 2010, vol. 6. p. 652–
655.
16. Civicioglu P. Transforming geocentric cartesian coordinates to geodetic coordinates by using
differential search algorithm. Comput Geosci. 2012;46:229–47.
17. Cuevas E, Gonzalez M. An optimization algorithm for multimodal functions inspired by col-
lective animal behavior. Soft Comput. 2013;17:489–502.
18. Cuevas E, Cienfuegos M, Zaldvar D, Prez-Cisneros M. A swarm optimization algorithm
inspired in the behavior of the social-spider. Expert Syst Appl. 2013;40(16):6374–84.
19. Cuevas E, Reyna-Orta A. A cuckoo search algorithm for multimodal optimization. Sci World
J. 2014;2014:20. Article ID 497514.
20. Elbeltagi E, Hegazy T, Grierson D. Comparison among five evolutionary-based optimization
algorithms. Adv Eng Inf. 2005;19(1):43–53.
21. Eusuff MM, Lansey KE. Optimization of water distribution network design using the shuffled
frog leaping algorithm. J Water Resour Plan Manage. 2003;129(3):210–25.
22. Eusuff MM, Lansey K, Pasha F. Shuffled frog-leaping algorithm: a memetic meta-heuristic for
discrete optimization. Eng Optim. 2006;38(2):129–54.
23. Filho C, de Lima Neto FB, Lins AJCC, Nascimento AIS, Lima MP. A novel search algorithm
based on fish school behavior. In: Proceedings of IEEE international conference on systems,
man and cybernetics, Singapore, October 2008. p. 2646–2651.
24. Gandomi AH, Alavi AH. Krill herd: A new bio-inspired optimization algorithm. Commun
Nonlinear Sci Numer Simul. 2012;17(12):4831–45.
25. Haldar V, Chakraborty N. A novel evolutionary technique based on electrolocation principle
of elephant nose fish and shark: fish electrolocation optimization. Soft Comput. First online: 11 February 2016.
p. 22. doi:10.1007/s00500-016-2033-1.
26. Hassanzadeh T, Kanan HR. Fuzzy FA: a modified firefly algorithm. Appl Artif Intell.
2014;28:47–65.
27. Havens TC, Spain CJ, Salmon NG, Keller JM. Roach infestation optimization. In: Proceedings
of the IEEE swarm intelligence symposium, St. Louis, MO, USA, September 2008. p. 1–7.
28. He S, Wu QH, Saunders JR. A novel group search optimizer inspired by animal behavioral
ecology. In: Proceedings of IEEE congress on evolutionary computation (CEC), Vancouver,
BC, Canada, July 2006. p. 1272–1278.
29. He S, Wu QH, Saunders JR. Group search optimizer: an optimization algorithm inspired by
animal searching behavior. IEEE Trans Evol Comput. 2009;13(5):973–90.
30. Huang Z, Chen Y. Log-linear model based behavior selection method for artificial fish swarm
algorithm. Comput Intell Neurosci. 2015;2015:10. Article ID 685404.
31. Jayakumar N, Venkatesh P. Glowworm swarm optimization algorithm with topsis for solv-
ing multiple objective environmental economic dispatch problem. Appl Soft Comput.
2014;23:375–86.
32. Jordehi AR. Chaotic bat swarm optimisation (CBSO). Appl Soft Comput. 2015;26:523–30.
33. Karami H, Sanjari MJ, Gharehpetian GB. Hyper-spherical search (HSS) algorithm: a novel
meta-heuristic algorithm to optimize nonlinear functions. Neural Comput Appl. 2014;25:1455–
65.
34. Kaveh A, Farhoudi N. A new optimization method: dolphin echolocation. Adv Eng Softw.
2013;59:53–70.
35. Krishnanand KN, Ghose D. Detection of multiple source locations using a glowworm metaphor
with applications to collective robotics. In: Proceedings of IEEE swarm intelligence sympo-
sium, 2005. p. 84–91.
36. Krishnanand KN, Ghose D. Theoretical foundations for rendezvous of glowworm-inspired
agent swarms at multiple locations. Robot Auton Syst. 2008;56(7):549–69.
37. Krishnanand KN, Ghose D. Glowworm swarm optimization for simultaneous capture of mul-
tiple local optima of multimodal functions. Swarm Intell. 2009;3:87–124.
38. Kundu D, Suresh K, Ghosh S, Das S, Panigrahi BK, Das S. Multi-objective optimization with
artificial weed colonies. Inf Sci. 2011;181(12):2441–54.
39. Li XL, Lu F, Tian GH, Qian JX. Applications of artificial fish school algorithm in combinatorial
optimization problems. Chin J Shandong Univ (Eng Sci). 2004;34(5):65–7.
40. Li X, Luo J, Chen M-R, Wang N. An improved shuffled frog-leaping algorithm with extremal
optimisation for continuous optimisation. Inf Sci. 2012;192:143–51.
41. Li XL, Shao ZJ, Qian JX. An optimizing method based on autonomous animals: fish-swarm
algorithm. Syst Eng—Theory Pract. 2002;22(11):32–8.
42. Li X, Zhang J, Yin M. Animal migration optimization: an optimization algorithm inspired by
animal migration behavior. Neural Comput Appl. 2014;24:1867–77.
43. Li L, Zhou Y, Xie J. A free search krill herd algorithm for functions optimization. Math Probl
Eng. 2014;2014:21. Article ID 936374.
44. Linhares A. Synthesizing a predatory search strategy for VLSI layouts. IEEE Trans Evol
Comput. 1999;3(2):147–52.
45. Lukasik S, Zak S. Firefly algorithm for continuous constrained optimization tasks. In: Proceed-
ings of the 1st international conference on computational collective intelligence: Semantic web,
social networks and multiagent systems, Wroclaw, Poland, October 2009. p. 97–106.
46. Luo Q, Zhou Y, Xie J, Ma M, Li L. Discrete bat algorithm for optimal problem of permutation
flow shop scheduling. Sci World J. 2014;2014:15. Article ID 630280.
47. Ma H, Ye S, Simon D, Fei M. Conceptual and numerical comparisons of swarm intelligence
optimization algorithms. Soft Comput. 2016:1–20. doi:10.1007/s00500-015-1993-x.
48. Ma L, Zhu Y, Liu Y, Tian L, Chen H. A novel bionic algorithm inspired by plant root foraging
behaviors. Appl Soft Comput. 2015;37:95–113.
49. Mahmoudi S, Lotfi S. Modified cuckoo optimization algorithm (MCOA) to solve graph coloring
problem. Appl Soft Comput. 2015;33:48–64.
50. Martinez-Garcia FJ, Moreno-Perez JA. Jumping frogs optimization: a new swarm method for
discrete optimization. Technical Report DEIOC 3/2008. Spain: Universidad de La Laguna;
2008.
51. Mehrabian AR, Lucas C. A novel numerical optimization algorithm inspired from weed colo-
nization. Ecol Inf. 2006;1:355–66.
52. Meng Z, Pan J-S. Monkey king evolution: a new memetic evolutionary algorithm and its
application in vehicle fuel consumption optimization. Knowl.-Based Syst. 2016;97:144–57.
53. Merrikh-Bayat F. The runner-root algorithm: a metaheuristic for solving unimodal and mul-
timodal optimization problems inspired by runners and roots of plants in nature. Appl Soft
Comput. 2015;33:292–303.
54. Mirjalili S. The ant lion optimizer. Adv Eng Softw. 2015;83:80–98.
55. Mirjalili S. Moth-flame optimization algorithm: a novel nature-inspired heuristic paradigm.
Knowl-Based Syst. 2015;89:228–49.
56. Mirjalili S, Mirjalili SM, Lewis A. Grey wolf optimizer. Adv Eng Softw. 2014;69:46–61.
57. Mucherino A, Seref O. Monkey search: a novel metaheuristic search for global optimization.
In: AIP conference proceedings 953: Data mining, systems analysis and optimization in bio-
medicine, Gainesville, FL, USA, March 2007. New York: American Institute of
Physics; 2007. p. 162–173.
58. Nasuto SJ, Bishop JM. Convergence analysis of stochastic diffusion search. Parallel Algorithms
Appl. 1999;14:89–107.
59. Obagbuwa IC, Adewumi AO. An improved cockroach swarm optimization. Sci World J.
2014;2014:13. Article ID 375358.
60. Osaba E, Yang X-S, Diaz F, Lopez-Garcia P, Carballedo R. An improved discrete bat algorithm
for symmetric and asymmetric traveling salesman problems. Eng Appl Artif Intell. 2016;48:59–
71.
61. Pan W-T. A new fruit fly optimization algorithm: taking the financial distress model as an
example. Knowl-Based Syst. 2012;26:69–74.
62. Pavlyukevich I. Levy flights, non-local search and simulated annealing. J Comput Phys.
2007;226(2):1830–44.
63. Penev K, Littlefair G. Free search-a comparative analysis. Inf Sci. 2005;172:173–93.
64. Petru L, Wiedermann J. A universal flying amorphous computer. In: Proceedings of the 10th
International conference on unconventional computation (UC’2011), Turku, Finland, June
2011. p. 189–200.
65. Poliannikov OV, Zhizhina E, Krim H. Global optimization by adapted diffusion. IEEE Trans
Sig Process. 2010;58(12):6119–25.
66. Rajabioun R. Cuckoo optimization algorithm. Appl Soft Comput. 2011;11(8):5508–18.
67. Ray T, Liew KM. Society and civilization: an optimization algorithm based on the simulation
of social behavior. IEEE Trans Evol Comput. 2003;7(4):386–96.
68. Salhi A, Fraga ES. Nature-inspired optimisation approaches and the new plant propagation
algorithm. In: Proceedings of the international conference on numerical analysis and optimiza-
tion (ICeMATH’11), Yogyakarta, Indonesia, June 2011. p. K2-1–K2-8.
69. Sayadia MK, Ramezaniana R, Ghaffari-Nasab N. A discrete firefly meta-heuristic with local
search for makespan minimization in permutation flow shop scheduling problems. Int J Ind
Eng Comput. 2010;1(1):1–10.
70. Shiqin Y, Jianjun J, Guangxing Y. A dolphin partner optimization. In: Proceedings of IEEE
WRI global congress on intelligent systems, Xiamen, China, May 2009, vol. 1. p. 124–128.
71. Sulaiman M, Salhi A. A seed-based plant propagation algorithm: the feeding station model.
Sci World J. 2015;2015:16. Article ID 904364.
72. Sur C. Discrete krill herd algorithm—a bio-inspired metaheuristics for graph based network
route optimization. In: Natarajan R, editor. Distributed computing and internet technology, vol.
8337 of Lecture notes in computer science. Berlin: Springer; 2014. p. 152–163.
73. Tuba M, Subotic M, Stanarevic N. Modified cuckoo search algorithm for unconstrained opti-
mization problems. In: Proceedings of the european computing conference (ECC), Paris,
France, April 2011. p. 263–268.
74. Tuba M, Subotic M, Stanarevic N. Performance of a modified cuckoo search algorithm for
unconstrained optimization problems. WSEAS Trans Syst. 2012;11(2):62–74.
75. Wang G-G, Gandomi AH, Alavi AH. Stud krill herd algorithm. Neurocomputing.
2014;128:363–70.
76. Wang P, Zhu Z, Huang S. Seven-spot ladybird optimization: a novel and efficient metaheuristic
algorithm for numerical optimization. Sci World J. 2013;2013:11. Article ID 378515.
77. Walton S, Hassan O, Morgan K, Brown M. Modified cuckoo search: a new gradient free
optimisation algorithm. J Chaos, Solitons Fractals. 2011;44(9):710–8.
78. Watts DJ, Strogatz SH. Collective dynamics of ‘small-world’ networks. Nature. 1998;393:440–
2.
79. Wiedermann J, Petru L. On the universal computing power of amorphous computing systems.
Theor Comput Syst. 2009;46(4):995–1010.
80. Wu L, Zuo C, Zhang H. A cloud model based fruit fly optimization algorithm. Knowl-Based
Syst. 2015;89:603–17.
81. Wu L, Zuo C, Zhang H, Liu Z. Bimodal fruit fly optimization algorithm based on cloud model
learning. Soft Comput. 2016:17. doi:10.1007/s00500-015-1890-3.
82. Yan X, Yang W, Shi H. A group search optimization based on improved small world and its
application on neural network training in ammonia synthesis. Neurocomputing. 2012;97:94–
107.
83. Yang XS. Firefly algorithms for multimodal optimization. In: Proceedings of the 5th inter-
national symposium on stochastic algorithms: Foundations and applications, SAGA 2009,
Sapporo, Japan, October 2009. p. 169–178.
84. Yang X-S. A new metaheuristic bat-inspired algorithm. In: González JR, Pelta DA, Cruz C, Terrazas G,
Krasnogor N, editors. Nature inspired cooperative strategies for optimization (NICSO), vol.
284 of Studies in computational intelligence. Berlin, Germany: Springer; 2010. p. 65–74.
85. Yang X-S. Bat algorithm for multi-objective optimisation. Int J Bio-Inspired Comput.
2011;3:267–74.
86. Yang X-S. Flower pollination algorithm for global optimization. In: Unconventional computa-
tion and natural computation, vol. 7445 of Lecture notes in computer science. Berlin: Springer;
2012. p. 240–249.
87. Yang XS, Deb S. Cuckoo search via Levy flights. In: Proceedings of world congress on nature
and biologically inspired computing, Coimbatore, India, December 2009. p. 210–214.
88. Yang XS, Deb S. Engineering optimisation by cuckoo search. Int J Math Modell Numer Optim.
2010;1(4):330–43.
89. Yang X-S, Deb S. Eagle strategy using Levy walk and firefly algorithms for stochastic opti-
mization. In: Gonzalez JR, Pelta DA, Cruz C, Terrazas G, Krasnogor N, editors. Nature inspired
cooperative strategies for optimization (NISCO 2010), vol. 284 of Studies in computational
intelligence. Berlin: Springer; 2010. p. 101–111.
90. Yang X-S, Karamanoglu M, He X. Multi-objective flower algorithm for optimization. Procedia
Comput Sci. 2013;18:861–8.
91. Yang X-S, Karamanoglu M, He XS. Flower pollination algorithm: a novel approach for mul-
tiobjective optimization. Eng Optim. 2014;46(9):1222–37.
92. Yu JJQ, Li VOK. A social spider algorithm for global optimization. Appl Soft Comput.
2015;30:614–27.
93. Zelinka I. SOMA—Self organizing migrating algorithm. In: Onwubolu GC, Babu BV, edi-
tors. New optimization techniques in engineering, vol. 141 of Studies in fuzziness and soft
computing. New York: Springer; 2004. p. 167–217.
94. Zhao R, Tang W. Monkey algorithm for global numerical optimization. J Uncertain Syst.
2008;2(3):164–75.
16 Biomolecular Computing
16.1 Introduction
A multicellular organism consists of a vast number of cells that run their own cycle
in parallel to maintain the organism alive and functional. A single cell is the build-
ing block of living organisms. As such, either development alone, or development combined
with evolution, could be a suitable design method for a cellular computing machine. Living
cells can be categorized into the prokaryotes (including bacteria and archaea) and
eukaryotes (including animals, plants and fungi). Eukaryotic cells contain complex
functional substructures enclosed in membranes, whereas prokaryotic cells largely
organize their substructures without using them. P systems (or membrane systems)
are models of computation inspired by eukaryotic cells [45]. Biological functions are the results of
the interactions between modules made up of many molecular species.
The transport of chemicals (symbol objects) across membranes is a fundamental
function of a cell. The transport can be passive or active. It is passive when molecules
(symbol objects) pass across the membrane from a higher concentration region to a
lower concentration region, while it is active in a reverse case. This requires some
metabolical energy to accomplish the transport. Respiration is the biological process
that allows the cells (from bacteria to humans) to obtain energy. In short, respiration
promotes a flux of electrons from electron donors to a final electron acceptor, which
in most cases is the molecular oxygen.
Cellular architectures are an appealing architecture for hardware. BioSpice [60]
is a simulation tool providing models of cells and cell communication at different
and preprocess diverse incoming signals, likening their behavior to that of a fuzzy
classifier system. In [35], the author draws parallels between the adaptive behaviors
of various signaling pathways and those of engineered controllers.
Adleman’s seminal work [1] on the use of DNA molecules for solving a TSP of
seven cities has pioneered an era in DNA computing. The application was realized
by creating a solution environment in the biology laboratory and using biochemical
reactions. The cities and distances were coded using DNA sequences, and the operations for
the solution were realized using polymerase chain reactions.
DNA computing is an optimization metaheuristic that performs compu-
tation using DNA molecules of living things. The paradigm utilizes the natural
biomolecular characteristics of DNA molecules, such as the inherent logic of DNA
hybridization, massive parallelism, high memory capacity without any corruption
in many years, and energy saving. Techniques in isolating, copying, and prepar-
ing nucleotide sequences make DNA computing a powerful alternative to silicon
computers. However, equipment such as test tubes and gel electrophoresis systems
is needed for the production of DNA strands, for DNA synthesis, and for
acquiring and analyzing the results.
DNA molecules exist in single-stranded and double-helix (double-stranded) forms. Single
DNA strands are used for the synthesis and reproduction of DNA molecules. Double-
helix DNA strands are formed according to the Watson–Crick complementarity rule:
A pairs with T, and G pairs with C.
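As a small illustration of the complementarity rule, the following MATLAB fragment computes the Watson–Crick complement of a strand given as a character array; the function name wc_complement is chosen only for this example.

function c = wc_complement(s)
% Watson-Crick complement of a DNA strand, e.g. wc_complement('ATTGC') returns 'TAACG'
from = 'ATGC';
to   = 'TACG';
[~, idx] = ismember(s, from);    % position of each base in 'ATGC'
c = to(idx);                     % map A->T, T->A, G->C, C->G
end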
The unique double stranded data structure can be exploited in many ways, such
as error correction. Errors in DNA occur due to mistakes made by DNA enzymes or
damage from thermal energy and ultraviolet energy from the sun. If the error occurs
in only one strand of double stranded DNA, repair enzymes can restore the proper
DNA sequence by using the complement strand as a reference. In biological systems,
due to this error correction capability, the error rate for DNA operations can be quite
low. A typical error rate for DNA replication is 10−9 .
DNA molecules can perform sophisticated, massively parallel computations. The
potential of polymerase chain reaction as a verification and readout approach has been
shown for computations at the molecular level. In [1], polymerase chain reaction is
used to amplify all the correct solutions of the TSP. A 10-bit RNA combinator-
ial library was reverse transcribed and amplified through colony polymerase chain
reaction followed by multiplex linear polymerase chain reaction to determine the
configuration of knights on a chessboard [18].
The experimental side often focused on the implementation of molecular circuits
and gates that mimic their digital counterparts [53,55,57,61]. Bacterial genetic ele-
ments have been connected to create logic gates that approximate Boolean functions
such as NOT, OR, and AND [26]. A set of deoxyribozyme-based logic gates (NOT,
AND, and XOR) is presented in [57]. As the input and output of the gates are both
DNA strands, different gates can communicate with one another. In [55], DNA-based
digital logic gates for constructing large reliable circuits are implemented. In addi-
tion to logic gates they demonstrated signal restoration and amplification. In [53], a
set of catalytic logic gates suitable for scaling up to large circuits is presented and a
formalism for representing and analyzing circuits based on these gates is developed.
Engineered nucleic acid logic switches based on hybridization and conformational
changes have also been successfully demonstrated in vivo [31]. These switches have
been extended to more complex logical gates in [61]. Their gates are part of a single
molecule of RNA which can fold on itself into a special structure. It can detect spe-
cific chemical molecules as input, and either cleave itself or remain intact based on
the input(s) and the function of the gate. Advances have also been made in designing
simple molecular machines that open and close like a clamp [65].
RTRACS (Reverse transcription and TRanscription-based Autonomous Comput-
ing System) is a molecular computing system constructed with DNA, RNA, and
enzymes. In [33], a two-input logic gate is reported that receives input and produces
output in the form of RNA molecules. Each of the two-input molecules is chosen
from a set of two, and the logic gate produces an output molecule for each of the four
possible input combinations. Since the RNA strands can be arbitrarily assigned log-
ical values, this module is capable of performing multiple logical operations, including
AND, NAND, OR, and NOR.
The processing of the information stored in DNA is rather random, incomplete,
and complex, especially as the size and sequence diversity of the oligonucleotide
mix increases. DNA computing generates solutions in a probabilistic manner, where
any particular solution is generated with some probability based on the complex
dynamics of the bonding process. By increasing the number of each strand in the
initial solution, one can assume with reasonable certainty that all possible solutions
will be constructed in the initial solution set.
Many theoretical designs have been proposed for DNA automata and Turing
machines [6,10,54]. An in vitro combination of DNA, restriction enzymes and
DNA ligase has been used to construct a programmable finite automaton (Turing
machine) [5].
Synthetic gene networks allow us to engineer cells in the same way that we currently
program computers. To program cell behavior, a component library of genetic cir-
cuit building blocks is necessary. These building blocks perform computation and
communications using DNA-binding proteins, small inducer molecules that inter-
act with these proteins, and segments of DNA that regulate the expression of these
proteins. A component library of cellular gates that implement several digital logic
functions is described in [59]. To represent binary streams, the chemical concentra-
tions of specific DNA-binding proteins and inducer molecules act as the input and
output signals of the genetic logic gates. Biochemical inverters are used to construct
more sophisticated gates and logic circuits. Figure 16.1 depicts a circuit in which a
NAND gate is connected to an inverter. For simplicity, both mRNA and their corre-
sponding protein products are used to denote the signals, or the circuit wires. The
regulation of the promoter and mRNA and protein decay enable the gate to perform
computation. The NAND gate protein output is expressed in the absence of either
Figure 16.1 A biochemical NAND gate connected to a downstream inverter. The two-input NAND
gate consists of two separate inverters, each with a different input, but both connected to the same
output protein [59].
of the inputs, and transcription of the output gene is only inhibited when both input
repressor proteins are present.
One of the most promising applications of DNA computing might be a DNA
memory [4]. The stored information on DNA can be kept without deteriorating for
a long period of time because DNA is very hard to degrade. DNA memory could
have a capacity greater than that of the human brain at a minute scale (e.g., a few hundred
microliters) [4]. A DNA-based memory has been implemented with in vitro learning
and associative recall [11]. The learning protocol stores the sequences to which it
is exposed, and memories are recalled by sequence content through DNA-to-DNA
template annealing reactions. Theoretically, the memory has a pattern separation
capability that is very large, and can learn long DNA sequences. The learning and
recall protocols are massively parallel, as well as simple, inexpensive, and quick. The
design of an irreversible memory element for use in DNA-based computing systems
is presented in [7]. A DNA memory with 16.8 million addresses was achieved in
[64]. The data embedded at a unique address was correctly extracted through
an addressing process based on nested polymerase chain reaction. In the decoding
process, multiple data items with different addresses can also be accessed simultaneously
by using a mixture of address primers.
Recombinant DNA technology allows the manipulation of the genetic information
of the genome of a living cell. It facilitates the alteration of bio-nanomachines within
the living cells and leads to genetically modified organisms. Manipulation of DNA
mimics the horizontal gene transfer in the test tube.
Numerical DNA Computing
Unlike DNA computing using DNA molecules, numerical DNA computing is similar
to GA, but it uses A, T, G, and C bases to code the solution set [63]. A, G, C, and T
bases can be converted into numerical data using 0, 1, 2, and 3, respectively. DNA
computing has two new mutation operations: enzyme and virus mutations. Enzyme
mutation deletes one or more DNA parts from a DNA strand, while virus mutation
adds one or more DNA parts to a DNA strand. The two mutations provide continuous
renewal of the population and prevent focusing on local optima.
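A minimal MATLAB sketch of these two operators on a numerically coded strand (bases coded 0–3, as above) is given below; the segment length k and the function names are illustrative assumptions, not part of the original algorithm description.

function s = enzyme_mutation(s, k)
% Enzyme mutation: delete k consecutive bases from a random position of strand s.
pos = randi(numel(s) - k + 1);
s(pos:pos+k-1) = [];
end

function s = virus_mutation(s, k)
% Virus mutation: insert k random bases (coded 0-3) at a random position of strand s.
pos = randi(numel(s) + 1);
s = [s(1:pos-1), randi([0 3], 1, k), s(pos:end)];
end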
DNA computing has some limitations in terms of convergence speed, adaptability,
and effectiveness. In [34], DNA computing algorithm is improved by using adaptive
parameters toward the desired goal using quantum-behaved PSO, where parameters
of population size, crossover rate, maximum number of operations, enzyme and virus
mutation rates, and fitness function are simultaneously tuned for adaptive process in
order to increase the diversity in the population and prevent the focusing on local
optimum points.
All life forms process information on a biomolecular level, which is robust, self-
organizing, adaptive, decentralized, asynchronous, fault-tolerant, and evolvable.
These properties have been exploited in artificial chemical systems like P systems
or artificial hormone systems.
The structure of the basic cell-like P system consists of several membranes arranged
in a hierarchical structure inside a main membrane (the skin), and delimiting regions.
A cell-like P system is defined as a hierarchical arrangement of compartments delim-
ited by membranes. Each compartment may contain a finite multiset of objects (chem-
icals) and a finite set of rules, as well as a finite set of other compartments. The rules
perform transformation and communication operations. Each membrane identifies
a region inside the system. A region contains some objects representing molecules
and evolution rules representing chemical reactions, and possibly other membranes.
A region without any membrane inside is called an elementary one. Objects inside
the regions are delimited by membranes, and rules are assigned to the regions of the
membrane structure. The objects can be described by symbols or by strings of sym-
bols. They can evolve and/or move from a region to a neighboring one according to
given evolution rules, associated with the regions. Usually, the rules are applied in
a nondeterministic and maximally parallel way. The evolution of the system corre-
sponds to a computation. The evolution rules represent biochemical interactions or
chemical reactions. During execution, rules are iteratively applied to the symbolic
state of each compartment, and compartments may break open, causing the compo-
sition of symbolic states. The molecular species (ions, proteins, etc.) floating inside
cellular compartments are represented by multisets of objects described by means
of symbols or strings over a given alphabet.
The membrane structure and its associated tree is shown in Figure 16.2. It has a
parentheses expression for membranes as [[[ ]5 [[ ]6 ]4 ]2 [ ]3 ]1 .
Figure 16.2 A membrane structure and its associated tree, with membranes, elementary membranes, and the environment.
The membrane structure µ is denoted by a string of left and right brackets ([, ]),
each with the label of the membrane it points to and describing the position of this
membrane in the hierarchy. The rules in each region have the form u → (a1, t1), . . . , (am, tm),
where u is a multiset of symbols from V, ai ∈ V, and ti ∈ {in, out, here}, i =
1, . . . , m, indicating whether the symbol ai remains in the current compartment, is sent to the
outer compartment, or is sent to one of the arbitrarily chosen compartments contained
in the current one. When the rule is applied to a multiset u in the current compartment,
u is replaced by the symbols ai.
A configuration of the P system is a tuple c = (u1, . . . , un), where ui ∈ V*
is the multiset associated with compartment i, i = 1, . . . , n. A computation from a
configuration c1 to c2 using the maximal parallelism mode is denoted by c1 =⇒ c2 .
A configuration is a terminal configuration if there is no compartment i such that u i
can be further developed.
A sequence of transitions between configurations of a given P system is called
a computation. A computation is successful if and only if it reaches a configuration in
which no rule is applicable. A successful computation sends a multiset of objects out of
the skin membrane during the computation. An unsuccessful computation never halts
and generates no result.
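To make the maximally parallel rule application concrete, the following is a minimal Python sketch of a toy cell-like P system; the two-compartment structure, alphabet, and rules are hypothetical and chosen only for illustration, not taken from the formal definitions above.

```python
import random
from collections import Counter

# Toy cell-like P system: compartment 1 (skin) contains compartment 2.
# Rules map a consumed symbol to a list of (symbol, target) pairs, where
# target is "here", "out", or "in" (send into a child compartment).
rules = {
    1: {"a": [("b", "here"), ("b", "in")]},   # hypothetical rules
    2: {"b": [("c", "out")]},
}
children = {1: [2], 2: []}
parent = {2: 1}
state = {1: Counter("aaa"), 2: Counter()}     # initial multisets

def step(state):
    """One maximally parallel step: every applicable rule instance fires."""
    applied = False
    incoming = {i: Counter() for i in state}
    for comp, multiset in state.items():
        for sym, products in rules.get(comp, {}).items():
            n = multiset[sym]                 # apply the rule to every copy
            if n == 0:
                continue
            applied = True
            multiset[sym] = 0
            for prod, target in products:
                if target == "here":
                    incoming[comp][prod] += n
                elif target == "out" and comp in parent:
                    incoming[parent[comp]][prod] += n
                elif target == "in" and children[comp]:
                    incoming[random.choice(children[comp])][prod] += n
    for comp in state:
        state[comp] += incoming[comp]
    return applied

while step(state):                            # halt when no rule is applicable
    pass
print(dict(state[1]), dict(state[2]))
```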
This framework provides polynomial-time solutions to NP-complete problems
by trading space for time, and its efficient simulation poses challenges in three
different aspects: the intrinsic massive parallelism of P systems, an exponential
computational workspace, and a non-intensive floating-point nature. Specifically, these
models were inspired by the capability of cells to produce an exponential number
of new membranes in linear time, through mitosis (membrane division) and/or
autopoiesis (membrane creation) processes.
Computing Capability of P Systems
In [28] it was proved that a P system with symport/antiport operating under maximal
parallelism, with only one symbol and degree 2n + 3, can simulate a partially
blind register machine with n registers. If priorities are added to the rules, then the
obtained P system, having n + 3 compartments, can simulate register machines with
n registers. The former result was improved in [20], where it was proved that any
partially blind register machine with n registers can be simulated by a P system with
symport/antiport with only one symbol, degree n + 3 and operating under maximal
parallelism. It was proved in [21] that P systems with symport/antiport operating
under maximal parallelism, with only one symbol and degree 2n + 1, can simulate
register machines with n registers. P systems can solve a number of NP-hard prob-
lems in linear or polynomial time complexity and even solve PSPACE problems in
a feasible time [3,32].
The first super-Turing model of computation rooted in biology rather than physics
is introduced in [8]. In [23], the accelerating P system model [8] is extended, and it
is shown that the resulting systems have hyperarithmetical computational power.
In addition to basic cell-like P systems [45], there are tissue-like P systems [41],
neural-like P systems [29], metabolic P systems [39], and population P systems [46].
In all cases, there are basic components (membranes, cells, neurons, etc.) hier-
archically arranged, through a rooted tree, for cell-like P systems, or distributed
across a network, like a directed graph, for tissue-like P systems, with a common
environment. Neural-like P systems consider neurons as their cells organized with
a network structure as a directed graph. Various variants of P systems with Turing
computing power have been developed and polynomial or linear solutions to a variety
of computationally hard, NP-complete or PSPACE-complete, problems have been
obtained [48].
A biological motivation of tissue P systems [41] is the intercellular communication
and cooperation between tissue cells by the interchange of signaling molecules.
Tissue P systems can simulate a Turing machine even when using a small number
of cells, each of them having a small number of states.
Tissue-like P systems consider arbitrary graphs as underlying structures, with
membranes placed in the nodes while edges correspond to communication channels
[41]. In tissue-like P systems, several one-membrane cells are considered as evolving
in a common environment [16]. Neural-like P systems can be similar to tissue-like
P systems, or be spiking neural P systems, which use only one type of object,
the spike. Results are output through the distances between consecutive spikes. The
computing systems obtained are proved to be equivalent to Turing machines [47] even
when using restricted combinations of features. In the evolution–communication P
systems, communication rules are represented by symport/antiport rules that simulate
some of the biochemical transport mechanisms present in the cell.
Figure 16.3 shows the membrane structure of a tissue-like P system for evolving
the optimal solution. It consists of q cells. The region 0 is the environment and out-
put region of the system. The directed lines indicate the communication of objects
between the cells. Each object in the cells expresses a solution. The cells are arranged
as a loop topology based on the communication rules. Each cell runs independently.
The environment stores the global best object found so far. The communication
mechanism exchanges the objects between each cell and its two adjacent cells and
updates the global best object in the environment by using antiport and symport
communication rules. The role of the evolution rules is to evolve the objects in the cells to
generate new objects for the next computing step. During the evolution, each cell
maintains a population of objects. After the objects are evolved, each cell communicates
its best object found in the current computing step into the environment to update
the global best object. When the system halts, the objects in the environment are
regarded as the output of the system. This membrane computing approach has been
implemented for clustering in [50].
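A rough sketch of how such a membrane-inspired optimizer could be organized is given below. It is only illustrative: the evolution rule is a plain Gaussian perturbation, the loop-topology communication exchanges best objects between neighboring cells and the environment, and the objective and parameter values are assumptions rather than the scheme of [50].

```python
import random

def sphere(x):                      # hypothetical cost function to minimize
    return sum(v * v for v in x)

DIM, Q, POP, STEPS = 5, 4, 10, 200  # q cells, each holding POP objects

cells = [[[random.uniform(-5, 5) for _ in range(DIM)] for _ in range(POP)]
         for _ in range(Q)]
env_best = min((obj for cell in cells for obj in cell), key=sphere)

for _ in range(STEPS):
    for c in range(Q):
        # Evolution rules: perturb each object (Gaussian mutation as a stand-in).
        cells[c] = [min(obj, [v + random.gauss(0, 0.3) for v in obj], key=sphere)
                    for obj in cells[c]]
    # Communication rules on a ring: each cell receives the best object of
    # its two neighbors, replacing its own worst objects.
    bests = [min(cell, key=sphere) for cell in cells]
    for c in range(Q):
        for nb in ((c - 1) % Q, (c + 1) % Q):
            worst = max(range(POP), key=lambda i: sphere(cells[c][i]))
            if sphere(bests[nb]) < sphere(cells[c][worst]):
                cells[c][worst] = list(bests[nb])
    # Each cell also sends its best object to the environment (region 0).
    env_best = min(bests + [env_best], key=sphere)

print("best cost:", sphere(env_best))
```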
The inspiration for tissue P systems with cell separation [44] is that new cells are
produced by cell separation in tissues in a natural way. An upper bound of the power
of tissue P systems with cell separation is demonstrated in [56]. The class of problems
solvable by uniform families of these systems in polynomial time is contained in the
class PSPACE, which characterizes the power of many classical models of parallel
computing machines, such as the alternating Turing machine, relating classical and
bio-inspired parallel computing devices.
Spiking neural P systems [29,38] are a class of distributed parallel computing
models inspired by the neurophysiological behavior of neurons sending electrical
impulses (spikes) along axons to other neurons where there is a synapse between
each pair of connected neurons. Spiking neural P systems can also be viewed as an
evolution of P systems shifting from cell-like to neural-like architectures. They have
been shown to be computationally universal [29]. They employ the basic principle
of spiking neural networks — computation by sending spikes via a fixed network of
synapses between neurons, using membrane computing background. Each spiking
neuron is represented as a discrete device equipped with a counter of spikes it receives
from its neighbors. Even very restricted spiking neural P systems keep their universal
(in Turing sense) computational power [22].
Metabolic P systems [39] are a quantitative extension of P systems for modeling
metabolic processes. They are deterministic P systems developed to model the dynamics
of biological phenomena related to metabolism and signaling transduction in the
living cell. The classical viewpoint on metabolic dynamics, in terms of ordinary
differential equations, is replaced by suitable generalizations of chemical principles.
P systems with active membranes [46] have been proved to be complete from a
computational viewpoint, equivalent in this respect to Turing machines. The mem-
brane division can be used to solve computationally hard problems, e.g., NP-complete
problems, in polynomial or even linear time, by a space–time trade-off. In this com-
puting paradigm, decision problems are solved by using families of recognizer con-
fluent P systems [51], where all possible computations with the same initial config-
uration must give the same answer. In confluent recognizer P systems, all computa-
tions halt, only two possible outputs exist (usually named yes and no), and the result
produced by the system only depends upon its input, and is not influenced by the
particular sequence of computation steps taken to produce it.
Reconfig-P [43] is an implementation of membrane computing based on recon-
figurable hardware that is able to execute P systems at high performance. It exploits
the reconfigurability of the hardware by constructing and synthesizing a customized
hardware circuit for the specific P system to be executed. The Reconfig-P hardware
design treats reaction rules as the primary computational entities and represents
regions only implicitly. A generic simulator on GPUs for a family of recognizer P
systems with active membranes was presented in [9].
metaheuristics. Until now, two classes of membrane structures, the hierarchical structure
of a cell-like P system (formally, a rooted tree) and the network structure of a
tissue-like P system (formally, a directed graph), have been used to design a variety
of membrane-inspired EAs.
DEPS [13] is a membrane algorithm for numerical optimization, which combines DE,
a local search (the simplex method), and P systems. The hierarchical structure of
cell-like P systems is used to organize the objects, which consist of real-valued strings;
the rules are composed of mutation, crossover, and selection operations in the elementary
membranes, a local search in the skin membrane, and transformation/communication-like
rules of P systems. DEPS outperforms DE.
In [67], an EA is introduced by using the concepts and principles of the quantum-
inspired evolutionary approach and the hierarchical arrangement of the compart-
ments of a P system for solving NP-hard COPs.
The main structure of an animal cell includes the cell membrane, the cytoplasm,
and the nucleus. Many nuclear pores distributed on the cell nucleus are channels
for the transportation of macromolecules, such as mRNA, nucleus-related enzymes,
and some proteins. These macromolecules are essential substances for the metabolism
of the cell, while some other substances are forbidden to enter the cell nucleus. Owing
to the nuclear pores, the nucleus has the ability to select essential substances to
keep itself alive and strong by means of substance filtration. Cell nuclear pore
optimization [37] is inspired by this phenomenon from cell biology: the optimal
solution can be obtained through continuous filtering of the potential optimal solutions.
The method obtains the potential essential samples from the common samples according
to certain evaluation criteria; if the potential essential samples meet some evaluation
criteria, they are the real essential samples. Each common sample is accompanied by a
pore vector containing 0s and 1s, whose elements are generated by some initial
conditions or initial rules.
References
1. Adleman LM. Molecular computation of solutions to combinatorial problems. Science.
1994;266(5187):1021–4.
2. Albert R, Othmer HG. The topology of the regulatory interactions predicts the expression
pattern of the segment polarity genes in Drosophila melanogaster. J Theor Biol. 2003;223(1):
1–18.
3. Alhazov A, Martin-Vide C, Pan L. Solving a PSPACE complete problem by recognizing P
systems with restricted active membranes. Fundamenta Informaticae. 2003;58(2):67–77.
4. Baum EB. Building an associative memory vastly larger than the brain. Science. 1995;268:583–
5.
5. Benenson Y, Paz-Elizur T, Adar R, Keinan E, Livneh Z, Shapiro E. Programmable and
autonomous computing machine made of biomolecules. Nature. 2001;414:430–4.
6. Benenson Y, Gil B, Ben-Dor U, Adar R, Shapiro E. An autonomous molecular computer for
logical control of gene expression. Nature. 2004;429(6990):423–9.
7. Blenkiron M, Arvind DK, Davies JA. Design of an irreversible DNA memory element. Nat
Comput. 2007;6:403–11.
8. Calude CS, Paun G. Bio-steps beyond Turing. BioSystems. 2004;77:175–94.
9. Cecilia JM, Garcia JM, Guerrero GD, Martinez-del-Amor MA, Perez-Hurtado I, Perez-
Jimenez MJ. Simulation of P systems with active membranes on CUDA. Briefings Bioinform.
2010;11(3):313–22.
10. Chen H, Anindya D, Goel A. Towards programmable molecular machines. In: Proceedings of
the 5th conference on foundation of nanoscience, Snowbird, Utah, 2008. p. 137–139.
11. Chen J, Deaton R, Wang YZ. A DNA-based memory with in vitro learning and associative
recall. Nat Comput. 2005;4:83–101.
12. Chen H, Ionescu M, Ishdorj T. On the efficiency of spiking neural P systems. In: Gutierrez-
Naranjo MA, Paun G, Riscos-Nunez A, Romero-Campero FJ, editors. Proceedings of fourth
brainstorming week on membrane computing, Sevilla, Spain, February 2006. p. 195–206.
13. Cheng J, Zhang G, Zeng X. A novel membrane algorithm based on differential evolution for
numerical optimization. Int J Unconv Comput. 2011;7:159–83.
14. Clelland CT, Risca V, Bancroft C. Hiding messages in DNA microdots. Nature.
1999;399(6736):533–4.
15. Cox JP. Long-term data storage in DNA. Trends Biotechnol. 2001;19(7):247–50.
16. Diaz-Pernil D, Gutierrez-Naranjo MA, Perez-Jimenez MJ, Riscos-Nuez A. A linear-time tis-
sue P system based solution for the 3-coloring problem. Electron Notes Theor Comput Sci.
2007;171(2):81–93.
17. Dittrich P, Ziegler J, Banzhaf W. Artificial chemistries—a review. Artif Life. 2001;7(3):225–75.
18. Faulhammer D, Cukras AR, Lipton RJ, Landweber LF. Molecular computation: RNA solutions
to chess problems. Proc Nat Acad Sci U.S.A. 2000;97:1385–9.
19. Fisher MJ, Paton RC, Matsuno K. Intracellular signalling proteins as ‘smart’ agents in parallel
distributed processes. BioSystems. 1999;50(3):159–71.
20. Frisco P. Computing with cells: advances in membrane computing. Oxford: Oxford University
Press; 2009.
21. Frisco P. P Systems and unique-sum sets. In: Proceedings of international conference on mem-
brane computing, Lecture notes of computer science 6501. Berlin: Springer; 2010. p. 208–225.
22. Garcia-Arnau M, Perez D, Rodriguez-Paton A, Sosik P. On the power of elementary features
in spiking neural P systems. Nat Comput. 2008;7:471–83.
23. Gheorghe M, Stannett M. Membrane system models for super-Turing paradigms. Nat Comput.
2012;11:253–9.
24. Grumbach S, Tahi F. A new challenge for compression algorithms: genetic sequences. Inf
Process Manag. 1994;30:875–86.
25. Grumbach S, Tahi F. Compression of DNA sequences. In: Proceedings of data compression
conference, Snowbird, UT, March 1993. p. 340–350.
26. Hasty J, McMillen D, Collins JJ. Engineered gene circuits. Nature. 2002;420:224–30.
27. Heider D, Barnekow A. DNA-based watermarks using the DNA-crypt algorithm. BMC Bioin-
form. 2007;8:176.
28. Ibarra OH, Woodworth S. On symport/antiport P systems with small number of objects. Int J
Comput Math. 2006;83(7):613–29.
29. Ionescu M, Paun G, Yokomori T. Spiking neural P systems. Fundamenta Informaticae.
2006;71:279–308.
30. Ionescu M, Paun G, Yokomori T. Spiking neural P systems with an exhaustive use of rules. Int
J Unconv Comput. 2007;3(2):135–53.
31. Isaacs FJ, Dwyer DJ, Ding C, Pervouchine DD, Cantor CR, Collins JJ. Engineered riboregu-
lators enable post-transcriptional control of gene expression. Nat Biotechnol. 2004;22:841–7.
32. Ishdorj T, Leporati A, Pan L, Zeng X, Zhang X. Deterministic solutions to QSAT and Q3SAT by
spiking neural P systems with pre-computed resources. Theor Comput Sci. 2010;411:2345–58.
33. Kan A, Sakai Y, Shohda K, Suyama A. A DNA based molecular logic gate capable of a variety
of logical operations. Nat Comput. 2014;13:573–81.
34. Karakose M, Cigdem U. QPSO-based adaptive DNA computing algorithm. Sci World J.
2013;2013:8. Article ID 160687.
35. Lauffenburger DA. Cell signaling pathways as control modules: complexity for simplicity?
PNAS. 2000;97(10):5031–3.
36. Leporati A, Besozzi D, Cazzaniga P, Pescini D, Ferretti C. Computing with energy and chemical
reactions. Nat Comput. 2010;9:493–512.
37. Lin L, Guo F, Xie X. Novel informative feature samples extraction model using cell nuclear
pore optimization. Eng Appl Artif Intell. 2015;39:168–80.
38. Maass W. Computing with spikes. Found Inf Process TELEMATIK. 2002;8:32–6.
39. Manca V, Bianco L, Fontana F. Evolution and oscillation in P systems: applications to biolog-
ical phenomena. In: Mauri G, Paun G, Perez-Jimenez MJ, Rozenberg G, Salomaa A, editors.
Workshop on membrane computing, Lecture notes in computer science 3365. Berlin: Springer;
2004. p. 63–84.
40. Marijuan PC. Enzymes, artificial cells and the nature of biological information. BioSystems.
1995;35:167–70.
41. Martin-Vide C, Paun G, Pazos J, Rodriguez-Paton A. Tissue P systems. Theor Comput Sci.
2003;296(2):295–326.
42. Neary T. On the computational complexity of spiking neural P systems. Nat Comput.
2010;9:831–51.
43. Nguyen V, Kearney D, Gioiosa G. An implementation of membrane computing using recon-
figurable hardware. Comput Inform. 2008;27:551–69.
44. Pan L, Perez-Jimenez M. Computational complexity of tissue-like P systems. J Complex.
2010;26:296–315.
45. Paun G. Computing with membranes. J Comput Syst Sci. 2000;61(1):108–43.
46. Paun G. Membrane computing: an introduction. Berlin: Springer; 2002.
47. Paun G. A quick introduction to membrane computing. J Logic Algebraic Program.
2010;79(6):291–4.
48. Paun G, Rozenberg G, Salomaa A, editors. Handbook of membrane computing. Oxford, UK:
Oxford University Press; 2010.
49. Paun G, Rozenberg G, Salomaa A. DNA computing. Berlin: Springer; 1998.
50. Peng H, Luo X, Gao Z, Wang J, Pei Z. A novel clustering algorithm inspired by membrane
computing. Sci World J. 2015;2015:8. Article ID 929471.
51. Perez-Jimenez MJ, Romero-Jimenez A, Sancho-Caparrini F. Complexity classes in models of
cellular computing with membranes. Nat Comput. 2003;2(3):265–85.
52. Porreca AE, Leporati A, Mauri G, Zandron C. P systems with active membranes: trading time
for space. Nat Comput. 2011;10:167–82.
53. Qian L, Winfree E. A simple DNA gate motif for synthesizing large-scale circuits. In: DNA
computing, Volume 5347 of Lecture notes in computer science. Berlin: Springer; 2008. p.
70–89.
54. Rothemund P. A DNA and restriction enzyme implementation of turing machines. In: DNA
based computers, DIMACS series in discrete mathematics and theoretical computer science,
no. 27. Providence, RI: American Mathematical Society; 1996. p. 75–120.
55. Seelig G, Soloveichik D, Zhang DY, Winfree E. Enzyme-free nucleic acid logic circuits. Sci-
ence. 2006;314(5805):1585.
56. Sosik P, Cienciala L. Computational power of cell separation in tissue P systems. Inf Sci.
2014;279:805–15.
57. Stojanovic MN, Mitchell TE, Stefanovic D. Deoxyribozyme-based logic gates. J Am Chem
Soc. 2002;124(14):3555–61.
58. Tufte G, Haddow PC. Towards development on a silicon-based cellular computing machine.
Nat Comput. 2005;4:387–416.
59. Weiss R, Basu S, Hooshansi S, Kalmbach A, Karig D, Mehreja R, Netravalt I. Genetic circuit
building blocks for cellular computation, communications, and signal processing. Nat Comput.
2003;2:47–84.
60. Weiss R, Knight Jr TF, Sussman G. Genetic process engineering. In: Amos M, editor. Cellular
computation. Oxford, UK: Oxford University Press; 2004. p. 43–73.
61. Win MN, Smolke CD. Higher-order cellular information processing with synthetic RNA
devices. Science. 2008;322(5900):456–60.
62. Wong PC, Wong K, Foote H. Organic data memory using the DNA approach. Commun ACM.
2003;46(1):95–8.
63. Xu J, Qiang X, Yang Y, Wang B, Yang D, Luo L, Pan L, Wang S. An unenumerative DNA
computing model for vertex coloring problem. IEEE Trans Nanobiosci. 2011;10(2):94–8.
64. Yamamoto M, Kashiwamura S, Ohuchi A, Furukawa M. Large-scale DNA memory based on
the nested PCR. Nat Comput. 2008;7:335–46.
65. Yurke B, Turberfield A, Mills A Jr, Simmel F, Neumann J. A DNA-fuelled molecular machine
made of DNA. Nature. 2000;406:605–8.
66. Zandron C, Ferretti C, Mauri G. Solving NP-complete problems using P systems with active
membranes. In: Antoniou I, Calude CS, Dinneen MJ, editors. Unconventional models of com-
putation. London: Springer; 2000. p. 289–301.
67. Zhang GX, Gheorghe M, Wu CZ. A quantum-inspired evolutionary algorithm based on P
systems for knapsack problem. Fundamenta Informaticae. 2008;87:93–116.
17 Quantum Computing
17.1 Introduction
nized, coherent. If one particle is measured and collapses, it causes all other particles
to collapse too.
The well-known quantum algorithms include the Deutsch–Jozsa algorithm [7],
Shor’s quantum factoring algorithm [29,30], and Grover’s database search algorithm
[9,10]. Shor’s algorithm can give an exponential speedup for factoring large integers
into prime numbers, and has been implemented using nuclear magnetic resonance
(NMR) [33]. Shor’s quantum algorithm [30] is exponentially faster than any known
classical algorithm. It can factorize large integers faster than any Turing program,
and this suggests that quantum theory has super-Turing potential.
In ensemble quantum computation, all computations are performed on an ensem-
ble of computers rather than on a single computer. Measurements of qubits in a single
computer cannot be performed, and only expectation values of each particular bit
over all the computers can be read out. The randomizing strategy and the sorting strat-
egy resolve the ensemble-measurement problem in most cases [2]. NMR computing
[5,8] is a promising implementation of quantum computing. Several quantum algo-
rithms involving only a few qubits have been demonstrated [5,8,18,26,33]. In such
NMR systems, each molecule is used as a computer. Different qubits in the com-
puter are represented by spins of different nuclei. There is an ensemble of quantum
computers.
17.2 Fundamentals
A qubit can be in a superposition of two basis states, |ψ⟩ = α|0⟩ + β|1⟩,
where |0⟩ and |1⟩ are the two basis states, and α and β are complex numbers defining
the probabilities with which the corresponding states are likely to appear when the qubit
is read (measured, collapsed). |α|^2 and |β|^2 give the probability of the qubit being
found in state 0 or 1, respectively. Thus, |α|^2 + |β|^2 = 1 at any time. After loss of
coherence, the qubit will collapse into one of the states |0⟩ or |1⟩.
With the exception of measurements, all other operations allowed by quantum
mechanics are unitary operations on the Hilbert space in which qubits live. They
are represented by gates, much as in a classical circuit. The Hadamard gate H maps
|0⟩ → (1/√2)(|0⟩ + |1⟩) and |1⟩ → (1/√2)(|0⟩ − |1⟩). It turns the eigenstates into
a superposition of |0⟩ and |1⟩ with equal probability amplitudes.
The evolution of a quantum system is described by special linear operators, unitary
operators U, which give
U|ψ⟩ = U[α|0⟩ + β|1⟩] = αU|0⟩ + βU|1⟩. (17.2)
That is, the evolution of a two-level quantum system is a linear combination of the
evolutions of its basis states.
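As a small illustration of such unitary gate operations (a NumPy sketch, not tied to any particular quantum toolkit), the following applies the Hadamard gate to |0⟩ and checks that the resulting measurement probabilities |α|² and |β|² are equal and sum to one, and that H is indeed unitary.

```python
import numpy as np

ket0 = np.array([1.0, 0.0])                    # |0>
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)   # Hadamard gate

psi = H @ ket0                                 # |psi> = (|0> + |1>)/sqrt(2)
probs = np.abs(psi) ** 2                       # measurement probabilities
print(psi, probs, probs.sum())                 # [0.707 0.707] [0.5 0.5] 1.0

# Unitarity: U^dagger U = I, so the total probability is preserved.
print(np.allclose(H.conj().T @ H, np.eye(2)))  # True
```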
Analogous to logic gates in classical computers, quantum computing tasks can be
completed through quantum logic gates. In order to modify the probability ampli-
tudes, quantum gates can be applied to the states of a qubit. Quantum gates are
unitary operators that transform quregisters into quregisters. Being unitary, gates
represent characteristic reversible transformations. Some of the most useful quantum gates
for quantum computation are the NOT gate, the controlled-NOT (CNOT) gate, the phase-shift
gate, and the Hadamard gate. The phase-shift gate is an important element for carrying out
the Grover iteration, which reinforces a good decision. The quantum analog for exploring
the search space is the quantum gate, which is a unitary transformation.
A set of gates is said to be universal for quantum computation if any unitary
operation may be approximated to arbitrary accuracy by a quantum circuit involving
only those gates. Any arbitrary unitary operation can be approximated to arbitrary
accuracy using Hadamard, phase, CNOT, and π/8 gates. Further, any classical circuit
can be made reversible by introducing a special gate called the Toffoli gate. Since a
quantum version of the Toffoli gate exists, any classical reversible circuit can be
converted to a quantum circuit that computes the same function.
The basic components of quantum circuits are linear quantum gates, which imple-
ment unitary (and reversible) transformations as rotations in the complex qubit vector
space. Rotations maintain the orthogonality of basis vectors, and hence, the validity
of the measurement postulate. Each quantum gate is therefore represented by a suit-
able unitary matrix U. As a consequence of linearity for matrix-vector multiplication,
the gate operation is equivalently represented by the transformation of every basis
vector in the quantum state space. The unitary property implies that quantum states
cannot be copied or cloned; this is also known as the no-cloning property.
The no-cloning theorem states that it is not possible to clone a quantum state |ψ⟩, and
thus to obtain full information on the coefficients α and β from a single copy of |ψ⟩.
Entanglement is another feature arising from the linearity of quantum mechanics.
The state of a composite classical system is completely determined by the states
of the subsystems. The state |ψ⟩_AB of a composite quantum system is the tensor
product ⊗ of the states of the component systems.
|Bell⟩_AB = (1/√2) [ |0⟩_A ⊗ |0⟩_B + |1⟩_A ⊗ |1⟩_B ]. (17.3)
Such a Bell state is said to be entangled.
Quantum algorithms rely on properties of quantum parallelism and quantum
superposition. Quantum parallelism arises from the ability of a quantum memory
register to exist in a superposition of base states. A quantum memory register can
exist in a superposition of states and each component of this superposition may be
thought of as a single argument to a function. Since the number of possible states is
2^n for n qubits in the quantum register, we can perform in one operation on a quan-
tum computer what would take an exponential number of operations on a classical
computer. As the number of superposed states increases in the quantum register,
the probability of measuring any state will start decreasing. In quantum computing,
by contrast, all solutions are guaranteed to be generated and we need not concern
ourselves with the possibility of missing potential solutions.
A quantum algorithm is based on a probability for searching the best solution randomly.
It has the drawback of premature convergence and stagnation in the late stage of evolution.

increasing the amplitude of the marked state by O(√N), at the expense of the nonmarked
states, in a manner analogous to the interference of waves. Therefore, after O(√N)
iterations, the probability of measuring the marked state approaches 1. Grover showed
that performing a measurement after (π/4)√N iterations is highly likely to give the
correct result for sufficiently large N.
Grover’s algorithm is given in Algorithm 17.1.
Example 17.1: In this simple example, we search for a needle in the haystack, i.e., to
find a particular element among the elements of a database. We now simulate Grover’s
algorithm using six qubits. There are 2^6 = 64 database elements. The desired ele-
ment is randomly generated from among the 64 elements. The Walsh–Hadamard
transformation and the operators to rotate phase and inversion about average trans-
formation are achieved through matrices of zeros and ones. By testing with different
values for the number of iterations, we verified that the optimal number of iterations
is determined by (π/4)√N, as proposed by Grover. Figure 17.1 shows the probability
dynamics for the desired element being selected and the resulting distribution for each
element being selected.
Figure 17.1 a The probability dynamics for the desired element being selected, and b the resulting
distribution for each element being selected.
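A compact way to reproduce this kind of experiment is to simulate the amplitude vector directly. The sketch below uses plain NumPy (independent of any accompanying toolkit); the marked element index is an arbitrary assumption. It applies the oracle phase flip and the inversion-about-average (diffusion) operator for about (π/4)√N iterations.

```python
import numpy as np

n_qubits = 6
N = 2 ** n_qubits                      # 64 database elements
marked = 37                            # hypothetical desired element

amp = np.full(N, 1 / np.sqrt(N))       # Walsh-Hadamard: uniform superposition
iters = int(np.floor(np.pi / 4 * np.sqrt(N)))

for _ in range(iters):
    amp[marked] *= -1                  # oracle: flip the phase of the marked state
    amp = 2 * amp.mean() - amp         # inversion about the average (diffusion)

probs = amp ** 2
print(iters, probs[marked])            # ~6 iterations, probability close to 1
```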
In EAs, variation operators such as crossover and mutation are used to explore
the search space. The quantum analog of these operators is called a quantum gate.
Mutation can be performed by deducing a probability distribution or by Q-gate rotation
[34], while the quantum collapse concept was introduced in [35] to maintain diversity
among quantum chromosomes.
Quantum or quantum-inspired EAs have been proposed to improve existing
EAs. In such hybrid methods, qubits are generally used to encode the individuals
and quantum operations are used to define the genetic operations. There are
binary-observation quantum-inspired EAs [11,12] and real-observation quantum EAs [37].
1. Quantum initialization.
Set the generation t = 1.
Set the population size N and the number of quantum genes m.
2. Repeat:
a. Measurement: Convert the quantum encoding to a binary encoding (b_1^t, b_2^t, ..., b_N^t).
Produce a random number r ∈ (0, 1);
if r < |α_{ji}^t|^2, then b_i = 0; otherwise b_i = 1.
b. Fitness evaluation.
c. Determine the rotation angle θ.
d. Apply the quantum gate U on each x_j^t.
e. Record the best solution.
f. Set t = t + 1.
until termination criterion is satisfied.
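The measurement and rotation steps of this scheme can be sketched as follows for a simple binary objective. This is only an illustration under assumptions: each quantum gene is stored as an angle with |α|² = cos²(θ), measurement samples a bit accordingly, and a fixed rotation step moves the angle toward the best solution found so far instead of the rotation-angle lookup tables used in [11,12].

```python
import numpy as np

rng = np.random.default_rng(0)
N, m, delta = 20, 12, 0.03 * np.pi         # population, genes, rotation step
weights = rng.random(m)                     # hypothetical objective: maximize
                                            # the sum of selected weights

theta = np.full((N, m), np.pi / 4)          # |alpha|^2 = cos^2(theta) = 0.5
best_b, best_f = None, -np.inf

for t in range(100):
    # Measurement: collapse each qubit to 0 with probability |alpha|^2.
    bits = (rng.random((N, m)) >= np.cos(theta) ** 2).astype(int)
    fits = bits @ weights
    if fits.max() > best_f:
        best_f, best_b = fits.max(), bits[fits.argmax()].copy()
    # Q-gate rotation: rotate each angle toward the best solution's bit value
    # (a larger angle raises the probability of measuring 1).
    theta += delta * np.where(best_b == 1, 1.0, -1.0)
    theta = np.clip(theta, 0.01, np.pi / 2 - 0.01)

print(best_f, best_b)
```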
Problem
17.1 Open the accompanying Quantum Information Toolkit (QIT) MATLAB code.
(a) Run and understand Shor’s factorization algorithm.
(b) Run and understand Grover’s search algorithm.
References
1. Benioff P. The computer as a physical system: a microscopic quantum mechanical Hamiltonian
model of computers as represented by Turing machines. J Stat Phys. 1980;22(5):563–91.
2. Boykin PO, Mor T, Roychowdhury V, Vatan F. Algorithms on ensemble quantum computers.
Natural Comput. 2010;9(2):329–45.
3. Chiang H-P, Chou Y-H, Chiu C-H, Kuo S-Y, Huang Y-M. A quantum-inspired tabu search
algorithm for solving combinatorial optimization problems. Soft Comput. 2014;18:1771–81.
4. Chuang IL, Gershenfeld N, Kubinec M. Experimental implementation of fast quantum search-
ing. Phys Rev Lett. 1998;80(15):3408–11.
5. Cory DG, Fahmy AF, Havel TF. Ensemble quantum computing by nuclear magnetic resonance
spectroscopy. Proc Natl Acad Sci USA. 1997;94:1634–9.
6. Deutsch D. Quantum theory, the Church-Turing principle and the universal quantum computer.
Proc Royal Soc Lond A. 1985;400(1818):97–117.
7. Deutsch D, Jozsa R. Rapid solution of problems by quantum computation. Proc Royal Soc
Lond A. 1992;439(1907):553–8.
8. Gershenfeld N, Chuang IL. Bulk spin-resonance quantum computation. Science.
1997;275(5298):350–6.
9. Grover LK. Quantum mechanics helps in searching for a needle in a haystack. Phys Rev Lett.
1997;79(2):325–8.
10. Grover LK. A fast quantum mechanical algorithm for database search. In: Proceedings of the
28th annual ACM symposium on theory of computing (STOC’96), Philadelphia, PA, USA,
May 1996. New York: ACM Press; 1996. p. 212–219.
11. Han KH, Kim JH. Quantum-inspired evolutionary algorithm for a class of combinatorial opti-
mization. IEEE Trans Evol Comput. 2002;6(6):580–93.
12. Han KH, Kim JH. Quantum-inspired evolutionary algorithms with a new termination criterion,
H gate, and two-phase scheme. IEEE Trans Evol Comput. 2004;8(2):156–69.
13. Han KH, Kim JH. On the analysis of the quantum-inspired evolutionary algorithm with a single
individual. In: Proceedings of IEEE congress on evolutionary computation (CEC), Vancouver,
BC, Canada, July 2006. p. 2622–2629.
14. Ibrahim AA, Mohamed A, Shareef H. A novel quantum-inspired binary gravitational search
algorithm in obtaining optimal power quality monitor placement. J Appl Sci. 2012;12:822–30.
15. Jeong Y-W, Park J-B, Jang S-H, Lee KY. A new quantum-inspired binary PSO: application to
unit commitment problems for power systems. IEEE Trans Power Syst. 2010;25(3):1486–95.
16. Jiao L, Li Y, Gong M, Zhang X. Quantum-inspired immune clonal algorithm for global opti-
mization. IEEE Trans Syst Man Cybern Part B. 2008;38(5):1234–53.
17. Jones JA. Fast searches with nuclear magnetic resonance computers. Science.
1998;280(5361):229.
18. Jones JA, Mosca M, Hansen RH. Implementation of a quantum search algorithm on a quantum
computer. Nature. 1998;393:344–6.
19. Kadowaki T, Nishimori H. Quantum annealing in the transverse Ising model. Phys Rev E.
1998;58:5355–63.
20. Kwiat PG, Mitchell JR, Schwindt PDD, White AG. Grover’s search algorithm: an optical
approach. J Modern Optics. 2000;47:257–66.
21. Liao G. A novel evolutionary algorithm for dynamic economic dispatch with energy saving
and emission reduction in power system integrated wind power. Energy. 2011;36:1018–29.
22. Meng K, Wang HG, Dong ZY, Wong KP. Quantum-inspired particle swarm optimization for
valve-point economic load dispatch. IEEE Trans Power Syst. 2010;25(1):215–22.
23. Montiel O, Rivera A, Sepulveda R. Design and acceleration of a quantum genetic algorithm
through the Matlab GPU library. In: Design of intelligent systems based on fuzzy logic, neural
networks and nature-inspired optimization, vol. 601 of Studies in Computational Intelligence.
Berlin: Springer; 2015. p. 333–345.
24. Narayanan A, Moore M. Quantum-inspired genetic algorithms. In: Proceedings of IEEE inter-
national conference on evolutionary computation, Nagoya, Japan, May 1996. p. 61–66.
25. Nezamabadi-pour H. A quantum-inspired gravitational search algorithm for binary encoded
optimization problems. Eng Appl Artif Intell. 2015;40:62–75.
26. Nielsen MA, Knill E, Laflamme R. Complete quantum teleportation using nuclear magnetic
resonance. Nature. 1998;396:52–5.
27. Platel MD, Schliebs S, Kasabov N. A versatile quantum-inspired evolutionary algorithm. In:
Proceedings of IEEE congress on evolutionary computation (CEC), Singapore, Sept 2007. p.
423–430.
28. Platel MD, Schliebs S, Kasabov N. Quantum-inspired evolutionary algorithm: a multimodel
EDA. IEEE Trans Evol Comput. 2009;13(6):1218–32.
29. Shor PW. Algorithms for quantum computation: discrete logarithms and factoring. In: Pro-
ceedings of the 35th annual symposium on foundations of computer science, Sante Fe, NM,
USA, Nov 1994. pp. 124–134.
30. Shor PW. Polynomial-time algorithms for prime factorization and discrete logarithms on a
quantum computer. SIAM J Comput. 1997;26:1484–509.
31. Soleimanpour-moghadam M, Nezamabadi-pour H, Farsangi MM. A quantum-inspired gravi-
tational search algorithm for numerical function optimization. Inf Sci. 2014;276:83–100.
32. Sun J, Feng B, Xu WB. Particle swarm optimization with particles having quantum behavior.
In: Proceedings of IEEE congress on evolutionary computation (CEC), Portland, OR, USA,
June 2004. p. 325–331.
33. Vandersypen LMK, Steffen M, Breyta G, Yannoni CS, Sherwood MH, Chuang IL. Experi-
mental realization of Shor’s quantum factoring algorithm using nuclear magnetic resonance.
Nature. 2001;414(6866):883–7.
34. Vlachogiannis JG, Ostergaard J. Reactive power and voltage control based on general quantum
genetic algorithms. Expert Syst Appl. 2009;36:6118–26.
35. Yang S, Wang M, Jiao L. A genetic algorithm based on quantum chromosome. In: Proceedings
of the 7th international conference on signal processing, Beijing, China, Aug 2004. p. 1622–
1625.
36. Zhang G, Jin W, Hu L. A novel parallel quantum genetic algorithm. In: Proceedings of the 4th
international conference on parallel and distributed computing, applications and technologies,
Chengdu, China, Aug 2003. p. 693–697.
37. Zhang GX, Rong HN. Real-observation quantum-inspired evolutionary algorithm for a class
of numerical optimization problems. In: Proceedings of the 7th international conference on
computational science, Beijing, China, May 2007, vol. 4490 of Lecture Notes in Computer
Science. Berlin: Springer; 2007. p. 989–996.
18 Metaheuristics Based on Sciences
mostly vibrate in their position in solid phase. The IMO algorithm also mimics these
two phases to perform diversification and intensification during optimization.
Optimization by Optics
Similar to other multiagent methods, ray optimization [29] has a number of particles
consisting of the variables of the problem. These agents are considered as rays of light.
Based on the Snell’s light refraction law, when light travels from a lighter medium to
a darker medium, it refracts and its direction changes. This behavior helps the agents
to explore the search space in early stages of the optimization process and to make
them converge in the final stages.
Optics inspired optimization [28] treats the surface of the numerical function to
be optimized as a reflecting surface in which each peak is assumed to reflect as a
convex mirror and each valley to reflect as a concave one.
Filter machine [24] is an optical model for computation in solving combinatorial
problems. It consists of optical filters as data storage and imaging operation for
computation. Each filter is a long optical sensitive sheet, divided into cells. Filter
machine is able to generate every Boolean function.
Cloud-Based Optimization
Atmosphere clouds model optimization [62] is a stochastic optimization algorithm
inspired by the behavior of clouds in the natural world. It simulates the generation,
moving, and spreading behaviors of clouds in a simple way. The search space is divided
into many disjoint regions according to some rules, and each region has its own
humidity value and air pressure value. There are some rules. (a) Clouds can only be
generated in regions whose humidity values are higher than a certain threshold. (b)
Under wind, clouds move from regions with higher air pressure to regions with lower
air pressure. (c) In the moving process, the droplets of one cloud spread or gather
according to the air pressure difference between the region where the cloud is located
before the move and the region where it is located after the move. (d) A cloud is
regarded as having disappeared when its coverage exceeds a certain value or its droplet
count falls below a threshold. The humidity and air pressure values of a region are
updated every time after the generation, moving,
and spreading behaviors of clouds.
Lightning Search
Lightning search [50] is a metaheuristic method for solving constrained optimiza-
tion problems, which is inspired by the natural phenomenon of lightning and the
mechanism of step leader propagation using the concept of fast particles known as
projectiles. Three projectile types are developed to represent the transition projec-
tiles that create the first step leader population, the space projectiles that attempt to
become the leader, and the lead projectile that represents the projectile fired from
the best-positioned step leader. The major exploration feature of the algorithm is mod-
eled using the exponential random behavior of space projectile and the concurrent
formation of two leader tips at fork points using opposition theory.
Wind Driven Optimization
Wind driven optimization [7,8] is a population-based global optimization algorithm
inspired by atmospheric motion. At its core, a population of infinitesimally small
air parcels navigates over a search space, where the velocity and the position of
wind controlled air parcels are updated following Newton’s second law of motion.
Compared to PSO, wind driven optimization employs additional terms in the velocity
update equation (e.g., gravitation and Coriolis forces).
other stars around it. If a star gets too close to the black hole, it will be swallowed
by the black hole and is gone forever. In such a case, a new star (candidate solution)
is randomly generated and placed in the search space and starts a new search.
Once the stars are initialized and the black hole is designated, the black hole starts
absorbing the stars around it. Therefore, all stars move toward the black hole [26]:
x_i = x_i + rand · (x_BH − x_i), ∀i, i ≠ best. (18.1)
The black hole does not move, because it has the best fitness value and then attracts
all other particles. While moving toward the black hole, a star may reach a location
with a lower cost (a better fitness) than the black hole. Therefore, the black hole is
updated by selecting this star.
If a star crosses the event horizon of the black hole, i.e., if the distance between
a star and the black hole is less than the Schwarzschild radius, this star dies. A new
star is born and it is distributed randomly in the search space. The radius of the event
horizon is calculated by [26]
R = f_BH / Σ_{i=1}^{N} f_i, (18.2)
where f B H is the fitness value of the black hole, f i is the fitness value of the ith star,
and N is the number of stars.
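A minimal sketch of black hole optimization along the lines of Eqs. (18.1) and (18.2) is given below; the objective is a placeholder, and treating the f values as raw costs of a minimization problem is an assumption of this sketch rather than a prescription of [26].

```python
import random, math

def cost(x):                                   # hypothetical objective (minimize)
    return sum(v * v for v in x)

DIM, N, STEPS = 2, 30, 500
stars = [[random.uniform(-5, 5) for _ in range(DIM)] for _ in range(N)]

for _ in range(STEPS):
    costs = [cost(s) for s in stars]
    bh = costs.index(min(costs))               # the best star acts as the black hole
    # Event horizon radius, Eq. (18.2), with f taken as the cost values here.
    R = costs[bh] / sum(costs)
    for i in range(N):
        if i == bh:
            continue
        # Move toward the black hole, Eq. (18.1).
        stars[i] = [xi + random.random() * (xb - xi)
                    for xi, xb in zip(stars[i], stars[bh])]
        # A star crossing the event horizon is replaced by a random new star.
        if math.dist(stars[i], stars[bh]) < R:
            stars[i] = [random.uniform(-5, 5) for _ in range(DIM)]

print(min(cost(s) for s in stars))
```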
Stellar-mass black hole optimization [6] is another metaheuristic technique
inspired from the property of a black hole’s gravity that is present in the Universe. It
outperforms PSO and cuckoo search on the benchmark.
Multi-verse optimizer [38] is a metaheuristic inspired from three concepts in
cosmology: white hole, black hole, and wormhole. The mathematical models of
these three concepts are developed to perform exploration, exploitation, and local
search, respectively.
18.5 Sorting
Computer processing mainly depends on sorting and searching methods. There are
many sorting algorithms, like bubble sort and library sort [9]. Beadsort [4] is a natural
sorting algorithm where the basic operation can be compared to the manner in which
beads slide on parallel poles, such as on an abacus. Rainbow sort [47] is based on the
physical concepts of refraction and dispersion, where light beams of longer wave-
lengths are refracted to a lesser degree than beams of a shorter wavelength. Spaghetti
sort [15] can be illustrated by using uncooked pipes of spaghetti. Centrifugal sort
[41] represents the numbers to be sorted by the density of the liquids. The gravitation
acceleration would be sufficient for sorting. Higher values of acceleration and speed
may speed up the process. Friction-based sorting [16] is to associate to each number
a ball with weight proportional to that number. All the balls fall in the presence of
friction, and the heavier ball corresponding to the greater input number will reach
the ground earlier.
variable population size. All quantities related to energy should have nonnegative
values.
Molecular structure represents the feasible solution of the optimization problem
currently attained by the molecule. Potential energy quantifies the molecular struc-
ture in terms of energy and is modeled as the cost function value of the optimization
problem. Kinetic energy characterizes the degree of the molecule’s activity, indicat-
ing the solution’s ability of jumping out of local optima. Number of hits counts the
number of hits experienced by the molecule. Minimum hit number is recorded at
the hit when the molecule possesses the current best solution. Thus, the difference
between the number of hits and minimum hit number is the number of hits that the
molecule has experienced without finding a better solution. This is also used as the
criterion for decomposition. Minimum value is the cost function value of the solu-
tion generated at the time when the minimum hit number is updated, that is, it is the
minimum potential energy experienced by the molecule itself.
Imagine that there is a closed container with a certain number of molecules.
These molecules collide and undergo elementary reactions, which may modify their
molecular structures and the attained energies. Elementary reactions are operators,
which update the solutions. Through a random sequence of elementary reactions,
the algorithm explores the solution space and converges to the global minimum.
Chemical reactions occur due to the formation and breaking of chemical bonds
that is produced by the motion of electrons of the molecules. Four types of elementary
reactions are on-wall ineffective collision, decomposition, intermolecular ineffective
collision, and synthesis. Through a random sequence of elementary reactions, CRO
explores the solution space and converges to the global minimum. The two ineffective
collisions modify the molecules to new molecular structures that are close to the
original ones, thus enabling the molecules to search their immediate surroundings on
the potential energy space (solution space). Conversely, decomposition and synthesis
tend to produce new molecular structures. Among the four collisions, local search is
contributed by the on-wall ineffective collision and the intermolecular ineffective collision,
whereas global search is intensified by decomposition and synthesis.
In initialization, a population of molecules is randomly generated, their potential
energies are determined, and they are assigned proper initial kinetic energies.
Then, CRO enters into the stage of iterations. The manipulated agents are mole-
cules and the events for manipulating the solutions represented by the molecules
are classified into four elementary reactions. In each iteration, the collision is first
identified as unimolecular or intermolecular, and only one elementary reaction takes
place, depending on the conditions of the chosen molecules for that iteration. The
algorithm then checks whether any new solution superior
to the best-so-far solution is found. If so, the solution will be kept in memory. This
iteration stage repeats until a stopping criterion is satisfied. Finally, the solution with
the lowest cost function value is output.
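The control flow described above can be outlined as in the following sketch. Only the population-level logic is kept (choose a unimolecular or an intermolecular event, apply a reaction operator, track the best-so-far solution); the reaction operators are simplified stand-ins (Gaussian perturbations and averaging) and do not implement the kinetic/potential energy bookkeeping of [31].

```python
import random

def cost(x):                                       # hypothetical objective (minimize)
    return sum(v * v for v in x)

def perturb(x, scale):
    return [v + random.gauss(0, scale) for v in x]

DIM, POP, STEPS = 3, 20, 1000
molecules = [[random.uniform(-5, 5) for _ in range(DIM)] for _ in range(POP)]
best = min(molecules, key=cost)

for _ in range(STEPS):
    if random.random() < 0.5:                      # unimolecular collision
        i = random.randrange(POP)
        if random.random() < 0.8:                  # on-wall ineffective collision
            cand = perturb(molecules[i], 0.1)      # local move
        else:                                      # decomposition (global move)
            cand = perturb(molecules[i], 1.0)
        if cost(cand) < cost(molecules[i]):
            molecules[i] = cand
    else:                                          # intermolecular collision
        i, j = random.sample(range(POP), 2)
        if random.random() < 0.8:                  # intermolecular ineffective collision
            ci, cj = perturb(molecules[i], 0.1), perturb(molecules[j], 0.1)
        else:                                      # synthesis: combine two molecules
            ci = [(a + b) / 2 for a, b in zip(molecules[i], molecules[j])]
            cj = perturb(ci, 1.0)
        if cost(ci) < cost(molecules[i]):
            molecules[i] = ci
        if cost(cj) < cost(molecules[j]):
            molecules[j] = cj
    best = min(molecules + [best], key=cost)       # keep the best-so-far solution

print(cost(best))
```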
In [32], some convergence results are presented for several generic versions of
CRO, each adopting different combinations of elementary reactions. By modeling
CRO as a finite absorbing Markov chain, CRO is shown to converge to a global
optimum solution with a probability arbitrarily close to one, when time tends to
nearly saturated with species. Islands with low habitability have a high species immi-
gration rate. Immigration of new species to islands might raise the habitability of
those islands because habitability is proportional to biological diversity.
Biogeography-based optimization (BBO) (MATLAB code, http://academic.csuohio.edu/simond/bbo)
[51,53] is a population-based stochastic global optimization algorithm based on
biogeography theory. In BBO, a set of solutions is called an archipelago,
a solution is called a habitat (island) with a habitat suitability index (HSI) as the fit-
ness of the solution, and a solution feature is called a species. BBO adopts a migration
operator to share information between solutions. It maintains its set of solutions from
one iteration to the next one.
BBO has migration and mutation operators. As with every other EA, mutation and
elitism might also be incorporated. Each individual has its own immigration rate λi
and emigration rate μi , which are functions of its fitness. A good solution has higher
μi and lower λi , and vice versa. In a linear model of species richness (as illustrated
in Figure 18.1), a habitat’s immigration rate λi and emigration rate μi are calculated
based on its fitness f i by
λ_i = I (f_max − f_i)/(f_max − f_min),   μ_i = E (f_i − f_min)/(f_max − f_min),   (18.3)
where f_max and f_min are, respectively, the maximum and minimum fitness values
among the population, and I and E are, respectively, the maximum possible immigration
rate and emigration rate. That is, with the increase of HSI f_i, λ_i linearly
decreases from I to 0, while μ_i linearly increases from 0 to E.
The probability of immigrating to x_k is λ_k, and that of emigrating from x_k is based
on roulette-wheel selection, μ_k / Σ_{j=1}^{N} μ_j, where N is the population size.
For each habitat i, a species count probability P_i computed from λ_i and μ_i indicates
the likelihood that the habitat was expected a priori as a solution. Mutation is
a probabilistic operator that randomly modifies a decision variable of a candidate
solution to increase diversity among the population. The mutation rate of habitat i is
inversely proportional to its probability: p_{m,i} = p_{m,max} (1 − P_i / P_max), where p_{m,max} is
a control parameter and P_max is the maximum habitat probability in the population.
The BBO flowchart is given in Algorithm 18.1.
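Since Algorithm 18.1 is not reproduced here, the following sketch shows how the migration rates of Eq. (18.3) can drive one BBO generation. It is an illustration under assumptions: the species-count probability P_i is replaced by the immigration rate λ_i as a crude surrogate for the mutation rate (so that poor habitats mutate more), and the objective and parameters are placeholders, not the settings of [51].

```python
import random

def cost(x):                                   # hypothetical objective (minimize)
    return sum(v * v for v in x)

DIM, N, GENS = 4, 20, 200
I_rate, E_rate, pm_max = 1.0, 1.0, 0.1
pop = [[random.uniform(-5, 5) for _ in range(DIM)] for _ in range(N)]

for _ in range(GENS):
    fits = [-cost(h) for h in pop]             # HSI: higher is better
    fmin, fmax = min(fits), max(fits)
    span = (fmax - fmin) or 1.0
    lam = [I_rate * (fmax - f) / span for f in fits]   # immigration, Eq. (18.3)
    mu = [E_rate * (f - fmin) / span for f in fits]    # emigration, Eq. (18.3)
    new_pop = []
    for k in range(N):
        habitat = list(pop[k])
        for d in range(DIM):
            if random.random() < lam[k]:       # immigrate this feature
                # Roulette-wheel choice of the emigrating habitat by mu.
                src = random.choices(range(N), weights=mu)[0] if sum(mu) else k
                habitat[d] = pop[src][d]
            # Surrogate mutation rate: poorer habitats mutate more often.
            if random.random() < pm_max * lam[k]:
                habitat[d] = random.uniform(-5, 5)
        new_pop.append(habitat)
    # Simple elitism: keep the best habitat from the previous generation.
    best_prev = min(pop, key=cost)
    new_pop[max(range(N), key=lambda i: cost(new_pop[i]))] = best_prev
    pop = new_pop

print(min(cost(h) for h in pop))
```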
Figure 18.2 The evolution of a random run of BBO for Easom function: the minimum and average
objectives.
the convergence of BBO. Instead of opposite numbers, they use quasi-reflected num-
bers for population initialization and also for generation jumping.
Example 18.1: The Easom function is treated in Examples 2.1, 3.4, and 5.2. Here
we solve this same problem by using BBO. The global minimum value is −1 at
x = (π, π )T .
We implement BBO on this problem by setting the number of habitats (population
size) as 50, the maximum number of iterations as 100, a keep rate of 0.2, α = 0.9,
p_m = 0.1, and σ = 4, and by selecting the initial population randomly from the entire domain.
For a random run, we have f (x) = −1.0000 at (3.1416, 3.1416) with 4010 func-
tion evaluations. All the individuals converge toward the global optimum. The evolu-
tion of the search is illustrated in Figure 18.2. For 10 random runs, the solver always
converged to the global optimum within 100 generations.
theory has been used to develop global optimization techniques. Chaos has also
been widely integrated with metaheuristic algorithms.
Chaos optimization algorithms [42] are population-based metaheuristics based
on the use of pseudorandom numerical sequences generated by means of a chaotic
map. Chaos optimization can carry out overall searches at higher speeds and escape
from local minima more easily than stochastic ergodic searches that depend on the
probabilities [42]. The parallel chaos optimization algorithm proposed in [63] uses
migration and merging operations to achieve a good balance between exploration
and exploitation.
Sine cosine algorithm (http://www.alimirjalili.com/SCA.html) [39] is a
population-based metaheuristic optimization algorithm. It creates multiple initial
random solutions and requires them to fluctuate outwards or toward the best solution
using a mathematical model based on sine and cosine functions. Several random and
adaptive variables are integrated to enable exploration and exploitation of the search
space.
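As an illustration of such a sine/cosine position update, the sketch below uses the commonly cited rule x ← x + r1 sin(r2)|r3 p − x| (or the cosine counterpart), where p is the best solution found so far and r1 decays linearly over the run; the objective function and parameter values are assumptions for illustration.

```python
import random, math

def cost(x):                                    # hypothetical objective (minimize)
    return sum(v * v for v in x)

DIM, N, T, a = 3, 20, 500, 2.0
pop = [[random.uniform(-5, 5) for _ in range(DIM)] for _ in range(N)]
best = min(pop, key=cost)

for t in range(T):
    r1 = a - t * a / T                          # decays to shift from exploration
                                                # toward exploitation
    for i in range(N):
        for d in range(DIM):
            r2 = random.uniform(0, 2 * math.pi)
            r3 = random.uniform(0, 2)
            step = abs(r3 * best[d] - pop[i][d])
            if random.random() < 0.5:           # fluctuate with sine ...
                pop[i][d] += r1 * math.sin(r2) * step
            else:                               # ... or with cosine
                pop[i][d] += r1 * math.cos(r2) * step
    best = min(pop + [best], key=cost)

print(cost(best))
```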
The concept of opposition-based learning [58] has been utilized in a wide range of
learning and optimization fields. A mathematical proof shows that in terms of conver-
gence speed, utilizing random numbers and their opposites is more beneficial than
using pure randomness to generate initial estimates without prior knowledge
about the solution of a continuous-domain optimization problem [44]. It is mathemat-
ically proven in [48] that opposition-based learning performs well in binary spaces.
The proposed binary opposition-based scheme can be embedded inside many binary
population-based algorithms. Opposition-based learning is applied to accelerate the
convergence rate of binary gravitational search algorithm [48].
The opposition-based strategy in optimization algorithms uses the concept of
opposition-based learning [58]. In EAs, opposition-based learning is implemented
by comparing the fitness of an individual to that of its opposite and retaining the fitter
one in the population. Opposition-based learning is an effective method to enhance various
optimization techniques.
Definition 18.2 (Opposite number in the binary domain). Let x ∈ {0, 1}. The opposite
number x̃ is defined by x̃ = 1 − x. For a vector x in binary space, for each dimension
x_i, the corresponding dimension of the opposite point is given by x̃_i = 1 − x_i.
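A minimal sketch of opposition-based initialization under Definition 18.2 (the fitness function and population size are placeholders): each random binary vector is evaluated together with its opposite, and the fitter of the two is kept.

```python
import random

def fitness(bits):                       # hypothetical fitness: maximize the ones
    return sum(bits)

def opposite(bits):
    return [1 - b for b in bits]         # Definition 18.2: x_i -> 1 - x_i

N, DIM = 10, 16
population = []
for _ in range(N):
    x = [random.randint(0, 1) for _ in range(DIM)]
    # Keep the fitter of the individual and its opposite.
    population.append(max(x, opposite(x), key=fitness))

print(sorted(fitness(p) for p in population))
```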
Problems
References
1. Acan A, Unveren A. A two-stage memory powered great deluge algorithm for global optimiza-
tion. Soft Comput. 2015;19:2565–85.
2. Acan A, Unveren A. A great deluge and tabu search hybrid with two-stage memory support
for quadratic assignment problem. Appl Soft Comput. 2015;36:185–203.
3. Alatas B. ACROA: artificial chemical reaction optimization algorithm for global optimization.
Expert Syst Appl. 2011;38:13170–80.
4. Arulanandham JJ, Calude C, Dinneen MJ. Bead-sort: a natural sorting algorithm. Bull Eur
Assoc Theor Comput Sci. 2002;76:153–61.
5. Astudillo L, Melin P, Castillo O. Introduction to an optimization algorithm based on the chem-
ical reactions. Inf Sci. 2015;291:85–95.
6. Balamurugan R, Natarajan AM, Premalatha K. Stellar-mass black hole optimization for biclus-
tering microarray gene expression data. Appl Artif Intell. 2015;29:353–81.
7. Bayraktar Z, Komurcu M, Werner DH. Wind driven optimization (WDO): a novel nature-
inspired optimization algorithm and its application to electromagnetics. In: Proceedings of
IEEE antennas and propagation society international symposium (APSURSI), Toronto, ON,
Canada, July 2010. p. 1–4.
8. Bayraktar Z, Komurcu M, Bossard JA, Werner DH. The wind driven optimization technique
and its application in electromagnetics. IEEE Trans Antennas Propag. 2013;61(5):2745–57.
9. Bender MA, Farach-Colton M, Mosteiro MA. Insertion sort is O(n log n). Theory Comput
Syst. 2006;39(3):391–7.
10. Bhattacharya A, Chattopadhyay P. Solution of economic power dispatch problems using oppo-
sitional biogeography-based optimization. Electr Power Compon Syst. 2010;38:1139–60.
11. Birbil SI, Fang S-C. An electromagnetism-like mechanism for global optimization. J Global
Optim. 2003;25(3):263–82.
12. Chao M, Sun ZX, Liu SM. Neural network ensembles based on copula methods and
distributed multiobjective central force optimization algorithm. Eng Appl Artif Intell.
2014;32:203–12.
13. Chen H-L, Doty D, Soloveichik D. Deterministic function computation with chemical reaction
networks. Nat Comput. 2014;13:517–34.
14. Cuevas E, Echavarria A, Ramirez-Ortegon MA. An optimization algorithm inspired by the
states of matter that improves the balance between exploration and exploitation. Appl Intell.
2014;40:256–72.
15. Dewdney AK. On the spaghetti computer and other analog gadgets for problem solving. Sci
Am. 1984;250(6):19–26.
16. Diosan L, Oltean M. Friction-based sorting. Nat Comput. 2011;10:527–39.
17. Dogan B, Olmez T. A new metaheuristic for numerical function optimization: vortex search
algorithm. Inf Sci. 2015;293:125–45.
18. Doty D, Hajiaghayi M. Leaderless deterministic chemical reaction networks. Nat Comput.
2015;14:213–23.
19. Dueck G. New optimization heuristics: the great deluge algorithm and the record-to-record
travel. J Comput Phys. 1993;104:86–92.
20. Ergezer M, Simon D, Du D. Oppositional biogeography-based optimization. In: Proceedings of
IEEE conference on systems, man, and cybernetics, San Antonio, Texas, 2009. p. 1035–1040.
21. Erol OK, Eksin I. A new optimization method: big bang big crunch. Adv Eng Softw.
2006;37(2):106–11.
22. Eskandar H, Sadollah A, Bahreininejad A, Hamdi M. Water cycle algorithm—a novel meta-
heuristic optimization method for solving constrained engineering optimization problems.
Comput Struct. 2012;110:151–60.
23. Formato RA. Central force optimization: a new metaheuristic with application in applied elec-
tromagnetics. Prog Electromagn Res. 2007;77:425–91.
24. Goliaei S, Jalili S. Computation with optical sensitive sheets. Nat Comput. 2015;14:437–50.
25. Gong W, Cai Z, Ling CX. DE/BBO: a hybrid differential evolution with biogeography-based
optimization for global numerical optimization. Soft Comput. 2010;15:645–65.
26. Hatamlou A. Black hole: a new heuristic optimization approach for data clustering. Inf Sci.
2013;222:175–84.
27. Javidy B, Hatamlou A, Mirjalili S. Ions motion algorithm for solving optimization problems.
Appl Soft Comput. 2015;32:72–9.
28. Kashan AH. A New metaheuristic for optimization: optics inspired optimization (OIO). Tech-
nical Report, Department of Industrial Engineering, Tarbiat Modares University. 2013.
29. Kaveh A, Khayatazad M. A new meta-heuristic method: ray optimization. Comput Struct.
2012;112:283–94.
30. Kaveh A, Talatahari S. A novel heuristic optimization method: charged system search. Acta
Mech. 2010;213:267–89.
31. Lam AYS, Li VOK. Chemical-reaction-inspired metaheuristic for optimization. IEEE Trans
Evol Comput. 2010;14(3):381–99.
32. Lam AYS, Li VOK, Xu J. On the convergence of chemical reaction optimization for combina-
torial optimization. IEEE Trans Evol Comput. 2013;17(5):605–20.
33. Lam AYS, Li VOK, Yu JJQ. Real-coded chemical reaction optimization. IEEE Trans Evol
Comput. 2012;16(3):339–53.
34. Lomolino M, Riddle B, Brown J. Biogeography. 3rd ed. Sunderland, MA: Sinauer Associates;
2009.
35. MacArthur R, Wilson E. The theory of biogeography. Princeton, NJ: Princeton University;
1967.
36. Mehdizadeh E, Tavakkoli-Moghaddam R, Yazdani M. A vibration damping optimization algo-
rithm for a parallel machines scheduling problem with sequence-independent family setup
times. Appl Math Modell. 2016. in press.
61. Xu J, Lam AYS, Li VOK. Chemical reaction optimization for task scheduling in grid computing.
IEEE Trans Parallel Distrib Syst. 2011;22(10):1624–31.
62. Yan G-W, Hao Z-J. A novel optimization algorithm based on atmosphere clouds model. Int J
Comput Intell Appl 12:1;2013: article no. 1350002, 16 pp.
63. Yuan X, Zhang T, Xiang Y, Dai X. Parallel chaos optimization algorithm with migration and
merging operation. Appl Soft Comput. 2015;35:591–604.
64. Zhou Y, Wang Y, Chen X, Zhang L, Wu K. A novel path planning algorithm based on plant
growth mechanism. Soft Comput. 2016. p. 1–11. doi:10.1007/s00500-016-2045-x.
Memetic Algorithms
19
The term meme was coined by Dawkins in 1976 in his book The Selfish Gene [7].
The sociological definition of a meme is the basic unit of cultural transmission or
imitation. A meme is the social analog of genes for individuals. Universal Darwinism
draws the analogy on the role of genes in genetic evolution to that of memes in a
cultural evolutionary process [7]. The science of memetics [3] represents the mind-
universe analog to genetics in cultural evolution, spanning the fields of anthropology,
biology, cognition, psychology, sociology, and sociobiology. This chapter is dedicated
to memetic and cultural algorithms.
19.1 Introduction
The meme is a unit of intellectual or cultural information that can pass from mind to
mind when people exchange ideas. As genes propagate in the gene pool via sperm
or eggs, memes propagate in the meme pool by spreading from brain to brain via a
process called imitation. Unlike genes, memes are typically adapted by the people
who transmit them before being passed on; that is, a meme undergoes a lifetime learning
procedure capable of generating refinements in individuals.
Like genes, which serve as building blocks in genetics, memes are building blocks of
meaningful information that is transmissible and replicable. Memes can be thought
of as schemata that are modified and passed on over a learning process; schemata are
passable just as behaviors or thoughts are passed on as memes.
The typical memetic algorithm uses an additional mechanism to modify schemata
during an individual’s lifetime, taken as the period of evaluation from the point of
view of GA, and that refinement can be passed on to an individual’s offspring.
Memetic computation is a computational paradigm that encompasses the con-
struction of a comprehensive set of memes. It involves the additional dimension of
cultural evolution through memetic transmission, selection, replication, imitation, or variation.
19.2 Cultural Algorithms

EAs easily fall into premature convergence because implicit information and
domain knowledge are not fully exploited. Cultural algorithms [30] are motivated by the
process of human cultural evolution, and they can effectively improve evolutionary performance
[6,28,31].
Cultural algorithms [30] are a computational framework consisting of two differ-
ent spaces: population space and belief space. Selected experiences of the successful
agents during the population evolution will produce knowledge that can be commu-
nicated to the belief space, where it gets manipulated and used to affect the evolution
process of the population. The interaction between both spaces yields a dual inheri-
tance structure in which the evolution of the agents and the evolved beliefs take place
in parallel, in a way similar to the evolution of human cultures. Figure 19.1 presents
the components of a cultural algorithm.
Figure 19.1 Components of a cultural algorithm: the belief space and the population space, connected by the Accept() and Influence() functions, with Reproduce() and Performance() operating on the population space.
The population space comprises a set of possible solutions to the problem, and can
be modeled using any population-based approach. Inspired by culture as information
storage in the society, the belief space is information which does not depend on the
individuals who generated it and can be accessed by all members in the population
space [31]. In belief space, implicit knowledge is extracted from better individuals
in the population. This is utilized to direct the evolution in the population space
to escape from local optima. The two spaces first evolve separately, and then
exchange experience through the accept and influence operations: individuals in
the population space contribute their experience to the belief space using the accept
operation, and the belief space influences the individuals in the population space using
the influence operation. The two spaces can be modeled using any swarm-based computing
model.
Five basic categories of knowledge are stored in the belief space: situational,
normative, topographical, domain, and history knowledge [6,28].
Algorithm 19.1 (Cultural Algorithm).
1. Initialize t = 0.
   Initialize the population space $P_s^t$ and the belief space $B_s^t$.
   Initialize all the individuals in the two spaces randomly.
   Evaluate the fitness of each individual.
2. Repeat:
   a. Update the individuals in the two spaces according to their own rules and evaluate the fitness value of each individual in $P_s^t$.
   b. Update the belief space by the accept operation: $B_s^t = \mathrm{evolve}(B_s^t, \mathrm{accept}(P_s^t))$.
   c. Update the population space by the influence operation: $P_s^t = \mathrm{create}(P_s^t, \mathrm{influence}(B_s^t))$.
   d. Set t = t + 1.
   e. Choose $P_s^t$ from $P_s^{t-1}$.
   until the stopping criterion is satisfied.
Domain knowledge and history knowledge are useful for dynamic landscape problems [28].
The process of the cultural algorithm is described in Algorithm 19.1. Individuals are
assessed with the performance function. The accept function then selects the best
agents in the population space to inform, and thereby update, the belief space.
Knowledge in the belief space is then allowed to enhance the individuals selected
for the next generation through the influence operation. The algorithm repeats
this process iteratively until the stopping condition is reached.
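To make the flow of Algorithm 19.1 concrete, the following minimal Python sketch (distinct from the accompanying MATLAB code) implements a cultural algorithm for continuous minimization. The use of only situational and normative knowledge, the greedy replacement rule, and all parameter values are illustrative assumptions rather than the settings of any particular reference.

```python
import numpy as np

def cultural_algorithm(f, dim, bounds, pop_size=50, n_accept=10, iters=100, seed=0):
    # Minimal sketch: population space plus a belief space holding situational
    # (best solution) and normative (promising interval) knowledge.
    rng = np.random.default_rng(seed)
    low, high = bounds
    pop = rng.uniform(low, high, (pop_size, dim))               # population space
    belief = {"best": None, "best_f": np.inf,                   # situational knowledge
              "low": np.full(dim, float(low)),                  # normative knowledge
              "high": np.full(dim, float(high))}
    for _ in range(iters):
        fit = np.apply_along_axis(f, 1, pop)
        # accept(): the best individuals update the belief space
        elite = pop[np.argsort(fit)[:n_accept]]
        if fit.min() < belief["best_f"]:
            belief["best"], belief["best_f"] = pop[np.argmin(fit)].copy(), fit.min()
        belief["low"], belief["high"] = elite.min(axis=0), elite.max(axis=0)
        # influence(): create children inside (and slightly around) the promising region
        span = np.maximum(belief["high"] - belief["low"], 1e-12)
        children = belief["low"] + rng.random((pop_size, dim)) * span
        children += 0.1 * span * rng.standard_normal((pop_size, dim))
        children = np.clip(children, low, high)
        # selection: greedily replace each parent by its child if the child is better
        child_fit = np.apply_along_axis(f, 1, children)
        improved = child_fit < fit
        pop[improved] = children[improved]
    return belief["best"], belief["best_f"]

# usage: minimize the 2-D sphere function on [-5, 5]^2
print(cultural_algorithm(lambda x: float(np.sum(x**2)), dim=2, bounds=(-5.0, 5.0)))
```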
Crossover operators are often used, but they have no biological analogy; instead, they mimic
the obsequious and rebellious behaviors found in cultural systems. The problem-solving
experience of individuals selected from the population space is used to generate
problem-solving knowledge in the belief space. This knowledge can control the
evolution of individuals by means of an influence function that may modify any aspect
of the individuals.
The multi-population cultural algorithm [8] adopts individual migration. Only the best
solutions from each subpopulation are exchanged, according to given migration rules;
implicit knowledge extracted from a subpopulation is not used. A
method proposed in [1] divides the population into subpopulations based on fuzzy clustering and
performs cultural exchange among the subpopulations.
In [2], a cultural algorithm uses DE in the population space. The belief space uses
different knowledge sources to influence the variation operator of DE in order to
reduce the number of fitness evaluations.
19.3 Memetic Algorithms

Motivated by the evolution of ideas, the memetic algorithm [23,24], also called genetic
local search, is another cultural-algorithm framework based upon cultural evolution
that exhibits local refinement. It is a dual inheritance system consisting of
a social population and a belief space, and it models the evolution of culture or ideas.
The owner of an idea can improve upon it by incorporating local search.
Memetic algorithm was inspired by both the neo-Darwinian paradigm and
Dawkins’ notion of a meme defined as a unit of cultural evolution that is capa-
ble of local refinements. Evolution and learning are combined using the Lamarckian
strategy. A memetic algorithm can be considered an EA with local search: it combines
the evolutionary adaptation of a population with individual learning by its members.
A memetic algorithm is considerably faster than a simple GA.
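As a concrete illustration of the EA-plus-local-search view, the following Python sketch couples a simple real-coded GA with a hill-climbing meme applied to every offspring in the Lamarckian fashion, i.e., the refined genotype is written back. The operators, the truncation selection, and all parameter values are illustrative choices, not the canonical memetic algorithm of [23,24].

```python
import numpy as np

def local_search(f, x, step=0.1, tries=20, rng=None):
    # Simple hill climber used as the lifetime-learning (meme) step.
    rng = rng or np.random.default_rng()
    fx = f(x)
    for _ in range(tries):
        y = x + step * rng.standard_normal(x.size)
        fy = f(y)
        if fy < fx:
            x, fx = y, fy          # Lamarckian: the improvement is kept in the genotype
    return x, fx

def memetic_algorithm(f, dim, bounds, pop_size=30, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    low, high = bounds
    pop = rng.uniform(low, high, (pop_size, dim))
    for _ in range(iters):
        fit = np.apply_along_axis(f, 1, pop)
        parents = pop[np.argsort(fit)[:pop_size // 2]]            # truncation selection
        # arithmetic crossover followed by Gaussian mutation
        mates = parents[rng.permutation(len(parents))]
        alpha = rng.random((len(parents), 1))
        children = alpha * parents + (1 - alpha) * mates
        children += 0.05 * (high - low) * rng.standard_normal(children.shape)
        children = np.clip(children, low, high)
        # individual learning (local refinement) applied to each child
        refined = [local_search(f, c, rng=rng)[0] for c in children]
        pop = np.vstack([parents, np.array(refined)])
    fit = np.apply_along_axis(f, 1, pop)
    return pop[np.argmin(fit)], fit.min()

print(memetic_algorithm(lambda x: float(np.sum(x**2)), dim=5, bounds=(-5.0, 5.0)))
```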
Though encompassing characteristics of cultural evolution in the form of local
refinement in the search cycle, memetic algorithm is not a true evolving system
according to universal Darwinism, since the principles of inheritance/memetic trans-
mission, variation and selection are missing.
In [25], a probabilistic memetic framework is presented that governs memetic algorithms as
a process of deciding whether evolution or individual learning should be favored, and the
probability of each process locating the global optimum is analyzed. The framework
balances evolution and individual learning by governing the learning intensity of
each individual according to a theoretical upper bound derived during the search process.
Another class of memetic algorithms exhibits the principles of memetic trans-
mission and selection in their design. In multi-meme memetic algorithm [16], the
memetic material is encoded as part of the genotype. Subsequently, the decoded
meme of each respective individual is then used to perform a local refinement. The
memetic material is then transmitted through a simple inheritance mechanism from
parent to offspring. In the hyper-heuristic [14] and the meta-Lamarckian memetic algorithm
[27], the pool of candidate memes compete, based on their past merit in generating
local improvements, through a reward mechanism that decides which meme is selected
for future local refinements.
In coevolution and self-generation memetic algorithms [17,32], all three principles
satisfying the definition of a basic evolving system have been considered. A
rule-based representation of local search is co-adapted alongside candidate solutions
within the evolutionary system, thus capturing regular repeated features or patterns
in the problem space.
By combining a cellular GA with a random-walk local search [11], a better conver-
gence rate is achieved on satisfiability problems. For the cellular memetic algorithm
[12], adaptive mechanisms that tailor the amount of exploration versus exploita-
tion of local solutions are carried out. A memetic version of DE, called memDE
[26], applies crossover-based local search, called fittest individual refinement, for
exploring the neighborhood of the best solution in each generation for enhanced
convergence speed and robustness. Evolutionary gradient search [34] adapts gradient
search into an evolutionary mechanism. The bacterial memetic algorithm [4] is a
memetic algorithm based on the bacterial approach. An intense continuous
local search is proposed in the framework of memetic algorithms [22].
Real-coded memetic algorithm [18] applies a crossover hill-climbing to solutions
produced by the genetic operators. Crossover hill-climbing exploits the self-adaptive
capacity of real-parameter crossover operators with the aim of producing an effective
local tuning on the solutions. The algorithm employs an adaptive mechanism that
determines the probability with which every solution should receive the application
of crossover hill-climbing.
In [36], greedy crossover-based hill-climbing and steepest mutation-based hill-
climbing are used as an adaptive hill-climbing strategy within the framework of
memetic algorithms for solving dynamic optimization problems.
In memetic algorithms, local search is used to search around the most promis-
ing solutions. As the local region extension increases with the dimensionality,
high-dimensional problems require a high number of evaluations during each local
search process, called local search intensity. MA-SW-Chains [21], the winner of
the CEC’2010 competition, is a memetic algorithm for large scale global optimiza-
tion. It combines a steady-state GA with the Solis–Wets local search method. MA-
SW-Chains introduces the concept of local search chains to adapt the local search
intensity assigned to the local search method, by exploiting with higher intensity the
most promising individuals. It assigns to each individual a local search intensity that
depends on its features, by chaining different local search applications. MA-SW-
Chains adapts the local search intensity by applying the local search several times
over the same individual, with a fixed local search intensity, and storing its final
parameters, creating local search chains [21]. MA-SW-Chains uses a relatively small
population and iteratively improves the current best solution.
A diversity-based adaptive local search strategy based on parameterized Gaussian
distribution [35] is integrated into the framework of the parallel memetic algorithm
to address large scale COPs.
The concept of generation does not exist in global simplex search; this allows for a smooth decrease
of the population from an initial size to a final one.
Global simplex optimization [13] is a population-based EA incorporating a spe-
cial multistage, stochastic and weighted version of the reflection operator of classical
Nelder–Mead simplex method for minimization of continuous multimodal functions.
The method incorporates a weighted stochastic recombination operator inspired from
the reflection and expansion operators of the simplex method, but no mutation oper-
ator.
19.4 Application: Searching Low Autocorrelation Sequences

Binary sequences with low aperiodic autocorrelation levels, defined in terms of the
peak sidelobe level and/or the merit factor, have many important engineering applications,
such as radar, sonar, channel synchronization and tracking, spread spectrum
communications, system identification, and cryptography. Searching for low auto-
correlation binary sequences (LABS) is a notorious combinatorial problem.
For a binary sequence of length L, $a = a_1 a_2 \ldots a_L$ with $a_i \in \{-1, +1\}$ for all i,
its autocorrelation function is given by
$$C_k(a) = \sum_{i=1}^{L-k} a_i a_{i+k}, \quad k = 0, \pm 1, \ldots, \pm(L-1). \qquad (19.1)$$
For k = 0, the value of the autocorrelation function equals L and is called the peak,
and for k ≠ 0, the values of the autocorrelation function are called the sidelobes.
The peak sidelobe level (PSL) of a binary sequence a of length L is defined as
$$PSL(a) = \max_{k=1,\ldots,L-1} |C_k(a)|. \qquad (19.2)$$
The minimum peak sidelobe (MPS) over all possible binary sequences of length L
is defined as
$$MPS(L) = \min_{a \in \{-1,+1\}^L} PSL(a). \qquad (19.3)$$
Our EA for the LABS problem integrates the key features of GA, ES and memetic
algorithms. Binary coding is a natural coding scheme for this problem. Each chromo-
some is encoded by a string. The EA design incorporates several features, including
a (λ + μ) ES-like scheme, two-point mutation, a bit-climber used as a local search
operator, partial population restart, and a fast scheme for calculating the autocorrelations.
The crossover operation is not applied. The algorithm can efficiently discover long
LABS of lengths up to several thousand.
Represent the binary sequences as ±1-valued bit strings, i.e., each $a_i \in \{-1,+1\}$. The evaluation of the
fitness function takes $O(L^2)$ operations for calculating the $C_k(a)$'s. For the bit-climber,
when bit $a_i$ is flipped, the new value $C_k(a')$ can be calculated from the previous value $C_k(a)$ by the
update equation
$$C_k(a') = \begin{cases} C_k(a) - 2 a_i a_{i+k}, & 1 \le i \le k \text{ and } i \le L-k;\\ C_k(a) - 2 a_i (a_{i-k} + a_{i+k}), & k+1 \le i \le L-k;\\ C_k(a) - 2 a_{i-k} a_i, & L-k+1 \le i \le L \text{ and } i \ge k+1;\\ C_k(a), & \text{otherwise}, \end{cases} \qquad (19.5)$$
where $a'$ denotes the sequence after flipping $a_i$, and the $a_i$'s on the right-hand side take their values before the flip.
This reduces the complexity for updating all Ck (a)’s to O(L). The resultant saving
is significant, especially because each mutated or randomly generated individual
is subject to L bit flips and fitness evaluations. For example, compared to direct
calculation of Ck ’s, the computing time of the EA is reduced by a factor of 4 when
calculating Ck ’s for L = 31 by (19.5).
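The following Python sketch illustrates the incremental scheme (0-based indexing; the helper names are illustrative). The direct O(L²) computation of (19.1) is used only to verify the O(L) update of (19.5) after a single bit flip, and the merit factor is computed with its usual definition F = L²/(2 Σₖ Cₖ²).

```python
import numpy as np

def autocorrelations(a):
    """Direct O(L^2) computation of C_k(a) for k = 1, ..., L-1 (Eq. 19.1)."""
    L = len(a)
    return np.array([np.dot(a[:L - k], a[k:]) for k in range(1, L)])

def flip_update(a, C, i):
    """Flip bit a[i] and update all C_k in O(L) following Eq. (19.5)."""
    L = len(a)
    for k in range(1, L):            # C[k-1] stores C_k
        delta = 0
        if i + k < L:                # the term a_i * a_{i+k} exists
            delta += a[i] * a[i + k]
        if i - k >= 0:               # the term a_{i-k} * a_i exists
            delta += a[i - k] * a[i]
        C[k - 1] -= 2 * delta
    a[i] = -a[i]
    return a, C

# consistency check on a random length-31 sequence
rng = np.random.default_rng(0)
a = rng.choice([-1, 1], size=31)
C = autocorrelations(a)
a, C = flip_update(a, C, 7)
assert np.array_equal(C, autocorrelations(a))
print("PSL =", np.max(np.abs(C)))                       # Eq. (19.2)
E = np.sum(C.astype(float) ** 2)
print("merit factor F =", len(a) ** 2 / (2 * E))        # F = L^2 / (2 * sum C_k^2)
```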
In addition to PSL and merit factor, another fitness function is defined by
$$f(a) = \frac{F(a)}{PSL(a)}. \qquad (19.6)$$
The results for using several fitness functions were compared in terms of both PSL
and merit factor in [9].
Denote the number of children by $N_O$, the number of generations for each restart by $G_{RS}$,
the maximal number of generations by $G_{max}$, and the population size for partial restart by $N_{RS}$.
Before applying the algorithm to find long LABS with low PSL, we first
address the problem of which fitness function is most suitable for the task at hand.
We set $N_P = 4L$, $N_O = 20L$, $G_{RS} = 5$, $G_{max} = 100$, $N_{RS} = 10L$. The fitness functions
PSL, F, and f are evaluated over 5 random runs of the EA on a Linux system
with an Intel Core 2 Duo processor. When PSL is selected as the fitness function, the
F performance is the poorest. In contrast, when F is selected as the fitness function,
the PSL performance is the poorest. A better tradeoff is achieved by the fitness function
f; in particular, f achieves the best tradeoff between the achieved PSL and F [9].
For each length, we implemented 3 random runs of our program, and the best
result was retained. To reduce the computing time, the population and children
sizes for longer lengths are decreased. For L = 300 to 1000, we set $N_P = L$,
$N_O = 2L$, $G_{RS} = 5$, $G_{max} = 200$, $N_{RS} = L$. When L > 1000, we set $N_P = N_O = 1000$,
$G_{RS} = 5$, $G_{max} = 200$, $N_{RS} = 1000$.
Table 19.1 Results for L = 1024 and 4096, obtained from 3 random runs of the algorithm
L PSL F Hexadecimal form
1024 28 3.9683 4A3850
61EB56D8C3A37BEDFF2EEBC30 96B47CF2CE9EBA6C28A6895AF
4CDF08090AB612DA8043C3D1F E644D50A15E908692AC4DC095
218D398A6A66B389D16C8A6BC AF26896612DF211D48CBC027C
7C451B6B5B14EECD199CE823E 63C07C4E20AECF7513F41329D
56706E05F66D22A6EEC152A83 0F9378B07D7F3DC2D9FF88C08
4096 61 3.4589 E30A5D894A09A4CE0D11987E
FC7E8DC88127C078FBD569A4A D05AB26D86A2D067C1E274783
B891CBF64617E0906673F029A ED144133B3FF48DF2DB8A1878
6780075E9C2B0CC46E6D0DA62 3CF1F50F1DF94177C28076F3C
E44BC24C69D242E8D6F49F678 E71C2D4D72C9412C828734AA3
9CA28EA2A7E5891B451ADA9B2 408E666BA052C81509DE81789
7E4AF9FE4F504846D80D6B14C EEBDD9402A35C03AFD4EAE97B
7ECB690094681EFD13837398A CECAA9AB5FC10682B00CA74BD
15B5C0D7C53BAF35BF70612CB 4DDE55EB4CF2F028596ED8382
3F5D1A73463B9953326AE6950 CF1299AB6ACB432887A56E9F0
42957BAE604C003E982152DFE AFA75968C0D8B0FEAA2ED33FC
20DE73FBA4E21F154CB291291 58F8BB5B9977C57B6F77A7363
4D9164A6FEA9647EAA1E1D631 14B6BA1E9F065D66E5F5BF15B
0D46EF9CED3216DB9DF0298E1 CFBE0AF7596E9EB4BCBBBDA10
8A2B6088380B8D73797F9E9DB 094FCC06FF0544F46E261FE4E
F60AABCA0A32A5D1694B818B0 3A6D5351B28BAF523D1AE65D6
048136003CFBA56CF22E0E1A2 F2973C8163731272219255826
1DC2BEC886EBBBD73B5D1EFC2 9BB7E91F72964943D6D3560C3
A8E20D11EC5A81C106E04D5F5 9218D9FD9D823B118AD4FB1D6
C1435461E338D9F171B337E5D D7320CCD9CFE5DC651051E0F6
678550BA09F9892E76D6E17C4 9ECD63F71B71FF351EEAF6DEB
The computing time is 16.1136 hours for L = 1019. For lengths up to 4096, the
required computing time empirically shows a seemingly quadratic growth with L.
In particular, for long sequences the parameters have been adjusted to trade
performance for search time. This flexible tradeoff is in fact one of the
key advantages of the algorithm. The sequences obtained for L=1024 and 4096 are
listed in Table 19.1. A detailed implementation of the algorithm and a full list of best
sequences thus far is given in [9].
Problem
19.1 Run the accompanying MATLAB code of the cultural algorithm to find the global
minimum of the Rosenbrock function. Investigate how to improve the program.
References
1. Alami J, Imrani AE, Bouroumi A. A multi-population cultural algorithm using fuzzy clustering.
Appl Soft Comput. 2007;7(2):506–19.
2. Becerra RL, Coello CAC. Cultured differential evolution for constrained optimization. Comput
Meth Appl Mech Eng. 2006;195:4303–22.
3. Blackmore S. The meme machine. New York: Oxford University Press; 1999.
4. Botzheim J, Cabrita C, Koczy LT, Ruano AE. Fuzzy rule extraction by bacterial memetic
algorithms. Int J Intell Syst. 2009;24(3):1563–8.
5. Chelouah R, Siarry P. Genetic and Nelder-Mead algorithms hybridized for a more accurate
global optimization of continuous multiminima functions. Eur J Oper Res. 2003;148:335–48.
6. Chung CJ, Reynolds RG. Function optimization using evolutionary programming with self-
adaptive cultural algorithms. In: Proceedings of Asia-Pacific conference on simulated evolution
and learning, Taejon, Korea, 1996. p. 17–26.
7. Dawkins R. The selfish gene. Oxford, UK: Oxford University Press; 1976.
8. Digalakis JG, Margaritis KG. A multi-population cultural algorithm for the electrical generator
scheduling problem. Math Comput Simul. 2002;60(3):293–301.
9. Du K-L, Mow WH, Wu WH. New evolutionary search for long low autocorrelation binary
sequences. IEEE Trans Aerosp Electron Syst. 2015;51(1):290–303.
10. Farahmand AM, Ahmadabadi MN, Lucas C, Araabi BN. Interaction of culture-based learning
and cooperative coevolution and its application to automatic behavior-based system design.
IEEE Trans Evol Comput. 2010;14(1):23–57.
11. Folino G, Pizzuti C, Spezzano G. Combining cellular genetic algorithms and local search for
solving satisfiability problems. In: Proceedings of the 12th IEEE international conference on
tools with artificial intelligence, Taipei, Taiwan, November 1998. p. 192–198.
12. Huy NQ, Soon OY, Hiot LM, Krasnogor N. Adaptive cellular memetic algorithms. Evol Com-
put. 2009;17(2):231–56.
13. Karimi A, Siarry P. Global simplex optimization—a simple and efficient metaheuristic for
continuous optimization. Eng Appl Artif Intell. 2012;25:48–55.
14. Kendall G, Soubeiga E, Cowling P. Choice function and random hyperheuristics. In: Pro-
ceedings of the 4th Asia-Pacific conference on simulated evolution and learning, Singapore,
November 2002. p. 667–671.
15. Kirby S. Spontaneous evolution of linguistic structure: an iterated learning model of the emer-
gence of regularity and irregularity. IEEE Trans Evol Comput. 2001;5(2):102–10.
16. Krasnogor N. Studies on the theory and design space of memetic algorithms. PhD Thesis,
University of the West of England, Bristol, UK; 2002.
17. Lee JT, Lau E, Ho Y-C. The Witsenhausen counterexample: a hierarchical search approach for
nonconvex optimization problems. IEEE Trans Autom Control. 2001;46(3):382–97.
18. Lozano M, Herrera F, Krasnogor N, Molina D. Real-coded memetic algorithms with crossover
hill-climbing. Evol Comput. 2004;12(3):273–302.
19. Luo C, Yu B. Low dimensional simplex evolution—a new heuristic for global optimization. J
Glob Optim. 2012;52(1):45–55.
20. Malaek SM, Karimi A. Development of a new global continuous optimization algorithm based
on Nelder–Mead Simplex and evolutionary process concepts. In: Proceedings of the 6th inter-
national conference on nonlinear problems in aerospace and aviation (ICNPAA), Budapest,
Hungary, June 2006. p. 435–447.
21. Molina D, Lozano M, Garcia-Martinez C, Herrera F. Memetic algorithms for continuous opti-
mization based on local search chains. Evol Comput. 2010;18(1):27–63.
22. Molina D, Lozano M, Herrera F. MA-SW-Chains: memetic algorithm based on local search
chains for large scale continuous global optimization. In: Proceedings of the IEEE Congress
on evolutionary computation (CEC), Barcelona, Spain, July 2010. p. 1–8.
23. Moscato P. On evolution, search, optimization, genetic algorithms and martial arts: towards
memetic algorithms. Technical Report 826, Caltech Concurrent Computation Program, Cali-
fornia Institute of Technology, Pasadena, CA, 1989.
24. Moscato P. Memetic algorithms: a short introduction. In: Corne D, Glover F, Dorigo M, editors.
New ideas in optimization. McGraw-Hill; 1999. p. 219–234.
25. Nguyen QH, Ong Y-S, Lim MH. A probabilistic memetic framework. IEEE Trans Evol Comput.
2009;13(3):604–23.
26. Noman N, Iba H. Enhancing differential evolution performance with local search for high
dimensional function optimization. In: Proceedings of genetic and evolutionary computation
conference (GECCO), Washington DC, June 2005. p. 967–974.
27. Ong YS, Keane AJ. Meta-Lamarckian learning in memetic algorithms. IEEE Trans Evol Com-
put. 2004;8(2):99–110.
28. Peng B, Reynolds RG. Cultural algorithms: knowledge learning in dynamic environments. In:
Proceedings of IEEE congress on evolutionary computation, Portland, OR, 2004. p. 1751–1758.
29. Renders J-M, Bersini H. Hybridizing genetic algorithms with hill-climbing methods for global
optimization: two possible ways. In: Proceedings of the 1st IEEE conference on evolutionary
computation, Orlando, FL, June 1994, vol. 1. p. 312–317.
30. Reynolds RG. An introduction to cultural algorithms. In: Sebald AV, Fogel LJ, editors. Pro-
ceedings of the 3rd annual conference on evolutionary programming. River Edge, NJ: World
Scientific; 1994. p. 131–139.
31. Reynolds RG. Cultural algorithms: theory and applications. In: Corne D, Dorigo M, Glover
F, editors. Advanced topics in computer science series: new ideas in optimization. New York:
McGraw-Hill; 1999. p. 367–377.
32. Smith JE. Coevolving memetic algorithms: a review and progress report. IEEE Trans Syst Man
Cybern Part B. 2007;37(1):6–17.
33. Sotiropoulos DG, Plagianakos VP, Vrahatis MN. An evolutionary algorithm for minimizing
multimodal functions. In: Proceedings of the 5th Hellenic–European conference on computer
mathematics and its applications (HERCMA), Athens, Greece, September 2001, vol. 2. Athens,
Greece: LEA Press; 2002. p. 496–500.
34. Solomon R. Evolutionary algorithms and gradient search: similarities and differences. IEEE
Trans Evol Comput. 1998;2(2):45–55.
35. Tang J, Lim M, Ong YS. Diversity-adaptive parallel memetic algorithm for solving large scale
combinatorial optimization problems. Soft Comput. 2007;11(9):873–88.
36. Wang H, Wang D, Yang S. A memetic algorithm with adaptive hill climbing strategy for
dynamic optimization problems. Soft Comput. 2009;13:763–80.
37. Yen J, Liao JC, Lee B, Randolph D. A hybrid approach to modeling metabolic systems using a
genetic algorithm and simplex method. IEEE Trans Syst Man Cybern Part B. 1998;28:173–91.
Tabu Search and Scatter Search
20
20.1 Tabu Search

Once a potential solution has been determined, it is marked as tabu so that the
algorithm does not visit it repeatedly. The approach uses memory to avoid entrapment
in cycles and to pursue the search when the optimization process encounters
local optima: cycling back to formerly visited solutions is prohibited through
the use of memory lists, called tabu lists, which trace the recent search history. Best
improvement is implemented by always replacing the current solution by its best
neighbor, even if the best neighbor is worse than the current solution; this helps the
search escape local optima. In order to avoid cycling among already visited solutions,
a tabu list keeps information about the past steps of the search and is used to create
and exploit new solutions in the search space.
Tabu search starts from a current solution and constructs a set of feasible solutions
from it, based on its neighborhood and the tabu list. The tabu
list T holds a record of previously visited states. The constructed solutions are
evaluated, and the one with the highest metric value is selected as the next solution.
The tabu list is then updated. However, forbidding all solutions corresponding to a
tabu attribute may forbid some good, or even optimal, solutions that have not yet been
visited. No record in T can be used to form the next feasible solution unless it satisfies
an aspiration criterion. The aspiration criteria allow better solutions to be chosen even
if they have been tabooed. Suppose T follows a FIFO policy; then the larger the set
T, the longer a move in T remains prohibited.
An aspiration criterion is a condition that, if satisfied, allows a solution obtained
by performing a tabu move to be set as the new current solution. It is a rule that allows
the tabu status to be overridden in cases where the forbidden exchange exhibits
desirable properties. A typical aspiration criterion is to keep a solution that is better
than the best solution found so far. In this metaheuristic, intensification is provided
by the local search mechanism, while diversification is given by the use of tabu lists.
Basic tabu search is given in Algorithm 20.1, where x and y are feasible solutions of a
COP, A(x, t) is the set of solutions among which the new current solution is chosen
at iteration t, N(x) is the set of neighbors of x, T(x, t) is the set of tabu moves
at iteration t, $\tilde{T}(x, t)$ is the set of tabu moves satisfying at least one aspiration
criterion, and f(·) is the objective (metric) function. The stopping criterion may be a maximum
number of consecutive iterations without an improving solution, or that A(x, t)
becomes empty.
Step 4.a can be implemented as follows. The set A(x, t) is generated by producing
M children x′ from the neighborhood of x. These children satisfy the condition that
their features do not belong to T, or that they satisfy at least one of the aspirations $\tilde{T}$. Step
4.b determines the new solution x′ by selecting the one with the minimum fitness.
Step 4.c updates the tabu list T by including the features of x′, and updates x by
x′ if f(x′) < f(x). Simple tabu search, in most cases, will find a local optimum
rather than a global optimum.
Tabu search relies strongly on the initial solution and its quality. The convergence
speed of tabu search to the global optimum depends on the initial solution,
since it is a form of iterative search. A multistart method executes the search multiple
times from different initial settings. In [14], strategic diversification is utilized within
the tabu search framework for the QAP, by incorporating several diversification and
multistart tabu search variants.
Algorithm 20.1 (Tabu Search).
1. Set t = 0.
2. Generate an initial solution x.
3. Initialize the tabu list T ← ∅ and the size of the tabu list L.
4. Repeat:
   a. Set the candidate set $A(x, t) = \{x' \in N(x) \setminus T(x, t) \cup \tilde{T}(x, t)\}$.
   b. Find the best x′ in A(x, t): set $x' = \arg\min_{y \in A(x,t)} f(y)$.
   c. If f(x′) is better than f(x), x ← x′.
   d. Update the tabu list and the aspiration criteria.
   e. If the tabu list T is full, replace the oldest features in T.
   f. Set t = t + 1.
   until the termination criterion is satisfied.
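The following Python sketch instantiates Algorithm 20.1 for a binary minimization problem with the 1-flip neighborhood; a move is identified by the index of the flipped bit, and the tabu tenure plays the role of the tabu list size. The objective, the tenure, and the iteration budget are illustrative.

```python
import random

def tabu_search(f, n_bits, tabu_tenure=7, iters=200, seed=0):
    # Minimal tabu search with best-improvement moves over the 1-flip neighborhood.
    rng = random.Random(seed)
    x = [rng.choice([0, 1]) for _ in range(n_bits)]
    best, best_f = x[:], f(x)
    tabu = {}                                   # bit index -> iteration until which it is tabu
    for t in range(iters):
        candidates = []
        for i in range(n_bits):                 # enumerate the 1-flip neighborhood
            y = x[:]
            y[i] = 1 - y[i]
            fy = f(y)
            # a tabu move is admissible only via the aspiration criterion (improves the best)
            if tabu.get(i, -1) < t or fy < best_f:
                candidates.append((fy, i, y))
        if not candidates:
            break
        fy, i, y = min(candidates)              # best admissible neighbor (may be worse than x)
        x = y
        tabu[i] = t + tabu_tenure               # the recently flipped bit becomes tabu
        if fy < best_f:
            best, best_f = y[:], fy
    return best, best_f

# example: minimize the number of ones (a trivial objective)
print(tabu_search(lambda b: sum(b), n_bits=20))
```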
By introducing parallelism, tabu search can find the promising regions of the
search space very quickly. A parallel tabu search model, which is based on the
crossover operator of GA, has been described in [15]. Theoretical properties of the
convergence of tabu search to the optimal solutions have been analyzed in [13].
Diversification-driven tabu search [12] repeatedly alternates between simple tabu
search and a diversification phase founded on a memory-based perturbation operator.
Starting from an initial random solution, the method uses tabu search to reach a local
optimum. Then, a perturbation operator is applied to displace the solution to a new
region, whereupon a new round of tabu search is launched. The tabu search procedure
uses a neighborhood defined by single 1-flip moves, which consist of flipping a single
variable x j to its complement value 1 − x j . The diversification strategy utilizes a
memory-based perturbation operator.
CA-TS [1] combines cultural algorithms and tabu search, where tabu search is used
to transform history knowledge in the belief space from a passive knowledge source
to an active one. In each generation of the cultural algorithm, the best individual
solution is calculated and then the best new neighbor of that solution is sought
in the social network for that population using tabu search. In order to speed up
the convergence process through knowledge dissemination, simple forms of social
network topologies are used to describe the connectivity of individual solutions. The
integration of tabu search as a local enhancement process enables CA-TS to leap
over false peaks and local optima.
Random search or pattern search uses a fixed step size and is grounded in basic
mathematical analysis. It iteratively moves to better positions in the search
space, which are sampled from a hypersphere surrounding the current position. The step
size significantly affects the performance of such algorithms. The Solis–Wets algorithm
is a randomized hill climber with an adaptive step size; it is a general and fast
search algorithm with good behavior.
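As a rough illustration, the following Python sketch implements a Solis–Wets-style randomized hill climber with a bias vector and an adaptive step size. The expansion and contraction factors and the success/failure thresholds are assumptions chosen for readability, not the exact constants of the original algorithm.

```python
import numpy as np

def solis_wets(f, x0, rho=1.0, iters=500, seed=0):
    # Randomized hill climber with a bias vector and an adaptive step size rho.
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    bias = np.zeros_like(x)
    succ = fail = 0
    for _ in range(iters):
        d = rng.normal(0.0, rho, x.size)
        cand = x + bias + d
        fc = f(cand)
        if fc < fx:                                   # the biased step succeeded
            x, fx = cand, fc
            bias = 0.2 * bias + 0.4 * d
            succ, fail = succ + 1, 0
        else:
            cand = x - bias - d                       # try the opposite direction
            fc = f(cand)
            if fc < fx:
                x, fx = cand, fc
                bias = bias - 0.4 * d
                succ, fail = succ + 1, 0
            else:
                bias = 0.5 * bias
                succ, fail = 0, fail + 1
        if succ > 5:                                  # expand after repeated successes
            rho, succ = 2.0 * rho, 0
        if fail > 3:                                  # contract after repeated failures
            rho, fail = 0.5 * rho, 0
    return x, fx

print(solis_wets(lambda x: float(np.sum(x**2)), x0=[3.0, -2.0]))
```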
Iterated local search [17] creates a sequence of solutions iteratively according to
a local search heuristic. After a new solution is created by local search, it is modified
by perturbation to escape from the local extremum, producing an intermediate solution.
A neighborhood-based local search procedure is then applied to return
an enhanced solution. An acceptance criterion decides which
solution is selected for further evolution: the new solution replaces the previous
one if it has better quality. The procedure continues until a termination criterion is
satisfied.
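A minimal Python sketch of this loop is given below; local_search and perturb are user-supplied routines, and the improvement-only acceptance rule is just one simple choice among those used in practice.

```python
def iterated_local_search(f, x0, local_search, perturb, iters=50):
    # local_search(f, x) returns a refined solution; perturb(x) displaces it.
    x = local_search(f, x0)
    best, best_f = x, f(x)
    for _ in range(iters):
        y = local_search(f, perturb(x))     # perturb the current solution, then refine it
        if f(y) < best_f:                   # acceptance criterion: keep only improvements
            best, best_f = y, f(y)
            x = y
        # otherwise continue from the incumbent solution x
    return best, best_f
```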
Iterated tabu search [18], a special case of iterated local search, combines
tabu search with perturbation operators to avoid getting stuck in local optima; the
local search phase is replaced by a tabu search phase. At each iteration, solution ŝ is
perturbed, resulting in solution s′, which is then improved by tabu search to obtain
solution s̄. If solution s̄ satisfies the acceptance criterion, the search continues from
s̄; otherwise the search proceeds from ŝ. The best feasible
solution encountered, s*, and its function value are recorded.
Example 20.1: Reconsider the TSP for Berlin52 benchmark in TSPlib, which is
treated in Example 11.1. The length of the optimal tour is 7542 when using Euclidean
distances. In this example, we implement tabu search. We set the maximum number
of iterations as 1000, and the tabu list length as 500.
For a random run, the best route length obtained is 7782.9844 at the 100th iteration;
the solution obtained is illustrated in Figure 20.1, and the evolution of the run is
illustrated in Figure 20.2. Compared to the ACO implementation given in Example
11.1, the implementation given here always converges to a local minimum. A more
elaborate strategy is required to help the search get out of local minima.
20.2 Scatter Search

Scatter search explores the solution space by evolving a set of reference points
(solutions) stored in the reference set (RefSet). These points are initially generated
with a diversification method, and the evolution of the reference points is induced
by the application of four methods: subset generation, combination, improvement,
and update. Furthermore, the new individuals can be improved by applying a local
search method.
Scatter search is a kind of direction-based method that utilizes the subtraction of
two solutions as the perturbation direction in an evolution episode. A set of solutions
with high evaluation are used to generate new solutions to replace less promising
solutions at each iteration of the implementation process. A local search procedure is
usually applied over each solution of the population and each combined new solution.
The scatter search method builds a reference set (RefSet for short) of solutions to
maintain a good balance between intensification and diversification of the solution
process. The reference set stores b high-quality solutions: RefSet1 contains b1 solutions
selected by objective value, and RefSet2 contains b2 = b − b1 solutions selected for
diversity (low crowdedness), i.e., solutions far away from the RefSet1 points. With a generation procedure,
subsets are generated from the reference set. A combination procedure is then carried
out to form new solutions from subsets, and the new solutions experience local search
by the improvement procedure to become better solutions. There are update rules to
determine whether an improved solution could enter a reference set.
Scatter search has four main steps. The initialization of scatter search randomly
generates solutions in such a way that the more individuals are generated in one area,
the less opportunity this area has to generate new ones. This ensures that the
initial solutions of scatter search have maximum diversity. Scatter search then makes
use of simplex search to improve the initial solutions. After that, RefSet1 is selected
from the improvement results according to the objective quality, and RefSet2 is
selected according to the distance to RefSet1 of the remaining improved individuals
(the larger the better). Then the algorithm starts the main loop. The reference set is
used to generate subsets. The solutions in the subsets are combined in various ways
to get $P_{size}$ new solutions, which are then improved by local search such as simplex
search. If the improvement results in shrinking of the population, diversification
is applied again until the total number of improved solutions reaches the desired
target. Based on the improved solutions, the reference update is applied to construct
the reference set. Then scatter search continues in a loop that consists of applying
solution combination followed by improvement and the reference update. Finally,
the improved solutions will replace some solutions of the reference set if they are
good with respect to objective quality or diversity. This loop terminates when the
reference set does not change and all the subsets have already been subjected to
solution combination. At this point, diversification generation is used to construct a
new RefSet2 and the search continues. The whole scatter search terminates when
the predefined termination criterion is satisfied.
There are four types of subsets to be generated in scatter search: two-element
subsets, three-element subsets, four-element subsets, and subsets containing the best
five elements or more. There are many types of combinations for generating new
solutions from subsets. Let us give an example for a two-element subset $\{\boldsymbol{x}_1, \boldsymbol{x}_2\}$.
We first define a vector starting at $\boldsymbol{x}_1$ and pointing to $\boldsymbol{x}_2$ as $\boldsymbol{d} = \frac{\boldsymbol{x}_2 - \boldsymbol{x}_1}{2}$. Three
types of recombination are suggested [11]:
$$\boldsymbol{x}_{new} = \boldsymbol{x}_1 - r \boldsymbol{d}, \qquad \boldsymbol{x}_{new} = \boldsymbol{x}_1 + r \boldsymbol{d}, \qquad \boldsymbol{x}_{new} = \boldsymbol{x}_2 + r \boldsymbol{d}, \qquad (20.1)$$
where r is a random number uniformly drawn from (0, 1).
Every subset can generate several new solutions according to the composition of
the subset. When both $\boldsymbol{x}_1$ and $\boldsymbol{x}_2$ belong to RefSet1, which means that they are both
good solutions, four new solutions are generated: types 1 and 3 once each, and type
2 twice. When only one of $\boldsymbol{x}_1$ and $\boldsymbol{x}_2$ belongs to RefSet1, three new solutions are
generated: types 1, 2, and 3 once each. When neither $\boldsymbol{x}_1$ nor $\boldsymbol{x}_2$ belongs to RefSet1, which
means that they are both uncrowded solutions, two new solutions are generated: type
2 once, and type 1 or 3 once.
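The following Python sketch implements the two-element combination of (20.1) together with the counting rule just described; the function name and the flags indicating RefSet1 membership are illustrative.

```python
import numpy as np

def combine_pair(x1, x2, in_refset1=(True, True), rng=None):
    # d = (x2 - x1)/2; the three recombination types of Eq. (20.1).
    rng = rng or np.random.default_rng()
    d = (x2 - x1) / 2.0
    type1 = lambda: x1 - rng.random() * d
    type2 = lambda: x1 + rng.random() * d
    type3 = lambda: x2 + rng.random() * d
    if all(in_refset1):                       # both high-quality: types 1 and 3 once, type 2 twice
        return [type1(), type2(), type2(), type3()]
    if any(in_refset1):                       # exactly one high-quality: each type once
        return [type1(), type2(), type3()]
    # both from RefSet2: type 2 once, and type 1 or 3 once
    return [type2(), (type1 if rng.random() < 0.5 else type3)()]

x1, x2 = np.array([0.0, 0.0]), np.array([1.0, 2.0])
for x_new in combine_pair(x1, x2):
    print(x_new)
```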
Simplex search is used to improve the new solutions. If an improved solution is
better than the worst one in RefSet1, it will replace the worst one. If an improved
solution's distance to the closest reference set solution is larger than that of the most
crowded solution in RefSet2, it will replace the most crowded one. If the reference
set does not change in the updating procedure and the stop criterion has not been
satisfied, then the initialization procedure will be started to construct a new RefSet2.
It is suggested that $P_{size} = \max(100, 5b)$ [11]. Scatter search can be considered
a $(b + P_{size})$-ES, but the objective value is not the only criterion in the updating
(replacement) phase.
Scatter search is given in Algorithm 20.2.
Global search [20], called OptQuest/NLP, is a global optimization heuristic for
pure and mixed integer nonlinear problems with many constraints and variables,
where all problem functions are differentiable with respect to the continuous vari-
ables. The procedure combines the global optimization abilities of OptQuest with
the superior accuracy and feasibility-seeking behavior of gradient-based local NLP
solvers. OptQuest, a commercial implementation of scatter search developed by
OptTek Systems, provides starting points for any gradient-based local NLP solver.
Algorithm 20.2 (Scatter Search).
1. Set D = ∅.
2. Repeat:
   Construct a solution x by the diversification method.
   If x ∉ D, then D = D ∪ {x}.
   until |D| = DSize.
3. Build RefSet = {x₁, . . . , x_b} from D with a one-by-one max–min selection.
4. Order the solutions in RefSet by their objective function values such that x₁ is the best.
5. NewSolutions ← TRUE.
6. While (NewSolutions) do:
   a. Generate NewSubsets, which consists of all pairs of solutions in RefSet that
      include at least one new solution.
      NewSolutions ← FALSE.
   b. While (NewSubsets ≠ ∅) do:
      i. Select the next subset S in NewSubsets.
      ii. Apply solution combination to S to obtain one or more new solutions x.
          If (x ∉ RefSet and f(x) < f(x_b)),
             x_b ← x, and reorder RefSet;
             NewSolutions ← TRUE.
          End if.
      iii. NewSubsets ← NewSubsets \ S.
      End while.
   End while.
Problems
20.1 Find out the global search mechanism, the convergence mechanism, and the
uphill mechanism of scatter search.
20.2 The GlobalSearch solver of the MATLAB Global Optimization Toolbox imple-
ments the global search algorithm [20] for finding the global optimum of
smooth problems. Try the GlobalSearch solver on a benchmark function given
in the Appendix. Test the influence of different parameters.
20.3 Run the accompanying MATLAB code of tabu search for n-queens problem.
Understand the principle of the algorithm. Investigate how to improve the
result by adjusting the parameters.
References
1. Ali MZ, Reynolds RG. Cultural algorithms: a Tabu search approach for the optimization of
engineering design problems. Soft Comput. 2014;18:1631–44.
2. Cvijovic D, Klinowski J. Taboo search: an approach to the multiple minima problem. Science.
1995;267(3):664–6.
3. Glover F. Future paths for integer programming and links to artificial intelligence. Comput
Oper Res. 1986;13(5):533–49.
4. Glover F. Tabu search-Part I. ORSA J Comput. 1989;1(3):190–206.
5. Glover F. Tabu search-Part II. ORSA J Comput. 1990;2(1):4–32.
6. Glover F. A template for scatter search and path relinking. In: Proceedings of the 3rd European
conference on artificial evolution, Nimes, France, Oct 1997, vol. 1363 of Lecture Notes in
Computer Science. Berlin: Springer; 1997. p. 3–51.
7. Glover F. Tabu search and adaptive memory programming: advances, applications and chal-
lenges. In: Barr RS, Helgason RV, Kennington JL, editors. Interfaces in computer science
and operations research: advances in metaheuristics, optimization, and stochastic modeling
technologies. Boston, USA: Kluwer Academic Publishers; 1997. p. 1–75.
8. Glover F. Exterior path relinking for zero-one optimization. Int J Appl Metaheuristic Comput.
2014;5(3):8 pages.
9. Glover F, Laguna M. Tabu search. Norwell, MA, USA: Kluwer Academic Publishers; 1997.
10. Glover F, Laguna M, Marti R. Fundamentals of scatter search and path relinking. Control
Cybernet. 2000;29(3):653–84.
11. Glover F, Laguna M, Marti R. Scatter search. In: Koza JR, editors. Advances in evolutionary
computation: theory and applications. Berlin: Springer; 2003. p. 519–537.
12. Glover F, Lv Z, Hao JK. Diversification-driven tabu search for unconstrained binary quadratic
problems. 4OR Q J Oper Res. 2010;8:239–53.
13. Hanafi S. On the convergence of tabu search. J Heuristics. 2000;7(1):47–58.
14. James T, Rego C, Glover F. Multistart tabu search and diversification strategies for the quadratic
assignment problem. IEEE Trans Syst Man Cybern Part A. 2009;39(3):579–96.
15. Kalinli A, Karaboga D. Training recurrent neural networks by using parallel tabu search algo-
rithm based on crossover operation. Eng Appl Artif Intell. 2004;17:529–42.
16. Laguna M, Marti R. Scatter search: methodology and implementations in C. Dordrecht: Kluwer
Academic; 2003.
17. Lourenco HR, Martin OC, Stutzle T. Iterated local search: framework and applications. In:
Glover F, Kochenberger G, editors. Handbook of metaheuristics, 2nd ed. Boston, USA: Kluwer
Academic Publishers; 2010. p. 363–397.
18. Misevicius A, Lenkevicius A, Rubliauskas D. Iterated tabu search: an improvement to standard
tabu search. Inf Technol Control. 2006;35:187–97.
19. Siarry P, Berthiau G. Fitting of tabu search to optimize functions of continuous variables. Int
J Numer Methods Eng. 1997;40:2449–57.
20. Ugray Z, Lasdon L, Plummer JC, Glover F, Kelly J, Marti R. Scatter search and local NLP
solvers: a multistart framework for global optimization. INFORMS J Comput. 2007;19(3):328–
40.
Search Based on Human Behaviors
21
Human beings are the most intelligent creatures on this planet. This chapter introduces
various search metaheuristics that are inspired by behaviors of the human creative
problem-solving process.
$$\boldsymbol{d}_{i,alt} = \boldsymbol{x}_{best}^{g} - \boldsymbol{x}_i(t), \qquad (21.3)$$
where $\boldsymbol{x}_i(t)$ is the position of the $i$th seeker, $\boldsymbol{x}_{i,best}^{p}$ is its own personal best position
so far, and $\boldsymbol{x}_{best}^{g}$ is the neighborhood best position so far.
Each seeker may proactively change its search direction according to its
past behavior and the environment. The proactiveness direction for seeker $i$ can be
determined from the empirical gradient by evaluating the latest three positions:
$$\boldsymbol{d}_{i,pro} = \boldsymbol{x}_i(t_1) - \boldsymbol{x}_i(t_2), \qquad (21.4)$$
where $\boldsymbol{x}_i(t_1)$ and $\boldsymbol{x}_i(t_2)$ are, respectively, the best and worst positions among
$\{\boldsymbol{x}_i(t-2), \boldsymbol{x}_i(t-1), \boldsymbol{x}_i(t)\}$.
The position update is given by
$$\boldsymbol{x}_i(t+1) = \boldsymbol{x}_i(t) + \alpha_i(t) \boldsymbol{d}_i(t), \qquad (21.5)$$
where $\alpha_i(t)$ is a step size given by a Gaussian membership function.
Compared with PSO with inertia weight, PSO with constriction factor, and DE, the seeker
optimization algorithm has a faster convergence speed and better global search ability,
with more successful runs on the benchmark functions.
In teaching–learning-based optimization (TLBO), learners improve their knowledge by
learning from the teacher $\boldsymbol{x}_{teacher}$, who is the best individual in the population. Each
learner moves its position toward $\boldsymbol{x}_{teacher}$, taking into account the current
mean position of the learners, $\boldsymbol{x}_{mean}$, which represents the average quality of all learners
in the population.
During the teacher phase, learner $\boldsymbol{x}_i$ updates its position by
$$\boldsymbol{x}_{new,i} = \boldsymbol{x}_i + r(\boldsymbol{x}_{teacher} - T_F \boldsymbol{x}_{mean}), \qquad (21.6)$$
where $r$ is a random number ranging from 0 to 1, and $T_F$ is a teaching factor used
to emphasize the importance of the learners' average quality $\boldsymbol{x}_{mean}$. $T_F = 1$ or 2 is
heuristically obtained by $T_F = \mathrm{round}[1 + \mathrm{rand}(0, 1)]$.
For the learner phase, each learner $\boldsymbol{x}_i$ randomly selects a peer learner $\boldsymbol{x}_j$. It will move
toward or away from $\boldsymbol{x}_j$ depending on whether $\boldsymbol{x}_j$ has better fitness than $\boldsymbol{x}_i$:
$$\boldsymbol{x}_{new,i} = \begin{cases} \boldsymbol{x}_i + r(\boldsymbol{x}_j - \boldsymbol{x}_i), & f(\boldsymbol{x}_j) > f(\boldsymbol{x}_i),\\ \boldsymbol{x}_i + r(\boldsymbol{x}_i - \boldsymbol{x}_j), & f(\boldsymbol{x}_j) < f(\boldsymbol{x}_i). \end{cases} \qquad (21.7)$$
TLBO has many features in common with DE.
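The following Python sketch implements the two TLBO phases for a minimization problem; since (21.7) is stated in terms of fitness (the larger the better), the comparison is reversed here for a cost function. The greedy acceptance of improved positions and the parameter values are the usual choices, while details such as duplicate removal are omitted.

```python
import numpy as np

def tlbo(f, dim, bounds, pop_size=50, iters=100, seed=0):
    # Minimal TLBO sketch (Eqs. 21.6 and 21.7) for minimizing f.
    rng = np.random.default_rng(seed)
    low, high = bounds
    X = rng.uniform(low, high, (pop_size, dim))
    fit = np.apply_along_axis(f, 1, X)
    for _ in range(iters):
        # teacher phase
        teacher = X[np.argmin(fit)]
        mean = X.mean(axis=0)
        TF = rng.integers(1, 3)                          # teaching factor, 1 or 2
        for i in range(pop_size):
            new = np.clip(X[i] + rng.random(dim) * (teacher - TF * mean), low, high)
            fn = f(new)
            if fn < fit[i]:                              # accept only improvements
                X[i], fit[i] = new, fn
        # learner phase
        for i in range(pop_size):
            j = rng.integers(pop_size)
            while j == i:
                j = rng.integers(pop_size)
            if fit[j] < fit[i]:                          # x_j is better: move toward it
                new = X[i] + rng.random(dim) * (X[j] - X[i])
            else:                                        # otherwise move away from x_j
                new = X[i] + rng.random(dim) * (X[i] - X[j])
            new = np.clip(new, low, high)
            fn = f(new)
            if fn < fit[i]:
                X[i], fit[i] = new, fn
    return X[np.argmin(fit)], fit.min()

print(tlbo(lambda x: float(np.sum(x**2)), dim=2, bounds=(-5.0, 5.0)))
```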
Teaching and peer-learning PSO [16] adapts TLBO into PSO. It adopts the teach-
ing and peer-learning phases. The particle first enters into the teaching phase and
updates its velocity based on its historical best and the global best information.
A particle that fails to improve its fitness in the teaching phase then enters the
peer-learning phase, where an exemplar is selected as the guidance particle. The roulette
wheel selection technique is employed to ensure that fitter particles have a higher
probability of being selected as the exemplar. Additionally, a stagnation prevention strategy is
employed.
In bare-bones TLBO [26], each learner of teacher phase employs an interactive
learning strategy, which is the hybridization of the learning strategy of teacher phase
in TLBO and Gaussian sampling learning based on neighborhood search, and each
learner of learner phase employs the learning strategy of learner phase in TLBO or
the new neighborhood search strategy. The bare-bones method outperforms TLBO.
TLBO is a parameter-free stochastic search technique. It has gained popularity due
to its ability to achieve better results with comparatively faster convergence than GA,
PSO, and ABC.
In [25], TLBO is enhanced with learning experience of other learners. In this
method, two random possibilities are used to determine the learning methods of
learners in different phases. In teacher phase, the learners improve their grades by
utilizing the mean information of the class and the learning experience of other learn-
ers according to a random probability. In learner phase, a learner learns knowledge
from another learner, which is randomly selected from the whole class, or from the mutual
learning experience of two randomly selected learners. The area-copying operator of the
producer–scrounger model is also used for part of the learners to increase the learning
speed.
Figure 21.1 The evolution of a random run of TLBO for Ackley function: the minimum and
average objectives.
Example 21.1:
We now reconsider the Ackley function, which was solved in Example 7.1. The global
minimum value is 0 at x ∗ = 0.
We implement TLBO on this problem by setting the population size as 50, the
maximum number of iterations as 100, the teaching factor randomly as 1 or 2, and
selecting the initial population randomly from the entire domain. For a random
run, we have $f(\boldsymbol{x}) = 8.8818 \times 10^{-16}$ at $(-0.2663, -0.1530) \times 10^{-15}$. The convergence
curves are illustrated in Figure 21.1. TLBO always converged toward the
global optimum very rapidly during the random runs. When we take the number
of dimensions as 10, a random run gives the minimum value $1.8854 \times 10^{-10}$ at
$(0.1217, 0.5837, -0.2207, 0.7190, -0.2184, 0.1717, -0.1355, -0.7541, 0.5743,
-0.5536) \times 10^{-10}$.
The imperialist competitive algorithm [3] starts with an initial population of solutions called
countries; the cost of each country determines the power of each country. Some of the best initial
countries, i.e., the countries with the least cost function value, become imperialists and start
taking control of other countries (called colonies) and form the initial empires.
Two major operators are assimilation and revolution. Assimilation makes the
colonies of each empire get closer to the imperialist state in the space of sociopolitical
characteristics (i.e., search space). Revolution causes sudden random changes in the
characteristics of some of the countries in the search space. During assimilation and
revolution, a colony might reach a better position and then has the chance to take control
of the entire empire, replacing its current imperialist. All the empires
try to win the imperialistic competition and take possession of the colonies of other empires.
Based on their power, all the empires have a chance to take control of one or more
colonies of the weakest empire. Weak empires lose their power gradually
and are finally eliminated. The algorithm continues with these steps
(assimilation, revolution, competition) until a stop condition is satisfied.
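The following Python fragment sketches the two main operators for a continuous search space. The assimilation move and the revolution operator follow the usual descriptions of the imperialist competitive algorithm, but the deviation model and the parameter names (beta, gamma, rate) are assumptions of this sketch.

```python
import numpy as np

def assimilate(colony, imperialist, beta=2.0, gamma=0.5, rng=None):
    # The colony moves toward its imperialist by a random fraction (up to beta) of the
    # distance vector d, plus a small random deviation controlled by gamma.
    rng = rng or np.random.default_rng()
    d = imperialist - colony
    step = beta * rng.random(colony.size) * d
    deviation = gamma * rng.uniform(-1.0, 1.0, colony.size) * np.linalg.norm(d) / colony.size
    return colony + step + deviation

def revolve(country, low, high, rate=0.3, rng=None):
    # Revolution: re-sample each coordinate with probability `rate`.
    rng = rng or np.random.default_rng()
    mask = rng.random(country.size) < rate
    out = country.copy()
    out[mask] = rng.uniform(low, high, mask.sum())
    return out

# usage sketch
rng = np.random.default_rng(0)
col, imp = rng.uniform(-5, 5, 2), rng.uniform(-5, 5, 2)
print(assimilate(col, imp, rng=rng), revolve(col, -5, 5, rng=rng))
```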
In [5], imperialist competitive algorithm is combined with a policy-learning func-
tion for solving the TSP. All offspring of each country represent feasible solutions for
the TSP. All countries can grow increasingly strong by learning the effective policies
of strong countries. Weak countries will generate increasingly excellent offspring by
learning the policies of strong countries while retaining the characteristics of their
own countries.
Example 21.2: The global minimum of the Ackley function was solved in Examples 7.1
and 21.1. We now do the same using the imperialist competitive algorithm. We
set the number of initial countries as 200, the number of initial imperialists as 8, the number
of all colonies as 192, the number of decades as 100, the revolution rate as 0.3, the assimilation
coefficient β = 2, the assimilation angle coefficient γ = 0.5, the cost penalizing
parameter of all colonies as 0.02, the damping ratio as 0.99, the uniting threshold as 0.02,
and α = 0.1. The algorithm stops when just one empire remains. The algorithm
always converges to the global optimum very rapidly for random runs. For
a random run, we have $f(\boldsymbol{x}) = 3.1093 \times 10^{-4}$ at $(0.0269, -0.1065) \times 10^{-3}$. The
convergence curves are illustrated in Figure 21.2.

Figure 21.2 The evolution of a random run of the imperialist competitive algorithm for the Ackley function: the minimum and average objectives.
Problems
References
1. Aickelin U, Burke EK, Li J. An evolutionary squeaky wheel optimisation approach to personnel
scheduling. IEEE Trans Evol Comput. 2009;13:433–43.
2. Ali H, Khan FA. Group counseling optimization for multi-objective functions. In: Proceedings
of IEEE congress on evolutionary computation (CEC), Cancun, Mexico, June 2013. p. 705–
712.
3. Atashpaz-Gargari E, Lucas C. Imperialist competitive algorithm: an algorithm for optimization
inspired by imperialistic competition. Proceedings of IEEE congress on evolutionary compu-
tation (CEC), Singapore, September 2007. p. 4661–4666.
4. Burman R, Chakrabarti S, Das S. Democracy-inspired particle swarm optimizer with the con-
cept of peer groups. Soft Comput. 2016, p. 1–20. doi:10.1007/s00500-015-2007-8.
5. Chen M-H, Chen S-H, Chang P-C. Imperial competitive algorithm with policy learning for the
traveling salesman problem. Soft Comput. 2016, p. 1–13. doi:10.1007/s00500-015-1886-z.
6. Dai C, Chen W, Zhu Y, Zhang X. Seeker optimization algorithm for optimal reactive power
dispatch. IEEE Trans Power Syst. 2009;24(3):1218–31.
7. Dai C, Zhu Y, Chen W. Seeker optimization algorithm. In: Wang Y, Cheung Y, Liu H, editors.
Computational intelligence and security, vol. 4456 of Lecture Notes in Computer Science.
Berlin: Springer; 2007. p. 167–176.
8. Eita MA, Fahmy MM. Group counseling optimization: a novel approach. In: Proceedings of
the 29th SGAI international conference on innovative techniques and applications of artificial
intelligence (AI-2009), Cambridge, UK, Dec 2009, p. 195–208.
9. Eita MA, Fahmy MM. Group counseling optimization. Appl Soft Comput. 2014;22:585–604.
10. Feng X, Zou R, Yu H. A novel optimization algorithm inspired by the creative thinking process.
Soft Comput. 2015;19:2955–72.
11. Ghorbani N, Babaei E. Exchange market algorithm. Appl Soft Comput. 2014;19:177–87.
12. Joslin D, Clements DP. Squeaky wheel optimization. J Artif Intell Res. 1999;10:353–73.
13. Kamali HR, Sadegheih A, Vahdat-Zad MA, Khademi-Zare H. Immigrant population search
algorithm for solving constrained optimization problems. Appl Artif Intell. 2015;29:243–58.
14. Kashan AH. League championship algorithm (LCA): an algorithm for global optimization
inspired by sport championships. Appl Soft Comput. 2014;16:171–200.
15. Li J, Parkes AJ, Burke EK. Evolutionary squeaky wheel optimization: a new framework for
analysis. Evol Comput. 2011;19(3):405–28.
16. Lim WH, Isa NAM. Teaching and peer-learning particle swarm optimization. Appl Soft Com-
put. 2014;18:39–58.
17. Nazari-Shirkouhi S, Eivazy H, Ghodsi R, Rezaie K, Atashpaz-Gargari E. Solving the integrated
product mix-outsourcing problem by a novel meta-heuristic algorithm: imperialist competitive
algorithm. Expert Syst Appl. 2010;37(12):7615–26.
18. Osaba E, Diaz F, Onieva E. A novel meta-heuristic based on soccer concepts to solve routing
problems. In: Proceedings of the 15th ACM annual conference on genetic and evolutionary
computation (GECCO), Amsterdam, The Netherlands, July 2013. p. 1743–1744.
19. Osaba E, Diaz F, Onieva E. Golden ball: a novel metaheuristic to solve combinatorial opti-
mization problems based on soccer concepts. Appl Intell. 2014;41(1):145–66.
20. Rao RV, Patel V. An elitist teaching-learning-based optimization algorithm for solving complex
constrained optimization problems. Int J Ind Eng Comput. 2012;3:535–60.
21. Rao RV, Savsania VJ, Balic J. Teaching-learning-based optimization algorithm for uncon-
strained and constrained real-parameter optimization problems. Eng Optim. 2012;44:1447–62.
22. Rao RV, Savsani VJ, Vakharia DP. Teaching-learning-based optimization: an optimization
method for continuous non-linear large scale problems. Inf Sci. 2012;183(1):1–15.
23. Shi Y. Brain storm optimization algorithm. In: Advances in swarm intelligence, Vol. 6728 of
Lecture Notes in Computer Science. Berlin: Springer; 2011. p. 303–309.
24. Wang L, Yang R, Ni H, Ye W, Fei M, Pardalos PM. A human learning optimization algorithm
and its application to multi-dimensional knapsack problems. Appl Soft Comput. 2015;34:736–
43.
25. Zou F, Wang L, Hei X, Chen D. Teaching-learning-based optimization with learning experience
of other learners and its application. Appl Soft Comput. 2015;37:725–36.
26. Zou F, Wang L, Hei X, Chen D, Jiang Q, Li H. Bare-bones teaching-learning-based optimiza-
tion. Sci World J. 2014; 2014: 17 pages. Article ID 136920.
Dynamic, Multimodal,
and Constrained Optimizations 22
This chapter treats several hard problems associated with metaheuristic optimization,
namely, dynamic, multimodal, and constrained optimization problems.
with the allele distribution of the population was first calculated and then used to generate
immigrants for GAs to address DOPs, with some preliminary results.
As to the number of immigrants, in order to prevent the immigrants from disrupting
the ongoing search progress too much, the ratio of the number of immigrants to
the population size, i.e., the replacement rate, is usually set to a small value, e.g., 0.2
or 0.3.
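For illustration, the following Python helper sketches a random-immigrants step: at every generation, a fraction of the population equal to the replacement rate is replaced by randomly generated individuals; replacing the worst individuals, as done here, is one common variant.

```python
import numpy as np

def insert_random_immigrants(pop, fitness, bounds, rate=0.2, rng=None):
    # Replace the worst `rate` fraction of the population (for minimization)
    # by randomly generated individuals to maintain diversity.
    rng = rng or np.random.default_rng()
    low, high = bounds
    n_imm = max(1, int(rate * len(pop)))
    worst = np.argsort(fitness)[-n_imm:]        # indices of the worst individuals
    pop[worst] = rng.uniform(low, high, (n_imm, pop.shape[1]))
    return pop
```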
Some special diversity schemes have been developed for PSO in dynamic envi-
ronments. In charged PSO [5], a nucleus of neutral particles is surrounded by some
charged particles. The charge imposes a repulsion force between particles and thus
hinders the swarm from converging. Other techniques include quantum particles [6], which are
based on a quantum model, the replacement of global by local neighborhoods [41], and hierarchical
neighborhood structures [36].
Multiple-population methods [6,41,56,79] enhance population diversity by maintaining several populations in different subareas of the fitness landscape. One challenging issue of the multipopulation method is how to create an appropriate number of subpopulations, each with an appropriate number of individuals, to cover different subareas of the fitness landscape. The clustering particle swarm optimizer of [79] addresses this problem. A hierarchical
clustering method is employed to automatically create a proper number of subpopu-
lations in different subareas. A hierarchical clustering method is investigated in [39]
to locate and track multiple optima for dynamic optimization problems.
In another multi-swarm approach [56], the number and size of swarms are adjusted
dynamically by a speciation mechanism, which was originally proposed for finding
multiple optima in multimodal landscapes.
The dynamic forecasting genetic program (DyFor GP) model [74] is a dynamic
GP model that is specifically tailored for forecasting in nonstatic environments. It
incorporates features that allow it to adapt to changing environments automatically,
as well as retain knowledge learned from previously encountered environments.
By adapting the concept of forking GA [70] to time-varying multimodal opti-
mization problems, multinational GA [72] uses multiple GA populations known as
nations to track multiple peaks in a dynamic environment, with each nation having
a policy representing the best point of the nation.
The self-organizing scouts approach [8] divides the population into a parent popu-
lation that searches the solution space and child populations that track known optima.
The parent population is periodically analyzed for clusters of partly converged indi-
viduals which are split off as child populations centered on the best individual in the
child population. Members of the parent population are then excluded from the child
population’s space. The size of child populations is altered to give large populations
to optima demonstrating high fitness or dynamism.
Generally, the niching methods can be divided into two major categories: sequen-
tial niching and parallel niching. Sequential niching develops niches sequentially
over time. As niches are discovered, the search space of a problem is adapted to
repel other individuals from traversing the area around the recently located solu-
tion. Once an optimum is found, the sequential niching technique [4] modifies the evaluation function in the region of that solution so that the discovered solution is effectively removed from further search. The GA then continues the search for new solutions without restarting the population. In order to
avoid repeated search within previously visited areas, individuals in the vicinity of
a discovered optimum are punished by a fitness derating function.
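As a rough illustration of punishing individuals near a discovered optimum (the power-law derating form, radius, and exponent below are assumptions of this sketch, not the exact derating function of [4]), consider the following Python code.

import numpy as np

def derated_fitness(raw_fitness, x, found_optima, radius=0.5, alpha=2.0):
    """Fitness derating in the spirit of sequential niching: the raw fitness of x
    is scaled down whenever x lies within `radius` of an already found optimum."""
    f = raw_fitness(x)
    for s in found_optima:
        d = np.linalg.norm(np.asarray(x) - np.asarray(s))
        if d < radius:
            f *= (d / radius) ** alpha   # power-law derating: zero at the optimum itself
    return f

# usage: two peaks at x = -1 and x = +1; after locating x = 1, its neighborhood is derated
raw = lambda x: np.exp(-10 * (x[0] - 1) ** 2) + np.exp(-10 * (x[0] + 1) ** 2)
print(derated_fitness(raw, [1.0], found_optima=[[1.0]]))    # ~0: this peak is already found
print(derated_fitness(raw, [-1.0], found_optima=[[1.0]]))   # ~1: the other peak stays attractive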
Parallel niching forms and maintains several different niches simultaneously. The
search space is not modified. Parallel niching techniques therefore not only depend
on finding a good measure to locate possible solutions, but also need to organize
individuals in a way that maintains their organization in the search space over time,
to populate locations around solutions [29,34]. Most multimodal GAs adopt a parallel
scheme [27,29,40,54,70,71,82].
Dynamic niche clustering approach [27] starts from N small niches with given
initial radii. It merges niches approaching the same optimum and splits niches focus-
ing on different optima. Each niche has an independent radius, which is dynamically adjusted, with an initial radius \sigma_{\text{initial}} = \lambda / N^{1/d}, where d is the dimensionality of the problem at hand, and \lambda is a constant. Dynamic niche clustering is able to identify
niches of variable radii. It also allows some overlap between niches. In [22], each
individual has its own radius, and the niche radius is incorporated as an additional
variable of the optimization problem.
Crossover between individuals from different niches may lead to unviable off-
spring and is usually avoided. Such between-niche crossover introduces a strong selection advantage to the niche
with the largest population, and thus accelerates symmetry breaking of the search
space and causes the population to become focused around one region of the search
space. This, however, prevents a thorough exploration of the fitness landscape and
makes it more likely to find a suboptimal solution.
22.2.3 Speciation
Clearing eliminates similar individuals and maintains the diversity among the se-
lected individuals. Clearing [58] determines the dominant individuals of the sub-
populations and removes the remaining population members from the mating pool.
The algorithm first sorts the population members in descending order of their fit-
ness values. It then picks one individual at a time from the top and removes all the
individuals with worse fitness than the selected one within the specified clearing ra-
dius σclear . This step is repeated until all the individuals in the population are either
selected or removed. The complexity of clearing is O(cN), for c niches maintained
during the generations. Clearing is simpler than sharing. It is also able to preserve
the best elements of the niches during the generations. However, clearing can be
slow to converge and may not locate local optima effectively. In clearing, the cleared
individuals still occupy population slots. In [65] these individuals are reallocated
outside the range of their respective fittest individuals. It is known that clearing is
particularly sensitive to parameterization [40].
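The clearing procedure just described can be sketched in a few lines of Python (an illustration; the niche capacity of one winner and the convention of setting cleared fitness to zero are assumptions of this sketch).

import numpy as np

def clearing(pop, fitness, sigma_clear, capacity=1):
    """Clearing: keep at most `capacity` dominant individuals per niche of radius
    sigma_clear; the remaining members of the niche get their fitness cleared to 0."""
    pop = np.asarray(pop, dtype=float)
    cleared = np.array(fitness, dtype=float)
    order = np.argsort(-cleared)                 # descending fitness
    for idx, i in enumerate(order):
        if cleared[i] <= 0:
            continue                             # already cleared
        winners = 1
        for j in order[idx + 1:]:
            if cleared[j] > 0 and np.linalg.norm(pop[i] - pop[j]) < sigma_clear:
                if winners < capacity:
                    winners += 1                 # keep a few extra dominants per niche
                else:
                    cleared[j] = 0.0             # remove from the mating pool
    return cleared

# usage: two clusters around 0 and 5; only the best of each cluster keeps its fitness
pop = np.array([[0.0], [0.1], [5.0], [5.1]])
fit = np.array([1.0, 0.9, 0.8, 0.85])
print(clearing(pop, fit, sigma_clear=1.0))       # -> [1.   0.   0.   0.85]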
In local selection scheme [51], fitness is the result of an individual’s interaction
with the environment and its finite shared resources. Individual fitnesses are com-
pared to a fixed threshold to decide which individuals get the opportunity to reproduce.
Local selection is an implicitly niched scheme. It maintains genetic diversity in a
way similar to, yet generally more efficient than, fitness sharing. Local selection is
suitable for parallel implementations. It can effectively avoid premature convergence
and it applies minimal selection pressure upon the population.
In [66], the conservation of the best successive local individuals is integrated with
a topological method of separating the subpopulations instead of the conventional
radius-triggered manner.
Some niching techniques integrated with PSO are given in [11,56]. In [43], a
simple lbest PSO employing the ring topology is used to ensure stable niching be-
haviors. Index-based neighborhood is utilized in ring-topology-based PSO to control
the speed of convergence for the PSO population.
In [9], a neighborhood-based mutation is integrated with three different DE nich-
ing algorithms, namely, crowding DE, species-based DE, and sharing DE to solve
multimodal optimization problems. In neighborhood mutation, difference vector generation is limited to the m most similar individuals. In this way, each individual is evolved toward its nearest optimal point, and the possibility of between-niche difference vector generation is reduced. Generally, m should be chosen between 0.05 and 0.2 of the population size. In [59], a Euclidean neighborhood-based mutation is integrated
with various niching DE algorithms. Neighborhood mutation is able to restrict the
production of offspring within a local area or the same niche as their parents.
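The following Python sketch illustrates neighborhood-based mutation for a DE/rand/1 scheme (an illustration only; the neighbor count m and the use of Euclidean distance are assumptions of this sketch rather than the exact settings of [9] or [59]).

import numpy as np

def neighborhood_mutation(pop, i, F=0.5, m=6, rng=None):
    """DE/rand/1 mutation with difference vectors restricted to the m nearest
    neighbors of individual i (m >= 3); pop is an (N, d) array."""
    rng = rng or np.random.default_rng()
    dists = np.linalg.norm(pop - pop[i], axis=1)
    neigh = np.argsort(dists)[:m]                 # i itself plus its nearest neighbors
    r1, r2, r3 = rng.choice(neigh, size=3, replace=False)
    return pop[r1] + F * (pop[r2] - pop[r3])      # the mutant stays inside the niche

# usage
pop = np.random.default_rng(0).uniform(-1, 1, size=(20, 2))
print(neighborhood_mutation(pop, i=0, m=6))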
In addition to genetic-operator-based techniques such as clearing [58] and the species-conserving GA [40], there are also some population-based multimodal GA techniques, such as multinational GA [71], multipopulation GA, forking GA, and roaming. Forking GA [70] uses a multipopulation scheme, which involves one parent
population that explores one subspace and one or more child populations that exploit
other subspaces. Multinational GA [71] maintains a number of nations. Each nation
corresponds to a promising optimum area in the search space. Mating is locally restricted within individual nations. Selection is performed either globally (weighted
selection) or locally (national selection). In multinational GA with national selection,
individuals only compete with other individuals from the same nation.
CMA-ES with self-adaptive niche radius [64] applies the concept of an adaptive individual niche radius in conjunction with CMA-ES. The so-called niche radius problem is addressed by the introduction of radii-based niching methods with derandomized ES [64].
The popular performance metrics for multimodal optimization are effective number
of the peaks maintained (ENPM), maximum peak ratio (MPR), and Chi-square-like
performance criterion.
A large ENPM value indicates a good ability to identify and maintain multiple
peaks. After running a niche EA several times, the average and the standard deviation
of the ENPM are calculated to characterize the algorithm.
ENPM does not consider the influence of peak heights. Suppose that a problem has k peaks with heights h_1, \ldots, h_k, and that the algorithm has found m peaks with heights h'_1, \ldots, h'_m. MPR is defined by [54]

MPR = \frac{\sum_{i=1}^{m} h'_i}{\sum_{j=1}^{k} h_j}.   (22.6)
MPR grants higher peaks with more preference. A larger MPR value means a better
convergence to peaks. It takes a maximum value of 1, when all the peaks have been
identified and maintained correctly.
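A direct Python rendering of (22.6), assuming the heights of the peaks found by the algorithm and of all true peaks are available:

def maximum_peak_ratio(found_heights, true_heights):
    """Maximum peak ratio (22.6): sum of the heights of the peaks actually found
    divided by the sum of all true peak heights; equals 1 when every peak is
    identified and maintained."""
    return sum(found_heights) / sum(true_heights)

# usage: a problem with peaks of heights 1.0, 0.8, 0.5; only the two highest were found
print(maximum_peak_ratio([1.0, 0.8], [1.0, 0.8, 0.5]))   # 0.7826...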
ENPM and MPR do not consider the distribution of individuals in the last gen-
eration. Chi-square-like (CSL) performance criterion [18], which has the form of
chi-square distribution, can be used to evaluate the distribution of a population.
Suppose that every individual at the end of the evolution converges to one peak and that the probability p_i of an individual being on peak i is given by

p_i = \frac{h_i}{\sum_{l=1}^{k} h_l}.   (22.7)

CSL is defined by

CSL = \sum_{i=1}^{k+1} \frac{(x_i - \mu_i)^2}{\sigma_i^2},   (22.8)

where x_i is the actual number of individuals on peak i (the (k+1)th term accounts for individuals that are not located on any peak), and \mu_i and \sigma_i^2 are the mean and variance of that number under the ideal distribution.
22.3 Constrained Optimization
When dealing with optimization problems with constraints, two kinds of constraints, namely, equality and inequality constraints, may arise. The existence of equality constraints reduces the size of the feasible space significantly, which makes it difficult to locate feasible and optimal solutions. Popular methods include penalizing infeasible individuals, repairing infeasible individuals, or considering bound violation as an additional objective.
The penalty function method exploits infeasible solutions by adding a penalty value to the objective function of each infeasible individual so that it is penalized for violating the constraints. It converts equality and/or inequality constraints into a new objective function whose quality deteriorates abruptly outside the feasible region. Most of the constraint handling methods are based on penalty functions.
The penalty function method transforms constrained optimization problems into unconstrained optimization problems by defining an objective function of the form [33]

\tilde{f}(x) = f(x) + f_p(x),   (22.9)

where f(x) is the original objective function, and f_p(x) > 0 is a penalty for infeasible solutions
and zero for feasible solutions. A constrained problem can be solved by a sequence
of unconstrained optimizations in which the penalty factors are stepwise intensified.
Static penalty functions usually require the user to control the amount of penalty
added when multiple constraints are violated. These parameters are usually prob-
lem dependent and chosen heuristically. The penalties are the weighted sum of the
constraint violations.
In adaptive penalty function methods [24,37,68,76], information gathered from
the search process, such as the generation number t of EA, are used to control the
amount of penalty added to infeasible solutions, and they do not require users to
define parameters explicitly.
As the number of generations increases, the penalty also increases, and this puts
more and more selective pressure on GA to find a feasible solution [37]. Penalty
factors can be defined statically or depending on the number of satisfied constraints.
They can dynamically depend on the number of generations [37]
\tilde{f}(x) = f(x) + (Ct)^{\alpha} G(x),   (22.10)
where C and α are user-defined, and G(x) is a penalty function. Typically C = 0.5,
α = 1 or 2.
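A small Python sketch of the dynamic penalty (22.10); taking G(x) as the sum of inequality-constraint violations is one common choice and an assumption of this sketch.

def dynamic_penalty(f, constraints, x, t, C=0.5, alpha=2.0):
    """Dynamic penalty (22.10): f_tilde(x) = f(x) + (C*t)**alpha * G(x), where G(x)
    sums the violations of the inequality constraints g_i(x) <= 0."""
    G = sum(max(0.0, g(x)) for g in constraints)
    return f(x) + (C * t) ** alpha * G

# usage: minimize f(x) = x^2 subject to x >= 1, i.e., g(x) = 1 - x <= 0
f = lambda x: x ** 2
g = [lambda x: 1.0 - x]
print(dynamic_penalty(f, g, x=0.5, t=10))   # infeasible point: penalty grows with t
print(dynamic_penalty(f, g, x=1.5, t=10))   # feasible point: no penalty, returns 2.25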
In [38], the average value of the objective function in the current population and
the level of violation of each constraint during the evolution process are used to define
the penalty parameters. For each constraint violation, a different penalty coefficient
is assigned so that a higher penalty value will be added for larger violation of a given
constraint. This requires an extra computation to evaluate the average value of the
objective function in each generation.
In [24], an infeasibility measure is used to form a two-stage penalty that is im-
posed upon the infeasible solutions to ensure that those infeasible individuals with
low fitness value and low constraint violation remain fit. The worst infeasible in-
dividual is first penalized to have objective function value equal to or greater than
the best feasible solution. This value is then increased to twice the original value
by penalizing. All other individuals are penalized accordingly. The method requires
no parameter tuning and no initial feasible solution. However, the algorithm fails to
produce feasible solutions in every run.
In [68], infeasible individuals with low objective value and low constraint violation
are exploited to facilitate finding feasible individuals in each run as well as producing
quality results. The number of feasible individuals in the population is used to guide
the search process either toward finding more feasible individuals or searching for
the optimum solution. Two types of penalties are added to each infeasible individual
to identify the best infeasible individuals in the current population. The amount of
the two penalties added is controlled by the number of feasible individuals currently
present in the population. If there are few feasible individuals, a large penalty will be
added to infeasible individuals with higher constraint violation. On the other hand,
if there are sufficient numbers of feasible individuals present, infeasible individuals
with larger objective function values will be penalized more than infeasible ones
with smaller objective function values. The algorithm can find feasible solutions in
problems having small feasible space compared to the search space. The proposed
method is simple to implement and does not need any parameter tuning. It is able to
find feasible solutions in every run for all of the benchmark functions tested.
In [76], the constraint handling technique extends the single-objective optimiza-
tion algorithm proposed in [68] for multiobjective optimization. It is based on an
adaptive penalty function and a distance measure. These two functions vary de-
pending upon the objective function value and the sum of constraint violations of
an individual. The objective space is modified to account for the performance and
constraint violation of each individual. The modified objective functions are used
in the nondominance sorting to facilitate the search of optimal solutions not only
in the feasible space but also in the infeasible regions. The search in the infeasible
space is designed to exploit those individuals with better objective values and lower
constraint violations. The number of feasible individuals in the population is used
to guide the search process either toward finding more feasible solutions or toward searching for optimal solutions.
In [2], the average value of the objective function in the current population and the
level of violation of each constraint during the evolution process are used to define
the penalty parameters. For each constraint violation, a different penalty coefficient
is assigned so that a higher penalty value will be added for larger violation of a given
constraint.
The constrained optimum is usually located at the boundary between feasible
and infeasible domains. Self-organizing adaptive penalty method [46] attempts to
maintain an equal number of designs on each side of the constraint boundary. The
method adjusts the penalty parameter value of each constraint according to the ratio
of the number of solutions that satisfy the constraint to the number of solutions that
violate the constraint. The penalty cost is calculated as the sum of a penalty factor
multiplying the constraint violation and the penalty pressure term that increases as
the generation increases.
Figure 22.1 The evolution of a random run of GA for a linearly constrained problem. (a) At the 5th generation. (b) At the end of evolution.
Constraint violation and objective function can be optimized separately using multi-
objective optimization techniques [25,61,62,67,75]. A single-objective constrained
optimization problem can be converted into a MOP by treating the constraints as one
or more objectives of constraint violation to be minimized.
To be more specific, a constrained optimization problem can be transformed into
a two-objective problem, where one objective is the original objective and the other
is the overall violation of the constraints [13]. The method maintains two groups of
individuals: one for the population in GAs, and the other for best infeasible indi-
viduals close to the feasible region. A constrained optimization problem can also be
transformed into a (k + 1)-objective problem, where k objectives are related to the
k constraints, and one is the original objective.
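As an illustration of this transformation (a sketch, not the specific method of [13]), the snippet below maps a constrained problem to the pair (original objective, overall constraint violation), both to be minimized.

def to_biobjective(f, constraints, x):
    """Treat the overall violation of the inequality constraints g_i(x) <= 0 as a
    second objective alongside the original objective."""
    violation = sum(max(0.0, g(x)) for g in constraints)
    return (f(x), violation)

# usage: minimize x^2 subject to x >= 1; feasible points have a zero second objective
f = lambda x: x ** 2
g = [lambda x: 1.0 - x]
print(to_biobjective(f, g, 0.5))   # (0.25, 0.5)  infeasible
print(to_biobjective(f, g, 1.2))   # (1.44, 0.0)  feasible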
In [61,62], stochastic ranking is introduced to achieve a balance between objec-
tive and penalty functions stochastically in terms of the dominance of penalty and
objective functions. A probability factor is used to determine whether the objective
function value or the constraint violation value determines the rank of each individual.
Suitable ranking alone is capable of improving the search performance significantly.
In [62], the simulation results reveal that the unbiased multiobjective approach to
constraint handling may not be effective. A nondominated rank removes the need
for setting a search bias. However, this does not eliminate the need for having a bias
in order to locate feasible solutions.
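A compact Python sketch of stochastic ranking in the style of [61]: adjacent individuals are compared by the objective with probability p_f (or whenever both are feasible) and by the constraint violation otherwise. The probability p_f = 0.45 follows the usual recommendation; the number of sweeps is an assumption of this sketch.

import random

def stochastic_ranking(objs, viols, p_f=0.45, sweeps=None, rng=random):
    """Bubble-sort-style stochastic ranking; returns indices sorted best-first
    (minimization of both objective and violation)."""
    n = len(objs)
    idx = list(range(n))
    sweeps = sweeps or n
    for _ in range(sweeps):
        swapped = False
        for j in range(n - 1):
            a, b = idx[j], idx[j + 1]
            both_feasible = viols[a] == 0 and viols[b] == 0
            if both_feasible or rng.random() < p_f:
                swap = objs[a] > objs[b]          # compare by objective
            else:
                swap = viols[a] > viols[b]        # compare by constraint violation
            if swap:
                idx[j], idx[j + 1] = b, a
                swapped = True
        if not swapped:
            break
    return idx

# usage: a slightly infeasible but very good individual can be ranked ahead of
# a mediocre feasible one
print(stochastic_ranking(objs=[1.0, 0.1, 5.0], viols=[0.0, 0.05, 0.0]))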
In [73], constrained optimization problems are solved by a two-phase algorithm. In
the first phase, only constraint satisfaction is considered. The search is directed toward
finding a single feasible solution using ranking. In the second phase, simultaneous
optimization of the objective function and the satisfaction of the constraints is treated
as a biobjective optimization problem. In this case, nondominated ranking is used
to rank individuals, and niching scheme is used to preserve diversity. The algorithm
can always find feasible solutions for all problems.
α-constrained method [67] introduces a satisfaction level of a search point for
the constraints. It can convert an algorithm for unconstrained problems into an algo-
rithm for constrained problems by replacing ordinary comparisons with the α level
comparisons.
Hybrid constrained optimization EA [75] effectively combines multiobjective
optimization with global and local search models. A niching GA based on tournament
selection is used to perform global search. A parallel local search operator is adopted
to implement a clustering partition of the population and multiparent crossover is
used to generate the offspring population. The dominated individuals in the parent
population are replaced by nondominated individuals in the offspring population.
Infeasible individuals are replaced in a way to rapidly guide the population toward
the feasible region of the search space.
Problems
22.1 Explain why the fitness sharing technique does not suffer from genetic drift after
all relevant peaks have already been found.
References
1. Baluja S. Population-based incremental learning: a method for integrating genetic search based
function optimization and competitive learning. Computer Science Department, Carnegie Mel-
lon University, Pittsburgh, PA, USA, Technical Report CMU-CS-94-163. 1994.
2. Barbosa HJC, Lemonge ACC. An adaptive penalty scheme in genetic algorithms for constrained
optimization problems. In: Proceedings of the genetic and evolutionary computation conference
(GECCO), New York, July 2002. p. 287–294.
3. Basak A, Das S, Tan KC. Multimodal optimization using a biobjective differential evo-
lution algorithm enhanced with mean distance-based selection. IEEE Trans Evol Comput.
2013;17(5):666–85.
4. Beasley D, Bull DR, Martin RR. A sequential niche technique for multimodal function opti-
mization. Evol Comput. 1993;1(2):101–25.
5. Blackwell TM, Bentley PJ. Dynamic search with charged swarms. In: Proceedings of the
genetic and evolutionary computation conference (GECCO), New York, July 2002. p. 19–26.
6. Blackwell T, Branke J. Multi-swarm optimization in dynamic environments. In: Applications
of Evolutionary Computing, vol. 3005 of Lecture Notes in Computer Science. Berlin: Springer.
p. 489–500.
7. Branke J. Memory enhanced evolutionary algorithms for changing optimization problems. In:
Proceedings of the IEEE congress on evolutionary computation (CEC), Washington, DC, USA,
July 1999. p. 1875–1882.
28. Mc Ginley B, Maher J, O’Riordan C, Morgan F. Maintaining healthy population diversity using adaptive crossover, mutation, and selection. IEEE Trans Evol Comput. 2011;15(5):692–714.
29. Goldberg DE, Richardson J. Genetic algorithms with sharing for multimodal function opti-
mization. In: Grefenstette J, editor. Proceedings of the 2nd International conference on genetic
algorithms and their application, Cambridge, MA, USA, July 1987. Hillsdale, New Jersey:
Lawrence Erlbaum; 1987. p. 41–49.
30. Grefenstette JJ. Genetic algorithms for changing environments. In: Proceedings of the 2nd
International conference on parallel problem solving from nature (PPSN II), Brussels, Belgium,
September 1992. p. 137–144.
31. Hansen N. Benchmarking a BI-population CMA-ES on the BBOB-2009 function testbed.
In: Proceedings of Genetic and Evolutionary Computation Conference (GECCO), Montreal,
Canada, July 2009, pp. 2389–2395.
32. Harik GR. Finding multimodal solutions using restricted tournament selection. In: Proceedings
of the 6th International conference on genetic algorithms, Pittsburgh, PA, USA, July 1995. San
Mateo, CA: Morgan Kaufmann; 1995. p. 24–31.
33. Homaifar A, Lai SHY, Qi X. Constrained optimization via genetic algorithms. Simulation.
1994;62(4):242–54.
34. Horn J. The nature of niching: genetic algorithms and the evolution of optimal, cooperative
populations. Ph.D. Thesis, Genetic Algorithm Lab, University of Illinois at Urbana-Champaign
Champaign, IL, USA, 1997.
35. Horn J, Nafpliotis N, Goldberg DE. A niched pareto genetic algorithm for multiobjective
optimization. In: Proceedings of the 1st IEEE Conference on evolutionary computation (CEC),
Orlando, FL, USA, June 1994, vol. 1, p. 82–87.
36. Janson S, Middendorf M. A hierarchical particle swarm optimizer for dynamic optimization
problems. In: Applications of evolutionary computing, vol. 3005 of Lecture Notes in Computer
Science. Berlin: Springer; 2004. p. 513–524.
37. Joines JA, Houck CR. On the use of non-stationary penalty functions to solve nonlinear con-
strained optimization problems with GAs. In: Proceedings of IEEE Congress on evolutionary
computation (CEC), Orlando, FL, USA, June 1994, p. 579–584.
38. Lemonge ACC, Barbosa HJC. An adaptive penalty scheme in genetic algorithms for constrained
optimization problems. In: Proceedings of genetic and evolutionary computation conference
(GECCO), New York, July 2002, p. 287–294.
39. Li C, Yang S. A general framework of multipopulation methods with clustering in undetectable
dynamic environments. IEEE Trans Evol Comput. 2012;16(4):556–77.
40. Li J-P, Balazs ME, Parks GT, Clarkson PJ. A species conserving genetic algorithm for multi-
modal function optimization. Evol Comput. 2002;10(3):207–34.
41. Li X. Adaptively choosing neighborhood bests using species in a particle swarm optimizer for
multimodal function optimization. In: Proceedings of the genetic and evolutionary computation
conference (GECCO), Seattle, WA, USA, June 2004. p. 105–116.
42. Li X. Efficient differential evolution using speciation for multimodal function optimization. In:
Proceedings of conference on genetic and evolutionary computation (GECCO), Washington,
DC, USA, June 2005. p. 873–880.
43. Li X. Niching without niching parameters: particle swarm optimization using a ring topology.
IEEE Trans Evol Comput. 2010;14(1):150–69.
44. Li L, Tang K. History-based topological speciation for multimodal optimization. IEEE Trans
Evol Comput. 2015;19(1):136–50.
45. Liapis A, Yannakakis GN, Togelius J. Constrained novelty search: a study on game content
generation. Evol Comput. 2015;23(1):101–29.
46. Lin CY, Wu WH. Self-organizing adaptive penalty strategy in constrained genetic search. Struct
Multidiscip Optim. 2004;26(6):417–28.
47. Ling Q, Wu G, Yang Z, Wang Q. Crowding clustering genetic algorithm for multimodal function
optimization. Appl Soft Comput. 2008;8(1):88–95.
48. Liu L, Yang S, Wang D. Particle swarm optimization with composite particles in dynamic
environments. IEEE Trans Syst, Man, Cybern Part B. 2010;40(6):1634–48.
49. Mahfoud SW. Crowding and preselection revisited. In: Manner R, Manderick B, editors. Pro-
ceedings of the 2nd International conference on parallel problem solving from nature (PPSN
II), Brussels, Belgium, September 1992. Amsterdam: Elsevier; 1992. p. 27–36.
50. Mahfoud SW. Niching methods for genetic algorithms. Technical Report 95001, Illinois Ge-
netic Algorithms Laboratory, University of Illinois at Urbana-Champaign, Urbana, IL, USA,
1995.
51. Menczer F, Belew RK. Local selection. In: Proceedings of the 7th International conference on
evolutionary programming, San Diego, CA, USA, March 1998, vol. 1447 of Lecture Notes in
Computer Science. Berlin: Springer; 1998. p. 703–712.
52. Mengshoel OJ, Goldberg DE. Probability crowding: deterministic crowding with probabilistic
replacement. In: Proceedings of genetic and evolutionary computation conference (GECCO),
Orlando, FL, USA, July 1999. p. 409–416.
53. Mengshoel OJ, Goldberg DE. The crowding approach to niching in genetic algorithms. Evol
Comput. 2008;16(3):315–54.
54. Miller BL, Shaw MJ. Genetic algorithms with dynamic niche sharing for multimodal function
optimization. In: Proceedings of IEEE International conference on evolutionary computation
(CEC), Nagoya, Japan, May 1996. p. 786–791.
55. Morrison RW, De Jong KA. Triggered hyper mutation revisited. In: Proceedings of congress
on evolutionary computation (CEC), San Diego, CA, USA, July 2000. p. 1025–1032.
56. Parrott D, Li X. Locating and tracking multiple dynamic optima by a particle swarm model
using speciation. IEEE Trans Evol Comput. 2006;10(4):440–58.
57. Parsopoulos KE, Tasoulis DK, Pavlidis NG, Plagianakos VP, Vrahatis MN. Vector evaluated
differential evolution for multiobjective optimization. In: Proceedings of IEEE congress on
evolutionary computation (CEC), Portland, OR, USA, June 2004. p. 204–211.
58. Petrowski A. A CLEARING procedure as a niching method for genetic algorithms. In: Pro-
ceedings of IEEE International conference on evolutionary computation (CEC), Nagoya, Japan,
May 1996. p. 798–803.
59. Qu BY, Suganthan PN, Liang JJ. Differential evolution with neighborhood mutation for mul-
timodal optimization. IEEE Trans Evol Comput. 2012;16(5):601–14.
60. Richter H. Detecting change in dynamic fitness landscapes. In: Proceedings of congress on
evolutionary computation (CEC), Trondheim, Norway, May 2009. p. 1613–1620.
61. Runarsson TP, Yao X. Stochastic ranking for constrained evolutionary optimization. IEEE
Trans Evol Comput. 2000;4(3):284–94.
62. Runarsson TP, Yao X. Search bias in constrained evolutionary optimization. IEEE Trans Syst,
Man, Cybern Part C. 2005;35:233–43.
63. Shir OM, Back T. Niche radius adaptation in the CMA-ES niching algorithm. In: Proceedings of
the 9th International conference on parallel problem solving from nature (PPSN IX), Reykjavik,
Iceland, September 2006, vol. 4193 of Lecture Notes in Computer Science. Berlin: Springer;
2006. p. 142–151.
64. Shir OM, Emmerich M, Back T. Adaptive niche radii and niche shapes approaches for niching
with the CMA-ES. Evol Comput. 2010;18(1):97–126.
65. Singh G, Deb K. Comparison of multimodal optimization algorithms based on evolution-
ary algorithms. In: Proceedings of the 8th Annual conference on genetic and evolutionary
computation (GECCO), Seattle, WA, USA, June 2006. p. 1305–1312.
66. Stoean C, Preuss M, Stoean R, Dumitrescu D. Multimodal optimization by means of a topo-
logical species conservation algorithm. IEEE Trans Evol Comput. 2010;14(6):842–64.
67. Takahama T, Sakai S. Constrained optimization by applying the α-constrained method to the
nonlinear simplex method with mutations. IEEE Trans Evol Comput. 2005;9(5):437–51.
68. Tessema B, Yen GG. An adaptive penalty formulation for constrained evolutionary optimiza-
tion. IEEE Trans Syst, Man, Cybern Part A. 2009;39(3):565–78.
69. Thomsen R. Multimodal optimization using crowding-based differential evolution. In: Pro-
ceedings of IEEE Congress on evolutionary computation (CEC), Portland, OR, USA, June
2004. p. 1382–1389.
70. Tsutsui S, Fujimoto Y, Ghosh A. Forking genetic algorithms: GAs with search space division
schemes. Evol Comput. 1997;5:61–80.
71. Ursem RK. Multinational evolutionary algorithms. In: Proceedings of the IEEE Congress on
evolutionary computation (CEC), Washington, DC, USA, July 1999. p. 1633–1640.
72. Ursem RK. Multinational GAs: multimodal optimization techniques in dynamic environments.
In: Proceedings of the genetic and evolutionary computation conference (GECCO), Las Vegas,
NV, USA, July 2000. p. 19–26.
73. Venkatraman S, Yen GG. A generic framework for constrained optimization using genetic
algorithms. IEEE Trans Evol Comput. 2005;9(4):424–35.
74. Wagner N, Michalewicz Z, Khouja M, McGregor RR. Time series forecasting for dynamic
environments: the DyFor genetic program model. IEEE Trans Evol Comput. 2007;11(4):433–
52.
75. Wang Y, Cai Z, Guo G, Zhou Y. Multiobjective optimization and hybrid evolutionary algo-
rithm to solve constrained optimization problems. IEEE Trans Syst, Man, Cybern Part B.
2007;37(3):560–75.
76. Woldesenbet YG, Yen GG, Tessema BG. Constraint handling in multiobjective evolutionary
optimization. IEEE Trans Evol Comput. 2009;13(3):514–25.
77. Yang S. Genetic algorithms with elitism-based immigrants for changing optimization problems.
In: Applications of evolutionary computing, vol. 4448 of Lecture Notes in Computer Science.
Berlin: Springer; 2007. p. 627–636.
78. Yang S. Genetic algorithms with memory- and elitism-based immigrants in dynamic environ-
ments. Evol Comput. 2008;16(3):385–416.
79. Yang S, Li C. A clustering particle swarm optimizer for locating and tracking multiple optima
in dynamic environments. IEEE Trans Evol Comput. 2010;14(6):959–74.
80. Yang S, Yao X. Population-based incremental learning with associative memory for dynamic
environments. IEEE Trans Evol Comput. 2008;12(5):542–61.
81. Yao J, Kharma N, Grogono P. Bi-objective multipopulation genetic algorithm for multimodal
function optimization. IEEE Trans Evol Comput. 2010;14(1):80–102.
82. Yin X, Germay N. A fast genetic algorithm with sharing scheme using cluster analysis meth-
ods in multimodal function optimization. In: Proceedings of the International conference on
artificial neural nets and genetic algorithms, Innsbruck, Austria, 1993. Vienna: Springer; 1993.
p. 450–457.
83. Yu X, Tang K, Yao X. An immigrants scheme based on environmental information for genetic
algorithms in changing environments. In: Proceedings of the IEEE Congress on evolutionary
computation (CEC), Hong Kong, June 2008. p. 1141–1147.
23 Multiobjective Optimization
23.1 Introduction
system from converging toward solutions that are not superior with respect to any criterion.
This algorithm, however, has bias toward some regions [150].
In multiple-objective genetic local search (MOGLS) [74], the MOP is reformu-
lated as a simultaneous optimization of all weighted Tchebycheff functions or a
weighted sum of multiple objectives as a fitness function. A local search procedure
is applied to each individual generated by genetic operations. MOGLS randomly
specifies weight values whenever a pair of parent solutions are selected. It examines
only a small number of neighborhood solutions of a current solution in the local
search procedure.
EAs tend to converge to a single solution if run long enough. Therefore, a
mechanism to maintain diversity is required in order to deal with MOPs. All the
nondominated solutions should be considered equally good by the selection mech-
anism. Goldberg [58] introduced nondominated sorting to rank a search population
according to Pareto optimality. Pareto ranking assigns a rank to each solution based
on its Pareto dominance, such that nondominated solutions are all sampled at the
same rate. In the Pareto ranking method, all individuals need to be compared with one another using the Pareto dominance concept to determine the nondominated solutions in the current population. Pareto ranking gives nondominated individuals the highest rank, i.e., rank 1. Then the rank-1 individuals are removed from the population, the nondominated solutions among the remaining individuals are determined, and rank 2 is assigned to them.
The procedure is repeated until all individuals have been assigned a rank number.
Niching and speciation techniques can be used to promote genetic diversity so that
the entire Pareto frontier is covered. Equal probability of reproduction is assigned to
all nondominated individuals in the population.
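The Pareto ranking procedure described above can be sketched in a few lines of Python (minimization is assumed; an illustration, not an optimized nondominated sorting routine).

import numpy as np

def dominates(a, b):
    """a Pareto-dominates b (minimization): no worse in every objective, better in one."""
    return np.all(a <= b) and np.any(a < b)

def pareto_rank(objs):
    """Assign rank 1 to nondominated points, remove them, assign rank 2 to the
    nondominated points of the remainder, and so on."""
    objs = np.asarray(objs, dtype=float)
    ranks = np.zeros(len(objs), dtype=int)
    remaining = set(range(len(objs)))
    rank = 1
    while remaining:
        front = {i for i in remaining
                 if not any(dominates(objs[j], objs[i]) for j in remaining if j != i)}
        for i in front:
            ranks[i] = rank
        remaining -= front
        rank += 1
    return ranks

# usage: (1,4), (4,1), and (3,3) are mutually nondominated; (5,5) is dominated
print(pareto_rank([[1, 4], [4, 1], [3, 3], [5, 5]]))   # -> [1 1 1 2]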
Multiobjective GA (MOGA) [50,51] uses a rank-based fitness assignment method.
Niche formation techniques are used to promote diversity among preferable candi-
dates. If an individual xi at generation t is dominated by pi (t) individuals in the
current population, its current rank is given by rank(xi (t)) = 1 + pi (t) [50]. All non-
dominated individuals are assigned rank 1, see Figure 23.1.
The rank-based fitness assignment can be implemented in three steps [50]. First,
the population is sorted according to rank. Then, fitnesses are assigned to individuals by interpolating from the best (rank 1) to the worst (rank n ≤ N_P) according to some function, say, a linear function. Finally, the fitnesses of individuals having the
same rank are averaged, so that all of them will be sampled at the same rate. This
procedure keeps the global population fitness constant while maintaining appropriate
selective pressure. The vast majority of MOEAs resort to Pareto ranking as a fitness
assignment methodology.
Pearson’s correlation coefficient has been used as the measure of conflict among
the objectives in KOSSA [75], thus aiding in dimension reduction. Another method
selects a subset of objectives and performs the MOEA based on those objectives only.
In the context of objective reduction, a principal component analysis (PCA)-based
algorithm has been suggested in [40]. In [16], δ-conflict is defined as a measure of
conflict among objective functions, and it is used to select a subset of the original
set of objectives, which preserve the weak Pareto dominance.
To extend multiobjective optimization algorithms in the presence of noise in fit-
ness estimates, a common strategy is to utilize the concept of sampling (fitness
reevaluation of the same trial solution) to improve fitness estimates in the presence
of noise [6].
Dynamic MOPs require an optimization algorithm to continuously track the
moving Pareto front over time. In [162], a directed search strategy combines two
mechanisms for achieving a good balance between exploration and exploitation of
MOEAs in changing environments. The first mechanism reinitializes the population
based on the predicted moving direction as well as the directions that are orthogonal
to the moving direction of the Pareto set, when a change is detected. The second
mechanism aims to accelerate the convergence by generating solutions in predicted
regions of the Pareto set according to the moving direction of the nondominated
solutions between two consecutive generations.
Pareto-based methods can be divided into nonelitist and elitist MOEAs. They typically adopt Pareto ranking, some form of elitism, and a diversity maintenance strategy. They have
the ability to find multiple Pareto optimal solutions in one single run. Nonelitist
MOEAs do not retain the nondominated solutions that they generate. Representative
nonelitist MOEAs are nondominated sorting GA (NSGA) [149], niched Pareto GA
(NPGA) [71], and MOGA [50]. Elitist MOEAs retain these solutions either in an
external archive or in the main population. Elitism allows solutions that are globally
nondominated to be retained. Popular Pareto-based elitist MOEAs are strength Pareto
EA (SPEA) [181], strength Pareto EA 2 (SPEA2) [180], Pareto archived ES (PAES)
[84], and nondominated sorting GA II (NSGA-II) [39].
A good MOEA for MOPs should satisfy the requirements of convergence, dis-
tribution, and elitism. MOEAs should have a convergence mechanism so that they
can find the Pareto front as soon as possible. They should distribute their individuals
as evenly as possible along the Pareto front so as to provide more nondominated solutions.
Example 23.1:
In this example, we run the gamultiobj solver to minimize the Schaffer function:

\min_x \ f_1(x) = x^2, \quad f_2(x) = (x - 2)^2, \quad x \in [-10, 10].   (23.1)
This function has a convex, continuous Pareto optimal front with x ∈ [0, 2]. The
Schaffer function is plotted in Figure 23.2.
The population size is selected as 50. The solver will try to limit the number
of individuals in the current population that are on the Pareto front to 40 % of the
population size by setting the Pareto fraction to 0.4. Thus, the number of points on
the Pareto front is 20. The initial population is randomly selected within the domain.
Crowding distance function in genotype space is selected for diversity control.
For a random run, the solver terminated at the 200th generation, which is the
default maximum number of generations, with 10051 function evaluations. The
solutions on the Pareto front have an average distance of 0.00985864 and a spread
of 0.127192. The obtained Pareto front is given in Figure 23.3. It is shown that the
solutions are well distributed over the Pareto front. The objective values on the Pareto
front are both within [0, 4], corresponding to the Pareto optimal solutions x ∈ [0, 2].
For a random run with crowding distance function in phenotype space, the solver
terminated at the 139th generation, with 7001 function evaluations. The solutions on
the Pareto front have an average distance of 0.0190628 and a spread of 0.100125.
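The example above relies on MATLAB's gamultiobj solver. Purely as a language-neutral illustration (not the book's setup), the following Python sketch evaluates the Schaffer objectives on random samples and extracts their nondominated subset, whose objective values fall roughly within [0, 4].

import numpy as np

def schaffer(x):
    """The bi-objective Schaffer function of (23.1)."""
    return np.array([x ** 2, (x - 2.0) ** 2])

def nondominated(points):
    """Return the points not Pareto-dominated by any other point (minimization)."""
    keep = []
    for i, p in enumerate(points):
        dominated = any(np.all(q <= p) and np.any(q < p)
                        for j, q in enumerate(points) if j != i)
        if not dominated:
            keep.append(p)
    return np.array(keep)

rng = np.random.default_rng(1)
xs = rng.uniform(-10, 10, size=500)
front = nondominated(np.array([schaffer(x) for x in xs]))
print(front.min(axis=0), front.max(axis=0))   # objective values roughly within [0, 4]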
SPEA [181] is an elitist Pareto-based strategy that uses an external archive to store the
nondominated solutions found so far. The Pareto-based fitness assignment method
is itself a niching technique that does not require the concept of distance. The fitness
(strength) of a nondominated solution stored in the external archive is proportional
to the number of individuals covered, while the fitness of a dominated individual
is calculated by summing the strength of the nondominated solutions that cover
it. This fitness assignment criterion results in the definition of a niche that can be
identified with the portion of the objective function space covered by a nondominated
solution. Both the population and the external nondominated set participate in the
selection phase (the smaller the fitness, the higher the reproduction probability). The
secondary population is updated every generation and pruned by clustering if the
number of the nondominated individuals exceeds a predefined size. SPEA can be
very effective in sampling along the entire Pareto optimal front and distributing the
generated solutions over the tradeoff surface.
A systematic comparison of various MOEAs is provided by [177] (http://www.
tik.ee.ethz.ch/~zitzler/testdata.html) using six carefully chosen test functions [35] that feature convexity, nonconvexity, discrete Pareto fronts, multimodality, deception, and biased search spaces. A clear hierarchy of algorithms emerges regarding the distance to the Pareto optimal front, in descending order of merit: SPEA, NSGA, VEGA, HLGA, NPGA, FFGA. While there is a clear performance gap between SPEA and NSGA, as well as between NSGA and the remaining algorithms, the fronts achieved by VEGA, HLGA, NPGA, and FFGA are rather close.
tant factor for improving evolutionary multiobjective search. An elitist variant of
NSGA (NSGA-II) equals the performance of SPEA. The performance of the other
algorithms improved significantly when elitist strategy was included.
SPEA2 (in C language, http://www.tik.ee.ethz.ch/sop/pisa/) [179] improves SPEA
by incorporating a fine-grained fitness assignment strategy, a nearest neighbor density
estimation technique, and an enhanced archive truncation method. The convergence,
distribution, and elitism mechanisms in SPEA2 are raw fitness assignment, density,
and archive, respectively. Both the archive and population are assigned a fitness based
upon the strength and density estimation. SPEA2 and NSGA-II seem to behave in
a very similar manner, and they both outperform Pareto envelope-based selection
algorithm (PESA).
If x_i is nondominated in the union of the archive A and the population P, it is assigned the best raw fitness (i.e., zero).
Density estimation function is an adaptation of the kth nearest neighbor, where
the density at any point is a decreasing function of the distance to the kth nearest data
point. Density D(i) is defined to describe the crowdedness of xi , based on ranking
the distances of every individual to all the other individuals. A truncation method
based upon the density is applied to keep the archive at a fixed size.
Every individual is assigned a fitness value, which forms the basis of binary tournament selection:

F(i) = R(i) + D(i).   (23.4)

For a nondominated individual x_i, R(i) = 0 and D(i) < 1; thus the MOP is transformed into a single-objective minimization problem.
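A compact Python sketch of this fitness assignment, using the standard SPEA2 definitions of strength, raw fitness, and density (feeding the merged archive-plus-population set and choosing k = sqrt(N) are assumptions of this sketch, not a reference implementation).

import numpy as np

def spea2_fitness(objs, k=None):
    """SPEA2-style fitness F(i) = R(i) + D(i):
    S(i): number of solutions that i dominates;
    R(i): sum of S(j) over all j that dominate i (0 for nondominated solutions);
    D(i): 1 / (sigma_k + 2), sigma_k = distance to the k-th nearest neighbor."""
    objs = np.asarray(objs, dtype=float)
    n = len(objs)
    k = k or int(np.sqrt(n))
    dom = np.array([[np.all(objs[i] <= objs[j]) and np.any(objs[i] < objs[j])
                     for j in range(n)] for i in range(n)])
    S = dom.sum(axis=1)                                    # strength values
    R = np.array([S[dom[:, i]].sum() for i in range(n)])   # raw fitness
    dist = np.linalg.norm(objs[:, None, :] - objs[None, :, :], axis=-1)
    sigma_k = np.sort(dist, axis=1)[:, k]                  # k-th nearest neighbor distance
    D = 1.0 / (sigma_k + 2.0)
    return R + D                                           # lower is better

# usage: the three nondominated points get R = 0 and hence F < 1
print(spea2_fitness([[1, 4], [4, 1], [3, 3], [5, 5]]))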
The adaptive grid mechanism was first used in PAES [84]. In PAES, an external
archive is incorporated to store all the nondominated solutions obtained. In reality,
the adaptive grid is a space formed by hypercubes. As it is effective in maintaining
diversity of nondominated solutions, adaptive grid and its variations are used by
a number of algorithms. The adaptive grid is started when the upper limit of the external archive is reached. This means that it cannot maintain a good distribution of nondominated solutions when the number of nondominated solutions is below this limit.
PAES [84] expands ES to solve MOPs. PAES ensures that the nondominated
solutions residing in an uncrowded location will survive. In the simplest (1+1)-
PAES, there are three groups: one current individual, one updated individual, and
one archive containing all the nondominated individuals found thus far. It consists
of a (1 + 1)-ES employing local search in combination with a reference archive that
records some of the nondominated solutions previously found in order to identify the
approximate dominance ranking of the current and candidate solution vectors. If the
archive size exceeds a threshold, then it is pruned by removing the individual that has
the smallest crowding distance. This archive is used as a reference set against which
each mutated individual is compared. The mechanism used to maintain diversity
consists of a crowding procedure that divides the objective space in a recursive
manner. Each solution is placed in a certain grid location based on the values of its
objectives. A map of such grid is maintained, indicating the number of solutions that
reside in each grid location. Since the procedure is adaptive, no extra parameters are
required (except for the number of divisions of the objective space). (1 + λ)-PAES
and (μ + λ)-PAES extend the basic algorithm. (1 + 1)-PAES is comparable with
NSGA-II.
Memetic PAES [85] associates a global search evolutionary scheme with mutation
and recombination operators of a population, with the local search method of (1 + 1)-
PAES [84]. Memetic PAES outperforms (1 + 1)-PAES, and performs similar to
SPEA.
Motivated from SPEA, PESA [31] uses an external archive to store the evolved
Pareto front and an internal population to generate new candidate solutions. PESA
maintains a hyper grid-based scheme to keep track of the degree of crowding in
different regions of the archive, which is applied to maintain the diversity of the
external population and to select the internal population from the external archive.
In PESA, mating selection is only performed on the archive which stores the current
nondominated set. The same holds for NSGA-II. PESA uses binary tournament
selection to generate new population from the archive. The archive in PESA only
contains the nondominated solutions found thus far. The one with the smaller squeeze
factor, i.e., the one residing in the less crowded hyperbox, wins the tournament. PESA
generally outperforms both SPEA and PAES. Both SPEA and PESA outperform
NSGA and NPGA.
As to the diversity mechanisms, NSGA-II uses the crowding distance and SPEA2
the density function. In PESA, hyperbox method and squeeze factor concept are
used. For the archive-updating mechanism, if a new individual is nondominated in
both the population and the archive, and the archive is full, then select the individual
in the archive with the largest squeeze factor to be replaced by the new one.
PESA-II [29] differs from PESA only in the selection mechanism. In PESA-II,
the unit of selection is a hyperbox in the objective space. Every hyperbox has its own
squeeze factor. The hyperbox with the smallest squeeze factor will be selected first
and then a randomly chosen individual is selected. Region-based selection could
ensure a good distribution along the Pareto front. Instead of assigning a selective
fitness to an individual, selective fitness is assigned to the hyperboxes in an elitist
fashion in the objective space which are currently occupied by at least one individual
in the current approximation to the Pareto frontier. A hyperbox is thereby selected,
and the resulting selected individual is randomly chosen from this hyperbox. This
method of selection is more sensitive to ensuring a good spread of development
along the Pareto frontier than individual-based selection. PESA-II gives significantly
superior results to PAES, PESA, and SPEA.
extra measures for maintaining the population diversity. Compared with NSGA-
II with the same reproduction operators on the test instances, MOEA/D-DE is less
sensitive to the control parameters in DE operator than NSGA-II-DE. MOEA/D could
significantly outperform NSGA-II on these test instances. Optional local search is
used to guarantee that the offspring will be a legal and feasible solution and they
utilize the archive to contain the nondominated solutions found thus far.
In MOEA/D, each subproblem is paired with a solution in the current popula-
tion. Subproblems and solutions are two sets of agents. The selection of promising
solutions for subproblems can be regarded as a matching between subproblems and
solutions. Stable matching, proposed in economics, can effectively resolve conflicts
of interests among selfish agents in the market. MOEA/D-STM [97] is derived from
MOEA/D-DRA [171]. The only difference between MOEA/D-STM and MOEA/D-
DRA is in the selection. MOEA/D-STM uses stable matching model to implement
the selection operator in MOEA/D. The subproblem preference encourages conver-
gence, whereas the solution preference promotes population diversity. Stable match-
ing model is used to balance these two preferences and thus, trading off the conver-
gence and diversity of the evolutionary search. The stable outcome produced by the
stable matching model matches each subproblem with one single solution, whereas
each subproblem agent in MOEA/D, by using its aggregation function, ranks all
solutions in the solution pool.
In micro-GA [27], the initial population memory is divided into a replaceable and a
non-replaceable portion. The non-replaceable portion is randomly generated, never
changes during the entire run and it provides the required diversity. The replaceable
portion changes after each generation. The population of each generation is taken
randomly from both population portions, and then undergoes conventional genetic
operators. After one generation, two nondominated vectors are chosen from the final
population and they are compared with the contents of the external archive. If either
of them (or both) remains as nondominated after comparing it against the vectors in
this archive, then they are included in the archive. This is the historical archive of
nondominated vectors. All dominated vectors contained in the archive are eliminated.
Micro-GA uses three forms of elitism. It can produce an important portion of the
Pareto front at a very low computational cost. During the evolutionary process, micro-GA starts from points that get closer and closer to the Pareto front, which makes it very efficient. The crowdedness evaluation in micro-GA is the squeeze
factor.
Incrementing MOEA [151] has a dynamic population size that is computed adap-
tively according to the discovered Pareto front and desired population density. It
incorporates the method of fuzzy boundary local perturbation with interactive local
fine-tuning for broader neighborhood exploration.
fitness scores favor solutions that are closer to the Pareto frontier and that are located
at underrepresented regions.
ParEGO [83] is an extension of efficient global optimization to the multiobjective
framework. The objective values of solutions are scalarized with a weighted Tcheby-
cheff function and a model based on the Gaussian process at each iteration is used
to better approximate the Pareto frontier. ParEGO generally outperforms NSGA-II.
Hill climber with sidestep [91] is a local search-based procedure. It has been
integrated into a given evolutionary method such as SPEA2 and NSGA-II leading
to new memetic algorithms. The local search procedure is intended to be capable of
both moving toward and along the (local) Pareto set depending on the distance of
the current iterate toward this set. It utilizes the geometry of the directional cones of
such optimization problems and works with or without gradient information.
Genetic diversity evaluation method [155] is a diversity-preserving mechanism.
It considers a distance-based measure of genetic diversity as a real objective in
fitness assignment. This provides a dual selection pressure toward the exploitation
of current nondominated solutions and the exploration of the search space. Fitness
assignment is performed by ranking the solutions according to the Pareto ranks scored
with respect to the objectives of the MOP and a distance-based measure of genetic
diversity, creating a two-criteria optimization problem in which the objectives are
the goals of the search process itself. Genetic diversity EA [155] is a multiobjective
EA that is strictly designed around genetic diversity evaluation method, and features
a (μ + λ) selection scheme as an elitist strategy.
NPGA [71] is a global nonelitist selection algorithm for finding the Pareto optimal
set. It modifies GA to deal with multiple objectives by incorporating the concept of
Pareto dominance in its selection operator, and applying a niching pressure to spread
its population out along the Pareto optimal tradeoff surface. Niched Pareto GA 2
(NPGA2) [46] improves NPGA by using Pareto-rank-based tournament selection
and criteria-space niching to find nondominated frontiers.
Hypervolume-based algorithm (HypE) [7] uses a hypervolume estimation algo-
rithm for multiobjective optimization, by which the accuracy of the estimates can
be traded off against the available computing resources. Like standard MOEA, it is
based on fitness assignment schemes, and consists of successive application of mat-
ing selection, variation, and environmental selection. The hypervolume indicator
is applied in environmental selection. In HypE, a Monte Carlo simulation method
is used to approximate the exact hypervolume value. This approximation method
significantly reduces the computational load and makes HypE very competitive for
solving many-objective optimization problems.
Single front GA [20,21] is an island model for multiobjective problems with a
clearing procedure that uses a grid in the objective space for maintaining diversity
and the distribution of the solutions in the Pareto front. Each subpopulation (island)
is associated with a different area in the search space. Compared with NSGA-II and
SPEA2, single front GAs (SFGA, and especially SFGA2) have obtained adequate
quality in the solutions in very little time. Single front GAs could be appropriate
in dealing with optimization problems with high rates of change, and thus stronger
time constraints, such as multiobjective optimization for dynamic problems.
23.3 Performance Metrics
The three performance objectives are convergence to the Pareto front, evenly distributed Pareto optimal solutions, and coverage of the entire front.
In multiobjective optimization, a theoretically well-supported alternative to Pareto
dominance is the use of a set-based indicator function to measure the quality of a
Pareto front approximation of solution sets. Some performance metrics are described
in this section.
Generational Distance
Generational distance [157] aims to find a set of nondominated solutions having
the lowest distance with the Pareto optimal fronts. An algorithm with the minimum
generational distance has the best convergence to Pareto optimal fronts.
Generational distance is defined as the average distance from an approximation
set of solutions, P , found by evolution to the global Pareto optimal set (i.e., the
reference set):
D(P, P^*) = \frac{\sum_{s \in P} \min\{\|x_1^* - s\|_2, \ldots, \|x_{N_P}^* - s\|_2\}}{|P|} = \frac{\sum_{i=1}^{n} d_i}{n},   (23.6)

where |P^*| = N_P is the cardinality of the set P^* = \{x_1^*, \ldots, x_{N_P}^*\}, d_i is the Euclidean distance (in the objective space) from solution i to the nearest solution in the Pareto optimal set, and n is the size of P. This metric describes convergence, but not the distribution of the solutions over the entire Pareto optimal front. The metric measures
the distance of the elements in the set P from the nearest point of the reference
Pareto frontier, P being an approximation of the true front and P ∗ the reference
Pareto optimal set.
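A direct Python rendering of (23.6); the linear reference front in the usage example is synthetic and serves only as an illustration.

import numpy as np

def generational_distance(P, reference):
    """Generational distance (23.6): mean Euclidean distance, in objective space,
    from each solution of the approximation set P to its nearest reference point."""
    P, reference = np.asarray(P, float), np.asarray(reference, float)
    d = np.linalg.norm(P[:, None, :] - reference[None, :, :], axis=-1).min(axis=1)
    return d.mean()

# usage: reference front sampled from f2 = 1 - f1 on [0, 1]
ref = np.array([[t, 1 - t] for t in np.linspace(0, 1, 101)])
approx = np.array([[0.0, 1.05], [0.5, 0.55], [1.02, 0.0]])
print(generational_distance(approx, ref))   # a small value indicates good convergence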
Spacing
Spacing metric [141] is used to measure the distribution of the nondominated solu-
tions obtained by an algorithm, i.e., the obtained Pareto optimal front [36]
Sp = \sqrt{\frac{1}{|P|} \sum_{i=1}^{|P|} (d_i - \bar{d})^2},   (23.7)

where |P| is the number of members in the approximate Pareto optimal front P, \bar{d} is the mean of all d_i, and d_i is the distance between the member x_i in P and its nearest member in P,

d_i = \min_{x_j \in P,\, j \ne i} \sum_{m=1}^{k} |f_m(x_i) - f_m(x_j)|,   j = 1, 2, \ldots, |P|.   (23.8)
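A Python sketch of the spacing metric, with nearest-neighbor distances computed by the L1 form of (23.8) (an illustration; a perfectly even front yields a value of zero).

import numpy as np

def spacing(P):
    """Spacing (23.7)-(23.8): spread of the distances d_i from each member of the
    approximation set to its nearest neighbor in objective space."""
    P = np.asarray(P, float)
    diff = np.abs(P[:, None, :] - P[None, :, :]).sum(axis=-1)   # pairwise L1 distances
    np.fill_diagonal(diff, np.inf)
    d = diff.min(axis=1)
    return np.sqrt(np.mean((d - d.mean()) ** 2))

# usage: an evenly spaced front has spacing 0
print(spacing([[0.0, 1.0], [0.5, 0.5], [1.0, 0.0]]))   # 0.0
print(spacing([[0.0, 1.0], [0.1, 0.9], [1.0, 0.0]]))   # > 0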
Inverted Generational Distance
A low IGD value (ideally zero) is preferable, indicating that the obtained solution set is close to the Pareto front and also has a good distribution. Knowledge of the Pareto
front of a test problem is required for the calculation of IGD. The DTLZ problems
have known optimal fronts.
Let PF be the Pareto optimal set (a reference set representing the Pareto front);
the IGD value from PF to the obtained solution set P is defined by
IGD(P) = \frac{\sum_{x \in PF} d(x, P)}{|PF|} = \frac{\sum_{i=1}^{|PF|} d_i}{|PF|},   (23.11)
where |PF| is the size of the Pareto optimal set, d(x, P) is the minimum Euclidean
distance (in the objective space) from x to P, and d_i is the Euclidean distance from
solution i of the Pareto optimal set to the nearest solution in P.
In order to obtain a low IGD value, P needs to cover all parts of the Pareto optimal
set. However, the metric only considers, for each solution in the Pareto optimal set,
the solution of P closest to it.
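Since IGD simply exchanges the roles of the two sets in the generational distance, it
can be computed with the same kind of code; a minimal sketch, reusing the
hypothetical generational_distance helper from the sketch above:

  % IGD, Eq. (23.11): average distance from each reference point in PF to its
  % nearest neighbor in the obtained set P (roles of the two sets swapped).
  igd = @(P_obj, PF_obj) generational_distance(PF_obj, P_obj);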
The average IGD over all time steps is a dynamic performance metric:
\overline{IGD} = \frac{\sum_{i=1}^{T_{max}} IGD_i}{T_{max}},   (23.12)
where IGD_i is the IGD value at time step i, and T_{max} is the maximum number of
time steps.
Hypervolume
Hypervolume metric [177,181] is a unary measure of the hypervolume in the objec-
tive space that is dominated by a set of nondominated points. Hypervolume measures
the volume of the objective space covered/dominated by the approximation set, rep-
resenting a combination of proximity and diversity. For the problem whose Pareto
front is unknown, hypervolume is a popular performance metric.
The hypervolume measure is strictly monotonic with regard to Pareto dominance,
i.e., if a set A dominates the set B, then HV(A) > HV(B), assuming the metric is to be
maximized. However, hypervolume calculation requires a runtime that increases
exponentially with respect to the number of objectives. The R2 metric [64] is
considered an alternative to hypervolume. It is weakly monotonic, i.e., if A weakly
dominates B, the indicator value of A is at least as good as that of B. Its calculation
is much easier.
Hypervolume calculates the volume of the objective space between the obtained
solution set and a reference point, and a larger value is preferable. Before computing
hypervolume, the values of all objectives are normalized to the range of a reference
point for each test problem. Choosing a reference point that is slightly larger than
the worst value of each objective on the Pareto front is suitable since the effects
of convergence and diversity of the set can be well balanced [5]. For minimization
problems, hypervolume values are normalized as
HV_k = \frac{HV_k^*}{\max_{i=1,2,\ldots,N} HV_i^*},   (23.13)
where HV_k^*, k = 1, 2, \ldots, N, is the kth hypervolume value for a test problem, and
HV_k is the normalized value of HV_k^*.
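For the two-objective minimization case, the hypervolume can be computed exactly by
sweeping the points in order of the first objective; a minimal MATLAB sketch,
assuming F is a k-by-2 matrix of objective vectors and ref is a reference point that is
worse than all of them in both objectives (hypothetical variable names). The general
M-objective case requires the more elaborate, exponentially more expensive
algorithms discussed above.

  % Dominated hypervolume of a two-objective set F with respect to ref (minimization).
  function hv = hypervolume2d(F, ref)
      [~, idx] = sort(F(:,1));            % sweep along the first objective
      F = F(idx, :);
      k = size(F, 1);
      hv = 0;
      best_f2 = ref(2);                   % best second-objective value seen so far
      for i = 1:k
          if i < k
              next_f1 = F(i+1, 1);
          else
              next_f1 = ref(1);
          end
          best_f2 = min(best_f2, F(i,2));
          width  = max(next_f1 - F(i,1), 0);
          height = max(ref(2) - best_f2, 0);
          hv = hv + width * height;       % add one vertical slice of the dominated region
      end
  end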
Two Set Coverage
Two set coverage (SC) metric [177], a relative coverage comparison of two sets, is
defined as the mapping of the ordered pair (A, B) to [0, 1]:
SC(A, B) = \frac{|\{x_b \in B;\ \exists x_a \in A : x_b \prec x_a\}|}{|B|}.   (23.14)
If every point in B is dominated by or equal to some point in A, then SC(A, B) = 1;
if no point in B is covered by a point in A, then SC(A, B) = 0. SC(A, B) denotes the
fraction of solutions in B that are dominated by solutions in A. Note that SC(A, B) is
not necessarily equal to 1 − SC(B, A), so both orderings need to be evaluated.
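A minimal MATLAB sketch of the SC metric (23.14) for minimization problems,
assuming A and B are matrices whose rows are objective vectors (hypothetical
variable names); a point of B counts as covered if some point of A is no worse in
every objective:

  % Two set coverage SC(A,B), Eq. (23.14).
  function sc = set_coverage(A, B)
      nb = size(B, 1);
      covered = 0;
      for i = 1:nb
          weakly_dominated = any(all(A <= repmat(B(i,:), size(A,1), 1), 2));
          covered = covered + weakly_dominated;
      end
      sc = covered / nb;
  end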
The additive ε-indicator (ε+-indicator) is the smallest distance the approximation set
must be translated so that every objective vector in the reference set is covered.
This identifies situations in which the approximation set contains one or more out-
lying objective vectors with poor proximity. A binary ε-dominance-based indicator
is defined in [182].
the objectives, but with near optimal values in the others. Consequently, the solutions
in the final solution set may be distributed uniformly in the objective space, but away
from the desired Pareto front.
Some studies have shown that a random search algorithm may even achieve bet-
ter results than Pareto-based algorithms in MOPs with a high number of objectives
[86,130]. The selection rule created by the Pareto dominance makes the solutions
nondominated with respect to one another, at an early stage of MOEAs [30,48]. In
these algorithms, the ineffectiveness of the Pareto dominance relation for a high-
dimensional space leads diversity maintenance mechanisms to play a leading role
during the evolutionary process, while the preference of diversity maintenance mech-
anisms for individuals in sparse regions results in the final solutions distributed widely
over the objective space but distant from the desired Pareto front.
Let us consider a solution set of size N for an M (M > 4) objective optimization
problem. Assume that each of the solutions is distinct in all M objectives and each
objective value is a continuous variable. The expected number of nondominated
solutions is given by [17]
A(N, M) = \sum_{k=1}^{N} (-1)^{k+1} \binom{N}{k} \frac{1}{k^{M-1}}.   (23.15)
By dividing the above expression by N, we have
P(N, M) = \frac{A(N, M)}{N} = \frac{\sum_{k=1}^{N} (-1)^{k+1} \binom{N}{k} \frac{1}{k^{M-1}}}{N}.   (23.16)
For a given N, P(N, M) converges to 1 with increasing M, as shown in Figure 23.4.
This indicates that if we follow the selection rule defined by Pareto dominance, the
chance of getting a nondominated solution increases as the number M of objectives
is increased. The problem can be solved by changing the dominance criterion [48].
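A small MATLAB sketch evaluating Eq. (23.16) directly (for large N the alternating
sum may suffer from numerical cancellation, so moderate values of N are assumed):

  % Expected fraction of nondominated solutions, Eq. (23.16).
  function p = expected_nondominated_fraction(N, M)
      A = 0;
      for k = 1:N
          A = A + (-1)^(k+1) * nchoosek(N, k) / k^(M-1);   % Eq. (23.15)
      end
      p = A / N;
  end
  % e.g., expected_nondominated_fraction(10, 20) is close to 1.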
Nondominance is an inadequate strategy for convergence to the Pareto front for
such problems, as almost all solutions in the population become nondominated.
Figure 23.4 P(N, M) versus the number of objectives M, for N = 10, 20, and 50.
Based on DEMO [136], α-DEMO [8] implements the technique of selecting a subset
of conflicting objectives using a correlation-based ordering of objectives. α is a
parameter determining the number of conflicting objectives to be selected. A new
form of elitism is proposed so as to restrict the number of higher ranked solutions that
are selected in the next population. α-DEMO algorithm [8] works faster than other
MOEAs based on dimensionality reduction, such as KOSSA [75], MOEA/D [170],
and HypE [7] for many-objective optimization problems, while having competitive
performance.
Shift-based density estimation (SDE) [98] can accurately reflect the density of
individuals in the population. It is a modification of traditional density estimation
in Pareto-based algorithms for dealing with many-objective problems. By shifting
individuals’ positions according to their relative proximity to the Pareto front, SDE
considers both convergence and diversity for each individual in the population. The
implementation of SDE is simple and it can be applied to any density estimator with-
out additional parameters. SDE has been applied to three Pareto-based algorithms,
namely, NSGA-II, SPEA2, and PESA-II. SPEA2+SDE provides a good balance
between convergence and diversity. When addressing a many-objective problem,
SDE may be easily and effectively adopted, as long as the algorithm’s density esti-
mator can accurately reflect the density of individuals.
Grid-based EA [165] solves many-objective optimization problems. It exploits
the potential of the grid-based approach to strengthen the selection pressure toward
the optimal direction while maintaining an extensive and uniform distribution among
solutions. Grid dominance and grid difference are used to determine the mutual rela-
tionship of individuals in a grid environment. Three grid-based criteria, namely, grid
ranking, grid crowding distance, and grid coordinate point distance, are incorporated
into the fitness of individuals to distinguish them in both the mating and environ-
mental selection processes. Moreover, a fitness adjustment strategy is developed by
adaptively punishing individuals based on the neighborhood and grid dominance
relations in order to avoid partial overcrowding as well as guide the search toward
different directions in the archive. The designed density estimator of an individ-
ual takes into account not only the number of its neighbors, but also the distance
difference between itself and these neighbors.
A diagnostic assessment framework for rigorously evaluating the effectiveness,
reliability, efficiency, and controllability of many-objective evolutionary optimiza-
tion as well as identifying their search controls and failure modes is proposed in [62].
Given the variety of fitness landscapes and the complexity of search population
dynamics, the operators used during multiobjective search are adapted based on
their success in guiding search [158]. Building on this, Borg MOEA [63] handles
many-objective multimodal problems using an auto-adaptive multioperator recom-
bination operator. This adaptive configuration of simulated binary crossover, DE,
parent-centric recombination (PCX), unimodal normal distribution crossover, sim-
plex crossover, polynomial mutation, and uniform mutation enables Borg MOEA to
quickly adapt to the problem’s local characteristics. The auto-adaptive multioperator
recombination, adaptive population sizing, and time continuation components all
exploit dynamic feedback from an ε-dominance archive to guarantee convergence
and diversity throughout search, according to the theoretical analysis of [94]. Borg
MOEA combines ε-dominance, a measure of convergence speed named ε-progress,
randomized restarts, and auto-adaptive multioperator recombination into a unified
optimization framework. Borg meets or exceeds ε-NSGA-II, ε-MOEA, OMOPSO,
GDE3, MOEA/D, SPEA2, and NSGA-II on the majority of the tested problems.
NSGA-III [38,76] is a reference point-based many-objective EA following the
NSGA-II framework. It emphasizes population members that are nondominated, yet
close to a set of supplied reference points. NSGA-III outperforms MOEA/D-based
algorithms for unconstrained and constrained problems with a large number of objectives.
Clustering–ranking EA [19] implements clustering and ranking sequentially for
many-objective optimization. Clustering incorporates NSGA-III, using a series of
reference lines as the cluster centroid. The solutions are ranked according to their
degree of closeness to the true Pareto front. An environmental selection operation is
performed on every cluster to promote both convergence and diversity.
MOEA equipped with the preference relation can be integrated into an interactive
optimization method. A preference relation based on a reference point approach [108]
enables integrating decision-maker’s preferences into an MOEA. Besides finding the
optimal solution of the achievement scalarizing function, the new preference relation
allows the decision-maker to find a set of solutions around that optimal solution.
Since the preference relation induces a finer order on vectors of the objective space
than that achieved by the Pareto dominance relation, it is appropriate to cope with
many-objective problems.
Various multiobjective PSO algorithms have been developed for MOPs [14,26,28,
67,68,100,101,129,133,134,144]. These designs generally use a fixed population
size throughout the process of searching for possible nondominated solutions until
the Pareto optimal set is obtained.
Multiobjective PSO (MOPSO) [28] incorporates Pareto dominance and an adap-
tive mutation operator. It uses an archive of particles for guiding the flight of other
particles. The algorithm is relatively easy to implement. It is able to cover the full
Pareto front of all the functions used.
Multiswarm multiobjective PSO [26] divides the decision space into multiple
subswarms via clustering to improve the diversity of solutions on the Pareto optimal
front. PSO is executed in each subswarm. Every particle will deposit its flight expe-
riences after each flight cycle. At some points during the search, different subswarms
exchange information so that each subswarm chooses a different leader to preserve
diversity. The number of particles in each swarm is predetermined. AMOPSO [129]
Archive-based hybrid scatter search (AbYSS) [123] follows scatter search structure
but uses mutation and crossover operators from EAs for solving MOPs. AbYSS
incorporates Pareto dominance, density estimation, and an archive. An archive is
used to store the nondominated solutions found during the search, following the
scheme applied by PAES, but using the crowding distance of NSGA-II as a niching
measure instead of the adaptive grid. Selection of solutions from the initial set used
to build the reference set applies the SPEA2 density estimation. AbYSS outperforms
NSGA-II and SPEA2 in terms of diversity of solutions, and it obtains very competitive
results according to the convergence to the true Pareto fronts and the hypervolume
metric.
MOSS [11] is a hybrid tabu/scatter search method for MOPs. It uses a weighted
sum approach. Multistart tabu search is used as the diversification method for gener-
ating a diverse approximation to the Pareto optimal set of solutions. It is also applied
to rebuild the reference set after each iteration of scatter search. Each tabu search
works with its own starting point, recency memory, and aspiration threshold. Fre-
quency memory is used to diversify the search and it is shared between the tabu
search algorithms. SSPMO [117] is also a hybrid scatter/tabu search algorithm for
continuous MOPs. Part of the reference set is obtained by selecting the best solu-
tions from the initial set for each objective function. The rest of the reference set
is obtained using the usual approach of selecting the remaining solutions from the
initial set which maximize the distance to the solutions already in the reference set.
SSMO [122] is a scatter search-based algorithm for solving MOPs. It incorpo-
rates Pareto dominance, crowding, and Pareto ranking. It is characterized by using
a nondominated sorting procedure to build the reference set from the initial set
where all the nondominated solutions found in the scatter search loop are stored,
and a mutation-based local search is used to improve the solutions obtained from the
reference set.
M-scatter search [156] extends scatter search to multiobjective optimization by
using nondominated sorting and niched-type penalty method of NSGA. It uses an
archive to store nondominated solutions found during the computation. NSGA nich-
ing method is applied for updating the archive so as to keep nondominated solutions
uniformly distributed along the Pareto front.
Multiobjective SA [120] uses dominance concept and annealing scheme for efficient
search. In [120], the relative dominance of the current and proposed solutions is
tested by using dominance in state change probabilities, and the proposal is accepted
when the proposed solution dominates the current solution. In [146], multiobjective
optimization is mapped to single-objective optimization by using the true tradeoff
surface, and single-objective SA is then applied. Exploration of the full tradeoff
surface is encouraged. The method uses the relative dominance of a solution as the
system energy for optimization. It promotes rapid convergence to the true Pareto front
with a good coverage of solutions across it, comparing favorably with both NSGA-II
and multiobjective SA [120]. SA-based multiobjective optimization [9] incorporates
an archive to provide a set of tradeoff solutions. To determine the acceptance prob-
ability of a new solution against the current solution, an elaborate procedure takes
into account the domination status of the new solution with the current solution, as
well as those in the archive.
A taxonomy and empirical analysis of multiobjective ACO algorithms is given in [53].
In [118], different coarse-grained distribution schemes for multiobjective ACO
algorithms are proposed, based on independent multi-colony structures. An island-based
model is introduced where the colonies communicate by migrating ants, following a
neighborhood topology that fits the search space. The methods aim to cover the whole
Pareto front; thus, each subcolony searches for solutions in a limited area.
Dynamic multi-colony multiobjective ABC [163] uses the multi-deme model and
a dynamic information exchange strategy. Colonies search independently most of
the time and share information occasionally. In each colony, there are S bees,
comprising an equal number of employed bees and onlooker bees. For each food source,
the employed or onlooker bee will explore a temporary position generated by using
neighboring information, and the better one determined by a greedy selection strategy
is kept for the next iterations. The external archive is employed to store nondominated
solutions found during the search process, and the diversity over the archived indi-
viduals is maintained by using crowding distance strategy. If a randomly generated
number is smaller than the migration rate, then an elite, identified as the intermediate
individual with the maximum crowding distance value, is used to replace the worst
food source in a randomly selected colony.
In the elite-guided multiobjective ABC algorithm [70], fast nondominated sorting and
a population selection strategy are applied to measure the quality of the solutions and
select the better ones. The neighborhoods of the existing solutions are exploited to
generate new solutions under the guidance of the elite. A fitness calculation method
is used to calculate the selection probability for onlookers.
Bacterial chemotaxis algorithm for multiobjective optimization [61] uses fast
nondominated sorting procedure, communication between the colony members and
a simple chemotactical strategy to change the bacterial positions in order to explore
the search space to find several optimal solutions. Multiobjective bacterial colony
chemotaxis algorithm [109] adds improved adaptive grid, oriented mutation based
on grid, and adaptive external archive to bacterial colony chemotaxis algorithm to
improve the convergence and the diversity of nondominated solutions.
A general framework for combining MOEAs with interactive preference infor-
mation and ordinal regression is presented in [15]. The interactive MOEA attempts
to learn a value function capturing the users’ true preferences. At regular intervals,
the user is asked to rank a single pair of solutions. This information is used to update
the algorithm’s internal value function model, and the model is used in subsequent
generations to rank solutions that are incomparable according to dominance.
HP-CRO [102] is a hybrid of PSO and chemical reaction optimization (CRO) for
multiobjective optimization. It creates new molecules (particles) used by CRO
operations as well as by PSO mechanisms. HP-CRO outperforms FMOPSO, MOPSO,
NSGA-II, and SPEA2.
Examples of other methods for multiobjective optimization are multiobjective
backtracking search algorithm [116], multiobjective cultural algorithm along with
evolutionary programming [24], multiobjective ABC by combining modified near-
est neighbor approach and improved inver-over operation [96], hybrid multiobjective
optimization based on shuffled frog leaping and bacteria optimization [131], mul-
tiobjective cuckoo search [65], self-adaptive multiobjective harmony search [34],
multiobjective teaching–learning-based optimization [132], multiobjective fish
school search [10], multiobjective invasive weed optimization [90], multiobjective
BBO [33,115], multiobjective bat algorithm [166], multiobjective brainstorming
optimization [164], multiobjective water cycle algorithm (MOWCA) [137], Gaussian
bare-bones multiobjective imperialist competitive algorithm [54], multiobjective dif-
ferential search algorithm [89], and multiobjective membrane algorithms [69].
Problems
23.1 Apply gamultiobj solver to solve the ZDT1 problem in the Appendix as an
instance of unconstrained multiobjective optimization.
23.2 Apply gamultiobj solver to solve the Srinivas problem in the Appendix as
an instance of constrained multiobjective optimization.
23.3 Run the accompanying MATLAB code of MOEA/D to find the Pareto front
of Fonseca function in the Appendix. Investigate how to improve the result by
adjusting the parameters.
References
1. Abbass HA, Sarker R, Newton C. PDE: a Pareto-frontier differential evolution approach for
multi-objective optimization problems. In: Proceedings of IEEE congress on evolutionary
computation (CEC), Seoul, South Korea, May 2001. p. 971–978.
2. Abbass HA. The self-adaptive pareto differential evolution algorithm. In: Proceedings of IEEE
congress on evolutionary computation (CEC), Honolulu, HI, USA, May 2002. p. 831–836.
3. Agrawal S, Panigrahi BK, Tiwari MK. Multiobjective particle swarm algorithm with fuzzy
clustering for electrical power dispatch. IEEE Trans Evol Comput. 2008;12(5):529–41.
4. Asafuddoula M, Ray T, Sarker R. A decomposition-based evolutionary algorithm for many
objective optimization. IEEE Trans Evol Comput. 2015;19(3):445–60.
5. Auger A, Bader J, Brockhoff D, Zitzler E. Theory of the hypervolume indicator: optimal μ-
distributions and the choice of the reference point. In: Proceedings of the 10th ACM SIGEVO
workshop on foundations of genetic algorithms (FOGA), Orlando, FL, USA, Jan 2009. p.
87–102.
6. Babbar M, Lakshmikantha A, Goldberg DE. A modified NSGA-II to solve noisy multi-
objective problems. In: Proceedings of genetic and evolutionary computation conference
(GECCO), Chicago, IL, USA, July 2003. p. 21–27.
7. Bader J, Zitzler E. HypE: an algorithm for fast hypervolume-based many-objective optimiza-
tion. Evol Comput. 2011;19(1):45–76.
8. Bandyopadhyay S, Mukherjee A. An algorithm for many-objective optimization with
reduced objective computations: a study in differential evolution. IEEE Trans Evol Com-
put. 2015;19(3):400–13.
9. Bandyopadhyay S, Saha S, Maulik U, Deb K. A simulated annealing-based multiobjective
optimization algorithm: AMOSA. IEEE Trans Evol Comput. 2008;12(3):269–83.
10. Bastos-Filho CJA, Guimaraes ACS. Multi-objective fish school search. Int J Swarm Intell
Res. 2015;6(1):18p.
11. Beausoleil RP. Moss: multiobjective scatter search applied to nonlinear multiple criteria opti-
mization. Eur J Oper Res. 2006;169(2):426–49.
12. Bosman PAN, Thierens D. The balance between proximity and diversity in multiobjective
evolutionary algorithms. IEEE Trans Evol Comput. 2003;7(2):174–88.
13. Bosman PAN, Thierens D. The naive MIDEA: a baseline multi-objective EA. In: Proceed-
ings of the 3rd international conference on evolutionary multi-criterion optimization (EMO),
Guanajuato, Mexico, March 2005. p. 428–442.
14. Branke J, Mostaghim S. About selecting the personal best in multiobjective particle swarm
optimization. In: Proceedings of conference on parallel problem solving from nature (PPSN
IX), Reykjavik, Iceland, Sept 2006. Berlin: Springer; 2006. p. 523–532.
15. Branke J, Greco S, Slowinski R, Zielniewicz P. Learning value functions in interactive evo-
lutionary multiobjective optimization. IEEE Trans Evol Comput. 2015;19(1):88–102.
16. Brockhoff D, Zitzler E. Objective reduction in evolutionary multiobjective optimization: the-
ory and applications. Evol Comput. 2009;17(2):135–66.
17. Buchta C. On the average number of maxima in a set of vectors. Inf Process Lett.
1989;33(2):63–5.
18. Bui LT, Liu J, Bender A, Barlow M, Wesolkowski S, Abbass HA. DMEA: a direction-based
multiobjective evolutionary algorithm. Memetic Comput. 2011;3:271–85.
19. Cai L, Qu S, Yuan Y, Yao X. A clustering-ranking method for many-objective optimization.
Appl Soft Comput. 2015;35:681–94.
20. Camara M, de Toro F, Ortega J. An analysis of multiobjective evolutionary algorithms for
optimization problems with time constraints. Appl Artif Intell. 2013;27:851–79.
21. Camara M, Ortega J, de Toro F. A single front genetic algorithm for parallel multi-objective
optimization in dynamic environments. Neurocomputing. 2009;72:3570–9.
22. Chen Q, Guan S-U. Incremental multiple objective genetic algorithms. IEEE Trans Syst Man
Cybern Part B. 2004;34(3):1325–34.
23. Clymont KM, Keedwell E. Deductive sort and climbing sort: new methods for non-dominated
sorting. Evol Comput. 2012;20(1):1–26.
24. Coello CAC, Becerra RL. Evolutionary multiobjective optimization using a cultural algorithm.
In: Proceedings of IEEE swarm intelligence symposium, Indianapolis, IN, USA, April 2003.
p. 6–13.
25. Coello CAC, Cortes NC. Solving multiobjective optimization problems using an artificial
immune system. Genet Program Evolvable Mach. 2005;6:163–90.
26. Coello CAC, Lechuga MS. MOPSO: a proposal for multiple objective particle swarm opti-
mization. In: Proceedings of IEEE congress on evolutionary computation (CEC), Honolulu,
HI, USA, May 2002. p. 1051–1056.
27. Coello CAC, Pulido GT. A micro-genetic algorithm for multiobjective optimization. In: Pro-
ceedings of the 1st international conference on evolutionary multi-criterion optimization
(EMO), Zurich, Switzerland, March 2001. p. 126–140.
28. Coello CAC, Pulido GT, Lechuga MS. Handling multiple objectives with particle swarm
optimization. IEEE Trans Evol Comput. 2004;8(3):256–79.
29. Corne DW, Jerram NR, Knowles JD, Oates MJ. PESA-II: region-based selection in evolu-
tionary multiobjective optimization. In: Proceedings of genetic and evolutionary computation
conference (GECCO), San Francisco, CA, USA, July 2001. p. 283–290.
30. Corne DW, Knowles JD. Techniques for highly multiobjective optimization: some nondomi-
nated points are better than others. In: Proceedings of the 9th ACM genetic and evolutionary
computation conference (GECCO), London, UK, July 2007. p. 773–780.
31. Corne DW, Knowles JD, Oates MJ. The pareto envelope-based selection algorithm for multi-
objective optimisation. In: Proceedings of the 6th international conference on parallel problem
solving from nature (PPSN VI), Paris, France, Sept 2000. Berlin: Springer; 2000. p. 839–848.
32. Costa M, Minisci E. MOPED: a multi-objective Parzen-based estimation of distribution algo-
rithm for continuous problems. In: Proceedings of the 2nd international conference on evo-
lutionary multi-criterion optimization (EMO), Faro, Portugal, April 2003. p. 282–294.
33. Costa e Silva MA, Coelho LDS, Lebensztajn L. Multiobjective biogeography-based optimiza-
tion based on predator-prey approach. IEEE Trans Magn. 2012;48(2):951–954.
34. Dai X, Yuan X, Zhang Z. A self-adaptive multi-objective harmony search algorithm based on
harmony memory variance. Appl Soft Comput. 2015;35:541–57.
35. Deb K. Multi-objective genetic algorithms: problem difficulties and construction of test prob-
lems. Evol Comput. 1999;7(3):205–30.
36. Deb K. Multi-objective optimization using evolutionary algorithms. Chichester: Wiley; 2001.
37. Deb K, Agrawal S, Pratap A, Meyarivan T. A fast elitist non-dominated sorting genetic
algorithm for multi-objective optimization: NSGA-II. In: Proceedings of the 6th international
conference on parallel problem solving from nature (PPSN VI), Paris, France, Sept 2000.
Berlin: Springer; 2000. p. 849–858.
38. Deb K, Jain H. An evolutionary many-objective optimization algorithm using reference-point
based non-dominated sorting approach, part i: solving problems with box constraints. IEEE
Trans Evol Comput. 2013;18(4):577–601.
39. Deb K, Pratap A, Agarwal S, Meyarivan T. A fast and elitist multi-objective genetic algorithm:
NSGA-II. IEEE Trans Evol Comput. 2002;6(2):182–97.
40. Deb K, Saxena DK. On finding Pareto-optimal solutions through dimensionality reduc-
tion for certain large-dimensional multi-objective optimization problems. KanGAL Report,
No.2005011. 2005.
41. Deb K, Sinha A, Kukkonen S. Multi-objective test problems, linkages, and evolutionary
methodologies. In: Proceedings of genetic and evolutinary computation conference (GECCO),
Seattle, WA, USA, July 2006. p. 1141–1148.
42. Deb K, Sundar J. Reference point based multiobjective optimization using evolutionary algo-
rithms. In: Proceedings of the 8th genetic and evolutionary computation conference (GECCO),
Seattle, WA, USA, July 2006. p. 635–642.
43. Depolli M, Trobec R, Filipic B. Asynchronous master-slave parallelization of differential
evolution for multi-objective optimization. Evol Comput. 2013;21(2):261–91.
44. di Pierro F, Khu S-T, Savic DA. An investigation on preference order ranking scheme for
multiobjective evolutionary optimization. IEEE Trans Evol Comput. 2007;11(1):17–45.
45. Elhossini A, Areibi S, Dony R. Strength Pareto particle swarm optimization and hybrid EA-
PSO for multi-objective optimization. Evol Comput. 2010;18(1):127–56.
46. Erickson M, Mayer A, Horn J. The niched pareto genetic algorithm 2 applied to the design
of groundwater remediation systems. In: Proceedings of the 1st international conference on
evolutionary multi-criterion optimization (EMO), Zurich, Switzerland, March 2001. p. 681–
695.
47. Fang H, Wang Q, Tu Y-C, Horstemeyer MF. An efficient non-dominated sorting method for
evolutionary algorithms. Evol Comput. 2008;16(3):355–84.
48. Farina M, Amato P. On the optimal solution definition for many-criteria optimization prob-
lems. In: Proceedings of the annual meeting of the North American fuzzy information process-
ing society (NAFIPS), New Orleans, LA, USA, June 2002. p. 233–238.
49. Fleming PJ, Purshouse RC, Lygoe RJ. Many-objective optimization: an engineering design
perspective. In: Proceedings of international conference on evolutionary multi-criterion opti-
mization (EMO), Guanajuato, Mexico, March 2005. p. 14–32.
50. Fonseca CM, Fleming PJ. Genetic algorithms for multiobjective optimization: formulation,
discussion and generalization. In: Forrest S, editor. Proceedings of the 5th international con-
ference on genetic algorithms, July 1993. San Francisco, CA: Morgan Kaufmann; 1993. p.
416–423.
51. Fonseca CM, Fleming PJ. Multiobjective optimization and multiple constraint handling with
evolutionary algorithms—Part i: a unified formulation; Part ii: application example. IEEE
Trans Syst Man Cybern Part A. 1998;28(1):26–37, 38–47.
52. Freschi F, Repetto M. Multiobjective optimization by a modified artificial immune system
algorithm. In: Proceedings of the 4th international conference on artificial immune systems
(ICARIS), Banff, Alberta, Canada, Aug 2005. pp. 248–261.
53. Garcia-Martinez C, Cordon O, Herrera F. A taxonomy and an empirical analysis of mul-
tiple objective ant colony optimization algorithms for the bi-criteria TSP. Eur J Oper Res.
2007;180(1):116–48.
54. Ghasemi M, Ghavidel S, Ghanbarian MM, Gitizadeh M. Multi-objective optimal electric
power planning in the power system using Gaussian bare-bones imperialist competitive algo-
rithm. Inf Sci. 2015;294:286–304.
55. Giagkiozis I, Purshouse RC, Fleming PJ. Generalized decomposition and cross entropy meth-
ods for many-objective optimization. Inf Sci. 2014;282:363–87.
56. Goh C-K, Tan KC. A competitive-cooperative coevolutionary paradigm for dynamic multi-
objective optimization. IEEE Trans Evol Comput. 2009;13(1):103–27.
57. Goh CK, Tan KC, Liu DS, Chiam SC. A competitive and cooperative coevolutionary
approach to multi-objective particle swarm optimization algorithm design. Eur J Oper Res.
2010;202(1):42–54.
58. Goldberg DE. Genetic algorithms in search, optimization, and machine learning. Reading,
MA, USA: Addison-Wesley; 1989.
59. Gong M, Jiao L, Du H, Bo L. Multiobjective immune algorithm with nondominated neighbor-
based selection. Evol Comput. 2008;16(2):225–55.
60. Guevara-Souza M, Vallejo EE. Using a simulated Wolbachia infection mechanism to improve
multi-objective evolutionary algorithms. Nat Comput. 2015;14:157–67.
61. Guzman MA, Delgado A, De Carvalho J. A novel multi-objective optimization algorithm
based on bacterial chemotaxis. Eng Appl Artif Intell. 2010;23:292–301.
62. Hadka D, Reed P. Diagnostic assessment of search controls and failure modes in many-
objective evolutionary optimization. Evol Comput. 2012;20(3):423–52.
63. Hadka D, Reed P. Borg: an auto-adaptive many-objective evolutionary computing framework.
Evol Comput. 2013;21:231–59.
64. Hansen MP, Jaszkiewicz A. Evaluating the quality of approximations to the non-dominated
set. Technical Report IMM-REP-1998-7, Institute of Mathematical Modeling, Technical Uni-
versity of Denmark, Denmark; 1998.
65. He X-S, Li N, Yang X-S. Non-dominated sorting cuckoo search for multiobjective optimiza-
tion. In: Proceedings of IEEE symposium on swarm intelligence (SIS), Orlando, FL, USA,
Dec 2014. p. 1–7.
66. He Z, Yen GG. Many-objective evolutionary algorithm: objective space reduction and diversity
improvement. IEEE Trans Evol Comput. 2016;20(1):145–60.
67. Hu X, Eberhart RC. Multiobjective optimization using dynamic neighborhood particle swarm
optimization. In: Proceedings of congress on evolutinary computation (CEC), Honolulu, HI,
USA, May 2002. p. 1677–1681.
68. Hu X, Eberhart RC, Shi Y. Particle swarm with extended memory for multiobjective opti-
mization. In: Proceedings of IEEE swarm intelligence symposium, Indianapolis, IN, USA,
April 2003. p. 193–197.
69. Huang L, He XX, Wang N, Xie Y. P systems based multi-objective optimization algorithm.
Prog Nat Sci. 2007;17:458–65.
70. Huo Y, Zhuang Y, Gu J, Ni S. Elite-guided multi-objective artificial bee colony algorithm.
Appl Soft Comput. 2015;32:199–210.
71. Horn J, Nafpliotis N, Goldberg DE. A niched pareto genetic algorithm for multiobjective opti-
mization. In: Proceedings of the 1st IEEE conference on evolutionary computation, Orlando,
FL, USA, June 1994. p. 82–87.
72. Ikeda K, Kita H, Kobayashi S. Failure of Pareto-based MOEAs: does non-dominated really
mean near to optimal? In: Proceedings of congress on evolutionary computation (CEC), Seoul,
Korea, May 2001. p. 957–962.
73. Iorio AW, Li X. A cooperative coevolutionary multiobjective algorithm using non-dominated
sorting. In: Proceedings of genetic and evolutionary computation conference (GECCO), Seat-
tle, WA, USA, June 2004. p. 537–548.
74. Ishibuchi H, Murata T. Multi-objective genetic local search algorithm and its application to
flowshop scheduling. IEEE Trans Syst Man Cybern Part C. 1998;28(3):392–403.
75. Jaimes AL, Coello CAC, Barrientos JEU. Online objective reduction to deal with many-
objective problems. In: Proceedings of the 5th international conference on evolutionary multi-
criterion optimization (EMO), Nantes, France, April 2009. p. 423–437.
95. Li H, Zhang Q. Multiobjective optimization problems with complicated Pareto sets, MOEA/D
and NSGA-II. IEEE Trans Evol Comput. 2009;13(2):284–302.
96. Li JQ, Pan QK, Gao KZ. Pareto-based discrete artificial bee colony algorithm for multi-
objective flexible job shop scheduling problems. Int J Adv Manuf Technol. 2011;55:1159–69.
97. Li K, Zhang Q, Kwong S, Li M, Wang R. Stable matching-based selection in evolutionary
multiobjective optimization. IEEE Trans Evol Comput. 2014;18(6):909–23.
98. Li M, Yang S, Liu X. Shift-based density estimation for Pareto-based algorithms in many-
objective optimization. IEEE Trans Evol Comput. 2014;18(3):348–65.
99. Li M, Yang S, Liu X. Bi-goal evolution for many-objective optimization problems. Artif Intell.
2015;228:45–65.
100. Li X. A non-dominated sorting particle swarm optimizer for multiobjective optimization. In:
Proceedings of genetic and evolutionary computation conference (GECCO), Chicago, IL,
USA, July 2003. p. 37–48.
101. Li X. Better spread and convergence: particle swarm multiobjective optimization using the
maximin fitness function. In: Proceedings of genetic and evolutionary computation conference
(GECCO), Seattle, WA, USA, June 2004. p. 117–128.
102. Li Z, Nguyen TT, Chen SM, Truong TK. A hybrid algorithm based on particle swarm and
chemical reaction optimization for multi-object problems. Appl Soft Comput. 2015;35:525–
40.
103. Liang Z, Song R, Lin Q, Du Z, Chen J, Ming Z, Yu J. A double-module immune algorithm
for multi-objective optimization problems. Appl Soft Comput. 2015;35:161–74.
104. Liu D, Tan KC, Goh CK, Ho WK. A multiobjective memetic algorithm based on particle
swarm optimization. IEEE Trans Syst Man Cybern Part B. 2007;37(1):42–50.
105. Lohn JD, Kraus WF, Haith GL. Comparing a coevolutionary genetic algorithm for multiob-
jective optimization. In: Proceedings of the world on congress on computational intelligence,
Honolulu, HI, USA, May 2002. p. 1157–1162.
106. Lu H, Yen G. Rank-density-based multiobjective genetic algorithm and benchmark test func-
tion study. IEEE Trans Evol Comput. 2003;7(4):325–43.
107. Leong W-F, Yen GG. PSO-based multiobjective optimization with dynamic population size
and adaptive local archives. IEEE Trans Syst Man Cybern Part B. 2008;38(5):1270–93.
108. Lopez-Jaimes A, Coello Coello CA. Including preferences into a multiobjective evolu-
tionary algorithm to deal with many-objective engineering optimization problems. Inf Sci.
2014;277:1–20.
109. Lu Z, Zhao H, Xiao H, Wang H, Wang H. An improved multi-objective bacteria colony
chemotaxis algorithm and convergence analysis. Appl Soft Comput. 2015;31:274–92.
110. Ma X, Qi Y, Li L, Liu F, Jiao L, Wu J. MOEA/D with uniform decomposition measurement
for many-objective problems. Soft Comput. 2014;18:2541–64.
111. Madavan NK. Multiobjective optimization using a Pareto differential evolution approach. In:
Proceedings of IEEE congress on evolutionary computation (CEC), Honolulu, HI, USA, May
2002. p. 1145–1150.
112. Marti L, Garcia J, Berlanga A, Molina JM. Solving complex high-dimensional problems with
the multi-objective neural estimation of distribution algorithm. In: Proceedings of the 11th
genetic and evolutionary computation conference (GECCO), Montreal, Canada, July 2009.
p. 619–626.
113. Menczer F, Degeratu M, Steet WN. Efficient and scalable Pareto optimization by evolutionary
local selection algorithms. Evol Comput. 2000;8(2):223–47.
114. Miettinen K. Nonlinear multiobjective optimization. Boston: Kluwer; 1999.
115. Mo H, Xu Z, Xu L, Wu Z, Ma H. Constrained multiobjective biogeography optimization
algorithm. Sci World J. 2014;2014, Article ID 232714:12p.
116. Modiri-Delshad M, Rahim NA. Multi-objective backtracking search algorithm for economic
emission dispatch problem. Appl Soft Comput. 2016;40:479–94.
117. Molina J, Laguna M, Marti R, Caballero R. SSPMO: a scatter tabu search procedure for
non-linear multiobjective optimization. INFORMS J Comput. 2007;19(1):91–100.
118. Mora AM, Garcia-Sanchez P, Merelo JJ, Castillo PA. Pareto-based multi-colony multi-
objective ant colony optimization algorithms: an island model proposal. Soft Comput.
2013;17:1175–207.
119. Murata T, Ishibuchi H, Gen M. Specification of genetic search direction in cellular multi-
objective genetic algorithm. In: Proceedings of the 1st international conference on evolution-
ary multicriterion optimization (EMO), Zurich, Switzerland, March 2001. Berlin: Springer;
2001. p. 82–95.
120. Nam DK, Park CH. Multiobjective simulated annealing: a comparative study to evolutionary
algorithms. Int J Fuzzy Syst. 2000;2(2):87–97.
121. Nebro AJ, Durillo JJ, Luna F, Dorronsoro B, Alba E. MOCell: a cellular genetic algorithm
for multiobjective optimization. Int J Intell Syst. 2009;24:726–46.
122. Nebro AJ, Luna F, Alba E. New ideas in applying scatter search to multiobjective optimization.
In: Proceedings of the 3rd international conference on evolutionary multicriterion optimization
(EMO), Guanajuato, Mexico, March 2005. p. 443–458.
123. Nebro AJ, Luna F, Alba E, Dorronsoro B, Durillo JJ, Beham A. AbYSS: adapting scatter
search to multiobjective optimization. IEEE Trans Evol Comput. 2008;12(4):439–57.
124. Nguyen L, Bui LT, Abbass HA. DMEA-II: the direction-based multi-objective evolutionary
algorithm-II. Soft Comput. 2014;18:2119–34.
125. Okabe T, Jin Y, Sendhoff B, Olhofer M. Voronoi-based estimation of distribution algorithm for
multi-objective optimization. In: Proceedings of IEEE congress on evolutionary computation
(CEC), Portland, OR, USA, June 2004. p. 1594–1601.
126. Parsopoulos KE, Tasoulis DK, Pavlidis NG, Plagianakos VP, Vrahatis MN. Vector evaluated
differential evolution for multiobjective optimization. In: Proceedings of IEEE congress on
evolutionary computation (CEC), Portland, Oregon, USA, June 2004. p. 204–211.
127. Parsopoulos KE, Tasoulis DK, Vrahatis MN. Multiobjective optimization using parallel vector
evaluated particle swarm optimization. In: Proceedings of the IASTED international confer-
ence on artificial intelligence and applications, Innsbruck, Austria, Feb 2004. p. 823–828.
128. Pelikan M, Sastry K, Goldberg DE. Multiobjective HBOA, clustering, and scalability. In:
Proceedings of international conference on genetic and evolutionary computation; 2005. p.
663–670.
129. Pulido GT, Coello CAC. Using clustering techniques to improve the performance of a par-
ticle swarm optimizer. In: Proceedings of genetic and evolutionary computation conference
(GECCO), Seattle, WA, USA, June 2004. p. 225–237.
130. Purshouse RC, Fleming PJ. On the evolutionary optimization of many conflicting objectives.
IEEE Trans Evol Comput. 2007;11(6):770–84.
131. Rahimi-Vahed A, Mirzaei AH. A hybrid multi-objective shuffled frog-leaping algorithm for
a mixed-model assembly line sequencing problem. Comput Ind Eng. 2007;53(4):642–66.
132. Rao RV, Patel V. Multi-objective optimization of two stage thermoelectric cooler using a mod-
ified teaching-learning-based optimization algorithm. Eng Appl Artif Intell. 2013;26:430–45.
133. Ray T, Liew KM. A swarm metaphor for multiobjective design optimization. Eng Optim.
2002;34(2):141–53.
134. Reddy MJ, Kumar DN. An efficient multi-objective optimization algorithm based on swarm
intelligence for engineering design. Eng Optim. 2007;39(1):49–68.
135. Reynoso-Meza G, Sanchis J, Blasco X, Martinez M. Design of continuous controllers using
a multiobjective differential evolution algorithm with spherical pruning. In: Applications of
evolutionary computation. Lecture notes in computer science, vol. 6024. Berlin: Springer;
2010. p. 532–541.
136. Robic T, Filipic B. DEMO: differential evolution for multiobjective optimization. In: Proceed-
ings of the 3rd international conference on evolutionary multi-criterion optimization (EMO),
Guanajuato, Mexico, March 2005. p. 520–533.
137. Sadollah A, Eskandar H, Kim JH. Water cycle algorithm for solving constrained multi-
objective optimization problems. Appl Soft Comput. 2015;27:279–98.
138. Sastry K, Goldberg DE, Pelikan M. Limits of scalability of multi-objective estimation of dis-
tribution algorithms. In: Proceedings of IEEE congress on evolutionary computation (CEC),
Edinburgh, UK, Sept 2005. p. 2217–2224.
139. Sato H, Aguirre H, Tanaka K. Controlling dominance area of solutions and its impact on the
performance of MOEAs. In: Proceedings of the 4th international conference on evolutionary
multi-criterion optimization (EMO), Matsushima, Japan, March 2007. p. 5–20.
140. Schaffer JD. Multiple objective optimization with vector evaluated genetic algorithms. In:
Grefenstette JJ, editor. Proceedings of the 1st international conference on genetic algorithms,
Pittsburgh, PA, USA, July 1985. Hillsdale, NJ, USA: Lawrence Erlbaum; 1985. p. 93–100.
141. Schott JR. Fault tolerant design using single and multicriteria genetic algorithm optimization.
Master’s Thesis, Department of Aeronautics and Astronautics, Massachusetts Institute of
Technology, Cambridge, MA; 1995.
142. Shim VA, Tan KC, Cheong CY. An energy-based sampling technique for multi-objective
restricted Boltzmann machine. IEEE Trans Evol Comput. 2013;17(6):767–85.
143. Shim VA, Tan KC, Chia JY, Al Mamun A. Multi-objective optimization with estimation of
distribution algorithm in a noisy environment. Evol Comput. 2013;21(1):149–77.
144. Sierra MR, Coello CAC. Improving PSO-based multiobjective optimization using crowding,
mutation and -dominance. In: Proceedings of the 3rd international conference on evolution-
ary multi-criterion optimization (EMO), Guanajuato, Mexico, March 2005. p. 505–519.
145. Singh HK, Isaacs A, Ray T. A Pareto corner search evolutionary algorithm and dimen-
sionality reduction in many-objective optimization problems. IEEE Trans Evol Comput.
2011;15(4):539–56.
146. Smith KI, Everson RM, Fieldsend JE, Murphy C, Misra R. Dominance-based multiobjective
simulated annealing. IEEE Trans Evol Comput. 2008;12(3):323–42.
147. Soh H, Kirley M. moPGA: toward a new generation of multiobjective genetic algorithms. In:
Proceedings of IEEE congress on evolutionary computation, Vancouver, BC, Canada, July
2006. p. 1702–1709.
148. Soylu B, Köksalan M. A favorable weight-based evolutionary algorithm for multiple criteria
problems. IEEE Trans Evol Comput. 2010;14(2):191–205.
149. Srinivas N, Deb K. Multiobjective optimization using nondominated sorting in genetic algo-
rithms. Evol Comput. 1994;2(3):221–48.
150. Srinivas M, Patnaik LM. Adaptive probabilities of crossover and mutation in genetic algo-
rithms. IEEE Trans Syst Man Cybern. 1994;24(4):656–67.
151. Tan KC, Lee TH, Khor EF. Evolutionary algorithms with dynamic population size and local
exploration for multiobjective optimization. IEEE Trans Evol Comput. 2001;5(6):565–88.
152. Tan KC, Yang YJ, Goh CK. A distributed cooperative coevolutionary algorithm for multiob-
jective optimization. IEEE Trans Evol Comput. 2006;10(5):527–49.
153. Tang HJ, Shim VA, Tan KC, Chia JY. Restricted Boltzmann machine based algorithm for
multi-objective optimization. In: Proceedings of IEEE congress on evolutionary computation
(CEC), Barcelona, Spain, July 2010. p. 3958–3965.
154. Teo J. Exploring dynamic self-adaptive populations in differential evolution. Soft Comput.
2006;10(8):673–86.
155. Toffolo A, Benini E. Genetic diversity as an objective in multi-objective evolutionary algo-
rithms. Evol Comput. 2003;11(2):151–67.
156. Vasconcelos JA, Maciel JHRD, Parreiras RO. Scatter search techniques applied to electro-
magnetic problems. IEEE Trans Magn. 2005;4:1804–7.
157. Veldhuizen DAV, Lamont GB. Multiobjective evolutionary algorithm research: a history and
analysis. Technical Report TR-98-03, Department of Electrical and Computer Engineering,
Graduate School of Engineering, Air Force Institute of Technology, Wright-Patterson AFB,
OH, USA; 1998.
158. Vrugt JA, Robinson BA, Hyman JM. Self-adaptive multimethod search for global optimization
in real-parameter spaces. IEEE Trans Evol Comput. 2009;13(2):243–59.
159. Wagner T, Beume N, Naujoks B. Pareto-, aggregation-, and indicator-based methods in many-
objective optimization. In: Proceedings of the 4th international conference on evolutionary
multi-criterion optimization (EMO), Matsushima, Japan, March 2007. p. 742–756.
160. Wang R, Purshouse RC, Fleming PJ. Preference-inspired coevolutionary algorithms for many-
objective optimization. IEEE Trans Evol Comput. 2013;17(4):474–94.
161. Wanner EF, Guimaraes FG, Takahashi RHC, Fleming PJ. Local search with quadratic approx-
imations into memetic algorithms for optimization with multiple criteria. Evol Comput.
2008;16(2):185–224.
162. Wu Y, Jin Y, Liu X. A directed search strategy for evolutionary dynamic multiobjective
optimization. Soft Comput. 2015;19:3221–35.
163. Xiang Y, Zhou Y. A dynamic multi-colony artificial bee colony algorithm for multi-objective
optimization. Appl Soft Comput. 2015;35:766–85.
164. Xue J, Wu Y, Shi Y, Cheng S. Brain storm optimization algorithm for multi-objective opti-
mization problems. In: Proceedings of the 3rd international conference on advances in swarm
intelligence, Shenzhen, China, June 2012. Berlin: Springer; 2012. p. 513–519.
165. Yang S, Li M, Liu X, Zheng J. A grid-based evolutionary algorithm for many-objective
optimization. IEEE Trans Evol Comput. 2013;17(5):721–36.
166. Yang X-S. Bat algorithm for multi-objective optimization. Int J Bio-Inspired Comput.
2011;3(5):267–74.
167. Yen GG, Leong WF. Dynamic multiple swarms in multiobjective particle swarm optimization.
IEEE Trans Syst Man Cybern Part A. 2009;39(4):890–911.
168. Yen GG, Lu H. Dynamic multiobjective evolutionary algorithm: adaptive cell-based rank and
density estimation. IEEE Trans Evol Comput. 2003;7(3):253–74.
169. Zhan Z-H, Li J, Cao J, Zhang J, Chung HS-H, Shi Y-H. Multiple populations for multi-
ple objectives: a coevolutionary technique for solving multiobjective optimization problems.
IEEE Trans Cybern. 2013;43(2):445–63.
170. Zhang Q, Li H. MOEA/D: a multiobjective evolutionary algorithm based on decomposition.
IEEE Trans Evol Comput. 2007;11(6):712–31.
171. Zhang Q, Liu W, Li H. The performance of a new version of MOEA/D on CEC09 uncon-
strained MOP test instances. In: Proceedings of the IEEE conference on evolutionary com-
putation (CEC), Trondheim, Norway, May 2009. p. 203–208.
172. Zhang Q, Zhou A, Jin Y. Global multiobjective optimization via estimation of distribution
algorithm with biased initialization and crossover. In: Proceedings of the genetic and evolu-
tionary computation conference (GECCO), London, UK, July 2007. p. 617–622.
173. Zhang Q, Zhou A, Jin Y. RM-MEDA: a regularity model-based multi-objective estimation of
distribution algorithm. IEEE Trans Evol Comput. 2008;12(1):41–63.
174. Zhang X, Tian Y, Cheng R, Jin Y. An efficient approach to non-dominated sorting for evolu-
tionary multi-objective optimization. IEEE Trans Evol Comput. 2015;19(2):201–15.
175. Zhong X, Li W. A decision-tree-based multi-objective estimation of distribution algorithm. In:
Proceedings of international conference on computational intelligence and security, Harbin,
China, Dec 2007. p. 114–118.
176. Zhou A, Zhang Q, Jin Y. Approximating the set of pareto-optimal solutions in both the
decision and objective spaces by an estimation of distribution algorithm. Trans Evol Comput.
2009;13(5):1167–89.
177. Zitzler E, Deb K, Thiele L. Comparison of multiobjective evolutionary algorithms: empirical
results. Evol Comput. 2000;8(2):173–95.
178. Zitzler E, Kunzli S. Indicator-based selection in multiobjective search. In: Proceedings of the
8th international conference on parallel problem solving from nature (PPSN VIII), Birming-
ham, UK, Sept 2004. Berlin: Springer; 2004. p. 832–842.
179. Zitzler E, Laumanns M, Thiele L. SPEA2: improving the strength Pareto evolutionary algo-
rithm. TIK-Report 103, Department of Electrical Engineering, Swiss Federal Institute of
Technology, Switzerland. 2001.
180. Zitzler E, Laumanns M, Thiele L. SPEA2: improving the strength pareto evolutionary algo-
rithm. In: Proceedings of evolutionary methods for design, optimisation and control. CIMNE,
Barcelona, Spain; 2002. p. 95–100.
181. Zitzler E, Thiele L. Multiobjective evolutionary algorithms: a comparative case study and the
strength Pareto approach. IEEE Trans Evol Comput. 1999;3(4):257–71.
182. Zitzler E, Thiele L, Laumanns M, Fonseca CM, da Fonseca VG. Performance assessment of
multiobjective optimizers: an analysis and review. IEEE Trans Evol Comput. 2003;7:117–32.
Appendix A
Benchmarks
This chapter gives benchmark functions for discrete optimization as well as for real-
valued unconstrained, multimodal, multiobjective, and dynamic optimization.
This section gives a few well-known benchmark problems for evaluating discrete
optimization methods.
Quadratic Assignment Problem (QAP)
The quadratic assignment problem (QAP) is a well-known NP-hard COP with a
wide variety of applications, including the facility location problem. For the facility
location problem, the objective is to find a minimum cost assignment of facilities
to locations considering the flow of materials between facilities and the distance
between locations. The facility location problem can be formulated as
\min_{p \in P} z_p = \sum_{i=1}^{n} \sum_{j=1}^{n} f_{ij} d_{p_i p_j},   (A.1)
where f_{ij} is the entry of the flow matrix giving the flow between the two facilities i
and j, d_{ij} is the entry of the distance matrix giving the distance between locations i
and j, p is a permutation vector of n indices of facilities (or locations) mapping a
possible assignment of n facilities to n locations, and P is the set of all n-vector
permutations.
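A minimal MATLAB sketch of evaluating the QAP objective (A.1) for a candidate
permutation p, assuming F and D are the n-by-n flow and distance matrices
(hypothetical variable names):

  % QAP cost z_p of assignment p, Eq. (A.1).
  function z = qap_cost(F, D, p)
      z = sum(sum(F .* D(p, p)));   % sum_i sum_j f_ij * d_{p(i) p(j)}
  end
  % e.g., for n = 4, qap_cost(F, D, [2 4 1 3]) evaluates one candidate assignment.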
The p-center problem [3], also known as minimax location-allocation problem,
consists of locating p facilities (centers) on a network such that the maximum of
the distances between nodes and their nearest centers is minimized. In the p-center
problem, N nodes (customers) and distances between nodes are given, and p centers
should be located at any of the N given nodes. The p-center problem can be used in
applications such as locating fire stations, police departments, or emergency centers.
subject to
\sum_{j=1}^{n} a_{ij} x_{ij} = b_i, \quad \forall i = 1, \ldots, m,   (A.3)
\sum_{i=1}^{m} x_{ij} = 1, \quad \forall j = 1, \ldots, n,   (A.4)
subject to
\sum_{k=1}^{n} x_{ik} = 1, \quad \forall i = 1, 2, \ldots, r,   (A.7)
\sum_{i=1}^{r} m_i x_{ik} \le M_k, \quad \forall k = 1, 2, \ldots, n,   (A.8)
\sum_{i=1}^{r} p_i x_{ik} \le P_k, \quad \forall k = 1, 2, \ldots, n,   (A.9)
requirements of the tasks assigned to processor k should not exceed the processing
capacity of processor k.
Other QAP applications have been encountered in a variety of other domains
such as the backboard wiring problem in electronics, the arrangement of electronic
components in printed circuit boards and in microchips, machine scheduling in man-
ufacturing, load balancing and task allocation in parallel and distributed computing,
statistical data analysis, and transportation. A set of test problems can be obtained
from QAPLIB (http://www.seas.upenn.edu/qaplib/inst.html) and Taillard’s reposi-
tory (http://mistic.heig-vd.ch/taillard/problemes.dir/qap.dir/qap.html).
Traveling Salesman Problem
For symmetric TSP, the distances between nodes are independent of the direction,
i.e., d_{ij} = d_{ji} for every pair of nodes. In asymmetric TSP, at least one pair of nodes
satisfies d_{ij} \ne d_{ji}.
The problem can be described as
\min \sum_{x} \sum_{y \ne x} \sum_{i} d_{xy} v_{xi} (v_{y,i+1} + v_{y,i-1})   (A.11)
subject to
\sum_{x} \sum_{i} \sum_{j \ne i} v_{xi} v_{xj} = 0,   (A.12)
\sum_{i} \sum_{x} \sum_{y \ne x} v_{xi} v_{yi} = 0,   (A.13)
\left( \sum_{x} \sum_{i} v_{xi} - n \right)^2 = 0.   (A.14)
The objective is to find the shortest tour. The first constraint is satisfied if and only if
each city row x contains no more than one 1, i.e., the rest of the entries are zero. The
second constraint is satisfied if and only if each position-in-tour column contains no
more than one 1, i.e., the rest of the entries are zero. The third constraint is satisfied
if and only if there are n entries of one in the entire matrix. The first three terms
describe the feasibility requirements, which define a valid tour by taking the value
zero [4]. The last term represents the objective function of TSP.
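In practice, metaheuristics usually evaluate a candidate tour directly from the distance
matrix rather than through the binary matrix v of (A.11)–(A.14); a minimal MATLAB
sketch, assuming D is the n-by-n distance matrix and tour is a permutation of the city
indices (hypothetical variable names):

  % Length of a closed tour visiting the cities in the order given by tour.
  function len = tour_length(D, tour)
      n = length(tour);
      len = 0;
      for i = 1:n
          j = mod(i, n) + 1;                 % successor city, wrapping back to the start
          len = len + D(tour(i), tour(j));
      end
  end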
TSPLIB (http://comopt.ifi.uni-heidelberg.de/software/TSPLIB95/) provides TSP
benchmark problems from Reinelt [9]. Ulysses16 provides coordinates of 16 loca-
tions of Odysseus’ journey home to Ithaca, also known as Homer’s Odyssey, given
in Table A.1. The length of the optimal tour is 6859 when geographical distances are
used.
Some benchmarks can be found in TSPlib. Berlin52 provides coordinates of 52
locations in Berlin, Germany. The length of optimal tour is 7542 when using Euclid-
ean distances. Bier127 provides coordinates of 127 beer gardens in Augsburg, Ger-
many. The length of optimal tour is 118282 when using Euclidean distances. Gr666
provides coordinates of 666 cities on earth. The length of optimal tour is 294358 in
case of using geographical distances.
416 Appendix A: Benchmarks
Knapsack Problem
The knapsack problem consists in finding a subset of an original set of objects such
that the total profit of the selected objects is maximized while a set of resource con-
straints are satisfied. The knapsack problem is a model of many real applications such
as cutting stock problems, project selection and cargo loading, allocating processors
and databases in a distributed computer system.
The knapsack problem is an NP-hard problem. It can be formulated as an inte-
ger linear programming problem. The most common 0/1 knapsack problem can be
formulated as [1]
\max \sum_{i=1}^{n} p_i x_i   (A.15)
subject to
\sum_{j=1}^{n} r_{ij} x_j \le b_i, \quad i = 1, \ldots, m,   (A.16)
x_i \in \{0, 1\}, \quad i = 1, \ldots, n,   (A.17)
where p = (p_1, p_2, \ldots, p_n)^T with p_i > 0 denoting the profit on item i, x =
(x_1, x_2, \ldots, x_n)^T with x_i = 1 denoting that item i is among the selected items (the
knapsack) and x_i = 0 otherwise, m is the number of resource constraints, b_i \ge 0,
i = 1, 2, \ldots, m, denotes the budget of constraint i, and the weight r_{ij} represents
the investment on item j subject to constraint i.
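A minimal MATLAB sketch of evaluating a candidate 0/1 solution against
(A.15)–(A.17), assuming p is the n-vector of profits, R the m-by-n matrix of weights
r_{ij}, b the m-vector of budgets, and x a binary n-vector (hypothetical variable names):

  % Profit (A.15) and feasibility with respect to the resource constraints (A.16).
  function [profit, feasible] = knapsack_eval(p, R, b, x)
      profit   = p(:)' * x(:);          % total profit of the selected items
      feasible = all(R * x(:) <= b(:)); % all m resource constraints satisfied
  end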
The bounded knapsack problem replaces the constraint (A.17) by xi ∈ {0,
1, . . . , ci }, i = 1, . . . , n, where ci is an integer value. The unbounded knapsack
problem replaces the constraint (A.17) by xi ≥ 0, that is, xi is a nonnegative integer.
There are also multidimensional knapsack problems, multiple knapsack problems,
and multiobjective multiple knapsack problems. The multiple knapsack problem is
similar to the bin packing problem.
subject to
\sum_{i=1}^{n} x_i = m,   (A.19)
x_i \in \{0, 1\}, \quad i = 1, \ldots, n,   (A.20)
where d_{ij} is simply the distance between element i and element j.
Bin Packing Problem
In the bin packing problem, objects of different volumes must be packed into a finite
number of bins or containers each of volume V in a way that minimizes the number
of bins used. There are many variations of this problem, such as 2D packing, linear
packing, packing by weight, packing by cost, and so on. They have many applications,
such as filling up containers, loading trucks with weight capacity constraints, and
creating file backups in media.
The bin packing problem is an NP-hard COP. It can also be seen as a special case of the cutting stock problem. When the number of bins is restricted to 1 and each item is characterized by both a volume and a value, the problem of maximizing the total value of the items that fit in the bin is known as the knapsack problem. The 2D bin packing problem packs objects of various widths and lengths into the minimum number of 2D bins.
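A minimal sketch of the classical first-fit decreasing heuristic (not taken from the book) illustrates the problem: objects are sorted by decreasing volume and each is placed into the first open bin with enough remaining capacity.

```python
def first_fit_decreasing(volumes, V):
    """Pack the given volumes into bins of capacity V using the
    first-fit decreasing heuristic; returns the list of bins."""
    free = []      # remaining capacity of each open bin
    packing = []   # contents of each open bin
    for v in sorted(volumes, reverse=True):
        for k, capacity in enumerate(free):
            if v <= capacity:
                free[k] -= v
                packing[k].append(v)
                break
        else:  # no open bin fits the object, so open a new one
            free.append(V - v)
            packing.append([v])
    return packing
```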
Nurse Rostering Problem
The nurse rostering problem is a COP in which a set of shifts is assigned to a set of nurses, each of whom has specific skills and a work contract, over a predefined rostering period subject to a set of constraints. The standard dataset published for the First International Nurse Rostering Competition 2010 (INRC2010) consists of 69 instances that reflect real-world cases varying in size and complexity.
Many problems from the EA literature, each belonging to the important class of real-valued, unconstrained, multiobjective test problems, are systematically reviewed and analyzed in [6], where a flexible toolkit for constructing well-designed test problems is presented. The CEC2005 benchmark [10] is a well-known set of 25 functions for real-parameter optimization algorithms; MATLAB, C, and Java codes for them can be found at http://www.ntu.edu.sg/home/EPNSugan/. The IEEE Congress on Evolutionary Computation provides a series of CEC benchmark functions for testing various optimization algorithms. The Black-Box Optimization Benchmarking (BBOB) Workshop of the Genetic and Evolutionary Computation Conference (GECCO) also provides a series of BBOB benchmark functions, comprising noisy and noiseless test functions.
The optimal reactive power dispatch problem is a well-known nonlinear optimization problem in power systems. It seeks the combination of control variables that minimizes power loss and voltage deviation. Two examples are the IEEE 30-bus system and the IEEE 118-bus system.
Some test functions are illustrated at http://en.wikipedia.org/wiki/Test_functions_for_optimization and http://www.sfu.ca/~ssurjano/optimization.html. MATLAB codes for various metaheuristics are available at http://yarpiz.com.
Ackley Function
$$20 + e - 20 \exp\left(-0.2\sqrt{\frac{1}{n}\sum_{i=1}^{n} x_i^2}\right) - \exp\left(\frac{1}{n}\sum_{i=1}^{n}\cos(2\pi x_i)\right). \tag{A.21}$$
Decision space: [−32, 32]n .
Minimum: 0 at x ∗ = 0.
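A direct evaluation of (A.21), for example in Python/NumPy, may read as follows; the function name is arbitrary.

```python
import numpy as np

def ackley(x):
    """Ackley function (A.21); the global minimum is 0 at the origin."""
    x = np.asarray(x, dtype=float)
    return (20.0 + np.e
            - 20.0 * np.exp(-0.2 * np.sqrt(np.mean(x ** 2)))
            - np.exp(np.mean(np.cos(2.0 * np.pi * x))))
```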
Alpine Function
$$\sum_{i=1}^{n} \left|x_i \sin x_i + 0.1 x_i\right|. \tag{A.22}$$
Decision space: [−10, 10]n .
Minimum: 0.
Six-Hump-Camelback Function
$$4x_1^2 - 2.1 x_1^4 + \frac{x_1^6}{3} + x_1 x_2 - 4 x_2^2 + 4 x_2^4. \tag{A.23}$$
Sphere Function
$$\|x\|^2 = \sum_{i=1}^{n} x_i^2. \tag{A.24}$$
Decision space: [−100, 100]n .
Minimum: 0 at x ∗ = 0.
Drop Wave Function
$$-\frac{1 + \cos\left(12\sqrt{x_1^2 + x_2^2}\right)}{\frac{1}{2}\left(x_1^2 + x_2^2\right) + 2}. \tag{A.25}$$
Easom Function
$$-\cos x_1 \cos x_2 \exp\left(-(x_1 - \pi)^2 - (x_2 - \pi)^2\right). \tag{A.26}$$
Decision space: [−100, 100]².
Minimum: −1 at x∗ = (π, π)^T.
Griewank Function
$$\frac{\|x\|^2}{4000} - \prod_{i=1}^{n} \cos\left(\frac{x_i}{\sqrt{i}}\right) + 1. \tag{A.27}$$
Decision space: [−600, 600]n .
Minimum: 0 at x ∗ = 0.
Michalewicz Function
$$-\sum_{i=1}^{n} \sin x_i \left[\sin\left(\frac{i x_i^2}{\pi}\right)\right]^{20}. \tag{A.28}$$
Decision space: [0, π]n .
Minimum: −1.8013 at x ∗ = (2.20, 1.57)T for n = 2.
Pathological Function
$$\sum_{i=1}^{n-1}\left(0.5 + \frac{\sin^2\left(\sqrt{100 x_i^2 + x_{i+1}^2}\right) - 0.5}{1 + 0.001\left(x_i^2 - 2 x_i x_{i+1} + x_{i+1}^2\right)^2}\right). \tag{A.29}$$
Rastrigin Function
$$10n + \sum_{i=1}^{n}\left(x_i^2 - 10\cos(2\pi x_i)\right). \tag{A.30}$$
Decision space: [−5.12, 5.12]n .
Minimum: 0 at x ∗ = 0.
Rosenbrock Function
$$\sum_{i=1}^{n-1}\left[100\left(x_{i+1} - x_i^2\right)^2 + (x_i - 1)^2\right]. \tag{A.31}$$
Decision space: [−100, 100]n .
Minimum: 0 at x ∗ = (1, 1, . . . , 1)T .
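For reference, the Rastrigin (A.30) and Rosenbrock (A.31) functions can be evaluated as in the following sketch; the names and the NumPy-based style are illustrative only.

```python
import numpy as np

def rastrigin(x):
    """Rastrigin function (A.30); minimum 0 at the origin."""
    x = np.asarray(x, dtype=float)
    return 10.0 * x.size + np.sum(x ** 2 - 10.0 * np.cos(2.0 * np.pi * x))

def rosenbrock(x):
    """Rosenbrock function (A.31); minimum 0 at (1, ..., 1)."""
    x = np.asarray(x, dtype=float)
    return np.sum(100.0 * (x[1:] - x[:-1] ** 2) ** 2 + (x[:-1] - 1.0) ** 2)
```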
Salomon Function
$$f(x) = \frac{a}{b + \left(x_1^2 + x_2^2\right)} + \left(x_1^2 + x_2^2\right)^2 \tag{A.33}$$
with a = 3.0, b = 0.05.
Decision space: x ∈ [−5.12, 5.12]².
Schaffer Function
$$f(x) = 0.5 + \frac{\sin^2\left(\sqrt{x_1^2 + x_2^2}\right) - 0.5}{\left[1 + 0.001\left(x_1^2 + x_2^2\right)\right]^2}. \tag{A.34}$$
Decision space: [−100, 100]2 .
Minimum: 0 at x = 0.
Schwefel Function
$$418.9829\, n - \sum_{i=1}^{n} x_i \sin\left(\sqrt{|x_i|}\right). \tag{A.35}$$
$$\sum_{i=1}^{n} |x_i|^{i+1}. \tag{A.36}$$
$$3\exp\left(-\frac{\|x\|^{2.5}}{10n}\right) - 10\exp\left(-8\|x\|^2\right) + \frac{1}{n}\sum_{i=1}^{n}\cos\left(5\left(x_i + (1 + i \bmod 2)\cos\left(\|x\|^2\right)\right)\right). \tag{A.37}$$
Decision space: [−10, 5]n .
Minimum: 0.
Whitley Function
$$\sum_{i=1}^{n}\sum_{j=1}^{n}\left(\frac{y_{i,j}^2}{4000} - \cos\left(y_{i,j}\right) + 1\right), \tag{A.38}$$
where $y_{i,j} = 100\left(x_j - x_i^2\right)^2 + \left(1 - x_i\right)^2$.
Zakharov Function
$$\|x\|^2 + \left(\sum_{i=1}^{n}\frac{i x_i}{2}\right)^2 + \left(\sum_{i=1}^{n}\frac{i x_i}{2}\right)^4. \tag{A.40}$$
$$\sum_{i=1}^{n} i x_i^2. \tag{A.41}$$

$$\sum_{i=1}^{n} 5 i x_i^2. \tag{A.42}$$
Decision space: [−5.12, 5.12]n .
Test Functions for Multimodal Optimization
The test functions listed in Section A.2.1 that contain sine and cosine terms exhibit periodic behavior and can therefore serve as benchmarks for multimodal optimization. For example, the Ackley, Rastrigin, Griewank, and Schwefel functions are typically used.
The following test functions for constrained optimization are extracted from [7].
g06
$$f_1(x) = 1 - \exp\left(-\sum_{i=1}^{n}\left(x_i - \frac{1}{\sqrt{n}}\right)^2\right), \tag{A.52}$$

$$f_2(x) = 1 - \exp\left(-\sum_{i=1}^{n}\left(x_i + \frac{1}{\sqrt{n}}\right)^2\right), \tag{A.53}$$
n = 3;
Variable bounds: [−4, 4].
Optimal solutions: $x_1 = x_2 = x_3 \in \left[-\frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}}\right]$.
This function has a nonconvex, continuous Pareto optimal front which corresponds
to g(x) = 1.
ZDT1
$$f_1(x) = x_1, \tag{A.54}$$

$$f_2(x) = g(x)\left(1 - \sqrt{\frac{x_1}{g(x)}}\right), \tag{A.55}$$

$$g(x) = 1 + \frac{9\sum_{i=2}^{n} x_i}{n - 1}. \tag{A.56}$$
n = 30;
Variable bounds: x = (x1 , x2 , . . . , xn )T , xi ∈ [0, 1].
Optimal solutions: x1 ∈ [0, 1], xi = 0, i = 2, . . . , 30.
This function has a convex, continuous Pareto optimal front which corresponds to
g(x) = 1.
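A minimal evaluation of the ZDT1 objectives (A.54)–(A.56) might look as follows; the function name is arbitrary.

```python
import numpy as np

def zdt1(x):
    """ZDT1 objectives for a decision vector x in [0, 1]^n."""
    x = np.asarray(x, dtype=float)
    f1 = x[0]
    g = 1.0 + 9.0 * np.sum(x[1:]) / (x.size - 1)
    f2 = g * (1.0 - np.sqrt(f1 / g))
    return f1, f2
```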
ZDT2
$$f_1(x) = x_1, \tag{A.57}$$

$$f_2(x) = g(x)\left(1 - \left(\frac{x_1}{g(x)}\right)^2\right), \tag{A.58}$$

$$g(x) = 1 + \frac{9\sum_{i=2}^{n} x_i}{n - 1}. \tag{A.59}$$
Variable bounds: x = (x1 , x2 , . . . , xn )T , xi ∈ [0, 1].
Optimal solutions: x1 ∈ [0, 1], xi = 0, i = 2, . . . , 30.
This function has a nonconvex, continuous Pareto optimal front which corresponds
to g(x) = 1.
ZDT3
$$f_1(x) = x_1, \tag{A.60}$$

$$f_2(x) = g(x)\left(1 - \sqrt{\frac{x_1}{g(x)}} - \frac{x_1}{g(x)}\sin(10\pi x_1)\right), \tag{A.61}$$

$$g(x) = 1 + \frac{9\sum_{i=2}^{n} x_i}{n - 1}. \tag{A.62}$$
Variable bounds: x = (x1 , x2 , . . . , xn )T , xi ∈ [0, 1].
Optimal solutions: x1 ∈ [0, 1], xi = 0, i = 2, . . . , 30.
This function has a convex, discontinuous Pareto optimal front which corresponds
to g(x) = 1.
ZDT4
$$f_1(x) = x_1, \tag{A.63}$$

$$f_2(x) = g(x)\left(1 - \sqrt{\frac{x_1}{g(x)}}\right), \tag{A.64}$$

$$g(x) = 1 + 10(n - 1) + \sum_{i=2}^{n}\left(x_i^2 - 10\cos(4\pi x_i)\right). \tag{A.65}$$
n = 30.
Variable bounds: x1 ∈ [0, 1], xi ∈ [−5, 5], i = 2, . . . , n.
Optimal solutions: x1 ∈ [0, 1], xi = 0, i = 2, . . . , 30.
This function has a convex Pareto optimal front corresponding to g(x) = 1, together with a large number of local Pareto optimal fronts.
ZDT6
Osyczka2
Objective functions:
$$f_1(x) = -\left[25(x_1 - 2)^2 + (x_2 - 2)^2 + (x_3 - 1)^2 + (x_4 - 4)^2 + (x_5 - 1)^2\right], \tag{A.69}$$

$$f_2(x) = x_1^2 + x_2^2 + x_3^2 + x_4^2 + x_5^2 + x_6^2. \tag{A.70}$$
Constraints:
$$g_1(x) = 0 \le x_1 + x_2 - 2, \tag{A.71}$$
$$g_2(x) = 0 \le 6 - x_1 - x_2, \tag{A.72}$$
$$g_3(x) = 0 \le 2 - x_2 + x_1, \tag{A.73}$$
$$g_4(x) = 0 \le 2 - x_1 + 3x_2, \tag{A.74}$$
$$g_5(x) = 0 \le 4 - (x_3 - 3)^2 - x_4, \tag{A.75}$$
$$g_6(x) = 0 \le (x_5 - 3)^3 + x_6 - 4. \tag{A.76}$$
Variable bounds: x1 ∈ [0, 10], x2 ∈ [0, 10], x3 ∈ [1, 5], x4 ∈ [0, 6], x5 ∈ [1, 5],
x6 ∈ [0, 10].
Tanaka
Objective functions:
f 1 (x) = x1 , (A.77)
f 2 (x) = x2 , (A.78)
Constraints:
$$g_1(x) = -x_1^2 - x_2^2 + 1 + 0.1\cos\left(16\arctan(x_1/x_2)\right) \le 0, \tag{A.79}$$

$$g_2(x) = (x_1 - 0.5)^2 + (x_2 - 0.5)^2 \le 0.5. \tag{A.80}$$
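A minimal sketch for evaluating the Tanaka objectives and constraint values is shown below; atan2 is used in place of arctan(x1/x2) to avoid division by zero, which is an implementation choice rather than part of the original definition.

```python
import math

def tanaka(x1, x2):
    """Tanaka (TNK) objectives (A.77)-(A.78) and constraints (A.79)-(A.80);
    a point is feasible when g1 <= 0 and g2 <= 0.5."""
    f1, f2 = x1, x2
    # atan2(x1, x2) agrees with arctan(x1/x2) for x2 > 0 and stays defined at x2 = 0.
    g1 = -x1 ** 2 - x2 ** 2 + 1.0 + 0.1 * math.cos(16.0 * math.atan2(x1, x2))
    g2 = (x1 - 0.5) ** 2 + (x2 - 0.5) ** 2
    return (f1, f2), (g1, g2)
```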
DZDT1
$$f_1(y) = y_1, \tag{A.89}$$

$$f_2(y) = g(y)\left(1 - \sqrt{\frac{y_1}{g(y)}}\right), \tag{A.90}$$

$$g(y) = 1 + \frac{9\sum_{i=2}^{n} y_i}{n - 1}, \tag{A.91}$$

$$t = \frac{f_c}{FES_c}, \tag{A.92}$$
$$y_1 = x_1, \tag{A.93}$$

$$y_i = \frac{\left|x_i - t/n_T\right|}{H(t)}, \quad i = 2, \ldots, n, \tag{A.94}$$

$$H(t) = \max\left\{\left|1 - \frac{t}{n_T}\right|, \left|-1 - \frac{t}{n_T}\right|\right\}. \tag{A.95}$$
n = 30.
Variable bounds: x1 ∈ [0, 1], xi ∈ [−1, 1], i = 2, . . . , n.
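The following sketch evaluates DZDT1 under the transformation (A.92)–(A.95); the exact rounding of the time index t is not visible in the source, so it is kept continuous here, and the argument names f_c, FES_c, and n_T simply mirror the notation above.

```python
import numpy as np

def dzdt1(x, f_c, FES_c, n_T):
    """DZDT1 sketch: map x to y with the time index t, then
    evaluate the ZDT1-type objectives (A.89)-(A.91)."""
    x = np.asarray(x, dtype=float)
    t = f_c / FES_c
    H = max(abs(1.0 - t / n_T), abs(-1.0 - t / n_T))
    y = np.empty_like(x)
    y[0] = x[0]
    y[1:] = np.abs(x[1:] - t / n_T) / H
    f1 = y[0]
    g = 1.0 + 9.0 * np.sum(y[1:]) / (x.size - 1)
    f2 = g * (1.0 - np.sqrt(f1 / g))
    return f1, f2
```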
DZDT2
$$f_1(y) = y_1, \tag{A.96}$$

$$f_2(y) = g(y)\left(1 - \left(\frac{y_1}{g(y)}\right)^2\right), \tag{A.97}$$

$$g(y) = 1 + \frac{9\sum_{i=2}^{n} y_i}{n - 1}, \tag{A.98}$$

$$t = \frac{f_c}{FES_c}, \tag{A.99}$$

$$y_1 = x_1, \tag{A.100}$$

$$y_i = \frac{\left|x_i - t/n_T\right|}{H(t)}, \quad i = 2, \ldots, n, \tag{A.101}$$

$$H(t) = \max\left\{\left|1 - \frac{t}{n_T}\right|, \left|-1 - \frac{t}{n_T}\right|\right\}. \tag{A.102}$$
n = 30.
Variable bounds: x1 ∈ [0, 1], xi ∈ [−1, 1], i = 2, . . . , n.
DZDT3
$$f_1(x) = x_1, \tag{A.103}$$

$$f_2(x) = g(x)\left(1 - \sqrt{\frac{x_1}{g(x)}} - \frac{x_1}{g(x)}\sin(10\pi x_1)\right), \tag{A.104}$$

$$g(x) = 1 + \frac{9\sum_{i=2}^{n} x_i}{n - 1}, \tag{A.105}$$

$$t = \frac{f_c}{FES_c}, \tag{A.106}$$

$$y_1 = x_1, \tag{A.107}$$

$$y_i = \frac{\left|x_i - t/n_T\right|}{H(t)}, \quad i = 2, \ldots, n, \tag{A.108}$$

$$H(t) = \max\left\{\left|1 - \frac{t}{n_T}\right|, \left|-1 - \frac{t}{n_T}\right|\right\}. \tag{A.109}$$
n = 30.
Variable bounds: x1 ∈ [0, 1], xi ∈ [−1, 1], i = 2, . . . , n.
DZDT4
$$f_1(y) = y_1, \quad f_2(y) = g(y)\left(1 - \sqrt{\frac{y_1}{g(y)}}\right), \tag{A.110}$$

$$g(y) = 1 + 10(n - 1) + \sum_{i=2}^{n}\left[y_i^2 - 10\cos(4\pi y_i)\right], \tag{A.111}$$

$$t = \frac{f_c}{FES_c}, \tag{A.112}$$

$$y_1 = x_1, \tag{A.113}$$

$$y_i = \frac{\left|x_i - t/n_T\right|}{H(t)}, \quad i = 2, \ldots, n, \tag{A.114}$$

$$H(t) = \max\left\{\left|1 - \frac{t}{n_T}\right|, \left|-1 - \frac{t}{n_T}\right|\right\}. \tag{A.115}$$
n = 10.
Variable bounds: x1 ∈ [0, 1], xi ∈ [−1, 1], i = 2, . . . , n.
References
1. Chu PC, Beasley JE. A genetic algorithm for the multidimensional knapsack problem. J Heuris-
tics. 1998;4:63–86.
2. Deb K, Pratap A, Agarwal S, Meyarivan T. A fast and elitist multi-objective genetic algorithm:
NSGA-II. IEEE Trans Evol Comput. 2002;6(2):182–97.
3. Drezner Z. The p-center problem: heuristic and optimal algorithms. J Oper Res Soc.
1984;35(8):741–8.
4. Hopfield JJ, Tank DW. Neural computation of decisions in optimization problems. Biol Cybern.
1985;52:141–52.
5. Huband S, Barone L, While RL, Hingston P. A scalable multiobjective test problem toolkit. In:
Proceedings of the 3rd international conference on evolutionary multi-criterion optimization
(EMO), Guanajuato, Mexico, March 2005. p. 280–295.
6. Huband S, Hingston P, Barone L, While L. A review of multiobjective test problems and a
scalable test problem toolkit. IEEE Trans Evol Comput. 2006;10(5):477–506.
7. Kramer O. Self-adaptive heuristics for evolutionary computation. Berlin: Springer; 2008.
8. Matsuda S. "Optimal" Hopfield network for combinatorial optimization with linear cost func-
tion. IEEE Trans Neural Netw. 1998;9(6):1319–30.
9. Reinelt G. TSPLIB–a traveling salesman problem library. ORSA J Comput. 1991;3:376–84.
10. Suganthan PN, Hansen N, Liang JJ, Deb K, Chen Y-P, Auger A, Tiwari S. Problem defini-
tions and evaluation criteria for the CEC 2005 special session on real-parameter optimization.
Technical Report, Nanyang Technological University, Singapore, and KanGAL Report No.
2005005, Kanpur Genetic Algorithms Laboratory, IIT Kanpur, India, May 2005. http://www.
ntu.edu.sg/home/EPNSugan/.
11. Zhang Q, Zhou A, Zhao S, Suganthan PN, Liu W, Tiwari S. Multiobjective optimization
test instances for the CEC 2009 special session and competition. Technical Report CES-487,
University of Essex and Nanyang Technological University, Essex, UK/Singapore, 2008.
12. Zitzler E, Deb K, Thiele L. Comparison of multiobjective evolutionary algorithms: empirical
results. Evol Comput. 2000;8(2):173–95.
Index
A
Adaptive coding, 43
Affinity, 180
Affinity maturation process, 180
Algorithmic chemistry, 304
Allele, 40
Animal migration optimization, 243
Annealing, 29
Antibody, 180
Antigen, 180
Artificial algae algorithm, 222
Artificial fish swarm optimization, 249
Artificial immune network, 184
Artificial physics optimization, 296
Artificial selection, 41

B
Backtracking search, 58
Bacterial chemotaxis algorithm, 222
Baldwin effect, 5
Bare-bones PSO, 156
Bat algorithm, 246
Bee colony optimization, 210
Belief space, 316
Big bang big crunch, 301
Binary coding, 42
Bin packing problem, 417
Biochemical network, 267
Bit climber, 49
Black hole-based optimization, 302
Bloat phenomenon, 71
Boltzmann annealing, 31
Boltzmann distribution, 30
Building block, 123
Building-block hypothesis, 123

C
Cauchy annealing, 33
Cauchy mutation, 58
Cell-like P system, 272
Cellular EA, 128, 132
Central force optimization, 296
Chemical reaction network, 306
Chemical reaction optimization, 304
Chemotaxis, 217
Chromosome, 40
Clonal crossover, 178
Clonal mutation, 178
Clonal selection, 177, 178
Clonal selection algorithm, 180
Clone, 178
Cloud computing, 134
CMA-ES, 88
Cockroach swarm optimization, 251
Coevolution, 136
Collective animal behavior algorithm, 242
Combinatorial optimization problem, 14
Compact GA, 110
Computational temperature, 30
Constrained optimization, 359
Cooperative coevolution, 133
Crossover, 46, 56
Crowding, 351
Cuckoo search, 243
Cycle crossover, 60

D
Danger theory, 178
Darwinian model, 5
Deceptive function, 125
Deceptive problem, 126
Deme, 128, 356
E
Ecological selection, 41
Electromagnetism-like algorithm, 297
Elitism strategy, 45
Evolutionary gradient search, 85
Evolutionary programming, 83
Exchange market algorithm, 343
Exploitation/Exploration, 51

F
Firefly algorithm, 239
Fitness, 41
Fitness approximation, 139
Fitness imitation, 141
Fitness inheritance, 140
Fitness landscape, 41
Fitness sharing, 350
Flower pollination algorithm, 256
Free search, 243

G
Gaussian mutation, 57
Gene, 40
Gene expression programming, 78
Generational distance, 386
Genetic assimilation, 5
Genetic diversity, 47, 51
Genetic drift, 41
Genetic flow, 41
Genetic migration, 41
Genotype, 40
Genotype–phenotype map, 41
Glowworm swarm optimization, 238
Golden ball metaheuristic, 342
GPU computing, 135
Gradient evolution, 85
Gravitational search algorithm, 295
Gray coding, 42
Great deluge algorithm, 300
Group search optimization, 240
Grover's search algorithm, 286
Guided local search, 10

I
Immune algorithm, 180
Immune network, 178
Immune selection, 182
Immune system, 175
Imperialist competitive algorithm, 340
Individual, 40
Intelligent water drops algorithm, 299
Invasive tumor growth optimization, 224
Invasive weed optimization, 255
Inversion operator, 48
Ions motion optimization, 297
Island, 128
Island model, 130
Iterated local search, 11
Iterated tabu search, 330

J
Jumping-gene phenomenon, 50

K
Kinetic gas molecule optimization, 299
KKT conditions, 13
Knapsack problem, 416
Krill herd algorithm, 250

L
Lagrange multiplier method, 12
Lamarckian strategy, 5, 319
(λ + μ) strategy, 85
(λ, μ) strategy, 85
Large-scale mutation, 48
League championship algorithm, 342
Levy flights, 244
Lexicographic order optimization, 17
Location-allocation problem, 414
Locus, 40

M
Magnetic optimization algorithm, 298
MapReduce, 134
Markov chain analysis, 124
Marriage in honeybees optimization, 209
Master–slave model, 129