Differential Evolution
Optimization involves finding the best values of the decision variables over all possibilities. The best values would give the smallest objective function value for a minimization problem or the largest objective function value for a maximization problem. In terms of real-world applications, the objective function is often a representation of some physically significant measure such as profit, loss, utility, risk or error.
Evolution in Biology I
Organisms produce a number of offspring similar to themselves, but the offspring can show variations due to mutations (random changes).
Evolution in Biology II
Some offspring survive and produce the next generation, and some don't. Organisms that are better adapted to the environment have a higher chance of surviving. Over time, the generations become more and more adapted because the fittest organisms survive.
Differential Evolution
Population-based optimisation algorithm
Introduced by Storn and Price in 1996. Many practical problems have objective functions that are non-differentiable, hence gradient-based methods cannot be employed. DE was developed to optimise mixed integer-discrete-continuous non-linear programming problems, as opposed to the purely continuous non-linear programming handled by most algorithms.
Some other optimisation algorithms use predetermined probability density functions (PDFs) for parameter perturbation. These PDFs have to be adapted throughout the optimisation, a process that adds complexity to the procedure. DE is self-adjusting, as it deduces the perturbation information from the distances between the vectors that comprise the population.
Problem
The population
A population is a congregation of individuals. An important control parameter of DE is the population size, NP.
Initialisation
Mutation
Cross-over
Selection
During initialisation, random values are assigned to the parameter vectors xi,j. The random number generator, randj[0,1), returns a uniformly distributed random number from within the range [0,1), i.e., 0 ≤ randj[0,1) < 1. The subscript j indicates that a new random value is generated for each parameter.
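As an illustration, a minimal sketch of this initialization in Python (NumPy assumed; the function name and arguments are illustrative, not part of the original material):

```python
import numpy as np

def initialize_population(NP, D, bL, bU, rng=None):
    """Create NP parameter vectors of dimension D, uniformly random in [bL, bU)."""
    rng = np.random.default_rng() if rng is None else rng
    rand = rng.random((NP, D))      # rand_j[0,1), drawn anew for every parameter
    return bL + rand * (bU - bL)    # scale each random number into the search range
```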
DE Mutation Operator
DE generates new parameter vectors by adding the weighted difference between two parameter vectors to a third vector. For each target vector xi,G, i = 1, 2, ..., NP, a mutant vector is generated according to
vi,G+1 = xr1,G + F · (xr2,G − xr3,G)
where vi,G+1 is the mutant vector; r1, r2, r3 ∈ {1, 2, ..., NP} are random integers, mutually different and different from the running index i (which requires NP ≥ 4); and F is a real, constant factor in (0, 1+] that controls the amplification of the differential variation, with a typical value of 0.8-0.9.
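A hedged sketch of this DE/rand/1 mutation step (Python with NumPy; the helper name and default F are assumptions):

```python
import numpy as np

def mutate(pop, i, F=0.8, rng=None):
    """Mutant vector v = x_r1 + F * (x_r2 - x_r3), with r1, r2, r3 distinct and != i."""
    rng = np.random.default_rng() if rng is None else rng
    candidates = [r for r in range(len(pop)) if r != i]   # exclude the target index
    r1, r2, r3 = rng.choice(candidates, size=3, replace=False)
    return pop[r1] + F * (pop[r2] - pop[r3])
```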
Crossover
The classic variant of diversity enhancement is crossover, which mixes parameters of the mutation vector vi,g and the so-called target vector xi,g in order to generate the trial vector ui,g. The most common form of crossover is uniform (binomial) and is defined as
uj,i,g = vj,i,g if randj[0,1) ≤ Cr or j = jrand, and uj,i,g = xj,i,g otherwise,
where the crossover probability Cr controls the fraction of parameter values that are copied from the mutant, and the index jrand guarantees that at least one parameter is taken from the mutant.
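A possible implementation of this uniform (binomial) crossover (Python sketch; the forced index j_rand follows the standard convention of ensuring the trial vector differs from the target, and the parameter name Cr is assumed):

```python
import numpy as np

def crossover(target, mutant, Cr=0.9, rng=None):
    """Copy each parameter from the mutant with probability Cr, otherwise keep the target's."""
    rng = np.random.default_rng() if rng is None else rng
    D = len(target)
    mask = rng.random(D) < Cr          # which parameters come from the mutant
    mask[rng.integers(D)] = True       # j_rand: at least one parameter from the mutant
    return np.where(mask, mutant, target)
```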
Selection
DE uses simple one-to-one survivor selection, where the trial vector ui,g competes against the target vector xi,g. The vector with the lower objective function value survives into the next generation g+1.
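The one-to-one survivor selection can be written in a few lines (sketch; ties here go to the trial vector, which is one common convention):

```python
def select(target, trial, f):
    """Keep whichever of the target and trial vectors has the lower objective value."""
    return trial if f(trial) <= f(target) else target
```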
Termination criteria
- a preset number of maximum generations is reached,
- the difference between the best and worst function values in the population is very small,
- the best function value has not improved beyond some tolerance value for a predefined number of generations,
- the distance between solution vectors in the population is small.
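A sketch of how such termination tests might be combined (Python; the tolerance values are placeholders, not prescribed settings):

```python
import numpy as np

def should_terminate(gen, max_gen, f_values, pop, f_tol=1e-8, x_tol=1e-8):
    """Illustrative termination checks for a DE run."""
    if gen >= max_gen:                                 # preset maximum number of generations
        return True
    if np.max(f_values) - np.min(f_values) < f_tol:    # best-worst spread is very small
        return True
    if np.max(np.ptp(pop, axis=0)) < x_tol:            # population has collapsed in space
        return True
    return False
```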
Typical initial settings are F = 0.8-0.9 and pc = 0.9. However, F = 0.5 and pc = 0.1 are also claimed to be a good first choice.
Population size NP, mutation intensity F and crossover probability pc are the main control parameters. Increasing the population size is suggested if differential evolution does not converge; however, this only helps to a certain extent. The mutation intensity has to be adjusted a little lower or higher than 0.8 if the population size is increased. It is claimed that convergence is more likely to occur, but generally takes longer, with a larger population and weaker mutation (smaller F).
DE dynamics
There are two opposing mechanisms that influence DE's population: 1) the population's tendency to expand and explore the terrain on which it is working, which also prevents the population from converging prematurely; and 2) selection, which counteracts expansion by removing vectors located in unproductive regions.
Sample problem
f2(x) = 100 · (x2 − x1²)² + (1 − x1)², with xj ∈ [−2.048, 2.048] and f*(x) = 0.0.
This is a two-variable problem, so D = 2. The demonstration covers the first two generations, so g = 2. We take a population size of 5, so Np = 5, with xL = −2.048 and xU = 2.048.
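For concreteness, the objective function and the problem settings of the demonstration can be written as (Python sketch; names are illustrative):

```python
def f2(x):
    """Two-variable Rosenbrock function: 100*(x2 - x1^2)^2 + (1 - x1)^2."""
    x1, x2 = x
    return 100.0 * (x2 - x1 ** 2) ** 2 + (1.0 - x1) ** 2

D, Np = 2, 5               # dimension and population size used in the demonstration
bL, bU = -2.048, 2.048     # search bounds for both variables
```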
Step 1: Initialization
bL = −2.048, bU = 2.048
Pop   rand 1   rand 2   Var 1    Var 2
1     0.6557   0.0357   0.6379   -1.9017
2     0.8491   0.9340   1.4300   1.7776
3     0.6787   0.7577   0.7321   1.0557
4     0.7431   0.3922   0.9959   -0.4414
5     0.6555   0.1712   0.6368   -1.3468
The random numbers randj[0,1) are scaled into the search range via xi,j = bL + randj · (bU − bL).
Example of Mutation
vi,G+1 = xr1,G + F · (xr2,G − xr3,G), with F = 0.9
Pop   Var 1    Var 2
1     0.6379   -1.9017
2     1.4300   1.7776
3     0.7321   1.0557
4     0.9959   -0.4414
5     0.6368   -1.3468
Taking xr1 = 1.4300, xr2 = 0.6368 and xr3 = 0.9959 (the Var 1 components of individuals 2, 5 and 4), the mutant component is v = 1.4300 + 0.9 · (0.6368 − 0.9959) = 1.10681.
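The arithmetic of this mutation step can be checked directly (variable names mirror the example):

```python
F = 0.9
x_r1, x_r2, x_r3 = 1.4300, 0.6368, 0.9959
v = x_r1 + F * (x_r2 - x_r3)
print(round(v, 5))   # 1.10681
```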
Generation 2
The function values of the previous generation are: 533.1020893, 7.329829, 27.08365508, 205.4110593, 307.1924338 (for individuals 1-5 respectively).
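These values can be reproduced (to display precision) by evaluating the Rosenbrock function on the initial population listed above:

```python
f2 = lambda x: 100.0 * (x[1] - x[0] ** 2) ** 2 + (1.0 - x[0]) ** 2

pop_gen1 = [(0.6379, -1.9017), (1.4300, 1.7776), (0.7321, 1.0557),
            (0.9959, -0.4414), (0.6368, -1.3468)]
print([round(f2(x), 4) for x in pop_gen1])
# [533.1021, 7.3298, 27.0837, 205.4111, 307.1924]
```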
Cross over
Var 2 components of the trial vectors after crossover:
Pop   Var 2
1     -0.9729
2     -1.9283
3     0.9434
4     -0.4414
5     -0.1686
Selection
f[u(i,2)], the objective values of the trial vectors: 1493.445845, 1189.330898, 459.4931906, 11.29843767, 180.2946151.
Each of these is compared against f[x(i,2)], the objective value of the corresponding target vector, and the better of each pair survives into the next generation.
The generalized Rosenbrock function is
f2(x) = Σ (j = 1 to D/2) [100 · (x2j − x2j−1²)² + (1 − x2j−1)²],
which for two variables reduces to
f2(x) = 100 · (x2 − x1²)² + (1 − x1)².
It is a difficult problem because the global optimum lies inside a long, narrow, parabolic-shaped flat valley.
[Figures: contour plots of f2 over x1, x2 ∈ [−5, 5] for generations 1, 3, 5, 10, 20, 27, 40, 50 and 60. For each generation, one plot shows the population when the mutant is generated using three random vectors, and the other when it is generated using two random vectors and a deterministic vector corresponding to fbest.]
Final solutions obtained by the two strategies:
x1 = 0.9997, x2 = 0.9995
x1 = 0.4367, x2 = 0.7395
Inequality Constraints
Most often, inequality constraints are implemented as penalty functions. One common way to integrate constraint violations into an optimization task is to multiply each penalty by a weight, wm, and add the result to the objective function, f(x):
fp(x) = f(x) + Σm wm · pm(x),
where pm(x) is the penalty incurred by violating the m-th constraint (zero when the constraint is satisfied).
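A small sketch of this weighted-penalty construction (Python; constraints are assumed to be functions g_m with g_m(x) <= 0 when satisfied, and all names are illustrative):

```python
def penalized_objective(f, constraints, weights):
    """Return f(x) plus a weighted penalty for every violated inequality constraint."""
    def fp(x):
        penalty = sum(w * max(0.0, g(x)) for g, w in zip(constraints, weights))
        return f(x) + penalty
    return fp
```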
Individual penalty terms may differ by many orders of magnitude, leaving violations with small penalties underrepresented until those that generate large penalties become just as small. When there are many constraints, the main drawback of the penalty approach is that the pre-specified weights must be well chosen to keep the population from converging upon either infeasible or non-optimal vectors.
Otherwise, one penalty will dominate unless the weights are correctly adjusted. In addition, the population can converge upon an infeasible region if its objective function values there are much lower than those in feasible regions. It can even happen that no set of weights will work. Because weight selection tends to be a trial-and-error optimization problem in its own right, simpler direct constraint-handling methods have been designed that do not require the user to tune penalty weights.
With direct constraint handling, the target vector xi,g is replaced by the trial vector ui,g if:
1. ui,g satisfies all constraints and has a lower or equal objective function value than xi,g, or
2. ui,g is feasible and xi,g is not, or
3. ui,g and xi,g are both infeasible, but ui,g does not violate any constraint more than xi,g.
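These rules can be expressed compactly as a replacement test (Python sketch; constraints are again taken as functions g_m with g_m(x) <= 0 when satisfied, and the helper name is illustrative):

```python
def prefer_trial(trial, target, f, constraints):
    """True if the trial vector should replace the target under the rules above."""
    vu = [max(0.0, g(trial)) for g in constraints]    # constraint violations of the trial
    vx = [max(0.0, g(target)) for g in constraints]   # constraint violations of the target
    u_feasible = all(v == 0.0 for v in vu)
    x_feasible = all(v == 0.0 for v in vx)
    if u_feasible and x_feasible:
        return f(trial) <= f(target)                  # rule 1
    if u_feasible:
        return True                                   # rule 2
    if not x_feasible:
        return all(a <= b for a, b in zip(vu, vx))    # rule 3
    return False                                      # target feasible, trial not
```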
Equality Constraints
Equality constraints can be written in the form hm(x) = 0, m = 1, 2, ....
Eliminating variables
Eliminating an objective function variable with an equality constraint not only ensures that all vectors satisfy the constraint, but also reduces the problem's dimension by 1. In practice, not all equality constraint equations can be solved for a term that also appears in the objective function. When an equality constraint cannot be used to eliminate a variable, it can be recast as a pair of inequality constraints.
Only rarely will a solution exactly satisfy an equality constraint. Otherwise, the finite precision of floating-point number formats limits the degree to which an equality constraint can be satisfied. It is more reasonable, therefore, to demand that an equality constraint violation be less than ε, where ε can be made as small as desired: |hm(x)| ≤ ε.
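One simple way to realise this in practice is to recast the equality constraint as two inequality constraints with a tolerance (Python sketch; eps and the helper name are illustrative):

```python
def as_inequalities(h, eps=1e-6):
    """Recast h(x) = 0 as |h(x)| <= eps, i.e. the pair h(x) - eps <= 0 and -h(x) - eps <= 0."""
    return [lambda x: h(x) - eps, lambda x: -h(x) - eps]
```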
Penalty Approach
Like inequality constraints, equality constraints can be transformed into cost terms that penalize the objective function. Either the absolute value or the square of hm(x) makes a suitable penalty. When all constraints are fulfilled, the penalty term becomes zero:
fp(x) = f(x) + Σm wm · hm(x)²  (or Σm wm · |hm(x)|).
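A minimal sketch of this penalty construction for equality constraints (Python; the squared form is used here, the absolute value works equally well, and all names are illustrative):

```python
def equality_penalized(f, equalities, weights):
    """Return f(x) plus w_m * h_m(x)^2 for each equality constraint h_m(x) = 0."""
    def fp(x):
        return f(x) + sum(w * h(x) ** 2 for h, w in zip(equalities, weights))
    return fp
```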