Differential Evolution

Download as ppt, pdf, or txt
Download as ppt, pdf, or txt
You are on page 1of 67

Introduction

Optimization is viewed as a decision problem that

involves nding the best values of the decision variables over all possibilities. The best values would give the smallest objective function value for a minimization problem or the largest objective function value for a maximization problem. In terms of real world applications, the objective function is often a representation of some physically signicant measure such as prot, loss, utility, risk or error.

Evolution in Biology

Evolution in Biology I
Organisms produce a number of offspring similar to

themselves but can have variations due to: Mutations (random changes)

Sexual reproduction (offspring have combinations of


features inherited from each parent)

Evolution in Biology II
Some offspring survive, and produce next generations, and some

dont: The organisms adapted to the environment better have higher chance to survive Over time, the generations become more and more adapted because the fittest organisms survive

Dierential Evolution
Population-based optimisation algorithm
Introduced by Storn and Price in 1996 But many practical problems have objective functions

that are non-dierentiable, hence gradient based methods cannot be employed Developed to optimise mixed integer-discretecontinuous non-linear programming as against the continuous non-linear programming which is done by most of the algorithms.

Why use Dierential Evolution?


All evolutionary algorithms use predefined

probability density functions(PDFs) for parameter perturbation. These PDFs have to be adapted throughout the optimization by a process that imposes additional complexity to the optimization procedures. DE is a self-adjusting as it deduces the perturbation information from the distances between the vectors that comprise the population.

Problem

The population
A population is a congregation of individuals. An important

characteristic feature of a population is its age, expressed in terms of generation.

where Np is the size of population denoted by i g defines the generation counter

and D the dimensionality, i.e. the number of variables in X.


E.g. X=[x1;x2;x3] =>D=3, we take a population size of 5 =>Np=5 and

suppose we wish to see the results for 20 generations => g=20

Basic steps of the algorithm

Initialisation

Mutation

Cross-over

Selection

Initialization of the population


X is a D-dimensional initialization vectors, bL and bU indicate the lower and upper bounds of the

parameter vectors xi,j. The random number generator, randj[0,1), returns a uniformly distributed random number from within the range [0,1), i.e., 0 randj[0,1) < 1. The subscript, j, indicates that a new random value is generated for each parameter.

DE Mutation Operator
DE generates new parameter vectors by adding the weighted

difference between two parameter vectors to a third vector For each target vector xi,G, i = 1,2,,NP, a mutant vector is generated according to: ui,G+1 = xr1,G + F (xr2,G xr3,G) where ui,G+1 is a mutant vector; r1,r2,r3 {1, 2, , NP}, random integer mutually different and different from index i then NP4 F is a real and constant factor (0, 1+] which controls the amplification of the differential variation; typical value is 0.8-0.9

DE Mutation Operator

Crossover
The classic variant of diversity enhancement is crossover

which mixes parameters of the mutation vector vi,g and the so-called target vector xi,g in order to generate the trial vector ui,g. The most common form of crossover is uniform and is defined as

The crossover probability, Cr [0,1], is a user-defined value that

controls the fraction of parameter values that are copied from the mutant.

Illustration of the crossover process for D=4.

Selection
DE uses simple one-to-one survivor selection where

the trial vector ui,g competes against the target vector xi,g. The vector with the lowest objective function value survives into the next generation g+1

Termination criteria
a preset number of maximum generations,
the difference between the best and worst function

values in the population is verysmall, the best function value has not improved beyond some tolerance value for a predened number of generations, the distance between solution vectors in the population is small.

Intrinsic Control Parameters of Differential Evolution


population size Np; 2. mutation intensities Fy 3. crossover probability pc
1.
First Choice The originators recommend Np/N=10, F=0.8, and pc

=0.9. However, F=0.5 and pc=0.1 are also claimed to be a good rst choice.

Adjusting Intrinsic Control Parameters


Different problems usually require different intrinsic

control parameters. Increasing the population size is suggested if differential evolution does not converge. However, this only helps to a certain extent. Mutation intensity has to be adjusted a little lower or higher than 0.8 if population size is increased. It is claimed that convergence is more likely to occur but generally takes longer with larger population and weaker mutation (smaller F).

DE dynamics
There are two opposing mechanisms that influence

DEs population. 1) Population tendency to expand and explore the terrain on which it is working. It also prevents DEs population from converging prematurely. 2)By selection, expansion is counteracted by removing vectors which are located in unproductive regions.

Sample problem
f 2 (x) y (1 - x1).^2 100 * (x2 - x1.^2).^2 with x j [2.048, 2.048] and f * (x) 0.0.

Here it is a two variable problem, so D=2 The demonstration is for first 2 generations so g=2 We take a population size of 5 so Np=5 xL=-2.048 xU=2.048

Step 1:Initialization
bL=-2.048 bU=2.048

Pop Var 1 Var 2

1 0.6557 0.0357

2 0.8491 0.9340

3 0.6787 0.7577

4 0.7431 0.3922

5 0.6555 0.1712

Random numbers are generated for both the variables

for a population size of 5

Rand()*( bU-bL ) is generated


Pop Var 1 Var 2 1 2.6859 0.1463 2 3.4780 3.8256 3 2.7801 3.1037 4 3.0439 1.6066 5 2.6848 0.7012

Initial vectors are then generated x = bL + Rand()*( bU-bL )


Pop Var 1 1 0.6379 2 1.4300 3 0.7321 4 0.9959 5 0.6368

Var 2

-1.9017

1.7776

1.0557

-0.4414

-1.3468

Example of Mutation
ui,G+1 = xr1,G + F (xr2,G xr3,G) F=0.9
Pop Var 1 Var 2 1 0.6379 -1.9017 2 1.4300 1.7776 3 0.7321 1.0557 4 0.9959 -0.4414 5 0.6368 -1.3468

E.g. For 1st variable and first generation Xr1=1.4300

Xr2=0.6368
Xr3=0.9959 Trial vector U=1.4300 + 0.9 *(0.6368-0.9959)=1.10681

Since it is the first generation so cross-over and

selection is not done

Generation 2
Function value of the previous generation is
533.1020893 7.329829 27.08365508 205.4110593 307.1924338

New vectors are generated


Pop Var 1 Var 2 1 -1.6977 -0.9729 2 1.2330 -1.9283 3 1.7566 0.9434 4 -0.0467 0.3216 5 -1.0761 -0.1686

Cross over

Random numbers generated for comparison of cross-over


0.1239 0.4904 0.8530 0.2085 0.2703 0.5650 0.6403 0.8739 0.4170 0.2060

Let Cr=0.8 Old population

Current population x(i,2)

Crossed population u(i,2)


Pop Var 1 1 -1.6977 2 1.4300 3 1.7566 4 -0.0456 5 -1.0761

Var 2

-0.9729

-1.9283

0.9434

-0.4414

-0.1686

Selection
f [u (i,2)]
1493.445845 1189.330898 459.4931906 11.29843767 180.2946151

f [x (i,2)]

1493.445845 1578.816724 459.4931906 20.77198149 180.2946151

Clearly u(i,2) has better objective function values so

the entire u(i,2) is the x(i,3) for the next generation.

DE Algorithm for Rosenbrock function


Minimize Rosenbrock function:

f 2 (x) [100(x
j 1

D/ 2

2 2 j 1

x2 j ) (1 x2 j 1 ) ]
2 2

with x j [2.048, 2.048] and f * (x) 0.0.


if D = 2, then f * (x11, x2 1) 10-6 Rosenbrock function D = 2 is a difficult minimization

problem because the convergence to the global optimum is inside a long, narrow, parabolic shaped flat valley

Rosenbrock Surface Plotting, D=2

Experiments and Results


DE/rand/1/bin scheme is used

NP=5, F=0.9, CR=0.8 Results:


Total time for 20 test runs (ms) = 156 Eval. (mean) = 108.750 (vs 654) Best min fitness value = 8.56340e-08 Best member:

var(0) = 0.9997081358509485 var(1) = 0.999414237117633

The Difference Vector Distribution


All difference vectors have both a negative counterpart and

an equal chance of being chosen, their distributions mean is zero.

The effects of scaling a, and large vector differences b

For clarity, the difference vector distribution plot only

shows the difference vector endpoints

DE Algorithm for Rosenbrock function


Minimize Rosenbrock function:

f 2 (x) [100( x2 x ) (1 x1 ) ]
2 2 1 2

with x j [5, 5] and f * (x) 0.0.


if D = 2, then f * (x11, x2 1) 10-6 Rosenbrock function D = 2 is a difficult minimization

problem because the convergence to the global optimum is inside a long, narrow, parabolic shaped flat valley

Rosenbrock Surface Plotting, D=2

ui,G+1 = xr1,G + F (xr2,G xr3,G)

Gen 1
Contour plot generated using three random vectors
5 4 3 2 1 0 -1 -2 -3 -4 -5 -5

Contour plot generated using two random and a deterministic vector corresponding to fbest
5 4 3 2 1 0 -1 -2 -3 -4 -5 -5

-4

-3

-2

-1

-4

-3

-2

-1

Gen 3
Contour plot generated using three random vectors
5 4 3 2 1 0 -1 -2 -3 -4 -5 -5

Contour plot generated using two random and a deterministic vector corresponding to fbest
5 4 3 2 1 0 -1 -2 -3 -4 -5 -5

-4

-3

-2

-1

-4

-3

-2

-1

Gen 5
Contour plot generated using three random vectors
5 4 3 2 1 0 -1 -2 -3 -4 -5 -5

Contour plot generated using two random and a deterministic vector corresponding to fbest
5 4 3 2 1 0 -1 -2 -3 -4 -5 -5

-4

-3

-2

-1

-4

-3

-2

-1

Gen 10
Contour plot generated using three random vectors
5 4 3 2 1 0 -1 -2 -3 -4 -5 -5

Contour plot generated using two random and a deterministic vector corresponding to fbest
5 4 3 2 1 0 -1 -2 -3 -4 -5 -5

-4

-3

-2

-1

-4

-3

-2

-1

Gen 20
Contour plot generated using three random vectors
5 4 3 2 1 0 -1 -2 -3 -4 -5 -5

Contour plot generated using two random and a deterministic vector corresponding to fbest
5 4 3 2 1 0 -1 -2 -3 -4 -5 -5

-4

-3

-2

-1

-4

-3

-2

-1

Gen 27
Contour plot generated using three random vectors
5 4 3 2 1 0 -1 -2 -3 -4 -5 -5

Contour plot generated using two random and a deterministic vector corresponding to fbest
5 4 3 2 1 0 -1 -2 -3 -4 -5 -5

-4

-3

-2

-1

-4

-3

-2

-1

Gen 40
Contour plot generated using three random vectors
5 4 3 2 1 0 -1 -2 -3 -4 -5 -5

Contour plot generated using two random and a deterministic vector corresponding to fbest
5 4 3 2 1 0 -1 -2 -3 -4 -5 -5

-4

-3

-2

-1

-4

-3

-2

-1

Gen 50
Contour plot generated using three random vectors
5 4 3 2 1 0 -1 -2 -3 -4 -5 -5

Contour plot generated using two random and a deterministic vector corresponding to fbest
5 4 3 2 1 0 -1 -2 -3 -4 -5 -5

-4

-3

-2

-1

-4

-3

-2

-1

Gen 60
Contour plot generated using three random vectors
5 4 3 2 1 0 -1 -2 -3 -4 -5 -5

Contour plot generated using two random and a deterministic vector corresponding to fbest
5 4 3 2 1 0 -1 -2 -3 -4 -5 -5

-4

-3

-2

-1

-4

-3

-2

-1

X1=0.9997; x2=0.9995

X1=0.4367; x2=0.7395

Optimization with Constraints


A general formulation for constrained optimization is

Inequality Constraints
Most often, inequality constraints are implemented as

penalty funcions. One common way to integrate constraint viola-tions into an optimization task is to multiply each penalty by a weight, wm, and add the result to the objective function, f(x):

Importance of adding weights


Weights help normalize all penalties to the same range.
Without normalization, penalty function contributions

may differ by many orders of magnitude, leaving violations with small penalties underrepresented until those that generate large penalties become just as small. When there are many constraints, the main drawback of the penalty approach is that pre-specified weights must be well chosen to keep the population from converging upon either infeasible or non-optimal vectors.

Drawbacks of Penalty functions


Schemes that sum penalty functions run the risk that

one penalty will dominate unless weights are correctly adjusted. In addition, the population can converge upon an infeasible region if its objective function values are much lower than those in feasible regions. It can even happen that no set of weights will work. Because weight selection tends to be a trial and error optimization problem in its own right, simpler direct constraint handling methods have been designed that do not require the user to tune penalty weights.

Direct Constraint Handling


In simple terms, Lampinens criterion selects the trial

vector ui,g if: 1. ui,g satisfies all constraints and has a lower or equal objective function value than xi,g, or 2. ui,g is feasible and xi,g is not, or 3. ui,g and xi,g are both infeasible, but ui,g does not violate any constraint more than xi,g.

Lampinens selection criterion

Equality Constraints
equality constraints can be written in the form

If circumstances permit, the best way to handle equality

constraints is to use them to eliminate variables from the objective function.

Eliminating variables ??
Eliminating an objective function variable with an

equality constraint not only ensures that all vectors satisfy the constraint, but also reduces the problems dimension by 1. In pactice, not all equality constraint equations can be solved for a term that also appears in the objective function. When it cannot be used to eliminate a variable, an equality constraint can be recast as a pair of inequality constraints.

Transformation into Inequality Constraints


Eliminating a variable is the only way to ensure that a

solution exactly satisfies an equality constraint. Otherwise, the finite precision of floating-point number formats limits the degree to which an equality constraint can be satisfied. It is more reasonable, therefore, to demand that an equality constraint violation be less than , where can be made as small as desired:

Penalty Approach
Like inequality constraints, equality constraints can be

transformed into cost terms that penalize the objective function. Equation 4.29 shows that either the absolute value or the square of n(x) makes a suitable penalty. When all constraints are fulfilled, the penalty term, like , becomes zero:

You might also like