
Differential Evolution: Foundations, Perspectives, and Applications

By

Swagatam Das 1 and P. N. Suganthan 2

1 Dept of Electronics and Telecommunication Engg., Jadavpur University

Kolkata 700032, India

2 School of Electrical and Electronic Engineering, Nanyang Technological University,

Singapore 639798, Singapore

SSCI 2011

Part – I (By Dr. S. Das)

Topics to be Covered

• Metaheuristics, multi-agent search, and DE

• Steps of the basic DE family of algorithms – a first look

• Control parameters of DE

• Some significant single-objective DE variants for real-parameter optimization

• A stochastic mathematical model of the DE population and convergence analysis

Part – II (By Dr. P. N. Suganthan)

• Ensemble Strategies in DE (EPSDE)

• Constraint Handling in DE and ECHT-DE

• DE for Multi-objective Optimization

• DE for Large Scale Optimization

• DE for Multi-modal Optimization

Meta-heuristics

A metaheuristic is a heuristic method for

solving a very general class of

computational problems by combining

user-given black-box procedures —

usually heuristics themselves — in

the hope of obtaining a more efficient or

more robust procedure. The name combines

the Greek prefix "meta" ("beyond", here

in the sense of "higher level") and "heuristic"

(from ευρισκειν, heuriskein, "to find").

How can a single agent find global optima by following gradient descent?

Each step moves along the direction of the negative gradient:

$$\vec{X}(n+1) = \vec{X}(n) - \mu \cdot \left. \nabla f(\vec{X}) \right|_{\vec{X} = \vec{X}(n)}$$

But what about multi-modal, noisy, and even discontinuous functions?

Gradient-based methods get trapped in a local minimum, or the function itself may be non-differentiable.

Multi-Agent Optimization in Continuous Space

Figure: (left) Randomly initialized agents. (right) Agents after convergence: most agents are near the global optima.

Differential Evolution

• A stochastic population-based algorithm for

continuous function optimization (Storn and Price,

1995)

• Finished 3rd at the First International Contest on

Evolutionary Computation, Nagoya, 1996

(icsi.berkley.edu/~storn)

• Outperformed GA and PSO on a 34-function test

suite (Vesterstrom & Thomsen, 2004)

• Continually exhibited remarkable performance in

competitions on different kinds of optimization

problems like dynamic, multi-objective, constrained,

and multi-modal problems held under IEEE congress

on Evolutionary Computation (CEC) conference

series.

DE is an Evolutionary Algorithm

This Class also includes GA, Evolutionary

Programming and Evolutionary Strategies

Basic steps of an Evolutionary Algorithm:

Initialization → Mutation → Recombination → Selection

Representation

$\vec{X} = [x_1, x_2, \ldots, x_{D-1}, x_D]$, with each component lying between Min and Max.

Solutions are represented as vectors of size D with each value taken from some domain.

We may wish to constrain the values taken in each domain above and below.

Maintain Population - NP

We will maintain a population of size NP:

$\vec{X}_1 = [x_{1,1}, x_{2,1}, \ldots, x_{D-1,1}, x_{D,1}]$

$\vec{X}_2 = [x_{1,2}, x_{2,2}, \ldots, x_{D-1,2}, x_{D,2}]$

...

$\vec{X}_{NP} = [x_{1,NP}, x_{2,NP}, \ldots, x_{D-1,NP}, x_{D,NP}]$

Initialization (step 1 of 4)

Each component of $\vec{X}_{i,0} = [x_{1,i,0}, x_{2,i,0}, \ldots, x_{D-1,i,0}, x_{D,i,0}]$ is placed between Min and Max (example values: 0.42, 0.22, ..., 0.78, 0.83):

$$x_{j,i,0} = x_{j,\min} + \mathrm{rand}_{i,j}[0,1] \cdot (x_{j,\max} - x_{j,\min})$$

Different values of $\mathrm{rand}_{i,j}[0,1]$ are instantiated for each i and j.
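The initialization rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `init_population`, the list-based vector representation, and the bounds used in the usage line are all hypothetical choices, not part of the slides.

```python
import random

def init_population(np_size, x_min, x_max, rng=None):
    """Return np_size vectors with x[j] = x_min[j] + rand * (x_max[j] - x_min[j])."""
    rng = rng or random.Random()
    dim = len(x_min)
    return [
        # A fresh uniform random number is drawn for every (i, j) pair.
        [x_min[j] + rng.random() * (x_max[j] - x_min[j]) for j in range(dim)]
        for _ in range(np_size)
    ]

# Hypothetical usage: NP = 5 vectors in the box [-10, 10] x [-10, 10].
population = init_population(5, [-10.0, -10.0], [10.0, 10.0], random.Random(0))
```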

Mutation (step 2 of 4)

• For each vector, select three other parameter vectors randomly.

• Add the weighted difference of two of the parameter vectors to the third to form a donor vector (the most commonly seen form of DE-mutation):

$$\vec{V}_{i,G} = \vec{X}_{r_1^i,G} + F \cdot (\vec{X}_{r_2^i,G} - \vec{X}_{r_3^i,G})$$

• The scaling factor F is a constant from (0, 2).

Figure: Example of formation of the donor vector over two-dimensional constant cost contours (constant cost contours of the Sphere function).

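Recombination (step 3 of 4)

Binomial (Uniform) Crossover:

Components of the donor vector enter into the trial offspring vector in the following way:

Let $j_{rand}$ be a randomly chosen integer in {1, ..., D}. Then

$$u_{j,i,G} = \begin{cases} v_{j,i,G}, & \text{if } \mathrm{rand}_{i,j}[0,1) \le Cr \text{ or } j = j_{rand} \\ x_{j,i,G}, & \text{otherwise} \end{cases}$$

Figure: An illustration of binomial crossover in 2-D parametric space, showing the three possible trial vectors.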
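A minimal sketch of binomial crossover follows, assuming the commonly used rule that component j is taken from the donor when a fresh uniform random number does not exceed Cr or when j equals j_rand. The function name and the 0-based indexing are illustrative choices, not taken from the slides.

```python
import random

def binomial_crossover(target, donor, cr, rng=None):
    """Build a trial vector; j_rand guarantees at least one donor component."""
    rng = rng or random.Random()
    dim = len(target)
    j_rand = rng.randrange(dim)  # 0-based counterpart of j_rand in {1, ..., D}
    return [
        donor[j] if (rng.random() <= cr or j == j_rand) else target[j]
        for j in range(dim)
    ]

# Hypothetical usage on a 3-dimensional pair of vectors.
trial = binomial_crossover([1.0, 2.0, 3.0], [10.0, 20.0, 30.0], 0.5, random.Random(1))
```

With Cr = 1 the trial vector is the donor itself, while with Cr = 0 only the component at j_rand comes from the donor.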

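Exponential (two-point modulo) Crossover:

First choose integers n (as starting point) and L (number of components the donor actually contributes to the offspring) from the interval [1, D]. The trial vector is then

$$u_{j,i,G} = \begin{cases} v_{j,i,G}, & \text{for } j = \langle n \rangle_D, \langle n+1 \rangle_D, \ldots, \langle n+L-1 \rangle_D \\ x_{j,i,G}, & \text{for all other } j \in [1, D] \end{cases}$$

where the angular brackets $\langle \cdot \rangle_D$ denote a modulo function with modulus D.

Pseudo-code for choosing L:

L = 0; DO { L = L + 1; } WHILE ((rand(0,1) ≤ Cr) AND (L < D));

Example: Let us consider the following pair of target and donor vectors:

$\vec{X}_{i,G} = [3.82, 4.78, -9.34, 5.36, -3.77]^T$

$\vec{V}_{i,G} = [8.12, 10, -10, -3.22, -1.12]^T$

Suppose n = 3 and L = 3 for this specific example. Then the donor contributes components 3, 4 and 5, and the exponential crossover yields

$\vec{U}_{i,G} = [3.82, 4.78, -10, -3.22, -1.12]^T$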
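The example can be checked with a short Python sketch. The helper names are hypothetical, and the loop in `choose_L` is the commonly used way of drawing L (keep extending while a uniform random number does not exceed Cr); treat it as an assumption rather than the exact pseudo-code of the slide.

```python
import random

def choose_L(dim, cr, rng):
    """Draw L in [1, dim]: start at 1 and extend while rand <= Cr and L < dim."""
    L = 1
    while rng.random() <= cr and L < dim:
        L += 1
    return L

def exponential_crossover(target, donor, n, L):
    """Copy L consecutive donor components starting at 1-based position n, wrapping modulo D."""
    dim = len(target)
    trial = list(target)
    for k in range(L):
        j = (n - 1 + k) % dim  # 0-based index of <n + k>_D
        trial[j] = donor[j]
    return trial

x = [3.82, 4.78, -9.34, 5.36, -3.77]   # target vector from the example
v = [8.12, 10.0, -10.0, -3.22, -1.12]  # donor vector from the example
u = exponential_crossover(x, v, n=3, L=3)
print(u)  # [3.82, 4.78, -10.0, -3.22, -1.12]
```

With n = 3 and L = 3 the donor supplies components 3, 4 and 5, and the first two components stay with the target.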

Selection (step 4 of 4)

• "Survival of the fittest" principle in selection: the trial offspring vector is compared with the target vector, and the one with the better fitness is admitted to the next generation.

$$\vec{X}_{i,G+1} = \begin{cases} \vec{U}_{i,G}, & \text{if } f(\vec{U}_{i,G}) \le f(\vec{X}_{i,G}) \\ \vec{X}_{i,G}, & \text{if } f(\vec{U}_{i,G}) > f(\vec{X}_{i,G}) \end{cases}$$

An Example of Optimization by DE

Consider the following two-dimensional function:

$f(x, y) = x^2 + y^2$. The minimum is at (0, 0).

Let's start with a population of 5 candidate solutions randomly initialized in the range (-10, 10):

X 1,0 = [2, -1], X 2,0 = [6, 1], X 3,0 = [-3, 5], X 4,0 = [-2, 6], X 5,0 = [6, -7]

For the first vector X 1, randomly select three other vectors, say X 2, X 4 and X 5.

Now form the donor vector as V 1,0 = X 2,0 + F·(X 4,0 – X 5,0), with F = 0.8:

$$\vec{V}_{1,0} = \begin{bmatrix} 6 \\ 1 \end{bmatrix} + 0.8 \times \left\{ \begin{bmatrix} -2 \\ 6 \end{bmatrix} - \begin{bmatrix} 6 \\ -7 \end{bmatrix} \right\} = \begin{bmatrix} -0.4 \\ 11.4 \end{bmatrix}$$

Now we form the trial offspring vector by exchanging components of V 1,0 with the target vector X 1,0.

Let rand[0, 1) = 0.6. If we set Cr = 0.9, then since 0.6 < 0.9, u 1,1,0 = v 1,1,0 = -0.4.

Next, let rand[0, 1) = 0.95 > Cr. Hence u 1,2,0 = x 1,2,0 = -1.

So, finally, the offspring is

$$\vec{U}_{1,0} = \begin{bmatrix} -0.4 \\ -1 \end{bmatrix}$$

Fitness of parent: f(2, -1) = 2^2 + (-1)^2 = 5

Fitness of offspring: f(-0.4, -1) = (-0.4)^2 + (-1)^2 = 1.16

Hence the parent is replaced by the offspring at G = 1.

| Population at G = 0 | Fitness at G = 0 | Donor vector at G = 0 | Offspring vector at G = 0 | Fitness of offspring at G = 1 | Evolved population at G = 1 |
|---|---|---|---|---|---|
| X 1,0 = [2, -1] | 5 | V 1,0 = [-0.4, 11.4] | U 1,0 = [-0.4, -1] | 1.16 | X 1,1 = [-0.4, -1] |
| X 2,0 = [6, 1] | 37 | V 2,0 = [1.2, -0.2] | U 2,0 = [1.2, 1] | 2.44 | X 2,1 = [1.2, 1] |
| X 3,0 = [-3, 5] | 34 | V 3,0 = [-4.4, -0.2] | U 3,0 = [-4.4, -0.2] | 19.4 | X 3,1 = [-4.4, -0.2] |
| X 4,0 = [-2, 6] | 40 | V 4,0 = [9.2, -4.2] | U 4,0 = [9.2, 6] | 120.64 | X 4,1 = [-2, 6] |
| X 5,0 = [6, -7] | 85 | V 5,0 = [5.2, 0.2] | U 5,0 = [6, 0.2] | 36.04 | X 5,1 = [6, 0.2] |
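The arithmetic of this generation can be replayed with a small Python sketch. It takes the starting population (with X 5,0 = [6, -7]) and the offspring vectors as given, recomputes the donor for X 1 with F = 0.8, and applies the one-to-one selection rule; the variable names are illustrative. Evaluating the donor formula gives 1 + 0.8·13 = 11.4 for its second component.

```python
def sphere(v):
    """f(x, y) = x^2 + y^2."""
    return sum(c * c for c in v)

F = 0.8
x = {1: [2, -1], 2: [6, 1], 3: [-3, 5], 4: [-2, 6], 5: [6, -7]}

# Donor for X1 built from X2, X4 and X5: V1 = X2 + F * (X4 - X5).
v1 = [x[2][j] + F * (x[4][j] - x[5][j]) for j in range(2)]

# Offspring (trial) vectors as listed in the example.
u = {1: [-0.4, -1], 2: [1.2, 1], 3: [-4.4, -0.2], 4: [9.2, 6], 5: [6, 0.2]}

# One-to-one selection: the offspring survives only if it is not worse than its parent.
next_pop = {i: (u[i] if sphere(u[i]) <= sphere(x[i]) else x[i]) for i in x}
```

Only the fourth offspring is rejected, because its fitness (120.64) is worse than that of its parent (40).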

Figure: Locus of the fittest solution: DE working on 2D Sphere Function

Figure: Locus of the fittest solution: DE working on 2D Rosenbrock Function

Five most frequently used DE mutation schemes

"DE/rand/1":
$$\vec{V}_i(t) = \vec{X}_{r_1^i}(t) + F \cdot (\vec{X}_{r_2^i}(t) - \vec{X}_{r_3^i}(t)).$$

"DE/best/1":
$$\vec{V}_i(t) = \vec{X}_{best}(t) + F \cdot (\vec{X}_{r_1^i}(t) - \vec{X}_{r_2^i}(t)).$$

"DE/target-to-best/1":
$$\vec{V}_i(t) = \vec{X}_i(t) + F \cdot (\vec{X}_{best}(t) - \vec{X}_i(t)) + F \cdot (\vec{X}_{r_1^i}(t) - \vec{X}_{r_2^i}(t)),$$

"DE/best/2":
$$\vec{V}_i(t) = \vec{X}_{best}(t) + F \cdot (\vec{X}_{r_1^i}(t) - \vec{X}_{r_2^i}(t)) + F \cdot (\vec{X}_{r_3^i}(t) - \vec{X}_{r_4^i}(t)).$$

"DE/rand/2":
$$\vec{V}_i(t) = \vec{X}_{r_1^i}(t) + F \cdot (\vec{X}_{r_2^i}(t) - \vec{X}_{r_3^i}(t)) + F \cdot (\vec{X}_{r_4^i}(t) - \vec{X}_{r_5^i}(t)).$$

The general convention used for naming the various mutation strategies is DE/x/y/z, where DE stands for Differential Evolution, x represents a string denoting the vector to be perturbed, y is the number of difference vectors considered for perturbation of x, and z stands for the type of crossover being used (exp: exponential; bin: binomial).
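The five mutation schemes differ only in the base vector and in the number of difference vectors, which a small Python sketch makes explicit. The function names, the string keys and the requirement of at least six population members (so that five distinct indices other than i can be drawn) are assumptions made for this illustration.

```python
import random

def add_scaled(base, f, a, b):
    """Return base + f * (a - b), component by component."""
    return [base[j] + f * (a[j] - b[j]) for j in range(len(base))]

def mutate(pop, i, best, f, scheme, rng=None):
    """Donor vector for target index i; `best` is the index of the best member."""
    rng = rng or random.Random()
    others = [k for k in range(len(pop)) if k != i]
    r1, r2, r3, r4, r5 = rng.sample(others, 5)  # mutually distinct, all != i
    X = pop
    if scheme == "rand/1":
        return add_scaled(X[r1], f, X[r2], X[r3])
    if scheme == "best/1":
        return add_scaled(X[best], f, X[r1], X[r2])
    if scheme == "target-to-best/1":
        return add_scaled(add_scaled(X[i], f, X[best], X[i]), f, X[r1], X[r2])
    if scheme == "best/2":
        return add_scaled(add_scaled(X[best], f, X[r1], X[r2]), f, X[r3], X[r4])
    if scheme == "rand/2":
        return add_scaled(add_scaled(X[r1], f, X[r2], X[r3]), f, X[r4], X[r5])
    raise ValueError("unknown scheme: " + scheme)

# Hypothetical usage: 7 two-dimensional vectors, member 3 taken as the best one.
demo_pop = [[float(k), float(-k)] for k in range(7)]
donor = mutate(demo_pop, 0, 3, 0.8, "rand/1", random.Random(2))
```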

Basic Control Parameters of DE:

The Scale Factor F:

1) DE is much more sensitive to the choice of F than it is to the choice of Cr

2) The upper limit of the scale factor F is empirically taken as 1. This does not mean that a solution is impossible with F > 1; however, to date, no benchmark function that was successfully optimized with DE has required F > 1.

3) Zaharie derived a lower limit of F and the study revealed that if F is sufficiently small, the

population can converge even in the absence of selection pressure.

Zaharie's formula for the evolution of the population variance in the absence of selection:

$$Var(P_{x,G}) = \left( 2 F^2 p_{Cr} - \frac{2 p_{Cr}}{NP} + \frac{p_{Cr}^2}{NP} + 1 \right)^{G} \cdot Var(P_{x,0})$$

Critical value of F:

$$F_{crit} = \sqrt{\frac{1 - \frac{p_{Cr}}{2}}{NP}}$$

The population variance decreases when F < F crit and increases if F > F crit.
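A short numerical check of Zaharie's result, written as a sketch with hypothetical function names: the per-generation variance factor equals 1 exactly at the critical scale factor, is below 1 for smaller F and above 1 for larger F. The values NP = 50 and p_Cr = 0.9 in the usage lines are arbitrary illustration values.

```python
import math

def variance_factor(f, p_cr, np_size):
    """Per-generation multiplier of the population variance (no selection)."""
    return 2.0 * f * f * p_cr - 2.0 * p_cr / np_size + p_cr * p_cr / np_size + 1.0

def f_crit(p_cr, np_size):
    """Scale factor at which the variance factor equals exactly 1."""
    return math.sqrt((1.0 - p_cr / 2.0) / np_size)

# Hypothetical setting: NP = 50, p_Cr = 0.9.
fc = f_crit(0.9, 50)
shrinking = variance_factor(0.5 * fc, 0.9, 50)  # below 1: variance decreases
growing = variance_factor(2.0 * fc, 0.9, 50)    # above 1: variance increases
```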

Selection and Tuning of F in DE: Some Early Approaches

1) Typically 0.4 < F < 0.95, and F = 0.9 can serve as a good first choice.

2) Randomizing F may yield good results over a variety of functions. Price et al. coined the following terms in this context:

Dither: generating a new value of F for every vector. Dither scales the length of vector differentials because the same factor, F i, is applied to all components of a difference vector. (In 2005, Das et al. demonstrated the use of dither in improving DE's performance. They varied F uniformly at random between 0.5 and 1 for each vector.)

Jitter: generating a new value of F for every parameter in every vector.

3) Das et al. also proposed a DE with a time-varying scale factor, where the value of F is linearly decreased from 1 to 0.5 with a view to promoting exploration of diverse regions of the search volume during the earlier stages of the search while favoring exploitation during the final stages.

4) A fitness-based adaptation of F was proposed by Ali et al. as:

$$F = \begin{cases} \max\left\{ l_{\min},\ 1 - \left| \dfrac{f_{\max}}{f_{\min}} \right| \right\} & \text{if } \left| \dfrac{f_{\max}}{f_{\min}} \right| < 1 \\[2ex] \max\left\{ l_{\min},\ 1 - \left| \dfrac{f_{\min}}{f_{\max}} \right| \right\} & \text{otherwise,} \end{cases}$$

where $l_{\min} = 0.4$ is the lower bound of F.

K. Price, R. Storn, and J. Lampinen, Differential Evolution - A Practical Approach to Global Optimization, Springer, Berlin, 2005.

M. M. Ali and A. Törn, "Population set based global optimization algorithms: some modifications and numerical studies," Computers and Operations Research, Elsevier, no. 31, pp. 1703–1725, 2004.

S. Das, A. Konar, U. K. Chakraborty, "Two improved differential evolution schemes for faster global search", appeared in the ACM-SIGEVO Proceedings of GECCO, Washington D.C., June 2005.
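The fitness-based rule of Ali et al. can be sketched as a small Python function. It is assumed here that f_min and f_max denote the smallest and largest objective values in the current population and that both are non-zero; the function name is hypothetical.

```python
def adaptive_f(f_min, f_max, l_min=0.4):
    """Scale factor from the ratio of extreme objective values (both assumed non-zero)."""
    ratio = abs(f_max / f_min)
    if ratio < 1.0:
        return max(l_min, 1.0 - ratio)
    return max(l_min, 1.0 - abs(f_min / f_max))

# Hypothetical usage with three different populations.
f_spread = adaptive_f(1.0, 10.0)     # widely spread fitness values, F close to 1
f_mixed = adaptive_f(-10.0, 5.0)     # |f_max / f_min| < 1 branch
f_clustered = adaptive_f(9.0, 10.0)  # nearly equal fitness values, F clipped to l_min
```

When the extreme objective values are close to each other the rule falls back to the lower bound, and it moves towards 1 as they spread apart.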

The Crossover Rate Cr:

1) The parameter Cr controls how many parameters, in expectation, are changed in a population member.

2) With a low value of Cr, a small number of parameters are changed in each generation and the stepwise movement tends to be orthogonal to the current coordinate axes.

3) High values of Cr (near 1) cause most of the directions of the mutant vector to be inherited, prohibiting the generation of axis-orthogonal steps.

Figure: Empirical distribution of trial vectors for three different values of Cr: (a) Cr = 0, (b) Cr = 0.5, (c) Cr = 1.0. The plots were obtained by running DE on a single starting population of 10 vectors for 200 generations with selection disabled.

For schemes like DE/rand/1/bin the performance is rotationally invariant only when Cr = 1. At that setting, crossover is a vector-level operation that makes the trial vector a pure mutant, i.e.

$$\vec{U}_{i,G} = \vec{X}_{r_1^i,G} + F \cdot (\vec{X}_{r_2^i,G} - \vec{X}_{r_3^i,G}).$$

The Crossover Rate Cr (Contd.):

A low Cr value (e.g. 0 or 0.1) results in a search that changes each direction (or a small subset of directions) separately. This is an effective strategy for functions that are separable or decomposable, i.e.

$$f(\vec{X}) = \sum_{i=1}^{D} f_i(x_i)$$

A fitness-based adaptation scheme for Cr: Recently Ghosh et al. suggested the following scheme for fitness-based adaptation of Cr. The basic idea is: if the fitness of the donor vector gets worse, the value of Cr should be lower, and vice-versa.

Define:

$$\Delta f_{donor\_i} = f(\vec{V}_i) - f(\vec{X}_{best})$$

If $f(\vec{V}_i) \le f(\vec{X}_{best})$, then $Cr_i = Cr_{const}$; else

$$Cr_i = Cr_{\min} + \frac{Cr_{\max} - Cr_{\min}}{1 + \Delta f_{donor\_i}}$$

The parametric setup $Cr_{\max} = 0.8$, $Cr_{\min} = 0.1$ yielded fairly robust performance over a wide variety of benchmarks.

A. Ghosh, S. Das, A. Chowdhury, and R. Giri, "Differential evolution with a fitness-based adaptation of control parameters", Information Sciences, Elsevier Science, Netherlands, (Accepted, 2011).
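The Cr rule of Ghosh et al. translates into a few lines of Python. In this sketch the function name is hypothetical and `cr_const` is left as a required argument because its value is not stated here; the 0.95 used in the usage lines is purely an illustrative assumption.

```python
def adapt_cr(f_donor, f_best, cr_const, cr_min=0.1, cr_max=0.8):
    """Crossover rate for one donor vector (minimization is assumed)."""
    if f_donor <= f_best:
        return cr_const
    delta = f_donor - f_best  # positive in this branch
    return cr_min + (cr_max - cr_min) / (1.0 + delta)

# Hypothetical usage; cr_const = 0.95 is an assumed value.
cr_good = adapt_cr(1.0, 2.0, cr_const=0.95)      # donor at least as good as the best
cr_close = adapt_cr(3.0, 2.0, cr_const=0.95)     # donor slightly worse
cr_far = adapt_cr(1000.0, 2.0, cr_const=0.95)    # donor much worse, Cr near cr_min
```

The further the donor's objective value lies above the best one, the closer Cr gets to Cr min, so fewer donor components are passed on to the trial vector.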


The population size NP

1) The influence of NP on the performance of DE is yet to be extensively studied and

fully understood.

2) Storn and Price have indicated that a reasonable value for NP could be chosen between

5D and 10D (D being the dimensionality of the problem).

3) Brest and Maučec presented a method for gradually reducing population size of DE. The method

improves the efficiency and robustness of the algorithm and can be applied to any variant of a DE

algorithm.

4) Mallipeddi and Suganthan proposed a DE algorithm with an ensemble of parallel populations,

where the number of Function Evaluations (FEs) allocated to each population is self-adapted by

learning from their previous experiences in generating superior solutions. Consequently, a more

suitable population size along with its parameter settings can be determined adaptively to match

different search / evolution phases.

J. Brest and M. S. Maučec, “Population size reduction for the differential evolution algorithm”,

Applied Intelligence, Vol. 29, No. 3, pp. 228-247, Dec. 2008.

R. Mallipeddi, P. N. Suganthan, “Empirical study on the effect of population size on Differential

evolution Algorithm”, IEEE Congress on Evolutionary Computation, pp. 3663-3670, Hong Kong, 1-6

June, 2008.

A Few Significant and

Improved Variants of DE for

Continuous Single

Objective Optimization

DE with Arithmetic Crossover

1) In continuous or arithmetic recombination, the individual components of the trial vector are expressed as a linear combination of the components from the mutant/donor vector and the target vector.

General form:

$$\vec{W}_{i,G} = \vec{X}_{r_1,G} + k_i \cdot (\vec{X}_{r_1,G} - \vec{X}_{r_2,G})$$

2) 'DE/current-to-rand/1' replaces the binomial crossover operator with the rotationally invariant arithmetic line recombination operator to generate the trial vector by a linear arithmetic recombination of the target and donor vectors:

$$\vec{U}_{i,G} = \vec{X}_{i,G} + k_i \cdot (\vec{V}_{i,G} - \vec{X}_{i,G})$$

which further simplifies to:

$$\vec{U}_{i,G} = \vec{X}_{i,G} + k_i \cdot (\vec{X}_{r_1,G} - \vec{X}_{i,G}) + F' \cdot (\vec{X}_{r_2,G} - \vec{X}_{r_3,G})$$

where $F' = k_i \cdot F$, since $\vec{V}_{i,G} = \vec{X}_{r_1,G} + F \cdot (\vec{X}_{r_2,G} - \vec{X}_{r_3,G})$.

Figure: Change of the trial vectors generated through the discrete and random intermediate recombination due to rotation of the coordinate system. $\vec{U}^{R/}_{i,G}$ and $\vec{U}^{R//}_{i,G}$ indicate the new trial vectors due to discrete recombination in the rotated coordinate system.

The 'jDE' Algorithm (Brest et al., 2006)

• Encodes the control parameters F and Cr into each individual and adjusts them by introducing two new parameters τ 1 and τ 2.

• The new control parameters for the next generation are computed as follows:

$$F_{i,G+1} = \begin{cases} F_l + rand_1 \cdot F_u & \text{if } rand_2 < \tau_1 \\ F_{i,G} & \text{else,} \end{cases}$$

$$Cr_{i,G+1} = \begin{cases} rand_3 & \text{if } rand_4 < \tau_2 \\ Cr_{i,G} & \text{else,} \end{cases}$$

with $\tau_1 = \tau_2 = 0.1$, $F_l = 0.1$ and $F_u = 0.9$.

The new F takes a value from [0.1, 1.0] while the new Cr takes a value from [0, 1].

J. Brest, S. Greiner, B. Bošković, M. Mernik, and V. Žumer, “Self-adapting Control parameters

in differential evolution: a comparative study on numerical benchmark problems,” IEEE Trans.

on Evolutionary Computation, Vol. 10, Issue 6, pp. 646 – 657, 2006
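The jDE self-adaptation rule can be sketched as follows. The function name is hypothetical, and the default F_u = 0.9 follows the original jDE paper and should be read as an assumption of this sketch; with it, a regenerated F is F_l + rand·F_u and so lies between 0.1 and 1.0.

```python
import random

def jde_update(f_i, cr_i, rng, tau1=0.1, tau2=0.1, f_l=0.1, f_u=0.9):
    """Return (F, Cr) of one individual for the next generation."""
    rand1, rand2, rand3, rand4 = (rng.random() for _ in range(4))
    new_f = f_l + rand1 * f_u if rand2 < tau1 else f_i  # regenerate F with probability tau1
    new_cr = rand3 if rand4 < tau2 else cr_i            # regenerate Cr with probability tau2
    return new_f, new_cr

# Hypothetical usage: evolve the parameters of one individual over 100 generations.
rng = random.Random(0)
f_val, cr_val = 0.5, 0.9
for _ in range(100):
    f_val, cr_val = jde_update(f_val, cr_val, rng)
```

Because the parameters travel with the individual, values that produce surviving offspring are propagated without any user-level tuning of F and Cr.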

Self-Adaptive DE (SaDE) (Qin et al., 2009)

• Includes both control parameter adaptation and strategy adaptation

Strategy Adaptation:

Four effective trial vector generation strategies: DE/rand/1/bin, DE/rand-to-best/2/bin,

DE/rand/2/bin and DE/current-to-rand/1 are chosen to constitute a strategy candidate pool.

For each target vector in the current population, one trial vector generation strategy

is selected from the candidate pool according to the probability learned from its

success rate in generating improved solutions (that can survive to the next generation)

within a certain number of previous generations, called the Learning Period (LP).

Control Parameter Adaptation:

SaDE (Contd..)

1) NP is left as a user defined parameter.

2) A set of F values are randomly sampled from normal distribution

N(0.5, 0.3) and applied to each target vector in the current

population.

3) Cr obeys a normal distribution with mean value Cr m and standard

deviation Std =0.1, denoted by N (Cr m ,Std) where Cr m is initialized

as 0.5.

4) SaDE gradually adjusts the range of Cr values for a given problem

according to previous Cr values that have generated trial vectors

successfully entering the next generation.

A. K. Qin, V. L. Huang, and P. N. Suganthan, "Differential evolution algorithm with strategy adaptation for global numerical optimization", IEEE Trans. on Evolutionary Computation, 13(2):398-417, April, 2009.

Opposition-based DE (Rahnamayan et

al., 2008)

• Three stage modification to original DE framework based on

the concept of Opposite Numbers :

Let x be a real number defined in the closed interval [a, b]. Then the opposite number $\breve{x}$ of x may be defined as:

$$\breve{x} = a + b - x$$

ODE Steps:

1) Opposition-based Population Initialization: The fittest NP individuals are chosen as the starting population from a combination of NP randomly generated population members and their opposite members.

2) Opposition-based Generation Jumping: In this stage, after each iteration, instead of generating a new population by the evolutionary process, the opposite population is calculated with a predetermined probability Jr, and the NP fittest individuals are selected from the current population and the corresponding opposite population.

S. Rahnamayan, H. R. Tizhoosh, and M. M. A. Salama, "Opposition-based differential evolution", IEEE Trans. on Evolutionary Computation, Vol. 12,

Issue 1, pp. 64-79, Feb. 2008.
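Opposition-based initialization, the first ODE step, can be sketched in Python as follows. The function names, the list-based vectors and the sphere objective in the usage lines are assumptions made for the illustration; the opposite of each component is a + b - x with the bounds of that dimension.

```python
import random

def opposite(x, a, b):
    """Opposite point: each component x[j] in [a[j], b[j]] maps to a[j] + b[j] - x[j]."""
    return [a[j] + b[j] - x[j] for j in range(len(x))]

def opposition_init(np_size, a, b, f, rng=None):
    """Keep the np_size fittest (lowest f) among random points and their opposites."""
    rng = rng or random.Random()
    dim = len(a)
    pop = [[a[j] + rng.random() * (b[j] - a[j]) for j in range(dim)]
           for _ in range(np_size)]
    candidates = pop + [opposite(x, a, b) for x in pop]
    return sorted(candidates, key=f)[:np_size]

# Hypothetical usage with a sphere objective on [-5, 5]^2.
start = opposition_init(10, [-5.0, -5.0], [5.0, 5.0],
                        lambda v: sum(c * c for c in v), random.Random(3))
```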

ODE (Contd.)

3) Opposition-based Best Individual Jumping: In this phase, at first a difference-offspring of the best individual in the current population is created as:

$$\vec{X}_{new\_best,G} = \vec{X}_{best,G} + F' \cdot (\vec{X}_{r_1,G} - \vec{X}_{r_2,G})$$

where r1 and r2 are mutually different random integer indices selected from {1, 2, ..., NP} and F' is a real constant. Next the opposite of the offspring is generated as $\vec{X}_{opp\_newbest,G}$. Finally the current best member is replaced by the fittest member of the set

$$\{ \vec{X}_{best,G},\ \vec{X}_{new\_best,G},\ \vec{X}_{opp\_newbest,G} \}$$

JADE (Zhang and Sanderson, 2009)

1) Uses the DE/current-to-pbest/1 strategy as a less greedy generalization of the DE/current-to-best/1 strategy. Instead of only adopting the best individual as in the DE/current-to-best/1 strategy, the current-to-pbest/1 strategy utilizes the information of other good solutions.

Denoting $\vec{X}^{p}_{best,G}$ as a randomly chosen vector from the top 100p% individuals of the current population, DE/current-to-pbest/1 without external archive is:

$$\vec{V}_{i,G} = \vec{X}_{i,G} + F_i \cdot (\vec{X}^{p}_{best,G} - \vec{X}_{i,G}) + F_i \cdot (\vec{X}_{r_1^i,G} - \vec{X}_{r_2^i,G})$$

2) JADE can optionally make use of an external archive (A), which stores the recently explored inferior solutions. In case of DE/current-to-pbest/1 with archive, $\vec{X}_{i,G}$, $\vec{X}^{p}_{best,G}$, and $\vec{X}_{r_1^i,G}$ are selected from the current population P, but $\vec{X}_{r_2^i,G}$ is selected from $P \cup A$.

JADE (Contd..)

3) JADE adapts the control parameters of DE in the following manner:

A) Cr for each individual and at each generation is randomly generated from a normal distribution $N(\mu_{Cr}, 0.1)$ and then truncated to [0, 1].

The mean of the normal distribution is updated as:

$$\mu_{Cr} = (1 - c) \cdot \mu_{Cr} + c \cdot mean_A(S_{Cr})$$

where $S_{Cr}$ is the set of all successful crossover probabilities $Cr_i$ at generation G.

B) Similarly, for each individual and at each generation $F_i$ is randomly generated from a Cauchy distribution $C(\mu_F, 0.1)$ with location parameter $\mu_F$ and scale parameter 0.1.

$F_i$ is truncated if $F_i > 1$ or regenerated if $F_i \le 0$.
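JADE's parameter sampling can be sketched with the standard library only. The function names are hypothetical; the Cauchy variate is drawn by the inverse-CDF method, non-positive F values are regenerated as in the original JADE paper, and leaving the mean unchanged when no crossover rate was successful is an assumption of this sketch. The learning rate c is passed in because no value is given here.

```python
import math
import random

def sample_cr(mu_cr, rng):
    """Cr ~ N(mu_cr, 0.1), truncated to [0, 1]."""
    return min(1.0, max(0.0, rng.gauss(mu_cr, 0.1)))

def sample_f(mu_f, rng):
    """F ~ Cauchy(mu_f, 0.1): truncated to 1 if larger, redrawn if not positive."""
    while True:
        f = mu_f + 0.1 * math.tan(math.pi * (rng.random() - 0.5))
        if f > 0.0:
            return min(f, 1.0)

def update_mu_cr(mu_cr, successful_cr, c):
    """mu_cr <- (1 - c) * mu_cr + c * arithmetic mean of the successful Cr values."""
    if not successful_cr:
        return mu_cr  # assumption: keep the mean when nothing succeeded
    return (1.0 - c) * mu_cr + c * sum(successful_cr) / len(successful_cr)

# Hypothetical usage: c = 0.1 is an illustrative value.
rng = random.Random(5)
f_i, cr_i = sample_f(0.5, rng), sample_cr(0.5, rng)
new_mu = update_mu_cr(0.5, [0.9, 0.7], 0.1)
```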

Differential Evolution with Neighborhood-based Mutation

S. Das, U. K. Chakraborty, and A. Abraham, "Differential evolution with a neighborhood based mutation operator: a comparative study", IEEE Transactions on Evolutionary Computation, Vol. 13, No. 3, June 2009.

Local Mutation Model:

$$\vec{L}_i(t) = \vec{X}_i(t) + \alpha \cdot (\vec{X}_{n\_best_i}(t) - \vec{X}_i(t)) + \beta \cdot (\vec{X}_p(t) - \vec{X}_q(t))$$

Global Mutation Model:

$$\vec{g}_i(t) = \vec{X}_i(t) + \alpha \cdot (\vec{X}_{g\_best}(t) - \vec{X}_i(t)) + \beta \cdot (\vec{X}_{r_1}(t) - \vec{X}_{r_2}(t))$$

Combined Model for Donor Vector generation:

$$\vec{V}_i(t) = w \cdot \vec{g}_i(t) + (1 - w) \cdot \vec{L}_i(t)$$

The weight factor w may be adjusted during the run or self-adapted through the evolutional learning process.
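The combined DEGL donor can be sketched as below. The ring-shaped index neighborhood of radius k (members i-k, ..., i+k, wrapping around) follows the DEGL paper and is an assumption here, as are the function name and the requirement that 2k + 1 does not exceed the population size; lower fitness is treated as better.

```python
import random

def degl_donor(pop, fitness, i, k, alpha, beta, w, rng=None):
    """Donor V_i = w * g_i + (1 - w) * L_i using a ring neighborhood of radius k."""
    rng = rng or random.Random()
    n, dim = len(pop), len(pop[0])
    hood = [(i + off) % n for off in range(-k, k + 1)]       # indices i-k .. i+k on a ring
    n_best = min(hood, key=lambda idx: fitness[idx])         # best in the neighborhood
    g_best = min(range(n), key=lambda idx: fitness[idx])     # best in the population
    p, q = rng.sample([idx for idx in hood if idx != i], 2)  # local difference pair
    r1, r2 = rng.sample([idx for idx in range(n) if idx != i], 2)  # global difference pair
    donor = []
    for j in range(dim):
        xi = pop[i][j]
        local = xi + alpha * (pop[n_best][j] - xi) + beta * (pop[p][j] - pop[q][j])
        glob = xi + alpha * (pop[g_best][j] - xi) + beta * (pop[r1][j] - pop[r2][j])
        donor.append(w * glob + (1.0 - w) * local)
    return donor

# Hypothetical usage: 8 vectors, sphere fitness, neighborhood radius k = 2.
demo = [[float(m), float(m * m)] for m in range(8)]
demo_fit = [sum(c * c for c in v) for v in demo]
v4 = degl_donor(demo, demo_fit, 4, 2, 0.8, 0.8, 0.5, random.Random(1))
```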

On Timing Complexity of DEGL

Runtime complexity of DE/rand/1/bin over $G_{\max}$ generations: $O(NP \cdot D \cdot G_{\max})$

Runtime complexity of DE/target-to-best/1/bin: $O(\max(NP \cdot G_{\max},\ NP \cdot D \cdot G_{\max})) = O(NP \cdot D \cdot G_{\max})$

Worst-case timing complexity of DEGL: $O(\max(NP \cdot k \cdot G_{\max},\ NP \cdot D \cdot G_{\max}))$, where k is the neighborhood radius.

The asymptotic order of complexity remains $O(NP \cdot D \cdot G_{\max})$ if k < D, which is usually the case for high-dimensional functions.

DEGL does not impose any serious burden on the runtime complexity of the existing DE variants.

A Special Issue on DE just appeared in IEEE TEVC, Feb. 2011

Guest Editors: Swagatam Das, Jadavpur University, India

P. N. Suganthan, Nanyang Technological University, Singapore.

Carlos A. Coello Coello, Col. San Pedro Zacatenco, México.

This tutorial is partly based on the survey article published in this special issue:

S. Das and P. N. Suganthan, "Differential evolution – a survey of the state-of-the-art", IEEE TEVC, Vol. 15, No. 1, Feb. 2011.

Introduction to a few DE-variants from the special issue of IEEE TEVC:

1) cDE (Mininno et al., “Compact differential evolution”, IEEE TEVC, Vol 15, No. 1):

- An interesting Estimation of Distribution Algorithm (EDA) which,

analogously to other compact Evolutionary Algorithms (cEAs), does not

store and process the entire population and all of its individuals, but adopts

instead a statistical representation of the population to perform the optimization

process.

- However, unlike other cEAs, cDE encodes the actual survivor selection

mechanism of the original DE algorithm, as well as its original search logic

(i.e., its crossover and mutation operators).

- cDE uses a fairly limited amount of memory, which makes it appropriate for

hardware implementations.

2) CoDE (Wang et al., “Differential evolution with composite trial vector generation

strategies and control parameters”, IEEE TEVC, Vol. 15, No. 1):

-adopts three different trial vector generation strategies (rand/1/bin,

rand/2/bin, and current-to-rand/1) and three mechanisms to control DE’s

parameter settings. The parametric choices were:

1) F = 1.0, Cr = 0.1 - meant for dealing with separable problems.

2) F = 1.0, Cr = 0.9 – meant for maintaining population diversity.

3) F = 0.8, Cr = 0.2 – meant for exploiting the search space and

achieving better convergence characteristics.

-Such strategies offer different advantages and, somehow, complement

one another.

3) DE with Proximity-based Mutation Operators (Epitropakis et al., “Enhancing differential

evolution with proximity-based mutation operators”, IEEE TEVC, Vol. 15, No. 1):

-is based on a framework that incorporates information of neighboring individuals

to guide the search towards the global optimum in a more efficient manner.

-The main idea is to adopt a stochastic selection mechanism in which the

probability of selecting an individual to become a parent is inversely proportional

to its distance from the individual undergoing mutation.

-This will favor the search in the vicinity of the mutated individual, which should

promote a proper exploitation of such a neighborhood, without sacrificing the

exploration capabilities of the mutation operator.

-The authors incorporate the proposed framework to several DE variants,

finding that in most cases its use significantly improves the performance of the

algorithm (when there is no improvement, there is no significant degradation in

performance either).

Stochastic Analysis of the DE

• DE has been modeled as a stochastic process with a time-varying

PDF.

• The mutation, crossover and selection steps have been analyzed to

produce a recurrence relation relating the PDF at time n+1 with the

present PDF for time n.

• Investigation of the relation shows the existence of a Lyapunov

functional (dependent on the PDF) which proves asymptotic stability

of the DE system.

S. Ghosh, S. Das, and A. V. Vasilakos, "Convergence Analysis of Differential Evolution over a Class of Continuous Functions with Unique Global Optimum", accepted in IEEE Trans. on SMC Part B, 2011.

Assumptions on the Objective Function:

The high-dimensional integral of a scalar function $g(\vec{x})$ over several components of a vector, i.e.

$$\int \int \cdots \int g(x_1, x_2, \ldots, x_D)\, dx_1\, dx_2 \cdots dx_D,$$

is expressed through a single integration over a vector space like

$$\int_{\Re^D} g(\vec{x})\, d\vec{x}.$$

Towards a recursive relation of the population PDFs

In binomial crossover, we assume a component is inherited from the donor vector with probability $P_C$ and from the corresponding target vector with probability $1 - P_C$.

For the m-th crossover combination, we define a D-dimensional string $\theta_m$ such that a '1' in $\theta_{m,d}$ (the d-th bit position of $\theta_m$) indicates that the d-th dimension is taken from the donor, and a '0' indicates that the d-th dimension is taken from the parent. For example, $\theta_m$ for m = 2 and D = 3 is represented by '001'.

We define the following sets:

$$\alpha_m = \{ a : 1 \le a \le D \text{ and } \theta_{m,a} = 1 \} \quad \text{and} \quad \beta_m = \{ b : 1 \le b \le D \text{ and } \theta_{m,b} = 0 \}$$

The probability of the m-th combination appearing in the crossover, with $\nu$ = cardinality of the set $\alpha_m$:

$$P_m = {}^{D}C_{\nu} \cdot P_C^{\nu} (1 - P_C)^{D - \nu}$$

Next we prove the following:

1) At time n the PDF of the donor vector corresponding to the i-th target vector is given by:

$$p_{\vec{V}_{i,n}}(\vec{v}_{i,n}) = \frac{1}{F^2 \cdot {}^{N-1}P_3} \sum \sum \sum_{a \ne b \ne c \ne i} p_{\vec{X}_{a,n}}(\vec{v}_{i,n}) * p_{\vec{X}_{b,n}}\!\left( \frac{\vec{v}_{i,n}}{F} \right) * p_{\vec{X}_{c,n}}\!\left( -\frac{\vec{v}_{i,n}}{F} \right)$$

where $*$ denotes convolution.

2) If the random variable $\vec{X}_n$ represents any population member at time n, and the random variable $\vec{X}_{n+1}$ represents the same member at time n+1, then the PDF of $\vec{X}_{n+1}$ is given by:

$$p_{\vec{X}_{n+1}}(\vec{x}_{n+1}) = \sum_m P_m \cdot p_{\vec{X}_n}(\vec{x}_{n+1}) \int_{f(\vec{x}_{n+1}) < f(\vec{\lambda},\, \vec{x}_{n+1,\beta_m})} p_{\vec{V}_n}(\vec{\lambda}, \vec{\mu})\, d\vec{\lambda}\, d\vec{\mu} \; + \; \sum_m P_m \int_{f(\vec{x}_{n+1}) \le f(\vec{\gamma},\, \vec{x}_{n+1,\beta_m})} p_{\vec{X}_n}(\vec{\gamma}, \vec{x}_{n+1,\beta_m}) \cdot p_{\vec{V}_n}(\vec{x}_{n+1,\alpha_m}, \vec{\eta})\, d\vec{\gamma}\, d\vec{\eta}$$

Terms in the relation:

$p_{\vec{X}_n}$ and $p_{\vec{X}_{n+1}}$ are the PDFs of the population vectors at times n and n+1 respectively. We have shown that the PDFs for all vectors in the population are identical, so we do not refer to any vector in particular.

$p_{\vec{V}_n}$ is the PDF of the donor vectors at time n. Here also we do not refer to any vector in particular.

Our objective is to express $p_{\vec{X}_{n+1}}(\vec{x}_{n+1})$ in terms of $p_{\vec{X}_n}$ and $p_{\vec{V}_n}$ for the point $\vec{x}_{n+1}$.

What does it mean ?

• The recurrence relation portrays the

heuristics in the DE algorithm.

• Failure of the population PDF at time n to

find good solutions may contribute to

selection of the donor PDF.

• Similarly, failure of the donor PDF to find

good solutions may contribute to

selection of the original population PDF.

The Lyapunov Functional

• Investigation of the recurrence relation shows that there exists a Lyapunov functional dependent on the population PDF:

$$V[p(\vec{x})] = \left( \int_{\Re^D} f(\vec{x})\, p(\vec{x})\, d\vec{x} \right) - f(\vec{x}^*)$$

• The Lyapunov functional strictly decreases to zero with time, hence the dynamics is asymptotically convergent at the equilibrium PDF:

$$p_E(\vec{x}) = \delta(\vec{x} - \vec{x}^*)$$

We prove the following things about the Lyapunov Functional:

1) The functional $V[p(\vec{x})] = \left( \int_{\Re^D} f(\vec{x})\, p(\vec{x})\, d\vec{x} \right) - f(\vec{x}^*)$ is positive definite w.r.t. the equilibrium PDF $p_E(\vec{x}) = \delta(\vec{x} - \vec{x}^*)$.

2) The functional $V[p_{\vec{X}_{n+1}}(\vec{x})] - V[p_{\vec{X}_n}(\vec{x})]$, defined on the set of all PDFs $p(\vec{x})$, is negative semi-definite w.r.t. the equilibrium PDF $p_E(\vec{x}) = \delta(\vec{x} - \vec{x}^*)$.

3) For the functional dynamics arising from the transition equation, $V[p_{\vec{X}_{n+1}}(\vec{x})] - V[p_{\vec{X}_n}(\vec{x})]$ does not identically vanish along any trajectory taken by the PDF $p_{\vec{X}_n}(\vec{x}_n)$ for all n ≥ 0.

Hence the dynamics given by the PDF transition equation is asymptotically stable at the equilibrium PDF $p_E(\vec{x}) = \delta(\vec{x} - \vec{x}^*)$.

How correct is the model?

• To check the correctness of our model, we test the actual DE, with DE/rand/1/bin mutation and a large population size (1000), and apply it to some standard one-dimensional benchmarks.

• The estimated PDFs obtained through the

experiment are found to be pretty close to the PDFs

predicted through the recurrent stochastic model.

• The estimated Lyapunov functional also reduces to

zero and is very close to the predicted Lyapunov

functional.
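An empirical estimate of the Lyapunov functional can be obtained from a running population as the mean objective value minus the optimal value. The sketch below does this for the one-dimensional sphere function with DE/rand/1 and a population of 1000; the function names, F = 0.8, the initial range and the number of generations are illustrative assumptions. In one dimension the binomial crossover always passes the donor component (j_rand hits the only dimension), so the trial equals the donor.

```python
import random

def sphere(x):
    """One-dimensional sphere function with optimum value f(x*) = 0 at x* = 0."""
    return x * x

def de_generation(pop, f_scale, rng):
    """One synchronous generation of DE/rand/1 on a 1-D population."""
    n = len(pop)
    new_pop = []
    for i in range(n):
        # Draw 4 distinct indices, drop i if present, keep three of them.
        r1, r2, r3 = [k for k in rng.sample(range(n), 4) if k != i][:3]
        trial = pop[r1] + f_scale * (pop[r2] - pop[r3])
        new_pop.append(trial if sphere(trial) <= sphere(pop[i]) else pop[i])
    return new_pop

def lyapunov_estimate(pop, f_star=0.0):
    """Sample estimate of V[p] = E[f(x)] - f(x*)."""
    return sum(sphere(x) for x in pop) / len(pop) - f_star

rng = random.Random(1)
pop = [rng.uniform(-10.0, 10.0) for _ in range(1000)]
history = [lyapunov_estimate(pop)]
for _ in range(30):
    pop = de_generation(pop, 0.8, rng)
    history.append(lyapunov_estimate(pop))
```

Because each member is replaced only by a trial that is not worse, the estimate in `history` cannot increase from one generation to the next.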

The Sphere Function

PDFs predicted for different time instants through the

recurrent stochastic model

PDFs estimated by running the DE algorithm

Griewank’s Function

PDFs predicted for different time instants through the

stochastic model

PDFs estimated by running the DE algorithm

Shifted Rastrigin’s Function

PDFs predicted for different time instants through the

stochastic model

PDFs estimated by running the DE algorithm

Conclusions

• The stochastic model accurately predicts the behavior of

the DE algorithm for a large population size.

• The model is also successful in showing that the

population vectors converge at the global optimum point,

provided it exists uniquely.

• Further research can be undertaken to predict the

algorithm’s behavior for a finite number of vectors.