OT Lab 13
Normalized marks out of 5: (5)
Signed by Course Teacher / Lab Instructor
Lab 13
OBJECTIVE:
Apply the Particle Swarm Optimization (PSO) algorithm using Python.
APPARATUS USED:
A computer with Python 3 and the NumPy library installed.
THEORY:
Introduction
Particle swarm optimization (PSO) has been successfully applied in many research and application areas.
For my part, I really enjoyed the application of this algorithm in the article by G. Sermpinis [1] on foreign
exchange rate forecasting.
It has been demonstrated that PSO can achieve better results faster and more cheaply than other
methods. It can also be parallelized. Moreover, it does not use the gradient of the problem being
optimized. In other words, unlike traditional optimization methods, PSO does not require the problem to
be differentiable.
Last but not least, there are very few hyperparameters. These parameters are very simple to understand
and do not require an advanced mathematical background. For the same hyperparameters, PSO will work on a
very wide variety of tasks, which makes it a very powerful and flexible algorithm.
Throughout this article, I will detail the mechanisms behind the Particle Swarm Optimization algorithm,
using a flock of birds as a metaphor.
Objectives
Function to minimize
The objective of this article will be to minimize the function shown above. The function in question is
clearly defined, involves only 2 variables, and is differentiable. But keep in mind that this function
could just as well be a non-differentiable step function, or a function defined by the weights of a neural
network for which we do not know the global minimum [1].
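The figure's function is not reproduced here; as a stand-in, a hypothetical two-variable objective of the same flavor (smooth, with one known minimum) could look like the following. The function itself and its minimum at (1, -2) are illustrative choices, not taken from the article:

```python
import numpy as np

def fitness_function(pos):
    """Hypothetical 2-variable objective (stand-in for the figure's function).

    `pos` is an array of shape (N, 2): one (x, y) coordinate per particle.
    Returns one fitness value per particle; the global minimum is 0 at (1, -2).
    """
    x, y = pos[:, 0], pos[:, 1]
    return (x - 1.0) ** 2 + (y + 2.0) ** 2
```

Any function with this vectorized signature (N particles in, N fitness values out) can be plugged into the PSO code later in this lab.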
Particle
Before we dive into our simple application case, let's jump into the past. Particle Swarm Optimization
is a population-based stochastic optimization technique developed by Dr. Eberhart and Dr. Kennedy in
1995 [2], inspired by the social behavior of bird flocks and schools of fish.
In PSO terms: a group of particles (potential solutions) searches for the global minimum in a search
space. There is only one global minimum in this search space. None of the particles knows where it is
located, but every particle has a fitness value evaluated by the fitness function to be optimized.
Before going further in the explanation of the PSO algorithm, let's focus for a moment on our particles.
As you will have understood, each of these particles is a potential solution of the function to be
minimized. They are defined by their coordinates in the search space.
We can then randomly place particles in the search space, as in the image above. But these particles
must be in movement to find the optimum of the function.
In PSO terms: each of these particles moves with a velocity that allows it to update its position over
the iterations in order to find the global minimum.
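The random placement of particles and their initial velocities can be sketched as follows; the swarm size, search bounds, and velocity range here are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n_particles = 20
bounds = (-5.0, 5.0)  # assumed square search space

# positions: one (x, y) coordinate per particle, uniform over the search space
particles = rng.uniform(bounds[0], bounds[1], size=(n_particles, 2))
# velocities: a small random initial movement for each particle
velocities = rng.uniform(-1.0, 1.0, size=(n_particles, 2))
```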
Swarm
PSO shares many similarities with evolutionary computation techniques such as Genetic Algorithms
(GA). The system is initialized with a population of random solutions and searches for optima
by updating generations. However, unlike GA, PSO has no evolution operators such as crossover and
mutation. The difference is in the way the generations are updated.
In PSO terms: over the iterations, each particle's velocity is stochastically accelerated towards its
previous best position (personal best) and towards the best solution found by the group (global best).
Particle update
Concretely, at each iteration, each particle is updated according to its velocity. This velocity is subject
to inertia and is governed by the two best values found so far.
The first value is the best personal solution the particle has found so far. The second one is the best
global solution that the swarm of particles has found so far. So each particle has in memory its best
personal solution and the best global solution.
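The update just described can be sketched for a single particle as below; the coefficient values are placeholders (they are discussed in the following sections), and `r_1`, `r_2` are the random weights drawn uniformly in [0, 1):

```python
import numpy as np

def update(x, v, p_best, g_best, w=0.8, c_1=0.1, c_2=0.1,
           rng=np.random.default_rng()):
    """One PSO step for a single particle.

    x, v     : current position and velocity (2-D arrays)
    p_best   : best position this particle has found so far
    g_best   : best position the whole swarm has found so far
    """
    r_1, r_2 = rng.random(), rng.random()
    # inertia + cognitive pull (personal best) + social pull (global best)
    v = w * v + c_1 * r_1 * (p_best - x) + c_2 * r_2 * (g_best - x)
    return x + v, v  # new position, new velocity
```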
Optimization
As you might have noticed, I have not yet talked about the inertia, cognitive and social coefficients.
These coefficients control the levels of exploration and exploitation. Exploitation is the ability of
particles to target the best solutions found so far. Exploration, on the other hand, is the ability of
particles to evaluate the entire search space. The challenge of the remaining part of the article will be
to determine the impact of these coefficients in order to find a good balance between exploration and
exploitation.
This assertion of a balance between exploration and exploitation does not make much sense unless both
are defined in a measurable way and, moreover, such a balance is neither necessary nor sufficient from
an efficiency point of view. But for the sake of understanding, I will use these terms in this article.
In PSO terms: at each iteration, the acceleration is weighted by random terms. The cognitive acceleration
and the social acceleration are stochastically adjusted by the weights r1 and r2.
Parameter tuning
These two weights r1 and r2 are unique to each particle and each iteration. This is not the case for the
coefficients that I am going to introduce now.
In PSO terms: the hyperparameter w defines the ability of the swarm to change its direction. The
particles have an inertia proportional to this coefficient w.
To better appreciate the influence of this coefficient w (also called the inertia weight), I invite you
to visualize the 3 swarms of particles above. We can then see that the lower the coefficient w, the
stronger the convergence. In other words, a low coefficient w facilitates the exploitation of the best
solutions found so far, while a high coefficient w facilitates the exploration around these solutions.
Note that it is recommended to avoid w > 1, which can lead to a divergence of our particles.
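The divergence risk for w > 1 can be seen on the inertia term alone: ignoring the attraction terms, each iteration simply scales the velocity by w, so the velocity either dies out or blows up geometrically. A tiny sketch (values chosen for illustration):

```python
def inertia_only(w, v0=1.0, steps=20):
    # repeatedly apply only the inertia part of the update: v <- w * v
    v = v0
    for _ in range(steps):
        v *= w
    return v

shrinking = inertia_only(0.5)  # w < 1: velocity vanishes over time
growing = inertia_only(1.1)    # w > 1: velocity grows without bound
```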
The inertia weight w thus makes a balance between the exploration and the exploitation of the best
solutions found so far. Let’s look at how these solutions are found by studying the
coefficients c1 and c2 (also called acceleration coefficients).
In PSO terms: the hyperparameter c1 defines the ability of the group to be influenced by the best
personal solutions found over the iterations. The hyperparameter c2 defines the ability of the group to
be influenced by the best global solution found over the iterations.
On the GIF above, we can see the impact of these two coefficients. For demonstration purposes, I
deliberately chose a very low coefficient w and forced the extremes of c1 and c2. We can then notice how
the particles of the swarm become more individualistic when c1 is high. There is, therefore, no
convergence, because each particle focuses only on its own best solutions. In contrast, the particles of
the swarm are more influenced by the others when c2 is high. We notice on the GIF that the exploration of
the solutions is not optimal and that the exploitation of the best global solution dominates (this is
obvious at iteration ~40).
The coefficients c1 and c2 are consequently complementary. A combination of the two increases both
exploration and exploitation.
Auto hyperparameters
In PSO terms: we can go even further by updating the coefficients over the iterations. Starting with a
strong c1, a strong w, and a weak c2 to increase the exploration of the search space, we want to tend
towards a weak c1, a weak w, and a strong c2 to exploit the best results after exploration by converging
towards the global minimum.
According to the paper by M. Clerc and J. Kennedy [3], which defines a standard for Particle Swarm
Optimization, the best static parameters are w = 0.72984 and c1 + c2 > 4, more exactly c1 = c2 = 2.05.
Additionally, a linear decay of the parameter w was initially proposed by Y. H. Shi and
R. C. Eberhart [4].
Based on these ideas and inspired by the paper by G. Sermpinis [1], I suggest the coefficients as specified
in the image above.
Ultimately, this sounds like a lot of information, but Particle Swarm Optimization is a very simple
algorithm, and it is even simpler to transcribe into Python code.
import numpy as np


class PSO:
    def __init__(self, fitness_function, particles, velocities,
                 w=0.8, c_1=1.0, c_2=1.0, max_iter=100, auto_coef=True):
        self.fitness_function = fitness_function
        self.particles = particles
        self.velocities = velocities
        self.N = len(self.particles)
        self.w = w
        self.c_1 = c_1
        self.c_2 = c_2
        self.auto_coef = auto_coef
        self.max_iter = max_iter

        # copy so that the initial personal bests do not alias the positions
        self.p_bests = self.particles.copy()
        self.p_bests_values = self.fitness_function(self.particles)
        self.g_best = self.p_bests[0]
        self.g_best_value = self.p_bests_values[0]
        self.update_bests()

        self.iter = 0
        self.is_running = True
        self.update_coef()

    def __str__(self):
        return (f'[{self.iter}/{self.max_iter}] $w$:{self.w:.3f} - '
                f'$c_1$:{self.c_1:.3f} - $c_2$:{self.c_2:.3f}')

    def next(self):
        if self.iter > 0:
            self.move_particles()
            self.update_bests()
            self.update_coef()
        self.iter += 1
        self.is_running = self.is_running and self.iter < self.max_iter
        return self.is_running

    def update_coef(self):
        if self.auto_coef:
            t = self.iter
            n = self.max_iter
            self.w = (0.4 / n ** 2) * (t - n) ** 2 + 0.4
            self.c_1 = -3 * t / n + 3.5
            self.c_2 = 3 * t / n + 0.5

    def move_particles(self):
        # add inertia
        new_velocities = self.w * self.velocities
        # add cognitive component
        r_1 = np.random.random(self.N)
        r_1 = np.tile(r_1[:, None], (1, 2))
        new_velocities += self.c_1 * r_1 * (self.p_bests - self.particles)
        # add social component
        r_2 = np.random.random(self.N)
        r_2 = np.tile(r_2[:, None], (1, 2))
        g_best = np.tile(self.g_best[None], (self.N, 1))
        new_velocities += self.c_2 * r_2 * (g_best - self.particles)
        # stop when the particles no longer move
        self.is_running = np.sum(self.velocities - new_velocities) != 0
        # update positions and velocities
        self.velocities = new_velocities
        self.particles = self.particles + new_velocities

    def update_bests(self):
        fits = self.fitness_function(self.particles)
        for i in range(len(self.particles)):
            # update best personal value (cognitive)
            if fits[i] < self.p_bests_values[i]:
                self.p_bests_values[i] = fits[i]
                self.p_bests[i] = self.particles[i]
            # update best global value (social)
            if fits[i] < self.g_best_value:
                self.g_best_value = fits[i]
                self.g_best = self.particles[i]
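To make the whole procedure concrete, here is a minimal self-contained PSO run, independent of the class above, on an assumed quadratic "sphere" objective (minimum 0 at the origin). The constants used are the constricted form of the standard parameters, w ≈ 0.72984 and c1 = c2 ≈ 1.49618 (that is, 0.72984 × 2.05):

```python
import numpy as np

def sphere(p):
    # simple convex objective; global minimum 0 at the origin
    return np.sum(p ** 2, axis=1)

rng = np.random.default_rng(42)
n = 30
w, c_1, c_2 = 0.72984, 1.49618, 1.49618  # constricted standard parameters

x = rng.uniform(-5, 5, size=(n, 2))   # positions
v = rng.uniform(-1, 1, size=(n, 2))   # velocities
p_best, p_best_val = x.copy(), sphere(x)
g_best = p_best[np.argmin(p_best_val)].copy()
g_best_val = p_best_val.min()

for _ in range(200):
    r_1 = rng.random((n, 1))
    r_2 = rng.random((n, 1))
    # inertia + cognitive pull + social pull
    v = w * v + c_1 * r_1 * (p_best - x) + c_2 * r_2 * (g_best - x)
    x = x + v
    vals = sphere(x)
    # update personal bests
    improved = vals < p_best_val
    p_best[improved], p_best_val[improved] = x[improved], vals[improved]
    # update the global best
    if vals.min() < g_best_val:
        g_best_val = vals.min()
        g_best = x[np.argmin(vals)].copy()
```

After the loop, g_best should lie very close to the origin, the true minimum of this objective.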
Perhaps you will have noticed that the only conditions to stop my iterations are that the maximum number
of iterations is reached, or that the particles no longer move. These stopping criteria are not
necessarily the best ones, and they depend heavily on the size of the swarm. A. P. Engelbrecht's
paper [5] explicitly shows that the choice of stopping criterion is based more on empirical results than
on common standards. It also provides a wide benchmark of evaluation functions.
Procedure
Results:
Discussion:
Conclusion: