Analysis of Cuckoo Search Efficiency: Carlos Eduardo M. Barbosa Germano C. Vasconcelos

Download as pdf or txt
Download as pdf or txt
You are on page 1of 8

Analysis of Cuckoo Search Efficiency

Carlos Eduardo M. Barbosa Germano C. Vasconcelos


Center for Informatics Center for Informatics
Federal University of Pernambuco Federal University of Pernambuco
Recife, Brazil Recife, Brazil
[email protected] [email protected]

Abstract—Optimizing the parameters of a bio-inspired algo- in local extrema [3]. The efficiency of any algorithm depends
rithm is naturally the main path to improve its performance. very much on how those two properties are balanced [2].
The distribution used in the displacement and creation of new This paper is structured as follows: Section II describes and
solutions is also a factor to consider when enhancing its capacity.
In this work, the efficiency of cuckoo search (CS) and self- compares the α-stable distributions. Experimental results in
adaptive cuckoo search algorithms (SACS) is investigated through three different problems are provided and analyzed in Section
extensive experimentation in three problems: (1) benchmark III. Finally we conclude the paper in Section IV.
function optimization, (2) wind energy forecasting and (3) data
clustering. This paper examines the reasons why CS and SACS II. α- STABLE D ISTRIBUTIONS
have presented better performance and convergence rate than
other algorithms in the above optimization problems. The Lévy Stable distributions arise constantly in the study of long
probability distribution employed in the algorithms, the repro- tail distributions with various applications in physics and
duction strategy determined by parameters N and pa , and economics, modeling rare events such as earthquakes or
the reduced number of parameters to optimize are candidate stock market crashes. Many scholars believe some of the
hypotheses studied. It is seen how such factors influence the
recent financial problems have occurred because analysts have
performance of the algorithms, showing the efficiency of cuckoo
search is very much associated with the Lévy distribution. limited themselves to using Gaussian models, which do not
Index Terms—Bio-inspired algorithms, cuckoo search, Lévy have long tails [4]. α-stable distributions have been used in
Flight large scale to describe models in several areas where the
Gaussian distribution fails, such as modeling time series of
I. I NTRODUCTION stock exchanges or of human heart signals, among others.
The Lévy distribution used in the CS algorithm is a par-
Randomization is a very important aspect in bio-inspired ticular case of the α-stable distributions. The stability of
algorithms. It not only improves exploration capacity by distributions lies in the fact that a linear combination of
performing global search in the solution space when search two independent random variables of the same distribution
steps are long, but also allows local search around the best generate another random variable with the same distribution.
current solution when steps are small and limited to a local This is the case of normal distribution, or Gaussian, since the
region. Thus, the value generated by randomization for the sum of Gaussians is also a Gaussian and the product of a
displacement of individuals in the solution space has impor- Gaussian by a scalar, plus or not another scalar, is still a
tant influence on the performance of such algorithms. Ran- Gaussian [5].
domization in bio-inspired algorithms typically uses uniform Lévy has shown the Gaussian distribution to be a special
or gaussian distributions [1]. However, several studies have case of α-stable distributions [6], a family of four-parameter
shown the displacement of many animals exhibit behavior like distributions, usually denoted by Equation (1):
Lévy flights, a kind of random walk whose step size is based
on a long tail probability distribution, which does not attenuate S(α, β, γ, δ) (1)
exponentially and combines several groups of small steps and
occasional long steps in its trajectory. Such behavior has been Where:
largely applied to search and optimization problems with very • α ∈ (0, 2] is the most important parameter, called the
promising capability results [2]. characteristic exponent, responsible for the tail of the dis-
Over the last years, several bio-inspired algorithms have tribution. In this interval, the probability density function
been developed for optimization. Those algorithms work ac- has behavior in power law. The exponent λ = α + 1;
cording to a random search in the solution space of a given • β ∈ [−1, 1] is the asymmetric coefficient, which char-
problem. Two crucial properties of the algorithms are exploita- acterizes the long tail asymptotic behavior of the distri-
tion and exploration. In the first there is an intensification of bution, and makes rare events more likely to occur. It
the search for the best current solutions. In the second there defines whether the distribution is symmetric (β = 0), or
is a diversification to explore solutions far enough from the asymmetric to the right (β > 0) or to the left (β < 0);
current solution as to prevent the algorithm from getting stuck • γ > 0 is the scale or width of the distribution; and,

c
978-1-7281-2153-6/19/$31.00 2019 IEEE 1351
• δ ∈ R is the location or offset. low values of α increase diversification, and thus the proba-
This family includes the three distributions [4]: bility of large jumps.
Using the concepts in [7] and [8], [9], the function stblrnd
1) Gaussian distribution N(µ,σ 2 ), with α = 2, mean µ = δ
in MATLAB generates random numbers with a α-stable dis-
and variance σ 2 = 2γ 2 , is given by S(2,β, √σ2 ,µ) and its
tribution. The command below generates an M-by-N array of
probability density function can be seen in Equation (2),
random variables with S(α,β,γ,δ) distribution.
1 (x − µ)2
f (x) = √ exp(− ) (2) X = stblrnd(α, β, γ, δ, M, N, ...) (6)
2πσ 2 2σ 2
2) Cauchy-Lorentz distribution, with α = 1, β = 0, Additionally, for random number generation with uniform
σ = γ and mean µ = δ, is given by S(1,0,σ,µ) and its distribution, MATLAB has the function unifrnd(a, b, [M N]).
probability density function can be seen in Equation (3), Employing those two functions, Figure 2 shows the trajectories
  of each distribution, considering 100,000 steps and having the
1 σ2 origin as the starting point, in 2D search space.
f (x) = (3)
πσ (x − µ)2 + σ 2
3) Lévy distribution, with α = 32 , β = 0, σ = γ and mean
µ = δ is given by S( 32 ,0,σ,µ) and its probability density
function can be seen in Equation (4).
r σ
σ exp(− 2(x−µ) )
f (x) = (4)
2π (x − µ) 32
The continuous uniform distribution is the one where, for
any point in the interval [a, b], its probability density function
is given by Equation (5).
 1
f (x) = b−a , a<x<b
(5)
0, otherwise
The behavior of the probability density functions for the
above 3 distributions and the uniform distribution can be seen
in Figure 1.

Fig. 2: Trajectory in Uniform, Gaussian, Cauchy-Lorentz and


Lévy distributions

As we can see (Figure 2), in the trajectory of the Gaussian


distribution, with simpler random walk known as Brownian,
individuals perform always small steps over time. The same
can be seen with the uniform distribution. In the Lévy dis-
tribution, the displacements are not smooth as those of the
Gaussian. Instead, there are regions of entrapment with a lot
of small steps, where individuals stay for a long time, and
also long-distance displacements (flights) [10]. The trajectory
Fig. 1: Probability density function of distributions of the Cauchy-Lorentz distribution is the opposite, with several
long jumps and very small groups of small steps.
Gaussian distributions are symmetric with tails decaying
rapidly when N tends to infinity. Lévy and Cauchy-Lorentz III. S TATISTICAL A NALYSIS
distributions are long tail distributions, which increase the CS and SACS outperformed genetic algorithms (GA),
likelihood of rare events (long jumps) occurring. Decreasing ant colony optimization (ACO), particle swarm optimization
the value of α results in generating more samples far from (PSO), artificial bee colony (ABC), firefly algorithm (FA) and
zero, since the distributions are symmetric around zero. This bat algorithm (BAT) in benchmark function optimization, wind
shows the influence of α in the control of exploitation and energy forecasting and data clustering problems [11], [12].
exploration. High values of α make the algorithm more CS and SACS use the Lévy probability distribution, whereas
exploited, reducing the probability of large jumps. Conversely, the others use the uniform probability distribution. Among the

1352
possible reasons for the efficiency of the cuckoo algorithms are TABLE I: Population Size - Execution Time Results
the small number of parameters to be fitted, population size (a) Benchmark
(N ), probability (pa ) and equilibrium between diversification Sphere (d=2) Ackley (d=2) Sphere (d=50) Ackley (d=50)
and intensification, obtained from the Lévy distribution. In 10
7.91 12.76 11.00 15.16
(9.89e-02) (4.01e-02) (0.11) (1.95e-02)
order to investigate the influence of such parameters, we 15.37 24.85 21.39 29.83
observe in this section the results of four experimentsin three 20
(3.89e-02) (7.88e-02) (4.77e-02) (5.79e-02)
problems.The parameters of all algorithms were set with their 22.70 37.11 31.55 44.44
30
(7.06e-02) (0.11) (4.99e-02) (5.62e-02)
best values obtained as a result of a Tuning-PSO technique 30.21 49.67 42.02 59.16
40
to explore their full potential [12]. All experiments were per- (5.26e-02) (0.17) (8.11e-02) (7.34e-02)
37.67 61.94 52.49 73.71
formed to a maximum of 10000 iterations, after observing this 50
(8.00e-02) (9.21e-02) (0.15) (0.12)
guaranteed complete convergence of the algorithms. Results (b) Wind (c) Clustering
were measured with mean and standard deviation, values were Montana Texas Balance Bupa
averaged over 20 independent runs, time was measured in 10
2.65e+03 2.70e+03
10
378.15 336.18
(13.72) (15.60) (3.94) (5.04)
seconds. Best results are highlighted in bold. 5.09e+03 5.17e+03 753.54 670.85
20 20
(20.48) (23.24) (1.50) (0.43)
A. Population Size (N ) 30
7.55e+03 7.65e+03
30
1.12e+03 1.00e+03
(37.02) (28.64) (3.74) (0.98)
Population size (N ) is an important factor not only in the 40
1.00e+04 1.01e+04
40
1.50e+03 1.33e+03
(28.98) (39.14) (2.82) (1.36)
CS algorithm, but to all bio-inspired algorithms. In order to 1.24e+04 1.26e+04 1.87e+03 1.67e+03
50 50
evaluate its impact on CS, an experiment was performed to (52.26) (265.75) (1.59) (2.80)
compare results when (N ) was varied to 10, 20, 30, 40 and
50, and fixing the probability pa with its best value, obtained
with the Tuning-PSO technique.
As we can see in Table I, the variation of population size
has a direct influence on the execution time of the algorithm.
The larger the number of individuals the longer the execution
time. This is explained by the greater number of evaluations
of the objective function when there are more individuals in
the algorithm. As observed in Table II (a) and Figure 3, all
experiments with the benchmark functions reached the same
quality of solution within the 10000 iterations, except in the
case of the 50-dimensions Sphere function, which obtained
better results and converged faster with fewer individuals in
its population. In the wind energy prediction problem (Table II
(b,c) and Figure 4), which is more complex, it was observed
that with only 10 individuals in the population the worst
results were obtained in all metrics, with POCID below 60%
in the Montana database, UTHEIL above 1 (one) and the
highest values obtained for the MAPE and MSE metrics in
the two databases. With 20 or more individuals the differences
were not significant, but it was observed the majority of best Fig. 3: Population Size - Convergence (Benchmark)
results (9 out of 12 cases) were obtained with N = 50.
In the clustering problem (Table II (d,e) and Figure 5), all
experiments reached the same solution quality within 10000 importance of (pa ) in the CS algorithm, an experiment was
iterations and was not affected by variation in the population conducted varying pa with values 0.1, 0.3, 0.5, 0.7 and 0.9,
size. and fixing the population size N at its best value reached with
the Tuning-PSO technique.
B. Probability (pa ) Regarding the quality of solutions in the Benchmark and
pa represents the probability of a cuckoo’s egg being Wind problems with respect to error rates, we observe (Ta-
discovered by the host bird and the nest being abandoned, ble IV (a,b,c)) the best results were typically obtained with
that is, the probability the solution is discarded and replaced lower probability values. For more complex problems such
by another solution. pa has values ranging from 0 to 1, with as high dimensional benchmark functions (d = 50) and Wind
low values meaning less nests will tend to be probabilistically problems (all best results found with pa =0.1, with exception
discarded, and the algorithm will hold more information from of metric POCID, also presenting good results), the choice of
the past. Conversely, high values make the algorithm to forget the probability value has an even greater effect on the quality
information from past populations, increasing in theory its of the solution, except for the Ackley function with 50 dimen-
ability to escape from local extrema. In order to evaluate the sions, in which case the same results were obtained. However,

1353
TABLE II: Population Size - Error Results
(a) Benchmark
Sphere (d=2) Ackley (d=2) Sphere (d=50) Ackley (d=50)
4.94e-324 2.22e-15 1.53e-322 2.22e-15
10
(0) (0) (0) (0)
4.94e-324 2.22e-15 1.53e-322 2.22e-15
20
(0) (0) (0) (0)
4.94e-324 2.22e-15 1.04e-270 2.22e-15
30
(0) (0) (1.08e-271) (0)
4.94e-324 2.22e-15 2.65e-242 2.22e-15
40
(0) (0) (1.76e-242) (0)
4.94e-324 2.22e-15 8.10e-225 2.22e-15
50
(0) (0) (4.46e-225) (0) Fig. 4: Population Size - Convergence (Wind)
(b) Wind - Montana
MAE MAPE MSE POCID UTHEIL ARV
10.83 215.03 348.97 56.76 2.68 3.20e-04
10
(13.68) (278.98) (490.20) (42.43) (3.43) (4.37e-04)
0.94 11.77 1.73 86.67 0.18 7.72e-06
20
(0.17) (3.40) (0.53) (0.13) (6.05e-02) (2.72e-06)
2.17 39.33 8.16 86.69 1.04 5.21e-05
30
(0.59) (14.40) (4.72) (9.69e-02) (0.92) (4.70e-05)
1.53 28.20 3.95 86.76 0.35 1.63e-05
40
(0.72) (21.68) (2.68) (0) (0.18) (1.01e-05)
0.94 12.09 1.64 86.71 0.18 7.37e-06
50
(0.13) (2.83) (0.37) (6.46e-02) (4.86e-02) (2.18e-06)
(c) Wind - Texas
MAE MAPE MSE POCID UTHEIL ARV
6.59 136.24 472.75 79.83 1.22 1.53e-04 Fig. 5: Population Size - Convergence (Clustering)
10
(9.19) (214.22) (853.12) (18.64) (1.51) (2.07e-04)
3.04 47.85 263.03 83.38 0.58 5.95e-05
20
(5.39) (88.55) (813.30) (14.59) (1.06) (1.22e-04)
1.40 19.01 6.25 88.11 0.27 2.33e-05 error, we observe, for a general population size, the smaller
30
(0.43) (8.34) (4.53) (0.18) (0.19) (1.62e-05) the value of the parameter pa the faster the convergence of
1.44 20.67 5.58 88.14 0.26 2.21e-05
40
(0.47) (7.38) (4.49) (0.17) (0.23) (2.01e-05) CS ( Figure 6, Figure 7 and Figure 8). However, the value
50
1.19 18.39 3.42 87.95 0.16 1.34e-05 reached at the end of 10000 iterations is in most cases the
(0.23) (9.23) (0.59) (0.22) (2.31e-02) (2.29e-06)
(d) Clustering - Balance same. This indicates the pa is able to control convergence
speed of the algorithm, but is not the main responsible for
J Accuracy Precision Coverage F-Measure IRC
1.55e+03 0.83 0.75 0.82 0.69 0.50 its global convergence, which causes it to escape from local
10
(4.82e-13) (0) (0) (0) (0) (0) minima and obtain the lowest error values.
1.55e+03 0.83 0.75 0.82 0.69 0.50
20
(3.60e-13) (0) (0) (0) (0) (0)
1.55e+03 0.83 0.75 0.82 0.69 0.50
30
(1.61e-13) (0) (0) (0) (0) (0)
1.55e+03 0.83 0.75 0.82 0.69 0.50
40
(3.94e-13) (0) (0) (0) (0) (0)
1.55e+03 0.83 0.75 0.82 0.69 0.50
50
(2.27e-13) (0) (0) (0) (0) (0)
(e) Clustering - Bupa
J Accuracy Precision Coverage F-Measure IRC
1.23e+04 0.62 0.63 0.63 0.62 5.80e-02
10
(0) (0) (0) (0) (0) (0)
1.23e+04 0.62 0.63 0.63 0.62 5.80e-02
20
(2.88e-12) (0) (0) (0) (0) (0)
1.23e+04 0.62 0.63 0.63 0.62 5.80e-02
30
(1.29e-12) (0) (0) (0) (0) (0)
1.23e+04 0.62 0.63 0.63 0.62 5.80e-02
40
(1.82e-12) (0) (0) (0) (0) (0)
1.23e+04 0.62 0.63 0.63 0.62 5.80e-02
50
(1.82e-12) (0) (0) (0) (0) (0)

for simpler problems such as low benchmark functions (d =


2) and the Clustering problem (Table IV (d,e)), the effect of
pa was not significant, with different values of pa reaching Fig. 6: Probability - Convergence (Benchmark)
the same solution. Will respect to execution time, it is seen
(Table III), there was no significant difference in none of the
problems since for all of them there are no extra function C. Probability Distribution
evaluations. Since the population size and probability of abandoning the
With respect to the average convergence velocity of the nest are not the determining factors for the performance of

1354
TABLE III: Probability - Execution Time Results TABLE IV: Probability - Error Results
(a) Benchmark (a) Benchmark
Sphere (d=2) Ackley (d=2) Sphere (d=50) Ackley (d=50) Sphere (d=2) Ackley (d=2) Sphere (d=50) Ackley (d=50)
31.10 16.35 25.70 24.15 4.94e-324 2.22e-15 1.43e-322 2.22e-15
0.1 0.1
(0.84) (2.40e-02) (0.52) (7.03e-02) (0) (0) (0) (0)
31.74 16.40 25.94 24.27 4.94e-324 2.22e-15 3.66e-287 2.22e-15
0.3 0.3
(7.51e-02) (3.28e-02) (5.35e-02) (4.27e-02) (0) (0) (0) (0)
31.68 16.51 26.16 24.28 4.94e-324 2.22e-15 1.17e-220 2.22e-15
0.5 0.5
(4.58e-02) (4.51e-02) (3.22e-02) (3.13e-02) (0) (0) (0) (0)
31.74 16.52 26.38 24.22 4.94e-324 2.22e-15 1.91e-138 2.22e-15
0.7 0.7
(0.11) (2.04e-02) (4.00e-02) (2.80e-02) (0) (0) (6.03e-138) (0)
31.76 16.49 26.68 24.10 4.94e-324 2.22e-15 6.38e-62 2.22e-15
0.9 0.9
(6.17e-02) (2.86e-02) (3.10e-02) (5.12e-02) (0) (0) (2.01e-61) (0)
(b) Wind (c) Clustering (b) Wind - Montana
Montana Texas Balance Bupa MAE MAPE MSE POCID UTHEIL ARV
9.48e+03 7.82e+03 930.21 1.35e+03 0.87 9.36 1.59 86.64 0.17 6.90e-06
0.1 0.1 0.1
(24.54) (19.14) (10.95) (16.14) (0.10) (0.28) (0.50) (9.69e-02) (5.27e-02) (2.24e-06)
9.57e+03 7.82e+03 938.01 1.37e+03 0.95 11.34 1.73 86.67 0.19 7.70e-06
0.3 0.3 0.3
(37.21) (32.98) (0.69) (12.18) (0.11) (2.60) (0.24) (0) (2.34e-02) (9.98e-07)
9.50e+03 7.80e+03 938.19 1.36e+03 8.18 171.36 401.65 61.30 2.14 2.00e-04
0.5 0.5 0.5
(14.98) (25.02) (2.09) (3.08) (8.84) (159.24) (559.28) (35.87) (2.40) (2.52e-04)
9.51e+03 7.81e+03 936.82 1.36e+03 1.15 16.00 2.46 86.67 0.26 1.14e-05
0.7 0.7 0.7
(23.55) (40.37) (3.08) (2.71) (0.22) (6.39) (0.65) (6.46e-02) (6.74e-02) (3.34e-06)
9.57e+03 7.90e+03 941.85 1.37e+03 1.28 19.28 3.90 86.55 0.38 1.81e-05
0.9 0.9 0.9
(8.85) (227.98) (0.96) (3.50) (0.72) (14.32) (3.88) (9.69e-02) (0.37) (1.84e-05)
(c) Wind - Texas
MAE MAPE MSE POCID UTHEIL ARV
1.05 13.39 3.94 88.04 0.18 1.54e-05
0.1
(0.27) (5.84) (0.57) (0.26) (2.39e-02) (2.83e-06)
1.33 26.03 5.15 88.06 0.22 1.94e-05
0.3
(0.67) (23.75) (1.09) (0.29) (3.10e-02) (2.63e-06)
1.63 18.65 8.79 88.06 0.42 3.71e-05
0.5
(0.94) (11.18) (6.01) (0.29) (0.34) (2.91e-05)
1.33 28.53 4.86 87.99 0.20 1.85e-05
0.7
(0.66) (27.30) (0.66) (0.19) (9.53e-03) (1.22e-06)
1.62 29.53 4.12 87.97 0.24 2.00e-05
0.9
(0.50) (14.87) (1.88) (0.23) (0.15) (1.18e-05)
(d) Clustering - Balance
J Accuracy Precision Coverage F-Measure IRC
Fig. 7: Probability - Convergence (Wind) 1.55e+03 0.83 0.75 0.82 0.69 0.50
0.1
(3.60e-13) (0) (0) (0) (0) (0)
1.55e+03 0.83 0.75 0.82 0.69 0.50
0.3
(3.60e-13) (0) (0) (0) (0) (0)
CS (different parameter values usually maintain the same cost 1.55e+03 0.83 0.75 0.82 0.69 0.50
0.5
and quality of solutions), the choice of distribution employed (4.55e-13) (0) (0) (0) (0) (0)
1.55e+03 0.83 0.75 0.82 0.69 0.50
in the algorithm is the remaining aspect to be analyzed. This 0.7
(5.08e-13) (0) (0) (0) (0) (0)
has already been the subject of some recent work, with many 0.9
1.55e+03 0.83 0.75 0.82 0.69 0.50
(1.61e-13) (0) (0) (0) (0) (0)
researchers [13]–[17] trying to improve the performance of
(e) Clustering - Bupa
CS by changing the probability distribution or by adding local
J Accuracy Precision Coverage F-Measure IRC
search methods. 1.23e+04 0.62 0.63 0.63 0.62 5.80e-02
With the aim of analyzing the effect of distribuions on CS, 0.1
(1.82e-12) (0) (0) (0) (0) (0)
the next experiment compared the results of the CS-Gauss, CS- 1.23e+04 0.62 0.63 0.63 0.62 5.80e-02
0.3
(1.29e-12) (0) (0) (0) (0) (0)
Cauchy, CS-Lévy and CS-Uniform algorithms, as variations 1.23e+04 0.62 0.63 0.63 0.62 5.80e-02
0.5
of the CS algorithm, using the three α-stable distributions and (1.29e-12) (0) (0) (0) (0) (0)
1.23e+04 0.62 0.63 0.63 0.62 5.80e-02
the distribution uniform. In this case, parameters N and pa 0.7
(1.82e-12) (0) (0) (0) (0) (0)
1.23e+04 0.62 0.63 0.63 0.62 5.80e-02
0.9
(2.88e-12) (0) (0) (0) (0) (0)

were set to their best values obtained with the Tuning-PSO


technique, so as to isolate the impact of the distribution on
the performance of the algorithm.
With respect to runtime (Table V), the values obtained
were very close but with the uniform distribution showing the
shortest times. It was also observed among the α-stable distri-
butions, the Lévy distribution presented the longest execution
Fig. 8: Probability - Convergence (Clustering) time. This can be explained by the Lévy distribution having a

1355
more complex computational equation than the Gaussian and TABLE VI: Probability Distribution - Error Results
Cauchy distributions. (a) Benchmark
Sphere (d=2) Ackley (d=2) Sphere (d=50) Ackley (d=50)
TABLE V: Probability Distribution - Execution Time Results 4.94e-324 2.22e-15 1.53e-187 2.22e-15
CS-Gauss
(0) (0) (5.64e-188) (0)
(a) Benchmark 4.94e-324 2.22e-15 9.97e-190 2.22e-15
CS-Cauchy
(0) (0) (6.67e-190) (0)
Sphere (d=2) Ackley (d=2) Sphere (d=50) Ackley (d=50) 4.94e-324 2.22e-15 3.40e-190 2.22e-15
31.03 48.37 40.56 50.84 CS-Lévy
CS-Gauss (0) (0) (0) (0)
(0.27) (1.06) (2.65) (53.48) 2.46e-190 2.22e-15 4.93e-186 2.22e-15
34.78 51.08 42.50 52.42 CS-Uniform
CS-Cauchy (1.12e-190) (0) (2.22e-186) (0)
(3.68) (2.54) (3.06) (4.12)
36.57 54.38 52.29 55.83
(b) Wind - Montana
CS-Lévy
(3.53) (5.47) (6.52) (1.48) MAE MAPE MSE POCID UTHEIL ARV
28.20 48.30 31.09 43.93 1.26 21.82 3.69 86.60 0.33 1.53e-05
CS-Uniform CS-Gauss
(0.46) (4.14) (1.96) (0.51) (0.60) (17.43) (3.33) (3.23e-02) (0.27) (1.34e-05)
(b) Wind (c) Clustering 2.25 22.91 16.91 86.62 1.24 6.19e-05
CS-Cauchy
(0.73) (3.70) (7.14) (0) (0.54) (2.80e-05)
Montana Texas Balance Bupa 1.15 16.90 2.27 86.69 0.24 1.04e-05
9.36e+03 7.63e+03 912.24 1.31e+03 CS-Lévy
CS-Gauss CS-Gauss (0.25) (3.20) (0.83) (9.69e-02) (9.80e-02) (4.59e-06)
(36.90) (211.36) (13.14) (17.88) 8.10 164.90 400.88 61.32 2.11 1.99e-04
9.37e+03 7.53e+03 921.51 1.32e+03 CS-Uniform
CS-Cauchy CS-Cauchy (8.96) (168.38) (560.38) (35.90) (2.44) (2.55e-04)
(26.65) (66.57) (1.05) (0.32)
9.41e+03 7.71e+03 956.87 1.37e+03
(c) Wind - Texas
CS-Lévy CS-Lévy
(54.33) (14.95) (0.41) (2.77) MAE MAPE MSE POCID UTHEIL ARV
9.32e+03 7.43e+03 898.80 1.29e+03 1.63 18.65 8.79 88.06 0.42 3.71e-05
CS-Uniform CS-Uniform CS-Gauss
(16.68) (89.18) (0.11) (0.62) (0.94) (11.18) (6.01) (0.29) (0.34) (2.91e-05)
1.41 21.07 10.67 88.06 0.40 3.45e-05
CS-Cauchy
(0.78) (16.75) (8.87) (0.29) (0.29) (2.38e-05)
In the Benchmark and Wind problems, with respect to 1.16 14.77 3.09 88.15 0.14 1.20e-05
CS-Lévy
quality of solutions and convergence speed (Table VI (a,b,c), (7.41e-02) (0.80) (0.91) (9.69e-02) (3.66e-02) (2.88e-06)
1.96 16.37 23.14 88.31 0.77 6.19e-05
Figure 9 and Figure 10), the best results were mostly obtained CS-Uniform
(0.27) (2.53) (8.02) (6.46e-02) (0.25) (2.12e-05)
with the Lévy distribution and the worst results were obtained (d) Clustering - Balance
with the uniform distribution. Exceptions were observed with J Accuracy Precision Coverage F-Measure IRC
the MAPE metric in Texas database, where the uniform dis- CS-Gauss
1.55e+03 0.83 0.75 0.82 0.69 0.50
(3.60e-13) (0) (0) (0) (0) (0)
tribution outperformed the Gaussian and Cauchy distributions, 1.55e+03 0.83 0.75 0.82 0.69 0.50
and with the POCID metric also in Texas, where the uniform CS-Cauchy
(3.94e-13) (0) (0) (0) (0) (0)
distribution outperformed all α-stable distributions. 1.55e+03 0.83 0.75 0.82 0.69 0.50
CS-Lévy
(3.60e-13) (0) (0) (0) (0) (0)
Due to the occurrence of longer jumps, the Cauchy and CS-Uniform
1.55e+03 0.83 0.75 0.82 0.69 0.50
Lévy distributions have greater power of exploration than the (0) (0) (0) (0) (0) (0)
(e) Clustering - Bupa
Gaussian distribution, but the Cauchy distribution does not
present many short shifts, which does not allow the algorithm J Accuracy Precision Coverage F-Measure IRC
1.23e+04 0.62 0.63 0.63 0.62 5.80e-02
to intensify and improve the solution as much as the Lévy CS-Gauss
(0) (0) (0) (0) (0) (0)
distribution. Thus, out of the three α-stable distributions, the CS-Cauchy
1.23e+04 0.62 0.63 0.63 0.62 5.80e-02
(2.88e-12) (0) (0) (0) (0) (0)
Lévy distribution is the one that best balances exploitaion 1.23e+04 0.62 0.63 0.63 0.62 5.80e-02
CS-Lévy
and exploration in the search, and therefore obtained the (1.82e-12) (0) (0) (0) (0) (0)
1.23e+04 0.62 0.63 0.63 0.62 5.80e-02
best results generally. However, in the Clustering problem CS-Uniform
(3.15e-12) (0) (0) (0) (0) (0)
we observed (Table VI (d,e) the results attained were the
same, regardless of the chosen distribution, with the uniform
distribution presenting little slower convergence than the other In order to further investigate the impact of the distribution,
distributions (Figure 11). especially the good performance associated with the Lévy
distribution, the next experiment compares the PSO algorithm,
D. PSO-Lévy which uses the uniform distribution by default, with the CS
Several studies were conducted comparing the efficiency and PSO-Lévy, the latter being a variation of PSO employing
of CS to other bio-inspired algorithms. [18] and several the Lévy distribution. The PSO was the particular choice for
other theoretical studies suggested the PSO algorithm can this experiment due its frequent use in many previous works
quickly converge to the best current solution (gbest), but not and applications, and its potential problem of premature con-
necessarily to the best overall solution, since it does not satisfy vergence to local minima, opening new space for investigation.
the global convergence conditions of an algorithm [19]. On the As seen in Table VII, CS presented the longest running
other hand, [20] and other studies have shown CS meets those times in all cases and the evenly distributed PSOs, with the
requirements and guarantees global convergence. This causes exception of the Texas database. From this we can reinforce
the PSO to converge prematurely to local optimal locations in the use of the Lévy distribution includes additional processing
multimodal optimization problems, while the CS can generally when compared to the uniform distribution (in fact, regardless
converge to the optimal global. of the distribution CS is more costly than PSO). With respect

1356
TABLE VII: PSO-Lévy - Execution Time Results
(a) Benchmark
Sphere (d=2) Ackley (d=2) Sphere (d=50) Ackley (d=50)
16.31 8.84 31.42 19.11
PSO
(0.21) (7.67e-02) (0.33) (6.60e-02)
18.09 11.33 36.10 23.11
PSO-Lévy
(5.15e-02) (7.01e-02) (0.18) (5.84e-02)
31.31 16.32 45.42 23.91
CS-Lévy
(0.10) (6.54e-02) (8.51e-02) (6.35e-02)
(b) Wind (c) Clustering
Montana Texas Balance Bupa
6.10e+03 3.07e+03 471.13 565.71
PSO PSO
(29.59) (12.48) (5.59) (6.93)
6.01e+03 3.09e+03 479.03 574.98
PSO-Lévy PSO-Lévy
(32.25) (15.60) (1.43) (0.56)
9.81e+03 7.61e+03 934.47 1.37e+03
CS-Lévy CS-Lévy
(144.64) (19.08) (0.47) (0.69)

TABLE VIII: PSO-Lévy - Error Results


(a) Benchmark
Sphere (d=2) Ackley (d=2) Sphere (d=50) Ackley (d=50)
Fig. 9: Probability Distribution - Convergence (Benchmark) PSO
4.94e-324 0.86 7.34e-104 4.15
(0) (2.72) (2.32e-103) (0.71)
4.94e-324 2.22e-15 1.48e-323 2.22e-15
PSO-Lévy
(0) (0) (0) (0)
4.94e-324 2.22e-15 8.40e-311 2.22e-15
CS-Lévy
(0) (0) (0) (0)
(b) Wind - Montana
MAE MAPE MSE POCID UTHEIL ARV
13.10 264.90 800.88 71.32 3.81 3.79e-04
PSO
(3.96) (168.38) (360.38) (15.90) (1.44) (1.64e-04)
1.31 22.49 3.20 86.62 0.32 1.42e-05
PSO-Lévy
(1.78e-02) (5.17) (0.46) (0) (7.04e-02) (2.95e-06)
0.83 9.21 1.39 86.67 0.14 5.93e-06
CS-Lévy
(2.20e-04) (3.67e-04) (1.22e-03) (0) (1.28e-04) (5.32e-09)
(c) Wind - Texas
Fig. 10: Probability Distribution - Convergence (Wind) MAE MAPE MSE POCID UTHEIL ARV
10.97 207.85 551.63 56.36 41.27 6.57e-04
PSO
(6.69) (157.36) (795.09) (31.30) (84.57) (5.78e-04)
1.60 20.37 7.60 88.24 0.37 3.06e-05
PSO-Lévy
to quality of solution considering the error measures, it is (0.51) (8.44) (3.70) (7.20e-02) (0.27) (2.21e-05)
1.36 19.69 6.66 88.11 0.27 2.27e-05
seen (Table VIII) the use of the Lévy distribution in the CS-Lévy
(0.36) (9.26) (5.33) (0.18) (0.17) (1.48e-05)
PSO algorithm was effective, outperforming the uniformly (d) Clustering - Balance
distributed PSO. In addition, it is emphasized the PSO-Lévy J Accuracy Precision Coverage F-Measure IRC
algorithm also matched and even surpassed the CS algorithm 1.55e+03 0.83 0.75 0.82 0.69 0.50
PSO
in some cases. Particularly, in the Ackley multimodal function (4.79e-12) (0) (0) (0) (0) (0)
1.55e+03 0.83 0.75 0.82 0.69 0.50
with 2 and 50 dimensions, the employment of the Lévy PSO-Lévy
(2.95e-13) (0) (0) (0) (0) (0)
distribution made PSO to converge to the global minimum. CS-Lévy
1.55e+03 0.83 0.75 0.82 0.69 0.50
(2.27e-13) (0) (0) (0) (0) (0)
With respect to convergence velocity, we can observe (Fig- (e) Clustering - Bupa
ure 12, Figure 13 and Figure 14) the algorithm PSO-Lévy
J Accuracy Precision Coverage F-Measure IRC
converged faster than the PSO in all problems, showing the 1.23e+04 0.62 0.63 0.63 0.62 5.80e-02
PSO
efficiency of the Lévy distribution. We highlight here the (1.80e-11) (0) (0) (0) (0) (0)
1.23e+04 0.62 0.63 0.63 0.62 5.80e-02
PSO-Lévy
(1.42e-13) (0) (0) (0) (0) (0)
1.23e+04 0.62 0.63 0.63 0.62 5.80e-02
CS-Lévy
(1.29e-12) (0) (0) (0) (0) (0)

2-dimensional Sphere unimodal function and the Clustering


databases, in which cases the uniformly distributed PSO
already converged faster than CS and converged even faster
with the Lévy distribution.
IV. C ONCLUSION
Lévy and Cauchy flights use long tail distributions to im-
Fig. 11: Probability Distribution - Convergence (Clustering) prove global search or exploration of the entire solution space,

1357
robustness and easy adaptability of the algorithm, was not
also the main reason for efficiency. This became evident with
the experiment with the PSO-Lévy algorithm where although
having many parameters, PSO was able to reach very good
results when associated with the Lévy distribution. In addition,
the ABC algorithm has also only two parameters (population
size and upper limit of acceleration coefficient) and an ABC-
Lévy algorithm was unable to reach the same performance
level.
R EFERENCES
[1] C. Cruz, J. R. González, D. A. Pelta, N. Krasnogor, and G. Terrazas,
Nature Inspired Cooperative Strategies for Optimization (NICSO 2010).
Springer, 2010, vol. 284.
[2] N. E. Humphries, N. Queiroz, J. R. Dyer, N. G. Pade, M. K. Musyl,
K. M. Schaefer, D. W. Fuller, J. M. Brunnschweiler, T. K. Doyle, J. D.
Houghton et al., “Environmental context explains lévy and brownian
movement patterns of marine predators,” Nature, vol. 465, no. 7301,
pp. 1066–1069, 2010.
[3] X.-S. Yang and S. Deb, “Cuckoo search: recent advances and applica-
Fig. 12: PSO-Lévy - Convergence (Benchmark) tions,” Neural Computing and Applications, vol. 24, no. 1, pp. 169–174,
2014.
[4] G. Samoradnitsky and M. Taqqu, “Stable non-gaussian random pro-
cesses: Stochastic models with infinite variance, volume 1 of stochastic
modelling,” 1994.
[5] L. Fredriksson, “A brief survey of lévy walks: with applications to probe
diffusion,” 2010.
[6] G. M. B. d. Lima, “Processos aleatórios não-markovianos: perfis de
memória,” 2013.
[7] J. M. Chambers, C. L. Mallows, and B. Stuck, “A method for simulating
stable random variables,” Journal of the american statistical association,
vol. 71, no. 354, pp. 340–344, 1976.
[8] A. Weron and R. Weron, “Computer simulation of lévy α-stable vari-
ables and processes,” in ChaosThe Interplay Between Stochastic and
Deterministic Behaviour. Springer, 1995, pp. 379–392.
[9] M. Veillette, “Stbl: Alpha stable distributions for matlab,” Matlab
Fig. 13: PSO-Lévy - Convergence (Wind) Central File Exchange, retreived October, vol. 10, p. 2012, 2012.
[10] Y. Hariya, T. Kurihara, T. Shindo, and K. Jin’no, “Lévy flight pso,”
in Evolutionary Computation (CEC), 2015 IEEE Congress on. IEEE,
while Gaussian and uniform distribution movements favor 2015, pp. 2678–2684.
[11] C. E. M. Barbosa and G. C. Vasconcelos, “Cuckoo search optimization
local search. Being a kind of middle ground between the two for short term wind energy forecasting,” in Evolutionary Computation
other α-stable distributions, the Lévy distribution managed to (CEC), 2016 IEEE Congress on. IEEE, 2016, pp. 1765–1772.
better balance the exploration vs exploitation problem and has [12] ——, “Eight bio-inspired algorithms evaluated for solving optimization
problems,” in International Conference on Artificial Intelligence and Soft
shown in this work to reach better results than the others. This Computing. Springer, 2018, pp. 290–301.
evidences why CS and SACS algorithms have surpassed other [13] H. Zheng and Y. Zhou, “A novel cuckoo search optimization algorithm
six bio inspired algorithms in the same problems. The results based on gauss distribution,” Journal of Computational Information
Systems, vol. 8, no. 10, pp. 4193–4200, 2012.
obtained showed the Lévy distribution is more efficient and [14] S. K. Mishra, “Global optimization of some difficult benchmark func-
also the parameter of CS to most influence its performance. tions by host-parasite co-evolutionary algorithm,” 2013.
We have seen the probability and size of the population did [15] T. T. Nguyen, D. N. Vo, and T. T. Dao, “Cuckoo search algorithm
using different distributions for short-term hydrothermal scheduling with
not generally influenced the quality of the solutions. We also cascaded hydropower plants,” in TENCON 2014-2014 IEEE Region 10
saw the small number of CS parameters, despite indicating Conference. IEEE, 2014, pp. 1–6.
[16] T. T. Nguyen, D. N. Vo, and B. H. Dinh, “Cuckoo search algorithm
using different distributions for short-term hydrothermal scheduling
with reservoir volume constraint,” International Journal on Electrical
Engineering and Informatics, vol. 8, no. 1, p. 76, 2016.
[17] H. Abedi Firouzjaee, J. K. Kordestani, and M. R. Meybodi, “Cuckoo
search with composite flight operator for numerical optimization prob-
lems and its application in tunnelling,” Engineering Optimization, pp.
1–20, 2016.
[18] M. Clerc and J. Kennedy, “The particle swarm-explosion, stability, and
convergence in a multidimensional complex space,” IEEE transactions
on Evolutionary Computation, vol. 6, no. 1, pp. 58–73, 2002.
[19] X.-S. Yang, Cuckoo search and firefly algorithm: Theory and applica-
tions. Springer, 2013, vol. 516.
[20] F. Wang, X. He, Y. Wang, and S. Yang, “Markov model and convergence
analysis based on cuckoo search algorithm,” Computer Engineering,
Fig. 14: PSO-Lévy - Convergence (Clustering) vol. 38, no. 11, pp. 180–185, 2012.

1358

You might also like