Optimization R
Introduction
1.1 Motivation
different solutions. In particular, modern methods are useful for solving complex
problems for which no specialized optimization algorithm has been developed (Luke
2012). For instance, problems with discontinuities, dynamic changes, multiple
objectives, or hard and soft constraints are more difficult to handle with
classical methods (Michalewicz et al. 2006). Also, in contrast with classical methods,
modern optimization does not guarantee that the optimal solution is always found.
However, modern methods often achieve high quality solutions with a much more
reasonable use of computational resources (e.g., memory and processing effort)
(Michalewicz and Fogel 2004).
There is a vast number of successful real-world applications based on modern
optimization methods. Examples studied by the author of this book include (among
others): seating guests at a wedding party (Rocha et al. 2001); time series forecasting
(Cortez et al. 2004); optimization of data mining classification and regression
models (Rocha et al. 2007); and improvement of quality of service levels in
computer networks (Rocha et al. 2011).
1.2 Why R?
The R tool (R Core Team 2013) is an open source, high-level matrix programming
language for statistical and data analysis. The tool runs on multiple platforms,
including Windows, macOS, Linux, FreeBSD, and other UNIX systems. R is an
interpreted language, meaning that the user gets an immediate response from the tool,
without the need for program compilation. The most common usage of R is through a
console command interface, which often imposes a steeper learning curve on the
user when compared with more graphical user interface tools. However, after
mastering the R environment, the user achieves a better understanding of what is
being executed and higher control when compared with graphical interface based
products.
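As a minimal sketch of this interactive style (the vector values are illustrative, not from the book), each expression typed at the console is evaluated immediately, with no compile step:

```r
# R evaluates each expression as soon as it is entered:
x <- c(4.1, 2.3, 9.7)  # create a numeric vector
mean(x)                # immediate response: 5.366667
sd(x)                  # sample standard deviation, printed at once
summary(x)             # base distribution statistical summary
```

This immediacy is what makes the console convenient for inspecting intermediate results of an optimization run.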
The R base distribution includes a large variety of statistical techniques (e.g.,
distribution functions, statistical tests), which can be useful for inclusion in modern
optimization methods and to analyze their results. Moreover, the tool is highly
extensible by creating packages. The R community is very active and new packages
are being continuously created, with more than 5,800 packages available at the Com-
prehensive R Archive Network (CRAN): http://www.r-project.org/. By installing
these packages, users get access to new features, such as: data mining/machine
learning algorithms; simulation and visualization techniques; and also modern
optimization methods. New algorithms tend to be quickly implemented in R, and thus
the tool can be viewed as a worldwide gateway for sharing computational algorithms
(Cortez 2010). While it is difficult to know the real number of R users (e.g., it may
range from 250,000 to 2 million), several estimates show a clear growth in the R
popularity (Vance 2009; Muenchen 2013). A useful advantage of using R is that it
is possible to execute quite distinct computational tasks under the same tool, such as
A major decision when using modern optimization methods is related with how
to represent a possible solution (Michalewicz et al. 2006). Such a decision sets the
search space and its size, thus impacting how new solutions are
searched. To represent a solution, there are several possibilities. Binary, integer,
character, real value and ordered vectors, matrices, trees and virtually any computer
based representation form (e.g., computer program) can be used to encode solutions.
A given representation might include a mix of different encodings (e.g., binary and
real values). Also, a representation might be of fixed (e.g., fixed binary vectors) or
of variable length (e.g., trees).
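These encoding options can be sketched directly in R; the particular values below are illustrative only:

```r
# Common solution encodings, expressed as R objects:
b <- c(1, 0, 1, 1, 0)          # binary vector (fixed length)
i <- c(414, 404, 408)          # integer vector
r <- c(0.5, -1.2, 3.14)        # real value vector
m <- c(1, 0, 0.5, -1.2)        # mixed encoding: binary part then real part
o <- sample(1:8, 8)            # ordered vector: a permutation of 1..8
v <- list(c(1, 0), c(1, 0, 1)) # variable-length: a list of vectors
```

In practice the representation is just an R vector (or list), and the optimization method manipulates these objects when generating new candidate solutions.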
Historically, some of these representation types are associated with specific
optimization methods. For instance, binary encodings are the basis of Genetic
Algorithms (Holland 1975). Tabu search was designed to work on discrete
spaces (Glover and Laguna 1998). Real-value encodings are adopted by several
evolutionary algorithms (e.g., evolution strategy) (Bäck and Schwefel 1993),
differential evolution, and particle swarms. Tree structures are optimized using
genetic programming (Banzhaf et al. 1998). It should be noted that often these
optimization methods can be adapted to other representations. For instance, a novel
particle swarm was proposed for discrete optimization in Chen et al. (2010).
In what concerns this book, the representations adopted are restricted by the
implementations available in the explored R tool packages, which mainly adopt
binary or real values. Thus, there will be more focus on these types of representations.
Nevertheless, Sect. 7.2 shows an ordered vector representation optimization example.
More details about other representations, including their algorithmic adjustments,
can be found in Luke (2012).
Fig. 1.1 Example of convex (left) and non-convex (right) function landscapes (local and global minima are labeled on the evaluation function axis)
1.5 Constraints
There are two main types of constraints (Michalewicz 2008): hard and soft. Hard
constraints cannot be violated and are due to factors such as laws or physical
restrictions. Soft constraints are related with other (often non-priority) user goals,
such as increasing production efficiency while reducing environmental costs.
Soft restrictions can be handled by adopting a multi-objective approach
(Chap. 6), while hard constraints may give rise to infeasible solutions that need to
1.6 Optimization Methods
There are different dimensions that can be used to classify optimization methods.
Three factors of analysis are adopted in this book: the type of search (blind or
guided); whether the search is deterministic or stochastic; and whether the method
is inspired by physical or biological processes.
The type of search can be blind or guided. The former assumes the exhaustion
of all alternatives for finding the optimum solution, while the latter uses previous
searches to guide current search. Modern methods use a guided search, which often
is subdivided into two main categories: local, which searches within the neighbor-
hood of an initial solution, and global search, which uses a population of solutions.
In most practical problems, with high-dimensional search spaces, pure blind search
is not feasible, requiring too much computational effort. Local (or single-state)
search presents in general a much faster convergence, when compared with global
search methods. However, if the evaluation landscape is too noisy or complex, with
several local minima (e.g., right of Fig. 1.1), local methods can easily get stuck.
In such cases, multiple runs, with different initial solutions, can be executed to
improve convergence. Although population based algorithms tend to require more
computation than local methods, they perform a simultaneous search in distinct
regions of the search space, thus working better as global optimization methods.
The distinct search types can even be combined. For instance, a two-phase search
can be set, where a global method is employed at a first step, to quickly identify
interesting search space regions, and then, as the second step, the best solutions are
improved by employing a local search method. Another alternative is to perform
a tight integration of both approaches, such as under a Lamarckian evolution or
Baldwin effect (Rocha et al. 2000). In both cases, within each cycle of the population
based method, each new solution is used as the initial point of a local search and the
evaluation function is computed over the improved solution. Lamarckian evolution
replaces the population original solution by its improved value (Sect. 7.2 presents
a Lamarckian evolution example), while the Baldwinian strategy keeps the original
point (as set by the population based method).
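The two-phase idea can be sketched in a few lines of R. This is only an illustration, not code from the book: the evaluation function, bounds, and sample size are assumptions, the global phase is a plain Monte Carlo sample, and the local phase uses the base R optim function:

```r
# Two-phase search sketch: a global random (Monte Carlo) phase locates a
# promising region, then a local search refines the best point found.
sphere <- function(x) sum(x^2)  # illustrative evaluation function

set.seed(123)
D <- 2    # dimension (assumed)
N <- 100  # number of global-phase samples (assumed)
# phase 1: global search - sample N random solutions in [-5.2, 5.2]^D
pop <- matrix(runif(N * D, -5.2, 5.2), ncol = D)
best <- pop[which.min(apply(pop, 1, sphere)), ]
# phase 2: local search - refine the best solution with Nelder-Mead
res <- optim(best, sphere, method = "Nelder-Mead")
res$par    # refined solution, close to the origin
res$value  # refined evaluation, close to 0
```

The local phase typically improves the global-phase solution by several orders of magnitude on smooth landscapes such as this one.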
Several modern methods employ some degree of randomness, thus belonging
to the family of stochastic optimization methods, such as simulated annealing
and genetic algorithms (Luke 2012). Also, several of these methods are naturally
inspired (e.g., genetic algorithms, particle swarm optimization) (Holland 1975;
Eberhart et al. 2001). Figure 1.2 shows the full taxonomy of the optimization
methods presented in this book (with respective R packages).
The distinct modern optimization methods share some common features.
Algorithm 1 shows (in pseudo-code) a generic modern optimization method that
is applicable to all methods discussed in this book. This global algorithm receives
two inputs, the evaluation function (f ) and a set of control parameters (C ), which
includes not only the method’s internal parameters (e.g., initial temperature,
population size) but it is also related with the representation of the solution
(e.g., lower and upper bounds, representation type, and length). In all modern
optimization methods, there is an initial setup followed by a main loop cycle that
ends once a given termination criterion is met. Distinct criteria can be adopted (or
even combined):
Fig. 1.2 Classification of the optimization methods presented in this book (related R packages are in brackets):
- Blind Search: pure blind search and grid search (deterministic); monte carlo search (stochastic)
- Modern Optimization:
  - Single-State Search: hill climbing; tabu search (tabuSearch); simulated annealing (optim)
  - Population Based Search: genetic/evolutionary algorithm (genalg, mco); genetic programming (rgp); differential evolution (DEoptim); particle swarm optimization (pso); estimation of distribution algorithm (copulaedas)

(the modern methods are stochastic, and several of them are naturally inspired)
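The generic structure described above (inputs f and C, an initial setup, then a main loop until termination) can be sketched in R. The method body here is a hypothetical random search used purely to illustrate the skeleton, not any specific algorithm from this book:

```r
# Generic modern optimization method sketch: receives an evaluation
# function f and a control list C (internal parameters plus representation
# details such as bounds and length), runs a setup step, then loops until
# a termination criterion is met.
optimizer <- function(f, C) {
  s <- runif(C$D, C$lower, C$upper)       # setup: initial random solution
  fs <- f(s)
  iter <- 0
  while (iter < C$maxit) {                # termination: iteration budget
    s2 <- runif(C$D, C$lower, C$upper)    # create a new solution
    fs2 <- f(s2)                          # evaluate it
    if (fs2 < fs) { s <- s2; fs <- fs2 }  # keep the best (minimization)
    iter <- iter + 1
  }
  list(sol = s, eval = fs)                # best solution and its evaluation
}

set.seed(42)
res <- optimizer(function(x) sum(x^2),
                 list(D = 2, lower = -5, upper = 5, maxit = 500))
```

Each concrete method in later chapters differs mainly in how new solutions are created from current ones and in how the selection step works.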
1.7 Demonstrative Problems
This section includes examples of simple optimization tasks that were selected
mainly from a tutorial perspective, where the aim is to easily show the capabilities
of the optimization methods. The selected demonstrative problems include two binary,
one integer, and two real value tasks. More optimization tasks are presented and explored
in Chaps. 6 (multi-objective tasks) and 7 (real-world tasks).
The binary problems are termed here sum of bits and max sin. The former, also
known as max ones, is a simple binary maximization “toy” problem, defined as
(Luke 2012):
f_{\text{sum of bits}}(x) = \sum_{i=1}^{D} x_i    (1.1)
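In R, this evaluation function is a one-liner; a short sketch:

```r
# Sum of bits (max ones) evaluation function: for this maximization
# task the optimum is a vector of all ones, with f(x) = D.
sumbits <- function(x) sum(x)  # x is a binary (0/1) vector

sumbits(c(1, 0, 1, 1, 0))  # returns 3
sumbits(rep(1, 8))         # optimum for D = 8: returns 8
```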
f_{\text{bag prices}} = \sum_{i=1}^{D} x_i \times sales(x_i) - cost(x_i)    (1.3)
The middle left graph of Fig. 1.3 plots the full search space for the first item of
bag prices (D = 1), while the middle right plot shows a zoom near the optimum
solution. As shown by the graphs, the profit function follows in general a global
convex shape. However, close to the optimum point (x_1 = 414, f(x_1) = 11420)
there are several local minima, under a "saw" shape that is more difficult to optimize.
As shown in Chap. 3, the optimum solution for five different bags (D = 5) is
x = c(414, 404, 408, 413, 395), which corresponds to an estimated profit of
$43,899.
Turning to the real value tasks, two popular benchmarks are adopted, namely
sphere and rastrigin (Tang et al. 2009), which are defined by:
f_{\text{sphere}}(x) = \sum_{i=1}^{D} x_i^2
f_{\text{rastrigin}}(x) = \sum_{i=1}^{D} (x_i^2 - 10 \cos(2\pi x_i) + 10)    (1.4)
where x = (x_1, ..., x_D) is a real value vector (x_i \in \mathbb{R}). For both tasks, the
goal is to find the minimum value, which is at the origin point (e.g., if D = 4 and
x = (0, 0, 0, 0), then f(x) = 0). The sphere task is simpler, mainly used for
demonstration purposes, while rastrigin is a much more difficult multi-modal
Fig. 1.3 Example of the binary (sum of bits—top left; max sin—top right), integer (bag prices—
middle) and real value (sphere—bottom left; rastrigin—bottom right) task landscapes
problem, given that the number of local minima grows exponentially with the
increase of dimensionality (D). The differences between sphere and rastrigin are
clearly shown in the two graphs at the bottom of Fig. 1.3.