
Chapter 1

Introduction

1.1 Motivation

A vast number of real-world (often complex) tasks can be viewed as an optimization
problem, where the goal is to minimize or maximize a given objective. In effect,
optimization is quite useful in distinct application domains, such as Agriculture,
Banking, Control, Engineering, Finance, Marketing, Production, and Science.
Moreover, due to advances in Information Technology, nowadays it is easy to store
and process data. Since the 1970s, and following Moore's law, the number of
transistors in computer processors has doubled every 2 years, resulting in more
computational power at a reasonable price. It is estimated that the amount
of data storage doubles at an even higher rate. Furthermore, organizations and individual
users are currently pressured to increase efficiency and reduce costs. Rather than
making decisions based on human experience and intuition, there is an increasing
trend toward adopting computational tools, based on optimization methods, to analyze
real-world data in order to make better informed decisions.
Optimization is a core topic of the Operations Research field, which developed
several classical techniques, such as linear programming (proposed in the 1940s)
and branch and bound (developed in the 1960s) (Schrijver 1998). More recently,
in the last decades, there has been an emergence of new optimization algorithms,
often termed “modern optimization” (Michalewicz et al. 2006), “modern heuristics”
(Michalewicz and Fogel 2004), or “metaheuristics” (Luke 2012). In this book, we
adopt the first term, modern optimization, to describe these algorithms.
In contrast with classical methods, modern optimization methods are general
purpose solvers, i.e., applicable to a wide range of distinct problems, since little
domain knowledge is required. For instance, the optimization problem does not
need to be differentiable, which is required by classical methods such as gradient
descent. There are only two main issues that need to be specified by the user
when adopting modern heuristic methods (Michalewicz et al. 2006): the representa-
tion of the solution, which defines the search space and its size; and the evaluation
function, which defines how good a particular solution is, making it possible to compare
different solutions. In particular, modern methods are useful for solving complex
problems for which no specialized optimization algorithm has been developed (Luke
2012). For instance, problems with discontinuities, dynamic changes, multiple
objectives, or hard and soft constraints are more difficult to handle with
classical methods (Michalewicz et al. 2006). Also in contrast with classical methods,
modern optimization does not guarantee that the optimal solution is always found.
However, often modern methods achieve high quality solutions with a much more
reasonable use of computational resources (e.g., memory and processing effort)
(Michalewicz and Fogel 2004).
There is a vast number of successful real-world applications based on modern
optimization methods. Examples studied by the author of this book include (among
others): seating guests at a wedding party (Rocha et al. 2001); time series forecasting
(Cortez et al. 2004); optimization of data mining classification and regression
models (Rocha et al. 2007); and improvement of quality of service levels in
computer networks (Rocha et al. 2011).

1.2 Why R?

The R tool (R Core Team 2013) is an open source, high-level matrix programming
language for statistical and data analysis. The tool runs on multiple platforms,
including Windows, MacOS, Linux, FreeBSD, and other UNIX systems. R is an
interpreted language, meaning that the user gets an immediate response from the tool,
without the need of program compilation. The most common usage of R is under a
console command interface, which often presents a steeper learning curve when
compared with other more graphical user interface tools. However, after
mastering the R environment, the user achieves a better understanding of what is
being executed and higher control when compared with graphical interface based
products.
The R base distribution includes a large variety of statistical techniques (e.g.,
distribution functions, statistical tests), which can be useful for inclusion in modern
optimization methods and to analyze their results. Moreover, the tool is highly
extensible by creating packages. The R community is very active and new packages
are being continuously created, with more than 5,800 packages available at the
Comprehensive R Archive Network (CRAN): http://www.r-project.org/. By installing
these packages, users get access to new features, such as: data mining/machine
learning algorithms; simulation and visualization techniques; and also modern
optimization methods. New algorithms tend to be quickly implemented in R, thus
this tool can be viewed as a worldwide gateway for sharing computational algorithms
(Cortez 2010). While it is difficult to know the real number of R users (e.g., it may
range from 250,000 to 2 million), several estimates show a clear growth in R's
popularity (Vance 2009; Muenchen 2013). A useful advantage of using R is that it
is possible to execute quite distinct computational tasks under the same tool, such as
combining optimization with statistical analysis, visualization, simulation, and data
mining (see Sect. 7.4 for an example that optimizes data mining models).
Given that a large number of packages is available, several R packages are
organized into CRAN task views to facilitate their usage. The Optimization
and Mathematical Programming view is located at http://cran.r-project.org/web/
views/Optimization.html and includes more than 60 packages. In this book, we
explore several of these CRAN view packages (and others) related to modern
optimization methods.
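As a minimal illustration of this extensibility, the commands below install and load some of the CRAN optimization packages explored later in this book (a sketch, assuming an Internet connection; see Fig. 1.2 for the package names):

install.packages(c("genalg", "DEoptim", "pso")) # fetch the packages from CRAN
library(DEoptim) # load the differential evolution package into the session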

1.3 Representation of a Solution

A major decision when using modern optimization methods is how to represent
a possible solution (Michalewicz et al. 2006). This decision sets the
search space and its size, thus producing an impact on how new solutions are
searched. To represent a solution, there are several possibilities. Binary, integer,
character, real value and ordered vectors, matrices, trees and virtually any computer
based representation form (e.g., computer program) can be used to encode solutions.
A given representation might include a mix of different encodings (e.g., binary and
real values). Also, a representation might be of fixed (e.g., fixed binary vectors) or
of variable length (e.g., trees).
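As an illustrative sketch, the R vectors below show possible encodings for some of these representation types (the values are hypothetical; the exact encoding always depends on the chosen method and package):

b <- c(1, 0, 1, 1)      # binary vector (fixed length)
n <- c(414, 404, 408)   # integer vector (e.g., selling prices in $)
r <- c(0.51, -1.2, 3.0) # real value vector
p <- sample(1:8, 8)     # ordered vector (a random permutation of 1 to 8)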
Historically, some of these representation types are associated with specific
optimization methods. For instance, binary encodings are the basis of Genetic
Algorithms (Holland 1975). Tabu search was designed to work on discrete
spaces (Glover and Laguna 1998). Real-value encodings are adopted by several
evolutionary algorithms (e.g., evolution strategy) (Bäck and Schwefel 1993),
differential evolution, and particle swarms. Tree structures are optimized using
genetic programming (Banzhaf et al. 1998). It should be noted that often these
optimization methods can be adapted to other representations. For instance, a novel
particle swarm was proposed for discrete optimization in Chen et al. (2010).
Concerning this book, the adopted representations are restricted by the
implementations available in the explored R tool packages, which mainly adopt
binary or real values. Thus, there will be more focus on these types of representations.
Nevertheless, Sect. 7.2 shows an ordered vector representation optimization example.
More details about other representations, including their algorithmic adjustments,
can be found in Luke (2012).

1.4 Evaluation Function

Another important decision for handling optimization tasks is the definition of
the evaluation function, which should translate the desired goal (or goals) to be
maximized or minimized. Such a function makes it possible to compare different
solutions, by providing either a rank (ordinal evaluation function) or a quality
measure score (numeric function) (Michalewicz et al. 2006). When considering
numeric functions, the shape can be convex or non-convex, with several local
minima/maxima (Fig. 1.1). Convex tasks are much easier to solve and there are
specialized algorithms (e.g., least squares, linear programming) that are quite
effective for handling such problems (Boyd and Vandenberghe 2004). However, many
practical problems are non-convex, often including noisy or complex function
landscapes with discontinuities. Optimization problems can even be dynamic,
changing through time. For all these complex problems, an interesting alternative
is to use modern optimization algorithms that only search a subset of the search
space but tend to achieve near optimum solutions in a reasonable time.

[Fig. 1.1: Example of a convex (left) and non-convex (right) evaluation function landscape over the search space, marking local and global minima.]

By default, some implementations of optimization methods only perform a
minimization of a numerical evaluation function. In such cases, a simple approach is
to transform the maximization of function f(s) into the equivalent minimization
task min f'(s), by adopting f'(s) = -f(s), where s denotes the solution.
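The R sketch below demonstrates this transformation with the base optim function (which minimizes by default) on a hypothetical toy function whose maximum is at s = (3, 3):

f <- function(s) 10 - sum((s - 3)^2) # toy function, maximum f = 10 at s = c(3, 3)
f1 <- function(s) -f(s)              # equivalent minimization task: f'(s) = -f(s)
res <- optim(par = c(0, 0), fn = f1) # minimizing f1 maximizes f
res$par    # close to c(3, 3)
-res$value # close to 10, the maximum of f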
In several application fields (e.g., Control, Engineering, Finance) there are two
or more goals that need to be optimized. Often, these goals conflict and trade-
offs need to be set, since optimizing solutions under a single objective can lead to
unacceptable outcomes in terms of the remaining goals. In such cases, a much better
approach is to adopt a multi-objective optimization. In this book, we devote more
attention to single response evaluation functions, since multi-objective optimization
is discussed in a separate chapter (Chap. 6).

1.5 Constraints

There are two main types of constraints (Michalewicz 2008): hard and soft. Hard
constraints cannot be violated and are due to factors such as laws or physical
restrictions. Soft constraints are related to other (often non-priority) user goals,
such as increasing production efficiency while reducing environmental costs.
Soft restrictions can be handled by adopting a multi-objective approach
(Chap. 6), while hard constraints may originate infeasible solutions that need to
be treated by the optimization procedure. To deal with infeasible solutions, several
methods can be adopted (Michalewicz et al. 2006): death-penalty, penalty-weights,
repair, and approaches that only generate feasible solutions.
Death-penalty is a simple method, which involves assigning a very large penalty
value, such that infeasible solutions are quickly discarded by the search (see
Sect. 4.4 for an example). However, this method is not very efficient, since it
often wastes search effort on discarding solutions rather than finding the optimum
value. Penalty-weights is a better solution and is also easy to implement. Quite
often, the evaluation function can be set within the form f(s) = Objective(s) -
Penalty(s) (Rocha et al. 2001). For instance, for a given business, a possible
evaluation function could be f(s) = w_1 × Profit(s) - w_2 × Cost(s).
The main problem with penalty-weights is that it is often difficult to find the ideal
weights, in particular when several constraints are involved. The repair approach
transforms an infeasible solution into a feasible one. Often, this is achieved by using
domain dependent information (such as shown in Sect. 5.2) or by applying a local
search (e.g., looking for a feasible solution in the solution space neighborhood,
see Sect. 5.7). Finally, the approaches that only generate feasible solutions are
based on decoders and special operators. Decoders work only in a feasible search
space, by adopting an indirect representation, while special operators use domain
knowledge to create new solutions from previous ones.
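As a rough R sketch of the first two methods, consider a hypothetical business task of maximizing profit under a budget limit on total cost (all functions and values below are illustrative assumptions):

profit <- function(x) sum(x * c(5, 4, 3)) # hypothetical per-unit profits
cost <- function(x) sum(x * c(2, 3, 1))   # hypothetical per-unit costs
budget <- 100                             # hard constraint: cost(x) <= budget
# penalty-weights: subtract a weighted constraint violation from the objective
fpenalty <- function(x) profit(x) - 10 * max(0, cost(x) - budget)
# death-penalty: assign a very large penalty (here -Inf) to infeasible solutions
fdeath <- function(x) if (cost(x) > budget) -Inf else profit(x)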

1.6 Optimization Methods

There are different dimensions that can be used to classify optimization methods.
Three factors of analysis are adopted in this book: the type of search (blind or
guided); whether the search is deterministic or stochastic; and whether the method
is inspired by physical or biological processes.
The type of search can be blind or guided. The former assumes the exhaustion
of all alternatives for finding the optimum solution, while the latter uses previous
searches to guide the current search. Modern methods use a guided search, which often
is subdivided into two main categories: local, which searches within the
neighborhood of an initial solution, and global search, which uses a population of solutions.
In most practical problems, with high-dimensional search spaces, pure blind search
is not feasible, requiring too much computational effort. Local (or single-state)
search presents in general a much faster convergence, when compared with global
search methods. However, if the evaluation landscape is too noisy or complex, with
several local minima (e.g., right of Fig. 1.1), local methods can easily get stuck.
In such cases, multiple runs, with different initial solutions, can be executed to
improve convergence. Although population based algorithms tend to require more
computation than local methods, they perform a simultaneous search in distinct
regions of the search space, thus working better as global optimization methods.
The distinct search types can even be combined. For instance, a two-phase search
can be set, where a global method is employed at a first step, to quickly identify
interesting search space regions, and then, as the second step, the best solutions are
improved by employing a local search method. Another alternative is to perform
a tight integration of both approaches, such as under a Lamarckian evolution or
Baldwin effect (Rocha et al. 2000). In both cases, within each cycle of the population
based method, each new solution is used as the initial point of a local search and the
evaluation function is computed over the improved solution. Lamarckian evolution
replaces the population original solution by its improved value (Sect. 7.2 presents
a Lamarckian evolution example), while the Baldwinian strategy keeps the original
point (as set by the population based method).
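A rough R sketch of one cycle of this integration is shown below, where localsearch and evaluate are hypothetical placeholder functions and the population pop is a matrix with one solution per row:

lamarck <- function(pop, evaluate, localsearch) {
  improved <- t(apply(pop, 1, localsearch)) # locally improve each solution (row)
  fitness <- apply(improved, 1, evaluate)   # evaluate the improved solutions
  list(pop = improved, fitness = fitness)   # Lamarckian: keep improved solutions
}
# The Baldwinian variant would instead return the original population with the
# fitness of the improved solutions: list(pop = pop, fitness = fitness)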
Several modern methods employ some degree of randomness, thus belonging
to the family of stochastic optimization methods, such as simulated annealing
and genetic algorithms (Luke 2012). Also, several of these methods are naturally
inspired (e.g., genetic algorithms, particle swarm optimization) (Holland 1975;
Eberhart et al. 2001). Figure 1.2 shows the full taxonomy of the optimization
methods presented in this book (with respective R packages).
The distinct modern optimization methods share some common features.
Algorithm 1 shows (in pseudo-code) a generic modern optimization method that
is applicable to all methods discussed in this book. This global algorithm receives
two inputs, the evaluation function (f) and a set of control parameters (C), which
includes not only the method's internal parameters (e.g., initial temperature,
population size) but also parameters related to the representation of the solution
(e.g., lower and upper bounds, representation type, and length). In all modern
optimization methods, there is an initial setup followed by a main loop cycle that
ends once a given termination criterion is met.

[Fig. 1.2: Classification of the optimization methods presented in this book (related R packages in brackets). Blind Search: pure blind search and grid search (deterministic), Monte Carlo search (stochastic). Modern Optimization, Single-State Search: hill climbing, tabu search (tabuSearch), simulated annealing (optim). Modern Optimization, Population Based Search: genetic/evolutionary algorithm (genalg, mco), genetic programming (rgp), differential evolution (DEoptim), particle swarm optimization (pso), estimation of distribution algorithm (copulaedas); the population based methods are naturally inspired.]

Algorithm 1 Generic modern optimization method

 1: Inputs: f, C                  ▷ f is the evaluation function, C includes control parameters
 2: S ← initialization(C)         ▷ S is a solution or population
 3: i ← 0                         ▷ i is the number of iterations of the method
 4: while not termination_criteria(S, f, C, i) do
 5:   S' ← change(S, f, C, i)     ▷ new solution or population
 6:   B ← best(S, S', f, C, i)    ▷ store the best solution
 7:   S ← select(S, S', f, C, i)  ▷ solution or population for next iteration
 8:   i ← i + 1
 9: end while
10: Output: B                     ▷ the best solution

Distinct criteria can be adopted (or even combined):

• maximum computational measures, such as the number of iterations, evaluation
function calculations, or time elapsed;
• target measures, such as stopping if the best value is higher than or equal to a
given threshold;
• convergence measures, such as the number of iterations without any improvement
in the solution or the average enhancement achieved in the last iterations; and
• distribution measures, such as measuring how spread out the last tested solutions
are in the search space and stopping if the dispersion is smaller than a threshold.
What distinguishes the methods relates to two main aspects. First, whether in each
iteration there is a single-state (local based) solution or a population of solutions.
Second, the way new solutions are created (function change) and used to guide
the search (function select). In the generic pseudo-code, the number of iterations (i)
is also included as an input of the change, best, and select functions because it is
assumed that the behavior of these functions can be dynamic, changing as the search
method evolves.
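To make this skeleton concrete, the R sketch below instantiates Algorithm 1 as a simple stochastic hill climber (an illustrative assumption; the methods covered in this book follow the same skeleton but with different change and select functions):

modernopt <- function(f, C) {
  S <- runif(C$D, C$lower, C$upper)        # initialization(C)
  B <- S                                   # best solution found so far
  for (i in 1:C$maxit) {                   # termination: maximum of iterations
    S1 <- S + rnorm(C$D, sd = C$step)      # change(): random perturbation
    S1 <- pmin(pmax(S1, C$lower), C$upper) # keep the new solution within bounds
    if (f(S1) < f(B)) B <- S1              # best(): store the best (minimization)
    if (f(S1) < f(S)) S <- S1              # select(): greedy selection
  }
  B                                        # output: the best solution
}
# usage: minimize the sphere benchmark of Sect. 1.7 with D = 2
sphere <- function(x) sum(x^2)
C <- list(D = 2, lower = -5, upper = 5, step = 0.5, maxit = 1000)
modernopt(sphere, C) # returns a point close to c(0, 0)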

1.7 Demonstrative Problems

This section includes examples of simple optimization tasks that were selected
mainly from a tutorial perspective, where the aim is to easily show the capabilities
of the optimization methods. The selected demonstrative problems include two binary,
one integer, and two real value tasks. More optimization tasks are presented and explored
in Chaps. 6 (multi-objective tasks) and 7 (real-world tasks).
The binary problems are termed here sum of bits and max sin. The former, also
known as max ones, is a simple binary maximization “toy” problem, defined as
(Luke 2012):

$$f_{\text{sum of bits}}(\mathbf{x}) = \sum_{i=1}^{D} x_i \qquad (1.1)$$
where $\mathbf{x} = (x_1, \ldots, x_D)$ is a boolean vector ($x_i \in \{0, 1\}$) with a dimension
(or length) of D. The latter problem is another simple binary task, where the goal is
to maximize (Eberhart and Shi 2011):

$$f_{\text{max sin}}(\mathbf{x}) = \sin\left(\pi \frac{x'}{2^D}\right), \quad x' = \sum_{i=1}^{D} x_i \, 2^{D-i} \qquad (1.2)$$

where $x'$ is the integer representation of $\mathbf{x}$.


A visualization of both binary problems is given at the top of Fig. 1.3, assuming
a dimension of D = 8. In the top left graph, the x-axis denotes x', the integer
representation of x for sum of bits. In the example, the optimum solution for
sum of bits is x = (1, 1, 1, 1, 1, 1, 1, 1) (f(x) = 8), while the best solution for max
sin is x = (1, 0, 0, 0, 0, 0, 0, 0), x' = 128 (f(x) = 1).
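A possible R implementation of both binary evaluation functions is sketched below (assuming x is a 0/1 vector whose first element is the most significant bit, as in the example above):

sumbin <- function(x) sum(x) # Eq. (1.1)
maxsin <- function(x) {      # Eq. (1.2)
  D <- length(x)
  x1 <- sum(x * 2^(D - 1:D)) # x': binary to integer conversion
  sin(pi * x1 / 2^D)
}
sumbin(rep(1, 8))                 # optimum: returns 8
maxsin(c(1, 0, 0, 0, 0, 0, 0, 0)) # optimum: x' = 128, returns 1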
The bag prices task is an integer optimization problem (proposed in this book)
that mimics the decision of setting the selling prices of items produced in a bag
factory. The factory produces up to five (D = 5) different bags, with unit costs of
u = ($30, $25, $20, $15, $10), where u_i is the cost for manufacturing product i.
The production cost is cost(x_i) = 100 + u_i × sales(x_i) for the i-th bag
type. The number of expected sales, which is what the factory will produce, depends
on the product selling price (x_i) and marketing effort (m_i), according to
the formula sales(x_i) = round((1000 / ln(x_i + 200) - 141) × m_i), where round
is the ordinary rounding function and m = (2.0, 1.75, 1.5, 1.25, 1.0). The manager
of the factory needs to decide the selling price of each bag (x_i, in $), within the
range $1 to $1,000, in order to maximize the expected profit of the next
production cycle:

$$f_{\text{bag prices}}(\mathbf{x}) = \sum_{i=1}^{D} \left( x_i \times \text{sales}(x_i) - \text{cost}(x_i) \right) \qquad (1.3)$$

The middle left graph of Fig. 1.3 plots the full search space for the first item of
bag prices (D = 1), while the middle right plot shows a zoom near the optimum
solution. As shown by the graphs, the profit function follows in general a globally
convex shape. However, close to the optimum point (x_1 = 414, f(x_1) = 11420)
there are several local minima, under a "saw" shape that is more difficult to optimize.
As shown in Chap. 3, the optimum solution for five different bags (D = 5) is
x = (414, 404, 408, 413, 395), which corresponds to an estimated profit of
$43,899.
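The bag prices evaluation function can be sketched in R directly from the definitions above:

u <- c(30, 25, 20, 15, 10)        # unit manufacturing costs ($)
m <- c(2.0, 1.75, 1.5, 1.25, 1.0) # marketing effort per bag type
sales <- function(x) round((1000 / log(x + 200) - 141) * m[seq_along(x)])
cost <- function(x) 100 + u[seq_along(x)] * sales(x)
profit <- function(x) sum(x * sales(x) - cost(x)) # Eq. (1.3)
profit(414)                        # D = 1 optimum: returns 11420
profit(c(414, 404, 408, 413, 395)) # D = 5 optimum: returns 43899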
Turning to the real value tasks, two popular benchmarks are adopted, namely
sphere and rastrigin (Tang et al. 2009), which are defined by:

$$f_{\text{sphere}}(\mathbf{x}) = \sum_{i=1}^{D} x_i^2, \qquad f_{\text{rastrigin}}(\mathbf{x}) = \sum_{i=1}^{D} \left( x_i^2 - 10\cos(2\pi x_i) + 10 \right) \qquad (1.4)$$

where $\mathbf{x} = (x_1, \ldots, x_D)$ is a real value vector ($x_i \in \mathbb{R}$). For both tasks, the
goal is to find the minimum value, which is the origin point (e.g., if D = 4 and
x = (0, 0, 0, 0), then f(x) = 0). The sphere task is simpler and mainly used for
demonstration purposes, while rastrigin is a much more difficult multi-modal
problem, given that the number of local minima grows exponentially with the
increase of dimensionality (D). The differences between sphere and rastrigin are
clearly shown in the two graphs at the bottom of Fig. 1.3.

[Fig. 1.3: Example of the binary (sum of bits, top left; max sin, top right), integer (bag prices, middle) and real value (sphere, bottom left; rastrigin, bottom right) task landscapes.]
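Both benchmarks are direct to implement in R, as a short sketch of Eq. (1.4):

sphere <- function(x) sum(x^2)
rastrigin <- function(x) sum(x^2 - 10 * cos(2 * pi * x) + 10)
sphere(rep(0, 4))    # global minimum: returns 0
rastrigin(rep(0, 4)) # global minimum: returns 0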
