
Variable Neighbourhood Search

2003, Inteligencia Artificial, Revista Iberoamericana de Inteligencia Artificial


Chapter 5

VARIABLE NEIGHBOURHOOD SEARCH
VNS for Training Neural Networks

José Andrés Moreno Pérez, Pierre Hansen, Nenad Mladenovic, Belén Melián Batista, Ignacio J. García del Amo
Universidad de La Laguna

Abstract: The basic idea of VNS is the change of neighbourhoods in the search for a better solution. VNS proceeds by a descent method to a local minimum, exploring then, systematically or at random, increasingly distant neighbourhoods of this solution. Each time, one or several points within the current neighbourhood are used as initial solutions for a local descent. The method jumps from the current solution to a new one if and only if a better solution has been found. Therefore, VNS is not a trajectory following method (as Simulated Annealing or Tabu Search) and does not specify forbidden moves.

1. INTRODUCTION

Artificial neural networks make it possible to approximate non-linear mappings from several input variables to several output variables. To do so, the structure of the network has to be fixed and a set of parameters known as weights has to be tuned. If the outputs are continuous variables the task is a prediction or approximation problem, while in classification the output is a single categorical variable. Most of the key issues in the net functionality are common to both. The main goal in the fitting process is to obtain a model which makes good predictions for new inputs (i.e. which provides good generalization).

Once the structure of the network is given, the problem is to find the values of the weights w that optimize the performance of the network in the classification or prediction task. In the supervised learning approach, given a training data set, the network is trained for the classification or prediction task by tuning the values of the weights in order to minimize the error across the training set. The training set T consists of a series of input patterns and the corresponding outputs. If the function f to be approximated or predicted has an input vector of variables x = (x1, x2, …, xn) and its output is represented by f(x), the error of the prediction is the difference between the output p(w,x) provided by the network for the inputs x using the weights w and the real value f(x).
The total error is usually measured by the root mean squared difference between the predicted output p(w,x) and the actual output value f(x) over all the elements x in T (RMSE, Root Mean Squared Error):

RMSE(T, w) = √( (1/|T|) ∑x∈T ( f(x) − p(w,x) )² )

Therefore, the task of training the net, which consists in tuning the weights, is interpreted as the non-linear optimization problem of minimizing the RMSE on the training set through an optimal set of values w* for the weights; that is, to solve the problem

RMSE(T, w*) = min { RMSE(T, w) : w ≥ 0 }.

To this problem one can apply specific or general optimization techniques. However, the main goal in the design of an artificial neural network is to obtain a design which makes the best predictions for future inputs (i.e., which achieves the best possible generalization). Therefore the design must allow the representation of the systematic aspects of the data rather than their specific details.

To evaluate the generalization provided by the network, the standard way consists of introducing another set V of input/output pairs in order to perform the validation. Once the training has been performed and the weights w* have been chosen, the performance of the design is given by the root mean squared error across the validation set V, i.e., the validation error, computed by:

RMSE(V; T, w*) = √( (1/|V|) ∑y∈V ( f(y) − p(w*, y) )² )

The net must exhibit a good fit between the target values and the outputs (predictions) both on the training set and on the validation set. If the RMSE on the validation set is significantly higher than that on T, we say that the net has memorized the data instead of learning them (i.e., the net has over-fitted the training data). In order to avoid over-fitting by stopping the training before the network starts to memorize the data instead of learning the general characteristics of the instances, it is useful to use a third disjoint set of instances. Then, from time to time (i.e., when the number of iterations reaches certain values), the error on this set is computed; when this value increases instead of decreasing, the training stops.

If the model has too few parameters it is difficult to fit the network to the data; if it has too many parameters and the structure is general enough, it will over-fit the training data, minimizing the training errors at the expense of the validation errors. However, there is no well established criterion to determine the appropriate number of parameters. Moreover, there is no consensus on what good network architectures are.
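To make the error measures above concrete, the following is a minimal C sketch of the RMSE computation over a set of patterns. The names (Pattern, Predictor, rmse) are illustrative only and not part of the chapter; the network output p(w,x) is assumed to be supplied as a function pointer.

    #include <math.h>
    #include <stddef.h>

    /* One training or validation pattern: inputs x[0..n-1] and the target value f(x). */
    typedef struct {
        const double *x;    /* input vector        */
        double        fx;   /* desired output f(x) */
    } Pattern;

    /* Network output p(w,x) for a given weight vector w; supplied by the user. */
    typedef double (*Predictor)(const double *w, const double *x);

    /* RMSE(S, w) = sqrt( (1/|S|) * sum over x in S of (f(x) - p(w,x))^2 ) */
    double rmse(const Pattern *set, size_t size, const double *w, Predictor p)
    {
        double sum = 0.0;
        for (size_t i = 0; i < size; i++) {
            double err = set[i].fx - p(w, set[i].x);
            sum += err * err;
        }
        return sqrt(sum / (double)size);
    }

Training then amounts to searching for the weight vector that minimizes this function on the training patterns, while the same function evaluated on the validation patterns monitors generalization.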
The usual neural networks used for the approximation or prediction of a function consist of a series of input nodes or neurons (one for each variable of the function) and one output neuron (or several if the function is multi-dimensional), interconnected through a variable set of hidden neurons. A kind of structure that has been widely used is the multilayer architecture, where the neurons are organized in a set of layers with interconnections only between neurons of different layers. The multilayer neural networks for prediction have, in addition to the input layer and the output layer, a series of layers with a finite set of neurons from the input to the output, where there is a link from every neuron to each neuron of the next layer.

The simplest model consists of a network with only one hidden layer with h neurons; therefore the set of neurons N of the network consists of n input neurons (n is the number of variables of the function to be approximated) in the input layer NI, h neurons in the hidden layer NH, and a single neuron in the output layer NO. Moreover, all the connections go from the neurons of the input layer to the neurons of the hidden layer, and from the neurons of the hidden layer to the neuron of the output layer. Blum and Li (1991) proved that a neural network having two layers and sigmoid hidden units can approximate any continuous mapping arbitrarily well. As a consequence, regarding the classification problem, two-layer networks with sigmoid units can approximate any decision boundary to arbitrary accuracy. However, Gallant and White (1992) showed that, from a practical point of view, the number of hidden units must grow as the size of the data set to be approximated (or classified) grows.

Within a multilayer neural network, the neurons can be enumerated consecutively through the layers from the first to the last. So we consider a network with a hidden layer to predict a real valued function with n variables consisting of a set of input neurons NI = { 1, 2, …, n }, a set of hidden neurons NH = { n+1, n+2, …, n+h } and the output neuron n+h+1. The links are: L = { (i, n+j): i = 1, …, n, j = 1, …, h } ∪ { (n+j, n+h+1): j = 1, …, h }. This network is shown in the figure.

Given an input pattern x = (x1, x2, …, xn) for the input neurons of the network, each hidden neuron receives an input from each input neuron to which it is connected and sends its output to the output neuron. In the usual models, the input of each neuron of the hidden and output layers is a linear combination of the weights of the links and the outputs of the previous layer. So, the input for the j-th neuron of the hidden layer is

xj = wj + ∑i=1..n xi wij ;   j = n+1, n+2, …, n+h,

where wj is the weight associated with the bias of the previous layer. Each neuron of the hidden layer transforms its input into an output by the expression yj = g(xj), g being the activation function; the sigmoid function g(x) = 1/(1+exp(−x)) is one of the most used. However, in prediction it is usual to consider a linear activation function for the output layer.
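The forward computation just described can be sketched in C as follows. The storage convention for the weights (w_in, b_hid, w_out, b_out) and the function name predict are assumptions made for this example; they are not prescribed by the chapter.

    #include <math.h>

    /* Sigmoid activation used by the hidden units. */
    static double sigmoid(double x) { return 1.0 / (1.0 + exp(-x)); }

    /* Output p(w,x) of a network with n inputs, h sigmoid hidden units and one
     * linear output unit.  Assumed layout:
     *   w_in[j*n + i]  weight of link (i, n+j), i = 0..n-1, j = 0..h-1
     *   b_hid[j]       bias weight of hidden neuron n+j
     *   w_out[j]       weight of link (n+j, n+h+1)
     *   b_out          bias of the output neuron
     */
    double predict(int n, int h, const double *w_in, const double *b_hid,
                   const double *w_out, double b_out, const double *x)
    {
        double out = b_out;
        for (int j = 0; j < h; j++) {
            double net = b_hid[j];              /* xj = wj + sum_i xi*wij        */
            for (int i = 0; i < n; i++)
                net += x[i] * w_in[j * n + i];
            out += w_out[j] * sigmoid(net);     /* yj = g(xj); linear output unit */
        }
        return out;
    }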
2. THE VNS METHODOLOGY

An optimization problem consists in finding the minimum or maximum of a real valued function f defined on an arbitrary set X. If it is a minimization problem, it can be formulated as follows:

min { f(x) : x ∈ X }     (1)

In this notation, X denotes the solution space, x represents a feasible solution and f the objective function of the problem. It is a combinatorial optimization problem if the solution space is discrete or partially discrete. An optimal solution x* (or a global minimum) of the problem is a feasible solution for which the minimum of (1) is reached. Therefore, x* ∈ X satisfies f(x*) ≤ f(x), ∀ x ∈ X.

A neighbourhood structure in X is defined by a function N: X → 2^X where, ∀ x ∈ X, N(x) ⊆ X is the set of neighbours of x. Then, a local minimum x' of problem (1) with respect to (w.r.t. for short) the neighbourhood structure N is a feasible solution x' ∈ X that satisfies the following property: f(x') ≤ f(x), ∀ x ∈ N(x'). Therefore any local or neighbourhood search method (i.e., a method that only moves to a better neighbour of the current solution) is trapped when it reaches a local minimum.

Several metaheuristics, or frameworks for building heuristics, extend this scheme to avoid being trapped in a local optimum. The best known of them are Genetic Search, Simulated Annealing and Tabu Search (for a discussion of these and other metaheuristics, the reader is referred to the books of surveys edited by Reeves (1993) and Glover and Kochenberger (2003)). Variable Neighbourhood Search (VNS) (Mladenović (1995), Mladenović and Hansen (1997), Hansen and Mladenović (1999), Hansen and Mladenović (2000), Hansen and Mladenović (2001), Hansen and Mladenović (2003)) is a recent metaheuristic that systematically exploits the idea of neighbourhood change, both in the descent to local minima and in the escape from the valleys which contain them. Hence, VNS proceeds by a descent method to a local minimum, exploring then a series of different predefined neighbourhoods of this solution. Each time, one or several points of the current neighbourhood are used as starting points to run a local descent method, which stops at a local minimum. The search jumps to the new local minimum if and only if it is better than the incumbent. In this sense, VNS is not a trajectory following method (one that allows non-improving moves within the same neighbourhood) such as Simulated Annealing or Tabu Search.

Unlike many other metaheuristics, the basic schemes of VNS and its extensions are simple and require few, and sometimes no, parameters. Therefore, in addition to providing very good solutions, often in simpler ways than other methods, VNS gives insight into the reasons for such a performance, which in turn can lead to more efficient and sophisticated implementations. Despite its simplicity it proves to be effective. VNS systematically exploits the following observations:

1. A local minimum with respect to one neighbourhood structure is not necessarily so for another.
2. A global minimum is a local minimum with respect to all possible neighbourhood structures.
3. For many problems, local minima with respect to one or several neighbourhoods are relatively close to each other.

The last observation, which is empirical, implies that a local optimum often provides some information about the global one. There may, for instance, be several variables with the same value in both. However, it is usually not known which ones they are. An organized study of the neighbourhoods of this local optimum is therefore performed until a better solution is found.

Variable Neighbourhood Descent (VND) is a deterministic version of VNS. It is based on observation 1 above, i.e., a local optimum for a first type of move x → x' (or heuristic, or within the neighbourhood N1(x)) is not necessarily one for another type of move x → x̃ (within neighbourhood N2(x)). It may thus be advantageous to combine descent heuristics. This leads to the basic VND scheme presented in figure \ref{figVND}; a C-style sketch of the same scheme is given after the figure.

VND method
1. Find an initial solution x.
2. Repeat the following sequence until no improvement is obtained:
   (i) Set l ← 1;
   (ii) Repeat the following steps until l = lmax:
       (a) Find the best neighbour x' of x (x' ∈ Nl(x));
       (b) If the solution x' thus obtained is better than x, set x ← x' and l ← 1; otherwise, set l ← l + 1.

Figure 1. Variable Neighbourhood Descent Method \label{figVND}
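The following is a minimal C sketch of the VND scheme, with the problem-specific parts (the objective f and a routine returning the best neighbour in Nl) left as user-supplied callbacks; the names, types and calling convention are illustrative assumptions, not part of the chapter.

    #include <stddef.h>

    /* User-supplied hooks (hypothetical signatures):
     *   f(x, dim)                    objective value of solution x
     *   best_neighbor(x, x1, dim, l) writes into x1 the best solution of N_l(x)
     *                                and returns its objective value            */
    typedef double (*Objective)(const double *x, size_t dim);
    typedef double (*BestNeighbor)(const double *x, double *x1, size_t dim, int l);

    /* Variable Neighbourhood Descent over neighbourhoods N_1..N_lmax. */
    void vnd(double *x, double *x1, size_t dim, int l_max,
             Objective f, BestNeighbor best_neighbor)
    {
        double fx = f(x, dim);
        int improved = 1;
        while (improved) {                                  /* step 2: until no improvement */
            improved = 0;
            int l = 1;
            while (l <= l_max) {                            /* step 2(ii)                   */
                double fx1 = best_neighbor(x, x1, dim, l);  /* (a) best x' in N_l(x)        */
                if (fx1 < fx) {                             /* (b) better: recentre, l <- 1 */
                    for (size_t i = 0; i < dim; i++)
                        x[i] = x1[i];
                    fx = fx1;
                    l = 1;
                    improved = 1;
                } else {
                    l++;
                }
            }
        }
    }

The same skeleton reappears below inside the general VNS scheme, where it plays the role of the local search.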
Another simple application of the VNS principles appears in the Reduced VNS. It is a pure stochastic search method: solutions from the pre-selected neighbourhoods are chosen at random. Its efficiency is mostly based on observation 3 above.

A set of neighbourhoods N1(x), N2(x), …, Nkmax(x) around the current point x (which may or may not be a local optimum) is considered. Usually, these neighbourhoods are nested, i.e., each one contains the previous. Then a point x' is chosen at random in the first neighbourhood. If its value is better than that of the incumbent (i.e., f(x') < f(x)), the search is recentred there (x ← x'). Otherwise, one proceeds to the next neighbourhood. After all neighbourhoods have been considered, the search begins again with the first one, until a stopping condition is satisfied (usually a maximum computing time since the last improvement, or a maximum number of iterations). The steps of Reduced VNS are shown in figure \ref{figRVNS}.

RVNS method
1. Find an initial solution x; choose a stopping condition.
2. Repeat the following sequence until the stopping condition is met:
   (i) Set k ← 1;
   (ii) Repeat the following steps until k = kmax:
       (a) Shake. Take at random a solution x' from Nk(x);
       (b) If the solution x' is better than the incumbent, move there (x ← x') and continue the search with N1 (k ← 1); otherwise, set k ← k + 1.

Figure 2. RVNS

In the two previous methods, we examined how to use variable neighbourhoods in the descent to a local optimum and in finding promising regions for near-optimal solutions. Merging the tools for both tasks leads to the General Variable Neighbourhood Search scheme. We first discuss how to combine a local search with systematic changes of neighbourhoods around the local optimum found. We then obtain the Basic VNS scheme of figure \ref{figBVNS}.

BVNS method
1. Find an initial solution x; choose a stopping condition.
2. Repeat the following sequence until the stopping condition is met:
   (i) Set k ← 1;
   (ii) Repeat the following steps until k = kmax:
       (a) Shaking. Generate a point x' at random from the k-th neighbourhood of x (x' ∈ Nk(x));
       (b) Local search. Apply some local search method with x' as the initial solution; denote by x'' the local optimum so obtained;
       (c) Move or not. If the local optimum x'' is better than the incumbent x, move there (x ← x'') and continue the search with N1 (k ← 1); otherwise, set k ← k + 1.

Figure 3. BVNS

The simplest basic VNS, where the neighbourhood for shaking is fixed, is called Fixed Neighbourhood Search (FNS) (see Brimberg et al. (2000)), and is sometimes called Iterated Local Search (see Lourenco et al. (2003)). The method selects, by a perturbation, a neighbour of the current solution, runs a local search from it to reach a local optimum, and moves to it if there has been an improvement. Therefore, the definition of different neighbourhood structures is not necessary; one can fix a single one of them (i.e., fix k) and jump (or 'kick the function') in the shaking (or perturbation) step to a point of that fixed neighbourhood. For example, in Johnson and McGeoch (1997) a new solution is always obtained from the 4-opt (double-bridge) neighbourhood in solving the TSP; thus, k is fixed to 4 (a sketch of this perturbation is given after the figure). The steps of this simplest VNS, obtained by taking only one neighbourhood, are shown in figure \ref{figSimpleVNS}.

FNS method
1. Initialization. Find an initial solution x; set x* ← x.
2. Iterations. Repeat the following sequence until a stopping condition is met:
   (a) Shake. Take at random a neighbour x' of x (x' ∈ Nk(x));
   (b) Local search. Apply the local search method with x' as the initial solution; denote by x'' the local optimum so obtained;
   (c) Move or not. If x'' is better than x*, set x* ← x''.

Figure 4. FNS
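For concreteness, here is a minimal C sketch of the double-bridge (4-opt) kick mentioned above, applied to a TSP tour stored as an array of city indices. The cut points and the function name are illustrative only.

    #include <string.h>

    /* Double-bridge move: cut the tour into four segments A|B|C|D at positions
     * 0 < p1 < p2 < p3 < n and reconnect them as A C B D.  This is the fixed
     * 4-opt neighbourhood commonly used for shaking in Iterated Local Search. */
    void double_bridge(const int *tour, int *out, int n, int p1, int p2, int p3)
    {
        int pos = 0;
        memcpy(out + pos, tour,      p1        * sizeof(int)); pos += p1;        /* A */
        memcpy(out + pos, tour + p2, (p3 - p2) * sizeof(int)); pos += p3 - p2;   /* C */
        memcpy(out + pos, tour + p1, (p2 - p1) * sizeof(int)); pos += p2 - p1;   /* B */
        memcpy(out + pos, tour + p3, (n  - p3) * sizeof(int));                   /* D */
    }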
If one uses Variable Neighbourhood Descent instead of the simple local search, and if one improves the initial solution by Reduced VNS, one obtains the General Variable Neighbourhood Search scheme (GVNS) shown in figure \ref{figGVNS}.

GVNS method
1. Initialization. Select the set of neighbourhood structures Nk, k = 1, …, kmax, that will be used in the shaking phase, and the set of neighbourhood structures Nl, l = 1, …, lmax, that will be used in the local search; find an initial solution x and improve it by using RVNS; choose a stopping condition.
2. Iterations. Repeat the following sequence until the stopping condition is met:
   (i) Set k ← 1;
   (ii) Repeat the following steps until k = kmax:
       (a) Shaking. Generate at random a point x' in the k-th neighbourhood of x (x' ∈ Nk(x));
       (b) Local search by VND. Set l ← 1 and repeat the following steps until l = lmax:
           1. Find the best neighbour x'' of x' in Nl(x');
           2. If f(x'') < f(x'), set x' ← x'' and l ← 1; otherwise set l ← l + 1;
       (c) Move or not. If this local optimum is better than the incumbent, move there (x ← x'') and continue the search with N1 (k ← 1); otherwise, set k ← k + 1.

Figure 5. GVNS

A C code for the simple version of the Variable Neighbourhood Search is shown in figure \ref{figVNScode}.

SVNS code
    initialize(best_sol);
    k = 0;
    while (k < k_max) {
        k++;
        cur_sol = shake(best_sol, k);
        local_search(cur_sol, best_sol);
        if (improved(cur_sol, best_sol)) {
            best_sol = cur_sol;
            k = 0;
        } /* if */
    } /* while */

Figure 6. Simple VNS (SVNS) code

This VNS code can be applied to any problem if the user provides the initialization procedure initialize, the shaking procedure shake, the local search local_search and the function improved that tests whether the solution has improved.
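One possible set of C declarations for these user-supplied hooks is sketched below, assuming for illustration that a solution is a small fixed-size vector of doubles so that plain assignment (best_sol = cur_sol) copies it. All names, types and the calling convention are assumptions made here; the generic fragment above is schematic and would be adapted to whatever representation is chosen.

    #define DIM 64                                   /* example dimension of a solution   */
    typedef struct { double v[DIM]; } Solution;

    void     initialize(Solution *best_sol);         /* build an initial solution         */
    Solution shake(Solution best_sol, int k);        /* random point of N_k(best_sol)     */
    Solution local_search(Solution start,
                          Solution best_sol);        /* local descent started from start  */
    int      improved(Solution cur_sol,
                      Solution best_sol);            /* nonzero if cur_sol beats best_sol */

With these by-value signatures the loop above would read, for instance, cur_sol = local_search(shake(best_sol, k), best_sol).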
2.1 Parallelization of the VNS

The application of parallelism to a metaheuristic may make it possible to reduce the computational time (by partitioning the sequential program) or to increase the exploration of the search space (by running independent search threads). In order to carry out this task, we need to identify the parts of the code, of an appropriate size, that can be partitioned so as to be solved simultaneously.

Several strategies for parallelizing a VNS algorithm have been proposed and analyzed in the literature. Four different parallelization strategies have been reported (see Hansen et al. (2004) and Hansen et al. (2005)); two are simple parallelizations and the other two are more complex strategies. The two simple parallelizations of the VNS consist of parallelizing the local search and replicating the whole VNS in the processors, respectively. The two additional strategies are proposed in García et al. (2002) and in Crainic et al. (2004).

The first of the two simple parallelization strategies (analyzed in García et al. (2002)) attempts to reduce computation time by parallelizing the local search in the sequential VNS and is denoted SPVNS (Synchronous Parallel VNS). The second one implements an independent search strategy that runs an independent VNS procedure on each processor and is denoted RPVNS (Replicated Parallel VNS). The two additional parallelization strategies use cooperation mechanisms to improve the performance. The Replicated-Shaking VNS (RSVNS) parallelization proposed in García et al. (2002) applies a synchronous cooperation mechanism through a classical master-slave approach. The Cooperative Neighbourhood VNS (CNVNS) parallelization proposed in Crainic et al. (2004) applies a cooperative multi-search method based on a central-memory mechanism.

In the Replicated-Shaking VNS (RSVNS), the master processor runs a sequential VNS, but the current solution is sent to each slave processor, which shakes it to obtain an initial solution from which the local search is started. The solutions obtained by the slaves are passed to the master, which selects the best and continues the algorithm. The independence between the local searches in the VNS allows their execution on independent processors while the information about the joint best solution found is updated. This information must be available to all the processors in order to improve the intensification of the search.

The Cooperative Neighbourhood VNS (CNVNS) proposed by Crainic et al. (2004) is obtained by applying the cooperative multi-search method to the VNS metaheuristic. This parallelization method is based on the central-memory mechanism that has been successfully applied to a number of different combinatorial problems. In this approach, several independent VNS's cooperate by asynchronously exchanging information about the best solution identified so far, thus conserving the simplicity of the original, sequential VNS ideas. The proposed asynchronous cooperative multi-search parallel VNS allows a broader exploration of the solution space by several VNS searches. The controlled random search nature of the shaking in the VNS, and its efficiency, would be altered significantly by a cooperation mechanism that implemented frequent solution exchanges. However, the CNVNS implements a cooperation mechanism that allows each individual process access to the current overall best solution without disturbing its normal proceedings. Individual VNS processes communicate exclusively with a central memory or master; there are no communications among individual VNS processes. The master keeps, updates, and communicates the current overall best solution; solution updates and communications are performed following messages from the individual VNS processes. The master initiates the algorithm by executing a parallel RVNS (without local search) and terminates the whole search by applying a stopping rule. Each processor implements the same VNS algorithm and proceeds with the VNS exploration for as long as it improves the solution. When the solution is not improved any more, it is communicated to the master if it is better than the one sent in the last communication, and the overall best solution is requested from the master. The search is then continued, starting from the best overall solution, in the current neighbourhood. This summarizes the CNVNS procedure.

The combination of Variable Neighbourhood Search and parallelism provides a useful tool to solve hard problems. The VNS, as a combination of series of random and local searches, is parallelizable in several ways. Two simple parallelization strategies are the Synchronous Parallel VNS (SPVNS), obtained by parallelizing the local search, and the Replicated Parallel VNS (RPVNS), obtained by replicating the whole procedure so that each processor runs a VNS in parallel. These parallelizations provide the basic advantages of parallel procedures.
However, using cooperation mechanisms, the performance is improved by the Replicated-Shaking VNS (RSVNS) proposed in García et al. (2002), which applies a synchronous cooperation mechanism through a classical master-slave approach, and further improved by the Cooperative Neighbourhood VNS (CNVNS) proposed in Crainic et al. (2004), which applies a cooperative multi-search method based on a central-memory mechanism.

3. APPLICATION OF THE VNS TO THE TRAINING PROBLEM

When considering the problem of training an artificial neural network from the metaheuristics point of view, it is useful to treat it as a global optimization problem. The error E is a function of the adaptive parameters of the net (i.e., weights and biases), which can all be arranged into a single W-dimensional weight vector w with components w1, …, wW. Then the problem consists in finding the weight vector with the lowest error. In this chapter a solution is referred to as a vector of weights, since there are no restrictions on the values a weight can take; therefore, every point of the W-dimensional space is a feasible solution. In this section, we explain how a VNS variant can be applied to solve this problem.

The key point of the VNS is that it searches over neighborhood structures in the solution space. A neighborhood of a solution s1 is simply a group of solutions that are related to s1 through a neighborhood function. Usually, this function is the classical Euclidean distance, so that a solution s2 belongs to the k-th neighborhood of a solution s1 if the Euclidean distance from s2 to s1 is less than or equal to k. From this example we can also deduce that this function generates nested neighborhoods, in the sense that if a solution s2 belongs to the k-th neighborhood of s1, it also belongs to the (k+1)-th neighborhood. This is a desirable property, because if we find that a certain solution s3 is a local minimum of the k-th neighborhood, it is also most likely a local minimum of all the previous neighborhoods 1, …, k−1.

If the solution space has no inner structure, or this structure is unknown to us, the Euclidean distance is a neighborhood function as suitable as any other, with the advantage that it is intuitive and generates nested neighborhoods. However, if the solution space shows some kind of inner structure and we have information about it, we can use it to generate a neighborhood function that is better fitted to the problem. In the artificial neural network training problem, it seems obvious that the solutions have a certain structure. In fact, we know the structure, since the elements of a solution (the weights) have to be arranged spatially according to the architecture of the net. Therefore the goal is to use this information to generate a good neighborhood function.

Our proposal is to define a neighborhood of a solution as all the solutions that share all their weights with the original solution except a finite number of them. More specifically, a solution s2 belongs to the k-th neighborhood of a solution s1 if they only differ in the values of the weights that arrive at the k-th neuron. This definition implies that a search through the k-th neighborhood of a solution will only allow changes in the weights that feed the k-th neuron, improving each neuron locally to best fit the global performance of the net (a sketch of a shaking step based on this definition is given below). This approach has one main drawback: it is strongly local (it improves each neuron one at a time), and thus, it does not generate nested neighborhoods.
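As an illustration of this neuron-wise neighborhood, here is a minimal C sketch of a shaking step that perturbs only the weights arriving at neuron k. The flat weight layout, the helper names and the uniform perturbation are assumptions made for the example, not part of the chapter.

    #include <stdlib.h>

    /* Uniform random number in [-a, a]. */
    static double unif(double a) { return a * (2.0 * rand() / (double)RAND_MAX - 1.0); }

    /* Shake within the k-th neighborhood: copy the current weight vector and
     * perturb only the weights (and bias) arriving at neuron k.  first[k] and
     * count[k] give, for each neuron, the offset and number of its incoming
     * weights inside the flat W-dimensional vector. */
    void shake_neuron(const double *w, double *w_new, int W,
                      const int *first, const int *count, int k, double radius)
    {
        for (int i = 0; i < W; i++)
            w_new[i] = w[i];                    /* all other weights unchanged       */
        for (int i = first[k]; i < first[k] + count[k]; i++)
            w_new[i] = w[i] + unif(radius);     /* perturb weights feeding neuron k  */
    }

The overlapping variant described next would simply extend the second loop to the weights arriving at neuron k+1 as well.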
An alternative proposal is to consider, for the k-th neighborhood, not only the weights that arrive at the k-th neuron, but also the weights that arrive at the (k+1)-th neuron. This approach still does not generate nested neighborhoods, but at least it generates overlapping neighborhoods, which make the method less local than the previous one.

After deciding the way in which we are going to define a neighborhood, the next step is to consider the stopping criterion of the search process. The usual criterion for stopping the training process of an artificial neural network can be used here: stop when the net starts over-fitting, using a test data set to decide this. The best solution found so far by the VNS is evaluated on this test set, and the error is logged. If the test error decreases, it means that the solution is good enough both for the training and test sets, and the search process can continue. But if the test error starts to increase, it may mean that the net is over-fitting, and the search should stop. This stopping criterion may require more iterations than there are neurons in the net, which can lead to inconsistencies in the way neighborhoods are selected (k can reach a value larger than the number of neurons in the net, producing an undefined neighborhood). To avoid this situation, we can slightly modify the neighborhood function to make it modular, selecting the neighborhood index modulo the number of neurons.

The VNS relies on the local search procedure to find a local minimum in a certain neighborhood. The great advantage of the VNS is that it greatly reduces the search space for the local search procedure (with this approach, from W to the number of weights arriving at a neuron, which is typically much smaller than W). Local search procedures may use no gradient information at all (as, for example, the Simplex method), first order derivative information (gradient descent, line search, conjugate gradients), or even second order derivatives (quasi-Newton methods, the Levenberg-Marquardt method, etc.). The key concept here is that gradient information is fast to calculate with the backpropagation technique, and that it is independent of the method used to search the weight space. A sketch tying these elements together into a VNS training loop is given below.
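For example, a compact C sketch of such a training loop could look as follows. It reuses the hypothetical Pattern, Predictor, rmse and shake_neuron declarations from the earlier sketches, takes the local search as a user-supplied routine, and checks the validation error only when the training error improves; every name and the concrete stopping test are illustrative assumptions, not the authors' exact procedure.

    /* User-supplied local descent (e.g. a few gradient steps via backpropagation). */
    void local_search(double *w, int W, const Pattern *train, size_t n_train, Predictor p);

    /* VNS training loop with neuron-wise shaking and validation-based stopping.
     * n_neurons is used to index neighbourhoods; the index is taken modulo
     * n_neurons, as discussed above. */
    void vns_train(double *best_w, double *cur_w, int W,
                   const int *first, const int *count, int n_neurons,
                   const Pattern *train, size_t n_train,
                   const Pattern *valid, size_t n_valid,
                   Predictor p, int max_iter, double radius)
    {
        double best_train = rmse(train, n_train, best_w, p);
        double best_valid = rmse(valid, n_valid, best_w, p);
        int k = 0;
        for (int it = 0; it < max_iter; it++) {
            int neuron = k % n_neurons;                    /* modular neighbourhood index */
            shake_neuron(best_w, cur_w, W, first, count, neuron, radius);
            local_search(cur_w, W, train, n_train, p);     /* descend from the shaken point */
            double t = rmse(train, n_train, cur_w, p);
            if (t < best_train) {                          /* move or not                  */
                double v = rmse(valid, n_valid, cur_w, p);
                if (v > best_valid)                        /* validation error grows:      */
                    break;                                 /* assume over-fitting and stop */
                for (int i = 0; i < W; i++) best_w[i] = cur_w[i];
                best_train = t;
                best_valid = v;
                k = 0;                                     /* restart the neighbourhoods   */
            } else {
                k++;                                       /* try the next neighbourhood   */
            }
        }
    }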
4. EXPERIMENTAL TESTS

5. CONCLUSIONS

6. REFERENCES

J.E. Beasley. A note on solving large p-median problems. European Journal of Operational Research 21 (1985) 270-273.
M.L. Brandeau and S.S. Chiu. An overview of representative problems in location research. Management Science 35 (1989) 645-674.
G. Cornuejols, M.L. Fisher and G.L. Nemhauser. Location of bank accounts to optimize float: an analytic study of exact and approximate algorithms. Management Science 23 (1977) 789-810.
T.G. Crainic, M. Gendreau, P. Hansen and N. Mladenović. Cooperative parallel variable neighbourhood search for the p-median. Journal of Heuristics 10 (2004) 293-314.
T.G. Crainic and M. Gendreau. Cooperative parallel tabu search for the capacitated network design. Journal of Heuristics 8 (2002) 601-627.
J.A. Díaz and E. Fernández. Scatter search and path relinking for the capacitated p-median problem. European Journal of Operational Research (2004), forthcoming.
Z. Drezner (ed.). Facility Location: A Survey of Applications and Methods. Springer, 1995.
F. García López, B. Melián Batista, J.A. Moreno Pérez and J.M. Moreno Vega. The parallel variable neighbourhood search for the p-median problem. Journal of Heuristics 8 (2002) 375-388.
F. Glover and G. Kochenberger (eds.). Handbook of Metaheuristics. Kluwer, 2003.
P. Hanjoul and D. Peeters. A comparison of two dual-based procedures for solving the p-median problem. European Journal of Operational Research 20 (1985) 387-396.
P. Hansen and N. Mladenović. Variable neighborhood search for the p-median. Location Science 5 (1997) 207-226.
P. Hansen and N. Mladenović. An introduction to variable neighborhood search. In S. Voss et al. (eds.), Metaheuristics: Advances and Trends in Local Search Paradigms for Optimization. Kluwer (1999) 433-458.
P. Hansen and N. Mladenović. Developments of variable neighborhood search. In C. Ribeiro and P. Hansen (eds.), Essays and Surveys in Metaheuristics. Kluwer Academic Publishers, Boston/Dordrecht/London (2001) 415-440.
P. Hansen and N. Mladenović. Variable neighborhood search: principles and applications. European Journal of Operational Research 130 (2001) 449-467.
P. Hansen and N. Mladenović. Variable neighborhood search. In F. Glover and G. Kochenberger (eds.), Handbook of Metaheuristics. Kluwer (2003) 145-184.
P. Hansen, N. Mladenović and D. Pérez-Brito. Variable neighborhood decomposition search. Journal of Heuristics 7 (2001) 335-350.
O. Kariv and S.L. Hakimi. An algorithmic approach to network location problems; part 2: the p-medians. SIAM Journal on Applied Mathematics 37 (1979) 539-560.
A.A. Kuehn and M.J. Hamburger. A heuristic program for locating warehouses. Management Science 9 (1963) 643-666.
L.A.N. Lorena and E.L.F. Senne. A column generation approach to capacitated p-median problems. Computers and Operations Research 31 (2004) 863-876.
H.R. Lourenco, O. Martin and T. Stuetzle. Iterated local search. In F. Glover and G. Kochenberger (eds.), Handbook of Metaheuristics. Kluwer (2003) 321-353.
F.E. Maranzana. On the location of supply points to minimize transportation costs. Operations Research Quarterly 12 (1964) 138-139.
P. Mirchandani and R. Francis (eds.). Discrete Location Theory. Wiley-Interscience, 1990.
N. Mladenović. A variable neighborhood algorithm - a new metaheuristic for combinatorial optimization. Presented at Optimization Days, Montreal (1995) p. 112.
N. Mladenović and P. Hansen. Variable neighborhood search. Computers and Operations Research 24 (1997) 1097-1100.
N. Mladenović, J.A. Moreno-Pérez and J. Marcos Moreno-Vega. A chain-interchange heuristic method. Yugoslav Journal of Operations Research 6 (1996) 41-54.
C.R. Reeves (ed.). Modern Heuristic Techniques for Combinatorial Problems. Blackwell Scientific Press, 1993.
M.G.C. Resende and R.F. Werneck. A hybrid heuristic for the p-median problem. Journal of Heuristics 10 (2004) 59-88.
E. Rolland, D.A. Schilling and J.R. Current. An efficient tabu search procedure for the p-median problem. European Journal of Operational Research 96 (1996) 329-342.
K.E. Rosing, C.S. ReVelle and H. Rosing-Vogelaar. The p-median and its linear programming relaxation: an approach to large problems. Journal of the Operational Research Society 30 (1979) 815-823.
K.E. Rosing and C.S. ReVelle. Heuristic concentration: two stage solution construction. European Journal of Operational Research 97 (1997) 75-86.
E.L.F. Senne and L.A.N. Lorena. Lagrangean/surrogate heuristics for p-median problems. In M. Laguna and J.L. González-Velarde (eds.), Computing Tools for Modeling, Optimization and Simulation: Interfaces in Computer Science and Operations Research. Kluwer (2000) 115-130.
E.L.F. Senne and L.A.N. Lorena. Stabilizing column generation using Lagrangean/surrogate relaxation: an application to p-median location problems. European Journal of Operational Research, to appear (2002).
M.B. Teitz and P. Bart. Heuristic methods for estimating the generalized vertex median of a weighted graph. Operations Research 16 (1968) 955-961.
S. Voss. A reverse elimination approach for the p-median problem. Studies in Locational Analysis 8 (1996) 49-58.
R. Whitaker. A fast algorithm for the greedy interchange of large-scale clustering and median location problems. INFOR 21 (1983) 95-108.
C.M. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, New York, 1995.
J. Chambers and T. Hastie. Statistical Models in S. Wadsworth/Brooks Cole, Pacific Grove, CA, 1991.
W.S. Cleveland. Robust locally weighted regression and smoothing scatterplots. Journal of the American Statistical Association 74 (1979) 829-836.
S.E. Fahlman. An empirical study of learning speed in back-propagation networks. In T.J. Sejnowski, G.E. Hinton and D.S. Touretzky (eds.), Connectionist Models Summer School. Morgan Kaufmann, San Mateo, CA (1988) 38-51.
J.H. Friedman and B.W. Silverman. Flexible parsimonious smoothing and additive modelling (with discussion). Technometrics 31 (1989) 3-39.
F. Glover. Tabu search - part I. ORSA Journal on Computing 1 (1989) 190-206.
T. Hastie, R. Tibshirani and J. Friedman. The Elements of Statistical Learning. Springer, New York, 2001.
K. Hornik, M. Stinchcombe and H. White. Multilayer feedforward networks are universal approximators. Neural Networks 2 (1989) 359-366.
R.A. Jacobs. Increased rates of convergence through learning rate adaptation. Neural Networks 1 (1988) 295-307.
M. Laguna and R. Martí. Neural network prediction in a system for optimizing simulations. IIE Transactions 34(3) (2002) 273-282.
M. Laguna and R. Martí. Scatter Search: Methodology and Implementations in C. Kluwer Academic Publishers, 2003.
R. Martí and A. El-Fallahi. Multilayer neural networks: an experimental evaluation of on-line training methods. Computers and Operations Research 31 (2004) 1491-1513.
W.H. Press, S.A. Teukolsky, W.T. Vetterling and B.P. Flannery. Numerical Recipes: The Art of Scientific Computing. Cambridge University Press, 1992 (www.nr.com).
R Development Core Team. R: A Language and Environment for Statistical Computing, 2003. http://www.R-project.org.
R.S. Sexton, B. Alidaee, R.E. Dorsey and J.D. Johnson. Global optimization for artificial neural networks: a tabu search application. European Journal of Operational Research 106 (1998) 570-584.
S. Smith and L. Lasdon. Solving large nonlinear programs using GRG. ORSA Journal on Computing 4(1) (1992) 2-15.