CHAPTER 1
Parallel Branch-and-Bound Algorithms
TEODOR GABRIEL CRAINIC
Département de management et technologie, École des Sciences de la Gestion, Université du Québec à Montréal, and CIRRELT, Canada
BERTRAND LE CUN and CATHERINE ROUCAIROL
Laboratoire PRiSM, Université de Versailles (France)
1.1. INTRODUCTION
At the beginning of the twenty-first century, several large, previously unsolved combinatorial optimization problems were solved exactly.
Two impressive cases should be mentioned. First, two instances of the famous Symmetric Traveling Salesman problem (TSP) with >10,000 cities (13,509 and 15,112 cities, respectively; instances usa13509 and d15112) were solved by Applegate et al. (1). Second, instances of the Quadratic Assignment Problem (QAP) up to size 32, Nugent 30 (900 variables) and Krarup 32 (1024 variables), were solved by Anstreicher, Brixius, Goux, and Linderoth (2,3). The results on the instance Nugent 30 were announced in American (Chicago Tribune, Chicago Sun-Times, HPCWire, NCSA Access Magazine) and French (InfoScience, Le Monde, Transfert) newspapers. This impressive media frenzy reflected the impact of the achievement, which was deemed of the same order of magnitude as the victory of IBM's parallel computer Deep Blue over the chess world champion Kasparov.
Several factors combined to bring on these achievements. A first reason is the scientific progress in Operations Research, in particular regarding the quality of the lower bounds for these problems (cutting-plane techniques for the TSP and convex quadratic programming for the QAP). The computation of these new bounds is very time consuming, however. Moreover, the bounds are computed at each node of a tree whose size is huge (several billion nodes). The progress in processor computing power certainly contributed as well.
These two reasons would not have been sufficient, however. The utilization of parallel branch-and-bound (B&B) strategies on large computer clusters and grids with advanced programming tools, including multithreading and fault tolerance functionalities, is the third factor of success. Indeed, the TSP instance usa13509 required 48 workstations (DEC Alpha, Pentium II, Pentium Pro, and UltraSPARC) that explored in parallel a tree of 9539 nodes. The instance Nugent 30 (30 firms to be assigned to 30 sites) needed the exploration of a tree with 11,892,208,412 nodes and a network of 2510 heterogeneous machines, with an average of 700 machines working at any time (PCs, Sun workstations, and SGI Origin 2000s). These machines were distributed over two national laboratories (Argonne, NCSA) and five American universities (Wisconsin, Georgia Tech, New Mexico, Columbia, Northwestern), and were connected to the Italian network INFN. The time spent was ~1 week! The equivalent sequential time on an HP C3000, for example, was estimated at 218,823,577 s, or ~7 years!
The resolution of these problems by parallel B&B illustrates the interest of this methodology. Not all NP-hard combinatorial optimization problems or problem instances may be equally well addressed, however. An honest analysis of the above results shows that these performances are also due to certain characteristics of the problems, particularly the existence of very good upper bounds. Moreover, the tree search strategy used was equivalent to a brute-force exploration of the tree. In most cases, bounds are less tight and more advanced parallel tree exploration strategies must be used. The goal of this chapter is to discuss some of these challenges and to present some of the parallel B&B strategies that may be used to address them. We show how parallelism can help to fight the combinatorial explosion and to address efficiently (linear speedup) and accurately (optimal solutions) combinatorial optimization problems of considerable size. A general review of parallel B&B methodology and literature may be found in Gendron and Crainic (4).
The chapter is organized as follows. Section 1.2 briefly recalls the sequential B&B algorithm. Section 1.3 presents the different sources of parallelism that can be exploited in a B&B algorithm. Section 1.4 discusses the performance that can be obtained, from both theoretical and experimental points of view. Section 1.5 presents different parallelization strategies, with their respective advantages, issues, and limits. Section 1.6 briefly reviews B&B libraries proposed to help the user implement the different parallelization strategies and benefit from advanced programming tools, including multithreading and fault tolerance functionalities. To illustrate these concepts, the application of parallel B&B to the QAP is presented in Section 1.7. Section 1.8 contains concluding remarks.
1.2. SEQUENTIAL B&B
Let us briefly recall the main components of a B&B algorithm. Suppose the following combinatorial optimization problem is to be solved: Given a finite discrete set X, a function f : X → ℝ, and a set S ⊆ X, find an optimal solution x* ∈ S such that f(x*) = min{f(x) | x ∈ S}. The set S is usually defined by a set of constraints and is called the feasible domain, all elements x ∈ S being feasible solutions. We assume that S is finite or empty.
A branch-and-bound (B&B) solution method consists in implicitly enumerating S by examining a subset of feasible solutions only. The other solutions are eliminated when they cannot lead to a feasible or an optimal solution.
Enumerating the solutions of a problem consists in building a B&B tree
whose nodes are subsets of solutions of the considered problem. The size
of the tree, that is, the number of generated nodes, is directly linked to the
strategy used to build it.
Synthetically, the B&B paradigm could be summed up as follows:
Building the Search Tree
• A branching scheme splits X into smaller and smaller subsets, in order to end up with problems we know how to solve; the B&B nodes represent the subsets, while the B&B edges represent the relationship linking a parent subset to its child subsets created by branching.
• A search or exploration strategy selects one node among all pending nodes according to priorities defined a priori. The priority of a node or set Si, h(Si), is usually based either on the depth of the node in the B&B tree, which leads to a depth-first tree-exploration strategy, or on its presumed capacity to yield good solutions, leading to a best-first strategy.
Pruning Branches

• A bounding function υ gives a lower bound for the value of the best solution belonging to each node or set Si created by branching.
• The exploration interval restricts the size of the tree to be built: only nodes whose evaluations belong to this interval are explored; other nodes are eliminated. The interval extends from the smallest value associated with a pending node of the current tree up to the upper bound UB, which may be provided at the outset by a known feasible solution to the problem or given by a heuristic. The upper bound is constantly updated, every time a new feasible solution is found (it is the value of the best known solution, also called the incumbent).
• Dominance relationships may be established in certain applications between subsets Si, which also leads to discarding dominated nodes.
A Termination Condition
This condition states when the problem is solved and the optimal solution
is found. It happens when all subproblems have been either explored or
eliminated.
Fig. 1.1. Graphical view of a sequential B&B algorithm.
From an algorithmic point of view, illustrated in Fig. 1.1, a B&B algorithm consists in carrying out a series of basic operations on a pool of nodes (a set of nodes of varying or identical priority), usually implemented as a priority queue: deletemin (select and delete the highest priority element), insert (insert a new element with predefined priority), and deletegreater (delete the elements whose evaluation is worse than a given value).
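To make these pool operations concrete, here is a minimal sequential best-first loop built on such a priority queue. This is a sketch only: the Node fields and the branch callback are hypothetical placeholders, not the chapter's Fig. 1.2.

```cpp
#include <limits>
#include <queue>
#include <vector>

// Hypothetical node type: a subproblem with its lower bound and a flag
// telling whether it encodes a complete (feasible) solution, in which
// case `bound` equals the solution value.
struct Node {
    double bound;
    bool is_leaf;
    // ... problem-specific data (partial assignment, etc.)
};

// Best-first order: the node with the smallest lower bound comes out first.
struct BestFirst {
    bool operator()(const Node& a, const Node& b) const { return a.bound > b.bound; }
};

double branch_and_bound(Node root,
                        std::vector<Node> (*branch)(const Node&),
                        double incumbent = std::numeric_limits<double>::infinity()) {
    std::priority_queue<Node, std::vector<Node>, BestFirst> pool;
    pool.push(root);                            // insert
    while (!pool.empty()) {
        Node n = pool.top(); pool.pop();        // deletemin
        if (n.bound >= incumbent) continue;     // lazy deletegreater (pruning)
        if (n.is_leaf) { incumbent = n.bound; continue; }  // update the UB
        for (const Node& child : branch(n))     // branching
            if (child.bound < incumbent) pool.push(child);
    }
    return incumbent;  // optimal value once all nodes are explored or eliminated
}
```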
1.3. SOURCES OF PARALLELISM
Two basic, by now classic, approaches are known to accelerate the B&B search:
1. Node-based strategies that aim to accelerate a particular operation, mainly at the node level: parallel computation of the lower or upper bound, parallel evaluation of the children, and so on.
2. Tree-based strategies that aim to build and explore the B&B tree in parallel.
Node-based strategies aim to accelerate the search by executing a particular operation in parallel. These operations are mainly associated with the evaluation, bounding, and branching of subproblems (nodes), and range from "simple" numerical tasks (e.g., matrix inversion), to the decomposition of computing-intensive tasks (e.g., the generation of cuts), to parallel mathematical programming (e.g., simplex, Lagrangean relaxation, capacitated multicommodity network flow) and meta-heuristic (e.g., tabu search) methods used to compute lower bounds and to derive feasible solutions. This class of strategies has also been identified as low-level (or type 1) parallelization, because it does not aim to modify the search trajectory: neither the dimension of the B&B tree nor its exploration is changed; speeding the search up is the only objective. It is noteworthy, however, that some node-based approaches may modify the search trajectory. Typical examples are the utilization of the parallel Lagrangean relaxation or the parallel simplex, particularly when multiple optima exist or the basis is transmitted to the nodes generated by the branching operation.
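As a minimal sketch of a node-based strategy, under the assumption of a thread-safe bounding function (the names Subproblem and evaluate are illustrative), the offspring of a node can be bounded concurrently while leaving the search trajectory untouched:

```cpp
#include <future>
#include <vector>

struct Subproblem {
    double bound = 0.0;
    // ... problem-specific data (partial assignment, etc.)
};

// Hypothetical bounding function; assumed thread-safe (it reads only its
// argument). A real one would solve an LP, run the Hungarian method, etc.
double evaluate(const Subproblem& s) {
    return s.bound;  // placeholder for the real bound computation
}

// Node-based parallelism: bound all children of a node concurrently.
// The tree explored is exactly the serial one; only the node operation
// (bounding the offspring) is accelerated.
void bound_children(std::vector<Subproblem>& children) {
    std::vector<std::future<double>> jobs;
    jobs.reserve(children.size());
    for (const Subproblem& c : children)
        jobs.push_back(std::async(std::launch::async, evaluate, std::cref(c)));
    for (std::size_t i = 0; i < children.size(); ++i)
        children[i].bound = jobs[i].get();
}
```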
Other strategies, for example, domain decomposition (decompose the feasible domain and use B&B to address the problem on each of the components of the partition) and multisearch (several different B&B searches explore the same solution domain in parallel, with or without communications), hold great promise for particular problem settings, but have not yet been studied in much depth. Note that these strategies are not mutually exclusive. Indeed, when problem instances are particularly hard and large, several strategies may be combined into a comprehensive algorithmic design. Thus, for example, node-based strategies could initiate the search and rapidly generate interesting subproblems, followed by a parallel exploration of the tree. Or, a multisearch approach may be set up, where each B&B search uses one or more parallel strategies. Tree-based strategies have been the object of the broadest and most comprehensive research effort. Therefore, in this chapter, the focus is on tree-based strategies.
Tree-based parallelization strategies yield irregular algorithms, and the corresponding difficulties have been well identified [e.g., Authié et al. (5) and the STRATAGEMME project (6)]:

• Tasks are created dynamically in the course of the algorithm.
• The structure of the tree to explore is not known beforehand.
• The dependency graph between tasks is unpredictable: no part of the graph may be estimated at compilation or run time.
• The assignment of tasks to processors must be done dynamically.
• Algorithmic aspects, such as sharing and balancing the workload or transmitting information between processes, must be taken into account at run time in order to avoid overheads due to load balancing or communication.

Furthermore, parallelism can create tasks that are redundant or unnecessary for the work to be carried out (search overhead), or that degrade performance, the so-called speedup anomalies. Some of these issues are examined in the next section.
1.4. CRITICAL B&B TREE AND SPEEDUP ANOMALIES
Fig. 1.2. Pseudo-code of a sequential B&B algorithm.

The success of the parallelization of a B&B algorithm may be measured experimentally by the absolute speedup obtained with p processors, defined as the ratio of the time taken by the best serial algorithm to that obtained with the parallel algorithm using p processors, for one instance of a problem. For the sake of simplicity, a relative speedup is often used, defined as the ratio of the time taken by a serial algorithm implemented on one processor to the time required by the parallelized version of the same algorithm implemented on p processors. Efficiency is a related measure, computed as the speedup divided by the number of processors.
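In symbols, this is simply (a restatement of the above definitions, with $T_{\text{best}}$ the time of the best serial algorithm, $T_1$ the serial time of the parallelized algorithm, and $T_p$ the parallel time on $p$ processors):

```latex
S_p^{\mathrm{abs}} = \frac{T_{\text{best}}}{T_p}, \qquad
S_p^{\mathrm{rel}} = \frac{T_1}{T_p}, \qquad
E_p = \frac{S_p}{p}
```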
For parallel tree-based B&B, one would expect results that show almost
linear speedup, close to p (efficiency close to 100%). Yet, the relative speedup
obtained by a parallel B&B algorithm may sometimes be quite spectacular,
>p, while at other times, a total or partial failure (much <p) may be observed.
These behavioral anomalies, both positive and negative, may seem surprising
at first (7). They are mainly due to the combination of the speedup definitions
and the properties of the B&B tree where priorities, or the bounding function,
may only be recognized a posteriori, once the exploration is completed. These
issues have been the subject of a great deal of research in the 1980s (we would
like to quote Refs 8, 9 and refer to Ref. 6 for a more comprehensive literature
review).
In fact, the time taken by a serial B&B is related to the number of nodes
in the fully developed tree. The size of the B&B tree—where the branching
rule, the bounding function υ, and the node-processing priority h have been
defined a priori (prior to execution)—depends on the search strategy and the
properties of υ.
Four different types of nodes may be defined in a B&B tree (6,7), as illustrated in Fig. 1.3 and restated in set notation below:
1. Critical nodes, set C, representing incomplete solutions with a value strictly smaller than the optimal value f*.
2. Undecidable nodes, set M, which are nonterminal, incomplete solutions with a value equal to f*.
3. Optimal nodes, set O, with the value f*.
4. Eliminated nodes, set E, with a value strictly >f*.
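Equivalently (a restatement of the definitions above, with υ(Si) the lower bound of node Si and f* the optimal value):

```latex
\begin{aligned}
C &= \{\, S_i \mid \upsilon(S_i) < f^* \,\} && \text{(critical nodes)} \\
M &= \{\, S_i \text{ nonterminal} \mid \upsilon(S_i) = f^* \,\} && \text{(undecidable nodes)} \\
O &= \{\, S_i \text{ optimal} \mid \upsilon(S_i) = f^* \,\} && \text{(optimal nodes)} \\
E &= \{\, S_i \mid \upsilon(S_i) > f^* \,\} && \text{(eliminated nodes)}
\end{aligned}
```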
When a B&B algorithm is executed according to a best-first strategy, it develops all the critical nodes, some undecidable nodes, and one optimal node. Certain nodes belonging to E may be explored when other strategies are used. Any strategy will always develop the set of critical nodes, also called the critical tree. Several executions may correspond to the same developed B&B tree, according to the choices made by the strategy among nodes of equal priority, which may be very numerous.
The minimal tree is defined as the tree that, regardless of the exploration strategy, must be built to prove the optimality of a feasible solution, and that has the minimal number of nodes. Notice that the critical tree (the critical nodes) is included in the minimal tree. In parallel processing, the p processors must explore all the nodes of the minimal tree. In serial processing, the speedup is always linear (lower than or equal to p) if the minimal tree has been built, since the serial time is then that of the best possible serial algorithm.
The fact that it is not always possible to define a priori the search strategy
to construct the minimal tree shows that speedups may be favorable (the serial
tree is very large) or unfavorable, and therefore proves the existence of these
anomalies. It is interesting to note that parallelism may have a corrective effect
in serial cases, where the minimal tree has not been built.
In a theoretical synchronous context, where one iteration corresponds to the exploration of p nodes in parallel by p processors between two synchronizations, a sufficient condition for avoiding detrimental anomalies is that h be discriminating (9). One necessary condition for the existence of favorable anomalies is that h should not be completely consistent with υ (strictly higher priority meaning a lower or equal value). This is the case for the breadth-first and depth-first strategies.
The best-first strategy constructs the minimal tree if (sufficient condition) there are no undecidable nodes (M = ∅) and the bounding function is discriminating, which means that υ is such that no two nodes have equal priority. The best-first strategy has proved to be very robust (10). The speedup it produces varies within a small interval around p. It avoids detrimental anomalies if υ is discriminating, as we have already seen, or if it is consistent (at least one node of the serial tree is explored at each iteration) (11).
Rules for selecting between equal priority nodes have been studied (11),
with the aim of avoiding unfavorable anomalies in best-first B&B algorithms,
thus eliminating processing overheads or memory occupation, which is not the
case with the other proposals (12). Three policies, based on the order in which
nodes are created in the tree, have been compared: (1) newest (the most recently generated node); (2) leftmost (the leftmost node in the search tree); and (3) oldest (the least recently generated node).

Fig. 1.3. Example of node classification in a B&B tree.
Bounds calculated on expected speedups show that the "oldest" rule is the least likely to produce anomalies. It is therefore interesting to study the data structures that, unlike the well-known heaps, can implement the "oldest" rule. As similar results (conditions for the existence of anomalies) may be demonstrated when the number of processors is increased (growth anomalies, or nonmonotonic increases in speedup), some researchers worked on defining a measurement, the isoefficiency function iso(p) (13), based on general results on the "scalability" of parallel algorithms: in order to maintain an efficiency of e with p processors, the size of the problem processed should grow according to the function iso(p).
Despite the obvious advantage of keeping acceleration anomalies possible, the interest of avoiding detrimental anomalies has been emphasized. In this context, Li and Wah (9) presented a special condition on nodes with the same priority that is sufficient to avoid degradation. Their method is attractive for depth-first search strategies, where anomalous behavior is quite usual, and has been improved technically by Kalé and Saletore (12).
An example of an anomaly with the best-first strategy is given in Fig. 1.4. Since the cost of unfavorable anomalies needs to be compared with the price of forbidding them, it may be worthwhile to consider the design and analysis of basic best-first search strategies that deal with active nodes of the same priority (same value) without either the processing or the memory overhead due to the removal of unfavorable anomalies.
Fig. 1.4. Example of an anomaly in a B&B. (a) The B&B tree. (b) Sequential scheduling with best-first search. (c) Parallel scheduling with four processors. (d) Parallel scheduling with five processors.
However, it is very important, during the design process of a B&B algorithm, to be able to define different priorities for the subproblems (nodes) to be explored and to deal with nodes of equal priority.
1.5. STRATEGIES FOR PARALLELIZATION
Most parallel B&B algorithms implement some form or another of tree exploration in parallel. The fundamental idea is that, in most applications of interest, the size of the B&B enumeration tree grows, possibly rapidly, to unmanageable proportions. If the exploration of the search tree is carried out more quickly by several processes, the faster acquisition of knowledge during the search (through communication between processes) will allow more nodes to be pruned and more branches of the tree to be eliminated.
To describe the possible strategies, we start from a representation of the
sequential B&B where a number of operations are performed on the data
structure containing the work to be done (e.g., nodes in various states) and
the information relative to the status of the search (e.g., the incumbent value).
Transposed to a parallel environment, the tree management strategies are presented as search control strategies implying information and pool management strategies. It is noteworthy that some of these strategies induce a tree exploration different from the one performed by the sequential method.
Recall from Section 1.2 that sequential B&B is fundamentally a recursive
procedure that extracts a node (i.e., a subproblem) from the pool, thus deleting it from the data structure, performs a series of operations (evaluation, computation of upper bound, branching, etc.), and completes the loop by inserting
one or several nodes (i.e., the new subproblems yielded by the branching operation) into the same pool.
Nodes in the pool are usually kept and accessed according to their priority
based on various node attributes (e.g., lower and upper bound values, depth
in the tree) and the search tree exploration strategy (e.g., best or depth-first).
Node priorities thus define an order on the nodes of the pool, as well as the
sequential scheduling of the search. An evident property of the sequential
B&B search is that each time a node is scheduled, the decision is taken with a complete knowledge of the state of the search, that is, with a global view of all the pending nodes generated so far.
In a parallel environment, both the search decisions, including node scheduling ones, and the search information may be distributed. In particular, not
all the relevant information may be available at the time and place (i.e., the
processor) a decision is taken. Thus, first we examine issues related to the
storage and availability of information in a parallel exploration of a B&B tree.
We then turn to the role processors may play in such an exploration. The
combination of the various alternatives for these two components yields the basic
strategies for parallel B&B algorithm design.
Two issues have to be addressed when examining the search information in
a parallel context: (1) how is this information stored; (2) what information is
available at decision time.
The bulk of the search information is made up of the pool of nodes, and
pool management strategies address the first issue with respect to it: if and how
to decompose the pool of nodes. A centralized strategy keeps all the nodes in
a central pool. This implies that this unique pool serves, in one form or another,
all processors involved in the parallel computation. Alternatively, in a distributed strategy, the pool is partitioned, each subset being stored by one processor. Other relevant information (e.g., global status variables like the value of
the incumbent) is of limited size. Consequently, the issue is not whether it is
distributed or not, but rather whether it is available in an up-to-date form
when decisions are taken.
In a parallel B&B algorithm, more than one processor may decide, more
or less simultaneously, to process a node. Their collective action corresponds
to the parallel scheduling of the search, that is, to the management of the entire,
but often distributed, pool and the corresponding distribution of work among
processors.
The scheduling of nodes is based on the node priorities, defined as in the sequential context. We define as search knowledge the pool of nodes with their priorities, plus the incumbent value and the other global status variables of the search. The search knowledge may be complete or partial. If the search knowledge is complete, the resulting scheduling is very close to the sequential scheduling. Indeed, when, at each step, the processor that has to make a scheduling decision has an exact and complete knowledge of all the pending nodes to process, its decision is almost the same as in the sequential case. When only partial information is known to a processor, the scheduling may be quite different from the sequential one.
When information is distributed, parallel scheduling must also include specific provisions to address a number of particular issues:

• The definition of a parallel initialization phase and of the initial work allocation among processors.
• The updating of the global status variables (e.g., the value of the incumbent).
• The termination of the search.
• The minimization of the idle time.
• The maximization of the meaningful work.
Search control strategies specify the role of each processor in performing
the parallel search, that is, decisions relative to the extraction and insertion of
nodes into the pool(s), the exploration strategy, the work to perform (e.g., total
or partial evaluation, branching, offspring evaluation), the communications to
undertake to manage the different pools, and the associated search knowledge.
From a search control point of view, we distinguish between two basic, but fundamental, types of processes: master (or control) and slave processes. Master processes execute the complete range of tasks. They specify the work that slave processes must do and fully engage in communications with the other master processes to implement the parallel scheduling of the nodes, control the exchange of information, and determine the termination of the search. Slave processes communicate exclusively with their assigned master process and execute the prespecified tasks on the subproblems they receive. Slave processes do not engage in scheduling activities.
The classical master-slave, or centralized control, strategy makes use of these two types in a two-layer processor architecture: one master process and a number of slave processes (Fig. 1.5). This strategy is generally combined with a centralized pool management strategy. Thus, the master process maintains the global knowledge and controls the entire search, while the slave processes perform the B&B operations on the nodes received from the master process and return the results to it.
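The following is a minimal shared-memory sketch of this centralized scheme, in C++ for illustration only: a mutex-protected queue stands in for the master's pool, a toy branching rule replaces real node processing, and in a message-passing setting the put/get calls would become exchanges with the master process.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

struct Task { int depth; /* subproblem data */ };

class CentralPool {                     // stands in for the master's pool
    std::queue<Task> q_;
    std::mutex m_;
    std::condition_variable cv_;
    int working_ = 0;                   // slaves currently processing a node
    bool done_ = false;
public:
    void put(Task t) {
        { std::lock_guard<std::mutex> lk(m_); q_.push(t); }
        cv_.notify_one();
    }
    bool get(Task& out) {               // false once the search is over
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [&] { return !q_.empty() || done_; });
        if (done_ && q_.empty()) return false;
        out = q_.front(); q_.pop(); ++working_;
        return true;
    }
    void task_done() {                  // detect global termination
        std::lock_guard<std::mutex> lk(m_);
        if (--working_ == 0 && q_.empty()) { done_ = true; cv_.notify_all(); }
    }
};

void slave(CentralPool& pool) {
    Task t;
    while (pool.get(t)) {
        // evaluate t, possibly update the incumbent, then branch:
        if (t.depth < 3)                // toy branching rule for the sketch
            for (int i = 0; i < 2; ++i) pool.put({t.depth + 1});
        pool.task_done();
    }
}

int main() {
    CentralPool pool;
    pool.put({0});                      // root subproblem
    std::vector<std::thread> slaves;
    for (int i = 0; i < 4; ++i) slaves.emplace_back(slave, std::ref(pool));
    for (auto& s : slaves) s.join();
}
```

Note how termination, one of the issues listed above, already requires care even in this toy setting: the pool is only known to be exhausted when it is empty and no slave is still working.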
At the other end of the spectrum, one finds the distributed control strategy combined with a distributed pool management approach. In this case, sometimes also called collegial and illustrated in Fig. 1.6, several master processes
Fig. 1.5. Master-slave B&B algorithm.
Fig. 1.6. Collegial B&B algorithm.
collegially control the search and the associated information. In this basic approach, there are no slave processes. Each master process is associated with one of the local pools, which stores a subset of the currently existing nodes (there is no sharing of local pools among several masters). It then performs the B&B operations on its local pool based on this partial node information, as well as on information transmitted by the other processes. The combined activities of all the master processes thus make up the partial-knowledge parallel scheduling.
The pool distribution, and thus the search information distribution, often results in uneven workloads for the various processes during the search. A load balancing strategy must then be implemented to indicate how the information relative to the processor workloads circulates and how the corresponding load balancing decisions are taken. Information updating the global status variables (incumbent value, termination status, etc.) must also be exchanged. This set of communication policies enforces the global control over the search.
From a control point of view, two approaches are again available: either the decision is collegially distributed among the processes, or it is (more or less) centralized. Thus, in the basic collegial strategy introduced above, all master processes may exchange messages (according to various communication topologies and policies) and decide collectively how to balance the loads. Alternatively, one of the processors acts as a load balancing master: it collects the load information and decides on the data exchanges to perform.
These strategies and types of processes are the basic building blocks that
may be combined to construct more complex, hierarchical parallel strategies,
with more than one level of information (pool) distribution, search control,
and load balancing. Thus, for example, a processor could depend on another
for its own load balancing, while controlling the load balancing for a group of
lower level master processes, as well as the work of a number of slave
processes.
1.6. BRANCH-AND-BOUND SOFTWARE
The range of purposes for B&B software is quite large, so it stands to reason that the number of problem types, user interface types, parallelization types, and machine types is also large.
This chapter focuses on software that offers an interface for implementing a basic B&B, that is, a library one can use to implement one's own bounding procedure and one's own branching procedure. Several libraries or applications exist that are dedicated to one type of problem: commercial linear solvers like CPLEX and Xpress-MP, or open source software like GLPK (14) and lp_solve, are dedicated to solving mixed-integer linear programs, but their B&B part is hidden in a "black box" and cannot be customized to implement an efficient parallel one. However, these solvers can be accessed through a callable library: an application that implements a parallel basic B&B can use this kind of solver to compute the bound. In this case, only the linear solver is used; the B&B part is ignored.
We first explain the design principles used to develop this kind of software. Then we discuss the different frameworks in terms of the user algorithms they provide, the parallelization strategies they propose, and, finally, their target machines.
1.6.1. Design of a B&B Framework
When we consider the really basic form of B&B, a framework is the only possible form of software. A framework in this context is an implementation of a B&B with hooks that allow the user to provide custom implementations of certain aspects of the algorithm. For example, the user may wish to provide a custom branching rule or a custom bounding function. The customization is generally accomplished either through the use of C-language callback functions or through a C++ interface in which the user must derive from certain base classes and override the default implementations of the desired functions. In most cases, base classes and default implementations are abstract. The main algorithm, for example the B&B loop, is written as a skeleton: a skeleton uses abstract classes or abstract functions that must be redefined to obtain a concrete application.
In a basic B&B, the abstractions are the following (a C++ sketch of such an interface follows the list):

• The node type, which includes the data needed to compute the bound of the subproblem and also stores the partial solution of the subproblem.
• The solution type, which includes the value and the solution itself (the values of the variables); sometimes the same type is used for both the solution and the subproblem.
• A function to evaluate a node, which computes, from the data stored in the node, the evaluation of the partial solution or the cost of a complete solution.
• A function used for branching, which must implement the problem-specific method by which a subproblem is divided into subproblems.
• A function to generate the root subproblem.
• A function to generate the initial solution.
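In a C++ framework of this kind, the user-visible abstractions might be declared roughly as follows. This is a sketch with illustrative names, not the interface of any particular library:

```cpp
#include <memory>
#include <vector>

// Subproblem: carries the data needed to bound it and the partial solution.
class Node {
public:
    virtual ~Node() = default;
    virtual double evaluate() = 0;                  // bound of the subset
    virtual std::vector<std::unique_ptr<Node>> branch() = 0;  // split it
    virtual bool isSolution() const = 0;            // complete assignment?
};

// Feasible solution: its value plus the values of the variables.
class Solution {
public:
    virtual ~Solution() = default;
    virtual double value() const = 0;
};

// Problem instance: produces the root subproblem and the initial incumbent.
class Instance {
public:
    virtual ~Instance() = default;
    virtual std::unique_ptr<Node> root() const = 0;
    virtual std::unique_ptr<Solution> initialSolution() const = 0;  // heuristic UB
};
```

The user derives concrete classes from these, and the framework's skeleton loop drives them.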
The framework generally offers the following functionalities:

• The priority of a node, which decides the order in which the subproblems will be explored; the choice of the node priority defines the search strategy.
• The management of the pending nodes, which is generally done using priority queues; as said in Section 1.5 about parallelization strategies, the framework may store one or several priority queues, the management of the pending nodes then being centralized or distributed.
• The management of the incumbent: when a process finds a better solution, the framework must update it automatically in the other processes.
• The main B&B loop, which is executed by each process.
Let us present a modified version of Fig. 1.2 in order to show the data and functions that must be redefined by the user and the parts that are provided by the framework (Fig. 1.7). This design is not the only possible one for a framework, and the authors of each framework surely present theirs in a different way, but we believe that this design is the one that corresponds most closely to reality.
The type Node represents a subproblem. As already mentioned, it must be defined by the user; in an object-oriented framework (written in C++, e.g.), the user defines his own type by deriving from the type offered by the framework. Here, the type of the solution (incumbent) is the same as the type of a subproblem; as said before, some frameworks define different types for the solution and for the nodes.
The type ParaPriorityQueue is provided by the framework. It is used by the main loop to obtain a new node to be explored and to insert newly generated nodes.
Fig. 1.7. Skeleton of a parallel B&B algorithm.
According to the parallelization strategy, this type may have different implementations. First, in a centralized strategy, it is just an interface to receive nodes from and send nodes to the master process. Second, in a distributed strategy, the ParaPriorityQueue stores a local priority queue and also executes a load balancing procedure in order to ensure that each process has enough interesting nodes. This data structure, or more generally this interface, can be seen as a high-level communication tool: the main loop does not know where the nodes are really stored; it just knows that this tool can be used to obtain and to insert nodes. The IsEmpty method or function of the ParaPriorityQueue interface returns true only if no more nodes exist in the entire application. On a distributed parallel machine, a cluster for example, this interface will use a message passing library, like MPI or PVM, to transmit nodes from one process to another.
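The interface itself can be sketched as an abstract class whose concrete implementations hide the chosen strategy (again with illustrative names; a real framework adds node priorities, serialization, and the messaging layer):

```cpp
#include <memory>

class Node;  // the user-defined subproblem type

// High-level communication tool: the main loop neither knows nor cares
// where the nodes are actually stored.
class ParaPriorityQueue {
public:
    virtual ~ParaPriorityQueue() = default;
    virtual void insert(std::unique_ptr<Node> n) = 0;  // newly generated node
    virtual std::unique_ptr<Node> deleteMin() = 0;     // next node to explore
    // True only when no node remains in the entire application,
    // that is, in every local pool of every process.
    virtual bool isEmpty() const = 0;
};

// Centralized strategy: insert/deleteMin forward nodes to and from the
// master process.
//   class CentralizedPPQ : public ParaPriorityQueue { ... };
// Distributed strategy: wraps a local priority queue and triggers load
// balancing (e.g., over MPI or PVM) when a process runs out of work.
//   class DistributedPPQ : public ParaPriorityQueue { ... };
```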
Fig. 1.8. Graphical view of a Framework and its interactions.
The Update function or method is used to broadcast to each process the value of the new best solution found. The same communication library as for the ParaPriorityQueue is used. The management of the incumbent could be merged with the management of the pending nodes. We can then see the framework as a composition of different levels, as presented in Fig. 1.8: at the highest level, the user-defined types and functions; at the lowest level, the target machine.
There are many frameworks for parallel B&B, including:

1. PUBB (Univ. of Tokyo) (15).
2. Bob++ (Univ. of Versailles) (16).
3. PPBB-Lib (Univ. of Paderborn) (17).
4. PICO (Sandia Lab. & Rutgers University) (18).
5. FATCOP (University of Wisconsin) (19).
6. MallBa (consortium of Spanish universities) (20).
7. ZRAM (ETH Zürich) (21).
8. BCP (COIN-OR) (22).
9. ALPS/BiCePS (Lehigh University, IBM, Clemson University) (22,23).
10. MW Framework (Lehigh University) (24).
11. SYMPHONY (Rice Univ.) (22,25,26).
They differ in the type of B&B they are able to solve, in the type of parallelization they propose, and in their target machine. Some of them propose a C interface [Bob (27), PPBB-Lib, MallBa, ZRAM, PUBB], while the others propose a C++ interface (Bob++, ALPS/BiCePS, PICO, MW).
BRANCH- AND BOUND-SOFTWARE
17
1.6.2. User Algorithms
As already discussed, there exists a large variety of B&B algorithms: the basic B&B with a huge set of bounding procedures, branch and price, branch and cut. One could also consider that a simple divide-and-conquer algorithm is a base for a B&B algorithm.
With this multiplicity of methods, to make a framework easy to maintain, easy to use, and as flexible as possible, a possible design is a multilayered class library, in which the only assumptions made in each layer about the algorithm being implemented are those needed for implementing specific functionality efficiently. Each of the proposed frameworks offers a subset of these layers in its core.
1.6.2.1. The Low-Level Algorithm: Divide and Conquer. From a design point of view, a simple tree search procedure like divide and conquer can serve as a base for a B&B procedure. Both are tree search procedures, but the B&B has additional functionalities, like pruning, where the evaluation of the subproblem is used to discard a branch of the tree. For the parallelism, the differences between these two algorithms do not imply many modifications: if a parallel B&B is implemented, a parallel divide and conquer can also be implemented without much extra cost. For example, ZRAM, Bob++, ALPS, PUBB, and MallBa implement simple tree search algorithms like backtracking. ZRAM and MallBa propose divide and conquer and B&B as two different methods, whereas Bob++, ALPS, and PUBB model divide and conquer as a base class of the B&B.
1.6.2.2. The Basic B&B. Bob, Bob++, PUBB, PPBB-Lib, ZRAM, PICO, MallBa, ALPS, BCP, and MW propose an interface for a basic B&B. The interfaces are quite similar and correspond to the design we presented in the previous section.
1.6.2.3. Mixed-Integer Linear Programming. A mixed-integer linear program can be solved using a B&B where the bounding operation is performed using tools from linear programming and polyhedral theory. Since linear programming does not accommodate the designation of integral variables, the integrality constraints are relaxed to obtain a linear programming relaxation. Most of the time, this formulation is augmented with additional constraints, or cutting planes, that is, inequalities valid for the convex hull of solutions to the original problem. In this way, the hope is to obtain an integral (and hence feasible) solution.

Each framework that proposes a basic B&B can also be used to solve mixed-integer linear programs: the evaluation function and the branching function can be written using a linear solver. The COIN-OR Open Solver Interface (OSI) is very useful for that. The OSI consists of a C++ base class with containers for storing instance data, as well as a standard set of problem import, export, modification, solution, and query routines. Each supported solver has a corresponding derived class that implements the methods of the base class. A nonexhaustive list of supported solvers: CPLEX, Xpress-MP, lp_solve, GLPK, and CLP.
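As a hedged sketch of how a bounding callback can delegate to the OSI (assuming the COIN-OR OsiClpSolverInterface and a solver object already loaded with the instance data; OSI's LP solves ignore integrality declarations, so the linear relaxation is what gets solved):

```cpp
#include "OsiClpSolverInterface.hpp"
#include <limits>

// Lower bound of a subproblem: solve the LP relaxation with this node's
// branching decision imposed as bounds on the chosen variable.
double lpBound(OsiClpSolverInterface& solver, int fixedVar, double fixedValue) {
    solver.setColLower(fixedVar, fixedValue);   // impose x[fixedVar] = value
    solver.setColUpper(fixedVar, fixedValue);
    solver.resolve();                           // warm-started re-solve
    if (!solver.isProvenOptimal())              // infeasible node: prune it
        return std::numeric_limits<double>::infinity();
    return solver.getObjValue();                // LP value bounds the subtree
}
```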
However, an important feature that must be included in frameworks to solve such problems efficiently is a way to globally store the set of already generated cutting planes. A cutting plane that has been generated for one subproblem may be valid for another subproblem. Then, to avoid redundant computation, cutting planes should be stored in a "global data structure" from which processes can obtain an already generated cutting plane without regenerating it. The ALPS/BiCePS, PICO, SYMPHONY, and BCP frameworks propose such a data structure. The ALPS framework introduces the notion of a Knowledge Base (KB).
1.6.2.4. Branch and Price and Cut. The frameworks that propose a native interface for branch and price and cut are SYMPHONY and ALPS/BiCePS. As SYMPHONY is written in C, its user interface consists in implementing callback functions for cutting-plane generation, management of the cut pool, management of the LP relaxation, search and dividing strategies, and so on. ALPS/BiCePS, which seems to be the SYMPHONY replacement, should also propose an interface for branch, price, and cut. Unlike SYMPHONY, ALPS/BiCePS is layered, allowing the resolution of several types of B&B; SYMPHONY seems to be intended for branch and cut only.
1.6.2.5. Other User Methods. Another very interesting method that could also be offered by this kind of framework is the graph search procedure. Bob++, ZRAM, and MallBa propose an interface for developing dynamic programming applications. In this kind of method, the difficulty is the parallel management of the state space: in a parallel environment the state space must be global, and thus an algorithm to maintain the coherency of the state space across all the processes must be provided.
1.6.3. Parallelization Strategies
SYMPHONY and MW use the master-worker paradigm (see Section 1.5) to parallelize the application. The node management is thus centralized: one process, the master, controls all the aspects of the search, while the other processes, the workers, do the work (the exploration of one or several nodes) provided by the master. But as the task unit is the subtree and not the subproblem, each worker performs a search on a subtree, so it may be considered that each worker controls its own search; hence, the control of the search is distributed. MW (24) has the interesting feature of being based on the fault-tolerant communication library Condor (28). As said before, this strategy works well for a small number of processors, but it does not scale well, as the central pool inevitably becomes a computational and communications bottleneck.
ALPS (23) and PICO (18) propose the master-hub-worker paradigm to overcome the drawbacks of the master-worker approach. A layer of middle management is inserted between the master and the worker processes. In this scheme, a "cluster" consists of a hub that is responsible for managing a fixed number of workers. As the number of processes increases, more hubs and clusters of workers are simply added. This decentralized approach maintains many of the advantages of global decision making while reducing overhead and moving some of the computational burden from the master process to the hubs.
The other libraries propose one or several distributed strategies. Some of them have used PVM (29) (PUBB, Bob, PPBB-Lib, ZRAM), but the modern ones use MPI (Bob++, ALPS, PICO, MallBa). PPBB-Lib proposes a fully distributed parallel B&B, where each process stores a local pool of subproblems; several load balancing strategies were proposed in order to ensure that each local pool has enough work. In the vast majority of these frameworks, there is no easy way to extend the parallel features: only one parallel algorithm and one communication layer are proposed. Like its ancestor Bob, Bob++ proposes an interface to extend the parallel part of the library: master-slave, fully distributed, or mixed strategies can be implemented using this interface. Master-slave and fully distributed parallelizations exist for the Bob library.
1.6.4. The Target Machines
Depending on the authors' access to specific machines, the frameworks have been ported to various types of machines. For example, Bob has been ported to PVM (29), MPI, PM2 (30), Charm++ (31), Athapascan (32), and POSIX threads, using shared memory machines, and then distributed machines. The recent architectures being clusters of shared memory machines (SMPs) or grid computers, the current frameworks have proposed versions for these machines. MallBa, PICO, and ALPS use MPI as their message passing library. Bob++, MallBa, PICO, and ALPS run very well on a massively parallel distributed memory system. Bob++ (in its Athapascan-based version) can also run on a cluster of SMPs, where a multithreaded programming paradigm is used on each SMP node. Bob++ has been successfully tested on a grid (see Section 1.7). As far as we know, only Bob++, MW, PICO, ALPS, MallBa, and SYMPHONY are still being maintained.
1.7. ILLUSTRATIONS
Parallel B&B algorithms have been developed for many important applications, such as the Symmetric Traveling Salesman problem [Applegate, Bixby, Chvatal, and Cook (1)], the Vehicle Routing problem [Ralphs (33); Ralphs, Ladányi, and Saltzman (34)], and multicommodity location and network design [Gendron and Crainic (35); Bourbeau, Gendron, and Crainic (36)], to name but a few. In this section, we illustrate the parallel B&B concepts introduced previously by using the case of the Quadratic Assignment Problem. The description follows the work of Ref. 37.
The Quadratic Assignment Problem (QAP) consists of assigning n units to n sites in order to minimize the quadratic cost of this assignment, which depends on both the distances between the sites and the flows between the units. It can be formulated as follows:
Given two $(n \times n)$ matrices $F = (f_{ij})$, where $f_{ij}$ is the flow between units $i$ and $j$, and $D = (d_{kl})$, where $d_{kl}$ is the distance between sites $k$ and $l$,
find a permutation $p$ of the set $N = \{1, 2, \ldots, n\}$ that minimizes the global cost function:

$$\mathrm{Cost}(p) = \sum_{i=1}^{n} \sum_{j=1}^{n} f_{ij}\, d_{p(i)p(j)}$$
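Concretely, the objective is straightforward to evaluate for a given permutation; the following self-contained helper (indices are 0-based in code) is a useful sanity check when implementing bounds:

```cpp
#include <vector>

// Cost(p) = sum over i,j of f[i][j] * d[p[i]][p[j]], p a permutation of 0..n-1.
long long qapCost(const std::vector<std::vector<int>>& f,
                  const std::vector<std::vector<int>>& d,
                  const std::vector<int>& p) {
    long long cost = 0;
    const std::size_t n = p.size();
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            cost += static_cast<long long>(f[i][j]) * d[p[i]][p[j]];
    return cost;
}
```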
Although the QAP can be used to formulate a variety of interesting problems in location, manufacturing, data analysis, and so on, it is an NP-hard combinatorial problem and, in practice, extraordinarily difficult to solve even for what may seem "small" instances. This is very different from other NP-hard combinatorial optimization problems, for example, the Traveling Salesman problem, for which, in practice, very large instances have been successfully solved to optimality in recent years. To illustrate this difficulty, the B&B for the QAP develops huge critical trees. Thus, for example, the tree for Nugent24 has 48,455,496 nodes with bounds for values in [3488, 3490], while for Nugent30, the tree has 12,000,000,000 nodes for 677 values in the interval [5448, 6124]. This extreme difficulty in addressing the QAP has resulted in the development of exact solution methods to be implemented on high-performance computers (7,38–43).
The parallel B&B presented in this section is based on the serial algorithm of Mautor and Roucairol (11) and uses the lower bound procedure (DP) of Hahn and Grant (44). This lower bound is based upon a new linear formulation of the QAP called the "level-1 RLT formulation". A new variable is defined,
$y_{ijkl} = x_{ij} x_{kl}$, and the formulation becomes:

$$(\mathrm{QAP})\quad \min\; \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{\substack{k=1 \\ k \neq i}}^{n}\sum_{\substack{l=1 \\ l \neq j}}^{n} C_{ijkl}\, y_{ijkl} \;+\; \sum_{i=1}^{n}\sum_{j=1}^{n} c_{ij}\, x_{ij}$$

subject to

$$\sum_{\substack{k=1 \\ k \neq i}}^{n} y_{ijkl} = x_{ij},\quad x_{ij} \in \{0,1\} \qquad \forall (i,j),\; l \neq j \tag{1.1}$$

$$\sum_{\substack{l=1 \\ l \neq j}}^{n} y_{ijkl} = x_{ij},\quad x_{ij} \in \{0,1\} \qquad \forall (i,j),\; k \neq i \tag{1.2}$$

$$y_{ijkl} = y_{klij} \qquad \forall (i,j),\; k \neq i,\; l \neq j \tag{1.3}$$

$$\sum_{j=1}^{n} x_{ij} = 1 \qquad \forall i = 1, \ldots, n \tag{1.4}$$

$$\sum_{i=1}^{n} x_{ij} = 1 \qquad \forall j = 1, \ldots, n \tag{1.5}$$

$$y_{ijkl} \geq 0 \qquad \forall (i,j,k,l),\; k \neq i,\; l \neq j \tag{1.6}$$
The linear programming relaxation of this program provides a lower bound for the QAP. But, as the number of variables and constraints is huge, it cannot be computed by commercial software. Hahn and Grant proposed a procedure, called DP, based on a successive dual decomposition of the problem (44). The DP bound is obtained iteratively by solving $n^2 + 1$ linear assignment problems (using the Hungarian method), instead of solving the large linear program given above.
A "polytomic" branching strategy is used (45). This strategy extends a node by creating all assignments of an unassigned facility to unassigned locations, based upon the counting of forbidden locations. A forbidden location is a location where the addition of the corresponding leader element would increase the lower bound beyond the upper bound. At a given unfathomed node, we generate children according to one of the following schemes:
• Row Branching. Fix $i \in I$. Generate a child problem for each $j \in J$ for which the problem with $x_{ij} = 1$ cannot be eliminated.
• Column Branching. Fix $j \in J$. Generate a child problem for each $i \in I$ for which the problem with $x_{ij} = 1$ cannot be eliminated.
Several branching strategies for choosing the candidate row $i$ or column $j$ for the next generation have been considered (a sketch of the resulting branching step follows the list):

1. SLC: Choose the row $i$ or column $j$ that maximizes the sum of leaders:
   $$\max\Bigl(\sum C_{ijij}\Bigr)$$
2. SLC_v2: Add the nonlinear elements of each submatrix to the associated leader and choose the row or column with maximal sum:
   $$\max\Bigl(C_{ijij} + \sum_{k=1}^{n}\sum_{l=1}^{n} C_{ijkl}\Bigr)$$
3. SLC_v3: Choose a row or column with a maximal sum of elements in the associated submatrices.
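A sketch of the row-branching step under these rules follows; the helper boundWithAssignment is a hypothetical stand-in for the DP bound recomputed with $x_{ij} = 1$ imposed. One child is generated for every location of the chosen facility that is not forbidden:

```cpp
#include <vector>

struct QapNode {
    std::vector<int> assigned;     // assigned[i] = site of facility i, or -1
    double lowerBound = 0.0;
};

// Hypothetical placeholder: a real implementation reruns the DP bound
// with facility i fixed at site j.
double boundWithAssignment(const QapNode& node, int /*i*/, int /*j*/) {
    return node.lowerBound;
}

// Row branching: fix facility i and create one child per site j whose
// bound stays below the incumbent (otherwise j is a forbidden location).
std::vector<QapNode> rowBranch(const QapNode& node, int i, double incumbent) {
    std::vector<QapNode> children;
    const int n = static_cast<int>(node.assigned.size());
    for (int j = 0; j < n; ++j) {
        bool siteFree = true;
        for (int k = 0; k < n; ++k)
            if (node.assigned[k] == j) siteFree = false;
        if (!siteFree) continue;                 // site already taken
        double lb = boundWithAssignment(node, i, j);
        if (lb >= incumbent) continue;           // forbidden location: prune
        QapNode child = node;
        child.assigned[i] = j;
        child.lowerBound = lb;
        children.push_back(std::move(child));
    }
    return children;
}
```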
The implementation of the B&B algorithm is made with the Bob++ framework (16) on top of the Athapascan environment. Athapascan is a macro data-flow application programming interface (API) for asynchronous parallel programming. The API permits the definition of the concurrency between computational tasks that synchronize through their accesses to objects in a global distributed memory. The parallelism is explicit and functional; the detection of the synchronization is implicit. The semantics is sequential, and an Athapascan program is independent of the target parallel architecture (cluster or grid). The execution of the program relies on an interpretation step that builds a macro data-flow graph. The graph is directed and acyclic (a DAG), and it encodes the computation and the data dependencies (reads and writes). It is used by the runtime support and the scheduling algorithm to compute a schedule of the tasks and a mapping of the data onto the architecture. The implementation is based on light-weight processes (threads) and one-sided communication (active messages).
In the context of B&B, an Athapascan task is a subproblem. The execution of a task yields the exploration of a subtree of the B&B tree. The set of ready tasks (the subproblems) is distributed over the processors: each processor locally stores a list of ready tasks. As Athapascan performs load balancing on these lists to ensure maximal efficiency, each processor always has a subproblem to work on. Each Athapascan list of ready tasks is equivalent to the local priority queue that stores subproblems in a parallel B&B. Thus, according to the strategies listed in Section 1.5, the Bob++/Athapascan framework proposes a fully distributed strategy.
First, we present results on a small cluster (named COCTEAU, located at the PRiSM laboratory, University of Versailles): 7 DELL workstations (dual-processor Xeon, 2.4 GHz, 2 GB RAM, 5 GB memory, Red Hat 7.3). The cluster was not dedicated, but was very steady. Table 1.1 displays the results.
TABLE 1.1. Runtime of Parallel QAP on COCTEAU Cluster of Size 10 (SLC strategy)

Instance   Sequential Time, min   Parallel Time, min   Efficiency
Nug18             23.31                  2.43            0.959
Nug20            246.98                 24.83            0.994
Nug21            548.31                 54.86            0.999
Nug22            402.43                 44.23            0.909
Nug24          10764.2                1100.56            0.978
The performance is very good, as the efficiency is close to 1; it decreases slightly as the size increases up to 24. To show that the results are not machine dependent, we display the results of experiments with the same number of processors on a larger cluster (located in Grenoble and named I-cluster 2): 104 HP nodes (dual Itanium-2, 900 MHz, 3 GB RAM, Intel icc 8 compiler) interconnected by a Myrinet network. As displayed in Table 1.2, with 14 processors the performances are similar, while for two problems, Nugent 18 and 20, favorable anomalies appear with a speedup >14.
Tables 1.1 and 1.2 show the results with the SLC branching strategy presented above. We also compare the different branching strategies (SLC, SLC_v2, SLC_v3) using the small cluster (14 processors) implementation. The results are displayed in Table 1.3. The total number of explored nodes is comparable, with a small advantage for strategy SLC_v3. Notice that the optimal solution is known for Nugent problems of size up to 30, and thus the upper bound is initialized to this value plus one unit. The parallel branch-and-bound algorithm developed the critical tree (the critical nodes C, representing incomplete solutions with a value strictly smaller than the optimal value f*), explored some nodes in the set of undecidable nodes M (nonterminal or incomplete solutions with a value equal to f*), and explored one node in O (the optimal nodes, with value f*). Indeed, the set E of eliminated nodes (with a value strictly >f*) is empty.
TABLE 1.2. Performance of Parallel QAP on I-Cluster 2 of Size 14

            Running Time, min                                      Performances
Instance    Sequential   Parallel (7 CPUs)   Parallel (14 CPUs)    Speedup   Efficiency, %
Nugent18       54.24           7.54                 3.39            16        114.28
Nugent20      588.46          84.29                41.96            14.02     100.17
Nugent21     1208.9          173.53                87.14            13.87      99.09
Nugent22      959.7          138.14                69.45            13.81      98.7
TABLE 1.3. Comparison of Branching Strategies

                  SLC_v2                  SLC_v3
Instance      Nodes       Time        Nodes       Time
Nug12            11       01.01          10       01.03
Nug14           121       04.04         122       04.04
Nug15           248       14.16         239       10.13
Nug17         1,653      173.74       1,674      107.07
Nug18         4,775      593.90       4,744      369.70
Nug20        42,223    5,248.15      42,232    4,892.72
Nug22        47,312    7,235.75      47,168    7,447.54
Nug24       266,575      73,821     265,687      69,365
TABLE 1.4. Runtime of Parallel B&B for QAP on a Large Cluster with 80 Processors (I-Cluster2, SLC)

Instance    1 CPU    20 CPUs    40 CPUs    50 CPUs    80 CPUs
Nug18       11.07      1.07       1.6        0.53       n.a.
Nug20      149.06      5.78       4.56       3.38       4.46
Nug21       61.28      2.41       2.11       1.55       2.98
Nug22      214.7       8.26       5.81       4.56       6.01
Nug24        n.a.     82.18      44.35      37.25      27.96
But the set M could be very large and, according to the branching strategy, the size of the explored tree could be very close to the size of the minimal tree. For Nugent 24, at the root node, the search interval is very thin in comparison with the number of explored nodes, while, later, many nodes have equal evaluations. In addition, tests have been conducted with a larger number of processors on the I-cluster 2. Table 1.4 shows that the results for the problem Nugent 24 with up to 80 processors are very good. The other problems are too small to keep all 80 processors busy; this is the reason why the times with 80 processors are slightly higher than those with 50 processors.
To complete this illustration, we discuss the implementation of the parallel B&B algorithm for the QAP using grid computing. Grid computing (or metacomputing, or using a computational grid) is the application of many geographically distributed, network-linked, heterogeneous computing resources to a single problem at the same time. The main features of metacomputing are:

• Dynamically available computing resources: machines may join the computation at any time.
• Unreliability: machines may leave without warning due to reboot, machine failure, network failure, and so on.
• Loose coupling: huge network, "low" communication speed; communication latencies are highly variable and unpredictable.
• Heterogeneous resources: many different machines (shared workstations, nodes of PC clusters, supercomputers) and characteristics (memory, processors, OS, network latency, and so on).
Programs should therefore be self-adaptable and fault tolerant. Grid computing requires the use of software based upon the resource management tools provided by projects like Globus (46), Legion (47), and Condor (28). The goal of these tools is to assign processors to a parallel application; these software tools do not, however, perform load balancing between the processes.
The advantage of such a platform compared to a traditional multiprocessor machine is that a large number of CPUs may be assembled very inexpensively. The disadvantage is that the availability of individual machines is variable and communication between processors may be very slow.
We experimented with our parallel QAP algorithm, using the Athapascan environment, on a French grid initiative called e-Toile, where six locations with clusters or supercomputers were linked by a very high-speed network. The middleware used was the e-Toile middleware, an evolution of the Globus middleware. We obtained and proved the optimal solution of the Nugent 20 instance in 1648 seconds (~27 min) using a cluster of 20 machines (AMD MP 1800+) at one location, while only 499 s (8.31 min) were required with the following 84 processors located at different sites:
• 7 dual-Xeon, 2.4 GHz, 2 GB (Laboratory PRiSM, Versailles).
• 10 dual-AMD MP 1800+, 1 GB (CEA, Saclay).
• 10 dual-Xeon, 2.2 GHz, 2 GB (EDF, Clamart).
• 15 dual-PIII, 1.4 GHz, 1 GB (ENS, Lyon).
A relative speedup of T20/T84 = 3.1 was thus achieved.
1.8. CONCLUSION
We have presented a summary of the basic concepts of the parallel B&B methodology applied to hard combinatorial optimization problems. This methodology is achieving very impressive results and is increasingly "accessible" given the continuous decrease in the costs of computers, networks, and communication devices.

We have also presented several software tools that help to implement parallel B&B algorithms. These tools offer a large variety of interfaces that allow the user to implement algorithms ranging from the basic B&B to the most sophisticated branch and cut. However, a lot of work remains to be done to include fault tolerance, self-adaptability, multiapplication support, and heterogeneity in these tools.

We have also presented the resolution of the Quadratic Assignment Problem, for which parallelism is truly a great asset, making it possible to solve instances that could never have been approached before.
REFERENCES
1. D. Applegate, R.E. Bixby, V. Chvatal, and W. Cook. On the solution of traveling salesman problems. Doc. Math., ICM(III):645–656 (1998).
2. K.M. Anstreicher and N.W. Brixius. A new bound for the quadratic assignment
problem based on convex quadratic programming. Math. Prog., 89(3):341–357
(2001).
3. J.P. Goux, K.M. Anstreicher, N.W. Brixius, and J. Linderoth. Solving large quadratic
assignment problems on computational grids. Math. Prog., 91(3):563–588 (2002).
4. B. Gendron and T.G. Crainic. Parallel Branch-and-Bound Algorithms: Survey and
Synthesis. Oper. Res., 42(6):1042–1066 (1994).
5. G. Authié et al. Parallélisme et Applications Irrégulières. Hermès, 1995.
6. C. Roucairol. A parallel branch and bound algorithm for the quadratic assignment
problem. Discrete Appl. Math., 18:211–225 (1987).
7. B. Mans, T. Mautor, and C. Roucairol. A parallel depth first search branch and bound algorithm for the quadratic assignment problem. Eur. J. Oper. Res., 81(3):617–628 (1995).
8. T.-H. Lai and S. Sahni. Anomalies in parallel branch-and-bound algorithms. Commun. ACM, 27:594–602 (June 1984).
9. G. Li and B.W. Wah. Coping with anomalies in parallel branch-and-bound. IEEE
Trans. Comp., C-35(6):568–573 (June 1986).
10. C. Roucairol. Recherche arborescente en parallèle. RR M.A.S.I. 90.4, Institut
Blaise Pascal—Paris VI, 1990. In French.
11. B. Mans and C. Roucairol. Theoretical comparisons of parallel best-first search
branch and bound algorithms. In H. France, Y. Paker, and I. Lavallée, eds.,
OPOPAC, International Workshop on Principles of Parallel Computing, Lacanau,
France, November 1993.
12. L.V. Kalé and V.A. Saletore. Parallel state-space search for a first solution with consistent linear speedups. Inter. J. Parallel Prog., 19(4):251–293 (1990).
13. A. Gupta and V. Kumar. Scalability of parallel algorithms for matrix multiplication. Proceedings of the 1993 International Conference on Parallel Processing, Vol. III—Algorithms & Applications, CRC Press, Boca Raton, FL, 1993, pp. III-115–III-123.
14. A. Makhorin. GLPK (GNU Linear Programming Kit). Available at http://www.gnu.org/software/glpk/glpk.html.
15. Y. Shinano, M. Higaki, and R. Hirabayashi. A generalized utility for parallel branch and bound algorithms. Proceedings of the 7th IEEE Symposium on Parallel and Distributed Processing (SPDP ’95), 1995, pp. 858–865. Available at http://al.ei.tuat.ac.jp/yshinano/pubb/.
16. B. Le Cun. Bob++ framework: User’s guide and API. Available at
http://www.prism.uvsq.fr/blec/Research/BOBO/.
17. S. Tschöke and T. Polzer. Portable parallel branch-and-bound library user manual, library version 2.0. Technical Report, University of Paderborn, 1998.
18. J. Eckstein, C.A. Phillips, and W.E. Hart. PICO: An object-oriented framework for parallel branch-and-bound. Technical Report 40-2000, RUTCOR Research Report, 2000.
19. Q. Chen and M.C. Ferris. FATCOP: A fault tolerant Condor-PVM mixed integer program solver. Technical Report, University of Wisconsin-Madison, Madison, WI, 1999.
20. E. Alba et al. Mallba: A library of skeletons for combinatorial optimisation. In B.
Monien and R. Feldman, eds., Euro-Par 2002 Parallel Processing, Vol. 2400 of
Lecture Notes in Computer Science, Springer-Verlag, Berlin Heidelberg, 2002,
pp. 927–932.
21. A. Brüngger, A. Marzetta, K. Fukuda, and J. Nievergelt. The parallel search bench ZRAM and its applications. Ann. Oper. Res., 90:45–63 (1999).
22. The COIN-OR Team. The coin-or organization. http://www.coinor.org/.
23. Y. Xu, T.K. Ralphs, L. Ladányi, and M.J. Saltzman. Alps: A framework for implementing parallel search algorithms. The Proceedings of the Ninth INFORMS Computing Society Conference (2005).
24. J. Goux, J. Linderoth, and M. Yoder. Metacomputing and the master-worker paradigm. Technical Report, Argonne National Laboratory, 1999.
25. T.K. Ralphs and M. Guzelsoy. The SYMPHONY callable library for mixed integer programming. The Proceedings of the Ninth INFORMS Computing Society Conference, San Francisco, USA (2005).
26. T.K. Ralphs. SYMPHONY 3.0.1 user’s manual. Available at http://www.branchandcut.org/.
27. M. Benaichouche et al. Bob: une plateforme unifiée de développement pour les
algorithmes de type branch-and-bound. RR 95/12, Laboratoire PRiSM, Université
de Versailles—Saint Quentin en Yvelines, May 1995. In French.
28. M. Litzkow, M. Livny, and M.W. Mutka. Condor—a hunter of idle workstations.
Proceedings of the 8th International Conference of Distributed Computing Systems
(ICDCS ’88), 1988, pp. 104–111.
29. V.S. Sunderam. PVM: a framework for parallel distributed computing. Concurrency, Practice and Experience, 2(4):315–340 (1990).
30. J.F. Mehaut and R. Namyst. PM2: Parallel Multithreaded Machine. A multithreaded
environment on top of PVM. Proceedings of EuroPVM’95, Lyon, September
1995.
31. L.V. Kalé and S. Krishnan. Charm++: Parallel programming with message-driven objects. In G.V. Wilson and P. Lu, eds., Parallel Programming Using C++, MIT Press, Boston, MA, 1996, pp. 175–213.
32. T. Gautier, R. Revire, and J.L. Roch. Athapascan: An API for asynchronous parallel programming. Technical Report RT-0276, INRIA, 2003.
33. T.K. Ralphs. Parallel Branch and Cut for Capacitated Vehicle Routing. Parallel
Comp., 29:607–629 (2003).
34. T.K. Ralphs, L. Ladányi, and M.J. Saltzman. Parallel Branch, Cut, and Price for Large-Scale Discrete Optimization. Math. Prog., 98(1–3):253–280 (2003).
35. B. Gendron and T.G. Crainic. A Parallel Branch-and-Bound Algorithm for
Multicommodity Location with Balancing Requirements. Comp. Oper. Res.,
24(9):829–847 (1997).
36. B. Bourbeau, B. Gendron, and T.G. Crainic. Branch-and-Bound Parallelization
Strategies Applied to a Depot Location and Container Fleet Management
Problem. Parallel Comp., 26(1):27–46 (2000).
37. A. Djerrah, V.D. Cung, and C. Roucairol. Solving large quadratic assignment problems on clusters and grids with Bob++/Athapascan. Fourth International Workshop of the PAREO Working Group on Parallel Processing in Operations Research, Mont-Tremblant, Montreal, Canada, January 2005.
38. J. Crouse and P. Pardalos. A parallel algorithm for the quadratic assignment
problem. Proceedings of Supercomputing 89, ACM, 1989, pp. 351–360.
39. A. Brüngger, A. Marzetta, J. Clausen, and M. Perregaard. Solving large-scale QAP
problems in parallel with the search library ZRAM. J. Parallel Distributed Comp.,
50(1–2):157–169 (1998).
40. V.-D. Cung, S. Dowaji, B. Le Cun, T. Mautor, and C. Roucairol. Concurrent data structures and load balancing strategies for parallel branch-and-bound/A*
algorithms. Third Annual Implementation Challenge Workshop, DIMACS, New Brunswick, NJ, October 1994.
41. B. Le Cun and C. Roucairol. Concurrent data structures for tree search algorithms. In A. Ferreira and J. Rolim, eds., IFIP WG 10.3, IRREGULAR 94: Parallel Algorithms for Irregularly Structured Problems, Kluwer Academic, Geneva, Switzerland, September 1994, pp. 135–155.
42. Y. Denneulin, B. Le Cun, T. Mautor, and J.F. Mehaut. Distributed branch and bound algorithms for large quadratic assignment problems. 5th Computer Science Technical Section on Computer Science and Operations Research, Dallas, TX, 1996.
43. Y. Denneulin and T. Mautor. Techniques de régulation de charge—applications en optimisation combinatoire. ICaRE’97, Conception et mise en oeuvre d’applications parallèles irrégulières de grande taille, Aussois, 1997, pp. 215–228. In French.
44. P. Hahn and T. Grant. Lower bounds for the quadratic assignment problem based upon a dual formulation. Oper. Res., 46:912–922 (1998).
45. T. Mautor and C. Roucairol. A new exact algorithm for the solution of quadratic assignment problems. Discrete Appl. Math., 55:281–293 (1994).
46. I. Foster and C. Kesselman. Globus: A metacomputing infrastructure toolkit. Inter. J. Supercomp. Appl. High Performance Comp., 11(2):115–128 (Summer 1997).
47. A. Natrajan et al. The Legion grid portal. Grid Computing Environments 2001, special issue of Concurrency and Computation: Practice and Experience, 14(13–15):1365–1394 (2002).