The Traveling Salesman Problem: A Neural Network Perspective
Jean-Yves Potvin
Introduction
Exact Algorithms
Min Σ_i Σ_j d_ij x_ij

subject to:

Σ_j x_ij = 1 ,  i = 1,...,N
Σ_i x_ij = 1 ,  j = 1,...,N
(x_ij) ∈ X
x_ij = 0 or 1 ,
where d_ij is the distance between vertices i and j and the x_ij's are the
decision variables: x_ij is set to 1 when arc (i,j) is included in the tour,
and 0 otherwise. (x_ij) ∈ X denotes the set of subtour-breaking
constraints that restrict the feasible solutions to those consisting of a
single tour.
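To make the formulation concrete, here is a small Python sketch (the helper names are hypothetical, not from the paper) that evaluates a candidate 0-1 matrix x against the degree constraints, the single-tour requirement that the subtour-breaking constraints enforce, and the objective:

```python
import numpy as np

def assignment_feasible(x):
    """Degree constraints: each row and each column of x sums to 1."""
    return (x.sum(axis=1) == 1).all() and (x.sum(axis=0) == 1).all()

def is_single_tour(x):
    """Follow the arcs from vertex 0; a feasible TSP solution visits all
    N vertices before returning to the start (this plays the role of the
    subtour-breaking constraints (x_ij) in X)."""
    n = len(x)
    seen, i = set(), 0
    while i not in seen:
        seen.add(i)
        i = int(np.argmax(x[i]))   # successor of vertex i
    return len(seen) == n and i == 0

def tour_length(d, x):
    """Objective: sum of d_ij over the selected arcs."""
    return float((d * x).sum())

# A 4-city example: the tour 0 -> 1 -> 2 -> 3 -> 0.
d = np.array([[0, 1, 2, 1],
              [1, 0, 1, 2],
              [2, 1, 0, 1],
              [1, 2, 1, 0]], dtype=float)
x = np.zeros((4, 4))
for i, j in [(0, 1), (1, 2), (2, 3), (3, 0)]:
    x[i, j] = 1

print(assignment_feasible(x), is_single_tour(x), tour_length(d, x))
```

Note that a matrix built from two disjoint subtours also satisfies the two degree constraints, which is exactly why the subtour-breaking constraints are needed.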
Fig. 1. (a) Solving the TSP, (b) solving the assignment problem.
Heuristic Algorithms
The reader should also note that general surveys on the use of
neural networks in combinatorial optimization may be found in
[22, 70]. An introductory paper about the impacts of neurocomputing
on operations research may be found in [29].
case, the first unit that turns on sends a positive excitatory signal to
the other unit through that connection to facilitate its activation.
[Figure: (a) a five-city TSP on Montreal, Toronto, Boston, NY and LA, with inter-city distances such as d_NY,LA; (b) the corresponding matrix of units, with one row per city and one column per tour position (1 to 5), and connection weights such as T_LA5,NY5.]
where U_i, I_i and V_i are the input, input bias, and activation level of
unit i, respectively. The activation level of unit i is a function of its
input, namely V_i = g(U_i), where the parameter U_0 is
used to modify the slope of the function. In Figure 4, for example, the
U_0 value is lower for curve (2) than for curve (1).
[Figure 4: the sigmoid activation function g(U_i), rising from g(U_i) = 0 to g(U_i) = 1 around U_i = 0, for two slope parameters: curve (2) is steeper than curve (1).]
U_i = Σ_j T_ij V_j + I_i .
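These two relations can be sketched as follows; the particular sigmoid g(U) = (1 + tanh(U/U_0))/2 is one common choice in the Hopfield-Tank literature, and the connection weights and biases below are illustrative only:

```python
import numpy as np

def g(u, u0):
    """Sigmoid activation; a smaller u0 gives a steeper transition
    around u = 0 (curve (2) vs. curve (1) in Figure 4)."""
    return 0.5 * (1.0 + np.tanh(u / u0))

def unit_input(T, V, I, i):
    """U_i = sum_j T_ij V_j + I_i."""
    return T[i] @ V + I[i]

# Three units with symmetric connections and a small bias.
T = np.array([[0.0, 1.0, -1.0],
              [1.0, 0.0,  0.5],
              [-1.0, 0.5, 0.0]])
V = np.array([0.2, 0.8, 0.5])
I = np.array([0.1, 0.1, 0.1])

u = unit_input(T, V, I, 0)   # 0.8 - 0.5 + 0.1 = 0.4
print(g(u, 1.0), g(u, 0.1))  # the steeper curve saturates faster
```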
E = A/2 Σ_X Σ_i Σ_{j≠i} V_Xi V_Xj
  + B/2 Σ_i Σ_X Σ_{Y≠X} V_Xi V_Yi
  + C/2 (Σ_X Σ_i V_Xi − N)²
  + D/2 Σ_X Σ_{Y≠X} Σ_i d_XY V_Xi (V_Y,i+1 + V_Y,i−1) ,  (1.5)

I_Xi = + C N_e ,
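A direct transcription of energy (1.5) may clarify the role of the four terms. The sketch below uses illustrative unit penalty weights and treats tour positions modulo N, as is usual for this model:

```python
import numpy as np

def ht_energy(V, d, A, B, C, D):
    """Energy (1.5): V[X, i] is the activation of the unit for city X at
    tour position i; d[X, Y] is the inter-city distance matrix."""
    N = V.shape[0]
    # A term: two units on in the same row (one city, two positions).
    row = sum(V[X, i] * V[X, j]
              for X in range(N) for i in range(N) for j in range(N) if j != i)
    # B term: two units on in the same column (two cities, one position).
    col = sum(V[X, i] * V[Y, i]
              for i in range(N) for X in range(N) for Y in range(N) if Y != X)
    # C term: total activation must equal N.
    tot = (V.sum() - N) ** 2
    # D term: tour length, positions taken modulo N.
    length = sum(d[X, Y] * V[X, i] * (V[Y, (i + 1) % N] + V[Y, (i - 1) % N])
                 for X in range(N) for Y in range(N) if Y != X
                 for i in range(N))
    return A / 2 * row + B / 2 * col + C / 2 * tot + D / 2 * length

# A valid 3-city tour: city X visited at position X.
V = np.eye(3)
d = np.array([[0., 1., 2.], [1., 0., 1.], [2., 1., 0.]])
# The A, B, C terms vanish; the D/2 term equals the tour length.
print(ht_energy(V, d, A=1, B=1, C=1, D=1))
```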
(a) Solving a TSP with N cities requires O(N²) units and O(N⁴)
connections.
(b) The optimization problem is not solved in a problem space of
O(N!), but in a space of O(2^(N²)) where many configurations
correspond to infeasible solutions.
(c) Each valid tour is represented 2N times in the Hopfield-Tank
model because any one of the N cities can be chosen as the
starting city, and the two orientations of the tour are
equivalent for a symmetric problem. This phenomenon is
referred to as "2N-degeneracy" in neural network terminology.
(a) In [81, 97, 98] the activation levels of the units are normalized
so that Σ_i V_Xi = 1 for all cities X. The introduction of these additional
constraints is only one aspect of the problem-solving methodology,
which is closely related to the simulated annealing heuristic.
Accordingly, the full discussion is deferred to Section 2.4, where
simulated annealing is introduced.
P_T(s') = e^(−E(s')/T) / Σ_s e^(−E(s)/T) .  (2.1)
Here, E(s') is the energy of configuration s', and the denominator is
the summation over all configurations. According to that probability,
configurations of high energy are very likely to be observed at high
temperatures and much less likely to be observed at low
temperatures. The inverse is true for low energy configurations.
Hence, by gradually reducing the temperature parameter T and by
allowing the system to reach equilibrium at each temperature, the
system is expected to ultimately settle down at a configuration of low
energy.
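Equation (2.1) is easy to tabulate for a toy configuration set; the sketch below (with made-up energies) shows the near-uniform distribution at high T and the concentration on the low-energy configuration at low T:

```python
import math

def boltzmann(energies, T):
    """P_T(s') = exp(-E(s')/T) / sum_s exp(-E(s)/T)   (eq. 2.1)."""
    weights = [math.exp(-e / T) for e in energies]
    Z = sum(weights)
    return [w / Z for w in weights]

# Three configurations with energies 1, 2 and 5.
E = [1.0, 2.0, 5.0]
hot = boltzmann(E, T=100.0)   # high T: nearly uniform
cold = boltzmann(E, T=0.1)    # low T: mass concentrates on E = 1
print(hot)
print(cold)
```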
(c) The research described in [81, 97, 98], which is derived from the
mean-field theory, is probably the most important contribution to
the literature relating to the Hopfield-Tank model since its original
description in [51]. The term "mean-field" refers to the fact that the
model computes the mean activation levels of the stochastic binary
units of a Boltzmann machine.
E = d_max/2 (Σ_i Σ_X Σ_{Y≠X} V_Xi V_Yi)
4. Compute

   V_Xi = e^(U_Xi/T) / Σ_j e^(U_Xj/T) ,  i = 1,...,N .

5. Evaluate the energy E.
V_Xi = e^(U_Xi/T) / Σ_j e^(U_Xj/T) ,  (2.3)

where U_Xi = −dE/dV_Xi (see Step 3 of the algorithm).
As noted in [98], all the activation levels are the same at high
temperatures, that is, V_Xi → 1/N when T → ∞. As the temperature
parameter is lowered, each city gradually settles into a single
position, because such configurations correspond to low energy
states. In addition, the model also prevents two cities from occupying
the same position, because a penalty of d_max/2 is incurred in the
energy function. If the parameter d_max is set to a value slightly
larger than twice the largest distance between any two cities, the
network can find a configuration with lower energy simply by
moving one of the two cities into the empty position. Feasible tours
are thus guaranteed through the combined actions of the new energy
function and the additional constraints imposed on the activation
levels V_Xi (once again, for a sufficiently small parameter value T).
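One annealing run under these rules can be sketched as follows. For brevity the sweep is synchronous, the gradient U_Xi = −dE/dV_Xi is written assuming the energy combines the d_max/2 penalty with the usual tour-length term, and the parameter values are illustrative:

```python
import numpy as np

def meanfield_step(V, d, dmax, T):
    """One synchronous mean-field sweep: compute U_Xi = -dE/dV_Xi, then
    apply the normalized update (2.3), V_Xi = exp(U_Xi/T) / sum_j exp(U_Xj/T)."""
    N = V.shape[0]
    Vnew = np.empty_like(V)
    for X in range(N):
        U = np.empty(N)
        for i in range(N):
            overlap = V[:, i].sum() - V[X, i]   # other cities at position i
            length = sum(d[X, Y] * (V[Y, (i + 1) % N] + V[Y, (i - 1) % N])
                         for Y in range(N) if Y != X)
            U[i] = -dmax * overlap - length
        w = np.exp((U - U.max()) / T)           # numerically stable softmax
        Vnew[X] = w / w.sum()
    return Vnew

# Anneal from a slightly perturbed uniform state.
rng = np.random.default_rng(0)
N = 4
d = rng.random((N, N)); d = (d + d.T) / 2; np.fill_diagonal(d, 0)
V = np.full((N, N), 1.0 / N) + 0.01 * rng.random((N, N))
T = 1.0
for _ in range(60):
    V = meanfield_step(V, d, dmax=2 * d.max() + 0.1, T=T)
    T *= 0.95
print(V.round(2))   # rows approach 0/1 values as T decreases
```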
(e) In [8, 26, 66], random noise is introduced into the activation
level of the units in order to escape from local minima. The random
noise follows a normal distribution with mean zero, and its intensity
is modulated by a temperature parameter which is gradually
reduced, as in the simulated annealing heuristic.
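A minimal sketch of this idea, assuming the noise standard deviation is simply set to the temperature T (the cited papers differ in the exact schedule):

```python
import numpy as np

rng = np.random.default_rng(1)

def noisy_activation(U, T, u0=0.02):
    """Sigmoid activation of a noisy input: zero-mean Gaussian noise
    with standard deviation T is added to the unit input U before the
    squashing function is applied."""
    noise = rng.normal(0.0, T, size=np.shape(U))
    return 1.0 / (1.0 + np.exp(-(U + noise) / u0))

U = np.array([0.1, -0.1])
for T in (1.0, 0.1, 0.001):   # annealing: the perturbation shrinks over time
    print(T, noisy_activation(U, T))
```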
In this new energy function, the first two terms penalize cities that
do not have exactly two neighbors.
Study                          Size   Neural net     Heuristic   Heuristic result
Cuykendall and Reese (1989)    165    9.409-9.933 a  NA          NA
Peterson and Soderberg (1989)  200    NA             SA          NA
Xu and Tsai (1991)             100    592.3          TO          570.2 c
Xu and Tsai (1991)             150    9.529 b        LK          9.616 d

a Computation time of 1 to 10 hours on an Apollo DN4000
b Best of 5 runs with post-optimization by Lin-Kernighan
c Best of 25 runs starting from randomly generated tours
d Best of 30 runs starting from randomly generated tours

LK Lin-Kernighan
MT Manual Tour
NA Not Available
SA Simulated Annealing
TO 2-opt
Although the work in [81, 97, 98] has shown that it is possible to
design models that consistently converge to feasible tours (one of the
problems with the original formulation), those tours do not yet
compare with the tours generated by other TSP heuristics, like
simulated annealing. Furthermore, the convergence is quite slow
when the model is executed on a sequential computer. These
observations have led individuals to explore new lines of research, in
particular, elastic nets and self-organizing maps.
Figures 6a, 6b and 6c show how the elastic net typically evolves
over time. In the figure, the small black circles are the points located
on the ring which are migrating towards the cities. The final solution
is shown in Figure 6d.
Σ_k φ(d_{X_i,Y_k}, K)

φ(d, K) = e^(−d²/2K²) .
Fig. 6. Evolution of the elastic net over time (a), (b), (c),
and the final tour (d).
Figure 7 illustrates the two forces that impact point j. The first
force (1), derived from the α term in the update equations, is quite
easy to understand and is aimed at driving point j towards city i. The
second force (2), derived from the β term, can be more easily
interpreted by considering the equivalence

ΔY_j = −K dE/dY_j ,

where

E = −αK Σ_i ln Σ_j φ(d_{X_i,Y_j}, K) + β/2 Σ_j d²_{Y_j,Y_j+1} .  (3.2)
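The two forces can be written out directly from (3.2). The sketch below uses the standard Durbin-Willshaw update ΔY_j = α Σ_i w_ij (X_i − Y_j) + βK (Y_{j+1} − 2Y_j + Y_{j−1}); the city coordinates, ring size and parameter values are illustrative:

```python
import numpy as np

def elastic_step(Y, X, K, alpha=0.2, beta=2.0):
    """One update of the ring points Y given cities X:
    Delta Y_j = alpha * sum_i w_ij (X_i - Y_j)            force (1)
              + beta * K * (Y_{j+1} - 2 Y_j + Y_{j-1}),   force (2)
    with w_ij = phi(d_{X_i Y_j}, K) / sum_k phi(d_{X_i Y_k}, K)
    and phi(d, K) = exp(-d^2 / 2K^2)."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)   # squared distances
    s = d2 - d2.min(axis=1, keepdims=True)                    # shift: stable at small K
    w = np.exp(-s / (2.0 * K * K))
    w = w / w.sum(axis=1, keepdims=True)                      # normalize over ring points
    pull = alpha * (w.T @ X - w.sum(axis=0)[:, None] * Y)
    tension = beta * K * (np.roll(Y, -1, axis=0) - 2.0 * Y + np.roll(Y, 1, axis=0))
    return Y + pull + tension

# Five cities; the ring starts as a small circle and K is annealed.
X = np.array([[0., 0.], [1., 0.], [1., 1.], [0., 1.], [0.5, 1.5]])
theta = np.linspace(0.0, 2.0 * np.pi, 15, endpoint=False)
Y = np.c_[0.5 + 0.1 * np.cos(theta), 0.6 + 0.1 * np.sin(theta)]
K = 0.2
for _ in range(300):
    Y = elastic_step(Y, X, K)
    K = max(0.01, 0.99 * K)
print(Y.round(2))   # the ring points have migrated towards the cities
```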
[Figure 7: the two forces acting on ring point j: force (1) pulls j towards city i, while force (2) pulls j towards its ring neighbors j−1 and j+1.]
s_ij = φ(d_{Y(X_i),Y_j}, K) / Σ_k φ(d_{Y(X_i),Y_k}, K) .
Size   EN     PS
50     5.62   6.61

EN Elastic Net
PS Peterson and Soderberg

Table 3. Comparison of Results for the Elastic Net and the Model of
Peterson and Soderberg
[Figure: a self-organizing map with input units X_1, X_2 and output units O_1, O_2, ..., O_M; the weight vector of output unit M is T_M = (T_1M, T_2M).]
(a) Fort [36] was one of the first, with Angeniol et al. [9], Hueter [52]
and Ritter and Schulten [85], to apply Kohonen's ideas to the TSP. Fort
notes, in particular, that the speed of convergence can be increased
by reducing the neighborhood of the winning unit and by reducing
the modification to the weights of the neighboring units over time. In
his experiments, Fort uses 2N output units and solves problems with
up to 400 cities. With respect to the 400-city problem, he generates a
tour of length 15.73, as compared to an estimated optimum of 14.98
derived from Stein's formula [94].
K ← (1- α ) K ,
with 0 < α < 1 .
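A minimal sketch of such a self-organizing ring for the TSP follows, with an exponential neighborhood function and the geometric decay above applied to the neighborhood width; the gain, ring size and epoch count are illustrative choices, not values from the paper:

```python
import numpy as np

def som_step(W, city, G, mu=0.6):
    """Present one city: move the winning ring unit and its neighbors
    toward it, with strength decaying as exp(-n^2 / G^2) in ring
    distance n (one common neighborhood form; papers differ)."""
    M = len(W)
    j_star = int(np.argmin(((W - city) ** 2).sum(axis=1)))          # winning unit
    idx = np.arange(M)
    n = np.minimum(np.abs(idx - j_star), M - np.abs(idx - j_star))  # ring distance
    f = np.exp(-(n.astype(float) ** 2) / (G * G))
    return W + mu * f[:, None] * (city - W)

rng = np.random.default_rng(3)
cities = rng.random((10, 2))
W = rng.random((25, 2))          # 2.5N output units on a ring
G, alpha = 10.0, 0.05
for _ in range(80):              # shrink the neighborhood: G <- (1 - alpha) G
    for c in cities[rng.permutation(len(cities))]:
        W = som_step(W, c, G)
    G *= 1.0 - alpha
print(W.round(2))
```

Reading the final tour off the ring amounts to visiting the cities in the order of their winning units.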
On the other hand, the gap between the elastic net or the self-
organizing map and the best heuristics of OR is still quite large. For
example, the study of Fritzke and Wilke [37] shows that the tours
generated by their self-organizing map are 5% to 10% longer than the
optimal tours for ETSPs with 500 to 2,392 cities. These results can
hardly be compared with the tours generated by the heuristics of
Johnson and Bentley [12, 53]. In the first case, solutions within 1% of
the optimum are reported for problems with 10,000 cities, while in
the second case, solutions within 4% of the optimum are reported on
problems with 1,000,000 cities.
References
16. M.C.S. Boeres, L.A.V. de Carvalho (1992), "A Faster Elastic Net
Algorithm for the Traveling Salesman Problem", in Proceedings
of the Int. Joint Conf. on Neural Networks, Baltimore, MD,
pp. II-215-220.
20. L.I. Burke (1992), "Neural Methods for the Traveling Salesman
Problem: Insights from Operations Research", presented at the
ORSA-TIMS Joint Meeting, San Francisco, CA, November 1992.
21. L.I. Burke, P. Damany (1992), "The Guilty Net for the Traveling
Salesman Problem", Computers & Operations Research 19(3/4),
pp. 255-265.
22. L.I. Burke, J.P. Ignizio (1992), "Neural Networks and Operations
Research: An Overview", Computers & Operations Research
19(3/4), pp. 179-189.
23. D.J. Burr (1988), "An Improved Elastic Net Method for the
Traveling Salesman Problem", in Proceedings of the IEEE Int.
Conf. on Neural Networks, San Diego, CA, pp. I-69-76.
27. W.I. Clement, R.M. Inigo, E.S. McVey (1988), "Synaptic Strengths
for Neural Simulation of the Traveling Salesman Problem", in
Proceedings of Applications of Artificial Intelligence, SPIE 937,
Orlando, FL, pp. 373-380.
54. A. Joppe, H.R.A. Cardon, J.C. Bioch (1990), "A Neural Network for
Solving the Traveling Salesman Problem on the Basis of City
Adjacency in the Tour", in Proceedings of the Int. Joint Conf. on
Neural Networks, San Diego, CA, pp. III-961-964.
55. A. Joppe, H.R.A. Cardon, J.C. Bioch (1990), "A Neural Network for
Solving the Traveling Salesman Problem on the Basis of City
Adjacency in the Tour", in Proceedings of the Int. Neural
Network Conf., Paris, France, pp. 254-257.
65. E.L. Lawler, J.K. Lenstra, A.H.G. Rinnooy Kan, D.B. Shmoys
(1985), The Traveling Salesman Problem: A Guided Tour of
Combinatorial Optimization, Wiley, Chichester.
96. D.A. Thomae, D.E. Van den Bout (1990), "Encoding Logical
Constraints into Neural Network Cost Functions", in Proceedings
of the Int. Joint Conf. on Neural Networks, San Diego, CA,
pp. III-863-868.
97. D.E. Van den Bout, T.K. Miller III (1988), "A Traveling Salesman
Objective Function that Works", in Proceedings of the IEEE Int.
Conf. on Neural Networks, San Diego, CA, pp. II-299-303.
98. D.E. Van den Bout, T.K. Miller III (1989), "Improving the
Performance of the Hopfield-Tank Neural Network through
Normalization and Annealing", Biological Cybernetics 62,
pp. 129-139.
99. D.E. Van den Bout, T.K. Miller III (1989), "TInMann: The Integer
Markovian Artificial Neural Network", in Proceedings of the Int.
Joint Conf. on Neural Networks, Washington, DC, pp. II-205-211.
107. C.S. Yu, W.D. Lee (1992), "Parallel Mean Field Annealing Neural
Network for Solving the Traveling Salesman Problem", in
Proceedings of the Int. Joint Conf. on Neural Networks,
Baltimore, MD, pp. IV-532-536.