Session 2520
Optimization Using the Simulated Annealing Algorithm
Edgar N. Reyes, Dennis I. Merino, Carl Steidley
Southeastern Louisiana University/Texas A&M University - Corpus Christi
Hammond, LA 70402/Corpus Christi, TX 78412
Abstract
In this paper we will briefly review the simulated annealing algorithm, an algorithm with applications in
optimization and pattern recognition used extensively in artificial intelligence. In earlier papers the authors analyzed
a simulation of the annealing of a solid, a dodecahedron in particular. Our use of this algorithm, which is
based in the field of combinatorial optimization, reflects properties of Boltzmann machines, a neural network
model characterized by massive parallelism.
We will demonstrate two implementations of the simulated annealing algorithm. Each implementation
depends upon a neighborhood structure and a transition mechanism. In the first implementation our neighborhood
structure is a linear transformation of the vector space of all configurations and the transition probability is
deterministic. In this case, we will use techniques from character theory of finite groups to analyze simulated
annealing. In the second implementation, which includes the first as a special case, our
neighborhood structure is a set-valued function and the transition mechanism is stochastic in nature. In this case, we
use techniques from matrix analysis, in particular properties of doubly stochastic matrices, to analyze simulated
annealing modeled on a class of Boltzmann machines.
For pattern recognition, we use the simulated annealing algorithm to solve the classic seven-segment display
problem. This is a classification problem which we will solve by choosing an appropriate Boltzmann machine.
1. Introduction.
Annealing is the physical process of heating a solid, followed by a specified slow
cooling process. We shall use the simulated annealing algorithm, a method based in the field of
combinatorial optimization, to describe simulated controlled cooling processes. In the annealing
process, one can interpret the states (and free energy) of the solid in the cooling process as
solutions (and cost function, respectively) of a combinatorial optimization problem [1]. Our use
of the simulated annealing algorithm reflects properties of Boltzmann machines, a neural network
model belonging to the class of connectionist models, whose features include massive parallelism.
Also, we will use an appropriate Boltzmann machine to solve a pattern recognition problem,
namely, the seven-segment display problem. The display of the decimal digits in a wristwatch,
for instance, uses a seven-segment display. In identifying the digit displayed, we will maximize an
overall measure of the desirability of the Boltzmann machine.
We shall briefly review some aspects of Boltzmann machines and the simulated annealing
algorithm. Let (U, C) be a network consisting of a set of units, U = {u_i : i = 1, ..., n}, and a set of
connections, C, consisting of unordered pairs {u_i, u_j}. A connection {u_i, u_j} in C is said to join
u_i to u_j. Intrinsic to Boltzmann machines are the notions of a connection strength s and a
configuration k of the network (U, C); they are real-valued functions defined on C and U,
respectively. The values s({u_i, u_j}) and k(u_i) give us the strength of the connection
{u_i, u_j} and the state of the unit u_i, respectively. Thus, a Boltzmann machine is a network (U, C)
with a given connection strength s [1, chapter 8]. An objective of a Boltzmann machine is to find
an optimal configuration k_opt in the space S of all configurations k that minimizes the consensus
function defined by

C(k) = ∑_{{u_i, u_j} ∈ C} s({u_i, u_j}) k(u_i) k(u_j).    (1.1)
The values of the consensus function provide an overall measurement of desirability of the
connections and the states of the units. The function in (1.1) is usually called the cost function in
combinatorial optimization.
To optimize (1.1), we will use the simulated annealing algorithm. There are several ways to
implement this algorithm; each depends on a neighborhood structure and a
transition mechanism. A neighborhood structure is a function N from S into P(S), the family of
all subsets of S. A configuration l in N(k) is called a neighbor of k. To optimize the consensus
(1.1), we need a mechanism which allows a configuration to change. Given a configuration k, we
shall randomly generate a neighbor l in N(k), with the neighborhood structure being defined at
the outset, and then it will be determined whether l will replace k. Specifically, let X(m) be the
configuration on the mth trial and let P_{k,l}(m) = P(X(m) = l | X(m−1) = k) be the probability of
accepting configuration l on the mth trial given that the configuration of the (m−1)th trial is k.
Under certain conditions, such as those discussed in [1, pages 18, 42, and 46], the sequence of
configurations generated by the simulated annealing algorithm asymptotically converges to an
optimal configuration.
The controlled cooling process is represented by a sequence {c_m | m = 0, 1, 2, 3, ...}
of real numbers. Following [5], simulated annealing is described by the algorithm below.

Begin Simulated Annealing Algorithm
    Initialize:
        k0 := an initial configuration;
        m := 0;  (a trial counter)
    Do:
        generate k in N(k0);
        if C(k) ≤ C(k0) then k0 := k;
        else if P_{k0,k}(m) > Random(0,1) then k0 := k;
        m := m + 1;
    Until: Stop Criterion(c_m) is satisfied
End
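The acceptance probability P_{k0,k}(m) is left abstract in the algorithm above; a standard concrete choice, the Metropolis criterion used in [1], is exp(−ΔC/c_m). A minimal Python sketch of the loop under that assumption, applied to a hypothetical toy cost function rather than a consensus function, might look like:

```python
import math
import random

def simulated_annealing(k0, consensus, neighbor, schedule, trials=1000):
    """Minimize `consensus` by the algorithm above.

    k0        -- initial configuration
    consensus -- cost function C(k) to be minimized
    neighbor  -- generates a random l in N(k)
    schedule  -- m -> c_m, the control parameter on trial m
    """
    k = k0
    for m in range(trials):
        l = neighbor(k)
        delta = consensus(l) - consensus(k)
        # Accept improvements outright; accept uphill moves with the
        # (assumed) Metropolis probability exp(-delta / c_m).
        if delta <= 0 or math.exp(-delta / schedule(m)) > random.random():
            k = l
    return k

# Hypothetical toy problem: minimize (x - 3)^2 over the integers.
random.seed(0)
best = simulated_annealing(
    k0=20,
    consensus=lambda x: (x - 3) ** 2,
    neighbor=lambda x: x + random.choice([-1, 1]),
    schedule=lambda m: 10.0 / (1 + m),
)
```

With the decreasing schedule c_m = 10/(1+m), uphill moves are frequent early on and essentially impossible by the final trials, so the walk settles at the minimizer.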
2. A Special Case: Cooling of a Dodecahedron
Our first implementation is the simulated annealing of a dodecahedron, i.e., a regular solid with
12 faces, which we choose because of its inherent symmetry. We start with a network (U_d, C_d)
consisting of units that are connected in some way. The set U_d = {u_i | i = 1, ..., 12} represents the
set of faces of a dodecahedron. A connection {u_i, u_j} in C_d is said to join u_i to u_j. For
computational purposes, let us suppose the following pairs of faces are opposite each other:

u1 and u2,  u3 and u10,  u4 and u11,
u5 and u12,  u6 and u8,  u7 and u9.    (2.2)
In describing a cooling process for the dodecahedron, we label the 12 faces of the dodecahedron
with real numbers. These numbers play the role of the temperatures of the faces. Let us suppose,
in this cooling process, that the numbers (or temperatures) change every minute. At the first
minute, initially, assume for each i in {1, ..., 12} that face u_i is labeled with the number i
representing its temperature. On the second minute, the number on each of the faces changes
according to a specified schedule: the number on face u_i becomes the average of the first
minute's numbers except those that were on u_i and on the face opposite u_i. For example, on the
second minute the number on face u1 is (1/10) ∑_{i=3}^{12} i = 7.5; since u2 is the face
opposite u1, the numbers on the second minute on u1 and u2 must be equal. Also, the
number on the third and tenth faces on the second minute is given by (1/10) ∑_{i=1, i≠3,10}^{12} i = 6.5.
We continue with this process: on the third minute, the number on face u_i becomes the average of the
second minute's numbers except those that were on u_i and on the face opposite u_i. Repeating
this process for four minutes, we find that the numbers on the faces are as given below (rounded to 1
decimal place).
Min   u1   u2   u3   u4   u5   u6   u7   u8   u9   u10  u11  u12
 1    1    2    3    4    5    6    7    8    9    10   11   12
 2    7.5  7.5  6.5  6.3  6.1  6.4  6.2  6.4  6.2  6.5  6.3  6.1
 3    6.3  6.3  6.5  6.5  6.6  6.5  6.6  6.5  6.6  6.5  6.5  6.6
 4    6.5  6.5  6.5  6.5  6.5  6.5  6.5  6.5  6.5  6.5  6.5  6.5
(2.3)
In particular, the numbers (or temperatures) for the first four minutes on face u1 are 1, 7.5, 6.3,
and 6.5. For each face, note that the number on the fourth minute is approximately 6.5, which is
the average of the integers 1,...,12. Then as the number of minutes increases (or as more trials or
iterations are done) the numbers on the faces shall all be approximately equal to 6.5.
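The cooling schedule above can be reproduced directly. The sketch below encodes the opposite-face pairing (2.2), iterates the averaging rule for three further minutes, and observes that the faces settle near 6.5, the mean of 1, ..., 12:

```python
# Reproduce the cooling schedule of Section 2: each minute, the number on
# face u_i becomes the average of the previous minute's numbers on the ten
# faces other than u_i and its opposite face (pairing as in (2.2)).
OPPOSITE = {1: 2, 2: 1, 3: 10, 10: 3, 4: 11, 11: 4,
            5: 12, 12: 5, 6: 8, 8: 6, 7: 9, 9: 7}

def cool(temps):
    """One minute of cooling; temps maps face index 1..12 to a number."""
    return {i: sum(t for j, t in temps.items()
                   if j != i and j != OPPOSITE[i]) / 10.0
            for i in temps}

temps = {i: float(i) for i in range(1, 13)}   # minute 1: face u_i holds i
for _ in range(3):                            # minutes 2, 3, and 4
    temps = cool(temps)
# By minute 4 every face is approximately 6.5, the mean of 1..12.
```

Note that the rule preserves the total (each of the twelve numbers is dropped from exactly two of the twelve averages), which is why the common limiting value is the initial mean.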
To find the `equilibrium or optimal state' of the dodecahedron, i.e., the configuration when the
temperatures on the faces are all equal, we shall seek to minimize

∑_{i=1}^{12} (k(u_i) − m(k))²    (2.4)

where m(k) = (1/12) ∑_{i=1}^{12} k(u_i), over all configurations k. To implement the simulated annealing
algorithm, we need to interpret (2.4) as the consensus of k. To do this, we shall need to define an
appropriate connection strength. Simplifying (2.4), we obtain

∑_{i=1}^{12} (k(u_i) − m(k))² = (11/12) ∑_{i=1}^{12} k(u_i)² − (1/6) ∑_{i<j} k(u_i) k(u_j),

and by defining the connection strength s on C according to

s({u, v}) = 11/12 if u = v, −1/6 otherwise    (2.5)
we find that the consensus C(k) in (1.1) and the cost function in (2.4) become equal. Next, we
provide a neighborhood structure N for the cooling process given in (2.3). In this first
implementation, the values of the neighborhood structure N shall be singletons instead of being
set-valued in general. Given a configuration k, let N(k) be the configuration defined by

N(k)(u_i) = (1/10) ∑ k(u_j)    (2.6)

where the sum is taken over all faces u_j except u_i and the face opposite u_i. If we define the
transition probabilities by

P_{k,l}(m) = 1 if N(k) = l, 0 otherwise    (2.7)

for all trials m, then together with the program for the simulated annealing algorithm given at the
end of the introduction we have described a cooling process of the dodecahedron.
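The equality of the cost (2.4) and the consensus (1.1) under the connection strength (2.5) can be verified numerically. The sketch below evaluates both sides on a randomly chosen configuration; it assumes, as (2.5) suggests, that C contains a loop connection {u, u} at every unit as well as a connection between every pair of distinct units:

```python
import random

def cost(k):
    """The sum of squared deviations from the mean, as in (2.4)."""
    m = sum(k) / 12.0
    return sum((x - m) ** 2 for x in k)

def consensus(k):
    """Consensus (1.1) with the connection strength (2.5): strength 11/12
    on the loops {u, u} and -1/6 on pairs of distinct units."""
    loops = sum((11.0 / 12.0) * x * x for x in k)
    pairs = sum(-(1.0 / 6.0) * k[i] * k[j]
                for i in range(12) for j in range(i + 1, 12))
    return loops + pairs

random.seed(1)
k = [random.uniform(-5, 5) for _ in range(12)]
# cost(k) and consensus(k) agree up to floating-point rounding.
```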
The neighborhood structure N in (2.6) completely describes the simulated annealing of the
dodecahedron as described in [6]. In particular, the equation in (2.6) defines a linear
transformation N from the complex vector space V of all configurations (k(u_1), ..., k(u_12)),
where each k(u_i) is a complex number, into itself. Certain invariant subspaces and eigenvalues of the
linear transformation are identified by using techniques of character theory, which
in turn lead us to conclude that the iterates of this operator converge to the averaging operator
on V; that is, lim_{m→∞} N^m k = m(k)·(1, 1, ..., 1) for every k in V, where m(k) is the average of the
entries of k. Thus, as we have shown in [6], if the initial numbering (or set of temperatures) of the 12
faces is given by the sequence (B_1^0, B_2^0, ..., B_12^0) of real numbers, i.e., face u_i is labeled
(or has the temperature) B_i^0, then after several minutes or iterations each number (or temperature)
on the faces shall be approximately

(1/12) ∑_{i=1}^{12} B_i^0.
3. Using Matrices in Simulated Annealing
In this section, we generalize the results of the previous section, as derived in [4].
Let U = {u_i | i = 1, ..., n} and let C = {{u_i, u_j} | 1 ≤ i, j ≤ n} be a network representing a Boltzmann
machine, where n ≥ 2. For the connection strength s, we consider

s({u_i, u_j}) = (n−1)/n if i = j, −2/n otherwise.    (3.8)

An optimal configuration k_opt minimizing the consensus (1.1) with connection strength (3.8)
necessarily and sufficiently minimizes

∑_{i=1}^{n} (k(u_i) − m(k))²    (3.9)

over all configurations k, where m(k) = (1/n) ∑_{i=1}^{n} k(u_i).

To generalize the results in section 2, a few preliminaries are necessary. Let N_n be
the n-by-n matrix whose entries are all 1's; for instance, N_12 is a 12-by-12 matrix of 1's. One says that
an n-by-n matrix Q is doubly stochastic if the entries of Q are nonnegative, and the sum of the
entries in each row and column is 1. In the previous section, the neighborhood structure in (2.6),
when thought of as a linear transformation, can be represented by the matrix N_d = (1/10)(N_12 − 2Q),
where Q is the doubly stochastic matrix given by
Q_{ij} = 1/2 if j = i or u_j is the face opposite u_i (with the opposite pairing in (2.2)),
and Q_{ij} = 0 otherwise.    (3.10)

Thus Q is a symmetric 12-by-12 doubly stochastic matrix, each of whose rows and columns contains
exactly two entries equal to 1/2: one on the diagonal and one in the column of the opposite face.
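The matrix form N_d = (1/10)(N_12 − 2Q) can be checked against the averaging rule (2.6). The sketch below builds Q from the pairing (2.2) (indices shifted to start at 0) and confirms that N_d applied to the minute-1 temperatures reproduces the minute-2 row of table (2.3):

```python
import numpy as np

# Opposite-face pairing from (2.2), 0-indexed: u1-u2, u3-u10, u4-u11,
# u5-u12, u6-u8, u7-u9.
opp = {0: 1, 1: 0, 2: 9, 9: 2, 3: 10, 10: 3,
       4: 11, 11: 4, 5: 7, 7: 5, 6: 8, 8: 6}

# Q has 1/2 at (i, i) and (i, opp(i)); it is symmetric and doubly stochastic.
Q = np.zeros((12, 12))
for i, j in opp.items():
    Q[i, i] = 0.5
    Q[i, j] = 0.5

N12 = np.ones((12, 12))        # the all-ones matrix N_12
Nd = (N12 - 2 * Q) / 10.0      # matrix form of the neighborhood rule (2.6)

k = np.arange(1, 13, dtype=float)   # minute-1 temperatures 1, ..., 12
k2 = Nd @ k                         # minute-2 temperatures
```

Row i of N_d has entries 0 at positions i and opp(i) and 1/10 elsewhere, which is exactly the averaging in (2.6).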
To enlarge the neighborhood structure in section 2, let

N(k) = { (1/(n−c)) (N_n − cQ) k : Q is a doubly stochastic matrix and 0 ≤ c < n/2 }.    (3.11)
Once again, in utilizing the simulated annealing algorithm we need to define a mechanism for
generating a neighbor l in N(k) of k. If {E_r : r = 1, ..., n!} is the set of all n-by-n permutation
matrices, then each doubly stochastic matrix Q can be expressed in the form

Q = ∑_{r=1}^{n!} α_r E_r,  where ∑_{r=1}^{n!} α_r = 1 and each α_r ≥ 0    (3.12)

[3, Theorem 8.7.1]. This implies that, given c < n/2, we can stochastically generate a doubly
stochastic matrix Q by stochastically choosing an n!-tuple (α_r) and then constructing the matrix
in (3.12). Furthermore, to implement the simulated annealing algorithm we shall have to define a
transition probability. For one, let P_{k,l}(m) = α_1, where l ∈ N(k) is given by

l = (1/(n−c)) (N_n − c ∑_{r=1}^{n!} α_r E_r) k.
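A neighbor in (3.11) can be generated as just described: draw nonnegative weights summing to 1, combine randomly chosen permutation matrices as in (3.12), and apply (1/(n−c))(N_n − cQ) to k. The sketch below samples only a handful of the n! permutation matrices, an implementation shortcut rather than anything prescribed here; any such convex combination is still a valid doubly stochastic Q:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_doubly_stochastic(n, terms=6):
    """Draw a doubly stochastic Q as a convex combination of `terms`
    randomly chosen n-by-n permutation matrices, in the spirit of (3.12).
    (A full Birkhoff expansion ranges over all n! permutations; sampling
    a few already yields a valid doubly stochastic matrix.)"""
    alphas = rng.dirichlet(np.ones(terms))      # alpha_r >= 0, summing to 1
    Q = np.zeros((n, n))
    for a in alphas:
        Q += a * np.eye(n)[rng.permutation(n)]  # a random permutation E_r
    return Q

def neighbor(k, c, Q):
    """l = (1/(n - c)) (N_n - c Q) k, with 0 <= c < n/2."""
    n = len(k)
    return (np.ones((n, n)) @ k - c * (Q @ k)) / (n - c)

n = 12
Q = random_doubly_stochastic(n)
k = np.arange(1, n + 1, dtype=float)
l = neighbor(k, 2.0, Q)
```

Because Q preserves the sum of the entries of k, so does the neighbor map; each trial redistributes the "temperatures" without changing their total.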
Suppose k is an initial configuration of a Boltzmann machine (U, C). Realizing k as a column
vector, the configuration in the second trial is (1/(n−c_1))(N_n − c_1 Q_1) k, and the configuration
in the third trial is

(1/(n−c_2)) (N_n − c_2 Q_2) (1/(n−c_1)) (N_n − c_1 Q_1) k

for some 0 ≤ c_1, c_2 < n/2 and doubly stochastic matrices Q_1, Q_2. In [4], provided each c_i
satisfies 0 ≤ c_i < n/2 − ε for some ε > 0, we have shown that after several trials the mth
configuration shall be approximately (1/n) N_n(k), i.e.,

lim_{m→∞} ∏_{i=1}^{m} (1/(n−c_i)) (N_n − c_i Q_i)(k) = (1/n) N_n(k).

As a special case, when n = 12, c_i = 2, and each Q_i is the matrix in (3.10) for all i, we obtain
the simulated annealing of the dodecahedron in section 2; that is,

lim_{m→∞} ∏_{i=1}^{m} (1/(n−c_i)) (N_n − c_i Q_i)(k) = lim_{m→∞} N_d^m(k) = (1/12) N_12(k).
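The limit just stated, that the product of these operators drives any initial k toward the constant vector with entries m(k), can be observed numerically. This sketch draws a fresh c_i (bounded away from n/2) and a fresh doubly stochastic Q_i at every trial:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 12
k0 = rng.uniform(0, 10, size=n)
k = k0.copy()
for _ in range(60):
    # Random c_i in [0, n/2 - eps] and a random doubly stochastic Q_i,
    # built as a convex combination of a few random permutation matrices.
    c = rng.uniform(0, n / 2 - 0.5)
    alphas = rng.dirichlet(np.ones(5))
    Q = sum(a * np.eye(n)[rng.permutation(n)] for a in alphas)
    k = (np.ones((n, n)) @ k - c * (Q @ k)) / (n - c)
# k is now approximately the constant vector with entries mean(k0),
# i.e., (1/n) N_n applied to k0.
```

Writing k = m(k)·1 + d with d the deviation from the mean, each trial maps d to −(c_i/(n−c_i)) Q_i d, a contraction whenever c_i < n/2, which is why the deviations die out while the mean is preserved.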
4. Classification: Identifying Digits
A class of optimization problems that can be easily solved by human beings but are very difficult for
computers is the class of so-called classification problems: problems of associating objects
with subsets. Classification problems have their origins in pattern recognition. A specific pattern
recognition problem we will solve with combinatorial optimization and Boltzmann machines is
the seven-segment display problem. This particular problem is extensively discussed in
[1, Section 10.3].
The display of the digits 0, 1, 2, 3, ..., 9 often uses a seven-segment display. (Imagine two equal
squares, one sitting on top of the other.) Each of the segments can be independently
assigned 0 (for `off') or 1 (for `on'). We will choose a Boltzmann machine which will identify any
digit displayed. It is possible that the figure shown in the seven-segment display is not a number,
but we still would like the Boltzmann machine to assign a digit to the seven-segment display.
We will consider a neural network whose set of units, U, is the union of U_i = {u_1, ..., u_7}, the
input units, and U_o = {v_1, ..., v_10}, the output units. The state of a unit is either 0 or 1. In
particular, a configuration is a 17-tuple consisting of 0's and 1's. For each of the ten digits, we
assign a configuration according to the table below.
digit  u1 u2 u3 u4 u5 u6 u7   v1 v2 v3 v4 v5 v6 v7 v8 v9 v10
  0     1  1  1  0  1  1  1    1  0  0  0  0  0  0  0  0  0
  1     0  0  1  0  0  1  0    0  1  0  0  0  0  0  0  0  0
  2     1  0  1  1  1  0  1    0  0  1  0  0  0  0  0  0  0
  3     1  0  1  1  0  1  1    0  0  0  1  0  0  0  0  0  0
  4     0  1  1  1  0  1  0    0  0  0  0  1  0  0  0  0  0
  5     1  1  0  1  0  1  1    0  0  0  0  0  1  0  0  0  0
  6     0  1  0  1  1  1  1    0  0  0  0  0  0  1  0  0  0
  7     1  0  1  0  0  1  0    0  0  0  0  0  0  0  1  0  0
  8     1  1  1  1  1  1  1    0  0  0  0  0  0  0  0  1  0
  9     1  1  1  1  0  1  0    0  0  0  0  0  0  0  0  0  1
We call the set of configurations above the classification set V′. If each of the
seven input units is assigned a state, we would like the Boltzmann machine to maximize its
overall desirability by assigning states to the remaining units, the output units. Observe that each
of the configurations in V′ assigns the state 1 to exactly one output unit. For us, an acceptable
Boltzmann machine will assign exactly one digit to a given configuration of the input units.
For the set of connections, we take the union of C_{i,o} = {{u_i, v_j} : 1 ≤ i ≤ 7, 1 ≤ j ≤ 10} and
C_{o,o} = {{v_i, v_j} : 1 ≤ i < j ≤ 10}. Note that no connection exists between input units and there are no
bias connections. Following [1, Section 10.3], a connection {u_i, v_j} ∈ C_{i,o} is said to be excitatory
if there exists k ∈ V′ such that k(u_i) k(v_j) = 1; otherwise, {u_i, v_j} is said to be inhibitory. For
each v_j ∈ U_o, let N⁺_{v_j} = {u_i : {u_i, v_j} is excitatory} and let N⁻_{v_j} = {u_i : {u_i, v_j} is inhibitory}.

For the connection strength s, we first choose a positive constant γ and let δ = min_{v_j ∈ U_o} γ/|N⁺_{v_j}|.
Note that −γ + δ < 0. For each {u_i, v_j} ∈ C_{i,o}, let

s({u_i, v_j}) = γ/|N⁺_{v_j}| if u_i ∈ N⁺_{v_j}, and −γ/|N⁻_{v_j}| if u_i ∈ N⁻_{v_j},    (4.15)

and if {v_i, v_j} ∈ C_{o,o}, let s({v_i, v_j}) be any negative constant satisfying
−γ < s({v_i, v_j}) < −γ + δ.
As shown in [1, page 186], if the given set of states of the input units represents one of the digits
0,1,2,3,...,9 then the optimal configuration maximizing the consensus function (obtained by
choosing the optimal configuration of the remaining units, i.e., of the output units) represents and
identifies the correct input digit.
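The classification can be sketched end to end. If the search is restricted to configurations with exactly one output unit on (the form the optimum takes, per the discussion above), the consensus of an input x with output v_j reduces to the input-output terms of (4.15), and maximizing over j recovers the digit. The names below (`DIGITS`, `classify`) are illustrative, not from the paper, and γ is an arbitrary positive constant:

```python
GAMMA = 1.0

# Classification set V': the input states u1..u7 for each digit; in V' the
# output unit v_{d+1} is the only output unit on for digit d.
DIGITS = {
    0: (1, 1, 1, 0, 1, 1, 1), 1: (0, 0, 1, 0, 0, 1, 0),
    2: (1, 0, 1, 1, 1, 0, 1), 3: (1, 0, 1, 1, 0, 1, 1),
    4: (0, 1, 1, 1, 0, 1, 0), 5: (1, 1, 0, 1, 0, 1, 1),
    6: (0, 1, 0, 1, 1, 1, 1), 7: (1, 0, 1, 0, 0, 1, 0),
    8: (1, 1, 1, 1, 1, 1, 1), 9: (1, 1, 1, 1, 0, 1, 0),
}

def consensus_one_hot(x, j):
    """Consensus of input x with only output v_{j+1} on, using (4.15).
    With a single output unit on, the (negative) output-output
    connections contribute nothing."""
    n_plus = [i for i in range(7) if DIGITS[j][i] == 1]   # N+ of v_{j+1}
    n_minus = [i for i in range(7) if DIGITS[j][i] == 0]  # N- of v_{j+1}
    score = sum(GAMMA / len(n_plus) for i in n_plus if x[i] == 1)
    if n_minus:  # digit 8 uses every segment, so its N- is empty
        score -= sum(GAMMA / len(n_minus) for i in n_minus if x[i] == 1)
    return score

def classify(x):
    """Return the digit whose one-hot output maximizes the consensus."""
    return max(range(10), key=lambda j: consensus_one_hot(x, j))
```

For an input that exactly matches digit d, the score of output v_{d+1} is γ, while any other output loses either excitatory credit or picks up an inhibitory penalty, so the maximizer is unique.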
5. Summary
In this paper, we have simulated annealing processes and solved a pattern recognition problem
using the simulated annealing algorithm and Boltzmann machines. Combinatorial optimization and
pattern recognition (a part of artificial intelligence) are two of the fields, amongst others, in
which Boltzmann machines can be used.
We have interpreted the simulated annealing processes in sections 2 and 3 as minimization
problems; in particular, the minimization of the consensus function over all configurations of an
appropriate Boltzmann machine. The states of the units represented the temperatures of the units.
The simulated annealing algorithm has close connections with statistical mechanics: in the
annealing process, one can interpret the states (respectively, free energy) of the solid in the
cooling process as configurations (respectively, consensus function) of a Boltzmann machine
[1, 2].
Pattern recognition, briefly, is the task of finding a correct output for a given input. As a
combinatorial optimization problem, the seven-segment display problem is construed as a constrained
optimization problem over an appropriate Boltzmann machine whose set of units is
separated into two subsets, namely, the subset of input units and the subset of output units.
Giving an input is equivalent to giving a configuration of the input units. Once an input
is given, the objective is to find a configuration of the remaining units (the output
units) that maximizes the consensus function of the Boltzmann machine.
6. References
1. AARTS, E. and KORST, J., Simulated Annealing and Boltzmann Machines, Wiley-Interscience Series in Discrete
Mathematics and Optimization, John Wiley and Sons, 1989.
2. BONOMI, E. and LUTTON, J.-L., The N-City Travelling Salesman Problem: Statistical Mechanics and the
Metropolis Algorithm, SIAM Review 26, No. 4, October 1984.
3. HORN, R.A. and JOHNSON, C.R., Matrix Analysis, Cambridge University Press, 1990.
4. MERINO, D.I., REYES, E.N., and STEIDLEY, C., Using Matrix Analysis to Approach the Simulated Annealing
Algorithm, accepted for publication in Computers in Education Journal of the American Society for Engineering
Education.
5. REYES, E.N. and STEIDLEY, C., A GAP Approach to the Simulated Annealing Algorithm, Computers in
Education Journal of the American Society for Engineering Education, Vol. 7 (1997), No. 3, 43-47.
6. REYES, E.N. and STEIDLEY, C., A Theoretical Discussion of the Mathematics Underlying "A GAP Approach
to the Simulated Annealing Algorithm", Computers in Education Journal of the American Society for Engineering
Education, Vol. 7 (1997), No. 4, 50-57.
7. SCHÖNERT, M. et al., GAP - Groups, Algorithms, and Programming, Version 3 Release 4, Lehrstuhl D für
Mathematik, RWTH Aachen, Germany, 1994.