
Optimization Using The Simulated Annealing Algorithm

1998 Annual Conference Proceedings

In this paper we briefly review the simulated annealing algorithm, an algorithm with applications in optimization and pattern recognition that is used extensively in artificial intelligence. In earlier papers the authors analyzed a simulation of the annealing of a solid, a dodecahedron in particular. Our use of this algorithm, which is rooted in combinatorial optimization, reflects properties of Boltzmann machines, a neural network model characterized by massive parallelism. We demonstrate two implementations of simulated annealing. Each implementation depends upon a neighborhood structure and a transition mechanism. In the first implementation the neighborhood structure is a linear transformation of the vector space of all configurations and the transition probability is deterministic; in this case, we use techniques from the character theory of finite groups to analyze simulated annealing. In the second implementation, which includes the first as a special case, the neighborhood structure is a set-valued function and the transition mechanism is stochastic; in this case, we use techniques from matrix analysis, in particular properties of doubly stochastic matrices, to analyze simulated annealing modeled on a class of Boltzmann machines. For pattern recognition, we use the simulated annealing algorithm to solve the classic seven-segment display problem, a classification problem which we solve by choosing an appropriate Boltzmann machine.

Session 2520

Optimization Using the Simulated Annealing Algorithm

Edgar N. Reyes, Dennis I. Merino, Carl Steidley
Southeastern Louisiana University / Texas A&M University - Corpus Christi
Hammond, LA 70402 / Corpus Christi, TX 78412

1. Introduction

Annealing is the physical process of heating a solid and then cooling it slowly according to a specified schedule.
We shall use the simulated annealing algorithm, a method based in the field of combinatorial optimization, to describe simulated controlled cooling processes. In the annealing process, one can interpret the states (and free energy) of the solid in the cooling process as solutions (and cost function, respectively) of a combinatorial optimization problem [1]. Our use of the simulated annealing algorithm reflects properties of Boltzmann machines, a neural network model belonging to the class of connectionist models and featuring, among other things, massive parallelism. We will also use an appropriate Boltzmann machine to solve a pattern recognition problem, namely, the seven-segment display problem. The display of the decimal digits in a wristwatch, for instance, uses a seven-segment display. In identifying the digit displayed, we will maximize an overall measure of desirability of the Boltzmann machine.

We briefly review some aspects of Boltzmann machines and the simulated annealing algorithm. Let (U, C) be a network consisting of a set of units, U = {u_i : i = 1, ..., n}, and a set of connections, C, consisting of unordered pairs {u_i, u_j}. A connection {u_i, u_j} in C is said to join u_i to u_j. Intrinsic to Boltzmann machines are the notions of a connection strength s and a configuration k of the network (U, C); they are real-valued functions defined on C and U, respectively. The values s({u_i, u_j}) and k(u_i) give the strength of the connection {u_i, u_j} and the state of the unit u_i, respectively. Thus, a Boltzmann machine is a network (U, C) with a given connection strength s [1, chapter 8]. An objective of a Boltzmann machine is to find an optimal configuration k_opt in the space S of all configurations k that minimizes the consensus function defined by

    C(k) = \sum_{\{u_i, u_j\} \in C} s(\{u_i, u_j\}) k(u_i) k(u_j).    (1.1)

The values of the consensus function provide an overall measure of desirability of the connections and the states of the units. The function in (1.1) is usually called the cost function in combinatorial optimization.

To optimize (1.1), we will use the simulated annealing algorithm. There are several ways to implement this algorithm, and each depends on a neighborhood structure and a transition mechanism. A neighborhood structure is a function N from S into P(S), the family of all subsets of S. A configuration l in N(k) is called a neighbor of k. To optimize the consensus (1.1), we need a mechanism which allows a configuration to change. Given a configuration k, we randomly generate a neighbor l in N(k), with the neighborhood structure defined at the outset, and then determine whether l will replace k. Specifically, let X(m) be the configuration on the mth trial and let

    P_{k,l}(m) = P(X(m) = l | X(m-1) = k)

be the probability of accepting configuration l on the mth trial given that the configuration on the (m-1)th trial is k. Under certain conditions, such as those discussed in [1, pages 18, 42, or 46], the sequence of configurations generated by the simulated annealing algorithm asymptotically converges to an optimal configuration. The controlled cooling process is represented by a sequence {c_m | m = 0, 1, 2, 3, ...} of real numbers. Following [5], simulated annealing is described by the algorithm below.

    Begin Simulated Annealing Algorithm
      Initialize: k_o, an initial configuration
                  m = 0, a counter
      Do:    generate k in N(k_o)
             if C(k) <= C(k_o), then k_o = k
             else if P_{k_o,k}(m) > Random(0,1), then k_o = k
             m := m + 1
      Until: Stop Criterion (c_m) is satisfied
    End

2. A Special Case: Cooling of a Dodecahedron

Our first implementation is the simulated annealing of a dodecahedron, i.e., a regular solid with 12 faces, which we choose because of its inherent symmetry.
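Before specializing to the dodecahedron, the generic loop above can be sketched in code. The Python sketch below is our illustration only: the function names, the toy cost function, and the geometric cooling schedule are assumptions of ours, not part of the paper, and we use the standard Metropolis acceptance probability exp(-ΔC/c_m) as one common concrete choice for P_{k_o,k}(m) [1].

```python
import math
import random

def simulated_annealing(k0, cost, neighbor, schedule, trials=10000):
    """Generic simulated annealing loop following the pseudocode above.

    k0       -- initial configuration
    cost     -- cost (consensus) function C to be minimized
    neighbor -- function generating a random neighbor from N(k)
    schedule -- function m -> c_m, the control (cooling) sequence
    All names here are illustrative; the paper leaves these choices open.
    """
    k = k0
    for m in range(trials):
        cand = neighbor(k)
        delta = cost(cand) - cost(k)
        c_m = schedule(m)
        # Accept improvements outright; accept uphill moves with
        # probability exp(-delta / c_m) (the Metropolis criterion).
        if delta <= 0 or random.random() < math.exp(-delta / c_m):
            k = cand
    return k

# Toy usage: minimize (x - 3)^2 over the integers by +/-1 moves.
best = simulated_annealing(
    k0=50,
    cost=lambda x: (x - 3) ** 2,
    neighbor=lambda x: x + random.choice((-1, 1)),
    schedule=lambda m: 10.0 * 0.99 ** m,
)
```

With a decreasing schedule c_m, late trials accept almost no uphill moves, so the loop ends in near-greedy descent toward a minimizer.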
We start with a network (U_d, C_d) consisting of units that are connected in some way. The set U_d = {u_i | i = 1, ..., 12} represents the set of faces of a dodecahedron. A connection {u_i, u_j} in C_d is said to join u_i to u_j. For computational purposes, let us suppose the following pairs of faces are opposite each other:

    u_1 and u_2, u_3 and u_10, u_4 and u_11, u_5 and u_12, u_6 and u_8, u_7 and u_9.    (2.2)

In describing a cooling process for the dodecahedron, we label the 12 faces of the dodecahedron with real numbers; these numbers play the role of the temperatures of the faces. Suppose that in this cooling process the numbers (or temperatures) change every minute. At the first minute, initially, assume for each i in {1, ..., 12} that face u_i is labeled with the number i, representing its temperature. On the second minute, the number on each face changes according to a specified schedule: the number on face u_i becomes the average of the first minute's numbers, excluding those on u_i and on the face opposite u_i. For example, on the second minute the number on face u_1 is (1/10) \sum_{i=3}^{12} i; since u_2 is the face opposite u_1, the numbers on u_1 and u_2 on the second minute must be equal. Also, the number on the third and tenth faces on the second minute is given by (1/10) \sum_{i=1, i \neq 3, 10}^{12} i. We continue with this process: on the third minute, the number on face u_i becomes the average of the second minute's numbers, excluding those on u_i and on the face opposite u_i.
Repeating this process for four minutes, we find that the numbers on the faces are given by (rounded to 1 decimal place):

    Min   u1   u2   u3   u4   u5   u6   u7   u8   u9   u10  u11  u12
    1     1    2    3    4    5    6    7    8    9    10   11   12
    2     7.5  7.5  6.5  6.3  6.1  6.4  6.2  6.4  6.2  6.5  6.3  6.1
    3     6.3  6.3  6.5  6.5  6.6  6.5  6.6  6.5  6.6  6.5  6.5  6.6
    4     6.5  6.5  6.5  6.5  6.5  6.5  6.5  6.5  6.5  6.5  6.5  6.5    (2.3)

In particular, the numbers (or temperatures) for the first four minutes on face u_1 are 1, 7.5, 6.3, and 6.5. For each face, note that the number on the fourth minute is approximately 6.5, which is the average of the integers 1, ..., 12. As the number of minutes increases (that is, as more trials or iterations are done), the numbers on the faces all approach 6.5.

To find the `equilibrium or optimal state' of the dodecahedron, i.e., the configuration in which the temperatures on the faces are all equal, we seek to minimize

    \sum_{i=1}^{12} (k(u_i) - m(k))^2    (2.4)

over all configurations k, where m(k) = (1/12) \sum_{i=1}^{12} k(u_i). To implement the simulated annealing algorithm, we need to interpret (2.4) as the consensus of k. To do this, we define an appropriate connection strength. Simplifying (2.4), we obtain

    \sum_{i=1}^{12} (k(u_i) - m(k))^2 = (11/12) \sum_{i=1}^{12} k(u_i)^2 - (1/6) \sum_{i<j} k(u_i) k(u_j),

and by defining the connection strength s on C according to

    s({u, v}) = 11/12 if u = v, and -1/6 otherwise,    (2.5)

we find that the consensus C(k) in (1.1) and the cost function in (2.4) become equal.

Next, we provide a neighborhood structure N for the cooling process given in (2.3). In this first implementation, the values of the neighborhood structure N are singletons rather than general sets. Given a configuration k, let N(k) be the configuration defined by

    N(k)(u_i) = (1/10) \sum k(u_j),    (2.6)

where the sum is taken over all faces u_j except u_i and the face opposite u_i.
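The averaging schedule is easy to check numerically. The following short Python sketch (our illustration; the dictionary-based representation is merely a convenience) implements the averaging step (2.6) with the opposite-face pairing (2.2) and reproduces the values in table (2.3):

```python
# Deterministic averaging step (2.6) for the dodecahedron: each face's
# new value is the mean of the other 10 faces, excluding the face itself
# and its opposite face under the pairing (2.2).
opposite = {1: 2, 2: 1, 3: 10, 10: 3, 4: 11, 11: 4,
            5: 12, 12: 5, 6: 8, 8: 6, 7: 9, 9: 7}

def step(temps):
    # temps maps face index 1..12 to its current temperature
    return {i: sum(t for j, t in temps.items()
                   if j != i and j != opposite[i]) / 10
            for i in temps}

temps = {i: float(i) for i in range(1, 13)}  # minute 1: face i holds i
for _ in range(3):                           # minutes 2, 3, and 4
    temps = step(temps)
```

One application of `step` sends minute 1 to minute 2 (for example, face u_1 goes from 1 to 7.5), and by minute 4 every face is within 0.05 of 6.5, the average of the integers 1 through 12.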
If we define the transition probabilities by

    P_{k,l}(m) = 1 if N(k) = l, and 0 otherwise,    (2.7)

for all trials m, then, together with the program for the simulated annealing algorithm given at the end of the introduction, we have described a cooling process of the dodecahedron.

The neighborhood structure N in (2.6) completely describes the simulated annealing of the dodecahedron as described in [6]. In particular, the equation in (2.6) defines a linear transformation N from the complex vector space V (consisting of all configurations (u_1, ..., u_12) where each u_i is a complex number) into itself. Certain invariant subspaces and eigenvalues of this linear transformation are identified using techniques of character theory, which in turn lead us to conclude that the iterates of this operator converge to (1/12) times the all-ones operator; in particular, lim_{m \to \infty} N^m = (1/12) N_12, where N_12 is the operator on V represented by the 12-by-12 matrix of all 1's (defined in Section 3). Thus, as we have shown in [6], if the initial numbering (or set of temperatures) of the 12 faces is given by a sequence (B_1^0, B_2^0, ..., B_12^0) of real numbers, i.e., face u_i is labeled with (or has the temperature) B_i^0, then after several minutes or iterations each number (or temperature) on the faces is approximately (1/12) \sum_{i=1}^{12} B_i^0.

3. Using Matrices in Simulated Annealing

In this section, we generalize the results of the previous section (as derived in [4]). Let U = {u_i | i = 1, ..., n} and C = {{u_i, u_j} | 1 \le i, j \le n} be a network representing a Boltzmann machine, where n \ge 2. For the connection strength s, we consider

    s({u_i, u_j}) = (n-1)/n if i = j, and -2/n otherwise.    (3.8)

An optimal configuration k_opt minimizing the consensus (1.1) with connection strength (3.8) equivalently minimizes

    \sum_{i=1}^{n} (k(u_i) - m(k))^2    (3.9)

over all configurations k, where m(k) = (1/n) \sum_{i=1}^{n} k(u_i). To generalize the results in section 2, a few preliminaries are necessary.
Let N_n denote the n-by-n matrix whose entries are all 1's; for instance, N_12 is a 12-by-12 matrix of 1's. One says that an n-by-n matrix Q is doubly stochastic if the entries of Q are nonnegative and the sum of the entries in each row and each column is 1. The neighborhood structure in (2.6) of the previous section, when thought of as a linear transformation, can be represented by the matrix

    N_d = (1/10)(N_12 - 2Q),    (3.10)

where Q is the doubly stochastic matrix Q = (1/2)(I_12 + P); here P is the permutation matrix that interchanges each face with its opposite under the pairing (2.2), so the (i, j) entry of Q is 0.5 exactly when i = j or when u_i and u_j are opposite faces, and 0 otherwise.

To enlarge the neighborhood structure of section 2, let

    N(k) = { (1/(n-c)) (N_n - cQ) k : Q is a doubly stochastic matrix and 0 \le c < n/2 }.    (3.11)

Once again, in utilizing the simulated annealing algorithm, we need a mechanism for generating a neighbor l in N(k) of k. If {E_r : r = 1, ..., n!} is the set of all n-by-n permutation matrices, then each doubly stochastic matrix Q can be expressed in the form

    Q = \sum_{r=1}^{n!} \alpha_r E_r,    (3.12)

where \sum_{r=1}^{n!} \alpha_r = 1 and \alpha_r \ge 0 [3, Theorem 8.7.1]. This implies that, given c < n/2, we can stochastically generate a doubly stochastic matrix Q by stochastically choosing an n!-tuple (\alpha_r) and then constructing the matrix in (3.12). Furthermore, to implement the simulated annealing algorithm we have to define a transition probability. For one, let P_{k,l}(m) = \alpha_1, where l \in N(k) is given by

    l = (1/(n-c)) (N_n - c \sum_{r=1}^{n!} \alpha_r E_r) k.

Suppose k is an initial configuration of a Boltzmann machine (U, C).
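The neighbor-generation mechanism in (3.11)-(3.12) can be illustrated numerically. In the NumPy sketch below (ours, not the authors' implementation), we sample a doubly stochastic Q as a convex combination of only a handful of random permutation matrices rather than the full n!-tuple (\alpha_r) of (3.12); this is a simplification for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_doubly_stochastic(n, terms=4):
    """Convex combination of random permutation matrices (Birkhoff).
    Sampling only a few of the n! permutations is a simplification;
    the paper draws a full n!-tuple (alpha_r)."""
    alphas = rng.dirichlet(np.ones(terms))  # nonnegative, sums to 1
    Q = np.zeros((n, n))
    for a in alphas:
        P = np.eye(n)[rng.permutation(n)]   # a random permutation matrix
        Q += a * P
    return Q

def neighbor(k, c):
    """One stochastic step per (3.11): l = (N_n - c Q) k / (n - c)."""
    n = len(k)
    Q = random_doubly_stochastic(n)
    N_n = np.ones((n, n))                   # the all-ones matrix
    return (N_n - c * Q) @ k / (n - c)

# Iterating drives k toward the constant vector of its average.
k = np.arange(1.0, 13.0)       # n = 12, initial temperatures 1..12
for _ in range(60):
    k = neighbor(k, c=2.0)     # c = 2, as in the dodecahedron case
```

Because each Q is doubly stochastic, every factor (1/(n-c))(N_n - cQ) preserves the mean of k and contracts deviations from it whenever c < n/2, so the iterates approach the constant vector (1/n) N_n k, consistent with the limit derived in [4].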
By realizing k as a column vector, the configuration on the second trial is (1/(n-c_1))(N_n - c_1 Q_1) k, and the configuration on the third trial is (1/(n-c_2))(N_n - c_2 Q_2) (1/(n-c_1))(N_n - c_1 Q_1) k, for some 0 \le c_1, c_2 < n/2 and doubly stochastic matrices Q_1, Q_2. In [4], provided each c_i satisfies 0 \le c_i < n/2 - \epsilon for some \epsilon > 0, we have shown that after several trials the mth configuration is approximately (1/n) N_n k; i.e.,

    lim_{m \to \infty} \prod_{i=1}^{m} (1/(n-c_i)) (N_n - c_i Q_i) k = (1/n) N_n k.

As a special case, when n = 12, c_i = 2, and each Q_i is the matrix Q of (3.10), we obtain the simulated annealing of the dodecahedron in section 2; that is,

    lim_{m \to \infty} \prod_{i=1}^{m} (1/(n-c_i)) (N_n - c_i Q_i) k = lim_{m \to \infty} N_d^m k = (1/12) N_12 k.

4. Classification: Identifying Digits

A class of optimization problems that are easily solved by human beings but very difficult for computers are the so-called classification problems: problems of associating objects with subsets. Classification problems have their origins in pattern recognition. A specific pattern recognition problem we will solve with combinatorial optimization and Boltzmann machines is the seven-segment display problem, which is extensively discussed in [1, Section 10.3]. The display of the digits 0, 1, 2, ..., 9 often uses a seven-segment display (imagine two equal squares, one sitting on top of the other). Each of the segments can be independently assigned 0 (for `off') or 1 (for `on'). We will choose a Boltzmann machine which identifies any digit displayed. It is possible that the figure shown in the seven-segment display is not a digit, but we still would like the Boltzmann machine to assign a digit to the display. We consider a neural network whose set of units, U, is the union of U_i = {u_1, ..., u_7}, the input units, and U_o = {v_1, ..., v_10}, the output units.
The state of a unit is either 0 or 1; in particular, a configuration is a 17-tuple of 0's and 1's. For each of the ten digits, we assign a configuration according to the table below.

    digit  u1 u2 u3 u4 u5 u6 u7   v1 v2 v3 v4 v5 v6 v7 v8 v9 v10
    0      1  1  1  0  1  1  1    1  0  0  0  0  0  0  0  0  0
    1      0  0  1  0  0  1  0    0  1  0  0  0  0  0  0  0  0
    2      1  0  1  1  1  0  1    0  0  1  0  0  0  0  0  0  0
    3      1  0  1  1  0  1  1    0  0  0  1  0  0  0  0  0  0
    4      0  1  1  1  0  1  0    0  0  0  0  1  0  0  0  0  0
    5      1  1  0  1  0  1  1    0  0  0  0  0  1  0  0  0  0
    6      0  1  0  1  1  1  1    0  0  0  0  0  0  1  0  0  0
    7      1  0  1  0  0  1  0    0  0  0  0  0  0  0  1  0  0
    8      1  1  1  1  1  1  1    0  0  0  0  0  0  0  0  1  0
    9      1  1  1  1  0  1  0    0  0  0  0  0  0  0  0  0  1

We call the set of configurations above the classification set V'. If each of the seven input units is assigned a state, we would like the Boltzmann machine to maximize its overall desirability by assigning states to the remaining units, the output units. Observe that each configuration in V' assigns the state 1 to exactly one output unit. For us, an acceptable Boltzmann machine assigns exactly one digit to a given configuration of the input units.

For the set of connections, we take the union of C_{i,o} = {{u_i, v_j} : 1 \le i \le 7, 1 \le j \le 10} and C_{o,o} = {{v_i, v_j} : 1 \le i < j \le 10}. Note that no connection exists between input units and there are no bias connections. Following [1, Section 10.3], a connection {u_i, v_j} \in C_{i,o} is said to be excitatory if there exists k \in V' such that k(u_i) k(v_j) = 1; otherwise, {u_i, v_j} is said to be inhibitory. For each v_j \in U_o, let N^+_{v_j} = {u_i : {u_i, v_j} is excitatory} and N^-_{v_j} = {u_i : {u_i, v_j} is inhibitory}. For the connection strength s, we first choose a positive constant \gamma and let

    \delta = min_{v_j \in U_o} \gamma / |N^+_{v_j}|.

Note that -\gamma + \delta < 0.
For each {u_i, v_j} \in C_{i,o}, let

    s({u_i, v_j}) = \gamma / |N^+_{v_j}| if u_i \in N^+_{v_j}, and -\gamma / |N^-_{v_j}| if u_i \in N^-_{v_j},    (4.15)

and for each {v_i, v_j} \in C_{o,o}, let s({v_i, v_j}) be any negative constant satisfying -\gamma < s({v_i, v_j}) < -\gamma + \delta. As shown in [1, page 186], if the given set of states of the input units represents one of the digits 0, 1, 2, ..., 9, then the optimal configuration maximizing the consensus function (obtained by choosing the optimal configuration of the remaining units, i.e., of the output units) identifies the correct input digit.

5. Summary

In this paper, we have simulated annealing processes and solved a pattern recognition problem using the simulated annealing algorithm and Boltzmann machines. Combinatorial optimization and pattern recognition (a part of artificial intelligence) are two of the fields in which Boltzmann machines can be used. We interpreted the simulated annealing processes in sections 2 and 3 as minimization problems, in particular the minimization of the consensus function over all configurations of an appropriate Boltzmann machine; the states of the units represented the temperatures of the units. The simulated annealing algorithm has close connections with statistical mechanics: in the annealing process, one can interpret the states (respectively, free energy) of the solid in the cooling process as configurations (respectively, consensus function) of a Boltzmann machine [1, 2].

Pattern recognition, briefly, seeks a correct output for a given input. As a combinatorial optimization problem, the seven-segment display problem is construed as a constrained optimization problem. An appropriate Boltzmann machine has a set of units which can be separated into two subsets, namely, the subset of input units and the subset of output units. Giving an input is equivalent to giving a configuration of the input units.
Once an input is given, the objective is to find a configuration of the remaining units (the output units) which maximizes the consensus function of the Boltzmann machine.

6. References

1. AARTS, E. and KORST, J., Simulated Annealing and Boltzmann Machines, Wiley-Interscience Series in Discrete Mathematics and Optimization, John Wiley and Sons, 1989.
2. BONOMI, E. and LUTTON, J.-L., The N-City Travelling Salesman Problem: Statistical Mechanics and the Metropolis Algorithm, SIAM Review 26, No. 4, October 1984.
3. HORN, R.A. and JOHNSON, C.R., Matrix Analysis, Cambridge University Press, 1990.
4. MERINO, D.I., REYES, E.N., and STEIDLEY, C., Using Matrix Analysis to Approach the Simulated Annealing Algorithm, accepted for publication in Computers in Education Journal of the American Society for Engineering Education.
5. REYES, E.N. and STEIDLEY, C., A GAP Approach to the Simulated Annealing Algorithm, Computers in Education Journal of the American Society for Engineering Education, Vol. 7 (1997), No. 3, 43-47.
6. REYES, E.N. and STEIDLEY, C., A Theoretical Discussion of the Mathematics Underlying "A GAP Approach to the Simulated Annealing Algorithm", Computers in Education Journal of the American Society for Engineering Education, Vol. 7 (1997), No. 4, 50-57.
7. SCHONERT, M. et al., GAP - Groups, Algorithms, and Programming, Version 3 Release 4, Lehrstuhl D fur Mathematik, RWTH Aachen, Germany, 1994.