Fast Self-Organizing Feature Map Algorithm
Mu-Chun Su and Hsiao-Te Chang
Abstract—We present an efficient approach to forming feature maps. The method involves three stages. In the first stage, we use the K-means algorithm to select M² (i.e., the size of the feature map to be formed) cluster centers from a data set. Then a heuristic assignment strategy is employed to organize the M² selected data points into an M × M neural array so as to form an initial feature map. If the initial map is not good enough, it will be fine-tuned by the traditional Kohonen self-organizing feature map (SOM) algorithm under a fast cooling regime in the third stage. With this three-stage method, a topologically ordered feature map is formed very quickly, instead of requiring the huge number of iterations usually needed by the conventional SOM algorithm to fine-tune the weights toward the density distribution of the data points. Three data sets are utilized to illustrate the proposed method.

Index Terms—Neural networks, self-organizing feature map, unsupervised learning.

I. INTRODUCTION
The self-organizing feature map transforms incoming signal patterns of arbitrary dimension into one- or two-dimensional (2-D) arrays of neurons and performs this transform adaptively in a topologically ordered fashion. The essential constituents of feature maps are as follows [2]:
• an array of neurons that compute simple output functions of incoming inputs of arbitrary dimensionality;
• a mechanism for selecting the neuron with the largest output;
• an adaptive mechanism that updates the weights of the selected neuron and its neighbors.

The training algorithm proposed by Kohonen for forming a feature map is summarized as follows.

Step 1) Initialization: Choose random values for the initial weight vectors w_j(0).

Step 2) Winner Finding: Find the winning neuron i at time t, using the minimum-distance Euclidean criterion

\[ i(t) = \arg\min_{j} \|\mathbf{x}(t) - \mathbf{w}_j(t)\|, \qquad j = 1, 2, \ldots, l \tag{1} \]

where x(t) represents the tth input pattern, l is the total number of neurons, and ‖·‖ indicates the Euclidean norm.

Step 3) Weights Updating: Adjust the weights of the winner and its neighbors, using the following rule:

\[ \mathbf{w}_j(t+1) = \mathbf{w}_j(t) + \eta\,\Lambda_{i(t)}(j, t)\,[\mathbf{x}(t) - \mathbf{w}_j(t)] \tag{2} \]

where η is a positive constant and Λ_{i(t)}(j, t) is the topological neighborhood function of the winner neuron i at time t. It should be emphasized that the success of the map formation is critically dependent on how the values of the main parameters (i.e., η and Λ), the initial values of the weight vectors, and the number of iterations are prespecified.
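The winner-finding and update rules in (1) and (2) are straightforward to implement. The following is a minimal NumPy sketch of a single training step on an M × M grid; the function and parameter names are ours, and the Gaussian neighborhood is one common choice for Λ, not necessarily the one used later in this paper.

```python
import numpy as np

def som_step(x, weights, grid, eta=0.5, sigma=1.0):
    """One training step of the conventional SOM for a single input x.

    weights : (M*M, d) array, one weight vector per neuron.
    grid    : (M*M, 2) array, the (row, col) position of each neuron.
    eta     : learning rate, the positive constant in (2).
    sigma   : width of the Gaussian neighborhood standing in for Lambda.
    """
    # Step 2) winner finding: minimum Euclidean distance, as in (1).
    winner = int(np.argmin(np.linalg.norm(weights - x, axis=1)))
    # Step 3) weight updating: every neuron moves toward x, scaled by a
    # Gaussian neighborhood centered on the winner, as in (2).
    d2 = np.sum((grid - grid[winner]) ** 2, axis=1)
    h = np.exp(-d2 / (2.0 * sigma ** 2))
    weights += eta * h[:, None] * (x - weights)
    return winner

# Tiny usage example: a 5 x 5 map trained on random 2-D inputs.
rng = np.random.default_rng(0)
M, d = 5, 2
weights = rng.random((M * M, d))
grid = np.array([(r, c) for r in range(M) for c in range(M)], dtype=float)
for x in rng.random((100, d)):
    som_step(x, weights, grid)
```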
It has been noted that the success of conventional feature map formation (e.g., trained by the Kohonen self-organizing feature map algorithm) is critically dependent on how the main parameters of the algorithm, namely, the learning rate and the neighborhood function, are selected and how the weights are initialized. Unfortunately, a process of trial and error usually determines them. Often one realizes only at the end of a simulation, which usually requires a huge number of iterations, that different selections of the parameters or initial weights would have been more appropriate. In addition, another problem associated with the Kohonen self-organizing feature map (SOM) algorithm is that it tends to overrepresent regions of low input density and underrepresent regions of high input density [11]. Several different approaches have been proposed to improve the conventional SOM algorithm. Some researchers used genetic algorithms to form feature maps [12]–[15]. Lo and Bavarian addressed the effect of neighborhood function selection on the rate of convergence of the SOM algorithm [16].

III. THREE-STAGE METHOD
To begin with, let X denote a spatially continuous input space, the topology of which is defined by the metric relationship of the input vectors x ∈ X. The input vector x is a random variable distributed according to an unknown probability density function p(x). Let A denote a spatially discrete output space, the topology of which is endowed by arranging a set of neurons as the computation nodes of a lattice. Our objective is to generate a mapping Φ from X onto a two-dimensional topological structure A, as shown by

\[ \Phi : X \rightarrow A. \tag{3} \]

This mapping should have the following properties.
• Nearby input vectors are mapped onto neighboring locations in the output space A, and topologically close nodes in A should have similar signals being mapped onto them.
• Regions in the input space X from which input vectors are drawn with a high probability of occurrence are mapped onto larger domains of the output space A, and therefore with better resolution than regions in X from which input vectors are drawn with a low probability of occurrence.

We propose a method involving three stages to generate such a topologically ordered feature map. These three stages are discussed in the following three subsections.

A. Selection of Data Points

Let M × M denote the size of the square feature map to be generated. We use the K-means algorithm (also known as the C-means algorithm) to select M² cluster centers from a data set containing N data points. For a batch-mode operation, the K-means algorithm consists of the following steps.
1) Randomly choose M² initial cluster centers c_1(0), c_2(0), …, c_{M²}(0).
2) For each sample, find the cluster center nearest to it. Put the sample in the cluster identified with this nearest cluster center.
3) Compute the new cluster centers

\[ \mathbf{c}_j(t+1) = \frac{1}{N_j}\sum_{\mathbf{x} \in S_j(t)} \mathbf{x}, \qquad j = 1, 2, \ldots, M^2 \tag{4} \]

where N_j is the number of samples in S_j(t), the set of samples whose nearest cluster center is c_j(t) at time t.
4) If c_j(t+1) = c_j(t) for j = 1, 2, …, M², or the number of iterations has reached its prespecified number, the procedure is terminated. Otherwise, go to Step 2).

These M² data points selected by the K-means algorithm would reflect variations in the statistics of the input distribution as closely as possible. Unlike the K-means algorithm, the conventional SOM algorithm has to simultaneously update the weight vectors of the winner as well as its corresponding neighbors. Understandably, the K-means algorithm requires much less computation than the conventional SOM algorithm. Here we suggest terminating the updating procedure of the K-means algorithm by checking whether the number of iterations has reached the prespecified number, since we will use the SOM algorithm to fine-tune the cluster centers in stage 3.

After we have selected the M² most representative data points, we enter the second stage, where we try to arrange these M² data points into an M × M neural array so as to form a feature map that is as topologically ordered as possible.
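As a concrete illustration of this selection stage, the sketch below runs batch K-means to pick M² centers from a data set and stops after a fixed number of iterations, as suggested above; the variable names and the default iteration count are illustrative, not taken from the paper.

```python
import numpy as np

def select_centers(data, M, iterations=5, seed=0):
    """Batch K-means: select M*M cluster centers from data of shape (N, d)."""
    rng = np.random.default_rng(seed)
    k = M * M
    # 1) randomly choose k initial cluster centers from the data set.
    centers = data[rng.choice(len(data), size=k, replace=False)].copy()
    for _ in range(iterations):
        # 2) assign every sample to its nearest cluster center.
        d2 = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # 3) recompute each center as the mean of its samples, as in (4);
        #    an empty cluster simply keeps its previous center.
        for j in range(k):
            members = data[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    # 4) stop after the prespecified number of iterations; stage 3 will
    #    fine-tune the centers anyway.
    return centers

# Example: 579 random 2-D points and a 15 x 15 map give 225 centers.
rng = np.random.default_rng(1)
print(select_centers(rng.random((579, 2)), M=15).shape)  # (225, 2)
```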
B. Assignment of the Data Points

A number of algorithms have been proposed to represent data from a high-dimensional space in a low-dimensional space so as to preserve as far as possible the "internal structure" of the data in the high-dimensional space. They include metric multidimensional scaling (metric MDS) [25]–[27], minimal wiring [28], [29], minimal path length [29]–[31], minimal distortion [32], [33], dynamic link matching [34], [35], Sammon's method [36], nonmetric MDS [37], and the approach of Jones et al. [38]. Basically, these approaches construct such a mapping by optimizing some particular objective function. Goodhill and Sejnowski showed that several of these approaches could be seen as particular cases of a more general objective function [39], [40]. A problem associated with these approaches is that there are usually many local minima on the objective functions, and getting stuck in some local minimum is unavoidable if a
Fig. 8. The 225 cluster centers found by iterating the K-means algorithm five times.

Fig. 6. Projection of the iris data by Sammon's method.
we peel an onion layer by layer. There are (M + 1)/2 layers in a neural array containing M × M neurons. We label the layers in the manner shown in Fig. 3. Suppose we are to process the assignment of the neurons on the kth layer. Without loss of generality, we present the method of selecting the data points used to compute the weight vectors of the neurons on the top edge. In the same manner as in Step 2), we first select the data points whose summed distances are the largest ones. The coordinates of these data points are then assigned to be the weight vectors of the neurons on the top edge accordingly. We repeat the process until all weight vectors of the neurons in the neural array have been computed. Fig. 4 illustrates the whole procedure.
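Only part of the assignment strategy is recoverable from the fragment above, so the sketch below does not reproduce the selection criterion; it merely shows the layer (ring) indexing of an M × M array that the onion-peeling order relies on. The helper names are ours.

```python
def layer_of(r, c, M):
    """Layer index of neuron (r, c) in an M x M array, counted from the
    outermost ring (layer 1) inward, in the onion-peeling order."""
    return 1 + min(r, c, M - 1 - r, M - 1 - c)

def layers(M):
    """Group the neuron coordinates of an M x M array by layer."""
    groups = {}
    for r in range(M):
        for c in range(M):
            groups.setdefault(layer_of(r, c, M), []).append((r, c))
    return groups

# For M = 15 this yields (M + 1) // 2 = 8 layers: seven square rings of
# 56, 48, 40, 32, 24, 16, and 8 neurons, plus the single center neuron.
print([len(v) for _, v in sorted(layers(15).items())])
```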
Fig. 9. The resultant feature maps constructed by the three methods under the first cooling regime for the 579 data set: the left column is method 1; the center column is method 2; and the right column is method 3.
C. Fine-Tuning of the Initial Map

If the initial map constructed in the previous stage is not good enough, we use the Kohonen SOM algorithm under a fast cooling regime to fine-tune it. Since we start from an initial map that roughly preserves the topological property of the data set, we do not need a large number of iterations to fine-tune the map.

Fig. 10. The resultant feature maps constructed by the three methods under the second cooling regime for the 579 data set: the left column is method 1; the center column is method 2; and the right column is method 3.

Let N denote the number of training data. For the conventional SOM algorithm, each epoch requires the computation of N · M² distances, comparison operations to select the winners, and updates to modify the weight vectors. Here it should be emphasized that (2) usually involves a large amount of computation for the neighborhood function values if a Gaussian-type function is utilized to implement Λ. Although the K-means algorithm also requires the computation of distances, comparison operations to find the nearest cluster centers, and updates to modify the cluster centers within an epoch, the amount of computation involved in (4) is much less than the amount involved in (2). If we run the K-means algorithm T times, then T times that amount of computation is required to find the M² cluster centers. If T is not a huge number, then the whole computational complexity of the K-means algorithm is much less than that of the SOM algorithm. When we enter stage B, we first have to establish a look-up table for the pairwise distances among the M² selected data points to lessen the computational burden. Then the assignment procedure requires a number of summations and comparison operations; in Step 2), additional summations and comparison operations are required to initialize the neurons on the four edges of the network. Let C denote the total amount of computation required in Step 2). Since there are (M + 1)/2 layers in an M × M network, the total amount of computation required for Steps 2) and 3) will be less than (M + 1)/2 times C. If this total is not much greater than the amount required by a single SOM epoch, then the complexity of our method is less than that of the SOM algorithm.
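To make this fine-tuning stage concrete, here is a minimal sketch that runs a few SOM epochs over an already-initialized map while the learning rate and neighborhood width shrink quickly. The particular decay schedules below are illustrative assumptions; the paper's two cooling regimes are defined elsewhere in the text.

```python
import numpy as np

def fine_tune(data, weights, grid, epochs=10, eta0=0.5, sigma0=2.0):
    """Fine-tune an initialized M x M map under a fast cooling regime."""
    for epoch in range(epochs):
        frac = epoch / max(epochs - 1, 1)
        eta = eta0 * (1.0 - frac) + 0.01 * frac     # learning rate cools fast
        sigma = sigma0 * (1.0 - frac) + 0.5 * frac  # neighborhood shrinks fast
        for x in data:
            # winner finding, as in (1)
            winner = int(np.argmin(np.linalg.norm(weights - x, axis=1)))
            # neighborhood-weighted update, as in (2)
            d2 = np.sum((grid - grid[winner]) ** 2, axis=1)
            h = np.exp(-d2 / (2.0 * sigma ** 2))
            weights += eta * h[:, None] * (x - weights)
    return weights

# Example: fine-tune a 15 x 15 map (here initialized randomly for brevity).
rng = np.random.default_rng(2)
M = 15
grid = np.array([(r, c) for r in range(M) for c in range(M)], dtype=float)
weights = rng.random((M * M, 2))
fine_tune(rng.random((579, 2)), weights, grid)
```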
Fig. 11. The resultant calibrated maps constructed by the three methods under the first cooling regime for the iris data set: the left column is method 1; the center column is method 2; and the right column is method 3.

IV. SIMULATION RESULTS

Three data sets are used to illustrate the performance of our method. The first data set consists of 579 2-D data points, shown in Fig. 5. The second data set is the well-known iris data set, which consists of 150 four-dimensional (4-D) data points. A 2-D configuration for the iris data set using Sammon's projection
method is shown in Fig. 6. The 150 patterns in Fig. 6 have been labeled by category to highlight the separation of the three categories. The third data set is the animal data set originally introduced by Ritter and Kohonen [41]. Fig. 7 shows the 2-D projection of the 16 patterns obtained by Sammon's method. For all simulations of the three data sets, we iterated the K-means algorithm only five times to select the M² most representative data points from each data set. To illustrate the effectiveness of our method, we use three different approaches to forming feature maps. The first method uses the random initialization scheme. Method 2 randomly selects M² training data points to initialize the weight vectors. Our method is the third.
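The two baseline initialization schemes are simple enough to state in code; the sketch below shows them side by side (the names are ours). The third scheme is the K-means selection and assignment procedure of Section III.

```python
import numpy as np

def init_random(M, d, rng):
    """Method 1: purely random initial weight vectors."""
    return rng.random((M * M, d))

def init_random_samples(data, M, rng):
    """Method 2: initialize the M*M weight vectors with randomly drawn
    training samples."""
    idx = rng.choice(len(data), size=M * M, replace=False)
    return data[idx].copy()

rng = np.random.default_rng(4)
data = rng.random((579, 2))
w1 = init_random(15, 2, rng)             # method 1
w2 = init_random_samples(data, 15, rng)  # method 2
```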
Fig. 12. The resultant calibrated maps constructed by the three methods under the second cooling regime for the iris data set: the left column is method 1; the center column is method 2; and the right column is method 3.

TABLE I. ANIMAL NAMES AND BINARY ATTRIBUTES (ADAPTED FROM RITTER AND KOHONEN, 1989): IF AN ATTRIBUTE APPLIES TO AN ANIMAL, THE CORRESPONDING TABLE ENTRY IS 1; OTHERWISE, 0.
the weight vectors for the last one-third of the epochs. The intention of this cooling regime is to quickly cool down the training procedure.

Note that the parameter in the regime definitions represents the number of epochs in our simulations.

Example 1) 2-D Data Set: A 15 × 15 network is trained on the artificial data set shown in Fig. 5. Fig. 8 shows the 225 cluster centers found by the K-means algorithm. Figs. 9 and 10 show the resultant maps constructed by the three different methods under regimes 1 and 2, respectively. The left column, the center column, and the right column of these two figures show the maps constructed by method 1, method 2, and method 3, respectively. By viewing Fig. 9, we find that the conventional SOM algorithm under a slow cooling regime is insensitive to the initial weights. All three methods constructed topologically ordered maps. Note that although topologically ordered maps could be formed by the 10th epoch, more updates were still required to fine-tune the weights toward the density distribution of the data points. On the contrary, Fig. 10 tells us that the conventional SOM algorithm under a fast cooling regime is very sensitive to the initial weights, and only our method can form a good topologically ordered map. In the case of our method, the mesh remains untangled and quickly adapts in detail within ten epochs. On the other hand, the meshes in the left two columns of Fig. 10 first tangled and then tried to unfold themselves. Unfortunately, even when we used 90 more epochs to continue the training process, the incorrect topological ordering was not eliminated. In fact, the meshes remained tangled even when we continued the training process for another 900 epochs.

Example 2) Iris Data Set: The iris data set has three subsets (i.e., iris setosa, iris versicolor, and iris virginica), two of which overlap. The iris data lie in a four-dimensional space, and there are in total 150 patterns in the data set. Each class has 50 patterns. A network with 15 × 15 neurons was trained on the iris data set. Since it is not possible to visualize a 4-D mesh, we decided to provide "calibrated maps" so that one may easily validate whether the resultant maps are topologically ordered or not. A map is calibrated if the neurons of the network are labeled according to their responses to specific known input vectors. Throughout our simulations, such labeling was achieved by the so-called minimum-distance method (i.e., a neuron is labeled with class c if its nearest neighbor belongs to class c). The resultant calibrated maps are shown in Figs. 11 and 12 for regimes 1 and 2, respectively. We have manually drawn boundaries (thin black curves) between different classes on some calibrated maps in order to ease the comparisons of the three methods. Again, Fig. 11 confirms that the conventional SOM algorithm under a slow cooling regime is insensitive to the initial weights. Although the three maps shown in Fig. 11(b), (f), and (j) have no small fragmented regions, they have not yet matched the density distribution of the data set. As the learning procedure progressed, they matched the data distribution better. From Fig. 12(d), we find that the number of neurons responding most to class 1 (i.e., iris setosa) is much smaller than that of the other two kinds of neurons, indicating that the feature map is not well formed. This phenomenon also exists in Fig. 12(h). Furthermore, class 1 is split into two regions in Fig. 12(h), indicating a more serious defect. On the contrary, classes 1, 2, and 3 shown in Fig. 12(l) are almost separable from each other except for two small isolated regions. That classes 2 and 3 do overlap each other in the original 4-D space indicates that this feature map matches the data structure of the iris data. Again, a good topologically ordered map can be constructed more quickly and correctly by our method than by the SOM algorithm.
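A small sketch of the minimum-distance calibration described above: each neuron receives the class label of its nearest training sample. The function name and the toy data are ours.

```python
import numpy as np

def calibrate(weights, data, labels):
    """Label each neuron with the class of its nearest training sample."""
    d2 = ((weights[:, None, :] - data[None, :, :]) ** 2).sum(axis=2)
    return labels[d2.argmin(axis=1)]  # one class label per neuron

# Example with a toy three-class, 4-D data set and a 15 x 15 map.
rng = np.random.default_rng(3)
data = rng.random((150, 4))
labels = np.repeat(np.array([1, 2, 3]), 50)
weights = rng.random((225, 4))
print(calibrate(weights, data, labels).reshape(15, 15))
```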
Fig. 13. The resultant calibrated maps constructed by the three methods under the first cooling regime for the animal data set: the left column is method 1; the center column is method 2; and the right column is method 3.
Example 3) Animal Data Set: The animal data set was originally introduced by Ritter and Kohonen to illustrate the SOM for high-dimensional data sets. It consists of the descriptions of 16 animals by binary property lists, tabulated in Table I. We group these 16 animals into three classes ("1" represents bird, "2" represents carnivore, and "3" represents herbivore). Note that, by viewing Fig. 7, we find that the three classes of the animal data set are linearly separable from one another in the 2-D projection. The 13 properties constitute the input vector to the network of 11 × 11 neurons. The calibrated feature maps are shown in Figs. 13 and 14 for regimes 1 and 2, respectively.

Fig. 14. The resultant calibrated maps constructed by the three methods under the second cooling regime for the animal data set: the left column is method 1; the center column is method 2; and the right column is method 3.
From Fig. 13, we observe that all three methods can construct topologically ordered maps quickly. However, Fig. 14 demonstrates totally different results. In Fig. 14(d), classes 1, 2, and 3 span two, three, and two clusters, respectively, indicating that the feature map is not correctly formed because it is too fragmented. The map shown in Fig. 14(h) is also fragmented. On the contrary, from Fig. 14(l), we find that the three classes are entirely enclosed by
their population clusters, indicating that the feature map is well formed. Once again, a topologically ordered map can be constructed quickly and correctly by our method.

V. CONCLUSION

In this paper, an efficient method for forming a topologically ordered feature map is proposed. Observing the simulation results, we can make the following four concluding remarks.
1) The initial map constructed in the second stage already roughly preserves the topological property of the data set. Viewing Figs. 10(i), 12(i), and 14(i) can confirm this observation. In other words, we almost achieve one-pass learning.
2) The topological relations of the data can be better preserved if we fine-tune the initial map in the third stage. More importantly, we can use a fast cooling regime instead of the slow cooling regime usually adopted by the conventional SOM algorithm.
3) Our method is less sensitive to the selection of learning parameters than the conventional SOM algorithm, since our method can form topologically ordered feature maps under two different cooling regimes.
4) The conventional SOM algorithm under a slow cooling regime can construct topologically ordered feature maps no matter which initialization scheme is utilized. This can be confirmed by viewing Figs. 9, 11, and 13. The only price one has to pay is the large number of updates required to fine-tune the maps and move the weights toward the density distribution of the data.

REFERENCES
[1] D. J. Willshaw and C. Von Der Malsburg, "How patterned neural connections can be set up by self-organization," Proc. Roy. Soc. London B, vol. 194, pp. 431–445, 1976.
[2] T. Kohonen, "Self-organized formation of topologically correct feature maps," Biol. Cybern., vol. 43, pp. 59–69, 1982.
[3] H. J. Ritter, T. Martinetz, and K. J. Schulten, Neural Computation and Self-Organizing Maps: An Introduction. Reading, MA: Addison-Wesley, 1992.
[4] M. C. Su, N. DeClaris, and T. K. Liu, "Application of neural networks in cluster analysis," in Proc. IEEE Int. Conf. Systems, Man, and Cybernetics, Orlando, FL, Oct. 12–15, 1997, pp. 1–6.
[5] T. M. Martinetz, H. J. Ritter, and K. J. Schulten, "Three-dimensional neural net for learning visuomotor coordination of a robot arm," IEEE Trans. Neural Networks, vol. 1, pp. 131–136, 1990.
[6] T. Kohonen, "The 'neural' phonetic typewriter," Computer, vol. 21, pp. 11–22, Mar. 1988.
[7] S. P. Luttrell, "Hierarchical vector quantization," Proc. Inst. Elect. Eng., vol. 136, pp. 405–413, 1989.
[8] T. K. Kohonen, J. Kangas, J. Laaksonen, and K. Tokkola, LVQ-PAK: The Learning Vector Quantization Program Package, Helsinki Univ. Technol., Finland, 1992.
[9] F. Farata and R. Walker, "A study of the application of Kohonen-type neural networks to the travelling salesman problem," Biol. Cybern., vol. 64, pp. 463–468, 1991.
[10] T. Kohonen, E. Oja, O. Simula, A. Visa, and J. Kangas, "Engineering applications of the self-organizing map," Proc. IEEE, vol. 84, pp. 1358–1383, Oct. 1996.
[11] S. Haykin, Neural Networks—A Comprehensive Foundation. New York, NY: Macmillan, 1994, ch. 10.
[12] S. A. Harp and T. Samad, "Genetic optimization of self-organizing feature maps," in Proc. Int. Conf. Neural Networks, Seattle, WA, 1991, pp. 341–346.
[13] M. McInerney and A. Dhawan, "Training the self-organizing feature map using hybrids of genetic and Kohonen methods," in Proc. IEEE Int. Conf. Neural Networks, 1994, pp. 641–644.
[14] S. J. Huang and C. C. Hung, "Genetic algorithms enhanced Kohonen's neural networks," in Proc. IEEE Int. Conf. Neural Networks, 1995, pp. 708–712.
[15] M. C. Su and H. T. Chang, "Genetic-algorithm-based approach to self-organizing feature map and its application in cluster analysis," in Proc. IEEE Int. Joint Conf. Neural Networks, AK, 1995, pp. 2116–2121.
[16] Z.-P. Lo and B. Bavarian, "On the rate of convergence in topology preserving neural networks," Biol. Cybern., vol. 65, pp. 55–63, 1991.
[17] M. Y. Kiang, U. R. Kulkarni, M. Goul, A. Philippakis, R. T. Chi, and E. Turban, "Improving the effectiveness of self-organizing map networks using a circular Kohonen layer," in Proc. 30th Hawaii Int. Conf. System Sciences, 1997, pp. 521–529.
[18] B. Fritzke, "Growing cell structures—a self-organizing network for unsupervised and supervised learning," Neural Networks, vol. 7, no. 9, pp. 1441–1460, 1994.
[19] Y. P. Jun, H. Yoon, and J. W. Cho, "L learning: A fast self-organizing feature map learning algorithm based on incremental ordering," IEICE Trans. Inform. Syst., vol. E76, no. 6, pp. 698–706, 1993.
[20] J. Koh, M. Suk, and S. M. Bhandarkar, "A multilayer self-organizing feature map for range image segmentation," Neural Networks, vol. 8, no. 1, pp. 67–86, 1995.
[21] T. Samad and S. A. Harp, "Self-organization with partial data," Network: Computat. Neural Syst., vol. 3, pp. 205–212, 1992.
[22] M. M. Van Hulle and K. U. Leuven, "Globally-ordered topology-preserving maps achieved with a learning rule performing local weight updates only," in Proc. IEEE Workshop Neural Networks for Signal Processing, 1995, pp. 95–104.
[23] J. Hertz, A. Krogh, and R. G. Palmer, Introduction to the Theory of Neural Computation. Reading, MA: Addison-Wesley, 1991, ch. 9.
[24] L. Han, "Initial weight selection methods for self-organizing training," in Proc. IEEE Int. Conf. Intelligent Processing Systems, 1997, pp. 404–406.
[25] W. S. Torgerson, "Multidimensional scaling, I: Theory and method," Psychometrika, vol. 17, pp. 401–419, 1952.
[26] R. N. Shepard, "Multidimensional scaling, tree-fitting and clustering," Science, vol. 210, pp. 390–398, 1980.
[27] F. W. Young, Multidimensional Scaling: History, Theory, and Applications. Hillsdale, NJ: Erlbaum, 1987.
[28] G. Mitchison and R. Durbin, "Optimal numberings of an N × N array," SIAM J. Alg. Disc. Meth., vol. 7, pp. 571–581, 1986.
[29] R. Durbin and G. Mitchison, "A dimension reduction framework for understanding cortical maps," Nature, vol. 343, pp. 644–647, 1990.
[30] R. Durbin and D. J. Willshaw, "An analogue approach to the travelling salesman problem using an elastic net method," Nature, vol. 326, pp. 689–691, 1987.
[31] G. J. Goodhill and D. J. Willshaw, "Application of the elastic net algorithm to the formation of ocular dominance stripes," Network, vol. 1, pp. 41–59, 1990.
[32] S. P. Luttrell, "Derivation of a class of training algorithms," IEEE Trans. Neural Networks, vol. 1, pp. 229–232, 1990.
[33] S. P. Luttrell, "A Bayesian analysis of self-organizing maps," Neural Computat., vol. 6, pp. 767–794, 1994.
[34] E. Bienenstock and C. von der Malsburg, "A neural network for the retrieval of superimposed connection patterns," Europhys. Lett., vol. 3, pp. 1243–1249, 1987.
[35] E. Bienenstock and C. von der Malsburg, "A neural network for invariant pattern recognition," Europhys. Lett., vol. 4, pp. 121–126, 1987.
[36] J. W. Sammon, "A nonlinear mapping for data structure analysis," IEEE Trans. Comput., vol. C-18, pp. 401–409, 1969.
[37] R. N. Shepard, "The analysis of proximities: Multidimensional scaling with unknown distance function," Psychometrika, vol. 27, pp. 219–246, 1962.
[38] D. G. Jones, R. C. Van Sluyters, and K. M. Murphy, "A computational model for the overall pattern of ocular dominance," J. Neurosci., vol. 11, pp. 3794–3808, 1991.
[39] G. J. Goodhill and T. J. Sejnowski, "Quantifying neighborhood preservation in topographic mappings," in Proc. 3rd Joint Symp. Neural Computation, 1996, pp. 61–82.
[40] G. J. Goodhill and T. J. Sejnowski, "A unifying objective function for topographic mappings," Neural Computat., vol. 9, pp. 1291–1303, 1997.
[41] H. J. Ritter and T. Kohonen, "Self-organizing semantic maps," Biol. Cybern., vol. 61, pp. 241–254, 1989.
Mu-Chun Su received the B.S. degree in electronics engineering from National Chiao-Tung University, Taiwan, in 1986, and the M.S. and Ph.D. degrees in electrical engineering from the University of Maryland, College Park, in 1990 and 1993, respectively.
He is an Associate Professor of electrical engineering at Tamkang University, Taiwan. His current interests include neural networks, fuzzy systems, computer-aided medical systems, assistive technology, pattern recognition, and image processing.
Dr. Su was the recipient of the IEEE Franklin V. Taylor Award for the most outstanding paper, coauthored with Dr. N. DeClaris and presented at the 1991 IEEE SMC Conference.

Hsiao-Te Chang received the B.S. and M.S. degrees in electrical engineering from Tamkang University, Taiwan, Republic of China, in 1996 and 1998, respectively.
His current research interests include neural networks, pattern recognition, and fuzzy systems.