Artificial Neural Networks For Pattern Recognition
B YEGNANARAYANA
Abstract. This tutorial article deals with the basics of artificial neural
networks (ANN) and their applications in pattern recognition. ANN can be
viewed as computing models inspired by the structure and function of the
biological neural network. These models are expected to deal with
problem solving in a manner different from conventional computing. A
distinction is made between pattern and data to emphasize the need for
developing pattern processing systems to address pattern recognition
tasks. After introducing the basic principles of ANN, some fundamental
networks are examined in detail for their ability to solve simple pattern
recognition tasks. These fundamental networks, together with the
principles of ANN, will lead to the development of architectures for complex
pattern recognition tasks. A few popular architectures are described to
illustrate the need to develop an architecture specific to a given pattern
recognition problem. Finally, several issues that still need to be addressed
to solve practical problems using the ANN approach are discussed.
1. Introduction
Human problem solving is basically a pattern processing problem and not a data
processing problem. In any pattern recognition task humans perceive patterns in the
input data and manipulate the pattern directly. In this paper we discuss attempts at
developing computing models based on artificial neural networks (ANN) to deal with
various pattern recognition situations in real life.
The search for new models of computing is motivated by our quest to solve natural
(intelligent) tasks by exploiting the developments in computer technology (Marcus
& van Dam 1991). The developments in artificial intelligence (AI) appeared promising
till a few years ago. But when the AI methods were applied to natural tasks such as
speech, vision and natural language processing, the inadequacies of the methods
showed up. Like conventional algorithms, AI methods also need a clear specification
of the problem, and mapping of the problem into a form suitable for the methods
to be applicable. For example, in order to apply heuristic search methods, one needs
to map the problem as a search problem. Likewise, to solve a problem using a
rule-based approach, it is necessary to explicitly state the rules governing it. Scientists
are hoping that computing models inspired by biological neural networks may provide
new directions to solving problems arising in natural tasks. In particular, it is hoped
that neural networks would extract the relevant features from input data and perform
the pattern recognition task by learning from examples, without explicitly stating the
rules for performing the task.
The objective of this tutorial paper is to present an overview of the current
approaches based on artificial neural networks for solving various pattern recognition
tasks. From the overview it will be evident that the current approaches still fall far
short of our expectations, and there is scope for evolving better models inspired by
the principles of operation of our biological neural network. This paper is organized
as follows: In § 2 we discuss the nature of patterns and pattern recognition tasks that
we encounter in our daily life. We make a distinction between pattern and data, and
also between understanding and recognition. In this section we also briefly discuss
methods available for dealing with pattern recognition tasks, and make a case for
new models of computing based on artificial neural networks. The basics of artificial
neural networks are presented in § 3, including a brief discussion on the operation of
a biological neural network, models of neuron and the neuronal activation and
synaptic dynamics. Section 4 deals with the subject matter of this paper, namely, the
use of principles of artificial neural networks to solve simple pattern recognition tasks.
This section introduces the fundamental neural networks that laid the foundation
for developing new architectures. In § 5 we discuss a few architectures for complex
pattern recognition tasks. In the final section we discuss several issues that need to
be addressed to develop artificial neural network models for solving practical
problems.
The current usage of terms like AI systems, intelligent systems, knowledge-based
systems, expert systems, etc., reflects the urge to build machines that
can demonstrate intelligence similar to that of human beings in performing some simple
tasks. In these tasks we look at the performance of a machine and compare it with
the performance of a person. We attribute intelligence to the machine if the
performances match. But the ways the tasks are performed by a machine and by a human
being are basically different; the machine performs the task in a step-by-step
sequential manner dictated by an algorithm, modified by some known heuristics.
The algorithm and the heuristics have to be derived for a given task. Once derived,
they generally remain fixed. Typically, implementation of these tasks requires a large
number of operations (arithmetic and logical) and also a large amount of memory.
The trends in computing clearly demonstrate the machine's ability to handle a large
number of operations (Marcus & van Dam 1991).
it is desirable to recall the correct pattern even though the test data are noisy or
incomplete. The problem of storage and recall of patterns is called autoassociation.
Since this is a content addressable memory function, the system should display
accretive behaviour, i.e., should recall the stored pattern closest to the given input. It
is also necessary to store as many patterns or pattern pairs as possible in a given
system.
Printed characters or any set of fixed symbols could be considered as examples of
patterns for these tasks. Note that the test patterns are the same as the training
patterns, but with some noise added, or some portions missing. In other words, the
test data are generated from the same source in an identical manner as the training
data.
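The accretive recall described above can be sketched in a few lines. The bit-vector encoding of the stored symbols and the use of Hamming distance below are illustrative assumptions, not details from this paper:

```python
def hamming(p, q):
    """Count the positions where two binary patterns differ."""
    return sum(1 for u, v in zip(p, q) if u != v)

def recall(stored, test):
    """Accretive recall: return the stored pattern closest to the test input."""
    return min(stored, key=lambda p: hamming(p, test))

# Two stored patterns (illustrative 8-bit codes for fixed symbols).
stored = [(1, 1, 1, 0, 0, 0, 1, 1),
          (0, 0, 1, 1, 1, 1, 0, 0)]

# Test pattern: the first stored pattern with one bit corrupted by noise.
noisy = (1, 1, 1, 0, 1, 0, 1, 1)
restored = recall(stored, noisy)   # recovers the uncorrupted stored pattern
```

Even with a missing or corrupted bit, the closest stored pattern is returned, which is the content-addressable-memory behaviour the text describes.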
2.3b Pattern mapping: In pattern mapping, given a set of input patterns and the
corresponding output pattern or class label, the objective is to capture the implicit
relationship between the patterns and the output, so that when a test input is given,
the corresponding output pattern or the class label is retrieved. Note that the system
should perform some kind of generalization as opposed to memorizing the information.
This can also be viewed as a pattern classification problem belonging to supervised
learning category. Typically, in this case the test patterns belonging to a class are
not the same as the training patterns, although they may originate from the same
source. Speech spectra of steady vowels generated by a person, or hand-printed
characters, could be considered as examples of patterns for pattern mapping problems.
Pattern mapping generally displays interpolative behaviour, whereas pattern
classification displays accretive behaviour.
2.3c Pattern grouping: In this case, given a set of patterns, the problem is to identify
the subset of patterns possessing similar distinct features and group them together.
Since the number of groups and the features of each group are not explicitly stated,
this problem belongs to the category of unsupervised learning or pattern clustering.
Note that this is possible only when the features are unambiguous as in the case of
hand-printed characters or steady vowels. In the pattern mapping problem the
patterns for each group are given separately, and the implicit, although distinct,
features have to be captured through the mapping. In pattern grouping on the other
hand, patterns belonging to several groups are given, and the system has to resolve
the groups.
Examples of the patterns for this task could be printed characters or hand-printed
characters. In the former case, the grouping can be made based on the data themselves.
Moreover, in that case the test data are also generated from an identical source as the
training data. For hand-printed characters or steady vowel patterns, the features of
the patterns in the data are used for grouping. Therefore in this case the test data are
generated from a similar source as the training data, so that only features are preserved
and not necessarily the actual data values.
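As a sketch of such unsupervised grouping, the classical k-means procedure can resolve unlabeled feature vectors into groups; k-means is a stand-in chosen here for illustration, since the paper does not name a specific grouping algorithm, and the data below are made up:

```python
def kmeans(points, centroids, iters=10):
    """Assign each point to its nearest centroid, then move each
    centroid to the mean of its group; repeat for a few iterations."""
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for p in points:
            j = min(range(len(centroids)),
                    key=lambda k: sum((a - c) ** 2
                                      for a, c in zip(p, centroids[k])))
            groups[j].append(p)
        centroids = [tuple(sum(cs) / len(g) for cs in zip(*g)) if g else c
                     for g, c in zip(groups, centroids)]
    return centroids, groups

# Six unlabeled 2-D feature vectors forming two natural groups.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
          (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centroids, groups = kmeans(points, centroids=[(0.0, 0.0), (1.0, 1.0)])
```

Neither the number of patterns per group nor their features is stated in advance; the procedure resolves the two groups from the data alone.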
2.3d Feature mapping: In several patterns the features are not unambiguous. In
fact the features vary over a continuum, and hence it is difficult to form groups of
patterns having some distinct features. In such cases, it is desirable to display the
feature changes in the patterns directly. This again belongs to the unsupervised
learning category. In this case what is learnt is the feature map of a pattern and not
the group or class to which the pattern may belong. This occurs, for example, in the
speech spectra for vowels in continuous speech. Due to changes in the vocal tract
shape for the same vowel occurring in different contexts, the features (formants or
resonances of the vocal tract in this case) vary over overlapping regions for different
vowels.
2.3e Pattern variability: There are many situations when the features in the pattern
undergo unspecified distortions each time the pattern is generated by the system.
This can be easily seen in the normal handwritten cursive script. Human beings are
able to recognize them due to some implicit interrelations among the features, which
themselves cannot be articulated precisely. Classification of such patterns falls into
the category of pattern variability task.
2.3f Temporal patterns: All the tasks discussed so far refer to the features present
in a given static pattern. Human beings are able to capture effortlessly the dynamic
features present in a sequence of patterns. This is true, for example, in speech where
the changes in the resonance characteristics of the vocal tract system (e.g. formant
contours) capture the significant information about the speech message. This is also
true in any dynamic scene situation. All such situations require handling sequences
of static patterns simultaneously, looking for changes in the features in the subpatterns
in adjacent pattern pairs.
2.3g Stability-plasticity dilemma: In any pattern recognition task the input patterns
keep changing. Therefore it is difficult to freeze the categorization task based on a
set of patterns used in the training set. If it is frozen, then the system cannot learn
the category that a new pattern may suggest. In other words, the system lacks
plasticity. On the other hand, if the system is allowed to change its categorization
continuously, based on new input patterns, it cannot be used for any application
such as pattern classification or clustering, as it is not stable. This is called the
stability-plasticity dilemma in pattern recognition.
Methods for solving pattern recognition tasks generally assume a sequential model
for the pattern recognition process, consisting of pattern environment, sensors to
collect data from the environment, feature extraction from the data and association/
storage/classification/clustering using the features.
The simplest solution to a pattern recognition problem is to use template matching,
where the data of the test pattern are matched point by point with the corresponding
data in the reference pattern. Obviously, this can work only for very simple and
highly restricted pattern recognition tasks. At the next level of complexity, one can
assume a deterministic model for the pattern generation process, and derive the
parameters of the model from given data in order to represent the pattern information
in the data. Matching of test and reference patterns is done at the parametric level. This
works well when the model of the generation process is known with reasonable
accuracy. One could also assume a stochastic model for the pattern generation process,
and derive the parameters of the model from a large set of training patterns. Matching
between test and reference patterns can be performed by several statistical methods
like likelihood ratio, variance weighted distance, Bayesian classification etc. Other
approaches for pattern recognition tasks depend on extracting features from
parameters or data. These features may be specific for the task. A pattern is described
in terms of features, and pattern matching is done using descriptions of the features.
Another method based on descriptions is called syntactic or structural pattern
recognition, in which a pattern is expressed in terms of primitives suitable for the
classes of patterns under study (Schalkoff 1992). Pattern matching is performed by
matching the descriptions of the patterns in terms of the primitives. More recently,
methods based on the knowledge of the sources generating the patterns are being
explored for pattern recognition tasks. These knowledge-based systems express
knowledge in the form of rules for generating and perceiving patterns.
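A toy sketch of matching at the statistical level: each class is modelled by a Gaussian fitted to training samples, and a test value is assigned to the class with the higher likelihood. The class names, the one-dimensional feature, and the training values are all illustrative assumptions:

```python
import math

def fit_gaussian(samples):
    """Estimate the mean and variance of a class from training samples."""
    m = sum(samples) / len(samples)
    v = sum((x - m) ** 2 for x in samples) / len(samples)
    return m, v

def log_likelihood(x, model):
    """Log of the Gaussian density of x under a (mean, variance) model."""
    m, v = model
    return -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)

# Illustrative one-dimensional training features for two pattern classes.
models = {"A": fit_gaussian([1.0, 1.2, 0.9, 1.1]),
          "B": fit_gaussian([3.0, 3.1, 2.8, 3.2])}

def classify(x):
    """Assign the test value to the class with the higher likelihood."""
    return max(models, key=lambda c: log_likelihood(x, models[c]))
```

Unlike template matching, a test value need not coincide with any training value; it only has to be probable under one class model.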
The main difficulty in each of the pattern recognition techniques alluded to above
is that of choosing an appropriate model for the pattern generating process and
estimating the parameters of the model in the case of a model-based approach, or
extraction of features from data/parameters in the case of feature-based methods, or
selecting appropriate primitives in the case of syntactic pattern recognition, or deriving
rules in the case of a knowledge-based approach. It is all the more difficult when the
test patterns are noisy and distorted versions of the patterns used in the training
process. The ultimate goal is to impart to a machine the pattern recognition
capabilities comparable to those of human beings. This goal is difficult to achieve
using most of the conventional methods, because, as mentioned earlier, these methods
assume a sequential model for the pattern recognition process. On the other hand,
the human pattern recognition process is an integrated process involving the use of
biological neural processing even from the stage of sensing the environment. Thus
the neural processing takes place directly on the data for feature extraction and
pattern matching. Moreover, the large size (in terms of number of neurons and
interconnections) of the biological neural network and the inherently different
mechanism of processing are attributed to our abilities of pattern recognition in spite
of variability and noise in the data. Moreover, we are able to deal effortlessly with
temporal patterns and also with the so-called stability-plasticity dilemma as well.
It is for these reasons attempts are being made to explore new models of computing,
inspired by the structure and function of the biological neural network. Such models
for computing are based on artificial neural networks, the basics of which are
introduced in the next section.
(Figure: schematic of two interconnected biological neurons, neuron 1 and neuron 2, with outputs going to other neurons.)
Although all neurons operate on the same basic principle, there exist several different
types of neurons, distinguished by the size and degree of branching of their dendritic
trees, the length of their axons, and other structural details. The complexity of the
human central nervous system is due to the vast number of neurons and their mutual
connections. Connectivity is characterized by the complementary properties of
convergence and divergence. In the human cortex every neuron is estimated to receive
converging input on the average from about 10^4 synapses. On the other hand, each
cell feeds its output into many hundreds of other neurons. The total number of
neurons in the human cortex is estimated to be in the vicinity of 10^11, and they are
distributed in layers over the full depth of the cortical tissue at a constant density of about
150,000 neurons per square millimetre. Combined with the average number of
synapses per neuron, this yields a total of about 10^15 synaptic connections in the
human brain, the majority of which develop during the first few months after birth.
The study of the properties of complex systems built of simple, identical units, may
lead to an understanding of the mode of operation of the brain in its various functions,
although we are still far from it.
The simplified schematic and uniform connectionist units offer a surprisingly rich
structure when assembled in a closely interconnected network. We shall call such a
network an artificial neural network. Since artificial neural networks are implemented
on computers, it is worth comparing the processing capabilities of computers with
those of the biological neural networks (Simpson 1990).
Neural networks are slow in processing information. The cycle time corresponding
to execution of one step of a program in a computer is in the range of a few
nanoseconds, whereas the cycle time corresponding to a neural event prompted by
an external stimulus, is in the milliseconds range. Thus computers process information
a million times faster.
Neural networks perform massively parallel operations. Most programs operate
in a serial mode, one instruction after another, in a conventional computer, whereas
the brain operates with massively parallel programs that have comparatively fewer
steps.
Neural networks have large numbers of computing elements, and the computing
is not restricted to within neurons. The conventional computer typically has one
central processing unit where all the computing takes place.
Neural networks store information in the strengths of the interconnections. In a
computer, information is stored in the memory which is addressed by its location.
New information is added by adjusting the interconnection strengths without
completely destroying the old information, whereas in a computer the information
is strictly replaceable.
Neural networks distribute the encoded information throughout the network, and
hence they exhibit fault tolerance. In contrast, computers are inherently not fault
tolerant, in the sense that information corrupted in the memory cannot be retrieved.
There is no central control in processing information in the brain. Thus there is
no specific control mechanism external to the computing task. In a computer, on the
other hand, there is a control unit which monitors all the activities of computing.
While the superiority of the human information processing system over the
conventional computer for pattern recognition tasks stems from the basic structure
and operation of the biological neural network, it is possible to realize some of the
features of the human system using an artificial neural network consisting of basic
computing elements. In particular, it is possible to show that such a network exhibits
3.2a Processing unit: We can consider an artificial neural network (ANN) as a highly
simplified model of the structure of the biological neural network. An ANN consists
of interconnected processing units. The general model of a processing unit consists
of a summing part followed by an output part. The summing part receives n input
values, weighs each value, and performs a weighted sum. The weighted sum is called
the activation value. The sign of the weight for each input determines whether the
input is excitatory (positive weight) or inhibitory (negative weight). The inputs could
be discrete or continuous data values, and likewise the outputs also could be discrete
or continuous. The input and output may also be viewed as deterministic or stochastic
or fuzzy, depending on the nature of the problem and its solution.
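The summing part of the processing unit can be written directly; the numerical values below are illustrative:

```python
def activation(inputs, weights, theta=0.0):
    """Weighted sum of the inputs minus the threshold theta.
    A positive weight makes an input excitatory, a negative
    weight makes it inhibitory."""
    return sum(w * a for w, a in zip(weights, inputs)) - theta

# Two excitatory inputs and one inhibitory input (illustrative values).
x = activation(inputs=[1.0, 0.5, 1.0],
               weights=[0.8, 0.6, -0.4],
               theta=0.2)   # 0.8 + 0.3 - 0.4 - 0.2, i.e. about 0.5
```

The value x is the activation value; the output part of the unit then applies an output function to it.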
3.2c Operations: In operation, each unit of an ANN receives inputs from other
connected units and/or from an external source. A weighted sum of the inputs is
computed at a given instant of time. The resulting activation value determines the
actual output from the output function unit, i.e., the output state of the unit. The
output values and other external inputs in turn determine the activation and output
states of the other units. The activation values of the units (activation state) of the
network as a function of time are referred to as activation dynamics. The activation
dynamics also determine the dynamics of the output state of the network. The set
of all activation states defines the state space of the network. The set of all output
states defines the output or signal state space of the network. Activation dynamics
determines the trajectory of the path of the states in the state space of the network.
For a given network, defined by the units and their interconnections with
appropriate weights, the activation states refer to the short term memory function of
the network. Generally the activation dynamics is followed to recall a pattern stored
in a network.
In order to store a pattern in a network, it is necessary to adjust the weights of
the network. The set of all weight values (corresponding to the strengths of all connecting
links of an ANN) defines the weight space. If the weights are changing, then the set of
weight values as a function of time defines the synaptic dynamics of the network.
Synaptic dynamics is followed to adjust the weights in order to store given patterns
in the network. The process of adjusting the weights is referred to as learning. Once
the learning process is completed, the final set of weight values corresponds to the
long term memory function of the network. The procedure to incrementally update
each of the weights is called a learning law or learning algorithm.
3.2d Update: In implementation, there are several options available for both
activation and synaptic dynamics. In particular, the updating of the output states of
all units could be performed synchronously. In this case, the activation values of all
units are computed at the same time assuming a given output state throughout. From
these activation values the new output state of the network is derived. In an asynchronous
update, on the other hand, each unit is updated sequentially, taking the current output
state of the network into account each time. For each unit, the output state can be
determined from the activation value either deterministically or stochastically.
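The difference between the two update modes can be seen on a tiny two-unit network with mutual inhibition; the weights, biases, and the hard-limiting output function below are illustrative assumptions:

```python
def step(x):
    """Hard-limiting output function."""
    return 1 if x >= 0 else 0

# Two units inhibiting each other, each with a small positive bias.
W = [[0.0, -1.0],
     [-1.0, 0.0]]
bias = [0.5, 0.5]

def synchronous_update(s):
    """All units are computed from the same previous output state."""
    return [step(sum(W[i][j] * s[j] for j in range(2)) + bias[i])
            for i in range(2)]

def asynchronous_update(s):
    """Units are updated one at a time, each seeing the latest state."""
    s = list(s)
    for i in range(2):
        s[i] = step(sum(W[i][j] * s[j] for j in range(2)) + bias[i])
    return s

# From state [1, 1] the synchronous update oscillates between [1, 1]
# and [0, 0], while the asynchronous update settles at [0, 1].
```

The same network can thus oscillate under one update scheme and reach a fixed state under the other, which is why the choice of update matters in implementation.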
In practice, the activation dynamics, including the update, is much more complex
in a biological neural network. The ANN models along with the equations governing
the activation and synaptic dynamics are developed according to the complexity of
the pattern recognition task to be handled.
In this section we will consider three classical models for an artificial neuron or
processing unit.
(Figure: model of an artificial neuron, with inputs a_1, ..., a_n weighted by w_1, ..., w_n, a summing part computing the activation x, and an output function producing the signal s = f(x).)
In this model the weights w_i are constant. That means there is no learning. Networks
consisting of MP neurons with binary (on-off) output signals can be configured to
perform several logical functions (McCulloch & Pitts 1943).
activation: x = Σ_{i=1}^{n} w_i a_i − θ.
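For instance, AND and OR can be realized by fixed weights and thresholds. This is a standard construction; the particular weight and threshold values below are one possible choice:

```python
def mp_neuron(inputs, weights, theta):
    """McCulloch-Pitts unit: binary output 1 when the weighted sum of
    the inputs reaches the threshold theta. The weights are fixed,
    so the unit does not learn."""
    x = sum(w * a for w, a in zip(weights, inputs)) - theta
    return 1 if x >= 0 else 0

def AND(a1, a2):
    return mp_neuron([a1, a2], weights=[1, 1], theta=2)

def OR(a1, a2):
    return mp_neuron([a1, a2], weights=[1, 1], theta=1)
```

Both units use the same weights; only the threshold changes, which determines how many active inputs are needed to fire the unit.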
(Figure: Rosenblatt's perceptron model, with adjustable input weights, an activation value x, and a binary output signal s = f(x).)
nature of the desired input-output pair to be represented by the model. The perceptron
convergence theorem (Rosenblatt 1962) enables us to determine whether a given
pattern pair is representable or not. If the weight values converge, then the
corresponding problem is said to be representable by the perceptron network.
3.3c Adaline: The main distinction between Rosenblatt's perceptron model and
Widrow's adaline model (figure 5) is that in the adaline model the analog activation
value (x) is compared with the target output (b). In other words, the output is a linear
function of the activation value (x). The equations that describe the operation of an
adaline are as follows (Widrow & Hoff 1960):
activation: x = Σ_{i=1}^{n} w_i a_i − θ,
output signal: s = f(x) = x,
error: δ = b − s = b − x,
weight update: dw_i/dt = η δ a_i.
This rule minimizes the mean squared error δ², averaged over all inputs. Hence it is
called the least mean squared (LMS) error learning law. The law is derived using the
negative gradient of the error surface in the weight space. Hence it is also called a
gradient descent algorithm.
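The LMS rule can be sketched as an iterative procedure. The training data, learning rate, and epoch count below are illustrative; the target is an exactly linear function of the inputs, so the weights can converge to it:

```python
def train_adaline(samples, eta=0.1, epochs=500):
    """Adaline trained with the LMS (delta) rule: for each (input, target)
    pair the weights move along the negative gradient of the squared
    error, w <- w + eta * (b - x) * a, with x = sum_i w_i a_i - theta."""
    n = len(samples[0][0])
    w = [0.0] * n
    theta = 0.0
    for _ in range(epochs):
        for a, b in samples:
            x = sum(wi * ai for wi, ai in zip(w, a)) - theta  # activation
            delta = b - x                                     # error b - s
            w = [wi + eta * delta * ai for wi, ai in zip(w, a)]
            theta -= eta * delta
    return w, theta

# Learn the linear map b = 2*a1 - a2 from four samples (made-up data).
samples = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0),
           ([1.0, 1.0], 1.0), ([2.0, 1.0], 3.0)]
w, theta = train_adaline(samples)   # w approaches [2, -1], theta approaches 0
```

Because the output is linear in the activation, the error surface is quadratic in the weights and the gradient descent has a single minimum.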
3.4 Topology
Artificial neural networks are useful only when the processing units are organized in
a suitable manner to accomplish a given pattern recognition task. This section presents
a few basic structures which will assist in evolving new architectures. The arrangement
of the processing units, connections, and pattern input/output is referred to as
topology (Simpson 1992, pp. 3-24).
Artificial neural networks are normally organized into layers of processing units.
Connections can be made either from units of one layer to units of another (interlayer
connections) or from the units within the layer (intralayer connections) or both inter
and intralayer connections. Further, the connections among the layers and among
the units within a layer can be organized either in a feedforward manner or in a
feedback manner. In a feedback network the same processing unit may be visited
more than once.
We will discuss a few basic structures which form building blocks for complex
neural network architectures. Let us consider two layers F1 and F2 with N and M
processing units, respectively. By providing connections to the jth unit in F2 from
all the units in F1, as shown in figures 6a and b, we get two network structures, the instar
and the outstar, which have fan-in and fan-out geometries, respectively. The units in the
Figure 6. Some basic structures of artificial neural networks: (a) instar, (b) outstar, (c) group of instars, (d) group of outstars, (e) bidirectional associative memory, and (f) autoassociative memory.
F1 layer are linear units, so that for each unit i in this layer the input (a_i) = activation
(x_i) = output signal (s_i). In the instar, during learning, the weight vector w_j = (w_j1, w_j2, ..., w_jN)
is adjusted so as to approach the given input vector a at the F1 layer. Therefore, whenever
the input is given to F1, the jth unit of F2 will be activated to the maximum extent.
Thus the operation of the instar can be viewed as content addressing the memory. In
the case of the outstar, during learning, the weight vector for the connections from the
jth unit in F2 approaches the activity pattern in F1 when the input vector a is present
at F1. During recall, whenever unit j is activated, the signal pattern (s_j w_1j,
s_j w_2j, ..., s_j w_Nj) will be transmitted to F1, which then produces the original activity
pattern corresponding to the input vector a, although the input is absent. Thus the
operation of the outstar can be viewed as memory addressing the contents.
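The instar adjustment can be sketched with a common form of the rule, Δw_ji = η(a_i − w_ji), under which the weight vector moves toward the presented input. The paper states only that the weight vector approaches a, so this particular update (and the values below) is an assumption:

```python
def train_instar(w, a, eta=0.5, presentations=20):
    """Move the weight vector of one F2 unit toward the input vector a
    presented at F1, using w <- w + eta * (a - w)."""
    for _ in range(presentations):
        w = [wi + eta * (ai - wi) for wi, ai in zip(w, a)]
    return w

a = [1.0, 0.0, 0.5]                     # input pattern at F1 (illustrative)
w = train_instar([0.0, 0.0, 0.0], a)
# After repeated presentations w is numerically close to a, so this
# unit responds maximally when a is given again: content addressing.
```

Each presentation closes half of the remaining gap between w and a, so the weight vector converges geometrically toward the input.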
When all the connections between units in F1 and F2 are made as in figure 6c, then
we obtain a heteroassociation network. This network can be viewed as a group of
instars if the flow is from F1 to F2. On the other hand, if the flow is from F2 to F1,
then the network can be viewed as a group of outstars (figure 6d).
When the flow is bidirectional, and the weights are symmetric (w_ij = w_ji), then we
get a bidirectional associative memory (figure 6e), where either of the layers can be
used as input/output.
If the two layers F1 and F2 coincide, then we obtain an autoassociative memory
in which each unit is connected to every other unit and to itself (figure 6f).
3.5a Models of activation dynamics: Different models have been proposed for the
activation dynamics; the most common among them are the additive and
shunting activation models. The additive activation model gives the rate of change of
the activation of the ith unit as (Grossberg 1988; Carpenter 1989)

dx_i/dt = −x_i + Σ_j w_ij s_j − θ_i + I_i.

In this equation the first term on the right hand side contributes a passive decay
term. The net excitatory and inhibitory inputs are contributed by signals from other
units, appropriately weighted by the synaptic strengths, and by the externally applied
inputs.
In the steady state there will not be any change in activation, that is, dx_i/dt = 0.
In such a case the activation value is given by the net excitatory and inhibitory inputs.
That is,

x_i = Σ_j w_ij s_j − θ_i + I_i.
In the shunting activation model, the activity x_i remains bounded in the range (−B, A),
and it decays to the resting level 0 in the absence of all inputs. In this model the excitatory inputs drive
the activity towards a finite maximum A, and the inhibitory inputs drive the activity
towards a finite minimum −B. The shunting model represents a special case of the
Hodgkin-Huxley membrane equations describing the physiology of single nerve cell
dynamics (Hodgkin & Huxley 1952).
The activation models considered so far are called deterministic models. In practice,
the input/output patterns and the activation values can be considered as samples of
a random process, and the output signal of each unit may be a random function of
the unit's activation value. In such a case the network activation state can be viewed
as a vector stochastic process. Each unit in turn behaves as a scalar stochastic process
(Kosko 1992).
The function g may be viewed as a learning function that depends on the type of
learning adopted.
Continuous time learning can be expressed as

dw_i(t)/dt = η g[w_i(t), a(t), b_i(t)] a(t).

In discrete time learning, at the kth step the new weight is given by

w_i(k + 1) = w_i(k) + η g[w_i(k), a(k), b_i(k)] a(k).
There are different methods for implementing the synaptic dynamics. These methods
are called learning laws. A few common discrete time learning laws are given below
(Zurada 1992).
This rule is applicable for a bipolar output function. The weights can be initialized
to any values prior to learning.
where −∇E is the negative gradient of the error E = ½[b_i − f(w_i^T a)]².
Here f(.) is a continuous function. The weights may be initialized to any values.
where w_m^T a = max_l (w_l^T a). Here the weights are initialized to random values prior to
learning, and their lengths are normalized during learning.
available, whereas in on-line learning the information about the past patterns is
available only in the form of the current weights.
In practice, the training patterns can be considered as samples of random processes.
Learning laws could take into account the changes in the random process reflected
through the sample patterns. Thus one could define stochastic versions of the
deterministic learning laws described so far. The random learning laws are expressed
as first order stochastic differential equations; the random signal
Hebbian learning law, for example, relates random processes in this way (Kosko 1992).
Since the given data are sample functions of a random process,
the corresponding weights at each iteration are also random.
Synaptic equilibrium in the deterministic signal Hebbian law occurs in the steady
state when the weights stop changing, that is, when dw_ij/dt = 0.
In the stochastic case the synaptic weights reach a stochastic equilibrium when the
changes in the weights are contributed only by the random noise. That is, at stochastic
equilibrium, the expectation or ensemble average of the squared change in weights is
given by (Kosko 1990)

E[(dw_ij/dt)²] = σ².
3.5j Stability and convergence: So far the activation and synaptic dynamics
equations have been described in terms of first-order differential equations, which are
continuous time equations. Discrete time versions of these equations are convenient
for implementation of the network dynamics on a digital computer. In discrete time
implementation the activation state of each unit at each stage is computed in terms
of the state of the network in the previous stage. The state update at each stage could
be made asynchronously, i.e. each unit is updated using the new updated state, or
synchronously, i.e., all the units are updated using the same previous state.
Artificial neural networks for pattern recognition 207
3.5k Neural network recall: During learning, the weights are adjusted to store the
information in a given pattern or a pattern pair. However, during performance, the
weight-changes are suppressed, and the input to the network determines the output
activation values x_i or signal values s_j. This operation is called recall of stored information.
The recall techniques are different for feedforward and feedback networks.
The simplest feedforward network uses the following equation to compute the
output signal from the input data vector a to the input layer F1:

b'_i = f_i( Σ_{j=1}^{N} w_ij a_j ),

where f_i is the output function of the ith unit in the output layer F2. Here the units
in the input layer F1 are assumed to be linear.
A recall equation for a network with feedback connections is given by (Simpson
1992, pp. 3-24)
x_i(t + 1) = (1 − α) x_i(t) + β Σ_{j=1}^{N} f_j(x_j(t)) w_ij + a_i,
where x_i(t + 1) is the activation value of the ith unit in a single-layer neural network
at time (t + 1), f_j is the nonlinear output function of the jth unit, α is a positive
constant that regulates the amount of decay the unit has during the update interval,
β is a positive constant that regulates the amount of feedback the other units provide
to the ith unit, and a_i is the external input to the ith unit. In general, stability is the
main issue in feedback networks. If the network reaches a stable state in a finite
number of iterations, then the resulting output signals represent the nearest-neighbour
stored pattern of the system for the approximate input pattern a.
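The recall equation above can be iterated directly. The sketch below assumes tanh as the nonlinear output function and uses illustrative values for α, β, the weights and the external input; it simply repeats the update until the state stops changing.

```python
import math

def recall(w, a, alpha=0.5, beta=0.5, steps=200, tol=1e-6):
    """Iterate x_i(t+1) = (1-alpha)*x_i(t) + beta*sum_j f(x_j(t))*w_ij + a_i
    until the state stops changing (f is taken to be tanh here)."""
    n = len(a)
    x = [0.0] * n
    for _ in range(steps):
        x_new = [(1 - alpha) * x[i]
                 + beta * sum(math.tanh(x[j]) * w[i][j] for j in range(n))
                 + a[i]
                 for i in range(n)]
        if max(abs(x_new[i] - x[i]) for i in range(n)) < tol:
            return x_new
        x = x_new
    return x

# Hypothetical symmetric 2-unit network with a small external input.
w = [[0.0, 0.8], [0.8, 0.0]]
x = recall(w, a=[0.1, 0.1])
print([round(v, 3) for v in x])
```

With these mild parameters the iteration contracts to a single stable state; stronger feedback (larger β or weights) can instead produce oscillatory behaviour, which is exactly the stability issue discussed above.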
Cohen & Grossberg (1983) showed that for a wide class of neural networks with
certain constraints, the network with fixed weights reaches a stable state in a finite
period of time for any initial condition. Later Kosko showed that a neural network
could learn and recall at the same time, and yet remain stable (Kosko 1990).
The response of a network due to recall could be the nearest neighbour or interpolative.
In the nearest neighbour case, the stored pattern closest to the input pattern is recalled.
This typically happens in the feedforward pattern classification or feedback pattern
matching networks. In the interpolative case, the recalled pattern is a combination
of the outputs corresponding to the input training patterns nearest to the given input
test pattern. This happens in the feedforward pattern mapping networks.
1. Feedforward ANN
(a) Pattern association
(b) Pattern classification
(c) Pattern mapping/classification
2. Feedback ANN
(a) Autoassociation
(b) Pattern storage (LTM)
(c) Pattern environment storage (LTM)
We discuss artificial neural networks for the following pattern recognition problem
and for various special cases of the problem.
PROBLEM
Design a neural network to associate the pattern pairs (a_1, b_1), (a_2, b_2), ..., (a_L, b_L),
where a_l = (a_l1, a_l2, ..., a_lN) and b_l = (b_l1, b_l2, ..., b_lM) are N- and M-dimensional vectors,
respectively.
Figure 7 shows the organization of the networks and the pattern recognition tasks
to be discussed in this section. We consider three types of ANN: Feedforward, feedback
and a combination of both. We begin discussion of each network with only a minimal
structure, and study their capabilities and limitations. To start with, the feedforward
network consists of two layers of processing units, one layer with linear units for
receiving the external input, and the other layer for delivering the output. A minimal
feedback network consists of a set of processing units, each connected to all other
units. A combination network consists of an input layer of linear units feeding to
the output layer of units in a feedforward manner, and a feedback connection among
the units in the output layer, including self feedback. We consider each one of these
networks in some detail.
4.1a Pattern association: The objective is to design a linear network that can
capture the association in the pairs of vectors (a_l, b_l), l = 1, 2, ..., L, through a set of
weights to be determined by a learning or training law. The input data used in training
are typically generated synthetically, like machine printed characters. The input data
used for recall may be corrupted by external noise.
The network consists of a set of weights connecting the two layers of processing
units, the output function of each unit being linear. Such a network is called a linear
associator network. Due to linearity of the output function of each unit, the activation
values and the output signals of the units in the input layer are same as the input
Pattern association
• Arch: two layers, linear processing units, single set of weights
• Learning: Hebb's (orthogonal) rule, delta (linearly independent) rule
• Recall: direct
• Limitation: linear independence; number of patterns restricted to dimensionality
• To overcome: nonlinear processing units; becomes a pattern classification problem

Pattern classification
• Arch: two layers, nonlinear processing units, geometrical interpretation
• Learning: delta rule
• Recall: direct
• Limitation: linearly separable functions, hard problems
• To overcome: more layers, hard learning problems

Pattern mapping/classification
• Arch: multilayer (hidden), nonlinear processing units, geometrical interpretation
• Learning: generalized delta rule (backpropagation)
• Recall: direct
• Limitation: slow learning
• To overcome: more complex architectures
data values. The activation value of the ith unit in the output layer is given by

y_i = Σ_{j=1}^{N} w_ij a_lj.
The output of the ith unit is the same as its activation value y_i, since the output
function of the unit is linear. The objective is to determine a set of weights w_ij in
such a way that the actual output b'_li is equal to the desired output b_li for all the L
pattern pairs.
If the input L pattern vectors {a~} are all orthogonal, then it is possible to use
Hebb's learning law to determine the optimal weights of the network (Hecht-Nielsen
1990). Note that a learning law enables updating of weights as patterns are applied
one by one to the network. The optimality of the weights is determined by minimizing
the mean squared error between the desired and the actual output values. The optimal
weights after l pattern pairs are fed to the network are given by

w_ij(l) = w_ij(l − 1) + b_li a_lj.

The final optimal weights for the pattern association task are given by

w_ij = w_ij(L) = Σ_{l=1}^{L} b_li a_lj.
If the input vectors {a_l} are only linearly independent, but not necessarily
orthogonal, then the optimal weights that minimize the mean squared error can be
obtained using the LMS learning law (Widrow & Hoff 1960; Hecht-Nielsen 1990).
Once the network is trained, for any given input pattern a_l, the associated pattern
b_l can be recalled using the equations

y_i = Σ_{j=1}^{N} w_ij a_lj and b'_li = y_i.
When noisy input patterns are used during recall, the recalled patterns will also be
noisy. Since the given set of input patterns {a_l}, l = 1, ..., L, is assumed
to be linearly independent, the number of patterns in the input set is limited to the
dimensionality of the input vector, namely, N. Therefore, it is not possible to store
more than N pattern pairs in a linear associative network. If the number of input
patterns is more than the dimension (N), or if the input set (even for L < N) is not
linearly independent, then the resulting weight vectors are no longer optimal. In
such a case the recall of the associated pattern for a given input pattern may not
always be correct.
Even if the input patterns are linearly independent and optimal weights are used,
the recall may be in error if a noisy input pattern is presented to recall the associated
pattern (Murakami & Aibara 1987).
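A minimal sketch of the linear associator trained with Hebb's rule follows; the two orthonormal input vectors and their associated outputs are invented for illustration, chosen so that recall is exact.

```python
def hebb_train(pairs):
    """Accumulate w_ij = sum_l b_li * a_lj (outer-product / Hebbian rule)."""
    n, m = len(pairs[0][0]), len(pairs[0][1])
    w = [[0.0] * n for _ in range(m)]
    for a, b in pairs:
        for i in range(m):
            for j in range(n):
                w[i][j] += b[i] * a[j]
    return w

def recall(w, a):
    # linear units: the output equals the activation y_i = sum_j w_ij * a_j
    return [sum(w[i][j] * a[j] for j in range(len(a)))
            for i in range(len(w))]

# Orthonormal input vectors, so Hebb's rule gives exact recall.
pairs = [([1.0, 0.0], [0.5, -0.5]),
         ([0.0, 1.0], [0.3, 0.7])]
w = hebb_train(pairs)
print(recall(w, [1.0, 0.0]))  # → [0.5, -0.5]
```

If the inputs were merely linearly independent rather than orthogonal, the cross-terms in the sum would corrupt recall, which is why the LMS law is then needed instead.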
In practice, linear independence is too severe a restriction to satisfy. Moreover the
number of input patterns may far exceed the dimensionality of the input pattern
space. It is possible to overcome these limitations by using nonlinear output functions
in the processing units of the feedforward ANN. Once the restriction on the number
of input patterns is removed, then the problem becomes a pattern classification
problem, which we will discuss in the next section.
(a)
a1 a2 | b1
 0  0 |  0
 0  1 |  0
 1  0 |  0
 1  1 |  1

(b)
a1 a2 | b1
 0  0 |  1
 0  1 |  0
 1  0 |  0
 1  1 |  1

Figure 9. Two pattern classification problems: (a) a linearly separable problem; (b) a problem that is not linearly separable. In
geometrical terms linear separability means that the given set of input patterns {a_l}
can be separated into M distinct regions in the N-dimensional pattern space by a
set of linear hyperplanes. Here M corresponds to the number of distinct output
patterns or classes. As a simple illustration, we can consider 2-dimensional binary
(0, 1) patterns in input pattern space and a 1-dimensional output pattern. Two pattern
classification problems are shown in figure 9. Note that the number of input patterns
(4) is more than the number of dimensions (2) of the input pattern space. These are
two-class problems, as the number of distinct outputs are two. Of the two problems
in the figure, the first one is linearly separable since a straight line separates the
patterns into two regions of desired classes. In the second problem the desired region
cannot be obtained by using a single straight line. Note that a straight line is equivalent
to a linear hyperplane in a 2-dimensional space.
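The contrast between the two problems of figure 9 can be demonstrated with a single threshold unit trained by the delta (perceptron) rule; the learning rate and epoch count below are arbitrary illustrative choices.

```python
def train_perceptron(samples, epochs=25, eta=0.1):
    """Delta-rule training of a single threshold unit with a bias weight.
    Returns the weights and whether every sample is classified correctly."""
    w = [0.0, 0.0, 0.0]  # w1, w2, bias
    for _ in range(epochs):
        for (a1, a2), b in samples:
            y = 1 if w[0] * a1 + w[1] * a2 + w[2] > 0 else 0
            err = b - y
            w[0] += eta * err * a1
            w[1] += eta * err * a2
            w[2] += eta * err
    ok = all((1 if w[0] * a1 + w[1] * a2 + w[2] > 0 else 0) == b
             for (a1, a2), b in samples)
    return w, ok

# Problem (a) of figure 9 is linearly separable; problem (b) is not.
prob_a = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
prob_b = [((0, 0), 1), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
_, ok_a = train_perceptron(prob_a)
_, ok_b = train_perceptron(prob_b)
print(ok_a, ok_b)  # the unit learns (a) but can never learn (b)
```

No setting of the two weights and bias can solve problem (b), since no single straight line separates its two classes; this is precisely the hard-problem limitation discussed above.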
The restriction of linear separability is due to the function relating input and output
patterns. Any arbitrary assignment of an output pattern to a set of input patterns
need not result in a linearly separable function, and hence cannot be represented by
the two layer network with nonlinear units in the output layer. Thus, although the
restriction on the number and type of input patterns (as in the case of pattern
association problem) is removed due to introduction of nonlinear units, a restriction
is now placed on the nature of the function relating the input and output patterns.
To remove this restriction a multilayer feedforward network with nonlinear processing
units can be used (Minsky & Papert 1988). Such a network can handle a more general
class of pattern classification problems, namely, pattern mapping problems, which will
be discussed next.
4.1c Pattern mapping: For a pattern mapping problem the input and output
patterns are points in the N- and M-dimensional continuous spaces, respectively. The
objective is to capture the implied functional relationship or mapping function
between the input and output by training a feedforward neural network. This is also
called the generalization problem (Denker et al 1987). Once the network generalizes
by capturing the mapping function through its weights, then during recall from an
input pattern the network produces an output which is an interpolated version of
the outputs of the training input patterns near the current input pattern. The input
patterns are generally naturally occurring patterns as in speech and hand-printed
characters.
A multilayer feedforward network with at least two intermediate layers in addition
to the input and output layers can perform a pattern mapping task (Cybenko 1989).
The number of units in the input and output layers correspond to the dimensions
of the input and output patterns, respectively. The additional layers are called hidden
layers, and the number of units in a hidden layer is determined depending on the
problem, usually by trial and error. The network can be trained (i.e. weights at
different layers can be adjusted) for a given set of input-output pattern pairs using
a generalized delta rule or backpropagation law (see figure 10) (Rumelhart &
McClelland 1986; Hush & Horne 1993). It is derived using the principle of gradient
descent along the error surface in the weight space. The given patterns are applied
in some random order one by one, and the weights are adjusted using the backpro-
pagation law. The pattern pairs may have to be applied several times till the output
error is reduced to an acceptable value.
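A bare-bones version of this training loop, assuming sigmoid units, a single hidden layer and illustrative learning parameters, might look as follows; this is a sketch of gradient-descent backpropagation, not a production implementation.

```python
import math, random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(w1, w2, a):
    # hidden-layer signals, then the single output unit (last entries are biases)
    h = [sigmoid(sum(w1[k][j] * a[j] for j in range(len(a))) + w1[k][-1])
         for k in range(len(w1))]
    y = sigmoid(sum(w2[k] * h[k] for k in range(len(h))) + w2[-1])
    return h, y

def train(pairs, hidden=3, eta=0.5, epochs=2000, seed=1):
    rng = random.Random(seed)
    n = len(pairs[0][0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n + 1)] for _ in range(hidden)]
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden + 1)]
    for _ in range(epochs):
        for a, b in pairs:
            h, y = forward(w1, w2, a)
            # output delta uses the sigmoid derivative y*(1-y)
            d_out = (b - y) * y * (1 - y)
            # hidden deltas: the error is propagated back through w2
            d_hid = [d_out * w2[k] * h[k] * (1 - h[k]) for k in range(hidden)]
            for k in range(hidden):
                w2[k] += eta * d_out * h[k]
                for j in range(n):
                    w1[k][j] += eta * d_hid[k] * a[j]
                w1[k][-1] += eta * d_hid[k]
            w2[-1] += eta * d_out
    return w1, w2

# XOR-like mapping: the classic hard problem for a single-layer network.
pairs = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
w1, w2 = train(pairs)
err = sum((b - forward(w1, w2, a)[1]) ** 2 for a, b in pairs)
print(round(err, 4))
```

The many epochs needed even for this four-pattern problem illustrate the slow convergence of the backpropagation law mentioned below.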
Once the network is trained, it can be used to recall the appropriate pattern (in
this case some interpolated output pattern) for a new input pattern. The computation
is straightforward in the sense that the weights and the output functions of the units
at different layers are used to compute the activation values and output signals. The
signals from the output layer correspond to the output.
Note that for the backpropagation law to work (see figure 10), the output function
of the units in the hidden and output layers must be nonlinear and differentiable.
Such functions are called semilinear. If they are linear, no advantage is obtained by
using additional hidden layers. By using a hardlimiting threshold function, it is not
possible to propagate the error to hidden layer units to adjust the weights in that
layer. Thus the advantage of complex pattern mapping or pattern classification is
obtained by a multilayer feedforward network mainly because of the use of the
semilinear output functions.
The use of semilinear functions results in a rough error surface in the weight space.
That is, there will be several local minima, besides a global minimum. The effects of
local minima can be partially reduced by using a stochastic update of weight values
(Wasserman 1988). In general the backpropagation learning law needs several
iterations in order to reach an acceptably low value of error, at which the network
can be assumed to have captured the implied mapping in the given set of input-output
pattern pairs. However, due to the slow rate of convergence of the backpropagation
learning law, new architectures (like counter propagation, Hecht-Nielsen 1990) are
being sought for faster learning.
Thus autoassociation by a feedback network with linear units is not going to serve
any purpose. Moreover, the number of patterns is limited to the dimensionality of
the pattern. Although there is no simple learning law, it can be shown that the
weights of such a network can be determined to store any L ≤ N patterns, without
any error in recall, where N is the dimension of the input pattern space. Discussion
of autoassociation task by a feedback network with linear units is only of academic
interest, as any input pattern comes out as itself if it is one of the stored ones, and
a noise input comes out as a noisy pattern, not as the nearest stored pattern.
To overcome this limitation due to the absence of accretive behaviour, the linear
units are replaced with units having nonlinear output functions. The resulting feedback
network can then perform the pattern storage task, which will be considered next.
s_i(0) = a_i, i = 1, 2, ..., N,
where s_i(0) is the output of unit i at time t = 0.
3. Iterate until convergence.
The process is repeated until the outputs remain unchanged with further iteration. The
steady outputs of the units represent the stored pattern that best matches the given input.
Figure 12. Hopfield net algorithm to store and recall a set of bipolar patterns.
which is defined in terms of the output state {s_i} and weights [w_ij] of the network
as (Hopfield 1982)

E = −(1/2) Σ_i Σ_j w_ij s_i s_j − Σ_i I_i s_i + Σ_i θ_i s_i,

where I_i is an external input and θ_i is the threshold of the unit. The energy as a
function of the output state can be viewed as something like an energy landscape.
The shape of the landscape is dictated by the network units and their interconnection
strengths (weights). The feedback and the nonlinear processing units of the network
create basins of attraction in the energy landscape. The basins tend to be regions of
equilibrium states. If there is a fixed state (point in the output state space) in each of
these basins where the energy is minimum, these states correspond to fixed points
of equilibrium. There could also be periodic (or oscillating) regions or chaotic regions
of equilibrium (Kosko 1992).
It is the existence of the basins of attraction that is exploited to store the desired
patterns and recall them even with approximate inputs as keys. Each pattern is stored
at a fixed point of equilibrium of the energy minimum. An erroneous or distorted
pattern is more likely to be closer to the corresponding true pattern than to the other
stored patterns. Each input pattern results in a state of the network that may be
closer to the desired state, in the sense that it may lie near the basin of attraction
corresponding to the true state. Since an arbitrary state need not correspond to a
stable state, the activation dynamics of the network may eventually lead to a stable
state from which the desired pattern may be read or derived.
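The storage and accretive recall described above can be sketched with a small Hopfield-style network; the two bipolar patterns below are invented for illustration, and the weights are formed by the outer-product rule with zero self-connections.

```python
def train_hopfield(patterns):
    """Outer-product (Hebbian) weights with zero self-connections."""
    n = len(patterns[0])
    return [[0 if i == j else sum(p[i] * p[j] for p in patterns)
             for j in range(n)] for i in range(n)]

def recall(w, s, steps=10):
    """Asynchronous updates until the state stops changing."""
    s = list(s)
    n = len(s)
    for _ in range(steps):
        prev = list(s)
        for i in range(n):
            x = sum(w[i][j] * s[j] for j in range(n))
            if x != 0:
                s[i] = 1 if x > 0 else -1
        if s == prev:
            break
    return s

# Two stored bipolar patterns; a noisy probe should settle to the nearest one.
p1 = [1, 1, 1, -1, -1, -1]
p2 = [-1, -1, 1, 1, 1, -1]
w = train_hopfield([p1, p2])
noisy = [-1, 1, 1, -1, -1, -1]   # p1 with the first bit flipped
print(recall(w, noisy))          # → settles back to p1
```

The flipped bit places the probe inside the basin of attraction of p1, so the activation dynamics descend the energy landscape to that stored pattern.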
Figure 13. (a) A 3-unit feedback network with symmetric weights and binary
threshold units. Activation dynamics: x_j = Σ_i w_ji s_i − θ_j, s_j = f(x_j). Energy:
E = −(1/2) Σ_i Σ_j w_ij s_i s_j + Σ_i s_i θ_i. (b) State transition diagram for the 3-unit network
of figure 13a. Each block represents a state given by the sequence s_1 s_2 s_3. There are
eight blocks for eight states. The energy for each state is indicated by the bold
numbers with each block. Note that the state diagram has three stable states
(111, 100 and 010) (Aleksander & Morton 1990).

If the nonlinear output function of each unit is a binary threshold function (curve A
in figure 3), then the stable states of the network would lie at the corners of the binary
hypercube in the N-dimensional discrete binary space. On the other hand, if the
output function is a semilinear function (curve C in figure 3), then the points
corresponding to these states may move closer to each other within the unit hypercube.
If the output function is a horizontal line, then almost all states remain close to each
other, and hence there will be only one state for the network.
Given a network, it is possible to determine the state transition diagram (Aleksander
& Morton 1990). Figure 13 shows the state transition diagram for a 3-unit network.
The diagram illustrates the different states of the network and their transition
probabilities. States which have self transition with probability 1 are stable states.
For a given number of units, the state transition probabilities and the number of
stable states are dictated by the connection strengths or weights.
Since each state is associated with some energy value, the state transition diagram
shows transitions from a state with higher energy value to a state having lower or
equal energy value. The energy value of a stable state corresponds to an energy
minimum in the landscape, as there is no transition from this to the other states.
The number of basins of attraction in the energy landscape depends only on the
network, i.e., the number of processing units and their interconnection strengths
(weights). When the number of patterns to be stored is less than the number of basins
of attraction, i.e., stable states, then there will be spurious stable states, which do not
correspond to any desired patterns. In such a case, when the network is presented
with an approximate pattern for recall, the activation dynamics may eventually lead
to a stable state which may correspond to one of the spurious states or a false energy
minimum, or to one of the stable states corresponding to some other pattern. In the
latter case there will be an undetected error in the recall. The average probability of
error depends on the energy values of the stable states corresponding to the desired
patterns, and the relative locations of these states in the state space, measured in
terms of some distance criterion.
If the number of desired patterns to be stored is more than the number of basins
of attraction in the energy landscape, then the problem becomes a hard problem, in
the sense that the given patterns cannot be stored in the network.
For a given network it is not normally possible to determine exactly the number
of basins of attraction as well as their relative spacings and depths in the state space
of the network. It is possible to estimate the capacity (number of patterns that can
be stored) of the network and also the average probability of error in recall
(Abu-Mostafa & St. Jacques 1985; Aleksander & Morton 1990). The probability of
error in recall can be reduced by adjusting the weights in such a way that the resulting
energy landscape is matched to the probability distribution of the input patterns.
This becomes the problem of storing a pattern environment.
schedule), making sure that at each temperature the network is allowed to reach
thermal equilibrium before the next change in temperature is made. This process is
called simulated annealing (Kirkpatrick et al 1983) (see figure 16). Note that at each
temperature the state update dynamics is fixed, as the probability of transition from
one state to another depends only on the temperature. The update dynamics is
however altered when the temperature is changed, and it results in a new state
transition diagram.
The states at thermal equilibrium at T = 0 represent the stable states of the network
corresponding to the minima of the energy landscape. The probabilities of these states
are also related to the actual minimum energy values of the states. The relation
between the probabilities of stable states and energy suggest that the probability of
error in the recall of patterns can be further reduced if the probability distribution
of the desired patterns, i.e., the pattern environment, is known, and is used in
determining the optimal setting of weights of the network.
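Simulated annealing over such a network can be sketched as follows. The coupling matrix, annealing schedule and sweep counts are illustrative assumptions; each unit is updated stochastically, taking the value +1 with probability 1/(1 + exp(−2x_i/T)).

```python
import math, random

def energy(w, s):
    # E = -(1/2) * sum_ij w_ij s_i s_j
    n = len(s)
    return -0.5 * sum(w[i][j] * s[i] * s[j]
                      for i in range(n) for j in range(n))

def anneal(w, s, temps=(4.0, 2.0, 1.0, 0.5, 0.1), sweeps=50, seed=0):
    """Stochastic update with a decreasing temperature schedule; at each
    temperature the network is given time to approach thermal equilibrium."""
    rng = random.Random(seed)
    s = list(s)
    n = len(s)
    for T in temps:
        for _ in range(sweeps):
            for i in range(n):
                x = sum(w[i][j] * s[j] for j in range(n) if j != i)
                p = 1.0 / (1.0 + math.exp(-2.0 * x / T))
                s[i] = 1 if rng.random() < p else -1
    return s

# Hypothetical network with all-positive couplings: its two global energy
# minima are the all-(+1) and all-(-1) states.
n = 8
w = [[0 if i == j else 1 for j in range(n)] for i in range(n)]
s = anneal(w, [1, -1, 1, -1, 1, -1, 1, -1])
print(s, energy(w, s))
```

The high-temperature phase lets the state escape poor local minima; as T is lowered the transition probabilities sharpen and the network freezes into a deep minimum of the energy landscape.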
P+(V_a) = Σ_b P+(V_a ∧ H_b),

where P+(V_a ∧ H_b) is the probability of the state of the network in which the outputs of the visible
units are clamped to the vector V_a, and the outputs of the hidden units are H_b.
Likewise the probability that V_a will appear on the visible units when none of the visible
units is clamped (indicated by the "−" superscript) is given by

P−(V_a) = Σ_b P−(V_a ∧ H_b).

Note that P+(V_a) is given by the pattern environment description, and P−(V_a) depends on
the network dynamics and is given by

P−(V_a ∧ H_b) = exp(−E_ab/T) / Σ_{c,d} exp(−E_cd/T),

where the total energy of the system in the state V_a ∧ H_b is given by

E_ab = −(1/2) Σ_{i,j} w_ij s_i^ab s_j^ab,

and the co-occurrence probability of units i and j in the free-running phase is

p−_ij = Σ_{a,b} P−(V_a ∧ H_b) s_i^ab s_j^ab.
The Boltzmann learning law (see figure 17) allows us to represent a given environ-
ment by the network (Ackley et al 1985; Hinton & Sejnowski 1986; Aleksander
& Morton 1990). The law uses an information theoretic measure to evaluate how
well the environment is represented in the network. If a perfect representation is
obtained, then there will be as many energy minima as there are desired patterns.
But in practice only an approximate representation of the environment is accomplished,
and hence there will be some spurious stable states which correspond to the false wells
in the energy landscape. The Boltzmann learning law uses a simulated annealing
schedule for implementation, i.e., for determining the weight updates at each stage.
Recall of stored patterns from an approximate input pattern also uses a simulated
annealing schedule to overcome the false minima created because of the approximate
representation of the environment by the network.
In general the Boltzmann learning law converges slowly to the desired weights
(Geman & Geman 1984; Szu 1986, pp. 420-5). Moreover, there is no simple way to
determine the optimum number of hidden units for a network to solve the given
problem of pattern environment storage. The larger the number of hidden units,
the greater is the chance for more false minima, and hence the greater the probability
of error in recalling a stored pattern. The smaller the number of hidden units, the
greater the chance that the given problem becomes hard for the network. New
architectures are needed to overcome some of these limitations of the Boltzmann
machine for the problem of pattern environment storage.
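The Boltzmann weight update itself is simple once the clamped-phase and free-phase co-occurrence statistics p+_ij and p−_ij have been estimated (by the annealing runs, which are omitted here); the statistics below are invented for illustration.

```python
def boltzmann_update(w, p_plus, p_minus, eta=0.1):
    """One Boltzmann learning step: dw_ij = eta * (p+_ij - p-_ij), where
    p+_ij / p-_ij are the co-occurrence probabilities of units i and j in
    the clamped and free-running phases, respectively."""
    n = len(w)
    return [[w[i][j] + eta * (p_plus[i][j] - p_minus[i][j])
             for j in range(n)] for i in range(n)]

# Illustrative 2-unit statistics: the units co-occur more often when the
# environment is clamped than when the network runs freely, so the
# connection between them is strengthened.
w = [[0.0, 0.0], [0.0, 0.0]]
p_plus = [[1.0, 0.9], [0.9, 1.0]]
p_minus = [[1.0, 0.5], [0.5, 1.0]]
w = boltzmann_update(w, p_plus, p_minus)
print(w[0][1])  # 0.1 * (0.9 - 0.5)
```

The update drives the free-running statistics toward the clamped (environment) statistics, which is how the information-theoretic mismatch between the two distributions is reduced.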
In this section we discuss some pattern recognition tasks (figure 18) that can be
performed by a network consisting of two layers of processing units: The first layer
with linear output units feeds the input pattern to the units in the second layer
through a set of feedforward connections with appropriate weights. The outputs of
Feature map
• Arch: self-organisation network, 2 layers, nonlinear processing units
• Learning: neighbourhood units in competitive layer
• Recall: apply input, determine winner
• Limitation: only visual features, not quantitative
• To overcome: more complex architecture
the units in the second layer are fedback to the units in the same layer including
feedback to the same unit. The self feedback is usually with a positive weight (excitatory
connection) and the feedback to the other units is usually with a negative weight
(inhibitory connection). The weights on the feedback connections in the second layer
are usually fixed. The first layer of units is called input layer, and the second layer
is called competitive layer (Rumelhart & Zipser 1986). Different choices of output
functions and methods of learning lead to networks for different types of competition
tasks. We discuss three such tasks. Assuming fixed weights in the feedforward connections
from the input to the competitive layer, and in the feedback connections in the
competitive layer, we can study the behaviour of the network for different types of
output functions of the units in the competition layer.
4.3a Pattern storage (short term memory): First let us assume the output functions
to be linear. When an input pattern is applied, the units in the competition layer
settle to a steady activation state which will remain there even after the input pattern
is removed (Freeman & Skapura 1991). The activation pattern will remain as long
as the network is not given a different input pattern. Another input pattern will erase
the previous activation state. Hence this is called short-term memory. The pattern
is stored only temporarily.
This pattern storage representation is only of theoretical interest. There is no
application for such a short-term memory function. However, by using a nonlinear
output function for the units in the competition layer one could show that the network
can perform a pattern clustering task.
input raver
Figure 19. A feature mapping architecture. Each input
unit is connected to all the units of the output layer.
4.4b Feature map: Given a set of patterns, the objective is to design a network
that would organise the patterns in accordance with similarity of features among
them in such a way that by looking at the output of the network one can visually
obtain an idea of how different patterns are related. The display of signals from the
output layer (typically in 2-dimension) of units is called a feature map.
To accomplish the task of feature mapping, a competitive network is modified into
one called a self-organising network (Kohonen 1990; Freeman & Skapura 1991), shown
in figure 19. The modification consists of creating a neighbourhood region around the
winning unit in the competitive layer, so that during training all the feedforward
weights leading to the units in this region are adjusted to favour the input pattern.
The weight adjustment is similar to the case of a competitive network. The neighbour-
hood region around a winning unit is gradually reduced for each application of the
given set of patterns (see figure 20).
d_i = Σ_j [a_j(t) − w_ij(t)]²,

where a_j(t) is the input to the jth input unit at time t and w_ij(t) is the weight from the jth input
unit to the ith output unit.
4. Select the output unit i* with the minimum distance d_i.
For recall, when an unknown input is applied, the activation dynamics determines
the winning unit whose location would determine its features relative to the features
represented by the other units in its neighbourhood.
While a feature map produces a more realistic arrangement of patterns, the output
is useful only for visual observation. Since it is difficult to categorize a feature map,
it is difficult to use it for applications such as pattern classification. A more complex
architecture is needed to exploit the advantages of a feature map for pattern
classification purposes (Huang & Kuh 1992).
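A toy one-dimensional self-organising map along these lines might look as follows; the neighbourhood schedule, learning-rate decay and two-cluster data are illustrative assumptions rather than Kohonen's exact prescription.

```python
import random

def train_som(data, n_out=5, epochs=30, seed=0):
    """1-D self-organising map: the winner i* minimises the distance
    d_i = sum_j (a_j - w_ij)^2; the winner and its current neighbourhood
    are moved toward the input, and the neighbourhood shrinks over time."""
    rng = random.Random(seed)
    dim = len(data[0])
    w = [[rng.random() for _ in range(dim)] for _ in range(n_out)]
    for epoch in range(epochs):
        radius = max(0, 2 - epoch // 10)     # shrinking neighbourhood
        eta = 0.5 * (1 - epoch / epochs)     # decaying learning rate
        for a in data:
            d = [sum((a[j] - w[i][j]) ** 2 for j in range(dim))
                 for i in range(n_out)]
            winner = d.index(min(d))
            for i in range(max(0, winner - radius),
                           min(n_out, winner + radius + 1)):
                for j in range(dim):
                    w[i][j] += eta * (a[j] - w[i][j])
    return w

def winner(w, a):
    d = [sum((a[j] - w[i][j]) ** 2 for j in range(len(a)))
         for i in range(len(w))]
    return d.index(min(d))

# Hypothetical 1-D inputs drawn from two clusters near 0.1 and 0.9.
data = [[0.05], [0.1], [0.15], [0.85], [0.9], [0.95]]
w = train_som(data)
print(winner(w, [0.1]), winner(w, [0.9]))
```

After training, inputs from the two clusters excite different map units, so the spatial location of the winner reflects the similarity structure of the inputs, which is the essence of a feature map.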
So far we have considered simple structures of neural networks and discussed the
pattern recognition tasks that these structures could accomplish. In practice the
pattern recognition tasks are much more complex, and each task may require evolving
a new structure based on the principles discussed in the previous sections. In fact
designing an architecture for a given task involves developing a suitable structure of
the neural network and defining appropriate activation and synaptic dynamics. In
this section we will discuss some general architectures for complex pattern recognition
tasks.
Pattern storage is the most obvious pattern recognition task that one would like to
accomplish by an ANN. This is a memory function, where the network is expected
to store the pattern information for later recall. The patterns to be stored may be
spatial or spatiotemporal (pattern sequence). Typically an ANN behaves like an
associative memory, in which a pattern is associated with another pattern, or with itself. This
is in contrast with a random access memory, which maps an address to data. An
ANN can also function as a content-addressable memory, where data are mapped to
an address.
The pattern information is stored in the weight matrix of a feedback neural network.
The stable states of the network represent the stored patterns, which can be recalled
by providing an external stimulus in the form of a partial input. If the weight matrix
stores the given patterns, then the network becomes an autoassociative memory. Several
architectures are proposed in the literature for realizing an associative memory
function depending on whether the pattern data is discrete/continuous, or the network
is operating in discrete time/continuous time, or the learning is taking place off-line/
on-line (Simpson 1990).
We will discuss the discrete bidirectional associative memory (BAM) in some detail.
It is a two-layer heteroassociative neural network (figure 21) that encodes arbitrary
binary spatial patterns using Hebbian learning. It learns on-line and operates in
discrete time. The BAM weight matrix is given by

W = Σ_{l=1}^{L} a_l b_l^T,

where a_l ∈ {−1, +1}^N and b_l ∈ {−1, +1}^M. The superscript T refers to the transpose
of the vector.
Figure 21. Discrete bidirectional associative memory.
b_j(t + 1) = 1 if y_j > 0; b_j(t) if y_j = 0; −1 if y_j < 0, where y_j = Σ_{i=1}^{N} a_i(t) w_ij,

a_i(t + 1) = 1 if x_i > 0; a_i(t) if x_i = 0; −1 if x_i < 0, where x_i = Σ_{j=1}^{M} b_j(t) w_ij.
For recall, the given input a_i(0), i = 1, 2, ..., N is applied and the stable values of b_j(∞),
j = 1, 2, ..., M are read out. BAM updates are synchronous in the sense that the units
in each layer are updated simultaneously.
BAM can be shown to be unconditionally stable (Kosko 1988). However its storage
is limited to a small number of binary/bipolar patterns.
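A minimal BAM sketch, with invented bipolar pattern pairs, can illustrate the Hebbian encoding and the bidirectional recall.

```python
def train_bam(pairs):
    """W = sum_l a_l b_l^T for bipolar pattern pairs (Hebbian encoding)."""
    n, m = len(pairs[0][0]), len(pairs[0][1])
    return [[sum(a[i] * b[j] for a, b in pairs) for j in range(m)]
            for i in range(n)]

def sgn_update(old, x):
    # threshold update that keeps the old value on a zero net input
    return old if x == 0 else (1 if x > 0 else -1)

def bam_recall(w, a, steps=10):
    """Bounce activations between the two layers until (a, b) stabilises."""
    n, m = len(w), len(w[0])
    a = list(a)
    b = [1] * m
    for _ in range(steps):
        b = [sgn_update(b[j], sum(a[i] * w[i][j] for i in range(n)))
             for j in range(m)]
        a_new = [sgn_update(a[i], sum(b[j] * w[i][j] for j in range(m)))
                 for i in range(n)]
        if a_new == a:
            break
        a = a_new
    return a, b

# Invented bipolar pattern pairs for illustration.
pairs = [([1, -1, 1, -1], [1, 1, -1]),
         ([-1, 1, 1, 1], [-1, 1, 1])]
w = train_bam(pairs)
a, b = bam_recall(w, [1, -1, 1, -1])
print(b)  # → [1, 1, -1], the pattern associated with the probe
```

The passes alternate between the two layers until neither changes, which is the resonance that makes the memory unconditionally stable.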
(5) Repeat steps 1 through 4 until all input vectors are grouped properly by applying
the training set several times.
After successful training the weight vectors leading to each hidden unit represent the
average of the input vectors corresponding to the group represented by the unit.
(4) Repeat steps 1 through 3 until all the vector pairs in the training set
are mapped satisfactorily.
After successful training, the outstar weights for each unit in the competitive layer
represent the average of the subset of the output vectors b_l corresponding to the
input vectors belonging to that unit.
Depending on the number of nodes in the hidden layer, the network can perform
any desired mapping function. In the extreme case, if a unit is provided in the hidden
layer for each input pattern, then any arbitrary mapping (a_l, b_l) can be realized. But
in such a case the network fails to generalize; it merely stores the pattern pairs. By
using a small number of units in the hidden layer, the network can accomplish data
compression. Note also that the network can be trained to capture the inverse mapping
as well, i.e., a_l = Φ^{-1}(b_l), provided such a mapping exists and is unique. The name
counterpropagation is given due to the network's ability to learn both forward and
inverse mapping functions.
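As a rough sketch of the training scheme above, a competitive (instar) layer groups the inputs and outstar weights then learn the average output for each group. The learning rates, initial weights, and toy data below are illustrative assumptions, not values from the paper.

```python
# Minimal counterpropagation-style sketch: instar layer clusters inputs,
# outstar weights learn the average target for each cluster.

def closest(units, x):
    """Index of the hidden unit whose weight vector is nearest to x."""
    return min(range(len(units)),
               key=lambda k: sum((u - v) ** 2 for u, v in zip(units[k], x)))

def train_cpn(samples, units, outstars, alpha=0.5, beta=0.5, epochs=20):
    for _ in range(epochs):
        for a, b in samples:
            k = closest(units, a)
            # instar update: move the winner towards the input vector
            units[k] = [u + alpha * (v - u) for u, v in zip(units[k], a)]
            # outstar update: move the winner's output weights towards b
            outstars[k] = [w + beta * (v - w) for w, v in zip(outstars[k], b)]
    return units, outstars

samples = [([1.0, 0.0], [1.0]), ([0.9, 0.1], [0.8]),
           ([0.0, 1.0], [0.0]), ([0.1, 0.9], [0.2])]
units = [[1.0, 0.0], [0.0, 1.0]]   # one hidden unit per expected group
outstars = [[0.0], [0.0]]
units, outstars = train_cpn(samples, units, outstars)
```

After training, each unit's outstar weights hover near the mean output of its group, matching the averaging behaviour described in the text.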
Figure 23. Adaptive resonance theory (ART) architecture. Two major subsystems are the attentional subsystem and the orienting subsystem. Units in each layer are fully interconnected to the units in the other layer.
suitable match is obtained between the activations in the F1 layer due to the top-down
pattern and the input pattern. When a match is obtained, both the bottom-up
and top-down network weights are adjusted to reinforce the input pattern. If no
match is obtained, then an uncommitted unit (one whose category is not identified during
training) in the F2 layer is committed to this input pattern, and the corresponding
weights are adjusted to reinforce the input.
The above sequence of events conducts a search through the encoded patterns
associated with each category, trying to find a sufficiently close match with the input
pattern. If no such category exists, a new category is created. The search process is controlled
by two subsystems, namely the orienting subsystem and the attentional subsystem.
The orienting subsystem uses the dimensionless vigilance parameter that establishes
the criterion for deciding whether the match is good enough to accept the input
pattern as an exemplar of the chosen category. The gain control process in the
attentional subsystem allows the units in F1 to be engaged only when an input pattern
is present, and it also actively regulates the learning (Freeman & Skupura 1991).
Stability is achieved in the ART network due to the dynamic matching and the
control in learning. Plasticity is achieved due to the ability of ART to commit an
uncommitted unit in the F2 layer for an input pattern belonging to a category different
from those already learnt.
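The vigilance-controlled search can be illustrated with an ART1-style sketch for binary patterns. The match measure, the simplified search order by overlap, and the vigilance value below are assumptions chosen for illustration, not the exact ART equations.

```python
# ART1-style sketch: vigilance test |input AND prototype| / |input| >= rho,
# with resonance learning (intersection) or commitment of a new unit.

def art1_classify(pattern, prototypes, rho=0.6):
    """Return an accepted category index, committing a new unit if needed."""
    ones = sum(pattern)
    # simplified search: try categories in order of decreasing overlap
    order = sorted(range(len(prototypes)),
                   key=lambda k: -sum(p & q for p, q in zip(prototypes[k], pattern)))
    for k in order:
        match = sum(p & q for p, q in zip(prototypes[k], pattern)) / ones
        if match >= rho:            # vigilance test passed: resonance
            prototypes[k] = [p & q for p, q in zip(prototypes[k], pattern)]
            return k
    prototypes.append(list(pattern))  # commit an uncommitted unit
    return len(prototypes) - 1

protos = []
c0 = art1_classify([1, 1, 0, 0], protos)  # first input commits category 0
c1 = art1_classify([1, 1, 1, 0], protos)  # close match: reuses category 0
c2 = art1_classify([0, 0, 1, 1], protos)  # poor match: commits category 1
```

Note how stability and plasticity coexist here: accepted inputs only refine an existing prototype, while poorly matching inputs create a new category instead of disturbing the old ones.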
ART gets its name from the particular way in which learning and recall interplay
in the network. Information in the form of output signals from units reverberate back
and forth between the two layers. If the proper patterns develop, a stable oscillation
ensues, which is the neural network equivalent of resonance. During this resonance
period learning or adjustment of adaptive weights takes place. Before the network
has achieved a resonant state, no learning takes place, because the time required for
changes in the weights is much longer than the time it takes the network to achieve
resonance.
The ART1 network was proposed to deal with binary input patterns (Carpenter &
Grossberg 1988). The ART2 network was developed to self-organize recognition categories
for analog as well as binary input patterns (Carpenter & Grossberg 1987).
A minimal ART network can be embedded in a larger system to realize an associative
memory. A system like the CPN or a multilayer perceptron directly maps pairs of patterns
(a_l, b_l) during learning. If an ART system replaces the CPN, the resulting system becomes
self-stabilizing. Two ART systems can be used to pair sequences of the categories
self-organized by the input sequences, as shown in figure 24. The pattern recall can
Figure 24. Two ART systems used to realize an associative memory.

Figure 25. Grossberg's formal avalanche architecture.
occur in either direction during performance, as in the BAM. This scheme brings to the
associative memory paradigm the code compression capabilities, as well as the stability
properties, of ART (Carpenter 1989).
The ANN architectures described so far are applicable for recognition of patterns on
the basis of information contained within the pattern itself. Even if a sequence of
patterns with temporal correlation are presented, the previous or subsequent patterns
have no effect on the classification of the current input pattern. But there are many
applications (for example, speech recognition) where it is necessary to encode the
information relating to the time correlation of spatial patterns, as well as the spatial
pattern information itself.
Architectures for classification of spatio-temporal patterns (STP) are based on the
Grossberg formal avalanche structure (Grossberg 1969). The structure (figure 25) of
the network resembles the top two layers of the CPN, and both use multiple outstars.
The avalanche architecture shows how a complex spatio-temporal pattern can be
learned and recalled. Assume a(t) = (a1(t), a2(t), ..., aN(t)) is the spatial pattern required
at time t. The sequence of a(t) at time intervals of Δt in the range t0 ≤ t ≤ t1 corresponds
to the desired spatio-temporal pattern. Activate the node labelled t0 and apply a(t0)
to be learned by the outstar's output units. The second pattern a(t0 + Δt) is applied
while activating the second outstar, labelled t0 + Δt. Continue this process by activating
successive outstars until all the patterns have been learned in sequence. Replay of
the learned sequence can be initiated by stimulating the t0 node, while a zero vector
is applied to the a inputs. The output sequence b(t) ≈ a(t), for t0 ≤ t ≤ t1, is the learned
sequence.
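The avalanche learning and replay procedure above can be sketched as follows. The outstar learning rate, the number of training passes, and the toy sequence are illustrative assumptions.

```python
# Avalanche sketch: one outstar per time step, trained with the outstar
# learning law so its weights approach the spatial pattern a(t) presented
# at that step; replay reads the outstars out in stored order.

def learn_sequence(patterns, eta=0.5, passes=20):
    """Train one outstar per time step towards its pattern a(t)."""
    outstars = [[0.0] * len(patterns[0]) for _ in patterns]
    for _ in range(passes):
        for w, a in zip(outstars, patterns):
            for i in range(len(w)):
                w[i] += eta * (a[i] - w[i])     # outstar weight update
    return outstars

def replay(outstars):
    """Stimulating the t0 node: outputs appear in the stored order."""
    return [list(w) for w in outstars]

sequence = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]    # a(t0), a(t0+dt), a(t0+2dt)
outstars = learn_sequence(sequence)
recalled = [[round(v) for v in w] for w in replay(outstars)]
```

After training, replaying the outstars in order reproduces the stored spatio-temporal sequence, which is the b(t) ≈ a(t) behaviour described above.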
associative memory can recover complete patterns from such imperfections, but
normally cannot work if there is variability or deformation in the patterns of the test
input.
Neural network models based on our understanding of human visual pattern
recognition tend to perform well even for shifted and deformed patterns. In the visual
area of the cerebrum, neurons respond selectively to local features of a visual pattern
such as lines and edges. In areas higher than the visual cortex, cells exist that respond
selectively to certain figures like circles, triangles, squares, human faces etc (Fukushima
1975). Thus the human visual system seems to have a hierarchical structure in which
simple features are first extracted from the stimulus pattern, then integrated into more
complicated ones. A cell at a higher stage generally receives signals from a wider
area of the retina and is less sensitive to the position of the stimulus. Within the
hierarchical structure of the visual systems are forward (afferent or bottom-up) and
backward (efferent or top-down) propagation of signals. This kind of physiological
evidence suggests a neural network structure for modelling the phenomenon of visual
pattern recognition.
The objective is to synthesize a neural network model for pattern recognition for
shifted and deformed patterns. The network model learns with a teacher (supervised
learning) for reinforcement of the adaptive weights. The network model is called
neocognitron. It is a hierarchical network (figure 26) consisting of many layers of cells,
and has variable connections between cells in adjoining layers. It can be trained to
recognize any set of patterns. After training, pattern recognition is performed on the
basis of similarity in shape between patterns, and the recognition is not affected by
deformation, or changes in size, or shifts in the positions of the input patterns
(Fukushima 1988).
In the hierarchical network of the neocognitron, local features of the input pattern
are extracted by the cells of the lower stage, and they are gradually integrated into
more global features. Finally, each cell of the highest stage integrates all the
information of the input pattern, and responds only to one specific pattern. During
the process of extracting and integrating features, errors in the relative positions of
the local features are gradually tolerated. The operation of tolerating positional error
a little at a time at each stage, rather than all in one step, plays an important role
in endowing the network with the ability to recognize even distorted patterns
(Fukushima 1991).
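The idea of extracting a local feature and then tolerating small positional errors can be illustrated with a toy one-dimensional sketch. The template-matching and OR-pooling operations below are simplified stand-ins for the neocognitron's feature-extracting and error-tolerating cell layers; the signals and the single feature template are illustrative only.

```python
# Toy illustration of hierarchical feature extraction with positional
# tolerance: a feature layer detects a local template, and a pooling
# layer ORs over neighbouring positions so small shifts are tolerated.

def s_layer(signal, template):
    """Feature map: 1 wherever the template matches exactly, else 0."""
    n = len(template)
    return [1 if signal[i:i + n] == template else 0
            for i in range(len(signal) - n + 1)]

def c_layer(feature_map, radius=1):
    """Pool with OR over a neighbourhood: tolerate small positional errors."""
    return [max(feature_map[max(0, i - radius):i + radius + 1])
            for i in range(len(feature_map))]

edge = [0, 1]                        # a local "feature"
a = s_layer([0, 0, 1, 1, 0], edge)   # feature found at one position
b = s_layer([0, 0, 0, 1, 1], edge)   # same feature, shifted right by one
pa, pb = c_layer(a), c_layer(b)
# after pooling, the two maps overlap even though the raw maps do not
```

Tolerating the shift a little at each stage, rather than all at once, is exactly the property the text attributes to the neocognitron's layered design.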
The neocognitron also provides backward connections, which enable it to realize
the selective attention feature of visual pattern recognition. Selective attention
refers to our ability to focus on a desired pattern when two or more patterns are
simultaneously present in the data.
Neocognitron was developed for recognition of handwritten characters, although
the ideas used in the architecture may be extended to other situations of pattern
variability (Fukushima 1991).
6. Applications
In this section of the paper we briefly discuss the application potential of neural
network models and some research issues that are being currently addressed in this
field. In applications we consider two different situations, one where the existing
neural network concepts can be directly applied, and the other where there is potential
for applying the neural network ideas but it is not yet clear how to formulate the
real world problems to evolve a suitable neural network architecture. We will also
list a few cases where neural network principles are being used in practice.
6.1a Associative memories (Bienenstock & von der Malsburg 1987; Hassoun 1989;
Desai 1990; Kamp & Hasler 1990; Michel & Farrell 1990): As discussed earlier, the
objective of associative memory is to store a pattern or data for later recall with
partial or noisy version of the pattern as input, or to store association between two
patterns for later recall of one of the patterns given the other. Both feedback and
feedforward topologies of neural networks are directly used for these applications.
Associative memory, if used in a feedback structure of the Hopfield type, can function
as a content addressable memory as well. The stable states of the network, which
represent the energy minima or basins of attraction, are used to store the pattern
information. In a feedforward network the mapping function corresponding to the
input-output pattern pairs is stored in the weights of the network.
Application of these networks for associative memory is direct if the patterns are
available in the form of a one- or two-dimensional array of values. Associative memories
as content addressable memories are quite powerful. For example, if information
about individuals is stored in a network, then it is possible to retrieve the complete
data by providing partial or even noisy clues. Other common applications for an
associative memory are recognition of images, and retrieval of bibliography
information from partial references such as from incomplete title of a paper.
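Content-addressable recall from a partial or noisy clue can be sketched with a small Hopfield-type network, in line with the description above: stored patterns sit at energy minima, and a cue relaxes towards the nearest stored pattern. The Hebbian weight rule, asynchronous update order, stored patterns, and cue are illustrative assumptions.

```python
# Content-addressable memory sketch with a Hopfield-type network.

def hopfield_weights(patterns):
    """Hebbian weights: W[i][j] = sum over patterns of p[i]*p[j], no self-connections."""
    n = len(patterns[0])
    W = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    W[i][j] += p[i] * p[j]
    return W

def recall(W, state, sweeps=5):
    """Relax the cue towards a stored pattern by asynchronous updates."""
    n = len(state)
    s = list(state)
    for _ in range(sweeps):
        for i in range(n):              # update one unit at a time
            h = sum(W[i][j] * s[j] for j in range(n))
            if h != 0:
                s[i] = 1 if h > 0 else -1
    return s

stored = [[1, 1, 1, -1, -1, -1], [-1, -1, 1, 1, -1, 1]]
W = hopfield_weights(stored)
cue = [1, 1, 1, -1, -1, 1]              # noisy version of the first pattern
result = recall(W, cue)
```

Providing a partial or noisy clue and reading out the completed pattern is precisely the retrieval-by-content behaviour the text describes for bibliographic or personal-record lookup.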
6.1c Vector quantization: Vector quantization (VQ) typically encodes a large set of
training data vectors into a small set of representative points, thus achieving a
significant compression in the representation of data. Vector quantization has been
shown to be useful in compressing data that arises in image processing, speech
processing, facsimile transmission, and weather satellites (Kohonen 1988; Nasrabadi
& King 1988; Naylor & Li 1988).
Formally, vector quantization maps arbitrary data vectors to a binary representation
or a symbol. The mapping is from an N-dimensional vector space to a finite set of
symbols M. Associated with each symbol m ∈ M is a reproduction vector x̂_m. The
encoding of the data vector x to the symbol m is the mapping in VQ. The collection
of all possible reproduction vectors is called the codebook.
The design of a codebook is called training, and it can be implemented using neural
network models. The learning vector quantization (LVQ) structure is one such network
model. Several other models have been proposed, for example, Kohonen's self-
organising feature maps, to construct VQ codebooks for speech applications, and for
image coding (Kohonen 1989; Ahalt et al 1990).
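A minimal sketch of codebook training and VQ encoding follows, using simple competitive learning as a stand-in for LVQ. The data, learning rate, and codebook initialisation are assumptions for the illustration.

```python
# Vector quantization sketch: a codebook of reproduction vectors is
# trained by competitive learning; encoding maps each data vector to
# the symbol (index) of its nearest reproduction vector.

def nearest(codebook, x):
    """Symbol of the reproduction vector closest to x."""
    return min(range(len(codebook)),
               key=lambda m: sum((c - v) ** 2 for c, v in zip(codebook[m], x)))

def train_codebook(data, codebook, eta=0.2, epochs=50):
    for _ in range(epochs):
        for x in data:
            m = nearest(codebook, x)                      # winning symbol
            # move the winning reproduction vector towards the data vector
            codebook[m] = [c + eta * (v - c) for c, v in zip(codebook[m], x)]
    return codebook

def encode(data, codebook):
    """The VQ mapping: each data vector -> the symbol of its codeword."""
    return [nearest(codebook, x) for x in data]

data = [[0.1, 0.2], [0.0, 0.1], [0.9, 0.8], [1.0, 0.9]]
codebook = train_codebook(data, [[0.0, 0.0], [1.0, 1.0]])
codes = encode(data, codebook)
```

Each two-dimensional data vector is replaced by a single symbol, which is the source of the compression the text mentions for image, speech, and facsimile data.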
6.1d Pattern classification: Pattern classification is the most direct among all
applications of neural networks. In fact neural networks became very popular because
Neural network concepts and principles appear to have great potential for solving
problems arising in practice. For most practical problems the solution by neural
networks is not obvious. This is because the problems cannot be mapped directly
onto an existing neural network architecture. In fact there are no principles guiding
us to this mapping. There are many pattern recognition tasks in speech and vision
which we seem to perform effortlessly, but we do not understand how we do so. For
example, in speech, our auditory mechanism processes the signal directly in a manner
suitable for later neural processing. On the other hand, to prepare input to an artificial
neural network, the speech signal is normally processed in fixed frames of 10-20 ms
to extract a fixed number of spectral or related parameters. In this process the temporal
and spectral features with proper resolution needed for recognition may not have
been captured. Moreover, there is as yet no neural network architecture which could
perform the speech pattern recognition with the same effectiveness as human beings
do. Similar comments apply to problems in visual pattern recognition also. Some of
the other areas where human performance cannot be matched by existing neural
architectures are in motor control and decision making.
Despite these unresolved issues, there are several cases where neural network principles
have been used successfully. A few of them are listed below in different areas for
illustration (Lisboa 1992).
6.2b Computer vision: Recognition of hand-printed digits has been one of the most
successful applications of neural networks (Krzyzak et al 1990). Satellite image data
compression and enhancement of noisy images are some of the other useful
applications (Hertz et al 1991; Raghu et al 1993).
Transformation invariant object recognition is one of the most difficult tasks,
6.2c Robotics and control: Artificial vision for autonomous navigation, path
planning with obstacle avoidance, and parallel computation of inverse dynamics are
some of the applications of neural networks in robotics (Kung & Hwang 1989;
Handleman et al 1990; Kuperstein & Wang 1990).
Operation guidance in blast furnace control and modelling nonlinearities in
chemical process control are some of the applications of neural networks in control
areas (Bhat & McAvoy 1989; Konishi et al 1990).
6.2e Medical applications: Medical diagnosis, noise filters for cardiac signals, image
processing of ultrasonograms, and discrimination of signals for patient monitoring,
have all been successfully implemented using networks (Reggia & Suttonn 1988;
Scalia et al 1988).
6.2f Business and finance: Scheduling and inventory control applications, bond
rating and asset forecasting in the stock market, exchange-rate forecasting, credit
scoring, and mortgage underwriting have all demonstrated the successful use of neural
network principles in business and finance (Collins et al 1988; Dutta & Shekkar 1988;
White 1988).
In this tutorial article we have discussed the need for exploring new computing models
for pattern recognition tasks. The importance of distinction between pattern
processing and data processing has been discussed. The promise of the architectures
inspired by the functions of biological neural networks has been shown by tracing the
significant developments in artificial neural networks over the past decade. We have
discussed the basics of artificial neural networks in terms of models of neurons,
learning laws, and topology. We have also discussed the types of pattern recognition
problems that can be solved by simple architectures based on the principles of artificial
neural networks. Complex pattern recognition tasks require specialized architectures.
Some general architectures were discussed for tasks requiring resolution of the
stability-plasticity dilemma and for tasks involving pattern variability and temporal
patterns.
The most important issue for solving practical problems using the principles of ANN
is still that of evolving a suitable architecture for the problem. This continues to
dominate this research area. ANN research may have to expand its scope to take into
account the fuzzy nature of real world data and reasoning, and the complex (unknown)
processing performed by the human perceptual mechanism through biological neural
networks.
References
Abu-Mostafa Y S, St. Jacques J M 1985 Information capacity of the Hopfield model. IEEE
Trans. Inf. Theor. 31:461-464
Ackley D M, Hinton G E, Sejnowski T J 1985 A learning algorithm for Boltzmann machines.
Cogn. Sci. 9:147-169
Ahalt S C, Krishnamurthy A K, Chen P, Melton D E 1990 Competitive learning algorithms
for vector quantization. Neural Networks 3:277-290
Aleksander I, Morton H 1990 An introduction to neural computing (London: Chapman and
Hall)
Bhat N, McAvoy T 1989 Use of neural nets for dynamic modelling and control of chemical
process systems. Proc. Am. Autom. Contr. Conf., Pittsburgh, PA, pp. 1342-1348
Bienenstock E, von der Malsburg Ch 1987 A neural network for the retrieval of superimposed
connection patterns. Euro. Phys. Lett. 3:1243-1249
Bottou L, Soulie F F, Blanchet P, Lienard J S 1990 Speaker independent isolated digit
recognition: multilayer perceptrons vs. dynamic time warping. Neural Networks 3: 436-465
Carpenter G A 1989 Neural network models for pattern recognition and associative memory.
Neural networks 2:138-152
Carpenter G A, Grossberg S 1987 ART2: Self-organization of stable category recognition codes
for analog input patterns. Appl. Opt. 26:4919-4930
Carpenter G A, Grossberg S 1988 The ART of adaptive pattern recognition by a self-organizing
neural network. IEEE Comput. 21:77-88
Cohen M, Grossberg S 1983 Absolute stability of global pattern formation and parallel storage
by competitive neural networks. IEEE Trans. Syst., Man Cybern. SMC-13:815-825
Cole R, Fanty M, Muthuswamy Y, Gopalakrishna M 1992 Speaker-independent recognition
of spoken English letters. Proc. Int. Joint Conf. Neural Networks, San Diego, CA
Collins E, Ghosh S, Scotfield C L 1988 An application of a multiple neural network learning
system to emulation of mortgage underwriting judgements. IEEE Int. Conf. Neural Networks
(Piscataway, NJ: IEEE Press) 2:459-466
Cybenko G 1989 Continuous value neural networks with two hidden layers are sufficient.
Math. Control Signal Syst. 2: 303-314
Desai M S 1990 Noisy pattern retrieval using associative memories. MSEE thesis, University of
Louisville, Kentucky
Denker J, Schwartz D, Wittner B, Solla S, Howard R, Jackel L, Hopfield J 1987 Large automatic
learning, rule extraction, and generalization. Complex Syst. 1:877-922
Dotsenko V S 1988 Neural networks: translation-, rotation- and scale-invariant pattern
recognition. J. Phys. A21: L783-L787
Dutta S, Shekkar S 1988 Bond rating: a non-conservative application of neural networks.
IEEE Int. Conf. Neural Networks (Piscataway, NJ: IEEE Press) 2: 443-450
Freeman J A, Skupura D M 1991 Neural network algorithms, applications and programming
techniques (New York: Addison-Wesley)
Fukushima K 1975 Cognitron: A self-organizing multilayer neural network. Biol. Cybern.
20:121-136
Fukushima K 1988 A neural network for visual pattern recognition. IEEE Comput. 21:65-75
Fukushima K 1991 Handwritten alphanumeric character recognition by the neocognitron.
IEEE Trans. Neural Networks 2:355-365
Fukushima K, Miyake S 1982 Neocognitron: a new algorithm for pattern recognition tolerant
of deformations and shifts in position. Pattern Recogn. 15:455-469
Geman S, Geman D 1984 Stochastic relaxation, Gibbs distributions, and the Bayesian
restoration of images. IEEE Trans. Pattern Anal. Machine lntell. PAMI-6:721-741
Gorman R, Sejnowski T 1988 Learned classification of sonar targets using a massively parallel
network. IEEE Trans. Acoust. Speech Signal Process. 36: 1135-1140
Grossberg S 1969 Some networks that can learn, and reproduce any number of complicated
space-time patterns. Int. J. Math. Mech. 19:53-91
Grossberg S 1980 How does a brain build a cognitive code? Psychol. Rev. 87:1-51
Grossberg S 1982 Studies of mind & brain (Boston: Reidel)
Grossberg S 1988 Nonlinear neural networks: Principles, mechanisms, and architecture. Neural
networks 1:17-61
Handleman D H, Lane S H, Gelfand J J 1990 Integrating neural networks and knowledge-based
systems for intelligent robotic control. IEEE Control Syst. Mag. 10(3): 77-87
Hassoun M H 1989 Dynamic heteroassociative memories. Neural Networks 2:275-287
Hebb D 1949 Organization of the behaviour (New York: Wiley)
Hecht-Nielsen R 1987 Counterpropagation networks. Appl. Opt. 26: 4979-4984
Hecht-Nielsen R 1990 Neurocomputing (Reading, MA: Addison-Wesley)
Hertz J, Krogh A, Palmer R G 1991 Introduction to the theory of neural computation (New York:
Addison-Wesley)
Hinton G E, Sejnowski T J 1986 Learning and relearning in Boltzmann machines. In Parallel
distributed processing: Explorations in the microstructure of cognition (eds) D E Rumelhart,
J L McClelland (Cambridge, MA: MIT Press) 1:282-317
Hodgkin A L, Huxley A F 1952 A quantitative description of membrane current and its
application to conduction and excitation in nerve. J. Physiol. 117:500-544
Hopfield J J 1982 Neural networks and physical systems with emergent collective computational
capabilities. Proc. Natl. Acad. Sci. (USA) 79:2554-2558
Hopfield J J, Tank D W 1985 Neural computation of decisions in optimization problems.
Biol. Cybern. 52:141-154
Huang Z, Kuh A 1992 A combined self-organizing feature map and multilayer perceptron for
isolated word recognition. IEEE Trans. Signal Process. 40:2651-2657
Hush D R, Horne B G 1993 Progress in supervised neural networks. IEEE Signal Process.
Mag. 10:8-39
Kamp Y, Hasler M 1990 Recursive neural networks for associative memory (Chichester: John
Wiley & Sons)
Kennedy M P, Chua L O 1988 Neural networks for nonlinear programming. IEEE Trans.
Circuits Syst. CAS-35:554-562
Kirkpatrick S, Gelatt C D Jr, Vecchi M P 1983 Optimization by simulated annealing. Science
220:671-680
Kohonen T 1988 An introduction to neural computing. Neural Networks 1:3-16
Kohonen T 1989 Self-organization and associative memory (3rd edn) (Berlin: Springer-Verlag)
Kohonen T 1990 The self-organizing map. Proc. IEEE 78:1464-1480
Konishi M, Otsuka Y, Matsuda K, Tamura N, Fuki A, Kadognchi K 1990 Application of a
neural network to operation guidance in a blast furnace. 3rd European Seminar on Neural
Computing: The Marketplace, London
Kosko B 1988 Bidirectional associative memories. IEEE Trans. Syst., Man Cybern. 18:
49-60
Kosko B 1990 Unsupervised learning in noise. IEEE Trans. Neural Networks 1:44-57
Kosko B 1992 Neural networks and fuzzy systems (Englewood Cliffs, NJ: Prentice-Hall)
Krzyzak A, Dali W, Yuen C Y 1990 Unconstrained handwritten character classification using
modified back propagation model. In Frontiers in handwriting recognition (ed.) C Y Suen
(Montreal: CENPARMI)
Kung S Y, Hwang J N 1989 Neural network architectures for robotic applications. IEEE
Trans. Robotics Autom. 5:641-657
Kuperstein M, Wang J 1990 Neural controller for adaptive movements with unforeseen
payloads. IEEE Trans. Neural Networks 1(1): 137-142
Lang K J, Waibel A H, Hinton G E 1990 A time-delay neural network architecture for isolated
word recognition. Neural Networks 3(1): 23-44
LeCun Y, Boser B, Denker J S, Henderson D, Howard R E, Hubbard W, Jackel L D 1989
Back propagation applied to handwritten zip code recognition. Neural Comput. 1:541-551
Lippmann R P 1987 An introduction to computing with neural nets. IEEE Trans. Acoust.
Speech Signal Process. Mag. (April): 4-22
Lippmann R P 1989a Review of neural networks for speech recognition. Neural Comput. 1(1):
1-38
Lippmann R P 1989b Pattern classification using neural networks. IEEE Commun. Mag. (Nov):
47-64
Lisboa P G P 1992 Neural networks current applications (London: Chapman & Hall)
Maa C Y, Chin C, Shanblatt M A 1990 A constrained optimization neural net techniques for
economic power dispatch. Proc. 1990 (New York: IEEE Press)
Marcus A, van Dam A 1991 User-interface developments for the nineties. IEEE Comput. 24:
49-57
McCulloch W S, Pitts W 1943 A logical calculus of the ideas immanent in nervous activity.
Bull. Math. Biophys. 5:115-133
Michel A N, Farrell J A 1990 Associative memories via artificial neural networks. IEEE Control
Syst. Mag. (April): 6-17
Minsky M, Papert S A 1988 Perceptrons (Cambridge, MA: MIT Press)
Muller B, Reinhardt J 1990 Neural networks: An introduction (Berlin: Springer-Verlag)
Murakami K, Aibara T 1987 An improvement on the Moore-Penrose generalized inverse
associative memory. IEEE Trans. Syst. Man. Cybern. SMC 17:699-706
Naidu S R, Zafiriou E, McAvoy T J 1990 Use of neural networks for sensor failure detection
in a control system. IEEE Control Syst. Mag. 10(3): 49-55
Nasrabadi N M, King R A 1988 Image coding using vector quantization: A review. IEEE
Trans. Commun. 36:957-971
Naylor J, Li K P 1988 Analysis of a neural network algorithm for vector quantization of
speech parameters. Neural Networks 1 (Suppl): 310
Pal S K, Mitra S 1992 Multilayer perceptron, fuzzy sets, and classification. IEEE Trans. Neural
Networks 3:683-697
Raghu P P, Chouhan H M, Yegnanarayana B 1993 Multispectral image classification using
neural network. Proc. Natl. Conf. on Neural Networks, Anna University, Madras (NCNN):
1-10
Rauch H E, Winarske T 1988 Neural networks for routing communications traffic. IEEE
Control Syst. Mag. (April): 26-31
Reggia J A, Suttonn G G, III 1988 Self-processing networks and their biomedical implications.
Proc. IEEE 76:680-692
Rosenblatt F 1958 A probabilistic model for information storage and organization in the brain.
Psychol. Rev. 65:386-408
Rosenblatt F 1962 Principles of neurodynamics (Washington, DC: Spartan)
Rumelhart D, McClelland J 1986 Parallel distributed processing: Explorations in the microstructure
of cognition (Cambridge, MA: MIT Press) vol. 1
Rumelhart D E, Zipser D 1986 Feature discovery by competitive learning. In Parallel
distributed processing (eds) J L McClelland, D E Rumelhart 1: 151-193
Scalia F, Marconi L, Ridella S, Arrigo P, Mansi C, Mela G S 1988 An example of back
propagation: diagnosis of dyspepsia. 1st IEE Conf. Neural Networks (IEE Conf. Publ.) 313:
332-540
Schalkoff R 1992 Pattern recognition - Statistical, structural and neural approaches (New York:
John Wiley & Sons)
Seibert M, Waxman A 1989 Spreading activation layers, visual saccades, and invariant
representations for neural pattern recognition systems. Neural Networks 2:9-27
Sejnowski T, Rosenberg C 1987 Parallel networks that learn to pronounce English text. Complex
Syst. 1:145-168
Shea P M, Lin V 1989 Detection of explosives in checked airline baggage using an artificial
neural system. Int. Joint. Conf. on Neural Networks 2:31-34
Simpson K P 1990 Artificial neural systems (New York: Pergamon)
Simpson K P 1992 Foundations of neural networks in artificial neural networks (eds) Edgar
Sanchez-Sinencio, Clifford Lau (New York: IEEE Press)
Szu H 1986 Fast simulated annealing. In Neural networks for computing (ed.) J S Denker
(New York: Snowbird)
Tagliarini G A, Page E W 1988 A neural network solution to the concentrator assignment
problem. Neural information processing systems (ed.) D Z Anderson (New York: Am. Inst.
Phys.)
von der Malsburg Ch 1973 Self-organization of orientation sensitive cells in the striate cortex.
Kybernetik 14:85-100
Waibel A 1989 Modular construction of time-delay neural networks for speech recognition.
Neural Comput. 1:39-46
Wasserman P D 1988 Combined backpropagation/Cauchy machine. Neural networks: Abstracts
of the first INNS Meeting, Boston (Elmsford, NY: Pergamon) 1: 556
White H 1988 Economic prediction using neural networks: the case of IBM daily stock returns.
Neural networks: Abstracts of the First INNS Meeting, Boston (Elmsford, NY: Pergamon)
1:451-458
Widrow B, Hoff M E 1960 Adaptive switching circuits. IRE WESCON Convention Record
(4): 96-104
Willshaw D J, von der Malsburg Ch 1976 How patterned neural connections can be set up
by self-organization. Proc. R. Soc. London B194:431-445
Zurada J M 1992 Introduction to artificial neural systems (St. Paul, MN: West)