UNIT 2 NEURAL NETWORKS


Structure
2.1 Introduction
Objectives
2.2 Architecture of the Neural Network
2.3 Back Propagation (BP)
2.4 Self-organising Map (SOM)
2.4.1 Competitive Process
2.4.2 Cooperative Process
2.4.3 Adaptive Process
2.5 Implementation of Neural Network in Process Planning Problem
2.5.1 Problem Representation
2.5.2 Training
2.5.3 Examples
2.5.4 Results
2.6 Summary
2.7 Key Words
2.8 Answers to SAQs

2.1 INTRODUCTION
Work on artificial neural networks, commonly referred to simply as "neural networks", has been
motivated right from its inception by the recognition that the human brain computes in an
entirely different way from the conventional digital computer. In its most general form, a
neural network is a machine that is designed to model the way in which the brain
performs a particular task or function of interest; the network is usually implemented by
using electronic components or is simulated in software on a digital computer. We may
offer the following definition of neural network viewed as an adaptive machine:
A neural network is a massively parallel distributed processor made up of simple
processing units, which has a natural propensity for storing experiential knowledge and
making it available for use. It resembles the brain in two respects :
• Knowledge is acquired by the network from its environment through a
learning process.
• Interneuron connection strengths, known as synaptic weights, are used to
store the acquired knowledge.
Neural networks are a class of computing systems that use a highly parallel architecture
to efficiently perform pattern recall, prediction, and classification tasks. These networks
are loosely modeled after human networks of neurons in the brain and nervous systems.
The networks consist of large numbers of simple processing units that are characterized
by a state of activation, which is a function of inputs to the units. Each unit calculates
three functions :
• Propagation Function Neti : Generally the propagation function is
calculated as the weighted sum of the outputs of all other units connected to
unit i :

Neti = Σj Wij Oj . . . (2.1)

where Wij is the strength of the connection between units i and j, and Oj is
the output level of unit j.
• Activation Function Ai : A function of Neti and occasionally of time.
Commonly used functions are the linear, logistic, and threshold functions.
• Output Function Oi : A function of Ai; Oi is often simply set equal to the
activation, Ai.
The connections may be inhibitory or excitatory. Inhibitory connections tend to reduce
the activation of units to which they are connected, while excitatory connections will
tend to raise the activation.
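To make these three functions concrete, the following minimal Python sketch evaluates them for a single unit; the weight and input values are arbitrary illustrative choices, and the logistic activation is just one of the common options named above.

    import math

    def propagation(weights, outputs):
        # Eq. (2.1): Net_i = sum over j of W_ij * O_j, the weighted sum of
        # the outputs of all other units connected to unit i
        return sum(w * o for w, o in zip(weights, outputs))

    def activation(net):
        # One common choice of activation function: the logistic curve
        return 1.0 / (1.0 + math.exp(-net))

    def output(a):
        # Output function: here O_i is simply set equal to the activation A_i
        return a

    # Arbitrary illustrative values: positive weights act as excitatory
    # connections, negative weights as inhibitory ones
    weights = [0.8, -0.3, 0.5]
    incoming = [1.0, 0.6, 0.4]           # output levels O_j of connected units
    net = propagation(weights, incoming)
    print(output(activation(net)))       # final output O_i of unit i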
Neural networks operate at two different levels of time: short-term response to
environmental inputs, and long-term changes in connection weights, which encode
knowledge and change how the network reacts to its environment.
Many different neural network architectures have been developed. They differ in the
types of propagation and activation functions used, how units are interconnected, and
how learning is implemented. The type of paradigm used depends on the characteristics
of the task to be performed. A major distinction among networks is whether the system
will be used for recall (recognition), prediction, or classification. Recall systems are used
for noise filtering and pattern completion (also called content addressable memories or
CAM). Examples are Anderson's Brain-State-in-a-Box (BSB) model and the Hopfield network.
Prediction systems can be used to estimate the behavior of complex systems. An example
is the work of Rangwala and Dornfeld on prediction of machining parameters.
Classification networks create mappings of input patterns into categories, represented by
characteristic output patterns.
The use of neural networks offers the following useful properties and capabilities :
Non-linearity, Input-Output Mapping, Adaptivity, Evidential Response, Contextual
Information, Fault Tolerance, Uniformity of Analysis and Design, Neurobiological
Analogy, etc.
Objectives
After studying this unit, you should be able to
• describe the architecture of a neural network,
• understand the concept of the self-organising map, and
• explain the applications of neural networks in production systems.

2.2 ARCHITECTURE OF NEURAL NETWORK


A neural network is a technique that seeks to build an intelligent program (to
implement intelligence) using models that simulate the working network of the neurons
in the human brain. A neuron is made up of several protrusions called dendrites, and a
long branch called the axon. These entities are used to receive and pass information to other
neurons. The neurons are connected by synapses to form a basic biocomputational
system. Conceptually, the meaning of the connections is interpreted as the relations between
the neurons. Figure 2.1 illustrates the basic architecture of neurons and connections. The
number of connections within such a network is so large that it provides the network with
sophisticated capabilities such as logical deduction, object perception in natural scenes,
and so on.
Figure 2.1 : Basic Architecture of Neurons and Connections (two neurons, each with soma, dendrites and axon, linked by synapses)

The dendrites and axon are the channels for receiving and transmitting information,
while the synapses modulate the stimuli passing between neurons. The reaction signal
is produced and computed by the neurons. The entire procedure of information processing
and sharing is understood to be conducted in three steps :
• Receiving information.
• Processing information.
• Responding to the information.
The process keeps repeating until the network reaches a proper response towards the
stimuli.
Neural networks have been widely used in pattern recognition and classification tasks
such as vision and speech processing. Their advantages over traditional classification
schemes and expert systems are :
• Parallel consideration of multiple constraints
• Capability for continued learning throughout the life of the system
• Graceful degradation of performance
• Ability to learn arbitrary mappings between input and output spaces.
Indeed, neural network computing shows great potential in performing complex data
processing and data interpretation tasks. Neural networks are modeled after
neuro-physical structures of human brain cells and the connections among those cells.
Such networks are characterised by exceptional classification and learning capabilities.
Neural networks differ from most other classes of AI tools in that the network does not
require clear-cut rules and knowledge to perform tasks. The magic of neural networks is
their ability to make reasonable generalizations and perform reasonably on patterns that
have never before been presented to the network. They learn problem-solving procedures
by "characterising" and "memorising" the special features associated with each training
case and example, and "generalising" the knowledge. Internally, this learning process is
done by adjusting the weights tagged to the interconnections among the nodes of a
network. The training can be done in batches or individually in an incremental mode.
Neural networks are inspired by the biological systems in which large numbers of
neurons, which individually function rather slowly, collectively perform tasks at amazing
speeds that even the most advanced computers cannot match. These networks are made of
a number of simple processors, connected to one another by adjustable memory elements.
Each connection is associated with a weight and the weight is adjusted by experiences.
Among the more interesting properties of neural networks is their ability to learn. Neural
networks are not the only class of structures that learn. It is their learning ability coupled
with the distributed processing inherent in neural network systems that distinguishes
these systems from others.
Since neural networks learn, they are different from current AI expert systems in that
these networks are more flexible and adaptive: they can be thought of as dynamic
repositories of knowledge.
Researchers have long felt that the neurons are responsible for the human capacity
to learn, and it is in this sense that the physical structure is being emulated by a neural
network to accomplish machine learning. Each computational unit computes some
function of its inputs and passes the results to connected units in the network. The
knowledge of the systems comes out of entire network of the neurons.
Figure 2.2 shows the analog of a neuron as a threshold element. The variables
x1, x2, . . . , xi, . . . , xn are the n inputs to the threshold element. These are analogous to
impulses arriving from several different neurons to one neuron. The variables
w1, w2, . . . , wi, . . . , wn are the weights associated with the impulses/inputs, signifying the
relative importance that is associated with the path from which the input is coming.
When wi is positive, input xi acts as an excitatory signal for the element. When wi is
negative, input xi acts as an inhibitory signal for the element. The threshold element sums
the products of these inputs and their associated weights (Σ wi xi), compares the sum to a
prescribed threshold value and, if the summation is greater than the threshold value,
computes an output using a nonlinear function (F). The signal output y is a nonlinear
function (F) of the difference between the preceding computed summation and the
threshold value and is expressed as :

y = F (Σ wi xi − t) . . . (2.2)

where, xi = signal input (i = 1, 2, . . ., n),
wi = weight associated with the signal input xi, and
t = threshold level prescribed by the user.

Figure 2.2 : Threshold Element as an Analogue of a Neuron
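A minimal sketch of this threshold element in Python, assuming a hard step function for F (the text allows other nonlinear choices):

    def threshold_element(inputs, weights, t):
        # Eq. (2.2): y = F(sum_i wi*xi - t); F is taken here to be a hard
        # step function, one possible choice of nonlinearity
        s = sum(w * x for w, x in zip(weights, inputs))
        return 1 if s - t > 0 else 0

    # Excitatory (positive) and inhibitory (negative) weights at work
    print(threshold_element([1, 1], [0.7, 0.7], t=1.0))   # 1.4 > 1.0 -> fires (1)
    print(threshold_element([1, 1], [0.7, -0.4], t=1.0))  # 0.3 < 1.0 -> silent (0)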

Learning rules define which network parameters (weights, thresholds, number of
connections, etc.) change over time and in what way.
The node receives input from other nodes through weighted connections (links). The
input node is activated by the total effect of the weighted signals. The node’s output is
determined by processing the total sum of those weighted signals through a function,
generally a sigmoid function. Output signals travel along other weighted links to
connected nodes.
The simplest form of such a neural network is the two-layer associative network. As the
name implies, there are only two layers of nodes in each associative network: input and
output. The input patterns arriving at the input layer are mapped directly to a set of
output patterns at the output layer. There are no hidden nodes; that is, there is no internal
representation. These simply structured networks have proved practically useful in a
variety of applications. Obviously, the essential characteristic of such networks is that
mapping is done from similar input patterns to similar output patterns.
Researchers have discovered that the procedure by which neural networks respond to
external stimuli is one of electrical reactions. Over the years, researchers have tried to simulate
a neural system by using physical devices, such as electrical circuits, resistors, and wires.
The Hopfield Neural Network (HNN) is a prototype simulated neural system that
possesses an extremely efficient coupling ability.
An HNN has units such as neurons, synapses, dendrites, and an axon, which are made of
electrical devices. The functions and implicit meanings of these are exactly the same as
they are in bio-neural system. The architecture of HNN is shown in Figure 2.3. The basic
units are implemented as follows :
• Parallel input subsystem (dendrites Ii, i = 1, 2, 3, 4)
• Parallel output subsystem (axon Vi, i = 1, 2, 3, 4)
• Interconnectivity subsystem (synapses, the circuit in the schematic)
The neurons are constructed with electrical amplifiers in conjunction with the above
mentioned subsystems so as to simulate the basic computational features of the human
neural system. The procedure for the HNN to process the external stimuli is similar to
that of its biological counterpart. The functions of receiving, processing, and responding
to the stimuli are translated into three major electrical functions performed by the HNN.
These functions are :
• Sigmoid function
• Computing function
• Updating function
Figure 2.3 : Schematic of a Simplified Four-Neuron HNN (inputs I1-I4, outputs V1-V4)

The sigmoid function is a nonlinear increasing mathematical curve, as shown in Figure 2.4.
This function describes how a neuron reacts to the external signal and generates its initial
excitability value. The computing function, including all necessary information about its
procedure for producing a reaction signal, is the major processor that processes the
external information. The updating function sends out the reaction signals towards the
stimuli and passes these signals to the network to adjust the neurons' reactions. As such,
the HNN is able to provide a proper response toward the external stimuli and settle into a
stable state.
Figure 2.4 : The Sigmoid Function V = g(u)

A three layered perceptron architecture is used here for illustration purposes. The layers
are organized into a feed forward system, with each layer having full interconnection to
the next layer, but no connection within a layer, nor feedback connections to the previous
layer. The first layer is the input layer, whose units take on the activation equal to
corresponding network input values. The second layer is referred to as a hidden layer,
because its outputs are used internally and not considered as outputs of the network. The
final layer is the output layer. The activations of output units are considered the response
of the network.
The processing functions are as follows :
Neti = ∑ Wij A j + φ . . . (2.3)
j

1
Ai = Oi . . . (2.4)
(1 − e Net i )

where φ is a unit bias (similar to a threshold). Short-term operation of the network is
straightforward. The input layer unit activations are set equal to the corresponding
elements of an input vector. These activations propagate to the hidden layer via weighted
connections and are processed according to the functions above. The hidden layer outputs
then propagate to the output layer, where they are again processed by the above functions.
The activations of the output layer units form the network's response pattern.
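The short-term operation just described can be sketched as follows; the layer sizes and weight values are arbitrary assumptions for illustration:

    import math

    def logistic(net):
        # Eq. (2.4): A_i = O_i = 1 / (1 + e^(-Net_i))
        return 1.0 / (1.0 + math.exp(-net))

    def layer_forward(acts, weights, phi):
        # Eq. (2.3): Net_i = sum_j W_ij * A_j + phi, followed by Eq. (2.4)
        return [logistic(sum(w * a for w, a in zip(row, acts)) + phi)
                for row in weights]

    # Illustrative 2-input, 3-hidden, 1-output network; weights are arbitrary
    W_hidden = [[0.5, -0.2], [0.3, 0.8], [-0.6, 0.1]]
    W_output = [[1.2, -0.7, 0.4]]
    phi = 0.1

    x = [0.9, 0.2]                                 # input layer activations
    hidden = layer_forward(x, W_hidden, phi)       # hidden layer outputs
    response = layer_forward(hidden, W_output, phi)
    print(response)                                # the network's response pattern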
We give a brief overview and analysis of the perceptron architecture to show how
classification occurs within the network. The analysis is given with respect to the use of a
threshold unit activation function, rather than the sigmoid function of Eq. (2.4). The basic
operation of the network is similar for both functions.
A hidden or output unit utilizing a threshold function is either entirely deactivated or
entirely activated, depending on the state of its inputs. Each unit is capable of deciding to
which of two different classes its current inputs belong and may be perceived as forming
a decision hyperplane through the n-dimensional input space. The orientation of this
hyperplane depends on the values of the connection weights to the unit.
Each unit thus divides the input space into two regions. However, many more regions
(and of much more complex shape) can be represented by considering the decisions of all
hidden units simultaneously. The maximum number of regions M representable by H
hidden nodes having n inputs each is given by
M = Σ (k = 0 to n) C(H, k) . . . (2.5)

where C(H, k) denotes the binomial coefficient.

The actual number of representable regions will likely be lower, depending on the
efficiency of the learning algorithm (note that if two units share the same hyperplane, or
if three or more hyperplanes share a common intersection point, the number of
representable classes will be reduced).
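A quick check of Eq. (2.5) in Python (math.comb supplies the binomial coefficient):

    from math import comb

    def max_regions(H, n):
        # Eq. (2.5): M = sum_{k=0}^{n} C(H, k), the maximum number of
        # input-space regions representable by H hidden nodes with n inputs each
        return sum(comb(H, k) for k in range(n + 1))  # comb(H, k) = 0 for k > H

    print(max_regions(3, 2))   # 3 hidden nodes, 2 inputs -> 1 + 3 + 3 = 7 regions
    print(max_regions(7, 19))  # the case used in Section 2.5.3 -> 128 regions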
Each unit in a hidden layer may be viewed as classifying the input according to
microfeatures. Additional hidden layers and the output layer then further classify the
input according to higher level features composed of sets of microfeatures. For instance,
a two-input perceptron with no hidden layers cannot learn the exclusive-OR
classification of its inputs, since this requires classifying two disjoint regions in the input
space into the same class. The addition of a hidden layer results in the formation of a
higher level feature that integrates the two disjoint regions into one, permitting correct
classification.
The performance of a network utilizing a sigmoid unit activation function may be
interpreted in a similar fashion. The sigmoid function, however, introduces "fuzziness" into
the decision making of a unit. The unit no longer divides the input space into two crisp
regions. This fuzziness propagates through the network to the output.
The goal of a classification system, however, is to make a decision regarding its inputs; it
is therefore necessary to introduce a threshold decision at some point in the system. In
our system, the output nodes use an output function different from the activation
function :
Oi = 1 if Ai > t and Ai = maxj {Aj} . . . (2.6)
Oi = 0 otherwise

where t is a threshold value.
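A minimal sketch of this output decision rule:

    def decide(activations, t):
        # Eq. (2.6): O_i = 1 iff A_i exceeds the threshold t AND is the maximum
        # activation over all output units; every other O_i is 0
        peak = max(activations)
        return [1 if a > t and a == peak else 0 for a in activations]

    print(decide([0.2, 0.85, 0.4], t=0.7))  # -> [0, 1, 0]: unit 1 is selected
    print(decide([0.2, 0.55, 0.4], t=0.7))  # -> [0, 0, 0]: a null output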
SAQ 1
(a) What are the basic constituents of neural networks?
(b) What are neural networks?
(c) Discuss the scope of implementation of neural networks.
(d) What is the role of axon in functioning of neural networks?
(e) What is sigmoid function?
(f) What is processing function? Discuss its role in network’s functioning.

2.3 BACK PROPAGATION (BP)
The usefulness of the network comes from its ability to respond to input patterns in some
desirable fashion. For this to occur, it is necessary to train the network to respond
correctly to a given input pattern. Training or knowledge acquisition occurs by
modifying the weights of the network to obtain the required output. The most widely
used learning mechanism for multilayered perceptrons is the back propagation
algorithm (BP).
The problem of finding the best set of weights to minimise the error between the expected and
actual response of the network can be considered a non-linear optimisation problem. The
BP algorithm uses example input-output pairs to train the network. An input pattern is
presented to the network, and the network unit activations are calculated on a forward
pass through the network. The output unit activations represent the network's current
response to the given input pattern. This output is then compared to a desired output for
the given input pattern, and, assuming a logistic activation function, error terms are
calculated for each output unit by the following operation :
Δoi = (Ti − Aoi) Aoi (1 − Aoi) . . . (2.7)

where Ti is the desired activation of output unit i and Aoi the actual activation of output
unit i; Ahj, the actual activation of hidden unit j, appears in Eq. (2.8) below.
The weights leading into the output nodes are then adjusted according to the following
equation :

ΔWij = k Δoi Ahj . . . (2.8)

where k is a small constant often referred to as the learning rate.


The error terms are then propagated back to the hidden layer, where they are used to
calculate the error terms for the units of the hidden layer as follows :
Δhj = Ahj (1 − Ahj) Σk Δok Wjk . . . (2.9)

The weights from the input layer to the hidden layer are then adjusted as in Eq. (2.8).
Momentum terms may be added to the weight update equations to increase the rate of
convergence. Such terms attempt to speed convergence by preventing the search from
falling into shallow local minima during the search process. The strength of the
momentum term depends on the productivity of the current direction over a period of
iterations and thus introduces a more global perspective to the search. A widely used
simple momentum strategy is as follows :
• A momentum term was added to the weight update equation as follows :
ΔWij (t + 1) = η Δi Oj + α ΔWij (t) . . . (2.10)

where α is the momentum rate.


• If the total network error increases over a certain percentage, say 1%, from
the previous iteration, the momentum rate α is immediately set to zero until
the error is again reduced. This allows the search process to reorient itself if
it gets off track. Once the error is again reduced, the momentum rate is reset to
its original value.
During training, the total network error typically drops quickly during the initial
iterations but easily becomes destabilised when high learning rates are used. As the total
network error converges towards zero, the rate of change in error gradually decreases, and
the gradient descent search process can then tolerate higher learning rates without
destabilising. In order to take advantage of this behaviour, a small acceleration
factor ω was used to accelerate the learning rate from a small initial value (0.01) to some
maximum value (0.75) over several thousand iterations. The acceleration occurs in a
compound fashion, increasing the learning rate at each iteration according to the
following equation :

η = ω η . . . (2.11)

This ensures that large increases in the learning rate do not occur until the convergence
process has stabilised. Use of the acceleration factor was highly effective, typically
reducing convergence time by thousands of iterations. For this study, ω was set equal
to 1.001.
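The pieces above can be assembled into a single training step. The sketch below assumes a one-hidden-layer network with unit biases omitted; it is an illustration of Eqs. (2.7)-(2.11), not a complete training program:

    import math

    def logistic(x):
        return 1.0 / (1.0 + math.exp(-x))

    def bp_step(x, target, W_h, W_o, dW_h, dW_o, eta, alpha):
        """One back propagation step for a single-hidden-layer network
        (unit biases are omitted here for brevity)."""
        # Forward pass through hidden and output layers
        A_h = [logistic(sum(w * xi for w, xi in zip(row, x))) for row in W_h]
        A_o = [logistic(sum(w * a for w, a in zip(row, A_h))) for row in W_o]
        # Eq. (2.7): error terms for the output units
        d_o = [(t - a) * a * (1 - a) for t, a in zip(target, A_o)]
        # Eq. (2.9): error terms propagated back to the hidden units
        d_h = [A_h[j] * (1 - A_h[j]) *
               sum(d_o[k] * W_o[k][j] for k in range(len(W_o)))
               for j in range(len(A_h))]
        # Eqs. (2.8) and (2.10): weight changes with a momentum term
        for i in range(len(W_o)):
            for j in range(len(A_h)):
                dW_o[i][j] = eta * d_o[i] * A_h[j] + alpha * dW_o[i][j]
                W_o[i][j] += dW_o[i][j]
        for j in range(len(W_h)):
            for i in range(len(x)):
                dW_h[j][i] = eta * d_h[j] * x[i] + alpha * dW_h[j][i]
                W_h[j][i] += dW_h[j][i]
        return sum((t - a) ** 2 for t, a in zip(target, A_o))  # pattern error

    # Eq. (2.11): the learning rate is compounded each iteration, capped at 0.75
    eta, omega = 0.01, 1.001
    eta = min(eta * omega, 0.75)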
SAQ 2
(a) What is back propagation?
(b) What is the use of back propagation in neural networks?
(c) How are the error terms for the units of the hidden layer calculated?

2.4 SELF-ORGANISING MAP (SOM)


Self-organising maps are a special class of artificial neural networks. These networks are
based on competitive learning; the output neurons of the network compete among
themselves to be activated or fired, with the result that only one output neuron, or one
neuron per group, is on at any one time. An output neuron that wins the competition is
called a winner-takes-all neuron, or simply a winning neuron. A self-organising map is
characterised by the formation of a topographic map of the input patterns, in which the
spatial locations of the neurons in the lattice are indicative of intrinsic statistical features
contained in the input patterns; hence the name "self-organising map".
The principal goal of the self-organising map (SOM) is to transform an incoming signal
pattern of arbitrary dimension into a one- or two-dimensional discrete map, and to perform
this transformation adaptively in a topologically ordered fashion. Figure 2.5 shows the
schematic diagram of a two-dimensional lattice of neurons commonly used as the
discrete map. Each neuron in the lattice is fully connected to all the source nodes in the
input layer. This network represents a feed forward structure with a single computational
layer consisting of neurons arranged in rows and columns. A one-dimensional lattice is a
special case of the configuration depicted in Figure 2.5; in this special case the
computational layer consists simply of a single column or row of neurons.

Figure 2.5 : Two-Dimensional Lattice of Neurons, Fully Connected to the Layer of Source Nodes

Each input pattern presented to the network typically consists of a localised region or
"spot" of activity against a quiet background. The nature and location of such a spot varies
from one realisation of the input pattern to another. All the neurons in the network should
therefore be exposed to a sufficient number of different realisations of the input pattern to
ensure that the self-organisation process has a chance to mature properly.
The algorithm responsible for the formation of the self-organising map proceeds first by
initialising the synaptic weights in the network. This can be done by assigning them
small values picked from a random number generator; in so doing, no prior
order is imposed on the feature map. Once the network has been properly initialised,
there are three essential processes involved in the formation of the self-organising map, as
summarised here :
Competition
For each input pattern, the neurons in the network compute their respective values
of a discriminant function. This discriminant function provides the basis for
competition among the neurons. The particular neuron with the largest value of the
discriminant function is declared the winner of the competition.
Cooperation
The winning neuron determines the spatial location of a topological neighborhood
of excited neurons, thereby providing the basis for cooperation among such
neighboring neurons.
Synaptic Adaptation
The last mechanism enables the excited neurons to increase their individual values
of the discriminant function in relation to the input pattern through suitable
adjustments applied to their synaptic weights. The adjustments made are such that
the response of the winning neuron to the subsequent application of a similar input
pattern is enhanced.
Detailed descriptions of the processes of competition, cooperation, and synaptic adaptation
are now presented.
2.4.1 Competitive Process
Let m denote the dimension of the input space. Let an input pattern selected at random from
the input space be denoted by

X = [x1, x2, . . ., xm]^T . . . (2.12)
The synaptic weight vector of each neuron in the network has the same dimension as the
input space. Let the synaptic weight vector of neuron j be denoted by

Wj = [wj1, wj2, . . ., wjm]^T, j = 1, 2, . . ., l . . . (2.13)

where l is the total number of neurons in the network. To find the best match of the input
vector X with the synaptic weight vectors, compare the inner products Wj^T X for
j = 1, 2, . . ., l and select the largest. This assumes that the same threshold is applied to
all the neurons; the threshold is the negative of the bias. Thus, by selecting the neuron with
the largest inner product Wj^T X, the location where the topological neighbourhood of
excited neurons is to be centered is determined.

It is found that the best-matching criterion, based on maximising the inner product Wj^T X,
is mathematically equivalent to minimising the Euclidean distance between the vectors X
and Wj. If the index i(X) is used to identify the neuron that best matches the input vector
X, then i(X) is determined by applying the condition that sums up the essence of the
competition process among the neurons :

i(X) = arg minj || X − Wj ||, j = 1, 2, . . ., l . . . (2.14)

The particular neuron i that satisfies this condition is called the best-matching or winning
neuron for the input vector X. Depending upon the application of interest, the response of
the network could be either the index of the winning neuron, or the synaptic weight vector
that is closest to the input vector in a Euclidean sense.
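A minimal sketch of the competitive process of Eq. (2.14):

    def winning_neuron(x, W):
        # Eq. (2.14): i(x) = arg min_j || x - W_j ||, the best-matching neuron
        def dist2(w):
            return sum((xi - wi) ** 2 for xi, wi in zip(x, w))
        return min(range(len(W)), key=lambda j: dist2(W[j]))

    W = [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5]]   # synaptic weight vectors
    print(winning_neuron([0.7, 0.3], W))       # -> 1 (closest weight vector)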
2.4.2 Cooperative Process
The winning neuron locates the center of a topological neighborhood of cooperating
neurons. In practice, a neuron that is firing tends to excite the neurons in its immediate
neighborhood more than those farther away from it, which is intuitively satisfying. This
observation leads us to make the topological neighborhood around the winning neuron i
decay smoothly with lateral distance. To be specific, let hj,i denote the topological
neighborhood centered on winning neuron i, and encompassing a set of excited neurons,
a typical one of which is denoted by j. Let dj,i denote the lateral distance between winning
neuron i and excited neuron j. Then an assumption can be made that the topological
neighborhood hj,i is a unimodal function of the lateral distance dj,i, such that it satisfies
two distinct requirements :
• The topological neighborhood hj,i is symmetric about the maximum point
defined by dj,i = 0; in other words it attains its maximum value at the
winning neuron i, for which the distance dj,i is zero.
• The amplitude of the topological neighborhood hj,i decreases monotonically
with the increasing lateral distance dj,i, decaying to zero for dj,i → ∞; this is a
necessary condition for convergence.
A typical choice of hj,i that satisfies these requirements is the Gaussian function

hj,i(x) = exp (− dj,i² / 2σ²) . . . (2.15)
which is translation invariant. The parameter σ is the "effective width" of the
topological neighborhood; it measures the degree to which excited neurons in the vicinity
of the winning neuron participate in the learning process. It also makes the SOM
algorithm converge more quickly than a rectangular topological neighborhood would.
For cooperation among neighboring neurons to hold, it is necessary that the topological
neighborhood be dependent on the lateral distance between winning neuron i and excited
neuron j in the output space, rather than on some distance measure in the original input
space. This is precisely what we have in Eq. (2.15). In the case of a one-dimensional lattice,
dj,i is an integer equal to | i − j |. In the case of a two-dimensional lattice, it is defined by

dj,i² = || rj − ri ||² . . . (2.16)

where the discrete vector rj defines the position of excited neuron j and ri defines the discrete
position of winning neuron i, both of which are measured in the discrete output space.
Another unique feature of the SOM algorithm is that the size of the topological
neighborhood shrinks with time. This requirement is satisfied by making the width σ of
the topological neighborhood function hj,i decrease with time. A popular choice of the
dependence of σ on discrete time n is the exponential decay

σ(n) = σ0 exp (− n / τ1) . . . (2.17)
where σ0 is the value of σ at the initiation of the SOM algorithm, and τ1 is a time
constant.
2.4.3 Adaptive Process
The last process is the synaptic adaptive process in the self-organised formation of a
feature map. For the network to be self-organising, the synaptic weight vector Wj of
neuron j in the network is required to change in relation to the input vector X.
The question is how to make the change. In Hebb’s postulate of learning, a synaptic
weight is increased with a simultaneous occurrence of presynaptic and postsynaptic
28
activities. The use of such a rule is well suited for associative learning. For the type of
unsupervised learning being considered here, however, the Hebbian hypothesis in its
basic form is unsatisfactory for the following reason: changes in connectivity occur in
one direction only, which finally drives all the synaptic weights into saturation. To
overcome this problem, the Hebbian hypothesis is modified by including a forgetting
term − g(yj) Wj, where Wj is the synaptic weight vector of neuron j and g(yj) is some
positive scalar function of the response yj. The only requirement imposed on the function
g(yj) is that the constant term in the Taylor series expansion of g(yj) be zero, so that

g(yj) = 0 for yj = 0 . . . (2.18)

The significance of this requirement will become apparent momentarily. Given such a
function, the change to the weight vector of neuron j in the lattice can be found as
follows :

ΔWj = η yj x − g(yj) Wj . . . (2.19)

where η is the learning-rate parameter of the algorithm. The first term on the right-hand side
of Eq. (2.19) is the Hebbian term and the second term is the forgetting term. To satisfy
the requirement of Eq. (2.18), a linear function for g(yj) is chosen, as shown by

g(yj) = η yj . . . (2.20)

Eq. (2.19) is further simplified by setting


yj = hj,i(x) . . . (2.21)

Using Eqs. (2.19), (2.20), and (2.21), we obtain

ΔWj = η hj,i(x) (x − Wj) . . . (2.22)

Finally, using the discrete-time formalism, given the synaptic weight vector Wj(n) of neuron
j at time n, the updated weight vector Wj(n + 1) at time n + 1 is defined by

Wj(n + 1) = Wj(n) + η(n) hj,i(x)(n) (x − Wj(n)) . . . (2.23)

This update is applied to all neurons in the lattice that lie inside the topological neighborhood
of winning neuron i. Eq. (2.23) has the effect of moving the synaptic weight vector Wi of the
winning neuron i towards the input vector X. The algorithm therefore leads to a
topological ordering of the feature map in the input space, in the sense that neurons that
are adjacent in the lattice will tend to have similar synaptic weight vectors. Eq. (2.23) is
the desired formula for computing the synaptic weights of the feature map.
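The three processes can be combined into one update step, sketched below. The lattice coordinates `pos` and the exponential decay of the learning rate η are assumptions; the text specifies the decay schedule only for σ:

    import math

    def som_step(x, W, pos, n, eta0=0.1, sigma0=2.0, tau1=1000.0):
        """One complete SOM update at discrete time n. The decay of eta
        mirrors Eq. (2.17) for sigma and is an assumption here; `pos`
        holds the lattice coordinates of each neuron."""
        sigma = sigma0 * math.exp(-n / tau1)                 # Eq. (2.17)
        eta = eta0 * math.exp(-n / tau1)                     # assumed schedule
        i_win = min(range(len(W)),                           # Eq. (2.14)
                    key=lambda j: sum((a - b) ** 2 for a, b in zip(x, W[j])))
        for j in range(len(W)):
            d2 = sum((a - b) ** 2
                     for a, b in zip(pos[j], pos[i_win]))    # Eq. (2.16)
            h = math.exp(-d2 / (2 * sigma ** 2))             # Eq. (2.15)
            # Eq. (2.23): move W_j towards x, scaled by the neighborhood
            W[j] = [w + eta * h * (xi - w) for w, xi in zip(W[j], x)]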
SAQ 3
(a) What is self-organising map (SOM)?
(b) What is cooperative process in SOM?
(c) What is adaptive process in SOM?

2.5 IMPLEMENTATION OF NEURAL NETWORK IN PROCESS PLANNING PROBLEM
Neural networks are widely used in process planning problems. The process planner
learns mappings between input patterns, consisting of the features and attributes of a part,
and output patterns, consisting of sequences of machining operations to apply to these
parts. Thus neural networks offer a promising solution for automating the learning of
process knowledge.
2.5.1 Problem Representation
The process planning task may be represented by the transformation

F × A → C

where F is a set of part features, A is a set of feature attributes, C is a set of feasible
operation sequences, and → indicates a mapping function.
Process planners are interested in those features that are generated by some sequence of
machining processes. Typical features include holes (threaded and unthreaded), external
cylinders (threaded and unthreaded), faces, slots, keyways, and gears. Each feature is
associated with a set of attributes that define it from a manufacturing standpoint. These
include dimensions, tolerances, and surface finish requirements.
Based on the particular values of feature attributes, the process planner can identify the
sequence of operations necessary to produce the feature. Each sequence corresponds to a
particular classification of the input pattern. In the neural network the transformation
function is embedded in the network’s connection weights through training. Figure 2.6
demonstrates how inputs are physically presented to the network. Each known feature is
associated with an input unit, which is highly active (+ 1) if the feature is present, and
inactive (0) otherwise. Each known attribute is also associated with an input unit. The
input unit takes on the value of the attribute, normalized to lie between 0 and 1.
The features composing the part being planned are presented to the network one at a
time, along with their corresponding attributes. The network's response to the feature
pattern represents selection of a machining operation. A positive activation of an output
unit is interpreted as support for selecting the corresponding machining operation. A
threshold mechanism selects the operation whose output unit has the highest positive
activation above some threshold value, as presented in Eq. (2.6). Such a mechanism can
be implemented in a neural network via lateral inhibition in the output layer.
Figure 2.6 : A Neural Network for Process Planning Knowledge Acquisition (an input layer receiving feature units, attribute values, and recurrent sequence-context inputs; a hidden layer; and an output layer performing operation selection)

Note that output units are assigned to single operations rather than sequences. To learn
sequencing constraints, it is necessary to provide an encoding of operation position within a
sequence. Encoding may be accomplished directly by assigning one output unit to each
possible sequence or, for each operation, to each possible position of that operation in a
sequence. Neither of these encodings accurately reflects the approach of the expert
process planner, however, and both require a large number of output units, many of
which are infrequently used. An expert process planner builds a sequence of operations
for each feature, with each operation being added depending on a specific, but not
necessarily identical, set of attributes. In order to select operations individually, yet
maintain correct sequencing, the outputs of the network may be fed back as inputs, thus
serving as a context for the decision on the next operation in the sequence. Recurrence
ends when a null output is obtained; that is, all output units have activations below the
threshold value, signaling the end of the sequence. By using recurrence, output units are
efficiently utilised, with little sacrifice in the final performance speed of the network.
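A sketch of this recurrent decoding loop follows; the `predict` callable is a hypothetical stand-in for the trained network's forward pass:

    NUM_OPERATIONS = 15   # one output unit per machining operation (Section 2.5.3)

    def plan_feature(feature_inputs, predict, threshold, max_ops=10):
        """Sketch of the recurrent decoding loop described above. `predict`
        maps the feature/attribute inputs plus the recurrent context to one
        activation per machining operation (hypothetical stand-in here)."""
        context = [0.0] * NUM_OPERATIONS
        sequence = []
        for _ in range(max_ops):
            acts = predict(feature_inputs + context)
            best = max(range(len(acts)), key=lambda i: acts[i])
            if acts[best] <= threshold:          # null output: sequence complete
                break
            sequence.append(best)                # index of the selected operation
            # Feed the selection back as context for the next decision
            context = [1.0 if i == best else 0.0 for i in range(len(acts))]
        return sequence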
2.5.2 Training
Training of neural networks may be distinguished by whether training patterns are
presented incrementally or in batches. In incremental training, weights are updated after
the presentation of each training input-output pair. An error measure, the total pattern sum
of squares error (PSS), is calculated for each iteration as

PSS = Σ (j = 1 to o) (Tj − Aj)² . . . (2.24)

where o is the number of output units, and Tj and Aj are the training value and activation,
respectively, for output node j.
The training pattern is repeatedly presented until a tolerable error level is achieved,
determined by

E ≥ PSS . . . (2.25)

where E is some error criterion.


In batch training, a set of patterns is presented to the network. The network error terms for
each pattern are summed, and only at the end of the presentation of all the patterns are the
weights updated. Training continues until the total sum of squares error, TSS, falls below E,
as given by

E ≥ TSS = Σ (i = 1 to p) PSSi . . . (2.26)

where p is the number of patterns in the batch.


While the incremental approach more closely mimics the learning experience, it tends to
perform more poorly than the batch approach. The quality of system response becomes
dependent on the order in which examples are presented, and also the extent to which
error is minimised for each pattern. Minimisation of error for a particular pattern may
cause increased response error for another pattern. The batch approach minimises
response error over all example patterns, resulting in better overall performance.
However, a complete set of examples covering all contingencies will rarely be available
for a realistic application. Therefore some combination of batch and incremental modes
might be useful.
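A minimal sketch of the batch-mode stopping test of Eqs. (2.24)-(2.26); `forward` and `update` are hypothetical stand-ins for the network's forward pass and accumulated weight update:

    def pss(targets, acts):
        # Eq. (2.24): pattern sum of squares error over the o output units
        return sum((t - a) ** 2 for t, a in zip(targets, acts))

    def train_batch(patterns, forward, update, E):
        """Batch mode: error is accumulated over the whole batch and the
        weights updated once per presentation of all the patterns."""
        while True:
            tss = sum(pss(t, forward(x)) for x, t in patterns)  # Eq. (2.26)
            if E >= tss:                  # Eq. (2.25)-(2.26): tolerable error
                break
            update(patterns)              # one weight update per full batch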
2.5.3 Examples
To demonstrate the neural network approach, a training set of example process plans was
generated for a spur gear with five features: a hole, a keyway, two faces (identical in
dimensions and tolerances), and the gear teeth. Each feature was associated with a set of
attributes, defined in Table 2.1. For each example, the values for the attributes were
randomly assigned within common manufacturing ranges for such features, subject to
physical constraints.
Table 2.1 : Example Features and Associated Attributes

Feature   Attributes
Hole      Depth, diameter, size tolerance, position tolerance, circularity, straightness, surface finish
Face      Diameter, thickness, size tolerance, parallelism, surface finish
Keyway    Width, depth, inset, size tolerance, position tolerance, surface finish
Gear      Diametral pitch, error in action, surface finish

In order to accurately evaluate the performance of the network, the process plans were
generated artificially using rules. These rules then constituted the domain transformation
functions to be learned by the network. Figure 2.7 shows some rules used to generate the
operation sequences for hole features. Note that these rules are non-trivial and, in fact,
require evaluation of relationships between attributes. Figure 2.8 demonstrates some
example process plans generated.
Table 2.2 defines the machining operations known to a network. A number of these
operations, such as milling and honing, are used in the manufacture of more than one
feature type.
if (depth/diameter ratio) >= 3 then
    if (diameter > 2) then
        "center drill"
        "trepan"
    else
        "gundrill"
    end if
else
    if ((diameter < 0.75) and
        ((size tolerance <= 0.003) or
         (position tolerance <= 0.005))) then
        "center drill"
        "twist drill"
    else
        "twist drill"
    end if
end if
if (straightness <= 0.001) then
    "ream"
end if
if (circularity <= 0.001) then
    "counterbore"
end if
if (surface finish <= 16) then
    "hone"
end if
Figure 2.7 : Some Rules Used to Generate Process Plans for Hole Features

A total of 19 attributes and 15 machining operations were defined for the four feature
types (note that three of the attributes correspond to the same dimension – hole depth, face
thickness, keyway depth – and can be considered a single attribute). The input layer
consists of 38 units : 4 units corresponding to the four feature types, 19 units
corresponding to the attributes, and 15 units corresponding to the recurrent feedback
inputs. The output layer consists of 15 units, each corresponding to a particular machining
operation. Since the domain rules are known in advance, it is possible to determine an
approximate lower bound on the number of hidden units needed.
<hole>
attributes :
diameter = 1.1011
depth = 8.2909
size tolerance = +/-0.0105
position tolerance = +/-0.0117
straightness tolerance = +/-0.0245
circularity tolerance = +/-0.0033
surface finish = 32
process sequence :
gundrill
<keyway>
attributes :
width = 0.2434
depth = 7.8141
inset = 0.8812
size tolerance = +/-0.0188
position tolerance = +/-0.0053
surface finish = 16
process sequence :
broach
finish broach
<face>
attributes :
diameter = 20.1368
thickness = 8.2909
size tolerance = +/-0.0008
parallelism tolerance = +/- 0.0081
surface finish = 63
process sequence :
mill
finish mill
grind
<gear>
attributes :
diametral pitch = 5.6169
error in action = +/- 0.0026
surface finish = 8
process sequence :
hob
finish hob
gear shaving
Figure 2.8 : Examples of Generated Process Plans

Table 2.2 : Known Machining Operations

Operation       Used for
Center drill    Hole
Trepan          Hole
Gundrill        Hole
Twist drill     Hole
Ream            Hole
Counterbore     Hole
Hone            Hole, keyway
Broach          Keyway
Finish broach   Keyway
Mill            Face, gear
Finish mill     Face, gear
Grind           Face, gear
Hob             Gear
Finish hob      Gear
Gear shaving    Gear

The number of regions required to be uniquely identified for each feature is found by
taking the number of rule conditions times the maximum operation sequence length for
that feature. The number of regions was then summed over all four features, yielding a
total of 120 regions. An approximate lower bound can then be found by determining H
satisfying the inequality

M ≤ Σ (k = 0 to n) C(H, k) . . . (2.27)

For M = 120 regions, the inequality was minimally satisfied for H = 7 nodes. The bound
proved fairly accurate; 8 nodes were required during the simulations in order to achieve
convergence during training. The training set contained 30 examples. The network was
trained in a number of different modes in order to characterize the performance
differences between incremental and batch learning. The network was trained in four
modes: a pure incremental mode, in which the network was trained on each pattern
separately; a combined approach, in which the network was trained in batches of 10 new
examples (3 batches); a second combined approach in which the batch size was again 10,
but 5 of the examples were randomly chosen from previously learned batches
(6 batches), with the aim of reducing forgetting effects; and a pure batch mode, in which
all 30 examples are presented simultaneously (1 batch). In addition the network was
trained, in batch mode, with 75 example training sets in order to examine the effect of
increasing training set size on performance. For all simulations, the error criterion, E, was
equal to 0.7, and the threshold, t, was also equal to 0.7.
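The lower bound of Eq. (2.27) can be reproduced with a few lines of Python:

    from math import comb

    def min_hidden_nodes(M, n):
        # Smallest H satisfying Eq. (2.27): M <= sum_{k=0}^{n} C(H, k)
        H = 1
        while sum(comb(H, k) for k in range(n + 1)) < M:
            H += 1
        return H

    print(min_hidden_nodes(120, 19))   # -> 7, the lower bound quoted above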
2.5.4 Results
After training, additional gear process plans were generated for testing the network’s
trained performance. The trained network was tested separately with both the training
and testing example sets. For each training session the following information was
collected.
• Example set used (training and testing).
• Batch size.
• Overlap size.
• Number of iterations per batch (training only); for incremental training,
the total number of iterations over all batches.

• Iterations per Minute (Training Only) : This information is included to
give an idea of simulation speeds on a sequential computer and is not
indicative of the speed of an actual neural network implementation.
• Percentage of Essentially Correct Plans Missing Operations
(COR/MISS) : The percentage of feasible plans that constituted a subset of
the correct plan but were missing necessary finishing operations.
• Percentage of Essentially Correct Plans with Extra Operations
(COR/EXTRA) : The percentage of feasible plans that contain the correct
plan as a subset but include unnecessary finishing operations.
• Percentage of Incorrect but Feasible Plans (INC/FEAS) : The percentage
of plans generated by the network which are feasible for the particular
feature but not applicable for the given feature instance.
• Percentage of Infeasible Plans (INFEAS) : The percentage of plans
generated that do not represent a possible plan, either due to inclusion of an
incorrect operation or to improper operation ordering.
• Percentage of Correct Process Plans (COR) : The percentage of generated
process plans containing no errors.
This information is summarised in Table 2.3. The percentages do not tally exactly to
100% because some plans had both extra and missing operations and thus were counted
twice. The results indicate a consistent decrease in the total convergence time required to
learn a fixed number of examples with increasing batch size, suggesting that the batch
approach is more efficient than the incremental approach from a training perspective. The
number of training iterations per batch increases considerably, however, with increased
batch size (note the difference in convergence times between batch sizes 30 and 75); this
can be attributed to the need for the system to consider more constraints simultaneously
for larger batch sizes.
Table 2.3 : Network Performance

Batch   Overlap   Example    Convergence   Speed        COR/MIS   COR/EXT   INC/FEA   INFEAS   COR
Size    Size      Set        (Iter)        (Iter/min)   (%)       (%)       (%)       (%)      (%)
1       0         Training   7500          745          38.5      11.7      7.5       -        49.2
                  Testing    -             -            47        9.1       1.4       0.8      41.6
10      0         Training   5960          80           8.3       20.8      4.2       4.1      65
                  Testing    -             -            19.1      25.8      5         5.8      46.7
5       5         Training   5810          80           6.6       19.2      2.5       1.6      71.7
                  Testing    -             -            15        21.7      3.3       5        61.7
30      0         Training   4880          25           -         -         -         -        100
                  Testing    -             -            7         18        2.5       1.6      73.5
75      0         Training   13120         12           -         -         -         -        100
                  Testing    -             -            -         10        1.6       0.8      83.3

Because smaller batch sizes do not consider all constraints simultaneously, forgetting
effects can occur : learning of new examples occurs at the expense of previously learned
examples. That this had occurred is evidenced by examining the performance of the
network on the example set used to train it. Note that for the single batch case,
performance was 100% correct for the training example set, whereas performance
dropped to 49.2% for the strictly incremental approach (batch size 1). Increasing batch
size yielded better performance, but the simulations required significantly more CPU
time.
The forgetting effects can be negated to some extent by including previously learned
examples in new example batches as context. Note that with an overlap of 5 examples
and 5 new examples, performance (total number of completely correct plans) improves
over the case of batch size 10 with no overlap.
The majority of the planning errors obtained were not simply random errors but instead
were inclusions and exclusions of finishing operations. Because examples are randomly
drawn from the population, with each example representing a particular instance of the
mapping function, the example set only provides an approximate model of the actual
transformation function. Thus the quality of the internal model learned by the network is
directly related to the sample size. Note the increase in performance for the 75 example
case over the 30 example case.
Strictly infeasible plans were rare. In almost all cases, the infeasibility resulted from an
inappropriate operation occurring at the end of a plan sequence, generally with much
weaker activation than the correct operations within the plan. In all cases the occurrence
of the errors could be eliminated by raising the threshold. In addition, using larger batch
sizes decreased the occurrence of these errors.
For a constant threshold, increasing the batch size led to an increase in both the use and
misapplication of operations that were rare in the training example set, such as hone. This
can be seen in the increase of COR/EXTRA for batch size 10 over batch size 1. However,
with larger batch sizes, COR/EXTRA again began to decrease. For the small batch size, it
appeared that the network was only able to learn those operations that were core to
almost all process plans. Larger batch sizes generated these operations but, for batch size
10, frequently misapplied them.
An important issue in neural networks is the training convergence rate. The convergence rate
is affected by a number of factors, including the learning and momentum rates and problem
complexity. In general, with all other factors held constant, increasing either the
momentum or learning rate resulted in a decrease in convergence time. However, above
a certain level, instability will occur, with the network eventually settling into a poor local
error minimum. Another important issue for neural networks is size complexity. For the
given problem representation, the number of input units is directly equal to the number of
features plus attributes and thus grows linearly with the size of the input space. Likewise, the
number of output nodes grows linearly with the number of machining operations to be
represented. The number of hidden nodes required is primarily a function of the problem
complexity, in terms of the number of decision regions required.
SAQ 4
(a) What is process planning?
(b) How is the training of neural networks done?

2.6 SUMMARY
The basics of neural networks and their implementation in the automatic acquisition of
process planning knowledge have been introduced. This approach overcomes the time
complexity associated with earlier attempts using machine learning techniques. The
example demonstrated here shows the potential of the approach for use on real-world
problems. The neural network approach uses a single methodology for generating useful
inferences, rather than using explicit generalization rules. Because the network only
generates inferences as needed for a problem, there is no need to generate and store all
possible inferences ahead of time.

2.7 KEY WORDS
Neural Networks : A neural network is a massively parallel
distributed processor made up of simple
processing units, which has a natural propensity
for storing experiential knowledge and making it
available for use.
Back Propagation : The most widely used learning mechanism for
multi-layered perceptrons.
SOM : Self-organising maps are a special class of artificial
neural networks. These networks are based on
competitive learning; the output neurons of the
network compete among themselves to be
activated or fired, with the result that only one
output neuron, or one neuron per group, is on at
any one time.

2.8 ANSWERS TO SAQs


Refer to the preceding text for all the answers to SAQs.
