UNIT 2 NEURAL NETWORKS
2.1 INTRODUCTION
Work on artificial neural networks, commonly referred to as “neural networks,” has been
motivated right from its inception by the recognition that the human brain computes in an
entirely different way from the conventional digital computer. In its most general form, a
neural network is a machine that is designed to model the way in which the brain
performs a particular task or function of interest; the network is usually implemented by
using electronic components or is simulated in software on a digital computer. We may
offer the following definition of a neural network viewed as an adaptive machine:
A neural network is a massively parallel distributed processor made up of simple
processing units, which has a natural propensity for storing experiential knowledge and
making it available for use. It resembles the brain in two respects :
• Knowledge is acquired by the network from its environment through a
learning process.
• Interneuron connection strengths, known as synaptic weights, are used to
store the acquired knowledge.
Neural networks are a class of computing systems that use a highly parallel architecture
to efficiently perform pattern recall, prediction, and classification tasks. These networks
are loosely modeled after human networks of neurons in the brain and nervous systems.
The networks consist of large numbers of simple processing units that are characterized
by a state of activation, which is a function of inputs to the units. Each unit calculates
three functions :
• Propagation Function Neti : Generally the propagation function is
calculated as the weighted sum of the outputs of all other units connected to
unit i :
Neti = ∑j Wij Oj . . . (2.1)
where, Wij is the strength of the connection between units i and j, and Oj is
the output level of unit j.
• Activation Function Ai : A function of Neti and occasionally of time.
Commonly used functions are the linear, logistic, and threshold functions.
• Output Function Oi : A function of Ai. Oi is often simply set equal to the activation Ai (a minimal sketch of these three unit functions follows this list).
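These three unit functions can be made concrete with a short sketch in Python. This is a minimal illustration only; the logistic choice of activation and all names and numbers are assumptions made for this example, not part of any particular network :

import math

def propagation(weights, outputs):
    # Eq. (2.1) : Neti = sum over j of Wij * Oj
    return sum(w * o for w, o in zip(weights, outputs))

def activation(net):
    # one common choice : the logistic function
    return 1.0 / (1.0 + math.exp(-net))

def output(a):
    # the output is often simply set equal to the activation
    return a

o = output(activation(propagation([0.5, -0.3, 0.8], [1.0, 0.6, 0.2])))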
The connections may be inhibitory or excitatory. Inhibitory connections tend to reduce
the activation of units to which they are connected, while excitatory connections will
tend to raise the activation.
Neural networks operate at two different levels of time: short-term response to
environmental inputs, and long-term changes in connection weights, which encode
knowledge and change how the network reacts to its environment.
Many different neural network architectures have been developed. They differ in the
types of propagation and activation functions used, how units are interconnected, and
how learning is implemented. The type of paradigm used depends on the characteristics
of the task to be performed. A major distinction among networks is whether the system
will be used for recall (recognition), prediction, or classification. Recall systems are used
for noise filtering and pattern completion (also called content addressable memories or
CAM). Examples are Anderson’s Brain-State-in-a-Box (BSB) model and the Hopfield network.
Prediction systems can be used to estimate the behavior of complex systems. An example
is the work of Rangwala and Dornfeld on the prediction of machining parameters.
Classification networks create mappings of input patterns into categories, represented by
characteristic output patterns.
The use of neural networks offers the following useful properties and capabilities :
Non-linearity, Input-Output Mapping, Adaptivity, Evidential Response, Contextual
Information, Fault Tolerance, Uniformity of Analysis and Design, Neurobiological
Analogy, etc.
Objectives
After studying this unit, you should be able to
• describe the architecture of neural network,
• understand the concept of self organizing map, and
• explain the applications of neural network in production systems.
Figure 2.1 : Two Biological Neurons (Soma, Dendrites, Axon and Synapse)
The dendrites and axon are the channels for receiving and transmitting information. The
synapses process the stimuli. The reaction signal is produced and computed
by neurons. It is understood that the entire procedure of information processing and
sharing is conducted in three steps :
• Receiving information.
• Processing information.
• Responding to the information.
The process keeps repeating until the network reaches a proper response towards the
stimuli.
Neural networks have been widely used in pattern recognition and classification tasks
such as vision and speech processing. Their advantages over traditional classification
schemes and expert systems are :
• Parallel consideration of multiple constraints
• Capability for continued learning throughout the life of the system
• Graceful degradation of performance
• Ability to learn arbitrary mappings between input and output spaces.
Indeed, neural network computing shows great potential in performing complex data
processing and data interpretation tasks. Neural networks are modeled after
neuro-physical structures of human brain cells and the connections among those cells.
Such networks are characterised by exceptional classification and learning capabilities.
Neural networks differ from most other classes of AI tools in that the network does not
require clear-cut rules and knowledge to perform tasks. The magic of neural networks is
the ability to make reasonable generalizations and perform reasonably on patterns that
have never before been presented to the network. They learn problem solving procedures
by “characterising” and “memorising” the special features associated with each training
case and example, and “generalising” the knowledge. Internally, this learning process is
done by adjusting the weights tagged to the interconnections among those nodes of a
network. The training can be done in batches or individually in an incremental mode.
Neural networks are inspired by the biological systems in which large numbers of
neurons, which individually function rather slowly, collectively perform tasks at amazing
speeds that even the most advanced computers cannot match. These networks are made of
a number of simple processors, connected to one another by adjustable memory elements.
Each connection is associated with a weight and the weight is adjusted by experiences.
Among the more interesting properties of neural networks is their ability to learn. Neural
networks are not the only class of structures that learn. It is their learning ability coupled
with the distributed processing inherent in neural network systems that distinguishes
these systems from others.
Since neural networks learn, they are different from current AI expert systems in that
these networks are more flexible and adaptive: they can be thought of as dynamic
repositories of knowledge.
Researchers have long felt that neurons are responsible for the human capacity
to learn, and it is in this sense that the physical structure is being emulated by a neural
network to accomplish machine learning. Each computational unit computes some
function of its inputs and passes the results to connected units in the network. The
knowledge of the system comes out of the entire network of neurons.
Figure 2.2 shows the analog of a neuron as a threshold element. The variables
x1, x2, . . . , xi, . . . , xn are the n inputs to the threshold element. These are analogous to
impulses arriving from several different neurons to one neuron. The variables
w1, w2, . . . , wi, . . . , wn are the weights associated with the impulses/inputs, signifying the
relative importance that is associated with the path from which the input is coming.
When wi is positive, input xi acts as an excitatory signal for the element. When wi is
negative, input xi acts as an inhibitory signal for the element. The threshold element sums
the product of these inputs and their associated weights (∑ wi xi ), compares it to a
prescribed threshold value and, if the summation is greater than the threshold value,
computes an output using a nonlinear function (F). The signal output y is a nonlinear
function (F) of the difference between the preceding computed summation and the
threshold value and is expressed as :
y = F ( ∑ wi xi − t ) . . . (2.2)
where, xi = signal input (i = 1, 2, . . . , n),
wi = weight associated with the signal input xi, and
t = threshold level prescribed by user.
Figure 2.2 : The Threshold Element (Inputs x1, . . . , xn, Weights w1, . . . , wn, Summation ∑, Threshold t, Output y)
Figure 2.3 : Activation Functions v = g (u), Plotted between − u0 and u0
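As a worked illustration of Eq. (2.2), the sketch below implements the threshold element, taking F to be a hard-limiting (step) nonlinearity; the function name and the sample numbers are assumptions made for this example :

def threshold_element(x, w, t):
    # Eq. (2.2) : y = F(weighted sum of the inputs, minus the threshold t)
    s = sum(wi * xi for wi, xi in zip(w, x))
    # F chosen here as a step function
    return 1 if s - t > 0 else 0

y = threshold_element([1, 0, 1], [0.4, 0.9, 0.3], t=0.5)  # 0.7 > 0.5, so y = 1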
A three layered perceptron architecture is used here for illustration purposes. The layers
are organized into a feed forward system, with each layer having full interconnection to
the next layer, but no connection within a layer, nor feedback connections to the previous
layer. The first layer is the input layer, whose units take on the activation equal to
corresponding network input values. The second layer is referred to as a hidden layer,
because its outputs are used internally and not considered as outputs of the network. The
final layer is the output layer. The activations of output units are considered the response
of the network.
The processing functions are as follows :
Neti = ∑j Wij Aj + φ . . . (2.3)
Ai = Oi = 1 / (1 + e− Neti ) . . . (2.4)
The actual number of representable regions will likely be lower, depending on the
efficiency of the learning algorithm (note that if two units share the same hyperplane, or
if three or more hyperplanes share a common intersection point, the number of
representable classes will be reduced).
Each unit in a hidden layer may be viewed as classifying the input according to
microfeatures. Additional hidden layers and the output layer then further classify the
input according to higher level features composed of sets of microfeatures. For instance,
a perceptron with two input units and no hidden layers cannot learn the exclusive-OR
classification of its inputs, since this requires classifying two disjoint regions in the input
space into the same class. The addition of a hidden layer results in the formation of a
higher level feature that integrates the two disjoint regions into one, permitting correct
classification.
The performance of a network utilizing a sigmoid unit activation function may be
interpreted in a similar fashion. The sigmoid function, however, includes “fuzziness” into
the decision making of a unit. The unit no longer divides the input space into two crisp
regions. This fuzziness propagates through the network to the output.
The goal of a classification system, however, is to make a decision regarding its inputs; it
is therefore necessary to introduce a threshold decision at some point in the system. In
our system, the output nodes use an output function different from the activation
function :
Oi = 1 if Ai > t and Ai = maxj {Aj} . . . (2.6)
= 0 otherwise
where, t is a threshold value.
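A minimal forward pass through such a three-layer network, combining Eqs. (2.3), (2.4) and (2.6), might look as follows; the layer sizes, weights and bias φ are placeholders chosen only for this sketch :

import math

def logistic(net):
    # Eq. (2.4) : Ai = Oi = 1 / (1 + exp(-Neti))
    return 1.0 / (1.0 + math.exp(-net))

def layer(inputs, W, bias):
    # Eq. (2.3) : Neti = sum over j of Wij * Aj, plus the bias term
    return [logistic(sum(w * a for w, a in zip(row, inputs)) + bias)
            for row in W]

def classify(activations, t):
    # Eq. (2.6) : Oi = 1 if Ai > t and Ai is the maximum activation, else 0
    m = max(activations)
    return [1 if (a > t and a == m) else 0 for a in activations]

hidden = layer([0.2, 0.7], W=[[0.5, -0.4], [0.3, 0.8]], bias=0.1)
outputs = classify(layer(hidden, W=[[1.0, -1.0], [-0.5, 0.9]], bias=0.1), t=0.5)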
SAQ 1
(a) What are the basic constituents of neural networks?
(b) What are neural networks?
(c) Discuss the scope of implementation of neural networks.
(d) What is the role of axon in functioning of neural networks?
(e) What is sigmoid function?
(f) What is processing function? Discuss its role in network’s functioning.
2.3 BACK PROPAGATION (BP)
The usefulness of the network comes from its ability to respond to input patterns in some
desirable fashion. For this to occur, it is necessary to train the network to respond
correctly to a given input pattern. Training or knowledge acquisition occurs by
modifying the weights of the network to obtain the required output. The most widely
used learning mechanism for multilayered perceptrons is the back propagation
algorithm (BP).
The problem of finding the best set of weights to minimise the error between the expected and
actual response of the network can be considered a non-linear optimisation problem. The
BP algorithm uses example input-output pairs to train the network. An input pattern is
presented to the network, and the network unit activations are calculated on a forward
pass through the network. The output unit activations represent the network’s current
response to the given input pattern. This output is then compared to a desired output for
the given input pattern, and, assuming a logistic activation function, error terms are
calculated for each output unit by the following operation :
Δoi = ∂E/∂Netoi = (Ti − Aoi ) Aoi (1 − Aoi ) . . . (2.7)
where E is the network error, Ti the desired activation of output unit i, Aoi the actual
activation of output unit i, and Ahj the actual activation of hidden unit j.
The weights leading into the hidden nodes are then adjusted according to the following
equation :
ΔWij = − k Δoi Ahj . . . (2.8)
The weights from the input layer to the hidden layer are then adjusted in an analogous
manner. Momentum terms may be added to the weight updates of Eq. (2.8) to
increase the rate of convergence. Such terms attempt to speed convergence by preventing
the search from falling into shallow local minima during the search process. The strength
of the error terms will depend on the productivity of the current direction over a period of
iterations and thus introduces a more global perspective to the search. A widely used
simple momentum strategy is as follows :
• A momentum term was added to the weight update equation as follows :
ΔWij (t + 1) = η ∑ (Δj Oj ) + α ΔWij (t) . . . (2.10)
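The update rules above can be sketched for a single output-layer weight as follows; the learning rate η, momentum α and sample values are assumptions, and the sign convention follows the conventional delta rule rather than any one text :

def output_delta(target, a_out):
    # error term of Eq. (2.7) for a logistic output unit
    return (target - a_out) * a_out * (1.0 - a_out)

def update_weight(w, dw_prev, delta, a_hidden, eta=0.25, alpha=0.9):
    # gradient step of Eq. (2.8) plus the momentum term of Eq. (2.10)
    dw = eta * delta * a_hidden + alpha * dw_prev
    return w + dw, dw

w, dw = update_weight(w=0.3, dw_prev=0.0,
                      delta=output_delta(target=1.0, a_out=0.6),
                      a_hidden=0.8)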
2.4 SELF-ORGANISING MAP (SOM)
Figure : A Layer of Source Nodes Feeding the Network Lattice
Each input pattern presented to the network typically consists of a localized region or
“spot” of activity against a quiet background. The nature and location of such a spot varies
from one realisation of the input pattern to another. All the neurons in the network should
therefore be exposed to a sufficient number of different realisations of the input pattern to
ensure that the self-organisation process has a chance to mature properly.
The algorithm responsible for the formation of self-organising map proceeds first by
initializing the synaptic weights in the network. This can be done by assigning them
small values picked from a random number generator, and in so doing, no prior
order is imposed on the feature map. Once the network has been properly initialized,
there are three essential processes involved in the formation of the self-organising map, as
summarised here :
Competition
For each input pattern, the neurons in the network compute their respective values
of a discriminant function. This discriminant function provides the basis for
competition among the neurons. The particular neuron with the largest value of the
discriminant function is declared winner of the competition.
Cooperation
The winning neuron determines the spatial location of a topological neighborhood
of excited neurons, thereby providing the basis for cooperation among such
neighboring neurons.
Synaptic Adaptation
The last mechanism enables the excited neurons to increase their individual values
of the discriminant function in relation to the input pattern through suitable
adjustments applied to their synaptic weights. The adjustments made are such that
the response of the winning neuron to the subsequent application of a similar input
pattern is enhanced.
Detailed descriptions of the processes of competition, cooperation, and synaptic adaptation
are now presented.
2.4.1 Competitive Process
Let m denote the dimension of the input space. Let an input pattern selected at random from
the input space be denoted by
X = [ x1 , x2 , . . . , xm ]T . . . (2.12)
The synaptic weight vector of each neuron in the network has the same dimension as the
input space. Let the synaptic weight vector of neuron j be denoted by
Wj = [ wj1 , wj2 , . . . , wjm ]T , j = 1, 2, . . . , l . . . (2.13)
where l is the total number of neurons in the network. To find the best match of the input
vector X with the synaptic weight vector, compare the inner products WjT X for
j = 1, 2, . . . , l and select the largest. This assumes that the same threshold is applied to
all the neurons; the threshold is the negative of bias. Thus, by selecting the neuron with
the largest inner product WjT X , the location where the topological neighbourhood of
excited neurons is to be centered is determined.
It is found that the best matching criterion, based on maximising the inner product WjT X ,
is mathematically equivalent to minimising the Euclidean distance between the vectors X
and Wj. If the index i (X) is used to identify the neuron that best matches the input vector
X, then determine i (X) by applying the condition which sums up the essence of the
competition process among the neurons.
i (X) = arg minj || X − Wj ||, j = 1, 2, . . . , l . . . (2.14)
According to the above equation, i (X) is the subject of attention as the requirement is to
find the neuron i. The particular neuron i that satisfies this condition is called the
best-matching or winning neuron for the input vector X. Depending upon the application
of interest, the response of the network could be either the index of the winning neuron,
or the synaptic weight vector that is closest to the input vector in a Euclidean sense.
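Eq. (2.14) translates directly into code. A minimal sketch, with sample values chosen only for illustration :

import math

def winning_neuron(x, weights):
    # Eq. (2.14) : i(X) = arg min over j of || X - Wj ||
    def distance(w):
        return math.sqrt(sum((xi - wi) ** 2 for xi, wi in zip(x, w)))
    return min(range(len(weights)), key=lambda j: distance(weights[j]))

i = winning_neuron([0.2, 0.9], [[0.1, 0.8], [0.7, 0.3], [0.5, 0.5]])  # i = 0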
2.4.2 Cooperative Process
The winning neuron locates the center of a topological neighborhood of cooperating
neurons. In practice, a neuron that is firing tends to excite the neurons in its immediate
neighborhood more than those farther away from it, which is intuitively satisfying. This
observation leads us to make the topological neighborhood around the winning neuron i,
decay smoothly with lateral distance. To be specific, let hj,i denote the topological
neighborhood centered on winning neuron i, and encompassing a set of excited neurons,
a typical one of which is denoted by j. Let dj,i denote the lateral distance between winning
neuron i and excited neuron j. Then an assumption can be made that the topological
neighborhood hj,i is a unimodal function of the lateral distance dj,i, such that it satisfies
two distinct requirements :
• The topological neighborhood hj,i is symmetric about the maximum point
defined by dj,i = 0; in other words it attains its maximum value at the
winning neuron i, for which the distance dj,i is zero.
• The amplitude of the topological neighborhood hj,i decreases monotonically
with the increasing lateral distance dj,i, decaying to zero for dj,i → ∞; this is a
necessary condition for convergence.
A typical choice of hj,i that satisfies these requirements is the Gaussian function
hj,i (X) = exp ( − d²j,i / 2σ² ) . . . (2.15)
which is translation invariant. The parameter σ is the “effective width” of the
topological neighborhood; it measures the degree to which excited neurons in the vicinity
of the winning neuron participate in the learning process. It also makes the SOM
algorithm converge more quickly than a rectangular topological neighborhood would.
For cooperation among neighboring neurons to hold, it is necessary that the topological
neighborhood be dependent on the lateral distance between winning neuron i and excited
neuron j in the output space rather than on some distance measure in the original input
space. This is precisely what we have in Eq. (2.15). In the case of a one dimensional lattice, dj,i
is an integer equal to | i – j |. On the other hand, in the case of a two dimensional lattice it is
defined by
d²j,i = || rj − ri ||² . . . (2.16)
where rj defines the discrete position of excited neuron j and ri defines the discrete
position of winning neuron i, both of which are measured in the discrete output space.
Another unique feature of the SOM algorithm is that the size of the topological
neighborhood shrinks with time. This requirement is satisfied by making the width σ of
the topological neighborhood function hi,j decrease with time. A popular choice of the
dependence of σ on discrete time n is the exponential decay.
σ (n) = σ0 exp ( − n / τ1 ) . . . (2.17)
where σ0 is the value of σ at the initiation of the SOM algorithm, and τ1 is a time
constant.
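Eqs. (2.15) and (2.17) are easily expressed in code; the values of σ0 and τ1 below are placeholders for this sketch :

import math

def neighborhood(d2, sigma):
    # Eq. (2.15) : h = exp(-d^2 / (2 sigma^2))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def width(n, sigma0=1.0, tau1=1000.0):
    # Eq. (2.17) : sigma(n) = sigma0 * exp(-n / tau1)
    return sigma0 * math.exp(-n / tau1)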
2.4.3 Adaptive Process
The last process is the synaptic adaptation process in the self-organised formation of a
feature map. For the network to be self-organising, the synaptic weight vector Wj of
neuron j in the network is required to change in relation to the input vector X.
The question is how to make the change. In Hebb’s postulate of learning, a synaptic
weight is increased with a simultaneous occurrence of presynaptic and postsynaptic
activities. The use of such a rule is well suited for associative learning. For the type of
unsupervised learning being considered here, however, the Hebbian hypothesis in its
basic form is unsatisfactory for the following reason : changes in connectivities occur in
one direction only, which finally drives all the synaptic weights into saturation. To
overcome this problem, the Hebbian hypothesis is modified by including a forgetting
term – g (yj) Wj, where Wj is the synaptic weight vector of neuron j and g (yj) is some
positive scalar function of the response yj. The only requirement imposed on the function
g (yj) is that the constant term in the Taylor series expansion of g (yj) be zero, so that
g (yj) = 0 for yj = 0 . . . (2.18)
The significance of this requirement will become apparent momentarily. Given such a
function, the change to the weight vector of neuron j in the lattice can be found as
follows :
ΔWj = η yj X − g (yj) Wj . . . (2.19)
where, η is the learning rate parameter of the algorithm. The first term on the right hand side
of Eq. (2.19) is the Hebbian term and the second term is the forgetting term. To satisfy
the requirement of Eq. (2.18), a linear function for g (yj) is chosen, as shown by
g (yj ) = η yj . . . (2.20)
Finally, using discrete-time formalism, given the synaptic weight vector Wj (n) of neuron
j at time n, the updated weight vector Wj (n + 1) at time n + 1 is defined by :
Wj (n + 1) = Wj (n) + η (n) hj,i(X) (n) ( X − Wj (n) ) . . . (2.23)
which is applied to all neurons in the lattice that lie inside the topological neighborhood
of winning neuron i. Eq. (2.23) has the effect of moving the synaptic weight vector Wi of
winning neuron i towards the input vector X. The algorithm therefore leads to a
topological ordering of the feature map in the input space in the sense that neurons that
are adjacent in the lattice will tend to have similar synaptic weight vectors. Eq. (2.23) is
the desired formula for computing the synaptic weights of the feature map.
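Putting the three processes together, one discrete-time step of Eq. (2.23) might be sketched as below, reusing winning_neuron, neighborhood and width from the earlier sketches; a fixed η is assumed here, although in practice the learning rate η(n) would also decay with time :

def som_update(x, weights, positions, n, eta=0.1):
    # Eq. (2.23) : Wj(n + 1) = Wj(n) + eta(n) h_{j,i(X)}(n) (X - Wj(n))
    i = winning_neuron(x, weights)      # competition, Eq. (2.14)
    sigma = width(n)                    # shrinking width, Eq. (2.17)
    for j in range(len(weights)):
        # squared lattice distance || rj - ri ||^2, Eq. (2.16)
        d2 = sum((a - b) ** 2 for a, b in zip(positions[j], positions[i]))
        h = neighborhood(d2, sigma)     # cooperation, Eq. (2.15)
        weights[j] = [wk + eta * h * (xk - wk)
                      for wk, xk in zip(weights[j], x)]
    return weights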
SAQ 3
(a) What is self-organising map (SOM)?
(b) What is cooperative process in SOM?
(c) What is adaptive process in SOM?
Figure : Network Architecture for Operation Selection (an input layer of feature and attribute values, connections to a hidden layer, and an output layer for operation selection)
Note that output units are assigned to single operations rather than sequences. To learn
sequencing constraints, it is necessary to provide encoding of operation position within a
sequence. Encoding may be accomplished directly by assigning one output unit to each
possible sequence or, for each operation, to each possible position of that operation in a
sequence. Neither of these encodings accurately reflects the approach of the expert
process planner, however, and both require a large number of output units, many of
which are infrequently used. An expert process planner builds a sequence of operations
for each feature, with each operation being added depending on a specific, but not
necessarily identical set of attributes. In order to select operations individually, yet
maintain correct sequencing, the outputs of the network may be fed back as inputs, thus
serving as a context for a decision on the next operation in the sequence. Recurrence
ends when a null output is obtained; that is, all output units have activations below the
threshold value, signaling the end of the sequence. By using recurrence, output units are
efficiently utilised, with little sacrifice in the final performance speed of the network. A
sketch of this recurrence is given below.
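In this sketch, net stands for one forward pass of the trained network with outputs already thresholded as in Eq. (2.6); all names and the maximum sequence length are hypothetical :

def plan_sequence(feature_inputs, net, n_ops, max_len=10):
    # feed each selected operation back as a context input until a
    # null output (no unit above threshold) ends the sequence
    context = [0.0] * n_ops
    sequence = []
    for _ in range(max_len):
        outputs = net(feature_inputs + context)
        if max(outputs) == 0:           # null output : plan complete
            break
        op = outputs.index(max(outputs))
        sequence.append(op)
        context = [1.0 if k == op else 0.0 for k in range(n_ops)]
    return sequence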
2.5.2 Training
Training of neural networks may be distinguished by whether training patterns are
presented incrementally or in batches. In incremental training, weights are updated after
the presentation of each training input-output pair. An error measure, the total pattern sum
of squares error (PSS), is calculated for each iteration as
PSS = ∑j (Tj − Aj)² . . . (2.24)
where o is the number of output units, and Tj and Aj are the training value and activation,
respectively, for output node j.
The training pattern is repeatedly presented until a tolerable error level is achieved,
determined by
E ≥ TSS . . . (2.25)
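Eqs. (2.24) and (2.25) transcribe directly into code; the names are assumed, and the stopping test is shown against a single error value for brevity :

def pattern_sum_of_squares(targets, activations):
    # Eq. (2.24) : PSS = sum over the o output units of (Tj - Aj)^2
    return sum((t - a) ** 2 for t, a in zip(targets, activations))

def tolerable(error, E=0.7):
    # stopping test in the spirit of Eq. (2.25) : E >= error
    return E >= error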
Table 2.1 : Feature Attributes
Hole : depth, diameter, size tolerance, position tolerance, circularity, straightness, surface finish
Face : diameter, thickness, size tolerance, parallelism, surface finish
Keyway : width, depth, inset, size tolerance, position tolerance, surface finish
Gear : diametral pitch, error in action, surface finish
In order to accurately evaluate the performance of the network, the process plans were
generated artificially using rules. These rules then constituted the domain transformation
functions to be learned by the network. Figure 2.7 shows some rules used to generate the
operation sequences for hole features. Note that these rules are non-trivial and, in fact,
require evaluation of relationships between attributes. Figure 2.8 demonstrates some
example process plans generated.
Table 2.2 defines the machining operations known to a network. A number of these
operations, such as milling and honing, are used in the manufacture of more than one
feature type.
If (depth/diameter ratio) >=3 then
If (diameter > 2) then
“center drill”
“trepan”
else
“gundrill”
end if
else
if ((diameter < 0.75) and
((size tolerance <= 0.003) or
(position tolerance <= 0.005)))
then
“center drill”
“twist drill”
else
“twist drill”
end if
if (straightness <= 0.001)
then
“ream”
end if
if (circularity <=0.001)
then
“counter-bore”
end if
if (surface finish <=16)
then
“hone”
end if
Figure 2.7 : Some Rules Used to Generate Process Plans for Hole Features
A total of 19 attributes and 15 machining operations were defined for the four feature
types (note that three of the attributes correspond to the same dimension – hole depth, face
thickness, keyway depth – and can be considered a single attribute). The input layer
consists of 38 units : 4 units corresponding to the four feature types, 19 units
corresponding to the attributes, and 15 units corresponding to the recurrent feedback
inputs. The output layer consists of 15 units, each corresponding to a particular machine
operation. Since the domain rules are known in advance, it is possible to determine an
approximate lower bound on the number of hidden units needed.
<hole>
attributes :
diameter = 1.1011
depth = 8.2909
size tolerance = +/-0.0105
position tolerance = +/-0.0117
straightness tolerance = +/-0.0245
circularity tolerance = +/-0.0033
surface finish = 32
process sequence :
gundrill
<keyway>
attributes :
width = 0.2434
depth = 7.8141
inset = 0.8812
size tolerance = +/-0.0188
position tolerance = +/-0.0053
surface finish = 16
process sequence :
broach
finish broach
<face>
attributes :
diameter = 20.1368
thickness = 8.2909
size tolerance = +/-0.0008
parallelism tolerance = +/- 0.0081
surface finish = 63
process sequence :
mill
finish mill
grind
<gear>
attributes :
diametral pitch = 5.6169
error in action = +/- 0.0026
surface finish = 8
process sequence :
hob
finishing hob
gear shaving
Figure 2.8 : Examples of Generated Process Plans
The number of regions required to be uniquely identified for each feature is found by
taking the number of rule conditions times the maximum operation sequence length for
that feature. The number of regions was then summed over all four features, yielding a
total of 120 regions. An approximate lower bound can then be found by determining H
satisfying the inequality
M ≤ ∑ C (H, k), k = 0, 1, . . . , n . . . (2.27)
where C (H, k) is the binomial coefficient “H choose k”.
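As a quick numerical check of Eq. (2.27), treating n as the dimension of the input space (an assumption of this sketch) :

from math import comb

def max_regions(H, n):
    # Eq. (2.27) : regions representable with H hidden units
    return sum(comb(H, k) for k in range(min(H, n) + 1))

M = 120
H = next(h for h in range(1, 50) if max_regions(h, n=38) >= M)
# H = 7, since the sum of C(7, k) is 2**7 = 128 >= 120, while 2**6 = 64 falls short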
For M = 120 regions, the inequality was minimally satisfied for H = 7 nodes. The bound
proved fairly accurate; 8 nodes were required during the simulations in order to achieve
convergence during training. The training set contained 30 examples. The network was
trained in a number of different modes in order to characterize the performance
differences between incremental and batch learning. The network was trained in four
modes: a pure incremental mode, in which the network was trained on each pattern
separately; a combined approach, in which the network was trained in batches of 10 new
examples (3 batches); a second combined approach in which the batch size was again 10,
but 5 of the examples were randomly chosen from previously learned batches
(6 batches), with the aim of reducing forgetting effects; and a pure batch mode, in which
all 30 examples are presented simultaneously (1 batch). In addition the network was
trained, in batch mode, with 75 example training sets in order to examine the effect of
increasing training set size on performance. For all simulations, the error criterion, E, was
equal to 0.7, and the threshold, t, was also equal to 0.7.
2.5.4 Results
After training, additional gear process plans were generated for testing the network’s
trained performance. The trained network was tested separately with both the training
and testing example sets. For each training session the following information was
collected :
• Example set used (training and testing).
• Batch size.
• Overlap size.
• Number of Iterations per Batch (Training Only) : for incremental training,
the total number of iterations over all batches.
• Iterations per Minute (Training Only) : This information is included to
give an idea of simulation speeds on a sequential computer and is not
indicative of the speed of an actual neural network implementation.
• Percentage of Essentially Correct Plans Missing Operations
(COR/MISS) : The percentage of feasible plans that constituted a subset of
the correct plan but were missing necessary finishing operations.
• Percentage of Essentially Correct Plans with Extra Operations
(COR/EXTRA) : The percentage of feasible plans that contain the correct
plan as a subset but include unnecessary finishing operations.
• Percentage of Incorrect but Feasible Plans (INC/FEAS) : The percentage
of plans generated by the network which are feasible for the particular
feature but not applicable for the given feature instance.
• Percentage of Infeasible Plans (INFEAS) : The percentage of plans
generated that do not represent a possible plan, either due to inclusion of an
incorrect operation or to improper operation ordering.
• Percentage of Correct Process Plans (COR) : The percentage of generated
process plans containing no errors.
This information is summarised in Table 2.3. The percentages do not total exactly
100% because some plans had both extra and missing operations and thus were counted
twice. The results indicate a consistent decrease in the total convergence time required to
learn a fixed number of examples with increasing batch size, suggesting that the batch
approach is more efficient than the incremental approach from a training perspective. The
number of training iterations per batch increases considerably, however, with increased
batch size (note the difference in convergence times between batch sizes 30 and 75). This
can be attributed to the need for the system to consider more constraints simultaneously
for larger batch sizes.
Table 2.3 : Network Performance
Batch Size | Overlap Size | Example Set | Convergence (Iter) | Speed | COR/MIS (%) | COR/EXT (%) | INC/FEA (%) | INFEAS (%) | COR (%)
1 | 0 | Training | 7500 | 745 | 38.5 | 11.7 | 7.5 | – | 49.2
1 | 0 | Testing | – | – | 47 | 9.1 | 1.4 | 0.8 | 41.6
Because smaller batch sizes do not consider all constraints simultaneously, forgetting
effects can occur : learning new examples occurs at the expense of previously learned
examples. That this had occurred is evidenced by examining the performance of the
network on the example set used to train it. Note that for the single batch case,
performance was 100% correct for the training example set, whereas performance
dropped to 49.2% for the strictly incremental approach (batch size 1). Increasing batch
size yielded better performance, but the simulations required significantly more CPU
time.
The forgetting effects can be negated to some extent by including previously learned
examples in new example batches as context. Note that with an overlap of 5 examples,
and 5 new examples, performance (total number of completely correct plans) improves
over the case of batch size 10 with no overlap.
The majority of the planning errors obtained were not simply random errors but instead
were inclusions or exclusions of finishing operations. Because examples are randomly
drawn from the population, with each example representing a particular instance of the
mapping function, the example set only provides an approximate model of the actual
transformation function. Thus the quality of the internal model learned by the network is
directly related to the sample size. Note the increase in performance for the 75 example
case over the 30 example case.
Strictly infeasible plans were rare. In almost all cases, the infeasibility resulted from an
inappropriate operation occurring at the end of a plan sequence, generally with much
weaker activation than the correct operations within the plan. In all cases the occurrence
of the errors could be eliminated by raising the threshold. In addition, using larger batch
sizes decreased the occurrence of these errors.
For a constant threshold, increasing the batch size led to an increase in both the use and
misapplication of operations that were rare in the training example set, such as hone. This
can be seen in the increase of COR/EXTRA for batch size 10 over batch size 1. However,
with larger batch sizes, COR/EXTRA again began to decrease. For the small batch size, it
appeared that the network was only able to learn those operations that were core to
almost all process plans. Larger batch sizes generated these operations but, for batch size
10, frequently misapplied them.
An important issue in neural networks is training convergence rate. The convergence rate
is affected by a number of factors, including learning and momentum rates, and problem
complexity. In general, with all other factors held constant, increasing either the
momentum or learning rates resulted in a decrease in convergence time. However, above
a certain level, instability will occur, with the network eventually settling into a poor local
error minimum. Another important issue for neural networks is size complexity. For the
given problem representation, the number of input units is directly equal to the number of
features plus attributes and thus grows linearly with the size of the input space. Likewise, the
number of output nodes grows linearly with the number of machining operations to be
represented. The number of hidden nodes required is primarily a function of the problem
complexity, in terms of the number of decision regions required.
SAQ 4
(a) What is process planning?
(b) How is the training of neural networks done?
2.6 SUMMARY
The basics of neural networks and their implementation in the automatic acquisition of
process planning knowledge have been introduced. This approach overcomes the time
complexity associated with the earlier attempts using machine learning techniques. The
example demonstrated here shows the potential of the approach for use on real world
problems. The neural network approach uses a single methodology for generating useful
inferences, rather than using explicit generalization rules. Because the network only
generates inferences as needed for a problem, there is no need to generate and store all
possible inferences ahead of time.
2.7 KEY WORDS
Neural Networks : A neural network is a massively parallel
distributed processor made up of simple
processing units, which has a natural propensity
for storing experiential knowledge and making it
available for use.
Back Propagation : The most widely used learning mechanism for
multi-layered perceptrons.
SOM : Self-organizing maps are a special class of artificial
neural networks. These networks are based on
competitive learning; the output neurons of the
network compete among themselves to be
activated or fired, with the result that only one
output neuron, or one neuron per group, is on at
any one time.