Multilayer Perceptron Networks
Agenda
Backpropagation
Speed ups
Conclusions
Some historical notes
Rosenblatt's perceptron (1958): a single neuron with adjustable synaptic weights and bias.
Perceptron convergence theorem (Rosenblatt, 1962): if the patterns used to train the perceptron are drawn from two linearly separable classes, then the algorithm converges and positions the decision surface in the form of a hyperplane between the two classes.
The limitations of networks implementing linear
discriminants were well known in the 1950s and
1960s.
Some historical notes
A single neuron -> the class of solutions that can be obtained is not general enough -> the multilayer perceptron.
The LMS algorithm exists for linear systems.
Training error
Learning rule
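The formulas for these two slides did not survive extraction. A common choice (an assumption here, using standard notation with target values t_k and network outputs z_k) is the sum-of-squared-error criterion trained by gradient descent:

J(w) = \tfrac{1}{2} \sum_{k} (t_k - z_k)^2, \qquad \Delta w = -\eta \, \frac{\partial J}{\partial w}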
Backpropagation Algorithm
Learning rule
Hidden-to-output
Input-to-hidden
Note that the weights wij are initialized with random values.
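The slide's update equations were lost; a hedged reconstruction of the standard backpropagation rules (with learning rate \eta, transfer function f, inputs x_i, hidden outputs y_j, network outputs z_k and targets t_k) is:

Hidden-to-output:  \Delta w_{kj} = \eta \, \delta_k \, y_j, \qquad \delta_k = (t_k - z_k) \, f'(net_k)
Input-to-hidden:   \Delta w_{ji} = \eta \, \delta_j \, x_i, \qquad \delta_j = f'(net_j) \sum_{k} w_{kj} \, \delta_k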
Demo Chapter 11
Backpropagation Algorithm
Compare with the LMS algorithm.
1) Method of Steepest Descent
The direction of steepest descent is opposite to the gradient vector g = ∇E(w):
w(n+1) = w(n) − η g(n),
where η is the step size or learning-rate parameter.
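A minimal sketch of one steepest-descent update in Python (the quadratic example E(w) = ||w||^2 / 2 is only an illustrative assumption, not the network criterion):

import numpy as np

def steepest_descent_step(w, grad_E, eta=0.1):
    """One update w(n+1) = w(n) - eta * g(n), with g(n) = grad E(w(n))."""
    return w - eta * grad_E(w)

# Illustrative example: E(w) = ||w||^2 / 2, so grad E(w) = w.
w = np.array([2.0, -1.0])
for _ in range(20):
    w = steepest_descent_step(w, grad_E=lambda w: w, eta=0.5)
print(w)  # approaches the minimum at the origin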
Backpropagation Algorithm
Training set = a set of patterns with known
labels
Stochastic training = patterns are chosen
randomly from the training set
Batch training = all patterns are presented to
the network before learning takes place
On-line protocol = each pattern is presented
once and only once, no memory in use
Epoch = a single presentation of all patterns in
the training set. The number of epochs is an
indication of the relative amount of learning.
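A minimal sketch contrasting the stochastic and batch protocols in Python (grad_on_pattern, X and T are hypothetical placeholders for the per-pattern gradient and the training data, not names from the slides):

import numpy as np

def stochastic_epoch(w, X, T, grad_on_pattern, eta=0.01, rng=None):
    """Stochastic training: update the weights after each randomly chosen pattern."""
    rng = np.random.default_rng() if rng is None else rng
    for i in rng.permutation(len(X)):
        w = w - eta * grad_on_pattern(w, X[i], T[i])
    return w

def batch_epoch(w, X, T, grad_on_pattern, eta=0.01):
    """Batch training: present all patterns, accumulate the gradient, then update once."""
    g = sum(grad_on_pattern(w, x, t) for x, t in zip(X, T))
    return w - eta * g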
Backpropagation Algorithm
Stopping criterion
Backpropagation Algorithm
Learning set
Validation set
Test set
Stopping criterion
Learning curve = the average error per pattern
Cross-validation
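A minimal sketch of the validation-based stopping criterion in Python (train_one_epoch and average_error are hypothetical helpers assumed to exist):

def train_with_early_stopping(w, train_set, val_set,
                              train_one_epoch, average_error,
                              patience=5, max_epochs=1000):
    """Stop when the average validation error per pattern stops improving."""
    best_err, best_w, bad_epochs = float("inf"), w, 0
    for epoch in range(max_epochs):
        w = train_one_epoch(w, train_set)      # one presentation of all training patterns
        err = average_error(w, val_set)        # one point on the validation learning curve
        if err < best_err:
            best_err, best_w, bad_epochs = err, w, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:         # validation error no longer decreasing
                break
    return best_w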
Error Surfaces and Feature
Mapping
Note that error backpropagation is based on gradient descent in a criterion function J(w), which can be represented as an error surface over the weights.
Error Surfaces and Feature
Mapping
The total training error is minimized. It usually decreases monotonically, even though this is not the case for the error on each individual pattern.
Error Surfaces and Feature
Mapping
Hidden-to-output weights ~ a linear discriminant
Input-to-hidden weights ~ a "matched filter"
Practical Techniques for
Improving Backpropagation
How to improve convergence, performance, and results?
Neuron: the sigmoid transfer function should be centered at zero and antisymmetric.
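A sketch of such an antisymmetric, zero-centered sigmoid in Python (the constants a = 1.716 and b = 2/3 are values commonly recommended in the textbook literature, stated here as an assumption):

import numpy as np

def sigmoid(net, a=1.716, b=2.0 / 3.0):
    """Antisymmetric sigmoid f(net) = a * tanh(b * net): f(0) = 0 and f(-x) = -f(x)."""
    return a * np.tanh(b * net)

def sigmoid_prime(net, a=1.716, b=2.0 / 3.0):
    """Derivative f'(net) = a * b / cosh(b * net)^2, needed by backpropagation."""
    return a * b / np.cosh(b * net) ** 2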
Practical Techniques for
Improving Backpropagation
Scaling input variables
= the input patterns should be shifted so
that the average over the training set of
each feature is zero.
= the full data set should be scaled to have
the same variance in each feature
component
Note that this standardization can only be done for stochastic and batch learning!
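A minimal sketch of this standardization in Python (statistics computed from the training set only, then applied to any pattern):

import numpy as np

def fit_standardizer(X_train):
    """Per-feature mean and standard deviation over the training set."""
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0)
    sigma[sigma == 0] = 1.0        # guard against constant features
    return mu, sigma

def standardize(X, mu, sigma):
    """Shift each feature to zero mean and scale it to unit variance."""
    return (X - mu) / sigma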
Practical Techniques for
Improving Backpropagation
When the training set is small one can generate
surrogate training patterns.
In the absence of problem-specific information, the surrogate patterns should be made by adding d-dimensional Gaussian noise to true training points; the category label should be left unchanged (see the sketch below).
If we know about the source of variation among
patterns we can manufacture training data.
The number of hidden units should be less than
the total number of training points n, say roughly
n/10.
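A minimal sketch of the Gaussian-noise surrogate generation mentioned above, in Python (the noise standard deviation and the number of copies are assumed free parameters):

import numpy as np

def make_surrogates(X, labels, n_copies=5, noise_std=0.1, rng=None):
    """Add d-dimensional Gaussian noise to the true training points;
    the category labels are left unchanged."""
    rng = np.random.default_rng() if rng is None else rng
    X_new, y_new = [X], [labels]
    for _ in range(n_copies):
        X_new.append(X + rng.normal(scale=noise_std, size=X.shape))
        y_new.append(labels)
    return np.vstack(X_new), np.concatenate(y_new)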
Practical Techniques for
Improving Backpropagation
We cannot initialize the weights to 0.
Initializing weights = uniform learning -> choose weights randomly from a single distribution.
Input-to-hidden weights: −1/√d < wij < +1/√d, where d is the number of input units.
Hidden-to-output weights: −1/√nH < wkj < +1/√nH, where nH is the number of hidden units.
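A sketch of uniform weight initialization in Python, assuming the ±1/√d and ±1/√nH ranges reconstructed above (the bias columns are an added convention, not from the slides):

import numpy as np

def init_weights(d, n_hidden, c, rng=None):
    """Input-to-hidden weights uniform in (-1/sqrt(d), +1/sqrt(d)),
    hidden-to-output weights uniform in (-1/sqrt(n_hidden), +1/sqrt(n_hidden))."""
    rng = np.random.default_rng() if rng is None else rng
    w_ji = rng.uniform(-1 / np.sqrt(d), 1 / np.sqrt(d), size=(n_hidden, d + 1))               # +1 for bias
    w_kj = rng.uniform(-1 / np.sqrt(n_hidden), 1 / np.sqrt(n_hidden), size=(c, n_hidden + 1))
    return w_ji, w_kj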
Practical Techniques for
Improving Backpropagation
Learning Rates
Demo Chapter 9
The optimal rate
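The formula for the optimal rate was lost in extraction; for a locally quadratic criterion the standard result (an assumption that this is what the slide showed) is that the optimal rate is the inverse curvature, with roughly twice that value as the stability limit:

\eta_{opt} = \left( \frac{\partial^2 J}{\partial w^2} \right)^{-1}, \qquad \eta < 2 \, \eta_{opt}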
Practical Techniques for
Improving Backpropagation
Momentum: allows the network to learn more quickly when plateaus in the error surface exist.
Demo Chapter 12.
A typical value of the momentum parameter is about 0.9.
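A minimal sketch of a gradient step with momentum in Python (the update form Δw(n) = −η g(n) + α Δw(n−1) with α ≈ 0.9 is the common convention, stated as an assumption):

def momentum_step(w, delta_prev, grad, eta=0.01, alpha=0.9):
    """Delta_w(n) = -eta * g(n) + alpha * Delta_w(n-1); alpha ~ 0.9 is a typical value.
    The accumulated momentum carries the weights across plateaus in the error surface."""
    delta = -eta * grad(w) + alpha * delta_prev
    return w + delta, delta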
Practical Techniques for
Improving Backpropagation
Weight decay to avoid overfitting = start with a network with too many weights and decay all the weights during training: wnew = wold(1 − ε), where 0 < ε < 1.
The weights that are not needed for reducing the error function become smaller and smaller -> they are eliminated.
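A minimal sketch of the decay step applied after each weight update in Python (epsilon is the small decay constant from the formula above):

def decay_weights(w, epsilon=1e-4):
    """w_new = w_old * (1 - epsilon), with 0 < epsilon < 1: weights that the error
    gradient does not keep reinforcing shrink toward zero and are effectively eliminated."""
    return w * (1.0 - epsilon)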
Practical Techniques for
Improving Backpropagation
If we have insufficient training data for the desired classification accuracy:
Learning with hints is to add output units for addressing an ancillary problem, one different from but related to the specific classification problem at hand.
Practical Techniques for
Improving Backpropagation
Stopped Training
Number of hidden layers: typically 2-3.