

Neural Networks - Adaline

L. Manevitz


Plan of Lecture
Perceptron: connected and convex examples
Adaline: square error; gradient
Calculate for AND, XOR
Discuss limitations
LMS algorithm: derivation

What are the best weights?


Most examples classified correctly?
Least square error?

Least square error, measured before the cut-off!


To minimize: Errsq = S_k ( d(k) - S_i w_i x_i(k) )**2
(first sum over examples; second over dimensions)
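A small numerical sketch of this quantity (array shapes and names are illustrative, not from the slides):

```python
import numpy as np

# Total squared error of a linear (Adaline) unit over all examples,
# measured before the threshold/cut-off is applied.
def sum_squared_error(W, X, d):
    """W: weight vector, X: (n_examples, n_inputs), d: desired outputs."""
    outputs = X @ W                      # S_i w_i x_i(k) for each example k
    return np.sum((d - outputs) ** 2)    # S_k (d(k) - S_i w_i x_i(k))**2
```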


Least Square Minimization


Find the gradient of the error over all examples. Either calculate the minimum directly or move opposite to the gradient.

Widrow-Hoff (LMS): use the instantaneous example as an approximation to the gradient.


Advantages: no memory; on-line; serves a similar function to noise in avoiding local problems. Adjust by w(new) = w(old) + a·d·x for each example x, where d = (desired output - S w x).

LMS Derivation
Errsq = S_k ( d(k) - W·x(k) )**2
Grad(Errsq) = 2 S_k ( d(k) - W·x(k) ) (-x(k))
W(new) = W(old) - m·Grad(Errsq)
To ease calculations, use the single-example Err(k) in place of Errsq.

W(new) = W(old) + 2m·Err(k)·x(k)


Continue with next choice of k
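A tiny worked update with illustrative numbers: take W(old) = (0, 0), x(k) = (1, 1), d(k) = 1, and m = 0.1. Then Err(k) = d(k) - W·x(k) = 1, so W(new) = (0, 0) + 2·0.1·1·(1, 1) = (0.2, 0.2).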

Applications
Adaline has better convergence properties than the Perceptron

Useful in noise correction


There is an Adaline in every modem.


LMS (Least Mean Square Alg.)


1. Apply the input to the Adaline.
2. Find the square error for the current input:
   Errsq(k) = (d(k) - W·x(k))**2
3. Approximate Grad(Errsq): differentiate Errsq, approximating the average Errsq by the single-example Errsq(k), to obtain -2·Err(k)·x(k), where Err(k) = d(k) - W·x(k).
4. Update W: W(new) = W(old) + 2m·Err(k)·x(k)
5. Repeat steps 1 to 4.
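A minimal runnable sketch of steps 1-5 (the learning rate, epoch count, and AND-style data are illustrative assumptions, not from the slides):

```python
import numpy as np

def lms_train(X, d, m=0.05, epochs=100):
    """Widrow-Hoff / LMS training of a single Adaline unit.
    X: (n_examples, n_inputs) with a leading bias column; d: targets."""
    W = np.zeros(X.shape[1])
    for _ in range(epochs):
        for k in range(len(X)):
            err = d[k] - W @ X[k]          # Err(k) = d(k) - W·x(k)
            W = W + 2 * m * err * X[k]     # W(new) = W(old) + 2m·Err(k)·x(k)
    return W

# Learn AND with a bias input and +/-1 targets
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
d = np.array([-1, -1, -1, 1], dtype=float)
W = lms_train(X, d)
print(np.sign(X @ W))   # after training, the cut-off gives [-1. -1. -1.  1.]
```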


Comparison with Perceptron


Both use an updating rule that changes with each input.
One fixes a binary error; the other minimizes a continuous error.
Adaline always converges; see what happens with XOR (worked out below).

Both can REPRESENT Linearly separable functions
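A worked XOR case (assuming 0/1 inputs, a bias term, and ±1 targets): the normal equations XᵀX·w = Xᵀd have the unique solution w = 0, because Xᵀd = 0. So LMS converges, but toward the zero weight vector, whose output is 0 on every input; the squared error settles at 4 and no thresholded output matches the XOR targets. Adaline converges on XOR, it just cannot represent it.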



Convergence Phenomenon
LMS converges (or not) depending on the choice of m.

How to choose it?
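A standard guideline (not on the slides): with the update W(new) = W(old) + 2m·Err(k)·x(k), convergence in the mean requires 0 < 2m < 2/λ_max, i.e. m < 1/λ_max, where λ_max is the largest eigenvalue of the input correlation matrix R = E[x·xᵀ]; in practice a more conservative choice such as 2m < 2/trace(R) (the total input power) is common.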


Limitations
Limited to linearly separable problems. How can we get around this?
Use a network of neurons?
Use a transformation of the data so that it becomes linearly separable (example below)?
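One concrete transformation (the added product feature is an illustrative choice): with the extra input x1·x2, XOR becomes linearly separable, and even a least-squares linear fit classifies it perfectly.

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
d = np.array([-1, 1, 1, -1], dtype=float)          # XOR with +/-1 targets

# Augment with a bias and the product feature x1*x2
X_aug = np.column_stack([np.ones(4), X, X[:, 0] * X[:, 1]])
W, *_ = np.linalg.lstsq(X_aug, d, rcond=None)      # least-squares weights
print(np.sign(X_aug @ W))                          # [-1.  1.  1. -1.]
```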


Multi-level Neural Networks


Representability
Arbitrarily complicated decisions
Continuous approximation: arbitrary continuous functions (and more) (Cybenko's theorem)

Learnability
Change the McCulloch-Pitts neurons to sigmoid neurons, etc.

Derive backprop using the chain rule (like the LMS derivation).
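A sketch of that chain-rule computation for one hidden layer and a single output unit (notation assumed to match the network pictured on the last slide: hidden weights $w_{ji}$, output weights $v_j$, differentiable activation $F$):

$$y = F\Big(\sum_j v_j h_j\Big), \qquad h_j = F\Big(\sum_i w_{ji} x_i\Big), \qquad E = (d - y)^2$$

$$\frac{\partial E}{\partial v_j} = -2\,(d-y)\,F'\Big(\sum_j v_j h_j\Big)\,h_j, \qquad
\frac{\partial E}{\partial w_{ji}} = -2\,(d-y)\,F'\Big(\sum_j v_j h_j\Big)\,v_j\,F'\Big(\sum_i w_{ji} x_i\Big)\,x_i$$

Gradient descent on these partials gives the backprop updates, just as the LMS rule came from the single-unit gradient.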


Replacement of Threshold Neurons with Sigmoid or Differentiable Neurons

[Figure: threshold (step) activation vs. sigmoid activation]
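The usual differentiable choice here (an assumption the figure suggests but does not spell out) is the logistic sigmoid F(z) = 1/(1 + e^(-z)), whose derivative F'(z) = F(z)(1 - F(z)) keeps the chain-rule factors above cheap to compute.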

Prediction
[Figure: prediction setup with a delay element, an input/output NN, and a compare step]


Sample Feedforward Network (No Loops)


[Figure: feedforward network: input layer, hidden layers, and output layer connected by weights Wji and Vik; each unit computes F(S_j wji xj)]
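A minimal forward-pass sketch of such a network (layer sizes, the random weights, and the logistic F are illustrative assumptions; W plays the role of the wji weights, V of the vik weights):

```python
import numpy as np

def F(z):
    return 1.0 / (1.0 + np.exp(-z))   # sigmoid activation

def forward(x, W, V):
    h = F(W @ x)    # hidden layer: each unit computes F(S_j wji xj)
    y = F(V @ h)    # output layer: each unit computes F(S_i vki hi)
    return y

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 2))   # 2 inputs  -> 3 hidden units
V = rng.normal(size=(1, 3))   # 3 hidden  -> 1 output unit
print(forward(np.array([0.5, -1.0]), W, V))
```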

