4.2 Artificial Neural Networks (ANN)
Machine Learning CS-603
Instructor
Dr. Sanjay Chatterji
Introduction
● Sigmoid Unit
Incremental (Stochastic) Gradient Descent
● Incremental Gradient Descent can approximate Batch
Gradient Descent arbitrarily closely if η is small
enough.
● Batch mode:
− w ← w − η ∇E_D[w], computed over the entire training data D
− E_D[w] = 1/2 Σ_{d∈D} (t_d − o_d)²
● Incremental mode (contrasted with batch mode in the sketch below):
− w ← w − η ∇E_d[w], computed for each individual training example d
− E_d[w] = 1/2 (t_d − o_d)²
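For concreteness, here is a minimal Python sketch (not from the slides) of one batch-mode step versus one incremental-mode pass for a linear unit o = w · x; the function names, learning rate, and toy data are illustrative assumptions.

import numpy as np

def batch_step(w, X, t, eta):
    # One batch-mode step: accumulate the gradient of E_D[w] over all of D,
    # then apply a single update  w <- w - eta * grad E_D[w].
    o = X @ w                    # linear-unit outputs for every example in D
    grad = -(t - o) @ X          # dE_D/dw = -sum_d (t_d - o_d) x_d
    return w - eta * grad

def incremental_pass(w, X, t, eta):
    # One incremental (stochastic) pass: update w after every single example,
    # using the gradient of E_d[w] for that example alone.
    for x_d, t_d in zip(X, t):
        o_d = x_d @ w
        w = w + eta * (t_d - o_d) * x_d    # w <- w - eta * grad E_d[w]
    return w

# Toy data, assumed purely for illustration: 4 examples with 3 inputs each.
X = np.array([[1.0, 0.5, -0.2],
              [0.3, -1.0, 0.8],
              [0.0, 0.7, 0.1],
              [1.2, -0.4, 0.5]])
t = np.array([0.4, -0.2, 0.3, 0.9])
w0 = np.zeros(3)
print(batch_step(w0, X, t, eta=0.05))
print(incremental_pass(w0, X, t, eta=0.05))

With a small η the many small incremental updates track the single batch update closely, which is the approximation claim stated above.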
Gradient Descent
● The key idea behind the delta rule is to use gradient descent to search the hypothesis space of possible weight vectors and find the weights that best fit the training examples.
● Gradient descent provides the basis for the BACKPROPAGATION algorithm, which can learn networks with many interconnected units.
● Gradient descent can be applied whenever
1. the hypothesis space contains continuously parameterized hypotheses, and
2. the error can be differentiated w.r.t. these hypothesis parameters (the derivation below makes this concrete for a linear unit).
● Difficulties in applying gradient descent:
1. converging to a local minimum can sometimes be quite slow, and
2. if there are multiple local minima in the error surface, there is no guarantee that the procedure will find the global minimum.
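The second condition above is what makes the training rule on the next slide work: differentiating E_D[w] for a linear unit o_d = w · x_d (a standard derivation, not shown on the slide) gives

∂E_D/∂w_i = ∂/∂w_i [ 1/2 Σ_{d∈D} (t_d − o_d)² ]
          = Σ_{d∈D} (t_d − o_d) · ∂(t_d − o_d)/∂w_i
          = −Σ_{d∈D} (t_d − o_d) x_{id}

so the gradient-descent step is Δw_i = −η ∂E_D/∂w_i = η Σ_{d∈D} (t_d − o_d) x_{id}, which is exactly the update accumulated in the procedure below.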
Gradient Descent(training_examples, η)
● Each training example is a pair <(x1,...,xn), t>; η is the learning rate
● Initialize each wi to some small random value
● Until the termination condition is met, Do
− Initialize each Δwi to zero
− For each <(x1,...,xn), t> in training_examples, Do
● Input (x1,...,xn) to the linear unit and compute o
● For each linear unit weight wi, Do
− Δwi = Δwi + η(t − o) xi
− For each linear unit weight wi, Do
● wi = wi + Δwi
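One possible plain-Python rendering of the procedure above (a sketch, not a definitive implementation): the fixed epoch count used as the termination condition, the learning rate, and the initialization range are assumptions, since the slide leaves them open.

import random

def gradient_descent(training_examples, eta=0.05, n_epochs=100):
    # training_examples: list of ((x1, ..., xn), t) pairs for a linear unit.
    n = len(training_examples[0][0])
    rng = random.Random(0)
    # Initialize each wi to some small random value.
    w = [rng.uniform(-0.05, 0.05) for _ in range(n)]
    # Termination condition assumed here: a fixed number of passes over the data.
    for _ in range(n_epochs):
        delta_w = [0.0] * n                                # initialize each Δwi to zero
        for x, t in training_examples:
            o = sum(wi * xi for wi, xi in zip(w, x))       # linear-unit output
            for i, xi in enumerate(x):
                delta_w[i] += eta * (t - o) * xi           # Δwi ← Δwi + η(t − o) xi
        w = [wi + dwi for wi, dwi in zip(w, delta_w)]      # wi ← wi + Δwi
    return w

# Example use with made-up data: learn w so that o ≈ t = 2*x1 - x2.
examples = [((1.0, 0.0), 2.0), ((0.0, 1.0), -1.0), ((1.0, 1.0), 1.0)]
print(gradient_descent(examples))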
Sigmoid Unit