Lecture 13 - Perceptrons: Machine Learning March 16, 2010
Last Time
Hidden Markov Models
Sequential modeling represented as a graphical model
Today
Perceptrons, leading to Neural Networks, aka Multilayer Perceptron Networks (but more accurately: Multilayer Logistic Regression Networks)
Combining function
How do we construct the neuron's output?
Linear Neuron: $y = \mathbf{w}^\top \mathbf{x} = \sum_i w_i x_i$
Combining function
Sigmoid function, or squashing function
Logistic Neuron: $y = \sigma(\mathbf{w}^\top \mathbf{x}) = \dfrac{1}{1 + e^{-\mathbf{w}^\top \mathbf{x}}}$
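To make the two combining functions concrete, here is a minimal NumPy sketch (illustrative, not from the lecture; the function names are my own):

import numpy as np

def linear_neuron(w, x):
    # Linear neuron: output is the raw weighted sum w^T x.
    return np.dot(w, x)

def logistic_neuron(w, x):
    # Logistic neuron: squash the weighted sum into (0, 1)
    # with the sigmoid 1 / (1 + exp(-z)).
    z = np.dot(w, x)
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([0.5, -1.0, 2.0])  # weights (bias folded in via x[0] = 1)
x = np.array([1.0, 0.3, 0.8])
print(linear_neuron(w, x))      # 1.8 -- unbounded real output
print(logistic_neuron(w, x))    # ~0.86 -- squashed into (0, 1)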
Gradient Descent
The gradient $\nabla_{\mathbf{w}} R$ is well defined (though we can't solve $\nabla R = 0$ directly), and it points in the direction of fastest increase.
Gradient Descent
The gradient points in the direction of fastest increase, so to minimize $R$ we move in the opposite direction:
$\mathbf{w}^{(t+1)} = \mathbf{w}^{(t)} - \eta\, \nabla_{\mathbf{w}} R(\mathbf{w}^{(t)})$
Gradient Descent
Initialize randomly, then update with small steps. With a sufficiently small step size $\eta$, this is (nearly) guaranteed to converge to the minimum.
Gradient Descent
Initialize randomly, then update with small steps. The iterates can oscillate if the step size $\eta$ is too large.
Gradient Descent
Initialize randomly, then update with small steps. The process can stall if $\nabla R$ is ever 0 when not at the minimum (e.g., on a plateau or at a saddle point).
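A minimal sketch of gradient descent on the one-dimensional objective $R(w) = w^2$ (an illustrative example, not from the lecture), showing convergence with a small step size and oscillation with a large one:

def gradient_descent(grad, w0, eta, steps):
    # Repeatedly step against the gradient: w <- w - eta * grad(w).
    w = w0
    trajectory = [w]
    for _ in range(steps):
        w = w - eta * grad(w)
        trajectory.append(w)
    return trajectory

# R(w) = w^2, so grad R(w) = 2w; the minimum is at w = 0.
grad = lambda w: 2.0 * w

print(gradient_descent(grad, w0=5.0, eta=0.1, steps=5))
# small eta: 5.0, 4.0, 3.2, 2.56, ... converges toward 0
print(gradient_descent(grad, w0=5.0, eta=1.1, steps=5))
# large eta: 5.0, -6.0, 7.2, -8.64, ... oscillates and diverges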
Back to Neurons
Linear Neuron
Logistic Neuron
Perceptron
Classification with a hard threshold (sign) squashing function: $y = \operatorname{sign}(\mathbf{w}^\top \mathbf{x})$
Classification Error
Count an error only when a classification is incorrect (the 0-1 loss): correct classifications contribute nothing, and each misclassified point counts as one error.
Perceptron Error
The 0-1 error is piecewise constant, so its gradient is zero (or undefined) everywhere; we can't do gradient descent on this.
Perceptron Loss
Since gradient descent fails on the classification loss, define the Perceptron Loss, calculated only for misclassified data points:
$L(\mathbf{w}) = \sum_{i \in \mathcal{M}} -\,y_i\, \mathbf{w}^\top \mathbf{x}_i$, where $\mathcal{M}$ is the set of misclassified points.
Update the weights for each misclassified point during training. Update rule:
$\mathbf{w} \leftarrow \mathbf{w} + \eta\, y_i\, \mathbf{x}_i$ for each misclassified $(\mathbf{x}_i, y_i)$
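A minimal sketch of this training loop (assuming labels $y_i \in \{-1, +1\}$; the function name train_perceptron is my own):

import numpy as np

def train_perceptron(X, y, eta=1.0, max_epochs=100):
    # Perceptron learning: update w only on misclassified points.
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        errors = 0
        for xi, yi in zip(X, y):
            if yi * np.dot(w, xi) <= 0:   # misclassified (or on the boundary)
                w += eta * yi * xi        # update rule: w <- w + eta * y_i * x_i
                errors += 1
        if errors == 0:                   # zero training error: converged
            break
    return w

# Tiny linearly separable example; x[0] = 1 serves as the bias input.
X = np.array([[1.0,  2.0,  1.0],
              [1.0,  1.5,  2.0],
              [1.0, -1.0, -1.5],
              [1.0, -2.0, -1.0]])
y = np.array([1, 1, -1, -1])
w = train_perceptron(X, y)
print(np.sign(X @ w))  # matches y after convergence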
Theorem (Perceptron Convergence): If the $\mathbf{x}_i \in X$ are linearly separable, then this process will converge to a $\mathbf{w}^*$ which leads to zero error in a finite number of steps.
Linearly Separable
Two classes of points are linearly separable iff there exists a line (more generally, a hyperplane) such that all the points of one class fall on one side of the line, and all the points of the other class fall on the other side of the line.
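In symbols (a standard formalization, with an explicit bias term $b$ added here for generality), the classes are linearly separable iff
$\exists\, \mathbf{w}, b \ \text{such that} \ y_i\,(\mathbf{w}^\top \mathbf{x}_i + b) > 0 \ \text{for all } i, \qquad y_i \in \{-1, +1\}$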
Next Time
Multilayer Neural Networks