
Lecture 13 Perceptrons

Machine Learning March 16, 2010

Last Time
Hidden Markov Models
Sequential modeling represented in a Graphical Model

Today
Perceptrons, leading to...
Neural Networks, a.k.a. Multilayer Perceptron Networks (but more accurately: Multilayer Logistic Regression Networks)

Review: Fitting Polynomial Functions


Fitting nonlinear 1-D functions.
Polynomial: $f(x) = \sum_{d=0}^{D} \theta_d x^d$
Risk: $R(\theta) = \frac{1}{N} \sum_{i=1}^{N} \left(y_i - f(x_i)\right)^2$

Review: Fitting Polynomial Functions


Order-D polynomial regression on a 1-D variable is the same as D-dimensional linear regression: extend the feature vector from a scalar $x$ to $(1, x, x^2, \ldots, x^D)$. More generally, any fixed feature expansion reduces nonlinear fitting to linear regression, as sketched below.
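A minimal sketch of this equivalence (the data and the helper names here are illustrative, not from the lecture):

```python
import numpy as np

def polynomial_features(x, D):
    """Map a 1-D input x to the feature vector (1, x, x^2, ..., x^D)."""
    return np.vander(x, D + 1, increasing=True)

# Hypothetical data: a noisy cubic
rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 50)
y = 2 * x**3 - x + 0.1 * rng.standard_normal(x.shape)

# Order-3 polynomial regression == linear regression on expanded features
Phi = polynomial_features(x, 3)
theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print(theta)  # approximately [0, -1, 0, 2]
```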

Neuron inspires Regression


Graphical representation of linear regression: the McCulloch-Pitts neuron. (Note: this is not a graphical model.)

Neuron inspires Regression


Edges multiply the signal ($x_i$) by some weight ($\theta_i$); nodes sum their inputs. Equivalent to linear regression.

Introducing Basis functions


Graphical representation of feature extraction

Edges multiply the signal by a weight; nodes apply a basis function, $\phi_d$ (sketched below).
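A small illustration of this node/edge picture (the particular basis functions chosen here are hypothetical; any fixed set of $\phi_d$ works):

```python
import numpy as np

# Illustrative choice of basis functions phi_d (any fixed set works)
basis = [lambda x: 1.0, lambda x: x, lambda x: x**2, lambda x: np.sin(x)]

def features(x):
    """Nodes: apply each basis function phi_d to the raw input."""
    return np.array([phi(x) for phi in basis])

def network_output(x, theta):
    """Edges: weight each phi_d(x), then sum. Linear regression in feature space."""
    return theta @ features(x)
```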

Extension to more features


Graphical representation of feature extraction

Combining function
How do we construct the neural output?

Linear Neuron

Combining function
Sigmoid (squashing) function: $\sigma(z) = \frac{1}{1 + e^{-z}}$

Logistic Neuron
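A sketch of the two combining functions side by side (assuming the standard logistic sigmoid):

```python
import numpy as np

def sigmoid(z):
    """Squashing function: maps any real-valued input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def linear_neuron(x, theta):
    """Linear combining function: just the weighted sum."""
    return x @ theta

def logistic_neuron(x, theta):
    """Logistic neuron: sigmoid applied to the weighted sum."""
    return sigmoid(x @ theta)
```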

Classification using the same metaphor



Logistic Neuron optimization


Minimizing $R(\theta)$ is more difficult.

Bad news: there is no closed-form solution. Good news: it's convex.

Aside: Convex Regions


Convex: for any pair of points $x_a$ and $x_b$ within a region, every point $x_c$ on the line segment between $x_a$ and $x_b$ is also in the region.

Aside: Convex Functions


Convex: for any pair of points $x_a$ and $x_b$, the function lies on or below the chord connecting $(x_a, f(x_a))$ and $(x_b, f(x_b))$: $f(\lambda x_a + (1-\lambda) x_b) \le \lambda f(x_a) + (1-\lambda) f(x_b)$ for all $\lambda \in [0, 1]$.

Aside: Convex Functions


Convex functions have a single minimum: any local minimum is the global minimum! How does this help us? (Nearly) guaranteed optimality of gradient descent.

Gradient Descent
The gradient $\nabla R(\theta)$ is defined (though we can't solve $\nabla R(\theta) = 0$ directly). It points in the direction of fastest increase.

Gradient Descent
The gradient points in the direction of fastest increase. To minimize $R$, move in the opposite direction: $\theta \leftarrow \theta - \eta \nabla R(\theta)$.


Gradient Descent
Initialize randomly. Update with small steps: (nearly) guaranteed to converge to the minimum.


Gradient Descent
Initialize randomly. Update with small steps. Can oscillate if $\eta$ is too large.


Gradient Descent
Initialize randomly. Update with small steps. Can stall if $\nabla R(\theta)$ is ever 0 when not at the minimum.
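A minimal batch gradient descent sketch for the logistic neuron, assuming the cross-entropy risk (the step size `eta` and iteration count are illustrative choices, not from the slides):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, eta=0.1, steps=1000):
    """Batch gradient descent on the logistic-regression risk.

    X: (N, D) design matrix; y: (N,) labels in {0, 1}.
    eta is the step size: too large can oscillate, small steps converge.
    """
    rng = np.random.default_rng(0)
    theta = rng.normal(size=X.shape[1])                  # initialize randomly
    for _ in range(steps):
        grad = X.T @ (sigmoid(X @ theta) - y) / len(y)   # gradient of the risk
        theta -= eta * grad                              # move against the gradient
    return theta
```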

Back to Neurons

Linear Neuron

Logistic Neuron


Perceptron
Classification squashing function: a hard threshold (step/sign function) instead of the sigmoid.

Strictly classification error

Classification Error
Only count errors when a classification is incorrect.

The sigmoid leads to greater-than-zero error even on correct classifications.

Classification Error
Only count errors when a classification is incorrect.

Perceptrons use strictly classification error


Perceptron Error
Can't do gradient descent on this: the 0-1 error is piecewise constant, so its gradient is zero almost everywhere.

Perceptrons use strictly classification error

Perceptron Loss
With classification loss in hand, define the Perceptron Loss, calculated only for the misclassified data points:

$E_P(\theta) = -\sum_{i \in \mathcal{M}} y_i \theta^\top x_i$, with $y_i \in \{-1, +1\}$ and $\mathcal{M}$ the set of misclassified points.

0-1 loss vs. perceptron loss: the risk is now piecewise linear rather than a step function.
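A sketch of this loss under the conventions above (labels in $\{-1, +1\}$; boundary points counted as misclassified is a choice made here):

```python
import numpy as np

def perceptron_loss(theta, X, y):
    """Perceptron criterion: sum of -y_i * theta^T x_i over misclassified points.

    y in {-1, +1}. Zero for correctly classified points, piecewise linear
    in theta elsewhere, so gradient descent applies.
    """
    margins = y * (X @ theta)
    misclassified = margins <= 0     # count boundary points as errors
    return -np.sum(margins[misclassified])
```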

Perceptron Loss
Loss calculated for each misclassified data point.

Nice gradient descent: $\nabla E_P(\theta) = -\sum_{i \in \mathcal{M}} y_i x_i$.

Perceptron vs. Logistic Regression


Logistic regression has a hard time eliminating all errors; perceptrons often do better, because errors carry increased importance.

[Figure: the same data under both models. Logistic regression: 2 errors. Perceptron: 0 errors.]

Stochastic Gradient Descent


Instead of calculating the gradient over all points and then updating, update the weights for each misclassified point during training.

Update rule: $\theta \leftarrow \theta + \eta\, y_i x_i$ for each misclassified point $x_i$.

Online Perceptron Training


Online training: update the weights for each data point.

Iterate over points $x_i$:
If $x_i$ is correctly classified: leave $\theta$ unchanged.
Else: $\theta \leftarrow \theta + \eta\, y_i x_i$.

Theorem: if the $x_i \in X$ are linearly separable, then this process converges to a $\theta^*$ with zero error in a finite number of steps.
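A sketch of this training loop, using the update rule from the stochastic gradient descent slide (the `max_epochs` safety cap is an assumption, not in the slides):

```python
import numpy as np

def train_perceptron(X, y, eta=1.0, max_epochs=100):
    """Online perceptron training.

    X: (N, D) inputs; y: (N,) labels in {-1, +1}.
    If the data are linearly separable, the loop reaches zero error
    after finitely many updates (the convergence theorem above).
    """
    theta = np.zeros(X.shape[1])
    for _ in range(max_epochs):              # assumed cap on training passes
        errors = 0
        for xi, yi in zip(X, y):
            if yi * (xi @ theta) <= 0:       # misclassified (or on the boundary)
                theta += eta * yi * xi       # update rule
                errors += 1
        if errors == 0:                      # converged: zero training error
            break
    return theta
```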

Linearly Separable
Two classes of points are linearly separable iff there exists a line (in general, a hyperplane) such that all points of one class fall on one side and all points of the other class fall on the other side.


Next Time
Multilayer Neural Networks

