05-1 Supervised Learning


Supervised Learning

IE:4172 Big Data Analytics


Stephen Baek
Assignment
1. If A and B are matrices, what conditions must they satisfy in order to compute AB (matrix multiplication)?
2. What kinds of operations are required, and how many times are they called, to compute AB?
3. What is Gauss-Jordan elimination? What kinds of operations are required, and how many times are they called, to perform Gauss-Jordan elimination on a d-by-d matrix?
Assignment
4. Given a linear model Y = XA, the solution is known to be A = ((X^T X)^{-1} X^T) Y. How many operations do you need to perform in order to find A?
5. The solution to 4 can also be computed as (X^T X)^{-1} (X^T Y). Note the parentheses. How many operations do you need to perform to find A this time?
6. What is the complexity of solving a linear system?
7. Can the complexity be improved? How? Find at least 3 ways.
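For questions 1-2, one way to sanity-check your operation counts is to instrument a naive triple-loop multiplication. A minimal sketch (the function and data below are ours, for illustration only, not part of the assignment):

```python
import numpy as np

def matmul_count(A, B):
    """Naive matrix product that counts scalar operations.

    AB is defined only when A is n-by-d and B is d-by-m (inner
    dimensions must match). Each of the n*m output entries takes
    d multiplications and d-1 "real" additions.
    """
    n, d = A.shape
    d2, m = B.shape
    assert d == d2, "inner dimensions must match"
    C = np.zeros((n, m))
    mults = adds = 0
    for i in range(n):
        for j in range(m):
            for k in range(d):
                C[i, j] += A[i, k] * B[k, j]
                mults += 1
                adds += 1  # counts d adds per entry (incl. the add into the initial 0)
    return C, mults, adds

A = np.random.rand(4, 3)
B = np.random.rand(3, 5)
C, mults, adds = matmul_count(A, B)
print(mults)  # n * m * d = 4 * 5 * 3 = 60 multiplications
```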
What is Machine Learning?
● Supervised Learning
○ Classification
○ Regression
● Unsupervised Learning
● Self-supervised Learning (not in this course)
● Reinforcement Learning (not in this course)
Learning a Class from Examples
● What follows is some theory of classification into two classes.
● First, we assume there is no noise (the results can be generalized to the noisy case, though).
● What you should learn:
○ Learning can be seen as pruning out possible hypotheses (models).
○ Learning is generalization (we want to predict the classes of new examples).
○ Learning is impossible if the hypothesis (model) space is too large (in other words, we need some prior information; we need to select a model family).
○ The more complex the model family (hypothesis space), the more training data is needed.
Independent and Identically Distributed (iid) Data
● We assume the training set contains data points drawn independently from an identical distribution.
○ The ordering of the data points does not matter.
● This assumption often holds, or loosely holds, in real-world problems.
● Notable exception: time series.
○ e.g., today's temperature is not independent of yesterday's temperature. In fact, there is a strong correlation.
The Family Car Example
Question: Given car properties, is the car a family car?

Car properties: x = (x_1, x_2) = (price, engine power)

Hypothesis: h(x) = 1 if x is a family car, 0 otherwise


Family Car: Training Set
Family Car: True Class
Family Car: Hypothesis class H
Consistent Hypothesis
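The figures for the family car slides are not reproduced in this text export. As a concrete sketch, assume the classic form of this example: the hypothesis class H is the set of axis-aligned rectangles, h(x) = 1 iff (p_1 ≤ price ≤ p_2) AND (e_1 ≤ engine power ≤ e_2), and the most specific consistent hypothesis S is the tightest rectangle containing all positive examples. The data values below are made up for illustration:

```python
import numpy as np

# Toy training set: rows are (price, engine_power); labels are
# 1 = family car, 0 = not. All values are made up for illustration.
X = np.array([[27, 180], [15, 120], [22, 150], [33, 210],
              [18, 130], [45, 300], [10,  90], [24, 160]], dtype=float)
r = np.array([0, 1, 1, 0, 1, 0, 0, 1])

# Most specific consistent hypothesis S: the tightest axis-aligned
# rectangle that still contains every positive example.
pos = X[r == 1]
p1, e1 = pos.min(axis=0)   # lower-left corner
p2, e2 = pos.max(axis=0)   # upper-right corner

def h(x):
    """Predict 1 iff x falls inside the rectangle S."""
    return int(p1 <= x[0] <= p2 and e1 <= x[1] <= e2)

print([h(x) for x in X])   # matches r, so S is consistent on this set
```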
What did we learn from the Family Cars example?
● We must choose some hypothesis to be able to predict anything (unless we observe all possible data values).
● This causes inductive bias (the choice of hypothesis space affects your results).
● All consistent hypotheses can be found between the most general and the most specific hypothesis.
● In practical applications, there may be no consistent hypothesis, either because the hypothesis space is too simple (underfitting) or because the data distribution is noisy.
Noise and Model Complexity
● Noise is an unwanted anomaly in the data.
● Because of noise, we may never reach zero error.
● Why use a simpler model:
○ Simpler to use
○ Easier to train
○ Easier to explain
○ Generalizes better
Noise and Model Complexity
● Noise may be caused by:
○ Errors in measurements of input attributes or class labels.
○ Unknown or ignored (hidden or latent) attributes. (Example: a plane ticket's price may increase when there is an event in the region.)
○ The model being wrong or inaccurate (figure).
Regression
● Classification is the prediction of a class label, given attributes.
● Regression is the prediction of a real number, given attributes (usually with noise).
● The training set is given by X = \{(x^t, r^t)\}_{t=1}^{N}, where r^t \in \mathbb{R}.
● Each hypothesis is a function g: \mathbb{R}^d \to \mathbb{R}. We would like g(x^t) \approx r^t for all items in the training set.
● Usually, we want to minimize a quadratic error function:

E(g \mid X) = \frac{1}{N} \sum_{t=1}^{N} \left[ r^t - g(x^t) \right]^2
Regression
● The simplest case is the linear regressor: g(x) = w_1 x + w_0.
● Optimization task: find w_0 and w_1 such that the error

E(w_1, w_0 \mid X) = \frac{1}{N} \sum_{t=1}^{N} \left[ r^t - (w_1 x^t + w_0) \right]^2

is minimized.
● Analytic solution:

w_1 = \frac{\sum_t x^t r^t - N \bar{x} \bar{r}}{\sum_t (x^t)^2 - N \bar{x}^2}, \qquad w_0 = \bar{r} - w_1 \bar{x}

where \bar{x} = \frac{1}{N} \sum_t x^t and \bar{r} = \frac{1}{N} \sum_t r^t.
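A minimal sketch of this analytic solution on synthetic data (the data, and the "true" coefficients 2.5 and 1.0, are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 50
x = rng.uniform(0, 10, size=N)
r = 2.5 * x + 1.0 + rng.normal(0, 1, size=N)   # noisy line

x_bar, r_bar = x.mean(), r.mean()
w1 = (np.sum(x * r) - N * x_bar * r_bar) / (np.sum(x**2) - N * x_bar**2)
w0 = r_bar - w1 * x_bar
print(w1, w0)                    # close to the true 2.5 and 1.0

# Sanity check against NumPy's own least squares fit:
print(np.polyfit(x, r, deg=1))   # [w1, w0]
```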
Linear Regression
● Linear model: g(x \mid w, w_0) = w^T x + w_0 = w_1 x_1 + \dots + w_d x_d + w_0.
● Model parameters: the weight vector w = (w_1, \dots, w_d)^T and the intercept w_0.
● In matrix form, Y = Xw + w_0, where the scalar w_0 is broadcast across all N samples.
● A “trick”: append a constant 1 to each input, x = (1, x_1, \dots, x_d)^T, so that w_0 is absorbed into w and the model becomes simply Y = Xw.
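A sketch of the "trick" in NumPy, assuming X stores one sample per row; prepending a column of ones lets the intercept ride along inside the weight vector:

```python
import numpy as np

X = np.array([[2.0, 3.0],
              [1.0, 5.0],
              [4.0, 2.0]])       # N = 3 samples, d = 2 features
w = np.array([0.5, -1.0])
w0 = 2.0

# Without the trick: the scalar w0 is broadcast across all N samples.
y1 = X @ w + w0

# With the trick: augment X with a constant-1 column and absorb w0.
X_aug = np.hstack([np.ones((X.shape[0], 1)), X])   # rows are (1, x1, x2)
w_aug = np.concatenate([[w0], w])                  # (w0, w1, w2)
y2 = X_aug @ w_aug

print(np.allclose(y1, y2))   # True: the two forms are identical
```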
Linear Regression
● Least squares: find w minimizing E(w) = \|Y - Xw\|^2.
● Setting the gradient to zero gives the normal equations: X^T X w = X^T Y.
● Assume X^T X is invertible. Then

w = (X^T X)^{-1} X^T Y = X^+ Y

where X^+ = (X^T X)^{-1} X^T is the Moore-Penrose pseudoinverse of X.
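A minimal sketch comparing the textbook formula with NumPy's pseudoinverse on made-up data. Note that np.linalg.pinv computes X^+ via the SVD, which is numerically more stable than forming (X^T X)^{-1} explicitly:

```python
import numpy as np

rng = np.random.default_rng(1)
N, d = 100, 3
X = np.hstack([np.ones((N, 1)), rng.normal(size=(N, d - 1))])  # bias trick
w_true = np.array([1.0, 2.0, -0.5])
Y = X @ w_true + 0.1 * rng.normal(size=N)

# Textbook formula: w = (X^T X)^{-1} X^T Y
w_normal = np.linalg.inv(X.T @ X) @ (X.T @ Y)

# Moore-Penrose pseudoinverse, computed internally via the SVD:
w_pinv = np.linalg.pinv(X) @ Y

print(np.allclose(w_normal, w_pinv))   # True (up to round-off)
print(w_pinv)                          # close to w_true
```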
Linear Regression: Toy data
Linear Basis Functions
Least Squares Solution to Regression
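The formulas on these slides are not reproduced in this export. As a sketch, assuming polynomial basis functions φ_j(x) = x^j, the least squares machinery is unchanged: replace X with the design matrix Φ whose columns are the basis functions evaluated at the inputs:

```python
import numpy as np

def poly_design_matrix(x, k):
    """Design matrix with columns 1, x, x^2, ..., x^k."""
    return np.vander(x, k + 1, increasing=True)

rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, size=30)
r = np.sin(3 * x) + 0.1 * rng.normal(size=30)   # made-up target

Phi = poly_design_matrix(x, k=3)   # N-by-(k+1)
w = np.linalg.pinv(Phi) @ r        # same least squares solution as before
print(w)                           # coefficients of the cubic fit
```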
Polynomial Regressors
● E_train is the error on the training data. It decreases as the model complexity k increases.
● E_test is the error on the remaining 93 data points, the "test set". It has a minimum at k = 3.
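A sketch that reproduces this qualitative picture; the course's actual dataset is not available here, so made-up data stand in, with a small training set and the remaining 93 points held out as the test set to echo the slide's numbers:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(-1, 1, size=100)
r = np.sin(3 * x) + 0.15 * rng.normal(size=100)

x_tr, r_tr = x[:7], r[:7]     # small training set
x_te, r_te = x[7:], r[7:]     # the remaining 93 points: the test set

for k in range(1, 7):
    w = np.polyfit(x_tr, r_tr, deg=k)
    E_train = np.mean((np.polyval(w, x_tr) - r_tr) ** 2)
    E_test = np.mean((np.polyval(w, x_te) - r_te) ** 2)
    print(f"k={k}: E_train={E_train:.4f}  E_test={E_test:.4f}")

# E_train keeps shrinking as k grows; E_test bottoms out at a moderate
# k and then rises again as the model overfits the 7 training points.
```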
Wait a minute...

X^T X is d-by-d.
Computing X^T X requires N × d^2 multiplications and additions.
Inverting X^T X is O(d^3) (Gauss-Jordan).
Afterwards, computing ((X^T X)^{-1} X^T) Y requires N × d^2 + N × d multiplications.
Big Data!

The same counts apply, but now N and d are large: the N × d^2 multiplications to form X^T X and the O(d^3) inversion become the bottleneck. The closed-form solution does not scale cheaply to big data.
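One standard improvement, sketched below on made-up data: solve the normal equations with a matrix factorization (np.linalg.solve) instead of explicitly inverting X^T X. Both routes are O(d^3), but factor-and-solve has smaller constants and better numerical behavior. Other directions, left to the assignment's question 7, include QR/SVD-based solvers such as np.linalg.lstsq and iterative methods (e.g., stochastic gradient descent) that never form X^T X at all:

```python
import numpy as np

rng = np.random.default_rng(4)
N, d = 50_000, 100
X = rng.normal(size=(N, d))
Y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=N)

A = X.T @ X   # d-by-d: the N * d^2 multiply-adds from the slide
b = X.T @ Y   # d-vector: N * d multiply-adds

# What the formula literally says: explicit inversion. O(d^3) and the
# least numerically stable of the options.
w_inv = np.linalg.inv(A) @ b

# Factor-and-solve (LU under the hood): still O(d^3), cheaper constants.
w_solve = np.linalg.solve(A, b)

print(np.allclose(w_inv, w_solve))   # True (up to round-off)
```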
