CSCE-421 Machine Learning

2. KNN

Instructor: Guni Sharon, classes: TR 3:55-5:10, HRBB 124

1


Assignments
• Overdue:
• Programming assignment 0: Python Warmup (due Monday, Sep 6)
• Due:
• Written assignment 1: K Nearest Neighbors + Linear algebra (due Monday,
Sep 13)

2
Classification problem
• Find a hypothesis h, from the set of possible hypotheses H, that
minimizes the classification error on the provided data set D
• That’s easy: a lookup table that memorizes D achieves zero training error

• We want to generalize, i.e., minimize the expected classification
error on samples from the true distribution, P(X,Y)
• That is, find h* = argmin_{h∈H} E_{(x,y)∼P}[ 1[h(x) ≠ y] ]
• Assuming our data set D comes from the same distribution as P
• K-fold cross validation estimates this expected error (see the sketch below)

3
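A minimal numpy sketch of k-fold cross validation for estimating the expected error (the classifier interface with `fit`/`predict` is an assumption, not from the slides):

```python
import numpy as np

def k_fold_error(model, X, y, k=5, seed=0):
    """Average validation error over k train/validation splits."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)
    errors = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model.fit(X[train], y[train])        # assumed sklearn-style interface
        errors.append(np.mean(model.predict(X[val]) != y[val]))
    return np.mean(errors)
```

Any classifier exposing fit/predict (e.g., a K-NN implementation like the sketch later in these slides) can be plugged in.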
The best we can hope for
• If we knew P(X,Y), specifically P(y|x)
• We can easily compute the most likely label, denoted y* = argmax_y P(y|x)

• AKA, the Bayes optimal classifier: h_opt(x) = argmax_y P(y|x)


• What is the expected error rate (binary loss) for h_opt?

4
The best we can hope for
• If we knew P(X,Y), specifically P(y|x)
• We can easily compute the most likely label, denoted y* = argmax_y P(y|x)

• AKA, the Bayes optimal classifier: h_opt(x) = argmax_y P(y|x)


• What is the expected error rate (binary loss) for h_opt?

• Lower bound of the error rate: ε_BayesOpt = 1 − P(y*|x)


• With the same feature representation, no classifier can obtain better accuracy

5
Optimal error rate
• Assume that we have a classifier for predicting if it is going to rain
tomorrow
• Assume that if the atmospheric pressure = 0.95 bar, there is a
probability of 0.8 that it will rain the next day
• What is the best prediction we can make if we know that the atmospheric
pressure = 0.95 bar and don’t know anything else?
• What is our expected error rate for that prediction? (worked out below)
• Suboptimality of a classification model can be defined with respect to
this optimal error rate

6
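A worked answer to the quiz above, assuming only the stated probability P(rain | pressure = 0.95) = 0.8:

```latex
y^{*} = \arg\max_{y} P(y \mid \text{pressure}=0.95) = \text{rain},
\qquad
\epsilon_{\mathrm{opt}} = 1 - P(y^{*} \mid \text{pressure}=0.95) = 1 - 0.8 = 0.2
```

So even the Bayes optimal classifier is wrong 20% of the time here; a model's suboptimality is measured against that floor.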
Worst-case accuracy
• The best accuracy we can hope for is that of the Bayes optimal
classifier
• What is the baseline accuracy that we should never underperform?
• That of a constant classifier, which predicts the same label regardless of
the feature vector
• The best constant in classification is the most common label in the
training set
• You should show that your classifier outperforms the best constant
classifier (see the baseline sketch below)
Best constant classifier error: 1 − P(ŷ | ·), where ŷ is the most common label
Your classifier error: should fall between these two
Bayes optimal classifier error: 1 − P(y* | x)
7
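A minimal numpy sketch of this constant baseline (array names are illustrative):

```python
import numpy as np

def constant_baseline(y_train, num_test):
    """Predict the most common training label for every test point."""
    labels, counts = np.unique(y_train, return_counts=True)
    return np.full(num_test, labels[np.argmax(counts)])

# A classifier is only interesting if it beats this accuracy
y_train = np.array([0, 0, 1, 0, 1, 0])      # most common label: 0
y_test = np.array([0, 1, 0, 0])
pred = constant_baseline(y_train, len(y_test))
print("baseline accuracy:", np.mean(pred == y_test))   # 0.75
```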
An intuitive classification model
• P(X,Y) is unknown, but a data set D is provided
• Generalize beyond the observed data…
• Guess the class of the unknown point
• What do you assume about the data?

8
K-nearest neighbors
• Assign the majority class among the
k-nearest neighbors
• K=2 ?
• K=17 ?

9
10
K-nearest neighbors
• Assign the majority class among the
k-nearest neighbors
• K=2 ?
• K=17 ?

• How do we define “nearest” to x?

• Obviously, the k points with the smallest distance to x… but how do we define

the distance function dist(x, x′)? (see the sketch below)

11
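A minimal numpy sketch of k-NN classification as described above (Euclidean distance by default; names are illustrative):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3):
    """Return the majority label among the k nearest training points."""
    dists = np.linalg.norm(X_train - x_query, axis=1)   # Euclidean distances
    nearest = np.argsort(dists)[:k]                      # indices of the k nearest
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy usage
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [0.9, 0.8], [1.0, 1.0]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.2, 0.1]), k=3))   # -> 0
```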
Distance/similarity metric
• Minkowski distance: dist_p(x, x′) = ( Σ_i |x_i − x′_i|^p )^(1/p)

• p is a model (K-NN) parameter (see the sketch below)


• p = 1 (Manhattan distance)… why “Manhattan”?
• p = 2 (Euclidean distance)

12
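A small sketch of the Minkowski distance with p as the parameter (function name is illustrative):

```python
import numpy as np

def minkowski_dist(x, z, p=2):
    """Minkowski distance: (sum_i |x_i - z_i|^p)^(1/p)."""
    return np.sum(np.abs(x - z) ** p) ** (1.0 / p)

x, z = np.array([0.0, 0.0]), np.array([3.0, 4.0])
print(minkowski_dist(x, z, p=1))   # 7.0 (Manhattan)
print(minkowski_dist(x, z, p=2))   # 5.0 (Euclidean)
```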
Minkowski distance quiz
• Which blue circle is closest to the red x?
[Figure: a red x and two blue circles, labeled 1 and 2]
13
Feature scaling
• Distance based algorithms are affected by the scale of the variables
Index   Age (X)   Income (X)   House owner (Y)
1       25        45,000       0
2       34        56,000       1
3       27        90,000       0
4       23        51,000       0
5       42        105,000      1

• Which x seems most similar to ?


• Which x is closest to (Euclidean distance)?
14
Feature scaling - Normalization
• Avoid bias towards variables with higher magnitude
• Standard score: z = (x − μ) / σ
• Represents the feature value in units of standard deviations from the mean
• Works well for populations that are normally distributed
• Min-max feature scaling: x′ = (x − min) / (max − min)
• Also called unity-based normalization
• Maps all feature values into the [0, 1] range (see the sketch below)

15
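A minimal numpy sketch of both scalers, applied column-wise (names are illustrative):

```python
import numpy as np

def standard_score(X):
    """Z-score each column: (x - mean) / std."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def min_max_scale(X):
    """Map each column into [0, 1]: (x - min) / (max - min)."""
    return (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Age and income from the table above
X = np.array([[25, 45_000], [34, 56_000], [27, 90_000],
              [23, 51_000], [42, 105_000]], dtype=float)
print(np.round(min_max_scale(X), 2))   # approximately matches the table on the next slide
```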
Feature scaling
• Min-max feature scaling
Original:
Index   Age (X)   Income (X)   House owner (Y)
1       25        45,000       0
2       34        56,000       1
3       27        90,000       0
4       23        51,000       0
5       42        105,000      1

Min-max scaled:
Index   Age (X)   Income (X)   House owner (Y)
1       0.1       0            0
2       0.58      0.18         1
3       0.21      0.75         0
4       0         0.1          0
5       1         1            1

• Which x seems most similar to ?


• Which x is closest to (Euclidean distance)?
16
Accuracy guarantees for K-NN
• Cover and Hart 1967: as the number of samples n → ∞, the 1-NN error for binary
classification is no more than twice the error of the Bayes optimal classifier

17
Accuracy guarantees for 1-NN
• Let x_NN be the nearest neighbor of our test point x
• As n → ∞, dist(x_NN, x) → 0, i.e., x_NN → x
• This means the nearest neighbor is (in the limit) identical to x
• 1-NN returns the label of x_NN
• What is the probability that y_NN ≠ y, the true label of x (binary classification)?

1. P(error) = P(y*|x)·(1 − P(y*|x_NN)) + (1 − P(y*|x))·P(y*|x_NN)
2. At x_NN = x, this equals 2·P(y*|x)·(1 − P(y*|x)) ≤ 2·(1 − P(y*|x))

18
Accuracy guarantees for 1-NN
• Let x_NN be the nearest neighbor of our test point x
• As n → ∞, dist(x_NN, x) → 0, i.e., x_NN → x
• This means the nearest neighbor is (in the limit) identical to x
• 1-NN returns the label of x_NN
• What is the probability that y_NN ≠ y, the true label of x (binary classification)?

1. P(error) = P(y*|x)·(1 − P(y*|x_NN)) + (1 − P(y*|x))·P(y*|x_NN)
2. At x_NN = x, this equals 2·P(y*|x)·(1 − P(y*|x)) ≤ 2·(1 − P(y*|x)) (derivation sketch below)

Good news: K-NN has impressive accuracy guarantees
Bad news: only for a large enough n. Q: How large?
19
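A compact version of the argument behind the factor-of-two bound (a sketch, written for the asymptotic setting above where x_NN → x):

```latex
\epsilon_{1\text{-}\mathrm{NN}}
  = P(y^{*}\mid x)\bigl(1 - P(y^{*}\mid x_{\mathrm{NN}})\bigr)
  + \bigl(1 - P(y^{*}\mid x)\bigr)P(y^{*}\mid x_{\mathrm{NN}})
  \;\xrightarrow{\,x_{\mathrm{NN}} \to x\,}\;
  2\,P(y^{*}\mid x)\bigl(1 - P(y^{*}\mid x)\bigr)
  \;\le\; 2\bigl(1 - P(y^{*}\mid x)\bigr)
  = 2\,\epsilon_{\mathrm{BayesOpt}}
```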
Required number of samples
• Assume a 1-dimensional feature vector drawn from [0, 1]
• Accurate predictions require k neighbors within distance ℓ
• Add n samples (uniform distribution)
• An interval of size ℓ covers an ℓ fraction of the state space and should include
n·ℓ of the (uniform) samples in expectation
• So we need n·ℓ ≥ k, i.e., n ≥ k/ℓ samples (e.g., ℓ = 0.01 requires n ≥ 100·k)

[Figure: the unit interval from 0 to 1 with a sub-interval of length ℓ]
20
Required number of samples
• Now consider a 2D feature vector: x ∈ [0, 1]²
• A square with sides of size ℓ covers an ℓ² fraction of the state space
and should include n·ℓ² of the (uniform) samples in expectation,
so n ≥ k/ℓ²

[Figure: a unit square (side 1) containing a small square of side ℓ]
21
Required number of samples
• Now consider a 3D feature vector: x ∈ [0, 1]³
• A cube with sides of size ℓ covers an ℓ³ fraction of the state space and
should include n·ℓ³ of the (uniform) samples in expectation

• For the general case (d dimensions): n ≥ k/ℓ^d, which grows exponentially
with d (see the sketch below)

22
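A tiny sketch of how quickly n ≥ k/ℓ^d blows up (the chosen k and ℓ are illustrative):

```python
# Required samples n >= k / l**d for k neighbors within an l-sided cube
k, l = 10, 0.1        # illustrative: 10 neighbors, cube edge length 0.1
for d in (1, 2, 3, 10, 100):
    print(f"d = {d:3d}  ->  n >= {k / l**d:.3g}")
```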
Curse of dimensionality
[Figure: required number of samples n as a function of the dimension d; n grows exponentially in d]
23
Curse of dimensionality
• Frequency of pairwise Euclidean distances between randomly distributed
points within d-dimensional unit hypercubes
• As d grows, the pairwise distances concentrate around a common value
• K-NN is meaningless for large d (see the simulation sketch below)

24
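A small simulation sketch of this effect (sample sizes and dimensions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100                                    # random points per dimensionality
for d in (2, 10, 100, 1000):
    X = rng.random((n, d))                 # uniform points in the d-dim unit cube
    diffs = X[:, None, :] - X[None, :, :]  # all pairwise differences
    dists = np.linalg.norm(diffs, axis=-1)[np.triu_indices(n, k=1)]
    # a ratio near 1 means the nearest point is barely nearer than the farthest
    print(f"d = {d:4d}   min/max pairwise distance: {dists.min() / dists.max():.2f}")
```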
Curse of dimensionality
• How many features are in a 1.3MP RGB image?
• ~4M dimensions (1.3M pixels × 3 color channels)
• We would need on the order of k/ℓ^4,000,000 samples
• Not enough atoms in the universe
• Say we could get all these samples, what is the computational
complexity of inference?
• Naive 1-NN computes a distance to every stored sample: O(n·d) per test point

• We need to consider other approaches that can scale better


• We will revisit (next class) the linear classifier from Lecture 1

25
What did we learn?
• Classification problem
• Performance bounds (best: the Bayes optimal classifier; worst: the constant
classifier)
• K-nearest neighbors
• Can reach optimal performance with enough samples and a suitable k
• “Enough samples” grows exponentially with the number of dimensions (curse
of dimensionality)
• Computationally intensive for large training sets

26
What next?
• Class:
• linear classifier (perceptron)
• Assignments:
• Written assignment 1: K Nearest Neighbors + Linear algebra (due Monday,
Sep 13)

27
