2. KNN
Classification problem
• Find a hypothesis from the set of possible hypotheses, $\mathcal{H}$, that minimizes the classification error on the provided data set (written out below)
• That’s easy:
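For reference, the objective in the first bullet, written out in my notation (not the slide's), is empirical risk minimization over the hypothesis class:
\[
\hat{h} = \arg\min_{h \in \mathcal{H}} \; \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\bigl[\, h(x_i) \neq y_i \,\bigr].
\]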
The best we can hope for
• If we knew the underlying distribution, specifically $P(y \mid x)$
• We can easily compute the most likely label, denoted $y^* = \arg\max_y P(y \mid x)$
Optimal error rate
• Assume that we have a classifier for predicting if it is going to rain
tomorrow
• Assume that if the atmospheric pressure = 0.95 bar, there is a
probability of 0.8 that it will rain the next day
• What is the best prediction we can make if we know that the atmospheric pressure = 0.95 bar and don’t know anything else?
• What is our expected error rate for the best prediction? (worked out below)
• Suboptimality of a classification model can be defined with respect to
this optimal error rate
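For concreteness, here is the arithmetic the two questions point to (implied by the slide, not printed on it): the best prediction is the more probable label, and the expected error is the probability of the other one.
\[
\hat{y} = \arg\max_{y} P(y \mid \text{pressure} = 0.95) = \text{rain} \quad (\text{since } 0.8 > 0.5),
\qquad
\text{expected error} = 1 - 0.8 = 0.2 .
\]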
Worst-case accuracy
• The best accuracy we can hope for is that of the Bayes optimal
classifier
• What is the baseline accuracy that we should never underperform?
• A constant classifier essentially predicts the same label independently of the feature vector
• The best constant in classification is the most common label in the training set
• You should show that your classifier outperforms the best constant classifier (see the baseline sketch below)
• Your classifier error should therefore fall between the Bayes optimal error, $1 - P(y^* \mid x)$, and the error of the best constant classifier, $1 - P(y^*)$, computed from the marginal label distribution
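A minimal sketch of the constant-classifier baseline check described above; the toy arrays and the my_classifier_predictions name are placeholders I made up, the point is only that the majority-class accuracy is the floor any real classifier should beat.

```python
import numpy as np
from collections import Counter

# Toy labels; replace with your real train/test splits.
y_train = np.array([0, 1, 0, 0, 1, 0, 0, 1])
y_test = np.array([0, 0, 1, 0, 1])

# Best constant classifier: always predict the most common training label.
majority_label, _ = Counter(y_train).most_common(1)[0]
baseline_acc = np.mean(y_test == majority_label)

# Hypothetical predictions from the classifier being evaluated.
my_classifier_predictions = np.array([0, 0, 1, 1, 1])
model_acc = np.mean(y_test == my_classifier_predictions)

print(f"constant-classifier baseline accuracy: {baseline_acc:.2f}")
print(f"evaluated classifier accuracy:         {model_acc:.2f}")
```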
An intuitive classification model
• $P(y \mid x)$ is unknown, but a labeled data set is provided
• Generalize beyond the observed data…
• Guess the class of the unknown point
• What do you assume about the data?
K-nearest neighbors
• Assign the majority class among the k nearest neighbors (see the sketch below)
• K=2 ?
• K=17 ?
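A brute-force sketch of the prediction rule above (my own illustrative code, not from the course material): compute all distances to the training points, take the k closest, and return their majority label. Ties are broken arbitrarily by Counter, which is one reason odd k values are the usual choice.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3, p=2):
    """Predict the label of x_query by majority vote among its k nearest neighbors.
    p selects the Minkowski distance (p=2 Euclidean, p=1 Manhattan)."""
    # Distances from the query point to every training point.
    dists = np.sum(np.abs(X_train - x_query) ** p, axis=1) ** (1.0 / p)
    # Indices of the k closest training points.
    nearest = np.argsort(dists)[:k]
    # Majority vote among their labels.
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Tiny made-up example: two clusters with labels 0 and 1.
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [3.0, 3.2], [3.1, 2.9]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([2.9, 3.0]), k=3))  # -> 1
```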
Distance/similarity metric
• Minkowski distance
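The formula itself did not survive extraction; for reference, the standard Minkowski distance between feature vectors $x, z \in \mathbb{R}^d$ is
\[
D_p(x, z) = \left( \sum_{i=1}^{d} \lvert x_i - z_i \rvert^{p} \right)^{1/p},
\]
where $p = 1$ gives the Manhattan distance, $p = 2$ the Euclidean distance, and $p \to \infty$ the Chebyshev (max) distance.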
Minkowski distance quiz
• Which blue circle, 1 or 2, is closest to the red x?
Feature scaling
• Distance-based algorithms are affected by the scale of the variables
• Here, Income would dominate Age in any Euclidean distance simply because its numeric range is so much larger

Index   Age (X)   Income (X)   House owner (Y)
1       25        45,000       0
2       34        56,000       1
3       27        90,000       0
4       23        51,000       0
5       42        105,000      1
Feature scaling
• Min-max feature scaling: $x' = \dfrac{x - \min(x)}{\max(x) - \min(x)}$ maps each feature onto $[0, 1]$ (see the sketch below)

Before scaling:
Index   Age (X)   Income (X)   House owner (Y)
1       25        45,000       0
2       34        56,000       1
3       27        90,000       0
4       23        51,000       0
5       42        105,000      1

After scaling:
Index   Age (X)   Income (X)   House owner (Y)
1       0.1       0            0
2       0.58      0.18         1
3       0.21      0.75         0
4       0         0.1          0
5       1         1            1
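A small numpy sketch (illustrative, not from the slides) that reproduces the scaled columns above from the raw Age and Income values; House owner is already in {0, 1} and needs no scaling.

```python
import numpy as np

# Raw feature columns from the table above.
age = np.array([25, 34, 27, 23, 42], dtype=float)
income = np.array([45_000, 56_000, 90_000, 51_000, 105_000], dtype=float)

def min_max_scale(x):
    """Map a 1-D array linearly onto [0, 1]."""
    return (x - x.min()) / (x.max() - x.min())

print(np.round(min_max_scale(age), 2))     # -> [0.11, 0.58, 0.21, 0.0, 1.0]
print(np.round(min_max_scale(income), 2))  # -> [0.0, 0.18, 0.75, 0.1, 1.0]
```

(The slide's table rounds 0.105 to 0.1; the values are otherwise the same.)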
Accuracy guarantees for 1-NN
• Let $x_{NN}$ be the nearest neighbor of our test point $x$
• As $n \to \infty$, $\mathrm{dist}(x_{NN}, x) \to 0$, i.e., $x_{NN} \to x$
• This means the nearest neighbor becomes essentially identical to the test point
• 1-NN returns the label of $x_{NN}$
• What is the probability that $y_{NN} \neq y$ (binary classification)? A mistake can happen in exactly two ways:
1. $y = y^*$ but $y_{NN} \neq y^*$
2. $y \neq y^*$ but $y_{NN} = y^*$
• Good news: K-NN has impressive accuracy guarantees (the 1-NN case is worked out below)
• Bad news: only for a large enough $n$. Q: How large?
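To make the good news concrete, here is the classical asymptotic argument (the Cover and Hart bound), sketched from the two cases above: once $x_{NN} \to x$, both $y$ and $y_{NN}$ are (approximately) independent draws from $P(y \mid x)$. Writing $\epsilon(x) = 1 - P(y^* \mid x)$ for the Bayes error at $x$:
\[
P(y_{NN} \neq y \mid x)
= P(y^* \mid x)\,\bigl(1 - P(y^* \mid x)\bigr) + \bigl(1 - P(y^* \mid x)\bigr)\,P(y^* \mid x)
= 2\,\epsilon(x)\bigl(1 - \epsilon(x)\bigr) \;\le\; 2\,\epsilon(x).
\]
So, as $n \to \infty$, the 1-NN error is at most twice the Bayes optimal error, but only in that limit, which is where the sample-size question on the next slides comes in.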
Required number of samples
• Assume a 1-dimensional feature vector from $[0, 1]$
• Accurate predictions require neighbors within distance $\epsilon$
• Add $n$ samples (uniform distribution)
• An interval of size $\epsilon$ covers a fraction $\epsilon$ of the state space and should include $n\epsilon$ of the (uniform) samples in expectation
• and so we need $n \gtrsim 1/\epsilon$, e.g., $\epsilon = 0.01$ requires on the order of 100 samples just to expect one neighbor that close
Required number of samples
• Now consider a 2D feature vector: a square of side $\epsilon$ covers only $\epsilon^2$ of the state space, so we need $n \gtrsim 1/\epsilon^2$ samples
Required number of samples
• Now consider a 3D feature vector:
• A cube of side $\epsilon$ covers $\epsilon^3$ of the state space and should include $n\epsilon^3$ of the (uniform) samples in expectation (see the sanity check below)
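A quick Monte Carlo sanity check of the $n\epsilon^d$ claim (my own illustrative sketch, assuming uniform samples in the unit hypercube): count how many of $n$ random points land inside a small cube of side $\epsilon$ centered in the middle of the space and compare with $n\epsilon^d$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, eps = 100_000, 0.2

for d in (1, 2, 3, 10):
    X = rng.random((n, d))                          # n uniform samples in [0, 1]^d
    # Points inside the axis-aligned cube of side eps centered at (0.5, ..., 0.5).
    inside = np.all(np.abs(X - 0.5) <= eps / 2, axis=1)
    print(f"d={d:2d}: observed {inside.sum():6d}, expected {n * eps**d:10.2f}")
```

Already at $d = 10$ the expected count is $10^5 \cdot 0.2^{10} \approx 0.01$, so with this $n$ there is essentially no neighbor that close.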
Curse of dimensionality
• In $d$ dimensions the same argument gives $n\epsilon^d$ expected neighbors within distance $\epsilon$, so the required number of samples grows like $1/\epsilon^d$, i.e., exponentially in $d$
Curse of dimensionality
• Frequency of pairwise Euclidean distances between randomly distributed points within $d$-dimensional unit hypercubes
• As $d$ grows the distances concentrate around a common value, and K-NN becomes meaningless for large $d$ (see the sketch below)
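An illustrative sketch (not from the course material) that reproduces the effect behind the missing histogram: as the dimension grows, pairwise Euclidean distances between uniform random points cluster tightly around their mean, so the nearest and farthest neighbors become nearly indistinguishable.

```python
import numpy as np

rng = np.random.default_rng(0)
n_points = 500

for d in (2, 10, 100, 1000):
    X = rng.random((n_points, d))                  # uniform points in [0, 1]^d
    # Pairwise squared Euclidean distances via the Gram-matrix identity.
    sq = (X ** 2).sum(axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0)
    dists = np.sqrt(d2[np.triu_indices(n_points, k=1)])
    print(f"d={d:5d}: mean distance={dists.mean():6.2f}, "
          f"relative spread={dists.std() / dists.mean():.3f}")
```

The relative spread shrinks steadily with $d$, which is why the identity of the "nearest" neighbor stops carrying much information in very high dimensions.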
Curse of dimensionality
• How many features are in a 1.3MP RGB image?
• ~4M dimensions (1.3M pixels × 3 color channels)
• We would need a number of samples on the order of $1/\epsilon^{4{,}000{,}000}$
• Not enough atoms in the universe
• Say we could get all these samples, what is the computational complexity of inference? (Brute force: $O(nd)$ distance computations per query)
What did we learn?
• Classification problem
• Performance bounds (best: Bayes optimal classifier vs. worst: constant classifier)
• K-nearest neighbors
• Can reach optimal performance with enough samples and a suitable k
• “Enough samples” grows exponentially with the number of dimensions (curse of dimensionality)
• Computationally intensive for large training sets (see the note below)
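A practical aside, not part of the slides: library implementations reduce the inference cost with spatial indexes such as k-d trees or ball trees, which help in low to moderate dimensions. A minimal scikit-learn sketch, assuming scikit-learn is installed and using made-up data:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X_train = rng.random((1000, 5))                             # made-up 5-dimensional features
y_train = (X_train[:, 0] + X_train[:, 1] > 1).astype(int)   # made-up labels

# A k-d tree index avoids scanning every training point at query time (in low dimensions).
clf = KNeighborsClassifier(n_neighbors=17, algorithm="kd_tree")
clf.fit(X_train, y_train)

X_query = rng.random((3, 5))
print(clf.predict(X_query))
```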
What next?
• Class:
• Linear classifier (perceptron)
• Assignments:
• Written assignment 1: K Nearest Neighbors + Linear algebra (due Monday, Sep 13)