CHAPTER 6
Machine Learning Classification Methods
Overview
Machine Learning Classification Methods
KNN
Tell me about your friends (who your neighbours are) and I will tell you who you are.
KNN
• K-Nearest Neighbours
• Memory-Based Reasoning
• Example-Based Reasoning
• Instance-Based Learning
• Lazy Learning
KNN
What is it?
• The k-nearest neighbours algorithm stores all available cases and classifies new cases based on a similarity measure (e.g., a distance function).
Classification Approach
Distance Functions
k
• Euclidean i i
( x
i =1
− y ) 2
• Manhattan x
i =1
i − yi
1
k
( xi − yi )
q q
• Minkowski
i =1
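As a small illustration of these formulas, here is a minimal Python sketch; the example vectors are arbitrary, not values from the slides:

import numpy as np

def euclidean(x, y):
    # square root of the sum of squared coordinate differences
    return np.sqrt(np.sum((x - y) ** 2))

def manhattan(x, y):
    # sum of absolute coordinate differences
    return np.sum(np.abs(x - y))

def minkowski(x, y, q=3):
    # generalises both: q = 1 gives Manhattan, q = 2 gives Euclidean
    return np.sum(np.abs(x - y) ** q) ** (1.0 / q)

x = np.array([3.0, 4.0, 1.0])
y = np.array([0.0, 0.0, 1.0])
print(euclidean(x, y), manhattan(x, y), minkowski(x, y, q=2))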
KNN
Distance Between Neighbours
• Calculate the distance between new example (E) and all examples in the
training set
• Euclidean distance between two examples.
– X = [x1,x2,x3,..,xn]
– Y = [y1,y2,y3,...,yn]
– The Euclidean distance between X and Y is defined as:
D(X, Y) = sqrt( Σ_{i=1..n} (xi − yi)² )
KNN
K-Nearest Neighbour Algorithm
• All the instances correspond to points in an n-dimensional feature space.
• Each instance is represented with a set of numerical attributes.
• The training data consist of feature vectors, each with an associated class label.
• Classification is done by comparing the feature vector of the new example with those of the K nearest training points.
• Select the K-nearest examples to E in the training set.
• Assign E to the most common class among its K-nearest neighbours.
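A minimal from-scratch sketch of this procedure, with made-up toy data for illustration:

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    # Euclidean distance from x_new to every training example
    dists = np.sqrt(np.sum((X_train - x_new) ** 2, axis=1))
    # indices of the k smallest distances
    nearest = np.argsort(dists)[:k]
    # assign the most common class label among those neighbours
    return Counter(y_train[nearest]).most_common(1)[0][0]

X_train = np.array([[1.0, 1.0], [2.0, 2.0], [6.0, 5.0], [7.0, 6.0], [5.0, 7.0]])
y_train = np.array([0, 0, 1, 1, 1])
print(knn_predict(X_train, y_train, np.array([6.0, 6.0]), k=3))  # -> 1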
KNN
3-KNN: Example (1)
[Figure: a worked 3-nearest-neighbour classification example]
KNN
How to choose K?
• If K is too small, the classifier is sensitive to noise points.
• A larger K works well, but if K is too large the neighbourhood may include many points from other classes.
• The K-nearest neighbours of a record x are the data points that have the K smallest distances to x.
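One common way to pick K is cross-validation; a sketch using scikit-learn's GridSearchCV on hypothetical data (the candidate K values below are illustrative) might look like this:

import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV

# hypothetical data; in practice use your own training set
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# try several values of K and keep the one with the best cross-validated accuracy
search = GridSearchCV(KNeighborsClassifier(), {"n_neighbors": [1, 3, 5, 7, 9, 11]}, cv=5)
search.fit(X, y)
print(search.best_params_)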
KNN
KNN Feature Weighting
• Scale each feature by its importance for classification
• Can use our prior knowledge about which features are more important
• Can learn the weights w_k using cross-validation
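A minimal sketch of a feature-weighted Euclidean distance; the features and weights below are hypothetical, chosen only to show how a weight can down-scale a dominant attribute:

import numpy as np

def weighted_euclidean(x, y, w):
    # Euclidean distance with per-feature weights w_k reflecting feature importance
    return np.sqrt(np.sum(w * (x - y) ** 2))

x = np.array([1.0, 50000.0])   # e.g. [age, income]
y = np.array([2.0, 51000.0])
w = np.array([1.0, 1e-6])      # hypothetical weights down-weighting the income scale
print(weighted_euclidean(x, y, w))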
KNN
Feature Normalization
• Distance between neighbours could be dominated by attributes with relatively large values (e.g., the income of customers in our previous example).
For nominal attributes: D = 0 if x = y, and D = 1 if x ≠ y.
X      Y        Distance
Male   Male     0
Male   Female   1
KNN
KNN Classification
KNN
KNN Classification – Distance
KNN
KNN Classification – Standardized Distance
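The worked tables for these slides are not reproduced in the text. As a sketch of the idea, features can be z-score standardized before computing distances so that no single attribute dominates; the customer values below are made up:

import numpy as np

def standardize(X):
    # z-score each column so no attribute dominates the distance
    return (X - X.mean(axis=0)) / X.std(axis=0)

# hypothetical customers: [age, income]
X = np.array([[25.0, 40000.0],
              [35.0, 60000.0],
              [45.0, 80000.0]])
Xs = standardize(X)
# after standardization both attributes contribute comparably to Euclidean distance
print(np.sqrt(np.sum((Xs[0] - Xs[1]) ** 2)))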
KNN
Strengths of KNN
Weaknesses of KNN
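The KNN part of the slides ends with the printed output of a code demo whose source is not included; a hypothetical scikit-learn snippet that yields a prediction vector of this form (toy data invented for illustration) might be:

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# hypothetical toy data: two class-0 points and three class-1 points
X = np.array([[1, 1], [6, 5], [7, 6], [5, 7], [2, 2]])
y = np.array([0, 1, 1, 1, 0])

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)
print(knn.predict(X))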
Output: [0 1 1 1 0]
Overview
Machine Learning Classification Methods
SVM
Introduction
Given N labeled empirical data:
(x1, y1), ..., (xN, yN) ∈ X × {−1, +1}
where X ⊆ R^D is the set of input data and yi are the class labels.
[Figure: the two classes (yi = +1, yi = −1) plotted in domain X over features x1 and x2, with class centres c1 and c2]
SVM
[Figure: the decision plane w·x + b = 0 with normal vector w separating the two classes (yi = +1, yi = −1) in domain X, plotted over x1 and x2, with class centres c1 and c2]
SVM
Linear Classifiers
A linear classifier maps an input x to an estimated label y_est via f(x, w, b) = sign(w·x + b):
• points with w·x + b > 0 are classified as +1
• points with w·x + b < 0 are classified as −1
SVM
Linear Classifiers
Many different hyperplanes of the form f(x, w, b) = sign(w·x + b) can separate the same training data, and a poorly chosen one may misclassify points (e.g., assign them to the +1 class). Which boundary is best?
SVM
Classifier Margin
For a linear classifier f(x, w, b) = sign(w·x + b), the margin is the width by which the decision boundary could be increased before hitting a training data point.
Linear SVM
SVM
Good Decision Boundary: Margin Should Be Large
The decision boundary should be as far away from the data of both classes as possible; we should maximize the margin, m.
[Figure: Class 1 and Class 2 separated by a decision boundary with margin m]
SVM
The hyperplanes w·x + b = +1 and w·x + b = −1 lie on either side of the decision plane w·x + b = 0, and the distance between them (the margin) is 2/||w||. Therefore, maximizing the margin is equivalent to minimizing ||w||².
[Figure: the three parallel hyperplanes plotted over x1 and x2]
SVM
The Optimization Problem
• Let {x1, ..., xn} be our data set and let yi ∈ {+1, −1} be the class label of xi.
SVM
The Optimization Problem
• We can transform the problem to its dual using Lagrange multipliers αi.
• w can be recovered from the dual solution as w = Σ_i αi yi xi.
SVM
Characteristics of the Solution
• Many of the αi are zero: w is a linear combination of only a small number of training points.
• The xi with non-zero αi are called the support vectors.
• To classify a new example z, compute w·z + b = Σ_i αi yi (xi·z) + b and assign the class sign(w·z + b).
SVM
A Geometrical Interpretation
[Figure: the separating boundary between Class 1 and Class 2; only a few points have non-zero multipliers (α1 = 0.8, α6 = 1.4, α8 = 0.6), and these are the support vectors; all other αi = 0]
SVM
If Nonlinear?
If the data are not linearly separable, map them into a higher-dimensional feature space through a kernel function.
SVM
• Polynomial kernel: K(xi, xj) = (xi · xj + 1)²
• Radial Basis Function (RBF): K(xi, xj) = exp( −||xj − xi||² / (2σ²) )
• Laplace: K(xi, xj) = exp( −||xj − xi|| / σ )
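A small Python sketch of these kernel functions; σ and the polynomial degree shown are illustrative defaults, not values from the slides:

import numpy as np

def polynomial_kernel(xi, xj, degree=2):
    # (xi . xj + 1)^degree
    return (np.dot(xi, xj) + 1) ** degree

def rbf_kernel(xi, xj, sigma=1.0):
    # exp(-||xj - xi||^2 / (2 * sigma^2))
    return np.exp(-np.sum((xj - xi) ** 2) / (2 * sigma ** 2))

def laplace_kernel(xi, xj, sigma=1.0):
    # exp(-||xj - xi|| / sigma)
    return np.exp(-np.sqrt(np.sum((xj - xi) ** 2)) / sigma)

xi, xj = np.array([1.0, 2.0]), np.array([2.0, 0.0])
print(polynomial_kernel(xi, xj), rbf_kernel(xi, xj), laplace_kernel(xi, xj))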
SVM
Dataset with noise
• Hard margin: require that all data points be correctly classified (no training error). With a noisy dataset this leads to OVERFITTING!
SVM
Hard Margin vs. Soft Margin
• The hard-margin formulation: find w and b such that Φ(w) = ½ wTw is minimized and, for all {(xi, yi)}, yi (wTxi + b) ≥ 1.
• The soft-margin formulation introduces slack variables ξi ≥ 0 that allow some training error: minimize Φ(w) = ½ wTw + C Σ_i ξi subject to yi (wTxi + b) ≥ 1 − ξi, where C trades off margin width against training error.
Strengths
Weaknesses
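As with KNN, the deck ends the SVM part with the printed output of a code demo whose source is not included; a hypothetical scikit-learn snippet producing a prediction of that form (same invented toy data as before) might be:

import numpy as np
from sklearn.svm import SVC

# hypothetical linearly separable toy data
X = np.array([[1, 1], [6, 5], [7, 6], [5, 7], [2, 2]])
y = np.array([0, 1, 1, 1, 0])

clf = SVC(kernel="linear", C=1.0)  # soft-margin linear SVM
clf.fit(X, y)
print(clf.predict(X))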
Output: [0 1 1 1 0]
References
https://www.slideshare.net/tilanigunawardena/k-nearest-neighbors
https://sikoried.github.io/sequence-learning/09-sequence-kernels/intro_svm_new.pdf