ML.4-Classification Techniques (Week 5, 6, 7)
Chapter 4
Classification Techniques
Machine Learning
CONTENTS
• Classification Problems
• Classification Algorithms
  ○ K-Nearest Neighbors
  ○ Naïve Bayes Classification
  ○ Decision Tree
  ○ Support Vector Machines
• Metrics to Measure Classification Performance
Classification Problem
➢ Classification requires a training dataset with many examples of inputs and outputs from which to learn.
➢ A ML model uses the training dataset to learn how to best map examples of input data to specific class labels.
➢ The training dataset must be sufficiently representative of the problem and contain many examples of each class label.
Binary Classification
➢ Binary classification classifies the input data into two mutually exclusive categories.
➢ The training data in such a situation is labeled in a binary format: true and false; positive and negative; 0 and 1; spam and not spam; etc., depending on the problem being tackled.
Multi-Class Classification
➢ Multi-class classification assigns each data sample one and only one label from a set of more than two classes.
Multi-Label Classification
➢ Multi-label classification refers to predicting zero or more labels for each data sample.
➢ Examples: auto-tagging in Natural Language Processing, where a given text can cover multiple topics, or computer vision, where an image can contain multiple objects. A sketch of the three target formats follows below.
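To make the three problem types concrete, here is a small sketch of what the target labels typically look like in each case; the toy values are illustrative assumptions, not data from the slides.

```python
# Illustrative target-label formats for the three classification problem types.
y_binary     = [0, 1, 1, 0]                          # exactly one of two classes
y_multiclass = ["apple", "mango", "pear", "apple"]   # exactly one of >2 classes
y_multilabel = [[1, 0, 1],                           # zero or more labels per sample,
                [0, 0, 0],                           # encoded as a binary indicator
                [1, 1, 0],                           # matrix (one column per label)
                [0, 1, 1]]
```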
K-Nearest Neighbor (K-NN)
➔ Given training data D = {(x1, y1), ..., (xN, yN)} and a test point x
➔ Prediction rule: look at the K most similar training examples
➔ For classification: assign the majority class label (majority voting)
➔ For regression: assign the average response
➔ The algorithm requires (see the sketch below):
◆ Parameter K: the number of nearest neighbors to consider
◆ A distance function to measure similarity between examples
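A minimal sketch of this prediction rule in Python with NumPy; the helper name knn_predict and the toy data are illustrative assumptions, not from the slides.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_test, k=3):
    """Classify x_test by majority vote among its k nearest training points."""
    # Euclidean distance from the test point to every training point
    dists = np.sqrt(((X_train - x_test) ** 2).sum(axis=1))
    # Indices of the k closest training points
    nearest = np.argsort(dists)[:k]
    # Majority vote over the neighbors' labels
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy example
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([4.9, 5.1]), k=3))  # -> 1
```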
K-Nearest Neighbor (K-NN): Computing the Distance
➔ Similarity between examples is measured with a distance function; the Euclidean distance, d(x, x') = sqrt(Σi (xi − x'i)²), is the most commonly used.
K-Nearest Neighbor (K-NN): Choice of K
➔ Small K
◆ Creates many small regions for each class
◆ May lead to non-smooth decision boundaries and overfitting
➔ Large K
◆ Creates fewer, larger regions
◆ Usually leads to smoother decision boundaries (caution: a too-smooth decision boundary can underfit)
➔ Choosing K
◆ Often data-dependent and heuristic-based
◆ Or done with cross-validation (using some held-out data), as in the sketch below
◆ In general, a K that is too small or too big is bad!
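A minimal sketch of choosing K by cross-validation with scikit-learn; the dataset and the candidate values of K are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

best_k, best_score = None, -1.0
for k in [1, 3, 5, 7, 9, 11]:
    # 5-fold cross-validated accuracy for this value of K
    score = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    if score > best_score:
        best_k, best_score = k, score

print(f"best K = {best_k}, CV accuracy = {best_score:.3f}")
```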
K-Nearest Neighbor (K-NN): Properties
➔ What's nice
◆ Simple and intuitive; easily implementable
◆ Asymptotically consistent (a theoretical property)
● With infinite training data and a large enough K, K-NN approaches the best possible classifier (Bayes optimal)
➔ What's not so nice
◆ Stores all the training data in memory, even at test time
● Can be memory-intensive for large training datasets
● An example of non-parametric, or memory/instance-based, methods
● Different from parametric, model-based learning models
◆ Expensive at test time: O(ND) computations for each test point
● Has to search through all training data to find the nearest neighbors
● Distance computations with N training points (D features each)
◆ Sensitive to noisy features
Naïve Bayes Classification: Rule
➔ Bayes Rule: P(A|B) = P(B|A) · P(A) / P(B)
➔ Prior: P(A), the probability of the hypothesis before observing the evidence
➔ Posterior: P(A|B), the probability of the hypothesis after observing the evidence B
➔ Example: What is P(flu|cough) = P(A|B)?
◆ By Bayes rule, P(flu|cough) = P(cough|flu) · P(flu) / P(cough)
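A small worked version of the flu/cough question in Python; the probability values are made-up assumptions purely for illustration.

```python
# Assumed (illustrative) probabilities -- not values given in the slides.
p_flu = 0.05                # prior P(flu)
p_cough_given_flu = 0.90    # likelihood P(cough | flu)
p_cough = 0.20              # evidence P(cough)

# Bayes rule: P(flu | cough) = P(cough | flu) * P(flu) / P(cough)
p_flu_given_cough = p_cough_given_flu * p_flu / p_cough
print(f"P(flu | cough) = {p_flu_given_cough:.3f}")  # -> 0.225
```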
Naïve Bayes Classification: in a Nutshell
➔ Bayes Rule: P(Y|X) ∝ P(X|Y) · P(Y)
➔ Naïve assumption: the features are conditionally independent given the class, so P(X|Y) = Πi P(xi|Y)
➔ Prediction: choose the class y that maximizes P(y) · Πi P(xi|y)
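A minimal Naïve Bayes classifier sketch using scikit-learn's Gaussian variant; the dataset choice is an illustrative assumption.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Fitting estimates the class priors P(y) and per-class Gaussians for P(x_i | y)
model = GaussianNB().fit(X_tr, y_tr)
print("test accuracy:", model.score(X_te, y_te))
```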
Decision Tree
➔ Example: learn the concept PlayTennis (i.e., decide whether our friend will play tennis or not on a given day)
[Table: a simple training dataset with attributes Outlook, Temp, Humidity, Wind and the label PlayTennis]
➔ Each internal node: tests one (discrete-valued) attribute Xi
➔ Each branch from a node: corresponds to one possible value of Xi
➔ Each leaf node: predicts Y (or P(Y=1|x ∊ leaf))
➔ Example: a decision tree for f: PlayTennis?
➔ Suppose X = <x1, ..., xn>, where the xi are boolean-valued variables
➔ How would you represent boolean functions of these variables as decision trees?
➔ Input: training labeled examples {(x(i), y(i))} of an unknown target function f
◆ Examples are described by their values on some set of features or attributes
● e.g., 4 attributes: Humidity, Wind, Outlook, Temp
● Set of possible instances X (a.k.a. the instance space)
◆ Unknown target function f : X → Y
● e.g., Y = {0, 1} is the label space
● e.g., y = 1 if we play tennis on this day, else 0
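A minimal decision-tree sketch for a PlayTennis-style dataset with scikit-learn; the rows and labels below are illustrative assumptions, not the slides' actual table.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier, export_text

data = pd.DataFrame({
    "Outlook":  ["Sunny", "Sunny",  "Overcast", "Rain", "Rain",   "Overcast"],
    "Temp":     ["Hot",   "Mild",   "Hot",      "Mild", "Cool",   "Cool"],
    "Humidity": ["High",  "High",   "Normal",   "High", "Normal", "Normal"],
    "Wind":     ["Weak",  "Strong", "Weak",     "Weak", "Strong", "Strong"],
})
y = [0, 0, 1, 1, 0, 1]  # PlayTennis: 1 = yes, 0 = no (illustrative labels)

# Encode the categorical attributes as integers before fitting
X = OrdinalEncoder().fit_transform(data)

tree = DecisionTreeClassifier(criterion="entropy").fit(X, y)
print(export_text(tree, feature_names=list(data.columns)))
```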
Support Vector Machines: Geometric Margin
➔ Definition: The margin of an example x w.r.t. a linear separator w is the distance from x to the plane w ⋅ x = 0.
➔ Definition: The margin γw of a set of examples S w.r.t. a linear separator w is the smallest margin over points x ∈ S.
➔ Definition: The margin γ of a set of examples S is the maximum γw over all linear separators w.
➔ Directly optimize for the maximum margin separator: SVMs
◆ First, assume we know a lower bound on the margin γ: find a unit-length w such that yi (w ⋅ xi) ≥ γ for every labeled example (xi, yi)
◆ Then search for the best possible γ, e.g., by trying increasing values of γ
◆ Equivalently, after rescaling w, this becomes the standard SVM objective function: minimize ||w||² subject to yi (w ⋅ xi) ≥ 1 for all i
◆ The resulting separator w attains the maximum geometric margin γ = 1 / ||w||, as the sketch below checks numerically
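A minimal maximum-margin classifier sketch using scikit-learn's linear SVM; the toy data are an illustrative assumption, and the large C value approximates the hard-margin formulation above.

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 2.0],    # class -1
              [4.0, 4.0], [5.0, 4.5], [4.5, 5.0]])   # class +1
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)  # large C ~ hard-margin SVM

w, b = clf.coef_[0], clf.intercept_[0]
print("w =", w, "b =", b)
print("geometric margin = 1/||w|| =", 1.0 / np.linalg.norm(w))
```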
Metrics to measure Classification Performance
➢ Confusion Matrix
➢ Accuracy
➢ Precision
➢ Recall
➢ F1 score
Confusion Matrix
➢ A confusion matrix describes the performance of a classification model on a set of test data for which the true values (actual classes) are known.
➢ Binary-class classification:

                Predicted: Yes        Predicted: No
Actual: Yes     True Positive (TP)    False Negative (FN)
Actual: No      False Positive (FP)   True Negative (TN)
Example
➢ A confusion matrix for a binary classifier in disease diagnosis, in which "yes" means the person has the disease and "no" means they don't.
Confusion Matrix
➢ Multi-class classification: the matrix generalizes to an n×n table, with one row/column per class.
➢ Multi-label classification: a separate 2×2 confusion matrix can be computed for each label.
The role of the confusion matrix
➢ It evaluates the performance of classification models when they make predictions on test data.
➢ It measures how good the generated classification model is.
➢ It is used to calculate different metrics for evaluating the model, such as accuracy, precision, etc., as in the sketch below.
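A minimal sketch of computing a confusion matrix with scikit-learn; the label vectors are illustrative assumptions.

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Rows are actual classes, columns are predicted classes
print(confusion_matrix(y_true, y_pred))
# [[3 1]     <- TN=3, FP=1
#  [1 3]]    <- FN=1, TP=3
```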
Accuracy
➢ Accuracy is the ratio of the number of correct predictions to the total number of predictions.
Accuracy in Binary Classification
➢ Accuracy = (TP + TN) / (TP + TN + FP + FN)
Accuracy in Multi-class Classification
➢ Accuracy = (sum of the diagonal of the confusion matrix, i.e., the correctly predicted samples of every class) / (total number of samples)
Accuracy in Multilabel Classification
➢ For multi-label outputs, accuracy is typically computed per label and averaged, or as exact-match (subset) accuracy over whole label sets.
When to use Accuracy?
➢ Use the accuracy metric when the target variable classes in the data are approximately balanced (unbiased).
➢ In the case of imbalanced (biased) data, where one class is much larger than another, accuracy can be highly misleading.
Example
➢ There is a model for disease prediction in which, out of 100 persons, only 5 have the disease and 95 don't. If the classifier predicts "no disease" for everyone, the accuracy is 95% (TP = 0; TN = 95; FP = 0; FN = 5; all examples = 100), yet the model never detects a single case => accuracy is misleading here.
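The same all-"no disease" classifier in code; the label vectors mirror the counts above, and recall (introduced below) exposes what accuracy hides.

```python
from sklearn.metrics import accuracy_score, recall_score

y_true = [1] * 5 + [0] * 95   # 5 diseased, 95 healthy
y_pred = [0] * 100            # predict "no disease" for everyone

print(accuracy_score(y_true, y_pred))  # 0.95 -- looks great...
print(recall_score(y_true, y_pred))    # 0.0  -- ...but no diseased person is found
```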
Example
➢ 60% of the images in a fruit dataset are of Apple and 40% are of Mango (approximately balanced data).
➢ If the classifier predicts whether an image is of Apple or Mango, a prediction with 97% accuracy is credible.
Precision
➢ Precision measures how good the model is at correctly identifying the positive class:
➢ Precision = TP / (TP + FP)
Precision in Multi-class Classification
➢ Per-class precision is read off the confusion matrix (actual labels vs. predicted labels): for class i, it is the number of correctly predicted samples of class i divided by the total number of samples predicted as class i.
Recall
➢ Recall measures how good the model is at correctly predicting all the positive observations in the dataset:
➢ Recall = TP / (TP + FN)
Recall in Multi-class Classification
➢ Per-class recall is read off the confusion matrix (actual labels vs. predicted labels): for class i, it is the number of correctly predicted samples of class i divided by the total number of actual samples of class i.
F1 Score
➢ F1 Score (F-measure) is the harmonic mean of precision and recall:
➢ F1 = 2 · (Precision · Recall) / (Precision + Recall)
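A minimal sketch computing all four metrics with scikit-learn, reusing the illustrative labels from the confusion-matrix example above (TP = 3, TN = 3, FP = 1, FN = 1).

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print("accuracy :", accuracy_score(y_true, y_pred))   # (TP+TN)/all = 6/8 = 0.75
print("precision:", precision_score(y_true, y_pred))  # TP/(TP+FP) = 3/4 = 0.75
print("recall   :", recall_score(y_true, y_pred))     # TP/(TP+FN) = 3/4 = 0.75
print("f1 score :", f1_score(y_true, y_pred))         # harmonic mean = 0.75
```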
SUMMARY
• Classification Problems
• Classification Algorithms
  ○ K-Nearest Neighbors
  ○ Naïve Bayes Classification
  ○ Decision Tree
  ○ Support Vector Machines
• Metrics to Measure Classification Performance
Humanity – Service – Liberation