08 Classifier Evaluation
• Useful for lower-level tasks and debugging (e.g., diagnosing bias vs.
variance).
• Point metrics:
• Accuracy
• Precision, Recall, F-score
• Sensitivity, Specificity
                      PREDICTED CLASS
                      Class=Yes   Class=No
ACTUAL    Class=Yes   a (TP)      b (FN)
CLASS     Class=No    c (FP)      d (TN)

a: TP (true positive)     b: FN (false negative)
c: FP (false positive)    d: TN (true negative)
Metrics for Performance Evaluation
• Consider an ML model that classifies passengers as COVID positive or negative.
• True Positive (TP): a passenger classified as COVID positive who is
actually positive.
• False Negative (FN): a passenger classified as COVID negative who is
actually positive.
• True Negative (TN): a passenger classified as COVID negative who is
actually negative.
• False Positive (FP): a passenger classified as COVID positive who is
actually negative.
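As a minimal sketch of the four cells, the counts can be tallied directly from label/prediction pairs (the data below is invented; 1 = COVID positive, 0 = negative):

```python
# Tallying TP, FN, FP, TN by hand for a binary screening example.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]  # actual status (invented)
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]  # classifier output (invented)

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
print(tp, fn, fp, tn)  # 3 1 1 3
```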
Type I and Type II Errors
Performance Evaluation Primitives
TPR = TP / P = TP / (TP + FN) = 1 − FNR

FPR = FP / N = FP / (FP + TN) = 1 − TNR

FNR = FN / P = FN / (TP + FN) = 1 − TPR

TNR = TN / N = TN / (TN + FP) = 1 − FPR
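The four rates and their complements can be checked numerically. Using invented counts (TP=3, FN=1, FP=1, TN=3):

```python
# Performance-evaluation primitives from confusion-matrix counts.
tp, fn, fp, tn = 3, 1, 1, 3   # invented example counts
p = tp + fn                    # actual positives
n = fp + tn                    # actual negatives

tpr = tp / p   # true positive rate (sensitivity)
fpr = fp / n   # false positive rate
fnr = fn / p   # false negative rate
tnr = tn / n   # true negative rate

# Complement identities: TPR = 1 - FNR, FPR = 1 - TNR
print(tpr, fpr, fnr, tnr)  # 0.75 0.25 0.25 0.75
```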
Accuracy
                      PREDICTED CLASS
                      Class=Yes   Class=No
ACTUAL    Class=Yes   a (TP)      b (FN)
CLASS     Class=No    c (FP)      d (TN)

Accuracy = (a + d) / (a + b + c + d) = (TP + TN) / (TP + TN + FP + FN)
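With the same invented counts as above, accuracy is simply the fraction of correct predictions:

```python
# Accuracy = (TP + TN) / (TP + TN + FP + FN), using invented counts.
tp, fn, fp, tn = 3, 1, 1, 3
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(accuracy)  # 0.75
```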
Limitations of Accuracy Metric
Q: Which model will you ship in a recruiting agency?
Weighted Accuracy: WA = (w1·TP + w4·TN) / (w1·TP + w2·FN + w3·FP + w4·TN)

F-score: F = 2pr / (p + r) = 2·TP / (2·TP + FN + FP)
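As a check that the two forms of the F-score agree, here is the harmonic-mean form next to the count form (same invented counts as before):

```python
# F-score two ways: harmonic mean of p and r vs. direct count form.
tp, fn, fp, tn = 3, 1, 1, 3   # invented example counts
precision = tp / (tp + fp)
recall = tp / (tp + fn)

f_harmonic = 2 * precision * recall / (precision + recall)
f_counts = 2 * tp / (2 * tp + fn + fp)
print(f_harmonic, f_counts)  # both 0.75
```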
Cost-Sensitive Measures
• Precision is biased towards C(Yes|Yes) & C(Yes|No)
• Recall is biased towards C(Yes|Yes) & C(No|Yes)
Sensitivity = TPR = Recall = TP / P = TP / (FN + TP)

Specificity = TNR = TN / N = TN / (FP + TN)
Matthews Correlation Coefficient
• a specific case of a linear correlation coefficient (Pearson r) for a
binary classification setting
• useful in unbalanced class settings
• MCC is bounded in the range [−1, 1]: +1 denotes perfect correlation
between ground truth and predicted outcome, −1 denotes inverse
(negative) correlation, and 0 denotes a random prediction.
MCC = (TP·TN − FP·FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN))
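A direct transcription of the formula, again on the invented counts used earlier:

```python
# Matthews correlation coefficient from confusion-matrix counts.
import math

tp, tn, fp, fn = 3, 3, 1, 1   # invented example counts
mcc = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)
print(mcc)  # 0.5
```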
Confusion Matrix for Multi-Class Settings
• Confusion matrices are traditionally used for binary class problems,
but they readily generalize to multi-class settings.
• Precision for class A = TP_A / (TP_A + FP_A)
• Recall for class A = TP_A / (TP_A + FN_A)
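In a multi-class confusion matrix (rows = actual class, columns = predicted class), TP for a class is the diagonal cell, FP the rest of its column, and FN the rest of its row. A sketch with an invented 3-class matrix:

```python
# Per-class precision and recall from a multi-class confusion matrix.
cm = [
    [5, 1, 0],   # actual A
    [2, 6, 2],   # actual B
    [0, 1, 7],   # actual C
]
classes = ["A", "B", "C"]
precision, recall = {}, {}
for i, name in enumerate(classes):
    tp = cm[i][i]
    fp = sum(cm[r][i] for r in range(len(cm))) - tp  # column sum minus TP
    fn = sum(cm[i]) - tp                             # row sum minus TP
    precision[name] = tp / (tp + fp)
    recall[name] = tp / (tp + fn)
print(precision["A"], recall["A"])  # 5/7 and 5/6
```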
Macro-Average of Precision and Recall
https://www.evidentlyai.com/classification-metrics/multi-class-metrics
Micro-Average of Precision and Recall
https://www.evidentlyai.com/classification-metrics/multi-class-metrics
Micro- vs. Macro-Averaging
• A macro-average computes the metric independently for each class and
then takes the average, so it treats each class equally.
• Giving each class equal weight is useful when all classes are equally
important and you want to know how well the classifier performs on
average across them.
• It is also useful when you have an imbalanced dataset and want each
class to contribute equally to the final evaluation.
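The difference between the two averaging schemes can be seen on a small example (same invented 3-class matrix as before): macro-averaging takes the mean of the per-class precisions, while micro-averaging pools all TP and FP counts first.

```python
# Macro- vs. micro-averaged precision on an invented 3-class matrix
# (rows = actual, columns = predicted).
cm = [
    [5, 1, 0],
    [2, 6, 2],
    [0, 1, 7],
]
k = len(cm)

# Macro: per-class precision, then unweighted mean across classes.
per_class = [cm[i][i] / sum(cm[r][i] for r in range(k)) for i in range(k)]
macro_precision = sum(per_class) / k

# Micro: pool counts first -- total TP over total predictions.
micro_precision = sum(cm[i][i] for i in range(k)) / sum(map(sum, cm))

print(macro_precision, micro_precision)
```

Note that for single-label multi-class problems, micro-averaged precision reduces to overall accuracy.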
• Logistic Regression
• Classify as class 1 if output >= 0.5
• Classify as class 0 if output < 0.5
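The thresholding rule above can be sketched as follows (the raw scores are invented; the sigmoid maps them to probabilities before the 0.5 cut-off is applied):

```python
# Turning logistic-regression outputs into class labels at threshold 0.5.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

scores = [-2.0, -0.3, 0.0, 0.8, 3.1]            # invented raw outputs (logits)
probs = [sigmoid(z) for z in scores]
labels = [1 if p >= 0.5 else 0 for p in probs]  # class 1 iff p >= 0.5
print(labels)  # [0, 0, 1, 1, 1]
```

Moving the threshold away from 0.5 trades recall against precision, which is what the ROC-style rates earlier in this section measure.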