Presentation On Classification
Classification(2)
Section 8.5.1-8.5.3
K Saiveer – 30121101.
8.5.1 Metrics for Evaluating Classifier Performance
• In the holdout method, a given dataset is partitioned into two disjoint sets, called the training set and the testing set.
• The classifier is learned from the training set and evaluated on the testing set.
• The proportion between the two sets is at the analyst's discretion, typically 1:1 or 2:1, and there is a trade-off between their sizes.
• If the training set is too large, the model may be good enough, but the accuracy estimate is less reliable because the testing set is small, and vice versa.
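The holdout split described above can be sketched in Python. This is a minimal illustration; the function name, the 2:1 ratio, and the fixed seed are choices made for the example, not prescribed by the text:

```python
import random

def holdout_split(data, train_ratio=2/3, seed=42):
    """Partition data into two disjoint sets: training and testing.

    train_ratio = 2/3 reflects the common 2:1 proportion mentioned
    above; the exact choice is at the analyst's discretion.
    """
    rng = random.Random(seed)
    shuffled = data[:]              # copy so the original list is untouched
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

data = list(range(30))
train, test = holdout_split(data)
print(len(train), len(test))        # → 20 10
```

Note that the two returned sets are disjoint and together cover the whole dataset, which is the defining property of the holdout partition.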
Random Subsampling
• It is a variation of the holdout method that overcomes the drawback of over-representing a class in one set (and thus under-representing it in the other), and vice versa.
• In this method, the holdout split is repeated k times, and each time the two disjoint sets are chosen at random with predefined sizes.
• k-fold cross-validation
• N-fold cross-validation
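Random subsampling can be sketched by repeating the holdout split k times with fresh random partitions. A minimal sketch, assuming a default of k = 5 repetitions and a 2:1 split (both are illustrative choices, not from the text):

```python
import random

def random_subsampling(data, k=5, train_ratio=2/3, seed=0):
    """Repeat a random holdout split k times.

    Each repetition draws a fresh random partition into two
    disjoint sets of predefined sizes; the classifier's accuracy
    would then be averaged over the k runs.
    """
    rng = random.Random(seed)
    splits = []
    for _ in range(k):
        shuffled = data[:]
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_ratio)
        splits.append((shuffled[:cut], shuffled[cut:]))
    return splits

splits = random_subsampling(list(range(30)))
print(len(splits))                  # → 5
```

Averaging the accuracy over the k random partitions reduces the chance that one unlucky split over- or under-represents a class.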
k-fold Cross-Validation
• A dataset consisting of N tuples is divided into k (usually 5 or 10) equal, mutually exclusive parts or folds; if N is not divisible by k, the last fold has fewer tuples than the other (k-1) folds.
• A series of k runs is carried out with this decomposition; in the ith iteration, fold i is used as the test data and the other folds as the training data.
• Thus, each tuple is used the same number of times (k-1) for training and exactly once for testing.
[Diagram: the dataset is partitioned into folds D1, …, Dk; in each run, fold Di is held out as test data while the remaining folds are fed to the learning technique to build the classifier, whose accuracy is then measured.]
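The k-fold partitioning above can be sketched as follows. This is an illustrative index-splitting routine (function name is an assumption); when N is not divisible by k, the last fold comes out smaller, as described:

```python
import math

def k_fold_splits(data, k):
    """Yield (training set, test fold) pairs for k-fold cross-validation.

    The first k-1 folds each hold ceil(N/k) tuples; the last fold
    takes whatever remains, so it is smaller when N % k != 0.
    """
    size = math.ceil(len(data) / k)
    folds = [data[i * size:(i + 1) * size] for i in range(k)]
    for i in range(k):
        test = folds[i]
        # training data = all folds except fold i
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test

for train, test in k_fold_splits(list(range(10)), 3):
    print(len(train), len(test))    # → 6 4 / 6 4 / 8 2
```

Across the k runs, every tuple appears exactly once in a test fold and k-1 times in training data, matching the property stated above.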
N-fold Cross-Validation
• In the k-fold cross-validation method, only part of the given data is used for training in each of the k runs.
• Here, by contrast, the dataset is divided into as many folds as there are instances; almost every tuple takes part in every training set, and N classifiers are built.
• Each of the N classifiers is therefore built from N-1 instances, and each tuple is used to classify a single test instance.
• The test sets are mutually exclusive and effectively cover the entire dataset (in sequence). The effect is as if the classifier were both trained on the entire dataset and tested on the entire dataset.
• In practice, the method is beneficial mainly for very small datasets, where as much data as possible needs to be used to train the classifier.
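N-fold cross-validation (also known as leave-one-out) reduces to a simple loop: each instance in turn is the test set, and the remaining N-1 instances form the training set. A minimal sketch (function name is an assumption for the example):

```python
def leave_one_out(data):
    """N-fold cross-validation: yield (training set, test instance)
    pairs, one per instance.

    Each of the N classifiers would be trained on the N-1 remaining
    instances and used to classify the single held-out instance.
    """
    for i in range(len(data)):
        test = data[i]
        train = data[:i] + data[i + 1:]
        yield train, test

for train, test in leave_one_out([10, 20, 30]):
    print(train, test)
```

Since every instance is held out exactly once and the training sets are as large as possible, this is the scheme of choice when the dataset is very small.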
Thank you.