Ch4 Supervised Learning
• Linear regression
• Process of Classification
• Decision Tree (DT) Approach for Classification
• DT Algorithm
• Attribute Selection Measures
• Tree Pruning for Solving Model Overfitting
• Evaluating Performance of Decision Trees
• Decision Tree Induction in Weka
Process of Classification
[Diagram: the dataset is split 50:50 into a training set and a test set; an initial model is built from the training set, its accuracy is measured on the test set, the model is refined, and the final model is reported with its measured accuracy.]
Categories of DM Problem and Output Patterns
Here, we will learn how to derive a linear regression model from one-dimensional data: the number of features is 1, denoted x.
Regression
Regression (least squares)
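The derivation from the original slides is not preserved here. As a reminder of the result, for a one-dimensional model y ≈ a + b·x the least-squares estimates are b = Σ(x_i − x̄)(y_i − ȳ) / Σ(x_i − x̄)² and a = ȳ − b·x̄. A minimal Python sketch of this closed form (variable names and sample data are illustrative, not from the slides):

import numpy as np

def least_squares_fit(x, y):
    # Closed-form least-squares estimates for y ≈ a + b*x
    x_mean, y_mean = x.mean(), y.mean()
    b = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)  # slope
    a = y_mean - b * x_mean                                              # intercept
    return a, b

# Example: noisy points scattered around y = 2x + 1
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.1])
a, b = least_squares_fit(x, y)
print(f"intercept = {a:.2f}, slope = {b:.2f}")   # roughly intercept ≈ 1.05, slope ≈ 1.99

These estimates minimise the sum of squared residuals Σ(y_i − a − b·x_i)².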
Regression (prediction error: R^2)
In statistics, the coefficient of determination, denoted R² or r² and pronounced "R squared", is the proportion of the variation in the dependent variable that is predictable from the independent variable(s).
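Written as a formula (a standard form, not preserved from the slide), with ŷ_i the model's predictions and ȳ the mean of the observed values:

R² = 1 − Σ(y_i − ŷ_i)² / Σ(y_i − ȳ)²

R² = 1 indicates perfect prediction, while R² = 0 means the model predicts no better than simply predicting the mean of the data.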
Evaluating Classification Models
• Classification Accuracy
  • Estimated accuracy during development stage vs. actual accuracy during practical use
• Classification Performance
  • Time taken for model construction
  • Time taken for classification
• Comprehensibility of the model
  • Ease of interpreting decisions by the classification model
A set of classification techniques
Method 1: Decision trees
Decision Tree Induction
The Approach
Input Training Examples
Computing information gain (entropy)
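The formulas from this slide did not survive extraction; the standard definitions used by ID3 are given below. For a set S whose classes occur with proportions p_i, and a candidate attribute A with values v:

Entropy(S) = − Σ_i p_i · log2(p_i)
Gain(S, A) = Entropy(S) − Σ_v (|S_v| / |S|) · Entropy(S_v)

where S_v is the subset of S for which A takes value v. The attribute with the largest gain is selected as the next split.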
Example: attribute Outlook
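The worked numbers from this slide are not preserved. As an illustration, assuming the classic 14-instance "play tennis" weather data (9 Yes, 5 No), Outlook splits the set into Sunny (2 Yes, 3 No), Overcast (4 Yes, 0 No) and Rainy (3 Yes, 2 No), so

Gain(S, Outlook) = 0.940 − (5/14)·0.971 − (4/14)·0 − (5/14)·0.971 ≈ 0.247 bits

which, in that dataset, is the highest gain of the four attributes, so ID3 would split on Outlook first.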
Method 2: Naïve Bayes

Naïve Bayes for classification
• A priori probability of H: P(H)
  • Probability of the event before evidence is seen
• A posteriori probability of H: P(H | E)
  • Probability of the event after evidence is seen
Naïve Bayes for classification
Frequency and likelihood table
Training data (Outlook vs. Play class):

No.  Outlook   Play
0    Rainy     Yes
1    Sunny     Yes
2    Overcast  Yes
3    Overcast  Yes
4    Sunny     No
5    Rainy     Yes
6    Sunny     Yes
7    Overcast  Yes
8    Rainy     No
9    Sunny     No
10   Sunny     Yes
11   Rainy     No
12   Overcast  Yes
13   Overcast  Yes

Frequency and likelihood table derived from the data:

Weather    Yes  No  Total  P(Weather)
Overcast   5    0   5      5/14 = 0.35
Rainy      2    2   4      4/14 = 0.29
Sunny      3    2   5      5/14 = 0.35
All        10   4   14     P(Yes) = 10/14 = 0.71, P(No) = 4/14 = 0.29

Applying Bayes' theorem, P(H|E) = P(E|H) * P(H) / P(E):

P(Yes|Sunny) = P(Sunny|Yes) * P(Yes) / P(Sunny)
P(Sunny|Yes) = 3/10 = 0.3
P(Sunny) = 0.35
P(Yes) = 0.71
So P(Yes|Sunny) = 0.3 * 0.71 / 0.35 = 0.60
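The calculation above can be reproduced in a few lines of Python; a minimal sketch (the function and variable names are illustrative, not part of the slides):

from collections import Counter

# Outlook values and Play labels copied from the training table above
outlook = ["Rainy", "Sunny", "Overcast", "Overcast", "Sunny", "Rainy", "Sunny",
           "Overcast", "Rainy", "Sunny", "Sunny", "Rainy", "Overcast", "Overcast"]
play = ["Yes", "Yes", "Yes", "Yes", "No", "Yes", "Yes",
        "Yes", "No", "No", "Yes", "No", "Yes", "Yes"]

n = len(play)
class_counts = Counter(play)                # {'Yes': 10, 'No': 4}
joint_counts = Counter(zip(outlook, play))  # e.g. ('Sunny', 'Yes') -> 3

def posterior(value, cls):
    # Bayes' theorem: P(cls | value) = P(value | cls) * P(cls) / P(value)
    p_cls = class_counts[cls] / n
    p_value = outlook.count(value) / n
    p_value_given_cls = joint_counts[(value, cls)] / class_counts[cls]
    return p_value_given_cls * p_cls / p_value

print(round(posterior("Sunny", "Yes"), 2))  # 0.6, matching P(Yes|Sunny) above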
Naïve Bayes for classification
• Naïve assumption: evidence splits into parts (i.e., attributes) that are conditionally independent given the class:
  P(H | E) = P(E1 | H) * P(E2 | H) * … * P(En | H) * P(H) / P(E)
Pruning methods:
• Reduced error pruning
• Cost complexity pruning (sketched below)
• Pessimistic pruning and many others
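Of these, cost-complexity pruning is available directly in scikit-learn; a minimal sketch, assuming scikit-learn is installed and using its bundled iris data purely as a stand-in dataset:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Candidate alpha values along the cost-complexity pruning path
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)

best_alpha, best_score = 0.0, 0.0
for alpha in path.ccp_alphas:
    pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
    score = pruned.score(X_test, y_test)   # accuracy of the pruned tree on held-out data
    if score >= best_score:
        best_alpha, best_score = alpha, score

print(f"best alpha = {best_alpha:.4f}, held-out accuracy = {best_score:.3f}")

In practice the pruning strength alpha would be chosen by cross-validation rather than a single held-out split; the split here just keeps the sketch short.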
Classification: Evaluation
Evaluating Tree Accuracy
What to Evaluate?
• Accuracy is measured in terms of error rate
• Details of errors are shown in a confusion matrix
e.g. a table whose rows are the actual classes and whose columns are the predicted classes, so misclassifications appear off the diagonal
• Evaluation methods:
• Holdout Method: divide data set 50-50 as training and test sets
• Random Subsampling: use the holdout method several times and take the
average of the accuracy.
• Bootstrap: use sampling with replacement. Training examples are also test
examples.
• Cross Validation: the data set is divided into k equal-size partitions. For each
round of decision tree induction, one partition is used for testing and the
rest used for training. After k rounds, the average error rate is used.
• Leave-One-Out: a particular form of cross-validation
[Illustration: a dataset divided into 10 partitions, labelled 1 to 10]
Holdout evaluation
• What to do if the amount of data is limited?
• The holdout method reserves a certain amount for testing and uses the
remainder for training
• Usually: one third for testing, the rest for training
• Problem: the samples might not be representative
• Example: class might be missing in the test data
• Advanced version uses stratification
• Ensures that each class is represented with approximately equal proportions in
both subsets
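A minimal sketch of a stratified holdout split with scikit-learn (the one-third test fraction follows the bullet above; the dataset is a stand-in):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Hold out one third for testing; stratify=y keeps the class proportions
# roughly equal in both subsets (the "advanced version" described above).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1/3, stratify=y, random_state=42)

model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print(f"holdout accuracy: {model.score(X_test, y_test):.3f}")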
Random subsampling / bootstrapping
• Random subsampling performs k iterations of the holdout method over the entire dataset, i.e. we form k replicas (random training/test splits) of the given data.
• Let the estimated prediction error (PE) on the i-th test set be denoted by E_i. The true error estimate is obtained as the average of the separate estimates E_i.
• Problem: the same examples may appear in the test sets of several iterations.
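A minimal sketch of random subsampling (repeated holdout) using scikit-learn's ShuffleSplit; k = 10 and the other names are illustrative:

from sklearn.datasets import load_iris
from sklearn.model_selection import ShuffleSplit, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# k = 10 independent random 2/3-1/3 splits of the same dataset
splits = ShuffleSplit(n_splits=10, test_size=1/3, random_state=0)
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=splits)

print(f"mean accuracy over 10 random splits: {scores.mean():.3f}")
print(f"estimated prediction error (mean of E_i): {1 - scores.mean():.3f}")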
(K-fold) Cross-validation evaluation
• Cross-validation helps to compare and select an appropriate model for the specific predictive modeling problem.
1. Divide the dataset into two parts: one for training, the other for testing.
2. Train the model on the training set.
3. Validate the model on the test set.
4. Repeat steps 1-3 a number of times; this number depends on the CV method that you are using.
[Illustration: example of 4-fold cross-validation]
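A minimal sketch of k-fold cross-validation with scikit-learn, using k = 4 to match the illustration (the dataset is a stand-in):

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# 4-fold cross-validation: each fold serves as the test set exactly once
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=4)
print(f"per-fold accuracy: {scores}")
print(f"mean accuracy: {scores.mean():.3f}")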
Leave-one-out evaluation
1. Choose one sample from the dataset; this will be the test set.
2. The remaining n - 1 samples will be the training set.
3. Train the model on the training set. On each iteration, a new model must be trained.
4. Validate the model on the test set.
5. Save the result of the validation.
6. Repeat steps 1-5 n times, since for n samples we have n different training and test sets.
7. To get the final score, average the results obtained in step 5.
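A minimal sketch of leave-one-out evaluation with scikit-learn's LeaveOneOut splitter (the dataset is a stand-in); note that it trains one model per sample, so it is expensive on large datasets:

from sklearn.datasets import load_iris
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Each of the n samples is the test set exactly once; every individual score is 0 or 1
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=LeaveOneOut())
print(f"leave-one-out accuracy: {scores.mean():.3f}")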
Decision Tree Induction in Weka
Overview
• ID3 (only works with categorical attributes)
• J48 (Java implementation of C4.5)
• RandomTree (with K attributes)
• RandomForest (a forest of random trees)
• REPTree (regression tree with reduced error pruning)
• BFTree (best-first tree, using Gain or Gini)
• FT (functional tree, logistic regression as split nodes)
• SimpleCart (CART with cost-complexity pruning)
Related Issues
Strengths
• Capability of generating understandable rules
• Efficiency in classifying unseen data
• Ability to indicate the most important attribute for classification
Weaknesses
• The error rate increases when the training set contains only a small number of instances of a large variety of classes
• Computationally expensive to build
Decision Tree Induction in Practice
[Weka screenshot: selecting attributes]
Decision Tree Induction in Weka
Constructing Classification Models (ID3)
[Weka screenshots, annotated steps: 2. setting a test option; 3. starting the process; 4. viewing the model and evaluation results; 5. selecting the option to view the tree]
Decision Tree Induction in Weka
[Weka screenshots: a J48 (unpruned) tree; a RandomTree model; pressing Start to run the classification; selecting the option to pop up the visualisation; the class labels assigned to the classified instances]
Class Activities
Heart disease dataset (P = heart disease present, N = heart disease absent)
a) Calculate the information gain over the attribute Blood Pressure.
b) Given that gain(BodyWeight) = 0.0275 bits, gain(BodyHeight) = 0.2184 bits, gain(BodySugarLevel) = 0.1407 bits and gain(Habits) = 0.0721 bits, which attribute should be selected by the ID3 algorithm?
c) Use the data in Table 6.4 as the training dataset and perform the following using Weka:
   i. Use J48 to construct a decision tree model, using the data in Table 6.6 as the testing dataset.
   ii. Determine the classes for the unseen data records in Table 6.5.
Homework: Other Classification Techniques
Other Classification Techniques to be presented by group members (how each works, advantages/disadvantages, and an example):