Lecture 8
Non-metric Methods
• Numerical Attributes
– Nearest-neighbor -- distance
– Neural networks: two similar inputs lead to similar outputs
– SVMs: Dot Product
Classifier (Decision Tree)
[Figure: example decision tree — root node age? with branches <=30, 30..40, and >40; the outer branches test further conditions (including credit rating: fair / excellent) before reaching yes / no leaves]
• Misclassification impurity
Note: there is a mistake in this slide. For a two-class problem, entropy has its maximum value = 1, while the other impurity measures (Gini index and misclassification error), for two-class problems, have their maximum value = 0.5.
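A small Python sketch (illustrative helper names, not from the slides) to verify these maxima for a two-class problem — entropy peaks at 1, Gini index and misclassification impurity peak at 0.5:

    import math

    def entropy(p):
        # entropy of a two-class distribution with P(class 1) = p
        if p in (0.0, 1.0):
            return 0.0
        return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

    def gini(p):
        # Gini index for two classes: 1 - p^2 - (1 - p)^2
        return 1 - p**2 - (1 - p)**2

    def misclassification(p):
        # misclassification impurity: 1 - max(p, 1 - p)
        return 1 - max(p, 1 - p)

    # all three measures are maximal at p = 0.5 for a two-class problem
    print(entropy(0.5), gini(0.5), misclassification(0.5))   # 1.0 0.5 0.5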
5. For each Dk ∈ D
BuildTD(Dk) // Recursive call
6. Stop
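A minimal recursive sketch of this BuildTD procedure in Python; the splitting criterion (best_split) is a placeholder and the data layout is an assumption, not the exact algorithm on the slide:

    from collections import Counter

    def best_split(D, attributes):
        # placeholder: ID3 would pick the attribute with maximum information gain here
        return attributes[0]

    def build_td(D, attributes):
        # D is a list of (record, class_label) pairs; record is a dict of attribute values
        labels = [y for _, y in D]
        if len(set(labels)) == 1 or not attributes:       # stopping condition
            return Counter(labels).most_common(1)[0][0]   # leaf node = majority class
        a = best_split(D, attributes)
        tree = {a: {}}
        for v in set(rec[a] for rec, _ in D):             # one branch per distinct value of a
            Dk = [(rec, y) for rec, y in D if rec[a] == v]
            tree[a][v] = build_td(Dk, [x for x in attributes if x != a])  # recursive call
        return tree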
– The test condition for a binary attribute generates only two outcomes
– Multi-way split: the outcome depends on the number of distinct values of the
corresponding attribute
– For a continuous attribute, decision tree induction must consider all possible split positions
• Range query: vi ≤ A < vi+1 for i = 1, 2, …, q (if q ranges are chosen)
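One common way to enumerate the possible split positions of a continuous attribute is to sort its distinct values and test the midpoints between consecutive values; a rough Python sketch (the attribute values are made up for illustration):

    def candidate_splits(values):
        # midpoints between consecutive distinct values of a continuous attribute A
        vs = sorted(set(values))
        return [(vs[i] + vs[i + 1]) / 2 for i in range(len(vs) - 1)]

    heights = [1.5, 1.6, 1.7, 1.9, 2.1, 2.5]
    print(candidate_splits(heights))
    # each midpoint t defines a binary test A < t vs. A >= t;
    # a range query vi <= A < vi+1 instead groups the values into q intervals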
Attributes:
Gender = {Male(M), Female (F)} // Binary attribute
Height = {1.5, …, 2.5} // Continuous attribute
– It partitions the training set into a number of smaller training sets based on the
distinct values of the attribute under split.
Calculating Information Gain
Attributes: Age, Eye-sight, Use Type, Astigmatic; class label: Class

Records with Use = 1:
Age  Eye  Ast  Use  Class
1    1    1    1    3
1    1    2    1    3
1    2    1    1    3
1    2    2    1    3
2    1    1    1    3
2    2    1    1    3
2    2    2    1    3
3    1    1    1    3
3    1    2    1    3
3    2    1    1    3
3    2    2    1    3

Records with Use = 2:
Age  Eye  Ast  Use  Class
1    1    1    2    2
1    1    2    2    1
1    2    1    2    2
1    2    2    2    1
2    1    1    2    2
2    1    2    2    1
2    2    1    2    2
3    1    1    2    3
3    1    2    2    3
3    2    1    2    2
3    2    2    2    3

[Figure: candidate splits on Age, Eye-sight, and Astigmatic within each partition]
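A short sketch of how the information gain of an attribute (e.g. Use) could be computed from a table like the one above, assuming the records are stored as (Age, Eye, Ast, Use, Class) tuples:

    import math
    from collections import Counter

    def entropy(labels):
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

    def info_gain(records, attr_index, class_index=-1):
        # Gain(A) = Entropy(D) - sum_k (|Dk|/|D|) * Entropy(Dk), Dk = records with A = k
        base = entropy([r[class_index] for r in records])
        n = len(records)
        gain = base
        for v in set(r[attr_index] for r in records):
            Dk = [r for r in records if r[attr_index] == v]
            gain -= (len(Dk) / n) * entropy([r[class_index] for r in Dk])
        return gain

    # records = [(1, 1, 1, 1, 3), (1, 1, 1, 2, 2), ...]  # rows of the table above
    # print(info_gain(records, attr_index=3))            # gain of splitting on Use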
Splitting of Continuous Attribute Values
[Figure: examples of binary (Yes/No) splits that group attribute values, e.g. {O} vs. {Y, M} and {H} vs. {L, M}]
             Partition 1   Partition 2   Partition 3
Class 1           2             1             1
Class 2           2             2             1
Class 3           4             5             6
Column sum        8             8             8
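From such a class-count table, the impurity of a multi-way split can be computed as the weighted average of the per-partition impurities; a sketch using the Gini index (CART's measure), with the columns taken from the table above:

    def gini_from_counts(counts):
        # Gini index of one partition, given its class counts
        n = sum(counts)
        return 1 - sum((c / n) ** 2 for c in counts)

    # class counts (Class 1, Class 2, Class 3) per partition of the split
    columns = [(2, 2, 4), (1, 2, 5), (1, 1, 6)]
    total = sum(sum(col) for col in columns)              # 24 records in all
    split_gini = sum(sum(col) / total * gini_from_counts(col) for col in columns)
    print(split_gini)                                     # weighted Gini index of the 3-way split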
[Figure: decision tree fragment with nodes Job and Performance; the Performance node has branches P, G, A, E leading to leaves N, Y, Y, N; a test record to be classified is marked ?]
Decision Tree using CART
Note:
The decision tree induction algorithm ID3 may suffer from the overfitting problem.
– Distribution 1: Frequency 32  0  0  0
– Distribution 2: Frequency 16 16  0  0
– Distribution 3: Frequency 16  8  8  0
– Distribution 4: Frequency 16  8  4  4
– Distribution 5: Frequency  8  8  8  8
• On the other hand, split information forms the denominator in the gain ratio
formula.
– This implies that the higher the value of split information, the lower the gain ratio.
– In turn, this offsets the effect of a large information gain.
• Further, information gain is large when there are many distinct attribute
values.
– When there are many distinct values, split information is also large.
– In this way, split information reduces the gain ratio, thus resulting in a
balanced value for the information gain.
• Like information gain (in ID3), the attribute with the maximum gain ratio is
selected as the splitting attribute in C4.5.
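A sketch that computes the split information for the five example distributions above, showing how it grows with the number of non-empty partitions, and how it enters the gain ratio as a denominator:

    import math

    def split_info(freqs):
        # SplitInfo = -sum_k (|Dk|/|D|) * log2(|Dk|/|D|) over non-empty partitions
        n = sum(freqs)
        return -sum((f / n) * math.log2(f / n) for f in freqs if f > 0)

    distributions = [(32, 0, 0, 0), (16, 16, 0, 0), (16, 8, 8, 0), (16, 8, 4, 4), (8, 8, 8, 8)]
    for d in distributions:
        print(d, round(split_info(d), 3))   # 0.0, 1.0, 1.5, 1.75, 2.0

    def gain_ratio(gain, freqs):
        # GainRatio = InformationGain / SplitInfo (undefined when SplitInfo = 0)
        return gain / split_info(freqs)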
• For a given training data D, the important task is to build the decision tree
so that:
– All test data can be classified accurately
– The tree is balanced and of as small a depth as possible, so that
classification can be done at a faster rate.
2. Missing data and noise: Decision tree induction algorithms are quite robust to
data sets with missing values and noise. However, proper data
pre-processing can be applied to handle these discrepancies.
6. Tree Pruning: A sub-tree can be replicated two or more times in a decision tree
(see figure below). Such replication makes the tree unnecessarily large, although a
test record is still classified unambiguously. To avoid this sub-tree replication
problem, all replicated sub-trees except one can be pruned from the tree.
[Figure: decision tree with root A in which the sub-tree testing C and D, with leaves 0 and 1, is replicated under two different branches]
Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques (3rd Edn.), Morgan Kaufmann, 2015.
Pang-Ning Tan, Michael Steinbach, and Vipin Kumar, Introduction to Data Mining, Addison-Wesley, 2014.