M01 Tree-Based Methods
Tree-Based Models
Decision trees: classification trees and regression trees
CART (Classification And Regression Trees)
Classification tree: the target variable Y is discrete, finitely-many valued
Leaf nodes are associated with discrete class labels
Regression trees: The target variable Y is continuous-valued
Leaf nodes are associated with numerical, continuous values
When to Consider Decision Trees for classification
Examples:
Medical diagnosis
Credit risk analysis
Equipment fault diagnosis
Top-down induction of decision tree algorithm (ID3)
Main loop: (given the current Node with dataset S)
If the termination condition at Node is satisfied, then STOP
Else:
A ← the "best" decision attribute for splitting the data S at Node
Assign A as decision attribute for Node
For each value of A, create a new descendant of Node
Sort training examples to the new leaf nodes
Iterate over new leaf nodes
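Below is a minimal sketch of this loop in Python, assuming the training data is a list of (attribute-dict, label) pairs and using information gain (entropy reduction) as the "best attribute" criterion; the function and variable names (id3, info_gain, ...) are illustrative, not part of the original algorithm description.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a non-empty list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(examples, attr):
    """Reduction in entropy obtained by splitting `examples` on `attr`."""
    labels = [y for _, y in examples]
    n = len(examples)
    remainder = 0.0
    for value in {x[attr] for x, _ in examples}:
        subset = [y for x, y in examples if x[attr] == value]
        remainder += (len(subset) / n) * entropy(subset)
    return entropy(labels) - remainder

def id3(examples, attributes):
    """Top-down induction: returns a nested-dict tree, or a class label for a leaf."""
    labels = [y for _, y in examples]
    # Termination: all examples share one label, or there is nothing left to split on.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    # A <- the "best" decision attribute for splitting the data at this node.
    best = max(attributes, key=lambda a: info_gain(examples, a))
    node = {best: {}}
    remaining = [a for a in attributes if a != best]
    # For each value of A, create a descendant, sort the examples to it, and recurse.
    for value in {x[best] for x, _ in examples}:
        subset = [(x, y) for x, y in examples if x[best] == value]
        node[best][value] = id3(subset, remaining)
    return node
```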
Criterion function for selecting the “best” attribute
[Figure: a node containing dataset S is split on attribute A, with one branch per value a1, a2, ..., ak, each child receiving the data points routed to it.]
Pruning with a validation set:
For each internal node n, compute err_val(T_n), the error on validation data of the tree obtained by replacing the subtree rooted at n with a leaf
Greedily remove the one (node n*) that most improves validation set accuracy
If err_val(T*) < err_val(T), set T ← T* and repeat; otherwise stop
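A hedged sketch of this greedy loop (commonly called reduced-error pruning), assuming a held-out validation set of (attribute-dict, label) pairs and a small Node class in which every node stores the majority class of the training points routed to it; all names here are illustrative.

```python
from collections import Counter

class Node:
    """A tree node: a leaf if attr is None, otherwise an internal split node."""
    def __init__(self, label, attr=None, children=None):
        self.label = label              # majority class of training points at this node
        self.attr = attr                # attribute tested at this node (None for a leaf)
        self.children = children or {}  # attribute value -> child Node

def predict(node, x):
    """Walk down the tree; back off to the node's majority label for unseen values."""
    while node.attr is not None:
        child = node.children.get(x.get(node.attr))
        if child is None:
            return node.label
        node = child
    return node.label

def val_error(tree, val_set):
    return sum(predict(tree, x) != y for x, y in val_set) / len(val_set)

def internal_nodes(node):
    if node.attr is not None:
        yield node
        for child in node.children.values():
            yield from internal_nodes(child)

def reduced_error_prune(tree, val_set):
    """Greedily turn into a leaf the internal node whose removal most improves
    validation accuracy; stop when no removal helps."""
    while True:
        best_node, best_err = None, val_error(tree, val_set)
        for n in list(internal_nodes(tree)):
            saved = (n.attr, n.children)
            n.attr, n.children = None, {}       # tentatively prune n to a leaf
            err = val_error(tree, val_set)
            n.attr, n.children = saved          # undo the tentative prune
            if err < best_err:                  # err_val(T*) < err_val(T)
                best_node, best_err = n, err
        if best_node is None:
            return tree
        best_node.attr, best_node.children = None, {}   # commit the best prune
```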
Rules read off the decision tree (PlayTennis example):
IF (Outlook = Sunny) ∧ (Humidity = High)
THEN PlayTennis = No
IF (Outlook = Sunny) ∧ (Humidity = Normal)
THEN PlayTennis = Yes
IF (Outlook = Overcast)
THEN PlayTennis = Yes
IF (Outlook = Rain) ∧ (Wind = Strong)
THEN PlayTennis = No
IF (Outlook = Rain) ∧ (Wind = Weak)
THEN PlayTennis = Yes
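As a quick illustration, the same rule set can be written as a plain Python function; the attribute and class values follow the slide, and the function name is arbitrary.

```python
def play_tennis(outlook, humidity, wind):
    """The PlayTennis rule set above, written as ordinary conditionals."""
    if outlook == "Sunny":
        return "No" if humidity == "High" else "Yes"
    if outlook == "Overcast":
        return "Yes"
    if outlook == "Rain":
        return "No" if wind == "Strong" else "Yes"
    return None  # no rule fires for an unseen Outlook value

print(play_tennis("Sunny", "Normal", "Weak"))  # -> Yes
```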
Extensions: continuous valued attributes
Discretize in advance and then use ID3
Create a binary attribute (on the fly) to test continuous variable values
For example, for variable Temperature, a new attribute (Temperature > 85) takes value true/false
Temperature: 40 48 60 72 80 90
PlayTennis : No No Yes Yes Yes No
The split point should be placed just between class-label transitions to obtain better information gain
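A small sketch of this idea on the Temperature example above: candidate thresholds are generated only at label transitions (midpoints 54 and 85 here), and each binary test is scored by information gain; the helper names are illustrative.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def candidate_thresholds(values, labels):
    """Midpoints between consecutive sorted values whose class labels differ."""
    pairs = sorted(zip(values, labels))
    return [(pairs[i][0] + pairs[i + 1][0]) / 2
            for i in range(len(pairs) - 1)
            if pairs[i][1] != pairs[i + 1][1]]

def gain_for_threshold(values, labels, t):
    """Information gain of the binary test (value > t)."""
    left = [y for v, y in zip(values, labels) if v <= t]
    right = [y for v, y in zip(values, labels) if v > t]
    n = len(labels)
    return (entropy(labels)
            - (len(left) / n) * entropy(left)
            - (len(right) / n) * entropy(right))

temperature = [40, 48, 60, 72, 80, 90]
play = ["No", "No", "Yes", "Yes", "Yes", "No"]

for t in candidate_thresholds(temperature, play):
    print(t, round(gain_for_threshold(temperature, play, t), 3))
# Candidate thresholds appear only at the label transitions: 54.0 and 85.0
```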
Other attribute selection criterion
If an attribute has many values, Gain will tend to select it
Imagine using Date (e.g., Date = Jun_1_2020) as an attribute: it splits the training data perfectly but cannot generalize at all
One approach: use GainRatio instead
GainRatio(S, A) = Gain(S, A) / SplitInformation(S, A)
SplitInformation(S, A) = Entropy of A's value distribution over S = - Σ_{i=1..c} (|S_i| / |S|) log2(|S_i| / |S|)
where S_i is the subset of S for which attribute A has value v_i
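The following sketch, on a made-up 12-example toy set, shows the effect: the unique-valued Date attribute maximizes Gain but pays a large SplitInformation penalty, so GainRatio prefers Outlook. The data and helper names are invented for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain(examples, attr):
    """Information gain of splitting `examples` on `attr`."""
    labels = [y for _, y in examples]
    n = len(examples)
    remainder = 0.0
    for v in {x[attr] for x, _ in examples}:
        subset = [y for x, y in examples if x[attr] == v]
        remainder += (len(subset) / n) * entropy(subset)
    return entropy(labels) - remainder

def split_information(examples, attr):
    """Entropy of the attribute's own value distribution over S."""
    n = len(examples)
    return -sum((c / n) * math.log2(c / n)
                for c in Counter(x[attr] for x, _ in examples).values())

def gain_ratio(examples, attr):
    si = split_information(examples, attr)
    return gain(examples, attr) / si if si > 0 else 0.0

# Toy data: Date is unique per example; Outlook is a genuinely useful attribute.
outlooks = ["Sunny"] * 4 + ["Overcast"] * 4 + ["Rain"] * 4
labels = ["No"] * 4 + ["Yes"] * 4 + ["Yes", "Yes", "Yes", "No"]
data = [({"Date": f"day{i}", "Outlook": o}, y)
        for i, (o, y) in enumerate(zip(outlooks, labels))]

for a in ("Date", "Outlook"):
    print(a, round(gain(data, a), 3), round(gain_ratio(data, a), 3))
# Gain prefers Date (~0.98 vs ~0.71), but GainRatio prefers Outlook (~0.45 vs ~0.27).
```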
Other attribute selection criterion
Gini-impurity:
Y: discrete random variable (e.g., Y = Class)
values(Y) = {y_1, ..., y_k} with probabilities p_1, ..., p_k respectively.
Gini-imp(Y) = Σ_{i=1..k} p_i (1 - p_i) = 1 - Σ_{i=1..k} p_i^2
If we use Gini-impurity(S) to denote Gini-impurity(class) for the
class variable associated with dataset S:
Reduction_Gini-impurity(S, A) = Gini-impurity(S) - Σ_{v ∈ values(A)} (|S_v| / |S|) Gini-impurity(S_v)
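A brief sketch of these two quantities, using the same (attribute-dict, label) data layout as the earlier sketches; the function names are illustrative.

```python
from collections import Counter

def gini(labels):
    """Gini-imp(Y) = 1 - sum_i p_i^2, estimated from a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_reduction(examples, attr):
    """Gini-impurity(S) minus the size-weighted Gini impurity of each subset S_v."""
    labels = [y for _, y in examples]
    n = len(examples)
    weighted = 0.0
    for v in {x[attr] for x, _ in examples}:
        subset = [y for x, y in examples if x[attr] == v]
        weighted += (len(subset) / n) * gini(subset)
    return gini(labels) - weighted

print(gini(["Yes", "Yes", "Yes", "No"]))  # 1 - (0.75**2 + 0.25**2) = 0.375
```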
Combine the predictions of many trees: average (for regression) or vote (for classification)
Bagging (bootstrap aggregating): train each tree on a bootstrap resample of the training data
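As one way to try this, here is a hedged sketch using scikit-learn's BaggingClassifier, whose default base learner is a decision tree; the dataset and parameter choices are illustrative only.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
# Each tree is fit on a bootstrap resample; predictions are combined by majority vote.
bagged_trees = BaggingClassifier(n_estimators=50, random_state=0)
print(cross_val_score(bagged_trees, X, y, cv=5).mean())
```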