Decision Tree
• "A decision tree in machine learning is a flowchart structure in which each node
represents a "test" on the attribute and each branch represents the outcome of the
test."
• The end node (called leaf node) represents a class label.
Decision Tree Learning
• Decision tree learning is a method for approximating discrete-valued target functions, in which the learned function is represented by a decision tree.
• In machine learning, the decision tree is a supervised learning method.
• It is used for both regression and classification.
• Entropy E(X): The "average amount of information" contained in a random variable X is called entropy. It is denoted by E or H.
• In other words, entropy is the "measure of randomness of information" of a variable. Entropy E(X) is the measure of impurity or uncertainty associated with a random variable X.
As the entropy curve shows, the entropy H(X) is zero when the probability Pr(X) is 0 or 1. The entropy is maximum (i.e., 1) when the probability is 0.5, because at this point the randomness or impurity in the data is highest.
Basic Characteristics of Decision Tree Algorithms
ID3 Algorithm
ID3 Steps
1. Calculate the information gain of each feature.
2. If the rows do not all belong to the same class, split the dataset S into subsets using the feature for which the information gain is maximum.
3. Make a decision tree node using the feature with the maximum information gain.
4. If all rows belong to the same class, make the current node a leaf node with that class as its label.
5. Repeat for the remaining features until we run out of features or the decision tree consists entirely of leaf nodes.
Here there are nine "Yes" and five "No" examples, i.e., 9 positive (+ve) and 5 negative (-ve) examples, in Table 1.
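Using the standard entropy formula, the entropy of this collection is E(S) = -(9/14) log2(9/14) - (5/14) log2(5/14) ≈ 0.940. A rough Python sketch of this calculation and of information gain, assuming the usual ID3 definitions (not code from the original material):

import numpy as np
from collections import Counter

def entropy(labels):
    # Entropy of a list of class labels, in bits.
    counts = np.array(list(Counter(labels).values()), dtype=float)
    probs = counts / counts.sum()
    return -np.sum(probs * np.log2(probs))

def information_gain(labels, feature_values):
    # Information gain = entropy of the parent set minus the weighted
    # entropy of the subsets produced by splitting on the feature.
    total = len(labels)
    remainder = 0.0
    for value in set(feature_values):
        subset = [l for l, v in zip(labels, feature_values) if v == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

labels = ["Yes"] * 9 + ["No"] * 5   # the 9 (+ve) and 5 (-ve) examples of Table 1
print(round(entropy(labels), 3))    # ~0.940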
CART Algorithm
Gini index
• The Gini index, also known as Gini impurity, measures the probability of a particular element being classified incorrectly when it is selected at random.
• The Gini index varies between 0 and 1, where 0 expresses purity of classification, i.e., all the elements belong to a single class.
• A value of 1 indicates a random distribution of elements across various classes, while a value of 0.5 indicates an equal distribution of elements over two classes.
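The usual formula (not shown in the slide text) is Gini = 1 - Σ p_i², where p_i is the proportion of class i at the node. A small Python sketch, assuming that definition:

import numpy as np

def gini_index(labels):
    # Gini impurity: 1 minus the sum of squared class proportions at a node.
    _, counts = np.unique(labels, return_counts=True)
    probs = counts / counts.sum()
    return 1.0 - np.sum(probs ** 2)

print(gini_index(["A"] * 10))             # 0.0 -> pure node, single class
print(gini_index(["A"] * 5 + ["B"] * 5))  # 0.5 -> equal two-class split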
Gini ratio
Pruning
• Pruning is the process of deleting unnecessary nodes from a tree in order to obtain the optimal decision tree.
• A tree that is too large increases the risk of overfitting, while a small tree may not capture all the important features of the dataset.
• Therefore, a technique that decreases the size of the learned tree without reducing accuracy is known as pruning. There are mainly two types of tree pruning techniques used:
• Reduced Error Pruning (Pre-Pruning)
• Cost Complexity Pruning (Post-Pruning)
Pre-pruning
• The pre-pruning technique for Decision Trees tunes the hyperparameters prior to the training pipeline.
• It involves the heuristic known as 'early stopping', which stops the growth of the decision tree before it reaches its full depth.
• It stops the tree-building process to avoid producing leaves with small samples.
• During each stage of splitting the tree, the cross-validation error is monitored. If the error no longer decreases, we stop growing the decision tree.
• The hyperparameters that can be tuned for early stopping and preventing overfitting
are:
max_depth, min_samples_leaf, and min_samples_split
• These same hyperparameters can also be tuned to obtain a robust model, as in the sketch below.
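A minimal sketch of early stopping via hyperparameter tuning, assuming scikit-learn (the library these parameter names come from); the candidate values are purely illustrative:

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)   # toy dataset, purely illustrative

# Cross-validation picks the combination with the best validation score,
# stopping the tree short of its full depth.
param_grid = {
    "max_depth": [2, 3, 4, 5],
    "min_samples_leaf": [1, 5, 10],
    "min_samples_split": [2, 10, 20],
}
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)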
Post-pruning
• Post-pruning does the opposite of pre-pruning and allows the Decision Tree model to
grow to its full depth.
• Once the model grows to its full depth, tree branches are removed to prevent the model
from overfitting.
• The algorithm will continue to partition data into smaller subsets until the final subsets
produced are similar in terms of the outcome variable.
• The final subsets of the tree consist of only a few data points, allowing the tree to learn the data to a T. However, when a new data point that differs from this learned data is introduced, it may not be predicted well.
• The hyperparameter that can be tuned for post-pruning and preventing overfitting is:
ccp_alpha
• ccp stands for Cost Complexity Pruning and can be used as another option to control the
size of a tree.
• A higher value of ccp_alpha will lead to an increase in the number of nodes pruned.
• This hyperparameter can also be tuned to obtain the best-fitting model, as in the sketch below.
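A hedged sketch of cost complexity (post-)pruning, assuming scikit-learn (where ccp_alpha is defined) and using the Iris toy dataset purely for illustration:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X_train, X_test, y_train, y_test = train_test_split(
    *load_iris(return_X_y=True), random_state=0)

# Grow the full tree, then list the effective alphas along the pruning path.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)

for alpha in path.ccp_alphas:
    pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha)
    pruned.fit(X_train, y_train)
    # A higher alpha prunes more nodes, giving a smaller tree.
    print(f"alpha={alpha:.4f}  leaves={pruned.get_n_leaves()}  "
          f"test accuracy={pruned.score(X_test, y_test):.3f}")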