Decision Tree - Associative Rule Mining
The top-down approach means that we start building the tree from the root node and recursively split the dataset as we move down.
Information Gain calculates the reduction in entropy and measures how well a given feature separates or classifies the target classes. The feature with the highest Information Gain is selected as the best one.
Gain(S, A) = Entropy(S) − ∑ (|Sᵥ| / |S|) * Entropy(Sᵥ) ; v ∈ Values(A)
where,
A is the feature under consideration and Values(A) is the set of its distinct values,
Sᵥ is the set of rows in S for which the feature column A has value v,
|Sᵥ| is the number of rows in Sᵥ, and
|S| is the number of rows in S.
Entropy is the measure of disorder, and the entropy of a dataset is the measure of disorder in the target feature of the dataset.
In the case of binary classification (where the target column has only two classes), entropy is 0 if all values in the target column are homogeneous (all the same) and 1 if the target column has an equal number of values for both classes.
Entropy(S) = - ∑ pᵢ * log₂(pᵢ) ; i = 1 to n
where,
n is the total number of classes in the target column (in our case n = 2, i.e. YES and NO)
pᵢ is the probability of class i, i.e. the ratio of the number of rows with class i in the target column to the total number of rows in the dataset.
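To make these two formulas concrete, here is a minimal sketch, assuming the dataset is a pandas DataFrame with a categorical target column; the function names, column names, and toy values below are illustrative assumptions, not taken from the original text.

```python
import math

import pandas as pd


def entropy(s: pd.Series) -> float:
    """Entropy(S) = -sum(p_i * log2(p_i)) over the classes in the target column."""
    probs = s.value_counts(normalize=True)  # p_i for each class i
    return -sum(p * math.log2(p) for p in probs if p > 0)


def information_gain(df: pd.DataFrame, feature: str, target: str) -> float:
    """Gain(S, A) = Entropy(S) - sum(|S_v| / |S| * Entropy(S_v)) over values v of A."""
    total = entropy(df[target])
    weighted = sum(
        len(subset) / len(df) * entropy(subset[target])  # (|S_v| / |S|) * Entropy(S_v)
        for _, subset in df.groupby(feature)
    )
    return total - weighted


# Toy "play tennis"-style data (illustrative values only).
df = pd.DataFrame({
    "Outlook": ["Sunny", "Sunny", "Overcast", "Rain", "Rain", "Overcast"],
    "Play":    ["NO",    "NO",    "YES",      "YES",  "NO",   "YES"],
})
print(entropy(df["Play"]))                      # 1.0 (3 YES vs 3 NO)
print(information_gain(df, "Outlook", "Play"))  # reduction in entropy from splitting on Outlook
```

Here the target column has an equal number of YES and NO rows, so its entropy is exactly 1, matching the binary-classification remark above.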
ID3 Steps
1. Calculate the Information Gain of each feature.
2. Considering that all rows don't belong to the same class, split the dataset S into subsets using the feature for which the Information Gain is maximum.
3. Make a decision tree node using the feature with the maximum Information Gain.
4. If all rows belong to the same class, make the current node a leaf node with the class as its label.
5. Repeat for the remaining features until we run out of features or the decision tree has all leaf nodes (see the sketch below).
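Putting the steps together, the following is a minimal recursive sketch of ID3, reusing the entropy and information_gain helpers from the earlier snippet; representing the tree as nested dicts is an implementation choice for illustration, not part of the original text.

```python
def id3(df: pd.DataFrame, target: str, features: list[str]):
    """Recursively build a decision tree as nested dicts: {feature: {value: subtree}}."""
    # Step 4: all rows belong to the same class -> leaf node labelled with that class.
    if df[target].nunique() == 1:
        return df[target].iloc[0]
    # Step 5: no features left -> leaf node labelled with the majority class.
    if not features:
        return df[target].mode()[0]
    # Steps 1 and 3: pick the feature with maximum Information Gain and make a node.
    best = max(features, key=lambda f: information_gain(df, f, target))
    remaining = [f for f in features if f != best]
    # Step 2: split S into subsets S_v, one per value v of the chosen feature.
    return {
        best: {
            value: id3(subset, target, remaining)
            for value, subset in df.groupby(best)
        }
    }


tree = id3(df, target="Play", features=["Outlook"])
print(tree)  # nested dict mapping each Outlook value to a subtree or class label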
Example
CART Algorithm
CART Algorithm for Classification
The tree will be constructed in a top-down approach as follows:
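The construction mirrors ID3's top-down recursion, except that CART produces binary splits and, for classification, conventionally scores candidate splits with the Gini index rather than entropy. As a quick illustration of CART in practice, the sketch below fits a classification tree with scikit-learn's DecisionTreeClassifier, which implements an optimised version of CART; the toy data and parameter values are illustrative assumptions, not from the original text.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy numeric features and binary labels (illustrative values only).
X = [[25, 0], [30, 1], [45, 0], [35, 1], [52, 1], [23, 0]]
y = ["NO", "NO", "YES", "YES", "YES", "NO"]

# criterion="gini" selects splits by Gini impurity, the usual CART choice.
clf = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
clf.fit(X, y)

print(export_text(clf, feature_names=["age", "owns_car"]))  # text view of the learned splits
print(clf.predict([[40, 1]]))                               # predicted class for a new row
```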