

Decision Trees

At some point you have to take a decision sitting on a tree
What is a Decision Tree?

I. We arrive at a decision by dividing the decision process into nodes and branches.
II. It is a supervised model.
III. Decision trees can handle both regression and classification, analogous to linear and logistic regression models.
IV. We will discuss:
   I. CHAID
   II. CART
   III. Random Forest
Measures of Decision Trees

Gini Impurity Measure:

For a node whose observations fall into classes with proportions p_1, …, p_K, the Gini impurity is G = 1 − Σ_k p_k². It is 0 for a pure node and largest when the classes are evenly mixed.
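As a quick illustration (not from the slides), a minimal R sketch of this computation:

# Gini impurity of a vector of class labels: 1 - sum(p_k^2)
gini_impurity <- function(labels) {
  p <- table(labels) / length(labels)   # class proportions p_k
  1 - sum(p^2)
}

gini_impurity(c("B", "B", "G", "G"))  # evenly mixed node -> 0.5
gini_impurity(rep("B", 4))            # pure node -> 0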
Measures of Decision Trees (Contd.)

Deviance:

D = −2 (n_B log p_B + n_G log p_G)

where n_B and n_G are the counts of the two classes in a node, and p_B and p_G are their proportions.
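A matching R sketch (again an illustration; 0 · log 0 is treated as 0 so a pure node gets deviance 0):

# Deviance of a binary node: -2 * (n_B*log(p_B) + n_G*log(p_G))
node_deviance <- function(labels) {
  n <- table(labels)                       # class counts n_B, n_G
  p <- n / sum(n)                          # class proportions p_B, p_G
  -2 * sum(ifelse(n > 0, n * log(p), 0))   # treat 0 * log(0) as 0
}

node_deviance(c("B", "B", "G"))  # mixed node -> positive deviance
node_deviance(rep("B", 4))       # pure node  -> 0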


CHAID (Chi-square Automatic Interaction
Detector)

• Used to discover relationships between variables.
• Performs multi-level (not just binary) splits.
• Nominal, ordinal, and continuous data can be used; continuous predictors are first binned into categories with approximately equal numbers of observations.
• Creates all possible cross-tabulations for each categorical predictor until the best outcome is achieved and no further splitting can be performed.
• Well suited for large data sets.
• Commonly used for marketing segmentation. A short R sketch follows this list.
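As an illustration (not part of the slides), CHAID is available in R through the CHAID package hosted on R-Forge rather than CRAN; the chaid() call and the bundled USvote data below are assumptions based on that package:

# Sketch assuming the CHAID package from R-Forge (not on CRAN):
# install.packages("CHAID", repos = "http://R-Forge.R-project.org")
library(CHAID)

# CHAID works on factors, so continuous predictors must be binned first.
data("USvote", package = "CHAID")       # example data shipped with the package
fit <- chaid(vote3 ~ ., data = USvote)  # multi-way splits chosen by chi-square tests
plot(fit)                               # visualise the segmentation tree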
CHAID (Contd.)

• We can visually inspect the relationships between the split variables and the associated factors within the tree.
• The development of the decision, or classification, tree starts with identifying the target (dependent) variable, which is considered the root.
• The target is split into two or more categories, called the initial or parent nodes; these nodes are then split into child nodes using statistical tests.
Benefit of CHAID

• Unlike regression analysis, the CHAID technique does not require the data to be normally distributed.
Merging in CHAID

• In CHAID analysis, if the dependent variable is continuous, the F test is used. The F test checks whether the variances of two samples are equal; in R this is done with the var.test() function.
• If the dependent variable is categorical, the chi-square test is used.
• Each pair of predictor categories is assessed to determine which pair is least significantly different with respect to the dependent variable; such pairs are merged.
• Because of these merging steps, a Bonferroni-adjusted p-value is calculated for the merged cross-tabulation. Both tests are sketched in R below.
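A minimal base-R illustration of the two tests mentioned above, on toy data (the variable names are made up for the example):

# Continuous dependent variable: F test for equality of two variances
x <- rnorm(50, mean = 10, sd = 2)
y <- rnorm(50, mean = 10, sd = 3)
var.test(x, y)   # H0: the variances of x and y are equal

# Categorical dependent variable: chi-square test on a cross-tabulation
pred    <- factor(sample(c("A", "B", "C"), 200, replace = TRUE))
outcome <- factor(sample(c("Yes", "No"),   200, replace = TRUE))
chisq.test(table(pred, outcome))  # H0: predictor and outcome are independent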
CHAID Components

• Root Node: contains the dependent, or target, variable.
• Parent Node: the algorithm splits the target variable into two or more categories; these are called parent, or initial, nodes.
• Child Node: independent-variable categories that appear below the parent categories in the CHAID analysis tree are called child nodes.
• Terminal Node: the last categories of the CHAID analysis tree are called terminal nodes. In the tree, the category with the greatest influence on the dependent variable comes first and less important categories come later; hence the name terminal node.
CART (Classification and Regression Trees)

• Classifies objects and predicts outcomes by selecting, from a large number of variables, the ones most important in determining the outcome variable.
• CART analysis is a form of binary recursive partitioning; a short R sketch follows.
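As an illustration (not part of the slides), CART is implemented in R by the rpart package on CRAN:

library(rpart)

# Classification tree on the built-in iris data
fit <- rpart(Species ~ ., data = iris, method = "class")
print(fit)
plot(fit); text(fit, use.n = TRUE)   # draw the tree with node counts

# For a continuous outcome, method = "anova" grows a regression tree
fit_reg <- rpart(mpg ~ ., data = mtcars, method = "anova")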
Strengths of CART

• No distributional assumptions are required.
• No assumption of homogeneity.
• The explanatory variables can be a mixture of categorical, interval, and continuous variables.
• Especially good for high-dimensional and large data sets; it can produce useful results from a few important variables.
• Largely unaffected by outliers, collinearity, or heteroscedasticity.
Weakness of CART

• Not based on a probabilistic model, so there is no confidence interval attached to its predictions.
The best part of CART

• Sophisticated methods of dealing with missing values.
• CART does not drop cases with missing values.
• It follows the concept of SURROGATE SPLITS (see the rpart sketch below):
  – A measure of similarity between two splits is defined.
  – If the best split is s on some variable, find the split s' on each other variable that is most similar to s; then find the second best, and so on.
  – If a case is missing the split variable, the surrogate is used instead.
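A sketch of how surrogate splits surface in rpart (the control options below are part of rpart.control):

library(rpart)

fit <- rpart(
  Species ~ ., data = iris, method = "class",
  control = rpart.control(
    usesurrogate = 2,  # use surrogates; cases lacking one follow the majority
    maxsurrogate = 5   # surrogate splits retained per primary split
  )
)
summary(fit)  # the printout lists the surrogate splits at each node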
Steps of Tree Building

• Start by splitting a variable at each of its split points; the sample splits into two binary nodes at each split point.
• Select the best split of the variable in terms of the reduction in impurity.
• Repeat steps 1 and 2 for all variables at the root node.
• Rank all of the best splits and select the variable that achieves the highest purity at the root.
Steps of Tree Building (Contd.)

• Assign classes to the nodes according to the rule that minimises misclassification errors.
• Repeat steps 1-5 for each non-terminal node.
• Grow a very large tree T_max until all terminal nodes are either small, pure, or contain identical measurement vectors (see the sketch below).
• Prune and choose the final tree using cross-validation.
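For instance, a deliberately overgrown tree T_max can be requested from rpart by turning off the complexity penalty (an illustration, not the slides' own code):

library(rpart)

big_tree <- rpart(
  Species ~ ., data = iris, method = "class",
  control = rpart.control(
    cp = 0,        # no complexity penalty: keep splitting while impurity drops
    minsplit = 2,  # allow splits on very small nodes
    xval = 10      # 10-fold cross-validation, used for pruning later
  )
)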
What happens in pruning??

• CART lets the tree grow to its full extent, then prunes it back.
• The idea is to find the point at which the validation error begins to rise.
• Successively smaller trees are generated by pruning leaves.
• At each pruning stage, multiple trees are possible; a pruning sketch follows.
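Continuing the rpart sketch above, cost-complexity pruning picks the complexity parameter with the smallest cross-validated error (xerror) and prunes to it:

library(rpart)
big_tree <- rpart(Species ~ ., data = iris, method = "class",
                  control = rpart.control(cp = 0, minsplit = 2, xval = 10))

printcp(big_tree)  # table of cp values with cross-validated error (xerror)

# Choose the cp whose cross-validated error is smallest, then prune
best_cp <- big_tree$cptable[which.min(big_tree$cptable[, "xerror"]), "CP"]
pruned  <- prune(big_tree, cp = best_cp)

plot(pruned); text(pruned, use.n = TRUE)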
What do I do before PRUNING?

• We will cover the pruning concept later, when we discuss RANDOM FORESTS… Till then,

CHILL
Random Forest
