

Decision Trees

At some point you have to take a decision sitting on a tree
What is a Decision Tree?

I. We arrive at a decision by dividing the decision process into nodes and branches.
II. It is a supervised model.
III. Decision trees can handle both regression and classification, analogous to linear and logistic regression models.
IV. We will discuss:
   I. CHAID
   II. CART
   III. Random Forest
Measures of Decision Trees

Gini Impurity Measure:

For a node whose observations fall into classes with proportions p_1, …, p_K, the Gini impurity is G = 1 − Σ_k p_k². It is 0 for a pure node and largest when the classes are evenly mixed.
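As a quick illustration (not from the slides), a minimal R sketch of this computation:

# Gini impurity of a vector of class labels: 1 - sum(p_k^2)
gini_impurity <- function(labels) {
  p <- table(labels) / length(labels)   # class proportions p_k
  1 - sum(p^2)
}

gini_impurity(c("B", "B", "G", "G"))  # evenly mixed node -> 0.5
gini_impurity(rep("B", 4))            # pure node -> 0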
Measures of Decision Trees (Contd.)

Deviance:

D = −2 (n_B log p_B + n_G log p_G)

where n_B and n_G are the counts of the two classes in a node, and p_B and p_G are their proportions.
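A matching R sketch (again an illustration; 0 · log 0 is treated as 0 so a pure node gets deviance 0):

# Deviance of a binary node: -2 * (n_B*log(p_B) + n_G*log(p_G))
node_deviance <- function(labels) {
  n <- table(labels)                       # class counts n_B, n_G
  p <- n / sum(n)                          # class proportions p_B, p_G
  -2 * sum(ifelse(n > 0, n * log(p), 0))   # treat 0 * log(0) as 0
}

node_deviance(c("B", "B", "G"))  # mixed node -> positive deviance
node_deviance(rep("B", 4))       # pure node  -> 0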


CHAID (Chi-square Automatic Interaction
Detector)

• Used to discover relationships between variables.
• Performs multi-level (not just binary) splits.
• Nominal, ordinal, and continuous data can be used; continuous predictors are first binned into categories with approximately equal numbers of observations.
• Creates all possible cross-tabulations for each categorical predictor until the best outcome is achieved and no further splitting can be performed.
• Well suited for large data sets.
• Commonly used for marketing segmentation. A short R sketch follows this list.
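As an illustration (not part of the slides), CHAID is available in R through the CHAID package hosted on R-Forge rather than CRAN; the chaid() call and the bundled USvote data below are assumptions based on that package:

# Sketch assuming the CHAID package from R-Forge (not on CRAN):
# install.packages("CHAID", repos = "http://R-Forge.R-project.org")
library(CHAID)

# CHAID works on factors, so continuous predictors must be binned first.
data("USvote", package = "CHAID")       # example data shipped with the package
fit <- chaid(vote3 ~ ., data = USvote)  # multi-way splits chosen by chi-square tests
plot(fit)                               # visualise the segmentation tree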
CHAID (Contd.)

• We can visually inspect the relationships between the split variables and the associated factors within the tree.
• The development of the decision, or classification, tree starts with identifying the target (dependent) variable, which is considered the root.
• The target is split into two or more categories, called the initial or parent nodes; these nodes are then split into child nodes using statistical tests.
Benefit of CHAID

• Unlike regression analysis, the CHAID technique does not require the data to be normally distributed.
Merging in CHAID

• In CHAID analysis, if the dependent variable is continuous, the F test is used. The F test checks whether the variances of two samples are equal; in R this is done with the var.test() function.
• If the dependent variable is categorical, the chi-square test is used.
• Each pair of predictor categories is assessed to determine which pair is least significantly different with respect to the dependent variable; such pairs are merged.
• Because of these merging steps, a Bonferroni-adjusted p-value is calculated for the merged cross-tabulation. Both tests are sketched in R below.
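A minimal base-R illustration of the two tests mentioned above, on toy data (the variable names are made up for the example):

# Continuous dependent variable: F test for equality of two variances
x <- rnorm(50, mean = 10, sd = 2)
y <- rnorm(50, mean = 10, sd = 3)
var.test(x, y)   # H0: the variances of x and y are equal

# Categorical dependent variable: chi-square test on a cross-tabulation
pred    <- factor(sample(c("A", "B", "C"), 200, replace = TRUE))
outcome <- factor(sample(c("Yes", "No"),   200, replace = TRUE))
chisq.test(table(pred, outcome))  # H0: predictor and outcome are independent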
CHAID Components

• Root Node: contains the dependent, or target, variable.
• Parent Node: the algorithm splits the target variable into two or more categories; these are called parent, or initial, nodes.
• Child Node: independent-variable categories that appear below the parent categories in the CHAID analysis tree are called child nodes.
• Terminal Node: the last categories of the CHAID analysis tree are called terminal nodes. In the tree, the category with the greatest influence on the dependent variable comes first and less important categories come later; hence the name terminal node.
CART (Classification and Regression Trees)

• Classifies objects and predicts outcomes by selecting, from a large number of variables, the ones most important in determining the outcome variable.
• CART analysis is a form of binary recursive partitioning; a short R sketch follows.
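As an illustration (not part of the slides), CART is implemented in R by the rpart package on CRAN:

library(rpart)

# Classification tree on the built-in iris data
fit <- rpart(Species ~ ., data = iris, method = "class")
print(fit)
plot(fit); text(fit, use.n = TRUE)   # draw the tree with node counts

# For a continuous outcome, method = "anova" grows a regression tree
fit_reg <- rpart(mpg ~ ., data = mtcars, method = "anova")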
Strengths of CART

• No distributional assumptions are required.
• No assumption of homogeneity.
• The explanatory variables can be a mixture of categorical, interval, and continuous variables.
• Especially good for high-dimensional and large data sets; it can produce useful results from a few important variables.
• Largely unaffected by outliers, collinearity, or heteroscedasticity.
Weakness of CART

• Not based on a probabilistic model, so there is no confidence interval attached to its predictions.
The best part of CART

• Sophisticated methods of dealing with missing values.
• CART does not drop cases with missing values.
• It follows the concept of SURROGATE SPLITS (see the rpart sketch below):
  – A measure of similarity between two splits is defined.
  – If the best split is s on some variable, find the split s' on each other variable that is most similar to s; then find the second best, and so on.
  – If a case is missing the split variable, the surrogate is used instead.
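A sketch of how surrogate splits surface in rpart (the control options below are part of rpart.control):

library(rpart)

fit <- rpart(
  Species ~ ., data = iris, method = "class",
  control = rpart.control(
    usesurrogate = 2,  # use surrogates; cases lacking one follow the majority
    maxsurrogate = 5   # surrogate splits retained per primary split
  )
)
summary(fit)  # the printout lists the surrogate splits at each node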
Steps of Tree Building

• Start by splitting a variable at each of its split points; the sample splits into two binary nodes at each split point.
• Select the best split of the variable in terms of the reduction in impurity.
• Repeat steps 1 and 2 for all variables at the root node.
• Rank all of the best splits and select the variable that achieves the highest purity at the root.
Steps of Tree Building (Contd.)

• Assign classes to the nodes according to the rule that minimises misclassification errors.
• Repeat steps 1-5 for each non-terminal node.
• Grow a very large tree T_max until all terminal nodes are either small, pure, or contain identical measurement vectors (see the sketch below).
• Prune and choose the final tree using cross-validation.
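For instance, a deliberately overgrown tree T_max can be requested from rpart by turning off the complexity penalty (an illustration, not the slides' own code):

library(rpart)

big_tree <- rpart(
  Species ~ ., data = iris, method = "class",
  control = rpart.control(
    cp = 0,        # no complexity penalty: keep splitting while impurity drops
    minsplit = 2,  # allow splits on very small nodes
    xval = 10      # 10-fold cross-validation, used for pruning later
  )
)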
What happens in pruning??

• CART lets the tree grow to its full extent, then prunes it back.
• The idea is to find the point at which the validation error begins to rise.
• Successively smaller trees are generated by pruning leaves.
• At each pruning stage, multiple trees are possible; a pruning sketch follows.
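Continuing the rpart sketch above, cost-complexity pruning picks the complexity parameter with the smallest cross-validated error (xerror) and prunes to it:

library(rpart)
big_tree <- rpart(Species ~ ., data = iris, method = "class",
                  control = rpart.control(cp = 0, minsplit = 2, xval = 10))

printcp(big_tree)  # table of cp values with cross-validated error (xerror)

# Choose the cp whose cross-validated error is smallest, then prune
best_cp <- big_tree$cptable[which.min(big_tree$cptable[, "xerror"]), "CP"]
pruned  <- prune(big_tree, cp = best_cp)

plot(pruned); text(pruned, use.n = TRUE)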
What do I do before PRUNING?

• We will cover the pruning concept later, when we discuss RANDOM FORESTS… Till then,

CHILL
Random Forest
