Title: Implementation of Decision Tree Classification: Department of Computer Science and Engineering


Department of Computer Science and Engineering

Title: Implementation of Decision Tree Classification

Green University of Bangladesh


1 Objective(s)
• To create a training model that can be used to predict the class or value of the target variable by learning
simple decision rules inferred from prior data (training data).
• To apply Decision Tree Classification to a real-world predictive modeling problem.

2 Problem analysis
A decision tree is a supervised learning technique that can be used for both classification and regression
problems, although it is most often used for classification. It is a tree-structured classifier in which
internal nodes represent the features of a dataset, branches represent the decision rules, and each leaf node
represents an outcome.

• A decision tree has two types of nodes: decision nodes and leaf nodes. Decision nodes
are used to make a decision and have multiple branches, whereas leaf nodes are the outputs of those
decisions and do not contain any further branches.

• The decisions or tests are performed on the basis of the features of the given dataset.
• It is a graphical representation for getting all the possible solutions to a problem/decision based on given
conditions.

• It is called a decision tree because, similar to a tree, it starts with the root node, which expands into further
branches and forms a tree-like structure.
• To build a tree, we use the CART algorithm, which stands for Classification and Regression Tree
algorithm.
• A decision tree simply asks a question and, based on the answer (Yes/No), further splits the tree into
subtrees.
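
The node types described above can be sketched in a few lines of Python. This is only an illustrative sketch: the `Node` class, its field names, and the toy tree are assumptions made for this example and are not part of scikit-learn or of the lab code.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    """A tree node: a decision node if `feature` is set, otherwise a leaf."""
    feature: Optional[int] = None      # index of the feature tested (decision node)
    threshold: Optional[float] = None  # question asked: x[feature] <= threshold ?
    yes: Optional["Node"] = None       # subtree followed when the answer is Yes
    no: Optional["Node"] = None        # subtree followed when the answer is No
    label: Optional[int] = None        # predicted class (leaf node only)


def predict(node: Node, x: Sequence[float]) -> int:
    """Walk from the root to a leaf, answering one Yes/No question per level."""
    while node.label is None:
        node = node.yes if x[node.feature] <= node.threshold else node.no
    return node.label


# Hypothetical tree: the root tests feature 0, its "No" branch tests feature 1.
root = Node(
    feature=0, threshold=5.0,
    yes=Node(label=0),
    no=Node(feature=1, threshold=2.0, yes=Node(label=0), no=Node(label=1)),
)

print(predict(root, [3.0, 4.0]))  # 3.0 <= 5.0, so the first leaf (label 0) is reached
print(predict(root, [7.0, 3.5]))  # 7.0 > 5.0, then 3.5 > 2.0, so label 1 is reached
```

Each decision node holds exactly one question, and prediction is a single walk from the root to a leaf.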

The diagram below shows the general structure of a decision tree:

Figure 1: Structure of Decision Tree

© Dept. of Computer Science and Engineering, GUB


Example: Suppose a candidate has a job offer and wants to decide whether or not to accept
it. To solve this problem, the decision tree starts with the root node (the Salary attribute, chosen by an
attribute selection measure, ASM). The root node splits into the next decision node (distance from the office)
and one leaf node, based on the corresponding labels. That decision node in turn splits into one decision node
(Cab facility) and one leaf node. Finally, the last decision node splits into two leaf nodes (Accepted offer and
Declined offer). Consider the diagram below:

Figure 2: Decision Tree
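
The job-offer example can also be read as nested Yes/No questions. The sketch below is an assumption-laden illustration: the function name, the three boolean inputs, and the choice that every "No" answer leads to a declined offer are hypothetical, since the text does not state the thresholds or branch outcomes.

```python
def decide_offer(salary_ok: bool, office_near: bool, cab_provided: bool) -> str:
    """Hypothetical reading of the job-offer tree as nested Yes/No questions."""
    # Root node: Salary. Assumed: an unacceptable salary ends in a leaf.
    if not salary_ok:
        return "Declined offer"
    # Decision node: distance from the office. Assumed: too far ends in a leaf.
    if not office_near:
        return "Declined offer"
    # Decision node: Cab facility. Its two children are both leaf nodes.
    return "Accepted offer" if cab_provided else "Declined offer"


print(decide_offer(salary_ok=True, office_near=True, cab_provided=True))
print(decide_offer(salary_ok=True, office_near=False, cab_provided=True))
```

The depth of the nesting matches the depth of the tree: one question per decision node on the path from the root to a leaf.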



3 Flowchart

Figure 3: Flowchart of Decision Tree



4 Algorithm
Algorithm 1: Decision Tree Algorithm
Input: S, where S = set of classified instances
Output: Decision Tree
/* Algorithm for Decision Tree */
Procedure BUILD TREE
    repeat
        maxGain = 0
        splitA = null
        e = Entropy(S)    /* entropy of the class labels in S */
        for all attributes a in S do
            gain = InformationGain(a, e)
            if gain > maxGain then
                maxGain = gain
                splitA = a
            end if
        end for
        Partition(S, splitA)
    until all partitions processed
end procedure
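
The entropy and information-gain steps of the algorithm can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions: attributes are categorical, each instance is a dictionary, and the function names and toy data are hypothetical. It covers only the attribute-selection loop, not the recursive partitioning.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute, parent_entropy):
    """Entropy reduction obtained by partitioning on one categorical attribute."""
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    weighted = sum(len(p) / len(labels) * entropy(p) for p in partitions.values())
    return parent_entropy - weighted


def best_split(rows, labels):
    """Attribute-selection loop: return the attribute with the largest gain."""
    e = entropy(labels)
    max_gain, split_a = 0.0, None
    for a in rows[0]:  # iterate over the attribute names
        gain = information_gain(rows, labels, a, e)
        if gain > max_gain:
            max_gain, split_a = gain, a
    return split_a, max_gain


# Hypothetical toy data: each row is a dict of categorical attributes.
rows = [
    {"salary": "high", "cab": "yes"},
    {"salary": "high", "cab": "no"},
    {"salary": "low", "cab": "yes"},
    {"salary": "low", "cab": "no"},
]
labels = ["accept", "accept", "decline", "decline"]

print(best_split(rows, labels))  # "salary" separates the two classes perfectly
```

In this toy data, splitting on `salary` leaves two pure partitions, so its gain equals the full parent entropy of 1 bit, while splitting on `cab` gains nothing.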

5 The Decision Tree Classifier implementation for any sample data in Python
1 # Importing the libraries
2 import numpy as np
3 import matplotlib.pyplot as plt
4 import pandas as pd
5
6 # Importing the dataset
7 dataset = pd.read_csv('Social_Network_Ads.csv')
8 X = dataset.iloc[:, [2, 3]].values
9 y = dataset.iloc[:, 4].values
10
11 # Splitting the dataset into the Training set and Test set
12 from sklearn.model_selection import train_test_split
13 X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
14
15 from sklearn.tree import DecisionTreeClassifier
16 classifier = DecisionTreeClassifier(criterion='entropy', random_state=0)
17 # criterion='entropy': splits are chosen to reduce entropy (higher information gain, more homogeneous groups after each split)
18 classifier.fit(X_train, y_train)
19
20 # Predicting the Test set results
21 y_pred = classifier.predict(X_test)
22 # y_pred is the vector of predictions:
23 # one prediction for each test set observation,
24 # obtained with the predict method of the DecisionTreeClassifier class (classifier is an object of that class)
25
26 # Making the Confusion Matrix
27 from sklearn.metrics import confusion_matrix
28 # confusion_matrix is a function of the sklearn.metrics module
29 # naming convention: class names are capitalized, function names are lower-case
30 cm = confusion_matrix(y_test, y_pred)
31 # Signature: confusion_matrix(y_true, y_pred, *, labels=None, sample_weight=None, normalize=None)
32 # using the function to compute the confusion matrix
33 # cm = (example output; the exact values depend on the random train/test split)
34 # [62, 6],
35 # [ 3, 29]
36
37 from sklearn.metrics import ConfusionMatrixDisplay
38 disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=classifier.classes_)
39 disp.plot()
40
41 # Visualizing the results
42
43 print(X_test)
44
45 print(classifier.predict([[24, 57000]]))
46
47 print(classifier.predict([[26, 52000]]))
48
49 print(classifier.predict([[45, 26000]]))
50
51 from sklearn import tree
52 plt.figure(figsize=(16, 16))
53 tree.plot_tree(classifier)
54 plt.show()
55
56 import graphviz
57 dot_data = tree.export_graphviz(classifier, out_file=None,
58                                 filled=True, rounded=True,
59                                 feature_names=["Age", "Salary"],
60                                 class_names=["No", "Yes"],
61                                 special_characters=True)
62 graph = graphviz.Source(dot_data)
63 graph  # displays the tree inline when run in a Jupyter notebook
64
65 # Printing the learned decision rules as plain text
66 from sklearn.tree import export_text
67 r = export_text(classifier, feature_names=["Age", "Salary"])
68 print(r)
69
70 # Rendering the tree to a file with Graphviz
71 dot_data = tree.export_graphviz(classifier, out_file=None)
72 graph = graphviz.Source(dot_data)
73 graph.render("Social_Network_Ads")  # writes Social_Network_Ads.pdf
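
As a worked example, the confusion matrix quoted in the listing's comments ([62, 6], [3, 29]) can be turned into summary metrics by hand. The sketch below assumes scikit-learn's convention that rows are true classes and columns are predicted classes. The variable names are illustrative, and the numbers are only the example values from the comments, not a guaranteed result of a fresh run.

```python
# Example confusion matrix quoted in the listing's comments
# (rows = true class, columns = predicted class).
cm = [[62, 6],
      [3, 29]]

(tn, fp), (fn, tp) = cm
total = tn + fp + fn + tp

accuracy = (tp + tn) / total   # share of all predictions that are correct
precision = tp / (tp + fp)     # share of predicted positives that are truly positive
recall = tp / (tp + fn)        # share of true positives that were found

print(f"accuracy  = {accuracy:.2f}")   # 91 of 100 correct -> 0.91
print(f"precision = {precision:.2f}")  # 29 of 35 -> 0.83
print(f"recall    = {recall:.2f}")     # 29 of 32 -> 0.91
```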



6 Input/Output
The output of the program is given below.

Output of Line no. 45: [0].
Output of Line no. 47: [0].
Output of Line no. 49: [1].

7 Discussion & Conclusion


From this experiment, we learn how to train and visualize a Decision Tree classifier for any sample data.

8 Lab Task (Please implement yourself and show the output to the instructor)
1. Visualize the Decision Tree classifier for the following sample data, where each row is [feature 1, feature 2, class label]:

Dataset = [[3.393,2.331,0], [3.110,1.786,0], [1.348,3.309,0], [3.540,4.679,0], [2.284,2.892,0],
[7.429,4.622,1], [5.741,3.538,1], [9.176,2.510,1], [7.794,3.429,1], [7.939,0.791,1]]

Then test your implemented Decision Tree classifier with the first record, i.e., dataset[0], and
observe the predicted label.
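
One possible starting point for this task is a single-split "decision stump" that searches one feature for the threshold with the lowest weighted Gini impurity, the split criterion usually associated with CART. This is only a sketch under that assumption: the helper names `gini` and `best_threshold` are hypothetical, and a full solution still needs recursive splitting and a visualization.

```python
dataset = [[3.393, 2.331, 0], [3.110, 1.786, 0], [1.348, 3.309, 0],
           [3.540, 4.679, 0], [2.284, 2.892, 0], [7.429, 4.622, 1],
           [5.741, 3.538, 1], [9.176, 2.510, 1], [7.794, 3.429, 1],
           [7.939, 0.791, 1]]


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / total) ** 2 for c in set(labels))


def best_threshold(rows, feature):
    """Try each observed value as a threshold; return (weighted Gini, threshold)."""
    best = None
    for t in sorted({row[feature] for row in rows}):
        left = [row[-1] for row in rows if row[feature] <= t]
        right = [row[-1] for row in rows if row[feature] > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
        if best is None or score < best[0]:
            best = (score, t)
    return best


score, threshold = best_threshold(dataset, feature=0)
# On this data the two sides of the best split are pure, so the stump predicts
# class 0 at or below the threshold and class 1 above it.
predicted = 0 if dataset[0][0] <= threshold else 1
print(f"split on feature 0 at <= {threshold} (weighted Gini {score})")
print(f"dataset[0] -> predicted {predicted}, actual {dataset[0][-1]}")
```

On this small dataset, the first feature alone separates the two classes, so one split is enough. Real datasets generally need the split search repeated on each partition.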

9 Lab Exercise (Submit as a report)


• Visualize the Decision Tree classifier for the "Social Network Ads.csv" dataset after applying
the dummy-variable function (pd.get_dummies) to the dataset.

This dataset contains 400 records.

9.1 Problem Analysis

# Importing the dataset
import pandas as pd
dataset = pd.read_csv('Social_Network_Ads.csv')
dataset.head()

The output is:

Figure 4: Output of the program



# Creating dummy variables for the dataset
dataset = pd.get_dummies(dataset)
dataset.head()

The output is:

Figure 5: Output of the program
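
To illustrate what dummy (one-hot) encoding does, the pure-Python sketch below replaces one categorical column with one indicator column per category, named `column_value` in the same style as `pd.get_dummies`. The `one_hot` helper and the two sample rows are hypothetical and are not taken from the lab dataset. Depending on the pandas version, `pd.get_dummies` may produce boolean rather than 0/1 values.

```python
def one_hot(records, column):
    """Replace a categorical column with one 0/1 indicator column per category."""
    categories = sorted({r[column] for r in records})
    encoded = []
    for r in records:
        # Keep every other column unchanged.
        row = {k: v for k, v in r.items() if k != column}
        # Add one indicator column per category, named "<column>_<category>".
        for c in categories:
            row[f"{column}_{c}"] = int(r[column] == c)
        encoded.append(row)
    return encoded


# Hypothetical rows; the column names and values are illustrative assumptions.
records = [
    {"Gender": "Male", "Age": 19},
    {"Gender": "Female", "Age": 26},
]

for row in one_hot(records, "Gender"):
    print(row)
```

After encoding, every column is numeric, which is what the decision tree classifier needs as input.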

10 Policy
Copying from the internet, classmates, seniors, or any other source is strictly prohibited. 100% of the marks will be
deducted if any such copying is detected.
