Lab Program 3

Machine Learning Laboratory 15CSL76
3. Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use
an appropriate data set for building the decision tree and apply this knowledge to classify a new
sample.
ID3 Algorithm
ID3(Examples, Target_attribute, Attributes)
Examples are the training examples. Target_attribute is the attribute whose value is to
be predicted by the tree. Attributes is a list of other attributes that may be tested by the
learned decision tree. Returns a decision tree that correctly classifies the given
Examples.
  • Create a Root node for the tree
  • If all Examples are positive, Return the single-node tree Root, with label = +
  • If all Examples are negative, Return the single-node tree Root, with label = -
  • If Attributes is empty, Return the single-node tree Root, with label = most common value
    of Target_attribute in Examples
  • Otherwise Begin
      • A ← the attribute from Attributes that best* classifies Examples
      • The decision attribute for Root ← A
      • For each possible value, vi, of A,
          • Add a new tree branch below Root, corresponding to the test A = vi
          • Let Examples_vi be the subset of Examples that have value vi for A
          • If Examples_vi is empty
              • Then below this new branch add a leaf node with label = most common
                value of Target_attribute in Examples
              • Else below this new branch add the subtree
                ID3(Examples_vi, Target_attribute, Attributes - {A})
    End
  • Return Root
* The best attribute is the one with highest information gain
1 Deepak D, Assistant Professor, Dept. of CS&E, Canara Engineering College,

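ENTROPY:
Entropy measures the impurity of a collection of examples. For a collection S that
contains positive and negative examples,

$$Entropy(S) = -p_{+} \log_2 p_{+} - p_{-} \log_2 p_{-}$$

where p+ is the proportion of positive examples in S and
p- is the proportion of negative examples in S. The term 0 log2 0 is taken to be 0.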
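As a quick check of the definition, the sketch below computes the entropy -(p+)log2(p+) - (p-)log2(p-) for the PlayTennis column of the training dataset used in this lab, which has 9 Yes and 5 No examples. The function name binary_entropy is only illustrative and is not part of the lab program.

```python
import math

def binary_entropy(pos, neg):
    """Entropy of a collection with pos positive and neg negative examples."""
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count:                      # treat 0 * log2(0) as 0
            p = count / total
            result -= p * math.log2(p)
    return result

# PlayTennis column of the training dataset: 9 Yes, 5 No
print(round(binary_entropy(9, 5), 3))   # 0.94
print(binary_entropy(7, 7))             # 1.0 (evenly mixed collection)
print(binary_entropy(14, 0))            # 0.0 (pure collection)
```

Entropy is 0 for a pure collection and reaches its maximum of 1 when the two classes are equally frequent.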
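INFORMATION GAIN:
  • Information gain is the expected reduction in entropy caused by partitioning the
    examples according to an attribute.
  • The information gain, Gain(S, A), of an attribute A relative to a collection of examples
    S is defined as

$$Gain(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} \, Entropy(S_v)$$

where Values(A) is the set of all possible values of A, and S_v is the subset of S for
which attribute A has value v.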
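To make the definition concrete, the following sketch computes the gain of each attribute on the 14-example training dataset of this lab: the entropy of the whole collection minus the size-weighted entropy of the subsets produced by the split. The names ROWS, entropy_of and information_gain are illustrative assumptions and are separate from the lab program.

```python
import math
from collections import Counter

# Training dataset of this lab: (Outlook, Temperature, Humidity, Wind, PlayTennis)
ROWS = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]
ATTRIBUTES = ["Outlook", "Temperature", "Humidity", "Wind"]

def entropy_of(labels):
    """Entropy of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, col):
    """Gain for the attribute in column col; the class label is the last column."""
    gain = entropy_of([r[-1] for r in rows])
    for value in set(r[col] for r in rows):
        subset = [r[-1] for r in rows if r[col] == value]
        gain -= len(subset) / len(rows) * entropy_of(subset)
    return gain

gains = {name: information_gain(ROWS, i) for i, name in enumerate(ATTRIBUTES)}
for name, g in gains.items():
    print(f"{name:12s} {g:.3f}")
print("Best attribute:", max(gains, key=gains.get))
```

Outlook has the largest gain (about 0.247), which is why it ends up at the root of the tree built for this dataset.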
Training Dataset:
Day   Outlook    Temperature   Humidity   Wind     PlayTennis
D1    Sunny      Hot           High       Weak     No
D2    Sunny      Hot           High       Strong   No
D3    Overcast   Hot           High       Weak     Yes
D4    Rain       Mild          High       Weak     Yes
D5    Rain       Cool          Normal     Weak     Yes
D6    Rain       Cool          Normal     Strong   No
D7    Overcast   Cool          Normal     Strong   Yes
D8    Sunny      Mild          High       Weak     No
D9    Sunny      Cool          Normal     Weak     Yes
D10   Rain       Mild          Normal     Weak     Yes
D11   Sunny      Mild          Normal     Strong   Yes
D12   Overcast   Mild          High       Strong   Yes
D13   Overcast   Hot           Normal     Weak     Yes
D14   Rain       Mild          High       Strong   No
Test Dataset:
Day   Outlook    Temperature   Humidity   Wind
T1    Rain       Cool          Normal     Strong
T2    Sunny      Mild          Normal     Strong
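The program reads its input from data3.csv and data3_test.csv, but the handout does not list the contents of these files. The sketch below writes both files under three assumptions inferred from the sample output: the Day column is left out (each test instance is printed with four values), the values are lowercase, and the first row holds the column names. The helper name write_csv is hypothetical.

```python
import csv

# Assumed layout: no Day column, lowercase values, header row first.
HEADER = ["Outlook", "Temperature", "Humidity", "Wind", "PlayTennis"]
TRAIN_ROWS = [line.split() for line in """\
sunny hot high weak no
sunny hot high strong no
overcast hot high weak yes
rain mild high weak yes
rain cool normal weak yes
rain cool normal strong no
overcast cool normal strong yes
sunny mild high weak no
sunny cool normal weak yes
rain mild normal weak yes
sunny mild normal strong yes
overcast mild high strong yes
overcast hot normal weak yes
rain mild high strong no""".splitlines()]
TEST_ROWS = [["rain", "cool", "normal", "strong"],
             ["sunny", "mild", "normal", "strong"]]

def write_csv(filename, header, rows):
    """Write a header row followed by the data rows."""
    with open(filename, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)

write_csv("data3.csv", HEADER, TRAIN_ROWS)
# The test file has no PlayTennis column, since that is the value to predict.
write_csv("data3_test.csv", HEADER[:-1], TEST_ROWS)
```

The class label must be the last column of data3.csv, because the program reads it with row[-1].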

Program:
import math
import csv

def load_csv(filename):
    # Read the CSV file; the first row holds the column headers.
    with open(filename, "r", newline="") as f:
        dataset = list(csv.reader(f))
    headers = dataset.pop(0)
    return dataset, headers

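class Node:
    def __init__(self, attribute):
        self.attribute = attribute   # attribute tested at this node ("" for a leaf)
        self.children = []           # list of (attribute value, child Node) pairs
        self.answer = ""             # class label, set only for leaf nodes
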
def subtables(data, col, delete):
    # Split data into one subtable per distinct value found in column col.
    # If delete is True, column col is removed from the rows of each subtable.
    coldata = [row[col] for row in data]
    attr = list(set(coldata))        # distinct values; set order is arbitrary
    dic = {}
    for value in attr:
        rows = [row for row in data if row[col] == value]
        if delete:
            # Build new rows instead of deleting in place, so data is not modified.
            rows = [row[:col] + row[col + 1:] for row in rows]
        dic[value] = rows
    return attr, dic

def entropy(S):
    # Entropy of a list of class labels: -sum(p * log2(p)) over the distinct labels.
    attr = list(set(S))
    if len(attr) <= 1:
        return 0
    sums = 0
    for value in attr:
        p = S.count(value) / (len(S) * 1.0)
        sums += -1 * p * math.log(p, 2)
    return sums

def compute_gain(data, col):
    # Information gain obtained by splitting data on column col.
    attr, dic = subtables(data, col, delete=False)
    total_size = len(data)
    entropies = [0] * len(attr)
    ratio = [0] * len(attr)
    total_entropy = entropy([row[-1] for row in data])
    for x in range(len(attr)):
        ratio[x] = len(dic[attr[x]]) / (total_size * 1.0)
        entropies[x] = entropy([row[-1] for row in dic[attr[x]]])
        total_entropy -= ratio[x] * entropies[x]
    return total_entropy

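def build_tree(data, features):
    lastcol = [row[-1] for row in data]
    # All examples have the same label: return a leaf with that label.
    if len(set(lastcol)) == 1:
        node = Node("")
        node.answer = lastcol[0]
        return node
    n = len(data[0]) - 1
    # No attributes left to test: return a leaf with the most common label.
    if n == 0:
        node = Node("")
        node.answer = max(set(lastcol), key=lastcol.count)
        return node
    # Split on the attribute with the highest information gain.
    gains = [0] * n
    for col in range(n):
        gains[col] = compute_gain(data, col)
    split = gains.index(max(gains))
    node = Node(features[split])
    fea = features[:split] + features[split + 1:]
    attr, dic = subtables(data, split, delete=True)
    for x in range(len(attr)):
        child = build_tree(dic[attr[x]], fea)
        node.children.append((attr[x], child))
    return node
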
def print_tree(node, level):
    # Print the tree, indenting each level by one more space.
    if node.answer != "":
        print(" " * level, node.answer)
        return
    print(" " * level, node.attribute)
    for value, n in node.children:
        print(" " * (level + 1), value)
        print_tree(n, level + 2)

def classify(node, x_test, features):
    # Follow the branch that matches x_test until a leaf is reached, then print its label.
    if node.answer != "":
        print(node.answer)
        return
    pos = features.index(node.attribute)
    for value, n in node.children:
        if x_test[pos] == value:
            classify(n, x_test, features)

# Main program
dataset, features = load_csv("data3.csv")
node1 = build_tree(dataset, features)
print("The decision tree for the dataset using ID3 algorithm is")
print_tree(node1, 0)
testdata, features = load_csv("data3_test.csv")
for xtest in testdata:
    print("The test instance:", xtest)
    print("The label for test instance:", end=" ")
    classify(node1, xtest, features)

Output:
The decision tree for the dataset using ID3 algorithm is
 Outlook
  rain
   Wind
    strong
     no
    weak
     yes
  overcast
   yes
  sunny
   Humidity
    normal
     yes
    high
     no
The test instance: ['rain', 'cool', 'normal', 'strong']
The label for test instance: no
The test instance: ['sunny', 'mild', 'normal', 'strong']
The label for test instance: yes
Note: the order of the branches can differ from run to run, because the attribute
values are collected with set().
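The printed tree can also be read as a set of nested rules. The sketch below hard-codes the tree from the sample output as a nested dictionary and walks it for the two test instances. The dictionary form and the names TREE, FEATURES and predict are illustrative assumptions; the lab program itself builds Node objects. Unlike classify, predict returns the label, and it returns None when an attribute value has no matching branch.

```python
# Tree from the sample output, written as {attribute: {value: subtree or label}}
TREE = {"Outlook": {
    "rain": {"Wind": {"strong": "no", "weak": "yes"}},
    "overcast": "yes",
    "sunny": {"Humidity": {"normal": "yes", "high": "no"}},
}}
FEATURES = ["Outlook", "Temperature", "Humidity", "Wind"]

def predict(tree, instance, features):
    """Follow the branches that match instance until a leaf label is reached."""
    while isinstance(tree, dict):
        attribute, branches = next(iter(tree.items()))
        value = instance[features.index(attribute)]
        if value not in branches:
            return None            # value has no branch at this node
        tree = branches[value]
    return tree

for x in (["rain", "cool", "normal", "strong"],
          ["sunny", "mild", "normal", "strong"]):
    print(x, "->", predict(TREE, x, FEATURES))
```

Both instances get the same labels as in the output above: no for the first and yes for the second.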
