ML Lab Manual
SCHOOL OF ENGINEERING
PRACTICAL RECORD
Course:______________Branch:____________Reg.No:______________________
Name of Laboratory:___________________________________________________
ST.MARY’S GROUP OF INSTITUTIONS GUNTUR
(Approved by AICTE & Govt. of AP, Affiliated to JNTU-KAKINADA, Accredited by 'NAAC')
Chebrolu (V&M), Guntur (Dist), Andhra Pradesh, INDIA-522212
SCHOOL OF ENGINEERING
Certificate
This is to certify that Mr. / Ms. _____________________________ has completed the practical work recorded in this manual during the academic year ________.
Signature of HOD
Index
S.No    Name of the Program    Page No.    Date    Signature of the Faculty
MACHINE LEARNING LAB MANUAL
Experiment-1: Implement and demonstrate the FIND-S algorithm for finding the most specific hypothesis based on a given set of training data samples. Read the training data from a .CSV file.
FIND-S Algorithm
1. Initialize h to the most specific hypothesis in H
2. For each positive training instance x
       For each attribute constraint ai in h
           If the constraint ai is satisfied by x
               Then do nothing
           Else replace ai in h by the next more general constraint that is satisfied by x
3. Output hypothesis h
Training Examples:
Example  Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
1        Sunny  Warm     Normal    Strong  Warm   Same      Yes
2        Sunny  Warm     High      Strong  Warm   Same      Yes
3        Rainy  Cold     High      Strong  Warm   Change    No
4        Sunny  Warm     High      Strong  Cool   Change    Yes
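The program below reads its rows from enjoysport.csv. Judging by the lowercase values echoed in the Output section, the file contents would look as follows (the exact file is an assumption, not reproduced in the manual):

sunny,warm,normal,strong,warm,same,yes
sunny,warm,high,strong,warm,same,yes
rainy,cold,high,strong,warm,change,no
sunny,warm,high,strong,cool,change,yes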
Program:
import csv

num_attributes = 6
a = []

print("\n The Given Training Data Set \n")
# Read every training row from the CSV file into the list a
with open('enjoysport.csv', 'r') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        a.append(row)
        print(row)

print("\n The initial value of hypothesis: ")
hypothesis = ['0'] * num_attributes
print(hypothesis)

# Start from the first training instance as the most specific hypothesis
for j in range(0, num_attributes):
    hypothesis[j] = a[0][j]

print("\n Find S: Finding a Maximally Specific Hypothesis\n")
for i in range(0, len(a)):
    # Only positive examples ('yes' in the last column) generalize h
    if a[i][num_attributes] == 'yes':
        for j in range(0, num_attributes):
            if a[i][j] != hypothesis[j]:
                hypothesis[j] = '?'
            else:
                hypothesis[j] = a[i][j]
    print(" For Training Example No:{0} the hypothesis is ".format(i), hypothesis)

print("\n The Maximally Specific Hypothesis for the given Training Examples :\n")
print(hypothesis)
Output:
The Given Training Data Set
['sunny', 'warm', 'normal', 'strong', 'warm', 'same', 'yes']
['sunny', 'warm', 'high', 'strong', 'warm', 'same', 'yes']
['rainy', 'cold', 'high', 'strong', 'warm', 'change', 'no']
['sunny', 'warm', 'high', 'strong', 'cool', 'change', 'yes']
The initial value of hypothesis:
['0', '0', '0', '0', '0', '0']
Find S: Finding a Maximally Specific Hypothesis
For Training Example No:0 the hypothesis is
['sunny', 'warm', 'normal', 'strong', 'warm', 'same']
For Training Example No:1 the hypothesis is
['sunny', 'warm', '?', 'strong', 'warm', 'same']
For Training Example No:2 the hypothesis is
['sunny', 'warm', '?', 'strong', 'warm', 'same']
For Training Example No:3 the hypothesis is
['sunny', 'warm', '?', 'strong', '?', '?']
The Maximally Specific Hypothesis for the given Training Examples:
['sunny', 'warm', '?', 'strong', '?', '?']
Experiment-2: For a given set of training data examples stored in a .CSV file, implement and demonstrate the Candidate-Elimination algorithm to output a description of the set of all hypotheses consistent with the training examples.
Candidate-Elimination Algorithm
• Initialize G to the set of maximally general hypotheses in H
• Initialize S to the set of maximally specific hypotheses in H
• For each training example d:
• If d is a positive example
• Remove from G any hypothesis inconsistent with d
• For each hypothesis s in S that is not consistent with d
• Remove s from S
• Add to S all minimal generalizations h of s such that
• h is consistent with d, and some member of G is more general than h
• Remove from S any hypothesis that is more general than another hypothesis in S
• If d is a negative example
• Remove from S any hypothesis inconsistent with d
• For each hypothesis g in G that is not consistent with d
• Remove g from G
• Add to G all minimal specializations h of g such that
• h is consistent with d, and some member of S is more specific than h
• Remove from G any hypothesis that is less general than another hypothesis in G
Program:
import numpy as np
import pandas as pd

# Load the training data; pd.read_csv treats the first line as a header row
data = pd.read_csv('enjoysport.csv')

# All columns except the last hold attribute values
concepts = np.array(data.iloc[:, 0:-1])
print(concepts)

# The last column is the target label
target = np.array(data.iloc[:, -1])
print(target)

def learn(concepts, target):
    # Start with the first example as the specific boundary
    specific_h = concepts[0].copy()
    print("initialization of specific_h and general_h")
    print(specific_h)
    general_h = [["?" for i in range(len(specific_h))] for i in range(len(specific_h))]
    print(general_h)
    for i, h in enumerate(concepts):
        if target[i] == "yes":
            # Positive example: generalize specific_h where it disagrees
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    specific_h[x] = '?'
                    general_h[x][x] = '?'
        if target[i] == "no":
            # Negative example: specialize general_h using specific_h
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    general_h[x][x] = specific_h[x]
                else:
                    general_h[x][x] = '?'
        print(" steps of Candidate Elimination Algorithm", i + 1)
        print(specific_h)
        print(general_h)
    # Drop the fully general rows left over in general_h
    indices = [i for i, val in enumerate(general_h) if val == ['?', '?', '?', '?', '?', '?']]
    for i in indices:
        general_h.remove(['?', '?', '?', '?', '?', '?'])
    return specific_h, general_h

s_final, g_final = learn(concepts, target)
print("Final Specific_h:", s_final, sep="\n")
print("Final General_h:", g_final, sep="\n")
Program:
import pandas as pd
import math
import numpy as np

# Load the play-tennis style data; the target column is named "answer"
data = pd.read_csv("Dataset/4-dataset.csv")
features = [feat for feat in data]
features.remove("answer")
Create a class named Node with four members: children, value, isLeaf and pred.
class Node:
    def __init__(self):
        self.children = []   # child nodes (subtrees or leaves)
        self.value = ""      # attribute name, or attribute value on a branch
        self.isLeaf = False  # True when the node carries a final prediction
        self.pred = ""       # predicted class label at a leaf
Define a function called entropy to find the entropy of the dataset.
def entropy(examples):
    pos = 0.0
    neg = 0.0
    # Count positive and negative examples by the "answer" column
    for _, row in examples.iterrows():
        if row["answer"] == "yes":
            pos += 1
        else:
            neg += 1
    # A pure set has zero entropy
    if pos == 0.0 or neg == 0.0:
        return 0.0
    else:
        p = pos / (pos + neg)
        n = neg / (pos + neg)
        return -(p * math.log(p, 2) + n * math.log(n, 2))
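The listing omits the info_gain helper that the ID3 function below calls. A minimal sketch consistent with the entropy function above (the exact implementation is an assumption, not taken from the manual):

def info_gain(examples, attr):
    # Information gain = entropy of the set minus the weighted
    # entropy of the subsets produced by splitting on attr
    uniq = np.unique(examples[attr])
    gain = entropy(examples)
    for u in uniq:
        subdata = examples[examples[attr] == u]
        gain -= (float(len(subdata)) / float(len(examples))) * entropy(subdata)
    return gain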
Define a function named ID3 to get the decision tree for the given dataset.
def ID3(examples, attrs):
    root = Node()

    # Choose the attribute with the highest information gain
    max_gain = 0
    max_feat = ""
    for feature in attrs:
        gain = info_gain(examples, feature)
        if gain > max_gain:
            max_gain = gain
            max_feat = feature
    root.value = max_feat

    # Grow one branch per value of the chosen attribute
    uniq = np.unique(examples[max_feat])
    for u in uniq:
        subdata = examples[examples[max_feat] == u]
        if entropy(subdata) == 0.0:
            # Pure subset: attach a leaf carrying the class label
            newNode = Node()
            newNode.isLeaf = True
            newNode.value = u
            newNode.pred = np.unique(subdata["answer"])
            root.children.append(newNode)
        else:
            # Mixed subset: recurse on the remaining attributes
            dummyNode = Node()
            dummyNode.value = u
            new_attrs = attrs.copy()
            new_attrs.remove(max_feat)
            child = ID3(subdata, new_attrs)
            dummyNode.children.append(child)
            root.children.append(dummyNode)
    return root
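The tree printout and the prediction shown in the Output below come from helper routines that the listing omits. A minimal sketch of both is given here; the names printTree and classify, and their details, are assumptions:

def printTree(root, depth=0):
    # Indent two spaces per level; leaves show their predicted label
    print("  " * depth, end="")
    if root.isLeaf:
        print(root.value, "->", root.pred)
    else:
        print(root.value)
        for child in root.children:
            printTree(child, depth + 1)

def classify(root, new):
    # Walk down the branch whose value matches the example's attribute value
    for child in root.children:
        if child.value == new[root.value]:
            if child.isLeaf:
                return child.pred
            return classify(child.children[0], new)

root = ID3(data, features)
printTree(root)
print("------------------")
new = {'outlook': 'sunny', 'temperature': 'hot', 'humidity': 'normal', 'wind': 'strong'}
print("Predicted Label for new example", new, "is:", classify(root, new))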
Output:
rain
wind
strong -> ['no']
sunny
humidity
high -> ['no']
------------------
Predicted Label for new example {'outlook': 'sunny', 'temperature': 'hot', 'humidity': 'normal', 'wind':
'strong'} is: ['yes']
Experiment-7:
Build an Artificial Neural Network by implementing the Back propagation algorithm and test the
same using appropriate data sets.
BACKPROPAGATION Algorithm
Training Examples:
Example  Sleep  Study  Expected % in Exams
1        2      9      92
2        1      5      86
3        3      6      89
Program:
import numpy as np

X = np.array(([2, 9], [1, 5], [3, 6]), dtype=float)
y = np.array(([92], [86], [89]), dtype=float)
X = X / np.amax(X, axis=0)  # normalise each feature column to [0, 1]
y = y / 100                 # normalise the target percentage to [0, 1]

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def derivatives_sigmoid(x):
    # Derivative of the sigmoid expressed in terms of its output
    return x * (1 - x)

# Variable initialization
epoch = 5000             # number of training iterations
lr = 0.1                 # learning rate
inputlayer_neurons = 2   # number of features in the data set
hiddenlayer_neurons = 3  # number of hidden layer neurons
output_neurons = 1       # number of neurons at the output layer

# Weight and bias initialization: uniform random values of dimension x*y
wh = np.random.uniform(size=(inputlayer_neurons, hiddenlayer_neurons))
bh = np.random.uniform(size=(1, hiddenlayer_neurons))
wout = np.random.uniform(size=(hiddenlayer_neurons, output_neurons))
bout = np.random.uniform(size=(1, output_neurons))

for i in range(epoch):
    # Forward propagation
    hinp = np.dot(X, wh) + bh
    hlayer_act = sigmoid(hinp)
    outinp = np.dot(hlayer_act, wout) + bout
    output = sigmoid(outinp)

    # Backpropagation of the error
    EO = y - output
    outgrad = derivatives_sigmoid(output)
    d_output = EO * outgrad
    EH = d_output.dot(wout.T)
    hiddengrad = derivatives_sigmoid(hlayer_act)
    d_hiddenlayer = EH * hiddengrad

    # Update weights with the gradient scaled by the learning rate
    wout += hlayer_act.T.dot(d_output) * lr
    wh += X.T.dot(d_hiddenlayer) * lr

print("Input: \n" + str(X))
print("Actual Output: \n" + str(y))
print("Predicted Output: \n", output)
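Note that derivatives_sigmoid is applied to the layer activations rather than to the pre-activation sums: since sigmoid'(z) = sigmoid(z)(1 - sigmoid(z)), the derivative can be computed directly from the outputs already produced in the forward pass.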
Output:
Input:
[[0.66666667 1.]
[0.33333333 0.55555556]
[1. 0.66666667]]
Actual Output:
[[0.92]
[0.86]
[0.89]]
Predicted Output:
[[0.89726759]
[0.87196896]
[0.9000671]]
Experiment-8:
Write a program to implement k-Nearest Neighbor algorithm to classify the iris data set. Print both
correct and wrong predictions.
Data Set:
Iris Plants Dataset: Dataset contains 150 instances (50 in each of three classes)
Number of Attributes: 4 numeric, predictive attributes and the Class
Program:
""" Iris Plants Dataset, dataset contains 150 (50 in each of three classes)Number of Attributes: 4
numeric, predictive attributes and the Class
"""
iris=datasets.load_iris()
""" The x variable contains the first four columns of the dataset(i.e. attributes) while y contains the
labels.
""" x = iris.data
y = iris.target
y_pred=classifier.predict(x_test)
""" For evaluating an algorithm, confusion matrix, precision, recalland f1 score are the most
commonly used metrics.
""" print('Confusion Matrix') print(confusion_matrix(y_test,y_pred)) print('Accuracy
Metrics') print(classification_report(y_test,y_pred))
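The experiment statement also asks for the correct and wrong predictions to be printed, which the listing above does not do. One way to add it with the variables already defined (a sketch, not from the manual):

# Print each test sample with its predicted and actual class,
# flagging whether the prediction was correct or wrong
for sample, actual, pred in zip(x_test, y_test, y_pred):
    status = 'Correct' if actual == pred else 'Wrong'
    print(status, '- sample:', sample,
          'actual:', iris.target_names[actual],
          'predicted:', iris.target_names[pred])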
Output:
Confusion Matrix
[[20  0  0]
 [ 0  1  0]
 [ 0  1 14]]
Accuracy Metrics
Experiment-9: Implement the non-parametric Locally Weighted Regression algorithm in order to fit data points. Select an appropriate data set for your experiment and draw graphs.
Program:
import numpy as np
from bokeh.plotting import figure, show, output_notebook
from bokeh.layouts import gridplot
from bokeh.io import push_notebook

def local_regression(x0, X, Y, tau):
    # add bias term so the local model has an intercept
    x0 = np.r_[1, x0]
    X = np.c_[np.ones(len(X)), X]
    # fit the locally weighted model by weighted least squares
    xw = X.T * radial_kernel(x0, X, tau)
    beta = np.linalg.pinv(xw @ X) @ xw @ Y
    # predict the value at x0
    return x0 @ beta

def radial_kernel(x0, X, tau):
    # Gaussian weights: points near x0 count more, controlled by tau
    return np.exp(np.sum((X - x0) ** 2, axis=1) / (-2 * tau * tau))

n = 1000
# generate dataset
X = np.linspace(-3, 3, num=n)
print("The Data Set (10 Samples) X :\n", X[1:10])
Y = np.log(np.abs(X ** 2 - 1) + .5)
print("The Fitting Curve Data Set (10 Samples) Y :\n", Y[1:10])
# jitter X
X += np.random.normal(scale=.1, size=n)
print("Normalised (10 Samples) X :\n", X[1:10])

domain = np.linspace(-3, 3, num=300)

def plot_lwr(tau):
    # fit the curve over the whole domain and plot it with the data
    prediction = [local_regression(x0, X, Y, tau) for x0 in domain]
    plot = figure(width=400, height=400)  # plot_width/plot_height on bokeh < 3
    plot.title.text = 'tau=%g' % tau
    plot.scatter(X, Y, alpha=.3)
    plot.line(domain, prediction, line_width=2, color='red')
    return plot

show(gridplot([
    [plot_lwr(10.), plot_lwr(1.)],
    [plot_lwr(0.1), plot_lwr(0.01)]]))
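The four panels make the role of tau visible: large values weight distant points almost equally, giving a smooth, nearly global fit, while small values weight only nearby points, so the curve follows local structure (and noise) more closely.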
Output:
Experiment-10:
Assuming a set of documents that need to be classified, use the naïve Bayesian Classifier model to
perform this task. Built-in Java classes/API can be used to write the program. Calculate the accuracy,
precision, and recall for your data set.
Data set:
Program:
print('\n The total number of Training Data :', ytrain.shape)
print('\n The total number of Test Data :', ytest.shape)
df = pd.DataFrame(xtrain_dtm.toarray(), columns=count_vect.get_feature_names())
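These lines are fragments of a larger pipeline that the manual does not reproduce. A minimal end-to-end sketch of such a document classifier, assuming a CSV file named document.csv with message and label columns (the file name and column names are assumptions, not from the manual):

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score

msg = pd.read_csv('document.csv', names=['message', 'label'])
msg['labelnum'] = msg.label.map({'pos': 1, 'neg': 0})
X, y = msg.message, msg.labelnum

xtrain, xtest, ytrain, ytest = train_test_split(X, y)
print('\n The total number of Training Data :', ytrain.shape)
print('\n The total number of Test Data :', ytest.shape)

# Turn each document into a bag-of-words count vector
count_vect = CountVectorizer()
xtrain_dtm = count_vect.fit_transform(xtrain)
xtest_dtm = count_vect.transform(xtest)
# get_feature_names_out on scikit-learn >= 1.0 (get_feature_names in the fragment above)
df = pd.DataFrame(xtrain_dtm.toarray(), columns=count_vect.get_feature_names_out())

clf = MultinomialNB().fit(xtrain_dtm, ytrain)
predicted = clf.predict(xtest_dtm)

print('Accuracy of the classifier is', accuracy_score(ytest, predicted))
print('Confusion matrix')
print(confusion_matrix(ytest, predicted))
print('The value of Precision', precision_score(ytest, predicted))
print('The value of Recall', recall_score(ytest, predicted))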
Output:
Accuracy of the classifier is 0.8
Confusion matrix
[[2 1]
 [0 2]]
The value of Precision 0.6666666666666666
The value of Recall 1.0
Experiment-11: Apply EM algorithm to cluster a Heart Disease Data Set. Use the same data set for
clustering using k-Means algorithm. Compare the results of these two algorithms and comment on the
quality of clustering. You can add Java/Python ML library classes/API in the program.
Program:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

iris = datasets.load_iris()
X = pd.DataFrame(iris.data)
X.columns = ['Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width']
y = pd.DataFrame(iris.target)
y.columns = ['Targets']

# k-Means clustering into three clusters
model = KMeans(n_clusters=3)
model.fit(X)

# EM clustering via a Gaussian mixture model
gmm = GaussianMixture(n_components=3)
gmm.fit(X)
y_gmm = gmm.predict(X)

colormap = np.array(['red', 'lime', 'black'])
plt.figure(figsize=(14, 7))
plt.subplot(2, 2, 3)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[y_gmm], s=40)
plt.title('GMM Classification')
plt.xlabel('Petal Length')
plt.ylabel('Petal Width')
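The task asks for the k-Means and EM results to be compared, but the listing plots only the GMM panel. Panels like the following (a sketch using the variables above, not from the manual) would complete the comparison:

# Plot the ground-truth classes and the k-Means clusters alongside
# the GMM panel for a visual comparison of clustering quality
plt.subplot(2, 2, 1)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[y.Targets], s=40)
plt.title('Real Classification')

plt.subplot(2, 2, 2)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[model.labels_], s=40)
plt.title('K Means Classification')

plt.show()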
Data Set:
Title: Heart Disease Databases
The Cleveland database contains 76 attributes, but all published experiments refer to using a subset of 14 of them. In particular, the Cleveland database is the only one that has been used by ML researchers to date. The "Heartdisease" field refers to the presence of heart disease in the patient. It is integer valued from 0 (no presence) to 4.
Database    0    1   2   3   4   Total
Cleveland:  164  55  36  35  13  303
Attribute Information: