AIML Lab
BACHELOR OF ENGINEERING
DEPARTMENT OF INFORMATION SCIENCE AND ENGINEERING
Course Objectives:
1. Implement supervised and unsupervised machine learning algorithms
2. Perform classification on the preprocessed dataset.
3. Implement the machine learning concepts and algorithms in Python Programming
Course Outcomes: At the end of the course, the student will be able to:
CO1 Understand and implement supervised and unsupervised machine learning algorithms
CO2 Analyze and implement machine learning algorithms on a given dataset
CO3 Construct the linear regression model as a method for prediction
CO4 Develop Bayesian concepts and clustering algorithms using Python programs
CO5 Design and implement decision trees using information gain and entropy calculations
CO6 Analyze and build artificial neural networks
PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12 PSO1 PSO2 PSO3
CO1 3 2 - - 3 - - - - - - 2 2 - -
CO2 3 3 - - 3 - - - - - - 2 2 - -
CO3 3 2 2 - 3 - - - - - - 2 2 - -
CO4 3 2 2 - 3 - - - - - - 2 2 - -
CO5 3 2 2 - 3 - - - - - - 2 2 - -
CO6 3 2 2 - 3 - - - - - - 2 2 - -
7. For the given table, write a Python program to perform K-Means Clustering. (3; CO2, CO4)
X1 3 1 1 2 1 6 6 6 5 6 7 8 9 8 9 9 8
X2 5 4 6 6 5 8 6 7 6 7 1 2 1 2 3 2 3
8. Apply the EM algorithm to cluster a set of data stored in a .CSV file. Use the same dataset for clustering using the k-Means algorithm. Compare the results of these two algorithms and comment on the quality of clustering. (3; CO2, CO4)
9. For the given customer dataset, use the dendrogram to find the optimal number of clusters and apply Hierarchical Clustering to the dataset. (3; CO2, CO4)
10. Build an Artificial Neural Network by implementing the Backpropagation algorithm and test the same using appropriate data sets. (3; CO2, CO6)
TEXT BOOKS
1. Ethem Alpaydın: Introduction to Machine Learning, 2nd Edition, The MIT Press Cambridge,
Massachusetts London, England, 2010.
2. Tom M. Mitchell, “Machine Learning”, McGraw-Hill Education (INDIAN EDITION), 2018
Assessment Pattern:
CIE – Continuous Internal Evaluation Lab (50 Marks)
Program 1
Implement simple linear regression using a Python program and estimate statistical quantities from the training data.
Linear regression models provide a simple yet effective approach to supervised learning. 'Linear' means arranged in, or extending along, a straight or nearly straight line: it suggests that the relationship between the dependent and independent variables can be expressed as a straight line,
y = mx + c
Linear regression is a direct application of this simple equation, where:
y is the dependent variable, i.e. the variable that needs to be estimated and predicted;
x is the independent variable, i.e. the controllable input variable;
m is the slope, which determines the angle of the line; it is the parameter usually denoted as β;
c is the intercept, i.e. the value of y when x = 0.
When implementing linear regression of some dependent variable 𝑦 on the set of independent variables 𝐱 = (𝑥₁,
…, 𝑥ᵣ), where 𝑟 is the number of predictors, you assume a linear relationship between 𝑦 and 𝐱: 𝑦 = 𝛽₀ + 𝛽₁𝑥₁ +
⋯ + 𝛽ᵣ𝑥ᵣ + 𝜀. This equation is the regression equation. 𝛽₀, 𝛽₁, …, 𝛽ᵣ are the regression coefficients, and 𝜀 is
the random error.
Linear regression calculates the estimators of the regression coefficients, or simply the predicted weights, denoted with 𝑏₀, 𝑏₁, …, 𝑏ᵣ. They define the estimated regression function 𝑓(𝐱) = 𝑏₀ + 𝑏₁𝑥₁ + ⋯ + 𝑏ᵣ𝑥ᵣ. This function should capture the dependencies between the inputs and output sufficiently well.
When implementing simple linear regression, you typically start with a given set of input-output (𝑥-𝑦) pairs; these pairs are your observations. For example, one observation might have the input 𝑥 = 5 and the actual output (response) 𝑦 = 5, the next 𝑥 = 15 and 𝑦 = 20, and so on. Linear regression assumes a linear, or straight-line, relationship between the input variable (x) and the single output variable (y).
# Calculate coefficients
def coefficients(dataset):
    x = [row[0] for row in dataset]
    y = [row[1] for row in dataset]
    x_mean, y_mean = mean(x), mean(y)
    b1 = covariance(x, x_mean, y, y_mean) / variance(x, x_mean)
    b0 = y_mean - b1 * x_mean
    return [b0, b1]
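The coefficients() routine above relies on mean(), variance() and covariance() helpers that are not shown in this extract. A minimal sketch of those helpers, with a small hypothetical usage example (the dataset values below are made up purely for illustration), might look like this:
def mean(values):
    # arithmetic mean of a list of numbers
    return sum(values) / float(len(values))

def variance(values, mean_value):
    # sum of squared deviations from the mean (the common 1/n factor
    # cancels in the ratio covariance/variance used for b1)
    return sum((v - mean_value) ** 2 for v in values)

def covariance(x, x_mean, y, y_mean):
    # sum of products of deviations of x and y from their means
    return sum((x[i] - x_mean) * (y[i] - y_mean) for i in range(len(x)))

# hypothetical usage with a tiny made-up dataset of (x, y) rows
dataset = [[1, 1], [2, 3], [4, 3], [3, 2], [5, 5]]
b0, b1 = coefficients(dataset)
print("Intercept b0 = %.3f, Slope b1 = %.3f" % (b0, b1))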
OUTPUT:
• Specific Hypothesis
Most human learning is based on past instances or experiences. For example, we are able to identify any type of vehicle based on a certain set of features, such as make, model, etc., that are defined over a large set of features. These special features differentiate the set of cars, trucks, etc. from the larger set of vehicles. The features that define the set of cars, trucks, etc. are known as concepts. Similarly, machines can also learn from concepts to identify whether an object belongs to a specific category or not.
Any algorithm that supports concept learning requires the following:
– Training Data
– Target Concept
– Actual Data Objects
A hypothesis, in general, is an explanation for something. The general hypothesis states the general relationship between the major variables. For example, a general hypothesis for ordering food would be "I want a burger".
G = { ‘?’, ‘?’, ‘?’, .....’?’}
The specific hypothesis fills in all the important details about the variables given in the general hypothesis. A more specific version of the example above would be "I want a cheeseburger with a pepperoni filling and a lot of lettuce".
S = {‘Φ’,’Φ’,’Φ’, ......,’Φ’}
The Find-S algorithm follows the steps below:
1. Initialize h with the most specific hypothesis; generally, this is the first positive example in the data set.
2. Check each example: if the example is negative, move on to the next one; if it is positive, consider it for the next step.
3. Check whether each attribute in the example is equal to the corresponding hypothesis value.
o If the value matches, no change is made.
o If the value does not match, the value is changed to '?'.
4. Repeat until the last positive example in the data set is reached.
Python code
import pandas as pd
import numpy as np
#to read the data in the csv file
data = pd.read_csv("C:/Users/Sudipta/Data.csv.csv")
print(data)
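The extract above only loads the CSV file; a minimal sketch of the Find-S loop itself follows, assuming the last column of the data holds the yes/no target and the remaining columns are the attributes:
# Find-S sketch: assumes the last column is the yes/no target
attributes = np.array(data)[:, :-1]
target = np.array(data)[:, -1]

specific_h = None
for i, row in enumerate(attributes):
    if str(target[i]).strip().lower() == "yes":      # only positive examples matter
        if specific_h is None:
            specific_h = list(row)                   # initialise with the first positive example
        else:
            for j in range(len(specific_h)):
                if specific_h[j] != row[j]:          # attribute disagrees -> generalise to '?'
                    specific_h[j] = '?'

print("The final hypothesis is:", specific_h)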
The final hypothesis is: ['?' 'Sunny' '?' 'Yes' '?' '?']
Program 3
For a given set of training data examples stored in a .CSV file, implement and demonstrate the
Candidate-Elimination algorithm to output a description of the set of all hypotheses consistent with the
training examples.
Candidate Elimination Algorithm Concept:
– Uses the Version Space. The version space is the intermediate space between the specific hypothesis and the general hypothesis; it denotes not just one hypothesis but the set of all hypotheses consistent with the training data set.
– Considers both positive and negative examples.
– For a positive example: the specific hypothesis tends to be generalized.
– For a negative example: the general hypothesis tends to be specialized.
Algorithm
• Initialize G and S as the most general and the most specific hypotheses, respectively.
• For each example e:
o If e is positive: remove from G any hypothesis inconsistent with e, and minimally generalize S so that it covers e.
o If e is negative: remove from S any hypothesis inconsistent with e, and minimally specialize G so that it excludes e.
import pandas as pd
import numpy as np

data = pd.read_csv('ENJOYSPORT.csv')
print(data)
concepts = np.array(data.iloc[:, 0:-1])
print(concepts)
target = np.array(data.iloc[:, -1])
print(target)

def learn(concepts, target):
    specific_h = concepts[0].copy()
    general_h = [['?' for _ in range(len(specific_h))] for _ in range(len(specific_h))]
    for i, h in enumerate(concepts):
        if target[i] == "yes":
            print("Instance is Positive ")
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:        # generalize the specific hypothesis
                    specific_h[x] = '?'
                    general_h[x][x] = '?'
        if target[i] == "no":
            print("Instance is Negative ")
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:        # specialize the general hypothesis
                    general_h[x][x] = specific_h[x]
                else:
                    general_h[x][x] = '?'
        print("Specific hypothesis after ", i+1, "Instance is ", specific_h)
        print("Generic hypothesis after ", i+1, "Instance is ", general_h)
        print("\n")
    indices = [i for i, val in enumerate(general_h) if val == ['?', '?', '?', '?', '?', '?']]
    for i in indices:
        general_h.remove(['?', '?', '?', '?', '?', '?'])
    return specific_h, general_h

s_final, g_final = learn(concepts, target)
print("Final Specific_h:", s_final, sep="\n")
print("Final General_h:", g_final, sep="\n")
Generic hypothesis: [['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'],
['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?',
'?'], ['?', '?', '?', '?', '?', '?']]
Final Specific_h:
['Sunny' 'Warm' '?' 'Strong' '?' '?']
Final General_h:
[['Sunny', '?', '?', '?', '?', '?'], ['?', 'Warm', '?', '?', '?', '?']]
Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use an
appropriate data set for building the decision tree and apply this knowledge to classify a new
sample.
Task: ID3 determines the information gain for each candidate attribute (i.e., Outlook, Temperature,
Humidity, and Wind), then selects the one with highest information gain as the root node of the tree.
The information gain values for all four attributes are calculated using the following formula:
Entropy(S) = Σ −P(I)·log₂ P(I)
Gain(S, A) = Entropy(S) − Σ [ P(S|A) · Entropy(S|A) ]
Dataset:
Calculation:
The Decision/Play column consists of 14 instances and includes two labels: yes and no. There are 9 decisions labelled yes and 5 labelled no, so
Entropy(Decision) = −(9/14)·log₂(9/14) − (5/14)·log₂(5/14) ≈ 0.940
Now we need to find the most dominant (highest information gain) attribute for the decision.
Gain(Decision, Wind) = Entropy(Decision)
− [P(Decision|Wind=Weak) · Entropy(Decision|Wind=Weak)]
− [P(Decision|Wind=Strong) · Entropy(Decision|Wind=Strong)]
There are 8 instances for Weak; of these, 2 decisions are no and 6 are yes.
Gain(Decision, Outlook) = Entropy(Decision)
− [P(Decision|Outlook=Sunny) · Entropy(Decision|Outlook=Sunny)]
− [P(Decision|Outlook=Overcast) · Entropy(Decision|Outlook=Overcast)]
− [P(Decision|Outlook=Rain) · Entropy(Decision|Outlook=Rain)]
Gain(Decision, Outlook) = 0.940 − (5/14)(0.9709) − (4/14)(0) − (5/14)(0.9708) = 0.2473
Gain(Decision, Temperature) = Entropy(Decision) − Σ [P(Decision|Temperature) · Entropy(Decision|Temperature)]
= Entropy(Decision)
− [P(Decision|Temp=Hot) · Entropy(Decision|Temp=Hot)]
− [P(Decision|Temp=Mild) · Entropy(Decision|Temp=Mild)]
− [P(Decision|Temp=Cool) · Entropy(Decision|Temp=Cool)]
Gain(Decision, Humidity) = Entropy(Decision) − Σ [P(Decision|Humidity) · Entropy(Decision|Humidity)]
Humidity = High: 7 instances (Yes: 3, No: 4); Humidity = Normal: 7 instances (Yes: 6, No: 1)
Outlook has the highest gain, so it becomes the root of the tree. Next, calculate the gains within the Sunny-outlook, Overcast-outlook and Rain-outlook subsets to generate the rest of the decision tree. Within the Rain subset, Wind produces the highest score, and Wind has two values, namely Strong and Weak.
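The entropy and gain figures quoted above can be verified numerically; the short sketch below uses a hypothetical entropy() helper and the per-attribute yes/no counts implied by the calculation above:
import math

def entropy(pos, neg):
    # entropy of a yes/no split; a pure split (one class empty) has entropy 0
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            result -= p * math.log2(p)
    return result

e_decision = entropy(9, 5)    # 14 instances: 9 yes, 5 no

# Wind: Weak has 6 yes / 2 no, Strong has the remaining 3 yes / 3 no
gain_wind = e_decision - (8/14) * entropy(6, 2) - (6/14) * entropy(3, 3)

# Outlook: Sunny 2 yes / 3 no, Overcast 4 yes / 0 no, Rain 3 yes / 2 no
gain_outlook = e_decision - (5/14) * entropy(2, 3) - (4/14) * entropy(4, 0) - (5/14) * entropy(3, 2)

print("Entropy(Decision)       = %.3f" % e_decision)    # ~0.940
print("Gain(Decision, Wind)    = %.3f" % gain_wind)     # ~0.048
print("Gain(Decision, Outlook) = %.3f" % gain_outlook)  # ~0.247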
ID3 Algorithm:
ID3 in Python:
Output
collections.Counter()
A counter is a container that stores elements as dictionary keys, and their counts are stored as dictionary values.
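The information_gain() function below relies on an entropy_of_list() helper that is not shown in this extract; a minimal sketch of it, built on collections.Counter as noted above, could be:
from collections import Counter
import math

def entropy_of_list(values):
    # entropy of a list/Series of class labels, e.g. the 'PT' (PlayTennis) column
    counts = Counter(values)                    # label -> frequency
    total = sum(counts.values())
    probs = [c / total for c in counts.values()]
    return sum(-p * math.log2(p) for p in probs)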
Calculate Information Gain for each Attribute
# Defining the Information Gain function
def information_gain(df, split_attribute_name, target_attribute_name, trace=0):
    print("Information Gain Calculation of ", split_attribute_name)
    print("target_attribute_name", target_attribute_name)
    # Grouping rows by the values of the current (candidate split) attribute
    df_split = df.groupby(split_attribute_name)
    for name, group in df_split:
        print("Name: ", name)
        print("Group: ", group)
    nobs = len(df.index) * 1.0
    print("NOBS", nobs)
    # Entropy of the target within each group, and each group's probability weight
    df_agg_ent = df_split.agg({target_attribute_name: [entropy_of_list, lambda x: len(x) / nobs]})[target_attribute_name]
    df_agg_ent.columns = ['Entropy', 'Prob1']   # rename the aggregate columns before use
    print("df_agg_ent", df_agg_ent)
    # Information gain = entropy before the split - weighted average entropy after the split
    avg_info = sum(df_agg_ent['Entropy'] * df_agg_ent['Prob1'])
    old_entropy = entropy_of_list(df[target_attribute_name])
    return old_entropy - avg_info
print('Info-gain for Outlook is :'+str(information_gain(df_tennis, 'Outlook', 'PT')),"\n")
Output
attribute = next(iter(tree))
print("Best Attribute :\n",attribute)
print("Tree Keys:\n",tree[attribute].keys())
ACCURACY
training_data = df_tennis.iloc[1:-4]   # all but the last four rows are used for training
test_data = df_tennis.iloc[-4:]        # the last four rows are held out for testing
train_tree = id3(training_data, 'PT', attribute_names)
test_data['predicted2'] = test_data.apply(classify, axis=1, args=(train_tree, 'Yes'))
print('\n\n Accuracy is : ' + str(sum(test_data['PT'] == test_data['predicted2']) / (1.0 * len(test_data.index))))
Output
Write a program to implement the naïve Bayesian classifier for a sample training data set stored as a .CSV file. Compute the accuracy of the classifier, considering a few test data sets.
Bayes' theorem states that
P(h|D) = [ P(D|h) · P(h) ] / P(D)
where P(h|D) is the probability of hypothesis h given the data D (the posterior probability); P(D|h) is the probability of the data D given that hypothesis h is true; P(h) is the probability of hypothesis h being true (the prior probability of h); and P(D) is the probability of the data (the prior probability of D).
After calculating the posterior probability for a number of different hypotheses h, we are interested in finding the most probable hypothesis h ∈ H given the observed data D. Any such maximally probable hypothesis is called a maximum a posteriori (MAP) hypothesis.
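As a toy illustration (the priors and likelihoods below are hypothetical, not taken from the data set that follows), the MAP hypothesis can be found by maximizing P(D|h)·P(h), since P(D) is the same for every h:
# hypothetical priors P(h) and likelihoods P(D|h) for three candidate hypotheses
prior = {'h1': 0.3, 'h2': 0.5, 'h3': 0.2}
likelihood = {'h1': 0.10, 'h2': 0.05, 'h3': 0.40}

# MAP hypothesis: argmax over h of P(D|h) * P(h); the common factor P(D) cancels
h_map = max(prior, key=lambda h: likelihood[h] * prior[h])
print("MAP hypothesis:", h_map)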
The data set used in this program is the Pima Indians Diabetes problem.
This data set comprises 768 observations of medical details for Pima Indian patients. The records describe instantaneous measurements taken from the patient, such as their age, the number of times pregnant, and blood workup. All patients are women aged 21 or older. All attributes are numeric, and their units vary from attribute to attribute.
The attributes are Pregnancies, Glucose, BloodPressure, SkinThickness, Insulin, BMI, DiabetesPedigreeFunction, Age, Outcome.
Each record has a class value that indicates whether the patient suffered an onset of diabetes within 5 years of when the measurements were taken (1) or not (0).
Python code
import csv
import random
import math

def loadcsv(filename):
    lines = csv.reader(open(filename, "r"))
    dataset = list(lines)
    for i in range(len(dataset)):
        # converting the attributes from string to floating point numbers
        dataset[i] = [float(x) for x in dataset[i]]
    return dataset

def splitDataset(dataset, splitRatio):
    trainSize = int(len(dataset) * splitRatio)
    trainSet = []
    copy = list(dataset)
    while len(trainSet) < trainSize:
        index = random.randrange(len(copy))  # random index
        trainSet.append(copy.pop(index))
    return [trainSet, copy]

def separateByClass(dataset):
    separated = {}
    for i in range(len(dataset)):
        vector = dataset[i]
        if (vector[-1] not in separated):
            separated[vector[-1]] = []
        separated[vector[-1]].append(vector)
    return separated

def mean(numbers):
    return sum(numbers) / float(len(numbers))

def stdev(numbers):
    avg = mean(numbers)
    variance = sum([pow(x - avg, 2) for x in numbers]) / float(len(numbers) - 1)
    return math.sqrt(variance)

def summarize(dataset):
    summaries = [(mean(attribute), stdev(attribute)) for attribute in zip(*dataset)]
    del summaries[-1]
    return summaries

def summarizeByClass(dataset):
    separated = separateByClass(dataset)
    summaries = {}
    for classValue, instances in separated.items():
        summaries[classValue] = summarize(instances)
    return summaries

def calculateProbability(x, mean, stdev):
    exponent = math.exp(-(math.pow(x - mean, 2) / (2 * math.pow(stdev, 2))))
    return (1 / (math.sqrt(2 * math.pi) * stdev)) * exponent

def calculateClassProbabilities(summaries, inputVector):
    probabilities = {}
    for classValue, classSummaries in summaries.items():
        probabilities[classValue] = 1
        for i in range(len(classSummaries)):
            mean, stdev = classSummaries[i]
            x = inputVector[i]
            probabilities[classValue] *= calculateProbability(x, mean, stdev)
    return probabilities

def predict(summaries, inputVector):
    probabilities = calculateClassProbabilities(summaries, inputVector)
    bestLabel, bestProb = None, -1
    for classValue, probability in probabilities.items():
        if bestLabel is None or probability > bestProb:
            bestProb = probability
            bestLabel = classValue
    return bestLabel

def getPredictions(summaries, testSet):
    predictions = []
    for i in range(len(testSet)):
        result = predict(summaries, testSet[i])
        predictions.append(result)
    return predictions

def getAccuracy(testSet, predictions):
    correct = 0
    for i in range(len(testSet)):
        if testSet[i][-1] == predictions[i]:
            correct += 1
    return (correct / float(len(testSet))) * 100.0

def main():
    filename = 'C:\\Users\\DELL\\.conda\\envs\\ml_env\\Scripts\\naivedata.csv'
    splitRatio = 0.67
    dataset = loadcsv(filename)
    print("\n The length of the Data Set : ", len(dataset))
    print("\n The Data Set Splitting into Training and Testing \n")
    trainingSet, testSet = splitDataset(dataset, splitRatio)
    # prepare model
    summaries = summarizeByClass(trainingSet)
    print("\n Model Summaries:\n", summaries)
    # test model
    predictions = getPredictions(summaries, testSet)
    print("\nPredictions:\n", predictions)
    accuracy = getAccuracy(testSet, predictions)
    print('\n Accuracy: {0}%'.format(accuracy))

main()
OUTPUT
The length of the Data Set : 768
The Data Set Splitting into Training and Testing
Model Summaries:
{1.0: [(4.621621621621622, 3.675353808400119),(142.18378378378378, 32.791812948886125),
(71.55135135135136, 20.365380119287128),(22.524324324324326, 17.700733916947893),
(104.57297297297298, 143.58157931205457),(35.32162162162162, 7.162057905588884),
(0.543054054054054, 0.37339656809119126), (36.45945945945946, 10.470441299705367)], 0.0:
[(3.2401215805471124, 3.009147838987053), (108.44376899696049,
25.944312415767783),(66.65653495440729, 19.37925314843171), (19.89969604863222,
14.814059938401465), (65.09422492401215, 93.30385842522621), (29.891793313069915,
7.963504376664226), (0.41687537993920976, 0.3016472554815733), (30.93617021276596,
11.493590416134817)]}
Predictions:
[1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0,
1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0,
0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0,
1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0,
0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0,
1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0,
1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 1.0,
1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0,
1.0, 0.0, 0.0, 0.0]
Accuracy: 72.83464566929135%
Program 6
Write a program to construct a Bayesian network considering medical data. Use this model to
demonstrate the diagnosis of heart patients using standard Heart Disease Data Set.
A Bayesian network is a probabilistic graphical model that represents a set of random variables and their conditional dependencies.
Probability
P(A) is used to denote the probability of A. For example if A is discrete with states {True, False} then
P(A) might equal [0.2, 0.8]. I.e. 20% chance of being True, 80% chance of being False.
Joint probability
A joint probability refers to the probability of more than one variable occurring together, such as
the probability of A and B, denoted P(A,B).
Conditional probability
Conditional probability is the probability of a variable (or set of variables) given another variable (or set of variables), denoted P(A|B). For example, the probability of Windy being True, given that Raining is True, might equal 50%. This would be denoted P(Windy = True | Raining = True) = 50%.
Once the structure has been defined (i.e. nodes and links), a Bayesian network requires a probability distribution to be assigned to each node. Each node X in a Bayesian network requires a probability distribution P(X | pa(X)). Note that if a node X has no parents, pa(X) is empty, and the required distribution is just P(X), sometimes referred to as the prior. In general, it is the probability of the node given its parent nodes.
If U = {A1, ..., An} is the universe of variables (all the variables) in a Bayesian network, and pa(Ai) are the parents of Ai, then the joint probability distribution P(U) is simply the product of all the probability distributions (prior and conditional) in the network, as shown in the equation below. This equation is known as the chain rule.
P(U) = P(A1, ..., An) = ∏ᵢ P(Ai | pa(Ai))
From the joint distribution over U we can in turn calculate any query we are interested in (with or without evidence set).
Suppose that there are two events which could cause grass to be wet: either the sprinkler is on or it is raining. Also, suppose that the rain has a direct effect on the use of the sprinkler (namely that when it rains, the sprinkler is usually not turned on). The model can answer questions like "What is the probability that it is raining, given the grass is wet?" by using the conditional probability formula and summing over all nuisance variables.
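As an illustration of how such a query can be posed in code, here is a minimal sketch of the sprinkler/rain example using the same pgmpy classes as the program below; the network structure follows the description above, while the CPD numbers are only illustrative:
from pgmpy.models import BayesianModel
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

# Structure: Rain influences Sprinkler, and both influence GrassWet
model = BayesianModel([('Rain', 'Sprinkler'), ('Rain', 'GrassWet'), ('Sprinkler', 'GrassWet')])

# Illustrative conditional probability tables (state 0 = False, 1 = True)
cpd_rain = TabularCPD('Rain', 2, [[0.8], [0.2]])
cpd_sprinkler = TabularCPD('Sprinkler', 2, [[0.6, 0.99], [0.4, 0.01]],
                           evidence=['Rain'], evidence_card=[2])
cpd_grass = TabularCPD('GrassWet', 2, [[1.0, 0.2, 0.1, 0.01],
                                       [0.0, 0.8, 0.9, 0.99]],
                       evidence=['Sprinkler', 'Rain'], evidence_card=[2, 2])
model.add_cpds(cpd_rain, cpd_sprinkler, cpd_grass)

# P(Rain | GrassWet = True): variable elimination sums out the nuisance variable Sprinkler
infer = VariableElimination(model)
print(infer.query(variables=['Rain'], evidence={'GrassWet': 1}))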
Python code
import pandas as pd
from pgmpy.models import BayesianModel
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.inference import VariableElimination

# heartDisease is the pandas DataFrame holding the Heart Disease data set;
# the CSV loading step is not shown in this extract (the file name below is hypothetical)
heartDisease = pd.read_csv('heart.csv')

model = BayesianModel([('age', 'heartdisease'), ('gender', 'heartdisease'), ('exang', 'heartdisease'),
                       ('cp', 'heartdisease'), ('heartdisease', 'restecg'), ('heartdisease', 'chol')])
print('\n Learning CPD using Maximum likelihood estimators')
model.fit(heartDisease, estimator=MaximumLikelihoodEstimator)
print('\n Inferencing with Bayesian Network:')
HeartDiseasetest_infer = VariableElimination(model)
print('\n 1. Probability of HeartDisease given evidence = age')
q1 = HeartDiseasetest_infer.query(variables=['heartdisease'], evidence={'age': 28})
print(q1)
print('\n 2. Probability of HeartDisease given evidence = chol')
q2 = HeartDiseasetest_infer.query(variables=['heartdisease'], evidence={'chol': 100})
print(q2)
For the given table, write a python program to perform K-Means Clustering.
X1 3 1 1 2 1 6 6 6 5 6 7 8 9 8 9 9 8
X2 5 4 6 6 5 8 6 7 6 7 1 2 1 2 3 2 3
K-Means clustering intends to partition n objects into k clusters in which each object belongs to the cluster with the nearest mean. This method produces exactly k different clusters of greatest possible distinction. The best number of clusters k leading to the greatest separation (distance) is not known a priori and must be computed from the data. The objective of K-Means clustering is to minimize the total intra-cluster variance, i.e. the squared error function
J = Σ_{j=1..k} Σ_{x ∈ Cj} ||x − μj||²
where μj is the mean (centroid) of cluster Cj.
Algorithm:
1. Choose k initial cluster centres (centroids), e.g. k randomly selected points.
2. Assign each object to the cluster whose centroid is nearest.
3. Recompute each centroid as the mean of the objects assigned to it.
4. Repeat steps 2-3 until the assignments (or centroids) no longer change.
K-Means is a relatively efficient method. However, the number of clusters k must be specified in advance, the final results are sensitive to initialization, and the algorithm often terminates at a local optimum. Unfortunately, there is no global theoretical method for finding the optimal number of clusters. A practical approach is to compare the outcomes of multiple runs with different k and choose the best one based on a predefined criterion, as sketched below. In general, a larger k probably decreases the error but increases the risk of overfitting.
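As a sketch of this "compare multiple runs with different k" idea (often called the elbow method), one possible scikit-learn approach is shown below; the data arrays mirror the X1/X2 table of this program:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# same sample data as in the program below
x1 = np.array([3, 1, 1, 2, 1, 6, 6, 6, 5, 6, 7, 8, 9, 8, 9, 9, 8])
x2 = np.array([5, 4, 6, 6, 5, 8, 6, 7, 6, 7, 1, 2, 1, 2, 3, 2, 3])
X = np.column_stack((x1, x2))

# fit K-Means for several values of k and record the total intra-cluster
# variance (inertia); the "elbow" of this curve suggests a reasonable k
ks = range(1, 8)
inertias = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_ for k in ks]

plt.plot(list(ks), inertias, marker='o')
plt.xlabel('Number of clusters k')
plt.ylabel('Total intra-cluster variance (inertia)')
plt.title('Elbow method')
plt.show()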
# clustering dataset
from sklearn.cluster import KMeans
from sklearn import metrics
import numpy as np
import matplotlib.pyplot as plt
x1 = np.array([3, 1, 1, 2, 1, 6, 6, 6, 5, 6, 7, 8, 9, 8, 9, 9, 8])
x2 = np.array([5, 4, 6, 6, 5, 8, 6, 7, 6, 7, 1, 2, 1, 2, 3, 2, 3])
plt.plot()
plt.xlim([0, 10])
plt.ylim([0, 10])
plt.title('Dataset')
plt.scatter(x1, x2)
plt.show()
# KMeans algorithm
K = 3
X = np.array(list(zip(x1, x2)))           # combine x1 and x2 into an (n, 2) data matrix
kmeans_model = KMeans(n_clusters=K).fit(X)

colors = ['b', 'g', 'r']                  # one colour per cluster label
markers = ['o', 'v', 's']                 # one marker per cluster label
plt.plot()
for i, l in enumerate(kmeans_model.labels_):
    plt.plot(x1[i], x2[i], color=colors[l], marker=markers[l], ls='None')
plt.xlim([0, 10])
plt.ylim([0, 10])
plt.show()
Output:
Probability density estimation is the construction of an estimate of an underlying probability distribution based on observed data. It involves selecting a probability distribution function, and the parameters of that function, that best explain the joint probability of the observed data.
The EM algorithm alternates between the following steps until the estimates converge:
1. Initialization: start with an initial guess for the model parameters.
2. E-step (Expectation step): using the observed data and the current parameter estimates, estimate the missing values and latent variables (updating variables and data).
3. M-step (Maximization step): using the data completed in the E-step, re-estimate (maximize) the parameters; this step updates the hypothesis.
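A compact numerical sketch of what one E-step and M-step compute, on synthetic 1-D data with two Gaussian components; this is only an illustration of the update rules, not the scikit-learn based program below:
import numpy as np

rng = np.random.default_rng(0)
# synthetic 1-D data drawn from two Gaussians (the true parameters are unknown to EM)
data = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(5.0, 1.5, 300)])

# 1. Initialization: rough guesses for weights, means and variances
w = np.array([0.5, 0.5])
mu = np.array([1.0, 4.0])
var = np.array([1.0, 1.0])

for iteration in range(50):
    # 2. E-step: responsibility of each component for each point
    #    (posterior probability of the component given the point)
    pdf = (1.0 / np.sqrt(2 * np.pi * var)) * np.exp(-(data[:, None] - mu) ** 2 / (2 * var))
    resp = w * pdf
    resp /= resp.sum(axis=1, keepdims=True)

    # 3. M-step: re-estimate weights, means and variances from the responsibilities
    Nk = resp.sum(axis=0)
    w = Nk / len(data)
    mu = (resp * data[:, None]).sum(axis=0) / Nk
    var = (resp * (data[:, None] - mu) ** 2).sum(axis=0) / Nk

print("Estimated weights  :", np.round(w, 3))
print("Estimated means    :", np.round(mu, 3))
print("Estimated variances:", np.round(var, 3))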
Python code
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
from sklearn import datasets

# The Iris data set is assumed here (implied by the Petal_Length/Petal_Width
# columns used below and the 150-sample confusion matrices in the output)
iris = datasets.load_iris()
X = pd.DataFrame(iris.data, columns=['Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width'])
y = iris.target

plt.figure(figsize=(14, 7))
colormap = np.array(['red', 'lime', 'black'])
# REAL PLOT
plt.subplot(1,3,1)
plt.title('Real')
plt.scatter(X.Petal_Length,X.Petal_Width,c=colormap[y])
# K-PLOT
model=KMeans(n_clusters=3, random_state=3425).fit(X)
plt.subplot(1,3,2)
plt.title('KMeans')
plt.scatter(X.Petal_Length,X.Petal_Width,c=colormap[model.labels_])
# GMM PLOT
gmm=GaussianMixture(n_components=3, random_state=3425).fit(X)
y_cluster_gmm=gmm.predict(X)
plt.subplot(1,3,3)
plt.title('GMM Classification')
plt.scatter(X.Petal_Length,X.Petal_Width,c=colormap[y_cluster_gmm])
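The accuracy and confusion-matrix figures reported in the output below can be reproduced with sklearn.metrics; a minimal sketch follows (cluster labels are arbitrary, which is why the raw EM accuracy can appear very low even when the grouping itself is good):
from sklearn import metrics

# compare each clustering against the true Iris labels
print('The accuracy score of K-Mean:', metrics.accuracy_score(y, model.labels_))
print('The Confusion matrix of K-Mean:\n', metrics.confusion_matrix(y, model.labels_))
print('The accuracy score of EM:', metrics.accuracy_score(y, y_cluster_gmm))
print('The Confusion matrix of EM:\n', metrics.confusion_matrix(y, y_cluster_gmm))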
Output:
The accuracy score of K-Mean: 0.8933333333333333
The Confusion matrix of K-Mean:
[[50 0 0]
[ 0 48 2]
[ 0 14 36]]
The accuracy score of EM: 0.0
The Confusion matrix of EM:
[[ 0 50 0]
[ 5 0 45]
[50 0 0]]
A dendrogram is a diagram representing a tree. This diagrammatic representation is frequently used in different
contexts:
in hierarchical clustering, it illustrates the arrangement of the clusters produced by the corresponding analyses.
in computational biology, it shows the clustering of genes or samples, sometimes in the margins of heatmaps.
Python code
# Hierarchical Clustering
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('Mall_Customer.csv')
X = dataset.iloc[:, [3, 4]].values
# y = dataset.iloc[:, 3].values
# Using the dendrogram to find the optimal number of clusters
import scipy.cluster.hierarchy as sch
dendrogram = sch.dendrogram(sch.linkage(X, method = 'ward'))
plt.title('Dendrogram')
plt.xlabel('Customers')
plt.ylabel('Euclidean distances')
plt.show()
# Fitting Hierarchical Clustering to the dataset
from sklearn.cluster import AgglomerativeClustering
hc = AgglomerativeClustering(n_clusters = 5, affinity = 'euclidean', linkage = 'ward')
y_hc = hc.fit_predict(X)
# Visualising the clusters
plt.scatter(X[y_hc == 0, 0], X[y_hc == 0, 1], s = 100, c = 'red', label = 'Cluster 1')
plt.scatter(X[y_hc == 1, 0], X[y_hc == 1, 1], s = 100, c = 'blue', label = 'Cluster 2')
plt.scatter(X[y_hc == 2, 0], X[y_hc == 2, 1], s = 100, c = 'green', label = 'Cluster 3')
plt.scatter(X[y_hc == 3, 0], X[y_hc == 3, 1], s = 100, c = 'cyan', label = 'Cluster 4')
plt.scatter(X[y_hc == 4, 0], X[y_hc == 4, 1], s = 100, c = 'magenta', label = 'Cluster 5')
plt.title('Clusters of customers')
plt.xlabel('Annual Income (k$)')
plt.ylabel('Spending Score (1-100)')
plt.legend()
plt.show()
FORWARD PROPAGATION
1. To calculate the net input of hidden unit h1:
net_h1 = w1*i1 + w2*i2 + b1*1
2. To calculate the output of h1, apply the sigmoid activation:
out_h1 = 1 / (1 + e^(-net_h1))
3. To calculate the error at output unit o1:
E_o1 = 1/2 (target_o1 - out_o1)^2
4. To calculate the total error of the model:
E_total = E_o1 + E_o2
BACKWARD PROPAGATION
Here we write the process and formula for updating the weight w5. By the chain rule,
∂E_total/∂w5 = (∂E_total/∂out_o1) · (∂out_o1/∂net_o1) · (∂net_o1/∂w5)
and the weight is updated as w5 := w5 − η · ∂E_total/∂w5, where η is the learning rate.
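A tiny numeric sketch of this single-weight update; all values below are hypothetical and chosen only to show the arithmetic of the chain rule:
import math

# hypothetical values for one training step
w5 = 0.40                              # weight from hidden unit h1 to output o1
out_h1 = 0.59                          # output of hidden unit h1 (after sigmoid)
net_o1 = 1.10                          # net input of output unit o1
out_o1 = 1 / (1 + math.exp(-net_o1))   # sigmoid output of o1
target_o1 = 0.01
eta = 0.5                              # learning rate

# chain rule: dEtotal/dw5 = dEtotal/dout_o1 * dout_o1/dnet_o1 * dnet_o1/dw5
dE_dout = -(target_o1 - out_o1)        # derivative of 1/2 * (target - out)^2
dout_dnet = out_o1 * (1 - out_o1)      # derivative of the sigmoid
dnet_dw5 = out_h1                      # net_o1 = w5*out_h1 + ...  so d(net_o1)/dw5 = out_h1
dE_dw5 = dE_dout * dout_dnet * dnet_dw5

w5_new = w5 - eta * dE_dw5
print("Gradient dEtotal/dw5 =", round(dE_dw5, 4))
print("Updated w5           =", round(w5_new, 4))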
Python code
import numpy as np
x = np.array(([2, 9], [1, 5], [3, 6]), dtype=float)
print("small x",x)
#original output
y = np.array(([92], [86], [89]), dtype=float)
X = x/np.amax(x,axis=0) #maximum along the first axis
print("Capital X",X)
#Defining Sigmoid Function for output
def sigmoid (x):
return (1/(1 + np.exp(-x)))
#Derivative of Sigmoid Function
def derivatives_sigmoid(x):
return x * (1 - x)
#Variables initialization
epoch=7000 #Setting training iterations
lr=0.1 #Setting learning rate
inputlayer_neurons = 2 #number of input layer neurons
hiddenlayer_neurons = 3 #number of hidden layers neurons
output_neurons = 1 #number of neurons at output layer
#Defining weight and biases for hidden and output layer
wh=np.random.uniform(size=(inputlayer_neurons,hiddenlayer_neurons))
bh=np.random.uniform(size=(1,hiddenlayer_neurons))
wout=np.random.uniform(size=(hiddenlayer_neurons,output_neurons))
bout=np.random.uniform(size=(1,output_neurons))
# Forward Propagation
for i in range(epoch):
    hinp1 = np.dot(X, wh)
    hinp = hinp1 + bh
    hlayer_act = sigmoid(hinp)
    outinp1 = np.dot(hlayer_act, wout)
    outinp = outinp1 + bout
    output = sigmoid(outinp)

    # Backpropagation (standard gradient updates, restored to complete the training loop)
    EO = y - output                          # error at the output layer
    outgrad = derivatives_sigmoid(output)
    d_output = EO * outgrad
    EH = d_output.dot(wout.T)                # error propagated back to the hidden layer
    hiddengrad = derivatives_sigmoid(hlayer_act)
    d_hiddenlayer = EH * hiddengrad
    wout += hlayer_act.T.dot(d_output) * lr  # weight and bias updates
    bout += np.sum(d_output, axis=0, keepdims=True) * lr
    wh += X.T.dot(d_hiddenlayer) * lr
    bh += np.sum(d_hiddenlayer, axis=0, keepdims=True) * lr

print("Actual Output: \n" + str(y))
print("Predicted Output: \n" + str(output))
Sample Output
Actual Output:
[[92.]
[86.]
[89.]]
Predicted Output:
[[0.92928201]
[0.92075172]
[0.93223878]]