01 Machine Learning


N.G. ACHARYA & D.K. MARATHE COLLEGE OF
ARTS, SCIENCE & COMMERCE.
(Affiliated to the University of Mumbai)

PRACTICAL JOURNAL

PSCSP512
Machine Learning
SUBMITTED BY
KAMBLE YASH RAJESH
SEAT NO :

SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS


FOR QUALIFYING M.Sc. (CS) PART-I (SEMESTER – II) EXAMINATION.

2023-2024

DEPARTMENT OF COMPUTER SCIENCE


SHREE N.G. ACHARYA MARG, CHEMBUR
MUMBAI-400 071
N.G. ACHARYA & D.K. MARATHE COLLEGE OF
ARTS, SCIENCE & COMMERCE.
(Affiliated to the University of Mumbai)

CERTIFICATE

This is to certify that Mr. Kamble Yash Rajesh, Seat No. , studying in
Master of Science in Computer Science Part I Semester II, has
satisfactorily completed the Practicals of PSCSP512 Machine Learning
as prescribed by the University of Mumbai during the academic year
2023-24.

Signature Signature Signature

Internal Guide External Examiner Head Of Department

College Seal Date:


INDEX

Practical No.   Practical                                                              Signature

1   Implement Linear Regression (Diabetes Dataset)
2   Implement Logistic Regression (Iris Dataset)
3   Implement Multinomial Logistic Regression (Iris Dataset)
4   Implement SVM Classifier (Iris Dataset)
5   Train and fine-tune a Decision Tree for the Moons Dataset
6   Train an SVM Regressor on the California Housing Dataset
7   Implement Batch Gradient Descent with Early Stopping for Softmax Regression
8   Implement MLP for Classification of Handwritten Digits (MNIST Dataset)
Practical No:- 01
AIM :- Implement Linear Regression (Diabetes Dataset).
# Import the libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets

diabetes = datasets.load_diabetes()
diabetes

print(diabetes.DESCR)
# columns
diabetes.feature_names

# Now we will split the data into the independent and dependent variables
X = diabetes.data
Y = diabetes.target
X.shape, Y.shape

# We will split the data into training and testing data


from sklearn.model_selection import train_test_split

train_x, test_x, train_y, test_y = train_test_split(X,Y,test_size=0.3,random_state=99)

train_x.shape, train_y.shape

# Linear Regression
from sklearn.linear_model import LinearRegression

le = LinearRegression()
le.fit(train_x,train_y)

y_pred = le.predict(test_x)
y_pred
result = pd.DataFrame({'Actual': test_y, 'Predict' : y_pred})
result

# we will check the accuracy


print('coefficient', le.coef_)
print('intercept', le.intercept_)
from sklearn.metrics import mean_squared_error, r2_score, explained_variance_score

#Variance_Score
explained_variance_score(test_y, y_pred)

>> 0.47737703777354545
# mean_squared_error
mean_squared_error(test_y,y_pred)

>> 3157.972848565651

# r2 score
r2_score(test_y,y_pred)

Inference:-
The model explains about 47.7% (0.477377) of the variance in the target with respect to the features.
The mean squared error of the model is 3157.97.
The R² score of the model is about 0.45.
Below are the coefficients and intercept of the regression equation as calculated by the
model.

# train_x is a NumPy array, so use the dataset's feature names for the index
coeff = pd.Series(le.coef_, index=diabetes.feature_names)


intercept = le.intercept_

print("Coefficients:\n")
print(coeff)
print("\n")
print("Intercept:\n")
print(intercept)
print("\n")

Coefficients:

age 54.820535
sex -260.930304
bmi 458.001802
bp 303.502332
s1 -995.584889
s2 698.811401
s3 183.095229
s4 185.698494
s5 838.503887
s6 96.441048
dtype: float64

Intercept:

154.42752615353518

The regression equation would be:

Diabetes Progression = Intercept + coeff(1) × age + coeff(2) × sex + ..... + coeff(10) × s6
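As a quick check, the equation can be evaluated by hand for one test sample and compared with the model's prediction (a minimal sketch; the choice of sample index 0 is arbitrary):

# Manually apply the regression equation to the first test sample
manual_pred = le.intercept_ + np.dot(le.coef_, test_x[0])
print(manual_pred, le.predict(test_x[:1])[0])  # the two values should match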


Practical No 2
Aim :- Implement Logistic Regression (Iris Dataset)
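The training steps for this practical are not reproduced in the listing below. A minimal sketch that produces the Y_test and Y_predict used in the accuracy check (assuming the full Iris feature set and a 75/25 train-test split; the exact parameters used originally are not shown):

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Load the Iris data and hold out 25% for testing (assumed split)
iris = datasets.load_iris()
X_train, X_test, Y_train, Y_test = train_test_split(iris.data, iris.target, test_size=0.25, random_state=0)

# Fit a logistic regression model and predict on the test set
model = LogisticRegression(max_iter=200)
model.fit(X_train, Y_train)
Y_predict = model.predict(X_test)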
#Check the accuracy of model
from sklearn.metrics import accuracy_score
print(accuracy_score(Y_test, Y_predict))
0.973684
Inference:-
The logistic regression model's accuracy is 97.36%.
An accuracy of 97.36% indicates a very good model, and the confusion matrix shows that
only one sample is misclassified.
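The confusion matrix referred to above is not shown in the listing; it can be computed as below (a sketch, using the same Y_test and Y_predict):

from sklearn.metrics import confusion_matrix
print(confusion_matrix(Y_test, Y_predict))  # off-diagonal entries count the misclassified samples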
Practical No 3
Aim:- Implement Multinomial Logistic Regression (Iris Dataset)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Importing Sklearn module and classes


from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn import metrics
from sklearn import datasets
from sklearn.model_selection import train_test_split

#Data Loading – IRIS dataset


iris = datasets.load_iris()
iris

{'data': array([[5.1, 3.5, 1.4, 0.2], [4.9, 3. , 1.4, 0.2], [4.7, 3.2, 1.3, 0.2], [4.6, 3.1, 1.5, 0.2],
[5. , 3.6, 1.4, 0.2], [5.4, 3.9, 1.7, 0.4], ...

X = iris.data[:, [0, 2]]


Y = iris.target

#Create Training / Test Data


X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=1,
stratify=Y)

X_train
X_train.shape

(105, 2)

X_test.shape

(45, 2)

#Perform Feature Scaling


# in order to make sure the features are on a comparable scale (zero mean, unit variance) irrespective of their original values / units

sc = StandardScaler()
sc.fit(X_train)
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)
X_train
X_train_std
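As a quick check (a sketch), StandardScaler transforms each feature to z = (x - mean) / std, with the mean and standard deviation computed on the training data, so the scaled training features should have mean close to 0 and standard deviation close to 1:

# Verify the standardization of the training features
print(X_train_std.mean(axis=0))  # each value should be close to 0
print(X_train_std.std(axis=0))   # each value should be close to 1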

#Train a Logistic Regression Model


# Create an instance of LogisticRegression classifier
lr = LogisticRegression(C=100.0, solver='lbfgs', multi_class='multinomial')

# Fit the model

lr.fit(X_train_std, Y_train)

#Measure model performance


# Create the predictions

Y_predict = lr.predict(X_test_std)
Y_predict

array([2, 0, 0, 1, 1, 1, 2, 1, 2, 0, 0, 2, 0, 1, 0, 1, 2, 1, 1, 2, 2, 0, 1, 1, 1, 1, 1, 2, 0, 2, 0, 0, 1, 1, 2,
2, 0, 0, 0, 1, 2, 2, 1, 0, 0])

# Use metrics.accuracy_score to measure the score


print("multinomial LogisticRegression Accuracy %.3f" %metrics.accuracy_score(Y_test,
Y_predict))

Output

multinomial LogisticRegression Accuracy 0.956

Inference - The accuracy score of the model is 95.6%.


Practical No 4
Aim:- Implement SVM classifier (Iris Dataset)
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

#Define the col names


colnames=["sepal_length_in_cm",
"sepal_width_in_cm","petal_length_in_cm","petal_width_in_cm", "class"]

#Read the dataset


dataset = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data", header=None, names=colnames)

#Data
dataset.head()

#Encoding the categorical column


dataset = dataset.replace({"class": {"Iris-setosa":1,"Iris-versicolor":2, "Iris-virginica":3}})
#Visualize the new dataset
dataset.head()

# Now we’re going to analyze our data


plt.figure(1)
sns.heatmap(dataset.corr())
plt.title('Correlation On iris Classes')

# Splitting the data


X = dataset.iloc[:,:-1]
y = dataset.iloc[:, -1].values
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)

#Create the SVM model


from sklearn.svm import SVC
classifier = SVC(kernel = 'linear', random_state = 0)
#Fit the model for the data

classifier.fit(X_train, y_train)

#Make the prediction


y_pred = classifier.predict(X_test)

# And finally, check the accuracy of the model


from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
print(cm)

from sklearn.model_selection import cross_val_score


accuracies = cross_val_score(estimator = classifier, X = X_train, y = y_train, cv = 10)
print("Accuracy: {:.2f} %".format(accuracies.mean()*100))
print("Standard Deviation: {:.2f} %".format(accuracies.std()*100))

Output
[[13 0 0]
[ 0 15 1]
[ 0 0 9]]
Accuracy: 98.18 %
Standard Deviation: 3.64 %

Inference - The accuracy score of the model is 98.18% and the standard deviation is 3.64%.

An accuracy of 98% indicates a very good model, and the confusion matrix shows that only
one sample is misclassified.
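In addition to cross-validation on the training data, the accuracy on the held-out test set can be checked directly (a sketch, using the classifier and the 25% test split defined above):

# Accuracy of the fitted SVM classifier on the test split
from sklearn.metrics import accuracy_score
print("Test accuracy: {:.2f} %".format(accuracy_score(y_test, y_pred) * 100))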
Practical No 5
Aim:- Train and fine-tune a Decision Tree for the Moons Dataset

import numpy as np
import matplotlib.pyplot as plt
def plot_dataset(X, y, axes):
    # Scatter plot of the two moons, one marker style per class
    plt.figure(figsize=(10, 6))
    plt.plot(X[:, 0][y==0], X[:, 1][y==0], "bs", alpha=0.5)
    plt.plot(X[:, 0][y==1], X[:, 1][y==1], "g^", alpha=0.2)
    plt.axis(axes)
    plt.grid(True, which='both')
    plt.xlabel(r"$x_1$", fontsize=20)
    plt.ylabel(r"$x_2$", fontsize=20, rotation=0)

from sklearn.datasets import make_moons

X, y = make_moons(n_samples=10000, noise=0.4, random_state=21)


plot_dataset(X, y, [-3, 5, -3, 3])

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test =train_test_split(X,y, test_size = 0.2)

from sklearn.tree import DecisionTreeClassifier

tree_clf = DecisionTreeClassifier()

from sklearn.model_selection import GridSearchCV

parameter = {
    'criterion': ["gini", "entropy"],
    'max_leaf_nodes': list(range(2, 50)),
    'min_samples_split': [2, 3, 4]
}

clf = GridSearchCV(tree_clf, parameter, cv=5, scoring="accuracy", return_train_score=True, n_jobs=-1)

clf.fit(X_train, y_train)

clf.best_params_

{'criterion': 'gini', 'max_leaf_nodes': 37, 'min_samples_split': 2}

cvres = clf.cv_results_
for mean_score, params in zip(cvres["mean_train_score"], cvres["params"]):
    print(mean_score, params)

0.77859375 {'criterion': 'gini', 'max_leaf_nodes': 2, 'min_samples_split': 2}


0.77859375 {'criterion': 'gini', 'max_leaf_nodes': 2, 'min_samples_split': 3}
0.77859375 {'criterion': 'gini', 'max_leaf_nodes': 2, 'min_samples_split': 4}
0.8201562500000001 {'criterion': 'gini', 'max_leaf_nodes': 3, 'min_samples_split': 2}
0.8201562500000001 {'criterion': 'gini', 'max_leaf_nodes': 3, 'min_samples_split': 3}
0.8201562500000001 {'criterion': 'gini', 'max_leaf_nodes': 3, 'min_samples_split': 4}
0.8596875 {'criterion': 'gini', 'max_leaf_nodes': 4, 'min_samples_split': 2}

#Getting the training score:


clf.score(X_train, y_train)
0.875125

We have an accuracy of approximately 87%, but accuracy alone is sometimes not a good
measure, so let's look at the confusion matrix.

from sklearn.metrics import confusion_matrix


pred = clf.predict(X_train)
confusion_matrix(y_train,pred)

array([[3547, 481],
[ 518, 3454]])
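As a quick check, the training accuracy reported above can be recovered from the confusion matrix: the diagonal holds the correctly classified samples, so (3547 + 3454) / 8000 = 0.875125, which matches clf.score(X_train, y_train).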
Now from the confusion matrix let's get our precision and recall, which are better
metrics.
from sklearn.metrics import precision_score, recall_score

pre = precision_score(y_train, pred)


re = recall_score(y_train, pred)
print(f"Precision: {pre} Recall:{re}")

Precision: 0.8777636594663278 Recall:0.8695871097683786

Not bad: precision is slightly higher than recall, but let's combine the two metrics
into the F1 score.

from sklearn.metrics import f1_score


f1_score(y_train, pred)

0.8736562539521943
Our F1 score and accuracy are almost the same.
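As a quick check (a sketch), the F1 score is the harmonic mean of precision and recall, so it can be recomputed from the two values obtained above:

# Recompute the F1 score from precision and recall
f1_manual = 2 * pre * re / (pre + re)
print(f1_manual)  # should match the value returned by f1_score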

Getting the testing score:


clf.score(X_test, y_test)

0.8585

Inference:-
We have an accuracy of approximately 85% on the testing set.
Practical No 6
Aim:- Train an SVM regressor on the California Housing dataset
import pandas as pd
import numpy as np
import seaborn as sb
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing
housing_data = fetch_california_housing()
descr = housing_data['DESCR']
feature_names = housing_data['feature_names']
data = housing_data['data']
target = housing_data['target']
df1 = pd.DataFrame(data=data)
df1.rename(columns={0: feature_names[0], 1: feature_names[1], 2: feature_names[2], 3: feature_names[3],
                    4: feature_names[4], 5: feature_names[5], 6: feature_names[6], 7: feature_names[7]},
           inplace=True)
df2 = pd.DataFrame(data=target)
df2.rename(columns={0: 'Target'}, inplace=True)
housing = pd.concat([df1, df2], axis=1)
print(housing.columns)
housing.head()

   MedInc  HouseAge  AveRooms  AveBedrms  Population  AveOccup  Latitude  Longitude  Target
0  8.3252      41.0  6.984127   1.023810       322.0  2.555556     37.88    -122.23   4.526
1  8.3014      21.0  6.238137   0.971880      2401.0  2.109842     37.86    -122.22   3.585
2  7.2574      52.0  8.288136   1.073446       496.0  2.802260     37.85    -122.24   3.521
3  5.6431      52.0  5.817352   1.073059       558.0  2.547945     37.85    -122.25   3.413
4  3.8462      52.0  6.281853   1.081081       565.0  2.181467     37.85    -122.25   3.422
print("dimension of housing data: {}".format(housing.shape))

dimension of housing data: (20640, 9)

housing.info()
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(housing.loc[:, housing.columns != 'Target'],
housing['Target'], random_state=66)
from sklearn.svm import SVR
svr = SVR()
svr.fit(X_train, y_train)
s1 = svr.score(X_train, y_train)
s2 = svr.score(X_test, y_test)
print("R² of Support Vector Regressor on training set: {:.3f}".format(s1))
print("R² of Support Vector Regressor on test set: {:.3f}".format(s2))
O/P

R² of Support Vector Regressor on training set: -0.023

R² of Support Vector Regressor on test set: -0.033

Inference:-
The model underperforms quite substantially, with a negative R² on both the training set
and the test set.
SVM requires all the features to vary on a similar scale, so we need to rescale the data so
that all the features are approximately on the same scale:
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)  # use the scaler fitted on the training data
svr1 = SVR()
svr1.fit(X_train_scaled, y_train)
s3 = svr1.score(X_train_scaled, y_train)
s4 = svr1.score(X_test_scaled, y_test)
print("R² of Support Vector Regressor on training set: {:.3f}".format(s3))
print("R² of Support Vector Regressor on test set: {:.3f}".format(s4))

R² of Support Vector Regressor on training set: 0.659

R² of Support Vector Regressor on test set: 0.663

Inference:-
Scaling the data made a huge difference! Now the model is actually underfitting: training
and test set performance are quite similar, but both are still far from an R² of 1.0. From here,
we can try increasing either gamma or C to fit a more complex model.
svr2 = SVR(gamma=10)
svr2.fit(X_train_scaled, y_train)
s5 = svr2.score(X_train_scaled, y_train)
s6 = svr2.score(X_test_scaled, y_test)
print("R² of Support Vector Regressor on training set: {:.3f}".format(s5))
print("R² of Support Vector Regressor on test set: {:.3f}".format(s6))

R² of Support Vector Regressor on training set: 0.702

R² of Support Vector Regressor on test set: 0.697

Inference:-
Here, increasing gamma improves the model, resulting in a test-set R² of about 0.70.
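To tune C and gamma more systematically, a small grid search could be run on the scaled data (a sketch; the parameter grid below is an assumption, not part of the original practical):

# Hypothetical grid search over C and gamma on the scaled features
from sklearn.model_selection import GridSearchCV
param_grid = {'C': [0.1, 1, 10], 'gamma': [0.1, 1, 10]}
grid = GridSearchCV(SVR(), param_grid, cv=3, scoring='r2', n_jobs=-1)
grid.fit(X_train_scaled, y_train)
print(grid.best_params_)
print("R² of best SVR on test set: {:.3f}".format(grid.best_estimator_.score(X_test_scaled, y_test)))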
Practical No 7
Aim:- Implement Batch Gradient Descent with early stopping for Softmax
Regression.

import numpy as np
import math

class SoftmaxClassifier:
    def __init__(self, learning_rate=0.1, max_iter=1000):
        self.__learning_rate = learning_rate
        self.__max_iter = max_iter

    def __calculate_score(self, k, x):
        # Linear score of class k for sample x
        weight = self.__weights[k]
        return x.dot(weight)

    def train(self, x, y):
        self.__x = x
        self.__y = y
        self.__class_count = len(self.__y[0])
        self.__weights = np.random.rand(self.__class_count, x.shape[1])
        # Batch gradient descent: every iteration uses the full training set
        for i in range(self.__max_iter):
            for j in range(self.__class_count):
                self.__weights[j] = self.__calculate_new_weights(j)

    def __calculate_softmax(self, k, x):
        # softmax_k(x) = exp(score_k) / sum_i exp(score_i);
        # the max score is subtracted so that exp() does not overflow
        scores = [self.__calculate_score(i, x) for i in range(self.__class_count)]
        max_score = max(scores)
        exps = [math.exp(s - max_score) for s in scores]
        return exps[k] / sum(exps)

    def __calculate_cross_entropy_gradient(self, k):
        # Gradient of the cross-entropy loss with respect to the weights of class k
        grad = 0
        for i in range(len(self.__x)):
            grad += (self.__calculate_softmax(k, self.__x[i]) - self.__y[i][k]) * self.__x[i]
        return grad

    def __calculate_new_weights(self, k):
        step_size = self.__calculate_cross_entropy_gradient(k) * self.__learning_rate
        return self.__weights[k] - step_size

    def predict(self, x):
        # One-hot predictions: mark the class with the highest softmax probability
        y = np.zeros((len(x), self.__class_count))
        for i in range(len(x)):
            max_score_index = 0
            max_score = 0
            for j in range(self.__class_count):
                score = self.__calculate_softmax(j, x[i])
                if score > max_score:
                    max_score = score
                    max_score_index = j
            y[i][max_score_index] = 1
        return y

from sklearn import datasets


from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

import numpy as np

def convert_to_one_hot(labels):
    # Turn integer class labels into one-hot encoded rows
    class_count = len(set(labels))
    one_hot = np.zeros((len(labels), class_count))
    for i in range(len(labels)):
        one_hot[i][labels[i]] = 1
    return one_hot

def main():
    iris = datasets.load_iris()
    data = iris['data']
    labels_one_hot = convert_to_one_hot(iris['target'])
    rand = np.random.permutation(len(data))
    x_train, x_test, y_train, y_test = train_test_split(data[rand], labels_one_hot[rand], test_size=0.33)
    soft_clf = SoftmaxClassifier()
    soft_clf.train(x_train, y_train)
    y_pred = soft_clf.predict(x_test)
    accuracy = accuracy_score(y_test, y_pred)
    print(accuracy)

if __name__ == "__main__":
    main()

O/P Accuracy=0.8
Inference:-
The accuracy of the model is 80%.
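The class above runs batch gradient descent for a fixed number of iterations and does not yet stop early. A minimal sketch of early stopping (an assumption, not part of the original listing): hold out a validation set, track the validation cross-entropy after each full-batch update, keep the weights from the best iteration, and stop once the loss has not improved for a given number of updates (the patience):

import numpy as np

def softmax(scores):
    # Row-wise softmax, subtracting the max score for numerical stability
    e = np.exp(scores - scores.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def train_softmax_early_stopping(x_train, y_train, x_val, y_val,
                                 learning_rate=0.1, max_iter=1000, patience=20):
    # y_train / y_val are one-hot encoded; weights have shape (n_features, n_classes)
    weights = np.random.rand(x_train.shape[1], y_train.shape[1])
    best_loss, best_weights, wait = np.inf, weights.copy(), 0
    for _ in range(max_iter):
        probs = softmax(x_train.dot(weights))
        grad = x_train.T.dot(probs - y_train) / len(x_train)  # batch gradient over the full training set
        weights -= learning_rate * grad
        # Cross-entropy loss on the held-out validation set
        val_loss = -np.mean(np.sum(y_val * np.log(softmax(x_val.dot(weights)) + 1e-12), axis=1))
        if val_loss < best_loss:
            best_loss, best_weights, wait = val_loss, weights.copy(), 0
        else:
            wait += 1
            if wait >= patience:  # stop once the validation loss stops improving
                break
    return best_weights

Predictions can then be made with np.argmax(softmax(x.dot(best_weights)), axis=1).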
Practical No 8
Aim:- Implement MLP for classification of handwritten digits (MNIST
Dataset).

import tensorflow as tf
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

import matplotlib.pyplot as plt

image_index = 7777 # You may select anything up to 60,000


print(y_train[image_index]) # The label is 8
plt.imshow(x_train[image_index], cmap='Greys')

8
<matplotlib.image.AxesImage at 0x7fa7325add50>

x_train.shape
(60000, 28, 28)

# Reshaping the array to 4-dims so that it can work with the Keras API
x_train = x_train.reshape(x_train.shape[0], 28, 28, 1)
x_test = x_test.reshape(x_test.shape[0], 28, 28, 1)
input_shape = (28, 28, 1)
# Making sure that the values are float so that we can get decimal points after division
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
# Normalizing the pixel values by dividing by the maximum value (255)
x_train /= 255
x_test /= 255
print('x_train shape:', x_train.shape)
print('Number of images in x_train', x_train.shape[0])
print('Number of images in x_test', x_test.shape[0])
x_train shape: (60000, 28, 28, 1)
Number of images in x_train 60000
Number of images in x_test 10000

# Importing the required Keras modules containing model and layers


from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv2D, Dropout, Flatten, MaxPooling2D
# Creating a Sequential Model and adding the layers
model = Sequential()
model.add(Conv2D(28, kernel_size=(3,3), input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten()) # Flattening the 2D arrays for fully connected layers
model.add(Dense(128, activation=tf.nn.relu))
model.add(Dropout(0.2))
model.add(Dense(10,activation=tf.nn.softmax))
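Note that the model above includes a convolutional layer (Conv2D followed by MaxPooling2D), so it is a small CNN. A plain MLP variant matching the practical's title, as a sketch (the layer sizes are an assumption), would flatten the images and use only dense layers; the remaining steps below continue with the convolutional model defined above.

# Sketch of a pure MLP alternative for the same task (hypothetical layer sizes)
mlp = Sequential()
mlp.add(Flatten(input_shape=input_shape))   # 28x28x1 pixels -> 784 inputs
mlp.add(Dense(128, activation=tf.nn.relu))
mlp.add(Dropout(0.2))
mlp.add(Dense(10, activation=tf.nn.softmax))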

model.compile(optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
model.fit(x=x_train,y=y_train, epochs=10)
Epoch 1/10
1875/1875 [==============================] - 43s 22ms/step - loss: 0.2121 -
accuracy: 0.9357
Epoch 2/10
1875/1875 [==============================] - 42s 22ms/step - loss: 0.0878 -
accuracy: 0.9727
Epoch 3/10
1875/1875 [==============================] - 45s 24ms/step - loss: 0.0588 -
accuracy: 0.9815
Epoch 4/10
1875/1875 [==============================] - 43s 23ms/step - loss: 0.0447 -
accuracy: 0.9855
Epoch 5/10
1875/1875 [==============================] - 44s 23ms/step - loss: 0.0359 -
accuracy: 0.9886
Epoch 6/10
1875/1875 [==============================] - 42s 22ms/step - loss: 0.0307 -
accuracy: 0.9897
Epoch 7/10
1875/1875 [==============================] - 43s 23ms/step - loss: 0.0259 -
accuracy: 0.9911
Epoch 8/10
1875/1875 [==============================] - 45s 24ms/step - loss: 0.0230 -
accuracy: 0.9921
Epoch 9/10
1875/1875 [==============================] - 43s 23ms/step - loss: 0.0197 -
accuracy: 0.9932
Epoch 10/10
1875/1875 [==============================] - 42s 22ms/step - loss: 0.0183 -
accuracy: 0.9943
<keras.callbacks.History at 0x7fa72f2c22f0>

model.evaluate(x_test, y_test)

313/313 [==============================] - 4s 13ms/step - loss: 0.0644 - accuracy:


0.9849
[0.06441233307123184, 0.9848999977111816]

Inference:-
The model's accuracy on the test set is approximately 98.5%.

Use the model to predict the image at index position 4444:

image_index = 4444
plt.imshow(x_test[image_index].reshape(28, 28),cmap='Greys')
pred = model.predict(x_test[image_index].reshape(1, 28, 28, 1))
print(pred.argmax())

1/1 [==============================] - 0s 115ms/step


9

Inference:- The model predicts the correct digit (9).
