01 Machine Learning
N.G.ACHARYA & D.K.MARATHE COLLEGE OF
ARTS, SCIENCE & COMMERCE
(Affiliated to University of Mumbai)
PRACTICAL JOURNAL
PSCSP512
Machine Learning
SUBMITTED BY
KAMBLE YASH RAJESH
SEAT NO :
2023-2024
MUMBAI-400 071
N.G.ACHARYA & D.K.MARATHE COLLEGE OF ARTS, SCIENCE & COMMERCE
CERTIFICATE
This is to certify that Mr. Kamble Yash Rajesh, Seat No. , studying
in Master of Science in Computer Science Part I, Semester II, has
satisfactorily completed the practicals of PSCSP512 Machine Learning
as prescribed by the University of Mumbai during the academic year 2023-24.
from sklearn import datasets
import pandas as pd

diabetes = datasets.load_diabetes()
diabetes
print(diabetes.DESCR)
# columns
diabetes.feature_names
# Now we will split the data into the independent and dependent variables
X = diabetes.data
Y = diabetes.target
X.shape, Y.shape
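The train/test split itself was not captured above, so train_x, test_x, train_y and test_y appear without definition. A minimal sketch, assuming an 80/20 split (the test_size and random_state values are assumptions):
from sklearn.model_selection import train_test_split
# hold out 20% of the samples for testing (assumed split ratio)
train_x, test_x, train_y, test_y = train_test_split(X, Y, test_size=0.2, random_state=0)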
train_x.shape, train_y.shape
# Linear Regression
from sklearn.linear_model import LinearRegression
le = LinearRegression()
le.fit(train_x,train_y)
y_pred = le.predict(test_x)
y_pred
result = pd.DataFrame({'Actual': test_y, 'Predict' : y_pred})
result
# Variance score
from sklearn.metrics import explained_variance_score, mean_squared_error, r2_score
explained_variance_score(test_y, y_pred)
>> 0.47737703777354545
# mean_squared_error
mean_squared_error(test_y,y_pred)
>> 3157.972848565651
# r2 score
r2_score(test_y,y_pred)
Inference:-
The model explains about 47.7% (0.477377) of the variance of the target with respect to the features.
The mean squared error of the model is 3157.972848565651.
The R-squared score of the model is 0.45.
Below are the coefficients and intercept of the regression equation as calculated by the
model.
coeff = pd.Series(le.coef_, index=diabetes.feature_names)
intercept = le.intercept_
print("Coefficients:\n")
print(coeff)
print("\n")
print("Intercept:\n")
print(intercept)
print("\n")
Coefficients:
age 54.820535
sex -260.930304
bmi 458.001802
bp 303.502332
s1 -995.584889
s2 698.811401
s3 183.095229
s4 185.698494
s5 838.503887
s6 96.441048
dtype: float64
Intercept:
154.42752615353518
{'data': array([[5.1, 3.5, 1.4, 0.2],
       [4.9, 3. , 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2],
       [4.6, 3.1, 1.5, 0.2],
       [5. , 3.6, 1.4, 0.2],
       [5.4, 3.9, 1.7, 0.4], ...
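The output above is the loaded iris dataset, but the loading and splitting code was not captured. A minimal sketch that reproduces the (105, 2) and (45, 2) shapes shown below, assuming only the first two features and a 70/30 split (both are assumptions):
from sklearn import datasets
from sklearn.model_selection import train_test_split
iris = datasets.load_iris()
X = iris.data[:, :2]   # first two features only, matching the (105, 2) shape below
Y = iris.target
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=0)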
X_train
X_train.shape
(105, 2)
X_test.shape
(45, 2)
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
sc.fit(X_train)
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)
X_train
X_train_std
from sklearn.linear_model import LogisticRegression
lr = LogisticRegression()
lr.fit(X_train_std, Y_train)
Y_predict = lr.predict(X_test_std)
Y_predict
array([2, 0, 0, 1, 1, 1, 2, 1, 2, 0, 0, 2, 0, 1, 0, 1, 2, 1, 1, 2, 2, 0, 1, 1, 1, 1, 1, 2, 0, 2, 0, 0, 1, 1, 2,
2, 0, 0, 0, 1, 2, 2, 1, 0, 0])
Output
#Data
dataset.head()
classifier.fit(X_train, y_train)
Output
[[13 0 0]
[ 0 15 1]
[ 0 0 9]]
Accuracy: 98.18 %
Standard Deviation: 3.64 %
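The accuracy and standard deviation above are the kind of figures produced by k-fold cross-validation; a minimal sketch of how they are typically computed with the classifier trained above (the 10-fold setting is an assumption):
from sklearn.model_selection import cross_val_score
accuracies = cross_val_score(estimator=classifier, X=X_train, y=y_train, cv=10)
print("Accuracy: {:.2f} %".format(accuracies.mean() * 100))
print("Standard Deviation: {:.2f} %".format(accuracies.std() * 100))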
import numpy as np
import matplotlib.pyplot as plt
def plot_dataset(X, y, axes):
    plt.figure(figsize=(10, 6))
    plt.plot(X[:, 0][y==0], X[:, 1][y==0], "bs", alpha=0.5)
    plt.plot(X[:, 0][y==1], X[:, 1][y==1], "g^", alpha=0.2)
    plt.axis(axes)
    plt.grid(True, which='both')
    plt.xlabel(r"$x_1$", fontsize=20)
    plt.ylabel(r"$x_2$", fontsize=20, rotation=0)
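The dataset being plotted and split was not captured; a usage sketch for the helper above, assuming the two-class make_moons dataset (the dataset choice, noise level, axis limits and split ratio are all assumptions):
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
X, y = make_moons(n_samples=1000, noise=0.15, random_state=42)   # assumed two-class dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
plot_dataset(X, y, [-1.5, 2.5, -1.0, 1.5])
plt.show()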
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV

tree_clf = DecisionTreeClassifier()
parameter = {
    'criterion': ["gini", "entropy"],
    'max_leaf_nodes': list(range(2, 50)),
    'min_samples_split': [2, 3, 4]
}
# clf is assumed to be a grid search over the parameter grid above;
# return_train_score=True so that "mean_train_score" is available in cv_results_ below
clf = GridSearchCV(tree_clf, parameter, cv=3, return_train_score=True)
clf.fit(X_train, y_train)
clf.best_params_
cvres = clf.cv_results_
for mean_score, params in zip(cvres["mean_train_score"], cvres["params"]):
    print(mean_score, params)
We get an accuracy of approximately 87%, but accuracy alone is sometimes not a good
measure to use.
array([[3547, 481],
[ 518, 3454]])
Now, from the confusion matrix, let's get our precision and recall, which are better
metrics.
from sklearn.metrics import precision_score, recall_score
Not bad: we have higher precision than recall, but let's combine the two metrics
into the F1 score.
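The value below is the F1 score; a minimal sketch of how it would be computed (the y_test and y_pred variable names are assumptions carried over from the earlier predictions):
from sklearn.metrics import f1_score
f1_score(y_test, y_pred)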
0.8736562539521943
Our F1 score and accuracy are almost the same.
0.8585
Inference:-
We have an accuracy of approximately 85% on the testing set.
Practical No 6
Aim:- Train an SVM regressor on the California housing dataset.
import pandas as pd
import numpy as np
import seaborn as sb
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing
housing_data = fetch_california_housing()
descr = housing_data['DESCR']
feature_names = housing_data['feature_names']
data = housing_data['data']
target = housing_data['target']
df1 = pd.DataFrame(data=data)
df1.rename(columns={0: feature_names[0], 1: feature_names[1], 2: feature_names[2],
                    3: feature_names[3], 4: feature_names[4], 5: feature_names[5],
                    6: feature_names[6], 7: feature_names[7]}, inplace=True)
df2 = pd.DataFrame(data=target)
df2.rename(columns={0: 'Target'}, inplace=True)
housing = pd.concat([df1, df2], axis=1)
print(housing.columns)
housing.head()
   MedInc  HouseAge  AveRooms  AveBedrms  Population  AveOccup  Latitude  Longitude  Target
0  8.3252      41.0  6.984127   1.023810       322.0  2.555556     37.88    -122.23   4.526
1  8.3014      21.0  6.238137   0.971880      2401.0  2.109842     37.86    -122.22   3.585
2  7.2574      52.0  8.288136   1.073446       496.0  2.802260     37.85    -122.24   3.521
3  5.6431      52.0  5.817352   1.073059       558.0  2.547945     37.85    -122.25   3.413
4  3.8462      52.0  6.281853   1.081081       565.0  2.181467     37.85    -122.25   3.422
print("dimension of housing data: {}".format(housing.shape))
housing.info()
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(housing.loc[:, housing.columns != 'Target'],
housing['Target'], random_state=66)
from sklearn.svm import SVR
svr = SVR()
svr.fit(X_train, y_train)
s1 = svr.score(X_train, y_train)
s2 = svr.score(X_test, y_test)
print("R² of Support Vector Regressor on training set: {:.3f}".format(s1))
print("R² of Support Vector Regressor on test set: {:.3f}".format(s2))
O/P
Inference:-
The model underperforms quite substantially, with a negative R² on both the training set
and the test set.
SVM requires all the features to vary on a similar scale, so we need to rescale the data so
that all the features are approximately on the same scale:
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)   # transform only; the scaler was fitted on the training data
svr1 = SVR()
svr1.fit(X_train_scaled, y_train)
s3 = svr1.score(X_train_scaled, y_train)
s4 = svr1.score(X_test_scaled, y_test)
print("R² of Support Vector Regressor on training set: {:.3f}".format(s3))
print("R² of Support Vector Regressor on test set: {:.3f}".format(s4))
Inference:-
Scaling the data made a huge difference. Now we are actually underfitting: training and
test set performance are quite similar, but both are well below the best possible score.
From here, we can try increasing either gamma or C to fit a more complex model.
svr2 = SVR(gamma=10)
svr2.fit(X_train_scaled, y_train)
s5 = svr2.score(X_train_scaled, y_train)
s6 = svr2.score(X_test_scaled, y_test)
print("R² of Support Vector Regressor on training set: {:.3f}".format(s5))
print("R² of Support Vector Regressor on test set: {:.3f}".format(s6))
Inference:-
Here, increasing gamma improves the model, resulting in a test-set R² of about 0.697.
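Instead of hand-tuning, gamma and C could also be searched jointly; a minimal sketch (the grid values and the 3-fold setting are assumptions):
from sklearn.model_selection import GridSearchCV
param_grid = {'C': [0.1, 1, 10, 100], 'gamma': [0.01, 0.1, 1, 10]}
grid = GridSearchCV(SVR(), param_grid, cv=3)   # 3-fold CV over the assumed grid
grid.fit(X_train_scaled, y_train)
print(grid.best_params_, grid.best_score_)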
Practical No 7
Aim:- Implement Batch Gradient Descent with early stopping for Softmax
Regression.
import numpy as np
import math
class SoftmaxClassifier:
    def __init__(self, learning_rate=0.1, max_iter=1000):
        self.__learning_rate = learning_rate
        self.__max_iter = max_iter

    def __calculate_score(self, k, x):
        weight = self.__weights[k]
        return x.dot(weight)

    def train(self, x, y):
        self.__x = x
        self.__y = y
        self.__class_count = len(self.__y[0])
        self.__weights = np.random.rand(self.__class_count, x.shape[1])
        for i in range(self.__max_iter):
            for j in range(self.__class_count):
                self.__weights[j] = self.__calculate_new_weights(j)

    def __calculate_softmax(self, k, x):
        # softmax: exponentiate each class score before normalising
        sum_of_exp = 0
        for i in range(self.__class_count):
            sum_of_exp += math.exp(self.__calculate_score(i, x))
        return math.exp(self.__calculate_score(k, x)) / sum_of_exp

    def __calculate_cross_entropy_gradient(self, k):
        sum = 0
        for i in range(len(self.__x)):
            sum += ((self.__calculate_softmax(k, self.__x[i]) - self.__y[i][k]) * self.__x[i])
        return sum

    def __calculate_new_weights(self, k):
        step_size = self.__calculate_cross_entropy_gradient(k) * self.__learning_rate
        return self.__weights[k] - step_size

    def predict(self, x):
        y = np.zeros((len(x), self.__class_count))
        for i in range(len(x)):
            max_score_index = 0
            max_score = 0
            for j in range(self.__class_count):
                score = self.__calculate_softmax(j, x[i])
                if score > max_score:
                    max_score = score
                    max_score_index = j
            y[i][max_score_index] = 1
        return y
import numpy as np
def convert_to_one_hot(labels):
    class_count = len(set(labels))
    one_hot = np.zeros((len(labels), class_count))
    for i in range(len(labels)):
        one_hot[i][labels[i]] = 1
    return one_hot
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def main():
    iris = datasets.load_iris()
    data = iris['data']
    labels_one_hot = convert_to_one_hot(iris['target'])
    rand = np.random.permutation(len(data))
    x_train, x_test, y_train, y_test = train_test_split(data[rand], labels_one_hot[rand],
                                                        test_size=0.33)
    soft_clf = SoftmaxClassifier()
    soft_clf.train(x_train, y_train)
    y_pred = soft_clf.predict(x_test)
    Accuracy = accuracy_score(y_test, y_pred)
    print(Accuracy)

if __name__ == "__main__":
    main()
O/P Accuracy=0.8
Inference:-
The accuracy of the model is 80%.
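The class above trains for a fixed number of iterations, while the aim also calls for early stopping. A standalone sketch of batch gradient descent with early stopping on a held-out validation split (the vectorised update, the patience value and the function/variable names are all assumptions, not part of the submitted class):
def softmax(scores):
    exp = np.exp(scores - scores.max(axis=1, keepdims=True))   # subtract the max for numerical stability
    return exp / exp.sum(axis=1, keepdims=True)

def train_softmax_early_stopping(x_train, y_train, x_val, y_val,
                                 learning_rate=0.1, max_iter=1000, patience=10):
    weights = np.random.rand(x_train.shape[1], y_train.shape[1])
    best_loss, best_weights, bad_epochs = np.inf, weights.copy(), 0
    for epoch in range(max_iter):
        probs = softmax(x_train.dot(weights))
        gradient = x_train.T.dot(probs - y_train) / len(x_train)   # batch gradient of cross-entropy
        weights -= learning_rate * gradient
        val_probs = softmax(x_val.dot(weights))
        val_loss = -np.mean(np.sum(y_val * np.log(val_probs + 1e-12), axis=1))
        if val_loss < best_loss:                 # validation loss improved: remember these weights
            best_loss, best_weights, bad_epochs = val_loss, weights.copy(), 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:           # stop once validation loss stops improving
                break
    return best_weights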
Practical No 8
Aim:- Implement MLP for classification of handwritten digits (MNIST
Dataset).
import tensorflow as tf
import matplotlib.pyplot as plt
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
Output: a sample training digit is displayed with plt.imshow (label: 8).
x_train.shape
(60000, 28, 28)
# Reshaping the array to 4-dims so that it can work with the Keras API
x_train = x_train.reshape(x_train.shape[0], 28, 28, 1)
x_test = x_test.reshape(x_test.shape[0], 28, 28, 1)
input_shape = (28, 28, 1)
# Making sure that the values are float so that we can get decimal points after division
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
# Normalizing the pixel values by dividing by the maximum value (255)
x_train /= 255
x_test /= 255
print('x_train shape:', x_train.shape)
print('Number of images in x_train', x_train.shape[0])
print('Number of images in x_test', x_test.shape[0])
x_train shape: (60000, 28, 28, 1)
Number of images in x_train 60000
Number of images in x_test 10000
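The model definition itself was not captured before model.compile below; a minimal sketch of a plain MLP consistent with the (28, 28, 1) input shape and the sparse_categorical_crossentropy loss (the layer sizes and dropout rate are assumptions):
from tensorflow import keras
model = keras.Sequential([
    keras.layers.Flatten(input_shape=input_shape),   # flatten each 28x28x1 image to 784 values
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(10, activation='softmax')     # one output per digit class
])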
model.compile(optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
model.fit(x=x_train,y=y_train, epochs=10)
Epoch 1/10
1875/1875 [==============================] - 43s 22ms/step - loss: 0.2121 -
accuracy: 0.9357
Epoch 2/10
1875/1875 [==============================] - 42s 22ms/step - loss: 0.0878 -
accuracy: 0.9727
Epoch 3/10
1875/1875 [==============================] - 45s 24ms/step - loss: 0.0588 -
accuracy: 0.9815
Epoch 4/10
1875/1875 [==============================] - 43s 23ms/step - loss: 0.0447 -
accuracy: 0.9855
Epoch 5/10
1875/1875 [==============================] - 44s 23ms/step - loss: 0.0359 -
accuracy: 0.9886
Epoch 6/10
1875/1875 [==============================] - 42s 22ms/step - loss: 0.0307 -
accuracy: 0.9897
Epoch 7/10
1875/1875 [==============================] - 43s 23ms/step - loss: 0.0259 -
accuracy: 0.9911
Epoch 8/10
1875/1875 [==============================] - 45s 24ms/step - loss: 0.0230 -
accuracy: 0.9921
Epoch 9/10
1875/1875 [==============================] - 43s 23ms/step - loss: 0.0197 -
accuracy: 0.9932
Epoch 10/10
1875/1875 [==============================] - 42s 22ms/step - loss: 0.0183 -
accuracy: 0.9943
<keras.callbacks.History at 0x7fa72f2c22f0>
model.evaluate(x_test, y_test)
Inference:-
The model reaches approximately 98% accuracy on the test set.
image_index = 4444
plt.imshow(x_test[image_index].reshape(28, 28),cmap='Greys')
pred = model.predict(x_test[image_index].reshape(1, 28, 28, 1))
print(pred.argmax())