# Diabetes Classification with KNN (pandas, NumPy, seaborn)


p5

November 6, 2024

[28]: # Diabetes

[1]: import pandas as pd
     import numpy as np
     import seaborn as sns

C:\Users\ASUS\AppData\Local\Temp\ipykernel_12308\1285016483.py:1: DeprecationWarning:
Pyarrow will become a required dependency of pandas in the next major release of pandas (pandas 3.0),
(to allow more performant data types, such as the Arrow string type, and better
interoperability with other libraries)
but was not found to be installed on your system.
If this would cause problems for you,
please provide us feedback at https://github.com/pandas-dev/pandas/issues/54466
  import pandas as pd

[2]: df = pd.read_csv(r"C:\Users\ASUS\Downloads\diabetes.csv")

[3]: df

[3]: Pregnancies Glucose BloodPressure SkinThickness Insulin BMI \


0 6 148 72 35 0 33.6
1 1 85 66 29 0 26.6
2 8 183 64 0 0 23.3
3 1 89 66 23 94 28.1
4 0 137 40 35 168 43.1
.. … … … … … …
763 10 101 76 48 180 32.9
764 2 122 70 27 0 36.8
765 5 121 72 23 112 26.2
766 1 126 60 0 0 30.1
767 1 93 70 31 0 30.4

Pedigree Age Outcome

0 0.627 50 1
1 0.351 31 0
2 0.672 32 1
3 0.167 21 0
4 2.288 33 1
.. … … …
763 0.171 63 0
764 0.340 27 0
765 0.245 30 0
766 0.349 47 1
767 0.315 23 0

[768 rows x 9 columns]

[4]: # input data


x = df.drop('Outcome', axis = 1)

# output data
y = df['Outcome']

[5]: sns.countplot(x = y)

[5]: <Axes: xlabel='Outcome', ylabel='count'>

[6]: y.value_counts()

[6]: Outcome
0 500
1 268
Name: count, dtype: int64

[7]: # Feature scaling


from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
x_scaled = scaler.fit_transform(x)
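Scaling matters for KNN because the classifier is distance-based: a wide-range column such as Glucose (0–199) would otherwise dominate a narrow-range column such as Pedigree (≈0–2.4). A minimal sketch of what `MinMaxScaler` does, using three made-up rows as a stand-in for the dataset:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Illustrative rows: a wide-range column (glucose-like) next to a
# narrow-range column (pedigree-like).
X = np.array([[148.0, 0.627],
              [ 85.0, 0.351],
              [183.0, 2.288]])

# MinMaxScaler maps each column independently onto [0, 1].
X_scaled = MinMaxScaler().fit_transform(X)
print(X_scaled.min(axis=0))  # [0. 0.]
print(X_scaled.max(axis=0))  # [1. 1.]
```

After scaling, both columns contribute on the same [0, 1] footing to the Euclidean distances KNN computes.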

[8]: # cross-validation
     from sklearn.model_selection import train_test_split
     xtrain, xtest, ytrain, ytest = train_test_split(x, y, random_state=0, test_size=0.5)
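Note that the split above is taken on the unscaled `x`, so the `x_scaled` array from cell [7] is never actually used. A sketch of carrying the scaling through correctly (random stand-in data, since the CSV path is machine-specific), fitting the scaler on the training half only to avoid leaking test-set statistics:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
x = rng.uniform(0, 200, size=(768, 8))   # stand-in for the 8 feature columns
y = rng.integers(0, 2, size=768)         # stand-in for Outcome

xtrain, xtest, ytrain, ytest = train_test_split(x, y, random_state=0, test_size=0.5)

scaler = MinMaxScaler()
xtrain_scaled = scaler.fit_transform(xtrain)  # fit min/max on training data only
xtest_scaled = scaler.transform(xtest)        # reuse the training min/max

print(xtrain_scaled.shape, xtest_scaled.shape)  # (384, 8) (384, 8)
```

The classifier would then be fit on `xtrain_scaled` and scored on `xtest_scaled`.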

[10]: x.shape

[10]: (768, 8)

[11]: xtrain.shape

[11]: (384, 8)

[12]: ytrain.shape

[12]: (384,)

[13]: from sklearn.neighbors import KNeighborsClassifier

[15]: knn = KNeighborsClassifier(n_neighbors=5)

[16]: knn.fit(xtrain, ytrain)

[16]: KNeighborsClassifier()
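`n_neighbors=5` is simply the scikit-learn default; a quick sanity check is to score a few values of k on the held-out half. A sketch with synthetic stand-in data (the notebook itself would reuse its own `xtrain`/`xtest`):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
x = rng.normal(size=(768, 2))
y = (x[:, 0] + x[:, 1] > 0).astype(int)  # label determined by the features

xtrain, xtest, ytrain, ytest = train_test_split(x, y, random_state=0, test_size=0.5)

scores = {}
for k in (1, 3, 5, 7, 9):
    knn = KNeighborsClassifier(n_neighbors=k).fit(xtrain, ytrain)
    scores[k] = knn.score(xtest, ytest)   # accuracy on the held-out half
    print(k, round(scores[k], 3))
```

Odd values of k avoid tied votes in a two-class problem.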

[26]: from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score
      from sklearn.metrics import classification_report

[18]: ypred = knn.predict(xtest)

[22]: confusion_matrix = confusion_matrix(ytest, ypred)

[23]: print(classification_report(ytest, ypred))

              precision recall f1-score support

            0      0.78   0.85     0.81     253
            1      0.65   0.53     0.58     131

     accuracy                      0.74     384
    macro avg      0.71   0.69     0.70     384
 weighted avg      0.73   0.74     0.73     384

[24]: tn = confusion_matrix[0][0]
      fp = confusion_matrix[0][1]
      fn = confusion_matrix[1][0]
      tp = confusion_matrix[1][1]
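A caution about cell [22]: `confusion_matrix = confusion_matrix(ytest, ypred)` rebinds the name `confusion_matrix` to the result array, so the sklearn function can no longer be called afterwards in the same session. A sketch that avoids the shadowing and unpacks the counts with `ravel()` (the variable name `cm` and the tiny label vectors are illustrative, not from the notebook):

```python
from sklearn.metrics import confusion_matrix

ytest = [0, 0, 1, 1, 0]   # toy ground-truth labels
ypred = [0, 1, 1, 0, 0]   # toy predictions

# Rows are true labels, columns are predictions: [[tn, fp], [fn, tp]].
cm = confusion_matrix(ytest, ypred)
tn, fp, fn, tp = cm.ravel()
print(tn, fp, fn, tp)  # 2 1 1 1
```

Keeping the result under a different name (`cm`) leaves the function usable for later cells.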

[25]: accuracy = (tp+tn)/(tn+fp+fn+tp)
      error = (fp+fn)/(tn+fp+fn+tp)
      precision = tp/(tp+fp)
      recall = tp/(tp+fn)
      print("\nBy using manual calculation : ")
      print("Accuracy : ", accuracy)
      print("Error rate : ", error)
      print("Precision : ", precision)
      print("Recall : ", recall)

By using manual calculation :


Accuracy : 0.7421875
Error rate : 0.2578125
Precision : 0.6509433962264151
Recall : 0.5267175572519084

[29]: print("\nBy using inbuilt function : ")


print("Accuracy : ", accuracy_score(ytest,ypred))
print("Error rate : ", 1-accuracy_score(ytest, ypred))
print("Precision : ", precision_score(ytest,ypred))
print("Recall : ",recall_score(ytest,ypred))

By using inbuilt function :


Accuracy : 0.7421875
Error rate : 0.2578125
Precision : 0.6509433962264151
Recall : 0.5267175572519084
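As a sanity check, the printed metrics are mutually consistent. With 384 test rows, accuracy 0.7421875 implies 285 correct predictions, and working backwards from the precision and recall gives the counts tn=216, fp=37, fn=62, tp=69 (derived here from the printed values, not shown explicitly in the notebook):

```python
# Counts backed out from the printed metrics (derived, not from the notebook).
tp, tn, fp, fn = 69, 216, 37, 62
total = tp + tn + fp + fn            # 384 test rows (test_size=0.5 of 768)

print((tp + tn) / total)             # 0.7421875
print((fp + fn) / total)             # 0.2578125
print(tp / (tp + fp))                # 0.6509433962264151
print(tp / (tp + fn))                # 0.5267175572519084
```

These also agree with the classification report: tn + fp = 253 and tp + fn = 131 match the class supports.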

[ ]:
