ML Lab6.ipynb - Colaboratory


5/10/22, 12:39 AM 191389_ML_Lab6.ipynb - Colaboratory

DECISION TREE

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from google.colab import files
uploaded = files.upload()

Saving Bill.csv to Bill.csv

dt_df = pd.read_csv("Bill.csv")

dt_df.describe()

          Variance     Skewness     Curtosis      Entropy        Class
count  1372.000000  1372.000000  1372.000000  1372.000000  1372.000000
mean      0.433735     1.922353     1.397627    -1.191657     0.444606
std       2.842763     5.869047     4.310030     2.101013     0.497103
min      -7.042100   -13.773100    -5.286100    -8.548200     0.000000
25%      -1.773000    -1.708200    -1.574975    -2.413450     0.000000
50%       0.496180     2.319650     0.616630    -0.586650     0.000000
75%       2.821475     6.814625     3.179250     0.394810     1.000000
max       6.824800    12.951600    17.927400     2.449500     1.000000

dt_df.isnull().sum()*100/dt_df.shape[0]

Variance    0.0
Skewness    0.0
Curtosis    0.0
Entropy     0.0
Class       0.0
dtype: float64
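The expression above reports the percentage of missing values per column; here every column is 0% missing. A minimal sketch of the same idiom on a small synthetic frame (the column names `a` and `b` are made up for illustration):

```python
import numpy as np
import pandas as pd

# Tiny synthetic frame with one NaN in column "a"
df = pd.DataFrame({"a": [1.0, np.nan, 3.0, 4.0],
                   "b": [1.0, 2.0, 3.0, 4.0]})

# Same idiom as in the notebook: per-column percentage of missing values
pct_missing = df.isnull().sum() * 100 / df.shape[0]
print(pct_missing)  # a: 25.0, b: 0.0
```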

X_dt = dt_df.drop('Class', axis=1)
y_dt = dt_df['Class']

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X_dt, y_dt, test_size=0.2)
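The split above passes no `random_state`, so every rerun produces a different train/test partition, and the reported metrics can drift between sessions. A sketch of a reproducible, stratified split, shown on synthetic data so it runs standalone; in the notebook the arguments would be `X_dt` and `y_dt`:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Bill.csv features
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(100, 4)),
                 columns=["Variance", "Skewness", "Curtosis", "Entropy"])
y = pd.Series(rng.integers(0, 2, size=100), name="Class")

# random_state pins the shuffle; stratify=y keeps the class ratio in both parts
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

print(len(X_tr), len(X_te))  # 80 20
```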

https://colab.research.google.com/drive/1fb1GxUhTNcGPM5E0YNyfB3YyegB03GFC#scrollTo=bjc56ErlCh4M&printMode=true 1/5

X_train.head()

     Variance  Skewness  Curtosis  Entropy
286    1.3419   -4.4221    8.0900 -1.73490
412    3.7767    9.7794   -3.9075 -3.53230
493    2.8084   11.3045   -3.3394 -4.41940
369    2.1948    1.3781    1.1582  0.85774
732   -2.7143   11.4535    2.1092 -3.96290

y_train.head()

286    0
412    0
493    0
369    0
732    0
Name: Class, dtype: int64

import statsmodels.api as sm
X_train_sm = sm.add_constant(X_train)
dt_lm = sm.OLS(y_train, X_train_sm).fit()

/usr/local/lib/python3.7/dist-packages/statsmodels/tsa/tsatools.py:117: FutureWarning
x = pd.concat(x[::order], 1)

print(dt_lm.summary())

                            OLS Regression Results
==============================================================================
Dep. Variable:                  Class   R-squared:                       0.867
Model:                            OLS   Adj. R-squared:                  0.866
Method:                 Least Squares   F-statistic:                     1775.
Date:                Thu, 24 Mar 2022   Prob (F-statistic):               0.00
Time:                        18:14:55   Log-Likelihood:                 315.74
No. Observations:                1097   AIC:                            -621.5
Df Residuals:                    1092   BIC:                            -596.5
Df Model:                           4
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.8033      0.008     94.520      0.000       0.787       0.820
Variance      -0.1436      0.002    -59.842      0.000      -0.148      -0.139
Skewness      -0.0787      0.002    -45.004      0.000      -0.082      -0.075
Curtosis      -0.1035      0.002    -47.924      0.000      -0.108      -0.099
Entropy        0.0008      0.004      0.239      0.811      -0.006       0.008
==============================================================================
Omnibus:                      150.601   Durbin-Watson:                   1.938
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              283.478
Skew:                          -0.844   Prob(JB):                     2.78e-62
Kurtosis:                       4.832   Cond. No.                         11.3
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

from sklearn.tree import DecisionTreeClassifier
clf_CART = DecisionTreeClassifier()  # criterion is 'gini' by default
clf_ID3 = DecisionTreeClassifier(criterion='entropy')  # for ID3 the criterion is entropy
clf_CART.fit(X_train, y_train)
clf_ID3.fit(X_train, y_train)

DecisionTreeClassifier(criterion='entropy')

y_pred = clf_CART.predict(X_test)

from sklearn.metrics import confusion_matrix, classification_report
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))

[[149   4]
 [  1 121]]

              precision    recall  f1-score   support

           0       0.99      0.97      0.98       153
           1       0.97      0.99      0.98       122

    accuracy                           0.98       275
   macro avg       0.98      0.98      0.98       275
weighted avg       0.98      0.98      0.98       275
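Only `clf_CART` is scored above; the entropy-criterion tree can be evaluated the same way. A self-contained sketch comparing the two split criteria on synthetic data (the notebook would instead reuse its own `X_train`/`X_test` split from Bill.csv):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in with four features, like the Bill.csv data
X, y = make_classification(n_samples=1000, n_features=4, n_informative=3,
                           n_redundant=1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

accs = {}
for criterion in ("gini", "entropy"):   # CART-style vs ID3-style split rule
    clf = DecisionTreeClassifier(criterion=criterion, random_state=0)
    clf.fit(X_tr, y_tr)
    accs[criterion] = accuracy_score(y_te, clf.predict(X_te))

print(accs)  # the two criteria usually land within a few points of each other
```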

# CART Decision Tree
from sklearn.tree import plot_tree
plt.figure(figsize=(25, 10))
plot_tree(clf_CART, filled=True)
plt.show()
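`plot_tree` renders the fitted tree as a matplotlib figure (not reproduced in this export). For a dependency-free view of the same splits, `sklearn.tree.export_text` prints the rules as indented text. A minimal sketch on the built-in iris data, since the notebook's fitted `clf_CART` is not available here:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Text rendering of the split thresholds, one indented line per node
rules = export_text(clf, feature_names=["sepal_len", "sepal_wid",
                                        "petal_len", "petal_wid"])
print(rules)
```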


# ID3 Decision Tree
from sklearn.tree import plot_tree
plt.figure(figsize=(25, 10))
plot_tree(clf_ID3, filled=True)
plt.show()

