Week 1 To Week 9


WEEK-1

Create a repository named "mini project-1" and push it to GitHub.

Shreyas@DESKTOP-4UBQVST MINGW64 ~ (master)
$ cd desktop

Shreyas@DESKTOP-4UBQVST MINGW64 ~/desktop (master)
$ git config --global user.name shreyas55555

Shreyas@DESKTOP-4UBQVST MINGW64 ~/desktop (master)
$ git config --global user.email [email protected]

Shreyas@DESKTOP-4UBQVST MINGW64 ~/desktop (master)
$ git init
Reinitialized existing Git repository in C:/Users/Shreyas/Desktop/.git/

Shreyas@DESKTOP-4UBQVST MINGW64 ~/desktop (master)
$ git clone https://github.com/Shreyas55555/kaggle.git
Cloning into 'kaggle'...
warning: You appear to have cloned an empty repository.

Shreyas@DESKTOP-4UBQVST MINGW64 ~/desktop (master)
$ cd kaggle

Shreyas@DESKTOP-4UBQVST MINGW64 ~/desktop/kaggle (main)
$ git status
On branch main

No commits yet

nothing to commit (create/copy files and use "git add" to track)

Shreyas@DESKTOP-4UBQVST MINGW64 ~/desktop/kaggle (main)
$ touch shreyasb

Shreyas@DESKTOP-4UBQVST MINGW64 ~/desktop/kaggle (main)
$ git add -A

Shreyas@DESKTOP-4UBQVST MINGW64 ~/desktop/kaggle (main)
$ git commmit -m "a"
git: 'commmit' is not a git command. See 'git --help'.

The most similar command is
        commit

Shreyas@DESKTOP-4UBQVST MINGW64 ~/desktop/kaggle (main)
$ git commit -m "a"
[main (root-commit) 570c72d] a
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 shreyasb

Shreyas@DESKTOP-4UBQVST MINGW64 ~/desktop/kaggle (main)
$ git status
On branch main
Your branch is based on 'origin/main', but the upstream is gone.
  (use "git branch --unset-upstream" to fixup)

nothing to commit, working tree clean

Shreyas@DESKTOP-4UBQVST MINGW64 ~/desktop/kaggle (main)
$ git checkout -b sshreyas
Switched to a new branch 'sshreyas'

Shreyas@DESKTOP-4UBQVST MINGW64 ~/desktop/kaggle (sshreyas)
$ git add ggg
fatal: pathspec 'ggg' did not match any files

Shreyas@DESKTOP-4UBQVST MINGW64 ~/desktop/kaggle (sshreyas)
$ touch ggg

Shreyas@DESKTOP-4UBQVST MINGW64 ~/desktop/kaggle (sshreyas)
$ git add ggg

Shreyas@DESKTOP-4UBQVST MINGW64 ~/desktop/kaggle (sshreyas)
$ git commit -m "dd"
[sshreyas 2360687] dd
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 ggg

Shreyas@DESKTOP-4UBQVST MINGW64 ~/desktop/kaggle (sshreyas)
$ git push origin sshreyas
Enumerating objects: 5, done.
Counting objects: 100% (5/5), done.
Delta compression using up to 8 threads
Compressing objects: 100% (3/3), done.
Writing objects: 100% (5/5), 407 bytes | 407.00 KiB/s, done.
Total 5 (delta 0), reused 0 (delta 0), pack-reused 0 (from 0)
To https://github.com/Shreyas55555/kaggle.git
 * [new branch]      sshreyas -> sshreyas

Shreyas@DESKTOP-4UBQVST MINGW64 ~/desktop/kaggle (sshreyas)
$ echo "# kagge1" >> ggg

Shreyas@DESKTOP-4UBQVST MINGW64 ~/desktop/kaggle (sshreyas)
$ git add ggg
warning: in the working copy of 'ggg', LF will be replaced by CRLF the next time Git touches it

Shreyas@DESKTOP-4UBQVST MINGW64 ~/desktop/kaggle (sshreyas)
$ git commit -m "g"
[sshreyas cbcd087] g
 1 file changed, 1 insertion(+)

Shreyas@DESKTOP-4UBQVST MINGW64 ~/desktop/kaggle (sshreyas)
$ git push origin sshreyas
Enumerating objects: 5, done.
Counting objects: 100% (5/5), done.
Delta compression using up to 8 threads
Compressing objects: 100% (2/2), done.
Writing objects: 100% (3/3), 271 bytes | 271.00 KiB/s, done.
Total 3 (delta 0), reused 0 (delta 0), pack-reused 0 (from 0)
To https://github.com/Shreyas55555/kaggle.git
   2360687..cbcd087  sshreyas -> sshreyas
WEEK-2

Python recap: database connectivity.

Code:
import pymysql
from tkinter import messagebox
import tkinter as tk

def submit():
    name = name_entry.get()
    reg_no = reg_no_entry.get()
    pathway = pathway_entry.get()

    if name and reg_no and pathway:
        try:
            db_config = {
                "host": "localhost",
                "user": "root",
                "password": "bvvs",
                "database": "CS"
            }
            conn = pymysql.connect(**db_config)
            cursor = conn.cursor()
            query = "INSERT INTO students (name, reg_no, pathway) VALUES (%s, %s, %s)"
            values = (name, reg_no, pathway)
            cursor.execute(query, values)
            conn.commit()
            cursor.close()
            conn.close()
            messagebox.showinfo("Success", "Data submitted successfully!")
            clear_fields()
        except pymysql.MySQLError as err:
            messagebox.showerror("Error", f"Database error: {err}")
        except Exception as e:
            messagebox.showerror("Error", f"Unexpected error: {e}")
    else:
        messagebox.showwarning("Warning", "Please fill in all fields.")

def clear_fields():
    name_entry.delete(0, tk.END)
    reg_no_entry.delete(0, tk.END)
    pathway_entry.delete(0, tk.END)

# Set up the main window
root = tk.Tk()
root.title("Database GUI")

# Define the layout
labels_text = ["Name:", "Registration:", "Pathway:"]
entries = []

for i, text in enumerate(labels_text):
    tk.Label(root, text=text).grid(row=i, column=0, padx=10, pady=10, sticky=tk.W)
    entry = tk.Entry(root)
    entry.grid(row=i, column=1, padx=10, pady=10)
    entries.append(entry)

name_entry, reg_no_entry, pathway_entry = entries

submit_button = tk.Button(root, text="Submit", command=submit)
clear_button = tk.Button(root, text="Clear", command=clear_fields)

submit_button.grid(row=len(labels_text), column=0, columnspan=2, pady=10, sticky=tk.W+tk.E)
clear_button.grid(row=len(labels_text)+1, column=0, columnspan=2, pady=10, sticky=tk.W+tk.E)

root.mainloop()

MySQL:

mysql> create database CS;
Query OK, 1 row affected (0.04 sec)

mysql> use CS;
Database changed

mysql> create table students(name varchar(20), reg_no int(20), pathway varchar(20));
Query OK, 0 rows affected, 1 warning (0.08 sec)

Output:

mysql> select * from students;
+-----------------+--------+---------+
| name            | reg_no | pathway |
+-----------------+--------+---------+
| shreyas bhagoji |   2246 | AIML    |
+-----------------+--------+---------+
1 row in set (0.01 sec)
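
To confirm the insert from Python rather than the mysql client, a minimal read-back sketch, assuming the same CS database and credentials used above:

import pymysql

# Connect with the same credentials as the GUI script above.
conn = pymysql.connect(host="localhost", user="root", password="bvvs", database="CS")
with conn.cursor() as cursor:
    # Fetch every row inserted through the form and print it.
    cursor.execute("SELECT name, reg_no, pathway FROM students")
    for row in cursor.fetchall():
        print(row)
conn.close()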


WEEK-3

Consider a dataset and infer the relations between its parameters with the help of different plots.

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("/content/data.csv")
df.head()

# Overlaid histograms of the two columns
plt.hist(df['FastCharge_KmH'], bins=10)
plt.hist(df['TopSpeed_KmH'], bins=10)
plt.show()

# Scatter plot: relation between top speed and fast-charge rate
plt.scatter(x='TopSpeed_KmH', y='FastCharge_KmH', data=df)
plt.xlabel('TopSpeed_KmH')
plt.ylabel('FastCharge_KmH')
plt.show()

# Box plot: spread and outliers of FastCharge_KmH
plt.boxplot(df['FastCharge_KmH'])
plt.show()

# Bar chart of fast-charge rate against top speed
plt.bar(df['TopSpeed_KmH'], df['FastCharge_KmH'], width=5)
plt.show()

plt.scatter(x='TopSpeed_KmH', y='FastCharge_KmH', data=df)
plt.xlabel('TopSpeed_KmH')
plt.ylabel('FastCharge_KmH')
plt.show()

plt.hist(df['TopSpeed_KmH'], bins=10)
plt.show()
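
To put a number on the relation the scatter plot suggests, a minimal sketch continuing the same script (assuming both columns of data.csv are numeric, as the plots above imply):

# Pearson correlation: values near +1 or -1 indicate a strong linear
# relation between the two columns, values near 0 a weak one.
corr = df['TopSpeed_KmH'].corr(df['FastCharge_KmH'])
print("Correlation between TopSpeed_KmH and FastCharge_KmH:", corr)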
WEEK-4

Use relevant Python packages to compute central tendency and dispersion for the parameters.
import pandas as pd
from scipy import stats
import matplotlib.pyplot as plt
import numpy as np

data = pd.read_csv("/content/Copy of ev_sales_year1.csv")
data.head()

# Central tendency: mean, median, and mode (each on a different column)
mean = data['HEAVY GOODS VEHICLE'].mean()
median = data['HEAVY PASSENGER VEHICLE'].median()
mode = stats.mode(data['LIGHT GOODS VEHICLE'])
print("central tendency")
print(mean)
print(median)
print(mode)

Output:
central tendency
1.6666666666666667
0.0
ModeResult(mode=5, count=3)

# Dispersion: range, variance, standard deviation, and interquartile range
data_range = np.ptp(data['HEAVY GOODS VEHICLE'])
variance = np.var(data['HEAVY PASSENGER VEHICLE'])
std_deviation = np.std(data['LIGHT GOODS VEHICLE'])
iqr = np.percentile(data['LIGHT GOODS VEHICLE'], 75) - np.percentile(data['LIGHT GOODS VEHICLE'], 25)
print("dispersion")
print("data_range", data_range)
print("variance", variance)
print("std_deviation", std_deviation)
print("iqr", iqr)

Output:
dispersion
data_range 3
variance 0.07638888888888888
std_deviation 5.97854961972848
iqr 2.5
import seaborn as sns

plt.figure(figsize=(15, 5))
plt.subplot(131)
sns.histplot(data['HEAVY GOODS VEHICLE'], kde=True)
plt.title("Histogram of HEAVY GOODS VEHICLE")
plt.subplot(132)
sns.boxplot(x='HEAVY GOODS VEHICLE', data=data)
plt.title("Box plot of HEAVY GOODS VEHICLE")
plt.show()
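
The same statistics can also be produced for every numeric column at once; a minimal sketch continuing with the same data frame loaded above:

# One table of central tendency and dispersion for all numeric columns.
stats_table = data.select_dtypes('number').agg(['mean', 'median', 'std', 'var', 'min', 'max'])
print(stats_table)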
WEEK-5

Dealing with missing values with different approaches; detecting outliers.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("/content/Copy of ev_sales_year1.csv")
df.head()

# Count missing values per column
missing_values = df.isnull().sum()
print("Missing values:\n", missing_values)
Output:
Missing values:
Date                          0
HEAVY GOODS VEHICLE           0
HEAVY MOTOR VEHICLE           0
HEAVY PASSENGER VEHICLE       0
LIGHT GOODS VEHICLE           0
LIGHT MOTOR VEHICLE           0
LIGHT PASSENGER VEHICLE       0
MEDIUM MOTOR VEHICLE          0
OTHER THAN MENTIONED ABOVE    0
THREE WHEELER(NT)             0
THREE WHEELER(T)              0
TWO WHEELER(NT)               0
TWO WHEELER(T)                0
dtype: int64

from scipy import stats
import pandas as pd

df = pd.read_csv('/content/Copy of ev_sales_year1.csv')
df.describe()

# Flag values whose z-score magnitude exceeds the threshold as outliers
zscore = stats.zscore(df['HEAVY GOODS VEHICLE'])
threshold = 2
outlier = abs(zscore) > threshold
print(outlier)
Output:
0     False
1     False
2     False
3     False
4     False
5     False
6     False
7     False
8     False
9     False
10    False
11    False
Name: HEAVY GOODS VEHICLE, dtype: bool

import matplotlib.pyplot as plt
import pandas as pd

# Small hand-made frame with one obvious outlier (PassengerId 1000)
data = {'PassengerId': [101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 1000],
        'Pclass': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]}
df = pd.DataFrame(data)

threshold = 2
z_scores = (df['PassengerId'] - df['PassengerId'].mean()) / df['PassengerId'].std()
outliers = df[(z_scores > threshold) | (z_scores < -threshold)]

plt.figure(figsize=(8, 6))
plt.plot(df.index, df['PassengerId'], 'bo', label='Data points')
plt.plot(outliers.index, outliers['PassengerId'], 'ro', label='Outliers')
plt.xlabel("Index")
plt.ylabel("PassengerId")
plt.legend()
plt.show()

plt.scatter(x='Pclass', y='PassengerId', data=df)
plt.xlabel("Pclass")
plt.ylabel("PassengerId")
plt.show()
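
The EV dataset above happens to have no gaps, so as a hedged illustration of the different approaches the task asks for, a toy frame with one missing entry (hypothetical data):

import pandas as pd
import numpy as np

df_toy = pd.DataFrame({'sales': [10, np.nan, 30, 40]})

dropped = df_toy.dropna()                       # approach 1: drop rows with gaps
filled = df_toy.fillna(df_toy['sales'].mean())  # approach 2: impute with the mean
carried = df_toy.ffill()                        # approach 3: carry the last value forward
print(dropped, filled, carried, sep='\n')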
WEEK-6

Split training and testing data sets in Python using train_test_split() from scikit-learn. Explore the options of train_test_split().
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
x, y = iris.data, iris.target

# 50/50 split; random_state fixes the shuffle so the split is reproducible
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.5, random_state=42)
print("X train data :", x_train)
print("X test data :", x_test)
print("y train data :", y_train)
print("y test data :", y_test)
X train data : [[5.4 3. 4.5 1.5]
[6.2 3.4 5.4 2.3]
[5.5 2.3 4. 1.3]
[5.4 3.9 1.7 0.4]
[5. 2.3 3.3 1. ]
[6.4 2.7 5.3 1.9]
[5. 3.3 1.4 0.2]
[5. 3.2 1.2 0.2]
[5.5 2.4 3.8 1.1]
[6.7 3. 5. 1.7]
[4.9 3.1 1.5 0.2]
[5.8 2.8 5.1 2.4]
[5. 3.4 1.5 0.2]
[5. 3.5 1.6 0.6]
[5.9 3.2 4.8 1.8]
[5.1 2.5 3. 1.1]
[6.9 3.2 5.7 2.3]
[6. 2.7 5.1 1.6]
[6.1 2.6 5.6 1.4]
[7.7 3. 6.1 2.3]
[5.5 2.5 4. 1.3]
[4.4 2.9 1.4 0.2]
[4.3 3. 1.1 0.1]
[6. 2.2 5. 1.5]
[7.2 3.2 6. 1.8]
[4.6 3.1 1.5 0.2]
[5.1 3.5 1.4 0.3]
[6.3 3.3 6. 2.5]
[4.7 3.2 1.3 0.2]
[7.1 3. 5.9 2.1]]
X test data : [[6.1 2.8 4.7 1.2]
[5.7 3.8 1.7 0.3]
[7.7 2.6 6.9 2.3]
[6. 2.9 4.5 1.5]
[6.8 2.8 4.8 1.4]
[5.4 3.4 1.5 0.4]
[5.6 2.9 3.6 1.3]
[6.9 3.1 5.1 2.3]
[6.2 2.2 4.5 1.5]
[5.8 2.7 3.9 1.2]
[6.5 3.2 5.1 2. ]
[4.8 3. 1.4 0.1]
[5.5 3.5 1.3 0.2]
[4.9 3.1 1.5 0.1]
[5.1 3.8 1.5 0.3]
[6.3 3.3 4.7 1.6]
[6.5 3. 5.8 2.2]
[5.6 2.5 3.9 1.1]
[5.7 2.8 4.5 1.3]
[6.4 2.8 5.6 2.2]
[4.7 3.2 1.6 0.2]
[6.1 3. 4.9 1.8]
[5. 3.4 1.6 0.4]
[6.4 2.8 5.6 2.1]
[7.9 3.8 6.4 2. ]
[6.7 3. 5.2 2.3]
[6.7 2.5 5.8 1.8]
[6.8 3.2 5.9 2.3]
[4.8 3. 1.4 0.3]
[4.8 3.1 1.6 0.2]
[4.6 3.6 1. 0.2]
[5.7 4.4 1.5 0.4]
[6.7 3.1 4.4 1.4]
[4.8 3.4 1.6 0.2]
[4.4 3.2 1.3 0.2]
[6.3 2.5 5. 1.9]
[6.4 3.2 4.5 1.5]
[5.2 3.5 1.5 0.2]
[5. 3.6 1.4 0.2]

y train data : [1 2 1 0 1 2 0 0 1 1 0 2 0 0 1 1 2 1 2 2 1 0 0 2 2 0 0 0 1 2 0 2 2 0 1 1 2
 1 2 0 2 1 2 1 1 1 0 1 1 0 1 2 2 0 1 2 2 0 2 0 1 2 2 1 2 1 1 2 2 0 1 2 0 1
 2]
y test data : [1 0 2 1 1 0 1 2 1 1 2 0 0 0 0 1 2 1 1 2 0 2 0 2 2 2 2 2 0 0 0 0 1 0 0 2 1
 0 0 0 2 1 1 0 0 1 2 2 1 2 1 2 1 0 2 1 0 0 0 1 2 0 0 0 1 0 1 2 0 1 2 0 2 2
 1]
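
The split above uses only test_size and random_state; a minimal sketch of some of the other train_test_split() options (stratify keeps the class proportions equal in both splits, shuffle=False would instead take the first rows as the training set):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

x, y = load_iris(return_X_y=True)

# train_size is the complement of test_size; stratify=y preserves the
# 50/50/50 class balance of iris in both the train and test sets.
x_tr, x_te, y_tr, y_te = train_test_split(
    x, y, train_size=0.8, stratify=y, shuffle=True, random_state=42)
print("train size:", len(x_tr), "test size:", len(x_te))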
WEEK-7

1. Using the Iris dataset from scikit-learn, perform data exploration, preprocessing, and splitting.
Data Exploration:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv('/content/Iris.csv')

summary = df.describe()
sample_data = df.head()
missing_values = df.isnull().sum()
print("\nsummary of the dataset:", summary)
print("\nsample data of the dataset:", sample_data)
print("\nchecking missing values in dataset:", missing_values)

sns.histplot(df['SepalLengthCm'], bins=10)
plt.title('Histplot of SepalLengthCm column')
plt.show()

sns.barplot(data=df, x='SepalLengthCm')
plt.title('Bar plot of SepalLengthCm column')
plt.show()

OUTPUT :

summary of the dataset:
                Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm
count   150.000000     150.000000    150.000000     150.000000    150.000000
mean     75.500000       5.843333      3.054000       3.758667      1.198667
std      43.445368       0.828066      0.433594       1.764420      0.763161
min       1.000000       4.300000      2.000000       1.000000      0.100000
25%      38.250000       5.100000      2.800000       1.600000      0.300000
50%      75.500000       5.800000      3.000000       4.350000      1.300000
75%     112.750000       6.400000      3.300000       5.100000      1.800000
max     150.000000       7.900000      4.400000       6.900000      2.500000

sample data of the dataset:
   Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm      Species
0   1            5.1           3.5            1.4           0.2  Iris-setosa
1   2            4.9           3.0            1.4           0.2  Iris-setosa
2   3            4.7           3.2            1.3           0.2  Iris-setosa
3   4            4.6           3.1            1.5           0.2  Iris-setosa
4   5            5.0           3.6            1.4           0.2  Iris-setosa

checking missing values in dataset:
Id               0
SepalLengthCm    0
SepalWidthCm     0
PetalLengthCm    0
PetalWidthCm     0
Species          0
dtype: int64

Data Splitting:

import pandas as pd
from sklearn.datasets import load_iris
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

iris = load_iris()
x, y = iris.data, iris.target
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.5, random_state=42)
print("x train data:", x_train)
print("x test data:", x_test)
print("y train data:", y_train)
print("y test data:", y_test)

df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
print(df)
x train data: [[5.4 3. 4.5 1.5]
[6.2 3.4 5.4 2.3]
[5.5 2.3 4. 1.3]
[5.4 3.9 1.7 0.4]
[5. 2.3 3.3 1. ]
[6.4 2.7 5.3 1.9]
[5. 3.3 1.4 0.2]
[5. 3.2 1.2 0.2]
[5.5 2.4 3.8 1.1]
[6.7 3. 5. 1.7]
[4.9 3.1 1.5 0.2]
[5.8 2.8 5.1 2.4]
[5. 3.4 1.5 0.2]
[5. 3.5 1.6 0.6]
[5.9 3.2 4.8 1.8]
[5.1 2.5 3. 1.1]
[6.9 3.2 5.7 2.3]
[6. 2.7 5.1 1.6]
[6.1 2.6 5.6 1.4]
[7.7 3. 6.1 2.3]
[5.5 2.5 4. 1.3]
[4.4 2.9 1.4 0.2]
[4.3 3. 1.1 0.1]
[6. 2.2 5. 1.5]
[7.2 3.2 6. 1.8]
[4.6 3.1 1.5 0.2]
[5.1 3.5 1.4 0.3]
[4.4 3. 1.3 0.2]
[6.3 2.5 4.9 1.5]
[6.3 3.4 5.6 2.4]
[4.6 3.4 1.4 0.3]
[6.8 3. 5.5 2.1]
[6.3 3.3 6. 2.5]
[4.7 3.2 1.3 0.2]
[6.1 2.9 4.7 1.4]
[6.5 2.8 4.6 1.5]
[6.2 2.8 4.8 1.8]
[7. 3.2 4.7 1.4]
[6.4 3.2 5.3 2.3]
[5.1 3.8 1.6 0.2]
[6.9 3.1 5.4 2.1]
[5.9 3. 4.2 1.5]
[6.5 3. 5.2 2. ]
[5.7 2.6 3.5 1. ]
[5.2 2.7 3.9 1.4]
[6.1 3. 4.6 1.4]
[4.5 2.3 1.3 0.3]
[6.6 2.9 4.6 1.3]
[5.5 2.6 4.4 1.2]
[5.3 3.7 1.5 0.2]
[5.6 3. 4.1 1.3]
[7.3 2.9 6.3 1.8]
[6.7 3.3 5.7 2.1]
[5.1 3.7 1.5 0.4]
[4.9 2.4 3.3 1. ]
[6.7 3.3 5.7 2.5]
[7.2 3. 5.8 1.6]
[7.1 3. 5.9 2.1]]
x test data: [[6.1 2.8 4.7 1.2]
[5.7 3.8 1.7 0.3]
[7.7 2.6 6.9 2.3]
[6. 2.9 4.5 1.5]
[6.8 2.8 4.8 1.4]
[5.4 3.4 1.5 0.4]
[5.6 2.9 3.6 1.3]
[6.9 3.1 5.1 2.3]
[6.2 2.2 4.5 1.5]
[5.8 2.7 3.9 1.2]
[6.5 3.2 5.1 2. ]
[4.8 3. 1.4 0.1]
[5.5 3.5 1.3 0.2]
[4.9 3.1 1.5 0.1]
[5.1 3.8 1.5 0.3]
[6.3 3.3 4.7 1.6]
[6.5 3. 5.8 2.2]
[5.6 2.5 3.9 1.1]
[5.7 2.8 4.5 1.3]
[6.4 2.8 5.6 2.2]
[4.7 3.2 1.6 0.2]
[6.1 3. 4.9 1.8]
[5. 3.4 1.6 0.4]
[6.4 2.8 5.6 2.1]
[7.9 3.8 6.4 2. ]
[6.7 3. 5.2 2.3]
[6.7 2.5 5.8 1.8]
[6.8 3.2 5.9 2.3]
[4.8 3. 1.4 0.3]
[4.8 3.1 1.6 0.2]
[4.6 3.6 1. 0.2]
[5.7 4.4 1.5 0.4]
[6.7 3.1 4.4 1.4]
[4.8 3.4 1.6 0.2]
[4.4 3.2 1.3 0.2]
[6.3 2.5 5. 1.9]
[6.4 3.2 4.5 1.5]
[5.2 3.5 1.5 0.2]
[5. 3.6 1.4 0.2]
[5.2 4.1 1.5 0.1]
[5.8 2.7 5.1 1.9]
[6. 3.4 4.5 1.6]
[6.7 3.1 4.7 1.5]
[5.4 3.9 1.3 0.4]
[5.4 3.7 1.5 0.2]
[5.5 2.4 3.7 1. ]
[6.3 2.8 5.1 1.5]
[6.4 3.1 5.5 1.8]

y train data: [1 2 1 0 1 2 0 0 1 1 0 2 0 0 1 1 2 1 2 2 1 0 0 2 2 0 0 0 1 2 0 2 2 0 1 1 2
 1 2 0 2 1 2 1 1 1 0 1 1 0 1 2 2 0 1 2 2 0 2 0 1 2 2 1 2 1 1 2 2 0 1 2 0 1
 2]
y test data: [1 0 2 1 1 0 1 2 1 1 2 0 0 0 0 1 2 1 1 2 0 2 0 2 2 2 2 2 0 0 0 0 1 0 0 2 1
 0 0 0 2 1 1 0 0 1 2 2 1 2 1 2 1 0 2 1 0 0 0 1 2 0 0 0 1 0 1 2 0 1 2 0 2 2
 1]
sepal length (cm) sepal width (cm) petal length (cm) petal width (cm)
0 5.1 3.5 1.4 0.2
1 4.9 3.0 1.4 0.2
2 4.7 3.2 1.3 0.2
3 4.6 3.1 1.5 0.2
4 5.0 3.6 1.4 0.2
.. ... ... ... ...
145 6.7 3.0 5.2 2.3
146 6.3 2.5 5.0 1.9
147 6.5 3.0 5.2 2.0
148 6.2 3.4 5.4 2.3
149 5.9 3.0 5.1 1.8

[150 rows x 4 columns]

Data Preprocessing:

missing = df.dropna()
print("Removed the missing values:", missing)

# Pairwise correlation between the four feature columns
correlation = df.corr()
print("checking the correlation of dataset:", correlation)
sns.heatmap(df.corr())
plt.title('Heatmap of correlation of dataset')
plt.show()

# Drop one feature column
x = df.drop(columns=['sepal length (cm)'])
print("Removed the sepal length (cm) column:", x)

OUTPUT :
Removed the missing values:
   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
0 5.1 3.5 1.4 0.2
1 4.9 3.0 1.4 0.2
2 4.7 3.2 1.3 0.2
3 4.6 3.1 1.5 0.2
4 5.0 3.6 1.4 0.2
.. ... ... ... ...
145 6.7 3.0 5.2 2.3
146 6.3 2.5 5.0 1.9
147 6.5 3.0 5.2 2.0
148 6.2 3.4 5.4 2.3
149 5.9 3.0 5.1 1.8

[150 rows x 4 columns]


checking the correlation of dataset:
                   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
sepal length (cm)           1.000000         -0.117570           0.871754          0.817941
sepal width (cm)           -0.117570          1.000000          -0.428440         -0.366126
petal length (cm)           0.871754         -0.428440           1.000000          0.962865
petal width (cm)            0.817941         -0.366126           0.962865          1.000000
Removed the sepal length (cm) column:
     sepal width (cm)  petal length (cm)  petal width (cm)
0 3.5 1.4 0.2
1 3.0 1.4 0.2
2 3.2 1.3 0.2
3 3.1 1.5 0.2
4 3.6 1.4 0.2
.. ... ... ...
145 3.0 5.2 2.3
146 2.5 5.0 1.9
147 3.0 5.2 2.0
148 3.4 5.4 2.3
149 3.0 5.1 1.8

[150 rows x 3 columns]
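
Scaling is another common preprocessing step; a minimal sketch, assuming the same four-column numeric df built above:

from sklearn.preprocessing import StandardScaler

# Standardize each feature to zero mean and unit variance.
scaled = StandardScaler().fit_transform(df)
print(scaled[:5])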


2. Build a decision tree-based model in Python for the Breast Cancer Wisconsin (Diagnostic) dataset from scikit-learn, or any classification dataset from UCI or Kaggle.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
import pandas as pd

breast = load_breast_cancer()
x, y = breast.data, breast.target
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=50)

clf = DecisionTreeClassifier()
clf = clf.fit(x_train, y_train)

# Predict on both splits to compare training and test accuracy
y_train_pred = clf.predict(x_train)
y_test_pred = clf.predict(x_test)
print("Accuracy of the train data:", accuracy_score(y_train, y_train_pred))
print("Accuracy of the test data :", accuracy_score(y_test, y_test_pred))

OUTPUT :

Accuracy of the train data: 1.0
Accuracy of the test data : 0.9230769230769231
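
A training accuracy of 1.0 with a lower test accuracy suggests the unpruned tree memorized the training set. A hedged variation limiting tree depth (max_depth=3 is an illustrative choice, not a tuned value):

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

x, y = load_breast_cancer(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=50)

# max_depth caps how far the tree can grow, trading a little training
# accuracy for better generalization.
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(x_train, y_train)
print("Train accuracy:", accuracy_score(y_train, clf.predict(x_train)))
print("Test accuracy:", accuracy_score(y_test, clf.predict(x_test)))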
WEEK-8

1. Build a logistic regression model in Python.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.datasets import load_iris

# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create the Logistic Regression model
model = LogisticRegression()
model.fit(X_train, y_train)

# Make predictions and evaluate the model
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)
print("Accuracy:", accuracy)
print("Classification Report:\n", report)
Output:
Accuracy: 1.0
Classification Report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30
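
The perfect score above comes from a single 80/20 split; a hedged sketch using 5-fold cross-validation for a steadier estimate (max_iter=200 is an assumption to ensure the solver converges):

from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris

# Each fold trains on 4/5 of the data and tests on the remaining 1/5,
# so one "easy" test set cannot dominate the result.
X, y = load_iris(return_X_y=True)
scores = cross_val_score(LogisticRegression(max_iter=200), X, y, cv=5)
print("CV accuracies:", scores, "mean:", scores.mean())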

2. SVM classification on the fish dataset:

import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv('/content/fish.csv')
x = df[['LIVE_BAIT', 'CAMPER', 'PERSONS', 'CHILDREN']]
y = df['FISH_COUNT']
print("Shape of the dataset :", df.shape)
print("Five columns of dataset :\n", df.head())
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=109)
Output:
Shape of the dataset : (250, 5)
Five columns of dataset :
   LIVE_BAIT  CAMPER  PERSONS  CHILDREN  FISH_COUNT
0          0       0        1         0           0
1          1       1        1         0           0
2          1       0        1         0           0
3          1       1        2         1           0
4          1       0        1         0           1
from sklearn import svm
from sklearn.metrics import accuracy_score, classification_report

# Linear-kernel SVM treating each FISH_COUNT value as a separate class
clf = svm.SVC(kernel='linear')
clf.fit(x_train, y_train)
y_pred = clf.predict(x_test)
print("Accuracy of the model is :", accuracy_score(y_test, y_pred))
print("Classification report of the model is :\n", classification_report(y_test, y_pred))
Output:
Accuracy of the model is : 0.5733333333333334
Classification report of the model is :
               precision    recall  f1-score   support

           0       0.57      1.00      0.73        43
           1       0.00      0.00      0.00         5
           2       0.00      0.00      0.00         5
           3       0.00      0.00      0.00         6
           4       0.00      0.00      0.00         1
           5       0.00      0.00      0.00         5
           6       0.00      0.00      0.00         3
           7       0.00      0.00      0.00         2
           8       0.00      0.00      0.00         1
          15       0.00      0.00      0.00         1
          21       0.00      0.00      0.00         1
          29       0.00      0.00      0.00         1
          31       0.00      0.00      0.00         1

    accuracy                           0.57        75
   macro avg       0.04      0.08      0.06        75
weighted avg       0.33      0.57      0.42        75
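
FISH_COUNT takes many rare values, which is why most per-class scores above are zero. As a hedged variation (not part of the original task), one option is to binarize the target before classifying:

import pandas as pd
from sklearn import svm
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

df = pd.read_csv('/content/fish.csv')  # same file as above
x = df[['LIVE_BAIT', 'CAMPER', 'PERSONS', 'CHILDREN']]
y = (df['FISH_COUNT'] > 0).astype(int)  # 1 = caught at least one fish

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=109)
clf = svm.SVC(kernel='linear').fit(x_train, y_train)
print("Binary accuracy:", accuracy_score(y_test, clf.predict(x_test)))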
WEEK-9

1. Python implementation of the K-Means clustering algorithm.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Load the dataset
data = pd.read_csv('/content/fish.csv')
x = data.iloc[:, [3, 4]].values  # CHILDREN and FISH_COUNT columns

# Determine the optimal number of clusters using the elbow method
wc = []
for i in range(1, 11):
    kmeans = KMeans(n_clusters=i, init='k-means++', random_state=42)
    kmeans.fit(x)
    wc.append(kmeans.inertia_)

# Plot the elbow method graph
plt.plot(range(1, 11), wc)
plt.title("The Elbow Method Graph")
plt.xlabel("Number of Clusters")
plt.ylabel("WCSS (Within-Cluster Sum of Squares)")
plt.show()

# Perform K-means clustering with 5 clusters
num_clusters = 5
kmeans = KMeans(n_clusters=num_clusters, init='k-means++', random_state=42)
y_predict = kmeans.fit_predict(x)

# Visualize the clusters
for cluster_num in range(num_clusters):
    plt.scatter(x[y_predict == cluster_num, 0], x[y_predict == cluster_num, 1],
                s=100, label=f'Cluster {cluster_num + 1}')
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s=300,
            c='yellow', label='Centroids')
plt.title('Clusters of fish.csv records (5 clusters)')
plt.xlabel('CHILDREN')
plt.ylabel('FISH_COUNT')
plt.legend()
plt.show()
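
As a complement to the elbow plot, the silhouette score (range -1 to 1, higher is better) can also guide the choice of k; a minimal sketch continuing with the same feature matrix x built above:

from sklearn.metrics import silhouette_score
from sklearn.cluster import KMeans

# Fit K-means for several k values and report how well-separated the
# resulting clusters are.
for k in range(2, 7):
    labels = KMeans(n_clusters=k, init='k-means++', random_state=42).fit_predict(x)
    print(k, silhouette_score(x, labels))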
