Week 1 To Week 9


WEEK-1

Create a repository named "mini project-1" and push it to GitHub.

Shreyas@DESKTOP-4UBQVST MINGW64 ~ (master)
$ cd desktop

Shreyas@DESKTOP-4UBQVST MINGW64 ~/desktop (master)
$ git config --global user.name shreyas55555

Shreyas@DESKTOP-4UBQVST MINGW64 ~/desktop (master)
$ git config --global user.email [email protected]

Shreyas@DESKTOP-4UBQVST MINGW64 ~/desktop (master)
$ git init
Reinitialized existing Git repository in C:/Users/Shreyas/Desktop/.git/

Shreyas@DESKTOP-4UBQVST MINGW64 ~/desktop (master)
$ git clone https://github.com/Shreyas55555/kaggle.git
Cloning into 'kaggle'...
warning: You appear to have cloned an empty repository.

Shreyas@DESKTOP-4UBQVST MINGW64 ~/desktop (master)
$ cd kaggle

Shreyas@DESKTOP-4UBQVST MINGW64 ~/desktop/kaggle (main)
$ git status
On branch main

No commits yet

nothing to commit (create/copy files and use "git add" to track)

Shreyas@DESKTOP-4UBQVST MINGW64 ~/desktop/kaggle (main)
$ touch shreyasb

Shreyas@DESKTOP-4UBQVST MINGW64 ~/desktop/kaggle (main)
$ git add -A

Shreyas@DESKTOP-4UBQVST MINGW64 ~/desktop/kaggle (main)
$ git commmit -m "a"
git: 'commmit' is not a git command. See 'git --help'.

The most similar command is
        commit

Shreyas@DESKTOP-4UBQVST MINGW64 ~/desktop/kaggle (main)
$ git commit -m "a"
[main (root-commit) 570c72d] a
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 shreyasb

Shreyas@DESKTOP-4UBQVST MINGW64 ~/desktop/kaggle (main)
$ git status
On branch main
Your branch is based on 'origin/main', but the upstream is gone.
  (use "git branch --unset-upstream" to fixup)

nothing to commit, working tree clean

Shreyas@DESKTOP-4UBQVST MINGW64 ~/desktop/kaggle (main)
$ git checkout -b sshreyas
Switched to a new branch 'sshreyas'

Shreyas@DESKTOP-4UBQVST MINGW64 ~/desktop/kaggle (sshreyas)
$ git add ggg
fatal: pathspec 'ggg' did not match any files

Shreyas@DESKTOP-4UBQVST MINGW64 ~/desktop/kaggle (sshreyas)
$ touch ggg

Shreyas@DESKTOP-4UBQVST MINGW64 ~/desktop/kaggle (sshreyas)
$ git add ggg

Shreyas@DESKTOP-4UBQVST MINGW64 ~/desktop/kaggle (sshreyas)
$ git commit -m "dd"
[sshreyas 2360687] dd
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 ggg

Shreyas@DESKTOP-4UBQVST MINGW64 ~/desktop/kaggle (sshreyas)
$ git push origin sshreyas
Enumerating objects: 5, done.
Counting objects: 100% (5/5), done.
Delta compression using up to 8 threads
Compressing objects: 100% (3/3), done.
Writing objects: 100% (5/5), 407 bytes | 407.00 KiB/s, done.
Total 5 (delta 0), reused 0 (delta 0), pack-reused 0 (from 0)
To https://github.com/Shreyas55555/kaggle.git
 * [new branch]      sshreyas -> sshreyas

Shreyas@DESKTOP-4UBQVST MINGW64 ~/desktop/kaggle (sshreyas)
$ echo "# kagge1" >> ggg

Shreyas@DESKTOP-4UBQVST MINGW64 ~/desktop/kaggle (sshreyas)
$ git add ggg
warning: in the working copy of 'ggg', LF will be replaced by CRLF the next time Git touches it

Shreyas@DESKTOP-4UBQVST MINGW64 ~/desktop/kaggle (sshreyas)
$ git commit -m "g"
[sshreyas cbcd087] g
 1 file changed, 1 insertion(+)

Shreyas@DESKTOP-4UBQVST MINGW64 ~/desktop/kaggle (sshreyas)
$ git push origin sshreyas
Enumerating objects: 5, done.
Counting objects: 100% (5/5), done.
Delta compression using up to 8 threads
Compressing objects: 100% (2/2), done.
Writing objects: 100% (3/3), 271 bytes | 271.00 KiB/s, done.
Total 3 (delta 0), reused 0 (delta 0), pack-reused 0 (from 0)
To https://github.com/Shreyas55555/kaggle.git
   2360687..cbcd087  sshreyas -> sshreyas
WEEK-2

Python recap: database connectivity.

Code:
import pymysql
from tkinter import messagebox
import tkinter as tk

def submit():
    name = name_entry.get()
    reg_no = reg_no_entry.get()
    pathway = pathway_entry.get()

    if name and reg_no and pathway:
        try:
            db_config = {
                "host": "localhost",
                "user": "root",
                "password": "bvvs",
                "database": "CS"
            }
            conn = pymysql.connect(**db_config)
            cursor = conn.cursor()
            query = "INSERT INTO students (name, reg_no, pathway) VALUES (%s, %s, %s)"
            values = (name, reg_no, pathway)
            cursor.execute(query, values)
            conn.commit()
            cursor.close()
            conn.close()
            messagebox.showinfo("Success", "Data submitted successfully!")
            clear_fields()
        except pymysql.MySQLError as err:
            messagebox.showerror("Error", f"Database error: {err}")
        except Exception as e:
            messagebox.showerror("Error", f"Unexpected error: {e}")
    else:
        messagebox.showwarning("Warning", "Please fill in all fields.")

def clear_fields():
    name_entry.delete(0, tk.END)
    reg_no_entry.delete(0, tk.END)
    pathway_entry.delete(0, tk.END)

# Set up the main window
root = tk.Tk()
root.title("Database GUI")

# Define the layout
labels_text = ["Name:", "Registration:", "Pathway:"]
entries = []

for i, text in enumerate(labels_text):
    tk.Label(root, text=text).grid(row=i, column=0, padx=10, pady=10, sticky=tk.W)
    entry = tk.Entry(root)
    entry.grid(row=i, column=1, padx=10, pady=10)
    entries.append(entry)

name_entry, reg_no_entry, pathway_entry = entries

submit_button = tk.Button(root, text="Submit", command=submit)
clear_button = tk.Button(root, text="Clear", command=clear_fields)

submit_button.grid(row=len(labels_text), column=0, columnspan=2, pady=10, sticky=tk.W+tk.E)
clear_button.grid(row=len(labels_text)+1, column=0, columnspan=2, pady=10, sticky=tk.W+tk.E)

root.mainloop()

MySQL:

mysql> create database CS;
Query OK, 1 row affected (0.04 sec)

mysql> use CS;
Database changed

mysql> create table students(name varchar(20), reg_no int(20), pathway varchar(20));
Query OK, 0 rows affected, 1 warning (0.08 sec)

Output:

mysql> select * from students;
+-----------------+--------+---------+
| name            | reg_no | pathway |
+-----------------+--------+---------+
| shreyas bhagoji |   2246 | AIML    |
+-----------------+--------+---------+
1 row in set (0.01 sec)
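
To confirm the insert from Python rather than the mysql client, a minimal read-back sketch, assuming the same CS database and credentials used above:

import pymysql

# Connect with the same credentials as the GUI script above.
conn = pymysql.connect(host="localhost", user="root", password="bvvs", database="CS")
with conn.cursor() as cursor:
    # Fetch every row inserted through the form and print it.
    cursor.execute("SELECT name, reg_no, pathway FROM students")
    for row in cursor.fetchall():
        print(row)
conn.close()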


WEEK-3

Consider a dataset and infer the relations between its parameters with the help of different plots.

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("/content/data.csv")
df.head()

# Overlaid histograms of the two columns
plt.hist(df['FastCharge_KmH'], bins=10)
plt.hist(df['TopSpeed_KmH'], bins=10)
plt.show()

# Scatter plot: relation between top speed and fast-charge rate
plt.scatter(x='TopSpeed_KmH', y='FastCharge_KmH', data=df)
plt.xlabel('TopSpeed_KmH')
plt.ylabel('FastCharge_KmH')
plt.show()

# Box plot: spread and outliers of FastCharge_KmH
plt.boxplot(df['FastCharge_KmH'])
plt.show()

# Bar chart of fast-charge rate against top speed
plt.bar(df['TopSpeed_KmH'], df['FastCharge_KmH'], width=5)
plt.show()

plt.scatter(x='TopSpeed_KmH', y='FastCharge_KmH', data=df)
plt.xlabel('TopSpeed_KmH')
plt.ylabel('FastCharge_KmH')
plt.show()

plt.hist(df['TopSpeed_KmH'], bins=10)
plt.show()
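
To put a number on the relation the scatter plot suggests, a minimal sketch continuing the same script (assuming both columns of data.csv are numeric, as the plots above imply):

# Pearson correlation: values near +1 or -1 indicate a strong linear
# relation between the two columns, values near 0 a weak one.
corr = df['TopSpeed_KmH'].corr(df['FastCharge_KmH'])
print("Correlation between TopSpeed_KmH and FastCharge_KmH:", corr)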
WEEK-4

Use relevant Python packages to compute central tendency and dispersion for the parameters.
import pandas as pd
from scipy import stats
import matplotlib.pyplot as plt
import numpy as np

data = pd.read_csv("/content/Copy of ev_sales_year1.csv")
data.head()

# Central tendency: mean, median, and mode (each on a different column)
mean = data['HEAVY GOODS VEHICLE'].mean()
median = data['HEAVY PASSENGER VEHICLE'].median()
mode = stats.mode(data['LIGHT GOODS VEHICLE'])
print("central tendency")
print(mean)
print(median)
print(mode)

Output:
central tendency
1.6666666666666667
0.0
ModeResult(mode=5, count=3)

# Dispersion: range, variance, standard deviation, and interquartile range
data_range = np.ptp(data['HEAVY GOODS VEHICLE'])
variance = np.var(data['HEAVY PASSENGER VEHICLE'])
std_deviation = np.std(data['LIGHT GOODS VEHICLE'])
iqr = np.percentile(data['LIGHT GOODS VEHICLE'], 75) - np.percentile(data['LIGHT GOODS VEHICLE'], 25)
print("dispersion")
print("data_range", data_range)
print("variance", variance)
print("std_deviation", std_deviation)
print("iqr", iqr)

Output:
dispersion
data_range 3
variance 0.07638888888888888
std_deviation 5.97854961972848
iqr 2.5
import seaborn as sns

plt.figure(figsize=(15, 5))
plt.subplot(131)
sns.histplot(data['HEAVY GOODS VEHICLE'], kde=True)
plt.title("Histogram of HEAVY GOODS VEHICLE")
plt.subplot(132)
sns.boxplot(x='HEAVY GOODS VEHICLE', data=data)
plt.title("Box plot of HEAVY GOODS VEHICLE")
plt.show()
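
The same statistics can also be produced for every numeric column at once; a minimal sketch continuing with the same data frame loaded above:

# One table of central tendency and dispersion for all numeric columns.
stats_table = data.select_dtypes('number').agg(['mean', 'median', 'std', 'var', 'min', 'max'])
print(stats_table)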
WEEK-5

Dealing with missing values with different approaches; detecting outliers.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("/content/Copy of ev_sales_year1.csv")
df.head()

# Count missing values per column
missing_values = df.isnull().sum()
print("Missing values:\n", missing_values)
Output:
Missing values:
Date                          0
HEAVY GOODS VEHICLE           0
HEAVY MOTOR VEHICLE           0
HEAVY PASSENGER VEHICLE       0
LIGHT GOODS VEHICLE           0
LIGHT MOTOR VEHICLE           0
LIGHT PASSENGER VEHICLE       0
MEDIUM MOTOR VEHICLE          0
OTHER THAN MENTIONED ABOVE    0
THREE WHEELER(NT)             0
THREE WHEELER(T)              0
TWO WHEELER(NT)               0
TWO WHEELER(T)                0
dtype: int64

from scipy import stats
import pandas as pd

df = pd.read_csv('/content/Copy of ev_sales_year1.csv')
df.describe()

# Flag values whose z-score magnitude exceeds the threshold as outliers
zscore = stats.zscore(df['HEAVY GOODS VEHICLE'])
threshold = 2
outlier = abs(zscore) > threshold
print(outlier)
Output:
0     False
1     False
2     False
3     False
4     False
5     False
6     False
7     False
8     False
9     False
10    False
11    False
Name: HEAVY GOODS VEHICLE, dtype: bool

import matplotlib.pyplot as plt
import pandas as pd

# Small hand-made frame with one obvious outlier (PassengerId 1000)
data = {'PassengerId': [101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 1000],
        'Pclass': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]}
df = pd.DataFrame(data)

threshold = 2
z_scores = (df['PassengerId'] - df['PassengerId'].mean()) / df['PassengerId'].std()
outliers = df[(z_scores > threshold) | (z_scores < -threshold)]

plt.figure(figsize=(8, 6))
plt.plot(df.index, df['PassengerId'], 'bo', label='Data points')
plt.plot(outliers.index, outliers['PassengerId'], 'ro', label='Outliers')
plt.xlabel("Index")
plt.ylabel("PassengerId")
plt.legend()
plt.show()

plt.scatter(x='Pclass', y='PassengerId', data=df)
plt.xlabel("Pclass")
plt.ylabel("PassengerId")
plt.show()
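
The EV dataset above happens to have no gaps, so as a hedged illustration of the different approaches the task asks for, a toy frame with one missing entry (hypothetical data):

import pandas as pd
import numpy as np

df_toy = pd.DataFrame({'sales': [10, np.nan, 30, 40]})

dropped = df_toy.dropna()                       # approach 1: drop rows with gaps
filled = df_toy.fillna(df_toy['sales'].mean())  # approach 2: impute with the mean
carried = df_toy.ffill()                        # approach 3: carry the last value forward
print(dropped, filled, carried, sep='\n')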
WEEK-6

Split training and testing data sets in Python using train_test_split() from scikit-learn. Explore the options of train_test_split().
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
x, y = iris.data, iris.target

# 50/50 split; random_state fixes the shuffle so the split is reproducible
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.5, random_state=42)
print("X train data :", x_train)
print("X test data :", x_test)
print("y train data :", y_train)
print("y test data :", y_test)
X train data : [[5.4 3. 4.5 1.5]
[6.2 3.4 5.4 2.3]
[5.5 2.3 4. 1.3]
[5.4 3.9 1.7 0.4]
[5. 2.3 3.3 1. ]
[6.4 2.7 5.3 1.9]
[5. 3.3 1.4 0.2]
[5. 3.2 1.2 0.2]
[5.5 2.4 3.8 1.1]
[6.7 3. 5. 1.7]
[4.9 3.1 1.5 0.2]
[5.8 2.8 5.1 2.4]
[5. 3.4 1.5 0.2]
[5. 3.5 1.6 0.6]
[5.9 3.2 4.8 1.8]
[5.1 2.5 3. 1.1]
[6.9 3.2 5.7 2.3]
[6. 2.7 5.1 1.6]
[6.1 2.6 5.6 1.4]
[7.7 3. 6.1 2.3]
[5.5 2.5 4. 1.3]
[4.4 2.9 1.4 0.2]
[4.3 3. 1.1 0.1]
[6. 2.2 5. 1.5]
[7.2 3.2 6. 1.8]
[4.6 3.1 1.5 0.2]
[5.1 3.5 1.4 0.3]
[6.3 3.3 6. 2.5]
[4.7 3.2 1.3 0.2]
[7.1 3. 5.9 2.1]]
X test data : [[6.1 2.8 4.7 1.2]
[5.7 3.8 1.7 0.3]
[7.7 2.6 6.9 2.3]
[6. 2.9 4.5 1.5]
[6.8 2.8 4.8 1.4]
[5.4 3.4 1.5 0.4]
[5.6 2.9 3.6 1.3]
[6.9 3.1 5.1 2.3]
[6.2 2.2 4.5 1.5]
[5.8 2.7 3.9 1.2]
[6.5 3.2 5.1 2. ]
[4.8 3. 1.4 0.1]
[5.5 3.5 1.3 0.2]
[4.9 3.1 1.5 0.1]
[5.1 3.8 1.5 0.3]
[6.3 3.3 4.7 1.6]
[6.5 3. 5.8 2.2]
[5.6 2.5 3.9 1.1]
[5.7 2.8 4.5 1.3]
[6.4 2.8 5.6 2.2]
[4.7 3.2 1.6 0.2]
[6.1 3. 4.9 1.8]
[5. 3.4 1.6 0.4]
[6.4 2.8 5.6 2.1]
[7.9 3.8 6.4 2. ]
[6.7 3. 5.2 2.3]
[6.7 2.5 5.8 1.8]
[6.8 3.2 5.9 2.3]
[4.8 3. 1.4 0.3]
[4.8 3.1 1.6 0.2]
[4.6 3.6 1. 0.2]
[5.7 4.4 1.5 0.4]
[6.7 3.1 4.4 1.4]
[4.8 3.4 1.6 0.2]
[4.4 3.2 1.3 0.2]
[6.3 2.5 5. 1.9]
[6.4 3.2 4.5 1.5]
[5.2 3.5 1.5 0.2]
[5. 3.6 1.4 0.2]

y train data : [1 2 1 0 1 2 0 0 1 1 0 2 0 0 1 1 2 1 2 2 1 0 0 2 2 0 0 0 1 2 0 2 2 0 1 1 2
 1 2 0 2 1 2 1 1 1 0 1 1 0 1 2 2 0 1 2 2 0 2 0 1 2 2 1 2 1 1 2 2 0 1 2 0 1
 2]
y test data : [1 0 2 1 1 0 1 2 1 1 2 0 0 0 0 1 2 1 1 2 0 2 0 2 2 2 2 2 0 0 0 0 1 0 0 2 1
 0 0 0 2 1 1 0 0 1 2 2 1 2 1 2 1 0 2 1 0 0 0 1 2 0 0 0 1 0 1 2 0 1 2 0 2 2
 1]
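
The split above uses only test_size and random_state; a minimal sketch of some of the other train_test_split() options (stratify keeps the class proportions equal in both splits, shuffle=False would instead take the first rows as the training set):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

x, y = load_iris(return_X_y=True)

# train_size is the complement of test_size; stratify=y preserves the
# 50/50/50 class balance of iris in both the train and test sets.
x_tr, x_te, y_tr, y_te = train_test_split(
    x, y, train_size=0.8, stratify=y, shuffle=True, random_state=42)
print("train size:", len(x_tr), "test size:", len(x_te))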
WEEK-7

1. Using the Iris dataset from scikit-learn, perform data exploration, preprocessing, and splitting.
Data Exploration:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv('/content/Iris.csv')

summary = df.describe()
sample_data = df.head()
missing_values = df.isnull().sum()
print("\nsummary of the dataset:", summary)
print("\nsample data of the dataset:", sample_data)
print("\nchecking missing values in dataset:", missing_values)

sns.histplot(df['SepalLengthCm'], bins=10)
plt.title('Histplot of SepalLengthCm column')
plt.show()

sns.barplot(data=df, x='SepalLengthCm')
plt.title('Bar plot of SepalLengthCm column')
plt.show()

OUTPUT :

summary of the dataset:
                Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm
count   150.000000     150.000000    150.000000     150.000000    150.000000
mean     75.500000       5.843333      3.054000       3.758667      1.198667
std      43.445368       0.828066      0.433594       1.764420      0.763161
min       1.000000       4.300000      2.000000       1.000000      0.100000
25%      38.250000       5.100000      2.800000       1.600000      0.300000
50%      75.500000       5.800000      3.000000       4.350000      1.300000
75%     112.750000       6.400000      3.300000       5.100000      1.800000
max     150.000000       7.900000      4.400000       6.900000      2.500000

sample data of the dataset:
   Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm      Species
0   1            5.1           3.5            1.4           0.2  Iris-setosa
1   2            4.9           3.0            1.4           0.2  Iris-setosa
2   3            4.7           3.2            1.3           0.2  Iris-setosa
3   4            4.6           3.1            1.5           0.2  Iris-setosa
4   5            5.0           3.6            1.4           0.2  Iris-setosa

checking missing values in dataset:
Id               0
SepalLengthCm    0
SepalWidthCm     0
PetalLengthCm    0
PetalWidthCm     0
Species          0
dtype: int64

Data Splitting:

import pandas as pd
from sklearn.datasets import load_iris
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

iris = load_iris()
x, y = iris.data, iris.target
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.5, random_state=42)
print("x train data:", x_train)
print("x test data:", x_test)
print("y train data:", y_train)
print("y test data:", y_test)

df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
print(df)
x train data: [[5.4 3. 4.5 1.5]
[6.2 3.4 5.4 2.3]
[5.5 2.3 4. 1.3]
[5.4 3.9 1.7 0.4]
[5. 2.3 3.3 1. ]
[6.4 2.7 5.3 1.9]
[5. 3.3 1.4 0.2]
[5. 3.2 1.2 0.2]
[5.5 2.4 3.8 1.1]
[6.7 3. 5. 1.7]
[4.9 3.1 1.5 0.2]
[5.8 2.8 5.1 2.4]
[5. 3.4 1.5 0.2]
[5. 3.5 1.6 0.6]
[5.9 3.2 4.8 1.8]
[5.1 2.5 3. 1.1]
[6.9 3.2 5.7 2.3]
[6. 2.7 5.1 1.6]
[6.1 2.6 5.6 1.4]
[7.7 3. 6.1 2.3]
[5.5 2.5 4. 1.3]
[4.4 2.9 1.4 0.2]
[4.3 3. 1.1 0.1]
[6. 2.2 5. 1.5]
[7.2 3.2 6. 1.8]
[4.6 3.1 1.5 0.2]
[5.1 3.5 1.4 0.3]
[4.4 3. 1.3 0.2]
[6.3 2.5 4.9 1.5]
[6.3 3.4 5.6 2.4]
[4.6 3.4 1.4 0.3]
[6.8 3. 5.5 2.1]
[6.3 3.3 6. 2.5]
[4.7 3.2 1.3 0.2]
[6.1 2.9 4.7 1.4]
[6.5 2.8 4.6 1.5]
[6.2 2.8 4.8 1.8]
[7. 3.2 4.7 1.4]
[6.4 3.2 5.3 2.3]
[5.1 3.8 1.6 0.2]
[6.9 3.1 5.4 2.1]
[5.9 3. 4.2 1.5]
[6.5 3. 5.2 2. ]
[5.7 2.6 3.5 1. ]
[5.2 2.7 3.9 1.4]
[6.1 3. 4.6 1.4]
[4.5 2.3 1.3 0.3]
[6.6 2.9 4.6 1.3]
[5.5 2.6 4.4 1.2]
[5.3 3.7 1.5 0.2]
[5.6 3. 4.1 1.3]
[7.3 2.9 6.3 1.8]
[6.7 3.3 5.7 2.1]
[5.1 3.7 1.5 0.4]
[4.9 2.4 3.3 1. ]
[6.7 3.3 5.7 2.5]
[7.2 3. 5.8 1.6]
[7.1 3. 5.9 2.1]]
x test data: [[6.1 2.8 4.7 1.2]
[5.7 3.8 1.7 0.3]
[7.7 2.6 6.9 2.3]
[6. 2.9 4.5 1.5]
[6.8 2.8 4.8 1.4]
[5.4 3.4 1.5 0.4]
[5.6 2.9 3.6 1.3]
[6.9 3.1 5.1 2.3]
[6.2 2.2 4.5 1.5]
[5.8 2.7 3.9 1.2]
[6.5 3.2 5.1 2. ]
[4.8 3. 1.4 0.1]
[5.5 3.5 1.3 0.2]
[4.9 3.1 1.5 0.1]
[5.1 3.8 1.5 0.3]
[6.3 3.3 4.7 1.6]
[6.5 3. 5.8 2.2]
[5.6 2.5 3.9 1.1]
[5.7 2.8 4.5 1.3]
[6.4 2.8 5.6 2.2]
[4.7 3.2 1.6 0.2]
[6.1 3. 4.9 1.8]
[5. 3.4 1.6 0.4]
[6.4 2.8 5.6 2.1]
[7.9 3.8 6.4 2. ]
[6.7 3. 5.2 2.3]
[6.7 2.5 5.8 1.8]
[6.8 3.2 5.9 2.3]
[4.8 3. 1.4 0.3]
[4.8 3.1 1.6 0.2]
[4.6 3.6 1. 0.2]
[5.7 4.4 1.5 0.4]
[6.7 3.1 4.4 1.4]
[4.8 3.4 1.6 0.2]
[4.4 3.2 1.3 0.2]
[6.3 2.5 5. 1.9]
[6.4 3.2 4.5 1.5]
[5.2 3.5 1.5 0.2]
[5. 3.6 1.4 0.2]
[5.2 4.1 1.5 0.1]
[5.8 2.7 5.1 1.9]
[6. 3.4 4.5 1.6]
[6.7 3.1 4.7 1.5]
[5.4 3.9 1.3 0.4]
[5.4 3.7 1.5 0.2]
[5.5 2.4 3.7 1. ]
[6.3 2.8 5.1 1.5]
[6.4 3.1 5.5 1.8]

y train data: [1 2 1 0 1 2 0 0 1 1 0 2 0 0 1 1 2 1 2 2 1 0 0 2 2 0 0 0 1 2 0 2 2 0 1 1 2
 1 2 0 2 1 2 1 1 1 0 1 1 0 1 2 2 0 1 2 2 0 2 0 1 2 2 1 2 1 1 2 2 0 1 2 0 1
 2]
y test data: [1 0 2 1 1 0 1 2 1 1 2 0 0 0 0 1 2 1 1 2 0 2 0 2 2 2 2 2 0 0 0 0 1 0 0 2 1
 0 0 0 2 1 1 0 0 1 2 2 1 2 1 2 1 0 2 1 0 0 0 1 2 0 0 0 1 0 1 2 0 1 2 0 2 2
 1]
sepal length (cm) sepal width (cm) petal length (cm) petal width (cm)
0 5.1 3.5 1.4 0.2
1 4.9 3.0 1.4 0.2
2 4.7 3.2 1.3 0.2
3 4.6 3.1 1.5 0.2
4 5.0 3.6 1.4 0.2
.. ... ... ... ...
145 6.7 3.0 5.2 2.3
146 6.3 2.5 5.0 1.9
147 6.5 3.0 5.2 2.0
148 6.2 3.4 5.4 2.3
149 5.9 3.0 5.1 1.8

[150 rows x 4 columns]

Data Preprocessing:

missing = df.dropna()
print("Removed the missing values:", missing)

# Pairwise correlation between the four feature columns
correlation = df.corr()
print("checking the correlation of dataset:", correlation)
sns.heatmap(df.corr())
plt.title('Heatmap of correlation of dataset')
plt.show()

# Drop one feature column
x = df.drop(columns=['sepal length (cm)'])
print("Removed the sepal length (cm) column:", x)

OUTPUT :
Removed the missing values:
   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
0 5.1 3.5 1.4 0.2
1 4.9 3.0 1.4 0.2
2 4.7 3.2 1.3 0.2
3 4.6 3.1 1.5 0.2
4 5.0 3.6 1.4 0.2
.. ... ... ... ...
145 6.7 3.0 5.2 2.3
146 6.3 2.5 5.0 1.9
147 6.5 3.0 5.2 2.0
148 6.2 3.4 5.4 2.3
149 5.9 3.0 5.1 1.8

[150 rows x 4 columns]


checking the correlation of dataset:
                   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
sepal length (cm)           1.000000         -0.117570           0.871754          0.817941
sepal width (cm)           -0.117570          1.000000          -0.428440         -0.366126
petal length (cm)           0.871754         -0.428440           1.000000          0.962865
petal width (cm)            0.817941         -0.366126           0.962865          1.000000
Removed the sepal length (cm) column:
     sepal width (cm)  petal length (cm)  petal width (cm)
0 3.5 1.4 0.2
1 3.0 1.4 0.2
2 3.2 1.3 0.2
3 3.1 1.5 0.2
4 3.6 1.4 0.2
.. ... ... ...
145 3.0 5.2 2.3
146 2.5 5.0 1.9
147 3.0 5.2 2.0
148 3.4 5.4 2.3
149 3.0 5.1 1.8

[150 rows x 3 columns]
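
Scaling is another common preprocessing step; a minimal sketch, assuming the same four-column numeric df built above:

from sklearn.preprocessing import StandardScaler

# Standardize each feature to zero mean and unit variance.
scaled = StandardScaler().fit_transform(df)
print(scaled[:5])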


2. Build a decision tree-based model in Python for the Breast Cancer Wisconsin (Diagnostic) dataset from scikit-learn, or any classification dataset from UCI or Kaggle.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
import pandas as pd

breast = load_breast_cancer()
x, y = breast.data, breast.target
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=50)

clf = DecisionTreeClassifier()
clf = clf.fit(x_train, y_train)

# Predict on both splits to compare training and test accuracy
y_train_pred = clf.predict(x_train)
y_test_pred = clf.predict(x_test)
print("Accuracy of the train data:", accuracy_score(y_train, y_train_pred))
print("Accuracy of the test data :", accuracy_score(y_test, y_test_pred))

OUTPUT :

Accuracy of the train data: 1.0
Accuracy of the test data : 0.9230769230769231
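
A training accuracy of 1.0 with a lower test accuracy suggests the unpruned tree memorized the training set. A hedged variation limiting tree depth (max_depth=3 is an illustrative choice, not a tuned value):

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

x, y = load_breast_cancer(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=50)

# max_depth caps how far the tree can grow, trading a little training
# accuracy for better generalization.
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(x_train, y_train)
print("Train accuracy:", accuracy_score(y_train, clf.predict(x_train)))
print("Test accuracy:", accuracy_score(y_test, clf.predict(x_test)))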
WEEK-8

1. Build a logistic regression model in Python.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.datasets import load_iris

# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create the Logistic Regression model
model = LogisticRegression()
model.fit(X_train, y_train)

# Make predictions and evaluate the model
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)
print("Accuracy:", accuracy)
print("Classification Report:\n", report)
Output:
Accuracy: 1.0
Classification Report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30
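
The perfect score above comes from a single 80/20 split; a hedged sketch using 5-fold cross-validation for a steadier estimate (max_iter=200 is an assumption to ensure the solver converges):

from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris

# Each fold trains on 4/5 of the data and tests on the remaining 1/5,
# so one "easy" test set cannot dominate the result.
X, y = load_iris(return_X_y=True)
scores = cross_val_score(LogisticRegression(max_iter=200), X, y, cv=5)
print("CV accuracies:", scores, "mean:", scores.mean())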

2. SVM classification on the fish dataset:

import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv('/content/fish.csv')
x = df[['LIVE_BAIT', 'CAMPER', 'PERSONS', 'CHILDREN']]
y = df['FISH_COUNT']
print("Shape of the dataset :", df.shape)
print("Five columns of dataset :\n", df.head())
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=109)
Output:
Shape of the dataset : (250, 5)
Five columns of dataset :
   LIVE_BAIT  CAMPER  PERSONS  CHILDREN  FISH_COUNT
0          0       0        1         0           0
1          1       1        1         0           0
2          1       0        1         0           0
3          1       1        2         1           0
4          1       0        1         0           1
from sklearn import svm
from sklearn.metrics import accuracy_score, classification_report

# Linear-kernel SVM treating each FISH_COUNT value as a separate class
clf = svm.SVC(kernel='linear')
clf.fit(x_train, y_train)
y_pred = clf.predict(x_test)
print("Accuracy of the model is :", accuracy_score(y_test, y_pred))
print("Classification report of the model is :\n", classification_report(y_test, y_pred))
Output:
Accuracy of the model is : 0.5733333333333334
Classification report of the model is :
               precision    recall  f1-score   support

           0       0.57      1.00      0.73        43
           1       0.00      0.00      0.00         5
           2       0.00      0.00      0.00         5
           3       0.00      0.00      0.00         6
           4       0.00      0.00      0.00         1
           5       0.00      0.00      0.00         5
           6       0.00      0.00      0.00         3
           7       0.00      0.00      0.00         2
           8       0.00      0.00      0.00         1
          15       0.00      0.00      0.00         1
          21       0.00      0.00      0.00         1
          29       0.00      0.00      0.00         1
          31       0.00      0.00      0.00         1

    accuracy                           0.57        75
   macro avg       0.04      0.08      0.06        75
weighted avg       0.33      0.57      0.42        75
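
FISH_COUNT takes many rare values, which is why most per-class scores above are zero. As a hedged variation (not part of the original task), one option is to binarize the target before classifying:

import pandas as pd
from sklearn import svm
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

df = pd.read_csv('/content/fish.csv')  # same file as above
x = df[['LIVE_BAIT', 'CAMPER', 'PERSONS', 'CHILDREN']]
y = (df['FISH_COUNT'] > 0).astype(int)  # 1 = caught at least one fish

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=109)
clf = svm.SVC(kernel='linear').fit(x_train, y_train)
print("Binary accuracy:", accuracy_score(y_test, clf.predict(x_test)))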
WEEK-9

1. Python implementation of the K-Means clustering algorithm.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Load the dataset
data = pd.read_csv('/content/fish.csv')
x = data.iloc[:, [3, 4]].values  # CHILDREN and FISH_COUNT columns

# Determine the optimal number of clusters using the elbow method
wc = []
for i in range(1, 11):
    kmeans = KMeans(n_clusters=i, init='k-means++', random_state=42)
    kmeans.fit(x)
    wc.append(kmeans.inertia_)

# Plot the elbow method graph
plt.plot(range(1, 11), wc)
plt.title("The Elbow Method Graph")
plt.xlabel("Number of Clusters")
plt.ylabel("WCSS (Within-Cluster Sum of Squares)")
plt.show()

# Perform K-means clustering with 5 clusters
num_clusters = 5
kmeans = KMeans(n_clusters=num_clusters, init='k-means++', random_state=42)
y_predict = kmeans.fit_predict(x)

# Visualize the clusters
for cluster_num in range(num_clusters):
    plt.scatter(x[y_predict == cluster_num, 0], x[y_predict == cluster_num, 1],
                s=100, label=f'Cluster {cluster_num + 1}')
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s=300,
            c='yellow', label='Centroids')
plt.title('Clusters of fish.csv records (5 clusters)')
plt.xlabel('CHILDREN')
plt.ylabel('FISH_COUNT')
plt.legend()
plt.show()
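
As a complement to the elbow plot, the silhouette score (range -1 to 1, higher is better) can also guide the choice of k; a minimal sketch continuing with the same feature matrix x built above:

from sklearn.metrics import silhouette_score
from sklearn.cluster import KMeans

# Fit K-means for several k values and report how well-separated the
# resulting clusters are.
for k in range(2, 7):
    labels = KMeans(n_clusters=k, init='k-means++', random_state=42).fit_predict(x)
    print(k, silhouette_score(x, labels))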
