DT 444
Experiment No.: 4
Student Name: UID: 23BAI70
Branch: Computer Science & Engineering Section/Group: B
Semester: 1 Date of Performance: 12/09/2023
Subject Name: Disruptive Technologies
Subject Code: 23ECH-102
2. Tool Used:
Google Colaboratory, laptop, Microsoft Word
CLASSIFICATION MODEL
• Step 1: Obtain a large amount of correctly labeled data. This means a large
dataset in which, for each observation, we know its "type", "class", or "category".
• Step 2: Once the data is prepared, select one or more classification algorithms
and apply them to (typically) the train/development dataset.
• Step 3: Tune the hyper-parameters of these classification algorithms and select
the algorithm (and its hyper-parameters) that gives the best result.
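The three steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the data is synthetic (two Gaussian clusters, not the diabetes data), the classifier is a hand-written k-nearest-neighbours rule chosen for brevity, and all names (make_labelled_data, knn_predict, candidate_ks) are hypothetical. In Colab one would normally use a library classifier instead.

```python
import math
import random
from collections import Counter


def make_labelled_data(n_per_class=60, seed=0):
    """Step 1: build a small synthetic labelled dataset (two 2-D clusters)."""
    rng = random.Random(seed)
    data = []
    for label, centre in ((0, (0.0, 0.0)), (1, (3.0, 3.0))):
        for _ in range(n_per_class):
            point = (rng.gauss(centre[0], 1.0), rng.gauss(centre[1], 1.0))
            data.append((point, label))
    rng.shuffle(data)
    return data


def knn_predict(train, point, k):
    """Return the majority label among the k training points nearest to point."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, evaluation, k):
    """Fraction of evaluation points whose predicted label is correct."""
    correct = sum(knn_predict(train, x, k) == y for x, y in evaluation)
    return correct / len(evaluation)


data = make_labelled_data()
n = len(data)

# Step 2: split into train / development / test sets (60% / 20% / 20%).
train = data[: n * 6 // 10]
dev = data[n * 6 // 10 : n * 8 // 10]
test = data[n * 8 // 10 :]

# Step 3: tune the hyper-parameter k on the development set.
# Odd values avoid tied votes with two classes; ties in score keep the smallest k.
candidate_ks = [1, 3, 5, 7, 9]
dev_scores = {k: accuracy(train, dev, k) for k in candidate_ks}
best_k = max(candidate_ks, key=lambda k: dev_scores[k])

# Report the final result on data that was not used for tuning.
test_accuracy = accuracy(train, test, best_k)
print("best k:", best_k, "test accuracy:", round(test_accuracy, 3))
```

The test set is held back until the end so that the reported accuracy is not biased by the hyper-parameter search.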
DIABETES DATASET
• The dataset used for this project is the Pima Indians Diabetes Dataset
from Kaggle.
• This dataset is used to predict whether a patient is likely to develop
diabetes based on input parameters such as Age, Glucose, Blood
Pressure, Insulin, BMI, etc.
• Each row in the data provides relevant information about one
patient.
• Note that all patients here are female, at least 21 years old, and of
Pima Indian heritage.
FEATURES OF DATASET
The dataset contains records for 768 individuals with 9 columns (8 input
features and the Outcome label). A detailed description of each is as follows:
• Pregnancies: indicates the number of pregnancies
• Glucose: indicates the plasma glucose concentration
• Blood Pressure: indicates diastolic blood pressure in mm Hg
• Skin Thickness: indicates triceps skinfold thickness in mm
• Insulin: indicates 2-hour serum insulin in µU/mL
• BMI: indicates the body mass index in kg/m²
• Diabetes Pedigree Function: indicates the function which scores likelihood
of diabetes based on family history
• Age: indicates the age of the person
• Outcome: indicates whether the patient has diabetes or not (1 = yes, 0 = no)
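As a rough sketch of how the nine columns listed above might be read into typed records, the snippet below parses CSV text with the standard library. The header names are an assumption about how the CSV file labels its columns, the two data rows are made-up illustrative values (not taken from the dataset), and PatientRecord and parse_records are hypothetical names.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class PatientRecord:
    pregnancies: int
    glucose: float
    blood_pressure: float              # diastolic, mm Hg
    skin_thickness: float              # triceps skinfold, mm
    insulin: float
    bmi: float                         # kg/m^2
    diabetes_pedigree_function: float
    age: int
    outcome: int                       # 1 = diabetes, 0 = no diabetes


# Assumed header names; the two rows are illustrative values only.
SAMPLE_CSV = """Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
2,120,70,30,90,28.5,0.45,35,0
5,160,80,0,0,33.1,0.62,48,1
"""


def parse_records(text):
    """Convert CSV text into a list of PatientRecord objects."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        records.append(
            PatientRecord(
                pregnancies=int(row["Pregnancies"]),
                glucose=float(row["Glucose"]),
                blood_pressure=float(row["BloodPressure"]),
                skin_thickness=float(row["SkinThickness"]),
                insulin=float(row["Insulin"]),
                bmi=float(row["BMI"]),
                diabetes_pedigree_function=float(row["DiabetesPedigreeFunction"]),
                age=int(row["Age"]),
                outcome=int(row["Outcome"]),
            )
        )
    return records


records = parse_records(SAMPLE_CSV)
positive_cases = sum(r.outcome for r in records)
print(len(records), "records,", positive_cases, "with Outcome = 1")
```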
EXPLORATORY DATA ANALYSIS
• Missing value treatment is a crucial step, not only in the model-building process
but in any data analysis that is used for making decisions.
• Missing values that are not properly handled can lead to poor decisions, which in
turn can lead to severe business losses.
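One possible way to treat missing values is median imputation, sketched below in plain Python. It rests on an assumption not stated above: that a value of 0 in columns such as Glucose or BMI stands for a missing measurement rather than a real reading. The rows are made-up illustrative values, and ZERO_MEANS_MISSING, count_missing and impute_with_median are hypothetical names.

```python
from statistics import median

# Assumption: in these columns a 0 marks a missing measurement.
ZERO_MEANS_MISSING = ("Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI")

# Illustrative rows only, not taken from the dataset.
rows = [
    {"Glucose": 120, "BloodPressure": 70, "SkinThickness": 30, "Insulin": 90, "BMI": 28.5},
    {"Glucose": 0, "BloodPressure": 80, "SkinThickness": 0, "Insulin": 0, "BMI": 33.1},
    {"Glucose": 140, "BloodPressure": 0, "SkinThickness": 20, "Insulin": 110, "BMI": 0},
]


def count_missing(rows, columns):
    """Count, per column, how many rows hold the missing-value marker 0."""
    return {c: sum(1 for r in rows if r[c] == 0) for c in columns}


def impute_with_median(rows, columns):
    """Return copies of the rows with each 0 replaced by the column median.

    The median is computed from the observed (non-zero) values only.
    The input rows are left unchanged.
    """
    cleaned = [dict(r) for r in rows]
    for c in columns:
        observed = [r[c] for r in rows if r[c] != 0]
        if not observed:
            continue  # nothing to impute from; leave the column as it is
        fill = median(observed)
        for r in cleaned:
            if r[c] == 0:
                r[c] = fill
    return cleaned


missing_before = count_missing(rows, ZERO_MEANS_MISSING)
cleaned = impute_with_median(rows, ZERO_MEANS_MISSING)
missing_after = count_missing(cleaned, ZERO_MEANS_MISSING)
print("missing before:", missing_before)
print("missing after: ", missing_after)
```

The median is used here rather than the mean because it is less sensitive to extreme values; other strategies (dropping rows, mean imputation) are equally valid choices depending on the data.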
4. Code:
University Institute of Engineering
Department of Computer Science & Engineering