DT 444


University Institute of Engineering

Department of Computer Science & Engineering

Experiment No.: 4
Student Name: UID: 23BAI70
Branch: Computer Science & Engineering Section/Group: B
Semester: 1 Date of Performance: 12/09/2023
Subject Name: Disruptive Technologies
Subject Code: 23ECH-102

1. Aim of the practical:

Explore, transform and summarize input datasets for building
classification/regression/prediction models.

2. Tool Used:
Google Colaboratory, Laptop, Microsoft Word

3. Basic Concept/Command

Description: CLASSIFICATION

• CLASSIFICATION PREDICTS THE CATEGORY THE DATA BELONGS TO.
• Classification is a technique for determining which class the dependent variable belongs to, based on
one or more independent variables.
– Classification is used for predicting discrete responses.

CLASSIFICATION MODEL

• Step 1:
– Have a large amount of data that is correctly labeled.
This means a large dataset in which, for each observation, we know its
"type", "class" or "category".

• Step 2:
– Once the data is prepared, select one or more classification algorithms and
apply them to (typically) the train/development dataset.

• Step 3:
– Tune the hyper-parameters of these classification algorithms and select the algorithm (and
its hyper-parameters) that provides the best result.
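
The three steps above can be sketched in Python as follows. This is a minimal illustration, not the code used in this experiment: it assumes scikit-learn is available, uses a synthetic labeled dataset from make_classification in place of real data, and the two candidate algorithms and their hyper-parameter grids are arbitrary choices made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Step 1: correctly labeled data (synthetic stand-in, NOT the diabetes dataset).
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Step 2: candidate algorithms, each with a small (assumed) hyper-parameter grid.
candidates = {
    "logistic_regression": (
        LogisticRegression(max_iter=1000),
        {"C": [0.1, 1.0, 10.0]},
    ),
    "decision_tree": (
        DecisionTreeClassifier(random_state=0),
        {"max_depth": [2, 4, 8]},
    ),
}

# Step 3: tune each candidate with 5-fold cross-validation on the training
# split, then keep the algorithm whose best setting scored highest.
results = {}
for name, (model, grid) in candidates.items():
    search = GridSearchCV(model, grid, cv=5, scoring="accuracy")
    search.fit(X_train, y_train)
    results[name] = search

best_name = max(results, key=lambda n: results[n].best_score_)
best_search = results[best_name]

# The held-out test split is used only once, to report the final accuracy.
test_accuracy = best_search.score(X_test, y_test)
print(best_name, best_search.best_params_, round(test_accuracy, 3))
```

Keeping the test split out of the tuning loop means the reported accuracy is not biased by the hyper-parameter search.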

DIABETES DATASET

• The dataset used for this project is the Pima Indians Diabetes Dataset
from Kaggle.
• This dataset is used to predict whether a patient is likely to have
diabetes based on input parameters such as Age, Glucose, Blood
pressure, Insulin, BMI, etc.
• Each row in the data provides relevant information about one
patient.
• Note that all patients here are females at least 21
years old and of Pima Indian heritage.

FEATURES OF DATASET

The dataset contains data for 768 individuals with 9 columns: 8 input features and 1 outcome label. The detailed
description of each column is as follows:
• Pregnancies: indicates the number of pregnancies
• Glucose: indicates the plasma glucose concentration
• Blood Pressure: indicates diastolic blood pressure in mm Hg
• Skin Thickness: indicates triceps skinfold thickness in mm
• Insulin: indicates serum insulin in μU/mL
• BMI: indicates the body mass index in kg/m²
• Diabetes Pedigree Function: indicates a function that scores the likelihood
of diabetes based on family history
• Age: indicates the age of the person in years
• Outcome: indicates whether the patient has diabetes or not (1 = yes, 0 = no)
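
A first look at a table with these columns could be sketched as below. The column names follow the usual spelling in the Kaggle CSV (no spaces), the file name diabetes.csv is an assumption, and the four rows are made-up values used only so the example runs without the real file.

```python
import io

import pandas as pd

# Illustrative rows only (invented for this sketch, not real patient records).
SAMPLE_CSV = """Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
2,120,70,30,100,28.5,0.45,35,0
5,165,82,0,0,33.1,0.60,52,1
1,95,64,22,60,24.0,0.20,23,0
3,140,0,35,150,36.4,0.85,41,1
"""

# With the real file this would be pd.read_csv("diabetes.csv") (file name assumed).
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Every column except the label is an input feature.
feature_columns = [c for c in df.columns if c != "Outcome"]

print(df.shape)                      # (rows, columns)
print(df.dtypes)                     # data type of each column
print(df["Outcome"].value_counts())  # class balance: 1 = diabetes, 0 = no diabetes
```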

EXPLORATORY DATA ANALYSIS

Exploratory Data Analysis (EDA) is an approach/philosophy for data analysis that
employs a variety of techniques (mostly graphical) to:
• maximize insight into a data set.
• uncover underlying structure.
• extract important variables.
• detect outliers and anomalies.
• test underlying assumptions.
• develop parsimonious models.
• determine optimal factor settings.
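
A few of these goals can also be approached numerically before any plotting. The sketch below assumes pandas and uses made-up Glucose and BMI readings; the helper iqr_outliers and its 1.5 × IQR rule are illustrative choices, not something prescribed by this experiment.

```python
import pandas as pd

# Made-up readings for illustration; 400 is a deliberately extreme value.
df = pd.DataFrame({
    "Glucose": [85, 90, 95, 100, 105, 110, 115, 120, 400],
    "BMI": [22.0, 23.5, 24.0, 26.1, 27.3, 28.0, 29.4, 30.2, 31.0],
})

# Maximize insight: count, mean, std, min, quartiles and max per column.
summary = df.describe()

# Uncover structure: pairwise Pearson correlation between columns.
correlations = df.corr()


def iqr_outliers(series, k=1.5):
    """Return values lying more than k * IQR outside the quartiles."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    mask = (series < q1 - k * iqr) | (series > q3 + k * iqr)
    return series[mask]


# Detect outliers and anomalies.
glucose_outliers = iqr_outliers(df["Glucose"])
print(summary)
print(correlations)
print(glucose_outliers)
```

Graphical checks such as histograms or box plots would normally accompany these numbers.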

TRANSFORMATION: MISSING VALUE TREATMENT

• Missing value treatment is one of the most important steps, not only in the model-building process
but also in any data analysis that is used for making decisions.
• Missing values, if not properly handled, can lead to poor decisions, which can in turn lead to
severe business losses.

TREATING MISSING VALUES

• Drop records with at least one missing value.
• Drop columns that are least significant and have a majority of missing values.
• Replace missing values with the mean, median or mode, or treat missing values as a
separate category.
• Correlation: depending on the strength of correlation, missing values can be imputed from related variables.
• Predict missing values using a regression technique.
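
The treatments listed above could be sketched with pandas and NumPy as follows. The values are made up for illustration, the 50% threshold for dropping a column is an assumption, and the regression step uses a simple straight-line fit from one correlated column as a stand-in for a fuller regression model.

```python
import numpy as np
import pandas as pd

# Made-up values; NaN marks a missing entry.
df = pd.DataFrame({
    "Glucose": [100.0, 120.0, np.nan, 140.0, 160.0],
    "BMI": [25.0, 27.0, 28.0, 29.0, 31.0],
    "Insulin": [np.nan, np.nan, np.nan, 90.0, np.nan],
})

# Option 1: drop every record that has at least one missing value.
complete_rows = df.dropna()

# Option 2: drop columns in which most values are missing (assumed cut-off: > 50%).
mostly_missing = df.columns[df.isna().mean() > 0.5]
reduced = df.drop(columns=mostly_missing)

# Option 3: replace missing values with a summary statistic (median here).
median_filled = reduced.fillna(reduced.median())

# Option 4: predict the missing value from a correlated column.
# Fit Glucose = slope * BMI + intercept on the rows where Glucose is known.
known = reduced.dropna(subset=["Glucose"])
slope, intercept = np.polyfit(known["BMI"], known["Glucose"], deg=1)

regression_filled = reduced.copy()
missing = regression_filled["Glucose"].isna()
regression_filled.loc[missing, "Glucose"] = (
    slope * regression_filled.loc[missing, "BMI"] + intercept
)

print(complete_rows)
print(median_filled)
print(regression_filled)
```

Dropping rows is the simplest option but discards the most data, so imputation is usually preferred when only a few values are missing.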

4. Code:

5. Observations, Simulation Screen Shots and Discussions:


(a) Installing Python

6. Result and Summary:

7. Learning outcomes (What I have learnt):


1. We discovered that data is essential for directing decisions in a range of businesses.
With the knowledge we've gained from data analysis, we can now improve patient
monitoring, lower expenditures, and optimize operating procedures.

2. As we researched and learned more about data science, we gained a deeper
awareness of its ethical implications. We developed a deep respect for protecting
privacy, maintaining justice, and promoting transparency. This epiphany highlighted
how crucial it is to guarantee that data-driven judgments are fair and considerate of
personal privacy.
3. One of the most important lessons was navigating the onslaught of data that
contemporary firms face. We now have the know-how to efficiently manage large,
connected datasets, allowing us to discover insightful information that, in turn,
supports informed decision-making.
4. We discovered the power of domain knowledge in data analysis. This ability
enables us to create questions that are pertinent to the context, comprehend the
complexities of data within particular fields, and correctly interpret results. Such
expertise greatly improves the standard of data-driven decisions.
5. We gained knowledge and experience with data visualization tools like Matplotlib
and Tableau as a result of our adventure. This ability enables us to create captivating
visual narratives that connect nontechnical stakeholders with complex data and
increase the effectiveness of data-driven decision communication.
