healthcare-project-simplilearn- Week1
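In [ ]: Data Exploration:
1. Perform descriptive analysis. In the columns below, a value of zero does not make sense and therefore indicates a missing value: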
• Glucose
• BloodPressure
• SkinThickness
• Insulin
• BMI
2. Visually explore these variables using histograms. Treat the missing values accordingly.
3. There are integer and float data type variables in this dataset. Create a count (frequency) plot describing the data types and the count of variables.
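The export starts at In [7], so the cells that import libraries and load `df` are missing. A minimal loading sketch follows. It assumes the data is a CSV file with the nine columns that df.info() reports further down. The helper name `load_diabetes_data` and the file path in the usage comment are hypothetical and do not come from the notebook.

```python
import pandas as pd

# Column names as reported by df.info() in this notebook.
EXPECTED_COLUMNS = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness", "Insulin",
    "BMI", "DiabetesPedigreeFunction", "Age", "Outcome",
]


def load_diabetes_data(path: str) -> pd.DataFrame:
    """Read the diabetes CSV and fail early if any expected column is absent.

    `path` is hypothetical: the export does not show which file was read.
    """
    df = pd.read_csv(path)
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"CSV is missing expected columns: {missing}")
    return df


# Hypothetical usage; replace the file name with the real dataset path.
# df = load_diabetes_data("diabetes.csv")
```

Checking the columns at load time helps ensure that later cells, such as `df['Outcome']` and the feature loops, fail with a clear message instead of a KeyError halfway through the notebook.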
In [7]: df.head()
1 1 85 66 29 0 26.6 0.351 31
3 1 89 66 23 94 28.1 0.167 21
In [8]: df.tail()
localhost:8888/nbconvert/html/healthcare-project-simplilearn.ipynb?download=false 1/6
9/16/2021 healthcare-project-simplilearn
In [11]: df.columns
In [12]: df.describe()
In [13]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 768 entries, 0 to 767
Data columns (total 9 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Pregnancies 768 non-null int64
1 Glucose 768 non-null int64
2 BloodPressure 768 non-null int64
3 SkinThickness 768 non-null int64
4 Insulin 768 non-null int64
5 BMI 768 non-null float64
6 DiabetesPedigreeFunction 768 non-null float64
7 Age 768 non-null int64
8 Outcome 768 non-null int64
dtypes: float64(2), int64(7)
memory usage: 54.1 KB
In [14]: df.shape
Out[14]: (768, 9)
In [15]: df.isnull().sum()
Out[15]: Pregnancies 0
Glucose 0
BloodPressure 0
SkinThickness 0
Insulin 0
BMI 0
DiabetesPedigreeFunction 0
Age 0
Outcome 0
dtype: int64
In [29]: df['Outcome'].value_counts()
df['Outcome'].value_counts(normalize=True)
Out[29]: 0 0.651042
1 0.348958
Name: Outcome, dtype: float64
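Out[29] shows only proportions. The sketch below multiplies them by the 768 rows reported by df.shape to recover absolute class counts, which makes the imbalance concrete. The only inputs are the two proportions and the row count taken from the outputs above.

```python
import pandas as pd

# Proportions copied from Out[29]; row count from Out[14] (768 rows).
n_rows = 768
proportions = pd.Series({0: 0.651042, 1: 0.348958})

# Multiplying back out recovers whole-number class counts.
counts = (proportions * n_rows).round().astype(int)
for label, count in counts.items():
    print(f"Outcome {label}: {count} rows")
# -> Outcome 0: 500 rows
# -> Outcome 1: 268 rows

# Majority-to-minority ratio, a quick measure of class imbalance.
print(round(counts[0] / counts[1], 2))  # -> 1.87
```

With roughly 1.9 non-diabetic rows per diabetic row, a classifier that always predicts 0 would already score about 65% accuracy. Later modelling steps should therefore be judged against that baseline.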
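In [39]: import math

import matplotlib.pyplot as plt
import seaborn as sns

# feature_cols is defined in an earlier cell that this export does not include.
# Round up so an odd number of features still fits in a two-column grid.
rows = math.ceil(len(feature_cols) / 2)
plt.figure(figsize=(15, 15))
for i, feature in enumerate(feature_cols):
    plt.subplot(rows, 2, i + 1)
    # sns.distplot is deprecated; histplot with kde=True draws a histogram plus a density curve.
    sns.histplot(df[feature], kde=True)
plt.tight_layout()
plt.show()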
In [49]: print()  # blank line; negative_skew and positive_skew come from earlier cells not shown here
print("Negatively skewed features are {}".format(negative_skew))
print("Positively skewed features are {}".format(positive_skew))
print("Negatively skewed feature")
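In [49] prints `negative_skew` and `positive_skew`, but the cells that build them are not in the export. The sketch below assumes the features were split by the sign of pandas' `DataFrame.skew()`. The helper name `split_by_skew` and the demo columns are hypothetical.

```python
from typing import List, Tuple

import pandas as pd


def split_by_skew(df: pd.DataFrame, feature_cols: List[str]) -> Tuple[List[str], List[str]]:
    """Split features into negatively and positively skewed lists.

    Uses the sample skewness from DataFrame.skew(); a feature whose skewness
    is exactly zero lands in neither list.
    """
    skewness = df[feature_cols].skew()
    negative_skew = skewness[skewness < 0].index.tolist()
    positive_skew = skewness[skewness > 0].index.tolist()
    return negative_skew, positive_skew


# Hypothetical demo data: one column with a long right tail, one with a long left tail.
demo = pd.DataFrame({
    "right_tail": [1, 1, 1, 2, 10],
    "left_tail": [10, 10, 10, 9, 1],
})
negative_skew, positive_skew = split_by_skew(demo, ["right_tail", "left_tail"])
print("Negatively skewed features are {}".format(negative_skew))  # -> ... ['left_tail']
print("Positively skewed features are {}".format(positive_skew))  # -> ... ['right_tail']
```

Splitting at exactly zero is the simplest rule. A threshold such as |skew| > 0.5 is a common alternative, but the export does not show which rule the notebook used.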
Out[51]: percent_missing
column_name
Insulin 48.697917
SkinThickness 29.557292
Pregnancies 14.453125
BloodPressure 4.557292
BMI 1.432292
Glucose 0.651042
DiabetesPedigreeFunction 0.000000
Age 0.000000
Outcome 0.000000
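The cell that produced Out[51] is not in the export. Outcome is listed at 0% even though about 65% of rows have Outcome == 0 (Out[29]), so the table cannot be a raw count of zeros. The zeros in selected columns appear to have been replaced with NaN first and then counted with isnull(). The sketch below follows that approach. The helper name `percent_missing` and the column list are assumptions.

```python
import numpy as np
import pandas as pd

# The five columns the task lists as having meaningless zeros. Out[51] also
# reports Pregnancies, so the notebook apparently marked that column too;
# add it to match that table (note that zero pregnancies is a plausible value).
ZERO_AS_MISSING = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]


def percent_missing(df, zero_cols=ZERO_AS_MISSING):
    """Replace zeros in `zero_cols` with NaN; return the marked frame and a % missing table."""
    cols = list(zero_cols)
    marked = df.copy()
    marked[cols] = marked[cols].replace(0, np.nan)
    pct = marked.isnull().mean() * 100  # share of NaN per column, as a percentage
    table = pct.sort_values(ascending=False).to_frame(name="percent_missing")
    table.index.name = "column_name"
    return marked, table


# Usage against the notebook's DataFrame (not run here):
# df_marked, missing_table = percent_missing(df, ZERO_AS_MISSING + ["Pregnancies"])
```

Because Outcome and Age are not in `zero_cols`, their legitimate zeros (or lack of them) are left alone. This matches the 0% rows in Out[51].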
In [53]: df.isnull().sum()
Out[53]: Pregnancies 0
Glucose 0
BloodPressure 0
SkinThickness 0
Insulin 0
BMI 0
DiabetesPedigreeFunction 0
Age 0
Outcome 0
dtype: int64
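Out[53] confirms that no NaN values remain after treatment, but the export does not show how they were filled. The sketch below uses median imputation. This method is an assumption, not taken from the notebook. It is one common choice because the median is less sensitive than the mean to the skewed distributions examined above. The helper name `impute_with_median` and the demo values are hypothetical.

```python
import numpy as np
import pandas as pd


def impute_with_median(df, cols):
    """Return a copy of df with NaN in each column of `cols` replaced by that column's median."""
    filled = df.copy()
    for col in cols:
        # Series.median() skips NaN by default, so the fill value uses observed data only.
        filled[col] = filled[col].fillna(filled[col].median())
    return filled


# Hypothetical example: two Insulin values were marked missing (NaN).
demo = pd.DataFrame({"Insulin": [94.0, np.nan, 168.0, np.nan], "Outcome": [0, 1, 1, 0]})
result = impute_with_median(demo, ["Insulin"])
print(result["Insulin"].tolist())        # -> [94.0, 131.0, 168.0, 131.0]
print(int(result.isnull().sum().sum()))  # -> 0
```

Running `df.isnull().sum()` after this step should report zeros for every column, the same pattern Out[53] shows.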
In [55]: df.head()
3 1 89 66 23 94 28.1 0.167 21
In [56]: df.dtypes
In [58]: import matplotlib.pyplot as plt

plt.figure(figsize=(10, 6))
# Count how many columns share each dtype and draw one bar per dtype.
df.dtypes.value_counts().plot(kind='bar', color='gray')
plt.title("Frequency plot describing the data types and the count of variables",
          fontsize=15, loc='center', color='black')
plt.xlabel("Data types")
plt.ylabel("Count of types")
plt.show()
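The bar chart in In [58] plots `df.dtypes.value_counts()`. The sketch below computes that Series on a synthetic one-row frame whose dtypes mirror the df.info() summary at load time (seven int64 columns, two float64 columns). It also shows one pandas behaviour worth knowing. Replacing zeros with NaN upcasts an integer column to float64, so if that happened before In [58], the bars would differ from the load-time summary. The frame and its values are made up for illustration.

```python
import pandas as pd

# Synthetic frame whose dtypes mirror the df.info() summary: 7 x int64, 2 x float64.
int_cols = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
            "Insulin", "Age", "Outcome"]
float_cols = ["BMI", "DiabetesPedigreeFunction"]
demo = pd.DataFrame({c: pd.Series([0], dtype="int64") for c in int_cols})
for c in float_cols:
    demo[c] = pd.Series([0.0], dtype="float64")

# This Series is what the In [58] bar chart plots.
dtype_counts = demo.dtypes.value_counts()
print({str(k): int(v) for k, v in dtype_counts.items()})  # -> {'int64': 7, 'float64': 2}

# Replacing zeros with NaN upcasts an integer column to float64.
demo["Insulin"] = demo["Insulin"].replace(0, float("nan"))
print(str(demo["Insulin"].dtype))  # -> float64
```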