Course Title: Data Pre-Processing and Visualization
Ram Mohan Dhara
IMTG/ PGDM/ Term – VI / 2017-2019
Session 3 : EDA (Exploratory Data Analysis)
Session objectives
After completing this session, you will be able to:
• Carry out a comprehensive exploration of a dataset in R
Exploratory Data Analysis
• The following EDA functions are included in the dlookr package:
• describe() - provides descriptive statistics for the numerical variables
• normality() and plot_normality() - test the normality of numerical variables and visualize it
• correlate() and plot_correlate() - calculate the correlation coefficients between pairs of
numerical variables and plot the correlations
• target_by() - defines the target variable
• relate() - describes the relationship between the target variable and a variable of
interest.
• plot.relate() - visualizes the relationship between the target variable and the variable
of interest.
• summary() - gives a detailed summary of an analysis result
• eda_report() - performs an exploratory data analysis and reports the results.
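The correlation step in the list above can be illustrated without dlookr. The following is a minimal base-R sketch, not the dlookr implementation: it uses `cor()` on the built-in `airquality` dataset, and the choice of columns and of Pearson correlation with pairwise-complete observations are assumptions made for illustration.

```r
# Pairwise Pearson correlations between numerical variables (base R).
# airquality is a built-in dataset; Ozone contains missing values,
# so each pair is computed on the rows where both values are present.
num_vars <- airquality[, c("Ozone", "Temp", "Wind")]

cor_mat <- cor(num_vars, use = "pairwise.complete.obs", method = "pearson")
print(round(cor_mat, 2))

# A quick visual check of the same relationships.
pairs(num_vars, main = "Pairwise scatterplots")
```

The matrix is symmetric with ones on the diagonal; each off-diagonal entry is the correlation coefficient for one pair of variables.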
Calculating descriptive statistics using describe() in R
• n : number of observations excluding missing values
• na : number of missing values
• mean : arithmetic average
• sd : standard deviation
• se_mean : standard error of the mean, sd/sqrt(n)
• IQR : interquartile range (Q3-Q1)
• skewness : skewness
• kurtosis : kurtosis
• p25 : Q1, 25th percentile
• p50 : Q2, median, 50th percentile
• p75 : Q3, 75th percentile
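To make the definitions above concrete, here is a minimal base-R sketch that computes the same statistics for a single numeric column. It is an illustration only, not how describe() is implemented: the dataset (`airquality$Ozone`) is an arbitrary choice, and the skewness and kurtosis lines use simple moment-based estimators, which may differ slightly from the values dlookr reports.

```r
# Base-R sketch of the statistics listed above, for one numeric column.
x   <- airquality$Ozone        # built-in dataset; this column contains NAs
obs <- x[!is.na(x)]            # observations excluding missing values
n   <- length(obs)
m   <- mean(obs)
m2  <- mean((obs - m)^2)       # second central moment

stats <- c(
  n        = n,
  na       = sum(is.na(x)),
  mean     = m,
  sd       = sd(obs),
  se_mean  = sd(obs) / sqrt(n),               # standard error of the mean
  IQR      = IQR(obs),                        # Q3 - Q1
  skewness = mean((obs - m)^3) / m2^1.5,      # assumption: moment estimator
  kurtosis = mean((obs - m)^4) / m2^2,        # assumption: non-excess kurtosis
  p25      = unname(quantile(obs, 0.25)),
  p50      = median(obs),
  p75      = unname(quantile(obs, 0.75))
)
print(round(stats, 3))
```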
Test of normality of numerical variables using normality()
Visualization of normality of numerical variables using plot_normality()
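The two steps named above can be approximated in base R. This sketch assumes that normality() is based on the Shapiro-Wilk test and that plot_normality() shows a histogram and Q-Q plot of the original data alongside histograms of log- and square-root-transformed data; the dataset and the panel layout here are illustrative choices, not dlookr output.

```r
# Shapiro-Wilk normality test and diagnostic plots for one variable (base R).
x <- airquality$Ozone
x <- x[!is.na(x)]              # drop missing values; all remaining values are > 0

res <- shapiro.test(x)
cat("W =", round(res$statistic, 4), " p-value =", signif(res$p.value, 3), "\n")
# A small p-value is evidence against the hypothesis that x is normally distributed.

# Four panels: original data, Q-Q plot, and two common transformations.
op <- par(mfrow = c(2, 2))
hist(x, main = "Original data", xlab = "Ozone")
qqnorm(x, main = "Q-Q plot: original data")
qqline(x)
hist(log(x), main = "Log transformation", xlab = "log(Ozone)")
hist(sqrt(x), main = "Square-root transformation", xlab = "sqrt(Ozone)")
par(op)
```

Comparing the transformed histograms with the original one suggests whether a log or square-root transformation brings the variable closer to a normal shape.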
• describe()
• normality() and plot_normality()
• correlate() and plot_correlate()
• target_by()
• relate() and plot.relate()
• summary()
• eda_report()
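The functions recapped above can be chained into one short workflow. The sketch below is a hedged illustration: it assumes the dlookr package is installed, uses the built-in `iris` dataset with `Species` as an arbitrary target variable, and calls only the functions named on these slides. Newer dlookr releases have renamed or deprecated some functions (for example plot_correlate() and eda_report()), so those two are left out here; check the documentation of the installed version.

```r
# End-to-end sketch with dlookr, guarded so it is skipped if dlookr is missing.
has_dlookr <- requireNamespace("dlookr", quietly = TRUE)

if (has_dlookr) {
  # Descriptive statistics for the numerical variables
  print(dlookr::describe(iris))

  # Normality test for the numerical variables
  print(dlookr::normality(iris))

  # Correlation coefficients between pairs of numerical variables
  print(dlookr::correlate(iris))

  # Define the target variable, then relate it to a variable of interest
  categ <- dlookr::target_by(iris, Species)
  rel   <- dlookr::relate(categ, Sepal.Length)

  print(summary(rel))   # summary of the relationship
  plot(rel)             # dispatches to the plot method for relate objects
} else {
  message("dlookr is not installed; run install.packages('dlookr') first.")
}
```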
This concludes the session: EDA (Exploratory Data Analysis)
Next session: Introduction to Visual Analytics and Tableau