Descriptive Analytics - Uni and Bi
Introductory
Big Data
College of Engineering
Chapter -2-
Descriptive Analytics Part 1
Our Mission: From Raw Data to a Dashboard
Learning Objectives
• Scale types
• Introduction to descriptive analytics
• Univariate descriptive analytics
• Visualization
Let’s Consider these Scenarios
• Can you study employees’ behavior in ALL government organizations by surveying ALL the employees?
• Can you study the purchasing behavior of ALL teenagers around the globe by surveying ALL teenagers?
• Is it feasible?
Scale Types: Brainstorming
• Does your family name describe a quantity?
Scale Types
• What does ordinal mean?
• Qualitative scales
  • Nominal: categorizes data in a non-ordinal way (CAN’T BE ORDERED)
    • Operations: = and ≠
    • E.g. a friend’s name and gender (Eve is female; Eve is not male)
  • Ordinal: categorizes data in an ordinal way (CAN BE ORDERED)
    • Operations: =, ≠, <, >, ≤, and ≥
    • E.g. company
    • Let’s compare the company of Andrew and Marcus
Scale Types
• Quantitative scales
  • Always numeric
  • Relative (Interval): does not have an absolute zero
    • Operations: =, ≠, <, >, ≤, ≥, - and +
    • E.g. temperature (in °C or °F)
  • Absolute (Ratio): has an absolute zero
    • Operations: =, ≠, <, >, ≤, ≥, -, +, / and ×
    • E.g. weight and height
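• Each scale type admits every operation of the less informative ones: nominal (= and ≠), then ordinal, interval, and ratio
• Hence, in many cases we convert data of a certain scale type to another, more informative type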
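The difference between an interval and a ratio scale can be checked numerically. The sketch below is only an illustration: the two temperature readings are made up, and the Kelvin conversion is brought in here as an example of a temperature scale that does have an absolute zero.

```python
# Illustrative values only: two hypothetical temperature readings in Celsius.
a_celsius, b_celsius = 20.0, 10.0


def celsius_to_kelvin(c):
    """Convert Celsius (interval scale) to Kelvin (ratio scale)."""
    return c + 273.15


# Differences are meaningful on an interval scale: same result in both units.
diff_celsius = a_celsius - b_celsius
diff_kelvin = celsius_to_kelvin(a_celsius) - celsius_to_kelvin(b_celsius)

# Ratios are NOT meaningful on an interval scale: the "2x" in Celsius
# disappears once the zero point is an absolute zero.
ratio_celsius = a_celsius / b_celsius
ratio_kelvin = celsius_to_kelvin(a_celsius) / celsius_to_kelvin(b_celsius)

print(f"difference: {diff_celsius:.2f} C vs {diff_kelvin:.2f} K")
print(f"ratio:      {ratio_celsius:.3f} (Celsius) vs {ratio_kelvin:.3f} (Kelvin)")
```

This is why / and × are listed only for ratio scales: 20 °C is not "twice as hot" as 10 °C.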
Descriptive
univariate analysis
Descriptive Univariate Analysis
• What does univariate mean?
• In descriptive univariate analysis, three types of
information can be obtained:
1. Frequency tables
2. Visualization (plots)
3. Statistical measures
Frequency Tables
Descriptive Univariate Analysis:
Frequencies
• A frequency is basically a counter
• Absolute frequency counts how many times a value appears.
• Relative frequency is the share of observations in which that value appears, usually written as a percentage.
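A frequency table can be built in a few lines. The sketch below uses the 14 height values from the friends data set that appears later in this deck; the variable names and the use of `collections.Counter` are choices made for this illustration, not something the slides prescribe.

```python
from collections import Counter

# Heights (cm) of the 14 friends in the deck's example data set.
heights_cm = [175, 195, 172, 180, 168, 173, 180,
              165, 158, 163, 190, 172, 185, 192]

# Absolute frequency: how many times each value appears.
absolute = Counter(heights_cm)

# Relative frequency: absolute frequency divided by the number of observations.
n = len(heights_cm)
relative = {value: count / n for value, count in absolute.items()}

print(f"{'Height':>6} {'Absolute':>9} {'Relative':>9}")
for value in sorted(absolute):
    print(f"{value:>6} {absolute[value]:>9} {relative[value]:>9.2%}")
```

The relative frequencies always add up to 100%, which is a quick sanity check on any frequency table.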
Example 1 – Company
• Relative frequency of a value that appears 7 times in 14 observations: 7/14 = 50%
Example 2 – Height
Data Visualization
Descriptive Univariate Analysis:
data visualization
• Pie chart: typically used for qualitative data
• Question: can you estimate the proportion of Bad values in the data by looking at the pie chart?
Descriptive Univariate Analysis:
data visualization
• Bar chart: typically used for qualitative scales.
• Sometimes it can be used with quantitative scales that have a limited number of values.
Descriptive Univariate Analysis:
data visualization
• Line chart: it is especially used to deal with the notion of time.

Day   Max Temp
1     21
2     25
3     30
4     20
5     21
Descriptive Univariate Analysis:
statistics
• A statistic is a descriptor
  • It describes numerically a characteristic of the sample or the population
• There are two main groups of univariate statistics:
  • Location statistics
  • Dispersion statistics
• Location statistics:
  • Minimum: is the lowest value
  • Maximum: is the largest value
  • Mean: is the average value
  • Mode: is the most frequent value
  • The value that is larger than:
    • 25% of all values is the 1st quartile
    • 50% of all values is the median or 2nd quartile
    • 75% of all values is the 3rd quartile
Example
• Let us use as an example the attribute
weight from our data set
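The location statistics for the weight attribute can be computed with Python's standard `statistics` module, as sketched below. The weights are the 14 values from the friends data set in this deck. Quartile conventions differ between tools; the default "exclusive" method of `statistics.quantiles` is assumed here because it is consistent with the interquartile range reported later for this attribute.

```python
import statistics

# Weights (kg) of the 14 friends in the deck's example data set.
weights_kg = [77, 110, 70, 85, 65, 75, 75, 63, 55, 66, 95, 72, 83, 115]

minimum = min(weights_kg)            # lowest value
maximum = max(weights_kg)            # largest value
mean = statistics.mean(weights_kg)   # average value
mode = statistics.mode(weights_kg)   # most frequent value

# Quartiles: cut points at 25%, 50% (the median) and 75% of the sorted data.
q1, median, q3 = statistics.quantiles(weights_kg, n=4)

print(f"min={minimum} max={maximum} mean={mean} mode={mode}")
print(f"Q1={q1} median={median} Q3={q3}")
```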
Descriptive Univariate Analysis:
statistics
• Box-plots can also be used to describe the symmetry/skewness of an attribute
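One informal way to read skewness off a box-plot is to compare the two halves of the box. The sketch below does this numerically for the weight attribute of the friends data set; the "left"/"right"/"symmetric" labels and the simple comparison rule are assumptions made for this illustration, not a formal skewness test.

```python
import statistics

# Weights (kg) of the 14 friends in the deck's example data set.
weights_kg = [77, 110, 70, 85, 65, 75, 75, 63, 55, 66, 95, 72, 83, 115]

# The box of a box-plot spans Q1 to Q3, with a line at the median.
q1, median, q3 = statistics.quantiles(weights_kg, n=4)

lower_half = median - q1   # width of the box below the median
upper_half = q3 - median   # width of the box above the median

# Hypothetical rule of thumb: the wider half indicates the longer tail.
if upper_half > lower_half:
    skew_hint = "right"
elif upper_half < lower_half:
    skew_hint = "left"
else:
    skew_hint = "symmetric"

print(f"lower half={lower_half}, upper half={upper_half}, hint={skew_hint}")
```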
Descriptive Univariate Analysis:
statistics
• Dispersion statistics (cont):
  • Standard deviation: is another measure for the typical distance between the observations and their mean
    • Its math formula for the population is: $\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}$
    • Its math formula for a sample is: $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$
  • The square of the standard deviation is named variance
• Using again as example the weight attribute, dispersion statistics are as shown in the table

Dispersion statistic    Weight (kg)
Amplitude               60.00
Interquartile range     21.75
(unlabeled)             14.31
s                       17.38
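The values in the table can be reproduced from the 14 weights of the friends data set, as in the sketch below. Two points are assumptions: the quartiles use the default "exclusive" method of `statistics.quantiles`, and the 14.31 entry, whose label is missing, is matched here by the sum of absolute deviations from the mean divided by n - 1. Treating it as a mean absolute deviation is a guess based on that match.

```python
import statistics

# Weights (kg) of the 14 friends in the deck's example data set.
weights_kg = [77, 110, 70, 85, 65, 75, 75, 63, 55, 66, 95, 72, 83, 115]

n = len(weights_kg)
mean = statistics.mean(weights_kg)

# Amplitude (range): distance between the largest and smallest value.
amplitude = max(weights_kg) - min(weights_kg)

# Interquartile range: distance between the 3rd and 1st quartiles.
q1, _, q3 = statistics.quantiles(weights_kg, n=4)
iqr = q3 - q1

# Standard deviation: population version (divide by N) and sample version
# (divide by n - 1); the variance is the square of the standard deviation.
sigma = statistics.pstdev(weights_kg)
s = statistics.stdev(weights_kg)
variance = statistics.variance(weights_kg)

# ASSUMPTION: absolute deviations from the mean, averaged with n - 1.
abs_dev = sum(abs(x - mean) for x in weights_kg) / (n - 1)

print(f"amplitude={amplitude:.2f} IQR={iqr:.2f}")
print(f"abs. deviation={abs_dev:.2f} s={s:.2f} sigma={sigma:.2f}")
```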
Descriptive bivariate
analysis
Descriptive bivariate analysis
• When the two attributes of the pair are quantitative
  • Several visualization techniques can show the distribution of points described by two quantitative attributes
  • One of these techniques is the scatter plot
Descriptive bivariate analysis
• Pearson correlation
• Sample Pearson correlation
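For reference, the usual textbook definitions of the two quantities are given below. The symbols are assumptions chosen for this sketch: $x_i$ and $y_i$ are the paired observations, $\bar{x}$ and $\bar{y}$ their sample means, $\sigma_x$ and $\sigma_y$ the population standard deviations, and $n$ the number of pairs.

```latex
% Population Pearson correlation
\rho_{xy} = \frac{\operatorname{cov}(x, y)}{\sigma_x \, \sigma_y}

% Sample Pearson correlation
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\;
               \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

Both values lie between -1 and 1, and both measure only the linear relationship between the two attributes.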
Descriptive bivariate analysis
• Spearman's rank correlation
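For reference, Spearman's rank correlation is usually defined as the Pearson correlation applied to the ranks of the values rather than to the values themselves. The notation below is an assumption for this sketch: $R(x_i)$ and $R(y_i)$ are the ranks (tied values share their average rank) and $d_i = R(x_i) - R(y_i)$.

```latex
% General definition: Pearson correlation of the ranks
r_s = \frac{\sum_{i=1}^{n} \bigl(R(x_i) - \overline{R(x)}\bigr)\bigl(R(y_i) - \overline{R(y)}\bigr)}
           {\sqrt{\sum_{i=1}^{n} \bigl(R(x_i) - \overline{R(x)}\bigr)^2}\;
            \sqrt{\sum_{i=1}^{n} \bigl(R(y_i) - \overline{R(y)}\bigr)^2}}

% Shortcut, exact only when there are no ties
r_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n\,(n^2 - 1)}
```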
Example
• Pearson correlation
• Spearman's rank correlation

Friend     Weight (kg)  Height (cm)  Ranked weight  Ranked height
Andrew     77           175          9.0            8.0
Bernhard   110          195          13.0           14.0
Carolina   70           172          5.0            5.5
Dennis     85           180          11.0           9.5
Eve        65           168          3.0            4.0
Fred       75           173          7.5            7.0
Gwyneth    75           180          7.5            9.5
Hayden     63           165          2.0            3.0
Irene      55           158          1.0            1.0
James      66           163          4.0            2.0
Kevin      95           190          12.0           12.0
Lea        72           172          6.0            5.5
Marcus     83           185          10.0           11.0
Nigel      115          192          14.0           13.0
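Both coefficients can be computed directly from the weight and height columns of the table. The sketch below implements them with plain Python. The helper names `pearson` and `average_ranks` are made up for this illustration, and tied values are assumed to share their average rank.

```python
from math import sqrt

weights_kg = [77, 110, 70, 85, 65, 75, 75, 63, 55, 66, 95, 72, 83, 115]
heights_cm = [175, 195, 172, 180, 168, 173, 180, 165, 158, 163, 190, 172, 185, 192]


def pearson(x, y):
    """Sample Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Ascending ranks starting at 1; tied values get their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


weight_ranks = average_ranks(weights_kg)
height_ranks = average_ranks(heights_cm)

r_pearson = pearson(weights_kg, heights_cm)
# Spearman's rank correlation is the Pearson correlation of the ranks.
r_spearman = pearson(weight_ranks, height_ranks)

print(f"Pearson:  {r_pearson:.2f}")
print(f"Spearman: {r_spearman:.2f}")
```

Both coefficients come out close to 1, so heavier friends in this data set also tend to be taller.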
Discussion