Descriptive Statistics


Statistical Methods for Decision Making

[email protected]
4YS9XLK8G2
What do the numbers tell?

This file is meant for personal use by [email protected] only.


Proprietary content. ©Great Learning. All Rights Reserved. Unauthorized use or distribution prohibited.
Sharing or publishing the contents in part or full is liable for legal action.
Why Study Statistics
• Technological developments, the rise of the Internet and social
networks, and data generated by electronic devices produce large
amounts of data
• Large storage capacity
• Advances in computing power make it possible to process and analyze
large amounts of data effectively
• Better data visualization from Business Intelligence tools
• Discovering patterns and trends in this data can help
organizations gain a competitive advantage in the marketplace

Types of Statistics
• Descriptive statistics is concerned with data summarization
through graphs, charts, and tables.

• Inferential statistics is the set of methods used to draw conclusions
about a population parameter from a sample. It involves point
estimation, interval estimation, and hypothesis testing.

Some Key Terms
• Population is the collection of all possible observations of a
specified characteristic of interest.
• Sample is a subset of the population.
• Parameter is the population characteristic of interest. For example,
suppose you are interested in the average income of a particular class
of people. The average income of this entire class of people is called
a parameter.
• Statistic is a quantity computed from a sample and used to make
inferences about the population parameter. The average income of the
population can be estimated by the average income of the sample. This
sample average is called a statistic.
Types of Data

Data
• Categorical (Qualitative)
▪ E.g. Gender, Location of store, Preference
• Numerical (Quantitative)
▪ Discrete: e.g. Family size, Number of rooms in a hotel, Number of
credit cards issued
▪ Continuous: e.g. Waiting time, Length of a part produced

Measurement Scales
• Nominal: e.g. Internet service provider

• Ordinal: e.g. Bond rating, employee designation

• Interval: e.g. IQ Score, Temperature in °C or °F

• Ratio: e.g. cost of an item

Measures of Central Tendency
• You need summary measures of central tendency to
draw meaningful conclusions in your functional area of
operation.
• The most widely used measures of central tendency are the
arithmetic mean, median and mode.

Arithmetic Mean
• The arithmetic mean (usually called the mean) is defined as the sum of
all observations in a data set divided by the total number of
observations. For example, consider a data set containing
the following observations:

• In symbolic form, the mean is given by
$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$, where $n$ is the number of
observations.

Arithmetic Mean - Example
• The inner diameters of a particular grade of tire, based on 5 sample
measurements, are as follows (figures in millimetres):

565, 570, 572, 568, 585

Applying the formula, we get mean = (565 + 570 + 572 + 568 + 585)/5 = 2860/5 = 572

• Caution: The arithmetic mean is affected by extreme values and by
fluctuations in sampling. It is not the best average to use when the
data set contains extreme values (very high or very low values).
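The calculation and the caution above can be checked with a short Python sketch. The extra reading of 700 mm is a made-up extreme value, added only to illustrate how strongly one outlier pulls the mean.

```python
from statistics import mean

# Inner-diameter measurements from the example (millimetres)
diameters = [565, 570, 572, 568, 585]
sample_mean = mean(diameters)
print(sample_mean)  # 572

# Hypothetical extreme reading, not part of the original data
with_outlier = diameters + [700]
mean_with_outlier = mean(with_outlier)
print(round(mean_with_outlier, 2))  # 593.33
```

A single extreme value moves the mean by more than 20 mm, even though five of the six readings lie between 565 and 585.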
Median

• The median is the middlemost observation when you arrange the
data in ascending order of magnitude. The median is such that 50% of
the observations are above it and 50% of the observations are below it.
• The median is a very useful measure for ranked data in the
context of preferences and ratings. It is not affected by extreme
values (greater resistance to outliers).
• Median = (n+1)/2 th value of ranked data. When n is even, (n+1)/2 is
not a whole number, and the median is taken as the average of the two
middle values.
• n = Number of observations in the sample
Median - Example

• Marks obtained by 7 students in a computer science exam
are given below. Compute the median.
45 40 60 80 90 65 55
• Arranging the data in ascending order:
40 45 55 60 65 80 90
• Median = (n+1)/2 th value in this set = (7+1)/2 th
observation = 4th observation = 60
• Hence, the median is 60 for this problem.
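The ranking procedure above can be sketched in Python as follows. The eighth mark of 70 is a hypothetical addition, used only to show what happens when n is even.

```python
from statistics import median

marks = [45, 40, 60, 80, 90, 65, 55]

ranked = sorted(marks)                 # 40 45 55 60 65 80 90
n = len(ranked)
position = (n + 1) // 2                # 4th observation when n = 7
manual_median = ranked[position - 1]   # Python lists are 0-indexed
print(manual_median, median(marks))    # 60 60

# Hypothetical eighth mark: with an even n, the two middle values are averaged
even_median = median(marks + [70])
print(even_median)                     # 62.5
```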

https://economictimes.indiatimes.com/articleshow/52450273.cms?utm_source=contentofinterest&utm_medium=text&utm_campaign=cppst
US median household income climbs to new high
If you're a member of the middle class, chances are things are looking up. Median
household income reached a record $61,372 in 2017, up 1.8 percent from $60,309
in 2016. This marks the third year in a row that median household income has
gone up, according to the U.S. Census Bureau, which compiled the data.

https://www.cnbc.com/2018/09/12/median-household-income-climbs-to-new-high-of-61372.html

Questions about the response time of ambulances dispatched to the stadium
were also raised. The head of New South Wales Ambulance was to be hauled
before the state health minister Jillian Skinner on Thursday after the ambulance
authority issued conflicting statements about their response times. The arrival of
the first ambulance took 15 minutes, NSW Ambulance clarified in a statement on
Wednesday. The state's median response time for the highest priority
life-threatening cases was just under eight minutes in 2013-14, according to the
authority's statistics.
http://timesofindia.indiatimes.com/articleshow/45292785.cms?utm_source=contentofinterest&utm_medium=text&utm_campaign=cppst

Mode
• The mode is the value that occurs most often. It has the
maximum frequency of occurrence. The mode is also
resistant to outliers.
• The mode is a very useful measure when, for example, you want to keep
the most popular shirt, in terms of collar size, in inventory
during the festival season.
• Caution: In some real-life problems there will be more
than one mode (bimodal and multi-modal data). In
these cases the mode cannot be uniquely determined.
Mode - Example

• The lives, in hours, of 10 flashlight batteries are
as follows. Find the mode.

• 340 340 350 350 340 340 320 340 330 330

• 340 occurs five times. Hence, mode = 340.
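A minimal Python sketch of the frequency count above. The collar-size list is hypothetical and is there only to illustrate the bimodal case mentioned in the earlier caution.

```python
from collections import Counter
from statistics import multimode

battery_life = [340, 340, 350, 350, 340, 340, 320, 340, 330, 330]

# Most frequent value and how often it occurs
mode_value, frequency = Counter(battery_life).most_common(1)[0]
print(mode_value, frequency)  # 340 5

# Hypothetical collar sizes with two equally frequent values (bimodal)
collar_sizes = [38, 40, 40, 42, 42, 44]
modes = multimode(collar_sizes)
print(modes)  # [40, 42]
```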

Comparison of Mean, Median and Mode

• Mean
▪ Affected by extreme values.
▪ Can be treated algebraically. That is, means of several groups can be
combined.
• Median
▪ Not affected by extreme values.
▪ Cannot be treated algebraically. That is, medians of several groups
cannot be combined.
• Mode
▪ Not affected by extreme values.
▪ Cannot be treated algebraically. That is, modes of several groups
cannot be combined.
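One way to read "can be treated algebraically" is that the mean of a pooled data set can be recovered from the group means and group sizes alone, whereas the same recipe fails for medians. The sketch below uses two made-up groups to show this; the data and variable names are illustrative assumptions.

```python
from math import isclose
from statistics import mean, median

# Hypothetical groups, chosen only for illustration
group_a = [1, 2, 99]
group_b = [3, 4, 5, 6, 7]
n_a, n_b = len(group_a), len(group_b)

# Combined mean = size-weighted average of the group means
combined_mean = (n_a * mean(group_a) + n_b * mean(group_b)) / (n_a + n_b)
pooled_mean = mean(group_a + group_b)
print(isclose(combined_mean, pooled_mean))  # True

# The same weighting applied to medians does not give the pooled median
weighted_medians = (n_a * median(group_a) + n_b * median(group_b)) / (n_a + n_b)
pooled_median = median(group_a + group_b)
print(weighted_medians, pooled_median)  # 3.875 4.5
```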

Measures of Dispersion

• In simple terms, measures of dispersion indicate how large the spread
of the distribution is around the central tendency.
• They answer unambiguously the question:
“What is the magnitude of departure from the average value
for different groups having identical averages?”

Range
• Range is the simplest of all the measures of dispersion. It is
calculated as the difference between the maximum and
minimum values in the data set.

Range = $X_{\text{Maximum}} - X_{\text{Minimum}}$

Range - Example

The following data represent the percentage return on
investment per annum for 9 mutual funds.
Calculate the range.

12, 14, 11, 18, 11.3, 12, 14, 11, 9

Range = $X_{\text{Maximum}} - X_{\text{Minimum}}$ = 18 - 9 = 9

Caution: If either the maximum value or the minimum value is an
extreme value, then the range should not be used.
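The range calculation, and the reason for the caution, can be sketched in Python. The return of 45 is a hypothetical extreme value added only for illustration.

```python
# Percentage returns as listed in the example
returns = [12, 14, 11, 18, 11.3, 12, 14, 11, 9]
data_range = max(returns) - min(returns)
print(data_range)  # 9

# Hypothetical extreme return: the range now reflects only that one fund
range_with_extreme = max(returns + [45]) - min(returns + [45])
print(range_with_extreme)  # 36
```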
Inter-Quartile Range (IQR)

• IQR = Range computed on the middle 50% of the observations,
after eliminating the highest and lowest 25% of observations
in a data set that is arranged in ascending order. IQR is less
affected by outliers.
• IQR = Q3 - Q1

Interquartile Range - Example

• The following data represent the percentage return on
investment per annum for 9 mutual funds. Calculate the
interquartile range.
• Data set: 12, 14, 11, 18, 11.5, 12, 14, 11, 9
• Arranging in ascending order, the data set becomes
9, 11, 11, 11.5, 12, 12, 14, 14, 18
• Q1 = 11 and Q3 = 14, so IQR = Q3 - Q1 = 14 - 11 = 3
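A hedged sketch of the same calculation with Python's standard library. Quartile conventions differ between textbooks and software; `statistics.quantiles` with its default "exclusive" method is used here as one reasonable choice, and for this data set it gives Q1 = 11 and Q3 = 14.

```python
from statistics import quantiles

returns = [12, 14, 11, 18, 11.5, 12, 14, 11, 9]

# quantiles() sorts the data itself; n=4 returns the three quartile cut points
q1, q2, q3 = quantiles(returns, n=4)
iqr = q3 - q1
print(q1, q3, iqr)  # 11.0 14.0 3.0
```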
Standard deviation
• Standard deviation forms the cornerstone of inferential
statistics.

• To define standard deviation, you first need to define another
term called variance. In simple terms, standard deviation is
the square root of variance.
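For reference, the usual textbook definitions are sketched below. The symbols are an assumption based on common convention rather than taken from the text: $S^2$, $S$, $\bar{X}$ and $n$ for a sample, and $\sigma^2$, $\sigma$, $\mu$ and $N$ for a population.

```latex
% Sample variance and sample standard deviation
S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}, \qquad S = \sqrt{S^2}

% Population variance and population standard deviation
\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}, \qquad \sigma = \sqrt{\sigma^2}
```

The sample version divides by n - 1 rather than n.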

Example of Standard Deviation
• The following data represent the percentage return on
investment for 10 mutual funds per annum. Calculate the
sample standard deviation.

• 12, 14, 11, 18, 10.5, 11.3, 12, 14, 11, 9

Solution for the example cont.
From the Microsoft Excel spreadsheet in the previous slide, it is
easy to see that

Mean = $\bar{X}$ = 12.28 (seen in column A, row 14)

Sample variance = $S^2$ = 6.33 (seen in column D, row 14)

Sample standard deviation = $S$ = 2.52 (seen in column D, row 15)
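The same three figures can be reproduced without a spreadsheet. This is a minimal sketch using Python's `statistics` module, whose `variance` and `stdev` functions compute the sample versions (dividing by n - 1).

```python
from statistics import mean, stdev, variance

# Percentage returns for the 10 mutual funds in the example
returns = [12, 14, 11, 18, 10.5, 11.3, 12, 14, 11, 9]

sample_mean = mean(returns)
sample_variance = variance(returns)  # divides by n - 1
sample_sd = stdev(returns)           # square root of the sample variance

print(round(sample_mean, 2))      # 12.28
print(round(sample_variance, 2))  # 6.33
print(round(sample_sd, 2))        # 2.52
```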

A histogram (also known as a frequency histogram) is a snapshot of the frequency distribution.

A histogram is a graphical representation of the frequency distribution in which the X-axis represents the
classes and the Y-axis represents the frequencies, drawn as bars.

A histogram depicts the pattern of the distribution emerging from the characteristic being measured.
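As a rough illustration of classes and frequencies, the sketch below builds a text histogram for the ten mutual-fund returns used in the standard deviation example. The class width of 2 and the lower limit of 8 are assumptions chosen for readability, not values from the slides.

```python
from collections import Counter

returns = [12, 14, 11, 18, 10.5, 11.3, 12, 14, 11, 9]

# Assumed class layout: 8-10, 10-12, ..., 18-20 (lower limit included)
start, width, n_classes = 8, 2, 6

# Map each observation to the index of the class it falls in
counts = Counter(int((x - start) // width) for x in returns)

for k in range(n_classes):
    lower = start + k * width
    print(f"{lower:>2} to under {lower + width:<2} | {'#' * counts[k]}")
```

Each row is one class (the X-axis of a histogram) and the number of `#` marks is its frequency (the Y-axis).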

The Empirical Rule
• The empirical rule approximates the variation of data in a
bell-shaped distribution.
• Approximately 68% of the data in a bell-shaped distribution is
within 1 standard deviation of the mean, or $\mu \pm 1\sigma$

The Empirical Rule
• Approximately 95% of the data in a bell-shaped distribution lies within
two standard deviations of the mean, or $\mu \pm 2\sigma$

• Approximately 99.73% of the data in a bell-shaped distribution lies
within three standard deviations of the mean, or $\mu \pm 3\sigma$
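The three percentages can be checked against the normal distribution, the standard bell-shaped model. The sketch below uses `statistics.NormalDist` from the Python standard library. Using a standard normal (mean 0, standard deviation 1) is an assumption made for convenience; the coverage is the same for any mean and standard deviation.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# Probability of falling within k standard deviations of the mean
coverage = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, p in coverage.items():
    print(f"within {k} standard deviation(s): {p:.2%}")
```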

Coefficient of Variation
(Relative Dispersion)
• The Coefficient of Variation (CV) is defined as the ratio of the
standard deviation to the mean.

• In symbolic form,
CV = $S/\bar{X}$ for sample data and CV = $\sigma/\mu$ for population
data.

Coefficient of Variation
Example
• Following is the performance of two sales teams in terms
of monthly sales. Comment on the results.
Sales Team 1
• Standard deviation: 10 units

Sales Team 2
• Standard deviation: 12 units
Coefficient of Variation
Example
• Additional information
Sales Team 1
• Mean: 70 units

Sales Team 2
• Mean: 120 units

Interpretation for the Example
• The CV for Team 1 is 10/70 = 0.14 or 14%
• The CV for Team 2 is 12/120 = 0.10 or 10%
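The two ratios can be computed with a small helper. The function name `coefficient_of_variation` is hypothetical and used only for this sketch.

```python
def coefficient_of_variation(std_dev, mean_value):
    """Ratio of standard deviation to mean (relative dispersion)."""
    return std_dev / mean_value

cv_team1 = coefficient_of_variation(10, 70)
cv_team2 = coefficient_of_variation(12, 120)
print(f"{cv_team1:.0%} {cv_team2:.0%}")  # 14% 10%
```

Team 2 has the larger standard deviation in absolute terms (12 versus 10 units) but the smaller CV, so its sales vary less relative to its mean.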

The five number summary
• The five numbers that help describe the center, spread and
shape of the data are:
▪ $X_{\text{Smallest}}$
▪ First Quartile (Q1)
▪ Median (Q2)
▪ Third Quartile (Q3)
▪ $X_{\text{Largest}}$
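A minimal sketch of a five-number summary in Python, applied to the nine mutual-fund returns from the interquartile range example. The helper name `five_number_summary` is hypothetical, and the quartiles rely on the default "exclusive" method of `statistics.quantiles`, which is only one of several conventions.

```python
from statistics import quantiles

def five_number_summary(data):
    """Return (smallest, Q1, median, Q3, largest) for a list of numbers."""
    q1, q2, q3 = quantiles(data, n=4)
    return min(data), q1, q2, q3, max(data)

returns = [12, 14, 11, 18, 11.5, 12, 14, 11, 9]
summary = five_number_summary(returns)
print(summary)  # (9, 11.0, 12.0, 14.0, 18)
```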

CASE STUDY - HEALTH INSURANCE

• Most companies now recognize the power of data in making crucial
business decisions. For an insurance company, it becomes even more important to
study various attributes of its customers. Leveraging this customer
information to make business decisions can provide a competitive edge to the
company over other players in the market.

• We are provided with some customer data from an insurance company, such as age,
gender, BMI and the medical charges billed by the insurance company. We need to
explore this data to see if we can derive some meaningful insights from it.

Five number summary and the Boxplot
• The Boxplot: A graphical display of the data based on the
five-number summary.

Five number summary:
Shape of Boxplots
• If data is symmetric around the median then the box and
central line are centered between the endpoints.

• A Boxplot can be shown in either a vertical or horizontal
orientation.
Distribution shape and
The Boxplots

Boxplot Example
• Below is a Boxplot for the following data:


• The data are right skewed, as the plot depicts


Boxplot example showing an outlier
• The Boxplot below of the same data shows the outlier
value of 27 plotted separately.
• A value is considered an outlier if it is more than 1.5 times
the interquartile range below Q1 or above Q3.
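The 1.5 × IQR rule can be sketched as follows. The data set is illustrative only, a made-up list that happens to contain the value 27, and should not be read as the data behind the plot. Quartiles use the default method of `statistics.quantiles`, one of several conventions.

```python
from statistics import quantiles

# Illustrative data (an assumption, not the data behind the plot)
data = [0, 2, 2, 2, 3, 3, 4, 5, 5, 9, 27]

q1, _, q3 = quantiles(data, n=4)
iqr = q3 - q1

# Fences: anything beyond them is flagged as an outlier
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(lower_fence, upper_fence, outliers)  # -2.5 9.5 [27]
```

Here 9 falls just inside the upper fence of 9.5 and is kept, while 27 is flagged.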

Covariance
• The covariance measures the strength of the linear relationship between two
numerical variables (X and Y). The formula for the sample covariance is
$\text{cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$

• Drawback of covariance: It can take any value, so it cannot be used to
determine the relative strength of the relationship.
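A short sketch of the sample covariance and of its drawback. The paired data (`hours`, `sales`) and the function name are hypothetical, chosen so the arithmetic is easy to follow by hand.

```python
def sample_covariance(xs, ys):
    """Sum of products of deviations from the means, divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical paired observations
hours = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 5]

cov_units = sample_covariance(hours, sales)
print(cov_units)  # 1.5

# Same relationship, with sales expressed in hundredths of a unit
cov_rescaled = sample_covariance(hours, [s * 100 for s in sales])
print(cov_rescaled)  # 150.0
```

Changing the unit of measurement changes the covariance by a factor of 100 although the relationship is unchanged, which is why its size alone says little about relative strength.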

Coefficient of Correlation
• The coefficient of correlation (r) measures the relative strength of a linear
relationship between two numerical variables.

• The values of r range from -1 to +1.

• The value -1 indicates a perfect negative correlation and +1 indicates a perfect
positive correlation.
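A minimal sketch of r, assuming the standard definition of the sample correlation as the covariance divided by the product of the two standard deviations (the n - 1 terms cancel, leaving sums of deviations). The data and the function name are hypothetical.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation coefficient r for paired observations."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical paired observations
hours = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 5]

r = correlation(hours, sales)
r_perfect_positive = correlation(hours, [2 * h + 1 for h in hours])
r_perfect_negative = correlation(hours, [-h for h in hours])

print(round(r, 4))         # 0.7746
print(r_perfect_positive)  # 1.0
print(r_perfect_negative)  # -1.0
```

Unlike the covariance, r does not change when either variable is rescaled to different units.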
