Chapter 2
DESCRIPTIVE STATISTICS
LECTURER: DR RUZANITA MAT RANI
PREPARED BY:
HAZIYAH BINTI MD JASMIN
CONTENTS
2.0 Introduction to Descriptive Statistics
2.1 Organizing & Graphing Qualitative and Quantitative Data
2.1.1 Tabular form (Frequency distribution & Cross Tabulation)
2.1.2 Graphical form (Pie chart, Bar chart, Stem and Leaf Plot, Histogram and Boxplot)
2.0
Introduction to Descriptive Statistics
➢Descriptive Statistics are used to describe the basic features of the data in a study.
➢They provide simple summaries about the sample and the measures.
➢With descriptive statistics, you are simply describing what is, or what the data show.
➢Descriptive statistics therefore enable us to present the data in a more meaningful way, which allows simpler interpretation of the data.
➢For example, if we had the results of 100 pieces of students' coursework, we may be interested in the
overall performance of those students. We would also be interested in the distribution or spread of the
marks. Descriptive statistics allow us to do this.
2.1
Organizing & Graphing Qualitative and Quantitative Data
Data Presentation
➢ Tabular form: 1. Frequency distribution 2. Cross tabulation
➢ Graphical form, qualitative data: 1. Pie chart 2. Bar chart
➢ Graphical form, quantitative data: 1. Stem and leaf plot 2. Histogram 3. Boxplot
2.1.1
Tabular form
1) Frequency Distribution Table
❖ A frequency distribution is a table of rows and columns that lists the data values together with their frequencies.
❖ Frequency is the number of times a value occurs.
❖ A column may consist of categories of data. A class is a category into which qualitative data can be classified. Class frequency is the number of observations that fall in a particular class.
❖ For example, a car dealer in KL sold the following types of cars in January 2005:
Car model (CLASS, qualitative variable) | Number of cars (FREQUENCY)
Waja | 66
Wira | 50
Saga | 39
Gen-2 | 25
Total | 180
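A frequency table is often extended with relative-frequency and percentage columns, which are what a pie chart is later drawn from. The sketch below does that for the car-sales figures above; the extra columns and the helper name `frequency_table` are illustrative additions, not part of the original slide.

```python
# Frequency distribution for the car-sales example (January 2005).
# The relative-frequency and percentage columns are an illustrative extension.
car_sales = {"Waja": 66, "Wira": 50, "Saga": 39, "Gen-2": 25}


def frequency_table(freqs):
    """Return rows of (class, frequency, relative frequency, percentage) and the total."""
    total = sum(freqs.values())
    rows = []
    for cls, f in freqs.items():
        rel = f / total
        rows.append((cls, f, round(rel, 3), round(100 * rel, 2)))
    return rows, total


rows, total = frequency_table(car_sales)
print(f"{'Car model':<10}{'Freq':>6}{'Rel. freq':>11}{'%':>8}")
for cls, f, rel, pct in rows:
    print(f"{cls:<10}{f:>6}{rel:>11}{pct:>8}")
print(f"{'Total':<10}{total:>6}")
```

The rounded percentages may not sum to exactly 100% (here they sum to 100.01%), which is normal after rounding.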
EXAMPLE 1
25 army inductees were given a blood test to determine their blood type. The data set and its
frequency table are as follows:
Conclusion: More people have type ‘O’ blood than any other type.
2.1.1
Tabular form
2) CROSS TABULATION TABLE (Cross Tab)
❖Also known as a “Contingency Table”.
❖Presented in a row × column (r × c) layout.
❖Used to examine the categorical responses to two qualitative variables simultaneously.
Variable 1 \ Variable 2 | 1 | 2 | ... | c | Total row, i
1 | | | | | Total row 1
... | | | | | ...
r | | | | | Total row r
Total column, j | Total column 1 | Total column 2 | ... | Total column c | Grand total
Example 2
A car manufacturer might want to know whether colour preference for a car is
independent of gender. In this case, the two variables are GENDER and COLOUR.
❖ The table shows that men prefer black or red cars, while women prefer red cars. Men
dislike blue cars, while women dislike green cars.
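The Example 2 table itself is not reproduced here, so the sketch below builds a cross tabulation from a small, entirely hypothetical list of (gender, colour) responses. It shows how each cell count, the row totals, the column totals and the grand total of the r × c layout are obtained; `cross_tab` is an illustrative helper name.

```python
from collections import Counter


def cross_tab(pairs):
    """Build an r x c contingency table from (row category, column category) pairs."""
    counts = Counter(pairs)
    row_cats = sorted({r for r, _ in pairs})
    col_cats = sorted({c for _, c in pairs})
    table = {r: {c: counts[(r, c)] for c in col_cats} for r in row_cats}
    row_totals = {r: sum(table[r].values()) for r in row_cats}
    col_totals = {c: sum(table[r][c] for r in row_cats) for c in col_cats}
    return table, row_totals, col_totals, sum(row_totals.values())


# Hypothetical (gender, preferred colour) responses, for illustration only.
responses = [("Male", "Black"), ("Male", "Red"), ("Male", "Black"),
             ("Female", "Red"), ("Female", "Red"), ("Female", "Blue")]

table, row_totals, col_totals, grand_total = cross_tab(responses)
colours = list(col_totals)
print("Gender \\ Colour".ljust(16) + "".join(c.rjust(7) for c in colours) + "Total".rjust(7))
for gender, cells in table.items():
    print(gender.ljust(16) + "".join(str(cells[c]).rjust(7) for c in colours)
          + str(row_totals[gender]).rjust(7))
print("Total".ljust(16) + "".join(str(col_totals[c]).rjust(7) for c in colours)
      + str(grand_total).rjust(7))
```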
2.1.2
Graphical form (Qualitative data)
[Figure: Bar chart of the number of army inductees by blood type, and pie chart of blood types: A = 5 (20%), B = 7 (28%), O = 9 (36%), AB = 4 (16%)]
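For a pie chart, each category's slice angle is its relative frequency multiplied by 360°. The sketch below assumes the counts A = 5, B = 7, O = 9 and AB = 4, which are the chart percentages (20%, 28%, 36%, 16%) applied to the 25 inductees; `pie_slices` is an illustrative helper.

```python
# Blood-type counts for the 25 army inductees, as read from the chart percentages.
blood_counts = {"A": 5, "B": 7, "O": 9, "AB": 4}


def pie_slices(freqs):
    """Return {category: (percentage, angle in degrees)} for a pie chart."""
    total = sum(freqs.values())
    return {k: (100 * f / total, 360 * f / total) for k, f in freqs.items()}


slices = pie_slices(blood_counts)
for blood_type, (pct, angle) in slices.items():
    print(f"{blood_type:>2}: {pct:5.1f}%  {angle:6.1f} degrees")

# The tallest bar / largest slice corresponds to the most frequent category.
print("Most common blood type:", max(blood_counts, key=blood_counts.get))
```

Type O gets the largest slice (129.6°), matching the conclusion of Example 1.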
The steps to construct a stem and leaf plot manually are as follows:
Step 1 Step 2 Step 3
Solution:
1. For 20 days, each day the number of patients receiving cardiograms was between 31 and 36.
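The manual procedure can be sketched in code, assuming the usual steps: sort the data, split each value into a stem (all digits but the last) and a leaf (the last digit), then list the leaves beside their stems. The function names are illustrative, and the example data are the JUNE'19 Mathematics scores from the past-year questions, entered in an arbitrary order to show the sorting step.

```python
from collections import defaultdict


def stem_and_leaf(values, leaf_unit=1):
    """Map each stem to its sorted leaves; the leaf is the last digit in units of leaf_unit.

    Assumes non-negative values. Stems with no leaves are kept so gaps stay visible.
    """
    leaves_by_stem = defaultdict(list)
    for v in sorted(values):                  # arrange the data in increasing order
        scaled = round(v / leaf_unit)         # e.g. 5.7 with leaf_unit=0.1 -> 57
        stem, leaf = divmod(scaled, 10)       # split each value into stem and leaf
        leaves_by_stem[stem].append(leaf)
    lo, hi = min(leaves_by_stem), max(leaves_by_stem)
    return {s: leaves_by_stem.get(s, []) for s in range(lo, hi + 1)}


def print_plot(plot, key):
    for stem, leaves in plot.items():         # list the leaves beside each stem
        print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
    print("Key:", key)


scores = [68, 43, 71, 52, 62, 89, 54, 63, 58, 65, 67, 68, 69, 75, 78]
plot = stem_and_leaf(scores)
print_plot(plot, "4 | 3 means 43")
```

For one-decimal data such as the JULY'17 times, `leaf_unit=0.1` makes 5.7 appear as stem 5, leaf 7.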
a) If the positioning point is an integer, the observation at that position is taken as the quartile.
b) If the positioning point is halfway between two integers, the average of the two corresponding observations is taken.
c) If the positioning point is neither an integer nor halfway between two integers, round it to the nearest integer and take the corresponding observation.
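Rules a) to c) can be sketched as a small function. This assumes the common positioning points Q1 at (n + 1)/4, median at 2(n + 1)/4 and Q3 at 3(n + 1)/4, which the rules above do not state explicitly. Working with k(n + 1) and its remainder mod 4 keeps the integer/halfway test exact. The helper names are illustrative.

```python
def quartile(sorted_x, k):
    """k-th quartile (k = 1, 2, 3) at position k(n + 1)/4, using rules a) to c)."""
    n = len(sorted_x)
    if n < 2:
        raise ValueError("need at least two observations")
    whole, quarter = divmod(k * (n + 1), 4)    # position = whole + quarter/4
    if quarter == 0:                           # rule a): integer position
        return sorted_x[whole - 1]
    if quarter == 2:                           # rule b): halfway, average the neighbours
        return (sorted_x[whole - 1] + sorted_x[whole]) / 2
    pos = whole if quarter == 1 else whole + 1 # rule c): .25 rounds down, .75 rounds up
    return sorted_x[pos - 1]


def five_number_summary(data):
    """Min, Q1, median, Q3 and max: the values needed for a box-and-whisker plot."""
    x = sorted(data)
    return {"min": x[0], "Q1": quartile(x, 1), "median": quartile(x, 2),
            "Q3": quartile(x, 3), "max": x[-1]}


# Usage with the Providence Bank waiting times from the DEC'19 past-year question.
pb = [3.2, 4.4, 4.8, 5.2, 6.2, 6.7, 7.5, 8.0, 6.5, 9.0, 5.1, 3.3, 5.2, 4.0, 4.9]
print(five_number_summary(pb))
```

With n = 15 every position is an integer (4, 8 and 12), so only rule a) is used; with n = 14, as in Example 4, Q1 sits at 3.75 and rule c) rounds it to position 4.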
EXAMPLE 4
The 3-year annual returns of 14 low-risk funds, arranged in ascending order, are given as follows.
Find the minimum, maximum, Q1, Q3 and median. Then construct the box-and-whisker plot.
➢ Measures of central tendency: Mean, Median, Mode
➢ Measures of dispersion: Range, Variance & Standard Deviation
➢ Measure of skewness: Pearson Coefficient of Skewness
*NOTES: Please round off answers that involve calculations to 3 decimal places.
2.2.1
Measures of Central Tendency
EXAMPLE 5
A company has five departments. The numbers of workers in the five departments are 24, 13, 19,
26 and 11 respectively. Find the mean number of workers per department and interpret the value.
Solution: $$\mu = \frac{\Sigma X}{N} = \frac{24 + 13 + 19 + 26 + 11}{5} = \frac{93}{5} = 18.6$$
Interpretation: On average, there are 18.6 (about 19) workers in a department.
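The arithmetic of Example 5 can be checked with a few lines of Python. This is only a verification sketch; it treats the five departments as the whole population, which is why the result is written as μ.

```python
import math
import statistics

workers = [24, 13, 19, 26, 11]   # number of workers in each of the five departments
N = len(workers)
mu = sum(workers) / N             # population mean: mu = sum(X) / N
print(f"mu = {sum(workers)} / {N} = {mu:.3f}")

# Cross-check with the standard library.
assert math.isclose(mu, statistics.mean(workers))
```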
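Solution: 1st step - Arrange the data from the smallest to the largest value (increasing order):
7 7 11 12 16 16 18 22 25
2nd step - Locate the middle value: with n = 9, the median is at position (n + 1)/2 = 5, so the median is 16 hours.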
Interpretation:
50% of the students spend more than 16 hours studying per week and another 50% of the
students spend less than 16 hours studying per week.
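The two steps above generalize to any data set: sort, then take the middle value when n is odd, or average the two middle values when n is even. A minimal sketch, using the study-hours data from this example:

```python
def median(data):
    """Median: middle value of the sorted data (average of the two middle values if n is even)."""
    x = sorted(data)                   # 1st step: arrange in increasing order
    n = len(x)
    mid = n // 2
    if n % 2 == 1:                     # odd n: the value at position (n + 1)/2
        return x[mid]
    return (x[mid - 1] + x[mid]) / 2   # even n: average of the two middle values


hours = [7, 7, 11, 12, 16, 16, 18, 22, 25]
print(median(hours))  # 16
```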
EXAMPLE 7
The marks of nine students in a mathematics test with a maximum possible mark of 50 are given
below. Find the mode and comment on it.
Solution: Find the value that occurs most often in the data set.
◦ NOTES: A data set can have more than one mode, or no mode at all.
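The note about several modes or no mode can be made concrete with a counting sketch. It assumes the convention that a data set has no mode only when no value repeats; some textbooks also say "no mode" when every value occurs equally often, which this sketch does not do. The Example 7 marks are not reproduced here, so the study-hours data from the median example (which is bimodal) are used instead.

```python
from collections import Counter


def modes(data):
    """Return every mode in increasing order; an empty list means no mode.

    Assumed convention: there is no mode only when no value occurs more than once.
    """
    counts = Counter(data)
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(v for v, c in counts.items() if c == top)


print(modes([7, 7, 11, 12, 16, 16, 18, 22, 25]))  # [7, 16] -> two modes (bimodal)
print(modes([3, 5, 8]))                           # []      -> no mode
```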
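Solution:
x | x²
46.5 | 2162.25
18 | 324
16 | 256
7.8 | 60.84
7.2 | 51.84
Σx = 95.5 | Σx² = 2854.93
Sample Std. Dev, $$s = \sqrt{\frac{1}{n-1}\left[\Sigma x^2 - \frac{(\Sigma x)^2}{n}\right]} = \sqrt{\frac{1}{5-1}\left[2854.93 - \frac{(95.5)^2}{5}\right]} = \sqrt{257.720} = 16.054$$
Sample Variance, $$s^2 = \frac{1}{5-1}\left[2854.93 - \frac{(95.5)^2}{5}\right] = 257.720$$ (squaring the rounded value, 16.054² = 257.731)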
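The computational formula above can be reproduced directly. This sketch follows the same steps (Σx, Σx², then the n − 1 divisor) and cross-checks against Python's `statistics.stdev`, which also uses the sample (n − 1) definition. Computed from unrounded values, s² comes out as 257.720; squaring the already-rounded s = 16.054 gives 257.731 instead.

```python
import math
import statistics

x = [46.5, 18, 16, 7.8, 7.2]
n = len(x)
sum_x = sum(x)                           # Σx
sum_x2 = sum(v * v for v in x)           # Σx²
s2 = (sum_x2 - sum_x ** 2 / n) / (n - 1) # sample variance
s = math.sqrt(s2)                        # sample standard deviation

print(f"sum x = {sum_x:.2f}, sum x^2 = {sum_x2:.2f}")
print(f"s^2 = {s2:.3f}, s = {s:.3f}")

# Cross-check with the standard library (both use the n - 1 divisor).
assert math.isclose(s, statistics.stdev(x))
```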
2.2.3
Coefficient of Variation (CV)
CV = (Standard deviation / Mean) × 100%
% of CV | Conclusion
Large % of CV | ✓ More dispersed ✓ Less consistent/less reliable
Small % of CV | ✓ Less dispersed ✓ More consistent/more reliable
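CV is useful because it expresses spread relative to the mean, so groups with different means can be compared. The sketch below applies CV = (s / mean) × 100% to the Mathematics and Biology summary statistics quoted in the JUNE'19 past-year question; the larger CV marks the more dispersed group and the smaller CV the more consistent one. `coefficient_of_variation` is an illustrative name.

```python
def coefficient_of_variation(std_dev, mean):
    """CV as a percentage: (standard deviation / mean) x 100%."""
    return std_dev / mean * 100


# (mean, standard deviation) from the JUNE'19 summary table.
subjects = {"Mathematics": (65.4667, 11.1859), "Biology": (66.9412, 17.5623)}
cvs = {name: coefficient_of_variation(sd, mean) for name, (mean, sd) in subjects.items()}

for name, cv in cvs.items():
    print(f"{name}: CV = {cv:.3f}%")

more_dispersed = max(cvs, key=cvs.get)   # large CV -> more dispersed
more_consistent = min(cvs, key=cvs.get)  # small CV -> more consistent
print("More dispersed:", more_dispersed)
```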
EXAMPLE 9
A study is conducted to determine the performance of students from various classes of
Sekolah Menengah Taman Meru Jati. The students' CGPA measurements were tabulated
as follows:
Using the most suitable measurement, which class is more consistent in their performance?
Solution:
Conclusion:
Class is more consistent in their performance since it has the smallest
percentage value of CV.
2.2.4
Measures of Skewness
Pcs value | Description
Positive | Skewed to the right (positively skewed)
Negative | Skewed to the left (negatively skewed)
Zero | Symmetrical (normal)
EXAMPLE 10
Calculate the skewness for the data in Example 9 and comment on the shape of the
distribution.
Solution:
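CLASS MELATI
Formula and calculation (Std. deviation = √0.2755 = 0.525):
$$\text{Pcs} = \frac{\text{Mean} - \text{Mode}}{\text{Std. Deviation}} = \frac{3.12 - 2.12}{\sqrt{0.2755}} = 1.905$$
$$\text{Pcs} = \frac{3(\text{Mean} - \text{Median})}{\text{Std. Deviation}} = \frac{3(3.12 - 3.10)}{\sqrt{0.2755}} = 0.114$$
Conclusion on the shape of the distribution: both values are positive, so the distribution is skewed to the right.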
CLASS LILY
Conclusion on the shape of the distribution:
CLASS MAWAR
Conclusion on the shape of the distribution:
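Both forms of Pearson's coefficient of skewness, and the sign rule from the Pcs table, can be sketched as below. The Class Melati inputs are taken from Example 10; the assumption that the standard deviation is √0.2755 ≈ 0.525 is an inference, chosen because it is the value that reproduces the worked answers 1.905 and 0.114. Function names are illustrative.

```python
import math


def pcs_mode(mean, mode, sd):
    """Pearson coefficient of skewness, mode form: (mean - mode) / s."""
    return (mean - mode) / sd


def pcs_median(mean, median, sd):
    """Pearson coefficient of skewness, median form: 3(mean - median) / s."""
    return 3 * (mean - median) / sd


def shape(pcs):
    """Describe the distribution using the sign convention in the Pcs table."""
    if pcs > 0:
        return "skewed to the right (positively skewed)"
    if pcs < 0:
        return "skewed to the left (negatively skewed)"
    return "symmetrical"


# Class Melati figures from Example 10. Assumption: s = sqrt(0.2755).
mean, median, mode = 3.12, 3.10, 2.12
sd = math.sqrt(0.2755)

sk_mode = pcs_mode(mean, mode, sd)
sk_median = pcs_median(mean, median, sd)
print(f"Mode form:   {sk_mode:.3f} -> {shape(sk_mode)}")
print(f"Median form: {sk_median:.3f} -> {shape(sk_median)}")
```

The two forms give different magnitudes, but here both are positive, so they agree on the shape.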
2.2.5
Measures of Position
Past Year Question
DEC’19 – Question 3
Waiting times (in minutes) of customers at the Providence Bank (PB) and the Valley Bank (VB) are given
as follows.
PB 3.2 4.4 4.8 5.2 6.2 6.7 7.5 8.0 6.5 9.0 5.1 3.3 5.2 4.0 4.9
VB 5.8 5.6 5.7 5.8 6.1 6.4 6.7 6.7 6.7 6.8 6.5 7.0 6.9 6.5 6.4
a) Calculate the mean and standard deviation for Valley Bank waiting times.
(3 marks)
b) From the above box-and-whisker plots of waiting times for both Providence and Valley banks,
comment on the shape of distribution for each plot.
(2 marks)
c) The mean and standard deviation for Providence Bank are 5.6 and 1.692 respectively. Determine
which bank has more consistent waiting times.
(3 marks)
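A quick way to check answers to parts a) and c) is to compute the sample mean, sample standard deviation and CV for both banks. As a sanity check, the sketch reproduces the Providence Bank values quoted in part c) (5.6 and 1.692), which confirms that the sample (n − 1) standard deviation is the one intended. The helper name `summary` is illustrative.

```python
import statistics

pb = [3.2, 4.4, 4.8, 5.2, 6.2, 6.7, 7.5, 8.0, 6.5, 9.0, 5.1, 3.3, 5.2, 4.0, 4.9]
vb = [5.8, 5.6, 5.7, 5.8, 6.1, 6.4, 6.7, 6.7, 6.7, 6.8, 6.5, 7.0, 6.9, 6.5, 6.4]


def summary(data):
    """Return (mean, sample standard deviation, CV in %) for a data set."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)   # sample standard deviation (n - 1 divisor)
    return mean, sd, sd / mean * 100


for name, data in (("Providence (PB)", pb), ("Valley (VB)", vb)):
    mean, sd, cv = summary(data)
    print(f"{name}: mean = {mean:.3f}, s = {sd:.3f}, CV = {cv:.3f}%")
```

The bank with the smaller CV has the more consistent waiting times.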
Past Year Question
JUNE’19 – Question 2
The stem and leaf plot below represents the Mathematics test scores (out of 100) of 15 randomly selected
students in Class Angle.
Stem | Leaf
4 | 3
5 | 2 4 8
6 | 2 3 5 7 8 8 9
7 | 1 5 8
8 | 9
Key: 4 | 3 means 43
a) Calculate the mean and standard deviation for the test score.
(4 marks)
b) Interpret the mean obtained in a).
(1 mark)
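To answer part a), the plot must first be turned back into raw scores (stem × 10 + leaf, per the key). The sketch below does that and computes the sample mean and standard deviation. The results agree with the Mathematics row of the summary table in part c) (65.4667 and 11.1859).

```python
import statistics

# Stem-and-leaf plot from JUNE'19 Question 2 (key: 4 | 3 means 43).
plot = {4: [3], 5: [2, 4, 8], 6: [2, 3, 5, 7, 8, 8, 9], 7: [1, 5, 8], 8: [9]}
scores = [10 * stem + leaf for stem, leaves in plot.items() for leaf in leaves]

mean = statistics.mean(scores)
sd = statistics.stdev(scores)   # sample standard deviation (n - 1 divisor)
print(f"n = {len(scores)}, mean = {mean:.3f}, s = {sd:.3f}")
```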
c) The summary statistics of the test scores for Mathematics and Biology students in Class
Angle are given in the following table. Using an appropriate measure, determine which
subject's test-score distribution is more dispersed.
Descriptive statistics
Subject | N | Mean | Std. Deviation
Mathematics | 15 | 65.4667 | 11.1859
Biology | 17 | 66.9412 | 17.5623
(3 marks)
Past Year Question
DEC’18 – Question 2
The descriptive statistics for the life span (in years) of brand AA washing machines are summarized below.
a) Calculate the coefficient of skewness. Hence, comment on the shape of the distribution.
(2 marks)
(1 mark)
c) The mean and variance for the life span (in years) of brand BB are 7.1 and 12.3 respectively. Using an
appropriate measurement, determine which brand has a more consistent life span.
(4 marks)
Past Year Question
JULY’17 – Question 2
The plot given below represents the time (in minutes) taken by each child in Class A to solve the
same problem.
Time to solve problem
Stem | Leaf
5 | 7 9
6 | 1 4 8
7 | 0 2 2 3 4 4 4 6 7 9
8 | 2 5 8 9 9
9 | 1 3 3 6
Key: 5 | 7 means 5.7
a) What is the name of the plot above?
b) How many children are there in Class A?
c) State the modal time taken by the children in Class A to solve the problem. (3 marks)
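Parts b) and c) amount to counting leaves and finding the most frequent value. The sketch decodes the plot in tenths of a minute (so 5 | 7 becomes 57, i.e. 5.7 minutes), which keeps the counting exact, then converts back. As a check, the mean it computes matches the Class A mean of 7.733 given in part d).

```python
from collections import Counter

# Stem-and-leaf plot from JULY'17 Question 2 (key: 5 | 7 means 5.7 minutes).
plot = {5: [7, 9], 6: [1, 4, 8], 7: [0, 2, 2, 3, 4, 4, 4, 6, 7, 9],
        8: [2, 5, 8, 9, 9], 9: [1, 3, 3, 6]}

# Work in tenths of a minute (integers) so counting is exact, then convert back.
tenths = [10 * stem + leaf for stem, leaves in plot.items() for leaf in leaves]
counts = Counter(tenths)
top = max(counts.values())
mode_minutes = sorted(t / 10 for t, c in counts.items() if c == top)
mean_minutes = sum(tenths) / len(tenths) / 10

print("Number of children:", len(tenths))
print("Mode (minutes):", mode_minutes)
print(f"Mean (minutes): {mean_minutes:.3f}")
```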
d) The statistics for the time taken by children in Class A and Class B to solve the problem are
summarized in the following table. Using an appropriate measure, determine which class has
a more consistent time in solving the problem.
Descriptive statistics
Class | N | Mean | Std. Deviation
Class A | 24 | 7.733 | 1.131
Class B | 27 | 7.196 | 1.223
(5 marks)
END OF CHAPTER 2