Chapter 2-190810 074149
Summarizing Data
2.1 Introduction
Raw data - Data recorded in the sequence in which they are collected, before they are processed or ranked.
Example 1:
Here is a list of questions asked in a large statistics class and the “raw data” given by
one of the students:
Example 2:
Qualitative raw data
A frequency distribution for qualitative data lists all categories and the
number of elements that belong to each of the categories.
It shows how the frequencies are distributed over the various categories.
It is also called a frequency distribution table or simply a frequency table.
The number of students who belong to a certain category is called the
frequency of that category.
2.2.2 Relative Frequency and Percentage Distribution
Example 3:
A sample of UUM staff-owned vehicles produced by Proton was identified and the
make of each noted. The resulting sample follows (W = Wira, Is = Iswara, Wj = Waja,
St = Satria, P = Perdana, Sv = Savvy):
W W P Is Is P Is W St Wj
Is W W Wj Is W W Is W Wj
Wj Is Wj Sv W W W Wj St W
Wj Sv W Is P Sv Wj Wj W W
St W W W W St St P Wj Sv
Construct a frequency distribution table for these data, together with the relative frequency
and percentage of each category.
Solution:
Category   Frequency   Relative Frequency   Percentage (%)
Wira          19       19/50 = 0.38         0.38 × 100 = 38
Iswara         8       0.16                 16
Perdana        4       0.08                  8
Waja          10       0.20                 20
Satria         5       0.10                 10
Savvy          4       0.08                  8
Total         50       1.00                100
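As a check on the table, the short Python sketch below recounts the Example 3 sample and recomputes each row, using relative frequency = frequency / total and percentage = relative frequency × 100. The helper name `frequency_table` and the `names` dictionary of abbreviations are illustrative choices, not part of the original notes.

```python
from collections import Counter

# Example 3 sample of 50 Proton makes, copied row by row from the notes.
sample = """
W W P Is Is P Is W St Wj
Is W W Wj Is W W Is W Wj
Wj Is Wj Sv W W W Wj St W
Wj Sv W Is P Sv Wj Wj W W
St W W W W St St P Wj Sv
""".split()

# Abbreviation -> make name, in the order used in the solution table (illustrative).
names = {"W": "Wira", "Is": "Iswara", "P": "Perdana",
         "Wj": "Waja", "St": "Satria", "Sv": "Savvy"}

def frequency_table(data, labels):
    """Return (category, frequency, relative frequency, percentage) rows."""
    counts = Counter(data)
    n = len(data)
    rows = []
    for code, name in labels.items():
        f = counts[code]
        rel = f / n                      # relative frequency = f / total
        rows.append((name, f, rel, rel * 100))  # percentage = relative frequency x 100
    return rows

table = frequency_table(sample, names)
for name, f, rel, pct in table:
    print(f"{name:<8}{f:>4}{rel:>8.2f}{pct:>8.1f}")
print(f"{'Total':<8}{len(sample):>4}{1:>8.2f}{100:>8.1f}")
```

The relative frequencies always sum to 1 (and the percentages to 100), which is a quick way to catch a miscount.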
1. Bar Graphs
Figure 2.1
Horizontal Bar Chart
[Horizontal bar chart: frequency (0 to 20) for each make, including Wira, Perdana and Satria]
Figure 2.3
Component Bar Chart
To construct a component bar chart, all categories are shown in one bar, and each bar
is divided into components.
The height of each component should correspond to the frequency of the category it represents.
Example 5:
Solution:
Figure 2.5
Multiple Bar Chart
To construct a multiple bar chart, the bars representing the categories are placed side by
side in groups.
The height of each bar represents the frequency of its category.
It is useful for comparing two or more sets of values.
[Multiple bar chart: number of participants (0 to 120) by year (2004, 2005, 2006) for Climbing, Caving, Walking and Sailing]
Figure 2.6
Figure 2.7: Preferred snack choices of students at UUM
2. Pie Chart
Category          Frequency   Relative Frequency   Angle
Horror                22           0.11            360 × 0.11 = 39.6°
Foreign               16           0.08            360 × 0.08 = 28.8°
Science Fiction       16           0.08            360 × 0.08 = 28.8°
Total                200           1.00            360°
Figure 2.9
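The angle of each pie slice is its relative frequency multiplied by 360°. A minimal Python sketch for the rows that survive in the table above, taking the total of 200 from the Total row; the helper name `pie_angle` is hypothetical.

```python
# Rows from the pie-chart table: (category, frequency).
# Only three categories appear in the notes; n = 200 is the Total row.
rows = [("Horror", 22), ("Foreign", 16), ("Science Fiction", 16)]
n = 200

def pie_angle(frequency, total):
    """Central angle in degrees = relative frequency x 360."""
    return frequency / total * 360

angles = {name: pie_angle(f, n) for name, f in rows}
for name, angle in angles.items():
    print(f"{name:<16}{angle:6.1f} degrees")
```

Because the relative frequencies of all categories sum to 1, the angles of all slices sum to 360°.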
In a stem-and-leaf display of quantitative data, each value is divided into two
portions, a stem and a leaf. The leaves for each stem are then shown
separately in the display.
It gives information about the pattern of the data.
It shows which values are repeated frequently.
Example 10:
25 12 9 10 5 12 23 7
36 3 11 12 31 28 37 6
14 41 38 44 13 22 18 19
Solution:
0 | 3 5 6 7 9
1 | 0 1 2 2 2 3 4 8 9
2 | 2 3 5 8
3 | 1 6 7 8
4 | 1 4
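The display above can be reproduced with a short Python sketch that splits each value of Example 10 into a stem (tens digit) and a leaf (units digit). It assumes non-negative integers, so `divmod(v, 10)` gives the split; the function name `stem_and_leaf` is illustrative.

```python
from collections import defaultdict

# Example 10 data (24 values).
data = [25, 12, 9, 10, 5, 12, 23, 7,
        36, 3, 11, 12, 31, 28, 37, 6,
        14, 41, 38, 44, 13, 22, 18, 19]

def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted leaves (units digits).

    Assumes non-negative integers, so divmod(v, 10) splits a value
    into (stem, leaf).
    """
    display = defaultdict(list)
    for v in sorted(values):          # sorting first keeps leaves in order
        stem, leaf = divmod(v, 10)
        display[stem].append(leaf)
    # Include any empty stems between the smallest and largest stem.
    lo, hi = min(display), max(display)
    return {s: display.get(s, []) for s in range(lo, hi + 1)}

result = stem_and_leaf(data)
for stem, leaves in result.items():
    print(stem, "|", " ".join(map(str, leaves)))
```

Keeping empty stems in the output matters for reading the shape: a gap in the data then shows up as a stem with no leaves.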
A frequency distribution for quantitative data lists all the classes and the
number of values that belong to each class.
Data presented in the form of a frequency distribution are called grouped data.
The class boundary is given by the midpoint of the upper limit of one class
and the lower limit of the next class. It is also called the real class limit.
To find the midpoint of the upper limit of the first class and the lower limit of
the second class, we divide the sum of these two limits by 2.
e.g.: $\frac{400 + 401}{2} = 400.5$
Class Width (class size)
e.g.: Midpoint of the 1st class = $\frac{401 + 600}{2} = 500.5$
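Both calculations above are the average of two limits. A minimal Python sketch with two illustrative helpers, `class_boundary` and `class_midpoint` (names are not from the notes), reproducing the two examples:

```python
def class_boundary(upper_limit, next_lower_limit):
    """Boundary between two adjacent classes: average of the two limits."""
    return (upper_limit + next_lower_limit) / 2

def class_midpoint(lower_limit, upper_limit):
    """Midpoint of a class: average of its lower and upper limits."""
    return (lower_limit + upper_limit) / 2

print(class_boundary(400, 401))   # 400.5
print(class_midpoint(401, 600))   # 500.5
```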
1. Number of classes, $c = 1 + 3.3 \log n$
where n is the number of observations in the data set and log is the base-10 logarithm.
2. Class width, $i > \frac{\text{Largest value} - \text{Smallest value}}{\text{Number of classes}} = \frac{\text{Range}}{c}$
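The two rules above can be sketched in Python as follows. Two assumptions are made here and are not stated in the notes: c is rounded to the nearest whole number, and the data are integers, so the width is taken as the first whole number strictly greater than Range / c. The helper names and the largest/smallest values in the last line are hypothetical.

```python
import math

def number_of_classes(n):
    """Raw value of c = 1 + 3.3 log10(n), before rounding."""
    return 1 + 3.3 * math.log10(n)

def class_width(largest, smallest, c):
    """Smallest whole-number width i with i > (largest - smallest) / c.

    Assumes integer data: floor(range / c) + 1 is the first integer
    strictly greater than range / c.
    """
    return math.floor((largest - smallest) / c) + 1

# Example 11 has n = 30 teams.
c_raw = number_of_classes(30)
c = round(c_raw)          # assumption: round to the nearest whole class
print(f"c = {c_raw:.2f}, rounded to {c} classes")

# Hypothetical largest/smallest values, for illustration only.
print(class_width(largest=95, smallest=12, c=6))
```

With n = 30 this gives c ≈ 5.87, slightly different from a hand calculation that rounds log 30 to 1.48 first, but both round to 6 classes. The strict inequality matters for integer data: with a range of 84 and 6 classes, a width of 14 covers only 84 distinct values, so 15 is needed.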
Example 11:
The following data give the total home runs hit by all players of each of the 30 Major
League Baseball teams during the 2004 season.
Solution:
i) Number of classes, c = 1 + 3.3 log 30
                        = 1 + 3.3(1.48)
                        = 5.88 ≈ 6 classes
Class limits   Class boundaries         Relative frequency   Percentage (%)
189 – 206      188.5 less than 206.5    0.2                  20
207 – 224      206.5 less than 224.5    0.1                  10
225 – 242      224.5 less than 242.5    0.1333               13.33
Sum                                     1.0                  100
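The class limits and boundaries in this table follow a fixed pattern, which the Python sketch below generates. It assumes integer data (so each boundary lies 0.5 outside a limit) and a width of 18, read from the rows shown (189 to 206 contains 18 whole numbers). It also assumes the three rows shown are the last three of the 6 classes, which puts the first lower limit at 189 − 3 × 18 = 135. The helper name `build_classes` is hypothetical.

```python
def build_classes(first_lower, width, c):
    """Return (lower, upper, lower_boundary, upper_boundary, midpoint) for c classes.

    Assumes integer data, so consecutive classes are separated by a gap of 1
    and each boundary lies 0.5 below a lower limit / 0.5 above an upper limit.
    """
    rows = []
    for k in range(c):
        lower = first_lower + k * width
        upper = lower + width - 1
        rows.append((lower, upper, lower - 0.5, upper + 0.5, (lower + upper) / 2))
    return rows

classes = build_classes(135, 18, 6)
for lower, upper, lb, ub, mid in classes:
    print(f"{lower} - {upper}   {lb} less than {ub}   midpoint {mid}")
```

Under these assumptions the last three generated classes match the rows above, for example 189 to 206 with boundaries 188.5 and 206.5.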
1. Histograms
2. Polygon
Example 13
[Frequency polygon: frequency (0 to 12) against total home runs, with class boundaries 134.5, 152.5, 170.5, 188.5, 206.5, 224.5 and 242.5 on the horizontal axis]
For a very large data set, as the number of classes is increased (and the width
of classes is decreased), the frequency polygon eventually becomes a smooth
curve called a frequency distribution curve or simply a frequency curve.
2.3.5 Shape of Histogram
The shape of a histogram is described in the same way as that of a polygon.
The most common shapes are:
(i) Symmetric
2.3.6 Cumulative Frequency Distributions
Ogive
Earnings   Frequency
30 – 39        5
40 – 49        6
50 – 59        6
60 – 69        3
70 – 79        3
80 – 89        7
Total         30

Earnings          Cumulative Frequency
Less than 29.5            0
Less than 39.5            5
Less than 49.5           11
Less than 59.5           17
Less than 69.5           20
Less than 79.5           23
Less than 89.5           30
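The "less than" cumulative frequencies are running totals of the class frequencies, each paired with the upper boundary of its class, plus a starting point of 0 at the lower boundary of the first class. A minimal Python sketch using `itertools.accumulate`, assuming integer class limits so that each boundary is a limit ± 0.5; the function name `less_than_cumulative` is illustrative.

```python
from itertools import accumulate

# Earnings classes and frequencies from the ogive example.
classes = [(30, 39), (40, 49), (50, 59), (60, 69), (70, 79), (80, 89)]
freqs = [5, 6, 6, 3, 3, 7]

def less_than_cumulative(classes, freqs):
    """Points (upper boundary, cumulative frequency) for a 'less than' ogive.

    Starts at the lower boundary of the first class with cumulative 0,
    assuming integer class limits (boundary = limit -/+ 0.5).
    """
    points = [(classes[0][0] - 0.5, 0)]
    for (_, upper), cf in zip(classes, accumulate(freqs)):
        points.append((upper + 0.5, cf))
    return points

ogive = less_than_cumulative(classes, freqs)
for boundary, cf in ogive:
    print(f"Less than {boundary}: {cf}")
```

Plotting these points and joining them with straight lines gives the ogive; the last cumulative frequency always equals the total number of observations.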
[Ogive: cumulative frequency (0 to 35) against earnings, plotted at the boundaries 29.5, 39.5, 49.5, 59.5, 69.5, 79.5 and 89.5]
Figure 2.17
[Cumulative frequency graph: cumulative frequency (0 to 35) against earnings, with boundaries 29.5 to 89.5 on the horizontal axis]
Figure 2.18