Chapter 2-190810 074149

Chapter 2:
Summarizing Data
2.1 Introduction
• Raw data - Data recorded in the sequence in which they are collected, before they are processed or ranked.
• Array data - Raw data arranged in ascending or descending order.
Example 1:
Here is a list of questions asked in a large statistics class and the “raw data” given by
one of the students:
a) What is your sex (m=male, f=female)?

Answer (raw data): m
b) How many hours did you sleep last night?

Answer: 5 hours
c) What is your height in inches?

Answer: 67 inches
d) What’s the fastest you’ve ever driven a car (mph)?

Answer: 110 mph
Example 2:
Quantitative raw data
Qualitative raw data
• These data are also called ungrouped data.
2.2 Organizing and Graphing Qualitative Data
2.2.1 Frequency Distributions/ Table

2.2.2 Relative Frequency and Percentage Distribution
2.2.3 Graphical Presentation of Qualitative Data
2.2.1 Frequency Distributions / Table
• A frequency distribution for qualitative data lists all categories and the number of elements that belong to each of the categories.
• It shows how the frequencies are distributed over the various categories.
• It is also called a frequency distribution table or simply a frequency table.
• The number of elements (for example, students) that belong to a certain category is called the frequency of that category.
2.2.2 Relative Frequency and Percentage Distribution
• A relative frequency distribution is a listing of all categories along with their relative frequencies (given as proportions or percentages).
• It is commonplace to give the frequency and relative frequency distributions together.
• Calculating the relative frequency and percentage of a category:
Relative frequency of a category = (Frequency of that category) / (Sum of all frequencies)
Percentage = (Relative frequency) × 100
Example 3:
A sample of UUM staff-owned vehicles produced by Proton was identified and the
make of each noted. The resulting sample follows (W = Wira, Is = Iswara, Wj = Waja,
St = Satria, P = Perdana, Sv = Savvy):
W W P Is Is P Is W St Wj
Is W W Wj Is W W Is W Wj
Wj Is Wj Sv W W W Wj St W
Wj Sv W Is P Sv Wj Wj W W
St W W W W St St P Wj Sv
Construct a frequency distribution table for these data with their relative frequency
and percentage.
Solution:
Category   Frequency   Relative Frequency   Percentage (%)
Wira       19          19/50 = 0.38         0.38 × 100 = 38
Iswara     8           0.16                 16
Perdana    4           0.08                 8
Waja       10          0.20                 20
Satria     5           0.10                 10
Savvy      4           0.08                 8
Total      50          1.00                 100
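The frequency table in Example 3 can also be checked by machine. The following is a minimal Python sketch, assuming the 50 observations are typed in as the abbreviation codes listed in the example; the names `sample` and `frequency_table` are illustrative and do not come from the original notes.

```python
from collections import Counter

# Raw data from Example 3 (W = Wira, Is = Iswara, Wj = Waja,
# St = Satria, P = Perdana, Sv = Savvy).
sample = """
W W P Is Is P Is W St Wj
Is W W Wj Is W W Is W Wj
Wj Is Wj Sv W W W Wj St W
Wj Sv W Is P Sv Wj Wj W W
St W W W W St St P Wj Sv
""".split()


def frequency_table(values):
    """Return {category: (frequency, relative frequency, percentage)}."""
    counts = Counter(values)
    total = sum(counts.values())
    return {
        category: (f, f / total, 100 * f / total)
        for category, f in counts.items()
    }


table = frequency_table(sample)
for category, (f, rel, pct) in table.items():
    print(f"{category:<3} {f:>3} {rel:>6.2f} {pct:>6.1f}")
```

Each row holds the frequency, the relative frequency (frequency divided by the sum of all frequencies) and the percentage (relative frequency times 100), in that order.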
2.2.3 Graphical Presentation of Qualitative Data
1. Bar Graphs
• A graph made of bars whose heights represent the frequencies of the respective categories.
• Such a graph is most helpful when you have many categories to represent.
• Note that a gap is left between adjacent bars.
• Types of bar graph: simple (vertical) bar chart, horizontal bar chart, component bar chart and multiple bar chart.
Simple/ Vertical Bar Chart
• To construct a vertical bar chart, mark the various categories on the horizontal axis and the frequencies on the vertical axis.
Figure 2.1
Horizontal Bar Chart
• To construct a horizontal bar chart, mark the various categories on the vertical axis and the frequencies on the horizontal axis.
Example 4: Refer to Example 3.
Figure 2.3: UUM Staff-owned Vehicles Produced by Proton (horizontal bar chart; vertical axis: Types of Vehicle; horizontal axis: Frequency, 0 to 20)
• Another example of a horizontal bar chart: Figure 2.4
Figure 2.4: Number of students at Diversity College who are immigrants, by last country of permanent residence
Component Bar Chart
• In a component bar chart, all categories are shown in one bar, and each bar is divided into components.
• The height of each component should tally with the frequency it represents.
Example 5:
Suppose we want to illustrate the information below, representing the number of people participating in the activities offered by an outdoor pursuits centre during June of three consecutive years.
            2004   2005   2006
Climbing    21     34     36
Caving      10     12     21
Walking     75     85     100
Sailing     36     36     40
Total       142    167    197
Solution:
Figure 2.5
Multiple Bar Chart
• In a multiple bar chart, the bars representing the categories are gathered in groups.
• The height of each bar represents the frequency of its category.
• Useful for making comparisons (two or more values).
Example 6: Refer to Example 5.
Figure 2.6: Activities Breakdown (June) (multiple bar chart; horizontal axis: Year, 2004 to 2006; vertical axis: Number of participants; bars: Climbing, Caving, Walking, Sailing)
• Another example of a horizontal bar chart: Figure 2.7
Figure 2.7: Preferred snack choices of students at UUM
2. Pie Chart
• A circle divided into portions that represent the relative frequencies or percentages of a population or a sample belonging to different categories.
• An alternative to the bar chart, useful for summarizing a single categorical variable when there are not too many categories.
• The chart makes it easy to compare the relative sizes of the classes/categories.
• The whole pie represents the total sample or population. The pie is divided into portions that represent the different categories.
• To construct a pie chart, multiply 360° by the relative frequency of each category to obtain the angle (in degrees) of the corresponding sector.
Example 7 (Table 2.6 and Figure 2.8):
Table 2.6 Figure 2.8
Example 8 (Table 2.7 and Figure 2.9):
Movie Genres      Frequency   Relative Frequency   Angle Size
Comedy            54          0.27                 360 × 0.27 = 97.2°
Action            36          0.18                 360 × 0.18 = 64.8°
Romance           28          0.14                 360 × 0.14 = 50.4°
Drama             28          0.14                 360 × 0.14 = 50.4°
Horror            22          0.11                 360 × 0.11 = 39.6°
Foreign           16          0.08                 360 × 0.08 = 28.8°
Science Fiction   16          0.08                 360 × 0.08 = 28.8°
Total             200         1.00                 360°
Figure 2.9
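The angle column of Example 8 follows directly from the rule "360° times the relative frequency". A minimal Python sketch of that calculation is given here, using the genre frequencies from the example; the function name `pie_angles` is an illustrative choice, not part of the notes.

```python
# Frequencies from Example 8 (movie genres).
frequencies = {
    "Comedy": 54,
    "Action": 36,
    "Romance": 28,
    "Drama": 28,
    "Horror": 22,
    "Foreign": 16,
    "Science Fiction": 16,
}


def pie_angles(freqs):
    """Map each category to its sector angle: 360 x relative frequency."""
    total = sum(freqs.values())
    return {category: 360 * f / total for category, f in freqs.items()}


angles = pie_angles(frequencies)
for category, angle in angles.items():
    print(f"{category:<16} {angle:6.1f} degrees")
```

Because the relative frequencies sum to 1, the sector angles always sum to 360°, which is a quick check on a hand-built table.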
2.3 Organizing and Graphing Quantitative Data
2.3.1 Stem and Leaf Display

2.3.2 Frequency Distribution
2.3.3 Relative Frequency and Percentage Distributions.
2.3.4 Graphing Grouped Data
2.3.5 Shapes of Histogram
2.3.6 Cumulative Frequency Distributions.
2.3.1 Stem-and-Leaf Display
• In a stem-and-leaf display of quantitative data, each value is divided into two portions: a stem and a leaf. The leaves for each stem are then shown separately in a display.
• It shows the pattern of the data.
• It reveals which values are repeated frequently.
Example 10:
25 12 9 10 5 12 23 7
36 3 11 12 31 28 37 6
14 41 38 44 13 22 18 19
Solution:
0 | 3 5 6 7 9
1 | 0 1 2 2 2 3 4 8 9
2 | 2 3 5 8
3 | 1 6 7 8
4 | 1 4
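The display in Example 10 can be built mechanically by splitting each value into its tens digit (stem) and units digit (leaf). The Python sketch below assumes non-negative whole numbers; the function name `stem_and_leaf` is illustrative.

```python
from collections import defaultdict

# Data from Example 10.
data = [25, 12, 9, 10, 5, 12, 23, 7,
        36, 3, 11, 12, 31, 28, 37, 6,
        14, 41, 38, 44, 13, 22, 18, 19]


def stem_and_leaf(values):
    """Group values by stem (value // 10) with leaves (value % 10) in order."""
    display = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        display[stem].append(leaf)
    return dict(display)


display = stem_and_leaf(data)
for stem in sorted(display):
    leaves = " ".join(str(leaf) for leaf in display[stem])
    print(f"{stem} | {leaves}")
```

Sorting the values first means the leaves on each stem come out in ascending order, which makes repeated values such as 12 easy to spot.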
2.3.2 Frequency Distributions
• A frequency distribution for quantitative data lists all the classes and the number of values that belong to each class.
• Data presented in the form of a frequency distribution are called grouped data.
• The class boundary is given by the midpoint of the upper limit of one class and the lower limit of the next class. It is also called the real class limit.
• To find the midpoint of the upper limit of the first class and the lower limit of the second class, divide the sum of these two limits by 2.
e.g.: (400 + 401) / 2 = 400.5
• Class Width (class size)
Class width = Upper boundary – Lower boundary
e.g.: Width of the first class = 600.5 – 400.5 = 200
• Class Midpoint or Mark
Class midpoint or mark = (Lower limit + Upper limit) / 2
e.g.: Midpoint of the first class = (401 + 600) / 2 = 500.5
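The three quantities above (boundaries, width and midpoint) can be computed together for any class. The following Python sketch assumes whole-number class limits, so that neighbouring classes are one unit apart; the function `class_summary` and its `gap` parameter are hypothetical names introduced for illustration.

```python
def class_summary(lower_limit, upper_limit, gap=1):
    """Return (lower boundary, upper boundary, width, midpoint) of a class.

    `gap` is the distance between the upper limit of one class and the
    lower limit of the next (assumed to be 1 for whole-number data).
    """
    lower_boundary = lower_limit - gap / 2
    upper_boundary = upper_limit + gap / 2
    width = upper_boundary - lower_boundary
    midpoint = (lower_limit + upper_limit) / 2
    return lower_boundary, upper_boundary, width, midpoint


# The class 401-600 used in the worked figures above.
lower_b, upper_b, width, midpoint = class_summary(401, 600)
print(lower_b, upper_b, width, midpoint)
```

For the class 401 to 600 this reproduces the boundaries 400.5 and 600.5, the width 200 and the midpoint 500.5 quoted in the text.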
Constructing Frequency Distribution Tables

Figure 2.9
1. To decide the number of classes, we use Sturges' formula:
c = 1 + 3.3 log n
where c is the number of classes, n is the number of observations in the data set, and log is the base-10 logarithm.
2. Class width,
i > (Largest value – Smallest value) / (Number of classes)
i.e. i > Range / c
This class width is rounded to a convenient number.
3. Lower limit of the first class (the starting point)
• Use the smallest value in the data set.
Example 11:
The following data give the total home runs hit by all players of each of the 30 Major
League Baseball teams during the 2004 season.
Solution:
i) Number of classes, c = 1 + 3.3 log 30
= 1 + 3.3(1.48)
= 5.88 ≈ 6 classes
ii) Class width, i > (242 – 135) / 6
i > 17.8
i ≈ 18
iii) Starting point = 135
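Steps i) to iii) can be sketched in Python as follows. Two rounding choices here are assumptions rather than rules from the notes: the number of classes is rounded up to a whole number, and the class width is taken as the smallest whole number strictly greater than Range / c. The function names are illustrative.

```python
import math


def number_of_classes(n):
    """Sturges' formula, c = 1 + 3.3 log10(n), rounded up (an assumption)."""
    return math.ceil(1 + 3.3 * math.log10(n))


def class_limits(smallest, largest, n):
    """Return (lower limit, upper limit) pairs for whole-number data."""
    c = number_of_classes(n)
    # Smallest whole number strictly greater than Range / c, so i > Range / c.
    width = math.floor((largest - smallest) / c) + 1
    return [(smallest + k * width, smallest + (k + 1) * width - 1)
            for k in range(c)]


# Example 11: 30 teams, smallest value 135, largest value 242.
limits = class_limits(135, 242, 30)
for lower, upper in limits:
    print(f"{lower} - {upper}")
```

With n = 30, a smallest value of 135 and a largest value of 242, this gives six classes of width 18 running from 135 – 152 up to 225 – 242.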
Table 2.10 Frequency Distribution for Data of Table 2.9
Total Home Runs   Tally          f
135 – 152         ||||| |||||    10
153 – 170         ||             2
171 – 188         |||||          5
189 – 206         ||||| |        6
207 – 224         |||            3
225 – 242         ||||           4
                                 Σf = 30
2.3.3 Relative Frequency and Percentage Distributions
Relative frequency of a class = (Frequency of that class) / (Sum of all frequencies) = f / Σf
Percentage = (Relative frequency) × 100
Example 12 (Refer to Example 11)
Table 2.11: Relative Frequency and Percentage Distributions
Total Home Runs   Class Boundaries             Relative Frequency   %
135 – 152         134.5 to less than 152.5     0.3333               33.33
153 – 170         152.5 to less than 170.5     0.0667               6.67
171 – 188         170.5 to less than 188.5     0.1667               16.67
189 – 206         188.5 to less than 206.5     0.2                  20
207 – 224         206.5 to less than 224.5     0.1                  10
225 – 242         224.5 to less than 242.5     0.1333               13.33
Sum                                            1.0                  100%
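Table 2.11 can be regenerated from the class limits and frequencies of Table 2.10. The Python sketch below assumes whole-number class limits, so each boundary lies 0.5 beyond the limit; the variable names are illustrative.

```python
# Classes and frequencies from Table 2.10.
classes = [(135, 152), (153, 170), (171, 188),
           (189, 206), (207, 224), (225, 242)]
frequencies = [10, 2, 5, 6, 3, 4]

total = sum(frequencies)
rows = []
for (lower, upper), f in zip(classes, frequencies):
    boundaries = (lower - 0.5, upper + 0.5)  # assumes whole-number limits
    relative = f / total
    rows.append((boundaries, round(relative, 4), round(100 * relative, 2)))

for boundaries, relative, percentage in rows:
    print(boundaries, relative, percentage)
```

Rounding is applied only when a row is stored, so the relative frequencies are computed from the exact ratio f / Σf.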
2.3.4 Graphing Grouped Data
1. Histograms
• A histogram is a graph in which the class boundaries are marked on the horizontal axis and either the frequencies, relative frequencies, or percentages are marked on the vertical axis. The frequencies, relative frequencies or percentages are represented by the heights of the bars.
• In a histogram, the bars are drawn adjacent to each other, and there is a space between the y-axis and the first bar.
Example 13 (Refer to Example 11)
Figure 2.10: Frequency histogram for Table 2.10 (class boundaries on the horizontal axis: 134.5, 152.5, 170.5, 188.5, 206.5, 224.5, 242.5)
2. Polygon
• A graph formed by joining the midpoints of the tops of successive bars in a histogram with straight lines is called a polygon.
Example 13
Figure 2.11: Frequency polygon for Table 2.10 (horizontal axis: Total home runs; vertical axis: Frequency)
• For a very large data set, as the number of classes is increased (and the width of classes is decreased), the frequency polygon eventually becomes a smooth curve called a frequency distribution curve or simply a frequency curve.
Figure 2.12: Frequency distribution curve
2.3.5 Shape of Histogram
• Same as for a polygon.
   • For a very large data set, as the number of classes is increased (and the width of classes is decreased), the frequency polygon eventually becomes a smooth curve called a frequency distribution curve or simply a frequency curve.
• The most common shapes are:
(i) Symmetric
Figure 2.13 & 2.14: Symmetric histograms
(ii) Right skewed and (iii) Left skewed
Figure 2.15 & 2.16: Right skewed and Left skewed
2.3.6 Cumulative Frequency Distributions
• A cumulative frequency distribution gives the total number of values that fall below the upper boundary of each class.
Example 14: Using the frequency distribution of Table 2.11,
Total Home Runs   Class Boundaries             Cumulative Frequency
135 – 152         134.5 to less than 152.5     10
153 – 170         152.5 to less than 170.5     10 + 2 = 12
171 – 188         170.5 to less than 188.5     10 + 2 + 5 = 17
189 – 206         188.5 to less than 206.5     10 + 2 + 5 + 6 = 23
207 – 224         206.5 to less than 224.5     10 + 2 + 5 + 6 + 3 = 26
225 – 242         224.5 to less than 242.5     10 + 2 + 5 + 6 + 3 + 4 = 30
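The cumulative column in Example 14 is a running total of the class frequencies. A minimal Python sketch using the standard library's `itertools.accumulate` is shown here; the variable names are illustrative.

```python
from itertools import accumulate

# Class frequencies and upper boundaries from Example 14.
frequencies = [10, 2, 5, 6, 3, 4]
upper_boundaries = [152.5, 170.5, 188.5, 206.5, 224.5, 242.5]

# Running totals: each entry counts the values below that upper boundary.
cumulative = list(accumulate(frequencies))

for boundary, running_total in zip(upper_boundaries, cumulative):
    print(f"less than {boundary}: {running_total}")
```

The last cumulative frequency always equals the total number of observations, here 30.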
Ogive
• An ogive is a curve drawn for the cumulative frequency distribution by joining with straight lines the dots marked above the upper boundaries of classes at heights equal to the cumulative frequencies of the respective classes.
• Two types of ogive:
(i) ogive less than
(ii) ogive greater than
• First, build a table of cumulative frequencies.
Example 15: (Ogive Less Than)
Earnings (RM)   Number of students (f)
30 – 39         5
40 – 49         6
50 – 59         6
60 – 69         3
70 – 79         3
80 – 89         7
Total           30

Earnings (RM)    Cumulative Frequency (F)
Less than 29.5   0
Less than 39.5   5
Less than 49.5   11
Less than 59.5   17
Less than 69.5   20
Less than 79.5   23
Less than 89.5   30
Figure 2.17 (ogive less than; horizontal axis: Earnings, 29.5 to 89.5; vertical axis: Cumulative Frequency)
Example 16: (Ogive Greater Than)
Earnings (RM)   Number of students (f)
30 – 39         5
40 – 49         6
50 – 59         6
60 – 69         3
70 – 79         3
80 – 89         7
Total           30

Earnings (RM)    Cumulative Frequency (F)
More than 29.5   30
More than 39.5   25
More than 49.5   19
More than 59.5   13
More than 69.5   10
More than 79.5   7
More than 89.5   0
Figure 2.18 (ogive greater than; horizontal axis: Earnings, 29.5 to 89.5; vertical axis: Cumulative Frequency)
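The points plotted in the two ogives of Examples 15 and 16 can be generated from the same frequency column. The Python sketch below computes both sets of points; the function name `ogive_points` is illustrative, and the plotting itself is left out.

```python
from itertools import accumulate

# Class boundaries (RM) and frequencies from Examples 15 and 16.
boundaries = [29.5, 39.5, 49.5, 59.5, 69.5, 79.5, 89.5]
frequencies = [5, 6, 6, 3, 3, 7]


def ogive_points(boundaries, frequencies):
    """Return the (boundary, F) points of the less-than and greater-than ogives."""
    total = sum(frequencies)
    below = [0] + list(accumulate(frequencies))  # values less than each boundary
    above = [total - f for f in below]           # values more than each boundary
    return list(zip(boundaries, below)), list(zip(boundaries, above))


less_than, greater_than = ogive_points(boundaries, frequencies)
print(less_than)
print(greater_than)
```

At every boundary the two cumulative frequencies add up to the total of 30, so the greater-than ogive is the less-than ogive reflected about the height of 15.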