
Chapter 2:

Summarizing Data

2.1 Introduction

• Raw data - Data recorded in the sequence in which they are collected, before they are processed or ranked.

• Array data - Raw data that are arranged in ascending or descending order.
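As a small illustration of the difference, the sketch below turns raw data into array data by sorting. The eight values are borrowed from the first row of Example 10 purely for convenience.

```python
# Raw data: values in the order in which they were collected
# (for illustration, the first eight values of Example 10)
raw = [25, 12, 9, 10, 5, 12, 23, 7]

# Array data: the same values arranged in ascending or descending order
ascending = sorted(raw)
descending = sorted(raw, reverse=True)

print(ascending)   # [5, 7, 9, 10, 12, 12, 23, 25]
print(descending)  # [25, 23, 12, 12, 10, 9, 7, 5]
```

Sorting does not change the values themselves, only their order, so the raw list is left intact.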

Example 1:

Here is a list of questions asked in a large statistics class and the “raw data” given by one of the students:

a) What is your sex (m=male, f=female)?

Answer (raw data): m

b) How many hours did you sleep last night?

Answer: 5 hours

c) What is your height in inches?

Answer: 67 inches

d) What’s the fastest you’ve ever driven a car (mph)?

Answer: 110 mph

Example 2:

Quantitative raw data

Qualitative raw data

• These data are also called ungrouped data.

2.2 Organizing and Graphing Qualitative Data

2.2.1 Frequency Distributions / Table
2.2.2 Relative Frequency and Percentage Distribution
2.2.3 Graphical Presentation of Qualitative Data

2.2.1 Frequency Distributions / Table

• A frequency distribution for qualitative data lists all categories and the number of elements that belong to each of the categories.
• It shows how the frequencies are distributed over the various categories.
• It is also called a frequency distribution table or simply a frequency table.
• The number of elements (for example, students) that belong to a certain category is called the frequency of that category.

2.2.2 Relative Frequency and Percentage Distribution

• A relative frequency distribution is a listing of all categories along with their relative frequencies (given as proportions or percentages).
• It is common to give the frequency and relative frequency distributions together.
• The relative frequency and percentage of a category are calculated as follows:

$$\text{Relative frequency of a category} = \frac{\text{Frequency of that category}}{\text{Sum of all frequencies}}$$

$$\text{Percentage} = (\text{Relative frequency}) \times 100$$

Example 3:

A sample of UUM staff-owned vehicles produced by Proton was identified and the model of each was noted. The resulting sample follows (W = Wira, Is = Iswara, Wj = Waja, St = Satria, P = Perdana, Sv = Savvy):

W W P Is Is P Is W St Wj
Is W W Wj Is W W Is W Wj
Wj Is Wj Sv W W W Wj St W
Wj Sv W Is P Sv Wj Wj W W
St W W W W St St P Wj Sv

Construct a frequency distribution table for these data with their relative frequency
and percentage.


Category   Frequency   Relative Frequency   Percentage (%)
Wira       19          19/50 = 0.38         38
Iswara     8           0.16                 16
Perdana    4           0.08                 8
Waja       10          0.20                 20
Satria     5           0.10                 10
Savvy      4           0.08                 8
Total      50          1.00                 100
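The table above can be checked with a short script. This is a minimal sketch, assuming the raw codes are typed in exactly as listed in Example 3; it counts each category with `collections.Counter` and then applies the relative frequency and percentage formulas.

```python
from collections import Counter

# Models recorded in Example 3 (W = Wira, Is = Iswara, Wj = Waja,
# St = Satria, P = Perdana, Sv = Savvy)
raw = """
W W P Is Is P Is W St Wj
Is W W Wj Is W W Is W Wj
Wj Is Wj Sv W W W Wj St W
Wj Sv W Is P Sv Wj Wj W W
St W W W W St St P Wj Sv
""".split()

names = {"W": "Wira", "Is": "Iswara", "P": "Perdana",
         "Wj": "Waja", "St": "Satria", "Sv": "Savvy"}

freq = Counter(raw)            # frequency of each category
total = sum(freq.values())     # sum of all frequencies

table = {}
for code, name in names.items():
    rel = freq[code] / total                       # relative frequency
    table[name] = (freq[code], rel, rel * 100)     # (f, relative f, percentage)

for name, (f, rel, pct) in table.items():
    print(f"{name:<8}{f:>4}{rel:>8.2f}{pct:>8.1f}")
print(f"{'Total':<8}{total:>4}{1:>8.2f}{100:>8.1f}")
```

Counting by machine avoids tally mistakes, which become likely once a data set has more than a few dozen values.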

2.2.3 Graphical Presentation of Qualitative Data

1. Bar Graphs

• A bar graph is a graph made of bars whose heights represent the frequencies of the respective categories.
• Such a graph is most helpful when you have many categories to represent.
• Notice that a gap is inserted between adjacent bars.
• Types of bar chart include the simple (vertical) bar chart, the horizontal bar chart, the component bar chart and the multiple bar chart.

Simple/ Vertical Bar Chart

• To construct a vertical bar chart, mark the various categories on the horizontal axis and mark the frequencies on the vertical axis.

Figure 2.1
Horizontal Bar Chart

• To construct a horizontal bar chart, mark the various categories on the vertical axis and mark the frequencies on the horizontal axis.
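Bar charts are normally drawn by hand or with graphing software. As a rough, dependency-free sketch of the horizontal layout, the hypothetical helper `horizontal_bar_chart` below prints one row per category (the categories run down the vertical axis), with a bar whose length equals the frequency. The frequencies are those of Example 3.

```python
def horizontal_bar_chart(freqs, symbol="#"):
    """Return the lines of a text-mode horizontal bar chart: one row per category
    (categories down the vertical axis), with a bar whose length equals the frequency."""
    width = max(len(name) for name in freqs)
    return [f"{name:<{width}} | {symbol * f} {f}" for name, f in freqs.items()]

# Frequencies from Example 3
vehicles = {"Wira": 19, "Iswara": 8, "Perdana": 4,
            "Waja": 10, "Satria": 5, "Savvy": 4}

for line in horizontal_bar_chart(vehicles):
    print(line)
```

Swapping the roles of the two axes (categories along the bottom, bar heights upward) gives the vertical bar chart described earlier.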

Example 4: Refer to Example 3.

Figure 2.3: UUM staff-owned vehicles produced by Proton (horizontal bar chart; vertical axis: types of vehicle, horizontal axis: frequency from 0 to 20)

• Another example of a horizontal bar chart is shown in Figure 2.4.

Figure 2.4: Number of students at Diversity College who are immigrants, by last country of permanent residence

Component Bar Chart

• To construct a component bar chart, all categories are placed in one bar, and every bar is divided into components.
• The height of each component should match the frequency it represents.

Example 5:

Suppose we want to illustrate the information below, representing the number of people participating in the activities offered by an outdoor pursuits centre during June of three consecutive years.

Activity   2004   2005   2006
Climbing   21     34     36
Caving     10     12     21
Walking    75     85     100
Sailing    36     36     40
Total      142    167    197


Figure 2.5
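The stacking step of a component bar chart can be sketched as follows, using the Example 5 counts. The helper name `component_segments` is hypothetical; it returns where each component starts and ends within one year's bar, so that each component's height equals its frequency.

```python
# Example 5: participants per activity in June of each year
data = {
    "Climbing": {2004: 21, 2005: 34, 2006: 36},
    "Caving":   {2004: 10, 2005: 12, 2006: 21},
    "Walking":  {2004: 75, 2005: 85, 2006: 100},
    "Sailing":  {2004: 36, 2005: 36, 2006: 40},
}

def component_segments(data, year):
    """Stack the categories of one year into a single bar.
    Each component runs from bottom to top; its height (top - bottom)
    equals the frequency it represents, and the last top is the bar total."""
    segments = []
    bottom = 0
    for activity, counts in data.items():
        top = bottom + counts[year]
        segments.append((activity, bottom, top))
        bottom = top
    return segments

for year in (2004, 2005, 2006):
    print(year, component_segments(data, year))
```

The top of the last component is the full height of the bar, which is the column total for that year (for 2006, 36 + 21 + 100 + 40 = 197).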

Multiple Bar Chart

• To construct a multiple bar chart, the bars representing the different categories are placed side by side in groups.
• The height of each bar represents the frequency of its category.
• It is useful for making comparisons between two or more values.

Example 6: Refer to Example 5.

Figure 2.6: Activities Breakdown (June), a multiple bar chart of the number of participants in each activity, grouped by year (2004, 2005, 2006)

• Another example of a horizontal bar chart is shown in Figure 2.7.

Figure 2.7: Preferred snack choices of students at UUM

2. Pie Chart

• A pie chart is a circle divided into portions that represent the relative frequencies or percentages of a population or a sample belonging to different categories.
• It is an alternative to the bar chart and is useful for summarizing a single categorical variable when there are not too many categories.
• The chart makes it easy to compare the relative sizes of the classes/categories.
• The whole pie represents the total sample or population. The pie is divided into different portions that represent the different categories.
• To construct a pie chart, we multiply 360° by the relative frequency of each category to obtain the degree measure, or size of the angle, for the corresponding category.

Example 7 (Table 2.6 and Figure 2.8):

Table 2.6 Figure 2.8

Example 8 (Table 2.7 and Figure 2.9):

Movie             Frequency   Relative Frequency   Angle Size
Comedy            54          0.27                 360 × 0.27 = 97.2°
Action            36          0.18                 360 × 0.18 = 64.8°
Romance           28          0.14                 360 × 0.14 = 50.4°
Drama             28          0.14                 360 × 0.14 = 50.4°
Horror            22          0.11                 360 × 0.11 = 39.6°
Foreign           16          0.08                 360 × 0.08 = 28.8°
Science Fiction   16          0.08                 360 × 0.08 = 28.8°
Total             200         1.00                 360°

Figure 2.9
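The angle sizes in Example 8 follow directly from the construction rule for pie charts (angle = 360° × relative frequency). A minimal sketch that reproduces the table from the frequencies alone:

```python
# Example 8: movie preferences (frequencies)
movies = {"Comedy": 54, "Action": 36, "Romance": 28, "Drama": 28,
          "Horror": 22, "Foreign": 16, "Science Fiction": 16}

total = sum(movies.values())
# Angle of each slice = 360 degrees x relative frequency (f / total)
angles = {m: 360 * f / total for m, f in movies.items()}

for m, a in angles.items():
    print(f"{m:<16} {movies[m] / total:.2f} {a:.1f}°")
print(f"{'Total':<16} {1:.2f} {sum(angles.values()):.1f}°")
```

Because the relative frequencies sum to 1, the angles always sum to 360°, which is a quick check on hand calculations.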

2.3 Organizing and Graphing Quantitative Data

2.3.1 Stem-and-Leaf Display
2.3.2 Frequency Distributions
2.3.3 Relative Frequency and Percentage Distributions
2.3.4 Graphing Grouped Data
2.3.5 Shapes of Histograms
2.3.6 Cumulative Frequency Distributions

2.3.1 Stem-and-Leaf Display

• In a stem-and-leaf display of quantitative data, each value is divided into two portions, a stem and a leaf. The leaves for each stem are then shown separately in a display.
• It shows the pattern of the data.
• It shows which values are repeated frequently.

Example 10:

25 12 9 10 5 12 23 7
36 3 11 12 31 28 37 6
14 41 38 44 13 22 18 19

Stem | Leaf
0 | 3 5 6 7 9
1 | 0 1 2 2 2 3 4 8 9
2 | 2 3 5 8
3 | 1 6 7 8
4 | 1 4
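The display above can be generated mechanically. This sketch assumes non-negative whole numbers and uses the last digit as the leaf and the remaining digits as the stem (the tens digit for the two-digit data of Example 10); `stem_and_leaf` is a hypothetical helper name.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Stem-and-leaf display for non-negative integers:
    the leaf is the last digit and the stem is the remaining digits."""
    leaves = defaultdict(list)
    for v in values:
        stem, leaf = divmod(v, 10)
        leaves[stem].append(leaf)
    lines = []
    for stem in range(min(leaves), max(leaves) + 1):
        row = " ".join(str(leaf) for leaf in sorted(leaves[stem]))
        lines.append(f"{stem} | {row}")
    return lines

# Data from Example 10
data = [25, 12, 9, 10, 5, 12, 23, 7,
        36, 3, 11, 12, 31, 28, 37, 6,
        14, 41, 38, 44, 13, 22, 18, 19]

for line in stem_and_leaf(data):
    print(line)
```

Sorting the leaves within each stem makes repeated values (such as the three 12s) easy to spot.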

2.3.2 Frequency Distributions

• A frequency distribution for quantitative data lists all the classes and the number of values that belong to each class.
• Data presented in the form of a frequency distribution are called grouped data.

• A class boundary is the midpoint between the upper limit of one class and the lower limit of the next class. Class boundaries are also called real class limits.
• To find this midpoint, we divide the sum of the two limits by 2. For example, the boundary between a class whose upper limit is 400 and the next class, whose lower limit is 401, is

$$\frac{400 + 401}{2} = 400.5$$

• Class Width (class size)

Class width = Upper boundary – Lower boundary

e.g.: Width of the first class = 600.5 – 400.5 = 200

• Class Midpoint or Mark

$$\text{Class midpoint or mark} = \frac{\text{Lower limit} + \text{Upper limit}}{2}$$

e.g.: Midpoint of the first class:

$$\frac{401 + 600}{2} = 500.5$$
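The three quantities above (boundary, width, midpoint) can be computed together. This sketch assumes, as the 600.5 boundary in the example implies, that the class 401–600 is followed by a class starting at 601 and preceded by one ending at 400; the function names are hypothetical.

```python
def class_boundary(upper_limit, next_lower_limit):
    """Boundary between two adjacent classes: midpoint of the upper limit
    of one class and the lower limit of the next."""
    return (upper_limit + next_lower_limit) / 2

def class_midpoint(lower_limit, upper_limit):
    """Class midpoint (mark): average of the two class limits."""
    return (lower_limit + upper_limit) / 2

# Class 401-600 (assumed neighbours: a class ending at 400, one starting at 601)
lower_boundary = class_boundary(400, 401)    # 400.5
upper_boundary = class_boundary(600, 601)    # 600.5
width = upper_boundary - lower_boundary      # class width = 200.0
midpoint = class_midpoint(401, 600)          # 500.5
print(lower_boundary, upper_boundary, width, midpoint)
```

Note that the width is the difference between the boundaries, not between the limits (600 − 401 = 199 would be one short).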

Constructing Frequency Distribution Tables

Figure 2.9

1. To decide the number of classes, we use Sturges' formula:

$$c = 1 + 3.3 \log_{10} n$$

where c is the number of classes and n is the number of observations in the data set.

2. Class width:

$$i > \frac{\text{Largest value} - \text{Smallest value}}{\text{Number of classes}}$$

This class width is rounded up to a convenient number.

3. Lower Limit of the First Class or the Starting Point

• Use the smallest value in the data set.

Example 11:

The following data give the total home runs hit by all players of each of the 30 Major League Baseball teams during the 2004 season.


i) Number of classes:

$$c = 1 + 3.3 \log_{10} 30 = 1 + 3.3(1.48) = 5.88 \approx 6 \text{ classes}$$

ii) Class width:

$$i > \frac{242 - 135}{6} = 17.83, \text{ so we take } i = 18$$

iii) Starting Point = 135

Table 2.10: Frequency Distribution for Data of Table 2.9

Total Home Runs   Tally          f
135 – 152         ||||/ ||||/    10
153 – 170         ||             2
171 – 188         ||||/          5
189 – 206         ||||/ |        6
207 – 224         |||            3
225 – 242         ||||           4
                                 Σf = 30
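The three construction steps can be sketched in code and checked against Example 11 (n = 30, largest value 242, smallest value 135). Two rounding choices here are assumptions, since the notes only show the worked numbers: the number of classes is rounded up, and the width is taken as the smallest whole number strictly greater than the ratio, which suits integer data. The helper names are hypothetical.

```python
import math

def number_of_classes(n):
    """Sturges' formula c = 1 + 3.3 log10(n), rounded up (assumed rounding rule)."""
    return math.ceil(1 + 3.3 * math.log10(n))

def class_width(largest, smallest, c):
    """Smallest whole number strictly greater than (largest - smallest) / c."""
    return math.floor((largest - smallest) / c) + 1

def class_limits(start, width, c):
    """Lower and upper limits of c classes of integer data, from the starting point."""
    return [(start + k * width, start + (k + 1) * width - 1) for k in range(c)]

c = number_of_classes(30)
i = class_width(242, 135, c)
print(c, i)
for lower, upper in class_limits(135, i, c):
    print(f"{lower} - {upper}")
```

With these inputs the limits match the classes of Table 2.10, from 135 – 152 up to 225 – 242.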

2.3.3 Relative Frequency and Percentage Distributions

$$\text{Relative frequency of a class} = \frac{\text{Frequency of that class}}{\text{Sum of all frequencies}}$$

$$\text{Percentage} = (\text{Relative frequency}) \times 100$$

Example 12 (refer to Example 11)

Table 2.11: Relative Frequency and Percentage Distributions

Total Home Runs   Class Boundaries           Relative Frequency   Percentage (%)
135 – 152         134.5 to less than 152.5   0.3333               33.33
153 – 170         152.5 to less than 170.5   0.0667               6.67
171 – 188         170.5 to less than 188.5   0.1667               16.67
189 – 206         188.5 to less than 206.5   0.2                  20
207 – 224         206.5 to less than 224.5   0.1                  10
225 – 242         224.5 to less than 242.5   0.1333               13.33
Sum                                          1.0                  100
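Table 2.11 can be reproduced from the class limits and frequencies of Table 2.10. This sketch assumes integer data with consecutive classes one unit apart, so each boundary lies 0.5 below the lower limit or 0.5 above the upper limit.

```python
# Class limits and frequencies from Table 2.10
classes = [(135, 152), (153, 170), (171, 188), (189, 206), (207, 224), (225, 242)]
freqs = [10, 2, 5, 6, 3, 4]

total = sum(freqs)
rows = []
for (lower, upper), f in zip(classes, freqs):
    # Boundaries sit halfway between adjacent integer limits
    rows.append((lower - 0.5, upper + 0.5, f / total, 100 * f / total))

for lo, hi, rel, pct in rows:
    print(f"{lo} to less than {hi}  {rel:.4f}  {pct:.2f}")
```

Rounded to four decimals the relative frequencies may not add to exactly 1 by hand, but the unrounded values always do.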

2.3.4 Graphing Grouped Data

1. Histograms

• A histogram is a graph in which the class boundaries are marked on the horizontal axis and either the frequencies, relative frequencies, or percentages are marked on the vertical axis. The frequencies, relative frequencies or percentages are represented by the heights of the bars.
• In a histogram, the bars are drawn adjacent to each other, with no gaps between them, and there is a space between the y-axis and the first bar.

Example 13 (refer to Example 11)

Figure 2.10: Frequency histogram for Table 2.10 (horizontal axis: class boundaries 134.5, 152.5, 170.5, 188.5, 206.5, 224.5, 242.5)

2. Polygon

• A graph formed by joining the midpoints of the tops of successive bars in a histogram with straight lines is called a polygon.

Example 13

Figure 2.11: Frequency polygon for Table 2.10 (horizontal axis: total home runs, class boundaries 134.5 to 242.5)
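Since the top of each histogram bar is centred on its class midpoint, the points joined by the polygon are simply (class midpoint, frequency). A minimal sketch for Table 2.10 follows; some textbooks also add a zero-frequency point one class width beyond each end to close the polygon, a step omitted here.

```python
# Class limits and frequencies from Table 2.10
classes = [(135, 152), (153, 170), (171, 188), (189, 206), (207, 224), (225, 242)]
freqs = [10, 2, 5, 6, 3, 4]

# Midpoint of the top of each bar: (class midpoint, frequency)
points = [((lower + upper) / 2, f) for (lower, upper), f in zip(classes, freqs)]
print(points)
```

Joining these points with straight lines, in order, gives the frequency polygon.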

• For a very large data set, as the number of classes is increased (and the width of classes is decreased), the frequency polygon eventually becomes a smooth curve called a frequency distribution curve or simply a frequency curve.

Figure 2.12: Frequency distribution curve

2.3.5 Shapes of Histograms

• The shape of a histogram is described in the same way as the shape of its polygon or, for a very large data set, its frequency curve.
• The most common shapes are:
(i) Symmetric

Figure 2.13 & 2.14: Symmetric histograms

(ii) Right skewed and (iii) Left skewed

Figure 2.15 & 2.16: Right skewed and Left skewed

2.3.6 Cumulative Frequency Distributions

• A cumulative frequency distribution gives the total number of values that fall below the upper boundary of each class.

Example 14: Using the frequency distribution of Table 2.11:

Total Home Runs   Class Boundaries           Cumulative Frequency
135 – 152         134.5 to less than 152.5   10
153 – 170         152.5 to less than 170.5   10 + 2 = 12
171 – 188         170.5 to less than 188.5   10 + 2 + 5 = 17
189 – 206         188.5 to less than 206.5   10 + 2 + 5 + 6 = 23
207 – 224         206.5 to less than 224.5   10 + 2 + 5 + 6 + 3 = 26
225 – 242         224.5 to less than 242.5   10 + 2 + 5 + 6 + 3 + 4 = 30
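Each cumulative frequency is a running total of the class frequencies, which `itertools.accumulate` computes directly. A short sketch for Example 14:

```python
from itertools import accumulate

# Frequencies of the six classes in Table 2.10, in class order
freqs = [10, 2, 5, 6, 3, 4]
upper_boundaries = [152.5, 170.5, 188.5, 206.5, 224.5, 242.5]

# Running total: number of values below each class's upper boundary
cumulative = list(accumulate(freqs))
for b, cf in zip(upper_boundaries, cumulative):
    print(f"less than {b}: {cf}")
```

The last cumulative frequency always equals the total number of observations (30 here).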


• An ogive is a curve drawn for the cumulative frequency distribution by joining with straight lines the dots marked above the upper boundaries of classes at heights equal to the cumulative frequencies of the respective classes.
• There are two types of ogive:
(i) the "less than" ogive
(ii) the "greater than" ogive
• First, build a table of cumulative frequencies.

Example 15: (Ogive Less Than)

Earnings (RM)   Number of students (f)
30 – 39         5
40 – 49         6
50 – 59         6
60 – 69         3
70 – 79         3
80 – 89         7
Total           30

Earnings (RM)     Cumulative Frequency (F)
Less than 29.5    0
Less than 39.5    5
Less than 49.5    11
Less than 59.5    17
Less than 69.5    20
Less than 79.5    23
Less than 89.5    30

Figure 2.17: Ogive less than (horizontal axis: earnings (RM), boundaries 29.5 to 89.5; vertical axis: cumulative frequency)

Example 16: (Ogive Greater Than)

Earnings (RM)   Number of students (f)
30 – 39         5
40 – 49         6
50 – 59         6
60 – 69         3
70 – 79         3
80 – 89         7
Total           30

Earnings (RM)     Cumulative Frequency (F)
More than 29.5    30
More than 39.5    25
More than 49.5    19
More than 59.5    13
More than 69.5    10
More than 79.5    7
More than 89.5    0

Figure 2.18: Ogive greater than (horizontal axis: earnings (RM), boundaries 29.5 to 89.5; vertical axis: cumulative frequency)
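The plotting points of both ogives in Examples 15 and 16 come from the same frequencies. In this sketch the "less than" values are running totals starting from 0, and each "more than" value is the total minus the corresponding "less than" value.

```python
from itertools import accumulate

# Examples 15 and 16: class boundaries (RM) and number of students per class
boundaries = [29.5, 39.5, 49.5, 59.5, 69.5, 79.5, 89.5]
freqs = [5, 6, 6, 3, 3, 7]
total = sum(freqs)

# "Less than" ogive: 0 below the first boundary, then running totals
less_than = [0] + list(accumulate(freqs))
# "More than" ogive: values above each boundary = total minus values below it
more_than = [total - cf for cf in less_than]

less_points = list(zip(boundaries, less_than))
more_points = list(zip(boundaries, more_than))
print(less_points)
print(more_points)
```

Plotting both sets of points on the same axes shows the two ogives crossing at the height total / 2, since the "less than" and "more than" values always add to the total.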

