Chapter 2 VISUAL PRESENTATION OF DATA
Number of A grades   Number of students
0                    61
1                    38
2                    17
3                     8
4                     3
5                     2
6                     0
Total               129
Additional Examples A:
Below is an example set of 72 counts of the number of customers entering a shop in a one-hour period.
7 5 8 0 4 2 5 7 0 1 3
3 5 5 1 7 4 7 4 4 7 3
4 4 6 2 3 7 6 7 5 4 5
5 6 1 2 2 5 3 11 6 7 3
5 3 7 8 2 5 3 9 4 5 2
7 6 5 3 3 8 5 5 6 3 2
4 6 5 7 4 4
Our aim is to summarise this set of measurement data. The most common way to do this is to
form a frequency table, in which each row is a different value and alongside each value we
report the number of times it occurs. A frequency table for the above data would look like:
Number of customers   Frequency
0                      2
1                      3
2                      7
3                     11
4                     11
5                     15
6                      7
7                     11
8                      3
9                      1
10                     0
11                     1
Total                 72
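The tallying behind a frequency table can be sketched in Python. This is a minimal illustration rather than a method prescribed by these notes; the function name `frequency_table` is hypothetical, and listing every whole number from the smallest to the largest value (so that a value that never occurs, such as 10 here, appears with frequency 0) is a design choice.

```python
from collections import Counter

# The 72 hourly customer counts listed above, in the order given.
counts = [
    7, 5, 8, 0, 4, 2, 5, 7, 0, 1, 3,
    3, 5, 5, 1, 7, 4, 7, 4, 4, 7, 3,
    4, 4, 6, 2, 3, 7, 6, 7, 5, 4, 5,
    5, 6, 1, 2, 2, 5, 3, 11, 6, 7, 3,
    5, 3, 7, 8, 2, 5, 3, 9, 4, 5, 2,
    7, 6, 5, 3, 3, 8, 5, 5, 6, 3, 2,
    4, 6, 5, 7, 4, 4,
]


def frequency_table(values):
    """Return (value, frequency) rows for every integer from min to max,
    so that values which never occur are listed with frequency 0."""
    tally = Counter(values)  # Counter returns 0 for values it has not seen
    return [(v, tally[v]) for v in range(min(values), max(values) + 1)]


table = frequency_table(counts)
for value, freq in table:
    print(f"{value:>5} {freq:>5}")
print(f"Total {sum(f for _, f in table):>4}")
```

Run on the 72 counts, it shows 5 customers as the most frequent hourly count (15 of the 72 hours).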
A grouped frequency distribution condenses data even more, by grouping the possible
values into classes and showing for each class the class frequency, ie. how many values fall
in the class.
Example 2.1.2: The following table shows a grouped frequency distribution.
Fortnightly food expenditure for a group of households
Expenditure (nearest kina)   Number of households
10 - 19                       7
20 - 29                      16
30 - 39                      30
40 - 49                      14
50 - 59                       8
60 - 69                       3
70 - 79                       2
Total                        80
It is often useful to know what proportion, or percentage, of cases falls into each class; for
this purpose one can construct a relative frequency distribution.
Expenditure (nearest kina)   Percentage of households
10 - 19                        8.75
20 - 29                       20.00
30 - 39                       37.50
40 - 49                       17.50
50 - 59                       10.00
60 - 69                        3.75
70 - 79                        2.50
Total                        100.00
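Each percentage is the class frequency divided by the total and multiplied by 100; for the first class, (7/80) × 100 = 8.75. A minimal Python sketch of that calculation for the food-expenditure data, using a hypothetical helper named `percentages`:

```python
# Class labels and household counts from the food-expenditure table.
classes = ["10 - 19", "20 - 29", "30 - 39", "40 - 49", "50 - 59", "60 - 69", "70 - 79"]
households = [7, 16, 30, 14, 8, 3, 2]


def percentages(frequencies):
    """Convert class frequencies into percentages of the total."""
    total = sum(frequencies)
    return [100 * f / total for f in frequencies]


pct = percentages(households)
for label, p in zip(classes, pct):
    print(f"{label}  {p:6.2f}")
print(f"Total    {sum(pct):6.2f}")
```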
Additional Examples B:
Here is another example. Below is a set of 112 daily takings of a shop, in hundreds of kina.
38 68 69 64 65 55 66 64 52 63
69 50 57 67 50 47 32 51 46 56
57 54 74 48 60 55 68 53 50 69
49 57 40 73 71 62 62 65 62 65
68 47 51 54 57 48 53 43 61 48
74 57 72 58 64 72 42 60 46 78
73 74 57 50 64 59 68 53 63 54
55 76 61 63 62 54 59 72 63 72
57 54 61 69 61 75 67 56 39 33
69 45 67 75 67 51 47 83 64 48
63 62 64 37 67 49 47 70 64 59
66 52
We cannot use an ungrouped frequency table here, since it would have far too many rows.
Instead we use a grouped frequency table (in which we group the data into a set of
intervals) to summarise the data. A grouped frequency table for the above data would look
like:
Interval   Frequency
30-39        5
40-49       16
50-59       33
60-69       42
70-79       15
80-89        1
Total:     112
The advantage of using frequency tables to summarise data is that, with practice, you can more
readily “see” properties of the data, such as its distribution, its average value or its spread of
values.
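Grouping can be sketched in the same way: each value is assigned to a class by integer division. This minimal Python sketch assumes whole-number data and equal-width classes; `grouped_frequencies` and its parameters are hypothetical names. With classes 30-39, 40-49, ..., 80-89 it gives the frequencies 5, 16, 33, 42, 15 and 1 for the 112 daily takings.

```python
# The 112 daily takings (hundreds of kina) listed above.
takings = [
    38, 68, 69, 64, 65, 55, 66, 64, 52, 63,
    69, 50, 57, 67, 50, 47, 32, 51, 46, 56,
    57, 54, 74, 48, 60, 55, 68, 53, 50, 69,
    49, 57, 40, 73, 71, 62, 62, 65, 62, 65,
    68, 47, 51, 54, 57, 48, 53, 43, 61, 48,
    74, 57, 72, 58, 64, 72, 42, 60, 46, 78,
    73, 74, 57, 50, 64, 59, 68, 53, 63, 54,
    55, 76, 61, 63, 62, 54, 59, 72, 63, 72,
    57, 54, 61, 69, 61, 75, 67, 56, 39, 33,
    69, 45, 67, 75, 67, 51, 47, 83, 64, 48,
    63, 62, 64, 37, 67, 49, 47, 70, 64, 59,
    66, 52,
]


def grouped_frequencies(values, start, width, n_classes):
    """Count whole-number values in the classes start..start+width-1,
    start+width..start+2*width-1, and so on (equal widths assumed)."""
    freqs = [0] * n_classes
    for v in values:
        k = (v - start) // width  # index of the class containing v
        if not 0 <= k < n_classes:
            raise ValueError(f"{v} lies outside the classes")
        freqs[k] += 1
    return freqs


freqs = grouped_frequencies(takings, start=30, width=10, n_classes=6)
for k, f in enumerate(freqs):
    lo = 30 + 10 * k
    print(f"{lo}-{lo + 9}  {f}")
print("Total:", sum(freqs))
```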
[Figure: Frequency (0 to 50) plotted against Mark (35, 45, 55, 65, 75, 85)]
A frequency polygon is a line graph obtained by joining the midpoints of the tops of the bars of
the corresponding frequency histogram, as demonstrated in the following two diagrams:
[Diagram 1 of 2: frequency polygon construction; horizontal axis: Mark (35 to 85); vertical axis: Frequency (0 to 50)]
[Diagram 2 of 2: frequency polygon; horizontal axis: Mark (35 to 85); vertical axis: Frequency (0 to 50)]
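The vertices of a frequency polygon are simply (class midpoint, class frequency) pairs. The sketch below lists them for the grouped takings data, treating each class as running from its lower limit up to the next (30 up to 40, and so on) so that the midpoints are 35, 45, ..., 85 as on the diagrams. The optional extra empty class at each end, which brings the polygon down to the horizontal axis, is a common convention assumed here rather than something stated in these notes; `polygon_points` is a hypothetical name.

```python
# Grouped takings: lower class limits and frequencies, equal width 10.
lower_limits = [30, 40, 50, 60, 70, 80]
frequencies = [5, 16, 33, 42, 15, 1]
width = 10


def polygon_points(lower_limits, frequencies, width, close_ends=True):
    """Return (midpoint, frequency) vertices of a frequency polygon.

    Midpoints are lower limit + width/2.  With close_ends=True an empty
    class is added at each end so the polygon returns to the axis.
    """
    points = [(lo + width / 2, f) for lo, f in zip(lower_limits, frequencies)]
    if close_ends:
        points = ([(points[0][0] - width, 0)] + points
                  + [(points[-1][0] + width, 0)])
    return points


print(polygon_points(lower_limits, frequencies, width))
```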
Another common graph is the cumulative frequency polygon, or ogive. It is a line graph
obtained by joining the midpoints of the tops of the bars of a cumulative frequency histogram.
This is illustrated in the following sequence of diagrams:
[Diagram 1 of 2: cumulative frequency histogram and ogive; horizontal axis: Mark (35 to 85); vertical axis: Cumulative Frequency (0 to 120)]
[Diagram 2 of 2: ogive; horizontal axis: Mark (35 to 85); vertical axis: Cumulative Frequency (0 to 120)]
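The heights plotted on the ogive are running totals of the class frequencies. A minimal Python sketch for the grouped takings data, using `itertools.accumulate` from the standard library and the same midpoints 35, 45, ..., 85 as the diagrams:

```python
from itertools import accumulate

lower_limits = [30, 40, 50, 60, 70, 80]
frequencies = [5, 16, 33, 42, 15, 1]

# Running totals: how many takings fall in each class or a lower one.
cumulative = list(accumulate(frequencies))
# Midpoints of the classes, as marked on the horizontal axis.
midpoints = [lo + 5 for lo in lower_limits]

for x, c in zip(midpoints, cumulative):
    print(x, c)
```

The last running total is 112, the number of observations, which is why the ogive levels off at the total.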
Additional Examples C:
4.9, 5.6, 7.2, 6.7, 3.1, 4.6, 6.0, 5.0, 3.7, 7.3, 6.0, 5.4, 4.2, 6.6, 4.7, 5.8, 4.4, 3.6, 4.2, 5.4
Class           Frequency
3 to under 4    3
4 to under 5    6
5 to under 6    5
6 to under 7    4
7 to under 8    2
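The grouping of these 20 values into classes can be checked with a short Python sketch. It assumes each class runs from its lower value up to, but not including, the next (so 5.0 is counted in the class starting at 5 and 6.0 in the class starting at 6); this boundary rule is an assumption, not stated in the notes.

```python
import math

# The 20 measurements listed in Additional Examples C.
data = [4.9, 5.6, 7.2, 6.7, 3.1, 4.6, 6.0, 5.0, 3.7, 7.3,
        6.0, 5.4, 4.2, 6.6, 4.7, 5.8, 4.4, 3.6, 4.2, 5.4]

# Class k holds values v with k <= v < k + 1, found with math.floor.
freqs = {}
for v in data:
    k = math.floor(v)
    freqs[k] = freqs.get(k, 0) + 1

for k in sorted(freqs):
    print(f"{k} to under {k + 1}: {freqs[k]}")
```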
There are actually two types of cumulative frequency distribution: one (the “less than” type)
shows how many cases lie in each class and the lower-valued classes (third column in the
table above); the other (the “greater than” type) shows how many cases lie in each class
and the higher-valued classes (fourth column). For the fourth class in the example, the figure
67 in the third column shows that there are 67 values in that class and the lower-valued classes,
ie. 67 households have expenditure of K49 or less. For the same class, the figure 27 in the
fourth column shows that there are 27 values in that class and the higher-valued classes, ie. 27
households have expenditure of K40 or more.
One can also express cumulative frequency distributions in relative terms, of course, as is
done in the fifth column of the table above for the “less than” type.
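Both kinds of cumulative frequency, and the relative “less than” version, follow from running totals of the class frequencies. A minimal Python sketch for the food-expenditure data; reversing the list to accumulate from the top class downwards is one straightforward way (assumed here, not taken from the notes) to get the “greater than” figures. For the fourth class (40 - 49) it reproduces the figures 67 and 27 quoted above.

```python
from itertools import accumulate

classes = ["10 - 19", "20 - 29", "30 - 39", "40 - 49", "50 - 59", "60 - 69", "70 - 79"]
households = [7, 16, 30, 14, 8, 3, 2]
total = sum(households)

# "Less than" type: this class plus all lower-valued classes.
less_than = list(accumulate(households))
# "Greater than" type: this class plus all higher-valued classes,
# accumulated from the top class downwards and then put back in order.
greater_than = list(accumulate(reversed(households)))[::-1]
# Relative "less than" distribution, as a percentage of all households.
less_than_pct = [100 * c / total for c in less_than]

for row in zip(classes, households, less_than, greater_than, less_than_pct):
    print("{:8} {:3} {:4} {:4} {:7.2f}".format(*row))
```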
2.4 Histogram
A histogram is a special kind of bar chart that is used to illustrate a frequency distribution;
each bar is drawn over a class interval (so that all the bars are touching, with no gaps) and the
area of the bar indicates the frequency, or relative frequency, of the class. Conveniently
rounded class limits should be used to mark off the axis.
If the bars are all the same width, which is usually the case, then the height of each bar also
represents the corresponding class frequency. If the bars are not all the same width, then the
area but not the height of each bar is proportional to the corresponding class frequency. In
any case, a vertical axis should not be shown, because the histogram is an “area” diagram –
not a “height” diagram.
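When class widths differ, each bar's height must be its frequency divided by its width (a frequency density), so that area, not height, matches frequency. The sketch below illustrates this with a hypothetical regrouping of the food-expenditure data in which the top two classes are merged into a single class 60 - 79, twice as wide; the merge and the helper name `bar_heights` are assumptions for illustration only.

```python
# Hypothetical regrouping: classes 60 - 69 and 70 - 79 merged into 60 - 79.
# Widths follow the rounded limits, e.g. 10 - 19 is treated as 10 kina wide.
classes = [("10 - 19", 10), ("20 - 29", 10), ("30 - 39", 10), ("40 - 49", 10),
           ("50 - 59", 10), ("60 - 79", 20)]
households = [7, 16, 30, 14, 8, 3 + 2]


def bar_heights(widths, frequencies):
    """Height of each histogram bar, chosen so that area = frequency."""
    return [f / w for f, w in zip(frequencies, widths)]


widths = [w for _, w in classes]
heights = bar_heights(widths, households)
areas = [h * w for h, w in zip(heights, widths)]

for (label, _), h, a in zip(classes, heights, areas):
    print(f"{label}: height {h:.2f} households per kina, area {a:g}")
```

Had the merged bar been drawn with height 5 on the same scale as the others, its area would exceed that of the 50 - 59 bar even though it holds fewer households (5 against 8).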
Example 2.4.1: Illustrate the food expenditure data given in Example 2.3.1 by drawing a
histogram.
If the variable is continuous and one uses many very narrow classes, the polygon would
approach a smooth curve, which is called a frequency curve.
Like the histogram, the frequency polygon and the frequency curve are “area diagrams”;
again a vertical axis should not be shown. The interpretation of these diagrams is that the
area under the polygon or curve between any two values of the variable represents the
relative frequency associated with the interval between the two values, ie. the proportion of
values in the dataset which fall in the interval.
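The area interpretation can be made concrete. Assuming values are spread evenly within each class, and treating the food-expenditure classes as running from 10 up to 20 kina, 20 up to 30, and so on, the proportion of households between two values is the sum of each class's relative frequency times the fraction of its width that lies inside the interval. A minimal Python sketch; the helper `area_between` is a hypothetical name.

```python
# Food-expenditure classes, treated as running from 10 up to 20 kina,
# 20 up to 30, and so on (the rounded limits used to mark the axis).
lower = [10, 20, 30, 40, 50, 60, 70]
width = 10
households = [7, 16, 30, 14, 8, 3, 2]
total = sum(households)


def area_between(a, b):
    """Area of the relative-frequency histogram between a and b.

    Assumes values are spread evenly within each class, so a class
    contributes its relative frequency times the fraction of its width
    that lies inside [a, b].
    """
    area = 0.0
    for lo, f in zip(lower, households):
        overlap = max(0, min(b, lo + width) - max(a, lo))
        area += (f / total) * overlap / width
    return area


share = area_between(37, 46)
print(f"proportion {share:.4f}, about {share * total:.1f} households")
```

For the interval from 37 to 46 it gives a proportion of 0.2175, about 17.4 of the 80 households; the whole range from 10 to 80 gives 1, as the total area must.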
[Figure: “less than” ogive of fortnightly food expenditure; horizontal axis: expenditure (kina per fortnight), 10 to 80; vertical axis: cumulative frequency, 0 to 80]
An ogive can also be used to illustrate a “greater than” cumulative frequency distribution, of
course. It has a downward slope and the cumulative frequencies are plotted above the class
lower limits.
The ogive is an extremely useful curve. On the “less than” ogive, the height of the curve
above any given value on the horizontal axis indicates the number of observations in the
dataset whose values are less than (or equal to) that given value. One can use such
information to estimate how many items fall into any given interval, not just class intervals.
By reading off the heights (cumulative frequencies) of the ogive at both ends of the interval,
one can calculate an estimate of the number of items falling in the interval.
Example 1.16.2: From the ogive in the previous example, estimate the number of cases in the
interval from 37 to 46.
Number of cases between 37 and 46 (ie. households with expenditure between K37 and K46)
= [No. of cases with values < 46] - [No. of cases with values < 37]
= [Ogive height at 46] - [Ogive height at 37]
= 61 - 43 = 18
So we estimate that 18 households have food expenditure between K37 and K46 per
fortnight.
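The reading-off step can be imitated by joining the plotted points with straight lines and interpolating. This sketch assumes the “less than” ogive is plotted at the rounded class limits 20, 30, ..., 80 used on the axis, starting from 0 at 10; that plotting convention and the helper `ogive_height` are assumptions. It gives heights of 44 and 61.4, an estimate of about 17 households, close to the graphical reading of 18; small differences are expected when a drawn curve is read by eye.

```python
# "Less than" ogive for the food data: (axis value, cumulative frequency),
# plotted at the rounded upper class limits and starting from 0 at 10.
xs = [10, 20, 30, 40, 50, 60, 70, 80]
cum = [0, 7, 23, 53, 67, 75, 78, 80]


def ogive_height(x):
    """Linearly interpolate the ogive (cumulative frequency) at x."""
    if x <= xs[0]:
        return 0.0
    if x >= xs[-1]:
        return float(cum[-1])
    for i in range(1, len(xs)):
        if x <= xs[i]:
            x0, x1, y0, y1 = xs[i - 1], xs[i], cum[i - 1], cum[i]
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


low, high = ogive_height(37), ogive_height(46)
print(f"{high:.1f} - {low:.1f} = {high - low:.1f}")
```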
Tutorial exercises
1. What is descriptive statistics? Give two examples of its use.
2. A sample of 10 grocery stores in Newcastle, New South Wales on a particular day
revealed that the average price per kilogram for hamburger mince was $4.50.
(a) What is the population of interest in this study?
(b) In the statistical inference process, we would like to estimate the average price per
kilogram for hamburger mince for all grocery stores in Newcastle, New South
Wales. Suggest a value of such an estimate.
3. Suppose we carry out two statistical surveys. One uses a sample of size 50. The
other uses a sample of size 500.
(a) Which procedure is likely to be more accurate?
(b) Since the second sample size is 10 times larger, can we expect the accuracy to be
10 times better? Why?
4. In each of the following situations, indicate whether the data used in the analysis is
primary or secondary.
(a) We wish to determine the proportion of tourists to PNG that come from Japan.
So we approach the Immigration Department and gain permission to analyse
data given by arriving tourists on their customs cards.
(b) We wish to determine the most common destination for PNG tourists. So we
interview PNG citizens at Jackson’s airport just before they enter the
International Departure Lounge to find out where they are travelling to.
(c) We wish to find the proportion of PNG companies that are willing to hire
union labour. So we approach the managers of a selection of companies and
ask them, and use the information gathered to calculate the required
proportion.
(d) We wish to calculate the movement of PNG stocks on the Sydney stock
exchange. So we get the daily papers, check the stock prices, and use this
information to calculate the average movement.
5. Rank the following observations (from lowest to highest):
1.73 1.77 1.83 1.80 1.74 1.72 1.79 1.79 1.75 1.77 1.73
1.77 1.78 1.77 1.80 1.65 1.75 1.69 1.75 1.73 1.80 1.75
1.80 1.82 1.81 1.81 1.71 1.84 1.71 1.74 1.72 1.76 1.70
6. In order to help decide which new staff to recruit, a large national company
administered an aptitude test to 74 Grade 10 school-leavers. The following are the
marks obtained by the applicants (the test was marked out of 100).
65 78 72 71 71 74 63 71 76 72 64 79 73 75 70 76 78
73 69 76 66 71 70 70 73 75 75 69 73 68 70 67 74 75
70 62 75 71 78 75 74 65 71 71 73 73 77 75 78 73 73
65 68 74 69 66 72 66 63 70 73 77 73 69 76 68 74 78
73 76 72 74 76 67
7. A clinic recorded the heights of 154 adult male patients. The heights (recorded in
metres, rounded to the nearest centimetre) were:
1.73 1.77 1.83 1.80 1.74 1.72 1.79 1.79 1.75 1.77 1.73
1.77 1.78 1.77 1.80 1.65 1.75 1.69 1.75 1.73 1.80 1.75
1.80 1.82 1.81 1.81 1.71 1.84 1.71 1.74 1.72 1.76 1.70
1.86 1.83 1.62 1.72 1.76 1.71 1.71 1.80 1.68 1.72 1.76
1.70 1.73 1.73 1.69 1.77 1.78 1.78 1.74 1.79 1.76 1.76
1.72 1.73 1.69 1.70 1.70 1.72 1.73 1.73 1.69 1.77 1.78
1.70 1.67 1.68 1.82 1.78 1.80 1.74 1.74 1.68 1.68 1.71
1.63 1.77 1.68 1.70 1.69 1.72 1.75 1.68 1.81 1.87 1.67
1.68 1.80 1.78 1.78 1.68 1.82 1.65 1.66 1.88 1.84 1.75
1.85 1.77 1.64 1.76 1.69 1.73 1.79 1.73 1.75 1.81 1.67
1.76 1.67 1.81 1.71 1.69 1.78 1.76 1.72 1.65 1.74 1.80
1.81 1.74 1.69 1.73 1.71 1.76 1.78 1.77 1.76 1.67 1.68
1.75 1.74 1.84 1.79 1.76 1.81 1.72 1.71 1.78 1.73 1.72
1.62 1.72 1.69 1.82 1.87 1.75 1.75 1.77 1.84 1.70 1.81