JSMath6 Part3
Forces (deviations) at one side balance the forces (deviations) at the other
side.
Mode is the score with the highest frequency (the value that is in fashion,
the most popular).
Median is the value in the middle when the scores are placed in order (if odd
number of observations) OR average of the middle two scores (if even
number of observations).
Half the observations lie to the right of the median and the other half
to the left.
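The three definitions above can be tried out on a small data set. The following is a minimal sketch using Python's standard `statistics` module; the scores are made-up illustration values, not data from this module.

```python
import statistics

# Illustrative scores (an assumption for this sketch, not data from the text).
scores = [12, 15, 15, 17, 18, 20, 22]

mean = statistics.mean(scores)      # sum of the scores divided by their number
median = statistics.median(scores)  # middle value of the ordered scores
mode = statistics.mode(scores)      # score with the highest frequency

# The deviations on one side of the mean balance those on the other side,
# so together they add up to zero.
deviation_sum = sum(x - mean for x in scores)

print(mean, median, mode, deviation_sum)
```

For these seven scores the mean and the median happen to coincide while the mode is lower, which is one way to start the class discussion on which measure represents the data best.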
Questions: Which of these three (mean, median or mode) do you feel can
be used best to represent the set of scores? Justify your answer.
DO NOT ANSWER the question at this stage; only make an inventory of the
pupils' opinions and their reasons, without further comment.
Write the results on the chalkboard:
Best measure to use    Number of students in favour    Because
mean
median
mode
Inform pupils that they are going to investigate how mean, mode and median
behave, so as to make a decision on which measure might be best used in a
certain context.
Investigating (40 minutes)
The following are covered in the pupils' worksheets (the worksheet for pupils
is on a following page, seven pages ahead).
CHANGE                                          EFFECT ON
                                                MEAN    MEDIAN    MODE
Adding zero value(s)                            S       S         S
Adding two values with equal but opposite       N       N         S
  deviations
Adding two values with opposite unequal         A       N         S
  deviations
Adding two values with deviations both          A       S         S
  positive (negative)
Adding two values equal in value to the         N       N         N
  central measure at the top of each column
Adding one value equal in value to the          N       S         N
  central measure at the top of each column
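The kind of investigation summarised in the table can also be run numerically. The sketch below is only an illustration on made-up scores: it does not reproduce the table entries, and the helper name `central_measures` is hypothetical. It shows how the three measures react when two values with opposite deviations from the mean are added.

```python
import statistics

def central_measures(data):
    """Return (mean, median, list of modes) for a list of numbers."""
    return (statistics.mean(data),
            statistics.median(data),
            statistics.multimode(data))

# Made-up scores for illustration (not the pupils' data from the lesson).
scores = [4, 6, 6, 7, 9, 10]          # mean 7, median 6.5, mode 6

# Two added values with equal but opposite deviations from the mean: 7 - 3 and 7 + 3.
balanced = scores + [4, 10]

# Two added values with opposite, unequal deviations from the mean: 7 - 1 and 7 + 5.
unbalanced = scores + [6, 12]

for label, data in [("original", scores),
                    ("equal opposite deviations", balanced),
                    ("unequal opposite deviations", unbalanced)]:
    print(label, central_measures(data))
```

In this particular data set the mean stays at 7 in the balanced case and moves to 7.5 in the unbalanced case, while the median stays at 6.5 both times. The mode changes only in the balanced case, where 4 and 10 become as frequent as 6. Other data sets can behave differently, which is the point of the pupils' investigation.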
Ask whether or not pupils want to change their previous opinion based on
the increased insight on behaviour of the measures. If a pupil wants to
change he/she is to justify the decision.
The discussion should lead to the decision that the median is most
appropriate: half of the pupils scored below / above 17.5. The mean is less
appropriate as it does not give any information as to how many pupils
scored above / below the average of 16.5 (as mean is affected by outliers).
Pupils' assignment
(or take some questions for discussion in class if time permits)
In each of the following cases decide, giving your reasons, whether the mean,
median or mode is the best to represent the data.
1. Mr. Taku wants to stock his shoe shop with shoes for primary school
children. In a nearby primary school he collects the shoe sizes of all the
200 pupils (one class group from class 1 to class 7). Will he be interested
in the mean size, median size or modal size?
Answer: mode
2. In a small business 2 cleaners earn P340 each, the 6 persons handling the
machinery earn P600 each, the manager earns P1500 and the director
P3500 per month. Which measure (mean, median or mode) best
represents these data?
Answer: mode
3. An inspector visits a school and wants to get an impression of how well
form 2X is performing. Will she ask the form teacher for mean, median or
mode?
Answer: median
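For question 2 the three measures can be computed directly from the wages given in the question. This is a minimal sketch using Python's standard `statistics` module; the variable names are chosen for this illustration only.

```python
import statistics

# Monthly wages (in Pula) from question 2: 2 cleaners, 6 persons handling
# the machinery, 1 manager and 1 director.
wages = [340] * 2 + [600] * 6 + [1500] + [3500]

mean_wage = statistics.mean(wages)      # pulled upwards by the two high salaries
median_wage = statistics.median(wages)  # middle of the 10 ordered wages
modal_wage = statistics.mode(wages)     # wage earned by the most people

print(mean_wage, median_wage, modal_wage)
```

The mean comes out at P928, a wage that only two of the ten people reach, while the mode and the median are both P600.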
CHANGE                                          EFFECT ON
                                                MEAN    MEDIAN    MODE
Adding zero value(s)
Adding two values with equal but opposite
  deviations
Adding two values with opposite unequal
  deviations
Adding two values with deviations both
  positive (negative)
Adding two values equal in value to the
  central measure at the top of each column
Adding one value equal in value to the
  central measure at the top of each column
From the bar graph find the mean, median and mode. Which of the three
measures is easiest to find?
Practice task 2
1. Try out the lesson outline 2 in your class: A look at the average wage
2a) Write an evaluative report on the lesson. Questions to consider are: Did
pupils meet difficulties? Were pupils well motivated to work on the
activity? Were the objectives achieved? Did you meet some specific
difficulties in preparing the lesson or during the lesson? Was discussion
among pupils enhanced?
b) Present the lesson plan and report to your supervisor.
A mode or median cannot be obtained from this frequency table. You can
only read off the class interval that contains the mode and the class interval
that contains the median. The class interval with the highest frequency is
21 - 30: the modal class. The median is the 22nd observation, i.e., the score
of the 22nd student; that falls in the class interval 31 - 40.
If the data set is large, whether discrete (for example the scores of 2000
pupils in an examination marked out of 60) or continuous (the time taken to
run 100 m), the data are best placed in groups or intervals.
The scores could be grouped in six intervals: 1 - 10, 11 - 20, 21 - 30, 31 - 40,
41 - 50 and 51 - 60, as in the distribution table above for 43 students only. By
grouping data some information is lost. For example, in the class
1 - 10 there are 4 students, and we can no longer see what their actual scores
were (did all score 10?).
For calculation purposes the mid-interval value (average of lower bound
and upper bound value of the interval) is used.
From a grouped frequency table you can find:
the modal class (the class with the highest frequency)
the interval in which the median is found
an estimate of the mean
A calculation of the mean using mid-interval values.
$$\text{Estimate of the mean} = \frac{\text{sum of [mid-interval value} \times \text{frequency]}}{\text{sum of the frequencies}}$$
Example
The table gives the end of year examination mark of 200 students (maximum
mark was 50) and the calculation to obtain an estimate of the mean.
b) for grouped values some calculators allow the entry of grouped data
by having separate entry keys for the group frequency (or count) and
for that group's average value. Consult your documentation.
4. Once all data points have been entered, a press of the x̄ key (or
equivalent) will display the mean of all the entered data. On most
calculators there are other keys for the sample and population standard
deviations.
a) How many trees had a height h cm in the range 110 - 119 cm?
b) Make the grouped frequency table corresponding to the histogram. The
first class interval is 100 - 109, the second 110 - 119, etc.
c) How many fruit trees in total were measured?
d) What is the modal class interval?
e) Calculate an estimate of the mean height of the fruit trees from the
grouped frequency table.
4. The ages of pupils in Sefhare CJSS were represented in a histogram.
Although age can be considered to be continuous, you are 14 from the day
you turn 14 until the day you turn 15. The class interval has as lower
bound 14 and as upper bound 15.
Percentiles
Quartiles divide the data into quarters; similarly, percentiles divide the data
into a hundred parts.
The median is the 50th percentile, the $\frac{1}{2}(n + 1)$th value in the ordered
sequence of the n values.
The lower quartile is the 25th percentile, the $\frac{1}{4}(n + 1)$th value.
The upper quartile is the 75th percentile, the $\frac{3}{4}(n + 1)$th value.
The pth percentile can be estimated from a cumulative frequency curve by
taking (as an approximation) $\frac{p}{100}$ of the total number of values: p% × n.
More exactly, it is the $\frac{p}{100}(n + 1)$th value.
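The position rule for percentiles can be sketched in a few lines of Python. The function names and the data are hypothetical and serve only as an illustration. When the position is not a whole number, the sketch interpolates linearly between the two neighbouring values; this is an assumption that agrees with taking the average of the two middle values for the median.

```python
def percentile_position(p, n):
    """Position of the p-th percentile among n ordered values: p/100 * (n + 1)."""
    return p / 100 * (n + 1)

def value_at_position(ordered, position):
    """Value at a 1-based, possibly fractional, position (linear interpolation)."""
    lower = int(position)              # whole part of the position
    fraction = position - lower        # how far towards the next value
    if lower < 1:
        return ordered[0]
    if lower >= len(ordered):
        return ordered[-1]
    below, above = ordered[lower - 1], ordered[lower]
    return below + fraction * (above - below)

# Made-up ordered data for illustration (n = 11).
data = [3, 5, 6, 8, 9, 11, 12, 14, 15, 17, 20]
n = len(data)

lower_quartile = value_at_position(data, percentile_position(25, n))  # 3rd value
median = value_at_position(data, percentile_position(50, n))          # 6th value
upper_quartile = value_at_position(data, percentile_position(75, n))  # 9th value

print(lower_quartile, median, upper_quartile)
```

With n = 11 the three positions are the whole numbers 3, 6 and 9, so no interpolation is needed. With n = 10 the median position is 5.5, halfway between the 5th and 6th values.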
Number of correct answers    1 - 10    11 - 20    21 - 30    31 - 40    41 - 50
Number of students           12        43         71         49         25
From this grouped frequency table mean, median and mode cannot be
calculated exactly.
You have seen that from the table you can obtain:
(i) an estimate of the mean (assuming all the values in the interval take the
mid-interval value)
(ii) the modal class: 21 - 30 correct answers
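As an illustration, the estimate of the mean and the modal class for the table above can be worked out as follows. This is a minimal sketch; it takes the mid-interval value as the average of the printed lower and upper bound of each interval, and the variable names are chosen for this example only.

```python
# Grouped frequency table from the text: (lower bound, upper bound, frequency).
table = [(1, 10, 12), (11, 20, 43), (21, 30, 71), (31, 40, 49), (41, 50, 25)]

total_frequency = sum(f for _, _, f in table)

# Mid-interval value = average of the lower and upper bound of the interval.
weighted_sum = sum((low + high) / 2 * f for low, high, f in table)

estimated_mean = weighted_sum / total_frequency

# Modal class = the interval with the highest frequency.
modal_class = max(table, key=lambda row: row[2])[:2]

print(total_frequency, weighted_sum, estimated_mean, modal_class)
```

The sum of [mid-interval value × frequency] is 5420 for the 200 students, giving an estimated mean of 27.1 correct answers.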
Altogether there are 100 units of area contained in the histogram. We are
looking for a line that will divide the area such that 50 units of area are to the
left of the line and 50 units of area to the right.
The line is to be drawn somewhere in class 119.5-129.5 (containing 30 units
of area). To the left of this class there are 10 + 20 = 30 units of area. We need
20 more units of area to make up the 50.
We need therefore to divide the 30 units of area of the class 119.5-129.5 in
the ratio 20: 10 = 2 : 1.
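Dividing the class in the ratio 2 : 1 amounts to moving two thirds of the class width in from the lower class boundary. The sketch below carries out that step with the numbers from the text; the function name and its parameters are hypothetical, chosen for this illustration.

```python
def median_from_class(lower_boundary, class_width, class_area, area_before, total_area):
    """Estimate the median by splitting the median class in proportion to area."""
    needed = total_area / 2 - area_before      # area still missing to reach half
    return lower_boundary + needed / class_area * class_width

# Values from the text: the class 119.5-129.5 (width 10) holds 30 units of area,
# 30 units lie to its left, and the histogram contains 100 units in total.
estimate = median_from_class(119.5, 10, 30, 30, 100)

print(round(estimate, 1))
```

Under these values the dividing line falls at about 126.2.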
1 | 6 7
2 | 1 3 6 8
3 | 1 1 2 2 3 3 5 5 8 9
4 | 0 1 1 2 2 3 4 5 6 7 8
5 | 0 0 1 2 6 6 7 8 9 9
6 | 2 3 4 5 6 6
7 | 0 1 3 7 9
8 | 0 5
9 | 2
n = 50        1 | 6 represents 16 marks.
To represent this data in a boxplot first find the five measures.
Minimum score 16
Maximum score 92
The upper quartile is the median of the upper 25 observations, i.e., the 38th
which is 62 (Q3).
Note that 25% of the pupils scored less than Q1 = 35 (represented by the
lower whisker).
50% of the pupils scored between Q1 = 35 and Q3 = 62 (represented by the
box).
25% of the pupils scored more than Q3 = 62 (represented by the upper
whisker).
The diagram also illustrates the lowest (16) and the highest score (92).
The box illustrates that, of the middle 50% of the pupils, 25% scored between
35 and 46.5 (the lower part of the box) and 25% between 46.5 and 62 (the
upper part of the box).
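The five measures needed for a box plot can be found with a short routine. The sketch below takes the quartiles as the medians of the lower and upper half of the ordered data, as described above. The function name and the marks are made up for illustration, and the handling of an odd number of values is an assumption.

```python
import statistics

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum).

    Q1 and Q3 are the medians of the lower and upper half of the ordered
    data. For an odd number of values the middle value is left out of both
    halves (an assumption; conventions differ).
    """
    ordered = sorted(data)
    half = len(ordered) // 2
    lower_half = ordered[:half]
    upper_half = ordered[-half:]
    return (ordered[0],
            statistics.median(lower_half),
            statistics.median(ordered),
            statistics.median(upper_half),
            ordered[-1])

# Made-up marks for illustration (not the 50 marks of the stem-and-leaf plot).
marks = [16, 21, 35, 38, 42, 46, 47, 59, 62, 70, 85, 92]

minimum, q1, median, q3, maximum = five_number_summary(marks)
print(minimum, q1, median, q3, maximum)
```

The whiskers of the box plot run from the minimum to Q1 and from Q3 to the maximum; the box runs from Q1 to Q3 with the median marked inside.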
The diagram not only illustrates the measures of central tendency but simple
measures of the amount of spread (variability) can be obtained from the
diagram:
Practice task 3
1. Discuss what you consider the most effective method to facilitate the
learning of data handling. Illustrate with example activities.
2. a) Collect test data on the same topic from two parallel classes.
b) Represent the data in (i) grouped frequency table (ii) histogram (iii)
frequency polygon (both sets of data on the same axes) (iv) double
stem-leaf plot.
c) Which of the representations do you feel best represents the data?
Justify your choice.
d) Calculate (i) the exact value of the mean (ii) an estimate of the mean
from the grouped frequency table (iii) the percent error in the
estimated value.
e) Which of the three averages, mean, mode or median, best represents
the data? Explain.
f) Represent the data of both classes in a box plot.
g) What conclusions can you safely draw from the data?
3. a) Collect data for your school on the ages of the students by gender.
b) Present your data in two frequency polygons, one for the girls and one
for the boys, using the same axes.
c) Calculate an estimate of the mean age of (i) boys (ii) girls in your
school.
Summary
This unit began with a project-based approach to the teaching of central
tendency. It ended with a number of self-marking exercises to teach you some
lesser-known ways of representing quantitative data. It is hoped that you, and
eventually your students, will benefit from this practical approach to
statistics. Remember this caution from the Introduction to the unit: no set of
projects could ever teach your students all, or even most, of the techniques we
have covered. But since your students will benefit more (in later life) from the
projects, a wise teacher omits many techniques of statistics in order to leave
room for completion of interesting projects.
b) 47
c) modal class 60 ≤ m < 65
d) Median in class 55 ≤ m < 60
e) Estimate of mean 59.2 g
2. a) 95
b)
Diameter (cm)    5 ≤ d < 6    6 ≤ d < 7    7 ≤ d < 8    8 ≤ d < 9    9 ≤ d < 10
Frequency        20           50           120          95           15
d) 14.8 years
5. a) 24 b) 14 ≤ length < 18
c)
Length (m)    10 ≤ l < 12    12 ≤ l < 13    13 ≤ l < 14    14 ≤ l < 18    18 ≤ l < 20
Frequency     8              7              8              24             4
Median (26th observation) in class 14 ≤ length < 18
d) 51 e) 743.5 m f) 14.6 m
6. a) Modal class 70 ≤ m < 100
b)
Mass (g)     30 ≤ m < 50    50 ≤ m < 60    60 ≤ m < 70    70 ≤ m < 100
Frequency    80             80             70             120
c) 62.1 g
7. a) 250 ≤ I < 500
b)
Income (P)           250 ≤ I < 500    500 ≤ I < 1000    1000 ≤ I < 2000    2000 ≤ I < 5000
Frequency density    0.3              0.1               0.04               0.004
Frequency            75               50                40                 12
c) 500 ≤ I < 1000
d) P947
e) Mode, the salary earned by most people
b) 168.7 cm
9. a) Boundaries of bars 10, 15, 20, 25, 35, 45, 60 and 75
Frequency density    5.6    13    16.4    7.6    5.4    2.9    0.8
9. b) 30.1 years
f) 4.6 minutes
3. a)
Time t    0 < t ≤ 20    20 < t ≤ 40    40 < t ≤ 60    60 < t ≤ 90    90 < t ≤ 120
CF        12            54             132            154            160
b)
4. a) Use as class boundaries 5.5, 25.5, 45.5, etc. Class width is 20.
Modal class 66 - 85. Use construction as illustrated in question 1.
Estimated mode 74.1.
d, e) Read length on horizontal axis at the point of inflexion as illustrated
in question 3.
d) More drivers had a speed beyond the upper quartile speed than drivers
with a speed below the lower quartile speed.
e) The 25% of drivers with speed above the median speed of 60 km/h
had a wider spread (60 to 70 km/h) than the 25% of drivers with a
speed below the median speed (range 55 to 60 km/h).
f) 50%
Purpose of Unit 5
The main aim of this unit is to look at some basic measures of spread: how to
calculate them and how to interpret them. This unit covers range, inter
quartile range, variance and standard deviation. Box plots, as covered in
Unit 4, are a useful graphical aid to visualise spread of data.
Objectives
When you have completed this unit you should be able to:
calculate the range and inter quartile range of ungrouped data
obtain an estimate of the inter quartile range of grouped data
calculate the standard deviation and variance of ungrouped data
calculate an estimate of the standard deviation and variance of grouped
data
use measures of central tendency and of spread to compare sets of similar
data
Time
To study this unit will take you about five hours.
Example 2
The estimates of the length of the book by 10 students were, in order and to the
nearest cm: 24 24 26 28 29 31 32 33 34 36.
The range is 36 − 24 = 12 cm.
As $\frac{10 + 1}{2} = 5.5$, the median is the average of the 5th and 6th term in the
ordered sequence. Median is $\frac{29 + 31}{2} = 30$ cm.
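$x$       $x - \bar{x}$    $(x - \bar{x})^2$
7         -4               16
9         -2               4
11        0                0
12        1                1
12        1                1
15        4                16
$\sum x = 66$              $\sum (x - \bar{x})^2 = 38$

$$\bar{x} = \frac{\sum x}{6} = \frac{66}{6} = 11$$

$\sum x$ means the sum of all the data $x$. $\sum x^2$ means the sum of all the
squares of the data $x$.

$$\text{Variance} = s^2 = \frac{\sum (x - \bar{x})^2}{n} = \frac{38}{6}$$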
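The columns of the table can be reproduced step by step. This is a minimal sketch that divides by n, as the text does, and also takes the square root to obtain the standard deviation; the variable names are chosen for this illustration.

```python
import math

data = [7, 9, 11, 12, 12, 15]            # the six values from the table
n = len(data)

mean = sum(data) / n                      # 66 / 6 = 11
deviations = [x - mean for x in data]     # -4, -2, 0, 1, 1, 4
squared = [d ** 2 for d in deviations]    # 16, 4, 0, 1, 1, 16

variance = sum(squared) / n               # 38 / 6
standard_deviation = math.sqrt(variance)

print(sum(squared), round(variance, 2), round(standard_deviation, 2))
```

Python's `statistics.pvariance` uses the same divide-by-n definition, so it can serve as a check on the hand calculation.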
Practice task 1
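Derive the formula $s^2 = \frac{\sum x^2}{n} - \bar{x}^2$ from
$s^2 = \frac{\sum (x - \bar{x})^2}{n}$. Remember that $\bar{x} = \frac{\sum x}{n}$.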
Suggested answer at the end of this unit.
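$x$     $x^2$    $f$    $fx$    $fx^2$
12      144      6      72      864
13      169      8      104     1352
14      196      10     140     1960
15      225      14     210     3150
16      256      15     240     3840
17      289      5      85      1445
18      324      2      36      648
$\sum x = 105$    $\sum x^2 = 1603$    $\sum f = 60$    $\sum fx = 887$    $\sum fx^2 = 13259$

$$\text{s.d.} = \sqrt{\frac{\sum fx^2}{\sum f} - \bar{x}^2} = \sqrt{\frac{13259}{60} - (14.78)^2} = 1.6 \text{ h (1 dp)}$$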
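The totals in the table and the resulting standard deviation can be checked with a short script. This is a sketch only, with variable names chosen for this illustration. Unlike the hand calculation, it does not round the mean to 14.78 before squaring, which makes no difference to the answer at 1 decimal place.

```python
import math

# Frequency table from the text: value x -> frequency f.
table = {12: 6, 13: 8, 14: 10, 15: 14, 16: 15, 17: 5, 18: 2}

sum_f = sum(table.values())                          # 60
sum_fx = sum(x * f for x, f in table.items())        # 887
sum_fx2 = sum(x * x * f for x, f in table.items())   # 13259

mean = sum_fx / sum_f
standard_deviation = math.sqrt(sum_fx2 / sum_f - mean ** 2)

print(round(mean, 2), round(standard_deviation, 1))
```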
$$\frac{\sum fm}{\sum f} = \bar{m} \quad \text{and} \quad \text{s.d.} = \sqrt{\frac{\sum fm^2}{\sum f} - \bar{m}^2}$$
a) Using the same axes draw a frequency polygon for the heights of the
boys and of the girls.
b) Use the frequency polygon to compare the heights of boys and girls.
c) Calculate an estimate of the mean height and standard deviation for
both boys and girls.
d) Compare the height of boys and girls using your estimated values of
mean and standard deviation.
4. Two novels were compared with each other by counting the number of
words in the sentences in a section of the novel.
Number of words    5 - 9    10 - 14    15 - 19    20 - 24    25 - 29    30 - 34    35 - 39
Novel A            12       15         10         26         14         8          4
Novel B            8        18         12         31         18         4          2
a) Calculate for each novel an estimate for the mean number of words in
a sentence and the standard deviation.
b) Compare the results of a. Which novel do you think is easier to read?
Justify your answer.
5. The number of hours spent on sports by two groups of pupils, group A
and group B, are represented in the histograms below.
Which of the data represented in the histograms has a greater standard
deviation? Justify your answer.
Height                  151 - 155    156 - 160    161 - 165    166 - 170    171 - 175
Frequency               6            9            14           23           8
Cumulative frequency    6            15           29           52           60
b) Plot the points (155.5,6), (160.5, 15), (165.5, 29), (170.5, 52), (175.5, 60)
c) IQR 168.5 − 160.0 = 8.5 cm
2a) Plot (6.85, 4), (6.95, 15), (7.05, 51), (7.15, 95), (7.25, 100)
b) IQR 7.09 − 6.99 = 0.1 cm
3a) Plot (10, 8), (15, 34), (20, 108), (25, 198), (30, 322), (35, 464), (40, 550),
(45, 604), (60, 630), (75, 645)
b) IQR 36 − 23 = 13 years.
4a) Use linear interpolation for the estimates (to 2 dp)
          LQ       UQ       IQR     Median
Type A    12.30    18.49    6.19    15.58
Type B    11.50    19.53    8.03    15.63
b) Type A: smaller IQR and more consistent in performance [as both types
have the same median (to 1 dp)]
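$$\frac{\sum (x - \bar{x})^2}{n} = \frac{\sum (x^2 - 2x\bar{x} + \bar{x}^2)}{n} = \frac{\sum x^2}{n} - \frac{2n\bar{x}\,\bar{x}}{n} + \frac{n\bar{x}^2}{n}$$

$$= \frac{\sum x^2}{n} - 2\bar{x}^2 + \bar{x}^2 = \frac{\sum x^2}{n} - \bar{x}^2$$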
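The identity can be confirmed numerically on the six values 7, 9, 11, 12, 12, 15. The sketch below uses Python's `fractions.Fraction` so that both sides are compared exactly, without rounding; the variable names are chosen for this illustration.

```python
from fractions import Fraction

data = [7, 9, 11, 12, 12, 15]
n = len(data)
mean = Fraction(sum(data), n)   # exact arithmetic avoids rounding differences

# Definition: mean of the squared deviations from the mean.
by_definition = sum((x - mean) ** 2 for x in data) / n

# Shortcut: mean of the squares minus the square of the mean.
by_shortcut = Fraction(sum(x * x for x in data), n) - mean ** 2

print(by_definition, by_shortcut)
```

Both expressions give 19/3, which is the same as 38/6.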