Measures of Central Tendency and Dispersion/Variability: Range, Variance and Standard Deviation
Chapter 6 DISPERSION/VARIABILITY
A measure of central tendency is a single value that attempts to describe a set of data
(such as scores) by identifying the central position within that set. For this reason, measures
of central tendency are sometimes called measures of central location. Central tendency refers
to the center of a distribution of observations. Where do scores tend to congregate? In a test of
100 items, where are most of the scores? Do they tend to group around a mean score of 50
or 80?
There are three measures of central tendency: the mean, the median and the mode.
Perhaps you are most familiar with the mean (often called the average), but the median and
the mode also describe the center of a distribution. Is there such a thing as a best
measure of central tendency?
If the measures of central tendency indicate where scores congregate, the measures of
variability indicate how spread out a group of scores is, that is, how varied the scores are or how far
they are from the mean. Common measures of dispersion or variability are the range, the interquartile
range, the variance and the standard deviation.
USMKCC-COL-F-050
When Not to Use the Mean
The mean has one main disadvantage. It is particularly susceptible to the influence of
outliers. These are values that are unusual compared to the rest of the data set by being
especially small or large in numerical value. For example, consider the scores of 10 Grade 12
students in a 100-item Statistics test below:
Student   1    2    3    4    5    6    7    8    9    10
Score     5    38   56   60   67   70   73   78   79   95
The mean score for these ten Grade 12 students is 62.1. However, inspecting the raw data
suggests that this mean may not accurately reflect the score of the
typical Grade 12 student, since most students have scores in the 56 to 79 range. The mean is
being skewed by the extreme scores, especially the very low score of 5. Therefore, in this situation, we
would like to have a better measure of central tendency. As we will find out later, the
median would be a better measure of central tendency in this situation.
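As a quick check of the figures above, here is a minimal Python sketch that computes both the mean and the median of the ten scores using the standard `statistics` module. The variable names are illustrative only.

```python
from statistics import mean, median

# Scores of the ten Grade 12 students from the table above.
scores = [5, 38, 56, 60, 67, 70, 73, 78, 79, 95]

mean_score = mean(scores)      # sum of the scores divided by how many there are
median_score = median(scores)  # average of the two middle scores (67 and 70)

print(mean_score)    # 62.1
print(median_score)  # 68.5
```

The median sits inside the range where most scores lie, while the mean is pulled toward the very low score of 5.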
Median
The median is the middle score for a set of scores arranged from lowest to highest. The
median is less affected by extremely low and extremely high scores than the mean. How do we find the median?
Suppose we have the following data:
65 55 89 56 35 14 56 55 87 45 92
To determine the median, first we have to rearrange the scores into order of magnitude
(from smallest to largest).
14 35 45 55 55 56 56 65 87 89 92
Our median is the score at the middle of the distribution, in this case 56. There are
5 scores before it and 5 scores after it. This works fine when you have an odd
number of scores, but what happens when you have an even number of scores? What if you had
10 scores like the scores below?
65 55 89 56 35 14 56 55 87 45
Arrange the data in order of magnitude (smallest to largest):
14 35 45 55 55 56 56 65 87 89
Then take the middle two scores (55 and 56) and compute the average of the two. The median is 55.5.
Although 55.5 is not itself one of the scores, it lies between the two middle scores, 55
and 56, and so gives a reliable picture of where the scores center.
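The procedure for odd and even numbers of scores can be sketched in Python as follows. The helper `middle_value` is a hypothetical name written for this illustration; the standard library function `statistics.median` gives the same results.

```python
def middle_value(scores):
    """Return the median: sort the scores, then take the middle one,
    or the average of the two middle ones when the count is even."""
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_scores = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]  # 11 scores
even_scores = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45]     # 10 scores

print(middle_value(odd_scores))   # 56
print(middle_value(even_scores))  # 55.5
```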
Mode
The mode is the most frequent score in our data set. On a histogram or bar chart it
is represented by the highest bar. If the data record the number of times each option is chosen in a
multiple-choice test, you can therefore consider the mode to be the most popular option.
Study the score distribution given below:
14 35 45 55 55 56 56 65 87 89
There are two most frequent scores, 55 and 56. So we have a score distribution with two
modes, hence a bimodal distribution.
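A short Python sketch of finding the mode or modes of this distribution, assuming Python 3.8 or later for `statistics.multimode`, which returns every value tied for the highest frequency:

```python
from statistics import multimode

scores = [14, 35, 45, 55, 55, 56, 56, 65, 87, 89]

# multimode lists all of the most frequent values, in order of first appearance.
modes = multimode(scores)

print(modes)       # [55, 56]
print(len(modes))  # 2, so the distribution is bimodal
```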
• If the mean is equal to the median and the median is equal to the mode, the score distribution
is symmetric, as in a perfectly normal distribution. This is illustrated by the perfect bell shape or normal
curve shown in Figure 13.
• If the mean is less than the median and the mode, the score distribution is a negatively
skewed distribution. See Figure 14. In a negatively skewed distribution, the scores tend to
congregate at the upper end of the score distribution.
Figure 14. Negatively Skewed Distribution
• If the mean is greater than the median and the mode, the score distribution is a positively
skewed distribution. See Figure 15. In a positively skewed distribution, the scores tend to
congregate at the lower end of the score distribution.
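The three rules above can be sketched as a rough check in Python. The helper `skew_direction` is hypothetical and compares only the mean with the median (it ignores the mode), so it should be read as a rule of thumb, not a formal test of skewness. The second data set is a mirrored copy of the first, made up purely for illustration.

```python
from statistics import mean, median

def skew_direction(scores):
    """Rule-of-thumb guess at the skew from the order of mean and median."""
    m, med = mean(scores), median(scores)
    if m < med:
        return "negatively skewed"
    if m > med:
        return "positively skewed"
    return "symmetric"

# The ten Statistics test scores used earlier in this chapter.
test_scores = [5, 38, 56, 60, 67, 70, 73, 78, 79, 95]
# Illustrative data only: each score mirrored around 50.
mirrored_scores = [100 - s for s in test_scores]

print(skew_direction(test_scores))      # negatively skewed
print(skew_direction(mirrored_scores))  # positively skewed
```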
If scores tend to be high because the teacher taught very well and the students are highly
motivated to learn, the score distribution tends to be negatively skewed, i.e., the scores will tend
to be high. On the other hand, when the teacher does not teach well and the students are poorly
motivated, the score distribution tends to be positively skewed, which means that scores tend to
be low. So which score distribution should we work for?
What is variability?
Variability refers to how “spread out” a group of scores is. The terms variability, spread,
and dispersion are synonyms, and refer to how spread out a distribution is. Here are two
score distributions:
A: 5, 5, 5, 5, 6, 6, 6, 6, 6, 6 (mean is 5.6)
B: 1, 3, 4, 5, 5, 6, 7, 8, 8, 9 (mean is 5.6)
The two score distributions have equal mean scores, and yet the scores vary differently. Score
distribution A shows scores that are less varied than those of score distribution B. That is what we mean
by variability or dispersion. If we study both score distributions, assuming that the highest
possible score in the quiz is 10, we can say that Groups A and B are equal in terms of mean, but
the scores in Group A are more similar to one another and closer to the mean, while the scores in
Group B are more varied. In fact, the lowest score in Group B (1) is extremely low compared to
the lowest score in Group A (5), and the highest score in Group B (9) is much higher
than the highest score in Group A (6).
To see more clearly what we mean by spread out, consider the graphs in Figures 16 and 17. These
graphs represent the scores on two quizzes. The mean score for each quiz is 7.0. Despite the
equality of means, you can see that the distributions are quite different. Specifically, the scores
on Quiz 1 are more densely packed and those on Quiz 2 are more spread out. The differences
among students were much greater on Quiz 2 than on Quiz 1.
Figure 16. Quiz 1 (horizontal axis: scores from 4 to 10)
Figure 17. Quiz 2 (horizontal axis: scores from 4 to 10)
Range
The range is the simplest measure of variability. The range is simply the highest score
minus the lowest score. Let’s take a few examples. What is the range of the
following group of scores: 10, 2, 5, 6, 7, 3, 4? The highest number is 10, and the lowest number
is 2, so 10 - 2 = 8. The range is 8.
Here are other examples:
Here is a set of scores in a test: 99, 45, 23, 67, 45, 91, 82, 78, 62, 51. What is the range?
The highest number is 99 and the lowest number is 23, so 99 - 23 = 76; the range is 76. Here
is another set of scores: 40, 40, 42, 50, 53, 56, 67, 68, 70, 89. What is the range? 89 - 40
= 49. The range is 49. The set of scores with a range of 76 is more varied or more spread out
than the set of scores with a range of 49.
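The three range calculations above can be reproduced with a few lines of Python. The function name `score_range` is chosen for this sketch only.

```python
def score_range(scores):
    """Range = highest score minus lowest score."""
    return max(scores) - min(scores)

print(score_range([10, 2, 5, 6, 7, 3, 4]))                     # 8
print(score_range([99, 45, 23, 67, 45, 91, 82, 78, 62, 51]))   # 76
print(score_range([40, 40, 42, 50, 53, 56, 67, 68, 70, 89]))   # 49
```

Because the range uses only the two extreme scores, a single unusual score can change it a great deal, which is one reason the variance and standard deviation are also used.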
Variance
Variability can also be defined in terms of how close the scores in the distribution are to
the middle of the distribution. Using the mean as the measure of the middle of the distribution, the
variance is defined as the average squared difference of the scores from the mean. The data from
Quiz 1 are shown in Table 1. The mean score is 7.0. Therefore, the column “Deviation from Mean”
contains the score minus 7. The column “Squared Deviation” is simply the previous column
squared.
One thing that is important to notice is that the mean deviation from the mean is 0. This
will always be the case. The mean of the squared deviations is 1.5. Therefore, the variance is 1.5.
The formula for the variance is:
$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$
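As a sketch of how this formula works in practice, the Python code below applies it to score distributions A and B from the discussion of variability earlier in this chapter (the Quiz 1 data are not reproduced here, so they are not used). The function name `population_variance` is an assumption of this sketch; the standard library's `statistics.pvariance` computes the same quantity.

```python
from statistics import mean

def population_variance(scores):
    """Average squared difference of the scores from their mean (divide by N)."""
    mu = mean(scores)
    squared_deviations = [(x - mu) ** 2 for x in scores]
    return sum(squared_deviations) / len(scores)

group_a = [5, 5, 5, 5, 6, 6, 6, 6, 6, 6]  # mean 5.6, scores close together
group_b = [1, 3, 4, 5, 5, 6, 7, 8, 8, 9]  # mean 5.6, scores spread out

print(round(population_variance(group_a), 2))  # 0.24
print(round(population_variance(group_b), 2))  # 5.64

# The deviations from the mean always add up to zero (up to rounding error).
total_deviation = sum(x - mean(group_b) for x in group_b)
print(abs(total_deviation) < 1e-9)  # True
```

Both groups have the same mean, but the variance of Group B is far larger, which matches the earlier observation that its scores are more varied.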
Standard Deviation
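To calculate the standard deviation of the numbers in this example (N = 20 values with
mean $\mu = 7$), start with this part of the formula:
$(x_i - \mu)^2$
So what is $x_i$? The $x_i$ are the individual x values 9, 2, 5, 4, 12, 7, etc. In other words,
$x_1 = 9$, $x_2 = 2$, $x_3 = 5$, etc.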
So it says “for each value, subtract the mean and square the result,” like this:
Example (continued):
(9 - 7)^2 = (2)^2 = 4
(2 - 7)^2 = (-5)^2 = 25
(5 - 7)^2 = (-2)^2 = 4
(4 - 7)^2 = (-3)^2 = 9
(12 - 7)^2 = (5)^2 = 25
(7 - 7)^2 = (0)^2 = 0
(8 - 7)^2 = (1)^2 = 1
…etc…
And we get these results:
4, 25, 4, 9, 25, 0, 1, 16, 4, 16, 0, 9, 25, 4, 9, 9, 4, 1, 4, 9
Step 3. Then work out the mean of those squared differences. To work out the mean,
add up all the values then divide by how many there are.
First add up all the values from the previous step.
But how do we say “add them all up” in mathematics? We use sigma: ∑
The handy sigma notation says to sum up as many terms as we want.
We want to add up all the values from 1 to N, where N = 20 in our case because there are
20 values:
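Example (continued):
$$\sum_{i=1}^{N} (x_i - \mu)^2$$
Example (continued): dividing that sum by N gives the mean of the squared differences:
$$\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$$
Taking the square root gives the standard deviation:
$$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}$$
$$\sigma = \sqrt{8.9} = 2.983\ldots$$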
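The arithmetic behind the value 8.9 can be checked with a short Python sketch. It starts from the 20 squared differences listed in the results above, so no data beyond what the example gives is assumed.

```python
from math import sqrt

# Squared differences from the mean (mu = 7), copied from the results above.
squared_diffs = [4, 25, 4, 9, 25, 0, 1, 16, 4, 16,
                 0, 9, 25, 4, 9, 9, 4, 1, 4, 9]

n = len(squared_diffs)              # N = 20
total = sum(squared_diffs)          # the sigma (sum) step
variance = total / n                # mean of the squared differences
sigma = sqrt(variance)              # population standard deviation

print(total)            # 178
print(variance)         # 8.9
print(round(sigma, 3))  # 2.983
```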
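When the data are a sample rather than the whole population, the formula for the standard
deviation becomes:
$$s = \sqrt{\frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2}$$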
The important change is “N-1” instead of “N” (which is called “Bessel’s correction”).
The symbols also change to reflect that we are working on a sample instead of the whole
population:
• The mean is now x̄ (for sample mean) instead of μ (the population mean),
• And the answer is s (for sample standard deviation) instead of σ.
But that does not affect the calculations. Only N-1 instead of N changes the calculations.
Here are the steps in calculating the Sample Standard Deviation:
Step 1. Work out the mean
Example 2: Using sampled values 9, 2, 5, 4, 12, 7
The mean is (9 + 2 + 5 + 4 + 12 + 7) / 6 = 39/6 = 6.5
So: x̄ = 6.5
Step 2. Then for each number: subtract the Mean and square the result
Example 2 (continued):
(9 - 6.5)^2 = (2.5)^2 = 6.25
(2 - 6.5)^2 = (-4.5)^2 = 20.25
(5 - 6.5)^2 = (-1.5)^2 = 2.25
(4 - 6.5)^2 = (-2.5)^2 = 6.25
(12 - 6.5)^2 = (5.5)^2 = 30.25
(7 - 6.5)^2 = (0.5)^2 = 0.25
Step 3. Then work out the mean of those squared differences. To work out the mean,
add up all the values then divide by how many there are.
But hang on… we are calculating the sample standard deviation, so instead of dividing by
how many (N), we will divide by N-1.
Example 2 (continued):
Sum = 6.25 + 20.25 + 2.25 + 6.25 + 30.25 + 0.25 = 65.5
Divide by N-1: (1/5) × 65.5 = 13.1
(This value is called the “sample variance”.)
Step 4. Take the square root of that:
Example 2 (concluded):
$$s = \sqrt{\frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2}$$
s = √(13.1) = 3.619…
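The four steps of Example 2 can be verified with Python's standard `statistics` module, whose `variance` and `stdev` functions divide by N-1 (the sample formulas). This is a minimal sketch for checking the hand calculation, not a replacement for working through the steps.

```python
from statistics import mean, stdev, variance

sample = [9, 2, 5, 4, 12, 7]

sample_mean = mean(sample)          # Step 1
sample_variance = variance(sample)  # Steps 2 and 3, dividing by N-1
sample_sd = stdev(sample)           # Step 4, the square root

print(sample_mean)                  # 6.5
print(round(sample_variance, 1))    # 13.1
print(round(sample_sd, 3))          # 3.619
```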
6.5 Comparing
When we used the whole population we got: Mean = 7, Standard Deviation = 2.983…
When we used the sample we got: Sample Mean = 6.5, Sample Standard Deviation =
3.619…
Our Sample Mean was wrong by 7%, and our Sample Standard Deviation was wrong by
21%.
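The two percentages can be reproduced as follows, assuming that “wrong by” means the difference between the sample value and the population value, taken relative to the population value:

```latex
\frac{|6.5 - 7|}{7} \approx 0.071 \approx 7\%
\qquad
\frac{|3.619 - 2.983|}{2.983} \approx 0.213 \approx 21\%
```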
Figure 18. Normal distribution with standard deviations of 5 and 10
Standard deviation is a measure of dispersion: the more dispersed the data, the less
consistent the data are. A lower standard deviation means that the data are more clustered
around the mean, and hence the data set is more consistent.
Read your calculator’s instructions to see what notation it uses
for standard deviation.
An example: standard deviation for a data set in which each value has a frequency of 1.
Using the following data: 10 15 13 25 22 53 47
We found the mean to be x̄ = 26.4285714. You should also see from the same calculation
that the standard deviation (SD) = 16.98879182. This is the sample standard deviation, which divides by N-1.
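If you do not have a calculator at hand, the same figures can be checked in Python. The sketch below also prints the population standard deviation (dividing by N) to show that the value quoted in this example corresponds to the sample formula (dividing by N-1).

```python
from statistics import mean, pstdev, stdev

data = [10, 15, 13, 25, 22, 53, 47]

data_mean = mean(data)
sample_sd = stdev(data)       # divides by N-1
population_sd = pstdev(data)  # divides by N

print(round(data_mean, 7))      # 26.4285714
print(round(sample_sd, 8))      # 16.98879182
print(round(population_sd, 2))  # 15.73
```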
Now, consider the standard deviations. Katie has a standard deviation of SD = 37.6470 and
Mike has a standard deviation of SD = 26.1017. Since Mike has a smaller standard deviation, he
is a more consistent bowler than Katie, i.e., Mike is more likely to get a score close to the mean of 201.4.
Let’s presume that Katie’s and Mike’s scores are scores in a long test:
Katie’s Scores - 189 146 200 241 231
Mike’s Scores - 235 201 217 168 186
If you compute the mean for both sets of scores, you get 201.4. The SD for Katie’s scores is
37.6470 while Mike’s is 26.1017. Mike’s scores indicate greater consistency than those of Katie.
Because the two means are equal, this does not mean that Mike scores higher on average; it
means that his scores stay closer to the mean and are therefore more predictable than
those of Katie.
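The comparison of the two sets of scores can be sketched in Python as follows. The standard deviations quoted above match the sample formula (dividing by N-1), which is what `statistics.stdev` computes; the dictionary names are illustrative only.

```python
from statistics import mean, stdev

scores = {
    "Katie": [189, 146, 200, 241, 231],
    "Mike": [235, 201, 217, 168, 186],
}

# Store (mean, sample standard deviation) for each student.
summary = {name: (mean(s), stdev(s)) for name, s in scores.items()}

for name, (avg, sd) in summary.items():
    print(f"{name}: mean = {avg}, SD = {sd:.4f}")
# Katie: mean = 201.4, SD = 37.6470
# Mike: mean = 201.4, SD = 26.1017

# The student with the smaller SD has the more consistent scores.
most_consistent = min(summary, key=lambda name: summary[name][1])
print(most_consistent)  # Mike
```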