Standard Deviation Formulas
Standard Deviation
The Standard Deviation is a measure of how spread out numbers are.
Its symbol is σ (the Greek letter sigma).
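This is the formula for Standard Deviation:
$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}$$
where N is the number of values, $x_i$ is each value and $\mu$ is their mean.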
Step 1. Work out the Mean (the simple average of the numbers)
In our example, the Mean of the 20 values is 7.
Step 2. Then for each number: subtract the Mean and square the result
This is the part of the formula that says:
$$(x_i - \mu)^2$$
Example (continued):
$(9 - 7)^2 = (2)^2 = 4$
$(2 - 7)^2 = (-5)^2 = 25$
$(5 - 7)^2 = (-2)^2 = 4$
$(4 - 7)^2 = (-3)^2 = 9$
$(12 - 7)^2 = (5)^2 = 25$
$(7 - 7)^2 = (0)^2 = 0$
$(8 - 7)^2 = (1)^2 = 1$
... etc ...
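As a minimal Python sketch of Step 2, here are the squared differences for the seven values shown above. Only these seven are listed in the text, so the other 13 values are left out; the mean of 7 comes from Step 1.

```python
# Step 2 for the seven values shown above (the remaining 13 are not listed here).
mean = 7  # population mean from Step 1
shown_values = [9, 2, 5, 4, 12, 7, 8]

# Subtract the mean from each value, then square the result.
squared_differences = [(x - mean) ** 2 for x in shown_values]
print(squared_differences)  # [4, 25, 4, 9, 25, 0, 1]
```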
Step 3. Then work out the mean of those squared differences.
Sigma Notation
We want to add up all the squared differences from i = 1 to N, where N = 20 in our case because there are 20 values:
$$\sum_{i=1}^{N}(x_i - \mu)^2$$
Example (continued):
We already calculated $(x_1 - 7)^2 = 4$, etc. in the previous step, so we just sum them:
$\sum_{i=1}^{20}(x_i - 7)^2 = 4+25+4+9+25+0+1+16+4+16+0+9+25+4+9+9+4+1+4+9 = 178$
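Sigma notation is just a loop that adds one term for each i from 1 to N. A small Python sketch using the 20 squared differences listed above:

```python
# The 20 squared differences (x_i - 7)^2 listed above, for i = 1..20.
squared_differences = [4, 25, 4, 9, 25, 0, 1, 16, 4, 16,
                       0, 9, 25, 4, 9, 9, 4, 1, 4, 9]
N = len(squared_differences)  # 20 values

# Sigma notation as a loop: add term i for i = 1, 2, ..., N.
total = 0
for i in range(1, N + 1):
    total += squared_differences[i - 1]  # Python lists start at index 0

print(total)  # 178
```

In everyday code the loop is usually written as `sum(squared_differences)`, which gives the same result.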
But that isn't the mean yet. We need to divide by how many values there are, which is simply done by multiplying by $\frac{1}{N}$:
$$\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$$
Example (continued):
$\frac{1}{20} \times 178 = 8.9$
(This value is called the "Variance")
Step 4. Take the square root of that and you are done!
Example (concluded):
$\sigma = \sqrt{8.9} = 2.983\ldots$
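Steps 3 and 4 take only a couple of lines. The sketch below also wraps all four steps into a helper, `population_std` (a name chosen here for illustration, not from the text), and, as a sanity check, compares it with the standard library's `statistics.pstdev` on a small made-up list.

```python
import math
import statistics

def population_std(values):
    """Population standard deviation (divide by N), following Steps 1-4."""
    n = len(values)
    mu = sum(values) / n                         # Step 1: the mean
    squared = [(x - mu) ** 2 for x in values]    # Step 2: squared differences
    variance = sum(squared) / n                  # Step 3: mean of the squared differences
    return math.sqrt(variance)                   # Step 4: square root

# Steps 3 and 4 for the rose-bush example, starting from the sum 178.
variance = 178 / 20
sigma = math.sqrt(variance)
print(variance, round(sigma, 3))  # 8.9 2.983

# Sanity check against the standard library on a small made-up list.
check = [1, 2, 3, 4]
print(math.isclose(population_std(check), statistics.pstdev(check)))  # True
```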
Example 2: Sam has 20 rose bushes, but what if Sam only counted the flowers on 6 of them?
The "population" is all 20 rose bushes,
and the "sample" is the 6 he counted. Let us say they were:
9, 2, 5, 4, 12, 7
We can still estimate the Standard Deviation.
But when we use the sample as an estimate of the whole population, the Standard Deviation formula changes. The formula for Sample Standard Deviation is:
$$s = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})^2}$$
The important change is "N-1" instead of "N" (which is called "Bessel's correction").
The symbols also change to reflect that we are working on a sample instead of the whole population:
The mean is now $\bar{x}$ (for sample mean) instead of $\mu$ (the population mean),
and the answer is $s$ (for Sample Standard Deviation) instead of $\sigma$.
These symbol changes do not affect the calculations. Only N-1 instead of N changes the calculations.
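To see what Bessel's correction changes in practice, this sketch computes the sum of squared differences for Sam's 6 counted bushes once, then divides it by N and by N-1. The standard library's `statistics.pvariance` (divide by N) and `statistics.variance` (divide by N-1) are printed alongside for comparison.

```python
import statistics

sample = [9, 2, 5, 4, 12, 7]                        # the 6 bushes Sam counted
n = len(sample)
x_bar = sum(sample) / n                             # sample mean: 6.5
ss = sum((x - x_bar) ** 2 for x in sample)          # sum of squared differences: 65.5

divide_by_n = ss / n                 # as if the 6 bushes were the whole population
divide_by_n_minus_1 = ss / (n - 1)   # Bessel's correction

print(round(divide_by_n, 4), round(divide_by_n_minus_1, 4))  # 10.9167 13.1
# The standard library uses the same two divisors:
print(statistics.pvariance(sample), statistics.variance(sample))
```

Dividing by the smaller number N-1 gives a slightly larger result, which compensates for the sample mean sitting closer to the sampled values than the true population mean does.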
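OK, let us now calculate the Sample Standard Deviation:
Step 1. Work out the mean
Example 2: the mean of the sampled values 9, 2, 5, 4, 12, 7 is
$\bar{x} = \frac{9 + 2 + 5 + 4 + 12 + 7}{6} = \frac{39}{6} = 6.5$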
Step 2. Then for each number: subtract the Mean and square the result
Example 2 (continued):
$(9 - 6.5)^2 = (2.5)^2 = 6.25$
$(2 - 6.5)^2 = (-4.5)^2 = 20.25$
$(5 - 6.5)^2 = (-1.5)^2 = 2.25$
$(4 - 6.5)^2 = (-2.5)^2 = 6.25$
$(12 - 6.5)^2 = (5.5)^2 = 30.25$
$(7 - 6.5)^2 = (0.5)^2 = 0.25$
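Step 3. Then work out the mean of those squared differences, dividing by N-1 instead of N
Example 2 (continued):
Sum $= 6.25 + 20.25 + 2.25 + 6.25 + 30.25 + 0.25 = 65.5$
Divide by N-1: $\frac{1}{5} \times 65.5 = 13.1$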
(This value is called the "Sample Variance")
Step 4. Take the square root of that and you are done!
Example 2 (concluded):
$s = \sqrt{13.1} = 3.619\ldots$
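All four sample steps can be wrapped into one helper. `sample_std` below is a hypothetical name used only for this sketch; Python's `statistics.stdev` also divides by N-1, so it should agree.

```python
import math
import statistics

def sample_std(values):
    """Sample standard deviation s: the same steps as before, but divide by N-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    x_bar = sum(values) / n                          # Step 1: sample mean
    squared = [(x - x_bar) ** 2 for x in values]     # Step 2: squared differences
    sample_variance = sum(squared) / (n - 1)         # Step 3: divide by N-1
    return math.sqrt(sample_variance)                # Step 4: square root

sample = [9, 2, 5, 4, 12, 7]
s = sample_std(sample)
print(round(s, 3))                         # 3.619
print(round(statistics.stdev(sample), 3))  # 3.619
```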
Comparing
When we used the whole population we got: Mean = 7, Standard Deviation = 2.983...
When we used the sample we got: Sample Mean = 6.5, Sample Standard Deviation = 3.619...
Our Sample Mean was wrong by 7%, and our Sample Standard Deviation was wrong by 21%.
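The two percentages can be reproduced from the values already worked out. In this sketch the population figures come from the sum 178 over 20 bushes and the sample figures from the sum 65.5 over 6 bushes (divided by N-1 = 5):

```python
import math

population_mean = 7
population_sd = math.sqrt(178 / 20)   # 2.983...
sample_mean = 6.5
sample_sd = math.sqrt(65.5 / 5)       # 3.619...

# Relative difference from the population value, as a percentage.
mean_error = abs(sample_mean - population_mean) / population_mean * 100
sd_error = abs(sample_sd - population_sd) / population_sd * 100
print(round(mean_error), round(sd_error))  # 7 21
```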
Mean
The mean (or average) of a set of data values is the sum of all of the data values divided by the number of data values. That is:
$$\text{Mean} = \frac{\text{sum of all data values}}{\text{number of data values}}$$
Example 1
The marks of seven students in a mathematics test with a maximum possible mark of 20 are given
below:
15 13 18 16 14 17 12
Find the mean of this set of data values.
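Solution:
$\text{Mean} = \frac{15 + 13 + 18 + 16 + 14 + 17 + 12}{7} = \frac{105}{7} = 15$
So, the mean mark is 15.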
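The same calculation as a quick Python check (sum of the values divided by how many there are):

```python
marks = [15, 13, 18, 16, 14, 17, 12]
mean = sum(marks) / len(marks)  # sum of data values / number of data values
print(mean)  # 15.0
```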
Median
The median of a set of data values is the middle value of the data set when it has been arranged in ascending order. That is, from the smallest value to the largest value.
Example 2
The marks of nine students in a geography test that had a maximum possible mark of 50 are given
below:
47 35 37 32 38 39 36 34 35
Find the median of this set of data values.
Solution:
Arrange the data values in order from the lowest value to the highest value:
32 34 35 35 36 37 38 39 47
The fifth data value, 36, is the middle value in this arrangement.
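Sorting and then taking the middle position is easy to sketch in Python. Note that Python counts list positions from 0, so the fifth value is at index 4:

```python
marks = [47, 35, 37, 32, 38, 39, 36, 34, 35]
ordered = sorted(marks)            # [32, 34, 35, 35, 36, 37, 38, 39, 47]
middle_index = len(ordered) // 2   # 9 values -> index 4, i.e. the 5th value
print(ordered[middle_index])       # 36
```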
Note:
In general:
If the number of values in the data set is odd, then the median is the single middle value.
If the number of values in the data set is even, then the median is the average of the two middle values.
Example 3
Arrange the data values in order from the lowest value to the highest value:
10 12 13 16 17 18 19 21
The number of values in the data set is 8, which is even. So, the median is the average of the two middle values:
$\text{Median} = \frac{16 + 17}{2} = 16.5$
Alternative way:
The fourth and fifth scores, 16 and 17, are in the middle, so there is no single middle value. The median is halfway between them: 16.5.
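A sketch of a general median helper that covers both the odd and the even case. `median` here is a hypothetical helper written for illustration; the standard library's `statistics.median` follows the same rule and is printed for comparison.

```python
import statistics

def median(values):
    """Middle value of the sorted data; average of the two middle values if the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd count: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # even count: e.g. 4th and 5th of 8

data = [10, 12, 13, 16, 17, 18, 19, 21]
print(median(data))             # 16.5
print(statistics.median(data))  # 16.5
```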
Note:
Half of the values in the data set lie below the median and half lie above the
median.
The median is the most commonly quoted figure used to measure property prices. Using the median avoids a problem with the mean property price, which can be pulled upward by a few expensive properties that are not representative of the general property market.
Mode
The mode of a set of data values is the value or values that occur most often.
The mode has applications in printing. For example, it is important to print more copies of the most popular books, because printing different books in equal numbers would cause a shortage of some books and an oversupply of others.
Likewise, the mode has applications in manufacturing. For example, it is important to manufacture more of the most popular shoes, because manufacturing different shoes in equal numbers would cause a shortage of some shoes and an oversupply of others.
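Example 4
Find the mode of the following data values:
44 48 45 42 49 48
Solution:
The value 48 occurs twice and every other value occurs only once, so the mode is 48.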
It is possible for a set of data values to have more than one mode.
If two data values occur equally often, and more often than any other value, we say that the set of data values is bimodal.
If no data value occurs more often than the others, we say that the set of data values has no mode.
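The rules for one mode, several modes and no mode can be sketched with `collections.Counter`. The helper `modes` and the small lists in the last two usage lines are hypothetical illustrations; treating "every distinct value ties" as "no mode" is an assumption that follows the definition above.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs most often; [] if all values occur equally often."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    most_common = [value for value, count in counts.items() if count == highest]
    # Assumption: if every distinct value ties (and there is more than one), report "no mode".
    if len(most_common) == len(counts) and len(counts) > 1:
        return []
    return sorted(most_common)

print(modes([44, 48, 45, 42, 49, 48]))  # [48]
print(modes([1, 1, 2, 2, 3]))           # [1, 2]  (bimodal)
print(modes([1, 2, 3]))                 # []      (no mode)
```

Python 3.8+ also provides `statistics.multimode`, which returns every tied value, so it reports `[1, 2, 3]` rather than an empty list when all values occur equally often.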
Analysing Data
The mean, median and mode of a data set are collectively known as measures of central tendency as
these three measures focus on where the data is centred or clustered. To analyse data using the mean,
median and mode, we need to use the most appropriate measure of central tendency. The following
points should be remembered:
The mean is useful for predicting future results when there are no extreme values in the data set. However, the impact of extreme values on the mean may be important and should be considered. For example, consider the impact of a stock market crash on average investment returns.
The median may be more useful than the mean when there are extreme values in the data set, as it is largely unaffected by those extreme values.
The mode is useful when the most common item, characteristic or value of a data
set is required.
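To illustrate the point about extreme values, this sketch takes the geography marks from the median example and swaps the 47 for a made-up extreme value of 470 (hypothetical, for illustration only). The mean jumps while the median does not move:

```python
import statistics

marks = [47, 35, 37, 32, 38, 39, 36, 34, 35]   # geography marks from the median example
# Hypothetical: replace the 47 with a made-up extreme value of 470.
with_outlier = [470 if m == 47 else m for m in marks]

for label, data in [("original", marks), ("with outlier", with_outlier)]:
    mean = sum(data) / len(data)
    print(label, mean, statistics.median(data))
# original 37.0 36
# with outlier 84.0 36
```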