Introduction To Statistics
Mean
You can use the terms 'mean' and 'average' interchangeably. Strictly speaking,
'mean' is the correct term; 'average' has a more general meaning and can
refer to any value that typifies a set of numbers.
You may already know how to calculate the mean, but I want to use the explanation to make a
few points. Suppose you want to calculate the mean of five values, say:
1 8 3 6 2
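Add the values together and divide by the number of values:
\[ \bar{x} = \frac{1 + 8 + 3 + 6 + 2}{5} = \frac{20}{5} = 4 \]
I've introduced a bit of notation: the mean is represented by the symbol x̄. The bar over the
'x' is a standard way of representing a mean.
We say 'x-bar', and that's the way I'll write it in the text from now on.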
Now let's introduce some more notation. We'll represent the number of values (five in this case)
by the variable n. Although we could use other letters, n is the usual first choice when we
want to represent the number of values in a sample.
We'll also replace the individual values with x_i, where the subscript i takes a number to
identify each individual value (i stands for index):
\[ \bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} \]
This formula can be used to find the mean of any number of values. Just replace the symbols
x_1, x_2 and so on up to x_n with the actual data values.
The symbol Σ is the Greek capital letter sigma, and means summation in 'mathspeak':
\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \]
In full, the notation means 'add all the values of x_i from i equals 1 to i equals n'.
Note that the i = 1 and i = n are sometimes omitted if they are self-evident.
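The summation notation maps directly onto a loop. The following is a minimal Python sketch, not part of the original text; the function name mean_by_summation is made up for illustration. It adds up x_i for i = 1 to n and divides by n, using the five values from the start of this section.

```python
def mean_by_summation(values):
    """Return x-bar: the sum of x_i for i = 1..n, divided by n."""
    n = len(values)
    if n == 0:
        raise ValueError("the mean of an empty data set is undefined")
    total = 0
    for i in range(1, n + 1):    # i runs from 1 to n, as in the formula
        total += values[i - 1]   # Python lists start at 0, so x_i is values[i - 1]
    return total / n


data = [1, 8, 3, 6, 2]
x_bar = mean_by_summation(data)
print(x_bar)  # 4.0
```

In everyday Python you would simply write sum(data) / len(data); the explicit loop is only there to mirror the index i in the formula.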
The times taken to repair breakdowns of critical equipment, in hours, over the
past week were as follows:
5 4 3 2 5 5
One way of picturing the mean is to imagine that weights are placed along a plank, with the distance
along the plank representing the data value. The mean is at the point of balance.
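The balance-point picture can be checked numerically: the distances of the values below the mean exactly cancel the distances of the values above it, so the deviations from the mean add up to zero. Here is a short Python sketch using the repair times listed above; the variable names are chosen for illustration only.

```python
repair_times = [5, 4, 3, 2, 5, 5]  # hours, from the repair-time example

# The mean: sum of the values divided by the number of values.
x_bar = sum(repair_times) / len(repair_times)

# Signed distance of each value from the mean (negative = left of the balance point).
deviations = [x - x_bar for x in repair_times]

print(x_bar)            # 4.0
print(sum(deviations))  # 0.0
```

With other data sets the sum of the deviations may come out as a tiny non-zero number because of floating-point rounding, so compare it with zero using a small tolerance rather than exact equality.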
Median
The mean is a 'measure of central tendency'. It is a single value that attempts to tell you the
position of the 'center' of the data set. There are several other measures of central tendency that
try to capture the same idea; the two most important are the 'median' and the 'mode'.
Consider the following example. Valuations gathered from recent house sales in a particular
neighborhood were as follows:
20 24 22 65 20 20
In this case there is an even number of values, and so no middle value. The median is the mean
of the two middle values, after the data are sorted into order of magnitude:
20 20 20 22 24 65
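The rule just described (sort the data, then take the middle value, or the mean of the two middle values when the count is even) can be sketched in Python as follows. The function name median is an illustrative choice, not something from the original text; the data are the house valuations above.

```python
def median(values):
    """Return the middle value of the sorted data, or the mean of the
    two middle values when the number of values is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd count: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: mean of the two middle values


valuations = [20, 24, 22, 65, 20, 20]
print(median(valuations))                 # 21.0
print(sum(valuations) / len(valuations))  # 28.5
```

The two middle values are 20 and 22, giving a median of 21. The mean of the same data is 28.5, pulled upward by the single valuation of 65, which is why the two measures can tell different stories about the same data.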
The American Medical Association (AMA) and the American Bar Association
(ABA) had a dispute about the rising cost of malpractice insurance for
doctors. The doctors used the 'mean' to show a sharp rise in cost over the
period concerned. The lawyers used the 'median' to show there had been no
increase in cost.
The 'Average Net Worth' or the 'Median Net Worth' is often used to measure
the prosperity of the community.
Which do you think is the better measure, and do you think there would be
much difference?
Mode
The previous example considered the ages of people presenting with a particular medical
condition. An alternative measure of central tendency that could be used for this example is the
'mode'. The mode is the most common value in the data set:
20 24 22 65 20 20
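Finding the mode amounts to counting how often each value occurs and picking the value with the highest count. Below is a minimal Python sketch applied to the data set listed above; the function name modes is hypothetical, and it returns a list because a data set can have more than one most common value.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs with the highest frequency."""
    if not values:
        raise ValueError("the mode of an empty data set is undefined")
    counts = Counter(values)          # maps each value to its number of occurrences
    highest = max(counts.values())    # the largest occurrence count
    return [value for value, count in counts.items() if count == highest]


data = [20, 24, 22, 65, 20, 20]
print(modes(data))  # [20]
```

The value 20 occurs three times and every other value occurs once, so the mode is 20.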
The 'mode' is used with 'categorical' data, that is, data that can be placed into discrete categories
rather than measured on a continuous scale. The categories can be either numbers or names. For
example, one hundred patients were asked to rate the quality of nursing care in a clinic. The
results were: