Advance Statistics: by Agl
STATISTICS
By AGL
Statistics
Statistics is the study of the collection, organization,
analysis, interpretation and presentation of data. It
deals with all aspects of data, including the
planning of data collection in terms of the design
of surveys and experiments.
Statistics
The word statistics, when referring to the
scientific discipline, is singular, as in “Statistics is
an art”. This should not be confused with the word
statistic, referring to a quantity (such as mean or
median) calculated from a set of data, whose plural
is statistics (“this statistic seems wrong” or “these
statistics are misleading”).
Mean, Median, Mode and Range
Mean, median, and mode are the three kinds of
“averages”. There are many “averages” in statistics,
but these are, I think, the three most common, and
are certainly the three you are most likely to
encounter in your pre-statistics courses, if the
topic comes up at all.
Mean, Median, Mode and Range
The “mean” is the “average” you’re used to, where you add
up all the numbers and then divide by the number of numbers. The “median” is the “middle” value in the
list of numbers. To find the median, your numbers have to
be listed in numerical order, so you may have to rewrite
your list first. The “mode” is the value that occurs most
often. If no number is repeated, then there is no mode for
the list.
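The mean is the usual average: the nine values in the list below add up to 135, so the mean is 135 ÷ 9 = 15.
The median is the middle value, so I’ll have to rewrite the list in order: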
13, 13, 13, 13, 14, 14, 16, 18, 21
There are nine numbers in the list, so the middle one will be the
(9 + 1) ÷ 2 = 5th number:
13, 13, 13, 13, 14, 14, 16, 18, 21
So the median is 14.
The mode is the number that is repeated more often than any other, so
13 is the mode.
The largest value in the list is 21, and the smallest is 13, so the range is 21
– 13 = 8.
mean : 15
median : 14
mode : 13
range : 8
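The four results above can be checked with a few lines of Python. This is only a sketch using the standard `statistics` module; the variable names are my own, and the list is the sorted one from the example.

```python
import statistics

# The nine values from the worked example, already in numerical order.
values = [13, 13, 13, 13, 14, 14, 16, 18, 21]

mean = statistics.mean(values)           # sum of the values divided by how many there are
median = statistics.median(values)       # middle value of the sorted list
mode = statistics.mode(values)           # value that occurs most often
value_range = max(values) - min(values)  # largest value minus smallest value

print("mean :", mean)
print("median :", median)
print("mode :", mode)
print("range :", value_range)
```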
Note: The formula for the place to find the median is “([the number of
data points] + 1) ÷ 2”, but you don’t have to use this formula. You can just
count in from both ends of the list until you meet in the middle, if you
prefer. Either way will work.
Find the mean, median, mode and range for the following list of values:
1, 2, 4, 7
The mean is the usual average:
(1 + 2 + 4 + 7) ÷ 4 = 3.5
The median is the middle number. In this example, the numbers are already
listed in numerical order, so I don’t have to rewrite the list. But there is no
“middle” number, because there is an even number of numbers. In this case,
the median is the mean (the usual average) of the middle two values:
(2 + 4) ÷ 2 = 6 ÷ 2 = 3
The mode is the number that is repeated most often, but all the numbers in this
list appear only once, so there is no mode.
The largest value in this list is 7, the smallest is 1, and their difference is 6, so the
range is 6.
mean : 3.5
median : 3
mode : none
range : 6
The list values were whole numbers, but the mean was a decimal value.
Getting a decimal value for the mean (or for the median, if you have an
even number of data points) is perfectly okay; don’t round your
answers to try to match the format of the other numbers.
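The "position" rule for the median can be sketched in Python as follows. The function name `median_by_position` is hypothetical (it is not from any library). It assumes that a whole-number position points straight at the median, and that a position ending in .5 means averaging the two values on either side.

```python
def median_by_position(values):
    """Median via the (n + 1) / 2 position rule, counting positions from 1."""
    ordered = sorted(values)          # the list must be in numerical order
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty list is undefined")

    position = (n + 1) / 2            # e.g. 5.0 for nine values, 2.5 for four
    lower = ordered[int(position) - 1]
    if position.is_integer():
        return lower                  # odd count: a single middle value
    upper = ordered[int(position)]
    return (lower + upper) / 2        # even count: mean of the two middle values


print(median_by_position([13, 13, 13, 13, 14, 14, 16, 18, 21]))
print(median_by_position([1, 2, 4, 7]))
```

The second call returns a decimal value even though every value in the list is a whole number, which is the behaviour the note above describes.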
Find the mean, median, mode and range for the following list of
values:
8, 9, 10, 10, 10, 11, 11, 11, 12, 13
The mean is the usual average:
(8 + 9 + 10 + 10 + 10 + 11 + 11 + 11 + 12 + 13) ÷ 10 = 105 ÷ 10 = 10.5
The median is the middle value. In a list of ten values, that will be the
(10 + 1) ÷ 2 = 5.5th value; that is, I’ll need to average the fifth and sixth
numbers to find the median:
(10 + 11) ÷ 2 = 21 ÷ 2 = 10.5
The mode is the number repeated most often. This list has two values (10 and 11) that
are each repeated three times, so both are modes.
The largest value is 13 and the smallest is 8, so the range is 13 – 8 = 5.
mean : 10.5
median : 10.5
mode : 10 and 11
range : 5
While unusual, it can happen that two of the averages (the mean and the
median, in this case) will have the same value.
Note : Depending on your text or your instructor, the above data set may be
viewed as having no mode (rather than two modes), since no single solitary
number was repeated more often than any other. I’ve seen books that go
either way; there doesn’t seem to be consensus on the “right” definition of
“mode” in the above case. So if you’re not certain how you should answer the
“mode” part of the above example, ask your instructor before the next test.
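Here is a small Python sketch of the "two modes" reading, using `statistics.multimode` (available in Python 3.8 and later). The helper name `modes_or_none` is hypothetical, and the rule that a list with no repeated number has no mode is taken from the definition earlier in these notes. If your instructor uses the "no mode" reading for ties, the function would need changing.

```python
import statistics


def modes_or_none(values):
    """Return every value tied for most frequent, or [] if nothing repeats."""
    if len(set(values)) == len(values):
        return []                          # no number is repeated: no mode
    return statistics.multimode(values)    # all values sharing the highest count


print(modes_or_none([8, 9, 10, 10, 10, 11, 11, 11, 12, 13]))
print(modes_or_none([1, 2, 4, 7]))
```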
About the only hard part of finding the mean, median, and mode is keeping
straight which “average” is which. Just remember the following:
mean : regular meaning of “average”
median : middle value
mode : most often
Median : For grouped data
- The median is the midpoint of the data array.
- Median is located in the middle value of the frequency distribution. It
is the value that separates the upper half of the distribution from the
lower half.
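Median (ranked value) = N ÷ 2
Median = LB + [(N ÷ 2 – cf) ÷ f] × i
where LB is the lower boundary of the median class, f is its frequency, cf is the
cumulative frequency before the median class, and i is the class interval.
Class Limits    f    cf
18 - 26         3     3
27 - 35         5     8
36 - 44         9    17
45 - 53        14    31
54 - 62        11    42
63 - 71         6    48
72 - 80         2    50
Step 1 : Determine the median class
Median (ranked value) = 50 ÷ 2 = 25
The class 45 - 53 covers the 18th to 31st ranks in the frequency distribution. The 25th
rank belongs in this class.
LB = 45 – 0.5 = 44.50 ; f = 14 ; cf = 17 ; i = 9
Step 4 : Median :
Median = 44.50 + [(25 – 17) ÷ 14] × 9 ≈ 49.64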
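The grouped-median steps can be sketched in Python. This is a sketch under assumptions, not a definitive method: it uses the usual interpolation formula Median = LB + [(N ÷ 2 – cf) ÷ f] × i, takes class limits to be whole numbers so that the lower boundary is the lower limit minus 0.5, and uses names (`classes`, `grouped_median`) that are my own.

```python
# Each class is (lower limit, upper limit, frequency), from the table above.
classes = [
    (18, 26, 3),
    (27, 35, 5),
    (36, 44, 9),
    (45, 53, 14),
    (54, 62, 11),
    (63, 71, 6),
    (72, 80, 2),
]


def grouped_median(classes):
    """Interpolated median of a frequency distribution with integer class limits."""
    total = sum(freq for _, _, freq in classes)   # N
    if total == 0:
        raise ValueError("the distribution has no observations")
    rank = total / 2                              # ranked value of the median

    cf_before = 0                                 # cumulative frequency before the class
    for lower, upper, freq in classes:
        if freq > 0 and cf_before + freq >= rank:
            lower_boundary = lower - 0.5          # LB
            width = upper - lower + 1             # class interval i
            return lower_boundary + (rank - cf_before) / freq * width
        cf_before += freq
    raise ValueError("median class not found")


print(round(grouped_median(classes), 2))
```

For this table the median class is 45 - 53, so the function works from LB = 44.5, cf = 17, f = 14 and i = 9.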
B. Quartiles for grouped data :
Qk = LB + [(kN ÷ 4 – cf) ÷ f] × i
Where:
Qk = Quartile
N = Population (total frequency)
k = Quartile location (1, 2 or 3)
LB = Lower Boundary of the quartile class
f = frequency of the quartile class
cf = cumulative frequency before the quartile class
i = class interval
Example: Using the same frequency distribution, determine Q3.
Class Limits    f    cf
18 - 26         3     3
27 - 35         5     8
36 - 44         9    17
45 - 53        14    31
54 - 62        11    42
63 - 71         6    48
72 - 80         2    50
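1. Q3 (ranked value) = 3N ÷ 4 = 3(50) ÷ 4 = 37.5
2. The class 54 - 62 covers the 32nd to 42nd ranks, so the 37.5th rank belongs in this class.
3. LB = 54 – 0.5 = 53.50 ; f = 11 ; cf = 31 ; i = 9
4. Q3 = 53.50 + [(37.5 – 31) ÷ 11] × 9 ≈ 58.82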
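The quartile calculation can be sketched the same way. This version builds the cf column with `itertools.accumulate` and finds the quartile class with `bisect_left`. It assumes the usual formula Qk = LB + [(kN ÷ 4 – cf) ÷ f] × i, whole-number class limits and no empty classes. The names `classes` and `grouped_quartile` are hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate

# Each class is (lower limit, upper limit, frequency), from the table above.
classes = [
    (18, 26, 3),
    (27, 35, 5),
    (36, 44, 9),
    (45, 53, 14),
    (54, 62, 11),
    (63, 71, 6),
    (72, 80, 2),
]


def grouped_quartile(classes, k):
    """Interpolated k-th quartile (k = 1, 2 or 3) of a frequency distribution."""
    if k not in (1, 2, 3):
        raise ValueError("k must be 1, 2 or 3")

    cumulative = list(accumulate(freq for _, _, freq in classes))  # the cf column
    rank = k * cumulative[-1] / 4              # ranked value kN / 4
    index = bisect_left(cumulative, rank)      # first class whose cf reaches the rank

    lower, upper, freq = classes[index]
    cf_before = cumulative[index - 1] if index > 0 else 0
    width = upper - lower + 1                  # class interval i
    return (lower - 0.5) + (rank - cf_before) / freq * width


print(round(grouped_quartile(classes, 3), 2))
```

With k = 2 the same function gives the grouped median, since the ranked value 2N ÷ 4 is N ÷ 2.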
C. Deciles & Percentiles for grouped data
Class Limits    f    cf
18 - 26         3     3
27 - 35         5     8
36 - 44         9    17
45 - 53        14    31
54 - 62        11    42
63 - 71         6    48
72 - 80         2    50
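The formulas for this part are not shown in these notes. Assuming deciles and percentiles follow the same interpolation pattern as the quartiles, with the ranked value changed to kN ÷ 10 and kN ÷ 100, they can be written as below. The symbols D_k and P_k are my own labels. LB, f, cf and i have the same meanings as in the quartile formula, applied to the decile or percentile class.

```latex
D_k = LB + \left( \frac{\dfrac{kN}{10} - cf}{f} \right) i, \qquad k = 1, 2, \dots, 9

P_k = LB + \left( \frac{\dfrac{kN}{100} - cf}{f} \right) i, \qquad k = 1, 2, \dots, 99
```

Under this assumption, D_5 and P_50 both use the ranked value N ÷ 2, so they match the grouped median.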