Mean Mode Median - Merged
2 ARITHMETIC MEAN
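The arithmetic mean is obtained by adding all the observations and dividing the sum by the number of observations. Suppose we have the following observations:
10, 15, 30, 7, 42, 79, 83
These are seven observations. Symbolically, the arithmetic mean, also called simply the mean, is
$$\bar{x} = \frac{\sum x}{n} = \frac{10 + 15 + 30 + 7 + 42 + 79 + 83}{7} = \frac{266}{7} = 38$$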
It may be noted that the Greek letter μ is used to denote the mean of the population and N to denote the total number of observations in a population. Thus the population mean is $\mu = \sum x / N$. The formula given above is the basic formula that forms the definition of the arithmetic mean and is used in the case of ungrouped data where weights are not involved.
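A minimal Python sketch of the basic formula for ungrouped data, reproducing the seven-observation example. The function name `arithmetic_mean` is our own illustrative choice, not something defined in the text.

```python
def arithmetic_mean(values):
    """Simple (unweighted) mean: sum of the observations divided by their number."""
    if len(values) == 0:
        raise ValueError("at least one observation is required")
    return sum(values) / len(values)


observations = [10, 15, 30, 7, 42, 79, 83]
print(arithmetic_mean(observations))  # 38.0
```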
In the case of ungrouped data where weights are involved, our approach for calculating the arithmetic mean will be different.
Example 2.1: Suppose a student has secured the following marks in three tests:
Mid-term test 30
Laboratory 25
Final 20
The simple arithmetic mean will be
$$\frac{30 + 25 + 20}{3} = 25$$
However, this will be wrong if the three tests carry different weights on the basis of their relative importance. Assume that the weights assigned to the three tests are:
Mid-term 2 points
Laboratory 3 points
Final 5 points
Solution: On the basis of this information, we can now calculate a weighted mean as shown below:
Test         Weight (w)   Marks (x)   wx
Mid-term     2            30          60
Laboratory   3            25          75
Final        5            20          100
Total        ∑w = 10                  ∑wx = 235
$$\bar{x}_w = \frac{\sum wx}{\sum w} = \frac{w_1 x_1 + w_2 x_2 + w_3 x_3}{w_1 + w_2 + w_3} = \frac{60 + 75 + 100}{2 + 3 + 5} = 23.5 \text{ marks}$$
It will be seen that weighted mean gives a more realistic picture than the simple or
unweighted mean.
Example 2.2: In a period of falling prices in the stock exchange, a stock is sold at Rs 120 per share on one day, Rs 105 on the next and Rs 90 on the third day. An investor has purchased 50 shares on the first day, 80 shares on the second day and 100 shares on the third day. What is the average price paid by the investor per share?
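Solution:
Day   Price per share, Rs (x)   No. of shares purchased (w)   Amount paid, Rs (wx)
1     120                       50                            6,000
2     105                       80                            8,400
3     90                        100                           9,000
Total                           ∑w = 230                      ∑wx = 23,400
$$\text{Weighted average} = \frac{w_1 x_1 + w_2 x_2 + w_3 x_3}{w_1 + w_2 + w_3} = \frac{\sum wx}{\sum w} = \frac{23{,}400}{230} = \text{Rs } 101.74$$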
It will be seen that if merely the prices of the shares for the three days (regardless of the number of shares purchased) were taken into consideration, then the average price would be
$$\text{Rs } \frac{120 + 105 + 90}{3} = \text{Rs } 105$$
As this simple average ignores the number of shares purchased, it fails to give a correct picture. A simple average, it may be noted, is also a weighted average where the weight in each case is the same, that is, 1. When we use the term average alone, we always mean an unweighted or simple average.
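The weighted mean ∑wx/∑w can be sketched in a few lines of Python; the helper name `weighted_mean` is hypothetical. It reproduces Example 2.1 and Example 2.2, and shows that equal weights of 1 give back the simple average of Rs 105.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(w * x) / sum(w)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Example 2.1: marks in mid-term, laboratory and final tests, weights 2, 3, 5
print(weighted_mean([30, 25, 20], [2, 3, 5]))  # 23.5

# Example 2.2: share prices weighted by the number of shares purchased
prices = [120, 105, 90]
shares = [50, 80, 100]
print(round(weighted_mean(prices, shares), 2))  # 101.74

# A simple average is a weighted average with every weight equal to 1
print(weighted_mean(prices, [1, 1, 1]))  # 105.0
```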
For grouped data, the arithmetic mean may be calculated by applying any of the following methods:
(i) Direct method
(ii) Short-cut method
(iii) Step-deviation method
In the case of the direct method, the formula $\bar{x} = \sum fm / n$ is used. Here m is the mid-point of the various classes, f is the frequency of each class and n is the total number of frequencies. The calculation of the arithmetic mean by the direct method is shown below.
Example 2.3: The following table gives the marks of 58 students in Statistics.
Solution:
Marks   Mid-point (m)   No. of students (f)   fm
0-10    5               4                     20
10-20   15              8                     120
20-30   25              11                    275
30-40   35              15                    525
40-50   45              12                    540
50-60   55              6                     330
60-70   65              2                     130
Total                   n = 58                ∑fm = 1,940
$$\bar{x} = \frac{\sum fm}{n} = \frac{1940}{58} = 33.45 \text{ marks, or 33 marks approximately.}$$
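A minimal sketch of the direct method ∑fm/n, assuming each class is given by its (lower, upper) limits so that the mid-point is their average; the function name `grouped_mean_direct` is our own.

```python
def grouped_mean_direct(classes, freqs):
    """Direct method: mean = sum(f * m) / n, where m is each class mid-point."""
    midpoints = [(lower + upper) / 2 for lower, upper in classes]
    n = sum(freqs)
    return sum(f * m for f, m in zip(freqs, midpoints)) / n


classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60), (60, 70)]
students = [4, 8, 11, 15, 12, 6, 2]
print(round(grouped_mean_direct(classes, students), 2))  # 33.45
```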
It may be noted that the mid-point of each class is taken as a good approximation of
the true mean of the class. This is based on the assumption that the values are
distributed fairly evenly throughout the interval. When large numbers of frequency
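In the case of the short-cut method, the concept of an arbitrary mean is followed. The formula for calculating the arithmetic mean by the short-cut method is given below:
$$\bar{x} = A + \frac{\sum fd}{n}$$
where A = the arbitrary (assumed) mean, d = m − A = the deviation of each mid-point from the assumed mean, f = frequency, and n = the total number of frequencies.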
When the values are extremely large and/or in fractions, the use of the direct method would be very cumbersome. In such cases, the short-cut method is preferable, particularly for calculating the products of values and their respective frequencies. However, when calculations are made by a machine calculator rather than manually, it may not be necessary to resort to the short-cut method, as the direct method then poses no special difficulty.
As can be seen from the formula used in the short-cut method, an arbitrary or assumed mean is used. The second term in the formula (∑fd ÷ n) is the correction factor for the difference between the actual mean and the assumed mean. If the assumed mean turns out to be equal to the actual mean, (∑fd ÷ n) will be zero. The use of the short-cut method is based on the principle that the total of the deviations taken from the actual mean is equal to zero. As such, the total of the deviations taken from any other figure will depend on how the assumed mean is related to the actual mean. While one may choose any value as the assumed mean, it is better to avoid extreme values, that is, values that are too small or too large. To simplify calculations, a value apparently close to the arithmetic mean should be chosen.
For the figures given earlier pertaining to marks obtained by 58 students, we calculate the arithmetic mean by the short-cut method.
Example 2.4:
Marks   Mid-point (m)   f    d     fd
0-10    5               4    -30   -120
10-20   15              8    -20   -160
20-30   25              11   -10   -110
30-40   35              15   0     0
40-50   45              12   10    120
50-60   55              6    20    120
60-70   65              2    30    60
Total                                ∑fd = -90
It may be noted that we have taken the arbitrary mean as 35 and calculated the deviations from the mid-points. In other words, the arbitrary mean has been subtracted from each mid-point to obtain d.
$$\bar{x} = A + \frac{\sum fd}{n} = 35 + \left(\frac{-90}{58}\right) = 35 - 1.55 = 33.45 \text{ marks}$$
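Now we take up the calculation of the arithmetic mean for the same set of data using the step-deviation method. Here each deviation from the arbitrary mean is divided by the common class interval C = 10, giving the step deviations d' = (m − A)/C = −3, −2, −1, 0, 1, 2 and 3, so that ∑fd' = −9.
$$\bar{x} = A + \frac{\sum fd'}{n} \times C = 35 + \left(\frac{-9 \times 10}{58}\right) = 33.45 \text{, or 33 marks approximately.}$$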
It will be seen that the answer in each of the three cases is the same. It may also be noted that if we select a different arbitrary mean and recalculate the deviations, we still obtain the same answer.
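The claim that the short-cut and step-deviation methods agree for any assumed mean can be checked with a short sketch. The function names are hypothetical; A and C follow the formulas above, and the block tries A = 35 (as in the text) and A = 25.

```python
def mean_short_cut(midpoints, freqs, assumed_mean):
    """Short-cut method: A + sum(f * d) / n, with d = m - A."""
    n = sum(freqs)
    sum_fd = sum(f * (m - assumed_mean) for m, f in zip(midpoints, freqs))
    return assumed_mean + sum_fd / n


def mean_step_deviation(midpoints, freqs, assumed_mean, width):
    """Step-deviation method: A + (sum(f * d') / n) * C, with d' = (m - A) / C."""
    n = sum(freqs)
    sum_fd_step = sum(f * (m - assumed_mean) / width for m, f in zip(midpoints, freqs))
    return assumed_mean + sum_fd_step / n * width


midpoints = [5, 15, 25, 35, 45, 55, 65]
students = [4, 8, 11, 15, 12, 6, 2]

for a in (35, 25):  # two different assumed means give the same mean
    print(a, round(mean_short_cut(midpoints, students, a), 2),
          round(mean_step_deviation(midpoints, students, a, 10), 2))
# 35 33.45 33.45
# 25 33.45 33.45
```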
Now that we have learnt how the arithmetic mean can be calculated by using different methods, let us consider a problem in which some frequencies are missing.
Example 2.6: The mean of the following frequency distribution was found to be 1.46.
Solution:
Here we are given the total number of frequencies and the arithmetic mean. We have to determine the two frequencies that are missing. Let us assume that the frequency against 1 accident is x and the frequency against 2 accidents is y. Then
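$$1.46 = \frac{x + 2y + 140}{200}$$
so that x + 2y = 292 − 140, that is,
x + 2y = 152   ... (i)
Also, x + y = 200 − 86, that is,
x + y = 114   ... (ii)
Subtracting (ii) from (i), we get
y = 38
Therefore, x = 114 − 38 = 76.
Frequency against 1 accident: 76
Frequency against 2 accidents: 38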
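The two simultaneous equations can be solved mechanically. A minimal sketch, assuming the two unknown frequencies sit at variable values a and b (here 1 and 2 accidents) and that the known frequencies and their ∑fx contribution (86 and 140 in the text) are supplied; `missing_frequencies` is a hypothetical helper name.

```python
def missing_frequencies(n, mean, known_f, known_fx, a, b):
    """Solve   x + y     = n - known_f
               a*x + b*y = n*mean - known_fx
    for the frequencies x (at value a) and y (at value b)."""
    if a == b:
        raise ValueError("the two missing classes must have different values")
    s1 = n - known_f           # total of the two missing frequencies
    s2 = n * mean - known_fx   # their contribution to sum(fx)
    y = (s2 - a * s1) / (b - a)
    x = s1 - y
    return x, y


x, y = missing_frequencies(n=200, mean=1.46, known_f=86, known_fx=140, a=1, b=2)
print(round(x), round(y))  # 76 38
```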
1. The sum of the deviations of the individual items from the arithmetic mean is always zero, that is, $\sum (x - \bar{x}) = 0$, where $\bar{x}$ is the arithmetic mean. Since the sum of the deviations in the positive direction is equal to the sum of the deviations in the negative direction, the arithmetic mean acts as the balancing point of the series.
2. The sum of the squared deviations of the individual items from the arithmetic mean is always a minimum. In other words, the sum of the squared deviations taken from any value other than the arithmetic mean will be higher.
3. As the arithmetic mean is based on all the items in a series, a change in the value of any item will lead to a change in the value of the arithmetic mean.
4. In the case of a highly skewed distribution, the arithmetic mean may get distorted by a few extreme values and may then not be representative of the series.
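Properties 1 and 2 can be checked numerically on the seven observations used earlier (mean 38). This is only an illustration on one data set, not a proof; `sum_of_squared_deviations` is our own helper.

```python
def sum_of_squared_deviations(values, centre):
    """Sum of (x - centre)^2 over all observations."""
    return sum((x - centre) ** 2 for x in values)


data = [10, 15, 30, 7, 42, 79, 83]
mean = sum(data) / len(data)  # 38.0

# Property 1: deviations from the mean add up to zero
print(sum(x - mean for x in data))  # 0.0

# Property 2: squared deviations are smallest about the mean
for centre in (30, 37, 38, 39, 45):
    print(centre, sum_of_squared_deviations(data, centre))
# 30 6508, 37 6067, 38 6060, 39 6067, 45 6403
```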
2.3 MEDIAN
Median is defined as the value of the middle item (or the mean of the values of the two middle items) when the data are arranged in an ascending or descending order of magnitude. When n is odd, the median is the middle value. When n is even, the median is the mean of the two middle values.
Consider a series of nine figures. We first arrange them in either ascending or descending order. Arranged in ascending order, these figures are:
5, 7, 10, 15, 18, 19, 21, 25, 33
As the series consists of an odd number of items, the position of the middle item is found by the formula
$$\frac{n + 1}{2}$$
where n is the number of items. In this case, n is 9, so (n + 1)/2 = 5, that is, the median is the size of the 5th item, which is 18.
Suppose the series includes one more item, 23. We have to place 23 in the above series at its appropriate position, that is, between 21 and 25. Thus, the series is now 5, 7, 10, 15, 18, 19, 21, 23, 25, 33. Applying the above formula, the median is the size of the (10 + 1)/2 = 5.5th item. Here, we have to take the average of the values of the 5th and 6th items, that is, the average of 18 and 19, which gives the median as 18.5.
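A minimal sketch of the ungrouped median: sort, then take the middle item, or the mean of the two middle items when n is even. The input order below is arbitrary (the text does not give the original order), and the helper name `median` is ours; Python's standard `statistics.median` behaves the same way.

```python
def median(values):
    """Median of ungrouped data; (n + 1) / 2 is its 1-based position."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("at least one observation is required")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # the ((n + 1) / 2)th item
    return (ordered[mid - 1] + ordered[mid]) / 2  # mean of the two middle items


series = [15, 19, 21, 7, 10, 33, 25, 18, 5]  # nine figures, unsorted
print(median(series))         # 18
print(median(series + [23]))  # 18.5
```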
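It may be noted that the formula (n + 1)/2 itself is not the formula for the median; it merely indicates the position of the median, namely, the number of items we have to count until we arrive at the item whose value is the median. In the case of an even number of items in the series, we identify the two items whose values have to be averaged to obtain the median. In the case of a grouped series, the median is calculated by the following formula:
$$M = l_1 + \frac{l_2 - l_1}{f}(m - c)$$
where
l1 = the lower limit of the class in which the median lies
l2 = the upper limit of the class in which the median lies
f = the frequency of the class in which the median lies
m = the middle item, that is, the (n + 1)/2th item
c = the cumulative frequency of the class preceding the one in which the median lies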
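Example 2.7: The following table gives the distribution of monthly wages (in Rs), with a total frequency of 143. Calculate the median.
Solution: To locate the median class, we first add a column of cumulative frequency to the table. Thus, the table with the cumulative frequency is written as: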
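Monthly Wages (Rs)   Frequency   Cumulative Frequency
800-1,000            18          18
1,000-1,200          25          43
1,200-1,400          30          73
1,400-1,600          34          107
1,600-1,800          26          133
1,800-2,000          10          143
$$M = l_1 + \frac{l_2 - l_1}{f}(m - c)$$
Here $m = \frac{n + 1}{2} = \frac{143 + 1}{2} = 72$, so the median lies in the class 1,200-1,400, for which $l_1 = 1200$, $l_2 = 1400$, $f = 30$ and $c = 43$:
$$M = 1200 + \frac{1400 - 1200}{30}(72 - 43) = 1200 + \frac{200}{30}(29) = \text{Rs } 1393.3$$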
At this stage, let us introduce two other concepts, viz. quartile and decile. To understand these, we should first know that the median belongs to a general class of statistical descriptions called fractiles. A fractile is a value below which lies a given fraction of a set of data. In the case of the median, this fraction is one-half (1/2). Likewise, a quartile has a fraction one-fourth (1/4). The three quartiles Q1, Q2 and Q3 are such that 25 percent of the data fall below Q1, 25 percent fall between Q1 and Q2, 25 percent fall between Q2 and Q3 and 25 percent fall above Q3. It will be seen that Q2 is the median. We can use the above formula for the calculation of quartiles as well. The only difference will be in the value of m. Let us calculate both Q1 and Q3 for the data of Example 2.7.
$$Q_1 = l_1 + \frac{l_2 - l_1}{f}(m - c)$$
Here, m will be $\frac{n + 1}{4} = \frac{143 + 1}{4} = 36$, so Q1 lies in the class 1,000-1,200.
$$Q_1 = 1000 + \frac{1200 - 1000}{25}(36 - 18) = 1000 + \frac{200}{25}(18) = \text{Rs } 1{,}144$$
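In the case of Q3, m will be $3 \times \frac{n + 1}{4} = \frac{3 \times 144}{4} = 108$, so Q3 lies in the class 1,600-1,800.
$$Q_3 = 1600 + \frac{1800 - 1600}{26}(108 - 107) = 1600 + \frac{200}{26}(1) = \text{Rs } 1{,}607.69$$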
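The interpolation formula l1 + (l2 − l1)/f × (m − c) can be sketched as one helper that serves the median and both quartiles, following this text's convention of m = (n + 1)/2, (n + 1)/4 and 3(n + 1)/4. `grouped_fractile` and the use of `None` for a missing class limit are our own illustrative choices; the last lines also reproduce Example 2.8 (open-end classes), which follows.

```python
def grouped_fractile(classes, freqs, m):
    """Value of the m-th item of a grouped series by linear interpolation:
    l1 + (l2 - l1) / f * (m - c), where c is the cumulative frequency of
    the classes preceding the one that contains the m-th item."""
    c = 0
    for (l1, l2), f in zip(classes, freqs):
        if c + f >= m:
            if l1 is None or l2 is None:
                raise ValueError("the item falls in an open-end class")
            return l1 + (l2 - l1) / f * (m - c)
        c += f
    raise ValueError("m exceeds the total frequency")


# Example 2.7: monthly wages (Rs)
wages = [(800, 1000), (1000, 1200), (1200, 1400),
         (1400, 1600), (1600, 1800), (1800, 2000)]
freqs = [18, 25, 30, 34, 26, 10]
n = sum(freqs)  # 143
median = grouped_fractile(wages, freqs, (n + 1) / 2)  # 72nd item
q1 = grouped_fractile(wages, freqs, (n + 1) / 4)      # 36th item
q3 = grouped_fractile(wages, freqs, 3 * (n + 1) / 4)  # 108th item
print(round(median, 2), round(q1, 2), round(q3, 2))   # 1393.33 1144.0 1607.69

# Example 2.8: open-end classes written with None for the missing limit
sizes = [(None, 50), (50, 100), (100, 150), (150, 200), (200, None)]
counts = [15, 20, 36, 40, 10]
print(round(grouped_fractile(sizes, counts, (sum(counts) + 1) / 2), 2))  # 136.11
```

Only the limits of the class that contains the item enter the formula, which is why the open-end classes cause no difficulty here.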
In the same manner, we can calculate deciles (where the series is divided into 10 parts) and percentiles (where the series is divided into 100 parts). It may be noted that, unlike the mean, the median is not affected by extreme values, which makes it a suitable measure when a distribution happens to be skewed. Another point that goes in favour of the median is that it can be computed when a distribution has open-end classes. Yet another merit of the median is that when a distribution contains qualitative data, it is the only average that can be used. No other average is suitable in the case of such a distribution. Let us take a couple of examples.
Example 2.8: Calculate the most suitable average for the following data:
Size of the item   Below 50   50-100   100-150   150-200   200 and above
Frequency          15         20       36        40        10
Solution: Since the data have two open-end classes, one at the beginning (below 50) and the other at the end (200 and above), the median should be the right choice as a measure of central tendency.
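Size of item    Frequency   Cumulative frequency
Below 50        15          15
50-100          20          35
100-150         36          71
150-200         40          111
200 and above   10          121
Median is the size of the (n + 1)/2th item = (121 + 1)/2 = 61st item, which lies in the class 100-150.
$$\text{Median} = l_1 + \frac{l_2 - l_1}{f}(m - c) = 100 + \frac{150 - 100}{36}(61 - 35) = 136.11$$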
Example 2.9: The following data give the savings bank account balances of nine sample accounts:
(a) Find the mean and the median for these data; (b) Do these data contain an outlier? If so, exclude this value and recalculate the mean and median. Which of these summary measures has a greater change when an outlier is dropped? (c) Which of these two summary measures is more appropriate for this series?
Solution:
(a) Mean = Rs 83,600 / 9 = Rs 9,289
Median = size of the (n + 1)/2th item = (9 + 1)/2 = 5th item.
Arranging the data in ascending order, we find that the median is Rs 1,800.
(b) The balance of Rs 68,000 is an outlier, as it is far larger than all the other balances. We exclude this figure and recalculate both the mean and the median.
Mean = Rs (83,600 − 68,000)/8 = Rs 15,600/8 = Rs 1,950
Median = size of the (8 + 1)/2th item = 4.5th item
= Rs (1,500 + 1,800)/2 = Rs 1,650
It will be seen that the mean shows a far greater change than the median when the outlier is dropped.
(c) As far as these data are concerned, the median will be a more appropriate measure than the mean.
Example 2.10: Suppose we are given the following series:
Class interval   0-10   10-20   20-30   30-40   40-50   50-60   60-70
Frequency        6      12      22      37      17      8       5
We are asked to draw both types of ogive from these data and to determine the median.
Solution:
First of all, we transform the given data into two cumulative frequency distributions, one on a 'less than' basis (Table A) and the other on a 'more than' basis (Table B).
Table A
                Cumulative frequency
Less than 10    6
Less than 20    18
Less than 30    40
Less than 40    77
Less than 50    94
Less than 60    102
Less than 70    107
Table B
                Cumulative frequency
More than 0     107
More than 10    101
More than 20    89
More than 30    67
More than 40    30
More than 50    13
More than 60    5
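A minimal sketch that builds Tables A and B with `itertools.accumulate` and then locates the median (54th item) and the upper quartile (81st item) by interpolation, using the (n + 1) convention of this text. The helper `locate` is hypothetical; graph reading is replaced here by calculation.

```python
from itertools import accumulate

boundaries = [0, 10, 20, 30, 40, 50, 60, 70]
freqs = [6, 12, 22, 37, 17, 8, 5]

less_than = list(accumulate(freqs))                # Table A
n = less_than[-1]                                  # 107
more_than = [n - c for c in [0] + less_than[:-1]]  # Table B

for upper, cf in zip(boundaries[1:], less_than):
    print(f"Less than {upper}: {cf}")
for lower, cf in zip(boundaries[:-1], more_than):
    print(f"More than {lower}: {cf}")


def locate(position):
    """Interpolate the value of the item at `position` within its class."""
    previous = 0
    for i, cf in enumerate(less_than):
        if cf >= position:
            lower, upper = boundaries[i], boundaries[i + 1]
            return lower + (upper - lower) / freqs[i] * (position - previous)
        previous = cf
    raise ValueError("position exceeds the total frequency")


print(round(locate((n + 1) / 2), 1))      # median, 54th item: 33.8
print(round(locate(3 * (n + 1) / 4), 2))  # upper quartile, 81st item: 42.35
```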
A perpendicular drawn from the point of intersection of the two ogives meets the X-axis at M. Thus, the distance from the origin to M gives the value of the median. If we calculate the median by applying the formula, the answer comes to 33.8, or 34 approximately. It may be pointed out that even a single ogive can be used to determine the median. Just as we have determined the median graphically, we can also find the values of quartiles, deciles and percentiles. For example, the upper quartile is the size of the {3(n + 1)}/4 = {3(107 + 1)}/4 = 81st item. From this point on the Y-axis, we can draw a perpendicular to meet the 'less than' ogive, from which another straight line is drawn to meet the X-axis. This point gives us the value of the upper quartile. In the same manner, the other quartiles, deciles and percentiles can be located.
1. Unlike the arithmetic mean, the median can be computed from open-ended distributions.
2. The median can also be determined graphically, whereas the arithmetic mean cannot.
4. In the case of qualitative data, where the items are not counted or measured but are ranked, the median is the most appropriate measure.
2.4 MODE
The mode is another measure of central tendency. It is the value at the point around which the items are most heavily concentrated. As an example, consider the following series:
There are ten observations in the series, wherein the figure 15 occurs the maximum number of times, namely three. The mode is therefore 15. The series given above is a discrete series; as such, the variable cannot be in fractions. If the series were continuous, we could say that the mode is approximately 15, without further computation.
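In the case of grouped data, the mode is calculated by the following formula:
$$\text{Mode} = l_1 + \frac{f_1 - f_0}{(f_1 - f_0) + (f_1 - f_2)} \times i$$
where l1 = the lower limit of the class in which the mode lies, f1 = the frequency of the modal class, f0 = the frequency of the class preceding the modal class, f2 = the frequency of the class succeeding the modal class, and i = the size of the class-interval.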
While applying the above formula, we should ensure that the class-intervals are uniform throughout. If the class-intervals are not uniform, then they should be made uniform on the assumption that the frequencies are evenly distributed throughout the class. In the case of unequal class-intervals, the application of the above formula will give misleading results.
Solution: We can see from Column (2) of the table that the maximum frequency of 12 lies in the class-interval 60-70. This suggests that the mode lies in this class-interval. Here l1 = 60, f1 = 12, f0 = 8, f2 = 9 and i = 10.
$$\text{Mode} = 60 + \frac{12 - 8}{(12 - 8) + (12 - 9)} \times 10 = 60 + \frac{4}{4 + 3} \times 10 = 65.7 \text{ approx.}$$
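The grouped-mode formula translates directly into code. A minimal sketch with a hypothetical helper name, using the values l1 = 60, f0 = 8, f1 = 12, f2 = 9 and i = 10 from the solution above:

```python
def grouped_mode(l1, f0, f1, f2, width):
    """Mode = l1 + (f1 - f0) / ((f1 - f0) + (f1 - f2)) * i."""
    delta_before = f1 - f0  # excess over the preceding class
    delta_after = f1 - f2   # excess over the succeeding class
    return l1 + delta_before / (delta_before + delta_after) * width


print(round(grouped_mode(l1=60, f0=8, f1=12, f2=9, width=10), 1))  # 65.7
```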
In several cases, just by inspection one can identify the class-interval in which the mode lies. One should see which is the highest frequency and then identify the class-interval to which this frequency belongs. Having done this, the formula given for calculating the mode can be applied.
At times, it is not possible to identify by inspection the class where the mode lies. In such cases, it becomes necessary to use the method of grouping. This method consists of two parts:
(i) Preparation of a grouping table: A grouping table has six columns. The first column shows the original frequencies. Column 2 shows the frequencies grouped in two's, starting from the top. Column 3 leaves the first frequency and groups the remaining frequencies in two's. Column 4 groups the frequencies in three's, taking the first three items, then the fourth to sixth items, and so on. Column 5 leaves the first frequency and groups the remaining items in three's. Column 6 leaves the first two frequencies and then groups the remaining in three's. Now, the maximum total in each column is marked.
(ii) Preparation of an analysis table: After the grouping table, an analysis table is prepared. On the left-hand side, provide the first column for column numbers and on the right-hand side the different possible values of mode. The highest values marked in the grouping table are shown here by a mark (1) against the class-intervals they represent. The last row of this table will show the number of times a particular value has occurred in the grouping table. The highest value in the analysis table will indicate the class-interval in which the mode lies.
The procedure of preparing both the grouping and analysis tables to locate the modal class is illustrated for the following distribution:
Size of item   Frequency
10-20          10
20-30          18
30-40          25
40-50          26
50-60          17
60-70          4
Solution:
Grouping Table (group totals with the classes they cover; the maximum in each column is marked *)
Column 1: 10 (10-20), 18 (20-30), 25 (30-40), 26* (40-50), 17 (50-60), 4 (60-70)
Column 2: 28 (10-30), 51* (30-50), 21 (50-70)
Column 3: 43* (20-40), 43* (40-60)
Column 4: 53* (10-40), 47 (40-70)
Column 5: 69* (20-50)
Column 6: 68* (30-60)
Analysis Table
Col. No.  10-20   20-30   30-40   40-50   50-60
1                                 1
2                         1       1
3                 1       1       1       1
4         1       1       1
5                 1       1       1
6                         1       1       1
Total     1       3       5       5       2
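The grouping and analysis tables follow a fixed recipe, so they can be generated mechanically. A minimal sketch (function names hypothetical) that builds the six columns as (start, size, total) groups, drops incomplete groups at the bottom, marks every maximal group in each column (ties included, as in column 3 above) and tallies the marks per class:

```python
def grouping_table(freqs):
    """Columns 1-6 of the grouping table as lists of (start, size, total)."""
    layout = [(1, 0), (2, 0), (2, 1), (3, 0), (3, 1), (3, 2)]  # (size, offset)
    table = []
    for size, offset in layout:
        groups = [(start, size, sum(freqs[start:start + size]))
                  for start in range(offset, len(freqs) - size + 1, size)]
        table.append(groups)
    return table


def analysis_table(freqs):
    """Count how often each class belongs to a maximal group in a column."""
    tally = [0] * len(freqs)
    for groups in grouping_table(freqs):
        best = max(total for _, _, total in groups)
        for start, size, total in groups:
            if total == best:  # ties: every maximal group is marked
                for i in range(start, start + size):
                    tally[i] += 1
    return tally


classes = ["10-20", "20-30", "30-40", "40-50", "50-60", "60-70"]
freqs = [10, 18, 25, 26, 17, 4]

for column, groups in enumerate(grouping_table(freqs), start=1):
    print(column, [total for _, _, total in groups])
tally = analysis_table(freqs)
top = max(tally)
print([c for c, t in zip(classes, tally) if t == top])  # ['30-40', '40-50']
```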
This is a bi-modal series, as is evident from the analysis table, which shows that the two classes 30-40 and 40-50 have occurred five times each in the grouping. In such a case, the mode is determined by the empirical formula:
Mode = 3 Median − 2 Mean
Median = size of the (n + 1)/2th item, that is, 101/2 = 50.5th item. This lies in the class 30-40. Applying the formula for the median, as given earlier, we get
$$\text{Median} = 30 + \frac{40 - 30}{25}(50.5 - 28) = 30 + 9 = 39$$
$$\text{Mean} = A + \frac{\sum fd'}{n} \times i = 35 + \frac{34}{100} \times 10 = 38.4$$
$$\text{Mode} = (3 \times 39) - (2 \times 38.4) = 117 - 76.8 = 40.2$$
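The empirical calculation above can be reproduced in a short sketch: the median by interpolation at the (n + 1)/2th item, the mean by the direct method (which gives the same 38.4 as the step-deviation method), then Mode = 3 Median − 2 Mean. `grouped_median` is a hypothetical helper name.

```python
def grouped_median(classes, freqs, position):
    """Interpolated value of the item at `position` in a grouped series."""
    cumulative = 0
    for (lower, upper), f in zip(classes, freqs):
        if cumulative + f >= position:
            return lower + (upper - lower) / f * (position - cumulative)
        cumulative += f
    raise ValueError("position exceeds the total frequency")


classes = [(10, 20), (20, 30), (30, 40), (40, 50), (50, 60), (60, 70)]
freqs = [10, 18, 25, 26, 17, 4]
n = sum(freqs)  # 100

median = grouped_median(classes, freqs, (n + 1) / 2)  # 50.5th item
mean = sum(f * (lo + hi) / 2 for (lo, hi), f in zip(classes, freqs)) / n
mode = 3 * median - 2 * mean  # empirical relation, approximate only
print(round(median, 1), round(mean, 1), round(mode, 1))  # 39.0 38.4 40.2
```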
This formula, Mode = 3 Median − 2 Mean, is only an empirical formula and gives approximate results. As such, its frequent use should be avoided. However, when the mode is ill-defined or the series is bimodal (as is the case in the present example), this formula may be used.
Having discussed mean, median and mode, we now turn to the relationship amongst these three measures of central tendency. We shall discuss the relationship assuming that there is a unimodal frequency distribution.
(i) When a distribution is symmetrical, the mean, median and mode are the same.
(ii) In case a distribution is skewed, the three measures differ. For example, an income distribution is skewed to the right, where a large number of families have relatively low incomes and a small number of families have extremely high incomes. In such a case, the mean is pulled up by the extremely high incomes, and the relation among these three measures is as shown in Fig. 6.3. Here, we find that mode < median < mean.
(iii) Given the mean and median of a unimodal distribution, we can determine whether it is skewed to the right or to the left. When the mean is greater than the median, the distribution is skewed to the right; when the mean is less than the median, it is skewed to the left. It may be noted that the median always lies between the mean and the mode.
At this stage, one may ask which of these three measures of central tendency is the best. There is no simple answer to this question, because these three measures are based upon different concepts. The arithmetic mean is the sum of the values divided by the total number of observations in the series. The median is the value of the middle observation that divides the series into two equal parts. The mode is the value around which the observations tend to concentrate. As such, the use of a particular measure will largely depend on the purpose of the study and the nature of the data.
For example, when we are interested in knowing consumers' preferences for different brands of television sets or different kinds of advertising, the choice should go in favour of the mode. The use of the mean and median would not be proper. However, the median can sometimes be used in the case of qualitative data when such data can be ranked.
Suppose we invite applications for a certain vacancy in our company. A large number of candidates apply for that post. We are now interested to know which age or age group has the largest concentration of applicants. Here, obviously, the mode will be the most appropriate choice. The arithmetic mean may not be appropriate, as it need not correspond to the age at which the applicants are most heavily concentrated.
MEASURES OF CENTRAL TENDENCY
OBJECTIVES
After going through this unit, you will learn:
• the concept and significance of measures of central tendency
• to compute various measures of central tendency, such as arithmetic mean, median, mode and quartiles
• the relationship among various averages.
INTRODUCTION
The objective here is to find one representative value which can be used to locate and summarise the entire set of varying values. This one value can be used to make many decisions concerning the entire set. We can define measures of central tendency (or location) to find some central value around which the data tend to cluster.
The arithmetic mean (or mean or average) is the most commonly used and readily understood measure of
central tendency. In statistics, the term average refers to any of the measures of central tendency.
Ungrouped data/Raw data
The arithmetic mean is defined as being equal to the sum of the numerical values of each and every observation divided by the total number of observations. Symbolically, it can be represented as:
$$\bar{X} = \frac{\sum X}{N}$$
where ∑X indicates the sum of the values of all the observations, and N is the total number of observations.
For example, let us consider the monthly salary (Rs.) of 10 employees of a firm:
2500, 2700, 2400, 2300, 2550, 2650, 2750, 2450, 2600, 2400
If we compute the arithmetic mean, then 2500 + 2700 + 2400 + 2300 + 2550 + 2650 + 2750 + 2450 + 2600 + 2400 = 25300, and
Mean = 25300/10 = Rs. 2530.
Therefore, the average monthly salary is Rs. 2530.
Discrete data
When the observations are classified into a frequency distribution of discrete values, the arithmetic mean is defined as
$$\bar{X} = \frac{\sum fX}{N}$$
where f is the frequency of the corresponding value X and N is the total frequency, i.e. N = ∑f.
X f fx
10 12 120
20 23 460
30 35 1050
40 47 1880
50 38 1900
60 29 1740
70 16 1120
Sum 200 8270
Mean=8270/200= 41.35
Continuous Data
When the observations are classified into a grouped frequency distribution, the arithmetic mean is defined as
$$\bar{X} = \frac{\sum fX}{N}$$
where X is the midpoint of the various classes, f is the frequency of the corresponding class and N is the total frequency, i.e. N = ∑f.
This method is illustrated for the following data which relate to the monthly sales of 200 firms.
The midpoint of each class interval is treated as the representative average value of that class.
Mean = 102000/200 = 510
MERITS OF MEAN
DEMERITS OF MEAN
A second measure of central tendency is the median. Median is that value which divides the distribution
into two equal parts. Fifty per cent of the observations in the distribution are above the value of median and
other fifty per cent of the observations are below this value of median. The median is the value of the
middle observation when the series is arranged in order of size or magnitude (in ascending order).
UNGROUPED DATA
If the number of observations is odd, then the median is equal to one of the original observations (the middle one):
Median = ((N + 1)/2)th value
For example, if the income of seven persons in rupees is 1100, 1200, 1350, 1500, 1550, 1600, 1800, then the median income is the (7 + 1)/2 = 4th value, that is, Rs 1500.
If the number of observations is even, then the median is the arithmetic mean of the two middle observations:
Median = mean of the (N/2)th and (N/2 + 1)th values
For example, if the income of eight persons in rupees is 1100, 1200, 1350, 1500, 1550, 1600, 1800, 1850, then the median income of the eight persons would be (1500 + 1550)/2 = 1525.
DISCRETE SERIES
First we find the cumulative frequencies. Then we locate the ((N + 1)/2)th value in the cumulative frequency column; the corresponding value of X is the median.
X f cf
10 12 12
20 23 35
30 35 70
40 47 117
50 38 155
60 29 184
70 16 200
Sum 200
N = 200
(N + 1)/2 = 100.5th value, which falls within the cumulative frequency 117
Median = 40
CONTINUOUS DATA
For continuous data, first we find the cumulative frequencies. Then we locate the ((N + 1)/2)th value in the cumulative frequency column; the corresponding class interval is the median class. The following formula may be used to find the value of the median:
$$\text{Median} = l_1 + \frac{N/2 - cf}{f} \times h$$
where l1 is the lower limit of the median class, cf is the cumulative frequency preceding the median class, f is the frequency of the median class and h is the width of the median class.
Consider the following data which relate to the age distribution of 1000 workers in an industrial
establishment.
The location of median value is facilitated by the use of a cumulative frequency distribution as shown
below in the table.
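N = 1000
Median class: (1000 + 1)/2 = 500.5th value, which lies in the class 35-40
l1 = 35, cf = 425, f = 160, N/2 = 500, h = 5
Median = 35 + {(500 − 425) × 5}/160 = 35 + 375/160 = 35 + 2.34 = 37.34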
MERITS OF MEDIAN
DEMERITS OF MEDIAN
For example, in the series of numbers 3, 4, 5, 5, 6, 7, 8, 8, 8, 9, the mode is 8 because it occurs the maximum number of times. That is, in ungrouped data the mode can be found by inspection.
DISCRETE DATA
X f
10 12
20 23
30 35
40 47
50 38
60 29
70 16
Sum 200
Mode=40
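For the discrete frequency table used in this unit, the mean, median and mode can be computed together. A minimal sketch, assuming the locate-in-cumulative-frequency rule described above; `value_at` is a hypothetical helper. Note that this rule does not average two neighbouring values when a position falls exactly between them, and that for tied maximum frequencies the sketch simply returns the first such value.

```python
from itertools import accumulate

xs = [10, 20, 30, 40, 50, 60, 70]
fs = [12, 23, 35, 47, 38, 29, 16]
N = sum(fs)  # 200

mean = sum(f * x for x, f in zip(xs, fs)) / N  # sum(fX) / N


def value_at(position):
    """First X whose cumulative frequency reaches `position`."""
    for x, cf in zip(xs, accumulate(fs)):
        if cf >= position:
            return x
    raise ValueError("position exceeds the total frequency")


median = value_at((N + 1) / 2)  # 100.5th value
mode = xs[fs.index(max(fs))]    # X with the largest frequency
print(mean, median, mode)       # 41.35 40 40

# The same helper locates the quartiles of this table
print(value_at((N + 1) / 4), value_at(3 * (N + 1) / 4))  # 30 50
```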
CONTINUOUS DATA
$$\text{Mode} = l_1 + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times h$$
where l1 is the lower limit of the modal class, f1 is the frequency of the modal class, f0 is the frequency of the preceding class, f2 is the frequency of the succeeding class, and h is the size of the modal class.
Modal class = 30-35
DEMERITS OF MODE
UNGROUPED DATA
When the observations are arranged in ascending order, the quartiles are located as follows:
Q1 = ((N + 1)/4)th value
Q3 = (3(N + 1)/4)th value
For example, if the income of eight persons in rupees is 1100, 1200, 1350, 1500, 1550, 1600, 1800, 1850, then
Q1 = ((8 + 1)/4)th = 2.25th value
Q1 = 1200 + 0.25(1350 − 1200) = 1200 + 37.5 = 1237.5
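The fractional-position rule used for Q1 (the 2nd value plus 0.25 of the gap to the 3rd) generalises to any position. A minimal sketch with a hypothetical helper name; it also gives Q3 (6.75th value) and the median (4.5th value) of the same eight incomes, which the text does not work out.

```python
def ungrouped_fractile(values, position):
    """Value at a 1-based, possibly fractional position in the ordered data."""
    ordered = sorted(values)
    k = int(position)           # whole part: the k-th value
    fraction = position - k     # share of the gap to the next value
    if not 1 <= k <= len(ordered):
        raise ValueError("position lies outside the data")
    if fraction == 0 or k == len(ordered):
        return ordered[k - 1]
    return ordered[k - 1] + fraction * (ordered[k] - ordered[k - 1])


incomes = [1100, 1200, 1350, 1500, 1550, 1600, 1800, 1850]
n = len(incomes)
q1 = ungrouped_fractile(incomes, (n + 1) / 4)      # 2.25th value
median = ungrouped_fractile(incomes, (n + 1) / 2)  # 4.5th value
q3 = ungrouped_fractile(incomes, 3 * (n + 1) / 4)  # 6.75th value
print(q1, median, q3)  # 1237.5 1525.0 1750.0
```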
DISCRETE SERIES
First we find the cumulative frequencies. Then we locate the ((N + 1)/4)th and (3(N + 1)/4)th values in the cumulative frequency column; the corresponding values of X are Q1 and Q3 respectively.
X     f     cf
10    12    12
20    23    35
30    35    70     (first quartile)
40    47    117
50    38    155    (third quartile)
60    29    184
70    16    200
Sum   200
N = 200
Q1 = ((N + 1)/4)th = 50.25th value, so Q1 = 30
Q3 = (3(N + 1)/4)th = 150.75th value, so Q3 = 50
CONTINUOUS DATA
For continuous data, first we find the cumulative frequencies. Then we locate the ((N + 1)/4)th and (3(N + 1)/4)th values in the cumulative frequency column; the corresponding class intervals are the first quartile class and the third quartile class respectively. The following formulas may be used to find the values of the quartiles:
$$Q_1 = l_1 + \frac{N/4 - cf}{f} \times h$$
where l1 is the lower limit of the first quartile class, cf is the cumulative frequency preceding the first quartile class, f is the frequency of the first quartile class and h is the width of the first quartile class.
$$Q_3 = l_1 + \frac{3N/4 - cf}{f} \times h$$
where l1 is the lower limit of the third quartile class, cf is the cumulative frequency preceding the third quartile class, f is the frequency of the third quartile class and h is the width of the third quartile class.
Consider the following data which relate to the age distribution of 1000 workers in an industrial establishment. The location of the quartiles is facilitated by the use of a cumulative frequency distribution as shown below in the table.
Age (Years)    No. of workers (f)   Cumulative frequency (c.f.)
Below 25       120                  120
25-30          125                  245
30-35          180                  425    (first quartile class)
35-40          160                  585
40-45          150                  735
45-50          140                  875    (third quartile class)
50-55          100                  975
55 and above   25                   1000
N = 1000
Q1: (1000 + 1)/4 = 250.25th value. Since the cumulative frequency up to the class 25-30 is only 245, this item lies in the class 30-35.
l1 = 30, cf = 245, f = 180, N/4 = 250, h = 5
Q1 = 30 + {(250 − 245) × 5}/180 = 30 + 25/180 = 30 + 0.14 = 30.14
Q3: 3(1000 + 1)/4 = 750.75th value, which lies in the class 45-50.
l1 = 45, cf = 735, f = 140, 3N/4 = 750, h = 5
Q3 = 45 + {(750 − 735) × 5}/140 = 45 + 75/140 = 45 + 0.54 = 45.54
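A minimal sketch of the N/4, N/2 and 3N/4 formulas applied to the age distribution. `continuous_fractile` is a hypothetical helper; the open-end classes "Below 25" and "55 and above" are written with `None` for the missing limit, which is safe here because no quartile falls in them. The class is located with the same target value that enters the formula.

```python
def continuous_fractile(classes, freqs, target):
    """l1 + (target - cf) / f * h, with target = N/4, N/2 or 3N/4."""
    cf = 0
    for (lower, upper), f in zip(classes, freqs):
        if cf + f >= target:
            if lower is None or upper is None:
                raise ValueError("the fractile falls in an open-end class")
            return lower + (target - cf) / f * (upper - lower)
        cf += f
    raise ValueError("target exceeds the total frequency")


ages = [(None, 25), (25, 30), (30, 35), (35, 40),
        (40, 45), (45, 50), (50, 55), (55, None)]
workers = [120, 125, 180, 160, 150, 140, 100, 25]
N = sum(workers)  # 1000

q1 = continuous_fractile(ages, workers, N / 4)
median = continuous_fractile(ages, workers, N / 2)
q3 = continuous_fractile(ages, workers, 3 * N / 4)
print(round(q1, 2), round(median, 2), round(q3, 2))  # 30.14 37.34 45.54
```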
1] The following is the cumulative frequency distribution of the preferred length of study-table obtained from the preference study of 50 students.
A manufacturer has to take decision on the length of study-table to manufacture. What length would
you recommend and why?
2] An incomplete distribution of daily sales (Rs. thousand) is given below. The data relate to 229 days.
Daily sales No. of days Daily sales No. of days
(Rs. thousand) (Rs. thousand)
10-20 12 50-60 ?
20-30 30 60-70 25
30-40 ? 70-80 18
You are told that the median value is 46. Using the median formula, fill up the missing frequencies and
calculate the arithmetic mean of the completed data.