Median and Mode Calculation


RESEARCH METHODOLOGY AND BIOSTATISTICS
MEDIAN AND MODE
Application of Data Analysis
Statistical averages are mostly used in anatomy and physiology to assess what is normal in
health, as in the case of:
▪ Height
▪ Weight
▪ Blood pressure
▪ Cholesterol in blood
▪ Pulse rate
▪ RBC count etc.
Methods of Data Analysis

In research, there are methods of analyzing quantitative data in which both the
characteristic and its frequency vary, such as:

▪ Calculation of Averages,
▪ Percentiles,
▪ Standard Deviation,
▪ Standard Error etc.
Statistical Averages
▪ Measures of central tendency (or statistical averages) tell us the point about which items
have a tendency to cluster.

▪ Such a measure is considered the most representative figure for the entire data set and
is a key aid when discussing and communicating data with graphs.

▪ Average value of a characteristic is the one central value around which all other
observations are dispersed.

▪ The term central tendency refers to the middle, or typical, value of a set of data. It is
most commonly measured using the three m's: Mean, Median, and Mode.
Use of Statistical Averages
▪ To show that most normal observations lie close to the central value, while the few that
are too large or too small lie far away at both ends.

▪ To find which group is better off by comparing the average of one group with that of the
other.

e.g. one finds the average incubation period of cholera is smaller than that of
typhoid; income of pleaders is higher than that of doctors; average daily attendance
of one hospital is higher than that of another and so on.

After finding the difference, one may reason out why in one group it is more than
that in the other.
MEDIAN

The median of a series of observations is that value above which there are
as many scores as below it; that is, it divides a rank-ordered distribution
into two equal halves.

When all the observations of a variable are arranged in either ascending or
descending order, the middle observation is known as the median.

It is the mid value of the series.


Advantages
▪ The advantage of the median as a measure of central tendency is that it is unaffected by
the value of extreme scores.

▪ It is an index of average position in a distribution, not amount.

▪ This measure of central tendency is typically used when the mean is affected by an
unusually low or unusually high number in the data set (outliers). Outliers distort the
mean to the extent that it no longer accurately depicts the set of data.

▪ It is therefore a useful measure in describing skewed distributions. For instance, the
average cost of a house is usually cited in terms of the median, because the distribution
tends to be skewed to the right.
Example
Consider an example of 7 observations of absenteeism among school children: 4, 6, 8, (10),
12, 14, 32.

Mean = 86/7 = 12.3

Median = 10

In this case, the mean gives a distorted result because one observation, 32, is too large, so
the mean should not be considered an appropriate measure of central tendency. To get a
better idea of the average, one should ignore unduly high observations such as 32. The mean
of the remaining six observations is 54/6 = 9.0, which is much closer to the median of 10
than the mean of 12.3 calculated with all seven observations.
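The arithmetic in this example can be checked with a short Python sketch. It uses only the standard `statistics` module; the variable names are illustrative and not part of the original material.

```python
from statistics import mean, median

# Absenteeism series from the example, already in ascending order.
absences = [4, 6, 8, 10, 12, 14, 32]

mean_all = mean(absences)        # 86 / 7, pulled upward by the value 32
median_all = median(absences)    # 4th of the 7 ordered observations

# Mean after setting aside the unduly high observation 32.
trimmed = [x for x in absences if x != 32]
mean_trimmed = mean(trimmed)     # 54 / 6

print(round(mean_all, 1), median_all, mean_trimmed)
```

The median stays at 10 whether or not the value 32 is included, which is the sense in which it is unaffected by extreme scores.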
Conclusion
▪ The median, therefore, is a better indicator of central value when one or more of the
lowest or highest observations are wide apart or not evenly distributed.

For example: if one of the houses in your neighborhood is run down and has kept a low
property value, you would not want to include this property when determining the
value of your own home. However, if you are purchasing a home in that neighborhood,
you may want to include the outlier, since it would drive down the price you would
have to pay.
Calculation Of Median - Ungrouped Series
▪ Arrange the observations in the series in ascending or descending order of magnitude.
The central observation of the arranged series gives the median.

▪ ODD SCORES SERIES

▪ ESRs of seven subjects are arranged in ascending order: 3, 4, 4, (5), 5, 6, 7. The 4th
observation (5) is the median in this series. When a distribution contains an odd
number of scores, such as 4, 5, 6, 7, 8, the middle score, 6, is the median.

▪ EVEN SCORES SERIES

▪ The midpoint between the two middle scores is the median, so for the series 4, 5, 6,
7, 8, 9, the median lies halfway between 6 and 7. Therefore, median equals 6.5.
Formula
▪ The median is represented as M.

▪ The formula (n + 1)/2 gives the serial number (position) of the median in the ordered
series, whether the number (n) of observations in the series is odd or even.
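As a minimal sketch, the position rule can be written in Python. It assumes the serial number of the median is (n + 1)/2, which matches the 4th position (n = 7) and the 5.5 position (n = 10) used in the worked examples. The function name `median_by_position` is hypothetical.

```python
import math


def median_by_position(values):
    """Median via the (n + 1) / 2 position rule (positions are 1-based)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty series is undefined")

    position = (n + 1) / 2                      # 4.0 for n = 7, 5.5 for n = 10
    below = ordered[math.floor(position) - 1]   # observation just at or before the position
    above = ordered[math.ceil(position) - 1]    # observation just at or after the position
    # For odd n both are the same observation; for even n this is their midpoint.
    return (below + above) / 2
```

For an odd series such as 3, 4, 4, 5, 5, 6, 7 the function returns the 4th observation; for an even series such as 4, 5, 6, 7, 8, 9 it returns the midpoint of the 3rd and 4th observations.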
Example
Find the median of the following data: 12, 2, 16, 8, 14, 10, 6
Step 1: Organize the data, or arrange the numbers from smallest to largest.
2, 6, 8, 10, 12, 14, 16

Step 2: Since the number of data values is odd, the median will be found at the
(n + 1)/2 = (7 + 1)/2 = 4th position.

Step 3: In this case, the median is the value that is found in the
fourth position of the organized data.
2, 6, 8, (10), 12, 14, 16

Ans: The median is 10.


Example
Find the median of the following data: 7, 9, 3, 4, 11, 1, 8, 6, 1, 4

Step 1: Organize the data, or arrange the numbers from smallest to largest.
1, 1, 3, 4, 4, 6, 7, 8, 9, 11

Step 2: Since the number of data values is even, the median will be the mean value of the
numbers found before and after the (n + 1)/2 = (10 + 1)/2 = 5.5 position.

Step 3: The number found before the 5.5 position is 4 and the number found after the
5.5 position is 6. Now, you need to find the mean value: (4 + 6)/2 = 5.
1, 1, 3, 4, (4), (6), 7, 8, 9, 11

Ans: The median is 5.


Calculation Of Median - Grouped Series
To find the median in a grouped series, simply divide the total number of observations
by 2. If the total is 200, the 100th observation is the median. Even if the total is 201,
the 100th observation may be taken as the median.

Step 1: Construct the cumulative frequency distribution.

Step 2: Identify the class that contains the median.

The median class is the first class whose cumulative frequency is at least n/2.

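Step 3: Find the median by using the following formula:

Formula
Median = L + ((n/2 − F) / f) × i

where
L = lower boundary of the median class
n = total number of observations
F = cumulative frequency of the class before the median class
f = frequency of the median class
i = width of the median class
Example
Step 1: Construct the cumulative frequency distribution.

Time to travel to work (minutes)    Frequency    Cumulative frequency
1 – 10                              8            8
11 – 20                             14           22
21 – 30                             12           34
31 – 40                             9            43
41 – 50                             7            50

Step 2: n/2 = 50/2 = 25, so the median class is 21 – 30, the first class whose
cumulative frequency (34) is at least 25.

Step 3: With L = 20.5, F = 22, f = 12 and i = 10,

Median = 20.5 + ((25 − 22) / 12) × 10 = 23

Thus, 25 persons take less than 23 minutes to travel to work and another 25 persons
take more than 23 minutes to travel to work.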
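The grouped-series steps can be sketched in Python as follows. This is an illustration under stated assumptions, not a definitive implementation: it uses the common interpolation formula Median = L + ((n/2 − F) / f) × i, and it assumes class boundaries lie halfway between adjacent class limits (so the 21 – 30 class runs from 20.5 to 30.5). The names `grouped_median` and `travel_times` are hypothetical.

```python
def grouped_median(classes):
    """Estimate the median of grouped data.

    classes: list of (lower_boundary, upper_boundary, frequency) in ascending order.
    """
    n = sum(freq for _, _, freq in classes)
    if n == 0:
        raise ValueError("no observations")

    half = n / 2
    cumulative_before = 0  # F: cumulative frequency before the current class
    for lower, upper, freq in classes:
        # The median class is the first class whose cumulative frequency reaches n/2.
        if cumulative_before + freq >= half:
            width = upper - lower
            return lower + (half - cumulative_before) / freq * width
        cumulative_before += freq


# Travel-time data, with boundaries assumed halfway between class limits.
travel_times = [
    (0.5, 10.5, 8),
    (10.5, 20.5, 14),
    (20.5, 30.5, 12),
    (30.5, 40.5, 9),
    (40.5, 50.5, 7),
]
median_minutes = grouped_median(travel_times)
print(median_minutes)
```

Taking 20.5 as the lower boundary of the 21 – 30 class, the sketch returns 23.0. The result moves one-for-one with whatever value is used for L, so a different boundary convention shifts the estimate by the same amount.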
MODE
▪ The mode is the score that occurs most frequently in a distribution.

▪ It is most easily determined by inspection of a frequency distribution.

▪ This is the most frequently occurring observation in a series, i.e. the most
common or most fashionable, such as 8 mm in the tuberculin test of 10 boys given
below: 3, 5, 7, 7, 8, 8, 8, 10, 11, 12.

▪ Mode is rarely used in medical studies. Of the three measures of central
tendency, the mean is better and is utilized more often because it uses all the
observations in the data and is further used in the tests of significance.
Advantages
▪ Mode is the most commonly or frequently occurring value in a series. The mode in a
distribution is that item around which there is maximum concentration.
▪ Mode is the size of the item which has the maximum frequency, but at times such an
item may not be the mode on account of the effect of the frequencies of the
neighbouring items.
▪ Like the median, the mode is a positional average and is not affected by the values of
extreme items. It is, therefore, useful in all situations where we want to
eliminate the effect of extreme variations.
▪ Mode is particularly useful in the study of popular sizes.
▪ Mode can be determined for qualitative data as well as quantitative data, but
the mean and the median can only be determined for quantitative data.
Example
A manufacturer of shoes is usually interested in finding out the size most in
demand so that he may manufacture a larger quantity of that size. In other words,
he wants a modal size to be determined, for the median or mean size would not serve
his purpose. But there are certain limitations of the mode as well.

For example, it is not amenable to algebraic treatment and sometimes remains
indeterminate when we have two or more modal values in a series. It is considered
unsuitable in cases where we want to give relative importance to items under
consideration.
Types of Mode
If two or more values appear with the same frequency, each is a mode. The
word modal is often used when referring to the mode of a data set.

▪ If a data set has only one value that occurs most often, the set is
called unimodal.
▪ A data set that has two values that occur with the same greatest
frequency is referred to as bimodal.
▪ When a set of data has more than two values that occur with the
same greatest frequency, the set is called multimodal.
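The three cases can be told apart by counting how many values share the highest frequency. The sketch below uses `collections.Counter` from the Python standard library; the function names `modes` and `modality` are hypothetical. Note one assumption: when every value occurs exactly once, this rule reports every value as a mode, whereas such a series is often described as having no mode.

```python
from collections import Counter


def modes(values):
    """Return all values that occur with the greatest frequency."""
    counts = Counter(values)
    if not counts:
        raise ValueError("mode of an empty series is undefined")
    top = max(counts.values())
    # Assumption: if all values occur once, each one is returned as a mode.
    return [value for value, count in counts.items() if count == top]


def modality(values):
    """Classify a series by the number of values sharing the top frequency."""
    k = len(modes(values))
    if k == 1:
        return "unimodal"
    if k == 2:
        return "bimodal"
    return "multimodal"
```

For instance, `modality([1, 1, 2, 3])` gives "unimodal", while `modality([1, 1, 2, 2, 3])` gives "bimodal".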
Example
The mode is the most frequent score in our data set. On a bar chart or histogram it is
represented by the highest bar. You can, therefore, sometimes consider the mode as being
the most popular option.
Example
Normally, the mode is used for categorical data where we wish to know which is the most
common category. In the illustrated data set, the most common form of transport is the bus.
Example
However, one of the problems with the mode is that it is not unique, so it leaves us with
problems when two or more values share the highest frequency.
Disadvantages
▪ The downside to using the mode as a measure of central tendency is that a set of data may
have no mode, or it may have more than one mode. However, the same set of data will
have only one mean and only one median.

▪ When determining the mode of a data set, calculations are not required, but keen
observation is a must. The mode is a measure of central tendency that is simple to locate,
but it is not used much in practical applications.

▪ Mode is very rarely used with continuous data.

▪ Mode may not provide us with a very good measure of central tendency when the most
common mark is far away from the rest of the data in the data set.
Example - Continuous Data
When two or more values share the highest frequency, we are stuck as to which mode best
describes the central tendency of the data. This is particularly problematic when we have
continuous data, because we are more likely not to have any one value that is more
frequent than the others.

For example, consider measuring the weight of 30 people (to the nearest 0.1 kg).

How likely is it that we will find two or more people with exactly the same weight
(e.g., 67.4 kg)? The answer is: very unlikely. Many people might be close, but with such
a small sample (30 people) and a large range of possible weights, you are unlikely to
find two people with exactly the same weight to the nearest 0.1 kg. This is why the mode
is very rarely used with continuous data.
Example – Location
Another problem with the mode is that it will not provide us with a very good measure of
central tendency when the most common mark is far away from the rest of the data in the
data set, as depicted in the diagram.

In the diagram the mode has a value of 2. We can clearly see, however, that the mode is not
representative of the data, which is mostly concentrated around the 20 to 30 value range.
To use the mode to describe the central tendency of this data set would be misleading.
Example - Ungrouped Series
Find the mode of the following data:
76, 81, 79, 80, 78, 83, 77, 79, 82, 75

▪ There is no need to organize the data, unless you think that it would be easier to
locate the mode if the numbers were arranged from least to greatest.
▪ In the above data set, the number 79 appears twice, but all the other numbers
appear only once.
▪ Since 79 appears with the greatest frequency, it is the mode of the data values.
Example – Qualitative Data
You begin to observe the color of clothing employees wear at a company. Your goal is to
find out what color is worn most frequently so that you can offer company shirts to your
employees.

Monday: Red, Blue, Black, Pink, Green, and Blue
Tuesday: Green, Blue, Pink, White, Blue, and Blue
Wednesday: Orange, White, White, Blue, Blue, and Red
Thursday: Brown, Black, Brown, Blue, White, and Blue
Friday: Blue, Black, Blue, Red, Red, and Pink

What is the mode of the colors above?

The color blue was worn 11 times during the week. All other colors were worn with much
less frequency in comparison to the color blue.

The mode is blue.
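The tally for this qualitative example can be reproduced with a short Python sketch using `collections.Counter`. The dictionary `week` simply transcribes the five daily lists; the variable names are illustrative.

```python
from collections import Counter

# Colors observed each day, transcribed from the example.
week = {
    "Monday": ["Red", "Blue", "Black", "Pink", "Green", "Blue"],
    "Tuesday": ["Green", "Blue", "Pink", "White", "Blue", "Blue"],
    "Wednesday": ["Orange", "White", "White", "Blue", "Blue", "Red"],
    "Thursday": ["Brown", "Black", "Brown", "Blue", "White", "Blue"],
    "Friday": ["Blue", "Black", "Blue", "Red", "Red", "Pink"],
}

# Count every color across the whole week.
counts = Counter(color for day in week.values() for color in day)

# most_common(1) returns the single (value, count) pair with the highest count.
mode_color, mode_count = counts.most_common(1)[0]
print(mode_color, mode_count)
```

No arithmetic on the colors is involved, only counting, which is why the mode can be found for qualitative data.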


Mode - Grouped Data

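For grouped data, the class mode (or modal class) is the class with the highest
frequency.

To find the mode for grouped data, use the following formula:

Mode = L + (Δ1 / (Δ1 + Δ2)) × i

where
L = lower boundary of the modal class
Δ1 = frequency of the modal class minus the frequency of the class before it
Δ2 = frequency of the modal class minus the frequency of the class after it
i = width of the modal class
Example
Based on the grouped data below, find the mode.

Time to travel to work (minutes)    Frequency
1 – 10                              8
11 – 20                             14
21 – 30                             12
31 – 40                             9
41 – 50                             7

Solution
The modal class is 11 – 20 (frequency 14), so L = 10.5, Δ1 = 14 − 8 = 6,
Δ2 = 14 − 12 = 2 and i = 10.

Mode = 10.5 + (6 / (6 + 2)) × 10 = 18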
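A minimal Python sketch of the grouped-mode calculation follows. It assumes the commonly used interpolation formula Mode = L + (Δ1 / (Δ1 + Δ2)) × i, where Δ1 and Δ2 are the differences between the modal class frequency and the frequencies of the classes before and after it, and it assumes class boundaries halfway between adjacent limits (10.5 to 20.5 for the 11 – 20 class). The names `grouped_mode` and `travel_times` are hypothetical.

```python
def grouped_mode(classes):
    """Estimate the mode of grouped data.

    classes: list of (lower_boundary, upper_boundary, frequency) in ascending order.
    """
    freqs = [freq for _, _, freq in classes]
    m = freqs.index(max(freqs))        # modal class: first class with the highest frequency
    lower, upper, f_modal = classes[m]

    # A missing neighbour (modal class at either end) is treated as frequency 0.
    f_before = freqs[m - 1] if m > 0 else 0
    f_after = freqs[m + 1] if m < len(freqs) - 1 else 0

    delta1 = f_modal - f_before
    delta2 = f_modal - f_after
    if delta1 + delta2 == 0:
        raise ValueError("mode is indeterminate: neighbouring classes share the top frequency")
    return lower + delta1 / (delta1 + delta2) * (upper - lower)


# Travel-time data, with boundaries assumed halfway between class limits.
travel_times = [
    (0.5, 10.5, 8),
    (10.5, 20.5, 14),
    (20.5, 30.5, 12),
    (30.5, 40.5, 9),
    (40.5, 50.5, 7),
]
mode_minutes = grouped_mode(travel_times)
print(mode_minutes)
```

With these assumptions the modal class is 11 – 20 and the estimate is 10.5 + (6 / 8) × 10 = 18.0 minutes.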
Mode – Histogram Method
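Mode can also be obtained from a histogram.

Step 1: Identify the modal class and the bar representing it.

Step 2: Draw two cross lines: one from the top-left corner of the modal bar to the
top-left corner of the next bar, and one from the top-right corner of the modal bar to
the top-right corner of the previous bar.

Step 3: Drop a perpendicular from the intersection of the two lines until it touches the
horizontal axis.

Step 4: Read the mode from the horizontal axis.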
Mode – Relation with Median and Mean
A distribution in which the values of mean, median and mode coincide (i.e. mean = median =
mode) is known as a symmetrical distribution. Conversely, when the values of mean, median
and mode are not equal, the distribution is known as an asymmetrical or skewed distribution.

In a moderately skewed or asymmetrical distribution, a very important relationship exists
among these three measures of central tendency.

In such distributions the distance between the mean and median is about one-third of the
distance between the mean and mode. Karl Pearson expressed this relationship as:

Mean − Mode = 3 (Mean − Median), i.e. Mode = 3 Median − 2 Mean

Knowing any two values, the third can be computed.
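As an illustration, Pearson's empirical relation, Mean − Mode ≈ 3 (Mean − Median), can be rearranged to give each measure from the other two. The Python sketch below does this; the function names and the sample figures (mean 30, median 28) are hypothetical, and the results are approximations that apply only to moderately skewed distributions.

```python
def empirical_mode(mean, median):
    """Mode ≈ 3 × Median − 2 × Mean."""
    return 3 * median - 2 * mean


def empirical_median(mean, mode):
    """Median ≈ (2 × Mean + Mode) / 3."""
    return (2 * mean + mode) / 3


def empirical_mean(median, mode):
    """Mean ≈ (3 × Median − Mode) / 2."""
    return (3 * median - mode) / 2


# Hypothetical figures for illustration only.
mode_estimate = empirical_mode(mean=30, median=28)
print(mode_estimate)
```

The three functions are consistent with one another: feeding the estimated mode back into `empirical_median` or `empirical_mean` recovers the starting median and mean.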
Thank you
