Measurement of Variation Dispersion 2
Specific objectives
At the end of this topic, the trainee should be able to:-
• State the characteristics of a good measure of dispersion.
• Differentiate between the absolute and relative measures.
• Calculate and interpret the measures of dispersion.
Introduction
Measures of variation help us to study important characteristics of a
distribution. Measures of dispersion are very useful in statistical work
because they indicate whether the data are clustered closely around the
mean or scattered far from it. If the data are closely clustered around the
mean, the measure of dispersion obtained will be small, indicating that the
mean is a good representative of the sample data. On the other hand, if the
values lie far from the mean, the measure of dispersion obtained will be
relatively large, indicating that the mean does not represent the data
adequately.
iv) To facilitate the use of other statistical measures.
The main measures of dispersion are:
• The range
• The Mean deviation
• The interquartile range or quartile deviation
• The standard deviation
• The Lorenz curve
a) The Range
The range is defined as the difference between the highest and the lowest
values in a frequency distribution. This measure is not very efficient
because it uses only two values of the distribution. Nevertheless, the
smaller the range, the less dispersed the observations are, and vice versa.
The range is not commonly used in business management because two sets of
data may have the same range yet differ considerably in their degree of
dispersion.
The range is the simplest method of studying variation. It is defined as the
difference between the value of the largest observation and the value of
the smallest observation:
$$R = L - S$$
where L is the largest observation and S is the smallest observation.
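As a quick illustration, here is a minimal Python sketch of R = L − S. The function name `value_range` is our own (hypothetical); the marks are those of the ten students in Example 1 below, and the last two lists are made-up data showing the limitation that two sets can share a range while being spread quite differently between the extremes.

```python
def value_range(values):
    """Range R = L - S: largest observation minus smallest observation."""
    if not values:
        raise ValueError("the range is undefined for an empty data set")
    return max(values) - min(values)


# Marks of the ten students in Example 1
marks = [60, 45, 75, 70, 65, 40, 69, 64, 50, 80]
print(value_range(marks))  # 80 - 40

# Hypothetical data: same range (100), very different spread in between
clustered = [0, 50, 50, 50, 100]
spread_out = [0, 0, 50, 100, 100]
print(value_range(clustered), value_range(spread_out))
```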
Limitations
• The range is not based on each and every observation of the distribution.
• It is subject to fluctuations of considerable magnitude from sample to
sample.
• The range cannot be computed for an open-ended distribution.
• The range tells us nothing about the character of the distribution
between the two extreme observations.
Uses of range
1. Quality control: control charts, which make use of the range, are
prepared to keep a check on the quality of products without 100%
inspection.
2. Fluctuation in share prices: the range is useful in studying the
variation in the prices of stocks, shares and other commodities.
3. Weather forecasting: the meteorological department uses the range to
determine the difference between the minimum and maximum
temperatures.
b) The Mean Deviation
Example 1
In a given exam the scores of 10 students were as follows:
Student   Mark (x)   |x − x̄|
A         60          1.8
B         45         16.8
C         75         13.2
D         70          8.2
E         65          3.2
F         40         21.8
G         69          7.2
H         64          2.2
I         50         11.8
J         80         18.2
Total     618       104.4
Required
Determine the absolute mean deviation (AMD).
$$\bar{x} = \frac{\sum x}{N} = \frac{618}{10} = 61.8$$
Therefore
$$\text{AMD} = \frac{\sum |x - \bar{x}|}{N} = \frac{104.4}{10} = 10.44$$
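The calculation can be checked with a short Python sketch; `absolute_mean_deviation` is a hypothetical helper name, and it measures deviations about the arithmetic mean, as in the example.

```python
def absolute_mean_deviation(values):
    """Mean of the absolute deviations |x - mean| about the arithmetic mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n


marks = [60, 45, 75, 70, 65, 40, 69, 64, 50, 80]
print(sum(marks) / len(marks))                    # arithmetic mean
print(round(absolute_mean_deviation(marks), 2))   # AMD
```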
Example 2
The following data were obtained from a given financial institution. The
data refer to the loans given out in 1996 to several firms.
Required
Calculate the mean deviation of the amounts of the loans given.
$$\bar{x} = \frac{\sum fx}{\sum f} = \frac{459{,}000}{19} = 24157.9$$
$$\text{AMD} = \frac{\sum f|x - \bar{x}|}{\sum f} = \frac{286736.90}{19} = 15091.42$$
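For grouped data the same idea weights each absolute deviation by its class frequency, AMD = Σf|x − x̄| / Σf. Since the loan table is not reproduced in this section, the midpoints and frequencies below are hypothetical and only check the function; the last line repeats the final division quoted in the text.

```python
def grouped_mean_deviation(midpoints, freqs):
    """AMD for grouped data: sum(f * |x - mean|) / sum(f), x = class midpoint."""
    total_f = sum(freqs)
    mean = sum(f * x for x, f in zip(midpoints, freqs)) / total_f
    return sum(f * abs(x - mean) for x, f in zip(midpoints, freqs)) / total_f


# Hypothetical grouped data (not the loan table); its mean is 20
print(grouped_mean_deviation([10, 20, 30], [1, 2, 1]))

# Final step of Example 2, using the totals quoted in the text
print(round(286736.90 / 19, 2))
```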
NB: If the absolute mean deviation is relatively small, the data are more
compact and the arithmetic mean is therefore a fair representative of the
sample.
c) The Interquartile Range and Quartile Deviation
The interquartile range is the difference between the upper and lower
quartiles, Q3 − Q1. Very often it is reduced to the semi-interquartile range
(SIR), or quartile deviation (Q.D.), by dividing it by 2:
$$\text{Q.D.} = \text{SIR} = \frac{Q_3 - Q_1}{2}$$
The quartile deviation gives the average amount by which the two quartiles
differ from the median. It is an absolute measure of variation; the
corresponding relative measure is called the coefficient of quartile
deviation:
$$\text{Coefficient of Q.D.} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$
Example 1
The weights (in kg) of 15 parcels recorded at the GPO were as follows:
16.2, 17, 20, 25 (Q1), 29, 32.2, 35.8, 36.8 (Q2), 40, 41, 42, 44 (Q3), 49, 52, 55
Required
Determine the semi-interquartile range for the above data.
$$\text{SIR} = \frac{Q_3 - Q_1}{2} = \frac{44 - 25}{2} = \frac{19}{2} = 9.5 \text{ kg}$$
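The quartiles above sit at positions p(n + 1) of the sorted data (4th, 8th and 12th for n = 15). A minimal Python sketch of that rule follows; `positional_quantile` is our own hypothetical helper, and it interpolates linearly when the position is fractional. Library routines may use other position conventions and can then return slightly different quartiles.

```python
def positional_quantile(data, p):
    """Value at position p*(n + 1) of the sorted data (1-based), interpolating
    linearly between neighbouring observations for fractional positions."""
    xs = sorted(data)
    n = len(xs)
    pos = p * (n + 1)
    if pos <= 1:
        return xs[0]
    if pos >= n:
        return xs[-1]
    whole = int(pos)              # whole part of the position
    frac = pos - whole            # fractional part used for interpolation
    return xs[whole - 1] + frac * (xs[whole] - xs[whole - 1])


weights = [16.2, 17, 20, 25, 29, 32.2, 35.8, 36.8, 40, 41, 42, 44, 49, 52, 55]
q1 = positional_quantile(weights, 0.25)   # 4th value
q2 = positional_quantile(weights, 0.50)   # 8th value (median)
q3 = positional_quantile(weights, 0.75)   # 12th value
sir = (q3 - q1) / 2                       # semi-interquartile range
coeff_qd = (q3 - q1) / (q3 + q1)          # coefficient of quartile deviation
print(q1, q2, q3, sir, round(coeff_qd, 3))
```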
Example 2
Required
i. Determine the semi-interquartile range for the above data.
ii. Determine the minimum value for the top ten per cent (10%) of the retirees.
iii. Determine the maximum value for the lower 40% of the retirees.
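Solution
i. The lower quartile (Q1) lies at position
$$\frac{N + 1}{4} = \frac{382 + 1}{4} = 95.75$$
The value of Q1 is
$$Q_1 = 29.5 + \frac{95.75 - 50}{69} \times 10 = 29.5 + 6.63 = 36.13$$
i.e. £36,130.
The upper quartile (Q3) lies at position
$$\frac{3(N + 1)}{4} = \frac{3 \times 383}{4} = 287.25$$
and its value is Q3 = 61.08, i.e. £61,080.
$$\text{SIR} = \frac{Q_3 - Q_1}{2} = \frac{61.08 - 36.13}{2} = 12.475$$
i.e. £12,475.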
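Each grouped-data quartile above uses the interpolation L + (position − cf)/f × c, where L is the lower class boundary, cf the cumulative frequency before the class, f the class frequency and c the class width. The Python sketch below applies that formula with the figures read off the worked solution; `grouped_quantile` is a hypothetical helper name, and Q3 = 61.08 is taken as quoted because its working is not shown.

```python
def grouped_quantile(lower_boundary, position, cf_before, class_freq, width):
    """Interpolate within the class containing `position`:
    L + (position - cf_before) / f * c."""
    return lower_boundary + (position - cf_before) / class_freq * width


n = 382                                   # number of retirees
q1_position = (n + 1) / 4
q1 = grouped_quantile(29.5, q1_position, 50, 69, 10)
q3 = 61.08                                # value quoted in the text
sir = (q3 - q1) / 2                       # semi-interquartile range

print(q1_position, round(q1, 2), round(sir, 3))
```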
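ii. The top 10% of the retirees lie above the lower 90%, so the minimum
value for the top 10% is the 90th percentile. The position corresponding to
the lower 90% is
$$\frac{90}{100}(n + 1) = 0.9(382 + 1) = 0.9 \times 383 = 344.7$$
The value at this position is
$$P_{90} = 69.5 + \frac{344.7 - 331}{40} \times 10 = 72.925$$
i.e. about £72,925.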
iii. The lower 40% corresponds to position
$$\frac{40}{100}(382 + 1) = 153.2$$
$$P_{40} = 39.5 + \frac{153.2 - 119}{70} \times 10 = 39.5 + 4.89 = 44.39$$
i.e. about £44,390.
Example
Using the above data for retirees, calculate the 10–90 percentile range
(P90 − P10).
The 10th percentile lies at position
$$\frac{10}{100}(382 + 1) = 0.1 \times 383 = 38.3$$
∴ the value corresponding to the 10th percentile is
$$P_{10} = 19.5 + \frac{38.3 \times 10}{50} = 19.5 + 7.66 = 27.16$$
The 90th percentile lies at position
$$\frac{90}{100}(382 + 1) = 0.9 \times 383 = 344.7$$
∴ the value corresponding to the 90th percentile is
$$P_{90} = 69.5 + \frac{344.7 - 331}{40} \times 10 = 69.5 + 3.425 = 72.925$$
∴ the required 10–90 percentile range is
P90 − P10 = 72.925 − 27.16 = 45.765
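The percentile positions and values for the retirees' data can be reproduced with the same interpolation formula. In this Python sketch the helper names are hypothetical, and the class boundaries, cumulative frequencies (0, 119, 331) and class frequencies (50, 70, 40) are those used in the worked solutions.

```python
def grouped_quantile(lower_boundary, position, cf_before, class_freq, width):
    """L + (position - cf_before) / f * c for the class containing `position`."""
    return lower_boundary + (position - cf_before) / class_freq * width


def percentile_position(p, n):
    """Position of the p-th percentile using the (n + 1) convention."""
    return p / 100 * (n + 1)


n = 382
p10 = grouped_quantile(19.5, percentile_position(10, n), 0, 50, 10)
p40 = grouped_quantile(39.5, percentile_position(40, n), 119, 70, 10)
p90 = grouped_quantile(69.5, percentile_position(90, n), 331, 40, 10)

print(round(p10, 2), round(p40, 2), round(p90, 3))
print(round(p90 - p10, 3))   # 10-90 percentile range
```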
Limitations
i) The quartile deviation ignores 50% of the items (the lowest 25% and the
highest 25%). Since its value does not depend on every observation, it
cannot be regarded as a good method of measuring variation.
ii) It is not capable of further mathematical manipulation.
iii) Its value is very much affected by sampling fluctuations.
iv) It is not really a measure of variation, since it does not show the
scatter around an average but rather a distance on a scale.
d) The Standard Deviation
• The mean deviation can be computed from either the median or the mean.
The standard deviation, on the other hand, is always computed from the
arithmetic mean, because the sum of the squares of the deviations of
items from the arithmetic mean is smaller than from any other value.
The standard deviation is one of the most accurate measures of dispersion.
It has the following advantages:
i. It uses all the values given.
ii. It takes account of both negative and positive deviations, since
squaring removes their signs.
iii. It gives an accurate impression of how much the sample data vary
from the mean, and its suitability can also be tested using other
statistical methods.
Example
A sample comprises the following observations: 14, 18, 17, 16, 25, 31.
Determine the standard deviation of this sample.
$$\bar{x} = \frac{\sum x}{n} = \frac{121}{6} = 20.17$$
Observation x   x − x̄     (x − x̄)²
14              −6.17      38.0689
18              −2.17       4.7089
17              −3.17      10.0489
16              −4.17      17.3889
25               4.83      23.3289
31              10.83     117.2889
Total 121                 210.8334
$$\text{Standard deviation, } \sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} = \sqrt{\frac{210.8334}{6}} = 5.93$$
Alternative method
x       x²
14      196
18      324
17      289
16      256
25      625
31      961
Total 121   2651
$$\sigma = \sqrt{\frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2} = \sqrt{\frac{2651}{6} - \left(\frac{121}{6}\right)^2} = 5.93$$
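Both routes above can be checked in a few lines of Python (the helper names are hypothetical). The sketch divides by n, as the text does; the standard library's `statistics.pstdev` uses the same divisor, whereas `statistics.stdev` divides by n − 1 and gives a larger value (about 6.49 for this sample).

```python
import math
import statistics


def sd_deviation_method(values):
    """sqrt(sum((x - mean)^2) / n), working from deviations about the mean."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)


def sd_raw_method(values):
    """sqrt(sum(x^2)/n - (sum(x)/n)^2), the 'alternative method'."""
    n = len(values)
    return math.sqrt(sum(x * x for x in values) / n - (sum(values) / n) ** 2)


sample = [14, 18, 17, 16, 25, 31]
print(round(sd_deviation_method(sample), 2))
print(round(sd_raw_method(sample), 2))
print(round(statistics.pstdev(sample), 2))   # population SD, divisor n
```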
Example 2
The following table shows the part-time hourly pay rates of a number of
laborers in June 1997.
Calculate the standard deviation to show how the hourly payments vary
about their mean.
$$\sigma = \sqrt{\frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2} = \sqrt{\frac{2345300}{35} - \left(\frac{8410}{35}\right)^2}$$
$$= \sqrt{67008.6 - 57737.2} = \sqrt{9271.4} = 96.29$$
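When only the column totals Σf, Σfx and Σfx² are known, the standard deviation follows directly from them. A minimal Python sketch, with `sd_from_totals` as a hypothetical helper name and the totals taken from the example:

```python
import math


def sd_from_totals(sum_f, sum_fx, sum_fx2):
    """sigma = sqrt(sum(fx^2)/sum(f) - (sum(fx)/sum(f))^2)."""
    mean = sum_fx / sum_f
    return math.sqrt(sum_fx2 / sum_f - mean ** 2)


sigma = sd_from_totals(35, 8410, 2345300)
print(round(sigma, 2))
```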
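In business statistical work we usually encounter grouped data. To
determine the standard deviation of such data, we may use any of the
following three methods:
i. The long method (working directly with the class midpoints x)
ii. The shorter method (using deviations d = x − A from an assumed mean A)
iii. The coded method (using step deviations u = (x − A)/c, where c is the
class width)
These methods are illustrated in the following example.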
Example 3.1
The quality controller in a given firm kept an accurate record of all the iron
bars produced in May 1997. The following data show those records.
Long method:
$$\sigma = \sqrt{\frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2} = \sqrt{\frac{47489526}{313} - \left(\frac{118981.50}{313}\right)^2} = 84.99 \text{ cm}$$
Shorter method (assumed mean A = 375.5, so d = x − 375.5):
Class       f     Midpoint x   d      fd      fd²
…
401 – 450   51    425.5        50     2550    127500
451 – 500   42    475.5        100    4200    420000
501 – 550   30    525.5        150    4500    675000
Total       313                       1450    2267500
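$$\sigma = \sqrt{\frac{\sum fd^2}{\sum f} - \left(\frac{\sum fd}{\sum f}\right)^2} = \sqrt{\frac{2267500}{313} - \left(\frac{1450}{313}\right)^2}$$
$$= \sqrt{7244.41 - 21.46} = \sqrt{7222.95} = 84.99 \text{ cm}$$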
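Coded method (u = d/c with c = 50, so that Σfu = 29 and Σfu² = 907):
$$\sigma = c\sqrt{\frac{\sum fu^2}{\sum f} - \left(\frac{\sum fu}{\sum f}\right)^2} = 50\sqrt{\frac{907}{313} - \left(\frac{29}{313}\right)^2} = 50 \times 1.6997 = 84.99 \text{ cm}$$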
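The three methods apply one formula to different columns: shifting every value by the assumed mean A does not change the spread, while dividing by the class width c scales it, so the coded result is multiplied back by c. A Python sketch using the totals from Example 3.1 (the helper name `sd_from_sums` is hypothetical):

```python
import math


def sd_from_sums(total_f, sum_fv, sum_fv2, scale=1.0):
    """scale * sqrt(sum(f v^2)/sum(f) - (sum(f v)/sum(f))^2), where v is x, d or u."""
    return scale * math.sqrt(sum_fv2 / total_f - (sum_fv / total_f) ** 2)


long_method = sd_from_sums(313, 118981.50, 47489526)   # v = x
short_method = sd_from_sums(313, 1450, 2267500)        # v = d = x - 375.5
coded_method = sd_from_sums(313, 29, 907, scale=50)    # v = u = d / 50

for name, value in [("long", long_method), ("short", short_method), ("coded", coded_method)]:
    print(name, round(value, 2))
```

The coded method keeps the intermediate numbers small, which is why it is usually preferred for hand calculation.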
Variance
The square of the standard deviation is called the variance, σ².
e) The Lorenz Curve
Illustration
The following table gives the number of companies in two areas, A and B,
classified according to the profits they earned. Draw their Lorenz curves
on the same diagram and interpret them.
Profits earned      Number of companies
(shs 000s)          Area A     Area B
6                   6          2
25                  11         38
60                  13         52
84                  14         28
105                 15         38
150                 17         26
170                 10         12
400                 14         4
Solution
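A Lorenz curve plots the cumulative percentage of companies against the cumulative percentage of total profit. The Python sketch below computes those plotting points for both areas; it assumes (our assumption) that every company in a row earned that row's profit figure, so a row's total profit is profit × number of companies. The function name `lorenz_points` is hypothetical.

```python
from itertools import accumulate

# Profit earned per company (shs 000s) and number of companies in each area
profits = [6, 25, 60, 84, 105, 150, 170, 400]
companies = {
    "A": [6, 11, 13, 14, 15, 17, 10, 14],
    "B": [2, 38, 52, 28, 38, 26, 12, 4],
}


def lorenz_points(profit_levels, counts):
    """Return (cumulative % of companies, cumulative % of total profit) pairs,
    starting at the origin."""
    row_profit = [p * n for p, n in zip(profit_levels, counts)]
    total_n, total_p = sum(counts), sum(row_profit)
    cum_n = [100 * c / total_n for c in accumulate(counts)]
    cum_p = [100 * c / total_p for c in accumulate(row_profit)]
    return [(0.0, 0.0)] + list(zip(cum_n, cum_p))


for area, counts in companies.items():
    points = lorenz_points(profits, counts)
    print(area, [(round(x, 1), round(y, 1)) for x, y in points])
```

Plotting each list against the diagonal line of equal distribution (for example with matplotlib) gives the two curves; the curve lying further from the diagonal indicates the more unequal distribution of profits.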
Relative measures of dispersion
Def: A relative measure of dispersion is a statistical value that may be
used to compare the variation in two or more samples.
Relative measures of dispersion are usually expressed as decimals or
percentages and have no units.
Example
The average distance covered by vehicles in a motor rally may be given as
2000 km with a standard deviation of 5 km.
In another competition, a set of vehicles covered 3000 km with a standard
deviation of 10 km.
NB: The two standard deviations given above are absolute measures of
dispersion. They are actual deviations of the measurements from their
respective means, expressed in the original units (km).
However, absolute measures are not very useful when comparing dispersion
among samples with different means or different units.
Therefore the following relative measures are usually employed to assess
the degree of dispersion:
i. Coefficient of mean deviation
$$= \frac{\text{Mean deviation}}{\text{Mean}}$$
ii. Coefficient of quartile deviation
$$= \frac{\frac{1}{2}(Q_3 - Q_1)}{Q_2}$$
iii. Coefficient of variation
$$\text{C.O.V} = \frac{\text{Standard deviation}}{\text{Mean}} \times 100$$
First competition:
$$\text{C.O.V} = \frac{5}{2000} \times 100 = 0.25\%$$
Second competition:
$$\text{C.O.V} = \frac{10}{3000} \times 100 = 0.33\%$$
Conclusion
Since the coefficient of variation is greater in the 2nd group than in the
1st group, we may conclude that the distances covered in the 1st group are
closer to the mean than those in the 2nd group.
Example 2
In a given firm located in the UK, the average salary of the employees is
£3500 with a standard deviation of £150.
The same firm has a local branch in Kenya in which the average salary is
Kshs 8500 with a standard deviation of Kshs 800.
Determine the coefficient of variation in the two firms and briefly comment
on the degree of dispersion of the salaries in the two firms.
First firm, in the UK:
$$\text{C.O.V} = \frac{150}{3500} \times 100 = 4.29\%$$
Local branch in Kenya:
$$\text{C.O.V} = \frac{800}{8500} \times 100 = 9.41\%$$
Conclusion: since the C.O.V is unit-free, the salaries in £ and Kshs can be
compared directly. As 4.29% < 9.41%, the salaries offered by the firm in the
UK are much more closely clustered around their mean than those in the
local branch in Kenya.
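The relative measures are simple ratios. A Python sketch (hypothetical helper names) reproducing the coefficients of variation in both examples, plus the coefficient of mean deviation for the exam marks of Example 1 in the mean deviation section (mean deviation 10.44, mean 61.8):

```python
def coefficient_of_variation(sd, mean):
    """C.O.V = standard deviation / mean x 100, a unit-free percentage."""
    return sd / mean * 100


def coefficient_of_mean_deviation(mean_deviation, mean):
    """Mean deviation / mean."""
    return mean_deviation / mean


print(round(coefficient_of_variation(5, 2000), 2))     # rally, 1st group
print(round(coefficient_of_variation(10, 3000), 2))    # rally, 2nd group
print(round(coefficient_of_variation(150, 3500), 2))   # UK firm (GBP)
print(round(coefficient_of_variation(800, 8500), 2))   # Kenyan branch (Kshs)
print(round(coefficient_of_mean_deviation(10.44, 61.8), 3))  # exam marks
```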
Combined mean
Let m be the combined mean.
Let x̄1 be the mean of the first sample.
Let x̄2 be the mean of the second sample.
Let n1 be the size of the 1st sample.
Let n2 be the size of the 2nd sample.
Let s1 be the standard deviation of the 1st sample.
Let s2 be the standard deviation of the 2nd sample.
$$m = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2}$$
The combined standard deviation is
$$s = \sqrt{\frac{n_1 s_1^2 + n_1(m - \bar{x}_1)^2 + n_2 s_2^2 + n_2(m - \bar{x}_2)^2}{n_1 + n_2}}$$
Example
A sample of 40 electric batteries gives a mean life span of 600 hours with a
standard deviation of 20 hours.
Another sample of 50 electric batteries gives a mean life span of 520 hours
with a standard deviation of 30 hours.
If these two samples were combined and used in a given project
simultaneously, determine the mean of the combined sample and hence the
combined (pooled) standard deviation.
Sample   Size (n)   Mean (x̄)    SD (s)
1        40         600 hrs      20 hrs
2        50         520 hrs      30 hrs
$$m = \frac{40(600) + 50(520)}{40 + 50} = \frac{50{,}000}{90} = 555.56 \text{ hrs}$$
$$s = \sqrt{\frac{40(20)^2 + 40(555.56 - 600)^2 + 50(30)^2 + 50(555.56 - 520)^2}{90}}$$
$$= \sqrt{\frac{16000 + 78996.54 + 45000 + 63225.68}{90}} = \sqrt{2258.02} = 47.52 \text{ hrs}$$
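The combined mean and pooled standard deviation can be checked with a short Python sketch of the two formulas above; `combine_samples` is a hypothetical name, and it divides by n1 + n2, matching the document's convention.

```python
import math


def combine_samples(n1, mean1, sd1, n2, mean2, sd2):
    """Combined mean m and combined SD of two samples (divisor n1 + n2)."""
    n = n1 + n2
    m = (n1 * mean1 + n2 * mean2) / n
    total = (n1 * sd1 ** 2 + n1 * (m - mean1) ** 2
             + n2 * sd2 ** 2 + n2 * (m - mean2) ** 2)
    return m, math.sqrt(total / n)


m, s = combine_samples(40, 600, 20, 50, 520, 30)
print(round(m, 2), round(s, 2))
```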
SKEWNESS
- This is a concept which is commonly used in statistical decision
making. It refers to the degree to which a given frequency curve
deviates from the symmetrical shape of the normal distribution.
- There are two types of skewness, namely:
i. Positive skewness
ii. Negative skewness
1. Positive Skewness
- A positively skewed frequency curve leans towards the left, i.e. its
peak lies to the left. In a positively skewed distribution, the long tail
extends to the right.
[Figure: a positively skewed and a negatively skewed frequency curve (vertical axis: frequency), each compared with the normal distribution, showing the long tail and the positions of the mean, median and mode.]
2. Negative Skewness
This is an asymmetrical curve in which the long tail extends to the left.
NB: A frequency curve of this shape is characteristic of the age
distribution in developed countries.
- The mode is usually greater than the mean and the median.
- The median usually lies between the mean and the mode.
- The number of observations above the mean is usually greater than
the number below the mean (see the shaded region).
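MEASURES OF SKEWNESS
- These are numerical values which help in evaluating the degree of
deviation of a frequency distribution from the normal distribution.
1. Coefficient of skewness
$$= \frac{3(\text{mean} - \text{median})}{\text{Standard deviation}}$$
2. Coefficient of skewness
$$= \frac{\text{mean} - \text{mode}}{\text{Standard deviation}}$$
NB: These two coefficients are also known as Pearsonian measures of
skewness.
3. Quartile coefficient of skewness
$$= \frac{Q_3 + Q_1 - 2Q_2}{Q_3 - Q_1}$$
where Q1 = 1st quartile, Q2 = median and Q3 = 3rd quartile.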
NB: The Pearsonian coefficients of skewness usually lie between −3 and +3.
A value close to −3 indicates that the frequency distribution is negatively
skewed and that the amount of skewness is quite high.
Similarly, a value close to +3 indicates that the frequency distribution is
positively skewed and deviates considerably from the normal distribution.
Example
The following information was obtained from an NGO which was giving
small loans to some small-scale business enterprises in 1996. The loans are
in thousands of Kshs.
Required
Using the Pearsonian measure of skewness, calculate the coefficient of
skewness and hence comment briefly on the nature of the distribution of
the loans.
$$\text{Arithmetic mean} = \text{Assumed mean} + \frac{c\sum fu}{\sum f} = 63 + \frac{428 \times 5}{610} = 66.51$$
It is important to note that obtaining the arithmetic mean (or any other
statistic) by subtracting the assumed mean (A) from x and then dividing by c
can be a bit confusing. If this is the case, simply use the direct method:
$$\text{Arithmetic mean} = \frac{\sum fx}{\sum f}$$
where x is the class midpoint; the answers are the same.
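$$\text{Standard deviation} = c\sqrt{\frac{\sum fu^2}{\sum f} - \left(\frac{\sum fu}{\sum f}\right)^2} = 5 \times \sqrt{\frac{3086}{610} - \left(\frac{428}{610}\right)^2} = 10.68$$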
The position of the median is
$$\frac{n + 1}{2} = \frac{610 + 1}{2} = 305.5$$
$$\text{Median} = 60.5 + \frac{305.5 - 191}{120} \times 5 = 60.5 + \frac{114.5 \times 5}{120} = 65.27$$
Therefore the Pearsonian coefficient of skewness is
$$\frac{3(66.51 - 65.27)}{10.68} = 0.348$$
Comment
The coefficient of skewness obtained suggests that the frequency
distribution of the loans given was positively skewed, because the
coefficient itself is positive. However, the skewness is not very high,
implying that the deviation of the frequency distribution from the normal
distribution is small.
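The whole Pearsonian calculation can be reproduced from the coded totals quoted in the text (A = 63, c = 5, Σf = 610, Σfu = 428, Σfu² = 3086, median class 60.5–65.5 with cf = 191 and f = 120). In this Python sketch the variable names are our own; carrying full precision gives about 0.347, while the text's 0.348 comes from the rounded mean and median.

```python
import math

# Totals and class details quoted in the text for the NGO loan data
A, c = 63, 5                        # assumed mean and class width
sum_f, sum_fu, sum_fu2 = 610, 428, 3086

mean = A + c * sum_fu / sum_f
sd = c * math.sqrt(sum_fu2 / sum_f - (sum_fu / sum_f) ** 2)

# Median by interpolation in the class 60.5-65.5 (cf before = 191, f = 120)
median_position = (sum_f + 1) / 2
median = 60.5 + (median_position - 191) / 120 * c

pearson = 3 * (mean - median) / sd
print(round(mean, 2), round(sd, 2), round(median, 2), round(pearson, 3))
```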
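Example 2
Using the above data, calculate the quartile coefficient of skewness.
$$\text{Quartile coefficient of skewness} = \frac{Q_3 + Q_1 - 2Q_2}{Q_3 - Q_1}$$
The position of Q1 is (610 + 1)/4 = 152.75.
The position of Q2 is 2(610 + 1)/4 = 305.5.
The position of Q3 is 3(610 + 1)/4 = 458.25.
Using Q1 = 58.53, Q2 = 65.27 (the median found above) and Q3 = 73.83 (the
quartile values used in the kurtosis example below):
$$\frac{73.83 + 58.53 - 2(65.27)}{73.83 - 58.53} = \frac{1.82}{15.30} = 0.119$$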
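Conclusion
The conclusion is the same as when the Pearsonian coefficient was used: the
coefficient is positive, so the distribution of the loans is positively
skewed, but only slightly.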
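The quartile coefficient is a one-line formula. A Python sketch (hypothetical function name) using the quartiles quoted in the text; the second call shows that evenly spaced quartiles give zero skewness.

```python
def quartile_skewness(q1, q2, q3):
    """Quartile coefficient of skewness: (Q3 + Q1 - 2*Q2) / (Q3 - Q1)."""
    return (q3 + q1 - 2 * q2) / (q3 - q1)


print(round(quartile_skewness(58.53, 65.27, 73.83), 3))  # NGO loan data
print(quartile_skewness(1, 2, 3))                         # symmetric quartiles
```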
KURTOSIS
- This is a concept which refers to the degree of peakedness of a
given frequency distribution. The degree is normally measured
with reference to the normal distribution.
- The concept of kurtosis is very useful in decision-making
processes: if a frequency distribution has either a markedly higher or
a markedly lower peak than the normal distribution, it should not be
used to make statistical inferences that assume normality.
- Generally there are three types of kurtosis, namely:
i. Leptokurtic
ii. Mesokurtic
iii. Platykurtic
a) Leptokurtic: a frequency distribution which is leptokurtic generally
has a higher peak than that of the normal distribution. Its moment
coefficient of kurtosis (β2 = M4/M2²) will be found to be more than 3;
thus frequency distributions with a value of more than 3 are
leptokurtic.
b) Mesokurtic: some frequency distributions, when plotted, produce a
curve similar to that of the normal distribution. Such frequency
distributions are referred to as mesokurtic, and their coefficient of
kurtosis is equal to 3.
c) Platykurtic: when the frequency curve produces a peak which is
lower than that of a normal distribution, the curve is said to be
platykurtic. Its coefficient of kurtosis is less than 3.
- It is necessary to calculate a numerical measure of kurtosis. One
commonly used measure is the percentile coefficient of kurtosis,
which is determined using the following equation:
$$\text{Percentile coefficient of kurtosis, } K\ (\text{kappa}) = \frac{\frac{1}{2}(Q_3 - Q_1)}{P_{90} - P_{10}}$$
For a normal distribution K ≈ 0.263; larger values indicate a platykurtic
distribution and smaller values a leptokurtic one.
Example
Refer to the table above for loans to small business firms/units.
Required
Calculate the percentile coefficient of kurtosis.
P90 lies at position
$$\frac{90}{100}(n + 1) = 0.9(610 + 1) = 0.9(611) = 549.9$$
The actual loan for a firm in this position is
$$P_{90} = 80.5 + \frac{549.9 - 538}{40} \times 5 = 81.99$$
P10 lies at position
$$\frac{10}{100}(n + 1) = 0.1(611) = 61.1$$
The actual loan given to the firm in this position is
$$P_{10} = 50.5 + \frac{61.1 - 32}{62} \times 5 = 52.85$$
∴ Percentile coefficient of kurtosis
$$K = \frac{\frac{1}{2}(Q_3 - Q_1)}{P_{90} - P_{10}} = \frac{\frac{1}{2}(73.83 - 58.53)}{81.99 - 52.85} = \frac{7.65}{29.14} = 0.26$$
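Since K = 0.26 is practically equal to 0.263, the value of the percentile
coefficient for a normal distribution, it can be concluded that the
distribution of loans is approximately mesokurtic. (The benchmark of 3
applies to the moment coefficient β2, not to the percentile coefficient.)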
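The percentile coefficient can be checked in Python. The class details (80.5, cf 538, f 40; 50.5, cf 32, f 62; width 5) and the quartiles are those quoted in the example; the helper name is hypothetical. The last lines compute the normal-distribution benchmark from the standard library's `statistics.NormalDist`, which gives about 0.263.

```python
from statistics import NormalDist


def grouped_quantile(lower_boundary, position, cf_before, class_freq, width):
    """L + (position - cf_before) / f * c for the class containing `position`."""
    return lower_boundary + (position - cf_before) / class_freq * width


n = 610
p90 = grouped_quantile(80.5, 0.9 * (n + 1), 538, 40, 5)
p10 = grouped_quantile(50.5, 0.1 * (n + 1), 32, 62, 5)
q1, q3 = 58.53, 73.83              # quartiles as quoted in the text
kappa = 0.5 * (q3 - q1) / (p90 - p10)

# Benchmark: the same coefficient for a standard normal distribution
z = NormalDist()
kappa_normal = 0.5 * (z.inv_cdf(0.75) - z.inv_cdf(0.25)) / (z.inv_cdf(0.90) - z.inv_cdf(0.10))

print(round(p90, 2), round(p10, 2), round(kappa, 3), round(kappa_normal, 3))
```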
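Kurtosis is also measured by moment statistics, which use the exact value
of each observation. With m the mean and n the number of observations,
the central moments are
$$M_2 = \frac{\sum (X - m)^2}{n}, \quad M_3 = \frac{\sum (X - m)^3}{n}, \quad M_4 = \frac{\sum (X - m)^4}{n}$$
For a frequency distribution, each term is weighted by its frequency f and
the sum is divided by Σf. The moment coefficient of kurtosis is
β2 = M4/M2², which equals 3 for a normal distribution.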
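A minimal Python sketch of the central moments and the moment coefficient of kurtosis; the helper names are hypothetical, frequencies default to 1 for ungrouped data, and the five evenly spread values are illustrative only (their β2 of 1.7 is below 3, i.e. flatter than the normal curve).

```python
def central_moment(values, r, freqs=None):
    """r-th central moment: sum(f * (x - m)**r) / sum(f); f defaults to 1."""
    if freqs is None:
        freqs = [1] * len(values)
    total = sum(freqs)
    m = sum(f * x for x, f in zip(values, freqs)) / total
    return sum(f * (x - m) ** r for x, f in zip(values, freqs)) / total


def beta2(values, freqs=None):
    """Moment coefficient of kurtosis M4 / M2**2 (3 for a normal distribution)."""
    m2 = central_moment(values, 2, freqs)
    return central_moment(values, 4, freqs) / m2 ** 2


data = [1, 2, 3, 4, 5]     # illustrative, evenly spread data
print(central_moment(data, 2), central_moment(data, 3), beta2(data))
```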