Research 10112


Q. State the meaning of Central Tendency (Averages) and State the Qualities of an Ideal
Measure of Central Tendency.
A. Meaning:
Condensation of data is necessary in statistical analysis, because a large mass of figures is not only
confusing but also difficult to analyze. Therefore, in order to reduce the complexity of data and to make
them comparable, it is necessary that the various phenomena being compared are reduced to a single
figure. Such a figure, the measure of central tendency, is the typical value of the entire group or data. It
describes the characteristics of the entire mass of data, reduces its complexity and makes it comparable.
According to Prof. R.A. Fisher, “The inherent inability of the human mind to grasp in its entirety a large
body of numerical data compels us to seek relatively few constants that will adequately describe the
data”. The human mind is incapable of remembering the entire mass of unwieldy data, so a single figure,
which must be a representative number, is used to describe the series. It is generally called “a measure
of central tendency” or “the average”.

Q. Explain Measures of Central Tendency.


A. Definition:
It is an attempt to arrive at a single value, drawn or computed from the entire volume of data, in a
simple, easy and concise manner.

The word ‘average’, or the term ‘measure of central tendency’, has been defined by various authors.
Some of the definitions are given below:
Simpson and Kafka observe that “A measure of central tendency is a typical value around which other
figures congregate”.
“One of the most widely used sets of summary figures is known as measures of location, which are
often referred to as averages, central tendency or central location. The purpose of computing an
average value for a set of observations is to obtain a single value which is representative of all items
and which the mind can grasp simply and quickly. The single value is the point of location around
which the individual items cluster”.
- Lawrence J. Kaplan
Ya-Lun Chou states that “an average is a typical value in the sense that it is sometimes employed to
represent all the individual values in a series of a variable”.

Objectives of Averaging:
1. To arrive at a single value which is representative of the characteristics of the entire mass of data.
2. To facilitate comparison.
3. To trace price relationship.
4. To know about the universe from the sample.
5. To help in decision making.

Characteristics of a Good Average:


According to J.F. Kenney and E.S. Keeping, an average should be:

 First Edition : June 1986 First Page Total pages 17


Revised Edition : Print out Day, Date & time - Wednesday, June 09, 2021 11:20 AM
Utmost care is taken to make the edition error free; the auth or/publisher is not responsible for
printing/human/mechanical errors and not liable for any kinds of damages.
Statutory Warning : All the Concepts, Principles and formulae used in this material are Applicable
in Standard Situations , however may be applicable critically in special situations also. Hence, it is
advised to solve under proper guidance only. Call Sandeep Bapat on 99604 35695 For
Advanced, Authentic, Absolute & Assured Commerce Coaching under Healthy Learning Environment
Page 2 Module QST 02 Statistics • Taxation • Accounting • Research • Tutorials S.T.A.R.T
(a) rigorously defined, (b) easy to compute, (c) capable of simple interpretation, (d) dependent on all
the observed values, (e) not unduly influenced by one or two extremely large or small values, (f) not
fluctuating unduly from one random sample to another, and (g) capable of mathematical manipulation.
1. It should be rigidly defined.
2. Its definitions should be in the form of a mathematical formula.
3. It should be easy to calculate and easy to follow.
4. It should not be affected much by a few extreme values.
5. It should be capable of further statistical computation or processing.
6. It should be capable of further algebraic treatment.
7. It should be capable of being used in further statistical computation or processing.
8. It should possess sampling stability.

Characteristics of a Typical Average (Qualities of Central Tendency):


A measure of central tendency is a typical value around which the other figures congregate. An average
condenses a frequency distribution into one figure. Prof. Bowley said, “Statistics is a science of
averages”. According to Yule and Kendall, an average must possess the following characteristics:
1) It should be rigidly defined so that there is no confusion with regard to its meaning or connotation.
2) It should be easy to understand.
3) It should be simple to calculate.
4) Its definition should be in the form of mathematical formula.
5) It should be based on all the items in the data.
6) It should not be unduly influenced by any single item or a group of items.
7) It should be capable of further algebraic treatment.
8) It should be capable of being used in further statistical computation.
9) It should have sampling stability.

Q. State the various methods of calculating the measures of central tendency (Averages).
A. The following are the main types of averages:
1) Median:
It is the middlemost or central value in a set of observations, which exactly divides the arranged series
(in ascending or descending order) into two equal parts.

The median is another important and widely used statistical average. It has the connotation of the
‘middlemost’ or ‘most central’ value of a set of numbers.
As the name itself suggests median is the value of middle item of a series arranged in an ascending
or a descending order of magnitudes.
Unlike the arithmetic average, the median does not take into account the values of all the items in a
series. It is for this reason that the median is called a ‘positional average’: it is the value of the middle
item irrespective of the other values.
Secrist has given the following definition: “Median of a series is the value of the item, actual or
estimated, when a series is arranged in order of magnitude, which divides the distribution into two parts”.
L.R. Connor defines the median as “that value of the variable which divides the group into two equal
parts, one part comprising all values greater, and the other, all values less than the median”.

Explore fantastic wrld of Knowledge, Enrich with Skills & Excel! Call: 99604 35695 SANDEEP BAPAT | lanhi
ckiV
Computation of Median:
The calculation of median involves two basic steps viz.
1) The location of the median item
2) The finding out of its value
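The two steps can be sketched in Python as follows. This is a minimal illustration, assuming the usual textbook conventions: for ungrouped data the median item is the (N + 1)/2-th item of the sorted array (the mean of the two middle items when N is even), and for a continuous frequency distribution the value is interpolated as Median = L + ((N/2 – c.f.) / f) × h, where L is the lower limit of the median class, c.f. the cumulative frequency of the classes before it, f its frequency and h its width. The function names and the (lower, upper, frequency) class layout are hypothetical choices for this sketch.

```python
from typing import List, Sequence, Tuple

def median_ungrouped(values: Sequence[float]) -> float:
    """Median of raw observations: the (N + 1)/2-th item of the sorted array."""
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        # Odd N: the (N + 1)/2-th item sits at 0-based index N // 2
        return data[mid]
    # Even N: take the mean of the two middle items
    return (data[mid - 1] + data[mid]) / 2

def median_grouped(classes: List[Tuple[float, float, int]]) -> float:
    """Median of a continuous series given (lower, upper, frequency) classes in ascending order."""
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("total frequency is zero")
    half = n / 2
    cum = 0  # cumulative frequency of the classes before the current one
    for lower, upper, f in classes:
        if cum + f >= half:
            # Median class found: interpolate, assuming items are spread uniformly in it
            return lower + (half - cum) / f * (upper - lower)
        cum += f
    raise ValueError("median class not found")
```

For example, `median_ungrouped([7, 3, 9, 5])` averages the two middle items 5 and 7. The grouped version relies on the uniform-spread assumption that is listed below as one of the demerits of the median.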

Merits of Median:
1. Median is easy to calculate and is readily understood. In some cases it can be located merely by
inspection.
2. It is not affected by the magnitude of extreme values.
3. It can be computed for distributions which have open-end classes. The value of the median never
falls into the open end interval and it can also be calculated without difficulty from grouped frequency
distributions with classes of unequal width.
4. The value of the median can be located graphically.
5. Median is the only average which can be used while dealing with qualitative data, where numerical
measurements are not available but it is possible to rank the objects in some order.
6. The median is centrally located.

Demerits of Median:
1. In the case of an even number of observations, median cannot be determined exactly. Also when several
items in the centre of the distribution are of the same size, the median may be somewhat indeterminate.
2. The median does not lend itself to algebraic treatment in so satisfactory a manner as the arithmetic
mean, the geometric mean or the harmonic mean do.
3. It is unsuitable if it is desired to give greater importance to large or small values.
4. It is not based on all the items of the series. This property is sometimes described by saying the
median is insensitive.
5. The computation of median from a frequency distribution is based on simple interpolation under the
assumption that the observations in the median class are uniformly distributed, which is often not the
case in reality.
6. As compared with the arithmetic average, it is affected more by fluctuations of sampling, and is
therefore less reliable.

Quartiles
The quartiles break the distribution into four equal parts. There are three quartiles. The second
quartile divides the distribution into two halves and is therefore the same as the median. The first
(lower) quartile (Q1) marks off the first one-fourth and the third (upper) quartile (Q3) marks off the first
three-fourths. Thus, the first quartile has 25% of the total number of observations in the series below it
and 75% of the total number above it. Likewise, the third quartile has 25% of the total number of
observations above it and 75% of the observations below it. Also,
Q1 ≤ Q2 ≤ Q3; Q2 = Median
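As a rough illustration, the sketch below locates Q1, Q2 and Q3 for ungrouped data using the common textbook convention that Qk is the k(N + 1)/4-th item of the sorted array, interpolating linearly when that position is fractional. Other conventions exist and give slightly different values for small samples; the function name is hypothetical.

```python
from typing import List, Sequence

def quartiles(values: Sequence[float]) -> List[float]:
    """Return [Q1, Q2, Q3] using the k(N + 1)/4-th item rule with linear interpolation."""
    data = sorted(values)
    n = len(data)
    if n < 3:
        raise ValueError("need at least 3 observations")
    result = []
    for k in (1, 2, 3):
        pos = k * (n + 1) / 4          # 1-based position of the k-th quartile
        pos = min(max(pos, 1), n)      # clamp to the observed range
        whole = int(pos)               # whole part of the position
        frac = pos - whole             # fractional part used for interpolation
        value = data[whole - 1]
        if frac and whole < n:
            # Move the fractional distance towards the next item
            value += frac * (data[whole] - data[whole - 1])
        result.append(value)
    return result
```

For the eight values 1 to 8 the positions are 2.25, 4.5 and 6.75, so the quartiles are 2.25, 4.5 and 6.75, and Q2 agrees with the median.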

2) Mode:
It is the value of a variable, measured or observed, which is its optimum representative, i.e. the value
that occurs most frequently in a distribution.

Another average which is conceptually very useful is the mode. In French, to be in the mode implies to
be in fashion. The mode is defined as the value that occurs most frequently in a statistical
distribution.
The mode refers to the value in the distribution which occurs most frequently. It is an actual value which
has the highest concentration of items in and around it. Actually the word ‘mode’ has been derived from
the French word ‘la mode’ which signifies fashion. Thus, mode is the value occurring most frequently in a
set of observations and around which other items of the set cluster most densely. According to Croxton
and Cowden, “The mode of a distribution is the value at the point around which the items tend to be
most heavily concentrated. It may be regarded as the most typical of a series of values”.
In the words of A.M. Tuttle, “Mode is the value which has the greatest frequency density in its
immediate neighborhood”.
Its importance is very great in marketing studies where a manager is interested in knowing about
the size which has the highest concentration of items. For example, in placing an order for shoes or
ready-made garments the modal size helps because this size and others around it are in common demand.

Computation of the Mode


Ungrouped Data:
For ungrouped data or in a series of individual observations, the mode is often found by mere
inspection. The data are placed in the form of an array so that items having the same value can be
identified and counted. The value which occurs most frequently is the modal value.
Remarks: Although the occurrence of more than one mode in a single distribution can be used in
further analysis, the mode as a measure of central tendency has little significance for a bimodal or
tri-modal distribution.
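The inspection procedure for ungrouped data can be mimicked by counting how often each value occurs. The sketch below uses the standard library's `collections.Counter` and returns every value that shares the highest count, so a bimodal or tri-modal series shows up as more than one mode (Python 3.8+ also offers `statistics.multimode` for the same purpose). The function name is hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Sequence

def modes(values: Sequence[Hashable]) -> List[Hashable]:
    """Return all values that occur with the highest frequency, in first-seen order."""
    if not values:
        raise ValueError("mode of empty data")
    counts = Counter(values)          # value -> number of occurrences
    top = max(counts.values())        # highest frequency
    return [v for v, c in counts.items() if c == top]
```

Because it only counts occurrences, it also works on labels such as shoe or garment sizes, matching the marketing use of the mode described above.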

Grouped Data:
In case of grouped data the computation of mode can be studied under the usual classification of
discrete and continuous series. In the latter, the precise value in a given class-interval has to be
interpolated by a formula as was done in the case of locating median.
In a unimodal distribution where the highest concentration is in a single discrete value there should
not be any difficulty in locating the modal value, or the class-interval containing this value, by inspection.
Some difficulty is experienced when nearly equal concentrations are found in two or more neighboring
values. There are two ways of dealing with this situation:
a) In a large majority of cases it is possible to choose one value by taking, for each of the competing
values, the total of three frequencies: that of the value itself and those of its two neighboring values. The
central value of the group which yields the higher total should be selected.
b) In some cases this process yields equal totals; then the method of grouping has to be resorted to.
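The interpolation formula for a continuous series is not spelled out here; a formula widely used in textbooks is Mode = L + (f1 – f0) / (2f1 – f0 – f2) × h, where L is the lower limit of the modal class, f1 its frequency, f0 and f2 the frequencies of the preceding and succeeding classes, and h the class width. The sketch below applies it under stated assumptions: the modal class is simply the first class with the highest frequency (no grouping check), neighbouring classes have equal widths, and the (lower, upper, frequency) layout and function name are hypothetical.

```python
from typing import List, Tuple

def mode_grouped(classes: List[Tuple[float, float, int]]) -> float:
    """Interpolated mode of a continuous series of (lower, upper, frequency) classes."""
    freqs = [f for _, _, f in classes]
    if not freqs or sum(freqs) == 0:
        raise ValueError("total frequency is zero")
    i = freqs.index(max(freqs))                      # modal class by inspection (first maximum)
    lower, upper, f1 = classes[i]
    f0 = freqs[i - 1] if i > 0 else 0                # frequency of the preceding class
    f2 = freqs[i + 1] if i + 1 < len(freqs) else 0   # frequency of the succeeding class
    # Since this is the first maximum, f1 > f0 and f1 >= f2, so the denominator is positive
    return lower + (f1 - f0) / (2 * f1 - f0 - f2) * (upper - lower)
```

When two classes have nearly equal highest frequencies, this sketch just takes the first one, which is exactly the situation where the grouping method above is the safer choice.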

Empirical Relationship between Mean, Median and Mode:


In a symmetrical distribution mean, median and mode are identical and have the same value.
However, in actual life most distributions are not symmetrical - they are skewed. These concepts of
symmetry and skewness would be discussed in chapter on skewness. In case of moderately skewed (or
moderately asymmetrical) distributions the value of mean, median and mode have the following
empirical relationship given by Karl Pearson:
i) Mean - Mode = 3 (Mean - Median)
Mean = Mode + 3/2 (Median - Mode) = ½ (3 Median - Mode)
ii) Mode = 3 Median - 2 Mean
iii) Median = Mode + 2/3 (Mean – Mode) = ⅓ (2 Mean + Mode)

Thus we see that the difference between mean and mode is three times the difference between
mean and median. In other words, the median lies between the mean and the mode, closer to the mean.
This relationship between Mean (X̄), Median (M) and Mode (Z) is shown in the following diagram:
[Figure: Relationship among Mean (X̄), Median (M) and Mode (Z)]

Mode (Z) touches the peak of the curve, indicating maximum frequency; Median (M) divides the area
of the curve into two equal halves; and Mean (X̄) is the centre of gravity. These relationships among the
values of mean, median and mode are of great value. In cases where the mode is ill-defined, or the series
is bimodal and the mode cannot be calculated by the formulae discussed earlier, this relationship can
give us an empirical value of the mode.
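The empirical relationship can be applied directly. The sketch below is only valid, approximately, for moderately skewed distributions; the function names and the figures in the example (mean 40, median 38) are hypothetical.

```python
def empirical_mode(mean: float, median: float) -> float:
    """Karl Pearson's empirical relation: Mode = 3 Median - 2 Mean."""
    return 3 * median - 2 * mean

def empirical_median(mean: float, mode: float) -> float:
    """Equivalent form: Median = Mode + 2/3 (Mean - Mode) = (2 Mean + Mode) / 3."""
    return (2 * mean + mode) / 3
```

With a mean of 40 and a median of 38, the empirical mode is 3 × 38 – 2 × 40 = 34, and Mean – Mode = 6 is three times Mean – Median = 2, as the relationship requires.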
Merits of Mode:
1. Mode is readily comprehensible, commonly understood and easily calculated. Like the median, the mode
can be located by mere inspection in some cases.
2. Mode is not at all affected by extreme observations.
3. Mode can be conveniently located even if the frequency distribution has class-intervals of unequal
magnitudes provided the modal class and the classes preceding and succeeding it are of the same
magnitude.
4. Open end-classes also do not pose any problem in the location of mode.
5. For the determination of mode it is not necessary to know the values of all the items of a series.
6. Since mode is the most common item of a series it is not an isolated example like the median. Unlike
arithmetic average it cannot be a value which is not found in the series.
7. Mode is affected by the values of extreme items provided they adhere to the natural law relating to
extremes.

Demerits of Mode:
Mode is an unsatisfactory average and has some demerits:
1. It is not based on all the observations of the series.
2. It is ill-defined, indeterminate and indefinite.
3. It is not capable of further mathematical treatment.
4. Mode may be unrepresentative in many cases.
5. In many cases it may be impossible to get a definite value of mode. There may be 2, 3 or more modal
values.
6. As compared with mean, mode is affected to a great extent, by sampling fluctuations.
7. Choice of grouping has considerable influence on the value of mode. It is, therefore, said that “the mode
is the most unstable average and its true value is difficult to determine”. Moreover, the value of the
mode is affected significantly by the size of the class intervals used in grouping data into a frequency
distribution.

3) Arithmetic Mean: The arithmetic mean is unquestionably the most widely used and the most
generally understood of all averages. For this reason, when the term ‘mean’ is used alone, it refers to the
arithmetic mean.

It is the value employed to represent all the values in a series, around which the other figures
congregate, and it is obtained by dividing the sum of all the values by the number of values.

Arithmetic Average or Mean


Arithmetic average or Mean of a series (usually denoted by X̄) is the value obtained by dividing the
sum of the values of the various items in a series (∑X) by the number of items (N) constituting the
series, i.e. X̄ = ∑X / N.
The symbol ∑ (the Greek capital letter sigma) denotes the sum of the N items. In normal use, only ∑X
is written, and the limits of summation (from the first item to the N-th) are omitted.
Arithmetic mean is very simple and therefore, commonly used in business and economics.
Wherever there is a mention of average income, profit, wage or output, the reference is to the arithmetic
mean unless there are some qualifying words suggesting some other type of average.
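A minimal sketch of the definition X̄ = ∑X / N; the function name is hypothetical.

```python
from typing import Sequence

def arithmetic_mean(values: Sequence[float]) -> float:
    """Simple A.M.: the sum of the items (ΣX) divided by the number of items (N)."""
    if not values:
        raise ValueError("mean of empty data")
    return sum(values) / len(values)
```

Only the total ∑X and the count N enter the result, which is why the mean can still be found when the individual items are unknown but their total and number are given.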

Merits of Arithmetic Average:


1. It is rigidly defined so that different interpretations by different persons are not possible.
2. It is easy to understand and easy to calculate.
3. Since it takes all values into consideration, it is considered to be more representative of the
distribution.
4. It is a widely used method because of its mathematical property. It is used in the computation of
various other statistical measures such as standard deviation, coefficient of skewness, etc.
5. It is possible to calculate the arithmetic average even if some of the details of the data are lacking,
provided the total of the values and the number of items are known.
6. It gives weight to all items in direct proportion to their size.
7. Least affected by fluctuations of sampling. Arithmetic averages of different samples will show less
variation than medians and modes of various samples. This advantage has led to the belief that
arithmetic average is a stable measure.

Demerits of Arithmetic Average:


1. It cannot be determined by inspection nor can it be located graphically.
2. It cannot be used in the study of qualitative phenomena which are not capable of numerical
measurements, e.g. intelligence, beauty, honesty etc.
3. The weakest point of arithmetic mean is that it is affected very much by extreme values. Thus, it is
desirable not to use arithmetic average when the distribution is unevenly spread, since the mean is then
not representative of the group.
4. It cannot be calculated if the distribution has open-ended classes. Moreover, if a single observation is
missing or lost or is illegible, mean cannot be calculated.
5. It may give a value which does not correspond with any of the individual items in the series and as
such it is called a fictitious average.
6. For an extremely asymmetrical (skewed) distribution, arithmetic mean is generally not a suitable
measure of location.
7. It may lead to fallacious conclusions if the details of the data, from which it is computed are not
given.
8. The arithmetic averages give greater importance to higher value of a series and lesser importance
to lower values. It has an upward bias. One big item among five items, four of which are small, will
push up the average considerably, but the reverse is not true. If in a series of five items there are four
big values and one small value, the average will not be pulled down very much.

Weighted Average Method:


For calculating simple arithmetic average, we presume that all values or the size of items in the
distribution have equal importance. But, in practice this may not be so. In case some items are more
important than others, simple average is not representative of the distribution. In such a case proper
weightage has to be given to the various items: the weights attached to each item being proportional to
the importance of the item in the distribution. For example to have an idea of the change in the cost of
living of a certain group of persons, the simple average of the prices of the commodities consumed by
them will not do since all the commodities are not equally important.
In the words of Biddington, “This principle of weighted average is important in all cases where
varying quantities are in evidence and it will be necessary, for instance, to use in a factory, where
average cost per unit of commodities manufactured is desired, also where the average output per
machine is required and the machines are of different patterns or are working under varying
conditions”.
Weighted average may be defined as the average whose component items are multiplied by
certain values known as ‘weights’, and the aggregate of the multiplied results is divided by the
total sum of their ‘weights’ instead of the number of items.
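The definition translates into ∑wX / ∑w. The sketch below is illustrative; the function name is hypothetical, and the prices and quantities in the example are made-up figures in the spirit of the cost-of-living illustration above.

```python
from typing import Sequence

def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted A.M.: Σ(wX) / Σw, i.e. divide by the total of the weights, not by N."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_w = sum(weights)
    if total_w == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_w
```

For prices 20, 50 and 10 with quantities consumed 5, 1 and 4 as weights, the weighted mean is 190 / 10 = 19, whereas the simple mean (about 26.67) overstates the importance of the rarely bought item priced at 50.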

Utility of the Weighted Mean:


Weighted arithmetic mean is used in:
a) Construction of index numbers
b) Comparison of results of two or more universities where number of students differ
c) Computation of standardized death and birth rates.
Selection of weights: An important point that arises in the calculation of a weighted mean is the
selection of weights. Weights could be either actual or arbitrary. If actual weights are available there is
no problem in calculating the weighted mean. If, however, weights are arbitrary, it becomes difficult to
determine them, and different persons may assign different weights to the various items. However, a
change in weights does not generally affect the average as much as a change in the value of an item. As
such, an error in weight is less serious than a corresponding error in the size of the item. It is for this
reason that King observed, “the item should be as exact as possible and the weights used should be
approximately accurate”.
Remarks: The weighted mean is appropriate when:
1) The importance of all the items in the series is not equal.
2) The classes of the same group contain widely varying frequencies.
3) It is desired to calculate the average of the series from the average of its component parts.
4) Ratios, percentages or rates are being averaged.
5) There is a change either in the proportion of values of items or in the proportion of their frequencies.

4) Geometric Mean: The geometric mean of ‘n’ numbers is defined as the “n th root of the product of ‘n’
numbers”. It is found out by multiplying all the values of a series and extracting nth root of the product.
The geometric mean of a series containing N observations is the Nth root of the product of the values. If
there are two items, the square root of the product of the two values is the geometric mean; if there are
three values, the cube root is the geometric mean, and so on.
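A minimal sketch of the N-th root definition. It works through logarithms, i.e. G.M. = antilog(∑log X / N), which gives the same value as the N-th root of the product but avoids overflow with long series. It deliberately rejects zero and negative values, for the reasons given under the demerits below. The function name is hypothetical; Python 3.8+ also ships `statistics.geometric_mean`.

```python
import math
from typing import Sequence

def geometric_mean(values: Sequence[float]) -> float:
    """N-th root of the product of N positive values, computed via logarithms."""
    if not values:
        raise ValueError("geometric mean of empty data")
    if any(v <= 0 for v in values):
        raise ValueError("all values must be positive")
    # log(G.M.) = (Σ log X) / N
    return math.exp(sum(math.log(v) for v in values) / len(values))
```

For average growth rates, apply it to growth factors rather than percentages: growth of 10% and then 20% gives factors 1.10 and 1.20, whose geometric mean is √1.32 ≈ 1.149, i.e. an average growth of about 14.9% per period.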

Merits of G.M.:
1. The geometric mean is rigidly defined and its value is precise.
2. It is based on all the observations in a series.
3. Geometric mean has a bias towards smaller values, unlike the arithmetic mean, which has a bias towards
higher values. The geometric mean is therefore appropriate for certain skewed or asymmetrical
distributions. It is particularly useful when a given phenomenon has a limit for lower values but no such
limit for upper values. Take the case of price, which cannot fall below zero but has no upper limit.
4. Unlike A.M., G.M. is not affected much by the presence of extremely small or extremely large
observations.
5. It is suitable for further mathematical treatment.
6. It is not much affected by the fluctuations of sampling. It gives comparatively more weight to small
items.

Demerits of G.M.:
1. It is neither easy to calculate nor easy to understand.
2. Like the arithmetic average, it may be a value which does not exist in the series.
3. If any value in the series is zero, the geometric mean would also be zero. Also, it may become an
imaginary value, and cannot be meaningfully computed, if any of the observations is negative.
4. It brings out the property of the ratio of change and not the absolute difference as in the case of
arithmetic mean.
5. The property of giving more weight to smaller items may in some cases prove to be a drawback of
the geometric mean.

Remarks:
1. G.M. is specially suitable in averaging rates, percentages and rates of increase between two periods,
i.e. it is the appropriate average to be used for computing the average rate of growth of population or the
average increase in the rate of sales, profits, production, gross national product, etc.
2. It is the most appropriate average to be used when it is desired to give more weightage to smaller
items and smaller weightage to larger items.
3. It is used in the construction of index numbers, e.g. Irving Fisher’s ideal index number.

5) Harmonic Mean: When an average rate, such as kilometres per hour or items manufactured per day, is
required to be found out, the harmonic mean is calculated. Thus, the harmonic mean is a type of
statistical average capable of application only within a restricted field.
(Note: Write the formula of each item along with its information)
Harmonic mean of a series is the reciprocal of the arithmetic average of the reciprocals of the values of
its various items, i.e. H.M. = N / ∑(1/X).
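A minimal sketch of H.M. = N / ∑(1/X), restricted to positive values; the function name is hypothetical, and Python 3.6+ also provides `statistics.harmonic_mean`.

```python
from typing import Sequence

def harmonic_mean(values: Sequence[float]) -> float:
    """H.M. = N / Σ(1/X): the reciprocal of the A.M. of the reciprocals."""
    if not values:
        raise ValueError("harmonic mean of empty data")
    if any(v <= 0 for v in values):
        raise ValueError("this sketch accepts positive values only")
    return len(values) / sum(1 / v for v in values)
```

As an illustration of an average rate, a vehicle covering equal distances at 30 km/h and 60 km/h averages 2 / (1/30 + 1/60) = 40 km/h overall, not the arithmetic mean of 45 km/h.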

Merits of H.M.:
1. It is rigidly defined and is based on all observation.
2. It is capable of further algebraic treatment.
3. Like A.M. and G.M. this average is also not affected very much by fluctuations of sampling.
4. It gives greater importance to small items and as such a single big figure cannot push its value up.
5. It measures relative changes and is extremely useful in averaging certain types of ratios and rates.

Demerits of H.M.:
1. It is not readily understood and is difficult to compute.
2. It is only a summary figure and may not be the actual item in the series.
3. Generally, it is not truly representative of the statistical series unless the phenomenon is such where
the small items have to be given a very high weightage.

Q. Write the methods of calculating dispersion.


A.
An average is a single value which represents a set of values in a distribution. It is a central value which
typically represents the entire distribution. Dispersion, on the other hand, indicates the extent to
which the individual values fall away from the average or the central value. This measure brings out how
two distributions with the same average value may have wide differences in the spread of individual
values around the central value.
As per L.R. Connor:
Dispersion is a measure of the extent to which the individual items vary.
The following are the methods of calculating the dispersion.
1) Range and Coefficient of Range
2) Quartile Deviation and Coefficient of Quartile Deviation
3) Mean Deviation and Coefficient of Mean Deviation (from Mean, Median, Mode)
4) Standard Deviation and Coefficient of Variation
(Note – Write the formula of each while showing the items of dispersion)

Q. State the objectives of calculating the dispersion (Purpose of measuring dispersion).


A. The following are the purposes of measuring dispersion:
1) To test the reliability of an average: The measure of variation is the only means to test the
representative character of an average. If the scatter is large, the average is less reliable. On the other
hand, if the scatter is small, the average is a typical value and more closely represents the individual values.
“When dispersion is small, the average is typical in that it closely represents the individual values, and
it is reliable in that it is a good estimate of the corresponding average in the population. On the other
hand, when the dispersion is large, the average is not so typical and, unless the sample is very large, the
average may be quite unreliable”.
2) To serve as a basis for control of variability: Measures of dispersion are indispensable to determine
the nature and find the cause of variation. When these are known, it is easy to control their variations.
In industry, efficient production requires control of quality variation, the causes of which
are sought through inspection and quality control programs.
3) To compare two or more series with regard to their variability: The degree of uniformity or the
consistency of data can be found out through the study of measures of dispersion. When comparing two
series as regards the reliability of their averages, due consideration may be given to dispersion, which
is a good basis for comparison.
4) To serve as a basis for further statistical analysis: Measures of dispersion are essential for
studying statistical tools such as correlation, regression, test of hypothesis, analysis of fluctuations,
cost control etc.
Introduction of Measures of Dispersion
“An average does not tell the full story. It is hardly fully representative of a mass unless we know the
manner in which the individual items scatter around it. A further description of the series is necessary
if we are to gauge how representative the average is.”

Definition of Dispersion:
Some important definitions of dispersion are given below:
1) “Dispersion is the degree of the scatter or variation of the variable about a central value.”
- Brooks and Dick

2) “Dispersion is the measure of the variations of the items.”


- A.L. Bowley

3) “Measures of variability are usually used to indicate how tightly bunched the sample values are around
the mean.”
-Dyckman and Thomas

4) “The degree to which numerical data tend to spread about an average value is called the variation or
dispersion of the data.”

Objectives of Measuring Dispersion:


1. To judge the reliability of measures of central tendency.
2. To compare two or more series with regard to their variability.
3. To control the variability itself.
4. To facilitate the use of other statistical measures.

Measures of Dispersion

1. Range:
The range which is the simplest of all the measures of dispersion is defined as the difference
between the two extreme observations, i.e., the greatest (maximum) and the smallest (minimum)
observations of the distribution. Thus: R = H – L
In case of a grouped frequency distribution (for discrete values) or a continuous frequency
distribution, range is defined as the difference between the upper limit of the highest class and the
lower limit of the lowest class.
Coefficient of Range:
Range as calculated above is an absolute measure of dispersion, which is unfit for purposes of
comparison if the distributions are in different units. The relative measure used instead is:
Coefficient of Range = (H – L) / (H + L)
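For a series of individual observations, both measures can be computed in a few lines. This sketch assumes raw values (not class limits), and the function name and sample figures are hypothetical.

```python
from typing import Sequence, Tuple

def range_and_coefficient(values: Sequence[float]) -> Tuple[float, float]:
    """Return (R, coefficient of range), where R = H - L and coeff = (H - L) / (H + L)."""
    if not values:
        raise ValueError("empty data")
    high, low = max(values), min(values)
    if high + low == 0:
        raise ValueError("coefficient undefined when H + L = 0")
    return high - low, (high - low) / (high + low)
```

For the values 12, 18, 25, 30 and 40, H = 40 and L = 12, so R = 28 and the coefficient of range is 28 / 52 ≈ 0.54. Only the two extreme items enter the result, which is the root of the demerits listed below.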

Merits of Range:
As has been pointed out a good measure of dispersion should be rigidly defined, easily calculated
readily understood, should be capable of further mathematical treatment and should not be much
affected by fluctuations of sampling. Of these, the only merits possessed by the range are that it is easily
calculated and readily understood.

Demerits of Range:
1) It is not based on all the observations of the series.
2) It is very much affected by fluctuations of sampling as its value varies widely from sample to sample.
3) It is not suitable for mathematical treatment.
4) It cannot be computed from frequency distributions with open-end classes.
5) Any addition or deletion of an item on either of the extremes changes the entire result but, if the
coefficient of range is taken, the effect is comparatively less.

Semi Inter – Quartile Range or Quartile Deviation


Quartile Deviation is a measure of dispersion based on the upper quartile (Q3) and the lower
quartile (Q1). It is also called the semi inter-quartile range because it is half of the difference
between the two quartiles, as shown below:

Quartile Deviation (Q.D.) = (Q3 – Q1) / 2

Q.D. as defined above is only an absolute measure of dispersion. For comparative studies of
variability of two distributions, the relative measure known as the Coefficient of Quartile Deviation is
used, symbolically expressed as:
Coefficient of Q.D. = (Q3 – Q1) / (Q3 + Q1)
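Once the two quartiles are known, both the absolute and the relative measure follow directly. A minimal sketch; the function name and the quartile values in the example are hypothetical.

```python
from typing import Tuple

def quartile_deviation(q1: float, q3: float) -> Tuple[float, float]:
    """Return (Q.D., coefficient of Q.D.) from the lower and upper quartiles."""
    if q3 < q1:
        raise ValueError("Q3 must not be smaller than Q1")
    if q3 + q1 == 0:
        raise ValueError("coefficient undefined when Q3 + Q1 = 0")
    qd = (q3 - q1) / 2               # semi inter-quartile range
    coeff = (q3 - q1) / (q3 + q1)    # relative measure for comparisons
    return qd, coeff
```

With Q1 = 20 and Q3 = 40, Q.D. = 10 and the coefficient is 20 / 60 ≈ 0.33.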

Merits of Quartile Deviation:


1. It is quite easy to understand and calculate.
2. In fact, Q.D. is the only measure of dispersion which can be obtained while dealing with a distribution
having open-end classes.
3. It is not affected at all by extreme observations and as such in distributions which are highly
asymmetrical or skewed, quartile deviation is a better measure of dispersion than those which take into
account the values of all the items of a distribution (like Mean Deviation or Standard Deviation).

Demerits of Quartile Deviation:


1. Q.D. is not based on all the observations as it ignores the first 25% and the last 25% of the items.
Thus it cannot be regarded as a reliable measure of variability.
2. It is not suitable for further mathematical treatment.
3. It is affected considerably by fluctuations of sampling.

Computation of Quartile Deviation:


Computation of Quartile Deviation is very easy as in its calculation we have only to find out values
of the upper and lower quartiles.
Mean Deviation
The range and quartile deviation are positional measures of dispersion and are based on the
position of certain items in a distribution. The mean deviation or average deviation is a measure of
dispersion that is based on all items.
The mean deviation is the arithmetic mean of the deviations of the individual values from the
average of the given data. The average which is frequently used in computing the mean deviation is
mean or median. Also, only the absolute values of the deviations are used.
“Average deviation is the average amount of scatter, of the items in a distribution from either the
mean or the median, ignoring the signs of the deviation. The average that is an arithmetic mean, which
accounts for the fact that this measure is often called the mean deviation”.
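The definition above, M.D. = Σ|X - A| / N with A the mean or the median, can be sketched directly. The helper name `mean_deviation` and the series (chosen to contain one extreme item) are hypothetical.

```python
from statistics import mean, median


def mean_deviation(data, center):
    """Arithmetic mean of the absolute deviations |X - center| (signs ignored)."""
    return sum(abs(x - center) for x in data) / len(data)


series = [3, 5, 7, 9, 26]  # hypothetical series with one extreme item
print(mean_deviation(series, mean(series)))    # about the mean (10): 6.4
print(mean_deviation(series, median(series)))  # about the median (7): 5.4
```

The value about the median is the smaller one: the sum of absolute deviations is least when they are taken from the median.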

Coefficient of Mean Deviation:


Mean deviation, whichever average it is calculated from, is an absolute measure. When it is divided
by the average used in calculating it, we get the coefficient of mean deviation, a relative measure of
dispersion suitable for comparing two or more series that are expressed in different units or are of
different orders of magnitude.
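A minimal sketch of the coefficient, assuming ungrouped data and a non-zero average; the function name, its `about` parameter and the sample series are hypothetical.

```python
from statistics import mean, median


def coefficient_of_mean_deviation(data, about="median"):
    """M.D. divided by the average (median or mean) it was measured from."""
    center = median(data) if about == "median" else mean(data)
    if center == 0:
        raise ValueError("coefficient is undefined when the average is zero")
    md = sum(abs(x - center) for x in data) / len(data)  # absolute measure
    return md / center  # relative measure, a pure number


series = [3, 5, 7, 9, 26]  # hypothetical series
print(round(coefficient_of_mean_deviation(series), 4))                # 0.7714
print(round(coefficient_of_mean_deviation(series, about="mean"), 4))  # 0.64
```

Because the result is a ratio of two quantities in the same units, the units cancel, which is what makes the coefficient usable across series measured differently.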

Merits of Mean Deviation:


1. As compared to other calculated measures of dispersion, mean deviation is easy to understand.
2. It is based upon all the items of the series.
3. Strictly speaking, it is mean deviation that is an average of the second order, because it averages the
differences of all the items of the series from an average of those items.
4. It is less affected by extreme items as compared with S.D.

Demerits of Mean Deviation:


1. The mean deviation is the arithmetic mean of the absolute values of the deviations; it ignores their
positive and negative signs.
2. It is not amenable to further algebraic treatment.
3. Mean deviation gives the best results only when the deviations are taken from the median.
4. It cannot be computed for distributions with open-end classes.

Standard Deviation
The concept of standard deviation was first suggested by Karl Pearson in 1893. It may be defined
as the positive square root of the arithmetic mean of the squares of the deviations of the given
observations from their arithmetic mean. In short, S.D. may be defined as the “Root-Mean-Square
Deviation from the Mean”. It is usually denoted by the Greek letter σ (sigma). Although the sum of
absolute deviations is least when taken from the median, the sum of the squares of deviations is least
when taken from the arithmetic mean, and this makes the squared-deviation approach mathematically sound.
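The definition σ = √(Σ(X - X̄)² / N) translates almost word for word into code. This sketch assumes, as the definition does, division by the number of observations N; Python's `statistics.pstdev` follows the same convention, whereas `statistics.stdev` divides by N - 1 (the sample estimate). The helper name and data are hypothetical.

```python
from math import sqrt
from statistics import pstdev


def standard_deviation(data):
    """Root-mean-square deviation from the arithmetic mean (divides by N)."""
    n = len(data)
    mean = sum(data) / n
    return sqrt(sum((x - mean) ** 2 for x in data) / n)


data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical series with mean 5
print(standard_deviation(data))  # 2.0
print(pstdev(data))              # 2.0, the standard-library equivalent
```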

Comparison between M. Deviation & S. Deviation:


1) Mean Deviation: Deviations are calculated from the mean, median or mode.
   Standard Deviation: Deviations are calculated from the arithmetic mean only.
2) Mean Deviation: The algebraic signs are ignored; only the absolute values of the deviations are taken.
   Standard Deviation: Since the deviations are squared, the plus and minus signs need not be ignored.
3) Mean Deviation: It is the simple average of the absolute deviations.
   Standard Deviation: It is the square root of the average of the squared deviations.
4) Mean Deviation: It is simple to calculate when the mean is a round number; the short-cut method is
   somewhat cumbersome.
   Standard Deviation: It is somewhat more complex because the deviations are squared, but it is suitable
   in all cases, whether the mean is a round number or a fraction, since a short-cut method is also available.
5) Mean Deviation: It lacks mathematical properties, since only absolute values are considered.
   Standard Deviation: It is mathematically sound, because algebraic signs are not ignored.

Merits of Standard Deviation:


1. Standard deviation is by far the most important and widely used measure of dispersion. It is rigidly
defined and based on all the observations.
2. Squaring the deviations (X - X̄) removes the drawback of ignoring the signs of the deviations, as is
done in computing the mean deviation.
3. It is least affected by sampling fluctuations. If several independent samples are drawn from the
same population and all four measures of dispersion are calculated each time, S.D. will be found to
fluctuate the least from sample to sample. Therefore, it can be safely used for testing hypotheses and for
conducting tests of significance.
4. In a normal distribution, mean ± 1 S.D. covers 68.27% of the values, whereas mean ± Q.D. covers only
50% and mean ± M.D. about 57.5%. This is another reason why S.D. is called the ‘standard’ measure of
dispersion.
5. Standard deviation enables us to judge the reliability of the means of two or more different series
when those means are the same. In this case, the series with the lowest standard deviation has the most
representative mean. A small S.D. means greater compactness and less variability of the items from
the mean.
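The percentages in point 4 can be checked with the standard library's `statistics.NormalDist`, assuming an exactly normal population. For such a population Q.D. ≈ 0.6745σ (half the distance between the quartiles) and the mean deviation is σ√(2/π) ≈ 0.7979σ; the sketch computes the share of values within mean ± each measure. The helper name `coverage` is hypothetical.

```python
from math import pi, sqrt
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, sigma 1


def coverage(k):
    """Share of a normal distribution lying within mean ± k*sigma."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


qd = std_normal.inv_cdf(0.75)  # Q.D. = (Q3 - Q1)/2 in sigma units, about 0.6745
md = sqrt(2 / pi)              # mean deviation about the mean, about 0.7979

for name, k in [("Q.D.", qd), ("M.D.", md), ("S.D.", 1.0)]:
    print(f"mean ± {name}: {coverage(k):.2%}")
```

This prints approximately 50.00%, 57.51% and 68.27%.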

Demerits of Standard Deviation:


1. As compared to other measures of dispersion, it is neither easy to understand nor simple to calculate.
2. It is affected by extreme values. Because the deviations are squared, it gives greater weight to items
far from the arithmetic mean and less weight to items near the mean.
3. The chief limitation of S.D. is that it cannot be used for comparing the dispersion of two or more
series of observations given in different units.

Remarks:
Taking into consideration the pros and cons, and also the wide applications of standard deviation in
statistical theory (such as skewness, kurtosis, correlation and regression analysis, sampling theory
and tests of significance), we may regard standard deviation as the best and most powerful measure
of dispersion.

Coefficient of Standard Deviation:


Standard deviation is an absolute measure of dispersion, depending upon the units of
measurement. The relative measure of dispersion based on standard deviation is called the coefficient
of standard deviation and is given by:

$$\text{Coefficient of Standard Deviation} = \frac{SD}{\text{Mean}}$$

This is a pure number, independent of the units of measurement, and is thus suitable for comparing
the variability, homogeneity or uniformity of two or more distributions.
100 times the coefficient of dispersion based on standard deviation is called the coefficient of
variation, abbreviated as C.V.:

$$\text{C.V.} = \frac{SD}{\text{Mean}} \times 100$$

Remarks:
1. Coefficient of variation being a pure number is independent of the units of measurements and thus
is suitable for comparing the variability, homogeneity or uniformity of two or more distributions.
2. Again, where the means of distributions are widely different although given in the same unit,
coefficient of variation will be more logical than an absolute measure of dispersion.
3. According to Prof. Karl Pearson, who suggested this measure, ‘C.V. is the percentage variation in the
mean, the standard deviation being considered as the total variation in the mean.’
4. For comparing the variability, homogeneity, stability, uniformity and consistency of two series,
calculate their coefficients of variation. The series with the greater C.V. is said to be more variable than
the other, and the series with the smaller C.V. is said to be less variable, or more consistent, more
uniform, more stable or more homogeneous.
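The comparison rule in remark 4 can be sketched as follows, assuming the population S.D. (division by N) and a non-zero mean. Both series and the function name are hypothetical; they are chosen so that series B has the larger absolute S.D. yet the smaller C.V., which illustrates remark 2.

```python
from statistics import mean, pstdev


def coefficient_of_variation(data):
    """C.V. = SD / Mean x 100, using the population SD."""
    m = mean(data)
    if m == 0:
        raise ValueError("C.V. is undefined when the mean is zero")
    return 100 * pstdev(data) / m


series_a = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical: mean 5, SD 2
series_b = [45, 55, 45, 55]          # hypothetical: mean 50, SD 5
cv_a = coefficient_of_variation(series_a)
cv_b = coefficient_of_variation(series_b)
print(cv_a, cv_b)  # 40.0 10.0

# The series with the smaller C.V. is the more consistent one.
more_consistent = "B" if cv_b < cv_a else "A"
print("More consistent series:", more_consistent)
```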

S.D. and Normal Curve:


Many distributions are of the symmetrical, bell-shaped type. The frequency curves formed from such
distributions may be regarded as approximations to an important, well-known curve called the
‘normal curve’.
Standard deviation lays down the limits within which the individual observations in the
distribution vary from the mean. In other words, mean ± a multiple of the standard deviation indicates
the range within which a given percentage of the values is likely to fall: under the normal curve, nearly
68.27% of the values lie within mean ± 1 standard deviation, 95.45% within mean ± 2 standard
deviations and 99.73% within mean ± 3 standard deviations.
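These three percentages follow from the normal distribution alone, and they do not depend on the particular mean or S.D. A minimal sketch using `statistics.NormalDist`; the helper name `share_within` and the example mean of 170 with S.D. 10 are hypothetical.

```python
from statistics import NormalDist


def share_within(k, mu=0.0, sigma=1.0):
    """Proportion of a normal distribution lying within mu ± k*sigma."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


for k in (1, 2, 3):
    print(f"mean ± {k} S.D.: {share_within(k):.2%}")
# mean ± 1 S.D.: 68.27%
# mean ± 2 S.D.: 95.45%
# mean ± 3 S.D.: 99.73%

# Same share for any normal distribution, e.g. mean 170 and S.D. 10:
print(f"{share_within(2, mu=170, sigma=10):.2%}")  # 95.45%
```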

Skewness
Introduction:
A measure of central tendency gives us an estimate of the representative value of a series; a measure
of dispersion indicates the extent to which the items cluster around or scatter away from the central
value; and skewness is a measure of the extent of symmetry or asymmetry in a distribution. In other
words, skewness describes the shape of the distribution.
Definition:
Some definitions of skewness:
1) “When a series is not symmetrical it is said to be asymmetrical or skewed”.
- Croxton & Cowden
2) “Measures of skewness tell us the direction and the extent of skewness. In a symmetrical
distribution the mean, median and mode are identical. The more the mean moves away from the mode,
the larger the asymmetry or skewness”.
- Simpson & Kafka
3) “Skewness is lack of symmetry. When a frequency distribution is plotted on a chart, skewness
present in the items tends to be dispersed more on one side of the mean than on the other”.
- Riggleman & Frisbee
4) “A distribution is said to be skewed when the mean and the median fall at different points in the
distribution, and the balance (or centre of gravity) is shifted to one side or the other, to left or right”.
- Garrett

A unimodal distribution can be divided into three parts: the left tail, the middle part or hump, and
the right tail. In a symmetrical distribution the two tails are of equal length; in an asymmetrical
distribution one tail is longer than the other. If the left tail is longer than the right tail, the distribution
is said to be negatively skewed, and the mean occurs earlier than the mode (the peak value) as we move
along the X-axis.

If the right tail is longer, the skewness is said to be positive: as we move along the X-axis, the mode
comes first, then the median, and then the mean. Skewness is zero when the two tails are equal; the
mean, mode and median all coincide. By drawing the frequency polygon or histogram one can judge
whether the skewness is zero, positive or negative.
Tests of Skewness:
Skewness is present in a distribution if:
1. The values of mean, median and mode do not coincide.
2. When the values are plotted on a graph paper, they do not yield a normal bell-shaped curve, or
when divided vertically through the centre of the curve, the two halves are unequal.
3. Quartiles are not equidistant from the median, i.e., (Q3 - Md) is not equal to (Md - Q1).
4. The sum of the positive deviations from the median is not equal to the sum of the negative deviations.
5. Frequencies on either side of the mode are not equal.
Objectives of Skewness:
1. It helps in finding out the nature and degree of concentration, i.e., whether it is in the higher or the
lower values.
2. The empirical relations of mean, median and mode are based on a moderately skewed distribution.
The measure of skewness will reveal to what extent such empirical relationship holds good.
3. It helps in knowing whether the distribution is normal. Many statistical measures, such as the
standard error of the mean, are based on the assumption of a normal distribution.

Distinctions between Dispersion and Skewness:

1) Dispersion: It deals with the spread of individual values around a central value in a distribution.
   Skewness: It deals with the symmetry of the distribution of values on both sides of the central value.
2) Dispersion: It is the average of deviations around a central value and is thus a type of average.
   Skewness: It is not an average, although it is measured using various types of averages.
3) Dispersion: It is useful for finding the degree of variability, using absolute deviations.
   Skewness: It helps in finding whether the concentration is in the higher or the lower values.
4) Dispersion: It indicates how far the mean is representative of the individual values.
   Skewness: It helps in judging whether the distribution is normal.
5) Dispersion: It deals with variability in general.
   Skewness: It deals with the symmetry of the distribution on either side of the mode.
6) Dispersion: It refers to the spread of the frequency distribution in general.
   Skewness: It indicates how the dispersion on the two sides of the mode varies in the arrangement
   of frequencies.

Measure of Skewness:
To find the direction and extent of asymmetry in a series, statistical measures of skewness are
employed. These measures can be absolute or relative. The absolute measures of skewness tell us the
extent of asymmetry and whether it is positive or negative. Absolute skewness is based on the
difference between the mean and the mode. Symbolically,

$$\text{Absolute } J = \text{Mean} - \text{Mode}$$

If the mean is greater than the mode, skewness will be positive; if the mean is less than the mode,
skewness will be negative. The greater the amount of skewness, the wider the gap between the mean and
the mode, because extreme items influence the mean but not the mode. The difference between the mean
and the mode is taken as the measure of skewness because, in an asymmetrical distribution, the mean
moves away from the mode.
Thus, a difference between the mean and the mode, whether positive or negative, indicates that the
distribution is asymmetrical. However, such an absolute measure of skewness is not adequate, because it
cannot be used to compare the skewness of two distributions expressed in different units: the difference
between the mean and the mode is in the units of the distribution. For comparison purposes, we therefore
use relative measures of skewness, known as Coefficients of Skewness.
There are four types of relative measures of skewness:
1) Karl Pearson’s Coefficient of Skewness.
2) Bowley’s Coefficient of Skewness.
3) Kelley’s Coefficient of Skewness.
4) Measure of Skewness based on Moments and Kurtosis.

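[Flowchart: choosing and computing a coefficient of skewness]
1. Start. If Bowley’s formula is to be used, compute Q1, Q3 and the median (M) and apply
$J = \frac{Q_3 + Q_1 - 2M}{Q_3 - Q_1}$. Stop.
2. Otherwise, compute the mean (A) and the standard deviation (SD).
3. If the series is bimodal, locate the mode (Z) by the grouping method; otherwise compute the mode directly.
4. If the mode is ill-defined, compute the median (M) and apply $J = \frac{3(A - M)}{SD}$; otherwise apply
$J = \frac{A - Z}{SD}$. Stop.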

Karl Pearson’s Coefficient of Skewness:


Karl Pearson’s Coefficient of Skewness or Pearson’s Coefficient of Skewness is given by the formula:

$$J = \frac{\text{Mean} - \text{Mode}}{\text{Standard Deviation}}$$

If in a particular frequency distribution it is difficult to determine the mode precisely, or the mode
is ill-defined, the coefficient of skewness can be determined by the following formula:

$$J = \frac{3(\text{Mean} - \text{Median})}{\text{Standard Deviation}}$$

This is based on the empirical relationship between the different averages in a moderately
asymmetrical distribution. In such a distribution:

$$\text{Mode} = \text{Mean} - 3(\text{Mean} - \text{Median})$$
$$\text{Mean} - \text{Mode} = 3(\text{Mean} - \text{Median})$$
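Both forms of Pearson’s coefficient can be computed side by side for a small ungrouped series. This sketch assumes a single well-defined mode and uses the population S.D. (division by N), matching the root-mean-square definition of S.D.; the function name and the series are hypothetical.

```python
from statistics import mean, median, mode, pstdev


def pearson_skewness(data):
    """Return (J1, J2) = ((Mean - Mode)/SD, 3(Mean - Median)/SD)."""
    sd = pstdev(data)  # root-mean-square deviation from the mean
    j1 = (mean(data) - mode(data)) / sd
    j2 = 3 * (mean(data) - median(data)) / sd
    return j1, j2


series = [1, 2, 2, 2, 3, 4, 4, 6, 12]  # hypothetical: mean 4, median 3, mode 2
j1, j2 = pearson_skewness(series)
print(round(j1, 4), round(j2, 4))  # 0.6325 0.9487
```

Both values are positive, indicating positive skewness, but they differ because Mean - Mode = 3(Mean - Median) is only an approximation, and this small series is more than moderately skewed.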

Bowley’s Coefficients of Skewness:


Bowley’s coefficient of skewness, also known as the quartile coefficient of skewness, is especially
useful:
1) When the mode is ill-defined and extreme observations are present in the data.
2) When the distribution has open-end classes or unequal class intervals.
The quartile measure depends on the fact that, in a symmetrical distribution, Q3 and Q1 are
equidistant from the median, i.e., Q3 - Md = Md - Q1. When the distribution is asymmetrical, one quartile
will be farther from the median than the other. In such a case, skewness can be measured by the
following formula given by Bowley:

$$\text{Skewness} = (Q_3 - Md) - (Md - Q_1) = Q_3 + Q_1 - 2Md$$

If the first part is greater than the second, the skewness is positive; in the reverse situation it is
negative. To make the measure readily comparable, the coefficient of skewness is obtained by
dividing it by the quartile range, viz. Q3 - Q1:

$$J = \frac{(Q_3 - Md) - (Md - Q_1)}{(Q_3 - Md) + (Md - Q_1)} = \frac{Q_3 + Q_1 - 2Md}{Q_3 - Q_1}$$
The coefficient lies between -1 and +1, so it is easy to interpret. Its main drawback is that it is based
on the central 50% of the data and ignores the remaining 50% of the data towards the extremes.
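A minimal sketch of Bowley’s coefficient for ungrouped data, assuming the (n+1)-position quartile convention provided by `statistics.quantiles(..., method="exclusive")`; other quartile conventions can shift the result slightly. The function name and series are hypothetical.

```python
from statistics import quantiles


def bowley_skewness(data):
    """Quartile coefficient of skewness: (Q3 + Q1 - 2Md) / (Q3 - Q1)."""
    q1, md, q3 = quantiles(data, n=4, method="exclusive")
    if q3 == q1:
        raise ValueError("quartile range is zero; coefficient undefined")
    return (q3 + q1 - 2 * md) / (q3 - q1)


series = [1, 2, 2, 2, 3, 4, 4, 6, 12]  # hypothetical: Q1 = 2, Md = 3, Q3 = 5
print(bowley_skewness(series))  # 0.3333333333333333
```

Replacing the extreme item 12 with any value of 6 or more leaves Q3 at 5.0 and the result unchanged, illustrating that only the central half of the data matters.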
Remarks: It should be clearly noted that the values of the coefficients of skewness obtained by
Bowley’s formula and Pearson’s formula are not comparable, although in each case J = 0 implies the
absence of skewness, i.e., a symmetrical distribution. It may even happen that one of them gives
positive skewness while the other gives negative skewness.
Basic Definitions:
Analysis: An attempt to find a solution to a problem through a number of closely related operations,
performed with the purpose of summarizing the facts collected, taking necessary action and developing
awareness of the real importance of research investigations, logically and legibly. To get the optimum
yield, formulation of a systematic set of norms and hypotheses is necessary. Interpretation opens new
avenues for intellectual adventure and stimulates the quest for more knowledge.
– Sandeep Bapat
Time-series: The study or analysis of trends or tendencies, interpreted using the free-hand curve,
semi-average, moving-average or least-squares methods, to evaluate changes in an economic phenomenon
due to secular trend (long-term movements), the effect of seasonal or cyclic variations, regular short-term
oscillations, and irregular or random variations, in order to establish a system that leads towards
success. – Sandeep Bapat
Sample: A small group selected from the population or universe as its representative, having certain
attribute(s), in which each item has a fair chance of selection, and which is measurable, practical and
legible, with its items exhaustive and mutually exclusive. – Sandeep Bapat
Sampling: The study of the relationship between the population or universe and a small group selected
from it as a representative of the whole. Each item has certain attribute(s), and the selection is
arranged so that every item has a fair chance of being selected. Measurability, practicability with
economy, and accuracy set by limits of error help the researcher investigate and draw an inference by
testing a hypothesis with the help of random (probability) or non-random (non-probability) methods.
There is strength in numbers: the larger the sample size, the greater the stability (it neutralizes the
chance of error and omission). – Sandeep Bapat
Probability: A process for measuring the possibility that a particular event will happen (occur, be true
or be fair) through a series of random experiments performed rationally, by establishing a relationship
(ratio) between the number of actual occurrences and the total number of possible occurrences. Along
with Bernoulli’s and Bayes’ theorems, various other theorems are also applied as appropriate.
Impossibility is denoted by zero (0) and absolute certainty by one (1); intermediate levels of certainty
lie between 0 and 1. Interpretation of inferences helps in taking timely action to maximize the yield and
neutralize the risk. – Sandeep Bapat
