Research 10112
Q. State the meaning of Central Tendency (Averages) and State the Qualities of an Ideal
Measure of Central Tendency.
A. Meaning:
Condensation of data is necessary in statistical analysis, because a large number of big figures
is not only confusing but also difficult to analyze. Therefore, in order to reduce the complexity of data
and to make them comparable, the various phenomena being compared must be
reduced to a single figure. The first of such measures, the measure of central tendency, is the typical value of the entire
group of data. It describes the characteristics of the entire mass of data, reduces its complexity
and makes comparison possible. According to Prof. R.A. Fisher, “The inherent inability of the human
mind to grasp in its entirety a large body of numerical data compels us to seek relatively few constants
that will adequately describe the data”. The human mind is
incapable of remembering the entire mass of unwieldy data. So a single figure, which must be a
representative number, is used to describe the series. It is generally called “a measure of central tendency or
the average”.
The word ‘average’, or the term ‘measure of central tendency’, has been defined by various authors.
Some of the definitions are given below:
Simpson and Kafka observe that “A measure of central tendency is a typical value around which other
figures congregate”.
“One of the most widely used sets of summary figures is known as measures of location, which are
often referred to as averages, central tendency or central location. The purpose of computing an
average value for a set of observations is to obtain a single value which is representative of all items
and which the mind can grasp simply and quickly. The single value is the point of location around
which the individual items cluster”.
- Lawrence J. Kaplan
Ya-Lun Chou states that “an average is a typical value in the sense that it is sometimes employed to
represent all the individual values in a series of a variable”.
Objectives of Averaging:
1. To arrive at a single value which is representative of the characteristics of the entire mass of data.
2. To facilitate comparison.
3. To trace precise relationship.
4. To know about the universe from the sample.
5. To help in decision making.
Q. State the various methods of calculating the measures of central tendency (Averages).
A. The following are the main types of averages:
1)Median:
It is the middlemost or central value in a set of observations, which exactly divides the arranged series
(in ascending or descending order) into two equal parts.
The median is another important and widely used statistical average. It has the connotation of the
‘Middle Most’ or ‘Most Central’ value of a set of numbers.
As the name itself suggests, the median is the value of the middle item of a series arranged in ascending
or descending order of magnitude.
Unlike the arithmetic average, the median does not take into account the values of all the items in a series. It
is for this reason that the median is called a ‘positional average’: it is the value of the middle item
irrespective of all other values.
Secrist has given the following definition: “Median of a series is the value of the item, actual or
estimated, when a series is arranged in order of magnitude, which divides the distribution into two parts”.
L.R. Connor defines the median as “that value of the variable which divides the group into two equal
parts, one part comprising all values greater, and the other, all values less than median”.
Computation of Median:
The calculation of median involves two basic steps viz.
1) The location of the middle item
2) The finding out of its value
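The two steps can be sketched in Python for an ungrouped series. This is only an illustration: the function name `median_of_series` is hypothetical, and the sketch assumes the common convention that the middle item is the (N + 1)/2-th item and that, when N is even, the two middle values are averaged.

```python
def median_of_series(values):
    """Return the median of an ungrouped series (illustrative sketch)."""
    ordered = sorted(values)              # arrange in ascending order
    n = len(ordered)
    if n == 0:
        raise ValueError("series is empty")
    # Step 1: locate the middle item, i.e. the (n + 1) / 2-th item (1-based).
    position = (n + 1) / 2
    # Step 2: find its value.
    if n % 2 == 1:
        return ordered[int(position) - 1]
    # Even n: the position falls between two items, so average them
    # (an assumed convention, not stated in these notes).
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2


print(median_of_series([7, 3, 9, 5, 11]))      # odd number of items
print(median_of_series([7, 3, 9, 5, 11, 13]))  # even number of items
```

For the first series the middle item is the 3rd of five, whose value is 7. For the second the position is 3.5, so the 3rd and 4th values are averaged.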
Merits of Median:
1. Median is easy to calculate and is readily understood. In some cases it can be located merely by
inspection.
2. It is not at all affected by extreme values.
3. It can be computed for distributions which have open-end classes. The value of the median never
falls into the open end interval and it can also be calculated without difficulty from grouped frequency
distributions with classes of unequal width.
4. The value of the median can be located graphically.
5. Median is the only average which can be used while dealing with qualitative data where numerical
measurements are not available, but where it is possible to rank the objects in some order.
6. The median is centrally located.
Demerits of Median:
1. In the case of an even number of observations, median cannot be determined exactly. Also when several
items in the centre of the distribution are of the same size, the median may be somewhat indeterminate.
2. The median does not lend itself to algebraic treatment in so satisfactory a manner as the arithmetic
mean, the geometric mean or the harmonic mean do.
3. It is unsuitable if it is desired to give greater importance to large or small values.
4. It is not based on all the items of the series. This property is sometimes described by saying the
median is insensitive.
5. The computation of median from a frequency distribution is based on simple interpolation under the
assumption that the observations in the median class are uniformly distributed, which is rarely the case in
reality.
6. As compared with arithmetic average, it is affected more by fluctuations of sampling, and is
therefore less reliable.
Quartiles
The quartiles break the distribution into four equal parts. There are three quartiles. The second
quartile divides the distribution into two halves and therefore it is same as the median. The first (lower)
quartile (Q1) marks off the first one-fourth, the third (upper) quartile (Q3) marks off the first three-fourths. Thus,
the first quartile has 25% of the total number of observations in the series below it and 75% of the total
number above it. Likewise the third quartile has 25% of the total number of observations above it and
75% of the observations below it. Also,
Q1 ≤ Q2 ≤ Q3; Q2 = Median
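As a rough illustration, the three quartiles of an ungrouped series can be obtained with Python's standard `statistics` module. The marks below are hypothetical data. `statistics.quantiles` with its default "exclusive" method places Q1 at the (N + 1)/4-th item, which is one of several conventions in use and is assumed here.

```python
import statistics

# Hypothetical marks of seven students (illustrative data only).
marks = [12, 15, 18, 20, 22, 25, 30]

# n=4 asks for the three cut points that split the series into four parts.
q1, q2, q3 = statistics.quantiles(marks, n=4)

print(q1, q2, q3)
print(q2 == statistics.median(marks))  # the second quartile is the median
```

With seven items, (N + 1)/4 = 2, so Q1 is the 2nd item, Q2 the 4th and Q3 the 6th of the ordered series.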
2)Mode:
It is the value of a variable, measured or observed, that is the optimum representative
(it occurs most frequently) in a distribution.
Another average which is conceptually very useful is the mode. In French, to be in the mode implies to
be in fashion. The mode is defined as the value that occurs most frequently in a statistical
distribution.
The mode refers to the value in a distribution which occurs most frequently. It is an actual value which has the
highest concentration of items in and around it. Actually the word ‘mode’ has been derived from the
French word ‘la mode’ which signifies fashion. Thus, mode is the value occurring most frequently in a
set of observations and around which other items of the set cluster most densely. According to Croxton
and Cowden, “The mode of a distribution is the value at the point around which the items tend to be
most heavily concentrated. It may be regarded as the most typical of a series of values”.
In the words of A.M. Tuttle, “Mode is the value which has the greatest frequency density in its
immediate neighborhood”.
Its importance is very great in marketing studies where a manager is interested in knowing about
the size which has the highest concentration of items. For example, in placing an order for shoes or
ready-made garments the modal size helps because this size and others around it are in common demand.
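The shoe-size example can be sketched in Python by counting how often each size occurs. The sales figures are hypothetical and serve only to show how the modal size is located in ungrouped data.

```python
from collections import Counter

# Hypothetical shoe sizes sold in a week (illustrative data only).
sizes_sold = [6, 7, 7, 8, 8, 8, 8, 9, 9, 10]

frequency = Counter(sizes_sold)
highest = max(frequency.values())

# Collect every size that attains the highest frequency; more than one
# entry would mean the series is bimodal or multimodal.
modal_sizes = sorted(size for size, count in frequency.items() if count == highest)

print(modal_sizes)
```

Here size 8 occurs four times, more often than any other size, so it is the modal size.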
Grouped Data:
In case of grouped data the computation of mode can be studied under the usual classification of
discrete and continuous series. In the latter, the precise value in a given class-interval has to be
interpolated by a formula as was done in the case of locating median.
In a unimodal distribution where the highest concentration is in a single discrete value, there should
not be any difficulty in locating the modal value, or the class-interval containing this value, by inspection.
Some difficulty is experienced when nearly equal concentrations are found in two or more neighboring
values. There are two ways of dealing with this situation:
a) In a large majority of cases it is possible to choose one value by taking, for each competing case, the total of
three frequencies: that of the value with the highest concentration and those of its two neighboring values. The
central value of the group which yields the higher total should be selected.
b) In some cases equal totals are yielded through this process, and the grouping method then has to be resorted to.
Thus we see that the difference between mean and mode is three times the difference between
mean and median, that is, a – z = 3 (a – M). In other words, the median is closer to the mean than the mode is. This relationship between
Mean (a), Median (M) and Mode (z) is shown in the following diagram:
Relationship among Mean (a), Median (M) and Mode (z):
Mode (z) touches the peak of the curve indicating maximum frequency, Median (M) divides the area
of the curve into two equal halves and Mean (a) is the centre of gravity. These relationships among the values
of mean, median and mode are of great value. In cases where the mode is ill defined or the series is
bimodal, the mode cannot be calculated by the formulae discussed earlier. This relationship can,
however, give us an empirical value of the mode.
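Rearranging the relationship Mean – Mode = 3 (Mean – Median) gives Mode = 3 Median – 2 Mean. A minimal Python sketch follows; the function name `empirical_mode` and the figures are hypothetical, and the relation holds only approximately, for moderately skewed distributions.

```python
def empirical_mode(mean, median):
    """Estimate the mode from Mean - Mode = 3 (Mean - Median)."""
    return 3 * median - 2 * mean


# Hypothetical figures for a moderately skewed series.
a, M = 52.0, 50.0
z = empirical_mode(a, M)

print(z)
print(a - z == 3 * (a - M))  # the gap to the mode is three times the gap to the median
```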
Merits of Mode:
1. Mode is readily comprehensible, commonly understood and easily calculated. Like median, mode
can be located by mere inspection in some cases.
2. Mode is not at all affected by extreme observations.
3. Mode can be conveniently located even if the frequency distribution has class-intervals of unequal
magnitudes provided the modal class and the classes preceding and succeeding it are of the same
magnitude.
4. Open end-classes also do not pose any problem in the location of mode.
5. For the determination of mode it is not necessary to know the values of all the items of a series.
6. Since mode is the most common item of a series it is not an isolated example like the median. Unlike
arithmetic average it cannot be a value which is not found in the series.
7. Mode is not affected by the values of extreme items provided they adhere to the natural law relating to
extremes.
Demerits of Mode:
Mode is an unsatisfactory average and has some demerits:
1. It is not based on all the observations of the series.
2. It is ill-defined, indeterminate and indefinite.
3. It is not capable of further mathematical treatment.
4. Mode may be unrepresentative in many cases.
5. In many cases it may be impossible to get a definite value of mode. There may be 2, 3 or more modal
values.
6. As compared with mean, mode is affected to a great extent, by sampling fluctuations.
7. Choice of grouping has considerable influence on the value of mode. It is, therefore, said that “the mode
is the most unstable average and its true value is difficult to determine”. Moreover, the value of the
mode is affected significantly by the size of the class intervals used in grouping data into a frequency
distribution.
3) Arithmetic Mean: The arithmetic mean is unquestionably the most widely used and the most
generally understood of all averages. For this reason, when the term ‘Mean’ is used alone, it is taken to
refer to the arithmetic mean.
It is a value employed to represent all the values in a series, around which other figures congregate, and
it is obtained by dividing the sum of all the values by the number of values.
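A minimal Python sketch of the arithmetic mean as the sum of the values divided by their number. The function name `arithmetic_mean` and the marks are hypothetical.

```python
def arithmetic_mean(values):
    """Sum of all the values divided by the number of values."""
    values = list(values)
    if not values:
        raise ValueError("series is empty")
    return sum(values) / len(values)


# Hypothetical marks of five students.
marks = [40, 50, 60, 70, 80]
print(arithmetic_mean(marks))  # 300 / 5
```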
4) Geometric Mean: The geometric mean of ‘n’ numbers is defined as the “n th root of the product of ‘n’
numbers”. It is found out by multiplying all the values of a series and extracting nth root of the product.
The geometric mean of a series containing N observations is the Nth root of the product of the values. If
there are two items, the square root of the product of the two values is the geometric mean; if there are
three values, the cube root is the geometric mean, and so on.
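The definition can be illustrated with a short Python sketch. The function name `geometric_mean` is hypothetical, and the sketch handles positive values only. It takes the Nth root through logarithms, which is algebraically the same as multiplying the values and extracting the root but avoids very large products.

```python
import math


def geometric_mean(values):
    """N-th root of the product of N positive values (illustrative sketch)."""
    values = list(values)
    if not values:
        raise ValueError("series is empty")
    if any(v <= 0 for v in values):
        raise ValueError("this sketch handles positive values only")
    # exp(mean of logs) equals the N-th root of the product.
    return math.exp(sum(math.log(v) for v in values) / len(values))


print(geometric_mean([4, 9]))     # square root of 36
print(geometric_mean([2, 4, 8]))  # cube root of 64
```

Because the computation uses floating-point logarithms, the printed values may differ from 6 and 4 in the last decimal place.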
Merits of G.M.:
1. The geometric mean is rigidly defined and its value is precise.
2. It is based on all the observations in a series.
3. The geometric mean has a bias for smaller values, unlike the arithmetic mean, which has a bias for higher
values. The geometric mean is therefore appropriate for certain skewed or asymmetrical distributions. It is
particularly useful when a given phenomenon has a limit for lower value but no such limit for upper
value. Take the case of price, which cannot fall below zero but has no upper limit.
4. Unlike A.M., G.M. is not affected much by the presence of extremely small or extremely large
observations.
5. It is suitable for further mathematical treatment.
6. It is not much affected by the fluctuations of sampling. It gives comparatively more weight to small
items.
Demerits of G.M.:
1. It is neither easy to calculate nor easy to understand.
2. Like arithmetic average it may be a value which does not exist in the series.
3. If any value in the series is zero, the geometric mean would also be zero. Also, it may become an imaginary
value if any one of the observations is negative.
4. It brings out the property of the ratio of change and not the absolute difference as in the case of
arithmetic mean.
5. The property of giving more weight to smaller items may in some cases prove to be a drawback of
the geometric mean.
Remarks:
1. G.M. is specially suitable for averaging ratios, percentages and rates of increase between two periods,
i.e. it is the appropriate average to be used for computing the average rate of growth of population or
average increase in the rate of sales, profits, production, gross national product, etc.
2. It is the most appropriate average to be used when it is desired to give more weightage to smaller items
and less weightage to larger items.
3. It is used in the construction of index numbers, e.g., Irving Fisher’s ideal index number.
5) Harmonic Mean: When an average rate, like kilometres per hour, items manufactured per day, etc., is
required to be found, the harmonic mean is calculated. Thus, the harmonic mean is a type of statistical
average capable of application only within a restricted field.
(Note: Write the formula of each item along with its information)
Harmonic mean of a series is the reciprocal of the arithmetic average of the reciprocals of the
values of its various items.
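A minimal Python sketch of this definition, applied to an average-speed problem. The function name `harmonic_mean` and the speeds are hypothetical, and the sketch assumes positive values and equal distances covered at each speed.

```python
def harmonic_mean(values):
    """Reciprocal of the arithmetic average of the reciprocals."""
    values = list(values)
    if not values:
        raise ValueError("series is empty")
    if any(v <= 0 for v in values):
        raise ValueError("this sketch handles positive values only")
    mean_of_reciprocals = sum(1 / v for v in values) / len(values)
    return 1 / mean_of_reciprocals


# Hypothetical trip: equal distances covered at 40 km/h and at 60 km/h.
print(harmonic_mean([40, 60]))
```

The result is about 48 km/h, lower than the arithmetic mean of 50 km/h, because more time is spent travelling at the slower speed.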
Merits of H.M.:
1. It is rigidly defined and is based on all observations.
2. It is capable of further algebraic treatment.
3. Like A.M. and G.M. this average is also not affected very much by fluctuations of sampling.
4. It gives greater importance to small items and as such a single big figure cannot push its value up.
5. It measures relative changes and is extremely useful in averaging certain types of ratios and rates.
Demerits of H.M.:
1. It is not readily understood and is difficult to compute.
2. It is only a summary figure and may not be the actual item in the series.
3. Generally, it is not truly representative of the statistical series unless the phenomenon is such where
the small items have to be given a very high weightage.
Definition of Dispersion:
Some important definitions of dispersion are given below:
1) “Dispersion is the degree of the scatter or variation of the variable about a central value.”
- Brooks and Dick
3) “Measures of variability are usually used to indicate how tightly bunched the sample values are around
the mean.”
-Dyckman and Thomas
4) “The degree to which numerical data tend to spread about an average value is called the variation or
dispersion of the data.”
Measures of Dispersion
1. Range:
The range which is the simplest of all the measures of dispersion is defined as the difference
between the two extreme observations, i.e., the greatest (maximum) and the smallest (minimum)
observations of the distribution. Thus: R = H – L
In case of the grouped frequency distribution (for discrete value) or the continuous frequency
distribution, range is defined as the difference between the upper limit of the largest class and the lower
limit of the smallest class.
Coefficients of Range:
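Range as calculated above is an absolute measure of dispersion which is unfit for purposes of
comparison if the distributions are in different units. For comparison, the relative measure known as the
coefficient of range is used:
Coefficient of Range = (H – L) / (H + L)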
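The range R = H – L and the coefficient of range (H – L) / (H + L) can be sketched in Python as follows. The function name `range_and_coefficient` and the wage figures are hypothetical.

```python
def range_and_coefficient(values):
    """Return (range, coefficient of range) for an ungrouped series."""
    highest = max(values)   # H
    lowest = min(values)    # L
    if highest + lowest == 0:
        raise ValueError("coefficient of range is undefined when H + L = 0")
    value_range = highest - lowest
    coefficient = value_range / (highest + lowest)
    return value_range, coefficient


# Hypothetical daily wages (illustrative data only).
daily_wages = [20, 35, 50, 65, 80]
r, coeff = range_and_coefficient(daily_wages)
print(r, coeff)
```

The range carries the unit of the data, while the coefficient is a pure number and can be compared across series measured in different units.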
Merits of Range:
As has been pointed out, a good measure of dispersion should be rigidly defined, easily calculated,
readily understood, capable of further mathematical treatment and not much
affected by fluctuations of sampling. Out of these, the only merit possessed by range is that it is easily
calculated and readily understood.
Demerits of Range:
1) It is not based on all the observations of the series.
2) It is very much affected by fluctuations of sampling as its value varies widely from sample to sample.
3) It is not suitable for mathematical treatment.
4) It cannot be computed from frequency distributions with open-end classes.
5) Any addition or deletion of an item on either of the extremes changes the entire result but, if the
coefficient of range is taken, the effect is comparatively less.
Quartile Deviation (Q.D.) = (Q3 – Q1) / 2
Q.D. as defined above is only an absolute measure of dispersion. For comparative studies of the
variability of two distributions, a relative measure known as the Coefficient of Quartile Deviation is used,
symbolically expressed as:
Coefficient of Q.D. = (Q3 – Q1) / (Q3 + Q1)
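Both measures can be sketched in Python. The coefficient is taken here as (Q3 – Q1) / (Q3 + Q1), built on the same difference that appears in Q.D. The function names and the quartile values are hypothetical.

```python
def quartile_deviation(q1, q3):
    """Half the distance between the third and first quartiles."""
    return (q3 - q1) / 2


def coefficient_of_qd(q1, q3):
    """Relative measure: (Q3 - Q1) / (Q3 + Q1)."""
    if q3 + q1 == 0:
        raise ValueError("coefficient is undefined when Q3 + Q1 = 0")
    return (q3 - q1) / (q3 + q1)


# Hypothetical quartiles of a series.
q1, q3 = 15, 25
print(quartile_deviation(q1, q3))
print(coefficient_of_qd(q1, q3))
```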
Standard Deviation
The concept of standard deviation was first suggested by Karl Pearson in 1893. It may be defined
as the positive square root of the arithmetic mean of the squares of deviations of the given observations from
their arithmetic mean. In short, S.D. may be defined as “Root-Mean Square Deviation from Mean”. It is
usually denoted by the Greek letter σ (sigma). Although the sum of absolute deviations is minimum when taken from the
median, the sum of the squares of deviations is minimum when taken from the arithmetic average, which makes
the standard deviation mathematically sound.
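The "root-mean-square deviation from mean" definition translates directly into a short Python sketch. The function name `standard_deviation` and the data are hypothetical. The divisor is N, the number of observations, as in the definition above; a sample estimate dividing by N – 1 would be a different measure.

```python
import math


def standard_deviation(values):
    """Positive square root of the mean of squared deviations from the mean."""
    values = list(values)
    n = len(values)
    if n == 0:
        raise ValueError("series is empty")
    mean = sum(values) / n
    mean_square_deviation = sum((x - mean) ** 2 for x in values) / n
    return math.sqrt(mean_square_deviation)


# Hypothetical series: mean is 5, squared deviations sum to 32.
print(standard_deviation([2, 4, 4, 4, 5, 5, 7, 9]))
```

Here the mean of the squared deviations is 32 / 8 = 4, so σ is 2.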
Remarks:
Taking into consideration the pros and cons and also the wide applications of standard deviation in
statistical theory, such as in skewness, kurtosis, correlation and regression analysis, sampling theory
and tests of significance, we may regard standard deviation as the best and the most powerful measure
of dispersion.
Coefficient of Standard Deviation = SD / Mean
This is a pure number, independent of the units of measurement, and is thus suitable for comparing the
variability, homogeneity or uniformity of two or more distributions.
100 times the coefficient of dispersion based on standard deviations is called the coefficient of variation
abbreviated as C.V.
C.V. = (SD / Mean) × 100
Remarks:
1. Coefficient of variation being a pure number is independent of the units of measurements and thus
is suitable for comparing the variability, homogeneity or uniformity of two or more distributions.
2. Again, where the means of distributions are widely different although given in the same unit,
coefficient of variation will be more logical than an absolute measure of dispersion.
3. According to Prof. Karl Pearson, who suggested this measure, ‘C.V. is the percentage variation in mean,
the standard deviation being considered as total variation in the mean’.
4. For comparing the variability, homogeneity, stability, uniformity and consistency of two series,
calculate the coefficients of variation. The series having the greater C.V. is said to be more variable than
the other, and the series having the lesser C.V. is said to be less variable or more consistent, more uniform,
more stable or more homogeneous.
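The comparison of two series by C.V. = (SD / Mean) × 100 can be sketched with Python's standard `statistics` module. The two series of scores are hypothetical, and `statistics.pstdev` is used because it divides by N, matching the root-mean-square definition of S.D.

```python
import statistics


def coefficient_of_variation(values):
    """C.V. = (SD / Mean) * 100, using the population standard deviation."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("C.V. is undefined when the mean is zero")
    return statistics.pstdev(values) / mean * 100


# Hypothetical scores of two batsmen with the same mean of 50.
batsman_a = [48, 50, 52, 50, 50]
batsman_b = [10, 90, 50, 20, 80]

cv_a = coefficient_of_variation(batsman_a)
cv_b = coefficient_of_variation(batsman_b)

# The series with the smaller C.V. is the more consistent one.
more_consistent = "A" if cv_a < cv_b else "B"
print(round(cv_a, 2), round(cv_b, 2), more_consistent)
```

Both series have the same mean, yet the second is far more scattered, so its C.V. is much larger and the first batsman is judged the more consistent.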
Skewness
Introduction:
Measure of central tendency gives us an estimate of representative value of a series, the measure of
dispersion gives an indication of the extent to which the items cluster around or scatter away from the
central value and the skewness is a measure that refers to the extent of symmetry or asymmetry in a
distribution. In other words, it describes the shape of the distribution.
Definition:
Some definitions of skewness:
1) “When a series is not symmetrical it is said to be asymmetrical or skewed”.
- Croxton & Cowden
2) “Measures of skewness tell us the direction and the extent of skewness. In symmetrical
distribution the mean, median and mode are identical. The more the mean moves away from the mode, the larger the
asymmetry or skewness”.
- Simpson and Kafka
3) “Skewness is lack of symmetry. When a frequency distribution is plotted on a chart, skewness
present in the items tends to be dispersed more on one side of the mean than on the other”.
- Riggleman & Frisbee
4) “A distribution is said to be skewed when the mean and the median fall at different points in the
distribution, and balance (or centre of gravity) is shifted to one side or the other - to left or right”.
- Garrett
A unimodal distribution can be divided into three parts: the left tail, the middle part or the hump and
the right tail. In the case of a symmetrical distribution the two tails are of equal length; in the case of an
asymmetrical distribution one tail is longer than the other. If the left tail is longer than the right tail, the
distribution is said to be negatively skewed and the mean occurs earlier than the mode (the peak value).
If the right tail is longer, the skewness is said to be positive and the mode comes first and the median
afterwards, as we move along the X-axis. Skewness is zero when the two tails are equal; the mean,
mode and the median all coincide. By drawing the frequency polygon or histogram one can judge
whether the skewness is zero, positive or negative.
Tests of Skewness:
Skewness is present in a distribution if:
1. The values of mean, median and mode do not coincide.
2. When the values are plotted on a graph paper, they do not yield a normal bell-shaped curve, or
when divided vertically through the centre of the curve, the two halves are unequal.
3. Quartiles are not equidistant from the median, i.e., (Q3 – Md) is not equal to (Md – Q1).
4. The sum of the positive deviations from the median is not equal to the sum of the negative deviations.
5. Frequencies on either side of the mode are not equal.
Objectives of Skewness:
1. It helps in finding out the nature and the degree of concentration whether it is in higher or the lower
values.
2. The empirical relations of mean, median and mode are based on a moderately skewed distribution.
The measure of skewness will reveal to what extent such empirical relationship holds good.
3. It helps in knowing if the distribution is normal. Many statistical measures, such as the error of the
mean, are based on the assumptions of a normal distribution.
Measure of Skewness:
To find out the direction and extent of asymmetry in a series, statistical measures of skewness are
employed. These measures can be absolute or relative. The absolute measures of skewness tell us the
extent of asymmetry and whether it is positive or negative. Thus, absolute skewness is based on the
difference between mean and mode. Symbolically,
Absolute J = Mean – Mode
If the value of the mean is greater than the mode, skewness will be positive. If the value of the mean
is less than the mode, skewness will be negative. The greater is the amount of skewness, the more is the
gap between mean and mode because of the influence of extreme items on the mean and not on the
mode. The reason why the difference between mean and mode is taken for the measure of skewness is
that in an asymmetrical distribution mean moves away from the mode.
Thus, the difference between the mean and the mode, whether positive or negative, indicates that
the distribution is asymmetrical. However, such an absolute measure of skewness is not adequate
because it cannot be used for comparison of skewness in two distributions if they are in different units,
since difference between the mean and mode will be in the terms of the units of distribution. Thus, for
the comparison purposes we use the relative measures of skewness known as Coefficients of
Skewness.
There are four types of relative measures of skewness:
1) Karl Pearson’s Coefficient of Skewness.
2) Bowley’s Coefficient of Skewness.
3) Kelly’s Coefficient of Skewness.
4) Measure of Skewness based on Moments and Kurtosis.
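Flowchart for computing the coefficient of skewness (J), with Mean (a), Median (M) and Mode (z):
1. Start.
2. Is Bowley’s formula to be used? If yes, compute Q1, Q3 and the Median, apply J = (Q3 + Q1 – 2M) / (Q3 – Q1), and stop.
3. If no, compute the Mean and compute the SD.
4. Is the series bimodal? If yes, use the grouping method; if no, compute the Mode.
5. Is the Mode ill defined? If yes, compute the Median and apply J = 3(a – M) / SD. If no, apply J = (a – z) / SD.
6. Stop.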
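Bowley's quartile-based formula from the flowchart, J = (Q3 + Q1 – 2M) / (Q3 – Q1), can be sketched in Python. The function name `bowley_skewness` and the quartile values are hypothetical.

```python
def bowley_skewness(q1, median, q3):
    """Bowley's coefficient: (Q3 + Q1 - 2 * Median) / (Q3 - Q1)."""
    if q3 == q1:
        raise ValueError("coefficient is undefined when Q3 equals Q1")
    return (q3 + q1 - 2 * median) / (q3 - q1)


# Hypothetical quartiles: the median lies closer to Q1 than to Q3.
print(bowley_skewness(15, 18, 25))
# Quartiles equidistant from the median give zero skewness.
print(bowley_skewness(15, 20, 25))
```

The first result is positive because Q3 lies farther from the median than Q1 does, which indicates a longer right tail.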
Karl Pearson’s Coefficient of Skewness:
J = (Mean – Mode) / Standard Deviation
If in a particular frequency distribution, it is difficult to determine precisely the mode, or the mode
is ill defined, the coefficient of skewness can be determined by the following formula:
J = 3 (Mean – Median) / Standard Deviation
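The two forms of Karl Pearson's coefficient can be sketched in one Python function that falls back to the median-based form when no mode is supplied. The function name `pearson_skewness`, its parameters and the figures are hypothetical.

```python
def pearson_skewness(mean, sd, mode=None, median=None):
    """Karl Pearson's coefficient of skewness (illustrative sketch).

    Uses (Mean - Mode) / SD when the mode is given; otherwise uses
    3 * (Mean - Median) / SD, for cases where the mode is ill defined.
    """
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    if mode is not None:
        return (mean - mode) / sd
    if median is not None:
        return 3 * (mean - median) / sd
    raise ValueError("supply either the mode or the median")


# Hypothetical figures for a moderately skewed series.
print(pearson_skewness(mean=52, sd=12, mode=46))
print(pearson_skewness(mean=52, sd=12, median=50))
```

The two calls agree here because the hypothetical figures were chosen to satisfy Mean – Mode = 3 (Mean – Median) exactly; with real data the two forms usually differ slightly.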