Foundation Notes 2013


Foundation Notes

Arranging Data
In this lesson we will get familiar with data and its various types. We will also discuss the methods of
data collection. Then we will focus on data presentation tools such as tables and graphs (line
chart, bar chart, pie diagram, pictogram and scatter diagram).
We will also get familiar with the frequency distribution and the frequency polygon, and study
the properties (skewness and kurtosis) of the frequency distribution curve.
What is Data?
Data is a collection of related observations, facts or figures. A collection of data is called a data set, and
each observation a data point.
Example: Marks obtained by students in Introduction to
Quantitative Methods course
Types of Data
Raw Data: Information before its systematic arrangement and analysis is called raw data. Useful
inferences can be derived from the raw data by applying various statistical methods.
Example: Sales data of a company for a year
Data can be classified as:
Published Data: Data that is already collected and published.
Example: RBI Bulletins, CMIE Reports
Unpublished Data: Data that is yet to be collected or printed.
Example: Data collected by a shopkeeper regarding customer satisfaction and not published
Primary Data: First-hand data collected by way of a sample survey or a census.
Example: Observation, personal interview or questionnaires
Secondary Data: Data collected from other available sources (collected by others).
Example: Company Annual Reports, information from the Internet
Apart from this, data can also be classified along some characteristics of data like age, gender,
education, income, etc.
Some common methods of classification are

Geographical, i.e. area-wise or region-wise

Chronological, i.e. yearly, quarterly, monthly or weekly data

Qualitative, i.e., depending on characteristics

By magnitude

Methods of Data Collection


Complete Enumeration (Census Survey or Census): A method in which the entire population is
taken up and information is collected on all the units of the population.
Example: Census conducted by Government of India every
10 years
This method gives accurate information, but it requires more resources (time, money and people).
Sample Method: A method in which only a part of the population or universe is enumerated
and information is gathered regarding the selected part.
Example: Checking only a few units from a production batch
The choice between the two methods of data collection depends on factors such as the purpose of the
enquiry, the time available for making a decision, the budget allocation, and the accuracy of data required for
decision making.
Tables as Data Presentation Device
Tabular presentation is used to summarize or condense data. Tables help the managers to analyze the
relationships and trends in the collected data.
Tabulation is the logical listing of related quantitative data in vertical columns and horizontal rows with
sufficient explanatory and qualifying words, phrases and statements in the form of titles, headings and
explanatory notes to make clear the full meaning, context and origin of the data.
Line Chart

In graphical presentation, the collected data is represented by various types of geometrical devices such
as points, lines, bars, multi-dimensional figures, pictorials, etc. A graph is a visual rather than a numerical
form of presentation, although the quantities are usually indicated alongside it. The magnitude of the data is
depicted visually through the proportional size of the diagram or graph.
A line chart is one of the most effective graphical methods for depicting the trend in data. If the line rises from
left to right, the data shows an increasing trend; if it falls, the data shows a decreasing trend.
Bar Chart

Bar charts use rectangles, referred to as bars, to present the data. There are two types of bar
charts: vertical and horizontal. These diagrams are one-dimensional, as the magnitude of the data is
represented by the length of the bar; the thickness or width of the bar has no relevance. The bars should be
arranged from left to right.
The given bar diagram shows the yearly sales of a company.
Multiple bar diagrams, or compound bar diagrams, are used to compare two or more sets of related data.
Such a diagram is similar to the simple bar diagram, but the bars in each set are placed together and a gap is left
between each set of bars.
The given multiple bar diagram shows the yearly export and import values of a company.
Pie Diagram

A pie diagram is a circle divided into segments, where each segment represents the percentage
contribution of a component to the total. Pie diagrams are used to compare many components
simultaneously.
For drawing a pie diagram it is necessary to express the value of each category as a percentage of the
total. The 360° of a circle represent the whole (i.e., 100%), so 3.6° constitute 1% of the total.
Degrees for each part = (Part / Total) × 360 = Percentage share of the part × 3.6
The pie diagram represents the share holding pattern of a company.
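The angle rule for a pie diagram can be sketched in a few lines of Python. The shareholding figures and the function name `pie_angles` are hypothetical and serve only to illustrate the calculation.

```python
def pie_angles(parts):
    """Return the central angle (in degrees) of each component of a pie diagram."""
    total = sum(parts.values())
    # Degrees for each part = (Part / Total) x 360
    return {name: value * 360 / total for name, value in parts.items()}

# Hypothetical shareholding pattern, in percent of shares held.
holding = {"Promoters": 50, "Institutions": 30, "Public": 20}
angles = pie_angles(holding)

for name, angle in angles.items():
    print(f"{name}: {angle:.1f} degrees")
```

Because the parts here are already percentages that add up to 100, each angle is simply the percentage multiplied by 3.6.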
Pictogram
Pictograms represent the data in the form of pictures. The data is presented using appropriate pictures
and their sizes indicate the magnitude of the data.
Scatter Diagram
A scatter diagram is used to study the correlation between two variables. The scatter diagram
is drawn by plotting the paired observations as points against the X and Y axes. When the points on the graph follow a pattern, it
indicates high correlation; an irregular pattern indicates low correlation.
Frequency Distribution
A frequency distribution table is a table in which raw data is tabulated by dividing it into classes of convenient size and counting the
number of data elements (or their fraction of the total) falling within each pair of class boundaries.
Classes are groups of values that share a characteristic of the data, e.g. the employees of a company
grouped on the basis of their ages.
The end values of a class are called its class limits, and the middle of a class interval is called the class
mark. For the class 25-29, 25 and 29 are the class limits, 27 is the class mark and
30 - 25 = 5 is the class interval (the next class starts at 30).
A cumulative frequency distribution is a tabular display of data showing how many observations lie
above, or below, certain values.
Construction of Frequency Distribution
To construct a frequency distribution, the data is divided into groups of similar intervals. Then the
number of data points that fall into each group is recorded against that group.
Frequency distributions can also be constructed with classes of qualitative attributes. The classification can be
either quantitative or qualitative, and the classes can be either discrete or continuous.
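The construction steps can be sketched as follows. This is a minimal illustration that assumes integer data and classes with inclusive limits (such as 25-29); the ages and the function name `frequency_table` are hypothetical.

```python
def frequency_table(values, lower, width, n_classes):
    """Count the values falling in each class with inclusive integer limits."""
    rows = []
    for k in range(n_classes):
        lo = lower + k * width          # lower class limit
        hi = lo + width - 1             # upper class limit
        count = sum(1 for v in values if lo <= v <= hi)
        rows.append((lo, hi, (lo + hi) / 2, count))   # (limits, class mark, frequency)
    return rows

# Hypothetical ages of ten employees.
ages = [25, 27, 31, 34, 29, 38, 26, 33, 30, 36]
table = frequency_table(ages, lower=25, width=5, n_classes=3)

for lo, hi, mark, f in table:
    print(f"{lo}-{hi}  class mark {mark}  frequency {f}")
```

Dividing each frequency by the total number of observations would give the relative frequency distribution instead.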
Histogram
A histogram is a series of rectangles, the width of each being proportional to the range of values within a
class and height being proportional to the number of items falling in the class. The widths of the bars are
uniform when the widths of classes in a frequency distribution are equal.
When a histogram is constructed using relative frequencies, it is called a relative frequency histogram.
While the absolute histogram represents the number of data items, the relative frequency histogram
shows the size of each class relative to the total.

Frequency Polygon
For constructing a frequency polygon, the frequencies are marked on the vertical axis and the values of
the variable being studied are taken on the horizontal axis. Dots are put on the graph against the
class marks to represent the frequencies. These dots are connected by straight lines, which form
a frequency polygon. When the straight lines are smoothed by adding classes and data points, the result is called a
frequency curve.
Frequency polygons can graphically represent both simple and relative frequency distributions.
Ogive
When the cumulative frequencies are plotted on a graph we get an ogive.
[Table: Frequency Distribution Table]
[Figure: The less-than ogive curve for the above frequency distribution]
Ogives are of two types: the less-than ogive and the more-than ogive. The more-than ogive slopes down and to
the right, whereas the less-than ogive slopes up and to the right.
Skewness

Skewness and kurtosis are two characteristics of data sets that reveal useful trends and patterns
in the data represented as frequency distribution curves.
Skewness is the extent to which a distribution of data points is concentrated at one end or the other, i.e.
the lack of symmetry in the curve. The curves representing the data points in the data set can be of two
types:

Symmetrical curves: A curve is said to be symmetrical when a vertical line drawn from the
center of the curve to the X-axis divides the area under the curve into two equal parts.

Skewed curves (positively or negatively skewed): A curve is said to be skewed when the
values in the frequency distribution are concentrated more towards the left or right side of the
curve, i.e. the values are not equally distributed about the center of the curve. A curve is said to
be positively skewed when the tail of the curve is stretched more towards the right side. It is
said to be negatively skewed when the tail is stretched more towards the left side.

Kurtosis

Kurtosis is the degree of peakedness of a distribution of points. Two curves with the same central
location and dispersion may have different degrees of
kurtosis.
Summary

Data is a collection of related observations, facts or figures.

Data can be categorized into published data and unpublished data.

Data collection is done in two ways: complete enumeration and the sample method.

Data is systematically and clearly represented in the form of tables and graphs.

Line charts, bar charts, pie diagrams and scatter diagrams are some of the tools that are used to
graphically represent the data.

A frequency distribution is a tabular form that organizes data into classes.

Frequency polygons are graphical representations of frequency tables.

Skewness is the lack of symmetry in a curve.

Kurtosis is the degree of peakedness of a distribution of points.

Measures of Central Tendency

In this lesson we will get familiar with measures of central tendency. We will study the objectives
of averaging and the requisites of a good average. We will also focus on the types of averages:
arithmetic mean, weighted arithmetic mean, geometric mean, harmonic mean, median and mode.
Objectives of Averaging

To find out one value that represents the whole mass of data

If the researcher knows the average value of the data, then he need not study each
and every data point in the data set.

To enable comparison

Averages act as a common denominator for comparing two or more sets of data.

To establish relationship

Averages play a major role in establishing relationships between separate groups in
quantitative terms.

To derive inferences about a universe from a sample

The average calculated from sample data gives a reliable idea of the average of
the entire universe.

To aid decision-making

Averages act as benchmarks or standards for managerial control and decision-making.

Requisites of Good Average
An ideal average should have the following characteristics:
Should be rigidly defined
Should be mathematically expressed (have a mathematical formula)
Should be readily comprehensible and easy to calculate
Should be calculated based on all the observations
Should be least affected by extreme fluctuations in sampling data
Should be suitable for further mathematical treatment
In addition to the above requisites, a good average should retain the maximum characteristics of
the data, i.e. it should be the value nearest to all the data elements. Averages should be calculated for
homogeneous data, e.g. ages or sales.
Types of Averages

Averages are basically divided into two types: Mathematical averages and positional averages.
The mathematical averages are arithmetic mean, geometric mean and harmonic mean. The
positional averages are median and mode.
Arithmetic Mean

The mean of a sample containing n observations is given by

$\bar{x} = \frac{1}{n}(x_1 + x_2 + \dots + x_n) = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{\sum x}{n}$
where,
$\bar{x}$ is the sample mean
n is the number of elements
When the mean is calculated for the entire population it is known as the population arithmetic mean
($\mu$). N is the number of elements (observations) in the population.
Then
$\mu = \frac{\sum x}{N}$
Example: The heights of five friends are A=5.6, B=5.9,
C=5.8, D=6.0, E=5.7. What is their average height?
$\bar{x} = \frac{\sum x}{n} = \frac{5.6 + 5.9 + 5.8 + 6.0 + 5.7}{5} = \frac{29.0}{5} = 5.8$
Grouped Data

Calculate the mid-point of each class

Mid-point = (Lower Limit + Upper Limit) / 2

Multiply each mid-point by the frequency of observations in the corresponding class (fx), add the products and divide the sum by n

$\bar{x} = \frac{\sum (f \times x)}{n}$

f = number of observations in each class
x = class mark (mid-point of each class)
n = number of observations in the sample

Class | Frequency (f) | Class Mark (x) | fx
21-25 | 38 | 23 | 874
26-30 | 30 | 28 | 840
31-35 | 35 | 33 | 1155
36-40 | 25 | 38 | 950
41-45 | 15 | 43 | 645
46-50 | 12 | 48 | 576
51-55 | 3 | 53 | 159
56-60 | 2 | 58 | 116
Total | n = 160 | | $\sum fx$ = 5315

$\bar{x} = \frac{\sum (f \times x)}{n} = \frac{5315}{160} = 33.21875 \approx 33.22$
Short-cut Method

Locate an assumed mean. Assign the code value zero to the class containing the assumed
mean

Assign negative integers as codes to the classes with values smaller than the assumed mean
and positive integers to the classes with values larger than the assumed mean

$\bar{x} = x_0 + w \frac{\sum (u \times f)}{n}$

Where,
$\bar{x}$ = mean
$x_0$ = value of the class mark assigned the code 0
w = numerical width of the class interval
u = code assigned to each class
f = frequency of the class (number of observations)
n = total number of observations in the sample
Example: We will solve the previous example by the short-cut method.

Class | Class Mark (x) | Code (u) | Frequency (f) | uf
21-25 | 23 | -3 | 38 | -114
26-30 | 28 | -2 | 30 | -60
31-35 | 33 | -1 | 35 | -35
36-40 | 38 | 0 | 25 | 0
41-45 | 43 | 1 | 15 | 15
46-50 | 48 | 2 | 12 | 24
51-55 | 53 | 3 | 3 | 9
56-60 | 58 | 4 | 2 | 8
Total | | | 160 | -153

$\bar{x} = x_0 + w \frac{\sum (u \times f)}{n} = 38 + 5 \times \frac{-153}{160} = 38 - 4.78125 = 33.21875 \approx 33.22$
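The direct and short-cut calculations for grouped data can be cross-checked with a short Python sketch. The frequencies 3 and 2 used for the classes 51-55 and 56-60 are an assumption inferred from the fx entries 159 and 116 and the total n = 160; the variable names are illustrative only.

```python
# (lower limit, upper limit, frequency) for each class
classes = [(21, 25, 38), (26, 30, 30), (31, 35, 35), (36, 40, 25),
           (41, 45, 15), (46, 50, 12), (51, 55, 3), (56, 60, 2)]

n = sum(f for _, _, f in classes)
marks = [(lo + hi) / 2 for lo, hi, _ in classes]      # class marks (mid-points)
freqs = [f for _, _, f in classes]

# Direct method: mean = sum(f * x) / n
direct_mean = sum(f * x for f, x in zip(freqs, marks)) / n

# Short-cut method: mean = x0 + w * sum(u * f) / n
x0, w = 38, 5                                         # assumed mean and class width
codes = [round((x - x0) / w) for x in marks]          # -3, -2, -1, 0, 1, 2, 3, 4
shortcut_mean = x0 + w * sum(u * f for u, f in zip(codes, freqs)) / n

print(n, direct_mean, shortcut_mean)
```

Both routes should give the same value, since the coding step only shifts and rescales the class marks before averaging.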
Weighted Arithmetic Mean
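The weighted mean is calculated taking into account the relative importance of each of the values
to the total value. The formula for calculating the weighted average is:

$\bar{x}_w = \frac{\sum (w \times x)}{\sum w}$

Where,
$\bar{x}_w$ = symbol for weighted mean
w = weight allocated to each observation
$\sum (w \times x)$ = sum of each weight multiplied by that element
$\sum w$ = sum of all the weights

Example:
Class of Labour | Wage per hour (x) (Rs) | Labour hours per unit, Product 1 | Labour hours per unit, Product 2
Unskilled | 10 | 2 | 6
Semiskilled | 15 | 3 | 2
Skilled | 20 | 5 | 1

The labor cost per hour for Product 1 is given by
$\bar{x}_w = \frac{\sum (w \times x)}{\sum w} = \frac{(2 \times 10) + (3 \times 15) + (5 \times 20)}{2 + 3 + 5} = \frac{165}{10}$ = Rs. 16.5 per hour
Similarly, the labor cost per hour for Product 2 is given by
$\bar{x}_w = \frac{\sum (w \times x)}{\sum w} = \frac{(6 \times 10) + (2 \times 15) + (1 \times 20)}{6 + 2 + 1} = \frac{110}{9}$ = Rs. 12.22 per hour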
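As a quick check of the labour-cost example, the weighted mean can be sketched in Python. The function name `weighted_mean` is illustrative; the wages and labour hours are the ones given in the example.

```python
def weighted_mean(values, weights):
    """Weighted mean = sum(w * x) / sum(w)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

wages = [10, 15, 20]                 # Rs per hour: unskilled, semiskilled, skilled
hours_product1 = [2, 3, 5]           # labour hours per unit (the weights)
hours_product2 = [6, 2, 1]

cost1 = weighted_mean(wages, hours_product1)
cost2 = weighted_mean(wages, hours_product2)
print(round(cost1, 2), round(cost2, 2))
```

Product 2 comes out cheaper per hour because most of its labour hours are unskilled hours at the lowest wage.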

Median
The median is the middle value of a series arranged in ascending or descending order. The
median is the 50th percentile value below which 50% of the values in the sample fall.
Ungrouped Data

If the dataset contains an odd number of items, the middle item of the ordered dataset is the
median

If the dataset contains an even number of items, the average of the two middle items is
the median

If the total of the frequencies is odd, say n, then the value of the ((n + 1)/2)th item gives the median

If the total of the frequencies is even, say 2n, then the arithmetic mean of the nth and
(n + 1)th items gives the median
Example: A fruit vendor recorded the sales of oranges for a week.
Day | Sunday | Monday | Tuesday | Wednesday | Thursday | Friday | Saturday
Number of oranges | 280 | 240 | 250 | 220 | 270 | 225 | 265

What is the median number of oranges sold in that week?

Solution: First arrange the data in ascending order
Day | Number of oranges
Wednesday | 220
Friday | 225
Monday | 240
Tuesday | 250
Saturday | 265
Thursday | 270
Sunday | 280

The dataset contains 7 data points, so the median is given by the middle item, i.e. item number
4. Thus the median for the given data is 250.
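The rules for ungrouped data can be sketched as a small Python function. The function name `median` is illustrative; the sales figures are those of the orange vendor example (Friday = 225 and Saturday = 265 are read from the sorted table).

```python
def median(values):
    """Middle item for an odd count, average of the two middle items for an even count."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

# Sunday to Saturday sales of oranges.
sales = [280, 240, 250, 220, 270, 225, 265]
print(median(sales))
```

Dropping one day from the list would exercise the even-count branch, where the two middle values are averaged.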
Grouped Data
To find the median for grouped data, first we need to identify the median class. It is
assumed that the items are evenly spaced over the entire class interval. Then by
interpolation the median is calculated as

$\text{Median} = \left(\frac{(N + 1)/2 - (F + 1)}{f_m}\right) W + L_m$

where,
$L_m$ = lower limit of the median class
$f_m$ = frequency of the median class
F = cumulative frequency up to the lower limit of the median class
W = width of the class interval
N = total frequency

Example:
Class | Frequency | Cumulative Frequency
101-200 | 6 | 6
201-300 | 12 | 18
301-400 | 18 | 36
401-500 | 27 | 63
501-600 | 21 | 84
601-700 | 17 | 101
701-800 | 15 | 116
801-900 | 11 | 127
901-1000 | 9 | 136

The total frequency of the data is N = 136, thus the median is given by the ((N + 1)/2)th item, i.e. the 68.5th item,
which lies in the 501-600 class.

The median class is the 501-600 class.

$L_m$ = 501, N = 136, F = 63, $f_m$ = 21, W = 100

$\text{Median} = \left(\frac{68.5 - (63 + 1)}{21}\right) \times 100 + 501$
= (0.21428 × 100) + 501
= 522.428
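The interpolation can be sketched in Python as follows. The formula coded here is the one implied by the worked figures, Median = (((N + 1)/2 - (F + 1)) / fm) × W + Lm; the frequencies 6 and 9 for the first and last classes are an assumption inferred from the cumulative-frequency column, and the function name is illustrative.

```python
def grouped_median(classes):
    """classes: list of (lower_limit, width, frequency) in ascending order."""
    n = sum(f for _, _, f in classes)
    position = (n + 1) / 2                 # the ((N + 1)/2)th item
    cumulative = 0                         # F: cumulative frequency below the class
    for lower, width, f in classes:
        if cumulative + f >= position:     # this is the median class
            return lower + (position - (cumulative + 1)) / f * width
        cumulative += f
    raise ValueError("no observations")

data = [(101, 100, 6), (201, 100, 12), (301, 100, 18), (401, 100, 27),
        (501, 100, 21), (601, 100, 17), (701, 100, 15), (801, 100, 11),
        (901, 100, 9)]
result = grouped_median(data)
print(round(result, 3))
```

The loop stops at the first class whose cumulative frequency reaches the 68.5th item, which is the 501-600 class in this data.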
Mode
Mode is defined as the value of the variable which occurs most frequently in the data set.
When the data is grouped in a frequency distribution, the mode is assumed to be
located in the class with the highest frequency. The mode can be found using the following equation.

$Mo = L_{Mo} + \frac{d_1}{d_1 + d_2} \times w$

Where,
$L_{Mo}$ = lower limit of the modal class
$d_1$ = frequency of the modal class minus the frequency of the class just below it
$d_2$ = frequency of the modal class minus the frequency of the class just above it
w = width of the modal class
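As an illustration of the grouped-mode formula Mo = Lmo + d1 / (d1 + d2) × w, the sketch below applies it to the frequency table from the grouped median example, where 401-500 is the class with the highest frequency (27). Applying the formula to that table is an assumption made here for illustration; the notes do not work this case, and the function name is hypothetical.

```python
def grouped_mode(lower_limit, f_modal, f_below, f_above, width):
    """Mode of grouped data by interpolation within the modal class."""
    d1 = f_modal - f_below     # excess over the class just below
    d2 = f_modal - f_above     # excess over the class just above
    return lower_limit + d1 / (d1 + d2) * width

# Modal class 401-500 (f = 27); neighbours 301-400 (f = 18) and 501-600 (f = 21).
mode = grouped_mode(lower_limit=401, f_modal=27, f_below=18, f_above=21, width=100)
print(round(mode, 2))
```

The mode is pulled towards the neighbouring class with the larger frequency, here the class above, because d2 is smaller than d1.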

Advantages and Disadvantages


In a symmetrical distribution, the mean, median and mode coincide.
In a moderately asymmetrical distribution, the mean, median and mode are approximately related in the following
manner:
Mode = 3 Median - 2 Mean
Summary

We analyze the data statistically to calculate the average point of the data.

The average point of the data that is located centrally is called the measure of central
tendency.

There are two types of averages: mathematical averages (arithmetic mean, geometric
mean and harmonic mean) and positional averages (median and mode).

Measures of Dispersion
In this lesson we will get familiar with what dispersion is. We will study a few measures of
dispersion, namely range, quartile deviation and mean deviation, along with their merits and
limitations. We will discuss the calculation of these measures for ungrouped
and grouped data.

To study measures of dispersion: variance and standard deviation

To study the Bienaymé-Chebyshev rule


Dispersion

Dispersion of a dataset measures the variability of the data, i.e. how the data is spread out in the
dataset.
When the dispersion is measured in terms of the difference between two values selected
from the data set, it is called a distance measure, e.g. the range, the interquartile range
and the quartile deviation.
When the dispersion is measured in terms of the average deviation from some measure of
central tendency, it is called an average deviation measure, e.g. mean deviation, variance
and standard deviation.
The Range
For ungrouped data, range is defined as the difference between the value of the largest
observation and the value of the smallest observation present in the distribution.
Range = Largest Value - Smallest Value
For grouped data, range is defined as the difference between the upper limit of the highest
class and the lower limit of the lowest class.
Range = Upper limit of the highest class - Lower limit of the lowest class
The coefficient of range is a relative measure of range and is used for comparing observations in
different units. For example, a physical trainer cannot directly compare the range of the weights of
employees with the range of their heights, as the range of weights would be in kilograms and
that of heights in centimeters; the coefficient of range is unit-free and allows the comparison.
$\text{Coefficient of Range} = \frac{L - S}{L + S}$
where L is the largest value and S is the smallest value.
Example: Calculate range and coefficient of range for the given data:
45, 67, 87, 55, 74, 81
Range = Largest Value - Smallest Value
= 87 - 45 = 42
Coefficient of Range = (87 - 45) / (87 + 45)
= 42 / 132
= 0.318
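The range and its coefficient for ungrouped data can be sketched in Python. The coefficient is taken as (L - S) / (L + S), which is the form consistent with the result 0.318 in the example; the function names are illustrative.

```python
def value_range(values):
    """Range = largest value - smallest value."""
    return max(values) - min(values)

def coefficient_of_range(values):
    """Relative (unit-free) range: (L - S) / (L + S)."""
    largest, smallest = max(values), min(values)
    return (largest - smallest) / (largest + smallest)

data = [45, 67, 87, 55, 74, 81]
print(value_range(data), round(coefficient_of_range(data), 3))
```

Only the two extreme observations enter either calculation, which is why the range reacts so strongly to outliers.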
Example: Calculate range and coefficient of range for the given data:

Class

0-10

11-20

21-30

31-40

41-50

Frequency

10

Range = Upper limit of the highest class - Lower limit of the lowest class
= 50 - 0
= 50
Coefficient of Range = (50 - 0) / (50 + 0)
= 1

Merits:
Range is simple to understand and easy to calculate.
Range is the quickest way to get a measure of dispersion, although it is not accurate.
Limitations:
It is not based on all the observations in the data. It is computed based on the highest
and the lowest values and ignores the nature of dispersion among other values of
observations in the data set.
It is influenced by extreme values and hence fluctuates from sample to sample of a
population, even though the values that fall in between the highest and lowest values are
similar.
Range cannot be computed for frequency distributions with open-end classes.
Range fails to explain about the character of the distribution within two extreme
observations (i.e. L and S)
Range is unreliable as a measure of dispersion of the values within a distribution.
Uses:

Quality control experts analyze the dispersion of a product's quality. If the
dispersion is high, the quality keeps changing; if the dispersion is low,
the quality remains more or less the same.

Financial analysts are concerned about the dispersion of a firm's earnings. Widely
dispersed earnings, those varying from extremely high to low, indicate a higher risk
to stockholders and creditors than do earnings remaining relatively stable.

Quartile Deviation

Interquartile Range
The range calculated on the basis of the middle 50% of the observations is called the
interquartile range. It is calculated from the observations that remain after
discarding one quarter of the observations at the lower end and another quarter of the
observations at the upper end of the distribution. Thus, the interquartile range is the
difference between the third quartile and the first quartile.
Interquartile range = Q3 - Q1
Quartile Deviation
Quartile deviation is defined as one half of the interquartile range. Quartile deviation gives
the average amount by which the two quartiles differ from the median. In a symmetrical
distribution, the quartiles Q3 and Q1 are equidistant from the median, i.e. Median - Q1 = Q3 -
Median.
$\text{Quartile deviation (Q.D.)} = \frac{Q_3 - Q_1}{2}$

The relative measure of quartile deviation is called the coefficient of quartile deviation. It can
be used to compare the degree of variation in different distributions.
$\text{Coefficient of Q.D.} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$
For Ungrouped Data

Lower quartile (Q1) = value of the ((N + 1)/4)th observation
Upper quartile (Q3) = value of the (3(N + 1)/4)th observation
Where,
N = total number of observations
Example: The sales figures of a company are
given below. Calculate the quartile deviation for
the sales data.
Month & Year | April 02 | May 02 | June 02 | July 02 | Aug. 02 | Sept. 02 | Oct. 02
Sales (in Rs. '000) | 15.6 | 16.3 | 18.1 | 19.5 | 20.4 | 21.5 | 22.7

Q1 = ((7 + 1)/4)th observation = 2nd observation
Q3 = (3(7 + 1)/4)th observation = 6th observation

The 2nd observation is 16.3 and the 6th observation is 21.5

Quartile deviation (Q.D.) = (21.5 - 16.3) / 2
= 2.6
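The ungrouped calculation can be sketched in Python. This minimal version assumes that the quartile positions (N + 1)/4 and 3(N + 1)/4 are whole numbers, as they are for the seven sales figures; it does not interpolate between observations, and the function name is illustrative.

```python
def quartile_deviation(values):
    """Return (Q1, Q3, Q.D.) using the (N + 1)/4 and 3(N + 1)/4 positions."""
    ordered = sorted(values)
    n = len(ordered)
    if (n + 1) % 4 != 0:
        raise ValueError("this sketch only handles whole-number quartile positions")
    q1 = ordered[(n + 1) // 4 - 1]         # positions are 1-based in the notes
    q3 = ordered[3 * (n + 1) // 4 - 1]
    return q1, q3, (q3 - q1) / 2

sales = [15.6, 16.3, 18.1, 19.5, 20.4, 21.5, 22.7]   # Rs. '000, April to October
q1, q3, qd = quartile_deviation(sales)
print(q1, q3, round(qd, 2))
```

For sample sizes where the positions are fractional, some interpolation rule between neighbouring observations would be needed.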

For Grouped Data

$Q_1 = L_1 + \frac{N/4 - C}{f} \times h$
$Q_3 = L_3 + \frac{3N/4 - C}{f} \times h$
Where,
L1 = the lower boundary of the first quartile class (Q1)
L3 = the lower boundary of the third quartile class (Q3)
N = total frequency
f = frequency of the quartile class
h = class interval (width)
C = cumulative frequency of the class preceding the quartile class

Example: The wages of employees are given
below. Calculate the quartile deviation and
coefficient of quartile deviation.

Cumulative Frequency Table
Wages | No. of Employees | Cumulative Frequency
1501-2500 | 3 | 3
2501-3500 | 10 | 13
3501-4500 | 15 | 28
4501-5500 | 12 | 40
5501-6500 | 2 | 42

The first quartile is located at the ((N + 1)/4)th = (43/4)th = 10.75th observation.
This observation falls in the class 2501-3500.
L1 = 2501, C = 3, f = 10, h = 1000
$Q_1 = 2501 + \frac{42/4 - 3}{10} \times 1000 = 2501 + 750 = 3251$

The third quartile is located at the (3(N + 1)/4)th = 32.25th observation.
This observation falls in the class 4501-5500.
L3 = 4501, C = 28, f = 12, h = 1000
$Q_3 = 4501 + \frac{3 \times 42/4 - 28}{12} \times 1000 = 4501 + 291.667 = 4792.667$

Quartile Deviation = (4792.667 - 3251) / 2
= 770.833

Coefficient of Q.D. = (4792.667 - 3251) / (4792.667 + 3251)
= 0.192
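The grouped calculation can be sketched in Python. Two things here are assumptions: the interpolation uses N/4 and 3N/4 (the values that reproduce Q1 = 3251 and Q3 = 4792.667) while the quartile class is located with the (N + 1)/4 positions quoted in the example, and the coefficient is taken in its usual form (Q3 - Q1) / (Q3 + Q1). The function name is illustrative.

```python
def grouped_quartile(classes, k):
    """classes: (lower_boundary, width, frequency) in ascending order; k is 1 or 3."""
    n = sum(f for _, _, f in classes)
    locate = k * (n + 1) / 4       # position used to find the quartile class
    target = k * n / 4             # position used inside the interpolation
    cumulative = 0                 # C: cumulative frequency of the preceding classes
    for lower, width, f in classes:
        if cumulative + f >= locate:
            return lower + (target - cumulative) / f * width
        cumulative += f
    raise ValueError("no observations")

wages = [(1501, 1000, 3), (2501, 1000, 10), (3501, 1000, 15),
         (4501, 1000, 12), (5501, 1000, 2)]
q1 = grouped_quartile(wages, 1)
q3 = grouped_quartile(wages, 3)
qd = (q3 - q1) / 2
coefficient = (q3 - q1) / (q3 + q1)
print(round(q1, 3), round(q3, 3), round(qd, 3), round(coefficient, 3))
```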

Merits:

Q.D. can be used as a measure of variation for open-ended distributions.
Q.D. is a better measure of variation for highly skewed distributions or
distributions with extreme values, as Q.D. is not affected by the presence
of extreme values.

Limitations:

As Q.D. is calculated using only 50% of the total observations, it
cannot be regarded as a good measure of variation.

Q.D. is not a real measure of variation as it does not measure the
scatter of observations from the average. Q.D. is only a positional
average.
Mean Deviation

Calculation of mean deviation for ungrouped data

Calculate the sample mean

Subtract the mean from every value in the data set and ignore the positive or
negative signs

Add all the differences and divide the sum by the number of items in the sample

$\text{Absolute Mean Deviation} = \frac{\sum |x - \bar{x}|}{n}$ (for a sample)

Example: The maximum day
temperature was recorded for 10
days. Calculate the absolute
mean deviation.

Day | Temperature (°C) | Deviation from mean ($x - \bar{x}$) | Absolute deviation $|x - \bar{x}|$
1 | 25.0 | 1.12 | 1.12
2 | 24.8 | 0.92 | 0.92
3 | 25.2 | 1.32 | 1.32
4 | 24.6 | 0.72 | 0.72
5 | 24.0 | 0.12 | 0.12
6 | 23.7 | -0.18 | 0.18
7 | 23.3 | -0.58 | 0.58
8 | 23.0 | -0.88 | 0.88
9 | 22.7 | -1.18 | 1.18
10 | 22.5 | -1.38 | 1.38
N = 10 | $\sum x$ = 238.8 | | $\sum |x - \bar{x}|$ = 8.4

Mean ($\bar{x}$) = 238.8 / 10 = 23.88

Absolute Mean deviation = 8.4 / 10
= 0.84
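The three steps for ungrouped data can be sketched in Python, using the ten temperature readings from the example. The function name `mean_absolute_deviation` is illustrative.

```python
def mean_absolute_deviation(values):
    """Average of the absolute deviations of the values from their mean."""
    n = len(values)
    mean = sum(values) / n                               # step 1: sample mean
    absolute = [abs(x - mean) for x in values]           # step 2: ignore the signs
    return sum(absolute) / n                             # step 3: average them

temperatures = [25.0, 24.8, 25.2, 24.6, 24.0, 23.7, 23.3, 23.0, 22.7, 22.5]
mean_temperature = sum(temperatures) / len(temperatures)
amd = mean_absolute_deviation(temperatures)
print(round(mean_temperature, 2), round(amd, 2))
```

Without the absolute value in step 2 the deviations would cancel out and always sum to zero.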

Example: Calculate mean deviation for the given data.
Class | 0-200 | 201-400 | 401-600 | 601-800 | 801-1000
Frequency | 32 | 108 | 67 | 28 | 14

Solution:
Class Interval | Frequency (f) | Mid-value of class interval (x) | fx | $|x - \bar{x}|$ | $f|x - \bar{x}|$
0-200 | 32 | 100 | 3200 | 307.0879 | 9826.8128
201-400 | 108 | 300.5 | 32454 | 106.5879 | 11511.493
401-600 | 67 | 500.1 | 33506.7 | 93.0121 | 6231.8107
601-800 | 28 | 700.1 | 19602.8 | 293.0121 | 8204.3388
801-1000 | 14 | 900.1 | 12601.4 | 493.0121 | 6902.1694
Total | N = 249 | 2500.8 | 101364.9 | | 42676.624

Hint:
Use MS Excel to demonstrate the example

$\bar{x} = \frac{\sum fx}{N} = \frac{101364.9}{249} = 407.0879$
$\text{Absolute Mean Deviation} = \frac{\sum f|x - \bar{x}|}{N} = \frac{42676.624}{249} = 171.3920$
Merits:

Absolute mean deviation is simple and easy to understand.

Absolute mean deviation is a more comprehensive measure of
dispersion as it is dependent on all observations of a distribution.
As it is obtained by taking the average of the deviations of every
observation from the mean, it is a true measure of dispersion.

Limitations:

Absolute mean deviation is less reliable as it is the arithmetic mean of
the absolute values (ignoring the positive and negative signs).

Absolute mean deviation is not conducive to further algebraic treatment.

Absolute mean deviation cannot be computed for distributions with
open-end classes.
Variance ($\sigma^2$)

Steps for calculating variance for ungrouped data:

Calculate the sample mean

Subtract the mean from every value in the data set and square the difference

Add all the squared differences and divide the sum by the total number of items in the
sample
$\sigma^2 = \frac{\sum (x - \bar{x})^2}{n}$

Steps for calculating variance for grouped data:

Calculate the sample mean
$\bar{x} = \frac{\sum (f \times x)}{\sum f}$
where x is the mid-point of the class and f is the frequency of the class

Calculate the difference between the sample mean and the mid-point of the class
and square the difference

Multiply the frequency of the class and the squared difference. Add all the products
and divide the sum by the total frequency
$\sigma^2 = \frac{\sum f (x - \bar{x})^2}{\sum f}$
Standard Deviation ($\sigma$)

Standard deviation is the square root of the average of the squared distances of the
observations from the mean (i.e. the square root of the variance).
Standard deviation for ungrouped data, $\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$
Standard deviation for grouped data, $\sigma = \sqrt{\frac{\sum f (x - \bar{x})^2}{\sum f}}$
Properties of Standard Deviation

Standard Deviation is independent of change of origin

The value of standard deviation remains the same if each observation in a series is
increased or decreased by a constant quantity.
For example, for the observations 3, 10 and 12,
$\bar{x}$ = 8.33 and $\sigma$ = 3.859
If we increase the value of each observation by
4.5 we get the observations 7.5, 14.5 and 16.5.
Now $\bar{x}$ = 12.833 and $\sigma$ = 3.859
Hence although $\bar{x}$ has increased by 4.5, $\sigma$
remains the same.

Standard Deviation is dependent on the change of scale

For a given series, if each observation is multiplied or divided by a constant quantity, the
standard deviation will also be similarly affected.
Suppose we multiply each observation by 6; the observations
become 18, 60 and 72. Now $\bar{x}$ = 50 and $\sigma$ = 23.152,
which is nothing but the earlier $\sigma$ multiplied by 6, i.e. 3.859 × 6 (up to rounding).

Standard deviation is the minimum root-mean-square deviation

The sum of the squares of the deviations of the items of any series from a value other than the
arithmetic mean is always greater than the sum of the squares of the deviations from the arithmetic mean.
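The three properties can be checked numerically with the observations 3, 10 and 12. This is a minimal sketch; the function names are illustrative, and the standard deviation divides by n, which matches the figures 3.859 and 23.152 in the notes.

```python
import math

def std_dev(values):
    """Standard deviation with divisor n."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((x - mean) ** 2 for x in values) / len(values))

def rms_about(values, point):
    """Root-mean-square deviation of the values about an arbitrary point."""
    return math.sqrt(sum((x - point) ** 2 for x in values) / len(values))

base = [3, 10, 12]
shifted = [x + 4.5 for x in base]      # change of origin
scaled = [x * 6 for x in base]         # change of scale

print(round(std_dev(base), 3), round(std_dev(shifted), 3), round(std_dev(scaled), 3))
# Deviations measured from any point other than the mean (here 9) give a larger value.
print(rms_about(base, 9) > std_dev(base))
```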


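Just as it is possible to compute the combined mean of two or more groups, it is also
possible to compute the combined standard deviation of two or more groups. The combined
standard deviation of two groups, denoted by $\sigma_{12}$, is computed as follows:
$\sigma_{12} = \sqrt{\frac{n_1 \sigma_1^2 + n_2 \sigma_2^2 + n_1 d_1^2 + n_2 d_2^2}{n_1 + n_2}}$
Where,
$n_1$, $n_2$ = number of observations in the first and second group
$\sigma_1$ = standard deviation of first group
$\sigma_2$ = standard deviation of second group
$d_1 = \bar{x}_1 - \bar{x}_{12}$
$d_2 = \bar{x}_2 - \bar{x}_{12}$
$\bar{x}_{12} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$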
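The combined standard deviation can be sketched in Python and checked against a direct calculation on the pooled data. The formula coded here is the standard two-group form with d1 and d2 measured from the combined mean; the two small groups and the function names are hypothetical, and all standard deviations use the divisor n.

```python
import math

def mean(values):
    return sum(values) / len(values)

def std_dev(values):
    m = mean(values)
    return math.sqrt(sum((x - m) ** 2 for x in values) / len(values))

def combined_sd(n1, mean1, sd1, n2, mean2, sd2):
    """Combined mean and standard deviation of two groups from their summaries."""
    mean12 = (n1 * mean1 + n2 * mean2) / (n1 + n2)
    d1, d2 = mean1 - mean12, mean2 - mean12
    variance12 = (n1 * sd1 ** 2 + n2 * sd2 ** 2 + n1 * d1 ** 2 + n2 * d2 ** 2) / (n1 + n2)
    return mean12, math.sqrt(variance12)

group1 = [3, 10, 12]                   # hypothetical first group
group2 = [15, 13, 17, 16, 18, 20]      # hypothetical second group
mean12, sd12 = combined_sd(len(group1), mean(group1), std_dev(group1),
                           len(group2), mean(group2), std_dev(group2))
print(round(mean12, 3), round(sd12, 3), round(std_dev(group1 + group2), 3))
```

The last two printed values should agree, since the formula only needs each group's size, mean and standard deviation to recover the pooled result.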
Coefficient of Variation

The coefficient of variation is a measure of relative dispersion and is given by

$\text{Coefficient of variation (\%)} = \frac{\sigma}{\bar{x}} \times 100$

The coefficient of variation measures the spread of a set of data as a proportion of its
mean. It is used in problem situations where we want to compare the variability,
homogeneity, stability, uniformity and consistency of two or more data sets. The data set
for which the coefficient of variation is greater is said to be more variable i.e. less
consistent or less homogeneous. On the other hand, if the coefficient of variation is less it
is said to be less variable i.e., more consistent or more homogeneous.
Example 1: Find the standard
deviation and the coefficient of
variation for the given data.

$x_i$ | $(x_i - \bar{x})$ | $(x_i - \bar{x})^2$
15 | -1.5 | 2.25
13 | -3.5 | 12.25
17 | 0.5 | 0.25
16 | -0.5 | 0.25
18 | 1.5 | 2.25
20 | 3.5 | 12.25
Sum = 99 | | 29.50

$\bar{x} = \frac{99}{6} = 16.5$
$\sigma^2 = \frac{29.50}{6} = 4.917$
$\sigma = \sqrt{4.917} = 2.217$
Coefficient of variation (%) = (2.217 / 16.5) × 100
= 13.44
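The ungrouped example can be re-computed with a short Python sketch, which is a convenient way to check the intermediate figures. The function names are illustrative; the variance divides by n, consistent with 29.50 / 6 in the example.

```python
import math

def mean(values):
    return sum(values) / len(values)

def variance(values):
    """Average squared deviation from the mean (divisor n)."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values) / len(values)

def coefficient_of_variation(values):
    """Standard deviation expressed as a percentage of the mean."""
    return math.sqrt(variance(values)) / mean(values) * 100

data = [15, 13, 17, 16, 18, 20]
sigma = math.sqrt(variance(data))
cv = coefficient_of_variation(data)
print(mean(data), round(variance(data), 3), round(sigma, 3), round(cv, 2))
```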
Example 2: Find the standard deviation and
the coefficient of variation for the given data.

Class | Frequency (f) | Mid-point (x) | fx | $x - \bar{x}$ | $(x - \bar{x})^2$ | $f(x - \bar{x})^2$
0-10 | 6 | 5 | 30 | -31.8375 | 1013.6264 | 6081.7584
11-20 | 8 | 15.5 | 124 | -21.3375 | 455.2889 | 3642.3112
21-30 | 13 | 25.5 | 331.5 | -11.3375 | 128.5389 | 1671.0057
31-40 | 15 | 35.5 | 532.5 | -1.3375 | 1.7889 | 26.8335
41-50 | 18 | 45.5 | 819 | 8.6625 | 75.0389 | 1350.7002
51-60 | 20 | 55.5 | 1110 | 18.6625 | 348.2889 | 6965.778
Sum | 80 | | 2947 | -38.525 | 2022.5709 | 19738.387

$\bar{x} = \frac{\sum fx}{\sum f} = \frac{2947}{80} = 36.8375$
$\sigma^2 = \frac{\sum f (x - \bar{x})^2}{\sum f} = \frac{19738.387}{80} = 246.7298$
$\sigma = \sqrt{246.7298} = 15.7076$
Coefficient of variation (%) = (15.7076 / 36.8375) × 100
= 42.6402
