Foundation Notes 2013

Foundation Notes
Arranging Data
In this lesson we will get familiar with data and its various types. We will also discuss the methods of
data collection. Then we will focus on data presentation tools such as tables and graphs (line
chart, bar chart, pie diagram, pictogram and scatter diagram).
We will also get familiar with the frequency distribution and the frequency polygon, and study
the properties (skewness and kurtosis) of the frequency distribution curve.
What is Data?
Data is a collection of related observations, facts or figures. A collection of data is called a data set, and
each observation a data point.
Example: Marks obtained by students in Introduction to
Quantitative Methods course
Types of Data
Raw Data: Information before its systematic arrangement and analysis is called raw data. Useful
inferences can be derived from the raw data by applying various statistical methods.
Example: Sales data of a company for a year
Data can be classified as:
Published Data: data that is already collected and published.
Example: RBI Bulletins, CMIE Reports
Unpublished Data: data that is yet to be collected or printed.
Example: Data collected by a shopkeeper regarding customer satisfaction and not published
Primary Data: first-hand data collected by way of a sample survey or a census.
Example: Observation, personal interview or questionnaires
Secondary Data: data collected from other available sources (collected by others).
Example: Company Annual Reports, information from the Internet
Apart from this, data can also be classified along some characteristics of data like age, gender,
education, income, etc.
Some common methods of classification are
Geographical, i.e. area-wise or region-wise
Chronological, yearly data, quarterly data, monthly data, weekly data
Qualitative, i.e., depending on characteristics
By magnitude
Methods of Data Collection

Complete Enumeration (Census Survey or Census): a method in which the entire population is
taken up and information is collected on every unit of the population.
Example: Census conducted by Government of India every
10 years
This method gives accurate information but more resources (time, money and people) are required.
Sample Method: a method in which only a part of the population (or universe) is enumerated
and information is gathered about the selected part.

Example: Checking only a few units from a production batch
The choice between the two methods of data collection depends on the factors like purpose of the
enquiry, time available for making a decision, budget allocation, and the accuracy of data required for
decision making.
Tables as Data Presentation Device
Tabular presentation is used to summarize or condense data. Tables help the managers to analyze the
relationships and trends in the collected data.
Tabulation is the logical listing of related quantitative data in vertical columns and horizontal rows with
sufficient explanatory and qualifying words, phrases and statements in the form of titles, headings and
explanatory notes to make clear the full meaning, context and origin of the data.
Line Chart
In graphical presentation, the collected data is represented by various types of geometrical devices such
as points, lines, bars, multi-dimensional figures, pictorials, etc. Although a graph is primarily a visual
rather than a numerical form of presentation, the quantities are usually indicated on it as well. The
magnitude of the data is depicted visually through the proportional size of the diagram or graph.
A line chart is an effective graphical method for depicting the trend in data. If the line rises from
left to right, the data shows an increasing trend; if it falls, a decreasing trend.
Bar Chart
Bar charts use rectangles, referred to as bars, to present the data. There are two types of bar
charts: vertical and horizontal. These diagrams are one-dimensional because the magnitude of the data is
represented by the length of the bar; the thickness or width of the bar has no relevance. The bars should be
arranged from left to right.
The given bar diagram shows the yearly sales of a company.
Multiple bar diagrams, or compound bar diagrams, are used to compare two or more sets of related data.
This diagram is similar to the simple bar diagram, but the bars in each set are placed together and a gap is left
between each set of bars.
The given multiple bar diagram shows the yearly export and import values of a company.
Pie Diagram
A pie diagram is a circle divided into segments, each of which represents the percentage
contribution of one component to the total. Pie diagrams are used to compare many components
simultaneously.
For drawing a pie diagram it is necessary to express the value of each category as a percentage of the
total. The 360° of the circle represent the whole (i.e., 100%), so 3.6° constitute 1% of the total.
Degree of each part = (Part × 360°) / Total = Part (%) × 3.6°
The pie diagram represents the share holding pattern of a company.
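The degree formula can be checked with a short Python sketch. The shareholding figures below are hypothetical and serve only to illustrate the calculation; the function name `pie_angles` is likewise an assumption.

```python
def pie_angles(parts):
    """Return the angle in degrees of each segment: part * 360 / total."""
    total = sum(parts.values())
    return {name: value * 360 / total for name, value in parts.items()}

# Hypothetical shareholding pattern (percent of shares), for illustration only.
shareholding = {"Promoters": 50, "Institutions": 30, "Public": 20}
angles = pie_angles(shareholding)
print(angles)
```

Because the parts here already add up to 100, each angle is simply the percentage multiplied by 3.6.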
Pictogram
Pictograms represent the data in the form of pictures. The data is presented using appropriate pictures
and their sizes indicate the magnitude of the data.
Scatter Diagram
A scatter diagram is used to study the correlation between two variables. The scatter diagram
is drawn by plotting the points against the X and Y axes. When the points on the graph follow a clear pattern, this
indicates high correlation; an irregular pattern indicates low correlation.
Frequency Distribution
The table in which raw data is tabulated by dividing it into classes of convenient size and counting the
number of data elements (or their fraction of the total) falling within each pair of class boundaries is
called a frequency distribution table.
Classes are groups of values sharing the same characteristic of the data. E.g., employees of a company
grouped together on the basis of their ages.
The end values of a given class are called its class limits, and the middle of a class interval is called the class
mark. For the class 25-29, 25 and 29 are the class limits, 27 is the class mark and
30 - 25 = 5 is the class interval.
A cumulative frequency distribution is a tabular display of data showing how many observations lie
above, or below, certain values.
Construction of Frequency Distribution
To construct a frequency distribution, the data is to be divided into groups of similar intervals. Then the
number of data points that fall into each group has to be recorded against each group.
Frequency distributions can also be constructed with classes of qualitative attributes. The classification can be
either quantitative or qualitative, and the classes either discrete or continuous.
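As a minimal sketch of the construction steps, the Python snippet below groups a small set of ages into classes of width 5 and counts the observations in each class. The ages, the starting limit and the function name `frequency_table` are assumptions made for illustration; the sketch also assumes integer values that are not smaller than the starting limit.

```python
from collections import Counter

def frequency_table(values, start, width):
    """Count how many values fall in each class [lower, lower + width - 1]."""
    counts = Counter((v - start) // width for v in values)
    table = []
    for k in range(max(counts) + 1):
        lower = start + k * width
        upper = lower + width - 1
        class_mark = (lower + upper) / 2
        table.append((lower, upper, class_mark, counts.get(k, 0)))
    return table

ages = [25, 27, 31, 29, 34, 38, 26, 33, 30, 36]  # hypothetical data
table = frequency_table(ages, start=25, width=5)
for lower, upper, mark, frequency in table:
    print(f"{lower}-{upper}  class mark {mark}  frequency {frequency}")
```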
Histogram
A histogram is a series of rectangles, the width of each being proportional to the range of values within a
class and height being proportional to the number of items falling in the class. The widths of the bars are
uniform when the widths of classes in a frequency distribution are equal.
When a histogram is constructed using relative frequency, it is called a relative frequency histogram.
While the absolute histogram represents the number of data items, the relative frequency histogram
shows the relative size of each class with the total.
Frequency Polygon
For constructing a frequency polygon, the frequencies are marked on the vertical axis and the values of
the variable being studied are taken on the horizontal axis. Dots are put on the graph against the
class marks to represent the frequencies. These dots are connected by straight lines to form
a frequency polygon. When the straight lines are smoothed by adding classes and data points, the result is called a
frequency curve.
Frequency polygons graphically represent both simple and relative frequency distributions.
Ogive
When the cumulative frequencies are plotted on a graph we get an ogive.
Ogives are of two types: the less-than ogive and the more-than ogive. The more-than ogive slopes down and to
the right, whereas the less-than ogive slopes up and to the right.
[Figure: frequency distribution table and its less-than ogive curve]
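The cumulative frequencies behind the two ogives can be sketched in Python as follows. The class frequencies are the ones used in the wages example later in these notes; the variable names are assumptions.

```python
from itertools import accumulate

frequencies = [3, 10, 15, 12, 2]  # class frequencies, lowest class first

# less_than[i]: observations at or below the upper limit of class i
less_than = list(accumulate(frequencies))
total = less_than[-1]

# more_than[i]: observations at or above the lower limit of class i
more_than = [total - below for below in [0] + less_than[:-1]]

print(less_than)
print(more_than)
```

Plotting `less_than` against the upper class limits gives the rising curve, and plotting `more_than` against the lower class limits gives the falling one.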
Skewness
Skewness and kurtosis are two characteristics of data sets that reveal useful trends and patterns
in data represented as frequency distribution curves.
Skewness is the extent to which a distribution of data points is concentrated at one end or the other, i.e.
the lack of symmetry in the curve. The curves representing the data points in the data set can be of two
types:
Symmetrical curves: A curve is said to be symmetrical when a vertical line drawn from the
center of the curve to the X-axis divides the area under the curve into equal parts.
Skewed curves (positively or negatively skewed): A curve is said to be skewed when the
values in the frequency distribution are concentrated more towards the left or right side of the
curve, i.e. the values are not equally distributed about the center of the curve. A curve is said to
be positively skewed when the tail of the curve is more stretched towards the right side. It is
said to be negatively skewed when the tail is more stretched towards the left side.
Kurtosis
Kurtosis is the degree of peakedness of a distribution of points. Two curves with the same
central location and dispersion may have different degrees of kurtosis.
Summary
Data is a collection of related observations, facts or figures.
Data can be categorized into published data and unpublished data.
Data collection is done in two ways: complete enumeration and the sample method.
Data is systematically and clearly represented in the form of tables and graphs.
Line charts, bar charts, pie diagrams and scatter diagrams are some of the tools used to
represent data graphically.
A frequency distribution is a tabular form that organizes data into classes.
Frequency polygons are graphical representations of frequency tables.
Skewness is the lack of symmetry in a curve.
Kurtosis is the degree of peakedness of a distribution of points.
Measures of Central Tendency
In this lesson we will get familiar with measures of central tendency. We will study the objectives
of averaging and the requisites of a good average. We will also focus on the types of averages:
arithmetic mean, weighted arithmetic mean, geometric mean, harmonic mean, median and mode.
Objectives of Averaging
To find out one value that represents the whole mass of data
If the researcher knows the average value of the data, then he need not study each
and every data point in the data set.
To enable comparison
Averages act as a common denominator for comparing two or more sets of data.
To establish relationship
Averages play a major role in establishing relationships between separate groups in
quantitative terms.
To derive inferences about a universe from a sample
The average calculated from sample data gives a reliable idea of the average of
the entire universe.
To aid decision-making
Averages act as benchmarks or standards for managerial control and decision-making.

Requisites of Good Average
An ideal average should have the following characteristics:
Should be rigidly defined
Should be mathematically expressed (Have a mathematical formula)
Should be readily comprehensible and easy to calculate
Should be calculated based on all the observations
Should be least affected by extreme fluctuations in sampling data.
Should be suitable for further mathematical treatment.
In addition to the above requisites, a good average should retain the maximum characteristics of
the data, i.e. it should be the value nearest to all the data elements. Averages should be calculated for
homogeneous data, e.g. ages, sales, etc.
Types of Averages
Averages are basically divided into two types: Mathematical averages and positional averages.
The mathematical averages are arithmetic mean, geometric mean and harmonic mean. The
positional averages are median and mode.
Arithmetic Mean
The mean of a sample containing n observations is given by
$\bar{x} = \frac{1}{n}(x_1 + x_2 + \dots + x_n) = \frac{\sum x}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$
where,
$\bar{x}$ is the sample mean
n is the number of elements
When the mean is calculated for the entire population it is known as the population arithmetic mean
($\mu$). N is the number of elements (observations) in the population.
Then
$\mu = \frac{\sum x}{N}$
Example: The heights of five friends are A = 5.6, B = 5.9,
C = 5.8, D = 6.0, E = 5.7. What is their average height?
$\bar{x} = \sum x / n$ = (5.6 + 5.9 + 5.8 + 6.0 + 5.7) / 5 = 29.0 / 5
= 5.8
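The same calculation, sketched in Python with the standard-library `statistics.fmean` function (variable names are assumptions):

```python
from statistics import fmean

heights = [5.6, 5.9, 5.8, 6.0, 5.7]  # A, B, C, D, E
mean_height = fmean(heights)          # sum of the values divided by n
print(round(mean_height, 2))
```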
Grouped Data
Calculate the mid-point of each class:
Mid-point = (Lower Limit + Upper Limit) / 2
Multiply each mid-point by the frequency of observations in the corresponding class (f·x), add the
products and divide by the number of observations:
$\bar{x} = \frac{\sum (f \cdot x)}{n}$
where,
f = number of observations in each class
x = class mark (mid-point of each class)
n = number of observations in the sample
Example:
Class    Frequency (f)   Class Mark (x)   fx
21-25    38              23               874
26-30    30              28               840
31-35    35              33               1155
36-40    25              38               950
41-45    15              43               645
46-50    12              48               576
51-55    3               53               159
56-60    2               58               116
Total    n = 160                          $\sum fx$ = 5315
$\bar{x} = \sum (f \cdot x) / n$ = 5315 / 160 = 33.219
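A minimal Python sketch of the grouped-data calculation follows. The class marks and frequencies are those of the example; the frequencies 3 and 2 for the last two classes are inferred from the fx column (159 / 53 and 116 / 58), which is an assumption about the original table.

```python
class_marks = [23, 28, 33, 38, 43, 48, 53, 58]   # x, mid-point of each class
frequencies = [38, 30, 35, 25, 15, 12, 3, 2]     # f, last two values inferred

n = sum(frequencies)                                             # total observations
sum_fx = sum(f * x for f, x in zip(frequencies, class_marks))    # sum of f * x
grouped_mean = sum_fx / n

print(n, sum_fx, round(grouped_mean, 3))
```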
Short-cut Method
Locate an assumed mean. Assign the code value zero to the class containing the assumed
mean.
Assign negative integers as codes to the classes with values smaller than the assumed mean
and positive integers to the classes with values larger than the assumed mean.
$\bar{x} = x_0 + w \frac{\sum (u \cdot f)}{n}$
where,
$\bar{x}$ = mean
$x_0$ = value of the class mark assigned the code 0
w = numerical width of the class interval
u = code assigned to each class
f = frequency of the class (number of observations)
n = total number of observations in the sample
Example: We will solve the previous example by the short-cut method.
Class    Class Mark (x)   Code (u)   Frequency (f)   uf
21-25    23               -3         38              -114
26-30    28               -2         30              -60
31-35    33               -1         35              -35
36-40    38               0          25              0
41-45    43               1          15              15
46-50    48               2          12              24
51-55    53               3          3               9
56-60    58               4          2               8
Total                                n = 160         $\sum uf$ = -153
$\bar{x} = x_0 + w \sum (u \cdot f) / n$
= 38 + 5 × (-153) / 160
= 38 - 4.781
= 33.219
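The short-cut method can be sketched in Python as below. The function name `coded_mean` is an assumption, the sketch assumes equal class widths, and the frequencies 3 and 2 for the last two classes are inferred from the earlier fx column.

```python
def coded_mean(class_marks, frequencies, assumed_index):
    """Short-cut mean: x0 + w * sum(u * f) / n, with code 0 at assumed_index."""
    x0 = class_marks[assumed_index]
    w = class_marks[1] - class_marks[0]          # assumes equal class widths
    codes = [i - assumed_index for i in range(len(class_marks))]
    sum_uf = sum(u * f for u, f in zip(codes, frequencies))
    n = sum(frequencies)
    return x0 + w * sum_uf / n, sum_uf

class_marks = [23, 28, 33, 38, 43, 48, 53, 58]
frequencies = [38, 30, 35, 25, 15, 12, 3, 2]
shortcut_mean, sum_uf = coded_mean(class_marks, frequencies, assumed_index=3)
print(sum_uf, round(shortcut_mean, 3))
```

Any class can serve as the assumed mean; the result is the same as the direct calculation, only the intermediate numbers are smaller.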
Weighted Arithmetic Mean
The weighted mean is calculated taking into account the relative importance of each of the values
to the total value. The formula for calculating the weighted average is:
$\bar{x}_w = \frac{\sum (w \cdot x)}{\sum w}$
where,
$\bar{x}_w$ = symbol for the weighted mean
w = weight allocated to each observation
$\sum (w \cdot x)$ = sum of each weight multiplied by that element
$\sum w$ = sum of all the weights
Example:
Class of Labour   Wage per hour (x) (Rs)   Labour hours per unit
                                           Product 1   Product 2
Unskilled         10                       2           6
Semiskilled       15                       3           2
Skilled           20                       5           1
The labor cost per hour for Product 1 is given by
$\bar{x}_w = \sum (w \cdot x) / \sum w$
= (2 × 10 + 3 × 15 + 5 × 20) / (2 + 3 + 5)
= 165 / 10
= Rs. 16.5 per hour
Similarly, the labor cost per hour for Product 2 is given by
$\bar{x}_w = \sum (w \cdot x) / \sum w$
= (6 × 10 + 2 × 15 + 1 × 20) / (6 + 2 + 1)
= 110 / 9
= Rs. 12.22 per hour
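A small Python sketch of the weighted mean for the labour example, with the labour hours as weights (the function and variable names are assumptions):

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

wages = [10, 15, 20]            # Rs per hour: unskilled, semiskilled, skilled
hours_product_1 = [2, 3, 5]     # labour hours per unit, used as weights
hours_product_2 = [6, 2, 1]

cost_1 = weighted_mean(wages, hours_product_1)
cost_2 = weighted_mean(wages, hours_product_2)
print(round(cost_1, 2), round(cost_2, 2))
```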
Median
The median is the middle value of a series arranged in ascending or descending order. The
median is the 50th percentile value below which 50% of the values in the sample fall.
Ungrouped Data
If the dataset contains an odd number of items, the middle item of the dataset is the
median
If the dataset contains an even number of items, the average of the two middle items is
the median
If the total of the frequencies is odd, say n, then the value of the ((n + 1)/2)th item gives the median
If the total of the frequencies is even, say 2n, then the arithmetic mean of the nth and
(n + 1)th items gives the median
Example: A fruit vendor recorded the sales of oranges for a week.
Day                 Sunday  Monday  Tuesday  Wednesday  Thursday  Friday  Saturday
Number of oranges   280     240     250      220        270       225     265
What is the median number of oranges sold in that week?
Solution: First arrange the data in ascending order
Day         Number of oranges
Wednesday   220
Friday      225
Monday      240
Tuesday     250
Saturday    265
Thursday    270
Sunday      280
The dataset contains 7 data points, so the median is given by the middle item, i.e. item number
4. Thus the median for the given data is 250.
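The same result can be checked with the standard-library `statistics.median` function, which sorts the values and picks the middle item (or averages the two middle items for an even count). The dictionary name is an assumption.

```python
from statistics import median

oranges_sold = {
    "Sunday": 280, "Monday": 240, "Tuesday": 250, "Wednesday": 220,
    "Thursday": 270, "Friday": 225, "Saturday": 265,
}
median_sales = median(oranges_sold.values())
print(median_sales)
```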
Grouped Data
To find the median for grouped data, first we need to identify the median class. It is
assumed that the items are evenly spaced over the entire class interval. Then by
interpolation the median is calculated as
$\text{Median} = \left( \frac{(N + 1)/2 - (F + 1)}{f_m} \right) W + L_m$
where,
$L_m$ = lower limit of the median class
$f_m$ = frequency of the median class
F = cumulative frequency up to the lower limit of the median class
W = width of the class interval
N = total frequency
Example:
Class       Frequency   Cumulative Frequency
101-200     6           6
201-300     12          18
301-400     18          36
401-500     27          63
501-600     21          84
601-700     17          101
701-800     15          116
801-900     11          127
901-1000    9           136
The total frequency of the data is N = 136, thus the median is given by the ((N + 1)/2)th
item, i.e. the 68.5th item, which lies in the 501-600 class.
The median class is the 501-600 class.
$L_m$ = 501, N = 136, F = 63, $f_m$ = 21, W = 100
Median = ((68.5 - 64) / 21) × 100 + 501
= (0.21428 × 100) + 501
= 522.429
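The interpolation can be sketched in Python as follows. The function name `grouped_median` is an assumption, and the frequencies 6 and 9 for the first and last classes are inferred from the cumulative frequency column.

```python
def grouped_median(lower_limits, frequencies, width):
    """Median = (((N + 1)/2 - (F + 1)) / fm) * W + Lm."""
    n = sum(frequencies)
    position = (n + 1) / 2            # position of the median item
    cumulative = 0                    # F: cumulative frequency before the class
    for lower, f in zip(lower_limits, frequencies):
        if cumulative + f >= position:
            return (position - (cumulative + 1)) / f * width + lower
        cumulative += f
    raise ValueError("frequency table is empty")

lower_limits = [101, 201, 301, 401, 501, 601, 701, 801, 901]
frequencies = [6, 12, 18, 27, 21, 17, 15, 11, 9]   # first and last inferred
median_value = grouped_median(lower_limits, frequencies, width=100)
print(round(median_value, 3))
```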
Mode
Mode is defined as the value of the variable which occurs most frequently in the data set.
When the data is grouped in a frequency distribution the manager must assume that the mode is
located in the class with highest frequency. The mode can be found using the following equation.
Mode,
$M_o = L_{mo} + \left( \frac{d_1}{d_1 + d_2} \right) w$
where,
$L_{mo}$ = lower limit of the modal class
$d_1$ = frequency of the modal class - the frequency of the class just below it
$d_2$ = frequency of the modal class - the frequency of the class just above it
w = width of the modal class
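As an illustration only, the sketch below applies the mode formula to the frequency table of the grouped median example, whose modal class is 401-500. The function name `grouped_mode` is an assumption, as is the choice to treat a missing neighbouring class as having frequency 0; the first and last frequencies are inferred from the cumulative column of that example.

```python
def grouped_mode(lower_limits, frequencies, width):
    """Mode = Lmo + d1 / (d1 + d2) * w for the class with the highest frequency."""
    m = max(range(len(frequencies)), key=frequencies.__getitem__)
    below = frequencies[m - 1] if m > 0 else 0                     # assumption
    above = frequencies[m + 1] if m + 1 < len(frequencies) else 0  # assumption
    d1 = frequencies[m] - below
    d2 = frequencies[m] - above
    return lower_limits[m] + d1 / (d1 + d2) * width

lower_limits = [101, 201, 301, 401, 501, 601, 701, 801, 901]
frequencies = [6, 12, 18, 27, 21, 17, 15, 11, 9]
mode_value = grouped_mode(lower_limits, frequencies, width=100)
print(mode_value)
```

Here d1 = 27 - 18 = 9 and d2 = 27 - 21 = 6, so the mode is 401 + (9 / 15) × 100 = 461.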
Advantages and Disadvantages

In a symmetrical distribution, the mean, median and mode coincide.
In a moderately asymmetrical distribution, the mean, median and mode are related approximately in the
following manner:
Mode = 3 Median - 2 Mean
Summary
We analyze the data statistically to calculate the average point of the data.
The centrally located average point of the data is called the measure of central
tendency.
There are two types of averages: mathematical averages (arithmetic mean, geometric
mean and harmonic mean) and positional averages (median and mode).
Measures of Dispersion
In this lesson we will get familiar with dispersion. We will study a few measures of
dispersion, namely range, quartile deviation and mean deviation, along with their merits and
limitations, and discuss the calculation of these measures for ungrouped
and grouped data.
To study measures of dispersion: variance and standard deviation
To study the Bienaymé-Chebyshev rule
Dispersion
The dispersion of a dataset measures the variability of the data, i.e. how the data is spread out in the
dataset.
When the dispersion is measured in terms of the difference between two values selected
from the data set, it is called a distance measure, e.g. the range, the interquartile range
and the quartile deviation.
When the dispersion is measured in terms of the average deviation from some measure of
central tendency, it is called an average deviation measure, e.g. mean deviation, variance
and standard deviation.
The Range
For ungrouped data, range is defined as the difference between the value of the largest
observation and the value of the smallest observation present in the distribution.
Range = Largest Value - Smallest Value
For grouped data, range is defined as the difference between the upper limit of the highest
class and the lower limit of the lowest class.
Range = Upper limit of the highest class - Lower limit of the lowest class
The coefficient of range is a relative measure of range and is used for comparing observations in
different units. For example, a physical trainer cannot compare the range of the weights of
employees with the range of their heights, as the range of weights would be in kilograms and
that of heights in centimeters.
$\text{Coefficient of Range} = \frac{L - S}{L + S}$
where L is the largest value and S is the smallest value.
Example: Calculate range and coefficient of range for the given data:
45, 67, 87, 55, 74, 81
Range = Largest Value - Smallest Value
= 87 - 45 = 42
Coefficient of Range = (87 - 45) / (87 + 45)
= 42 / 132
= 0.318
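A minimal Python sketch of the two calculations for the ungrouped example (the function name is an assumption):

```python
def range_and_coefficient(values):
    """Return (L - S, (L - S) / (L + S)) for the largest L and smallest S."""
    largest, smallest = max(values), min(values)
    return largest - smallest, (largest - smallest) / (largest + smallest)

data = [45, 67, 87, 55, 74, 81]
data_range, coefficient = range_and_coefficient(data)
print(data_range, round(coefficient, 3))
```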
Example: Calculate range and coefficient of
range for the
given data:
Class
0-10
11-20
21-30
31-40
41-50
Frequency
10
Range = Upper limit of the highest class - Lower limit of the lowest class
= 50 - 0
= 50
Coefficient of Range = (50 - 0) / (50 + 0)
= 1
Merits:
Range is simple to understand and easy to calculate.
Range is the quickest way to get a measure of dispersion, although it is not accurate.
Limitations:
It is not based on all the observations in the data. It is computed based on the highest
and the lowest values and ignores the nature of dispersion among other values of
observations in the data set.
It is influenced by extreme values and hence fluctuates from sample to sample of a
population, even though the values that fall in between the highest and lowest values are
similar.
Range cannot be computed for frequency distributions with open-end classes.
Range fails to explain the character of the distribution between the two extreme
observations (i.e. L and S).
Range is unreliable as a measure of dispersion of the values within a distribution.
Uses:
Quality control experts analyze the dispersion of a product's quality. If the
dispersion is large, the quality keeps changing; if the dispersion is small,
the quality remains more or less the same.
Financial analysts are concerned about the dispersion of a firm's earnings. Widely
dispersed earnings, those varying from extremely high to low, indicate a higher risk
to stockholders and creditors than do earnings that remain relatively stable.
Quartile Deviation
Interquartile Range
The range calculated on the basis of the middle 50% of the observations is called the
interquartile range. The interquartile range is calculated from the observations that remain after
discarding one quartile of the observations at the lower end and another quartile of the
observations at the upper end of the distribution. Thus, the interquartile range is the
difference between the third quartile and the first quartile.
Interquartile range = Q3 - Q1
Quartile Deviation
Quartile deviation is defined as one half of the interquartile range. Quartile deviation gives
the average amount by which the two quartiles differ from the median. In a symmetrical
distribution, the quartiles Q3 and Q1 are equidistant from the median, i.e. Median - Q1 = Q3 -
Median.
$\text{Quartile deviation (Q.D.)} = \frac{Q_3 - Q_1}{2}$
The relative measure of quartile deviation is called the coefficient of quartile deviation. It can
be used to compare the degree of variation in different distributions.
$\text{Coefficient of Q.D.} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$
For Ungrouped Data

Lower quartile (Q1) = ((N + 1)/4)th observation
Upper quartile (Q3) = (3(N + 1)/4)th observation
where,
N = total number of observations
Example: The sales figures of a company are
given below. Calculate the quartile deviation for
the sales data.
Month & Year          April 02   May 02   June 02   July 02   Aug. 02   Sept. 02   Oct. 02
Sales (in Rs. 000)    15.6       16.3     18.1      19.5      20.4      21.5       22.7
Q1 = ((7 + 1)/4)th observation = 2nd observation
Q3 = (3(7 + 1)/4)th observation = 6th observation
The 2nd observation is 16.3 and the 6th observation is 21.5
Quartile deviation (Q.D.) = (21.5 - 16.3) / 2
= 2.6
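The positional rule for ungrouped data can be sketched in Python as below. The helper `value_at` and its linear interpolation for fractional positions are assumptions added for generality; in this example both positions are whole numbers, so no interpolation takes place. The sketch assumes at least three observations.

```python
def value_at(sorted_values, position):
    """Value at a 1-based position; interpolates linearly if it is fractional."""
    lower = int(position)
    fraction = position - lower
    if fraction == 0:
        return sorted_values[lower - 1]
    step = sorted_values[lower] - sorted_values[lower - 1]
    return sorted_values[lower - 1] + fraction * step

sales = sorted([15.6, 16.3, 18.1, 19.5, 20.4, 21.5, 22.7])
n = len(sales)
q1 = value_at(sales, (n + 1) / 4)        # 2nd observation
q3 = value_at(sales, 3 * (n + 1) / 4)    # 6th observation
quartile_deviation = (q3 - q1) / 2
print(q1, q3, round(quartile_deviation, 2))
```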
For Grouped Data

$Q_1 = L_1 + \left( \frac{N/4 - C}{f} \right) h$
$Q_3 = L_3 + \left( \frac{3N/4 - C}{f} \right) h$
where,
$L_1$ = the lower boundary of the first quartile class (Q1)
$L_3$ = the lower boundary of the third quartile class (Q3)
N = total frequency
f = frequency of the quartile class
h = class interval (width)
C = cumulative frequency of the class preceding the quartile class
Example: The wages of employees are given below. Calculate the quartile deviation and
coefficient of quartile deviation.
Cumulative Frequency Table
Wages        No. of Employees   Cumulative Frequency
1501-2500    3                  3
2501-3500    10                 13
3501-4500    15                 28
4501-5500    12                 40
5501-6500    2                  42
Q1 = ((42 + 1)/4)th observation = 10.75th observation
This observation will fall in class (2501-3500)
L1 = 2501, C = 3, f = 10, h = 1000
Q1 = 2501 + ((42/4 - 3) / 10) × 1000
= 2501 + 750
= 3251
Q3 = (3(42 + 1)/4)th observation = 32.25th observation
This observation will fall in class (4501-5500)
L3 = 4501, C = 28, f = 12
Q3 = 4501 + ((3 × 42/4 - 28) / 12) × 1000
= 4501 + 291.667
= 4792.667
Quartile Deviation = (4792.667 - 3251) / 2
= 770.833
Coefficient of Q.D. = (4792.667 - 3251) / (4792.667 + 3251)
= 1541.667 / 8043.667
= 0.192
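The grouped calculation can be sketched in Python as follows. The function name `grouped_quartile` is an assumption; the sketch locates the quartile class with the same target kN/4 that it uses for interpolation, which picks the same classes as in the worked example. The coefficient is computed as (Q3 - Q1) / (Q3 + Q1).

```python
def grouped_quartile(k, lower_limits, frequencies, width):
    """k-th quartile (k = 1 or 3): L + ((k * N / 4 - C) / f) * h."""
    n = sum(frequencies)
    target = k * n / 4
    cumulative = 0                    # C: cumulative frequency before the class
    for lower, f in zip(lower_limits, frequencies):
        if cumulative + f >= target:
            return lower + (target - cumulative) / f * width
        cumulative += f
    raise ValueError("frequency table is empty")

lower_limits = [1501, 2501, 3501, 4501, 5501]
employees = [3, 10, 15, 12, 2]

q1 = grouped_quartile(1, lower_limits, employees, width=1000)
q3 = grouped_quartile(3, lower_limits, employees, width=1000)
quartile_deviation = (q3 - q1) / 2
coefficient_qd = (q3 - q1) / (q3 + q1)
print(q1, round(q3, 3), round(quartile_deviation, 3), round(coefficient_qd, 3))
```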
Merits:
Q.D. can be used as a measure of variation for open-ended distributions.
Q.D. is a better measure of variation for highly skewed distributions or
distributions with extreme values, as Q.D. is not affected by the presence
of extreme values.
Limitations:
As Q.D. is calculated using only 50% of the total observations, it
cannot be regarded as a good measure of variation.
Q.D. is not a real measure of variation as it does not measure the
scatter of observations from the average. Q.D. is only a positional
average.
Mean Deviation
Calculation of mean deviation for ungrouped data
Calculate the sample mean
Subtract the mean from every value in the data set and ignore the positive or
negative signs
Add all the differences and divide the sum by the number of items in the sample
$\text{Absolute Mean Deviation} = \frac{\sum |x - \bar{x}|}{n}$ (for a sample)
Example: The maximum day temperature was recorded for 10 days. Calculate the absolute
mean deviation.
Day   Temperature (°C)   Deviation from mean ($x - \bar{x}$)   Absolute deviation ($|x - \bar{x}|$)
1     25.0               1.12                                  1.12
2     24.8               0.92                                  0.92
3     25.2               1.32                                  1.32
4     24.6               0.72                                  0.72
5     24.0               0.12                                  0.12
6     23.7               -0.18                                 0.18
7     23.3               -0.58                                 0.58
8     23.0               -0.88                                 0.88
9     22.7               -1.18                                 1.18
10    22.5               -1.38                                 1.38
N = 10, $\sum x$ = 238.8, $\sum |x - \bar{x}|$ = 8.4
Mean ($\bar{x}$) = 238.8 / 10
= 23.88
Absolute Mean deviation = 8.4 / 10
= 0.84
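The three steps for ungrouped data can be sketched in Python as below (the function name is an assumption):

```python
from statistics import fmean

def mean_absolute_deviation(values):
    """Average of the absolute deviations of the values from their mean."""
    mean = fmean(values)
    return fmean(abs(x - mean) for x in values)

temperatures = [25.0, 24.8, 25.2, 24.6, 24.0, 23.7, 23.3, 23.0, 22.7, 22.5]
mean_temperature = fmean(temperatures)
mad = mean_absolute_deviation(temperatures)
print(round(mean_temperature, 2), round(mad, 2))
```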
Example: Calculate mean deviation for the given data.
Class       0-200   201-400   401-600   601-800   801-1000
Frequency   32      108       67        28        14
Solution:
Class Interval   Frequency (f)   Mid-value of class interval (X)   fX         $|X - \bar{x}|$   $f|X - \bar{x}|$
0-200            32              100                               3200       307.0879          9826.8128
201-400          108             300.5                             32454      106.5879          11511.493
401-600          67              500.1                             33506.7    93.0121           6231.8107
601-800          28              700.1                             19602.8    293.0121          8204.3388
801-1000         14              900.1                             12601.4    493.0121          6902.1694
Total            N = 249         2500.8                            101364.9                     42676.624
Hint:
Use MS Excel to demonstrate the example
Mean ($\bar{x}$) = 101364.9 / 249
= 407.0879
Absolute Mean Deviation = 42676.624 / 249
= 171.3920
Merits:
Absolute mean deviation is simple and easy to understand.
Absolute mean deviation is a more comprehensive measure of
dispersion as it is dependent on all observations of a distribution.
As it is obtained by taking the average of the deviations of every
observation from the mean, it is a true measure of dispersion.
Limitations:
Absolute mean deviation is less reliable as it is the arithmetic mean of
the absolute values (ignoring the positive and negative signs).
Absolute mean deviation is not conducive to further algebraic
treatment.
Absolute mean deviation cannot be computed for distributions with
open-end classes.
Variance ($\sigma^2$)
Steps for calculating variance for ungrouped data:
Calculate the sample mean
Subtract the mean from every value in the data set and square the difference
Add all the squared differences and divide the sum by the total number of items in the
sample
$\sigma^2 = \frac{\sum (x - \bar{x})^2}{n}$
Steps for calculating variance for grouped data:
Calculate the sample mean
$\bar{x} = \frac{\sum (f \cdot x)}{\sum f}$
where x is the mid-point of the class and f is the frequency of the class
Calculate the difference between the sample mean and the mid-point of the class
and square the difference
Multiply the frequency of the class and the squared difference. Add all the products
and divide the sum by the total frequency
$\sigma^2 = \frac{\sum f (x - \bar{x})^2}{\sum f}$
Standard Deviation ($\sigma$)
Standard deviation is the square root of the average of the squared distances of the
observations from the mean (i.e. the square root of the variance).
Standard deviation for ungrouped data,
$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$
Standard deviation for grouped data,
$\sigma = \sqrt{\frac{\sum f (x - \bar{x})^2}{\sum f}}$
Properties of Standard Deviation
Standard deviation is independent of change of origin
The value of the standard deviation remains the same if each observation in a series is
increased or decreased by a constant quantity.
For example, for the observations 3, 10 and 12,
$\bar{x}$ = 8.33 and $\sigma$ = 3.859
If we increase the value of each observation by 4.5 we get the observations 7.5, 14.5 and 16.5.
Now $\bar{x}$ = 12.833 and $\sigma$ = 3.859
Hence although $\bar{x}$ has increased by 4.5, $\sigma$ remains the same.
Standard deviation is dependent on the change of scale
For a given series, if each observation is multiplied or divided by a constant quantity, the
standard deviation will also be similarly affected.
Suppose we multiply each observation by 6; the observations
become 18, 60 and 72. Now $\bar{x}$ = 50 and $\sigma$ = 23.152,
which is nothing but the earlier $\sigma$ multiplied by 6, i.e. 3.859 × 6 (up to rounding).
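Both properties can be checked with a few lines of Python. The sketch uses `statistics.pstdev`, which divides by n as in the formulas of these notes; variable names are assumptions.

```python
from statistics import pstdev

base = [3, 10, 12]
shifted = [x + 4.5 for x in base]   # change of origin
scaled = [x * 6 for x in base]      # change of scale

sd_base = pstdev(base)
sd_shifted = pstdev(shifted)
sd_scaled = pstdev(scaled)
print(round(sd_base, 3), round(sd_shifted, 3), round(sd_scaled, 3))
```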
Standard deviation is the minimum root-mean-square deviation
The sum of the squares of the deviations of the items of any series from a value other than the
arithmetic mean would always be greater.
Just as it is possible to compute the combined mean of two or more groups, it is also
possible to compute the combined standard deviation of two or more groups. The combined
standard deviation, denoted by $\sigma_{12}$, is computed as follows:
$\sigma_{12} = \sqrt{\frac{n_1 \sigma_1^2 + n_2 \sigma_2^2 + n_1 d_1^2 + n_2 d_2^2}{n_1 + n_2}}$
where,
$\sigma_1$ = standard deviation of the first group
$\sigma_2$ = standard deviation of the second group
$d_1 = \bar{x}_1 - \bar{x}_{12}$
$d_2 = \bar{x}_2 - \bar{x}_{12}$
$\bar{x}_{12} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$
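As a sketch, the combined standard deviation of two groups can be compared with the standard deviation computed directly from the pooled observations. The two groups below are illustrative choices, the function name `combined_sd` is an assumption, and the population form of the standard deviation (`statistics.pstdev`, dividing by n) is used throughout.

```python
from math import sqrt
from statistics import fmean, pstdev

def combined_sd(n1, mean1, sd1, n2, mean2, sd2):
    """Combined SD of two groups from their sizes, means and SDs."""
    mean12 = (n1 * mean1 + n2 * mean2) / (n1 + n2)   # combined mean
    d1 = mean1 - mean12
    d2 = mean2 - mean12
    total = n1 * sd1**2 + n2 * sd2**2 + n1 * d1**2 + n2 * d2**2
    return sqrt(total / (n1 + n2))

group1 = [3, 10, 12]                  # illustrative data
group2 = [15, 13, 17, 16, 18, 20]     # illustrative data

combined = combined_sd(len(group1), fmean(group1), pstdev(group1),
                       len(group2), fmean(group2), pstdev(group2))
direct = pstdev(group1 + group2)      # SD of all nine observations together
print(round(combined, 4), round(direct, 4))
```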
Coefficient of Variation
The coefficient of variation is a measure of relative dispersion and is given by
$\text{Coefficient of variation (\%)} = \frac{\sigma}{\bar{x}} \times 100$
The coefficient of variation measures the spread of a set of data as a proportion of its
mean. It is used in problem situations where we want to compare the variability,
homogeneity, stability, uniformity and consistency of two or more data sets. The data set
for which the coefficient of variation is greater is said to be more variable i.e. less
consistent or less homogeneous. On the other hand, if the coefficient of variation is less it
is said to be less variable i.e., more consistent or more homogeneous.
Example 1: Find the standard deviation and the coefficient of
variation for the given data.
$x_i$   $(x_i - \bar{x})$   $(x_i - \bar{x})^2$
15      -1.5                2.25
13      -3.5                12.25
17      0.5                 0.25
16      -0.5                0.25
18      1.5                 2.25
20      3.5                 12.25
Sum = 99                    29.50
$\bar{x}$ = 99 / 6 = 16.5
$\sigma^2$ = 29.50 / 6 = 4.917
$\sigma$ = 2.217
Coefficient of variation (%) = (2.217 / 16.5) × 100
= 13.44
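Example 1 can be reproduced with a short Python sketch. It assumes the population form of the standard deviation (division by n), which is what `statistics.pstdev` computes; the function name is an assumption.

```python
from statistics import fmean, pstdev

def coefficient_of_variation(values):
    """Standard deviation as a percentage of the mean."""
    return pstdev(values) / fmean(values) * 100

data = [15, 13, 17, 16, 18, 20]
sd = pstdev(data)
cv = coefficient_of_variation(data)
print(round(fmean(data), 2), round(sd, 3), round(cv, 2))
```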
Example 2: Find the standard deviation and
the coefficient of variation for the given data.
Class   Frequency (f)   Mid-point (x)   fx      $x - \bar{x}$   $(x - \bar{x})^2$   $f(x - \bar{x})^2$
0-10    6               5               30      -31.8375        1013.6264           6081.7584
11-20   8               15.5            124     -21.3375        455.2889            3642.3112
21-30   13              25.5            331.5   -11.3375        128.5389            1671.0057
31-40   15              35.5            532.5   -1.3375         1.7889              26.8335
41-50   18              45.5            819     8.6625          75.0389             1350.7002
51-60   20              55.5            1110    18.6625         348.2889            6965.778
Sum     80                              2947    -38.525         2022.5709           19738.387
$\bar{x}$ = 2947 / 80 = 36.8375
$\sigma^2$ = 19738.387 / 80 = 246.7298
$\sigma$ = 15.7076
Coefficient of variation (%) = (15.7076 / 36.8375) × 100
= 42.6402
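The grouped calculation of Example 2 can be sketched in Python as below. The mid-points and frequencies follow the example; the first mid-point (5) and the first two frequencies (6 and 8) are inferred from the fx and deviation columns, and the function name `grouped_stats` is an assumption.

```python
from math import sqrt

def grouped_stats(midpoints, frequencies):
    """Return mean, variance, SD and CV (%) of a frequency distribution."""
    n = sum(frequencies)
    mean = sum(f * x for f, x in zip(frequencies, midpoints)) / n
    variance = sum(f * (x - mean) ** 2 for f, x in zip(frequencies, midpoints)) / n
    sd = sqrt(variance)
    return mean, variance, sd, sd / mean * 100

midpoints = [5, 15.5, 25.5, 35.5, 45.5, 55.5]
frequencies = [6, 8, 13, 15, 18, 20]
mean, variance, sd, cv = grouped_stats(midpoints, frequencies)
print(round(mean, 4), round(variance, 4), round(sd, 4), round(cv, 2))
```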
