Descriptive Statistics


Chapter: Descriptive statistics

Let us first look at the following definitions:

1) Variable: It is a quantity which varies from individual to individual.
e.g. Height, Weight etc…
2) Continuous variable: It is a quantity which can take any numerical value within a certain range.
e.g. Height, Weight etc…
3) Discrete or discontinuous variable: It is a quantity which is not capable of taking all possible values within a range; it takes only certain isolated values.
e.g. Number of students in class
4) Types of series:
a) Individual observations (Ungrouped data): These are the observations where
frequencies are not given.
i.e. 𝑥1 , 𝑥2 , 𝑥3 , … , 𝑥𝑛
b) Discrete series (Grouped data): It is a series of observations of the form
𝒙: 𝑥1 𝑥2 𝑥3 …… 𝑥𝑛
𝒇: 𝑓1 𝑓2 𝑓3 …… 𝑓𝑛
c) Continuous series (Grouped data): It is a series of observations of the form
Class interval 𝑐1 − 𝑐2 𝑐2 − 𝑐3 𝑐3 − 𝑐4 …… 𝑐𝑛 − 𝑐𝑛+1
𝒇: 𝑓1 𝑓2 𝑓3 …… 𝑓𝑛

There are three commonly used averages, called measures of central tendency: Mean, Median and Mode. The mean may again be of three types: Arithmetic mean (A.M.), Geometric mean (G.M.) and Harmonic mean (H.M.).
Arithmetic mean (A.M.)
The arithmetic mean is often simply called the 'average'.
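Mean for ungrouped data:
If $x_1, x_2, x_3, \dots, x_n$ are $n$ observations, then the arithmetic mean is defined and denoted by
$$\bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n} \quad \text{(Direct method)}$$
$$\text{OR} \quad \bar{x} = A + \frac{\sum_{i=1}^{n} d_i}{n} \quad \text{(Short-cut method)}$$
where $A$ is an assumed mean (an arbitrary number, usually chosen from the $x_i$) and $d_i = x_i - A$ is called the deviation of $x_i$ from $A$.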
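Mean of grouped data:
If the values $x_1, x_2, x_3, \dots, x_n$ occur with corresponding frequencies $f_1, f_2, f_3, \dots, f_n$, then
$$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} \quad \text{(Direct method)}$$
$$\text{OR} \quad \bar{x} = A + \frac{\sum_{i=1}^{n} f_i d_i}{\sum_{i=1}^{n} f_i} \quad \text{(Short-cut method)}$$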
Remark: For grouped data we assume that the frequencies in each class are centred at its class mark (the mid value of a class is called its class mark; e.g. the class mark of 0-5 is 2.5 and the class mark of 10-20 is 15). In the above formulas, $x_i$ denotes the class mark.
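The direct and short-cut formulas can be checked against each other with a minimal Python sketch (the function names `mean_direct` and `mean_shortcut` are illustrative, not from the source). Because the short-cut method only shifts the origin to A, any choice of A gives the same mean.

```python
def mean_direct(x, f):
    """Direct method: sum(f_i * x_i) / sum(f_i)."""
    return sum(fi * xi for xi, fi in zip(x, f)) / sum(f)


def mean_shortcut(x, f, A):
    """Short-cut method: A + sum(f_i * d_i) / sum(f_i), with d_i = x_i - A."""
    d = [xi - A for xi in x]
    return A + sum(fi * di for di, fi in zip(d, f)) / sum(f)


# Data of Example 1.3 (wages and number of workers)
wages = [100, 200, 300, 400, 500]
workers = [3, 5, 6, 9, 2]
print(mean_direct(wages, workers))         # 308.0
print(mean_shortcut(wages, workers, 300))  # 308.0
```

For ungrouped data, pass a frequency of 1 for every observation.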
1.1 Find the mean of the temperature recorded in degrees centigrade
during a week of April 2014. 38.2, 40.9, 39, 44, 40.6, 39.5, 40.5
Ans: 40.386

1.2 The marks obtained by 10 students in class XII in a Mathematics
examination are 25, 30, 21, 55, 40, 45, 17, 48, 35, 42. Find the mean.
Ans: 35.8
1.3 Find the average wage for the construction of a building from the following wages paid to different workers: Win 17
Wages: 100 200 300 400 500
No. of workers: 3 5 6 9 2
Ans: 308
1.4 Find the arithmetic mean from the following frequency distribution:
𝒙: 5 6 7 8 9 10 11 12 13 14
𝒇: 25 45 90 165 112 96 81 26 18 12
Ans: 8.83
1.5 Find the arithmetic mean from the following data:
𝒙: 35 45 55 60 75 80
𝒇: 12 18 10 6 3 11
Ans: 54.083
1.6 For the following table, find mean by short cut method:
Class interval: 100-120 120-140 140-160 160-180 180-200 200-220 220-240
𝒇: 10 8 4 4 3 1 2
Ans: 145.625

1) Exclusive class: When a class interval excludes its upper limit, it is known as an exclusive class.
e.g. $\{x : a \le x < b\} = [a, b)$ is an exclusive class.
2) Inclusive class: When a class interval includes its upper limit, it is known as an inclusive class.
e.g. $\{x : a \le x \le b\} = [a, b]$ is an inclusive class.
Remark: To ensure the continuity of the class limits and to get correct class boundaries, the exclusive method of classification should be adopted. To convert given inclusive classes into exclusive ones, adjust them as follows:
a) Find the difference between the lower limit of the second class and the upper limit of the first class.
b) Divide the difference found in (a) by 2.
c) Subtract the value obtained in (b) from all the lower limits and add it to all the upper limits.
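Steps (a) to (c) can be sketched in Python as follows; `to_exclusive` is a hypothetical helper, and it assumes the gap between consecutive classes is the same throughout. The data are those of Example 1.7 below.

```python
def to_exclusive(classes):
    """Steps (a)-(c): widen inclusive classes [(lower, upper), ...] by half the gap."""
    gap = classes[1][0] - classes[0][1]    # (a) lower limit of 2nd - upper limit of 1st
    half = gap / 2                         # (b)
    return [(lo - half, hi + half) for lo, hi in classes]   # (c)


visitors = [(1, 10), (11, 20), (21, 30), (31, 40), (41, 50), (51, 60)]
exclusive = to_exclusive(visitors)
print(exclusive[0], exclusive[-1])  # (0.5, 10.5) (50.5, 60.5)

# Class marks (mid values) are unchanged by the conversion.
marks = [(lo + hi) / 2 for lo, hi in exclusive]
print(marks[0])  # 5.5
```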
1.7 The following table gives the number of visitors to a zoo on 180 days. Find the A.M.
No. of visitors: 1-10 11-20 21-30 31-40 41-50 51-60
No. of days: 22 28 35 45 30 20
Ans: 30.667
1.8 Ten coins were tossed together and the number of tails was observed. The operation was performed 1050 times, and the frequencies obtained for different numbers of tails (𝑥) are shown in the following table. Calculate the arithmetic mean by the short-cut method.
𝒙: 0 1 2 3 4 5 6 7 8 9 10
𝒇: 2 8 43 133 207 260 213 120 54 9 1
Ans: 5.0114
1.9 For the following table, find mean by short cut method:
Class interval: 0-10 10-20 20-30 30-40 40-50 50-60 60-70 70-80
𝒇: 3 8 12 15 18 16 11 5
Ans: 42.5

Step deviation method

For grouped data with equal class intervals, the calculation of the mean can be simplified further. In this case, the arithmetic mean is obtained by the formula
$$\bar{x} = A + \frac{\sum_{i=1}^{n} f_i d_i}{\sum_{i=1}^{n} f_i} \times h$$
where $A$ is an assumed mean (an arbitrary number, usually chosen from the $x_i$),
$h$ is the width of the class interval, and
$d_i = \dfrac{x_i - A}{h}$ is the step deviation of $x_i$ from $A$.
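A minimal Python sketch of the step deviation formula, using the data of Example 1.12 below (class marks 5, 15, ..., 55). The function name is illustrative and the assumed mean A = 35 is an arbitrary choice; any A gives the same result.

```python
def mean_step_deviation(x, f, A, h):
    """x_bar = A + (sum f_i d_i / sum f_i) * h, with d_i = (x_i - A) / h."""
    d = [(xi - A) / h for xi in x]
    return A + sum(fi * di for fi, di in zip(f, d)) / sum(f) * h


# Example 1.12: class marks of 0-10, 10-20, ..., 50-60
marks = [5, 15, 25, 35, 45, 55]
students = [40, 41, 55, 30, 21, 16]
print(round(mean_step_deviation(marks, students, A=35, h=10), 2))  # 24.95
```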

1.10 Calculate the mean by step deviation method for the following data:
𝒙: 5 10 15 20 25 30
𝒇: 21 44 70 65 71 40
Ans: 18.875
1.11 Calculate the mean by step deviation method for the following data:
Weights: 2.0-2.4 2.5-2.9 3.0-3.4 3.5-3.9 4.0-4.4 4.5-4.9
No. of children: 17 97 187 135 28 6
Ans: 3.283
1.12 Calculate the average marks of the students by step deviation method:
Marks: 0-10 10-20 20-30 30-40 40-50 50-60
No. of students: 40 41 55 30 21 16
Ans: 24.95

1.13 Calculate the mean by step deviation method for the following data:
Class: 10-19 20-29 30-39 40-49 50-59
Frequency: 1 1 15 10 20
Ans: 44.5
1.14 The following table gives the distribution of companies according to size
of capital. Find the mean size of the capital of a company.
Capital (₹ in lacs): <5 <10 <15 <20 <25 <30
No. of companies: 20 27 29 38 48 53
Ans: 12.22 lacs
1.15 Following is the distribution of marks obtained by 60 students in a
Mathematics test of 60 marks :

Marks: >0 >10 >20 >30 >40 >50
No. of students 60 56 40 20 10 3
Find the arithmetic mean.
Ans: 26.5
1.16 Find the average marks of students from the following table:
Marks above: 0 10 20 30 40 50 60 70 80 90 100
Number of students: 80 77 72 65 55 43 23 16 10 8 0
Ans: 51.125

Weighted arithmetic mean

Suppose that $w_1, w_2, w_3, \dots, w_n$ are the weights assigned to the values $x_1, x_2, x_3, \dots, x_n$ respectively. Then the weighted arithmetic mean is defined as follows:
$$\bar{x} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$
Properties of Arithmetic mean
1) The algebraic sum of deviations from the mean is zero. If the mean of 𝑛 observations
𝑥1 , 𝑥2 , 𝑥3 , … , 𝑥𝑛 is 𝑥̅ then (𝑥1 − 𝑥̅ ) + (𝑥2 − 𝑥̅ ) + (𝑥3 − 𝑥̅ ) + … + (𝑥𝑛 − 𝑥̅ ) = 0.
i.e. ∑(𝑥 − 𝑥̅ ) = 0.
2) If $\bar{x}_1, \bar{x}_2, \bar{x}_3, \dots, \bar{x}_k$ are the means of $k$ series of sizes $n_1, n_2, n_3, \dots, n_k$ respectively, then the mean $\bar{x}$ of the composite series is given by
$$\bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2 + \dots + n_k \bar{x}_k}{n_1 + n_2 + \dots + n_k} = \frac{\sum n_i \bar{x}_i}{\sum n_i}$$
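The weighted mean and Property 2 reduce to the same computation: in the combined mean, the group sizes play the role of weights. A minimal Python sketch (the helper name `weighted_mean` is illustrative), checked on the data of Examples 1.17 and 1.19 below:

```python
def weighted_mean(values, weights):
    """sum(w_i * x_i) / sum(w_i)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Example 1.17: subject averages weighted by the number of students
avg_marks = [40, 70, 63, 78, 55]
students = [25, 30, 15, 42, 13]
print(weighted_mean(avg_marks, students))  # 64.288

# Property 2 as a weighted mean: 80% male at 5200, 20% female at 4200
print(weighted_mean([5200, 4200], [80, 20]))  # 5000.0
```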
1.17 The following table gives the number of students in different subjects of
Civil engineering class of SSASIT and the average marks obtained by them
in different subjects. Find the average marks obtained per subject.
Subjects Average Marks No. of
obtained students
AEM 40 25
SM 70 30
AM 63 15
EME 78 42
EEE 55 13
Ans: 64.288
1.18 Twelve tires were run on a group of test trucks and showed an average of 14560 miles. Similarly, 20 other tires showed an average of 13425 miles. What is the mean mileage of all the tires?
Ans: 13850.625

1.19 The average salary of male employees in a company is ₹ 5200 and that of
females is ₹ 4200. The mean salary of all the employees is ₹ 5000. Find
the percentage of male and female employees.
Ans: 80% and 20%
1.20 Comment on the performance of the students of the two universities given below:
GTU Mumbai University
Course Pass No. of Pass No. of
% students % students
BE (Computer) 65 130 60 150
BE (Civil) 75 150 80 120
BE (IT) 55 180 60 130
BE 60 130 65 150
Ans: Weighted pass percentage: GTU = 37400/590 ≈ 63.39%, Mumbai University = 36150/550 ≈ 65.73%; on this data, the performance of Mumbai University students is better.
1.21 The arithmetic mean of 50 items of a series was calculated by a student
as 20. However, it was later discovered that an item of 25 was misread as
35. Find the correct value of mean.

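Geometric mean
If $x_1, x_2, x_3, \dots, x_n$ are $n$ positive observations, then the geometric mean is defined as the $n$th root of their product,
i.e. $$\text{G.M.} = \sqrt[n]{x_1 \cdot x_2 \cdots x_n}$$
$$\Rightarrow \ln(\text{G.M.}) = \frac{1}{n}\left[\ln x_1 + \ln x_2 + \dots + \ln x_n\right] = \frac{\sum \ln x_i}{n}$$
For example, the G.M. of the numbers 50, 100 and 200 is $\text{G.M.} = \sqrt[3]{50 \times 100 \times 200} = \sqrt[3]{1000000} = 100$.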
For frequency data
$$\log \text{G.M.} = \frac{\sum f_i \log x_i}{\sum f_i}$$
Note: G.M. is a suitable measure for averaging rates of increase or decrease.
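A minimal Python sketch of the logarithmic formula for the G.M. (`geometric_mean` is an illustrative name). Natural logarithms are used; any base gives the same result provided the matching antilogarithm is taken. For rates of change, the G.M. is applied to the growth factors (1 + rate), not to the percentages themselves.

```python
import math


def geometric_mean(x, f=None):
    """exp(sum f_i ln x_i / sum f_i); all x_i must be positive."""
    if f is None:
        f = [1] * len(x)
    log_sum = sum(fi * math.log(xi) for xi, fi in zip(x, f))
    return math.exp(log_sum / sum(f))


print(round(geometric_mean([50, 100, 200]), 6))  # 100.0

# Rates of change (Example 1.25): average the factors 1 - 0.15, 1 - 0.20, 1 - 0.30
factor = geometric_mean([0.85, 0.80, 0.70])
print(f"average decrease = {(1 - factor) * 100:.1f}%")  # average decrease = 21.9%
```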

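Harmonic mean
If $x_1, x_2, x_3, \dots, x_n$ are $n$ non-zero observations, then the harmonic mean is defined as
$$\text{H.M.} = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \dots + \frac{1}{x_n}} = \frac{n}{\sum \frac{1}{x_i}}$$
For a frequency distribution with mid values $x_1, x_2, x_3, \dots, x_n$ and corresponding frequencies $f_1, f_2, f_3, \dots, f_n$,
$$\text{H.M.} = \frac{\sum f_i}{\sum \left(\frac{f_i}{x_i}\right)}$$
For example, the harmonic mean of 2, 4 and 8 is
$$\text{H.M.} = \frac{3}{\frac{1}{2} + \frac{1}{4} + \frac{1}{8}} = \frac{3}{0.875} \approx 3.429$$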

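Note: The harmonic mean is useful for averaging speeds (rates). When different distances are covered at different speeds, the average speed is the weighted H.M. of the speeds with the distances as weights, i.e. total distance divided by total time.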
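A minimal Python sketch of the H.M. with optional weights (the helper name is illustrative). The second call uses made-up numbers to show the weighted form for speeds: 30 km at 30 km/hr and 120 km at 60 km/hr take 1 + 2 = 3 hours for 150 km.

```python
def harmonic_mean(x, f=None):
    """H.M. = sum(f_i) / sum(f_i / x_i); all x_i must be non-zero."""
    if f is None:
        f = [1] * len(x)
    return sum(f) / sum(fi / xi for xi, fi in zip(x, f))


print(round(harmonic_mean([2, 4, 8]), 4))  # 3.4286

# Average speed: weight each speed by the distance covered at that speed.
speeds = [30, 60]       # km/hr
distances = [30, 120]   # km
print(harmonic_mean(speeds, distances))  # 50.0
```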

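Relation between A.M., G.M. and H.M.

For positive observations,
$$\text{A.M.} \ge \text{G.M.} \ge \text{H.M.},$$
with equality only when all the observations are equal. For two positive observations,
$$\text{G.M.} = \sqrt{\text{A.M.} \times \text{H.M.}}$$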
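Both relations can be checked numerically with Python's standard `statistics` module (`geometric_mean` needs Python 3.8 or later). This is only a spot check on sample values, not a proof.

```python
import math
import statistics

data = [2, 4, 8]
am = statistics.mean(data)
gm = statistics.geometric_mean(data)
hm = statistics.harmonic_mean(data)
print(am >= gm >= hm)  # True

# For exactly two positive values, G.M. = sqrt(A.M. * H.M.).
a, b = 4, 9
pair_am = (a + b) / 2
pair_hm = 2 * a * b / (a + b)
print(math.isclose(math.sqrt(a * b), math.sqrt(pair_am * pair_hm)))  # True
```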
1.22 Find G.M. for the following distribution:
Wages paid 60 62 64 68 70
No. of persons 3 2 4 2 4
Ans: 64.9561
1.23 Find G.M. for the following data of football players.
Goals 0-10 10-20 20-30 30-40 40-50
No. of players 5 9 10 16 4
Ans: 22.37
1.24 The price of cereals increased by 5% from 2005 to 2006, 8% from 2006
to 2007 and 77% from 2007 to 2008. The average increase from 2005 to
2008 is quoted as 26.2% and not 30%. Explain and verify the result.
1.25 Find the average rate of decrease in the number of visitors to a museum if it came down by 15% on Monday, 20% on Tuesday and 30% on Wednesday.
Ans: Decrease by 21.93%
1.26 Find H.M. for the following data:
Marks 40-50 50-60 60-70 70-80 80-90 90-100
No. of students 12 10 15 17 8 3
Ans: 63.021
1.27 Suppose a car moves 50km with a speed of 60km/hr., then 55 km with a
speed of 70 km/hr. and next 70km at a speed of 80 km/hr. Find the
average speed.
Ans: 70.17 km/hr (total distance 175 km ÷ total time ≈ 2.494 hr)

Median
For this measure, the observations are first arranged in ascending or descending order of magnitude; the median is then defined as the middlemost (central) value of the set of observations.
It divides the arranged observations into two equal parts.
Median (M) for ungrouped data
First arrange the given observations in ascending or descending order of their magnitudes.
Case 1: When the number of observations ($n$) is odd,
$$M = \left(\frac{n+1}{2}\right)^{\text{th}} \text{ observation}$$
Case 2: When the number of observations ($n$) is even,
$$M = \frac{\left(\frac{n}{2}\right)^{\text{th}} \text{ observation} + \left(\frac{n}{2} + 1\right)^{\text{th}} \text{ observation}}{2}$$

2.1 Find the median of 7, 20, 42, 51, 22, 54, 0, 38, 77, 2, 17. Ans: 22
2.2 The marks obtained by 8 students are 10, 23, 18, 38, 65, 92, 40, 58. Find
median. Ans: 39
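The two cases translate directly into code. A minimal Python sketch (illustrative function name; positions are converted from 1-based to Python's 0-based indexing), checked on Examples 2.1 and 2.2:

```python
def median(values):
    """Median of ungrouped data using the odd/even cases above."""
    s = sorted(values)                 # arrange in ascending order
    n = len(s)
    if n % 2 == 1:                     # Case 1: ((n + 1) / 2)-th observation
        return s[(n + 1) // 2 - 1]
    # Case 2: average of the (n/2)-th and (n/2 + 1)-th observations
    return (s[n // 2 - 1] + s[n // 2]) / 2


print(median([7, 20, 42, 51, 22, 54, 0, 38, 77, 2, 17]))  # 22
print(median([10, 23, 18, 38, 65, 92, 40, 58]))          # 39.0
```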
Median for grouped data
Case 1: When the series is discrete
M = value of the variable $x$ for which the cumulative frequency is just greater than $N/2$,
where $N$ is the total of all frequencies.
2.3 Calculate the median for the following data:
No. of students: 6 4 16 7 8 2
Marks: 20 9 25 50 40 80
Ans: 25 marks
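The discrete rule can be sketched as follows: sort by the variable, accumulate the frequencies and stop at the first cumulative frequency that exceeds N/2. The helper name is illustrative; the data are those of Example 2.3.

```python
from itertools import accumulate


def discrete_median(x, f):
    """Value of x whose cumulative frequency is just greater than N/2."""
    pairs = sorted(zip(x, f))                 # arrange by the variable x
    half = sum(f) / 2
    cumulative = accumulate(fi for _, fi in pairs)
    for (xi, _), cf in zip(pairs, cumulative):
        if cf > half:
            return xi


# Example 2.3: marks and number of students
print(discrete_median([20, 9, 25, 50, 40, 80], [6, 4, 16, 7, 8, 2]))  # 25
```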

Case 2: When the series is continuous (less than frequency distribution)

$$M = L + \frac{h}{f}\left(\frac{N}{2} - C\right)$$
Where L = Lower limit of the median class
𝑁 = ∑ 𝑓 is sum of all frequencies
𝐶 = Cumulative frequency of the class preceding the median class
𝑓 = Frequency of the median class
ℎ = Width of the median class
Case 3: When the series is continuous ('more than' or 'greater than' type of frequency distribution)
$$M = U - \frac{h}{f}\left(\frac{N}{2} - C\right)$$
Where U= Upper limit of the median class
𝑁 = ∑ 𝑓 is sum of all frequencies
𝐶 = Cumulative frequency of the class succeeding the median class
𝑓 = Frequency of the median class
ℎ = Width of the median class
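A minimal Python sketch of the Case 2 formula for equal-width classes (the function name and argument layout are illustrative). A 'more than' table can first be converted into class frequencies by subtracting successive cumulative counts; the same function then gives the same median as the Case 3 formula.

```python
def median_grouped(lower, h, f):
    """M = L + (h/f)(N/2 - C) for equal-width classes with lower limits `lower`."""
    half = sum(f) / 2
    C = 0                                  # cumulative frequency before the class
    for L, fm in zip(lower, f):
        if C + fm > half:                  # median class: C.F. first exceeds N/2
            return L + h / fm * (half - C)
        C += fm
    raise ValueError("empty distribution")


# Example 2.4: classes 0-30, 30-60, ..., 150-180
print(round(median_grouped([0, 30, 60, 90, 120, 150], 30, [8, 13, 22, 27, 18, 7]), 3))  # 95.0
```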
2.4 Find the median from the following data: Win 18
Class: 0-30 30-60 60-90 90-120 120-150 150-180
Frequency: 8 13 22 27 18 7
Ans: 95
2.5 The following table gives the marks obtained by 50 students in statistics.
Find the median.
Marks: 10-14 15-19 20-24 25-29 30-34 35-39 40-44 45-49
No. of students: 4 6 10 5 7 3 9 6
Ans: 29.5 marks
2.6 The following table gives the weekly expenditure of 100 workers. Find the
median weekly expenditure.

Weekly expenditure: 0-10 10-20 20-30 30-40 40-50
Number of workers: 14 23 27 21 15
Ans: 24.815
2.7 From the following data calculate the median: Sum 15
Marks(less than): 5 10 15 20 25 30 35 40 45
No. of students: 29 224 465 582 634 644 650 653 655
Ans: 12.147
2.8 Find the median of the following data:
Age greater than (in years): 0 10 20 30 40 50 60 70
No. of persons: 230 218 200 165 123 73 28 8
Ans: 41.6 years
2.9 Find the median of the following data: Win 17
Marks <20 21-30 31-40 41-50 51-60 61-70
No. of students 5 15 20 6 6 8
2.10 The following table gives the distribution of daily wages of 900 workers.
However the frequencies of classes 40-50 and 60-70 are missing. If the
median of the distribution is ₹ 59.25, find the missing frequencies.
Wages (in ₹): 30-40 40-50 50-60 60-70 70-80
No. of workers: 120 ? 200 ? 185
Ans: 𝑓1 = 145, 𝑓2 = 250
2.11 The following incomplete table gives the number of students in different
age groups of a town. If the median of the distributions is 11 years, find out
the missing frequencies.
Age group: 0-5 5-10 10-15 15-20 20-25 25-30 Total
No. of students: 15 125 ? 66 ? 4 300
Ans: 50, 40
Mode
Mode is defined as the value of the variable which occurs most frequently in a set of observations. When a frequency distribution is given, the mode is the value of the variable which has the maximum frequency.
For example: 1) 2, 4, 2, 5, 7, 2, 8, 9
It is observed that 2 occurs most frequently. ∴ Mode = 2.
2) For the following data, find mode.
Variable (𝒙): 3 4 5 6
Frequency (𝒇): 15 20 19 10
Here maximum frequency is 20 for 𝑥 = 4. ∴ Mode = 4.
Remark: Sometimes two or more values occur with the same (maximum) frequency; the distribution is then called bimodal or multimodal.
e.g. For the series 40, 31, 42, 35, 31, 40, 65, 22, the distribution is bimodal and the modes are 31 and 40.

For frequency data

In any one (or more) of the following cases:
(i) the maximum frequency is repeated,
(ii) the maximum frequency occurs at the very beginning or at the end of the distribution,
(iii) there are irregularities in the distribution,
the value of the mode is determined by the method of grouping.
Remark: If two values have the same maximum frequency, the empirical formula for the mode can be used instead.
3.1 Find the mode of the following frequency distribution:
Size (𝑥): 1 2 3 4 5 6 7 8 9 10 11 12
Frequency (𝑓): 3 8 15 23 35 40 32 28 20 45 14 6
Ans: 6
3.2 Find the mode of the following frequency distribution:
𝑥: 10 11 12 13 14 15 16 17 18 19
𝑓: 8 15 20 100 98 95 90 75 50 30
Ans: The mode is ill-defined; using the empirical formula, Mode = 14.82

Mode for continuous frequency data:

$$\text{Mode} = L + h\left(\frac{f_m - f_1}{2f_m - f_1 - f_2}\right)$$
Where L = Lower limit of the modal class
𝑓𝑚 = Frequency of the modal class
𝑓1 = Frequency of the class preceding the modal class
𝑓2 = Frequency of the class succeeding the modal class
ℎ = Width of the modal class
This method of finding the mode is called the method of interpolation. The formula is applicable only to a unimodal frequency distribution.
Remark: 1) If $2f_m - f_1 - f_2 = 0$, then
$$\text{Mode} = L + h\left(\frac{f_m - f_1}{|f_m - f_1| + |f_m - f_2|}\right)$$
2) For a moderately asymmetrical distribution (a distribution in which the mean, median and mode do not coincide), the mode is approximately given by
$$\text{Mode} = 3\,\text{Median} - 2\,\text{Mean}$$
This is known as the empirical formula for the mode.
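A minimal Python sketch of the interpolation formula and the empirical formula (function names are illustrative). It assumes equal class widths, picks the first class with the maximum frequency, and treats a missing neighbouring class at either end as having frequency 0; that last choice is an assumption, not a rule from the text. The data are those of Example 3.6 below, with exclusive limits 29.5-34.5, 34.5-39.5, and so on.

```python
def mode_grouped(lower, h, f):
    """Mode = L + h (f_m - f_1) / (2 f_m - f_1 - f_2) for the modal class."""
    k = f.index(max(f))                        # first class with maximum frequency
    fm = f[k]
    f1 = f[k - 1] if k > 0 else 0              # assumed 0 if no preceding class
    f2 = f[k + 1] if k + 1 < len(f) else 0     # assumed 0 if no succeeding class
    return lower[k] + h * (fm - f1) / (2 * fm - f1 - f2)


def mode_empirical(mean, median):
    """Empirical formula: Mode = 3 Median - 2 Mean."""
    return 3 * median - 2 * mean


lower = [29.5, 34.5, 39.5, 44.5, 49.5, 54.5, 59.5]
freq = [3, 5, 12, 18, 14, 6, 2]
print(mode_grouped(lower, 5, freq))  # 47.5
```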

3.3 Find the mode for the following data:

36, 30, 26, 20, 32, 31
Ans: There is no mode.
3.4 Define Mode and also give the relationship between Mean, Median and Mode. Sum 16
3.5 Find mode for the following data: Win 17
Class interval: 0-50 50-100 100-150 150-200 200-250 250-300
Frequency: 10 5 25 30 10 20
3.6 The frequency distribution of marks obtained by 60 students of a class in a
college is given by
Marks: 30-34 35-39 40-44 45-49 50-54 55-59 60-64
Frequency: 3 5 12 18 14 6 2

Ans: 47.5
3.7 Calculate mode for the following data: Sum 18
Marks: 0-10 10-20 20-30 30-40 40-50 50-60 60-70
No. of students 5 15 20 20 32 14 14
3.8 Calculate the Mean, Median and Mode for the following data: Win 15
Class interval 50-53 53-56 56-59 59-62 62-65 65-68
Frequency 3 8 14 30 36 28
Class interval 68-71 71-74 74-77
Frequency 16 10 5
Ans: Mean = 63.82, Median = 63.67, Mode = 63.37
3.9 Calculate mode for the following data:
Class interval: 0-10 10-20 20-30 30-40 40-50
Frequency: 45 20 14 7 3
Ans: 1.47
3.10 The following table gives the incomplete income distribution of 300 workers
of a company, where the frequencies of the classes 3000-4000 and 5000-
6000 are missing. If the mode of the distribution is Rs. 4428.57, find the
missing frequencies.
Monthly income No. of workers
1000-2000 30
2000-3000 35
3000-4000 ?
4000-5000 75
5000-6000 ?
6000-7000 30
7000-8000 15

Ans: 60,55

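Standard deviation (σ) (square root of the mean of the squared deviations from the mean)
If only the observations are given, then
$$\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}} = \sqrt{\frac{\sum x_i^2}{n} - \left(\frac{\sum x_i}{n}\right)^2} = \sqrt{\text{Mean of squares} - \text{Square of mean}}$$
In the case of a frequency distribution,
$$\sigma = \sqrt{\frac{\sum f_i (x_i - \bar{x})^2}{N}} = \sqrt{\frac{\sum f_i x_i^2}{N} - \left(\frac{\sum f_i x_i}{N}\right)^2}; \quad N = \sum f_i$$
or, by the step deviation method,
$$\sigma = h \times \sqrt{\frac{\sum f_i d_i^2}{N} - \left(\frac{\sum f_i d_i}{N}\right)^2}; \quad N = \sum f_i, \; d_i = \frac{x_i - A}{h}$$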
Variance
The variance is the square of the standard deviation and is denoted by $V(X)$,
i.e. $V(X) = \sigma^2$
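A minimal Python sketch of the "mean of squares minus square of mean" form and of the step deviation form, both with divisor N as in the formulas above (function names are illustrative). The data are those of Examples 4.1 and 4.4 below.

```python
import math


def std_dev(x, f=None):
    """sigma = sqrt(sum f x^2 / N - (sum f x / N)^2), divisor N."""
    if f is None:
        f = [1] * len(x)
    N = sum(f)
    mean = sum(fi * xi for xi, fi in zip(x, f)) / N
    mean_of_squares = sum(fi * xi * xi for xi, fi in zip(x, f)) / N
    # max() guards against a tiny negative value caused by rounding
    return math.sqrt(max(0.0, mean_of_squares - mean ** 2))


def std_dev_step(x, f, A, h):
    """Step deviation form: h times the s.d. of d_i = (x_i - A) / h."""
    return h * std_dev([(xi - A) / h for xi in x], f)


print(round(std_dev([45, 49, 55, 50, 41, 44, 60, 58, 53, 55]), 3))   # 5.967
print(round(std_dev_step([15, 25, 35, 45, 55, 65, 75],
                         [5, 12, 15, 20, 10, 4, 2], A=45, h=10), 3))  # 14.285
```

The variance is then simply `std_dev(x, f) ** 2`. The subtraction form is convenient by hand, but in floating point it can lose precision when the values are large relative to their spread; the deviation form $\sum f_i (x_i - \bar{x})^2 / N$ avoids that.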
4.1 Calculate the standard deviation of the weights of ten persons:
Weights: 45 49 55 50 41 44 60 58 53 55
Ans: 5.967

4.2 Find Arithmetic mean and standard deviation for the data given below: Sum 18
2,2,5,7,9,12,13,14,16,20 Ans:
4.3 Calculate the standard deviation of the following data:
𝒙: 10 11 12 13 14 15 16 17 18
𝒇: 2 7 10 12 15 11 10 6 3
Ans: 1.987
4.4 Find the standard deviation for the following data:
Marks: 10-20 20-30 30-40 40-50 50-60 60-70 70-80
No. of students: 5 12 15 20 10 4 2
Ans: 14.285
4.5 The pH of a solution is measured eight times by one operator using the same instrument. She obtains the following data: 7.15, 7.20, 7.18, 7.19, 7.21, 7.20, 7.16 and 7.18. Calculate the sample mean, the sample variance and the sample standard deviation. Sum 16
Ans: $\bar{x} = 7.184$, $V = 0.000427$, $\sigma = 0.02066$
4.6 If total sum of square is 20 and the sample variance is 5 then find the total
number of observations. Ans:
4.7 In environmental geology, computer simulation was employed to estimate how far a block from a collapsing rock wall bounces down a soil slope. Based on the depth, location and angle of the block-soil impact marks left on the slope by the actual rock fall, the following 10 rebound lengths (meters) were estimated. Compute the mean, median, mode, standard deviation and variance of the rebounds. Sum 17
10.2 9.5 8.3 9.7 9.5 11.1 7.8 8.8 9.5 10
Ans: 𝑥̅ = 9.44, 𝑀𝑒𝑑𝑖𝑎𝑛 = 9.5, 𝑀𝑜𝑑𝑒 = 9.5, 𝜎 = 0.9013, 𝑉 = 0.8124
4.8 Find the standard deviation for the following distribution of 300 cars according to their selling days: Win 17
No. of days 0-30 30-60 60-90 90-120 120-150 150-180 180-210
No. of cars 9 17 43 82 81 44 24
Ans: 42.51
4.9 Find the mean and standard deviation for the following data: Win 18
Class interval: 0-10 10-20 20-30 30-40 40-50 50-60 60-70
Frequency: 6 14 10 8 1 3 8
Ans: 30, 19.6214
4.10 Find the standard deviation of the following data:
Size of items: 10 11 12 13 14 15 16
Frequency: 2 7 11 15 10 4 1
Ans: 1.342
4.11 The mean of 5 observations is 4.4 and the variance is 8.24. If three of the observations are 1, 2 and 6, find the other two observations.
Ans: 𝑥1 , 𝑥2 = 4,9
Coefficient of variation
The standard deviation is an absolute measure of dispersion. The coefficient of variation is a relative measure of dispersion and is denoted by CV:
$$CV = \frac{\sigma}{\bar{x}} \times 100$$

The coefficient of variation has great practical significance and is widely used to compare the variability of two series. The series or group for which the coefficient of variation is greater is said to be more variable or less consistent; the series for which it is smaller is said to be less variable or more consistent.
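A minimal Python sketch of the CV comparison using the standard `statistics` module. `pstdev` (divisor n) is used to match this chapter's definition of σ; the function name is illustrative and the data are those of Example 5.1 below.

```python
import statistics


def coefficient_of_variation(data):
    """CV = sigma / mean * 100, with the population s.d. (divisor n)."""
    return statistics.pstdev(data) / statistics.mean(data) * 100


A = [85, 20, 62, 28, 74, 5, 69, 4, 13]
B = [72, 4, 15, 30, 59, 15, 49, 27, 26]
cv_a, cv_b = coefficient_of_variation(A), coefficient_of_variation(B)
print(f"CV_A = {cv_a:.2f}%, CV_B = {cv_b:.2f}%")
print("B is more consistent" if cv_b < cv_a else "A is more consistent")  # B is more consistent
```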
5.1 The runs scored by two batsmen A and B in 9 consecutive matches are given below: Win 18
A: 85 20 62 28 74 5 69 4 13
B: 72 4 15 30 59 15 49 27 26
Which of the batsmen is more consistent?
Ans: 𝐶𝑉𝐴 = 75.925%, 𝐶𝑉𝐵 = 64.18%, batsman B is more consistent.
5.2 Two automatic filling machines A and B are used to fill a mixture of cement
concrete in a beam. A random sample of beams on each machine showed the
following information.
Machine A: 32 28 47 63 71 39 10 60 96 14
Machine B: 19 31 48 53 67 90 10 62 40 80
Find the standard deviation of each machine and also comment on the
performances of the two machines.
Ans: 𝐶𝑉𝐴 = 55.423%, 𝐶𝑉𝐵 = 48.585%, Machine B is more consistent.

Partition values
These are the values which divide the series into a number of equal parts.
Quartiles: The three points which divide the series into four equal parts. The first, second and third points are known as the first, second and third quartiles respectively.
• The first quartile $Q_1$ is the point which has 25% of the observations before it and 75% after it.
• The second quartile $Q_2$ coincides with the median.
• The third quartile $Q_3$ is the point which has 75% of the observations before it and 25% after it.
Deciles: The nine points which divide the series into ten equal parts. The $i$th decile is denoted by $D_i$.
Percentiles: The ninety-nine points which divide the series into a hundred equal parts. The $i$th percentile is denoted by $P_i$.

Partition value | Discrete frequency data | Continuous frequency data
$Q_i$ | observation whose C.F. is just greater than $\frac{iN}{4}$ | $L + \frac{h}{f}\left(\frac{iN}{4} - c\right)$, using the class whose C.F. is just greater than $\frac{iN}{4}$
$D_i$ | observation whose C.F. is just greater than $\frac{iN}{10}$ | $L + \frac{h}{f}\left(\frac{iN}{10} - c\right)$, using the class whose C.F. is just greater than $\frac{iN}{10}$
$P_i$ | observation whose C.F. is just greater than $\frac{iN}{100}$ | $L + \frac{h}{f}\left(\frac{iN}{100} - c\right)$, using the class whose C.F. is just greater than $\frac{iN}{100}$
where $N = \sum f_i$; $L$, $f$ and $h$ are the lower limit, frequency and width of that class, and $c$ is the cumulative frequency of the class preceding it.
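All three continuous-data rows share one formula with a different fraction (i/4, i/10 or i/100). A minimal Python sketch (illustrative name) for equal-width classes, checked on Example 6.2 below; the open last class "50 and above" is assumed to be 50-60, which does not affect $Q_1$ or $D_3$.

```python
def partition_value(lower, h, f, fraction):
    """L + (h/f)(fraction*N - c) in the class whose C.F. first exceeds fraction*N.

    fraction = i/4 for Q_i, i/10 for D_i, i/100 for P_i (1/2 gives the median).
    """
    target = fraction * sum(f)
    c = 0                                   # C.F. of the preceding classes
    for L, fk in zip(lower, f):
        if c + fk > target:
            return L + h / fk * (target - c)
        c += fk
    raise ValueError("fraction must be less than 1")


# Example 6.2: 'more than' counts 500, 460, 400, 200, 100, 30 as class frequencies
lower = [0, 10, 20, 30, 40, 50]
freq = [40, 60, 200, 100, 70, 30]
print(partition_value(lower, 10, freq, 1 / 4))    # 21.25 (Q1)
print(round(partition_value(lower, 10, freq, 3 / 10), 2))   # 22.5 (D3)
```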

6.1 Eight coins were tossed together and the number of heads resulting was noted. The operation was repeated 256 times, and the frequencies (𝑓) obtained for different values of 𝑥, the number of heads, are shown in the following table. Calculate the median, quartiles, 4th decile and 27th percentile.
𝑥: 0 1 2 3 4 5 6 7 8
𝑓: 1 9 26 59 72 52 29 7 1
Ans: Median = 4, 𝑄1 = 3, 𝑄3 = 5, 𝐷4 = 4, 𝑃27 = 3
6.2 Following is the distribution of marks obtained by 500 candidates in
statistics paper of a civil services examination.
Marks more than: 0 10 20 30 40 50
Number of students: 500 460 400 200 100 30
Calculate the lower quartile marks. If 70% of the candidates pass in the
paper, find the minimum marks obtained by a pass candidate.
Ans: 𝑄1 = 21.25, 𝐷3 = 22.5
6.3 From the following table showing the wage distribution in a certain factory, determine:

(a) The mean wages (b) The median wages (c) The modal wages (d) The
wage limits for the middle 50% of the wage earners.
Ans: (a) 108.5 (b) 108.75 (c) 118.3 (d) 81.25, 129.3
6.4 Find the lower and upper quartiles, the fourth decile and the 60th percentile for the following data:
Weight(kg): 5-10 10-15 15-20 20-25 25-30 30-35 35-40 40-45
Number of boys: 5 6 15 10 5 4 2 2
Ans: 15.41,25.75, 17.86, 21.7

Averages (or measures of central tendency) give us an idea of the concentration of the observations about the central part of the distribution. If we know the average alone, we cannot form a complete idea about the distribution, as the following example makes clear.
Consider the series (i) 7, 8, 9, 10, 11 (ii) 3, 6, 9, 12, 15 and (iii) 1, 5, 9, 13, 17. In all three cases the number of observations is 5 and the mean is 9. If we are told only that the mean of 5 observations is 9, we cannot tell whether it is the average of the first, second or third series, or of any other series of 5 observations whose sum is 45. Thus measures of central tendency cannot give a complete idea of the distribution; they must be supported and supplemented by other measures. One such measure is dispersion.
Measures of dispersion
These are classified into two categories:
(i) Measures which express the spread of observations in terms of distances, e.g. range and quartile deviation.
(ii) Measures which express the spread of observations in terms of the average of the deviations of the observations from a central value, e.g. mean deviation and standard deviation.
Range: It is the difference between the two extreme observations of the distribution.
$$\text{Range} = x_{max} - x_{min}$$
Quartile deviation:
$$Q = \frac{Q_3 - Q_1}{2}$$
where $Q_1$ and $Q_3$ are the first and third quartiles respectively, and $Q_3 - Q_1$ is known as the interquartile range.
Quartile deviation is generally preferred to the range because it is based on the middle 50% of the data and is therefore not affected by extreme values.
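The three series discussed earlier make the point numerically: a minimal Python sketch showing that they share the mean 9 while their range and (population) standard deviation differ.

```python
import statistics

series = {
    "(i)": [7, 8, 9, 10, 11],
    "(ii)": [3, 6, 9, 12, 15],
    "(iii)": [1, 5, 9, 13, 17],
}
for name, xs in series.items():
    rng = max(xs) - min(xs)                 # Range = x_max - x_min
    print(name, statistics.mean(xs), rng, round(statistics.pstdev(xs), 3))
# (i) 9 4 1.414
# (ii) 9 12 4.243
# (iii) 9 16 5.657
```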
Mean deviation: If $x_1, x_2, x_3, \dots, x_n$ are $n$ observations and $A$ is the mean, median or mode, then
$$\text{Mean deviation from the average } A = \frac{\sum |x_i - A|}{n}$$
If $x_i$ are observations and $f_i$ the respective frequencies, then the mean deviation from the average $A$ (usually the mean, median or mode) is given by
$$\text{Mean deviation from the average } A = \frac{\sum f_i |x_i - A|}{N}; \quad N = \sum f_i$$
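A minimal Python sketch of the frequency form (illustrative name), checked on Example 7.6 below: class marks 5, 15, ..., 45 with median 25.

```python
def mean_deviation(x, f, A):
    """sum f_i |x_i - A| / N about a chosen average A (mean, median or mode)."""
    return sum(fi * abs(xi - A) for xi, fi in zip(x, f)) / sum(f)


print(mean_deviation([5, 15, 25, 35, 45], [10, 25, 30, 20, 15], 25))  # 9.5
```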
Standard deviation: If $x_i$ are observations and $f_i$ the respective frequencies, then the standard deviation is given by
$$\sigma = \sqrt{\frac{\sum f_i (x_i - \bar{x})^2}{N}}; \quad N = \sum f_i$$

Root mean square deviation: If $x_i$ are observations and $f_i$ the respective frequencies, then the root mean square deviation is given by
$$s = \sqrt{\frac{\sum f_i (x_i - A)^2}{N}}; \quad N = \sum f_i$$
where $A$ is any arbitrary number. $s^2$ is called the mean square deviation.

Remark: The relation between $\sigma$ and $s$ is
$$s^2 = \sigma^2 + d^2; \quad d = \bar{x} - A$$
Hence $s \ge \sigma$, with equality when $A = \bar{x}$; i.e. the standard deviation is the least value of the root mean square deviation.
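The relation $s^2 = \sigma^2 + d^2$ can be checked numerically. A minimal Python sketch (illustrative names) on the data of Example 4.10, where $\bar{x} = 12.8$ and $\sigma^2 = 1.8$:

```python
def mean_square_deviation(x, f, A):
    """s^2 = sum f_i (x_i - A)^2 / N about an arbitrary point A."""
    return sum(fi * (xi - A) ** 2 for xi, fi in zip(x, f)) / sum(f)


x = [10, 11, 12, 13, 14, 15, 16]
f = [2, 7, 11, 15, 10, 4, 1]
mean = sum(fi * xi for xi, fi in zip(x, f)) / sum(f)
sigma2 = mean_square_deviation(x, f, mean)        # A = mean gives sigma^2

for A in (10, 12, 13, 16):
    d = mean - A
    # s^2 about A equals sigma^2 + d^2, so it is never below sigma^2
    print(A, round(mean_square_deviation(x, f, A), 4), round(sigma2 + d * d, 4))
```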
Variance of the combined series
If $n_1, n_2$ are the sizes, $\bar{x}_1, \bar{x}_2$ the means and $\sigma_1, \sigma_2$ the standard deviations of two series, then the standard deviation $\sigma$ of the combined series of size $n_1 + n_2$ is given by
$$\sigma^2 = \frac{1}{n_1 + n_2}\left\{n_1(\sigma_1^2 + d_1^2) + n_2(\sigma_2^2 + d_2^2)\right\}$$
where $d_1 = \bar{x}_1 - \bar{x}$, $d_2 = \bar{x}_2 - \bar{x}$ and $\bar{x} = \dfrac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$ is the mean of the combined series.
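A minimal Python sketch of the combined mean and variance (the helper name `combine` is illustrative), checked on the two-firm wage example below: 500 workers with mean 186 and variance 81, and 600 workers with mean 175 and variance 100.

```python
def combine(n1, mean1, var1, n2, mean2, var2):
    """Mean and variance of two series taken together."""
    n = n1 + n2
    mean = (n1 * mean1 + n2 * mean2) / n
    d1, d2 = mean1 - mean, mean2 - mean
    var = (n1 * (var1 + d1 ** 2) + n2 * (var2 + d2 ** 2)) / n
    return mean, var


mean, var = combine(500, 186, 81, 600, 175, 100)
print(mean, round(var, 2))  # 180.0 121.36
```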

Coefficient of dispersion
The coefficients of dispersion (C.D.) based on different measures of dispersion are as follows:
(a) Based on range: $C.D. = \dfrac{x_{max} - x_{min}}{x_{max} + x_{min}}$
(b) Based on quartile deviation: $C.D. = \dfrac{Q_3 - Q_1}{Q_3 + Q_1}$
(c) Based on mean deviation: $C.D. = \dfrac{\text{Mean deviation}}{\text{Average from which it is calculated}}$
(d) Based on standard deviation: $C.D. = \dfrac{\sigma}{\bar{x}}$
7.1 For the following information, find the quartile deviation and its coefficient.
Weights: 15 16 17 18 19 20 21
No. of children: 8 14 15 13 19 23 17
Ans: Q.D. = 1.5, coefficient of Q.D.= 0.081
7.2 Calculate: (i) Quartile deviation and (ii) mean deviation from mean for the
following data:
Marks: 0-10 10-20 20-30 30-40 40-50 50-60 60-70
No. of students: 6 5 8 15 7 6 3
Ans: Q.D. = 11.23, M.D. (from mean) =13.184
7.2 For a group of 200 candidates, the mean and standard deviation of scores were found to be 40 and 15 respectively. Later it was discovered that the scores 43 and 35 were misread as 34 and 53 respectively. Find the corrected mean and standard deviation.
Ans: Corrected mean = 39.955, corrected 𝜎 = 14.97
An analysis of monthly wages paid to the workers of two firms A and B
belonging to the same industry gives the following results:
Firm A Firm B
Number of workers 500 600
Average daily wage Rs 186 Rs 175
Variance of distribution of wages 81 100
(𝑖) Which firm A or B has a larger wage bill?
(𝑖𝑖) In which firm A or B, is there greater variability in individual wages?
(𝑖𝑖𝑖) Calculate (𝑎) the average daily wage and (𝑏) the variance of the
distribution of wages of all the workers in the firms A and B taken together.
Ans: (𝑖) Firm B (𝑖𝑖) Firm B (𝑖𝑖𝑖) (𝑎) 180 (𝑏) 121.36
7.4 Find the quartile deviation and its coefficient. Also find the interquartile range and the coefficient of variation.
Marks: <35 35-37 38-40 41-43 >43
No. of students: 8 16 13 8 5
Ans: Q.D. = 2.7978, coefficient of Q.D.= 0.0729, 5.5956,7.3329
7.5 Find the mean deviation and its coefficient from the A.M. and mode for the
following data:
Product output: 145 155 165 175 185 195
workers: 4 6 10 18 9 3
Ans: A.M. = 171.2, M.D. about A.M. = 10.56, Mode = 175, M.D. about mode = 9.8
7.6 Find the M.D. about the median for the following distribution:

Income: 0-10 10-20 20-30 30-40 40-50
No. of workers: 10 25 30 20 15
Ans: Median = 25, M.D. about median= 9.5
7.7 On an external exam in calculus, the mean marks of a group of 125 students
were 75 and the S.D. was 6. In Physics, however, the mean marks of the
group were 70 and S.D. was 7. Find out (a) coefficient of S.D. (b) Coefficient
of variation.
Ans: 0.08, 8%, 0.1, 10%

Skewness
Skewness is a measure of the degree of asymmetry of a distribution. A distribution is said to be symmetrical when its mean, median and mode are equal and the frequencies are symmetrically distributed about the mean. When plotted on a graph, a symmetrical distribution gives a curve that is symmetric about the mean; the bell-shaped normal curve is the most familiar example.

A distribution is said to be asymmetrical or skewed when the mean, median and mode are not equal, i.e. they do not coincide.

Karl Pearson's Coefficient of Skewness

Karl Pearson's coefficient of skewness is denoted by $S_k$ and defined as
$$S_k = \frac{\text{Mean} - \text{Mode}}{\text{Standard deviation}} = \frac{\text{Mean} - \text{Mode}}{\sigma}$$
When the mode is ill-defined,
$$S_k = \frac{3(\text{Mean} - \text{Median})}{\text{Standard deviation}} = \frac{3(\text{Mean} - \text{Median})}{\sigma}$$
The coefficient of Skewness usually lies between −1 and 1.

For a positively skewed distribution, 𝑆𝑘 > 0.

For a negatively skewed distribution, 𝑆𝑘 < 0.

For a symmetrical distribution, 𝑆𝑘 = 0.
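Putting the pieces together for grouped data: a minimal Python sketch that computes the mean, the standard deviation (divisor N) and the interpolated mode, then $S_k = (\text{Mean} - \text{Mode})/\sigma$. It assumes equal class widths and a well-defined modal class that is not at either end; the function name is illustrative. The data are those of Example 8.1 below.

```python
import math


def pearson_skewness(mids, f, lower, h):
    """Sk = (mean - mode) / sigma for a grouped, equal-width distribution."""
    N = sum(f)
    mean = sum(fi * x for x, fi in zip(mids, f)) / N
    sigma = math.sqrt(sum(fi * (x - mean) ** 2 for x, fi in zip(mids, f)) / N)
    k = f.index(max(f))                            # modal class
    f1, f2 = f[k - 1], f[k + 1]                    # assumes an interior modal class
    mode = lower[k] + h * (f[k] - f1) / (2 * f[k] - f1 - f2)
    return (mean - mode) / sigma


# Example 8.1: wage classes 10-15, 15-20, ..., 45-50
lower = list(range(10, 50, 5))
mids = [L + 2.5 for L in lower]
freq = [8, 16, 30, 45, 62, 32, 15, 6]
print(round(pearson_skewness(mids, freq, lower, 5), 4))  # -0.2236
```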

8.1 Calculate Karl Pearson’s coefficient of Skewness from the following data:
Wages (₹): 10-15 15-20 20-25 25-30 30-35 35-40 40-45 45-50
No. of workers: 8 16 30 45 62 32 15 6
Ans: −0.223
8.2 Calculate Karl Pearson’s coefficient of skewness from the following data:
Weekly wages: 40-50 50-60 60-70 70-80 80-90 90-100 100-110 110-120 120-130 130-140
No. of workers: 5 6 8 10 25 30 36 50 60 70
Ans: −0.682
8.3 In a distribution, the mean = 65, median = 70, coefficient of skewness
=−0.6. Find the mode and coefficient of variation.
Ans: Mode = 80, CV = 38.46%

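Moment about arbitrary point

If $x_i$ are observations and $f_i$ the respective frequencies, then the $r$th moment about an arbitrary point $A$ is given by
$$\mu_r' = \frac{\sum (x_i - A)^r}{n} \quad \text{(ungrouped data)}$$
$$\text{OR} \quad \mu_r' = \frac{\sum f_i (x_i - A)^r}{N} = \frac{\sum f_i d_i^r}{N}; \quad d_i = x_i - A \quad \text{(frequency data)}$$
$$\text{OR} \quad \mu_r' = \frac{\sum f_i d_i^r}{N} \times h^r \quad \text{(frequency data with equal class intervals)}$$
where $h$ is the width of the class interval and $d_i = \dfrac{x_i - A}{h}$.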

Central moment or moment about actual mean
If $x_i$ are observations and $f_i$ the respective frequencies, then the $r$th moment about the actual mean $\bar{x}$ is given by
$$\mu_r = \frac{\sum (x_i - \bar{x})^r}{n}; \quad \bar{x} = \frac{\sum x_i}{n} \quad \text{(ungrouped data)}$$
$$\text{OR} \quad \mu_r = \frac{\sum f_i (x_i - \bar{x})^r}{N}; \quad \bar{x} = \frac{\sum f_i x_i}{N} \quad \text{(frequency data)}$$
First moment about the mean: $\mu_1 = \dfrac{\sum (x_i - \bar{x})}{n} = 0$
Second moment about the mean: $\mu_2 = \dfrac{\sum (x_i - \bar{x})^2}{n} = \text{Variance } (\sigma^2)$
Third moment about the mean: $\mu_3 = \dfrac{\sum (x_i - \bar{x})^3}{n}$; it measures skewness.
• If $\mu_3 > 0$, the distribution is positively skewed.
• If $\mu_3 < 0$, the distribution is negatively skewed.
• If $\mu_3 = 0$, the distribution is symmetrical.
Skewness: $\beta_1 = \dfrac{\mu_3^2}{\mu_2^3}$
Fourth moment about the mean: $\mu_4 = \dfrac{\sum (x_i - \bar{x})^4}{n}$; it measures kurtosis.
Kurtosis: $\beta_2 = \dfrac{\mu_4}{\mu_2^2}$

Remark: In a symmetrical distribution, all odd moments are zero, 𝑖. 𝑒. 𝜇1 = 𝜇3 = … = 0

Kurtosis enables us to have an idea about the flatness or peakedness of the frequency curve. It is measured by the coefficient $\beta_2$ or its derivative $\gamma_2$ (excess kurtosis), given by
$$\beta_2 = \frac{\mu_4}{\mu_2^2}, \quad \gamma_2 = \beta_2 - 3$$

Figure: platykurtic, mesokurtic (normal) and leptokurtic curves.
• Platykurtic distribution: low degree of peakedness, $\gamma_2 < 0$.
• Mesokurtic (normal) distribution: $\gamma_2 = 0$.
• Leptokurtic distribution: high degree of peakedness, $\gamma_2 > 0$.
A curve which is neither flat nor peaked is called the normal curve or mesokurtic curve.
A curve which is flatter than the normal curve is known as a platykurtic curve.
A curve which is more peaked than the normal curve is called a leptokurtic curve.
Ex: Find the first three moments about the mean for the set of numbers 1, 4, 7 and 12.
Sol: $\bar{x} = \dfrac{1 + 4 + 7 + 12}{4} = 6$
By definition, $\mu_1$ is zero:
$$\mu_1 = \frac{(1-6) + (4-6) + (7-6) + (12-6)}{4} = \frac{0}{4} = 0$$
$$\mu_2 = \frac{(1-6)^2 + (4-6)^2 + (7-6)^2 + (12-6)^2}{4} = \frac{25 + 4 + 1 + 36}{4} = \frac{66}{4} = 16.5$$
$$\mu_3 = \frac{(1-6)^3 + (4-6)^3 + (7-6)^3 + (12-6)^3}{4} = \frac{-125 - 8 + 1 + 216}{4} = \frac{84}{4} = 21$$
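The arithmetic of this example can be checked with exact rational arithmetic from Python's standard `fractions` module, so no rounding creeps in:

```python
from fractions import Fraction

data = [1, 4, 7, 12]
n = len(data)
mean = Fraction(sum(data), n)                 # exact mean, no rounding
mu = [sum((x - mean) ** r for x in data) / n  # mu_1, mu_2, mu_3
      for r in (1, 2, 3)]
print(mean, *mu)  # 6 0 33/2 21
```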

Relation between central moments and raw moments
The moments about the actual mean (central moments) and the moments about an arbitrary origin $A$ (raw moments) are related by the following equations:
First central moment: $\mu_1 = \mu_1' - \mu_1' = 0$
Second central moment: $\mu_2 = \mu_2' - (\mu_1')^2$
Third central moment: $\mu_3 = \mu_3' - 3\mu_2'\mu_1' + 2(\mu_1')^3$
Fourth central moment: $\mu_4 = \mu_4' - 4\mu_3'\mu_1' + 6\mu_2'(\mu_1')^2 - 3(\mu_1')^4$
Similarly, the raw moments can be expressed in terms of the central moments:
First raw moment: $\mu_1' = \bar{x} - A$
Second raw moment: $\mu_2' = \mu_2 + (\mu_1')^2$
Third raw moment: $\mu_3' = \mu_3 + 3\mu_2\mu_1' + (\mu_1')^3$
Fourth raw moment: $\mu_4' = \mu_4 + 4\mu_3\mu_1' + 6\mu_2(\mu_1')^2 + (\mu_1')^4$
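A short sketch of both sets of relations; the function names are hypothetical. Applied to the raw moments of the worked example 1, 4, 7, 12 about the illustrative point A = 0, the first function should return the central moments found earlier ($\mu_2 = 16.5$, $\mu_3 = 21$), and the second should convert them back.

```python
def central_from_raw(m1, m2, m3, m4):
    """Central moments mu_1..mu_4 from raw moments mu'_1..mu'_4 about any point A."""
    mu2 = m2 - m1 ** 2
    mu3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3
    mu4 = m4 - 4 * m3 * m1 + 6 * m2 * m1 ** 2 - 3 * m1 ** 4
    return 0, mu2, mu3, mu4


def raw_from_central(mu2, mu3, mu4, m1):
    """Raw moments about A from the central moments and mu'_1 = mean - A."""
    return (m1,
            mu2 + m1 ** 2,
            mu3 + 3 * mu2 * m1 + m1 ** 3,
            mu4 + 4 * mu3 * m1 + 6 * mu2 * m1 ** 2 + m1 ** 4)


# Raw moments of the worked example about the illustrative point A = 0
data, A = [1, 4, 7, 12], 0
raw = [sum((x - A) ** r for x in data) / len(data) for r in range(1, 5)]
central = central_from_raw(*raw)                # (0, 16.5, 21.0, 484.5)
back = raw_from_central(*central[1:], raw[0])   # recovers raw
print(central, back)
```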
Moments about zero
The moments about zero are denoted by $v_1, v_2, v_3, v_4$, etc. and are defined as
$$v_r = \frac{\sum f_i x_i^r}{N}$$
Relation between moments about zero and central moments
First moment about zero: $v_1 = A + \mu_1' = \bar{x}$
Second moment about zero: $v_2 = \mu_2 + v_1^2$
Third moment about zero: $v_3 = \mu_3 + 3v_1v_2 - 2v_1^3$
Fourth moment about zero: $v_4 = \mu_4 + 4v_1v_3 - 6v_1^2v_2 + 3v_1^4$
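To see these relations in action, the sketch below computes $v_r$ and $\mu_r$ directly for a small frequency table and checks that the formulas above reproduce $v_2$, $v_3$ and $v_4$. The table (x = 1, 2, 3 with f = 2, 5, 3) is hypothetical and chosen only for illustration.

```python
# Hypothetical frequency table, for illustration only
x = [1, 2, 3]
f = [2, 5, 3]
N = sum(f)

# Moments about zero: v_r = sum(f_i * x_i**r) / N
v = {r: sum(fi * xi ** r for fi, xi in zip(f, x)) / N for r in range(1, 5)}

# Central moments straight from the definition, with mean = v_1
mean = v[1]
mu = {r: sum(fi * (xi - mean) ** r for fi, xi in zip(f, x)) / N for r in (2, 3, 4)}

# The relations above should reproduce v_2, v_3, v_4 (up to rounding)
v2 = mu[2] + v[1] ** 2
v3 = mu[3] + 3 * v[1] * v[2] - 2 * v[1] ** 3
v4 = mu[4] + 4 * v[1] * v[3] - 6 * v[1] ** 2 * v[2] + 3 * v[1] ** 4
print(v2, v[2], v3, v[3], v4, v[4])
```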
9.1 Find first four moments of the observations 1, 3, 5, 7, 9.
Ans: 𝜇1 = 0, 𝜇2 = 8, 𝜇3 = 0, 𝜇4 = 108.8
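A quick check of 9.1 with exact fractions. Because the observations are symmetric about the mean 5, the odd moments $\mu_1$ and $\mu_3$ vanish, as the Remark on symmetrical distributions states.

```python
from fractions import Fraction

x = [1, 3, 5, 7, 9]
n = len(x)
mean = Fraction(sum(x), n)                                   # 5
mu = {r: sum((xi - mean) ** r for xi in x) / n for r in range(1, 5)}
print({r: float(m) for r, m in mu.items()})  # {1: 0.0, 2: 8.0, 3: 0.0, 4: 108.8}
```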
9.2 Calculate the first four moments for the following data:
𝑥: 0 1 2 3 4 5 6 7 8
𝑓: 1 8 28 56 70 56 28 8 1
Also calculate the values of β1 and β2 .
Ans: 𝜇1 = 0, 𝜇2 = 2, 𝜇3 = 0, 𝜇4 = 11, β1 = 0, β2 = 2.75
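The frequencies in 9.2 are the binomial coefficients C(8, x), so the table can be generated with `math.comb` (Python 3.8+) rather than typed in. A sketch that reproduces the stated answers:

```python
from math import comb

x = list(range(9))
f = [comb(8, k) for k in x]   # 1, 8, 28, 56, 70, 56, 28, 8, 1 as in the table
N = sum(f)                    # 256
mean = sum(fi * xi for fi, xi in zip(f, x)) / N
mu = {r: sum(fi * (xi - mean) ** r for fi, xi in zip(f, x)) / N
      for r in (1, 2, 3, 4)}
beta1 = mu[3] ** 2 / mu[2] ** 3
beta2 = mu[4] / mu[2] ** 2
print(mu, beta1, beta2)  # {1: 0.0, 2: 2.0, 3: 0.0, 4: 11.0} 0.0 2.75
```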
9.3 Calculate the first four moments about the mean for the following data:
Marks: 0-10 10-20 20-30 30-40 40-50 50-60 60-70
No. of students: 8 12 20 30 15 10 5
Ans: 𝜇1 = 0, 𝜇2 = 236.76, 𝜇3 = 264.336, 𝜇4 = 141290.11
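For 9.3 the step-deviation method applies. The sketch below takes the assumed mean A = 35 and class width h = 10 (choices made for this example; any A gives the same central moments), computes raw moments in units of h, converts them with the relations between central and raw moments, and rescales by $h^r$.

```python
# Exercise 9.3: class marks 5, 15, ..., 65 of the intervals 0-10, ..., 60-70
marks = [5, 15, 25, 35, 45, 55, 65]
f = [8, 12, 20, 30, 15, 10, 5]
A, h = 35, 10                     # assumed mean and class width (example choices)
N = sum(f)

d = [(x - A) / h for x in marks]  # step deviations -3.0 ... 3.0
m1, m2, m3, m4 = (sum(fi * di ** r for fi, di in zip(f, d)) / N
                  for r in range(1, 5))   # raw moments in units of h

mean = A + m1 * h
mu2 = (m2 - m1 ** 2) * h ** 2
mu3 = (m3 - 3 * m2 * m1 + 2 * m1 ** 3) * h ** 3
mu4 = (m4 - 4 * m3 * m1 + 6 * m2 * m1 ** 2 - 3 * m1 ** 4) * h ** 4
print(round(mean, 2), round(mu2, 2), round(mu3, 3), round(mu4, 2))
# 33.2 236.76 264.336 141290.11
```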
9.4 Find the first four central moments of the following data:
Class interval: 100-104.9 105-109.9 110-114.9 115-119.9 120-124.9
Frequency: 7 13 25 25 30
Ans: 𝜇1 = 0, 𝜇2 = 38.09, 𝜇3 = −110.772, 𝜇4 = 3229.7057
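As a cross-check on 9.4, the central moments can also be computed directly from the class marks (taken here as the midpoints of the stated limits, 102.45, 107.45, ...) without an assumed mean. This sketch should agree with the stated answers up to floating-point rounding:

```python
lower = [100, 105, 110, 115, 120]
upper = [104.9, 109.9, 114.9, 119.9, 124.9]
f = [7, 13, 25, 25, 30]

marks = [(lo + up) / 2 for lo, up in zip(lower, upper)]   # class marks
N = sum(f)
mean = sum(fi * x for fi, x in zip(f, marks)) / N
mu = {r: sum(fi * (x - mean) ** r for fi, x in zip(f, marks)) / N
      for r in (1, 2, 3, 4)}
print(round(mean, 2), [round(mu[r], 4) for r in (2, 3, 4)])
# 115.35 [38.09, -110.772, 3229.7057]
```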
9.5 The first four moments of a distribution about the value 4 of the variable
are 1, 4, 10 and 45. Show that the mean = 5, variance = 3 and 𝜇3 = 0.
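With the relations between raw and central moments, each step of 9.5 is one line. The sketch also prints $\mu_4$, which the exercise does not ask for:

```python
A = 4
m1, m2, m3, m4 = 1, 4, 10, 45       # moments about A = 4, from the exercise

mean = A + m1                       # mu'_1 = mean - A
var = m2 - m1 ** 2                  # mu_2
mu3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3
mu4 = m4 - 4 * m3 * m1 + 6 * m2 * m1 ** 2 - 3 * m1 ** 4
print(mean, var, mu3, mu4)  # 5 3 0 26
```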

9.6 The first four moments about the working mean 28.5 of a distribution are
0.294, 7.144, 14.409 and 454.98. Calculate the moments about the mean.
Also, evaluate β1 and β2 .
Ans: 28.794, 7.058, 36.151, 408.738, 3.717, 8.205
9.7 Find the first four moments about mean for 𝑥 = 5, 10, 8, 13, 4 Win 17
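A reusable sketch for ungrouped data such as 9.7; `central_moments` is a hypothetical helper that uses exact fractions so the results are not affected by rounding:

```python
from fractions import Fraction
from typing import Dict, Sequence


def central_moments(x: Sequence[int], orders=(1, 2, 3, 4)) -> Dict[int, Fraction]:
    """Exact central moments mu_r = sum((x_i - mean)^r) / n of ungrouped data."""
    n = len(x)
    mean = Fraction(sum(x), n)
    return {r: sum((xi - mean) ** r for xi in x) / n for r in orders}


mu = central_moments([5, 10, 8, 13, 4])
print({r: float(m) for r, m in mu.items()})  # {1: 0.0, 2: 10.8, 3: 8.4, 4: 195.6}
```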

• Bounds on probabilities

If the probability distribution of a random variable is known, 𝐸(𝑋) and 𝑉(𝑋) can be evaluated. The converse does not hold: knowing only 𝐸(𝑋) and 𝑉(𝑋) does not determine the probability distribution of 𝑋, so quantities such as 𝑃{|𝑋 − 𝐸(𝑋)| ≤ 𝑘} cannot be evaluated exactly. Several approximation techniques have been developed to give upper and/or lower bounds for such probabilities.

• Chebyshev’s inequality
If $X$ is a random variable with mean $\mu$ and variance $\sigma^2$, then for any positive number $k$,
$$P\{|X - \mu| \ge k\sigma\} \le \frac{1}{k^2} \qquad (1)$$
or, equivalently,
$$P\{|X - \mu| < k\sigma\} \ge 1 - \frac{1}{k^2} \qquad (2)$$
Equation (1) gives an upper bound and equation (2) gives a lower bound.
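Chebyshev's bounds depend only on the variance and on the size of the deviation. The helper below is a sketch (its name and signature are assumptions): it takes the deviation $c = k\sigma$ directly, uses $1/k^2 = \sigma^2/c^2$, and caps the bound at 1 so that $k < 1$ does not produce a meaningless value.

```python
from typing import Tuple


def chebyshev_bounds(variance: float, c: float) -> Tuple[float, float]:
    """Chebyshev bounds for a deviation c = k*sigma from the mean (c > 0).

    Returns (upper bound on P(|X - mu| >= c), lower bound on P(|X - mu| < c)).
    Since 1/k^2 = sigma^2 / c^2, only the variance is needed.
    """
    upper = min(1.0, variance / c ** 2)   # a probability bound above 1 is useless
    return upper, 1.0 - upper


# Exercise 10.1: mu = 12, sigma^2 = 9, and 6 < X < 18 means |X - 12| < 6
print(chebyshev_bounds(9, 6))   # (0.25, 0.75)
```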

10.1 A random variable X has mean 𝜇 = 12, variance 𝜎² = 9 and an unknown
probability distribution. Find 𝑃(6 < 𝑋 < 18). Ans: ≥ 3⁄4
10.2 A random variable X has pdf $f(x) = e^{-x}$, $x \ge 0$. Use Chebyshev’s inequality
to show that $P\{|X - 1| > 2\} \le 1/4$, and show that the actual probability
is $e^{-3}$.
10.3 A random variable X is exponentially distributed with parameter 1. Use
Chebyshev’s inequality to show that 𝑃(−1 ≤ 𝑋 ≤ 3) ≥ 3⁄4. Find the actual
probability also. Ans: 0.9502
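For 10.2 and 10.3, an exponential variable with parameter 1 has mean 1 and variance 1, so a deviation of 2 from the mean corresponds to $k = 2$. The sketch compares the Chebyshev bounds with exact probabilities obtained from $P(X > t) = e^{-t}$:

```python
import math

# X ~ exponential with parameter 1: mean 1, variance 1, P(X > t) = exp(-t) for t >= 0
mean, variance = 1.0, 1.0
sigma = math.sqrt(variance)
c = 2.0                               # deviation from the mean in 10.2 and 10.3
k = c / sigma

chebyshev_upper = 1 / k ** 2          # P(|X - 1| >= 2) <= 1/4
chebyshev_lower = 1 - 1 / k ** 2      # P(|X - 1| < 2) >= 3/4

# Since X >= 0, the lower end mean - c = -1 is never reached,
# so |X - 1| > 2 happens only when X > mean + c = 3
exact_outside = math.exp(-(mean + c))     # P(X > 3) = e^-3
exact_inside = 1 - exact_outside          # P(-1 <= X <= 3) = P(0 <= X <= 3)

print(chebyshev_upper, round(exact_outside, 4))   # 0.25 0.0498
print(chebyshev_lower, round(exact_inside, 4))    # 0.75 0.9502
```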
10.4 A fair die is tossed 120 times. Use Chebyshev’s inequality to find a lower
bound for the probability of getting 80 to 120 sixes. Ans: 15⁄16
10.5 Two dice are thrown once. If X is the sum of the numbers showing up, prove
that $P\{|X - 7| \ge 3\} \le 35/54$. Compare this value with the exact probability.
Ans: 1⁄3

10.6 Use Chebyshev’s inequality to find how many times a fair coin must be
tossed so that the probability that the ratio of the number of heads to the
number of tosses lies between 0.45 and 0.55 is at least 0.95. Ans: 2000
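For 10.6, let X be the number of heads in n tosses. Then X/n has mean p = 1/2 and variance p(1 − p)/n, and Chebyshev's inequality turns the requirement into an inequality for n. A sketch using exact fractions, so the final `ceil` is not disturbed by floating-point rounding:

```python
import math
from fractions import Fraction

p = Fraction(1, 2)          # probability of a head
eps = Fraction(1, 20)       # 0.45 .. 0.55 is 0.5 +/- 0.05
target = Fraction(95, 100)  # required probability

# X/n has mean p and variance p(1-p)/n, so Chebyshev gives
#   P(|X/n - p| < eps) >= 1 - p(1-p) / (n * eps^2).
# Requiring the right-hand side to be >= target and solving for n:
n_min = math.ceil(p * (1 - p) / (eps ** 2 * (1 - target)))
print(n_min)  # 2000
```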
10.7 If X denotes the sum of the numbers obtained when 2 dice are thrown,
obtain an upper bound for 𝑃{|𝑋 − 7| ≥ 4}. Compare with actual
probability. Ans: 35⁄96 , 1⁄6
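Exercises 10.5 and 10.7 can be checked by enumerating the 36 equally likely outcomes of two dice. The helper `compare` is hypothetical; it returns the Chebyshev bound $\sigma^2/c^2$ and the exact probability of a deviation of at least $c$ from the mean 7.

```python
from fractions import Fraction
from itertools import product

sums = [a + b for a, b in product(range(1, 7), repeat=2)]   # 36 equally likely outcomes
mean = Fraction(sum(sums), 36)                                # 7
var = sum((s - mean) ** 2 for s in sums) / 36                # 35/6


def compare(c):
    """(Chebyshev upper bound, exact value) for P(|X - 7| >= c)."""
    bound = var / c ** 2
    exact = Fraction(sum(1 for s in sums if abs(s - mean) >= c), 36)
    return bound, exact


print(compare(3))   # (Fraction(35, 54), Fraction(1, 3))
print(compare(4))   # (Fraction(35, 96), Fraction(1, 6))
```

In both cases the bound is valid but loose, which is typical of Chebyshev's inequality: it uses only the mean and variance, not the shape of the distribution.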
10.8 The number of planes landing at an airport in a 30 minute interval obeys
the Poisson law with mean 25. Use Chebyshev’s inequality to find the least
chance that the number of planes landing within a given 30 minute interval
will be between 15 and 35. Ans: ≥ 3⁄4

