Chapter 2e
CHAPTER - 2
2. Summarization of Data
2.1 Measures of Central Tendency
The most important objective of a statistical analysis is to determine a single value for the
entire mass of data, which describes the overall level of the group of observations and can be
called a representative of the whole set of data. It tells us where the center of the distribution
of data is located. The most commonly used measures of central tendency are:
The Mean (arithmetic mean, weighted mean, geometric mean and harmonic mean)
The Mode
The Median
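The arithmetic mean is computed as
$$
\bar{X} =
\begin{cases}
\dfrac{\sum_{i=1}^{n} X_i}{n} & \text{for raw data} \\[2ex]
\dfrac{\sum_{i=1}^{k} f_i X_i}{\sum f_i} & \text{for ungrouped data, where } \sum f_i = n \\[2ex]
\dfrac{\sum_{i=1}^{k} f_i m_i}{\sum f_i} & \text{for grouped data, where } \sum f_i = n
\end{cases}
$$
Here $m_i$ denotes the midpoint (class mark) of the $i$-th class.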
Example 1: The net weights of five perfume bottles selected at random from the production
line are 85.4, 85.3, 84.9, 85.4 and 85. What is the arithmetic mean weight of the sample
observations?
Solution: Given $n = 5$, $x_1 = 85.4$, $x_2 = 85.3$, $x_3 = 84.9$, $x_4 = 85.4$ and $x_5 = 85$,
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{85.4 + 85.3 + 84.9 + 85.4 + 85}{5} = \frac{426.0}{5} = 85.2.$$
Example 2: Calculate the mean of the marks of 46 students given below.
Marks (X_i)      9  10  11  12  13  14  15  16  17  18
Frequency (f_i)  1   2   3   6  10  11   7   3   2   1
Solution:
X_i       9  10  11  12   13   14   15  16  17  18  Total
f_i       1   2   3   6   10   11    7   3   2   1     46
f_i X_i   9  20  33  72  130  154  105  48  34  18    623
So $\bar{X} = \dfrac{\sum_{i=1}^{k} f_i X_i}{\sum f_i} = \dfrac{623}{46} = 13.54$.
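The frequency-table calculation in Example 2 can be checked with a few lines of Python. This is only a minimal sketch; the function name `frequency_mean` is an illustrative choice, not something defined in these notes.

```python
# Arithmetic mean from a frequency table (data taken from Example 2).
marks = [9, 10, 11, 12, 13, 14, 15, 16, 17, 18]
freqs = [1, 2, 3, 6, 10, 11, 7, 3, 2, 1]

def frequency_mean(values, frequencies):
    """Return sum(f_i * x_i) / sum(f_i)."""
    total = sum(f * x for x, f in zip(values, frequencies))
    n = sum(frequencies)
    return total / n

mean_marks = frequency_mean(marks, freqs)
print(round(mean_marks, 2))  # 13.54
```

The same function works for grouped data if the class midpoints are passed in place of the individual values.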
Example 3: The net income of a sample of large importers of Urea was organized into the
following table. What is the arithmetic mean of net income?
So $\bar{X} = \dfrac{\sum_{i=1}^{k} f_i m_i}{\sum f_i} = \dfrac{183}{20} = 9.15$.
Example 4: From the following data, calculate the missing frequency. The mean number
of tablets to cure fever was 29.18.
Number of tablets        19-21  22-24  25-27  28-30  31-33  34-36  37-39
Number of persons cured      6     13     19    f_4     18     12      9
Solution:
CI        19-21  22-24  25-27  28-30  31-33  34-36  37-39  Total
f_i           6     13     19    f_4     18     12      9  77 + f_4
m_i          20     23     26     29     32     35     38
f_i m_i     120    299    494  29f_4    576    420    342  2251 + 29f_4
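One way to finish the calculation, using only the two totals from the table (77 + f_4 and 2251 + 29f_4), is to set the grouped mean equal to the given value 29.18 and solve for f_4. The steps below are a worked sketch of that algebra:

```latex
\bar{X} = \frac{\sum f_i m_i}{\sum f_i}
\;\Rightarrow\;
29.18 = \frac{2251 + 29 f_4}{77 + f_4}
\\
29.18\,(77 + f_4) = 2251 + 29 f_4
\\
2246.86 + 29.18 f_4 = 2251 + 29 f_4
\\
0.18 f_4 = 4.14 \;\Rightarrow\; f_4 = 23
```

Under this calculation the missing frequency is 23, which gives a total of 100 persons.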
The geometric mean is important when the observed values are measured as ratios,
proportions or percentages; in such cases it gives a better measure of central tendency
than the other means.
The geometric mean of a set of n positive values $x_1, x_2, \ldots, x_n$ is defined as
the nth root of their product. That is,
$$G.M = \sqrt[n]{x_1 \cdot x_2 \cdots x_n}$$
Example: The G.M of 4, 8 and 6 is
$$G.M = \sqrt[3]{4 \times 8 \times 6} = \sqrt[3]{192} = 5.77.$$
In general, the sample geometric mean is calculated by
$$
G.M =
\begin{cases}
\sqrt[n]{x_1 \cdot x_2 \cdots x_n} & \text{for raw data} \\[1ex]
\sqrt[n]{x_1^{f_1} \cdot x_2^{f_2} \cdots x_k^{f_k}} & \text{for ungrouped data, where } n = \sum f_i \\[1ex]
\sqrt[n]{m_1^{f_1} \cdot m_2^{f_2} \cdots m_k^{f_k}} & \text{for grouped data, where } n = \sum f_i
\end{cases}
$$
Example 1: A man gets three annual raises in his salary. At the end of the first year he
gets an increase of 4%, at the end of the second year an increase of 6%, and at the
end of the third year an increase of 9% of his salary. What is the average
percentage increase over the three periods?
Solution: $G.M = \sqrt[3]{1.04 \times 1.06 \times 1.09} = 1.0631$, and $1.0631 - 1 = 0.0631$.
Therefore, the average percentage increase is 6.31%.
Example 2: Compute the geometric mean of the following data.
Values     2  4  6  8  10
Frequency  1  2  2  2   1
Solution: $G.M = \sqrt[8]{2^1 \times 4^2 \times 6^2 \times 8^2 \times 10^1} = \sqrt[8]{2 \times 16 \times 36 \times 64 \times 10} = \sqrt[8]{737280} = 5.41.$
Example 3: Suppose that the profits earned by a certain construction company in four
projects were 3%, 2%, 4% and 6% respectively. What is the geometric mean profit?
Solution: $G.M = \sqrt[4]{3 \times 2 \times 4 \times 6} = \sqrt[4]{144} = 3.46.$
Therefore, the geometric mean profit is 3.46 percent.
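The geometric mean examples above can be reproduced with the short sketch below. It computes the nth root through logarithms, which is an implementation choice made here (not something the notes prescribe) so that large products do not overflow; the function name `geometric_mean` is illustrative.

```python
import math

def geometric_mean(values, frequencies=None):
    """n-th root of the product of x_i ** f_i, where n = sum(f_i).

    Computed through logarithms; all values must be positive.
    """
    if frequencies is None:
        frequencies = [1] * len(values)      # raw data: every value counted once
    n = sum(frequencies)
    log_sum = sum(f * math.log(x) for x, f in zip(values, frequencies))
    return math.exp(log_sum / n)

print(round(geometric_mean([4, 8, 6]), 2))                          # 5.77
print(round(geometric_mean([2, 4, 6, 8, 10], [1, 2, 2, 2, 1]), 2))  # 5.41
growth = geometric_mean([1.04, 1.06, 1.09])   # salary growth factors
print(round((growth - 1) * 100, 2))           # 6.31 (percent)
```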
2.2.1.4 Harmonic mean
Another important mean is the harmonic mean, which is a suitable measure of central tendency
when the data pertain to speeds, rates and prices.
Let $x_1, x_2, \ldots, x_n$ be n values in a set of observations. Then the simple harmonic
mean is given by:
$$S.H.M = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}} = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$$
Note: SHM is used for equal distances, equal costs and equal rates.
Example 1: A motorist travels for three days, covering 480 km each day. On the first
day he travels 10 hours at a rate of 48 km/h, on the second day 12 hours at a rate of 40 km/h,
and on the third day 15 hours at a rate of 32 km/h. What is the average speed?
Solution: Since the distance covered by the motorist is the same each day (i.e. $s_1 = s_2 = s_3 = 480$ km), we use the SHM.
$$S.H.M = \frac{3}{\frac{1}{48} + \frac{1}{40} + \frac{1}{32}} = 38.92,$$
so the required average speed is 38.92 km/h.
We can check this by using the known formula for average speed from elementary physics.
Check:
$$V_{av} = \frac{\text{total distance covered}}{\text{total time taken}} = \frac{S_T}{t_T} = \frac{480\ \text{km} + 480\ \text{km} + 480\ \text{km}}{10\ \text{hr} + 12\ \text{hr} + 15\ \text{hr}} = \frac{1440\ \text{km}}{37\ \text{hr}} = 38.92\ \text{km/h}.$$
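The agreement between the simple harmonic mean and the "total distance over total time" check can also be verified numerically. The snippet below is a minimal sketch using the figures of the motorist example; the function name `harmonic_mean` is illustrative.

```python
def harmonic_mean(values):
    """Simple harmonic mean: n / sum(1 / x_i). Values must be non-zero."""
    return len(values) / sum(1 / x for x in values)

speeds = [48, 40, 32]      # km/h on each of the three days
hours = [10, 12, 15]       # hours driven on each day
distances = [s * h for s, h in zip(speeds, hours)]   # distance covered per day

shm = harmonic_mean(speeds)
direct = sum(distances) / sum(hours)    # total distance / total time
print(distances)                        # [480, 480, 480]
print(round(shm, 2), round(direct, 2))  # 38.92 38.92
```

The two results coincide only because the daily distances are equal; with unequal distances a weighted harmonic mean is needed.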
Example 2: A businessman spent 20 Birr on milk at 40 cents per liter in Mizan-Aman town
and another 20 Birr at 60 cents per liter in Tepi town. What is the average price of milk in
the two towns?
Solution: Since the amounts spent in the two towns are equal (20 Birr each), we use the
SHM.
$$p_{av} = S.H.M = \frac{2}{\frac{1}{40} + \frac{1}{60}} = 48\ \text{cents/liter}.$$
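Solution: Since the amounts spent in the two towns are different, we use the weighted harmonic mean (WHM), taking the costs
as weights ($w_i$).
$$p_{av} = W.H.M = \frac{\sum w_i}{\sum \frac{w_i}{x_i}} = \frac{(20 + 25)\ \text{birr}}{\left(\frac{20}{40} + \frac{25}{50}\right)\ \text{birr} \cdot \text{l/c}} = 45\ \text{c/l}.$$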
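The weighted harmonic mean used in this solution can be sketched as follows, with the amounts spent (20 and 25 Birr) as weights and the prices (40 and 50 cents per liter) as values. The function name `weighted_harmonic_mean` is illustrative.

```python
def weighted_harmonic_mean(values, weights):
    """Weighted harmonic mean: sum(w_i) / sum(w_i / x_i)."""
    return sum(weights) / sum(w / x for x, w in zip(values, weights))

prices = [40, 50]     # cents per liter
spent = [20, 25]      # Birr spent at each price, used as weights
print(weighted_harmonic_mean(prices, spent))   # 45.0

# With equal weights the result reduces to the simple harmonic mean.
print(round(weighted_harmonic_mean([40, 60], [20, 20]), 2))   # 48.0
```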
Finally, if all the observations are positive, then A.M ≥ G.M ≥ H.M.
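Corrected mean
$$\bar{x}_{corr} = \bar{x}_w + \frac{c - w}{n},$$
where $\bar{x}_w$ is the wrong mean, $c$ is the correct value, $w$ is the wrongly recorded value and $n$ is the number of observations.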
Values (xi)     3  5  4  2  7  6
Frequency (fi)  2  1  3  2  1  1
Solution: First arrange the data in increasing order and construct the LCF table for this data.
Values (xi)     2  3  4  5  6   7
Frequency (fi)  2  2  3  1  1   1
LCF             2  4  7  8  9  10
Here $n = 10$ is even, so $\frac{n}{2} = \frac{10}{2} = 5$ and $\frac{n}{2} + 1 = 5 + 1 = 6$.
The smallest LCF which is ≥ 5 and also ≥ 6 is 7, and the value corresponding to this
LCF is 4. Thus the median is $\tilde{x} = \frac{4 + 4}{2} = 4$.
Example 2: Calculate the median of the marks of 46 students given below.
Values (xi)     10  9  11  12  14  13  15  16  17  18
Frequency (fi)   2  1   3   6  10  11   7   3   2   1
Solution: First arrange the data in ascending order and construct the LCF table for this data.
Values (xi)     9  10  11  12  13  14  15  16  17  18
Frequency (fi)  1   2   3   6  11  10   7   3   2   1
LCF             1   3   6  12  23  33  40  43  45  46
Here $n = 46$ is even, so $\frac{n}{2} = \frac{46}{2} = 23$ and $\frac{n}{2} + 1 = 23 + 1 = 24$.
The smallest LCFs that are ≥ 23 and ≥ 24 are 23 and 33 respectively, and the values
corresponding to these LCFs are 13 and 14 respectively. Thus the median is $\tilde{x} = \frac{13 + 14}{2} = 13.5$.
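The LCF procedure used in the two examples above (sort the values, accumulate the frequencies, then pick the value attached to the smallest LCF that reaches the required position) can be sketched in Python. The function and helper names are illustrative, not part of the notes.

```python
from itertools import accumulate

def median_from_frequencies(values, frequencies):
    """Median of discrete data given as values with their frequencies."""
    pairs = sorted(zip(values, frequencies))       # ascending order of value
    xs = [x for x, _ in pairs]
    lcf = list(accumulate(f for _, f in pairs))    # less-than cumulative frequency
    n = lcf[-1]

    def value_at(position):
        # value attached to the smallest LCF that is >= position
        return next(x for x, c in zip(xs, lcf) if c >= position)

    if n % 2 == 0:
        return (value_at(n // 2) + value_at(n // 2 + 1)) / 2
    return value_at((n + 1) // 2)

print(median_from_frequencies([3, 5, 4, 2, 7, 6], [2, 1, 3, 2, 1, 1]))   # 4.0
print(median_from_frequencies(
    [10, 9, 11, 12, 14, 13, 15, 16, 17, 18],
    [2, 1, 3, 6, 10, 11, 7, 3, 2, 1]))                                  # 13.5
```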
Median for grouped data
The formula for computing the median for grouped data is given by
$$\text{median} = \tilde{x} = lcb_{\tilde{x}} + \frac{\left(\frac{n}{2} - lcf_p\right) \times w}{f_m}$$
Where: $lcb_{\tilde{x}}$ is the lcb (lower class boundary) of the median class.
$n$ is the total number of observations.
$f_m$ is the frequency of the median class.
$w$ is the width of the median class. Recall: $w = ucb - lcb$.
$lcf_p$ is the lcf corresponding to the class immediately preceding the median class.
Note: The class corresponding to the smallest LCF which is ≥ $\frac{n}{2}$ is called the
median class. The median lies in this class.
Steps to calculate the median for grouped data
1. First construct the LCF table.
2. Determine the median class. To do so, compute $\frac{n}{2}$ and find the
smallest LCF which is ≥ $\frac{n}{2}$. The class corresponding to this LCF is the median
class.
Example 1: Find the median for the following data.
Daily production  80-89  90-99  100-109  110-119  120-129  130-139
Frequency             5      9       20        8        6        2
Solution: The LCF values are 5, 14, 34, 42, 48 and 50, so $n = 50$.
To obtain the median class, calculate $\frac{n}{2} = \frac{50}{2} = 25$. The smallest lcf which is ≥ $\frac{n}{2}$ is
34, so the class corresponding to this lcf, 100-109, is the median class.
Therefore, $lcb_{\tilde{x}} = 99.5$, $w = 10$, $f_m = 20$, $lcf_p = 14$.
$$\tilde{x} = lcb_{\tilde{x}} + \frac{\left(\frac{n}{2} - lcf_p\right) \times w}{f_m} = 99.5 + \frac{(25 - 14) \times 10}{20} = 105.$$
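The grouped-median formula can be sketched as a small function. One assumption is made explicit in the code: class boundaries are taken half a unit outside the class limits, which matches the example (limits 100 to 109 give boundaries 99.5 and 109.5) but would need adjusting for other gap sizes. The function name `grouped_median` is illustrative.

```python
def grouped_median(class_limits, frequencies):
    """Median for grouped data: lcb + (n/2 - lcf_p) * w / f_m.

    class_limits holds (lower, upper) class limits. Boundaries are assumed
    to lie 0.5 outside the limits (integer limits with a gap of 1).
    """
    n = sum(frequencies)
    lcf_prev = 0                       # LCF of the class before the current one
    for (lower, upper), f in zip(class_limits, frequencies):
        if lcf_prev + f >= n / 2:      # first class whose LCF reaches n/2
            lcb, ucb = lower - 0.5, upper + 0.5
            return lcb + (n / 2 - lcf_prev) * (ucb - lcb) / f
        lcf_prev += f

classes = [(80, 89), (90, 99), (100, 109), (110, 119), (120, 129), (130, 139)]
freqs = [5, 9, 20, 8, 6, 2]
print(grouped_median(classes, freqs))   # 105.0
```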
Properties of the median
1. The median is unique.
2. It can be computed for an open ended frequency distribution if the median does not
lie in an open ended class.
3. It is not affected by extremely large or small values .
4. It is not so suitable for algebraic manipulations.
5. It can be computed for ratio level, interval level and ordinal level data.
2.2.3 The mode
In everyday speech, something is “in the mode” if it is fashionable or popular. In statistics
this “popularity” refers to the frequency of observations.
Therefore, the mode is the most frequently observed value in a set of observations.
Example: Set A: 10, 10, 9, 8, 5, 4, 5, 12, 10. Mode = 10 → unimodal.
Set B: 10, 10, 9, 9, 8, 12, 15, 5. Mode = 9 and 10 → bimodal.
Set C: 4, 6, 7, 15, 12, 9. No mode.
Remark: If all values in a set of observations occur once or an equal number of times, there is
no mode (see Set C above).
Mode for a grouped data
If the data are grouped, so that we are given a frequency distribution over finite class intervals,
we do not know the value of every item, but we can easily determine the class with the highest
frequency. The modal class is the class with the highest frequency, and the
mode of the distribution lies in this class.
Ages    18-20  21-23  24-26  27-29  30-32
Number      4      8     11     20      7
Solution: First we determine the modal class. The modal class is 27-29, since it has the
highest frequency. Thus, $lcb_{\hat{x}} = 26.5$, $w = 3$, $\Delta_1 = 20 - 11 = 9$, $\Delta_2 = 20 - 7 = 13$.
$$\hat{X} = lcb_{\hat{x}} + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) \times w = 26.5 + \left(\frac{9}{9 + 13}\right) \times 3 = 26.5 + \frac{27}{22} = 26.5 + 1.2 = 27.7$$
Interpretation: The most common age among these newly hired employees is about 27.7 years (roughly 27 years and 8
months).
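The formula applied in this solution, mode = lcb + Δ1 / (Δ1 + Δ2) × w, can be sketched as below. Two choices here are assumptions rather than something stated in the notes: class boundaries are taken 0.5 outside the limits, and a missing neighbouring class (when the modal class is first or last) is treated as having frequency 0. The function name `grouped_mode` is illustrative.

```python
def grouped_mode(class_limits, frequencies):
    """Mode for grouped data: lcb + d1 / (d1 + d2) * w, where d1 and d2 are
    the differences between the modal frequency and its two neighbours."""
    i = frequencies.index(max(frequencies))            # position of the modal class
    before = frequencies[i - 1] if i > 0 else 0        # assumption: 0 if no class before
    after = frequencies[i + 1] if i < len(frequencies) - 1 else 0
    d1, d2 = frequencies[i] - before, frequencies[i] - after
    lower, upper = class_limits[i]
    lcb, ucb = lower - 0.5, upper + 0.5                # assumes unit gaps between classes
    return lcb + d1 / (d1 + d2) * (ucb - lcb)

ages = [(18, 20), (21, 23), (24, 26), (27, 29), (30, 32)]
counts = [4, 8, 11, 20, 7]
print(round(grouped_mode(ages, counts), 1))   # 27.7
```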
Example 2: The following table shows the distribution of a group of families according to
their expenditure per week. The median and the mode of the distribution are
known to be 25.50 Birr and 24.50 Birr respectively. Two frequency values are, however,
missing from the table. Find the missing frequencies.
Class interval  1-10  11-20  21-30  31-40  41-50
Frequency         14    f_2     27    f_4     15
Solution: The LCF table of the given distribution can be formed as follows.
Expenditure (CI)         1-10     11-20     21-30           31-40           41-50
Number of families (fi)    14       f_2        27             f_4              15
LCF                        14  14 + f_2  41 + f_2  41 + f_2 + f_4  56 + f_2 + f_4
Here $n = 56 + f_2 + f_4$. Since the median and the mode are Birr 25.5 and 24.5 respectively,
the class 21-30 is the median class as well as the modal class. Applying the median and mode formulas gives
$$25.5 = 20.5 + \frac{\left(\frac{56 + f_2 + f_4}{2} - (14 + f_2)\right) \times 10}{27} \qquad (i)$$
$$24.5 = 20.5 + \frac{(27 - f_2) \times 10}{(27 - f_2) + (27 - f_4)} \qquad (ii)$$
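Equations (i) and (ii) form two linear equations in f_2 and f_4. One way to solve them, shown here as a worked sketch, is to simplify each equation and then substitute:

```latex
\text{From (i):}\quad 5 \times 27 = 10\left(\frac{56 + f_2 + f_4}{2} - 14 - f_2\right)
\;\Rightarrow\; 13.5 = 14 + \frac{f_4 - f_2}{2}
\;\Rightarrow\; f_2 - f_4 = 1
\\
\text{From (ii):}\quad 4\,(54 - f_2 - f_4) = 10\,(27 - f_2)
\;\Rightarrow\; 6 f_2 - 4 f_4 = 54
\;\Rightarrow\; 3 f_2 - 2 f_4 = 27
\\
\text{Substituting } f_2 = f_4 + 1:\quad 3(f_4 + 1) - 2 f_4 = 27
\;\Rightarrow\; f_4 = 24,\quad f_2 = 25
```

As a check, with f_2 = 25 and f_4 = 24 we get n = 105, the class 21-30 still has the highest frequency (27), the median is 20.5 + (52.5 - 39) × 10 / 27 = 25.5 and the mode is 20.5 + (2 × 10) / (2 + 3) = 24.5, as required.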
$$Q_i = lcb_{q_i} + \frac{\left(\frac{i \times n}{4} - lcf_p\right) \times w}{f_{q_i}}, \quad \text{for } i = 1, 2, 3.$$
Where: $lcb_{q_i}$ is the lcb of the ith quartile class.
$f_{q_i}$ is the frequency of the ith quartile class.
$lcf_p$ is the lcf corresponding to the class immediately preceding the ith quartile class.
Note: $Q_2$ = median
2. Percentiles (P)
Percentiles are 99 points which divide a given ordered data set into 100 equal parts. For raw data,
the mth percentile is the value at position
$$P_m = \left(\frac{m \times (n + 1)}{100}\right)^{th} \text{ value}, \quad m = 1, 2, \ldots, 99.$$
Interval  21-22  23-24  25-26  27-28  29-30
F            10     22     20     14     14
Solution:
Interval  21-22  23-24  25-26  27-28  29-30  Total
F            10     22     20     14     14     80
LCF          10     32     52     66     80
a) $Q_1$: $\frac{n}{4} = \frac{80}{4} = 20$. The minimum lcf ≥ 20 is 32, so the class
corresponding to this lcf, 23-24, is the first quartile class. $lcb_{q_1} = 22.5$, $w = 2$, $f_{q_1} = 22$, $lcf_p = 10$.
$$Q_1 = lcb_{q_1} + \frac{\left(\frac{n}{4} - lcf_p\right) \times w}{f_{q_1}} = 22.5 + \frac{(20 - 10) \times 2}{22} = 23.41.$$
$Q_2$: $\frac{2n}{4} = \frac{160}{4} = 40$. The minimum lcf ≥ 40 is 52, so the class corresponding
to this lcf, 25-26, is the second quartile class. $lcb_{q_2} = 24.5$, $w = 2$, $f_{q_2} = 20$, $lcf_p = 32$.
$$Q_2 = lcb_{q_2} + \frac{\left(\frac{2 \times n}{4} - lcf_p\right) \times w}{f_{q_2}} = 24.5 + \frac{(40 - 32) \times 2}{20} = 25.3.$$
$Q_3 = 27.64.$
b) $P_{25}$: $\frac{25 \times n}{100} = \frac{25 \times 80}{100} = 20$. The minimum lcf ≥ 20 is 32, so the class
corresponding to this lcf, 23-24, is the 25th percentile class.
Thus, $lcb_{p_{25}} = 22.5$, $w = 2$, $f_{p_{25}} = 22$, $lcf_p = 10$.
$$P_{25} = lcb_{p_{25}} + \frac{\left(\frac{25 \times n}{100} - lcf_p\right) \times w}{f_{p_{25}}} = 22.5 + \frac{(20 - 10) \times 2}{22} = 23.41.$$
$P_{20} = 23.045$.
$P_{30} = 23.77$.
$P_{50} = 25.3$.
$P_{75} = 27.64$.
c) $D_1$: $\frac{1 \times n}{10} = \frac{80}{10} = 8$. The minimum lcf ≥ 8 is 10, so the class corresponding
to this lcf, 21-22, is the first decile class. Thus, $lcb_{D_1} = 20.5$, $w = 2$, $f_{D_1} = 10$, $lcf_p = 0$.
$$D_1 = lcb_{D_1} + \frac{\left(\frac{1 \times n}{10} - lcf_p\right) \times w}{f_{D_1}} = 20.5 + \frac{(8 - 0) \times 2}{10} = 22.1$$
(up to class boundary).
$D_2 = 23.045$.
$D_3 = 23.77$.
$D_5 = 25.3$.
Therefore $Q_1 = P_{25}$, $Q_2 = P_{50}$ and $Q_3 = P_{75}$; $D_1 = P_{10}$, $D_2 = P_{20}$, $D_3 = P_{30}$ and $D_5 = P_{50}$;
and median = $Q_2 = D_5 = P_{50}$.
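Quartiles, deciles and percentiles for grouped data all follow the same pattern, differing only in the fraction of n they target. The sketch below expresses that with a single hypothetical function, `grouped_quantile`, and reproduces the values of this example. As before, class boundaries are assumed to lie 0.5 outside the class limits.

```python
def grouped_quantile(class_limits, frequencies, fraction):
    """Value below which `fraction` of the observations fall, for grouped data:
    lcb + (fraction * n - lcf_p) * w / f, using the first class whose LCF
    is >= fraction * n. Boundaries assume unit gaps between class limits."""
    n = sum(frequencies)
    target = fraction * n
    lcf_prev = 0                       # LCF of the class before the current one
    for (lower, upper), f in zip(class_limits, frequencies):
        if lcf_prev + f >= target:
            lcb, ucb = lower - 0.5, upper + 0.5
            return lcb + (target - lcf_prev) * (ucb - lcb) / f
        lcf_prev += f

classes = [(21, 22), (23, 24), (25, 26), (27, 28), (29, 30)]
freqs = [10, 22, 20, 14, 14]

quartiles = [grouped_quantile(classes, freqs, i / 4) for i in (1, 2, 3)]
print([round(q, 2) for q in quartiles])                       # [23.41, 25.3, 27.64]
print(round(grouped_quantile(classes, freqs, 1 / 10), 2))     # 22.1   (D1)
print(round(grouped_quantile(classes, freqs, 20 / 100), 3))   # 23.045 (P20 = D2)
```

Passing 0.5 as the fraction gives the median, which is why Q2, D5 and P50 coincide.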
The inter-quartile range (IQR) is the difference between the third and the first quartiles of a
set of items. It is not affected by extreme values, so it is a good indicator of absolute
variability.
Quartile deviation (semi-inter-quartile range) is defined as half of the inter-quartile range:
$$Q.D = \frac{Q_3 - Q_1}{2} = \frac{IQR}{2}.$$
It is an absolute measure of variation.
If the quartile deviation is to be used for comparing the variability of two series, it is
necessary to convert the absolute measure to a coefficient of quartile deviation:
$$C.Q.D = \frac{Q_3 - Q_1}{Q_3 + Q_1}.$$
b. $Q.D = \frac{Q_3 - Q_1}{2} = \frac{4.23}{2} = 2.115.$
c. $C.Q.D = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{4.23}{23.41 + 27.64} = 0.083$
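The three quartile-based measures can be recomputed from the quartiles Q1 = 23.41 and Q3 = 27.64 found in the earlier grouped-data example. This is just an arithmetic sketch; the variable names are illustrative.

```python
q1, q3 = 23.41, 27.64          # quartiles from the grouped-data example

iqr = q3 - q1                  # inter-quartile range
qd = iqr / 2                   # quartile deviation (semi-inter-quartile range)
cqd = (q3 - q1) / (q3 + q1)    # coefficient of quartile deviation (unit-free)

print(round(iqr, 2), round(qd, 3), round(cqd, 3))   # 4.23 2.115 0.083
```

Because the coefficient is a ratio, it carries no unit and can be compared across series measured on different scales.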
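$$MD_{\bar{x}} = \frac{\sum_{i=1}^{n} |X_i - \bar{X}|}{n} = \frac{|53 - 59| + |56 - 59| + \cdots + |66 - 59|}{6} = 3.67.$$
$$MD_{\tilde{x}} = \frac{\sum_{i=1}^{n} |X_i - \tilde{X}|}{n} = \frac{|53 - 58| + |56 - 58| + \cdots + |66 - 58|}{6} = 3.67.$$
For a grouped frequency distribution:
$$MD_{\bar{x}} = \frac{\sum_{i=1}^{k} f_i |m_i - \bar{X}|}{\sum f_i}, \qquad MD_{\tilde{x}} = \frac{\sum_{i=1}^{k} f_i |m_i - \tilde{X}|}{\sum f_i}$$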
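The coefficients of mean deviation from the mean and from the median are given by:
$$\text{Coefficient of } MD_{\bar{x}} = \frac{MD_{\bar{x}}}{\bar{X}}, \qquad \text{Coefficient of } MD_{\tilde{x}} = \frac{MD_{\tilde{x}}}{\tilde{X}}$$
Example: Find the coefficient of mean deviation from the mean and from the median for the
weights of the six students in the previous example.
Solution:
$$\text{Coefficient of } MD_{\bar{x}} = \frac{MD_{\bar{x}}}{\bar{X}} = \frac{3.67\ \text{kg}}{59\ \text{kg}} = 0.0622.$$
$$\text{Coefficient of } MD_{\tilde{x}} = \frac{MD_{\tilde{x}}}{\tilde{X}} = \frac{3.67\ \text{kg}}{58\ \text{kg}} = 0.0633.$$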
Ex: Find the mean deviation and the coefficient of mean deviation from the mean and from the
median for the data given below, the test results (out of 20) of 10 students.
Limitation: Because it uses absolute values, the mean deviation ignores the algebraic signs of the
deviations, which leads to serious difficulties in inference.
4. Variance and Standard Deviation
Variance is the average of the squares of the deviations taken from the mean.
Suppose that $x_1, x_2, \ldots, x_N$ is the set of observations on a population of size N. Then,
$$\text{Population variance} = \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} = \frac{\sum_{i=1}^{N} x_i^2 - N\mu^2}{N} \quad \text{(for a population)}.$$
$$\text{Sample variance} = s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n - 1} \quad \text{(for a sample)}.$$
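The two forms of each variance formula (the definition and the computational shortcut) give the same result, which the sketch below illustrates. The data values are hypothetical and chosen only for easy arithmetic; they do not come from these notes, and the function names are illustrative.

```python
def sample_variance(xs):
    """Definition: sum((x - mean)^2) / (n - 1)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def sample_variance_shortcut(xs):
    """Shortcut: (sum(x^2) - n * mean^2) / (n - 1)."""
    n = len(xs)
    mean = sum(xs) / n
    return (sum(x * x for x in xs) - n * mean ** 2) / (n - 1)

def population_variance(xs):
    """Same squared deviations, but divided by N instead of n - 1."""
    n = len(xs)
    mu = sum(xs) / n
    return sum((x - mu) ** 2 for x in xs) / n

data = [2, 4, 4, 4, 5, 5, 7, 9]             # hypothetical values, mean = 5
print(round(sample_variance(data), 4))      # 4.5714
print(population_variance(data))            # 4.0
```

The standard deviation is the positive square root of the corresponding variance.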
In general, the sample variance is computed by:
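The coefficient of variation (C.V) is given by:
$$C.V = \frac{\sigma}{\mu} \times 100\% \quad \text{(for a population)}$$
$$C.V = \frac{S}{\bar{x}} \times 100\% \quad \text{(for a sample)}$$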
Example: The following data refer to the hemoglobin level of 5 male and 5 female
students. In which group does the hemoglobin level have higher variability (less consistency)?
Solution:
$$\bar{x}_{male} = \frac{13 + 13.8 + 14.6 + 15.6 + 17}{5} = \frac{74}{5} = 14.8, \qquad \bar{x}_{female} = \frac{12 + 12.5 + 13.8 + 14.6 + 15.6}{5} = \frac{68.5}{5} = 13.7.$$
$$s^2_{males} = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n - 1} = 2.44, \qquad S_{males} = \sqrt{2.44} = 1.56205,$$
$$s^2_{females} = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n - 1} = 2.19, \qquad S_{females} = \sqrt{2.19} = 1.479865.$$
$$C.V_{males} = \frac{S_{males}}{\bar{x}_{male}} \times 100\% = \frac{1.56205}{14.8} \times 100\% = 10.55\%,$$
$$C.V_{females} = \frac{S_{females}}{\bar{x}_{female}} \times 100\% = \frac{1.479865}{13.7} \times 100\% = 10.8\%.$$
Therefore, the variability in hemoglobin level is higher for females than for males.
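The comparison can be reproduced from the ten hemoglobin values that appear in the mean calculations of this example. The sketch below uses the sample standard deviation (divisor n - 1); the function name `coefficient_of_variation` is illustrative.

```python
import math

def coefficient_of_variation(xs):
    """Sample C.V. in percent: (s / mean) * 100, with s based on n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return s / mean * 100

males = [13, 13.8, 14.6, 15.6, 17]
females = [12, 12.5, 13.8, 14.6, 15.6]
print(round(coefficient_of_variation(males), 2))     # 10.55
print(round(coefficient_of_variation(females), 2))   # 10.8
```

The larger coefficient belongs to the female group, in line with the conclusion above.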
Z score for B is $Z = \frac{x - \bar{x}}{S} = \frac{90 - 85}{7} = 0.71$.
Since $Z_A > Z_B$, i.e. 2 > 0.71, student A performed better relative to his group than student B:
the score of student A is two standard deviations above the mean score of section 1, while the score of student B is
only 0.71 standard deviations above the mean score of section 2.