Fitting A Straight Line
In many social, economic, engineering and physical problems we have a set of values of x and y
but do not know the functional relationship between them. Fitting a curve to a given set of
values means finding a relationship between x and y whose curve lies as close as possible to the
given values. The curve so obtained need not pass through all the given points, but it is as close
to them as possible. Finding such a curve for a given set of values is called curve fitting. The
relation is generally assumed to be linear, parabolic, exponential or logarithmic.
The method used for fitting the curve is based on the principle of least squares: the constants are
chosen so that the sum of the squares of the deviations of the observed values of y from the fitted
values is a minimum. For the straight line 𝑦 = 𝑎 + 𝑏𝑥 this gives the Normal Equations
∑ 𝑦 = 𝑛𝑎 + 𝑏 ∑ 𝑥
∑ 𝑥𝑦 = 𝑎 ∑ 𝑥 + 𝑏 ∑ 𝑥²
where 𝑛 is the number of data points.
Example 1
Fit a Straight line to the following data
𝑥 0 1 2 3 4 5
𝑦 1 2 3 4.5 6 7.5
Solution
𝑥 𝑦 𝑥𝑦 𝑥2
0 1 0 0
1 2 2 1
2 3 6 4
3 4.5 13.5 9
4 6 24 16
5 7.5 37.5 25
∑ 𝑥 =15 ∑ 𝑦 = 24 ∑ 𝑥 𝑦 = 83 ∑ 𝑥 2 = 55
1
EM III_SMITA N
Substituting in the Normal Equations
∑ 𝑦 = 𝑛𝑎 + 𝑏 ∑ 𝑥 24 = 6𝑎 + 15𝑏 ……………(1)
∑ 𝑥𝑦 = 𝑎 ∑ 𝑥 + 𝑏 ∑ 𝑥² 83 = 15𝑎 + 55𝑏 ……………(2)
Solving (1) & (2) we get 𝑎 = 0.7143 𝑏 = 1.3143
𝑦 = 𝑎 + 𝑏𝑥
𝑦 = 0.7143 + 1.3143𝑥
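The steps of this example (forming the four sums, substituting in the Normal Equations and solving for 𝑎 and 𝑏) can be sketched in Python. This is only an illustrative sketch and not part of the original notes; the function name `fit_line` is an assumption chosen for the illustration, and the two equations are solved by Cramer's rule.

```python
def fit_line(xs, ys):
    """Least-squares line y = a + b*x from the two normal equations."""
    n = len(xs)
    sx = sum(xs)                                # sum of x
    sy = sum(ys)                                # sum of y
    sxy = sum(x * y for x, y in zip(xs, ys))    # sum of xy
    sxx = sum(x * x for x in xs)                # sum of x^2
    # Normal equations:  sy = n*a + b*sx   and   sxy = a*sx + b*sxx
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("all x values are equal; the line is not determined")
    b = (n * sxy - sx * sy) / det
    a = (sy * sxx - sx * sxy) / det
    return a, b


# Data of Example 1
a, b = fit_line([0, 1, 2, 3, 4, 5], [1, 2, 3, 4.5, 6, 7.5])
print(round(a, 4), round(b, 4))  # 0.7143 1.3143
```

The same function can be applied to the other straight-line examples by changing the two lists.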
Example 2
Fit a first-degree curve to the following data and estimate the value of
𝑦 𝑤ℎ𝑒𝑛 𝑥 = 73
𝑥 10 20 30 40 50 60 70 80
𝑦 1 3 5 10 6 4 2 1
Solution
𝑥 𝑦 𝑥𝑦 𝑥2
10 1 10 100
20 3 60 400
30 5 150 900
40 10 400 1600
50 6 300 2500
60 4 240 3600
70 2 140 4900
80 1 80 6400
∑ 𝑥 = 360 ∑ 𝑦 = 32 ∑ 𝑥 𝑦 = 1380 ∑ 𝑥 2 = 20400
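Substituting in the Normal Equations
∑ 𝑦 = 𝑛𝑎 + 𝑏 ∑ 𝑥 32 = 8𝑎 + 360𝑏 ……………(1)
∑ 𝑥𝑦 = 𝑎 ∑ 𝑥 + 𝑏 ∑ 𝑥² 1380 = 360𝑎 + 20400𝑏 ……………(2)
Solving (1) & (2) we get 𝑎 = 4.6429 𝑏 = −0.0143
𝑦 = 𝑎 + 𝑏𝑥
𝑦 = 4.6429 − 0.0143𝑥
When 𝑥 = 73 then 𝑦 = 3.6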
Example 3
Fit a straight line to the following data
(𝑥, 𝑦) = (1,1), (2,5), (3,11), (4,8), (5,14)
Solution
𝑥 𝑦 𝑥𝑦 𝑥2
1 1 1 1
2 5 10 4
3 11 33 9
4 8 32 16
5 14 70 25
∑ 𝑥 = 15 ∑ 𝑦 = 39 ∑ 𝑥 𝑦 = 146 ∑ 𝑥 2 = 55
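Substituting in the Normal Equations
∑ 𝑦 = 𝑛𝑎 + 𝑏 ∑ 𝑥 39 = 5𝑎 + 15𝑏 ……………(1)
∑ 𝑥𝑦 = 𝑎 ∑ 𝑥 + 𝑏 ∑ 𝑥² 146 = 15𝑎 + 55𝑏 ……………(2)
Solving (1) & (2) we get 𝑎 = −0.9 𝑏 = 2.9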
𝑦 = 𝑎 + 𝑏𝑥
𝑦 = −0.9 + 2.9𝑥
Example 4
Fit a straight line to the following data
𝑥 100 120 140 160 180 200
𝑦 0.45 0.55 0.60 0.70 0.80 0.85
Solution
𝑥 𝑦 𝑥𝑦 𝑥2
100 0.45 45 10000
120 0.55 66 14400
140 0.60 84 19600
160 0.70 112 25600
180 0.80 144 32400
200 0.85 170 40000
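∑ 𝑥 = 900 ∑ 𝑦 = 3.95 ∑ 𝑥𝑦 = 621 ∑ 𝑥² = 142000
Substituting in the Normal Equations
∑ 𝑦 = 𝑛𝑎 + 𝑏 ∑ 𝑥 3.95 = 6𝑎 + 900𝑏 ……………(1)
∑ 𝑥𝑦 = 𝑎 ∑ 𝑥 + 𝑏 ∑ 𝑥² 621 = 900𝑎 + 142000𝑏 ……………(2)
Solving (1) & (2) we get 𝑎 = 0.0476 𝑏 = 0.00407
𝑦 = 𝑎 + 𝑏𝑥
𝑦 = 0.0476 + 0.00407𝑥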
Example 5
Fit a straight line to the following data
𝑥 1 2 3 4 5 6
𝑦 49 54 60 73 80 86
Solution
𝑥 𝑦 𝑥𝑦 𝑥2
1 49 49 1
2 54 108 4
3 60 180 9
4 73 292 16
5 80 400 25
6 86 516 36
∑ 𝑥 = 21 ∑ 𝑦 = 402 ∑ 𝑥 𝑦 = 1545 ∑ 𝑥 2 = 91
Substituting in the Normal Equation
∑ 𝑦 = 𝑛𝑎 + 𝑏 ∑ 𝑥 402 = 6𝑎 + 21𝑏 ……………(1) :
∑ 𝑥𝑦 = 𝑎 ∑ 𝑥 + 𝑏 ∑ 𝑥 2 1545 = 21𝑎 + 91𝑏……………..(2)
Solving (1) & (2) we get 𝑎 = 39.4 𝑏 = 7.8857
𝑦 = 𝑎 + 𝑏𝑥
𝑦 = 39.4 + 7.8857𝑥
Example 6
Fit a straight line to the following data
(X,Y) = (1,-5),(1,1),(2,4),(3,7),(4,10)
Solution
𝑥 𝑦 𝑥𝑦 𝑥2
1 -5 -5 1
1 1 1 1
2 4 8 4
3 7 21 9
4 10 40 16
∑ 𝑥 = 11 ∑ 𝑦 = 17 ∑ 𝑥 𝑦 = 65 ∑ 𝑥 2 = 31
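Substituting in the Normal Equations
∑ 𝑦 = 𝑛𝑎 + 𝑏 ∑ 𝑥 17 = 5𝑎 + 11𝑏 ……………(1)
∑ 𝑥𝑦 = 𝑎 ∑ 𝑥 + 𝑏 ∑ 𝑥² 65 = 11𝑎 + 31𝑏 ……………(2)
Solving (1) & (2) we get 𝑎 = −5.5294 𝑏 = 4.0588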
𝑦 = 𝑎 + 𝑏𝑥
𝑦 = −5.5294 + 4.0588𝑥
Example 7
Solution
𝑥 𝑦 𝑋 = 𝑥 − 1967 𝑌 = 𝑦 − 165 𝑋𝑌 𝑋2
1965 125 -2 -40 80 4
1966 140 -1 -25 25 1
1967 165 0 0 0 0
1968 195 1 30 30 1
1969 200 2 35 70 4
∑𝑋 = 0 ∑𝑌 = 0 ∑ 𝑋𝑌 = 205 ∑ 𝑋 2 = 10
∑ 𝑌 = 𝑛𝑎 + 𝑏 ∑ 𝑋 0 = 5𝑎 + 0𝑏 ……………(1)
∑ 𝑋𝑌 = 𝑎 ∑ 𝑋 + 𝑏 ∑ 𝑋² 205 = 0𝑎 + 10𝑏 ……………(2)
Solving (1) & (2) we get 𝑎 = 0 𝑏 = 20.5
𝑌 = 𝑎 + 𝑏𝑋
𝑌 = 0 + 20.5𝑋
But 𝑋 = 𝑥 − 1967 and 𝑌 = 𝑦 − 165
𝑦 − 165 = 20.5(𝑥 − 1967)
𝑦 = 20.5𝑥 − 40158.5
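The change of origin used in this example (𝑋 = 𝑥 − 1967, 𝑌 = 𝑦 − 165) only simplifies the arithmetic; after converting back, the line is the one a direct fit on the original values would give. The following Python sketch checks this under that assumption. The helper `fit_line` and the variable names are hypothetical and not from the notes.

```python
def fit_line(xs, ys):
    """Least-squares line y = a + b*x from the two normal equations."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    det = n * sxx - sx * sx
    return (sy * sxx - sx * sxy) / det, (n * sxy - sx * sy) / det


years = [1965, 1966, 1967, 1968, 1969]
values = [125, 140, 165, 195, 200]
x0, y0 = 1967, 165                      # assumed origins, as in the table

# Fit in the shifted variables X = x - x0, Y = y - y0
A, B = fit_line([x - x0 for x in years], [y - y0 for y in values])

# Convert Y = A + B*X back to y = intercept + slope*x
slope = B
intercept = y0 + A - B * x0

# Direct fit on the original values, for comparison
direct_intercept, direct_slope = fit_line(years, values)
print(slope, intercept)                 # 20.5 -40158.5
```

Both routes give the same slope and intercept; the shifted sums are simply much smaller and easier to handle by hand.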
Example 8
Fit a straight line to the following data
𝑥 0 5 10 15 20 25
𝑦 12 12 17 22 24 30
Example 9
Fit a straight line to the following data. Also estimate the production in 1967
𝑦𝑒𝑎𝑟 1951 1961 1971 1981 1991
𝑝𝑟𝑜𝑑𝑢𝑐𝑡𝑖𝑜𝑛 10 12 8 10 15
Example 10
Fit a straight line to the following data
𝑥 1 2 3 4 5 6
𝑦 83 92 71 90 160 191
Example 11
Fit a straight line to the following data. Estimate 𝑦 when 𝑥 = 12
𝑥 1 2 3 4 5 6 7 8 9 10
𝑦 52.5 58.7 65.0 70.2 75.4 81.1 87.2 95.5 102.2 108.4
𝐴𝑛𝑠 𝑦 = 119.66
Fitting a Parabola or Fitting a Second-Degree Curve
The equation of the parabola is
𝑦 = 𝑎 + 𝑏𝑥 + 𝑐𝑥²
where the values of 𝑎, 𝑏 and 𝑐 are calculated by solving the Normal Equations
∑ 𝑦 = 𝑛𝑎 + 𝑏 ∑ 𝑥 + 𝑐 ∑ 𝑥²
∑ 𝑥𝑦 = 𝑎 ∑ 𝑥 + 𝑏 ∑ 𝑥² + 𝑐 ∑ 𝑥³
∑ 𝑥²𝑦 = 𝑎 ∑ 𝑥² + 𝑏 ∑ 𝑥³ + 𝑐 ∑ 𝑥⁴
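The three Normal Equations form a 3 × 3 linear system in 𝑎, 𝑏 and 𝑐. A minimal Python sketch of building and solving that system is given below. It is an illustration only: the names `solve3` and `fit_parabola` are assumptions, and Gaussian elimination with partial pivoting is just one reasonable way to solve the system.

```python
def solve3(m, v):
    """Solve a 3x3 linear system m * x = v by Gaussian elimination."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]   # augmented matrix
    n = 3
    for col in range(n):
        # partial pivoting: bring the largest entry of this column to the diagonal
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):                     # back substitution
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (a[i][n] - s) / a[i][i]
    return x


def fit_parabola(xs, ys):
    """Least-squares parabola y = a + b*x + c*x^2 from the normal equations."""
    n = len(xs)
    s1 = sum(xs)
    s2 = sum(x ** 2 for x in xs)
    s3 = sum(x ** 3 for x in xs)
    s4 = sum(x ** 4 for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2y = sum(x * x * y for x, y in zip(xs, ys))
    return tuple(solve3([[n, s1, s2], [s1, s2, s3], [s2, s3, s4]],
                        [sy, sxy, sx2y]))


# Data lying exactly on y = 1 - x + x^2
a, b, c = fit_parabola([0, 1, 2, 3, 4, 5, 6], [1, 1, 3, 7, 13, 21, 31])
print(round(a, 4), round(b, 4), round(c, 4))
```

For hand calculation it is usually easier to shift the origin of x so that ∑𝑥 = ∑𝑥³ = 0, as several examples in this section do; the program does not need that simplification.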
Example 1
Fit a second-degree Parabola to the following data
𝑥 1 2 3 4 5
𝑦 25 28 33 39 46
Solution
𝑥 𝑦 𝑥𝑦 𝑥2 𝑥 2𝑦 𝑥3 𝑥4
1 25 25 1 25 1 1
2 28 56 4 112 8 16
3 33 99 9 297 27 81
4 39 156 16 624 64 256
5 46 230 25 1150 125 625
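∑ 𝑥 = 15 ∑ 𝑦 = 171 ∑ 𝑥𝑦 = 566 ∑ 𝑥² = 55 ∑ 𝑥²𝑦 = 2208 ∑ 𝑥³ = 225 ∑ 𝑥⁴ = 979
Substituting in the Normal Equations
∑ 𝑦 = 𝑛𝑎 + 𝑏 ∑ 𝑥 + 𝑐 ∑ 𝑥² 171 = 5𝑎 + 15𝑏 + 55𝑐 ……………(1)
∑ 𝑥𝑦 = 𝑎 ∑ 𝑥 + 𝑏 ∑ 𝑥² + 𝑐 ∑ 𝑥³ 566 = 15𝑎 + 55𝑏 + 225𝑐 ……………(2)
∑ 𝑥²𝑦 = 𝑎 ∑ 𝑥² + 𝑏 ∑ 𝑥³ + 𝑐 ∑ 𝑥⁴ 2208 = 55𝑎 + 225𝑏 + 979𝑐 ……………(3)
Solving (1), (2) & (3) we get 𝑎 = 22.8 𝑏 = 1.4429 𝑐 = 0.6429
𝑦 = 22.8 + 1.4429𝑥 + 0.6429𝑥²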
Example 2
Fit a Parabola to the following data
𝑥 -2 -1 0 1 2
𝑦 1.0 1.8 1.3 2.5 6.3
Solution
𝑥 𝑦 𝑥𝑦 𝑥2 𝑥 2𝑦 𝑥3 𝑥4
-2 1.0 -2 4 4 -8 16
-1 1.8 -1.8 1 1.8 -1 1
0 1.3 0 0 0 0 0
1 2.5 2.5 1 2.5 1 1
2 6.3 12.6 4 25.2 8 16
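∑ 𝑥 = 0 ∑ 𝑦 = 12.9 ∑ 𝑥𝑦 = 11.3 ∑ 𝑥² = 10 ∑ 𝑥²𝑦 = 33.5 ∑ 𝑥³ = 0 ∑ 𝑥⁴ = 34
Substituting in the Normal Equations
∑ 𝑦 = 𝑛𝑎 + 𝑏 ∑ 𝑥 + 𝑐 ∑ 𝑥² 12.9 = 5𝑎 + 0𝑏 + 10𝑐 ……………(1)
∑ 𝑥𝑦 = 𝑎 ∑ 𝑥 + 𝑏 ∑ 𝑥² + 𝑐 ∑ 𝑥³ 11.3 = 0𝑎 + 10𝑏 + 0𝑐 ……………(2)
∑ 𝑥²𝑦 = 𝑎 ∑ 𝑥² + 𝑏 ∑ 𝑥³ + 𝑐 ∑ 𝑥⁴ 33.5 = 10𝑎 + 0𝑏 + 34𝑐 ……………(3)
Solving (1), (2) & (3) we get 𝑎 = 1.48 𝑏 = 1.13 𝑐 = 0.55
𝑦 = 1.48 + 1.13𝑥 + 0.55𝑥²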
Example 3
Solution
Taking 𝑋 = (𝑥 − 1951)/10 and 𝑌 = 𝑦
𝑥 𝑋 𝑌 𝑋𝑌 𝑋² 𝑋²𝑌 𝑋³ 𝑋⁴
1921 -3 3 -9 9 27 -27 81
1931 -2 5 -10 4 20 -8 16
1941 -1 9 -9 1 9 -1 1
1951 0 10 0 0 0 0 0
1961 1 12 12 1 12 1 1
1971 2 14 28 4 56 8 16
1981 3 15 45 9 135 27 81
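∑ 𝑋 = 0 ∑ 𝑌 = 68 ∑ 𝑋𝑌 = 57 ∑ 𝑋² = 28 ∑ 𝑋²𝑌 = 259 ∑ 𝑋³ = 0 ∑ 𝑋⁴ = 196
Substituting in the Normal Equations
∑ 𝑌 = 𝑛𝑎 + 𝑏 ∑ 𝑋 + 𝑐 ∑ 𝑋² 68 = 7𝑎 + 0𝑏 + 28𝑐 ……………(1)
∑ 𝑋𝑌 = 𝑎 ∑ 𝑋 + 𝑏 ∑ 𝑋² + 𝑐 ∑ 𝑋³ 57 = 0𝑎 + 28𝑏 + 0𝑐 ……………(2)
∑ 𝑋²𝑌 = 𝑎 ∑ 𝑋² + 𝑏 ∑ 𝑋³ + 𝑐 ∑ 𝑋⁴ 259 = 28𝑎 + 0𝑏 + 196𝑐 ……………(3)
Solving (1), (2) & (3) we get 𝑎 = 10.33 𝑏 = 2.0357 𝑐 = −0.1547
𝑌 = 10.33 + 2.0357𝑋 − 0.1547𝑋²
When 𝑋 = 2.4 (that is, 𝑥 = 1975)
𝑌 = 10.33 + 2.0357(2.4) − 0.1547(2.4)²
𝑦 = production = 14.32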
Example 4
Fit a second-degree curve to the following data. Estimate the value of
𝑦 𝑤ℎ𝑒𝑛 𝑥 = 80
𝑥 10 20 30 40 50 60 70
𝑦 20 60 70 80 90 100 100
Solution
𝑥 𝑦 𝑥𝑦 𝑥2 𝑥 2𝑦 𝑥3 𝑥4
10 20 200 100 2000 1000 10000
20 60 1200 400 24000 8000 160000
30 70 2100 900 63000 27000 810000
40 80 3200 1600 128000 64000 2560000
50 90 4500 2500 225000 125000 6250000
60 100 6000 3600 360000 216000 12960000
70 100 7000 4900 490000 343000 24010000
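∑ 𝑥 = 280 ∑ 𝑦 = 520 ∑ 𝑥𝑦 = 24200 ∑ 𝑥² = 14000 ∑ 𝑥²𝑦 = 1292000 ∑ 𝑥³ = 784000 ∑ 𝑥⁴ = 46760000
Substituting in the Normal Equations
520 = 7𝑎 + 280𝑏 + 14000𝑐 ……………(1)
24200 = 280𝑎 + 14000𝑏 + 784000𝑐 ……………(2)
1292000 = 14000𝑎 + 784000𝑏 + 46760000𝑐 ……………(3)
Solving (1), (2) & (3) we get 𝑎 = −2.8571 𝑏 = 3.1190 𝑐 = −0.0238
𝑦 = −2.8571 + 3.1190𝑥 − 0.0238𝑥²
When 𝑥 = 80, 𝑦 = 94.3429 with these rounded coefficients (94.2857 if the coefficients are not rounded)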
Example 5
Solution
𝑥 𝑦 𝑥𝑦 𝑥2 𝑥 2𝑦 𝑥3 𝑥4
-2 -3.150 6.3 4 -12.6 -8 16
-1 -1.390 1.39 1 -1.39 -1 1
0 0.620 0 0 0 0 0
1 2.880 2.88 1 2.88 1 1
2 5.378 10.756 4 21.512 8 16
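∑ 𝑥 = 0 ∑ 𝑦 = 4.338 ∑ 𝑥𝑦 = 21.326 ∑ 𝑥² = 10 ∑ 𝑥²𝑦 = 10.402 ∑ 𝑥³ = 0 ∑ 𝑥⁴ = 34
Substituting in the Normal Equations
∑ 𝑦 = 𝑛𝑎 + 𝑏 ∑ 𝑥 + 𝑐 ∑ 𝑥² 4.338 = 5𝑎 + 0𝑏 + 10𝑐 ……………(1)
∑ 𝑥𝑦 = 𝑎 ∑ 𝑥 + 𝑏 ∑ 𝑥² + 𝑐 ∑ 𝑥³ 21.326 = 0𝑎 + 10𝑏 + 0𝑐 ……………(2)
∑ 𝑥²𝑦 = 𝑎 ∑ 𝑥² + 𝑏 ∑ 𝑥³ + 𝑐 ∑ 𝑥⁴ 10.402 = 10𝑎 + 0𝑏 + 34𝑐 ……………(3)
Solving (1), (2) & (3) we get 𝑎 = 0.6210 𝑏 = 2.1326 𝑐 = 0.1233
𝑦 = 0.6210 + 2.1326𝑥 + 0.1233𝑥²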
Example 6
𝑥 0 1 2 3 4 5 6
𝑦 1 1 3 7 13 21 31
Solution
𝑥 𝑦 𝑥𝑦 𝑥2 𝑥 2𝑦 𝑥3 𝑥4
0 1 0 0 0 0 0
1 1 1 1 1 1 1
2 3 6 4 12 8 16
3 7 21 9 63 27 81
4 13 52 16 208 64 256
5 21 105 25 525 125 625
6 31 186 36 1116 216 1296
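∑ 𝑥 = 21 ∑ 𝑦 = 77 ∑ 𝑥𝑦 = 371 ∑ 𝑥² = 91 ∑ 𝑥²𝑦 = 1925 ∑ 𝑥³ = 441 ∑ 𝑥⁴ = 2275
Substituting in the Normal Equations
∑ 𝑦 = 𝑛𝑎 + 𝑏 ∑ 𝑥 + 𝑐 ∑ 𝑥² 77 = 7𝑎 + 21𝑏 + 91𝑐 ……………(1)
∑ 𝑥𝑦 = 𝑎 ∑ 𝑥 + 𝑏 ∑ 𝑥² + 𝑐 ∑ 𝑥³ 371 = 21𝑎 + 91𝑏 + 441𝑐 ……………(2)
∑ 𝑥²𝑦 = 𝑎 ∑ 𝑥² + 𝑏 ∑ 𝑥³ + 𝑐 ∑ 𝑥⁴ 1925 = 91𝑎 + 441𝑏 + 2275𝑐 ……………(3)
Solving (1), (2) & (3) we get 𝑎 = 1 𝑏 = −1 𝑐 = 1
𝑦 = 1 − 𝑥 + 𝑥²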
Example 7
Example 8
𝑦 = −1 + 3.55𝑥 + 0.27𝑥 2
Example 9
Fit a Second-degree curve to the following data. Also estimate the production
in 1982
𝑦𝑒𝑎𝑟 1974 1975 1976 1977 1978 1979 1980 1981
𝑝𝑟𝑜𝑑𝑢𝑐𝑡𝑖𝑜𝑛 12 14 26 42 40 50 52 53
CORRELATION
Correlation is the study of the existence, magnitude and direction of the relationship
between two or more variables.
Scatter diagram
One of the simplest methods of studying correlation between two variables is to construct a
scatter diagram. To obtain a scatter diagram, one variable is plotted along the x-axis and
the other along the y-axis on graph paper. By plotting data in this way, we get points
which are generally scattered but which show a pattern. The way in which the points are
scattered indicates the degree and direction of correlation. If the points lie close together
along a band, we infer that the variables are correlated. If they are spread widely, we infer
that the variables are not correlated. Moreover, if the points lie in a narrow strip rising from
the bottom left to the top right, we say that there is a high degree of positive correlation.
If the points lie in a narrow strip falling from the top left to the bottom right, we
say that there is a high degree of negative correlation. If the points are spread all over the
graph, we say that there is no correlation.
The scatter diagram method is descriptive in nature and gives only a general idea of
correlation. The most commonly used method which gives a numerical measure is the
one suggested by Karl Pearson.
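Karl Pearson's coefficient of correlation between 𝑥 and 𝑦 is
\[ r = \frac{\operatorname{Cov}(x, y)}{\sigma_x \sigma_y} \]
where
\[ \operatorname{Cov}(x, y) = \frac{1}{N} \sum (x - \bar{x})(y - \bar{y}) \]
\[ \sigma_x^2 = \frac{1}{N} \sum (x - \bar{x})^2 \quad \text{is called the variance of } x \]
\[ \sigma_y^2 = \frac{1}{N} \sum (y - \bar{y})^2 \quad \text{is called the variance of } y \]
Substituting,
\[ r = \frac{\frac{1}{N} \sum (x - \bar{x})(y - \bar{y})}{\sqrt{\frac{1}{N} \sum (x - \bar{x})^2}\,\sqrt{\frac{1}{N} \sum (y - \bar{y})^2}} \]
\[ r = \frac{\sum xy - N\bar{x}\bar{y}}{\sqrt{\sum x^2 - N\bar{x}^2}\,\sqrt{\sum y^2 - N\bar{y}^2}} \]
\[ r = \frac{\sum xy - \frac{\sum x \sum y}{N}}{\sqrt{\sum x^2 - \frac{(\sum x)^2}{N}}\,\sqrt{\sum y^2 - \frac{(\sum y)^2}{N}}} \]
Note: −1 ≤ 𝑟 ≤ 1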
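The last form of the formula needs only the five sums ∑𝑥, ∑𝑦, ∑𝑥𝑦, ∑𝑥² and ∑𝑦², which is what the Direct Method tabulates. A small Python sketch of that calculation follows; the function name `pearson_r` is an assumption made for this illustration and is not part of the notes.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Karl Pearson's r computed from the raw sums (direct method)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    num = sxy - sx * sy / n
    den = sqrt(sxx - sx * sx / n) * sqrt(syy - sy * sy / n)
    if den == 0:
        raise ValueError("r is undefined when one variable is constant")
    return num / den


# Data of Example 1 of the Direct Method
r = pearson_r([2, 4, 5, 6, 8, 11], [18, 12, 10, 8, 7, 5])
print(round(r, 4))  # -0.9203
```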
Direct Method
Example 1
Find Karl Pearson’s Coefficient of Correlation from the following data
𝑥 2 4 5 6 8 11
𝑦 18 12 10 8 7 5
Solution
𝑥 𝑦 𝑥𝑦 𝑥2 𝑦2
2 18 36 4 324
4 12 48 16 144
5 10 50 25 100
6 8 48 36 64
8 7 56 64 49
11 5 55 121 25
∑ 𝑥 = 36 ∑ 𝑦 = 60 ∑ 𝑥 𝑦 = 293 ∑ 𝑥 2 = 266 ∑ 𝑦 2 = 706
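\[ r = \frac{\sum xy - \frac{\sum x \sum y}{N}}{\sqrt{\sum x^2 - \frac{(\sum x)^2}{N}}\,\sqrt{\sum y^2 - \frac{(\sum y)^2}{N}}} \]
\[ r = \frac{293 - \frac{(36)(60)}{6}}{\sqrt{266 - \frac{(36)^2}{6}}\,\sqrt{706 - \frac{(60)^2}{6}}} = -0.9203 \]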
Example 2
Find Karl Pearson’s Coefficient of Correlation from the following data
𝑥 8 8 7 5 6 2
𝑦 3 4 10 13 22 8
Solution
𝑥 𝑦 𝑥𝑦 𝑥2 𝑦2
8 3 24 64 9
8 4 32 64 16
7 10 70 49 100
5 13 65 25 169
6 22 132 36 484
2 8 16 4 64
∑ 𝑥 = 36 ∑ 𝑦 = 60 ∑ 𝑥 𝑦 = 339 ∑ 𝑥 2 = 242 ∑ 𝑦 2 = 842
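\[ r = \frac{\sum xy - \frac{\sum x \sum y}{N}}{\sqrt{\sum x^2 - \frac{(\sum x)^2}{N}}\,\sqrt{\sum y^2 - \frac{(\sum y)^2}{N}}} \]
\[ r = \frac{339 - \frac{(36)(60)}{6}}{\sqrt{242 - \frac{(36)^2}{6}}\,\sqrt{842 - \frac{(60)^2}{6}}} = -0.2647 \]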
Example 3
Solution
\[ \bar{x} = \frac{\sum x}{N} = \frac{770}{7} = 110 \]
\[ \bar{y} = \frac{\sum y}{N} = \frac{770}{7} = 110 \]
x   y   (x − x̄)   (y − ȳ)   (x − x̄)(y − ȳ)   (x − x̄)²   (y − ȳ)²
100 110 -10 0 0 100 0
102 100 -8 -10 80 64 100
108 104 -2 -6 12 4 36
111 108 1 -2 -2 1 4
115 112 5 2 10 25 4
116 116 6 6 36 36 36
118 120 8 10 80 64 100
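∑(x − x̄)(y − ȳ) = 216   ∑(x − x̄)² = 294   ∑(y − ȳ)² = 280
\[ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2}\,\sqrt{\sum (y - \bar{y})^2}} = \frac{216}{\sqrt{294}\,\sqrt{280}} = 0.7528 \]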
Example 4
Solution
\[ \bar{x} = \frac{\sum x}{N} = \frac{544}{8} = 68 \]
\[ \bar{y} = \frac{\sum y}{N} = \frac{552}{8} = 69 \]
x   y   (x − x̄)   (y − ȳ)   (x − x̄)(y − ȳ)   (x − x̄)²   (y − ȳ)²
65 67 -3 -2 6 9 4
66 68 -2 -1 2 4 1
67 65 -1 -4 4 1 16
67 68 -1 -1 1 1 1
68 72 0 3 0 0 9
69 72 1 3 3 1 9
70 69 2 0 0 4 0
72 71 4 2 8 16 4
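∑(x − x̄)(y − ȳ) = 24   ∑(x − x̄)² = 36   ∑(y − ȳ)² = 44
\[ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2}\,\sqrt{\sum (y - \bar{y})^2}} = \frac{24}{\sqrt{36}\,\sqrt{44}} = 0.6030 \]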
Assumed Mean Method
Example 5
Find Karl Pearson’s Coefficient of Correlation from the following data
𝑥 62 64 65 69 70 71 72 74
𝑦 126 125 139 145 165 152 180 208
Solution
𝑑𝑥 = 𝑥 − 70 𝑑𝑦 = 𝑦 − 165
𝑥 𝑑𝑥 𝑦 𝑑𝑦 𝑑𝑥 𝑑𝑦 𝑑𝑥 2 𝑑𝑦 2
62 -8 126 -39 312 64 1521
64 -6 125 -40 240 36 1600
65 -5 139 -26 130 25 676
69 -1 145 -20 20 1 400
70 0 165 0 0 0 0
71 1 152 -13 -13 1 169
72 2 180 15 30 4 225
74 4 208 43 172 16 1849
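∑ 𝑑𝑥 = −13 ∑ 𝑑𝑦 = −80 ∑ 𝑑𝑥 𝑑𝑦 = 891 ∑ 𝑑𝑥² = 147 ∑ 𝑑𝑦² = 6440
\[ r = \frac{\sum d_x d_y - \frac{\sum d_x \sum d_y}{N}}{\sqrt{\sum d_x^2 - \frac{(\sum d_x)^2}{N}}\,\sqrt{\sum d_y^2 - \frac{(\sum d_y)^2}{N}}} \]
\[ r = \frac{891 - \frac{(-13)(-80)}{8}}{\sqrt{147 - \frac{(-13)^2}{8}}\,\sqrt{6440 - \frac{(-80)^2}{8}}} = 0.9031 \]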
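The Assumed Mean Method works because 𝑟 does not change when a constant is subtracted from every 𝑥 or every 𝑦. The Python sketch below illustrates this on the data of this example; the function name `pearson_r` and the variable names are assumptions made for the illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Karl Pearson's r computed from the raw sums."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (sxy - sx * sy / n) / (sqrt(sxx - sx * sx / n) * sqrt(syy - sy * sy / n))


x = [62, 64, 65, 69, 70, 71, 72, 74]
y = [126, 125, 139, 145, 165, 152, 180, 208]

# Deviations from the assumed means 70 and 165, as in the table above
dx = [v - 70 for v in x]
dy = [v - 165 for v in y]

r_raw = pearson_r(x, y)        # from the original values
r_dev = pearson_r(dx, dy)      # from the deviations
print(round(r_raw, 3), round(r_dev, 3))
```

Any convenient assumed means may be used; choosing values near the middle of the data simply keeps the tabulated numbers small.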
Example 6
Find Karl Pearson’s Coefficient of Correlation from the following data
𝑥 52 42 38 42 45 42 44 40 46 44 43 40
𝑦 10 26 41 29 27 27 19 18 19 31 29 33
Solution
𝑑𝑥 = 𝑥 − 42
𝑑𝑦 = 𝑦 − 27
𝑥 𝑑𝑥 𝑦 𝑑𝑦 𝑑 𝑥 𝑑𝑦 𝑑𝑥 2 𝑑𝑦 2
52 10 10 -17 -170 100 289
42 0 26 -1 0 0 1
38 -4 41 14 -56 16 196
42 0 29 2 0 0 4
45 3 27 0 0 9 0
42 0 27 0 0 0 0
44 2 19 -8 -16 4 64
40 -2 18 -9 18 4 81
46 4 19 -8 -32 16 64
44 2 31 4 8 4 16
43 1 29 2 2 1 4
40 -2 33 6 -12 4 36
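∑ 𝑑𝑥 = 14 ∑ 𝑑𝑦 = −15 ∑ 𝑑𝑥 𝑑𝑦 = −258 ∑ 𝑑𝑥² = 158 ∑ 𝑑𝑦² = 755
\[ r = \frac{\sum d_x d_y - \frac{\sum d_x \sum d_y}{N}}{\sqrt{\sum d_x^2 - \frac{(\sum d_x)^2}{N}}\,\sqrt{\sum d_y^2 - \frac{(\sum d_y)^2}{N}}} \]
\[ r = \frac{-258 - \frac{(14)(-15)}{12}}{\sqrt{158 - \frac{(14)^2}{12}}\,\sqrt{755 - \frac{(-15)^2}{12}}} = -0.7446 \]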
Example 7
Find Karl Pearson’s Coefficient of Correlation from the following data
𝒙 2 3 4 7 4
𝐲 8 7 3 1 1
Solution
𝑥 𝑦 𝑥𝑦 𝑥2 𝑦2
2 8 16 4 64
3 7 21 9 49
4 3 12 16 9
7 1 7 49 1
4 1 4 16 1
∑ 𝑥 = 20 ∑ 𝑦 = 20 ∑ 𝑥 𝑦 = 60 ∑ 𝑥 2 = 94 ∑ 𝑦 2 = 124
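\[ r = \frac{\sum xy - \frac{\sum x \sum y}{N}}{\sqrt{\sum x^2 - \frac{(\sum x)^2}{N}}\,\sqrt{\sum y^2 - \frac{(\sum y)^2}{N}}} \]
\[ r = \frac{60 - \frac{(20)(20)}{5}}{\sqrt{94 - \frac{(20)^2}{5}}\,\sqrt{124 - \frac{(20)^2}{5}}} = -0.8058 \]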
Example 8
𝒙 30 33 25 10 33 75 40 85 90 95
𝑦 68 65 80 85 70 30 55 18 15 10
Solution
𝑥 𝑦 𝑥𝑦 𝑥2 𝑦2
30 68 2040 900 4624
33 65 2145 1089 4225
25 80 2000 625 6400
10 85 850 100 7225
33 70 2310 1089 4900
75 30 2250 5625 900
40 55 2200 1600 3025
85 18 1530 7225 324
90 15 1350 8100 225
95 10 950 9025 100
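∑ 𝑥 = 516 ∑ 𝑦 = 496 ∑ 𝑥𝑦 = 17625 ∑ 𝑥² = 35378 ∑ 𝑦² = 31948
\[ r = \frac{\sum xy - \frac{\sum x \sum y}{N}}{\sqrt{\sum x^2 - \frac{(\sum x)^2}{N}}\,\sqrt{\sum y^2 - \frac{(\sum y)^2}{N}}} \]
\[ r = \frac{17625 - \frac{(516)(496)}{10}}{\sqrt{35378 - \frac{(516)^2}{10}}\,\sqrt{31948 - \frac{(496)^2}{10}}} = -0.9937 \]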
Example 9
𝒙 10 12 14 15 16 17 18 10 14 15
𝑦 17 16 15 12 10 9 8 15 13 12
Solution
𝑥 𝑦 𝑥𝑦 𝑥2 𝑦2
10 17 170 100 289
12 16 192 144 256
14 15 210 196 225
15 12 180 225 144
16 10 160 256 100
17 9 153 289 81
18 8 144 324 64
10 15 150 100 225
14 13 182 196 169
15 12 180 225 144
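∑ 𝑥 = 141 ∑ 𝑦 = 127 ∑ 𝑥𝑦 = 1721 ∑ 𝑥² = 2055 ∑ 𝑦² = 1697
\[ r = \frac{\sum xy - \frac{\sum x \sum y}{N}}{\sqrt{\sum x^2 - \frac{(\sum x)^2}{N}}\,\sqrt{\sum y^2 - \frac{(\sum y)^2}{N}}} \]
\[ r = \frac{1721 - \frac{(141)(127)}{10}}{\sqrt{2055 - \frac{(141)^2}{10}}\,\sqrt{1697 - \frac{(127)^2}{10}}} = -0.9292 \]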
Example 10
Solution
𝑥 𝑦 𝑥𝑦 𝑥2 𝑦2
61 64 3904 3721 4096
63 62 3906 3969 3844
65 65 4225 4225 4225
67 70 4690 4489 4900
69 72 4968 4761 5184
∑ 𝑥 = 325 ∑ 𝑦 = 333 ∑ 𝑥 𝑦 = 21693 ∑ 𝑥 2 = 21165 ∑ 𝑦 2 = 22249
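\[ r = \frac{\sum xy - \frac{\sum x \sum y}{N}}{\sqrt{\sum x^2 - \frac{(\sum x)^2}{N}}\,\sqrt{\sum y^2 - \frac{(\sum y)^2}{N}}} \]
\[ r = \frac{21693 - \frac{(325)(333)}{5}}{\sqrt{21165 - \frac{(325)^2}{5}}\,\sqrt{22249 - \frac{(333)^2}{5}}} = 0.8994 \]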
Example 11
Find Karl Pearson’s Coefficient of Correlation from the following data
𝒙 3 5 4 6 2
𝑦 3 4 5 2 6
Solution
𝑥 𝑦 𝑥𝑦 𝑥2 𝑦2
3 3 9 9 9
5 4 20 25 16
4 5 20 16 25
6 2 12 36 4
2 6 12 4 36
∑ 𝑥 = 20 ∑ 𝑦 = 20 ∑ 𝑥 𝑦 = 73 ∑ 𝑥 2 = 90 ∑ 𝑦 2 = 90
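\[ r = \frac{\sum xy - \frac{\sum x \sum y}{N}}{\sqrt{\sum x^2 - \frac{(\sum x)^2}{N}}\,\sqrt{\sum y^2 - \frac{(\sum y)^2}{N}}} \]
\[ r = \frac{73 - \frac{(20)(20)}{5}}{\sqrt{90 - \frac{(20)^2}{5}}\,\sqrt{90 - \frac{(20)^2}{5}}} = -0.7 \]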
Example 12
Find Karl Pearson’s Coefficient of Correlation from the following data
𝒙 23 27 28 29 30 31 33 35 36 39
𝑦 18 22 23 24 25 26 28 29 30 32
𝐴𝑛𝑠 𝑟 = 0.9948
Example 13
Find Karl Pearson’s Coefficient of Correlation from the following data
𝒙 100 200 300 400 500
𝑦 30 40 50 60 70
𝐴𝑛𝑠 𝑟 = 1
Example 14
Find Karl Pearson’s Coefficient of Correlation from the following data
𝒙 100 98 85 92 90 84 88 90 93 95
𝑦 500 610 700 630 670 800 800 750 700 690
𝐴𝑛𝑠 𝑟 = −0.8179
Example 15
A sample of 25 pairs of values of 𝑥 𝑎𝑛𝑑 𝑦 lead to the following results.
Later on, it was found that two pair of values were taken as (8,14) and (8,6) instead of correct
values (8,12) and (6,8). Find the corrected Correlation coefficient between 𝑥 𝑎𝑛𝑑 𝑦.
Solution
We know that
\[ r = \frac{\sum xy - \frac{\sum x \sum y}{N}}{\sqrt{\sum x^2 - \frac{(\sum x)^2}{N}}\,\sqrt{\sum y^2 - \frac{(\sum y)^2}{N}}} \]
Corrected ∑ 𝑥𝑦 = 500 − (incorrect products) + (correct products)
= 500 − (8 × 14 + 8 × 6) + (8 × 12 + 6 × 8)
= 500 − 160 + 144 = 484
\[ \text{Corrected } r = \frac{484 - \frac{(125)(100)}{25}}{\sqrt{732 - \frac{(125)^2}{25}}\,\sqrt{425 - \frac{(100)^2}{25}}} = -0.3093 \]
Example 16
Calculate the Coefficient of Correlation between 𝑥 𝑎𝑛𝑑 𝑦 from the following data
N=10
∑ 𝑥 = 225, ∑ 𝑦 = 189, ∑(𝑥 − 22)² = 85, ∑(𝑦 − 19)² = 25, ∑(𝑥 − 22)(𝑦 − 19) = 42
Solution
𝑑𝑥 = 𝑥 − 22 𝑑𝑦 = 𝑦 −19
∑ 𝑑𝑥 2 = 85 ∑ 𝑑𝑦 2 = 25 ∑ 𝑑𝑥 𝑑𝑦 = 42
𝑑𝑥 = 𝑥 − 22
∑ 𝑑𝑥 = ∑(𝑥 − 22) = ∑ 𝑥 − 𝑁(22)
𝑁 = 10
∑ 𝑑𝑥 = 225 − 10(22) = 5
𝑑𝑦 = 𝑦 − 19
∑ 𝑑𝑦 = ∑(𝑦 − 19) = ∑ 𝑦 − 𝑁(19)
∑ 𝑑𝑦 = 189 − 10(19) = −1
\[ r = \frac{\sum d_x d_y - \frac{\sum d_x \sum d_y}{N}}{\sqrt{\sum d_x^2 - \frac{(\sum d_x)^2}{N}}\,\sqrt{\sum d_y^2 - \frac{(\sum d_y)^2}{N}}} \]
\[ r = \frac{42 - \frac{5(-1)}{10}}{\sqrt{85 - \frac{25}{10}}\,\sqrt{25 - \frac{1}{10}}} = 0.9376 \]
Example 17
Given No. of pairs of observation=10
𝑥 series standard deviation=22.70
𝑦 series standard deviation=9.592
Summation of the products of corresponding deviations of 𝑥 𝑎𝑛𝑑 𝑦 from their respective
actual means=-1439.find 𝑟.
Solution
\[ r = \frac{\operatorname{Cov}(x, y)}{\sigma_x \sigma_y} \]
\[ \operatorname{Cov}(x, y) = \frac{1}{N} \sum (x - \bar{x})(y - \bar{y}) = -\frac{1439}{10} = -143.9 \]
𝜎𝑥 = 22.70 𝜎𝑦 = 9.592
\[ r = \frac{-143.9}{(22.70)(9.592)} = -0.6608 \]
Example 18
Given 𝑟 = 0.5 ∑ 𝑥𝑦 = 60 𝜎𝑦 = 4 𝑎𝑛𝑑 ∑ 𝑥 2 = 90
𝑤ℎ𝑒𝑟𝑒 𝑥 𝑎𝑛𝑑 𝑦 𝑑𝑒𝑛𝑜𝑡𝑒𝑠 𝑡ℎ𝑒 𝑑𝑒𝑣𝑖𝑎𝑡𝑖𝑜𝑛 𝑓𝑟𝑜𝑚 𝑡ℎ𝑒 𝑟𝑒𝑠𝑝𝑒𝑐𝑡𝑖𝑣𝑒 𝑎𝑟𝑖𝑡ℎ𝑚𝑒𝑡𝑖𝑐 𝑚𝑒𝑎𝑛. 𝐶𝑎𝑙𝑐𝑢𝑙𝑎𝑡𝑒 𝑡ℎ𝑒
number of observations.
Solution
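Here 𝑥 and 𝑦 stand for the deviations (x − x̄) and (y − ȳ), so ∑ 𝑥𝑦 = ∑(x − x̄)(y − ȳ) and ∑ 𝑥² = ∑(x − x̄)².
\[ r = \frac{\operatorname{Cov}(x, y)}{\sigma_x \sigma_y} = \frac{\frac{1}{N} \sum (x - \bar{x})(y - \bar{y})}{\sqrt{\frac{1}{N} \sum (x - \bar{x})^2}\;\sigma_y} \]
\[ 0.5 = \frac{\frac{1}{N}(60)}{\sqrt{\frac{90}{N}}\,(4)} = \frac{60}{\sqrt{N}\,\sqrt{90}\,(4)} \]
\[ \sqrt{N} = \frac{60}{(0.5)(\sqrt{90})(4)} = 3.1622 \]
𝑁 = 10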
Example 19
Coefficient of Correlation between two variables is 0.4. Their Covariance is 12 and the
variance of 𝑥 is 25. Find the standard deviation of y.
Solution
\[ r = \frac{\operatorname{Cov}(x, y)}{\sigma_x \sigma_y} \]
Variance of 𝑥 = 𝜎𝑥² = 25, so 𝜎𝑥 = 5; 𝜎𝑦 = ?
\[ 0.4 = \frac{12}{(5)\,\sigma_y} \]
\[ \sigma_y = \frac{12}{(5)(0.4)} = 6 \]
Example 20
Calculate the Coefficient of Correlation between 𝑥 𝑎𝑛𝑑 𝑦 from the following data
N=10 ∑ 𝑥 = 140 , ∑ 𝑦 = 150 , ∑(𝑥 − 10)2 = 180 , ∑(𝑦 − 15)2 = 215, ∑(𝑥 − 10)(𝑦 −
15) = 60
Solution
𝑑𝑥 = 𝑥 − 10 𝑑𝑦 = 𝑦 −15
∑ 𝑑𝑥 2 = 180 ∑ 𝑑𝑦 2 = 215 ∑ 𝑑𝑥 𝑑𝑦 = 60
𝑑𝑥 = 𝑥 − 10
∑ 𝑑𝑥 = ∑(𝑥 − 10) = ∑ 𝑥 − 𝑁(10)
𝑁 = 10
∑ 𝑑𝑥 = 140 − 100 = 40
𝑑𝑦 = 𝑦 − 15
∑ 𝑑𝑦 = ∑(𝑦 − 15) = ∑ 𝑦 − 𝑁(15)
∑ 𝑑𝑦 = 150 − 150 = 0
\[ r = \frac{\sum d_x d_y - \frac{\sum d_x \sum d_y}{N}}{\sqrt{\sum d_x^2 - \frac{(\sum d_x)^2}{N}}\,\sqrt{\sum d_y^2 - \frac{(\sum d_y)^2}{N}}} \]
\[ r = \frac{60 - \frac{40(0)}{10}}{\sqrt{180 - \frac{40^2}{10}}\,\sqrt{215 - \frac{0^2}{10}}} = 0.9149 \]
Example 21
A Computer while calculating the correlation coefficient between two variables 𝑥 𝑎𝑛𝑑 𝑦
from 25 observations obtained the following results.
where x and y denote the actual values of x and y. Find the value of r.
Solution
\[ r = \frac{\sum xy - \frac{\sum x \sum y}{N}}{\sqrt{\sum x^2 - \frac{(\sum x)^2}{N}}\,\sqrt{\sum y^2 - \frac{(\sum y)^2}{N}}} \]
\[ r = \frac{508 - \frac{(125)(100)}{25}}{\sqrt{650 - \frac{(125)^2}{25}}\,\sqrt{960 - \frac{(100)^2}{25}}} = 0.0676 \]
SPEARMAN’S RANK CORRELATION
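The method developed by Spearman is simpler than Karl Pearson's method as it
depends on the ranks of the items and not on their actual values. The coefficient is
denoted by R. The value of R lies between −1 and +1.
When the ranks are not equal (no two items share a rank)
\[ R = 1 - \frac{6 \sum d^2}{n^3 - n} \]
where 𝑑 is the difference between the two ranks of an item and 𝑛 is the number of pairs.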
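The ranking and the formula can be sketched in Python as follows. This is a minimal illustration for data without repeated values; the function names are assumptions, and rank 1 is given to the largest value, as in the worked examples of this section.

```python
def ranks_desc(values):
    """Rank 1 for the largest value; valid only when no value is repeated."""
    if len(set(values)) != len(values):
        raise ValueError("repeated values need the tie-corrected formula")
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]


def spearman_no_ties(xs, ys):
    """R = 1 - 6 * sum(d^2) / (n^3 - n) for untied ranks."""
    n = len(xs)
    rx, ry = ranks_desc(xs), ranks_desc(ys)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n ** 3 - n)


R = spearman_no_ties([36, 56, 20, 42, 33, 44, 50, 15, 60],
                     [50, 35, 70, 58, 75, 60, 45, 80, 38])
print(round(R, 4))  # -0.9
```

Ranking from the smallest value instead would give the same R, provided both series are ranked in the same direction.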
Example 1
Compute Spearman’s Rank Correlation Coefficient from the following data
𝑥 36 56 20 42 33 44 50 15 60
𝑦 50 35 70 58 75 60 45 80 38
Solution
𝒙 𝑹𝒂𝒏𝒌( 𝒙) 𝒚 𝑹𝒂𝒏𝒌( 𝒚) 𝒅 = 𝑹𝒂𝒏𝒌(𝒙) 𝒅𝟐
− 𝑹𝒂𝒏𝒌 (𝒚)
36 6 50 6 0 0
56 2 35 9 -7 49
20 8 70 3 5 25
42 5 58 5 0 0
33 7 75 2 5 25
44 4 60 4 0 0
50 3 45 7 -4 16
15 9 80 1 8 64
60 1 38 8 -7 49
∑ 𝑑² = 228
\[ R = 1 - \frac{6(228)}{9^3 - 9} = -0.9 \]
Example 2
Compute Spearman’s Rank Correlation Coefficient from the following data
𝑥 62 64 65 69 70 71 72 74
𝑦 126 125 139 145 165 152 180 208
Solution
∑ 𝑑2 = 4
Since the ranks are not equal
\[ R = 1 - \frac{6 \sum d^2}{n^3 - n} = 1 - \frac{6(4)}{8^3 - 8} = 0.9523 \]
Example 3
Compute Spearman’s Rank Correlation Coefficient from the following data
𝑥 25 28 32 36 38 40 39 42 41 45
𝑦 70 80 85 75 59 65 48 50 54 66
Solution
𝒙 𝑹𝒂𝒏𝒌( 𝒙) 𝒚 𝑹𝒂𝒏𝒌( 𝒚) 𝒅 = 𝑹𝒂𝒏𝒌(𝒙) 𝒅𝟐
− 𝑹𝒂𝒏𝒌 (𝒚)
25 10 70 4 6 36
28 9 80 2 7 49
32 8 85 1 7 49
36 7 75 3 4 16
38 6 59 7 -1 1
40 4 65 6 -2 4
39 5 48 10 -5 25
42 2 50 9 -7 49
41 3 54 8 -5 25
45 1 66 5 -4 16
∑ 𝑑² = 270
\[ R = 1 - \frac{6(270)}{10^3 - 10} = -0.6364 \]
Example 4
Compute Spearman’s Rank Correlation Coefficient from the following data
𝑥 12 17 22 27 32
𝑦 113 119 117 115 121
Solution
∑ 𝑑² = 8
\[ R = 1 - \frac{6(8)}{5^3 - 5} = 0.6 \]
Example 5
Compute Spearman’s Rank Correlation Coefficient from the following data
𝑥 18 20 34 52 12
𝑦 39 23 35 18 46
Solution
∑ 𝑑² = 38
\[ R = 1 - \frac{6(38)}{5^3 - 5} = -0.9 \]
Example 6
Compute Spearman’s Rank Correlation Coefficient from the following data
𝑥 61 63 65 67 69
𝑦 64 62 65 70 72
Solution
𝒙 𝑹𝒂𝒏𝒌( 𝒙) 𝒚 𝑹𝒂𝒏𝒌( 𝒚) 𝒅 = 𝑹𝒂𝒏𝒌(𝒙) 𝒅𝟐
− 𝑹𝒂𝒏𝒌 (𝒚)
61 5 64 4 1 1
63 4 62 5 -1 1
65 3 65 3 0 0
67 2 70 2 0 0
69 1 72 1 0 0
∑ 𝑑² = 2
Since the ranks are not equal
\[ R = 1 - \frac{6 \sum d^2}{n^3 - n} = 1 - \frac{6(2)}{5^3 - 5} = 0.9 \]
Example 7
Compute Spearman’s Rank Correlation Coefficient from the following data
𝑥 52 63 45 36 72 65 45 25
𝑦 62 53 51 25 79 43 60 33
Solution
𝒙 𝑹𝒂𝒏𝒌( 𝒙) 𝒚 𝑹𝒂𝒏𝒌( 𝒚) 𝒅 = 𝑹𝒂𝒏𝒌(𝒙) 𝒅𝟐
− 𝑹𝒂𝒏𝒌 (𝒚)
52 4 62 2 2 4
63 3 53 4 -1 1
45 5.5 51 5 0.5 0.25
36 7 25 8 -1 1
72 1 79 1 0 0
65 2 43 6 -4 16
45 5.5 60 3 2.5 6.25
25 8 33 7 1 1
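There are two entries in the x column having the same value 45, i.e. there is a tie
between ranks 5 and 6, so each is given the rank (5 + 6)/2 = 5.5.
∑ 𝑑² = 29.5
When ranks are repeated, a correction (𝑚³ − 𝑚)/12 is added to ∑ 𝑑² for each group of 𝑚 tied items:
\[ R = 1 - \frac{6\left[\sum d^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2) + \frac{1}{12}(m_3^3 - m_3) + \cdots\right]}{n^3 - n} \]
\[ R = 1 - \frac{6\left[29.5 + \frac{1}{12}(2^3 - 2)\right]}{8^3 - 8} = 1 - \frac{6[29.5 + 0.5]}{8^3 - 8} = 1 - \frac{6[30]}{8^3 - 8} = 0.6428 \]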
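The treatment of ties (average ranks plus a correction of (𝑚³ − 𝑚)/12 for each group of 𝑚 equal values) can be sketched in Python. The function names below are assumptions made for the illustration, and rank 1 is again given to the largest value.

```python
from collections import Counter


def average_ranks(values):
    """Rank 1 for the largest value; tied values share the mean of their ranks."""
    order = sorted(values, reverse=True)
    first = {}
    for pos, v in enumerate(order, start=1):
        first.setdefault(v, pos)          # first position occupied by each value
    counts = Counter(values)
    # a value occupying positions p .. p+m-1 gets the mean rank p + (m-1)/2
    return [first[v] + (counts[v] - 1) / 2 for v in values]


def spearman_with_ties(xs, ys):
    """Rank correlation with the (m^3 - m)/12 correction for every tie group."""
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    correction = sum((m ** 3 - m) / 12
                     for series in (xs, ys)
                     for m in Counter(series).values() if m > 1)
    return 1 - 6 * (d2 + correction) / (n ** 3 - n)


# Data of Example 7 (the value 45 occurs twice in x)
R = spearman_with_ties([52, 63, 45, 36, 72, 65, 45, 25],
                       [62, 53, 51, 25, 79, 43, 60, 33])
print(round(R, 4))  # 0.6429
```

With no repeated values the correction is zero and the function reduces to the ordinary formula.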
Example 8
The following table shows the marks obtained by 10 students in Accountancy
and Statistics. Find Spearman’s coefficient of rank Correlation
Student no. 1 2 3 4 5 6 7 8 9 10
𝐴𝑐𝑐𝑜𝑢𝑚𝑡𝑎𝑛𝑐𝑦 45 70 65 30 90 40 50 57 85 60
𝑆𝑡𝑎𝑡𝑖𝑠𝑡𝑖𝑐𝑠 35 90 70 40 95 40 60 80 80 50
Solution
𝒙 𝑹𝒂𝒏𝒌( 𝒙) 𝒚 𝑹𝒂𝒏𝒌( 𝒚) 𝒅 = 𝑹𝒂𝒏𝒌(𝒙) 𝒅𝟐
− 𝑹𝒂𝒏𝒌 (𝒚)
45 8 35 10 -2 4
70 3 90 2 1 1
65 4 70 5 -1 1
30 10 40 8.5 1.5 2.25
90 1 95 1 0 0
40 9 40 8.5 0.5 0.25
50 7 60 6 1 1
57 6 80 3.5 2.5 6.25
85 2 80 3.5 -1.5 2.25
60 5 50 7 -2 4
There are two entries in the y column having the same value 80, i.e. there is a tie
between ranks 3 and 4, so each is given the rank (3 + 4)/2 = 3.5.
There are two entries in the y column having the same value 40, i.e. there is a tie
between ranks 8 and 9, so each is given the rank (8 + 9)/2 = 8.5.
∑ 𝑑² = 22
\[ R = 1 - \frac{6\left[22 + \frac{1}{12}(2^3 - 2) + \frac{1}{12}(2^3 - 2)\right]}{10^3 - 10} = 1 - \frac{6[23]}{10^3 - 10} = 0.8606 \]
Example 9
Obtain the Rank Correlation coefficient from the following data
𝑥 85 74 85 50 65 78 74 60 74 90
𝑦 78 91 78 58 60 72 80 55 68 70
Solution
𝒙 𝑹𝒂𝒏𝒌( 𝒙) 𝒚 𝑹𝒂𝒏𝒌( 𝒚) 𝒅 = 𝑹𝒂𝒏𝒌(𝒙) 𝒅𝟐
− 𝑹𝒂𝒏𝒌 (𝒚)
85 2.5 78 3.5 -1 1
74 6 91 1 5 25
85 2.5 78 3.5 -1 1
50 10 58 9 1 1
65 8 60 8 0 0
78 4 72 5 -1 1
74 6 80 2 4 16
60 9 55 10 -1 1
74 6 68 7 -1 1
90 1 70 6 -5 25
There are two entries in the x column having the same value 85, i.e. there is a tie
between ranks 2 and 3, so each is given the rank (2 + 3)/2 = 2.5.
There are three entries in the x column having the same value 74, i.e. there is a tie
between ranks 5, 6 and 7, so each is given the rank (5 + 6 + 7)/3 = 6.
There are two entries in the y column having the same value 78, i.e. there is a tie
between ranks 3 and 4, so each is given the rank (3 + 4)/2 = 3.5.
∑ 𝑑² = 72
\[ R = 1 - \frac{6\left[\sum d^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2) + \frac{1}{12}(m_3^3 - m_3) + \cdots\right]}{n^3 - n} \]
\[ R = 1 - \frac{6\left[72 + \frac{1}{12}(2^3 - 2) + \frac{1}{12}(3^3 - 3) + \frac{1}{12}(2^3 - 2)\right]}{10^3 - 10} = 1 - \frac{6[75]}{10^3 - 10} = 0.5454 \]
Example 10
Obtain the rank correlation coefficient from the following data
𝑥 10 12 18 18 15 40
𝑦 12 18 25 25 50 25
Solution
𝒙 𝑹𝒂𝒏𝒌( 𝒙) 𝒚 𝑹𝒂𝒏𝒌( 𝒚) 𝒅 = 𝑹𝒂𝒏𝒌(𝒙) 𝒅𝟐
− 𝑹𝒂𝒏𝒌 (𝒚)
10 6 12 6 0 0
12 5 18 5 0 0
18 2.5 25 3 -0.5 0.25
18 2.5 25 3 -0.5 0.25
15 4 50 1 3 9
40 1 25 3 -2 4
There are two entries in the x column having the same value 18, i.e. there is a tie
between ranks 2 and 3, so each is given the rank (2 + 3)/2 = 2.5.
There are three entries in the y column having the same value 25, i.e. there is a
tie between ranks 2, 3 and 4, so each is given the rank (2 + 3 + 4)/3 = 3.
∑ 𝑑² = 13.5
\[ R = 1 - \frac{6\left[13.5 + \frac{1}{12}(2^3 - 2) + \frac{1}{12}(3^3 - 3)\right]}{6^3 - 6} = 1 - \frac{6[13.5 + 0.5 + 2]}{6^3 - 6} = 1 - \frac{6[16]}{6^3 - 6} = 0.5429 \]
Example 11
Obtain the Rank Correlation coefficient from the following data
𝑥 32 55 49 60 43 37 43 49 10 20
𝑦 40 30 70 20 30 50 72 60 45 25
Solution
There are two entries in the x column having the same value 49, i.e. there is a tie
between ranks 3 and 4, so each is given the rank (3 + 4)/2 = 3.5.
There are two entries in the x column having the same value 43, i.e. there is a tie
between ranks 5 and 6, so each is given the rank (5 + 6)/2 = 5.5.
There are two entries in the y column having the same value 30, i.e. there is a tie
between ranks 7 and 8, so each is given the rank (7 + 8)/2 = 7.5.
∑ 𝑑² = 176
\[ R = 1 - \frac{6\left[176 + \frac{1}{12}(2^3 - 2) + \frac{1}{12}(2^3 - 2) + \frac{1}{12}(2^3 - 2)\right]}{10^3 - 10} = 1 - \frac{6[176 + 0.5 + 0.5 + 0.5]}{10^3 - 10} = 1 - \frac{6[177.5]}{10^3 - 10} = -0.07575 \]
Example 12
Solution
∑ 𝑑² = 35.5
\[ R = 1 - \frac{6\left[35.5 + \frac{1}{12}(2^3 - 2)\right]}{5^3 - 5} = -0.8 \]
Example 13
Calculate the value of rank correlation coefficient from the following data
regarding marks of 6 students in statistics and accountancy in a test.
Marks in Statistics 40 42 45 35 36 39
Marks in Accountancy 46 43 44 39 40 43
Solution
𝒙 𝑹𝒂𝒏𝒌( 𝒙) 𝒚 𝑹𝒂𝒏𝒌( 𝒚) 𝒅 = 𝑹𝒂𝒏𝒌(𝒙) 𝒅𝟐
− 𝑹𝒂𝒏𝒌 (𝒚)
40 3 46 1 2 4
42 2 43 3.5 -1.5 2.25
45 1 44 2 -1 1
35 6 39 6 0 0
36 5 40 5 0 0
39 4 43 3.5 0.5 0.25
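∑ 𝑑² = 7.5
The value 43 occurs twice in the y column (a tie between ranks 3 and 4, so each is given the rank 3.5), hence 𝑚 = 2.
\[ R = 1 - \frac{6\left[\sum d^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2) + \cdots\right]}{n^3 - n} \]
\[ R = 1 - \frac{6[7.5 + 0.5]}{6^3 - 6} = 1 - \frac{6[8]}{6^3 - 6} = 0.7714 \]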
Practice Problems
Example 14
Find the rank correlation coefficient between poverty and over crowding of
cities from the following data.
Town A B C D E F G H I J
𝑁𝑜. 𝑜𝑓 𝑝𝑜𝑜𝑟 𝑓𝑎𝑚𝑖𝑙𝑖𝑒𝑠 17 13 15 16 6 11 14 9 7 12
𝑂𝑣𝑒𝑟𝑐𝑟𝑜𝑤𝑑𝑖𝑛𝑔 30 46 35 24 12 18 27 22 46 8
𝐴𝑛𝑠 𝑅 = 0.73
Example 15
Find the rank coefficient of correlation from the following data.
𝑥 105 110 112 108 111 116 120 104 115 125
𝑦 39 41 45 38 48 58 60 35 54 69
𝐴𝑛𝑠 𝑅 = 0.9636
Example 16
The coefficient of rank correlation between marks in Physics and Chemistry
obtained by a group of students is 0.8. If the sum of the squares of the differences
in ranks is 33, find the number of students.
Solution
\[ R = 1 - \frac{6 \sum d^2}{n^3 - n} \]
\[ 0.8 = 1 - \frac{6(33)}{n^3 - n} \]
\[ \frac{198}{n^3 - n} = 1 - 0.8 = 0.2 \]
𝑛³ − 𝑛 = 990
𝑛³ − 𝑛 − 990 = 0
(𝑛 − 10)(𝑛² + 10𝑛 + 99) = 0
𝑛 = 10, as the other two roots are imaginary
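Since 𝑛 must be a whole number, the cubic 𝑛³ − 𝑛 = 6∑𝑑²/(1 − R) can also be solved by simply trying integers. The following Python sketch does this; the function name, the search limit `max_n` and the tolerance of 0.5 (to allow for a rounded value of R) are assumptions made for the illustration.

```python
def pairs_from_rank_correlation(R, sum_d2, max_n=1000):
    """Find the integer n with n^3 - n = 6 * sum(d^2) / (1 - R)."""
    if R == 1:
        raise ValueError("R = 1 means sum(d^2) = 0; n cannot be recovered")
    target = 6 * sum_d2 / (1 - R)
    for n in range(2, max_n + 1):
        # allow a small gap because R is usually quoted to a few decimals
        if abs((n ** 3 - n) - target) < 0.5:
            return n
    return None


n = pairs_from_rank_correlation(0.8, 33)
print(n)  # 10
```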
Example 17
The coefficient of rank correlation of the marks obtained by 10 students in
Economics and Accountancy was found to be 0.5.It was later discovered that
the difference in the ranks in the two subjects obtained by one of the students
was wrongly taken as 3 instead of 7.Find the correct coefficient of rank
correlation.
Solution
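𝑅 = 0.5 𝑛 = 10
\[ R = 1 - \frac{6 \sum d^2}{n^3 - n} \]
\[ 0.5 = 1 - \frac{6 \sum d^2}{10^3 - 10} = 1 - \frac{6 \sum d^2}{990} \]
\[ \frac{6 \sum d^2}{990} = 1 - 0.5 = 0.5 \]
6 ∑ 𝑑² = 990 × 0.5 = 495
∑ 𝑑² = 495/6 = 82.5
Correct ∑ 𝑑² = 82.5 − 3² + 7² = 122.5
\[ \text{Correct } R = 1 - \frac{6(122.5)}{990} = 0.2576 \]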
Example 18
If 𝑅𝑥,𝑦 = 0.143 and the sum of the squares of the difference between the
ranks is 48 find 𝑛.
Solution
\[ R = 1 - \frac{6 \sum d^2}{n^3 - n} \]
\[ 0.143 = 1 - \frac{6(48)}{n^3 - n} \]
\[ \frac{288}{n^3 - n} = 1 - 0.143 = 0.857 \]
𝑛³ − 𝑛 = 336
𝑛³ − 𝑛 − 336 = 0
(𝑛 − 7)(𝑛² + 7𝑛 + 48) = 0
𝑛 = 7, as the other two roots are imaginary
REGRESSION
Regression is a method of estimating the value of one variable when the value of the
other is known, given that the variables are correlated.
If variables which are highly correlated are plotted on a graph, the points lie in a
narrow strip. If the strip is nearly straight, we may draw a line such that all the points are
close to it on both sides. Such a line can be taken as representative of the ideal
variation. It is called the line of best fit, or the line of regression. It is the line for which
the sum of the squares of the deviations of the points from the line is a minimum. The
deviations are not measured by dropping a perpendicular from a point to the line. They are
measured vertically or horizontally: we get one line when the vertical deviations are minimised
and a second line when the horizontal deviations are minimised. Thus, we get two lines
of regression.
Line of regression of y on x
If we minimise the deviations of the points from the line measured along the y-axis, we get a line
which is called the line of regression of y on x. Its equation is written in the form 𝑦 = 𝑎 + 𝑏𝑥.
This line is used for estimating the value of y for a given value of x.
Line of regression of x on y
If we minimise the deviations of the points from the line measured along the x-axis, we get a line
which is called the line of regression of x on y. Its equation is written in the form 𝑥 = 𝑎 + 𝑏𝑦.
This line is used for estimating the value of x for a given value of y.
Method of Least Squares
First Method (Normal Equation Method)
The equation of the line of regression of 𝑦 on 𝑥 is
𝑦 = 𝑎 + 𝑏𝑥
where the values of 𝑎 and 𝑏 are calculated by solving the Normal Equations
∑ 𝑦 = 𝑛𝑎 + 𝑏 ∑ 𝑥
∑ 𝑥𝑦 = 𝑎 ∑ 𝑥 + 𝑏 ∑ 𝑥²
The equation of the line of regression of 𝑥 on 𝑦 is
𝑥 = 𝑎 + 𝑏𝑦
where the values of 𝑎 and 𝑏 are calculated by solving the Normal Equations
∑ 𝑥 = 𝑛𝑎 + 𝑏 ∑ 𝑦
∑ 𝑥𝑦 = 𝑎 ∑ 𝑦 + 𝑏 ∑ 𝑦²
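The two pairs of Normal Equations have the same form with the roles of 𝑥 and 𝑦 exchanged, so one routine can produce both lines. The Python sketch below illustrates this; the function names are assumptions made for the illustration and are not part of the notes.

```python
def normal_equation_fit(us, vs):
    """Fit v = a + b*u by solving  sum(v) = n*a + b*sum(u)
    and  sum(u*v) = a*sum(u) + b*sum(u^2)."""
    n = len(us)
    su, sv = sum(us), sum(vs)
    suv = sum(u * v for u, v in zip(us, vs))
    suu = sum(u * u for u in us)
    det = n * suu - su * su
    if det == 0:
        raise ValueError("the independent variable is constant")
    b = (n * suv - su * sv) / det
    a = (sv * suu - su * suv) / det
    return a, b


def regression_lines(xs, ys):
    """Return (a, b) of y = a + b*x and (a, b) of x = a + b*y."""
    y_on_x = normal_equation_fit(xs, ys)
    x_on_y = normal_equation_fit(ys, xs)   # same equations, roles swapped
    return y_on_x, x_on_y


x = [1, 3, 4, 6, 8, 9, 11, 14]
y = [1, 2, 4, 4, 5, 7, 8, 9]
y_on_x, x_on_y = regression_lines(x, y)
print(x_on_y)  # (-0.5, 1.5)
```

The two lines are different in general; they coincide only when the correlation is perfect.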
Example 1
Obtain the equation of line of regression of 𝑦 𝑜𝑛 𝑥 from the following data and estimate
𝑦 𝑤ℎ𝑒𝑛 𝑥 = 73
𝑥: 70 72 74 76 78 80
𝑦 163 170 179 188 196 220
Solution
𝑥 𝑦 𝑥𝑦 𝑥2
70 163 11410 4900
72 170 12240 5184
74 179 13246 5476
76 188 14288 5776
78 196 15288 6084
80 220 17600 6400
∑ 𝑥 = 450 ∑ 𝑦 = 1116 ∑ 𝑥 𝑦 = 84072 ∑ 𝑥 2 = 33820
𝑦 = 𝑎 + 𝑏𝑥
𝑦 = −212.57 + 5.3142𝑥
𝑤ℎ𝑒𝑛 𝑥 = 73 𝑡ℎ𝑒𝑛 𝑦 = 175.37
Example 2
Obtain the equation of line of regression of 𝑥 𝑜𝑛 𝑦 from the following data
𝑥: 1 3 4 6 8 9 11 14
𝑦 1 2 4 4 5 7 8 9
Solution
𝑥 𝑦 𝑥𝑦 𝑦2
1 1 1 1
3 2 6 4
4 4 16 16
6 4 24 16
8 5 40 25
9 7 63 49
11 8 88 64
14 9 126 81
∑ 𝑥 = 56 ∑ 𝑦 = 40 ∑ 𝑥 𝑦 = 364 ∑ 𝑦 2 = 256
∑ 𝑥 = 𝑛𝑎 + 𝑏 ∑ 𝑦 56 = 8𝑎 + 40𝑏 ……………(1) :
∑ 𝑥𝑦 = 𝑎 ∑ 𝑦 + 𝑏 ∑ 𝑦 2 364 = 40𝑎 + 256𝑏……………..(2)
𝑥 = 𝑎 + 𝑏𝑦
𝑥 = −0.5 + 1.5𝑦
Example 3
Find the lines of regression and hence find the coefficient of Correlation.
𝑥: 65 66 67 67 68 69 70 72
𝑦 67 68 65 66 72 72 69 71
Solution
𝑥 𝑦 𝑥𝑦 𝑥2 𝑦2
65 67 4355 4225 4489
66 68 4488 4356 4624
67 65 4355 4489 4225
67 66 4422 4489 4356
68 72 4896 4624 5184
69 72 4968 4761 5184
70 69 4830 4900 4761
72 71 5112 5184 5041
∑ 𝑥 = 544 ∑ 𝑦 = 550 ∑ 𝑥 𝑦 = 37426 ∑ 𝑥 2 = 37028 ∑ 𝑦 2 = 37864
Lines of regressions of y 𝑜𝑛 𝑥
Substituting in the Normal Equation
∑ 𝑦 = 𝑛𝑎 + 𝑏 ∑ 𝑥 550 = 8𝑎 + 544𝑏 ……………(1) :
∑ 𝑥𝑦 = 𝑎 ∑ 𝑥 + 𝑏 ∑ 𝑥 2 37426 = 544𝑎 + 37028𝑏……………..(2)
Solving (1) & (2) we get 𝑎 = 19.64 𝑏 = 0.722
𝑦 = 𝑎 + 𝑏𝑥
𝑦 = 19.6388 + 0.7222𝑥
Line of regression of 𝑥 𝑜𝑛 𝑦
∑ 𝑥 = 𝑛𝑎 + 𝑏 ∑ 𝑦 544 = 8𝑎 + 550𝑏 ……………(1) :
∑ 𝑥𝑦 = 𝑎 ∑ 𝑦 + 𝑏 ∑ 𝑦 2 37426 = 550𝑎 + 37864𝑏……………..(2)
Solving (1) & (2) we get 𝑎 = 33.29 𝑏 = 0.5048
𝑥 = 𝑎 + 𝑏𝑦
𝑥 = 33.2912 + 0.5048𝑦
𝑟 2 = (0.7222)(0.5048) = 0.3645
𝑟 = 0.6037
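The last step uses the relation 𝑟² = (slope of the line of y on x) × (slope of the line of x on y), with 𝑟 taking the common sign of the two slopes. A minimal Python sketch of that step is given below; the function name is an assumption made for the illustration.

```python
from math import sqrt, copysign


def r_from_regression_coefficients(b_yx, b_xy):
    """r = +/- sqrt(b_yx * b_xy), with the sign shared by both coefficients."""
    product = b_yx * b_xy
    if product < 0:
        raise ValueError("regression coefficients must have the same sign")
    return copysign(sqrt(product), b_yx)


r = r_from_regression_coefficients(0.7222, 0.5048)
print(round(r, 3))  # 0.604
```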
Example 4
𝒙 5 6 7 8 9 10 11
𝑦 11 14 14 15 12 17 16
Solution
𝑥 𝑦 𝑥𝑦 𝑥2 𝑦2
5 11 55 25 121
6 14 84 36 196
7 14 98 49 196
8 15 120 64 225
9 12 108 81 144
10 17 170 100 289
11 16 176 121 256
∑ 𝑥 = 56 ∑ 𝑦 = 99 ∑ 𝑥 𝑦 = 811 ∑ 𝑥 2 = 476 ∑ 𝑦 2 = 1427
Lines of regressions of y 𝑜𝑛 𝑥
Substituting in the Normal Equation
∑ 𝑦 = 𝑛𝑎 + 𝑏 ∑ 𝑥 99 = 7𝑎 + 56𝑏 ……………(1) :
∑ 𝑥𝑦 = 𝑎 ∑ 𝑥 + 𝑏 ∑ 𝑥 2 811 = 56𝑎 + 476𝑏……………..(2)
Solving (1) & (2) we get 𝑎 = 8.7142 𝑏 = 0.6785
𝑦 = 𝑎 + 𝑏𝑥
𝑦 = 8.7142 + 0.6785𝑥
Line of regression of 𝑥 𝑜𝑛 𝑦
∑ 𝑥 = 𝑛𝑎 + 𝑏 ∑ 𝑦 56 = 7𝑎 + 99𝑏 ……………(1) :
∑ 𝑥𝑦 = 𝑎 ∑ 𝑦 + 𝑏 ∑ 𝑦 2 811 = 99𝑎 + 1427𝑏……………..(2)
Solving (1) & (2) we get 𝑎 = −2.0053 𝑏 = 0.7074
𝑥 = 𝑎 + 𝑏𝑦
𝑥 = −2.0053 + 0.7074𝑦
SECOND METHOD
METHOD OF LEAST SQUARES
Line of regression of 𝑦 on 𝑥
\[ y - \bar{y} = b_{yx}(x - \bar{x}), \qquad \text{where } b_{yx} = r\,\frac{\sigma_y}{\sigma_x} \]
Line of regression of 𝑥 on 𝑦
\[ x - \bar{x} = b_{xy}(y - \bar{y}), \qquad \text{where } b_{xy} = r\,\frac{\sigma_x}{\sigma_y} \]
Here
\[ \bar{x} = \frac{\sum x}{n}, \qquad \bar{y} = \frac{\sum y}{n} \]
b_yx, b_xy = regression coefficients
σ_x = standard deviation of x
σ_y = standard deviation of y
r = coefficient of correlation
x̄ = mean of x
ȳ = mean of y
The regression coefficients can be calculated directly from the sums:
\[ b_{xy} = \frac{\sum xy - \frac{\sum x \sum y}{N}}{\sum y^2 - \frac{(\sum y)^2}{N}} \]
\[ b_{yx} = \frac{\sum xy - \frac{\sum x \sum y}{N}}{\sum x^2 - \frac{(\sum x)^2}{N}} \]
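We know that
\[ r = \frac{\sum xy - \frac{\sum x \sum y}{N}}{\sqrt{\sum x^2 - \frac{(\sum x)^2}{N}}\,\sqrt{\sum y^2 - \frac{(\sum y)^2}{N}}} \]
1) The product of the two regression coefficients is r².
\[ b_{yx}\, b_{xy} = r\,\frac{\sigma_y}{\sigma_x} \cdot r\,\frac{\sigma_x}{\sigma_y} = r^2 \]
r is positive if b_yx and b_xy are positive
r is negative if b_yx and b_xy are negative
2) If one coefficient of regression is greater than one then the other must be less than
one.
Since −1 ≤ 𝑟 ≤ 1
𝑟² ≤ 1
\[ b_{yx}\, b_{xy} \le 1 \]
\[ b_{yx} \le \frac{1}{b_{xy}} \]
so if b_xy > 1 then b_yx < 1.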
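To prove (for positive r)
\[ \frac{1}{2}\left(r\,\frac{\sigma_y}{\sigma_x} + r\,\frac{\sigma_x}{\sigma_y}\right) \ge r \]
T.P.
\[ \frac{\sigma_y}{\sigma_x} + \frac{\sigma_x}{\sigma_y} \ge 2 \]
T.P. σ_y² + σ_x² ≥ 2σ_xσ_y
T.P. σ_y² + σ_x² − 2σ_xσ_y ≥ 0
i.e. (σ_y − σ_x)² ≥ 0, which is always true.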
If θ is the angle between the two lines of regression:
(i) If 𝑟 = 0, tan θ = ∞, so θ = π/2.
The lines of regression are perpendicular to each other.
(ii) If 𝑟 = ±1, tan θ = 0, so θ = 0.
The lines of regression are coincident.
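The two special cases above follow from the standard expression for the acute angle θ between the two lines of regression. That expression does not appear in this section, so the short derivation below is supplied as an assumption about the omitted step, using only the slopes defined earlier: the line of 𝑦 on 𝑥 has slope 𝑟σ_y/σ_x, and the line of 𝑥 on 𝑦, written as 𝑦 in terms of 𝑥, has slope σ_y/(𝑟σ_x).

```latex
\[
\tan\theta
  = \left|\frac{m_2 - m_1}{1 + m_1 m_2}\right|,
  \qquad
  m_1 = r\,\frac{\sigma_y}{\sigma_x},
  \quad
  m_2 = \frac{1}{r}\,\frac{\sigma_y}{\sigma_x}
\]
\[
\tan\theta
  = \frac{\dfrac{\sigma_y}{\sigma_x}\left|\dfrac{1}{r} - r\right|}
         {1 + \dfrac{\sigma_y^2}{\sigma_x^2}}
  = \frac{1 - r^2}{|r|}\cdot\frac{\sigma_x\,\sigma_y}{\sigma_x^2 + \sigma_y^2}
\]
```

As 𝑟 → 0 the factor (1 − 𝑟²)/|𝑟| grows without bound, giving θ = π/2, and for 𝑟 = ±1 it vanishes, giving θ = 0.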
Example 1
The following table gives the age of car of a certain make and annual maintenance
cost. Obtain the equation of the line of regression of cost on age.
𝐴𝑔𝑒 𝑜𝑓 𝑎 𝑐𝑎𝑟 2 4 6 8
𝑀𝑎𝑖𝑛𝑡𝑒𝑛𝑎𝑛𝑐𝑒 𝑐𝑜𝑠𝑡 1 2 2.5 3
Solution
Line of regression of 𝑦 𝑜𝑛 𝑥
𝑦 − 𝑦̅ = 𝑏𝑦𝑥 (𝑥 − 𝑥̅ )
𝑥 𝑦 𝑥𝑦 𝑥2
2 1 2 4
4 2 8 16
6 2.5 15 36
8 3 24 64
∑ 𝑥 = 20 ∑ 𝑦 = 8.5 ∑ 𝑥 𝑦 = 49 ∑ 𝑥 2 = 120
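\[ \bar{x} = \frac{\sum x}{n} = \frac{20}{4} = 5, \qquad \bar{y} = \frac{\sum y}{n} = \frac{8.5}{4} = 2.125 \]
\[ b_{yx} = \frac{\sum xy - \frac{\sum x \sum y}{N}}{\sum x^2 - \frac{(\sum x)^2}{N}} = \frac{49 - \frac{(20)(8.5)}{4}}{120 - \frac{20^2}{4}} = 0.325 \]
Substituting in 𝑦 − ȳ = b_yx(𝑥 − x̄)
𝑦 − 2.125 = 0.325(𝑥 − 5)
𝑦 = 0.325𝑥 + 0.5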
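The second method (means, regression coefficient, then the point-slope form) can be sketched in Python as follows. The function name `regression_y_on_x` is an assumption made for this illustration.

```python
def regression_y_on_x(xs, ys):
    """Line of regression of y on x via  y - y_bar = b_yx * (x - x_bar)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    x_bar, y_bar = sx / n, sy / n
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    b_yx = (sxy - sx * sy / n) / (sxx - sx * sx / n)
    intercept = y_bar - b_yx * x_bar      # expand to y = intercept + b_yx * x
    return b_yx, intercept


# Age of car (x) and maintenance cost (y) from this example
slope, intercept = regression_y_on_x([2, 4, 6, 8], [1, 2, 2.5, 3])
print(round(slope, 4), round(intercept, 4))  # 0.325 0.5
```

Calling the same function with the two lists exchanged gives the line of regression of x on y.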
Example 2
Solution
Line of regression of 𝑦 𝑜𝑛 𝑥
𝑦 − 𝑦̅ = 𝑏𝑦𝑥 (𝑥 − 𝑥̅ )
𝑥 𝑦 𝑥𝑦 𝑥2 𝑦2
10 19 190 100 361
12 22 264 144 484
13 24 312 169 576
16 27 432 256 729
17 29 493 289 841
20 30 600 400 900
25 37 925 625 1369
∑ 𝑥 = 113 ∑ 𝑦 = 188 ∑ 𝑥𝑦 = 3216 ∑ 𝑥² = 1983 ∑ 𝑦² = 5260
\[ \bar{x} = \frac{\sum x}{n} = \frac{113}{7} = 16.1428, \qquad \bar{y} = \frac{\sum y}{n} = \frac{188}{7} = 26.8571 \]
\[ b_{yx} = \frac{\sum xy - \frac{\sum x \sum y}{N}}{\sum x^2 - \frac{(\sum x)^2}{N}} = \frac{3216 - \frac{(113)(188)}{7}}{1983 - \frac{113^2}{7}} = 1.1402 \]
\[ b_{xy} = \frac{\sum xy - \frac{\sum x \sum y}{N}}{\sum y^2 - \frac{(\sum y)^2}{N}} = \frac{3216 - \frac{(113)(188)}{7}}{5260 - \frac{188^2}{7}} = 0.8590 \]
Substituting in 𝑦 − ȳ = b_yx(𝑥 − x̄)
𝑦 − 26.8571 = 1.1402(𝑥 − 16.1428)
𝑦 = 1.1402𝑥 − 18.4060 + 26.8571
𝑦 = 1.1402𝑥 + 8.4511
Line of regression of 𝑥 on 𝑦
𝑥 − x̄ = b_xy(𝑦 − ȳ)
𝑥 − 16.1428 = 0.8590𝑦 − 23.0702
𝑥 = 0.8590𝑦 − 6.9274
Example 3
The following data on the heights (y) and the weights (x) of 100 college
students are given:
$\sum x = 15000$, $\sum x^2 = 2272500$, $\sum y = 6800$,
$\sum y^2 = 463025$, $\sum xy = 1022250$
Find the correlation coefficient between height and weight and state the equation
of the line of regression of height on weight.
Solution
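\[ \bar{x} = \frac{\sum x}{n} = \frac{15000}{100} = 150, \qquad \bar{y} = \frac{\sum y}{n} = \frac{6800}{100} = 68 \]
\[ b_{yx} = \frac{\sum xy - \dfrac{\sum x \sum y}{N}}{\sum x^2 - \dfrac{(\sum x)^2}{N}} = \frac{1022250 - \dfrac{(15000)(6800)}{100}}{2272500 - \dfrac{15000^2}{100}} = 0.1 \]
\[ b_{xy} = \frac{\sum xy - \dfrac{\sum x \sum y}{N}}{\sum y^2 - \dfrac{(\sum y)^2}{N}} = \frac{1022250 - \dfrac{(15000)(6800)}{100}}{463025 - \dfrac{6800^2}{100}} = 3.6 \]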
\[ r^2 = b_{yx}\, b_{xy} = (0.1)(3.6) = 0.36 \]
\[ r = \sqrt{0.36} = 0.6 \]
Line of regression of 𝑦 𝑜𝑛 𝑥
𝑦 − 68 = 0.1(𝑥 − 150)
𝑦 − 68 = 0.1𝑥 − 15
𝑦 = 0.1𝑥 − 15 + 68
𝑦 = 0.1𝑥 + 53
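When only the summary totals are given, as in this example, the same formulas can be evaluated directly from the sums. The following is a sketch under that assumption; `regression_from_sums` and the dictionary keys are illustrative names, not notation from the notes.

```python
import math


def regression_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Regression coefficients, means and r from summary totals."""
    s_xy = sum_xy - sum_x * sum_y / n      # numerator shared by b_yx, b_xy and r
    s_xx = sum_x2 - sum_x ** 2 / n
    s_yy = sum_y2 - sum_y ** 2 / n
    return {
        "x_bar": sum_x / n,
        "y_bar": sum_y / n,
        "b_yx": s_xy / s_xx,
        "b_xy": s_xy / s_yy,
        "r": s_xy / math.sqrt(s_xx * s_yy),
    }


result = regression_from_sums(
    n=100, sum_x=15000, sum_y=6800,
    sum_xy=1022250, sum_x2=2272500, sum_y2=463025,
)
# Intercept of the line of y on x: y = b_yx * x + (y_bar - b_yx * x_bar)
intercept = result["y_bar"] - result["b_yx"] * result["x_bar"]
print(result, intercept)
```

Here $r$ is computed from the direct formula rather than from $\sqrt{b_{yx} b_{xy}}$, so the sign of $r$ comes out automatically.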
Example 4
Find the regression coefficients and the coefficient of Correlation from the following
data
N=12, ∑ 𝑥 = 120 , ∑ 𝑥 2 = 1392 ∑ 𝑦 = 432
∑ 𝑦 2 = 18252 ∑ 𝑥𝑦 = 4992
Solution
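\[ \bar{x} = \frac{\sum x}{n} = \frac{120}{12} = 10, \qquad \bar{y} = \frac{\sum y}{n} = \frac{432}{12} = 36 \]
\[ b_{yx} = \frac{\sum xy - \dfrac{\sum x \sum y}{N}}{\sum x^2 - \dfrac{(\sum x)^2}{N}} = \frac{4992 - \dfrac{(120)(432)}{12}}{1392 - \dfrac{120^2}{12}} = 3.5 \]
\[ b_{xy} = \frac{\sum xy - \dfrac{\sum x \sum y}{N}}{\sum y^2 - \dfrac{(\sum y)^2}{N}} = \frac{4992 - \dfrac{(120)(432)}{12}}{18252 - \dfrac{432^2}{12}} = 0.2488 \]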
\[ r^2 = b_{yx}\, b_{xy} = (3.5)(0.2488) = 0.8708 \]
\[ r = \sqrt{0.8708} = 0.9331 \]
Example 5
Given: variance of $x$ = 25. The equations of the two lines of regression are $5x - y = 22$ and
$64x - 45y = 24$.
Find (i) 𝑥̅ 𝑎𝑛𝑑 𝑦̅
(ii) 𝜎𝑦
(iii) 𝑟
Solution
𝑉𝑎𝑟𝑖𝑎𝑛𝑐𝑒 𝑜𝑓 𝑥 = 𝜎𝑥 2 = 25
𝜎𝑥 = 5
Solving the given equations, we get the mean as
𝑥̅ = 6 𝑦̅ = 8
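If we consider the given equation
$5x - y = 22$ as the regression equation of $x$ on $y$, then
\[ 5x = y + 22, \qquad x = \frac{1}{5}y + \frac{22}{5} \]
so $b_{xy} = \dfrac{1}{5} = 0.2$.
Taking $64x - 45y = 24$ as the regression equation of $y$ on $x$,
\[ 45y = 64x - 24, \qquad y = \frac{64}{45}x - \frac{24}{45} \]
Then $b_{yx} = \dfrac{64}{45} = 1.4222$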
𝑟 2 = 𝑏𝑦𝑥 𝑏𝑥𝑦
= (1.4222)(0.2)
= 0.2844
𝑟 = √0.2844
𝑟 = 0.5332
Using $b_{xy} = r\dfrac{\sigma_x}{\sigma_y}$ with $\sigma_x^2 = 25$, i.e. $\sigma_x = 5$:
\[ 0.2 = (0.5332)\frac{5}{\sigma_y} = \frac{2.666}{\sigma_y} \]
\[ \sigma_y = \frac{2.666}{0.2} = 13.33 \]
Example 6
∑ 𝑥 = 59 , ∑ 𝑥 2 = 524 ∑ 𝑦 = 40
∑ 𝑦 2 = 256 ∑ 𝑥𝑦 = 364
Find the equations of the line of regression of 𝑥 𝑜𝑛 𝑦 and the Coefficient of
Correlation.
Solution
Line of regression of $x$ on $y$
\[ x - \bar{x} = b_{xy}(y - \bar{y}) \]
\[ \bar{x} = \frac{\sum x}{N} = \frac{59}{8} = 7.375, \qquad \bar{y} = \frac{\sum y}{N} = \frac{40}{8} = 5 \]
\[ b_{xy} = \frac{\sum xy - \dfrac{\sum x \sum y}{N}}{\sum y^2 - \dfrac{(\sum y)^2}{N}} = \frac{364 - \dfrac{(59)(40)}{8}}{256 - \dfrac{40^2}{8}} = 1.2321 \]
Substituting,
\[ x - 7.375 = 1.2321(y - 5) \]
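Coefficient of correlation
\[ r = \frac{\sum xy - \dfrac{\sum x \sum y}{N}}{\sqrt{\sum x^2 - \dfrac{(\sum x)^2}{N}}\;\sqrt{\sum y^2 - \dfrac{(\sum y)^2}{N}}} = \frac{364 - \dfrac{(59)(40)}{8}}{\sqrt{524 - \dfrac{59^2}{8}}\;\sqrt{256 - \dfrac{40^2}{8}}} = 0.9780 \]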
Example 7
The equations of two lines of regression are
𝑥 = 19.13 − 0.87𝑦
𝑦 = 11.64 − 0.50𝑥
Find (i) the mean of 𝑥 𝑎𝑛𝑑 𝑦 and (ii) coefficient of Correlation between 𝑥 𝑎𝑛𝑑 𝑦
Solution
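Solving the given equations simultaneously, we get the means
$\bar{x} = 15.93$, $\bar{y} = 3.672$
If we consider the given equation
$x = 19.13 - 0.87y$ as the regression equation of $x$ on $y$,
then $b_{xy} = -0.87$.
Taking $y = 11.64 - 0.50x$ as the regression equation of $y$ on $x$, $b_{yx} = -0.50$.
\[ r^2 = b_{yx}\, b_{xy} = (-0.50)(-0.87) = 0.435 \]
Since both regression coefficients are negative, $r$ is negative:
\[ r = -\sqrt{0.435} = -0.6595 \]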
Example 8
In a partially destroyed laboratory record of analysis of correlation data the following results
are legible.
Variance of $x$ = 9. Equations of the lines of regression: $4x - 5y + 33 = 0$, $20x - 9y - 107 = 0$.
Find (i) mean value of 𝑥 𝑎𝑛𝑑 𝑦
(ii) Standard deviation of 𝑦
(iii) Coefficient of correlation between 𝑥 𝑎𝑛𝑑 𝑦
Solution
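Solving the two equations simultaneously gives the means $\bar{x} = 13$, $\bar{y} = 17$.
If we take $4x - 5y + 33 = 0$ as the line of regression of $x$ on $y$, then
\[ 4x = 5y - 33, \qquad x = \frac{5}{4}y - \frac{33}{4}, \qquad b_{xy} = \frac{5}{4} \]
If we take $20x - 9y - 107 = 0$ as the line of regression of $y$ on $x$, then
\[ 9y = 20x - 107, \qquad y = \frac{20}{9}x - \frac{107}{9}, \qquad b_{yx} = \frac{20}{9} \]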
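Hence, we see that $b_{xy}$ and $b_{yx}$ are both greater than 1, so $r^2 = b_{yx} b_{xy} = \dfrac{25}{9} > 1$, which is not possible because
the coefficient of correlation cannot be greater than 1 in magnitude. Hence our assumption is wrong.
Taking $4x - 5y + 33 = 0$ as the line of regression of $y$ on $x$ gives $y = \dfrac{4}{5}x + \dfrac{33}{5}$, so
\[ b_{yx} = \frac{4}{5} \]
Taking $20x - 9y - 107 = 0$ as the line of regression of $x$ on $y$ gives $x = \dfrac{9}{20}y + \dfrac{107}{20}$, so
\[ b_{xy} = \frac{9}{20} \]
\[ r^2 = b_{yx}\, b_{xy} = \left(\frac{4}{5}\right)\left(\frac{9}{20}\right) = 0.36 \]
\[ r = \sqrt{0.36} = 0.6 \]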
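Using $b_{xy} = r\dfrac{\sigma_x}{\sigma_y}$ with variance of $x$ = $\sigma_x^2 = 9$, i.e. $\sigma_x = 3$:
\[ \frac{9}{20} = (0.6)\frac{3}{\sigma_y} \quad\Rightarrow\quad 0.45 = \frac{1.8}{\sigma_y} \]
\[ \sigma_y = \frac{1.8}{0.45} = 4 \]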
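The reasoning used in this kind of problem (solve for the means, try one assignment of the two lines, reject it if $b_{yx} b_{xy} > 1$) can be sketched in code. The function name and the `(a, b, c)` representation of a line $ax + by = c$ are assumptions made for illustration; the sketch also assumes every coefficient is non-zero and $r \ne 0$.

```python
import math


def analyse_regression_lines(line1, line2, sigma_x):
    """Each line is (a, b, c) meaning a*x + b*y = c.

    Returns (x_bar, y_bar, r, sigma_y).
    """
    a1, b1, c1 = line1
    a2, b2, c2 = line2
    # Both regression lines pass through (x_bar, y_bar): solve by Cramer's rule
    det = a1 * b2 - a2 * b1
    x_bar = (c1 * b2 - c2 * b1) / det
    y_bar = (a1 * c2 - a2 * c1) / det
    # Try each assignment; the valid one has b_yx * b_xy = r^2 <= 1
    for x_on_y, y_on_x in ((line1, line2), (line2, line1)):
        b_xy = -x_on_y[1] / x_on_y[0]   # x = -(b/a) y + c/a
        b_yx = -y_on_x[0] / y_on_x[1]   # y = -(a/b) x + c/b
        if 0 <= b_xy * b_yx <= 1:
            break
    else:
        raise ValueError("no assignment gives r^2 <= 1")
    # r carries the common sign of the regression coefficients
    r = math.copysign(math.sqrt(b_xy * b_yx), b_yx)
    sigma_y = r * sigma_x / b_xy        # from b_xy = r * sigma_x / sigma_y
    return x_bar, y_bar, r, sigma_y


# Example 8: 4x - 5y = -33 and 20x - 9y = 107, with sigma_x = 3
print(analyse_regression_lines((4, -5, -33), (20, -9, 107), 3))
```

Because the product of the coefficients under one assignment is the reciprocal of the product under the other, at most one assignment can satisfy $r^2 \le 1$ (unless the product is exactly 1), which is why a single check is enough.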
Example 9
Solution
𝑥̅ = 3 𝑦̅ = 0.5
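If we consider the given equation
$x + 6y = 6$
as the regression equation of $x$ on $y$, then
$x = -6y + 6$
and $b_{xy} = -6$.
Taking $3x + 2y = 10$ as the regression equation of $y$ on $x$,
\[ 2y = -3x + 10, \qquad y = -\frac{3}{2}x + 5 \]
Then $b_{yx} = -\dfrac{3}{2}$
\[ r^2 = b_{yx}\, b_{xy} = (-6)\left(-\frac{3}{2}\right) = 9 \]
which is not possible, since $r^2 \le 1$. Hence the assumption is wrong, and
$x + 6y = 6$ is the regression line of $y$ on $x$.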
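\[ x + 6y = 6, \qquad 6y = -x + 6, \qquad y = -\frac{1}{6}x + 1 \]
\[ b_{yx} = -\frac{1}{6} \]
Taking $3x + 2y = 10$ as the regression equation of $x$ on $y$,
\[ 3x = -2y + 10, \qquad x = -\frac{2}{3}y + \frac{10}{3} \]
\[ b_{xy} = -\frac{2}{3} \]
\[ r^2 = b_{yx}\, b_{xy} = \left(-\frac{1}{6}\right)\left(-\frac{2}{3}\right) = 0.1111 \]
Since both regression coefficients are negative, $r$ is negative:
\[ r = -0.3333 \]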
To estimate $y$ when $x = 12$, use the line of regression of $y$ on $x$:
\[ y = -\frac{1}{6}(12) + 1 = -2 + 1 = -1 \]
Example 10
Given the following information about the marks of 60 students:
        Mathematics   English
Mean    80            50
S.D.    15            10
Coefficient of correlation $r = 0.4$. Estimate the marks in Mathematics of a student who scored 60 marks in English.
Solution
Let $x$ be the marks in Mathematics and $y$ be the marks in English.
Mean of marks in Mathematics: $\bar{x} = 80$
Mean of marks in English: $\bar{y} = 50$
Standard deviation of marks in Mathematics: $\sigma_x = 15$
Standard deviation of marks in English: $\sigma_y = 10$
Coefficient of correlation: $r = 0.4$
Line of regression of 𝑥 𝑜𝑛 𝑦
𝑥 − 𝑥̅ = 𝑏𝑥𝑦 (𝑦 − 𝑦̅ )
\[ b_{xy} = r\frac{\sigma_x}{\sigma_y} = (0.4)\frac{15}{10} = \frac{6}{10} = 0.6 \]
Hence 𝑥 − 𝑥̅ = 𝑏𝑥𝑦 (𝑦 − 𝑦̅ )
𝑥 − 80 = 0.6(𝑦 − 50)
𝑥 − 80 = 0.6𝑦 − 30
𝑥 = 0.6𝑦 − 30 + 80
𝑥 = 0.6𝑦 + 50
𝑤ℎ𝑒𝑛 𝑦 = 60
𝑥 = 0.6(60) + 50
𝑥 = 36 + 50
𝑥 = 86
Example 11
Given
        x series   y series
Mean    18         100
S.D.    14         20
$r = 0.6$
Find the most probable value of $y$ when $x = 70$ and the most probable value of $x$ when $y = 90$.
Solution
𝑥̅ = 18 , 𝑦̅ = 100 𝜎𝑥 = 14 𝜎𝑦 = 20 𝑟 = 0.6
Line of regression of 𝑦 𝑜𝑛 𝑥
𝑦 − 𝑦̅ = 𝑏𝑦𝑥 (𝑥 − 𝑥̅ )
\[ b_{yx} = r\frac{\sigma_y}{\sigma_x} = (0.6)\frac{20}{14} = 0.8571 \]
Hence
$y - 100 = 0.8571(x - 18)$
$y = 0.8571x + 84.5722$
When $x = 70$,
$y = 0.8571(70) + 84.5722$
$y = 144.5692$
Line of regression of 𝑥 𝑜𝑛 𝑦
𝑥 − 𝑥̅ = 𝑏𝑥𝑦 (𝑦 − 𝑦̅ )
\[ b_{xy} = r\frac{\sigma_x}{\sigma_y} = (0.6)\frac{14}{20} = 0.42 \]
Hence 𝑥 − 𝑥̅ = 𝑏𝑥𝑦 (𝑦 − 𝑦̅ )
𝑥 − 18 = 0.42(𝑦 − 100)
𝑥 − 18 = 0.42𝑦 − 42
𝑥 = 0.42𝑦 − 42 + 18
𝑥 = 0.42𝑦 − 24
𝑤ℎ𝑒𝑛 𝑦 = 90
𝑥 = 0.42(90) − 24
𝑥 = 13.8
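When only the means, standard deviations and $r$ are given, as in Examples 10 and 11, both predictions follow from $b_{yx} = r\sigma_y/\sigma_x$ and $b_{xy} = r\sigma_x/\sigma_y$. The two helper functions below are a minimal sketch with hypothetical names.

```python
def predict_y_from_x(x, x_bar, y_bar, sigma_x, sigma_y, r):
    """Estimate y from the line of regression of y on x."""
    b_yx = r * sigma_y / sigma_x
    return y_bar + b_yx * (x - x_bar)


def predict_x_from_y(y, x_bar, y_bar, sigma_x, sigma_y, r):
    """Estimate x from the line of regression of x on y."""
    b_xy = r * sigma_x / sigma_y
    return x_bar + b_xy * (y - y_bar)


# Example 11: means 18 and 100, S.D. 14 and 20, r = 0.6
y_at_70 = predict_y_from_x(70, 18, 100, 14, 20, 0.6)
x_at_90 = predict_x_from_y(90, 18, 100, 14, 20, 0.6)

# Example 10: Mathematics (x) mean 80, S.D. 15; English (y) mean 50, S.D. 10; r = 0.4
maths_at_60 = predict_x_from_y(60, 80, 50, 15, 10, 0.4)
print(y_at_70, x_at_90, maths_at_60)
```

The value of $y$ at $x = 70$ differs from the hand result in the second decimal place only because the notes round $b_{yx}$ to 0.8571 before substituting.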
Example 12
It is given that the means of $x$ and $y$ are 5 and 10, and that the line of regression of $y$ on $x$ is
parallel to the line $20y = 9x + 40$. Estimate the value of $y$ for $x = 30$.
Solution
Line of regression of 𝑦 𝑜𝑛 𝑥
𝑦 − 𝑦̅ = 𝑏𝑦𝑥 (𝑥 − 𝑥̅ )
The slope of this line is $b_{yx}$, and the line is parallel to $20y = 9x + 40$, i.e.
\[ y = \frac{9}{20}x + \frac{40}{20} \]
so
\[ b_{yx} = \frac{9}{20} \]
𝑥̅ = 5 𝑎𝑛𝑑 𝑦̅ = 10
Line of regression of 𝑦 𝑜𝑛 𝑥
\[ y - 10 = \frac{9}{20}(x - 5) \]
𝑦 − 10 = 0.45(𝑥 − 5)
𝑦 − 10 = 0.45𝑥 − 2.25
𝑦 = 0.45𝑥 − 2.25 + 10
𝑦 = 0.45𝑥 + 7.75
𝑤ℎ𝑒𝑛 𝑥 = 30
𝑦 = 0.45(30) + 7.75
𝑦 = 13.5 + 7.75
𝑦 = 21.25
Example 13
A panel of two judges A and B graded dramatic performances by independently awarding
marks as follows:
Performance No.: 1 2 3 4 5 6 7
Marks by A: 36 32 34 31 32 32 34
Marks by B: 35 33 31 30 34 32 36
The eighth performance however, which judge B could not attend got 38 marks by judge A.
If judge B had also been present how many marks would he be expected to have awarded to
the eighth performance?
Solution
𝑥 𝑦 𝑥𝑦 𝑥2
36 35 1260 1296
32 33 1056 1024
34 31 1054 1156
31 30 930 961
32 34 1088 1024
32 32 1024 1024
34 36 1224 1156
∑ 𝑥 = 231 ∑ 𝑦 = 231 ∑ 𝑥 𝑦 = 7636 ∑ 𝑥 2 = 7641
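The tabulated sums are enough to finish the estimate. The sketch below assumes, as the table headings suggest, that $x$ denotes the marks by judge A and $y$ the marks by judge B, and that the expected mark is read off the line of regression of $y$ on $x$ at $x = 38$. The function name is illustrative.

```python
def estimate_y(xs, ys, x_new):
    """Estimate y at x_new from the line of regression of y on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Deviation sums about the means (equivalent to the sum formulas above)
    s_xy = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
    s_xx = sum(x * x for x in xs) - n * x_bar ** 2
    b_yx = s_xy / s_xx
    return y_bar + b_yx * (x_new - x_bar)


marks_a = [36, 32, 34, 31, 32, 32, 34]  # x: marks by judge A
marks_b = [35, 33, 31, 30, 34, 32, 36]  # y: marks by judge B
estimate = estimate_y(marks_a, marks_b, 38)
print(round(estimate, 2))
```

With $\bar{x} = \bar{y} = 33$ and $b_{yx} = 13/18$, the estimate comes out at roughly 36.6 marks.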
Example 14
The equations of two lines of regressions are 3𝑥 + 2𝑦 = 26, 6𝑥 + 𝑦 = 31.
Find (i) the mean of 𝑥 𝑎𝑛𝑑 𝑦
(ii) Coefficient of Correlation between 𝑥 𝑎𝑛𝑑 𝑦,
(iii) 𝜎𝑦 𝑖𝑓 𝜎𝑥 = 3
Solution
𝜎𝑥 = 3
Solving the given equations, we get the mean as
𝑥̅ = 4 𝑦̅ = 7
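If we consider the given equation
$3x + 2y = 26$ as the regression equation of $y$ on $x$, then
\[ 2y = -3x + 26, \qquad y = -\frac{3}{2}x + \frac{26}{2} \]
so $b_{yx} = -1.5$.
Taking $6x + y = 31$ as the regression equation of $x$ on $y$,
\[ 6x = -y + 31, \qquad x = -\frac{1}{6}y + \frac{31}{6} \]
Then $b_{xy} = -0.1667$
\[ r^2 = b_{yx}\, b_{xy} = (-1.5)(-0.1667) = 0.25 \]
Since both regression coefficients are negative, $r$ is negative:
\[ r = -\sqrt{0.25} = -0.5 \]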
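Using $b_{xy} = r\dfrac{\sigma_x}{\sigma_y}$ with $\sigma_x = 3$:
\[ -0.1667 = (-0.5)\frac{3}{\sigma_y} \quad\Rightarrow\quad 0.1667 = \frac{1.5}{\sigma_y} \]
\[ \sigma_y = \frac{1.5}{0.1667} = 8.9982 \cong 9 \]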
Practice Problems
Find the coefficients of regression and hence the equations of the lines of regression for the
following data
x 78 36 98 25 75 82 90 62 65 39
y 84 51 91 60 68 62 86 58 53 47