Fitting A Straight Line


CURVE FITTING

In many social, economic, engineering and physical problems we have a set of values of x and y
but do not know the functional relationship between them. Fitting a curve to a given set of
values means finding a relationship between x and y whose curve lies as close as possible to the
given points. The curve so obtained need not pass through all the given points, but it stays as
close to them as possible. Finding such a curve for a given set of values is called curve fitting. The relation is
usually assumed to be linear, parabolic, exponential or logarithmic.
The method used for fitting the curve is based on the principle of least squares: the constants
are chosen so that the sum of the squares of the deviations of the observed y values from the curve is a minimum.

Fitting a straight line


The line to be fitted is
\[ y = a + bx \]
where the values of a and b are calculated by solving the normal equations (n is the number of data points)

\[ \sum y = na + b\sum x \]

\[ \sum xy = a\sum x + b\sum x^2 \]
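
The two normal equations can be solved in closed form. The following is a minimal Python sketch, not part of the original notes; the function name `fit_line` is an illustrative choice. It is checked on the data of Example 1 below.

```python
def fit_line(xs, ys):
    """Least-squares line y = a + b*x obtained from the two normal equations."""
    n = len(xs)
    sx = sum(xs)                               # sum of x
    sy = sum(ys)                               # sum of y
    sxy = sum(x * y for x, y in zip(xs, ys))   # sum of x*y
    sxx = sum(x * x for x in xs)               # sum of x^2
    det = n * sxx - sx * sx                    # zero only if all x are equal
    if det == 0:
        raise ValueError("all x values are equal; the line is not determined")
    b = (n * sxy - sx * sy) / det
    a = (sy - b * sx) / n
    return a, b


# Data of Example 1
a, b = fit_line([0, 1, 2, 3, 4, 5], [1, 2, 3, 4.5, 6, 7.5])
print(round(a, 4), round(b, 4))
```

The printed values are rounded, whereas the worked examples truncate to four decimals, so the last digit may differ by one.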

Example 1
Fit a Straight line to the following data

𝑥 0 1 2 3 4 5
𝑦 1 2 3 4.5 6 7.5

Solution

𝑥 𝑦 𝑥𝑦 𝑥2
0 1 0 0
1 2 2 1
2 3 6 4
3 4.5 13.5 9
4 6 24 16
5 7.5 37.5 25
∑ 𝑥 =15 ∑ 𝑦 = 24 ∑ 𝑥 𝑦 = 83 ∑ 𝑥 2 = 55

EM III_SMITA N
Substituting in the Normal Equation

∑ 𝑦 = 𝑛𝑎 + 𝑏 ∑ 𝑥 24 = 6𝑎 + 15𝑏 ……………(1) :

∑ 𝑥𝑦 = 𝑎 ∑ 𝑥 + 𝑏 ∑ 𝑥 2 83 = 15𝑎 + 55𝑏……………..(2)

Solving (1) & (2) we get 𝑎 = 0.7142 𝑏 = 1.3142

𝑦 = 𝑎 + 𝑏𝑥
𝑦 = 0.7142 + 1.3142𝑥

Example 2
Fit a first-degree curve to the following data and estimate the value of
𝑦 𝑤ℎ𝑒𝑛 𝑥 = 73
𝑥 10 20 30 40 50 60 70 80
𝑦 1 3 5 10 6 4 2 1

Solution

𝑥 𝑦 𝑥𝑦 𝑥2
10 1 10 100
20 3 60 400
30 5 150 900
40 10 400 1600
50 6 300 2500
60 4 240 3600
70 2 140 4900
80 1 80 6400
∑ 𝑥 = 360 ∑ 𝑦 = 32 ∑ 𝑥 𝑦 = 1380 ∑ 𝑥 2 = 20400

Substituting in the Normal Equation


∑ 𝑦 = 𝑛𝑎 + 𝑏 ∑ 𝑥 32 = 8𝑎 + 360𝑏 ……………(1) :

∑ 𝑥𝑦 = 𝑎 ∑ 𝑥 + 𝑏 ∑ 𝑥 2 1380 = 360𝑎 + 20400𝑏……………..(2)

Solving (1) & (2) we get 𝑎 = 4.6428 𝑏 = −0.0142

𝑦 = 𝑎 + 𝑏𝑥
𝑦 = 4.6428 − 0.0142𝑥
When 𝑥 = 73, 𝑦 = 4.6428 − 0.0142(73) = 3.6062 (with unrounded coefficients the estimate is 𝑦 = 3.6)

Example 3
Fit a straight line to the following data
(𝑥, 𝑦) = (1,1), (2,5), (3,11), (4,8), (5,14)

Solution

𝑥 𝑦 𝑥𝑦 𝑥2
1 1 1 1
2 5 10 4
3 11 33 9
4 8 32 16
5 14 70 25
∑ 𝑥 = 15 ∑ 𝑦 = 39 ∑ 𝑥 𝑦 = 146 ∑ 𝑥 2 = 55

Substituting in the Normal Equation

∑ 𝑦 = 𝑛𝑎 + 𝑏 ∑ 𝑥 39 = 5𝑎 + 15𝑏 ……………(1) :

∑ 𝑥𝑦 = 𝑎 ∑ 𝑥 + 𝑏 ∑ 𝑥 2 146 = 15𝑎 + 55𝑏……………..(2)

Solving (1) & (2) we get 𝑎 = −0.9 𝑏 = 2.9

𝑦 = 𝑎 + 𝑏𝑥
𝑦 = −0.9 + 2.9𝑥

Example 4
Fit a straight line to the following data
𝑥 100 120 140 160 180 200
𝑦 0.45 0.55 0.60 0.70 0.80 0.85

Solution

𝑥 𝑦 𝑥𝑦 𝑥2
100 0.45 45 10000
120 0.55 66 14400
140 0.60 84 19600
160 0.70 112 25600
180 0.80 144 32400
200 0.85 170 40000
∑ 𝑥 = 900 ∑ 𝑦 = 3.95 ∑ 𝑥 𝑦 = 621 ∑ 𝑥 2 = 142000

Substituting in the Normal Equation


∑ 𝑦 = 𝑛𝑎 + 𝑏 ∑ 𝑥 3.95 = 6𝑎 + 900𝑏 ……………(1) :
∑ 𝑥𝑦 = 𝑎 ∑ 𝑥 + 𝑏 ∑ 𝑥 2 621 = 900𝑎 + 142000𝑏……………..(2)
Solving (1) & (2) we get 𝑎 = 0.0476 𝑏 = 0.00407

𝑦 = 𝑎 + 𝑏𝑥
𝑦 = 0.0476 + 0.00407𝑥

Example 5
Fit a straight line to the following data
𝑥 1 2 3 4 5 6
𝑦 49 54 60 73 80 86

Solution
𝑥 𝑦 𝑥𝑦 𝑥2
1 49 49 1
2 54 108 4
3 60 180 9
4 73 292 16
5 80 400 25
6 86 516 36
∑ 𝑥 = 21 ∑ 𝑦 = 402 ∑ 𝑥 𝑦 = 1545 ∑ 𝑥 2 = 91

Substituting in the Normal Equation
∑ 𝑦 = 𝑛𝑎 + 𝑏 ∑ 𝑥 402 = 6𝑎 + 21𝑏 ……………(1) :
∑ 𝑥𝑦 = 𝑎 ∑ 𝑥 + 𝑏 ∑ 𝑥 2 1545 = 21𝑎 + 91𝑏……………..(2)
Solving (1) & (2) we get 𝑎 = 39.4 𝑏 = 7.8857

𝑦 = 𝑎 + 𝑏𝑥
𝑦 = 39.4 + 7.8857𝑥
Example 6
Fit a straight line to the following data
(X,Y) = (1,-5),(1,1),(2,4),(3,7),(4,10)

Solution

𝑥 𝑦 𝑥𝑦 𝑥2
1 -5 -5 1
1 1 1 1
2 4 8 4
3 7 21 9
4 10 40 16
∑ 𝑥 = 11 ∑ 𝑦 = 17 ∑ 𝑥 𝑦 = 65 ∑ 𝑥 2 = 31

Substituting in the Normal Equation


∑ 𝑦 = 𝑛𝑎 + 𝑏 ∑ 𝑥 17 = 5𝑎 + 11𝑏 ……………(1) :

∑ 𝑥𝑦 = 𝑎 ∑ 𝑥 + 𝑏 ∑ 𝑥 2 65 = 11𝑎 + 31𝑏……………..(2)

Solving (1) & (2) we get 𝑎 = −5.5294 𝑏 = 4.0588

𝑦 = 𝑎 + 𝑏𝑥
𝑦 = −5.5294 + 4.0588𝑥

Example 7

Fit a Straight line to the following data

𝑥 1965 1966 1967 1968 1969


𝑦 125 140 165 195 200

Solution

𝑥 𝑦 𝑋 = 𝑥 − 1967 𝑌 = 𝑦 − 165 𝑋𝑌 𝑋2
1965 125 -2 -40 80 4
1966 140 -1 -25 25 1
1967 165 0 0 0 0
1968 195 1 30 30 1
1969 200 2 35 70 4
∑𝑋 = 0 ∑𝑌 = 0 ∑ 𝑋𝑌 = 205 ∑ 𝑋 2 = 10

Substituting in the Normal Equation


∑ 𝑌 = 𝑛𝑎 + 𝑏 ∑ 𝑋 0 = 5𝑎 + 0𝑏 ……………(1) :

∑ 𝑋𝑌 = 𝑎 ∑ 𝑋 + 𝑏 ∑ 𝑋 2 205 = 0𝑎 + 10𝑏……………..(2)

Solving (1) & (2) we get 𝑎 = 0 𝑏 = 20.5

𝑌 = 𝑎 + 𝑏𝑋
𝑌 = 0 + 20.5𝑋
But 𝑋 = 𝑥 − 1967 and 𝑌 = 𝑦 − 165, so
𝑦 − 165 = 20.5(𝑥 − 1967)
𝑦 = 20.5𝑥 − 40158.5
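
The change of origin used in Example 7 can be sketched in Python as follows. This is an illustrative sketch, not part of the original notes; `fit_line_shifted`, `x0` and `y0` are assumed names. The fit is done in the shifted variables X = x − x0 and Y = y − y0, and both shifts are undone at the end.

```python
def fit_line_shifted(xs, ys, x0, y0):
    """Fit Y = A + B*X with X = x - x0, Y = y - y0; return (a, b) for y = a + b*x."""
    X = [x - x0 for x in xs]
    Y = [y - y0 for y in ys]
    n = len(X)
    sX, sY = sum(X), sum(Y)
    sXY = sum(p * q for p, q in zip(X, Y))
    sXX = sum(p * p for p in X)
    B = (n * sXY - sX * sY) / (n * sXX - sX * sX)
    A = (sY - B * sX) / n
    # Undo both shifts: y - y0 = A + B*(x - x0)
    b = B
    a = y0 + A - B * x0
    return a, b


# Data of Example 7 with the origin moved to (1967, 165)
a, b = fit_line_shifted([1965, 1966, 1967, 1968, 1969],
                        [125, 140, 165, 195, 200], 1967, 165)
print(a, b)
```

A quick sanity check on the result: at x = 1967 the fitted line should return a value close to the observed y = 165.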

Example 8
Fit a straight line to the following data
𝑥 0 5 10 15 20 25
𝑦 12 12 17 22 24 30

𝐴𝑛𝑠 𝑦 = 10.1429 + 0.7486𝑥 (for the data as printed: ∑𝑥 = 75, ∑𝑦 = 117, ∑𝑥𝑦 = 1790, ∑𝑥² = 1375)

Example 9
Fit a straight line to the following data. Also estimate the production in 1967
𝑦𝑒𝑎𝑟 1951 1961 1971 1981 1991
𝑝𝑟𝑜𝑑𝑢𝑐𝑡𝑖𝑜𝑛 10 12 8 10 15

𝐴𝑛𝑠 𝑦 = −146.68 + 0.08𝑥


Production in 1967 = −146.68 + 0.08(1967) = 10.68

Example 10
Fit a straight line to the following data
𝑥 1 2 3 4 5 6
𝑦 83 92 71 90 160 191

𝐴𝑛𝑠 𝑦 = 38.2 + 21.8𝑥

Example 11
Fit a straight line to the following data. Estimate 𝑦 when 𝑥 = 12
𝑥 1 2 3 4 5 6 7 8 9 10
𝑦 52.5 58.7 65.0 70.2 75.4 81.1 87.2 95.5 102.2 108.4

𝐴𝑛𝑠 𝑦 = 119.66

Fitting a Parabola or Fitting a second-degree Curve

The curve to be fitted is
\[ y = a + bx + cx^2 \]
where the values of a, b and c are calculated by solving the normal equations

\[ \sum y = na + b\sum x + c\sum x^2 \]

\[ \sum xy = a\sum x + b\sum x^2 + c\sum x^3 \]

\[ \sum x^2 y = a\sum x^2 + b\sum x^3 + c\sum x^4 \]
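
The three normal equations form a 3×3 linear system in a, b and c. The sketch below, which is not part of the original notes, builds the system from the data and solves it by Gauss-Jordan elimination. The names `solve3` and `fit_parabola` are illustrative, and any other linear solver would do equally well.

```python
def solve3(m, v):
    """Solve the 3x3 system m * [a, b, c] = v by Gauss-Jordan elimination."""
    aug = [list(row) + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        # Partial pivoting: bring the largest entry of this column to the diagonal
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0:
            raise ValueError("normal equations are singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(3):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [p - f * q for p, q in zip(aug[r], aug[col])]
    return [aug[i][3] / aug[i][i] for i in range(3)]


def fit_parabola(xs, ys):
    """Least-squares parabola y = a + b*x + c*x^2 from the three normal equations."""
    n = len(xs)
    s = lambda p: sum(x ** p for x in xs)                         # sum of x^p
    t = lambda p: sum((x ** p) * y for x, y in zip(xs, ys))       # sum of x^p * y
    m = [[n,    s(1), s(2)],
         [s(1), s(2), s(3)],
         [s(2), s(3), s(4)]]
    v = [t(0), t(1), t(2)]
    return solve3(m, v)


# Data of Example 1 of this section
a, b, c = fit_parabola([1, 2, 3, 4, 5], [25, 28, 33, 39, 46])
print(round(a, 4), round(b, 4), round(c, 4))
```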

Example 1
Fit a second-degree Parabola to the following data

𝑥 1 2 3 4 5
𝑦 25 28 33 39 46

Solution

𝑥 𝑦 𝑥𝑦 𝑥2 𝑥 2𝑦 𝑥3 𝑥4
1 25 25 1 25 1 1
2 28 56 4 112 8 16
3 33 99 9 297 27 81
4 39 156 16 624 64 256
5 46 230 25 1150 125 625
∑𝑥 = 15 ∑𝑦 = 171 ∑𝑥𝑦 = 566 ∑𝑥² = 55 ∑𝑥²𝑦 = 2208 ∑𝑥³ = 225 ∑𝑥⁴ = 979

Substituting in the Normal Equation

∑ 𝑦 = 𝑛𝑎 + 𝑏 ∑ 𝑥 + 𝑐 ∑ 𝑥² 171 = 5𝑎 + 15𝑏 + 55𝑐 ……………(1)

∑ 𝑥𝑦 = 𝑎 ∑ 𝑥 + 𝑏 ∑ 𝑥² + 𝑐 ∑ 𝑥³ 566 = 15𝑎 + 55𝑏 + 225𝑐 ……………(2)

∑ 𝑥²𝑦 = 𝑎 ∑ 𝑥² + 𝑏 ∑ 𝑥³ + 𝑐 ∑ 𝑥⁴ 2208 = 55𝑎 + 225𝑏 + 979𝑐 ……………(3)

Solving (1),(2) & (3) we get 𝑎 = 22.8 𝑏 = 1.4428 𝑐 = 0.6428

𝑦 = 22.8 + 1.4428𝑥 + 0.6428𝑥²

Example 2
Fit a Parabola to the following data
𝑥 -2 -1 0 1 2
𝑦 1.0 1.8 1.3 2.5 6.3

Solution

𝑥 𝑦 𝑥𝑦 𝑥2 𝑥 2𝑦 𝑥3 𝑥4
-2 1.0 -2 4 4 -8 16
-1 1.8 -1.8 1 1.8 -1 1
0 1.3 0 0 0 0 0
1 2.5 2.5 1 2.5 1 1
2 6.3 12.6 4 25.2 8 16
∑𝑥 = 0 ∑𝑦 = 12.9 ∑𝑥𝑦 = 11.3 ∑𝑥² = 10 ∑𝑥²𝑦 = 33.5 ∑𝑥³ = 0 ∑𝑥⁴ = 34

∑ 𝑦 = 𝑛𝑎 + 𝑏 ∑ 𝑥 + 𝑐 ∑ 𝑥 2 12.9 = 5𝑎 + 0𝑏 + 10𝑐

∑ 𝑥𝑦 = 𝑎 ∑ 𝑥 + 𝑏 ∑ 𝑥 2 + 𝑐 ∑ 𝑥 3 11.3 = 0𝑎 + 10𝑏 + 0𝑐

∑ 𝑥2𝑦 = 𝑎 ∑ 𝑥2 + 𝑏 ∑ 𝑥3 + 𝑐 ∑ 𝑥4 33.5 = 10𝑎 + 0𝑏 + 34𝑐

Solving (1),(2) & (3) we get 𝑎 = 1.48 𝑏 = 1.13 𝑐 = 0.55

𝑦 = 1.48 + 1.13𝑥 + 0.55𝑥 2


Example 3
Fit a second-degree curve to the following data
𝑌𝑒𝑎𝑟 1921 1931 1941 1951 1961 1971 1981
𝑃𝑟𝑜𝑑𝑢𝑐𝑡𝑖𝑜𝑛(𝑖𝑛 𝑦𝑒𝑎𝑟𝑠): 3 5 9 10 12 14 15

Estimate the production in 1975


Solution
We change the variable 𝑥 to 𝑋 = (𝑥 − 1951)/10 and take 𝑌 = 𝑦.

𝑥 𝑋 = (𝑥 − 1951)/10 𝑌 = 𝑦 𝑋𝑌 𝑋² 𝑋²𝑌 𝑋³ 𝑋⁴
1921 -3 3 -9 9 27 -27 81
1931 -2 5 -10 4 20 -8 16
1941 -1 9 -9 1 9 -1 1
1951 0 10 0 0 0 0 0
1961 1 12 12 1 12 1 1
1971 2 14 28 4 56 8 16
1981 3 15 45 9 135 27 81
∑𝑋 = 0 ∑𝑌 = 68 ∑𝑋𝑌 = 57 ∑𝑋² = 28 ∑𝑋²𝑌 = 259 ∑𝑋³ = 0 ∑𝑋⁴ = 196

∑ 𝑌 = 𝑛𝑎 + 𝑏 ∑ 𝑋 + 𝑐 ∑ 𝑋 2 68 = 7𝑎 + 0𝑏 + 28𝑐

∑ 𝑋𝑌 = 𝑎 ∑ 𝑋 + 𝑏 ∑ 𝑋 2 + 𝑐 ∑ 𝑋 3 57 = 0𝑎 + 28𝑏 + 0𝑐

∑ 𝑋2𝑌 = 𝑎 ∑ 𝑋2 + 𝑏 ∑ 𝑋3 + 𝑐 ∑ 𝑋4 259 = 28𝑎 + 0𝑏 + 196𝑐

Solving (1),(2) & (3) we get 𝑎 = 10.33 𝑏 = 2.0357 𝑐 = −0.1547

𝑌 = 10.33 + 2.0357𝑋 − 0.1547𝑋 2


where 𝑋 = (𝑥 − 1951)/10.
When 𝑥 = 1975, 𝑋 = 2.4, so
𝑌 = 10.33 + 2.0357(2.4) − 0.1547(2.4)²

𝑦 = production = 14.32
Example 4
Fit a second-degree curve to the following data. Estimate the value of
𝑦 𝑤ℎ𝑒𝑛 𝑥 = 80
𝑥 10 20 30 40 50 60 70
𝑦 20 60 70 80 90 100 100

Solution
𝑥 𝑦 𝑥𝑦 𝑥2 𝑥 2𝑦 𝑥3 𝑥4
10 20 200 100 2000 1000 10000
20 60 1200 400 24000 8000 160000
30 70 2100 900 63000 27000 810000
40 80 3200 1600 128000 64000 2560000
50 90 4500 2500 225000 125000 6250000
60 100 6000 3600 360000 216000 12960000
70 100 7000 4900 490000 343000 24010000
∑𝑥 = 280 ∑𝑦 = 520 ∑𝑥𝑦 = 24200 ∑𝑥² = 14000 ∑𝑥²𝑦 = 1292000 ∑𝑥³ = 784000 ∑𝑥⁴ = 46760000

∑ 𝑦 = 𝑛𝑎 + 𝑏 ∑ 𝑥 + 𝑐 ∑ 𝑥 2 520 = 7𝑎 + 280𝑏 + 14000𝑐

∑ 𝑥𝑦 = 𝑎 ∑ 𝑥 + 𝑏 ∑ 𝑥 2 + 𝑐 ∑ 𝑥 3 24200 = 280𝑎 + 14000𝑏 + 784000𝑐

∑ 𝑥2𝑦 = 𝑎 ∑ 𝑥2 + 𝑏 ∑ 𝑥3 + 𝑐 ∑ 𝑥4 1292000 = 14000𝑎 + 784000𝑏 + 46760000𝑐

Solving (1),(2) & (3) we get 𝑎 = −2.8571 𝑏 = 3.119 𝑐 = −0.0238

𝑦 = −2.8571 + 3.119𝑥 − 0.0238𝑥 2

𝑤ℎ𝑒𝑛 𝑥 = 80 𝑦 = 94.3429
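
An estimate made far from the origin can be sensitive to rounding of the coefficients, because the rounded value of c is multiplied by x² = 6400. The sketch below is not part of the original notes. It assumes the exact solution of the three normal equations of this example is a = −20/7, b = 131/42, c = −1/42, and it checks that assumption in the code before comparing the exact estimate with the one obtained from the four-decimal coefficients.

```python
from fractions import Fraction

# Assumed exact solution of the normal equations of this example
a, b, c = Fraction(-20, 7), Fraction(131, 42), Fraction(-1, 42)

# The assumption is verified against the three normal equations
eq1 = 7 * a + 280 * b + 14000 * c
eq2 = 280 * a + 14000 * b + 784000 * c
eq3 = 14000 * a + 784000 * b + 46760000 * c
assert (eq1, eq2, eq3) == (520, 24200, 1292000)

x = 80
exact = a + b * x + c * x ** 2                         # exact arithmetic
rounded = -2.8571 + 3.119 * x - 0.0238 * x ** 2        # four-decimal coefficients

print(float(exact), rounded)
```

Under these assumptions the two estimates differ in the second decimal place, which is why it helps to carry more digits in a, b and c when the estimate is taken far from the origin.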

Example 5

Fit a Parabola to the following data


𝑥 -2 -1 0 1 2
𝑦 -3.150 -1.390 0.620 2.880 5.378

Solution

𝑥 𝑦 𝑥𝑦 𝑥2 𝑥 2𝑦 𝑥3 𝑥4
-2 -3.150 6.3 4 -12.6 -8 16
-1 -1.390 1.39 1 -1.39 -1 1
0 0.620 0 0 0 0 0
1 2.880 2.88 1 2.88 1 1
2 5.378 10.756 4 21.512 8 16
∑𝑥 = 0 ∑𝑦 = 4.338 ∑𝑥𝑦 = 21.326 ∑𝑥² = 10 ∑𝑥²𝑦 = 10.402 ∑𝑥³ = 0 ∑𝑥⁴ = 34

∑ 𝑦 = 𝑛𝑎 + 𝑏 ∑ 𝑥 + 𝑐 ∑ 𝑥 2 4.338 = 5𝑎 + 0𝑏 + 10𝑐

∑ 𝑥𝑦 = 𝑎 ∑ 𝑥 + 𝑏 ∑ 𝑥 2 + 𝑐 ∑ 𝑥 3 21.326 = 0𝑎 + 10𝑏 + 0𝑐

∑ 𝑥2𝑦 = 𝑎 ∑ 𝑥2 + 𝑏 ∑ 𝑥3 + 𝑐 ∑ 𝑥4 10.402 = 10𝑎 + 0𝑏 + 34𝑐

Solving (1),(2) & (3) we get 𝑎 = 0.621 𝑏 = 2.1326 𝑐 = 0.1232

𝑦 = 0.621 + 2.1326𝑥 + 0.1232𝑥 2


Example 6
Fit a second-degree curve to the following data.

𝑥 0 1 2 3 4 5 6
𝑦 1 1 3 7 13 21 31

Solution

𝑥 𝑦 𝑥𝑦 𝑥2 𝑥 2𝑦 𝑥3 𝑥4
0 1 0 0 0 0 0
1 1 1 1 1 1 1
2 3 6 4 12 8 16
3 7 21 9 63 27 81
4 13 52 16 208 64 256
5 21 105 25 525 125 625
6 31 186 36 1116 216 1296
∑𝑥 = 21 ∑𝑦 = 77 ∑𝑥𝑦 = 371 ∑𝑥² = 91 ∑𝑥²𝑦 = 1925 ∑𝑥³ = 441 ∑𝑥⁴ = 2275

∑ 𝑦 = 𝑛𝑎 + 𝑏 ∑ 𝑥 + 𝑐 ∑ 𝑥 2 77 = 7𝑎 + 21𝑏 + 91𝑐

∑ 𝑥𝑦 = 𝑎 ∑ 𝑥 + 𝑏 ∑ 𝑥 2 + 𝑐 ∑ 𝑥 3 371 = 21𝑎 + 91𝑏 + 441𝑐

∑ 𝑥2𝑦 = 𝑎 ∑ 𝑥2 + 𝑏 ∑ 𝑥3 + 𝑐 ∑ 𝑥4 1925 = 91𝑎 + 441𝑏 + 2275𝑐

Solving (1),(2) & (3) we get 𝑎 = 1 𝑏 = −1 𝑐 = 1

𝑦 = 1 − 𝑥 + 𝑥2

Example 7

Fit a Parabola to the following data


𝑥 0 1 2 3 4
𝑦 1.0 1.5 1.5 2.5 3.5

𝑦 = 1.0857 + 0.02857𝑥 + 0.1428𝑥 2

Example 8

Fit a Parabola to the following data


𝑥 1 2 3 4 5 6 7 8 9
𝑦 2 6 7 8 10 11 11 10 9

𝑦 = −0.93 + 3.52𝑥 − 0.27𝑥²

Example 9

Fit a Second-degree curve to the following data. Also estimate the production
in 1982
𝑦𝑒𝑎𝑟 1974 1975 1976 1977 1978 1979 1980 1981
𝑝𝑟𝑜𝑑𝑢𝑐𝑡𝑖𝑜𝑛 12 14 26 42 40 50 52 53

𝑦 = −2671997.77 + 2695.92𝑥 − 0.68𝑥 2


𝑤ℎ𝑒𝑛 𝑥 = 1982 𝑡ℎ𝑒𝑛 𝑦 = 55.35

CORRELATION
Correlation is the study of the existence, magnitude and direction of the relationship
between two or more variables.
Scatter diagram
One of the simplest methods of studying correlation between two variables is to construct a
scatter diagram. To obtain a scatter diagram, one variable is plotted along the x-axis and
the other along the y-axis on graph paper. Plotting the data in this way gives points
which are generally scattered but which show a pattern. The way in which the points are
scattered indicates the degree and direction of correlation. If the points lie close to a common line or curve,
we infer that the variables are correlated. If they are spread widely, we infer
that the variables are not correlated. Moreover, if the points lie in a narrow strip rising from
the bottom left to the top right, we say that there is a high degree of positive correlation.
If the points lie in a narrow strip falling from the top left to the bottom right, we
say that there is a high degree of negative correlation. If the points are scattered all over the diagram, we say
that there is no correlation.

Karl Pearson’s Coefficient of Correlation

The method of scatter diagram is descriptive in nature and gives only a general idea of
Correlation. The most commonly used method which gives a mathematical expression is the
one suggested by Karl Pearson.

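\[ r = \frac{\operatorname{Cov}(x, y)}{\sigma_x \sigma_y} \]

Cov(x, y) is called the covariance of x and y, defined by

\[ \operatorname{Cov}(x, y) = \frac{1}{N}\sum (x - \bar{x})(y - \bar{y}) \]

σ_x and σ_y are the standard deviations of x and y:

\[ \sigma_x^2 = \frac{1}{N}\sum (x - \bar{x})^2 \quad \text{is called the variance of } x \]

\[ \sigma_y^2 = \frac{1}{N}\sum (y - \bar{y})^2 \quad \text{is called the variance of } y \]

\[ r = \frac{\frac{1}{N}\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\frac{1}{N}\sum (x - \bar{x})^2}\,\sqrt{\frac{1}{N}\sum (y - \bar{y})^2}} \]

\[ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2}\,\sqrt{\sum (y - \bar{y})^2}} \]

\[ r = \frac{\sum xy - N\bar{x}\bar{y}}{\sqrt{\sum x^2 - N\bar{x}^2}\,\sqrt{\sum y^2 - N\bar{y}^2}} \]

\[ r = \frac{\sum xy - \frac{\sum x \sum y}{N}}{\sqrt{\sum x^2 - \frac{(\sum x)^2}{N}}\,\sqrt{\sum y^2 - \frac{(\sum y)^2}{N}}} \]

Note: −1 ≤ r ≤ 1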
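
The last form of r, written in terms of the raw sums, translates directly into code. The following Python sketch is not part of the original notes, and `pearson_r` is an illustrative name. It is checked on the data of Example 1 of the direct method.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Karl Pearson's coefficient of correlation from the raw sums."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    num = sxy - sx * sy / n
    den = sqrt(sxx - sx * sx / n) * sqrt(syy - sy * sy / n)
    return num / den   # undefined (division by zero) if x or y is constant


# Data of Example 1 (direct method)
r = pearson_r([2, 4, 5, 6, 8, 11], [18, 12, 10, 8, 7, 5])
print(round(r, 4))
```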

Direct Method
Example 1
Find Karl Pearson’s Coefficient of Correlation from the following data
𝑥 2 4 5 6 8 11
𝑦 18 12 10 8 7 5

Solution

𝑥 𝑦 𝑥𝑦 𝑥2 𝑦2
2 18 36 4 324
4 12 48 16 144
5 10 50 25 100
6 8 48 36 64
8 7 56 64 49
11 5 55 121 25
∑ 𝑥 = 36 ∑ 𝑦 = 60 ∑ 𝑥 𝑦 = 293 ∑ 𝑥 2 = 266 ∑ 𝑦 2 = 706

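\[ r = \frac{\sum xy - \frac{\sum x \sum y}{N}}{\sqrt{\sum x^2 - \frac{(\sum x)^2}{N}}\,\sqrt{\sum y^2 - \frac{(\sum y)^2}{N}}} \]

\[ r = \frac{293 - \frac{(36)(60)}{6}}{\sqrt{266 - \frac{(36)^2}{6}}\,\sqrt{706 - \frac{(60)^2}{6}}} = -0.9203 \]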

Example 2
Find Karl Pearson’s Coefficient of Correlation from the following data
𝑥 8 8 7 5 6 2
𝑦 3 4 10 13 22 8

Solution

𝑥 𝑦 𝑥𝑦 𝑥2 𝑦2
8 3 24 64 9
8 4 32 64 16
7 10 70 49 100
5 13 65 25 169
6 22 132 36 484
2 8 16 4 64
∑ 𝑥 = 36 ∑ 𝑦 = 60 ∑ 𝑥 𝑦 = 339 ∑ 𝑥 2 = 242 ∑ 𝑦 2 = 842

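\[ r = \frac{\sum xy - \frac{\sum x \sum y}{N}}{\sqrt{\sum x^2 - \frac{(\sum x)^2}{N}}\,\sqrt{\sum y^2 - \frac{(\sum y)^2}{N}}} \]

\[ r = \frac{339 - \frac{(36)(60)}{6}}{\sqrt{242 - \frac{(36)^2}{6}}\,\sqrt{842 - \frac{(60)^2}{6}}} = -0.2647 \]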

Actual mean method


Example 3
Find Karl Pearson’s Coefficient of Correlation from the following data

𝑥 100 102 108 111 115 116 118


𝑦 110 100 104 108 112 116 120

Solution

\[ \bar{x} = \frac{\sum x}{N} = \frac{770}{7} = 110 \]

\[ \bar{y} = \frac{\sum y}{N} = \frac{770}{7} = 110 \]

x  y  (x − x̄)  (y − ȳ)  (x − x̄)(y − ȳ)  (x − x̄)²  (y − ȳ)²
100 110 -10 0 0 100 0
102 100 -8 -10 80 64 100
108 104 -2 -6 12 4 36
111 108 1 -2 -2 1 4
115 112 5 2 10 25 4
116 116 6 6 36 36 36
118 120 8 10 80 64 100
∑(x − x̄)(y − ȳ) = 216  ∑(x − x̄)² = 294  ∑(y − ȳ)² = 280

\[ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2}\,\sqrt{\sum (y - \bar{y})^2}} \]

\[ r = \frac{216}{\sqrt{294}\,\sqrt{280}} = 0.7528 \]

Example 4

Find Karl Pearson’s Coefficient of Correlation from the following data


𝑥 65 66 67 67 68 69 70 72
𝑦 67 68 65 68 72 72 69 71

Solution
\[ \bar{x} = \frac{\sum x}{N} = \frac{544}{8} = 68 \]

\[ \bar{y} = \frac{\sum y}{N} = \frac{552}{8} = 69 \]

x  y  (x − x̄)  (y − ȳ)  (x − x̄)(y − ȳ)  (x − x̄)²  (y − ȳ)²
65 67 -3 -2 6 9 4
66 68 -2 -1 2 4 1
67 65 -1 -4 4 1 16
67 68 -1 -1 1 1 1
68 72 0 3 0 0 9
69 72 1 3 3 1 9
70 69 2 0 0 4 0
72 71 4 2 8 16 4
∑(x − x̄)(y − ȳ) = 24  ∑(x − x̄)² = 36  ∑(y − ȳ)² = 44

\[ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2}\,\sqrt{\sum (y - \bar{y})^2}} \]

\[ r = \frac{24}{\sqrt{36}\,\sqrt{44}} = 0.6030 \]
Assumed Mean Method
Example 5
Find Karl Pearson’s Coefficient of Correlation from the following data
𝑥 62 64 65 69 70 71 72 74
𝑦 126 125 139 145 165 152 180 208

Solution
𝑑𝑥 = 𝑥 − 70 𝑑𝑦 = 𝑦 − 165

𝑥 𝑑𝑥 𝑦 𝑑𝑦 𝑑𝑥 𝑑𝑦 𝑑𝑥 2 𝑑𝑦 2
62 -8 126 -39 312 64 1521
64 -6 125 -40 240 36 1600
65 -5 139 -26 130 25 676
69 -1 145 -20 20 1 400
70 0 165 0 0 0 0
71 1 152 -13 -13 1 169
72 2 180 15 30 4 225
74 4 208 43 172 16 1849
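∑dx = −13  ∑dy = −80  ∑dx dy = 891  ∑dx² = 147  ∑dy² = 6440

\[ r = \frac{\sum d_x d_y - \frac{\sum d_x \sum d_y}{N}}{\sqrt{\sum d_x^2 - \frac{(\sum d_x)^2}{N}}\,\sqrt{\sum d_y^2 - \frac{(\sum d_y)^2}{N}}} \]

\[ r = \frac{891 - \frac{(-13)(-80)}{8}}{\sqrt{147 - \frac{(-13)^2}{8}}\,\sqrt{6440 - \frac{(-80)^2}{8}}} = 0.9031 \]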
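
The assumed mean method works because r does not change when a constant is subtracted from every x or from every y. The Python sketch below is not part of the original notes, and `pearson_r_assumed`, `ax` and `ay` are illustrative names. It computes r from the deviations about any assumed means and compares two different choices on the data of Example 5.

```python
from math import sqrt


def pearson_r_assumed(xs, ys, ax, ay):
    """Pearson's r from deviations dx = x - ax, dy = y - ay about assumed means."""
    n = len(xs)
    dx = [x - ax for x in xs]
    dy = [y - ay for y in ys]
    s_dx, s_dy = sum(dx), sum(dy)
    s_dxdy = sum(p * q for p, q in zip(dx, dy))
    s_dx2 = sum(p * p for p in dx)
    s_dy2 = sum(q * q for q in dy)
    num = s_dxdy - s_dx * s_dy / n
    den = sqrt(s_dx2 - s_dx ** 2 / n) * sqrt(s_dy2 - s_dy ** 2 / n)
    return num / den


xs = [62, 64, 65, 69, 70, 71, 72, 74]
ys = [126, 125, 139, 145, 165, 152, 180, 208]
r_assumed = pearson_r_assumed(xs, ys, 70, 165)   # assumed means used in Example 5
r_direct = pearson_r_assumed(xs, ys, 0, 0)       # no shift: the direct method
print(round(r_assumed, 4), round(r_direct, 4))
```

Choosing assumed means close to the actual means only keeps the intermediate numbers small; it does not affect the result.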

Example 6
Find Karl Pearson’s Coefficient of Correlation from the following data
𝑥 52 42 38 42 45 42 44 40 46 44 43 40
𝑦 10 26 41 29 27 27 19 18 19 31 29 33

Solution
𝑑𝑥 = 𝑥 − 42
𝑑𝑦 = 𝑦 − 27

𝑥 𝑑𝑥 𝑦 𝑑𝑦 𝑑 𝑥 𝑑𝑦 𝑑𝑥 2 𝑑𝑦 2
52 10 10 -17 -170 100 289
42 0 26 -1 0 0 1
38 -4 41 14 -56 16 196
42 0 29 2 0 0 4
45 3 27 0 0 9 0
42 0 27 0 0 0 0
44 2 19 -8 -16 4 64
40 -2 18 -9 18 4 81
46 4 19 -8 -32 16 64
44 2 31 4 8 4 16
43 1 29 2 2 1 4
40 -2 33 6 -12 4 36
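∑dx = 14  ∑dy = −15  ∑dx dy = −258  ∑dx² = 158  ∑dy² = 755

\[ r = \frac{\sum d_x d_y - \frac{\sum d_x \sum d_y}{N}}{\sqrt{\sum d_x^2 - \frac{(\sum d_x)^2}{N}}\,\sqrt{\sum d_y^2 - \frac{(\sum d_y)^2}{N}}} \]

\[ r = \frac{-258 - \frac{(14)(-15)}{12}}{\sqrt{158 - \frac{(14)^2}{12}}\,\sqrt{755 - \frac{(-15)^2}{12}}} = -0.7446 \]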
Example 7
Find Karl Pearson’s Coefficient of Correlation from the following data
𝒙 2 3 4 7 4
𝐲 8 7 3 1 1

Solution

𝑥 𝑦 𝑥𝑦 𝑥2 𝑦2
2 8 16 4 64
3 7 21 9 49
4 3 12 16 9
7 1 7 49 1
4 1 4 16 1
∑ 𝑥 = 20 ∑ 𝑦 = 20 ∑ 𝑥 𝑦 = 60 ∑ 𝑥 2 = 94 ∑ 𝑦 2 = 124

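\[ r = \frac{\sum xy - \frac{\sum x \sum y}{N}}{\sqrt{\sum x^2 - \frac{(\sum x)^2}{N}}\,\sqrt{\sum y^2 - \frac{(\sum y)^2}{N}}} \]

\[ r = \frac{60 - \frac{(20)(20)}{5}}{\sqrt{94 - \frac{(20)^2}{5}}\,\sqrt{124 - \frac{(20)^2}{5}}} = -0.8058 \]
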
Example 8

Find Karl Pearson’s Coefficient of Correlation from the following data

𝒙 30 33 25 10 33 75 40 85 90 95
𝑦 68 65 80 85 70 30 55 18 15 10

Solution

𝑥 𝑦 𝑥𝑦 𝑥2 𝑦2
30 68 2040 900 4624
33 65 2145 1089 4225
25 80 2000 625 6400
10 85 850 100 7225
33 70 2310 1089 4900
75 30 2250 5625 900
40 55 2200 1600 3025
85 18 1530 7225 324
90 15 1350 8100 225
95 10 950 9025 100
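∑x = 516  ∑y = 496  ∑xy = 17625  ∑x² = 35378  ∑y² = 31948

\[ r = \frac{\sum xy - \frac{\sum x \sum y}{N}}{\sqrt{\sum x^2 - \frac{(\sum x)^2}{N}}\,\sqrt{\sum y^2 - \frac{(\sum y)^2}{N}}} \]

\[ r = \frac{17625 - \frac{(516)(496)}{10}}{\sqrt{35378 - \frac{(516)^2}{10}}\,\sqrt{31948 - \frac{(496)^2}{10}}} = -0.9937 \]
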
Example 9

Find Karl Pearson’s Coefficient of Correlation from the following data

𝒙 10 12 14 15 16 17 18 10 14 15
𝑦 17 16 15 12 10 9 8 15 13 12

Solution

𝑥 𝑦 𝑥𝑦 𝑥2 𝑦2
10 17 170 100 289
12 16 192 144 256
14 15 210 196 225
15 12 180 225 144
16 10 160 256 100
17 9 153 289 81
18 8 144 324 64
10 15 150 100 225
14 13 182 196 169
15 12 180 225 144
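∑x = 141  ∑y = 127  ∑xy = 1721  ∑x² = 2055  ∑y² = 1697

\[ r = \frac{\sum xy - \frac{\sum x \sum y}{N}}{\sqrt{\sum x^2 - \frac{(\sum x)^2}{N}}\,\sqrt{\sum y^2 - \frac{(\sum y)^2}{N}}} \]

\[ r = \frac{1721 - \frac{(141)(127)}{10}}{\sqrt{2055 - \frac{(141)^2}{10}}\,\sqrt{1697 - \frac{(127)^2}{10}}} = -0.9292 \]
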
Example 10

Find Karl Pearson’s Coefficient of Correlation from the following data


𝒙 61 63 65 67 69
𝐲 64 62 65 70 72

Solution

𝑥 𝑦 𝑥𝑦 𝑥2 𝑦2
61 64 3904 3721 4096
63 62 3906 3969 3844
65 65 4225 4225 4225
67 70 4690 4489 4900
69 72 4968 4761 5184
∑ 𝑥 = 325 ∑ 𝑦 = 333 ∑ 𝑥 𝑦 = 21693 ∑ 𝑥 2 = 21165 ∑ 𝑦 2 = 22249

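\[ r = \frac{\sum xy - \frac{\sum x \sum y}{N}}{\sqrt{\sum x^2 - \frac{(\sum x)^2}{N}}\,\sqrt{\sum y^2 - \frac{(\sum y)^2}{N}}} \]

\[ r = \frac{21693 - \frac{(325)(333)}{5}}{\sqrt{21165 - \frac{(325)^2}{5}}\,\sqrt{22249 - \frac{(333)^2}{5}}} = 0.8994 \]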

Example 11
Find Karl Pearson’s Coefficient of Correlation from the following data
𝒙 3 5 4 6 2
𝑦 3 4 5 2 6

Solution

𝑥 𝑦 𝑥𝑦 𝑥2 𝑦2
3 3 9 9 9
5 4 20 25 16
4 5 20 16 25
6 2 12 36 4
2 6 12 4 36
∑ 𝑥 = 20 ∑ 𝑦 = 20 ∑ 𝑥 𝑦 = 73 ∑ 𝑥 2 = 90 ∑ 𝑦 2 = 90

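\[ r = \frac{\sum xy - \frac{\sum x \sum y}{N}}{\sqrt{\sum x^2 - \frac{(\sum x)^2}{N}}\,\sqrt{\sum y^2 - \frac{(\sum y)^2}{N}}} \]

\[ r = \frac{73 - \frac{(20)(20)}{5}}{\sqrt{90 - \frac{(20)^2}{5}}\,\sqrt{90 - \frac{(20)^2}{5}}} = -0.7 \]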

Example 12
Find Karl Pearson’s Coefficient of Correlation from the following data
𝒙 23 27 28 29 30 31 33 35 36 39
𝑦 18 22 23 24 25 26 28 29 30 32

𝐴𝑛𝑠 𝑟 = 0.9948
Example 13
Find Karl Pearson’s Coefficient of Correlation from the following data
𝒙 100 200 300 400 500
𝑦 30 40 50 60 70

𝐴𝑛𝑠 𝑟 = 1

Example 14
Find Karl Pearson’s Coefficient of Correlation from the following data
𝒙 100 98 85 92 90 84 88 90 93 95
𝑦 500 610 700 630 670 800 800 750 700 690

𝐴𝑛𝑠 𝑟 = −0.8179

Example 15
A sample of 25 pairs of values of x and y led to the following results.

∑x = 127, ∑y = 100, ∑x² = 760, ∑y² = 449, ∑xy = 500.

Later on, it was found that two pairs of values had been taken as (8,14) and (8,6) instead of the correct
values (8,12) and (6,8). Find the corrected correlation coefficient between x and y.

Solution

We know that
\[ r = \frac{\sum xy - \frac{\sum x \sum y}{N}}{\sqrt{\sum x^2 - \frac{(\sum x)^2}{N}}\,\sqrt{\sum y^2 - \frac{(\sum y)^2}{N}}} \]

Corrected ∑x = 127 − (incorrect values of x) + (correct values of x)
= 127 − (8 + 8) + (8 + 6)
= 127 − 16 + 14 = 125

Corrected ∑y = 100 − (incorrect values of y) + (correct values of y)
= 100 − (14 + 6) + (12 + 8)
= 100 − 20 + 20 = 100

Corrected ∑xy = 500 − (incorrect values of xy) + (correct values of xy)
= 500 − (8 × 14 + 8 × 6) + (8 × 12 + 6 × 8)
= 500 − 160 + 144 = 484

Corrected ∑x² = 760 − (incorrect values of x²) + (correct values of x²)
= 760 − (8² + 8²) + (8² + 6²)
= 760 − 128 + 100 = 732

Corrected ∑y² = 449 − (incorrect values of y²) + (correct values of y²)
= 449 − (14² + 6²) + (12² + 8²)
= 449 − 232 + 208 = 425

\[ \text{Corrected } r = \frac{484 - \frac{(125)(100)}{25}}{\sqrt{732 - \frac{(125)^2}{25}}\,\sqrt{425 - \frac{(100)^2}{25}}} = -0.3093 \]
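
The correction procedure of Example 15 amounts to subtracting the contribution of each wrong pair from all five sums and adding the contribution of each correct pair. The sketch below is not part of the original notes, and `corrected_r` and its argument layout are assumptions made for illustration.

```python
from math import sqrt


def corrected_r(n, sums, wrong_pairs, right_pairs):
    """Recompute r after replacing miscopied (x, y) pairs.

    sums is the tuple (sum x, sum y, sum x^2, sum y^2, sum xy) as first reported.
    """
    sx, sy, sxx, syy, sxy = sums
    for x, y in wrong_pairs:      # remove what the wrong pairs contributed
        sx -= x; sy -= y; sxx -= x * x; syy -= y * y; sxy -= x * y
    for x, y in right_pairs:      # add what the correct pairs contribute
        sx += x; sy += y; sxx += x * x; syy += y * y; sxy += x * y
    num = sxy - sx * sy / n
    den = sqrt(sxx - sx * sx / n) * sqrt(syy - sy * sy / n)
    return num / den


# Example 15: (8,14) and (8,6) were recorded instead of (8,12) and (6,8)
r = corrected_r(25, (127, 100, 760, 449, 500),
                [(8, 14), (8, 6)], [(8, 12), (6, 8)])
print(round(r, 4))
```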

Example 16
Calculate the coefficient of correlation between x and y from the following data:
N = 10, ∑x = 225, ∑y = 189, ∑(x − 22)² = 85, ∑(y − 19)² = 25, ∑(x − 22)(y − 19) = 42

Solution

𝑑𝑥 = 𝑥 − 22 𝑑𝑦 = 𝑦 −19

∑(𝑥 − 22)2 = 85 , ∑(𝑦 − 19)2 = 25, ∑(𝑥 − 22)(𝑦 − 19) = 42

∑ 𝑑𝑥 2 = 85 ∑ 𝑑𝑦 2 = 25 ∑ 𝑑𝑥 𝑑𝑦 = 42

With N = 10,

∑dx = ∑(x − 22) = ∑x − N(22) = 225 − 10(22) = 5

∑dy = ∑(y − 19) = ∑y − N(19) = 189 − 10(19) = −1

\[ r = \frac{\sum d_x d_y - \frac{\sum d_x \sum d_y}{N}}{\sqrt{\sum d_x^2 - \frac{(\sum d_x)^2}{N}}\,\sqrt{\sum d_y^2 - \frac{(\sum d_y)^2}{N}}} \]

\[ r = \frac{42 - \frac{5(-1)}{10}}{\sqrt{85 - \frac{25}{10}}\,\sqrt{25 - \frac{1}{10}}} = 0.9376 \]

Example 17
Given No. of pairs of observation=10
𝑥 series standard deviation=22.70
𝑦 series standard deviation=9.592
Summation of the products of corresponding deviations of x and y from their respective
actual means = −1439. Find r.

Solution
\[ r = \frac{\operatorname{Cov}(x, y)}{\sigma_x \sigma_y} \]

\[ \operatorname{Cov}(x, y) = \frac{1}{N}\sum (x - \bar{x})(y - \bar{y}) \]

The summation of the products of corresponding deviations of x and y from their respective
actual means is ∑(x − x̄)(y − ȳ) = −1439, so

\[ \operatorname{Cov}(x, y) = -\frac{1439}{10} = -143.9 \]

With σ_x = 22.70 and σ_y = 9.592,

\[ r = \frac{-143.9}{(22.70)(9.592)} = -0.6608 \]

Example 18
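Given r = 0.5, ∑xy = 60, σ_y = 4 and ∑x² = 90, where x and y denote the deviations from the
respective arithmetic means. Calculate the number of observations.
Solution

Since x and y here are deviations from the means, ∑xy stands for ∑(x − x̄)(y − ȳ) and ∑x² stands for ∑(x − x̄)².

\[ r = \frac{\operatorname{Cov}(x, y)}{\sigma_x \sigma_y} = \frac{\frac{1}{N}\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\frac{1}{N}\sum (x - \bar{x})^2}\;\sigma_y} \]

\[ 0.5 = \frac{\frac{1}{N}(60)}{\sqrt{\frac{90}{N}}\,(4)} \]

\[ \sqrt{N} = \frac{60}{(0.5)(\sqrt{90})(4)} = 3.1622 \]

\[ N = 10 \]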

Example 19
Coefficient of Correlation between two variables is 0.4. Their Covariance is 12 and the
variance of 𝑥 is 25. Find the standard deviation of y.
Solution

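\[ r = \frac{\operatorname{Cov}(x, y)}{\sigma_x \sigma_y} \]

Variance of x: σ_x² = 25, so σ_x = 5; σ_y is to be found.

\[ 0.4 = \frac{12}{(5)\,\sigma_y} \]

\[ \sigma_y = \frac{12}{(5)(0.4)} = 6 \]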

Example 20
Calculate the coefficient of correlation between x and y from the following data:
N = 10, ∑x = 140, ∑y = 150, ∑(x − 10)² = 180, ∑(y − 15)² = 215, ∑(x − 10)(y − 15) = 60
Solution

𝑑𝑥 = 𝑥 − 10 𝑑𝑦 = 𝑦 −15

∑(𝑥 − 10)2 = 180 , ∑(𝑦 − 15)2 = 215, ∑(𝑥 − 10)(𝑦 − 15) = 60

∑ 𝑑𝑥 2 = 180 ∑ 𝑑𝑦 2 = 215 ∑ 𝑑𝑥 𝑑𝑦 = 60

With N = 10,

∑dx = ∑(x − 10) = ∑x − N(10) = 140 − 100 = 40

∑dy = ∑(y − 15) = ∑y − N(15) = 150 − 150 = 0

\[ r = \frac{\sum d_x d_y - \frac{\sum d_x \sum d_y}{N}}{\sqrt{\sum d_x^2 - \frac{(\sum d_x)^2}{N}}\,\sqrt{\sum d_y^2 - \frac{(\sum d_y)^2}{N}}} \]

\[ r = \frac{60 - \frac{40(0)}{10}}{\sqrt{180 - \frac{40^2}{10}}\,\sqrt{215 - \frac{0^2}{10}}} = 0.9149 \]

Example 21
A Computer while calculating the correlation coefficient between two variables 𝑥 𝑎𝑛𝑑 𝑦
from 25 observations obtained the following results.

∑ 𝑥 = 125 , ∑ 𝑦 = 100 , ∑ 𝑥 2 = 650 , ∑ 𝑦 2 = 960 , ∑ 𝑥𝑦 = 508.

where x and y denote the actual values of the variables. Find the value of r.

Solution
\[ r = \frac{\sum xy - \frac{\sum x \sum y}{N}}{\sqrt{\sum x^2 - \frac{(\sum x)^2}{N}}\,\sqrt{\sum y^2 - \frac{(\sum y)^2}{N}}} \]

\[ r = \frac{508 - \frac{(125)(100)}{25}}{\sqrt{650 - \frac{125^2}{25}}\,\sqrt{960 - \frac{100^2}{25}}} = 0.0676 \]

SPEARMAN’S RANK CORRELATION
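The method developed by Spearman is simpler than Karl Pearson's method as it
depends on the ranks of the items and not on the actual values of the items. The coefficient is
denoted by R. The value of R lies between −1 and +1.
When the ranks are not equal (no rank is repeated)

\[ R = 1 - \frac{6\sum d^2}{n^3 - n} \]

where d is the difference between the two ranks of a pair and n is the number of pairs.

When the ranks are equal (some ranks are repeated)

\[ R = 1 - \frac{6\left[\sum d^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2) + \frac{1}{12}(m_3^3 - m_3) + \cdots\right]}{n^3 - n} \]

where m_1, m_2, m_3, … are the numbers of times each repeated value occurs.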
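
Both formulas can be sketched together in Python, since the correction terms vanish when no value is repeated. This sketch is not part of the original notes, and `average_ranks`, `tie_correction` and `spearman_R` are illustrative names. Rank 1 goes to the largest value, as in the worked examples, and tied values share the mean of the ranks they occupy.

```python
from collections import Counter


def average_ranks(values):
    """Rank 1 = largest value; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of entries tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = ((i + 1) + (j + 1)) / 2
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def tie_correction(values):
    """Sum of (m^3 - m)/12 over every value repeated m > 1 times."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values() if m > 1)


def spearman_R(xs, ys):
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    d2 = sum((p - q) ** 2 for p, q in zip(rx, ry))
    correction = tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * (d2 + correction) / (n ** 3 - n)


# No repeated values (data of Example 1 below)
R_no_ties = spearman_R([36, 56, 20, 42, 33, 44, 50, 15, 60],
                       [50, 35, 70, 58, 75, 60, 45, 80, 38])
# One repeated value in x (data of Example 7 below)
R_ties = spearman_R([52, 63, 45, 36, 72, 65, 45, 25],
                    [62, 53, 51, 25, 79, 43, 60, 33])
print(round(R_no_ties, 4), round(R_ties, 4))
```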

Example 1
Compute Spearman’s Rank Correlation Coefficient from the following data
𝑥 36 56 20 42 33 44 50 15 60
𝑦 50 35 70 58 75 60 45 80 38

Solution
x  Rank(x)  y  Rank(y)  d = Rank(x) − Rank(y)  d²
36 6 50 6 0 0
56 2 35 9 -7 49
20 8 70 3 5 25
42 5 58 5 0 0
33 7 75 2 5 25
44 4 60 4 0 0
50 3 45 7 -4 16
15 9 80 1 8 64
60 1 38 8 -7 49

∑d² = 228

Since the ranks are not equal

\[ R = 1 - \frac{6\sum d^2}{n^3 - n} = 1 - \frac{6(228)}{9^3 - 9} = -0.9 \]
Example 2
Compute Spearman’s Rank Correlation Coefficient from the following data
𝑥 62 64 65 69 70 71 72 74
𝑦 126 125 139 145 165 152 180 208

Solution

x  Rank(x)  y  Rank(y)  d = Rank(x) − Rank(y)  d²
62 8 126 7 1 1
64 7 125 8 -1 1
65 6 139 6 0 0
69 5 145 5 0 0
70 4 165 3 1 1
71 3 152 4 -1 1
72 2 180 2 0 0
74 1 208 1 0 0

∑d² = 4

Since the ranks are not equal

\[ R = 1 - \frac{6\sum d^2}{n^3 - n} = 1 - \frac{6(4)}{8^3 - 8} = 0.9523 \]

Example 3
Compute Spearman’s Rank Correlation Coefficient from the following data
𝑥 25 28 32 36 38 40 39 42 41 45
𝑦 70 80 85 75 59 65 48 50 54 66

Solution
x  Rank(x)  y  Rank(y)  d = Rank(x) − Rank(y)  d²

25 10 70 4 6 36
28 9 80 2 7 49
32 8 85 1 7 49
36 7 75 3 4 16
38 6 59 7 -1 1
40 4 65 6 -2 4
39 5 48 10 -5 25
42 2 50 9 -7 49
41 3 54 8 -5 25
45 1 66 5 -4 16

∑d² = 270

Since the ranks are not equal

\[ R = 1 - \frac{6\sum d^2}{n^3 - n} = 1 - \frac{6(270)}{10^3 - 10} = -0.6363 \]

Example 4
Compute Spearman’s Rank Correlation Coefficient from the following data
𝑥 12 17 22 27 32
𝑦 113 119 117 115 121

Solution

x  Rank(x)  y  Rank(y)  d = Rank(x) − Rank(y)  d²
12 5 113 5 0 0
17 4 119 2 2 4
22 3 117 3 0 0
27 2 115 4 -2 4
32 1 121 1 0 0

∑d² = 8

Since the ranks are not equal

\[ R = 1 - \frac{6\sum d^2}{n^3 - n} = 1 - \frac{6(8)}{5^3 - 5} = 0.6 \]

Example 5
Compute Spearman’s Rank Correlation Coefficient from the following data
𝑥 18 20 34 52 12
𝑦 39 23 35 18 46

Solution

x  Rank(x)  y  Rank(y)  d = Rank(x) − Rank(y)  d²
18 4 39 2 2 4
20 3 23 4 -1 1
34 2 35 3 -1 1
52 1 18 5 -4 16
12 5 46 1 4 16

∑d² = 38

Since the ranks are not equal

\[ R = 1 - \frac{6\sum d^2}{n^3 - n} = 1 - \frac{6(38)}{5^3 - 5} = -0.9 \]

Example 6
Compute Spearman’s Rank Correlation Coefficient from the following data
𝑥 61 63 65 67 69
𝑦 64 62 65 70 72

Solution

x  Rank(x)  y  Rank(y)  d = Rank(x) − Rank(y)  d²
61 5 64 4 1 1
63 4 62 5 -1 1
65 3 65 3 0 0
67 2 70 2 0 0
69 1 72 1 0 0

∑d² = 2

Since the ranks are not equal

\[ R = 1 - \frac{6\sum d^2}{n^3 - n} = 1 - \frac{6(2)}{5^3 - 5} = 0.9 \]
Example 7
Compute Spearman’s Rank Correlation Coefficient from the following data
𝑥 52 63 45 36 72 65 45 25
𝑦 62 53 51 25 79 43 60 33
Solution
x  Rank(x)  y  Rank(y)  d = Rank(x) − Rank(y)  d²
52 4 62 2 2 4
63 3 53 4 -1 1
45 5.5 51 5 0.5 0.25
36 7 25 8 -1 1
72 1 79 1 0 0
65 2 43 6 -4 16
45 5.5 60 3 2.5 6.25
25 8 33 7 1 1

There are two entries in the x column having the same value 45, i.e. there is a tie
between ranks 5 and 6, so each is given the rank (5 + 6)/2 = 5.5.

∑d² = 29.5

When the ranks are equal

\[ R = 1 - \frac{6\left[\sum d^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2) + \frac{1}{12}(m_3^3 - m_3) + \cdots\right]}{n^3 - n} \]

m₁ = number of times 45 is repeated = 2

\[ R = 1 - \frac{6\left[29.5 + \frac{1}{12}(2^3 - 2)\right]}{8^3 - 8} = 1 - \frac{6[29.5 + 0.5]}{8^3 - 8} = 1 - \frac{6[30]}{8^3 - 8} = 0.6428 \]

Example 8
The following table shows the marks obtained by 10 students in Accountancy
and Statistics. Find Spearman’s coefficient of rank Correlation
Student no. 1 2 3 4 5 6 7 8 9 10
Accountancy 45 70 65 30 90 40 50 57 85 60
𝑆𝑡𝑎𝑡𝑖𝑠𝑡𝑖𝑐𝑠 35 90 70 40 95 40 60 80 80 50

Solution
x  Rank(x)  y  Rank(y)  d = Rank(x) − Rank(y)  d²
45 8 35 10 -2 4
70 3 90 2 1 1
65 4 70 5 -1 1
30 10 40 8.5 1.5 2.25
90 1 95 1 0 0
40 9 40 8.5 0.5 0.25
50 7 60 6 1 1
57 6 80 3.5 2.5 6.25
85 2 80 3.5 -1.5 2.25
60 5 50 7 -2 4
There are two entries in the y column having the same value 80, i.e. there is a tie
between ranks 3 and 4, so each is given the rank (3 + 4)/2 = 3.5.

There are two entries in the y column having the same value 40, i.e. there is a tie
between ranks 8 and 9, so each is given the rank (8 + 9)/2 = 8.5.

∑d² = 22

When the ranks are equal

\[ R = 1 - \frac{6\left[\sum d^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2) + \frac{1}{12}(m_3^3 - m_3) + \cdots\right]}{n^3 - n} \]

m₁ = number of times 80 is repeated = 2

m₂ = number of times 40 is repeated = 2

\[ R = 1 - \frac{6\left[22 + \frac{1}{12}(2^3 - 2) + \frac{1}{12}(2^3 - 2)\right]}{10^3 - 10} = 1 - \frac{6[22 + 0.5 + 0.5]}{10^3 - 10} = 1 - \frac{6[23]}{10^3 - 10} = 0.8606 \]

Example 9
Obtain the Rank Correlation coefficient from the following data
𝑥 85 74 85 50 65 78 74 60 74 90
𝑦 78 91 78 58 60 72 80 55 68 70

Solution
x  Rank(x)  y  Rank(y)  d = Rank(x) − Rank(y)  d²
85 2.5 78 3.5 -1 1
74 6 91 1 5 25
85 2.5 78 3.5 -1 1
50 10 58 9 1 1
65 8 60 8 0 0
78 4 72 5 -1 1
74 6 80 2 4 16
60 9 55 10 -1 1
74 6 68 7 -1 1
90 1 70 6 -5 25

There are two entries in the x column having the same value 85, i.e. there is a tie
between ranks 2 and 3, so each is given the rank (2 + 3)/2 = 2.5.

There are three entries in the x column having the same value 74, i.e. there is a tie
between ranks 5, 6 and 7, so each is given the rank (5 + 6 + 7)/3 = 6.

There are two entries in the y column having the same value 78, i.e. there is a tie
between ranks 3 and 4, so each is given the rank (3 + 4)/2 = 3.5.

∑d² = 72

When the ranks are equal

\[ R = 1 - \frac{6\left[\sum d^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2) + \frac{1}{12}(m_3^3 - m_3) + \cdots\right]}{n^3 - n} \]

m₁ = number of times 85 is repeated in the x column = 2

m₂ = number of times 74 is repeated in the x column = 3

m₃ = number of times 78 is repeated in the y column = 2

\[ R = 1 - \frac{6\left[72 + \frac{1}{12}(2^3 - 2) + \frac{1}{12}(3^3 - 3) + \frac{1}{12}(2^3 - 2)\right]}{10^3 - 10} = 1 - \frac{6[72 + 0.5 + 2 + 0.5]}{10^3 - 10} = 1 - \frac{6[75]}{10^3 - 10} = 0.5454 \]

Example 10
Obtain the rank correlation coefficient from the following data

𝑥 10 12 18 18 15 40
𝑦 12 18 25 25 50 25
Solution
x  Rank(x)  y  Rank(y)  d = Rank(x) − Rank(y)  d²
10 6 12 6 0 0
12 5 18 5 0 0
18 2.5 25 3 -0.5 0.25
18 2.5 25 3 -0.5 0.25
15 4 50 1 3 9
40 1 25 3 -2 4

There are two entries in the x column having the same value 18, i.e. there is a tie
between ranks 2 and 3, so each is given the rank (2 + 3)/2 = 2.5.

There are three entries in the y column having the same value 25, i.e. there is a
tie between ranks 2, 3 and 4, so each is given the rank (2 + 3 + 4)/3 = 3.

∑d² = 13.5

When the ranks are equal

\[ R = 1 - \frac{6\left[\sum d^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2) + \frac{1}{12}(m_3^3 - m_3) + \cdots\right]}{n^3 - n} \]

m₁ = number of times 18 is repeated in the x column = 2

m₂ = number of times 25 is repeated in the y column = 3

\[ R = 1 - \frac{6\left[13.5 + \frac{1}{12}(2^3 - 2) + \frac{1}{12}(3^3 - 3)\right]}{6^3 - 6} = 1 - \frac{6[13.5 + 0.5 + 2]}{6^3 - 6} = 1 - \frac{6[16]}{6^3 - 6} = 0.5429 \]

Example 11
Obtain the Rank Correlation coefficient from the following data
𝑥 32 55 49 60 43 37 43 49 10 20
𝑦 40 30 70 20 30 50 72 60 45 25
Solution

x  Rank(x)  y  Rank(y)  d = Rank(x) − Rank(y)  d²
32 8 40 6 2 4
55 2 30 7.5 -5.5 30.25
49 3.5 70 2 1.5 2.25
60 1 20 10 -9 81
43 5.5 30 7.5 -2 4
37 7 50 4 3 9
43 5.5 72 1 4.5 20.25
49 3.5 60 3 0.5 0.25
10 10 45 5 5 25
20 9 25 9 0 0
There are two entries in the x column having the same value 49, i.e. there is a tie
between ranks 3 and 4, so each is given the rank (3 + 4)/2 = 3.5.

There are two entries in the x column having the same value 43, i.e. there is a tie
between ranks 5 and 6, so each is given the rank (5 + 6)/2 = 5.5.

There are two entries in the y column having the same value 30, i.e. there is a tie
between ranks 7 and 8, so each is given the rank (7 + 8)/2 = 7.5.

∑d² = 176

When the ranks are equal

\[ R = 1 - \frac{6\left[\sum d^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2) + \frac{1}{12}(m_3^3 - m_3) + \cdots\right]}{n^3 - n} \]

m₁ = number of times 49 is repeated in the x column = 2

m₂ = number of times 43 is repeated in the x column = 2

m₃ = number of times 30 is repeated in the y column = 2

\[ R = 1 - \frac{6\left[176 + \frac{1}{12}(2^3 - 2) + \frac{1}{12}(2^3 - 2) + \frac{1}{12}(2^3 - 2)\right]}{10^3 - 10} = 1 - \frac{6[176 + 0.5 + 0.5 + 0.5]}{10^3 - 10} = 1 - \frac{6[177.5]}{10^3 - 10} = -0.07575 \]
Example 12

Compute Spearman’s rank correlation coefficient from the following data


X 16 18 25 30 12
Y 38 21 38 16 50

Solution

x  Rank(x)  y  Rank(y)  d = Rank(x) − Rank(y)  d²
16 4 38 2.5 1.5 2.25
18 3 21 4 -1 1
25 2 38 2.5 -0.5 0.25
30 1 16 5 -4 16
12 5 50 1 4 16

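∑d² = 35.5

When the ranks are equal

\[ R = 1 - \frac{6\left[\sum d^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2) + \frac{1}{12}(m_3^3 - m_3) + \cdots\right]}{n^3 - n} \]

m₁ = number of times 38 is repeated in the y column = 2

\[ R = 1 - \frac{6\left[35.5 + \frac{1}{12}(2^3 - 2)\right]}{5^3 - 5} = 1 - \frac{6[36]}{5^3 - 5} = -0.8 \]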
Example 13
Calculate the value of rank correlation coefficient from the following data
regarding marks of 6 students in statistics and accountancy in a test.
Marks in Statistics   40 42 45 35 36 39
Marks in Accountancy  46 43 44 39 40 43

Solution
x  Rank(x)  y  Rank(y)  d = Rank(x) − Rank(y)  d²
40 3 46 1 2 4
42 2 43 3.5 -1.5 2.25
45 1 44 2 -1 1
35 6 39 6 0 0
36 5 40 5 0 0
39 4 43 3.5 0.5 0.25

∑d² = 7.5

When the ranks are equal

\[ R = 1 - \frac{6\left[\sum d^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2) + \frac{1}{12}(m_3^3 - m_3) + \cdots\right]}{n^3 - n} \]

m₁ = number of times 43 is repeated = 2

\[ R = 1 - \frac{6\left[7.5 + \frac{1}{12}(2^3 - 2)\right]}{6^3 - 6} = 1 - \frac{6[7.5 + 0.5]}{6^3 - 6} = 1 - \frac{6[8]}{6^3 - 6} = 0.7714 \]

Practice Problems
Example 14
Find the rank correlation coefficient between poverty and overcrowding of
cities from the following data.
Town A B C D E F G H I J
𝑁𝑜. 𝑜𝑓 𝑝𝑜𝑜𝑟 𝑓𝑎𝑚𝑖𝑙𝑖𝑒𝑠 17 13 15 16 6 11 14 9 7 12
𝑂𝑣𝑒𝑟𝑐𝑟𝑜𝑤𝑑𝑖𝑛𝑔 30 46 35 24 12 18 27 22 46 8

𝐴𝑛𝑠 𝑅 = 0.73
Example 15
Find the rank coefficient of correlation from the following data.

𝑥 105 110 112 108 111 116 120 104 115 125
𝑦 39 41 45 38 48 58 60 35 54 69

𝐴𝑛𝑠 𝑅 = 0.9636

Example 16
The coefficient of rank correlation between marks in Physics and Chemistry
obtained by a group of students is 0.8. If the sum of the squares of the differences
in ranks is 33, find the number of students.
Solution
\[ R = 1 - \frac{6\sum d^2}{n^3 - n} \]

\[ 0.8 = 1 - \frac{6(33)}{n^3 - n} \]

\[ \frac{6(33)}{n^3 - n} = 1 - 0.8 \]

\[ \frac{198}{n^3 - n} = 0.2 \]

\[ n^3 - n = 990 \]

\[ n^3 - n - 990 = 0 \]

n = 10, as the other two roots are imaginary.
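
Because n³ − n increases steadily for n ≥ 1, the cubic can also be handled by a direct search over integers. The Python sketch below is not part of the original notes, and `pairs_from_R` and the search limit `max_n` are illustrative assumptions. It picks the integer whose n³ − n is closest to 6∑d²/(1 − R), which tolerates a rounded value of R.

```python
def pairs_from_R(R, sum_d2, max_n=1000):
    """Integer n >= 2 whose n**3 - n is closest to 6*sum_d2 / (1 - R)."""
    if R >= 1:
        raise ValueError("R must be less than 1 for this inversion")
    target = 6 * sum_d2 / (1 - R)
    return min(range(2, max_n + 1), key=lambda n: abs(n ** 3 - n - target))


n = pairs_from_R(0.8, 33)
print(n)
```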

Example 17
The coefficient of rank correlation of the marks obtained by 10 students in
Economics and Accountancy was found to be 0.5. It was later discovered that
the difference in the ranks in the two subjects obtained by one of the students
was wrongly taken as 3 instead of 7. Find the correct coefficient of rank
correlation.
Solution

𝑅 = 0.5 𝑛 = 10
6 ∑ 𝑑2
𝑅 =1− 3
𝑛 −𝑛

6 ∑ 𝑑2
0.5 = 1 − 3
10 − 10
6 ∑ 𝑑2
0.5 = 1 −
990
6 ∑ 𝑑2
= 1 − 0.5
990
6 ∑ 𝑑2
= 0.5
990

6 ∑ 𝑑 2 = 990 𝑋 0.5

6 ∑ 𝑑 2 = 495

17
EM III_SMITA N
495
∑ 𝑑2 =
6

𝐶𝑜𝑟𝑟𝑒𝑐𝑡 ∑ 𝑑 2

= 𝐼𝑛𝑐𝑜𝑟𝑟𝑒𝑐𝑡 ∑ 𝑑 2 − (𝑖𝑛𝑐𝑜𝑟𝑟𝑒𝑐𝑡 𝑟𝑎𝑛𝑘 𝑑𝑖𝑓𝑓𝑒𝑟𝑒𝑛𝑐𝑒)2


+ (𝑐𝑜𝑟𝑟𝑒𝑐𝑡 𝑟𝑎𝑛𝑘 𝑑𝑖𝑓𝑓𝑒𝑟𝑒𝑛𝑐𝑒)2
495 245
= − 32 + 7 2 =
6 2
245
6()
𝑅 =1− 2
990
735
=1−
990
= 1 − 0.7424
= 0.2576

Example 18
If 𝑅𝑥,𝑦 = 0.143 and the sum of the squares of the difference between the
ranks is 48 find 𝑛.
Solution
\[ R = 1 - \frac{6\sum d^2}{n^3 - n} \]
\[ 0.143 = 1 - \frac{6(48)}{n^3 - n} \]
\[ \frac{6(48)}{n^3 - n} = 1 - 0.143 \]
\[ \frac{288}{n^3 - n} = 0.857 \]
\[ n^3 - n = 336 \]
\[ n^3 - n - 336 = 0 \]
𝑛 = 7, as the other two roots are imaginary.
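
Examples 16 and 18 both invert the rank correlation formula to recover n. A minimal sketch of that step follows; the function name `pairs_from_rank_correlation` and the search limit `max_n` are assumptions made for illustration. Instead of solving the cubic, it searches for the integer n whose n³ − n is closest to 6∑d²/(1 − R), which tolerates a rounded R such as 0.143.

```python
def pairs_from_rank_correlation(r, sum_d2, max_n=1000):
    """Return the integer n for which n^3 - n is closest to 6*sum_d2 / (1 - r)."""
    if r >= 1:
        raise ValueError("r must be less than 1 for the formula to be inverted")
    target = 6 * sum_d2 / (1 - r)
    # Brute-force search: n^3 - n grows monotonically for n >= 2.
    return min(range(2, max_n + 1), key=lambda n: abs(n ** 3 - n - target))


print(pairs_from_rank_correlation(0.8, 33))    # data of Example 16
print(pairs_from_rank_correlation(0.143, 48))  # data of Example 18
```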

REGRESSION
Regression is a method of estimating the value of one variable when the value of the
other is known and the two variables are correlated.
If highly correlated variables are plotted on a graph, the points lie in a
narrow strip. If the strip is nearly straight, we may draw a line such that all the points are
close to it on both sides. Such a line can be taken as representative of the ideal
variation. It is called the line of best fit. It is the line for which the sum of the squares of
the deviations of the points from the line is minimum. It is also called the line of regression.
We do not measure the deviation by dropping a perpendicular from a point to the line. We measure
the deviations vertically or horizontally, and get one line when the vertical deviations are
minimised and a second line when the horizontal deviations are minimised. Thus, we get two lines
of regression.
Line of regression of y on x
If we minimise the deviations of the points from the line measured along the y axis, we get a line
which is called the line of regression of y on x. Its equation is written in the form 𝑦 = 𝑎 + 𝑏𝑥.
This line is used for estimating the value of y for a given value of x.

Line of regression of x on y
If we minimise the deviations of the points from the line measured along the x axis, we get a line
which is called the line of regression of x on y. Its equation is written in the form 𝑥 = 𝑎 + 𝑏𝑦.
This line is used for estimating the value of x for a given value of y.
Method of Least Squares
First Method (Normal Equation Method)
The equation of the line of regression of 𝑦 𝑜𝑛 𝑥 is

𝑦 = 𝑎 + 𝑏𝑥
where the values of 𝑎 and 𝑏 are calculated by solving the normal equations

∑ 𝑦 = 𝑛𝑎 + 𝑏 ∑ 𝑥

∑ 𝑥𝑦 = 𝑎 ∑ 𝑥 + 𝑏 ∑ 𝑥²

The equation of the line of regression of 𝑥 𝑜𝑛 𝑦 is

𝑥 = 𝑎 + 𝑏𝑦
where the values of 𝑎 and 𝑏 are calculated by solving the normal equations

∑ 𝑥 = 𝑛𝑎 + 𝑏 ∑ 𝑦

∑ 𝑥𝑦 = 𝑎 ∑ 𝑦 + 𝑏 ∑ 𝑦²
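
The normal equations are two linear equations in a and b, so they can be solved in closed form. The sketch below is one possible way to do this in Python; the function name `fit_line` is hypothetical. It fits v = a + b·u, so calling it as `fit_line(x, y)` gives the line of y on x, and `fit_line(y, x)` gives the line of x on y.

```python
def fit_line(u, v):
    """Solve the normal equations for the least-squares line v = a + b*u.

    sum(v)   = n*a      + b*sum(u)
    sum(u*v) = a*sum(u) + b*sum(u^2)
    """
    n = len(u)
    sum_u, sum_v = sum(u), sum(v)
    sum_uv = sum(p * q for p, q in zip(u, v))
    sum_uu = sum(p * p for p in u)
    det = n * sum_uu - sum_u ** 2  # zero only if all u values are equal
    b = (n * sum_uv - sum_u * sum_v) / det
    a = (sum_v - b * sum_u) / n
    return a, b


# Line of y on x for x = 70..80, y = 163..220 (data of Example 1 below).
a, b = fit_line([70, 72, 74, 76, 78, 80], [163, 170, 179, 188, 196, 220])
print(round(a, 2), round(b, 4), round(a + b * 73, 2))
```

Swapping the arguments is enough to obtain the second regression line because its normal equations have the same form with the roles of x and y exchanged.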

Example 1
Obtain the equation of line of regression of 𝑦 𝑜𝑛 𝑥 from the following data and estimate
𝑦 𝑤ℎ𝑒𝑛 𝑥 = 73
𝑥: 70 72 74 76 78 80
𝑦 163 170 179 188 196 220

Solution

𝑥 𝑦 𝑥𝑦 𝑥2
70 163 11410 4900
72 170 12240 5184
74 179 13246 5476
76 188 14288 5776
78 196 15288 6084
80 220 17600 6400
∑ 𝑥 = 450 ∑ 𝑦 = 1116 ∑ 𝑥 𝑦 = 84072 ∑ 𝑥 2 = 33820

Substituting in the Normal Equation


∑ 𝑦 = 𝑛𝑎 + 𝑏 ∑ 𝑥 1116 = 6𝑎 + 450𝑏 ……………(1) :
∑ 𝑥𝑦 = 𝑎 ∑ 𝑥 + 𝑏 ∑ 𝑥 2 84072 = 450𝑎 + 33820𝑏……………..(2)
Solving (1) & (2) we get 𝑎 = −212.57, 𝑏 = 5.3143

𝑦 = 𝑎 + 𝑏𝑥
𝑦 = −212.57 + 5.3143𝑥
𝑤ℎ𝑒𝑛 𝑥 = 73 𝑡ℎ𝑒𝑛 𝑦 = 175.37

Example 2
Obtain the equation of line of regression of 𝑥 𝑜𝑛 𝑦 from the following data

𝑥: 1 3 4 6 8 9 11 14
𝑦 1 2 4 4 5 7 8 9

Solution

𝑥 𝑦 𝑥𝑦 𝑦2
1 1 1 1
3 2 6 4
4 4 16 16
6 4 24 16
8 5 40 25
9 7 63 49
11 8 88 64
14 9 126 81
∑ 𝑥 = 56 ∑ 𝑦 = 40 ∑ 𝑥 𝑦 = 364 ∑ 𝑦 2 = 256

Substituting in the Normal Equation

∑ 𝑥 = 𝑛𝑎 + 𝑏 ∑ 𝑦 56 = 8𝑎 + 40𝑏 ……………(1) :
∑ 𝑥𝑦 = 𝑎 ∑ 𝑦 + 𝑏 ∑ 𝑦 2 364 = 40𝑎 + 256𝑏……………..(2)

Solving (1) & (2) we get 𝑎 = −0.5 𝑏 = 1.5

𝑥 = 𝑎 + 𝑏𝑦

𝑥 = −0.5 + 1.5𝑦

Example 3
Find the lines of regression and hence find the coefficient of Correlation.

𝑥: 65 66 67 67 68 69 70 72
𝑦 67 68 65 66 72 72 69 71

Solution

𝑥 𝑦 𝑥𝑦 𝑥2 𝑦2
65 67 4355 4225 4489
66 68 4488 4356 4624
67 65 4355 4489 4225
67 66 4422 4489 4356
68 72 4896 4624 5184
69 72 4968 4761 5184
70 69 4830 4900 4761
72 71 5112 5184 5041
∑ 𝑥 = 544 ∑ 𝑦 = 550 ∑ 𝑥 𝑦 = 37426 ∑ 𝑥 2 = 37028 ∑ 𝑦 2 = 37864

Line of regression of 𝑦 𝑜𝑛 𝑥
Substituting in the normal equations
∑ 𝑦 = 𝑛𝑎 + 𝑏 ∑ 𝑥: 550 = 8𝑎 + 544𝑏 ……(1)
∑ 𝑥𝑦 = 𝑎 ∑ 𝑥 + 𝑏 ∑ 𝑥²: 37426 = 544𝑎 + 37028𝑏 ……(2)
Solving (1) & (2) we get 𝑎 = 19.6388, 𝑏 = 0.7222
𝑦 = 𝑎 + 𝑏𝑥
𝑦 = 19.6388 + 0.7222𝑥
Line of regression of 𝑥 𝑜𝑛 𝑦
∑ 𝑥 = 𝑛𝑎 + 𝑏 ∑ 𝑦: 544 = 8𝑎 + 550𝑏 ……(1)
∑ 𝑥𝑦 = 𝑎 ∑ 𝑦 + 𝑏 ∑ 𝑦²: 37426 = 550𝑎 + 37864𝑏 ……(2)
Solving (1) & (2) we get 𝑎 = 33.2912, 𝑏 = 0.5048
𝑥 = 𝑎 + 𝑏𝑦
𝑥 = 33.2912 + 0.5048𝑦
Coefficient of correlation
𝑟² = (0.7222)(0.5048) = 0.3645

𝑟 = 0.6037 (positive, since both regression coefficients are positive)
Example 4

Find the lines of regressions

𝒙 5 6 7 8 9 10 11
𝑦 11 14 14 15 12 17 16

Solution

𝑥 𝑦 𝑥𝑦 𝑥2 𝑦2
5 11 55 25 121
6 14 84 36 196
7 14 98 49 196
8 15 120 64 225
9 12 108 81 144
10 17 170 100 289
11 16 176 121 256
∑ 𝑥 = 56 ∑ 𝑦 = 99 ∑ 𝑥 𝑦 = 811 ∑ 𝑥 2 = 476 ∑ 𝑦 2 = 1427

Line of regression of 𝑦 𝑜𝑛 𝑥
Substituting in the Normal Equation
∑ 𝑦 = 𝑛𝑎 + 𝑏 ∑ 𝑥 99 = 7𝑎 + 56𝑏 ……………(1) :
∑ 𝑥𝑦 = 𝑎 ∑ 𝑥 + 𝑏 ∑ 𝑥 2 811 = 56𝑎 + 476𝑏……………..(2)
Solving (1) & (2) we get 𝑎 = 8.7142 𝑏 = 0.6785
𝑦 = 𝑎 + 𝑏𝑥
𝑦 = 8.7142 + 0.6785𝑥
Line of regression of 𝑥 𝑜𝑛 𝑦
∑ 𝑥 = 𝑛𝑎 + 𝑏 ∑ 𝑦 56 = 7𝑎 + 99𝑏 ……………(1) :
∑ 𝑥𝑦 = 𝑎 ∑ 𝑦 + 𝑏 ∑ 𝑦 2 811 = 99𝑎 + 1427𝑏……………..(2)
Solving (1) & (2) we get 𝑎 = −2.0053 𝑏 = 0.7074
𝑥 = 𝑎 + 𝑏𝑦
𝑥 = −2.0053 + 0.7074𝑦

SECOND METHOD
METHOD OF LEAST SQUARES

Line of regression of 𝑦 𝑜𝑛 𝑥

\[ y - \bar{y} = b_{yx}(x - \bar{x}), \qquad \text{where } b_{yx} = r\,\frac{\sigma_y}{\sigma_x} \]

Line of regression of 𝑥 𝑜𝑛 𝑦

\[ x - \bar{x} = b_{xy}(y - \bar{y}), \qquad \text{where } b_{xy} = r\,\frac{\sigma_x}{\sigma_y} \]

In both equations

\[ \bar{x} = \frac{\sum x}{n}, \qquad \bar{y} = \frac{\sum y}{n} \]

𝑏𝑦𝑥, 𝑏𝑥𝑦 = regression coefficients
𝜎𝑥 = standard deviation of 𝑥
𝜎𝑦 = standard deviation of 𝑦
𝑟 = coefficient of correlation
𝑥̅ = mean of 𝑥
𝑦̅ = mean of 𝑦

The regression coefficients can be calculated from the sums, where 𝑁 = 𝑛 is the number of pairs of observations:

\[ b_{xy} = \frac{\sum xy - \frac{\sum x \sum y}{N}}{\sum y^2 - \frac{(\sum y)^2}{N}} \]

\[ b_{yx} = \frac{\sum xy - \frac{\sum x \sum y}{N}}{\sum x^2 - \frac{(\sum x)^2}{N}} \]

We know that

\[ r = \frac{\sum xy - \frac{\sum x \sum y}{N}}{\sqrt{\sum x^2 - \frac{(\sum x)^2}{N}}\;\sqrt{\sum y^2 - \frac{(\sum y)^2}{N}}} \]

\[ b_{yx}\, b_{xy} = r\,\frac{\sigma_y}{\sigma_x} \cdot r\,\frac{\sigma_x}{\sigma_y} = r^2 \]
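
The formulas for the regression coefficients in terms of the sums can be sketched as follows. The function name `regression_coefficients` and its argument names are hypothetical; the sign of r is taken from the common numerator ∑xy − ∑x∑y/N, which is an assumption consistent with r having the same sign as both regression coefficients.

```python
import math


def regression_coefficients(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Return (b_yx, b_xy, r) computed from the summary sums of n pairs."""
    numerator = sum_xy - sum_x * sum_y / n
    b_yx = numerator / (sum_x2 - sum_x ** 2 / n)
    b_xy = numerator / (sum_y2 - sum_y ** 2 / n)
    # r^2 = b_yx * b_xy; r carries the sign of the shared numerator.
    r = math.copysign(math.sqrt(b_yx * b_xy), numerator)
    return b_yx, b_xy, r


# Summary data for 100 students (second-method Example 3 below).
b_yx, b_xy, r = regression_coefficients(100, 15000, 6800, 1022250, 2272500, 463025)
print(round(b_yx, 4), round(b_xy, 4), round(r, 4))
```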

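Properties of regression coefficients

1) The coefficient of correlation is the geometric mean of the coefficients of
regression.

\[ b_{yx}\, b_{xy} = r\,\frac{\sigma_y}{\sigma_x} \cdot r\,\frac{\sigma_x}{\sigma_y} = r^2, \qquad r = \pm\sqrt{b_{yx}\, b_{xy}} \]

𝑟 is positive if 𝑏𝑦𝑥 and 𝑏𝑥𝑦 are positive.
𝑟 is negative if 𝑏𝑦𝑥 and 𝑏𝑥𝑦 are negative.
The two regression coefficients always have the same sign: if 𝑏𝑦𝑥 is negative then 𝑏𝑥𝑦 is also negative.

2) If one coefficient of regression is greater than one then the other must be less than
one.
Since −1 ≤ 𝑟 ≤ 1,

\[ r^2 \le 1 \]
\[ b_{yx}\, b_{xy} \le 1 \]
\[ b_{yx} \le \frac{1}{b_{xy}} \quad \text{(for positive coefficients)} \]

so if 𝑏𝑥𝑦 > 1 then 𝑏𝑦𝑥 < 1.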

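3) The arithmetic mean of the coefficients of regression is greater than or equal to the
coefficient of correlation (shown here for 𝑟 > 0).

To prove

\[ \frac{b_{yx} + b_{xy}}{2} \ge r \]

i.e. to prove

\[ \frac{1}{2}\left(r\,\frac{\sigma_y}{\sigma_x} + r\,\frac{\sigma_x}{\sigma_y}\right) \ge r \]

Dividing by 𝑟 > 0, T.P.

\[ \frac{\sigma_y}{\sigma_x} + \frac{\sigma_x}{\sigma_y} \ge 2 \]

T.P. \( \sigma_y^2 + \sigma_x^2 \ge 2\sigma_x\sigma_y \)

T.P. \( \sigma_y^2 + \sigma_x^2 - 2\sigma_x\sigma_y \ge 0 \)

T.P. \( (\sigma_x - \sigma_y)^2 \ge 0 \), which is obviously true.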

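Angle between the lines of regression

The acute angle 𝜃 between the two lines of regression is given by

\[ \tan\theta = \frac{1 - r^2}{|r|} \cdot \frac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2} \]

(i) If 𝑟 = 0, tan 𝜃 = ∞ and

\[ \theta = \frac{\pi}{2} \]

The lines of regression are perpendicular to each other.
(ii) If 𝑟 = ±1, tan 𝜃 = 0 and 𝜃 = 0.
The lines of regression are coincident.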
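
As a quick numerical check of the two special cases, the angle can be computed with a short Python sketch. It assumes the standard form tan θ = ((1 − r²)/|r|) · σxσy/(σx² + σy²) for the acute angle; the function name `angle_between_regression_lines` is hypothetical.

```python
import math


def angle_between_regression_lines(r, sd_x, sd_y):
    """Acute angle (in radians) between the two lines of regression."""
    if r == 0:
        # tan(theta) is infinite: the lines are perpendicular.
        return math.pi / 2
    tan_theta = (1 - r ** 2) / abs(r) * (sd_x * sd_y) / (sd_x ** 2 + sd_y ** 2)
    return math.atan(tan_theta)


print(angle_between_regression_lines(0, 3, 4))    # perpendicular lines
print(angle_between_regression_lines(1, 3, 4))    # coincident lines
print(math.degrees(angle_between_regression_lines(0.6, 3, 4)))
```

The same angle is obtained from the slopes of the two lines in the xy-plane, r·σy/σx for y on x and σy/(r·σx) for x on y, which gives a way to cross-check the formula.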

Example 1
The following table gives the age of car of a certain make and annual maintenance
cost. Obtain the equation of the line of regression of cost on age.

𝐴𝑔𝑒 𝑜𝑓 𝑎 𝑐𝑎𝑟 2 4 6 8
𝑀𝑎𝑖𝑛𝑡𝑒𝑛𝑎𝑛𝑐𝑒 𝑐𝑜𝑠𝑡 1 2 2.5 3

Solution
Line of regression of 𝑦 𝑜𝑛 𝑥

𝑦 − 𝑦̅ = 𝑏𝑦𝑥 (𝑥 − 𝑥̅ )

𝑥 𝑦 𝑥𝑦 𝑥2
2 1 2 4
4 2 8 16
6 2.5 15 36
8 3 24 64
∑ 𝑥 = 20 ∑ 𝑦 = 8.5 ∑ 𝑥 𝑦 = 49 ∑ 𝑥 2 = 120

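\[ \bar{x} = \frac{\sum x}{n} = \frac{20}{4} = 5, \qquad \bar{y} = \frac{\sum y}{n} = \frac{8.5}{4} = 2.125 \]

\[ b_{yx} = \frac{\sum xy - \frac{\sum x \sum y}{N}}{\sum x^2 - \frac{(\sum x)^2}{N}} = \frac{49 - \frac{(20)(8.5)}{4}}{120 - \frac{20^2}{4}} = 0.325 \]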

Substituting in 𝑦 − 𝑦̅ = 𝑏𝑦𝑥 (𝑥 − 𝑥̅ )

𝑦 − 2.125 = 0.325(𝑥 − 5)

𝑦 − 2.125 = 0.325𝑥 − 1.625

𝑦 = 0.325𝑥 − 1.625 + 2.125

𝑦 = 0.325𝑥 + 0.5

Example 2

Find the lines of regression


𝑥 10 12 13 16 17 20 25
𝑦 19 22 24 27 29 30 37

Solution
Line of regression of 𝑦 𝑜𝑛 𝑥

𝑦 − 𝑦̅ = 𝑏𝑦𝑥 (𝑥 − 𝑥̅ )

𝑥 𝑦 𝑥𝑦 𝑥2 𝑦2
10 19 190 100 361
12 22 264 144 484
13 24 312 169 576
16 27 432 256 729
17 29 493 289 841
20 30 600 400 900
25 37 925 625 1369
∑ 𝒙 =113 ∑ 𝒚 = 𝟏𝟖𝟖 ∑ 𝒙 𝒚 =3216 ∑ 𝒙𝟐 = 𝟏𝟗𝟖𝟑 ∑ 𝒚𝟐 = 𝟓𝟐𝟔𝟎

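\[ \bar{x} = \frac{\sum x}{n} = \frac{113}{7} = 16.1428, \qquad \bar{y} = \frac{\sum y}{n} = \frac{188}{7} = 26.8571 \]

\[ b_{yx} = \frac{\sum xy - \frac{\sum x \sum y}{N}}{\sum x^2 - \frac{(\sum x)^2}{N}} = \frac{3216 - \frac{(113)(188)}{7}}{1983 - \frac{113^2}{7}} = 1.1402 \]

\[ b_{xy} = \frac{\sum xy - \frac{\sum x \sum y}{N}}{\sum y^2 - \frac{(\sum y)^2}{N}} = \frac{3216 - \frac{(113)(188)}{7}}{5260 - \frac{188^2}{7}} = 0.8590 \]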

Substituting in 𝑦 − 𝑦̅ = 𝑏𝑦𝑥 (𝑥 − 𝑥̅ )

𝑦 − 26.8571 = 1.1402 (𝑥 − 16.1428)

𝑦 = 1.1402 𝑥 − 18.4060+26.8571

𝑦 = 1.1402 𝑥 + 8.4511

Line of regression of 𝑥 𝑜𝑛 𝑦
𝑥 − 𝑥̅ = 𝑏𝑥𝑦 (𝑦 − 𝑦̅ )

𝑥 − 16.1428 = 0.8590 (𝑦 − 26.8571)

𝑥 − 16.1428 = 0.8590 𝑦 − 23.0702

𝑥 = 0.8590 𝑦 − 23.0702 + 16.1428

𝑥 = 0.8590 𝑦 −6.9274

Example 3
The following data regarding the heights (y) and the weights (x) of 100 college
students are given below
∑ 𝑥 = 15000 , ∑ 𝑥 2 = 2272500 ∑ 𝑦 = 6800

∑ 𝑦 2 = 463025 ∑ 𝑥𝑦 = 1022250
Find the correlation coefficient between height and weight and state the equations
of regression of height on weight.

Solution
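\[ \bar{x} = \frac{\sum x}{n} = \frac{15000}{100} = 150, \qquad \bar{y} = \frac{\sum y}{n} = \frac{6800}{100} = 68 \]

\[ b_{yx} = \frac{\sum xy - \frac{\sum x \sum y}{N}}{\sum x^2 - \frac{(\sum x)^2}{N}} = \frac{1022250 - \frac{(15000)(6800)}{100}}{2272500 - \frac{15000^2}{100}} = 0.1 \]

\[ b_{xy} = \frac{\sum xy - \frac{\sum x \sum y}{N}}{\sum y^2 - \frac{(\sum y)^2}{N}} = \frac{1022250 - \frac{(15000)(6800)}{100}}{463025 - \frac{6800^2}{100}} = 3.6 \]

\[ r^2 = b_{yx}\, b_{xy} = (0.1)(3.6) = 0.36 \]

\[ r = \sqrt{0.36} = 0.6 \]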
Line of regression of 𝑦 𝑜𝑛 𝑥

𝑦 − 68 = 0.1(𝑥 − 150)
𝑦 − 68 = 0.1𝑥 − 15
𝑦 = 0.1𝑥 − 15 + 68
𝑦 = 0.1𝑥 + 53

Example 4

Find the regression coefficients and the coefficient of Correlation from the following
data
N=12, ∑ 𝑥 = 120 , ∑ 𝑥 2 = 1392 ∑ 𝑦 = 432
∑ 𝑦 2 = 18252 ∑ 𝑥𝑦 = 4992

Solution
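\[ \bar{x} = \frac{\sum x}{n} = \frac{120}{12} = 10, \qquad \bar{y} = \frac{\sum y}{n} = \frac{432}{12} = 36 \]

\[ b_{yx} = \frac{\sum xy - \frac{\sum x \sum y}{N}}{\sum x^2 - \frac{(\sum x)^2}{N}} = \frac{4992 - \frac{(120)(432)}{12}}{1392 - \frac{120^2}{12}} = 3.5 \]

\[ b_{xy} = \frac{\sum xy - \frac{\sum x \sum y}{N}}{\sum y^2 - \frac{(\sum y)^2}{N}} = \frac{4992 - \frac{(120)(432)}{12}}{18252 - \frac{432^2}{12}} = 0.2488 \]

\[ r^2 = b_{yx}\, b_{xy} = (3.5)(0.2488) = 0.8708 \]

\[ r = \sqrt{0.8708} = 0.9331 \]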

Example 5
Given Variance of 𝑥 = 25. 𝑇ℎ𝑒 Equations of two lines of regression are 5𝑥 − 𝑦 =
22, 64𝑥 − 45𝑦 = 24.
Find (i) 𝑥̅ 𝑎𝑛𝑑 𝑦̅

(ii) 𝜎𝑦

(iii) 𝑟

Solution

𝑉𝑎𝑟𝑖𝑎𝑛𝑐𝑒 𝑜𝑓 𝑥 = 𝜎𝑥 2 = 25
𝜎𝑥 = 5
Solving the given equations, we get the mean as

𝑥̅ = 6 𝑦̅ = 8
If we consider the given equation
5𝑥 − 𝑦 = 22 as the regression equation of 𝑥 𝑜𝑛 𝑦
5𝑥 = 𝑦 + 22

\[ x = \frac{1}{5}y + \frac{22}{5} \]

Then 𝑏𝑥𝑦 = 0.2

If we consider the given equation 64𝑥 − 45𝑦 = 24 as the regression equation of 𝑦 𝑜𝑛 𝑥
45𝑦 = 64𝑥 − 24

\[ y = \frac{64}{45}x - \frac{24}{45} \]

Then 𝑏𝑦𝑥 = 1.4222

\[ r^2 = b_{yx}\, b_{xy} = (1.4222)(0.2) = 0.2844 \]

\[ r = \sqrt{0.2844} = 0.5332 \]

\[ b_{xy} = r\,\frac{\sigma_x}{\sigma_y} \]

𝑉𝑎𝑟𝑖𝑎𝑛𝑐𝑒 𝑜𝑓 𝑥 = 𝜎𝑥² = 25, so 𝜎𝑥 = 5

\[ 0.2 = (0.5332)\,\frac{5}{\sigma_y} = \frac{2.666}{\sigma_y} \]

\[ \sigma_y = \frac{2.666}{0.2} = 13.33 \]

Example 6

From 8 observations the following results were obtained

∑ 𝑥 = 59 , ∑ 𝑥 2 = 524 ∑ 𝑦 = 40

∑ 𝑦 2 = 256 ∑ 𝑥𝑦 = 364
Find the equations of the line of regression of 𝑥 𝑜𝑛 𝑦 and the Coefficient of
Correlation.

Solution
Line of regression of 𝑥 𝑜𝑛 𝑦

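\[ x - \bar{x} = b_{xy}(y - \bar{y}) \]

\[ \bar{x} = \frac{\sum x}{N} = \frac{59}{8} = 7.375, \qquad \bar{y} = \frac{\sum y}{N} = \frac{40}{8} = 5 \]

\[ b_{xy} = \frac{\sum xy - \frac{\sum x \sum y}{N}}{\sum y^2 - \frac{(\sum y)^2}{N}} = \frac{364 - \frac{(59)(40)}{8}}{256 - \frac{40^2}{8}} = 1.2321 \]

Substituting in the line of regression of 𝑥 𝑜𝑛 𝑦

\[ x - 7.375 = 1.2321(y - 5) \]
\[ x = 1.2321y - 6.1605 + 7.375 \]
\[ x = 1.2321y + 1.2145 \]

Coefficient of correlation

\[ r = \frac{\sum xy - \frac{\sum x \sum y}{N}}{\sqrt{\sum x^2 - \frac{(\sum x)^2}{N}}\;\sqrt{\sum y^2 - \frac{(\sum y)^2}{N}}} = \frac{364 - \frac{(59)(40)}{8}}{\sqrt{524 - \frac{59^2}{8}}\;\sqrt{256 - \frac{40^2}{8}}} = 0.9780 \]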

Example 7
The equations of two lines of regression are
𝑥 = 19.13 − 0.87𝑦
𝑦 = 11.64 − 0.50𝑥
Find (i) the mean of 𝑥 𝑎𝑛𝑑 𝑦 and (ii) coefficient of Correlation between 𝑥 𝑎𝑛𝑑 𝑦

Solution

Solving the given equations, we get the mean as

𝑥̅ = 15.93 𝑦̅ = 3.672
If we consider the given equation
𝑥 = 19.13 − 0.87𝑦 as the regression equation of 𝑥 𝑜𝑛 𝑦
Then 𝑏𝑥𝑦 = −0.87

If we consider the given equation 𝑦 = 11.64 − 0.50𝑥 as the regression equation of 𝑦 𝑜𝑛 𝑥
Then 𝑏𝑦𝑥 = −0.50

𝑟² = 𝑏𝑦𝑥 𝑏𝑥𝑦

= (−0.50)(−0.87)

= 0.435

Since both regression coefficients are negative, 𝑟 is negative:

𝑟 = −√0.435

= −0.6595
Example 8
In a partially destroyed laboratory record of analysis of correlation data the following results
are legible.
Variance of 𝑥 = 9. Equations of lines of regression 4𝑥 − 5𝑦 + 33 = 0, 20𝑥 − 9𝑦 − 107 =
0.
Find (i) mean value of 𝑥 𝑎𝑛𝑑 𝑦
(ii) Standard deviation of 𝑦
(iii) Coefficient of correlation between 𝑥 𝑎𝑛𝑑 𝑦

Solution

Solving 4𝑥 − 5𝑦 + 33 = 0, 20𝑥 − 9𝑦 − 107 = 0. We get the means


𝑥̅ = 13 𝑦̅ = 17

If we take 4𝑥 − 5𝑦 + 33 = 0 as the line of regression of 𝑥 𝑜𝑛 𝑦, then
4𝑥 = 5𝑦 − 33

\[ x = \frac{5}{4}y - \frac{33}{4}, \qquad b_{xy} = \frac{5}{4} \]

If we take 20𝑥 − 9𝑦 − 107 = 0 as the line of regression of 𝑦 𝑜𝑛 𝑥, then
9𝑦 = 20𝑥 − 107

\[ y = \frac{20}{9}x - \frac{107}{9}, \qquad b_{yx} = \frac{20}{9} \]

Here both 𝑏𝑥𝑦 and 𝑏𝑦𝑥 are greater than 1, so 𝑟² = 𝑏𝑦𝑥 𝑏𝑥𝑦 = 25/9 > 1. This is not possible because
the coefficient of correlation cannot be greater than 1. Hence our assumption is wrong.

So we have to take 4𝑥 − 5𝑦 + 33 = 0 as the line of regression of 𝑦 𝑜𝑛 𝑥. Then
5𝑦 = 4𝑥 + 33

\[ y = \frac{4}{5}x + \frac{33}{5}, \qquad b_{yx} = \frac{4}{5} \]

and 20𝑥 − 9𝑦 − 107 = 0 as the line of regression of 𝑥 𝑜𝑛 𝑦. Then
20𝑥 = 9𝑦 + 107

\[ x = \frac{9}{20}y + \frac{107}{20}, \qquad b_{xy} = \frac{9}{20} \]

\[ r^2 = b_{yx}\, b_{xy} = \left(\frac{4}{5}\right)\left(\frac{9}{20}\right) = 0.36 \]

\[ r = \sqrt{0.36} = 0.6 \]

\[ b_{xy} = r\,\frac{\sigma_x}{\sigma_y} \]

𝑉𝑎𝑟𝑖𝑎𝑛𝑐𝑒 𝑜𝑓 𝑥 = 𝜎𝑥² = 9, so 𝜎𝑥 = 3

\[ \frac{9}{20} = (0.6)\,\frac{3}{\sigma_y} \]

\[ 0.45 = \frac{1.8}{\sigma_y} \]

\[ \sigma_y = \frac{1.8}{0.45} = 4 \]
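
The trial-and-error step used in this kind of problem (assume one line is y on x, check whether r² exceeds 1, and swap if it does) can be sketched in Python. This is an illustrative sketch only: the function name `analyse_regression_lines` is hypothetical, each line is passed as a tuple (A, B, C) meaning A·x + B·y = C, and all four coefficients A and B are assumed to be non-zero.

```python
import math


def analyse_regression_lines(line1, line2):
    """Return (x_mean, y_mean, b_yx, b_xy, r) for two regression lines A*x + B*y = C."""
    (a1, b1, c1), (a2, b2, c2) = line1, line2
    # The lines intersect at the point of means (Cramer's rule).
    det = a1 * b2 - a2 * b1
    x_mean = (c1 * b2 - c2 * b1) / det
    y_mean = (a1 * c2 - a2 * c1) / det
    # Try both assignments; only one gives 0 <= r^2 <= 1.
    for y_on_x, x_on_y in ((line1, line2), (line2, line1)):
        b_yx = -y_on_x[0] / y_on_x[1]  # coefficient of x after solving for y
        b_xy = -x_on_y[1] / x_on_y[0]  # coefficient of y after solving for x
        if 0 <= b_yx * b_xy <= 1:
            r = math.copysign(math.sqrt(b_yx * b_xy), b_yx)
            return x_mean, y_mean, b_yx, b_xy, r
    raise ValueError("the two lines cannot both be regression lines")


# 4x - 5y + 33 = 0 and 20x - 9y - 107 = 0, written as A*x + B*y = C.
print(analyse_regression_lines((4, -5, -33), (20, -9, 107)))
```

The products of the regression coefficients under the two assignments are reciprocals of each other, so at most one assignment can satisfy r² ≤ 1 unless the lines coincide.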

Example 9

The regression lines of a sample are


𝑥 + 6𝑦 = 6
3𝑥 + 2𝑦 = 10
Find (i) the mean of 𝑥 𝑎𝑛𝑑 𝑦 and (ii) coefficient of Correlation between 𝑥 𝑎𝑛𝑑 𝑦.Also
estimate 𝑦 when 𝑥 = 12.

Solution

Solving the given equations, we get the mean as

𝑥̅ = 3 𝑦̅ = 0.5
If we consider the given equation
𝑥 + 6𝑦 = 6
𝑥 = −6𝑦 + 6
as the regression equation of 𝑥 𝑜𝑛 𝑦
Then 𝑏𝑥𝑦 = −6

If we consider the given equation 3𝑥 + 2𝑦 = 10 as the regression equation of 𝑦 𝑜𝑛 𝑥

2𝑦 = −3𝑥 + 10

\[ y = -\frac{3}{2}x + 5 \]

Then 𝑏𝑦𝑥 = −3/2

\[ r^2 = b_{yx}\, b_{xy} = \left(-\frac{3}{2}\right)(-6) = 9 \]

This gives 𝑟 = −3, which is not possible. Hence our assumption is wrong.

Hence 𝑥 + 6𝑦 = 6 is the regression line of 𝑦 𝑜𝑛 𝑥

6𝑦 = −𝑥 + 6

\[ y = -\frac{1}{6}x + 1, \qquad b_{yx} = -\frac{1}{6} \]

and 3𝑥 + 2𝑦 = 10 is the regression equation of 𝑥 𝑜𝑛 𝑦
3𝑥 = −2𝑦 + 10

\[ x = -\frac{2}{3}y + \frac{10}{3}, \qquad b_{xy} = -\frac{2}{3} \]

\[ r^2 = b_{yx}\, b_{xy} = \left(-\frac{1}{6}\right)\left(-\frac{2}{3}\right) = 0.1111 \]

𝑟 = −0.3333 (negative, since both regression coefficients are negative)
To estimate 𝑦 when 𝑥 = 12.

\[ y = -\frac{1}{6}(12) + 1 = -2 + 1 = -1 \]

Example 10
Given the following information about marks of 60 students

Mathematics English
Mean 80 50
S.D 15 10

Coefficient of correlation 𝑟 = 0.4. Estimate the marks in Mathematics of a student who scored 60 marks in English.
Solution
Let 𝑥 𝑏𝑒 𝑡ℎ𝑒 𝑚𝑎𝑟𝑘𝑠 𝑖𝑛 𝑀𝑎𝑡ℎ𝑒𝑚𝑎𝑡𝑖𝑐𝑠 𝑎𝑛𝑑 𝑦 𝑏𝑒 𝑡ℎ𝑒 𝑚𝑎𝑟𝑘𝑠 𝑖𝑛 𝐸𝑛𝑔𝑙𝑖𝑠ℎ
𝑀𝑒𝑎𝑛 𝑜𝑓 𝑚𝑎𝑟𝑘𝑠 𝑜𝑓 𝑀𝑎𝑡ℎ𝑒𝑚𝑎𝑡𝑖𝑐𝑠 𝑥̅ = 80
𝑀𝑒𝑎𝑛 𝑜𝑓 𝑚𝑎𝑟𝑘𝑠 𝑜𝑓 𝐸𝑛𝑔𝑙𝑖𝑠ℎ 𝑦̅ = 50
𝑆𝑡𝑎𝑛𝑑𝑎𝑟𝑑 𝑑𝑒𝑣𝑖𝑎𝑡𝑖𝑜𝑛 𝑜𝑓 𝑚𝑎𝑟𝑘𝑠 𝑜𝑓 𝑀𝑎𝑡ℎ𝑒𝑚𝑎𝑡𝑖𝑐𝑠 𝜎𝑥 = 15
𝑆𝑡𝑎𝑛𝑑𝑎𝑟𝑑 𝑑𝑒𝑣𝑖𝑎𝑡𝑖𝑜𝑛 𝑜𝑓 𝑚𝑎𝑟𝑘𝑠 𝑜𝑓 𝐸𝑛𝑔𝑙𝑖𝑠ℎ 𝜎𝑦 = 10
Coefficient of correlation 𝑟 = 0.4
Line of regression of 𝑥 𝑜𝑛 𝑦

\[ x - \bar{x} = b_{xy}(y - \bar{y}) \]

\[ b_{xy} = r\,\frac{\sigma_x}{\sigma_y} = (0.4)\,\frac{15}{10} = \frac{6}{10} = 0.6 \]

Hence 𝑥 − 𝑥̅ = 𝑏𝑥𝑦 (𝑦 − 𝑦̅ )

𝑥 − 80 = 0.6(𝑦 − 50)
𝑥 − 80 = 0.6𝑦 − 30
𝑥 = 0.6𝑦 − 30 + 80
𝑥 = 0.6𝑦 + 50

𝑤ℎ𝑒𝑛 𝑦 = 60
𝑥 = 0.6(60) + 50
𝑥 = 36 + 50
𝑥 = 86

Example 11
Given

𝑥 𝑠𝑒𝑟𝑖𝑒𝑠 𝑦 𝑠𝑒𝑟𝑖𝑒𝑠
Mean 18 100
S.D 14 20
𝑟 = 0.6
Find the most probable value of 𝑦 when x = 70 and most probable value of x when y =
90
Solution

𝑥̅ = 18 , 𝑦̅ = 100 𝜎𝑥 = 14 𝜎𝑦 = 20 𝑟 = 0.6

Line of regression of 𝑦 𝑜𝑛 𝑥
𝑦 − 𝑦̅ = 𝑏𝑦𝑥 (𝑥 − 𝑥̅ )
\[ b_{yx} = r\,\frac{\sigma_y}{\sigma_x} = (0.6)\,\frac{20}{14} = 0.8571 \]

Hence
𝑦 − 𝑦̅ = 𝑏𝑦𝑥 (𝑥 − 𝑥̅ )

𝑦 − 100 = 0.8571(𝑥 − 18)


𝑦 = 0.8571𝑥 − 15.4278 + 100
𝑦 = 0.8571𝑥 + 84.5722

𝑤ℎ𝑒𝑛 𝑥 = 70

𝑦 = 0.8571(70) + 84.5722
𝑦 = 144.5692

Line of regression of 𝑥 𝑜𝑛 𝑦
𝑥 − 𝑥̅ = 𝑏𝑥𝑦 (𝑦 − 𝑦̅ )
\[ b_{xy} = r\,\frac{\sigma_x}{\sigma_y} = (0.6)\,\frac{14}{20} = 0.42 \]

Hence 𝑥 − 𝑥̅ = 𝑏𝑥𝑦 (𝑦 − 𝑦̅ )

𝑥 − 18 = 0.42(𝑦 − 100)
𝑥 − 18 = 0.42𝑦 − 42
𝑥 = 0.42𝑦 − 42 + 18
𝑥 = 0.42𝑦 − 24
𝑤ℎ𝑒𝑛 𝑦 = 90
𝑥 = 0.42(90) − 24
𝑥 = 13.8
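
When only the means, standard deviations and r are given, both estimates follow directly from the two regression lines. A minimal Python sketch is given here; the function names `predict_y` and `predict_x` are hypothetical and simply evaluate y − ȳ = r(σy/σx)(x − x̄) and x − x̄ = r(σx/σy)(y − ȳ).

```python
def predict_y(x, x_mean, y_mean, sd_x, sd_y, r):
    """Estimate y from x using the line of regression of y on x."""
    b_yx = r * sd_y / sd_x
    return y_mean + b_yx * (x - x_mean)


def predict_x(y, x_mean, y_mean, sd_x, sd_y, r):
    """Estimate x from y using the line of regression of x on y."""
    b_xy = r * sd_x / sd_y
    return x_mean + b_xy * (y - y_mean)


# Means 18 and 100, standard deviations 14 and 20, r = 0.6.
print(round(predict_y(70, 18, 100, 14, 20, 0.6), 2))
print(round(predict_x(90, 18, 100, 14, 20, 0.6), 2))
```

Because the sketch does not round the regression coefficient to four decimals, the estimate of y for x = 70 can differ from the hand calculation in the second decimal place.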

Example 12
It is given that the means of 𝑥 𝑎𝑛𝑑 𝑦 are 5 and 10. If the line of regression of 𝑦 𝑜𝑛 𝑥 is
parallel to the line 20𝑦 = 9𝑥 + 40, estimate the value of 𝑦 𝑓𝑜𝑟 𝑥 = 30.
Solution

Line of regression of 𝑦 𝑜𝑛 𝑥

𝑦 − 𝑦̅ = 𝑏𝑦𝑥 (𝑥 − 𝑥̅ )

The slope of this line is 𝑏𝑦𝑥, and the line is parallel to 20𝑦 = 9𝑥 + 40, i.e.

\[ y = \frac{9}{20}x + \frac{40}{20} \]

so

\[ b_{yx} = \frac{9}{20} \]

𝑥̅ = 5 𝑎𝑛𝑑 𝑦̅ = 10

Line of regression of 𝑦 𝑜𝑛 𝑥

\[ y - 10 = \frac{9}{20}(x - 5) \]

𝑦 − 10 = 0.45(𝑥 − 5)

𝑦 − 10 = 0.45𝑥 − 2.25
𝑦 = 0.45𝑥 − 2.25 + 10
𝑦 = 0.45𝑥 + 7.75

𝑤ℎ𝑒𝑛 𝑥 = 30

𝑦 = 0.45(30) + 7.75

𝑦 = 13.5 + 7.75
𝑦 = 21.25

Example 13
A panel of two judges A and B graded dramatic performances by independently awarding
marks as follows:
Performance No.: 1 2 3 4 5 6 7

Marks by A: 36 32 34 31 32 32 34
Marks by B: 35 33 31 30 34 32 36
The eighth performance however, which judge B could not attend got 38 marks by judge A.
If judge B had also been present how many marks would he be expected to have awarded to
the eighth performance?
Solution

𝑥 𝑦 𝑥𝑦 𝑥2
36 35 1260 1296
32 33 1056 1024
34 31 1054 1156
31 30 930 961
32 34 1088 1024
32 32 1024 1024
34 36 1224 1156
∑ 𝑥 = 231 ∑ 𝑦 = 231 ∑ 𝑥 𝑦 = 7636 ∑ 𝑥 2 = 7641

Substituting in the Normal Equation


∑ 𝑦 = 𝑛𝑎 + 𝑏 ∑ 𝑥 231 = 7𝑎 + 231𝑏 ……………(1) :
∑ 𝑥𝑦 = 𝑎 ∑ 𝑥 + 𝑏 ∑ 𝑥 2 7636 = 231𝑎 + 7641𝑏……………..(2)
Solving (1) & (2) we get 𝑎 = 9.1667, 𝑏 = 0.7222
𝑦 = 𝑎 + 𝑏𝑥
𝑦 = 9.1667 + 0.7222𝑥
𝑤ℎ𝑒𝑛 𝑥 = 38 𝑡ℎ𝑒𝑛 𝑦 = 36.61 ≅ 37

Example 14
The equations of two lines of regressions are 3𝑥 + 2𝑦 = 26, 6𝑥 + 𝑦 = 31.
Find (i) the mean of 𝑥 𝑎𝑛𝑑 𝑦
(ii) Coefficient of Correlation between 𝑥 𝑎𝑛𝑑 𝑦,
(iii) 𝜎𝑦 𝑖𝑓 𝜎𝑥 = 3

Solution
𝜎𝑥 = 3
Solving the given equations, we get the mean as

𝑥̅ = 4 𝑦̅ = 7
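If we consider the given equation
3𝑥 + 2𝑦 = 26 as the regression equation of 𝑦 𝑜𝑛 𝑥
2𝑦 = −3𝑥 + 26

\[ y = -\frac{3}{2}x + \frac{26}{2} \]

Then 𝑏𝑦𝑥 = −1.5

If we consider the given equation 6𝑥 + 𝑦 = 31 as the regression equation of 𝑥 𝑜𝑛 𝑦
6𝑥 = −𝑦 + 31

\[ x = -\frac{1}{6}y + \frac{31}{6} \]

Then 𝑏𝑥𝑦 = −0.1667

\[ r^2 = b_{yx}\, b_{xy} = (-1.5)(-0.1667) = 0.25 \]

Since both regression coefficients are negative, 𝑟 is negative:

\[ r = -\sqrt{0.25} = -0.5 \]

\[ b_{xy} = r\,\frac{\sigma_x}{\sigma_y} \]

𝜎𝑥 = 3

\[ -0.1667 = (-0.5)\,\frac{3}{\sigma_y} \]

\[ 0.1667 = \frac{1.5}{\sigma_y} \]

\[ \sigma_y = \frac{1.5}{0.1667} = 8.9982 \approx 9 \]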

Practice Problems
Find the coefficient of regression and hence the equation of lines of regression for the
following data

X 78 36 98 25 75 82 90 62 65 39
y 84 51 91 60 68 62 86 58 53 47

Estimate the value of x when y=90
