Chapter-4 Ni Kachua Kamote
OBJECTIVES
INTRODUCTION
Correlation and regression analysis are among the most important legacies of Sir Francis Galton.
Correlation analysis is concerned with the relationship between two variables, whereas
regression analysis describes that relationship precisely by means of an equation.
Correlation and regression refer to the relationship that exists between two variables, X and Y,
in the case where each particular value of Xi is paired with one particular value of Yi. For
example: the measures of height for individual human subjects, paired with their corresponding
measures of weight; the number of hours that individual students in a statistics course spend
studying prior to an exam, paired with their corresponding measures of performance on the
exam; the amount of class time that individual students in a statistics course spend snoozing and
daydreaming prior to an exam, paired with their corresponding measures of performance on the
exam; and so on.
increase; and in the second kind (the more of this, the less of that), you are speaking of a
negative correlation between the two variables, meaning that as one variable increases, the other
decreases.
Correlation and regression are two sides of the same coin. In the underlying logic, you can begin
with either one and end up with the other. We will begin with correlation, since that is the part of
the correlation-regression story with which you are probably already somewhat familiar.
Correlation Analysis
Basic Properties of ρ
Interpretation of ρ
Note:
The coefficient of correlation measures the similarity of the changes in the
values of x and y.
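Formulas for computing the coefficient of correlation
2. Product moment
$$r = \frac{\sum XY}{\sqrt{(\sum X^2)(\sum Y^2)}}$$
where X = x − x̄ and Y = y − ȳ are the deviations of each value from its mean. Written with the standard deviations Sx and Sy (computed with divisor n), the same coefficient is
$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{n(S_x)(S_y)}$$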
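Proof that the raw score formula is equal to the product moment formula:
With X = x − x̄ and Y = y − ȳ, each deviation sum can be expanded in terms of raw sums:
$$\sum XY = \sum xy - \frac{(\sum x)(\sum y)}{n}, \qquad \sum X^2 = \frac{n(\sum x^2) - (\sum x)^2}{n}, \qquad \sum Y^2 = \frac{n(\sum y^2) - (\sum y)^2}{n}$$
Substituting these into the product moment formula,
$$\frac{\sum XY}{\sqrt{(\sum X^2)(\sum Y^2)}} = \frac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sqrt{\left(\frac{n(\sum x^2) - (\sum x)^2}{n}\right)\left(\frac{n(\sum y^2) - (\sum y)^2}{n}\right)}} = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n(\sum x^2) - (\sum x)^2][n(\sum y^2) - (\sum y)^2]}}$$
because the numerator and the denominator each carry a factor of 1/n, which cancels.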
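The equivalence proved above can also be checked numerically. The following minimal Python sketch evaluates both formulas on the Example 1 data used later in this chapter; the function names `raw_score_r` and `product_moment_r` are illustrative, not part of any library. The two results should agree up to floating-point rounding.

```python
import math

def raw_score_r(x, y):
    """Pearson r from the raw-score formula, using only Σx, Σy, Σx², Σy², Σxy."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

def product_moment_r(x, y):
    """Pearson r from the deviations X = x - x̄ and Y = y - ȳ."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    X = [a - x_bar for a in x]
    Y = [b - y_bar for b in y]
    sum_XY = sum(a * b for a, b in zip(X, Y))
    return sum_XY / math.sqrt(sum(a * a for a in X) * sum(b * b for b in Y))

# Example 1 data from this chapter
x = [20, 25, 10, 15, 30, 24, 28, 35, 12, 16, 32, 45]
y = [4, 2, 3, 5, 8, 7, 7, 9, 3, 5, 8, 10]

print(round(raw_score_r(x, y), 4), round(product_moment_r(x, y), 4))
```

The raw-score version needs only running totals, which is why it suits hand calculation from a table of x, y, x², y² and xy columns.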
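Data that are arranged in numerical order, usually from largest to smallest, and
numbered 1, 2, 3, ... are said to be in ranks, or ranked data. Ranks need special care when
two or more values of one variable are the same (tied): each tied value receives the average
of the ranks the tied values occupy, so two values tied for 2nd and 3rd place each receive
rank 2.5. The coefficient of correlation for this type of data is given by the Spearman
rank-difference correlation coefficient, denoted by ρ. (The worked examples below rank from
smallest = 1; ρ is unchanged as long as both variables are ranked in the same direction.)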
$$\rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$
where d is the difference between the two ranks of each pair (d = Rx − Ry) and n is the number of pairs.
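A minimal Python sketch of the rank-difference formula, assuming ranks are assigned from smallest = 1 with tied values sharing the average of their positions (the convention the worked examples in this chapter follow). The helper names `average_ranks` and `spearman_rho` are hypothetical, not a library API.

```python
def average_ranks(values):
    """Ascending ranks (smallest = 1); tied values get the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend `end` across a run of equal values in sorted order.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # 0-based positions -> 1-based average rank
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def spearman_rho(x, y):
    """Spearman rank-difference coefficient: 1 - 6Σd² / (n(n² - 1))."""
    n = len(x)
    rx, ry = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print(average_ranks([10, 20, 20, 30]))
```

For example, `average_ranks([10, 20, 20, 30])` gives `[1.0, 2.5, 2.5, 4.0]`: the two 20s share positions 2 and 3.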
Examples:
x    y    x²     y²    xy     (x − x̄)    (y − ȳ)    (x − x̄)²     (y − ȳ)²    (x − x̄)(y − ȳ)
20   4    400    16    80     -4.3333    -1.9167    18.7778      3.6736      8.3056
25   2    625    4     50     0.6667     -3.9167    0.4444       15.3403     -2.6111
10   3    100    9     30     -14.3333   -2.9167    205.4444     8.5069      41.8056
15   5    225    25    75     -9.3333    -0.9167    87.1111      0.8403      8.5556
30   8    900    64    240    5.6667     2.0833     32.1111      4.3403      11.8056
24   7    576    49    168    -0.3333    1.0833     0.1111       1.1736      -0.3611
28   7    784    49    196    3.6667     1.0833     13.4444      1.1736      3.9722
35   9    1225   81    315    10.6667    3.0833     113.7778     9.5069      32.8889
12   3    144    9     36     -12.3333   -2.9167    152.1111     8.5069      35.9722
16   5    256    25    80     -8.3333    -0.9167    69.4444      0.8403      7.6389
32   8    1024   64    256    7.6667     2.0833     58.7778      4.3403      15.9722
45   10   2025   100   450    20.6667    4.0833     427.1111     16.6736     84.3889
292  71   8284   495   1976   0          0          1178.6667    74.9167     248.3333
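Solving r using:
1. Raw Score:
$$r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n(\sum x^2) - (\sum x)^2][n(\sum y^2) - (\sum y)^2]}}$$
$$r = \frac{12(1976) - (292)(71)}{\sqrt{[12(8284) - (292)^2][12(495) - (71)^2]}} = \frac{2980}{\sqrt{(14144)(899)}} \approx 0.8357$$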
2. Standard Score
$$S_x = \sqrt{\frac{1178.6667}{12}} = 9.9107, \qquad S_y = \sqrt{\frac{74.9167}{12}} = 2.4986$$
$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{n(S_x)(S_y)} = \frac{248.3333}{12(9.9107)(2.4986)} \approx 0.8357$$
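The standard-score computation can be reproduced in a few lines of Python. This sketch takes the three deviation totals from the Example 1 table as given and, like the text, divides by n (not n − 1) when forming Sx and Sy.

```python
import math

n = 12
ss_x = 1178.6667   # Σ(x − x̄)², column total from the Example 1 table
ss_y = 74.9167     # Σ(y − ȳ)²
sp_xy = 248.3333   # Σ(x − x̄)(y − ȳ)

# Standard deviations with divisor n, as in the text
s_x = math.sqrt(ss_x / n)
s_y = math.sqrt(ss_y / n)

r = sp_xy / (n * s_x * s_y)
print(f"Sx = {s_x:.4f}, Sy = {s_y:.4f}, r = {r:.4f}")
```

Using n − 1 instead would change Sx and Sy but not r, as long as the same divisor is also used in the denominator n(Sx)(Sy).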
3. Product Moment
$$r = \frac{\sum XY}{\sqrt{(\sum X^2)(\sum Y^2)}} = \frac{248.3333}{\sqrt{(1178.6667)(74.9167)}} \approx 0.8357$$
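4. Spearman Rho
(Ranks are assigned with the smallest value = 1; tied values share the average of their ranks.)
x    y    Rx   Ry    d     d²
20   4    5    4     1     1
25   2    7    1     6     36
10   3    1    2.5   -1.5  2.25
15   5    3    5.5   -2.5  6.25
30   8    9    9.5   -0.5  0.25
24   7    6    7.5   -1.5  2.25
28   7    8    7.5   0.5   0.25
35   9    11   11    0     0
12   3    2    2.5   -0.5  0.25
16   5    4    5.5   -1.5  2.25
32   8    10   9.5   0.5   0.25
45   10   12   12    0     0
292  71             0     51
$$\rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$
$$\rho = 1 - \frac{6(51)}{12(12^2 - 1)} = 1 - \frac{306}{1716} \approx 0.8217$$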
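To double-check the ranks in the table, the following Python sketch recomputes them directly from x and y. The helper `avg_rank` is a hypothetical name; it ranks each value as (number of smaller values) + (number of equal values + 1)/2, which gives smallest = 1 and averages tied ranks.

```python
# Example 1 data
x = [20, 25, 10, 15, 30, 24, 28, 35, 12, 16, 32, 45]
y = [4, 2, 3, 5, 8, 7, 7, 9, 3, 5, 8, 10]

def avg_rank(v, values):
    """Ascending rank of v within values; ties share the average of their positions."""
    smaller = sum(1 for w in values if w < v)
    equal = sum(1 for w in values if w == v)
    return smaller + (equal + 1) / 2

rx = [avg_rank(v, x) for v in x]
ry = [avg_rank(v, y) for v in y]
d = [a - b for a, b in zip(rx, ry)]
sum_d2 = sum(di ** 2 for di in d)

n = len(x)
rho = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
print(f"Σd = {sum(d)}, Σd² = {sum_d2}, ρ = {rho:.4f}")
```

A quick sanity check on any rank table is that Σd must be 0 and no rank may exceed n.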
2. The following table summarizes the results of an aptitude test given to 18 clerks to
determine the correlation between test scores (x) and sales in the first month (y), in
hundreds of dollars.
x 80 65 48 67 91 91 52 75 86 71 72 51 67 52 74 55 94 69
y 71 68 57 81 79 87 50 77 83 78 67 67 61 66 65 63 85 64
x y x² y² xy (x − x̄) (y − ȳ) (x − x̄)² (y − ȳ)² (x − x̄)(y − ȳ)
80 71 6400 5041 5680 10 0.5 100 0.25 5
65 68 4225 4624 4420 -5 -2.5 25 6.25 12.5
48 57 2304 3249 2736 -22 -13.5 484 182.25 297
67 81 4489 6561 5427 -3 10.5 9 110.25 -31.5
91 79 8281 6241 7189 21 8.5 441 72.25 178.5
91 87 8281 7569 7917 21 16.5 441 272.25 346.5
52 50 2704 2500 2600 -18 -20.5 324 420.25 369
75 77 5625 5929 5775 5 6.5 25 42.25 32.5
86 83 7396 6889 7138 16 12.5 256 156.25 200
71 78 5041 6084 5538 1 7.5 1 56.25 7.5
72 67 5184 4489 4824 2 -3.5 4 12.25 -7
51 67 2601 4489 3417 -19 -3.5 361 12.25 66.5
67 61 4489 3721 4087 -3 -9.5 9 90.25 28.5
52 66 2704 4356 3432 -18 -4.5 324 20.25 81
74 65 5476 4225 4810 4 -5.5 16 30.25 -22
55 63 3025 3969 3465 -15 -7.5 225 56.25 112.5
94 85 8836 7225 7990 24 14.5 576 210.25 348
69 64 4761 4096 4416 -1 -6.5 1 42.25 6.5
1260 1269 91822 91257 90861 0 0 3622 1792.5 2031
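Solutions:
Using Raw Score:
$$r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n(\sum x^2) - (\sum x)^2][n(\sum y^2) - (\sum y)^2]}}$$
$$r = \frac{18(90861) - (1260)(1269)}{\sqrt{[18(91822) - (1260)^2][18(91257) - (1269)^2]}} = \frac{36558}{\sqrt{(65196)(32265)}} \approx 0.7971$$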
Using Product Moment:
$$r = \frac{\sum XY}{\sqrt{(\sum X^2)(\sum Y^2)}} = \frac{2031}{\sqrt{(3622)(1792.5)}} \approx 0.7971$$
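Using Standard Score:
$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{n(S_x)(S_y)}$$
$$S_x = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} = \sqrt{\frac{3622}{18}} = 14.1853, \qquad S_y = \sqrt{\frac{\sum (y - \bar{y})^2}{n}} = \sqrt{\frac{1792.5}{18}} = 9.9791$$
$$r = \frac{2031}{18(14.1853)(9.9791)} \approx 0.7971$$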
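Because the deviation totals follow from the raw totals (Σ(x − x̄)² = Σx² − (Σx)²/n, and similarly for y and for the cross-product), all three formulas can be evaluated from the sums in the bottom row of the table. A minimal Python sketch:

```python
import math

# Column totals from the aptitude-test table (n = 18 clerks)
n = 18
sum_x, sum_y = 1260, 1269
sum_x2, sum_y2, sum_xy = 91822, 91257, 90861

# 1. Raw-score formula
r_raw = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))

# Deviation totals derived from the raw totals
ss_x = sum_x2 - sum_x ** 2 / n          # Σ(x − x̄)²
ss_y = sum_y2 - sum_y ** 2 / n          # Σ(y − ȳ)²
sp_xy = sum_xy - sum_x * sum_y / n      # Σ(x − x̄)(y − ȳ)

# 2. Product-moment formula
r_pm = sp_xy / math.sqrt(ss_x * ss_y)

# 3. Standard-score formula (standard deviations with divisor n)
s_x, s_y = math.sqrt(ss_x / n), math.sqrt(ss_y / n)
r_std = sp_xy / (n * s_x * s_y)

print(ss_x, ss_y, sp_xy)
print(round(r_raw, 4), round(r_pm, 4), round(r_std, 4))
```

The derived deviation totals reproduce the 3622, 1792.5 and 2031 at the bottom of the table, which is a convenient check on the hand-filled deviation columns.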
x y Rx Ry d d²
80 71 14 11 3 9
65 68 6 10 -4 16
48 57 1 2 -1 1
67 81 7.5 15 -7.5 56.25
91 79 16.5 14 2.5 6.25
91 87 16.5 18 -1.5 2.25
52 50 3.5 1 2.5 6.25
75 77 13 12 1 1
86 83 15 16 -1 1
71 78 10 13 -3 9
72 67 11 8.5 2.5 6.25
51 67 2 8.5 -6.5 42.25
67 61 7.5 3 4.5 20.25
52 66 3.5 7 -3.5 12.25
74 65 12 6 6 36
55 63 5 4 1 1
94 85 18 17 1 1
69 64 9 5 4 16
1260 1269 0 243
$$\rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$
$$\rho = 1 - \frac{6(243)}{18(18^2 - 1)} = 1 - \frac{1458}{5814}$$
ρ = 0.7492 (a marked or high relationship)
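The ranks and Σd² above can be verified with the sketch below, which assumes the same ranking convention (smallest = 1, ties averaged). `rank_map` is an illustrative helper that uses `collections.Counter` to assign one average rank per distinct value.

```python
from collections import Counter

x = [80, 65, 48, 67, 91, 91, 52, 75, 86, 71, 72, 51, 67, 52, 74, 55, 94, 69]
y = [71, 68, 57, 81, 79, 87, 50, 77, 83, 78, 67, 67, 61, 66, 65, 63, 85, 64]

def rank_map(values):
    """Map each distinct value to its ascending average rank (smallest = 1)."""
    counts = Counter(values)
    ranks, filled = {}, 0
    for v in sorted(counts):
        c = counts[v]
        # v occupies ranks filled+1 .. filled+c; their mean is filled + (c + 1)/2
        ranks[v] = filled + (c + 1) / 2
        filled += c
    return ranks

rank_x, rank_y = rank_map(x), rank_map(y)
d = [rank_x[a] - rank_y[b] for a, b in zip(x, y)]
sum_d2 = sum(di ** 2 for di in d)

n = len(x)
rho = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
print(sum_d2, round(rho, 4))
```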
3. With the growth of internet service providers, a researcher decides to examine whether there
is a correlation between the cost of internet service per month (rounded to the nearest dollar) and
the degree of customer satisfaction (on a scale of 1 to 25, with 1 being not at all satisfied and 25
being extremely satisfied). The researcher includes only programs with comparable types of
services. A sample of the data is provided below.
x (customer satisfaction)   y (cost of internet)
20 30
20 38
22 40
22.5 25
23 20
23.5 10
24 13
24.5 15
25 9
25.5 12
230 212
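Solutions:
Using Raw Score:
$$r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n(\sum x^2) - (\sum x)^2][n(\sum y^2) - (\sum y)^2]}}$$
$$r = \frac{10(4708) - (230)(212)}{\sqrt{[10(5323) - (230)^2][10(5688) - (212)^2]}} = \frac{47080 - 48760}{\sqrt{[53230 - 52900][56880 - 44944]}} = \frac{-1680}{\sqrt{(330)(11936)}}$$
Using Standard Score:
$$S_x = \sqrt{\frac{33}{10}} = 1.8166, \qquad S_y = \sqrt{\frac{1193.6}{10}} = 10.9252$$
$$r = \frac{-168}{10(1.8166)(10.9252)}$$
Using Product Moment:
$$r = \frac{-168}{\sqrt{(33)(1193.6)}}$$
r ≈ −0.8465 (a marked or high negative relationship: higher cost goes with lower satisfaction)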
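A short Python check of the standard-score and product-moment results, working from the raw data. It uses `statistics.fmean` and `statistics.pstdev` from the standard library; `pstdev` divides by n, matching the Sx and Sy used in the text.

```python
import math
import statistics

x = [20, 20, 22, 22.5, 23, 23.5, 24, 24.5, 25, 25.5]  # customer satisfaction
y = [30, 38, 40, 25, 20, 10, 13, 15, 9, 12]           # cost of internet service
n = len(x)

x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
ss_x = sum((a - x_bar) ** 2 for a in x)                        # Σ(x − x̄)²
ss_y = sum((b - y_bar) ** 2 for b in y)                        # Σ(y − ȳ)²
sp_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))   # Σ(x − x̄)(y − ȳ)

# Standard-score form: pstdev divides by n, matching Sx and Sy in the text
s_x, s_y = statistics.pstdev(x), statistics.pstdev(y)
r_std = sp_xy / (n * s_x * s_y)

# Product-moment form
r_pm = sp_xy / math.sqrt(ss_x * ss_y)

print(f"Sx = {s_x:.4f}, Sy = {s_y:.4f}, r = {r_pm:.4f}")
```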
x y Rx Ry d d²
20 30 1.5 8 -6.5 42.25
20 38 1.5 9 -7.5 56.25
22 40 3 10 -7 49
22.5 25 4 7 -3 9
23 20 5 6 -1 1
23.5 10 6 2 4 16
24 13 7 4 3 9
24.5 15 8 5 3 9
25 9 9 1 8 64
25.5 12 10 3 7 49
230 212 0 304.5
Using Spearman rank coefficient
$$\rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$
$$\rho = 1 - \frac{6(304.5)}{10(10^2 - 1)} = 1 - \frac{1827}{990} \approx -0.8455$$
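Spearman's ρ is the Pearson correlation computed on the ranks; the 6Σd² shortcut gives exactly that value when there are no ties, and only an approximation when there are ties. The Python sketch below (helper names `avg_rank` and `pearson` are illustrative) computes both for this example, where the two satisfaction scores of 20 are tied, so the two values should be close but not identical.

```python
import math

x = [20, 20, 22, 22.5, 23, 23.5, 24, 24.5, 25, 25.5]
y = [30, 38, 40, 25, 20, 10, 13, 15, 9, 12]

def avg_rank(v, values):
    """Ascending rank of v; tied values share the average of their positions."""
    return sum(w < v for w in values) + (sum(w == v for w in values) + 1) / 2

def pearson(u, v):
    """Product-moment correlation of two equal-length lists."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = math.sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))
    return num / den

rx = [avg_rank(v, x) for v in x]
ry = [avg_rank(v, y) for v in y]
n = len(x)
sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))

rho_formula = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))   # shortcut used in the text
rho_ranks = pearson(rx, ry)                          # Pearson r applied to the ranks
print(sum_d2, round(rho_formula, 4), round(rho_ranks, 4))
```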
Regression Analysis
Regression analysis is the analysis of several variables in which the focus is on the
relationship between a dependent variable and one or more independent variables.
Scatter Diagram
high anxiety) and also completed a checklist designed to measure an individual's degree of
religiosity (belief in a particular religion, regular attendance at religious services, number of
times per week they regularly pray, etc.) (high score = greater religiosity). A data sample is
provided below:
LEAST SQUARE REGRESSION LINE
The least-squares regression line (LSRL), found by the method of least squares, is the
best-fitting straight line for a set of points: the line that minimizes the sum of the squared
deviations of the observed values from the line.
Formulas:
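y = a0 + a1x          x = b0 + b1y
Where:
$$a_0 = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n(\sum x^2) - (\sum x)^2}, \qquad a_1 = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2}$$
$$b_0 = \frac{(\sum x)(\sum y^2) - (\sum y)(\sum xy)}{n(\sum y^2) - (\sum y)^2}, \qquad b_1 = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum y^2) - (\sum y)^2}$$
and:
∑y is the sum of all values of Y
∑x is the sum of all values of X
n is the total number of pairs of X and Y
∑y² is the sum of the squares of the values of Y
∑x² is the sum of the squares of the values of X
∑xy is the sum of the products of each pair of X and Y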
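The four coefficient formulas above translate directly into code. In this minimal Python sketch, `lsrl_coefficients` is a hypothetical helper that takes the column totals; the test points are an illustrative set lying exactly on y = 1 + 2x, so the fit should recover that line (and its inverse, x = −0.5 + 0.5y).

```python
def lsrl_coefficients(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Coefficients of the two least-squares lines from column totals.

    Returns (a0, a1, b0, b1) for y = a0 + a1*x and x = b0 + b1*y.
    """
    dx = n * sum_x2 - sum_x ** 2   # shared denominator for the Y-on-X line
    dy = n * sum_y2 - sum_y ** 2   # shared denominator for the X-on-Y line
    a0 = (sum_y * sum_x2 - sum_x * sum_xy) / dx
    a1 = (n * sum_xy - sum_x * sum_y) / dx
    b0 = (sum_x * sum_y2 - sum_y * sum_xy) / dy
    b1 = (n * sum_xy - sum_x * sum_y) / dy
    return a0, a1, b0, b1

# Illustrative points lying exactly on y = 1 + 2x
xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]
totals = (len(xs), sum(xs), sum(ys),
          sum(v * v for v in xs), sum(v * v for v in ys),
          sum(a * b for a, b in zip(xs, ys)))
print(lsrl_coefficients(*totals))
```

Note that the two slopes share the same numerator; only the denominators differ.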
Example:
x y x2 y2 xy
20 4 400 16 80
25 2 625 4 50
10 3 100 9 30
15 5 225 25 75
30 8 900 64 240
24 7 576 49 168
28 7 784 49 196
35 9 1225 81 315
12 3 144 9 36
16 5 256 25 80
32 8 1024 64 256
45 10 2025 100 450
292 71 8284 495 1976
LSRL of Y on X
y = a0 + a1x
$$a_0 = \frac{(71)(8284) - (292)(1976)}{12(8284) - (292)^2} = \frac{11172}{14144} = 0.7899, \qquad a_1 = \frac{12(1976) - (292)(71)}{12(8284) - (292)^2} = \frac{2980}{14144} = 0.2107$$
y = 0.7899 + 0.2107x
LSRL of X on Y
x = b0 + b1y
$$b_0 = \frac{(292)(495) - (71)(1976)}{12(495) - (71)^2} = \frac{4244}{899} = 4.7208, \qquad b_1 = \frac{12(1976) - (292)(71)}{12(495) - (71)^2} = \frac{2980}{899} = 3.3148$$
x = 4.7208 + 3.3148y
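The following Python sketch recomputes both lines for this example from the raw data. It also checks a useful identity: the product of the two slopes equals r², so a1·b1 should match the square of the r ≈ 0.8357 found for these data in the correlation section.

```python
import math

# Example 1 data
x = [20, 25, 10, 15, 30, 24, 28, 35, 12, 16, 32, 45]
y = [4, 2, 3, 5, 8, 7, 7, 9, 3, 5, 8, 10]

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_x2 = sum(v * v for v in x)
sum_y2 = sum(v * v for v in y)
sum_xy = sum(a * b for a, b in zip(x, y))

# Y on X: y = a0 + a1*x
dx = n * sum_x2 - sum_x ** 2
a0 = (sum_y * sum_x2 - sum_x * sum_xy) / dx
a1 = (n * sum_xy - sum_x * sum_y) / dx

# X on Y: x = b0 + b1*y
dy = n * sum_y2 - sum_y ** 2
b0 = (sum_x * sum_y2 - sum_y * sum_xy) / dy
b1 = (n * sum_xy - sum_x * sum_y) / dy

# The product of the two slopes equals r²
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(dx * dy)
print(f"y = {a0:.4f} + {a1:.4f}x;  x = {b0:.4f} + {b1:.4f}y")
print(f"a1*b1 = {a1 * b1:.4f}, r^2 = {r * r:.4f}")
```

Another quick check is that each line passes through the point of means (x̄, ȳ) = (24.3333, 5.9167).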
2. The following table summarizes the results of an aptitude test given to 18 clerks to
determine the correlation between test scores (x) and sales in the first month (y), in
hundreds of dollars.
X 80 65 48 67 91 91 52 75 86 71 72 51 67 52 74 55 94 69
Y 71 68 57 81 79 87 50 77 83 78 67 67 61 66 65 63 85 64
x y x² y² xy
80 71 6400 5041 5680
65 68 4225 4624 4420
48 57 2304 3249 2736
67 81 4489 6561 5427
91 79 8281 6241 7189
91 87 8281 7569 7917
52 50 2704 2500 2600
75 77 5625 5929 5775
86 83 7396 6889 7138
71 78 5041 6084 5538
72 67 5184 4489 4824
51 67 2601 4489 3417
67 61 4489 3721 4087
52 66 2704 4356 3432
74 65 5476 4225 4810
55 63 3025 3969 3465
94 85 8836 7225 7990
69 64 4761 4096 4416
1260 1269 91822 91257 90861
LSRL of Y on X
$$a_0 = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n(\sum x^2) - (\sum x)^2} = \frac{(1269)(91822) - (1260)(90861)}{18(91822) - (1260)^2} = 31.2482$$
$$a_1 = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2} = \frac{18(90861) - (1260)(1269)}{18(91822) - (1260)^2} = 0.5607$$
y = 31.2482 + 0.5607x
LSRL of X on Y
$$b_0 = \frac{(\sum x)(\sum y^2) - (\sum y)(\sum xy)}{n(\sum y^2) - (\sum y)^2} = \frac{(1260)(91257) - (1269)(90861)}{18(91257) - (1269)^2} = -9.8803$$
$$b_1 = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum y^2) - (\sum y)^2} = \frac{18(90861) - (1260)(1269)}{18(91257) - (1269)^2} = 1.1331$$
x = −9.8803 + 1.1331y
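An equivalent way to obtain the same coefficients is the deviation form: slope = Σ(x − x̄)(y − ȳ)/Σ(x − x̄)², intercept = ȳ − slope·x̄, and the same with the roles of x and y swapped. This form is algebraically identical to the sum formulas above; the Python sketch below uses it on the aptitude-test data as a cross-check.

```python
x = [80, 65, 48, 67, 91, 91, 52, 75, 86, 71, 72, 51, 67, 52, 74, 55, 94, 69]
y = [71, 68, 57, 81, 79, 87, 50, 77, 83, 78, 67, 67, 61, 66, 65, 63, 85, 64]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
ss_x = sum((a - x_bar) ** 2 for a in x)                       # Σ(x − x̄)²
ss_y = sum((b - y_bar) ** 2 for b in y)                       # Σ(y − ȳ)²
sp_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))  # Σ(x − x̄)(y − ȳ)

# Y on X: slope from the deviation sums, intercept so the line passes through (x̄, ȳ)
a1 = sp_xy / ss_x
a0 = y_bar - a1 * x_bar

# X on Y: same idea with the roles of x and y swapped
b1 = sp_xy / ss_y
b0 = x_bar - b1 * y_bar

print(f"y = {a0:.4f} + {a1:.4f}x")
print(f"x = {b0:.4f} + {b1:.4f}y")
```

The deviation form avoids the very large products such as (1269)(91822), which makes it less prone to arithmetic slips by hand.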
3. With the growth of internet service providers, a researcher decides to examine whether there
is a correlation between the cost of internet service per month (rounded to the nearest dollar) and
the degree of customer satisfaction (on a scale of 1 to 25, with 1 being not at all satisfied and 25
being extremely satisfied). The researcher includes only programs with comparable types of
services. A sample of the data is provided below.
x (customer satisfaction)   y (cost of internet)
20 30
20 38
22 40
22.5 25
23 20
23.5 10
24 13
24.5 15
25 9
25.5 12
230 212
LSRL of y on x
y = a0 + a1x
$$a_0 = \frac{(212)(5323) - (230)(4708)}{10(5323) - (230)^2} = \frac{45636}{330} = 138.2909, \qquad a_1 = \frac{10(4708) - (230)(212)}{10(5323) - (230)^2} = \frac{-1680}{330} = -5.0909$$
y = 138.2909 − 5.0909x
LSRL of x on y
x = b0 + b1y
$$b_0 = \frac{(230)(5688) - (212)(4708)}{10(5688) - (212)^2} = \frac{310144}{11936} = 25.9839, \qquad b_1 = \frac{10(4708) - (230)(212)}{10(5688) - (212)^2} = \frac{-1680}{11936} = -0.1408$$
x = 25.9839 − 0.1408y
x     y    x²      y²     xy      (x − x̄)  (y − ȳ)  (x − x̄)²  (y − ȳ)²  (x − x̄)(y − ȳ)
20    30   400     900    600     -3       8.8      9         77.44     -26.4
20    38   400     1444   760     -3       16.8     9         282.24    -50.4
22    40   484     1600   880     -1       18.8     1         353.44    -18.8
22.5  25   506.25  625    562.5   -0.5     3.8      0.25      14.44     -1.9
23    20   529     400    460     0        -1.2     0         1.44      0
23.5  10   552.25  100    235     0.5      -11.2    0.25      125.44    -5.6
24    13   576     169    312     1        -8.2     1         67.24     -8.2
24.5  15   600.25  225    367.5   1.5      -6.2     2.25      38.44     -9.3
25    9    625     81     225     2        -12.2    4         148.84    -24.4
25.5  12   650.25  144    306     2.5      -9.2     6.25      84.64     -23
230   212  5323    5688   4708    0        0        33        1193.6    -168
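A Python sketch that recomputes the two lines for the internet-service data in the table above. Here n is taken from the length of the data (10 pairs) rather than typed in. Both slopes come out negative, consistent with the negative correlation found for these data.

```python
x = [20, 20, 22, 22.5, 23, 23.5, 24, 24.5, 25, 25.5]  # customer satisfaction
y = [30, 38, 40, 25, 20, 10, 13, 15, 9, 12]           # cost of internet service

n = len(x)  # 10 pairs
sum_x, sum_y = sum(x), sum(y)
sum_x2 = sum(v * v for v in x)
sum_y2 = sum(v * v for v in y)
sum_xy = sum(a * b for a, b in zip(x, y))

# Shared denominators of the two lines
dx = n * sum_x2 - sum_x ** 2
dy = n * sum_y2 - sum_y ** 2

# y = a0 + a1*x and x = b0 + b1*y
a0 = (sum_y * sum_x2 - sum_x * sum_xy) / dx
a1 = (n * sum_xy - sum_x * sum_y) / dx
b0 = (sum_x * sum_y2 - sum_y * sum_xy) / dy
b1 = (n * sum_xy - sum_x * sum_y) / dy

print(f"y = {a0:.4f} + ({a1:.4f})x")
print(f"x = {b0:.4f} + ({b1:.4f})y")
```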
LEAST SQUARE REGRESSION PARABOLA
The method of the least-squares regression parabola takes the best-fitting curve of a given
type to be the curve with the minimal sum of squared deviations (least-square error) from a
given set of data.
Formulas:
$$\sum y = a_0 n + a_1 \sum x + a_2 \sum x^2$$
$$\sum xy = a_0 \sum x + a_1 \sum x^2 + a_2 \sum x^3$$
$$\sum x^2 y = a_0 \sum x^2 + a_1 \sum x^3 + a_2 \sum x^4$$
y = a0 + a1x + a2x²
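Where:
a0, a1 and a2 are found by solving the three normal equations simultaneously. If the x² term is dropped (a2 = 0), the first two equations reduce to the straight-line case, whose solution is
$$a_0 = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n(\sum x^2) - (\sum x)^2}, \qquad a_1 = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2}$$
These straight-line expressions do not give the coefficients of the parabola.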
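The three normal equations form a 3 × 3 linear system. The minimal Python sketch below (the name `fit_parabola` is illustrative) builds the sums Σx^k and Σx^k·y, then solves the system by Gauss-Jordan elimination using `fractions.Fraction`, so the arithmetic is exact. The test data are an illustrative set lying exactly on y = 1 + 2x + 3x², so the fit should recover a0 = 1, a1 = 2, a2 = 3.

```python
from fractions import Fraction

def fit_parabola(x, y):
    """Least-squares parabola y = a0 + a1*x + a2*x**2 from the three normal equations.

    Returns (a0, a1, a2) as exact Fractions. Needs at least three distinct x values;
    otherwise the system is singular and no pivot is found (StopIteration).
    """
    xs = [Fraction(v) for v in x]
    ys = [Fraction(v) for v in y]
    s = [sum(v ** k for v in xs) for k in range(5)]                  # s[k] = Σx^k (s[0] = n)
    t = [sum(b * a ** k for a, b in zip(xs, ys)) for k in range(3)]  # t[k] = Σx^k·y
    # Augmented matrix [coefficients | right-hand side] of the normal equations
    m = [[s[0], s[1], s[2], t[0]],
         [s[1], s[2], s[3], t[1]],
         [s[2], s[3], s[4], t[2]]]
    # Gauss-Jordan elimination: clear each column above and below its pivot
    for col in range(3):
        pivot = next(r for r in range(col, 3) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))

# Illustrative points lying exactly on y = 1 + 2x + 3x^2
x = [0, 1, 2, 3, 4]
y = [1 + 2 * v + 3 * v ** 2 for v in x]
print(fit_parabola(x, y))
```

Exact fractions are a convenience for checking hand work; for large data sets a floating-point linear solver would normally be used instead.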
Chapter Exercise
1. x: 2 5 7 1 4 3 0 2
   y: 10 4 2 8 5 3 5 8
2. x: 2 7 5 4 9 3 3 4 5 6
   y: 20 35 48 51 71 39 45 25 60 70
X - Achievement, Y - GPA