Lesson 6: Correlation and Linear Regression
Correlation Analysis
• The process of investigating a relationship between variables.
• Measures the association, or strength of the relationship, between two
variables, say X and Y.
• Identifies the relationship between two quantitative variables; it does
not by itself allow causal relationships to be inferred.
• Correlation is a statistical technique used to determine the degree to
which two variables are related.
Pearson’s Product-Moment Correlation Coefficient
It measures the nature (direction) and strength of the relationship
between two quantitative variables.
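A standard computational form of Pearson's r is sketched below. The symbols are assumptions chosen to match the worked examples in this lesson: n is the number of (x, y) pairs and the sums run over all pairs. The value of r lies between -1 and +1; its sign gives the direction of the relationship and its magnitude gives the strength.

```latex
r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}
         {\sqrt{\left[n\sum x^{2} - \left(\sum x\right)^{2}\right]
                \left[n\sum y^{2} - \left(\sum y\right)^{2}\right]}}
```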
[Table: interpretation of r, with 0.00 as no correlation, values near ±1.00 as high correlation, and ±1.00 as perfect correlation]
• Rectangular coordinates
• Two quantitative variables
• One variable is called independent (X) and the second is called dependent (Y)
• Points are not joined
• No frequency table
[Figure: scatter diagram with X on the horizontal axis and Y on the vertical axis]
Scatter plots
The pattern of data is indicative of the type of
relationship between your two variables:
Positive relationship
Negative relationship
No relationship
Positive Relationship
Two variables are positively correlated if the values of both variables
increase together.
Negative Relationship
Two variables are negatively correlated if the values of one variable
increase while the values of the other decrease.
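No Relationship
Two variables are not correlated if the values of one variable show no
consistent tendency to increase or decrease as the values of the other change.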
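The three cases above can be told apart by the sign of the sum of cross-products of deviations, which is the numerator of Pearson's r. The sketch below is illustrative only: the function name `direction`, the tolerance `tol`, and the three small data sets are hypothetical and do not come from the lesson.

```python
# Sketch: classify the direction of a linear relationship from the sign of
# sum((x - mean_x) * (y - mean_y)), the numerator of Pearson's r.

def direction(xs, ys, tol=1e-9):
    """Return 'positive', 'negative', or 'none'; tol is an assumed cutoff."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if cross > tol:
        return "positive"
    if cross < -tol:
        return "negative"
    return "none"

# Hypothetical illustrative data, not taken from the lesson.
print(direction([1, 2, 3, 4], [2, 4, 5, 9]))  # both variables rise together
print(direction([1, 2, 3, 4], [9, 5, 4, 2]))  # one rises while the other falls
print(direction([1, 2, 3, 4], [3, 5, 5, 3]))  # no linear pattern
```

In real data the sum is rarely exactly zero, so the strength of the relationship should be judged from the value of r rather than from the sign alone.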
Example problem:
Student (N)               1   2   3   4   5   6   7   8   9  10
X (No. of study hours)    4   6   8   4   2   1   5   7   4   6
Y (Exam score)           85  80  92  70  65  60  89  82  81  95
Ho: There is no significant relationship between the number of study hours and
the exam scores of the students.
Ha: There is a significant relationship between the number of study hours and
the exam scores of the students.
Student   X (No. of study hours)   Y (Exam score)   XY    X^2   Y^2
1         4                        85               340   16    7,225
2         6                        80               480   36    6,400
3         8                        92               736   64    8,464
4         4                        70               280   16    4,900
5         2                        65               130   4     4,225
6         1                        60               60    1     3,600
7         5                        89               445   25    7,921
8         7                        82               574   49    6,724
9         4                        81               324   16    6,561
10        6                        95               570   36    9,025
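The step from the table to the conclusion is not shown, so the substitution is sketched here under the assumption that the standard computational formula for Pearson's r is used. The totals are computed from the ten (X, Y) pairs of the example problem: ΣX = 47, ΣY = 799, ΣXY = 3,939, ΣX² = 263, ΣY² = 65,045, with n = 10.

```latex
r = \frac{n\sum XY - \left(\sum X\right)\left(\sum Y\right)}
         {\sqrt{\left[n\sum X^{2} - \left(\sum X\right)^{2}\right]
                \left[n\sum Y^{2} - \left(\sum Y\right)^{2}\right]}}
  = \frac{10(3939) - (47)(799)}
         {\sqrt{\left[10(263) - 47^{2}\right]\left[10(65045) - 799^{2}\right]}}
  = \frac{1837}{\sqrt{(421)(12049)}}
  \approx \frac{1837}{2252.25}
  \approx 0.8156
```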
Conclusion:
The calculated correlation coefficient is positive, which implies a direct
relationship between the number of study hours of the students and their
exam scores. The magnitude of the correlation coefficient is 0.8156, or about
0.82, which indicates a very high correlation.
Negative Correlation
Example problem:
Calculate and analyze the correlation coefficient between the exam scores and
the number of hours spent on social media of different students.
Student (N)                            1   2   3   4   5   6   7   8   9  10
Exam score (X)                        18  16  15  11  12  10   8   4   2   0
Number of hours on social media (Y)    1   3   5   6   9  11  10  12  11  15
Therefore,
$\bar{x} = \frac{\sum x}{n} = \frac{96}{10} = 9.6$
$\bar{y} = \frac{\sum y}{n} = \frac{83}{10} = 8.3$
Conclusion:
The calculated correlation coefficient is negative. Therefore, it
implies that there is an inverse relationship between the exam
scores and the number of hours spent on social media of the
different students.
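The intermediate computation for this example is not shown, so the following is a minimal sketch of one way to check the conclusion. It assumes the deviation form of Pearson's r, built on the means x̄ = 9.6 and ȳ = 8.3; the function name `pearson_r` and the variable names are illustrative.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the two means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations and the two sums of squares.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

exam_scores = [18, 16, 15, 11, 12, 10, 8, 4, 2, 0]    # X from the example
social_hours = [1, 3, 5, 6, 9, 11, 10, 12, 11, 15]    # Y from the example

r = pearson_r(exam_scores, social_hours)
print(round(r, 2))  # -0.92
```

The negative sign agrees with the stated conclusion, and the magnitude suggests that the inverse relationship is strong.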
No Correlation
Sample problem
x y
3 4
6 1
9 3
12 5
15 2
x    y    x - x̄    y - ȳ    (x - x̄)^2    (y - ȳ)^2
3    4    -6       1        36           1
6    1    -3       -2       9            4
9    3    0        0        0            0
12   5    3        2        9            4
15   2    6        -1       36           1
Σx = 45, Σy = 15, so x̄ = 45/5 = 9 and ȳ = 15/5 = 3
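To finish the sample problem, the deviation form of Pearson's r can be applied. This is a sketch under the assumption that the deviations are taken about x̄ = 9 and ȳ = 3, which gives x - x̄ = -6, -3, 0, 3, 6 and y - ȳ = 1, -2, 0, 2, -1. The cross-products cancel exactly, so r = 0 and the data show no linear correlation.

```latex
\sum (x - \bar{x})(y - \bar{y}) = (-6)(1) + (-3)(-2) + (0)(0) + (3)(2) + (6)(-1) = 0

\sum (x - \bar{x})^{2} = 36 + 9 + 0 + 9 + 36 = 90, \qquad
\sum (y - \bar{y})^{2} = 1 + 4 + 0 + 4 + 1 = 10

r = \frac{\sum (x - \bar{x})(y - \bar{y})}
         {\sqrt{\sum (x - \bar{x})^{2} \, \sum (y - \bar{y})^{2}}}
  = \frac{0}{\sqrt{(90)(10)}} = 0
```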
Regression Analysis
Regression: a technique concerned with predicting one variable from the
known values of others.
The process of predicting variable Y using variable X.
Regression
Calculates the “best-fit” line for a given set of data.
The regression line makes the sum of the squares of the residuals (the
vertical distances between the observed points and the line) smaller than
for any other line.
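The minimizing property can be illustrated numerically. The sketch below fits a line by least squares and compares its sum of squared residuals with that of a few nearby lines. The data set, the function names, and the perturbation sizes are all hypothetical choices made for illustration.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for the line y = a + b*x."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a = (sy - b * sx) / n
    return a, b

def sum_sq_residuals(xs, ys, a, b):
    """Sum of squared vertical distances from the points to y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, not taken from the lesson.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 6]

a, b = fit_line(xs, ys)
best = sum_sq_residuals(xs, ys, a, b)

# Shift the intercept and slope slightly; every other line should do worse.
others = [
    sum_sq_residuals(xs, ys, a + da, b + db)
    for da in (-0.5, 0.0, 0.5)
    for db in (-0.2, 0.0, 0.2)
    if (da, db) != (0.0, 0.0)
]
print(best < min(others))  # True
```

Checking a handful of nearby lines does not prove the property, but it shows what "smaller than for any other line" means in practice.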
Regression minimizes residuals
[Figure: scatter plot of SBP (mmHg) against Wt (kg) with the fitted regression line]
By using the least squares method (a procedure that minimizes the sum of the
squared vertical deviations of the plotted points from a straight line), we
are able to construct a best-fitting straight line through the scatter diagram
points and then formulate a regression equation of the form:
Regression Equation
$y = a + bx$
Regression equation
The regression equation describes the regression line through its intercept
and slope.
[Figure: plot of SBP (mmHg) against Wt (kg) with the regression line]
Linear Equations
$Y = bX + a$
$b = \text{slope} = \frac{\text{change in } Y}{\text{change in } X}$
$a = Y\text{-intercept}$
[Figure: straight line on X-Y axes showing the slope and the Y-intercept]
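$y = a + bx$
$a = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n\sum x^2 - (\sum x)^2}$
$b = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$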
Example 1. (linear regression)
The data in the table represent the membership at
a university mathematics club during the past 5
years.
Number of Years (x)   Membership (y)
1                     25
2                     30
3                     32
4                     45
5                     50
Fit a line of the form y = a + bx to predict the membership 5 years
from now.
x    y    x^2   xy
1    25   1     25
2    30   4     60
3    32   9     96
4    45   16    180
5    50   25    250
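With n = 5, Σx = 15, Σy = 182, Σx² = 55, and Σxy = 611:
$a = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n\sum x^2 - (\sum x)^2} = \frac{(182)(55) - (15)(611)}{5(55) - (15)^2} = \frac{845}{50} = 16.9$
$b = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2} = \frac{5(611) - (15)(182)}{5(55) - (15)^2} = \frac{325}{50} = 6.5$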
The equation is y = a + bx
y = 16.9 + 6.5x
Five years from now corresponds to x = 10, so
y = 16.9 + 6.5(10)
= 81.9, or about 82
Therefore, five years from now, the club would have about 82 members.
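The arithmetic of this example can be checked with a short sketch. It assumes the usual least-squares formulas for the intercept a and slope b; the variable names and the helper `predict` are illustrative.

```python
years = [1, 2, 3, 4, 5]          # x from the example
members = [25, 30, 32, 45, 50]   # y from the example

n = len(years)
sum_x = sum(years)                                    # 15
sum_y = sum(members)                                  # 182
sum_x2 = sum(x * x for x in years)                    # 55
sum_xy = sum(x * y for x, y in zip(years, members))   # 611

denom = n * sum_x2 - sum_x ** 2                       # 50
a = (sum_y * sum_x2 - sum_x * sum_xy) / denom         # intercept
b = (n * sum_xy - sum_x * sum_y) / denom              # slope

def predict(x):
    """Predicted membership in year x from the fitted line y = a + b*x."""
    return a + b * x

print(a, b)                    # 16.9 6.5
print(round(predict(10), 1))   # 81.9
```

Year 10 lies well outside the observed range of years 1 to 5, so the figure of about 82 members rests on the assumption that the linear trend continues.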
THANKS!
Any questions?