Module 5 Stat
Content:
Correlation
Linear Regression
Multiple Regression
Objectives:
At the end of the lesson, the students are expected to:
1. Explain and illustrate when, why, and how to use Pearson r Correlation,
Linear Regression and Multiple Regression Analysis.
2. Formulate the hypothesis of the problems on Pearson r Correlation, Linear
Regression and Multiple Regression Analysis.
3. Analyze and solve problems on Pearson r Correlation, Linear Regression and
Multiple Regression Analysis.
4. Use SPSS or Excel to conduct statistical analysis.
5. Demonstrate their ability to interpret statistical outputs for decisions.
Introduction
In this module, we first discuss correlation analysis, which is used to quantify the
association between two continuous variables (e.g., between an independent and a
dependent variable, or between two independent variables). Regression analysis is a
related technique for assessing the relationship between an outcome variable and one or
more risk factors or confounding variables. The outcome variable is also called the
response or dependent variable, and the risk factors and confounders are called the
predictors, or explanatory or independent variables. In regression analysis, the
dependent variable is denoted by "y" and the independent variables are denoted by "x".
Correlation
If the sign of the correlation coefficient r is positive, the relationship is direct: an
increase in one variable is associated with an increase in the other variable, and a
decrease in one variable is associated with a decrease in the other variable.
If the sign is negative, the relationship is inverse (indirect): an increase in one
variable is associated with a decrease in the other.
Why do we use r?
When do we use r?
The formula,
$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}} $$
Where:
r = Pearson Product Moment Coefficient of Correlation
n = sample size
∑xy = the sum of the products of x and y
∑x∑y = the product of the sum of x and the sum of y
∑x² = sum of squares of x
∑y² = sum of squares of y
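The formula above can be sketched in a few lines of Python. This is only an illustration: the function name `pearson_r` and the demo scores are assumptions made for the example and are not data from this module.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson r from paired observations, using the computational formula."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least two values")
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    sum_y2 = sum(yi ** 2 for yi in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Hypothetical paired scores, for illustration only (not from the module).
x_demo = [1, 2, 3, 4, 5]
y_demo = [2, 4, 5, 4, 5]
r_demo = pearson_r(x_demo, y_demo)
print(f"r = {r_demo:.3f}")
```

A positive result indicates a direct relationship and a negative result an inverse one, as described earlier in this section.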
Example #1.
1. Problem:
Is there a significant relationship between the mid-term grades and the final grades of
8 MPM students in Statistics at 0.05 level of significance?
$$ r = \frac{8(59{,}218) - (667)(710)}{\sqrt{\left[8(55{,}683) - (667)^2\right]\left[8(63{,}092) - (710)^2\right]}} = \frac{174}{\sqrt{365{,}700}} = 0.288 $$
5. Decision:
If the absolute value of the computed r is greater than the tabular value of r, reject the
null hypothesis; otherwise, do not reject it.
6. Conclusion:
Since the computed value of r (0.288) is less than the tabular value of 0.707 at the 0.05
level of significance with 6 degrees of freedom (df = n − 2 = 8 − 2), the null hypothesis is
not rejected. This means that there is no significant relationship between the mid-term
grades and the final grades of the 8 MPM students in Statistics. The observed correlation is
positive but low.
Excel output (correlation matrix):
      x            y
x     1
y     0.28773107   1

SPSS output (Correlations):
                             MTG    FG
MTG   Pearson Correlation    1      .288
      Sig. (2-tailed)               .490
      N                      8      8
FG    N                      8      8
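The arithmetic and the decision in Example #1 can be checked with a short script. This is a sketch that uses only the totals printed in the solution; the function name `pearson_r_from_sums` is an assumption for illustration, and the tabular value 0.707 is taken from the conclusion of the example rather than computed.

```python
from math import sqrt


def pearson_r_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Pearson r computed directly from the column totals."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Totals as printed in the Example #1 solution.
n = 8
r = pearson_r_from_sums(n, sum_x=667, sum_y=710, sum_xy=59218,
                        sum_x2=55683, sum_y2=63092)

df = n - 2          # degrees of freedom for testing r
r_tabular = 0.707   # tabular value quoted in the example (not computed here)
reject_null = abs(r) > r_tabular

print(f"r = {r:.3f}, df = {df}, reject H0: {reject_null}")
```

The computed r agrees with the value in the software output, and the decision rule leads to not rejecting the null hypothesis.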
Linear Regression
Linear regression predicts the value of the dependent variable (y) given the independent
variable (x).
It is used when there is a relationship between the two variables x and y and we want to
predict the value of y given the value of x.
It is used because we are interested in expressing y in terms of x for forecasting and
prediction.
y = a + bx
The formula,
$$ b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} $$
$$ a = \bar{y} - b\bar{x} $$
Where:
y = dependent variable
x = independent variable
a = y-intercept
b = slope of the line
x̄ = mean of the x values
ȳ = mean of the y values
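As a minimal sketch, the slope and intercept formulas can be written as one function. The name `fit_line` and the demo data are assumptions for illustration and do not come from this module.

```python
def fit_line(x, y):
    """Return (a, b) for y = a + bx using the computational formulas."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least two values")
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
    a = sum_y / n - b * (sum_x / n)                               # intercept
    return a, b


# Hypothetical data, for illustration only (not from the module).
x_demo = [1, 2, 3, 4, 5]
y_demo = [2, 4, 5, 4, 5]
a_demo, b_demo = fit_line(x_demo, y_demo)
print(f"y = {a_demo:.2f} + {b_demo:.2f}x")
```

Because a is defined as ȳ − b·x̄, the fitted line always passes through the point (x̄, ȳ).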
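Example #2 (continuing Example #1). Suppose we want to predict the final grade (y) of an MPM
student whose mid-term grade (x) is 75.
Solution:
$$ \bar{x} = \frac{\sum x}{n} = \frac{667}{8} = 83.38 $$
$$ \bar{y} = \frac{\sum y}{n} = \frac{710}{8} = 88.75 $$
$$ b = \frac{8(59{,}218) - (667)(710)}{8(55{,}683) - (667)^2} = \frac{174}{575} = 0.303 $$
$$ a = \bar{y} - b\bar{x} = 88.75 - 0.303(83.38) = 63.49 $$
y = a + bx
y = 63.49 + 0.303x
y = 63.49 + 0.303(75) = 86.22 grade (passed)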
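The hand solution can be checked from the same totals. The sketch below carries unrounded values through the calculation, which gives an intercept of 63.52 (the figure reported in the software output of this module) instead of the 63.49 obtained by hand with b and x̄ rounded first. Variable names are illustrative.

```python
# Totals from Example #1 (n = 8 students).
n = 8
sum_x, sum_y = 667, 710
sum_xy, sum_x2 = 59218, 55683

b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # 174 / 575
a = sum_y / n - b * (sum_x / n)                               # y-bar minus b times x-bar

mid_term_grade = 75
predicted_final = a + b * mid_term_grade

print(f"b = {b:.3f}, a = {a:.2f}")
print(f"predicted final grade at x = {mid_term_grade}: {predicted_final:.2f}")
```

Either way, the predicted final grade for a mid-term grade of 75 is about 86.2.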
SPSS output (Coefficients), N = 8 for both y and x:
              B        Std. Error   t       Sig.
(Constant)    63.520   34.306       1.852   .114
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.287731075
R Square            0.082789171
Adjusted R Square   -0.0700793
Standard Error      3.486121552
Observations        8

               Coefficients   Standard Error   t Stat     P-value      Lower 95%     Upper 95%
Intercept      63.52          34.3059868       1.851572   0.11354248   -20.4237257   147.4637257
X Variable 1   0.302608696    0.411200465      0.735915   0.48954096   -0.7035626    1.308779986
y = a + bx
y = 63.52 + .303x
if x = 75
y = 63.52 + .303(75) = 86.25
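Several figures in the SUMMARY OUTPUT above can be reproduced from the totals of Example #1. The following is a sketch using the standard simple-regression formulas; the variable names are illustrative, and the P-values and confidence limits are left out because they need a t distribution that the Python standard library does not provide.

```python
from math import sqrt

# Totals from Example #1 (n = 8 students).
n = 8
sum_x, sum_y = 667, 710
sum_xy, sum_x2, sum_y2 = 59218, 55683, 63092

# Corrected (deviation) sums of squares and cross-products.
sxx = sum_x2 - sum_x ** 2 / n
syy = sum_y2 - sum_y ** 2 / n
sxy = sum_xy - sum_x * sum_y / n

slope = sxy / sxx
r_squared = sxy ** 2 / (sxx * syy)
adjusted_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - 2)
sse = syy * (1 - r_squared)          # residual sum of squares
standard_error = sqrt(sse / (n - 2))
slope_std_error = standard_error / sqrt(sxx)
slope_t_stat = slope / slope_std_error

print(f"R Square          = {r_squared:.6f}")
print(f"Adjusted R Square = {adjusted_r_squared:.6f}")
print(f"Standard Error    = {standard_error:.6f}")
print(f"Slope t Stat      = {slope_t_stat:.6f}")
```

The negative adjusted R Square is not an error: with an R Square this small and only 8 observations, the adjustment for sample size pushes the value below zero.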
Multiple Regression
Multiple linear regression is the most common form of linear regression analysis. As a
predictive analysis, multiple linear regression is used to explain the relationship
between one continuous dependent variable and two or more independent variables. The
independent variables can be continuous or categorical (dummy coded as appropriate).
We use it when we are predicting the value of the dependent variable (y) from two or more
independent variables, and when we want to determine whether a relationship exists between
the dependent variable (y) and the independent variables.
We use it because we want to know the extent of influence that the independent variables
have on the dependent variable, measured by the coefficient of determination (r² × 100%),
and whether the value of r is positive or negative.
Page 7 of 12 PM 212 (Research and Statistics)
How do we use Multiple Linear Regression Analysis?
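The formula,
$$ y = b_0 + b_1x_1 + b_2x_2 + \dots + b_nx_n $$
Where:
y = dependent variable
x1, x2, …, xn = independent variables
b0, b1, b2, …, bn = constants (b0 is the intercept; b1, b2, …, bn are the regression coefficients)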
Example #3.
The following are data on the ages and income of a random sample of 5 MFM students and their
academic performance. Use α = 0.05.
Student   Income (in thousand pesos) (y)   Age (x1)   Academic Achievement (x2)
1         89.5                             46         1.75
2         79                               40         1.75
3         81.7                             38         1.50
4         69.9                             32         2.50
5         73.3                             30         2.00
a) Find an equation of the form y = b0 + b1x1 + b2x2 of the given data.
b) Use the equation obtained in (a) to estimate the average income of a 37-year old
executive with 1.25 academic achievement.
Solution:
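Solving for b0, b1, and b2 using any method for a system of linear equations in three unknowns:
Equation 1: 5b0 + 186b1 + 9.5b2 = 393.4
Equation 2: 186b0 + 7084b1 + 347.5b2 = 14817.4
Equation 3: 9.5b0 + 347.5b1 + 18.62b2 = 738.78
Multiplying the right-hand side by the inverse of the coefficient matrix:
$$
\begin{bmatrix} b_0 \\ b_1 \\ b_2 \end{bmatrix}
=
\begin{bmatrix}
37.70872374 & -0.54821906 & -9.007881473 \\
-0.54821906 & 0.009640429 & 0.099786896 \\
-9.00788147 & 0.099786896 & 2.787267869
\end{bmatrix}
\begin{bmatrix} 393.4 \\ 14817.4 \\ 738.78 \end{bmatrix}
=
\begin{bmatrix} 56.59 \\ 0.90 \\ -5.94 \end{bmatrix}
$$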
b0 = 56.59
b1 = 0.90
b2 = -5.94
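Since any method for solving three linear equations may be used, one option is Gaussian elimination. The sketch below solves the three equations exactly as printed in the solution; the function name `solve_linear_system` is an assumption for illustration and is not part of this module.

```python
def solve_linear_system(matrix, rhs):
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Build the augmented matrix [A | rhs] without modifying the inputs.
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        # Move the row with the largest entry in this column into pivot position.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("the system is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Eliminate this column from the rows below the pivot.
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution, starting from the last row.
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - known) / aug[row][row]
    return solution


# Coefficients and right-hand sides of Equations 1 to 3 as printed.
normal_matrix = [
    [5, 186, 9.5],
    [186, 7084, 347.5],
    [9.5, 347.5, 18.62],
]
normal_rhs = [393.4, 14817.4, 738.78]

b0, b1, b2 = solve_linear_system(normal_matrix, normal_rhs)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, b2 = {b2:.2f}")
```

The result agrees with the values obtained through the matrix inverse after rounding to two decimal places.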
b) Use the equation obtained in (a) to estimate the average income of a 37-year-old
executive with 1.25 academic achievement.
y = 56.59 + 0.90(37) − 5.94(1.25) = 56.59 + 33.30 − 7.425 = 82.465, or about 82.5 thousand pesos
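The substitution for this part can be sketched as a small helper that evaluates y = b0 + b1x1 + b2x2 for any number of predictors. The function name `predict` is an assumption for illustration; the coefficients are the rounded values b0 = 56.59, b1 = 0.90 and b2 = -5.94 found in the solution.

```python
def predict(coefficients, predictors):
    """Evaluate b0 + b1*x1 + ... + bn*xn for one set of predictor values."""
    b0, *slopes = coefficients
    if len(slopes) != len(predictors):
        raise ValueError("one coefficient is needed for each predictor")
    return b0 + sum(b * x for b, x in zip(slopes, predictors))


# Rounded coefficients from the solution: b0, b1 (age), b2 (academic achievement).
coefficients = [56.59, 0.90, -5.94]

# A 37-year-old executive with an academic achievement of 1.25.
estimated_income = predict(coefficients, [37, 1.25])
print(f"estimated income: {estimated_income:.2f} thousand pesos")
```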
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.963277
R Square            0.927903
Adjusted R Square   0.855806
Standard Error      2.893977
Observations        5
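The regression statistics above can be reproduced from the raw data of Example #3. The following is a sketch using the deviation-score formulas for two predictors; the variable and function names are illustrative. Because it works from the unrounded data, its coefficients can differ slightly from the hand solution, which uses ∑x2² = 18.62 and ∑x2y = 738.78 in place of 18.625 and 738.775.

```python
from math import sqrt

# Data from the Example #3 table.
income = [89.5, 79, 81.7, 69.9, 73.3]         # y, in thousand pesos
age = [46, 40, 38, 32, 30]                    # x1
achievement = [1.75, 1.75, 1.50, 2.50, 2.00]  # x2

n = len(income)
k = 2  # number of independent variables


def centered_sum(u, v):
    """Sum of products of deviations from the means of u and v."""
    mean_u, mean_v = sum(u) / len(u), sum(v) / len(v)
    return sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v))


s11 = centered_sum(age, age)
s22 = centered_sum(achievement, achievement)
s12 = centered_sum(age, achievement)
s1y = centered_sum(age, income)
s2y = centered_sum(achievement, income)
syy = centered_sum(income, income)  # total sum of squares

# Slopes from the two normal equations written in deviation form.
det = s11 * s22 - s12 ** 2
b1 = (s1y * s22 - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det

ssr = b1 * s1y + b2 * s2y  # regression sum of squares
r_squared = ssr / syy
multiple_r = sqrt(r_squared)
adjusted_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)
standard_error = sqrt((syy - ssr) / (n - k - 1))

print(f"Multiple R        = {multiple_r:.6f}")
print(f"R Square          = {r_squared:.6f}")
print(f"Adjusted R Square = {adjusted_r_squared:.6f}")
print(f"Standard Error    = {standard_error:.6f}")
```

An R Square of about 0.93 means that age and academic achievement together account for roughly 93% of the variation in income in this sample of 5.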
Internet:
1. https://www.investopedia.com/terms/c/correlation.asp
2. https://www.statisticshowto.com/probability-and-statistics/correlation-coefficient-formula/
3. https://www.statisticssolutions.com/what-is-multiple-linear-regression/
Video :
1. https://www.youtube.com/watch?v=AjQA78tI39Q
2. https://www.youtube.com/watch?v=9WvINKYWpPE
3. https://www.youtube.com/watch?v=ZkjP5RJLQF4
4. https://www.youtube.com/watch?v=Qa2APhWjQPc
5. https://www.youtube.com/watch?v=rig4ZZ_cBZo
6. https://www.youtube.com/watch?v=6xcQYmPDqXs
7. https://www.youtube.com/watch?v=jGd2cj4K4Ww
8. https://www.youtube.com/watch?v=icipgz8T7dw
9. https://www.youtube.com/watch?v=ZyruUhomEnQ