Module 5 Stat


Module 5:

Parametric Tests : Correlation, Linear Regression, and Multiple Regression Analysis

CONTENT:
• Correlation
• Linear Regression
• Multiple Regression

Objectives:
At the end of the lesson, the students are expected to:
1. Explain and illustrate when, why, and how to use Pearson r Correlation,
Linear Regression and Multiple Regression Analysis.
2. Formulate the hypothesis of the problems on Pearson r Correlation, Linear
Regression and Multiple Regression Analysis.
3. Analyze and solve problems on Pearson r Correlation, Linear Regression and
Multiple Regression Analysis.
4. Use SPSS or Excel to conduct statistical analysis.
5. Demonstrate their ability to interpret statistical outputs for decisions.

Introduction

In this module, we first discuss correlation analysis, which is used to quantify the
association between two continuous variables (e.g., between an independent and a dependent
variable, or between two independent variables). Regression analysis is a related technique
used to assess the relationship between an outcome variable and one or more risk factors or
confounding variables. The outcome variable is also called the response or dependent
variable, and the risk factors and confounders are called the predictors, or explanatory or
independent variables. In regression analysis, the dependent variable is denoted by "y" and
the independent variables are denoted by "x".

Lecture Proper: 1. Correlation Analysis

Correlation

• Is a statistical technique that measures the degree of relationship between two variables.

Page 1 of 12 PM 212 (Research and Statistics)


• Is an index of the relationship between two variables. The value of the correlation
coefficient r ranges from -1 to +1. The magnitude of r denotes the strength of the
relationship, while the sign of r denotes the direction of the association.

• If the sign is positive, the relationship is direct (an increase in one variable is
associated with an increase in the other variable, and a decrease in one variable is
associated with a decrease in the other variable).

• If the sign is negative, the relationship is inverse or indirect (an increase in one
variable is associated with a decrease in the other).

• If r = 0, there is no linear association or correlation between the two variables.

▪ If 0 < |r| < 0.25 = weak correlation.
▪ If 0.25 ≤ |r| < 0.75 = moderate correlation.
▪ If 0.75 ≤ |r| < 1 = strong correlation.
▪ If |r| = 1 = perfect correlation.
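The interpretation scale above can be sketched as a small lookup. This is only an illustration of the cut-offs listed in this module; the function names `correlation_strength` and `correlation_direction` are hypothetical and not part of SPSS or Excel.

```python
def correlation_strength(r: float) -> str:
    """Label the strength of r using the cut-offs listed in this module."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)  # strength depends on the magnitude only
    if size == 0:
        return "no correlation"
    if size < 0.25:
        return "weak"
    if size < 0.75:
        return "moderate"
    if size < 1:
        return "strong"
    return "perfect"


def correlation_direction(r: float) -> str:
    """Label the direction of the relationship from the sign of r."""
    if r > 0:
        return "direct"
    if r < 0:
        return "inverse"
    return "none"


print(correlation_strength(0.288), correlation_direction(0.288))  # moderate direct
print(correlation_strength(-0.90), correlation_direction(-0.90))  # strong inverse
```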

Why do we use r?

• Because we want to analyze whether a relationship exists between two variables. If there
is a relationship between two variables, say x and y, then we can determine the extent to
which x influences y using the coefficient of determination (r²). We also use r because,
when its assumptions are met, it is a more powerful test of relationship than the
nonparametric alternatives.

When do we use r?

• We use r to determine the index of relationship between two variables.


How do we use r?

• The formula,

Pearson Product Moment Coefficient of Correlation (r)

$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} $$

Where:
r = Pearson Product Moment Coefficient of Correlation
n = sample size
∑xy = the sum of the products of x and y
∑x∑y = the product of the sum of x and the sum of y
∑x² = the sum of the squares of x
∑y² = the sum of the squares of y
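As a minimal sketch, the formula can be computed directly from the five sums it names. The function name `pearson_r` and the small data set of study hours and quiz scores are assumptions made up for illustration; they do not come from this module.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson r computed from the raw sums used in the formula."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    sum_y2 = sum(yi * yi for yi in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when a variable does not vary")
    return numerator / denominator


# Hypothetical data: hours studied and quiz scores of five students.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]
print(round(pearson_r(hours, scores), 4))  # 0.7746
```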

Example #1.

Consider the data below,

Variables               Students
                        1    2    3    4    5    6    7    8
Mid-Term Grades (x)     81   85   84   82   90   81   84   80
Final Grades (y)        85   90   87   86   92   95   87   88

1. Problem:
Is there a significant relationship between the mid-term grades and the final grades of
8 MPM students in Statistics at the 0.05 level of significance?

Mid-Term Grades (x)     Final Grades (y)
81                      85
85                      90
84                      87
82                      86
90                      92
81                      95
84                      87
80                      88



2. Hypotheses:
H0 : There is no significant relationship between the mid-term grades and the final
grades of 8 MPM students in Statistics.
H1 : There is a significant relationship between the mid-term grades and the final
grades of 8 MPM students in Statistics.

3. Level of significance:
α = 0.05
df = n – 2 = 8 – 2 = 6
tabular value of r = .707

4. Statistics: Pearson r

x       y       x²        y²        xy
81      85      6,561     7,225     6,885
85      90      7,225     8,100     7,650
84      87      7,056     7,569     7,308
82      86      6,724     7,396     7,052
90      92      8,100     8,464     8,280
81      95      6,561     9,025     7,695
84      87      7,056     7,569     7,308
80      88      6,400     7,744     7,040
667     710     55,683    63,092    59,218    (totals)

$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} $$

$$ r = \frac{8(59{,}218) - (667)(710)}{\sqrt{\left[8(55{,}683) - (667)^2\right]\left[8(63{,}092) - (710)^2\right]}} = \frac{174}{\sqrt{365{,}700}} = 0.288 $$

5. Decision:
If the computed value of r is greater than the tabular value of r, reject the null
hypothesis; otherwise, do not reject it.

6. Conclusion:
Since the computed value of r (0.288) is less than the tabular value of .707 at the 0.05
level of significance with 6 degrees of freedom, the null hypothesis is not rejected. This
means that there is no significant relationship between the mid-term grades and the final
grades of 8 MPM students in Statistics. Although r = 0.288 falls in the moderate range of the
scale given earlier, it is not statistically significant for a sample of this size.
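The decision can be cross-checked with the equivalent t test for a correlation coefficient, t = r√(n − 2) / √(1 − r²). The sketch below reuses the numerator (174) and denominator (√365,700) from the worked solution. The critical t of 2.447 (two-tailed, α = 0.05, df = 6) is taken from a standard t table and is an assumption here, since this module lists only the tabular r.

```python
from math import sqrt

n = 8                      # number of students in Example #1
r = 174 / sqrt(365_700)    # numerator and denominator from the worked solution
df = n - 2

# t statistic that corresponds to the computed r
t_statistic = r * sqrt(df) / sqrt(1 - r ** 2)

# Assumption: two-tailed critical t for alpha = 0.05 and df = 6 (standard t table).
T_CRITICAL = 2.447
# The tabular r follows from the critical t: r = t / sqrt(t^2 + df)
r_critical = T_CRITICAL / sqrt(T_CRITICAL ** 2 + df)

reject_h0 = abs(r) > r_critical
print(f"r = {r:.3f}, t = {t_statistic:.3f}, critical r = {r_critical:.3f}, reject H0: {reject_h0}")
# r = 0.288, t = 0.736, critical r = 0.707, reject H0: False
```

The t value of 0.736 agrees with the t reported for x in the regression outputs of this module, which test the same relationship.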

Results: SPSS Output

Correlations
                              MTG      FG
MTG   Pearson Correlation     1        .288
      Sig. (2-tailed)                  .490
      N                       8        8
FG    Pearson Correlation     .288     1
      Sig. (2-tailed)         .490
      N                       8        8

Results: EXCEL Output

        x             y
x       1
y       0.28773107    1



Lecture Proper: 2. Simple Linear Regression Analysis

Simple Linear Regression Analysis

• Predicts the value of the dependent variable (y) given the value of the independent
variable (x).

When do we use the simple linear regression analysis?

• It is used when there is a relationship between the two variables x and y. It predicts the
value of y given the value of x.

Why do we use linear regression analysis?

• It is used because we are interested in predicting the value of y in terms of x. It is used
for forecasting and prediction.

How do we use linear regression analysis?

• The formula,

$$ y = a + bx $$

$$ b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} $$

$$ a = \bar{y} - b\bar{x} $$

Where:
y = dependent variable
x = independent variable
a = y-intercept
b = slope of the line
x̄ = mean of the x values
ȳ = mean of the y values

Example #2 (continuing Example #1). Suppose we want to predict the final grade (y) of an MPM
student whose mid-term grade is 75.

Solution:


Statistics: simple linear regression analysis

x       y       x²        y²        xy
81      85      6,561     7,225     6,885
85      90      7,225     8,100     7,650
84      87      7,056     7,569     7,308
82      86      6,724     7,396     7,052
90      92      8,100     8,464     8,280
81      95      6,561     9,025     7,695
84      87      7,056     7,569     7,308
80      88      6,400     7,744     7,040
667     710     55,683    63,092    59,218    (totals)

$$ b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} = \frac{8(59{,}218) - (667)(710)}{8(55{,}683) - (667)^2} = \frac{174}{575} = .303 $$

$$ \bar{x} = \frac{\sum x}{n} = \frac{667}{8} = 83.38 $$

$$ \bar{y} = \frac{\sum y}{n} = \frac{710}{8} = 88.75 $$

a = ȳ − b x̄ = 88.75 − .303(83.38) = 63.49

y = a + bx
y = 63.49 + .303x
y = 63.49 + .303(75) = 86.22 grade (passed)
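The hand computation above can be reproduced with a short sketch that keeps full precision instead of rounding b and x̄ first. The function name `fit_line` is hypothetical. Note also that x = 75 lies below the observed mid-term grades (80 to 90), so the prediction is an extrapolation and should be read with caution.

```python
mid_term = [81, 85, 84, 82, 90, 81, 84, 80]   # x, from Example #1
final = [85, 90, 87, 86, 92, 95, 87, 88]      # y, from Example #1


def fit_line(x, y):
    """Return (a, b) of the least-squares line y = a + b*x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    intercept = sum_y / n - slope * sum_x / n   # a = mean(y) - b * mean(x)
    return intercept, slope


a, b = fit_line(mid_term, final)
predicted = a + b * 75
print(f"a = {a:.2f}, b = {b:.3f}, predicted y at x = 75: {predicted:.2f}")
# a = 63.52, b = 0.303, predicted y at x = 75: 86.22
```

With full precision the intercept is 63.52 rather than 63.49; the hand value differs only because b and x̄ were rounded before a was computed.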

Results: SPSS Output

Descriptive Statistics
       Mean       Std. Deviation    N
y      88.7500    3.37004           8
x      83.3750    3.20435           8

Correlations
                            y        x
Pearson Correlation    y    1.000    .288
                       x    .288     1.000
Sig. (1-tailed)        y    .        .245
                       x    .245     .
N                      y    8        8
                       x    8        8

Model Summary
Model    R          R Square    Adjusted R Square    Std. Error of the Estimate
1        .288(a)    .083        -.070                3.48612
a. Predictors: (Constant), x

Coefficients(a)
                    Unstandardized Coefficients    Standardized Coefficients
Model               B          Std. Error          Beta                         t        Sig.
1   (Constant)      63.520     34.306                                           1.852    .114
    x               .303       .411                .288                         .736     .490
a. Dependent Variable: y

y = a + bx
y = 63.52 + .303x
if x = 75
y = 63.52 + .303(75) = 86.25

Results: Excel Output

SUMMARY OUTPUT

Regression Statistics
Multiple R            0.287731075
R Square              0.082789171
Adjusted R Square     -0.0700793
Standard Error        3.486121552
Observations          8

                Coefficients    Standard Error    t Stat      P-value       Lower 95%      Upper 95%
Intercept       63.52           34.3059868        1.851572    0.11354248    -20.4237257    147.4637257
X Variable 1    0.302608696     0.411200465       0.735915    0.48954096    -0.7035626     1.308779986

y = a + bx
y = 63.52 + .303x
if x = 75
y = 63.52 + .303(75) = 86.25
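The Regression Statistics reported by SPSS and Excel can be recomputed from the raw grades. The sketch below is one way to do so under the usual least-squares definitions (R² = 1 − SSE/SST, adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1), standard error = √(SSE/(n − k − 1))); the variable names are illustrative only.

```python
from math import sqrt

x = [81, 85, 84, 82, 90, 81, 84, 80]   # mid-term grades
y = [85, 90, 87, 86, 92, 95, 87, 88]   # final grades
n, k = len(x), 1                       # k = number of independent variables

mean_x, mean_y = sum(x) / n, sum(y) / n
s_xx = sum((xi - mean_x) ** 2 for xi in x)
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

b = s_xy / s_xx              # slope
a = mean_y - b * mean_x      # intercept

# Sum of squared residuals (SSE) and total sum of squares (SST)
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
sst = sum((yi - mean_y) ** 2 for yi in y)

r_squared = 1 - sse / sst
adjusted_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)
standard_error = sqrt(sse / (n - k - 1))

print(f"R Square = {r_squared:.4f}")
print(f"Adjusted R Square = {adjusted_r_squared:.4f}")
print(f"Standard Error = {standard_error:.4f}")
```

An R Square of about 0.083 means that mid-term grades account for only about 8.3% of the variation in final grades in this sample, which is in line with the non-significant r found in Example #1.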

Lecture Proper: 3. Multiple Linear Regression Analysis

What is Multiple Linear Regression Analysis?

• Multiple linear regression is the most common form of linear regression analysis. As a
predictive analysis, multiple linear regression is used to explain the relationship
between one continuous dependent variable and two or more independent variables. The
independent variables can be continuous or categorical (dummy coded as appropriate).

When do we use Multiple Linear Regression Analysis?

• When we are predicting the value of the dependent variable (y) from two or more
independent variables.

• When we want to determine whether a relationship exists between the dependent variable (y)
and the independent variables.

Why do we use Multiple Linear Regression Analysis?

• Because we want to know the extent of influence that the independent variables have on
the dependent variable through the coefficient of determination (R² x 100%), and to know
whether the relationship of each independent variable with y is positive or negative.
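How do we use Multiple Linear Regression Analysis?

• The formula,

$$ y = b_0 + b_1x_1 + b_2x_2 + b_3x_3 + \cdots + b_kx_k $$

Where:
y = dependent variable
x1, x2, …, xk = independent variables
b0 = constant (y-intercept)
b1, b2, …, bk = regression coefficients

For two independent variables and a sample of size n, the values of b0, b1, and b2 are
obtained by solving the following three equations:

$$ \sum y = nb_0 + b_1\sum x_1 + b_2\sum x_2 $$

$$ \sum x_1y = b_0\sum x_1 + b_1\sum x_1^2 + b_2\sum x_1x_2 $$

$$ \sum x_2y = b_0\sum x_2 + b_1\sum x_1x_2 + b_2\sum x_2^2 $$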
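The three equations for the two-variable case can also be written as one matrix equation, which is the form a spreadsheet or matrix routine works with. This is only a restatement of the same equations, not an additional method:

```latex
\begin{bmatrix}
n        & \sum x_1     & \sum x_2 \\
\sum x_1 & \sum x_1^2   & \sum x_1x_2 \\
\sum x_2 & \sum x_1x_2  & \sum x_2^2
\end{bmatrix}
\begin{bmatrix} b_0 \\ b_1 \\ b_2 \end{bmatrix}
=
\begin{bmatrix} \sum y \\ \sum x_1y \\ \sum x_2y \end{bmatrix}
```

Multiplying both sides by the inverse of the 3 x 3 matrix gives b0, b1, and b2, provided the matrix is invertible (for example, x1 and x2 must not be perfectly correlated).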

Example #3.

The following are data on the ages and income of a random sample of 5 MFM students and their
academic performance. Use α = 0.05.

Student    Income, in thousand pesos (y)    Age (x1)    Academic Achievement (x2)
1          89.5                             46          1.75
2          79                               40          1.75
3          81.7                             38          1.50
4          69.9                             32          2.50
5          73.3                             30          2.00

a) Find an equation of the form y = b0 + b1x1 + b2x2 for the given data.

b) Use the equation obtained in (a) to estimate the average income of a 37-year-old
executive with an academic achievement of 1.25.

Solution:

Student    y        x1     x2      x1²      x2²       x1y         x2y       x1x2
1          89.5     46     1.75    2,116    3.0625    4,117       156.63    80.5
2          79       40     1.75    1,600    3.0625    3,160       138.25    70
3          81.7     38     1.5     1,444    2.25      3,105       122.55    57
4          69.9     32     2.5     1,024    6.25      2,237       174.75    80
5          73.3     30     2       900      4         2,199       146.60    60
Total      393.4    186    9.5     7,084    18.62     14,817.4    738.78    347.5

Solving for b0, b1, b2 using the 3 equations in Excel:

1) ∑y = nb0 + b1∑x1 + b2∑x2
2) ∑x1y = (∑x1)b0 + (∑x1²)b1 + (∑x1x2)b2
3) ∑x2y = (∑x2)b0 + (∑x1x2)b1 + (∑x2²)b2

1) 393.4 = 5b0 + 186b1 + 9.5b2
2) 14817.4 = 186b0 + 7084b1 + 347.5b2
3) 738.78 = 9.5b0 + 347.5b1 + 18.62b2

Solving for b0, b1, and b2 using any method for systems of linear equations in three unknowns:

              b0      b1       b2        Constant
Equation 1    5       186      9.5       393.4
Equation 2    186     7084     347.5     14817.4
Equation 3    9.5     347.5    18.62     738.78

In matrix form:

$$ \begin{bmatrix} 5 & 186 & 9.5 \\ 186 & 7084 & 347.5 \\ 9.5 & 347.5 & 18.62 \end{bmatrix} \begin{bmatrix} b_0 \\ b_1 \\ b_2 \end{bmatrix} = \begin{bmatrix} 393.4 \\ 14817.4 \\ 738.78 \end{bmatrix} $$

Multiplying the constants by the inverse of the coefficient matrix:

$$ \begin{bmatrix} b_0 \\ b_1 \\ b_2 \end{bmatrix} = \begin{bmatrix} 37.70872374 & -0.54821906 & -9.007881473 \\ -0.54821906 & 0.009640429 & 0.099786896 \\ -9.00788147 & 0.099786896 & 2.787267869 \end{bmatrix} \begin{bmatrix} 393.4 \\ 14817.4 \\ 738.78 \end{bmatrix} = \begin{bmatrix} 56.59 \\ 0.90 \\ -5.94 \end{bmatrix} $$

b0 = 56.59
b1 = 0.90
b2 = -5.94

y = b0 + b1x1 + b2x2

y = 56.59 + 0.90(Age) – 5.94(Academic Achievement)

b) Use the equation obtained in (a) to estimate the average income of a 37-year-old
executive with an academic achievement of 1.25.

y = 56.59 + 0.90(Age) – 5.94(Academic Achievement)

= 56.59 + 0.90(37) – 5.94(1.25)

= 82.465 x 1,000 = P82,465.00
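The system of three equations can also be solved outside Excel. The sketch below uses plain Gaussian elimination with partial pivoting; the helper name `solve_linear_system` is hypothetical, and any standard method for three unknowns would serve equally well. It solves the system twice: once with the totals as rounded in the table (18.62 and 738.78), and once with the unrounded totals ∑x2² = 18.625 and ∑x2y = 738.775.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * b = rhs by Gaussian elimination with partial pivoting."""
    size = len(rhs)
    # Work on an augmented copy so the caller's lists are not modified.
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("the system has no unique solution")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, size):
            factor = aug[row][col] / aug[col][col]
            for j in range(col, size + 1):
                aug[row][j] -= factor * aug[col][j]
    # Back substitution, from the last unknown to the first.
    solution = [0.0] * size
    for row in range(size - 1, -1, -1):
        known = sum(aug[row][j] * solution[j] for j in range(row + 1, size))
        solution[row] = (aug[row][size] - known) / aug[row][row]
    return solution


# Totals exactly as rounded in the worked solution.
rounded = solve_linear_system(
    [[5, 186, 9.5], [186, 7084, 347.5], [9.5, 347.5, 18.62]],
    [393.4, 14817.4, 738.78],
)
# Same system with the unrounded totals.
exact = solve_linear_system(
    [[5, 186, 9.5], [186, 7084, 347.5], [9.5, 347.5, 18.625]],
    [393.4, 14817.4, 738.775],
)
print("rounded totals:", [round(b, 2) for b in rounded])
print("exact totals:  ", [round(b, 2) for b in exact])
```

The two solutions differ noticeably in b0 and b2, which shows how sensitive this small system is to rounding in the column totals.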


Results: EXCEL Output

SUMMARY OUTPUT

Regression Statistics
Multiple R            0.963277
R Square              0.927903
Adjusted R Square     0.855806
Standard Error        2.893977
Observations          5

                             Coefficients    Standard Error    t Stat      P-value     Lower 95%    Upper 95%    Lower 95.0%    Upper 95.0%
Intercept                    56.36872        17.67663          3.188884    0.085862    -19.6877     132.4251     -19.6877       132.4251
Age (x1)                     0.899708        0.283423          3.17444     0.086547    -0.31976     2.119177     -0.31976       2.119177
Academic Achievement (x2)    -5.87256        4.798209          -1.22391    0.345602    -26.5176     14.77246     -26.5176       14.77246

Results: SPSS Output

Model Summary
Model    R          R Square    Adjusted R Square    Std. Error of the Estimate
1        .963(a)    .928        .856                 2.89398
a. Predictors: (Constant), Achievement, Age

Coefficients(a)
                     Unstandardized Coefficients    Standardized Coefficients
Model                B          Std. Error          Beta                         t         Sig.
1   (Constant)       56.369     17.677                                           3.189     .086
    Age              .900       .283                .758                         3.174     .087
    Achievement      -5.873     4.798               -.292                        -1.224    .346
a. Dependent Variable: Income

y = b0 + b1x1 + b2x2

y = 56.37 + 0.90(Age) – 5.87(Academic Achievement)

These software coefficients differ slightly from the hand solution (56.59, 0.90, -5.94)
because the hand solution used column totals rounded to two decimal places.
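As a check on the software output, the sketch below plugs the coefficients reported by Excel into the equation, recomputes R Square, Adjusted R Square, and the Standard Error from the residuals, and repeats the estimate for a 37-year-old with an academic achievement of 1.25. The helper name `predict_income` is hypothetical.

```python
from math import sqrt

income = [89.5, 79, 81.7, 69.9, 73.3]          # y, in thousand pesos
age = [46, 40, 38, 32, 30]                     # x1
achievement = [1.75, 1.75, 1.50, 2.50, 2.00]   # x2

# Coefficients as reported in the Excel output of this module.
B0, B1, B2 = 56.36872, 0.899708, -5.87256


def predict_income(x1, x2):
    """Estimated income (thousand pesos) from y = b0 + b1*x1 + b2*x2."""
    return B0 + B1 * x1 + B2 * x2


n, k = len(income), 2                          # k = number of independent variables
mean_y = sum(income) / n
sse = sum((y - predict_income(x1, x2)) ** 2
          for y, x1, x2 in zip(income, age, achievement))
sst = sum((y - mean_y) ** 2 for y in income)

r_squared = 1 - sse / sst
adjusted_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)
standard_error = sqrt(sse / (n - k - 1))
estimate = predict_income(37, 1.25)

print(f"R Square = {r_squared:.3f}, Adjusted R Square = {adjusted_r_squared:.3f}")
print(f"Standard Error = {standard_error:.3f}")
print(f"Estimated income = {estimate:.3f} thousand pesos")
```

With the software coefficients the estimate is about 82.3 thousand pesos, slightly below the hand-computed P82,465.00, again because of rounding in the hand solution.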


