WEEK 8 Regression Analysis


11

Self-Learning Kit
Statistics & Probability
Quarter 4 - Week 8

ALVIN M. TAMPOS
Writer
First Edition, 2020

Republic Act 8293, section 176 states that: No copyright shall subsist in any work of the Government
of the Philippines. However, prior approval of the government agency or office wherein the work is created
shall be necessary for exploitation of such work for profit. Such agency or office may, among other things,
impose as a condition the payment of royalties.

Borrowed materials (i.e., songs, stories, poems, pictures, photos, brand names, trademarks, etc.)
included in this Self-Learning Kit are owned by their respective copyright holders. Every effort has been exerted
to locate and seek permission to use these materials from their respective copyright owners. The publisher
and authors do not represent nor claim ownership over them.

Published by the Department of Education


Secretary: Leonor Magtolis Briones
Undersecretary: Diosdado M. San Antonio

Development Team of the Self-Learning Kit

Writer: Alvin M. Tampos


Editors: Mary Ann E. Ramos
Reviewers: Mary Ann E. Ramos
Illustrator: John Orven V. Saldaña
Layout Artist: Joel R. Capuyan
Management Team: Leah P. Noveras, Ed.D., CESO VI
Bernadette A. Susvilla, Ed.D., CESO VI
Lilia R. Ybañez
Mary Ann E. Ramos
Reynilda G. Ramoneda
Joel R. Capuyan
Raymond L. Ceniza

Printed in the Philippines by ________________________

Department of Education – Region VII Schools Division of Danao City

Office Address: Sitio Upland, National Road, Danao City, Cebu


Telephone No. (032) 262-6211
Telefax: [email protected]
E-mail Address: depeddanaocity.com

Note to the Learner

This Self-Learning Kit is prepared for you to learn the specified competencies
based on the Most Essential Learning Competencies (MELC) for Statistics and
Probability, Quarter 4, Week 8. It is designed in a simplified structure to help you easily
understand the lesson for the week. It contains the following parts:

I Have Known: Includes an activity that aims to check what you already know about this lesson.
I Can Connect: Consists of activities that will help you review the previous lesson and prepare you for the new one.
I Can Learn: Details the presentation and discussion of the concepts that you need to learn in this new lesson.
I Can Try: Comprises activities for independent practice to check understanding of the new concepts learned.
I Can Assess: Contains exercises to validate your knowledge and understanding of the concepts learned.
I Can Do More: Covers activities and exercises that you can do further to enrich your learning.
Answer Key: Contains the key to correction of all the exercises.
References: Indicates the sources used in the development of this Self-Learning Kit (SLK).

Lesson Title: Regression Analysis
Learning Competency: Solves problems involving regression analysis.
MELC Code: M11/12SP-IVj-2

I Have Known

A. Directions: Answer the following exercises. Choose the letter of the correct
answer.

1. The regression line Y' = bX + a is also called:
   A. The line prediction equation
   B. The line progression equation
   C. The line direction equation
   D. The line deviation equation
2. The field of statistics that deals with prediction is called:
   A. T-test          C. Chi-square
   B. Regression      D. ANOVA
3. When performing regression analysis, we must determine the _________
   and the ____________.
   A. Dependent and independent variable
   B. Direction and strength
   C. Slope and y-intercept
   D. Both A and C
4. When the trend line is drawn, we observe that some of the points are on
   the line while others are below or above the line. In other words, we say
   that the points in the scatterplot regress with reference to the line. If the
   average y-distance of the points from this line is the least, then we call this
   line the:
   A. Slope               C. Direction
   B. Regression line     D. All of the above

B. Directions: Solve what is being asked:


5. Determine the slope and y-intercept of each of the following linear equations.
   A. Y = 3x − 5     B. Y = 2x/3 + 2     C. Y = −4x

6. Two variables X & Y are related by the line Y=3x-5. Solve for the value of Y
given the following values of X:
A. X= 6
B. X=10
C. X=36

I Can Connect

In the last lesson, we learned that when the trend line is drawn, some of the
points are on the line while others are below or above it. In other words, we say
that the points in the scatterplot regress with reference to the line. If the average
y-distance of the points from this line is the least, then we call this line the
regression line, or the line that "best fits" the scatterplot. The regression line is the
same as the trend line.

To find the regression line, we use the least-squares method, which is
summarized in a formula. Like the equation of a line in algebra, we write the
equation of the regression line in slope-intercept form.

The Regression Line (The Line of Best Fit)

The equation Y' = bX + a is the equation of the regression line, where a is the
y-intercept and b is the slope of the regression line. The values of a and b can be
found using the following formulas.

$$a = \frac{(\sum Y)(\sum X^2) - (\sum X)(\sum XY)}{n(\sum X^2) - (\sum X)^2}$$

$$b = \frac{n(\sum XY) - (\sum X)(\sum Y)}{n(\sum X^2) - (\sum X)^2}$$

The regression line Y’=bX +a is also called the line prediction equation because
we use it to predict Y if X is known. Since in the analysis, only the y distance was
considered, the line cannot be used to predict X from Y.
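
The two formulas for a and b can be tried out in a few lines of Python. This is only a minimal sketch: the function names `regression_coefficients` and `predict` and the small data set are illustrative assumptions, not part of this kit.

```python
def regression_coefficients(xs, ys):
    """Return (a, b) for the line Y' = bX + a using the sum formulas."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    # Shared denominator: n(sum of X^2) - (sum of X)^2
    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all X values are equal, so the slope is undefined")
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denom
    b = (n * sum_xy - sum_x * sum_y) / denom
    return a, b


def predict(a, b, x):
    """Predict Y' for a known X using Y' = bX + a."""
    return b * x + a


# Made-up data that lie exactly on the line Y = 2X + 1
a, b = regression_coefficients([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)              # 1.0 2.0
print(predict(a, b, 5))  # 11.0
```

Because the sample points lie exactly on a line, the formulas recover its y-intercept and slope exactly; with real data the line only approximates the points.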

To determine the regression line or do a regression analysis, we go through the
following steps.

1. Find the value of the correlation coefficient (r).
2. Test the significance of r. If r is significant, proceed to regression analysis
   (proceed to step 3). If r is not significant, regression analysis cannot be
   done (stop).
3. Find the values of a and b.
4. Substitute the values of a and b in the regression line Y' = bX + a.
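
Steps 1 and 2 can be sketched in Python as follows. This is a rough illustration, not a prescribed method: the function names, the parameter `critical_t` (the value you look up in a t-table for n − 2 degrees of freedom), and the small data set are all assumptions made for the example. The test statistic used is t = r√((n − 2)/(1 − r²)), the same formula applied later in this kit.

```python
import math


def pearson_r(xs, ys):
    """Correlation coefficient r computed from the column sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


def t_statistic(r, n):
    """Test statistic t = r * sqrt((n - 2) / (1 - r^2))."""
    if abs(r) >= 1:
        # A perfect correlation makes the denominator zero
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1 - r * r))


def is_significant(r, n, critical_t):
    """Two-tailed decision: significant when |t| exceeds the table value."""
    return abs(t_statistic(r, n)) > critical_t


# Made-up data; for df = 5 - 2 = 3 and alpha = 0.05 (two-tailed),
# the t-table gives a critical value of 3.182.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
if is_significant(r, len(xs), critical_t=3.182):
    print("r is significant: proceed to step 3")
else:
    print("r is not significant: stop")
```

For this made-up data, r is about 0.77 and t is about 2.12, which is below 3.182, so the procedure stops at step 2.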

I Can Learn

In this lesson, we will further understand regression by solving more problems
involving regression. You will also learn how to perform regression analysis
using Microsoft Excel.

SOLVING WORD PROBLEMS IN REGRESSION ANALYSIS

Example 1. The following data show the grades of selected Grade 11 Senior High
School students in General Mathematics and Earth and Life Science.
a. Test if there is a significant relationship between the two variables at the 95%
level of confidence.
b. Predict the Earth & Life Science grade of a student with a grade of 87 in
General Mathematics.

Student   Grade in General Mathematics   Grade in Earth & Life Science
1         85                             86
2         87                             85
3         80                             81
4         79                             80
5         88                             87
6         89                             88
Steps and Solution

1. Identify the dependent and independent variables.
   Here, the dependent variable (Y) is the grade in Earth & Life Science, while the
   independent variable (X) is the grade in General Mathematics.

2. Compute the correlation coefficient (r).
   Put the data in columns, find ∑X, ∑Y, ∑X², ∑Y², and ∑XY, and substitute them
   in the formula:

   $$r = \frac{n\sum XY - \sum X \cdot \sum Y}{\sqrt{[n\sum X^2 - (\sum X)^2][n\sum Y^2 - (\sum Y)^2]}}$$

   X      Y      X²      Y²      XY
   85     86     7225    7396    7310
   87     85     7569    7225    7395
   80     81     6400    6561    6480
   79     80     6241    6400    6320
   88     87     7744    7569    7656
   89     88     7921    7744    7832

   ∑X = 508   ∑Y = 507   ∑X² = 43100   ∑Y² = 42895   ∑XY = 42993

   $$r = \frac{6(42993) - (508)(507)}{\sqrt{[6(43100) - (508)^2][6(42895) - (507)^2]}} = \frac{402}{\sqrt{(536)(321)}}$$

   r = 0.97
3. Test the significance of r using the formula:

   $$t = r\sqrt{\frac{n-2}{1-r^2}}$$

   Here n = 6 and r = 0.97:

   $$t = 0.97\sqrt{\frac{6-2}{1-(0.97)^2}} = 7.98$$

4. Compare the computed t-value to the critical t-value.
   Using df = n − 2 = 6 − 2 = 4 and α = 0.05 for a two-tailed test, we find from
   the table that the critical value of t is 2.776.

5. Make a decision.
   Since the computed t = 7.98 is greater than the critical value t = 2.776, we
   reject the null hypothesis. So, there is a significant relationship between the
   two variables.

6. Summarize the results.
   There is a significant relationship between the grades in General Mathematics
   and Earth & Life Science. Thus, we proceed to regression analysis.
7. Compute the values of a and b in the regression equation Y' = bX + a.
   Using the values obtained in step 2, we have the following:

   $$a = \frac{(\sum Y)(\sum X^2) - (\sum X)(\sum XY)}{n(\sum X^2) - (\sum X)^2} = \frac{(507)(43100) - (508)(42993)}{6(43100) - (508)^2}$$

   a = 21

   $$b = \frac{n(\sum XY) - (\sum X)(\sum Y)}{n(\sum X^2) - (\sum X)^2} = \frac{(6)(42993) - (508)(507)}{6(43100) - (508)^2}$$

   b = 0.75
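8. Form the regression equation.
   Substitute the values of a and b in the equation.

   Y' = bX + a
   Y' = 0.75X + 21

   The regression equation for predicting the grade in Earth & Life Science given
   the grade in General Mathematics is Y' = 0.75X + 21. For a student with a grade
   of 87 in General Mathematics, the predicted grade in Earth & Life Science is
   Y' = 0.75(87) + 21 = 86.25.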
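
The arithmetic in this example can be double-checked with a short Python script. This is only a sketch for verification: the variable names are illustrative, and X is taken to be the General Mathematics grade and Y the Earth & Life Science grade, as in the data table.

```python
import math

math_grades = [85, 87, 80, 79, 88, 89]     # X
science_grades = [86, 85, 81, 80, 87, 88]  # Y

n = len(math_grades)
sum_x = sum(math_grades)                                         # 508
sum_y = sum(science_grades)                                      # 507
sum_x2 = sum(x * x for x in math_grades)                         # 43100
sum_y2 = sum(y * y for y in science_grades)                      # 42895
sum_xy = sum(x * y for x, y in zip(math_grades, science_grades)) # 42993

# Building blocks shared by the formulas for r, a, and b
sxy = n * sum_xy - sum_x * sum_y   # 402
sxx = n * sum_x2 - sum_x ** 2      # 536
syy = n * sum_y2 - sum_y ** 2      # 321

r = sxy / math.sqrt(sxx * syy)
b = sxy / sxx
a = (sum_y * sum_x2 - sum_x * sum_xy) / sxx
predicted = b * 87 + a             # predicted grade when X = 87

print(round(r, 2))   # 0.97
print(a, b)          # 21.0 0.75
print(predicted)     # 86.25
```

The script reproduces r = 0.97, a = 21, and b = 0.75, and evaluates the regression equation at X = 87.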
REGRESSION ANALYSIS USING MICROSOFT EXCEL:

Using Example 1 Data:

Step 1: Open Microsoft Excel and Paste the given data. See image below.

Step 2: Click the Data tab, then click Data Analysis. (The Data Analysis button
appears only when the Analysis ToolPak add-in is enabled.)

Step 3: After clicking Data Analysis this window will pop-up. Scroll down and select
Regression then click OK.

Step 4: After clicking OK, this window will appear. Supply the Input Y Range with your
dependent variable and the Input X Range with your independent variable. See image
below.

How to input the X and Y ranges: Click the blank box after Input Y Range, then
select the range of your Y values, in this case C2:C7. Next, click the blank box
after Input X Range, then select the range of your X values, in this case B2:B7.

Step 5: Tick the Confidence Level box and enter the percentage. Then click OK.
This window will then appear.

Compare the computed t-value to the critical t-value. Using df = n − 2 = 6 − 2 = 4 and
α = 0.05 for a two-tailed test, we find from the table that the critical value of t is
2.776. Since the computed t = 7.864 is greater than the critical t = 2.776, we reject
the null hypothesis. So, there is a significant relationship between the two variables.
Another way to identify if there is a significant relationship between the two
variables is by checking the Significance F value. Since the Significance F value
(0.00) is less than the 0.05 level of significance (α), we can say that there is a
significant relationship between the two variables.

Step 6: Find the values of a and b in the regression equation.

a. Highlight the X-values and Y-values.

b. Go to the Insert tab and insert a scatter plot.

c. At the upper right of the chart, click the plus (+) sign to display more
options. Choose Trendline.

d. Go to More Options.

e. Lastly, check "Display Equation on chart". The equation shown is the
equation of the regression line.
Note: There is a small difference between the value of t computed using the formula
and the value from Excel. This is because the value of r in the manual computation
was already rounded to two decimal places.
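
The effect of rounding r can be seen by computing t both ways. The short Python sketch below uses the column sums from Example 1; the function and variable names are illustrative assumptions.

```python
import math


def t_statistic(r, n):
    """Test statistic t = r * sqrt((n - 2) / (1 - r^2))."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


n = 6
# r computed from the column sums of Example 1, kept unrounded
numerator = 6 * 42993 - 508 * 507
denominator = math.sqrt((6 * 43100 - 508 ** 2) * (6 * 42895 - 507 ** 2))
r_unrounded = numerator / denominator
r_rounded = round(r_unrounded, 2)   # 0.97, as used in the manual solution

print(f"{t_statistic(r_rounded, n):.2f}")    # 7.98
print(f"{t_statistic(r_unrounded, n):.3f}")  # 7.864
```

With r rounded to 0.97 the result is 7.98, while the unrounded r gives 7.864, the value reported by Excel. Either way, t is well above the critical value of 2.776, so the decision does not change.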

I Can Try

Directions: Solve the following using Microsoft Excel.

1. The following data pertains to the heights of fathers and their eldest sons in
inches. If there is a significant relationship between the two variables, predict
the height of the son if the height of his father is 78 inches. Use 0.05 as the level
of significance.

Height of the Father   Height of the Son
71                     71
69                     69
69                     71
65                     68
66                     68
63                     66
68                     70
70                     72
60                     65
58                     60
a. Determine whether there is a significant relationship between the two
variables using Excel.
b. Solve for a and b, if applicable.
c. Predict the height of the son if the height of his father is 68 inches.

I Can Assess

Directions: Read and understand the questions below, then answer them. Show your
solution on a separate piece of paper.

1. The data below give the grades of students and the number of hours they spend
studying per day. Use 0.05 as the level of significance. Solve for the following:
a. Find the value of r.
b. Test if there is a significant relationship between the two variables using
Microsoft Excel. (Proceed to c and d, if applicable.)
c. Solve for a and b.
d. Predict the grade of a student if the number of hours spent studying
per day is 4 hours.

Hours Spent Studying   Grade
2                      77
2                      80
3                      85
4                      88
4                      86
3                      82

I Can Do More

Additional Activity:
The data show the weights of mothers and their eldest daughters in kilograms.
If there is a significant relationship between the two variables, predict the weight
of the daughter if the weight of her mother is 60 kg. Use 0.05 as the level of
significance.
a. Find the value of r.
b. Test if there is a significant relationship between the two variables using
Microsoft Excel. (Proceed to c and d, if applicable.)
c. Solve for a and b.
d. Predict the weight of the daughter if the weight of her mother is 60 kg.

Weight of the Mother   Weight of the Daughter
50                     50
65                     65
67                     69
62                     65
63                     65
62                     65
67                     69
68                     70
61                     66
59                     61
