Chapter 2 P1

Chapter 2
Multiple Regression Model
30/12/2020 1
Introduction
❑ Simple linear regression used one independent variable to explain the dependent
variable.
❑ Some relationships are too complex to be described using a single independent
variable.
❑ Multiple linear regression uses two or more independent variables to describe the
dependent variable.
Definition
Multiple Regression Model
The equation that describes how the dependent variable \(y\) is related to the
independent variables \(x_1, x_2, \ldots, x_k\) and an error term \(\varepsilon\) is:
\[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon \tag{1} \]
where:
\(\beta_0, \beta_1, \beta_2, \ldots, \beta_k\) are the parameters;
\(\varepsilon\) is a random variable called the error term.
Assumptions About the Error Term \(\varepsilon\)
The error \(\varepsilon\) is a random variable with a mean of zero.
The variance of \(\varepsilon\), denoted by \(\sigma^2\), is the same for all values
of the independent variables.
The values of \(\varepsilon\) are independent.
The error \(\varepsilon\) is a normally distributed random variable.
Estimate model parameters
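Given observations \((y_i, x_{i1}, x_{i2}, \ldots, x_{ik})\), \(i = 1, 2, \ldots, n\), the estimation method is again
OLS, which produces the estimates \(\hat\beta_0, \hat\beta_1, \hat\beta_2, \ldots, \hat\beta_k\) by minimizing the sum of squared errors
\[ \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_{i1} - \beta_2 x_{i2} - \dots - \beta_k x_{ik} \right)^2 \]
i.e.:
\[ \min_{\beta_0, \ldots, \beta_k} \; \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_{i1} - \beta_2 x_{i2} - \dots - \beta_k x_{ik} \right)^2 \tag{2} \]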
❑ The first-order conditions are obtained by setting the (k + 1) partial derivatives equal to zero.
❑ The solution is straightforward, although the explicit form of the estimators
becomes complicated.
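To make the first-order conditions concrete, the sketch below writes them out. The symbol \(S\) for the sum of squares and the double index \(x_{ij}\) (observation \(i\), variable \(j\)) are notational assumptions introduced here for illustration.

```latex
\begin{align*}
S(\beta_0, \ldots, \beta_k) &= \sum_{i=1}^{n} \bigl( y_i - \beta_0 - \beta_1 x_{i1} - \dots - \beta_k x_{ik} \bigr)^2 \\
\frac{\partial S}{\partial \beta_0} &= -2 \sum_{i=1}^{n} \bigl( y_i - \beta_0 - \beta_1 x_{i1} - \dots - \beta_k x_{ik} \bigr) = 0 \\
\frac{\partial S}{\partial \beta_j} &= -2 \sum_{i=1}^{n} x_{ij} \bigl( y_i - \beta_0 - \beta_1 x_{i1} - \dots - \beta_k x_{ik} \bigr) = 0,
\qquad j = 1, \ldots, k
\end{align*}
```

These are \(k + 1\) linear equations in the \(k + 1\) unknown coefficients, which is why the solution exists in closed form even though writing it out term by term quickly becomes unwieldy.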
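Using matrix algebra considerably simplifies the notation in multiple regression.
❑ Denote the observation vector on y as \(y = (y_1, y_2, \ldots, y_n)^t\), where 't' denotes
transposition.
❑ In the same manner, denote the data matrix on the x-variables, augmented with ones
in the first column, as an \(n \times (k + 1)\) matrix:
\[ X = \begin{pmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1k} \\ 1 & x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{nk} \end{pmatrix} \tag{3} \]
where k < n.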
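➢ Then we can present the whole set of regression equations for the sample,
\[
\begin{aligned}
y_1 &= \beta_0 + \beta_1 x_{11} + \beta_2 x_{12} + \dots + \beta_k x_{1k} + \varepsilon_1 \\
y_2 &= \beta_0 + \beta_1 x_{21} + \beta_2 x_{22} + \dots + \beta_k x_{2k} + \varepsilon_2 \\
&\;\;\vdots \\
y_n &= \beta_0 + \beta_1 x_{n1} + \beta_2 x_{n2} + \dots + \beta_k x_{nk} + \varepsilon_n
\end{aligned}
\]
compactly as
\[ y = X\beta + \varepsilon \tag{4} \]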
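where \(y\) is the observation vector, \(X\) is the \(n \times (k + 1)\) data matrix, and
\[ \beta = (\beta_0, \beta_1, \ldots, \beta_k)^t, \qquad \varepsilon = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n)^t . \]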
❑ The normal equations for the first-order conditions of the minimization are, in matrix
form, simply:
\[ X^t X \hat\beta = X^t y \]
❑ which gives the explicit solution for the OLS estimator of \(\beta\) as
\[ \hat\beta = (X^t X)^{-1} X^t y \tag{5} \]
where \(\hat\beta = (\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k)^t\).
❑ The fitted model is
\[ \hat y_i = \hat\beta_0 + \hat\beta_1 x_{i1} + \dots + \hat\beta_k x_{ik} \tag{6} \]
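As a check on the matrix formulas, the following sketch works out the normal equations for the special case of a single independent variable (k = 1), where they should collapse to the familiar simple linear regression estimators. The unindexed \(x_i\) and the sample means \(\bar x\), \(\bar y\) are notation assumed here for the illustration.

```latex
\begin{align*}
X^t X &= \begin{pmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{pmatrix},
\qquad
X^t y = \begin{pmatrix} \sum y_i \\ \sum x_i y_i \end{pmatrix} \\[4pt]
X^t X \hat\beta = X^t y
&\;\Longrightarrow\;
\begin{cases}
n \hat\beta_0 + \hat\beta_1 \sum x_i = \sum y_i \\
\hat\beta_0 \sum x_i + \hat\beta_1 \sum x_i^2 = \sum x_i y_i
\end{cases} \\[4pt]
\hat\beta_1 &= \frac{\sum (x_i - \bar x)(y_i - \bar y)}{\sum (x_i - \bar x)^2},
\qquad
\hat\beta_0 = \bar y - \hat\beta_1 \bar x
\end{align*}
```

The first normal equation, divided by n, states that the fitted line passes through the point of means; substituting it into the second gives the slope.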
Interval estimation
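▪ For our regression model, we have:
\[ \frac{\hat\beta_i - \beta_i}{s_{\hat\beta_i}} \]
has a t-distribution with \(n - k - 1\) degrees of freedom.
▪ Therefore, an interval estimate for \(\beta_i\) with confidence coefficient \(1 - \alpha\) is:
\[ \hat\beta_i \pm t_{\alpha/2,\; n-k-1} \, s_{\hat\beta_i} \tag{7} \]
where \(s_{\hat\beta_i} = \sqrt{MSE \cdot c_{ii}}\), \(c_{ii}\) is the diagonal element of \((X^t X)^{-1}\) corresponding to \(\beta_i\), and \(MSE = SSE/(n - k - 1)\). With a single independent variable this reduces to
\[ s_{\hat\beta_1} = \sqrt{\frac{MSE}{\sum (x_j - \bar x)^2}} \]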
Example 1
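Let:
\[
X = \begin{pmatrix}
1 & 0 & 0 \\ 1 & 4 & 16 \\ 1 & 8 & 64 \\ 1 & 12 & 144 \\ 1 & 16 & 256 \\ 1 & 20 & 400 \\ 1 & 24 & 576 \\ 1 & 28 & 784 \\ 1 & 32 & 1024
\end{pmatrix}
\quad \text{and} \quad
y = \begin{pmatrix}
394.33 \\ 329.50 \\ 291.00 \\ 255.17 \\ 229.33 \\ 204.83 \\ 179.00 \\ 163.83 \\ 150.33
\end{pmatrix}
\]
According to the formula \(\hat\beta = (X^t X)^{-1} X^t y\),
we first need to calculate \(X^t X\) and then invert it to get \((X^t X)^{-1}\).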
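\[
X^t X = \begin{pmatrix} 9 & 144 & 3264 \\ 144 & 3264 & 82{,}944 \\ 3264 & 82{,}944 & 2{,}245{,}632 \end{pmatrix},
\qquad
(X^t X)^{-1} = \begin{pmatrix} 0.6606 & -0.0773 & 0.0019 \\ -0.0773 & 0.0140 & -0.0004 \\ 0.0019 & -0.0004 & 0.0000 \end{pmatrix}
\]
(entries of the inverse are rounded to four decimals).
Finally, we calculate the vector of LS estimates \(\hat\beta = (\hat\beta_0, \hat\beta_1, \hat\beta_2)^t\).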
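\[
\hat\beta = (X^t X)^{-1} X^t y
= \begin{pmatrix} 0.6606 & -0.0773 & 0.0019 \\ -0.0773 & 0.0140 & -0.0004 \\ 0.0019 & -0.0004 & 0.0000 \end{pmatrix}
\begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 0 & 4 & 8 & 12 & 16 & 20 & 24 & 28 & 32 \\ 0 & 16 & 64 & 144 & 256 & 400 & 576 & 784 & 1024 \end{pmatrix}
\begin{pmatrix} 394.33 \\ 329.50 \\ 291.00 \\ 255.17 \\ 229.33 \\ 204.83 \\ 179.00 \\ 163.83 \\ 150.33 \end{pmatrix}
= \begin{pmatrix} 386.265 \\ -12.772 \\ 0.172 \end{pmatrix}
\]
Therefore, the LS quadratic model is \(\hat y = 386.265 - 12.772\,x_1 + 0.172\,x_2\), where \(x_1 = x\) and \(x_2 = x^2\).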
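The calculation in Example 1 can be reproduced numerically. The sketch below is a minimal pure-Python illustration, assuming the nine (x, y) pairs shown in the example; the helper names `transpose`, `matmul`, and `solve` are hypothetical and written here only for the demonstration. It solves the normal equations directly by Gaussian elimination instead of forming the inverse, which avoids the rounding visible in the four-decimal inverse.

```python
# Quadratic least-squares fit for Example 1 via the normal equations.
# Helper names are illustrative assumptions, not part of the slides.

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]
    for c in range(n):
        # Swap in the row with the largest pivot to limit rounding error.
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    # Back substitution.
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][j] * z[j] for j in range(r + 1, n))
        z[r] = (M[r][n] - s) / M[r][r]
    return z

x = [0, 4, 8, 12, 16, 20, 24, 28, 32]
y = [394.33, 329.50, 291.00, 255.17, 229.33, 204.83, 179.00, 163.83, 150.33]

X = [[1, xi, xi ** 2] for xi in x]      # columns: ones, x, x^2
Xt = transpose(X)
XtX = matmul(Xt, X)                      # 3 x 3 matrix X^t X
Xty = [sum(a * b for a, b in zip(row, y)) for row in Xt]
beta_hat = solve(XtX, Xty)               # solves X^t X beta = X^t y

print(XtX)
print([round(b, 3) for b in beta_hat])
```

The printed `XtX` should match the matrix in the example, and the rounded coefficients should agree with the fitted quadratic model. In practice a library least-squares routine would normally replace the hand-written solver.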
Example 2
You work in advertising for the New York Times. You want to find the effect of ad size
and newspaper circulation on the number of ad responses.
You’ve collected the following data:
Resp (y)   Size (x1)   Circ (x2)
   1           1           2
   4           8           8
   1           3           1
   3           5           7
   2           6           4
   4          10           6
Estimate the unknown parameters.
\[ \hat y = 0.0640 + 0.2049\,x_1 + 0.2805\,x_2 \]
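The estimates for the advertising data can be checked by hand or in a few lines of code. The sketch below assumes the six observations in the table and uses the centered form of the normal equations: with two independent variables, subtracting the means leaves a 2 x 2 system for the slopes, solved here with Cramer's rule. The variable and function names (`resp`, `size`, `circ`, `cross`) are illustrative choices, not from the slides.

```python
# Two-regressor OLS for the ad-response data using centered sums.
resp = [1, 4, 1, 3, 2, 4]     # y: ad responses
size = [1, 8, 3, 5, 6, 10]    # x1: ad size
circ = [2, 8, 1, 7, 4, 6]     # x2: circulation

def mean(v):
    return sum(v) / len(v)

def cross(u, v):
    """Sum of cross-products of deviations from the means."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))

s11, s22, s12 = cross(size, size), cross(circ, circ), cross(size, circ)
s1y, s2y = cross(size, resp), cross(circ, resp)

# Centered normal equations:
#   s11*b1 + s12*b2 = s1y
#   s12*b1 + s22*b2 = s2y
det = s11 * s22 - s12 ** 2
b1 = (s1y * s22 - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det
b0 = mean(resp) - b1 * mean(size) - b2 * mean(circ)

print(round(b0, 4), round(b1, 4), round(b2, 4))
```

The intercept is recovered last from the condition that the fitted plane passes through the point of means.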
1. Slope \(\hat\beta_1\)
The number of responses to the ad is expected to increase by .2049 (20.49) for each 1-unit
increase in ad size, holding circulation constant.
2. Slope \(\hat\beta_2\)
The number of responses to the ad is expected to increase by .2805 (28.05) for each 1-unit
(1,000) increase in circulation, holding ad size constant.
Example 2
The years of experience, score on the aptitude test, and corresponding annual
salary ($1000s) for a sample of 20 programmers are shown below.
Exper.  Score  Salary      Exper.  Score  Salary
   4      78    24.0          9      88    38.0
   7     100    43.0          2      73    26.6
   1      86    23.7         10      75    36.2
   5      82    34.3          5      81    31.6
   8      86    35.8          6      74    29.0
  10      84    38.0          8      87    34.0
   0      75    22.2          4      79    30.1
   1      80    23.1          6      94    33.9
   6      83    30.0          3      70    28.2
   6      91    33.0          3      89    30.0
Suppose we believe that salary (y) is related to the years of experience (x1)
and the score on the programmer aptitude test (x2) by the following
regression model:
\[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon \tag{8} \]
where
y = annual salary ($1000s)
x1 = years of experience
x2 = score on programmer aptitude test
Solving for the Estimates of \(\beta_0, \beta_1, \beta_2\)
[Figure: the input data (x1, x2, y) are passed to a computer package for solving multiple regression problems, which returns the least squares estimates of \(\beta_0\), \(\beta_1\), and \(\beta_2\).]
Excel's Regression Equation Output
              Coeffic.   Std. Err.   t Stat    P-value
Intercept     3.17394    6.15607     0.5156    0.61279
Experience    1.4039     0.19857     7.0702    1.9E-06
Test Score    0.25089    0.07735     3.2433    0.00478
Estimated Regression Equation
\[ \text{SALARY} = 3.174 + 1.404(\text{EXPER}) + 0.251(\text{SCORE}) \tag{10} \]
Note: Predicted salary will be in thousands of dollars.
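The coefficient, standard error, and t-statistic columns of the Excel output can be reproduced from the programmer data with the matrix formulas. The following is a minimal pure-Python sketch under stated assumptions: the `inverse` helper is a hypothetical Gauss-Jordan routine written for this illustration, the standard errors use the square root of MSE times the diagonal of the inverse of X^t X, and the critical value 2.110 is the tabulated t value for a 95% interval with n - k - 1 = 17 degrees of freedom.

```python
import math

# Programmer data: (years of experience, test score, salary in $1000s).
data = [
    (4, 78, 24.0), (7, 100, 43.0), (1, 86, 23.7), (5, 82, 34.3),
    (8, 86, 35.8), (10, 84, 38.0), (0, 75, 22.2), (1, 80, 23.1),
    (6, 83, 30.0), (6, 91, 33.0), (9, 88, 38.0), (2, 73, 26.6),
    (10, 75, 36.2), (5, 81, 31.6), (6, 74, 29.0), (8, 87, 34.0),
    (4, 79, 30.1), (6, 94, 33.9), (3, 70, 28.2), (3, 89, 30.0),
]

def inverse(A):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(A)
    M = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]

X = [[1.0, e, s] for e, s, _ in data]    # ones, experience, score
y = [sal for _, _, sal in data]
n, p = len(X), len(X[0])                 # p = k + 1 parameters

XtX = [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]
Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(p)]
C = inverse(XtX)
beta = [sum(C[a][b] * Xty[b] for b in range(p)) for a in range(p)]

# Residual mean square with n - k - 1 degrees of freedom.
resid = [yi - sum(b * v for b, v in zip(beta, row)) for row, yi in zip(X, y)]
mse = sum(r * r for r in resid) / (n - p)
se = [math.sqrt(mse * C[a][a]) for a in range(p)]
t_stat = [b / s for b, s in zip(beta, se)]

T_CRIT = 2.110  # assumed table value: t for alpha/2 = 0.025, 17 d.f.
ci = [(b - T_CRIT * s, b + T_CRIT * s) for b, s in zip(beta, se)]

for name, b, s, t, (lo, hi) in zip(
        ["Intercept", "Experience", "Test Score"], beta, se, t_stat, ci):
    print(f"{name:<11}{b:10.5f}{s:10.5f}{t:9.4f}   [{lo:.3f}, {hi:.3f}]")
```

The interval for the intercept contains zero, which is consistent with its large p-value in the Excel output, while the intervals for both slopes exclude zero.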
Interpreting the Coefficients
We interpret each regression coefficient as follows:
\(\hat\beta_i\) represents an estimate of the change in y corresponding to a 1-unit
increase in \(x_i\) when all other independent variables are held constant.
\(\hat\beta_1 = 1.404\): Salary is expected to increase by $1,404 for each additional
year of experience (when the aptitude test score is held constant).
\(\hat\beta_2 = 0.251\): Salary is expected to increase by $251 for each additional
point scored on the programmer aptitude test (when years of experience are held constant).
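The "held constant" interpretation can be illustrated by evaluating the estimated regression equation at two points that differ in only one variable. The sketch below assumes the rounded coefficients of the estimated equation (3.174, 1.404, 0.251); the function name `predict_salary` and the chosen inputs (4 years of experience, score 78, matching the first programmer in the sample) are illustrative.

```python
def predict_salary(exper, score):
    """Predicted salary in $1000s from the rounded estimated equation."""
    return 3.174 + 1.404 * exper + 0.251 * score

base = predict_salary(4, 78)          # 4 years of experience, score 78
more_exper = predict_salary(5, 78)    # one more year, same score
more_score = predict_salary(4, 79)    # same experience, one more point

print(round(base, 3))                 # predicted salary in $1000s
print(round(more_exper - base, 3))    # effect of one extra year
print(round(more_score - base, 3))    # effect of one extra test point
```

The two differences equal the slope coefficients themselves, so in dollars they correspond to the $1,404 and $251 increases described above.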
