Chapter 2 P1


Chapter 2

Multiple Regression Model

30/12/2020 1
Introduction

❑ Simple linear regression used one independent variable to explain the dependent
variable.

❑ Some relationships are too complex to be described using a single independent
variable.

❑ Multiple linear regression uses two or more independent variables to describe the
dependent variable.

Definition

Multiple Regression Model

The equation that describes how the dependent variable \(y\) is related to the
independent variables \(x_1, x_2, \ldots, x_k\) and an error term \(\varepsilon\) is:

\[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon \tag{1} \]

where:

\(\beta_0, \beta_1, \beta_2, \ldots, \beta_k\) are the parameters;

\(\varepsilon\) is a random variable called the error term.

Assumptions About the Error Term \(\varepsilon\)

The error \(\varepsilon\) is a random variable with a mean of zero.

The variance of \(\varepsilon\), denoted by \(\sigma^2\), is the same for all values
of the independent variables.

The values of \(\varepsilon\) are independent.

The error \(\varepsilon\) is a normally distributed random variable.

Estimate model parameters

Given observations \((y_i, x_{i1}, x_{i2}, \ldots, x_{ik})\), \(i = 1, 2, \ldots, n\), the estimation method is again
OLS, which produces the estimates \(\hat\beta_0, \hat\beta_1, \hat\beta_2, \ldots, \hat\beta_k\) by minimizing

\[ \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_{i1} - \beta_2 x_{i2} - \cdots - \beta_k x_{ik} \right)^2 \]

i.e.:

\[ \min_{\beta_0, \ldots, \beta_k} \; \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_{i1} - \beta_2 x_{i2} - \cdots - \beta_k x_{ik} \right)^2 \tag{2} \]

❑ The first-order conditions set the (k + 1) partial derivatives equal to zero.
❑ The solution is straightforward, although the explicit form of the estimators
becomes complicated.
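As a sketch of what these first-order conditions look like, write \(S(\beta_0, \ldots, \beta_k)\) for the sum of squares in (2). The symbol \(S\) and the indexing \(x_{ij}\) (observation \(i\), variable \(j\)) are notational assumptions made here. Differentiating with respect to each parameter and setting the result to zero gives:

```latex
\begin{aligned}
\frac{\partial S}{\partial \beta_0}
  &= -2 \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_{i1} - \cdots - \beta_k x_{ik} \right) = 0, \\
\frac{\partial S}{\partial \beta_j}
  &= -2 \sum_{i=1}^{n} x_{ij} \left( y_i - \beta_0 - \beta_1 x_{i1} - \cdots - \beta_k x_{ik} \right) = 0,
  \qquad j = 1, \ldots, k.
\end{aligned}
```

These are (k + 1) linear equations in the (k + 1) unknown parameters.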

Estimate model parameters

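Using matrix algebra considerably simplifies the notation in multiple regression.

❑ Denote the observation vector on y as \(y = (y_1, y_2, \ldots, y_n)^t\), where ‘t’ denotes
transposition.

❑ In the same manner, denote the data matrix on the x-variables, enhanced with ones
in the first column, as an \(n \times (k + 1)\) matrix:

\[ X = \begin{pmatrix}
1 & x_{11} & x_{12} & \cdots & x_{1k} \\
1 & x_{21} & x_{22} & \cdots & x_{2k} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_{n1} & x_{n2} & \cdots & x_{nk}
\end{pmatrix} \tag{3} \]

where k < n.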

Estimate model parameters

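➢ Then we can present the whole set of regression equations for the sample

\[ \begin{aligned}
y_1 &= \beta_0 + \beta_1 x_{11} + \beta_2 x_{12} + \cdots + \beta_k x_{1k} + \varepsilon_1 \\
y_2 &= \beta_0 + \beta_1 x_{21} + \beta_2 x_{22} + \cdots + \beta_k x_{2k} + \varepsilon_2 \\
&\;\;\vdots \\
y_n &= \beta_0 + \beta_1 x_{n1} + \beta_2 x_{n2} + \cdots + \beta_k x_{nk} + \varepsilon_n
\end{aligned} \]

in matrix form as

\[ y = X\beta + \varepsilon \tag{4} \]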

Estimate model parameters

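\[ y = X\beta + \varepsilon \]

Where:

\[ y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad
X = \begin{pmatrix}
1 & x_{11} & \cdots & x_{1k} \\
1 & x_{21} & \cdots & x_{2k} \\
\vdots & \vdots & & \vdots \\
1 & x_{n1} & \cdots & x_{nk}
\end{pmatrix}, \quad
\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{pmatrix}
\quad \text{and} \quad
\varepsilon = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix} \]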

Estimate model parameters

❑ The normal equations for the first-order conditions of the minimization are, in matrix
form, simply:

\[ X^t X \hat\beta = X^t y \]

❑ which gives the explicit solution for the OLS estimator of \(\beta\) as

\[ \hat\beta = (X^t X)^{-1} X^t y \tag{5} \]

where \(\hat\beta = (\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k)^t\).

❑ The fitted model is

\[ \hat y_i = \hat\beta_0 + \hat\beta_1 x_{i1} + \cdots + \hat\beta_k x_{ik} \tag{6} \]
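The normal equations can be sketched in a few lines of plain Python. The helper names (`transpose`, `matmul`, `solve`, `ols`) and the small noise-free data set are hypothetical and chosen only for illustration. The sketch solves \(X^t X \hat\beta = X^t y\) by Gaussian elimination rather than forming the inverse explicitly, which yields the same \(\hat\beta\) whenever \(X^t X\) is invertible.

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    bt = transpose(b)
    return [[sum(p * q for p, q in zip(row, col)) for col in bt] for row in a]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Augmented matrix [a | b]; copying keeps the inputs untouched.
    aug = [[float(v) for v in a[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X^t X is singular: columns of X are dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def ols(x_rows, y):
    """OLS estimates (intercept first) from the normal equations."""
    X = [[1.0] + list(row) for row in x_rows]  # prepend the column of ones
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(xi * yi for xi, yi in zip(col, y)) for col in Xt]
    return solve(XtX, Xty)


# Hypothetical noise-free data generated from y = 1 + 2*x1 + 3*x2.
x_rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in x_rows]
beta_hat = ols(x_rows, y)
print([round(b, 6) for b in beta_hat])
```

Because the illustrative data contain no error term, the estimates should recover the generating coefficients 1, 2 and 3 up to floating-point rounding. In practice a numerical library routine would normally replace the hand-written solver.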

Interval estimation

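▪ For our regression model, we have:

\[ \frac{\beta_i - \hat\beta_i}{s_{\hat\beta_i}} \]

has a t-distribution with n − k − 1 degrees of freedom.

▪ Therefore, an interval estimate for \(\beta_i\) with a \(1 - \alpha\) confidence coefficient is:

\[ \hat\beta_i \pm t_{\alpha/2,\, n-k-1} \, s_{\hat\beta_i} \tag{7} \]

Where, with a single independent variable x, the standard error of the slope is

\[ s_{\hat\beta_1} = \sqrt{\frac{MSE}{\sum_{j=1}^{n} (x_j - \bar{x})^2}} \]

With several independent variables, \(s_{\hat\beta_i}\) is the square root of MSE times the diagonal
element of \((X^t X)^{-1}\) that corresponds to \(\hat\beta_i\), where MSE = SSE / (n − k − 1).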

Example 1

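Let:

\[ X = \begin{pmatrix}
1 & 0 & 0 \\
1 & 4 & 16 \\
1 & 8 & 64 \\
1 & 12 & 144 \\
1 & 16 & 256 \\
1 & 20 & 400 \\
1 & 24 & 576 \\
1 & 28 & 784 \\
1 & 32 & 1024
\end{pmatrix}
\quad \text{and} \quad
y = \begin{pmatrix}
394.33 \\ 329.50 \\ 291.00 \\ 255.17 \\ 229.33 \\ 204.83 \\ 179.00 \\ 163.83 \\ 150.33
\end{pmatrix} \]

According to the formula \(\hat\beta = (X^t X)^{-1} X^t y\),

we first need to calculate \(X^t X\) and then invert it to get \((X^t X)^{-1}\).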

Example 1

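\[ X^t X = \begin{pmatrix}
9 & 144 & 3264 \\
144 & 3264 & 82{,}944 \\
3264 & 82{,}944 & 2{,}245{,}632
\end{pmatrix},
\qquad
(X^t X)^{-1} = \begin{pmatrix}
0.6606 & -0.0773 & 0.0019 \\
-0.0773 & 0.0140 & -0.0004 \\
0.0019 & -0.0004 & 0.0000
\end{pmatrix} \]

Finally, we calculate the vector of LS estimates \(\hat\beta = (\hat\beta_0, \hat\beta_1, \hat\beta_2)^t\).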

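Example 1

\[ \hat\beta = (X^t X)^{-1} X^t y =
\begin{pmatrix}
0.6606 & -0.0773 & 0.0019 \\
-0.0773 & 0.0140 & -0.0004 \\
0.0019 & -0.0004 & 0.0000
\end{pmatrix}
\begin{pmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
0 & 4 & 8 & 12 & 16 & 20 & 24 & 28 & 32 \\
0 & 16 & 64 & 144 & 256 & 400 & 576 & 784 & 1024
\end{pmatrix}
\begin{pmatrix}
394.33 \\ 329.50 \\ 291.00 \\ 255.17 \\ 229.33 \\ 204.83 \\ 179.00 \\ 163.83 \\ 150.33
\end{pmatrix}
=
\begin{pmatrix} 386.265 \\ -12.772 \\ 0.172 \end{pmatrix} \]

Therefore, the LS quadratic model is \(\hat y = 386.265 - 12.772 x_1 + 0.172 x_2\), where \(x_2 = x_1^2\).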
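The calculation in Example 1 can be checked with a short script. This is a minimal sketch in plain Python. The helpers `det3` and `cramer3` are hypothetical names, and Cramer's rule is used here only because the system is 3×3; it is not necessarily the method used to produce the slide's numbers.

```python
# Data from Example 1: the third column of X is the square of the second.
x = [0, 4, 8, 12, 16, 20, 24, 28, 32]
y = [394.33, 329.50, 291.00, 255.17, 229.33, 204.83, 179.00, 163.83, 150.33]
X = [[1, xi, xi ** 2] for xi in x]

# Ingredients of the normal equations: X^t X (3x3) and X^t y (length 3).
XtX = [[sum(row[a] * row[b] for row in X) for b in range(3)] for a in range(3)]
Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(3)]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def cramer3(a, b):
    """Solve the 3x3 system a x = b with Cramer's rule."""
    d = det3(a)
    solution = []
    for j in range(3):
        # Replace column j of a with the right-hand side b.
        aj = [[b[i] if c == j else a[i][c] for c in range(3)] for i in range(3)]
        solution.append(det3(aj) / d)
    return solution


beta_hat = cramer3(XtX, Xty)
print(XtX)
print([round(b, 3) for b in beta_hat])
```

Working from the unrounded \(X^t X\) matters here: the inverse printed on the slide is rounded to four decimals (one entry shows as 0.0000), so multiplying it out by hand would not reproduce the estimates accurately. The script's estimates should agree with the fitted model 386.265 − 12.772 x1 + 0.172 x2 to three decimals.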

Example 2

You work in advertising for the New York Times. You want to find the effect of ad size
and newspaper circulation on the number of ad responses.

You’ve collected the following data:

Resp (y)    Size (x1)    Circ (x2)
   1            1            2
   4            8            8
   1            3            1
   3            5            7
   2            6            4
   4           10            6

Estimate the unknown parameters.

\[ \hat y = 0.0640 + 0.2049 x_1 + 0.2805 x_2 \]

Example 2

1. Slope \(\hat\beta_1\)

Number of responses to ad is expected to increase by .2049 (20.49) for each 1-unit
increase in ad size, holding circulation constant.

2. Slope \(\hat\beta_2\)

Number of responses to ad is expected to increase by .2805 (28.05) for each 1-unit
(1,000) increase in circulation, holding ad size constant.
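The "holding the other variable constant" reading of each slope can be illustrated numerically. The sketch below plugs the reported coefficients into the fitted equation. The function name `predict` and the ad with size 4 and circulation 5 are hypothetical choices for illustration, expressed in the same units as the data table.

```python
# Fitted coefficients reported for the ad-response example.
b0, b1, b2 = 0.0640, 0.2049, 0.2805


def predict(size, circ):
    """Predicted responses from the fitted equation, in the units of y."""
    return b0 + b1 * size + b2 * circ


# Hypothetical ad: size 4, circulation 5.
base = predict(4, 5)
size_up = predict(5, 5)  # ad size +1, circulation unchanged
circ_up = predict(4, 6)  # circulation +1, ad size unchanged

print(round(base, 4))
print(round(size_up - base, 4))
print(round(circ_up - base, 4))
```

The two differences equal the slopes .2049 and .2805 regardless of the starting point, because the fitted model is linear in each variable.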

Example 2

The years of experience, score on the aptitude test, and corresponding annual
salary ($1000s) for a sample of 20 programmers are shown below.

Exper.   Score   Salary      Exper.   Score   Salary

  4       78      24.0          9      88      38.0
  7      100      43.0          2      73      26.6
  1       86      23.7         10      75      36.2
  5       82      34.3          5      81      31.6
  8       86      35.8          6      74      29.0
 10       84      38.0          8      87      34.0
  0       75      22.2          4      79      30.1
  1       80      23.1          6      94      33.9
  6       83      30.0          3      70      28.2
  6       91      33.0          3      89      30.0

Example 2

Suppose we believe that salary (y) is related to the years of experience (x1)
and the score on the programmer aptitude test (x2) by the following
regression model:

\[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon \tag{8} \]

where
y = annual salary ($1000s)

x1 = years of experience

x2 = score on programmer aptitude test

Example 2

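Solving for the Estimates of \(\beta_0\), \(\beta_1\), \(\beta_2\)

[Diagram: Input Data (x1, x2, y for the 20 programmers: 4, 78, 24; 7, 100, 43; ...; 3, 89, 30)
→ Computer Package for Solving Multiple Regression Problems
→ Least Squares Output (estimates of \(\beta_0\), \(\beta_1\), \(\beta_2\))]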

Example 2

Solving for the Estimates of \(\beta_0\), \(\beta_1\), \(\beta_2\)

Excel’s Regression Equation Output

              Coeffic.    Std. Err.    t Stat    P-value
Intercept     3.17394     6.15607      0.5156    0.61279
Experience    1.4039      0.19857      7.0702    1.9E-06
Test Score    0.25089     0.07735      3.2433    0.00478

Estimated Regression Equation

SALARY = 3.174 + 1.404(EXPER) + 0.251(SCORE)     (10)

Note: Predicted salary will be in thousands of dollars.
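As an illustration of the interval estimate in formula (7), the coefficients and standard errors from the Excel output can be turned into 95% confidence intervals. This is a sketch under stated assumptions: the confidence level (α = 0.05) is chosen here for illustration, and the critical value 2.110 for 17 degrees of freedom is taken from a standard t table rather than from the slides. The names `estimates`, `T_CRIT` and `interval` are hypothetical.

```python
# Values copied from the Excel output: (coefficient, standard error).
estimates = {
    "Intercept": (3.17394, 6.15607),
    "Experience": (1.4039, 0.19857),
    "Test Score": (0.25089, 0.07735),
}

n, k = 20, 2       # 20 programmers, 2 independent variables
df = n - k - 1     # degrees of freedom: 17
T_CRIT = 2.110     # assumed: t_{0.025, 17} from a t table (alpha = 0.05)


def interval(coef, se, t_crit=T_CRIT):
    """Return (lower, upper) = coef -/+ t_crit * se, as in formula (7)."""
    half_width = t_crit * se
    return coef - half_width, coef + half_width


intervals = {name: interval(c, s) for name, (c, s) in estimates.items()}
for name, (lower, upper) in intervals.items():
    print(f"{name:<11} [{lower:8.4f}, {upper:8.4f}]")
```

Under these assumptions the intervals for Experience and Test Score lie entirely above zero, while the interval for the intercept contains zero, which is consistent with the P-values in the output.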

Example 2

Interpreting the Coefficients

We interpret each regression coefficient as follows:

\(\hat\beta_i\) represents an estimate of the change in y corresponding to a 1-unit
increase in \(x_i\) when all other independent variables are held constant.

\(\hat\beta_1 = 1.404\): Salary is expected to increase by $1,404 for each additional
year of experience.

\(\hat\beta_2 = 0.251\): Salary is expected to increase by $251 for each additional
point scored on the programmer aptitude test.
