
STA 405: Linear Modelling 2

Dr. Idah

March 29, 2023

Dr. Idah STA 405: Linear Modelling 2 March 29, 2023 1/1
Model Building

In most research problems where regression analysis is applied, more than one independent variable is needed in the regression model. Therefore, in order to predict an important response, a multiple regression model is needed. When this model is linear in the coefficients, it is called a multiple linear regression model.
For the case of k independent variables x1, x2, …, xk, the expected response is given by the multiple linear regression model

ŷ = β0 + β1 x1 + · · · + βk xk
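As a tiny illustration (the coefficient and predictor values below are made up), evaluating the prediction equation is just an inner product once a leading 1 is prepended for the intercept:

```python
import numpy as np

# Hypothetical coefficients [beta0, beta1, beta2, beta3] and one
# observation's predictor values; purely illustrative numbers
beta = np.array([39.0, 1.0, -1.9, -0.3])
x = np.array([1.0, 4.1, 6.6, 8.0])   # leading 1 multiplies the intercept

y_hat = beta @ x                     # yhat = b0 + b1*x1 + b2*x2 + b3*x3
```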

Model Selection and Validation

Variable Selection Techniques


Variable selection means selecting which variables to include in our model. As such, it is a special case of model selection.

1) Hypothesis Testing
In many regression situations, individual coefficients are of importance to the experimenter.
In regression analysis one is also interested in the deletion of variables when the situation dictates that, in addition to arriving at a workable prediction equation, the "best regression" involving only variables that are useful predictors should be obtained.

One criterion that is commonly used to assess the adequacy of a fitted regression model is the coefficient of multiple determination

R² = SSR / SST = Σ (ŷi − ȳ)² / Σ (yi − ȳ)²

with both sums taken over i = 1, …, n.

This quantity merely indicates what proportion of the total variation in the response Y is explained by the fitted model.
R² is often reported as R² × 100%, and the result is interpreted as the percentage of variation explained by the postulated model.
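As a quick sketch, R² can be computed directly from its definition. The fitted values below are made up for illustration; in practice ŷ comes from a least-squares fit, which is also what guarantees the identity SST = SSR + SSE:

```python
import numpy as np

# Made-up sample: observed responses and (illustrative) fitted values
y     = np.array([25.5, 31.2, 25.9, 38.4, 18.4])
y_hat = np.array([26.0, 30.5, 27.0, 37.0, 19.5])

sst = np.sum((y - y.mean()) ** 2)       # total sum of squares
ssr = np.sum((y_hat - y.mean()) ** 2)   # regression sum of squares
r_squared = ssr / sst                   # proportion of variation explained
```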

The square root of R² is called the multiple correlation coefficient between Y and the set x1, x2, …, xk.
The regression sum of squares can be used to give some indication
concerning whether or not the model is an adequate explanation of
the true situation.
One can test the hypothesis H0 that the regression is not significant
by merely forming the ratio

f = (SSR/k) / (SSE/(n − k − 1)) = (SSR/k) / s²

and rejecting H0 at the α level of significance when

f > fα(k, n − k − 1)
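As a sketch using the numbers from this chapter's worked example (SSR = 399.45, SST = 438.13, n = 13, k = 3), the overall F-test can be carried out as follows; the critical point 6.99 is the tabulated F0.01(3, 9) value quoted in the example:

```python
# Quantities from the worked example in the text
ssr, sst = 399.45, 438.13
n, k = 13, 3

sse = sst - ssr                 # SST = SSR + SSE
s2 = sse / (n - k - 1)          # error mean square
f_stat = (ssr / k) / s2         # overall F statistic

f_crit = 6.99                   # tabulated F_0.01(3, 9) critical point
reject_h0 = f_stat > f_crit     # regression significant at alpha = 0.01
```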

Example
Consider the data below.
y x1 x2 x3
25.5 1.74 5.30 10.80
31.2 6.32 5.42 9.40
25.9 6.22 8.41 7.20
38.4 10.52 4.63 8.50
18.4 1.19 11.60 9.40
26.7 1.22 5.85 9.90
26.4 4.10 6.62 8.00
25.9 6.32 8.72 9.10
32.0 4.08 4.42 8.70
25.2 4.15 7.60 9.20
39.7 10.15 4.83 9.40
35.7 1.72 3.12 7.60
26.5 1.70 5.30 8.20

From the experimental data we list the following sums of squares and products (all sums taken over i = 1, …, 13):

Σ yi = 377.5          Σ yi² = 11,400.15     Σ x1i = 59.43
Σ x2i = 81.82         Σ x3i = 115.40        Σ x1i² = 394.7255
Σ x2i² = 576.7264     Σ x3i² = 1035.96      Σ x1i yi = 1877.567
Σ x2i yi = 2246.661   Σ x3i yi = 3337.78    Σ x1i x2i = 360.6621
Σ x1i x3i = 522.078   Σ x2i x3i = 728.31    n = 13

    
The normal equations (X′X)β̂ = X′y are

[  13      59.43     81.82    115.4   ] [β0]   [  377.5  ]
[  59.43  394.7255  360.6621  522.078 ] [β1] = [ 1877.567]
[  81.82  360.6621  576.7264  728.31  ] [β2]   [ 2246.661]
[ 115.4   522.078   728.31   1035.96  ] [β3]   [ 3337.78 ]

Using the relation

β̂ = (X′X)⁻¹X′y

where

        [ n        Σ x1i       Σ x2i      ···  Σ xki     ]
X′X  =  [ Σ x1i    Σ x1i²      Σ x1i x2i  ···  Σ x1i xki ]
        [   ⋮                                      ⋮     ]
        [ Σ xki    Σ xki x1i   Σ xki x2i  ···  Σ xki²    ]

with all sums taken over i = 1, …, n,



and  Pn
i=1 yi

P 

 n x y
X y = 1i i 
 i=1.
..

 
Pn
i=1 xki yi



Therefore

        [  13      59.43     81.82    115.4   ]
X′X  =  [  59.43  394.7255  360.6621  522.078 ]
        [  81.82  360.6621  576.7264  728.31  ]
        [ 115.4   522.078   728.31   1035.96  ]



and

        [  377.5   ]
X′y  =  [ 1877.567 ]
        [ 2246.661 ]
        [ 3337.78  ]



Obtaining the inverse of the matrix X′X gives the elements of the inverse matrix, represented by

             [ c00  c01  c02  c03 ]
(X′X)⁻¹  =   [ c10  c11  c12  c13 ]
             [ c20  c21  c22  c23 ]
             [ c30  c31  c32  c33 ]



whose values are

             [  8.0648  −0.0826  −0.0942  −0.7905 ]
(X′X)⁻¹  =   [ −0.0826   0.0085   0.0017   0.0037 ]
             [ −0.0942   0.0017   0.0166  −0.0021 ]
             [ −0.7905   0.0037  −0.0021   0.0886 ]



and then using the relation β̂ = (X′X)⁻¹X′y we obtain the estimated regression coefficients.
The estimated regression equation is

ŷ = 39.1574 + 1.0161x1 − 1.8616x2 − 0.3433x3
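The whole calculation can be checked numerically. This sketch builds the design matrix from the data table above and solves the normal equations directly (using `np.linalg.solve` rather than forming the inverse explicitly, which is numerically preferable):

```python
import numpy as np

# Data from the example (13 observations)
y  = np.array([25.5, 31.2, 25.9, 38.4, 18.4, 26.7, 26.4,
               25.9, 32.0, 25.2, 39.7, 35.7, 26.5])
x1 = np.array([1.74, 6.32, 6.22, 10.52, 1.19, 1.22, 4.10,
               6.32, 4.08, 4.15, 10.15, 1.72, 1.70])
x2 = np.array([5.30, 5.42, 8.41, 4.63, 11.60, 5.85, 6.62,
               8.72, 4.42, 7.60, 4.83, 3.12, 5.30])
x3 = np.array([10.80, 9.40, 7.20, 8.50, 9.40, 9.90, 8.00,
               9.10, 8.70, 9.20, 9.40, 7.60, 8.20])

X = np.column_stack([np.ones(len(y)), x1, x2, x3])  # design matrix with intercept
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)        # beta = (X'X)^{-1} X'y
```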



R² = SSR/SST = 399.45/438.13 = 0.9117

which means that 91.17% of the variation in y is explained by the linear model.

f = (399.45/3)/4.298 ≈ 30.98

This value exceeds the tabulated critical point 6.99 of the F-distribution with 3 and 9 degrees of freedom at the α = 0.01 level.



Although the results indicate that the regression explained by the model is significant, this does not rule out the possibility that a different choice of variables could yield a larger value of the F-statistic.
The model might also have been more effective with the inclusion of additional variables or the deletion of one or more of the current variables.



Exercise
Consider the data below. Determine whether the regression explained by
the model is significant using R² and the F-statistic.
y x1 x2
25.5 1.74 5.30
31.2 6.32 5.42
25.9 6.22 8.41
38.4 10.52 4.63
18.4 1.19 11.60



Forward Selection

It is based on the notion that variables should be inserted one at a time until a satisfactory regression equation is found.
The procedure is as follows.
Step 1
Start with a model with no predictors.
Choose the variable that gives the largest value of R² (equivalently, the largest F-statistic or the lowest p-value). We call this initial variable x1.
Step 2
Choose the variable that, when inserted into the model, gives the largest increase in R² in the presence of x1 over the R² found in Step 1. Call this variable x2.



That is,

R(βj | β1) = R(β1, βj) − R(β1)

is largest.
The regression model with x1 and x2 is then fitted and R² is observed.



Step 3
Choose the variable xj that gives the largest value of

R(βj | β1, β2) = R(β1, β2, βj) − R(β1, β2)

again resulting in the largest increase in R² over that given in Step 2.
Calling this variable x3, we now have a regression model involving x1, x2 and x3.
This process is continued until the most recently inserted variable fails to produce a significant increase in the explained regression.
Such an increase can be determined at each step by using the appropriate F-test or t-test.
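A minimal sketch of this loop, assuming a hypothetical helper `fit_r2` (defined here) that returns R² for a given subset of predictor columns; for simplicity the stopping rule uses a plain R²-gain threshold in place of the formal F- or t-test described above:

```python
import numpy as np

def fit_r2(X, y, cols):
    """R^2 of the least-squares fit of y on an intercept plus X[:, cols]."""
    Z = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

def forward_select(X, y, min_gain=0.01):
    """Greedy forward selection: repeatedly add the column giving the
    largest R^2 gain, stopping when the best gain falls below min_gain."""
    selected, r2 = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        gains = [(fit_r2(X, y, selected + [c]) - r2, c) for c in remaining]
        best_gain, best_c = max(gains)
        if best_gain < min_gain:      # no significant increase: stop
            break
        selected.append(best_c)
        remaining.remove(best_c)
        r2 += best_gain
    return selected
```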



For example, in Step 2 the value

f = R(β2 | β1) / s²

can be computed to test the appropriateness of x2 in the model.



Here the value of s² is the error mean square for the model containing the variables x1 and x2.
s² is estimated using

s² = SSE / (n − k − 1)

Similarly, in Step 3 the ratio

f = R(β3 | β1, β2) / s²

tests the appropriateness of x3 in the model.



Now, however, the value of s² is the error mean square for the model that contains the three variables x1, x2 and x3.
If f < fα(1,n−3) at step 2 for a prechosen significance level, x2 is not
included and the process is terminated, resulting in a simple linear
equation relating y and x1 .
However if f > fα(1,n−3) we proceed to step 3.
Again if f < fα(1,n−4) at step 3, x3 is not included and the process is
terminated with the appropriate regression equation containing the
variables x1 and x2



Backward Elimination

It involves the same concepts as forward selection, except that one begins with all variables in the model.
Suppose, for example, that there are five variables under consideration. The steps are as follows.
Step 1
Fit a regression equation with all five variables included in the model.
Choose the variable that gives the smallest value of the regression sum of squares adjusted for the others.



Suppose that this variable is x2. Remove x2 from the model if

f = R(β2 | β1, β3, β4, β5) / s²

is insignificant.



Step 2
Fit a regression equation using the remaining variables x1, x3, x4 and x5, and again choose the variable with the smallest adjusted regression sum of squares, say x5. Once again, if

f = R(β5 | β1, β3, β4) / s²

is insignificant, the variable x5 is removed from the model.
At each step, the s² used in the F-test is the error mean square for the regression model at that stage.
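A matching sketch of backward elimination; `sse` is a hypothetical helper, and the partial F statistic R(βj | rest)/s² is compared against a rough fixed threshold instead of a looked-up fα value:

```python
import numpy as np

def sse(X, y, cols):
    """Residual sum of squares for y regressed on an intercept plus X[:, cols]."""
    Z = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    return resid @ resid

def backward_eliminate(X, y, f_crit=4.0):
    """Greedy backward elimination: drop the variable with the smallest
    partial F while it is insignificant. f_crit is a rough threshold; a
    real application would use the tabulated f_alpha(1, n - p - 1)."""
    selected = list(range(X.shape[1]))
    while selected:
        full_sse = sse(X, y, selected)
        s2 = full_sse / (len(y) - len(selected) - 1)   # error mean square
        # Partial F for each variable: R(beta_j | rest) / s^2
        fs = [((sse(X, y, [c for c in selected if c != j]) - full_sse) / s2, j)
              for j in selected]
        f_min, j_min = min(fs)
        if f_min > f_crit:           # every variable is significant: stop
            break
        selected.remove(j_min)
    return selected
```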



Exercise
The information below gives the regression sums of squares for linear regression models fitted to data from a sample of 40 individuals, in an attempt to obtain the model that best explains the response variable y based on a subset of (or all) four predictors X1, X2, X3, X4.



model terms SSR model terms SSR model terms SSR
X1 1.1796 X2 0.6391 X3 3.7282
X4 0.2720 X1 , X2 1.797 X1 , X3 5.2472
X1 , X4 1.6728 X2 , X3 3.9379 X2 , X4 0.8872
X3 , X4 4.2086 X1 , X2 , X3 5.4281 X1 , X2 , X4 2.256
X1 , X3 , X4 6.0754 X2 , X3 , X4 4.4442 X1 , X2 , X3 , X4 6.219

Given SST = 8.7837; using forward selection, select the best fitting model.

