General Linear Model: Advanced Methods of Research, Master of Engineering Program Major in Electrical Engineering


General Linear Model

ADVANCED METHODS OF RESEARCH

MASTER OF ENGINEERING PROGRAM MAJOR IN ELECTRICAL ENGINEERING
Objectives:

 How to develop a multiple regression model
 How to interpret the regression coefficients
 How to determine which independent variables (IVs) to include in the regression model
General Linear Model (GLM)
 Simple form of the GLM:

𝑌 = 𝜇 + 𝛼 + 𝜀
Score = Grand mean + Independent variable effect + Error

 The basic idea is that everyone in the population has the same score (the grand mean), which is changed by the effect of an independent variable (A) plus random noise (error).

 Some levels of the IV raise scores above the GM, other levels lower scores below the GM, and still others have no effect.
General Linear Model
 General form:

𝑌𝑖 = 𝛽0 + 𝛽1𝑋1 + 𝛽2𝑋2 + ⋯ + 𝛽𝑘𝑋𝑘 + 𝜀𝑖

Where:
𝛽0 = Y-intercept
𝛽1 = slope of Y with variable 𝑋1
𝛽2 = slope of Y with variable 𝑋2
⋮
𝛽𝑘 = slope of Y with variable 𝑋𝑘
𝜀𝑖 = random error in Y for observation i
Assumptions for a GLM

 The mean of 𝜀 is 0, i.e., E(𝜀) = 0. This implies that the mean of Y is equivalent to the deterministic component of the model.
 For all settings of the IVs, the variance of 𝜀 is constant.
 The probability distribution of 𝜀 is normal.
 The random errors are independent.
Example: Simple Linear Regression

SCORE (Y)   STUDY HABIT (X1)
31          3.5
28          3
35          4
38          4
46          5
22          2
41          4.5
34          3.5
12          1.5
23          3

Develop a prediction equation.

Model form: 𝑌𝑖 = 𝛽0 + 𝛽1𝑋1
Y-intercept, 𝛽0 = −0.0577
Slope, 𝛽1 = 9.135

Therefore, the SLR model is
𝒀𝒊 = −𝟎.𝟎𝟓𝟕𝟕 + 𝟗.𝟏𝟑𝟓𝑿𝟏
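The slide's coefficients can be checked with a short script. The sketch below (not part of the original slides) uses NumPy to compute the least-squares slope and intercept from the table above:

```python
import numpy as np

# Exam scores (Y) and study habits (X1) from the SLR example
x = np.array([3.5, 3, 4, 4, 5, 2, 4.5, 3.5, 1.5, 3])
y = np.array([31, 28, 35, 38, 46, 22, 41, 34, 12, 23], dtype=float)

# Least-squares estimates: b1 = SP(xy) / SS(x), b0 = ybar - b1 * xbar
b1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
b0 = y.mean() - b1 * x.mean()

print(round(b0, 4), round(b1, 3))  # -0.0577 9.135
```

The result matches the slide: Y = −0.0577 + 9.135X1.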


Example: GLM

SCORE (Y)   STUDY HABIT (X1)   DAILY ALLOWANCE (X2)
31          3.5                450
28          3                  350
35          4                  430
38          4                  420
46          5                  450
22          2                  290
41          4.5                500
34          3.5                350
12          1.5                250
23          3                  300

Develop a prediction equation.

Developing the Multiple Linear Regression

• Place the data in matrices in a particular pattern.

𝑌𝑖 = 𝛽0 + 𝛽1𝑋1 + 𝛽2𝑋2 + ⋯ + 𝛽𝑘𝑋𝑘 + 𝜀   or, in matrix form,   𝑌 = 𝑋𝛽 + 𝜀

• The data matrices:

𝑌 = [𝑌1 𝑌2 ⋯ 𝑌𝑛]′   (the n × 1 vector of responses)
𝑋 = the n × (k + 1) matrix whose i-th row is [1 𝑋𝑖1 𝑋𝑖2 ⋯ 𝑋𝑖𝑘], with a leading column of 1s for the intercept
𝛽 = [𝛽0 𝛽1 ⋯ 𝛽𝑘]′
𝜀 = [𝜀1 𝜀2 ⋯ 𝜀𝑛]′
Fitting the Model:
The Method of Least Squares

• Least-squares solution:

(𝑋′𝑋)𝛽 = 𝑋′𝑌, or
𝛽 = (𝑋′𝑋)⁻¹𝑋′𝑌

Where 𝑋′ is the transpose of 𝑋.

Fitting the Model:
The Method of Least Squares

Develop a prediction equation for SCORE (Y) from STUDY HABIT (X1) and DAILY ALLOWANCE (X2).

𝑌 = [31 28 35 38 46 22 41 34 12 23]′

𝑋 (a leading column of 1s for the intercept, then X1 and X2):
1   3.5   450
1   3     350
1   4     430
1   4     420
1   5     450
1   2     290
1   4.5   500
1   3.5   350
1   1.5   250
1   3     300

𝛽 = [𝛽0 𝛽1 𝛽2]′

𝑋′𝑋 =
10     34      3790
34     126     13605
3790   13605   1497900

𝑋′𝑌 = [310 1149 124140]′

𝛽 = (𝑋′𝑋)⁻¹𝑋′𝑌 = [−1.0621 8.6522 0.0070]′

𝑌𝑖 = 𝛽0 + 𝛽1𝑋1 + 𝛽2𝑋2
𝒀 = −𝟏.𝟎𝟔𝟐 + 𝟖.𝟔𝟓𝟐𝑿𝟏 + 𝟎.𝟎𝟎𝟕𝑿𝟐
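As a sketch (using NumPy; not part of the original slides), the normal-equation solution β = (X′X)⁻¹X′Y can be reproduced directly:

```python
import numpy as np

# Data from the slides: score Y, study habit X1, daily allowance X2
y  = np.array([31, 28, 35, 38, 46, 22, 41, 34, 12, 23], dtype=float)
x1 = np.array([3.5, 3, 4, 4, 5, 2, 4.5, 3.5, 1.5, 3])
x2 = np.array([450, 350, 430, 420, 450, 290, 500, 350, 250, 300], dtype=float)

# Design matrix with a leading column of ones for the intercept
X = np.column_stack([np.ones(len(y)), x1, x2])

# Solve the normal equations (X'X) beta = X'Y
beta = np.linalg.solve(X.T @ X, X.T @ y)
print(np.round(beta, 4))  # [-1.0621  8.6522  0.007 ]
```

Solving the normal equations with `np.linalg.solve` is numerically preferable to forming the explicit inverse (X′X)⁻¹, though both give the same coefficients here.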
Using MATLAB:
Using Microsoft Excel:
ANOVA Summary Table for the Overall F Test

Source       df          SS     MS                       F-test
Regression   k           SSR    MSR = SSR / k            F = MSR / MSE
Error        n − k − 1   SSE    MSE = SSE / (n − k − 1)
Total        n − 1       SST

Test for the Significance of the
Overall Multiple Regression Model

• Hypotheses:
𝐻𝑜: 𝛽1 = ⋯ = 𝛽𝑘 = 0 (no linear relationship between the DV and the IVs)
𝐻1: at least one 𝛽 ≠ 0 (a linear relationship exists between the DV and at least one IV)
ANOVA Table:

 Using a 0.05 level of significance, the critical value of the F distribution with 2 and 7 degrees of freedom is approximately 4.74.
 Because 66.595 (computed F) > 4.74 (critical F), reject the null hypothesis.
 Equivalently, because the p-value is less than 0.05, reject the null.

 Conclusion: at least one of the IVs is related to score.
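A minimal sketch of the overall F test for this example, using NumPy (the variable names are illustrative, not from the slides):

```python
import numpy as np

# Same data as the multiple regression example
y  = np.array([31, 28, 35, 38, 46, 22, 41, 34, 12, 23], dtype=float)
x1 = np.array([3.5, 3, 4, 4, 5, 2, 4.5, 3.5, 1.5, 3])
x2 = np.array([450, 350, 430, 420, 450, 290, 500, 350, 250, 300], dtype=float)
X = np.column_stack([np.ones(len(y)), x1, x2])

# Fit the model and form fitted values
beta = np.linalg.solve(X.T @ X, X.T @ y)
yhat = X @ beta

n, k = len(y), 2                      # 10 observations, 2 predictors
sst = ((y - y.mean()) ** 2).sum()     # total sum of squares
sse = ((y - yhat) ** 2).sum()         # error sum of squares
ssr = sst - sse                       # regression sum of squares

f = (ssr / k) / (sse / (n - k - 1))   # F = MSR / MSE
print(round(f, 3))                    # approximately 66.6, as on the slide
```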


REGRESSION, ANALYSIS OF VARIANCE, AND ANALYSIS OF COVARIANCE
 There are several reasons why regression and analysis of variance are applied so frequently. One of the main reasons is that they answer the questions researchers ask of their data. Regression allows researchers to determine if and how variables are related. ANOVA allows researchers to determine whether the mean scores of different groups or conditions differ. Analysis of covariance (ANCOVA), a combination of regression and ANOVA, allows researchers to determine whether the group or condition means differ after the influence of one or more other variables on these scores has been equated across groups.
General Linear Model (GLM)
Error is the "noise" caused by other variables you aren't measuring, haven't controlled for, or are unaware of.
Error, like A, will have different effects on scores, but this happens independently of A.
Most of the effort in research design goes into minimizing error, to make sure the effect of A is not "buried" in the noise.
The error term is important because it gives us a "yardstick" with which to measure the variability caused by the A effect. We want to make sure that the variability attributable to A is greater than the naturally occurring variability (error).
General Linear Model (GLM)
Example of GLM – ANOVA backwards
We can generate a data set using the GLM formula.
We start off with every subject at the GM (e.g., 𝜇 = 5).

a1 a2
Case Score Case Score
s1 5 s6 5
s2 5 s7 5
s3 5 s8 5
s4 5 s9 5
s5 5 s10 5
General Linear Model (GLM)
Then we add in the effect of A (a1 adds 2 points and a2
subtracts 2 points).
a1 a2
Case Score Case Score
s1 5+2=7 s6 5–2=3
s2 5+2=7 s7 5–2=3
s3 5+2=7 s8 5–2=3
s4 5+2=7 s9 5–2=3
s5 5+2=7 s10 5–2=3
ΣY(a1) = 35, ΣY(a2) = 15
ΣY²(a1) = 245, ΣY²(a2) = 45
Ȳ(a1) = 7, Ȳ(a2) = 3
General Linear Model (GLM)
Now if we add in some random variation (error).
a1 a2
Case Score Case Score
s1 5+2+2=9 s6 5–2+0=3
s2 5+2+0=7 s7 5–2–2=1
s3 5+2–1=6 s8 5–2+0=3
s4 5+2+0=7 s9 5–2+1=4
s5 5+2–1=6 s10 5–2+1=4
ΣY(a1) = 35, ΣY(a2) = 15, ΣY = 50
ΣY²(a1) = 251, ΣY²(a2) = 51, ΣY² = 302
Ȳ(a1) = 7, Ȳ(a2) = 3, Ȳ = 5
General Linear Model (GLM)
Now if we calculate the variance for each group:

s²(a1) = [ΣY²(a1) − (ΣY(a1))²/N] / (N − 1) = (251 − 35²/5) / 4 = 1.5

s²(a2) = [ΣY²(a2) − (ΣY(a2))²/N] / (N − 1) = (51 − 15²/5) / 4 = 1.5

The average variance in this case is also going to be 1.5 ((1.5 + 1.5) / 2).
General Linear Model (GLM)
We can also calculate the total variability in the data regardless of treatment group:

s² = [ΣY² − (ΣY)²/N] / (N − 1) = (302 − 50²/10) / 9 = 5.78

The average variability of the two groups is smaller than the total variability.
Analysis – deviation approach
 The total variability can be partitioned into between-group variability and error:

(Yij − GM) = (Yij − Ȳj) + (Ȳj − GM)

 If you ignore group membership and calculate the mean of all subjects, this is the grand mean, and total variability is the deviation of all subjects around this grand mean.
 Remember that raw deviations around the mean sum to zero, which is why we work with squared deviations.
Analysis – deviation approach
 degrees of freedom
DFtotal = N – 1 = 10 -1 = 9
DFA = a – 1 = 2 – 1 = 1
DFS/A = a(S – 1) = a(n – 1) = an – a =
N – a = 2(5) – 2 = 8
 Variance or Mean square
MStotal = 52/9 = 5.78
MSA = 40/1 = 40
MSS/A = 12/8 = 1.5
 Test statistic
F = MSA/MSS/A = 40/1.5 = 26.67
Critical value is looked up with dfA, dfS/A and alpha. The test is
always non-directional.
Analysis – deviation approach
ANOVA summary table

Source SS df MS F
A 40 1 40 26.67
S/A 12 8 1.5
Total 52 9
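The deviation-approach sums of squares above can be verified with a short NumPy sketch (not part of the original slides):

```python
import numpy as np

# Scores with error added, from the slides: a1 adds 2 to the GM of 5, a2 subtracts 2
a1 = np.array([9, 7, 6, 7, 6], dtype=float)
a2 = np.array([3, 1, 3, 4, 4], dtype=float)
y = np.concatenate([a1, a2])

gm = y.mean()                                  # grand mean = 5
ss_total = ((y - gm) ** 2).sum()               # total SS = 52
ss_a = 5 * ((a1.mean() - gm) ** 2 +
            (a2.mean() - gm) ** 2)             # between-group SS (A) = 40
ss_sa = (((a1 - a1.mean()) ** 2).sum() +
         ((a2 - a2.mean()) ** 2).sum())        # within-group SS (S/A) = 12

f = (ss_a / 1) / (ss_sa / 8)                   # F = MS_A / MS_S/A
print(ss_a, ss_sa, round(f, 2))                # 40.0 12.0 26.67
```

The three sums of squares and the F ratio match the ANOVA summary table above.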
Analysis – regression approach
Levels of A Cases Y X YX
S1 9 1 9
S2 7 1 7
a1 S3 6 1 6
S4 7 1 7
S5 6 1 6
S6 3 -1 -3
S7 1 -1 -1
a2 S8 3 -1 -3
S9 4 -1 -4
S10 4 -1 -4
Sum 50 0 20
Sum of squares 302 10
N 10
Mean 5
Analysis – regression approach
Y = a + bX + e
e = Y – Y’

• Sums of squares

SS(Y) = ΣY² − (ΣY)²/N = 302 − 50²/10 = 52
SS(X) = ΣX² − (ΣX)²/N = 10 − 0²/10 = 10
SP(YX) = ΣYX − (ΣY)(ΣX)/N = 20 − (50)(0)/10 = 20
Analysis – regression approach
• Slope

b = SP(YX) / SS(X) = [ΣYX − (ΣY)(ΣX)/N] / [ΣX² − (ΣX)²/N] = 20 / 10 = 2

• Intercept

a = Ȳ − bX̄ = 5 − 2(0) = 5
Analysis – regression approach

Y′ = a + bX
For a1: Y′ = 5 + 2(1) = 7
For a2: Y′ = 5 + 2(−1) = 3
Analysis – regression approach
 Degrees of freedom
df(reg.) = number of predictors = 1
df(total) = number of cases − 1 = 10 − 1 = 9
df(resid.) = df(total) − df(reg.) = 9 − 1 = 8
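A sketch of the regression approach with ±1 group coding, using NumPy (not part of the original slides); it reproduces the slope, intercept, and the same F ratio as the deviation approach:

```python
import numpy as np

# Same data, regression coding: X = +1 for group a1, -1 for a2
y = np.array([9, 7, 6, 7, 6, 3, 1, 3, 4, 4], dtype=float)
x = np.array([1, 1, 1, 1, 1, -1, -1, -1, -1, -1], dtype=float)

n = len(y)
sp_yx = (y * x).sum() - y.sum() * x.sum() / n   # SP(YX) = 20
ss_x = (x ** 2).sum() - x.sum() ** 2 / n        # SS(X) = 10
b = sp_yx / ss_x                                # slope = 2
a = y.mean() - b * x.mean()                     # intercept = 5

ss_reg = b * sp_yx                              # regression SS = 40, same as SS_A
ss_resid = ((y - (a + b * x)) ** 2).sum()       # residual SS = 12, same as SS_S/A
f = (ss_reg / 1) / (ss_resid / 8)               # F with df(reg.) = 1, df(resid.) = 8
print(b, a, round(f, 2))                        # 2.0 5.0 26.67
```

This illustrates the point of the section: coding group membership as a predictor turns the ANOVA into a regression, and the two approaches partition the variability identically.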
SUMMARY:
The general linear model or multivariate regression model is a statistical linear model. It may be written as

score = grand mean + independent variable + error

𝑌 = 𝜇 + 𝛼 + 𝜀

where Y is a matrix with a series of multivariate measurements (each column being a set of measurements on one of the dependent variables). The errors are usually assumed to be uncorrelated across measurements and to follow a multivariate normal distribution. If the errors do not follow a multivariate normal distribution, generalized linear models may be used to relax the assumptions about Y and 𝜀.
END OF
PRESENTATION
