Generalized Linear Model
Badr Missaoui
Logistic Regression
Outline
- Generalized linear models
- Deviance
- Logistic regression
Generalized Linear Model
Motivation
- Classical linear model:
  Y = Xβ + ε,  i.e.  Y ∼ N(Xβ, σ²).
- In many applications the response is not Gaussian; for count data, for instance, we would rather have Y ∼ P(Xβ) (Poisson). Generalized linear models extend the linear model to such settings.
Some standard response distributions (density, mean µ, variance σ²); densities are with respect to counting measure for B(m, p) and P(µ), and Lebesgue measure for N(µ, σ²) and IG(µ, λ):

Law        Density                                      µ     σ²
B(m, p)    C(m, y) p^y (1 − p)^{m−y}                    mp    mp(1 − p)
P(µ)       µ^y e^{−µ} / y!                              µ     µ
N(µ, σ²)   (2πσ²)^{−1/2} exp{−(y − µ)²/(2σ²)}           µ     σ²
IG(µ, λ)   √(λ/(2πy³)) exp{−λ(y − µ)²/(2µ²y)}           µ     µ³/λ
The exponential family
- Each of these densities can be written in the exponential-family form
  f(y; θ, φ) = exp{ (yθ − b(θ))/a(φ) + c(y, φ) },
  where θ is the canonical parameter and φ a dispersion parameter.
- We write
  ℓ(y; θ, φ) = log f(y; θ, φ)
  for the log-likelihood function of Y.
- Using the facts that
  E(∂ℓ/∂θ) = 0  and  Var(∂ℓ/∂θ) = −E(∂²ℓ/∂θ²),
- we have
  E(y) = b′(θ)  and  Var(y) = b″(θ) a(φ).
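Both identities follow directly from the exponential-family form: ∂ℓ/∂θ = (y − b′(θ))/a(φ), so E(∂ℓ/∂θ) = 0 gives E(y) = b′(θ); likewise ∂²ℓ/∂θ² = −b″(θ)/a(φ), so Var(∂ℓ/∂θ) = Var(y)/a(φ)² = b″(θ)/a(φ), i.e. Var(y) = b″(θ) a(φ).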
- Gaussian case:
  f(y; θ, φ) = (1/(σ√(2π))) exp{−(y − µ)²/(2σ²)}
             = exp{ (yµ − µ²/2)/σ² − (1/2)(y²/σ² + log(2πσ²)) }.
  We can write θ = µ, φ = σ², a(φ) = φ, b(θ) = θ²/2 and
  c(y, φ) = −(1/2)(y²/σ² + log(2πσ²)).
- Binomial case:
  f(y; θ, φ) = C(n, y) µ^y (1 − µ)^{n−y}
             = exp{ y log(µ/(1 − µ)) + n log(1 − µ) + log C(n, y) }.
  We can write θ = log(µ/(1 − µ)), b(θ) = −n log(1 − µ) and
  c(y, φ) = log C(n, y).
Estimation
- In the classical linear model the estimate is explicit, β̂ = (X^T X)^{−1} X^T Y. In a GLM, the MLE solves the score equations
  X^T W^{−1} (∂η/∂µ) (y − µ) = 0,
  where W is the diagonal weight matrix with entries W_ii = V(µ_i)(∂η_i/∂µ_i)².
- These equations are non-linear in β and require an iterative method (e.g. Newton-Raphson).
- The Fisher information matrix is ℑ = X^T W^{−1} X.
- Given current estimates η̂⁰ = Xβ̂⁰ and µ̂⁰ = g^{−1}(η̂⁰), form the working response
  Z⁰ = η̂⁰ + (Y − µ̂⁰) (∂η/∂µ)|_{µ=µ̂⁰}.
  So,
  β̂¹ = (X^T W₀^{−1} X)^{−1} X^T W₀^{−1} Z⁰.
  Set η̂¹ = Xβ̂¹, µ̂¹ = g^{−1}(η̂¹). Repeat until changes in β̂^m are sufficiently small.
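As a concrete illustration, here is a minimal IRLS sketch in R for the logistic case (canonical logit link, φ = 1); the function name irls_logit and its arguments are ours, and X is assumed to be a design matrix that includes the intercept column:

# Minimal IRLS sketch for a logistic GLM. For the logit link,
# d(eta)/d(mu) = 1/(mu(1 - mu)) and V(mu) = mu(1 - mu), so the
# entries of W^{-1} reduce to w = mu(1 - mu).
irls_logit <- function(X, y, tol = 1e-8, max_iter = 25) {
  beta <- rep(0, ncol(X))
  for (m in seq_len(max_iter)) {
    eta <- drop(X %*% beta)                  # linear predictor eta = X beta
    mu  <- 1 / (1 + exp(-eta))               # mu = g^{-1}(eta)
    w   <- mu * (1 - mu)                     # diagonal of W^{-1}
    z   <- eta + (y - mu) / w                # working response Z
    beta_new <- drop(solve(t(X) %*% (w * X), t(X) %*% (w * z)))
    if (max(abs(beta_new - beta)) < tol) { beta <- beta_new; break }
    beta <- beta_new
  }
  beta
}
# Sanity check against R's own fitter:
# coef(glm(y ~ X - 1, family = binomial))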
- In theory, β̂^m → β̂ as m → ∞, but in practice the algorithm may fail to converge.
- Under some conditions, for testing H0 : βj = 0,
  β̂_j / √( φ [(X^T W_m^{−1} X)^{−1}]_jj ) ∼ N(0, 1)
  approximately, and if φ is unknown,
  β̂_j / √( φ̂ [(X^T W_m^{−1} X)^{−1}]_jj ) ∼ t_{n−p}.
Goodness-of-Fit
- The scaled deviance for a GLM is
  D*(y, µ̂) = 2[ ℓ(y; y, φ) − ℓ(y; µ̂, φ) ] = D(y, µ̂)/φ,
  where ℓ(y; y, φ) is the log-likelihood of the saturated model and D(y, µ̂) is the (unscaled) deviance.

Tests
- We use the deviance to compare two nested models having p1 and p2 parameters respectively, where p1 < p2. Let µ̂1 and µ̂2 denote the corresponding MLEs. Then, approximately,
  D(y, µ̂1) − D(y, µ̂2) ∼ χ²_{p2−p1}.
- If φ is unknown,
  [ D(y, µ̂1) − D(y, µ̂2) ] / ( (p2 − p1) φ̂ ) ∼ F_{p2−p1, n−p2}
  approximately, and we reject the smaller model when the statistic is large.
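In R these comparisons are carried out with anova() on nested glm fits; a sketch, assuming fitted models fit1 and fit2 with fit2 the larger:

anova(fit1, fit2, test = "Chisq")  # deviance difference vs chi-squared (phi known)
anova(fit1, fit2, test = "F")      # F test when phi has to be estimated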
Diagnostics
- The deviance residuals for a given model are
  d_i = sign(y_i − µ̂_i) √( D*(y_i; µ̂_i) ),
  where D*(y_i; µ̂_i) is the contribution of the i-th observation to the scaled deviance.
- The (standardized) Pearson residuals are defined by
  r_i = (y_i − µ̂_i) / √( (1 − h_ii) V(µ̂_i) ),
  where h_ii is the i-th leverage, i.e. the i-th diagonal entry of the hat matrix.
- The Anscombe residuals are defined as a transformation of the Pearson residuals,
  r_i^A = ( t(y_i) − t(µ̂_i) ) / ( t′(µ̂_i) √( φ V(µ̂_i)(1 − h_ii) ) ),
  for a suitable transformation t.
- Influential points are found using Cook's distance
  C_i = (1/p) (β̂_(i) − β̂)^T X^T W_m^{−1} X (β̂_(i) − β̂) ≈ r_i² h_ii / ( p(1 − h_ii)² ),
  where β̂_(i) is the estimate obtained with the i-th observation removed.
- High-leverage points: if h_ii > 2p/n (or, more conservatively, h_ii > 3p/n), the i-th point is flagged as a potential outlier.
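All of these diagnostics are available in base R for a fitted glm object; a sketch, assuming a fitted model fit:

d  <- residuals(fit, type = "deviance")   # deviance residuals d_i
r  <- residuals(fit, type = "pearson")    # Pearson residuals r_i
h  <- hatvalues(fit)                      # leverages h_ii
cd <- cooks.distance(fit)                 # Cook's distances C_i
p  <- length(coef(fit)); n <- nobs(fit)
which(h > 2 * p / n)                      # flag high-leverage points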
Model Selection
- Model selection can be done using the AIC and BIC.
- Forward, backward and stepwise approaches can be used, as in the sketch below.
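A sketch using base R's step(), assuming a fitted full model fit_full (the backward-elimination output later in these notes comes from a call of this kind); step() minimizes AIC by default, and setting k = log(n) switches the penalty to BIC:

fit_back <- step(fit_full, direction = "backward")   # backward elimination, AIC
fit_both <- step(fit_full, direction = "both",
                 k = log(nobs(fit_full)))            # stepwise, BIC penalty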
Logistic regression
- Logistic regression is a generalization of regression that is used when the outcome Y is binary (0 or 1).
- As an example, assume that
  P(Y_i = 1 | X_i) = e^{β0 + β1 X_i} / (1 + e^{β0 + β1 X_i}).
- Note that
  E(Y_i | X_i) = P(Y_i = 1 | X_i).
- Define the logit function
  logit(z) = log( z / (1 − z) ).
- We can write
  logit(π_i) = β0 + β1 X_i,
  where π_i = P(Y_i = 1 | X_i).
- The extension to several covariates is
  logit(π_i) = β0 + Σ_{j=1}^p βj x_ij.
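In R the model is fitted with glm() and family = binomial (logit link by default); a minimal sketch, assuming a data frame d with a binary outcome y and covariates x1, x2 (hypothetical names):

fit <- glm(y ~ x1 + x2, data = d, family = binomial)
summary(fit)                           # Wald z-tests for the coefficients
head(predict(fit, type = "response"))  # fitted probabilities pi_i-hat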
- For binary data the Pearson residuals are
  (Y_i − π̂_i) / √( π̂_i(1 − π̂_i) ).
Example (R output): logistic regression of chd on the South African heart disease data.

Deviance Residuals:
Min 1Q Median 3Q Max
-1.8320 -0.8250 -0.4354 0.8747 2.5503
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -5.9207616 1.3265724 -4.463 8.07e-06 ***
row.names -0.0008844 0.0008950 -0.988 0.323042
sbp 0.0076602 0.0058574 1.308 0.190942
tobacco 0.0777962 0.0266602 2.918 0.003522 **
ldl 0.1701708 0.0597998 2.846 0.004432 **
adiposity 0.0209609 0.0294496 0.712 0.476617
famhistPresent 0.9385467 0.2287202 4.103 4.07e-05 ***
typea 0.0376529 0.0124706 3.019 0.002533 **
obesity -0.0661926 0.0443180 -1.494 0.135285
alcohol 0.0004222 0.0045053 0.094 0.925346
age 0.0441808 0.0121784 3.628 0.000286 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Backward elimination by AIC (step() output):

 Df Deviance AIC
- alcohol 1 471.17 491.17
- adiposity 1 471.67 491.67
- row.names 1 472.14 492.14
- sbp 1 472.88 492.88
<none> 471.16 493.16
- obesity 1 473.47 493.47
- ldl 1 479.65 499.65
- tobacco 1 480.27 500.27
- typea 1 480.75 500.75
- age 1 484.76 504.76
- famhist 1 488.29 508.29
etc...
Step: AIC=487.69
chd ~ tobacco + ldl + famhist + typea + age
Df Deviance AIC
<none> 475.69 487.69
- ldl 1 484.71 494.71
- typea 1 485.44 495.44
- tobacco 1 486.03 496.03
- famhist 1 492.09 502.09
- age 1 502.38 512.38
Grouped binomial data
- Suppose Y_i ∼ Binomial(n_i, π_i).
- We can fit the logistic model as before:
  logit(π_i) = X_i β.
- Pearson residuals:
  r_i = (Y_i − n_i π̂_i) / √( n_i π̂_i (1 − π̂_i) ).
- Deviance residuals:
  d_i = sign(Y_i − Ŷ_i) √( 2[ Y_i log(Y_i/µ̂_i) + (n_i − Y_i) log( (n_i − Y_i)/(n_i − µ̂_i) ) ] ),
  where µ̂_i = n_i π̂_i.
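In R a grouped binomial response is passed to glm() as a two-column matrix of successes and failures; a sketch, assuming counts y out of n trials per row of a data frame d:

fit <- glm(cbind(y, n - y) ~ x, data = d, family = binomial)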
Goodness-of-Fit test
- The Pearson statistic is
  χ² = Σ_i r_i².
- The deviance is
  D = Σ_i d_i².
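A sketch of both statistics in R for a fitted model fit, with p-values from the approximate χ² reference distribution on the residual degrees of freedom:

X2 <- sum(residuals(fit, type = "pearson")^2)  # Pearson chi-squared
D  <- deviance(fit)                            # deviance
pchisq(c(X2, D), df = df.residual(fit), lower.tail = FALSE)  # p-values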
Example (R summary output):

Deviance Residuals:
Min 1Q Median 3Q Max
-0.70832 -0.29814 0.02996 0.64070 0.91132
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -14.73119 1.83018 -8.049 8.35e-16 ***
x 0.24785 0.03031 8.178 2.89e-16 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Note that the deviance residuals give back the deviance test; here the p-value is large, indicating no evidence of lack of fit.