
Midterm Review

STA216: Generalized Linear Models

Primary Topics Covered So Far:

1. GLM Basics

(a) Definition of the exponential family,

    f(y_i; θ_i, φ) = exp{ [y_i θ_i − b(θ_i)] / a(φ) + c(y_i, φ) },

where θ_i and φ are location and scale parameters, respectively. The mean and variance are

    E(y_i) = b′(θ_i) and V(y_i) = b′′(θ_i) a(φ).

(b) Systematic component and link function:

    η_i = g(µ_i) = x_i′β,

where η_i is the linear predictor, g(·) is the link function, and x_i is the vector of predictors.

(c) Canonical link: θ_i = η_i. This gives the identity, logit, and log links for the normal, Bernoulli, and Poisson distributions, respectively.
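As a quick sanity check on the identities E(y_i) = b′(θ_i) and V(y_i) = b′′(θ_i)a(φ), here is a short numerical sketch for the Poisson case, where b(θ) = e^θ and a(φ) = 1, using finite differences:

```python
import math

# Poisson in exponential-family form: theta = log(mu), b(theta) = exp(theta),
# a(phi) = 1, so E(y) = b'(theta) and V(y) = b''(theta) should both equal mu.
def b(theta):
    return math.exp(theta)

def deriv(f, x, h=1e-5):
    # central finite difference
    return (f(x + h) - f(x - h)) / (2 * h)

theta = math.log(3.0)                      # mu = 3
mean = deriv(b, theta)                     # approximates b'(theta)
var = deriv(lambda t: deriv(b, t), theta)  # approximates b''(theta)
print(round(mean, 3), round(var, 3))       # both close to 3.0
```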

2. Basics of Frequentist Inference

(a) Maximize the likelihood using iteratively reweighted least squares (don't need to memorize details!)

(b) Analysis of deviance:

• Scaled deviance = 2 × the difference between the log likelihoods of the saturated model and the current model.

• Exact χ²_{n−p} distribution for normal data; otherwise only an asymptotic approximation, which is often poor.

• However, the χ² approximation typically works well for the difference in scaled deviance between nested models.

(c) Standard errors and confidence intervals can be based on the normal approximation at the MLE.

(d) For variable selection, stepwise procedures are often used: forward, backward, goodness-of-fit criteria, disadvantages?
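A minimal IRLS sketch for logistic regression (the canonical-link case) on simulated data; the sample size and true coefficients are illustrative choices, not from the notes. The update β ← (X′WX)⁻¹X′Wz, with working response z and weights w = µ(1 − µ), is the standard form:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data with known coefficients (illustrative values)
n, beta_true = 2000, np.array([-0.5, 1.0])
X = np.column_stack([np.ones(n), rng.normal(size=n)])
p = 1 / (1 + np.exp(-X @ beta_true))
y = rng.binomial(1, p)

# IRLS for the logit link: z = eta + (y - mu)/w is the working response
beta = np.zeros(2)
for _ in range(25):
    eta = X @ beta
    mu = 1 / (1 + np.exp(-eta))
    w = mu * (1 - mu)
    z = eta + (y - mu) / w
    beta_new = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * z))
    if np.max(np.abs(beta_new - beta)) < 1e-8:
        beta = beta_new
        break
    beta = beta_new

print(beta)  # close to beta_true for large n
```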

3. Basics of Bayesian Inference in GLMs

(a) Definitions of the prior, posterior distribution, marginal likelihood, posterior probability, credible intervals, and Bayes factors.

(b) Should be able to derive posterior distributions for regression parameters and the error precision in normal linear models with conjugate priors.

(c) Basics of implementing a Markov chain Monte Carlo algorithm with Gibbs and/or Metropolis-Hastings steps.

(d) Commonly used algorithms for Gibbs sampling in GLMs: adaptive rejection sampling and data augmentation; basic details (don't need to memorize the steps of adaptive rejection sampling) and motivation.

(e) How to do inferences based on the posterior in applied problems.
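As an illustration of item (b), here is a sketch of Gibbs sampling for a normal linear model with conditionally conjugate priors; the prior hyperparameters and simulated truth are arbitrary choices for the example:

```python
import numpy as np

rng = np.random.default_rng(1)

# y = X beta + e, e ~ N(0, 1/tau); priors beta ~ N(b0, B0), tau ~ Gamma(a0, d0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true, tau_true = np.array([1.0, -2.0]), 4.0
y = X @ beta_true + rng.normal(scale=tau_true ** -0.5, size=n)

b0, B0inv = np.zeros(2), np.eye(2) * 0.01   # vague prior (assumed values)
a0, d0 = 0.01, 0.01
beta, tau = np.zeros(2), 1.0
draws = []
for it in range(2000):
    # beta | tau, y : multivariate normal full conditional
    prec = B0inv + tau * X.T @ X
    cov = np.linalg.inv(prec)
    mean = cov @ (B0inv @ b0 + tau * X.T @ y)
    beta = rng.multivariate_normal(mean, cov)
    # tau | beta, y : Gamma(a0 + n/2, d0 + SSR/2) full conditional
    resid = y - X @ beta
    tau = rng.gamma(a0 + n / 2, 1 / (d0 + resid @ resid / 2))
    if it >= 500:
        draws.append(np.concatenate([beta, [tau]]))
draws = np.array(draws)
print(draws.mean(axis=0))  # posterior means near (1, -2, 4)
```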

4. Latent Variable Models for Binary Data

(a) How to induce a regression model on a binary or categorical response by defining an underlying continuous variable and a threshold link.

(b) Using an underlying normal specification of probit models, and implementing the Albert and Chib (1993) data augmentation algorithm.

(c) Alternative latent variable models: underlying Poisson, etc.
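A sketch of the Albert and Chib (1993) data augmentation sampler for probit regression; the prior, sample size, and true coefficients are illustrative assumptions:

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(2)

# Simulated probit data (illustrative values, not from the notes)
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.3, -1.0])
y = (X @ beta_true + rng.normal(size=n) > 0).astype(float)

# Introduce z_i ~ N(x_i' beta, 1) with y_i = 1(z_i > 0), then alternate
# between the two normal full conditionals.
B0inv = np.eye(2) * 0.01        # vague N(0, 100 I) prior on beta (assumed)
cov = np.linalg.inv(B0inv + X.T @ X)
beta = np.zeros(2)
keep = []
for it in range(1500):
    mu = X @ beta
    # z_i | beta, y_i: N(mu_i, 1) truncated below (above) 0 for y_i = 1 (0)
    lo = np.where(y == 1, -mu, -np.inf)
    hi = np.where(y == 1, np.inf, -mu)
    z = mu + truncnorm.rvs(lo, hi, random_state=rng)
    # beta | z: multivariate normal (prior mean zero)
    beta = rng.multivariate_normal(cov @ (X.T @ z), cov)
    if it >= 500:
        keep.append(beta)
print(np.mean(keep, axis=0))  # near beta_true
```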

5. Using Hierarchical GLMs for Correlated Data

(a) Definition of a typical generalized linear mixed model (GLMM) that incorporates "random effects" for a subvector of the regression parameters.

(b) Conjugate prior distributions and Gibbs sampling in the normal case.

(c) Problems with improper priors and mixing.

(d) How to implement Gibbs sampling in non-normal cases, probit and otherwise.

(e) Deriving correlations between normal and underlying normal variables as a function of variance components.

6. Survival Analysis

(a) Definitions of the different types of censoring, the hazard function (discrete and continuous time), survival function, cumulative hazard, and the standard relationships among them.

(b) Proportional hazards model and accelerated failure time model: definitions and parameter interpretation.

(c) Standard parametric models: constant hazard (exponential), piecewise constant hazard (piecewise exponential), and Weibull.

(d) Cure rate models: definition and the data augmentation trick for posterior computation.
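The standard relationship S(t) = exp{−H(t)}, where H(t) = ∫₀ᵗ h(u) du, can be checked numerically; here for a Weibull hazard with arbitrary illustrative shape and scale:

```python
import math

# Weibull hazard h(t) = (k/lam) * (t/lam)**(k-1); then H(t) = (t/lam)**k
# and S(t) = exp(-H(t)). Verify S(t) = exp(-integral of h) numerically.
k, lam, t = 1.7, 2.0, 1.5

def h(u):
    return (k / lam) * (u / lam) ** (k - 1)

m = 200_000
H = sum(h((i + 0.5) * t / m) for i in range(m)) * (t / m)  # midpoint rule
S_numeric = math.exp(-H)
S_closed = math.exp(-((t / lam) ** k))
print(round(S_numeric, 6), round(S_closed, 6))  # the two agree
```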

7. Missing Data

(a) Standard strategies and their associated assumptions.

(b) Pattern mixture and selection model definitions.

(c) MCAR, MAR, informative missingness, non-ignorability, and identifiability issues.

(d) Using data augmentation MCMC to account for missing response data under MAR.

(e) Accounting for missing covariates.

8. Discrete Time Survival Analysis

(a) Definition, and converting a given continuous-time model (e.g., proportional hazards) to discrete time.

(b) Defining a discrete time survival model in terms of a binary response likelihood and a GLM.

(c) Incorporating time-varying covariates and time-varying coefficients.

(d) Smoothing the baseline hazard function.

(e) Continuation-ratio probit models: issues in computation and inference, and practical justification.

(f) Incorporating parameter restrictions: details of the prior specification, posterior implementation, and motivation.

9. Alternatives to Hierarchical Models for Multivariate Binary Data: Bayesian Logistic Regression

(a) Incorporation of random effects induces a correlation structure on multiple binary response data.

(b) Alternatively, one can define a multivariate distribution (e.g., multivariate normal, multivariate logistic) for a vector of underlying outcomes.

(c) Issues in parameter interpretation and computation.

(d) Differences between subject-specific and marginal parameter interpretations.

10. Interval Censored Data

(a) The timing of examinations differs between subjects, and we only know whether the event occurred between examinations.

(b) How to analyze data of this type using a discrete time Bayesian survival analysis.

(c) In particular, provide details on using data augmentation to simplify the analysis.

(d) How to extend this approach to incorporate surrogate data on the latency time between event occurrence and detection at an examination (e.g., tumor number/size in the uterine fibroid example)?

(e) How to use a 3-state illness-death model to account for informative censoring (e.g., of tumor onset by death), and what assumptions are required?

11. Bayesian Variable Selection and Order Restricted Inference

(a) First suppose we have normal or underlying normal (probit) data and are interested in inferences on the effects of ordered categorical predictors.

(b) What are the possibilities for addressing this problem?

(c) Define a mixture prior for addressing this problem and describe posterior computation and inferences.

(d) How does this approach relate to variable selection via SSVS algorithms?

12. Poisson Log-Linear and Logistic Regression Cases

(a) Now suppose the data are Poisson distributed counts following a log-linear model.

(b) Describe a Poisson-gamma hierarchical model which accommodates over-dispersion relative to the Poisson distribution.

(c) Generalize this model to account for dependency among multiple observations from an individual.

(d) For categorical covariates, define a conditionally-conjugate prior distribution for the regression coefficients.

(e) Modify this prior for variable selection and for inferences on ordered categorical covariates.

(f) Use an underlying Poisson modeling strategy to apply this same approach to logistic regression and complementary log-log models for categorical outcomes.
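For item (b), a quick simulation confirming that the Poisson-gamma mixture is over-dispersed: if y | λ ~ Poisson(µλ) with λ ~ Gamma(φ, rate φ), then marginally Var(y) = µ + µ²/φ > µ. The parameter values below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(6)

# y_i | lambda_i ~ Poisson(mu * lambda_i), lambda_i ~ Gamma(phi, rate phi)
# (mean 1), so marginally Var(y) = mu + mu^2/phi, exceeding the mean mu.
mu, phi, n = 5.0, 2.0, 200_000
lam = rng.gamma(phi, 1 / phi, n)
y = rng.poisson(mu * lam)
print(y.mean(), y.var())   # mean ~ mu = 5, variance ~ mu + mu^2/phi = 17.5
```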

13. Bayesian Generalized Additive Models with Constraints

(a) Focusing again on the normal or underlying normal data cases, consider models that allow the regression function to be unknown.

(b) Define a generalized additive model, and place a prior on the unknown regression function(s), with or without monotonicity constraints.

(c) Place a hyperprior on the unknown smoothing parameters, show that the prior is conditionally-conjugate, and discuss properties, including details of posterior computation.

(d) How to apply this to real problems, conceptually?

Example Exam Problem Set

Question 1:

Suppose that 2500 pregnant women are enrolled in a study and the outcome is the occurrence of preterm birth. Possible predictors of preterm birth include age of the woman, smoking, socioeconomic status, body mass index, bleeding during pregnancy, serum level of DDE, and several dietary factors. Formulate the problem of selecting the important predictors of preterm birth in a generalized linear model (GLM) framework. Show the components of the GLM, including the link function and distribution (in exponential family form). Describe (briefly) how estimation and inference could proceed via a frequentist approach.

Possible Solution:

Let y_i = 1 if woman i has a preterm birth and y_i = 0 otherwise (i = 1, . . . , n), with

    y_i ∼ Bernoulli(π_i).

Probability mass function:

    f(y_i; π_i) = π_i^{y_i} (1 − π_i)^{1−y_i}
                = exp{ y_i log π_i + (1 − y_i) log(1 − π_i) }
                = exp{ y_i log[π_i / (1 − π_i)] + log(1 − π_i) }
                = exp{ [y_i θ_i − b(θ_i)] / a(φ) + c(y_i, φ) },

where

    θ_i = log[π_i / (1 − π_i)], b(θ_i) = log(1 + e^{θ_i}), a(φ) = φ = 1,

and c(y_i, φ) = 0.
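These identities can be checked numerically; the sketch below compares the Bernoulli pmf computed directly against the exponential-family form:

```python
import math

# Check the exponential-family form of the Bernoulli pmf:
# theta = logit(pi), b(theta) = log(1 + e^theta), a(phi) = 1, c(y, phi) = 0.
pi = 0.3
theta = math.log(pi / (1 - pi))
b = math.log(1 + math.exp(theta))
for y in (0, 1):
    direct = pi ** y * (1 - pi) ** (1 - y)        # pi^y (1 - pi)^(1 - y)
    expfam = math.exp(y * theta - b)              # exp{y*theta - b(theta)}
    print(y, round(direct, 6), round(expfam, 6))  # the two forms agree
# Also E(y) = b'(theta) = e^theta / (1 + e^theta), which equals pi.
```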

Link function:

Any mapping from ℝ → [0, 1] can serve as the inverse link. A convenient choice is the canonical link,

    η_i = θ_i = log[π_i / (1 − π_i)],

which is the logit. The probit and complementary log-log links are alternatives.

Frequentist Estimation:

Maximum likelihood estimates can be obtained for a given model, say

    log[π_i / (1 − π_i)] = x_i′β

(where x_i is a p × 1 vector of predictors), by iteratively reweighted least squares.

Frequentist Inference:

One can select the important predictors to be included in the model by stepwise selection, using the AIC or BIC criterion.

Alternatively, one can just fit the model with all the predictors and then do inferences based on the MLEs and asymptotic standard errors. For example, for continuous predictors included as linear terms in the model, we can do a Wald test. Alternatively, we could do an analysis of deviance (see notes for details) to test for significant differences in fit between the nested models with and without a particular predictor.

Question 2:

Women are enrolled in a study when they go off contraception with the intention of achieving a pregnancy. Suppose there are 350 women in the study who provide information on the number of menstrual cycles required to achieve a pregnancy, whether or not they smoke cigarettes, and their age at the beginning of the attempt. Describe a statistical model for addressing the question: Is cigarette smoking related to time to pregnancy? Formulate the statistical model within a Bayesian framework and outline the details of model fitting and inference (including the form of the posterior density, an outline of the algorithm for posterior computation, and the approach for addressing the scientific question based on the posterior).

Discrete time survival analysis:

Let y_ij = 1 if woman i conceives in cycle j and y_ij = 0 otherwise.
Let r_ij = 1 if woman i is attempting pregnancy in cycle j and r_ij = 0 otherwise.

Discrete hazard of conception in cycle j:

    λ_ij = h(α_j + x_i′β),

where α_j is an intercept parameter and x_i = (smk_i, age_i)′.

Assuming a continuation-ratio probit model, the likelihood is:

    ∏_{i=1}^{350} ∏_{j=1}^{T} [ Φ(α_j + x_i′β)^{y_ij} {1 − Φ(α_j + x_i′β)}^{1−y_ij} ]^{r_ij}

We complete the Bayesian specification of the model with prior densities for α = (α_1, . . . , α_T)′ and β,

    1(α_1 > α_2 > . . . > α_T) { ∏_{j=1}^{T} N(α_j; α_{0j}, σ²_{αj}) } N(β; β_0, Σ_β),

where the order constraint models the selection process in which more fertile couples conceive rapidly.

To simplify posterior computation, we augment the observed data with underlying normal variables. In particular, let y_ij = 1(z_ij > 0), where the z_ij ∼ N(α_j + x_i′β, 1) are independent. Posterior computation can then proceed via the following Gibbs sampling algorithm:

1. Choose initial values for α and β.

2. Sample z_ij, for all i, j with r_ij = 1, from its full conditional density, which is N(α_j + x_i′β, 1) truncated below (above) by 0 for y_ij = 1 (y_ij = 0).

3. Sample α_j, for j = 1, . . . , T, from its full conditional density, which is N(α̂_j, σ̂²_{αj}) truncated so that α_j ∈ (α_{j+1}, α_{j−1}).

4. Sample β from its multivariate normal full conditional density.

5. Repeat steps 2-4 until apparent convergence, and calculate posterior summaries based on a large number of additional draws.
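A sketch of this sampler on simulated data; the sample size (n = 300, T = 6), priors, and true parameter values are illustrative assumptions, not part of the problem:

```python
import numpy as np
from scipy.stats import norm, truncnorm

rng = np.random.default_rng(3)

# --- simulate discrete time-to-pregnancy data (all truths illustrative) ---
n, T = 300, 6
alpha_true = np.linspace(0.0, -1.0, T)        # decreasing baseline fertility
beta_true = np.array([-0.5, 0.1])             # (smoking, age) effects
X = np.column_stack([rng.binomial(1, 0.3, n), rng.normal(size=n)])
y = np.zeros((n, T))
risk = np.zeros((n, T), dtype=bool)
for i in range(n):
    for j in range(T):
        risk[i, j] = True                     # woman i still attempting in cycle j
        if rng.random() < norm.cdf(alpha_true[j] + X[i] @ beta_true):
            y[i, j] = 1.0
            break                             # conception; leaves the risk set

# --- Gibbs sampler (steps 2-4 above) ---
a0, s2a = np.zeros(T), 1.0                    # N(0, 1) priors on alpha_j (assumed)
Sb_inv = np.eye(2) / 4.0                      # N(0, 4I) prior on beta (assumed)
alpha = np.linspace(0.5, -0.5, T)             # initial values satisfy the ordering
beta = np.zeros(2)
beta1_draws = []
for it in range(1500):
    # step 2: z_ij truncated below (above) 0 when y_ij = 1 (0)
    mu = alpha[None, :] + (X @ beta)[:, None]
    lo = np.where(y == 1, -mu, -np.inf)
    hi = np.where(y == 1, np.inf, -mu)
    z = mu + truncnorm.rvs(lo, hi, random_state=rng)
    # step 3: alpha_j normal, truncated to (alpha_{j+1}, alpha_{j-1})
    for j in range(T):
        r = risk[:, j]
        prec = 1 / s2a + r.sum()
        m = (a0[j] / s2a + (z[r, j] - X[r] @ beta).sum()) / prec
        upper = alpha[j - 1] if j > 0 else np.inf
        lower = alpha[j + 1] if j < T - 1 else -np.inf
        s = prec ** -0.5
        alpha[j] = m + s * truncnorm.rvs((lower - m) / s, (upper - m) / s,
                                         random_state=rng)
    # step 4: beta from its multivariate normal full conditional
    prec_b = Sb_inv.copy()
    rhs = np.zeros(2)
    for j in range(T):
        r = risk[:, j]
        prec_b += X[r].T @ X[r]
        rhs += X[r].T @ (z[r, j] - alpha[j])
    cov = np.linalg.inv(prec_b)
    beta = rng.multivariate_normal(cov @ rhs, cov)
    if it >= 500:
        beta1_draws.append(beta[0])

beta1_draws = np.array(beta1_draws)
print(beta1_draws.mean(), (beta1_draws < 0).mean())  # estimate of Pr(beta_1 < 0 | data)
```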

Our primary goal is to address the question: Is cigarette smoking related to time to pregnancy?

Based on the Gibbs iterates, we can estimate Pr(β_1 < 0 | data). If this posterior probability is high (say, greater than 95%), then we have strong evidence that the hazard of conception is lower for smokers than non-smokers (at least in the population).

This implies that smoking may be associated with an increased time to pregnancy.

To characterize the magnitude of this effect, we could estimate the time to pregnancy distribution for smokers and non-smokers, and obtain credible intervals for these estimates.

Question 3:

A study is conducted examining the impact of alcohol intake during pregnancy on the occurrence of birth defects of 5 different types. Outcome data for a child consist of 5 binary indicators of the presence or absence of the different birth defects. A physician working with you on the study notes that certain children have several birth defects, possibly due to defects in important unmeasured genes, while most children have no defects. Describe a latent variable model for analyzing these data and outline (briefly) the details of a Bayesian analysis (including the form of the posterior density, an outline of the algorithm for posterior computation, and the approach for addressing the scientific question based on the posterior).

Multiple binary outcomes:

y_i = (y_i1, . . . , y_i5)′, where y_ij = 1 if child i has the jth defect and 0 otherwise.

Probit regression model:

    Pr(y_ij = 1 | ξ_i, x_i) = Φ(α_j + x_i′β_j + λ_j ξ_i),

where α_j is an intercept parameter for defect j, x_i is a vector of predictors (level of alcohol, age of mother, etc.), β_j is a vector of coefficients specific to the jth defect, ξ_i ∼ N(0, 1) is a latent variable measuring the genetic susceptibility of child i, and λ_j is a factor loading relating overall genetic susceptibility to the jth defect.

Note that we don't need to assume a probit link; we could use other link functions.

We complete the Bayesian specification of the model with prior distributions for α = (α_1, . . . , α_5)′, β = (β_1′, . . . , β_5′)′, and λ = (λ_1, . . . , λ_5)′:

    (α′, β′)′ ∼ N(µ, Σ) and λ_j ∼ N(λ_{0j}, σ²_{λj}) truncated below by 0.

Posterior computation can proceed via a Gibbs sampler:

1. Sample the underlying variables from their full conditionals,

    z_ij ∼ N(α_j + x_i′β_j + λ_j ξ_i, 1),

truncated below (above) by 0 for y_ij = 1 (y_ij = 0).

2. Sample (α′, β′)′ from its multivariate normal full conditional.

3. Sample ξ_i from its normal full conditional.

4. Sample each λ_j from its truncated normal full conditional.

To assess the effect of alcohol intake on the different defects, estimate the marginal posterior densities of β_j1, for j = 1, . . . , 5.

For defects having high values of Pr(β_j1 > 0 | data), we have evidence of a positive association with alcohol intake.
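A compact sketch of this sampler with a single alcohol covariate; all priors, dimensions, and simulation truths are illustrative assumptions. Here (α_j, β_j) is updated per defect rather than jointly across defects, which is an equally valid Gibbs blocking:

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(4)

# --- simulate 5 binary defect indicators with a shared latent factor ---
n, J = 500, 5
x = rng.gamma(2.0, 1.0, n)                     # alcohol intake (illustrative)
alpha_t = np.full(J, -1.5)
beta_t = np.array([0.6, 0.4, 0.0, 0.0, 0.3])   # per-defect alcohol effects
lam_t = np.array([0.8, 0.8, 1.0, 0.5, 0.6])    # positive factor loadings
xi_t = rng.normal(size=n)                      # genetic susceptibility
eta = alpha_t + np.outer(x, beta_t) + np.outer(xi_t, lam_t)
y = (eta + rng.normal(size=(n, J)) > 0).astype(float)

W = np.column_stack([np.ones(n), x])
P0 = np.eye(2) / 4.0             # N(0, 4I) prior on (alpha_j, beta_j) (assumed)
alpha, beta = np.zeros(J), np.zeros(J)
lam, xi = np.ones(J), np.zeros(n)
keep_beta = []
for it in range(1500):
    # step 1: underlying variables z_ij, truncated by 0 according to y_ij
    mu = alpha + np.outer(x, beta) + np.outer(xi, lam)
    lo = np.where(y == 1, -mu, -np.inf)
    hi = np.where(y == 1, np.inf, -mu)
    z = mu + truncnorm.rvs(lo, hi, random_state=rng)
    # step 3: xi_i | rest, combining the N(0, 1) prior with J likelihood terms
    prec = 1.0 + (lam ** 2).sum()
    m = (z - alpha - np.outer(x, beta)) @ lam / prec
    xi = m + rng.normal(size=n) / np.sqrt(prec)
    for j in range(J):
        # step 2: (alpha_j, beta_j) | rest is bivariate normal
        resid = z[:, j] - lam[j] * xi
        cov = np.linalg.inv(P0 + W.T @ W)
        alpha[j], beta[j] = rng.multivariate_normal(cov @ (W.T @ resid), cov)
        # step 4: lam_j | rest, N(1, 1) prior (assumed), truncated below by 0
        pl = 1.0 + (xi ** 2).sum()
        ml = (1.0 + xi @ (z[:, j] - alpha[j] - beta[j] * x)) / pl
        s = pl ** -0.5
        lam[j] = ml + s * truncnorm.rvs((0 - ml) / s, np.inf, random_state=rng)
    if it >= 500:
        keep_beta.append(beta.copy())

keep_beta = np.array(keep_beta)
print((keep_beta > 0).mean(axis=0))  # Pr(beta_j > 0 | data), one value per defect
```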

Question 4:

A toxicology study is conducted in which pregnant mice are exposed to different doses of a chemical. The outcome data consist of an ordinal ranking of the sickness of each pup in each litter, with 1 = healthy, 2 = low birth weight but otherwise healthy, 3 = malformed, and 4 = dead. The goal of the study is to see if dose is associated with the health of the pups. Describe a model and analytic strategy. What is the interpretation of the model parameters? What assumptions are being made, and can they be relaxed?

Let y_ij ∈ {1, 2, 3, 4} be the outcome for the jth pup in the ith litter, and let x_i be the dose of the test chemical.

A possible model for relating the ordinal ranking of pup health to dose, while allowing for within-litter dependency, would be

    Pr(y_ij ≤ k | x_i, b_i) = Φ(α_k − βx_i − b_i),

where α_k is an intercept parameter or cutpoint on a latent normal density, β is a slope parameter characterizing the effect of dose, and b_i ∼ N(0, ψ^{−1}) is a litter-specific latent variable (random effect).

Strategy:

1. Formulate this generalized probit model as an underlying normal regression model.

2. Define conditionally-conjugate priors, restricting α_1 = 0 and the α's to be increasing, for identifiability and so that the distribution function of Y is a proper distribution function.

3. Proceed with posterior computation via Albert and Chib (1993), providing some details.

The parameter of primary interest is the slope parameter, β, which is interpretable as the increase in the underlying normal mean attributable to a unit increase in dose.

One assumption being made is that a single parameter can be used to characterize the shift in the distribution of Y as dose changes. Potentially the shape of the distribution may be completely different, and one may need β's specific to k.

Another assumption is that we have non-informative cluster size; that is, the number of pups within a litter is not informative. To address this, we could include cluster size as a covariate or even as a separate outcome.
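To illustrate the parameter interpretation, a short simulation from this model (the cutpoints, β, and ψ below are illustrative values) showing that a higher dose shifts pups toward the sicker categories:

```python
import numpy as np

rng = np.random.default_rng(5)

# Simulate pup outcomes from Pr(y <= k | x, b_i) = Phi(alpha_k - beta*x - b_i):
# equivalently, a latent u ~ N(beta*x + b_i, 1) thresholded at the cutpoints.
alpha = np.array([0.0, 0.8, 1.6])   # cutpoints for k = 1, 2, 3 (y in {1,..,4})
beta, psi = 1.0, 4.0                # dose slope and random-effect precision

def simulate_litters(dose, n_litters=400, pups=8):
    out = []
    for _ in range(n_litters):
        b = rng.normal(0, psi ** -0.5)               # litter random effect
        u = rng.normal(dose * beta + b, 1.0, pups)   # latent pup-level normals
        out.extend(1 + np.searchsorted(alpha, u))    # threshold to y in 1..4
    return np.array(out)

low, high = simulate_litters(0.0), simulate_litters(1.0)
print(low.mean(), high.mean())  # higher dose yields a higher (sicker) mean rank
```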
