Fall2012 - Brown - Introduction To Survival Analysis v3

Introduction to Event History Analysis
DUSTIN BROWN
POPULATION RESEARCH CENTER
Objectives
 Introduce event history analysis
 Describe some common survival (hazard) distributions
 Introduce some useful Stata and SAS commands
 Discuss practical issues worth keeping in mind

What is event history analysis?
 A set of statistical techniques used to analyze the time it takes an event to occur within a specified time interval
 Also called survival analysis (demography, biostatistics), reliability analysis (engineering), duration analysis (economics)
 The basic logic behind these methods is from the life table
 Types of “Events” – Mortality, Marriage, Fertility, Recidivism, Graduation, Retirement, etc.
Basic Concepts in Event History Analysis
 Events
 Repeatable vs. Non-Repeatable
 Single vs. Multiple
 Exposure to Risk (i.e., Measuring Time)
 Risk Set
 Discrete vs. Continuous Time
 Censoring
 Hazard & Survival Functions
 Non-Parametric vs. Semi-Parametric vs. Parametric
 Distributional Assumptions (Proportionality)
Basic Concepts: Different Types of Events
 Event (the outcome) - A discrete transition between two “states”
 Non-Repeatable Events
 Transition can occur only once (absorbing state)
 Examples: Alive → Dead, Nulliparous → First Birth
 Repeatable Events
 Transition can occur more than once (non-absorbing state)
 Examples: Married ↔ Divorced, Healthy ↔ Disabled
 Single Events – e.g., Alive → Dead
 Multiple Events – e.g., Alive → Cancer Deaths vs. CVD Deaths
 Methods for competing risks are an extension of those for single events
Basic Concepts: Time (Exposure, Duration)
 Time is the core component of event history analysis
 Risk Set – Individuals¹ at risk of experiencing some event
 Risk exposure occurs in an observation interval (study time)
 The observation interval is when the “clock” begins and ends
 One of two outcomes is possible in the observation interval
 Failure – Event occurs in the interval (e.g., death)
 Censoring – Event does not occur in the interval (e.g., survival)
 Time usually is measured in discrete units (e.g., years, months)
 Time theoretically can be measured in (quasi) continuous units (e.g., hours, minutes, seconds)
¹ Note that the unit of analysis does not necessarily have to be individuals.
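The ideas of a risk set, failure, and censoring can be made concrete with a small sketch. The record layout and the five example units below are hypothetical and are not taken from the slides; they only illustrate one way such data might be arranged.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """One unit of analysis (hypothetical layout, not from the slides)."""
    unit_id: int
    duration: float  # time from the start of observation to failure or censoring
    failed: bool     # True = event occurred in the interval, False = censored


# Made-up example data.
records = [
    Record(1, 2.0, True),
    Record(2, 3.5, False),
    Record(3, 5.0, True),
    Record(4, 5.0, False),
    Record(5, 7.0, False),
]


def risk_set(data, t):
    """Return the ids of units still under observation just before time t."""
    return [r.unit_id for r in data if r.duration >= t]


failures = sum(r.failed for r in records)
censored = len(records) - failures

print(risk_set(records, 3.0))  # units at risk at t = 3
print(failures, censored)
```

The risk set shrinks as units fail or are censored, which is why censored units still contribute exposure up to the moment they leave observation.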
Basic Concepts: Time (Exposure, Duration)
Source: Blossfeld & Rohwer. 2002. Techniques of Event History Modeling, 2nd Ed. (p. 40)
Basic Concepts: Hazard & Survival Functions
 Hazard Function – Instantaneous probability² that an event will occur at time t, conditional on the event not having already occurred.
$$\lambda(t) = \lim_{\Delta t \to 0} \frac{P(t + \Delta t > T \ge t \mid T \ge t)}{\Delta t} = \frac{f(t)}{S(t)}$$
 Also called the Hazard Rate or the Force of Mortality
 With some additional math, you can get the Survival Function (shown here for a constant hazard λ)
$$S(t) = \exp(-\lambda t)$$
² Note that strictly speaking the hazard rate is a probability only in discrete-time models.
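The "additional math" linking the hazard to the survival function can be sketched as follows. This is a standard derivation, assuming T is continuous with density f(t), S(t) = P(T ≥ t), and S(0) = 1; the integration variable u is introduced here for illustration.

```latex
\begin{align*}
f(t) &= -\frac{d}{dt} S(t)
  && \text{(density is minus the slope of the survival curve)} \\
\lambda(t) &= \frac{f(t)}{S(t)} = -\frac{d}{dt} \ln S(t) \\
\ln S(t) &= -\int_0^t \lambda(u)\, du
  && \text{(integrate from 0 to } t \text{, using } S(0) = 1\text{)} \\
S(t) &= \exp\!\left( -\int_0^t \lambda(u)\, du \right) \\
S(t) &= \exp(-\lambda t)
  && \text{(special case: constant hazard } \lambda(u) = \lambda\text{)}
\end{align*}
```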
 Models impose different distributional assumptions on the hazard
 Three basic types of hazard (survival) functions are common
 Each one imposes different amounts of “structure” on the data
 The ultimate decision to use one approach over another should be driven by:
 Your specific research question
 How well the model fits the actual data
 Practical concerns – i.e., difficulty estimating with available software, interpretability, “typical” approach in previous research
 Non-Parametric Models
 No assumptions about the baseline hazard distribution
 Pros: Imposes the least structure, easy to estimate and interpret
 Cons: Difficult to incorporate predictors (mostly descriptive)
 Examples: Kaplan-Meier, Nelson-Aalen, “Classic” Life Table
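As an illustration of how little structure a non-parametric estimator imposes, here is a minimal sketch of the Kaplan-Meier product-limit estimator in plain Python. The function name and the six example durations are assumptions made for this sketch; in practice the packaged routines in SAS, Stata, or R would be used.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(t, S(t))] at each distinct failure time.

    durations: observed times; events: 1 = failure, 0 = censored.
    """
    deaths = Counter(t for t, e in zip(durations, events) if e)
    surv = 1.0
    curve = []
    for t in sorted(deaths):
        # Risk set: everyone whose observed time is at least t.
        at_risk = sum(1 for d in durations if d >= t)
        # Multiply by the conditional probability of surviving past t.
        surv *= 1 - deaths[t] / at_risk
        curve.append((t, surv))
    return curve


# Made-up example: six units, two of them censored.
durations = [2, 3, 3, 5, 6, 8]
events = [1, 1, 0, 1, 0, 1]
km = kaplan_meier(durations, events)

for t, s in km:
    print(t, round(s, 4))
```

Censored units never trigger a drop in the curve, but they stay in the denominator until their censoring time, which is how the estimator uses their partial information.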
 Parametric Models
 Baseline hazard assumed to vary in a specific manner with time
 Pros: Easy to incorporate covariates, gives baseline hazard to calculate rates, smooths “noisy” data
 Cons: Imposes the most structure, need to be sure that the assumed distribution matches the data
 Examples: Weibull (monotonic decrease or increase), Gompertz (exponential increase), Exponential (constant)
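The three example distributions differ only in how the baseline hazard moves with time. The sketch below uses one common parameterization of each hazard; the parameter names (lam, p, gamma) are assumptions for illustration, and other software may parameterize these models differently. It also shows the closed-form maximum likelihood estimate for the exponential model without covariates, which is simply events divided by total exposure.

```python
import math


def exponential_hazard(t, lam):
    """Constant hazard: the same at every t."""
    return lam


def weibull_hazard(t, lam, p):
    """lam * p * t**(p - 1): falls if p < 1, rises if p > 1, constant if p == 1."""
    return lam * p * t ** (p - 1)


def gompertz_hazard(t, lam, gamma):
    """lam * exp(gamma * t): grows exponentially with t when gamma > 0."""
    return lam * math.exp(gamma * t)


def exponential_mle(durations, events):
    """MLE of a constant hazard with no covariates: events / total exposure."""
    return sum(events) / sum(durations)


# Made-up example data: 4 failures over 27 units of exposure.
rate = exponential_mle([2, 3, 3, 5, 6, 8], [1, 1, 0, 1, 0, 1])
print(round(rate, 4))
```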
 Semi-Parametric Models
 Baseline hazard is not pre-determined, but it must be positive.
 Pros: Covariates easily incorporated, less structure than parametric, smooths “noisy” data
 Cons: Does not directly estimate the baseline hazard
 Cannot calculate rates (absolute differences)
 Can only interpret in terms of relative differentials
 Any specification errors are “absorbed” into the coefficients
 Examples: Cox Proportional Hazards (most popular model)
 Proportional Hazards Assumption
 The ratio of hazard rates between groups is constant over time
 Cox models must satisfy this assumption
 So must some parametric models - Weibull, Gompertz, Exponential, etc.
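A quick numerical check can show what proportionality means in practice. The sketch below assumes Weibull hazards with made-up parameters and a hypothetical log hazard ratio; it is an illustration of the assumption, not a formal test of it.

```python
import math


def weibull_hazard(t, lam, p):
    """One common Weibull parameterization: lam * p * t**(p - 1)."""
    return lam * p * t ** (p - 1)


beta = -0.44  # hypothetical log hazard ratio for group B versus group A
times = [0.5, 1, 2, 5, 10]

# Same shape p in both groups, scale multiplied by exp(beta):
# the hazard ratio is exp(beta) at every time point (proportional).
ratios_ph = [
    weibull_hazard(t, 0.02 * math.exp(beta), 1.5) / weibull_hazard(t, 0.02, 1.5)
    for t in times
]

# Different shapes in the two groups: the ratio drifts with time,
# so the proportional hazards assumption fails.
ratios_nonph = [
    weibull_hazard(t, 0.02, 1.2) / weibull_hazard(t, 0.02, 1.5)
    for t in times
]

print([round(r, 3) for r in ratios_ph])
print([round(r, 3) for r in ratios_nonph])
```

When the ratio changes over time, a single coefficient can no longer summarize the group difference, which is why the assumption is worth checking before interpreting a Cox model.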
How To Estimate Hazard Models
 SAS – “lifereg” (parametric models), “phreg” (Cox models), “lifetest” (Kaplan-Meier), other user-written macros available
 Stata³ - “streg” (parametric models), “stcox” (Cox models), “sts list” and “sts graph” (Kaplan-Meier; “sts test” compares survivor functions across groups), other specialty packages as .ado files
 R – package “survival” (“survreg” for parametric models, “coxph” for Cox models, “survfit” for Kaplan-Meier), other specialty packages such as “KMsurv” and “frailtypack”
 SPSS – “coxreg” (Cox models), “km” (Kaplan-Meier), no parametric models available?
³ You must "stset" the data before estimating survival models in Stata. Type "help st" for details.
Stata Example: Exponential Model
Stata Example: Cox PH Models
SAS Example: Cox Proportional Hazards Model
/* Cox model: expos = follow-up time, dead = event indicator (0 = censored) */
proc phreg data = nhis ;
model expos*dead(0) = lths hs scol female age ;
run ;
Analysis of Maximum Likelihood Estimates

Parameter  DF  Parameter Estimate  Standard Error  Chi-Square  Pr > ChiSq  Hazard Ratio
lths        1             0.67266        0.020010    1129.662      <.0001         1.959
hs          1             0.45942        0.019990    528.4468      <.0001         1.583
scol        1             0.35431        0.021880    262.1947      <.0001         1.425
female      1            -0.44060        0.011830    1387.689      <.0001         0.644
age         1             0.08639        0.000428     40766.7      <.0001         1.090
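The Hazard Ratio column is the exponentiated parameter estimate, and subtracting one gives the percentage difference in the hazard. The short sketch below reproduces that arithmetic from the estimates in the table; the variable names in the dictionary are copied from the output, and the percentage interpretation assumes the other covariates are held constant.

```python
import math

# Parameter estimates copied from the PROC PHREG output above.
estimates = {
    "lths": 0.67266,
    "hs": 0.45942,
    "scol": 0.35431,
    "female": -0.44060,
    "age": 0.08639,
}

# Hazard ratio = exp(coefficient).
hazard_ratios = {name: math.exp(b) for name, b in estimates.items()}

# Percent difference in the hazard per one-unit increase in the covariate.
percent_change = {name: 100 * (hr - 1) for name, hr in hazard_ratios.items()}

for name in estimates:
    print(f"{name:7s} HR = {hazard_ratios[name]:.3f}  ({percent_change[name]:+.1f}%)")
```

For example, the female coefficient implies a hazard roughly 36% lower than for males, and each additional year of age implies a hazard about 9% higher.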
Other Issues: Data Structure
 The data structure has important substantive implications
 The models shown here were estimated on individual-level data
 Models estimated on person-period data can be used to answer other substantive questions.
 Easy to calculate various life table functions - central death rates (mx), probabilities of death (qx), etc.
 Easy to incorporate time-varying covariates (age, etc.)
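The difference between individual-level and person-period data can be sketched in a few lines. The function names and the three example people below are hypothetical, and durations are assumed to be whole periods; the per-period proportion computed at the end is a discrete-time hazard (events divided by persons entering the period), not a full life table.

```python
def to_person_period(unit_id, duration, failed):
    """Split one individual-level record into one row per period at risk.

    Returns (unit_id, period, event) tuples. event is 1 only in the final
    period, and only if the unit failed rather than being censored.
    """
    rows = []
    for period in range(1, duration + 1):
        is_last = period == duration
        rows.append((unit_id, period, int(failed and is_last)))
    return rows


def period_rates(rows):
    """Events divided by persons at risk, separately for each period."""
    exposure, events = {}, {}
    for _, period, event in rows:
        exposure[period] = exposure.get(period, 0) + 1
        events[period] = events.get(period, 0) + event
    return {k: events[k] / exposure[k] for k in sorted(exposure)}


# Made-up individual-level records: (id, duration in periods, failed?).
people = [(1, 2, True), (2, 3, False), (3, 1, True)]
rows = [row for person in people for row in to_person_period(*person)]
rates = period_rates(rows)

print(len(rows))
print(rates)
```

Because each period is its own row, a covariate such as age can take a different value in every row, which is what makes time-varying covariates straightforward in this layout.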
Other Issues: Alternative Models
 Always test model assumptions, evaluate model fit, etc.
 Compare how well various models fit the data
 Fit statistics - BIC, AIC, etc.
 Fitted vs. Observed values – Do the distributions overlap?
 Proportionality assumption (proportional hazards models)
 Other approaches often yield equivalent results
 Count models - Poisson models, Negative Binomial models, etc.
 Logistic Regression – Similar to Cox models, especially when few observations are censored
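Fit statistics such as AIC and BIC are simple functions of a model's maximized log-likelihood. The sketch below uses the textbook definitions; the log-likelihood values and sample size are made up for illustration, and the choice of n for BIC in survival data (observations versus events) is itself an assumption that varies across software.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (smaller is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln(n) - 2 ln L (smaller is better)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical maximized log-likelihoods for two competing models
# fit to the same 50 observations.
exponential_fit = {"loglik": -120.0, "k": 1}
weibull_fit = {"loglik": -117.5, "k": 2}

for name, fit in [("exponential", exponential_fit), ("weibull", weibull_fit)]:
    print(name, aic(fit["loglik"], fit["k"]), round(bic(fit["loglik"], fit["k"], 50), 2))
```

In this made-up comparison the Weibull model has the lower AIC, so its extra shape parameter would be judged worth the added complexity.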
