Epidemiology - CHEAT SHEET
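2x2 table (E = exposed, UE = unexposed, O+ = outcome present, O- = outcome absent)
        E    UE
O+      a    c
O-      b    d

Ratio: events or people (A) / events or people (B). Units: none. Use: descriptive.
Proportion: A / (A + B). Units: %.
Rate: # events / person-time at risk. Units: per person-years.
Risk: # new events / # at risk. Units: %. Use: frequency of event.
Prevalence: # existing cases / # in population (point = at a specific time point; period = over a time interval), OR Incidence x Disease Duration. Use: burden of disease in popn.
Cumulative/Crude Incidence (CD): # new cases / # at risk at start of period.
Incidence Density (ID): # new cases / total person-time at risk. Units: per person-years. Use: disease causation.
Morbidity rate: # new cases of disease / popn at risk.
Mortality rate: # deaths / popn at risk. Units: per person-years.
Case-fatality: # deaths from disease / # cases of disease.
Attack rate: # new cases / # at risk during the outbreak period.
Years of Potential Life Lost (YPLL): sum over deaths of (predetermined age at death - age at death); predetermined standard = 65. Units: years. Use: premature mortality index; informs research/resource priorities and surveillance of trends/interventions.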
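As a worked illustration of a few of the frequency measures above, here is a minimal Python sketch. The function names and all numbers are hypothetical, and the formulas are the usual textbook definitions, assumed to match the ones intended in the table.

```python
def prevalence(existing_cases, population):
    """Proportion of the population with the disease at a point in time."""
    return existing_cases / population


def cumulative_incidence(new_cases, at_risk_at_start):
    """New cases over a period divided by people at risk at its start."""
    return new_cases / at_risk_at_start


def incidence_density(new_cases, person_years):
    """New cases per unit of person-time at risk."""
    return new_cases / person_years


def ypll(ages_at_death, standard_age=65):
    """Years of potential life lost: sum of (standard age - age at death),
    counting only deaths that occur before the standard age."""
    return sum(standard_age - age for age in ages_at_death if age < standard_age)


# Hypothetical numbers, for illustration only
print(prevalence(50, 1000))         # 0.05
print(incidence_density(10, 2500))  # 0.004 cases per person-year
print(ypll([40, 60, 70]))           # 25 + 5 + 0 = 30 years
```

Deaths at or after the standard age contribute nothing to YPLL, which is why the death at 70 adds 0.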
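Measures of association (a, b = exposed with/without outcome; c, d = unexposed with/without outcome)

Relative Risk (RR)
RR = [a / (a + b)] / [c / (c + d)]
Use: likelihood of developing the outcome (O+) in the exposed (E) group relative to the unexposed (UE) group; ratio of incidence in E to incidence in UE
Interpretation: those with [E] are [RR%] [more/less] likely to develop [O] than those with [UE]

Attributable Risk/Risk Difference (AR)
AR = Rate_E - Rate_UE = a / (a + b) - c / (c + d)
Use: rate of O that can be attributed to the E in the E group

Attributable Fraction Exposed (AR%)
AR% = (Rate_E - Rate_UE) / Rate_E x 100 = (RR - 1) / RR x 100

Population Attributable Risk (PAR)
PAR = Incidence_total - Incidence_UE, OR AR x Prevalence_E

Attributable Fraction Population (PAR%)
PAR% = PAR / Incidence_total x 100 = Prevalence_E x (RR - 1) / [Prevalence_E x (RR - 1) + 1] x 100

Mantel-Haenszel Summary Odds Ratio (ORMH)
ORMH = sum(a_i x d_i / n_i) / sum(b_i x c_i / n_i), summed over strata i, where n_i = total in stratum i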
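The measures of association can be checked numerically with a short Python sketch. It assumes cohort data in a 2x2 table where a and b are the exposed with and without the outcome, and c and d are the unexposed with and without the outcome. The function name and the counts are hypothetical.

```python
def association_measures(a, b, c, d):
    """a, b = exposed with/without outcome; c, d = unexposed with/without outcome."""
    risk_e = a / (a + b)                    # incidence in exposed
    risk_ue = c / (c + d)                   # incidence in unexposed
    risk_total = (a + c) / (a + b + c + d)  # incidence in whole population
    rr = risk_e / risk_ue
    ar = risk_e - risk_ue
    par = risk_total - risk_ue
    return {
        "RR": rr,
        "AR": ar,
        "AR%": ar / risk_e * 100,
        "PAR": par,
        "PAR%": par / risk_total * 100,
    }


# Hypothetical cohort: 100 exposed (30 cases), 100 unexposed (10 cases)
result = association_measures(a=30, b=70, c=10, d=90)
print({name: round(value, 2) for name, value in result.items()})
# RR = 3.0, AR = 0.2, AR% = 66.67, PAR = 0.1, PAR% = 50.0 (rounded)
```

With half the population exposed and RR = 3, the alternative PAR% formula gives 0.5 x 2 / (0.5 x 2 + 1) x 100 = 50, which agrees with the value computed from the incidences.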
Causal Relationships
Direct: factor -> disease; no intermediates
Indirect: factor-> A -> disease; factor causes disease through step(s)
Necessary: no factor = no disease; w/o factor disease never develops
Sufficient: factor -> disease. w/ factor the disease always develops
Individual Matching
Each ctrl matched to a case; requires matched analysis
For each case, select one or more ctrls with the same
characteristics on potential confounders
Pros: controls for factors that are difficult to measure; easier to obtain a
comparable ctrl group; gains precision of OR estimate (tighter CI)
Cons: complexity in ctrl accrual; info from ctrls on matching variables
must be obtained before study inclusion (need to screen more); matching on
many variables is difficult; can't study the matched variable; decreases OR
precision if the variable is not a true confounder; only a small gain if the
factor is not strongly related to disease
Frequency Matching
Select controls with similar distribution of confounder
Control in analysis
Analysis of Individually Matched Data
Each pair (case-ctrl) contributes one observation/count
OR = # pairs (case E, ctrl UE) / # pairs (case UE, ctrl E), i.e. the ratio of discordant pairs
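A minimal Python sketch of the matched-pair odds ratio, assuming 1:1 matching and that each pair is recorded as a hypothetical tuple (case_exposed, control_exposed). Only the discordant pairs enter the estimate.

```python
def matched_pair_or(pairs):
    """pairs: iterable of (case_exposed, control_exposed) booleans,
    one tuple per matched case-control pair."""
    pairs = list(pairs)
    case_only = sum(1 for case_e, ctrl_e in pairs if case_e and not ctrl_e)
    ctrl_only = sum(1 for case_e, ctrl_e in pairs if ctrl_e and not case_e)
    if ctrl_only == 0:
        raise ValueError("no pairs with exposed control and unexposed case")
    return case_only / ctrl_only


# Hypothetical study of 50 matched pairs
pairs = (
    [(True, False)] * 20     # case exposed, control unexposed (discordant)
    + [(False, True)] * 5    # case unexposed, control exposed (discordant)
    + [(True, True)] * 10    # concordant, ignored
    + [(False, False)] * 15  # concordant, ignored
)
print(matched_pair_or(pairs))  # 20 / 5 = 4.0
```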
Stratification
Stratify by confounder; examine OR within levels of
confounder
Want a summary estimate: an estimate of risk adjusted for
effects of the confounder
Use ORMH when stratum-specific ORs are similar
Factor is a true confounder if adjusted OR and unadjusted
OR differ by more than 10%
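The stratified approach can be sketched in Python as follows. Each stratum is a hypothetical tuple (a, b, c, d), where a and b are the exposed with and without the outcome and c and d are the unexposed with and without the outcome. The 10% rule is applied relative to the adjusted OR, which is an assumption, since the choice of denominator is not stated.

```python
def odds_ratio(a, b, c, d):
    """a, b = exposed with/without outcome; c, d = unexposed with/without outcome."""
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """strata: list of (a, b, c, d) tuples, one per level of the confounder."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


def crude_or(strata):
    """OR from the single table obtained by collapsing all strata."""
    a, b, c, d = (sum(cells) for cells in zip(*strata))
    return odds_ratio(a, b, c, d)


# Hypothetical data: OR = 1 within each stratum, but the crude OR is inflated
strata = [(80, 20, 40, 10), (10, 40, 20, 80)]
adjusted = mantel_haenszel_or(strata)
crude = crude_or(strata)
print([odds_ratio(*s) for s in strata])   # stratum-specific ORs: [1.0, 1.0]
print(crude, adjusted)                    # 2.25 1.0
print(abs(crude - adjusted) / adjusted > 0.10)  # True: suggests confounding
```

Because the stratum-specific ORs are similar here, pooling them into ORMH is reasonable; if they differed markedly, that would point to effect modification rather than confounding.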
Cohort Studies
Uses
Rare exposures
Multiple effects of single exposure
Identify temporal sequence
Expense of follow-up not an issue
Fixed Cohorts: identify popn at one point in time; no further eligible ppl added
Dynamic Cohorts: open popn; includes ppl who enter later
Changes in exposure over time: re-classify during study or allow
exposure status to vary in analysis
Internal Comparisons
Study dose-response (D-R) gradient of disease
Variation with amount of exposure (often no unexposed group)
External Comparisons
Estimate disease incidence in exposed group in absence of
exposure
As similar as possible to exposed group
Follow-Up
1. Length needs to be considered in design
Base on a priori knowledge of time needed for disease to appear
Induction time (to induce) & latency time (to express/detect)
Induction/latency usually not known; try to estimate
2. Attempt high levels of follow-up (prevent loss)
Be persistent
Bias
1. Selection (systematic differences b/w E & UE groups)
2. Information (Misclassification, measurement error)
3. Non-participation (systematic reason for non-Ps?)
4. Attrition (differential loss to follow-up)
5. Healthy Worker Effect (fewer O+ among workers than in general popn)
Address with internal comparison, external comparison with other workers, or
artificially adjusting risk (inflate)
Nested Case-Control Study
Conduct cohort study without full evaluation of exposure
After follow-up, conduct case-control study within cohort
Cases = identified cases of disease in cohort
Ctrls = sample of cohort free of disease at the time the cases occur
Analyze as case-control
P-values
Square root of X^2 (chi-square) is compared to the standard normal distribution
Standard normal: reject null when p < 0.05 (p = 0.06 is not less than 0.05, so fail to reject)
Represents probability of observing a result at least as
extreme as that observed by chance alone (i.e. if the null is true)
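To make the chi-square step concrete, the Python sketch below computes the Pearson X^2 for a 2x2 table (1 degree of freedom, no continuity correction), takes its square root, and compares it to the standard normal for a two-sided p-value. The cell labels follow the same assumption as elsewhere (a, b = exposed with/without outcome; c, d = unexposed with/without outcome), and the counts are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for a 2x2 table, without continuity correction."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def two_sided_p_from_z(z):
    """Two-sided p-value for a standard normal deviate z."""
    return math.erfc(abs(z) / math.sqrt(2))


x2 = chi_square_2x2(30, 70, 10, 90)  # X^2 = 12.5
z = math.sqrt(x2)                    # compare to the standard normal
p = two_sided_p_from_z(z)
print(x2, round(z, 3), round(p, 4))  # 12.5 3.536 0.0004
print("reject null" if p < 0.05 else "fail to reject null")
```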
Interpretation of Significance Tests
Type 1 Error: Reject H0 when it is true
Probability = alpha (0.05, split across both tails in a 2-tailed test)
Type 2 Error: Fail to reject H0 when it is false
Significance Values
P-value function of sample size and effect magnitude
Confidence Intervals (CIs)
Range of values likely to contain the true effect magnitude (e.g. 95% CI)
For ratio measures (OR, RR): if CI includes 1.0, then p>0.05 (fail to reject null)
If CI excludes 1.0, then p<0.05 (reject null)
Width of CI indicates variability of estimate (reflects n)
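One common way to obtain a CI for an OR is the log-based (Woolf) method, sketched below in Python. This is a swapped-in technique chosen for illustration, not necessarily the one the notes have in mind. Cell labels are a, b = exposed with/without outcome and c, d = unexposed with/without outcome, and the counts are hypothetical.

```python
import math


def or_with_ci(a, b, c, d, z=1.96):
    """Odds ratio with an approximate 95% CI (Woolf / log method)."""
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


for table in [(30, 70, 10, 90), (12, 88, 10, 90)]:
    estimate, lower, upper = or_with_ci(*table)
    includes_null = lower <= 1.0 <= upper
    print(round(estimate, 2), round(lower, 2), round(upper, 2),
          "CI includes 1.0" if includes_null else "CI excludes 1.0")
```

The first table gives an interval that lies entirely above 1.0, while the second, with a much weaker association, gives an interval that spans 1.0. Larger cell counts shrink the standard error and so narrow the interval.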
Power
Ability of study to demonstrate an association if it exists
Probability of rejecting the null when it is false (1 - probability of Type 2 error)
Estimating Power and Sample Size
Begin with assumption that null is false
Increase OR = decrease n needed
Exposure proportion amongst controls (p0) moves to
extreme = increase n needed
Decrease n = decrease power
p0 moves to extreme = decrease power
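These relationships can be explored with a rough power calculation. The Python sketch below uses a standard normal approximation for comparing two proportions in an unmatched case-control study with equal numbers of cases and controls. The formula choice, the function name, and the inputs are assumptions for illustration only.

```python
from math import sqrt
from statistics import NormalDist


def case_control_power(n_per_group, p0, odds_ratio, alpha=0.05):
    """Approximate power with n cases and n controls.
    p0 = exposure proportion among controls."""
    # Exposure proportion among cases implied by p0 and the OR
    p1 = p0 * odds_ratio / (1 + p0 * (odds_ratio - 1))
    p_bar = (p0 + p1) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    numerator = abs(p1 - p0) * sqrt(n_per_group) - z_alpha * sqrt(2 * p_bar * (1 - p_bar))
    denominator = sqrt(p1 * (1 - p1) + p0 * (1 - p0))
    return NormalDist().cdf(numerator / denominator)


base = case_control_power(100, p0=0.30, odds_ratio=2.0)
print(round(base, 2))                                             # baseline
print(round(case_control_power(200, 0.30, 2.0), 2))  # larger n: more power
print(round(case_control_power(100, 0.30, 3.0), 2))  # larger OR: more power
print(round(case_control_power(100, 0.05, 2.0), 2))  # extreme p0: less power
```

Reading the same relationships in reverse gives the sample-size statements: to hold power fixed, a larger OR needs fewer subjects and a more extreme p0 needs more.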
Effect Modification (Interaction)
Change in magnitude (or direction) of measure of
association b/w E and disease according to value of a 3rd factor X
Contrast with confounding, which distorts the measure of
association through mixing of the effect of E on disease
with that of a 3rd factor
Confounding = bias in effect, to be eliminated/avoided
Effect mod = description of effect, to be reported
Deal with both using stratified analysis
Dissimilar ORs b/w strata = effect mod (similar = pool)
Difference b/w crude OR and adjusted OR suggests confounding
If interaction between E and population subgroup exists,
results NOT generalizable
Regression Analysis
For prediction, confounding control, and effect mod detection
Generalized Linear Model (GLM)
Y=a0 + a1X1 + a2X2
Y= outcome
X1 = exposure
X2= confounder
Coefficients a1 and a2 provide estimates of the effects of X1 and
X2 that are mutually unconfounded
Data limits determine number of variables included
All GLMs have 3 components:
1. Random component (identifies response variable)
Identifies response variable Y and selects its probability distribution
Standard GLMs treat all Y as independent
Binomial distribution: binary outcome (0 vs. 1)
Normal distribution: continuous outcome
2. Systematic component (specifies explanatory variables)
Linear predictor on right-hand side of model equation
Linear combination of explanatory variables = linear predictor
Some can be interaction terms (e.g. x3 = x1*x2)
Others indicate curvilinear effects (e.g. x2 = x1^2)
3. Link function (specifies a function of expected value
[mean] of Y)
Connects random and systematic components
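The three components can be illustrated with a small Python sketch for a binary outcome, using the logit link as one common choice. The coefficient values are hypothetical and nothing is fitted here; the sketch only shows how the linear predictor and the link function connect.

```python
import math


def linear_predictor(coefs, xs):
    """Systematic component: a0 + a1*x1 + a2*x2 + ..."""
    intercept, *slopes = coefs
    return intercept + sum(a * x for a, x in zip(slopes, xs))


def logit(mu):
    """Link function for a binomial outcome: log odds of the mean."""
    return math.log(mu / (1 - mu))


def inv_logit(eta):
    """Inverse link: maps the linear predictor back to a probability."""
    return 1 / (1 + math.exp(-eta))


# Hypothetical coefficients: a0, a1 (exposure X1), a2 (confounder X2)
coefs = [-2.0, 0.7, 0.3]

exposed = inv_logit(linear_predictor(coefs, [1, 1]))    # X1 = 1, X2 = 1
unexposed = inv_logit(linear_predictor(coefs, [0, 1]))  # X1 = 0, X2 = 1

# With a logit link, exp(a1) is the OR for X1 with X2 held fixed
odds = lambda p: p / (1 - p)
print(round(odds(exposed) / odds(unexposed), 3), round(math.exp(0.7), 3))
```

For a continuous outcome with a normal distribution, the identity link would be used instead, so the mean of Y equals the linear predictor directly.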