GEE
longitudinal data
Kristin Sainani Ph.D.
http://www.stanford.edu/~kcobb
Stanford University
Department of Health Research and Policy
Limitations of rANOVA/rMANOVA
time1  time2  time3  time4  chem1  chem2  chem3  chem4
   20     18     15     20   1000   1100   1200   1300
   22     24     18     22   1000   1000   1005    950
   14     10     24     10   1000   1999    800   1700
   38     34     32     34   1000   1100   1150   1100
   25     29     25     29   1000   1000   1050   1010
   30     28     26     14   1000   1100   1109   1500
chem=chem1; output;
chem=chem2; output;
chem=chem3; output;
chem=chem4; output;
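The fragment above shows only the chem assignments and output statements of the reshaping step. As a rough illustration of the same wide-to-long idea, here is a minimal Python sketch. The subject ids 1 to 6 (taken from row order) and the occasion index 1 to 4 are assumptions made for illustration; the actual time values assigned in the SAS data step are not shown in the source.

```python
# Wide-format rows copied from the data table: one tuple per subject,
# holding (time1..time4 scores) and (chem1..chem4 levels).
WIDE = [
    ((20, 18, 15, 20), (1000, 1100, 1200, 1300)),
    ((22, 24, 18, 22), (1000, 1000, 1005, 950)),
    ((14, 10, 24, 10), (1000, 1999, 800, 1700)),
    ((38, 34, 32, 34), (1000, 1100, 1150, 1100)),
    ((25, 29, 25, 29), (1000, 1000, 1050, 1010)),
    ((30, 28, 26, 14), (1000, 1100, 1109, 1500)),
]


def wide_to_long(wide):
    """Return one (id, occasion, score, chem) row per subject per occasion.

    ids and occasion numbers are hypothetical labels (row order, column order).
    """
    long_rows = []
    for subject_id, (scores, chems) in enumerate(wide, start=1):
        for occasion, (score, chem) in enumerate(zip(scores, chems), start=1):
            long_rows.append((subject_id, occasion, score, chem))
    return long_rows


long_rows = wide_to_long(WIDE)
print(len(long_rows))  # 6 subjects x 4 occasions = 24 rows
for row in long_rows[:4]:
    print(row)
```

Each wide row becomes four long rows, which is what the repeated assignment and output statements do in the data step.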
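Data in long form:

id  time   score  chem
 1  time1     20  1000
 1  time2     18  1100
 1  time3     15  1200
 1  time4     20  1300
 2  time1     22  1000
 2  time2     24  1000
 2  time3     18  1005
 2  time4     22   950
 3  time1     14  1000
 3  time2     10  1999
 3  time3     24   800
 3  time4     10  1700
 4  time1     38  1000
 4  time2     34  1100
 4  time3     32  1150
 4  time4     34  1100
 5  time1     25  1000
 5  time2     29  1000
 5  time3     25  1050
 5  time4     29  1010
 6  time1     30  1000
 6  time2     28  1100
 6  time3     26  1109
 6  time4     14  1500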
Gives no significant results!
Graphically
Naive linear regression here looks for significant slopes (ignoring the correlation among repeated measurements on the same individual):
Y = 24.90889 - 0.557778*time
Y = 42.44831 - 0.01685*chem
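The two naive fits can be checked by hand with ordinary least squares on the pooled 24 observations. The sketch below is one way to do that in plain Python. The time coding `(0, 2, 3, 6)` is an assumption: the source does not state the time values, and this coding is used only because it reproduces the reported intercept and slope. The chem fit needs no such assumption.

```python
def simple_ols(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Scores and chemical levels per subject, copied from the data table.
SCORES = [(20, 18, 15, 20), (22, 24, 18, 22), (14, 10, 24, 10),
          (38, 34, 32, 34), (25, 29, 25, 29), (30, 28, 26, 14)]
CHEMS = [(1000, 1100, 1200, 1300), (1000, 1000, 1005, 950),
         (1000, 1999, 800, 1700), (1000, 1100, 1150, 1100),
         (1000, 1000, 1050, 1010), (1000, 1100, 1109, 1500)]
TIMES = (0, 2, 3, 6)  # ASSUMPTION: inferred coding, not given in the source

# Pool all observations, ignoring which subject they came from.
score = [s for subject in SCORES for s in subject]
chem = [c for subject in CHEMS for c in subject]
time = [t for _ in SCORES for t in TIMES]

b0_time, b1_time = simple_ols(time, score)
b0_chem, b1_chem = simple_ols(chem, score)
print(f"Y = {b0_time:.5f} + ({b1_time:.6f})*time")
print(f"Y = {b0_chem:.5f} + ({b1_chem:.5f})*chem")
```

Pooling the rows this way is exactly what makes the analysis naive: each subject's four measurements are treated as if they were four unrelated people.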
The model
The linear regression model:
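The fitted output for this model reports an intercept, a chem coefficient, and a time coefficient, so the model can be sketched as below. The symbols $\beta_0$, $\beta_{chem}$, $\beta_{time}$ and $\varepsilon_i$ are notation assumed here for illustration, not taken from the slide.

```latex
\[
  Score_i = \beta_0 + \beta_{chem}\,chem_i + \beta_{time}\,time_i + \varepsilon_i ,
  \qquad \varepsilon_i \sim N(0, \sigma^2) \ \text{independently for all 24 observations}
\]
```

The independence assumption on the errors is what the repeated-measures structure of the data violates.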
Results
The fitted model:

Variable    DF   Estimate   Standard Error   t Value   Pr > |t|
Intercept    1   42.46803          6.06410      7.00     <.0001
chem         1   -0.01704          0.00550     -3.10     0.0054
time         1    0.07466          0.64946      0.11     0.9096
Generalized Estimating Equations (GEE)
The model
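$$
\begin{pmatrix} Score_1 \\ Score_2 \\ Score_3 \\ Score_4 \end{pmatrix}
= \beta_0
+ \beta_1 \begin{pmatrix} Chem_1 \\ Chem_2 \\ Chem_3 \\ Chem_4 \end{pmatrix}
+ \beta_2 (time) + CORR + Error
$$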
for time-dependent
predictors
NOTE,
Results
Analysis Of GEE Parameter Estimates
Empirical Standard Error Estimates

Parameter   Estimate   Standard Error   95% Confidence Limits        Z   Pr > |Z|
Intercept    38.2431           4.9704    28.5013     47.9848      7.69     <.0001
chem         -0.0129           0.0026    -0.0180     -0.0079     -5.00     <.0001
time         -0.0775           0.2829    -0.6320      0.4770     -0.27     0.7841
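The Z, confidence-limit, and p-value columns in the GEE output follow from the estimate and its empirical standard error through the usual Wald calculation. A minimal sketch using only the Python standard library is below; `wald_summary` is a hypothetical helper name. Because the inputs are the rounded values printed in the table, the last digit can differ from the SAS listing (for chem, the rounded inputs give about -4.96 rather than -5.00).

```python
from statistics import NormalDist


def wald_summary(estimate, std_err, level=0.95):
    """Return (z, lower limit, upper limit, two-sided p) for a Wald test."""
    z = estimate / std_err
    critical = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    lower = estimate - critical * std_err
    upper = estimate + critical * std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, lower, upper, p_value


# Estimates and empirical standard errors copied from the GEE output.
gee_rows = [("Intercept", 38.2431, 4.9704),
            ("chem", -0.0129, 0.0026),
            ("time", -0.0775, 0.2829)]

for name, estimate, std_err in gee_rows:
    z, lower, upper, p_value = wald_summary(estimate, std_err)
    print(f"{name:10s} z={z:6.2f}  CI=({lower:.4f}, {upper:.4f})  p={p_value:.4f}")
```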
In the naive analysis, the standard error for the time parameter was 0.64946. It's cut by more than half here.
In the naive analysis, the standard error for the chemical coefficient was 0.00550, also cut roughly in half here.
Effects on standard errors
In general, ignoring the dependency of the observations will overestimate the standard errors of the time-dependent predictors (such as time and chemical), since we haven't accounted for between-subject variability.
However, standard errors of the time-independent predictors (such as treatment group) will be underestimated. The long form of the data makes it seem like there's 4 times as much data as there really is. Since standard errors scale with 1/sqrt(n), quadrupling n is the cheating way to halve a standard error!
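$$
\begin{array}{c|ccc}
 & t_1 & t_2 & t_3 \\ \hline
t_1 & \sigma^2_{y/t} & 0 & 0 \\
t_2 & 0 & \sigma^2_{y/t} & 0 \\
t_3 & 0 & 0 & \sigma^2_{y/t}
\end{array}
$$
Correlation structure (pairwise correlations between time points) is Independence.
Variance of scores is homogeneous across time (MSE in ordinary least squares regression).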
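GEE variance-covariance matrix
Rows and columns: t1, t2, t3. Diagonal entries: $\sigma^2_{y/t}$. Off-diagonal entries: covariances between time points (for example, $a$ for t1 and t2).
Variance of scores is homogeneous across time (residual variance).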
Independence
$$
\begin{array}{c|ccc}
 & t_1 & t_2 & t_3 \\ \hline
t_1 & 1 & 0 & 0 \\
t_2 & 0 & 1 & 0 \\
t_3 & 0 & 0 & 1
\end{array}
$$
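Exchangeable
$$
\begin{array}{c|ccc}
 & t_1 & t_2 & t_3 \\ \hline
t_1 & 1 & \rho & \rho \\
t_2 & \rho & 1 & \rho \\
t_3 & \rho & \rho & 1
\end{array}
$$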
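Autoregressive
$$
\begin{array}{c|cccc}
 & t_1 & t_2 & t_3 & t_4 \\ \hline
t_1 & 1 & \rho & \rho^2 & \rho^3 \\
t_2 & \rho & 1 & \rho & \rho^2 \\
t_3 & \rho^2 & \rho & 1 & \rho \\
t_4 & \rho^3 & \rho^2 & \rho & 1
\end{array}
$$
Only 1 parameter estimated. Decreasing correlation for farther time periods.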
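M-dependent
$$
\begin{array}{c|cccc}
 & t_1 & t_2 & t_3 & t_4 \\ \hline
t_1 & 1 & \rho_1 & \rho_2 & 0 \\
t_2 & \rho_1 & 1 & \rho_1 & \rho_2 \\
t_3 & \rho_2 & \rho_1 & 1 & \rho_1 \\
t_4 & 0 & \rho_2 & \rho_1 & 1
\end{array}
$$
Here, 2-dependent. Estimate 2 parameters (adjacent time periods have 1 correlation coefficient; time periods 2 units of time away have a different correlation coefficient; others are uncorrelated).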
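Unstructured
$$
\begin{array}{c|cccc}
 & t_1 & t_2 & t_3 & t_4 \\ \hline
t_1 & 1 & \rho_1 & \rho_2 & \rho_3 \\
t_2 & \rho_1 & 1 & \rho_5 & \rho_4 \\
t_3 & \rho_2 & \rho_5 & 1 & \rho_6 \\
t_4 & \rho_3 & \rho_4 & \rho_6 & 1
\end{array}
$$
Estimate all correlations separately (here 6).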
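The five working correlation structures differ only in how many parameters fill the off-diagonal cells. As an illustration, the sketch below builds each structure as a nested list; the function names are hypothetical and the code is not tied to any particular GEE library.

```python
def independence(n):
    """No correlation between time points: identity matrix, 0 parameters."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def exchangeable(n, rho):
    """Same correlation rho for every pair of time points: 1 parameter."""
    return [[1.0 if i == j else rho for j in range(n)] for i in range(n)]


def autoregressive(n, rho):
    """AR(1): correlation rho**lag decays with distance in time: 1 parameter."""
    return [[rho ** abs(i - j) for j in range(n)] for i in range(n)]


def m_dependent(n, rhos):
    """One correlation per lag up to m = len(rhos); zero beyond lag m."""
    def entry(i, j):
        lag = abs(i - j)
        if lag == 0:
            return 1.0
        return rhos[lag - 1] if lag <= len(rhos) else 0.0
    return [[entry(i, j) for j in range(n)] for i in range(n)]


def unstructured_parameter_count(n):
    """Every pair gets its own correlation: n*(n-1)/2 parameters."""
    return n * (n - 1) // 2


for row in autoregressive(4, 0.7831):
    print([round(value, 4) for value in row])
print(unstructured_parameter_count(4))  # 6 for four time points
```

With 0.7831 as the single AR(1) parameter, the lag-2 and lag-3 entries come out near 0.613 and 0.480, which is the decay pattern shown in the autoregressive working correlation matrix from the SAS output.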
          time1      time2      time3      time4
time1   1.00000    0.92569    0.69728    0.68635
                    0.0081     0.1236     0.1321
time2   0.92569    1.00000    0.55971    0.77991
         0.0081                0.2481     0.0673
time3   0.69728    0.55971    1.00000    0.37870
         0.1236     0.2481                0.4591
time4   0.68635    0.77991    0.37870    1.00000
         0.1321     0.0673     0.4591

Independent? Exchangeable? Autoregressive? M-dependent? Unstructured?
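The empirical correlations used to pick a working structure are ordinary Pearson correlations between the wide-format time columns, computed across the six subjects. A minimal sketch of that calculation is below; it reproduces the coefficients only, not the p-values.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# One list per time point, six subjects each, copied from the wide data.
TIME_COLUMNS = {
    "time1": [20, 22, 14, 38, 25, 30],
    "time2": [18, 24, 10, 34, 29, 28],
    "time3": [15, 18, 24, 32, 25, 26],
    "time4": [20, 22, 10, 34, 29, 14],
}

names = list(TIME_COLUMNS)
for row_name in names:
    cells = [pearson(TIME_COLUMNS[row_name], TIME_COLUMNS[col_name])
             for col_name in names]
    print(row_name, " ".join(f"{value:8.5f}" for value in cells))
```

Reading the printed matrix row by row is a quick way to judge whether the correlations look roughly constant (exchangeable) or shrink with distance in time (autoregressive).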
        Col1     Col2     Col3     Col4
Row1  1.0000   0.7276   0.7276   0.7276
Row2  0.7276   1.0000   0.7276   0.7276
Row3  0.7276   0.7276   1.0000   0.7276
Row4  0.7276   0.7276   0.7276   1.0000

Parameter   Estimate   Standard Error   95% Confidence Limits        Z   Pr > |Z|
Intercept    38.2431           4.9704    28.5013     47.9848      7.69     <.0001
chem         -0.0129           0.0026    -0.0180     -0.0079     -5.00     <.0001
time         -0.0775           0.2829    -0.6320      0.4770     -0.27     0.7841
Compare to autoregressive
proc genmod data=long4;
  class id;
        Col1     Col2     Col3     Col4
Row1  1.0000   0.7831   0.6133   0.4803
Row2  0.7831   1.0000   0.7831   0.6133
Row3  0.6133   0.7831   1.0000   0.7831
Row4  0.4803   0.6133   0.7831   1.0000

Parameter   Estimate   Standard Error   95% Confidence Limits        Z   Pr > |Z|
Intercept    36.5981           4.0421    28.6757     44.5206      9.05     <.0001
chem         -0.0122           0.0015    -0.0152     -0.0092     -7.98     <.0001
time          0.1371           0.3691    -0.5864      0.8605      0.37     0.7104
Example two (recall)
From rANOVA: within-subjects effects, but no between-subjects effects.
Time is significant.
Group*time is not significant.
Group is not significant.
This is an example with a binary time-independent predictor.
Empirical Correlation
Pearson Correlation Coefficients, N = 6
Prob > |r| under H0: Rho=0

           time1      time2      time3      time4
time1    1.00000   -0.13176   -0.01435   -0.50848
                     0.8035     0.9785     0.3030
time2   -0.13176    1.00000   -0.02819   -0.17480
          0.8035                0.9577     0.7405
time3   -0.01435   -0.02819    1.00000    0.69419
          0.9785     0.9577                0.1260
time4   -0.50848   -0.17480    0.69419    1.00000
          0.3030     0.7405     0.1260

Independent? Exchangeable? Autoregressive? M-dependent? Unstructured?
GEE analysis
proc genmod data=long;
class group id;
model score=
         Col1      Col2      Col3      Col4
Row1   1.0000   -0.0701    0.1916   -0.1817
Row2  -0.0701    1.0000    0.1778   -0.5931
Row3   0.1916    0.1778    1.0000    0.5931
Row4  -0.1817   -0.5931    0.5931    1.0000

Parameter        Estimate   Standard Error   95% Confidence Limits        Z   Pr > |Z|
Intercept         42.1433           6.2281    29.9365     54.3501      6.77     <.0001
group A            7.8957           6.6850    -5.2065     20.9980      1.18     0.2376
group B            0.0000           0.0000     0.0000      0.0000         .          .
time              -4.9184           2.0931    -9.0209     -0.8160     -2.35     0.0188
time*group A      -4.3198           2.1693    -8.5716     -0.0680     -1.99     0.0464

Comparable to within effects for time and time*group from rMANOVA and rANOVA.
GEE analysis
proc genmod data=long;
class group id;
model score=
         Col1      Col2      Col3      Col4
Row1   1.0000   -0.0529   -0.0529   -0.0529
Row2  -0.0529    1.0000   -0.0529   -0.0529
Row3  -0.0529   -0.0529    1.0000   -0.0529
Row4  -0.0529   -0.0529   -0.0529    1.0000

Parameter        Estimate   Standard Error   95% Confidence Limits        Z   Pr > |Z|
Intercept         40.8333           5.8516    29.3645     52.3022      6.98     <.0001
group A            7.1667           6.1974    -4.9800     19.3133      1.16     0.2475
group B            0.0000           0.0000     0.0000      0.0000         .          .
time              -5.1667           1.9461    -8.9810     -1.3523     -2.65     0.0079
time*group A      -3.5000           2.2885    -7.9853      0.9853     -1.53     0.1262

P-values are similar to rANOVA.
$$\beta_{0i} \sim N(\beta_{0,population},\ \sigma^2_{\beta_0})$$
$$\beta_{time}\ \text{constant}$$
$$\varepsilon \sim N(0,\ \sigma^2_{y/t})$$
$$\beta_{0i} \sim N(\beta_{0,population},\ \sigma^2_{\beta_0})$$
Generally, $\sigma^2_{\beta_0}$ is a nuisance parameter: we have to estimate it for making statistical inferences, but we don't care so much about the actual value.
Unexplained variability in Y.
LEAST SQUARES ESTIMATION
FINDS THE BETAS THAT MINIMIZE
THIS VARIANCE (ERROR)
$\sigma^2_{y/t}$: 59.482929
$\beta_0$ constant: 24.90888889
$\beta_{time}$ constant: -0.55777778
3 parameters to estimate.
Where to find these things in OLS in SAS:

Model: MODEL1
Dependent Variable: score

Analysis of Variance
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              1         35.00056      35.00056      0.59   0.4512
Error             22       1308.62444      59.48293
Corrected Total   23       1343.62500

Root MSE           7.71252    R-Square    0.0260
Dependent Mean    23.37500    Adj R-Sq   -0.0182
Coeff Var         32.99473

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1             24.90889          2.54500      9.79     <.0001
time         1             -0.55778          0.72714     -0.77     0.4512
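The summary quantities in the OLS listing are all simple functions of the sums of squares and degrees of freedom, which is why the residual variance can be read straight off the Error row. The sketch below recomputes them from the printed values; the model degrees of freedom (1) is taken as total minus error, and the variable names are illustrative only.

```python
from math import sqrt

# Values copied from the Analysis of Variance table.
ss_model, ss_error, ss_total = 35.00056, 1308.62444, 1343.62500
df_error, df_total = 22, 23
df_model = df_total - df_error  # 1 (one predictor: time)
dependent_mean = 23.37500

mse = ss_error / df_error                 # residual variance (Mean Square Error)
f_value = (ss_model / df_model) / mse     # F statistic for the model
root_mse = sqrt(mse)                      # Root MSE
r_square = ss_model / ss_total            # R-Square
adj_r_square = 1 - (1 - r_square) * df_total / df_error
coeff_var = 100 * root_mse / dependent_mean

print(f"MSE       = {mse:.5f}")
print(f"F Value   = {f_value:.2f}")
print(f"Root MSE  = {root_mse:.5f}")
print(f"R-Square  = {r_square:.4f}")
print(f"Adj R-Sq  = {adj_r_square:.4f}")
print(f"Coeff Var = {coeff_var:.5f}")
```

A negative adjusted R-square, as here, simply means the single predictor explains less variance than would be expected by chance for one degree of freedom.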
$$\beta_{0i} \sim N(\beta_{0,population},\ \sigma^2_{\beta_0})$$
Meaning of random intercept
[Figure labels: mean population intercept; variation in intercepts]
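Residual variance $\sigma^2_{y/t}$: 18.9264
$$\beta_{0i} \sim N(\beta_{0,population},\ \sigma^2_{\beta_0})$$
$\beta_{0,population}$: same, 24.90888889
$\beta_{time}$ constant: same, -0.55777778
Variability in intercepts between subjects, $\sigma^2_{\beta_0}$: 44.6121
4 parameters to estimate.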
Where to find these things in the output from PROC MIXED in SAS:

Covariance Parameter Estimates
Cov Parm   Subject   Estimate
Variance   id         44.6121
Residual              18.9264

Fit Statistics
-2 Res Log Likelihood   146.7
AIC                     152.7
AICC                    154.1
BIC                     152.1

$$\frac{44.6121}{18.9264 + 44.6121} \approx 70\%$$
About 70% of the variability in depression scores is explained by the differences between subjects.
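The share of variability attributed to differences between subjects is the between-subject (random intercept) variance divided by the total of the between-subject and residual variances. As a quick arithmetic check on the two variance estimates from the output, a minimal sketch (the function name is hypothetical):

```python
def between_subject_share(var_between, var_residual):
    """Proportion of total variance due to differences between subjects."""
    return var_between / (var_between + var_residual)


# Variance estimates copied from the covariance parameter output.
share = between_subject_share(44.6121, 18.9264)
print(f"{share:.3f}")  # about 0.70
```

This proportion is close to the exchangeable working correlation estimated by GEE for the same data (0.7276), which is expected: a random intercept implies one common correlation between any two measurements on the same subject.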
Effect      Estimate   Standard Error   DF   t Value   Pr > |t|
Intercept    24.9089           3.0816           8.08     0.0005
time         -0.5578           0.4102   17     -1.36     0.1916
Residual variance: 40.4937
$\beta_0$ constant: same, 24.90888889
$\beta_{time}$: same, -0.55777778
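$$\beta_{0i} \sim N(\beta_{0,population} = 24.90888889,\ \sigma^2_{\beta_0} = 53.0068)$$
$$\beta_{time,i} \sim N(\beta_{time,population} = -0.55777778,\ \sigma^2_{\beta_{time}} = 0.4162)$$
Additionally, we have to estimate the covariance of the random intercept and random slope: here -1.9943 (adding random time therefore cost us 2 degrees of freedom).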
152.7
Residual and AIC are reduced even further due to strong explanatory power of chemical.

Cov Parm    Subject   Estimate
Intercept   id         35.5720
Residual               10.2504

Fit Statistics
-2 Res Log Likelihood   143.7
AIC                     147.7
AICC                    148.4
BIC                     147.3

Effect      Estimate   Standard Error   DF   t Value   Pr > |t|
Intercept    38.1287           4.1727           9.14     0.0003
time        -0.08163           0.3234   16     -0.25     0.8039
chem        -0.01283         0.003125   16     -4.11     0.0008
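SAS code
proc mixed data=long;
  class id group;
  /* s prints the fixed-effect solution; corrb prints the correlation matrix of those estimates */
  model score = time group time*group / s corrb;
  /* random intercept for each subject */
  random int / subject=id;
run; quit;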
Fit Statistics
-2 Res Log Likelihood   138.4
AIC                     142.4
AICC                    143.1
BIC                     142.0

Effect       group   Estimate   Standard Error   DF   t Value   Pr > |t|
Intercept             40.8333           4.1934           9.74     0.0006
time                  -5.1667           1.5250   16     -3.39     0.0038
group        A         7.1667           5.9303   16      1.21     0.2444
group        B              0                .    .         .          .
time*group   A        -3.5000           2.1567   16     -1.62     0.1242
time*group   B              0                .    .         .          .
Parameter        Estimate   Standard Error   95% Confidence Limits        Z   Pr > |Z|
Intercept         40.8333           5.8516    29.3645     52.3022      6.98     <.0001
group A            7.1667           6.1974    -4.9800     19.3133      1.16     0.2475
group B            0.0000           0.0000     0.0000      0.0000         .          .
time              -5.1667           1.9461    -8.9810     -1.3523     -2.65     0.0079
time*group A      -3.5000           2.2885    -7.9853      0.9853     -1.53     0.1262