Overdispersion Models and Estimation
Overdispersion Models and Estimation
Overdispersion Models and Estimation
STATISTICS
& DATAANALYSIS
ELSEVIER
Overdispersion:
Models and estimation
John Hinde a,*, Clarice G.B. Dem6trio b
Abstract
Overdispersion models for discrete data are considered and placed in a general framework. A distinction is made between completely specified models and those with only a mean-variance specification.
Different formulations for the overdispersion mechanism can lead to different variance functions which
can be placed within a general family. In addition, many different estimation methods have been proposed, including maximum likelihood, moment methods, extended quasi-likelihood, pseudo-likelihood
and non-parametric maximum likelihood. We explore the relationships between these methods and examine their application to a number of standard examples for count and proportion data. A simple
graphical method using half-normal plots is used to examine different overdispersion models. (~) 1998
Elsevier Science B.V. All fights reserved.
1. Introduction
In applying standard generalized linear models it is often found that the data
exhibit greater variability than is predicted b y the implicit mean-variance relationship.
This p h e n o m e n o n o f overdispersion has been widely considered in the literature,
particularly in relation to the binomial and Poisson distributions. Failure to take
account o f this overdispersion can lead to serious underestimation o f standard errors
and misleading inference for the regression parameters. Consequently, a number o f
models and associated estimation methods have been proposed for handling such data.
For binomial data, CoUett (1991 ) gives a good practical introduction to some o f these
* Corresponding author.
0167-9473/98/$19.00 (~) 1998 Elsevier Science B.V. All fights reserved
PH S0167-9473(98)00007-3
152
methods, following the work of Williams (1982, 1996). Overdispersed Poisson data
are discussed, for example, in Breslow (1984) and Lawless (1987). More general
discussions of overdispersion are also to be found in McCullagh and Nelder (1989)
and Lindsey (1995).
There are many different specific models for overdispersion, which arise from alternative possible mechanisms for the underlying process. We can broadly categorise
the approaches into the following two groups:
(i) Assume some more general form for the variance function, possibly including
additional parameters.
(ii) Assume a two-stage model for the response. That is, assume that the basic
response model parameter itself has some distribution.
Models of type (i) may not correspond to any specific probability distribution
for the response but may be viewed as useful extensions of the basic model. The
regression parameters can be estimated using quasi-likelihood methods with some ad
hoc procedure for estimating any additional parameters in the variance function. An
example of this is the use of a heterogeneity factor in overdispersed binomial data.
Type (ii) models lead to a compound probability model for the response and, in
principle, all of the parameters can be estimated using full maximum likelihood. A
standard example is the use o f the negative binomial distribution for overdispersed
count data. However, in general, the resulting compound distribution takes no simple
form and approximate methods of estimation are often used. A variant of the second
approach, that removes the need to make any specific assumptions about the second
stage distribution, is to use non-parametric maximum likelihood (NPML) estimation
of the compounding distribution, as advocated by Aitkin (1995, 1996).
In Section 2 we consider models for overdispersed binary data and introduce the
various methods of estimation. The use of these is illustrated on two standard exampies. In Section 3 we introduce models for overdispersed count data. More complex
models for overdispersion are discussed in Section 4. Finally, in Section 5 we use a
simple graphical procedure for diagnostic assessment of overdispersed models.
2. Binary data
2.1. M o d e l s a n d e s t i m a t i o n
In studying a binary response suppose that the random variables Y,- represent counts
of successes out of samples of size mi, i = 1 , . . . , n. If we write E[Y~] --- #; = mirc~,
then a generalized linear model allows us to model the expected proportions ~zi in
terms of explanatory variables xi through
g(zri) = p'xi,
where g is some suitable link function and p is a vector of p unknown parameters.
The usual error specification is Y~,-Bin(mi, zri) with variance function
Vat (Y,-) ----m;~z~(1 - hi).
(1)
153
However, when overdispersion is present the variance will be greater than this. In
the following sections we will consider various generalizations of this basic variance
function in terms of the two approaches outlined in Section 1.
2.1.1. Constant overdispersion
The standard quasi-likelihood approach, which requires only the specification of
the first two moments, replaces (1) by
Var (Y/)
~pmiTzi(1 - 7zi).
(2)
E(Yi) =
miTZi
and
Var(Y/) = miTzi(1 - 7zi)[1 + dp(mi - 1)].
(3)
( y - t)
-2
dt.
For the beta-binomial model we can take the variance function V(#;) as #i( 1 - #i/mi),
the usual binomial variance function, and ~bi -- 1 + ~b(mi - 1) giving a deviance
154
J.
Hinde, C.G.B. Demktrio/ Computational Statistics & Data Analysis 27 (1998) 151-170
function, D, identical to that for the binomial model. Maximizing Q+ over the regression parameters/~ gives a set of weighted binomial estimating equations with weights
1/c~i= 1/[1 + ~b(mi- 1)]. These are the quasi-likelihood equations for a known value
of the overdispersion parameter ~b. For the estimation of ~b we need to solve
~-~(D(yi,~i)i=l dlog(dpi)~--~:D(yi,
lti)-dPi}d~b
(J~i
1}
i=1 [
-~-dq~i: O.
The second form of this equation shows that we can obtain an estimate for ~b by
fitting a gamma model using the deviance components as the y-variable, an identity link and taking the linear model to have a fixed intercept (offset) of 1 and
m ; - 1 as the explanatory variable. An approximate standard error is obtained for
~b by setting the scale to 2, corresponding to modelling Z2 responses (McCullagh
and Nelder, 1989). We iterate between these two sets of estimating equations for fl
and q~ until convergence, giving parameter estimates and standard errors, which are
correct because of the asymptotic independence of fl and 3.
Brooks (1984) suggests a mixed strategy in which quasi-likelihood estimation for
fl is combined with maximum likelihood estimation for 4). This involves replacing the
above estimating equation for ~b by the beta-binomial likelihood score equation, with
fl set equal to its current estimate. This equation is easily solved using a NewtonRaphson-type iteration, although it may be necessary to reduce the step length to
avoid divergence or values outside of the feasible region for ~b. In practice, this
approach seems to give estimates for (fl, q~) which are very close to the full maximum
likelihood estimates. They will not be exact maximum likelihood estimates, as for
fixed ~b the beta-binomial distribution is not in the exponential family and so the
quasi-likelihood estimates for fl with the beta-binomial variance function do not
exactly maximize the beta-binomial likelihood.
An alternative to extended quasi-likelihood is the pseudo-likelihood (PL) approach
of Carroll and Ruppert (1988). Here estimates o f / J are obtained by generalized
least squares, which if iterated is equivalent to quasi-likelihood estimation for given
values of ~b~. The estimation of additional parameters in the variance is based on the
maximization of
P:-2
}
+log(EndpiV(l~i)) .
This is of the same form as Q+ but with the deviance increments replaced by the
squared Pearson residuals and V(yi) by V(#~); it corresponds to a normal likelihood
function for the residuals. For the beta-binomial distribution the estimating equation
for q~ is
i=1
~/V-(~)
d~b
i=1
(~2
: 0,
where ri = (Yi- # i ) / ~ ) ,
the unsealed generalized Pearson residuals. This equation can be solved in the same way as the EQL estimating equation by fitting a
gamma model to the squared Pearson residuals.
155
i : , [ ~iV-(~i)
1:0.
miTz i
and
Var(Y~-):
miTzi(1
--
7zi)[l d-
p(mi
--
1)],
which is of exactly the same form as (3). However, it is now possible for p to
be negative ( - 1 / ( m ~ n ) - 1) < p < 1) corresponding to underdispersion. Using the
mean-variance specification this model is fitted precisely as above. It is also possible
to extend the beta-binomial distribution to handle underdispersion (Prentice, 1986).
156
J. Hinde, C. G.B. Dem~trio l Computational Statistics & Data Analysis 27 (1998) 151-170
milzi(l -
7ti)[1 q- o-2(mi -
1)Tzi(1 - ~zi)],
(4)
which he refers to as type III. EQL, PL and moment methods can be used for the
estimation of o-2 taking q~i = 1 + a2(mi - 1)r~i(1 - rci).
Aitkin (1995, 1996) replaces the assumed distributional form for z by a discrete
mixing distribution. The computation again involves an EM-type algorithm in which
the mixing distribution is assumed to be a discrete distribution on a specified number
of points and the weights and points for the quadrature become additional parameters
in the model. This results in a non-parametric maximum likelihood (NPML) estimate
of this distribution together with the regression parameter estimates.
(5)
2.2. Examples
Here we will consider the application of the various methods described above to
several standard examples from the overdispersion literature. This extends the comparison by Liang and McCullagh (1993), who only consider constant overdispersion
and the beta-binomial-type variance function.
157
J. Hinde, C. G.B. Dem~trio l Computational Statistics & Data Analysis 27 (1998) 151-170
Table 1
Orobanche data: deviances with overdispersion estimated from maximal model
Binomial
Constant
Beta-binomial
Source
d.f.
ML
QL
ML
EQL
PL
Extract I Species
Species I Extract
Species .Extract
1
1
1
56.49
3.06
6.41
30.34
1.64
3.44
32.69
2.88
4.45
31.68
2.84
4.40
31.37
2.85
4.62
1.862
0.012
0.013
0.013
q~
Beta-binomial
Logistic-normal
Source
d.f.
Moment a
EQL a
PL a
Moment~
ML
Extract I Species
Species [ Extract
Species.Extract
1
1
1
22.95
2.64
3.54
24.67
2.69
3.72
24.76
2.72
3.98
21.49
2.54
3.52
31.33
2.85
4.44
0.025
0.022
--
0.021
--
-0.108
-0.056
q~
d 2
--
a(df corrected).
The data consist of the number of seeds and the number germinating for each replicate. Interest focusses on the possible differences in germination rates for the two
types of seed and root extract and whether there is any interaction. Table 1 presents
results for different models and estimation methods with the overdispersion parameter estimated from the interaction model and fixed for all sub-models. An alternative
strategy is to re-estimate the dispersion parameter for each model; this is considered
later. Note that the overdispersion parameter estimates are not all comparable as they
relate to different overdispersion models.
The residual deviance for the interaction model using the standard binomial fit is
33.28 on 17 df, giving strong evidence of overdispersion. Using a constant overdispersion model and quasi-likelihood gives a non-significant interaction term (deviance
= 3.44), with extract as the only important factor (deviance -- 30.34). The conclusions are less clear cut for the other overdispersion models with the interaction being
marginally significant. For the beta-binomial variance function the only real differences between the estimation methods are whether we use the degrees of freedom
correction or not; without any correction the EQL and PL results are very similar to the maximum likelihood fit (ML) and, indeed, using an uncorrected moment
method also leads to very similar estimates. The slight difference between the degrees of freedom corrected moment and PL methods is due to the different forms
of weighting in the estimating equation for 4). The sample sizes here vary from 4
to 81 giving weights which vary by a factor of 10 for a typical value of @, although the impact on the final estimate is slight. Interestingly, if we parameterize
the constant overdispersion model as 1 + 7 ( ~ - 1), where the mean sample size
is ~ = 39.6, we obtain ~ = 0.022, which is very close to the degrees of freedom
corrected estimates for the beta-binomial variance function. The differences between
158
the maximum likelihood and moment method fits for the logistic-normal can also be
attributed to the degrees of freedom correction. Notice that, as the fitted proportions
for the S p e c i e s * E x t r a c t model are not extreme (from 0.36 to 0.68 with overall
proportion 0.51) the results from using a logistic-normal variance function (4) are
very similar to those using a beta-binomial form (3) - for all fitted proportions r~i,
the logistic-normal moment estimates give ~2r~i(1 - z ~ ; ) ~ 0.025, the estimate of ~b
under the beta-binomial model.
Comparing the beta-binomial and binomial models the change in deviance for the
additional parameter is 2.34 on 1 degree of freedom giving no evidence for the
beta-binomial overdispersion function. Nelder and Pregibon (1987) make a similar
observation in considering the EQL fit. Liang and McCullagh (1993) conduct a
formal test between the constant overdispersion and beta-binomial overdispersion
models and are unable to choose between them. The same conclusion results from
using the general form of the variance function introduced above and looking at the
profile likelihood for 61 with 62 = 0.
There is considerable confusion about residual deviances for overdispersion models. In general, these provide no information about the fit of the model, because of
the estimation of the overdispersion parameters. Deviances for the beta-binomial and
logistic-normal models are often given with respect to a binomial saturated model
and, while this is a useful device for comparisons with the standard binomial model,
it does not provide a goodness of fit measure. For the beta-binomial family it is possible to calculate a true deviance for a fixed value of the overdispersion parameter,
but when the parameter is estimated, not surprisingly, this always seems to result in
values close to the degrees of freedom.
Using the NPML approach the mixing distribution is considered as a nuisance
parameter and estimated for each model. Table 2 compares the deviance differences
from this model with results from other overdispersion modelling methods when
the overdispersion parameter is re-estimated for each submodel. The results are all
similar. For the interaction model the NPML fit gives a two-point mixing distribution
with a variance of 0.08, comparable to the variance estimate of 0.056 for the logisticnormal fit. A plot of the component probabilities against the binomial model residuals
shows a strong monotonic relationship, indicating that the mixing distribution is
picking up overdispersion.
It is interesting to note that the results from Tables 1 and 2, for using a fixed
overdispersion parameter or re-estimating it in each model, are fairly similar. Although, as would be expected, all of the deviance contributions are reduced in Table 2
Table 2
Orobanche data: deviances with overdispersion re-estimated for each model
Logistic
Beta-binomial
Source
NPML
normal
ML
EQL
PL
Extract I Species
Species I Extract
Species.Extract
12.92
1.98
4.22
15.26
2.70
4.15
15.44
2.73
4.13
15.08
2.70
4.10
15.78
2.73
4.34
159
J. Hinde, C G.R Dem~trio l Computational Statistics & Data Analysis 27 (1998) 151-170
Table 3
Trout egg data
Location
in stream
11
1
2
3
4
5
89/94
106/108
119/123
104/104
49/93
94/98
91/106
100/130
80/97
11/113
77/86
87/96
88/119
67/99
18/88
141/155
104/122
91/125
111/ 132
0/138
Table 4
Trout egg data: deviances with overdispersion estimated from maximal model
Binomial
Constant
Beta-binomial
Source
d.f.
Logit
CLL
CLL
Logit
CLL
Loc[ Time
Time [ Loe
Residual
4
3
12
849.1
164.1
64.5
853.8
168.8
59.8
184.1
36.4
158.6
31.4
178.6
35.9
4.64
0.038
0.033
q~
Manly (1978) considers the analysis of data on the survival of trout eggs. Boxes
of eggs were buried at five different locations in a stream and at four different times a
box from each location was sampled and the number of surviving eggs were counted.
The data are presented in Table 3 as proportions s / n , where s denotes the number
of survivors and n the number of eggs in the box.
In his original analysis of this data, Manly used a log-log link function for the
probability of surviving, which we reproduce here by using a complementary log-log
link (CLL) for the probability o f death. In Table 4 we present deviance contributions
for the two factors in several models. The residual deviance for the binomial logit
model is 64.5 on 12 df, while that for the CLL model is 59.8 indicating a slight
preference for this link. The results for the beta-binomial variance function are based
on moment estimation.
The large amount of overdispersion present here has a clear impact on the effect
deviances, although both effects remain significant in all of the models. For the
beta-binomial variance function with complementary log-log link we have a slightly
smaller estimate of the overdispersion parameter and correspondingly larger deviance
160
J. Hinde, C.G.B. Demdtriol Computational Statistics & Data Analysis 27 (1998) 151-170
differences for the effects. We will return to the comparison of these models in
Section 5. The other point to note here is the great similarity for the complementary
log-log link of the constant overdispersion and beta-binomial deviances. This is
because the sample sizes here are relatively large and not too different (ranging from
86 to 155 with mean 111). If we parameterize the constant overdispersion model as
1 + 7 ( N - 1), where N is the mean sample size, we obtain ff=0.033, exactly the same
as in the beta-binomial model. Because of the large and relatively similar sample
sizes the other estimation approaches for the beta-binomial variance function give
similar results.
The NPML approach gives a less satisfactory answer here with components picking
off the observations with extreme observed proportions.
(6)
(7)
(8)
161
The asymptotic independence of ~ and k means that the standard errors for/~ from
the IRLS fit are correct, see Lawless (1987) for details.
Using EQL for the negative binomial variance function presents some ambiguity due to different factorizations of Var ( Y / ) = ~iV(#i). Three obvious possibilities
are
(i) 1~)i~ 1, V(#;) =/~i + #~/k;
(ii) q~i = 1 + #ilk, V(#i) = I~i;
(iii) ~bi = #i + #~/k, V(#i) =- 1.
In principle, all of these lead to different estimating equations for fl, defining
different iterative schemes. On convergence these all give the same estimates and
the sensible approach is to use quasi-likelihood with the negative binomial variance
function. For the estimation of k things are not so simple and the different formulations will lead to different estimates. Using (i) leads to an estimating equation for
k similar in form to the negative binomial score equation. In cases (ii) and (iii)
the parameter k appears in the scale parameter and gamma estimating equations are
obtained. In (ii) Poisson deviances are used as the y-variable, while in (iii) we use
Poisson Pearson residuals and this corresponds to pseudo-likelihood with estimating
equation
~ - ~ ( (Yi--l~i)2
i=~ #i(1 ~ #i/k)
1} dlog(1 + # J k )
dk
= O.
Here the simple moment method gives the unbiased estimating equation
i=l
#i(1 +
#i/k)
162
parameters, ~i and fl~, are fixed, we obtain different compound distributions with
different variance functions.
3.2. Examples
3.2.1. Pump failure data
Gaver and O'Muircheartaigh (1987) present data on the numbers of failures and
the period of operation, ti (measured in 1000s of hours) for 10 pumps from a
nuclear plant. The pumps were operated in two different modes; four being run
continuously and the others kept on standby and only run intermittently. To model
the failure rates we need to allow for the different periods of operation. Using a
log-linear model for the numbers of failures we include log t; as an offset in the
linear predictor. The Poisson model allowing for the two modes of operation has
residual deviance of 71.4 on 8 df, showing a very large degree of overdispersion.
Table 5 shows the results for a constant overdispersion model and a negative binomial variance function estimated by maximum likelihood (ML), EQL (using form
(ii)) and PL. For the negative binomial variance function all three estimation methods give similar results. A likelihood ratio test for overdispersion comparing the
negative binomial and Poisson likelihoods (k = c~) has a value of 45.25 on 1
df. Since the null hypothesis of a Poisson model corresponds to a parameter value
on the boundary of the parameter space, the appropriate asymptotic distribution for
J. Hinde, C.G.B. Dem~triol Computational Statistics & Data Analysis 27 (1998) 151-170
163
Table 5
Pump data: deviances and parameter estimates
Source
d.f.
Poisson
Constant
ML
QL
Negative binomial
ML
EQL
PL
Deviances
Mode
53.1
4.8
6.1
7.3
9.4
1.68
0.60
--
1.68
0.61
-1.39
0.56
Parameter estimates
mode
S.E.
q~
/~
S.E.
1.88
0.23
0.0
--
1.88
0.77
11.2
--
1.67
0.63
-1.30
1.46
0.63
0.56
1 2
this statistic under the null hypothesis has a probability mass of ~1 at 0 and a 5Xfl)
distribution above 0, see Lawless (1987). Here there is clearly overwhelming evidence against the Poisson assumption. A comparison of the two overdispersion
models gives no clear preference, but with such a small data set that is hardly
surprising.
164
27.5
25.0
22.5
20.0
17.5
LL
15.0
"5
..0
x x
12.5
E
.-z
z
10.0
x
7.5
x
x
x
x x
5.0
2.5
x
0.0
i
200
400
600
800
1000
L~ngth of Roll
Table 6
Fabric fault data: deviances and parameter estimates
Source
Poisson
ML
Constant
QL
Negative
binomial
Poisson
normal
Poisson
NPML
14.9
51.7
15.8
49.4
Deviances
log l
Residual
39.2
64.5
17.3
28.5
15.7
30.7
Parameter estimates
log l
s.e.
/~
t~
s.c.
0.997
0.176
0.0
---
0.997
0.265
2.27
---
0.938
0.228
-8.67
-0.63
0.922
0.221
--0.34
0.07
0.800
0.201
--0.31
is a s s u m e d to b e t h e s a m e f o r
all o f t h e o b s e r v a t i o n s . H o w e v e r , in s o m e a p p l i c a t i o n s it is q u i t e c o n c e i v a b l e t h a t t h e
overdispersion
m a y b e d i f f e r e n t in d i f f e r e n t s u b g r o u p s
o f t h e data. E x p l i c i t m o d e l s
165
for the variance, and hence overdispersion, are easily handled by an additional model
for the scale parameter of the form
h(4~i) = r'z~
for some suitable link function h, usually the identity or the log. The vector of
explanatory variables z~ may include covariates in the mean model giving great flexibility for joint modelling of the mean and dispersion. Estimation can proceed by
either EQL or PL using a gamma estimating equation for g as in Section 2.1.2;
see McCullagh and Nelder (1989, Ch. 12), for a detailed discussion of this. A related approach is the double exponential family of Efron (1986), in which standard
one-parameter exponential family distributions are extended by the inclusion of an
additional parameter 0, which varies the dispersion of the family by altering the
effective sample size. The dispersion parameters 0~ can again be modelled by explanatory variables and the estimation procedure is very similar to that for EQL or
PL. For a simple example of modelling within this family see Fitzmaurice (1997).
A very general framework for these extended models is given by the exponential
dispersion models of Jorgensen (1987, 1997).
This approach to dispersion modelling is in the spirit of the our first category for
handling overdispersion in Section 1. A natural way to extend the second category
of models is through the addition of more complex random effects structures in the
linear predictor, taking
where ig is a vector of fixed effects, ~ is a vector of random effects and xi and zi are
corresponding vectors of explanatory variables. Assuming that these random effects
are normally distributed gives a direct generalization of the standard linear mixed
model for normally distributed responses to what is commonly called the generalized
linear mixed model (GLMM). Estimation within this family is non-trivial and a number of different approaches have been proposed, including penalised quasi-likelihood
(Breslow and Clayton, 1993), restricted maximum likelihood (Engel and Keen, 1994)
and Bayesian methods using Markov chain Monte Carlo (Clayton, 1996). In some
simple models with nested random effects, maximum likelihood estimation is possible (Anderson and Hinde, 1988) and Aitkin and Francis (1995) describe GLIM4
macros for fitting such models. In many situations the assumption of normality for
the random effects is neither natural nor computationally convenient and Lee and
Nelder (1996) propose an extension of GLMMs to hierarchical generalized linear
models. Here, the random components can come from an arbitrary distribution, although they particularly favour the use of a distribution conjugate to that of the
response. Estimation is based on h-likelihood, a generalization of the restricted maximum likelihood method used for standard normal linear mixed model. Such models
are also easily handled within the Bayesian paradigm using Markov chain MonteCarlo methods (Clayton, 1996). The non-parametric maximum likelihood approach
can also be extended to these more complex models, see Aitldn (1996) and Aitkin
and Francis (1995).
166
Table 7
Rat survival data: models with c o m m o n and distinct overdispersion parameters - deviances and
parameter estimates.
Common overdispersion
Source
d.f.
Binomial
ML
Constant
QL
Beta-binomial
moment
Logistic-normal
moment
Beta-binomial
ML
Trealment
9.02
3.35
3.86
5.68
5.77
0.0
--
2.69
--
0.20
--
-1.29
0.02, 0.31
--
q~
#2
4.1. Examples
4.1.1. Rat survival data
Weil (1970) presents data on a toxicological study with a treatment and control
group each comprising of 16 litters. The mothers in the treatment group received a
diet containing the chemical of interest and the response is the number of rat pups in
the litters surviving after 21 days as a fraction of the number alive at 4 days. Fitting
a binomial logit model with a treatment effect results in a residual deviance of 86.2
on 30 degrees of freedom with clear evidence of overdispersion. The deviance for
the treatment effect is shown in Table 7 for several overdispersion models. With both
constant overdispersion and a beta-binomial variance function the treatment effect is
not significant.
Examination of the data shows that there are different degrees of overdispersion
in the two groups and fitting a beta-binomial model with different parameters for the
two groups shows that the overdispersion parameters (0.02 and 0.31 for the control
and treatment groups, respectively) differ by more than a factor of 10, however, the
standard errors are very large. Allowing for this difference in variability, the treatment
effect becomes significant. A similar conclusion is obtained using a logistic-normal
variance function (4), where the overdispersion factor models the variance in the
two groups. Liang and McCullagh (1993) note that, allowing the overdispersion
to be different for the two groups, it is again not possible to choose between the
constant and beta-binomial type of overdispersion. In terms of our general form of
overdispersed variance function (5) it seems not to matter what value is taken for
6~, however, taking 62 = 1 provides a simple method of allowing for the different
overdispersions in the two groups.
4.1.2. Orobanche #ermination data
Using EQL or PL we can fit the full interaction model S p e c i e s * E x t r a c t with
different overdispersion parameters for each of the four species/extract combinations.
The results from both estimation methods are very similar and while the estimates
for the (#-parameters range from 0.002 to 0.018 the standard errors are very large.
The change in extended quasi-deviance is only 0.2, showing no evidence against the
common overdispersion parameter model.
3. Hinde, C G.R Dem~triol Computational Statistics & Data Analysis 27 (1998) 151-170
167
Binomial: LogR
4.0.
,ot
3.5
3.5
3.0
3.0
<o
g
7~
2.5-
"0
"
2.0x
t-
1.5-
"
2.5,
2.0
e'.
~"
.--~
1.5
x.. x.. x
1.0-
11
1.0
xx
0.5
0.5.
0.0
0.0.
o'.~
(a
1:o
1.5
0:5
2:o
1:0
1'.5
Half-normal scores
Half-normal scores
Beta-Binomial (Moment)
Logistic-Normal (Moment)
,ot
3.5
,o1
3.0
3.0-
2.5.
2.5-
2:0
3.5
gJ
"0
8
eta
5t~
2.0.
""
2.0.
1.5.
" x~ ~ x
.~
1.5 -
x I
1.0.
0.5.
c5
:x x x ~
1.0.
0.5
xx~ "~
0.0
0.0-
0:5
1:o
I:5
o:5
2:0
Half-normal scores
F i g . 2. O r o b a n c h e
data. Half-normal
1:0
1:5
2:0
Half-normal scores
plots: x - d a t a ; - -
simulated envelope.
5. Diagnostics
We have already mentioned problems with assessing the fit of overdispersed models, in that when an overdispersion parameter is estimated, the residual deviance or
Pearson X 2 are typically close to the degrees of freedom. Similarly, residuals based
on either of these quantities will be scaled and not particularly large. However, this
does not mean that the residuals are no longer useful for model diagnostics, it is just
that any useful information is contained in their pattern and not their absolute value.
Standard residual plots can be used to explore the adequacy of the linear predictor
and link function and identify outliers. A plot against the fitted values will provide
an informal check of the specification of the variance function V(#), however, this
may not be helpful in choosing between overdispersion models involving the scale
J. Hinde, C.G.B. Dem~triol Computational Statistics & Data Analysis 27 (1998) 151-170
168
Binomial: Logit
5-
up
X
.'12_
4.
u)
up
nP
r,3
cr6
OJ
3.
2.
t~
t~
x x
x
~
~
o15
1'.o
112
21o
015
Half-normal scores
1:0
115
2:0
Half-normal scores
5.
_m
o~
..,z
4.
.-g_
r~
0~
n"
rr
3.
o=
2.
C3
A(
D
1
o'.5
11o
(.5
21o
o'.5
Half-normal scores
11o
115
21o
Half-normal scores
data;
--
simulated envelope.
parameter ~b. For example, the constant overdispersion and beta-binomial variance
function models differ only in the dependence of q~ on the binomial sample size.
Liang and McCullagh (1993) use plots of binomial residuals against sample size to
suggest an appropriate model, however, it seems that such plots are rarely definitive.
Ganio and Schafer (1992) also consider diagnostics for overdispersion models using
a form of added variable plot to compare variance functions.
A useful omnibus technique for examining the residuals is to use a half-normal
plot with a simulation envelope (Atkinson, 1985) which takes account of the overdispersion in the model. Dem&rio and Hinde (1997) give a simple GLIM4 macro for
such half-normal plots which is easily extended to a wide range of overdispersed
models. Fig. 2 shows half-normal plots for some of the models considered in Table 1
for the orobanche data. In all cases the residuals are for the full S p e c i e s * E x t r a c t
model with a logit link function. This clearly shows the failure of the basic binomial
169
model and also suggests that the constant overdispersion (quasi-likelihood) model
is inadequate. There is no clear evidence to choose between the other two variance
functions, although the beta-binomial form seems most appropriate.
These plots can also indicate the failure o f various other model aspects. For example, with the trout egg data, discussed in Section 2.2.2, Fig. 3 very clearly shows
the failure o f both ordinary binomial models and, more interestingly, the inadequacy
o f the overdispersed model with a logit link, while the complementary log-log link
model has residuals inside o f the envelope. The technique is also useful for detecting
the presence o f outliers which can have a large impact on overdispersion estimates.
Acknowledgements
This work was carried out during reciprocal exchange visits funded by F A P E S P
and The Royal Society.
References
Aitkin, M.A., 1995. NPML estimation of the mixing distribution in general statistical models with
unobserved random variation. In: Seeber, G.U.H., Francis, B.J., Hatzinger, R., Steckel-Berger G.
(Eds.), Statistical Modelling. Springer, New York.
Aitkin, M.A., 1996. A general maximum likelihood analysis of overdispersion in generalized linear
models. Statist. Comput. 6, 251-262.
Aitkin, M.A., Francis, B.J., 1995. Fitting overdispersed generalized linear models by nonparametric
maximum likelihood. GLIM Newslett. 25, 37-45.
Anderson, D.A., Hinde, J.P., 1988. Random effects in generalized linear models and the EM algorithm.
Comm. Statist. A Theory Methods 17, 3847-3856.
Atkinson, A.C., 1985. Plots, Transformations and Regression. Clarendon Press, Oxford.
Breslow, N., 1984. Extra-Poisson variation in log-linear models. Appl. Statist. 33, 38-44.
Breslow, N., Clayton, D., 1993. Approximate inference in generalized linear mixed models. J. Amer.
Statist. Assoc. 88, 9-25.
Brooks, R.J., 1984. Approximate likelihood ratio tests in the analysis of beta-binomial data, Appl.
Statist. 33, 285-289.
Carroll, R.J., Ruppert, D., 1988. Transformation and Weighting in Regression, Chapman & Hall,
London.
Clayton, D., 1996. Generalized linear mixed models. In: Gilks, W.R., Richardson, S., Spiegelhalter,
D.J., (Eds.), Markov Chain Monte Carlo in Practice. Chapman & Hall, London.
Collett, D., 1991. Modelling Binary Data. Chapman & Hall, London.
Crowder, M., 1978. Beta-binomial anova for proportions. Appl. Statist. 27, 34-37.
Demrtrio, C.G.B., Hinde, J.P., 1997. Half-normal plots and overdispersion. GLIM Newslett. 27, 19-26.
Efron, B., 1986. Double exponential families and their use in generalized linear regression. J. Amer.
Statist. Assoc. 81,709-721.
Engel, B., Keen, A., 1994. A simple approach for the analysis of generalized linear mixed models.
Statist. Neerlandica 48, 1-22.
Finney, D.J., 1971. Probit Analysis, 3rd ed. Cambridge University Press, Cambridge.
Fitzmaurice, G.M., 1987. Model selection with overdispersed data. The Statistician 46, 81-91.
Ganio, L.M., Schafer, D.W., 1992. Diagnostics for overdispersion. J. Amer. Statist. Assoc. 87, 795-804.
Gaver, D.P., O'Muircheartaigh, I.G., 1987. Robust empirical Bayes analysis of event rates,
Technometrics 29, 1-15.
170
Hinde, J.P., 1982. Compound Poisson regression models. In: Gilchrist, R. (Ed.), GLIM82. Springer,
New York, 1982, pp. 109-121.
Hinde, J.P., 1996. Macros for fitting overdispersion models. GLIM Newslett. 26, 10-19.
Jorgensen, B., 1987. Exponential dispersion models (with discussion). J. Roy. Statist. Soc. Ser. B 49,
127-162.
Jorgensen, B., 1997. The Theory of Dispersion Models. Chapman & Hall, London.
Lawless, J.F., 1987. Negative binomial and mixed Poisson regression. Canadian J. Statist. 15, 209-225.
Lee, Y., Nelder, J.A., 1996. Hierarchical generalized linear models (with discussion). J. Roy. Statist.
Soc. Ser. B 58, 619-678.
Liang, K.-Y., McCullagh, P., 1993. Case studies in binary dispersion. Biometrics 49, 623-630.
Lindsey, J.K., 1995. Modelling Frequency and Count Data. Clarendon Press, Oxford.
Manly, B., 1978. Regression models for proportions with extraneous variance. Biometrie-Praximetrie
18, 1-18.
McCullagh, P., Nelder, J.A., 1989. Generalized Linear Models, 2nd ed. Chapman & Hall, London.
McCulloch, C., 1994. Maximum likelihood variance components estimation for binary data. J. Amer.
Statist. Assoc. 89, 330-335.
Moore, D.F., 1986. Asymptotic properties of moment estimates for overdispersed counts and
proportions. Biometrika 73, 583-588.
Nelder, J.A., 1985. Quasi-likelihood and GLIM. In: Gilchrist, R., Francis, B., Whittaker, J. (Eds.),
Generalized Linear Models. Springer, Berlin, pp. 120-127.
Nelder, J.A., Y. Lee, Y., 1992. Likelihood, quasi-likelihood and pseudo-likelihood: some comparisons.
J. Roy. Statist. Soc. Ser. B 54, 273-284.
Nelder, J.A., Pregibon, D., 1987. An extended quasi-likelihood function. Biometrika 74, 221-232.
Prentice, R.L., 1986. Binary regression using an extended beta-binomial distribution, with discussion
of correlation induced by covariate measurement errors. J. Amer. Statist. Assoc. 81, 321-327.
Weil, C., 1970. Selection of the valid number of sampling units and a consideration of their combination
in toxicological studies involving reproduction, teratogenesis or carcinogenesis. Food Cosmetics
Toxicol. 8, 177-182.
Williams, D.A., 1982. Extra-binomial variation in logistic-linear models. Appl. Statist. 31, 144-148.
Williams, D.A., 1996. Overdispersion in logistic-linear models. In: Morgan, B.J.T. (Ed.), Statistics in
Toxicology. Clarendon Press, Oxford.