The GLIMMIX Procedure
Contents
OVERVIEW . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
Basic Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Assumptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Notation for the Generalized Linear Mixed Model . . . . . . . . . . . . . . 7
The Basic Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
G-side and R-side Random Effects . . . . . . . . . . . . . . . . . . . . 8
Relationship with Generalized Linear Models . . . . . . . . . . . . . . . 9
PROC GLIMMIX Contrasted with Other SAS Procedures . . . . . . . . . . 9
GETTING STARTED . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
Logistic Regressions with Random Intercepts . . . . . . . . . . . . . . . . 11
SYNTAX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
PROC GLIMMIX Statement . . . . . . . . . . . . . . . . . . . . . . . . . 18
BY Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
CLASS Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
CONTRAST Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
ESTIMATE Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
FREQ Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
ID Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
LSMEANS Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
LSMESTIMATE Statement . . . . . . . . . . . . . . . . . . . . . . . . . . 53
MODEL Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
Response Variable Options . . . . . . . . . . . . . . . . . . . . . . . . . 60
Model Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
NLOPTIONS Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
OUTPUT Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
PARMS Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
RANDOM Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
WEIGHT Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
Programming Statements . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
User-Defined Link or Variance Function . . . . . . . . . . . . . . . . . . . 104
Implied Variance Functions . . . . . . . . . . . . . . . . . . . . . . . . 104
Automatic Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
DETAILS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
Generalized Linear Models Theory . . . . . . . . . . . . . . . . . . . . . . 108
Overview

Data analyzed with the GLIMMIX procedure often display correlations among some or all observations as well as non-
normality. The correlations can arise from repeated observation of the same sampling
units, shared random effects in an experimental design, spatial (temporal) proximity,
multivariate observations, and so on.
The GLIMMIX procedure does not fit hierarchical models with non-normal random
effects. With the GLIMMIX procedure you select the distribution of the response
variable conditional on normally distributed random effects.
For more information on the differences between the GLIMMIX procedure and SAS
procedures that specialize in certain subsets of the GLMM models, see the “PROC
GLIMMIX Contrasted with Other SAS Procedures” section on page 9.
Basic Features
The GLIMMIX procedure enables you to specify a generalized linear mixed model
and to perform confirmatory inference in such models. The syntax is similar to that
of the MIXED procedure and includes CLASS, MODEL, and RANDOM statements.
The following are some of the basic features of PROC GLIMMIX.
Assumptions
The primary assumptions underlying the analyses performed by PROC GLIMMIX
are as follows:
• If the model contains random effects, the distribution of the data conditional on the random effects is known. This distribution is either a member of the exponential family of distributions or one of the supplementary distributions provided by the GLIMMIX procedure. In models without random effects, the unconditional distribution of the data is assumed known or, in the case of quasi-likelihood estimation, only its first two moments are assumed known.
For a model containing random effects, the GLIMMIX procedure, by default, esti-
mates the parameters by applying pseudo-likelihood techniques as in Wolfinger and
O’Connell (1993) and Breslow and Clayton (1993). In a model without random
effects (GLM models), PROC GLIMMIX estimates the parameters by maximum
likelihood, restricted maximum likelihood, or quasi-likelihood. See the “Singly or
Doubly Iterative Fitting” section on page 140 on when the GLIMMIX procedure ap-
plies noniterative, singly and doubly iterative algorithms, and the “Default Estimation
Techniques” section on page 142 on the default estimation methods.
Once the parameters have been estimated, you can perform statistical inferences for
the fixed effects and covariance parameters of the model. Tests of hypotheses for
the fixed effects are based on Wald-type tests and the estimated variance-covariance
matrix.
PROC GLIMMIX uses the Output Delivery System (ODS) for displaying and con-
trolling the output from SAS procedures. ODS enables you to convert any of the
output from PROC GLIMMIX into a SAS data set. See the “ODS Table Names”
section on page 160.
ODS statistical graphics are available with the GLIMMIX procedure. For more infor-
mation, see the PLOTS options of the PROC GLIMMIX and LSMEANS statements.
For more information on the ODS GRAPHICS statement, see Chapter 15, “Statistical
Graphics Using ODS” (SAS/STAT User’s Guide).
Notation for the Generalized Linear Mixed Model

The Basic Model

Let Y denote the (n × 1) vector of observed data and γ the (r × 1) vector of random effects. Models fit by the GLIMMIX procedure assume that

E[Y|γ] = g⁻¹(Xβ + Zγ)

where g(·) is a differentiable monotonic link function and g⁻¹(·) is its inverse. The matrix X is an (n × p) matrix of rank k, and Z is an (n × r) design matrix for the random
effects. The random effects are assumed to be normally distributed with mean 0 and
variance matrix G.
The GLMM contains a linear mixed model inside the inverse link function. This
model component is referred to as the linear predictor,
η = Xβ + Zγ
The variance of the observations, conditional on the random effects, is

var[Y|γ] = A^{1/2} R A^{1/2}

The matrix A is a diagonal matrix and contains the variance functions of the model.
The variance function expresses the variance of a response as a function of the mean.
The GLIMMIX procedure determines the variance function from the DIST= option in
the MODEL statement or from the user-supplied variance function (see the “Implied
Variance Functions” section on page 104). The matrix R is a variance matrix spec-
ified by the user through the RANDOM statement. If the conditional distribution of
the data contains an additional scale parameter, it is either part of the variance func-
tions or part of the R matrix. For example, the gamma distribution with mean µ has
variance function a(µ) = µ² and var[Y|γ] = µ²φ. If your model calls for G-side ran-
dom effects only (see below), the procedure models R = φI, where I is the identity
matrix. Table 9 on page 104 identifies the distributions for which φ ≡ 1.
To add a multiplicative overdispersion parameter in these models, you specify the statement

random _residual_;
You specify the link function g(·) with the LINK= option of the MODEL statement
or with programming statements. You specify the variance function that controls the
matrix A with the DIST= option of the MODEL statement or with programming
statements.
Unknown quantities subject to estimation are the fixed-effects parameter vector β
and the covariance parameter vector θ that comprises all unknowns in G and R.
The random effects γ are not parameters of the model in the sense that they are not
estimated. The vector γ is a vector of random variables. The solutions for γ are
predictors of these random variables.
Some fitting algorithms require that the best linear unbiased predictors (BLUPs) of γ
be computed at every iteration.
PROC GLIMMIX Contrasted with Other SAS Procedures

A detailed comparison of the GLIMMIX and MIXED procedures is given in the section beginning on page 138. The remainder of this section compares PROC GLIMMIX with the
GENMOD, NLMIXED, LOGISTIC, and CATMOD procedures.
The GENMOD procedure fits generalized linear models for independent data by
maximum likelihood. It can also handle correlated data through the marginal GEE
approach of Liang and Zeger (1986) and Zeger and Liang (1986). The GEE imple-
mentation in the GENMOD procedure is a marginal method that does not incorporate
random effects. The GEE estimation in the GENMOD procedure relies on R-side co-
variances only, and the unknown parameters in R are estimated by the method of
moments. The GLIMMIX procedure allows G-side random effects and R-side co-
variances. The parameters are estimated by likelihood-based techniques.
Many of the fit statistics and tests in the GENMOD procedure are based on the like-
lihood. In a GLMM it is not always possible to derive the log likelihood of the data.
Even if the log likelihood is tractable, it may be computationally infeasible. In some
cases, the objective function must be constructed based on a substitute model. In other
cases, only the first two moments of the marginal distribution can be approximated.
Consequently, obtaining likelihood-based tests and statistics is difficult in the major-
ity of generalized linear mixed models. The GLIMMIX procedure relies heavily on
linearization and Taylor-series techniques to construct Wald-type test statistics and
confidence intervals. Likelihood ratio tests and confidence intervals are not available
in the GLIMMIX procedure.
The NLMIXED procedure also fits generalized linear mixed models but the class of
models it can accommodate is more narrow. The NLMIXED procedure relies on ap-
proximating the marginal log likelihood by integral approximation through Gaussian
quadrature. Like the GLIMMIX procedure, the NLMIXED procedure defines the
problem of obtaining solutions for the parameter estimates as an optimization prob-
lem. The objective function for the NLMIXED procedure is the marginal log like-
lihood obtained by integrating out the random effects from the joint distribution of
responses and random effects using quadrature techniques. Although these are very
accurate, the number of random effects that can be practically managed is limited.
Also, R-side random effects cannot be accommodated with the NLMIXED proce-
dure. The GLIMMIX procedure, on the other hand, determines the marginal log
likelihood as that of an approximate linear mixed model. This allows multiple ran-
dom effects, nested and crossed random effects, multiple cluster types, and R-side
random components. The disadvantage is a doubly iterative fitting algorithm and the
absence of a true log likelihood.
The LOGISTIC and CATMOD procedures also fit generalized linear models but ac-
commodate the independence case only. Binary, binomial, multinomial models for
ordered data, and generalized logit models that can be fit with PROC LOGISTIC,
can also be fit with the GLIMMIX procedure. The diagnostic tools and capabili-
ties specific to such data implemented in the LOGISTIC procedure go beyond the
capabilities of the GLIMMIX procedure.
Getting Started
Logistic Regressions with Random Intercepts
Researchers investigated the performance of two medical procedures in a multicenter
study. They randomly selected 15 centers for inclusion. One of the study goals
was to compare the occurrence of side effects for the procedures. In each center nA
patients were randomly selected and assigned to procedure “A,” and nB patients were
randomly assigned to procedure “B”. The following DATA step creates the data set
for the analysis.
data multicenter;
input center group$ n sideeffect;
datalines;
1 A 32 14
1 B 33 18
2 A 30 4
2 B 28 8
3 A 23 14
3 B 24 9
4 A 22 7
4 B 22 10
5 A 20 6
5 B 21 12
6 A 19 1
6 B 20 3
7 A 17 2
7 B 17 6
8 A 16 7
8 B 15 9
9 A 13 1
9 B 14 5
10 A 13 3
10 B 13 1
11 A 11 1
11 B 12 2
12 A 10 1
12 B 9 0
13 A 9 2
13 B 9 6
14 A 8 1
14 B 8 1
15 A 7 1
15 B 8 0
;
The variable group identifies the two procedures, n is the number of patients who
received a given procedure in a particular center, and sideeffect is the number of
patients who reported side effects.
If YiA and YiB denote the number of patients in center i who report side effects for
procedures A and B, respectively, then—for a given center—these are independent
binomial random variables. To model the probability of side effects for the two procedures,
πiA and πiB , you need to account for the fixed group effect and the random selection
of centers. One possibility is to assume a model that relates group and center effects
linearly to the logit of the probabilities:
log(πiA / (1 − πiA)) = β0 + βA + γi

log(πiB / (1 − πiB)) = β0 + βB + γi
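A minimal set of statements to fit this model, consistent with the description that follows, is:

proc glimmix data=multicenter;
   class center group;
   model sideeffect/n = group / solution;
   random intercept / subject=center;
run;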
The PROC GLIMMIX statement invokes the procedure. The CLASS statement in-
structs the procedure to treat the variables center and group as classification vari-
ables. The MODEL statement specifies the response variable as a sample proportion
using the events/trials syntax. In terms of the previous formulas, sideeffect/n corre-
sponds to YiA /niA for observations from Group A and to YiB /niB for observations
from Group B. The SOLUTION option in the MODEL statement requests a listing
of the solutions for the fixed-effects parameter estimates. Note that because of the
events/trials syntax, the GLIMMIX procedure defaults to the binomial distribution,
and that distribution’s default link is the logit link. The RANDOM statement spec-
ifies that the linear predictor contains an intercept term that randomly varies at the
level of the center effect. In other words, a random intercept is drawn separately and
independently for each center in the study.
The results of this analysis are shown in the following tables.
The “Model Information Table” in Figure 1 summarizes important information about
the model you fit and about aspects of the estimation technique.
Figure 1. Model Information
PROC GLIMMIX recognizes the variables sideeffect and n as the numerator and
denominator in the events/trials syntax, respectively. The distribution—conditional
on the random center effects—is binomial. The marginal variance matrix is block-
diagonal, and observations from the same center form the blocks. The default estima-
tion technique in generalized linear mixed models is residual pseudo-likelihood with
a subject-specific expansion (METHOD=RSPL).
In Figure 2, the “Class Level Information” table lists the levels of the variables
specified in the CLASS statement and the ordering of the levels. The “Number of
Observations” table displays the number of observations read and used in the analy-
sis.
Figure 2. Class Level Information

Class     Levels   Values
center    15       1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
group     2        A B
There are two variables listed in the CLASS statement. The center variable has
fifteen levels, and the group variable has two levels. Since the response is specified
through the events/trials syntax, the “Number of Observations” table also contains the
total number of events and trials used in the analysis.
The “Dimensions” table in Figure 3 lists the size of relevant matrices.
Figure 3. Dimensions
There are three columns in the X matrix, corresponding to an intercept and the two
levels of the group variable. For each subject (center), the Z matrix contains only an
intercept column.
The “Optimization Information” table in Figure 4 provides information about the
methods and size of the optimization problem.
Figure 4. Optimization Information
The default optimization technique for generalized linear mixed models is the quasi-Newton method. Because a residual likelihood technique is used to compute the objective function, only the covariance parameters participate in the optimization.
A lower boundary constraint is placed on the variance component for the random
center effect. The solution for this variance cannot be less than zero.
The “Iteration History” table in Figure 5 displays information about the progress of
the optimization process.
Figure 5. Iteration History (columns: Iteration, Restarts, Subiterations, Objective Function, Change, Max Gradient)
After the initial optimization, the GLIMMIX procedure performed 15 updates before
the convergence criterion was met. At convergence, the largest absolute value of the
gradient was near zero. This indicates that the process stopped at an extremum of the
objective function.
The “Fit Statistics” table in Figure 6 lists information about the fitted model.
Figure 6. Fit Statistics
Twice the negative of the residual log likelihood in the final pseudo-model equaled
81.44. The ratio of the generalized chi-square statistic and its degrees of freedom is
close to 1. This is a measure of the residual variability in the marginal distribution of
the data.
The “Covariance Parameter Estimates” table in Figure 7 displays estimates and
asymptotic estimated standard errors for all covariance parameters.
Figure 7. Covariance Parameter Estimates (columns: Cov Parm, Subject, Estimate, Standard Error)
The variance of the random center intercepts on the logit scale is estimated as σ̂c² = 0.6176.
The “Parameter Estimates” table in Figure 8 displays the solutions for the fixed effects
in the model.
Figure 8. Parameter Estimates (columns: Effect, group, Estimate, Standard Error, DF, t Value, Pr > |t|)

Based on the fixed-effects estimates, the estimated probabilities of side effects for the two procedures are

π̂A = 1 / (1 + exp{0.8071 + 0.4896}) = 0.2147

π̂B = 1 / (1 + exp{0.8071}) = 0.3085
Figure 9. Type III Tests of Fixed Effects (columns: Effect, Num DF, Den DF, F Value, Pr > F)
Because the group effect has only two levels, the p-value for the effect is the same as
in the “Parameter Estimates” table, and the “F Value” is the square of the “t Value”
shown there.
You can produce the estimates of the average logits in the two groups and their pre-
dictions on the scale of the data with the LSMEANS statement in PROC GLIMMIX.
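For example, the following statements add such a request to the model fit above (a sketch consistent with the option descriptions that follow):

proc glimmix data=multicenter;
   class center group;
   model sideeffect/n = group / solution;
   random intercept / subject=center;
   lsmeans group / cl ilink;
run;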
The LSMEANS statement requests the least-squares means of the group effect on the
logit scale. The CL option requests their confidence limits. The ILINK option adds
estimates, standard errors, and confidence limits on the mean (probability) scale. The
table in Figure 10 displays the results.
Figure 10. group Least Squares Means (columns: group, Estimate, Standard Error, DF, t Value, Pr > |t|, Alpha, Lower, Upper, Mean, Standard Error Mean, Lower Mean, Upper Mean)
The “Estimate” column displays the least-squares mean estimate on the logit scale,
and the “Mean” column represents its mapping onto the probability scale. The
“Lower” and “Upper” columns are 95% confidence limits for the logits in the two
groups. The “Lower Mean” and “Upper Mean” columns are the corresponding con-
fidence limits for the probabilities of side effects. These limits are obtained by in-
versely linking the confidence bounds on the linear scale, and thus are not symmetric
about the estimate of the probabilities.
Syntax
You can specify the following statements in the GLIMMIX procedure.
PROC GLIMMIX Statement

PROC GLIMMIX < options > ;

The PROC GLIMMIX statement invokes the procedure. You can specify the following options.

ABSPCONV=r
specifies an absolute parameter estimate convergence criterion for doubly iterative estimation methods. The purpose of the ABSPCONV criterion is to stop the process when the absolute change in parameter estimates is less than the tolerance criterion r. The criterion is based on fixed effects and covariance parameters.
Note that this convergence criterion does not affect the convergence criteria
applied within any individual optimization. In order to change the conver-
gence behavior within an optimization, you can change the ABSCONV=,
ABSFCONV=, ABSGCONV=, ABSXCONV=, FCONV=, or GCONV= options of
the NLOPTIONS statement.
ASYCORR
produces the asymptotic correlation matrix of the covariance parameter estimates. It
is computed from the corresponding asymptotic covariance matrix (see the descrip-
tion of the ASYCOV option, which follows).
ASYCOV
requests that the asymptotic covariance matrix of the covariance parameter estimates
be displayed. By default, this matrix is the observed inverse Fisher information ma-
trix, which equals mH⁻¹, where H is the Hessian (second derivative) matrix of the
objective function. The factor m equals 1 in a GLM and equals 2 in a GLMM.
When you use the SCORING= option and PROC GLIMMIX converges without stop-
ping the scoring algorithm, the procedure uses the expected Hessian matrix to com-
pute the covariance matrix instead of the observed Hessian. Regardless of whether a
scoring algorithm is used or the number of scoring iterations has been exceeded, you
can request that the asymptotic covariance matrix be based on the expected Hessian
with the EXPHESSIAN option of the PROC GLIMMIX statement. If a residual
scale parameter is profiled from the likelihood equation, the asymptotic covariance
matrix is adjusted for the presence of this parameter; details of this adjustment pro-
cess are found in Wolfinger, Tobias, and Sall (1994) and in the “Estimated Precision
of Estimates” section on page 117.
CHOLESKY | CHOL
requests that the mixed model equations be constructed and solved using the
Cholesky root of the G matrix. This option applies only to estimation methods that
involve mixed model equations. The Cholesky root algorithm has greater numerical
stability but also requires more computing resources. When the estimated G ma-
trix is not positive definite during a particular function evaluation, PROC GLIMMIX
switches to the Cholesky algorithm for that evaluation and returns to the regular al-
gorithm if Ĝ becomes positive definite again. When the CHOLESKY option is in
effect, the procedure applies the algorithm all the time.
DATA=SAS-data-set
names the SAS data set to be used by PROC GLIMMIX. The default is the most
recently created data set.
EMPIRICAL<=CLASSICAL | HC0>
EMPIRICAL<=DF | HC1>
EMPIRICAL<=ROOT | HC2>
EMPIRICAL<=FIRORES | HC3>
EMPIRICAL<=FIROEEQ<(r)>>
requests that the covariance matrix of the fixed-effects parameter estimates be com-
puted by using one of the asymptotically consistent estimators, known as sandwich or
empirical estimators. The name stems from the layering of the estimator. An empir-
ically based estimate of the inverse variance of the fixed-effects parameter estimates
(the “meat”) is wrapped by the model-based variance estimate (the “bread”).
Empirical estimators are useful for obtaining inferences for the fixed effects that are
not sensitive to the choice of the covariance model. In nonmixed models, they are
useful, for example, to allay the effects of variance heterogeneity on the tests of fixed
effects.
For a general model, let Y denote the response with mean µ and variance Σ, and
let D be the matrix of first derivatives of µ with respect to the fixed effects β. The
classical sandwich estimator (Huber 1967; White 1980; Liang and Zeger 1986) is
Ω̂ ( ∑_{i=1}^{m} D̂i′ Σ̂i⁻¹ ei ei′ Σ̂i⁻¹ D̂i ) Ω̂

where ei = yi − µ̂i denotes the residual for the ith independent sampling unit, Ω̂ is the model-based covariance matrix of the fixed-effects estimates, k is the rank of X, and m equals the sum of all frequencies in a GLM and the number of subjects in a GLMM.
IC=NONE | PQ | Q
determines the computation of information criteria in the “Fit Statistics” table. Here, ℓ denotes the maximum value of the (possibly restricted) log likelihood, log
pseudo-likelihood, or log quasi-likelihood, d is the dimension of the model, and n,
n∗ reflect the size of the data.
The IC=PQ option requests that the penalties include the number of fixed effects
parameters, when estimation in models with random effects is based on a residual
(restricted) likelihood. For METHOD=MSPL and METHOD=MRPL, IC=Q and
IC=PQ produce the same results. IC=Q is the default for linear mixed models with
normal errors and the resulting information criteria are identical to the IC option in
the MIXED procedure.
The quantities d, n, and n∗ depend on the model and IC= option.
– n equals the sum of the frequencies for maximum likelihood and quasi-
likelihood estimation and n−rank(X) for restricted maximum likelihood
estimation.
– n∗ equals n, unless n < d + 2, in which case n∗ = d + 2.
GLMM, IC=Q:
– d equals the number of effective covariance parameters, that is, covariance parameters whose solution does not fall on the boundary. For estimation of an unrestricted objective function (METHOD=MMPL, METHOD=MSPL), this value is incremented by rank(X).
– n equals the effective number of subjects as displayed in the “Dimensions” table, unless this value equals 1, in which case n equals the number of levels of the first G-side RANDOM effect specified. If the number of effective subjects equals 1 and there are no G-side random effects, n is determined as

n = f − rank(X) for METHOD=RMPL and METHOD=RSPL
n = f otherwise

where f is the sum of the frequencies used in the analysis.
INITGLM
requests that the estimates from a generalized linear model fit (a model without ran-
dom effects) be used as the starting values for the generalized linear mixed model.
INITITER<=number>
specifies the maximum number of iterations used when a generalized linear model is
fit initially to derive starting values for the fixed effects; see the INITGLM option.
By default, the initial fit involves at most four iteratively reweighted least squares
updates. You can change the upper limit of initial iterations with number. If the
model does not contain random effects, this option has no effect.
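For example, the following statements (a sketch using the multicenter data from the “Getting Started” section; the value 10 is purely illustrative) request GLM-based starting values with at most ten initial iterations:

proc glimmix data=multicenter initglm inititer=10;
   class center group;
   model sideeffect/n = group / solution;
   random intercept / subject=center;
run;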
ITDETAILS
adds parameter estimates and gradients to the “Iteration History” table.
LIST
requests that the model program and variable lists be displayed. This is a debugging
feature and is not normally needed. When you use programming statements to define
your statistical model, this option enables you to examine the complete set of state-
ments submitted for processing. See the “Programming Statements” section for more
details on how to use SAS statements with the GLIMMIX procedure.
MAXLMMUPDATE<=number>
MAXOPT<=number>
specifies the maximum number of optimizations for doubly iterative estimation meth-
ods based on linearizations. After each optimization, a new pseudo-model is con-
structed through a Taylor series expansion. This step is known as the linear mixed
model update. The MAXLMMUPDATE option limits the number of updates and
thereby limits the number of optimizations. If this option is not specified, number is
set equal to the value specified in the MAXITER option of the NLOPTIONS state-
ment. If no MAXITER= value is given, number defaults to 20.
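For example, the following statements (a sketch based on the multicenter data; the numeric values are illustrative) allow up to 50 linear mixed model updates and tighten the relative parameter convergence criterion with the PCONV= option described later in this section:

proc glimmix data=multicenter maxlmmupdate=50 pconv=1e-6;
   class center group;
   model sideeffect/n = group / solution;
   random intercept / subject=center;
run;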
METHOD=RSPL
METHOD=MSPL
METHOD=RMPL
METHOD=MMPL
specifies the estimation method in a generalized linear mixed model (GLMM). The
default is METHOD=RSPL.
Estimation methods ending in “PL” are pseudo-likelihood techniques. The first letter
of the METHOD= identifier determines whether estimation is based on a residual
likelihood (“R”) or a maximum likelihood (“M”). The second letter identifies the
expansion locus for the underlying approximation. Pseudo-likelihood methods for
generalized linear mixed models can be cast in terms of Taylor series expansions
(linearizations) of the GLMM. The locus of the expansion is either the
vector of random effects solutions (“S”) or the mean of the random effects (“M”). The
expansions are also referred to as the “S”ubject-specific and “M”arginal expansions.
The abbreviation “PL” identifies the method as a pseudo-likelihood technique.
Residual methods account for the fixed effects in the construction of the objective
function, which reduces the bias in covariance parameter estimates. Estimation meth-
ods involving Taylor series create pseudo data for each optimization. Those data are
transformed to have zero mean in a residual method. While the covariance parameter
estimates in a residual method are the maximum likelihood estimates for the trans-
formed problem, the fixed effects estimates are (estimated) generalized least squares
estimates. In a likelihood method that is not residual based, both the covariance pa-
rameters and the fixed effects estimates are maximum likelihood estimates, but the
former are known to have greater bias. In some problems, residual likelihood esti-
mates of covariance parameters are unbiased.
For more information about linearization methods for generalized linear mixed mod-
els, see the “Pseudo-Likelihood Estimation Based on Linearization” section (begin-
ning on page 115).
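For example, the following statements (a sketch based on the multicenter data) request pseudo-likelihood estimation with a maximum likelihood objective function and a subject-specific expansion instead of the default METHOD=RSPL:

proc glimmix data=multicenter method=mspl;
   class center group;
   model sideeffect/n = group / solution;
   random intercept / subject=center;
run;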
If the model does not contain random effects, the GLIMMIX procedure estimates the model parameters by maximum likelihood, restricted maximum likelihood, or quasi-likelihood; see the “Default Estimation Techniques” section on page 142.
NAMELEN<=number>
specifies the length to which long effect names are shortened. The default and mini-
mum value is 20.
NOCLPRINT<=number>
suppresses the display of the “Class Level Information” table, if you do not specify
number. If you specify number, only levels with totals that are less than number are
listed in the table.
NOFIT
suppresses fitting of the model. When the NOFIT option is in effect, PROC
GLIMMIX produces the “Model Information,” “Class Level Information,” “Number
of Observations,” and “Dimensions” tables. These can be helpful to gauge the com-
putational effort required to fit the model. For example, the “Dimensions” table in-
forms you as to whether the GLIMMIX procedure processes the data by subjects,
which is typically more computationally efficient than processing the data as a single
subject. See the “Processing by Subjects” section (beginning on page 123) for more
information.
If you request a radial smooth with knot selection by k-d tree methods, PROC
GLIMMIX also computes the knot locations of the smoother. You can then examine
the knots without fitting the model. This enables you to try out different knot con-
struction methods and bucket sizes. See the KNOTMETHOD=KDTREE option (and
its suboptions) of the RANDOM statement.
NOITPRINT
suppresses the display of the “Iteration History” table.
NOPROFILE
includes the scale parameter φ into the optimization for models that have such a pa-
rameter (see Table 9 on page 104). By default, the GLIMMIX procedure profiles
scale parameters from the optimization in mixed models. In generalized linear mod-
els, scale parameters are not profiled.
NOREML
determines the denominator for the computation of the scale parameter in a GLM for
normal data and for overdispersion parameters. By default, the GLIMMIX procedure
computes the scale parameter for the normal distribution as
φ̂ = ( ∑_{i=1}^{n} fi (yi − ŷi)² ) / (f − k)

where k is the rank of X, fi is the frequency associated with the ith observation, and f = ∑ fi. Similarly, the overdispersion parameter in an overdispersed GLM is
estimated by the ratio of the Pearson statistic and (f − k). If the NOREML option is
in effect, the denominators are replaced by f , the sum of the frequencies. In a GLM
for normal data, this yields the maximum likelihood estimate of the error variance.
In GLMM models fit by pseudo-likelihood methods, the NOREML option changes
the estimation method to the nonresidual form. See the METHOD= option for the
distinction between residual and nonresidual estimation methods.
ODDSRATIO | OR
requests that odds ratios be added to the output when applicable. Specifying the
ODDSRATIO option in the PROC GLIMMIX statement has the same effect as
specifying the ODDSRATIO option in the MODEL statement and the LSMEANS,
LSMESTIMATE, and ESTIMATE statements.
Odds ratios and their confidence limits are only reported for the following link func-
tions: LINK=LOGIT, LINK=CUMLOGIT, and LINK=GLOGIT.
ORDER=DATA
ORDER=FORMATTED
ORDER=FREQ
ORDER=INTERNAL
specifies the sorting order for the levels of all CLASS variables. This ordering de-
termines which parameters in the model correspond to each level in the data, so the
ORDER= option may be useful when you use CONTRAST or ESTIMATE state-
ments.
When the default ORDER=FORMATTED is in effect for numeric variables for which
you have supplied no explicit format, the levels are ordered by their internal values.
To order numeric class levels with no explicit format by their BEST12. formatted
values, you can specify this format explicitly for the CLASS variables.
The following table shows how PROC GLIMMIX interprets values of the ORDER= option.

Value of ORDER=    Levels Sorted By
DATA               order of appearance in the input data set
FORMATTED          external formatted value, except for numeric variables with no explicit format, which are sorted by their unformatted (internal) value
FREQ               descending frequency count; levels with the most observations come first in the order
INTERNAL           unformatted value
For FORMATTED and INTERNAL values, the sort order is machine dependent.
For more information on sorting order, see the chapter on the SORT procedure in the
SAS Procedures Guide and the discussion of BY-group processing in SAS Language
Reference: Concepts.
PCONV=r
specifies the parameter estimate convergence criterion for doubly iterative estimation
methods. The GLIMMIX procedure applies this criterion to fixed effects estimates
and covariance parameter estimates. Suppose ψ̂i^(u) denotes the estimate of the ith
parameter at the uth optimization. The procedure terminates the doubly iterative
process if the largest value
2 × |ψ̂i^(u) − ψ̂i^(u−1)| / ( |ψ̂i^(u)| + |ψ̂i^(u−1)| )
is less than r. To check an absolute convergence criterion as well, you can set the
ABSPCONV= option of the PROC GLIMMIX statement. The default value for r is
1E8 times the machine epsilon, a product that equals about 1E−8 on most machines.
Note that this convergence criterion does not affect the convergence criteria
applied within any individual optimization. In order to change the con-
vergence behavior within an optimization, you can use the ABSCONV=,
ABSFCONV=, ABSGCONV=, ABSXCONV=, FCONV=, or GCONV= op-
tions of the NLOPTIONS statement.
PLOTS<(global-options)> <= specific-plot<(specific-plot-options)>>
PLOTS<(global-options)><= ( specific-plot<(specific-plot-options)>
... specific-plot<(specific-plot-options)>)>
requests that the GLIMMIX procedure produce statistical graphics via the Output
Delivery System, provided that the ODS GRAPHICS statement has been specified.
For general information about ODS graphics, see Chapter 15, “Statistical Graphics
Using ODS” (SAS/STAT User’s Guide). For examples of the basic statistical graphics
produced by the GLIMMIX procedure and aspects of their computation and interpre-
tation, see the section “Statistical Graphics for LS-Mean Comparisons” on page 155
in this chapter.
The global-options apply to all plots generated by the GLIMMIX procedure, unless
they are altered by a specific-plot-option. Currently, the global plot options supported by
the GLIMMIX procedure are
The following listing describes the specific plots and their options.
ALL requests that all default plots are produced. The default for each
residual plot is based on using the BLUPs of random effects and
representing the residual on the linearized (linked) scale. Plots of
least-squares means differences are created only for LSMEANS
statements without options that would contradict such a display.
ANOMPLOT requests an analysis of means display in which least-squares means are compared against an average LS-mean (Ott 1967; Nelson 1982). LS-mean analysis of means plots are produced only for those model effects that are listed in LSMEANS statements with options that do not contradict the display. For example, the statements

lsmeans A / diff=anom;
lsmeans B / diff;
lsmeans C ;
produce analysis of mean plots for effects A and C. The DIFF op-
tion in the second LSMEANS statement implies all pairwise dif-
ferences.
When differences against the average LS-mean are adjusted for
multiplicity with the ADJUST=NELSON option of the LSMEANS
statement, the ANOMPLOT display is adjusted accordingly.
CONTROLPLOT requests a display in which least-squares means are visually com-
pared against a reference level. LS-mean control plots are only
produced for those model effects that are listed in LSMEANS state-
ments that have options that do not contradict with the display. For
example, the statements
lsmeans A / diff=control(’1’);
lsmeans B / diff;
lsmeans C ;
produce control plots for effects A and C. The DIFF option in the
second LSMEANS statement implies all pairwise differences.
When differences against a control level are adjusted for multi-
plicity with the ADJUST= option of the LSMEANS statement, the
control plot display is adjusted accordingly.
DIFFPLOT <(ABS | NOABS)> requests a display of all pairwise least-squares
means differences and their significance. The display is also known
as a “mean-mean scatter plot” (Hsu 1996; Hsu and Peruggia 1994).
For each comparison a line segment, centered at the LS-means in
the pair, is drawn. The length of the segment corresponds to the
projected width of a confidence interval for the least-squares mean
difference. Segments that fail to cross the 45 degree reference line
correspond to significant least-squares mean differences. The ABS
and NOABS suboptions determine the positioning of the line seg-
ments in the plot. When the ABS option is in effect, and this is the
default, all line segments are shown on the same side of the refer-
ence line. The NOABS option separates comparisons according to
the sign of the difference.
If you specify the ADJUST= option in the LSMEANS statement,
the lengths of the line segments are adjusted for multiplicity.
LS-mean difference plots are only produced for those model effects
that are listed in LSMEANS statements that have options that do
not conflict with the display. For example, the statements
lsmeans A / diff=control(’1’);
lsmeans B / diff;
lsmeans C ;
request differences against a control level for the A effect, all pair-
wise differences for the B effect, and the least-squares means for
the C effect. The DIFF= type in the first statement contradicts a
display of all pairwise differences. Difference plots are produced
only for the B and C effects.
MEANPLOT <(meanplot-options)> requests a display of the least-squares means
of effects specified in LSMEANS statements. The following
meanplot-options affect the display. Upper and lower confidence
limits are plotted when the CL option is used. When the CLBAND
option is in effect, confidence limits are shown as bands, and the
means are connected. By default, least-squares means are not
joined by lines. You can achieve that effect with the JOIN or
CONNECT options. Least-squares means are displayed in the
same order as they appear in the “Least Squares Means” table.
You can change that order for plotting with the ASCENDING and
DESCENDING options. The ILINK option requests that results be
displayed on the inverse linked scale.
Note that there is also a MEANPLOT suboption of the PLOTS=
option in the LSMEANS statement. In addition to the meanplot-
options just described, you can also specify classification effects
that give you more control over the display of interaction means.
NONE requests that no plots are produced.
RESIDUALPANEL <(residualplot-options)> requests a paneled display con-
structed from raw residuals. The panel consists of a plot of
the residuals against the linear predictor or predicted mean, a
histogram with normal density overlaid, a Q-Q plot, and a box
plot of the residuals. The residualplot-options enable you to
specify which type of residual is being graphed. These are further
discussed below.
STUDENTPANEL <(residualplot-options)> requests a paneled display constructed
from studentized residuals. The same panel organization is applied
as for the RESIDUALPANEL plot type.
PEARSONPANEL <(residualplot-options)> requests a paneled display con-
structed from Pearson residuals. The same panel organization is
applied as for the RESIDUALPANEL plot type.
The default is to compute residuals on the linearized scale using BLUPs if the
model contains G-side random effects. Not all combinations of BLUP/NOBLUP
and ILINK/NOILINK suboptions are possible for all residual types and models. For
details, see the description of output statistics for the OUTPUT statement. Pearson
residuals are always displayed against the linear predictor; all other residuals are
graphed versus the linear predictor if the NOILINK suboption is in effect (default),
and against the corresponding prediction on the mean scale if the ILINK option is in
effect. See Table 7 (page 81) for a definition of the residual quantities and exclusions.
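For example, the following statements (a sketch based on the multicenter data; the suboptions shown are taken from the descriptions above) request a panel of raw residuals on the mean scale and a least-squares means plot with confidence limits:

ods graphics on;
proc glimmix data=multicenter plots=(residualpanel(ilink) meanplot(cl ilink));
   class center group;
   model sideeffect/n = group;
   random intercept / subject=center;
   lsmeans group;
run;
ods graphics off;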
PROFILE
requests that scale parameters are profiled from the optimization, if possible. This is
the default for generalized linear mixed models. In generalized linear models with
normally distributed data, you can use the PROFILE option to request profiling of the
residual variance.
SCOREMOD
requests that the Hessian matrix in GLMMs be based on a modified scoring algo-
rithm, provided that PROC GLIMMIX is in scoring mode when the Hessian is eval-
uated. The procedure is in scoring mode during iteration, if the optimization tech-
nique requires second derivatives, the SCORING=n option is specified, and the iter-
ation count has not exceeded n. The procedure also computes the expected (scoring)
Hessian matrix when you use the EXPHESSIAN option of the PROC GLIMMIX
statement.
The SCOREMOD option has no effect if the SCORING= or EXPHESSIAN op-
tions are not specified. The nature of the SCOREMOD modification to the expected
Hessian computation is shown in Table 10 on page 119 in the section “Pseudo-
Likelihood Estimation Based on Linearization” on page 115. The modification can
improve the convergence behavior of the GLMM compared to standard Fisher scoring
and can provide a better approximation of the variability of the covariance parame-
ters. For more details, see the “Estimated Precision of Estimates” section.
SCORING=number
requests that Fisher scoring be used in association with the estimation method up to
iteration number. By default, no scoring is applied. When you use the SCORING=
option and PROC GLIMMIX converges without stopping the scoring algorithm, the
procedure uses the expected Hessian matrix to compute approximate standard errors
for the covariance parameters instead of the observed Hessian. If necessary, the stan-
dard errors of the covariance parameters as well as the output from the ASYCOV and
ASYCORR options are adjusted.
BY Statement
31
If scoring stopped prior to convergence and you want to use the expected Hessian
matrix in the computation of standard errors, use the EXPHESSIAN option of the
PROC GLIMMIX statement.
Scoring is not possible in models for nominal data. It is also not possible for
GLMs with unknown distribution or those outside the exponential family. If you
perform quasi-likelihood estimation, the GLIMMIX procedure is always in scoring
mode and the SCORING= option has no effect. See the section “Quasi-Likelihood
for Independent Data” on page 111 for a description of the types of models where
GLIMMIX applies quasi-likelihood estimation.
The SCORING= option has no effect for optimization methods that do not involve
second derivatives. See the TECHNIQUE= option of the NLOPTIONS statement
and the section “Choosing an Optimization Algorithm” on page 142 for details on
first- and second-order algorithms.
SINGCHOL=number
tunes the singularity criterion in Cholesky decompositions. The default is the square
root of the SINGULAR criterion.
SINGULAR=number
tunes the general singularity criterion applied by the GLIMMIX procedure in divi-
sions and inversions. The default is 1E4 times the machine epsilon; this product is
approximately 1E−12 on most computers.
STARTGLM
is an alias of the INITGLM option.
BY Statement
BY variables ;
You can specify a BY statement with PROC GLIMMIX to obtain separate analyses on
observations in groups defined by the BY variables. When a BY statement appears,
the procedure expects the input data set to be sorted in order of the BY variables. The
variables are one or more variables in the input data set.
If your input data set is not sorted in ascending order, use one of the following alter-
natives:
• Sort the data using the SORT procedure with a similar BY statement.
• Specify the BY statement options NOTSORTED or DESCENDING in the BY
statement for the GLIMMIX procedure. The NOTSORTED option does not
mean that the data are unsorted but rather that the data are arranged in groups
(according to values of the BY variables) and that these groups are not neces-
sarily in alphabetical or increasing numeric order.
• Create an index on the BY variables using the DATASETS procedure (in Base
SAS software).
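For example, the following statements (a sketch; grouping the analysis by the group variable of the multicenter data is purely illustrative) sort the data and then fit a separate random-intercept model for each BY group:

proc sort data=multicenter;
   by group;
run;

proc glimmix data=multicenter;
   by group;
   class center;
   model sideeffect/n = / solution;
   random intercept / subject=center;
run;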
Since sorting the data changes the order in which PROC GLIMMIX reads observa-
tions, the sorting order for the levels of the CLASS variable may be affected if you
have specified ORDER=DATA in the PROC GLIMMIX statement. This, in turn, af-
fects specifications in the CONTRAST, ESTIMATE, or LSMESTIMATE statements.
For more information on the BY statement, refer to the discussion in SAS Language
Reference: Concepts. For more information on the DATASETS procedure, refer to
the discussion in the SAS Procedures Guide.
CLASS Statement
CLASS variables ;
The CLASS statement names the classification variables to be used in the analysis. If
the CLASS statement is used, it must appear before the MODEL statement.
Classification variables can be either character or numeric. By default, class levels
are determined from the entire formatted values of the CLASS variables. Note that
this represents a slight change from previous releases in the way in which class levels
are determined. In releases prior to SAS 9, class levels were determined using no
more than the first 16 characters of the formatted values. If you wish to revert to this
previous behavior you can use the TRUNCATE option in the CLASS statement. In
any case, you can use formats to group values into levels. Refer to the discussion of
the FORMAT procedure in the SAS Procedures Guide and to the discussions of the
FORMAT statement and SAS formats in SAS Language Reference: Dictionary. You
can adjust the order of CLASS variable levels with the ORDER= option in the PROC
GLIMMIX statement.
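For example, the following statements (a sketch; the data set clinic, the numeric variable age, and the format agegrp. are hypothetical) use a format to group the values of a numeric CLASS variable into two levels:

proc format;
   value agegrp low-<40 = 'younger'
                40-high = 'older';
run;

proc glimmix data=clinic;
   class center age group;
   format age agegrp.;
   model sideeffect/n = group age / solution;
   random intercept / subject=center;
run;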
You can specify the following option in the CLASS statement after a slash (/).
TRUNCATE
specifies that class levels should be determined using no more than the first 16 char-
acters of the formatted values of CLASS variables. When formatted values are longer
than 16 characters, you can use this option in order to revert to the levels as deter-
mined in releases previous to SAS 9.
CONTRAST Statement
CONTRAST ’label’ contrast-specification
<, contrast-specification > <, . . . >
< / options > ;
The CONTRAST statement provides a mechanism for obtaining custom hypothesis
tests. It is patterned after the CONTRAST statement in PROC MIXED and enables
you to select an appropriate inference space (McLean, Sanders, and Stroup 1991).
You can test the hypothesis L0 φ = 0, where L0 = [K0 M0 ] and φ0 = [β 0 γ 0 ],
in several inference spaces. The inference space corresponds to the choice of M.
When M = 0, your inferences apply to the entire population from which the random
effects are sampled; this is known as the broad inference space. When all elements
of M are nonzero, your inferences apply only to the observed levels of the random
effects. This is known as the narrow inference space, and you can also choose it by
specifying all of the random effects as fixed. The GLM procedure uses the narrow
inference space. Finally, by zeroing portions of M corresponding to selected main
effects and interactions, you can choose intermediate inference spaces. The broad
inference space is usually the most appropriate; it is used when you do not specify
random effects in the CONTRAST statement.
In the CONTRAST statement,
label identifies the contrast in the table. A label is required for every
contrast specified. Labels can be up to 20 characters and must be
enclosed in single quotes.
contrast-specification identifies the fixed-effects and random-effects and their coef-
ficients from which the L matrix is formed. The syntax represen-
tation of a contrast-specification is
< fixed-effect values . . . > < | random-effect values . . . >
fixed-effect identifies an effect that appears in the MODEL statement. The
keyword INTERCEPT can be used as an effect when an intercept
is fitted in the model. You do not need to include all effects that are
in the MODEL statement.
random-effect identifies an effect that appears in the RANDOM statement. The
first random effect must follow a vertical bar (|); however, random
effects do not have to be specified.
values are constants that are elements of the L matrix associated with the
fixed and random effects.
The rows of L0 are specified in order and are separated by commas. The rows of the
K0 component of L0 are specified on the left side of the vertical bars (|). These rows
test the fixed effects and are, therefore, checked for estimability. The rows of the M0
component of L0 are specified on the right side of the vertical bars. They test the
random effects, and no estimability checking is necessary.
If PROC GLIMMIX finds the fixed-effects portion of the specified contrast to be
nonestimable (see the SINGULAR= option on page 35), then it displays missing
values for the test statistics.
If the elements of L are not specified for an effect that contains a specified effect, then
the elements of the unspecified effect are automatically “filled in” over the levels of
the higher-order effect. This feature is designed to preserve estimability for cases
when there are complex higher-order effects. The coefficients for the higher-order ef-
fect are determined by equitably distributing the coefficients of the lower-level effect
as in the construction of least-squares means. In addition, if the intercept is specified,
it is distributed over all classification effects that are not contained by any other speci-
fied effect. If an effect is not specified and does not contain any specified effects, then
all of its coefficients in L are set to 0. You can override this behavior by specifying
coefficients for the higher-order effect.
If too many values are specified for an effect, the extra ones are ignored; if too few
are specified, the remaining ones are set to 0. If no random effects are specified,
the vertical bar can be omitted; otherwise, it must be present. If a SUBJECT effect
is used in the RANDOM statement, then the coefficients specified for the effects in
the RANDOM statement are equitably distributed across the levels of the SUBJECT
effect. You can use the E option to see exactly what L matrix is used.
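For example, the following CONTRAST statement (a sketch for the two-level group effect of the multicenter data; because no random-effect coefficients are given, M = 0 and the test applies to the broad inference space) tests the difference between the two procedures and displays the L matrix:

contrast 'A vs B' group 1 -1 / e;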
PROC GLIMMIX handles missing level combinations of classification variables sim-
ilarly to PROC GLM and PROC MIXED. These procedures delete fixed-effects pa-
rameters corresponding to missing levels in order to preserve estimability. However,
PROC MIXED and PROC GLIMMIX do not delete missing level combinations for
random-effects parameters, because linear combinations of the random-effects pa-
rameters are always estimable. These conventions can affect the way you specify
your CONTRAST coefficients.
The CONTRAST statement computes the statistic

F = [β̂′ γ̂′] L (L′ĈL)⁻¹ L′ [β̂′ γ̂′]′ / rank(L)

where Ĉ is an estimate of the variance-covariance matrix of [β̂′, (γ̂ − γ)′]′.
For example, in a generalized logit model relating the preferred style of instruction to school and educational program effects, a CONTRAST statement can compare school effects jointly across all categories of the response variable.
ESTIMATE Statement
ESTIMATE ’label’ contrast-specification <(divisor=n)>
<, ’label’ contrast-specification <(divisor=n)> ><, . . . >
< /options > ;
The ESTIMATE statement provides a mechanism for obtaining custom hypothesis
tests. As in the CONTRAST statement, the basic element of the ESTIMATE state-
ment is the contrast-specification, which consists of MODEL and G-side RANDOM
effects and their coefficients. Specifically, a contrast-specification takes the form
PROC GLIMMIX then produces for each row l of L0 an approximate t test of the
hypothesis H: lφ = 0, where φ = [β 0 γ 0 ]0 . You can also obtain multiplicity adjusted
p-values and confidence limits for multi-row estimates with the ADJUST= option.
The output from multiple ESTIMATE statements is organized as follows. Results
from unadjusted estimates are reported first in a single table, followed by separate
tables for each of the adjusted estimates. Results from all ESTIMATE statements are
combined in the “Estimates” ODS table.
Note that multi-row estimates are permitted. Unlike using the CONTRAST state-
ment, you need to specify a ’label’ for every row of the multi-row estimate, since
PROC GLIMMIX produces one test per row.
PROC GLIMMIX selects the degrees of freedom to match those displayed in the
“Type III Tests of Fixed Effects” table for the final effect you list in the ESTIMATE
statement. You can modify the degrees of freedom using the DF= option. If you se-
lect DDFM=NONE and do not modify the degrees of freedom using the DF= option,
PROC GLIMMIX uses infinite degrees of freedom, essentially computing approx-
imate z tests. If PROC GLIMMIX finds the fixed-effects portion of the specified
estimate to be nonestimable, then it displays “Non-est” for the estimate entry.
ADJDFE=SOURCE
ADJDFE=ROW
specifies how denominator degrees of freedom are determined when p-values and
confidence limits are adjusted for multiple comparisons with the ADJUST= op-
tion. When you do not specify the ADJDFE= option, or when you specify
ADJDFE=SOURCE, the denominator degrees of freedom for multiplicity-adjusted
results are the denominator degrees of freedom for the final effect listed in the
ESTIMATE statement from the “Type III Tests of Fixed Effects” table.
The ADJDFE=ROW setting is useful if you want multiplicity adjustments to take
into account that denominator degrees of freedom are not constant across esti-
mates. This can be the case, for example, when the DDFM=SATTERTH or
DDFM=KENWARDROGER degrees-of-freedom methods are in effect.
ADJUST=BON
ADJUST=SCHEFFE
ADJUST=SIDAK
ADJUST=SIMULATE<(simoptions)>
ADJUST=T
requests a multiple comparison adjustment for the p-values and confidence limits
for the estimates. The adjusted quantities are produced in addition to the unad-
justed quantities. Adjusted confidence limits are produced if the CL or ALPHA=
options are in effect. For a description of the adjustments, see Chapter 32, “The GLM
Procedure,” and Chapter 48, “The MULTTEST Procedure,” in the SAS/STAT User’s
Guide and the documentation for the ADJUST= option of the LSMEANS statement.
The ADJUST option is ignored for generalized logit models.
If the STEPDOWN option is in effect and you choose ADJUST=BON or
ADJUST=SIMULATE, the p-values are further adjusted in a step-down fashion.
ALPHA=number
requests that a t-type confidence interval be constructed for each estimate with confidence level 1 − number. The value of number must be between 0 and 1; the default is 0.05.
BYCATEGORY
requests that, in models for nominal data (generalized logit models), the estimates be repeated for each nonredundant response category. For example, in a generalized logit model relating the preferred style of instruction to school and educational program effects, one ESTIMATE statement can compare school effects separately for each nonredundant category, and a second ESTIMATE statement can compare the school effects for the first non-reference category only.
The BYCATEGORY option has no effect unless your model is a generalized (mixed)
logit model.
CL
requests that t type confidence limits be constructed. If DDFM=NONE and you do
not specify degrees of freedom with the DF= option, PROC GLIMMIX uses infi-
nite degrees of freedom, essentially computing a z interval. The confidence level is
0.95 by default. These intervals are adjusted for multiplicity when you specify the
ADJUST= option.
DF=number
specifies the degrees of freedom for the t test and confidence limits. The default is
the denominator degrees of freedom taken from the “Tests of Fixed Effects” table and
corresponds to the final effect you list in the ESTIMATE statement.
DIVISOR=value-list
specifies a list of values by which to divide the coefficients so that fractional coeffi-
cients can be entered as integer numerators. If you do not specify value-list a default
value of 1.0 is assumed. Missing values in the value-list are converted to 1.0.
If the number of elements in value-list exceeds the number of rows of the estimate,
the extra values are ignored. If the number of elements in value-list is less than the
number of rows of the estimate, the last value in value-list is copied forward.
If you specify a row-specific divisor as part of the specification of the estimate row,
this value multiplies the corresponding divisor implied by the value-list. For example,
the statement
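(sketched here with a hypothetical five-level CLASS effect A, since the original statement is not reproduced)

estimate 'A1 vs A2' A 1 -1 0 0 0 (divisor=2),
         'A1 vs A3' A 1 0 -1 0 0,
         'A1 vs A4' A 1 0 0 -1 0,
         'A1 vs A5' A 1 0 0 0 -1 / divisor=4,.,3;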
divides the coefficients in the first row by 8, and the coefficients in the third and fourth
row by 3. Coefficients in the second row are not altered.
E
requests that the L matrix coefficients be displayed.
ILINK
requests that the estimate and its standard errors are also reported on the scale of
the mean (the inverse linked scale). This enables you to obtain estimates of pre-
dicted probabilities and their standard errors in logistic models, for example. PROC
GLIMMIX computes the value on the mean scale by applying the inverse link to the
estimate. The interpretation of this quantity depends on the fixed-effect values and
random-effect values specified in your ESTIMATE statement. If you also specify the
CL option, the GLIMMIX procedure computes confidence limits for the estimate on
the mean scale. In multinomial models for nominal data the limits are obtained by
the delta method. In other models they are obtained from the inverse link transfor-
mation of the confidence limits for the estimate. The ILINK option is specific to an
ESTIMATE statement.
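For example, the following statement (a sketch for the random-intercept logistic model of the “Getting Started” section; the coefficients pick out procedure “A”) reports the estimated logit together with its inverse-linked value and confidence limits:

estimate 'logit, procedure A' intercept 1 group 1 0 / ilink cl;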
LOWER
LOWERTAILED
requests that the p-value for the t test be based only on values less than the test statis-
tic. A two-tailed test is the default. A lower-tailed confidence limit is also produced
if you specify the CL or the ALPHA= option.
ODDSRATIO | OR
requests that the estimate is also reported in terms of the odds ratio. This option
is ignored unless you are using either the LOGIT, CUMLOGIT, or GLOGIT link. If
you specify the CL option, confidence intervals for the odds ratios are also computed.
These intervals are adjusted for multiplicity when you specify the ADJUST= option.
SINGULAR=number
tunes the estimability checking as documented for the CONTRAST statement.
Experimental STEPDOWN <(step-down options)>
requests that multiplicity adjustments for the p-values of estimates be further ad-
justed in a step-down fashion. Step-down methods increase the power of multiple
testing procedures by taking advantage of the fact that a p-value will never be de-
clared significant unless all smaller p-values are also declared significant. Note that
the STEPDOWN adjustment is available only for ADJUST=BON and ADJUST=SIMULATE.
MAXTIME = n specifies the time (in seconds) to spend computing the maxi-
mal logically consistent sequential subsets of equality hypothe-
ses for TYPE=LOGICAL. The default is MAXTIME=60. If the
MAXTIME value is exceeded, the adjusted tests are not com-
puted. When this occurs, you can try increasing the MAXTIME
value. However, note that there are common multiple comparisons
problems for which this computation requires a huge amount of
time—for example, all pairwise comparisons between more than
ten groups. In such cases, try using TYPE=FREE (the default) or
TYPE=LOGICAL(n) for small n.
ORDER=PVALUE
ORDER=ROWS specifies the order in which the step-down tests are performed.
ORDER=PVALUE is the default, with estimates being declared
significant only if all estimates with smaller (unadjusted) p-values
are significant. If you specify ORDER=ROWS, then significances
are evaluated in the order in which they are specified in the syntax.
REPORT specifies that a report on the step-down adjustment should be dis-
played, including a listing of the sequential subsets (Westfall 1997)
and, for ADJUST=SIMULATE, the step-down simulation results.
TYPE=LOGICAL <(n)>
TYPE=FREE If you specify TYPE=LOGICAL, the step-down adjustments are
computed using maximal logically consistent sequential subsets of
equality hypotheses (Shaffer 1986, Westfall 1997). Alternatively,
for TYPE=FREE, sequential subsets are computed ignoring logical
constraints. The TYPE=FREE results are more conservative than
those for TYPE=LOGICAL, but they can be much more efficient
to produce for many estimates. For example, it is not feasible to take logical constraints into account for all pairwise comparisons of more than about ten groups. For this reason, TYPE=FREE is the default.
UPPER
UPPERTAILED
requests that the p-value for the t-test be based only on values greater than the test
statistic. A two-tailed test is the default. An upper-tailed confidence limit is also
produced if you specify the CL or the ALPHA= option.
FREQ Statement
FREQ variable ;
The variable in the FREQ statement identifies a numeric variable in the data set or
one computed through PROC GLIMMIX programming statements that contains the
frequency of occurrence for each observation. PROC GLIMMIX treats each obser-
vation as if it appears f times, where f is the value of the FREQ variable for the
observation. If it is not an integer, the frequency value is truncated to an integer.
If the frequency value is less than 1 or missing, the observation is not used in the
analysis. When the FREQ statement is not specified, each observation is assigned a
frequency of 1.
The analysis produced using a FREQ statement reflects the expanded number of ob-
servations.
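A minimal sketch of the FREQ statement (the frequency variable count and the other names are illustrative):

proc glimmix;
   class trt;
   model y = trt;
   freq count;   /* each observation is treated as if it occurred count times */
run;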
ID Statement
ID variables ;
The ID statement specifies which quantities to include in the OUT= data set from the
OUTPUT statement in addition to any statistics requested in the OUTPUT statement.
If no ID statement is given, the GLIMMIX procedure includes all variables from the
input data set in the OUT= data set. Otherwise, only the variables you list in the
ID statement are included. Specifying an ID statement with no variables prevents
any variables from being included in these data sets. Automatic variables such as
_LINP_, _MU_, _VARIANCE_, etc. are not transferred to the OUT= data set unless explicitly listed in the ID statement.
The ID statement can be used to transfer computed quantities that depend on the
model to an output data set. In the following example, two sets of Hessian weights are
computed in a Gamma regression with a noncanonical link. The covariance matrix
for the fixed effects can be constructed as the inverse of X′WX, where W is a diagonal matrix of the wei or woi values, depending on whether the expected or observed Hessian matrix is desired, respectively.
proc glimmix;
   class group age;
   model cost = group age / s error=gamma link=pow(0.5);
   output out=gmxout pred=pred;
   id _variance_ wei woi;
   vpmu  = 2*_mu_;                      /* derivative of the gamma variance function v(mu) = mu*mu */
   if (_mu_ > 1.0e-8) then do;
      gpmu  =  0.5  * (_mu_**(-0.5));   /* first derivative of the power(0.5) link  */
      gppmu = -0.25 * (_mu_**(-1.5));   /* second derivative of the power(0.5) link */
      wei = 1/(_phi_*_variance_*gpmu*gpmu);   /* expected Hessian weight */
      woi = wei + (cost-_mu_) *
            (_variance_*gppmu + vpmu*gpmu) /
            (_variance_*_variance_*gpmu*gpmu*gpmu*_phi_);   /* observed Hessian weight */
   end;
run;
The variables _VARIANCE_, _MU_, and other symbols are predefined by PROC
GLIMMIX and can be used in programming statements. For rules and restrictions,
see the “Programming Statements” section on page 102.
LSMEANS Statement
LSMEANS fixed-effects < / options > ;
The LSMEANS statement computes least-squares means (LS-means) of fixed effects.
As in the GLM and the MIXED procedures, LS-means are predicted population mar-
gins—that is, they estimate the marginal means over a balanced population. In a
sense, LS-means are to unbalanced designs as class and subclass arithmetic means
are to balanced designs. The L matrix constructed to compute them is the same as
the L matrix formed in PROC GLM; however, the standard errors are adjusted for the
covariance parameters in the model. Least-squares means computations are currently
not supported for multinomial models.
Each LS-mean is computed as Lβ̂, where L is the coefficient matrix associated with the least-squares mean and β̂ is the estimate of the fixed-effects parameter vector. The approximate standard error for the LS-mean is computed as the square root of L Var[β̂] L′, where Var[β̂] denotes the approximate variance matrix of the fixed-effects estimates; this matrix depends on the estimation method.
LS-means are constructed on the linked scale, that is, the scale on which the model
effects are additive. For example, in a binomial model with logit link, the least-
squares means are predicted population margins of the logits.
LS-means can be computed for any effect in the MODEL statement that involves
only CLASS variables. You can specify multiple effects in one LSMEANS statement
or in multiple LSMEANS statements, and all LSMEANS statements must appear
after the MODEL statement. As in the ESTIMATE statement, the L matrix is tested
for estimability, and if this test fails, PROC GLIMMIX displays “Non-est” for the
LS-means entries.
Assuming the LS-mean is estimable, PROC GLIMMIX constructs an approximate t
test to test the null hypothesis that the associated population quantity equals zero. By
default, the denominator degrees of freedom for this test are the same as those dis-
played for the effect in the “Tests of Fixed Effects” table. If the DDFM=SATTERTH
or DDFM=KENWARDROGER option is specified in the MODEL statement, PROC
GLIMMIX determines degrees of freedom separately for each test, unless the DDF=
option overrides it for a particular effect. See the DDFM= option on page 62 for more
information.
You can specify the following options in the LSMEANS statement after a slash (/).
ADJDFE=SOURCE
ADJDFE=ROW
specifies how denominator degrees of freedom are determined when p-values and
confidence limits are adjusted for multiple comparisons with the ADJUST= op-
tion. When you do not specify the ADJDFE= option, or when you specify
ADJDFE=SOURCE, the denominator degrees of freedom for multiplicity-adjusted
results are the denominator degrees of freedom for the LS-mean effect in the “Type
III Tests of Fixed Effects” table. When you specify ADJDFE=ROW, the denomina-
tor degrees of freedom for multiplicity-adjusted results correspond to the degrees of
freedom displayed in the DF column of the “Differences of Least Squares Means”
table.
The ADJDFE=ROW setting is particularly useful if you want multiplicity adjust-
ments to take into account that denominator degrees of freedom are not con-
stant across LS-mean differences. This can be the case, for example, when the
DDFM=SATTERTH or DDFM=KENWARDROGER degrees-of-freedom methods
are in effect.
In one-way models with heterogeneous variance, combining certain ADJUST= op-
tions with the ADJDFE=ROW option corresponds to particular methods of perform-
ing multiplicity adjustments in the presence of heteroscedasticity. For example, the
statements
proc glimmix;
class A;
model y = A;
random _residual_ / group=A subject=A;
lsmeans A / adjust=smm adjdfe=row;
run;
fit a one-way model with heterogeneous group variances and adjust the LS-mean comparisons with ADJUST=SMM, using row-specific denominator degrees of freedom. ADJUST=TUKEY gives the exact results for the case of fractional degrees of free-
dom in the one-way model, but it does not take into account that the degrees of free-
dom are subject to variability. A more conservative method, such as ADJUST=SMM,
may protect the overall error rate better.
Unless the ADJUST= option of the LSMEANS statement is specified, the ADJDFE=
option has no effect.
ADJUST=BON
ADJUST=DUNNETT
ADJUST=NELSON
ADJUST=SCHEFFE
ADJUST=SIDAK
ADJUST=SIMULATE<(simoptions)>
ADJUST=SMM | GT2
ADJUST=TUKEY
requests a multiple comparison adjustment for the p-values and confidence limits
for the differences of LS-means. The adjusted quantities are produced in addition
to the unadjusted quantities. By default, PROC GLIMMIX performs all pairwise
differences unless you specify ADJUST=DUNNETT, in which case the procedure
analyzes all differences with a control level. The ADJUST= option implies the DIFF
option (see page 46), unless the SLICEDIFF= option is specified.
The BON (Bonferroni) and SIDAK adjustments involve correction factors de-
scribed in Chapter 32, “The GLM Procedure,” and Chapter 48, “The MULTTEST
Procedure,” of the SAS/STAT User’s Guide; also refer to Westfall and Young (1993)
and Westfall et al. (1999). When you specify ADJUST=TUKEY and your data are
unbalanced, PROC GLIMMIX uses the approximation described in Kramer (1956)
and identifies the adjustment as “Tukey-Kramer” in the results. Similarly, when you
specify ADJUST=DUNNETT or ADJUST=NELSON and the LS-means are corre-
lated, the GLIMMIX procedure uses the factor-analytic covariance approximation
described in Hsu (1992) and identifies the adjustment in the results as “Dunnett-Hsu”
or “Nelson-Hsu”, respectively. The approximation derives approximate “effective sample sizes” for which exact critical values are computed. Note that comput-
ing the exact adjusted p-values and critical values for unbalanced designs can be
computationally intensive, in particular for ADJUST=NELSON. A simulation-based
approach, as specified by the ADJUST=SIM option, while nondeterministic, may
provide inferences that are sufficiently accurate in much less time. The preceding
references also describe the SCHEFFE and SMM adjustments.
Nelson’s adjustment applies only to the analysis of means (Ott, 1967; Nelson, 1982,
1991, 1993), where LS-means are compared against an average LS-mean. It does not
apply to all pairwise differences of least squares means, or to slice differences that
you specify with the SLICEDIFF= option. See the DIFF=ANOM option for more
details regarding the analysis of means with the GLIMMIX procedure.
The SIMULATE adjustment computes adjusted p-values and confidence limits from
the simulated distribution of the maximum or maximum absolute value of a multivari-
ate t random vector. All covariance parameters, except the residual scale parameter,
are fixed at their estimated values throughout the simulation, potentially resulting in
some underdispersion. The simulation estimates q, the true (1 − α)th quantile, where
1 − α is the confidence coefficient. The default α is 0.05, and you can change this
value with the ALPHA= option in the LSMEANS statement.
The number of samples is set so that the tail area for the simulated q is within γ of
1 − α with 100(1 − ε)% confidence. In equation form,
   P( |F(q̂) − (1 − α)| ≤ γ ) = 1 − ε
where q̂ is the simulated q and F is the true distribution function of the maximum; refer to Edwards and Berry (1987) for details. By default, γ = 0.005 and ε = 0.01, placing the tail area of q̂ within 0.005 of 0.95 with 99% confidence. The ACC= and EPS= simoptions reset γ and ε, respectively, the NSAMP= simoption sets the
sample size directly, and the SEED= simoption specifies an integer used to start the
pseudo-random number generator for the simulation. If you do not specify a seed,
or specify a value less than or equal to zero, the seed is generated from reading the
time of day from the computer clock. For additional descriptions of these and other
simulation options, see the “LSMEANS Statement” section on page 1979 in Chapter 32, “The GLM Procedure” (SAS/STAT User’s Guide).
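For example, a request along the following lines (the effect name A and the seed value are illustrative) tightens the accuracy radius and fixes the seed so that the simulated adjustments are reproducible:

lsmeans A / diff cl adjust=simulate(acc=0.002 seed=121211);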
If the STEPDOWN option is in effect and you choose ADJUST=BON or
ADJUST=SIMULATE, the p-values are further adjusted in a step-down fashion.
ALPHA=number
requests that a t type confidence interval be constructed for each of the LS-means
with confidence level 1 − number. The value of number must be between 0 and 1;
the default is 0.05.
AT variable = value
AT (variable-list) = (value-list)
AT MEANS
enables you to modify the values of the covariates used in computing LS-means. By
default, all covariate effects are set equal to their mean values for computation of
standard LS-means. The AT option enables you to assign arbitrary values to the co-
variates. Additional columns in the output table indicate the values of the covariates.
If there is an effect containing two or more covariates, the AT option sets the effect
equal to the product of the individual means rather than the mean of the product (as
with standard LS-means calculations). The AT MEANS option sets covariates equal
to their mean values (as with standard LS-means) and incorporates this adjustment to
crossproducts of covariates.
As an example, consider the following invocation of PROC GLIMMIX:
proc glimmix;
   class A;
   model Y = A X1 X2 X1*X2;
   lsmeans A;
   lsmeans A / at means;
   lsmeans A / at X1=1.2;
   lsmeans A / at (X1 X2)=(1.2 0.3);
run;
For the first two LSMEANS statements, the LS-means coefficient for X1 is x̄₁ (the mean of X1) and for X2 is x̄₂ (the mean of X2). However, for the first LSMEANS statement, the coefficient for X1*X2 is the mean of the product, while for the second LSMEANS statement the coefficient is the product of the means, x̄₁ · x̄₂. The third LSMEANS statement sets the coefficient for X1 equal to 1.2 and leaves it at x̄₂ for X2, and the final LSMEANS statement sets these values to 1.2 and 0.3, respectively.
Even if you specify a WEIGHT variable, the unweighted covariate means are used
for the covariate coefficients if there is no AT specification. If you specify the AT
option, WEIGHT or FREQ variables are taken into account as follows. The weighted
covariate means are then used for the covariate coefficients for which no explicit AT
values are given, or if you specify AT MEANS. Observations that do not contribute
to the analysis because of a missing dependent variable are included in computing
the covariate means. You should use the E option in conjunction with the AT option
to check that the modified LS-means coefficients are the ones you desire.
The AT option is disabled if you specify the BYLEVEL option.
BYLEVEL
requests PROC GLIMMIX to compute separate margins for each level of the
LSMEANS effect.
The standard LS-means have equal coefficients across classification effects. The
BYLEVEL option changes these coefficients to be proportional to the observed mar-
gins. This adjustment is reasonable when you want your inferences to apply to a
population that is not necessarily balanced but has the margins observed in the in-
put data set. In this case, the resulting LS-means are actually equal to raw means
for fixed effects models and certain balanced random effects models, but their esti-
mated standard errors account for the covariance structure that you have specified.
If a WEIGHT statement is specified, PROC GLIMMIX uses weighted margins to
construct the LS-means coefficients.
If the AT option is specified, the BYLEVEL option disables it.
CL
requests that t type confidence limits be constructed for each of the LS-means. If
DDFM=NONE, then PROC GLIMMIX uses infinite degrees of freedom for this test,
essentially computing a z interval. The confidence level is 0.95 by default; this can
be changed with the ALPHA= option.
CORR
displays the estimated correlation matrix of the least-squares means as part of the
“Least Squares Means” table.
COV
displays the estimated covariance matrix of the least-squares means as part of the
“Least Squares Means” table.
DF=number
specifies the degrees of freedom for the t test and confidence limits. The default is
the denominator degrees of freedom taken from the “Type III Tests of Fixed Effects”
table corresponding to the LS-means effect.
DIFF<=difftype>
PDIFF<=difftype>
requests that differences of the LS-means be displayed. The optional difftype spec-
ifies which differences to produce, with possible values ALL, ANOM, CONTROL,
CONTROLL, and CONTROLU. The ALL value requests all pairwise differences,
and it is the default. The CONTROL difftype requests differences with a control,
which, by default, is the first level of each of the specified LSMEANS effects.
The ANOM value requests differences between each LS-mean and the average LS-
mean, as in the analysis of means (Ott, 1967). The average is computed as a weighted
mean of the LS-means, the weights being inversely proportional to the diagonal en-
tries of the (X′SX)⁻ matrix.
To specify which levels of the effects are the controls, list the quoted formatted values in parentheses after the keyword CONTROL.
For multiple effects, the results depend upon the order of the list, and so you should
check the output to make sure that the controls are correct.
Two-tailed tests and confidence limits are associated with the CONTROL difftype.
For one-tailed results, use either the CONTROLL or CONTROLU difftype. The
CONTROLL difftype tests whether the noncontrol levels are significantly smaller
than the control; the upper confidence limits for the control minus the noncontrol
levels are considered to be infinity and are displayed as missing. Conversely, the
CONTROLU difftype tests whether the noncontrol levels are significantly larger than
the control; the upper confidence limits for the noncontrol levels minus the control
are considered to be infinity and are displayed as missing.
If you want to perform multiple comparison adjustments on the differences of LS-
means, you must specify the ADJUST= option.
The differences of the LS-means are displayed in a table titled “Differences of Least
Squares Means.”
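As an illustration (the effect name A and the quoted control level are placeholders), the following statement requests differences with the control level '1' and Dunnett-adjusted p-values and confidence limits:

lsmeans A / diff=control('1') adjust=dunnett cl;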
E
requests that the L matrix coefficients for the LSMEANS effects be displayed.
ILINK
requests that estimates and their standard errors in the “Least Squares Means Table”
are also reported on the scale of the mean (the inverse linked scale). This enables
you to obtain estimates of predicted probabilities and their standard errors in logistic
models, for example. The option is specific to a LSMEANS statement. If you also
specify the CL option, the GLIMMIX procedure computes confidence intervals for
the predicted means.
LINES
presents results of comparisons between all pairs of least-squares means by listing
the means in descending order and indicating nonsignificant subsets by line segments
beside the corresponding LS-means. When all differences have the same variance,
these comparison lines are guaranteed to accurately reflect the inferences based on
the corresponding tests, made by comparing the respective p-values to the value of
the ALPHA= option (0.05 by default). However, equal variances are rarely the case
for differences between LS-means. If the variances are not all the same, then the
comparison lines may be conservative, in the sense that if you base your inferences
on the lines alone, you will detect fewer significant differences than the tests indicate.
If there are any such differences, PROC GLIMMIX lists the pairs of means that are
inferred to be significantly different by the tests but not by the comparison lines.
Note, however, that in many cases, even though the variances are unbalanced, they
are near enough so that the comparison lines accurately reflect the test inferences.
ODDSRATIO | OR
requests that LS-mean results are also reported in terms of the odds ratio. This option
is ignored unless you are using either the LOGIT, CUMLOGIT, or GLOGIT link. If
you specify the CL option, confidence intervals for the odds ratios are also computed.
These intervals are adjusted for multiplicity when you specify the ADJUST= option.
OBSMARGINS | OM
specifies a potentially different weighting scheme for the computation of LS-means
coefficients. The standard LS-means have equal coefficients across classification ef-
fects; however, the OM option changes these coefficients to be proportional to those
found in the input data set. This adjustment is reasonable when you want your in-
ferences to apply to a population that is not necessarily balanced but has the margins
observed in your data.
In computing the observed margins, PROC GLIMMIX uses all observations for
which there are no missing or invalid independent variables, including those for
which there are missing dependent variables. Also, if you use a WEIGHT state-
ment, PROC GLIMMIX computes weighted margins to construct the LS-means co-
efficients. If your data are balanced, the LS-means are unchanged by the OM option.
The BYLEVEL option modifies the observed-margins LS-means. Instead of comput-
ing the margins across all of the input data set, PROC GLIMMIX computes separate
margins for each level of the LSMEANS effect in question. In this case the result-
ing LS-means are actually equal to raw means for fixed effects models and certain
balanced random effects models, but their estimated standard errors account for the
covariance structure that you have specified.
You can use the E option in conjunction with either the OM or BYLEVEL option to
check that the modified LS-means coefficients are the ones you desire. It is possible
that the modified LS-means are not estimable when the standard ones are estimable,
or vice versa.
PDIFF
is the same as the DIFF option. See the description of the DIFF option on page 46.
PLOTS<=specific-plot<(specific-plot-options)>>
PLOTS<= ( specific-plot<(specific-plot-options)>
... specific-plot<(specific-plot-options)>)>
requests that least-squares means related graphics are produced via ODS Graphics,
provided that the ODS GRAPHICS statement has been specified and that the plot
request does not conflict with other options in the LSMEANS statement. For general
information about ODS graphics, see Chapter 15, “Statistical Graphics Using ODS”
(SAS/STAT User’s Guide). For examples of the basic statistical graphics produced by
the GLIMMIX procedure and aspects of their computation and interpretation, see the
section “Statistical Graphics for LS-Mean Comparisons” on page 155 in this chapter.
The specific-plot options (and their suboptions) of the LSMEANS statement are a
subset of those for the PLOTS= option of the PROC GLIMMIX statement. Options
specified in the LSMEANS statement override those on the PLOTS= option in the
PROC GLIMMIX statement. Currently, the GLIMMIX procedure supports plots of
pairwise differences and differences against a control level.
The specific-plot options and their suboptions are as follows.
ANOMPLOT requests analysis-of-means displays in which least-squares means are compared against an average LS-mean. These plots are produced only for statements with options that are compatible with ANOM differences. For example, statements analogous to those shown below for the CONTROLPLOT option, but with DIFF=ANOM in the first LSMEANS statement, produce analysis of mean plots for effects A and C. The DIFF option in the second LSMEANS statement implies all pairwise differences.
CONTROLPLOT requests a display in which least-squares means are visually com-
pared against a reference level. These plots are only produced
for statements with options that are compatible with control dif-
ferences. For example, the statements
lsmeans A / diff=control(’1’);
lsmeans B / diff;
lsmeans C ;
produce control plots for effects A and C. The DIFF option in the
second LSMEANS statement implies all pairwise differences.
DIFFPLOT <(ABS | NOABS)> requests a display of all pairwise least-squares
means differences and their significance. The display is also known
as a “mean-mean scatter plot” (Hsu 1996; Hsu and Peruggia 1994).
For each comparison a line segment, centered at the LS-means in
the pair, is drawn. The length of the segment corresponds to the
projected width of a confidence interval for the least-squares mean
difference. Segments that fail to cross the 45 degree reference line
correspond to significant least-squares mean differences. The ABS and NOABS suboptions of the DIFFPLOT option determine the positioning of the line segments in the plot. When the ABS suboption is in effect (the default), all line segments are shown on the same side of the reference line. The NOABS suboption separates comparisons according to the sign of the difference.
LS-mean difference plots are only produced for statements with
options that are compatible with the display. For example, the
statements
lsmeans A / diff=control(’1’);
lsmeans B / diff;
lsmeans C ;
request differences against a control level for the A effect, all pair-
wise differences for the B effect, and the least-squares means for
the C effect. The DIFF= type in the first statement is incompati-
ble with a display of all pairwise differences. Difference plots are
produced only for the B and C effects.
MEANPLOT <(meanplot-options)> requests displays of the least-squares means.
See below for a description of the meanplot-options.
NONE requests that no plots are produced.
When LS-mean comparisons are adjusted for multiplicity using the ADJUST= op-
tion, the plots are adjusted accordingly.
The following meanplot-options control the display of the least-squares means. For example, a MEANPLOT request for the LS-means of A*B*C that specifies PLOTBY=C (see the sketch following the PLOTBY= description below) produces for each level of C one plot of the A*B cell means that are associated with that level of C. In each plot, levels of A are displayed on the horizontal axis, and confidence bands are drawn around the means that share the same level of B.
The PLOTBY= option has no effect unless the LS-mean effect con-
tains at least three classification variables. The PLOTBY= effect
does not have to be an effect in the MODEL statement, but it must
consist entirely of classification variables.
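A sketch of such a request (the three-way effect A*B*C is illustrative):

lsmeans A*B*C / plots=meanplot(plotby=C);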
SINGULAR=number
tunes the estimability checking as documented in the “CONTRAST Statement” sec-
tion on page 32.
SLICE= fixed-effect
SLICE= (fixed-effects)
specifies effects by which to partition interaction LSMEANS effects. This can pro-
duce what are known as tests of simple effects (Winer 1971). For example, suppose
that A*B is significant, and you want to test the effect of A for each level of B. The
appropriate LSMEANS statement is
   lsmeans A*B / slice=B;
This statement tests for the simple main effects of A for B, which are calculated by
extracting the appropriate rows from the coefficient matrix for the A*B LS-means
and using them to form an F test.
The SLICE option produces F tests that test the simultaneous equality of cell means
at a fixed level of the slice-effect (Schabenberger, Gregoire, and Kong 2000). You
can request differences of the least-squares means while holding one or more factors
at a fixed level with the SLICEDIFF= option.
The SLICE option produces a table titled “Tests of Effect Slices.”
SLICEDIFF= fixed-effect
SLICEDIFF= (fixed-effects)
SIMPLEDIFF= fixed-effect
SIMPLEDIFF= (fixed-effects)
requests that differences of simple effects be constructed and tested against zero.
Whereas the SLICE option extracts multiple rows of the coefficient matrix and forms
an F-test, the SLICEDIFF option tests pairwise differences of these rows. This allows
you to perform multiple comparisons among the levels of one factor at a fixed level
of the other factor. For example, assume that, in a balanced design, factors A and B
have a = 4 and b = 3 levels, respectively. Consider the statements
proc glimmix;
class a b;
model y = a b a*b;
lsmeans a*b / slice=a;
lsmeans a*b / slicediff=a;
run;
The first LSMEANS statement produces four F tests, one per level of A. The first of
these tests is constructed by extracting the three rows corresponding to the first level
of A from the coefficient matrix for the A*B interaction. Call this matrix L_a1. The SLICE test is a test of the hypothesis H: L_a1 β = 0. In a balanced design, where µ_ij denotes the mean response if A is at level i and B is at level j, this hypothesis is equivalent to H: µ_11 = µ_12 = µ_13. The SLICEDIFF option considers the three rows of L_a1 in turn and performs tests of the difference between pairs of rows. How these
differences are constructed depends on the SLICEDIFFTYPE= option. By default,
all pairwise differences within the subset of L are considered. In the example, with
a = 4 and b = 3, the second LSMEANS statement produces four sets of least-squares
means differences. Within each set, factor A is held fixed at a particular level.
When the ADJUST= option is specified, the GLIMMIX procedure also adjusts the
tests for multiplicity. The adjustment is based on the number of comparisons within
each level of the SLICEDIFF= effect; see the SLICEDIFFTYPE= option. The Nelson
adjustment is not available for slice differences.
SLICEDIFFTYPE<=difftype>
SIMPLEDIFFTYPE<=difftype>
determines the type of simple effect differences produced with the SLICEDIFF= op-
tion.
The possible values for the difftype are ALL, CONTROL, CONTROLL, and
CONTROLU. The difftype ALL requests all simple effects differences, and it is the
default. The difftype CONTROL requests the differences with a control, which, by
default, is the first level of each of the specified LSMEANS effects.
To specify which levels of the effects are the controls, list the quoted formatted values
in parentheses after the keyword CONTROL. For example, if the effects A, B, and
C are class variables, each having three levels, 1, 2 and 3, the following LSMEANS
statement specifies the (1,3) level of A*B as the control:
   lsmeans A*B / slicediff=(A B) slicedifftype=control('1' '3');
This LSMEANS statement first produces simple effects differences holding the levels
of A fixed, and then it produces simple effects differences holding the levels of B
fixed. In the former case, level ’3’ of B serves as the control level. In the latter case,
level ’1’ of A serves as the control.
For multiple effects, the results depend upon the order of the list, and so you should
check the output to make sure that the controls are correct.
Two-tailed tests and confidence limits are associated with the CONTROL difftype.
For one-tailed results, use either the CONTROLL or CONTROLU difftype. The
CONTROLL difftype tests whether the noncontrol levels are significantly smaller
than the control; the upper confidence limits for the control minus the noncontrol
levels are considered to be infinity and are displayed as missing. Conversely, the
CONTROLU difftype tests whether the noncontrol levels are significantly larger than
the control; the upper confidence limits for the noncontrol levels minus the control
are considered to be infinity and are displayed as missing.
Experimental STEPDOWN <(step-down options)>
requests that multiple comparison adjustments for the p-values of LS-mean differ-
ences be further adjusted in a step-down fashion. Step-down methods increase the
power of multiple comparisons by taking advantage of the fact that a p-value will
never be declared significant unless all smaller p-values are also declared significant.
Note that the STEPDOWN adjustment is available only for ADJUST=BON and ADJUST=SIMULATE.
MAXTIME = n specifies the time (in seconds) to spend computing the maxi-
mal logically consistent sequential subsets of equality hypothe-
ses for TYPE=LOGICAL. The default is MAXTIME=60. If the
MAXTIME value is exceeded, the adjusted tests are not com-
puted. When this occurs, you can try increasing the MAXTIME
value. However, note that there are common multiple comparisons
problems for which this computation requires a huge amount of
time—for example, all pairwise comparisons between more than
ten groups. In such cases, try using TYPE=FREE (the default) or
TYPE=LOGICAL(n) for small n.
REPORT specifies that a report on the step-down adjustment should be dis-
played, including a listing of the sequential subsets (Westfall 1997)
and, for ADJUST=SIMULATE, the step-down simulation results.
TYPE=LOGICAL <(n)>
TYPE=FREE If you specify TYPE=LOGICAL, the step-down adjustments are
computed using maximal logically consistent sequential subsets of
equality hypotheses (Shaffer 1986, Westfall 1997). Alternatively,
for TYPE=FREE, sequential subsets are computed ignoring logical
constraints. The TYPE=FREE results are more conservative than
those for TYPE=LOGICAL, but they can be much more efficient
to produce for many comparisons. For example, it is not feasible to take logical constraints into account for all pairwise comparisons of more than ten groups. For this reason, TYPE=FREE is the default.
However, you can reduce the computational complexity of tak-
ing logical constraints into account by limiting the depth of the
search tree used to compute them, specifying the optional depth
parameter as a number n in parentheses after TYPE=LOGICAL.
As with TYPE=FREE, results for TYPE=LOGICAL(n) are con-
servative relative to the true TYPE=LOGICAL results, but even
for TYPE=LOGICAL(0) they can be appreciably less conservative
than TYPE=FREE and they are computationally feasible for much
larger numbers of comparisons.
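For example, a request of the following form (the effect name A is illustrative) combines simulation-based adjustment with logically constrained step-down testing and prints the step-down report:

lsmeans A / diff adjust=simulate stepdown(type=logical report);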
LSMESTIMATE Statement
LSMESTIMATE fixed-effect <’label’> values <divisor=n >
<, <’label’> values <divisor=n >> <, . . . >
< / options > ;
The LSMESTIMATE statement provides custom hypothesis tests among the least-squares means. You specify a matrix of coefficients K that is applied to the LS-means, so that the estimable functions being estimated and tested are of the form
   K L₁ β = L β
where L₁ is the coefficient matrix that defines the least-squares means and L = K L₁ applies to the fixed-effects parameters.
PROC GLIMMIX produces a t test for each row of coefficients specified in the
LSMESTIMATE statement. You can adjust p-values and confidence intervals for
multiplicity with the ADJUST= option. You can obtain an F test of single-row or
multi-row LSMESTIMATEs with the FTEST option.
Note that in contrast to a multi-row estimate in the ESTIMATE statement, you spec-
ify only a single fixed-effect in the LSMESTIMATE statement. The row labels are
optional and follow the effects specification. For example, the statements
proc glimmix;
class a b block ;
model y = a b a*b / s;
random int a / sub=block ;
lsmestimate A ’a1 vs avg(a3,a4)’ 2 0 -1 -1 divisor=2;
run;
fit a split-plot design and compare the average of the third and fourth LS-mean
of the whole-plot factor A to the first LS-mean of the factor.
The order in which coefficients are assigned to the least-squares means corresponds
to the order in which they are displayed in the “Least Squares Means” table. You can
use the ELSM option to see how coefficients are matched to levels of the fixed-effect.
The optional divisor=n specification allows you to assign a separate divisor to
each row of the LSMESTIMATE. You can also assign divisor values through the
DIVISOR= option. See the documentation for this option on the interaction between
the two ways of specifying divisors.
Many options of the LSMESTIMATE statement, for example the AT=, BYLEVEL, and OM options, affect the computation of the least-squares means. See the documentation for the LSMEANS statement for details.
You can specify the following options in the LSMESTIMATE statement after a slash
(/).
ADJDFE=SOURCE
ADJDFE=ROW
specifies how denominator degrees of freedom are determined when p-values and
confidence limits are adjusted for multiple comparisons with the ADJUST= op-
tion. When you do not specify the ADJDFE= option, or when you specify
ADJDFE=SOURCE, the denominator degrees of freedom for multiplicity-adjusted
results are the denominator degrees of freedom for the LS-mean effect in the “Type
III Tests of Fixed Effects” table.
The ADJDFE=ROW setting is useful if you want multiplicity adjustments to
take into account that denominator degrees of freedom are not constant across
estimates. This can be the case, for example, when DDFM=SATTERTH or
DDFM=KENWARDROGER are specified in the MODEL statement.
ADJUST=BON
ADJUST=SCHEFFE
ADJUST=SIDAK
ADJUST=SIMULATE<(simoptions)>
ADJUST=T
requests a multiple comparison adjustment for the p-values and confidence limits
for the LS-mean estimates. The adjusted quantities are produced in addition to the
unadjusted p-values and confidence limits. Adjusted confidence limits are produced
if the CL or ALPHA= options are in effect. For a description of the adjustments, see
Chapter 32, “The GLM Procedure,” and Chapter 48, “The MULTTEST Procedure,”
of the SAS/STAT User’s Guide as well as the documentation for the ADJUST= option
of the LSMEANS statement.
Note that not all adjustment methods of the LSMEANS statement are available for the
LSMESTIMATE statement. Multiplicity adjustments in the LSMEANS statement
are designed specifically for differences of least-squares means.
If you specify the STEPDOWN option and you choose ADJUST=BON or
ADJUST=SIMULATE, the p-values are further adjusted in a step-down fashion.
ALPHA=number
requests that a t type confidence interval be constructed for each of the LS-means
with confidence level 1 − number. The value of number must be between 0 and 1;
the default is 0.05.
AT variable = value
AT (variable-list) = (value-list)
AT MEANS
enables you to modify the values of the covariates used in computing LS-means. See
the AT= option of the LSMEANS statement for details.
BYLEVEL
requests PROC GLIMMIX to compute separate margins for each level of the
LSMEANS effect.
The standard LS-means have equal coefficients across classification effects. The
BYLEVEL option changes these coefficients to be proportional to the observed mar-
gins. This adjustment is reasonable when you want your inferences to apply to a
population that is not necessarily balanced but has the margins observed in the in-
put data set. In this case, the resulting LS-means are actually equal to raw means
for fixed effects models and certain balanced random effects models, but their esti-
mated standard errors account for the covariance structure that you have specified.
If a WEIGHT statement is specified, PROC GLIMMIX uses weighted margins to
construct the LS-means coefficients.
If the AT option is specified, the BYLEVEL option disables it.
CHISQ
requests that chi-square tests be performed in addition to F tests, when you request
an F test with the FTEST option.
CL
requests that t type confidence limits be constructed for each of the LS-means. If
DDFM=NONE, then PROC GLIMMIX uses infinite degrees of freedom for this test,
essentially computing a z interval. The confidence level is 0.95 by default; this can
be changed with the ALPHA= option.
CORR
displays the estimated correlation matrix of the linear combination of the least-
squares means.
COV
displays the estimated covariance matrix of the linear combination of the least-
squares means.
DF=number
specifies the degrees of freedom for the t test and confidence limits. The default is
the denominator degrees of freedom taken from the “Type III Tests of Fixed Effects”
table corresponding to the LS-means effect.
DIVISOR=value-list
specifies a list of values by which to divide the coefficients so that fractional coeffi-
cients can be entered as integer numerators. If you do not specify value-list a default
value of 1.0 is assumed. Missing values in the value-list are converted to 1.0.
If the number of elements in value-list exceeds the number of rows of the estimate,
the extra values are ignored. If the number of elements in value-list is less than the
number of rows of the estimate, the last value in value-list is carried forward.
If you specify a row-specific divisor as part of the specification of the estimate row,
this value multiplies the corresponding value in the value-list. For example, if the first row carries a row-specific divisor of 2 and the corresponding entry in the DIVISOR= list is 4, the coefficients in the first row are divided by 8; with the last list entry equal to 3 and carried forward, the coefficients in the third and fourth rows are divided by 3. Coefficients in the second row are not altered.
E
requests that the L coefficients of the estimable function are displayed. These are the
coefficients that apply to the fixed-effect parameter estimates.
ELSM
requests that the K matrix coefficients are displayed. These are the coefficients that
apply to the LS-means. This option is useful to ensure that you assigned the coeffi-
cients correctly to the LS-means.
FTEST <(LABEL=’label’)>
produces an F test that jointly tests the rows of the LSMESTIMATE against zero.
You can specify the optional label to identify the results from that test in the
“LSMContrasts” table.
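For example (the effect name, row labels, and coefficients are illustrative), the following statement tests two rows of LS-mean coefficients jointly:

lsmestimate A 'a1 vs a2' 1 -1  0,
              'a1 vs a3' 1  0 -1 / ftest(label='joint test');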
ILINK
requests that estimates and their standard errors are also reported on the scale of the
mean (the inverse linked scale). If you also specify the CL or the ALPHA= option,
the GLIMMIX procedure computes confidence intervals for the predicted means.
LOWER
LOWERTAILED
requests that the p-value for the t test be based only on values less than the test statis-
tic. A two-tailed test is the default. A lower-tailed confidence limit is also produced
if you specify the CL or the ALPHA= option.
OBSMARGINS | OM
specifies a potentially different weighting scheme for the computation of LS-means
coefficients. The standard LS-means have equal coefficients across classification ef-
fects; however, the OM option changes these coefficients to be proportional to those
found in the input data set. See the OBSMARGINS option of the LSMEANS state-
ment for further details.
ODDSRATIO | OR
requests that the estimate is also reported in terms of the odds ratio. This option
is ignored unless you are using the LOGIT, CUMLOGIT, or GLOGIT link. If you
specify the CL or the ALPHA= option, confidence intervals for the odds ratios are
also computed. These intervals are adjusted for multiplicity when you specify the
ADJUST= option.
SINGULAR=number
tunes the estimability checking as documented in the “CONTRAST Statement” sec-
tion on page 32.
STEPDOWN <(step-down options)> Experimental
requests that multiplicity adjustments for the p-values of LS-mean estimates be fur-
ther adjusted in a step-down fashion. Step-down methods increase the power of mul-
tiple testing procedures by taking advantage of the fact that a p-value will never be
declared significant unless all smaller p-values are also declared significant. Note that
the STEPDOWN adjustment is available only for ADJUST=BON and ADJUST=SIMULATE.
MAXTIME = n specifies the time (in seconds) to spend computing the maxi-
mal logically consistent sequential subsets of equality hypothe-
ses for TYPE=LOGICAL. The default is MAXTIME=60. If the
MAXTIME value is exceeded, the adjusted tests are not com-
puted. When this occurs, you can try increasing the MAXTIME
value. However, note that there are common multiple comparisons
problems for which this computation requires a huge amount of
time—for example, all pairwise comparisons between more than
ten groups. In such cases, try using TYPE=FREE (the default) or
TYPE=LOGICAL(n) for small n.
ORDER=PVALUE
ORDER=ROWS specifies the order in which the step-down tests are performed.
ORDER=PVALUE is the default, with LS-mean estimates being
declared significant only if all LS-mean estimates with smaller (un-
adjusted) p-values are significant. If you specify ORDER=ROWS,
then significances are evaluated in the order in which they are spec-
ified.
REPORT specifies that a report on the step-down adjustment should be dis-
played, including a listing of the sequential subsets (Westfall 1997)
and, for ADJUST=SIMULATE, the step-down simulation results.
TYPE=LOGICAL <(n)>
TYPE=FREE If you specify TYPE=LOGICAL, the step-down adjustments are
computed using maximal logically consistent sequential subsets of
equality hypotheses (Shaffer 1986, Westfall 1997). Alternatively,
for TYPE=FREE, sequential subsets are computed ignoring logical
constraints. The TYPE=FREE results are more conservative than
those for TYPE=LOGICAL, but they can be much more efficient
to produce for many estimates. For example, it is not feasible to take logical constraints into account for all pairwise comparisons of more than about ten groups. For this reason, TYPE=FREE is the default. However, you can reduce the computational complexity of taking logical constraints into account by limiting the depth of the search tree used to compute them, specifying the optional depth parameter as a number n in parentheses after TYPE=LOGICAL.
UPPER
UPPERTAILED
requests that the p-value for the t-test be based only on values greater than the test
statistic. A two-tailed test is the default. An upper-tailed confidence limit is also
produced if you specify the CL or the ALPHA= option.
MODEL Statement
MODEL response <(response options)> = <fixed-effects> < / options > ;
MODEL events/trials = <fixed-effects> < / options > ;
The MODEL statement is required and names the dependent variable and the fixed-
effects. The fixed-effects determine the X matrix of the model (see the “Notation
for the Generalized Linear Mixed Model” section on page 7 for details). The specifi-
cation of effects (see Chapter 34, SAS/STAT User’s Guide) is the same as in the GLM or
MIXED procedures. In contrast to PROC GLM, you do not specify random effects
in the MODEL statement. However, in contrast to PROC GLM and PROC MIXED,
continuous variables on the left-hand and right-hand side of the MODEL statement
can be computed through PROC GLIMMIX programming statements.
An intercept is included in the fixed-effects model by default. It can be removed with
the NOINT option.
The dependent variable can be specified using either the response syntax or the
events/trials syntax. The events/trials syntax is specific to models for binomial data.
A binomial(n,π) variable is the sum of n independent Bernoulli trials with event
probability π. Each Bernoulli trial results either in an event, or a non-event (with
probability 1 − π). You use the events/trials syntax to indicate to the GLIMMIX
procedure that the Bernoulli outcomes are grouped. The value of the second variable,
trials, gives the number n of Bernoulli trials. The value of the first variable, events,
is the number of events out of n. The values of both events and (trials-events) must
be nonnegative and the value of trials must be positive. Observations for which these
conditions are not met are excluded from the analysis. If the events/trials syntax is
used, the GLIMMIX procedure defaults to the binomial distribution. The response is
then the events variable. The trials variable is accounted for in model fitting as an addi-
tional weight. If you use the response syntax, the procedure defaults to the normal
distribution.
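A minimal sketch of the events/trials syntax (the variables r, n, and trt are illustrative names):

proc glimmix;
   class trt;
   model r/n = trt;   /* r events out of n Bernoulli trials; the binomial distribution is the default */
run;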
Response Variable Options
REFERENCE=’category’ | keyword
REF=’category’ | keyword
specifies the reference category for the generalized logit model and the binary re-
sponse model. For the generalized logit model, each nonreference category is con-
trasted with the reference category. For the binary response model, specifying one
response category as the reference is the same as specifying the other response cate-
gory as the event category. You can specify the value (formatted if a format is applied)
of the reference category in quotes or you can specify one of the following keywords.
The default is REF=LAST.
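As a sketch (the response y, the category value '0', and the effect trt are illustrative), a response option such as REF= is placed in parentheses after the response variable:

model y(ref='0') = trt / dist=binary;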
Model Options
ALPHA=number
requests that a t type confidence interval be constructed for each of the fixed-effects
parameters with confidence level 1 − number. The value of number must be between
0 and 1; the default is 0.05.
CHISQ
requests that chi-square tests be performed for all specified effects in addition to the
F tests. Type III tests are the default; you can produce the Type I and Type II tests by
using the HTYPE= option.
CL
requests that t type confidence limits be constructed for each of the fixed-effects pa-
rameter estimates. The confidence level is 0.95 by default; this can be changed with
the ALPHA= option.
CORRB
produces the correlation matrix of the approximate variance-covariance matrix of the
fixed-effects parameter estimates.
COVB
produces the approximate variance-covariance matrix of the fixed-effects parameter estimates β̂. In a generalized linear mixed model this matrix typically takes the form (X′V̂⁻¹X)⁻ and is obtained by sweeping the mixed model equations.
DDF=value-list
enables you to specify your own denominator degrees of freedom for the fixed effects. The values are assigned to the fixed effects in the order in which the effects appear in the MODEL statement, and a missing value retains the default degrees of freedom for the corresponding effect. For example, with effects A, B, and A*B in the MODEL statement, the specification DDF=3,.,4.7
assigns 3 denominator degrees of freedom to A and 4.7 to A*B, while those for B
remain the same. If you select a degrees-of-freedom method with the DDFM= op-
tion, nonmissing, positive values in value-list override the degrees of freedom for the
particular effect. For example, combining DDFM=SATTERTH with DDF=3,.,6 assigns 3 and 6 denominator degrees of freedom in the test of the A main effect and
the A*B interaction, respectively. The denominator degrees of freedom for the test
for the B effect are determined from a Satterthwaite approximation.
Note that the DDF= and DDFM= options determine the degrees of freedom in the
“Type I Tests of Fixed Effects,” “Type II Tests of Fixed Effects,” and “Type III Tests of
Fixed Effects” tables. These degrees of freedom are also used in determining the de-
grees of freedom in tests and confidence intervals from the CONTRAST, ESTIMATE,
and LSMEANS statements. Exceptions from this rule are noted in the documentation
for the respective statements.
DDFM=BETWITHIN | BW
DDFM=CONTAIN |CON
DDFM=KENWARDROGER | KENROG | KR
DDFM=NONE
DDFM=RESIDUAL | RES
DDFM=SATTERTH | SATTER | SAT
specifies the method for computing the denominator degrees of freedom for the
tests of fixed effects resulting from the MODEL, CONTRAST, ESTIMATE, and
LSMEANS statements.
The DDFM=BETWITHIN option divides the residual degrees of freedom into between-subject and within-subject portions. PROC GLIMMIX then determines whether a fixed effect changes within any subject. If so, it assigns within-subject degrees of freedom to the effect; otherwise, it assigns the between-subject degrees of freedom to the effect (refer to Schluchter and Elashoff 1990). If the GLIMMIX procedure does not process the data by subjects, the DDFM=BETWITHIN option has no effect; see the section “Processing by Subjects” on page 123 for details. If there are multiple within-subject effects containing classification variables, the within-subject degrees of freedom are partitioned into components corresponding to the subject-by-effect interactions.
One exception to the preceding method is the case when you model only R-side co-
variation with an unstructured covariance matrix (TYPE=UN option). In this case,
all fixed effects are assigned the between-subject degrees of freedom to provide
for better small-sample approximations to the relevant sampling distributions. The
DDFM=BETWITHIN method is the default for models with only R-side random ef-
fects and a SUBJECT= option.
The DDFM=CONTAIN option invokes the containment method to compute denom-
inator degrees of freedom, and this method is the default when the model contains
G-side random effects. The containment method is carried out as follows: Denote
the fixed effect in question A and search the RANDOM effect list for the effects that
syntactically contain A. For example, the RANDOM effect B(A) contains A, but the
RANDOM effect C does not, even if it has the same levels as B(A).
Among the RANDOM effects that contain A, compute their rank contribution to the
[X Z] matrix. The denominator degrees of freedom assigned to A is the smallest of
these rank contributions. If no effects are found, the denominator df for A is set equal
to the residual degrees of freedom, n−rank[X Z]. This choice of degrees of freedom
is the same as for the tests performed for balanced split-plot designs and should be
adequate for moderately unbalanced designs.
CAUTION: If you have a Z matrix with a large number of columns, the overall mem-
ory requirements and the computing time after convergence can be substantial for the
containment method. If it is too large, you may want to use the DDFM=RESIDUAL
or the DDFM=BETWITHIN option.
The DDFM=NONE option specifies that no denominator degrees of freedom be applied. PROC GLIMMIX then essentially assumes that infinite degrees of freedom are available in the calculation of p-values. The p-values for t tests are then identical to p-values derived from the standard normal distribution. Instead of t statistics, the procedure reports z statistics with p-values determined from the standard normal distribution. In the case of F tests, the p-values equal those of chi-square tests determined as follows: if F_obs is the observed value of the F test with l numerator degrees of freedom, then
   p = Pr( χ²_l > l × F_obs )
where χ²_l denotes a chi-square random variable with l degrees of freedom.
Regardless of the DDFM= method, you can obtain these chi-square p-values with the
CHISQ option of the MODEL statement.
The DDFM=RESIDUAL option performs all tests using the residual degrees of free-
dom, n − rank(X), where n is the sum of the frequencies used. It is the default
degrees of freedom method for GLMs and overdispersed GLMs.
The DDFM=KENWARDROGER option applies the (prediction) standard error and
degrees-of-freedom correction detailed by Kenward and Roger (1997). This approx-
imation involves inflating the estimated variance-covariance matrix of the fixed and
random effects by the method proposed by Prasad and Rao (1990) and Harville and
Jeske (1992); refer also to Kackar and Harville (1984). Satterthwaite-type degrees
of freedom are then computed based on this adjustment. By default, the observed
information matrix of the covariance parameter estimates is used in the calculations.
Table 2. (continued)
DIST=               Distribution     Default Link Function     Numeric Value
BYOBS(variable)     multivariate     varied                    NA
Note that the PROC GLIMMIX default link for the gamma or exponential distribu-
tions is not the canonical link (the reciprocal link).
The numeric value in the last column of Table 2 can be used in combination with
DIST=BYOBS. The BYOBS(variable) syntax designates a variable whose value
identifies the distribution to which an observation belongs. If the variable is numeric,
its values must match values in the last column of Table 2. If the variable is not
numeric, an observation’s distribution is identified by the first four characters of the
distribution’s name in the left-most column of the table. Distributions whose numeric
value is “NA” cannot be used with DIST=BYOBS.
If the variable in BYOBS(variable) is a data set variable, it can also be used in the
CLASS statement of the GLIMMIX procedure. For example, this provides a con-
venient method to model multivariate data jointly while varying fixed effects com-
ponents across outcomes. Assume that, for example, for each patient, a count and a
continuous outcome were observed; the count data are modeled as Poisson data and
the continuous data as gamma variates. The statements
fit a Poisson and a gamma regression model simultaneously. The two models have
separate intercepts and treatment effects. To correlate the outcomes, you can share a
random effect between the observations from the same patient:
For the lognormal distribution, the GLIMMIX procedure models the mean and variance on the data scale as
   E[Y]   = exp{µ} √ω
   Var[Y] = exp{2µ} ω(ω − 1)
   ω      = exp{σ²}
where µ and σ² are the mean and variance on the logarithmic scale.
The DIST=TCENTRAL option models the data as a shifted and scaled central t variable. This enables you to model data with heavy-tailed distributions. If Y denotes the response and X has a tν distribution with ν degrees of freedom, then PROC GLIMMIX models
   Y = µ + φ √((ν − 2)/ν) X
By default, ν = 3. You can supply different degrees of freedom for the t variate as in
the following statements
proc glimmix;
class a b;
model y = b x b*x / dist=tcentral(9.6);
random a;
run;
The GLIMMIX procedure does not accept values for the degrees of freedom pa-
rameter less than 3.0. If the t distribution is used with the DIST=BYOBS(variable)
specification, the degrees of freedom are fixed at ν = 3. For mixed models where pa-
rameters are estimated based on linearization, choosing DIST=TCENTRAL instead
of DIST=NORMAL affects only the residual variance, which increases by the factor
ν/(ν − 2).
The DIST=BETA option implements the parameterization of the beta distribution in
Ferrari and Cribari-Neto (2004). If Y has a beta(α, β) density, so that E[Y ] = µ =
α/(α + β), this parameterization uses the variance function a(µ) = µ(1 − µ) and
var[Y ] = a(µ)/(1 + φ).
See the section “Maximum Likelihood” beginning on page 108 for the log likelihoods
of the distributions fitted by the GLIMMIX procedure.
E
requests that Type I, Type II, and Type III L matrix coefficients be displayed for all
specified effects.
E1 | EI
requests that Type I L matrix coefficients be displayed for all specified effects.
E2 | EII
requests that Type II L matrix coefficients be displayed for all specified effects.
E3 | EIII
requests that Type III L matrix coefficients be displayed for all specified effects.
HTYPE=value-list
indicates the type of hypothesis test to perform on the fixed effects. Valid entries for
values in the value-list are 1, 2, and 3, corresponding to Type I, Type II, and Type III
tests. The default value is 3. You can specify several types by separating the values
with a comma or a space. The ODS table names are “Tests1”, “Tests2”, and “Tests3”
for the Type I, Type II, and Type III tests, respectively.
INTERCEPT
adds a row to the tables for Type I, II, and III tests corresponding to the overall
intercept.
LINK = keyword
specifies the link function in the generalized linear mixed model. The keywords and
their associated built-in link functions are shown in Table 3.
Table 3. Built-in Link Functions of the GLIMMIX Procedure

LINK=                      Link Function                       g(µ) = η =                       Numeric Value
CUMCLL | CCLL              cumulative complementary log-log    log(−log(1 − π))                 NA
CUMLOGIT | CLOGIT          cumulative logit                    log(π/(1 − π))                   NA
CUMLOGLOG                  cumulative log-log                  −log(−log(π))                    NA
CUMPROBIT | CPROBIT        cumulative probit                   Φ⁻¹(π)                           NA
CLOGLOG | CLL              complementary log-log               log(−log(1 − µ))                 5
GLOGIT | GENLOGIT          generalized logit                                                    NA
IDENTITY | ID              identity                            µ                                1
LOG                        log                                 log(µ)                           4
LOGIT                      logit                               log(µ/(1 − µ))                   2
LOGLOG                     log-log                             −log(−log(µ))                    6
PROBIT                     probit                              Φ⁻¹(µ)                           3
POWER(λ) | POW(λ)          power with exponent λ = number      µ^λ if λ ≠ 0; log(µ) if λ = 0    NA
POWERMINUS2                power with exponent −2              1/µ²                             8
RECIPROCAL | INVERSE       reciprocal                          1/µ                              7
BYOBS(variable)            varied                              varied                           NA
For the probit and cumulative probit links, Φ−1 (·) denotes the quantile function of
the standard normal distribution. For the other cumulative links, π denotes a cumu-
lative category probability. The cumulative and generalized logit link functions are
appropriate only for the multinomial distribution. When you choose a cumulative
link function, PROC GLIMMIX assumes that the data are ordinal. When you specify
LINK=GLOGIT, the GLIMMIX procedure assumes that the data are nominal (not
ordered).
The numeric value in the rightmost column of Table 3 can be used in conjunction with
LINK=BYOBS(variable). This syntax designates a variable whose values identify
the link function associated with an observation. If the variable is numeric, its values
must match those in the last column of Table 3. If the variable is not numeric, an
observation’s link function is determined by the first four characters of the link’s
name in the first column. Those link functions whose numeric value is “NA” cannot
be used with LINK=BYOBS(variable).
You can define your own link function through programming statements. See the
section “User-Defined Link or Variance Function” on page 104 for more information
on how to specify a link function. If a user-defined link function is in effect, the
specification in the LINK= option is ignored. If you specify neither the LINK= option
nor a user-defined link function, then the default link function is chosen according to
Table 2.
LWEIGHT=FIRSTORDER | FIRO
LWEIGHT=NONE
LWEIGHT=VAR
determines how weights are used in constructing the coefficients for Type I through
Type III L matrices. The default is LWEIGHT=VAR, and the values of the
WEIGHT variable are used in forming cross-product matrices. If you specify
LWEIGHT=FIRO, the weights incorporate the WEIGHT variable as well as the first-
order weights of the linearized model. For LWEIGHT=NONE, the L matrix coef-
ficients are based on the raw cross-product matrix, whether a WEIGHT variable is
specified or not.
NOCENTER
requests that the columns of the X matrix are not centered and scaled. By default, the
columns of X are centered and scaled. Unless the NOCENTER option is in effect, X
is replaced by X∗ during estimation. The columns of X∗ are computed as follows:
• In models with an intercept, the intercept column remains the same and the jth entry in row i of X* is

  x*_ij = (x_ij − x̄_j) / √( Σ_{i=1}^n (x_ij − x̄_j)² )

• In models without intercept, no centering takes place and the jth entry in row i of X* is

  x*_ij = x_ij / √( Σ_{i=1}^n (x_ij − x̄_j)² )
The effects of centering and scaling are removed when results are reported. For ex-
ample, if the covariance matrix of the fixed effects is printed with the COVB option
of the MODEL statement, the covariances are reported in terms of the original pa-
rameters, not the centered and scaled versions. If you specify the STDCOEF option,
fixed effects parameter estimates and their standard errors are reported in terms of the
standardized (scaled and/or centered) coefficients in addition to the usual results in
noncentered form.
NOINT
requests that no intercept be included in the fixed-effects model. An intercept is
included by default.
ODDSRATIO | OR
requests odds ratios and their confidence limits for the parameter estimates. To
compute odds ratios for general estimable functions and least-square means, see the
ODDSRATIO options of the ESTIMATE and LSMEANS statements.
Odds ratio results are reported for the following link functions: LINK=LOGIT,
LINK=CUMLOGIT, and LINK=GLOGIT.
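A minimal sketch of requesting odds ratios for fixed-effects estimates in a logistic model (the factor trt and the covariate age are illustrative):

proc glimmix;
   class trt;
   model y = trt age / dist=binary oddsratio;
run;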
OFFSET=variable
specifies a variable to be used as an offset for the linear predictor. An offset plays
the role of a fixed effect whose coefficient is known to be 1. You can use an offset in
a Poisson model, for example, when counts have been obtained in time intervals of
different lengths. With a log link function, you can model the counts as Poisson vari-
ables with the logarithm of the time interval as the offset variable. The offset variable
cannot appear in the CLASS statement or elsewhere in the MODEL or RANDOM
statements.
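A brief sketch of this use (the variable names are illustrative), with logtime holding the logarithm of the interval length:

proc glimmix;
   model count = x / dist=poisson link=log offset=logtime;
run;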
REFLINP=r
specifies a value for the linear predictor of the reference level in the generalized logit
model for nominal data. By default r = 0.
SOLUTION | S
requests that a solution for the fixed-effects parameters be produced. Using notation from the “Notation for the Generalized Linear Mixed Model” section beginning on page 7, the solution comprises the estimates of the fixed-effects parameters β.
Along with the estimates and their approximate standard errors, a t statistic is
computed as the estimate divided by its standard error. The degrees of freedom
for this t statistic matches the one appearing in the “Tests of Fixed Effects” ta-
ble under the effect containing the parameter. If DDFM=KENWARDROGER or
DDFM=SATTERTH, the degrees of freedom are computed separately for each
fixed-effect estimate, unless you override the value for any specific effect with the
DDF=value-list option. The “Pr > |t|” column contains the two-tailed p-value cor-
responding to the t statistic and associated degrees of freedom. You can use the CL
option to request confidence intervals for the fixed-effects parameters; they are con-
structed around the estimate by using a radius of the standard error times a percentage
point from the t distribution.
STDCOEF
reports solutions for fixed effects in terms of the standardized (scaled and/or centered)
coefficients. This option has no effect when the NOCENTER option is specified or
in models for multinomial data.
ZETA=number
tunes the sensitivity in forming Type III functions. Any element in the estimable
function basis with an absolute value less than number is set to 0. The default is
1E−8.
NLOPTIONS Statement
NLOPTIONS < options >;
Most models fit with the GLIMMIX procedure have one or more nonlinear parame-
ters. Estimation requires nonlinear optimization methods. You can control the opti-
mization through options of the NLOPTIONS statement.
Several estimation methods of the GLIMMIX procedure (METHOD=RSPL, MSPL,
RMPL, MMPL) are doubly iterative in the following sense. The generalized linear
mixed model is approximated by a linear mixed model based on current values of the
covariance parameter estimates. The resulting linear mixed model is then fit, which
is itself an iterative process (with some exceptions). On convergence, new covari-
ance parameters and fixed effects estimates are obtained and the approximated linear
mixed model is updated. Its parameters are again estimated iteratively. It is thus rea-
sonable to refer to outer and inner iterations. The outer iterations involve the repeated
updates of the linear mixed models, and the inner iterations are the iterative steps that
lead to parameter estimates in any given linear mixed model. The NLOPTIONS
statement controls the inner iterations. The outer iteration behavior can be controlled
with options of the PROC GLIMMIX statement; for example, MAXLMMUPDATE,
PCONV=, ABSPCONV=. If the estimation method involves a singly iterative approach, then there is no need for the outer cycling and the model is fit in a single optimization.
The following table summarizes some of the optimization specifications available in the NLOPTIONS statement.

Option           Description
Optimization Specifications
HESCAL=          type of Hessian scaling
INHESSIAN=       start for approximated Hessian
LINESEARCH=      line-search method
LSPRECISION=     line-search precision
RESTART=         iteration number for update restart
TECHNIQUE=       minimization technique
UPDATE=          update technique
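A hedged sketch of how the two levels of control fit together (the model and the option values are illustrative): the MAXLMMUPDATE= option of the PROC GLIMMIX statement caps the outer iterations, while the NLOPTIONS statement governs each inner optimization.

proc glimmix maxlmmupdate=30;
   class block trt;
   model y/n = trt;
   random int / subject=block;
   nloptions technique=newrap gconv=1e-9 maxiter=100;
run;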
ABSCONV=r
ABSTOL=r
specifies an absolute function convergence criterion. For minimization, termination requires f(ψ^(k)) ≤ r. The default value of r is the negative square root of the largest double precision value, which serves only as a protection against overflows.
ABSFCONV=r < [n] >
ABSFTOL=r < [n] >
specifies an absolute function convergence criterion. For all techniques except
NMSIMP, termination requires a small change of the function value in successive
iterations:
|f(ψ^(k−1)) − f(ψ^(k))| ≤ r

The same formula is used for the NMSIMP technique, but ψ^(k) is defined as the vertex with the lowest function value, and ψ^(k−1) is defined as the vertex with the highest function value in the simplex. The default value is r = 0. The optional
integer value n specifies the number of successive iterations for which the criterion
must be satisfied before the process can be terminated.
ABSGCONV=r < [n] >
ABSGTOL=r < [n] >
specifies an absolute gradient convergence criterion. Termination requires the maximum absolute gradient element to be small. This criterion is not used by the NMSIMP technique. The default value is r = 1E−5.
The optional integer value n specifies the number of successive iterations for which
the criterion must be satisfied before the process can be terminated.
ABSXCONV=r < [n] >
ABSXTOL=r < [n] >
specifies an absolute parameter convergence criterion. For all techniques except
NMSIMP, termination requires a small Euclidean distance between successive pa-
rameter vectors,
‖ψ^(k) − ψ^(k−1)‖₂ ≤ r

For the NMSIMP technique, termination requires either a small length α^(k) of the vertices of a restart simplex,

α^(k) ≤ r

or a small simplex size,

δ^(k) ≤ r

where the simplex size δ^(k) is defined as the L1 distance from the simplex vertex ξ^(k) with the smallest function value to the other p simplex points ψ_l^(k) ≠ ξ^(k):

δ^(k) = Σ_{ψ_l ≠ ξ} ‖ψ_l^(k) − ξ^(k)‖₁
FCONV=r < [n] >
FTOL=r < [n] >
specifies a relative function convergence criterion. The default is r = 10^(−FDIGITS), where FDIGITS is the value of the FDIGITS= option of the PROC GLIMMIX statement. The optional integer value n specifies the number of successive iterations for which the criterion must be satisfied before the process can terminate.
FSIZE=r
specifies the FSIZE parameter of the relative function and relative gradient termina-
tion criteria. The default value is r = 0. For more details, see the FCONV= and
GCONV= options.
GCONV=r < [n] >
GTOL=r < [n] >
specifies a relative gradient convergence criterion. For all techniques except CONGRA and NMSIMP, termination requires that the predicted function reduction, normalized by max(|f(ψ^(k))|, FSIZE), is small, where FSIZE is defined by the FSIZE= option. For the CONGRA technique (where a reliable Hessian estimate H is not available), a corresponding criterion based on gradient norms is used. This criterion is not used by the NMSIMP technique. The default value is r = 1E−8.
The optional integer value n specifies the number of successive iterations for which
the criterion must be satisfied before the process can terminate.
HESCAL=0|1|2|3
HS=0|1|2|3
specifies the scaling version of the Hessian matrix used in NRRIDG, TRUREG, NEWRAP, or DBLDOG optimization. If HS is not equal to 0, the first iteration and each restart iteration set the diagonal scaling matrix D^(0) = diag(d_i^(0)):

d_i^(0) = √( max(|H_{i,i}^(0)|, ε) )

where H_{i,i}^(0) are the diagonal elements of the Hessian. In every other iteration, the diagonal scaling matrix D^(0) = diag(d_i^(0)) is updated depending on the HS option:

HS=2    specifies the Dennis, Gay, and Welsch (1981) scaling update:

        d_i^(k+1) = max( 0.6 d_i^(k), √( max(|H_{i,i}^(k)|, ε) ) )

In each scaling update, ε is the relative machine precision. The default value is HS=0. Scaling of the Hessian can be time consuming in the case where general linear constraints are active.
INHESSIAN[= r ]
INHESS[= r ]
specifies how the initial estimate of the approximate Hessian is defined for the quasi-
Newton techniques QUANEW and DBLDOG. There are two alternatives:
• If you do not use the r specification, the initial estimate of the approximate
Hessian is set to the Hessian at ψ (0) .
• If you do use the r specification, the initial estimate of the approximate Hessian
is set to the multiple of the identity matrix rI.
By default, if you do not specify the option INHESSIAN=r, the initial estimate of the
approximate Hessian is set to the multiple of the identity matrix rI, where the scalar
r is computed from the magnitude of the initial gradient.
INSTEP=r
reduces the length of the first trial step during the line search of the first iterations.
For highly nonlinear objective functions, such as the EXP function, the default ini-
tial radius of the trust-region algorithm TRUREG or DBLDOG or the default step
length of the line-search algorithms can result in arithmetic overflows. If this oc-
curs, you should specify decreasing values of 0 < r < 1 such as INSTEP=1E − 1,
INSTEP=1E − 2, INSTEP=1E − 4, and so on, until the iteration starts successfully.
LINESEARCH=i
LIS=i
specifies the line-search method for the CONGRA, QUANEW, and NEWRAP opti-
mization techniques. Refer to Fletcher (1987) for an introduction to line-search tech-
niques. The value of i can be 1, . . . , 8. For CONGRA, QUANEW, and NEWRAP,
the default value is i = 2.
LIS=1 specifies a line-search method that needs the same number of func-
tion and gradient calls for cubic interpolation and cubic extrapola-
tion; this method is similar to one used by the Harwell subroutine
library.
LIS=2 specifies a line-search method that needs more function than gra-
dient calls for quadratic and cubic interpolation and cubic ex-
trapolation; this method is implemented as shown in Fletcher
(1987) and can be modified to an exact line search by using the
LSPRECISION= option.
LIS=3 specifies a line-search method that needs the same number of
function and gradient calls for cubic interpolation and cubic ex-
trapolation; this method is implemented as shown in Fletcher
(1987) and can be modified to an exact line search by using the
LSPRECISION= option.
LIS=4 specifies a line-search method that needs the same number of func-
tion and gradient calls for stepwise extrapolation and cubic inter-
polation.
LIS=5 specifies a line-search method that is a modified version of LIS=4.
LIS=6 specifies golden section line search (Polak 1971), which uses only
function values for linear approximation.
LIS=7 specifies bisection line search (Polak 1971), which uses only func-
tion values for linear approximation.
LIS=8 specifies the Armijo line-search technique (Polak 1971), which
uses only function values for linear approximation.
LSPRECISION=r
LSP=r
specifies the degree of accuracy that should be obtained by the line-search algorithms
LIS=2 and LIS=3. Usually an imprecise line search is inexpensive and successful.
For more difficult optimization problems, a more precise and expensive line search
may be necessary (Fletcher 1987). The second line-search method (which is the
default for the NEWRAP, QUANEW, and CONGRA techniques) and the third line-
search method approach exact line search for small LSPRECISION= values. If you
have numerical problems, you should try to decrease the LSPRECISION= value to
obtain a more precise line search. The default values are shown in Table 5.
Table 5. Default Values for Linesearch Precision
TECH= UPDATE= LSP default
QUANEW DBFGS, BFGS r = 0.4
QUANEW DDFP, DFP r = 0.06
CONGRA all r = 0.1
NEWRAP no update r = 0.9
MAXFUNC=i
MAXFU=i
specifies the maximum number i of function calls in the optimization process. Note that the optimization can terminate only after completing a full iteration. Therefore, the number of function calls that is actually performed can exceed the number that is specified by the MAXFUNC= option.
MAXITER=i
MAXIT=i
specifies the maximum number i of iterations in the optimization process. The default
values are
• TRUREG, NRRIDG, NEWRAP: 50
• QUANEW, DBLDOG: 200
• CONGRA: 400
• NMSIMP: 1000
These default values are also valid when i is specified as a missing value.
MAXSTEP=r[n]
specifies an upper bound for the step length of the line-search algorithms during the
first n iterations. By default, r is the largest double precision value and n is the largest
integer available. Setting this option can improve the speed of convergence for the
CONGRA, QUANEW, and NEWRAP techniques.
MAXTIME=r
specifies an upper limit of r seconds of CPU time for the optimization process. The
default value is the largest floating point double representation of your computer.
Note that the time specified by the MAXTIME= option is checked only once at the
end of each iteration. Therefore, the actual running time can be much longer than
that specified by the MAXTIME= option. The actual running time includes the rest
of the time needed to finish the iteration and the time needed to generate the output
of the results.
MINITER=i
MINIT=i
specifies the minimum number of iterations. The default value is 0. If you request
more iterations than are actually needed for convergence to a stationary point, the
optimization algorithms can behave strangely. For example, the effect of rounding
errors can prevent the algorithm from continuing for the required number of itera-
tions.
RESTART=i > 0
REST=i > 0
specifies that the QUANEW or CONGRA algorithm is restarted with a steepest de-
scent/ascent search direction after, at most, i iterations. Default values are
• CONGRA, UPDATE=PB: restart is performed automatically; i is not used.
• CONGRA, UPDATE≠PB: i = min(10p, 80), where p is the number of parameters.
• QUANEW: i is the largest integer available.
SOCKET=fileref
specifies the fileref that contains the information needed for remote monitoring. See
the section “Remote Monitoring” on page 147 for more details.
TECHNIQUE=value
TECH=value
specifies the optimization technique. You can find additional information on choosing
an optimization technique in the section “Choosing an Optimization Algorithm” on
page 142. Valid values for the TECHNIQUE= option are
• CONGRA
performs a conjugate-gradient optimization, which can be more precisely spec-
ified with the UPDATE= option and modified with the LINESEARCH= option.
When you specify this option, UPDATE=PB by default.
• DBLDOG
performs a version of double dogleg optimization, which can be more pre-
cisely specified with the UPDATE= option. When you specify this option,
UPDATE=DBFGS by default.
• NMSIMP
performs a Nelder-Mead simplex optimization.
• NONE
does not perform any optimization. This option can be used, for example, to hold the parameters fixed at their starting values; compare the NOITER option of the PARMS statement.
The GLIMMIX procedure applies the default optimization technique shown in Table
6, depending on your model.
Table 6. Default Techniques
Model     Family Setting                    TECHNIQUE=
GLM       DIST=NORMAL, LINK=IDENTITY        NONE
UPDATE=method
UPD=method
specifies the update method for the quasi-Newton, double dogleg, or conjugate-
gradient optimization technique. Not every update method can be used with each
optimizer.
• BFGS
performs the original Broyden, Fletcher, Goldfarb, and Shanno (BFGS) update
of the inverse Hessian matrix.
• DBFGS
performs the dual BFGS update of the Cholesky factor of the Hessian matrix.
This is the default update method.
• DDFP
performs the dual Davidon, Fletcher, and Powell (DFP) update of the Cholesky
factor of the Hessian matrix.
• DFP
performs the original DFP update of the inverse Hessian matrix.
• PB
performs the automatic restart update method of Powell (1977) and Beale
(1972).
• FR
performs the Fletcher-Reeves update (Fletcher 1987).
• PR
performs the Polak-Ribiere update (Fletcher 1987).
• CD
performs a conjugate-descent update of Fletcher (1987).
XCONV=r[n]
XTOL=r[n]
specifies the relative parameter convergence criterion. For all techniques except
NMSIMP, termination requires a small relative parameter change in subsequent it-
erations:
max_j |ψ_j^(k) − ψ_j^(k−1)| / max(|ψ_j^(k)|, |ψ_j^(k−1)|, XSIZE) ≤ r

For the NMSIMP technique, the same formula is used, but ψ_j^(k) is defined as the vertex with the lowest function value and ψ_j^(k−1) is defined as the vertex with the highest function value in the simplex. The default value is r = 1E−8 for the
NMSIMP technique and r = 0 otherwise. The optional integer value n specifies the
number of successive iterations for which the criterion must be satisfied before the
process can be terminated.
XSIZE=r > 0
specifies the XSIZE parameter of the relative parameter termination criterion. The
default value is r = 0. For more details, see the XCONV= option.
OUTPUT Statement
OUTPUT <OUT=SAS-data-set>
<keyword<(keyword-options)><=name>> . . .
<keyword<(keyword-options)><=name>>< / options >;
The OUTPUT statement creates a data set that contains predicted values and residual
diagnostics, computed after fitting the model. By default, all variables in the original
data set are included in the output data set.
You can use the ID statement to select a subset of the variables from the input data set
as well as computed variables for adding to the output data set. If you reassign a data
set variable through programming statements, the value of the variable from the input
data set supersedes the recomputed value when observations are written to the output
data set. If you list the variable in the ID statement, however, PROC GLIMMIX
saves the current value of the variable after the programming statements have been
executed.
For example, suppose that data set Scores contains the variables score, machine,
and person. The following statements fit a model with fixed machine and random
person effects. The variable score divided by 100 is assumed to follow an inverse
Gaussian distribution. The (conditional) mean and residuals are saved to the data set
igausout. Because no ID statement is given, the variable score in the output data set
contains the values from the input data set.
proc glimmix;
class machine person;
score = score/100;
p = 4*_linp_;
model score = machine / dist=invgauss;
random int / sub=person;
output out=igausout pred=p resid=r;
run;
On the contrary, the following statements list explicitly which variables to save to
the OUTPUT data set. Because the variable score is listed in the ID statement, and
is (re-)assigned through programming statements, the values of score saved to the
OUTPUT data set are the input values divided by 100.
proc glimmix;
class machine person;
score = score / 100;
model score = machine / dist=invgauss;
random int / sub=person;
output out=igausout pred=p resid=r;
id machine score _xbeta_ _zgamma_;
run;
You can specify the following options in the OUTPUT statement before the slash (/).
BLUP uses the predictors of the random effects in computing the statistic.
ILINK computes the statistic on the scale of the data.
NOBLUP does not use the predictors of the random effects in computing the
statistic.
NOILINK computes the statistic on the scale of the link function.
The default is to compute statistics using BLUPs on the scale of the link function (the
linearized scale). For example, the OUTPUT statement
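shown in the following sketch (the output data set gmxout and the names mu and mu_pa are illustrative) adds the conditional mean and the marginal mean, both on the data scale, to the output data set:

   output out=gmxout pred(ilink)=mu
                     pred(noblup ilink)=mu_pa;

Table 7 lists the keywords, the keyword-options, and the default names of the computed statistics.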
Table 7. (continued)
Keyword     Options        Description                                Expression             Name
STDERR      NOBLUP         Standard deviation of marginal             √var[η̂_m]              StdErrPA
                           linear predictor
            ILINK          Standard deviation of mean                 √var[g⁻¹(η̂ − z′γ)]     StdErrMu
            NOBLUP ILINK   Standard deviation of marginal mean        √var[g⁻¹(η̂_m)]         StdErrMuPA
RESIDUAL    Default        Residual                                   r = p − η̂              Resid
            NOBLUP         Marginal residual                          r_m = p − η̂_m          ResidPA
            ILINK          Residual on mean scale                     r_y = y − g⁻¹(η̂)       ResidMu
            NOBLUP ILINK   Marginal residual on mean scale            r_ym = y − g⁻¹(η̂_m)    ResidMuPA
PEARSON     Default        Pearson-type residual                      r / √var[p|γ]          Pearson
            NOBLUP         Marginal Pearson-type residual             r_m / √var[p]          PearsonPA
            ILINK          Conditional Pearson-type mean residual     r_y / √var[Y|γ]        PearsonMu
STUDENT     Default        Studentized residual                       r / √var[r]            Student
            NOBLUP         Studentized marginal residual              r_m / √var[r_m]        StudentPA
LCL         Default        Lower prediction limit for linear predictor                       LCL
            NOBLUP         Lower confidence limit for marginal linear predictor              LCLPA
            ILINK          Lower prediction limit for mean                                   LCLMu
            NOBLUP ILINK   Lower confidence limit for marginal mean                          LCLMuPA
UCL         Default        Upper prediction limit for linear predictor                       UCL
            NOBLUP         Upper confidence limit for marginal linear predictor              UCLPA
            ILINK          Upper prediction limit for mean                                   UCLMu
            NOBLUP ILINK   Upper confidence limit for marginal mean                          UCLMuPA
VARIANCE    Default        Conditional variance of pseudo-data        var[p|γ]               Variance
            NOBLUP         Marginal variance of pseudo-data           var[p]                 VariancePA
            ILINK          Conditional variance of response           var[Y|γ]               Variance_Dep
            NOBLUP ILINK   Marginal variance of response              var[Y]                 Variance_DepPA
Studentized residuals are computed only on the linear scale (scale of the link), un-
less the link is the identity, in which case the two scales are equal. The keywords
RESIDUAL, PEARSON, STUDENT, VARIANCE are not available with the multi-
nomial distribution. You can use the following shortcuts to request statistics: PRED
for PREDICTED, STD for STDERR, RESID for RESIDUAL, VAR for VARIANCE.
You can specify the following options of the OUTPUT statement after the slash (/).
ALLSTATS
requests that all statistics are computed. If you do not use a keyword to assign a name,
the GLIMMIX procedure uses the default name.
ALPHA=number
determines the coverage probability for two-sided confidence and prediction inter-
vals. The coverage probability is computed as 1-number. The value of number must
be between 0 and 1; the default is 0.05.
DERIVATIVES
DER
adds derivatives of model quantities to the output data set. If, for example, the model
fit requires the (conditional) log likelihood of the data, then the DERIVATIVES op-
tion writes for each observation the evaluations of the first and second derivatives of
the log likelihood with respect to _LINP_ and _PHI_ to the output data set. The
particular derivatives produced by the GLIMMIX procedure depend on the type of
model and the estimation method.
NOMISS
requests that records be written to the output data set only for those observations that
were used in the analysis. By default, the GLIMMIX procedure produces output
statistics for all observations in the input data set.
NOUNIQUE
requests that names are not made unique in the case of naming conflicts. By default,
the GLIMMIX procedure avoids naming conflicts by assigning a unique name to each
output variable. If you specify the NOUNIQUE option, variables with conflicting
names are not renamed. In that case, the first variable added to the output data set
takes precedence.
NOVAR
requests that variables from the input data set are not added to the output data set.
This option does not apply to variables listed in a BY statement.
OBSCAT
requests that, in models for multinomial data, statistics are written to the output data set only for the response level that corresponds to the observed level of the observation.
SYMBOLS
SYM
adds to the output data set computed variables that are defined or referenced in the
program.
PARMS Statement
PARMS < (value-list) . . . > < / options > ;
The PARMS statement specifies initial values for the covariance or scale parameters,
or it requests a grid search over several values of these parameters in generalized
linear mixed models.
The value-list specification can take any of several forms:
m a single value
m1 , m2 , . . . , mn several values
m to n a sequence where m equals the starting value, n equals the ending
value, and the increment equals 1
m to n by i a sequence where m equals the starting value, n equals the ending
value, and the increment equals i
m1 , m2 to m3 mixed values and sequences
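For example, a statement along the lines of the following sketch (the middle value 3 is just an illustrative starting value for the second parameter)

   parms (5) (3) (2) / hold=1,3;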
constrains the first and third covariance parameters to equal 5 and 2, respectively.
Covariance or scale parameters that are held fixed with the HOLD= option are treated
as constrained parameters in the optimization. This is different from evaluating the
objective function, gradient, and Hessian matrix at known values of the covariance
parameters. A constrained parameter introduces a singularity in the optimization pro-
cess. The covariance matrix of the covariance parameters (see the ASYCOV option of
the PROC GLIMMIX statement) is then based on the projected Hessian matrix. As a
consequence, the variance of parameters subjected to a HOLD= is zero. Such param-
eters do not contribute to the computation of denominator degrees of freedom with
the DDFM=KENWARDROGER and DDFM=SATTERTH methods, for example. If
you wish to treat the covariance parameters as known, without imposing constraints
on the optimization, you should use the NOITER option.
When you place a hold on all parameters (or when you specify the NOITER option)
in a GLMM, you may notice that PROC GLIMMIX continues to produce an iter-
ation history. Unless your model is a linear mixed model, several recomputations
of the pseudo-response may be required in linearization-based methods to achieve
agreement between the pseudo-data and the covariance matrix. In other words, the
GLIMMIX procedure continues to update the profiled fixed-effects estimates (and
BLUPs) until convergence is achieved.
In certain models, placing a hold on covariance parameters implies that the procedure
processes the parameters in the same order as if the NOPROFILE was in effect. This
can change the order of the covariance parameters when you place a hold on one
or more parameters. Models that are subject to this re-ordering are those with R-
side covariance structures whose scale parameter could be profiled. This includes
the TYPE=CS, TYPE=SP, TYPE=AR, TYPE=TOEP, and TYPE=ARMA covariance
structures.
LOWERB=value-list
enables you to specify lower boundary constraints for the covariance or scale param-
eters. The value-list specification is a list of numbers or missing values (.) separated
by commas. You must list the numbers in the same order that PROC GLIMMIX uses
for the value-list, and each number corresponds to the lower boundary constraint. A
missing value instructs PROC GLIMMIX to use its default constraint, and if you do
not specify numbers for all of the covariance parameters, PROC GLIMMIX assumes
that the remaining ones are missing.
This option is useful, for example, when you want to constrain the G matrix to be
positive definite in order to avoid the more computationally intensive algorithms re-
quired when G becomes singular. The corresponding code for a random coefficients
model is as follows:
proc glimmix;
class person;
model y = time;
random int time / type=chol sub=person;
parms / lowerb=1e-4,.,1e-4;
run;
Here, the CHOL structure is used in order to specify a Cholesky root parameteriza-
tion for the 2 × 2 unstructured blocks in G. This parameterization ensures that the
G matrix is nonnegative definite, and the PARMS statement then ensures that it is
positive definite by constraining the two diagonal terms to be greater than or equal to
1E−4.
NOITER
requests that no optimization of the covariance parameters be performed. This option
has no effect in generalized linear models.
If you specify the NOITER option, PROC GLIMMIX uses the values for the co-
variance parameters given in the PARMS statement to perform statistical inferences.
Note that the NOITER option is not equivalent to specifying a HOLD= value for all
covariance parameters. If you use the NOITER option, covariance parameters are
not constrained in the optimization. This prevents singularities that might otherwise
occur in the optimization process.
If a residual variance is profiled, the parameter estimates can change from the initial
values you provide as the residual variance is recomputed. To prevent an update of
the residual variance, combine the NOITER option with the NOPROFILE option of
the PROC GLIMMIX statement, as in the following code.
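A minimal sketch, with an illustrative random-intercept model and starting values for the two covariance parameters:

proc glimmix noprofile;
   class block;
   model y = x;
   random int / subject=block;
   parms (2.0) (1.0) / noiter;
run;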
When you specify the NOITER option, you may notice that the GLIMMIX procedure
continues to produce an iteration history. Unless your model is a linear mixed model,
several recomputations of the pseudo-response may be required in linearization-based
methods to achieve agreement between the pseudo-data and the covariance matrix. In
other words, the GLIMMIX procedure continues to update the profiled fixed-effects
estimates (and BLUPs) until convergence is achieved. To prevent these updates, use
the MAXLMMUPDATE= option of the PROC GLIMMIX statement.
Specifying the NOITER option in the PARMS statement of a GLMM has the same
effect as choosing TECHNIQUE=NONE on the NLOPTIONS statement. If you
wish to base initial fixed-effects estimates on the results of fitting a generalized linear
model, then you can combine the NOITER option with the TECHNIQUE= option.
For example, the statements
determine the starting values for the fixed effects by fitting a logistic model (without
random effects), using the Newton-Raphson algorithm. The initial GLM fit stops at
convergence or after at most 10 iterations, whichever comes first. The pseudo-data
for the linearized GLMM is computed from the GLIM estimates. The variance of the
Clinic random effect is held constant at 0.4 during subsequent iterations that update
the fixed effects only.
If you also want to combine the GLM fixed effects estimates with known and fixed
covariance parameter values, without updating the fixed effects, you can add the
MAXLMMUPDATE=0 option:
Finally, the NOITER option can be useful if you wish to obtain minimum variance
quadratic unbiased estimates (with 0 priors), also known as MIVQUE0 estimates
(Goodnight 1978). Since MIVQUE0 estimates are starting values for covariance pa-
rameters—unless you provide (value-list) in the PARMS statement—the following
statements produce MIVQUE0 mixed model estimates.
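A sketch with an illustrative random-intercept model; because no value-list is given, the MIVQUE0 starting values are used and, with NOITER, are not updated:

proc glimmix;
   class block;
   model y = x;
   random int / subject=block;
   parms / noiter;
run;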
PARMSDATA=SAS-data-set
PDATA=SAS-data-set
reads in covariance parameter values from a SAS data set. The data set should con-
tain the numerical variable ESTIMATE or the numerical variables COVP1–COVPq,
where q denotes the number of covariance parameters.
If the PARMSDATA= data set contains multiple sets of covariance parameters, the
GLIMMIX procedure evaluates the initial objective function for each set and com-
mences the optimization step using the set with the lowest function value as the start-
ing values. For example, the SAS statements
data data_covp;
input covp1-covp4;
datalines;
180 200 170 1000
170 190 160 900
160 180 150 800
;
proc glimmix;
class A B C rep mainEU smallEU;
model yield = A|B|C;
random rep mainEU smallEU;
parms / pdata=data_covp;
run;
request that the objective function be evaluated for three sets of initial values. Each set
comprises four covariance parameters.
The order of the observations in a data set with the numerical variable ESTIMATE
corresponds to the order of the covariance parameters in the “Covariance Parameter
Estimates” table. In a GLM, the PARMSDATA= option can be used to set the starting
value for the exponential family scale parameter. A grid search is not conducted for
GLMs if you specify multiple values.
The PARMSDATA= data set must not contain missing values.
If the GLIMMIX procedure is processing the input data set in BY groups, you can add
the BY variables to the PARMSDATA= data set. If this data set is sorted by the BY
variables, the GLIMMIX procedure matches the covariance parameter values to the
current BY group. If the PARMSDATA= data set does not contain all BY variables,
the data set is processed in its entirety for every BY group and a message is written
to the LOG. This enables you to provide a single set of starting values across BY
groups, as in the following statements.
data data_covp;
input covp1-covp4;
datalines;
180 200 170 1000
;
proc glimmix;
class A B C rep mainEU smallEU;
model yield = A|B|C;
random rep mainEU smallEU;
parms / pdata=data_covp;
by year;
run;
The same set of starting values is used for each value of the year variable.
UPPERB=value-list
enables you to specify upper boundary constraints on the covariance parameters. The
value-list specification is a list of numbers or missing values (.) separated by commas.
You must list the numbers in the same order that PROC GLIMMIX uses for the value-
list, and each number corresponds to the upper boundary constraint. A missing value
instructs PROC GLIMMIX to use its default constraint. If you do not specify numbers
for all of the covariance parameters, PROC GLIMMIX assumes that the remaining
ones are missing.
RANDOM Statement
RANDOM random-effects < / options > ;
Using notation from the “Notation for the Generalized Linear Mixed Model” section beginning on page 7, the RANDOM statement defines the Z matrix of the mixed model, the random effects in the γ vector, the structure of G, and the structure of R.
The Z matrix is constructed exactly as the X matrix for the fixed effects, and the G
matrix is constructed to correspond to the effects constituting Z. The structures of
G and R are defined by using the TYPE= option described on page 93. The random
effects can be classification or continuous effects, and multiple RANDOM statements
are possible.
Some reserved keywords have special significance in the random-effects list. You can
specify INTERCEPT (or INT) as a random effect to indicate the intercept. PROC
GLIMMIX does not include the intercept in the RANDOM statement by default as
it does in the MODEL statement. You can specify the _RESIDUAL_ keyword (or RESID, RESIDUAL, _RESID_) before the option slash (/) to indicate a residual-type (R-side) random component that defines the R matrix. Basically, the _RESIDUAL_ keyword takes the place of the random effect if you want to specify R-side variances
and covariance structures. These keywords take precedence over variables in the data
set with the same name. If your data or the covariance structure requires that an effect
is specified, you can use the RESIDUAL option to instruct the GLIMMIX procedure
to model the R-side variances and covariances.
In order to add an overdispersion component to the variance function, simply specify
a single residual random component. For example, the statements
proc glimmix;
model count = x x*x / dist=poisson;
random _residual_;
run;
fit a polynomial Poisson regression model with overdispersion. The variance function
a(µ) = µ is replaced by φa(µ).
You can specify the following options in the RANDOM statement after a slash (/).
ALPHA=number
requests that a t type confidence interval with confidence level 1 − number be con-
structed for the predictors of random effects on this statement. The value of number
must be between 0 and 1; the default is 0.05. Specifying the ALPHA= option implies
the CL option.
CL
requests that t type confidence limits be constructed for each of the predictors of
random effects on this statement. The confidence level is 0.95 by default; this can be
changed with the ALPHA= option. The CL option implies the SOLUTION option.
G
requests that the estimated G matrix be displayed for G-side random effects asso-
ciated with this RANDOM statement. PROC GLIMMIX displays blanks for values
that are 0.
GC
displays the lower-triangular Cholesky root of the estimated G matrix for G-side
random effects.
GCI
displays the inverse Cholesky root of the estimated G matrix for G-side random ef-
fects.
GCOORD=LAST
GCOORD=FIRST
GCOORD=MEAN
determines how the GLIMMIX procedure associates coordinates for TYPE=SP() co-
variance structures with effect levels for G-side random effects. In these covariance
structures, you specify one or more variables that identify the coordinates of a data
point. The levels of classification variables, on the other hand, can occur multiple
times for a particular subject. For example, in the following statements
proc glimmix;
class A B;
model y = B;
random A / type=sp(pow)(x);
run;
the same level of A can occur multiple times, and the associated values of x may
be different. The GCOORD=LAST option determines the coordinates for a level of
the random effect from the last observation associated with the level. Similarly, the
GCOORD=FIRST and GCOORD=MEAN options determine the coordinate from the
first observation and from the average of the observations. Observations not used in
the analysis are not considered in determining the first, last, or average coordinate.
GCORR
displays the correlation matrix corresponding to the estimated G matrix for G-side
random effects.
GI
displays the inverse of the estimated G matrix for G-side random effects.
GROUP=effect
GRP=effect
identifies groups by which to vary the covariance parameters. Each new level of the
grouping effect produces a new set of covariance parameters. Continuous variables
and computed variables are permitted as group effects. PROC GLIMMIX does not
sort by the values of the continuous variable; rather, it considers the data to be from a
new group whenever the value of the continuous variable changes from the previous
observation. Using a continuous variable decreases execution time for models with
a large number of groups and also prevents the production of a large “Class Levels
Information” table.
Specifying a GROUP effect can greatly increase the number of estimated covariance
parameters, which may adversely affect the optimization process.
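A brief sketch (the variables are illustrative): here a separate intercept variance is estimated for each level of the grouping factor trt.

proc glimmix;
   class person trt;
   model y = trt;
   random int / subject=person group=trt;
run;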
KNOTINFO
displays the number and coordinates of the knots as determined by the
KNOTMETHOD= option.
KNOTMETHOD=KDTREE<(tree-options)>
KNOTMETHOD=EQUAL<(numberlist)>
KNOTMETHOD=DATA(SAS-data-set)
determines the method of constructing knots for the (approximate) low-rank thin plate
spline fit with the TYPE=RSMOOTH covariance structure. Unless you select the
RSMOOTH covariance structure with the TYPE= option, the KNOTMETHOD= op-
tion has no effect. The default is KNOTMETHOD=KDTREE.
The spline is a low-rank smoother, meaning that the number of knots is considerably
less than the number of observations. By default, PROC GLIMMIX determines the
knot locations based on the vertices of a k-d tree (Friedman, Bentley, and Finkel
1977; Cleveland and Grosse 1991). The k-d tree is a tree data structure that is useful
for efficiently determining the m nearest neighbors of a point. The k-d tree also can
be used to obtain a grid of points that adapts to the configuration of the data. The
process starts with a hypercube that encloses the values of the random effects. The
space is then partitioned recursively by splitting cells at the median of the data in the
cell for the random effect. The procedure is repeated for all cells that contain more
than a specified number of points, b. The value b is called the bucket size.
The k-d tree is thus a division of the data into cells such that cells representing leaf
nodes contain at most b values. You control the building of the k-d tree through the
BUCKET= tree-option. You control the construction of knots from the cell coordi-
nates of the tree with the other options as follows:
BUCKET=number determines the bucket size b. A larger bucket size will result in
fewer knots. For k-d trees in more than one dimension, the corre-
spondence between bucket size and number of knots is difficult to
determine. It depends on the data configuration and on other sub-
options. In the multivariate case, you may need to try out different
bucket sizes to obtain the desired number of knots. The default value of number is 4 for univariate trees (a single random effect), and ⌊0.1n⌋ in the multidimensional case.
KNOTTYPE=type specifies whether the knots are based on vertices of the tree cells or the centroid. The two possible values of type are VERTEX and CENTER. The default is KNOTTYPE=VERTEX.
See the “Knot Selection” section on page 127 for a detailed example of how the
specification of the bucket size translates into the construction of a k-d tree and the
spline knots.
When you use the NOFIT option of the PROC GLIMMIX statement, the
GLIMMIX procedure computes the knots but does not fit the model. This can
be useful if you want to compare knot selections with different suboptions of
KNOTMETHOD=KDTREE. Suppose you want to determine the number of knots
based on a particular bucket size. The statements
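shown in the following sketch (the response y and the smoothing variables x1 and x2 are illustrative)

proc glimmix nofit;
   model y = ;
   random x1 x2 / type=rsmooth
                  knotmethod=kdtree(bucket=10)
                  knotinfo;
run;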
compute and display the knots in a bivariate smooth, constructed from nearest neigh-
bors of the vertices of a k-d tree with bucket size 10.
The KNOTMETHOD=EQUAL option enables you to define a regular grid of knots.
By default, PROC GLIMMIX constructs 10 knots for one-dimensional smooths and
5 knots in each dimension for smoothing in higher dimensions. You can specify a dif-
ferent number of knots with the optional numberlist. Missing values in the numberlist
are replaced with the default values. A minimum of two knots in each dimension is
required. For example, the statements
proc glimmix;
model y=;
random x1 x2 / type=rsmooth knotmethod=equal(5 7);
run;
use a rectangular grid of 35 knots, five knots for x1 combined with seven knots for
x2.
You can specify a data set that contains variables whose values give the knot coordi-
nates with the KNOTMETHOD=DATA option. The data set must contain numeric
variables with the same name as the radial smoothing random-effects. This option is
useful to provide knot coordinates different from those that can be produced from a
k-d tree. For example, in spatial problems where the domain is irregularly shaped,
you may want to determine knots by a space-filling algorithm. The OPTEX procedure, for example, can compute 45 knots that uniformly cover the convex hull of the data locations (refer to Chapter 30, “Introduction,” and Chapter 31, “Details of the OPTEX Procedure,” in the SAS/QC User’s Guide for details about the OPTEX procedure).
RESIDUAL
RSIDE
specifies that the random effects listed in this statement are R-side effects. You use
the RESIDUAL option of the RANDOM statement if the nature of the covariance
structure requires you to specify an effect.
SOLUTION | S
requests that the solution γ̂ for the random-effects parameters be produced, if the statement defines G-side random effects.
The numbers displayed in the “Std Err Pred” column of the “Solution for Random Effects” table are not the standard errors of the γ̂ displayed in the Estimate column; rather, they are the square roots of the variances of the prediction errors γ̂_i − γ_i, where γ̂_i is the ith EBLUP and γ_i is the ith random-effect parameter.
SUBJECT=effect
SUB=effect
identifies the subjects in your generalized linear mixed model. Complete indepen-
dence is assumed across subjects. Specifying a subject effect is equivalent to nesting
all other effects in the RANDOM statement within the subject effect.
Continuous variables and computed variables are permitted with the SUBJECT= op-
tion. PROC GLIMMIX does not sort by the values of the continuous variable but
considers the data to be from a new subject whenever the value of the continuous
variable changes from the previous observation. Using a continuous variable de-
creases execution time for models with a large number of subjects and also prevents
the production of a large “Class Levels Information” table.
TYPE=covariance-structure
specifies the covariance structure of G for G-side effects and of R for R-side effects.
Although a variety of structures are available, most applications call for either
TYPE=VC or TYPE=UN. The TYPE=VC (variance components) option is the de-
fault structure, and it models a different variance component for each random effect.
If you want different covariance structures in different parts of G, you must use
multiple RANDOM statements with different TYPE= options.
Valid values for covariance-structure are as follows. Examples are shown in Table 8
(page 100). The variances and covariances in the formulas that follow in the TYPE=
descriptions are expressed in terms of generic random variables ξi and ξj . These
correspond to random effects for G-side effects and to the response (conditional on
the random effects) for R-side statements.
The values i∗ and j ∗ are derived for the ith and jth observations
and are not necessarily the observation numbers. For example, in
the statements
proc glimmix;
class time patient;
model y = x x*x;
random time / sub=patient type=ar(1);
run;
the values correspond to the class levels for the time effect of the
ith and jth observation within a particular subject.
PROC GLIMMIX imposes the constraint |ρ| < 1 for stationarity.
TYPE=ARMA(1,1) specifies the first-order autoregressive moving average struc-
ture,
cov[ξ_i, ξ_j] = σ²                         if i = j
cov[ξ_i, ξ_j] = σ² γ ρ^(|i* − j*| − 1)      if i ≠ j

γ = (1 + b₁θ₁)(θ₁ + b₁) / (1 + b₁² + 2b₁θ₁)
proc glimmix;
class sub;
model y = x;
random _residual_ / subject=sub type=un;
run;
proc glimmix;
class sub;
model y = x;
random _residual_ / subject=sub type=chol;
run;
cov[ξ_i, ξ_j] = φ + σ²   if i = j
cov[ξ_i, ξ_j] = σ²        if i ≠ j
proc glimmix;
class block A;
model y = block A;
random block*A / type=vc;
run;
proc glimmix;
class block A;
model y = block A;
random _residual_ / subject=block*A
type=cs;
run;
In the first case, the block*A random effect models the G-side ex-
perimental error. Because the distribution defaults to the normal,
the R matrix is of the form φI (see Table 9), and φ is the subsampling error variance. The marginal variance for the data from a particular experimental unit is thus σ²_{b*a} J + φI. This matrix is of compound symmetric form.
Hierarchical random assignments or selections, such as subsam-
pling or split-plot designs, give rise to compound symmetric co-
variance structures. This implies exchangeability of the observa-
tions on the subunit, leading to constant correlations between the
observations. Compound symmetric structures are thus usually not appropriate when correlations diminish with increasing temporal or spatial separation.
cov[ξ_i, ξ_j] = σ² ρ^(d_ij)                              (spatial power)

cov[ξ_i, ξ_j] = σ²            if i = j                   (Toeplitz)
cov[ξ_i, ξ_j] = σ_{|i−j|}      if 0 < |i − j| < q

cov[ξ_i, ξ_j] = σ_ij                                     (unstructured)

cov[ξ_i, ξ_j] = σ_i σ_j ρ_ij                              (unstructured correlations)
Table 8. (continued)

First-Order Autoregressive, AR(1):
   σ² [ 1    ρ    ρ²   ρ³ ]
      [ ρ    1    ρ    ρ² ]
      [ ρ²   ρ    1    ρ  ]
      [ ρ³   ρ²   ρ    1  ]

Toeplitz, TOEP:
   [ σ²   σ₀   σ₁   σ₂ ]
   [ σ₀   σ²   σ₀   σ₁ ]
   [ σ₁   σ₀   σ²   σ₀ ]
   [ σ₂   σ₁   σ₀   σ² ]

Toeplitz with Two Bands, TOEP(2):
   [ σ²   σ₀   0    0  ]
   [ σ₀   σ²   σ₀   0  ]
   [ 0    σ₀   σ²   σ₀ ]
   [ 0    0    σ₀   σ² ]

Spatial Power, SP(POW)(c):
   σ² [ 1        ρ^(d12)   ρ^(d13)   ρ^(d14) ]
      [ ρ^(d21)  1         ρ^(d23)   ρ^(d24) ]
      [ ρ^(d31)  ρ^(d32)   1         ρ^(d34) ]
      [ ρ^(d41)  ρ^(d42)   ρ^(d43)   1       ]

First-Order Autoregressive Moving-Average, ARMA(1,1):
   σ² [ 1     γ     γρ    γρ² ]
      [ γ     1     γ     γρ  ]
      [ γρ    γ     1     γ   ]
      [ γρ²   γρ    γ     1   ]

First-Order Factor Analytic, FA(1):
   [ λ₁² + d₁   λ₁λ₂       λ₁λ₃       λ₁λ₄     ]
   [ λ₂λ₁       λ₂² + d₂   λ₂λ₃       λ₂λ₄     ]
   [ λ₃λ₁       λ₃λ₂       λ₃² + d₃   λ₃λ₄     ]
   [ λ₄λ₁       λ₄λ₂       λ₄λ₃       λ₄² + d₄ ]

Unstructured Correlations, UNR:
   [ σ₁²        σ₁σ₂ρ₂₁    σ₁σ₃ρ₃₁    σ₁σ₄ρ₄₁ ]
   [ σ₂σ₁ρ₂₁    σ₂²        σ₂σ₃ρ₃₂    σ₂σ₄ρ₄₂ ]
   [ σ₃σ₁ρ₃₁    σ₃σ₂ρ₃₂    σ₃²        σ₃σ₄ρ₄₃ ]
   [ σ₄σ₁ρ₄₁    σ₄σ₂ρ₄₂    σ₄σ₃ρ₄₃    σ₄²     ]
V<=value-list>
requests that blocks of the estimated marginal variance-covariance matrix V(θ̂) be displayed in generalized linear mixed models. This matrix is based on the last linearization as described in the section “The Pseudo-Model” on page 115.
You can use the value-list to select the subjects for which the matrix is displayed. If
value-list is not specified, the V matrix for the first subject is chosen.
Note that the value-list refers to subjects as the processing units in the “Dimensions”
table. For example, the statements
proc glimmix;
class A B;
model y = B;
random int / subject=A;
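   /* The remainder of the original listing is not reproduced here.   */
   /* A statement along these lines (illustrative, not the original)  */
   /* would request the V matrix for the second subject:              */
   random _residual_ / subject=A*B type=ar(1) v=2;
run;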
request that the estimated marginal variance matrix for the second subject be dis-
played. The subject effect for processing in this case is the A effect, since it is con-
tained in the A*B interaction. If there is only a single subject as per the “Dimensions”
table, then the V option displays an (n × n) matrix.
See the section “Processing by Subjects” on page 123 for how the GLIMMIX procedure determines the number of subjects in the “Dimensions” table.
The GLIMMIX procedure displays blanks for values that are 0.
VC<=value-list>
displays the lower-triangular Cholesky root of the blocks of the estimated V(θ̂) matrix. See the V option for the specification of value-list.
VCI<=value-list>
displays the inverse Cholesky root of the blocks of the estimated V(θ̂) matrix. See the V option for the specification of value-list.
VCORR<=value-list>
displays the correlation matrix corresponding to the blocks of the estimated V(θ̂) matrix. See the V option for the specification of value-list.
VI<=value-list>
displays the inverse of the blocks of the estimated V(θ̂) matrix. See the V option for the specification of value-list.
WEIGHT Statement
WEIGHT variable ;
The WEIGHT statement replaces R with W^(−1/2) R W^(−1/2), where W is a diagonal
matrix containing the weights. Observations with nonpositive or missing weights are
not included in the resulting PROC GLIMMIX analysis. If a WEIGHT statement is
not included, all observations used in the analysis are assigned a weight of 1.
Programming Statements
This section lists the programming statements available in PROC GLIMMIX to com-
pute various aspects of the generalized linear mixed model or output quantities. For
example, you can compute model effects, weights, frequency, subject, group, and
other variables. You can use programming statements to define the mean and vari-
ance functions. This section also documents the differences between programming
statements in PROC GLIMMIX and programming statements in the DATA step. The
syntax of programming statements used in PROC GLIMMIX is identical to that used
in the NLMIXED procedure (see Chapter 51 in the SAS/STAT User’s Guide), and the
MODEL procedure (refer to the SAS/ETS User’s Guide). Most of the programming
statements that can be used in the SAS DATA step can also be used in the GLIMMIX
procedure. Refer to SAS Language Reference: Dictionary for a description of SAS
programming statements. The following are valid statements:
ABORT;
CALL name [ ( expression [, expression ... ] ) ];
DELETE;
DO [ variable = expression
[ TO expression ] [ BY expression ]
[, expression [ TO expression ] [ BY expression ] ... ]
]
[ WHILE expression ] [ UNTIL expression ];
END;
GOTO statement_label;
IF expression;
IF expression THEN program_statement;
ELSE program_statement;
variable = expression;
variable + expression;
LINK statement_label;
PUT [ variable] [=] [...] ;
RETURN;
SELECT [( expression )];
STOP;
SUBSTR( variable, index, length ) = expression;
WHEN ( expression) program_statement;
OTHERWISE program_statement;
For the most part, the SAS programming statements work the same as they do in the
SAS DATA step, as documented in SAS Language Reference: Concepts. However,
there are several differences:
• The DO statement does not allow a character index variable. Thus,
do i = 1,2,3;
is supported, but
do i = 'A','B','C';
is not.
– The PROC GLIMMIX PUT statement does not support line pointers, factored lists, iteration factors, overprinting, _INFILE_, the colon (:) format modifier, or “$”.
– The PROC GLIMMIX PUT statement does support expressions, but the
expression must be enclosed in parentheses. For example, the following
statement displays the square root of x:
put (sqrt(x));
– The PROC GLIMMIX PUT statement supports the item _PDV_ to display a formatted listing of all variables in the program. For example,
put _pdv_;
• The WHEN and OTHERWISE statements enable you to specify more than
one target statement. That is, DO/END groups are not necessary for multiple
statement WHENs. For example, the following syntax is valid:
select;
when (exp1) stmt1;
stmt2;
when (exp2) stmt3;
stmt4;
end;
DIST=                          Distribution        Variance function a(µ)    φ ≡ 1
BETA                           beta                µ(1 − µ)                  No
BINARY                         binary              µ(1 − µ)                  Yes
BINOMIAL | BIN | B             binomial            µ(1 − µ)/n                Yes
EXPONENTIAL | EXPO             exponential         µ²                        Yes
GAMMA | GAM                    gamma               µ²                        No
GAUSSIAN | G | NORMAL | N      normal              1                         No
GEOMETRIC | GEOM               geometric           µ + µ²                    Yes
INVGAUSS | IGAUSSIAN | IG      inverse Gaussian    µ³                        No
LOGNORMAL | LOGN               log-normal          1                         No
NEGBINOMIAL | NEGBIN | NB      negative binomial   µ + kµ²                   Yes
POISSON | POI | P              Poisson             µ                         Yes
TCENTRAL | TDIST | T           t                   1                         No
To change the variance function, you can use SAS programming statements and the
predefined automatic variables, as outlined in the following section. Your definition
of a variance function will override the DIST= option and its implied variance func-
tion. This has the following implication for parameter estimation with the GLIMMIX
procedure. When a user-defined link is available, the distribution of the data is deter-
mined from the DIST= option, or the respective default for the type of response. In
a GLM, for example, this enables maximum likelihood estimation. If a user-defined
variance function is provided, the DIST= option is not honored and the distribution of
the data is assumed unknown. In a GLM framework, only quasi-likelihood estimation
is then available to estimate the model parameters.
Automatic Variables
To specify your own link or variance function, you can use SAS programming statements and draw on automatic variables maintained by the GLIMMIX procedure. For example, _LINP_ holds the current value of the linear predictor, _MU_ holds the corresponding mean, and _ZGAMMA_ equals z′γ̂, the contribution of the random effects to the linear predictor. The following statements compute a predicted probability from the linear predictor of a random-intercept binary model and add it, through the ID statement, to the OUTPUT data set:

proc glimmix;
   model y = x / dist=binary;
   random int / subject=b;
   p = 1/(1+exp(-_linp_));
   output out=glimmixout resid(blup)=r;
   id p;
run;
For example, the following two sets of statements specify the same mean function; the first alters the linear predictor and then assigns _MU_, and the second assigns _MU_ directly:
proc glimmix;
_linp_ = sqrt(abs(_linp_));
_mu_ = exp(_linp_);
model count = logtstd / dist=poisson;
run;
proc glimmix;
_mu_ = exp(sqrt(abs(_linp_)));
model count = logtstd / dist=poisson;
run;
If the value of the linear predictor is altered in any way through programming state-
ments, you need to ensure that an assignment to _MU_ follows. The assignment to
variable P in the next set of GLIMMIX statements is without effect.
proc glimmix;
p = _linp_ + rannor(454);
model count = logtstd / dist=poisson;
run;
Next, consider the following two sets of GLIMMIX statements:
proc glimmix;
class block entry;
model y/n = block entry / dist=binomial link=logit;
run;
proc glimmix;
class block entry;
prob = 1 / (1+exp(- _linp_));
_mu_ = n * prob ;
_variance_ = n * prob *(1-prob);
model y = block entry;
run;
The first GLIMMIX invocation models the proportion y/n as a binomial propor-
tion with a logit link. The DIST= and LINK= options are superfluous in this case,
since the GLIMMIX procedure defaults to the binomial distribution in light of the
events/trials syntax. The logit link is that distribution’s default link. The second set
of GLIMMIX statements models the count variable y and takes the binomial sample
size into account through assignments to the mean and variance function. In contrast
to the first set of GLIMMIX statements, the distribution of y is unknown. Only its
mean and variance are known. The model parameters are estimated by maximum
likelihood in the first case and by quasi-likelihood in the second case.
Details
Generalized Linear Models Theory
A generalized linear model consists of
• a linear predictor η = x′β
• a monotonic mapping between the mean of the data and the linear predictor
• a response distribution in the exponential family of distributions
The distribution of the data is of the form

   f(y) = exp{ (yθ − b(θ))/φ + c(y, φ) }

for some functions b(·) and c(·). The parameter θ is called the natural (canonical)
parameter. The parameter φ is a scale parameter and it is not present in all exponential
family distributions. See Table 9 on page 104 for a list of distributions for which
φ ≡ 1. In the case where observations are weighted, the scale parameter is replaced
with φ/w in the above density (or mass function), where w is the weight associated
with the observation y.
The mean and variance of the data are related to the components of the density,
E[Y] = µ = b′(θ), var[Y] = φ b″(θ), where primes denote first and second derivatives. If you express θ as a function of µ, the relationship is known as the natural link or the canonical link function. In other words, modeling data with a canonical link assumes that θ = x′β; the effect contributions are additive on the canonical scale. The second derivative of b(·), expressed as a function of µ, is the variance function of the generalized linear model, a(µ) = b″(θ(µ)). Note that because of this relationship,
the distribution determines the variance function and the canonical link function. You
cannot, however, proceed in the opposite direction. If you provide a user-specified
variance function, the GLIMMIX procedure assumes that only the first two moments
of the response distribution are known. The full distribution of the data is then un-
known and maximum likelihood estimation is not possible. Instead, the GLIMMIX
procedure then estimates parameters by quasi-likelihood.
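As a concrete illustration of these relationships, consider the Poisson distribution with mean µ. Its log mass function is y log{µ} − µ − log{Γ(y + 1)}, so that θ = log{µ}, b(θ) = exp{θ}, and φ ≡ 1. Consequently, E[Y] = b′(θ) = µ, var[Y] = φ b″(θ) = µ, the variance function is a(µ) = µ, and the canonical link θ(µ) = log{µ} is the log link, in agreement with Table 9.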
Maximum Likelihood
The GLIMMIX procedure forms the log likelihoods of generalized linear models as
   L(µ, φ; y) = Σ_{i=1}^{n} f_i l(µ_i, φ; y_i, w_i)
where l(µi , φ; yi , wi ) is the log likelihood contribution of the ith observation with
weight wi and fi is the value of the frequency variable. For the determination of
wi and fi , see the WEIGHT and FREQ statements. The individual log likelihood
contributions for the various distributions are as follows.
• Beta:
  l(µ_i, φ; y_i, w_i) = log{ Γ(φ/w_i) / ( Γ(µφ/w_i) Γ((1 − µ)φ/w_i) ) }
                        + (µφ/w_i − 1) log{y_i} + ((1 − µ)φ/w_i − 1) log{1 − y_i}
  var[Y] = µ(1 − µ)/(1 + φ), φ > 0. See Ferrari and Cribari-Neto (2004).
• Binary:
  l(µ_i, φ; y_i, w_i) = w_i ( y_i log{µ_i} + (1 − y_i) log{1 − µ_i} )
  var[Y] = µ(1 − µ), φ ≡ 1.
• Binomial:
  Here y_i and n_i denote the events and trials in the events/trials syntax, and 0 < µ < 1.
  var[Y] = µ(1 − µ)/n, φ ≡ 1.
• Exponential:
  l(µ_i, φ; y_i, w_i) = −log{µ_i} − y_i/µ_i                                        if w_i = 1
  l(µ_i, φ; y_i, w_i) = w_i log{w_i y_i/µ_i} − w_i y_i/µ_i − log{y_i Γ(w_i)}        if w_i ≠ 1
  var[Y] = µ², φ ≡ 1.
• Gamma:
  l(µ_i, φ; y_i, w_i) = w_i φ log{w_i y_i φ/µ_i} − w_i y_i φ/µ_i − log{y_i} − log{Γ(w_i φ)}
  var[Y] = φµ², φ > 0.
• Geometric:
  var[Y] = µ + µ², φ ≡ 1.
• Inverse Gaussian:
  l(µ_i, φ; y_i, w_i) = −(1/2) [ w_i (y_i − µ_i)²/(y_i φ µ_i²) + log{φ y_i³/w_i} + log{2π} ]
  var[Y] = φµ³, φ > 0.
• “Lognormal”:
  l(µ_i, φ; log{y_i}, w_i) = −(1/2) [ w_i (log{y_i} − µ_i)²/φ + log{φ/w_i} + log{2π} ]
  var[log{Y}] = φ, φ > 0.
  If you specify DIST=LOGNORMAL with response variable Y, the GLIMMIX procedure assumes that log{Y} ∼ N(µ, σ²). Note that the above density is not the density of Y.
• Multinomial:
  l(µ_i, φ; y_i, w_i) = w_i Σ_{j=1}^{J} y_ij log{µ_ij}
  φ ≡ 1.
• Negative Binomial:
  l(µ_i, φ; y_i, w_i) = y_i log{kµ_i/w_i} − (y_i + w_i/k) log{1 + kµ_i/w_i}
                        + log{ Γ(y_i + w_i/k) / ( Γ(w_i/k) Γ(y_i + 1) ) }
  var[Y] = µ + kµ², k > 0.
• Normal:
  l(µ_i, φ; y_i, w_i) = −(1/2) [ w_i (y_i − µ_i)²/φ + log{φ/w_i} + log{2π} ]
  var[Y] = φ, φ > 0.
• Poisson:
var[Y ] = µ, φ ≡ 1.
• Shifted T:
  z_i = −log{φ*/√w_i} + log{Γ(0.5(ν + 1))} − log{Γ(0.5ν)} − 0.5 log{πν}
  l(µ_i, φ; y_i, w_i) = −(ν/2 + 0.5) log{ 1 + (w_i/ν) ((y_i − µ_i)/φ*)² } + z_i
Denote the gradient and Hessian of the log likelihood with respect to the parameters as

   g = ∂L(θ; y)/∂θ

   H = ∂²L(θ; y)/(∂θ ∂θ′)
The GLIMMIX procedure computes the gradient vector and Hessian matrix analyt-
ically, unless your programming statements involve functions whose derivatives are
determined by finite differences. If the procedure is in scoring mode, H is replaced
by its expected value. PROC GLIMMIX is in scoring mode when the number n
of SCORING=n iterations has not been exceeded and the optimization technique
uses second derivatives, or when the Hessian is computed at convergence and the
EXPHESSIAN option is in effect.
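A minimal sketch of invoking scoring (the data set and variables are hypothetical, and the placement of SCORING= and EXPHESSIAN in the PROC GLIMMIX statement is an assumption):

   proc glimmix data=mydata scoring=5 exphessian;   /* hypothetical data set; options assumed to be PROC statement options */
      model y = x / dist=gamma link=log;            /* an iteratively fitted GLM, so scoring iterations apply */
   run;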
In models for independent data with known distribution, parameter estimates are ob-
tained by the method of maximum likelihood. No parameters are profiled from the
optimization. The default optimization technique for GLMs is the Newton-Raphson
algorithm, except for Gaussian models with identity link, which do not require itera-
tive model fitting. In the case of a Gaussian model, the scale parameter is estimated
by restricted maximum likelihood, because this estimate is unbiased. The results
from PROC GLIMMIX agree with those of the GLM and REG procedures for such
models. You can obtain the maximum likelihood estimate of the scale parameter with
the NOREML option of the PROC GLIMMIX statement. To change the optimization
algorithm, use the TECHNIQUE= option of the NLOPTIONS statement.
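Two brief sketches of these options (data set and variable names are hypothetical):

   /* ML rather than REML estimate of the scale parameter in a normal linear model */
   proc glimmix data=mydata noreml;
      class trt;
      model y = trt;
   run;

   /* change the optimization algorithm for an iteratively fitted GLM */
   proc glimmix data=mydata;
      model count = x / dist=poisson;
      nloptions technique=nrridg;
   run;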
Standard errors of the parameter estimates are obtained from the negative of the in-
verse of the (observed or expected) second derivative matrix H.
If only the first two moments of the response distribution are known, the function

   Q(µ_i, y_i) = ∫_{y_i}^{µ_i} (y_i − t) / (φ a(t)) dt
known as the log quasi-likelihood of the ith observation, has some properties of a log
likelihood function (McCullagh and Nelder 1989, p. 325). For example, the expected
value of its derivative is zero, and the variance of its derivative equals the negative of
the expected value of the second derivative. Consequently,
   QL(µ, φ, y) = Σ_{i=1}^{n} f_i w_i (Y_i − µ_i) / (φ a(µ_i))
can serve as the score function for estimation. Quasi-likelihood estimation takes as
the gradient and “Hessian” matrix—with respect to the fixed effects parameters β
—the quantities
   g_ql = [g_ql,j] = ∂QL(µ, φ, y)/∂β_j = D′V⁻¹(Y − µ)/φ

   H_ql = [h_ql,jk] = ∂²QL(µ, φ, y)/(∂β_j ∂β_k) = D′V⁻¹D/φ
The scale parameter φ is then estimated as

   φ̂ = (1/m) Σ_{i=1}^{n} f_i w_i (y_i − µ̂_i)² / a(µ̂_i)

You can add a multiplicative overdispersion parameter to the model with the statement

   random _residual_;

For models in which φ ≡ 1, this effectively lifts the constraint on the parameter. In models that already contain a φ or k scale parameter—such as the normal, gamma, or negative binomial models—the statement adds a multiplicative scalar (the overdispersion parameter, φ_o) to the variance function.
The overdispersion parameter is estimated from Pearson's statistic after all other parameters have been determined by (restricted) maximum likelihood or quasi-likelihood. This estimate is

   φ̂_o = ( 1/(φ_p m) ) Σ_{i=1}^{n} f_i w_i (y_i − µ_i)² / a(µ_i)
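For example, the following sketch (hypothetical data set and variables) adds a multiplicative overdispersion parameter to a Poisson regression:

   proc glimmix data=counts;
      model y = x / dist=poisson;
      random _residual_;      /* lifts the constraint phi = 1 for the Poisson model */
   run;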
• The model may be vacuous in the sense that no valid joint distribution can be
constructed either in general, or for a particular set of parameter values. For
example, if Y is an equicorrelated (n × 1) vector of binary responses with the
same success probability and a symmetric distribution, then the lower bound
on the correlation parameter depends on n and π (Gilliland and Schabenberger
2001). If further restrictions are placed on the joint distribution, as in Bahadur
(1961), the correlation is also restricted from above.
• The dependency between mean and variance for nonnormal data places constraints on the possible correlation models that simultaneously yield valid joint distributions and the desired conditional distributions. For example, aspiring to conditional Poisson variates that are marginally correlated according to a spherical spatial process may not be possible.
Because of these special features of generalized linear mixed models, many estima-
tion methods have been put forth in the literature. The two basic approaches are (1) to approximate the objective function and (2) to approximate the model. Algorithms
in the second category can be expressed in terms of Taylor series (linearizations) and
are hence also known as linearization methods. They employ expansions to approxi-
mate the model by one based on pseudo-data with fewer nonlinear components. The
process of computing the linear approximation must be repeated several times until
some criterion indicates lack of further progress. Schabenberger and Gregoire (1996)
list numerous algorithms based on Taylor series for the case of clustered data alone.
The fitting methods based on linearizations are usually doubly iterative. The gener-
alized linear mixed model is approximated by a linear mixed model based on current
values of the covariance parameter estimates. The resulting linear mixed model is
then fit, which is itself an iterative process. On convergence, the new parameter esti-
mates are used to update the linearization, which results in a new linear mixed model.
The process stops when the parameter estimates change by less than a specified tolerance between successive linear mixed model fits.
Integral approximation methods approximate the log likelihood of the GLMM and
submit the approximated function to numerical optimization. Various techniques are
used to compute the approximation: Laplace methods, quadrature methods, Monte
Carlo integration, and Markov chain Monte Carlo methods. The advantage of integral approximation methods is that they provide an actual objective function for optimization. This enables you to perform likelihood ratio tests among nested models and
to compute likelihood-based fit statistics. The estimation process is singly iterative.
The disadvantage of integral approximation methods is the difficulty of accommodat-
ing crossed random effects, multiple subject effects, and complex R-side covariance
structures. The number of random effects should be small for integral approximation
methods to be practically feasible.
The advantages of linearization-based methods include a relatively simple form of the
linearized model that typically can be fit based on only the mean and variance in the
linearized form. Models for which the joint distribution is difficult—or impossible—
to ascertain can be fit with linearization-based approaches. Models with correlated
errors, a large number of random effects, crossed random effects, and multiple types
of subjects are thus excellent candidates for linearization methods. The disadvan-
tages of this approach include the absence of a true objective function for the overall
optimization process and potentially biased estimates of the covariance parameters,
especially for binary data. The objective function to be optimized after each lin-
earization update is dependent on the current pseudo-data. The process can fail at
both levels of the double iteration scheme.
This version of the GLIMMIX procedure fits generalized linear mixed models
based on linearizations. The default estimation method in GLIMMIX for models containing random effects is a restricted pseudo-likelihood technique (METHOD=RSPL).
The Pseudo-Model
Recall from the “Notation for the Generalized Linear Mixed Model” section (beginning on page 7) that E[Y|γ] = g⁻¹(Xβ + Zγ) = g⁻¹(η) = µ. A first-order Taylor series of g⁻¹(η) about β̃ and γ̃ yields

   g⁻¹(η) ≅ g⁻¹(η̃) + Δ̃X(β − β̃) + Δ̃Z(γ − γ̃)

where

   Δ̃ = ∂g⁻¹(η)/∂η |_{β̃, γ̃}

is a diagonal matrix of derivatives of the conditional mean evaluated at the expansion locus. Rearranging terms yields

   Δ̃⁻¹(µ − g⁻¹(η̃)) + Xβ̃ + Zγ̃ ≅ Xβ + Zγ

The left-hand side is the conditional expectation, given γ, of

   Δ̃⁻¹(Y − g⁻¹(η̃)) + Xβ̃ + Zγ̃ ≡ P

and

   var[P|γ] = Δ̃⁻¹ A^{1/2} R A^{1/2} Δ̃⁻¹

You can thus consider the model

   P = Xβ + Zγ + ε

which is a linear mixed model with pseudo-response P, fixed effects β, random effects γ, and var[ε] = var[P|γ].
Objective Functions
Now define
   V(θ) = ZGZ′ + Δ̃⁻¹ A^{1/2} R A^{1/2} Δ̃⁻¹
as the marginal variance in the linear mixed pseudo-model, where θ is the (q × 1) pa-
rameter vector containing all unknowns in G and R. Based on this linearized model,
an objective function can be defined, assuming that the distribution of P is known.
The GLIMMIX procedure assumes that P has a normal distribution. The maximum
log pseudo-likelihood (MxPL) and restricted log pseudo-likelihood (RxPL) for P are
then
   l(θ, p) = −(1/2) log|V(θ)| − (1/2) r′V(θ)⁻¹r − (f/2) log{2π}

   l_R(θ, p) = −(1/2) log|V(θ)| − (1/2) r′V(θ)⁻¹r − (1/2) log|X′V(θ)⁻¹X| − ((f − k)/2) log{2π}
with r = p − X(X′V⁻¹X)⁻X′V⁻¹p. Here, f denotes the sum of the frequencies used
filed from these expressions. The parameters in θ are estimated by the optimization
techniques specified in the NLOPTIONS statement. The objective function for min-
imization is −2l(θ, p) or −2lR (θ, p). At convergence, the profiled parameters are
estimated and the random effects are predicted as
   β̂ = (X′V(θ̂)⁻¹X)⁻ X′V(θ̂)⁻¹p

   γ̂ = ĜZ′V(θ̂)⁻¹r̂
With these statistics, the pseudo-response and error weights of the linearized model
are recomputed and the objective function is minimized again. The predictors γ̂ are
the estimated BLUPs in the approximated linear model. This process continues until
the relative change between parameter estimates at two successive (outer) iterations
is sufficiently small. See the PCONV= option of the PROC GLIMMIX statement
for the computational details on how the GLIMMIX procedure compares parameter
estimates across optimizations.
If the conditional distribution contains a scale parameter φ ≠ 1 (Table 9 on page 104), the GLIMMIX procedure profiles this parameter in GLMMs from the log pseudo-likelihoods as well. To this end define

   V(θ*) = Δ̃⁻¹ A^{1/2} R* A^{1/2} Δ̃⁻¹ + ZG*Z′

If, for example, G has a variance component structure and R = φI, then θ* contains ratios of the variance components and φ, and R* = I. The solution for φ̂ is

   φ̂ = r̂′V(θ̂*)⁻¹r̂ / m
where m = f for MxPL and m = f − k for RxPL. Substitution into the previous
functions yields the profiled log pseudo-likelihoods,
   l(θ*, p) = −(1/2) log|V(θ*)| − (f/2) log{r′V(θ*)⁻¹r} − (f/2)(1 + log{2π/f})

   l_R(θ*, p) = −(1/2) log|V(θ*)| − ((f − k)/2) log{r′V(θ*)⁻¹r} − (1/2) log|X′V(θ*)⁻¹X| − ((f − k)/2)(1 + log{2π/(f − k)})
Define

   S ≡ v̂ar[P|γ] = Δ̃⁻¹ A^{1/2} R A^{1/2} Δ̃⁻¹
where all components on the right-hand side are evaluated at the converged estimates.
The mixed model equations (Henderson 1984) in the linear mixed (pseudo-)model are then

   [ X′S⁻¹X     X′S⁻¹Z         ] [ β̂ ]   [ X′S⁻¹p ]
   [ Z′S⁻¹X     Z′S⁻¹Z + G⁻¹   ] [ γ̂ ] = [ Z′S⁻¹p ]

and

   C = [ X′S⁻¹X     X′S⁻¹Z         ]⁻
       [ Z′S⁻¹X     Z′S⁻¹Z + G⁻¹   ]

     = [ Ω̂                           −Ω̂X′V(θ̂)⁻¹ZG(θ̂)                                   ]
       [ −G(θ̂)Z′V(θ̂)⁻¹XΩ̂           M + G(θ̂)Z′V(θ̂)⁻¹XΩ̂X′V(θ̂)⁻¹ZG(θ̂)                  ]

is the approximate estimated variance-covariance matrix of [β̂′, γ̂′ − γ′]′. Here, Ω̂ = (X′V(θ̂)⁻¹X)⁻ and M = (Z′S⁻¹Z + G(θ̂)⁻¹)⁻¹.
The square roots of the diagonal elements of Ω̂ are reported in the column
“Standard Error” of the “Parameter Estimates” table. This table is produced with the
SOLUTION option in the MODEL statement. The prediction standard errors of the
random effects solutions are reported in the column “Std Err Pred” of the “Solution
for Random Effects” table. This table is produced with the SOLUTION option in the
RANDOM statement.
As a cautionary note, C tends to underestimate the true sampling variability of
[β̂′, γ̂′]′, because no account is made for the uncertainty in estimating G and R.
Although inflation factors have been proposed (Kackar and Harville 1984; Kass and
Steffey 1989; Prasad and Rao 1990), they tend to be small for data sets that are fairly
well balanced. PROC GLIMMIX does not compute any inflation factors by default.
The DDFM=KENWARDROGER option in the MODEL statement prompts PROC
GLIMMIX to compute a specific inflation factor (Kenward and Roger 1997), along
with Satterthwaite-based degrees of freedom.
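A sketch of requesting the Kenward-Roger adjustment together with the solution tables discussed above (data set and variables are hypothetical):

   proc glimmix data=mydata;
      class block trt;
      model y = trt / ddfm=kenwardroger solution;   /* fixed-effects solutions with adjusted standard errors */
      random block / solution;                      /* "Solution for Random Effects" table */
   run;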
If G(θ̂) is singular, or if you use the CHOL option of the PROC GLIMMIX statement, the mixed model equations are modified as follows. Let L denote the lower triangular matrix so that LL′ = G(θ̂). PROC GLIMMIX then solves the mixed model equations in terms of τ, where γ = Lτ, and transforms τ̂ and a generalized inverse of the left-hand side coefficient matrix by using L.
The asymptotic covariance matrix of the covariance parameter estimator θ̂ is com-
puted based on the observed or expected Hessian matrix of the optimization pro-
cedure. Consider first the case where the scale parameter φ is not present or not
profiled. Since β is profiled from the pseudo-likelihood, the objective function for
minimization is f (θ) = −2l(θ, p) for METHOD=MSPL and METHOD=MMPL
and f (θ) = −2lR (θ, p) for METHOD=RSPL and METHOD=RMPL. Denote the
observed Hessian (second-derivative) matrix as

   H = ∂²f(θ) / (∂θ ∂θ′)

and define the quantities

   g₁ = ∂/∂θ { r′V(θ)⁻¹r }

   H₁ = ∂²/(∂θ ∂θ′) log{|V(θ)|}

   H₂ = ∂²/(∂θ ∂θ′) { r′V(θ)⁻¹r }

   H₃ = ∂²/(∂θ ∂θ′) log{|X′V(θ)⁻¹X|}
The (“Mod.”) expressions for the Hessian under scoring in RxPL estimation refer to
a modified scoring method. In some cases, the modification leads to faster conver-
gence than the standard scoring algorithm. The modification is requested with the
SCOREMOD option in the PROC GLIMMIX statement.
Finally, in the case of a profiled scale parameter φ, the Hessian for the (θ*, φ) parameterization is converted into that for the θ parameterization by means of the matrix

   B = [ 1/φ        0          ···   0             0 ]
       [ 0          1/φ        ···   0             0 ]
       [ ···        ···        ···   1/φ           0 ]
       [ −θ₁*/φ     −θ₂*/φ     ···   −θ*_{q−1}/φ   1 ]
The expansion locus of the subject-specific (SS) expansion is

   β̃ = β̂,   γ̃ = γ̂

the current estimates of the fixed effects and estimated BLUPs. The population-averaged (PA) expansion expands about the same fixed effects and the expected value of the random effects,

   β̃ = β̂,   γ̃ = 0
To recompute the pseudo-response and weights in the SS expansion, the BLUPs must
be computed every time the objective function in the linear mixed model is maxi-
mized. The PA expansion does not require any BLUPs. The four pseudo-likelihood
methods implemented in the GLIMMIX procedure are the 2×2 factorial combination
between two expansion loci and residual versus maximum pseudo-likelihood estima-
tion. The following table shows the combination and the corresponding values of the
METHOD= option (PROC GLIMMIX statement); METHOD=RSPL is the default.
   ν_j = 2(D_j)² / (g_j′Ag_j)
where D_j is the jth diagonal element of D and g_j is the gradient of b_jCb_j′ with respect to θ, evaluated at θ̂. The matrix A is the asymptotic variance-covariance matrix of θ̂, obtained from the second derivative matrix of the likelihood equations. You can
display this matrix with the ASYCOV option of the PROC GLIMMIX statement.
Finally, let
   E = Σ_{j=1}^{rank(L)} ( ν_j/(ν_j − 2) ) I(ν_j > 2)
where the indicator function eliminates terms for which νj ≤ 2. The degrees of
freedom for F are then computed as
   ν = 2E / (E − rank(L))
In the one-dimensional case (a single-row L = l′), the degrees of freedom are computed as

   ν = 2(l′Cl)² / (g′Ag)
To motivate the empirical (“sandwich”) covariance estimators, consider the model

   Y = µ + ε,   ε ∼ (0, Σ)

The estimators in Table 11 have the form

   c × Ω̂ ( Σ_{i=1}^{m} A_i D_i′ Σ̂_i⁻¹ F_i e_i e_i′ F_i′ Σ̂_i⁻¹ D_i A_i ) Ω̂

where Q = D_i′Σ̂_i⁻¹D_iΩ̂. The optional number 0 ≤ r < 1 is chosen to provide an upper bound on the correction factor. For r = 0, the classical sandwich estimator results. PROC GLIMMIX chooses as default value r = 3/4. The diagonal entries of A_i are then no greater than 2.
Table 11 summarizes the components of the computation for the GLMM based on
linearization, where m denotes the number of subjects and k is the rank of X.
Table 11. Empirical Covariance Estimators for a Linearized GLMM
EMPIRICAL=     c                                  A_i                                     F_i
CLASSICAL      1                                  I                                       I
DF             m/(m − k) if m > k,                I                                       I
               1 otherwise
ROOT           1                                  I                                       (I − H_i′)^{−1/2}
FIRORES        1                                  I                                       (I − H_i′)^{−1}
FIROEEQ(r)     1                                  Diag{(1 − min{r, [Q]_jj})^{−1/2}}       I
Computation of an empirical variance estimator requires that the data can be pro-
cessed by independent sampling units. This is always the case in GLMs. In this case,
m equals the sum of all frequencies. In GLMMs, the empirical estimators require that
the data consist of multiple subjects. In that case, m equals the number of subjects
as per the “Dimensions” table. The following section discusses how the GLIMMIX
procedure determines whether the data can be processed by subjects.
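As an illustration, a sketch of requesting one of the estimators in Table 11 (the data set and variables are hypothetical; the SUBJECT= effect is essential so that the data are processed by subjects, as discussed in the next section):

   proc glimmix data=mydata empirical=classical;    /* or DF, ROOT, FIRORES, FIROEEQ(r) per Table 11 */
      class id;
      model y = x / dist=binary solution;
      random intercept / subject=id;                /* subjects define the independent sampling units */
   run;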
Processing by Subjects
Some mixed models can be expressed in different but mathematically equivalent ways
with PROC GLIMMIX statements. While equivalent statements lead to equivalent
statistical models, the data processing and estimation phase may be quite different,
depending on how you write the GLIMMIX statements. For example, the particular
use of the SUBJECT= option of the RANDOM statement affects data processing and
estimation. Certain options are only available when the data are processed by subject,
for example, the EMPIRICAL option of the PROC GLIMMIX statement. Consider
a GLIMMIX model where variables A and Rep are classification variables with a
and r levels, respectively. The following statements produce the same random effects
structure:
1. class Rep A;
random Rep*A;
2. class Rep A;
random intercept / subject=Rep*A;
3. class Rep A;
random Rep / subject=A;
4. class Rep A;
random A / subject=Rep;
In the first case, PROC GLIMMIX will not process the data by subjects because no
SUBJECT= option was given. The computation of empirical covariance estimators,
for example, will not be possible. The marginal variance-covariance matrix has the
same block-diagonal structure as for cases 2–4, where each block consists of the observations that share a level of the Rep*A effect. The GLIMMIX procedure does not process the following model by subjects
proc glimmix;
class A B;
model y = B;
random A;
random B / subject=A;
run;
because the first RANDOM statement specifies a G-side component and does not use
a SUBJECT= option. To allow processing by subjects, you can write the equivalent
model
proc glimmix;
class A B;
model y = B;
random int / subject=A;
random B / subject=A;
run;
If you denote a variance component effect X with subject effect S as X–(S), then
the “calculus of random effects” applied to the first RANDOM statement reads A =
Int*A = Int–(A) = A–(Int). For the second statement there are even more equivalent
formulations: A*B = A*B*Int = A*B–(Int) = A–(B) = B–(A) = Int–(A*B).
If there are multiple subject effects, processing by subjects is possible if the effects
are equal or contained in each other. Note that in the last example the A*B interaction
is a random effect. An equivalent specification to the last would be
proc glimmix;
class A B;
model y = B;
random int / subject=A;
random A / subject=B;
run;
Processing by subjects would not be possible in this case because the two subject
effects are not syntactically equal or contained in each other. A case where subject
effects are syntactically contained would be
proc glimmix;
class A B;
model y = B;
random int / subject=A;
random int / subject=A*B;
run;
The A main effect is contained in the A*B interaction. The GLIMMIX procedure
chooses as the subject effect for processing the effect that is contained in all other
subject effects. In this case, the subjects are defined by the levels of A.
You can examine the “Model Information” and “Dimensions” tables to see whether
the GLIMMIX procedure processes the data by subjects and which effect is used to
define subjects. The “Model Information” table displays whether the marginal vari-
ance matrix is diagonal (GLM models), blocked, or not blocked. The “Dimensions”
table tells you how many subjects (=blocks) there are.
Finally, nesting and crossing of effects in subject effects are equivalent. The following two RANDOM statements are equivalent:
class Rep A;
random intercept / subject=Rep*A;
class Rep A;
random intercept / subject=Rep(A);
Radial Smoothing Based on Mixed Models
Consider the linearized mixed pseudo-model from the section “The Pseudo-Model” on page 115, P = Xβ + Zγ + ε. One derivation of the mixed model equations, whose solutions are β̂ and γ̂, is to maximize the joint density f(γ, ε) with respect to β and γ. This is not a true likelihood problem, since γ is not a parameter, but a random vector.
In the special case with var[ε] = φI and var[γ] = σ²I, the maximization of f(γ, ε) is equivalent to the minimization of

   (1/φ)(p − Xβ − Zγ)′(p − Xβ − Zγ) + (1/σ²)γ′γ
Now consider a linear spline as in Ruppert, Wand, and Carroll (2003, p. 108),

   p_i = β₀ + β₁x_i + Σ_{j=1}^{K} γ_j (x_i − t_j)₊

where the γ_j denote the spline coefficients at knots t₁, ···, t_K. The truncated line function is defined as

   (x − t)₊ = x − t   if x > t
              0       otherwise

If you collect the intercept and regressor x into the matrix X, and if you collect the truncated line functions into the (n × K) matrix Z, then fitting the linear spline amounts to minimization of a penalized spline criterion of the same form as the expression above. The radial smoother used by the GLIMMIX procedure (TYPE=RSMOOTH) is based on a thin-plate spline as in Chapter 13.4 of Ruppert, Wand, and Carroll (2003). For computational expediency, the number of knots is chosen to be less than the number of
data points. Ruppert, Wand, and Carroll (2003) recommend one knot per every four
unique regressor values for one-dimensional smoothers. In the multivariate case, gen-
eral recommendations are more difficult, because the optimal number and placement
of knots depends on the spatial configuration of samples. Their recommendation for
bivariate smoothers is one knot per four samples, but at least 20 and no more than
150 knots (Ruppert et al. 2003, p. 257).
The magnitude of the variance component σ 2 is dependent on the metric of the ran-
dom effects. For example, if you apply radial smoothing in time, the variance changes
if you measure time in days or minutes. If the solution for the variance component is
near zero, then a rescaling of the random effect data can help the optimization prob-
lem by moving the solution for the variance component away from the boundary of
the parameter space.
Knot Selection
The GLIMMIX procedure computes knots for low-rank smoothing based on the ver-
tices or centroids of a k-d tree. The default is to use the vertices of the tree as the knot
locations, if you use the TYPE=RSMOOTH covariance structure. The construction
of this tree amounts to a partitioning of the random regressor space until all partitions
contain at most b observations. The number b is called the bucket size of the k-d tree.
You can exercise control over the construction of the tree by changing the bucket
size with the BUCKET= suboption of the KNOTMETHOD=KDTREE option in the
RANDOM statement. A large bucket size leads to fewer knots, but it is not correct
to assume that K, the number of knots, is simply ⌊n/b⌋. The number of vertices
depends on the configuration of the values in the regressor space. Also, coordinates
of the bounding hypercube are vertices of the tree. In the one-dimensional case, for
example, the extreme values of the random effect are vertices.
To demonstrate how the k-d tree partitions the random effects space based on ob-
served data and the influence of the bucket size, consider the following example from
Chapter 41, “The LOESS Procedure” (SAS/STAT User’s Guide). The SAS data set
Gas contains the results of an engine exhaust emission study (Brinkman 1981). The
covariate in this analysis, E, is a measure of the air-fuel mixture richness. The re-
sponse, NOx, measures the nitric oxide concentration (in micrograms per joule, and
normalized).
data Gas;
input NOx E;
format NOx E f5.3;
datalines;
4.818 0.831
2.849 1.045
3.275 1.021
4.691 0.97
4.255 0.825
5.064 0.891
2.118 0.71
4.602 0.801
2.286 1.074
0.97 1.148
3.965 1
5.344 0.928
3.834 0.767
1.99 0.701
5.199 0.807
5.283 0.902
3.752 0.997
0.537 1.224
1.64 1.089
5.055 0.973
4.937 0.98
1.561 0.665
;
There are 22 observations in the data set, and the values of the covariate are unique.
If you want to smooth these data with a low-rank radial smoother, you need to choose
the number of knots, as well as their placement within the support of the variable
E. The k-d tree construction depends on the observed values of the variable E; it
is independent of the values of nitric oxide in the data. The following statements
construct a tree based on a bucket size of b = 11 and display information about the
tree and the selected knots:
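A sketch of such statements (assuming that NOFIT appears in the PROC GLIMMIX statement and that BUCKET=, TREEINFO, and KNOTINFO are given as suboptions of KNOTMETHOD=KDTREE in the RANDOM statement, as described next):

   proc glimmix data=Gas nofit;
      model NOx = E;
      random E / type=rsmooth
                 knotmethod=kdtree(bucket=11 treeinfo knotinfo);
   run;

Changing BUCKET=11 to 8 or 4 produces the trees and knots discussed below.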
The NOFIT option prevents the GLIMMIX procedure from fitting the model.
This option is useful if you want to investigate the knot construction for
various bucket sizes. The TREEINFO and KNOTINFO suboptions of the
KNOTMETHOD=KDTREE option request displays of the k-d tree and the knot
coordinates derived from it. Construction of the tree commences by splitting the
data in half. For b = 11, n = 22, neither of the two splits contains more than b
observations and the process stops. With a single split value, and the two extreme
values, the tree has two terminal nodes and leads to three knots (Figure 11). Note
that for one-dimensional problems, vertices of the k-d tree always coincide with data
values.
Node   Left Child   Right Child   Split Variable   Split Value
0      1            2             E                0.9280
1      TERMINAL
2      TERMINAL

Radial Smoother Knots for RSmooth(E)
Knot Number         E
          1    0.6650
          2    0.9280
          3    1.2240

Figure 11. k-d tree and knot locations for bucket size b = 11
Decreasing the bucket size to b = 8 produces the tree and knots in Figure 12. The initial split value of 0.9280 leads to two
sets of 11 observations. In order to achieve a partition into cells that contain at most
eight observations, each initial partition is split at its median one more time. Note
that one split value is greater and one split value is less than 0.9280.
Node   Left Child   Right Child   Split Variable   Split Value
0      1            2             E                0.9280
1      3            4             E                0.8070
2      5            6             E                1.0210
3      TERMINAL
4      TERMINAL
5      TERMINAL
6      TERMINAL

Radial Smoother Knots for RSmooth(E)
Knot Number         E
          1    0.6650
          2    0.8070
          3    0.9280
          4    1.0210
          5    1.2240

Figure 12. k-d tree and knot locations for bucket size b = 8
A further reduction in bucket size to b = 4 leads to the tree and knot information
shown in Figure 13.
Node   Left Child   Right Child   Split Variable   Split Value
0      1            2             E                0.9280
1      3            4             E                0.8070
2      9            10            E                1.0210
3      5            6             E                0.7100
4      7            8             E                0.8910
5      TERMINAL
6      TERMINAL
7      TERMINAL
8      TERMINAL
9      11           12            E                0.9800
10     13           14            E                1.0890
11     TERMINAL
12     TERMINAL
13     TERMINAL
14     TERMINAL

Radial Smoother Knots for RSmooth(E)
Knot Number         E
          1    0.6650
          2    0.7100
          3    0.8070
          4    0.8910
          5    0.9280
          6    0.9800
          7    1.0210
          8    1.0890
          9    1.2240

Figure 13. k-d tree and knot locations for bucket size b = 4
The split value for b = 11 is also a split value for b = 8, the split values for b =
8 are a subset of those for b = 4, and so forth. Figure 14 displays the data and
the location of split values for the three cases. For a one-dimensional problem (a
univariate smoother), the vertices comprise the split values and the values on the
bounding interval.
132
The GLIMMIX Procedure
You may wish to move away from the boundary, in particular if the data configuration
is irregular or for multivariate smoothing. The KNOTTYPE=CENTER suboption
of the KNOTMETHOD= option chooses centroids of the leaf node cells instead of
vertices. This tends to move the outer knot locations closer to the convex hull, but
not necessarily to data locations. In the emission example, choosing a bucket size of
b = 11 and centroids as knot locations yields two knots at E=0.7956 and E=1.076.
If you choose the NEAREST suboption, then the nearest neighbor of a vertex or
centroid will serve as the knot location. In this case, the knot locations are a subset
of the data locations, regardless of the dimensionality of the smooth.
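A sketch of requesting centroid-based knot placement (assuming KNOTTYPE= is supplied as a suboption of KNOTMETHOD=KDTREE; data set and variables as in the emission example):

   proc glimmix data=Gas nofit;
      model NOx = E;
      random E / type=rsmooth
                 knotmethod=kdtree(bucket=11 knottype=center knotinfo);
   run;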
Parameterization of Generalized Linear Mixed Models
Intercept
By default, all models automatically include a column of 1s in X to estimate a fixed-
effect intercept parameter. You can use the NOINT option in the MODEL statement
to suppress this intercept. The NOINT option is useful when you are specifying a
classification effect in the MODEL statement and you want the parameter estimate to
be in terms of the (linked) mean response for each level of that effect, rather than in
terms of a deviation from an overall mean.
By contrast, the intercept is not included by default in Z. To obtain a column of 1s
in Z, you must specify in the RANDOM statement either the INTERCEPT effect or
some effect that has only one level.
Regression Effects
Numeric variables, or polynomial terms involving them, may be included in the
model as regression effects (covariates). The actual values of such terms are in-
cluded as columns of the model matrices X and Z. You can use the bar operator with
a regression effect to generate polynomial effects. For instance, X|X|X expands to
X X*X X*X*X, a cubic model.
Main Effects
If a class variable has m levels, PROC GLIMMIX generates m columns in the model
matrix for its main effect. Each column is an indicator variable for a given level.
The order of the columns is the sort order of the values of their levels and can be
controlled with the ORDER= option in the PROC GLIMMIX statement. Table 12 is
an example where β0 denotes the intercept.
Table 12. Main Effects Example
Data I A B
A B β0 A1 A2 B1 B2 B3
1 1 1 1 0 1 0 0
1 2 1 1 0 0 1 0
1 3 1 1 0 0 0 1
2 1 1 0 1 1 0 0
2 2 1 0 1 0 1 0
2 3 1 0 1 0 0 1
Typically, there are more columns for these effects than there are degrees of freedom
for them. In other words, PROC GLIMMIX uses an over-parameterized model.
Interaction Effects
Often a model includes interaction (crossed) effects. With an interaction, PROC
GLIMMIX first reorders the terms to correspond to the order of the variables in the
CLASS statement. Thus, B*A becomes A*B if A precedes B in the CLASS state-
ment. Then, PROC GLIMMIX generates columns for all combinations of levels that
occur in the data. The order of the columns is such that the rightmost variables in the
cross index faster than the leftmost variables (Table 13). Empty columns (that would
contain all 0s) are not generated for X, but they are for Z.
Nested Effects
Nested effects are generated in the same manner as crossed effects. Hence, the design
columns generated by the following two statements are the same (but the ordering of
the columns is different):
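For example (with a hypothetical response y), the nested effect B(A) and the crossed effect A*B generate the same set of design columns, only in a different order:

   class A B;
   model y = B(A);

   class A B;
   model y = A*B;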
Note that nested effects are often distinguished from interaction effects by the implied
randomization structure of the design. That is, they usually indicate random effects
within a fixed-effects framework. The fact that random effects can be modeled di-
rectly in the RANDOM statement may make the specification of nested effects in the
MODEL statement unnecessary.
Continuous-Nesting-Class Effects
When a continuous variable nests with a class variable, the design columns are con-
structed by multiplying the continuous values into the design columns for the class
effect (Table 15).
Table 15. Continuous-Nesting-Class Effects Example
Data I A X(A)
X A β0 A1 A2 X(A1) X(A2)
21 1 1 1 0 21 0
24 1 1 1 0 24 0
22 1 1 1 0 22 0
28 2 1 0 1 0 28
19 2 1 0 1 0 19
23 2 1 0 1 0 23
Continuous-by-Class Effects
Continuous-by-class effects generate the same design columns as continuous-nesting-
class effects. The two models are made different by the presence of the continuous
variable as a regressor by itself, as well as a contributor to a compound effect (Table
16).
Table 16. Continuous-by-Class Effects Example
Data I X A X*A
X A β0 X A1 A2 X*A1 X*A2
21 1 1 21 1 0 21 0
24 1 1 24 1 0 24 0
22 1 1 22 1 0 22 0
28 2 1 28 0 1 0 28
19 2 1 19 0 1 0 19
23 2 1 23 0 1 0 23
General Effects
An example that combines all the effects is X1*X2*A*B*C(D E). The continuous list
comes first, followed by the crossed list, followed by the nested list in parentheses.
You should be aware of the sequencing of parameters when you use the CONTRAST
or ESTIMATE statements to compute some function of the parameter estimates.
• Class variables that occur outside parentheses (crossed effects) are sorted in the
order in which they appear in the CLASS statement.
• Variables within parentheses (nested effects) are sorted in the order in which
they appear in the CLASS statement.
• Variables in the crossed list index faster than variables in the nested list.
• Within a crossed or nested list, variables to the right index faster than variables
to the left.
For example, suppose a model includes four effects —A, B, C, and D—each having
two levels, 1 and 2. If the CLASS statement is
class A B C D;
then the order of the parameters for the effect B*A(C D), which is renamed
A*B(C D), is
A1 B1 C1 D1 → A1 B2 C1 D1 → A2 B1 C1 D1 → A2 B2 C1 D1 →
A1 B1 C1 D2 → A1 B2 C1 D2 → A2 B1 C1 D2 → A2 B2 C1 D2 →
A1 B1 C2 D1 → A1 B2 C2 D1 → A2 B1 C2 D1 → A2 B2 C2 D1 →
A1 B1 C2 D2 → A1 B2 C2 D2 → A2 B1 C2 D2 → A2 B2 C2 D2
Note that first the crossed effects B and A are sorted in the order in which they appear
in the CLASS statement so that A precedes B in the parameter list. Then, for each
combination of the nested effects in turn, combinations of A and B appear. The B
effect moves fastest because it is rightmost in the crossed list. Then A moves next fastest, followed by D. The C effect is the slowest because it is leftmost in the nested list.
When numeric levels are used, levels are sorted by their character format, which
may not correspond to their numeric sort sequence (for example, noninteger levels).
Therefore, it is advisable to include a desired format for numeric levels or to use the
ORDER=INTERNAL option in the PROC GLIMMIX statement to ensure that levels
are sorted by their internal values.
Response Level Ordering and Referencing
You should view the “Response Profile” table to ensure that the categories are prop-
erly arranged and that the desired outcome is modeled. In this table, response levels
are arranged by Ordered Value. The lowest response level is assigned Ordered Value
1, the next lowest is assigned Ordered Value 2, and so forth. In binary models, the
probability modeled is the probability of the response level with the lowest Ordered
Value.
You can change which probability is modeled and the Ordered Value in the “Response
Profile” table with the DESCENDING, EVENT=, ORDER=, and REF= response
variable options in the MODEL statement. See the section “Response Level Ordering” in Chapter 42, “The LOGISTIC Procedure” (SAS/STAT User’s Guide), for examples of how to use these options to affect the probability being modeled for binary data.
For multinomial models, the response level ordering affects two important aspects. In
cumulative link models the categories are assumed ordered according to their Ordered
Value in the “Response Profile” table. If the response variable is a character variable,
or has a format, you should check this table carefully as to whether the Ordered
Values reflect the correct ordinal scale.
In generalized logit models (for multinomial data with unordered categories), one
response category is chosen as the reference category in the formulation of the gener-
alized logits. By default, the linear predictor in the reference category is set to 0, and
the reference category corresponds to the entry in the “Response Profile” table with
the highest Ordered Value. You can affect the assignment of Ordered Values with the
DESCENDING and ORDER= options of the MODEL statement. You can choose a
different reference category with the REF= option. The choice of the reference cat-
egory for generalized logit models affects the results. It is sometimes recommended
to choose the category with the highest frequency as the reference (see, for exam-
ple, Brown and Prescott 1999, p. 160). You can achieve this with the GLIMMIX
procedure by combining the ORDER= and REF= options. For example,
proc glimmix;
class preference;
model preference(order=freq ref=first) = feature price /
dist=multinomial link=glogit;
random intercept / subject=store group=preference;
run;
structure, do not have to be in the data set. They can be computed with
GLIMMIX programming statements.
• In the GLIMMIX procedure, RANDOM statement options apply to the
RANDOM statement in which they are specified. For example, the statements
random a / s;
random a*b / G;
random a*b*c / alpha=0.04;
in the GLIMMIX procedure request that the solution vector be printed for the
A and A*B*C random effects and that the G matrix corresponding to the A*B
interaction random effect be printed. Confidence intervals with a 0.96 coverage
probability are produced for the solutions of the A*B*C effect.
In the MIXED procedure, the S option, for example, applies to all RANDOM
statements.
• If you select nonmissing values in the value-list of the DDF= option of the
MODEL statement, PROC GLIMMIX uses these values to override degrees of
freedom for this effect that may be determined otherwise. For example, the
statements
proc glimmix;
class block a b;
model y = a b a*b / s ddf=4,.,. ddfm=sat;
random block a*block / s;
lsmeans a b a*b / diff;
run;
request that the denominator degrees of freedom for tests and confidence inter-
vals involving the A effect are set to 4. In the example, this applies to the “Type
III Tests of Fixed Effects,” “Least Squares Means,” and “Differences of Least
Squares Means” tables.
In the MIXED procedure, the Satterthwaite approximation overrides the DDF=
specification.
• The DDFM=BETWITHIN degrees of freedom method requires that the data
be processed by subjects; see the “Processing by Subjects” section.
• The LSMEANS statement in the MIXED procedure does not support the following options: ADJDFE=, LINES, PLOTS=, SIMPLEDIFF=, SIMPLEDIFFTYPE=, and the DIFF=ANOM type of LS-means differences.
• When you add the response variable to the CLASS statement, PROC
GLIMMIX defaults to the multinomial distribution. If you add the response
variable to the CLASS statement in PROC MIXED, it has no effect on the
fitted model.
• For ODS purposes, the name of the table for the solution of fixed effects is
“SolutionF” in the MIXED procedure. In PROC GLIMMIX, the name of this table is “ParameterEstimates.”

The estimation algorithms of the GLIMMIX procedure fall into the following three categories:
1. Noniterative algorithms
A closed form solution exists for all model parameters. Standard linear models
with homoscedastic, uncorrelated errors can be fit with noniterative algorithms.
2. Singly iterative algorithms
A single optimization, consisting of one or more iterations, is performed to
obtain solutions for the parameter estimates by numerical techniques. Linear
mixed models for normal data can be fit with singly iterative algorithms.
3. Doubly iterative algorithms
A model of simpler structure is derived from the target model. The parameters
of the simpler model are estimated by noniterative or singly iterative methods.
Based on these new estimates, the model of simpler structure is re-derived and
another estimation step follows. The process continues until changes in the
parameter estimates are sufficiently small between two re-computations of the
simpler model, or until some other criterion is met. The re-derivation of the
model can often be cast as a change of the response to some pseudo-data along
with an update of implicit model weights.
Iteration History

                                         Objective                       Max
Iteration   Restarts   Evaluations        Function         Change    Gradient
        0          0             4    83.039723731              .    13.63536
        1          0             3    82.189661988     0.85006174    0.281308
        2          0             3    82.189255211     0.00040678    0.000174
        3          0             3    82.189255211     0.00000000    1.05E-10

Figure 15. Iteration History and Convergence Status in Singly Iterative Fit
The “Iteration History” table contains the Evaluations column that shows how many
function evaluations were performed in a particular iteration. The convergence sta-
tus message informs you which convergence criterion was met when the estimation
process concluded. In a singly iterative fit, the criterion is one that applies to the
optimization. In other words, it is one of the criteria that can be controlled with
the NLOPTIONS statement: see the ABSCONV=, ABSFCONV=, ABSGCONV=,
ABSXCONV=, FCONV=, or GCONV= option.
In a doubly iterative fit, the “Iteration History” table does not contain an Evaluations
column. Instead it displays the number of iterations within an optimization (column
Subiterations in Figure 16).
Iteration History

                                           Objective                       Max
Iteration   Restarts   Subiterations        Function         Change    Gradient

Figure 16. Iteration History and Convergence Status in Doubly Iterative Fit
The Iteration column then counts the number of optimizations. The “Convergence Status” table indicates that the estimation process concludes when a criterion is met that monitors the parameter estimates across optimizations, namely the PCONV= or ABSPCONV= criterion.
You can control the optimization process with the GLIMMIX procedure through the
NLOPTIONS statement. Its options affect the individual optimizations. In a doubly
iterative scheme, these apply to all optimizations.
The default optimization techniques are TECHNIQUE=NONE for nonitera-
tive estimation, TECHNIQUE=NEWRAP for singly iterative methods, and
TECHNIQUE=QUANEW for doubly iterative methods.
Techniques that rely only on the gradient or some kind of Hessian approximation usually require many more iterations than tech-
niques that do use a Hessian matrix, and, as a result, the total run time of these
techniques is often longer. Techniques that do not use the Hessian also tend to be less
reliable. For example, they can more easily terminate at stationary points rather than
at global optima.
Table 17 shows which derivatives are required for each optimization technique
(FOD: first-order derivatives (=gradient evaluation); SOD: second-order derivatives
(=Hessian evaluation)).
The second-derivative methods TRUREG, NEWRAP, and NRRIDG are best for
small problems where the Hessian matrix is not expensive to compute. Sometimes
the NRRIDG algorithm can be faster than the TRUREG algorithm, but TRUREG can
be more stable. The NRRIDG algorithm requires only one matrix with p(p + 1)/2
double words; TRUREG and NEWRAP require two such matrices. Here, p denotes
the number of parameters in the optimization.
The first-derivative methods QUANEW and DBLDOG are best for medium-sized
problems where the objective function and the gradient are much faster to evaluate
than the Hessian. The QUANEW and DBLDOG algorithms, in general, require more
iterations than TRUREG, NRRIDG, and NEWRAP, but each iteration can be much
faster. The QUANEW and DBLDOG algorithms require only the gradient to update
an approximate Hessian, and they require slightly less memory than TRUREG or
NEWRAP (essentially one matrix with p(p + 1)/2 double words).
The first-derivative method CONGRA is best for large problems where the objective
function and the gradient can be computed much faster than the Hessian and where
too much memory is required to store the (approximate) Hessian. The CONGRA
algorithm, in general, requires more iterations than QUANEW or DBLDOG, but each
iteration can be much faster. Since CONGRA requires only a factor of p double-word
memory, many large applications can be solved only by CONGRA.
The no-derivative method NMSIMP is best for small problems where derivatives are
not continuous or are very difficult to compute.
Each optimization method employs one or more convergence criteria that determine when it has converged. An algorithm is considered to have converged when any one of these convergence criteria is satisfied. For example, under the default settings, convergence can be determined by the relative gradient criterion (GCONV=).
Algorithm Descriptions
Trust Region Optimization (TRUREG)
The trust region method uses the gradient g(ψ (k) ) and the Hessian matrix H(ψ (k) );
thus, it requires that the objective function f (ψ) have continuous first- and second-
order derivatives inside the feasible region.
The trust region method iteratively optimizes a quadratic approximation to the nonlin-
ear objective function within a hyperelliptic trust region with radius ∆ that constrains
the step size corresponding to the quality of the quadratic approximation. The trust
region method is implemented using Dennis, Gay, and Welsch (1981), Gay (1983),
and Moré and Sorensen (1983).
The trust region method performs well for small- to medium-sized problems, and it
does not need many function, gradient, and Hessian calls. However, if the compu-
tation of the Hessian matrix is computationally expensive, one of the (dual) quasi-
Newton or conjugate gradient algorithms may be more efficient.
Newton-Raphson Optimization with Line Search (NEWRAP)
The NEWRAP technique uses the gradient g(ψ (k) ) and the Hessian matrix H(ψ (k) );
thus, it requires that the objective function have continuous first- and second-order
derivatives inside the feasible region. If second-order derivatives are computed effi-
ciently and precisely, the NEWRAP method may perform well for medium-sized to
large problems, and it does not need many function, gradient, and Hessian calls.
This algorithm uses a pure Newton step when the Hessian is positive definite and
when the Newton step reduces the value of the objective function successfully.
Otherwise, a combination of ridging and line search is performed to compute suc-
cessful steps. If the Hessian is not positive definite, a multiple of the identity matrix is
added to the Hessian matrix to make it positive definite (Eskow and Schnabel 1991).
In each iteration, a line search is performed along the search direction to find an
approximate optimum of the objective function. The default line-search method uses
quadratic interpolation and cubic extrapolation (LIS=2).
Newton-Raphson Ridge Optimization (NRRIDG)
The NRRIDG technique uses the gradient g(ψ (k) ) and the Hessian matrix H(ψ (k) );
thus, it requires that the objective function have continuous first- and second-order
derivatives inside the feasible region.
This algorithm uses a pure Newton step when the Hessian is positive definite and
when the Newton step reduces the value of the objective function successfully. If at
least one of these two conditions is not satisfied, a multiple of the identity matrix is
added to the Hessian matrix.
The NRRIDG method performs well for small- to medium-sized problems, and it
does not require many function, gradient, and Hessian calls. However, if the com-
putation of the Hessian matrix is computationally expensive, one of the (dual) quasi-
Newton or conjugate gradient algorithms may be more efficient.
Quasi-Newton Optimization (QUANEW)
The (dual) quasi-Newton method uses the gradient g(ψ (k) ), and it does not need
to compute second-order derivatives since they are approximated. It works well for
medium to moderately large optimization problems where the objective function and
the gradient are much faster to compute than the Hessian. However, in general, it
requires more iterations than the TRUREG, NEWRAP, and NRRIDG techniques,
which compute second-order derivatives. QUANEW is the default optimization al-
gorithm because it provides an appropriate balance between the speed and stability
required for most nonlinear mixed model applications.
The QUANEW technique is one of the following, depending upon the value of the UPDATE= option.
You can specify four update formulas with the UPDATE= option:
• DBFGS performs the dual Broyden, Fletcher, Goldfarb, and Shanno (BFGS)
update of the Cholesky factor of the Hessian matrix. This is the default.
• DDFP performs the dual Davidon, Fletcher, and Powell (DFP) update of the
Cholesky factor of the Hessian matrix.
• BFGS performs the original BFGS update of the inverse Hessian matrix.
• DFP performs the original DFP update of the inverse Hessian matrix.
In each iteration, a line search is performed along the search direction to find an
approximate optimum. The default line-search method uses quadratic interpolation
and cubic extrapolation to obtain a step size α satisfying the Goldstein conditions.
One of the Goldstein conditions can be violated if the feasible region defines an upper
limit of the step size. Violating the left-side Goldstein condition can affect the positive
definiteness of the quasi-Newton update. In that case, either the update is skipped or
the iterations are restarted with an identity matrix, resulting in the steepest descent or
ascent search direction. You can specify line-search algorithms other than the default
with the LIS= option.
The QUANEW algorithm performs its own line-search technique. All options and
parameters (except the INSTEP= option) controlling the line search in the other algo-
rithms do not apply here. In several applications, large steps in the first iterations are
troublesome. You can use the INSTEP= option to impose an upper bound for the step
size α during the first five iterations. You can also use the INHESSIAN[=r] option
to specify a different starting approximation for the Hessian. If you specify only the
INHESSIAN option, the Cholesky factor of a (possibly ridged) finite difference ap-
proximation of the Hessian is used to initialize the quasi-Newton update process. The
values of the LCSINGULAR=, LCEPSILON=, and LCDEACT= options, which con-
trol the processing of linear and boundary constraints, are valid only for the quadratic
programming subroutine used in each iteration of the QUANEW algorithm.
Double Dogleg Optimization (DBLDOG)
The double dogleg optimization method combines the ideas of the quasi-Newton and
trust region methods. In each iteration, the double dogleg algorithm computes the
step s^(k) as the linear combination of the steepest descent or ascent search direction s₁^(k) and a quasi-Newton search direction s₂^(k),

   s^(k) = α₁ s₁^(k) + α₂ s₂^(k)
The step is requested to remain within a prespecified trust region radius; refer to
Fletcher (1987, p. 107). Thus, the DBLDOG subroutine uses the dual quasi-Newton
update but does not perform a line search. You can specify two update formulas with
the UPDATE= option:
• DBFGS performs the dual Broyden, Fletcher, Goldfarb, and Shanno update of
the Cholesky factor of the Hessian matrix. This is the default.
• DDFP performs the dual Davidon, Fletcher, and Powell update of the Cholesky
factor of the Hessian matrix.
The double dogleg optimization technique works well for medium to moderately
large optimization problems where the objective function and the gradient are much
faster to compute than the Hessian. The implementation is based on Dennis and
Mei (1979) and Gay (1983), but it is extended for dealing with boundary and
linear constraints. The DBLDOG technique generally requires more iterations
than the TRUREG, NEWRAP, or NRRIDG technique, which requires second-order
derivatives; however, each of the DBLDOG iterations is computationally cheap.
Furthermore, the DBLDOG technique requires only gradient calls for the update of
the Cholesky factor of an approximate Hessian.
Conjugate Gradient Optimization (CONGRA)
Second-order derivatives are not required by the CONGRA algorithm and are not
even approximated. The CONGRA algorithm can be expensive in function and gra-
dient calls, but it requires only O(p) memory for unconstrained optimization. In
general, many iterations are required to obtain a precise solution, but each of the
CONGRA iterations is computationally cheap. You can specify four different update
formulas for generating the conjugate directions by using the UPDATE= option:
• PB performs the automatic restart update method of Powell (1977) and Beale
(1972). This is the default.
The default often behaves best for typical examples whereas UPDATE=CD can per-
form poorly.
The CONGRA subroutine should be used for optimization problems with large p. For
the unconstrained or boundary constrained case, CONGRA requires only O(p) bytes
of working memory, whereas all other optimization methods require order O(p2 )
bytes of working memory. During p successive iterations, uninterrupted by restarts
or changes in the working set, the conjugate gradient algorithm computes a cycle
of p conjugate search directions. In each iteration, a line search is performed along
the search direction to find an approximate optimum of the objective function. The
default line-search method uses quadratic interpolation and cubic extrapolation to ob-
tain a step size α satisfying the Goldstein conditions. One of the Goldstein conditions
can be violated if the feasible region defines an upper limit for the step size. Other
line-search algorithms can be specified with the LIS= option.
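A sketch of selecting the conjugate gradient technique for a problem with many covariance parameters (data set, variables, and suboption choice are hypothetical):

   proc glimmix data=big;
      class id;
      model y = x1 x2 / dist=binary;
      random intercept / subject=id;
      nloptions technique=congra update=pb;   /* Powell-Beale restart update, the CONGRA default */
   run;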
Nelder-Mead Simplex Optimization (NMSIMP)
The Nelder-Mead simplex method does not use any derivatives and does not assume
that the objective function has continuous derivatives. The objective function itself
needs to be continuous. This technique is quite expensive in the number of function
calls, and it may be unable to generate precise results for p much larger than 40.
The original Nelder-Mead simplex algorithm is implemented and extended to bound-
ary constraints. This algorithm does not compute the objective for infeasible points,
but it changes the shape of the simplex adapting to the nonlinearities of the objective
function, which contributes to an increased speed of convergence. It uses a special
termination criterion.
Remote Monitoring
The SAS/EmMonitor is an application for Windows that enables you to monitor a
CPU-intensive application running on a remote server. The GLIMMIX procedure
supports remote monitoring through its NLOPTIONS statement.
On the server side, a FILENAME statement assigns a fileref to a SOCKET-type
device that defines the IP address of the client and the port number for listening. The
fileref is then specified in the SOCKET= option of the NLOPTIONS statement to
control the EmMonitor. The following statements show an example of server-side
code.
filename sock socket 'your.pc.address.com:6943';  /* client address and port number are placeholders */
proc glimmix data=mydata;       /* data set, model, and random statements (for example, a
                                   RANDOM statement ending in subject=year) not shown in full */
   nloptions tech=nrridg gconv=2.e-5 socket=sock;
run;
On the client side, the SAS/EmMonitor application is started with the following syn-
tax:
EmMonitor options
You can intervene in the optimization process through the Stop Current and Stop
All buttons. Signaling the server by clicking these buttons effectively terminates the
optimization. In a singly iterative optimization, clicking either of the two buttons halts
the optimization process when the signal is received on the server side. The current
parameter estimates are accepted, and post-optimization processing is based on these
values. Note that this is different from a condition where the usual termination criterion
is not met; for example, when the maximum number of iterations is exceeded. If an
optimization does not meet the termination criteria, no post-processing occurs.
In doubly iterative processes, the Stop All button terminates the overall optimization
process and accepts the current estimates. The Stop Current button stops the current
optimization and uses the current parameter estimates to start the next optimization.
The server does not need to be running when you start the EmMonitor, and you can
start or dismiss it at any time during the iteration process. You only need to remember
the port number.
If you do not start the PC client, or you close it prematurely, it will have no effect
on the server side. In other words, the iteration process will continue until one of the
criteria for termination is met.
Default Output
The following sections describe the output that PROC GLIMMIX produces by de-
fault. The output is organized into various tables, which are discussed in the order of
appearance. Note that the contents of a table may change with the estimation method
or the model being fit.
Model Information
The “Model Information” table displays basic information about the fitted model,
such as the link and variance functions, the distribution of the response, and the data
set. If important model quantities—for example, the response, weights, link, or vari-
ance function—are user-defined, the “Model Information” table displays the final
assignment to the respective variable, as determined from your programming state-
ments. If the table indicates that the variance matrix is blocked by an effect, then
PROC GLIMMIX processes the data by subjects. The “Dimensions” table displays
the number of subjects. For more information about processing by subjects, see the
section “Processing by Subjects” on page 123. For ODS purposes, the name of the
“Model Information” table is “ModelInfo.”
Number of Observations
The “Number of Observations” table displays the number of observations read from
the input data set and the number of observations used in the analysis. If you specify
a FREQ statement, the table also displays the sum of frequencies read and used. If
the events/trials syntax is used for the response, the table furthermore displays the
number of events and trials used in the analysis. For ODS purposes, the name of the
“Number of Observations” table is “NObs.”
Response Profile
For binary and multinomial data, the “Response Profile” table displays the Ordered
Values of the response categories, from which the GLIMMIX procedure determines, for example, which category represents the event in binary models.
For each response category level, the frequency used in the analysis is reported. The
section “Response Level Ordering and Referencing” on page 137 explains how you
can use the DESCENDING, EVENT=, ORDER=, and REF= options to affect the
assignment of Ordered Values to the response categories. For ODS purposes, the
name of the “Response Profile” table is “ResponseProfile.”
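For example, the following sketch (hypothetical data set and variable names) uses the EVENT= response variable option to model the probability that the response equals 1; the DESCENDING, ORDER=, and REF= options are specified in the same position.

proc glimmix data=mydata;
   class trt;
   model y(event='1') = trt / dist=binary;   /* model Pr(y = '1') */
run;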
Dimensions
The “Dimensions” table displays information from which you can determine the size
of relevant matrices in the model. This table is useful in determining CPU time and
memory requirements. For ODS purposes, the name of the “Dimensions” table is
“Dimensions.”
Optimization Information
The “Optimization Information” table displays important details about the optimiza-
tion process.
The optimization technique that is displayed in the table is the technique that ap-
plies to any single optimization. For singly iterative methods that is the optimization
method.
The number of parameters that are updated in the optimization equals the number
of parameters in this table minus the number of equality constraints. The number of
constraints is displayed if you fix covariance parameters with the HOLD= option
of the PARMS statement. The GLIMMIX procedure also lists the number of up-
per and lower boundary constraints. Note that the procedure may impose boundary
constraints for certain parameters, for example, variance components and correlation
parameters. Covariance parameters for which a HOLD= was issued have an upper
and lower boundary equal to the parameter value.
If a residual scale parameter is profiled from the optimization, it is also shown in the
“Optimization Information” table.
In a GLMM for which the parameters are estimated by one of the linearization meth-
ods, you need to initiate the process of computing the pseudo-response. This can
be done based on existing estimates of the fixed-effects, or by using the data it-
self—possibly after some suitable adjustment—as an estimate of the initial mean.
The default in PROC GLIMMIX is to use the data itself to derive initial estimates of
the mean function and to construct the pseudo-data. The “Optimization Information”
table shows how the pseudo-data are determined initially. Note that this issue is sep-
arate from the determination of starting values for the covariance parameters.
Iteration History
The “Iteration History” table describes the progress of the estimation process. In
singly iterative methods, the table displays the iteration number, the number of restarts, the number of function evaluations, the value of the objective function, its change from the previous iteration, and the maximum gradient.
Note that the change in the objective function is not the convergence criterion
monitored by the GLIMMIX procedure. PROC GLIMMIX tracks several conver-
gence criteria simultaneously; see the ABSCONV=, ABSFCONV=, ABSGCONV=,
ABSXCONV=, FCONV=, or GCONV= options in the NLOPTIONS statement.
For doubly iterative estimation methods, the “Iteration History” table does not display
the progress of the individual optimizations; instead, it reports on the progress of
the outer iterations. Every row of the table then corresponds to an update of the
linearization, the computation of a new set of pseudo-data, and a new optimization.
In the listing, PROC GLIMMIX displays the (outer) iteration number, the number of restarts, the number of subiterations in the inner optimization, the value of the objective function, its change, and the maximum gradient.
By default, the change in the parameter estimates is expressed in terms of the relative
PCONV criterion. If you request an absolute criterion with the ABSPCONV option
of the PROC GLIMMIX statement, the change reflects the largest absolute difference
since the last optimization.
If you specify the ITDETAILS option of the PROC GLIMMIX statement, parameter
estimates and their gradients are added to the “Iteration History” table. For ODS
purposes, the name of the “Iteration History” table is “IterHistory.”
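For example, the following sketch (hypothetical data set and effects) adds the parameter estimates and their gradients to the iteration history by specifying the ITDETAILS option in the PROC GLIMMIX statement.

proc glimmix data=mydata itdetails;
   class sub;
   model y = x / dist=poisson;
   random intercept / subject=sub;
run;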
Convergence Status
The “Convergence Status” table contains a status message describing the reason for
termination of the optimization. The message is also written to the LOG. For ODS
purposes, the name of the “Convergence Status” table is “ConvergenceStatus,” and
you can query the nonprinting numeric variable Status to check for a successful
optimization. This is useful in batch processing, or when processing BY groups, for
example, in simulations. Successful optimizations are indicated by the value 0 of the
Status variable.
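The following sketch (hypothetical data set and BY variable) saves the table with ODS OUTPUT and keeps only the BY groups for which the optimization was successful.

ods output ConvergenceStatus=cs;
proc glimmix data=mydata;
   by group;
   model y = x / dist=poisson;
run;

data converged;
   set cs;
   if Status = 0;   /* Status=0 indicates a successful optimization */
run;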
Fit Statistics
The “Fit Statistics” table provides statistics about the estimated model. The first entry
of the table corresponds to the negative of twice the (possibly restricted) log likeli-
hood, log pseudo-likelihood, or log quasi-likelihood. If the estimation method per-
mits the true log likelihood or residual log likelihood, the description of the first entry
reads accordingly. Otherwise, the fit statistics are preceded by the words Pseudo- or
Quasi-, for Pseudo- and Quasi-Likelihood estimation, respectively.
Note that the (residual) log pseudo-likelihood in a GLMM is the (residual) log like-
lihood of a linearized model. You should not compare these values across different
statistical models, even if the models are nested with respect to fixed and/or G-side
random effects. It is possible that between two nested models the larger model has a
smaller pseudo-likelihood. For this reason, IC=NONE is the default for GLMMs fit
by pseudo-likelihood methods.
See the IC= option of the PROC GLIMMIX statement and Table 1 for the definition
and computation of the information criteria reported in the “Fit Statistics” table.
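As a sketch (hypothetical data set and model; the IC=Q setting used here is an assumption about the available values of the IC= option), information criteria can be requested in the PROC GLIMMIX statement:

proc glimmix data=mydata ic=q;
   class sub;
   model y = x / dist=poisson;
   random intercept / subject=sub;
run;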
For generalized linear models, the GLIMMIX procedure reports Pearson's chi-square statistic

$$X^2 = \sum_i \frac{w_i (y_i - \hat{\mu}_i)^2}{a(\hat{\mu}_i)}$$

where $a(\hat{\mu}_i)$ is the variance function evaluated at the estimated mean.
For GLMMs, the procedure typically reports a generalized chi-square statistic,

$$X^2_g = \hat{\mathbf{r}}' \mathbf{V}(\hat{\boldsymbol{\theta}}^{*})^{-1} \hat{\mathbf{r}}$$

so that the ratio of $X^2$ or $X^2_g$ and the degrees of freedom produces the usual residual
dispersion estimate.
If the R-side scale parameter $\phi$ is not extracted from $\mathbf{V}$, the GLIMMIX procedure
computes

$$X^2_g = \hat{\mathbf{r}}' \mathbf{V}(\hat{\boldsymbol{\theta}})^{-1} \hat{\mathbf{r}}$$
as the generalized chi-square statistic. This is the case, for example, if R-side covari-
ance structures are varied by a GROUP= effect or if the scale parameter is not profiled
$$df_{\mathrm{smooth,res}} = f - \mathrm{trace}(\mathbf{S})$$
where S is the “smoother” matrix, that is, the matrix that produces the predicted
values on the linked scale.
For ODS purposes, the name of the “Fit Statistics” table is “FitStatistics.”
Type III Tests of Fixed Effects
The “Type III Tests of Fixed Effects” table contains hypothesis tests for the fixed effects of the model. The F statistics are of the form

$$F = \frac{\hat{\boldsymbol{\beta}}' \mathbf{L}' (\mathbf{L}\mathbf{Q}\mathbf{L}')^{-1} \mathbf{L}\hat{\boldsymbol{\beta}}}{\mathrm{rank}(\mathbf{L})}$$

where the matrix $\mathbf{Q}$ depends on the estimation method and options. For example, in a
GLMM, the default is $\mathbf{Q} = (\mathbf{X}'\mathbf{V}(\hat{\boldsymbol{\theta}})^{-1}\mathbf{X})^{-}$, where $\mathbf{V}(\boldsymbol{\theta})$ is the marginal variance of
the pseudo-response. If you specify the DDFM=KENWARDROGER option, Q is the
estimated variance matrix of the fixed effects, adjusted by the method of Kenward and
Roger (1997). If the EMPIRICAL= option is in effect, Q corresponds to the selected
sandwich estimator.
You can use the HTYPE= option in the MODEL statement to obtain tables of Type I
(sequential) tests and Type II (adjusted) tests in addition to or instead of the table of
Type III (partial) tests.
For ODS purposes, the name of the “Type I Tests of Fixed Effects” through the “Type
III Tests of Fixed Effects” tables are “Tests1” through “Tests3,” respectively.
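For example, the following sketch (hypothetical data set and effects) requests Type I and Type III tests in addition to the default output.

proc glimmix data=mydata;
   class a b;
   model y = a b a*b / htype=1,3;   /* Type I (sequential) and Type III (partial) tests */
run;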
Confidence limits on the scale of the data are usually computed by applying the in-
verse link function to the confidence limits on the linked scale. The resulting limits
on the data scale have the same coverage probability as the limits on the linked scale,
but are possibly asymmetric.
In generalized logit models confidence limits on the mean scale are based on sym-
metric limits about the predicted mean in a category. Suppose that the multinomial
response in such a model has J categories. The probability of a response in category
i is computed as

$$\hat{\mu}_i = \frac{\exp\{\hat{\eta}_i\}}{\sum_{j=1}^{J} \exp\{\hat{\eta}_j\}}$$
The variance of $\hat{\mu}_i$ is then approximated as

$$\mathrm{var}[\hat{\mu}_i] \doteq \zeta = \boldsymbol{\varpi}_i' \, \mathrm{var}\!\left[\hat{\eta}_1 \;\; \hat{\eta}_2 \;\; \cdots \;\; \hat{\eta}_J\right] \boldsymbol{\varpi}_i$$

where the kth element of the vector $\boldsymbol{\varpi}_i$ equals $\hat{\mu}_i(1-\hat{\mu}_i)$ if $i = k$ and $-\hat{\mu}_i \hat{\mu}_k$ if $i \neq k$.
The confidence limits in the generalized logit model are then obtained as

$$\hat{\mu}_i \pm t_{\nu,\alpha/2} \sqrt{\zeta}$$

where $t_{\nu,\alpha/2}$ is the $100(1-\alpha/2)$ percentile from a t distribution with $\nu$ degrees of
freedom. Confidence limits are truncated if they fall outside the $[0, 1]$ interval.
data plants;
input Type $ @;
do Block = 1 to 3;
input StemLength @;
output;
end;
datalines;
Clarion 32.7 32.3 31.5
Clinton 32.1 29.7 29.1
Knox 35.7 35.9 33.1
ONeill 36.0 34.2 31.2
Compost 31.8 28.0 29.2
Wabash 38.2 37.8 31.9
Webster 32.5 31.1 29.7
;
The following statements perform the analysis of the experiment with the GLIMMIX
procedure.
ods html;
ods graphics on;
ods select LSMeans DiffPlot;
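The PROC GLIMMIX statements themselves are not reproduced above; the following is a sketch consistent with the description that follows (fixed Block and Type effects, the ORDER=DATA option, and two LSMEANS statements, the second of which changes the default ABS option to NOABS).

proc glimmix data=plants order=data plots=diffplot;
   class Block Type;
   model StemLength = Block Type;
   lsmeans Type;
   lsmeans Type / plots=diffplot(noabs);
run;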
The PLOTS= option in the PROC GLIMMIX statement requests that plots of pair-
wise least-squares means differences are produced for effects that are listed in cor-
responding LSMEANS statements. This is the Type effect. The second LSMEANS
statement uses the PLOTS= option of the LSMEANS statement to change the default
ABS option to NOABS.
The Type LS-means are shown in Figure 18. Note that the order in which the levels
appear corresponds to the order in which they were read from the data set. This was
accomplished with the ORDER=DATA option of the PROC GLIMMIX statement.
[Figure 18. Type Least Squares Means: Estimate, Standard Error, DF, t Value, Pr > |t|]
Because there are seven levels of Type in this analysis, there are 7(7−1)/2 = 21 pair-
wise comparisons among the least-squares means. The comparisons are performed
in the following fashion: The first level of Type is compared against levels 2 through
7; the second level of Type is compared against levels 3 through 7; and so forth.
The default difference plot for these data is shown in Figure 19. The display is also
known as a “mean-mean scatter plot” (Hsu 1996; Hsu and Peruggia 1994). It contains
21 lines rotated by 45 degrees counterclockwise, and a reference line (dashed 45-
degree line). The (x, y) coordinate for the center of each line corresponds to the
two least-squares means being compared. Suppose that $\hat{\eta}_{.i}$ and $\hat{\eta}_{.j}$ denote the ith
and jth least-squares means for the effect in question, where i < j according to the
ordering of the effect levels. If the ABS option is in effect, which is the default, all
line segments are placed on the same side of the reference line.
The background grid of the difference plot is drawn at the values of the least-squares
means for the seven type levels. These grid lines are used to find a particular com-
parison by intersection. Also, the labels of the grid lines indicate the ordering of the
least-squares means.
The NOABS option of the difference plot changes the way in which the GLIMMIX
procedure places the line segments (Figure 20). If the specific-plot-option NOABS
is in effect, the line segment is centered at the point $(\hat{\eta}_{.i}, \hat{\eta}_{.j})$, $i < j$. For example,
the line segment for a comparison of the “Clarion” and “Compost” types is
centered at $(\hat{\eta}_{.1}, \hat{\eta}_{.5}) = (32.1667, 29.6667)$. Whether a line segment appears above
or below the reference line depends on the magnitude of the least-squares means and
the order of their appearance in the “Least Squares Means” table.
Since the ABS option places lines on the same side of the 45 degree reference, it
can help to visually discover groups of significant and nonsignificant differences. On
the other hand, when the number of levels in the effect is large, the display can get
crowded. The NOABS option can then provide a more accessible resolution.
data plants;
input Type $ @;
do Block = 1 to 3;
input StemLength @;
output;
end;
datalines;
Clarion 32.7 32.3 31.5
Clinton 32.1 29.7 29.1
Knox 35.7 35.9 .
ONeill 36.0 34.2 31.2
Compost 31.8 28.0 29.2
Wabash 38.2 37.8 31.9
Webster 32.5 31.1 29.7
;
The following code requests control plots for effects in LSMEANS statements with
compatible options.
ods html;
ods graphics on;
ods select Diffs ControlPlot;
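The accompanying PROC GLIMMIX statements are not reproduced; a sketch consistent with the following description (comparisons against “Clarion,” adjusted with Dunnett's method, and a control plot requested through the PLOTS= option) is:

proc glimmix data=plants order=data plots=controlplot;
   class Block Type;
   model StemLength = Block Type;
   lsmeans Type / diff=control('Clarion') adjust=dunnett;
run;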
The LSMEANS statement for the Type effect is compatible; it requests comparisons
of Type levels against “Clarion,” adjusted for multiplicity with Dunnett’s method.
Since “Clarion” is the first level of the effect, the same comparisons result whether
or not the control level is specified explicitly in the DIFF=CONTROL option.
The “Differences of Type Least Squares Means” table shows the six comparisons
between Type levels and the control level.
[Differences of Type Least Squares Means: Type, _Type, Estimate, Standard Error, DF, t Value, Pr > |t|, Adj P]
The two rightmost columns of the table give the unadjusted and multiplicity adjusted
p-values. At the 5% significance level, both “Knox” and “Wabash” differ significantly
from “Clarion” according to the unadjusted tests. After adjusting for multiplicity,
only “Wabash” has a least-squares mean significantly different from the control mean.
Note that the standard error for the comparison involving “Knox” is larger than that
for other comparisons because of the reduced sample size in that group.
In the plot of control differences a horizontal line is drawn at the value of the
“Clarion” least-squares mean. Vertical lines emanating from this reference line ter-
minate in the least-squares means for the other levels (Figure 21).
The dashed upper and lower horizontal reference lines are the upper and lower de-
cision limits for tests against the control level. If a vertical line crosses the upper or
lower decision limit, the corresponding least-squares mean is significantly different
from the LS-mean in the control group. If the data had been balanced, the UDL and
LDL would be straight lines, since all estimates $\hat{\eta}_{.i} - \hat{\eta}_{.j}$ would have had the same
standard error. The limits for the comparison between “Knox” and “Clarion” are
wider than for other comparisons, because of the reduced sample size in the “Knox”
group.
The significance level of the decision limits is determined from the ALPHA= level of
the LSMEANS statement. The default is 95% limits. If you choose one-sided com-
parisons with DIFF=CONTROLL or DIFF=CONTROLU in the LSMEANS state-
ment, only one of the decision limits is drawn.
Examples
Example 1. Binomial Counts in Randomized Blocks
In the context of spatial prediction in generalized linear models, Gotway and Stroup
(1997) analyze data from an agronomic field trial. Researchers studied sixteen vari-
eties (entries) of wheat for their resistance to infestation with the Hessian fly. They
arranged the varieties in a randomized complete block design on an 8 × 8 grid. Each
4 × 4 quadrant of that arrangement constitutes a block.
The outcome of interest was the number of damaged plants (Yij ) out of the total
number of plants growing on the unit (nij ). The two subscripts identify the block
(i = 1, · · · , 4) and the entry (j = 1, · · · , 16). The following SAS statements create
the data set. The variables lat and lng denote the coordinate of an experimental unit
on the 8 × 8 grid.
data HessianFly;
label Y = ’No. of damaged plants’
n = ’No. of plants’;
input block entry lat lng n Y @@;
datalines;
1 14 1 1 8 2 1 16 1 2 9 1
1 7 1 3 13 9 1 6 1 4 9 9
1 13 2 1 9 2 1 15 2 2 14 7
1 8 2 3 8 6 1 5 2 4 11 8
1 11 3 1 12 7 1 12 3 2 11 8
1 2 3 3 10 8 1 3 3 4 12 5
1 10 4 1 9 7 1 9 4 2 15 8
1 4 4 3 19 6 1 1 4 4 8 7
2 15 5 1 15 6 2 3 5 2 11 9
2 10 5 3 12 5 2 2 5 4 9 9
2 11 6 1 20 10 2 7 6 2 10 8
2 14 6 3 12 4 2 6 6 4 10 7
2 5 7 1 8 8 2 13 7 2 6 0
2 12 7 3 9 2 2 16 7 4 9 0
2 9 8 1 14 9 2 1 8 2 13 12
2 8 8 3 12 3 2 4 8 4 14 7
3 7 1 5 7 7 3 13 1 6 7 0
3 8 1 7 13 3 3 14 1 8 9 0
3 4 2 5 15 11 3 10 2 6 9 7
3 3 2 7 15 11 3 9 2 8 13 5
3 6 3 5 16 9 3 1 3 6 8 8
3 15 3 7 7 0 3 12 3 8 12 8
3 11 4 5 8 1 3 16 4 6 15 1
3 5 4 7 12 7 3 2 4 8 16 12
4 9 5 5 15 8 4 4 5 6 10 6
4 12 5 7 13 5 4 1 5 8 15 9
4 15 6 5 17 6 4 6 6 6 8 2
4 14 6 7 12 5 4 7 6 8 15 8
4 13 7 5 13 2 4 8 7 6 13 9
4 3 7 7 9 9 4 10 7 8 6 6
4 2 8 5 12 8 4 11 8 6 9 7
4 5 8 7 11 10 4 16 8 8 15 7
;
Analysis as a GLM
If infestations are independent among experimental units, and all plants within a unit
have the same propensity of infestation, then the Yij are binomial random variables.
The first model considered is a standard generalized linear model for independent
binomial counts.
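The statements that fit this generalized linear model are not shown above; the following is a sketch consistent with the description that follows (events/trials response, fixed block and entry effects, and the SOLUTION option).

proc glimmix data=HessianFly;
   class block entry;
   model Y/n = block entry / solution;
run;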
The PROC GLIMMIX statement invokes the procedure. The CLASS statement in-
structs the GLIMMIX procedure to treat both block and entry as classification vari-
ables. The MODEL statement specifies the response variable and the fixed effects in
the model. PROC GLIMMIX constructs the X matrix of the model from the terms
in the right-hand side of the MODEL statement. The GLIMMIX procedure supports
two kinds of syntax for the response variable. This example uses the events/trials
syntax. The variable y represents the number of successes (events) out of n Bernoulli
trials. When the events/trials syntax is used, the GLIMMIX procedure automatically
selects the binomial distribution as the response distribution. Once the distribution is
determined, the procedure selects the link function for the model. The default link
for binomial data is the logit link. The preceding statements are thus equivalent to the
following statements.
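A sketch of the equivalent statements, with the binomial distribution and logit link specified explicitly, is:

proc glimmix data=HessianFly;
   class block entry;
   model Y/n = block entry / dist=binomial link=logit solution;
run;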
The SOLUTION option in the MODEL statement requests that solutions for the fixed
effects (parameter estimates) be displayed.
The “Model Information” table in Output 1.1 describes the model and methods used
in fitting the statistical model.
Model Information
The GLIMMIX procedure recognizes that this is a model for uncorrelated data (vari-
ance matrix is diagonal) and that parameters can be estimated by maximum likeli-
hood. The default method for determining denominator degrees of freedom
for F tests and t tests is the RESIDUAL method. This corresponds to choosing
f − rank(X) as the degrees of freedom, where f is the sum of the frequencies used
in the analysis. You can change the degrees of freedom method with the DDFM=
option of the MODEL statement.
The “Class Level Information” table in Output 1.2 lists the levels of the variables
specified in the CLASS statement and the ordering of the levels.
block 4 1 2 3 4
entry 16 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
The “Number of Observations” table displays the number of observations read and
used in the analysis (Output 1.3).
The “Dimensions” table lists the size of relevant matrices (Output 1.4).
Dimensions
Columns in X 21
Columns in Z 0
Subjects (Blocks in V) 1
Max Obs per Subject 64
Because of the absence of G-side random effects in this model, there are no columns
in the Z matrix. The 21 columns in the X matrix comprise the intercept, 4 columns
for the block effect and 16 columns for the entry effect. Because no RANDOM
statement with a SUBJECT= option was specified, the GLIMMIX procedure does
not process the data by subjects (see the “Processing by Subjects” section on page
123 for details on subject processing).
The “Optimization Information” table provides information about the methods and
size of the optimization problem (Output 1.5).
Optimization Information
With few exceptions, models fit with the GLIMMIX procedure require numerical
methods for parameter estimation. The default optimization method for (overdis-
persed) GLM models is the Newton-Raphson algorithm. In this example, the opti-
mization involves 19 parameters, corresponding to the number of linearly indepen-
dent columns of the X0 X matrix.
The “Iteration History” table shows that the procedure converged after 3 iterations
and 13 function evaluations (Output 1.6).
Iteration History
Objective Max
Iteration Restarts Evaluations Function Change Gradient
0 0 4 134.13393738 . 4.899609
1 0 3 132.85058236 1.28335502 0.206204
2 0 3 132.84724263 0.00333973 0.000698
3 0 3 132.84724254 0.00000009 3.029E-8
The Change column measures the change in the objective function between iter-
ations; however, this is not the convergence criterion monitored. The GLIMMIX
procedure monitors several features simultaneously to determine whether to stop an
optimization.
The “Fit Statistics” table in Output 1.7 lists information about the fitted model.
Fit Statistics
The -2 Log Likelihood values are useful for comparing nested models, and the in-
formation criteria AIC, AICC, BIC, CAIC, and HQIC are useful for comparing
nonnested models. On average, the ratio between the Pearson statistic and its de-
grees of freedom should equal one in GLMs. Values larger than one are indicative
of overdispersion. With a ratio of 2.37, these data appear to exhibit more dispersion
than expected under a binomial model with block and varietal effects.
In Output 1.8, the “Parameter Estimates” table displays the maximum likelihood es-
timates (Estimate), standard errors, and t tests for the hypothesis that the estimate is
zero.
[Parameter Estimates: Effect, block, entry, Estimate, Standard Error, DF, t Value, Pr > |t|, Gradient]
The Gradient column displays the (projected) gradient of the parameter estimate at
convergence of the algorithm. These gradients should be small.
The “Type III Tests of Fixed Effects” table displays significance tests for the two fixed
effects in the model (Output 1.9).
[Type III Tests of Fixed Effects: Effect, Num DF, Den DF, F Value, Pr > F]
These tests are Wald-type tests, not likelihood ratio tests. The entry effect is clearly
significant in this model with a p-value of < 0.0001, indicating that the 16 wheat
varieties are not equally susceptible to damage by the Hessian fly.
Analysis with Random Block Effects
The second model adds normally distributed random block effects to the linear predictor, so that correlations among observations from the same block are induced on the linked (logit) scale; the entry effects remain fixed. The “Class Level Information” table is unchanged from Output 1.2.
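The statements for this model are not reproduced; a sketch consistent with the description of the output below (entry as a fixed effect and block as a G-side random effect) is:

proc glimmix data=HessianFly;
   class block entry;
   model Y/n = entry / solution;
   random block;
run;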
The “Dimensions” table indicates that there is a single G-side parameter, the variance
of the random block effect.
Dimensions
The “Dimensions” table in Output 1.13 has changed from the previous model (com-
pare to Output 1.4). Note that although the block effect has four levels, only a single
variance component is estimated. The Z matrix has four columns, however, corre-
sponding to the four levels of the block effect. Because no SUBJECT= option is used
in the RANDOM statement, the GLIMMIX procedure treats these data as having
arisen from a single subject with 64 observations.
In Output 1.14, the “Optimization Information” table indicates that a Quasi-Newton
method is used to solve the optimization problem. This is the default method for
GLMM models.
Optimization Information
In contrast to the Newton-Raphson method, the Quasi-Newton method does not re-
quire second derivatives. Because the variance of the block effect cannot be negative, the
procedure enforces a lower boundary constraint of zero for this covariance parameter,
and the optimization method is changed to a dual quasi-
Newton method. The fixed effects are profiled from the likelihood equations in this
model. The resulting optimization problem involves only the covariance parameters.
The “Iteration History” table appears to indicate that the procedure converged after
four iterations.
[Iteration History: Iteration, Restarts, Subiterations, Objective Function, Change, Max Gradient]
Notice, however, that the “Iteration History” table in Output 1.15 has changed slightly
from the previous analysis (see Output 1.6). The Evaluations column has been re-
placed by the Subiterations column, since the GLIMMIX procedure applied a dou-
bly iterative fitting algorithm. The entire process consisted of five optimizations, each
of which was iterative. The initial optimization required four iterations, the next one
three iterations, and so on.
In Output 1.16, the “Fit Statistics” table shows information about the fit of the
GLMM.
Fit Statistics
The log likelihood reported in the table is not the residual log likelihood of the data. It
is the residual log likelihood for an approximated model. The generalized chi-square
statistic measures the residual sum of squares in the final model, and its ratio with
the degrees of freedom is a measure of the variability of the observations about the mean
model.
[Covariance Parameter Estimates: Cov Parm, Estimate, Standard Error]
The variance of the random block effects is rather small (Output 1.17). If the envi-
ronmental effects operate on a spatial scale smaller than the block size, the random
block model does not provide a suitable adjustment. From the coarse layout of the
experimental area, it is not surprising that random block effects alone do not account
for the overdispersion in the data.
Adding a random component to a generalized linear model is different from adding
a multiplicative overdispersion component, for example, via the PSCALE option in
PROC GENMOD or a
random _residual_;
statement in PROC GLIMMIX.
[Fixed-effects solutions for entry: Estimate, Standard Error, DF, t Value, Pr > |t|]
Because the block variance component is small, the Type III test for the variety effect
in Output 1.19 is affected only very little compared to the GLM (Output 1.9).
[Type III Tests of Fixed Effects: Effect, Num DF, Den DF, F Value, Pr > F]
Note that the block effects have been removed from the statements. The keyword
_RESIDUAL_ in the RANDOM statement instructs the GLIMMIX procedure to
model the R matrix. Here, R is to be modeled as an exponential covariance structure
matrix. The SUBJECT=INTERCEPT option means that all observations are consid-
ered correlated.
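The statements for this third model are not reproduced; the sketch below is consistent with the description (no block effects, an R-side exponential covariance structure in the grid coordinates, and a single subject). The TYPE=SP(EXP)(lat lng) specification is an assumption based on the exponential structure and the lat and lng coordinates described in the text.

proc glimmix data=HessianFly;
   class entry;
   model Y/n = entry / solution;
   random _residual_ / subject=intercept type=sp(exp)(lat lng);
run;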
Dimensions
Because the random effects are residual-type effects, there are no columns in the Z
matrix for this model (Output 1.20).
Optimization Information
In addition to the fixed effects, the GLIMMIX procedure now profiles one of the co-
variance parameters, the variance of the exponential covariance model (Output 1.21).
This reduces the size of the optimization problem. Only a single parameter is part of
the optimization, the “range” (SP(EXP)) of the spatial process.
[Covariance Parameter Estimates: Cov Parm, Subject, Estimate, Standard Error]
The practical range of a spatial process is that distance at which the correlation be-
tween data points has decreased to at most 0.05. The parameter reported by the
GLIMMIX procedure as SP(EXP) in Output 1.22 corresponds to one third of the prac-
tical range. The practical range in this process is 3 × 0.9052 = 2.7156. Correlations
extend beyond a single experimental unit, but they do not appear to exist on the scale
of the block size.
The sill of the spatial process, the variance of the underlying residual effect, is esti-
mated as 2.5315.
[Type III Tests of Fixed Effects: Effect, Num DF, Den DF, F Value, Pr > F]
The F value for the entry effect in Output 1.23 has been sharply reduced compared to
the previous analyses. The smooth spatial variation accounts for some of the variation
among the varieties.
In this example three models were considered for the analysis of a randomized block
design with binomial outcomes. If data are correlated, a standard generalized linear
model often will indicate overdispersion relative to the binomial distribution. Two
courses of action are considered in this example to address this overdispersion. The
inclusion of G-side random effects models the correlation indirectly; it is induced
through the sharing of random effects among responses from the same block. The
R-side spatial covariance structure models covariation directly. In generalized lin-
ear (mixed) models the two modeling approaches can lead to different inferences,
because the models have different interpretations. The random block effects are mod-
eled on the linked (logit) scale, and the spatial effects were modeled on the mean
scale. Only in a linear mixed model are the two scales identical.
Example 2. Mating Experiment with Crossed Random Effects
The following DATA step creates the data set for the salamander mating experiment. The variables identify the day of the experiment, the population and number of the female and of the male animal in each pairing, and whether a mating occurred.
data salamander;
input day fpop$ fnum mpop$ mnum mating @@;
datalines;
4 rb 1 rb 1 1 4 rb 2 rb 5 1
4 rb 3 rb 2 1 4 rb 4 rb 4 1
4 rb 5 rb 3 1 4 rb 6 ws 9 1
4 rb 7 ws 8 0 4 rb 8 ws 6 0
4 rb 9 ws 10 0 4 rb 10 ws 7 0
4 ws 1 rb 9 0 4 ws 2 rb 7 0
4 ws 3 rb 8 0 4 ws 4 rb 10 0
4 ws 5 rb 6 0 4 ws 6 ws 5 0
4 ws 7 ws 4 1 4 ws 8 ws 1 1
4 ws 9 ws 3 1 4 ws 10 ws 2 1
8 rb 1 ws 4 1 8 rb 2 ws 5 1
8 rb 3 ws 1 0 8 rb 4 ws 2 1
8 rb 5 ws 3 1 8 rb 6 rb 9 1
8 rb 7 rb 8 0 8 rb 8 rb 6 1
8 rb 9 rb 7 0 8 rb 10 rb 10 0
8 ws 1 ws 9 1 8 ws 2 ws 6 0
8 ws 3 ws 7 0 8 ws 4 ws 10 1
8 ws 5 ws 8 1 8 ws 6 rb 2 0
8 ws 7 rb 1 1 8 ws 8 rb 4 0
8 ws 9 rb 3 1 8 ws 10 rb 5 0
12 rb 1 rb 5 1 12 rb 2 rb 3 1
12 rb 3 rb 1 1 12 rb 4 rb 2 1
12 rb 5 rb 4 1 12 rb 6 ws 10 1
12 rb 7 ws 9 0 12 rb 8 ws 7 0
12 rb 9 ws 8 1 12 rb 10 ws 6 1
12 ws 1 rb 7 1 12 ws 2 rb 9 0
12 ws 3 rb 6 0 12 ws 4 rb 8 1
12 ws 5 rb 10 0 12 ws 6 ws 3 1
12 ws 7 ws 5 1 12 ws 8 ws 2 1
12 ws 9 ws 1 1 12 ws 10 ws 4 0
16 rb 1 ws 1 0 16 rb 2 ws 3 1
16 rb 3 ws 4 1 16 rb 4 ws 5 0
16 rb 5 ws 2 1 16 rb 6 rb 7 0
16 rb 7 rb 9 1 16 rb 8 rb 10 0
16 rb 9 rb 6 1 16 rb 10 rb 8 0
16 ws 1 ws 10 1 16 ws 2 ws 7 1
16 ws 3 ws 9 0 16 ws 4 ws 8 1
16 ws 5 ws 6 0 16 ws 6 rb 4 0
16 ws 7 rb 2 0 16 ws 8 rb 5 0
16 ws 9 rb 1 1 16 ws 10 rb 3 1
20 rb 1 rb 4 1 20 rb 2 rb 1 1
20 rb 3 rb 3 1 20 rb 4 rb 5 1
20 rb 5 rb 2 1 20 rb 6 ws 6 1
20 rb 7 ws 7 0 20 rb 8 ws 10 1
20 rb 9 ws 9 1 20 rb 10 ws 8 1
20 ws 1 rb 10 0 20 ws 2 rb 6 0
20 ws 3 rb 7 0 20 ws 4 rb 9 0
20 ws 5 rb 8 0 20 ws 6 ws 2 0
20 ws 7 ws 1 1 20 ws 8 ws 5 1
20 ws 9 ws 4 1 20 ws 10 ws 3 1
24 rb 1 ws 5 1 24 rb 2 ws 2 1
24 rb 3 ws 3 1 24 rb 4 ws 4 1
24 rb 5 ws 1 1 24 rb 6 rb 8 1
24 rb 7 rb 6 0 24 rb 8 rb 9 1
24 rb 9 rb 10 1 24 rb 10 rb 7 0
24 ws 1 ws 8 1 24 ws 2 ws 10 0
24 ws 3 ws 6 1 24 ws 4 ws 9 1
24 ws 5 ws 7 0 24 ws 6 rb 1 0
24 ws 7 rb 5 1 24 ws 8 rb 3 0
24 ws 9 rb 4 0 24 ws 10 rb 2 0
;
The first observation, for example, indicates that rough butt female 1 was paired in
the laboratory on day 4 of the experiment with rough butt male 1 and mated. On the
same day rough butt female 7 was paired with whiteside male 8; this pairing did not
result in mating of the animals.
The model adopted by many authors for these data comprises fixed effects for gender
and population, their interaction, and male and female random effects. Specifically,
let πRR , πRW , πW R , and πW W denote the mating probabilities between the popu-
lations, where the first subscript identifies the female partner of the pair. Then, we
model
$$\log\left\{\frac{\pi_{kl}}{1-\pi_{kl}}\right\} = \tau_{kl} + \gamma_f + \gamma_m \qquad k, l \in \{R, W\}$$
where γf and γm are independent random variables representing female and male
random effects (20 each), and τkl denotes the average logit of mating between females
of population k and males of population l.
The following statements fit this model by pseudo-likelihood.
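The statements are not reproduced here; the following sketch is consistent with the description that follows (binary response with EVENT='1', fixed fpop, mpop, and fpop*mpop effects, crossed female and male random intercepts, and least-squares means with the ILINK option).

proc glimmix data=salamander;
   class fpop fnum mpop mnum;
   model mating(event='1') = fpop|mpop / dist=binary;
   random fpop*fnum mpop*mnum;
   lsmeans fpop*mpop / ilink;
run;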
The response variable is the two-level variable mating. Since it is coded as zeros
and ones, and since PROC GLIMMIX models by default the probability of the first
level according to the response level ordering, the EVENT=’1’ option instructs PROC
GLIMMIX to model the probability of a successful mating. The distribution of the
mating variable, conditional on the random effects, is binary.
The fpop*fnum effect in the RANDOM statement creates a random intercept for each
female animal. Because fpop and fnum are CLASS variables, the effect has 20 levels
(10 rb and 10 ws females). Similarly, the mpop*mnum effect creates the random
intercepts for the male animals. Because no TYPE= is specified in the RANDOM
statement, the covariance structure defaults to TYPE=VC. The random effects and
their levels are independent, and each effect has its own variance component. Since
the conditional distribution of the data, conditioned on the random effects, is binary,
no extra scale parameter (φ) is added.
The LSMEANS statement requests least-squares means for the four levels of the
fpop*mpop effect, which are estimates of the cell means in the 2 × 2 classification
of female and male populations. The ILINK option of the LSMEANS statement
requests that the estimated means and standard errors are also reported on the scale
of the data. This yields estimates of the four mating probabilities, πRR , πRW , πW R ,
and πW W .
The “Model Information” table in Output 2.1 displays general information about the
model being fit.
Model Information
The response variable mating follows a binary distribution (conditional on the ran-
dom effects). Hence, the mean of the data is an event probability, π, and the logit of
this probability is linearly related to the linear predictor of the model. The variance
function is the default function that is implied by the distribution, a(π) = π(1 − π).
The variance matrix is not blocked, since the GLIMMIX procedure does not process
the data by subjects (see the “Processing by Subjects” section for details). The es-
timation technique is the default method for GLMMs, residual pseudo-likelihood
(METHOD=RSPL), and degrees of freedom for tests and confidence intervals are
determined by the containment method.
The “Class Level Information” table in Output 2.2 lists the levels of the variables
listed in the CLASS statement, as well as the order of the levels.
fpop 2 rb ws
fnum 10 1 2 3 4 5 6 7 8 9 10
mpop 2 rb ws
mnum 10 1 2 3 4 5 6 7 8 9 10
Note that there are two female populations and two male populations; also, the
variables fnum and mnum have 10 levels each. As a consequence, the effects
fpop*fnum and mpop*mnum identify the 20 females and males, respectively. The
effect fpop*mpop identifies the four mating types.
Response Profile

Ordered Value    mating    Total Frequency
1                0         50
2                1         70
The “Response Profile Table” in Output 2.3, which is displayed for binary or multi-
nomial data, lists the levels of the response variable and their order. With binary data,
the table also provides information about which level of the response variable de-
fines the event. Because of the EVENT=’1’ response variable option in the MODEL
statement, the probability being modeled is that of the higher ordered value.
Dimensions
There are two covariance parameters in this model, the variance of the fpop*fnum
effect and the variance of the mpop*mnum effect (Output 2.4). Both parameters
are modeled as G-side parameters. The nine columns in the X matrix comprise the
intercept, two columns each for the levels of the fpop and mpop effects, and four
columns for their interaction. The Z matrix has forty columns, one for each animal.
Because the data are not processed by subjects, PROC GLIMMIX assumes the data
consist of a single subject (a single block in V).
In Output 2.5, the “Optimization Information” table displays basic information about
the optimization.
Optimization Information
The default technique for GLMMs is the Quasi-Newton method. There are two pa-
rameters in the optimization, which correspond to the two variance components. The
17 fixed effects parameters are not part of the optimization. The initial optimization
computes pseudo-data based on the response values in the data set rather than from
estimates of a generalized linear model fit.
[Iteration History: Iteration, Restarts, Subiterations, Objective Function, Change, Max Gradient]
The GLIMMIX procedure performs eight optimizations after the initial optimization
(Output 2.6). That is, following the initial pseudo-data creation, the pseudo-data were
updated eight more times and a total of nine linear mixed models were estimated.
[Covariance Parameter Estimates: Cov Parm, Estimate, Standard Error]
The “Covariance Parameter Estimates” table in Output 2.7 lists the estimates for
the two variance components and their estimated standard errors. The heterogeneity
(in the logit of the mating probabilities) among the females is considerably larger
than the heterogeneity among the males.
[Type III Tests of Fixed Effects: Effect, Num DF, Den DF, F Value, Pr > F]
The “Type III Tests of Fixed Effects” table indicates a significant interaction between
the male and female populations (Output 2.8). A comparison in the logits of mating
success in pairs with R-females and W-females depends on whether the male partner
in the pair is the same species. The “mating Least Squares Means” table in Output
2.9 shows this effect more clearly.
[mating Least Squares Means: fpop, mpop, Estimate, Standard Error, DF, t Value, Pr > |t|, Mean, Standard Error Mean]
In a pairing with a male rough butt salamander, the logit drops sharply from 1.1629
to −1.4119 when the male is paired with a whiteside female instead of a female
from its own population. The corresponding estimated probabilities of mating suc-
cess are $\hat{\pi}_{RR} = 0.7619$ and $\hat{\pi}_{WR} = 0.1959$. If the same comparisons are made
in pairs with whiteside males, then you also notice a drop in the logit if the female
comes from a different population, 1.0151 versus 0.7839. The change is consid-
erably less, though, corresponding to mating probabilities of $\hat{\pi}_{WW} = 0.7340$ and
$\hat{\pi}_{RW} = 0.6865$. Whiteside females appear to be successful with their own popula-
tion. Whiteside males appear to succeed equally well with female partners of the two
populations.
This insight into the factor level comparisons can be amplified by graphing the least-
squares mean comparisons and by subsetting the differences of least-squares means.
This is accomplished with the following statements.
ods html;
ods graphics on;
ods select DiffPlot SliceDiffs;
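The PROC GLIMMIX statements are not reproduced; the following sketch is consistent with the description that follows (a first LSMEANS statement that requests the difference plot, and a second LSMEANS statement with the SLICEDIFF= option, slicing by the male and by the female population).

proc glimmix data=salamander;
   class fpop fnum mpop mnum;
   model mating(event='1') = fpop|mpop / dist=binary;
   random fpop*fnum mpop*mnum;
   lsmeans fpop*mpop / plots=diffplot;
   lsmeans fpop*mpop / slicediff=(mpop fpop);
run;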
In Output 2.11, the “Simple Effect Comparisons” tables show the results of the
SLICEDIFF= option in the second LSMEANS statement.
[Simple Effect Comparisons of fpop*mpop Least Squares Means, sliced by mpop and by fpop: Simple Effect Level, Estimate, Standard Error, DF, t Value, Pr > |t|]
The first table of simple effects comparisons holds fixed the level of the mpop factor
and compares the levels of the fpop factor. Since there is only one possible compari-
son for each male population, there are two entries in the table. The first compares the
logits of mating probabilities when the male partner is a rough butt, and the second
entry applies when the male partner is from the whiteside population. The second
table of simple effects comparisons applies the same logic, but holds fixed the level
of the female partner in the pair. Note that these four comparisons are a subset of
all six possible comparisons, eliminating those where both factors are varied at the
same time. The simple effect comparisons show that there is no difference in mating
probabilities if the male partner is a whiteside salamander, or if the female partner is
a rough butt. Rough butt females also appear to mate indiscriminately.
Example 3. Smoothing Disease Rates; Standardized Mortality Ratios
The model for the county-specific relative risks $\lambda_i$ of lip cancer is

$$\log\{\lambda_i\} = \beta_0 + \beta_1 x_i + \gamma_i, \qquad x_i = \text{employment}_i / 10$$

where $\beta_0$ and $\beta_1$ are fixed effects, and the $\gamma_i$ are county random effects.
The following DATA step creates the data set lipcancer. The expected number of
cases is based on the observed standardized mortality ratio for counties with lip can-
cer cases, and on the expected counts reported by Clayton and Kaldor (1987, Table 1)
for the counties without cases. The sum of the expected counts then equals the sum
of the observed counts.
data lipcancer;
input county observed expected employment SMR;
if (observed > 0) then expCount = 100*observed/SMR;
else expCount = expected;
datalines;
1 9 1.4 16 652.2
2 39 8.7 16 450.3
3 11 3.0 10 361.8
4 9 2.5 24 355.7
5 15 4.3 10 352.1
6 8 2.4 24 333.3
7 26 8.1 10 320.6
8 7 2.3 7 304.3
9 6 2.0 7 303.0
10 20 6.6 16 301.7
11 13 4.4 7 295.5
12 5 1.8 16 279.3
13 3 1.1 10 277.8
14 8 3.3 24 241.7
15 17 7.8 7 216.8
16 9 4.6 16 197.8
17 2 1.1 10 186.9
18 7 4.2 7 167.5
19 9 5.5 7 162.7
20 7 4.4 10 157.7
21 16 10.5 7 153.0
22 31 22.7 16 136.7
23 11 8.8 10 125.4
24 7 5.6 7 124.6
25 19 15.5 1 122.8
26 15 12.5 1 120.1
27 7 6.0 7 115.9
28 10 9.0 7 111.6
29 16 14.4 10 111.3
30 11 10.2 10 107.8
31 5 4.8 7 105.3
32 3 2.9 24 104.2
33 7 7.0 10 99.6
34 8 8.5 7 93.8
35 11 12.3 7 89.3
36 9 10.1 0 89.1
37 11 12.7 10 86.8
38 8 9.4 1 85.6
39 6 7.2 16 83.3
40 4 5.3 0 75.9
41 10 18.8 1 53.3
42 8 15.8 16 50.7
43 2 4.3 16 46.3
44 6 14.6 0 41.0
45 19 50.7 1 37.5
46 3 8.2 7 36.6
47 2 5.6 1 35.8
48 3 9.3 1 32.1
49 28 88.7 0 31.6
50 6 19.6 1 30.6
51 1 3.4 1 29.1
52 1 3.6 0 27.6
53 1 5.7 1 17.4
54 1 7.0 1 14.2
55 0 4.2 16 0.0
56 0 1.8 10 0.0
;
Since the mean of the Poisson variates, conditional on the random effects, is $\mu_i = E_i \lambda_i$, applying a log link yields

$$\log\{\mu_i\} = \log\{E_i\} + \log\{\lambda_i\}$$

so that $\log\{E_i\}$ enters the linear predictor as an offset. The offset variable is computed with the programming statement

   logn = log(expCount);

and associated with the linear predictor through the OFFSET= option of the MODEL
statement. The statement

   x = employment / 10;

scales the covariate, and a separate programming statement computes the fitted standardized
mortality ratio by exponentiating the appropriate portion of the linear predictor. Note that
the offset variable does not contribute to the term being exponentiated.
The OUTPUT statement saves results of the calculations to the output data set glim-
mixout. The ID statement specifies that only the variables listed in it are written to the
output data set.
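The model-fitting statements themselves are not reproduced above; the following sketch is consistent with the description (Poisson response with the logn offset, the scaled covariate x, a random county effect, DDFM=NONE, and an OUTPUT statement with an ID statement). The programming statement that computes the fitted SMR is not shown, and the PRED(ILINK)= keyword used here to obtain predicted mean counts is an assumption.

proc glimmix data=lipcancer;
   class county;
   x    = employment / 10;
   logn = log(expCount);
   model observed = x / dist=poisson offset=logn solution ddfm=none;
   random county;
   output out=glimmixout pred(ilink)=PredCount;
   id county employment SMR;
run;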
Model Information
The GLIMMIX procedure displays in the “Model Information” table that the offset
variable was computed with programming statements and the final assignment state-
ment from your GLIMMIX statements (Output 3.1).
county 56 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
19 20 21 22 23 24 25 26 27 28 29 30 31 32 33
34 35 36 37 38 39 40 41 42 43 44 45 46 47 48
49 50 51 52 53 54 55 56
Dimensions
There are two columns in the X matrix, corresponding to the intercept and the regres-
sor x/10. There are 56 columns in the Z matrix, however, one for each observation
in the data set (Output 3.3).
Optimization Information
The optimization involves only a single covariance parameter, the variance of the
county effect (Output 3.4). Because this parameter is a variance, the GLIMMIX pro-
cedure imposes a lower boundary constraint; the solution for the variance is bounded
by zero from below.
[Iteration History: Iteration, Restarts, Subiterations, Objective Function, Change, Max Gradient]
Following the initial creation of pseudo-data and the fit of a linear mixed model, the
procedure goes through five more updates of the pseudo-data, each associated with a
separate optimization (Output 3.5). Although the objective function in each optimiza-
tion is the negative of twice the restricted maximum likelihood for that pseudo-data,
there is no guarantee that across the outer iterations the objective function decreases
in subsequent optimizations. In this example, minus twice the residual maximum
likelihood at convergence takes on its smallest value at the initial optimization and
increases in subsequent optimizations.
[Covariance Parameter Estimates: Cov Parm, Estimate, Standard Error]
The “Covariance Parameter Estimates” table in Output 3.6 shows the estimate of
the variance of the region-specific log-relative risks. There is significant county-to-
county heterogeneity in risks. If the covariate were removed from the analysis, as in
Clayton and Kaldor (1987), the heterogeneity in county-specific risks would increase.
(The fitted SMRs in Table 6 of Breslow and Clayton (1993) were obtained without
the covariate x in the model.)
[Solutions for Fixed Effects: Effect, Estimate, Standard Error, DF, t Value, Pr > |t|]
The “Solutions for Fixed Effects” table in Output 3.7 displays the estimates of β0 and
β1 along with their standard errors and test statistics. Because of the DDFM=NONE
option in the MODEL statement, PROC GLIMMIX assumes that the degrees of free-
dom for the t tests of H0 : βj = 0 are infinite. The p-values correspond to probabilities
under a standard normal distribution. The covariate measuring employment percent-
ages in agriculture, fisheries, and forestry is significant. This covariate may be a
surrogate for the exposure to sunlight, an important risk factor for lip cancer.
You can examine the quality of the fit of this model with various residual plots. A
panel of studentized residuals is requested with the following statements.
ods html;
ods graphics on;
ods select StudentPanel;
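A sketch of the accompanying PROC GLIMMIX step (the same model as before, with the panel requested through the PLOTS= option) is:

proc glimmix data=lipcancer plots=studentpanel;
   class county;
   x    = employment / 10;
   logn = log(expCount);
   model observed = x / dist=poisson offset=logn ddfm=none;
   random county;
run;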
The graph in the upper left corner of the panel displays studentized residuals against
the linear predictor (Output 3.8). The default of the GLIMMIX procedure is to use
the estimated BLUPs in the construction of the residuals and to present them on the
linear scale, which in this case is the logarithmic scale. You can change the type
of the computed residual with the TYPE= suboptions of each paneled display. For
example, the option PLOTS=STUDENTPANEL(TYPE=NOBLUP) would request a
paneled display of the marginal residuals on the linear scale.
The graph in the upper right-hand corner of the panel shows a histogram with overlaid
normal density. A Q-Q plot and a box plot are shown in the lower cells of the panel.
Figure 22 displays the observed and predicted standardized mortality ratios. Fitted
SMRs tend to be larger than the observed SMRs for counties with small observed
SMR and smaller than observed for counties with high observed SMR.
Figure 22. Observed and Predicted SMRs. Datalabels Indicate Covariate Values
To demonstrate the impact of the random effects adjustment to the log-relative risks,
the following statements fit a Poisson regression model (a GLM) by maximum like-
lihood. The SMRs are county specific to the extent that risks vary according to the
value of the covariate, but risks are no longer adjusted based on county-to-county
heterogeneity in the observed incidence count.
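The statements are not reproduced; a sketch of this Poisson regression without random effects is:

proc glimmix data=lipcancer;
   x    = employment / 10;
   logn = log(expCount);
   model observed = x / dist=poisson offset=logn solution;
run;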
Model Information
Because of the absence of random effects, the GLIMMIX procedure recognizes the
model as a generalized linear model and fits it by maximum likelihood (Output 3.9).
The variance matrix is diagonal, because the observations are uncorrelated.
Dimensions
Columns in X 2
Columns in Z 0
Subjects (Blocks in V) 1
Max Obs per Subject 56
In Output 3.10, the “Dimensions” table shows that there are no G-side random effects
in this model, and no R-side scale parameter either.
Optimization Information
Because this is a GLM, the GLIMMIX procedure defaults to the Newton-Raphson al-
gorithm, and the fixed effects (intercept and slope) comprise the parameters in the op-
timization (Output 3.11).
[Parameter Estimates: Effect, Estimate, Standard Error, DF, t Value, Pr > |t|]
The estimates of $\beta_0$ and $\beta_1$ have changed from the previous analysis. In the GLMM,
the estimates were $\hat{\beta}_0 = -0.4406$ and $\hat{\beta}_1 = 0.6799$ (Output 3.12). More importantly,
without the county-specific adjustments through the best linear unbiased predictors
of the random effects, the predicted SMRs are the same for all counties with the same
percentage of employees in agriculture, fisheries, and forestry (Figure 23).
Example 4. Quasi-Likelihood Estimation for Proportions with Unknown Distribution
The data for this example record the proportion of leaf area affected by blotch for ten
varieties of barley grown at nine sites. The purpose of the analysis is to make comparisons
among the varieties, adjusted for site effects.
data blotch;
array p{9} pct1-pct9;
input variety pct1-pct9;
do site = 1 to 9;
prop = p{site}/100;
output;
end;
drop pct1-pct9;
datalines;
1 0.05 0.00 1.25 2.50 5.50 1.00 5.00 5.00 17.50
2 0.00 0.05 1.25 0.50 1.00 5.00 0.10 10.00 25.00
3 0.00 0.05 2.50 0.01 6.00 5.00 5.00 5.00 42.50
4 0.10 0.30 16.60 3.00 1.10 5.00 5.00 5.00 50.00
5 0.25 0.75 2.50 2.50 2.50 5.00 50.00 25.00 37.50
6 0.05 0.30 2.50 0.01 8.00 5.00 10.00 75.00 95.00
7 0.50 3.00 0.00 25.00 16.50 10.00 50.00 50.00 62.50
8 1.30 7.50 20.00 55.00 29.50 5.00 25.00 75.00 95.00
9 1.50 1.00 37.50 5.00 20.00 50.00 50.00 75.00 95.00
10 1.50 12.70 26.25 40.00 43.50 75.00 75.00 75.00 95.00
;
Little is known about the distribution of the leaf area proportions. The outcomes are
not binomial proportions, since they do not represent the ratio of a count over a total
number of Bernoulli trials. However, since the mean proportion µij for variety j on
site i must lie in the interval [0, 1], you can commence the analysis with a model that
treats Prop as a “pseudo-binomial” variable with $\mathrm{logit}(\mu_{ij}) = \eta_{ij}$.
Here, $\eta_{ij}$ is the linear predictor for variety j on site i, $\alpha_i$ denotes the ith site effect,
and $\tau_j$ denotes the jth barley variety effect. The logit of the expected leaf area
proportions is linearly related to these effects. The variance function of the model is
that of a binomial($n$, $\mu_{ij}$) variable, and $\phi$ is an overdispersion parameter. This model
is fit with the GLIMMIX procedure with the following statements.
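The statements are not shown above; a sketch consistent with the description that follows is:

proc glimmix data=blotch;
   class site variety;
   model prop = site variety / dist=binomial link=logit;
   random _residual_;
   lsmeans variety / diff=control('1');
run;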
The MODEL statement specifies the distribution as binomial and the logit link. Since
the variance function of the binomial distribution is a(µ) = µ(1 − µ), you use the
random _residual_;
statement to specify the scale parameter φ. The LSMEANS statement requests esti-
mates of the least-squares means for the barley variety. The DIFF=CONTROL(’1’)
option requests tests of least-squares means differences against the first variety.
The “Model Information” table in Output 4.1 describes the model and methods used
in fitting the statistical model. It is assumed here that the data are binomial propor-
tions.
Output 4.1. Pseudo-Binomial Analysis
The GLIMMIX Procedure
Model Information
The “Class Level Information” table in Output 4.2 lists the number of levels of the
Site and Variety effects and their values. All 90 observations read from the data are
used in the analysis.
site 9 1 2 3 4 5 6 7 8 9
variety 10 1 2 3 4 5 6 7 8 9 10
In Output 4.3, the “Dimensions” table shows that the model does not contain G-
side random effects. There is a single covariance parameter, which corresponds to
φ. The “Optimization Information” table in Output 4.4 shows that the optimization
comprises 18 parameters. These correspond to the 18 nonsingular columns of the
X0 X matrix.
Dimensions
Covariance Parameters 1
Columns in X 20
Columns in Z 0
Subjects (Blocks in V) 1
Max Obs per Subject 90
Optimization Information
Fit Statistics
There are significant site and variety effects in this model based on the approximate
Type III F tests (Output 4.6).
[Type III Tests of Fixed Effects: Effect, Num DF, Den DF, F Value, Pr > F]
Output 4.7 displays the least-squares means for this analysis. These are obtained by
averaging
$$\mathrm{logit}(\hat{\mu}_{ij}) = \hat{\eta}_{ij}$$
across the sites. In other words, LS-means are computed on the linked scale where the
model effects are additive. Note that the least-squares means are ordered by variety.
The estimate of the expected proportion of infected leaf area for the first variety is
$$\hat{\mu}_{\cdot 1} = \frac{1}{1 + \exp\{4.38\}} = 0.0124$$

and that for the last variety is

$$\hat{\mu}_{\cdot 10} = \frac{1}{1 + \exp\{0.127\}} = 0.468$$
[variety Least Squares Means: Estimate, Standard Error, DF, t Value, Pr > |t|]
Because of the ordering of the least-squares means, the differences against the first
variety are also ordered from smallest to largest (Output 4.8).
[Differences of variety Least Squares Means against the control (variety 1): Estimate, Standard Error, DF, t Value, Pr > |t|]
This analysis depends on your choice of variance function, which was implied by
the binomial distribution. You can diagnose the distributional assumption by examin-
ing various graphical diagnostic measures. The following statements request a panel
display of the Pearson-type residuals.
ods html;
ods graphics on;
ods select PearsonPanel;
Output 4.9 clearly indicates that the chosen variance function is not appropriate for
these data. As µ approaches zero or one, the variability in the residuals is less than
that implied by the binomial variance function. To remedy this situation, McCullagh
and Nelder (1989) consider instead the variance function

$$a(\mu) = \mu^2(1-\mu)^2$$

Imagine two varieties with $\mu_{\cdot i} = 0.1$ and $\mu_{\cdot k} = 0.5$. Under the binomial variance
function, the variance of the proportion for variety k is 2.77 times larger than that for
variety i. Under the revised model this ratio increases to $2.77^2 = 7.67$.
The analysis of the revised model is obtained with the next set of GLIMMIX state-
ments. Since you need to model a variance function that does not correspond to
any of the built-in distributions, you need to supply a function with an assignment
to the automatic variable _VARIANCE_. The GLIMMIX procedure then considers
the distribution of the data as unknown. The corresponding estimation technique is
quasi-likelihood. Since this model does not include an extra scale parameter, you can
drop the RANDOM _RESIDUAL_ statement from the analysis.
ods html;
ods graphics on;
ods select ModelInfo FitStatistics LSMeans Diffs PearsonPanel;
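The statements for the revised model are not reproduced; the following sketch assumes the $[\mu(1-\mu)]^2$ variance function discussed above, assigned through the automatic variable _VARIANCE_. The use of the automatic mean variable _MU_ and of PLOTS=PEARSONPANEL are assumptions based on the "Automatic Variables" section and the requested Pearson panel.

proc glimmix data=blotch plots=pearsonpanel;
   class site variety;
   _variance_ = _mu_**2 * (1 - _mu_)**2;   /* [mu(1-mu)]^2 variance function */
   model prop = site variety / link=logit;
   lsmeans variety / diff=control('1');
run;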
The “Model Information” table in Output 4.10 now displays the distribution as
“Unknown”, because of the assignment made in the GLIMMIX statements to
_VARIANCE_. The table also shows the expression evaluated as the variance func-
tion.
Output 4.10. Quasi-Likelihood Analysis
The GLIMMIX Procedure
Model Information
The fit statistics of the model are now expressed in terms of the log quasi-likelihood.
It is computed as

$$\sum_{i=1}^{9} \sum_{j=1}^{10} \int_{y_{ij}}^{\mu_{ij}} \frac{y_{ij} - t}{t^2(1-t)^2}\, dt$$
Twice the negative of this sum equals −85.74, which is displayed in the “Fit
Statistics” table (Output 4.11).
The scaled Pearson statistic is now 0.99. Inclusion of an extra scale parameter φ
would have little or no effect on the results.
Fit Statistics
The panel of Pearson-type residuals now shows a much more adequate distribution
for the residuals and a reduction in the number of outlying residuals (Output 4.12).
The least-squares means are no longer ordered in size by variety (Output 4.13). For
example, logit(µ̂·1) > logit(µ̂·2). Under the revised model, the second variety has
a greater percentage of its leaf area covered by blotch, compared to the first variety.
Varieties 5 and 6 and varieties 8 and 9 show similar reversals in ranking.
variety    Estimate    Standard Error    DF    t Value    Pr > |t|
Interestingly, the standard errors are constant among the LS-means (Output 4.13),
and among the LS-means differences (Output 4.14). This is due to the fact that for
the logit link
\[ \frac{\partial\mu}{\partial\eta} = \mu(1-\mu) \]
which cancels with the square root of the variance function in the estimating
equations. The analysis is thus orthogonal.
variety    _variety    Estimate    Standard Error    DF    t Value    Pr > |t|
Example 5. Joint Modeling of Binary and Count Data

data hernio;
input patient age gender$ OKstatus leave los;
datalines;
1 78 m 1 0 9
2 60 m 1 0 4
3 68 m 1 1 7
4 62 m 0 1 35
5 76 m 0 0 9
6 76 m 1 1 7
7 64 m 1 1 5
8 74 f 1 1 16
9 68 m 0 1 7
10 79 f 1 0 11
11 80 f 0 1 4
12 48 m 1 1 9
13 35 f 1 1 2
14 58 m 1 1 4
15 40 m 1 1 3
16 19 m 1 1 4
17 79 m 0 0 3
18 51 m 1 1 5
19 57 m 1 1 8
20 51 m 0 1 8
21 48 m 1 1 3
22 48 m 1 1 5
23 66 m 1 1 8
24 71 m 1 0 2
25 75 f 0 0 7
26 2 f 1 1 0
27 65 f 1 0 16
28 42 f 1 0 3
29 54 m 1 0 2
30 43 m 1 1 3
31 4 m 1 1 3
32 52 m 1 1 8
;
While the response variable los is a Poisson count variable, the response variable
leave is a binary variable. You can perform separate analyses for the two outcomes,
for example, by fitting a logistic model for the operating room exit condition and
a Poisson regression model for the length of hospital stay. This, however, would
ignore the correlation between the two outcomes. Intuitively, you would expect that
the length of post-operative hospital stay is longer for those patients who had more
tenuous exit conditions.
The following DATA step converts the data set hernio from the multivariate form
to the univariate form. In the multivariate form the responses are stored in separate
variables. The GLIMMIX procedure requires the univariate data structure.
data hernio_uv;
length dist $7;
set hernio;
response = (leave=1);
dist = "Binary";
output;
response = los;
dist = "Poisson";
output;
keep patient age OKstatus response dist;
run;
This DATA step expands the 32 observations in data set hernio into 64 observations,
stacking two observations per patient. The character variable dist identifies the dis-
tribution that is assumed for the respective observations within a patient. The first
observation for each patient corresponds to the binary response.
The following GLIMMIX statements fit a logistic regression model with two regres-
sors (age and OKstatus) to the binary observations.
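These statements are not reproduced in this excerpt; a minimal sketch consistent with the description that follows is shown below. The SOLUTION option, which requests the table of fixed-effects estimates, is an assumption.

proc glimmix data=hernio;
   model leave(event='1') = age OKstatus / dist=binary solution;
run;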
The EVENT=('1') response option requests that PROC GLIMMIX model the probability
Pr(leave = 1), that is, the probability of routine recovery. The fit statistics
and parameter estimates for this univariate analysis are shown in Output 5.1. The
coefficient for the age effect is negative (−0.07725) and marginally significant at the
5% level (p = 0.0491). The negative sign indicates that the probability of routine
recovery decreases with age. The coefficient for the OKStatus variable is also neg-
ative. Its large standard error and the p-value of 0.7341 indicate, however, that this
regressor is not significant.
Fit Statistics
Parameter Estimates
Effect    Estimate    Standard Error    DF    t Value    Pr > |t|
Based on the univariate logistic regression analysis, you would probably want to re-
visit the model, examine other regressor variables, test for gender effects and inter-
actions, and so forth. For this example, we are content with the two-regressor model.
It will be illustrative to trace the relative importance of the two regressors through
various types of models.
The next statements fit the same regressors to the count data.
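A sketch of these statements, under the same assumptions as the previous sketch:

proc glimmix data=hernio;
   model los = age OKstatus / dist=poisson solution;
run;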
Fit Statistics
Parameter Estimates
Effect    Estimate    Standard Error    DF    t Value    Pr > |t|
For this response, both regressors appear to make significant contributions at the 5%
significance level (Output 5.2). The signs of the coefficients seem appropriate: the
length of hospital stay should increase with patient age and be shorter for patients
with better pre-operative health. The magnitude of the scaled Pearson statistic (4.48)
indicates, however, that there is considerable overdispersion in this model. This could
be due to omitted variables or an improper distributional assumption. The importance
of pre-operative health status, for example, can change with a patient’s age, which
could call for an interaction term.
You can also model both responses jointly. The following statements request a mul-
tivariate analysis.
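A sketch of the joint specification described in the following paragraph; the DIST=BYOBS(dist), NOINT, and CLASS dist elements are taken from that description, while the exact arrangement of the interaction effects is an assumption.

proc glimmix data=hernio_uv;
   class dist;
   model response = dist dist*age dist*OKstatus /
         noint solution dist=byobs(dist);
run;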
The DIST=BYOBS option in the MODEL statement instructs the GLIMMIX proce-
dure to examine the variable dist in order to identify the distribution of an observa-
tion. The variable can be character or numeric. See the DIST= option of the MODEL
statement for a list of the numeric codes for the various distributions that are com-
patible with the DIST=BYOBS formulation. Since no LINK= option is specified, the
link functions are chosen as the default links that correspond to the respective distri-
butions. In this case, the LOGIT link is applied to the binary observations and the
LOG link is applied to the Poisson outcomes. The dist variable is also listed in the
CLASS statement, which enables you to use interaction terms in the MODEL state-
ment to vary the regression coefficients by response distribution. The NOINT option
is used here so that the parameter estimates of the joint model are directly comparable
to those in Output 5.1 and Output 5.2.
The “Fit Statistics” and “Parameter Estimates” tables of this bivariate estimation pro-
cess are shown in Output 5.3.
Fit Statistics
Parameter Estimates
Effect    dist    Estimate    Standard Error    DF    t Value    Pr > |t|
The “Fit Statistics” table now contains a separate column for each response distribu-
tion, as well as an overall contribution. Since the model does not specify any random
effects or R-side correlations, the log likelihoods are additive. The parameter esti-
mates and their standard errors in this joint model are identical to those in Output 5.1
and Output 5.2. The p-values reflect the larger “sample size” in the joint analysis.
Note that the coefficients would be different from the separate analyses, if the dist
variable had not been used to form interactions with the model effects.
There are two ways in which the correlations between the two responses for the same
patient can be incorporated. You can induce them through shared random effects or
model the dependency directly. The following statements fit a model that induces cor-
relation: you add the patient variable to the CLASS statement and add a RANDOM
statement.
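A sketch of this shared-random-effects model, with the fixed-effects specification carried over from the previous sketch:

proc glimmix data=hernio_uv;
   class patient dist;
   model response = dist dist*age dist*OKstatus /
         noint solution dist=byobs(dist);
   random intercept / subject=patient;
run;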
Fit Statistics
Cov Parm    Subject    Estimate    Standard Error
Effect    dist    Estimate    Standard Error    DF    t Value    Pr > |t|
The “Fit Statistics” table in Output 5.4 no longer has separate columns for each response
distribution, since the data are not independent. The log (pseudo-)likelihood
does not factor into additive components that correspond to the distributions. Instead,
it factors into components associated with the subjects.
The estimate of the variance of the random patient intercept is 0.2990, and the esti-
mated standard error of this variance component estimate is 0.1116. There appears to
be significant patient-to-patient variation in the intercepts. The estimates of the fixed
effects as well as their estimated standard errors have changed from the bivariate-
independence analysis (see Output 5.3). When the length of hospital stay and the
post-operative condition are modeled jointly, the pre-operative health status (variable
OKStatus) no longer appears significant. Compare this result to Output 5.3; in the
separate analyses the initial health status was a significant predictor of the length
of hospital stay. A further joint analysis of these data would probably remove this
predictor from the model entirely.
A joint model of the second kind, where correlations are modeled directly, is fit with
the following GLIMMIX statements.
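A sketch of the direct (R-side) specification; the TYPE=CHOL structure is an assumption suggested by the Cholesky root reported in Output 5.5.

proc glimmix data=hernio_uv;
   class patient dist;
   model response = dist dist*age dist*OKstatus /
         noint solution dist=byobs(dist);
   random _residual_ / subject=patient type=chol;
run;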
Fit Statistics
Cov Parm    Subject    Estimate    Standard Error
Effect    dist    Estimate    Standard Error    DF    t Value    Pr > |t|
The “Covariance Parameter Estimates” table in Output 5.5 contains three entries for
this model, corresponding to a (2 × 2) covariance matrix for each patient. The
Cholesky root of the R matrix is
\[ \mathbf{L} = \begin{bmatrix} 1.0162 & 0 \\ 0.3942 & 2.0819 \end{bmatrix} \]
This is not the covariance matrix of the data, however, since the variance functions
need to be accounted for.
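As a quick arithmetic check, the matrix implied by this Cholesky factor is

\[ \mathbf{L}\mathbf{L}' = \begin{bmatrix} 1.0162 & 0 \\ 0.3942 & 2.0819 \end{bmatrix}
   \begin{bmatrix} 1.0162 & 0.3942 \\ 0 & 2.0819 \end{bmatrix}
   \approx \begin{bmatrix} 1.033 & 0.401 \\ 0.401 & 4.490 \end{bmatrix} \]

so the implied variances on the diagonal (about 1.03 and 4.49) differ considerably between the two response types, even before the variance functions are accounted for.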
The p-values in the “Solutions for Fixed Effects” table indicate the same pattern of
significance and nonsignificance as in the conditional model with random patient
intercepts.
Example 6. Radial Smoothing of Repeated Measures Data
data times;
input time1-time23;
datalines;
122 150 166 179 219 247 276 296 324 354 380 445
478 508 536 569 599 627 655 668 723 751 781
;
data cows;
if _n_ = 1 then merge times;
array t{23} time1 - time23;
array w{23} weight1 - weight23;
input cow iron infection weight1-weight23 @@;
do i=1 to 23;
weight = w{i};
tpoint = (t{i}-t{1})/10;
output;
end;
keep cow iron infection tpoint weight;
datalines;
1 0 0 4.7 4.905 5.011 5.075 5.136 5.165 5.298 5.323
5.416 5.438 5.541 5.652 5.687 5.737 5.814 5.799
5.784 5.844 5.886 5.914 5.979 5.927 5.94
2 0 0 4.868 5.075 5.193 5.22 5.298 5.416 5.481 5.521
5.617 5.635 5.687 5.768 5.799 5.872 5.886 5.872
5.914 5.966 5.991 6.016 6.087 6.098 6.153
3 0 0 4.868 5.011 5.136 5.193 5.273 5.323 5.416 5.46
5.521 5.58 5.617 5.687 5.72 5.753 5.784 5.784
5.784 5.814 5.829 5.872 5.927 5.9 5.991
4 0 0 4.828 5.011 5.136 5.193 5.273 5.347 5.438 5.561
5.541 5.598 5.67 . 5.737 5.844 5.858 5.872
5.886 5.927 5.94 5.979 6.052 6.028 6.12
5 1 0 4.787 4.977 5.043 5.136 5.106 5.298 5.298 5.371
5.438 5.501 5.561 5.652 5.67 5.737 5.784 5.768
5.784 5.784 5.829 5.858 5.914 5.9 5.94
6 1 0 4.745 4.868 5.043 5.106 5.22 5.298 5.347 5.347
5.416 5.501 5.561 5.58 5.687 5.72 5.737 5.72
5.737 5.753 5.768 5.784 5.844 5.844 5.9
7 1 0 4.745 4.905 5.011 5.106 5.165 5.273 5.371 5.416
5.416 5.521 5.541 5.635 5.687 5.704 5.784 5.768
5.768 5.814 5.829 5.858 5.94 5.94 6.004
8 0 1 4.942 5.106 5.136 5.193 5.298 5.347 5.46 5.521
5.561 5.58 5.635 5.704 5.784 5.823 5.858 5.9
5.94 5.991 6.016 6.064 6.052 6.016 5.979
9 0 1 4.605 4.745 4.868 4.905 4.977 5.22 5.165 5.22
The mean response profiles of the cows are not of particular interest; what mat-
ters are inferences about the Iron effect, the Infection effect, and their interaction.
Nevertheless, the body weight of the cows changes over the 22-month period, and
you need to account for these changes in the analysis. A reasonable approach is to
apply the approximate low-rank smoother to capture the trends over time. This ap-
proach frees you from having to stipulate a parametric model for the response trajec-
tories over time. In addition, you can test hypotheses about the smoothing parameter;
for example, whether it should be varied by treatment.
The following statements fit a model with a 2 × 2 factorial treatment structure and
smooth trends over time, choosing the Newton-Raphson algorithm with ridging for
the optimization.
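The statements referred to here are not reproduced in this excerpt. The sketch below is consistent with the description that follows; the rescaling t2 = tpoint/100 is inferred from the knot values reported in Output 6.3, and the name of the prediction variable in the OUTPUT statement is an assumption.

data cows;
   set cows;
   t2 = tpoint / 100;             /* rescaled time for the RANDOM effect (divisor inferred) */
run;

proc glimmix data=cows;
   class cow iron infection;
   model weight = iron infection iron*infection tpoint;
   random t2 / type=rsmooth
               subject=cow
               knotmethod=kdtree(bucket=100 knotinfo);
   output out=gmxout pred=pred;   /* prediction variable name is assumed */
   nloptions tech=newrap;
run;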
The continuous time effect appears in both the MODEL (tpoint) and the RANDOM
statement (t2). Because the variance of the radial smoothing component is dependent
on the temporal metric, the time scale was rescaled for the RANDOM effect to move
the parameter estimate away from the boundary. The knots of the radial smoother
are selected as the vertices of a k-d tree. Specifying BUCKET=100 sets the bucket
size of the tree to b = 100. Since measurements at each time point are available for
26 (or 25) cows, this groups approximately four time points in a single bucket. The
KNOTINFO keyword of the KNOTMETHOD= option requests a printout of the knot
locations for the radial smoother. The OUTPUT statement saves the predictions of
the mean of each observation to the data set gmxout. Finally, the TECH=NEWRAP
option in the NLOPTIONS statement specifies the Newton-Raphson algorithm for
the optimization technique.
Output 6.1. Repeated Measures Analysis
The GLIMMIX Procedure
Model Information
The “Class Level Information” table in Output 6.2 lists the number of levels of the
Cow, Iron, and Infection effects.
Class Level Information
Class        Levels    Values
cow              26    1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
iron              2    0 1
infection        2    0 1
In Output 6.3, the “Radial Smoother Knots for RSmooth(t2)” table displays the knots
computed from the vertices of the t2 k-d tree. Notice that knots are spaced unequally
and that the extreme time points are among the knot locations. The “Number of
Observations” table shows that one observation was not used in the analysis. The
twelfth observation for cow 4 has a missing value.
Radial Smoother Knots for RSmooth(t2)
Knot Number         t2
1 0
2 0.04400
3 0.1250
4 0.2020
5 0.3230
6 0.4140
7 0.5050
8 0.6010
9 0.6590
The “Dimensions” table in Output 6.4 shows that the model contains only two co-
variance parameters, the G-side variance of the spline coefficients (σ²) and the R-side
scale parameter (φ). For each subject (cow), there are nine columns in the Z matrix,
one per knot location. The GLIMMIX procedure processes these data by subjects
(cows).
Dimensions
Optimization Information
After 11 iterations, the optimization process terminates (Output 6.6). In this case, the
absolute gradient convergence criterion was met.
Iteration History
Iteration    Restarts    Evaluations    Objective Function    Change    Max Gradient
0 0 4 -1302.549272 . 20.33682
1 0 3 -1451.587367 149.03809501 9.940495
2 0 3 -1585.640946 134.05357887 4.71531
3 0 3 -1694.516203 108.87525722 2.176741
4 0 3 -1775.290458 80.77425512 0.978577
5 0 3 -1829.966584 54.67612585 0.425724
6 0 3 -1862.878184 32.91160012 0.175992
7 0 3 -1879.329133 16.45094874 0.066061
8 0 3 -1885.175082 5.84594887 0.020137
9 0 3 -1886.238032 1.06295072 0.00372
10 0 3 -1886.288519 0.05048659 0.000198
11 0 3 -1886.288673 0.00015425 6.364E-7
The generalized chi-square statistic in the “Fit Statistics” table is small for this model
(Output 6.7). There is very little residual variation. The radial smoother is associated
with 433.55 residual degrees of freedom, computed as 597 minus the trace of the
smoother matrix.
Fit Statistics
The “Covariance Parameter Estimates” table in Output 6.8 displays the solutions of
the covariance parameters. The variance of the random spline coefficients is estimated
as σ̂² = 0.5961, and the scale parameter (the residual variance) estimate is φ̂ = 0.0008.
Cov Parm    Subject    Estimate    Standard Error
The “Type III Tests of Fixed Effects” table in Output 6.9 displays F tests for the fixed
effects in the MODEL statement.
Effect    Num DF    Den DF    F Value    Pr > F
The smoothing parameter in this analysis is related to the covariance parameter esti-
mates. Because there is only one radial smoothing variance component, the amount
of smoothing is the same in all four treatment groups. To test whether the smooth-
ing parameter should be varied by group, you can refine the analysis of the previous
model. The following statements fit the same general model, but vary the covariance
parameters by the levels of the Iron*Infection interaction. This is accomplished with
the GROUP= option of the RANDOM statement.
proc glimmix data=cows;
   /* opening statements reconstructed here; assumed to match the first smoothing model */
   class cow iron infection;
   model weight = iron infection iron*infection tpoint;
   random t2 / type=rsmooth
               subject=cow
               group=iron*infection
               knotmethod=kdtree(bucket=100);
   nloptions tech=newrap;
run;
All observations that have the same value combination of the Iron and Infection
effects share the same covariance parameter. As a consequence, different smoothing
parameters result in the four groups.
In Output 6.10, the “Optimization Information” table shows that there are now four
covariance parameters in the optimization, one spline coefficient variance for each
group.
Optimization Information
Varying this variance component by groups has changed the −2 Res Log Likelihood
from −1886.29 to −1887.95 (Output 6.11). The difference, 1.66, can be viewed
(asymptotically) as the realization of a chi-square random variable with 3 degrees of
freedom. The difference is not significant (p = 0.64586).
Fit Statistics
The “Covariance Parameter Estimates” table in Output 6.12 confirms that the esti-
mates of the spline coefficient variance are quite similar in the four groups, ranging
from 0.4788 to 0.7105.
Cov Parm    Subject    Group    Estimate    Standard Error
Finally, you can apply a different technique for varying the temporal trends among the
cows. From Figure 24 it appears that an assumption of parallel trends within groups
might be reasonable. In other words, you can fit a model in which the “overall” trend
over time in each group is modeled nonparametrically, and this trend is shifted up or
down to capture the behavior of the individual cow. You can accomplish this with the
following statements.
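The statements are not shown in this excerpt; the sketch below follows the description in the next paragraph and assumes the rescaled time variable t2 created for the first model.

proc glimmix data=cows;
   class cow iron infection;
   model weight = iron infection iron*infection tpoint;
   random t2 / type=rsmooth
               subject=iron*infection        /* one set of spline coefficients per condition */
               knotmethod=kdtree(bucket=100);
   random intercept / subject=cow;           /* parallel shift for each cow */
   nloptions tech=newrap;
run;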
There are now two subject effects in this analysis. The first RANDOM statement
applies the radial smoothing and identifies the experimental conditions as the sub-
ject. For each condition, a separate realization of the random spline coefficients is
obtained. The second RANDOM statement adds a random intercept to the trend for
each cow. This random intercept results in the parallel shift of the trends over time.
The “Fit Statistics” table is shown in Output 6.13. Because the parallel shift model is
not nested within either one of the previous models, the models cannot be compared
with a likelihood ratio test. However, you can draw on the other fit statistics.
Fit Statistics
All statistics indicate that this model does not fit the data as well as the initial model
that varies the spline coefficients by cow. The Pearson chi-square statistic is more
than twice as large as in the previous model, indicating much more residual variation
in the fit. On the other hand, this model generates only four sets of spline coefficients,
one for each treatment group, and thus retains more residual degrees of freedom.
The “Covariance Parameter Estimates” table in Output 6.14 displays the solutions for
the covariance parameters. The estimate of the variance of the spline coefficients is
not that different from the estimate obtained in the first model (0.5961). The residual
variance, however, has more than doubled.
Cov Parm    Subject    Estimate    Standard Error
The Infection effect continues to be significant in this model; however, p-values are
larger for all fixed effects as compared to the initial model (Output 6.15).
Effect    Num DF    Den DF    F Value    Pr > F
You can clearly see the parallel shifts of the nonparametric smooths in Figure 25. In
the groups receiving only iron or only an infection, the parallel lines assumption holds
quite well. In the control group and the group receiving iron and the infection, the
parallel shift assumption does not hold as well. Two of the profiles in the iron-only
group are nearly indistinguishable.
Among the models considered here, the appropriate model is one with a single smoothing
parameter for all treatment groups and cow-specific spline coefficients.
data FerriteCores;
do Temp = 1 to 4;
do rep = 1 to 5; drop rep;
input MagneticForce @@;
output;
end;
end;
datalines;
10.8 9.9 10.7 10.4 9.7
10.7 10.6 11.0 10.8 10.9
11.9 11.2 11.0 11.1 11.3
11.4 10.7 10.9 11.3 11.7
;
It is of interest to test whether the magnetic force of the cores rises monotonically
with temperature. The approach of Hirotsu and Srivastava (2000) depends on the
lower confidence limits of the isotonic contrasts of the force means at each temper-
ature, adjusted for multiplicity. The corresponding isotonic contrast compares the
average of a particular group and the preceding groups with the average of the succeed-
ing groups. You can compute adjusted confidence intervals for isotonic contrasts
using the LSMESTIMATE statement.
The following statements request an analysis of the FerriteCores data as a one-way
design and multiplicity adjusted lower confidence limits for the isotonic contrasts.
For the multiplicity adjustment, the LSMESTIMATE statement employs simulation,
which provides adjusted p-values and lower confidence limits that are exact up to
Monte Carlo error.
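The statements are not reproduced in this excerpt. A sketch follows; the contrast coefficients implement the three isotonic contrasts described above (scaled to integers, which affects only the scale of the reported estimates, not the tests), and the ADJUST=SIMULATE, CL, and UPPERTAILED options request the simulation-based multiplicity adjustment, confidence limits, and upper-tailed tests. The seed value is arbitrary.

proc glimmix data=FerriteCores;
   class Temp;
   model MagneticForce = Temp;
   /* isotonic contrasts: average of later temperatures minus average of earlier ones */
   lsmestimate Temp
      'avg(2,3,4) - avg(1)'     -6  2  2  2,
      'avg(3,4)   - avg(1,2)'   -3 -3  3  3,
      'avg(4)     - avg(1,2,3)' -2 -2 -2  6
      / adjust=simulate(seed=1) cl uppertailed;
run;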
Effect    Label    Estimate    Standard Error    DF    t Value    Pr > t    Adj P
Effect    Label    Alpha    Lower    Upper    Adj Lower    Adj Upper
With an adjusted p-value of 0.001, the magnetic force at the first temperature is sig-
nificantly less than the average of the other temperatures. Likewise, the average
of the first two temperatures is significantly less than the average of the last two
(p = 0.0009). However, the force at the last temperature is not significantly greater
than the average of the others (p = 0.0625). These results indicate a significant
monotone increase over the first three temperatures, but not across all four tempera-
tures.
References
Akaike, H. (1974), “A New Look at the Statistical Model Identification,” IEEE
Transactions on Automatic Control, AC-19, 716–723.
Bahadur, R. R. (1961), “A Representation of the Joint Distribution of Responses to n
Dichotomous Items,” in Studies in Item Analysis and Prediction, ed. H. Solomon,
Stanford, CA: Stanford University Press.
Beale, E. M. L. (1972), “A Derivation of Conjugate Gradients,” in Numerical
Methods for Nonlinear Optimization, ed. F.A. Lootsma, London: Academic
Press.
Bell, R. M. and McCaffrey, D. F. (2002), “Bias Reduction in Standard Errors
for Linear Regression with Multi-Stage Samples,” Survey Methodology, 28,
169–181.
Bozdogan, H. (1987), “Model Selection and Akaike’s Information Criterion
(AIC): The General Theory and Its Analytical Extensions,” Psychometrika, 52,
345–370.
Breslow, N. E. and Clayton, D. G. (1993), “Approximate Inference in Generalized
Linear Mixed Models,” Journal of the American Statistical Association, 88,
9–25.
Fuller, W. A. (1976), Introduction to Statistical Time Series, New York: John Wiley
& Sons, Inc.
Games, P. A., and Howell, J. F. (1976), “Pairwise Multiple Comparison Procedures
With Unequal n’s and/or Variances: A Monte Carlo Study,” Journal of
Educational Statistics, 1, 113–125.
Gay, D. M. (1983), “Subroutines for Unconstrained Minimization,” ACM
Transactions on Mathematical Software, 9, 503–524.
Giesbrecht, F. G. and Burns, J. C. (1985), “Two-Stage Analysis Based on a
Mixed Model: Large-sample Asymptotic Theory and Small-Sample Simulation
Results,” Biometrics, 41, 477–486.
Gilliland, D. and Schabenberger, O. (2001), “Limits on Pairwise Association for
Equi-Correlated Binary Variables,” Journal of Applied Statistical Sciences, 10,
279–285.
Goodnight, J. H. (1978), SAS Technical Report R-101, Tests of Hypotheses in Fixed-
Effects Linear Models, Cary, NC: SAS Institute Inc.
Goodnight, J. H. (1978), SAS Technical Report R-105, Computing MIVQUE0
Estimates of Variance Components, Cary, NC: SAS Institute Inc.
Goodnight, J. H. (1979), “A Tutorial on the Sweep Operator,” American Statistician,
33, 149–158.
Goodnight, J. H. and Hemmerle, W. J. (1979), “A Simplified Algorithm for
the W-Transformation in Variance Component Estimation,” Technometrics, 21,
265–268.
Gotway, C. A. and Stroup, W. W. (1997), “A Generalized Linear Model Approach
to Spatial Data and Prediction,” Journal of Agricultural, Biological, and
Environmental Statistics, 2, 157–187.
Guirguis, G. H. and Tobias, R. D. (2004), “On the Computation of the Distribution
for the Analysis of Means,” Communications in Statistics: Simulation and
Computation, 33, 861–888.
Hand, D. J., Daly, F., Lunn, A. D., McConway, K. J., and Ostrowski, E. (1994), A
Handbook of Small Data Sets, London: Chapman and Hall.
Hannan, E. J. and Quinn, B. G. (1979), “The Determination of the Order of an
Autoregression,” Journal of the Royal Statistical Society, Series B, 41, 190–195.
Harville, D. A. and Jeske, D. R. (1992), “Mean Squared Error of Estimation or
Prediction Under a General Linear Model,” Journal of the American Statistical
Association, 87, 724–731.
Hemmerle, W. J. and Hartley, H. O. (1973), “Computing Maximum Likelihood
Estimates for the Mixed AOV Model Using the W-Transformation,”
Technometrics, 15, 819–831.
Henderson, C. R. (1984), Applications of Linear Models in Animal Breeding,
University of Guelph.
Syntax Index

A
ABSCONV option
   NLOPTIONS statement (GLIMMIX), 69
ABSFCONV option
   NLOPTIONS statement (GLIMMIX), 69
ABSGCONV option
   NLOPTIONS statement (GLIMMIX), 69
ABSGTOL option
   NLOPTIONS statement (GLIMMIX), 69
ABSPCONV option
   PROC GLIMMIX statement, 16
ABSTOL option
   NLOPTIONS statement (GLIMMIX), 69
ABSXCONV option
   NLOPTIONS statement (GLIMMIX), 70
ABSXTOL option
   NLOPTIONS statement (GLIMMIX), 70
ADJDFE= option
   ESTIMATE statement (GLIMMIX), 34
   LSMEANS statement (GLIMMIX), 40
   LSMESTIMATE statement (GLIMMIX), 52
ADJUST= option
   ESTIMATE statement (GLIMMIX), 34
   LSMEANS statement (GLIMMIX), 40
   LSMESTIMATE statement (GLIMMIX), 52
ALLSTATS option
   OUTPUT statement (GLIMMIX), 80
ALPHA= option
   ESTIMATE statement (GLIMMIX), 34
   LSMEANS statement (GLIMMIX), 42
   LSMESTIMATE statement (GLIMMIX), 53
   OUTPUT statement (GLIMMIX), 80
   RANDOM statement (GLIMMIX), 87
ASYCORR option
   PROC GLIMMIX statement, 17
ASYCOV option
   PROC GLIMMIX statement, 17
AT MEANS option
   LSMEANS statement (GLIMMIX), 42
   LSMESTIMATE statement (GLIMMIX), 53
AT option
   LSMEANS statement (GLIMMIX), 42, 43
   LSMESTIMATE statement (GLIMMIX), 53

B
BUCKET= suboption
   RANDOM statement (GLIMMIX), 89
BY statement
   GLIMMIX procedure, 29
BYCAT option
   CONTRAST statement (GLIMMIX), 32
   ESTIMATE statement (GLIMMIX), 35
BYCATEGORY option
   CONTRAST statement (GLIMMIX), 32
   ESTIMATE statement (GLIMMIX), 35
BYLEVEL option
   LSMEANS statement (GLIMMIX), 43
   LSMESTIMATE statement (GLIMMIX), 53

C
CHISQ option
   CONTRAST statement (GLIMMIX), 33
   LSMESTIMATE statement (GLIMMIX), 53
   MODEL statement (GLIMMIX), 59
CHOL option
   PROC GLIMMIX statement, 17
CHOLESKY option
   PROC GLIMMIX statement, 17
CL option
   ESTIMATE statement (GLIMMIX), 35
   LSMEANS statement (GLIMMIX), 43
   LSMESTIMATE statement (GLIMMIX), 53
   MODEL statement (GLIMMIX), 59
   RANDOM statement (GLIMMIX), 87
CLASS statement
   GLIMMIX procedure, 30
CONTRAST statement
   GLIMMIX procedure, 30
CORR option
   LSMEANS statement (GLIMMIX), 43
   LSMESTIMATE statement (GLIMMIX), 54
CORRB option
   MODEL statement (GLIMMIX), 59
COV option
   LSMEANS statement (GLIMMIX), 43, 54
COVB option
   MODEL statement (GLIMMIX), 59
COVBI option
   MODEL statement (GLIMMIX), 59

D
DAMPSTEP option
   NLOPTIONS statement (GLIMMIX), 70
DATA= option
   OUTPUT statement (GLIMMIX), 78
   PROC GLIMMIX statement, 17
DDF= option
U
UPD option
   NLOPTIONS statement (GLIMMIX), 76
UPDATE option
   NLOPTIONS statement (GLIMMIX), 76
UPPERB= option
   PARMS statement (GLIMMIX), 86
UPPERTAILED option
   ESTIMATE statement (GLIMMIX), 38
   LSMESTIMATE statement (GLIMMIX), 56

V
V option
   RANDOM statement (GLIMMIX), 99
VC option
   RANDOM statement (GLIMMIX), 100
VCI option
   RANDOM statement (GLIMMIX), 100
VCORR option
   RANDOM statement (GLIMMIX), 100
VI option
   RANDOM statement (GLIMMIX), 100

W
WEIGHT statement
   GLIMMIX procedure, 100

X
XCONV option
   NLOPTIONS statement (GLIMMIX), 76
XSIZE option
   NLOPTIONS statement (GLIMMIX), 77
XTOL option
   NLOPTIONS statement (GLIMMIX), 76

Z
ZETA= option
   MODEL statement (GLIMMIX), 67