A Short Introduction To Linear Mixed Models
Xiang Ao
October 5, 2007
Consider a simple example: students' test scores at different schools. The simplest model is
yij = αj + rij
where
rij ∼ N(0, σ²).
Here i indexes students and j indexes schools. We express a student's score as the sum of a school-specific intercept (αj) and a random error rij associated with student i in school j.
Now the school-level intercepts can be expressed as the sum of an overall mean µ and random deviations from that mean (uj):
αj = µ + uj
where
uj ∼ N(0, σu²).
Putting the two equations together we have a multilevel model:
yij = µ + uj + rij
where
rij ∼ N(0, σ²)
and
uj ∼ N(0, σu²).
We can see that without the extra term uj, we basically have a simple mean model, that is, a linear regression with only a constant term as the regressor. By introducing uj into the model, we are able to account for the school effect: we model not only the grand mean, but also the mean within each school. We also gain the ability to decompose the total variance into two components, σ² and σu², so that we can say how much of the variation comes from differences between schools and how much from differences within schools (between students). The price we pay is the extra distributional assumption uj ∼ N(0, σu²).
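As a concrete illustration (not part of the original text), the Python sketch below simulates data from this two-level model and recovers the two variance components with statsmodels' MixedLM; the sample sizes and parameter values (n_schools, sigma_u, and so on) are arbitrary choices.

```python
# Illustrative sketch (made-up sizes and parameters): simulate the two-level
# model y_ij = mu + u_j + r_ij and recover the two variance components.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_schools, n_students = 50, 30          # schools, students per school
mu, sigma_u, sigma = 70.0, 5.0, 10.0    # assumed true parameters

school = np.repeat(np.arange(n_schools), n_students)
u = rng.normal(0, sigma_u, n_schools)               # school deviations u_j
r = rng.normal(0, sigma, n_schools * n_students)    # student errors r_ij
score = mu + u[school] + r
df = pd.DataFrame({"score": score, "school": school})

# Random-intercept model: fixed grand mean mu, random intercept per school.
fit = smf.mixedlm("score ~ 1", df, groups=df["school"]).fit()
print(fit.cov_re)   # estimate of sigma_u^2 (between-school variance)
print(fit.scale)    # estimate of sigma^2   (within-school variance)
```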
Let's consider another example. Suppose we have patent prior art data, with the proportion of prior art by examiners as the dependent variable. We have three grouping levels: patent applicants, examiners, and technology. These three levels are crossed; they are not nested the way students are nested in schools and schools in districts. Our model would be:
yijkl = µ + uj + vk + wl + rijkl
where
rijkl ∼ N(0, σ²)
and
uj ∼ N(0, σu²),
vk ∼ N(0, σv²),
wl ∼ N(0, σw²).
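A hedged sketch of how such crossed effects might be fit in Python follows; one way to express non-nested random effects in statsmodels is to place variance-component formulas inside a single all-encompassing group. All variable names, sizes, and parameter values below are made up for illustration.

```python
# Illustrative sketch (made-up sizes and parameters): the crossed-effects
# model y_ijkl = mu + u_j + v_k + w_l + r_ijkl.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 800
applicant = rng.integers(0, 25, n)    # j: patent applicant
examiner = rng.integers(0, 15, n)     # k: examiner
tech = rng.integers(0, 6, n)          # l: technology class

u = rng.normal(0, 0.30, 25)           # applicant effects u_j
v = rng.normal(0, 0.20, 15)           # examiner effects  v_k
w = rng.normal(0, 0.10, 6)            # technology effects w_l
y = 0.5 + u[applicant] + v[examiner] + w[tech] + rng.normal(0, 0.40, n)

df = pd.DataFrame({"y": y, "applicant": applicant,
                   "examiner": examiner, "tech": tech, "one": 1})

# Crossed (non-nested) random effects expressed as variance components
# within a single all-encompassing group.
vc = {"applicant": "0 + C(applicant)",
      "examiner": "0 + C(examiner)",
      "tech": "0 + C(tech)"}
fit = smf.mixedlm("y ~ 1", df, groups="one", vc_formula=vc).fit()
print(fit.summary())   # reports the three estimated variance components
```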
Now consider a model in which both the intercept and the slope of a predictor vary across groups (a random-coefficient model):
αj = µ + uj
βj = η + vj
where
(uj, vj)′ ∼ N(0, Σ),   Σ = [σu², σuv; σuv, σv²].
In this model, we allow both the intercept and the slope of the predictor to vary across groups (levels). We assume that the random components of the intercept and slope are jointly multivariate normal: σu² is the variance of the level-2 residuals uj for the level-1 intercept, σv² is the variance of the level-2 residuals vj for the level-1 slope, and σuv is the covariance between uj and vj.
Return to the simple two-level model
yij = µ + uj + rij.
One can compare the classical ANOVA treatment of this model with the linear mixed model treatment. The difference between the two methods is that ANOVA basically puts group (level) dummies into the regression, while the mixed model puts group-varying components into the error term.
ANOVA computes the intraclass correlation from the between-group and within-group mean squares. The intraclass correlation is the ratio of the between-group variance to the total variance (the sum of the between-group and within-group variances). A linear mixed model instead treats uj as part of the error structure of the model and estimates the variance-covariance matrix directly.
For a model with only one grouping variable, the difference between the variance decompositions from ANOVA and from a mixed model may not be large. However, consider a model with multiple grouping variables, such as
yijkl = µ + uj + vk + wl + rijkl.
If the three grouping variables are not orthogonal, the variance decomposition from ANOVA depends on the order in which they enter the model, while in a mixed model the order is irrelevant.
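To make the single-grouping-variable case concrete, the sketch below (illustrative only, with made-up parameters) computes the intraclass correlation both from the classical one-way ANOVA mean squares and from the fitted mixed-model variance components; on balanced data the two estimates should be close.

```python
# Illustrative sketch (made-up parameters): intraclass correlation
# sigma_u^2 / (sigma_u^2 + sigma^2) computed from one-way ANOVA mean squares
# and from the fitted mixed-model variance components, on balanced data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
J, n = 40, 20                           # groups, observations per group
u = rng.normal(0, 2.0, J)               # between-group sd 2.0
g = np.repeat(np.arange(J), n)
y = 10.0 + u[g] + rng.normal(0, 3.0, J * n)   # within-group sd 3.0
df = pd.DataFrame({"y": y, "g": g})

# ANOVA (method-of-moments) estimate from the mean squares.
group_means = df.groupby("g")["y"].mean()
msb = n * group_means.var(ddof=1)                 # between-group mean square
msw = df.groupby("g")["y"].var(ddof=1).mean()     # within-group mean square
icc_anova = (msb - msw) / (msb + (n - 1) * msw)

# Mixed-model estimate from the estimated variance components.
fit = smf.mixedlm("y ~ 1", df, groups=df["g"]).fit()
sigma_u2 = float(fit.cov_re.iloc[0, 0])
icc_mixed = sigma_u2 / (sigma_u2 + fit.scale)

print(icc_anova, icc_mixed)   # both near 4 / (4 + 9) ≈ 0.31
```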
More generally, a linear mixed model can be written in matrix form:
y = Xβ + Zγ + e
where
e ∼ N(0, R),
γ ∼ N(0, G),
and e and γ are uncorrelated.
This model has two parts: Xβ is the fixed effect part, and Zγ is the random
effect part.
In this model, E(y) = Xβ and Var(y) = Var(Zγ) + Var(e) = ZGZ′ + R.
This means that in a linear mixed model, the unconditional mean is determined by the fixed-effect part only.
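A small numpy sketch (not from the text; dimensions and values are made up) makes this concrete: it builds X, Z, G, and R, forms the marginal covariance V = ZGZ′ + R, and recovers β by generalized least squares.

```python
# Illustrative numpy sketch (made-up dimensions): the marginal covariance
# V = Z G Z' + R and the GLS estimate of beta in y = X beta + Z gamma + e.
import numpy as np

rng = np.random.default_rng(4)
n, q = 200, 10                          # observations, random effects

X = np.column_stack([np.ones(n), rng.normal(size=n)])   # fixed-effects design
groups = rng.integers(0, q, n)
Z = np.zeros((n, q))                                     # random-intercept design
Z[np.arange(n), groups] = 1.0

beta = np.array([1.0, 0.5])
G = 0.6 * np.eye(q)                     # Var(gamma)
R = 1.5 * np.eye(n)                     # Var(e)

gamma = rng.multivariate_normal(np.zeros(q), G)
e = rng.multivariate_normal(np.zeros(n), R)
y = X @ beta + Z @ gamma + e

V = Z @ G @ Z.T + R                     # Var(y) = Z G Z' + R
Vinv = np.linalg.inv(V)
beta_gls = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
print(beta_gls)                         # should be close to (1.0, 0.5)
```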
The other way to formulate this is:
y|γ ∼ N(Xβ + Zγ, R)
where
γ ∼ N(0, G).
First, we form the linear predictor
η = Xβ + Zγ.
Second, if we have a linear mixed model, then we simply need to add in a
normally distributed error term:
y = η + e = Xβ + Zγ + e.
Equivalently, we have
y|γ ∼ N(η, R) = N(Xβ + Zγ, R).
Now if y is not normally distributed, we have to "link" the dependent variable (more precisely, its conditional mean) to the linear predictor η, hence the name "link function":
g(E(y|γ)) = η
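For instance (an illustrative case, not one worked out in the text), for a binary outcome with a logit link and a single random intercept uj, the model becomes
logit(Pr(yij = 1 | uj)) = xij′β + uj,   uj ∼ N(0, σu²).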
We have the marginal likelihood
f(y) = ∫ f(y | γ) f(γ) dγ,
where f(y | γ) is the conditional density of y given the random effects and f(γ) is the N(0, G) density.
Basically, this says that we need to integrate out the random effects to get the unconditional (marginal) likelihood function. Unfortunately, we usually do not have an analytic solution for this integral, since the random effects can be high-dimensional.
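As an illustration of what "integrating out the random effects" involves (not part of the original text), the sketch below approximates the marginal likelihood contribution of one cluster in a random-intercept logit model by Gauss-Hermite quadrature; the data, coefficients, and sigma_u are made up.

```python
# Illustrative sketch (made-up data and parameters): the marginal likelihood
# of one cluster in a random-intercept logit model, obtained by integrating
# the conditional likelihood against N(0, sigma_u^2) via Gauss-Hermite
# quadrature.
import numpy as np

def cluster_marginal_likelihood(y, x, beta, sigma_u, n_nodes=20):
    """Approximate integral over u ~ N(0, sigma_u^2) of prod_i p(y_i | u)."""
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    total = 0.0
    for t, w in zip(nodes, weights):
        u = np.sqrt(2.0) * sigma_u * t          # change of variables
        eta = x @ beta + u                      # linear predictor given u
        p = 1.0 / (1.0 + np.exp(-eta))          # inverse logit link
        total += w * np.prod(p**y * (1 - p)**(1 - y))
    return total / np.sqrt(np.pi)

# A toy cluster with five binary observations, an intercept, and one covariate.
rng = np.random.default_rng(5)
x = np.column_stack([np.ones(5), rng.normal(size=5)])
y = np.array([1, 0, 1, 1, 0])
print(cluster_marginal_likelihood(y, x, beta=np.array([0.2, 0.5]), sigma_u=1.0))
```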
Three possible solutions are proposed: