Multilevel Modeling Using R
W. Holmes Finch
Jocelyn E. Bolin
Ken Kelley
Chapman & Hall/CRC
Statistics in the Social and Behavioral Sciences Series
Series Editors
Jeff Gill, Washington University, USA
Steven Heeringa, University of Michigan, USA
Tom Snijders, Oxford University, UK, and University of Groningen, NL
Large and complex datasets are becoming prevalent in the social and behavioral sciences, and statistical methods are crucial for the analysis and interpretation of such
data. This series aims to capture new developments in statistical methodology with
particular relevance to applications in the social and behavioral sciences. It seeks to
promote appropriate use of statistical, econometric and psychometric methods in
these applied sciences by publishing a broad range of reference works, textbooks and
handbooks.
Multilevel Modeling Using R
W. Holmes Finch
Ball State University
Muncie, Indiana, USA
Jocelyn E. Bolin
Ball State University
Muncie, Indiana, USA
Ken Kelley
University of Notre Dame
Notre Dame, Indiana, USA
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been
made to publish reliable data and information, but the author and publisher cannot assume responsibility for the
validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the
copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to
publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let
us know so we may rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted,
or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, includ-
ing photocopying, microfilming, and recording, or in any information storage or retrieval system, without written
permission from the publishers.
For permission to photocopy or use material electronically from this work, please access www.copyright.com
(http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers,
MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety
of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment
has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for
identification and explanation without intent to infringe.
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
Contents

Preface
About the Authors
1. Linear Models
   1.1 Simple Linear Regression
      1.1.1 Estimating Regression Models with Ordinary Least Squares
   1.2 Distributional Assumptions Underlying Regression
   1.3 Coefficient of Determination
   1.4 Inference for Regression Parameters
   1.5 Multiple Regression
   1.6 Example of Simple Manual Linear Regression
   1.7 Regression in R
      1.7.1 Interaction Terms in Regression
      1.7.2 Categorical Independent Variables
      1.7.3 Checking Regression Assumptions with R
   Summary
Preface

The goal of this book is to provide you, the reader, with a comprehensive resource for the conduct of multilevel modeling using the R software package. Multilevel modeling, sometimes referred to as hierarchical modeling,
is a powerful tool that allows a researcher to account for data collected at
multiple levels. For example, an educational researcher may gather test
scores and measures of socioeconomic status (SES) for students who attend
a number of different schools. The students would be considered level-1
sampling units, and the schools would be referred to as level-2 units.
Ignoring the structure inherent in this type of data collection can, as we
discuss in Chapter 2, lead to incorrect parameter and standard error esti-
mates. In addition to modeling the data structure correctly, we will see in
the following chapters that the use of multilevel models can also provide
insights into the nature of relationships in our data that might otherwise not
be detected.
After reviewing standard linear models in Chapter 1, we will turn our
attention to the basics of multilevel models in Chapter 2, before learning
how to fit these models using the R software package in Chapters 3 and 4.
Chapter 5 focuses on the use of multilevel modeling in the case of longitu-
dinal data, and Chapter 6 demonstrates the very useful graphical options
available in R, particularly those most appropriate for multilevel data.
Chapters 7 and 8 describe models for categorical dependent variables, first
for single-level data, and then in the multilevel context. Finally, we conclude
in Chapter 9 with Bayesian fitting of multilevel models.
We hope that you find this book to be helpful as you work with multi-
level data. Our goal is to provide you with a guidebook that will serve as
the launching point for your own investigations in multilevel modeling.
The R code and discussion of its interpretation contained in this text should
provide you with the tools necessary to gain insights into your own research,
in whatever field it may be. We appreciate your taking the time to read our
work and hope that you find it as enjoyable and informative to read as it was
for us to write.
1
Linear Models
1.1 Simple Linear Regression

The simple linear regression model takes the form

$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i \quad (1.1)$
where yi is the dependent variable for individual i in the data set and xi is
the independent variable for subject i (i = 1, …, n). The terms β0 and β1 are
the intercept and slope of the model, respectively. In a graphical sense, the
intercept is the point at which the line in Equation (1.1) crosses the y axis at
x = 0. It is also the mean, specifically the conditional mean, of y for individuals
with values of 0 on x. This latter definition will be most useful in actual prac-
tice. The slope β1 expresses the relationship between y and x. Positive slope
values indicate that larger values of x are associated with correspondingly
larger values of y, while negative slopes mean that larger x values are associ-
ated with smaller y values. Holding everything else constant, larger values
of β1 (positive or negative) indicate a stronger linear relationship between
y and x. Finally, εi represents the random error inherent in any statistical model, including regression. It expresses the fact that for any individual, i, the model will not generally provide a perfect predicted value of yi, denoted $\hat{y}_i$ and obtained by applying the regression model as

$\hat{y}_i = \beta_0 + \beta_1 x_i \quad (1.2)$
1.1.1 Estimating Regression Models with Ordinary Least Squares

Several methods can be used to estimate the regression coefficients; the most frequently used of these methods is ordinary least squares (OLS). The vast majority of other approaches are useful in special cases involving small samples or data that fail to conform to the distributional assumptions undergirding OLS.
The goal of OLS is to minimize the sum of the squared differences between
the observed values of y and the model predicted values of y across the sam-
ple. This difference, known as the residual, is written as
$e_i = y_i - \hat{y}_i \quad (1.3)$

Summing these squared residuals across the full sample gives the quantity minimized by OLS:

$\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2 \quad (1.4)$
The actual mechanism for finding the linear equation that minimizes the
sum of squared residuals involves the partial derivatives of the sum of
squared function with respect to the model coefficients β0 and β1. We will
leave these mathematical details to excellent references such as Fox (2008).
Note that in the context of simple linear regression, the OLS criteria reduce
to the following equations that can be used to obtain b0 and b1 as
$b_1 = r\frac{s_y}{s_x} \quad (1.5)$

and

$b_0 = \bar{y} - b_1\bar{x} \quad (1.6)$
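As a brief illustration, Equations (1.5) and (1.6) can be computed directly in R and compared against the built-in lm() function. This is a minimal sketch using simulated data; the variable names and values are illustrative, not from the book's data sets.

set.seed(1)
x <- rnorm(100, mean = 50, sd = 10)   # illustrative predictor
y <- 2 + 0.3 * x + rnorm(100)         # illustrative outcome

b1 <- cor(x, y) * sd(y) / sd(x)   # slope: r multiplied by sy/sx, Equation (1.5)
b0 <- mean(y) - b1 * mean(x)      # intercept from the sample means, Equation (1.6)
c(b0, b1)
coef(lm(y ~ x))                   # matches the manual estimates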
1.2 Distributional Assumptions Underlying Regression

Several assumptions must hold for OLS regression results to be trustworthy. The first is that the relationship between the independent and dependent variables is linear; if the relationship is not linear, then clearly an equation for a line will not provide adequate fit and the model is thus misspecified. A second assumption is that the variance
in the residuals is constant regardless of the value of xi. This assumption is
typically referred to as homoscedasticity and is a generalization of the homo-
geneity of error variance assumption in ANOVA. Homoscedasticity implies
that the variance of yi is constant across values of xi. The distribution of the
dependent variables around the regression line is literally the distribution of
the residuals, thus making clear the connection of homoscedasticity of errors
with the distribution of yi around the regression line. The third assumption
is that the residuals are normally distributed in a population. Fourth is the
assumption that the independent variable x is measured without error and
that it is unrelated to the model error term ε. It should be noted that the
assumption of x measured without error is not as strenuous as one might
first assume. In fact, for most real-world problems, the model will work well
even when the independent variable is not error free (Fox, 2008). Fifth and
finally, the residuals for any two individuals in a population are assumed to
be independent of one another. This independence assumption implies that the unmeasured factors influencing y are not related from one individual to another, and it is addressed directly with the use of multilevel models, as we will see in Chapter 2.
In many research situations, individuals are sampled in clusters, such that
we cannot assume that individuals from the same cluster will have uncor-
related residuals. For example, if samples are obtained from multiple neigh-
borhoods, individuals within the same neighborhoods may tend to be more
like one another than they are like individuals from other neighborhoods.
A prototypical example of this is children in schools. Due to a variety of
factors, children attending the same school often have more in common with
one another than they do with children from other schools. These common
factors may include neighborhood socioeconomic status, school administra-
tion policies, and school learning environment, to name just a few.
Ignoring this clustering or not even realizing it is a problem can be detri-
mental to the results of statistical modeling. We explore this issue in great
detail later in the book, but for now we simply want to mention that a fail-
ure to satisfy the assumption of independent errors is (1) a major problem
and (2) often a problem that may be overcome with appropriate models,
such as multilevel models that explicitly consider the nesting of data.
1.3 Coefficient of Determination

The proportion of variance in the dependent variable that is explained by the model is expressed by the coefficient of determination:

$R^2 = \frac{SSR}{SST} = \frac{\sum_{i=1}^{n}\left(\hat{y}_i - \bar{y}\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2} = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2} = 1 - \frac{SSE}{SST} \quad (1.7)$
The terms in Equation (1.7) are as defined previously. The value of this statistic always lies between 0 and 1, with larger numbers indicating a stronger linear relationship between x and y, implying that the independent variable accounts for more of the variance in the dependent variable. R² is a very commonly used
measure of the overall fit of a regression model. Along with the parameter
inference discussed below, it serves as the primary mechanism by which the
relationship between the two variables is quantified.
1.4 Inference for Regression Parameters

In practice, the sampling variability of the coefficient estimates must be estimated using data from a single sample, much as we did with b0 and b1. To obtain $s_{b_1}$, we must first calculate the variance of the residuals,

$S_e^2 = \frac{\sum_{i=1}^{n} e_i^2}{n - p - 1} \quad (1.8)$
where $e_i$ is the residual value for individual i, n is the sample size, and p is the number of independent variables (one in the case of simple regression). Then

$S_{b_1} = \sqrt{\frac{1}{1 - R^2}} \cdot \frac{S_e}{\sqrt{\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}} \quad (1.9)$
Because the sample intercept and slope are only estimates of the population parameters, researchers often are interested in testing hypotheses to infer whether the data represent a departure from what would be expected in what is commonly referred to as the null case (i.e., whether the hypothesis that the null value holds in the population can be rejected). Usually (but not always), the inference of interest concerns testing that the population parameter is 0.
In particular, a non-0 slope in a population means that x is linearly related
to y. Therefore, researchers typically are interested in using the sample to
make inference about whether the population slope is 0 or not. Inference
can also be made regarding the intercept, and again the typical focus is on
whether the value is 0 in the population.
Inference about regression parameters can be made using confidence inter-
vals and hypothesis tests. Much as with the confidence interval of the mean,
the confidence interval of the regression coefficient yields a range of values
within which we have some level of confidence (e.g., 95%) that the population
parameter value resides. If our particular interest is in whether x is linearly
related to y, then we would simply determine whether 0 is in the interval for β1.
If so, then we could not conclude that the population value differs from 0.
$b_1 \pm t_{cv}\,s_{b_1}$

and

$b_0 \pm t_{cv}\,s_{b_0}$
Here the parameter estimates and their standard errors are as described pre-
viously, while tcv is the critical value of the t distribution for 1 – α/2 (e.g., the
0.975 quantile if α = 0.05) with n – p – 1 degrees of freedom. The value of α is
equal to 1 minus the desired level of confidence. Thus, for a 95% confidence
interval (0.95 level of confidence), α would be 0.05.
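As a brief illustration, the critical t value used in such intervals is easy to obtain in R with the qt() function; for instance, for a 95% interval with n = 440 observations and one predictor:

qt(0.975, df = 440 - 1 - 1)   # returns approximately 1.97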
In addition to confidence intervals, inference about the regression param-
eters can also be made using hypothesis tests. In general, the forms of this
test for the slope and intercept, respectively, are
$t_{b_1} = \frac{b_1 - \beta_1}{s_{b_1}} \quad (1.13)$

and

$t_{b_0} = \frac{b_0 - \beta_0}{s_{b_0}} \quad (1.14)$
The terms β1 and β0 are the parameter values under the null hypothesis.
Again, most often the null hypothesis posits that there is no linear relationship
between x and y (β1 = 0) and that the value of y = 0 when x = 0 (β0 = 0). For sim-
ple regression, each of these tests is conducted with n – 2 degrees of freedom.
1.5 Multiple Regression

The simple regression model is readily extended to include more than one independent variable; with two predictors, for example, $y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \varepsilon_i$. In many ways, this model is interpreted like the one for simple linear regression. The only major difference between simple and multiple regression interpretation is that each coefficient is interpreted in turn, holding constant the values of the other independent variables. In particular, the parameters are estimated by b0, b1, and b2, and inferences about these
parameters are made in the same fashion for both confidence intervals and
hypothesis tests.
The assumptions underlying this model are also the same as those described
for the simple regression model. Despite these similarities, three additional
topics regarding multiple regression need to be considered here. These are
inference for the set of model slopes as a whole, an adjusted measure of the
coefficient of determination, and collinearity among the independent variables.
Because these issues will be important in the context of multilevel modeling as
well, we will address them in detail.
With respect to model inference, for simple linear regression, the most
important parameter is generally the slope, so that inference for it will be
of primary concern. When a model has multiple x variables, the researcher
may want to know whether the independent variables taken as a whole are
related to y. Therefore, some overall test of model significance is desirable.
The null hypothesis for this test is that all of the slopes are equal to 0 in the
population; i.e., none of the regressors is linearly related to the dependent
variable. The test statistic for this hypothesis is calculated as
$F = \frac{SSR/p}{SSE/(n - p - 1)} = \frac{n - p - 1}{p} \cdot \frac{R^2}{1 - R^2} \quad (1.16)$
Here, terms are as defined in Equation (1.7). This test statistic is distributed
as an F with p and n – p – 1 degrees of freedom. A statistically significant
result would indicate that one or more of the regression coefficients are not
equal to 0 in the population. Typically, the researcher would then refer to the
tests of individual regression parameters described above in order to iden-
tify which parameters were not equal to 0.
A second issue to be considered by researchers in the context of multi-
ple regression is the notion of adjusted R 2. Stated simply, the inclusion of
additional independent variables in the regression model will always yield
higher values of R 2, even when these variables are not statistically signifi-
cantly related to the dependent variable. In other words, there is a capitaliza-
tion on chance that occurs in the calculation of R 2.
As a consequence, models including many regressors with negligible
relationships with y may produce an R 2 that would suggest the model
explains a great deal of variance in y. An option for measuring the vari-
ance explained in the dependent variable that accounts for this additional
model complexity would be helpful to a researcher seeking to under-
stand the true nature of the relationship between the set of independent
variables and the dependent. Such a measure exists in the form of the
adjusted R 2 value, which is commonly calculated as
$R_A^2 = 1 - \left(1 - R^2\right)\frac{n - 1}{n - p - 1} \quad (1.17)$
$R_A^2$ only increases with the addition of an x if that x explains more variance than would be expected by chance. $R_A^2$ will always be less than or equal to the standard $R^2$. It is generally recommended to use this statistic in practice when models containing many independent variables are used.
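By way of illustration, both forms of R² are reported by summary() for any fitted lm object and can be extracted directly. This is a minimal sketch with simulated data; the data frame and variable names are illustrative only.

set.seed(2)
dat <- data.frame(x1 = rnorm(50), x2 = rnorm(50))   # x2 is unrelated to y
dat$y <- 1 + 0.5 * dat$x1 + rnorm(50)
fit <- lm(y ~ x1 + x2, data = dat)
summary(fit)$r.squared        # unadjusted R-squared
summary(fit)$adj.r.squared    # adjusted R-squared, as in Equation (1.17)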
A final important issue specific to multiple regression is collinearity, which
occurs when one independent variable is a linear combination of one or more
of the other independent variables. In such a case, regression coefficients and
their corresponding standard errors can be quite unstable, resulting in poor
inference. It is possible to investigate the presence of collinearity using a sta-
tistic known as the variance inflation factor (VIF). To calculate the VIF for $x_j$, we would first regress $x_j$ onto all of the other independent variables and obtain the resulting $R_{x_j}^2$ value. We then calculate

$VIF_j = \frac{1}{1 - R_{x_j}^2} \quad (1.18)$
The VIF will become large when $R_{x_j}^2$ is near 1, indicating that $x_j$ has very little unique variation when the other independent variables in the model are considered. That is, if the other p – 1 regressors can explain a high proportion of the variance in $x_j$, then $x_j$ does not add much to the model above and beyond the other p – 1 regressors. Collinearity in turn leads to high sampling
variation in bj, resulting in large standard errors and unstable parameter
estimates. Conventional rules of thumb have been proposed for determining when an independent variable is highly collinear with the set of other p – 1 regressors. Thus, the researcher may consider collinearity a problem if VIF > 5 or 10 (Fox, 2008). The typical response to collinearity is to remove
the offending variable(s) or use an alternative approach to conducting the
regression analysis such as ridge regression or regression following a prin-
cipal components analysis.
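To illustrate, the vif() function in the car package implements Equation (1.18) for each predictor in a fitted model. This is a minimal sketch in which the simulated predictors are deliberately collinear; the names and values are illustrative only.

library(car)
set.seed(3)
x1 <- rnorm(100)
x2 <- x1 + rnorm(100, sd = 0.1)   # nearly a linear function of x1
y  <- 1 + x1 + rnorm(100)
vif(lm(y ~ x1 + x2))              # large values (e.g., > 5 or 10) signal collinearity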
1.6 Example of Simple Manual Linear Regression

TABLE 1.1
Descriptive Statistics and Correlation of GPA and Test Anxiety

Variable   Mean    Standard Deviation   Correlation
GPA        3.12    0.51                 –0.30
Anxiety    35.14   10.83

As an example, consider the relationship between college grade point average (GPA) and scores on a test anxiety measure for which higher scores indicate greater anxiety when taking a test. The sample consisted of 440 college students who were measured on both variables. The
researcher is interested in the extent to which test anxiety is related to col-
lege GPA, so that GPA is the dependent variable and anxiety is the indepen-
dent variable. The descriptive statistics for each variable and the correlations
between them appear in Table 1.1.
We can use this information to obtain estimates for both the slope and intercept of the regression model using Equations (1.5) and (1.6). First, the slope is calculated as

$b_1 = -0.30\left(\frac{0.51}{10.83}\right) = -0.014$
indicating that individuals with higher test anxiety scores will generally
have lower GPAs. Next, we can use this value and information in the table to calculate the intercept estimate:

$b_0 = 3.12 - (-0.014)(35.14) = 3.61$

Thus, this model would predict that for a one-point increase in the anxiety assessment score, GPA would decrease by 0.014 points.
To better understand the strength of the relationship between test anxi-
ety and GPA, we will want to calculate the coefficient of determination.
To do this, we need both the SSR and SST, which take the values 10.65 and
115.36, yielding
$R^2 = \frac{10.65}{115.36} = 0.09$
Using Equation (1.16), we can calculate the F statistic to test whether any of the model slopes (in this case only one) are different from 0 in the population:

$F = \frac{440 - 1 - 1}{1} \cdot \frac{0.09}{1 - 0.09} = 438(0.10) = 43.8$
This test has p and n – p – 1 degrees of freedom, or 1 and 438 in this situ-
ation. The p value of this test is less than 0.001, leading us to conclude that
the slope in the population is indeed significantly different from 0 because
the p value is less than the Type I error rate specified. Thus, test anxiety is lin-
early related to GPA. The same inference could be conducted using the t-test
for the slope. First we must calculate the standard error of the slope estimate:
$S_{b_1} = \sqrt{\frac{1}{1 - R^2}} \cdot \frac{S_e}{\sqrt{\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}}$

where

$S_e = \sqrt{\frac{SSE}{n - p - 1}} = \sqrt{\frac{104.71}{440 - 1 - 1}} = \sqrt{0.24} = 0.49$
In turn, the sum of squared deviations for x (anxiety) was 53743.64, and we
previously calculated R 2 = 0.09. Thus, the standard error for the slope is
$S_{b_1} = \sqrt{\frac{1}{1 - 0.09}} \cdot \frac{0.49}{\sqrt{53743.64}} = 1.05(0.002) = 0.002$
The t statistic for testing the null hypothesis that β1 = 0 is then

$t = \frac{b_1 - 0}{s_{b_1}} = \frac{-0.014}{0.002} = -7.00$
with n – p – 1 or 438 degrees of freedom. The p value for this test statistic
value is less than 0.001 and thus we can probabilistically infer that the value
of the slope in the population is not zero, with the best sample point estimate
being –0.014.
Finally, we can also draw inference about β1 through a 95% confidence interval, as shown earlier. For this calculation, we must determine the value of the t distribution with 438 degrees of freedom that corresponds to the 1 – 0.05/2 or 0.975 point in the distribution. We can do so by using a t table in the back of a textbook or with standard computer software such as SPSS. In either case, the critical value for this example is 1.97. The confidence interval can then be calculated as

$-0.014 \pm 1.97(0.002) = (-0.018,\ -0.010)$
The fact that 0 is not in the 95% confidence interval simply supports the
conclusion we reached using the p value as described above. Also, given this
interval, we can infer that the actual population slope value lies between
–0.018 and –0.010. Thus, anxiety could plausibly have an effect as small as
–0.010 or as large as –0.018.
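All of the hand calculations in this section can be reproduced in R from the summary statistics alone. The following minimal sketch uses the values reported above (no raw data are required); small discrepancies with the hand results reflect rounding in the text.

r <- -0.30; sd.y <- 0.51; sd.x <- 10.83       # Table 1.1
mean.y <- 3.12; mean.x <- 35.14
n <- 440; SSE <- 104.71; ssx <- 53743.64; R2 <- 0.09

b1 <- r * sd.y / sd.x                         # slope, about -0.014
b0 <- mean.y - b1 * mean.x                    # intercept, about 3.61
se <- sqrt(SSE / (n - 1 - 1))                 # standard error of estimate, about 0.49
se.b1 <- sqrt(1 / (1 - R2)) * se / sqrt(ssx)  # standard error of slope, about 0.002
b1 / se.b1                                    # t statistic (about -7 in the text, given rounding)
b1 + c(-1, 1) * qt(0.975, n - 2) * se.b1      # 95% CI, about (-0.018, -0.010)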
1.7 Regression in R
In R, the function call for fitting linear regression is lm, which is part of the
stats library that is loaded by default each time R is started. The basic form
for a linear regression model using lm is:
lm(formula, data)
where formula defines the linear regression form and data indicates the
data set used in the analysis, examples of which appear below. Returning to
the previous example, predicting GPA from measures of physical (BStotal)
and cognitive academic anxiety (CTA.tot), the model is defined in R as
Model1.1 <- lm(GPA ~ CTA.tot + BStotal, Cassidy)
This line of R code is referred to as a function call and defines the regres-
sion equation. The dependent variable GPA is followed by the independent
variables CTA.tot and BStotal, separated by ~. The data set Cassidy is
also given here, after the regression equation has been defined. Finally, the
output from this analysis is stored in the object Model1.1. To view this out-
put, we can type the name of this object in R, and hit return to obtain the
following:
Call:
lm(formula = GPA ~ CTA.tot + BStotal, data = Cassidy)
Coefficients:
(Intercept) CTA.tot BStotal
3.61892 -0.02007 0.01347
The output obtained from the basic function call will return only values for the intercept and slope coefficients, lacking information regarding statistical significance and overall model fit. To obtain a more detailed model summary, we can use the summary() function:

summary(Model1.1)
Residuals:
Min 1Q Median 3Q Max
-2.99239 -0.29138 0.01516 0.36849 0.93941
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.618924 0.079305 45.633 < 2e-16 ***
CTA.tot -0.020068 0.003065 -6.547 1.69e-10 ***
BStotal 0.013469 0.005077 2.653 0.00828 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
From the model summary we can obtain information on model fit (overall
F test for significance, R 2, and standard error of the estimate), parameter
significance tests, and a summary of residual statistics. As the F test for the
overall model is somewhat abbreviated in this output, we can request
the entire ANOVA result, including sums of squares and mean squares by
using the anova(Model1.1) function call.
Analysis of Variance Table

Response: GPA
Df Sum Sq Mean Sq F value Pr(>F)
CTA.tot 1 10.316 10.3159 43.8125 1.089e-10 ***
BStotal 1 1.657 1.6570 7.0376 0.00828 **
Residuals 426 100.304 0.2355
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
To see what additional information is stored in the fitted model object, we can use the attributes() function:

attributes(Model1.1)
$names
[1] "coefficients" "residuals" "effects" "rank" "fitted.values"
[6] "assign" "qr" "df.residual" "na.action" "xlevels"
[11] "call" "terms" "model"
$class
[1] "lm"
This is a list of attributes or information that may be pulled from the fitted
regression model. To obtain this information, we can call for the particular
attribute. For example, if we want to obtain the predicted GPA for each indi-
vidual in the sample, we would simply type the following followed by the
enter key:
Model1.1$fitted.values
1 3 4 5 8 9 10 11 12
2.964641 3.125996 3.039668 3.125454 2.852730 3.152391 3.412460 3.011917 2.611103
13 14 15 16 17 19 23 25 26
3.158448 3.298923 3.312121 2.959938 3.205183 2.945928 2.904979 3.226064 3.245318
27 28 29 30 31 34 35 37 38
2.944573 3.171646 2.917635 3.198584 3.206267 3.073204 3.258787 3.118584 2.972594
39 41 42 43 44 45 46 48 50
2.870630 3.144980 3.285454 3.386064 2.871713 2.911849 3.166131 3.051511 3.251917
Thus for example, the predicted GPA for subject 1 based on the prediction
equation would be 2.96. By the same token, we can obtain the regression
residuals with the following command:
Model1.1$residuals
1 3 4 5 8 9
-0.4646405061 -0.3259956916 -0.7896675749 -0.0254537419 0.4492704297 -0.0283914353
10 11 12 13 14 15
-0.1124596847 -0.5119169570 0.0888967457 -0.6584484215 -0.7989228998 -0.4221207716
16 17 19 23 25 26
-0.5799383942 -0.3051829226 -0.1459275978 -0.8649791080 0.0989363702 -0.2453184879
27 28 29 30 31 34
-0.4445727235 0.7783537067 -0.8176350301 0.1014160133 0.3937331779 -0.1232042042
35 37 38 39 41 42
0.3412126654 0.4814161689 0.9394056837 -0.6706295541 -0.5449795748 -0.4194540531
43 44 45 46 48 50
-0.4960639410 -0.0717134535 -0.4118490187 0.4338687432 0.7484894275 0.4480825762
From this output, we can see that the actual GPA for the first individual in the sample was approximately 0.465 points below the model-predicted GPA.
1.7.1 Interaction Terms in Regression

An interaction between two independent variables can be added to the model by including their product in the formula:

Model1.2 <- lm(GPA ~ CTA.tot + BStotal + CTA.tot*BStotal, Cassidy)
summary(Model1.2)
Call:
lm(formula = GPA ~ CTA.tot + BStotal + CTA.tot * BStotal, data = Cassidy)
Residuals:
Min 1Q Median 3Q Max
-2.98711 -0.29737 0.01801 0.36340 0.95016
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.8977792 0.2307491 16.892 < 2e-16 ***
CTA.tot -0.0267935 0.0060581 -4.423 1.24e-05 ***
BStotal -0.0057595 0.0157812 -0.365 0.715
CTA.tot:BStotal 0.0004328 0.0003364 1.287 0.199
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.4849 on 425 degrees of freedom
  (57 observations deleted due to missingness)
Multiple R-squared: 0.1101, Adjusted R-squared: 0.1038
F-statistic: 17.53 on 3 and 425 DF, p-value: 9.558e-11
Here the slope for the interaction is denoted CTA.tot:BStotal, takes the
value 0.0004, and is nonsignificant (t = 1.287, p = 0.199), indicating that the
level of physical anxiety symptoms (BStotal) does not change or moderate
the relationship between cognitive test anxiety (CTA.tot) and GPA.
1.7.2 Categorical Independent Variables

Categorical independent variables can be included in a regression model through dummy coding; for example, a variable Male coded 1 for male students and 0 for female students. In this example, the slope for the dummy variable Male is negative and significant (β = –0.223, p < 0.001), indicating that males have significantly lower mean GPAs than females.
Depending on the format in which the data are stored, the lm function is
capable of dummy coding categorical variables. If a variable has been desig-
nated as categorical (as often happens if you read data in from an SPSS file
in which the variable is designated as such) and is used in the lm function,
it will automatically dummy code the variable in your results. For example,
if instead of using the Male variable as described above, we used Gender
as a categorical variable coded as female and male, we would obtain the
following results from the model specification and summary commands.
Model1.4 <- lm(GPA~CTA.tot + Gender, Acad)
summary(Model1.4)
Call:
lm(formula = GPA ~ CTA.tot + Gender, data = Acad)
Residuals:
Min 1Q Median 3Q Max
-3.01149 -0.29005 0.03038 0.35374 0.96294
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.740318 0.080940 46.211 < 2e-16 ***
CTA.tot -0.015184 0.002117 -7.173 3.16e-12 ***
Gender[T.male] -0.222594 0.047152 -4.721 3.17e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.4775 on 437 degrees of freedom
  (46 observations deleted due to missingness)
Multiple R-squared: 0.1364, Adjusted R-squared: 0.1324
F-statistic: 34.51 on 2 and 437 DF, p-value: 1.215e-14
The same approach extends to categorical variables having more than two levels, such as a three-category Ethnicity variable:

GPAmodel1.5 <- lm(GPA ~ CTA.tot + Ethnicity, Acad)
summary(GPAmodel1.5)
Call:
lm(formula = GPA ~ CTA.tot + Ethnicity, data = Acad)
Residuals:
Min 1Q Median 3Q Max
-2.95019 -0.30021 0.01845 0.37825 1.00682
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.670308 0.079101 46.400 < 2e-16 ***
CTA.tot -0.015002 0.002147 -6.989 1.04e-11 ***
Ethnicity[T.African American] -0.482377 0.131589 -3.666 0.000277 ***
Ethnicity[T.Other] -0.151748 0.136150 -1.115 0.265652
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4821 on 436 degrees of freedom
  (46 observations deleted due to missingness)
Multiple R-squared: 0.1215, Adjusted R-squared: 0.1155
F-statistic: 20.11 on 3 and 436 DF, p-value: 3.182e-12
Since we have slopes for African American and Other, we know that Caucasian serves as the reference category, which is coded as 0. Results indicate that African American students had significantly lower mean GPAs than Caucasian students (p < 0.001), while the difference between the Other group and Caucasians was not significant. Note that if a grouping variable has not already been designated as categorical, it can be converted to a factor first, e.g., Male <- as.factor(Male).

1.7.3 Checking Regression Assumptions with R

The residualPlots() function from the car package produces diagnostic residual plots for a fitted regression model:

library(car)
residualPlots(Model1.1)
FIGURE 1.1
Diagnostic residuals plots for regression model predicting GPA from CTA.tot and BStotal.
The qqPlot function from the car package may be used to easily create QQ plots from fitted regression models. Interpretation of the QQ plot is quite simple. Essentially, the graph plots the quantiles that the residuals would have if they were normally distributed (the x axis) against the residuals as they actually are (the y axis). The individual data points are represented in R by black circles. The solid line represents the data conforming perfectly to the normal distribution.
Therefore, the closer the observed data (circles) are to the solid line, the
more closely the data conforms to the normal distribution. In addition,
R provides a 95% confidence interval for the line, so that when the data
points fall within it they are deemed to conform to the normal distribu-
tion. In this example, the data appear to follow the normal distribution
fairly closely.
qqPlot(Model1.1)
[Figure: QQ plot of the model residuals against t quantiles, with a 95% confidence envelope]
Summary
Chapter 1 introduced readers to the basics of linear modeling using R. This
treatment was purposely limited, as a number of good texts cover linear
modeling and it is not the main focus of this book. However, many of the
core concepts presented here for the GLM apply to multilevel modeling as
well, and thus are of key importance as we move into more complex analyses.
In addition, much of the syntactical framework presented here will reappear
in subsequent chapters. In particular, readers should leave this chapter com-
fortable with interpretation of coefficients in linear models and the concept
of variance in outcome variables. We would encourage you to return to this
chapter frequently as needed to reinforce these basic concepts. In addition,
we would recommend that you also refer to the appendix dealing with the
basics of using R when questions about data management and installation
of specific R libraries arise. In Chapter 2, we will turn our attention to the
conceptual underpinnings of multilevel modeling before delving into esti-
mation in Chapters 3 and 4.
2
Introduction to Multilevel Data Structure
In nested designs, individuals at the first level appear in only one level of a higher level variable such as school. Thus, students are nested within school. Such designs can be contrasted with crossed data structures whereby
individuals at the first level appear in multiple levels of the second variable.
In our example, students may be crossed with after-school activities if they
are allowed to participate in more than one. For example, a student might be
on the basketball team and a member of the band.
The focus of this book is almost exclusively on nested designs that give
rise to multilevel data. Another example of a nested design is a survey of
job satisfaction levels of employees from multiple departments within a
large business organization. In this case, each employee works within only
a single division in the company, making possible a nested design. It seems
reasonable to assume that employees working in the same division will
have correlated responses on the satisfaction survey, because much of their
views of their jobs will be based exclusively upon experiences within their
divisions. For a third such example, consider the situation in which clients
of several psychotherapists working in a clinic are asked to rate the qual-
ity of each therapy session. In this instance, three levels of data exist: (1)
time in the form of an individual session, (2) client, and (3) therapist. Thus,
session is nested in client, which in turn is nested in therapist. This data
structure would be expected to lead to correlated scores on a therapy rating
instrument.
The degree of within-cluster dependency is quantified by the intraclass correlation (ICC), denoted $\rho_I$:

$\rho_I = \frac{\tau^2}{\tau^2 + \sigma^2} \quad (2.1)$

where $\tau^2$ is the population variance between clusters and $\sigma^2$ is the population variance within clusters. The ICC can be interpreted as the expected correlation between scores for two individuals from the same cluster. Another way to frame this issue is that individuals within the same cluster (e.g., school) are more alike on the measured variable than they are like individuals in other clusters.
It is possible to estimate τ2 and σ2 using sample data, and thus it is also
possible to estimate ρΙ. Those familiar with ANOVA will recognize these
estimates as related (though not identical) to the sum of squared terms.
The sample estimate for variation within clusters is simply
$\hat{\sigma}^2 = \frac{\sum_{j=1}^{C}\left(n_j - 1\right)S_j^2}{N - C} \quad (2.2)$
where $S_j^2 = \frac{\sum_{i=1}^{n_j}\left(y_{ij} - \bar{y}_j\right)^2}{n_j - 1}$ is the variance within cluster j,
nj is the sample size of cluster j, N is the total sample size, and C is the total
number of clusters. In other words, $\hat{\sigma}^2$ is simply the weighted average of the within-cluster variances.
Estimation of τ2 involves a few more steps, but is not much more complex
than what we have seen for σ2. To obtain the sample estimate for variation
between clusters τ̂ 2, we must first calculate the weighted between-cluster
variance:
$\hat{S}_B^2 = \frac{\sum_{j=1}^{C} n_j\left(\bar{y}_j - \bar{y}\right)^2}{\tilde{n}\left(C - 1\right)} \quad (2.3)$
where $\bar{y}_j$ is the mean on the response variable for cluster j, $\bar{y}$ is the overall mean on the response variable, and

$\tilde{n} = \frac{1}{C - 1}\left(N - \frac{\sum_{j=1}^{C} n_j^2}{N}\right)$
Because $\hat{S}_B^2$ reflects both between- and within-cluster variation, the between-cluster variance is estimated as

$\hat{\tau}^2 = \hat{S}_B^2 - \frac{\hat{\sigma}^2}{\tilde{n}} \quad (2.4)$
Using these variance estimates, we can in turn calculate the sample estimate
of $\rho_I$:

$\hat{\rho}_I = \frac{\hat{\tau}^2}{\hat{\tau}^2 + \hat{\sigma}^2} \quad (2.5)$
Note that Equation (2.5) assumes that the clusters are of equal size. Clearly,
that will not always be the case, in which case this equation will not hold.
However, the purpose for its inclusion here is to demonstrate the principle
underlying the estimation of ρI, which holds even as the equation changes.
To illustrate estimation of ρI, let us consider the following data set.
Achievement test data were collected from 10,903 third grade students nested
within 160 schools. School enrollment sizes ranged from 11 to 143, with a
mean size of 68.14. In this case, we will focus on the reading achievement
test scores and use data from only five of the schools to make manual calcu-
lations easy to follow. First we will estimate σ̂ 2. To do so, we must estimate
the variance in scores within each school. These values appear in Table 2.1.
Using these variances and sample sizes, we can calculate $\hat{\sigma}^2$ as

$\hat{\sigma}^2 = \frac{\sum_{j=1}^{C}\left(n_j - 1\right)S_j^2}{N - C} = \frac{(58-1)5.3 + (29-1)1.5 + (64-1)2.9 + (39-1)6.1 + (88-1)3.4}{278 - 5}$

$= \frac{302.1 + 42 + 182.7 + 231.8 + 295.8}{273} = \frac{1054.4}{273} = 3.9$
TABLE 2.1
School Size, Mean, and Variance of Reading Achievement Test
School N Mean Variance
767 58 3.952 5.298
785 29 3.331 1.524
789 64 4.363 2.957
815 39 4.500 6.088
981 88 4.236 3.362
Total 278 4.149 3.916
The school means required for calculating $\hat{S}_B^2$ appear in Table 2.1 as well. First we must calculate $\tilde{n}$:

$\tilde{n} = \frac{1}{C-1}\left(N - \frac{\sum_{j=1}^{C} n_j^2}{N}\right) = \frac{1}{5-1}\left(278 - \frac{58^2 + 29^2 + 64^2 + 39^2 + 88^2}{278}\right) = \frac{1}{4}\left(278 - 63.2\right) = 53.7$
Using this value, we can then calculate $\hat{S}_B^2$ for the five schools in our small sample using Equation (2.3):

$\hat{S}_B^2 = \frac{\sum_{j=1}^{5} n_j\left(\bar{y}_j - \bar{y}\right)^2}{53.7(5-1)} = 0.140$

Applying Equation (2.4) then yields the between-school variance estimate:

$\hat{\tau}^2 = 0.140 - \frac{3.9}{53.7} = 0.140 - 0.073 = 0.067$
We have now calculated all the parts needed to estimate $\rho_I$ for the population:

$\hat{\rho}_I = \frac{0.067}{0.067 + 3.9} = 0.017$
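The entire hand calculation can be verified in R by entering the values from Table 2.1 directly. This is a minimal sketch of Equations (2.2) through (2.5); the object names are illustrative.

n.j    <- c(58, 29, 64, 39, 88)                  # school sizes
mean.j <- c(3.952, 3.331, 4.363, 4.500, 4.236)   # school means
var.j  <- c(5.298, 1.524, 2.957, 6.088, 3.362)   # within-school variances

N <- sum(n.j)      # total sample size, 278
C <- length(n.j)   # number of clusters, 5

sigma2  <- sum((n.j - 1) * var.j) / (N - C)                  # Equation (2.2), about 3.9
n.tilde <- (N - sum(n.j^2) / N) / (C - 1)                    # about 53.7
y.bar   <- sum(n.j * mean.j) / N                             # grand mean, about 4.149
SB2     <- sum(n.j * (mean.j - y.bar)^2) / (n.tilde * (C - 1))  # Equation (2.3), about 0.140
tau2    <- SB2 - sigma2 / n.tilde                            # Equation (2.4), about 0.067
tau2 / (tau2 + sigma2)                                       # Equation (2.5), about 0.017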
This result indicates very little correlation of test scores within the schools.
We can also interpret this value as the proportion of variation in the test
scores accounted for by the schools. Since ρ̂I is a sample estimate, we know
that it is subject to sampling variation, which can be estimated with a stan-
dard error as in Equation (2.6):
$s_{\rho_I} = \left(1 - \rho_I\right)\left(1 + (n - 1)\rho_I\right)\sqrt{\frac{2}{n(n-1)(N-1)}} \quad (2.6)$
The terms in Equation (2.6) are as defined previously, and the assumption
is that all clusters are of equal size. As noted earlier, this latter condition is
not a requirement, however, and an alternative formulation exists for cases
in which it does not hold. However, Equation (2.6) provides sufficient insight
for our purposes into the estimation of the standard error of the ICC.
The ICC is an important tool in multilevel modeling, in large part because
it indicates the degree to which a multilevel data structure may impact the
outcome variable of interest. Larger ICC values are indicative of a greater
impact of clustering. Thus, as the ICC increases in value, we must be more
cognizant of employing multilevel modeling strategies in data analysis.
In the next section, we will discuss the problems associated with ignoring
this multilevel structure, before we turn our attention to methods for dealing
with it directly.
When this multilevel structure is ignored, we may well miss important variables at the school level that may help explain performance at the student level. Therefore, beyond the known problem with misestimating standard errors, we also develop an incorrect model for understanding the outcome variable of interest. In the context
of multilevel linear models (MLMs), inclusion of variables at each level is
relatively simple, as are interactions among variables at different levels.
This greater model complexity in turn may lead to greater understanding
of the phenomenon under study.
Recall the single-level linear model of Equation (1.1):

$y = \beta_0 + \beta_1 x + \varepsilon$

In the multilevel context, each cluster potentially has its own intercept. We say potentially here because the single intercept model of Equation (1.1) will suffice if there is no cluster effect. In practice, assessing the existence of different means across clusters is an empirical question described below. It should also be noted that in this discussion we consider only the case where the intercept is cluster specific. It is also possible for β1 to vary by group, or even other coefficients from more complicated models.
Allowing for group-specific intercepts and slopes leads to the following notation, commonly used for the level 1 (micro) model in multilevel modeling:

$y_{ij} = \beta_{0j} + \beta_{1j}x_{ij} + \varepsilon_{ij}$

where the ij subscript refers to the ith individual in the jth cluster. We will begin our discussion of MLM notation and structure with the most basic multilevel model: predicting the outcome from only an intercept that we will allow to vary randomly for each group.
Equation (2.10) is termed the full or composite model, in which the multiple levels are combined into a unified equation. Often in MLM, we begin our analysis of a data set with this simple random intercept model, known as the null model, which takes the form

$y_{ij} = \gamma_{00} + U_{0j} + \varepsilon_{ij}$
While the null model does not provide information about the impacts of specific independent variables on the dependent variable, it does yield important information regarding how variation in y is partitioned between variance among individuals (σ²) and variance among the clusters (τ²). The total variance of y is simply the sum of σ² and τ². In addition, as we have already seen,
these values can be used to estimate ρI. The null model, as will be seen in
later sections, is also used as a baseline for model building and comparison.
Adding an independent variable measured at Level 1 gives

$y_{ij} = \gamma_{00} + \gamma_{10}x_{ij} + U_{0j} + \varepsilon_{ij} \quad (2.12)$

The model now includes the predictor and the slope relating it to the dependent variable, γ10, which we acknowledge as being at Level 1 by the subscript 10. We interpret γ10 in the same way as β1 in the linear regression model, i.e., as a measure of the impact on y of a one-unit change in x.
In addition, we can estimate ρI exactly as earlier although now it reflects the
correlation between individuals from the same cluster after controlling for
the independent variable, x. In this model, both γ10 and γ00 are fixed effects,
while σ2 and τ2 remain random.
One implication of the model in Equation (2.12) is that the dependent
variable is impacted by variations among individuals (σ2), variations
among clusters (τ2), an overall mean common to all clusters (γ00), and the
impact of the independent variable as measured by γ10, which is also com-
mon to all clusters.
Allowing the slope to vary randomly across clusters as well yields

$y_{ij} = \gamma_{00} + \gamma_{10}x_{ij} + U_{0j} + U_{1j}x_{ij} + \varepsilon_{ij} \quad (2.16)$

Written in this way, we have separated the model into its fixed (γ00 + γ10xij) and random (U0j + U1jxij + εij) components. The Equation (2.16) model simply indicates an interaction between cluster and x, such that the relationship of x and y is not constant across clusters.
Heretofore we discussed only one source of between-group variation, expressed as τ², that serves as the variation among clusters in the intercept. However, Equation (2.16) adds a second such source of between-group variance in the form of U1j, which indicates cluster variation on the slope relating the independent and dependent variables. To differentiate these two sources of between-group variance, we now denote the variance of U0j as $\tau_0^2$ and the variance of U1j as $\tau_1^2$. Furthermore, within clusters we expect U1j and U0j to have a covariance of $\tau_{01}$. However, across different clusters, these terms should be independent of one another, and in all cases it is assumed that ε remains independent of all other model terms. In practice, if we find that $\tau_1^2$ is not 0, we must be careful in describing the relationship between the independent and dependent variables, as it is not the same for all clusters.
We will revisit this idea in subsequent chapters. For the moment, h owever,
it is most important to recognize that variation in the dependent variable
y can be explained by several sources, some fixed and others random.
In practice, we will most likely be interested in estimating all of these sources
of variability in a single model.
As a means for further understanding the MLM, let us consider a simple
example using the five schools described above. In this context, we are inter-
ested in treating a reading achievement test score as the dependent vari-
able and a vocabulary achievement test score as the independent variable.
Remember that students are nested within schools so that a simple regres-
sion analysis is not appropriate. To understand the issue being estimated in
the context of MLM, we can obtain separate intercept and slope estimates for
each school as shown in Table 2.2.
Averaging across the five schools, the estimate of γ00, the average intercept value, is 2.359, and the estimate of the average slope value, γ10, is 0.375. Notice that for both parameters, the school values deviate from these means. For example, the intercept for school 1 is 1.230. The –1.129 difference between this value and 2.359 is U0j for that school. Similarly, the
TABLE 2.2
Intercept and Slope Estimates of Multilevel Linear Model
School Intercept U0j Slope U1j
1 1.230 –1.129 0.552 0.177
2 2.673 0.314 0.199 –0.176
3 2.707 0.348 0.376 0.001
4 2.867 0.508 0.336 –0.039
5 2.319 –0.040 0.411 0.036
Overall 2.359 0.375
difference between the slope for school 1 (0.552) and the average slope value (0.375) is 0.177, which is U1j for that school. Table 2.2 includes U0j and U1j
values for each school. The differences in slopes also provide informa-
tion about the relationship between vocabulary and reading test scores.
This relationship was positive for all schools, meaning that students who
scored higher on vocabulary also scored higher on reading. However, the
strength of this relationship was weaker for school 2 than for school 1, as
an example.
Based on the values in Table 2.2, it is also possible to estimate the variances associated with U1j and U0j, $\tau_1^2$ and $\tau_0^2$, respectively. The calculation of these variances is a straightforward matter, using

$\hat{\tau}_1^2 = \frac{\sum_{j=1}^{J}\left(U_{1j} - \bar{U}_1\right)^2}{J - 1} \quad (2.17)$
for the slopes and an analogous equation for the intercept random variance. We obtain $\hat{\tau}_0^2 = 0.439$ and $\hat{\tau}_1^2 = 0.016$. In other words, much more of
the variance in the dependent variable is accounted for by variation in the
intercepts at school level than is accounted for by variation in the slopes.
Another way to think of this result is that the schools exhibited greater dif-
ferences among one another in the mean level of achievement as compared
to differences in the impacts of x on y.
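School-specific intercepts and slopes like those in Table 2.2 can be obtained by fitting a separate OLS regression within each school, for example with lmList() from the nlme package. This is a minimal sketch under assumed names; the data frame reading and the variables read, vocab, and school are illustrative, not the book's data set.

library(nlme)

fits  <- lmList(read ~ vocab | school, data = reading)  # one regression per school
coefs <- coef(fits)                                     # intercept and slope by school

colMeans(coefs)              # analogues of gamma00 and gamma10
var(coefs[, "(Intercept)"])  # analogue of tau0-squared
var(coefs[, "vocab"])        # analogue of tau1-squared, as in Equation (2.17)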
The practice of obtaining these variance estimates using the R environ-
ment for statistical computing and graphics and interpreting their mean-
ing are subjects for upcoming chapters. Before discussing the practical
“nuts and bolts” of conducting this analysis, we first examine the basics for
estimating parameters in the MLM framework using maximum likelihood
and restricted maximum likelihood algorithms. While similar in spirit to
the simple calculations demonstrated above, they are different in practice
and will yield somewhat different results from those obtained using least
squares as above. First, one more issue warrants our attention as we consider
the use of MLM, namely variable centering.
2.4.3 Centering
Centering is simply the practice of subtracting the mean of a variable from
each individual value. This implies the mean for the sample of the centered
variables is 0 and also that each individual’s (centered) score represents a
deviation from the mean rather than representing the meaning of its raw
value. In the context of regression, centering is commonly used, for example,
to reduce collinearity caused by including an interaction term in a regression
model. If the raw scores of the independent variables are used to calculate
the interaction and both the main effects and interaction terms are included
in the subsequent analysis, it is very likely that collinearity will cause prob-
lems in the standard errors of the model parameters. Centering is a way to
help avoid such problems (Iversen, 1991).
Such issues are also important to consider in MLM, in which interactions
are frequently employed. In addition, centering is also a useful tool for avoid-
ing collinearity caused by highly correlated random intercepts and slopes
in MLMs (Wooldridge, 2004). Finally, centering provides a potential advan-
tage in terms of interpretation of results. Remember from our discussion
in Chapter 1 that the intercept is the value of the dependent variable when
the independent variable is set to 0. In many applications (e.g., a measure of
vocabulary), the independent variable cannot reasonably be 0. This essen-
tially renders the intercept as a necessary value for fitting the regression line
but not one that has a readily interpretable value. However, when x has been
centered, the intercept takes on the value of the dependent variable when
the independent is at its mean. This is a much more useful interpretation for
researchers in many situations, and yet another reason why centering is an
important aspect of modeling, particularly in the multilevel context.
Probably the most common approach to centering is to calculate the differ-
ence between each individual’s score and the overall, or grand mean across the
entire sample. This grand mean centering is certainly the most commonly used
method in practice (Bickel, 2007). It is not, however, the only manner of cen-
tering data. An alternative approach known as group mean centering involves
calculating the difference between each individual score and the mean of
the cluster to which it belongs. In our school example, grand mean centering
would involve calculating the difference between each score and the overall
mean across schools, while group mean centering would lead the researcher
to calculate the difference between each score and the mean for the school.
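In R, both approaches amount to a single subtraction. A minimal sketch appears below; the data frame reading and the variables vocab and school are illustrative assumptions, and ave() is used to compute each school's mean for group mean centering.

# Grand mean centering: subtract the overall mean
reading$vocab.grand <- reading$vocab - mean(reading$vocab, na.rm = TRUE)

# Group mean centering: subtract each school's own mean
school.means <- ave(reading$vocab, reading$school,
                    FUN = function(x) mean(x, na.rm = TRUE))
reading$vocab.group <- reading$vocab - school.means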
While the literature indicates some disagreement regarding which
approach may be best for reducing the harmful effects of collinearity (Bryk
& Raudenbush, 2002; Snijders & Bosker, 1999), researchers demonstrated
that either technique will work well in most cases (Kreft, de Leeuw, &
Aiken, 1995). Therefore, the choice of which approach to use must be made
on substantive grounds regarding the nature of the relationship between x
and y. By using grand mean centering, we implicitly compare individuals
to one another (in the form of the overall mean) across an entire sample.
Maximum likelihood estimation (MLE) selects parameter estimates that maximize the likelihood that the observed data arose from a population with parameters close to those used to generate the predicted values. In practice, MLE is an iterative methodology in which the algorithm searches for parameter values that will maximize the likelihood of the observed data (i.e., produce predicted values that are as close as possible to observed values). MLE may be computationally intensive, particularly for complex models and large samples.
The additional piece of Equation (2.19) is $\gamma_{h1}z_j$, the product of a Level 2 slope ($\gamma_{h1}$) and the average vocabulary score for the school ($z_j$). In other words, the mean school performance is related directly to the coefficient linking the individual vocabulary score to the individual reading score. For our specific example, we can combine Equations (2.18) and (2.19) to yield a single equation for the two-level MLM:

$y_{ij} = \gamma_{00} + \gamma_{10}x_{ij} + \gamma_{01}z_j + \gamma_{11}x_{ij}z_j + U_{0j} + U_{1j}x_{ij} + \varepsilon_{ij} \quad (2.20)$
Each of these model terms has been defined previously in this chapter: γ00 is
the intercept or grand mean for the model, γ10 is the fixed effect of variable
x (vocabulary) on the outcome, U0j represents the random variation for the
intercept across groups, and U1j represents the random variation for the
slope across groups.
The additional pieces of Equation (2.20) are γ01 and γ11. The γ01 represents the fixed effect of the Level 2 variable z (average vocabulary) on the outcome. The new term is the cross-level interaction, γ11xijzj. As the name implies, the cross-level interaction is simply an interaction of Level 1 and Level 2 predictors. In this context, it represents the interaction between an individual's vocabulary score and the mean vocabulary score for his or her school. The coefficient for this interaction term, γ11, assesses the extent to which the relationship between a student's vocabulary score and overall reading achievement is moderated by the mean for the school attended. A large significant value for this coefficient would indicate that the relationship between an individual's vocabulary test score and overall reading achievement is dependent on the level of vocabulary achievement at his or her school.
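Looking ahead to the model fitting covered in Chapter 3, a model of this form could be specified with the nlme package roughly as follows. The data frame reading and the variables read, vocab, school.vocab, and school are illustrative assumptions, not the book's data.

library(nlme)

cross.level <- lme(fixed = read ~ vocab * school.vocab,  # gamma10, gamma01, and gamma11
                   random = ~ vocab | school,            # random intercept and slope
                   data = reading)
summary(cross.level)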
Before formulating the rest of the model, we must evaluate whether the
slopes and intercepts are random at both Levels 2 and 3 or only at Level 1, for
example. This decision should always be based on the theory surrounding
the research questions, what is expected in the population, and what is
revealed in the empirical data. We will proceed with the remainder of this
discussion under the assumption that the Level 1 intercepts and slopes are
random for both Levels 2 and 3 in order to provide a complete description
of the most complex model possible when three levels of data structure are
present. When the Level 1 coefficients are not random at both levels, the
terms in the following models for which this randomness is not present
would simply be removed. We will address this issue more specifically in
Chapter 4 when we discuss the fitting of three-level models using R. The
Level 2 and Level 3 contributions to the MLM described in Equation (2.13)
appear below.
We can then use simple substitution to obtain the expression for the Level 1
intercept and slope in terms of both Level 2 and Level 3 parameters.
In turn, these terms may be substituted into Equation (2.15) to provide the
full three-level MLM.
Summary
The goal of this chapter was to introduce the basic theoretical underpinnings
of multilevel modeling, but not to provide an exhaustive technical discus-
sion of these issues. A number of useful resources can provide comprehen-
sive details and are listed in the references at the end of the book. However,
the information in this chapter should be adequate as we move forward with
multilevel modeling using R software. We recommend that you make liberal
use of the information provided here while reading subsequent chapters.
This should provide you with a complete understanding of the output gener-
ated by R that we will be examining. In particular, when interpreting output
from R, it may be helpful for you to return to this chapter to review precisely
what each model parameter means.
In the next two chapters, we will take the theoretical information from
this chapter and apply it to real data sets using two different R libraries,
nlme and lme4, both of which were developed for conducting multilevel
analyses with continuous outcome variables. In Chapter 5, we will examine
how these ideas can be applied to longitudinal data. Chapters 7 and 8 will
discuss multilevel modeling for categorical dependent variables. In Chapter
9, we will diverge from the likelihood-based approaches described here and
explain multilevel modeling within the Bayesian framework, focusing on
applications and learning when this method may be appropriate and when
it may not.
3
Fitting Two-Level Models in R
The lme() function from the nlme package is used throughout this chapter. To fit the null model for the reading achievement data, we first load the package and then specify a model with only a fixed and a random intercept:

library(nlme)
Model3.0 <- lme(fixed = geread ~ 1, random = ~1|school, data = Achieve)
Random effects:
Formula: ~1 | school
(Intercept) Residual
StdDev: 0.6257119 2.24611
Because lme() reports standard deviations rather than variances, these values must be squared before computing the ICC:

$\hat{\rho}_I = \frac{0.6257119^2}{0.6257119^2 + 2.24611^2} = \frac{0.3915}{0.3915 + 5.0450} = 0.072$

We interpret this value to mean that the correlation of reading test scores among students within the same schools is approximately 0.07.
To fit the model with vocabulary as the independent variable using lme, we submit the following syntax in R:

Model3.1 <- lme(fixed = geread ~ gevocab, random = ~1|school, data = Achieve)
In the first part of the function call, we define the formula for the model's
fixed effects, much as we define a linear regression model using lm(). The
statement fixed = geread~gevocab says that the reading score is predicted
by the vocabulary score fixed effect. The random part of the function call
defines the random effects and the nesting structure. If only a random
intercept is desired, the syntax for the intercept is 1. In this example,
random = ~1|school indicates that only a random intercepts model will be
fit, with the intercept varying by school. This corresponds to the data
structure of students nested within schools. Fitting this model, which is
saved in the output object Model3.1, we obtain the following output by
entering the name of the output object.
Model3.1
Linear mixed-effects model fit by REML
Data: Achieve
Log-restricted-likelihood: -21568.6
Fixed: geread ~ gevocab
(Intercept) gevocab
2.0233559 0.5128977
Random effects:
Formula: ~1 | school
(Intercept) Residual
StdDev: 0.3158785 1.940740
Output from the lme() function provides parameter estimates for the fixed
effects and standard deviations for the random effects along with a summary
of the number of Level 1 and Level 2 units in the sample. As with the output
from the lm() function, however, the output from the lme() function provides
limited information. If we desire more detailed information about the model,
including significance tests for parameter estimates and model fit statistics,
we can request a model summary. The summary() command will provide
the following:
summary(Model3.1)
Linear mixed-effects model fit by REML
Data: Achieve
AIC BIC logLik
43145.2 43174.17 -21568.6
Random effects:
Formula: ~1 | school
(Intercept) Residual
StdDev: 0.3158785 1.940740
From this summary we obtain AIC, BIC, and log likelihood information that
can be used for model comparisons in addition to parameter significance
tests. We can also obtain a correlation between the fixed effect slope and
the fixed effect intercept as well as a brief summary of the model residu-
als including the minimum, maximum, and first, second (median, denoted
Med), and third quartiles.
The correlation of the fixed effects represents the estimated correlation
if we had repeated samples of the two fixed effects (i.e., the intercept and
slope for gevocab). Often this correlation is not particularly interesting.
From this output, we can see that gevocab is a significant predictor of
geread (t = 61.258, p < 0.05), and that as vocabulary score increases by
1 point, reading ability increases by 0.513 points. We can compare the fit
for this model with that of the null model by referring to the AIC and BIC
statistics. Recall that smaller values reflect better model fit. For Model 3.1,
the AIC and BIC are 43145.2 and 43174.17, respectively. For Model 3.0, the
AIC and BIC were 46274.31 and 46296.03. Because the values for both sta-
tistics are smaller for Model 3.1, we would conclude that it provides a
better fit to the data. Substantively, this means that we should include the
predictor variable gevocab, a conclusion that the results of the hypothesis
test also supported.
In addition to the fixed effects in Model 3.1, we can also ascertain
how much variation in geread is present across schools. Specifically,
the output shows that after accounting for the impact of gevocab, the
estimate of variation in intercepts across schools is 0.3158785, while the
within-school variation is estimated as 1.940740. We can tie these numbers
directly back to our discussion in Chapter 2, where $\tau_0^2 = 0.3158785$
and $\sigma^2 = 1.940740$. In addition, the overall fixed intercept, denoted
$\gamma_{00}$ in Chapter 2, is 2.0233559, which is the mean of geread when
the gevocab score is 0.
Finally, it is possible to estimate the proportion of variance in the outcome
variable accounted for at each level of the model. In Chapter 1, we saw that
with single-level OLS regression models, the proportion of response variable
variance accounted for by the model is expressed as $R^2$. In the context of
multilevel modeling, $R^2$ values can be estimated for each level of the model
(Snijders & Bosker, 1999). For Level 1, we can calculate

$$R_1^2 = 1 - \frac{\sigma^2_{M1} + \tau^2_{M1}}{\sigma^2_{M0} + \tau^2_{M0}} = 1 - \frac{1.940740 + 0.3158785}{2.24611 + 0.6257119} = 1 - \frac{2.2566185}{2.8718219} = 1 - 0.7857794 = 0.2142206$$

where M1 denotes the fitted model and M0 the null model.
This result tells us that Level 1 of Model 3.1 explains approximately 21% of
the variance in the reading score above and beyond that accounted for in the
null model. We can also calculate a Level 2 $R^2$ value:

$$R_2^2 = 1 - \frac{\sigma^2_{M1}/B + \tau^2_{M1}}{\sigma^2_{M0}/B + \tau^2_{M0}}$$
where B is the average size of the Level 2 units (schools in this case). R pro-
vides the number of individuals in the sample (10320) and the number of
schools (160) so that we can calculate B as 10320/160 = 64.5. We can now
estimate
$$R_2^2 = 1 - \frac{1.940740/64.5 + 0.3158785}{2.24611/64.5 + 0.6257119} = 1 - \frac{0.3459675}{0.6605353} = 1 - 0.523769 = 0.476231$$

This result tells us that Level 2 of Model 3.1 explains approximately 48% of
the school-level variance in the reading score above and beyond that accounted
for in the null model.
The model in the previous example was quite simple and incorpo-
rated only a single Level 1 predictor. In many applications, researchers
utilize predictor variables at both Level 1 (student) and Level 2 (school).
Incorporation of predictors at higher levels of analysis is straightfor-
ward in R and is handled in exactly the same manner as incorporation
of Level 1 predictors. For example, let us assume that in addition to a
student's vocabulary test performance, a researcher also wants to determine
whether school enrollment (senroll) has a statistically significant impact
on overall reading score. In that instance, adding the school enrollment
Level 2 predictor would result in the following R syntax:
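Model3.2 <- lme(fixed = geread~gevocab+senroll, random = ~1|school, data = Achieve)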
summary(Model3.2)
Linear mixed-effects model fit by REML
Data: Achieve
AIC BIC logLik
43162.1 43198.31 -21576.05
Random effects:
Formula: ~1 | school
(Intercept) Residual
StdDev: 0.3167654 1.940760
Note that in this function call senroll is included only in the fixed part
of the model and not in the random part. It thus has only a fixed (average)
effect, which is the same across all schools. We will see shortly how to
incorporate a random coefficient into this model.
From these results we can see that enrollment did not have a statistically
significant relationship with reading achievement. In addition, notice some
minor changes in the estimates of the other model parameters, and a fairly
large change in the correlation between the fixed slope for gevocab and the
fixed intercept. The slope for senroll and the intercept were strongly
negatively correlated, while the two fixed-effect slopes exhibited virtually
no correlation. As noted earlier, these correlations are typically not very
helpful for explaining the dependent variable and are rarely discussed in
any detail in reports of analysis results. The $R^2$ values for Levels 1 and
2 appear below.
$$R_1^2 = 1 - \frac{\sigma^2_{M1} + \tau^2_{M1}}{\sigma^2_{M0} + \tau^2_{M0}} = 1 - \frac{1.940760 + 0.3167654}{2.24611 + 0.6257119} = 1 - \frac{2.2575254}{2.8718219} = 1 - 0.7860952 = 0.2139048$$
$$R_2^2 = 1 - \frac{\sigma^2_{M1}/B + \tau^2_{M1}}{\sigma^2_{M0}/B + \tau^2_{M0}} = 1 - \frac{1.940760/64.5 + 0.3167654}{2.24611/64.5 + 0.6257119} = 1 - \frac{0.3468547}{0.6605353} = 1 - 0.525112 = 0.474888$$
The incorporation of random coefficients (slopes) into a multilevel model
using lme occurs in the random part of the model syntax. When defining
random effects, as mentioned above, 1 stands for the intercept, so that if
all we desire is a random intercepts model, as in the previous example, the
syntax ~1|school is sufficient. If, however, we want to allow a Level 1
slope to vary randomly, we change this part of the syntax (recall that
gevocab is already included in the fixed part of the model).
Let us return to the Model 3.1 scenario, but this time allow both the slope
and intercept for gevocab to vary randomly from one school to another.
The syntax for this model would now become
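Model3.3 <- lme(fixed = geread~gevocab, random = ~gevocab|school, data = Achieve)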
This model differs from Model 3.1 only in that the 1 in the random line is
replaced by the variable name whose effect we want to be random. Notice
that we no longer explicitly state a random intercept in the specification.
After a random slope is defined, the random intercept becomes implicit so
we no longer need to specify it (i.e., it is included by default). If we do not
want the random intercept while modeling the random coefficient, we would
include a –1 immediately prior to gevocab. The random slope and intercept
syntax will generate the following model summary:
summary(Model3.3)
Linear mixed-effects model fit by REML
Data: Achieve
AIC BIC logLik
43004.85 43048.3 -21496.43
Random effects:
Formula: ~gevocab | school
Structure: General positive-definite, Log-Cholesky parametrization
StdDev Corr
(Intercept) 0.5316640 (Intr)
gevocab 0.1389372 -0.858
Residual 1.9146629
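We can also allow more than one Level 1 coefficient to vary across schools.
A call of the following form, consistent with the random effects formula in
the output below, adds age to both the fixed and random parts of the model:

Model3.4 <- lme(fixed = geread~gevocab+age, random = ~gevocab+age|school, data = Achieve)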
summary(Model3.4)
Linear mixed-effects model fit by REML
Data: Achieve
AIC BIC logLik
43015.77 43088.18 -21497.88
Random effects:
Formula: ~gevocab + age | school
Structure: General positive-definite, Log-Cholesky parametrization
StdDev Corr
(Intercept) 0.492561805 (Intr) gevocb
gevocab 0.137974552 -0.073
age 0.006388612 -0.649 -0.601
Residual 1.914030323
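We may also wish to test whether the relationship between vocabulary and
reading depends on student age, which calls for including the interaction
between gevocab and age. A call of the following form, consistent with the
output below, fits this model (in R, gevocab*age expands to the two main
effects plus their interaction):

Model3.5 <- lme(fixed = geread~gevocab*age, random = ~1|school, data = Achieve)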
summary(Model3.5)
Linear mixed-effects model fit by REML
Data: Achieve
AIC BIC logLik
43155.49 43198.94 -21571.75
Random effects:
Formula: ~1 | school
(Intercept) Residual
StdDev: 0.3142524 1.939708
We can see from the output of Model 3.5 that both age (t = –3.65, p < 0.01)
and the interaction (gevocab:age) between age and vocabulary (t = 2.87,
p < 0.01) are significant predictors of reading. Focusing on the interaction,
the sign on the coefficient is positive. This indicates an enhancing effect:
as age increases, the relationship of reading and vocabulary becomes
stronger.
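A parallel model replaces age with school enrollment as the moderator. A call
of the following form, again consistent with the output below, fits this model:

Model3.6 <- lme(fixed = geread~gevocab*senroll, random = ~1|school, data = Achieve)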
summary(Model3.6)
Linear mixed-effects model fit by REML
Data: Achieve
AIC BIC logLik
43175.57 43219.02 -21581.79
Random effects:
Formula: ~1 | school
(Intercept) Residual
StdDev: 0.316492 1.940268
Correlation:
(Intr) gevocb senrll
gevocab -0.782
senroll -0.958 0.735
gevocab:senroll 0.752 -0.960 -0.766
The output from Model 3.6 has a similar interpretation. When school
enrollment is used instead of age as a predictor, the main effect of vocabulary
(t = 19.59, p < 0.001) and the interaction between vocabulary and school enroll-
ment (t = –2.51, p < 0.05) are significant predictors of reading achievement.
Focusing on the interaction, since the sign on the coefficient is negative we
would conclude that there is a buffering or inhibitory effect. In other words,
as school size increases, the relationship between vocabulary and reading
achievement becomes weaker.
After mean centered versions of the predictors are created, they can be
incorporated into the model in the same manner used earlier.
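As a minimal sketch, mean centered predictors can be created by subtracting
each variable's mean; the centered variable names Cgevocab and Cage used
here are illustrative rather than taken from the original analysis:

Achieve$Cgevocab <- Achieve$gevocab - mean(Achieve$gevocab)
Achieve$Cage <- Achieve$age - mean(Achieve$age)
Model3.5.C <- lme(fixed = geread~Cgevocab*Cage, random = ~1|school, data = Achieve)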
summary(Model3.5.C)
Linear mixed-effects model fit by REML
Data: Achieve
AIC BIC logLik
43155.49 43198.94 -21571.75
Random effects:
Formula: ~1 | school
(Intercept) Residual
StdDev: 0.3142524 1.939708
First, notice the identical model fit (compare AIC, BIC, and log likelihood)
of the centered and uncentered models. This is a good way to ensure that
centering worked. Looking now to the fixed effects of the model, we see
some changes in their interpretation. These differences are likely due to
multicollinearity issues in the original uncentered model. The interaction
is still significant (t = 2.87, p < 0.05), but we now see a significant effect of
vocabulary (t = 61.15, p < 0.01). Age is no longer a significant predictor (t = –1.73,
p > 0.05). Focusing on the interaction, recall that when predictors are centered,
an interaction can be interpreted as the effect of one variable while holding
the second variable constant. Since the sign on the interaction is positive,
vocabulary has a positive impact on reading ability if we hold age constant.
The model is defined in much the same way as it was for the lme() function,
with the outcome variable modeled as a linear combination of the fixed and
random effects. The only difference in the treatment of fixed and random
effects is that the random effects require information about the nesting
structure (students within schools in this case), that is, the grouping
factor within which they vary. The primary difference in model syntax
between lme and lmer is that the random effect is denoted by its appearance
within parentheses rather than through explicit assignment using the random
statement. This syntax
will yield the following output:
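A call consistent with the formula reported in the output below, after
loading the lme4 package, is:

library(lme4)
Model3.7 <- lmer(geread~gevocab + (1|school), data = Achieve)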
Model3.7
Linear mixed model fit by REML
Formula: geread ~ gevocab + (1 | school)
Data: Achieve
AIC BIC logLik deviance REMLdev
43145 43174 -21569 43124 43137
Random effects:
Groups Name Variance Std.Dev.
school (Intercept) 0.099779 0.31588
Residual 3.766470 1.94074
Number of obs: 10320, groups: school, 160
Fixed effects:
Estimate Std. Error t value
(Intercept) 2.023343 0.049305 41.04
gevocab 0.512901 0.008373 61.26
From this output, we can see one obvious benefit of the lme4 package: all of
the important information is presented without requiring the use of a
summary statement. The function call alone is enough to provide model fit
statistics, parameter estimates, parameter significance tests, parameter
estimate correlations, residuals, and sample summaries. We can also see that
the lme4 package includes the deviance and REML-estimated deviance in the
model fit statistics, in addition to the AIC, BIC, and log likelihood
reported in the nlme package. What the lme4 package does not include are
p values for the model coefficients.
In comparing the outputs of lme and lmer, we notice that while both
t values and accompanying p values are reported in the nlme package,
only the t values for fixed effects are reported in lme4. The reason for
this discrepancy in the reported results, and specifically for the lack
of p values is somewhat complex and is not within the scope of this
book. However, we should note that the standard approach of obtaining
p values from the reference t distribution, which would seem to be the
intuitively correct step, does not in fact yield correct values in many
cases. Therefore, some alternative approach for obtaining them is
necessary.
Douglas Bates, the developer of lme4, recommends the use of Markov
chain Monte Carlo (MCMC) methods to obtain p values for mixed
model effects. We review MCMC in greater detail in Chapter 9 so that
readers may gain an understanding of how this method works. We can
say at this point that the computer-intensive MCMC approach relies on
generating a posterior distribution for each model parameter, then using
the distributions to obtain p values and confidence intervals for each
parameter estimate. To obtain MCMC p values and confidence intervals
for lmer objects, we must install the coda and languageR packages and
then use the following command sequence to obtain the desired statistics
for Model 3.7.
library(coda)
library(languageR)
Model3.7.pvals <- pvals.fnc(Model3.7, nsim = 10000, withMCMC = TRUE)
These commands first load the two libraries we need. We then create an
object that contains the p values and confidence intervals for the various
terms in Model 3.7 in the object Model3.7.pvals. The actual function
that we use is pvals.fnc, which is part of the languageR library. In
turn, this function calls the mcmcsamp function from the lme4 library.
Three elements are included in this function call, including the name of
the lmer object that contains the model fit results (Model3.7), the num-
ber of simulated data sets we want to sample by using MCMC (nsim), and
whether we want results of each of these 10000 MCMC draws to be saved
(withMCMC = TRUE). Setting this last condition to TRUE is not necessary,
as we are interested only in summary statistics. We can obtain the relevant
information for the fixed and random portions of the model by typing the
following commands.
Model3.7.pvals$fixed
Model3.7.pvals$random
From these results, we can determine that the vocabulary score was statisti-
cally significantly related to the reading score, and that the random
effects, school and Residual, were both different from 0 as well, since
neither of their confidence intervals included 0.
Returning to model definition using lmer(), multiple predictors at any
level and interactions between predictors at any level are again entered
in the model in the same manner as using the lm() or lme() functions.
The following is the syntax for fitting Model 3.8 using lmer.
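A call consistent with the fixed effects reported below, adding senroll to
the fixed portion of Model 3.7, is:

Model3.8 <- lmer(geread~gevocab + senroll + (1|school), data = Achieve)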
Fixed effects:
Estimate Std. Error t value
(Intercept) 2.0748764 0.1139915 18.20
gevocab 0.5128742 0.0083733 61.25
senroll -0.0001026 0.0002051 -0.50
Model3.8.pvals <- pvals.fnc(Model3.8, nsim = 10000, withMCMC = TRUE)
Model3.8.pvals$fixed
Model3.8.pvals$random
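The fixed effects that follow come from a random coefficients model. A call
along the following lines, with a random slope for gevocab, is consistent
with the estimates shown, although the model label and exact call are
assumptions on our part:

Model3.9 <- lmer(geread~gevocab + (gevocab|school), data = Achieve)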
Fixed effects:
Estimate Std. Error t value
(Intercept) 2.00570 0.06109 32.83
gevocab 0.52036 0.01442 36.09
We must note here that the MCMC approach for obtaining hypothesis test
results for models estimated using lmer is not currently available for ran-
dom coefficient models.
Although, for the most part, the syntax of lme4 is fairly similar to that of
lme for relatively simple models, incorporating multiple random slopes into
multilevel models using lme4 is somewhat different. The random effects
discussed for the nlme package assume correlated or nested levels. Random
effects in lme4 may be either correlated or uncorrelated. In this respect, lme4
provides greater modeling flexibility. This difference in model specification
is illustrated in Models 3.10 and 3.11 below.
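Model 3.10 places both random slopes in a single term, allowing the intercept
and the two slopes to correlate; a call matching the formula in the output
below is:

Model3.10 <- lmer(geread~gevocab + age + (gevocab + age|school), data = Achieve)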
Model3.10
Linear mixed model fit by REML
Formula: geread ~ gevocab + age + (gevocab + age | school)
Data: Achieve
AIC BIC logLik deviance REMLdev
43015 43088 -21498 42974 42995
Random effects:
Groups Name Variance Std.Dev. Corr
school (Intercept) 1.8361e-02 0.135503
gevocab 1.9026e-02 0.137936 0.465
age 2.4641e-05 0.004964 -0.197 -0.960
Residual 3.6641e+00 1.914182
Number of obs: 10320, groups: school, 160
Fixed effects:
Estimate Std. Error t value
(Intercept) 2.965272 0.413052 7.18
gevocab 0.519278 0.014351 36.18
age -0.008881 0.003822 -2.32
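Model 3.11, by contrast, places each random slope in its own term so that the
two sets of random effects are uncorrelated; a call matching the formula in
the output below is:

Model3.11 <- lmer(geread~gevocab + age + (gevocab|school) + (age|school), data = Achieve)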
Model3.11
Linear mixed model fit by REML
Formula: geread ~ gevocab + age + (gevocab | school) + (age |
school)
Data: Achieve
AIC BIC logLik deviance REMLdev
43017 43089 -21498 42975 42997
Random effects:
Groups Name Variance Std.Dev. Corr
school (Intercept) 2.1436e-01 0.46299441
gevocab 1.9194e-02 0.13854364 -0.976
Fixed effects:
Estimate Std. Error t value
(Intercept) 2.973619 0.414551 7.17
gevocab 0.519191 0.014397 36.06
age -0.008956 0.003798 -2.36
Notice the difference in how random effects are expressed in lmer between
Models 3.10 and 3.11. Output in Model 3.10 provides identical estimates to
those of the nlme Model 3.4. With random effects, R reports estimates for the
variability of the random intercept, variability for each random slope, and
the correlations between the random intercept and random slopes. Output
in Model 3.11, however, reports two different sets of uncorrelated random
effects.
The first set reports the variability of the random intercept, the
variability of the random slope for vocabulary, and the correlation between
that intercept and slope. The second set reports the variability of a second
random intercept, the variability of the random slope for age, and the
correlation between that intercept and slope. The random slope for
vocabulary and the random slope for age are not allowed to correlate.
Finally, we can obtain
p values and confidence intervals for each model term using the pvals.fnc
function based on the MCMC approach reviewed earlier in this chapter.
To request maximum likelihood (ML) rather than the default restricted
maximum likelihood (REML) estimation method in nlme, the call includes
method = "ML". Model 3.13 depicts the fitting of the same multilevel model
using the lme4 package. The call to designate the use of ML is REML = FALSE.
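As a sketch, refitting the random intercept model of Model 3.1 under ML in
each package (the use of this particular specification for Models 3.12 and
3.13 is an assumption) would take the following form:

Model3.12 <- lme(fixed = geread~gevocab, random = ~1|school, data = Achieve, method = "ML")
Model3.13 <- lmer(geread~gevocab + (1|school), data = Achieve, REML = FALSE)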
The chi-square values based on each model's deviance can then be used to
compare model fit. After each of the models in question has been fit, the
chi-square difference test can be obtained using the anova() function call.
For models run using the nlme package, the anova() command will pro-
vide accurate comparisons only if maximum likelihood estimation is used.
For models run using lme4, the anova() command will work for both
maximum likelihood and restricted maximum likelihood. When maximum
likelihood is used, both fixed and random effects are compared simultane-
ously. When restricted maximum likelihood is used, only random effects are
compared. The following is an example of comparing fit with the chi-square
difference statistic for Models 3.1 and 3.2 that were discussed in detail above.
anova(Model3.1, Model3.2)
         Model df      AIC      BIC    logLik   Test   L.Ratio p-value
Model3.1     1  4 43132.43 43161.40 -21562.22
Model3.2     2  5 43134.18 43170.39 -21562.09 1 vs 2 0.2550617  0.6135
Confidence intervals for the model parameters can be obtained with the
intervals() function in nlme; here we apply it to Model 3.3.
intervals(Model3.3)
Fixed effects:
lower est. upper
(Intercept) 1.8859621 2.0057064 2.1254506
gevocab 0.4920982 0.5203554 0.5486126
attr(,"label")
[1] "Fixed effects:"
Random Effects:
Level: school
lower est. upper
sd((Intercept)) 0.4250700 0.5316531 0.6649611
sd(gevocab) 0.1153701 0.1389443 0.1673356
cor((Intercept),gevocab) -0.9178709 -0.8585096 -0.7615768
For the random intercept, the 95% confidence interval for the standard
deviation lies between 0.425 and 0.665. Thus, we are 95% confident that the
standard deviation of the random intercepts falls between these two values.
Likewise, the 95% confidence interval for the standard deviation of the
random slopes lies between 0.115 and 0.167. From these values, we can see
that 0 did not lie in the interval for either random effect, intercept or
slope. Thus, we can conclude that there was significant variation in both
the intercepts and the slopes across schools.
Summary
This chapter put the concepts learned in Chapter 2 to work using R. We
learned the basics of fitting two-level models for a continuous dependent
variable using the lme() and lmer() functions. Within this multilevel
framework, we learned how to fit the null, random intercept, and random
slopes models. We also covered independent variables at both levels of data
and learned how to compare the fits of models with one another. This last
point will prove particularly useful as we engage in the process of select-
ing the most parsimonious (simplest) model that also explains the depen-
dent variable adequately. Of greatest import in this chapter, however, is
the ability to fit multilevel models using both nlme and lme4 in R and to
correctly interpret the resultant output. If you have mastered those skills,
you are ready to move on to Chapter 4, where we extend the model to include
a third level in the hierarchy. As we will see, the actual fitting of
three-level models is very similar to the fitting of the two-level models
studied in this chapter.