Zuur Protocol 2010
Zuur Protocol 2010
Zuur Protocol 2010
doi: 10.1111/j.2041-210X.2009.00001.x
Highland Statistics Ltd, Newburgh, UK; 2Oceanlab, University of Aberdeen, Newburgh, UK; and 3Department of
Ecology and Evolutionary Biology and Center for Conservation Biology, University of Connecticut, Storrs, CT, USA
Summary
1. While teaching statistics to ecologists, the lead authors of this paper have noticed common statistical problems. If a random sample of their work (including scientic papers) produced before doing
these courses were selected, half would probably contain violations of the underlying assumptions
of the statistical techniques employed.
2. Some violations have little impact on the results or ecological conclusions; yet others increase
type I or type II errors, potentially resulting in wrong ecological conclusions. Most of these violations can be avoided by applying better data exploration. These problems are especially troublesome in applied ecology, where management and policy decisions are often at stake.
3. Here, we provide a protocol for data exploration; discuss current tools to detect outliers, heterogeneity of variance, collinearity, dependence of observations, problems with interactions, double
zeros in multivariate analysis, zero ination in generalized linear modelling, and the correct type of
relationships between dependent and independent variables; and provide advice on how to address
these problems when they arise. We also address misconceptions about normality, and provide
advice on data transformations.
4. Data exploration avoids type I and type II errors, among other problems, thereby reducing the
chance of making wrong ecological conclusions and poor recommendations. It is therefore essential
for good quality management and policy based on statistical analyses.
Key-words: collinearity, data exploration, independence, transformations, type I and II
errors, zero ination
Introduction
The last three decades have seen an enormous expansion of the
statistical tools available to applied ecologists. A short list
of available techniques includes linear regression, generalized
linear (mixed) modelling, generalized additive (mixed) modelling, regression and classication trees, survival analysis, neural networks, multivariate analysis with all its many methods
such as principal component analysis (PCA), canonical correspondence analysis (CCA), (non-)metric multidimensional
scaling (NMDS), various time series and spatial techniques,
etc. Although some of these techniques have been around for
some time, the development of fast computers and freely available software such as R (R Development Core Team 2009)
makes it possible to routinely apply sophisticated statistical
techniques on any type of data. This paper is not about these
methods. Instead, it is about the vital step that should, but
frequently does not, precede their application.
4 A. F. Zuur et al.
ple in CCA and redundancy analysis (RDA, ter Braak & Verdonschot 1995), or the Mantel test (Legendre & Legendre
1998), temporal or spatial correlation between observations
can increase type I errors (rejecting the null hypothesis when it
is true).
The same holds with regression-type techniques applied on
temporally or spatially correlated observations. One of the
most used, and misused, techniques is without doubt linear
regression. Often, this technique is associated with linear patterns and normality; both concepts are often misunderstood.
Linear regression is more than capable of tting nonlinear relationships, e.g. by using interactions or quadratic terms (Montgomery & Peck 1992). The term linear in linear regression
refers to the way parameters are used in the model and not to
the type of relationships that are modelled. Knowing whether
we have linear or nonlinear patterns between response and
explanatory variables is crucial for how we apply linear regression and related techniques. We also need to know whether the
data are balanced before including interactions. For example,
Zuur, Ieno & Smith (2007) used the covariates sex, location
and month to model the gonadosomatic index (the weight of
the gonads relative to total body weight) of squid. However,
both sexes were not measured at every location in each month
due to unbalanced sampling. In fact, the data were so unbalanced that it made more sense to analyse only a subset of the
data, and refrain from including certain interactions.
With this wealth of potential pitfalls, ensuring that the scientist does not discover a false covariate eect (type I error),
wrongly dismiss a model with a particular covariate (type II
error) or produce results determined by only a few inuential
observations, requires that detailed data exploration be applied
before any statistical analysis. The aim of this paper is to provide a protocol for data exploration that identies potential
problems (Fig. 1). In our experience, data exploration can take
up to 50% of the time spent on analysis.
Although data exploration is an important part of any analysis, it is important that it be clearly separated from hypothesis
testing. Decisions about what models to test should be made
a priori based on the researchers biological understanding of
the system (Burnham & Anderson 2002). When that understanding is very limited, data exploration can be used as a
hypothesis-generating exercise, but this is fundamentally different from the process that we advocate in this paper. Using
aspects of a data exploration to search out patterns (data
dredging) can provide guidance for future work, but the
results should be viewed very cautiously and inferences about
the broader population avoided. Instead, new data should be
collected based on the hypotheses generated and independent
tests conducted. When data exploration is used in this manner,
both the process used and the limitations of any inferences
should be clearly stated.
Throughout the paper we focus on the use of graphical tools
(Chateld 1998; Gelman, Pasarica & Dodhia 2002), but in
some cases it is also possible to apply tests for normality or
homogeneity. The statistical literature, however, warns against
certain tests and advocates graphical tools (Montgomery &
Peck 1992; Draper & Smith 1998, Quinn & Keough 2002).
2009 The Authors. Journal compilation 2009 British Ecological Society, Methods in Ecology and Evolution, 1, 314
Data exploration 5
points beyond these lines are labelled as outliers. Some
researchers routinely (but wrongly) remove these observations.
Figure 2a shows an example of such a graph using 1295
observations of a morphometric variable (wing length of the
saltmarsh sparrow Ammodramus caudacutus; Gjerdrum, Elphick & Rubega 2008). The graph leads one to believe (perhaps
wrongly, as we will see in a moment) that there are seven
outliers.
Another very useful, but highly neglected, graphical tool to
visualize outliers is the Cleveland dotplot (Cleveland 1993).
This is a graph in which the row number of an observation is
plotted vs. the observation value, thereby providing much
more detailed information than a boxplot. Points that stick out
on the right-hand side, or on the left-hand side, are observed
values that are considerable larger, or smaller, than the majority of the observations, and require further investigation. If
such observations exist, it is important to check the raw data
for errors and assess whether the observed values are reasonable. Figure 2b shows a Cleveland dotplot for the sparrow
wing length data; note that the observations identied by the
boxplot are not especially extreme after all. The upward trend
in Fig. 2b simply arises because the data in the spreadsheet
were sorted by weight. There is one observation of a wing
length of about 68 mm that stands out to the left about half
way up the graph. This value is not considerably larger than
the other values, so we cannot say yet that it is an outlier.
Figure 3 shows a multi-panel Cleveland dotplot for all of
the morphometric variables measured; note that some variables have a few relatively large values. Such extreme values
could indicate true measurement errors (e.g. some t the characteristics of observer distraction sensu Morgan 2004,
whereby the observers eye is drawn to the wrong number on a
measurement scale). Note that one should not try to argue that
such large values could have occurred by chance. If they were,
then intermediate values should also have been generated by
(b)
60
55
65
(a)
chance, but none were. (A useful exercise is to generate, repeatedly, an equivalent number of random observations from an
appropriate distribution, e.g. the Normal distribution, and
determine how the number of extreme points compares to the
empirical data.) When the most likely explanation is that the
extreme observations are measurement (observer) errors, they
should be dropped because their presence is likely to dominate
the analysis. For example, we applied a discriminant analysis
on the full sparrow data set to see whether observations differed among observers, and found that the rst two axes were
mainly determined by the outliers.
So far, we have loosely dened an outlier as an observation
that sticks out from the rest. A more rigorous approach is to
consider whether unusual observations exert undue inuence
on an analysis (e.g. on estimated parameters). We make a distinction between inuential observations in the response variable and in the covariates. An example of the latter is when
species abundances are modelled as a function of temperature,
with nearly all temperature values between 15 and 20 C, but
one of 25 C. In general, this is not an ideal sampling design
because the range 2025 C is inadequately sampled. In a eld
study, however, there may have been only one opportunity to
sample the higher temperature. With a large sample size, such
observations may be dropped, but with relative small data sets
the consequent reduction in sample size may be undesirable,
especially if other observations have outliers for other explanatory variables. If omitting such observations is not an option,
then consider transforming the explanatory variables.
In regression-type techniques, outliers in the response variables are more complicated to deal with. Transforming the
data is an option, but as the response variable is of primary
interest, it is better to choose a statistical method that uses a
probability distribution that allows greater variation for large
mean values (e.g. gamma for continuous data; Poisson or negative binomial for count data) because doing this allows us to
55
60
65
Wing length (mm)
Fig. 2. (a) Boxplot of wing length for 1295 saltmarsh sparrows. The line in the middle of the box represents the median, and the lower and upper
ends of the box are the 25% and 75% quartiles respectively. The lines indicate 1.5 times the size of the hinge, which is the 75% minus 25% quartiles. (Note that the interval dened by these lines is not a condence interval.) Points beyond these lines are (often wrongly) considered to be outliers. In some cases it may be helpful to rotate the boxplot 90 to match the Cleveland dotplot. (b) Cleveland dotplot of the same data. The
horizontal axis represents the value of wing length, and the vertical axis corresponds to the order of the data, as imported from the data le (in this
case sorted by the birds weight).
2009 The Authors. Journal compilation 2009 British Ecological Society, Methods in Ecology and Evolution, 1, 314
6 A. F. Zuur et al.
Culmen length
Weight
Wing length
Tarsus length
Head length
2009 The Authors. Journal compilation 2009 British Ecological Society, Methods in Ecology and Evolution, 1, 314
Data exploration 7
Female
Male
10
Intake rate
08
06
04
02
00
Summer
Pre-migration
Winter
Pre-migration
Winter
Migration period
(b)
100
80
60
40
20
0
0
14
16
18
20
22
Weight (g)
24
26
28
100
80
60
40
20
0
July
Frequency
Frequency
50
100
150
(a)
August
100
80
60
40
20
0
June
Summer
15
20
Weight (g)
2009 The Authors. Journal compilation 2009 British Ecological Society, Methods in Ecology and Evolution, 1, 314
25
8 A. F. Zuur et al.
Covariate
Fig. 6. Visualization of two underlying assumptions in linear regression: normality and homogeneity. The dots represent observed values
and a regression line is added. At each covariate value, we assume
that observations are normally distributed with the same spread
(homogeneity). Normality and homogeneity at each covariate value
cannot be veried unless many (>25) replicates per covariate value
are taken, which is seldom the case in ecological studies. In practice, a
histogram of pooled residuals should be made, but this does not provide conclusive evidence for normality. The same limitations holds if
residuals are plotted vs. tted values to verify homogeneity.
06
07
08
09
10
RBGU
LESA
DUNL
SNIP
LBDO
GRYE
LBCU
KILL
GREG
SNEG
GBHE
AMBI
COOT
UNDU
NOSH
NOPI
AMWI
GWTE
GADW
MALL
MALL
GADW
GWTE
AMWI
NOPI
NOSH
UNDU
COOT
AMBI
GBHE
SNEG
GREG
KILL
LBCU
GRYE
LBDO
SNIP
DUNL
LESA
RBGU
Response variable
05
Frequency
100 200 300 400 500 600 700
Fig. 8. A corrgram showing the frequency with which pairs of waterbird species both have zero abundance. The colour and the amount
that a circle has been lled correspond to the proportion of observations with double zeros. The diagonal running from bottom left to
top right represents the percentage of observations of a variable equal
to zero. Four-letter acronyms represent dierent waterbird species.
The top bar relates the colours in the graph to the proportion of
zeros.
0 4 8 13 19 25 31 37 43 49 55 61 67 73 79 85 91 97
Observed values
when two sites both have the same joint absences, this might
mean that the sites are ecologically similar. On the other hand,
if a species has a highly clumped distribution, or is simply rare,
then joint absences might arise through chance and say
nothing about the suitability of a given site for a species, the
similarity among the habitat needs of species or the ecological
similarity of sites. A high frequency of zeros, thus, can greatly
complicate interpretation of such analyses. Irrespective of our
attitude to joint absences, we need to know whether there are
double zeros in the data. This means that for each species-pair,
we need to calculate how often both had zero abundance for
the same observation (e.g. site). We can either present this
information in a table, or use advanced graphical tools like a
corrgram (Fig. 8; Sarkar 2008). In our waterbird example, the
frequency of double zeros is very high. All the blue circles cor-
respond to species that have more than 80% of their observations jointly zero. This result is consistent with the biology of
the species studied, most of which form large ocks and have
highly clumped distributions. A PCA would label such species
as similar, although their ecological use of habitats is often
quite dierent (e.g. Elphick & Oring 1998). Alternative multivariate analyses that ignore double zeros are discussed in
Legendre & Legendre (1998) and Zuur et al. (2007).
2009 The Authors. Journal compilation 2009 British Ecological Society, Methods in Ecology and Evolution, 1, 314
Data exploration 9
the number of saltmarsh sparrows captured in a study plot is
modelled as a function of covariates that describe the relative
abundance of various plant species (for details, see Gjerdrum,
Elphick & Rubega 2005; Gjerdrum et al. 2008). The second
column of the table gives the estimated P-values of the t-statistics for each regression parameter when all covariates are
included in the model. Note that only one covariate, that for
the per cent cover of the rush Juncus gerardii, is weakly signicant at the 5% level.
In linear regression, an expression for the variances of the
parameters bj is given by (Draper & Smith 1998; Fox 2008):
Variancebj
1
r2
2
1 Rj n 1S2j
Table 1. P-values of the t-statistic for three linear regression models and variance ination factor (VIF) values for the full model. In the full
model, the number of banded sparrows, which is a measure of how many birds were present, is modelled as a function of the covariates listed in
the rst column. In the second and third columns, the P-values and VIF values for the full model are presented (note that no variables have been
removed yet). In the fourth column P-values are presented for the model after collinearity has been removed by sequentially deleting each
variable for which the VIF value was highest until all remaining VIFs were below 3. In the last column, only variables with signicant P-values
remain, giving the most parsimonious explanation for the number of sparrows in a plot
Covariate
VIF
% Juncus gerardii
% Shrub
Height of thatch
% Spartina patens
% Distichlis spicata
% Bare ground
% Other vegetation
% Phragmites australis
% Tall sedge
% Water
% Spartina alterniora (short)
% Spartina alterniora (tall)
Maximum vegetation height
Vegetation stem density
00203
09600
09989
00640
00527
00666
00730
00715
02160
00568
00549
00960
02432
07219
449953
27818
16712
1593506
537545
120586
58170
37490
44093
170677
1214637
1593828
61200
32064
00001
00568
08263
03312
02538
08908
09462
02734
04313
06942
02949
000004
00727
2009 The Authors. Journal compilation 2009 British Ecological Society, Methods in Ecology and Evolution, 1, 314
10 A. F. Zuur et al.
shows a multi-panel scatterplot (also called a pair plot) for the
1295 saltmarsh sparrows for which we have morphological
data. Any observation that sticks out from the black cloud
needs further investigation; these may be dierent species,
measurement errors, typing mistakes or they may be correct
values after all. Note that the large wing length observation
that we picked up with the Cleveland dotplot in Fig. 2b has
average values for all other variables, suggesting that it is
indeed something that should be checked. The lower panels in
Fig. 10 contain Pearson correlation coecients, which can be
inuenced by outliers meaning that outliers can even contribute to collinearity.
Banded
8 10 12
20
40
% Tall sedge
50
40
30
20
10
0
0
10
15
10 20 30 40 50
10
20
80
30
15
20
% Bare ground
% Juncus gerardii
10
10
20
15
40
60
% Other vegetation
20 0
% Shrub
40
50
40
30
20
10
0
% Water
% Distichlis
50
40
30
20
10
0
60
8 10 12 0
Height of thatch
30
40
20 40
50
60
60 80 100
% Phragmites australis
50
40
30
20
10
0
10
% Spartina patens
20
40
60
80
Covariates
10 12 14 16
10 15 20 25
65
20 24 28 32
05
55
Tarsus length
05
Head length
04
05
07
25
05
04
05
07
07
06
05
06
06
05
12 18
Culmen length
10 20
10 14
35
20 26 32
Wing chord
55 60 65
25
30
35
6 8
12
Weight
16
Fig. 10. Multi-panel scatterplot of morphometric data for the 1295 saltmarsh sparrows.
The upper right panels show pairwise scatterplots between each variable, and the lower left panels contain Pearson correlation
coecients. The font size of the correlation
coecient is proportional to its value. Note
that there are various outliers.
2009 The Authors. Journal compilation 2009 British Ecological Society, Methods in Ecology and Evolution, 1, 314
Data exploration 11
Given : month
Sep
Aug
Jul
Jun
May
60
52
56
60
52
56
60
Male
Given : sex
Female
16 18 20 22 24
56
16 18 20 22 24
Weight (g)
52
52
56
60
52
56
60
2009 The Authors. Journal compilation 2009 British Ecological Society, Methods in Ecology and Evolution, 1, 314
12 A. F. Zuur et al.
(b)
C. fuscicollis ACF
ACF
04 02 08
C. fuscicollis abundance
0
400 800
(a)
10
15
20
Time (2 weeks)
25
6
8
Lag
10
12
(d)
L. dominicanus ACF
ACF
04 02 08
L. dominicanus abundance
0
4
8 12
(c)
10
15
20
Time (2 weeks)
25
Discussion
All of the problems described in this paper, and the strategies
to address them, apply throughout ecological research, but
6
8
Lag
10
12
2009 The Authors. Journal compilation 2009 British Ecological Society, Methods in Ecology and Evolution, 1, 314
Data exploration 13
Ecological eld data tend to be noisy, eld conditions
unpredictable and prior knowledge often limited. In the
applied realm, changes in funding, policy, and research priorities further complicate matters. This situation is especially so
for long-term studies, where the initial goals often change
with circumstances (e.g. the use of many data sets to examine
species responses to climate change). For all these reasons,
the idealized situation whereby an ecologist carefully designs
their analysis a priori and then collects data may be compromised or irrelevant. Having the analytical exibility to adjust
ones analyses to such circumstance is an important skill for
an applied ecologist, but it requires a thorough understanding of the constraining assumptions imposed by a given data
set.
When problems arise, the best solutions vary. Frequently,
however, ecologists simply transform data to avoid assumption violations. There are three main reasons for a transformation; to reduce the eect of outliers (especially in covariates), to
stabilize the variance and to linearize relationships. However,
using more advanced techniques like GLS and GAMs, heterogeneity and nonlinearity problems can be solved, making
transformation less important. Zuur et al. (2009a) showed
how the use of a data transformation resulted in dierent conclusions about long-term trends compared to an appropriate
analysis using untransformed data; hence it may be best to
avoid transforming response variables. If a transformation is
used, automatic selection tools such as Mosteller and Tukeys
bulging rule (Mosteller & Tukey 1977) should be used with
great caution because these methods ignore the eects of covariates. Another argument against transformations is the need
to subsequently back-transform values to make predictions; it
may not always be clear how to do this and still be able to interpret results on the original scale of the response variable. It is
also important to ensure that the transformation actually
solves the problem at hand; even commonly recommended
transformations do not always work. The bottom line is that
the choice of a specic transformation is a matter of trial and
error.
It is a given fact that data exploration should not be used to
dene the questions that a study sets out to test. Every step of
the exploration should be reported, and any outlier removed
should be justied and mentioned. Reasons for data transformations need to be justied based on the exploratory analysis
(e.g. evidence that model assumptions were violated and that
the transformation rectied the situation).
Applying data exploration (e.g. scatterplots to visualize relationships between response and explanatory variables) to create hypotheses and then using the same data to test these
hypotheses should be avoided. If one has limited a priori
knowledge, then a valid approach is to create two data sets;
apply data exploration on the rst data set to create hypotheses
and use the second data set to test the hypotheses. Such a process, however, is only practical for larger data sets. Regardless
of the specic situation, the routine use and transparent reporting of systematic data exploration would improve the quality
of ecological research and any applied recommendations that
it produces.
Acknowledgements
We thank Anatoly Saveliev, and two anonymous reviewers for comments on
an earlier draft.
References
Brockwell, P.J. & Davis, R.A. (2002) Introduction to Time Series and Forecasting, 2nd edn. Springer-Verlag, New York.
Brown, H. & Prescott, R. (2006) Applied Mixed Models in Medicine, 2nd edn.
John Wiley and Sons, New York.
Burnham, K.P. & Anderson, D.R. (2002) Model Selection and Multimodel
Inference. A Practical InformationTheoretic Approach, 2nd edn. Springer,
New York.
Cameron, A.C. & Trivedi, P.K. (1998) Regression Analysis of Count Data.
Cambridge University Press, Cambridge, UK.
Carroll, R.J., Ruppert, D., Stefanski, L.A. & Crainiceanu, C.M. (2006)
Measurement Error in Nonlinear Models: A Modern Perspective, 2nd edn.
Chapman & Hall, Boca Raton, FL.
Chateld, C. (1998) Problem Solving: A Statisticians Guide. Chapman & Hall,
Boca Raton, FL.
Cleveland, W.S. (1993) Visualizing Data. Hobart Press, Summit, NJ.
Draper, N.R. & Smith, H. (1998) Applied Regression Analysis, 3rd edn. John
Wiley and Sons, New York.
Ellison, A.M. (2004) Bayesian inference in ecology. Ecology Letters, 7, 509
520.
Elphick, C.S. & Oring, L.W. (1998) Winter management of Californian rice
elds for waterbirds. Journal of Applied Ecology, 35, 95108.
Elphick, C.S. & Oring, L.W. (2003) Conservation implications of ooding rice
elds on winter waterbird communities. Agriculture, Ecosystems and
Environment, 94, 1729.
Fitzmaurice, G.M., Laird, N.M. & Ware, J.H. (2004) Applied Longitudinal
Analysis. John Wiley & Sons, Hoboken, NJ.
Fox, J. (2008) Applied Regression Analysis and Generalized Linear Models, 2nd
edn. Sage Publications, CA.
Gelman, A., Pasarica, C. & Dodhia, R. (2002) Lets practice what we preach:
turning tables into graphs in statistic research. The American Statistician, 56,
121130.
Gjerdrum, C., Elphick, C.S. & Rubega, M. (2005) What determines nest site
selection and nesting success in saltmarsh breeding sparrows? Condor, 107,
849862.
Gjerdrum, C., Elphick, C.S. & Rubega, M.A. (2008) How well can we model
numbers and productivity of saltmarsh sharp-tailed sparrows (Ammodramus
caudacutus) using habitat features? Auk, 125, 608617.
Harvey, A.C. (1989) Forecasting, Structural Time Series Models and the Kalman
Filter. Cambridge University Press, Cambridge, UK.
Hilbe, J.M. (2007) Negative Binomial Regression. Cambridge University Press,
Cambridge, UK.
Hurlbert, S.H. (1984) Pseudoreplication and the design of ecological eld
experiments. Ecological Monographs, 54, 187211.
Jollie, I.T. (2002) Principal Component Analysis, 2nd edn. Springer, New York.
Koper, N. & Manseau, M. (2009) Generalized estimating equations and generalized linear mixed-eects models for modelling resources selection. Journal
of Applied Ecology, 46, 590599.
Laara, E. (2009) Statistics: reasoning on uncertainty, and the insignicance of
testing null. Annales Zoologici Fennici, 46, 138157.
Law, R., Illian, J., Burslem, D.F.R.P., Gratzer, G., Gunatilleke, C.V.S. &
Gunatilleke, I.A.U.N. (2009) Ecological information from spatial patterns of
plants: insights from point process theory. Journal of Ecology, 97, 616628.
Legendre, P. & Legendre, L. (1998) Numerical Ecology. Second English
Edition. Elsevier, Amsterdam.
Montgomery, D.C. & Peck, E.A. (1992) Introduction to Linear Regression
Analysis. Wiley, New York.
Morgan, J.H. (2004) Remarks on the taking and recording of biometric
measurements in bird ringing. The Ring, 26, 7178.
Mosteller, F. & Tukey, J.W. (1977) Data Analysis and Regression: A Second
Course in Statistics. Addison Wesley, Reading, MA.
Ostrom, C.W. (1990) Time Series Analysis: Regression Techniques, 2nd edn.
Sage Publications Inc, Thousand Oaks Newbury Park, CA.
Pinheiro, J. & Bates, D. (2000) Mixed Eects Models in S and S-Plus. SpringerVerlag, New York.
Pullin, A.S. & Knight, T.M. (2009) Doing more good than harm building an
evidence-based for conservation and environmental management. Biological
Conservation, 142, 931934.
2009 The Authors. Journal compilation 2009 British Ecological Society, Methods in Ecology and Evolution, 1, 314
14 A. F. Zuur et al.
Quinn, G.P. & Keough, M.J. (2002) Experimental Design and Data Analysis for
Biologists. Cambridge University Press, Cambridge, UK.
R Development Core Team (2009) R: A Language and Environment for
Statistical Computing. R Foundation for Statistical Computing, Vienna.
ISBN 3-900051-07-0. URL http://www.R-project.org.
Roberts, P.D., Stewart, G.B. & Pullin, A.S. (2006) Are review articles a
reliable source of evidence to support conservation and environmental
management? A comparison with medicine. Biological Conservation, 132,
409423.
Robinson, A.P. & Hamann, J.D. (2008) Correcting for spatial autocorrelation in sequential sampling. Journal of Applied Ecology, 45, 1221
1227.
Sarkar, D. (2008) Lattice: Multivariate Data Visualization with R. Springer,
New York.
Schabenberger, O. & Pierce, F.J. (2002) Contemporary Statistical Models for
the Plant and Soil Sciences. CRC Press, Boca Raton, FL.
Sonderegger, D.L., Wang, H., Clements, W.H. & Noon, B.R. (2009) Using
SiZer to detect thresholds in ecological data. Frontiers in Ecology and the
Environment, 7, 190195.
Stephens, P.A., Buskirk, S.W., Hayward, G.D. & Mart nez del Rio, C. (2005)
Information theory and hypothesis testing: a call for pluralism. Journal of
Applied Ecology, 42, 412.
ter Braak, C.J.F. & Verdonschot, P.F.M. (1995) Canonical correspondence
analysis and related multivariate methods in aquatic ecology. Aquatic
Science, 57, 225289.
Wood, S.N. (2006) Generalized Additive Models. An Introdcution with R.
Chapman Hall CRC, Boca Raton, FL. Zuur, A.F., Ieno, E.N., Walker,
N.J., Saveliev, A.A. & Smith, G. (2009a) Mixed Eects Models and Extensions in Ecology with R. Springer, New York.
Zuur, A.F., Ieno, E.N. & Smith, G.M. (2007) Analysing Ecological Data.
Springer, New York.
Zuur, A.F., Ieno, E.N. & Meesters, E.H.W.G. (2009b) A Beginners Guide to R.
Springer, New York.
Received 13 August 2009; accepted 8 October 2009
Handling Editor: Robert P. Frecklenton
Supporting Information
Additional Supporting Information may be found in the online
version of this article:
Appendix S1. Data sets and R code used for analysis.
As a service to our authors and readers, this journal provides
support ing information supplied by the authors. Such materials may
be re-organized for online delivery, but are not copy-edited or typeset.
Technical support issues arising from supporting information (other
than missing les) should be addressed to the authors.
2009 The Authors. Journal compilation 2009 British Ecological Society, Methods in Ecology and Evolution, 1, 314