Animal Movement: Statistical Models for Telemetry Data
Cover photos: elk (Cervus canadensis), spotted seal (Phoca largha; Dave Withrow), and mountain lion (Puma concolor; Jacob Ivan, Colorado Parks and Wildlife).
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
This book contains information obtained from authentic and highly regarded sources. Reasonable efforts
have been made to publish reliable data and information, but the author and publisher cannot assume
responsibility for the validity of all materials or the consequences of their use. The authors and publishers
have attempted to trace the copyright holders of all material reproduced in this publication and apologize to
copyright holders if permission to publish in this form has not been obtained. If any copyright material has
not been acknowledged please write and let us know so we may rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmit-
ted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented,
including photocopying, microfilming, and recording, or in any information storage or retrieval system,
without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access www.copyright.
com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood
Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and
registration for a variety of users. For organizations that have been granted a photocopy license by the CCC,
a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used
only for identification and explanation without intent to infringe.
Contents

Chapter 1 Introduction
    1.1 Background on Animal Movement
        1.1.1 Population Dynamics
        1.1.2 Spatial Redistribution
        1.1.3 Home Ranges, Territories, and Groups
        1.1.4 Group Movement and Dynamics
        1.1.5 Informed Dispersal and Prospecting
        1.1.6 Memory
        1.1.7 Individual Condition
        1.1.8 Energy Balance
        1.1.9 Food Provision
        1.1.10 Encounter Rates and Patterns
    1.2 Telemetry Data
    1.3 Notation
    1.4 Statistical Concepts
    1.5 Additional Reading
Glossary
References
Author Index
Subject Index
Preface
With the field of animal movement modeling evolving so rapidly, navigating the
expanding literature is challenging. It may be impossible to provide an exhaustive
summary of animal movement concepts, biological underpinnings, and behavioral
theory; thus, we view this book as a starting place to learn about the fundamen-
tal suite of statistical modeling tools available for providing inference concerning
individual-based animal movement.
Notice that the title is focused on “statistical models for telemetry data.” The set of
existing literature related to animal movement is massive, with thousands of individ-
ual papers related to the general topic. All of this information cannot be synthesized
in a single volume; thus, we focus on the subset of literature mainly concerned with
parametric statistical modeling (i.e., statistical approaches for inverse modeling based
on data and known probability distributions, mainly using likelihood and Bayesian
methods). There are many other approaches for simulating animal movement and
visualizing telemetry data; we leave most of those for another volume.
Our intention is that this book reads more like a reference than a cookbook. It pro-
vides insight about the statistical aspects of animal movement modeling. We expect
two types of readers: (1) a portion of readers will use this book as a companion ref-
erence for obtaining the background necessary to read scientific papers about animal
movement, and (2) the other portion of readers will use the book as a foundation for
creating and implementing their own statistical animal movement models.
We designed this book such that it opens with an overview of animal movement
data and a summary of the progression of the field over the years. Then we provide
a series of chapters as a review of important statistical concepts that are relevant for
the more advanced animal movement models that follow. Chapter 4 covers point pro-
cess models for learning about animal movement; many of these rely on uncorrelated
telemetry data, but Section 4.7 addresses spatio-temporal point processes. Chapters 5
through 6 are concerned with dynamic animal movement models of both the discrete-
and continuous-time flavors. Finally, Chapter 7 describes approaches to use mod-
els in sequence, properly accommodating the uncertainty from first-stage models in
second-stage inference.
We devote a great deal of space to spatial and temporal statistics in general because
this is an area in which many animal ecologists have received no formal training. These
subjects are critical for animal movement modeling and we recommend at least a light
reading of Chapters 2 and 3 for everyone. However, we recognize that readers already
familiar with the basics of telemetry data, as well as spatial and temporal statistics,
may be tempted to skip ahead to Chapter 4, only referring back to Chapters 2 and 3
for reference.
Finally, despite the rapid evolution of animal movement modeling approaches,
no single method has risen to the top as a gold standard. This lack of a universally
accepted framework for analyzing all types of telemetry data is unusual in
the field of quantitative animal ecology and can be daunting for new researchers just
wanting to do the right thing. On the other hand, it is an exciting time in animal ecol-
ogy because we can ask and answer new questions that are fundamental to the biology,
ecology, and conservation of wildlife. Each new statistical approach for analyzing
telemetry data brings potential for new inference into the scientific understanding of
critical processes inherent to living systems.
Acknowledgments
The authors acknowledge the following funding sources: NSF DMS 1614392,
CPW T01304, NOAA AKC188000, PICT 2011-0790, and PIP 112-201101-58. The
authors are grateful to (in alphabetical order) Mat Alldredge, Chuck Anderson, David
Anderson, Ali Arab, Randy Boone, Mike Bower, Randy Brehm, Brian Brost, Franny
Buderman, Paul Conn, Noel Cressie, Kevin Crooks, María del Mar Delgado, Bob
Dorazio, Tom Edwards, Gabriele Engler, John Fieberg, James Forester, Daniel Fortin,
Marti Garlick, Brian Gerber, Eli Gurarie, Ephraim Hanks, Dan Haydon, Trevor
Hefley, Tom Hobbs, Jennifer Hoeting, Gina Hooten, Jake Ivan, Shea Johnson, Gwen
Johnson, Layla Johnson, Matt Kaufman, Bill Kendall, Carey Kuhn, Josh London,
John Lowry, Jason Matthiopoulos, Joe Margraf, Leslie McFarlane, Josh Millspaugh,
Ryan Neilson, Joe Northrup, Otso Ovaskainen, Jim Powell, Andy Royle, Henry
Scharf, Tanya Shenk, John Shivik, Bob Small, Jeremy Sterling, David Theobald, Len
Thomas, Jay Ver Hoef, Lance Waller, David Warton, Gary White, Chris Wikle, Perry
Williams, Ken Wilson, Ryan Wilson, Dana Winkelman, George Wittemyer, Jamie
Womble, Jun Zhu, and Jim Zidek for various engaging discussions about animal
movement, assistance, collaboration, and support during this project. The findings
and conclusions in this book by the NOAA authors do not necessarily represent the
views of the National Marine Fisheries Service, NOAA. Any use of trade, firm, or
product names is for descriptive purposes only and does not imply endorsement by
the U.S. Government.
Authors
Mevin B. Hooten is an associate professor in the Departments of Fish, Wildlife, and
Conservation Biology, and Statistics at Colorado State University. He is also assistant
unit leader in the U.S. Geological Survey, Colorado Cooperative Fish and Wildlife
Research Unit. Dr. Hooten earned a PhD in statistics at the University of Missouri.
His research focuses on the development of statistical methodology for spatial and
spatio-temporal ecological processes.
1 Introduction
[Figure 1.1 diagram; legible fragments include "Location data," "Where could it go?," and "Discrete-time models (Chapters 5, 7)."]
FIGURE 1.1 Relationships among data types, analytical methods, and some fundamental
questions of movement ecology. Location data are the cornerstone of all of the analysis meth-
ods described in this book. Environmental data, such as those acquired from remote sensing, are
useful in drawing connections between animals and their surroundings. Auxiliary bioteleme-
try data, such as accelerometer or dive profile data, can help address questions about animal
behavior. Dashed lines indicate where data can be helpful for addressing particular questions,
but are not essential.
data, and biotelemetry tags allow for the simultaneous collection of important physio-
logical and behavioral information from wild animals. These technological advances
will lead to a better understanding of how individual decisions affect demographic
parameters and ultimately translate into population dynamics. In this sense, animal
movement can provide the long-sought bridge between behavior, landscape ecology,
and population dynamics (Lima and Zollner 1996; Wiens 1997; Morales et al. 2010;
Kays et al. 2015).
In what follows, we provide a brief summary of research findings, existing knowl-
edge, and analytic approaches for important aspects of animal movement ecology.
We organized these topics into 10 sections:
1. Population dynamics
2. Spatial redistribution
3. Home ranges, territories, and groups
4. Group movement and dynamics
5. Informed dispersal and prospecting
6. Memory
7. Individual condition
8. Energy balance
9. Food provision
10. Encounter rates and patterns
and Penteriani 2008), and on the interplay between individual behavior and features
of the underlying landscape (Johnson et al. 1992; McIntyre and Wiens 1999; Fahrig
2001; Ricketts 2001; Morales et al. 2004; Mueller and Fagan 2008), including reac-
tions to habitat boundaries (Schultz and Crone 2001; Morales 2002; Schtickzelle and
Baguette 2003; Ovaskainen 2004; Haynes and Cronin 2006). In particular, population
heterogeneity produces leptokurtic (i.e., heavy-tailed) redistribution kernels when a
subset of individuals consistently moves longer distances than others (Skalski and
Gilliam 2000; Fraser et al. 2001).
Several factors can explain why two individuals belonging to the same population
move differently. They may be experiencing different environments within heterogeneous
landscapes; they can also have different phenotypes or condition, different past
experiences (e.g., Frair et al. 2007), or even different “personalities” (Fraser et al.
2001; Dall et al. 2004). In a theoretical study, Skalski and Gilliam (2003) modeled
animals switching between fast and slow random walk movement states and found
that the resulting redistribution kernel depended on the total time spent in each of the
states and not on the particular sequence of changes. This theoretical result highlights
the importance of animals’ time budgets for scaling movement processes (Figure 1.2).
It is common to consider that individuals have a small set of movement strategies
(Blackwell 1997; Nathan et al. 2008), and the time allocation to these different behav-
iors (or “activity budgets”) can depend on the interaction between their motivation
and the structure of the landscape they occupy (Morales et al. 2004, 2005). The results
FIGURE 1.2 Mechanistic links between animal movement and population dynamics adapted
from Morales et al. (2010). We consider an unobserved individual internal state that inte-
grates body condition (e.g., energy reserves, reproductive status). Several factors affect the
dynamics of this internal state, including social interactions with conspecifics, trophic or other
interaction with allospecifics (other species), and abiotic environmental effects and dynamics.
Internal state dynamics determine the organism’s time allocation to different behaviors (e.g.,
food acquisition, predator avoidance, homing, and landscape exploration) but is also modulated by past experiences and phenotypic traits such as behavioral predispositions. As different
behaviors imply different movement strategies, the time budget determines the properties of
the spatial redistribution that describes space use. Time allocation to different behaviors also
affects individual survival and reproduction, and hence, overall population dynamics.
of Skalski and Gilliam (2003) imply that knowing the fraction of time allocated to
each behavior makes it possible to derive suitable redistribution kernels.
A common reaction when visually inspecting movement data is to intuit that
individuals are moving differently at different times. As a result, several techniques
(including many ad hoc procedures) have been developed to identify and model
changes in movement behavior from trajectory data (reviewed in Patterson et al.
2008; Schick et al. 2008; Gurarie et al. 2016). Clustering models, such as those we
describe in Chapter 5, can be difficult to reliably implement because biologically
different movement behaviors can lead to very similar trajectories. For example, it
may be difficult to distinguish relative inactivity (e.g., resting) from intense foraging,
within a small patch, based on horizontal trajectory alone. However, as physiological
and other information becomes available through biotelemetry devices, we may gain
greater insight into how animals allocate time to different tasks and how this allo-
cation changes in different environments (McClintock et al. 2013), thus providing a
mechanistic way to model redistribution kernels conditional on individual state.
Another result from Skalski and Gilliam (2003) is that a mixture of movement
states converges to simple diffusion if given enough time. The sum of n independent
and identically distributed random variables with finite variance will be Gaussian
distributed as n increases. Thus, if all individuals in a population move according
to the same stochastic process, we would expect that, at some time after the initi-
ation of movement, the distribution of distance moved becomes Gaussian because
the distance traveled is the sum of movement vectors. However, this depends on the
rate of convergence and independence assumption. Still, similar results may relate to
the interaction between individual behavior and landscape structure (Morales 2002;
Levey et al. 2005) and are the focus of ongoing research.
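This convergence argument can be illustrated with a short simulation. The sketch below is illustrative Python, not code from the book (all function names and parameter values are our own): individuals follow a two-state random walk, and after many steps the net displacement is approximately Gaussian, with a spread governed only by the fraction of time spent in each state, echoing the Skalski and Gilliam (2003) result.

```python
import random
import statistics

def displacement(n_steps, p_fast=0.3, sd_fast=2.0, sd_slow=0.5, rng=random):
    """Net x-displacement after n_steps of a two-state random walk.

    At each step the animal is in the fast state with probability p_fast
    (otherwise slow); step lengths are Gaussian with state-specific spread.
    """
    x = 0.0
    for _ in range(n_steps):
        sd = sd_fast if rng.random() < p_fast else sd_slow
        x += rng.gauss(0.0, sd)
    return x

random.seed(1)
disps = [displacement(500) for _ in range(2000)]
# By the CLT, displacements are roughly Gaussian with mean 0 and standard
# deviation sqrt(n * (p * sd_fast^2 + (1 - p) * sd_slow^2)) -- a quantity
# that depends only on the time budget, not on the sequence of switches.
expected_sd = (500 * (0.3 * 2.0**2 + 0.7 * 0.5**2)) ** 0.5
print(statistics.mean(disps), statistics.stdev(disps), expected_sd)
```

Re-running with a different ordering of state switches (but the same time budget) leaves the displacement distribution essentially unchanged.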
We return to redistribution kernels for animal movement in Chapters 4 through 6.
In particular, we consider spatial redistribution from three different perspectives (i.e.,
point processes, discrete-time processes, and continuous-time processes) and high-
light the relevant literature associated with each. We also show how to scale up from
Lagrangian to Eulerian models for movement in Chapter 6.
another. They found that, rather than following dominant individuals, baboons are
more likely to follow others when multiple initiators of movement agree, suggesting
a democratic collective action emerging from simple rules. In a study of fission–
fusion dynamics of spider monkeys (Ateles geoffroyi), Ramos-Fernández and Morales
(2014) found that group composition and cohesion affected the chance that a partic-
ular individual will leave or join a group. As another example, Delgado et al. (2014)
found that dispersing juveniles of eagle owl (Bubo bubo) were generally attracted to
conspecifics, but the strength of attraction decreased with decreasing proximity to
other individuals. However, despite this progress, models for how animals decide to
leave a territory or abandon a group, and for how they explore and choose where to
establish new territories or home ranges, have yet to appear in the literature.
1.1.6 MEMORY
The importance of previous experiences and memory is increasingly being
recognized and explicitly considered in the analysis of telemetry data (e.g., Dalziel
et al. 2008; McClintock et al. 2012; Avgar et al. 2013; Fagan et al. 2013; Merkle et al.
2014). Smouse et al. (2010) provide a summary of the approaches used to include
memory in movement models. Formulating memory models has largely been a the-
oretical exercise but the formal connection with data is possible. For example, the
approach used to model the effect of scent marking in mechanistic home range mod-
els (Moorcroft and Lewis 2013) could be easily adapted to model memory processes.
Avgar et al. (2015) fit a movement model that included perceived quality of visited
areas and memory decay to telemetry data from migrating caribou. It is less clear
what role memory plays in population dynamics.
Forester et al. (2007) describe how certain discrete-time movement models can be
reformulated to provide inference about memory. We explain these ideas in Chapter 5.
In continuous-time models, Hooten and Johnson (2016) show how to utilize basis
function specifications for smooth stochastic processes to represent different types of
memory and perception processes. We discuss these functional movement modeling
approaches in Chapter 6.
* We refer to “tags” generically here; for most terrestrial mammals, the telemetry devices are attached
to neck collars and fitted to the individual animals. Telemetry devices have been fitted to animals in a
variety of other ways.
* We describe specific aspects of Argos data and potential remedies in Chapters 4 and 5.
† See Kays et al. (2015) for a recent overview of tag technology.
include light-sensing “geologgers” for smaller species (e.g., Bridge et al. 2011),
archival “pop-up” tags popular in fisheries (e.g., Patterson et al. 2008), proximity
detectors (e.g., Ji et al. 2005), acoustic tags (e.g., McMichael et al. 2010), “life history” tags (Horning and Hill 2005), accelerometer tags (e.g., Laplanche et al. 2015),
and automatic trajectory representation from video recordings (Pérez-Escudero et al.
2014). In what follows, we primarily focus on the analysis of location data such
as those obtained from VHF, GPS, and Argos tags. However, many of the meth-
ods we present can utilize location information arising from other sources, as well
as incorporate auxiliary information about the individual animal’s internal and exter-
nal environment that is now regularly being collected from modern biotelemetry tags.
Winship et al. (2012) provide a comparison of the fitted movement of several different
marine animals when using GPS, Argos, and light-based geolocation tags.
1.3 NOTATION
A wide variety of notation has been used in the literature on animal movement data
and modeling. This variation makes it challenging to maintain notational consistency
in a comprehensive text on the subject. We provide this section, along with Table 1.1,
in an attempt to keep expressions as straightforward as possible. We recommend
bookmarking this section on your first reading so that you may
return to it quickly if the notation becomes confusing.
Conventional telemetry data consist of a finite set of spatially referenced geo-
graphic locations (S ≡ {s1 , . . . , si , . . . , sn }) representing the individual’s observed
location at a set of times spanning some temporal extent of interest (e.g., a season
or year). We use the notation {μ1 , . . . , μn } to represent the corresponding true
positions of the animal. Sometimes, the observed telemetry data are assumed to be the
true positions (i.e., no observation error); however, in most situations, they will be dif-
ferent. The times at which locations are observed can be thought of as fixed and part
of the “design,” or as observed random variables. In either case, a statistical notation
with proper time indexing becomes somewhat tricky. To remain consistent with the
broader literature on point processes (and with Chapter 2), we assume that there are
n telemetry observations collected at times t ≡ (t1 , . . . , ti , . . . , tn ) such that ti ∈ T
and t ⊂ T . The seemingly redundant time indexing accounts for the possibility of
irregularly spaced data in time. If the differences (Δi = ti − ti−1 ) between two time
points at which we have telemetry observations are all equal, we could just as easily
use the direct time indexing where the data are st for t = 1, . . . , T. In that case, we
have T = n. From a model-building perspective, it is sometimes less cumbersome to
index telemetry observations in time (i.e., st ) and deal with temporal irregularity dur-
ing the implementation. However, there are some situations, for example, when the
points are serially dependent, where we need the Δi notation. A further perspective
on notation arises when considering that the true animal location process is a continu-
ous process in time. To formally recognize this, we often index the observed location
vectors as s(ti ) (or μ(ti ), in the case of the true positions). The parenthetical notation
at least admits that we are often modeling animal locations as a continuous function.
Thus, prepare yourself to see all types of indexing, both in this text and in the vast
animal movement literature.
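As a small, hypothetical illustration of the indexing issue (plain Python; the times and locations are invented), one can store the observation times alongside the locations and compute the gaps ti − ti−1 between successive fixes:

```python
# Hypothetical irregularly spaced telemetry: times t_1, ..., t_n paired
# with observed locations s(t_i); all values here are made up.
times = [0.0, 1.0, 2.5, 3.0, 6.0]
locs = [(0.0, 0.0), (0.3, 0.1), (1.0, 0.4), (1.1, 0.4), (2.0, 1.2)]

# Gaps between successive fixes: t_i - t_{i-1}, for i = 2, ..., n.
deltas = [t1 - t0 for t0, t1 in zip(times, times[1:])]
print(deltas)  # unequal gaps, so direct indexing s_t (t = 1, ..., T) fails
```

Because the gaps differ, a model must either index observations by their observation times or handle the temporal irregularity during implementation.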
[Table 1.1: Statistical Notation — columns: Notation, Definition.]
* Parametric statistical models involve the specification of known probability distributions with parameters
that are unknown but estimated in the model fitting procedure.
yi ∼ [yi | θ],
where yi are the observations (we use si for telemetry observations instead of yi ) for
i = 1, . . . , n, θ are the data model parameters, and the bracket notation “[·]” repre-
sents a probability distribution. The data model is often referred to as the “likelihood”
by Bayesians, but the likelihood used in maximum likelihood estimation (MLE)
is proportional to the joint distribution of the data conditioned on the parameters.
When the observations are conditionally independent, the likelihood is often written
as [y|θ] = ∏_{i=1}^{n} [yi |θ], where individual data distributions can be multiplied to
obtain the joint distribution because of independence. To fit the model using MLE, the
likelihood is usually maximized numerically to find the optimal parameter values θ̂.
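To make the numerical maximization concrete, here is a minimal MLE sketch (our own illustrative Python, under simple assumptions: conditionally independent Gaussian data with known variance). A crude grid search stands in for the optimizers used in practice:

```python
import math
import random

def log_likelihood(theta, y, sd=1.0):
    """log [y|theta] for conditionally independent Gaussian observations."""
    return sum(-0.5 * math.log(2 * math.pi * sd**2)
               - (yi - theta)**2 / (2 * sd**2) for yi in y)

random.seed(2)
y = [random.gauss(1.5, 1.0) for _ in range(200)]  # made-up data

# Crude numerical maximization over a grid of candidate theta values;
# real analyses would use a proper optimizer (e.g., Newton-type methods).
grid = [i / 100 for i in range(-300, 501)]
theta_hat = max(grid, key=lambda th: log_likelihood(th, y))
print(theta_hat)
```

The maximizer lands at the grid point nearest the sample mean, which is the analytical MLE in this Gaussian case.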
The Bayesian approach involves the specification of a probability model for the
parameters, θ ∼ [θ], that depend on fixed hyperparameters assumed to be known. The
prior probability distribution should contain information about the parameters that
is known before the data are collected, except for cases where regularization-based
model selection is desired (Hooten and Hobbs 2015), in which case, the prior can be
tuned based on a cross-validation procedure. Rather than maximizing the likelihood,
the Bayesian approach seeks to find the conditional distribution of the parameters
given the data (i.e., the posterior distribution)
[θ|y] = [y|θ][θ] / ∫ [y|θ][θ] dθ, (1.1)
where y is a vector notation for all the observations and the denominator in Equa-
tion 1.1 equates to a scalar constant after the data have been observed. For complicated
models, the multidimensional integral in the denominator of Equation 1.1 cannot be
obtained analytically (i.e., exactly by pencil and paper) and must be either numer-
ically calculated or avoided using a stochastic simulation procedure. Markov chain
Monte Carlo (MCMC; Gelfand and Smith 1990) allows us to obtain samples from
the posterior distribution while avoiding the calculation of the normalizing constant
in the denominator of Equation 1.1. MCMC algorithms have many advantages (e.g.,
easy to develop), but also limitations (e.g., can be time consuming to run).
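A bare-bones example of MCMC is the random-walk Metropolis algorithm. The Python sketch below is illustrative only (the data, prior, and tuning values are made up): it samples from the posterior of a Gaussian mean and never computes the normalizing constant in Equation 1.1, because that constant cancels in the acceptance ratio.

```python
import math
import random

def log_post(theta, y, sd=1.0, prior_mu=0.0, prior_sd=10.0):
    """Unnormalized log posterior: log [y|theta] + log [theta]."""
    ll = sum(-(yi - theta)**2 / (2 * sd**2) for yi in y)
    lp = -(theta - prior_mu)**2 / (2 * prior_sd**2)
    return ll + lp

random.seed(3)
y = [random.gauss(2.0, 1.0) for _ in range(100)]  # made-up data

# Random-walk Metropolis: propose a move, accept with probability
# min(1, posterior ratio); the normalizing constant cancels in the ratio.
theta, lp_cur, draws = 0.0, log_post(0.0, y), []
for _ in range(5000):
    prop = theta + random.gauss(0.0, 0.3)
    lp_prop = log_post(prop, y)
    if math.log(random.random()) < lp_prop - lp_cur:
        theta, lp_cur = prop, lp_prop
    draws.append(theta)

post_mean = sum(draws[1000:]) / len(draws[1000:])
print(post_mean)  # near the sample mean of y, since the prior is vague
```

Discarding the first draws as burn-in and summarizing the rest yields Monte Carlo approximations to posterior quantities.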
Hierarchical models are composed of a sequence of nested probability distribu-
tions for the data, the process, and the parameters (Berliner 1996). For example, a
basic Bayesian hierarchical model is

yi,j ∼ [yi,j | zi , θ], zi ∼ [zi | β], θ ∼ [θ],

where zi is an underlying process for individual i and yi,j are repeated measurements
for each individual (j = 1, . . . , J). Notice that the process model parameters β also
require a prior distribution if the model is Bayesian. The posterior for this model is
the joint distribution of all unknown quantities (the zi , θ, and β) given the data.
Throughout the remainder of this book, we use both Bayesian and non-Bayesian
models for statistical inference in the settings where they are appropriate. Many com-
plicated hierarchical models are easier to implement from a Bayesian perspective,
but may not always be necessary. Hobbs and Hooten (2015) provide an accessible
description of both Bayesian and non-Bayesian methods and model-building strate-
gies as well as an overview of basic probability and fundamental approaches for fitting
models. Hereafter, we remind the reader of changes in notation and modeling strate-
gies as necessary without dwelling on the details of a full implementation because
those can be found in the referenced literature.
Each of these processes and associated statistical methods is relevant for analyz-
ing telemetry data. In Chapters 4 and 6, we show how spatial statistical concepts
can be employed to analyze telemetry data. We do not intend this chapter to be
comprehensive, but rather to serve as a reference for the important spatial pro-
cesses in the formulation of animal movement models in the following chapters. See
Cressie (1993) and Cressie and Wikle (2011) for additional material and references
concerning spatial and spatio-temporal statistical modeling.
of spatial locations), and it is those characteristics that are the random quantities of
interest. For SPPs, we may also be interested in other characteristics associated with
the points such as size, condition, or another variable associated with the point, but
the point location is of primary interest. An SPP containing auxiliary information is
referred to as a marked SPP.
Many types of SPPs have been studied and models have been formulated to pro-
vide inference using observed SPP data. In the two-dimensional (2-D) spatial setting,
where the size (n) of the SPP is known, we can formulate a basic model for an
SPP with data represented by location vectors si (containing the coordinates in some
geographic space) such that si ∼ f (s) for i = 1, . . . , n and with support si ∈ S . The
probability density function (PDF) f stochastically controls the placement of the
points si , as it would for any other random variable. In the situation where n is
unknown before observing the SPP, the size of the SPP is also random, and thus,
a component of the overall random process that arises.
Consider a set of observed telemetry data for an individual bobcat (Lynx rufus;
Figure 2.1). In this case, the positions of the individual are measured at an irregu-
lar set of times and presented in geographic space. Bobcat occur throughout much
of North America and have been the subject of several scientific studies involving
telemetry data. The data presented in Figure 2.1 were collected at the Welder Wildlife
Foundation Refuge in southern Texas, USA (Wilson et al. 2010) using VHF telemetry
techniques. We return to these data in what follows to demonstrate spatial statistical
methods. Telemetry data, such as those shown in Figure 2.1, are often treated as SPP
data, but doing so relies on several assumptions that we discuss in more detail as they
arise.
FIGURE 2.1 Measured positions (si , for i = 1, . . . , 110 in UTM) of an individual bobcat in
the Welder Wildlife Foundation Refuge.
1. Sample n ∼ Pois(λ),
2. Sample si ∼ Unif(S ) for i = 1, . . . , n,
where the intensity parameter λ is set a priori and equal to the expected size of
the point process (i.e., E(n) = λ) and “Unif” is a multivariate uniform distribution
(usually 2-D) for si ≡ (s1,i , . . . , sd,i ) in d dimensions. The resulting set of points
S ≡ (s1 , . . . , sn ) is a realization from a homogeneous Poisson SPP (they will also
be CSR).
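The two-step algorithm above can be sketched directly in Python (illustrative code with our own function names; a Poisson sampler is written out via Knuth's product method because the standard library lacks one):

```python
import math
import random

def rpois(lam, rng=random):
    """Poisson draw via Knuth's product method (fine for moderate lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def simulate_csr(lam, rng=random):
    """Homogeneous Poisson SPP on the unit square:
    step 1: n ~ Pois(lam); step 2: s_i ~ Unif([0, 1]^2) for i = 1, ..., n."""
    n = rpois(lam, rng)
    return [(rng.random(), rng.random()) for _ in range(n)]

random.seed(4)
pts = simulate_csr(100)
print(len(pts))  # realized point count; E(n) = lam = 100
```

Note that both the count and the coordinates are random, so repeated calls yield point sets of differing sizes.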
Using an intensity of λ = 100, we simulated two independent realizations (i.e.,
random sets of points) from a 2-D CSR Poisson SPP (Figure 2.2). In this case, the first
FIGURE 2.2 Simulated positions (s1,i , for i = 1, . . . , 94 observations in panel (a) and s2,i ,
for i = 1, . . . , 84 observations in panel (b)) on a unit square outlined in black.
* “Support” is the term used in statistics to describe the space where the random variable lives, or in other
words, the values that si can take on.
where d represents the distance for which we desire inference (e.g., is the process
clustered or regular within distance d?). The K-statistic can be affected by edges of
the spatial domain when points are close to it; thus, an edge-corrected estimator for
the K-statistic was proposed by Ripley (1976):
K̂(d) = λ̂−1 ∑_{i≠j} w(si , sj )−1 1_{di,j ≤ d} /n, (2.2)
where di,j is the distance between point i and point j, λ̂ = n/area(S ), and w(si , sj )
is the proportion of the circumference of a circle, centered at si , that is inside the
study region S . The term 1{di,j ≤d} is an indicator variable that is equal to one when
the condition in the subscript is true and zero otherwise.
The use of simulated data for hypothesis testing of SPP characteristics is more
tractable than deriving theoretical properties of the K-statistic. However, because
we are in a simulation setting, we need to employ a Monte Carlo test. The proce-
dure is simple. Begin by simulating a large number, N, of CSR SPPs; then compute
Equation 2.2 for each one and for the observed data S as well (i.e., K̂(d)obs ). Rank
all of these estimated K-statistics together for a given distance of interest d. Reject
the null hypothesis of “no clustering” at the α level if 1 − (rank(K̂(d)obs )/N) < α
(conversely, rank(K̂(d)obs )/N < α for the null hypothesis “no regularity”).
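The Monte Carlo procedure can be sketched as follows (illustrative Python under simplifying assumptions: a unit-square domain and the edge-correction weights w(si , sj ) of Equation 2.2 set to one, so this is an uncorrected version of the K-statistic):

```python
import math
import random

def khat(pts, d, area=1.0):
    """Estimate of Ripley's K at distance d, ignoring edge correction
    (the weights w(s_i, s_j) in Equation 2.2 are set to one here)."""
    n = len(pts)
    lam_hat = n / area
    close_pairs = sum(1 for i in range(n) for j in range(n)
                      if i != j and math.dist(pts[i], pts[j]) <= d)
    return close_pairs / (lam_hat * n)

def csr(n, rng=random):
    """n uniform points on the unit square (CSR conditional on n)."""
    return [(rng.random(), rng.random()) for _ in range(n)]

random.seed(5)
obs = csr(100)  # stand-in for an observed point pattern
d = 0.1
k_obs = khat(obs, d)
k_sims = sorted(khat(csr(100), d) for _ in range(199))
# Monte Carlo test: rank the observed statistic among CSR simulations;
# an extreme rank suggests clustering (high) or regularity (low).
rank = sum(k <= k_obs for k in k_sims)
print(k_obs, rank)
```

Because the "observed" pattern here is itself CSR, its rank should not be extreme; substituting real telemetry-derived points for `obs` would typically produce a high rank, indicating clustering.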
Plotting clustering statistics for several values of d simultaneously can be helpful
in the examination of SPP patterns descriptively. For graphical purposes, it is often
easier to assess patterns with a slightly modified version of K-function,* called the
L-function, which is estimated as
\[
\hat{L}(d) = \sqrt{\hat{K}(d)/\pi} - d. \qquad (2.3)
\]
* We use the term “function” here because K(d) and L(d) are considered for a range of values of d.
Statistics for Spatial Data 23
FIGURE 2.3 Simulated CSR SPP (a) and scaled bobcat telemetry SPP (c). Associated L̂(d)
functions appear in the right panels (b: CSR and d: bobcat). The gray regions in panels (b) and
(d) represent 95% intervals based on Monte Carlo simulation from 1000 CSR processes in the
same spatial domain.
estimating f (s) is called kernel density estimation (KDE) and has a long history of use
in a variety of applications (Diggle 1985; Cressie 1993; Schabenberger and Gotway
2005).
In KDE, one takes a nonparametric approach to estimating f , whereby, for any
location of interest c ≡ (c1 , c2 ) in the spatial domain S , the estimate of the density
function that gives rise to a point process is
\[
\hat{f}(\mathbf{c}) = \frac{\sum_{t=1}^{T} k\!\left((c_1 - s_{1,t})/b_1\right) k\!\left((c_2 - s_{2,t})/b_2\right)}{T b_1 b_2}, \qquad (2.4)
\]
where st ≡ (s1,t, s2,t)′, k represents the kernel (often assumed to be Gaussian), and
the parameters b1 and b2 are bandwidth parameters that control the diffuseness of the
kernel (Venables and Ripley 2002, Chapter 5). There are various ways to choose the
bandwidth parameters and these are well described in the literature (e.g., Silverman
1986). As the bandwidth increases, the smoothness of the KDE increases. Overly
smooth estimates will not reveal any patterns in the distribution giving rise to the
SPP but estimates that are not smooth enough will be too noisy to provide meaningful
inference.*
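Equation 2.4 translates almost directly into code. A minimal sketch with a Gaussian product kernel (the "telemetry" data here are simulated stand-ins and the bandwidths are arbitrary):

```python
import numpy as np

def gaussian_kernel(u):
    """Standard Gaussian kernel."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)

def kde(c, s, b1, b2):
    """Product-kernel density estimate at location c = (c1, c2); Equation 2.4."""
    T = len(s)
    k1 = gaussian_kernel((c[0] - s[:, 0]) / b1)
    k2 = gaussian_kernel((c[1] - s[:, 1]) / b2)
    return (k1 * k2).sum() / (T * b1 * b2)

rng = np.random.default_rng(2)
telemetry = rng.normal(0.5, 0.1, size=(200, 2))   # stand-in for telemetry fixes
density_center = kde((0.5, 0.5), telemetry, b1=0.05, b2=0.05)
```

Evaluating `kde` over a grid of locations `c` yields the kind of contour surface shown for the bobcat data below; larger `b1`, `b2` produce smoother surfaces, as discussed above.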
Treating the bobcat telemetry data as an observed SPP, we calculated the KDE for
the region shown in Figure 2.4. Based on the observed telemetry data, it appears that
the spatial density function giving rise to the bobcat SPP is irregularly shaped and
nonuniform. These results agree with the estimated L-function based on the same
data (Figure 2.3c and d).
FIGURE 2.4 Kernel density estimate (shown as contours) for the bobcat telemetry data.
* See Fieberg (2007) for a review of KDE methods for telemetry data.
The expression (2.5) implies that the expected total number of points in S
is E(n(S )) = λ̃(S |β).
2. For any J regions, B1 , . . . , BJ ⊆ S , that do not overlap, the number of
points in each subregion, n(B1 ), . . . , n(BJ ), are independent Poisson random
variables.
\[
f(\mathbf{s}_1, \ldots, \mathbf{s}_n) = \prod_{i=1}^{n} \frac{\lambda(\mathbf{s}_i \mid \boldsymbol\beta)}{\tilde{\lambda}(\mathcal{S} \mid \boldsymbol\beta)} = \frac{\prod_{i=1}^{n} \lambda(\mathbf{s}_i \mid \boldsymbol\beta)}{\tilde{\lambda}(\mathcal{S} \mid \boldsymbol\beta)^{n}}.
\]
From the first property above, n ∼ Poisson (λ̃(S |β)) implies that the full likelihood
is the joint PDF of (s1 , . . . , sn , n) as a function of β:
\[
L(\boldsymbol\beta) = \frac{n! \prod_{i=1}^{n} \lambda(\mathbf{s}_i \mid \boldsymbol\beta)}{\tilde{\lambda}(\mathcal{S} \mid \boldsymbol\beta)^{n}} \times \frac{\tilde{\lambda}(\mathcal{S} \mid \boldsymbol\beta)^{n} e^{-\tilde{\lambda}(\mathcal{S} \mid \boldsymbol\beta)}}{n!} \qquad (2.6)
\]
\[
= \prod_{i=1}^{n} \lambda(\mathbf{s}_i \mid \boldsymbol\beta) \exp\!\left(-\tilde{\lambda}(\mathcal{S} \mid \boldsymbol\beta)\right), \qquad (2.7)
\]
The n! term in Equation 2.6 arises because it does not matter in which order the
points are observed. Thus, the indices can be permuted n! different ways. The log of
Equation 2.7 yields the classic form of the log likelihood for the Poisson SPP:
\[
l(\boldsymbol\beta) = \sum_{i=1}^{n} \log \lambda(\mathbf{s}_i \mid \boldsymbol\beta) - \int_{\mathcal{S}} \lambda(\mathbf{s} \mid \boldsymbol\beta)\, d\mathbf{s}. \qquad (2.8)
\]
For a log-linear intensity function, λ(s | β) = exp(x′(s)β), the log likelihood becomes
\[
l(\boldsymbol\beta) = \sum_{i=1}^{n} \mathbf{x}'(\mathbf{s}_i)\boldsymbol\beta - \int_{\mathcal{S}} \exp(\mathbf{x}'(\mathbf{s})\boldsymbol\beta)\, d\mathbf{s}. \qquad (2.9)
\]
One can now proceed with standard statistical model fitting, from a likelihood
or Bayesian perspective, by either maximizing Equation 2.9 or assigning a prior
distribution for β (e.g., β ∼ N(0, Σβ)) and finding the posterior distribution of
β|s1 , . . . , sn .
The main challenge in fitting the Poisson SPP model is that the integral on the
right-hand side of Equations 2.8 and 2.9 must be computed at every step in an opti-
mization or sampling algorithm because it contains the parameter vector β. The added
computational cost of the required numerical integration can lead to cumbersome
algorithms for direct maximization of the log likelihood and this is compounded if
the model is fit using MCMC. However, recent findings have shown that inference
using the inhomogeneous Poisson SPP model can be achieved in a wide variety of
ways, often with readily available statistical software. See Berman and Turner (1992),
Baddeley and Turner (2000), and Illian et al. (2013) for more detailed descriptions.
We provide a very brief introduction to the general approaches used in model fitting
in what follows.
There are two basic methods for approximating the log likelihood in Equation 2.9.
First, if x(s) is defined on a grid of cells over S , then we can use the first property
of Poisson SPPs and sum all of the events occurring in each cell to obtain counts yj. Each yj is an
independent Poisson random variable with rate λj = aj exp(x′jβ), where aj is the
area of cell j. One can use any statistical software to fit a Poisson regression with
offset equal to log(aj). This does not really numerically approximate the likelihood, but
rather uses a summary of the raw data that retain much of the original information
and has a more usable likelihood. The second technique approximates the likelihood
function itself. The likelihood approximation is known as the Berman–Turner device
(Berman and Turner 1992), which can be described as
1. Partition S into J regions (e.g., grid cells) and take the centroids, c1 , . . . , cJ ,
as quadrature points. The integral is then approximated with
\[
\int_{\mathcal{S}} \exp(\mathbf{x}'(\mathbf{s})\boldsymbol\beta)\, d\mathbf{s} \approx \sum_{j=1}^{J} w_j \exp(\mathbf{x}'(\mathbf{c}_j)\boldsymbol\beta),
\]
where the wj are quadrature weights (e.g., the areas of the regions).
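The gridded-count approach (the first method above) can be sketched as follows. This is a minimal illustration rather than packaged GLM software: the 20 × 20 grid, the single covariate, and the coefficients are hypothetical, and the Poisson regression with offset log(aj) is fit by Newton–Raphson:

```python
import numpy as np

rng = np.random.default_rng(3)

# A 20 x 20 grid of cells over the unit square; one hypothetical covariate.
J = 400
g = np.linspace(0.025, 0.975, 20)
cells = np.stack(np.meshgrid(g, g), axis=-1).reshape(-1, 2)
area = 1.0 / J                                    # a_j: equal cell areas
X = np.column_stack([np.ones(J), cells[:, 0]])    # intercept + covariate
beta_true = np.array([4.0, 1.5])

# Cell counts y_j ~ Poisson(lambda_j), with lambda_j = a_j exp(x_j' beta).
y = rng.poisson(area * np.exp(X @ beta_true))

# Poisson regression with offset log(a_j), fit by Newton-Raphson.
beta = np.array([np.log(y.sum() / (area * J)), 0.0])  # start at overall log rate
for _ in range(50):
    lam = area * np.exp(X @ beta)
    grad = X.T @ (y - lam)                        # score vector
    hess = -(X.T * lam) @ X                       # Hessian of the log likelihood
    beta = beta - np.linalg.solve(hess, grad)
```

Because the Poisson log likelihood with log link is concave, the Newton iterations converge quickly from a reasonable starting value; in practice one would simply call existing GLM software with the offset.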
When the number of points is known a priori, the SPP likelihood in Equation 2.6
simplifies to
\[
L(\boldsymbol\beta) = \frac{n! \prod_{i=1}^{n} \lambda(\mathbf{s}_i \mid \boldsymbol\beta)}{\tilde{\lambda}(\mathcal{S} \mid \boldsymbol\beta)^{n}}, \qquad (2.11)
\]
resulting in the log likelihood
\[
l(\boldsymbol\beta) = \log(n!) + \sum_{i=1}^{n} \log \lambda(\mathbf{s}_i \mid \boldsymbol\beta) - n \log \int_{\mathcal{S}} \lambda(\mathbf{s} \mid \boldsymbol\beta)\, d\mathbf{s}. \qquad (2.12)
\]
The integral in Equation 2.12 must still be calculated to maximize the likelihood with
respect to β. Similar methods can be used to fit the SPP model when n is known and
we describe several of these in Chapter 4. The general form of PDF,
\[
\frac{\lambda(\mathbf{s}_i \mid \boldsymbol\beta)}{\tilde{\lambda}(\mathcal{S} \mid \boldsymbol\beta)}, \qquad (2.13)
\]
for a point (si ) arising from a point process distribution has been referred to as a
“weighted distribution” in the statistical literature (e.g., Patil and Rao 1976, 1977,
1978).
There are three other useful classes of point process models for analyzing animal
telemetry data that we briefly mention here and expand upon in Chapter 4 where they
are directly discussed in reference to telemetry data models. The first class of models
is the log Gaussian Cox process (LGCP) model. The LGCP model is a simple extension
of the Poisson SPP in Equation 2.9, with the intensity function modeled as
where α(si , β) is a spatial effect (e.g., x (si )β), φ is a potential function that decreases
with increasing distance between points (δij ≡ ||si − sj ||) and controls the interaction
among points, and zβ is a normalizing term that ensures the likelihood is a PDF with
respect to s1 , . . . , sn . While the likelihood in Equation 2.15 appears relatively benign,
the zβ needed is usually analytically intractable and cannot be easily evaluated. How-
ever, Baddeley and Turner (2000) and Illian et al. (2013) have examined methods
similar to the previously described approximations for fitting GSPPs. In Chapter 4,
we illustrate a method similar to Illian et al. (2013) for developing and fitting a GSPP
model specifically for animal telemetry data.
FIGURE 2.5 (a) Temperature data in February 1941, from portions of eight Midwestern
states (states outlined in black). Relative temperature values indicated by circle size. (b)
Frequency histogram for the average maximum temperatures in degrees Fahrenheit.
FIGURE 2.6 Covariates (i.e., predictor variables) for temperature: (a) longitude and (b) lat-
itude, shown as spatial maps (larger values shown in darker shade). U.S. state boundaries
overlaid in black. Points represent measurement locations.
Statistically, we can model the observed CSP as we would any other response
variable in a linear or generalized linear model setting. For example, consider a continuous univariate response variable y(si) with real support (i.e., y ∈ ℝ). Then we
have the linear model:
y(si) = x′(si)β + η(si),
where η(si ) ∼ N(0, σ 2 ) for i = 1, . . . , n. The assumption of normally distributed
errors is not a necessity for the estimation of β and σ 2 but it implies a decidedly
model-based statistical approach and allows us to generalize the model for other
purposes; thus, we retain it here.
The linear model can also be written as
\[
\mathbf{y} = \mathbf{X}\boldsymbol\beta + \boldsymbol\eta, \qquad (2.16)
\]
where η ∼ N(0, Σ). A common parametric choice is the exponential covariance model, with elements
\[
\Sigma_{ij} \equiv \sigma^2 \exp(-d_{ij}/\phi), \qquad (2.17)
\]
where σ² is the variance component (i.e., the "sill" in geostatistical parlance), dij is the
distance between locations si and sj (often written as dij ≡ ||si − sj||, where the
double bar notation implies a "norm"), and φ is a spatial range parameter. As φ increases,
* The phrase “model-based geostatistics” was coined by Peter Diggle (Diggle et al. 1998).
the range of spatial structure in the second-order process η also increases. Thus, in
fitting this model, there is only one additional parameter (φ) to estimate beyond the
q + 1 parameters in the conventional regression model (2.16). Also note that the
covariance matrix (2.17) can be written as Σ ≡ σ²R(φ), where R(φ) ≡ exp(−D/φ)
for pairwise distance matrix D.*
Numerous covariance models have been used to capture different types of spatial
dependence in the errors (e.g., Matérn, Gaussian, spherical), and some are more general than others. There are many excellent spatial statistics references, but Banerjee
et al. (2014) (p. 21) provided a particularly useful succinct summary of covariance
models.
Geostatistical models can be fit using generalized least squares (GLS), maxi-
mum likelihood, or Bayesian methods. In the nonparametric setting, the residuals
e(si ) = y(si ) − ŷ(si ), arising from a model fit based on Equation 2.16, are used to
empirically characterize the covariance using either a covariogram or a variogram.
The covariogram (c(si , sj )) and variogram (2γ (si , sj )) are directly related to each
other under certain conditions by c(si , sj ) = c(0) − γ (si , sj ). Under the assumption
of stationarity, the variogram for the errors can be expressed as
\[
2\gamma(\mathbf{s}_i, \mathbf{s}_j) = \mathrm{Var}\!\left(\eta(\mathbf{s}_i) - \eta(\mathbf{s}_j)\right) = E\!\left(\left(\eta(\mathbf{s}_i) - \eta(\mathbf{s}_j)\right)^2\right), \qquad (2.19)
\]
where the last equality arises because η(si ) and η(sj ) are assumed to have a constant
mean. The variogram is then estimated with the “empirical variogram”
\[
2\hat{\gamma}(\mathbf{s}_i, \mathbf{s}_j) = \frac{1}{n_b} \sum_{\mathcal{S}_b} \left(\eta(\mathbf{s}_i) - \eta(\mathbf{s}_j)\right)^2, \qquad (2.20)
\]
where Sb is a set of location pairs falling into a vector difference bin of choice and nb
is the size of this set (i.e., number of pairs in the bin). Often, 2γ̂ (si , sj ) is calculated
for a set of bins, usually over a range of distances. Also note that, if the spatial process
η is not observed directly, the residuals e resulting from a fit of the independent error
model (2.16) are used instead. The empirical variogram is a moment-based estimator
that is often credited to Matheron (1963) (though see Cressie 1990 for a discussion
of the history of geostatistics).†
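The moment-based estimator in Equation 2.20 can be sketched as a short function; the coordinates, residuals, and distance bins below are hypothetical stand-ins:

```python
import numpy as np

def empirical_semivariogram(coords, e, bins):
    """Binned moment estimator of the semivariance (Equation 2.20, divided by 2)."""
    diff = coords[:, None, :] - coords[None, :, :]
    d = np.sqrt((diff ** 2).sum(-1))
    sq = (e[:, None] - e[None, :]) ** 2
    iu = np.triu_indices(len(e), k=1)         # count each pair once
    d, sq = d[iu], sq[iu]
    gamma = np.full(len(bins) - 1, np.nan)
    for b in range(len(bins) - 1):
        inb = (d >= bins[b]) & (d < bins[b + 1])
        if inb.any():
            gamma[b] = sq[inb].mean() / 2.0   # semivariance for this distance bin
    return gamma

rng = np.random.default_rng(4)
coords = rng.uniform(0, 10, size=(50, 2))
e = rng.normal(size=50)                       # stand-in for regression residuals
bins = np.linspace(0, 7, 8)                   # roughly half the maximum distance
gamma_hat = empirical_semivariogram(coords, e, bins)
```

Plotting `gamma_hat` against the bin midpoints gives semivariograms of the kind shown in Figure 2.7.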
Estimated only as a function of Euclidean distance between observation locations,
the empirical semivariogram for the raw temperature data is shown in Figure 2.7a,
whereas the semivariogram for the residuals (after regressing temperature on lon-
gitude and latitude) is shown in Figure 2.7b. In Figure 2.7, both semivariograms
generally increase to an asymptote, but the semivariogram for raw temperature has
* The “exp” is an element-wise exponential, exponentiating each element of the matrix on the inside of
the parentheses.
† The term “semivariogram” is often used in the spatial statistics literature and refers to γ (d), differing
from the variogram by a factor of 2. Spatial statistical software often computes the semivariance directly.
FIGURE 2.7 Empirical semivariogram for temperature data (a) and (b) residuals after
regressing temperature on longitude and latitude. The range of the x-axis is half of the
maximum distance in the spatial domain.
a much larger asymptote and reaches it at larger distances. The maximum semivari-
ance is smaller for the residuals because most of the variation has been accounted for
by the covariates (i.e., longitude and latitude). Also, the point at which the semivar-
iogram levels off for the residuals occurs at a smaller distance because the range of
spatial structure in the raw temperatures includes the major north–south trend.
In layman’s terms, the two key assumptions in geostatistical modeling can be
intuited as
prediction at locations that were unobserved. The continuity in space is one reason
why spatial maps are often referred to as “processes.”*
Before we turn to prediction, we describe the covariance modeling that is used
in many applications of geostatistics. After the empirical variogram is estimated and
plotted against d for a set of distance bins (e.g., Figure 2.7), it allows us to visualize
the spatial structure in the process. It is often critical to find a parametric form for this
covariance so that (1) the covariance matrix can be used for further inference and (2)
we can learn about the covariance at distances other than those used in the empirical
variogram. The ability to calculate covariance for all locations in the spatial domain
facilitates prediction. Thus, we must find a parametric model that fits the empirical
variogram well. Like the covariance models discussed earlier (2.18), there is a suite of
parametric variogram models that are related to the covariance models through c(d) =
c(0) − γ (d). Weighted least squares is a common method for fitting the parametric
variogram model to the empirical variogram and yields parameter estimates for σ 2
and φ (and others if the model contains more). The covariance parameter estimates
can be substituted into the covariance matrix Σ̂, which can then be used for estimating
β from the linear regression model (2.16) using GLS:
\[
\hat{\boldsymbol\beta}_{\mathrm{GLS}} = (\mathbf{X}'\hat{\boldsymbol\Sigma}^{-1}\mathbf{X})^{-1}\mathbf{X}'\hat{\boldsymbol\Sigma}^{-1}\mathbf{y}. \qquad (2.21)
\]
In principle, this process is iterated such that the covariance matrix is estimated
based on the new residuals e = y − Xβ̂ GLS and then Equation 2.21 is used again to
update the regression coefficient estimates. In practice, we have found that the itera-
tively reweighted least squares procedure requires few iterations to converge to stable
estimates.
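A minimal sketch of the GLS step in Equation 2.21, with data simulated under a hypothetical exponential covariance model (a real analysis would re-estimate the variogram parameters from the residuals between iterations, which we hold fixed here to keep the sketch self-contained):

```python
import numpy as np

def gls(X, y, Sigma):
    """Generalized least squares estimate of beta (Equation 2.21)."""
    Si = np.linalg.inv(Sigma)
    return np.linalg.solve(X.T @ Si @ X, X.T @ Si @ y)

# Simulate from the geostatistical model with an exponential covariance.
rng = np.random.default_rng(5)
n = 100
coords = rng.uniform(0, 10, size=(n, 2))
D = np.sqrt(((coords[:, None] - coords[None, :]) ** 2).sum(-1))
sigma2, phi = 2.0, 3.0                     # treat as estimated from a variogram fit
Sigma = sigma2 * np.exp(-D / phi)
X = np.column_stack([np.ones(n), coords[:, 0]])
beta_true = np.array([1.0, -0.5])
L = np.linalg.cholesky(Sigma + 1e-8 * np.eye(n))
y = X @ beta_true + L @ rng.normal(size=n)

beta_gls = gls(X, y, Sigma)
# In practice: re-estimate (sigma2, phi) from e = y - X @ beta_gls and iterate.
```

Note that with Σ = I the GLS estimator reduces to ordinary least squares, which provides a useful sanity check.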
There are several alternatives to the iterative reweighted least squares estimation
procedure, including maximum likelihood and Bayesian methods. In the case of max-
imum likelihood, we begin with the fully parametric model and seek to find the
parameter values that maximize
\[
L(\boldsymbol\beta, \sigma^2, \phi) \propto |\boldsymbol\Sigma(\sigma^2, \phi)|^{-1/2} \exp\!\left(-\frac{1}{2}(\mathbf{y} - \mathbf{X}\boldsymbol\beta)'\boldsymbol\Sigma(\sigma^2, \phi)^{-1}(\mathbf{y} - \mathbf{X}\boldsymbol\beta)\right), \qquad (2.22)
\]
where Σ(σ², φ) makes it explicit that the covariance matrix depends on the parameters σ² and φ. In the Bayesian framework, we specify priors for the model parameters
(β, σ 2 , and φ) and find the joint posterior distribution of these parameters given the
data.
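Evaluating the likelihood in Equation 2.22 is straightforward once Σ(σ², φ) can be constructed. The sketch below (hypothetical data and parameter values) profiles the log likelihood over a grid of φ values with β and σ² held fixed; a full maximization would optimize over all parameters jointly:

```python
import numpy as np

def log_lik(beta, sigma2, phi, X, y, D):
    """Log of Equation 2.22 (up to an additive constant)."""
    Sigma = sigma2 * np.exp(-D / phi)          # exponential covariance
    sign, logdet = np.linalg.slogdet(Sigma)
    r = y - X @ beta
    return -0.5 * logdet - 0.5 * r @ np.linalg.solve(Sigma, r)

# Toy data simulated under known parameter values.
rng = np.random.default_rng(6)
n = 50
coords = rng.uniform(0, 10, size=(n, 2))
D = np.sqrt(((coords[:, None] - coords[None, :]) ** 2).sum(-1))
X = np.column_stack([np.ones(n), coords[:, 0]])
beta = np.array([1.0, -0.5])
Sigma_true = 2.0 * np.exp(-D / 3.0)
y = X @ beta + np.linalg.cholesky(Sigma_true + 1e-10 * np.eye(n)) @ rng.normal(size=n)

# Evaluate the log likelihood over a grid of phi values.
phis = np.linspace(0.5, 10, 20)
lls = [log_lik(beta, 2.0, p, X, y, D) for p in phis]
phi_hat = phis[int(np.argmax(lls))]
```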
In cases where there may be small-scale variability or sources of measurement
error, the geostatistical model is modified slightly to include an uncorrelated error
term (often referred to as a “nugget” effect in the spatial statistics literature) such that
y = Xβ + η + ε, (2.23)
* A Gaussian process, for example, is a continuous random process arising from a normal distribution
(perhaps in many dimensions) with covariance structure.
where ε ∼ N(0, σε2 I). This generalization adds an additional parameter to the model
that needs to be estimated, but provides a way for the error in the spatial process to
arise from correlated and uncorrelated sources.
Using the temperature data as an example, the semivariogram for the temperature
residuals (Figure 2.7b) suggests a nugget may be useful for describing the covariance
structure because the semivariance is larger than zero at very small distances.
2.2.2 PREDICTION
Optimal prediction of the response variable y, in the spatial context, is referred to
as “Kriging,” named after the mining engineer D.L. Krige (see Cressie 1990 for
details). Given that response variables (i.e., data) are considered random variables
until they are observed, for prediction, we seek the conditional distribution of unob-
served response variables given those that were observed. That is, for a set of observed
data yo and a set of unobserved data yu , we wish to characterize the distribution
[yu |yo ], or at least moments of this probability distribution, which is referred to as
the predictive distribution.* In the case of interpolation (prediction within the space
of the data), we are often interested in obtaining the predictions ŷu = E(yu |yo ). A
tremendously useful feature of the Gaussian distribution is that it has analytically
tractable marginal and conditional distributions that are also Gaussian. To see this,
consider the joint distribution of the observed and unobserved data, such that
\[
\begin{pmatrix} \mathbf{y}_o \\ \mathbf{y}_u \end{pmatrix} \sim \mathrm{N}\!\left( \begin{pmatrix} \mathbf{X}_o \\ \mathbf{X}_u \end{pmatrix} \boldsymbol\beta,\; \begin{pmatrix} \boldsymbol\Sigma_{o,o} & \boldsymbol\Sigma_{o,u} \\ \boldsymbol\Sigma_{u,o} & \boldsymbol\Sigma_{u,u} \end{pmatrix} \right), \qquad (2.24)
\]
where the o subscript is used to denote correspondence with the observed data set and
u with the unobserved data set and the associated covariance and cross-covariance
matrices are indicated by the ordering of their subscripts. Then, using properties of
the multivariate normal distribution, the conditional distribution of the unobserved
data (yu) given the observed data (yo) is
\[
\mathbf{y}_u \mid \mathbf{y}_o \sim \mathrm{N}\!\left(\mathbf{X}_u\boldsymbol\beta + \boldsymbol\Sigma_{u,o}\boldsymbol\Sigma_{o,o}^{-1}(\mathbf{y}_o - \mathbf{X}_o\boldsymbol\beta),\; \boldsymbol\Sigma_{u,u} - \boldsymbol\Sigma_{u,o}\boldsymbol\Sigma_{o,o}^{-1}\boldsymbol\Sigma_{o,u}\right). \qquad (2.25)
\]
When the parameters are all known, Equation 2.25 is the exact predictive distribution
of the unobserved data; thus, the Kriging predictions are obtained using
\[
\hat{\mathbf{y}}_u = \mathbf{X}_u\boldsymbol\beta + \boldsymbol\Sigma_{u,o}\boldsymbol\Sigma_{o,o}^{-1}(\mathbf{y}_o - \mathbf{X}_o\boldsymbol\beta), \qquad (2.26)
\]
which is also known as the best linear unbiased predictor (BLUP). The BLUP is a
well-known statistical concept used in many forms of prediction.
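The Kriging predictor in Equation 2.26 is a few lines of linear algebra. In this sketch the covariance model, coordinates, and β are hypothetical; note that, without a nugget effect, the predictor interpolates the observed data exactly:

```python
import numpy as np

def krige(X_o, y_o, X_u, S_oo, S_uo, beta):
    """Kriging/BLUP predictor of Equation 2.26."""
    return X_u @ beta + S_uo @ np.linalg.solve(S_oo, y_o - X_o @ beta)

# Exponential covariance over hypothetical observed and prediction locations.
rng = np.random.default_rng(7)
obs_coords = rng.uniform(0, 1, size=(30, 2))
pred_coords = rng.uniform(0, 1, size=(5, 2))
all_coords = np.vstack([obs_coords, pred_coords])
D = np.sqrt(((all_coords[:, None] - all_coords[None, :]) ** 2).sum(-1))
S = 1.5 * np.exp(-D / 0.3)
S_oo, S_uo = S[:30, :30], S[30:, :30]
X_o, X_u = np.ones((30, 1)), np.ones((5, 1))   # intercept-only mean, for simplicity
beta = np.array([2.0])
y_o = rng.normal(2.0, 1.0, size=30)            # stand-in observations

y_hat = krige(X_o, y_o, X_u, S_oo, S_uo, beta)
```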
* As previously mentioned, brackets used as [·] denote a probability distribution. Originally, the bracket
notation used in this way (Gelfand and Smith 1990) represented a PDF or PMF, but more recently, it
has been adopted as a space-saving notation for probability distributions in general (Hobbs and Hooten
2015).
FIGURE 2.8 (a) Empirical (points) and fitted (line) semivariogram for the residuals after
regressing temperature on longitude and latitude. Fitted semivariogram is based on an exponen-
tial covariance model with nugget effect. (b) Spatial predictions for temperature using Kriging
(darker is warmer). U.S. state boundaries overlaid as lines and observation locations shown as
points.
Schabenberger and Gotway (2005) note that Equation 2.28 is somewhat misleading
in that β̂ REML is really a GLS estimator evaluated at the REML estimates for the
covariance parameters.
In the big picture, this concept of restricting the estimation to focus on a subset of
the larger parameter space has many more applications than just maximum likelihood.
It can also play an important role in prioritizing first-order (i.e., mean) versus second-order (i.e., covariance) effects in models and also in dimension reduction for improved
computational efficiency. We return to these issues in the sections that follow.
* It appears that Gelfand and Smith (1990) were the first to employ such notation and we thank them
for it every time we write a posterior or full-conditional distribution using this notation because it is
streamlined and uncluttered.
model without a nugget effect, the data portion of the model can be written as
\[
\mathbf{y} \sim \mathrm{N}\!\left(\mathbf{X}\boldsymbol\beta,\; \sigma^2\mathbf{R}(\phi)\right) \equiv [\mathbf{y} \mid \boldsymbol\beta, \sigma^2, \phi],
\]
and the parameter models (i.e., priors) as
β ∼ N(μβ, Σβ) ≡ [β],
σ 2 ∼ IG(α1 , α2 ) ≡ [σ 2 ],
φ ∼ Gamma(γ1 , γ2 ) ≡ [φ],
where the inverse gamma prior PDF is
\[
[\sigma^2] \equiv \frac{1}{\alpha_1^{\alpha_2}\,\Gamma(\alpha_2)} (\sigma^2)^{-\alpha_2 - 1} e^{-1/(\alpha_1 \sigma^2)}.
\]
To fit the model, we find the conditional distribution of the unknowns (parameters)
given the knowns (data). This distribution, [β, σ 2 , φ|y], is known as the posterior dis-
tribution. Using Bayes’ law, we can write out the posterior distribution as a function
of the model distributions (i.e., data model and parameter models)
\[
[\boldsymbol\beta, \sigma^2, \phi \mid \mathbf{y}] = \frac{[\mathbf{y} \mid \boldsymbol\beta, \sigma^2, \phi]\,[\boldsymbol\beta]\,[\sigma^2]\,[\phi]}{c(\mathbf{y})}, \qquad (2.30)
\]
where the product of priors is used because we are assuming the parameters are inde-
pendent a priori. The constant c(y) is actually a function of the data y and is a single
number that allows the left-hand side of Equation 2.30 to integrate to 1, as required of
all PDFs. We could attempt to integrate the right-hand side of Equation 2.30 directly
to find c(y); however, in this case, the integral is not analytically tractable.* Thus, we
rely on one of many Bayesian computational methods to find the posterior distribution
for the parameters of interest.
As we noted in Chapter 1, MCMC is an incredibly useful computational method
for fitting Bayesian models and has the advantage of being relatively intuitive and
easy to program (as compared with many other methods). The basic idea under-
pinning MCMC is to sample a single parameter (or subset of parameters) from the
conditional distribution, given everything else (termed the “full-conditional distri-
bution,” and denoted as [parameter|·]), assuming that everything else in the model
is actually known (i.e., data and other parameters). For the parameter vector β, the
full-conditional distribution is [β|·] ≡ [β|y, σ 2 , φ]. After a sample, β (k) , is obtained,
for the kth MCMC iteration, we sample the next parameter, (σ 2 )(k) , from its full-
conditional distribution [σ 2 |·], and then sample the remaining parameter φ (k) from
* An analytically intractable expression cannot be written in closed form (i.e., pencil and paper).
its full-conditional distribution [φ|·]. After we have sequentially sampled all param-
eters from their full-conditionals using the latest sampled values of each parameter
being conditioned on, we loop back to the first parameter and sample each parameter
again such that we are always conditioning on the most recent values for parameters
in the loop. MCMC theory shows that these sequences of samples, called Markov
chains, will eventually produce a sample from the correct joint posterior distribution,
given enough iterations of the MCMC algorithm. Hobbs and Hooten (2015) provide
additional insight about MCMC that solidifies the quick introduction presented here.
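The Gibbs loop described above can be illustrated with a deliberately simple conjugate model: a Gaussian mean μ and variance σ² standing in for β, σ², and φ (the priors, data, and chain length below are all hypothetical):

```python
import numpy as np

rng = np.random.default_rng(8)
y = rng.normal(3.0, 1.0, size=200)             # hypothetical data
n = len(y)

mu, sigma2 = 0.0, 1.0                          # starting values
K = 3000
mu_save = np.empty(K)
for k in range(K):
    # [mu | .]: conjugate normal update given sigma2 (prior mu ~ N(0, 100))
    prec = n / sigma2 + 1.0 / 100.0
    mu = rng.normal((y.sum() / sigma2) / prec, np.sqrt(1.0 / prec))
    # [sigma2 | .]: inverse gamma given mu (prior IG with shape 2, rate 1),
    # sampled via the reciprocal of a gamma draw
    shape = 2.0 + n / 2.0
    rate = 1.0 + ((y - mu) ** 2).sum() / 2.0
    sigma2 = 1.0 / rng.gamma(shape, 1.0 / rate)
    mu_save[k] = mu

posterior_mean_mu = mu_save[K // 2:].mean()    # discard the first half as burn-in
```

With a diffuse prior and this much data, the posterior mean of μ lands close to the sample mean of y, as the conjugate theory predicts.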
After the samples have been obtained, various point and interval estimates (among
other important quantities) for other parameters can be approximated by computing
sample statistics on the Markov chains themselves. For example, we could find the
posterior mean of the regression coefficients by averaging the set of MCMC samples
\[
E(\boldsymbol\beta \mid \mathbf{y}) \approx \frac{\sum_{k=1}^{K} \boldsymbol\beta^{(k)}}{K},
\]
where k = 1, . . . , K represent the iterations in the MCMC algorithm and the total
number of MCMC iterations K is large enough that the posterior mean is well approx-
imated. Posterior summarization is trivial (i.e., taking various sample averages of
the MCMC output) because the sampling-based method for approximating inte-
grals, called Monte Carlo (MC) integration, has excellent properties. For example,
one could approximate any integral using MC samples (independent and identically
distributed) θ (k) ∼ [θ] for k = 1, . . . , K with
\[
E_\theta(g(\theta)) = \int g(\theta)\,[\theta]\, d\theta \approx \frac{\sum_{k=1}^{K} g(\theta^{(k)})}{K}, \qquad (2.31)
\]
for some PDF [θ] and function g(θ). Therefore, coupling MC integration
with MCMC output from Bayesian model fitting yields an incredibly powerful tool
for finding posterior quantities of nearly any function of model parameters. Trying
to provide such inference under non-Bayesian paradigms, if possible at all, requires
complicated procedures such as the delta method (e.g., Ver Hoef 2012) or further
computational burden, such as bootstrapping.
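Equation 2.31 in action, for a case with a known answer (E(θ²) = 1 for θ ∼ N(0, 1)); here the draws are independent MC samples, but MCMC output is averaged the same way:

```python
import numpy as np

# Monte Carlo integration (Equation 2.31): approximate E(g(theta)) for
# theta ~ N(0, 1) and g(theta) = theta^2, whose true value is 1.
rng = np.random.default_rng(9)
K = 200_000
theta = rng.normal(size=K)
approx = (theta ** 2).mean()
```

The MC error shrinks at rate 1/√K, so a few hundred thousand samples pin the integral down to two or three decimal places here.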
In practice, MCMC algorithms may require some “burn-in” period where the sam-
ples are still converging to the correct posterior distribution, and thus, a set of initial
samples (often the first fourth or half) are discarded before computing posterior sum-
mary statistics. Furthermore, it may not always be easy to assess whether an MCMC
algorithm has converged, and although some statistics and guidelines exist, it is an
ongoing challenge to assess convergence in high-dimensional settings.
MCMC algorithms are surprisingly easy to construct in a statistical programming
language such as R (R Core Team 2013), but several automated MCMC sampling
software packages are also available (e.g., BUGS, JAGS, INLA, and STAN; Lunn et al. 2000;
Plummer 2003; Lindgren and Rue 2015; Carpenter et al. 2016). Furthermore, we
emphasize that, even though MCMC has led to numerous breakthroughs in statistics
and science, and has served as a catalyst for Bayesian methods and studies in general,
new Bayesian computational approaches are regularly being developed. Depending
on the desired inference and model, some alternative computational approaches have
advantages over MCMC. However, as previously mentioned, few, if any, alternatives
are as robust, intuitive, and as easy to implement as MCMC.
One of the primary advantages of MCMC and the Bayesian approach to geostatis-
tics in general is that uncertainty can properly be accounted for in both parameter
estimation and prediction. In fact, where many of the non-Bayesian approaches to
geostatistics involve a sequential set of estimation procedures (i.e., first obtain ordi-
nary least square [OLS] coefficient estimates, calculate residuals, estimate variogram,
then find GLS coefficient estimates), parameter estimation and prediction can all be
done simultaneously under the Bayesian paradigm using MCMC.
In MCMC, one only needs to sample from the full-conditional distribution for each
unknown quantity of interest given everything else in the model. For prediction, we
only need to sample yu(k) from its full-conditional [yu|·] along with the other parameters
in an MCMC algorithm. Basic linear algebra leads to the necessary full-conditional
distribution, which turns out to be the predictive distribution for yu we described
previously (2.25):
\[
\mathbf{y}_u \mid \cdot \sim \mathrm{N}\!\left(\mathbf{X}_u\boldsymbol\beta + \boldsymbol\Sigma_{u,o}\boldsymbol\Sigma_{o,o}^{-1}(\mathbf{y}_o - \mathbf{X}_o\boldsymbol\beta),\; \boldsymbol\Sigma_{u,u} - \boldsymbol\Sigma_{u,o}\boldsymbol\Sigma_{o,o}^{-1}\boldsymbol\Sigma_{o,u}\right). \qquad (2.32)
\]
Therefore, it is trivial to obtain MCMC samples from Equation 2.32 inside of a larger
MCMC algorithm and, using the output, we can easily find the Bayesian Kriging
predictions E(yu |yo ) by averaging the MCMC samples for yu according to Equation
2.31. Furthermore, the sample variance (2.31) of the MCMC samples for yu approx-
imates the posterior Kriging variance Var(yu |yo ) while incorporating the uncertainty
involved in the estimation of model parameters. The Bayesian approach to geostatis-
tics is probably the most coherent method for performing prediction while properly
accommodating uncertainty.
FIGURE 2.9 (a) Map of Colorado counties and (b) connections (straight black lines) between
Park county and the neighboring counties in Colorado within 100 km of Park county.
where wij are the elements of W and Ni indicates the neighborhood of unit Ai . In
a regular grid, the nearest neighbors (i.e., north, south, east, west) of grid cell Ai
could comprise the neighborhood Ni . The a priori specification of W is akin to the
choice of parametric covariance function in geostatistics. Thus, for irregularly located
regions, it is common to define the neighborhood Ni as all other units within some
prespecified distance d.*
Consider the U.S. state of Colorado, for example (Figure 2.9). There are 64 coun-
ties in the state of Colorado, each irregularly sized and shaped (Figure 2.9a). The
set of counties within the state (like other political or ecological regions) has discrete spatial
support.
* The distances dij are often calculated based on (1) Euclidean distance between unit centroids ci and cj
or (2) minimum distance between Ai and Aj .
FIGURE 2.10 Simulated areal data on a regular grid arising from (a) a regular process, (b)
a random process, and (c) a clustered process.
or negative spatial structure in a process but have subtle, yet important, differences.
We describe each in what follows.
For the sample variance σ̂² of the data y ≡ (y1, . . . , yn)′, the Moran's I statistic
\[
I = \frac{n \sum_i \sum_j w_{ij} (y_i - \bar{y})(y_j - \bar{y})}{(n-1)\,\hat{\sigma}^2 \sum_i \sum_j w_{ij}} \qquad (2.33)
\]
is the discrete-space analog to the correlation function used in continuous space. Note
that we have used a subscript index to simplify the notation; that is, yi ≡ y(Ai ). The
Moran’s I statistic ranges from –1 to 1 and, under certain assumptions, the mean of
the Moran’s I statistic is −1/(n − 1). Values of Moran’s I close to 1 indicate spa-
tial clustering (or similarity for neighboring units) while values closer to –1 indicate
spatial regularity (or dissimilarity for neighboring units).
The Geary's C statistic
\[
C = \frac{\sum_i \sum_j w_{ij} (y_i - y_j)^2}{2\,\hat{\sigma}^2 \sum_i \sum_j w_{ij}} \qquad (2.34)
\]
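Equations 2.33 and 2.34 can be sketched directly; the six-unit line graph and data patterns below are hypothetical:

```python
import numpy as np

def morans_i(y, W):
    """Moran's I (Equation 2.33)."""
    n = len(y)
    z = y - y.mean()
    s2 = z @ z / (n - 1)                        # sample variance
    return n * (z @ W @ z) / ((n - 1) * s2 * W.sum())

def gearys_c(y, W):
    """Geary's C (Equation 2.34)."""
    n = len(y)
    z = y - y.mean()
    s2 = z @ z / (n - 1)
    sq = (y[:, None] - y[None, :]) ** 2
    return (W * sq).sum() / (2 * s2 * W.sum())

# Binary adjacency for six units arranged on a line (hypothetical neighborhoods).
W = np.zeros((6, 6))
for i in range(5):
    W[i, i + 1] = W[i + 1, i] = 1.0

clustered = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
alternating = np.array([0.0, 1.0, 0.0, 1.0, 0.0, 1.0])
```

For the clustered pattern, `morans_i` is positive (neighbors are similar); for the alternating pattern it is negative (neighbors are dissimilar), matching the interpretation described above.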
TABLE 2.1
Moran’s I and Geary’s C Statistics for Simulated Discrete
Spatial Processes in Figure 2.10
Moran’s I Moran p-Value Geary’s C Geary p-Value
statistic have been developed for investigating spatial structure in the residuals of a
linear model: e = y − ŷ, where ŷ = Xβ̂. In matrix notation, the Moran’s I statistic
for residuals is
\[
I = \frac{n\,(\mathbf{Ge})'\mathbf{W}(\mathbf{Ge})}{\left(\sum_{ij} w_{ij}\right)(\mathbf{Ge})'\mathbf{Ge}}, \qquad (2.35)
\]
TABLE 2.2
Moran’s I and Geary’s C Statistics for Colorado County
Discrete Spatial Processes in Figure 2.11
Moran’s I Moran p-Value Geary’s C Geary p-Value
FIGURE 2.11 Maps (darker corresponds to larger values) of (a) avian species richness, (b)
log(minimum elevation), (c) log(human population), and (d) total area in square kilometers.
The Moran’s I and Geary’s C statistics for the avian species richness data and associ-
ated covariates in Colorado suggest that all of these discrete spatial data are clustered.
The only exception occurs for avian species richness itself (Figure 2.11a), which only
has a significant Moran’s I statistic (no confirmation from Geary’s C).
* Note that some refer to CAR models as Besag models, after their early development by Besag (1974).
Beginning with the SAR model, we use the same model-based framework as in the
preceding geostatistics sections, where we employ the linear modeling specification
y = Xβ + η, (2.36)
but, in this case, we let the errors, η = ρWη + ν, depend on themselves stochastically
such that E(ν) = 0, E(νν′) = σ²I, and ρ is a parameter that controls the degree of
autocorrelation (−1 < ρ < 1).
Solving η = ρWη + ν for η and substituting into Equation 2.36, we have
y = Xβ + η
= Xβ + (I − ρW)−1 ν,
which implies that the covariance matrix for η is σ²(I − ρW)⁻¹(I − ρW′)⁻¹. It is
important to point out that the SAR model does not require W to be symmetric (i.e.,
one-way relationships between spatial units are acceptable) but we do need to be
able to invert (I − ρW). LeSage and Pace (2009) provide a solid description of SAR
models that is helpful for gaining intuition about the implied connectivity.
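The implied SAR error covariance can be constructed directly from W and ρ; the five-unit line graph below is a hypothetical neighborhood structure:

```python
import numpy as np

def sar_covariance(W, rho, sigma2):
    """Implied SAR error covariance: sigma2 (I - rho W)^{-1} (I - rho W')^{-1}."""
    I = np.eye(W.shape[0])
    A = np.linalg.inv(I - rho * W)
    return sigma2 * A @ A.T

# Row-standardized adjacency for five units on a line (hypothetical neighborhoods).
W = np.zeros((5, 5))
for i in range(5):
    for j in (i - 1, i + 1):
        if 0 <= j < 5:
            W[i, j] = 1.0
W = W / W.sum(axis=1, keepdims=True)   # row-standardize so |rho| < 1 is safe

Sigma = sar_covariance(W, rho=0.5, sigma2=1.0)
```

Even though this W is asymmetric after row standardization, the resulting Σ is symmetric and positive definite, and ρ = 0 recovers independent errors.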
To formulate the CAR model, we assume a Markov dependence among the
errors ηi .* The conditional mean of ηi can be expressed as
\[
E(\eta_i \mid \{\eta_j, j \in N_i\}) = \sum_{j \in N_i} c_{ij}\, \eta_j \qquad (2.37)
\]
and
\[
\mathrm{Var}(\eta_i \mid \{\eta_j, j \in N_i\}) = \sigma_i^2, \qquad (2.38)
\]
where cij are weights based on the proximity with neighbors and σi² varies with i,
imparting nonstationarity in the spatial process.†
CAR models is that they can be written jointly, using matrix notation such as SAR
models. Thus, let the CAR model be defined as
y = Xβ + η, (2.39)
* This Markov assumption implies that, given the neighbors, the process at a location is independent of all
other nonneighboring locations.
† Statistical models for discrete spatial processes do not have the same assumptions as those for continuous
spatial processes.
Statistics for Spatial Data 45
where η ∼ N(0, σ²(diag(W1) − ρW)⁻¹), with diag(W1) a diagonal matrix containing the
row sums of W on the diagonal and zero elsewhere. In this latter specification, the
correlation parameter ρ is bounded between −1 and 1.
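A short numerical sketch of this joint CAR precision, assuming a toy symmetric proximity matrix and an invented ρ:

```python
import numpy as np

# Joint CAR precision Q = diag(W 1) - rho W on a toy 4-cycle lattice.
W = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)
rho = 0.9                          # -1 < rho < 1; rho = 1 gives the ICAR

D = np.diag(W.sum(axis=1))         # row sums of W on the diagonal
Q = D - rho * W                    # precision of eta, up to the variance scale

# For |rho| < 1 the precision is positive definite (a proper CAR) ...
print(np.all(np.linalg.eigvalsh(Q) > 0))     # True
# ... while the ICAR (rho = 1) precision is singular: its rows sum to zero.
print(np.allclose((D - W).sum(axis=1), 0))   # True
```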
It has become common to fix ρ = 1 in CAR models and refer to them as “intrinsic”
CAR models (ICARs). The recent popularity of ICAR specifications is due to several
reasons.
The implementation of SAR and CAR models is similar to that of the former geostatistical
models, in that they can be fit in either a maximum likelihood or Bayesian
paradigm. The CAR specification naturally pairs with an MCMC algorithm because
the full-conditional distributions for ηi are Gaussian and can readily be simulated
from sequentially.
To demonstrate the differences in inference resulting from the regular linear model
and the CAR model, we fit both models to the Colorado avian species richness data
(i.e., Figure 2.11). As a typical variance stabilizing transformation, we used the nat-
ural log of species richness for a response variable. We used the standardized natural
log of county population size from the 2010 census and standardized county area
as covariates. We specifically left out elevation as a potentially important “missing
covariate.” Heuristically, we expect greater species richness in counties with more
people and in larger counties. We might also expect there to be latent spatial structure
in the residuals from a regular multiple linear regression model fit.
A maximum likelihood analysis of these data confirms our hypotheses that log
county population size and county area are positive predictors of recorded avian
diversity (Table 2.3). While the parameter estimates resulting from the regular linear
TABLE 2.3
Parameter Estimates and p-Values for Avian Log Species
Richness Based on the Standardized Natural Log of County
Population Size from the 2010 Census and Standardized
County Area as Covariates (Figure 2.11) in the Regular Linear
Model (LM) and CAR Model
Covariate LM Estimate LM p-Value CAR Estimate CAR p-Value
model fit are both positive and apparently important (i.e., small p-values), a Moran’s
I test of the residuals suggested that there may be remaining, unaccounted-for spatial
dependence in the errors (p-value < 0.001). Thus, fitting a CAR model to the same
data (using a neighborhood structure based on centroids within a 100-km radius),
we find that the county log population covariate still seems significant, whereas the
county area covariate is no longer a significant predictor of log richness (Table 2.3).
A Moran’s I test of the CAR residuals indicated no remaining evidence of spatial
structure after accounting for correlated errors (p-value = 0.671).
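The Moran's I statistic used in these residual checks can be sketched in a few lines (toy weights and values below, not the Colorado data):

```python
import numpy as np

def morans_i(e, W):
    """Moran's I for values e under proximity matrix W (illustrative sketch)."""
    e = e - e.mean()
    S0 = W.sum()
    n = len(e)
    return (n / S0) * (e @ W @ e) / (e @ e)

# Toy check: a smooth spatial pattern on a line of 10 units is clustered,
# so Moran's I should be strongly positive under nearest-neighbor weights.
n = 10
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
trend = np.linspace(-1, 1, n)      # strongly autocorrelated "residuals"
print(morans_i(trend, W) > 0.5)    # True
```

In practice, the observed statistic is compared against its null distribution (e.g., via permutation) to obtain the p-values reported above.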
The results of the Colorado avian species richness analysis illustrate an important
reason to account for latent dependence in data. Assuming independent errors when
they are actually positively correlated can cause uncertainty estimates to be overly
narrow, inflating the chance of inferring a significant first-order effect. When we added
the spatial dependence to the regression model using a CAR structure, the county area
p-value increased, leading us to downplay its importance in explaining avian species
richness. Furthermore, and most importantly, because the assumptions of the linear
model were not met in this example, it cannot be used to provide statistical inference,
whereas the CAR model results can be used.
Finally, recall that our model did not include the elevation covariate. However, the
CAR model we fit did include a positively correlated spatial random effect. Thus,
the spatial random effect helped account for the missing covariate of elevation, at
least to some extent. Figure 2.12 provides a visual perspective of how spatial struc-
ture helps to account for the missing elevation covariate. Notice that the opposite
pattern appears in the (a) and (b) panels of Figure 2.12. Heuristically, we expect
higher elevations to negatively affect avian species richness. Thus, the spatial ran-
dom effect needs to appear as the opposite pattern of log(minimum elevation) to
influence the model in the same way as the actual covariate. In this case, the esti-
mated spatial random effect does indeed have a pattern similar to that expected based
on our prior understanding of the system. Thus, the spatial random effect is capable
of accounting for the same type of spatial structure that appears in the topography of
Colorado.
FIGURE 2.12 Maps (darker corresponds to larger values) of (a) log(minimum elevation) and
(b) η̂, the estimated spatial random effect from the CAR model (i.e., the mean of the residuals).
y = β0 + Xβ + η + ε
= β0 + Xβ + Hα + ε. (2.40)
The presence of H makes it clear that there are, in fact, two “design” matrices in this
model, one for the fixed effects (X) and one for the random effects (H). Under this
specification (2.40), assume η ∼ N(0, ση²Q⁻¹) (for either a continuous or discrete
spatial process), where Q⁻¹ is the spatial correlation matrix, whose inverse can be
decomposed as Q = HΛH′; then α ∼ N(0, ση²Λ⁻¹) such that Λ is a diagonal matrix.
Just as multicollinear covariates can bias the estimates of β in the standard linear
model, collinearity can also influence the regression coefficients in the mixed model framework,
which includes the spatial models we have described when a spatially correlated random effect (η) is used.
If the columns of H are linearly independent of the columns of X, these models per-
form as expected for parameter estimation. However, when the columns of H are not
linearly independent of X, one may wish to consider remedial measures. Hodges and
Reich (2010) and Hughes and Haran (2013) present a restriction approach for forcing
the first-order process to take precedence over the second-order process in CAR mod-
els and Hanks et al. (2015b) developed a similar method for geostatistical models. In
fact, the restriction is essentially the same idea that is used in REML estimation where
the second-order process is restricted to the residual space of the first-order process.
Following Hodges and Reich (2010), we describe the basic restricted spatial regres-
sion approach and follow up in the next section with the modification presented by
Hughes and Haran (2013).
To arrive at one set of orthogonal basis vectors for H (2.40), consider the spectral
decomposition G = HΛH′.* In this decomposition, which is also
known as the eigen decomposition, the columns of H are the eigenvectors, while the
diagonal elements of the diagonal matrix Λ are the eigenvalues of the residual operator
G = I − X(X′X)⁻¹X′.† The corresponding model for the spectral coefficients α
is α ∼ N(0, ση²(H′QH)⁻¹). This restricted spatial regression will guarantee that the
point estimates for β are the same as those resulting from the nonspatial model (i.e.,
the model without η in Equation 2.40).
* The continuous version of this decomposition is referred to as a Karhunen–Loeve expansion and is the
basis for principal components analysis.
† Technically, the matrix H needs to be truncated so that it only contains the first n − rank(X) eigenvectors
of G.
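A small numerical sketch of this construction, assuming an arbitrary toy design matrix (not a real data set):

```python
import numpy as np

rng = np.random.default_rng(0)

# Residual projection operator G = I - X (X'X)^{-1} X' for a toy design;
# only the column space of X matters here.
n, k = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
G = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T

# Eigenvectors of G with eigenvalue ~1 span the residual space of X;
# keep the first n - rank(X) of them, as noted in the footnote.
evals, evecs = np.linalg.eigh(G)
H = evecs[:, evals > 0.5]          # n - k columns with eigenvalue ~1

print(H.shape[1] == n - k)         # True
print(np.allclose(X.T @ H, 0))     # True: the random effects cannot alias X
```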
Nothing prevents one from using this same restriction approach on the covariates
for the fixed effects in the model (X) if certain collinear covariates are believed to
have a priority over others. This procedure is probably not wise to apply as a blanket
approach in all analyses (Paciorek 2010; Hanks et al. 2015b). Serious considera-
tion should be given to the covariates used in a model. However, in most ecological
studies, we try to collect information on the factors we feel are most relevant (i.e.,
suspected to be causal) for the response variables we observe. Thus, few ecolo-
gists would have much hesitation about giving their carefully selected fixed effect
covariates priority in a model over second-order spatial structure.
y = β0 + Xβ + η + ε (2.41)
= β0 + Xβ + Hα + ε. (2.42)
As before, suppose that η ∼ N(0, ση²Q⁻¹) and ε ∼ N(0, σε²I). If H is an n × n matrix
of orthonormal basis functions (e.g., Fourier basis functions, wavelets), then α are
spectral coefficients whose implied distribution will be α ∼ N(μα, σα²Λ), where Λ is
a diagonal matrix and, typically, μα = 0 and σα² = ση². Note that this is the same
basic idea as that discussed in the preceding sections; however, we can use fast
computational algorithms to calculate the necessary transformation η = Hα (and
inverse transformation, H′η = α; recall that, if the columns of H are orthogonal,
we have H′H = I). As an example, Wikle (2002) employs the discrete cosine transform,
whereas Hooten et al. (2003) used the fast Fourier transform, to get back
* Note that, if the data are of dimension O(105 ), then the covariance matrix is on the order of O(1010 )
elements, a frighteningly large number of values to store in the computer, let alone do calculations with.
and forth between α and η. In the Bayesian generalized linear mixed model setting
(which includes the linear mixed model), an advantage of the orthogonality is that
the full-conditional distribution for α is
[α|·] = N((H′H/σε² + Λ⁻¹/σα²)⁻¹ (H′(y − β0 − Xβ)/σε² + Λ⁻¹μα/σα²), (H′H/σε² + Λ⁻¹/σα²)⁻¹), (2.43)
where the inner product H′H = I, and the covariance matrix in Equation 2.43 is the
inverse of a diagonal matrix (because both H′H and Λ are diagonal). This, by itself,
can dramatically reduce the number of calculations required in an MCMC algorithm
and speed up model fitting.
A disadvantage to using this approach is that the matrix H should only have to be
calculated once; otherwise, the savings gained in computing the full-conditional in Equation
2.43 are tempered by having to recalculate H repeatedly. The matrix Q must be known
in advance because H is often computed as a direct function of Q. In the geostatistical
setting, we often assume the correlation matrix Q⁻¹ has elements (Q⁻¹)ij ≡ exp(−dij/φ). In this case,
the distances dij , between locations i and j, are easily calculated (and thus, known)
but the parameter φ is almost always unknown. A practical, yet perhaps unfulfilling,
remedial approach for empirically fixing Q is to either use a separate set of data
to estimate φ (and then fix its value in Q) or, similarly, use the same data set to
estimate φ. In the latter case, the approach is referred to as “empirical Bayes.” If Q is
known and conditioned on, the expansion matrix H can be easily calculated, allowing
the reparameterization in Equation 2.42 to be advantageous computationally. In the
same spirit, the ICAR model specification (2.39) implies that Q = (diag(W1) − W),
where W is typically a binary proximity matrix indicating which spatial regions are
neighbors of each other. This proximity matrix is often fixed by the researcher, and
thus, Q is fixed and can be used to compute H. Thus, the number of calculations
can be reduced in both continuous and discrete spatial process modeling using the
first-order reparameterization in Equation 2.42.
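The orthonormal-basis idea can be sketched with an explicit small-n DCT matrix; in practice, a fast transform replaces the matrix multiply (the construction below is our own illustration, not the book's code):

```python
import numpy as np

# Build an orthonormal DCT-II matrix H explicitly (small n for clarity).
n = 64
j = np.arange(n)
H = np.cos(np.pi * (j[:, None] + 0.5) * j[None, :] / n)  # columns are cosines
H[:, 0] *= np.sqrt(1.0 / n)
H[:, 1:] *= np.sqrt(2.0 / n)

print(np.allclose(H.T @ H, np.eye(n)))  # True: orthonormal basis, H'H = I

rng = np.random.default_rng(1)
eta = rng.normal(size=n)
alpha = H.T @ eta                  # spectral coefficients: alpha = H' eta
print(np.allclose(H @ alpha, eta)) # True: eta = H alpha recovers the field
```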
In particular, the expansion matrix H in the previous section is not technically reducing
dimensionality, but rather, reducing the number of required computations. More
formally, the previously specified matrix H is a full-rank n × n matrix. If, instead, we
consider a lower-rank matrix H̃ that has dimension n × p, where p ≪ n, we arrive at
the following modification of the spatial model (2.42):
y = β0 + Xβ + η + ε
≈ β0 + Xβ + H̃α̃ + ε, (2.44)
(2.45)
is sampled from, in an MCMC setting, to learn about the posterior predictive dis-
tribution, [yu |y]. The predictive distribution in Equation 2.46 relies on ηu ≡ H̃u α̃,
the correlated random field at the unobserved locations of interest. The matrix H̃u
contains the basis functions at the locations where predictions are desired.
An alternative approach for fitting the reduced-rank model (2.44) is to use an
integrated likelihood approach. Using a process called “Rao-Blackwellization,” we
integrate the random effects α̃ out of the product of the data and process models to
yield the integrated likelihood
[y|φ, μα, Σα, β, σε²] = ∫ [y|φ, α̃, β, σε²][α̃|μα, Σα] dα̃. (2.47)
For the reduced-rank model based on the integrated likelihood, if Σα = σα²I and Σε =
σε²I, then

Σy⁻¹ = I/σε² − (H̃/σε²)(I/σα² + H̃′H̃/σε²)⁻¹(H̃′/σε²). (2.49)

Thus, (I/σα²) + (H̃′H̃/σε²) is only a p × p matrix and can be inverted quickly.
Furthermore, if the basis vectors in H̃ are orthogonal (e.g., eigenvectors), then H̃′H̃
is often diagonal, further reducing the required computation to compute the precision
matrix Σy⁻¹ and sample the model parameters in an MCMC algorithm. For large data
sets (i.e., n greater than a few hundred), the integrated likelihood method is useful for
constructing fast and stable MCMC algorithms to fit the reduced-rank geostatistical
model (2.44).
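The computational gain in Equation 2.49 can be checked numerically; the toy sizes and variances below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

# Sherman-Morrison-Woodbury check for Sigma_y = sigma_e^2 I + sigma_a^2 H H'
# with a reduced-rank n x p basis (real n would be much larger).
n, p = 200, 5
H = rng.normal(size=(n, p))
s2e, s2a = 1.0, 2.0

Sigma_y = s2e * np.eye(n) + s2a * H @ H.T

# Inverse via the identity: only a p x p matrix is actually inverted.
small = np.linalg.inv(np.eye(p) / s2a + H.T @ H / s2e)   # p x p
Sigma_y_inv = np.eye(n) / s2e - (H / s2e) @ small @ (H.T / s2e)

print(np.allclose(Sigma_y_inv @ Sigma_y, np.eye(n)))  # True
```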
Bayesian Kriging based on the integrated likelihood model is achieved by sam-
pling from the predictive full-conditional distribution
where Σy,u,o is the cross-covariance matrix between the unobserved and observed
spatial locations and Σy,o,u ≡ Σ′y,u,o. Sampling from the predictive full-conditional
(2.50) does not affect the model fit; thus, it can be performed during or after the
remainder of the MCMC samples are obtained for model parameters. The predictive
full-conditional (2.50) also depends on Σy⁻¹; thus, predictive samples can be obtained
quickly using the Sherman–Morrison–Woodbury identity (2.49).
y = Xβ + η + ε
≈ Xβ + η̂ + ε. (2.51)
To obtain the predictions η̂, consider a set of m knot locations S̃ that exist in the space
of the n data locations S, where m ≪ n. If η ∼ N(0, Ση) and ε ∼ N(0, σε²I), then a
reasonable approach to obtain η̂ is with the linear predictor

η̂ ≡ Σ̃η Σ̃⁻¹ η̃, (2.52)

where Σ̃η is the cross-covariance matrix between the process at the data locations
and the process at the knots (η̃), and Σ̃ is the m × m covariance matrix of η̃, so that

y = Xβ + η̂ + ε
= Xβ + Σ̃η Σ̃⁻¹ η̃ + ε. (2.53)
Thus, rather than needing to invert the large covariance matrix, Ση, at every step in
a statistical computer algorithm (e.g., MCMC), we only need to invert the m × m
matrix Σ̃ and sample an m-dimensional correlated random vector η̃. If the number of
knots (m) is small relative to the sample size (n), then the predictive process procedure
can be very computationally advantageous.
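A one-dimensional sketch of this predictive process construction, assuming an exponential covariance and invented locations and parameters:

```python
import numpy as np

rng = np.random.default_rng(3)

def expcov(s1, s2, sigma2=1.0, phi=0.3):
    """Exponential covariance between two sets of 1-D locations (sketch)."""
    d = np.abs(s1[:, None] - s2[None, :])   # pairwise distances
    return sigma2 * np.exp(-d / phi)

n, m = 100, 10
s = np.sort(rng.uniform(0, 1, n))           # data locations
s_knot = np.linspace(0, 1, m)               # knot locations

C_knot = expcov(s_knot, s_knot)             # m x m: only this is inverted
C_cross = expcov(s, s_knot)                 # n x m cross-covariance

eta_knot = np.linalg.cholesky(C_knot) @ rng.normal(size=m)  # process at knots
eta_hat = C_cross @ np.linalg.solve(C_knot, eta_knot)       # kriged field

print(eta_hat.shape == (n,))                # True
```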
An interesting and relevant note is that the predictive process specification takes
the same form as the other reduced-rank specifications we described previously
(2.44). That is, Σ̃η Σ̃⁻¹ η̃ = H̃α̃, where Σ̃η Σ̃⁻¹ is the matrix of basis functions and η̃
represents the process on a lower-dimensional manifold. The key difference between
the predictive process and more conventional methods for dimension reduction is
in the choice and properties of basis functions. One could argue that the predictive
process approach is heuristically more tangible than other spectral approaches for
defining basis functions because, in the predictive process procedure, the knot locations
are in the same space as the data locations. The associated basis functions in
Σ̃η Σ̃⁻¹ can be visualized in Euclidean space and the coefficients η̃ are the values of
the spatial process at the set of knots.
Furthermore, the predictive process was originally intended for use when the
covariance matrices (i.e., Ση, Σ̃η, and Σ̃) are functions of unknown parameters, and
thus, must be computed and inverted at each step of a statistical algorithm. For exam-
ple, the elements of each of the covariance matrices might be modeled geostatistically
as an exponential function ση2 exp(−dij /φ) where dij represents the distance between
any two points i and j such that these points could be either in the data locations, knot
locations, or both.
y = Xβ + η + ε, (2.54)
* Gaussian processes and Gaussian random fields are the same thing; they both are realizations of a
continuous Gaussian distribution with correlation structure (i.e., a nondiagonal covariance matrix).
[Figure: panels (a) and (b), maps plotted over longitude −98 to −90 and latitude 39 to 45.]
also explicitly temporal, we summarize fundamental statistics for time series in the
following chapter.
Animal telemetry data usually consist of time-indexed spatial locations, and can be
thought of as multivariate time series. Thus, a foundation in the statistical treatment
of time series data is important for modeling animal movement. This chapter provides
a useful set of tools and concepts that one may wish to apply to telemetry data.
* One could argue that time is strictly a forward process. Regardless of this fact (at least the way we
experience time as humans, with the potential exception of Kurt Vonnegut), statistical approaches make
use of information on both sides of the time point of interest, as we demonstrate in what follows.
model for a first- and second-order process. For example, most ecologists are primar-
ily interested in first-order effects (i.e., things that influence the mean of the process
under study). Thus, we should avoid making invalid model assumptions that lead to
erroneous inference. To illustrate how invalid assumptions about second-order depen-
dence can lead to erroneous first-order inference, consider the following contrived
example.
Suppose the following two sets of data are collected from known distributions:
(1) y1 and y2 are independent, with yi ∼ [μ, σ²]; and (2) z1 and z2 are dependent, with
zi ∼ [μ, σ²] and cov(z1, z2) ≠ 0, where μ and σ² correspond to the mean and variance,
and note that the square bracket notation [·] refers to a probability distribution, as before.
These distributions could be Gaussian, but need not be in this example.
Suppose we are interested in estimating
the first-order mean μ in this setting. The usual estimator for a population mean is the
sample mean, which is μ̂y = (y1 + y2 )/2 and μ̂z = (z1 + z2 )/2. As an estimator, the
sample mean enjoys the excellent properties of unbiasedness and known variance in
the case where σ 2 is known. To see how these properties arise, we derive the first two
moments for the distribution of μ̂ in detail. First, the expectation of the sample mean
for y is
E(μ̂y) = E((y1 + y2)/2)
= (1/2) E(y1 + y2)
= (1/2) (E(y1) + E(y2))
= (1/2) (μ + μ)
= (1/2) (2μ)
= μ.
Thus, the sample mean is an unbiased estimator of μ. The same procedure can be
applied to show that μ̂z is also unbiased. Thus, we have an unbiased estimator for a
homogeneous mean in both cases (i.e., the independent error (1) or dependent error
(2) case).
Proceeding with the variance, we take a similar approach, but using variance and
covariance operators. To find the variance of μ̂y , we start by considering the variance
as a covariance of that quantity and itself and then expand the covariance term as
Var(μ̂y) = cov((y1 + y2)/2, (y1 + y2)/2)
= (1/2²) (cov(y1, y1) + cov(y1, y2) + cov(y2, y1) + cov(y2, y2))
= (1/2²) (Var(y1) + 2cov(y1, y2) + Var(y2))
= (1/2²) (σ² + 2cov(y1, y2) + σ²)
= (1/2²) (2σ²)
= σ²/2,

because cov(y1, y2) = 0 under independence.
Thus, the variance of μ̂y is the population variance divided by the sample size (as we
recall from our first statistics course). But, what is the variance of μ̂z ? The variance
of μ̂z can be found using the same procedure, except notice that the 2cov(z1, z2) term
in the derivation above is not zero, implying that
Var(μ̂z) = σ²/2 + cov(z1, z2)/2. (3.1)
The variance of the estimator for μ is either larger or smaller for z than it is for y. In
this case, Var(μ̂z ) will be larger when the covariance between z1 and z2 is positive and
smaller when negative. Thus, positive dependence in time series data,* if unaccounted
for, will lead to confidence intervals that are too narrow, inflating the chance of a
type 1 error in decision making based on first-order effects. Thus, in what follows,
we provide the background to assess, and then account for, temporal dependence in
data and processes.
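Equation 3.1 can be verified by Monte Carlo; the covariance value below is invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)

# Monte Carlo check of Var(mu_hat_z) = sigma^2/2 + cov(z1, z2)/2
# for a positively correlated pair.
sigma2, c = 1.0, 0.6
Sigma = np.array([[sigma2, c], [c, sigma2]])
z = rng.multivariate_normal([0.0, 0.0], Sigma, size=200_000)
mu_hat = z.mean(axis=1)            # sample mean of each simulated pair

print(abs(mu_hat.var() - (sigma2 / 2 + c / 2)) < 0.01)  # True
```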
The assumption of a homogeneous mean is also required for temporal stationarity, but we mention
it separately because we intend to model variation in the mean (i.e., first-order) elsewhere.
The covariance function for time series is the analog to the covariogram in
spatial statistics and the stationarity assumption has a similar interpretation as well;
specifically, that the temporal process behaves according to the same dependence
throughout the entire time series.
We need a way to estimate the covariance function for time series. Thus, we
estimate γ(τ) and ρ(τ) with

γ̂(τ) = Σ_{t=1+τ}^{T} (ηt − η̄)(ηt−τ − η̄) / (T − τ), (3.3)

and

ρ̂(τ) = γ̂(τ)/γ̂(0). (3.4)
Using these estimators in a large sample situation, if η is not correlated,* approximately
95% of ρ̂(τ) should fall in the interval (−1.96/√(T − τ), 1.96/√(T − τ)).
This provides a way to test if an observed time series meeting the aforementioned
assumptions is uncorrelated.
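Equations 3.3 and 3.4 translate directly into code; the white-noise check below is our own sketch:

```python
import numpy as np

def acf_hat(eta, max_lag):
    """Sample ACF following Equations 3.3 and 3.4 (sketch)."""
    d = eta - eta.mean()
    T = len(eta)
    gamma = np.array([(d[lag:] * d[:T - lag]).sum() / (T - lag)
                      for lag in range(max_lag + 1)])
    return gamma / gamma[0]        # rho_hat(tau) = gamma_hat(tau)/gamma_hat(0)

# For white noise, rho_hat(tau) should fall inside roughly
# +/- 1.96/sqrt(T - tau) at about 95% of lags.
rng = np.random.default_rng(5)
T = 1000
rho = acf_hat(rng.normal(size=T), max_lag=20)
print(rho[0] == 1.0)                       # True by construction
print(np.all(np.abs(rho[1:]) < 0.3))       # True: no lag shows real structure
```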
The other useful statistic for assessing temporal dependence is called the PACF.
The PACF provides inference about the correlation between ηt and ηt−τ with the
dependence from the time points between removed. The PACF is estimated as

ρ̂(τ, τ) = ρ̂(1), if τ = 1,

ρ̂(τ, τ) = (ρ̂(τ) − Σ_{j=1}^{τ−1} ρ̂(τ − 1, j) ρ̂(τ − j)) / (1 − Σ_{j=1}^{τ−1} ρ̂(τ − 1, j) ρ̂(j)), if τ = 2, 3, . . . (3.5)
* An uncorrelated stationary temporal process is typically referred to as “white noise” in the time series
literature.
Statistics for Temporal Data 59
One can then find the associated confidence interval for this statistic and gauge
whether lag 1 autocorrelation is evident in the data beyond the first-order trend. It
is important to note that this statistic only examines lag 1 autocorrelation. That is, it
is concerned only with et and et−τ, where τ = 1 for all t, although it can be adapted
to assess autocorrelation at larger lags.
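The recursion in Equation 3.5 can be sketched as follows; note that it also updates the intermediate coefficients ρ̂(τ, j), a step the displayed equation leaves implicit (the AR(1) check uses the known result ρ(τ) = α^τ):

```python
import numpy as np

def pacf_from_acf(rho, max_lag):
    """PACF via the Durbin-Levinson recursion of Equation 3.5 (sketch).

    rho[k] is the ACF at lag k (rho[0] = 1); returns the PACF at lags
    1, ..., max_lag.
    """
    phi = np.zeros((max_lag + 1, max_lag + 1))
    pacf = np.zeros(max_lag + 1)
    phi[1, 1] = pacf[1] = rho[1]
    for t in range(2, max_lag + 1):
        num = rho[t] - sum(phi[t - 1, j] * rho[t - j] for j in range(1, t))
        den = 1.0 - sum(phi[t - 1, j] * rho[j] for j in range(1, t))
        phi[t, t] = pacf[t] = num / den
        for j in range(1, t):      # update intermediate coefficients
            phi[t, j] = phi[t - 1, j] - phi[t, t] * phi[t - 1, t - j]
    return pacf[1:]

# For an AR(1) with coefficient alpha, rho(tau) = alpha^tau, so the PACF
# should equal alpha at lag 1 and be ~0 afterward.
alpha = 0.8
rho = alpha ** np.arange(6)
pacf = pacf_from_acf(rho, 5)
print(np.isclose(pacf[0], alpha))  # True
print(np.allclose(pacf[1:], 0.0))  # True
```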
Consider the four simulated time series in Figure 3.1. Panels (a–c) in Figure 3.1
show time series with increasing amounts of positive temporal dependence, whereas
FIGURE 3.1 Simulated time series with mean zero, variance equal to 1, and (a) no tem-
poral dependence, (b) moderate positive temporal dependence, (c) strong positive temporal
dependence, and (d) strong negative temporal dependence.
TABLE 3.1
Durbin–Watson Statistics and p-Values at Lag 1
Unit of Time for a Two-Sided Hypothesis Test for
the Time Series in Figure 3.1
a b c d
panel (d) in Figure 3.1 shows strong negative temporal dependence. To assess the
temporal dependence in the time series shown in Figure 3.1, we calculated the ACF,
PACF, and Durbin–Watson statistic for each series. The ACFs for each time series
in Figure 3.1 are shown in Figure 3.2 and the corresponding PACFs are shown in
Figure 3.3. Confidence intervals under the null hypothesis of no autocorrelation are
shown as gray dashed lines. Finally, the Durbin–Watson statistics for each time series
are shown in Table 3.1.
As an exploratory data analysis, the ACF, PACF, and Durbin–Watson statistics
suggest that there is no evidence of temporal autocorrelation in the time series in
Figure 3.1a, whereas for time series in Figure 3.1b and c, there is increasing tempo-
ral dependence, but only conditioned on the neighboring time points (i.e., the PACF
showed no structure after removing dependence on neighboring time points for all
time series). While the ACF suggests positive temporal dependence for the time series
in Figure 3.1b and c, the oscillating nature of the ACF for the time series in Figure 3.1d
indicates negative temporal dependence.
ηt = αηt−1 + εt , (3.8)
FIGURE 3.2 ACF for simulated time series with mean zero, variance equal to 1, and (a) no
temporal dependence, (b) moderate positive temporal dependence, (c) strong positive temporal
dependence, and (d) strong negative temporal dependence. Gray dashed lines show a 95%
confidence interval under the null hypothesis.
At α = 0, the time series is a white noise process (i.e., independent) with mean zero
and variance σ². However, for α = 1, the process is often referred to as a “random
walk.” That is, each step in the time series is a step of random length in a random
direction away from the previous location (in η space). When α = 1, the random walk
is not stationary and can wander anywhere it wants in the real numbers. The random
walk can be used as a model for temporal dependence where strong autocorrelation
is present or desired.
The AR(1) model is naturally conditional; that is, ηt depends on ηt−1, but it can also
be coerced into a joint model such that η ∼ N(0, Σ). In the joint model, the precision
(i.e., inverse covariance) matrix Σ⁻¹ is tri-diagonal with (1 − α)/σ² for the first and
FIGURE 3.3 PACF for simulated time series with mean zero, variance equal to 1, and (a) no
temporal dependence, (b) moderate positive temporal dependence, (c) strong positive temporal
dependence, and (d) strong negative temporal dependence. Gray dashed lines show a 95%
confidence interval under the null hypothesis.
where the second autoregressive coefficient controls the dependence at time lag τ =
2. Models with dependence at higher-order lags are often referred to as AR(p) models,
where p denotes the highest-order lag in the model. It should be noted that, outside
of the field of economics, AR models of higher order than 2 are not common. One
of the reasons higher-order models are not common in ecology is that they can be
difficult to interpret.
There are several extensions we might want to make for this type of autoregressive
model. The first, and perhaps most obvious, is to allow for a trend in the process. Thus,
we denote yt as the response variable to clarify that we are now specifying models for
something other than a mean zero stationary process. Consider a scenario where there
exists a heterogeneous temporal trend in the data and we wish to account for it in the
model. We have many options in that case; however, to express the general idea of
how to set up such a model, we limit ourselves to only AR(1) dynamics. A univariate
autoregressive temporal model with linear heterogeneous trend is specified as
yt = x′t β + ηt, (3.11)

where ηt = αηt−1 + εt. Substituting, we have

yt = x′t β + ηt
= x′t β + αηt−1 + εt
= x′t β + α(yt−1 − x′t−1 β) + εt
= x′t β + αyt−1 − αx′t−1 β + εt
= (xt − αxt−1)′ β + αyt−1 + εt, (3.12)
where the dynamic component is a random walk and the new covariates (xt − xt−1 )
are a velocity vector in covariate space describing the change in covariates during that
time period. Thus, the stronger the autocorrelation in η, the more the inference about
β shifts away from a direct effect of xt on yt , and shifts toward the effect of a change
in covariates (over time) on the associated change in the response variable.* Thus,
the model with autocorrelated errors (3.11) can be thought of as a form of discretized
differential equation model and is a very important topic that will arise in Chapter 6,
when modeling animal movement.
Each of the time series in Figure 3.1 was simulated from an AR(1) process with
mean zero and variance 1 using model (3.8). In Figure 3.1, panel (a) used an autocor-
relation parameter α = 0, panel (b) used α = 0.5, panel (c) used α = 0.9, and panel
(d) used α = −0.9. The fact that we used AR(1) models to simulate each of the data
sets was suggested by the ACF, PACF, and Durbin–Watson statistics.
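As a sketch (our own minimal generator, not the code behind the book's figures), the four panels can be reproduced from Equation 3.8:

```python
import numpy as np

def simulate_ar1(alpha, T=100, sigma=1.0, seed=0):
    """Simulate eta_t = alpha * eta_{t-1} + eps_t, as in Equation 3.8 (sketch)."""
    rng = np.random.default_rng(seed)
    eta = np.zeros(T)
    for t in range(1, T):
        eta[t] = alpha * eta[t - 1] + rng.normal(scale=sigma)
    return eta

# Coefficients corresponding to panels (a)-(d) of Figure 3.1.
series = {a: simulate_ar1(a) for a in (0.0, 0.5, 0.9, -0.9)}

# The lag-1 sample correlation tracks the sign of alpha.
for a in (0.5, 0.9, -0.9):
    eta = series[a]
    r1 = np.corrcoef(eta[:-1], eta[1:])[0, 1]
    print(np.sign(r1) == np.sign(a))  # True
```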
To simulate a higher-order AR time series model with a trend (Figure 3.4), we use
yt = β0 + β1 xt + ηt , (3.14)
FIGURE 3.4 Simulated time series with heterogeneous trend specified as Equation 3.14. (a)
The temporal covariate xt and (b) the time series yt .
* Note that this can also be written as (yt − yt−1 ) = (xt − xt−1 ) β + εt . So, the change in “position” (y)
is a function of a change in the driving factors (x). This will be an important formulation in some of the
animal movement models that follow.
FIGURE 3.5 ACF (a) and PACF (b) for the simulated time series with heterogeneous trend
specified as Equation 3.14. Gray dashed lines show a 95% confidence interval under the null
hypothesis.
The ACF and PACF plots shown in Figure 3.5 indicate longer range dependence in
the time series and dependence at time lag 2 after accounting for dependence at lag 1.
where εt ∼ N(0, σ 2 ) for all t and θ is the MA(1) regression coefficient. Note that the
difference between Equation 3.15 and the former model (3.12) is that the errors (εt )
are uncorrelated in Equation 3.15. Furthermore, the two types of time series models
TABLE 3.2
Behavior Described in the ACF and PACF Suggests Which
Form of Time Series Model to Use
AR(p) MA(q) ARMA(p,q)
where εt ∼ N(0, 1) are independent, α1 = 0.9, and θ1 = 0.9 (Figure 3.6). Following
the guidance in Table 3.2, the ACF and PACF in Figure 3.6 indicate that the time
series does show characteristics of an ARMA time series, as it should, because both
the ACF and PACF tail off rather than cut off after a certain lag.
FIGURE 3.6 Simulated ARMA(1,1) time series (a) based on Equation 3.17, ACF (b), and
PACF (c) for the simulated ARMA(1,1) time series. Gray dashed lines show a 95% confidence
interval under the null hypothesis.
and the backshift operator for the ARMA(p, q) model (3.16) yields
zt = yt − Byt
= (1 − B)yt ,
and this extends to the dth difference as zt = (1 − B)d yt . Thus, the basic
ARIMA(p, d, q) model without a covariate-based trend can be written as
where d corresponds to the chosen order of difference. If there is a need for further
trend explanation with covariates, then the form of ARIMA(p, d, q) is
Beginning with OLS, we recognize the form of the basic AR(p) model as a
regression model
yt = α0 + α1 yt−1 + · · · + αp yt−p + εt , (3.18)
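The regression view of Equation 3.18 leads directly to an OLS fit: build a design matrix of lagged responses and solve by least squares (a sketch with a toy simulated series):

```python
import numpy as np

def fit_ar_ols(y, p):
    """Fit the AR(p) regression in Equation 3.18 by ordinary least squares.

    Uses rows t = p+1, ..., T, with the lagged values as covariates (sketch).
    """
    T = len(y)
    X = np.column_stack([np.ones(T - p)] +
                        [y[p - k:T - k] for k in range(1, p + 1)])
    return np.linalg.lstsq(X, y[p:], rcond=None)[0]  # (alpha_0, ..., alpha_p)

# Recover a known coefficient from a long simulated AR(1) series.
rng = np.random.default_rng(6)
alpha_true = 0.7
y = np.zeros(5000)
for t in range(1, len(y)):
    y[t] = alpha_true * y[t - 1] + rng.normal()

coef = fit_ar_ols(y, p=1)
print(abs(coef[1] - alpha_true) < 0.05)  # True: estimate is close to 0.7
```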
[y1 , . . . , yT ] = f (yT |y1 , . . . , yT−1 )f (yT−1 |y1 , . . . , yT−2 ) · · · f (y2 |y1 )f (y1 ). (3.19)
* Method of moments is the process of equating population moments (i.e., often means and variances) in
the data generating probability distribution with sample moments and then solving algebraically for the
parameters in the distribution.
L(α, σ²) = ∏_{t=p+1}^{T} N(yt | α0 + α1 yt−1 + · · · + αp yt−p, σ²), (3.20)
where μα , σα2 , ω1 , and ω2 are hyperparameters that are fixed and known a priori.
Then, the posterior distribution for this AR(p) model is

[α, σ²|y] ∝ (∏_{t=p+1}^{T} [yt | α0 + α1 yt−1 + · · · + αp yt−p, σ²]) [α][σ²], (3.21)
where the likelihood arises from Equation 3.20 and [α] and [σ 2 ] represent the prior
distributions. The full-conditional distributions necessary to construct an MCMC
algorithm for this model are multivariate Gaussian and inverse gamma and are trivial
to sample from sequentially.
Consider the four simulated time series in Figure 3.1. Each of these time series
was simulated from an AR(1) process, and exploratory data analysis suggested that
autocorrelation is present in the time series in panels (b–d) (Figure 3.1). The point
estimates of α obtained from fitting the AR(1) time series model to each of the data
sets in Figure 3.1 using four different methods are shown in Table 3.3. The Bayesian
AR(1) assumed a Gaussian prior for α with mean zero and variance 1 and an inverse
gamma prior for σ 2 with q = 2 and r = 1. Table 3.3 indicates that while all of the
estimation methods provide similar inference, there are differences among them.
Overall, similar fitting approaches can be constructed for more complicated MA,
ARMA, and ARIMA models as well. Regardless of the approach, we should proceed
with a series of routine model-checking techniques, computing model residuals and
constructing ACF and PACF plots for them, to assess whether various sources of
dependence exist, beyond those accounted for by the model.
Statistics for Temporal Data 71
TABLE 3.3 Truth and parameter point estimates for the autocorrelation parameter α in the AR(1) model for each of the time series in Figure 3.1a–d, using four different estimation methods: ordinary least squares, Yule–Walker, maximum likelihood, and the Bayesian posterior mean.
3.1.3 FORECASTING
Much like in spatial statistics, we may be interested in obtaining predictions for the
process of interest. The most commonly sought form of prediction in time series
is a forecast, that is, a temporal extrapolation. In spatial statistics, we were mostly
concerned with optimal interpolation and cautioned against extrapolation (e.g., by
Kriging only inside a convex hull of the data). From a dynamic modeling perspective,
we may also be interested in interpolation (to estimate missing values in the data), but
a primary concern is forecasting. Strong assumptions about stationarity must be made
to perform this type of extrapolation. With this in mind, we need a framework that we
can use for prediction in the temporal setting. To begin, we consider the prediction
of future responses.
Suppose we have data y ≡ (y1 , . . . , yT ) and desire a prediction for yT+1 . Recall
that, for the linear regression model with independent errors (i.e., y = Xβ + ε),
we seek ŷpred = E(ypred |y) = x′pred β̂, which has prediction error variance σ²(1 + x′pred (X′ X)−1 xpred ), where xpred represents the set of covariates for the prediction of
interest. The time series analog for the AR(1) model, yt = αyt−1 + εt , where data
exist for t = 1, . . . , T, would be a one-step-ahead prediction ŷT+1 = αyT when the
coefficient α is known. In this setting, the prediction error variance σ̂²T+1 , for the one-step-ahead prediction, is
σ̂²T+1 = σ²(1 + α²). (3.22)
Note that predictions for higher-order (in time) time series models, as well as for more complicated models like ARMA and ARIMA, can be obtained in a similar fashion (see Shumway and Stoffer 2006 for further details).
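The convergence behavior described for Figure 3.7 can be checked numerically. The sketch below uses the standard h-step-ahead recursions for an AR(1) with known parameters (ŷT+h = α^h yT decays toward the zero mean while the forecast error variance grows toward the stationary variance); the parameter values are arbitrary:

```python
import numpy as np

# h-step-ahead forecasts for an AR(1) with known parameters:
# y_hat_{T+h} = alpha^h * y_T decays toward the (zero) mean, and the
# forecast error variance sigma^2 * (1 - alpha^(2h)) / (1 - alpha^2)
# grows toward the stationary variance sigma^2 / (1 - alpha^2).
alpha, sigma2, yT = 0.8, 1.0, 2.5
h = np.arange(1, 11)
y_hat = alpha**h * yT
var_hat = sigma2 * (1 - alpha**(2 * h)) / (1 - alpha**2)
```

Larger |α| (stronger temporal dependence) slows the decay of ŷT+h toward the mean, mirroring panels (c) and (d) of Figure 3.7.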
Figure 3.7 shows 10 step-ahead predictions and 95% prediction intervals for
each of the time series presented in Figure 3.1. The predictions in Figure 3.7 were
obtained by fitting the AR(1) model using maximum likelihood, yielding the results
FIGURE 3.7 Predictions based on a maximum likelihood fit of the AR(1) model to each of
the time series in Figure 3.1. Dashed vertical line represents the time point before which data
exist. Gray region shows a 95% prediction interval.
in Table 3.3. Notice how the predictions converge to the overall mean of the time
series and the prediction intervals widen. For time series with stronger temporal
dependence, the predictions converge to the mean more slowly (Figure 3.7c and d).
The Bayesian perspective on forecasting with temporal data and a dynamic model
is similar, but predictions and predictive distributions are embedded in the same com-
putational procedure we use to fit the model (i.e., find the posterior distribution). To
see this, consider the simple Bayesian AR(1) model
yt ∼ N(αyt−1 , σ 2 ), t = 1, . . . , T,
α ∼ N(0, σα2 ),
σ 2 ∼ IG(r, q).
When we fit the model, we seek the posterior distribution [α, σ 2 |y], which can be
sampled from using MCMC.* To obtain a forecast for yT+1 , we find the posterior
predictive distribution
[yT+1 |y] = ∫∫ [yT+1 |α, σ², y][α, σ² |y] dα dσ², (3.23)
using composition sampling in an MCMC algorithm.† For this model, the full-
conditional predictive distribution is [yT+1 |α, σ 2 , y] = N(αyT , σ 2 ); thus, we only
need to be able to sample from a Gaussian distribution given our current values for α
and σ² in the MCMC algorithm. After the samples y(1)T+1 , . . . , y(k)T+1 , . . . , y(K)T+1 (where k = 1, . . . , K refers to the MCMC iteration) have been obtained, they can be summarized using Monte Carlo integration just like any other model parameter. That is, we compute the sample average of those resulting predictive realizations to obtain the posterior predictive mean for the forecast.
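A minimal sketch of composition sampling for this simple Bayesian AR(1) model, assuming α ∼ N(0, 1) and an assumed shape/scale inverse gamma prior for σ²; at each Gibbs iteration the forecast is drawn from its full-conditional predictive distribution N(α yT , σ²):

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulate data from y_t ~ N(alpha * y_{t-1}, sigma^2)
T, alpha_true = 150, 0.9
y = np.zeros(T)
for t in range(1, T):
    y[t] = alpha_true * y[t - 1] + rng.normal()

# Gibbs sampler with a composition-sampling step for y_{T+1}: after
# updating (alpha, sigma^2), draw y_{T+1}^(k) ~ N(alpha * y_T, sigma^2).
K, burn = 3000, 500
alpha, s2 = 0.0, 1.0
y0, y1 = y[:-1], y[1:]
n = len(y1)
pred = np.empty(K)
for k in range(K):
    v = 1.0 / (np.sum(y0**2) / s2 + 1.0)   # alpha ~ N(0, 1) prior
    m = v * np.sum(y1 * y0) / s2
    alpha = rng.normal(m, np.sqrt(v))
    sse = np.sum((y1 - alpha * y0) ** 2)
    # sigma^2 ~ IG(2, 1) prior (shape/scale parameterization assumed)
    s2 = 1.0 / rng.gamma(2.0 + n / 2.0, 1.0 / (1.0 + sse / 2.0))
    pred[k] = rng.normal(alpha * y[-1], np.sqrt(s2))  # composition step

forecast_mean = pred[burn:].mean()                 # posterior predictive mean
lo, hi = np.quantile(pred[burn:], [0.025, 0.975])  # 95% prediction interval
```

Summarizing the retained predictive draws with their sample average and quantiles is exactly the Monte Carlo integration described above.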
* In fact, the full-conditional distributions for this model are conjugate, meaning that a fully Gibbs MCMC
algorithm can easily be constructed.
† Composition sampling involves taking the current MCMC values for parameters and substituting them
into the full-conditional predictive distribution, then sampling from that iteratively, as you would sample
any other parameters in the MCMC algorithm.
‡ It is not uncommon to see economic or financial ARIMA models with tens or even hundreds of parame-
ters. Model selection is often employed to find the parameter combination that provides best predictive
ability.
A simple model for a time series with a long-term linear trend is
yt = β0 + β1 t + εt . (3.24)
It is also possible that this trend can be explained by temporally varying covariates,
in which case the model becomes
yt = x′t β + εt . (3.25)
A natural question might be: How can we make these models more flexible? One
approach that can be potentially useful is a form of semiparametric regression.† For
example, one way to generalize the basic homogeneous trend model yt = μ + εt is to let the mean of the temporal process vary over time such that yt = μt + εt , where
μt is unknown for all t. However, without repeated measurements at each time, the
model is overparameterized (i.e., too many parameters and not enough data to learn
about all of them). Thus, we need to reduce the dimensionality of the model so that
the known-to-unknown ratio is greater.
The basic idea underpinning semiparametric regression is to project the time-
varying quantity (in our case, μt ) into a different (hopefully reduced dimensional)
* Models that directly account for heteroskedasticity are referred to as “stochastic volatility” models.
† Semiparametric regression is often referred to as additive modeling by the machine learning community.
The term “additive” is used because we are “adding” up effects much like in regular linear regression.
space.* It really only means that we plan to model the temporal variation in a trans-
formation of the temporal space. The transformation is provided by a set of basis
functions that describe different portions of the temporal space ahead of time so that
we do not have to figure out the mean of the process everywhere independently, but
rather as a subset of the space. More formally, we can reparameterize the temporally
varying mean model as
yt = μt + εt = h′t φ + εt , (3.26)
where the vectors ht contain information about a region in temporal space and φ are
the coefficients to be estimated. As with other regression models, this can be written
in full matrix notation as y = Hφ + ε. Thus, when the set of basis vectors are known,
it is trivial to estimate φ. In practice, there are a few issues with this model. First, the
new “design” matrix of basis functions H is T × T and the coefficient vector φ is
T × 1. Thus, under this full-rank scenario, we gain nothing in terms of dimension
reduction. Second, we need to choose the specific form of basis functions in H.
To reduce the dimension of the unknowns in the model, consider the approxima-
tion
y=μ+ε
= Hφ + ε
= H̃φ̃ + ε,
where the new matrix of basis vectors H̃ is T × p, and, similarly, the new coefficients
φ̃ are p × 1. If p ≪ T, we gain a substantial amount of power for estimating μt .
The actual choice of H or H̃ is somewhat arbitrary, like many choices we make in
statistical modeling. Some have better support than others based on their characteris-
tics and the specific application being considered. As a subset of the many forms we
could use for H, consider the following popular choices:
* The phrase “project it into a reduced dimensional space” is commonly used in time series and spatial
statistics (e.g., recall our discussion of reduced-rank models in the previous chapter).
where min(Tj ) is the minimum value (i.e., infimum) in the time set Tj .
• B-splines: For p “knot” locations τj (j = 1, . . . , p) in the temporal domain,
let
hj,t (l) = ((t − τj )/(τj+l−1 − τj )) hj,t (l − 1) + ((τj+l − t)/(τj+l − τj+1 )) hj+1,t (l − 1)
for j = 1, . . . , p + 2L − l, where l = 1, . . . , L refers to the B-spline order
and the first order is defined as hj,t (1) = 1 if τj ≤ t < τj+1 and hj,t (1) = 0 otherwise.
The B-spline basis functions* are related to cubic splines and commonly
used in semiparametric statistics. Despite their apparent complexity, as com-
pared with piecewise constant or piecewise linear splines, B-splines are
trivial to calculate using modern statistical software.
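The recursion above can be implemented directly. The sketch below builds an order-4 (cubic) B-spline design matrix H̃ with p = 8 basis functions for T = 100 times and fits the reduced-rank model y = H̃φ̃ + ε by least squares; the knot placement and the simulated mean function are arbitrary choices for illustration:

```python
import numpy as np

def bspline_basis(t, knots, L):
    """Order-L B-spline basis via the recursion in the text: order-1
    functions are indicators on knot intervals; higher orders blend
    neighboring lower-order functions."""
    tau = np.asarray(knots, dtype=float)
    # order 1: h_{j,t}(1) = 1 if tau_j <= t < tau_{j+1}
    H = [np.where((tau[j] <= t) & (t < tau[j + 1]), 1.0, 0.0)
         for j in range(len(tau) - 1)]
    for l in range(2, L + 1):
        H = [((t - tau[j]) / (tau[j + l - 1] - tau[j])) * H[j]
             + ((tau[j + l] - t) / (tau[j + l] - tau[j + 1])) * H[j + 1]
             for j in range(len(H) - 1)]
    return np.column_stack(H)

rng = np.random.default_rng(3)
T = 100
t = np.arange(T, dtype=float)
mu = np.sin(2 * np.pi * t / T)                 # smooth time-varying mean
y = mu + rng.normal(0.0, 0.3, T)
knots = np.linspace(-30, 130, 12)              # knots padded past the data
H = bspline_basis(t, knots, L=4)               # cubic (order-4) B-splines
phi = np.linalg.lstsq(H, y, rcond=None)[0]     # OLS for p << T coefficients
mu_hat = H @ phi                               # estimate of mu_t
```

With 12 knots and order 4, the basis has 8 columns, so the T unknown means μt are summarized by only 8 coefficients.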
* It is common to hear the terms “basis functions” and “basis vectors” used interchangeably, especially in
statistics. However, “basis vectors” refer to the case where the functions themselves have been discretized
for use in computation.
† Regularization involves penalizing the complexity of a model so that it is parsimonious enough to provide
good predictions. Additive models are often penalized using generalized cross-validation (GCV).
FIGURE 3.8 Predictions based on a temporally varying coefficient model fit to each of the time series in Figure 3.1. The 95% prediction intervals are shaded in gray.
The semiparametric regression approaches presented here are best for interpolated prediction rather than extrapolated prediction (i.e., forecasting).
* Although, when both the times and characteristics of the event at each time are observed, it is referred
to as a marked temporal point process (e.g., volume of water displaced during a sequence of geyser
eruptions).
spatial point processes presented in Section 2.1.3, but translated to the time domain.
However, time progresses in one direction, which permits more tractable approaches
for modeling interactions among points.
We specify a parametric temporal point process model based on the conditional
intensity function λ(t|Ht ), where Ht is the history of event times up to time t. The
conditional intensity function has the same interpretation as the intensity function in
a spatial setting; the expected number of events in a small window of time, (t, t + Δt). However, in the temporal context, we allow the possibility that the intensity can
change depending on the number and times of events already observed. A Poisson
process results if the intensity function does not depend on previous points.
The temporal Poisson process has the same basic properties as the spatial ver-
sion. Specifically, if T ≡ [0, T] is the time span over which we are examining the
process,
1. For any time interval B ⊆ T , the number of events occurring within B, n(B), is a Poisson random variable with rate
λ̃(B) = ∫_B λ(τ ) dτ ,
which means that the expected total number of points in [0, T] is E(n(T )) = λ̃(T ).
2. Finally, for any J intervals, B1 , . . . , BJ ⊆ T , that do not overlap,
n(B1 ), . . . , n(BJ ) are independent Poisson random variables.
Another useful concept in temporal point processes that does not exist for the spa-
tial versions is “waiting time.” The waiting times of a point process are the time gaps
between events (i.e., the time spent waiting until the next event). If t0 , t1 , . . . , tn , tn+1
(where t0 = 0 and tn+1 = T) are the observed event times, then the associated wait-
ing times are ti − ti−1 for i = 1, . . . , n (we revisit the truncated time, T − tn , in what
follows). Rather than specify a model for the event times themselves, we can specify
a model for the waiting times and examine the resulting model for the event times.
We show how to move between the two different model specifications.
We begin by slightly redefining the intensity function, λ(t|Ht ). A point process is considered "orderly" if the chance of having more than one event in an interval approaches zero as the interval becomes very short. That is,
P(n((t, t + Δt)) > 1|Ht ) = o(Δt) as Δt → 0. (3.27)
If a point process is orderly, then the intensity function can be equivalently defined as the probability that a single event occurs in a very short interval,
λ(t|Ht ) = lim_{Δt→0} P(n((t, t + Δt)) = 1|Ht )/Δt. (3.28)
Thus, the probability of an event occurring in the interval (t, t + Δt) is P(n((t, t + Δt)) = 1|Ht ) ≈ λ(t|Ht )Δt. Although this may sound like a strong assumption, most point processes fall into this category. This restriction is aimed at eliminating the chance that two events will occur simultaneously so that we can construct a proper density function.
To find the cumulative distribution function (CDF) and PDF of the waiting time
given the intensity function and history Ht , we need to find the CDF and PDF of the
event time ti given the previous events and the time since the last event. To accomplish
this, we take a brief probability detour.
For any continuous random variable, X, we can write
P(x < X ≤ x + Δx|X > x) = (F(x + Δx) − F(x))/(1 − F(x)), (3.29)
where F is the CDF of the random variable X. This results from the definitions of conditional probabilities and CDFs. If we divide each side of Equation 3.29 by Δx and let Δx → 0, then
lim_{Δx→0} P(x < X ≤ x + Δx|X > x)/Δx = f (x)/(1 − F(x)), (3.30)
where f (x) is the PDF of the distribution of X, which results from the fact that dF(x)/dx = f (x). In the context of event times, we obtain
lim_{Δt→0} P(t < ti ≤ t + Δt|ti > t, Ht )/Δt = f (t|Ht )/(1 − F(t|Ht )). (3.31)
Using Equations 3.27 and 3.28, we replace the left-hand side with λ(t|Ht ), providing
a way to calculate the intensity
λ(t|Ht ) = f (t|Ht )/(1 − F(t|Ht )). (3.32)
While Equation 3.32 provides a sense of the relationship between the intensity func-
tion and the CDF (or PDF) of the waiting time, it is not directly useful for model
building. To further simplify the relationship,* we use
−f (t|Ht ) = (d/dt)(1 − F(t|Ht )). (3.33)
Noting that a positive function H(x), with h(x) = dH(x)/dx, satisfies d log H(x)/dx = h(x)/H(x), Equation 3.33 provides a result in terms of just the waiting time CDF:
λ(t|Ht ) = −(d/dt) log(1 − F(t|Ht )). (3.34)
Integrating each side and solving for the CDF results in the relationship
F(t|Ht ) = 1 − exp(−∫_{ti−1}^{t} λ(τ |Hτ ) dτ ). (3.35)
Finally, by taking the derivative of Equation 3.35 with respect to t, we can find the
waiting time PDF
f (t|Ht ) = λ(t|Ht ) exp(−∫_{ti−1}^{t} λ(τ |Hτ ) dτ ). (3.36)
Now that we have derived the conditional PDFs for the event times given the pre-
vious event times, we form the full joint PDF for the entire set of events and obtain
the likelihood for parameter estimation. Thus, we explicitly parameterize the inten-
sity function as in Chapter 2 for spatial point processes (i.e., λ(t|β, Ht ), where β is
a vector of parameters we wish to estimate). The joint likelihood is formed from the
product of conditional PDFs; however, we must deal with the truncation between the
last observed event, tn , and the end of the study interval, T. We never see when event
tn+1 occurs; we only know that tn+1 > T. To find the PDF of tn+1 , we need to find
the probability that there are no events in the interval (tn , T], or that, given tn < T,
the unobserved tn+1 event happens at a time >T, which is equal to 1 − F(T|HT ).
Therefore, using Equation 3.35,
1 − F(T|HT ) = exp(−∫_{tn}^{T} λ(τ |Hτ ) dτ ). (3.37)
Finally, we have all the pieces to form the parametric model likelihood for a temporal
point process
L(β) = ∏_{i=1}^{n+1} f (ti |β, Hti )
     = ∏_{i=1}^{n+1} λ(ti |β, Hti ) exp(−∫_{ti−1}^{ti} λ(τ |β, Hτ ) dτ )
     = ∏_{i=1}^{n} λ(ti |β, Hti ) ∏_{i=1}^{n+1} exp(−∫_{ti−1}^{ti} λ(τ |β, Hτ ) dτ )
     = ∏_{i=1}^{n} λ(ti |β, Hti ) exp(−∫_{0}^{T} λ(τ |β, Hτ ) dτ ). (3.38)
In Chapter 2, we showed the identical likelihood form for the spatial version of the
point process. However, a notion of temporal dependence has been incorporated by
conditioning on the history, Ht . Therefore, the intensity function changes over the
interval [0, T] depending on when events occur.
The waiting time concept in temporal point processes is very similar to that of
survival modeling based on “time to events” or failures. In survival modeling, the
“hazard” function is mathematically equivalent to our conditional intensity function,
λ(t|Ht ). The waiting times are equivalent to the “failure” times that are modeled.
Therefore, many of the parametric survival models are available for us to use in this
context. One of the most popular survival models that incorporates covariates into the
hazard function is the Cox proportional hazards (CPH) model (Cox and Oakes 1984).
The CPH intensity function is given by
λ(t|Ht ) = λ0 (t − t′ ) exp(x′ (t)β), (3.39)
where λ0 (t − t′ ) is a baseline intensity function that depends only on the time since the last event, t′ . The time-indexed covariates in Equation 3.39 are denoted as the vector x(t) and β are the coefficients to be estimated. The term exp(x′ (t)β) scales the baseline intensity depending on the time series of covariate values. If we substitute the CPH intensity function back into the likelihood (3.38), the resulting log-likelihood is
ℓ(β) = ∑_{i=1}^{n} (log λ0 (ti − ti−1 ) + x′ (ti )β − ∫_{ti−1}^{ti} exp(log λ0 (τ − ti−1 ) + x′ (τ )β) dτ ). (3.40)
To evaluate the log-likelihood, one can employ a trick similar to the Berman–
Turner device (Berman and Turner 1992) we introduced in Chapter 2. In the temporal
context, we select Ji + 1 quadrature points,* ui,0 , . . . , ui,Ji within the interval [ti−1 , ti ]
(where ui,0 = ti−1 and ui,Ji = ti ). Then the log-likelihood can be approximated by
ℓ(β) ≈ ∑_{i=1}^{n} ∑_{j=1}^{Ji} (zi,j (log(ui,j − ui,j−1 ) + log λ0 (ui,j − ti−1 ) + x′ (ui,j )β) − exp(log(ui,j − ui,j−1 ) + log λ0 (ui,j − ti−1 ) + x′ (ui,j )β)), (3.41)
where zi,j = 1 if j = Ji and zero otherwise. The log-likelihood function (3.41) is the same as if the zi,j were treated as independent Poisson random variables with rates exp(log(ui,j − ui,j−1 ) + log λ0 (ui,j − ti−1 ) + x′ (ui,j )β). This approximation was
initially proposed by Holford (1980). Thus, if the log baseline intensity function
log λ0 (·) is linear in its parameters, one can use standard GLM software to fit a
Poisson regression model with offsets equal to log(ui,j − ui,j−1 ).
There are several different forms of baseline intensity that will produce different effects, from events clustered together in time to events that are more regularly spaced than would be expected from pure randomness. A very flexible class of waiting time distributions is the Weibull distribution. The PDF for the Weibull distribution is
f (t|φ, α) = (α/φ)(t/φ)^{α−1} exp(−(t/φ)^α ), (3.42)
thus, if we model the waiting times with a Weibull distribution, the conditional
intensity function is
λ(t|Ht ) = (α/φ)((t − t′ )/φ)^{α−1} , (3.44)
where t′ is the time of the last observed event prior to t. For the Weibull intensity,
log λ0 (t|Ht ) = log(α/φ) − (α − 1) log(φ) + (α − 1) log(t − t′ ), (3.45)
FIGURE 3.9 Example of a Weibull point process. Plot (a) illustrates clustering behavior
when β1 < 0 (or α < 1 in the original parameterization), plot (b) illustrates constant intensity
when β1 = 0 (α = 1), and finally, a more regularly spaced pattern is illustrated in plot (c) when
β1 > 0 (α > 1).
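A Weibull renewal process like the one in Figure 3.9 can be simulated by accumulating independent Weibull waiting times; the parameter values below are illustrative (α < 1 yields clustered events, α > 1 yields more regular spacing):

```python
import numpy as np

rng = np.random.default_rng(4)

def weibull_process(alpha, phi, t_end):
    """Accumulate i.i.d. Weibull(alpha, phi) waiting times into event
    times on [0, t_end]; numpy's weibull() draws have scale 1, so we
    multiply by phi."""
    times, t = [], 0.0
    while True:
        t += phi * rng.weibull(alpha)
        if t > t_end:
            break
        times.append(t)
    return np.array(times)

clustered = weibull_process(alpha=0.5, phi=1.0, t_end=100.0)  # alpha < 1
regular = weibull_process(alpha=3.0, phi=1.0, t_end=100.0)    # alpha > 1
```

The contrast shows up in the waiting times: their coefficient of variation exceeds that of the regular process when α < 1, matching the clustering in panel (a) of Figure 3.9.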
yt = Ayt−1 + ηt , (3.46)
If the propagator matrix is the identity, A ≡ I, then
yt = Ayt−1 + ηt = Iyt−1 + ηt = yt−1 + ηt , (3.47)
a random walk of order 1. Under this random walk specification for the animal telemetry data scenario, if Σ ≡ σ² I, then the current position of the animal (yt ) will be very
close to the last position (yt−1 ) if the error variance σ 2 is small. For example, two
simulated multivariate time series are shown in Figure 3.10. An initial position of
y1 = (0, 0) is assumed for both time series and σ 2 = 1 in Figure 3.10a, whereas
Figure 3.10b assumes σ 2 = 2.
It is important that even though we use the traditional term “error variance,” the
“error” terms ηt are really just a component of a stochastic temporal process (i.e., the
individual’s position). The simple random walk (3.47) is our first mechanistic animal
movement model. Despite its simplicity, the random walk is useful (especially as a
null model) and we return to it in the chapters that follow.
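Equation 3.47 is straightforward to simulate; this sketch mirrors the setup of Figure 3.10, with an origin starting position and independent Gaussian increments:

```python
import numpy as np

rng = np.random.default_rng(5)

# 2-D random walk (Equation 3.47): y_t = y_{t-1} + eta_t,
# with eta_t ~ N(0, sigma2 * I) and y_1 = (0, 0)'.
T, sigma2 = 500, 1.0
eta = rng.normal(0.0, np.sqrt(sigma2), size=(T - 1, 2))
y = np.vstack([np.zeros((1, 2)), np.cumsum(eta, axis=0)])
```

Because the position is just the cumulative sum of the "error" terms, the ηt are better thought of as the stochastic increments of the movement process itself.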
The random walk is a very simple dynamic model; it can be generalized by letting
A ≡ diag(α).† When the covariance Σ is also diagonal, fitting the VAR(1) model
* The free parameters in Σ are the two diagonal elements, representing the variances for each dimension, and a single off-diagonal element that controls the correlation between the dimensions. Covariance
matrices need to be symmetric; thus, the upper right element in a 2 × 2 covariance matrix is the same as
the lower left element.
† The “diag” function places the vector α along the diagonal of a square matrix with zeros for all off-
diagonal elements.
FIGURE 3.10 Simulated 2-D processes arising from Equation 3.47 plotted in 2-D space.
Panel (a) assumes σ 2 = 1 and panel (b) assumes σ 2 = 2.
(3.46) essentially implies we are really fitting J independent univariate time series
models (one for each dimension of yt ). This makes it slightly more robust than the
independent random walk model, but it does not harness the real utility of the general
VAR(1).
In fact, the VAR(1) can provide surprisingly general dynamic behavior. The mech-
anism allowing for the flexibility in dynamics lies in the off-diagonal elements of A
(Wikle and Hooten 2010). The off-diagonal elements control the interactions within
the process from one time point to the next. As an example, consider the VAR(1)
model for a 2-D dynamic process with homogeneous trend μ:
yt = μ + Ayt−1 + ηt , (3.48)
which is a biased random walk. If A is parameterized to have α1,1 and α2,2 as diagonal elements with α1,2 and α2,1 on the off-diagonals, then the mean of the first element y1,t is E(y1,t |yt−1 ) = μ1 + α1,1 y1,t−1 + α1,2 y2,t−1 . The conditional
mean of y1,t indicates that y1,t will tend to be close to some weighted average of the
global mean for that dimension (μ1 ), the previous location in that dimension (y1,t−1 ),
and the previous location in the other dimension (y2,t−1 ). A similar expression can
be found for the conditional mean of the other dimension of the process. If the off-
diagonal autoregressive coefficients α1,2 and α2,1 in A approach zero, we return to the
independent random walk model, but as they increase, we see an increasing influence
of one dimension of the process on the other in the dynamics. This range of possible
interactions among dimensions allows for realistic behavior in a dynamic process
such as animal movement.
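A sketch of Equation 3.48, using a propagator matrix like the one described for Figure 3.11 (0.9 on the diagonal, 0.1 on the off-diagonals); the conditional-mean calculation at the end illustrates how the off-diagonal elements mix the two dimensions:

```python
import numpy as np

rng = np.random.default_rng(6)

# VAR(1) with homogeneous trend (Equation 3.48):
# y_t = mu + A y_{t-1} + eta_t, eta_t ~ N(0, I)
T = 300
mu = np.zeros(2)
A = np.array([[0.9, 0.1],
              [0.1, 0.9]])
y = np.zeros((T, 2))
for t in range(1, T):
    y[t] = mu + A @ y[t - 1] + rng.normal(size=2)

# Conditional mean of the first dimension mixes both dimensions:
# E(y_{1,t} | y_{t-1}) = mu_1 + a_{1,1} y_{1,t-1} + a_{1,2} y_{2,t-1}
cond_mean = mu[0] + A[0, 0] * y[-2, 0] + A[0, 1] * y[-2, 1]
```

Flipping the sign of the off-diagonal elements (as in Figure 3.11b) reverses the orientation of the resulting elliptical pattern.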
Figure 3.11 shows two simulated multivariate time series arising from the VAR(1)
model in Equation 3.48 in 2-D space. The mean was μ = (0, 0) and variance param-
eter was σ 2 = 1 for both simulations, but the propagator matrix A was specified to
have 0.9 on the diagonal elements, with 0.1 on the off-diagonals in Figure 3.11a and
−0.1 on the off-diagonal in Figure 3.11b. The elliptical shape of the time series in
FIGURE 3.11 Simulated 2-D processes arising from Equation 3.48 plotted in 2-D space.
Panels (a) and (b) assume μ = (0, 0) and σ 2 = 1. The propagator matrix A contains posi-
tive off-diagonal elements in panel (a), whereas panel (b) relies on a propagator matrix with
negative off-diagonal elements.
2-D space in Figure 3.11 arises from the interaction between the directions in the
dynamic process.
More mechanistic parameterizations of A are also possible and can be useful.*
We describe several specific parameterizations of VAR models for animal telemetry
data in the later chapters but, before we leave this topic, we note that higher-order
autoregressive models are possible and potentially useful for multivariate processes.
Recall our description of univariate ARIMA models in the previous section. The
same sort of temporal differencing can be used in the multivariate setting, but its inter-
pretation may vary. For example, it might have additional utility beyond detrending
a time series. It is possible that the differencing could be motivated by a discretized
derivative used to relate velocities in the multivariate process (rather than locations).
To see this, consider the integrated VAR(1) model on the quantity δ t = yt − yt−1 ,
where
δ t = Aδ t−1 + ηt . (3.49)
Using substitution and algebra with this model shows that it is actually a VAR(2)
model on the original location vectors yt . To see this, substitute yt − yt−1 into Equa-
tion 3.49 for δ t and rearrange terms with yt on the left-hand side and all other terms
on the right-hand side of the equality. The result is
yt = (A + I)yt−1 − Ayt−2 + ηt , (3.50)
* Wikle and Hooten (2010) and Cressie and Wikle (2011) provide much more detailed descriptions of mul-
tivariate dynamic models, their utility, and implementation, especially as they pertain to spatio-temporal
processes.
FIGURE 3.12 Simulated 2-D processes arising from Equation 3.50 plotted in 2-D space.
Panels (a) and (b) assume σ 2 = 1. The propagator matrix A contains positive off-diagonal ele-
ments in panel (a), whereas panel (b) relies on a propagator matrix with negative off-diagonal
elements.
where the two propagator matrices in the VAR(2) model are (A + I) and −A.
Thus, a particular parameterization of a VAR(2) implies integrated VAR(1)
dynamics.*
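The substitution argument can be verified numerically: simulating δt = Aδt−1 + ηt and accumulating the differences gives exactly the same trajectory as the VAR(2) recursion with propagator matrices (A + I) and −A driven by the same noise:

```python
import numpy as np

rng = np.random.default_rng(7)

A = np.array([[0.9, 0.1],
              [0.1, 0.9]])
T = 200
eta = rng.normal(size=(T, 2))

# Integrated VAR(1): delta_t = A delta_{t-1} + eta_t, y_t = y_{t-1} + delta_t
y = np.zeros((T, 2))
delta = np.zeros(2)
for t in range(1, T):
    delta = A @ delta + eta[t]
    y[t] = y[t - 1] + delta

# Equivalent VAR(2): y_t = (A + I) y_{t-1} - A y_{t-2} + eta_t
y2 = np.zeros((T, 2))
y2[1] = eta[1]   # same initial condition (y_0 = 0, delta_0 = 0)
for t in range(2, T):
    y2[t] = (A + np.eye(2)) @ y2[t - 1] - A @ y2[t - 2] + eta[t]
```

The two recursions agree to numerical precision, which is the algebraic equivalence noted above.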
Figure 3.12 shows two simulated multivariate time series arising from the VAR(1)
model in Equation 3.50 in 2-D space. As in the preceding nonintegrated time series
from Figure 3.11, the variance parameter was σ 2 = 1 for both simulations, but the
propagator matrix A was specified to have 0.9 on the diagonal elements, with 0.1
on the off-diagonals in Figure 3.12a and −0.1 on the off-diagonal in Figure 3.12b.
Thus, the simulated time series based on Equation 3.50 are substantially smoother than those from Figure 3.11, but retain the diagonally oriented process. The time-specific displacements (yt − yt−1 )′ (yt − yt−1 ) are similar in Figures 3.11 and 3.12, but the turning angles are much more consistent (i.e., highly correlated). Thus,
integrated time series models (or higher-order VAR models) are good for captur-
ing dynamics of smooth spatio-temporal processes. For example, in the animal
movement context, smoothness could be a result of migrational movement (see
Chapters 5 and 6).
3.2.2 IMPLEMENTATION
To fit VAR models, we borrow some of the procedures from the preceding section on
univariate time series. For example, recognizing that the VAR(1) specification (3.46)
can be written as a multivariate Gaussian, where yt ∼ N(Ayt−1 , Σ), the likelihood
* Though we did not mention it earlier, the same is true for the univariate AR(2) model.
becomes
L(A, Σ) = ∏_{t=2}^{T} [yt |yt−1 , A, Σ]
        = ∏_{t=2}^{T} N(yt |Ayt−1 , Σ). (3.51)
Then, we maximize Equation 3.51 with respect to A and Σ to obtain the MLEs for the model parameters.
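Because Equation 3.51 is a product of Gaussians, the conditional MLE for A has a closed form equivalent to multivariate least squares. A sketch with simulated data (the particular A and sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(8)

# Simulate a stationary bivariate VAR(1): y_t = A y_{t-1} + eta_t
A_true = np.array([[0.6, 0.2],
                   [-0.1, 0.5]])
T = 5000
y = np.zeros((T, 2))
for t in range(1, T):
    y[t] = A_true @ y[t - 1] + rng.normal(size=2)

# Conditional MLE of A (multivariate least squares):
# A_hat = (sum_t y_t y_{t-1}') (sum_t y_{t-1} y_{t-1}')^{-1}
Y1, Y0 = y[1:], y[:-1]
A_hat = (Y1.T @ Y0) @ np.linalg.inv(Y0.T @ Y0)
resid = Y1 - Y0 @ A_hat.T
Sigma_hat = resid.T @ resid / len(Y1)   # MLE of the error covariance
```

The residual cross-product matrix divided by the number of conditional terms gives the corresponding MLE of Σ.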
From the Bayesian perspective, we need priors for the parameter matrices A and Σ. A possible prior for the covariance matrix (depending on its parameterization) is an inverse Wishart (or Wishart for the inverse covariance, or precision, matrix) such that Σ−1 ∼ Wish((Vν)−1 , ν), where E(Σ−1 ) = V−1 . An appropriate prior for the autoregressive coefficients in A is not quite as obvious. We could specify independent priors for the individual elements (e.g., αj,j̃ ∼ N(0, σ²j,j̃ )), but this does not provide a means to correlate them a priori. One potential way to generalize the prior for A is to use the "vec" operator* on A and the multivariate Gaussian distribution vec(A) ∼ N(μA , ΣA ).
To fit the Bayesian VAR(1) model, we seek the posterior distribution
[A, Σ|Y] ∝ ∏_{t=2}^{T} [yt |yt−1 , A, Σ][A][Σ], (3.52)
scale up the inference from individuals to the population. We present model formula-
tions for time series data that will be helpful in each of these settings in the following
sections.
Hierarchical models need not be Bayesian, but the Bayesian framework provides
a straightforward way to fit hierarchical models. In the Bayesian context, Berliner
(1996) provided the first clear description of the structure of a hierarchical model,
a structure that we often take for granted now. The hierarchical structure allows a
complicated problem to be broken up into several simpler problems (i.e., conditional
probability distributions for random variables). Thus, Berliner (1996) formulated a
general hierarchical Bayesian model for time series as a sequence of conditional
distributions:
[data|process, parameters] (Stage 1, data model)
[process|parameters] (Stage 2, process model)
[parameters] (Stage 3, parameter model),
where each stage is conditioned on the stages below it in the model. This sequence
of distributions appears simple, but provides an incredibly powerful tool for building
complicated statistical models. In Stage 1 of a hierarchical framework, we typically
find the “data model,” which accounts the uncertainty associated with the actual mea-
surements. Stage 2 is composed of the “process model.” The term “process” arises
from the mechanistic underpinnings associated with our understanding of how the
system under study actually works.* The final component is the “parameter model,”
often referred to as a prior in Bayesian models. This final component is necessary for
finding the posterior distribution that is used for Bayesian inference. While helpful
in many cases, the parameter model is not necessary for non-Bayesian models.†
* In the year 1996, Mark Berliner was focused on modeling atmospheric and oceanic processes for which
very detailed mathematical models involving the physics of fluid dynamics are available. For this reason,
he still prefers we use the term “physical process model” rather than “mechanistic model” for Stage 2.
In our presentation here, we have shortened it to “process model.”
† Random effects, in the classical sense, are more akin to process models than parameter models. Parameter
models are for the bottom-level parameters. Random effects depend on unknown parameters; therefore,
they are not at the bottom of the hierarchical structure.
‡ Some would argue that we are never able to directly measure the components of a process we often desire
inference for. In which case, hierarchical models are essential.
§ It is also common to see the variable y used to represent the process and z used to represent the data; for
example, Cressie and Wikle (2011).
autoregressive process is
zt = αzt−1 + ηt , ηt ∼ N(0, σz² ),
for t = 1, . . . , T. If the process model is a Gaussian AR(1) and the measurements arise from a Gaussian process centered on the truth, a hierarchical model specification is
yt ∼ N(zt , σy² ),
zt ∼ N(αzt−1 , σz² ),
α ∼ N(μα , σα² ), σy² ∼ IG(ry , qy ), σz² ∼ IG(rz , qz ),
where the priors are only necessary if the model is Bayesian. In this case, we specify
a normal prior for the autocorrelation parameter (α) and inverse gamma distribu-
tions for the two variance components (σy2 and σz2 ). In this particular model, it can
sometimes be difficult to identify both variance components without strong prior
information for one of them. Identifiability is a topic we return to in later chapters.
Figure 3.13 shows a simulated time series from a Gaussian hierarchical model with
autoregressive parameter α = 0.95 and variance components σy2 = σz2 = 1. As the
measurement error (σy2 ) increases, the temporal pattern evident in the latent process
(zt ) will be less visible in the observed time series (yt ).
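A sketch of simulating from this hierarchical specification (with fixed parameters rather than priors), illustrating how measurement error dilutes the autocorrelation visible in the observed series; the parameter values match those described for Figure 3.13:

```python
import numpy as np

rng = np.random.default_rng(9)

# Latent AR(1) process z_t with noisy observations y_t ~ N(z_t, s2_y)
T, alpha, s2_z, s2_y = 100, 0.95, 1.0, 1.0
z = np.zeros(T)
for t in range(1, T):
    z[t] = alpha * z[t - 1] + rng.normal(0.0, np.sqrt(s2_z))
y = z + rng.normal(0.0, np.sqrt(s2_y), T)

def acf1(x):
    """Lag-1 sample autocorrelation."""
    x = x - x.mean()
    return np.sum(x[1:] * x[:-1]) / np.sum(x**2)
```

Comparing acf1(z) and acf1(y) shows the attenuation: the independent measurement noise adds variance to yt without adding lag-1 covariance.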
FIGURE 3.13 Simulated time series (yt , gray points) from a hierarchical model with a dynamic latent process (zt , dark line).
We are not obligated to use a normal distribution for the measurement error
(although it does yield substantial computational advantages when appropriate). Sup-
pose the measured response variable is a count at each time t.* Then we might choose
to model the data as yt ∼ Pois(ezt ), where ezt represents the underlying intensity
process for the behavior of interest. The log of this intensity process is modeled as
Equation 3.48 to account for smoothness in behavior over time.
Thus, the options for modeling error are limitless and will explicitly depend on
the type of data collected and study design. In animal movement modeling, we often
observe positions of individuals as 2-D measurements arising from telemetry data.
The ability to account for multivariate measurements is essential, and the hierarchical
modeling approach makes it easy to do that.
yt ∼ N(μ0 , σ0² ) if wt = 0, and yt ∼ N(μ1 , σ1² ) if wt = 1, (3.64)

wt ∼ Bern(1 − p) if wt−1 = 0, and wt ∼ Bern(p) if wt−1 = 1, (3.65)

μ0 ∼ N(μ0,0 , σ0,0² ), (3.66)

μ1 ∼ N(μ1,0 , σ1,0² ), (3.67)
where the two clusters are shaped by Gaussian distributions with potentially differ-
ent locations and spreads. The key to this HMM is that the cluster membership
probability is itself a dynamic process that depends on the previous state wt−1.
* For example, when a certain discrete behavior (e.g., forays from a nest) is observed repeatedly during
time t.
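Time series like those in Figure 3.14 can be generated from this HMM with a short simulation. The following sketch is ours (function name and settings not from the original text):

```python
import numpy as np

def simulate_hmm(T, p, mu, sigma, seed=0):
    """Simulate the two-state Gaussian HMM of Equations 3.64 and 3.65."""
    rng = np.random.default_rng(seed)
    w = np.empty(T, dtype=int)
    y = np.empty(T)
    w_prev = 0
    for t in range(T):
        # w_t ~ Bern(p) if w_{t-1} = 1, and Bern(1 - p) if w_{t-1} = 0
        w[t] = rng.binomial(1, p if w_prev == 1 else 1.0 - p)
        y[t] = rng.normal(mu[w[t]], sigma[w[t]])  # cluster-specific Gaussian
        w_prev = w[t]
    return w, y

# persistent states (p = 0.9) with well-separated cluster means
w, y = simulate_hmm(100, 0.9, mu=[0.0, 4.0], sigma=[1.0, 1.0])
```

With p near one, the binary process tends to stay in its current state, so the observed series alternates between extended runs centered on μ0 and μ1.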
FIGURE 3.14 Simulated time series and dynamic binary process arising from an HMM.
Panel (a) shows the time series for the latent process wt and panel (b) shows the time series for
the positions yt .
3.3.3 UPSCALING
Another common usage of hierarchical models in time series is to avoid pseudo-
replication by scaling up the inference to the appropriate level. For example, in the
animal movement context, we commonly obtain telemetry data for a subsample of
individuals from a larger population. Population-level inference is often of interest in
many studies, but we need to construct individual-level models to properly represent
the movement dynamics. Upscaling can also be useful to help separate measurement
uncertainty from process uncertainty.
If we were to fit a single autoregressive model, for t = 1, . . . , T, to estimate a
common α, the model would use all telemetry data for all individuals directly. In
reality, each individual probably responds to environmental cues differently and
has different physical characteristics; thus, we could let the autoregressive parameter
αj vary by individual j. If we substitute the individual-level parameter into each
individual model and estimate them all separately, we will not acknowledge any
consistent behavior among individuals in the population. Thus, we set up a
hierarchical model to allow for structure at the population level:
FIGURE 3.15 Five (J = 5) simulated time series (yj,t ) from two different hierarchical mod-
els. In panel (a), μα = 0.7, and in panel (b), μα = −0.7. In both panels, σy,j² = 1 ∀ j and
σα² = 0.05.
which is very similar to the original hierarchical measurement error model, except
that the replication at the data level provides enough information about σy² to separate
it from σz² , especially as J increases.
The process model in Equation 3.82 is written jointly, as we would write a CAR model
in spatial statistics. This specification allows us to write the dynamic structure in terms
of covariance, where the covariance matrix Σz is a function of the parameters α and σz² , such
that Σz ≡ σz² (diag(W1) − αW)−1 , W is a binary proximity matrix indicating
which times are neighbors of each other, and diag(W1) is a diagonal matrix with
the row sums of W along the diagonal. This type of integration is often referred to as
“Rao–Blackwellization.”
The main drawback of using the integrated likelihood approach is that one cannot
simultaneously obtain inference for the latent process. The latent process is one of the
key features of interest in most animal ecological studies. Thus, a non-Bayesian alter-
native to the integrated likelihood approach for estimating the process in hierarchical
time series models involves Kalman methods.
Kalman methods allow for the estimation and prediction of latent linear temporal
processes such as those described in our hierarchical time series example for measure-
ment error (Kalman 1960). Kalman methods have been extremely popular for signal
processing because they are fast to implement and can naturally update inference in
real time as new data are obtained.
Consider the simple non-Bayesian hierarchical time series model

yt ∼ N(zt , σy² ),
zt ∼ N(αzt−1 , σz² ).
To set up basic Kalman terminology, there are three main types of procedures for
estimation and prediction. If we are interested in inference about zt , given data yτ ≡
(y1 , . . . , yτ )′ , then our problem is prediction if t > τ , filtering* if t = τ , and
smoothing if t < τ .
Thus, to estimate the process sequentially for t = τ , we can use the Kalman
filtering algorithm (e.g., Cressie and Wikle 2011):
1. Choose initial values for the prediction mean E(z0 |y0 ) and variance
Var(z0 |y0 ) = E((z0 − E(z0 |y0 ))² |y0 ).
2. Let t = 1.
3. Calculate the prediction mean: E(zt |yt−1 ) = αE(zt−1 |yt−1 ).
4. Calculate the prediction variance:
Var(zt |yt−1 ) = σz2 + α 2 Var(zt−1 |yt−1 ).
5. Calculate the Kalman gain† using the prediction variance:
gt = Var(zt |yt−1 )(Var(zt |yt−1 ) + σy² )−1 .
6. Calculate the filter distribution mean using the prediction mean and Kalman
gain: E(zt |yt ) = E(zt |yt−1 ) + gt · (yt − E(zt |yt−1 )).
7. Calculate the filter distribution variance using the prediction variance and
Kalman gain: Var(zt |yt ) = (1 − gt )Var(zt |yt−1 ).
8. Stop if t = T, else let t = t + 1 and go to step 3.
* The term “filtering” is used because it removes unwanted noise from a signal. In this sense, smoothing
is also a type of filtering, but one using all the data.
† The “gain” is a multiplier that updates the information from the previous time to provide the expectation
at the current time.
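The filtering steps above translate directly into code. A minimal scalar sketch (ours; the function name and initial values are our own choices):

```python
import numpy as np

def kalman_filter(y, alpha, s2z, s2y, m0=0.0, v0=1.0):
    """Scalar Kalman filter for y_t = z_t + eps_t, z_t = alpha * z_{t-1} + eta_t."""
    T = len(y)
    m = np.empty(T)  # filter means E(z_t | y_1, ..., y_t)
    v = np.empty(T)  # filter variances Var(z_t | y_1, ..., y_t)
    m_prev, v_prev = m0, v0
    for t in range(T):
        m_pred = alpha * m_prev              # step 3: prediction mean
        v_pred = s2z + alpha**2 * v_prev     # step 4: prediction variance
        g = v_pred / (v_pred + s2y)          # step 5: Kalman gain
        m[t] = m_pred + g * (y[t] - m_pred)  # step 6: filter mean
        v[t] = (1.0 - g) * v_pred            # step 7: filter variance
        m_prev, v_prev = m[t], v[t]
    return m, v
```

Note that as the measurement variance `s2y` shrinks toward zero, the gain approaches one and the filter mean tracks the data exactly, as expected.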
This iterative algorithm will result in the correct filter distribution mean and vari-
ance for all times. The smoother distribution mean and variance can be obtained
using a similar algorithm (see Cressie and Wikle 2011 for details). Furthermore, these
algorithms are also easily extended to the multivariate setting. While they are incred-
ibly fast, the drawback to Kalman algorithms is that they do not directly estimate
model parameters (i.e., α, σy² , and σz² ). Thus, Kalman methods must be paired with
parameter estimation algorithms such as the expectation–maximization algorithm or
maximum likelihood to provide full model fitting results. See Shumway and Stoffer
(2006) for additional details on Kalman methods.
Alternatively, in the Bayesian setting, we seek the posterior distribution of the latent
state variables (zt ) and parameters α, σy² , and σz² :
[z, α, σy² , σz² |y] ∝ ( ∏Tt=1 [yt |zt , σy² ][zt |zt−1 , α, σz² ] ) [α][σy² ][σz² ]. (3.90)
The joint posterior is not analytically tractable, but we can use MCMC to fit the
model. For our simple hierarchical time series model, the full-conditional distribu-
tions are tractable because we used conjugate prior distributions.* Thus, we construct
an MCMC algorithm by sampling from the following distributions sequentially:
[α|·] = N( (Σt zt zt−1 /σz² ) / (Σt z²t−1 /σz² + 1/σα² ) , 1 / (Σt z²t−1 /σz² + 1/σα² ) ), (3.91)

[σy² |·] = IG( T/2 + γ1 , Σt (yt − zt )²/2 + γ2 ), (3.92)

[σz² |·] = IG( T/2 + γ1 , Σt (zt − αzt−1 )²/2 + γ2 ), (3.93)
* Recall that conjugacy implies that the form of the full-conditional matches that of the prior.
[zt |·] = N( (yt /σy² + α(zt+1 + zt−1 )/σz² ) / (1/σy² + (1 + α²)/σz² ) , 1 / (1/σy² + (1 + α²)/σz² ) ),
for t = 1, . . . , T − 1, (3.94)

[zT |·] = N( (yT /σy² + αzT−1 /σz² ) / (1/σy² + 1/σz² ) , 1 / (1/σy² + 1/σz² ) ), (3.95)
given an initial value for z0 . As discussed in the previous chapters, after a large num-
ber of MCMC samples have been collected, we can obtain inference in the form of
posterior means, variances, and credible intervals using Monte Carlo integration. For
more details on Bayesian methods and MCMC, see Hobbs and Hooten (2015).
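The full-conditional updates above can be assembled into a complete Gibbs sampler. The sketch below is ours (function and argument names are not from the original text); it assumes the N(0, σα²) prior on α and IG(γ1, γ2) priors on the variances used above, with z0 fixed at zero:

```python
import numpy as np

def gibbs_ar1(y, n_iter=1000, g1=2.0, g2=1.0, s2_alpha=1.0, z0=0.0, seed=0):
    """Gibbs sampler for y_t ~ N(z_t, s2y), z_t ~ N(alpha z_{t-1}, s2z),
    assuming alpha ~ N(0, s2_alpha) and IG(g1, g2) priors on the variances."""
    rng = np.random.default_rng(seed)
    T = len(y)
    z = y.copy()  # initialize the latent process at the data
    alpha, s2y, s2z = 0.5, 1.0, 1.0
    keep = {"alpha": [], "s2y": [], "s2z": []}
    for _ in range(n_iter):
        zlag = np.concatenate(([z0], z[:-1]))
        # alpha | . : conjugate normal full-conditional (3.91)
        prec = np.sum(zlag**2) / s2z + 1.0 / s2_alpha
        alpha = rng.normal((np.sum(z * zlag) / s2z) / prec, np.sqrt(1.0 / prec))
        # variance components | . : conjugate inverse gamma (3.92, 3.93)
        s2y = 1.0 / rng.gamma(T / 2 + g1, 1.0 / (np.sum((y - z) ** 2) / 2 + g2))
        s2z = 1.0 / rng.gamma(T / 2 + g1, 1.0 / (np.sum((z - alpha * zlag) ** 2) / 2 + g2))
        # z_t | . : one-at-a-time updates (3.94, 3.95)
        for t in range(T):
            zprev = z0 if t == 0 else z[t - 1]
            if t < T - 1:
                prec_t = 1.0 / s2y + (1 + alpha**2) / s2z
                mean_t = (y[t] / s2y + alpha * (z[t + 1] + zprev) / s2z) / prec_t
            else:
                prec_t = 1.0 / s2y + 1.0 / s2z
                mean_t = (y[t] / s2y + alpha * zprev / s2z) / prec_t
            z[t] = rng.normal(mean_t, np.sqrt(1.0 / prec_t))
        keep["alpha"].append(alpha)
        keep["s2y"].append(s2y)
        keep["s2z"].append(s2z)
    return {k: np.array(v) for k, v in keep.items()}, z
```

Each inverse gamma draw is obtained by sampling a gamma random variable and inverting it, and the latent states are updated with a systematic one-at-a-time scan.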
Using the simulated data set, based on σy² = 0.1, σz² = 1, and α = 0.95, we
estimated the latent temporal process zt using maximum likelihood (with Kalman
filtering) and the Bayesian hierarchical model (with MCMC). Figure 3.16a shows
the time series with the Kalman smoother mean and 95% confidence interval, while
FIGURE 3.16 Estimated latent process for zt based on simulated data yt (points). Panel (a)
shows the Kalman smoother mean (dashed line) and 95% confidence interval (gray region).
Panel (b) shows the Bayesian posterior mean (dashed line) and 95% credible interval (gray
region).
Figure 3.16b shows the same time series with the Bayesian posterior mean and 95%
credible interval. The confidence interval for the Kalman smoother (Figure 3.16a)
is narrower than that of the Bayesian credible interval (Figure 3.16b). While both
statistical estimates are obtained via smoothing, the Bayesian credible interval is
slightly wider because it accommodates the uncertainty associated with the unknown
parameters.
f̂(c) = ( Σni=1 k((c1 − μ1,i )/b1 ) k((c2 − μ2,i )/b2 ) ) / (n b1 b2 ). (4.1)
* Even though terrestrial animals live on a spheroid that is clearly not 2-D. For small spatial extents, the 2-D
assumptions are often sufficient, but keep in mind that we may not be able to reduce the dimensionality
of space down to two and still retain the important ecological characteristics for animals that swim or fly.
The true density function f (c) can then be estimated for any location c given the true
individual locations μi for i = 1, . . . , n and choice of kernel function k(·). Additional
quantities in this estimator are the bandwidth parameters b1 and b2 . These bandwidth
parameters control the smoothness of the estimated density surface. Many approaches
exist for setting or estimating b1 and b2 . Most commonly, a default bandwidth is
calculated for each margin (i.e., latitude and longitude) as 0.9 times the minimum
of the sample standard deviation and the interquartile range divided by 1.34, all
multiplied by the sample size raised to the negative one-fifth power (i.e., b =
0.9 · min(σ̂ , IQR/1.34) · n−1/5 ; e.g., Silverman 1986; Scott 1992).
There are many alternative methods for setting an appropriate bandwidth (e.g., cross-
validation), but the method described above works well for Gaussian kernels and
many data sets. Although the UD is often not estimated in a parametric framework, the
estimated density function serves as a basis from which to calculate many important
space use metrics.
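The estimator in Equation 4.1, with the rule-of-thumb bandwidth just described, can be sketched in a few lines (the function names below are ours, and the sketch assumes Gaussian kernels):

```python
import numpy as np

def default_bandwidth(x):
    """Rule-of-thumb bandwidth: 0.9 * min(sd, IQR/1.34) * n^(-1/5)."""
    n = len(x)
    iqr = np.subtract(*np.percentile(x, [75, 25]))
    return 0.9 * min(np.std(x, ddof=1), iqr / 1.34) * n ** (-0.2)

def kde2d(c, mu):
    """Product Gaussian-kernel density estimate at location c = (c1, c2) (Eq. 4.1)."""
    n = len(mu)
    b1 = default_bandwidth(mu[:, 0])
    b2 = default_bandwidth(mu[:, 1])
    k = lambda u: np.exp(-(u**2) / 2) / np.sqrt(2 * np.pi)  # standard normal kernel
    return np.sum(k((c[0] - mu[:, 0]) / b1) * k((c[1] - mu[:, 1]) / b2)) / (n * b1 * b2)
```

Evaluating `kde2d` over a regular grid of locations c yields the estimated UD surface that underlies figures such as Figure 4.1.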
The GPS telemetry data and estimated UD, based on KDE, for an individual mountain
lion (Puma concolor) in Colorado, USA, are shown in Figure 4.1. The data in
Figure 4.1 were used in an example by Hooten et al. (2013b), and represent 91 posi-
tions observed every 3 h over a period of approximately 11 days for an adult mountain
lion. The estimated UD in this example indicates that the individual mountain lion
likely uses space differentially, with at least two main regions of higher-intensity use
FIGURE 4.1 Mountain lion telemetry locations (points) and utilization distribution estimated
using KDE (darker gray shading indicates greater utilization).
Point Process Models 101
in the study area (Figure 4.1). Furthermore, there appears to be at least one telemetry
observation that is distant from the regions of highest-intensity use (leftmost point in
Figure 4.1), perhaps due to a foray into a neighboring individual’s territory.
* With the possible exception of true physical constraints such as fenced regions or a very strong territorial
effect in a confined space.
An isopleth of the UD (or of the KDE of the UD) is essentially a contour line, or more formally, a line drawn
through all of the points on a surface that have the same density value. For example,
by convention, the 95% isopleth of a KDE delineates the region that contains 95% of
the total density. A convex hull* is the smallest convex polygon containing all of the telemetry
points; it connects the “outside” points and has no reflex interior angles (i.e.,
no interior angles greater than 180◦ ).
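The convex hull of a set of telemetry points can be computed with a standard algorithm; the sketch below uses Andrew's monotone-chain method (our choice of algorithm and function name, not prescribed by the text):

```python
import numpy as np

def convex_hull(points):
    """Andrew's monotone-chain convex hull (MCP) of 2-D telemetry points."""
    pts = sorted(map(tuple, points))

    def cross(o, a, b):
        # > 0 for a counterclockwise turn o -> a -> b
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half_hull(seq):
        h = []
        for p in seq:
            while len(h) >= 2 and cross(h[-2], h[-1], p) <= 0:
                h.pop()
            h.append(p)
        return h

    lower = half_hull(pts)
    upper = half_hull(reversed(pts))
    return np.array(lower[:-1] + upper[:-1])  # hull vertices, counterclockwise
```

The returned polygon vertices delineate the MCP home range estimate, such as the one shown in panel (d) of Figure 4.2.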
Returning to the mountain lion example in the previous section, Figure 4.2
demonstrates the similarities and differences among home range estimation meth-
ods. For example, the KDE isopleth increases in size as the percentage of the
isopleth increases. The 95% KDE isopleth in Figure 4.2a is sufficiently small that
FIGURE 4.2 Mountain lion telemetry locations (points) and estimated home range delin-
eation using (a) 95% KDE isopleth, (b) 97.5% KDE isopleth, (c) 99% KDE isopleth, and
(d) convex hull.
* A convex hull is often referred to as a minimum convex polygon (MCP) in the animal ecology literature.
the dominant region of space in the estimated home range does not include one of
the telemetry observations. In fact, at the 95% level, there are two distinct regions of
space, but we only plot the dominant one (i.e., the one with larger area) for illustration
here. Figure 4.2d illustrates that, in this case, the convex hull estimate of the moun-
tain lion home range is substantially smaller than the KDE isopleth estimates and
captures all of the telemetry data. The researcher must decide which type of home
range estimator to use if inference for the home range is desired. The convex hull
method is less subjective, but some would argue that it is also less realistic. Signer
et al. (2015) argue that the relative differences in home range size among individuals
are most important, and thus, the estimation method may not impact the desired
inference.
FIGURE 4.3 (a) Mountain lion telemetry locations (points) and estimated home range delin-
eation using a 99% KDE isopleth and (b) the corresponding estimated L function (black line)
and Monte Carlo interval based on 1000 CSR simulations of the point process within the home
range (gray region).
The model proposed by Wilson et al. (2010) partitions the home range into a
large number of m grid cells for computational reasons. The telemetry data are
then converted to counts within each grid cell.* Grid cells not containing telemetry
observations receive a zero count. To begin, assume that the home range H can be
partitioned into two subsets H = C ∪ C̃, where C and C̃ represent the nonoverlapping
core and noncore areas (i.e., their intersection is empty).† The core area C may be composed
of several distinct subregions itself in cases where the UD is multimodal. If the
KDE isopleth φ is known, then both C and C̃ are known and there are mC and mC̃ grid
cells that fall within each subregion. Thus, Wilson et al. (2010) used a multinomial
framework to model the grid cell counts

yC ∼ MN(nC , pC ),
yC̃ ∼ MN(nC̃ , pC̃ ),
where yC (an mC × 1 vector) are the cell counts in the core areas and yC̃ (an mC̃ × 1
vector) are the cell counts in the noncore areas. The total numbers of telemetry obser-
vations in core and noncore areas are nC and nC̃ , and the grid cell probabilities for
core and noncore areas are pC ≡ (1/mC , . . . , 1/mC )′ and pC̃ ≡ (1/mC̃ , . . . , 1/mC̃ )′ .
The multinomial specification and equal grid cell probabilities imply that the total
Because core areas can have different sizes within a home range, we use the term “density” here rather than “intensity.” While core
areas all have density fC in our model, the intensities will vary with size; larger core areas will have
higher expected numbers of telemetry observations even though they all have the same density.
* This is similar to the implementation of resource selection function models, and thus, it serves a good
segue to the next section.
† Although the core area will be completely surrounded by noncore area given our model assumptions.
core area intensity is aC nC /mC and the noncore area intensity is aC̃ nC̃ /mC̃ , where the
total core size is aC and the noncore size is aC̃ . The important point is that both intensities
are known when the isopleth φ is known. Thus, the total likelihood for the
model can be written as

[y|φ] = MN(yC |nC , pC ) MN(yC̃ |nC̃ , pC̃ ), (4.4)
model can be written as
where y is the m × 1 vector of all cell counts. The likelihood in Equation 4.4 can be
maximized to find the MLE for φ or a prior could be specified for φ and Bayesian
inference can be obtained. Wilson et al. (2010) obtain a Bayesian estimate for φ using
the likelihood (4.4) and the prior φ ∼ Beta(1.1, 1.1).*
The core area model described thus far is appropriate when a single isopleth par-
titions the area into core and noncore areas. However, it is possible that there may be
multiple levels of core areas at increasingly higher levels of intensity. In these cases,
we can easily generalize the model by allowing for several isopleths (in the vector
φ ≡ (φ1 , . . . , φJ )′ ) such that they are ordered from small to large. Wilson et al.
(2010) use a Dirichlet prior for the isopleth vector because each of the φj isopleths is
bounded by zero and one, and they sum to one (ΣJj=1 φj = 1). In this generalized setting,
the likelihood is now a product over all subregions of the home range
[y|φ] = ∏Jj=1 MN(yCj |nCj , pCj , φj ), (4.5)
where the home range is partitioned as H = ∪Jj=1 Cj and nCj is the number of telemetry
observations in the jth subregion.
Wilson et al. (2010) recommended a general procedure for implementation when
the number of core areas is unknown:
1. Check for clustering in the observed set of telemetry data using Ripley’s
L function and associated Monte Carlo hypothesis tests.
2. If clustering exists, partition the domain into a large number of grid cells (as
many as computationally feasible) and fit the core area model assuming only
two levels of intensity.
3. Use the posterior mean isopleth E(φ|y) to split the telemetry observations
into two sets, one for the core and one for the noncore.
4. Check for additional clustering in each set separately using the methods in
step 1.
5. If checks reveal no further evidence of clustering, stop and obtain the desired
inference (e.g., core area size).
6. If additional clustering exists, fit the core area model using three levels of
intensity.
7. Check each of these three subregions for further clustering.
* The hyperparameters of this prior were chosen deliberately to keep φ away from the unreasonable values
of zero and one while still being only weakly informative.
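The clustering check in step 1 relies on Ripley's L function compared against a Monte Carlo envelope under complete spatial randomness (CSR). A minimal sketch (ours; it assumes a rectangular window and omits edge correction, unlike most production implementations):

```python
import numpy as np

def ripley_L(points, r, area):
    """Ripley's K, variance-stabilized as L(r) = sqrt(K(r)/pi) (no edge correction)."""
    n = len(points)
    d = np.sqrt(((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1))
    np.fill_diagonal(d, np.inf)  # exclude self-pairs
    K = np.array([area * np.sum(d < ri) / (n * (n - 1)) for ri in r])
    return np.sqrt(K / np.pi)

def csr_envelope(n, r, width, height, nsim=99, seed=0):
    """Pointwise Monte Carlo envelope of L(r) under CSR on a rectangle."""
    rng = np.random.default_rng(seed)
    sims = np.array([
        ripley_L(rng.uniform((0, 0), (width, height), size=(n, 2)), r, width * height)
        for _ in range(nsim)
    ])
    return sims.min(axis=0), sims.max(axis=0)
```

If the observed L function escapes above the envelope, that is evidence of clustering at those distances, triggering step 2 of the procedure.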
Figure 4.4 shows the estimated core area for the mountain lion data in our example
from the previous sections. The posterior mean isopleth occurred at 49% and is com-
posed of two core area regions shown as a dashed line in Figure 4.4. The estimated
core area itself encompasses approximately one-third of the total home range.
Figure 4.5 shows the core and noncore areas as well as their estimated L functions.
The simulation envelopes based on 1000 CSR point processes fully encompass the
estimated L functions for the core and noncore areas (Figure 4.5) indicating a lack of
clustering or regularity in each of the partitions of the home range. Thus, following the
guidance of Wilson et al. (2010), we conclude that the estimated region in Figure 4.4
is sufficient for delineating the core area of space use for our example mountain lion.
Had there been evidence of significant clustering in either the core or noncore areas,
we would have fit the core area model using two partitions, which would result in three
areas of distinct space use intensity.
The advantage of this sequential approach to model fitting and model checking is
that the assumptions of the model can be verified during the procedure. The drawback
is the additional effort of fitting and checking a sequence of models.
FIGURE 4.4 Mountain lion telemetry locations (points), home range (dark line), and
estimated core area delineation (dashed line).
FIGURE 4.5 Mountain lion telemetry locations (points) and home range (dark line). The esti-
mated (a) core area and (c) noncore area are shown as gray regions. The estimated L functions
(dark line) and simulation envelopes (gray regions) are shown for the (b) core and (d) noncore
areas.
Resource selection analyses seek to characterize how individuals choose where to be
(i.e., their location) given the type of environment (i.e., resources) that is “available” to
them. This topic involves many different notations and terminologies that we must
reconcile as we develop the necessary tools to infer resource selection. We begin
with the punch line: the concept of resource selection functions (RSFs) fits within a
standard framework for modeling spatial point processes. Even though much of the
notation, terminology, and practice developed separately in the field of quantitative
animal ecology, almost all of the tools have existed in the field of statistics for quite
some time. We return to the history of this subject, but we present the fundamental
ideas first.
Resource selection inference can be similar to space use inference in that we often
seek to characterize the spatial probability distribution that gives rise to the data. The
difference is that RSF models are parametric and usually involve auxiliary sources of
data on the environment or potential “resources” from which the individual can select.
In RSF analysis, the environment, habitat, or resources that are available to the indi-
vidual are specified or modeled. The selection process and availability of resources
are modeled as nonnegative functions that influence the spatial density of individual
locations in a region. The product of selection and availability functions is propor-
tional to the density. If the product of selection and availability functions integrates to
one, it is a density function. Thus, to serve as a valid probability model for the indi-
vidual locations as a point process, the product of selection and availability functions
must be normalized so that it is a proper density function over space.
We describe the RSF model from a somewhat unconventional perspective in
wildlife ecology so as to remain consistent with the standard statistical view of a point
process model. In doing so, we treat the spatial location μi as the random quantity of
interest for which we specify a PDF. The traditional approach in the wildlife ecologi-
cal literature treats the environment or resources (i.e., x(μi )) as the modeled quantity.
Both perspectives are correct in that they are designed to model a point process. In
the recent literature, you will see both formulations. We treat the spatial location μi
as the point, whereas some other descriptions will treat the set of environmental con-
ditions x(μi ) as the point. We model the spatial location directly because it allows us
to generalize the model to accommodate more complicated situations.
Consider the weighted distribution formulation of a point process model for
independent individual locations μi ∼ [μi |β, θ ] such that

[μi |β, θ ] ≡ g(x(μi ), β) f (μi , θ) / ∫ g(x(μ), β) f (μ, θ) dμ, (4.6)
where the selection function g depends on β, the selection coefficients. The availabil-
ity (i.e., f ) depends on θ, the availability coefficients. Furthermore, the denominator
in Equation 4.6 is necessary so that the entire PDF [μi |β, θ] integrates to one over
the support of the point process. The RSF model in Equation 4.6 provides a useful
example of how we can construct PDFs from scratch for nearly any type of data or
process.
In principle, any positive functions can be used for availability (f ) and selection (g).
However, in basic resource selection studies, the availability function is taken to be
the uniform PDF on the support of the point process (M). For such uniform avail-
ability specifications, the interpretation is that the individual can occur anywhere in
the support M with equal probability, and thus, the availability coefficients, θ, disap-
pear from the model and the focus shifts toward the selection coefficients β. Johnson
(1980) introduced a natural ordering of four scales for resource selection inference
that ecologists may be interested in:
1. First-order selection: the geographic range of the species.
2. Second-order selection: the home range of an individual within the geographic range.
3. Third-order selection: the use of habitat components within the home range.
4. Fourth-order selection: the procurement of particular resources (e.g., food items) at a use site.
The concept of scales of selection inference proposed by Johnson (1980) is com-
monly referred to and allows the researcher to define the support M based on their
goals for inference.
The selection function g can assume any positive form; however, two forms are
most popular: the exponential and logistic functions. The exponential selection func-
tion can be expressed as g(x(μi ), β) ≡ exp(x′ (μi )β), whereas the logistic selection
function takes the form of a probability

g(x(μi ), β) ≡ exp(x′ (μi )β) / (1 + exp(x′ (μi )β)). (4.7)
The x′ (μi )β term in Equation 4.7 resembles the mean function in linear regression,
and the forms of the selection function correspond to link functions commonly used
in generalized linear modeling with Poisson and Bernoulli likelihoods.* In most GLMs, the value of one
is included as the first covariate in x so that the first element of β (i.e., β0 ) acts as an
intercept in the model. However, if we use an intercept in the exponential selection
function, it will cancel in the numerator and denominator of Equation 4.6. Thus, an
intercept is not included in RSF models that rely on Equation 4.6 directly when the
selection function is exponential.
The main difference in the resulting inference from the two common forms of
selection functions is that the logit form† (4.7) allows for inference directly on the
probability of selection, whereas the exponential form limits inference to the relative
intensity of selection. However, even in this case, inference concerning the direction
and magnitude of environmental effects on selection can still be obtained directly by
learning about β. Thus, despite this apparent shortcoming, most resource selection
studies still rely on the exponential form for the model because of tradition and ease
of implementation. Under uniform availability, the RSF with exponential selection
function is
[μi |β] ≡ exp(x′ (μi )β) / ∫ exp(x′ (μ)β)dμ . (4.8)
Notice the similarity of this resource selection model (4.8) with that of the hetero-
geneous spatial point process model (2.8) described in Section 2.1. Thus, to form a
likelihood under the assumption of conditional independence* for the points (i.e., μi ,
for i = 1, . . . , n), we take the product of Equation 4.8 over the n individual locations
∏ni=1 exp(x′ (μi )β) / ∫ exp(x′ (μ)β)dμ . (4.9)
FIGURE 4.6 (a) Mountain lion telemetry locations (points) and home range (gray region).
(b) Background sample based on 1000 samples from a CSR point process (points) and home
range (gray region).
Figure 4.6a shows the original point process for the mountain lion data and esti-
mated home range based on a 99% KDE isopleth. Figure 4.6b shows a background
sample of size 1000 based on a CSR point process within the estimated home range.
In the Poisson regression approach to fitting RSF models, the support of the indi-
vidual locations M is gridded up into L areal units (i.e., grid cells or pixels). The
covariates (xl for l = 1, . . . , L, for large L) are associated with each grid cell and the
individual locations (μ) are counted in each grid cell and recorded in an L × 1 vector
of cell frequencies z. The model becomes zl ∼ Pois(λl ), where log(λl ) = β0 + x′l β
for all grid cell counts l = 1, . . . , L. The intercept β0 is related to grid cell size,* but is
usually ignored, and only the estimates of β are used for resource selection inference.
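The Poisson regression itself is a standard GLM fit. A minimal Newton-Raphson (IRLS) sketch in Python (ours; in practice one would typically use existing GLM software):

```python
import numpy as np

def poisson_glm(X, z, n_iter=25):
    """Newton-Raphson (IRLS) fit of z_l ~ Pois(lambda_l), log(lambda_l) = b0 + x_l' b."""
    X1 = np.column_stack([np.ones(len(z)), X])  # prepend an intercept column
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        lam = np.exp(X1 @ beta)
        score = X1.T @ (z - lam)           # gradient of the Poisson log-likelihood
        info = X1.T @ (lam[:, None] * X1)  # Fisher information
        beta = beta + np.linalg.solve(info, score)
    return beta  # beta[0] is the intercept b0; the rest are selection coefficients
```

Here `X` holds the grid cell covariates and `z` the grid cell counts; the intercept estimate is discarded for selection inference, as described above.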
Figure 4.7a shows the original point process for the mountain lion data and esti-
mated home range based on a 99% KDE isopleth. Figure 4.7b shows the gridded
point process with counts within 1 km grid cells in the estimated home range.
The striking result is that both the logistic and Poisson regression approaches to
implementing the RSF model yield the same inference about β, under conditions
we explain in what follows (Warton and Shepherd 2010; Aarts et al. 2012). In the
mountain lion example, we fit the spatial point process model using both Poisson
and logistic regression to yield inference for the selection coefficients β based on
the standardized covariates in Figure 4.8. The results indicate that the logistic
and Poisson regression approaches to fitting the point process model yield very similar
estimates (Table 4.1). Furthermore, resource selection inference resulting from the model fits
indicates that this mountain lion is selecting for lower elevations and steeper slopes
relative to available terrain; there is no evidence of selection for exposure given the
other covariates in the model.
The necessary conditions for equivalence in the inference are that the background
sample used in the logistic regression and the number of grid cells used in the Poisson
FIGURE 4.7 (a) Mountain lion telemetry locations (points) and home range (gray region).
(b) Gridded cell counts of the mountain lion telemetry locations within the home range (light
gray = 1, medium gray = 2, and dark gray = 3 points in the pixel).
FIGURE 4.8 Landscape covariates for the mountain lion telemetry data: (a) elevation,
(b) slope, and (c) exposure. Darker shading corresponds to larger values in the covariates.
TABLE 4.1 Estimated resource selection coefficients using logistic (LR) and Poisson
regression (PR) based on the covariates in Figure 4.8.
regression often need to be very large, sometimes on the order of tens of thousands
(Northrup et al. 2013). Perhaps the most common mistake made in most classical RSF
implementations is that a large-enough background sample is not used. The easiest
way to check whether a large-enough sample is chosen is to try larger background
sample sizes until the inference stabilizes. The reason that a large background sam-
ple or number of grid cells is so important is that it is implicitly sidestepping the
integral in the denominator of the RSF model (4.8) for us. However, when the covari-
ate information is only available at a certain resolution, then additional grid cells at
resolutions smaller than the covariates in the Poisson regression do not improve the
approximation.
We could choose to numerically compute the integral and then use maximum like-
lihood or Bayesian methods, but the easy implementation in statistical software using
a GLM seems much more straightforward for most ecologists. There can be some
advantages to the explicit integration approach however, and we discuss these in the
next section.
We discuss the first item here and return to the topic of model extensions in
Section 4.4.
One approach for improving computational performance in the evaluation of a
point process likelihood (4.9) is to find efficient approximations for the integral in
the denominator. The function exp(x′ (μ)β) will inevitably be quite complicated;*
thus, the integral of the function with respect to μ will be analytically intractable.
However, a fairly simple approximation to the integral ∫ exp(x′ (μ)β)dμ can be found
using numerical quadrature. That is, break up the support of the point process (M)
into a large number of equally sized grid cells (as in the Poisson regression procedure
described above) and then evaluate exp(x′ (μl )β) for each grid cell (assuming that μl
is the grid cell center for cell l). We then multiply by the area of a grid cell (a) and
sum to obtain the integral approximation. Thus, we approximate the integral with

∫ exp(x′ (μ)β)dμ ≈ a ΣLl=1 exp(x′ (μl )β) (4.10)
on each MCMC iteration. Therefore, if the integral takes 0.1 s to approximate once
and you need 100,000 MCMC iterations, then your algorithm will never take less than
about 2.8 h to fit the model (assuming all other required calculations are negligible).
If we could speed up the integral approximation by an order of magnitude, we could
reduce the total required computational time down to 16 min, which is reasonable for
contemporary statistical model fitting.
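The quadrature approximation and the resulting point process likelihood are short computations; a sketch (ours, with hypothetical function names) illustrates Equations 4.9 and 4.10 together:

```python
import numpy as np

def rsf_integral(X_grid, beta, cell_area):
    """Quadrature approximation to the RSF integral (Equation 4.10)."""
    # X_grid holds the covariate vector at each grid cell center
    return cell_area * np.sum(np.exp(X_grid @ beta))

def rsf_neg_log_lik(beta, X_obs, X_grid, cell_area):
    """Negative log-likelihood of the exponential-RSF point process (Equation 4.9)."""
    n = X_obs.shape[0]  # X_obs holds covariates at the n observed locations
    return -(np.sum(X_obs @ beta) - n * np.log(rsf_integral(X_grid, beta, cell_area)))
```

`rsf_neg_log_lik` can be handed to a generic optimizer for maximum likelihood, or evaluated inside an MCMC algorithm; the cost of `rsf_integral` at every iteration is exactly the bottleneck discussed above.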
In what follows, we describe two approaches for approximating the required RSF
integral that can be made as accurate as quadrature without increasing computational
time. If a minimal amount of additional approximation error is acceptable, these
approaches can offer an order of magnitude faster integral approximation.
To simplify our presentation of the integral approximation techniques, we first
reduce the complexity of the required integral through orthogonalization and inte-
gration. Most statistical algorithms are iterative and we typically need to evaluate
the RSF integral when optimizing or sampling the RSF coefficients. Thus, if we can
reduce the problem to one where a single coefficient is dealt with at a time, then a
few good tricks for reducing computational burden become apparent.
For computing purposes only, we begin by transforming the environmental vari-
ables X to create a new set of orthogonal covariates X̃ = XV. To perform this
transformation, we acquire V from the singular value decomposition of the matrix
of environmental variables X = UDV′, such that U and V contain the left and right sin-
gular vectors, respectively, and D is a diagonal matrix with the singular values on
the diagonal. The orthogonalization allows us to perform a type of principal compo-
nents regression because we can always express the selection function as exp(Xβ) =
exp(X̃β̃), where β̃ are a set of selection coefficients associated with the transformed
covariates. These new covariates are linear combinations of the original covariates but
can be difficult to interpret, although they can be easily visualized as spatial maps.
For example, the principal component scores resulting from the orthogonalization of
the original mountain lion covariates are shown in Figure 4.9.
The advantages of using this orthogonal covariate transformation are manifold. It
would appear that we lose the ability to interpret the selection coefficients; however,
we can always recover them with the inverse transformation β = Vβ̃. Moreover,
the orthogonalization results in a much more stable computational algorithm if the
original covariates were correlated (i.e., multicollinear). Finally, in situations where
FIGURE 4.9 Principal components of landscape covariates for the mountain lion teleme-
try data: (a) component 1 (41% of variation), (b) component 2 (37% of variation), and
(c) component 3 (22% of variation).
Point Process Models 115
there are many original environmental variables, we can reduce the dimension of the
orthogonal covariate set by retaining only the first q columns of V when calculat-
ing X̃. This approximation is a common technique used in spatial statistics, but will
only be worthwhile for large sets of covariates (i.e., more than 10).
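The reparameterization identity exp(Xβ) = exp(X̃β̃) can be verified numerically. The sketch below (Python) uses a simple rotation matrix as a stand-in for the right singular vectors V from the SVD; any orthogonal V satisfies the identity, although only the SVD choice also decorrelates the columns:

```python
import math

# Two correlated covariates at three hypothetical grid cells (rows of X).
X = [[1.0, 0.9], [0.2, 0.3], [-1.0, -0.8]]
beta = [0.7, -0.4]

# A rotation matrix stands in for V; it is orthogonal, so V'V = I.
theta = 0.6
V = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def transpose(M):
    return [list(row) for row in zip(*M)]

X_tilde = [matvec(transpose(V), x) for x in X]  # row-wise X~ = X V
beta_tilde = matvec(transpose(V), beta)         # beta~ = V' beta, so beta = V beta~

for x, x_t in zip(X, X_tilde):
    w  = math.exp(sum(p * q for p, q in zip(x, beta)))
    wt = math.exp(sum(p * q for p, q in zip(x_t, beta_tilde)))
    print(round(w, 10) == round(wt, 10))  # identical selection weights
```

Recovering β = Vβ̃ after fitting is the same matrix-vector product in reverse, which is why interpretability is not lost.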
The orthogonalization affords us one final benefit. It allows us to construct a condi-
tional algorithm where we only sample (or optimize) a single coefficient β̃j at a time.
Normally, in an optimization (i.e., Newton–Raphson) or sampling algorithm (e.g.,
MCMC), we would prefer to handle a set of regression coefficients jointly; however,
in this case, it will pay off to deal with them one at a time.
In an effort to make this point as clearly as possible, we focus on a Bayesian RSF
model using an MCMC algorithm. In this case, we seek the posterior distribution of
[β̃|μ1 , . . . , μn ]. Using MCMC, we can sample from full-conditional distributions for
each of the coefficients sequentially to fit the model. That is, we need to be able to
efficiently sample each coefficient β̃j from the full-conditional distribution
[β̃_j | ·] ∝ [β̃_j] ∏_{i=1}^{n} [ exp(x̃′(μ_i)β̃) / ∫ exp(x̃′(μ)β̃)dμ ],    (4.11)
where the initial term [β̃j ] is the marginal prior distribution* for β̃j and the integral in
the denominator needs to be approximated (because it involves β̃j ). In working with
Equation 4.11, we can expand the exponential such that
exp(x̃′(μ_i)β̃) = exp( Σ_{∀j} x̃_j(μ_i)β̃_j )

             = exp(x̃_j(μ_i)β̃_j) · exp( Σ_{∀l≠j} x̃_l(μ_i)β̃_l ).    (4.12)
This nicely isolates the jth effect x̃j (μi )β̃j from the rest, allowing us to expand terms
in the full-conditional distribution from Equation 4.11 to
[β̃_j | ·] ∝ [β̃_j] ∏_{i=1}^{n} [ exp(x̃′(μ_i)β̃) / ∫ exp(x̃′(μ)β̃)dμ ]

        ∝ [β̃_j] ∏_{i=1}^{n} exp(x̃′(μ_i)β̃) / ( ∫ exp(x̃′(μ)β̃)dμ )^n

        ∝ [β̃_j] ∏_{i=1}^{n} exp(x̃_j(μ_i)β̃_j) · ∏_{i=1}^{n} exp( Σ_{∀l≠j} x̃_l(μ_i)β̃_l ) / ( ∫ ∏_{∀j} exp(x̃_j(μ)β̃_j)dμ )^n

        ∝ [β̃_j] ∏_{i=1}^{n} exp(x̃_j(μ_i)β̃_j) / ( ∫ ∏_{∀j} exp(x̃_j(μ)β̃_j)dμ )^n,    (4.13)
where the last product in the numerator drops out in the proportionality in Equa-
tion 4.13 because it does not involve β̃j . To simplify the integral in the denominator
of Equation 4.13, we change the integral so that it integrates over the covariates (x̃j )
rather than the spatial locations (μ).* The full-conditional distribution for β̃j then
becomes
[β̃_j | ·] ∝ [β̃_j] ∏_{i=1}^{n} exp(x̃_j(μ_i)β̃_j) / ( ∫ ∏_{∀j} exp(x̃_j(μ)β̃_j)dμ )^n

        ∝ [β̃_j] ∏_{i=1}^{n} exp(x̃_j(μ_i)β̃_j) / ( ∏_{∀j} ∫ exp(x̃_j β̃_j)[x̃_j]dx̃_j )^n

        ∝ [β̃_j] ∏_{i=1}^{n} exp(x̃_j(μ_i)β̃_j) / ( ∫ exp(x̃_j β̃_j)[x̃_j]dx̃_j · ∏_{∀l≠j} ∫ exp(x̃_l β̃_l)[x̃_l]dx̃_l )^n

        ∝ [β̃_j] ∏_{i=1}^{n} exp(x̃_j(μ_i)β̃_j) / ( ∫ exp(x̃_j β̃_j)[x̃_j]dx̃_j )^n,    (4.14)
where the product over the integrals in the denominator is valid when the covariates
are independent. The covariates x̃j should be independent if they are normally dis-
tributed because they have been orthogonalized.† This new parameterization does not
immediately seem to help. However, because we can approximate the integral with
a sum (i.e., the quadrature concept discussed earlier), in certain circumstances, com-
ponents of the sum can be precalculated, which increases computational efficiency.
To illustrate this idea, consider a summation approximation of the required integral
∫ exp(x̃_j β̃_j)[x̃_j]dx̃_j ≈ a Σ_{l=1}^{L} exp(x̃_j^{(l)} β̃_j)[x̃_j^{(l)}],    (4.15)
* Recall that ∫ exp(x̃′(μ)β̃)dμ = ∫ exp(x̃′β̃)[x̃]dx̃, where [x̃] is the distribution of the covariate implied
by a uniform distribution on μ. Also, keep in mind that our initial integral over μ is multivariate; we just
use the single integral notation to simplify things.
† Independence and orthogonality are equivalent for normally distributed random variables, but that is not
true for all random variables. Therefore, this technique requires potentially strong assumptions.
where “a” is the area of a quadrature grid cell as before and L is the total number of cells.
Now suppose that a discretization can be found for the variable x̃j such that it falls in
one of C classes. Also suppose that the loss in precision due to the discretization can
be decreased with a larger number of classes. Then, we replace the quadrature sum
in Equation 4.15 with a different sum involving the classes as
∫ exp(x̃_j β̃_j)[x̃_j]dx̃_j ≈ a Σ_{l=1}^{L} exp(x̃_j^{(l)} β̃_j)[x̃_j^{(l)}]

                        ≈ a Σ_{c=1}^{C} n_c exp(x̃_{jc} β̃_j)[x̃_{jc}],    (4.16)
where nc corresponds to the number of cells containing that particular class for the
covariate. This reduces the sum from a potentially very large L dimension down
to a much smaller C dimension because nc can be precalculated after the optimal
discretization into classes is performed.
Many methods exist for finding an optimal (i.e., minimal loss) discretization for
the covariates. Perhaps the simplest approach is to cluster each covariate using a
K-means approach (or other clustering algorithm) prior to model fitting to deter-
mine the classes. In our experience, this type of preclustering can speed up fitting
algorithms by an order of magnitude or more depending on the complexity of the
covariate.
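A sketch of the precalculation idea (Python): equal-width binning stands in here for the K-means step, and a hypothetical one-dimensional covariate stands in for x̃_j. Grouping the quadrature sum by covariate class, as in Equation 4.16, means each likelihood evaluation touches C terms instead of L:

```python
import math

# Hypothetical covariate values at L quadrature grid cells.
L = 50_000
a = 1.0 / L
x = [math.sin(7.0 * l / L) for l in range(L)]

# Discretize into C classes (equal-width bins here; the text suggests
# K-means or another clustering algorithm for an optimal discretization).
C = 64
lo, hi = min(x), max(x)
width = (hi - lo) / C
centers = [lo + (c + 0.5) * width for c in range(C)]
counts = [0] * C
for xv in x:                             # precomputed once, before model fitting
    counts[min(int((xv - lo) / width), C - 1)] += 1

def integral_full(beta):                 # O(L) work per likelihood evaluation
    return a * sum(math.exp(xv * beta) for xv in x)

def integral_classed(beta):              # O(C) work per likelihood evaluation
    return a * sum(n_c * math.exp(xc * beta)
                   for n_c, xc in zip(counts, centers))

beta = 1.3
print(integral_full(beta), integral_classed(beta))  # nearly identical values
```

Since the counts n_c never change across iterations, the per-iteration cost drops by roughly the factor L/C, consistent with the order-of-magnitude speedups noted above.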
for j = 1, . . . , m, and where the error terms are normally distributed. The errors, ε_j,
are assumed to be independent and identically distributed Gaussian, whereas the η_j are allowed
118 Animal Movement
to be spatially correlated such that η ∼ N(0, Σ). The covariance matrix Σ is typically
parameterized assuming a continuous spatial process. For example, the elements of
the covariance matrix are often defined as Σ_jl ≡ exp(−d_jl/φ), where d_jl is the
Euclidean distance between cell locations c_j and c_l in the grid, and φ controls the
range of spatial dependence as described in Chapter 2.
The advantages of the basic RUF concept are that it is intuitive and straightfor-
ward to implement. It is intuitive because it attempts to link the estimated density (or
the UD) associated with animal relocations to the environment in a regression frame-
work. It is straightforward because the actual implementation only requires two lines
of computer code: one to estimate f̂ and another to fit the regression model. At the
time of its development (i.e., the early 2000s), the RUF concept was especially
attractive compared with RSF fitting procedures, and it seems like it should yield simi-
lar inference. It is now widely known, however, that RSFs can be implemented with
only one line of computer code (after the initial preprocessing of the data). That is,
to fit an RSF model, only software for fitting a GLM is required, either Bernoulli
(i.e., using the background sample approach) or Poisson regression (i.e., the count
modeling approach). Thus, any computational advantages to the RUF may be moot
at this point.
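To illustrate that only a GLM fitter is needed, the sketch below implements Poisson regression by Newton–Raphson for a single covariate (Python; in practice one would simply call an existing GLM routine, e.g., glm() in R). The grid-cell "counts" are hypothetical and constructed to lie exactly on a regression surface so that the estimates can be checked:

```python
import math

def fit_poisson(y, x, iters=50):
    """Newton-Raphson for log(lambda_l) = b0 + b1 * x_l (Poisson regression)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        lam = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector and Hessian of the Poisson log-likelihood.
        g0 = sum(yi - li for yi, li in zip(y, lam))
        g1 = sum((yi - li) * xi for yi, li, xi in zip(y, lam, x))
        h00 = sum(lam)
        h01 = sum(li * xi for li, xi in zip(lam, x))
        h11 = sum(li * xi * xi for li, xi in zip(lam, x))
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det     # solve the 2x2 Newton step
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical grid-cell covariate, with "counts" placed exactly on the
# Poisson regression surface so the MLE recovers the true coefficients.
x = [l / 20.0 for l in range(21)]
b0_true, b1_true = -1.0, 1.5
y = [math.exp(b0_true + b1_true * xi) for xi in x]  # non-integer y still solves the score equations
b0_hat, b1_hat = fit_poisson(y, x)
print(round(b0_hat, 6), round(b1_hat, 6))
```

The Poisson log-likelihood is concave, so the Newton iterations converge quickly; this is exactly what a packaged GLM routine does internally.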
However, the traditional RUF approach highlights some important issues that we
describe further. Hooten et al. (2013b) compared and contrasted the RUF and RSF
procedures in an attempt to reconcile the inference they provide. Among other things,
they found two key differences that we describe in what follows: the “support” for the
response variable in the model and the relationship between first- and second-order
(i.e., mean and variance) components of the RUF model.
Recall that when we use the word “support” we are talking about the values that a
certain variable can assume. In the case of the RUF model, the approach described by
Marzluff et al. (2004), and later Millspaugh et al. (2006), links the estimated density
(f̂ or UD) directly to the covariates without transformation. Thus, the RUF model
implies an identity link function (i.e., no transformation). On the other hand, if we
consider the Poisson regression approach to fitting the RSF model, it is customary to
use the log link function such that log(λ(c_j)) = x′_j β. As the grid cell area approaches
zero, the intensity function λ is proportional to the density function f . Thus, the log
of λ plus a constant is equivalent to the log of f , which implies that we may want to
use the log of the estimated density surface (or UD) as the response variable in the
RUF model such that
log(f̂(c_j)) = x′_j β + η_j + ε_j.    (4.18)
We refer to this new model as a modified RUF model. The modified RUF (4.18) more
closely mimics the RSF and provides more similar inference.
The second issue concerning the RUF pertains to the second-order spatial struc-
ture of the model (η). In the early development of the RUF concept, Marzluff et al.
(2004) noticed that the first-stage density estimation procedure induced a form of spa-
tial autocorrelation that was exogenous to the resource selection process. Therefore,
a model checking effort for the standard regression form of RUF (i.e., without spa-
tially autocorrelated random effects) indicates spatial dependence in the residuals. In
fact, based on our own experience with these models, an almost “textbook” empirical
FIGURE 4.10 Empirical (points) and fitted (line) semivariogram based on the residuals of
regressing the log estimated UD on the exposure covariate for the mountain lion data. The
fitted semivariogram is based on a Gaussian model for covariance.
variogram for the residuals often results. For example, Figure 4.10 shows the empir-
ical and fitted semivariogram for the residuals when regressing the log estimated UD
on the exposure covariate for the mountain lion data. The smoothly increasing semi-
variance with distance is due to the UD estimation using KDE. Because it is ad hoc
to use a model for inference when evidence for a lack of fit is present,*
Marzluff et al. (2004) suggested adding a correlated random effect to the model, as
is typically done in spatial statistics. The result is a model with the same basic form
as the right-hand side of Equation 4.18. This is an excellent model for spatial predic-
tion, but prediction is not the goal of RUF analysis. Instead, we seek to learn about
the regression coefficients β as surrogates for selection coefficients in a point process
RSF model. Inference concerning β requires the covariates X to be linearly indepen-
dent of the random effect η. Evaluating the collinearity assumption is not trivial, and
thus, it is not often checked.
A number of recent studies have shown that the inference for β can be affected
if collinearity exists among the fixed and random effects (e.g., Hodges and Reich
2010; Hughes and Haran 2013; Hanks et al. 2015b). For example, in the mountain
lion application, the exposure covariate is negatively correlated (ρ = −0.29) with
the eleventh eigenvector of the estimated covariance matrix (Figure 4.11). Recall the
discussion of spatial confounding in the context of general spatial statistics in Chap-
ter 2. Spatial confounding can also occur in RUF models; indeed, fitting RUF
models with and without the spatial random effect can lead to markedly different
inference. In an attempt to help alleviate the problem, Hooten et al. (2013b) evalu-
ated restricted spatial regression (RSR) for RUF models and found that it can yield
improved inference in some cases (e.g., less biased estimates of coefficients). As a
reminder, the idea with RSR is to force the random effect to be orthogonal to the
fixed effects. The use of RSR is only warranted when the first-order effects (i.e., xj β)
take precedence over the second-order effects (i.e., ηj ). Using RSR for inference can
FIGURE 4.11 Exposure covariate (a) and eleventh eigenvector (b) of Σ̂ based on the fitted
semivariogram using a Gaussian model for covariance. The correlation between the covariate
and eigenvector was approximately −0.3.
be detrimental in cases when the covariates are collinear with a true additive random
effect in reality (Hanks et al. 2015b). Thus, caution must be exercised in specifying
and fitting RUF models.
A final potential issue with using RUF inference in lieu of RSF inference relates
to model misspecification (i.e., an incorrect formulation of the model for the desired
type of inference). To illustrate this issue, consider the following simplified modeling
scenario. Imagine a very basic linear regression model, where a spatially indexed
response variable y is regressed on a single covariate x using the model
y = β0 + β1 x + ε. (4.19)
Now suppose the data are smoothed after collection by premultiplying with a matrix
M (e.g., a kernel smoothing operator applied to the response), so that

My = β0 M1 + β1 Mx + Mε.    (4.20)
If the rows of M each sum to one and the errors ε are normal and independent with
homogeneous variance σ 2 , then this new model for the smoothed data can also be
written as
My ∼ N(β0 1 + β1 Mx, σ² MM′).    (4.21)
Notice that the new model (4.21) is very similar to the original, but with two important
differences. The first difference is that the original covariate is replaced with a new
smoothed version of it. The model formulation in Equation 4.21 suggests that, if you
have data that are smoothed after the process of interest occurs, you should use a
model containing the same type of smoothing on the covariate as on the response.
Intuitively, this seems sensible, but might have gone unnoticed if we had not written
the transformed model explicitly. The second difference is that the original errors
were uncorrelated but the smoothing induces a specific type of correlation in the
new model via the covariance matrix σ²MM′ in Equation 4.21. In fact, the type of
correlation induced is the same type used in kernel convolution approaches for fitting
spatially explicit models (Higdon 1998).
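The effect described in Equations 4.19 through 4.21 can be demonstrated numerically. In the noise-free sketch below (Python; all quantities hypothetical), smoothing only the response distorts the slope estimate, whereas smoothing the covariate in the same way recovers it exactly:

```python
# Noise-free illustration of Equations 4.19 through 4.21: smooth the response
# with a moving-average matrix M (rows sum to one), then regress on the raw
# covariate versus the similarly smoothed covariate.
n = 60
x = [(-1.0) ** i + i / n for i in range(n)]   # rough, spatially indexed covariate
b0, b1 = 2.0, 3.0
y = [b0 + b1 * xi for xi in x]                # Equation 4.19 with epsilon = 0

def smooth(v):
    """Three-point moving average; each row of the implied M sums to one."""
    return [(v[max(i - 1, 0)] + v[i] + v[min(i + 1, n - 1)]) / 3.0
            for i in range(n)]

def ols_slope(u, v):
    """Least-squares slope from regressing v on u (with intercept)."""
    ub, vb = sum(u) / len(u), sum(v) / len(v)
    return (sum((ui - ub) * (vi - vb) for ui, vi in zip(u, v))
            / sum((ui - ub) ** 2 for ui in u))

My, Mx = smooth(y), smooth(x)
print(ols_slope(x, My))   # badly biased: response smoothed, covariate not
print(ols_slope(Mx, My))  # recovers b1 = 3.0: both smoothed by the same M
```

Because My = β0 1 + β1 Mx holds exactly when ε = 0, the second regression is exact; the first is distorted precisely because the covariate was not smoothed to match the response.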
The bottom line is that the simple regression example in Equations 4.19 through
4.21 illustrates that the inference could be affected when using standard RUF
approaches. Because the original density of the point process is being estimated with-
out the use of finer-scale underlying covariates (i.e., using KDE alone), it will likely
be smoothed in a similar fashion as in the simple case outlined above, thus affecting
inference if the covariates are not smoothed appropriately first. Hooten et al. (2013b)
empirically demonstrated that it was possible to obtain better inference using an RUF
with smoothed covariates and a spatially correlated error structure (similar to that
proposed by Marzluff et al. 2004). In this case, the phrase “better inference” pertains
to inference closer to that arising from a Poisson regression implementation of the
RSF model. Thus, while it is possible to fix up the RUF model, it may no longer
be worthwhile because the RSF model is simpler to fit. Having said that, there may
still be some uses for two-stage procedures like that used in RUF models. For exam-
ple, in cases where complicated model extensions are required or the amount of data
becomes too large, some form of multiple imputation may be necessary.* The two-
stage aspect of multiple imputation is similar to that used in the RUF procedure. We
return to this idea in Chapter 7.
4.4 AUTOCORRELATION
As we have seen in the preceding sections, obtaining inference for resource selection
using animal telemetry data can be tricky. In addition to the spatial autocorrelation
issues that we discussed in the previous section, further consideration of the temporal
form of autocorrelation in the analysis of telemetry data is critical.
The fundamental issue with temporal autocorrelation arises because the point pro-
cess models used to obtain RSF inference often assume that each point (i.e., observed
animal position) arises independently of the others. When the telemetry fixes are
obtained close together in time, the points will naturally be closer together due to the
physics involved in movement (e.g., animals have limited speed when moving). If
short time gaps between telemetry fixes create a form of dependence in the obser-
vations that cannot be accounted for by the standard RSF model, then the model
assumptions will not be valid and we cannot rely on the resulting statistical inference.
For these reasons, building on the work of Dunn and Gipson (1977) and Schoener
(1981), Swihart and Slade (1985) developed a method for assessing temporal depen-
dence in telemetry data. A function of distance moved and distance from activ-
ity center serves as the basis for assessing dependence. For a given time lag l,
* Multiple imputation is a two-stage procedure where an imputation distribution is first estimated and then
realizations from it are used as data in secondary models. It can be useful in situations with missing data.
( Σ_{i=l+1}^{n} ((μ_{1,i} − μ_{1,i−l})² + (μ_{2,i} − μ_{2,i−l})²) / Σ_{i=1}^{n} ((μ_{1,i} − μ̄_1)² + (μ_{2,i} − μ̄_2)²) ) · ( n / (n − l) ),    (4.22)
assuming that the positions μi ≡ (μ1,i , μ2,i ) for i = 1, . . . , n are observed directly
without measurement error. Thus, the autocorrelation statistic (4.22) is essentially a
multivariate Durbin–Watson statistic* that accounts for the home range (Durbin and
Watson 1950). By calculating Equation 4.22 for a set of time lags ranging from small
to large, one could look for a temporal lag at which the autocorrelation levels off. This
leveling off suggests a time lag beyond which pairs of telemetry observations can
be considered independent. For large-enough data sets, the original set of telemetry
observations could be thinned such that no two points occur within the determined
time lag and the usual RSF model then can be fit to the subsampled data set.
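A sketch of the statistic in Equation 4.22 and the associated thinning step (Python; the smooth circular track is hypothetical and chosen so that short-lag displacements are small relative to the home-range scale):

```python
import math

def ss_autocorrelation(track, lag):
    """Statistic in Equation 4.22 for a list of (x, y) telemetry fixes."""
    n = len(track)
    xbar = sum(p[0] for p in track) / n
    ybar = sum(p[1] for p in track) / n
    num = sum((track[i][0] - track[i - lag][0]) ** 2 +
              (track[i][1] - track[i - lag][1]) ** 2 for i in range(lag, n))
    den = sum((px - xbar) ** 2 + (py - ybar) ** 2 for px, py in track)
    return (num / den) * (n / (n - lag))

# Hypothetical smooth track circling an activity center: displacements at
# short lags are small, so the statistic is small and grows with the lag.
track = [(math.cos(0.05 * t), math.sin(0.05 * t)) for t in range(200)]
stats = [ss_autocorrelation(track, lag) for lag in (1, 5, 20, 60)]
print([round(s, 3) for s in stats])  # increases with lag

# Once the statistic levels off at some lag k, thin to every k-th fix and
# fit the usual RSF model to the subsampled track.
k = 20
thinned = track[::k]
```

In practice the leveling-off lag would be read from a plot of the statistic against lag, as described above, rather than fixed in advance.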
The papers by Swihart and Slade (1985) and Swihart and Slade (1997) are impor-
tant contributions to the animal movement literature because they remind us to check
the assumptions of our models. The downside is that we leave out data if we are inter-
ested in using standard approaches for analyzing telemetry data. A similar dilemma
occurred early in the development of spatial statistics. Before modern methods for
model-based geostatistics existed, researchers finding evidence for residual spatial
autocorrelation would resort to subsampling data at spatial lags beyond which the
errors were considered to be independent.
Numerous authors have challenged the claim that autocorrelation can affect ani-
mal space use inference (e.g., Rooney et al. 1998; deSolla et al. 1999; Otis and White
1999; Fieberg 2007). However, most of those studies were specifically focused on
home range estimation rather than resource selection inference. Despite the different
focus, Otis and White (1999) issue an important reminder to always consider the tem-
poral extent of the study when collecting and analyzing telemetry data. While Otis
and White (1999) opt for design-based approaches (i.e., those that rely on random
sampling for frequentist inference) that minimize the effects of temporal autocorrela-
tion for the estimation of quantities they were interested in, it is generally important
to obtain a representative sample of the process under study. Fieberg et al. (2010)
provide an excellent overview of different approaches for dealing with autocorrela-
tion in resource selection inference, ranging from the subsampling approach we just
described to hybrid models containing both movement and selection components.
We agree with Fieberg et al. (2010) that newer sources of telemetry data col-
lected at fine temporal resolutions present both a challenge and opportunity for new
modeling and inference pertaining to animal movement. We return to some of these
approaches discussed by Fieberg et al. (2010) in what follows.
In terms of model-based methods to properly account for temporal autocorrelation,
we would normally turn to those approaches used in time series (e.g., Chapter 3).
That is, for temporally indexed data, y_t, we could model the observations in terms of mean effects plus temporally autocorrelated errors.
* As discussed in Chapter 3.
However, this type of linear model structure does not neatly fit into the point process
framework, nor will it play nicely with typical telemetry data. The time gaps between
telemetry fixes are almost always irregular in practice, despite intentional regularity in
the duty cycling. Also, latency in the time required to obtain a fix is a random quantity
that is difficult to control. In the collection of most time series data, we often assume
that stochasticity associated with observation time is inconsequentially small relative
to the desired inference, and thus, most fixes are mapped to a set of regular time
intervals. Missing data between fixes is still an issue however, but statistical methods
have been developed for dealing with that issue, as we will see in Chapters 5 and 6.
Fleming et al. (2015) present a generalization of the KDE isopleth approach for
estimating animal home ranges when telemetry data are autocorrelated; they use an
alternate form of bandwidth in the KDE to properly adjust for autocorrelation in
the data (Fleming et al. 2015), but their method is for home range estimation with-
out explicitly considering movement constraints or resource selection. To generalize
the point process model such that it explicitly accommodates temporal variation and
autocorrelation veers toward the broader concepts in mechanistic animal movement
modeling. Thus, we return to this in the upcoming section on spatio-temporal point
process (STPP) models.
where each individual has its own set of coefficients (β j and hence, response to
the environmental conditions), but shared error variance (σ 2 ). The individual-level
effects are then assumed to arise from a population-level distribution with mean μβ
and covariance Σβ. On average, we expect the individuals to respond to the environ-
ment like μβ, but with variation corresponding to Σβ. This concept is often referred
to as “shrinkage” because, as the diagonal elements of Σβ get small, all of the indi-
vidual sets of coefficients become more like the population-level mean μβ. Another
descriptor for this framework is a “random effects model.” Despite the ongoing debate
about the phrase “random effects” (especially in Bayesian statistics), it is often used
to describe animal movement models because at least some subset of the β_j can be
thought of as arising from a distribution with unknown parameters (i.e., μβ and Σβ).
It is important to note that this form of random effects model is more general than
what is commonly used in ecology. It is much more common to let an intercept be
the random effect and let the remaining regression parameters be the fixed effects. A
model set up this way can be written as
Notice that this simpler type of random effects model can only shrink the individual-
level intercepts (β_{0,j}) back to a population-level intercept. Therefore, it is much less
flexible than the case where all of the coefficients (i.e., β_j) are allowed to be random effects.
Regardless of how many parameters are considered as random effects, the advan-
tages of the hierarchical model in this setting are that we can obtain rigorous statistical
population-level inference by building the population mechanism into the model
directly, effectively providing more power to estimate model parameters because
we can borrow strength among individuals. The population-level inference is often
obtained by estimating the population-level mean μβ and its associated uncertainty.
For example, if one of the coefficients in μβ corresponding to a particular type
of covariate is substantially larger than zero, it would imply that the population is
responding positively to that covariate on the whole. This type of inference could
occur even if some individuals are responding negatively to the covariate.
The hierarchical model appropriately weights unbalanced data sets and allows
us to properly scale the inference to the correct level so that the individual, rather
than each observation, is the sample unit. To visualize the effect this can have on
inference, consider an alternative model where each observation is the sample unit:
y_{i,j} ∼ N(x′_{i,j}μβ, σ²). In this simplified nonhierarchical model, there are essentially
Σ_{j=1}^{J} n_j total observations to estimate q coefficients (given there are q − 1 covari-
ates). However, in the original hierarchical version of the model, even at best (i.e.,
very small σ²), there are only Jq effective observations to estimate the q population-
level coefficients. This reduction in effective sample size is a result of the goals for
inference in the study. While it might seem like a bad thing, it keeps us from being too
optimistic about population-level effects by appropriately increasing the uncertainty
associated with the estimator for μβ .
How can the random-effect concept be used for inferring population-level resource
selection? As it turns out, the population-level RSF model can easily be formulated
by indexing the selection coefficients by individual and specifying a distribution for
them. The individual-level coefficients are essentially means, like those in the simple
regression model; thus, we use the same multivariate Gaussian distribution for them
as random effects
μ_{i,j} ∼ exp(x′(μ_{i,j})β_j) / ∫ exp(x′(μ)β_j)dμ,    i = 1, . . . , n_j,

β_j ∼ N(μβ, Σβ).
In practice, the implied spatial point process model could still be fit using either
logistic or Poisson regression after properly transforming the data as described in
the earlier sections. After fitting the model, population-level inference for resource
selection can be obtained by assessing the estimate for μβ .
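The shrinkage induced by β_j ∼ N(μβ, Σβ) can be sketched in a simplified conjugate setting (Python). Assuming, for illustration only, a single coefficient per individual with known sampling variance and known population parameters, the conditional posterior mean is a precision-weighted average; this is a caricature of the MCMC-based fitting described in the text:

```python
# Caricature of random-effects shrinkage: each individual-level estimate b_j
# has a (hypothetical) known sampling variance s2_j, and the population
# parameters mu_beta and sigma2_beta are treated as known. The conditional
# posterior mean for beta_j is then a precision-weighted average.
def shrink(b_j, s2_j, mu_beta, sigma2_beta):
    w = (1.0 / s2_j) / (1.0 / s2_j + 1.0 / sigma2_beta)
    return w * b_j + (1.0 - w) * mu_beta

mu_beta, sigma2_beta = 1.0, 0.25                     # hypothetical population values
estimates = [(1.8, 0.04), (0.2, 0.50), (1.1, 0.10)]  # hypothetical (b_j, s2_j) pairs
for b_j, s2_j in estimates:
    print(round(shrink(b_j, s2_j, mu_beta, sigma2_beta), 3))
# Noisy estimates (large s2_j) are pulled strongly toward mu_beta, precise
# ones barely move, and as sigma2_beta -> 0 every coefficient collapses to mu_beta.
```

In the full hierarchical model, μβ, Σβ, and the individual coefficients are all estimated jointly, but the same precision weighting is what lets the model "borrow strength" across individuals.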
To demonstrate the benefit of using a hierarchical RSF model for population-level
inference, we simulated point processes arising from 10 individuals (Figure 4.12).
Each simulated individual in Figure 4.12 has a positive response to the expo-
sure covariate (from the previously analyzed mountain lion data), but the selection
for exposure is stronger for some individuals. Ultimately, inference is desired for
resource selection at the population level, but we analyzed the individuals separately
FIGURE 4.12 Exposure covariate (grid) and simulated telemetry data (points) for 10 indi-
viduals. The individuals are denoted by number at each home range centroid.
μ_{i,j} ∼ exp(β_{j,1} x(μ_{i,j})) / ∫ exp(β_{j,1} x(μ))dμ,    i = 1, . . . , n_j,    (4.23)
which are implemented using Bayesian Poisson GLMs, where y_{j,l} ∼ Pois(exp(β_{j,0} + β_{j,1} x_{j,l}))
for cell counts y_{j,l} at grid cells c_l (l = 1, . . . , L). Gaussian priors were specified for
the coefficients such that β_{j,k} ∼ N(0, 16) for k = 0, 1 and j = 1, . . . , J. Figure 4.13a
shows the point estimates and 95% credible intervals for each individual. As
expected, most of the selection coefficients are estimated to be positive, indicating
a preference for more exposed terrain, while a few (i.e., individuals 3 and 4) do not
appear to be significant. Do we have sufficient evidence to conclude that the simu-
lated population of individuals is positively selecting for exposure at the population
level?
FIGURE 4.13 RSF parameter estimates for β1 based on exposure as a covariate using
(a) independent point process models and (b) a hierarchical point process model with pooling
at the individual level. Posterior means for each coefficient are shown as points and 95% cred-
ible intervals are shown as vertical bars. In panel (b), the dashed horizontal lines represent the
population-level 95% credible interval and the solid horizontal line represents the population-
level posterior mean for μβ . The gray horizontal line represents zero selection and is shown
for reference only.
μ_{i,j} ∼ exp(x(μ_{i,j})β_{1,j}) / ∫ exp(x(μ)β_{1,j})dμ,

β_{1,j} ∼ N(μβ, σβ²),

μβ ∼ N(0, 100),

σβ ∼ Unif(0, 100),
data. DOP calculations are made based on the geometry of the positions of the satel-
lites and the telemetry device. Small DOP values imply high-quality measurements
and large values imply low quality. GPS measurement error distributions are often
assumed to be multivariate Gaussian, but can vary both spatially and temporally.
Let si for i = 1, . . . , n represent the measured telemetry locations, then the simplest
parametric model for the error conditioned on the true but unknown location μi is
si ∼ N(μi , σ 2 I). In this case, the error variance (σ 2 ) is assumed to be homogeneous,
but it could be generalized such that it is a function of the provided DOP information
for each measurement (i.e., σ_i² = g(DOP_i)). A simple link function relating the DOP to
the error variance is the logarithm. In this case, we might model the log of the error
standard deviation as a linear function of DOP such that log(σ_i) = α_0 + α_1 DOP_i.
When fitting this model, we expect the slope coefficient α_1 to be positive because
the error variance increases as DOP increases. The multivariate Gaussian model for error,
in this case, provides circular error isopleths, implying that there is symmetry and no
directional bias in the telemetry errors. Clearly, these assumptions may not always
hold, but the basic framework for modeling the error structure we present is capable
of being extended for more complicated situations. For example, if we expected the
errors to be greater in the longitudinal direction than the latitudinal direction, we could
replace the error covariance matrix (i.e., σ²I) with one that is still diagonal, but with
two variance components as the diagonal elements (i.e., diag(σ_1², σ_2²)). An example
of independent Gaussian errors, with covariance σ 2 I, is shown in Figure 4.14a.
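A sketch of the DOP-dependent measurement model (Python; the α coefficients and coordinates are hypothetical stand-ins for values that would be estimated from data):

```python
import math, random

def error_sd(dop, alpha0=-2.0, alpha1=0.15):
    """Log-linear link: log(sigma_i) = alpha0 + alpha1 * DOP_i.
    (alpha values are hypothetical; they would be estimated from data.)"""
    return math.exp(alpha0 + alpha1 * dop)

def observe(mu, dop, rng):
    """Simulate a fix s_i ~ N(mu_i, sigma_i^2 I): circular error isopleths."""
    s = error_sd(dop)
    return (mu[0] + rng.gauss(0.0, s), mu[1] + rng.gauss(0.0, s))

rng = random.Random(1)
mu = (451_300.0, 4_322_750.0)            # hypothetical true position (UTM-like units)
print(error_sd(2.0) < error_sd(8.0))     # larger DOP implies a larger error sd
fixes = [observe(mu, dop, rng) for dop in (1.5, 3.0, 9.0)]
```

Replacing the shared sd with separate longitudinal and latitudinal components would give the diag(σ_1², σ_2²) generalization described above.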
In contrast to GPS data, Argos telemetry data are subject to an entirely differ-
ent type of measurement error due to the polar orbiting nature of the associated
satellites. Some of the same environmental and behavioral features that affect GPS
error can also influence Argos error, but the actual mechanics of the instrumentation
often cause the largest errors. In particular, Argos telemetry errors often assume an
X-pattern due to the polar orbit of the satellites and which side of the individual they
pass on (e.g., Costa et al. 2010; Douglas et al. 2012). Fortunately, Argos provides
auxiliary information associated with the error class for each fix. For data prior to the
year 2007, Argos used categorical error classes that are ordinal, taking on the values
3, 2, 1, 0, A, B, Z, with 3 corresponding to the smallest error and Z the largest.
For recently collected Argos data (i.e., since 2007), a new algorithm has been cre-
ated for providing more detailed information about the type of error distribution (e.g.,
Boyd and Brightsmith 2013). This new algorithm allows for elliptical-shaped distri-
butions such as the multivariate Gaussian (McClintock et al. 2015).* In the absence
of further modeling, these newer techniques for processing raw Argos data can be
useful in providing a better understanding of the error associated with the observed
locations. However, newer processing methods rely on Kalman methods that imply
linear dynamics in the associated underlying movement process (Silva et al. 2014).
Thus, researchers should be careful in how they interpret Argos error information in
conjunction with ongoing modeling efforts that may or may not share similar dynamic
properties.
Given the clear X-shaped pattern in the distribution of most Argos telemetry
errors, Brost et al. (2015) and Buderman et al. (2016) suggested accounting for the
measurement distribution in a hierarchical framework that can contain any modeled
movement process one chooses. We return to these specific movement models in later
sections and chapters, but for now, we just describe a measurement model, assuming
that there is an underlying model for the true positions μi .
The method for accommodating Argos telemetry error presented by Brost et al.
(2015) and Buderman et al. (2016) allows the error to arise from a mixture of
two elliptically shaped distributions. The use of two distributions accounts for the
X-pattern that arises from the direction that the satellite passes overhead. The multivariate
Gaussian is incredibly useful for this type of model and can serve as a starting
point. In our proposed measurement model for the GPS data, we suggested using a
multivariate Gaussian that is potentially elliptical in the cardinal directions only. We
seek a more flexible specification that can account for an elliptical shape on a diag-
onal axis. Thus, if we know which side of the telemetry device the satellite passes
over, we can use a fully parameterized multivariate Gaussian measurement model:
si ∼ N(μi , Σ), where the covariance matrix Σ is completely unknown and need not
be diagonal. For example, the covariance matrix

    Σ ≡ σ2 ( 1      ρ√a
             ρ√a    a  )                                            (4.24)
is quite flexible. In this case, some combination of the three covariance parameters
can provide an appropriate amount of eccentricity and tilt for the error ellipses. This
measurement model is very similar to that used by McClintock et al. (2015), which
relies on information from Argos about the direction of tilt in the ellipse. In older data
sets, where such information is not available, we need a mixture model to account for
tilt in either direction. Thus, consider a generalization of the measurement model

    si ∼ p N(μi , Σ) + (1 − p) N(μi , HΣH′),                        (4.25)

where p represents a mixture probability* and the matrix H rotates the first distribution
to provide an X-shape to the overall mixture distribution. The rotation can be
achieved by specifying H as

    H ≡ ( 1    0
          0   −1 ) .                                                (4.26)
Figure 4.14b shows telemetry position errors (i.e., si − μi , i = 1, . . . , n) associated
with the Gaussian mixture model (4.25).
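This mixture error model can be simulated directly from a latent-indicator representation in which a Bernoulli draw selects the component. In the sketch below, all parameter values (σ2, a, ρ, and p) are arbitrary choices made to produce a visible X-pattern; the covariance follows Equation 4.24 and the reflection matrix follows Equation 4.26.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical parameter values chosen to give a visible X-pattern
sigma2, a, rho, p = 1.0, 4.0, 0.9, 0.5

# Fully parameterized covariance matrix (Equation 4.24)
Sigma = sigma2 * np.array([[1.0, rho * np.sqrt(a)],
                           [rho * np.sqrt(a), a]])
H = np.array([[1.0, 0.0], [0.0, -1.0]])        # reflection matrix (Equation 4.26)
Sigma_rot = H @ Sigma @ H.T                    # the ellipse tilted the other way

# Latent indicator z_i ~ Bern(p) switches between the two tilted ellipses
n = 20000
z = rng.random(n) < p
errors = np.where(z[:, None],
                  rng.multivariate_normal(np.zeros(2), Sigma, size=n),
                  rng.multivariate_normal(np.zeros(2), Sigma_rot, size=n))

# The two components have correlations of opposite sign, giving the X-shape
corr1 = np.corrcoef(errors[z].T)[0, 1]
corr0 = np.corrcoef(errors[~z].T)[0, 1]
```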
Mixture models can be represented in many ways. The model presented in Equa-
tion 4.25 is one of the most common forms for mixture models, but there can be value
in using a hierarchical structure with auxiliary variables to specify the mixture model.
For example, Buderman et al. (2016) used the form

    si ∼ { N(μi , Σ)       if zi = 1
         { N(μi , HΣH′)    if zi = 0 ,                              (4.27)
where the latent binary process is modeled as zi ∼ Bern(p) and acts like a switch,
turning on and off each distribution as needed. Perhaps surprisingly, this new mix-
ture specification (4.27) yields exactly the same inference as the previous one (4.25)
and has other benefits in terms of implementation. In the simple RSF context we
have described in this chapter, a fully specified model then accounts for Argos
telemetry error by combining the mixture measurement model for si with the RSF
point process model for the underlying true positions μi .
A second form of structure arises in the support for the point process, that is, the
spatial domain where the points are restricted to occur. Brost et al. (2015) demon-
strates the effect of barriers to movement on the ability to estimate the true underlying
point process and resource selection. In the case of a marine species, the shoreline can
serve as an adequate boundary and allow the model to separate measurement error
from process-based variation (Brost et al. 2015). Finally, natural temporal autocor-
relation in the process can also provide enough structure in some cases to separate
measurement error from process-based variation (e.g., Brost et al. 2015). This con-
cept is fundamental to the dynamic movement models we describe in the next section
and later chapters.
Regardless of the type of telemetry device used, it is important to understand the
potential influence of measurement error on the desired inference as well as how to
properly account for it. The power of model-based approaches for animal movement
inference is that one can generalize the model structures as needed to accommodate
intricacies of the data and type of movement behavior.
we can generalize it for situations where the time steps between telemetry observa-
tions are small. When the time steps are small, we would expect to see a movement
signal in the data themselves. Such a signal arises from the physical limitations of the
movement process. That is, there is some reasonable finite upper bound to the distance
an animal can travel, or is willing to travel, in a fixed amount of time. Heuristically,
constraints provide smoothness to the individual’s path based on its true positions
at each time. Conditioning on the position at the previous time step (μi−1 ), we can
envision a spatial map corresponding to the probability that the animal will occur at the
next time in the absence of other environmental information. For example, the maps in
Figure 4.15 indicate that locations near the previous position (μi−1 ) would be more
likely to host the next position (μi ). As the distance increases from the previous posi-
tion, we would be less likely to find the next position. The position labeled μi in
Figure 4.15 is more likely under the availability in panel (b) than panel (a). Furthermore,
as the time between positions (Δi ) increases, we would expect the map to be
flatter, indicating the animal could be farther away. With increasing Δi , we would
expect a completely flat surface over the support of the point process (M) such that
jargon from the point process literature). The surface we are describing corresponds to
the availability surface (f (μi , θ)) for each particular time ti and will change over time
depending on μi−1 and i . Moorcroft and Barnett (2008) refer to this time-varying
availability distribution as a “redistribution kernel.”
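A rough sketch of a redistribution kernel is below. It evaluates a Gaussian availability surface whose spread grows with the time gap Δ, a Brownian-motion-like assumption made purely for illustration; the functional form, grid, and values of σ2 and Δ are not taken from any particular cited model.

```python
import numpy as np

def availability(mu, mu_prev, delta, sigma2=1.0):
    """Gaussian redistribution kernel: availability of position mu given the
    previous position, with spread growing linearly in the time gap delta."""
    d2 = np.sum((mu - mu_prev) ** 2, axis=-1)
    var = sigma2 * delta
    return np.exp(-d2 / (2.0 * var)) / (2.0 * np.pi * var)

# Evaluate the kernel on a grid around the previous position mu_{i-1}
xy = np.linspace(-5.0, 5.0, 101)
grid = np.stack(np.meshgrid(xy, xy), axis=-1)          # (101, 101, 2)
mu_prev = np.zeros(2)

f_short = availability(grid, mu_prev, delta=0.5)       # short time gap
f_long = availability(grid, mu_prev, delta=5.0)        # long time gap
# Longer gaps flatten the surface: lower peak, more mass far from mu_prev
```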
132 Animal Movement
FIGURE 4.15 Examples of two different availability functions f (μi , θ ) (shaded surface with
darker corresponding to greater availability). (a) Less diffuse availability and (b) more diffuse
availability. Two consecutive positions (i.e., μi−1 and μi ) are shown for reference.
To translate the concept of time-varying availability into the point process model
itself, we need to allow for dependence in the availability distribution such that

    [μi |μi−1 , Δi , θ] = g(μi ) f (μi |μi−1 , Δi , θ) / ∫_M g(μ) f (μ|μi−1 , Δi , θ) dμ .    (4.30)

The new model in Equation 4.30 has the same basic form as the original point process
model in Equation 4.6, but contains an explicit dependence in time through the
availability function f (μi |μi−1 , Δi , θ). Christ et al. (2008) and Johnson et al. (2008b)
presented this STPP model as part of a general framework for accounting for both ani-
mal movement and resource selection simultaneously. Later, Forester et al. (2009),
Potts et al. (2014a), and Brost et al. (2015) used similar approaches to model teleme-
try data from elk (Cervus canadensis), caribou (Rangifer tarandus), and harbor seals
(Phoca vitulina), respectively.
* We switched to using μ for a spatial location instead of s as was used in Chapters 2 and 3 for point
process description.
Point Process Models 133
    λ(μ, t|Ht ) = lim_{|B|→0} E( n(B ) | Ht ) / |B | ,

where B is a small space–time cube centered at (μ, t), n(B ) is the number of events in B , and |B | is the volume of the cube. If
λ(μ, t|Ht ) = λ(μ, t), that is, it does not depend on the history up to time t, then it
is a spatio-temporal Poisson process with the properties given in Chapters 2 and 3
(with respect to spatial and temporal Poisson processes). Following the derivations
from each of the two previously discussed processes,* we arrive at the likelihood for
the STPP as

    ∏_{i=1}^{n} λ(μi , ti |Hti , θ) exp( − ∫_T ∫_M λ(μ, t|Ht , θ) dμ dt ) ,    (4.32)

where (μi , ti ) are the locations and times of observed events, M is the spatial study
area, and T is the time window of the study.
Notice that the model in Equation 4.32 does not look like Equation 4.30 yet. Thus,
we investigate further, providing more details and one additional result. First, the
intensity function is usually decomposed as

    λ(μ, t|Ht , θ) = g(μ) f (μ, t|Ht , θ) h(t|θ) .

Conditioning on the observed event times yields

    [μ1 , . . . , μn |t1 , . . . , tn , θ] = ∏_{i=1}^{n} λ(μi , ti |Hti , θ) / ∫_M λ(u, ti |Hti , θ) du ,    (4.34)

which, after substituting the decomposition above, becomes

    [μ1 , . . . , μn |t1 , . . . , tn , θ] = ∏_{i=1}^{n} g(μi ) f (μi , ti |Hti , θ) / ∫_M g(μ) f (μ, ti |Hti , θ) dμ ,    (4.35)
where the temporal baseline intensity h(t|θ) does not appear because it cancels in
the numerator and denominator. If the intensity changes depending only on the last
* The derivations to arrive at the STPP likelihood are similar to what was presented in Chapters 2 and 3;
thus, we omit it here.
observed event location and time interval since the last event, the resulting conditional
distribution of event locations is
    [μ1 , . . . , μn |t1 , . . . , tn , θ] = ∏_{i=1}^{n} g(μi ) f (μi |μi−1 , Δi , θ) / ∫_M g(μ) f (μ|μi−1 , Δi , θ) dμ
                                          = ∏_{i=1}^{n} [μi |μi−1 , Δi , θ] ,    (4.36)
and we arrive at the full likelihood for the model given by the transition distributions
in Equation 4.30. In the references provided at the beginning of the section (i.e., Christ et al.
2008; Johnson et al. 2008b; Forester et al. 2009; Potts et al. 2014a; Brost et al. 2015),
the conditional STPP model was developed under the weighted distribution paradigm
(i.e., using expressions like Equation 4.29; Patil and Rao 1977). Those papers devel-
oped weighted distributions by specifying movement models for μi given μi−1 and
weighting the spatial distribution by the spatial effects in g(μ). The integral in the
denominator of Equation 4.36 arises from the need to normalize the PDF so that it
integrates to one over the spatial domain M. Johnson et al. (2013) arrived at the same
result using STPP concepts directly.
    f (μi |μi−1 , θ) ∝ 1{||μi − μi−1 || ≤ r} ,

where ||μi − μi−1 || is the Euclidean distance between the two positions (μi and μi−1 )
and r is the radius of the circular availability area. This early work led to a suite of
similar methods known as “step selection functions” (Boyce et al. 2003; Fortin et al.
2005; Potts et al. 2014a; Avgar et al. 2016). The classical step selection function
approach defines the availability circle using the empirical step lengths associated
with the telemetry data. A background sample of availability locations is selected
within the associated circle for each telemetry observation. Then a conditional logis-
tic regression approach is used to associate the covariates at the background sample
locations with each telemetry location. Similar methods were developed for use in
medical statistics to account for variation in patients that have similar backgrounds
to control for potentially confounding factors in life history (Rahman et al. 2003).
Fortin et al. (2005) claimed that the remaining temporal dependence in these mod-
els will not affect inference on selection coefficients; however, it has been shown
that there are exceptions (e.g., Fieberg and Ditmer 2012; Hooten et al. 2013b). For
example, when the covariates influencing selection are smoothly varying, there is an
increased risk of temporal confounding.
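The classical step selection workflow described above (a background sample drawn within the availability circle of radius r, followed by conditional logistic regression) can be sketched as follows. The covariate, sample sizes, and coefficient values are hypothetical, and the conditional logistic log-likelihood is written out directly rather than calling a packaged routine.

```python
import numpy as np

rng = np.random.default_rng(7)

def sample_availability(center, r, n_bg, rng):
    """Uniform background sample inside the availability circle of radius r."""
    theta = rng.uniform(0.0, 2.0 * np.pi, n_bg)
    rad = r * np.sqrt(rng.uniform(0.0, 1.0, n_bg))     # sqrt -> uniform in area
    return center + np.column_stack((rad * np.cos(theta), rad * np.sin(theta)))

def conditional_logistic_loglik(beta, x_used, x_bg):
    """Stratified (conditional logistic) log-likelihood: each used location is
    compared only against the background sample from its own circle."""
    ll = 0.0
    for xu, xb in zip(x_used, x_bg):
        scores = beta * np.concatenate(([xu], xb))     # single covariate
        ll += scores[0] - np.log(np.sum(np.exp(scores)))
    return ll

def covariate(pts):
    # Hypothetical covariate: distance from the origin
    return np.linalg.norm(pts, axis=-1)

used = rng.normal(0.0, 0.5, size=(50, 2))              # fixes near the origin
bg = [sample_availability(u, r=2.0, n_bg=20, rng=rng) for u in used]
x_used = covariate(used)
x_bg = [covariate(b) for b in bg]

# Selection against large distances (beta < 0) should fit better than no effect
ll_neg = conditional_logistic_loglik(-1.0, x_used, x_bg)
ll_zero = conditional_logistic_loglik(0.0, x_used, x_bg)
```

In practice this likelihood would be maximized over β; the two fixed evaluations above just illustrate that the stratified comparison carries information about selection.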
An alternative availability model, in which the availability range is estimated simultaneously
with the other parameters, was proposed by Christ et al. (2008) and
generalized to unevenly timed locations by Johnson et al. (2008b):

    f (μi |μi−1 , Δi , θ) = N(μ̃i , Qi ) ,

such that μ̃i = μ̄ + Bi (μi−1 − μ̄) and μ̄ is a central place of attraction. The components
controlling the dispersion of the availability distribution are Bi ≡ exp(−(ti −
ti−1 )/φ)I and Qi = Q − Bi QB′i . Johnson et al. (2008b) arrived at this specific
form for availability because they were assuming a stochastic process for animal
movement called the Ornstein–Uhlenbeck (OU) model (e.g., Dunn and Gipson 1977;
Blackwell 1997). The parameter φ controls the range of availability as the r param-
eter does in the “step selection” models. However, in the OU model, the availability
limit is soft, meaning the availability function never drops all the way to zero for
any distance from the current location, but the function decreases and approaches
zero for very large distances. The early step selection models had a hard availability
limit (i.e., there is no availability of locations for distances larger than r). Addition-
ally, the OU-based model allows for a central attraction point (or multiple attraction
points, e.g., Johnson et al. 2008b). We provide additional details of OU processes
for modeling animal movement in Chapter 6. Similar to Johnson et al. (2008b),
Moorcroft and Barnett (2008) also described a unification of resource selection mod-
els and what they call “mechanistic home range” models. The mechanistic home
range models essentially model the movement process in terms of partial differen-
tial equations (Moorcroft et al. 2006). Moorcroft and Barnett (2008) also point out
that the model in Equation 4.30 rigorously accommodates autocorrelation if it exists.
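A minimal sketch of the OU-based availability distribution is below, computing the conditional mean μ̃i and covariance Qi for short and long time gaps. The central place μ̄, stationary covariance Q, range parameter φ, and all numeric values are assumptions for illustration.

```python
import numpy as np

def ou_availability_params(mu_prev, mu_bar, delta, phi, Q):
    """Mean and covariance of the OU-based availability distribution:
    mu_i | mu_{i-1} ~ N(mu_tilde, Q_i) with B_i = exp(-delta/phi) I."""
    B = np.exp(-delta / phi) * np.eye(2)      # shrinkage toward mu_bar
    mu_tilde = mu_bar + B @ (mu_prev - mu_bar)
    Q_i = Q - B @ Q @ B.T                     # dispersion grows with delta
    return mu_tilde, Q_i

mu_bar = np.array([0.0, 0.0])                 # central place of attraction
mu_prev = np.array([3.0, -1.0])
Q = np.eye(2)                                 # stationary covariance (assumed)
phi = 2.0

m_short, Q_short = ou_availability_params(mu_prev, mu_bar, 0.1, phi, Q)
m_long, Q_long = ou_availability_params(mu_prev, mu_bar, 10.0, phi, Q)

# Short gaps: mean stays near mu_prev with small dispersion ("soft" limit).
# Long gaps: mean shrinks toward mu_bar and dispersion approaches Q.
```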
Potts et al. (2014a) discussed the same framework presented by Johnson et al.
(2008b), but referred to Equation 4.30 as the “master equation.” Potts et al. (2014a)
parameterized the time-varying availability function f (μi |μi−1 , θ) in terms of bearing
θ so that μi and μi−1 are related by
    μi = μi−1 + ( cos(θ + π) , sin(θ + π) )′ √( (μi − μi−1 )′(μi − μi−1 ) ) ,        (4.39)

where √( (μi − μi−1 )′(μi − μi−1 ) ) is the Euclidean distance between μi and μi−1 .
Additionally, Potts et al. (2014a) were interested in discrete habitat types, and thus,
they modified the traditional RSF g(x(μi ), β) to be, for example, the proportion of the
line segment from μi−1 to μi that lies in habitat x. Potts et al. (2014a) ultimately decomposed the
availability function into a finite sum of habitat-specific components. The habitat-
specific components involved a product of turning angle and step length distributions
(e.g., Weibull and von Mises distributions). Rather than maximize the likelihood
based on Equation 4.30 directly, Potts et al. (2014a) used an approximate condi-
tional logistic regression procedure similar to that described in Section 4.2 on RSFs
to estimate parameters.
Most (if not all) STPP analysis of telemetry data assumes that the locations are
observed without error. If the locations are observed with a significant amount of
error, then that must be taken into account. We present an example analysis of the har-
bor seal telemetry data found in Brost et al. (2015) that uses a hierarchical framework
to accommodate complicated telemetry error distributions.
Rather than rely on a specific stochastic process as a model for animal movement,
Brost et al. (2015) specified an availability distribution directly based on a particular
form of smoothness:

    f (μi |μi−1 , θ) ∝ exp( − ||μi − μi−1 || / (Δi φ) ) ,                  (4.40)

where ||μi − μi−1 || is a distance measure between true positions μi and μi−1 , Δi
is the elapsed time between positions, and φ acts as a smoothing parameter. The
availability distribution in Equation 4.40 is very similar to an exponential model for
correlation in a spatial covariance matrix (Chapter 2). In their analysis of harbor seals,
Brost et al. (2015) considered the shortest water distance as the distance metric in the
availability function. This distance metric allowed them to appropriately accommo-
date the shoreline as a hard constraint for movement of harbor seals. While increasing
the realism and utility of the model, formally accounting for such a constraint adds
a nontrivial amount of complexity to the model implementation. It is worth noting
that, although the exponential function was used in the study of harbor seals, many
other functional forms are reasonable. Forester et al. (2009) describe several different
functional forms and state that exponential family functions are preferable.
Following Brost et al. (2015), we analyzed Argos telemetry data arising from an
individual seal in the Gulf of Alaska. The telemetry data in our example (Figure 4.16)
occur at irregular temporal intervals, ranging from minutes to hours, with the majority
of observations occurring less than 2 h apart. The telemetry data are composed of a
range of error classifications with the majority of data in the lower-quality Argos error
categories (e.g., 0, A, and B classes), which is why many observed positions occur
on land, far from water (Figure 4.16).
We specified a hierarchical STPP model for the harbor seal telemetry data such
that

    si ∼ p · t(μi , Σi , νi ) + (1 − p) · t(μi , HΣi H′ , νi ) ,

where the t-distribution allows for extreme telemetry observations and has heavier
tails than the Gaussian distribution as the degrees of freedom parameter νi decreases
FIGURE 4.16 Argos telemetry data (si , for i = 1, . . . , n; shown as points) for an individual
harbor seal and two different environmental covariates (X) influencing harbor seal movement:
(a) distance from known haul out (i.e., distance from position shown with a dark triangle in the
left of each panel) and (b) bathymetry (i.e., ocean depth). Both covariates were standardized
and are shown with darker shading as the values of the covariate increase.
and p = 0.5. The measurement scale matrix Σi was specified as in Equation 4.24 and
data model parameters σi2 , ai , ρi , and νi assume one of six distributions depending on
which error class was recorded for that telemetry observation.* We assumed uniform
priors on ecologically reasonable ranges of support for the standard deviation σi as
well as ρi and νi . The time-varying availability function f (μi |μi−1 , θ) in the hierar-
chical model was specified as in Equation 4.40, where the distance metric was the
shortest water distance between μi and μi−1 .
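A simulation sketch of the t-mixture measurement model is below, drawing multivariate t errors via the standard normal/chi-square representation. All parameter values here are hypothetical stand-ins for the error-class-specific quantities estimated in the actual analysis, and the Euclidean setup ignores the shortest-water-distance constraint.

```python
import numpy as np

rng = np.random.default_rng(3)

def rmvt(loc, scale, df, size, rng):
    """Multivariate t draws via the normal / chi-square mixture representation."""
    z = rng.multivariate_normal(np.zeros(2), scale, size=size)
    w = rng.chisquare(df, size=size) / df
    return loc + z / np.sqrt(w)[:, None]

# Hypothetical stand-ins for the error-class-specific parameters
sigma2, a, rho, nu, p = 1.0, 3.0, 0.8, 4.0, 0.5
Sigma = sigma2 * np.array([[1.0, rho * np.sqrt(a)],
                           [rho * np.sqrt(a), a]])
H = np.array([[1.0, 0.0], [0.0, -1.0]])

mu = np.array([10.0, 20.0])                  # true position at time t_i
n = 5000
z = rng.random(n) < p                        # mixture indicator
s = np.where(z[:, None],
             rmvt(mu, Sigma, nu, n, rng),
             rmvt(mu, H @ Sigma @ H.T, nu, n, rng))

# Heavy t-tails: a few observed positions land far from the true position
max_err = float(np.abs(s - mu).max())
```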
Fitting the hierarchical STPP model of Brost et al. (2015) to the harbor seal teleme-
try data has additional benefits. For example, because the individual is constrained to
be in the water and adjacent shorelines only, erroneous telemetry observations over
land will naturally be constrained to occur in the correct support (i.e., the water).
Furthermore, the constraint itself actually aids in the estimation of measurement
error–specific parameters (i.e., σi2 , ai , ρi , and νi ) because the model knows that posi-
tions on land are incorrect. To summarize the estimated true individual positions μi ,
we calculated the posterior mean UD for all positions E({μi , ∀i}|{si , ∀i}) for the entire
support considered in the study area (Figure 4.17). The selection coefficients associ-
ated with the distance to haul out and bathymetry covariates were both estimated to
be negative. Therefore, after controlling for potential autocorrelation due to tempo-
ral proximity of telemetry fixes, complicated Argos measurement error, and barriers
to movement, the data suggest that this individual harbor seal selects for aquatic
environments nearer the haul out and in shallower water. These findings agree with
the central place foraging behavior of harbor seals in the North Pacific Ocean.
* If telemetry observation si is measured with error class ci = 2, then the variance parameter for observa-
tion i assumes the variance for error class 2: σi2 = σc2i =2 . The other parameters are defined similarly.
Priors for the parameters of different error classes can be specified such that they contain differing
information about the precision of the measurement at that time.
FIGURE 4.17 Argos telemetry data (si , for i = 1, . . . , n; shown as points) for an individual
harbor seal and the estimated posterior mean UD (utilization distribution),
E({μi , ∀i}|{si , ∀i}), based on true underlying positions μi and known covariates X.
For a spatio-temporal analysis of brown bear telemetry data (Figure 4.18), the intensity function can be specified as

    λ(μi , ti |Hti ) = exp( x′(μi )β + η(μi , θ) − α d(μi , μi−1 )2 /Δi ) ,          (4.41)

where x′(μi ) is a vector containing a 1 (for the intercept) and the stream distance
covariate at μi , η(μi , θ) is a thin-plate regression spline in 2-D (df = 25, Wood
2003), d(μi , μi−1 ) is the Euclidean distance from μi−1 to μi , and Δi ≡ ti − ti−1 .
FIGURE 4.18 Spatio-temporal point process of brown bear locations. Plot (a) shows the
locations on top of the distance to the nearest stream, the fitted “home range” density function
is shown in plot (b), (c) illustrates the fitted selection surface, and (d) shows the fitted density
modeling the availability function for μ101 (white dot).
To examine how this relates to the other models in this section, we factor the
intensity as
    λ(μi , ti |Hti ) = g1 (μi ) g2 (μi ) f (μi |μi−1 , Δi ) ,                    (4.42)
where
    g2 (μi ) = exp( η(μi , θ) )                    (4.43)
might be considered broad-scale selection within the study area, perhaps a home
range,
    g1 (μi ) = exp( x′(μi )β )                    (4.44)
is small-scale selection within the home range relative to the stream distance covari-
ate, and finally,
    f (μi |μi−1 , Δi ) = exp( −α d(μi , μi−1 )2 /Δi )                    (4.45)
is the temporally dependent redistribution kernel. The dynamic availability in Equa-
tion 4.45 is inspired by the transition kernel of a Brownian motion movement model,
but, when combined with g2 , the total movement is similar to an OU model in that it
has a region of attraction, although not a central point.
Despite the added complexity, there are benefits of the full STPP analysis over the
conditional analysis where location times are considered fixed. The full likelihood
in Equation 4.32 appears considerably more difficult to evaluate than the conditional
version in Equation 4.36. The baseline temporal intensity of the locations is constant;
thus, there does not even seem to be any inferential benefit to be gained in that respect
either. The real benefit lies in the likelihood computation. In the conditional likelihood
(4.36), the 2-D spatial integral must be computed n times. However, in the full like-
lihood, only one three-dimensional (3-D) integral is necessary. Johnson et al. (2013)
show that the approximation methods used for spatial and temporal likelihood can be
extended to the spatio-temporal version. To do so, we augment the observed locations
and times with a grid of quadrature locations in space and time, qijl , l = 1, . . . , Lij at
times uij , j = 0, . . . , Ji , where ti−1 = ui0 and ti = uiJi . In addition, we denote aijl to be
the area of the cell associated with qijl . The area aijl depends on how the points were
selected. If the points were selected in a nonregular manner, a Voronoi tessellation
can be used to obtain the areas, whereas if the points were selected as centroids of a
regular grid, then the area of the grid cell is used. However, even if a regular grid is
used, recall that the observed locations are part of the quadrature set; μi = qiJi l for one of
the l, say l = LiJi , to be compatible with the j index. Therefore, the quadrature points
are never on a completely regular grid, so some adjustment has to be made to assign
area mass to the observed locations. The easiest method to handle this situation is to
count the number of observed locations in a cell and divide the area of the cell by this
count plus one (for the grid centroid) and assign the partial areas to all points in the
cell. Now, the log-likelihood can be approximated by

    ℓ(β, θ, α) ≈ ∑_{i=1}^{n} ∑_{j=1}^{Ji} ∑_{l=1}^{Lij} ( zijl log(λijl ) − λijl ) ,          (4.46)

where

    λijl = vijl exp( x′(qijl )β + η(qijl , θ) − α d(qijl , μi−1 )2 /Δij )          (4.47)

and vijl = aijl (uij − ui,j−1 ), Δij = uij − ti−1 , and zijl = 1 for j = Ji and l = LiJi and
zero elsewhere. As in Chapters 2 and 3, the zijl can be thought of as independent
Poisson variables and we can fit the model with any GLM fitting software using
log(vijl ) as an offset. Fitting a single model may not be much faster using the full like-
lihood versus the conditional likelihood; however, after the “model data” have been
created, that is, zijl , qijl , x(qijl ), and Bijl = d(qijl , μi−1 )2 /Δij , any number of other
submodels or alternate models that use the quantities can be fit using the optimized
GLM algorithms in most statistical software. Thus, a full analysis, including model
selection or multimodel inference, can proceed quickly after the data are created.
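The Poisson GLM device described above can be illustrated on synthetic "model data." The sketch below fits a minimal Poisson regression with a log(vijl) offset by Newton-Raphson rather than calling packaged GLM software; the covariates, cell volumes, and coefficient values are all made up for the example.

```python
import numpy as np

rng = np.random.default_rng(11)

def fit_poisson_glm(X, z, offset, n_iter=50):
    """Minimal Poisson GLM with log link and an offset, fit by Newton-Raphson,
    mimicking the use of standard GLM software with log(v_ijl) as the offset."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        lam = np.exp(X @ beta + offset)
        grad = X.T @ (z - lam)
        hess = X.T @ (X * lam[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

# Synthetic "model data": covariate x and Brownian term B = d^2 / Delta at
# each quadrature point, with space-time cell volumes v (all hypothetical)
n_pts = 4000
x = rng.normal(size=n_pts)
B = rng.uniform(0.1, 2.0, size=n_pts)
v = rng.uniform(0.5, 1.5, size=n_pts)
beta_true, alpha_true = 0.8, -1.2

lam_true = v * np.exp(beta_true * x + alpha_true * B)
z = rng.poisson(lam_true)          # the z_ijl treated as independent Poissons

X = np.column_stack((x, B))
beta_hat = fit_poisson_glm(X, z, offset=np.log(v))
# beta_hat should recover (beta_true, alpha_true) approximately
```

Once the design matrix, response, and offset are constructed, refitting alternative submodels amounts to swapping columns of X, which is the computational advantage noted in the text.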
We fit the full STPP model to the brown bear telemetry data using the R package
“mgcv” (Wood 2003) to implement the thin-plate spline (Figure 4.18). The larger-
scale home range surface, g2 (μ) in Figure 4.18b, shows the bimodal surface found by
Johnson et al. (2008b) when using an OU movement model and two centers of attrac-
tion. The difference between this analysis and that described by Johnson et al. (2008b)
is that we did not have to specify the number of points of attraction or the switching
time. The small-scale resource selection surface, g1 (μ), is shown in Figure 4.18c,
where one can see that the bear selects for habitat in close proximity to streams with
coefficient estimate β̂ = −2.41 and 95% confidence interval (−2.71, −2.11) for the
distance from the nearest stream covariate. Finally, the OU-like transition kernel,
g2 (μi ) f (μi |μi−1 , Δi ), is shown in Figure 4.18d for i = 101. Notice that the mass of
the transition density is centered on the current location (white point) and decreases
to zero as the distance from the current location increases.
    λ(μ) = ∫_T λ(μ, t|Ht ) dt = exp( x′(μ)β + η(μ, θ) ) ∑_{i=1}^{n} ∫_{ti−1}^{ti} exp( −α d(μ, μi−1 )2 /(t − ti−1 ) ) dt ,    (4.48)
where we assume that α > 0. The integral on the right-hand side of Equation 4.48
does not exist in a closed form, but symbolically,

    γi (μ) = ∫_0^{Δi} exp( −α d(μ, μi−1 )2 /u ) du ,                    (4.49)

which can be expressed in terms of the incomplete gamma function Γ(·, ·). Although Γ(·, ·) is not available
in closed form, numerical solutions are available in most statistical software. It
is hardly apparent what the γi (μ) function looks like in geographic space, but as
Johnson et al. (2013) noted, it is similar in shape to a bivariate normal density centered
on μi−1 (Figure 4.19). Substituting Equation 4.49 into Equation 4.48, we obtain
the spatial intensity

    λ(μ) = exp( x′(μ)β + η(μ, θ) ) ∑_{i=1}^{n} γi (μ) .                    (4.50)
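The γi(μ) integral in Equation 4.49 is easy to evaluate numerically. The sketch below uses a simple trapezoid rule with assumed values of α and Δi; the grid size is an arbitrary choice.

```python
import numpy as np

def gamma_i(d2, alpha, delta, n_grid=5001):
    """Numerically evaluate gamma_i(mu) = int_0^Delta exp(-alpha d^2 / u) du
    by the trapezoid rule, where d2 is the squared distance ||mu - mu_{i-1}||^2
    (the integrand vanishes as u -> 0+ whenever d2 > 0)."""
    u = np.linspace(1e-9, delta, n_grid)
    f = np.exp(-alpha * d2 / u)
    return float(np.sum((f[1:] + f[:-1]) * np.diff(u)) / 2.0)

alpha, delta = 1.0, 1.0                    # hypothetical values
d2_grid = np.linspace(0.0, 2.0, 21)
g = np.array([gamma_i(d2, alpha, delta) for d2 in d2_grid])
# gamma_i equals delta at d2 = 0 and decays smoothly with squared distance,
# like a kernel centered near the previous location (cf. Figure 4.19)
```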
FIGURE 4.19 Illustration of γi (μ). The plot on the left depicts a 1-D view, where γi (μ) is
a function of squared distance between μ and μi−1 . The plot on the right shows the full 3-D view of the
γi (μ) function illustrating that it assumes the form of a kernel similar to a bivariate normal
kernel.
the likelihood. First, note that there is a movement-related parameter, α, that is part
of the γi (μ) calculation. Thus, if one uses GLM fitting routines in an efficient man-
ner, α must be known and fixed at α̂ so that we can calculate one kernel density
map, û(μ), which may be used as a covariate in the GLM model. However, this
assumes that the Brownian kernel is correct and uncertainty about α is ignored. Tech-
nically, u(μ) is a random spatial field that controls interactions between observed
locations, much like the Gibbs spatial point process models of Section 2.1.3, for which
model fitting is notoriously difficult. Illian et al. (2012) suggested a log-Gaussian Cox
process (Section 2.1.3) approximation, which can be fit numerically using readily
available software. The basic premise of the Illian et al. (2012) approach is to cre-
ate a constructed covariate that captures interaction effects, then add the covariate to
a random effects version of the Poisson GLM representation of the Poisson spatial
point process. Thus, for the spatial marginalization model in Equation 4.50, this can
be accomplished by fixing α at α̂, computing the constructed covariate û(μ), and then
adding it as a smooth term to the Poisson GLM representation of the point process.
Instead of GAM smoothing, Johnson et al. (2013) and Illian et al. (2012) used
ICAR models (Section 2.3.2), which provide an acceptable alternative in a Bayesian
framework.
To demonstrate the spatial marginalization of STPP models, we reanalyzed the
bear data presented in the last section. In their example, Johnson et al. (2013) chose a
value for α based on a commonly held belief about the maximum speed of travel for
northern fur seals (Callorhinus ursinus). We take an empirical approach by setting α̂
equal to 1/mean(observed velocity)2 , because, for Brownian motion, the expected
displacement in one unit of time is approximately 1/√α. After selecting α̂, we created
a heterogeneous kernel UD using Equation 4.49; the kernel is shown in Figure 4.20a.
The remaining effects in the model were as described in the previous section and
the R package “mgcv” was used to fit the model. The estimated resource selec-
tion coefficient for stream distance was β̂ = −1.78, with 95% confidence interval
(−2.12, −1.44) (fitted selection surface shown in Figure 4.20b). What might be
termed the “availability” surface, η1 (μ, θ 1 ) + η2 (û(μ), θ 2 ) (Figure 4.20c), accounts
for all the other influences beyond resource selection, that is, temporal autocorrelation
and home range effects. The availability surface functions as a trade-off of
FIGURE 4.20 Spatial point process model fit to spatio-temporal brown bear telemetry data
using the temporal marginalization approximation. Plot (a) illustrates the observed data and
the log û(μ) surface. The fitted resource selection surface for the stream distance covariate
is shown in (b). Plot (c) illustrates the fitted availability surface; η1 (μ, θ 1 ) + η2 (û(μ), θ 2 ).
Plots (b) and (c) partition space utilization into the components attributable to known covariates
and those components that cannot be assigned to a specific habitat trait.
* We use the t subscript here instead of i for simplicity and consistency with the time series notation. Also,
we can always just linearly rescale the entire temporal extent so that t is with respect to the units of
interest. For example, in that case, if t is an hour, then the +1 and −1 correspond to the hour after and
before.
other locations but only through its nearest neighbors in time. That is, if the random
walk is of order 1 (e.g., an AR(1) time series model), we can write
μt = μt−1 + ε t , (5.1)
for t = 1, . . . , T, where the errors are often assumed to be independent and normally
distributed such that εt ∼ N(0, Σ). In the simplest case, the error covariance
matrix could be specified as Σ ≡ σ 2 I, so that the errors are symmetric. In time
series statistics, this model is often referred to as a vector autoregressive model (i.e.,
VAR(1); because μt is multidimensional) of order one. Recall, from Chapter 3, that
an alternative way to write the random walk model is using distribution notation such
that μt ∼ N(μt−1 , σ 2 I). The distribution notation is a theme throughout this book
and can be helpful when formulating hierarchical models, especially in a Bayesian
framework.
In terms of mechanisms, the VAR(1) model implies that the displacement of the
individual during each time step occurs in a random direction with step length governed
by a Rayleigh distribution (a Weibull distribution with shape parameter 2). In this case, the variance component σ 2
controls the step lengths between successive locations. For example, Figure 5.1 shows
both the empirical and theoretical distributions (histogram based on T = 10,000 time
steps and σ 2 = 0.5, 1, 2) of the step lengths resulting from three simulated 2-D tra-
jectories using Equation 5.1. Notice how both the central tendency and spread in step
length distribution increase as the random walk variance parameter (σ 2 ) increases
(Figure 5.1).
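The Rayleigh/Weibull step-length result can be verified with a quick simulation. The sketch below uses Python with only the standard library (the examples in this chapter are fit in R; all names here are illustrative):

```python
import math
import random

random.seed(1)
sigma2 = 1.0                  # random walk variance (per coordinate)
sigma = math.sqrt(sigma2)
T = 10_000

# Simulate the 2-D random walk mu_t = mu_{t-1} + eps_t, eps_t ~ N(0, sigma^2 I).
mu = [(0.0, 0.0)]
for _ in range(T):
    x, y = mu[-1]
    mu.append((x + random.gauss(0, sigma), y + random.gauss(0, sigma)))

# Step lengths |mu_t - mu_{t-1}| are Rayleigh(sigma), i.e., Weibull with
# shape 2 and scale sqrt(2 sigma^2); the theoretical mean is sigma*sqrt(pi/2).
steps = [math.hypot(mu[t][0] - mu[t - 1][0], mu[t][1] - mu[t - 1][1])
         for t in range(1, T + 1)]
mean_step = sum(steps) / len(steps)
print(round(mean_step, 2), round(sigma * math.sqrt(math.pi / 2), 2))
```

The empirical mean step length should be close to σ√(π/2), the mean of a Rayleigh(σ) distribution.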
The formulation in Equation 5.1 is often referred to as an “intrinsic” conditional
autoregressive model (ICAR) because the effect of the location at the previous time
step is not attenuated or mixed with another location-based force. ICAR models are
nonstationary in the sense that the process is not being shrunk back toward some fixed
location in space and there are no other constraints on the process (e.g., that the μt sum
to 0). There is no assumed center of gravity in the model to keep the individual in one
general area; thus, it lacks that mechanism for modeling a central place forager like a
pygmy rabbit (Brachylagus idahoensis) or a harbor seal (Phoca vitulina). However,
substantial flexibility can be accommodated in the autoregressive framework, and the
VAR(1) specification can serve as a basis from which we can generalize to account
for more complicated mechanisms of movement.
Finally, one of the unique aspects of conditional autoregressive models is that
it is straightforward to translate the first-order (i.e., mean) dynamics into second-
order dependence (covariance). That is, if we vectorize all of the μt and concatenate
such that μ ≡ (μ1 , . . . , μT )′ , then the same properties used in spatial statistics allow
us to write the joint distribution for all of the individual locations as μ ∼ N(1 ⊗
μ̄, Σμ ⊗ I). This type of formulation can sometimes be advantageous for computational
reasons because of the sparsity of the precision matrix Σμ⁻¹ or various basis function expansions
of the covariance structure. We return to this concept of modeling dynamics in the
second-order component of the model in Chapter 6.
We consider each of the following generalizations to the simple random walk
model in turn:
Discrete-Time Models 149
FIGURE 5.1 Empirical (histogram) and theoretical (solid line) step length distributions based
on simulated trajectories using Equation 5.1 with T = 10,000 and (a) σ 2 = 0.5, (b) σ 2 = 1,
and (c) σ 2 = 2. The theoretical distribution for step lengths arising from the model in Equation
5.1 is Weibull(2, √(2σ 2 )).
1. Attraction
2. Measurement error
3. Temporal alignment (i.e., irregular data)
4. Heterogeneous behavior (e.g., covariate-based, change-point-based)
5.1.2 ATTRACTION
A useful generalization of the VAR(1) model allows for the inclusion of an attracting
point, or central place. In time series jargon, one approach for imposing an attractor
can be achieved by forcing the process to be stationary. To impose stationarity, we
can model the centered time series as
μt − μ∗ = M(μt−1 − μ∗ ) + εt , (5.2)
where we can interpret μ∗ as the geographic centroid of the movement process (e.g.,
a home range center) and the propagator matrix M now controls the dynamics. The
simplest type of dynamics can be achieved by letting M ≡ ρI. This propagator effec-
tively treats the dynamics in latitude and longitude the same, but independently. If we
desire a functional form that is more typical, with μt on the left-hand side by itself,
then the formulation becomes

μt = (I − M)μ∗ + Mμt−1 + εt . (5.3)

When M ≡ ρI and ρ = 1, this new model (5.3) reduces back to the original ICAR
form in Equation 5.1. Also, in that case, at fine time scales, we expect the movement
process to be smooth, and thus, the parameter ρ controls the smoothness and should
fall between zero and one for the individual to have an attracting point (μ∗ ), on aver-
age. The (I − M) term is actually not necessary; however, if it is retained in the model
statement, it induces a simple stability constraint on ρ (i.e., −1 < ρ < 1) so that it is
interpretable as a correlation coefficient.
Figure 5.2 shows two simulated 2-D VAR(1) processes arising from Equation 5.3
with attractor μ∗ = (1, 1) and σ 2 = 1 in both cases. We set ρ = 0.5 for the left
column of panels (Figure 5.2a and c) and ρ = 0.95 for the right column of pan-
els (Figure 5.2b and d). Notice that both bivariate processes are stationary around
μ∗ = (1, 1) , but that a value of ρ closer to 1 forces the trajectory to be smoother in
Figure 5.2d–f than in Figure 5.2a–c, where ρ is smaller.
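A minimal sketch of the attractor model (in the Equation 5.3 form, with M ≡ ρI and illustrative parameter values) confirms the stationary behavior described above: the long-run mean sits near μ∗ and the marginal variance near σ2/(1 − ρ2):

```python
import math
import random

random.seed(2)
rho, sigma = 0.5, 1.0
mu_star = (1.0, 1.0)          # attracting point (home range center)
T = 20_000

# mu_t = (I - M) mu* + M mu_{t-1} + eps_t with M = rho * I.
x, y = mu_star
xs = []
for _ in range(T):
    x = (1 - rho) * mu_star[0] + rho * x + random.gauss(0, sigma)
    y = (1 - rho) * mu_star[1] + rho * y + random.gauss(0, sigma)
    xs.append(x)

# The process is stationary around mu*; each coordinate has marginal
# variance sigma^2 / (1 - rho^2).
mean_x = sum(xs) / T
var_x = sum((v - mean_x) ** 2 for v in xs) / T
print(round(mean_x, 2), round(var_x, 2), round(sigma**2 / (1 - rho**2), 2))
```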
FIGURE 5.2 Joint (a, d) and marginal plots (b, c, e, f) of VAR(1) time series simulated from
Equation 5.3 based on μ∗ = (1, 1) and σ 2 = 1 in both cases. Panels (a–c) show μt , μ1,t , and
μ2,t based on ρ = 0.5 and panels (d–f) show μt , μ1,t , and μ2,t for ρ = 0.95.
* Again, we feel that the term “state-space” is a bit too broad to be used to effectively differentiate random
walk models for animal movement because any hierarchical model can be thought of as a state-space
model. Outside of the animal ecology world, the term “state-space” is often reserved for temporal and
spatio-temporal processes.
the movement process can ameliorate some identifiability issues. For example, the
error variance reported by telemetry device manufacturers could be used to inform
σs2 . If the measurement error covariance is nondiagonal, then it may be feasible to
statistically separate it from the process variance. Similarly, if we assume smooth-
ness in the movement process by letting M ≡ I (i.e., the ICAR situation), we usually
have enough of a reduction in the model complexity that a single set of data can be
useful, but this also affects scientific inference about the biological and ecological
mechanisms governing the movement process. Finally, when multiple instruments
are measuring the individual’s position at the same time, or near the same time,
we can use this information to help separate the observation variance from process
variance.*
The utility of a discrete-time hierarchical movement model can be assessed by
considering the unknown quantities in the model, as well as various functions of
them, that might be of interest. In this case, using the model in Equation 5.4, there
are four sets of unknown quantities: (1) the measurement error variance σs2 , (2) the
process variance σμ2 , (3) the parameters in M that control the dynamics, and (4) the
set of true locations μt , for t = 1, . . . , T. If one is interested in learning about the
measurement error associated with the telemetry device, inference should involve
σs2 . If one is interested in learning about the stochasticity associated with the under-
lying movement process, inference should involve σμ2 . Similarly, if one seeks to
learn about the smoothness of the movement at a given time scale, inference should
involve M.
One of the most useful types of inference can be obtained by learning about the
true underlying locations μt . Properly accounting for measurement error and, at least,
a surrogate for the movement process allows us to learn about the actual animal loca-
tions and the associated uncertainty, even though we did not observe them directly. It
also allows for inference pertaining to any function of the true locations. For exam-
ple, the velocity vectors associated with a movement process are a simple difference
function of the process in time (i.e., vt ≡ μt − μt−1 ); thus, we can obtain an understanding
of the step lengths† and turning angles of the individual path at any given
time via the quantities √(v′t vt ) and cos−1 ((v′t−1 vt )/(√(v′t−1 vt−1 ) √(v′t vt ))). These derived
quantities can help characterize movement behavior. For example, areas where the
speed is consistently high might indicate migration or dispersal corridors and areas
where the turning angles are sharp might indicate a foraging behavior. The derived
quantities are indexed in time so they can be mapped to the spatial domain (with
associated uncertainty) because we have formal inference for the true locations in
space (i.e., μt ). In some sense, this could be viewed as an emergent or derived form
of inference.
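These derived quantities can be computed directly from an estimated position series; a small Python sketch with hypothetical positions:

```python
import math

# Hypothetical true positions mu_t (e.g., posterior means); the derived
# quantities follow the definitions in the text, with v_t = mu_t - mu_{t-1}.
mu = [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0), (2.0, 2.0)]

v = [(mu[t][0] - mu[t - 1][0], mu[t][1] - mu[t - 1][1])
     for t in range(1, len(mu))]

def step_length(vt):
    # sqrt(v_t' v_t)
    return math.hypot(*vt)

def turning_angle(v_prev, v_curr):
    # cos^{-1}( v_{t-1}' v_t / (|v_{t-1}| |v_t|) ), in [0, pi]
    dot = v_prev[0] * v_curr[0] + v_prev[1] * v_curr[1]
    return math.acos(dot / (step_length(v_prev) * step_length(v_curr)))

lengths = [step_length(vt) for vt in v]
angles = [turning_angle(v[i - 1], v[i]) for i in range(1, len(v))]
print([round(l, 3) for l in lengths], [round(a, 3) for a in angles])
```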
* This could occur when an individual is telemetered with a GPS and Argos device simultaneously, for
example. While it may be a good idea to use multiple telemetry devices for statistical reasons, it may not
always be practical. However, telemetry data sets collected with two devices do exist (e.g., Argos and
VHF devices; Buderman et al. 2016).
† To obtain speed from step length when the trajectory is temporally irregular, divide √(v′i vi ) by Δi , the
difference in time between fixes (i.e., Δi = ti − ti−1 ). The resulting speed is expressed in distance per unit of Δi .
where μt−Δt and μt correspond to the nearest process times before and after ti ,
respectively. The weight wi is a function of the time interval between ti and t such
that

wi = (t − ti )/Δt. (5.6)

This model is general enough that, when ti co-occurs with t, the data point is exactly
associated with the underlying process location. For cases when Δt is small relative
to the movement frequency of the animal, this type of linear interpolation model performs
well. However, as Δt increases, the linear interpolation may not be appropriate
(see Section 5.2.5 for more on discretization error). In most cases, there is agreement
between the data and process scales and the linear interpolation performs well.
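The linear interpolation can be sketched as follows, under the assumed weight convention that wi → 0 as ti → t (so the interpolant reduces to the process location at t when the observation co-occurs with it; the positions below are illustrative):

```python
# Linear temporal alignment: an observation at irregular time t_i is tied to
# the regular process locations before (mu_prev, at time t - dt) and after
# (mu_curr, at time t).  Assumed convention: w_i -> 0 as t_i -> t.
def interpolate(mu_prev, mu_curr, t, t_i, dt):
    w = (t - t_i) / dt                     # weight as in Equation 5.6
    return tuple(w * p + (1 - w) * c for p, c in zip(mu_prev, mu_curr))

mu_prev, mu_curr = (0.0, 0.0), (2.0, 4.0)  # hypothetical process locations
print(interpolate(mu_prev, mu_curr, t=10.0, t_i=10.0, dt=2.0))  # exactly mu_curr
print(interpolate(mu_prev, mu_curr, t=10.0, t_i=9.0, dt=2.0))   # halfway between
```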
where the vector zt contains all zeros and a single one arising from a multinomial dis-
tribution: zt ∼ MN(1, p). The mixture model in Equation 5.9 can yield computational
advantages such as conjugacy in a Bayesian setting (which results in an automatic
model fitting algorithm that does not require tuning).
A different approach for incorporating multiple attracting points relies on a tem-
poral change-point model. For example, Figure 5.3 shows two simulated trajectories
arising from Equation 5.7 with two attracting points

μ∗t = μ∗1 if t < t∗ , and μ∗t = μ∗2 if t ≥ t∗ , (5.10)
and t∗ is the change point to be estimated. Notice how the trajectory (Figure 5.3d–f)
based on ρ = 0.95 is almost so smooth that it obscures the fact that a change in
attracting point occurred. Longer time series will eventually reveal the change, but
the amount of data needed to estimate a change depends on the smoothness.
One approach for adding temporal heterogeneity to the simple random walk mod-
els we have discussed thus far is to allow the dynamics to change over time. That is,
generically, we could let the propagator matrix from Equation 5.4 vary with time (i.e.,
Mt ). In fact, if Mt ≡ ρt I and we expect ρt > 0, we could use logit(ρt ) = xt β to link
the temporal correlation coefficients to a set of time-specific covariates. In essence,
this regression formulation for temporal correlation accomplishes two things: (1) it
allows for differing degrees of smoothness in the movement at different times and (2)
it allows for inference concerning the potential drivers of movement dynamics. For
example, if we used a temporal covariate, such as temperature, for xt in the model
logit(ρt ) = β0 + β1 xt , then a negative value for β1 would indicate that the position
process μt becomes more steady (i.e., smoother) as temperatures decrease.
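The logit link can be sketched as follows (the β values and the standardized temperature covariate are purely illustrative; a negative β1 makes the process smoother at colder temperatures, as described above):

```python
import math

# Covariate-linked smoothness: logit(rho_t) = beta0 + beta1 * x_t, so
# rho_t = 1 / (1 + exp(-(beta0 + beta1 * x_t))) always lies in (0, 1).
def rho_t(x, beta0=0.0, beta1=-0.5):   # illustrative coefficients
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))

temps = [-2.0, 0.0, 2.0]               # standardized temperature covariate
rhos = [rho_t(x) for x in temps]
print([round(r, 3) for r in rhos])     # decreasing in temperature
```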
We could also use a random effect approach to allow for heterogeneous dynamics.
The random effect model could be specified as logit(ρt ) ∼ N(μρ , σρ2 ). In this case,
the time-specific correlations are shrunk back to a general mean μρ (in the logit space)
FIGURE 5.3 Joint (a, d) and marginal plots (b, c, e, f) of VAR(1) time series simulated from
Equation 5.7 based on attracting points μ∗1 = (1, 1) and μ∗2 = (−2, −2) , with σ 2 = 1 in both
cases. Panels (a–c) show μt , μ1,t , and μ2,t based on ρ = 0.5 and panels (d–f) show μt , μ1,t ,
and μ2,t for ρ = 0.95. Horizontal gray lines represent μ∗1 and μ∗2 and vertical dashed gray lines
represent t∗ .
* Tuning model parameters to attenuate other model components is a model selection technique called
“regularization” (e.g., Hooten and Hobbs 2015).
where Mt = ρt I. If there are two possible types of behavior we might expect an animal
to be exhibiting (e.g., resting and foraging), we expect only two possible values
for ρt such that

ρt = ρ1 if t < t∗ , and ρt = ρ2 if t ≥ t∗ , (5.12)
where ρ1 and ρ2 represent the dynamics before and after a particular time t∗ where
the change occurs. Unless t∗ is known in advance, it will need to be treated as an
unknown model parameter and estimated. Estimation of t∗ could be done using
maximum likelihood methods or through the Bayesian approach. For the latter,
t∗ needs an appropriate prior distribution. One such prior is the discrete uniform
t∗ ∼ DiscUnif(2, . . . , T − 1), which has support 2, . . . , T − 1. This prior indicates
that the change can occur equally likely at any discrete time point ranging from 2 to
T − 1.*
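Maximum likelihood estimation of t∗ can be sketched by profiling the likelihood over the discrete support 2, . . . , T − 1; here a one-dimensional Gaussian mean-shift stands in for the full movement model (all values are illustrative):

```python
import random

random.seed(3)
T, t_true = 100, 60
sigma = 1.0
# One coordinate of a change-point process: mean 0 before t*, mean 3 after.
y = [random.gauss(0 if t < t_true else 3, sigma) for t in range(T)]

# Profile log-likelihood over candidate change points 2..T-1 (the discrete
# uniform prior support), plugging in the segment means.
def loglik(ts):
    out = 0.0
    for seg in (y[:ts], y[ts:]):
        m = sum(seg) / len(seg)
        out += sum(-0.5 * ((v - m) / sigma) ** 2 for v in seg)
    return out

t_hat = max(range(2, T - 1), key=loglik)
print(t_hat)
```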
The approach described above for accommodating changing dynamics via a
change-point model forces the behavior to be grouped into two time periods. An
alternative approach that allows switching behavior (still pertaining to two types of
dynamics) can be written as
ρt = ρ1 if zt = 1, and ρt = ρ2 if zt = 0, (5.13)
where the latent binary variable zt is further modeled as zt ∼ Bern(p). In this case,
the dynamics can change at every time point but the overall ratio of type 1 versus
type 2 dynamics is controlled by p. It may be unrealistic to assume that an individual
animal could switch back and forth on every time step, so additional smoothing on the
switching process could be induced in several ways. A simple approach for smoothing
the switching dynamics could be achieved with an HMM such that
zt ∼ Bern(p1 ) if zt−1 = 1, and zt ∼ Bern(p0 ) if zt−1 = 0. (5.14)
When p1 is large (i.e., close to one) and p0 is small (i.e., close to zero), then ρt will
have a tendency to stay in its current state longer. By contrast, when both p1 and p0 are
close to 0.5, the model reverts to the simpler case with p = 0.5. A stronger assumption
would let p0 = 1 − p1 , which is capable of providing appropriate dynamic behavior
for some situations. For example, Figure 5.4 shows a simulated trajectory based on
the change-point model (5.11) and Equations 5.13 and 5.14 with p1 = 1 − p0 = 0.99.
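The persistence induced by Equation 5.14 is easy to see in simulation; with p1 large and p0 small, the chain rarely switches (parameter values are illustrative):

```python
import random

random.seed(4)
p1, p0 = 0.99, 0.01      # P(z_t = 1 | z_{t-1} = 1) and P(z_t = 1 | z_{t-1} = 0)
T = 50_000

# Simulate the two-state switching process of Equation 5.14.
z = [1]
for _ in range(T - 1):
    p = p1 if z[-1] == 1 else p0
    z.append(1 if random.random() < p else 0)

# With p1 large and p0 small the chain is "sticky": it stays in its current
# state on roughly 99% of the time steps.
stays = sum(z[t] == z[t - 1] for t in range(1, T)) / (T - 1)
print(round(stays, 3))
```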
* Notice that we do not include times 1 and T in the support for t∗ . This is because we would not have
enough data to estimate a change point on the boundary of our time series.
FIGURE 5.4 Hidden Markov process zt (a) and marginal plots (b, c) of a VAR(1) time series
simulated from Equations 5.11, 5.13, and 5.14 with p1 = 1 − p0 = 0.99, μ∗ = (1, 1) , and σ 2 = 1.
Panels (b–c) show μ1,t and μ2,t based on ρ1 = 0.1 when zt = 1 and ρ2 = 0.99 when
zt = 0.
Notice that there was only a single change point in our simulation due to the large
value for p1 , and that, while zt = 0 (i.e., early in the time series), the trajectory is
much smoother than when zt = 1 (i.e., late in the time series).
The basic concept for allowing movement dynamics to change over time can be
extended to the situations involving more regimes for the dynamics (e.g., 3, 4, . . .).
In such cases, a more general multinomial model replaces the Bernoulli. In fact,
it is possible to allow for an unknown number of regimes, but these approaches
require substantially more complicated model fitting procedures (e.g., reversible-
jump MCMC, birth-death MCMC, or other transdimensional parameter space model
implementations; Hanks et al. 2011).
vt = Mvt−1 + εt , (5.15)
where ε t ∼ N(0, σε2 I) actually accounts for the dynamics in speed and direction.
In particular, depending on how the propagator matrix M is parameterized, we can
obtain various mechanistic interpretations for the dynamics. For example, suppose
that

M ≡ ( cos(θ) −sin(θ)
      sin(θ)  cos(θ) ) . (5.16)
In Equation 5.16, a single parameter θ controls the dynamics, but unlike in the case
where M ≡ ρI, the trigonometric specification (5.16) allows θ to control the turn-
ing angle from one time to the next and imposes additional correlation between step
length and turning angle (McClintock et al. 2014). The turning angle parameter is
bounded between −π and π; thus, when θ is close to zero, the individual animal will
move directly ahead. Conversely, when θ is closer to π or −π, the animal will turn
around 180◦ . Similarly, θ = π/2 and θ = −π/2 will turn the animal left and right,
respectively. The step length is controlled by the process error variance σε2 , with larger
values of σε2 corresponding to larger step lengths on average.
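A sketch of the velocity model with the rotation propagator illustrates this interpretation: when the noise is negligible and γ = 1, every turning angle equals θ (parameter values are illustrative):

```python
import math
import random

random.seed(5)
theta, gamma, sigma = math.pi / 8, 1.0, 1e-6   # near-deterministic rotation
M = [[gamma * math.cos(theta), -gamma * math.sin(theta)],
     [gamma * math.sin(theta),  gamma * math.cos(theta)]]

# v_t = M v_{t-1} + eps_t (Equation 5.15 with the rotation propagator).
v = [(1.0, 0.0)]
for _ in range(50):
    vx, vy = v[-1]
    v.append((M[0][0] * vx + M[0][1] * vy + random.gauss(0, sigma),
              M[1][0] * vx + M[1][1] * vy + random.gauss(0, sigma)))

# With negligible noise, the turning angle between successive velocities is theta.
def angle(a, b):
    dot = a[0] * b[0] + a[1] * b[1]
    return math.acos(dot / (math.hypot(*a) * math.hypot(*b)))

turns = [angle(v[i - 1], v[i]) for i in range(1, len(v))]
print(round(turns[0], 4), round(theta, 4))
```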
Given the interpretation of model parameters as controlling turning angles and step
lengths in Equation 5.16, the random walk model associated with the velocity process
has a decidedly mechanistic feel to it. The random walk model for velocity (5.15) also
has a direct relationship with a discrete-time continuous-space model for the position
process. To derive this relationship, substitute μt − μt−1 for vt in Equation 5.15 to
obtain
μt − μt−1 = M(μt−1 − μt−2 ) + εt . (5.17)
Then add μt−1 to both sides and simplify the equation. As we saw in Chapter 3, the
result is a VAR(2) model for the position process:

μt = (I + M)μt−1 − Mμt−2 + εt , (5.18)

where the propagator matrices are (I + M) for the first-order difference and −M
for the second-order difference. We discussed this result generically in Chapter 3, but
now we see how the same basic concept can be helpful in modeling animal movement
explicitly. In essence, higher-order dependence in the position process (i.e., longer
memory) allows for a useful mechanistic interpretation of the model components.
The parameterization of the propagator matrix M in Equation 5.16 yields a very
restrictive model. A simple extension is
M ≡ γ ( cos(θ) −sin(θ)
        sin(θ)  cos(θ) ) , (5.19)
where the parameter γ (for 0 < γ < 1) dampens the contribution of the dynamics
in velocity as necessary when γ becomes small. In this new formulation (5.19), the
propagator matrix is a function of two unknown variables (i.e., γ and θ) that must be
estimated.
Figure 5.5 shows simulated trajectories (i.e., μt = Στ ≤t vτ ) using the velocity
VAR(1) model (5.15) for six different parameter scenarios. The trajectories take on
very distinct geometric patterns in Figure 5.5d–f; when γ = 1, the trajectories exhibit
all left turns with consistent turning angles. Whereas, when γ = 0.1 (Figure 5.5a–c),
the trajectories exhibit more variability in their turns and step lengths. Realistic animal
movement trajectories occur when −π/2 < θ < π/2 and γ < 1 for typical temporal
resolutions (Δt) associated with most telemetry data.
To fit a Bayesian version of the discrete-time velocity model in Equations 5.15
and 5.19 to data, we specified the priors σ 2 ∼ IG(0.001, 0.001), θ ∼ Unif(−π, π ),
and γ ∼ Unif(0, 1). To simulate a data set, we used T = 100 time steps and let θ =
π/8, γ = 0.9, and σ 2 = 1 (Figure 5.6). Using MCMC to fit the model with 10,000
iterations, the marginal posterior distributions for model parameters are shown in
Figure 5.7. Based on the simulated data in Figure 5.6, the model is able to recover
the parameters quite well.
FIGURE 5.5 Simulated position processes (i.e., μt = Στ ≤t vτ ) using Equation 5.15 for six
different parameter scenarios and T = 100 time steps: (a) θ = 0.1 · π , γ = 0.1, (b) θ = π/2,
γ = 0.1, (c) θ = 0.9 · π, γ = 0.1, (d) θ = 0.1 · π , γ = 1, (e) θ = π/2, γ = 1, and (f) θ =
0.9 · π , γ = 1.
FIGURE 5.6 Simulated position processes (i.e., μt = Στ ≤t vτ ) using Equation 5.15 for
T = 100 equally spaced time steps and θ = π/8, γ = 0.9, and σ 2 = 1. Panel (a) shows the
trajectory (i.e., μt or position process) with open and closed circles denoting the starting and
ending positions, respectively. Panel (b) shows the velocity vectors (vt ).
As with the first-order dynamic models for the position process, Jonsen et al.
(2005) allow this velocity model to contain time-varying dynamics with a switch-
ing model similar to Equation 5.13. In this case, several variables could be indexed
in time and allowed to arise from a discrete set of possible movement states.
For example, in the situation involving two movement states, we could allow for
FIGURE 5.7 Marginal posterior distributions for model parameters (a) γ , (b) θ, and (c) σ 2
based on the simulated position processes in Figure 5.6. True parameter values used to simulate
data are shown as vertical lines.
where the latent binary indicator is modeled like before as zt ∼ Bern(p). The concept
of letting multiple model variables arise from a discrete set of states over time was
introduced by Morales et al. (2004) in the animal movement context. They suggested
that animals may alternate among different behaviors, resulting in different move-
ment patterns, and thus, proposed model formulations that allow for state switching
behavior.
rt ∼ Weib(at , bt ),
θt ∼ WrapCauchy(mt , ρt ), (5.21)
where Weib(r | a, b) ≡ a b r^(b−1) exp(−a r^b ) is the Weibull PDF and WrapCauchy(θ | m,
ρ) ≡ (1 − ρ 2 )/(2π(1 + ρ 2 − 2ρ cos(θ − m))) is the wrapped Cauchy PDF. Weibull
random variables have positive support and parameters controlling the scale (a) and
shape (b), providing a sensible model for movement rates. The Weibull distribution
is a generalized version of the exponential distribution and becomes equivalent when
b = 1. When b < 1, the Weibull distribution has mode near zero and a long tail, allow-
ing for rare, fast movement rates (long displacements). Also, the Weibull distribution
is equivalent to the Rayleigh distribution when b = 2, and describes the step length
distribution of a standard diffusion process. Thus, the Weibull seems to be a good
option to model movement rate or step length even though other distributions (e.g.,
gamma, exponential, or lognormal) could be used. A drawback of the Weibull distribution,
and also of the gamma and lognormal, is that they are not defined for rt = 0.
Thus, zeros in the data have to be ignored or replaced by small numbers. Also, the
shape parameter b may not always be identifiable using telemetry data alone. The
wrapped Cauchy is a circular distribution with support −π ≤ θ ≤ π and parameters
controlling the scale (ρ) and location (m) of probability density on a circle.* The
wrapped Cauchy also has the special property that, as ρ → 0, it becomes a uniform
distribution on the circle providing equally likely turning angles in any direction.†
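Wrapped Cauchy draws can be generated by wrapping an ordinary Cauchy variate onto the circle (a standard construction, not tied to any particular package); the circular mean resultant length recovers ρ and the mean direction recovers m:

```python
import cmath
import math
import random

random.seed(6)
m, rho = 0.5, 0.7        # illustrative location and concentration parameters

# Wrap a Cauchy(m, s) with s = -log(rho) onto (-pi, pi]; the characteristic
# function gives E[exp(i*theta)] = rho * exp(i*m).
def rwrapcauchy():
    s = -math.log(rho)
    x = m + s * math.tan(math.pi * (random.random() - 0.5))  # Cauchy draw
    return math.atan2(math.sin(x), math.cos(x))              # wrap to (-pi, pi]

draws = [rwrapcauchy() for _ in range(20_000)]
resultant = sum(cmath.exp(1j * th) for th in draws) / len(draws)
print(round(abs(resultant), 2), round(cmath.phase(resultant), 2))
```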
Allowing the parameters (e.g., at , bt , mt , and ρt ) to vary in time completely would
lead to an overfit model with very little learning potential. However, fixing them all
in time would not allow for realistic movement behavior. Thus, the strength of the
approach proposed by Morales et al. (2004) is in the underlying process that gov-
erns the variation in these parameters. Morales et al. (2004) proposed seven different
model specifications that provide varying amounts of heterogeneity:
* There are numerous other parameterizations of the Weibull and wrapped Cauchy distributions, but these
are most similar to those used by Morales et al. (2004).
† Compared to other circular distributions such as the wrapped normal or von Mises, the wrapped Cauchy
is more peaked and has heavier tails and thus implies different long-term consequences (Codling et al.
2008).
Each of these models implies a different form of heterogeneity for the process. The
assumptions of each model can be checked formally and, if appropriate, models can
be compared to select which among them has the best predictive ability (Morales
et al. 2004; Hooten and Hobbs 2015). Of course, each of the scenarios presented
by Morales et al. (2004) could be generalized further if the situation dictates (e.g.,
including additional movement states).
In a Bayesian framework, each of the unknown parameters in the models above
needs a distribution and one could proceed as usual in completing the model state-
ment with explicit priors. For example, Morales et al. (2004) used gamma priors for
a and b parameters, uniform priors for m, ρ, and p (or p1 and p2 ), and then normal
priors for regression coefficients β (or β 1 and β 2 ). A potential problem with cluster-
ing models such as these is “label switching” (i.e., states may be labeled differently
in different model fits). Thus, it is common to define a subset of the parameters for
one of the categories or states as a function of others. For example, Morales et al.
(2004) set a1 = a2 + ε, where ε is the difference between scale parameters and was
assigned a truncated normal prior. Thus, state 1 will always have a larger scale param-
eter, which can help avoid label switching. Alternatively, for the mean step length or
FIGURE 5.8 GPS telemetry data for four individual elk analyzed by Morales et al. (2004).
movement rate of one of the states, Morales et al. (2004) set m2 = m1 + ε, yielding
the corresponding scale for the Weibull as a1 = (m1 /Γ(1 + 1/b1 ))^(b1 ).
Morales et al. (2004) demonstrated their discrete-time random walk models using
four cow elk (Cervus canadensis) GPS telemetry data sets collected in east-central
Ontario, Canada (Figure 5.8). Using Bayesian methods, they fit the models using
WinBUGS (Lunn et al. 2000) and performed model selection to identify the best
predicting model. They also used posterior predictive checks for the temporal auto-
correlation of movement rates to justify their use of informative priors for the shape
parameter of the Weibull distribution. That is, they simulated trajectories with param-
eters sampled from the joint posterior and compared the temporal autocorrelation in
step length with those from the data.
As previously stated, Bayesian methods are often employed for complicated mod-
els that are challenging to fit using non-Bayesian approaches, such as discrete-time
movement models that explicitly account for location measurement error or tem-
porally irregular observations (e.g., Jonsen et al. 2005; McClintock et al. 2012).
However, because elk are terrestrial and the GPS fixes were obtained at regular time
intervals, an analysis similar to that of Morales et al. (2004) can be performed using
maximum likelihood methods (e.g., Langrock et al. 2012).
We used the R package “moveHMM” (R Core Team 2013; Michelot et al. 2015) to
fit the “single,” “double-switch,” “switch with covariates,” and “triple-switch” HMMs
of Morales et al. (2004) using maximum likelihood estimation techniques, thus avoid-
ing any need for Bayesian prior specification, custom MCMC algorithms, or the
Deviance Information Criterion (DIC). As in Morales et al. (2004), the covariates
(X) included the shortest distance from each elk location to 10 habitat types (water,
swamp, treed wetland, open forest, non-treed wetland, mixed forest, open habitat,
dense deciduous forest, coniferous forest, and alvar [i.e., dwarf shrubs and limestone
grasslands]). Distance to each habitat type (km) was standardized to have zero mean
and unit variance. Unlike Morales et al. (2004), we analyzed the four elk data sets
jointly and used the Akaike Information Criterion (AIC) for model selection, and the
computation for the entire analysis required only a few seconds.
Similar to the model selection results from Morales et al. (2004), we calculated
AIC values of 3660.7 for the “triple-switch,” 3770 for the “switch with covariates,”
3790.6 for the “double-switch,” and 3975.2 for the “single” model. Although the
“triple-switch” model resulted in the lowest AIC, it essentially split an “encamped”
state (with small step lengths and large turning angles) into two states with slightly
different expected step lengths while leaving the “exploratory” state (with large step
lengths and small turning angles) largely unchanged. Thus, despite the lower AIC,
the “switch with covariates” model is arguably more biologically interpretable and
FIGURE 5.9 Estimated elk trajectories and estimated movement states: “encamped” (solid
symbols connected by solid lines) and “exploratory” (hollow symbols connected by dashed
lines).
FIGURE 5.10 Estimated distributions for the “encamped” and “exploratory” movement
states for (a) step length and (b) turning angle.
meaningful (Morales et al. 2004) with its two distinct “encamped” and “exploratory”
states (Figures 5.9 and 5.10).
Using the notation of Equation 5.27, both the “moveHMM” package and Morales
et al. (2004) parameterize the “switch with covariates” model in terms of the
switching probabilities
TABLE 5.1
Results of Fitting the State-Switching Discrete-Time Movement Model to Elk
Telemetry Data
[Table body with coefficient estimates and 95% CIs not recovered.]
original Morales et al. (2004) analysis, our results did not demonstrate that elk may be
more likely to switch from exploratory to encamped movement when they are close
to habitats where they can forage. Also similar to Morales et al. (2004), we found that
the elk were more likely to switch from exploratory to encamped with distance from
open habitat (β̂2,oh = 1.00).
Unlike Morales et al. (2004), our joint analysis found that these four elk were more
likely to stay in the encamped state when close to dense deciduous forest (β̂1,ddf =
0.45), more likely to switch from encamped to exploratory when close to non-treed
wetland (β̂1,ntw = −0.83) or treed wetland (β̂1,tw = −0.46), more likely to switch
from encamped to exploratory when close to open habitat (β̂1,oh = −0.82), and more
likely to remain in the exploratory state when close to non-treed wetland (β̂2,ntw =
1.12) (Table 5.1).
* Recall that the data in these models are usually functions of the time series of position data, μi .
for rt , but let φt represent bearing.* The basic data model structure is
rt ∼ Weib(at , bt ), (5.30)
φt ∼ WrapCauchy(mt , ρt ), (5.31)
with latent state vector zt comprising all zeros and a single one in the element that
corresponds to the state for time t. This is a generalization of the model framework
presented by Morales et al. (2004) who discussed two or three latent states only. The
model proposed by McClintock et al. (2012) allows for any number of states via the
dimension of vector zt . Following Blackwell (1997, 2003), McClintock et al. (2012)
allow the latent state to arise from a categorical distribution, which is equivalent to
modeling zt as a multinomial random vector
zt ∼ MN(1, pt ). (5.32)
The simplest model for the state probabilities assumes they are static over time and
sum to one; that is, pt = p, where p′1 = 1. In this case, the state transitions are conditionally
independent with certain states being more prevalent than others. The first
generalization might be to allow for heterogeneity in the state probabilities with a
regression framework. The “mlogit” transformation is one possible way to model
multivariate probability vectors and can be written element-wise in terms of the log
odds as log(pt,j /pt,1 ) = xt β j , for j = 2, . . . , J states. The mlogit transformation prop-
erly constrains each pt to sum to one and there are J − 1 coefficient vectors β j to be
estimated.
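The mlogit transformation is easy to compute directly. A minimal sketch in Python (the function name and covariate values are hypothetical; state 1 serves as the reference category):

```python
import numpy as np

def mlogit_probs(x, betas):
    """Map covariates x to a J-dimensional probability vector via the
    mlogit transformation: log(p_j / p_1) = x' beta_j for j = 2, ..., J.

    x     : covariate vector of length m
    betas : (J-1, m) array of coefficient vectors (state 1 is the reference)
    """
    log_odds = betas @ x                                 # J-1 log odds vs. state 1
    unnorm = np.concatenate(([1.0], np.exp(log_odds)))   # p_1 gets exp(0) = 1
    return unnorm / unnorm.sum()                         # constrained to sum to one

p = mlogit_probs(np.array([1.0, 0.5]), np.array([[0.2, -1.0], [0.1, 0.3]]))
```

By construction, the J − 1 log odds determine all J probabilities, so the sum-to-one constraint holds exactly.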
A more general model that allows for dynamics in the state switching is

pt = Pzt−1 . (5.33)

In this case, P is a transition matrix with columns that sum to one. The elements of
P, pj,k , control the probability of switching from state k to state j. As McClintock
et al. (2012) point out, dynamic multinomial models have become popular in the
population modeling literature (e.g., Hobbs et al. 2015) for accommodating demo-
graphic changes in populations. The larger the diagonal elements of P relative to the
off-diagonal elements, the more stable the state-switching process zt will be.
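The dynamic multinomial state model can be simulated in a few lines. A sketch with a hypothetical 3 × 3 transition matrix P (columns sum to one, so P[:, k] gives the switching probabilities out of state k):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 3-state transition matrix; columns sum to one, and element
# P[j, k] is the probability of switching from state k to state j.
P = np.array([[0.8, 0.1, 0.1],
              [0.1, 0.8, 0.2],
              [0.1, 0.1, 0.7]])

def simulate_states(P, T, z0=0):
    """Simulate z_t ~ MN(1, P z_{t-1}) and return the state indices."""
    states = [z0]
    for _ in range(T - 1):
        p_t = P[:, states[-1]]               # column k holds the switching probs
        states.append(rng.choice(len(p_t), p=p_t))
    return np.array(states)

z = simulate_states(P, 1000)
```

With diagonal elements well above the off-diagonal ones, as here, simulated state sequences show long runs in each state.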
To allow for centers of attraction and repulsion, McClintock et al. (2012) let the
parameters of the model for direction depend on a distance (dt ) between the current
location (μt ) and a point in the domain of interest (μ∗ ). To achieve this attraction or
repulsion, they used a hyperbolic tangent function to link the parameter ρt to dt (i.e.,
ρt = tanh(αdt ), for scaling parameter α) and let the mean direction mt be equal to
the direction from μt to μ∗ . McClintock et al. (2012) utilize the hyperbolic tangent
function because it maps the real numbers to those bounded by −1 and 1. Values of
ρt < 0 capture repulsion; thus, if the interest is in attraction only, an alternative link
function could be the logit such that logit(ρt ) = α0 + α1 dt .
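The tanh link for attraction strength can be sketched as follows (coordinates and the scaling parameter α are hypothetical; the mean direction mt is taken as the angle from μt to μ∗):

```python
import numpy as np

def attraction_params(mu_t, mu_star, alpha):
    """Return (m_t, rho_t): the mean direction toward the attraction point
    mu_star and a tanh-linked concentration that grows with distance."""
    diff = np.asarray(mu_star) - np.asarray(mu_t)
    d_t = np.hypot(*diff)                    # distance to the center
    m_t = np.arctan2(diff[1], diff[0])       # direction from mu_t to mu_star
    rho_t = np.tanh(alpha * d_t)             # bounded between -1 and 1
    return m_t, rho_t

m, rho = attraction_params((0.0, 0.0), (3.0, 4.0), alpha=0.2)
```

A negative α yields ρt < 0 and hence repulsion, matching the discussion above.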
* Turning angle and bearing (or direction) are different. Turning angle is the angle between two successive
displacement vectors (i.e., moves), whereas bearing is the direction relative to true north.
170 Animal Movement
ρt = ψ if zt indicates an exploratory state, and ρt = tanh(αdt ) otherwise, (5.34)

mt = φt−1 if zt indicates an exploratory state, and mt = m∗t otherwise, (5.35)
where the covariates xt could vary among models as well. To force the step length
model to correspond to the latent movement state (zt ), we let the coefficients vary
in time such that β a,t and β b,t will be represented by β a,j and β b,j if zt indicates the
individual is in state j at time t. The model presented by McClintock et al. (2012)
* The support of mt is circular; thus, care must be taken when |φt−1 − m∗t | > π. One way to han-
dle this is to compute the weighted average for the Cartesian coordinates and then back-transform:
mt = arg(ut exp(iφt−1 ) + (1 − ut ) exp(im∗t )).
† Also see Duchesne et al. (2015) and Rivest et al. (2015) for a more general framework that can incorporate
additional sources of directional bias in mt .
could be extended further such that the parameters in the step length distribution are
also linked to the distance between the individual and the center of attraction.
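The circular averaging device in the footnote above (averaging the unit vectors in the complex plane and taking the argument) avoids the discontinuity at ±π. A minimal sketch:

```python
import cmath

def circular_weighted_mean(phi_prev, m_star, u):
    """Weighted mean of two directions on the circle, computed by averaging
    the corresponding unit vectors in the complex plane:
    m_t = arg(u * exp(i * phi_{t-1}) + (1 - u) * exp(i * m*_t))."""
    return cmath.phase(u * cmath.exp(1j * phi_prev)
                       + (1 - u) * cmath.exp(1j * m_star))

# Naively averaging 3.1 and -3.1 rad gives 0, on the wrong side of the
# circle; the vector average correctly lands at +/- pi.
m = circular_weighted_mean(3.1, -3.1, 0.5)
```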
To account for measurement error and alignment of the temporally irregular data
with an underlying position process at regular time intervals, McClintock et al. (2012)
used the same approach described by Jonsen et al. (2005). They used a hierarchical
framework (5.5) with a linear weighting of neighboring data time points si like we
described in Section 5.1.4:
As before, σs2 represents the measurement error variance, and could be allowed to
vary by direction. The weights are a function of the interval between process time
points (Δt):

wi = (t − ti )/Δt. (5.39)
Then the movement parameters rt and φt are functions of the underlying true positions
μt and μt−1 .
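The temporal-alignment weights can be sketched as follows. The convention that wi multiplies the earlier regular position μt−1 (so an observation taken exactly at time t draws entirely on μt) is an assumption here, since it depends on how Equation 5.38 is written; all values are hypothetical.

```python
import numpy as np

def align_weights(t, t_obs, dt):
    """Weight w_i = (t - t_i) / Dt for aligning an observation at irregular
    time t_i with the regular process locations mu_{t-1} and mu_t."""
    return (t - np.asarray(t_obs)) / dt

def predicted_position(mu_prev, mu_t, w):
    """Linear weighting of the two neighboring regular positions (a sketch;
    the full model adds measurement error with variance sigma_s^2)."""
    return w * np.asarray(mu_prev) + (1 - w) * np.asarray(mu_t)

w = align_weights(t=4.0, t_obs=3.25, dt=1.0)   # observation 3/4 through the step
pos = predicted_position((0.0, 0.0), (1.0, 2.0), w)
```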
McClintock et al. (2012) demonstrated their discrete-time, multistate, biased cor-
related random walk models using Fastloc-GPS telemetry data collected from a male
grey seal (Halichoerus grypus) between 9 April and 13 August 2008. The tem-
porally irregular observed locations (i.e., si ) showed this individual seal generally
traveled clockwise among a foraging area (Dogger Bank) in the North Sea and two
haul-out sites (Abertay and the Farne Islands) on the eastern coast of Great Britain
(Figure 5.11).
While simultaneously accounting for temporal irregularity and measurement
error using Equation 5.38, McClintock et al. (2012) fit a movement process model
to the grey seal data with five movement behavior states. The behavioral states
included three “center of attraction” states (with movement biased toward one of
three unknown positions) and two “exploratory” states (with unbiased but potentially
directionally persistent movement). Specifically,
where

log(at ) = β^a_{0,zt} + I_{[0,dzt )}(δt ) β^a_{1,zt} if zt ∈ {1, 2, 3}, and log(at ) = β^a_{0,zt} otherwise, (5.43)

log(bt ) = β^b_{0,zt} + I_{[0,dzt )}(δt ) β^b_{1,zt} if zt ∈ {1, 2, 3}, and log(bt ) = β^b_{0,zt} otherwise, (5.44)
FIGURE 5.11 Grey seal Fastloc-GPS telemetry data (si ). Arrows indicate direction between
successive locations.
mt = ut φt−1 + (1 − ut ) m∗t if zt ∈ {1, 2, 3}, and mt = φt−1 otherwise, (5.45)

logit(ρt ) = β^ρ_{0,zt} + β^ρ_{1,zt} δt + β^ρ_{2,zt} δt² if zt ∈ {1, 2, 3}, and logit(ρt ) = β^ρ_{0,zt} otherwise, (5.46)
relationship with δt in Equation 5.46. In addition to the model parameters, the terms
μ∗t and dzt were treated as unknown quantities to be estimated.
McClintock et al. (2012) used a Bayesian model implemented with a reversible-
jump MCMC algorithm to fit the model and select among different parameterizations.
Parameterizations included a linear or quadratic model for ρt (i.e., β^ρ_{2,zt} = 0 for zt ∈
{1, 2, 3}) and models with no short-term directional persistence (i.e., ut = 0 for zt ∈
{1, 2, 3} or ρt = 0 for zt ∈ {4, 5}). McClintock et al. (2012) found strong evidence of
biased movement toward the three centers of attraction, with estimated locations (μ∗ )
corresponding to the Farne Islands haul-out site, the Abertay haul-out site, and the
Dogger Bank foraging site. They also found strong evidence of shorter step lengths
within 5 km of these three centers of attraction, suggesting restricted movement in
the vicinity of the haul-out sites and restricted area search while foraging at Dogger
Bank. Little evidence was found for short-term directional persistence (i.e., ρt = 0)
for the two exploratory states, but one was characterized by longer expected step
lengths (i.e., higher speed) than the other.
Figure 5.12 shows the estimated movement states (zt ) for the interpolated locations
(μt ) corresponding to the Farne Islands haul-out site (“×” symbol), Abertay haul-out
site (“+” symbol), Dogger Bank foraging site (“” symbol), or spatially unassociated
high-speed (“”) and low-speed (“” symbol) exploratory states. The “@” symbols
indicate the estimated coordinates of the three centers of attraction (μ∗ ). Uncertainty in μt is indicated by 95% normal error ellipses (translucent gray dashed lines).
The estimated “activity budgets” (i.e., the proportion of time steps allocated to each
behavioral state) are summarized in Table 5.2.

TABLE 5.2 Estimated Activity Budgets for the Grey Seal Data (95% CI)
Based on posterior model probabilities, McClintock et al. (2012) found little evidence of a quadratic effect of distance on the strength of bias toward centers of
attraction (i.e., β^ρ_{2,zt} = 0 for zt ∈ {1, 2, 3}). The Abertay haul-out site maintained a
strong and consistent bias up to 350 km, while the strength of attraction to both
the Farne Islands haul-out and Dogger Bank foraging sites decreased with distance
(Figure 5.13). However, the strength of bias declined less rapidly from the Dogger
Bank foraging site than from the Farne Islands haul-out site. These movement pat-
terns suggest the seal could be “honing in” on these targets, although other factors
(e.g., ocean currents) are also likely influencing the timing and direction of these
movements (Gaspar et al. 2006).
Previous analyses of individual seal movement have been largely limited to simple
and correlated random walk models of foraging trips (Jonsen et al. 2005; John-
son et al. 2008a; Breed et al. 2009; Patterson et al. 2010). However, based on
posterior estimates and model probabilities, McClintock et al. (2012) found strong
evidence that the incorporation of bias toward centers of attraction better explained
seal movement than simple or correlated random walks.
Overall, Beyer et al. (2013) demonstrated the effectiveness of relatively simple
switching models for estimating behavioral states, but these types of models are
rapidly becoming more complicated (e.g., McClintock et al. 2012, 2013; Isojunno
and Miller 2015). There is some evidence that these models may not perform well
when movement behavioral states are not sufficiently different (Beyer et al. 2013;
Gurarie et al. 2016). In general, fitting multistate movement models can be challeng-
ing. Thus, care should be taken when implementing complicated multistate models,
including appropriate exploratory data analysis (e.g., Gurarie et al. 2016) and model
checking (e.g., Morales et al. 2004). As demonstrated for the models of Morales
et al. (2004), in the absence of location measurement error, most of the movement
process models described by McClintock et al. (2012) can be fit to data in a maxi-
mum likelihood framework using HMM fitting machinery. Whether using Bayesian
or non-Bayesian methods, the number of latent states is typically fixed a priori. Thus,
extending generalized state-switching models to an unknown number of latent states
remains a promising avenue for future research.
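For readers pursuing the maximum likelihood route, the core computation is the standard HMM forward algorithm. A generic sketch (the log densities of each observation, e.g., a step length and turning angle pair, under each state are assumed to be precomputed; the log-sum-exp step guards against underflow):

```python
import numpy as np

def hmm_loglik(log_dens, P, p0):
    """Forward-algorithm log-likelihood for a discrete-time HMM.

    log_dens : (T, J) log densities of each observation under each of J states
    P        : (J, J) transition matrix, P[j, k] = Pr(state j at t | state k at t-1)
    p0       : initial state distribution
    """
    log_alpha = np.log(p0) + log_dens[0]
    for t in range(1, len(log_dens)):
        # Log-sum-exp over the previous states keeps the recursion stable.
        m = log_alpha.max()
        log_alpha = np.log(P @ np.exp(log_alpha - m)) + m + log_dens[t]
    m = log_alpha.max()
    return m + np.log(np.exp(log_alpha - m).sum())
```

Maximizing this function over the step length, turning angle, and transition parameters is what dedicated HMM fitting machinery automates.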
rt ∼ Gamma(at , bt ), (5.48)
θt ∼ vonMises(mt , ρt ). (5.49)
In this model, at and bt are the shape and scale parameters of the gamma distribution
in (5.48), while mt and ρt are the location and concentration parameters of the von
Mises distribution in (5.49). Because the response of an individual
to a given feature is of interest, Tracey et al. (2005) model the response angle as
θt − m∗t ∼ vonMises(mt , ρt ), rather than the turning angle directly. This modification
allows them to consider mt = m to be fixed and only ρt to vary. The basic concept
is to let concentration be a function of distance to feature; therefore, log(ρt ) = α0 +
α1 dt∗ will allow for a reduced precision in response angle as dt∗ increases if α1 < 0.
Similarly, in the model for distances, Tracey et al. (2005) use moment matching to
model the mean distance as log(at /bt ) = β0 + β1 dt∗ .† They let the mean of the gamma
distribution vary and assume a constant variance, which is estimated separately. This
model will let the mean distance or step length decrease as distance to the feature
increases if β1 < 0.
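A sketch of the distance-dependent concentration model for response angles, using NumPy's von Mises sampler (the parameter values are hypothetical; with α1 < 0, angles sampled far from the feature are nearly uniform):

```python
import numpy as np

rng = np.random.default_rng(2)

def response_angle_sample(d, alpha0, alpha1, size=1000):
    """Sample response angles (theta_t - m*_t) from a von Mises distribution
    whose concentration decays with distance d to the feature when alpha1 < 0:
    log(rho_t) = alpha0 + alpha1 * d."""
    kappa = np.exp(alpha0 + alpha1 * d)
    return rng.vonmises(0.0, kappa, size=size)

near = response_angle_sample(d=0.5, alpha0=1.0, alpha1=-0.5)
far = response_angle_sample(d=10.0, alpha0=1.0, alpha1=-0.5)
```

The mean resultant length of the sampled angles (the modulus of the average unit vector) is larger near the feature, reflecting the more concentrated response.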
* The gamma distribution has the mathematically elegant property of infinite divisibility, although any
practical advantages for discrete-time movement modeling (particularly with respect to choice of t)
are not well documented.
† Note that Tracey et al. (2005) also explore other models, but we present only the exponential forms here
for simplicity.
where ỹt is the latent movement parameter (underlying log step length process), wt is
a vector of covariates involved with the observed log step lengths and α are the associ-
ated regression coefficients, ρ controls the smoothness of the latent dynamic process,
xt are covariates and β are regression coefficients associated with the dynamic pro-
cess, and the variance components σy2 and σỹ2 control the stochasticity in the model
at the observed and latent levels. We also assume that the time intervals are constant
so that we are in the typical time series context.
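A simulation sketch of this hierarchical step-length model (all parameter values are hypothetical); larger ρ yields a smoother latent series ỹt:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 500
rho, alpha, beta = 0.8, 0.5, -0.3        # hypothetical parameter values
sigma_y, sigma_ytilde = 0.2, 0.1

w = rng.normal(size=T)                   # data-level covariate
x = rng.normal(size=T)                   # process-level covariate

# Latent log-step-length process: AR(1) with a covariate-driven mean.
y_tilde = np.zeros(T)
for t in range(1, T):
    y_tilde[t] = rho * y_tilde[t - 1] + x[t] * beta \
                 + rng.normal(scale=sigma_ytilde)

# Observed log step lengths: latent process plus an immediate covariate effect.
y = y_tilde + w * alpha + rng.normal(scale=sigma_y, size=T)
```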
The hierarchical model presented by Forester et al. (2007) has a similar state-space
construction as the models for dynamics in velocity (Section 5.2). The difference is
that the model proposed by Forester et al. (2007) operates on a univariate functional
of velocity (log of inverse speed, or step length). It also contains covariate influences
at both the data and process levels.
In fact, Forester et al. (2007) combine the process and observation models and
use iterative substitution to show that the mean of yi can be written as a function of
interpretable terms:
yt ∼ N( Σ_{c=1}^{C} ρ^c xt β + Σ_{c=0}^{C−1} ρ^c (ỹt − ρ ỹt−1 − xt β) + ρ^C ỹt−C + wt α, σy² ), (5.52)
for C time steps. Forester et al. (2007) explained that the first term in the mean (i.e.,
Σ_{c=1}^{C} ρ^c xt β) contains the preceding C “environments” experienced by the individual
with strength of past experience attenuated by ρ. For example, when ρ decreases
toward zero, the memory of past experiences decreases. Therefore, larger values of
ρ indicate longer “memory.”* Visually, when viewing the time series, a smoother
process will have a larger ρ and a noisier process will have a smaller ρ (approaching
zero). Forester et al. (2007) describe the second term involving a sum in Equation
5.52 as similarly attenuated process uncertainty. Essentially, the smaller the process
error (ỹt − ρ ỹt−1 − xt β), the more the covariates (xt ) can influence the movement
process.
At first glance, the covariates in both the measurement and process models (5.51)
might appear to be redundant. However, as Forester et al. (2007) explain, we can think
of this as a multiscale model in that the covariate effects at the data level (wt α) have an
immediate effect on yt , whereas the process covariate effects (xt β) have a longer-term
effect on yt because they accumulate at a rate controlled by ρ. Thus, discrepancies
among α and β can indicate multiscale dynamics in the process.
The model described by Forester et al. (2007) is completely linear (except for the
initial log transformation of the step lengths), and thus, can be fit using maximum
likelihood and Kalman filtering methods to estimate the latent process (ỹt ). However,
a Bayesian implementation of the model is straightforward. In the Bayesian situation,
we just need to specify priors for the unknown parameters α, β, ρ, σy2 , and σỹ2 . If
Gaussian priors are used for α, β, and ρ, while inverse gamma priors are used for
* In time series, memory has a different definition than this, but, for consistency, we maintain Forester’s
use of the term here to help with visualization.
the variance components (σy2 and σỹ2 ), the full-conditional distributions will all be
conjugate and an MCMC algorithm can be easily constructed with all Gibbs updates.
As previously mentioned, the hierarchical step length model (5.51) is closely
related to the vector autoregressive models for velocity. In fact, we can specify a
multivariate model for velocity using the same approach:

vt ∼ N(ṽt + Wt α, σv² I), (5.53)

ṽt ∼ N(Mṽt−1 + Xt β, σṽ² I). (5.54)
Following Forester et al. (2007), we combine these conditional models (5.54) for
a heuristic about memory. Using iterative substitution, we arrive at
vt ∼ N( Σ_{c=1}^{C} M^c Xt β + Σ_{c=0}^{C−1} M^c (ṽt − Mṽt−1 − Xt β) + M^C ṽt−C + Wt α, σv² I ), (5.55)
where zi+1 represents the identity for the next patch to visit and pi is a vector of
probabilities that each patch will be chosen as the next place to visit. In a model
without memory, we can assume that the probability of moving from one patch to
another is affected by between-patch distances
dj|k = exp(−(rk,j /α)^β ), (5.57)

pj,i = dj|k / Σ_{l≠k} dl|k , (5.58)
where dj|k is the propensity of choosing patch j given that the animal is now at patch k,
which is located at distance rk,j . The case of j = k is excluded by definition because
a move is defined as the displacement from one patch to another. This propensity
changes with distance as a function of a scale parameter α and a shape parameter
β, both of which need to be estimated from the observed sequence of patch to patch
movements. This model can be expanded to include patch-level covariates, such as
the area (A) of the patches, yielding
dj|k = exp(−(rk,j /α)^β ), (5.59)

log(cj ) = β0 + β1 Aj , (5.60)

pj,i = dj|k cj / Σ_{l≠k} dl|k cl . (5.61)
For a simple movement model with memory, Morales et al. (2016) considered the
case in which the probability of visiting a particular foraging patch increased with
the number of previous visits to that patch. Also, the probability of visiting a new
patch (i.e., a patch where the total number of previous visits after i moves is equal to
zero) is a decreasing function of the total number of unique patches visited so far. To
represent the memory effects, we can write
mj = exp(−γ ui ) if vj,i = 0, and mj = 1 − exp(−(vj,i /a)^b ) if vj,i > 0, (5.62)
where ui is the number of unique patches visited so far, and vj,i holds the number of
previous visits to patch j. The parameter γ controls how quickly the individual avoids
choosing new patches as ui increases. We combine these values with the effect of
distance from current patch location to other patches and standardize to obtain
pj,i = dj|k mj / Σ_{l≠k} dl|k ml . (5.63)
Morales et al. (2016) analyzed data from elk newly translocated to the Rocky
Mountain foothills of Alberta, Canada, between December and February during
2000–2002, from three neighboring areas: Banff National Park, Cross Conservation
Area, and Elk Island National Park. The capture, handling, release, and fates
of these animals were described by Frair et al. (2007). A total of 20 elk were
selected for this study and fitted with GPS collars that recorded one location every
hour for up to 11 months. Foraging patches were delimited by combining dry/mesic
and wet meadows, shrubland, clear cuts, and reclaimed herbaceous classes.
The GPS telemetry data were transformed into patch-to-patch movement sequences.
Figure 5.14 shows an example of the spatial distribution and size of foraging patches
and a simplified elk trajectory for one of the tracked elk.
After specifying priors for unknown parameters, we fit the above models to the
elk data and computed DIC values of 353.53 for a model considering distances and
patch areas and 386.12 for the model considering distance and number of visits. The
DIC scores suggest that the model with distance and area of patches has a better
predictive ability. However, if we simulate trajectories with the fitted models, we see
that the model without memory implies that animals keep visiting new patches as they
FIGURE 5.14 Example of elk trajectory (in gray) simplified to a sequence of patch-to-patch
movements (black). Foraging patches are represented as circles with diameter proportional to
patch area.
FIGURE 5.15 Posterior predictive check on the number of unique patches visited by an elk
released into an unfamiliar landscape. Black dots show the observed increase of patches used
by the animal as they move. Gray shades show the 90% credible intervals from data simulated
using parameters sampled from the posterior distribution of a model that included distance
from current location to all patches and their area (panel a), and a model that considered the
effect of distance and the number of previous visits to all patches (using Equations 5.62 and
5.63) (panel b).
move on the landscape (Figure 5.15a). In contrast, the model that takes into account
the history of patch visits (Figure 5.15b) results in a saturation pattern in the number
of unique patches that the animal visits as it moves, similar to the observed pattern.
This simple example illustrates the importance of memory in movement patterns
but also the importance of checking for emergent properties of movement trajectories
when assessing model fit and comparing alternative models. The model considering
distance from current location to all available patches and their area performs well
in modeling the identity of the next patch visited by the animal but has no way to
prevent the animal from wandering through the whole network of patches. In con-
trast, including the number of previous visits results in a form of reinforcement in
the movement path and a more restricted use of space. Even though these patterns
are expected from theoretical grounds, their relevance is apparent when we compare
simulated trajectories from the fitted models.
A drawback of the patch transition models we just presented is that they do not
take into account the potential shadowing effects of nearby patches. That is, even
without memory involved, a particular patch can be less visited than expected by
distance and area effects because it is near other patches that compete as possible
destinations. Modeling movement in highly fragmented landscapes (where distances
between patches are large compared to the size of patches), Ovaskainen and Cornell
(2003) derived patch-to-patch movement probabilities, taking into account the spatial
configuration of the patch network. They showed that, if movement is modeled as a
simple diffusion, the probability that an individual leaving patch k will eventually
reach (before dying) a patch of radius ρj , given that the animal is at a distance rkj
is

Hjk = K0 (αm (ρj + rkj )) / K0 (αm ρj ), (5.64)
where K0 is the modified Bessel function of the second kind and zero order. The
constant αm is equal to √(cm /am ), with cm and am being the mortality and diffusion
rates in the matrix. Ovaskainen and Cornell (2003) express this probability as a combination of
probabilities pkj of visiting next patch j given that the animal has left patch k (i.e.,
the patch transition probabilities that we desire). Assuming that pkj depends only on
the individual just leaving patch k, but not on the full history of previous movements,
Ovaskainen and Cornell (2003) define
Hjk = pkj + Σ_{i≠j} pki Hij . (5.65)
For example, if the network is composed of just three patches, an individual leaving
from patch 1 can eventually reach patch 2 by either going there directly (p12 ) or going
to patch 3 first and then, eventually, going from patch 3 to patch 2 (p13 H32 ).
We can write Hp = h, where H is a matrix containing the values obtained from
Equation 5.65 and with the diagonal elements equal to one. The vector h has the
same values (i.e., probabilities of eventually getting to patch j) but, as we condition
on actually emigrating from a patch, we set hkj = 0 for all k = j. Ovaskainen and
Cornell (2003) used a linear solver to obtain the patch transition probabilities (pkj ),
which take into account the spatial configuration of the network. The probability of
an animal dying or leaving the patch network, given that it has just left patch k, is
equal to 1 − Σ_{j≠k} pkj .
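The linear-solver step can be sketched as follows, using scipy.special.k0 for the Bessel function in Equation 5.64. The patch coordinates, radii, and αm are hypothetical, and the code adopts the convention that H[j, k] is the probability of eventually reaching patch j after leaving patch k:

```python
import numpy as np
from scipy.special import k0  # modified Bessel function K_0

def transition_probs(centers, radii, alpha_m):
    """Solve for patch transition probabilities p_kj from the hitting
    probabilities H (Eq. 5.64), decomposing the event of reaching j from k
    over the first patch i visited (Eq. 5.65)."""
    centers = np.asarray(centers, dtype=float)
    radii = np.asarray(radii, dtype=float)
    J = len(radii)
    r = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    # H[j, k]: probability of eventually reaching patch j after leaving k.
    H = k0(alpha_m * (radii[:, None] + r)) / k0(alpha_m * radii[:, None])
    np.fill_diagonal(H, 1.0)                      # diagonal elements equal to one
    P = np.zeros((J, J))                          # P[k, j] = p_kj
    for k in range(J):
        idx = [j for j in range(J) if j != k]
        B = H[np.ix_(idx, idx)]                   # B[j, i]: reach j from i
        P[k, idx] = np.linalg.solve(B, H[idx, k])
    return P

P = transition_probs([(0.0, 0.0), (10.0, 0.0), (0.0, 12.0)],
                     [1.0, 1.0, 1.0], alpha_m=0.2)
```

The leftover probability in each row, 1 minus the row sum, is the probability of dying or leaving the network after departing that patch.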
Ovaskainen (2004) and Ovaskainen et al. (2008) used this approach to fit heterogeneous movement models to butterfly capture–recapture data. In principle, it is
possible to use this approach by replacing the diffusion result (5.64) with a generic
equation such as Equation 5.57 and considering the effect of previous visits by adding
weights to the transition probabilities derived from distance and area effects. However, this
approach is probably inaccurate when patches are close to each other or when patch
shapes or movement imply that we cannot ignore the location where animals are
leaving or entering patches.
where ηt and δt are the (state-dependent) shape parameters for the beta distribution, zt ∼ MN(1, Pzt−1 ), and zt ≡ (z1,t , z2,t , z3,t )′ .* As before for at , bt , and ρt (5.22
through 5.25), we have ηt ≡ η′zt and δt ≡ δ′zt , where η ≡ (η1 , η2 , η3 )′ and δ ≡
(δ1 , δ2 , δ3 )′ . For this particular model, z1,t = 1 indicates the “resting” state (characterized by short step lengths and smaller values for ωt ), z2,t = 1 indicates the “foraging”
state (moderate step lengths, low directional persistence, and larger values for ωt ),
and z3,t = 1 indicates the “transit” state (long step lengths, high directional persistence,
and larger values for ωt ; Figures 5.16 and 5.17).
Adopting a Bayesian framework, McClintock et al. (2013) used simple prior con-
straints on at , bt , ρt , ηt , and δt to reflect the expected relationships for the three
* This model belongs to the general class of multivariate HMMs (e.g., Zucchini et al. (2016), pp. 138–
141). In fact, the basic movement process models of Morales et al. (2004), Jonsen et al. (2005), and
McClintock et al. (2012) can all be considered multivariate HMMs because they all consist of multiple
data sets assumed to arise from a Markov process with a finite number of hidden states (e.g., rt and θt
constitute the two data sets in the movement process model proposed by Morales et al. 2004).
FIGURE 5.16 Predicted locations and movement behavior states for two harbor seals in
the United Kingdom: (a) a male in southeastern Scotland and (b) a female in northwest-
ern Scotland. Estimated movement states for the predicted locations correspond to “resting”
(“” symbol), “foraging” (“+” symbol), and “transit” (“×” symbol) movement behavior
states. Light gray points indicate observed locations (si ). Uncertainty in predicted locations
are indicated by 95% credible ellipses (dashed translucent gray lines).
behavioral states (e.g., ρ1 , ρ2 < ρ3 ). Using this approach, they were able to detect sig-
nificant differences in the proportion of time harbor seals allocated to each behavioral
state (i.e., “activity budgets”) in the pre- and postbreeding seasons (Table 5.3).
McClintock et al. (2013) also demonstrated the dangers of attempting to esti-
mate the “resting,” “foraging,” and “transit” movement behaviors based on horizontal
trajectory alone (i.e., rt and φt only). They found that 33% of time steps with ωt > 0.5
were assigned to the “resting” state when inferred from horizontal trajectory alone,
but only 1% of these were assigned to “resting” when inferred from both horizon-
tal trajectory and the auxiliary dive data using their integrated model. Similarly, they
found that 46% of time steps with ωt < 0.5 were assigned to “foraging” or “tran-
sit” based on trajectory alone, but only 12% of these time steps were assigned to
“foraging” or “transit” when using the auxiliary dive data. Owing to the difficulty
FIGURE 5.17 Estimated bivariate densities of harbor seal step length (rt ) and proportion of
time step spent diving below 1.5 m (ω) from McClintock et al. (2013). Densities were estimated
for three distinct movement behavior states (i.e., (a) “resting,” (b) “foraging,” and (c) “transit”),
where darker shades indicate higher relative densities. Time steps are Δt = 2 h.
of distinguishing more than two behavior states from horizontal trajectory alone, the
incorporation of auxiliary data is becoming commonplace when >2 behavioral states
are of interest (McClintock et al. 2013, 2014, 2015; Russell et al. 2014, 2015; Isojunno
and Miller 2015).
Figure 5.16b demonstrates an important consideration for discrete-time movement
models when the observed location data are temporally irregular. Notice that there
were many observed locations during some of the “transit” 2 h time steps, but the
temporally regular predicted movement path diverges somewhat from the observed
locations. Clearly, discretization of the movement path can introduce additional error
in the fit of the observed locations (si ) to the estimated true locations (μi ) that is
not attributable to location measurement error. Discretization error is often reduced
by choosing smaller time steps (Δt ), but generally, with smaller Δt comes greater
computational burden. Thus, with temporally irregular si , we are often posed with
a trade-off between choice of Δt and discretization error when fitting discrete-time
movement models.
The models of McClintock et al. (2013, 2015) and Russell et al. (2014, 2015) constitute “pseudo
3-D” movement models in the sense that they utilize horizontal trajectory and discrete
vertical categories to characterize >2 behavioral states for diving animals. However,
animal-borne tags equipped with accelerometers are enabling great strides toward
TABLE 5.3
Estimated Proportion of Δt = 2 Hour Time Steps Assigned to
Three Movement Behavior States (“Resting,” “Foraging,” and
“Transit”) for 17 (10 Male, 7 Female) Harbor Seals in the
United Kingdom
Prebreeding Postbreeding
Note: Two time periods are compared: “prebreeding” (prior to 1 June) and “post-
breeding” (after 1 June). State assignments are based on both location and
dive data for each time step.
discrete-time 3-D movement models in continuous space (e.g., Laplanche et al. 2015).
Even with a single behavioral state, continuous-space 3-D approaches are compli-
cated, data hungry, computationally intensive, and still in their infancy. Although
much work remains to be done, thinking in 3-D holds much promise for our shared
goal of building more realistic and biologically meaningful movement models.
more aimed at continuous-time settings like those presented in Chapter 6. Other com-
putational techniques, such as the use of sparse matrix storage and manipulation (e.g.,
Rue et al. 2009), will become essential for discrete-time animal movement modeling.
In time, more of these types of computational approaches will be borrowed from time
series and adapted for use in the analysis of telemetry data.
As we discussed in Chapter 1, a fundamental characteristic of animal movement
is that individuals interact with each other, both within and among species. Many
approaches have been proposed for modeling interactions among individuals in pop-
ulations and communities (e.g., Deneubourg et al. 1989; Couzin et al. 2002; Eftimie
et al. 2007; Giuggioli et al. 2012), and while most of them are purely mathematical
or statistically ad hoc, formal statistical models are now being developed regularly.
Delgado et al. (2014) presented a linear mixed model approach, modeling “sociabil-
ity” (the difference between observed and null proximity metrics) as a function of
random individual and temporal effects. Delgado et al. (2014) relate their approach
to that used in step selection functions (SSFs). Also in the context of SSFs, Potts
et al. (2014b) provided a concise review of approaches for studying interactions and
suggested that many can be considered in a step selection framework. More recently,
Russell et al. (2016a) and Scharf et al. (In Press) have developed formal hierarchi-
cal movement models that provide inference for interactions. Russell et al. (2016a)
focused on interactions using point process formulations and Scharf et al. (In Press)
developed a discrete-time movement model to provide inference for animal social
networks.
Finally, similar discrete-time models have been proposed in other branches of
ecology, for example, Clark (1998) and Clark et al. (2003) for implementations of
integro-difference models based on dispersal kernels. Such models could be modified
for the animal movement setting and fit using telemetry data.
6 Continuous-Time Models
FIGURE 6.1 One-dimensional spatial domain with movement probabilities for a move left
(φL ), move right (φR ), and no move (φN ).
where the Δ notation refers to changes in time and space (i.e., Δμ represents the
change in spatial location in the positive direction, for a 1-D spatial domain). If we
ultimately seek an Eulerian model on the probability of occupancy, p(μ, t), we need to
replace the Δ notation with differential notation. Turchin (1998) proceeds by expanding each of the probabilities in a Taylor series, truncating to remove higher-order
terms, and then substituting the truncated expansions back into Equation 6.1. The
Taylor series expansion yields a recurrence equation involving partial derivatives:
p = (φL + φN + φR )p − Δt (φL + φN + φR ) ∂p/∂t − Δt p ∂(φL + φN + φR )/∂t
 − Δμ (φR − φL ) ∂p/∂μ − Δμ p ∂(φR − φL )/∂μ + (Δμ²/2)(φL + φR ) ∂²p/∂μ²
 + Δμ² (∂p/∂μ) ∂(φL + φR )/∂μ + (Δμ²/2) p ∂²(φL + φR )/∂μ² + · · · , (6.2)
where we have defined p ≡ p(μ, t), φL ≡ φL(μ, t), φN ≡ φN(μ, t), and φR ≡ φR(μ, t) to simplify the expressions. Combining like terms and truncating higher-order terms in Equation 6.2 results in a PDE of the form

$$\frac{\partial p}{\partial t} = -\frac{\partial}{\partial \mu}(\beta p) + \frac{\partial^2}{\partial \mu^2}(\delta p), \qquad (6.3)$$
where β = Δμ(φR − φL)/Δt and δ = Δμ²(φR + φL)/(2Δt). The resulting model in Equation 6.3 is Eulerian and known as the Fokker–Planck or Kolmogorov equation (e.g., Risken 1989; Barnett and Moorcroft 2008).* We can scale up to the population level and consider the spatial intensity u(μ, t) of some number of total animals (N) by letting u(μ, t) ≡ Np(μ, t). In this context, assuming for the moment that there is no advection (i.e., drift or bias) component (i.e., β = 0), we have the ecological diffusion equation
* Some mathematicians may object to the use of δ for a diffusion coefficient, preferring instead D or μ, but we use δ to stay consistent with the rest of the mathematical notation in this book.
$$\frac{\partial u}{\partial t} = \frac{\partial^2}{\partial \mu^2}(\delta u), \qquad (6.4)$$
where the process of interest is u ≡ u(μ, t), and δ ≡ δ(μ, t) represents the diffusion coefficient, which could vary over space and time. In the animal movement context, the diffusion parameter (δ) represents animal motility. One could arrive at an alternative reduction of the Fokker–Planck equation by assuming that δ = 0, thus implying that animal movement is driven by advection only. Though perhaps less intuitive, we might expect such behavior in wind- or water-advected populations (e.g., egg dispersal in a river system) or in cases where there is strong attraction or repulsion to spatial features.
There are other ways to derive the ecological diffusion model in Equation 6.4 (Turchin 1998); however, we feel that this perspective may be directly beneficial to those modeling spatio-temporal population dynamics, as the recent literature suggests (e.g., Wikle and Hooten 2010; Cressie and Wikle 2011; Lindgren et al. 2011). The properties of ecological diffusion (6.4) are different from those of plain or Fickian diffusion. The fundamental difference is that the diffusion coefficient (δ) appears on the inside of the two spatial derivatives rather than between them (Fickian, ∂u/∂t = (∂/∂μ)δ(∂/∂μ)u) or on the outside (plain, ∂u/∂t = δ(∂²/∂μ²)u). Ecological diffusion describes a much less smooth process u(μ, t) than Fickian or plain diffusion, and allows for motility-driven congregation to differ sharply among neighboring habitat types. In some areas, animals may move slowly, perhaps to forage, whereas in other areas, they move quickly, as in exposed terrain. The resulting behavior shows a congregative effect in areas of low motility (i.e., δ ↓) and a dispersive effect in areas of high motility (i.e., δ ↑). In fact, depending on the boundary conditions, the steady-state solution implies that u is proportional to the inverse of δ.
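The congregative effect can be seen numerically. The following Python sketch (ours, not from the book) solves the 1-D ecological diffusion equation with an explicit finite-difference scheme and zero-flux boundaries; the habitat layout, motility values, and grid settings are all hypothetical.

```python
import numpy as np

# Our illustration (not from the book): explicit finite differences for the 1-D
# ecological diffusion equation du/dt = d^2(delta u)/dmu^2 with zero-flux
# boundaries. Habitat layout and motility values are hypothetical.
n, dx = 40, 1.0
delta = np.where(np.arange(n) < n // 2, 0.5, 2.0)  # low motility (left), high (right)
u = np.full(n, 1.0 / n)                            # uniform initial occupancy
dt = 0.2 * dx**2 / delta.max()                     # satisfies explicit stability limit
for _ in range(60000):
    w = delta * u                                  # the operator acts on delta * u
    w_pad = np.pad(w, 1, mode="edge")              # ghost cells give zero flux at edges
    u = u + dt * (w_pad[2:] - 2.0 * w + w_pad[:-2]) / dx**2
u /= u.sum()
# At steady state, delta * u is constant, so u is proportional to 1 / delta:
ratio = u[:n // 2].mean() / u[n // 2:].mean()      # approaches 2.0 / 0.5 = 4
```

With these settings, occupancy congregates in the low-motility half at four times the density of the high-motility half, matching the steady-state result that u is proportional to 1/δ.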
The Lagrangian–Eulerian connection in ecological diffusion directly relates to the continuous- versus discrete-time formulations of animal movement models. We
presented the Lagrangian–Eulerian connection for one particular scenario only, but
similar approaches can be used to connect many other specifications for movement
models in both Lagrangian and Eulerian contexts. The Taylor series expansion (6.2)
suggests that the discrete-time model is more general because we are truncating
higher-order terms to arrive at the continuous formulation. However, the continuous
model allows for more compact notation and facilitates a continuous mathematical
analysis, which can have advantages from an implementation perspective. In fact,
Garlick et al. (2011) and Hooten et al. (2013a) show that aspects of the resulting con-
tinuous model (6.4) can be exploited to yield approximate solutions that are highly
efficient to obtain numerically. Specifically, Garlick et al. (2011) and Hooten et al.
(2013a) use a type of perturbation theory called the method of multiple scales, or
homogenization, to arrive at an approximate solution to the PDE that is fast enough
that it can be used iteratively in a statistical algorithm for large spatial and temporal
domains. Such improvements in computational efficiency may not be possible using
the discrete-time model (6.1) directly.
where we explicitly use the parenthetical function notation (e.g., b(ti)) that depends on time directly, rather than the subscript notation (e.g., bi), and the change in time is Δi = ti − ti−1. In this case, to let the individual step lengths correspond to time intervals between b(ti) and b(ti−1), we let the displacement vectors ε(ti) depend on Δi so that ε(ti) ∼ N(0, Δi I). For large gaps in time, the displacement distance (i.e., step length) of the individual during that time period will be larger on average. For simplicity, we consider the case where all the time intervals are equal (i.e., Δi = Δt, ∀i).
An alternative way to write the model for the current position b(ti) is as a sum of individual steps b(ti) − b(ti−1), beginning with the initial position at the origin b(t0) = (0, 0)′ and t0 = 0, such that

$$\begin{aligned}
b(t_i) &= \sum_{j=1}^{i} \left(b(t_j) - b(t_{j-1})\right) \qquad &(6.6)\\
&= \sum_{j=1}^{i} \varepsilon(t_j). \qquad &(6.7)
\end{aligned}$$
For example, Figure 6.2 shows two simulated realizations of a Brownian motion process based on 1000 time steps to accentuate the necessary computational discretization. Forcing the time intervals between positions to be increasingly small (i.e., Δt → 0) puts the model into a continuous-time setting. Then the individual steps become small, but the sum is over infinitely many random quantities. Thus, the continuous-time model arises as the limit

$$b(t_i) = \lim_{\Delta t \to 0} \sum_{j=1}^{i} \varepsilon(t_j), \qquad (6.8)$$
* The standard PDE setting allows the probability of individual presence to evolve over time dynamically,
but it is deterministic itself.
FIGURE 6.2 Joint (a, d) and marginal plots (b, c, e, f) of two simulated Brownian motion processes based on Δt = 1 and n = 1000 in both cases. Panels (a–c) and (d–f) show b(t), b1(t), and b2(t).
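Realizations like those in Figure 6.2 can be simulated directly from Equation 6.7 by accumulating independent Gaussian displacements; this Python sketch (ours; the seed and settings are arbitrary) illustrates the construction.

```python
import numpy as np

# Our illustration: 2-D Brownian motion built from Equation 6.7 as a cumulative
# sum of independent Gaussian displacements (Delta t = 1; seed arbitrary).
rng = np.random.default_rng(1)
n, dt = 1000, 1.0
eps = rng.normal(scale=np.sqrt(dt), size=(n, 2))   # displacement vectors eps(t_j)
b = np.cumsum(eps, axis=0)                         # b(t_i) = sum of eps(t_1..t_i)
```

By construction, differencing the simulated path recovers the original displacement vectors exactly.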
which resembles the operator known as the Ito integral from stochastic calculus. The resulting sequence b(t), for all t, is known as the Wiener process or Brownian motion.* It is more convenient to write Equation 6.8 using traditional integral notation, but because we are “integrating” over a random quantity, the traditional deterministic integral notation is not technically valid. Nonetheless, the traditional
* Hence, the b notation stands for “Brownian.” Note that we use a lowercase bold b to stay consistent with
our vector notation; however, it is common to see an uppercase B in related literature.
notation is still used frequently, and thus, it is common to see Ito integrals expressed as

$$b(t) = \int_0^t \frac{db(\tau)}{d\tau}\, d\tau, \qquad (6.9)$$
where db(t) = ε(t), which is a similar abuse of notation that implies the individual displacement vectors relate to the “derivative” of b(t) as Δt → 0. Loosely, we can think of db(t) = b(t) − b(t − Δt) as Δt → 0. Therefore, the standard calculus notation for integrals and derivatives is often used for simplicity in stochastic differential equation models (e.g., Brownian motion) when, in fact, the summation notation from Equation 6.8 should be used instead. Finally, it is also common to write the integral
for b(t) from Equation 6.9 as
$$b(t) = \int_0^t db(\tau), \qquad (6.10)$$
because the integral of a constant function with respect to the Brownian process b(t)
is related to the integral of ε(t) with respect to time.
The original displacement vectors ε(t) are random; thus, the Brownian process b(t) is also random. In fact, in the type of Brownian motion process we described, the expectation of b(t) is zero and the variance is t. The covariance of the process at times ti and tj is min(ti, tj) and the correlation is √(min(ti, tj)/max(ti, tj)), but the covariance between two separate differences in the Brownian process is zero.* Brownian processes also have the useful property that b(ti) − b(tj) ∼ N(0, |ti − tj| I), where |ti − tj| represents the time between ti and tj.
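These covariance properties can be checked by Monte Carlo; in this Python sketch (ours; the times and sample sizes are arbitrary) we estimate cov(b(ti), b(tj)) and the correlation from many simulated 1-D paths.

```python
import numpy as np

# Our illustration: Monte Carlo check of cov(b(t_i), b(t_j)) = min(t_i, t_j)
# and corr = sqrt(min/max), using many 1-D Brownian paths (Delta t = 1).
rng = np.random.default_rng(2)
n_paths, n_steps = 100000, 40
b = np.cumsum(rng.normal(size=(n_paths, n_steps)), axis=1)  # b(1), ..., b(40)
ti, tj = 20, 40
cov_hat = np.mean(b[:, ti - 1] * b[:, tj - 1])   # expectations are zero
corr_hat = cov_hat / np.sqrt(ti * tj)            # should be near sqrt(20/40)
```

With ti = 20 and tj = 40, the estimated covariance is near min(20, 40) = 20 and the estimated correlation is near √(20/40) ≈ 0.71.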
To generalize the Brownian motion process so that it can be located and scaled for a specific position process μ(ti), we begin with the discrete-time specification again, such that

$$\mu(t_i) = \mu(t_{i-1}) + \varepsilon(t_i). \qquad (6.11)$$

Then, to relocate the process, we assume that the initial position is μ(0) and, to scale the process, we let the displacement vectors ε(ti) ∼ N(0, σ²Δt I), where σ² stretches or shrinks the trajectory in space. Using the Brownian notation, this model results in

$$\mu(t) = \mu(0) + \sigma b(t), \qquad (6.12)$$

for any time t. Figure 6.3 shows the simulated Brownian motion processes based on the same ε(ti) from Figure 6.2, but relocated using Equation 6.12 and initial position at μ(0) = (100, 100)′.
* This is known as the “independent increments” property and arises from the fact that each difference in
Brownian processes represents an ε and they are independent Gaussian random variables.
FIGURE 6.3 Joint (a, d) and marginal plots (b, c, e, f) of two simulated Brownian motion processes based on μ(0) = (100, 100)′, Δt = 1, σ² = 1, and n = 1000 in both cases. Panels (a–c) and (d–f) show μ(t), μ1(t), and μ2(t). Horizontal gray lines correspond to the initial position μ(0).
FIGURE 6.4 One hundred steps of two realizations of Brownian bridge processes (dark lines) using (a) σ1² = 0.01 and (b) σ2² = 0.05. Both processes are based on starting point μ(ti−1) = (0, 0)′ (open circle) and ending point μ(ti) = (1, 1)′ (closed circle). Starting time was ti−1 = 0 and ending time was ti = 1 for these simulations.
The Brownian bridge arises by conditioning on a pair of known endpoints, μ(ti−1) and μ(ti):

$$\mu(t) \sim \mathrm{N}\!\left(\mu(t_{i-1}) + \frac{t - t_{i-1}}{t_i - t_{i-1}}\left(\mu(t_i) - \mu(t_{i-1})\right),\; \sigma^2\,\frac{(t - t_{i-1})(t_i - t)}{t_i - t_{i-1}}\, I\right), \qquad (6.13)$$

for ti−1 < t < ti. We can see that Equation 6.13 is a multivariate normal distribution centered at a scaled distance between the endpoints μ(ti−1) and μ(ti). The variance of this process at time t decreases as a function of closeness in time to the starting (ti−1) or ending (ti) time. Figure 6.4 shows two realizations from two simulated Brownian bridge processes based on σ1² = 0.01 and σ2² = 0.05, starting and ending points at μ(ti−1) = (0, 0)′ and μ(ti) = (1, 1)′, and ti − ti−1 = 1.
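Equation 6.13 also suggests a simple sequential simulation algorithm: draw each interior position conditionally on the previous position and the fixed endpoint. The following Python sketch (ours; the helper name and settings are illustrative) mimics Figure 6.4.

```python
import numpy as np

# Our illustration of Equation 6.13: sequential simulation of a 2-D Brownian
# bridge between two fixed endpoints, mimicking Figure 6.4.
def brownian_bridge(start, end, t0, t1, n_steps, sigma2, rng):
    times = np.linspace(t0, t1, n_steps + 1)
    path = np.empty((n_steps + 1, 2))
    path[0] = start
    for k in range(1, n_steps):
        tp, t = times[k - 1], times[k]
        w = (t - tp) / (t1 - tp)                    # pull toward the known endpoint
        mean = path[k - 1] + w * (end - path[k - 1])
        var = sigma2 * (t - tp) * (t1 - t) / (t1 - tp)
        path[k] = rng.normal(mean, np.sqrt(var))
    path[n_steps] = end                             # endpoint is fixed
    return path

rng = np.random.default_rng(3)
path = brownian_bridge(np.zeros(2), np.ones(2), 0.0, 1.0, 100, 0.01, rng)
```

The simulated path always begins and ends exactly at the conditioning endpoints, with the largest variability near the middle of the time interval.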
For situations with Gaussian measurement error, the observed telemetry loca-
tions could be modeled as described in previous animal movement models with
s(ti ) ∼ N(μ(ti ), σs2 I) for i = 1, . . . , n. This adds a natural hierarchical structure to
the model. However, most common methods for implementing these models inte-
grate over the Brownian motion process (μ(t)) to fit the model using likelihood
methods.
Horne et al. (2007) propose an approach that conditions on every other observa-
tion as an endpoint, using the middle locations as data to fit the Brownian bridge
model. Their idea was to exploit the independence property using triplets of the data.
After passing through the data once, they cycle back through it again after shifting
the triplets, ultimately yielding a “sample size” of approximately n/2 observations.
Despite the fact that this procedure results in a computationally efficient method for
fitting models to telemetry data, Pozdnyakov et al. (2014) suggest several potential
problems that could arise with it. First, the method Horne et al. (2007) described
for forming the likelihood produces a bias in the estimation of the movement vari-
ance (σ 2 ) that increases as the measurement error variance (σs2 ) increases. Second,
the movement and measurement error variances are not identifiable in the likelihood,
especially with equal time intervals between observations. Third, only approximately
half of the data are used to fit the model.
Pozdnyakov et al. (2014) demonstrate that the variance of the observed telemetry location is

$$\mathrm{var}(s(t_i)) = \sigma^2 t_i I + \sigma_s^2 I \qquad (6.14)$$

and the covariance is

$$\mathrm{cov}(s(t_i), s(t_j)) = \sigma^2 \min(t_i, t_j)\, I. \qquad (6.15)$$
Thus, the covariance matrix for the joint telemetry data is dense (completely filled with nonzero elements). However, the covariance matrix for the observed velocities (i.e., s(ti) − s(ti−1)) is tri-diagonal, but not diagonal, meaning not all off-diagonal elements of the matrix are zero. In fact, the measurement error variance occurs on the off-diagonals, which implies that the non-diagonal nature of the covariance matrix for the joint process becomes increasingly important as the measurement error increases. The diagonal elements of the covariance matrix for the joint velocity process are equal to σ²(ti − ti−1) + 2σs². Pozdnyakov et al. (2014) suggest using the joint distribution of all velocities (which is multivariate normal) as the likelihood to fit the Brownian motion model instead of the Brownian bridge methods proposed by Horne et al. (2007), and claim the approach is just as easy to implement.
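The tri-diagonal structure can be made concrete by constructing the covariance matrix of the observed increments analytically. This Python sketch follows our own derivation (each observed increment carries the Brownian variance plus two independent measurement errors, and adjacent increments share one error), not code from Pozdnyakov et al. (2014); the parameter values are arbitrary.

```python
import numpy as np

# Our derivation (not code from Pozdnyakov et al. 2014): covariance matrix of
# the observed increments s(t_i) - s(t_{i-1}) when the path is Brownian with
# variance sigma^2 and each location has independent error variance sigma_s^2.
sigma2, sigma_s2 = 1.0, 0.25
t = np.arange(0.0, 11.0)                 # equally spaced observation times
n = len(t) - 1                           # number of increments
C = np.zeros((n, n))
for i in range(n):
    C[i, i] = sigma2 * (t[i + 1] - t[i]) + 2.0 * sigma_s2   # two error terms per increment
    if i + 1 < n:
        C[i, i + 1] = C[i + 1, i] = -sigma_s2               # adjacent increments share an error
off2 = np.abs(np.triu(C, 2)).sum()       # zero: the matrix is tri-diagonal
```

As the measurement error variance grows, the off-diagonal entries (−σs²) grow in magnitude relative to the diagonal, which is exactly why ignoring them becomes increasingly costly.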
Thus, rather than condition on an incremental sequence of endpoints, there is value
in modeling the animal movement process as a true dynamic continuous-time process.
We return to the covariance modeling perspective of Pozdnyakov et al. (2014) for a
broader class of movement models based on continuous-time stochastic processes in
the sections that follow.
Other applications of Brownian bridge models for telemetry data include Liu et al.
(2014) and Liu et al. (2015), who use Brownian bridges in a hierarchical model to
characterize dead-reckoned paths of marine mammals. Liu et al. (2014) developed
a computationally efficient Bayesian melding approach for path reconstruction that
provides improved inference as compared with linear interpolation procedures.
$$\mu(t_i) = \mu(0) + \sum_{j=1}^{i} \left(\mu(t_j) - \mu(t_{j-1})\right). \qquad (6.16)$$
3. Substitute the model for μ(tj ) into the right-hand side of Equation 6.16.
4. Take the limit of μ(ti ) as t → 0 to obtain the Ito integral representation of
the continuous-time process.
5. If desired, rewrite the model in terms of the Ito derivative of μ(t).
Consider the discrete-time VAR(1) model with attraction

$$\mu(t_j) = M\mu(t_{j-1}) + (I - M)\mu^* + \varepsilon(t_j), \qquad (6.17)$$

where M is the VAR(1) propagator matrix, μ∗ is the attracting location, and ε(t) ∼ N(0, σ²Δt I). Substituting this conditional discrete-time model into Equation 6.16 for μ(tj) results in
$$\begin{aligned}
\mu(t_i) &= \mu(0) + \sum_{j=1}^{i} \left(\mu(t_j) - \mu(t_{j-1})\right) \qquad &(6.18)\\
&= \mu(0) + \sum_{j=1}^{i} \left(M\mu(t_{j-1}) + (I - M)\mu^* + \varepsilon(t_j) - \mu(t_{j-1})\right) \qquad &(6.19)\\
&= \mu(0) + \sum_{j=1}^{i} \left((M - I)(\mu(t_{j-1}) - \mu^*) + \varepsilon(t_j)\right) \qquad &(6.20)\\
&= \mu(0) + \sum_{j=1}^{i} (M - I)(\mu(t_{j-1}) - \mu^*) + \sum_{j=1}^{i} \varepsilon(t_j). \qquad &(6.21)
\end{aligned}$$
We recognize the last term, ∑_{j=1}^{i} ε(tj), from the previous section as the building block of Brownian motion. Thus, taking the limit of the right-hand side as Δt → 0 results in the Ito integral equation

$$\begin{aligned}
\mu(t) &= \mu(0) + \int_0^t (M - I)(\mu(\tau) - \mu^*)\, d\tau + \int_0^t \sigma\, db(\tau) \qquad &(6.22)\\
&= \mu(0) + \int_0^t (M - I)(\mu(\tau) - \mu^*)\, d\tau + \sigma b(t). \qquad &(6.23)
\end{aligned}$$
The integral Equation 6.23 contains three components: the quantity μ(0), which provides the proper starting position; the attracting process ∫₀ᵗ (M − I)(μ(τ) − μ∗) dτ; and the scaled Brownian motion process σb(t). Finally, by taking the Ito derivative of both sides of Equation 6.23, we arrive at the stochastic differential equation (SDE) for the position process,

$$d\mu(t) = (M - I)(\mu(t) - \mu^*)\, dt + \varepsilon(t). \qquad (6.25)$$
Note that ε(t) ∼ N(0, σ² dt I), and the form in Equation 6.25 is common in the SDE literature, but it can also be written as

$$\frac{d\mu(t)}{dt} = (M - I)(\mu(t) - \mu^*) + \frac{\varepsilon(t)}{dt}, \qquad (6.26)$$
with the usual derivative with respect to time (dμ(t)/dt) on the left-hand side. Now
we recognize Equation 6.26 as a differential equation with an additive term cor-
responding to differentiated Brownian motion. This is what sets SDEs apart from
deterministic differential equations with additive error. The “error” term (i.e., ε(t))
itself is wrapped up in the derivative of the position process μ(t).
We can rewrite the stochastic integral equation (SIE) (6.23) in words as

position(t) = initial position + cumulative drift + cumulative diffusion. (6.27)

The cumulative drift integrates (i.e., adds up) the drift process, which, in the case of Equation 6.23, consists of the propagated displacements from the attracting point μ∗. The cumulative diffusion integrates the uncorrelated steps or “errors” to arrive at a correlated movement process (described earlier as Brownian motion). Together, these two components combine to provide a realistic continuous-time movement model for animals such as central place foragers. However, the expression in Equation 6.27 also provides a very general way to characterize many different SIE models by modifying the drift and diffusion components directly.*
Figure 6.5 shows two simulated stationary SDE processes arising from Equa-
tion 6.25 assuming M = ρI. As in the discrete-time models in Chapter 5, the
stochastic process in Figure 6.5a (ρ = 0.75) is less smooth than that in Figure 6.5d
(ρ = 0.99), but both processes are attracted to the point μ∗ = (0, 0) .
We began with a simple Brownian motion process with no attraction in the previ-
ous section and we added a drift term to it, resulting in a more flexible model for true
animal position processes. The resulting SIE (6.23) is not Brownian, but rather con-
tains a Brownian component. In fact, the SDE in Equation 6.26 represents one way
to specify an Ornstein–Uhlenbeck (OU) process (Dunn and Gipson 1977; Blackwell
1997).
FIGURE 6.5 Two simulated stationary SDE processes (dark lines) using (a–c) ρ = 0.75 and (d–f) ρ = 0.99. Both processes are based on attracting point μ∗ = (0, 0)′ and variance σ² = 1. Panels (a) and (d) show the joint process μ(t), while panels (b–c) and (e–f) show the marginal processes μ1(t) and μ2(t).
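The discrete-time analogue of the SDE with M = ρI is easy to simulate; this Python sketch (ours; all parameter values are illustrative) produces trajectories like those in Figure 6.5.

```python
import numpy as np

# Our illustration: discrete-time analogue of the SDE with M = rho * I, an
# OU-type process attracted to mu_star (all settings illustrative).
rng = np.random.default_rng(4)
n, rho, sigma = 200, 0.75, 1.0
mu_star = np.zeros(2)
mu = np.zeros((n, 2))
for i in range(1, n):
    mu[i] = (rho * mu[i - 1] + (1.0 - rho) * mu_star
             + rng.normal(scale=sigma, size=2))    # eps(t_i) with Delta t = 1
```

Larger values of ρ yield smoother, more persistent paths, while ρ < 1 keeps the process stationary around the attracting point.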
steps. However, the OU process is often expressed in exponential notation (e.g., Dunn and Gipson 1977; Blackwell 2003; Johnson et al. 2008a).
To arrive at the OU expression involving exponentials, we note that it is more common in mathematical modeling to start with the SDE involving the velocity process and then “solve” it to find the position process μ(t). To demonstrate how solutions to the SDE are typically derived, we begin with a simplified SDE based on Equation 6.26 in 1-D space and with attractor μ∗ = 0, Brownian variance σ² = 1, and autocorrelation parameter θ, such that

$$\frac{d\mu(t)}{dt} = -\theta\,\mu(t) + \frac{db(t)}{dt}. \qquad (6.28)$$
One solution technique involves a variation of parameters method. In this case, multiply both sides of Equation 6.28 by e^{θt} and then integrate both sides from 0 to t. The e^{θt} term actually simplifies the required integration and allows for an analytical solution. Thus, multiplying both sides of Equation 6.28 by e^{θt} and integrating results in

$$\int_0^t e^{\theta\tau} \frac{d\mu(\tau)}{d\tau}\, d\tau = -\theta \int_0^t e^{\theta\tau} \mu(\tau)\, d\tau + \int_0^t e^{\theta\tau}\, db(\tau). \qquad (6.30)$$
The integral on the left-hand side of Equation 6.30 can be solved using integration by parts:

$$\int_0^t e^{\theta\tau} \frac{d\mu(\tau)}{d\tau}\, d\tau = e^{\theta t}\mu(t) - \mu(0) - \int_0^t \mu(\tau)\,\theta e^{\theta\tau}\, d\tau. \qquad (6.31)$$
Substituting Equation 6.31 into the left-hand side of Equation 6.30 yields

$$e^{\theta t}\mu(t) - \mu(0) - \int_0^t \mu(\tau)\,\theta e^{\theta\tau}\, d\tau = -\theta \int_0^t e^{\theta\tau}\mu(\tau)\, d\tau + \int_0^t e^{\theta\tau}\, db(\tau), \qquad (6.32)$$

and, because the integrals involving μ(τ) cancel, solving for μ(t) gives

$$\mu(t) = \mu(0)e^{-\theta t} + \int_0^t e^{-\theta(t-\tau)}\, db(\tau). \qquad (6.33)$$
The resulting solution has several interesting properties. First, notice that, as t → ∞, the first term on the right-hand side of Equation 6.33 goes away (i.e., μ(0)e^{−θt} → 0). This result implies that, as the period of time increases, the initial position has less effect on the solution for μ(t). Second, the integral on the right-hand side is a convolution of exp(−θ(t − τ)) with a white noise process (Iranpour et al. 1988). To determine the mean and variance of this random variable, we return to the infinite summation representation of the Ito integral. Thus,

$$\int_0^t e^{-\theta(t - \tau)}\, db(\tau) = \lim_{\Delta t \to 0} \sum_{j=1}^{i} e^{-\theta(t - t_j)}\left(b(t_j) - b(t_{j-1})\right), \qquad (6.34)$$
where t0 = 0 and ti = t. For any t, ∑_{j=1}^{i} e^{−θ(t−tj)}(b(tj) − b(tj−1)) is a weighted sum of independent normal random variables with mean zero and variances
$$\begin{aligned}
\lim_{\Delta t \to 0} \sum_{j=1}^{i} \sigma^2 e^{-2\theta(t - t_j)}\,\Delta t &= \sigma^2 \int_0^t e^{-2\theta(t - \tau)}\, d\tau \qquad &(6.35)\\
&= \frac{\sigma^2}{2\theta}\left(1 - e^{-2\theta t}\right). \qquad &(6.36)
\end{aligned}$$
The conditional process therefore satisfies

$$\mu(t)\,|\,\mu(\tau) \sim \mathrm{N}\!\left(\mu(\tau)\,e^{-\theta(t-\tau)},\; \frac{\sigma^2}{2\theta}\left(1 - e^{-2\theta(t-\tau)}\right)\right). \qquad (6.37)$$

Thus, as the time gap increases between μ(t) and μ(τ), the conditional process reverts to zero and the variance converges to σ²/(2θ). However, with small |t − τ|, μ(t) will be closer to μ(τ). Understanding stochastic processes in terms of covariance will become important in the following sections.
Figure 6.6 shows two 1-D conditional univariate stochastic processes simulated from Equation 6.37 based on two different values for θ. Figure 6.6a shows the conditional process based on a relatively large θ = 1, while Figure 6.6b shows the conditional process based on a much smaller θ = 0.001. While both processes are conditioned on μ(τ) = 1, the conditional process in Figure 6.6a shows very little memory of μ(τ) = 1, whereas the process in Figure 6.6b clearly indicates longer-range dependence on μ(τ) = 1.
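The variance in Equation 6.36 can be verified numerically by approximating the Ito integral with the finite sum in Equation 6.34; this Python sketch (ours; θ, t, and the discretization are arbitrary, with σ² = 1) compares the Monte Carlo variance to σ²(1 − e^{−2θt})/(2θ).

```python
import numpy as np

# Our illustration: Monte Carlo check of Equation 6.36 with sigma^2 = 1. The
# Ito integral is approximated by the finite sum in Equation 6.34 on a grid.
rng = np.random.default_rng(5)
theta, t, dt, n_paths = 1.0, 3.0, 0.01, 20000
grid = np.arange(dt, t + dt / 2.0, dt)             # t_1, ..., t_i = t
weights = np.exp(-theta * (t - grid))              # exp(-theta (t - t_j))
db = rng.normal(scale=np.sqrt(dt), size=(n_paths, grid.size))
var_hat = (db @ weights).var()                     # variance of the weighted sums
var_theory = (1.0 - np.exp(-2.0 * theta * t)) / (2.0 * theta)
```

The empirical variance of the weighted sums agrees with the limiting expression up to Monte Carlo and discretization error.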
A more general SDE for movement can be written as

$$\frac{d\mu(t)}{dt} = g(\mu(t)) + \frac{\varepsilon(t)}{dt}, \qquad (6.38)$$
where the function g(μ(t)) acts as the drift component of the SDE model and we
assume ε(t) ∼ N(0, σ 2 dtI) in this section. In the previous section, we arrived at
the functional form g(μ(t)) = (M − I)(μ(t) − μ∗ ) for the drift component based
FIGURE 6.6 Two 1-D simulated conditional processes (dark lines) from Equation 6.37 based
on σ 2 = 1, τ = 1 (vertical gray line), μ(τ ) = 1 (open circle). (a) θ = 1 and (b) θ = 0.001.
* Several notational issues arise here. First, ∇ refers to the gradient operator; thus, ∇p(μ(t)) = (dp/dμ1, dp/dμ2)′. Second, we use p to represent the potential function because the first letter of potential is p. In many of the papers by Brillinger and Preisler, H is used for the potential function, r is used for position, and μ is used for drift. Yes, this can be confusing at first, but to remain consistent with other literature and our expressions thus far, a notational change is necessary.
FIGURE 6.7 Example potential function p(μ(t)) simulated from a correlated Gaussian
random process.
Our goal, from an inferential perspective, is to learn about the influences of the potential function on movement. Thus, as in most statistical models, we can parameterize the potential function in various ways depending on the desired inference. If the goal is to learn about the influence of a single attracting point on movement, we can retain the SDE model from the previous section, or we could use the potential concept directly, letting p(μ(t), μ∗) ≡ ½(μ(t) − μ∗)′(μ(t) − μ∗), one half the squared L2 distance between μ(t) and μ∗. Using this potential function, we arrive at g(μ(t)) = −(μ1 − μ∗1, μ2 − μ∗2)′ = −(μ(t) − μ∗) for a gradient field. The resulting gradient field implies that the mean structure for the velocity dμ(t)/dt will be zero when μ(t) is close to the attracting point μ∗, imposing no particular directional bias on movement when the animal is near the central place. As the animal ventures far from the attracting point μ∗, the mean structure implied by the gradient will bias movement back toward the central place (Figure 6.8a).
We can attenuate the attractive force by incorporating a multiplicative term that
decreases the velocity as needed. For example, if we use g(μ(t)) ≡ −(1 − ρ)(μ(t) −
μ∗ ) such that 0 < ρ < 1, we arrive at the same SDE model as Equation 6.26. In
that case, the propagator matrix is M ≡ ρI and a unity autocorrelation parameter
(i.e., ρ = 1) will remove the attractive effect completely, allowing the individual to
wander aimlessly. As in the time series context, values of ρ less than one will ensure
the individual’s path is stationary over time, forcing the animal to move toward the
central place μ∗ eventually. For example, Figure 6.8b shows the potential function obtained by integrating g(μ(t)) based on ρ = 0.5 and μ∗ = (0.5, 0.5)′.
The potential function in Figure 6.8b is flatter than that in Figure 6.8a because
the autocorrelation is stronger (ρ = 0.5 vs. ρ = 0 in Figure 6.8a). As ρ → 1, the
potential function becomes perfectly flat, allowing the individual to move without an
attracting force.
FIGURE 6.8 (a) Potential surface p(μ(t), μ∗) = ½(μ(t) − μ∗)′(μ(t) − μ∗) based on a single attracting point μ∗ (black circle). (b) Potential surface p(μ(t), μ∗) = ((1 − ρ)/2)(μ(t) − μ∗)′(μ(t) − μ∗) based on a single attracting point μ∗ (black circle) and ρ = 0.5.
The number of attracting points can be increased easily by letting the potential function be a sum or product of several individual functions. For example, in the case of two additive attractors, μ∗1 and μ∗2, we have

$$p(\mu(t)) = -\tfrac{1}{2}\left[\mu(t)\,|\,\mu_1^*, \sigma_1^2\right] - \tfrac{1}{2}\left[\mu(t)\,|\,\mu_2^*, \sigma_2^2\right], \qquad (6.39)$$

where [μ(t) | μ∗1, σ1²] and [μ(t) | μ∗2, σ2²] are bivariate Gaussian density functions with means μ∗1 and μ∗2 and variances σ1² and σ2². The potential function in Equation 6.39 results in a complicated gradient function (g(μ(t))) with a saddle point between the two attracting points (Figure 6.9).
Another way to specify the potential function is to let it be a polynomial and interaction function of the elements of position (e.g., Kendall 1974; Brillinger 2010). For example, the potential function

$$p(\mu(t), \beta) = \beta_1\mu_1(t) + \beta_2\mu_2(t) + \beta_3\mu_1^2(t) + \beta_4\mu_2^2(t) + \beta_5\mu_1(t)\mu_2(t) \qquad (6.40)$$

will allow for learning about the best-fitting elliptical home range by estimating the coefficients β.
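The drift implied by a polynomial potential such as Equation 6.40 can be obtained by central differences; in this Python sketch (ours; the coefficients β are hypothetical) the numerical gradient matches the analytical one.

```python
import numpy as np

# Our illustration: drift g(mu) = -grad p(mu) for the polynomial potential in
# Equation 6.40, via central differences; coefficients beta are hypothetical.
def potential(mu, beta):
    m1, m2 = mu
    return (beta[0] * m1 + beta[1] * m2 + beta[2] * m1**2
            + beta[3] * m2**2 + beta[4] * m1 * m2)

def drift(mu, beta, d=1e-5):
    g1 = potential((mu[0] + d, mu[1]), beta) - potential((mu[0] - d, mu[1]), beta)
    g2 = potential((mu[0], mu[1] + d), beta) - potential((mu[0], mu[1] - d), beta)
    return -np.array([g1, g2]) / (2.0 * d)         # negative numerical gradient

beta = np.array([0.0, 0.0, 1.0, 2.0, 0.0])         # concave-up elliptical potential
g = drift(np.array([1.0, 1.0]), beta)              # analytically, g = (-2, -4)
```

Because the potential is quadratic, the central-difference gradient is exact up to floating-point rounding.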
One approach to account for boundaries to movement is to let the potential function be time-varying and depend on a region R. For example, we can let p(μ(t), γ, R) ≡ γ/dmin(t), where dmin(t) = min_{μ∗∈R} (μ(t) − μ∗)′(μ(t) − μ∗) is the
FIGURE 6.9 Potential surface p(μ(t), μ∗) = −[μ(t)|μ∗1, σ1²]/2 − [μ(t)|μ∗2, σ2²]/2, where the overall potential function is an average of potential functions that are negative bivariate Gaussian density functions with means μ∗1 and μ∗2 (black circles) and equal variances (σ1² = σ2²).
squared distance to the closest point in R from the current position μ(t). In this spec-
ification, if γ > 0, the drift term will push the animal out of the region R, which
is particularly effective for marine species.* An alternative approach to account for
boundaries is to specify the potential function such that it has higher potential out-
side of a boundary. For example, suppose there are two activity centers within a
circular bounded region Rc (e.g., a pond or crater with two divots; Figure 6.10). A
corresponding potential function can be specified as
$$p(\mu(t)) = \begin{cases} -\theta_1\left(\tfrac{1}{2}\left[\mu(t)\,|\,\mu_1^*, \sigma_1^2\right] + \tfrac{1}{2}\left[\mu(t)\,|\,\mu_2^*, \sigma_2^2\right]\right) & \text{if } \mu(t) \in \mathcal{R}_c \\ \theta_2\,(\mu(t) - \mu_3^*)'(\mu(t) - \mu_3^*) & \text{if } \mu(t) \notin \mathcal{R}_c \end{cases}, \qquad (6.41)$$
where μ∗3 is the overall space use center and the multipliers θ1 and θ2 control the
strength of boundary and attraction. Figure 6.11 shows a simulated trajectory based on
the potential function in Equation 6.41. The simulated individual trajectory generally
is attracted to μ∗1 and μ∗2 and, if it wanders outside of Rc , it slides back in due to the
steepness of potential at the boundary.
* Recall that there are other ways to account for boundaries to movement in the point process modeling
framework (e.g., Brost et al. 2015).
FIGURE 6.10 Potential function p(μ(t)) based on two attracting points μ∗1 and μ∗2 (black
circles) and a steeply rising boundary condition delineating a circular region of space use.
There is no reason why the form of the potential function is limited to a function of points in geographical space. In fact, it could be a function of covariates x(μ(t)). For example, the potential function p(μ(t), β) ≡ x(μ(t))′β takes on a multiple regression form and implies that certain linear combinations of spatially explicit covariates
should influence the velocity of an individual’s movement. These covariates could
also vary in time and include things such as soil moisture, ambient temperature, or
other dynamic environmental factors. Regression specifications for potential func-
tions have been used in many different models and applications, including discrete-
space animal movement (Hooten et al. 2010b; Hanks et al. 2011, 2015a), disease
transmission (Hooten and Wikle 2010; Hooten et al. 2010a), invasive species spread
(Broms et al. 2016), and landscape genetics and connectivity models (Hanks and
Hooten 2013).
To implement SDE models based on potential functions, Brillinger (2010) suggests a statistical model specification similar to

$$\mu(t_i) - \mu(t_{i-1}) = (t_i - t_{i-1})\, g(\mu(t_{i-1})) + \sqrt{t_i - t_{i-1}}\;\varepsilon(t_i), \qquad (6.42)$$

where ε(ti) ∼ N(0, σ²I) and the left-hand side of Equation 6.42 is the velocity vector from μ(ti−1) to μ(ti). This specification can be useful when the data are collected at a fine temporal resolution and there is little or no measurement error. For example,
FIGURE 6.11 Simulated individual trajectory based on the potential function in Equation 6.41, which is composed of two attracting points and a steeply rising boundary condition delineating a circular region of space use. Panel (a) shows the potential function (background image) with joint trajectory simulation (black line). Panels (b) and (c) show the marginal positions over time; attracting points are shown as horizontal gray lines in the marginal plots.
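A trajectory in this spirit can be simulated by combining Equation 6.42 with a numerically differentiated potential. This Python sketch (ours) uses a simplified two-attractor Gaussian potential without the boundary term of Equation 6.41; every setting is illustrative.

```python
import numpy as np

# Our illustration: simulating a trajectory with the Euler-type scheme in
# Equation 6.42, using a simplified two-attractor Gaussian potential (no
# boundary term); all settings are illustrative.
rng = np.random.default_rng(6)
a1, a2, s2 = np.array([0.0, 0.0]), np.array([1.0, 1.0]), 0.25

def potential(mu):
    # negative mixture of two Gaussian kernels -> two basins of attraction
    return -0.5 * (np.exp(-np.sum((mu - a1)**2) / (2.0 * s2))
                   + np.exp(-np.sum((mu - a2)**2) / (2.0 * s2)))

def drift(mu, d=1e-4):
    # g(mu) = -grad p(mu), approximated by central differences
    g = np.zeros(2)
    for k in range(2):
        e = np.zeros(2)
        e[k] = d
        g[k] = -(potential(mu + e) - potential(mu - e)) / (2.0 * d)
    return g

dt, sigma, n = 0.01, 0.3, 2000
mu = np.zeros((n, 2))
for i in range(1, n):
    mu[i] = (mu[i - 1] + dt * drift(mu[i - 1])
             + np.sqrt(dt) * rng.normal(scale=sigma, size=2))
```

The simulated individual drifts between the two basins of attraction while the stochastic term keeps the path irregular, qualitatively like the trajectory in Figure 6.11.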
where v(ti ) ≡ μ(ti ) − μ(ti−1 ). Table 6.1 shows the posterior summary statistics for
all parameters resulting from fitting the quadratic–interaction potential function SDE
to the mountain lion telemetry data. While the 95% credible intervals for β1, β2, and β3 do not overlap zero, those for β4 and β5 do. Thus, the posterior potential
function is an upward concave shape with elliptical isopleths stretched in the north–
south orientation. Figure 6.12 shows the posterior mean and standard deviation of the
potential function, the shape of which concurs with our interpretation of the parame-
ter estimates (Table 6.1). The Bayesian implementation of the SDE model facilitates
inference for the potential function, regardless of how complicated it is. For the moun-
tain lion SDE model, we see that the uncertainty in the potential function increases
away from the center of data. Inference for the potential function can be useful for
understanding spatial regions where the data provide insight about environmental
factors influencing animal movement.
The stochastic process model based on a potential function in Equation 6.42 can be embedded into a hierarchical statistical framework in the same way that many other physical processes have been modeled (e.g., Wikle and Hooten 2010). For example, if we use the SIE representation of the model

$$\mu(t_i) = \mu(0) + \int_0^{t_i} g(\mu(\tau))\, d\tau + \int_0^{t_i} \sigma\, db(\tau), \qquad (6.44)$$
TABLE 6.1
Posterior Summary Statistics for the Parameters in
the Mountain Lion Potential Function SDE
Parameter Mean SD 95% CI
FIGURE 6.12 Posterior (a) mean and (b) standard deviation of the potential function based
on fitting the Bayesian SDE model (using the potential function specification in Equation 6.40)
to the mountain lion telemetry data (dark points). Isopleth contours are shown as dark lines.
it provides a natural “solution” for μ(t) and facilitates straightforward use as a process model in a larger hierarchical framework. Assuming Gaussian measurement error and observed telemetry locations s(ti), we can use the previously discussed data model

$$s(t_i) \sim \mathrm{N}(\mu(t_i), \sigma_s^2 I). \qquad (6.45)$$

The combination of Equations 6.44 and 6.45 forms a state-space model and can be implemented from a likelihood perspective (if the process model can be integrated out) or a Bayesian perspective. A few complications can arise when implementing the hierarchical model.
The SIE in Equation 6.44 can be discretized for computational purposes to simu-
late a stochastic process based on potential functions. We showed how to derive
continuous-time stochastic trajectory models earlier in this chapter. In contrast, to
discretize Equation 6.44, we can use the temporal difference equation
μ(t_i) = μ(0) + Σ_{j=2}^{i} (t_j − t_{j−1}) g(μ(t_{j−1})) + Σ_{j=2}^{i} √(t_j − t_{j−1}) ε(t_j),  (6.46)
where the gradient of the potential function can be approximated using a spatial difference equation

g(μ(t)) ≈ −(1/(2Δμ)) ( p((μ_1(t) + Δμ, μ_2(t))′) − p((μ_1(t) − Δμ, μ_2(t))′),
                       p((μ_1(t), μ_2(t) + Δμ)′) − p((μ_1(t), μ_2(t) − Δμ)′) )′,  (6.47)

where Δμ is a small spatial increment.
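As a sketch of how Equations 6.46 and 6.47 work together, the following Python code simulates a trajectory by combining the Euler discretization with a central-difference gradient. It is illustrative only: the quadratic potential `p`, the step sizes, and all parameter values are hypothetical choices, not quantities from the mountain lion analysis.

```python
import numpy as np

def neg_grad(p, mu, d=1e-4):
    # Central-difference approximation to g(mu) = -grad p(mu) (Equation 6.47)
    g1 = p(np.array([mu[0] + d, mu[1]])) - p(np.array([mu[0] - d, mu[1]]))
    g2 = p(np.array([mu[0], mu[1] + d])) - p(np.array([mu[0], mu[1] - d]))
    return -np.array([g1, g2]) / (2 * d)

def simulate_sde(p, mu0, times, sigma, rng):
    # Euler discretization of the potential-function SDE (Equation 6.46)
    mu = np.zeros((len(times), 2))
    mu[0] = mu0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        drift = neg_grad(p, mu[i - 1])
        mu[i] = mu[i - 1] + dt * drift + np.sqrt(dt) * sigma * rng.normal(size=2)
    return mu

# Hypothetical quadratic potential: attraction toward the origin
p = lambda mu: 0.5 * np.sum(mu ** 2)
rng = np.random.default_rng(1)
path = simulate_sde(p, mu0=np.array([3.0, -2.0]),
                    times=np.linspace(0, 10, 500), sigma=0.2, rng=rng)
```

With an attractive potential, a simulated individual started away from the minimum drifts toward it while the Brownian term perturbs the path.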
Now consider the integrated process

η(t) = ∫_0^t σ b(τ) dτ,  (6.48)
where η(t) is a slightly simpler version of the integrated stochastic process that was
proposed by Johnson et al. (2008a). For example, Figure 6.13 shows a Brownian
motion process (b(t)) and the associated integrated Brownian motion process (η(t)).
The integrated Brownian motion model in Equation 6.48 can be likened to that of
Johnson et al. (2008a) by relating η(t) to the position process by μ(t) = μ(0) + η(t).
If an integral of a process yields the position, then the process being integrated is
related to velocity. Thus, the idea of Johnson et al. (2008a) was to model the velocity
process as an SDE and integrate it to yield a more appropriate (i.e., smoother) model
for animal movement. This basic concept already had a precedent, as Jonsen et al.
(2005) proposed the same idea, but in the discrete-time framework we described in
the previous chapter.*
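The smoothness difference between a Brownian path and its integral can be seen in a few lines of Python; all settings here (time step, variance, simulation length) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, dt, sigma = 1000, 0.01, 1.0

# Brownian motion: cumulative sum of independent Gaussian increments
b = np.cumsum(np.sqrt(dt) * rng.normal(size=(n, 2)), axis=0)

# Integrated Brownian motion (Equation 6.48): eta(t) = int_0^t sigma b(tau) dtau,
# approximated with a Riemann sum; the position is mu(t) = mu(0) + eta(t)
eta = np.cumsum(sigma * b * dt, axis=0)

# The integrated path is smoother: its increments are far less variable
rough = np.std(np.diff(b, axis=0))
smooth = np.std(np.diff(eta, axis=0))
```

Plotting `b` against `eta` reproduces the qualitative contrast in Figure 6.13: the integrated process is visibly smoother than the Brownian motion that generated it.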
The velocity modeling approach proposed by Johnson et al. (2008a) requires a
strict relationship between b(t) and μ(t), but also suggests a more general framework
for modeling movement. To show this, we define the function
h(t, τ) = 1 if 0 < τ ≤ t, and h(t, τ) = 0 if t < τ ≤ T.  (6.49)
* Also, integrated temporal models are common in time series and known as ARIMA models, as described
in Chapter 3.
Continuous-Time Models 213
FIGURE 6.13 Simulated (a) Brownian motion process (b(t)) and (b) integrated Brownian
motion process (η(t)). Only 50 time steps are shown to illustrate the difference in smoothness.
Starting and ending positions are denoted by open and closed circles.
Using this function, we can write the integrated process generally as

η(t) = ∫_0^T h(t, τ) σ b(τ) dτ.  (6.50)
The convolution in Equation 6.50 is the key to recognizing a more general class of
stochastic process models for animal movement.* For example, if h(t, τ ) is a con-
tinuous function such that 0 ≤ t ≤ T, 0 ≤ τ ≤ T and with finite positive integral
* Recall that a convolution is an integral function of the form: ∫ g(x, y) f(y) dy.
214 Animal Movement
0 < ∫_0^T h(t, τ) dτ < ∞, then a new general class of continuous-time animal movement models arises. Hooten and Johnson (2016) referred to this class of models as
“functional movement models” (Buderman et al. 2016) for reasons that will become
clear.
The ability to specify continuous-time movement models as convolutions (i.e.,
Equation 6.50) has two major advantages. First, it clearly identifies the connections
among animal movement models and similar models used in spatial statistics and
time series. Second, for the same reasons that convolution specifications are popular
in spatial statistics and time series, FMMs share similar advantageous properties.
To illustrate the two advantages listed above, we present a simple analysis of the
new FMM presented in Equation 6.50 following Hooten and Johnson (2016). Using
the previously specified definitions for variables and simple calculus, Hooten and
Johnson (2016) showed that the process can be rewritten as
η(t) = ∫_0^T h(t, τ) σ b(τ) dτ  (6.51)
     = ∫_0^T h(t, τ) ( ∫_0^τ σ db(τ̃) ) dτ  (6.52)
     = ∫_0^T ∫_0^τ h(t, τ) σ db(τ̃) dτ  (6.53)
     = ∫_0^T ( ∫_{τ̃}^T h(t, τ) dτ ) σ db(τ̃)  (6.54)
     = ∫_0^T h̃(t, τ̃) σ db(τ̃).  (6.55)
0
1. Equation 6.51: Begin with the convolution model from Equation 6.50.
2. Equation 6.52: Write the Brownian term, b(τ ), in its integral form.
3. Equation 6.53: Move the function h(t, τ ) inside both integrals. Note that
0 < τ̃ < τ and 0 < τ < T.
4. Equation 6.54: Switch the order of integration, paying careful attention to
the limits of integration. That is, τ̃ < τ < T and 0 < τ̃ < T.
5. Equation 6.55: Define h̃(t, τ̃) ≡ ∫_{τ̃}^T h(t, τ) dτ, resulting in a convolution of white noise.
Returning to the advantages of this FMM approach, the expression in Equation 6.55
has the same form described in spatial statistics as a “process convolution” (or kernel
convolution; e.g., Barry and Ver Hoef 1996; Higdon 1998; Lee et al. 2005; Calder
2007). The process convolution has been instrumental in many fields, but especially in
statistics for allowing for both complicated and efficient representations of covariance
structure. Covariance structure in time series and spatial statistics is a critical tool for
modeling dependence in processes. Thus, it seems reasonable that the same idea can
be helpful in the context of modeling animal movement.
There are three main computational advantages to using the convolution perspec-
tive in continuous-time movement models. First, it is clear from Equation 6.55 that
we never have to simulate Brownian motion; rather, we can operate on it implicitly
by transforming the function h(t, τ ) to h̃(t, τ ) via integration and convolving h̃(t, τ )
with white noise directly. This is exactly the same way that covariance models for
spatial processes have been developed.
As an example, we let h(t, τ ) be the Gaussian kernel. The Gaussian kernel is prob-
ably the most commonly used function in kernel convolution methods. If we first
normalize the kernel so that it integrates to one for 0 < τ < T, we have a truncated normal PDF for the function such that h(t, τ) ≡ TN(τ | t, φ²) with support (0, T). We can then convert
it to the required function h̃(t, τ̃ ) with
h̃(t, τ̃) = ∫_{τ̃}^T h(t, τ) dτ  (6.56)
         = 1 − ∫_0^{τ̃} h(t, τ) dτ.  (6.57)
When the kernel function is the truncated normal PDF, the calculation in Equa-
tion 6.57 results in a numerical solution for the new kernel function h̃(t, τ ) by
subtracting the truncated normal CDF from one, a trivial calculation in any statis-
tical software. With respect to the time domain, this kernel looks different than most
kernels used in time series or spatial statistics (Figure 6.14j). Rather than being unimodal and symmetric, it has a sigmoidal shape equal to one at τ = 0 and nonlinearly decreasing to zero at τ = T. In effect, the new kernel in Equation 6.57 is accumulating the white noise up to near time t and then including a discounted amount of white
noise ahead of time t.
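The calculation in Equation 6.57 is easy to verify numerically. A short sketch using SciPy's truncated normal follows; the values t = 0.5, φ = 0.05, and T = 1 are hypothetical.

```python
import numpy as np
from scipy.stats import truncnorm

def h_tilde(t, tau_tilde, phi, T=1.0):
    # Integrated kernel (Equation 6.57): one minus the CDF of a Gaussian kernel
    # centered at t with scale phi, truncated to (0, T)
    a, b = (0.0 - t) / phi, (T - t) / phi   # standardized truncation bounds
    return 1.0 - truncnorm.cdf(tau_tilde, a, b, loc=t, scale=phi)

tau = np.linspace(0.0, 1.0, 101)
ht = h_tilde(t=0.5, tau_tilde=tau, phi=0.05)
```

The result is the sigmoidal shape described above: equal to one at τ̃ = 0, dropping near τ̃ = t, and reaching zero at τ̃ = T.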
The options for kernel functions are limitless. Each kernel results in a different
stochastic process model for animal movement. In fact, we have already seen that
this class of movement models is general enough to include that proposed by Johnson
et al. (2008a), but it also includes the original unsmoothed Brownian motion process
if we let h(t, τ ) be a point mass function at τ = t and zero elsewhere (Figure 6.14a).
The point mass kernel function can also be achieved by taking the limit as φ → 0 of the truncated normal kernel described above.
FIGURE 6.14 Example kernels h(t, τ ) (a–e) and resulting integrated kernels h̃(t, τ ) (f–j). The
first row (a,f) results in the regular Brownian motion, row two (b,g) is equivalent to that used
by Johnson et al. (2008a), rows three (c,h) through five (e,j) are more common in time series
and spatial statistics. The vertical gray line indicates time t for the particular kernel shown; in
this case, t = 0.5.
We can imagine the integrated kernel in Equation 6.58 summing up all of the
past velocities to obtain the current position (Figure 6.14f). This precisely follows
the procedure we described for specifying SIEs (6.8) in the previous sections. In
Figure 6.14f, the steep drop at τ = t is what provides the original Brownian motion
process its roughness. In contrast, when we use a non-point mass function for h(t, τ),
we arrive at a smoother stochastic process model for movement.
We describe a few different kernel functions in more detail to examine their impli-
cations for resulting animal movement behavior. In doing so, it is simplest to interpret
the h(t, τ ) and h̃(t, τ ) functions directly. For example, using the direct integration of
velocity as proposed by Johnson et al. (2008a) results in the h̃(t, τ ) in Figure 6.14g.
The individual’s position is an accumulation of its past steps. The steps are noisy
themselves in direction and length but have some general momentum. In the case of
the “tail up” kernel shown in Figure 6.14c, the influence of past steps on the current
position decays linearly with time (Figure 6.14h).* In this case, the individual’s posi-
tion is more strongly a function of recent steps than steps in the distant past. The oppo-
site is true with the “tail down” model shown in Figure 6.14d, where future steps also
influence the position (Figure 6.14i). Heuristically, we might interpret the resulting
movement behavior as perception driven. That is, the individual may have an aware-
ness of a distant destination that drives their movement. Finally, the Gaussian kernel
discussed earlier in Figure 6.14e indicates a symmetric mixture of previous and future
velocities, suggesting the perception of former and future events by the individual.
The covariance of the resulting process is

cov(η(t_1), η(t_2)) = σ² ∫_0^T h̃(t_1, τ) h̃(t_2, τ) dτ,  (6.59)

which is the same form used to construct covariance via kernel convolution in spatial statistics (e.g., Paciorek and Schervish 2006; Ver Hoef and Peterson 2010).
The resulting model for the position process in this FMM for smooth Brownian
motion is
μ ∼ N(μ(0)1, σ² Δt H̃H̃′).  (6.61)
It may not be immediately apparent how this expression (6.61) is helpful. In fact, it
is often more intuitive to model the process from the first moment (i.e., mean dynam-
ical structure) rather than the second moment (Wikle and Hooten 2010). However,
the joint form with dependence imposed through the matrix H̃H̃′ can be useful for
computational reasons. When the integral in Equation 6.59 cannot be used to ana-
lytically compute the necessary covariance matrix, we can still use the outer product
of the matrices explicitly (i.e., H̃H̃′). However, the true covariance requires the number of columns of H̃ to approach infinity, which, under approximation, can lead to
computational difficulties. Higdon (2002) suggested a finite process convolution as
an approximation. In the finite approximation, H̃ could be reduced to m columns.
The reduction of columns in H̃ implies that there are m “knots” (spaced Δt apart)
in the temporal domain that anchor the basis functions (i.e., kernels), and thus, only
m white noise terms are required so that η ≈ H̃ε, where H̃ is an n × m matrix and
ε ≡ (ε(t_1), . . . , ε(t_j), . . . , ε(t_m))′ is an m × 1 vector. As we discussed in Chapter 2, a
finite approximation of the convolution is also sometimes referred to as a reduced-
rank method (Wikle 2010a). Reduced-rank methods for representing dependence
in statistical models can improve computational efficiency substantially, and have
become popular in spatial and spatio-temporal statistics for large data sets (Cressie
and Wikle 2011).
To illustrate how the kernel functions relate to covariance, we simplified the
movement process so that it is 1-D in space. This approach can also be general-
ized to higher dimensions. In the more typical 2-D case, we form the vector η ≡
(η(1, t1 ), . . . , η(1, tn ), η(2, t1 ), . . . , η(2, tn )) by stacking the temporal vectors for each
coordinate. Then the joint model can be written as

μ ∼ N(μ(0)1, σ² Δt (I ⊗ H̃H̃′)),  (6.62)

where I is a 2 × 2 identity matrix.* As in the previous section, this new joint specification assumes that H̃ contains the appropriate set of basis vectors for both coordinates
(longitude and latitude). This assumption can also be generalized to include different
types or scales of kernels for each direction. In fact, the original SIE for η(t) from
Equation 6.50 can be rewritten as
η(t) = ∫_0^T H(t, τ) σ b(τ) dτ,  (6.63)
where H(t, τ ) is a 2 × 2 matrix function. If both diagonal elements are equal to the
previous h(t, τ ) with zeros on the off-diagonals, we have an equivalent expression to
* Recall that an identity matrix has ones on the diagonal and zeros elsewhere. Also, the ⊗ symbol denotes
the Kronecker product, which multiplies every element of I by H̃H̃ to form a new matrix.
Equation 6.50. However, if the diagonal elements of H(t, τ ) are different, h1 (t, τ ) and
h2 (t, τ ), we allow for the possibility of different types of movement in each direction.
This might be appropriate when the individual animal behavior relates strongly to a
linearly oriented habitat (e.g., movement corridor) or seasonal behavior (e.g., migra-
tion). Allowing the off-diagonals of H(t, τ ) to be nonzero functions can introduce
additional flexibility, such as off-axis home range shapes.
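The Kronecker construction and its coordinate-specific generalization can be sketched directly with matrices; the dimensions and the random stand-in for H̃ below are hypothetical.

```python
import numpy as np
from scipy.linalg import block_diag

n, m = 50, 10
rng = np.random.default_rng(3)

# Shared basis for both coordinates: I (x) (H H') is block diagonal with two
# identical blocks, imposing the same temporal dependence on each coordinate
H = rng.normal(size=(n, m))       # stand-in for the basis matrix H_tilde
joint_cov = np.kron(np.eye(2), H @ H.T)

# Different kernels per coordinate (e.g., a movement corridor): distinct blocks
H1, H2 = rng.normal(size=(n, m)), rng.normal(size=(n, m))
joint_cov2 = block_diag(H1 @ H1.T, H2 @ H2.T)
```

The zero off-diagonal blocks correspond to diagonal H(t, τ); allowing nonzero off-diagonal kernel functions would populate those blocks and induce cross-coordinate dependence.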
For observed telemetry data s, the data model is

s ∼ N(Kμ, I ⊗ Σ_s),  (6.64)

and the process model is

μ ∼ N(μ(0)1, σ² (I ⊗ H̃)(I ⊗ H̃)′).  (6.65)
Note that we omit the Δt notation in this section because the variance term σ² can
account for the grain of temporal discretization. The observed telemetry data (s, which
is 2n × 1-dimensional) and position process (μ, which is 2m × 1-dimensional) are
stacked vectors of coordinates in longitude and latitude. The mapping matrix K is
composed of zeros and ones that isolate the positions in μ at times when data are
available so that Kμ is temporally matched with s. The correlation matrix in Equa-
tion 6.65, (I ⊗ H̃)(I ⊗ H̃) , involves Kronecker products to account for the fact that
we are modeling the bivariate position process jointly. We use a simple measurement
error variance specification such that Σ_s ≡ σ_s² I and Gaussian basis functions in H̃
that are parameterized with a single range parameter φ.
If we condition on the initial state μ(0), the full hierarchical model is composed
of the unknown quantities: μ, σ_s², σ², and φ. In a Bayesian implementation of the
hierarchical model, each of the unknown quantities would need to be sampled in an
MCMC algorithm. However, we can use Rao-Blackwellization and integrate out the
latent process μ, resulting in a much more stable algorithm. The resulting integrated
likelihood is multivariate normal such that

s ∼ N(μ(0)1, σ² K(I ⊗ H̃)(I ⊗ H̃)′K′ + σ_s² I).  (6.66)
Hooten and Johnson (2016) used an inverse gamma prior for σ_s², a uniform prior for σ_{μ/s}, and a discrete uniform prior for the range parameter φ. The discrete uniform prior allows us to precalculate the matrix K(I ⊗ H̃)(I ⊗ H̃)′K′ and perform
the necessary operations (e.g., inverses) off-line so that the MCMC algorithm only
has to access the results without having to recompute them. To illustrate the infer-
ence obtained using an FMM, we simulated data from a stochastic process based
on the FMM in Equations 6.64 and 6.65. We simulated n = 300 observations on
the time domain (0, 1) using the “true” parameter values: φ = 0.005, σ_s² = 0.001,
and σ 2 = 0.01. Fitting the model in Equation 6.66 to the simulated data results in the
marginal posterior histograms in Figure 6.15, which indicate that the Bayesian FMM
is able to recover the model parameters quite well in our simulation example. Note
that the Brownian motion variance parameter σ 2 is a derived quantity in our model
because of the reparameterization in Equation 6.67.
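The precalculation trick enabled by a discrete uniform prior on φ can be sketched as follows. The grid values, matrix dimensions, and the 1-D stand-in for the full Kronecker-structured matrix are hypothetical.

```python
import numpy as np
from scipy.stats import norm

# Precompute (off-line) a covariance factor for each value of phi in a discrete
# uniform prior, so the MCMC loop only looks results up instead of recomputing.
n, m, T = 100, 20, 1.0
t = np.linspace(0, T, n)
knots = np.linspace(0, T, m)
phi_grid = np.array([0.0025, 0.005, 0.01, 0.02])

precomputed = {}
for phi in phi_grid:
    H = 1.0 - norm.cdf((knots[None, :] - t[:, None]) / phi)
    C = H @ H.T                                   # 1-D analog of the full matrix
    # Jittered Cholesky factor; expensive decompositions happen once, up front
    precomputed[phi] = np.linalg.cholesky(C + 1e-6 * np.eye(n))
```

Inside an MCMC algorithm, a proposed φ then costs only a dictionary lookup rather than a fresh matrix decomposition.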
The FMM based on the integrated likelihood specification in Equation 6.66
does not explicitly provide direct inference for the latent movement process μ.
However, we can obtain MCMC samples for μ using a secondary algorithm because the full-conditional distribution for μ given all other parameters and the data is multivariate normal and can be sampled from directly.
FIGURE 6.15 Marginal posterior distributions for FMM parameters resulting from a fit to simulated data arising from Equations 6.64 and 6.65: (a) φ, (b) σ_s², and (c) σ².
FIGURE 6.16 Panel (a) shows the simulated stochastic process (dashed line) and data
(points) from the FMM in Equations 6.64 and 6.65 with posterior realizations of the posi-
tion process (gray lines). Panels (b) and (c) show the marginal data and path as well as 95%
credible interval (gray shaded region).
A related phenomenological specification replaces the stochastic process model for the position with a basis expansion fit directly to the data,

s ∼ N((I ⊗ H̃)β, σ_s² I),  (6.69)

where β ∼ N(0, σ_β² I) and the basis vectors in H̃ are B-splines at various
temporal scales of interest. In Equation 6.69, the position process is represented
deterministically as μ ≡ (I ⊗ H̃)β, and the hyperparameter σ_β² is used to impose
shrinkage on the coefficients to avoid overfitting the model and obtain the optimal
amount of smoothness in the process. The functional regression model in Equa-
tion 6.69 is trivial to implement in any Bayesian computing software (e.g., BUGS,
JAGS, INLA, and STAN; Lunn et al. 2000; Plummer 2003; Lindgren and Rue 2015;
Carpenter et al. 2016) or in a penalized regression software such as the “mgcv”
R package (Wood 2011).
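A penalized B-spline fit of this kind can be sketched in Python (rather than the R tools just mentioned). The knot placement, the single temporal scale, and the ridge penalty standing in for the shrinkage prior on β are all hypothetical choices.

```python
import numpy as np
from scipy.interpolate import BSpline

def bspline_basis(x, n_interior=10, k=3):
    # Cubic B-spline basis matrix with clamped knots on [0, 1]
    knots = np.concatenate([np.zeros(k), np.linspace(0, 1, n_interior), np.ones(k)])
    n_basis = len(knots) - k - 1
    cols = []
    for j in range(n_basis):
        b = BSpline.basis_element(knots[j:j + k + 2], extrapolate=False)
        cols.append(np.nan_to_num(b(x)))   # zero outside each element's support
    return np.column_stack(cols)

rng = np.random.default_rng(4)
times = np.linspace(0, 0.999, 200)
truth = np.sin(2 * np.pi * times)               # smooth "position" in one coordinate
y = truth + 0.1 * rng.normal(size=times.size)   # noisy telemetry observations

B = bspline_basis(times)
lam = 0.1                                        # ridge penalty ~ shrinkage on beta
beta = np.linalg.solve(B.T @ B + lam * np.eye(B.shape[1]), B.T @ y)
fit = B @ beta
```

The penalty plays the role of the σ_β² hyperparameter: larger values shrink β toward zero and produce a smoother estimated path.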
Buderman et al. (2016) generalized the model in Equation 6.69 to accommodate
heterogeneity in the measurement error associated with the telemetry data. Simi-
lar to that presented by Brost et al. (2015), Buderman et al. (2016) used a mixture
distribution to represent the X-shaped pattern associated with Argos telemetry data
when modeling Canada lynx (Lynx canadensis). To demonstrate the phenomenolog-
ical FMM, we modified the basic model in Equation 6.69 so that the data arise from
the mixture distribution
s(t_i) ∼ N(β_0 + X_i β, Σ_{1,i}) if z_i = 1, and s(t_i) ∼ N(β_0 + X_i β, Σ_{2,i}) if z_i = 0.  (6.70)
Each covariance parameter in Equation 6.71 is associated with an error class c (for
c = 1, . . . , C) such that σi2 = σc2 , ρi = ρc , and ai = ac , for example, when the ith
observation is designated as class c. Thus, the parameters for each of six error
classes (i.e., 3, 2, 1, 0, A, B) associated with Argos telemetry data and a seventh
for VHF telemetry data are specified with prior distributions and estimated while fit-
ting the model. In the case of VHF telemetry data, the measurement error is much less
than with Argos data and lacks the X-shaped pattern. Thus, for the VHF telemetry
data, z_i = 1 and Σ_i ≡ σ_i² I accommodate an error pattern with circular isopleths.
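A simulation sketch of such a two-component mixture follows; the angles, variances, and mixing proportion are hypothetical illustrations, not estimates from lynx data.

```python
import numpy as np

def simulate_argos_errors(n, p=0.5, sd_major=1.0, sd_minor=0.2, rng=None):
    # Two-component Gaussian mixture producing an X-shaped error pattern:
    # each component is elongated along one of two crossed diagonal axes
    if rng is None:
        rng = np.random.default_rng()
    z = rng.random(n) < p
    errs = np.zeros((n, 2))
    for ang, mask in [(np.pi / 4, z), (-np.pi / 4, ~z)]:
        R = np.array([[np.cos(ang), -np.sin(ang)],
                      [np.sin(ang),  np.cos(ang)]])
        cov = R @ np.diag([sd_major**2, sd_minor**2]) @ R.T
        errs[mask] = rng.multivariate_normal([0, 0], cov, size=mask.sum())
    return errs

e = simulate_argos_errors(2000, rng=np.random.default_rng(6))
```

Plotting `e` shows two crossed elongated ellipses, which is the qualitative X-shaped pattern the mixture is designed to capture; in the model each component's orientation and scale would be estimated per error class.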
We used three sets of temporal basis vectors in our specification of H̃ (and hence,
Xi ) to describe the movement of an individual Canada lynx in Colorado (Figure 6.17).
Following Buderman et al. (2016), we used B-splines at three different scales (i.e.,
FIGURE 6.17 Observed Argos and VHF telemetry data (points) for an individual Canada
lynx in Colorado (Colorado counties outlined in gray). Dashed lines are used to visualize the
sequence of telemetry data only.
spanning the compact support of each B-spline basis function): 1 month, 3 months,
and 1 year. Thus, the phenomenological FMM is capable of characterizing movement
processes at the combination of those temporal scales representing the continuous-
time trajectory that best explains the data.
Figure 6.18 shows the results of fitting the phenomenological FMM in Equa-
tion 6.70 to the Canada lynx telemetry data in Figure 6.17. While some of the Argos
telemetry observations can be subject to extreme error, the VHF telemetry data pro-
vide consistently smaller errors and, thus, have a stronger influence on the model fit.
Therefore, the northernmost portion of the position process in Figure 6.18 appears
to show the predicted path missing the observed data. However, after incorporating
uncertainty related to the telemetry data and the inherent smoothness in the remain-
der of the path, the predictions are optimal if they do not pass directly through the
observed telemetry data. Finally, large time gaps in data collection (i.e., between 750
and 900 days) result in appropriately widened credible intervals for the predicted
position process (Figure 6.18).
FIGURE 6.18 Observed Argos and VHF telemetry data (points) for an individual Canada
lynx in Colorado. (a) Predicted position process (dark line) and position process realizations
(gray lines) in 2-D geographic space. (b) Marginal position process (dark line) in easting,
observed telemetry data (points), and 95% credible interval (gray). (c) Marginal position pro-
cess (dark line) in northing, observed telemetry data (points), and 95% credible interval (gray).
that some force of attraction acts on the position process. In this case, recall that the
basic SIE representation of a multivariate OU model is
μ(t) = μ(0) + ∫_0^t (M − I)(μ(τ) − μ*) dτ + b(t),  (6.73)
where μ∗ is the attracting point in geographic space and the matrix M is usually
parameterized so that M ≡ ρI. In a sense, the parameter ρ controls the attraction
because, if ρ = 1, the process becomes nonstationary (and hence no effect of the
attractor μ∗ ). On the other hand, if 0 < ρ < 1 then the process is stationary. How-
ever, the parameter ρ (which is also called an autocorrelation parameter) also controls
the smoothness of the process; at least up to a certain degree. As ρ → 1, the process
reverts to Brownian motion and assumes its inherent degree of smoothness, but as
ρ → 0, the position process in Equation 6.73 becomes less smooth than Brownian
motion due to the fact that the autocorrelation approaches zero and the locations are
independent and identically distributed realizations from N(μ∗ , σ 2 I). Thus, to a cer-
tain extent, the parameter ρ is capable of smoothing the position process similar to
the FMM approach described in the previous section. However, the OU model in
Equation 6.73 can only smooth the process so much and still maintain attraction.
Thus, we can combine the OU process with the FMM to achieve both smoothness
and stationarity simultaneously.
One way to combine the OU and FMM models is to replace the Brownian motion
component b(t) in Equation 6.73 with the smoothed process η(t), where
η(t) = ∫_0^T h(t, τ) v(τ) dτ,  (6.74)
where v(t) is a 2-D OU process instead of a Brownian motion. The benefit of this
modification to the model is that, in the limits, the OU process ranges from a white
noise process to a BM process. Therefore, with one additional parameter, we can
control the smoothness from BM to an integrated BM model.
Combining these ideas with the exponential notation described in the previous
sections, Johnson et al. (2008a) implicitly used the kernel function
h(t, τ) = 1 if 0 < τ ≤ t, and h(t, τ) = 0 if t < τ ≤ T.  (6.75)
In doing so, they specify the OU process directly for the individual’s velocity process
in each direction (i.e., 1-D to simplify notation) and convolve with h(t, τ ) to yield
η(t) = ∫_0^T h(t, τ) ( γ + (e^{−θτ}/√(2θ)) b(e^{2θτ}) ) dτ,  (6.76)
where γ is the mean velocity and θ is an autocorrelation parameter. Then, the position
process becomes
μ(t) = μ(0) + η(t). (6.77)
To fit the model to a discrete and finite set of telemetry data, Johnson et al. (2008a)
derived a discretization of the OU model as follows. First, they worked directly with
the velocity process
v(t) = γ + (e^{−θt}/√(2θ)) b(e^{2θt}),  (6.78)
which is another way to formulate the OU process. Then, for times t1 , . . . , ti , . . . , tn ,
we can write
v_i = γ + (e^{−θt_i}/√(2θ)) b(e^{2θt_i}),  (6.79)
and conditioning on the result from the preceding section on OU models, results in
v_i | v_{i−1} ∼ N( v_{i−1} e^{−θ(t_i − t_{i−1})} + γ(1 − e^{−θ(t_i − t_{i−1})}), σ² (1 − e^{−2θ(t_i − t_{i−1})}) / (2θ) ).  (6.80)
To find the associated position process for μi , we start from Equation 6.77, but con-
dition on the previous position μi−1 and integrate from ti−1 to ti (instead of 0 to ti )
so that
μ_i = μ_{i−1} + ∫_{t_{i−1}}^{t_i} ( v_{i−1} e^{−θ(τ − t_{i−1})} + γ(1 − e^{−θ(τ − t_{i−1})}) ) dτ + ξ_i  (6.81)
    = μ_{i−1} + v_{i−1} (1 − e^{−θ(t_i − t_{i−1})})/θ + γ( t_i − t_{i−1} − (1 − e^{−θ(t_i − t_{i−1})})/θ ) + ξ_i,  (6.82)

where

ξ_i ∼ N( 0, (σ²/θ²)( t_i − t_{i−1} − (2/θ)(1 − e^{−θ(t_i − t_{i−1})}) + (1/(2θ))(1 − e^{−2θ(t_i − t_{i−1})}) ) ).  (6.83)
Together, the results from Equations 6.80 and 6.82 are valid for each coordinate
axis and can be combined to yield a discretized smooth OU process for a bivariate
movement process.*
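A minimal Python sketch of this discretization for one coordinate follows. The parameter values are hypothetical, and for brevity the position and velocity innovations are drawn independently, whereas the full model uses their joint covariance (Equations 6.88 through 6.90).

```python
import numpy as np

def simulate_ctcrw(times, gamma, theta, sigma, mu0=0.0, v0=0.0, rng=None):
    # Simplified discretization of the integrated OU (CTCRW) model for one
    # coordinate, following Equations 6.80, 6.82, and 6.83
    if rng is None:
        rng = np.random.default_rng()
    mu, v = np.zeros(len(times)), np.zeros(len(times))
    mu[0], v[0] = mu0, v0
    for i in range(1, len(times)):
        d = times[i] - times[i - 1]
        e1, e2 = np.exp(-theta * d), np.exp(-2 * theta * d)
        # Velocity update (Equation 6.80)
        v_mean = v[i - 1] * e1 + gamma * (1 - e1)
        v_var = sigma**2 * (1 - e2) / (2 * theta)
        # Position update (Equations 6.82 and 6.83)
        m_mean = (mu[i - 1] + v[i - 1] * (1 - e1) / theta
                  + gamma * (d - (1 - e1) / theta))
        m_var = (sigma**2 / theta**2) * (d - (2 / theta) * (1 - e1)
                                         + (1 - e2) / (2 * theta))
        v[i] = v_mean + np.sqrt(v_var) * rng.normal()
        mu[i] = m_mean + np.sqrt(m_var) * rng.normal()
    return mu, v

times = np.linspace(0, 10, 400)
mu, v = simulate_ctcrw(times, gamma=0.0, theta=1.0, sigma=1.0,
                       rng=np.random.default_rng(5))
```

Running the function for each coordinate axis yields a smooth bivariate path, and because the expressions handle arbitrary gaps t_i − t_{i−1}, the same code applies to irregularly timed telemetry data.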
The main reason for deriving the preceding results is so that we can use observed
telemetry data to fit the CTCRW model. Thus, Johnson et al. (2008a) rely on the state-space model
* Johnson et al. (2008a) provides additional details on the derivations of the integrated OU model.
s_i ∼ N(Kz_i, Σ_s),  (6.84)
z_i ∼ N(L_i z_{i−1}, Σ_{z,i}),  (6.85)
where z_i ≡ (μ_i, v_i)′ for a given coordinate and the elements of the covariance matrix Σ_{z,i} are

q_{1,1,i} = (σ²/θ²)( t_i − t_{i−1} − (2/θ)(1 − e^{−θ(t_i − t_{i−1})}) + (1/(2θ))(1 − e^{−2θ(t_i − t_{i−1})}) ),  (6.88)
q_{1,2,i} = (σ²/(2θ²))( 1 − 2e^{−θ(t_i − t_{i−1})} + e^{−2θ(t_i − t_{i−1})} ),  (6.89)
q_{2,2,i} = (σ²/(2θ))( 1 − e^{−2θ(t_i − t_{i−1})} ).  (6.90)
Using these expressions as a guideline, it is straightforward to generalize them by
allowing the autocorrelation parameter θ and variance parameter σ 2 to vary by
coordinate, and possibly over time.
A Gaussian state-space formulation such as that presented in Equation 6.85 allows
for the use of fast computational approaches such as Kalman filtering methods when
fitting the model (Chapter 3). In fact, the models described in this section can be fit
* Recall that the Markov property essentially says that a process is independent of all other time points
when conditioned on its direct neighbors in time.
† Recall the VAR(1) from Chapter 3 on time series.
using the R package “crawl” (Johnson et al. 2008a). Kalman filtering methods provide
a way to estimate the latent state vector zi for all times i = 1, . . . , n when conditioning
on the parameters in the model. Thus, we can numerically integrate out the latent
state and maximize the resulting likelihood using standard optimization methods. For
example, using the GPS telemetry data from an adult male mule deer (Odocoileus
hemionus) in Figure 6.19, first analyzed by Hooten et al. (2010b), we were able to
fit the CTCRW model in Equation 6.85 with the “crawl” R package. The maximum
likelihood algorithm in “crawl” required only 1 s to fit, and the resulting MLEs for
parameters associated with the OU process were log(θ) = −3.61 and log(σ ) = 4.27.
The state-space formulation presented by Johnson et al. (2008a) is also suited to
Bayesian hierarchical modeling techniques and only needs priors for the unknown
parameters to proceed. The fully Gaussian state-space model will result in conju-
gate full-conditional distributions (multivariate normal) for all zi , and thus, easy
implementation in an MCMC algorithm.
While the level of smoothness in the OU velocity model can be controlled with
the OU correlation parameter, it is still a nonstationary model. That is, a simulated
FIGURE 6.19 Observed GPS telemetry data (n = 129, points) from an adult male mule
deer during autumn in southeastern Utah, USA. Dashed line is shown to connect the points
in sequence only.
OU velocity realization will eventually wander away like a Brownian motion realiza-
tion. In fact, the simulated OU velocity realization will wander away at a faster rate
because it is smoother; for this reason, it is referred to as “superdiffusive.” Superdif-
fusivity is not usually a problem when modeling animal movement because, when
fitted to telemetry data (e.g., using the Kalman filter), the estimated state is con-
strained by the data. However, if there are extremely large time gaps in the data, this
model can perform poorly because the estimated position process will tend to wander
off in the direction of the last known velocity trajectory and will not begin to return
until approximately half way to the time of the next observed location. To fix this,
Fleming et al. (2014) proposed the Ornstein–Uhlenbeck foraging (OUF) model. The
OUF model extends the OU velocity model by adding attraction to a central location. The OUF model can be characterized by the SIE
μ(t) = ∫_0^t (M − I)(μ(τ) − μ*) dτ + ∫_0^t v(τ) dτ,  (6.91)
where v(τ ) is an OU velocity process (Fleming et al. 2015). The OUF model has the
same SIE as an OU position model, with the integrated white noise, db(t), replaced
with a correlated OU process. By replacing the white noise with correlated noise, the
OUF model produces a smooth position process in the short term, yet will not wander
off as t → ∞ as with the integrated OU velocity model.
A weighted distribution specification for resource selection takes the form

[μ_i | β, θ] = g(x(μ_i), β) f(μ_i, θ) / ∫ g(x(μ), β) f(μ, θ) dμ,  (6.92)

where g(x(μ_i), β) is the actual resource selection function and f(μ_i, θ) is often
referred to as the availability distribution. The availability distribution represents
locations in the spatial domain that are available in the time interval (ti−1 , ti ]. The
function f (μi , θ) can differentially weight these locations based on a variety of things
such as hard barriers to movement, physical limitations of the animal, territoriality,
and so on. The most frequently chosen availability distribution in conventional RSF
models is a uniform distribution on the spatial support of the point process (typically
the study area or home range of the animal). The choice of availability distribution
is often the largest factor affecting differences in resource selection inference using
these methods (Hooten et al. 2013b). Thus, our specification of f (μi , θ) is a critical
component of obtaining resource selection inference.
In a reconciliation of RSF and dynamic animal movement models, Johnson et al.
(2008b) presented a general framework for considering these two approaches simul-
taneously. We described point process models in Chapters 2 through 4; however,
we return to them now with a background in continuous-time stochastic models for
movement. Johnson et al. (2008b) proposed that the availability distribution be linked
to a dynamic animal movement model such that f(μ_i, θ) = exp(−(μ_i − μ̄_i)′ Σ_i^{−1} (μ_i − μ̄_i)/2), where μ̄_i = μ* + B_i(μ_{i−1} − μ*) and B_i = exp(−(t_i − t_{i−1})/φ) · I is a 2 × 2 matrix with zeros on the off-diagonals, Σ_i = Σ − B_i Σ B_i′, and Σ is a covariance matrix that controls the strength of attraction to the central place μ*. Notice that this
definition for the availability distribution f (μi , θ) is proportional to the multivariate
OU process presented in the previous sections. The reason for the proportionality is
that the normalizing constants in the rest of the Gaussian distribution cancel out in
the numerator and denominator. To see this, we use an exponential selection function
and an OU model for the availability distribution and update the point process model
for μ_i:

[μ_i | β, θ] = g(x(μ_i), β) f(μ_i, θ) / ∫ g(x(μ), β) f(μ, θ) dμ.  (6.93)
Recall how similar Equation 6.93 is to the model in Equation 4.40, developed by
Brost et al. (2015), for handling irregularly spaced telemetry data and constraints
to movement. Thus, the OU model serves as a useful way to control for temporal
autocorrelation based on the physics of movement in the standard resource selection
framework. The two ways to approach fitting these types of point process mod-
els are either (1) jointly or (2) two-stage. Jointly, one would fit the point process
model directly and estimate both the parameters in the selection and availability
functions simultaneously. Brost et al. (2015) use the joint approach, and, while it
is most rigorous statistically, it can also be computationally demanding, depending
on how difficult it is to calculate the integral in the denominator of Equation 6.93.
See Chapter 4 for details on that aspect of implementation.
The second approach to fitting this movement-constrained point process
model (6.93) is to preestimate the availability distribution for all times of interest,
t1 , . . . , tn , using the methods in the previous section and then use those estimates for
availability parameters while fitting the point process model in a second step. This can
be much more stable and less computationally demanding, allowing for things like
parallelization of the first computational step across individuals in a population, for
example. However, as with most two-stage modeling procedures, the validity of the
final inference depends heavily on the appropriateness of the first step and requires
minimal feedback from the second to the first step. That is, if statistical learning about
resource selection significantly alters the future availability of resources to the indi-
vidual, then some amount of feedback would be essential to fit the proper model. As
Continuous-Time Models 231
usual, there is a trade-off in how important it is to fit the exact model versus how
important it is to get at least tentative or preliminary results about the overall process.
In an era of “big data,” such trade-offs are being made every day because scientists
need to fit approximate models that would otherwise be computationally intractable in
their exact form. We return to these concepts of two-stage animal movement models
(and the concept of multiple imputation) in Chapter 7.
s ∼ [s|μ], (6.94)
μ ∼ [μ|θ], (6.95)
θ ∼ [θ], (6.96)
which yields the joint posterior distribution
[μ, θ|s] ∝ [s|μ][μ|θ][θ]. (6.97)
Computational methods (e.g., MCMC) can be used to sample from the posterior dis-
tribution in Equation 6.97. We obtain inference for the position process by integrating
the parameters out of the joint posterior (6.97) to yield the posterior distribution
[μ|s] = ∫ [μ, θ|s] dθ. (6.98)
Posterior inference for the position process, such as the posterior mean and variance
of μ, is obtained easily using sample moments based on the resulting MCMC sam-
ples (μ(k) , k = 1, . . . , K) from the model fit. For example, the posterior mean of μ is
calculated as
E(μ|s) = ∫ μ [μ|s] dμ (6.99)
= ∫ ∫ μ [μ, θ|s] dθ dμ (6.100)
≈ (Σ_{k=1}^{K} μ(k) ) / K, (6.101)
using the MCMC samples μ(k) . This procedure requires that we sample the complete
position process (μ) in our MCMC algorithm.
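A minimal Python sketch (ours) of these sample-moment calculations, assuming the K MCMC draws of the position process are stored as an array:

```python
import numpy as np

def posterior_position_summary(mu_samples):
    """Monte Carlo approximations to E(mu|s) and Var(mu|s).

    mu_samples : (K, n, 2) array of K MCMC draws of the position
                 process at n prediction times in 2-D.
    """
    post_mean = mu_samples.mean(axis=0)         # Equation 6.101
    post_var = mu_samples.var(axis=0, ddof=1)   # pointwise posterior variance
    return post_mean, post_var
```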
In practice, we obtain MCMC samples for μ(k) at a finite set of prediction times.
These times may or may not line up perfectly with the times for which telemetry data
are available. Thus, consider two vectors: one vector containing the position process that lines up in time with the observations, μ, and a second vector that contains the positions for all prediction times of interest, μ̃. In this case, we can use composition
sampling to obtain MCMC samples for μ̃ by first sampling from the full-conditional
for the parameters θ, next sampling from the full-conditional distribution of μ con-
ditioned on θ, and finally sampling μ̃ from the conditional predictive distribution
[μ̃|μ, θ].
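The composition-sampling steps described above can be written generically; in this Python sketch (ours, with placeholder sampler functions standing in for the full-conditional and conditional predictive distributions), each loop iteration yields one draw of μ̃:

```python
def composition_sample(sample_theta, sample_mu, sample_mu_tilde, K):
    """Composition sampling: theta -> mu | theta -> mu_tilde | mu, theta.

    Each argument is a function that draws from the corresponding
    full-conditional (or conditional predictive) distribution.
    """
    draws = []
    for _ in range(K):
        theta = sample_theta()                     # [theta | .]
        mu = sample_mu(theta)                      # [mu | theta, .]
        draws.append(sample_mu_tilde(mu, theta))   # [mu_tilde | mu, theta]
    return draws
```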
We may also seek the posterior distribution for the movement metrics of inter-
est. Given that these movement metrics (e.g., f (μ)) are direct functions of the latent
position process μ, they can be treated as derived quantities in the model. To obtain
posterior inference for derived quantities that are functions of the complete position
process, we often need to calculate posterior moments. An example derived quantity
is the posterior mean of the movement metric itself
E(f (μ)|s) = ∫∫∫ f (μ̃)[μ̃|μ, θ][μ, θ|s] dθ dμ dμ̃ (6.102)
≈ (Σ_{k=1}^{K} f (μ̃(k) )) / K. (6.103)
The ability to find posterior statistics (e.g., means, variance, credible intervals) using
MCMC for functions of unknown quantities in Bayesian models arises as a result of
the equivariance property (Hobbs and Hooten 2015).
An example of a useful movement metric might be the total amount of time an
individual animal spent in geographic region A; in practice, A could be an area of
critical habitat, a national park, a highway buffer, or a city boundary. The associated
movement metric is

f (μ̃) = Σ_{j=1}^{m} Δt · I{μ̃(tj )∈A} , (6.104)

where the sum is over a set of m prediction times (t1 , . . . , tj , . . . , tm ) spaced Δt units
apart. The movement metric in Equation 6.104 can be used to graphically portray the
UD by calculating the posterior mean of it for a large set of grid cells in the study
area, each represented by a different A.
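A Python sketch (ours) of the metric in Equation 6.104, using a hypothetical unit-square region for A; averaging this quantity over the K path realizations gives its posterior mean:

```python
import numpy as np

def time_in_region(path, in_region, dt):
    """Equation 6.104: total time the predicted path spends in region A.

    path      : (m, 2) array of positions at prediction times dt apart
    in_region : function mapping an (m, 2) array to booleans (membership in A)
    dt        : spacing between prediction times
    """
    return dt * np.sum(in_region(path))

# Hypothetical region A: the unit square [0, 1] x [0, 1]
def in_unit_square(path):
    return np.all((path >= 0.0) & (path <= 1.0), axis=1)
```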
Another type of movement metric is total distance traveled by the individual. In
this case, an appropriate movement metric can be defined as
f (μ̃) = Σ_{j=2}^{m} √( (μ̃(tj ) − μ̃(tj−1 ))′ (μ̃(tj ) − μ̃(tj−1 )) ). (6.105)
The metric in Equation 6.105 adds up the lengths of each of the steps to calculate the
total path length. As with the first metric in Equation 6.104, the metric corresponding
to total distance moved, Equation 6.105, will converge to the correct value as the time gap between prediction locations shrinks (Δt → 0). From a computational storage
perspective, one benefit of using these single-number summaries as metrics is that
we can calculate running averages of them in the MCMC algorithm without having
to save the entire position process at all prediction times for every iteration.
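The path-length metric and the running-average trick can be sketched in Python (ours) as follows; the running mean is updated once per MCMC iteration, so the full position process never needs to be stored:

```python
import numpy as np

def path_length(path):
    """Equation 6.105: sum of Euclidean step lengths along a predicted path."""
    steps = np.diff(path, axis=0)
    return np.sum(np.sqrt(np.sum(steps**2, axis=1)))

class RunningMean:
    """Incremental average of a scalar metric across MCMC iterations."""
    def __init__(self):
        self.n, self.mean = 0, 0.0
    def update(self, x):
        self.n += 1
        self.mean += (x - self.mean) / self.n
        return self.mean
```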
As an alternative to obtaining the posterior inference for the movement metrics
concurrently with fitting the Bayesian model, Johnson et al. (2011) provided three
methods for obtaining approximate inference using a two-stage approach. In each
234 Animal Movement
method, the first stage involves fitting the CTCRW model (i.e., fit using the “crawl” R package) described in the previous sections. Recall that the CTCRW approach of
Johnson et al. (2011) relies on maximum likelihood methods and uses the Kalman
filter to estimate the latent state and is thus very computationally efficient.
For stage two, Johnson et al. (2011) suggest one of the following three approaches
to sample realizations of the position process μ̃(tj ) based on an implicit posterior
distribution for μ.
1. Plug-in: Use the MLEs for the model parameters as a stand-in for the posterior mode (under vague priors) in the full-conditional distribution [μ̃(tj )|·] and sample from it to obtain realizations of the position process.
2. Importance sampling: Sample model parameters from a proposal distribu-
tion, weight them according to the implicit posterior, then sample μ̃(tj ) from
its full-conditional distribution given the model parameters resampled with
probability proportional to the weights.
3. Integrated nested Laplace approximation: Deterministically sample model
parameters from a distribution that mimics the posterior, construct weights
based on the posterior at these sampled parameter locations, then sample
μ̃(tj ) from its full-conditional distribution.
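To make the second (importance sampling) approach concrete, the following Python sketch (ours, with toy log-densities standing in for the implicit posterior and the proposal) computes normalized weights for a set of parameter draws; the position process would then be sampled from its full-conditional given parameters resampled with these probabilities:

```python
import numpy as np

def importance_weights(theta_draws, log_target, log_proposal):
    """Normalized importance weights for parameter draws from a proposal."""
    log_w = log_target(theta_draws) - log_proposal(theta_draws)
    log_w -= log_w.max()            # stabilize before exponentiating
    w = np.exp(log_w)
    return w / w.sum()

# Toy example: standard normal "posterior," wider normal proposal
draws = np.linspace(-3.0, 3.0, 101)
w = importance_weights(draws,
                       log_target=lambda t: -0.5 * t**2,
                       log_proposal=lambda t: -0.5 * (t / 2.0)**2)
```

Degeneracy shows up here as a handful of weights absorbing nearly all of the mass; the effective sample size 1/Σ w² is a common diagnostic.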
All of these approaches assume that the MLE is a good representation of the posterior
mode, and thus, make strong assumptions about the effect of the prior distribution (or
lack thereof). However, the first approach will be substantially faster to implement than fitting the full Bayesian model. Its downside is that it will not properly accommodate the uncertainty in the parameters and may be a poor approximation in cases where the parameter uncertainty is relatively large; on the other hand, it is the fastest and easiest of the methods to implement. Approach two is more rigorous and will provide a good approximation to the true posterior when the proposal distribution is close to the target density. Otherwise, importance sampling methods are prone to degeneracy issues in which a few posterior realizations carry nearly all of the weight. Despite the additional complexity,
Johnson et al. (2011) prefer the third approach because the first two were inadequate
for their example.
Returning to the mule deer example, recall the GPS telemetry data for an autumn
migration of a male mule deer (Figure 6.19). Based on the CTCRW model fit using
maximum likelihood, we used the “crawl” R package to simulate 1000 realizations of
the position process, μ(t), by importance sampling (Johnson et al. 2011). Figure 6.20
shows the original telemetry data (points) and the distribution of the position process
(gray shaded region).
Regardless of how the realizations of the position process are obtained, after they
are in hand, they can be used for inference concerning the movement metrics of
interest. The excellent properties of Monte Carlo integration and MCMC allow for a
straightforward calculation of posterior summaries for derived quantities, regardless
of whether they are linear functions of the position process or not.
For example, notice that the uncertainty in the position process increases (i.e., gray regions widen) during periods where observations are spaced far apart in the mule deer example (Figure 6.20). We can properly account for the uncertainty in the
FIGURE 6.20 Observed GPS telemetry data (s(t), points) and predicted position pro-
cess (μ(t), gray shaded region) for an adult male mule deer during autumn in southeastern
Utah, USA.
underlying position process when obtaining inference for movement metrics. For
example, Figure 6.21 shows the distributions for the movement metrics based on
the posterior simulation of the position process after fitting the CTCRW model to
the GPS telemetry data from the mule deer during an autumn migration. Finally, the summary statistics in Table 6.2 indicate that the posterior mean path length during the fall migration for this individual mule deer was 31 km, and the average speed during the fall migration was 0.48 km/h.
FIGURE 6.21 Distributions of (a) total path length and (b) speed based on the CTCRW model
fit to the GPS telemetry data and posterior simulation of the position process from the mule
deer during an autumn migration.
TABLE 6.2 Posterior Summary Statistics for the GPS Telemetry Data from the Mule Deer during Autumn Migration (columns: Metric, Mean, Standard Deviation, Credible Interval).
McClintock et al. (2014) compared continuous- and discrete-time models both analytically and empirically. In doing so, they made several important points that we summarize in what follows.
One of the first points made by McClintock et al. (2014) is that the term “state-space model” refers to any hierarchical model that incorporates data and process model components. Both discrete-time and continuous-time animal movement models are state-space models if they accommodate measurement error. Thus, this term may not be the most appropriate way to distinguish among model forms. In fact,
McClintock et al. (2014), in their Table 6.2, list 17 different forms of statistical
movement models based on the following attributes: discrete/continuous time, dis-
crete/continuous space, metric being modeled (e.g., position, velocity, turning angle,
step length), directed or undirected movement, correlated or uncorrelated movement,
and whether they are single-state or multistate models. Their synthesis reveals a huge variety in the types of movement models developed and used in practice, and far from a consensus on their form.
As we have seen in this chapter, there has been a lengthy and sometimes parallel
evolution of both discrete-time and continuous-time models for animal movement.
Naturally, new developments are derived as generalizations of earlier models. For
example, the discrete-time models described by Morales et al. (2004) and Jonsen
et al. (2005) reduce to uncorrelated random walks under certain parameterizations
models used previously. Thus, some would argue the time for information trade-off
is worth it.
From our perspective, when considering the speed of obtaining animal movement
inference, one should consider the time it takes to develop the model, the code, and
the actual computational time together. More time spent on optimizing computer code
leads to increases in speed. Thus, when an algorithm needs to be used repetitively
(e.g., for several hundred individuals in a larger population), it can be worth the extra
programming time up front. Likewise, although models already exist that could be
used to analyze telemetry data, they can always be improved upon to yield faster
algorithms. Thus, ongoing development of both discrete- and continuous-time ani-
mal movement models is essential. However, we need not always focus on extending
animal movement models to more complicated settings; we should continue to pursue
important ways to facilitate the use of existing model forms.
In the previous chapter, we showed how one can build additional complexity and
realism into CTCRW movement models through the use of potential functions. How-
ever, owing to the readily available and user-friendly software provided by Johnson
et al. (2008a) that fits smooth velocity-based OU models to irregularly spaced data
while accounting for measurement error, there are many reasons to rely on the result-
ing model output for further inference. Using “crawl” to obtain posterior realizations
of the position process μ̃(tj ) allows for much more complicated inference than that
proposed by Johnson et al. (2011). In fact, entirely new movement models can be fit
using the output from “crawl” (or similar first stage models) as data. In what follows,
we describe several approaches for using first-stage posterior realizations of μ̃(tj ) in
secondary statistical models to learn about additional factors influencing movement.
The basic concept is to think of the types of statistical models you might fit if
you could have perfect knowledge about the true position process μ (i.e., μ(t), ∀t ∈
T , for the compact time period of interest T ).* In this case, we can build models
that rely on the entire continuous position process (i.e., a line on a map) and we can
characterize the path using the methods in the preceding section to obtain inference.
We can build population-level models that pool or cluster similar behaviors among
individuals. We can also obtain inference that improves the understanding of how
animals choose to move among resources and interact with each other at any temporal
scale of interest.
* With telemetry technology rapidly improving, semicontinuous data may not be far away, but we will
always have historical data sets for which inference is desired.
more accurate uncertainty estimates for the secondary model parameters than only
conditioning on the posterior mean for μ.
Traditionally, multiple imputation treats μ̃ as missing data and [β|μ] is assumed
to be asymptotically Gaussian (Rubin 1987, 1996). Furthermore, if we condition on
the augmented μ̃ and fit the Bayesian model, the posterior distribution [β|s̃] will con-
verge to the distribution of the MLE for β conditioned on μ̃. These ideas allow us to
use maximum likelihood methods to obtain the point estimate for β (i.e., E(β|μ̃))
and associated variance (i.e., Var(β|μ̃)), which can then be averaged to arrive at
inference for β conditioned on μ using the following conditional mean and variance
relationships:
E(β|μ) ≈ Eμ̃ (E(β|μ̃)), (7.1)
and
Var(β|μ) ≈ Eμ̃ (Var(β|μ̃)) + Varμ̃ (E(β|μ̃)). (7.2)
In practice, we fit individual models using maximum likelihood methods and μ̃(k) as data to yield β̂(k) and Var(β̂(k) ) for k = 1, . . . , K realizations from a first-stage model fit. We approximate the required integrals in the conditional mean and variance relationships using Monte Carlo integration, essentially computing sample averages and variances using β̂(k) and Var(β̂(k) ) for the K imputation samples. We have found that a relatively small number of imputation samples provides stable inference (i.e., on the order of tens rather than hundreds or thousands). This approach to multiple imputation is well known and performs well in most cases, but it also requires stronger assumptions and provides only approximate inference.
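A Python sketch (ours) of the combining step for a scalar coefficient; note that classical multiple imputation (Rubin 1987) inflates the between-imputation term by (1 + 1/K), whereas this sketch follows the simpler conditional mean and variance identities in Equations 7.1 and 7.2:

```python
import numpy as np

def combine_imputations(beta_hats, var_hats):
    """Combine K per-imputation fits via Equations 7.1 and 7.2.

    beta_hats : length-K array of MLEs, one per imputed path mu_tilde^(k)
    var_hats  : length-K array of their estimated variances
    """
    beta_hats = np.asarray(beta_hats)
    var_hats = np.asarray(var_hats)
    point = beta_hats.mean()          # Equation 7.1
    within = var_hats.mean()          # E over mu_tilde of Var(beta | mu_tilde)
    between = beta_hats.var(ddof=1)   # Var over mu_tilde of E(beta | mu_tilde)
    return point, within + between    # Equation 7.2
```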
An alternative approach to multiple imputation used by Hooten et al. (2010b),
Hanks et al. (2011), and Hanks et al. (2015a) can be formulated as
[β|s] = ∫ [β, μ|s] dμ (7.3)
= ∫ [β|μ, s][μ|s] dμ (7.4)
≈ ∫ [β|μ̃][μ̃|s] dμ̃, (7.5)
FIGURE 7.1 Schematic of telemetry observations (points) and interpolated position process
(line) passing through a neighborhood of the center pixel in a larger lattice.
* The chosen dimension of 5 × 1 is not arbitrary. It arises from the fact that a truly continuous path must
first pass to one of the first-order neighbors in a square lattice of pixels before moving into other pixels
(i.e., it cannot pass directly through the corner). Our temporal discretization can always be made fine
enough so that successive points will be no further away than a single pixel. In practice, for large data
sets, this could be computationally demanding. In such cases, the methods of Hanks et al. (2015a) may
be necessary.
Secondary Models and Inference 243
FIGURE 7.2 Possible first-order moves on a regular lattice with square cells (i.e., pixels). (a)
Move north, yj = (1, 0, 0, 0, 0) , (b) move east, yj = (0, 1, 0, 0, 0) , (c) stay, yj = (0, 0, 1, 0, 0) ,
(d) move south, yj = (0, 0, 0, 1, 0) , and (e) move west, yj = (0, 0, 0, 0, 1) .
The simplest of these drivers is based on the concept that an animal might move in a certain direction given the landscape it was in at time tj−1 . For example, possible drivers
of mule deer movement in southeastern Utah are shown in Figure 7.3. We can link
the movement probabilities with covariates so that g(pj,l ) = x′j β 1 , where xj = x(μ̃j ), for the lth neighbor of the cell corresponding to position μ̃j−1 .
Similarly, moves based on changes in the landscape can also be modeled. Recall
from the previous section on potential functions that we can model changes in position
as a function of the gradient associated with a potential function. In this context,
the movement probability is modeled as a function of the difference in covariates, δ j = x(μ̃j ) − x(μ̃j−1 ), at the center pixel and the neighboring pixel such that g(pj,l ) = δ ′j β 2 . This is the same general approach described in the spatio-temporal modeling
literature (Hooten and Wikle 2010; Hooten et al. 2010a; Broms et al. 2016).
Furthermore, when the individual does not move to a new pixel between successive
prediction times (tj−1 and tj ), we can model the residence probability as a function of
covariates in the residing pixel and neighboring pixels.* For example, Hooten et al.
(2010b) describe two possible residence models:
* For a temporally fine set of prediction times, there will be many more “stays” than “moves.” Again, Hanks
et al. (2015a) generalize these “stays” to a residence time, which reduces the computational demand
significantly.
FIGURE 7.3 Spatial covariates in the study area in southeastern Utah where the GPS
telemetry data (points) for the adult male mule deer were collected. (a) Deciduous forest, (b)
coniferous forest, (c) shrub/scrub, (d) elevation, (e) slope, and (f) solar exposure.
To implement the model, various types of link functions could be used for g(p).
Hooten et al. (2010b) employed a hierarchical Bayesian framework that involves the
use of latent auxiliary variables zj,l . Combining all previously mentioned drivers of
animal movement together, we can write a model for the continuous latent movement
variable as
zj,l = β0,1 + x′j β 1 + δ ′j β 2 + εj,l   if move on step j,
zj,l = β0,2 + x′j−1,3 β 3 + Σ_{l≠3} x′j−1,l β 4 + εj,l   if stay on step j,  (7.6)
where εj,l ∼ N(0, 1). Following Albert and Chib (1993), if the data model is specified such that pj,l ≡ P(zj,l > zj,l̃ , ∀ l̃ ≠ l), we are implicitly assuming a probit link function in this model.* This particular specification for a multinomial response model also
yields an MCMC algorithm that is fully conjugate, meaning that no tuning of the
algorithm is necessary. This is the primary advantage to using the popular auxiliary
variable approach of Albert and Chib (1993).
When implementing this model, we need to use the multiple imputation approach;
thus, there are K sets of data and covariates that we cycle through on each MCMC
iteration when sampling the sets of parameters β 1 , β 2 , β 3 , and β 4 . We suppressed
the k notation for each μ̃j in the model statements of this section for simplicity. However, the reason why we need K different sets of corresponding covariates is that the covariates will change as the position μ̃j(k) changes. Thus, despite its utility for
providing new inference, this approach can also be computationally demanding.
We fit the hierarchical discrete-space continuous-time movement model described
in this section to the mule deer GPS telemetry data in Figure 6.19 using Bayesian mul-
tiple imputation based on the position process predictions in Figure 6.20. Focusing on
the marginal posterior distributions for the coefficients associated with moves based
on the gradient of a potential function (β 4 ) in Equation 7.6, Figure 7.4 shows violin
plots for each coefficient. The most striking effect in Figure 7.4 is that of elevation
(d) on autumn movement of the mule deer individual. The strong negative coefficient
indicates that increasing elevation has a negative effect on movement because the
individual is descending as temperatures decrease in the autumn and forage becomes
scarce.
An alternative way to view the inference of the environmental covariates is to
visualize them spatially. Figure 7.5 shows the posterior mean potential function
and resulting directional derivative functions associated with the term δ j β 2 from
Equation 7.6. Hooten et al. (2010b) did not use a negative in the gradient function
* The probit link function is the standard normal cumulative distribution function. It transforms variables
on real support to the compact support of (0,1). The probit link is an alternative to the logit link.
FIGURE 7.4 Marginal posterior distributions (shown as violin plots representing the shape
of the posterior density functions) resulting from fitting the discrete-space continuous-time
movement model to the mule deer GPS telemetry data. Each coefficient corresponds to the
covariates in Figure 7.3: (a) Deciduous forest, (b) coniferous forest, (c) shrub/scrub, (d) eleva-
tion, (e) slope, and (f) solar exposure. Internal dark bars represent typical boxplots and white
points represent the median for each coefficient.
in their model specification (7.6); thus, Figure 7.5c shows the spatial potential func-
tion increasing (darker shading) toward high potential.* In this case, the potential
function (Figure 7.5c) is controlled mostly by elevation, as we discussed in relation
to the parameter estimates in Figure 7.4.
* Recall that we defined the gradient function with a negative sign in Section 6.6 to be consistent with the
notation used by Brillinger (2010).
† Generally, a discretized response variable will carry less information than the continuous response vari-
able it is based on. A binary response variable y, where y = I{z>0} for z ∼ N(0, σ 2 ), contains much less
information than z.
FIGURE 7.5 Posterior mean potential function (c) and directional derivatives: (a) west, (b)
north, (d) south, (e) east. Dark regions indicate large values. Telemetry data are shown in panel
(c) as dark points.
directly as a function of the gradient of the underlying potential function. This specification results in a simpler model form than the multinomial, where the velocity vectors are modeled as yj ∼ N(∇p(μ̃j , β), Σ). Recall, from the previous section on potential functions, that the term ∇p(μ̃j , β) represents the gradient of the spatially explicit function p(μ̃j , β). As noted previously, there are several options for
the potential function. One form for p(μ̃j , β) that is particularly useful when considering covariate influences on movement is the linear function p(μ̃j , β) = x(μ̃j )′ β. One can show that this model can be rewritten with the mean function equal to a linear combination of gradients such that

∇p(μ̃j , β) = Σ_{i=1}^{q} βi ∇xi (μ̃j ), (7.7)

for q covariates and covariance matrix Σ that controls asymmetric velocities (i.e., drift in the position process beyond that explained by x). The gradient vector for a given covariate x is ∇x(μ̃) = (dx/dμ̃1 , dx/dμ̃2 )′ , the elements of which can be calculated as dx/dμ̃1 ≈ (x(μ̃1 + δ) − x(μ̃1 ))/δ for small δ.
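The finite-difference gradient is straightforward to compute; a Python sketch (ours), where the covariate surface x is any function of position:

```python
import numpy as np

def covariate_gradient(x, mu, delta=1e-4):
    """Forward-difference approximation to the gradient of covariate x at mu."""
    e1 = np.array([delta, 0.0])
    e2 = np.array([0.0, delta])
    return np.array([(x(mu + e1) - x(mu)) / delta,
                     (x(mu + e2) - x(mu)) / delta])

# A planar covariate surface has a constant, known gradient (2, -3):
grad = covariate_gradient(lambda m: 2.0 * m[0] - 3.0 * m[1], np.array([1.0, 1.0]))
```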
Hanks et al. (2011) borrow a concept from the discrete-time velocity modeling approaches of Morales et al. (2004) and generalize the model to allow for temporally varying coefficients in a change-point framework. In this new specification, Hanks et al. (2011) indexed the regression coefficients in the potential function by time so that the model becomes yj ∼ N(∇p(μ̃j , β j ), Σ). Then they let β j arise from the mixture

β j = β 1 if tj ∈ (0, τ2 ),
      β 2 if tj ∈ [τ2 , τ3 ),
      ...
      β N if tj ∈ [τN , T), (7.8)
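The mixture in Equation 7.8 simply selects which coefficient vector is active at each time; a small Python sketch (ours, with scalar coefficients for brevity):

```python
import numpy as np

def changepoint_beta(t, taus, betas):
    """Equation 7.8: return the coefficient active at time t.

    taus  : increasing change points (tau_2, ..., tau_N)
    betas : N coefficients; betas[i] applies on the ith interval
    """
    return betas[int(np.searchsorted(taus, t, side="right"))]
```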
* Transdimensionality means that the parameter space changes on every iteration of a statistical algorithm
for fitting the model, like MCMC. These changes in the parameter space require modifications to the
MCMC algorithm so that the models with different numbers of change points can be fairly visited by
the algorithm.
FIGURE 7.6 Position process realizations (top lines), corresponding velocity vectors (bot-
tom arrows), and telemetry observations (points).
The largest northern fur seal rookeries exist at the Pribilof (i.e., Saint Paul and Saint George) and Commander Islands (i.e., Bering Island and Medney Island) in the summer. Male northern fur seals establish territories and breed with large groups
of females early in the summer. Generally, northern fur seals are pelagic foragers,
feeding on fish in the open ocean. During summer months, most northern fur seals
behave like central place foragers and respond to various environmental covariates
during their foraging trips. We used distance to rookery, sea surface temperature,
and primary productivity as covariates in the model (Figure 7.8). Figure 7.9 shows
the inference pertaining to the time-varying coefficients induced by the change-point
model in Equation 7.8. For this adult male northern fur seal, the credible intervals
for the coefficients indicate that the individual traveled away from the rookery (i.e.,
the coefficient for the gradient of distance to rookery was positive) during the early
part of the trip (up to day 12, approximately), then switched to respond negatively to
the gradient (Figure 7.9a). Figure 7.9b shows a similar temporal effect for sea surface
FIGURE 7.7 Observed northern fur seal telemetry data (dark points) and Bering Sea. Alaska
shown in gray.
FIGURE 7.8 Bering Sea environmental covariates: (a) distance to rookery, (b) sea surface temperature, and (c) primary productivity. Observed telemetry data are shown as points.
FIGURE 7.9 Posterior 95% credible intervals (gray region) and posterior mean (dark line) for
the coefficients associated with covariates: (a) distance to rookery, (b) sea surface temperature,
and (c) primary productivity.
FIGURE 7.10 Posterior mean gradient surface shown as arrows pointing in the direction of
largest gradient at a subset of time points during the 18-day foraging trip for the northern fur
seal. The position process is shown as a dark line.
The individual changes its behavior between 189 and 314 h, exhibiting more of a foraging pattern (Figure 7.10). Finally, after 378 h, the individual returns to the rookery in a transiting behavior again, indicated by the strong gradient field pointing toward the rookery.
The move probability (pj,move ) in Equation 7.9 can be thought of as a movement rate scaled by a decreasing unit of time (Δt · λj,move ). Thus, if we properly normalize Equation 7.9 so that it integrates to one, we have

e−τj λj,move / ∫0∞ e−τ λj,move dτ (7.10)
= λj,move e−τj λj,move . (7.11)

Thus, the asymptotic residence time model is exponentially distributed with parameter λj,move .
Returning to the multinomial model for moves, we arrive at a similar asymptotic
result for transitions to new pixels. Given that the individual is moving to a new pixel,
the probability of it moving to the lth neighboring cell is pj,l /pj,move . As before, if we
replace the transition probabilities with the associated rates scaled by Δt and take the limit, we have

lim_{Δt→0} pj,l / pj,move = lim_{Δt→0} (Δt · λj,l ) / (Δt · λj,move ) = λj,l / λj,move . (7.12)

The limit is not necessary in Equation 7.12 because the Δt cancels in the numerator and denominator, but we retain it to remain consistent with the derivation for residence time.
We now have a model for residence time (7.11) and for movement (7.12). If we
assume conditional independence, a model for the joint process of residence and
movement arises as a product of Equations 7.11 and 7.12
(λj,l / λj,move ) λj,move e−τj λj,move = λj,l e−τj λj,move . (7.13)
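The joint stay/move process in Equation 7.13 is a competing-rates model: residence time is exponential with rate λj,move , and the destination is chosen with probabilities λj,l /λj,move . A Python sketch (ours):

```python
import numpy as np

def stay_move_model(rates):
    """Residence and transition quantities implied by Equation 7.13.

    rates : positive movement rates lambda_{j,l} to the four
            first-order neighbors (e.g., north, east, south, west)
    """
    rates = np.asarray(rates)
    total = rates.sum()              # lambda_{j,move}
    mean_residence = 1.0 / total     # mean of Exponential(lambda_{j,move})
    move_probs = rates / total       # Equation 7.12
    return mean_residence, move_probs
```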
Based on Equation 7.13, Hanks et al. (2015a) noticed that, for all pairs of sequential stays and moves, the resulting likelihood is equivalent to a Poisson regression with a temporally heterogeneous offset. To show this, note that we can always expand the transition rate for the lth neighboring pixel in a product as λj,l = ∏_{l̃≠3} λj,l̃^{yj,l̃} , where the yj,l̃ are

yj,l̃ = 1 if l̃ = l, and 0 otherwise, (7.14)

as defined in Hooten et al. (2010b). Also, recall that the overall movement rate is a sum of pixel movement rates, λj,move = Σ_{l̃≠3} λj,l̃ . Substituting these quantities into Equation 7.13 yields

∏_{l̃≠3} λj,l̃^{yj,l̃} e−τj λj,l̃ , (7.15)

which is proportional to a product of Poisson probability mass functions for the random variables yj,l̃ with offsets τj . Thus, for a sequence of stay/move pairs that occur at the subset of prediction times J , we arrive at the likelihood

∏_{j∈J} ∏_{l̃≠3} λj,l̃^{yj,l̃} e−τj λj,l̃ . (7.16)
One beneficial consequence of the model developed by Hanks et al. (2015a) is that
a reparameterization of the multinomial model of Hooten et al. (2010b) leads to a
secondary statistical model that is computationally efficient. There are two reasons
for the computational improvement: (1) The original set of prediction times needs to
approach infinity, but this model depends only on the total number of moves, which is
a function of pixel size, and (2) by using the sufficient statistics (yj,1 , yj,2 , yj,4 , yj,5 , τj )
for j ∈ J of the data structure used by Hooten et al. (2010b), the reparameterized
model of Hanks et al. (2015a) is a Poisson GLM and can be fit with any statistical
software.*
The last step in setting up a useful model framework is to link the movement
rates λj,l with covariates. Thus, consider the standard log-linear regression model
log(λj,l ) = xj,l β, where xj,l are the covariates associated with the lth neighbor of the
pixel in which μ̃j−1 falls, and β are the usual regression coefficients to be estimated.
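Because Equation 7.16 is a Poisson likelihood with offset τj , the model can be fit with any GLM routine; a Python sketch (ours) of the corresponding negative log-likelihood, dropping constants that do not involve β:

```python
import numpy as np

def poisson_offset_nll(beta, X, y, tau):
    """Negative log-likelihood of Equation 7.16 up to an additive constant.

    X   : (n, p) covariates x_{j,l} for each neighbor/time pair
    y   : n binary move indicators y_{j,l}
    tau : n residence-time offsets tau_j
    """
    log_lam = X @ beta                            # log(lambda_{j,l}) = x'_{j,l} beta
    return np.sum(tau * np.exp(log_lam) - y * log_lam)
```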
As with any regression model, this one (7.16) can be generalized further to allow
for varying coefficients. In the animal movement context, it is sensible to allow for
time-varying coefficients, which could account for the individual’s residence time and
movement probabilities that may change during the period of time for which data are
collected. The resulting semiparametric model has the same form as Equation 7.16
but with link function modified so that
log(λj,l ) = xj,l β j
= xj,l Wj α, (7.17)
where Wj is a matrix of basis functions indexed in time and α is a new set of coef-
ficients to be estimated, instead of estimating β directly. The implementation of this
new model (7.17) only requires the creation of a modified set of covariates xj,l Wj
and then the estimated coefficients can be recombined with the matrices of basis
functions to recover β j = Wj α after the model has been fit to data. This procedure
allows us to view the β j as they vary over time. For example, based on a telemetry
data set spanning an entire year, we can obtain explicit statistical inference to assess
whether residence time is influenced more by forest cover in the winter or in the summer.

* Notice that yj,3 is missing from the list of sufficient statistics because it originally represented a stay;
stays are now represented by τj and moves by the remaining multinomial zeros and ones
(yj,1 , yj,2 , yj,4 , yj,5 ).

256 Animal Movement
The choice of basis functions, Wj , should match the goals of the study, and various
forms of regularization or model selection can be used to assess which coefficients
in α are helpful for prediction. By shrinking α toward zero with a penalized likeli-
hood approach or a Bayesian prior, one can essentially identify the optimal level of
smoothness in the β j over time. We would expect smoother β j over time in cases with
limited data.
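As a sketch of the varying-coefficient construction (Python, with a hypothetical harmonic basis in hour of day; any choice of Wj could be substituted), note that the modified covariates x′j,l Wj are what enter the model, and β j = Wj α is recovered after fitting:

```python
import numpy as np

def harmonic_basis(t, period=24.0):
    """Basis functions in time: intercept plus one harmonic (a hypothetical choice)."""
    w = 2 * np.pi * t / period
    return np.column_stack([np.ones_like(t), np.sin(w), np.cos(w)])

rng = np.random.default_rng(2)
n = 500
t = rng.uniform(0, 24, size=n)          # hour of day for each move opportunity
x = rng.normal(size=n)                  # a single covariate x_{j,l}
W = harmonic_basis(t)                   # rows of W_j, one per observation
Z = x[:, None] * W                      # modified covariates x_{j,l} W_j
alpha = np.array([0.5, 1.0, -0.3])      # coefficients actually estimated
beta_t = W @ alpha                      # recovered time-varying coefficient beta_j
assert np.allclose(Z @ alpha, x * beta_t)   # x beta_j == (x W_j) alpha exactly
```

Shrinking α toward zero (next paragraph) then translates directly into smoother beta_t curves.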
Hanks et al. (2015a) examined various approaches for regularization (Hooten and
Hobbs 2015) of the parameters α and made a strong case for the use of a lasso penalty
(based on an L1 norm). Regularization can be used in Bayesian and non-Bayesian
contexts and the amount of shrinkage can be chosen via cross-validation. Hanks
et al. (2015a) employed both approaches to multiple imputation described earlier (i.e.,
approximate and fully Bayesian) and found strong agreement between the resulting
inferences using as few as 50 imputation samples for μ̃.
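For the approximate multiple imputation approach, the fits to the K imputed paths must be pooled. One standard pooling rule (Rubin's rules, sketched below in Python) averages the point estimates and combines within- and between-imputation variability; it is shown here only to illustrate the bookkeeping and is not necessarily the exact combination rule used by Hanks et al. (2015a):

```python
import numpy as np

def pool_imputations(estimates, variances):
    """Rubin's rules: pool estimates and variances across K imputed paths."""
    estimates = np.asarray(estimates)   # shape (K, p), one fit per imputed path
    variances = np.asarray(variances)   # shape (K, p), squared standard errors
    K = estimates.shape[0]
    pooled = estimates.mean(axis=0)                 # pooled point estimate
    within = variances.mean(axis=0)                 # average within-fit variance
    between = estimates.var(axis=0, ddof=1)         # variance across imputations
    total = within + (1 + 1 / K) * between          # total variance of pooled est.
    return pooled, total

# e.g., K = 50 imputed paths, p = 2 coefficients (simulated fits)
rng = np.random.default_rng(3)
est = 0.8 + 0.05 * rng.normal(size=(50, 2))
var = np.full((50, 2), 0.02**2)
pooled, total_var = pool_imputations(est, var)
```

The between-imputation term is what propagates path uncertainty into the final standard errors.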
We fit the reparameterized continuous-time discrete-space model developed by
Hanks et al. (2015a) to a subset of GPS telemetry data arising from an individual
female mountain lion (Puma concolor) in Colorado, USA. Based on the covariates
in Figure 7.11, we used the forest versus non-forest covariate (Figure 7.11a) as a
“static driver” of movement and the distance to potential kill site (Figure 7.11b) as a
“dynamic driver” of movement.
Using a semiparametric specification as in Equation 7.17 with hour of day repre-
sented in the basis function (Wj ), we fit the movement model to the data from the
adult female mountain lion. Figure 7.12 shows the inference obtained for the effects
of forest versus non-forest and distance to nearest kill site as a function of time of
day (in hours). The results in Figure 7.12a suggest a lack of evidence for an effect
of forest presence on the individual mountain lion. However, Figure 7.12b provides
some evidence that distance to nearest kill site temporally affects the potential func-
tion that could influence movement. We also fit a temporally homogeneous Poisson
GLM to the same data and found strong evidence for an effect of distance to nearest
kill site on the potential function (p < 0.001).
FIGURE 7.11 GPS telemetry data (points connected in sequence by dashed lines) for an adult
female mountain lion (Puma concolor) in Colorado and two spatial covariates: (a) presence of
non-forest (dark) versus forest (light) and (b) distance to nearest potential kill site (dark is far,
light is near).
f (μi |μi−1 , θ) such that the model for the position process at time ti is

[μi |μi−1 , β, θ] ≡ g(x(μi ), β) f (μi |μi−1 , θ) / ∫ g(x(μ), β) f (μ|μi−1 , θ) dμ.  (7.18)
In Chapter 4, we mentioned that these types of point process models are some-
times referred to as step selection functions (Fortin et al. 2005; Avgar et al. 2016).
FIGURE 7.12 Inference for β resulting from fitting the reparameterized continuous-time
discrete-space model to an adult female mountain lion (Puma concolor) in Colorado using
the two spatial covariates: (a) presence of non-forest versus forest and (b) distance to nearest
potential kill site. Light shading represents a 95% confidence interval for the temporally vary-
ing coefficient and dark shading represents a 67% confidence interval. The temporally varying
point estimate is shown as the dark line.
These methods explicitly account for temporal scale while providing resource selec-
tion inference, but they do not allow the analyst to choose the scale for inference. To put
the choice of scale back in the hands of the analyst, Hooten et al. (2014) devel-
oped an approach for combining continuous-time movement models with resource
selection functions. Their approach relied on the OU models of Johnson et al.
(2008a) to characterize the use and availability distributions (i.e., [μi |μi−1 , β, θ] and
f (μi |μi−1 , θ) from Equation 7.18). They reconciled the two distributions to obtain
resource selection inference (i.e., inference for β).
To characterize use and availability, Hooten et al. (2014) proposed to use the
smoother and predictor distributions resulting from a hierarchical model for the true
position process μ(t) (Figure 7.13). As we discussed in Chapter 3, the Kalman filter,
smoother, and predictor distributions all pertain to our understanding of the latent
temporal process. These distributions are useful for estimating state variables in
hierarchical time series models and are often paired with maximum likelihood or EM
algorithms to fit non-Bayesian models.

FIGURE 7.13 Example of use (i.e., smoother, left) and availability (i.e., predictor, right)
distributions.

In the animal movement context,* the predictor
distribution is the distribution of μ(ti ) given everything up to, but not including,
time ti . The filter is the distribution of μ(ti ) given everything up to and including
time ti . Finally, the smoother distribution is the distribution of μ(ti ) given everything
before and after time ti . Recall, from Chapter 3, that the predictor distribution is the
most diffuse, with the filter and smoother each more precise. In fact, the smoother dis-
tribution is our best estimate of μ(ti ) using all information about the individual’s path.
The predictor distribution tells us about the likely location of the individual given only
past movement. Thus, the predictor serves as a good estimator of availability, inform-
ing us about where the individual is likely to be based on previous movement alone.
By contrast, the smoother serves as a good estimator for actual space use.
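The ordering of these three distributions can be checked numerically. The Python sketch below implements the Kalman filter and smoother recursions for a one-dimensional local-level (random walk plus noise) model — a deliberately simplified stand-in for the CTCRW model fit by crawl — and verifies that the predictor is the most diffuse and the smoother the most precise:

```python
import numpy as np

def kalman_local_level(y, q, r, m0=0.0, p0=10.0):
    """Forward Kalman filter and RTS smoother for a 1-D random-walk state."""
    n = len(y)
    m_pred = np.zeros(n); p_pred = np.zeros(n)   # predictor: past data only
    m_filt = np.zeros(n); p_filt = np.zeros(n)   # filter: past and present data
    m, p = m0, p0
    for i in range(n):
        m_pred[i], p_pred[i] = m, p + q          # predict step (process var q)
        k = p_pred[i] / (p_pred[i] + r)          # Kalman gain (obs. var r)
        m_filt[i] = m_pred[i] + k * (y[i] - m_pred[i])
        p_filt[i] = (1 - k) * p_pred[i]
        m, p = m_filt[i], p_filt[i]
    m_sm = m_filt.copy(); p_sm = p_filt.copy()   # smoother: all data
    for i in range(n - 2, -1, -1):               # backward (RTS) recursion
        c = p_filt[i] / (p_filt[i] + q)          # = p_filt[i] / p_pred[i+1]
        m_sm[i] = m_filt[i] + c * (m_sm[i + 1] - m_pred[i + 1])
        p_sm[i] = p_filt[i] + c**2 * (p_sm[i + 1] - p_pred[i + 1])
    return p_pred, p_filt, p_sm

rng = np.random.default_rng(4)
mu = np.cumsum(rng.normal(0, 1, 100))            # true path (q = 1)
y = mu + rng.normal(0, 0.5, 100)                 # telemetry with error (r = 0.25)
p_pred, p_filt, p_sm = kalman_local_level(y, q=1.0, r=0.25)
assert p_pred[50] >= p_filt[50] >= p_sm[50]      # predictor most diffuse
```

Here q and r are treated as known; in practice they would be estimated by maximum likelihood, as in crawl.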
Hooten et al. (2014) define [μi |μi−1 , β, θ] from Equation 7.18 as the smoother
distribution, and f (μi |μi−1 , θ) as the predictor distribution. Because Kalman meth-
ods are used to implement the CTCRW model of Johnson et al. (2008a), the smoother
and predictor distributions can be easily obtained using the “crawl” R package. To
estimate the selection coefficients β, Hooten et al. (2014) used the point estimate
for β that minimized the Kullback–Leibler (K–L) divergence between the left-hand
side and right-hand side of Equation 7.18. They conditioned on [μi |μi−1 , β, θ]
and f (μi |μi−1 , θ) and used the standard exponential resource selection function
g(x(μ(ti )), β) ≡ exp(x′(μ(ti ))β). For example, consider the GPS telemetry data
collected for an individual mountain lion in Figure 7.14 spanning 30 days. We are
interested in inference for resource selection at the hourly temporal scale. The selec-
tion coefficient values that minimize the difference between the actual use (left-hand
side) and the predicted use (right-hand side) at time ti provide insight about the type
of selection occurring at that time. This provides a time-varying estimate β̂(ti ) that
can be temporally averaged to provide broader scale inference.
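A one-dimensional caricature of this K–L matching procedure is easy to construct (Python; the densities and the covariate x(s) = s below are invented purely for illustration): availability is a Gaussian predictor density, use is a shifted Gaussian smoother density, and β̂ is the value minimizing the K–L divergence between the use distribution and the RSF-weighted availability.

```python
import numpy as np

s = np.linspace(-5, 5, 1001)                 # 1-D spatial grid
ds = s[1] - s[0]
x = s                                        # hypothetical covariate: x(s) = s
avail = np.exp(-0.5 * s**2)                  # availability (predictor), N(0, 1)
avail /= avail.sum() * ds
use = np.exp(-0.5 * (s - 0.6)**2)            # use (smoother), shifted by selection
use /= use.sum() * ds

def kl_to_weighted_avail(beta):
    """K-L divergence from the use distribution to the RSF-weighted availability."""
    w = avail * np.exp(x * beta)             # g(x, beta) * f, then renormalize
    w /= w.sum() * ds
    return np.sum(use * np.log(use / w)) * ds

betas = np.linspace(-2, 2, 401)              # grid search stands in for an optimizer
beta_hat = betas[np.argmin([kl_to_weighted_avail(b) for b in betas])]
d_orig = kl_to_weighted_avail(0.0)           # divergence with no selection
d_min = kl_to_weighted_avail(beta_hat)       # divergence at the optimum
weight = np.exp(-(d_orig - d_min))           # near 1 implies no evidence of selection
```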
The magnitude of selection at time ti can also be measured by comparing the
minimized K–L divergence Dmin (ti ) with the original K–L divergence between
the predictor and smoother distributions, Dorig (ti ). The quantity exp(−(Dorig (ti ) − Dmin (ti )))
serves as a measure of selection at time ti ; when it equals 1, there is no
evidence of selection. Thus, Hooten et al. (2014) used the weights
FIGURE 7.14 GPS telemetry data (dark points connected by dashed lines) and spatial covari-
ates (background image) for resource selection inference: (a) Urban, (b) shrub, (c) bare ground,
and (d) elevation.
for the other covariates). Furthermore, the selection for higher elevations tends to
vary throughout the month-long period, but is somewhat temporally clustered (e.g.,
approximately day 20). Finally, the weights w(t) shown in Figure 7.15e are near
zero during certain periods (e.g., days 20–25), indicating a lack of evidence
for selection during those periods. The temporally averaged coefficients were
β̄ = (−1.64, −3.51, −3.04, −0.01)′. The fact that the averaged coefficient for
elevation (β̄4 ) is close to zero suggests that, over the period of a month, elevation is not
consistently selected for or against.
FIGURE 7.15 Optimal coefficients (β̂(t)) for the mountain lion data and covariates in
Figure 7.14. The time-varying coefficients at the hourly scale: (a) urban, (b) shrub, (c) bare
ground, and (d) elevation. The time-varying weights, calculated using Equation 7.19, are shown
in panel (e).
cover concentration, sea floor depth) in a Bayesian model that was fit using MCMC.
As part of an interdisciplinary collaboration on bearded seal ecology, Cameron et al.
(2016) were able to examine benthic foraging resource selection by drawing from the
posterior output of McClintock et al. (2016) and using multiple imputation to account
for location and state assignment uncertainty.
For their analysis, Cameron et al. (2016) synthesized trawl survey data for dozens
of benthic taxa sampled along the Chukchi corridor of Alaska, including most of
those known to be prey species of bearded seals based on stomach content data (e.g.,
bivalves). Using prey species biomass, sediment type, and sea floor depth as pre-
dictors partitioned into a fine set of grid cells, Cameron et al. (2016) fit a resource
selection model similar to the space-only Poisson point process model described by
Johnson et al. (2013) to each draw from the posterior output of McClintock et al.
(2016) using the readily available R package “INLA” (Lindgren and Rue 2015). This
is the exact same model as (4.51), but, in this case, the response variable, yl(k), is
the number of locations μ(k) in grid cell l that were assigned to the benthic
foraging state for the kth draw from the posterior. Because benthic species distribution
FIGURE 7.16 Bearded seal benthic foraging locations identified within the Chukchi corridor
study area near Alaska, USA, from a discrete-time multistate movement model. Study area grid
is shaded by sea floor depth, where darker shades indicate deeper waters.
and community composition are interrelated and the product of complicated ecolog-
ical relationships that are spatially correlated, Cameron et al. (2016) dealt with the
problem of multicollinearity by using principal component regression techniques and
the singular value decomposition of the (standardized) design matrix X = UDV′, where
the columns of U (i.e., the left singular vectors) form an orthonormal basis that was
used in lieu of X in Equation 4.51. After model fitting, the regression coefficients can
be easily back-transformed for inference on the original scale of the predictors.
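The back-transformation is a simple identity: if α̂ is estimated with U in place of X, then β̂ = VD−1α̂ yields exactly the same linear predictor on the original covariate scale. The Python sketch below verifies this, using ordinary least squares on simulated data as a stand-in for the point-process fit in INLA.

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 300, 5
X = rng.normal(size=(n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)            # standardized design matrix
U, d, Vt = np.linalg.svd(X, full_matrices=False)    # X = U D V'
beta_true = rng.normal(size=p)
y = X @ beta_true + rng.normal(scale=0.1, size=n)   # simulated response
# fit using the orthonormal columns of U in lieu of X (OLS stands in for the
# Poisson point-process likelihood); U'U = I, so the fit is a simple projection
alpha = U.T @ y
beta = Vt.T @ (alpha / d)                           # back-transform: beta = V D^-1 alpha
assert np.allclose(X @ beta, U @ alpha)             # identical fitted values
```

Because the columns of U are orthogonal, the coefficients in alpha are estimated free of the multicollinearity present in X.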
For demonstration, we replicated the analysis of Cameron et al. (2016) for a single
bearded seal (Figure 7.16). We fit (4.51) to K = 4000 posterior samples of the loca-
tions and state assignments from the output of the multistate model fit by McClintock
et al. (2016). This particular seal exhibited several benthic foraging resource selec-
tion “hotspots” off the coast of Alaska along the Chukchi corridor (Figure 7.17).
Bivalves, sculpins (family Cottidae), sea urchins (class Echinoidea), and shrimp
(infraorder Caridea) represented a subset of the prey taxa that exhibited positive
selection coefficients for this particular seal (Figure 7.18).
While it is theoretically possible to explicitly incorporate dozens of environmen-
tal covariates related to specific behaviors into the multistate movement models
described in Chapter 5, it is more computationally efficient to perform a two-stage
analysis using multiple imputation as in Cameron et al. (2016). While multiple impu-
tation allowed Cameron et al. (2016) to account for location and state assignment
uncertainty, a disadvantage of their two-stage approach is that the prey biomass data
were not used to inform the movement process itself (and hence the estimated loca-
tions of benthic foraging activity). This could be particularly important when trying
FIGURE 7.17 Overall fitted (a) selection and (b) availability surfaces for a bearded seal along
the Chukchi corridor of Alaska, USA. Selection covariates included sea floor depth, sediment
type, and dozens of benthic taxa (e.g., bivalves, fish). Darker shades indicate greater intensity.
FIGURE 7.18 Individual selection surfaces for (a) sea urchin, (b) small sculpin, (c) large
surface bivalve, and (d) small shrimp biomass covariates for a bearded seal along the Chukchi
corridor of Alaska, USA. Darker shades indicate greater intensity.
to identify movement behaviors of even finer detail (e.g., foraging dives for bivalves
versus cod).
Recall, from Chapter 5, that discrete-time multistate movement models can be fit
using maximum likelihood when location measurement error and missing data are
negligible. When this is not the case, McClintock et al. (2016) proposed a potential
alternative using multiple imputation; instead of using computationally demanding
MCMC methods to fit discrete-time multistate movement models that account for
location measurement error (e.g., Jonsen et al. 2005; McClintock et al. 2012), real-
izations of the movement path obtained from “crawl” can be used as the data for
Glossary
Markov: A random variable that is dependent on the rest of a process only through
its neighbors.
Markov process: A set of random variables that depend on each other only through
their neighbors.
Mixed model: A statistical model that contains both fixed and random effects.
Monte Carlo: Obtaining realizations of random variables by drawing them from a
probability distribution.
Multistate model: In animal movement ecology, a clustering model allowing for the
data or process to arise from a discrete set of probability regimes.
Nonparametric: A statistical model that does not fully specify a specific function as
a probability distribution for a random variable.
Norm: A distance function (not necessarily Euclidean). For example, |a − b| is the
L1 norm between vectors a and b (i.e., Manhattan distance).
OU process (Ornstein–Uhlenbeck): A Brownian motion process that has attraction
to a point.
Parametric: A statistical model that involves a specific probability distribution for
the random variable whose functional form depends on a set of parameters
that are often unknown.
Point process: A stochastic process where the positions of the events are the random
quantity of interest. In movement ecology, the events are typically either the
observed or true locations of the individual.
Posterior distribution: Probability distribution of parameters given observed data.
Posterior predictive distribution: Probability distribution of future data given the
observed data.
Precision: The inverse of the variance (e.g., 1/σ2, or Σ−1 if Σ is a covariance matrix).
Prior distribution: A probability distribution useful in Bayesian modeling contain-
ing known information about the model parameters before the current data
are analyzed.
Probability density (or mass) function, PDF or PMF: A function expressing the
stochastic nature of a continuous or discrete random variable (usually
denoted as f (y) or [y] for random variable y).
Random effect: Parameters in a statistical model that are allowed to arise from a
distribution with unknown parameters.
Random field: A continuous stochastic process over space or time that is usually
correlated in some way.
Random walk: A dynamic temporal stochastic process that is not necessarily attracted
to a central location.
Redistribution kernel: A function that describes the probability of moving from one
location to another in a period of time.
Seasonality: Periodicity in temporal processes, a commonly used term in time series.
Singular value decomposition: The decomposition of a matrix (X) into a product
of three matrices (i.e., X = UDV′): the left singular vectors U, a diagonal
matrix D with singular values on the diagonal, and the right singular vectors V.
Spectral decomposition: An eigendecomposition of a matrix (e.g., Σ = QΛQ′).
Stationary process: A process with covariance structure that does not vary with
location (in space or time).
References
Besag, J. 1974. Spatial interaction and the statistical analysis of lattice systems. Journal of the
Royal Statistical Society, Series B, 36:192–225.
Beyer, H., J. Morales, D. Murray, and M.-J. Fortin. 2013. Estimating behavioural states from
movement paths using Bayesian state-space models: A proof of concept. Methods in
Ecology and Evolution, 4:433–441.
Bidder, O., J. Walker, M. Jones, M. Holton, P. Urge, D. Scantlebury, N. Marks, E. Magowan,
I. Maguire, and R. Wilson. 2015. Step by step: Reconstruction of terrestrial animal
movement paths by dead-reckoning. Movement Ecology, 3:1–16.
Biuw, M., B. McConnell, C. Bradshaw, H. Burton, and M. Fedak. 2003. Blubber and buoyancy:
Monitoring the body condition of free-ranging seals using simple dive characteristics.
Journal of Experimental Biology, 206:3405–3423.
Blackwell, P. 1997. Random diffusion models for animal movement. Ecological Modelling,
100:87–102.
Blackwell, P. 2003. Bayesian inference for Markov processes with diffusion and discrete
components. Biometrika, 90:613–627.
Blackwell, P., M. Niu, M. Lambert, and S. LaPoint. 2015. Exact Bayesian inference for animal
movement in continuous time. Methods in Ecology and Evolution, 7:184–195.
Boersma, P. and G. Rebstock. 2009. Foraging distance affects reproductive success in
Magellanic penguins. Marine Ecology Progress Series, 375:263–275.
Bolker, B. M. 2008. Ecological Models and Data in R. Princeton University Press, Princeton,
New Jersey, USA.
Borger, L., B. Dalziel, and J. Fryxell. 2008. Are there general mechanisms of animal home
range behaviour? A review and prospects for future research. Ecology Letters, 11:637–
650.
Bowler, D. and T. Benton. 2005. Causes and consequences of animal dispersal strategies:
Relating individual behaviour to spatial dynamics. Biological Reviews, 80:205–225.
Boyce, M., J. Mao, E. Merrill, D. Fortin, M. Turner, J. Fryxell, and P. Turchin. 2003. Scale
and heterogeneity in habitat selection by elk in Yellowstone National Park. Ecoscience,
10:321–332.
Boyd, J. and D. Brightsmith. 2013. Error properties of Argos satellite telemetry locations using
least squares and Kalman filtering. PLoS One, 8:e63051.
Breed, G., D. Costa, M. Goebel, and P. Robinson. 2011. Electronic tracking tag pro-
gramming is critical to data collection for behavioral time-series analysis. Ecosphere,
2:1–12.
Breed, G. A., I. D. Jonsen, R. A. Myers, W. D. Bowen, and M. L. Leonard. 2009. Sex-specific,
seasonal foraging tactics of adult grey seals (Halichoerus grypus) revealed by state–space
analysis. Ecology, 90:3209–3221.
Bridge, E., K. Thorup, M. Bowlin, P. Chilson, R. Diehl, R. Fléron, P. Hartl et al. 2011. Tech-
nology on the move: Recent and forthcoming innovations for tracking migratory birds.
BioScience, 61:689–698.
Brillinger, D. 2010. Modeling spatial trajectories. In Gelfand, A., P. Diggle, M. Fuentes, and P.
Guttorp, editors, Handbook of Spatial Statistics, pages 463–475. Chapman & Hall/CRC,
Boca Raton, Florida, USA.
Brillinger, D., H. Preisler, A. Ager, and J. Kie. 2001. The use of potential functions in modeling
animal movement. In Saleh, E., editor, Data Analysis from Statistical Foundations, pages
369–386. Nova Science Publishers, Huntington, New York, USA.
Brockwell, P. and R. Davis. 2013. Time Series: Theory and Methods. Springer Science &
Business Media, New York, New York, USA.
Broms, K., M. Hooten, R. Altwegg, and L. Conquest. 2016. Dynamic occupancy models for
explicit colonization processes. Ecology, 97:194–204.
Brost, B., M. Hooten, E. Hanks, and R. Small. 2015. Animal movement constraints improve
resource selection inference in the presence of telemetry error. Ecology, 96:2590–2597.
Brown, J. 1969. Territorial behavior and population regulation in birds: A review and
re-evaluation. The Wilson Bulletin, 81:293–329.
Buderman, F., M. Hooten, J. Ivan, and T. Shenk. 2016. A functional model for characterizing
long distance movement behavior. Methods in Ecology and Evolution, 7:264–273.
Burt, W. 1943. Territoriality and home range concepts as applied to mammals. Journal of
Mammalogy, 24:346–352.
Cagnacci, F., L. Boitani, R. A. Powell, and M. S. Boyce. 2010. Animal ecology meets
GPS-based radiotelemetry: A perfect storm of opportunities and challenges. Philosoph-
ical Transactions of the Royal Society of London B: Biological Sciences, 365:2157–
2162.
Calder, C. 2007. Dynamic factor process convolution models for multivariate space-time data
with application to air quality assessment. Environmental and Ecological Statistics,
14:229–247.
Cameron, M., B. McClintock, A. Blanchard, S. Jewett, B. Norcross, R. Lauth, J. Grebmeier,
J. Lovvorn, and P. Boveng. 2016. Bearded seal foraging resource selection related to
benthic communities and environmental characteristics of the Chukchi Sea. In Review.
Carbone, C., G. Cowlishaw, N. Isaac, and J. Rowcliffe. 2005. How far do animals go?
Determinants of day range in mammals. The American Naturalist, 165:290–297.
Carpenter, B., A. Gelman, M. Hoffman, D. Lee, B. Goodrich, M. Betancourt, M. Brubaker,
J. Guo, P. Li, and A. Riddell. 2016. Stan: A probabilistic programming language. Journal
of Statistical Software.
Caswell, H. 2001. Matrix Population Models. Wiley Online Library, Sunderland, Mas-
sachusetts, USA.
Christ, A., J. Ver Hoef, and D. Zimmerman. 2008. An animal movement model incorporating
home range and habitat selection. Environmental and Ecological Statistics, 15:27–38.
Clark, J. 1998. Why trees migrate so fast: Confronting theory with dispersal biology and the
paleorecord. The American Naturalist, 152:204–224.
Clark, J. 2007. Models for Ecological Data: An Introduction. Princeton University Press,
Princeton, New Jersey, USA.
Clark, J., M. Lewis, J. McLachlan, and J. HilleRisLambers. 2003. Estimating population
spread: What can we forecast and how well? Ecology, 84:1979–1988.
Clobert, J. 2000. Dispersal. Oxford University Press, New York, USA.
Clobert, J., L. Galliard, J. Cote, S. Meylan, and M. Massot. 2009. Informed dispersal, het-
erogeneity in animal dispersal syndromes and the dynamics of spatially structured
populations. Ecology Letters, 12:197–209.
Codling, E., M. Plank, and S. Benhamou. 2008. Random walk models in biology. Journal of
the Royal Society Interface, 5:813–834.
Cooke, S., S. Hinch, M. Wikelski, R. Andrews, L. Kuchel, T. Wolcott, and P. Butler. 2004.
Biotelemetry: A mechanistic approach to ecology. Trends in Ecology and Evolution,
19:334–343.
Costa, D., P. Robinson, J. Arnould, A.-L. Harrison, S. E. Simmons, J. L. Hassrick, A. J. Hoskins
et al. 2010. Accuracy of Argos locations of pinnipeds at-sea estimated using Fastloc GPS.
PLoS One, 5:e8677.
Cote, J. and J. Clobert. 2007. Social information and emigration: Lessons from immigrants.
Ecology Letters, 10:411–417.
Coulson, T., E. Catchpole, S. Albon, B. Morgan, J. Pemberton, T. Clutton-Brock, M. Crawley,
and B. Grenfell. 2001. Age, sex, density, winter weather, and population crashes in soay
sheep. Science, 292:1528–1531.
Couzin, I., J. Krause, N. Franks, and S. Levin. 2005. Effective leadership and decision-making
in animal groups on the move. Nature, 433:513–516.
Couzin, I. D., J. Krause, R. James, G. D. Ruxton, and N. R. Franks. 2002. Collective memory
and spatial sorting in animal groups. Journal of Theoretical Biology, 218(1):1–11.
Cox, D. and D. Oakes. 1984. Analysis of Survival Data, volume 21. CRC Press, Boca Raton,
Florida, USA.
Craighead, F. and J. Craighead. 1972. Grizzly bear prehibernation and denning activities as
determined by radiotracking. Wildlife Monographs, (32):3–35.
Cressie, N. 1990. The origins of Kriging. Mathematical Geology, 22:239–252.
Cressie, N. 1993. Statistics for Spatial Data: Revised Edition. John Wiley and Sons, New York,
New York, USA.
Cressie, N. and C. Wikle. 2011. Statistics for Spatio-Temporal Data. John Wiley and Sons,
New York, New York, USA.
Dall, S., L.-A. Giraldeau, O. Olsson, J. McNamara, and D. Stephens. 2005. Information and
its use by animals in evolutionary ecology. Trends in Ecology & Evolution, 20:187–193.
Dall, S., A. Houston, and J. McNamara. 2004. The behavioural ecology of personality: Con-
sistent individual differences from an adaptive perspective. Ecology Letters, 7:734–739.
Dalziel, B., J. Morales, and J. Fryxell. 2008. Fitting probability distributions to animal
movement trajectories: Using artificial neural networks to link distance, resources, and
memory. The American Naturalist, 172:248–258.
Danchin, E., L.-A. Giraldeau, T. Valone, and R. Wagner. 2004. Public information: From nosy
neighbors to cultural evolution. Science, 305:487–491.
Datta, A., S. Banerjee, A. O. Finley, and A. E. Gelfand. 2016. Hierarchical nearest-neighbor
Gaussian process models for large geostatistical datasets. Journal of the American
Statistical Association, 111:800–812.
Davis, R. A., S. H. Holan, R. Lund, and N. Ravishanker. 2016. Handbook of Discrete-Valued
Time Series. CRC Press, Boca Raton, Florida, USA.
Delgado, M. and V. Penteriani. 2008. Behavioral states help translate dispersal movements into
spatial distribution patterns of floaters. The American Naturalist, 172:475–485.
Delgado, M., V. Penteriani, J. Morales, E. Gurarie, and O. Ovaskainen. 2014. A statistical
framework for inferring the influence of conspecifics on movement behaviour. Methods
in Ecology and Evolution, 5:183–189.
Deneubourg, J.-L., S. Goss, N. Franks, and J. Pasteels. 1989. The blind leading the blind:
Modeling chemically mediated army ant raid patterns. Journal of Insect Behavior,
2:719–725.
deSolla, S., R. Shane, R. Bonduriansky, and R. Brooks. 1999. Eliminating autocorrelation
reduces biological relevance of home range estimates. Journal of Animal Ecology,
68:221–234.
Diggle, P. 1985. A kernel method for smoothing point process data. Applied Statistics,
34:138–147.
Diggle, P., R. Menezes, and T. Su. 2010a. Geostatistical inference under preferential sampling.
Journal of the Royal Statistical Society: Series C (Applied Statistics), 59:191–232.
Diggle, P. and P. Ribeiro. 2002. Bayesian inference in Gaussian model-based geostatistics.
Geographical and Environmental Modelling, 6:129–146.
Diggle, P. and P. Ribeiro. 2007. Model-Based Geostatistics. Springer, New York, New York,
USA.
Diggle, P., J. Tawn, and R. Moyeed. 1998. Model-based geostatistics. Journal of the Royal
Statistical Society: Series C (Applied Statistics), 47(3):299–350.
Diggle, P. J., I. Kaimi, and R. Abellana. 2010b. Partial-likelihood analysis of spatio-temporal
point-process data. Biometrics, 66:347–354.
Fortin, D., H. Beyer, M. Boyce, D. Smith, T. Duchesne, and J. Mao. 2005. Wolves influence elk
movements: Behavior shapes a trophic cascade in Yellowstone National Park. Ecology,
86:1320–1330.
Frair, J., E. Merrill, J. Allen, and M. Boyce. 2007. Know thy enemy: Experience affects
elk translocation success in risky landscapes. The Journal of Wildlife Management, 71:
541–554.
Franke, A., T. Caelli, G. Kuzyk, and R. Hudson. 2006. Prediction of wolf (Canis lupus) kill-
sites using hidden Markov models. Ecological Modelling, 197(1):237–246.
Fraser, D., J. Gilliam, M. Daley, A. Le, and G. Skalski. 2001. Explaining leptokurtic move-
ment distributions: Intrapopulation variation in boldness and exploration. The American
Naturalist, 158:124–135.
Fryxell, J., A. Mosser, A. Sinclair, and C. Packer. 2007. Group formation stabilizes predator–
prey dynamics. Nature, 449:1041–1043.
Garlick, M., J. Powell, M. Hooten, and L. McFarlane. 2011. Homogenization of large-scale
movement models in ecology. Bulletin of Mathematical Biology, 73:2088–2108.
Garlick, M., J. Powell, M. Hooten, and L. McFarlane. 2014. Homogenization, sex, and differ-
ential motility predict spread of chronic wasting disease in mule deer in Southern Utah.
Journal of Mathematical Biology, 69:369–399.
Gaspar, P., J.-Y. Georges, S. Fossette, A. Lenoble, S. Ferraroli, and Y. Le Maho. 2006.
Marine animal behaviour: Neglecting ocean currents can lead us up the wrong track.
Proceedings of the Royal Society of London B: Biological Sciences, 273(1602):
2697–2702.
Gelfand, A. and A. Smith. 1990. Sampling-based approaches to calculating marginal densities.
Journal of the American Statistical Association, 85:398–409.
Gelfand, A. E., P. Diggle, P. Guttorp, and M. Fuentes. 2010. Handbook of Spatial Statistics.
CRC Press, Boca Raton, Florida, USA.
Gelman, A., J. B. Carlin, H. S. Stern, and D. B. Rubin. 2014. Bayesian Data Analysis. Taylor
& Francis, Boca Raton, Florida, USA.
Gelman, A. and J. Hill. 2006. Data Analysis Using Regression and Multilevel Hierarchical
Models. Cambridge University Press, Cambridge, United Kingdom.
Getz, W., S. Fortman-Roe, P. Cross, A. Lyons, S. Ryan, and C. Wilmers. 2007. LoCoH: Non-
parameteric kernel methods for constructing home ranges and utilization distributions.
PLoS One, 2:e207.
Giuggioli, L. and V. Kenkre. 2014. Consequences of animal interactions on their dynamics:
Emergence of home ranges and territoriality. Movement Ecology, 2:20.
Giuggioli, L., J. Potts, and S. Harris. 2012. Predicting oscillatory dynamics in the movement
of territorial animals. Journal of The Royal Society Interface, 9:1529–1543.
Grimmett, G. and D. Stirzaker. 2001. Probability and Random Processes. Oxford University
Press, New York, New York, USA.
Gurarie, E., C. Bracis, M. Delgado, T. Meckley, I. Kojola, and C. Wagner. 2016. What is the
animal doing? Tools for exploring behavioural structure in animal movements. Journal
of Animal Ecology, 85(1):69–84.
Gurarie, E. and O. Ovaskainen. 2011. Characteristic spatial and temporal scales unify models
of animal movement. The American Naturalist, 178(1):113–123.
Gurarie, E. and O. Ovaskainen. 2013. Towards a general formalization of encounter rates in
ecology. Theoretical Ecology, 6:189–202.
Hanks, E. and M. Hooten. 2013. Circuit theory and model-based inference for landscape
connectivity. Journal of the American Statistical Association, 108:22–33.
Hanks, E., M. Hooten, and M. Alldredge. 2015a. Continuous-time discrete-space models for
animal movement. Annals of Applied Statistics, 9:145–165.
Hanks, E., M. Hooten, D. Johnson, and J. Sterling. 2011. Velocity-based movement modeling
for individual and population level inference. PLoS One, 6:e22795.
Hanks, E., E. Schliep, M. Hooten, and J. Hoeting. 2015b. Restricted spatial regression in prac-
tice: Geostatistical models, confounding, and robustness under model misspecification.
Environmetrics, 26:243–254.
Hanski, I. and O. Gaggiotti. 2004. Ecology, Genetics, and Evolution of Metapopulations.
Academic Press, Burlington, Massachusetts, USA.
Harris, K. and P. Blackwell. 2013. Flexible continuous-time modeling for heterogeneous
animal movement. Ecological Modelling, 255:29–37.
Harrison, X., J. Blount, R. Inger, D. Norris, and S. Bearhop. 2011. Carry-over effects as drivers
of fitness differences in animals. Journal of Animal Ecology, 80:4–18.
Haydon, D., J. Morales, A. Yott, D. Jenkins, R. Rosatte, and J. Fryxell. 2008. Socially
informed random walks: Incorporating group dynamics into models of population
spread and growth. Proceedings of the Royal Society of London B: Biological Sciences,
275:1101–1109.
Haynes, K. and J. Cronin. 2006. Interpatch movement and edge effects: The role of behavioral
responses to the landscape matrix. Oikos, 113:43–54.
Hefley, T., K. Broms, B. Brost, F. Buderman, S. Kay, H. Scharf, J. Tipton, P. Williams, and
M. Hooten. 2016a. The basis function approach to modeling dependent ecological data.
Ecology, In Press.
Hefley, T., M. Hooten, R. Russell, D. Walsh, and J. Powell. 2016b. Ecological diffusion models
for large data sets and fine-scale inference. In Review.
Higdon, D. 1998. A process-convolution approach to modeling temperatures in the North
Atlantic Ocean. Environmental and Ecological Statistics, 5:173–190.
Higdon, D. 2002. Space and space-time modeling using process convolutions. In Anderson, C.,
V. Barnett, P. Chatwin, and A. El-Shaarawi, editors, Quantitative Methods for Current
Environmental Issues, pages 37–56. Springer-Verlag, London, UK.
Higgs, M. and J. Ver Hoef. 2012. Discretized and aggregated: Modeling dive depth of harbor
seals from ordered categorical data with temporal autocorrelation. Biometrics, 68:965–974.
Hobbs, N., C. Geremia, J. Treanor, R. Wallen, P. White, M. Hooten, and J. Rhyan. 2015. State-
space modeling to support adaptive management of brucellosis in the Yellowstone bison
population. Ecological Monographs, 85:525–556.
Hobbs, N. and M. Hooten. 2015. Bayesian Models: A Statistical Primer for Ecologists.
Princeton University Press, Princeton, New Jersey, USA.
Hodges, J. and B. Reich. 2010. Adding spatially-correlated errors can mess up the fixed effect
you love. The American Statistician, 64:325–334.
Holford, T. 1980. The analysis of rates and of survivorship using log-linear models. Biometrics,
36:299–305.
Holling, C. 1959a. Some characteristics of simple types of predation and parasitism. The
Canadian Entomologist, 91:385–398.
Holling, C. 1959b. The components of predation as revealed by a study of small-mammal
predation of the European pine sawfly. The Canadian Entomologist, 91:293–320.
Holzmann, H., A. Munk, M. Suster, and W. Zucchini. 2006. Hidden Markov models for circular
and linear-circular time series. Environmental and Ecological Statistics, 13(3):325–347.
Hooker, S., S. Heaslip, J. Matthiopoulos, O. Cox, and I. Boyd. 2008. Data sampling options
for animal-borne video cameras: Considerations based on deployments with Antarctic
fur seals. Marine Technology Society Journal, 42:65–75.
Hooten, M., J. Anderson, and L. Waller. 2010a. Assessing North American influenza dynamics
with a statistical SIRS model. Spatial and Spatio-Temporal Epidemiology, 1:177–185.
Hooten, M., F. Buderman, B. Brost, E. Hanks, and J. Ivan. 2016. Hierarchical animal movement
models for population-level inference. Environmetrics, 27:322–333.
Hooten, M., M. Garlick, and J. Powell. 2013a. Computationally efficient statistical differen-
tial equation modeling using homogenization. Journal of Agricultural, Biological and
Environmental Statistics, 18:405–428.
Hooten, M., E. Hanks, D. Johnson, and M. Alldredge. 2013b. Reconciling resource utilization
and resource selection functions. Journal of Animal Ecology, 82:1146–1154.
Hooten, M., E. Hanks, D. Johnson, and M. Alldredge. 2014. Temporal variation and scale in
movement-based resource selection functions. Statistical Methodology, 17:82–98.
Hooten, M. and N. Hobbs. 2015. A guide to Bayesian model selection for ecologists.
Ecological Monographs, 85:3–28.
Hooten, M. and D. Johnson. 2016. Basis function models for animal movement. Journal of the
American Statistical Association, In Press.
Hooten, M., D. Johnson, E. Hanks, and J. Lowry. 2010b. Agent-based inference for ani-
mal movement and selection. Journal of Agricultural, Biological and Environmental
Statistics, 15:523–538.
Hooten, M., D. Larsen, and C. Wikle. 2003. Predicting the spatial distribution of ground flora
on large domains using a hierarchical Bayesian model. Landscape Ecology, 18:487–502.
Hooten, M. and C. Wikle. 2008. A hierarchical Bayesian non-linear spatio-temporal model
for the spread of invasive species with application to the Eurasian collared-dove.
Environmental and Ecological Statistics, 15:59–70.
Hooten, M. and C. Wikle. 2010. Statistical agent-based models for discrete spatio-temporal
systems. Journal of the American Statistical Association, 105:236–248.
Horne, J., E. Garton, S. Krone, and J. Lewis. 2007. Analyzing animal movements using
Brownian bridges. Ecology, 88:2354–2363.
Horning, M. and R. Hill. 2005. Designing an archival satellite transmitter for life-long deploy-
ments on oceanic vertebrates: The life history transmitter. IEEE Journal of Oceanic
Engineering, 30:807–817.
Hughes, J. and M. Haran. 2013. Dimension reduction and alleviation of confounding for spa-
tial generalized linear mixed models. Journal of the Royal Statistical Society, Series B,
75:139–159.
Hutchinson, J. and P. Waser. 2007. Use, misuse and extensions of “ideal gas” models of animal
encounter. Biological Reviews, 82:335–359.
Illian, J., S. Martino, S. Sørbye, J. Gallego-Fernández, M. Zunzunegui, M. Esquivias, and J.
Travis. 2013. Fitting complex ecological point process models with integrated nested
Laplace approximation. Methods in Ecology and Evolution, 4:305–315.
Illian, J., A. Penttinen, H. Stoyan, and D. Stoyan. 2008. Statistical Analysis and Modelling of
Spatial Point Patterns. Wiley-Interscience, West Sussex, England.
Illian, J., S. Sørbye, H. Rue, and D. Hendrichsen. 2012. Using INLA to fit a complex point
process model with temporally varying effects—A case study. Journal of Environmental
Statistics, 3:1–25.
Iranpour, R., P. Chacon, and M. Kac. 1988. Basic Stochastic Processes: The Mark Kac
Lectures. Macmillan, New York.
Isojunno, S. and P. Miller. 2015. Sperm whale response to tag boat presence: Biologically
informed hidden state models quantify lost feeding opportunities. Ecosphere, 6(1):1–46.
Jetz, W., C. Carbone, J. Fulford, and J. Brown. 2004. The scaling of animal space use. Science,
306:266–268.
Ji, W., P. White, and M. Clout. 2005. Contact rates between possums revealed by proximity
data loggers. Journal of Applied Ecology, 42:595–604.
Johnson, A., J. Wiens, B. Milne, and T. Crist. 1992. Animal movements and population
dynamics in heterogeneous landscapes. Landscape Ecology, 7:63–75.
Johnson, D. 1980. The comparison of usage and availability measurements for evaluating
resource preference. Ecology, 61:65–71.
Johnson, D., M. Hooten, and C. Kuhn. 2013. Estimating animal resource selection from
telemetry data using point process models. Journal of Animal Ecology, 82:1155–1164.
Johnson, D., J. London, and C. Kuhn. 2011. Bayesian inference for animal space use and other
movement metrics. Journal of Agricultural, Biological and Environmental Statistics,
16:357–370.
Johnson, D., J. London, M. Lea, and J. Durban. 2008a. Continuous-time correlated random
walk model for animal telemetry data. Ecology, 89:1208–1215.
Johnson, D., D. Thomas, J. Ver Hoef, and A. Christ. 2008b. A general framework for the
analysis of animal resource selection from telemetry data. Biometrics, 64:968–976.
Jonsen, I. 2016. Joint estimation over multiple individuals improves behavioural state inference
from animal movement data. Scientific Reports, 6:20625.
Jonsen, I., J. Flemming, and R. Myers. 2005. Robust state-space modeling of animal movement
data. Ecology, 86:2874–2880.
Jonsen, I., R. Myers, and J. Flemming. 2003. Meta-analysis of animal movement using state-
space models. Ecology, 84:3055–3063.
Jonsen, I., R. Myers, and M. James. 2006. Robust hierarchical state-space models reveal diel
variation in travel rates of migrating leatherback turtles. Journal of Animal Ecology,
75:1046–1057.
Jonsen, I., R. Myers, and M. James. 2007. Identifying leatherback turtle foraging behaviour
from satellite telemetry using a switching state-space model. Marine Ecology Progress
Series, 337:255–264.
Jønsson, K., A. Tøttrup, M. Borregaard, S. Keith, C. Rahbek, and K. Thorup. 2016. Tracking
animal dispersal: From individual movement to community assembly and global range
dynamics. Trends in Ecology & Evolution, 31(3):204–214.
Kalman, R. 1960. A new approach to linear filtering and prediction problems. Transactions of
the ASME—Journal of Basic Engineering, 82:35–45.
Karatzas, I. and S. Shreve. 2012. Brownian Motion and Stochastic Calculus, volume 113.
Springer Science & Business Media, New York, New York, USA.
Katzfuss, M. 2016. A multi-resolution approximation for massive spatial datasets. Journal of
the American Statistical Association, In Press.
Kays, R., M. Crofoot, W. Jetz, and M. Wikelski. 2015. Terrestrial animal tracking as an eye on
life and planet. Science, 348(6240):aaa2478.
Keating, K. A. and S. Cherry. 2009. Modeling utilization distributions in space and time.
Ecology, 90:1971–1980.
Kendall, D. 1974. Pole-seeking Brownian motion and bird navigation. Journal of the Royal
Statistical Society, Series B, 36:365–417.
Kenward, R. 2000. A Manual for Wildlife Radio Tagging. Academic Press, San Diego,
California, USA.
Kery, M. and J. Royle. 2008. Hierarchical Bayes estimation of species richness and occupancy
in spatially replicated surveys. Journal of Applied Ecology, 45:589–598.
Kot, M., M. Lewis, and P. van den Driessche. 1996. Dispersal data and the spread of invading
organisms. Ecology, 77:2027–2042.
Langrock, R., J. Hopcraft, P. Blackwell, V. Goodall, R. King, M. Niu, T. Patterson, M. Pedersen,
A. Skarin, and R. Schick. 2014. Modelling group dynamic animal movement. Methods
in Ecology and Evolution, 5:190–199.
Langrock, R., R. King, J. Matthiopoulos, L. Thomas, D. Fortin, and J. Morales. 2012. Flexible
and practical modeling of animal telemetry data: Hidden Markov models and extensions.
Ecology, 93:2336–2342.
Laplanche, C., T. Marques, and L. Thomas. 2015. Tracking marine mammals in 3D using
electronic tag data. Methods in Ecology and Evolution, 6:987–996.
Laver, P. and M. Kelly. 2008. A critical review of home range studies. The Journal of Wildlife
Management, 72:290–298.
Le, N. D. and J. V. Zidek. 2006. Statistical Analysis of Environmental Space-Time Processes.
Springer Science & Business Media, New York, New York.
LeBoeuf, B., D. Crocker, D. Costa, S. Blackwell, P. Webb, and D. Houser. 2000. Foraging
ecology of northern elephant seals. Ecological Monographs, 70:353–382.
Lee, H., D. Higdon, C. Calder, and C. Holloman. 2005. Efficient models for correlated data
via convolutions of intrinsic processes. Statistical Modelling, 5:53–74.
Lele, S. and J. Keim. 2006. Weighted distributions and estimation of resource selection
probability functions. Ecology, 87:3021–3028.
LeSage, J. and R. Pace. 2009. Introduction to Spatial Econometrics. Chapman & Hall/CRC,
Boca Raton, Florida, USA.
Levey, D., B. Bolker, J. Tewksbury, S. Sargent, and N. Haddad. 2005. Effects of landscape
corridors on seed dispersal by birds. Science, 309:146–148.
Lima, S. and P. Zollner. 1996. Towards a behavioral ecology of ecological landscapes. Trends
in Ecology and Evolution, 11:131–135.
Lindgren, F. and H. Rue. 2015. Bayesian spatial modelling with R-INLA. Journal of Statistical
Software, 63(19):1–25.
Lindgren, F., H. Rue, and J. Lindstrom. 2011. An explicit link between Gaussian fields and
Gaussian Markov random fields: The SPDE approach (with discussion). Journal of the
Royal Statistical Society, Series B, 73:423–498.
Liu, Y., B. Battaile, J. Zidek, and A. Trites. 2014. Bayesian melding of the dead-reckoned path
and GPS measurements for an accurate and high-resolution path of marine mammals.
arXiv preprint: 1411.6683.
Liu, Y., B. Battaile, J. Zidek, and A. Trites. 2015. Bias correction and uncertainty charac-
terization of dead-reckoned paths of marine mammals. Animal Biotelemetry, 3(51):
1–11.
Lloyd, M. 1967. Mean crowding. The Journal of Animal Ecology, 36:1–30.
Long, R., J. Kie, T. Bowyer, and M. Hurley. 2009. Resource selection and movements by
female mule deer Odocoileus hemionus: Effects of reproductive stage. Wildlife Biology,
15:288–298.
Lundberg, J. and F. Moberg. 2003. Mobile link organisms and ecosystem func-
tioning: Implications for ecosystem resilience and management. Ecosystems, 6:
87–98.
Lunn, D., J. Barrett, M. Sweeting, and S. Thompson. 2013. Fully Bayesian hierarchical mod-
elling in two stages, with application to meta-analysis. Journal of the Royal Statistical
Society: Series C (Applied Statistics), 62:551–572.
Lunn, D., A. Thomas, N. Best, and D. Spiegelhalter. 2000. WinBUGS—A Bayesian modelling
framework: Concepts, structure, and extensibility. Statistics and Computing, 10(4):325–
337.
Lyons, A., W. Turner, and W. Getz. 2013. Home range plus: A space-time characterization of
movement over real landscapes. Movement Ecology, 1:2.
Manly, B., L. McDonald, D. Thomas, T. McDonald, and W. Erickson. 2007. Resource Selec-
tion by Animals: Statistical Design and Analysis for Field Studies. Springer Science &
Business Media, Dordrecht, The Netherlands.
Moorcroft, P. and A. Barnett. 2008. Mechanistic home range models and resource selection
analysis: A reconciliation and unification. Ecology, 89:1112–1119.
Moorcroft, P. and M. Lewis. 2013. Mechanistic Home Range Analysis. Princeton University
Press, Princeton, New Jersey, USA.
Moorcroft, P., M. Lewis, and R. Crabtree. 1999. Home range analysis using a mechanistic
home range model. Ecology, 80:1656–1665.
Moorcroft, P., M. Lewis, and R. Crabtree. 2006. Mechanistic home range models capture spatial
patterns and dynamics of coyote territories in Yellowstone. Proceedings of the Royal
Society B, 273:1651–1659.
Morales, J. 2002. Behavior at habitat boundaries can produce leptokurtic movement distribu-
tions. The American Naturalist, 160:531–538.
Morales, J. and S. Ellner. 2002. Scaling up animal movements in heterogeneous landscapes:
The importance of behavior. Ecology, 83:2240–2247.
Morales, J., D. Fortin, J. Frair, and E. Merrill. 2005. Adaptive models for large herbivore
movements in heterogeneous landscapes. Landscape Ecology, 20:301–316.
Morales, J., J. Frair, E. Merrill, H. Beyer, and D. Haydon. 2016. Patch use of reintroduced elk
in the Canadian Rockies: Memory effects and home range development. Unpublished
Manuscript.
Morales, J., D. Haydon, J. Frair, K. Holsinger, and J. Fryxell. 2004. Extracting more out of
relocation data: Building movement models as mixtures of random walks. Ecology,
85:2436–2445.
Morales, J., P. Moorcroft, J. Matthiopoulos, J. Frair, J. Kie, R. Powell, E. Merrill, and D. Hay-
don. 2010. Building the bridge between animal movement and population dynamics.
Philosophical Transactions of the Royal Society of London B: Biological Sciences,
365:2289–2301.
Mueller, T. and W. Fagan. 2008. Search and navigation in dynamic environments—From
individual behaviors to population distributions. Oikos, 117:654–664.
Murray, D. 2006. On improving telemetry-based survival estimation. Journal of Wildlife
Management, 70:1530–1543.
Nathan, R., W. Getz, E. Revilla, M. Holyoak, R. Kadmon, D. Saltz, and P. Smouse. 2008. A
movement ecology paradigm for unifying organismal movement research. Proceedings
of the National Academy of Sciences, 105:19052–19059.
Nielson, R., B. Manly, L. McDonald, H. Sawyer, and T. McDonald. 2009. Estimating habitat
selection when GPS fix success is less than 100%. Ecology, 90:2956–2962.
Nielson, R. M. and H. Sawyer. 2013. Estimating resource selection with count data. Ecology
and Evolution, 3:2233–2240.
Northrup, J., M. Hooten, C. Anderson, and G. Wittemyer. 2013. Practical guidance on char-
acterizing availability in resource selection functions under a use-availability design.
Ecology, 94:1456–1464.
Nussbaum, M. 1978. Aristotle’s De Motu Animalium: Text with Translation, Commentary, and
Interpretive Essays. Princeton University Press, Princeton, New Jersey, USA.
Okubo, A., D. Grünbaum, and L. Edelstein-Keshet. 2001. The dynamics of animal group-
ing. In A. Okubo and S.A. Levin, editors, Diffusion and Ecological Problems: Modern
Perspectives, pages 197–237. Springer, New York, New York, USA.
Otis, D. and G. White. 1999. Autocorrelation of location estimates and the analysis of
radiotracking data. Journal of Wildlife Management, 63:1039–1044.
Ovaskainen, O. 2004. Habitat-specific movement parameters estimated using mark-recapture
data and a diffusion model. Ecology, 85:242–257.
Powell, R. 2000. Animal home ranges and territories and home range estimators. In Boitani, L.
and T. Fuller, editors, Research Techniques in Animal Ecology: Controversies and
Consequences, pages 65–110. Columbia University Press, New York, New York, USA.
Powell, R. and M. Mitchell. 2012. What is a home range? Journal of Mammalogy, 93:948–958.
Pozdnyakov, V., T. Meyer, Y.-B. Wang, and J. Yan. 2014. On modeling animal movements
using Brownian motion with measurement error. Ecology, 95:247–253.
Prange, S., T. Jordan, C. Hunter, and S. Gehrt. 2006. New radiocollars for the detection of
proximity among individuals. Wildlife Society Bulletin, 34:1333–1344.
Preisler, H., A. Ager, B. Johnson, and J. Kie. 2004. Modeling animal movements using
stochastic differential equations. Environmetrics, 15:643–657.
Pyke, G. 2015. Understanding movements of organisms: It's time to abandon the Lévy foraging
hypothesis. Methods in Ecology and Evolution, 6:1–16.
R Core Team. 2013. R: A Language and Environment for Statistical Computing. R Foundation
for Statistical Computing, Vienna, Austria.
Rahman, M., J. Sakamoto, and T. Fukui. 2003. Conditional versus unconditional logistic
regression in the medical literature. Journal of Clinical Epidemiology, 56:101–102.
Ramos-Fernández, G. and J. Morales. 2014. Unraveling fission-fusion dynamics: How sub-
group properties and dyadic interactions influence individual decisions. Behavioral
Ecology and Sociobiology, 68:1225–1235.
Ratikainen, I., J. Gill, T. Gunnarsson, W. Sutherland, and H. Kokko. 2008. When density
dependence is not instantaneous: Theoretical developments and management implica-
tions. Ecology Letters, 11:184–198.
Rhodes, J., C. McAlpine, D. Lunney, and H. Possingham. 2005. A spatially explicit habitat
selection model incorporating home range behavior. Ecology, 86:1199–1205.
Ricketts, T. 2001. The matrix matters: Effective isolation in fragmented landscapes. The
American Naturalist, 158:87–99.
Ripley, B. 1976. The second-order analysis of stationary point processes. Journal of Applied
Probability, 13:587–602.
Risken, H. 1989. The Fokker–Planck Equation: Methods of Solution and Applications.
Springer, New York, New York, USA.
Rivest, L.-P., T. Duchesne, A. Nicosia, and D. Fortin. 2015. A general angular regression model
for the analysis of data on animal movement in ecology. Journal of the Royal Statistical
Society: Series C (Applied Statistics), 65:445–463.
Ronce, O. 2007. How does it feel to be like a rolling stone? Ten questions about dispersal
evolution. Annual Review of Ecology, Evolution, and Systematics, 38:231–253.
Rooney, S., A. Wolfe, and T. Hayden. 1998. Autocorrelated data in telemetry studies: Time to
independence and the problem of behavioural effects. Mammal Review, 28:89–98.
Royle, J., R. Chandler, R. Sollmann, and B. Gardner. 2013. Spatial Capture-Recapture.
Academic Press, Amsterdam, The Netherlands.
Royle, J. and R. Dorazio. 2008. Hierarchical Modeling and Inference in Ecology: The Analysis
of Data from Populations, Metapopulations and Communities. Academic Press, London,
United Kingdom.
Rubin, D. 1987. Multiple Imputation for Nonresponse in Surveys. Wiley, New York, New York,
USA.
Rubin, D. 1996. Multiple imputation after 18+ years. Journal of the American Statistical
Association, 91:473–489.
Rue, H. and L. Held. 2005. Gaussian Markov Random Fields: Theory and Applications.
Chapman & Hall/CRC, Boca Raton, Florida, USA.
Rue, H., S. Martino, and N. Chopin. 2009. Approximate Bayesian inference for latent Gaus-
sian models by using integrated nested Laplace approximations. Journal of the Royal
Statistical Society: Series B (Statistical Methodology), 71(2):319–392.
White, G. and R. Garrott. 1990. Analysis of Wildlife Radio-Tracking Data. Academic Press,
San Diego, California, USA.
Wiens, J. 1997. Metapopulation dynamics and landscape ecology. In Hanski, I. and M. Gilpin,
editors, Metapopulation Biology: Ecology, Genetics, and Evolution, pages 43–62.
Academic Press, San Diego, California, USA.
Wikle, C. 2002. Spatial modeling of count data: A case study in modelling breeding bird survey
data on large spatial domains. In Lawson, A. and D. Denison, editors, Spatial Cluster
Modeling, pages 199–209. Chapman & Hall/CRC, Boca Raton, Florida, USA.
Wikle, C. 2003. Hierarchical Bayesian models for predicting the spread of ecological pro-
cesses. Ecology, 84:1382–1394.
Wikle, C. 2010a. Low-rank representations for spatial processes. In Gelfand, A., P. Diggle,
M. Fuentes, and P. Guttorp, editors, Handbook of Spatial Statistics, pages 107–118.
Chapman & Hall/CRC, Boca Raton, Florida, USA.
Wikle, C. 2010b. Hierarchical modeling with spatial data. In Gelfand, A., P. Diggle, M.
Fuentes, and P. Guttorp, editors, Handbook of Spatial Statistics, pages 89–106. Chapman
& Hall/CRC, Boca Raton, Florida, USA.
Wikle, C. and M. Hooten. 2010. A general science-based framework for nonlinear spatio-
temporal dynamical models. Test, 19:417–451.
Williams, T., L. Wolfe, T. Davis, T. Kendall, B. Richter, Y. Wang, C. Bryce, G. Elkaim, and C.
Wilmers. 2014. Instantaneous energetics of puma kills reveal advantage of felid sneak
attacks. Science, 346:81–85.
Wilson, R., M. Hooten, B. Strobel, and J. Shivik. 2010. Accounting for individuals, uncertainty,
and multiscale clustering in core area estimation. The Journal of Wildlife Management,
74:1343–1352.
Wilson, R., N. Liebsch, I. Davies, F. Quintana, H. Weimerskirch, S. Storch, K. Lucke et al.
2007. All at sea with animal tracks; Methodological and analytical solutions for the
resolution of movement. Deep Sea Research Part II: Topical Studies in Oceanography,
54:193–210.
Wilson, R., E. Shepard, and N. Liebsch. 2008. Prying into the intimate details of animal lives:
Use of a daily diary on animals. Endangered Species Research, 4:123–137.
Winship, A., S. Jorgensen, S. Shaffer, I. Jonsen, P. Robinson, D. Costa, and B. Block. 2012.
State-space framework for estimating measurement error from double-tagging telemetry
experiments. Methods in Ecology and Evolution, 3:291–302.
Wood, S. 2011. Fast stable restricted maximum likelihood and marginal likelihood estimation
of semiparametric generalized linear models. Journal of the Royal Statistical Society,
Series B, 73:3–36.
Wood, S. N. 2003. Thin plate regression splines. Journal of the Royal Statistical Society: Series
B (Statistical Methodology), 65:95–114.
Worton, B. 1987. A review of models of home range for animal movement. Ecological
Modelling, 38:277–298.
Worton, B. 1989. Kernel methods for estimating the utilization distribution in home-range
studies. Ecology, 70:164–168.
Zucchini, W., I. L. MacDonald, and R. Langrock. 2016. Hidden Markov Models for Time
Series: An Introduction Using R, Second Edition. CRC Press, Boca Raton, Florida, USA.
Author Index
A

Best, B.D., 6, 17
Best, N., 38, 165, 222
Aarts, G., 3, 110, 111, 141, 145, 185 Betancourt, M., 38, 222
Abellana, R., 133 Beyer, H., 3, 9, 134, 135, 175, 176, 177, 178, 179,
Ager, A., 203 180, 257
Albareda, D., 182 Bidder, O., 10
Albert, J., 245 Biuw, M., 9
Albon, S., 9 Blackwell, P., 5, 135, 169, 183, 186, 187, 199,
Aldredge, M., 100, 118, 119, 121, 135, 153, 230, 200, 212, 237
259, 260 Blanchard, A., 263, 264, 265
Alldredge, M., 207, 240, 242, 243, 253, 254, Block, B., 14
255, 256 Blount, J., 9
Allen, J., 5, 180 Boersma, P., 10
Altman, R., 186 Bohrer, G., 128
Altwegg, R., 207, 243 Boitani, L., 1, 17
Anderson, C., 112 Bolker, B., 3, 6
Anderson, D., 9, 176, 177, 178 Bolker, B.M., 54
Anderson, J., 207, 243 Bonduriansky, R., 122
Andow, D., 4 Boness, D., 182, 183
Andrews, R., 10, 13 Borger, L., 6, 101
Arganda, S., 14 Borregaard, M., 1
Arjas, E., 182 Boustany, A., 6, 17
Arnould, J., 13, 128 Boveng, P., 129, 145, 185, 187, 263, 264, 265, 266
Arthur, S., 134 Bowen, W., 182, 183
Auger-Méthé, M., 8 Bowen, W.D., 175
Austin, D., 182, 183 Bowler, D., 8
Avgar, T., 8, 9, 134, 178, 257 Bowlin, M., 14
Bowyer, T., 3
B

Boyce, M., 5, 9, 122, 123, 134, 135, 176, 177, 178, 180, 257
Baddeley, A., 26, 28, 54 Boyce, M.S., 1, 17
Baguette, M., 5 Boyd, I., 11
Baker, J., 9, 178 Boyd, J., 129
Bakkenes, M., 1 Bracis, C., 6, 175
Balkenhol, N., 103 Bradshaw, C., 9
Banerjee, S., 31, 51, 53, 54 Brasseur, S., 185
Barboza, P., 9 Bravington, M., 183, 186, 187, 237
Barnett, A., 131, 135, 190 Bravington, M.V., 175
Barrett, J., 267 Breed, G., 8, 13
Barry, R., 215 Breed, G.A., 175
Basson, M., 183, 186, 187, 237 Bridge, E., 14
Bastille-Rousseau, G., 132, 134, 135, 136 Brightsmith, D., 129
Battaile, B., 197 Brillinger, D., 132, 202, 203, 205, 207, 238, 246
Baumgartner, M., 129 Brockwell, P., 98, 187
Bearhop, S., 9 Broms, K., 49, 98, 187, 207, 243
Beaumont, L., 1 Brooks, R., 122
Benhamou, S., 8, 163 Brost, B., 13, 49, 98, 129, 131, 132, 134, 136, 137,
Bennetts, R., 145 186, 187, 206, 222, 230, 256, 267
Benton, T., 8 Brown, G., 9, 178
Beringer, J., 11 Brown, J., 7, 10, 11
Berliner, L., 16, 89 Brown, R., 14
Berman, M., 26, 81 Brubaker, M., 38, 222
Besag, J., 43 Bryce, C., 10
Buderman, F., 49, 98, 129, 130, 152, 186, 187, Crooks, K., 175, 176
211, 214, 220, 221, 222, 231, 267 Cross, P., 101, 145
Burt, W., 101
Burton, H., 9
Butler, P., 10, 13

D
Daley, M., 4, 5
Dall, S., 5, 8
C

Dalziel, B., 6, 8, 101
Caelli, T., 187, 237 Danchin, E., 8
Cagnacci, F., 1, 17 Datta, A., 54
Calabrese, J., 123, 145, 153, 229, 231 Davidson, S., 128
Calder, C., 215 Davies, I., 10
Cameron, A., 1 Davis, R., 98, 187
Cameron, M., 129, 185, 187, 263, 264, 265, 266 Davis, R.A., 98
Carbone, C., 10, 11 Davis, T., 10
Carlin, B.P., 31, 54 de Polavieja, G., 14
Carlin, J.B., 54 De Vries, G., 7, 188
Carlson, T., 14 Deardon, R., 8, 178
Carpenter, B., 38, 222 Delgado, M., 4, 6, 7, 8, 175, 188
Carter, J., 14 Deneubourg, J.-L., 188
Caswell, H., 9 Deng, Z., 14
Catchpole, E., 9 deSolla, S., 122
Chacon, P., 201 Diehl, R., 14
Chandler, R., 3 Diekmann, O., 9
Cherry, S., 145 Diggle, P., 24, 30, 54, 145, 219
Chib, S., 245 Diggle, P.J., 133
Chilson, P., 14 Ditmer, M., 103, 135
Chopin, N., 54, 188 Dorazio, R., 54
Christ, A., 132, 134, 135, 138, 141, 211, 230, 256 Dorazio, R.M., 145
Clark, J., 3, 9, 54, 188 Douglas, D., 128
Clark, J.S., 6, 17 Duchesne, T., 134, 135, 170, 257
Clobert, J., 8 Dunn, D., 187
Clout, M., 14, 182 Dunn, J., 12, 121, 135, 199, 200, 202, 212, 237
Clutton-Brock, T., 9
Codling, E., 163
Colchero, F., 6, 17
Collingham, Y., 1
Conde, D.A., 6, 17
Durban, J., 7, 12, 122, 129, 175, 187, 188, 200, 212, 215, 216, 217, 225, 226, 228, 237, 239, 241, 259, 260
Durrett, R., 4, 238
Conquest, L., 207, 243

E
Cooke, S., 10, 13
Cornell, S., 3, 181, 182 Ebberts, B., 14
Costa, D., 13, 14, 128, 183 Eckert, K., 187
Cote, J., 8 Eckert, S., 187
Coulson, T., 9 Edelstein-Keshet, L., 7
Couzin, I., 7 Eftimie, R., 7, 188
Couzin, I.D., 188 Eggert, J., 11
Cowlishaw, G., 10 Elkaim, G., 10
Cox, D., 81 Ellner, S., 4, 9
Cox, O., 11 Eppard, M., 14
Crabtree, R., 6, 12, 135 Erasmus, B., 1
Craighead, F., 1 Erickson, W., 12, 17, 145
Craighead, J., 1 Esquivias, M., 26, 28
Crawley, M., 9
Cressie, N., 19, 24, 28, 31, 34, 54, 86, 89, 95, 96, 98, 187, 191, 218

F
Crist, T., 5 Fagan, W., 5, 8, 123, 145, 153, 229, 231
Crocker, D., 183 Fahrig, L., 5
Crofoot, M., 1, 2, 7, 13 Farine, D., 7
Crone, E., 5 Fedak, M., 9
Cronin, J., 5 Fedak, M.A., 175
Moorcroft, P., 2, 5, 6, 9, 12, 131, 135, 190 Peterson, E., 54, 217, 218
Moore, J., 187 Pettis, H., 3, 9
Morales, J., 2, 3, 4, 5, 6, 7, 8, 9, 153, 158, 162, Plank, M., 163
163, 164, 165, 166, 167, 168, 169, 170, Ploskey, G., 14
171, 172, 173, 174, 175, 176, 178, 179, Plummer, M., 38, 222
180, 183, 185, 186, 187, 188, 235, 236, Possingham, H., 134
237, 248, 263, 266 Potts, J., 132, 134, 135, 136, 188, 257
Morgan, B., 9 Potts, J.R., 188
Moss, S., 185 Powell, J., 4, 189, 191, 238
Mosser, A., 7, 11 Powell, R., 2, 5, 10, 101, 145
Moyeed, R., 30 Powell, R.A., 1, 17
Mueller, T., 5, 123, 145, 153, 229, 231 Pozdnyakov, V., 196, 197
Munk, A., 187, 237 Prange, S., 11
Murray, D., 3, 132, 134, 135, 136, 175 Preisler, H., 203
Myers, R., 147, 153, 158, 161, 165, 171, 175, 183, Prieto, R., 129
186, 187, 212, 236, 266 Pyke, G., 12
Myers, R.A., 175
Q
N
Quintana, F., 10, 182
Nathan, R., 1, 5
Newman, C., 182
Nicosia, A., 170 R
Nielson, R., 118, 145
Nielson, R.M., 145 Rahbek, C., 1
Niu, M., 186, 187, 237 Rahman, M., 135
Norcross, B., 263, 264, 265 Ramos-Fernández, G., 8
Norris, D., 9 Rao, C., 27, 134
Northrup, J., 112 Rathouz, P., 132, 134, 136, 256
Nussbaum, M., 1 Ratikainena, I., 9
Ravishanker, N., 98
Rebstock, G., 10
O Rees, M., 9
Rees, W., 10
Oakes, D., 81 Reich, B., 47, 119
Okubo, A., 4, 7 Reid, M., 8
Olson, D., 11 Rekola, H., 182
Olson, K., 123, 145, 153, 229, 231 Revilla, E., 1, 5
Olsson, O., 8 Rexstad, E., 165, 267
Otis, D., 122 Rhodes, J., 134
Ovaskainen, O., 3, 5, 6, 8, 11, 14, 147, 153, 181, Rhyan, J., 169
182, 183, 188 Ribeiro, P., 54, 219
Richardson, D., 1
P Richter, B., 10
Ricketts, T., 5
Pace, R., 44 Riddell, A., 38, 222
Paciorek, C., 47, 48, 217, 218 Ripley, B., 22, 24
Packer, C., 7, 11 Risken, H., 190
Parker, K., 9 Rittenhouse, C., 118
Pasteels, J., 188 Rivest, L.-P., 170
Patil, G., 27, 134 Robinson, P., 13, 14, 128
Patterson, H., 35 Rolland, R., 3, 9
Patterson, T., 6, 14, 147, 165, 183, 186, 187, Ronce, O., 8
237, 267 Rooney, S., 122
Patterson, T.A., 175 Rosatte, R., 3, 7
Pedersen, M., 186, 187 Rowcliffe, J., 10
Pérez-Escudero, A., 14 Royle, J., 3, 54, 123
Pemberton, J., 9 Rubak, E.,
Penteriani, V., 5, 7, 8, 188 Rubin, D., 240
Penttinen, A., 25, 54, 141, 145 Rubin, D.B., 54
Perry, G., 1 Rue, H., 38, 54, 143, 188, 191, 222, 264
Russell, D., 6, 129, 175, 183, 184, 185, 186, Stephens, D., 8
187, 263 Sterling, J., 158, 207, 240, 246, 248, 253
Russell, J.C., 7, 188, 238 Stern, H.S., 54
Russell, R., 238 Stirzaker, D., 238
Rutz, C., 10 Stoffer, D., 66, 71, 96, 98, 187
Ruxton, G.D., 188 Storch, S., 10
Ryan, S., 101, 145 Stoyan, D., 25, 54, 141, 145
Stoyan, H., 25, 54, 141, 145
Strandburg-Peshkin, A., 7
S Strobel, B., 20, 103, 104, 105, 106
Su, T., 145
Sakamoto, J., 135 Suster, M., 187, 237
Saltz, D., 1, 5 Sutherland, W., 9
Sand, H., 11 Sweeting, M., 267
Sang, H., 51, 53 Swihart, R., 121, 122
Sargent, S., 6
Sartwell, J., 11
Sawyer, H., 145 T
Scantlebury, D., 10
Schabenberger, O., 24, 36, 54 Tawn, J., 30
Schaefer, J., 132, 134, 135, 136 Taylor, J.R., 238
Scharf, H., 7, 12, 49, 98, 187, 188 Tewksbury, J., 6
Schervish, M., 217, 218 Thomas, A., 38, 165, 222
Schick, R., 3, 9, 186, 187 Thomas, C., 1, 4
Schick, R.S., 6, 17 Thomas, D., 12, 17, 132, 134, 135, 138, 141, 145,
Schlägel, U., 153 211, 230, 256
Schliep, E., 47, 48, 119, 120, 237 Thomas, L., 6, 8, 14, 147, 153, 165, 168, 169, 170,
Schoenberg, F., 132 171, 172, 173, 174, 175, 176, 183, 186,
Schoener, T., 121 187, 237, 266
Schtickzelle, N., 5 Thompson, D., 129, 185
Schultz, C., 5 Thompson, P., 185
Scott, D., 100 Thompson, R., 35
Shaffer, S., 14 Thompson, S., 267
Shane, R., 122 Thorup, K., 1, 14
Shenk, T., 129, 130, 152, 187, 211, 214, 220, 221, Tipton, J., 49, 98, 187
222, 231 Tøttrup, A., 1
Shepard, E., 10, 182 Tracey, J., 175, 176
Shepherd, L., 110, 111, 141, 145 Trakhtenbrot, A., 1
Travis, J., 26, 28
Sheriff, S., 118
Treanor, J., 169
Shigesada, N., 4
Trites, A., 197
Shivik, J., 20, 103, 104, 105, 106
Shreve, S., 238
Turchin, P., 1, 3, 4, 11, 17, 134, 162, 189, 190, 191, 238
Shumway, R., 66, 71, 96, 98, 187
Turner, M., 9, 134, 176, 177, 178
Signer, J., 103
Turner, R., 26, 28, 54
Silva, M., 129
Turner, T., 26, 81
Silverman, B., 24, 100, 145
Turner, W., 101
Simmons, S.E., 13, 128
Sinclair, A., 7, 11
Skalski, G., 4, 5, 6

U
Skarin, A., 186, 187
Skellam, J., 4 Urge, P., 10
Slade, N., 121, 122
Small, R., 13, 129, 131, 132, 134, 136, 137, 206,
V
222, 230, 256
Smith, A., 16, 34, 36 Valone, T., 8
Smith, D., 9, 134, 135, 176, 177, 178, 257 van Buiten, R., 187
Smouse, P., 1, 5, 6, 9 van den Driessche, P., 4
Sollmann, R., 3 Venables, W., 24
Sørbye, S., 26, 28, 143 ver Hoef, J., 38, 54, 132, 134, 135, 138, 141, 145,
Spiegelhalter, D., 38, 165, 222 158, 182, 185, 211, 215, 217, 218, 230,
Stamps, J., 8 235, 236, 237, 256
298 Author Index
A

ACFs, see Autocorrelation functions
Additive modeling, 74
Advection, see Drift
Akaike Information Criterion (AIC), 70, 166
Algebra, 201
Animal movement, 1, 212
  encounter rates and patterns, 10–12
  energy balance, 10
  food provision, 10
  group movement and dynamics, 7–8
  home ranges, territories, and groups, 6–7
  individual condition, 9
  informed dispersal and prospecting, 8
  mathematics of, 17
  memory, 8–9
  notation, 14–15
  population dynamics, 3
  relationships among data types, analytical methods, 2
  spatial redistribution, 4–6
Animal telemetry, 1
  data, 12–14
Archival pop-up tags, 14
Archival tags, 13
Argos tags, 13
Argos telemetry data, 128–129
  for harbor seal, 137–138
ARIMA model, see Autoregressive integrated moving average model
Attraction, 150
Autocorrelation, 57, 121–123
Autocorrelation functions (ACFs), 57
Autoregressive integrated moving average model (ARIMA model), 68, 73, 212
Autoregressive models, 60; see also Vector autoregressive models
  ACF and PACF, 65
  ACF for simulated time series, 61
  AR(1) model, 60, 61–62
  Gaussian assumption, 60–61
  higher-order AR time series model, 64
  simulated time series with heterogeneous trend, 64
  univariate autoregressive temporal model, 63
Auxiliary data, 182
  estimated bivariate densities of harbor seal step length, 185
  estimated proportion, 186
  predicted locations and movement behavior states, 184

B

Backshift notation, 66–68
Basis functions, 76
Basis vectors, 76
Bayesian
  approach, 16, 96–98
  AR(p) model, 70
  computing software, 222
  contexts, 256
  geostatistics, 36–39
  Kriging based on integrated likelihood model, 51
  melding approach, 197
  methods, 17
  models, 15, 36, 240, 264
  multiple imputation, 245, 248
Bayes’ law, 37
Bearing, 169
Berman–Turner device, 26–27
Berman–Turner quadrature method, 142
Bernoulli approach, 118
Best linear unbiased predictor (BLUP), 34
Bias, see Drift
Big data, 54
Biologging, 13
Biotelemetry technology, 13–14
Birth-death MCMC, 248
Bivariate Gaussian density functions, 205
BLUP, see Best linear unbiased predictor
Bobcat telemetry data, 24
Borrowing strength, 123
Brown bear (Ursus arctos), 138
Brownian bridges, 195–197
Brownian motion, 193, 197, 211
  model, 223
  process, 192, 194, 195
B-spline basis functions, 76

C

Callorhinus ursinus, see Northern fur seals
CAR models, see Conditional autoregressive models
Caribou (Rangifer tarandus), 132
“Carryover effects”, 9
CDF, see Cumulative distribution function
Cervus canadensis, see Elk
Change-point model, 250
Clustered spatial processes, 40
Clustering models, 6
Complete spatial random (CSR), 21
300 Subject Index
Multivariate Durbin–Watson statistics, 122
Multivariate Gaussian method, 129
Multivariate normal random process, 195
Multivariate time series, 83; see also Hierarchical time series models; Univariate time series
  implementation, 87–88
  vector autoregressive models, 83–87

N

Newton–Raphson method, 115
Non-Bayesian
  contexts, 256
  methods, 17
  models, 15, 260
Non-VHF animal telemetry tags, 13
Northern fur seals (Callorhinus ursinus), 143, 248
Nugget effect, 33

O

Odocoileus hemionus (O. hemionus), 228
OLS, see Ordinary least square
1-D discrete spatial domain, 189, 190
Ordinary least square (OLS), 39, 69
Organisms, 3
  movement of, 1
Ornstein–Uhlenbeck foraging model (OUF model), 229
Ornstein–Uhlenbeck model (OU model), 135, 199–202, 223, 229–231
  foraging model, 229
  prediction using, 231–235, 236
  two 1-D simulated conditional processes, 203

P

PACFs, see Partial autocorrelation functions
Parameterizations, 173
Parameter model, 89
Parametric models, 25–28
Parametric statistical models, 15, 103
Parametric temporal point process model, 78
Partial autocorrelation functions (PACFs), 57, 58
Patch transitions, 178
  example of elk trajectory, 180
  posterior predictive check, 181
PDF, see Probability density function
Per capita vital rates, 3
Perturbation theory, 191
Phoca vitulina, see Harbor seals (Phoca vitulina)
Plug-in, 234
PMFs, see Probability mass functions
Point processes, 19; see also Temporal point processes
  density estimation, 23–24
  homogeneous SPPs, 21–23
  parametric models, 25–28
Point process models, 19, 134
  autocorrelation, 121–123
  connections with, 256
  continuous-time models, 256–263
  discrete-time models, 263–267
  measurement error, 127–131
  population-level inference, 123–127
  resource selection functions, 107–117
  RUF, 117–121
  space use, 99–107
  spatio-temporal point process models, 131–144
Poisson point process model, 264
Poisson probability mass functions, 255
Poisson regression approach, 111, 118
Population dynamics, 3, 11
Population-level inference, 123, 186–187
  Gaussian regression model, 123–124
  hierarchical model, 124–125, 127
  hierarchical RSF model, 125–126
  random effects model, 124–125
  RSF parameter estimation, 126
Population-level movement models, 187
Position models, 147; see also Velocity models
  attraction, 150
  heterogeneous behavior, 153–158
  measurement error, 150–152
  random walk, 147–149
  temporal alignment, 153
Posterior distribution, 37
Potential functions, 202
  Bayesian perspective, 211
  correlated Gaussian random process, 204
  negative bivariate Gaussian density functions, 206
  posterior, 210
  posterior summary statistics for parameters, 209
  potential surface, 205
  simulated individual trajectory, 208
  steeply rising boundary condition delineating, 207
  time and space, 203
Prediction using Ornstein–Uhlenbeck models, 231–235, 236
Predictive distribution, 34
Predictive processes, 51–54
Predictor distribution, 260
Probability density function (PDF), 20
Probability mass functions (PMFs), 36
Probit link function, 245
Process convolution, 215
Process model, 89
Puma concolor, see Mountain lion