Statistical Modelling of Extreme Values
Statistical Modelling of Extreme Values
Statistical Modelling of Extreme Values
n,
X
n
n
N(0, 1),
so rescaling is needed to obtain a non-degenerate limit.
Same applies here: we seek limits of
M
n
b
n
a
n
for suitable sequences a
n
> 0 and b
n
.
Statistics of Extremes January 2008 slide 20
10
Extremal types theorem
Denition 1 The distributions F and F
are of the same type if there are constants a > 0 and b such that
F
0, x 0,
exp(x
), x > 0, > 0;
III : G(x) =
exp{(x)
}, x < 0, > 0,
1, x 0.
Conversely, each of these Gs may appear as a limit for the distribution of (Mn bn)/an, and does so when G itself is
the distribution of X.
Statistics of Extremes January 2008 slide 21
Three limiting distributions
10 5 0 5 10
0
.
0
0
.
1
0
.
2
0
.
3
Gumbel
x
P
D
F
10 5 0 5 10
0
.
0
0
.
1
0
.
2
0
.
3
0
.
4
Frechet, alpha=0.5
x
P
D
F
10 5 0 5 10
0
.
0
0
.
1
0
.
2
0
.
3
0
.
4
Negative Weibull, alpha=0.5
x
P
D
F
The three types are known as the Gumbel, Frechet and Weibull (strictly, negative Weibull),
respectively.
The Frechet (Type II) is bounded below, and the negative Weibull (Type III) is bounded above.
The standard Weibull is a distribution for minima.
Statistics of Extremes January 2008 slide 22
11
Example: Exponential maxima
If F is the standard exponential distribution, then
F(x + log n)
n
= (1 e
xlog n
)
n
=
1
1
n
e
x
n
exp{exp(x)}.
0 2 4 6 8 10 12
0
.
0
0
.
2
0
.
4
0
.
6
0
.
8
1
.
0
x
C
D
F
4 2 0 2 4 6 8
0
.
0
0
.
2
0
.
4
0
.
6
0
.
8
1
.
0
x
C
D
F
Distributions of maxima and renormalised maxima of n = 1, 7, 30, 365, 3650 standard exponential variables (from left to right), with Gumbel distribution (heavy).
Statistics of Extremes January 2008 slide 23
Example: Normal maxima
Maxima of standard normal variables also converge to a Gumbel limit, with
an = (2 log n)
0.5
, bn = (2 log n)
0.5
0.5(2 log n)
0.5
(log log n + log 4),
but convergence is extremely slow.
y
C
D
F
-4 -2 0 2 4
0
.
0
0
.
2
0
.
4
0
.
6
0
.
8
1
.
0
y
C
D
F
-4 -2 0 2 4
0
.
0
0
.
2
0
.
4
0
.
6
0
.
8
1
.
0
Distributions of maxima and renormalised maxima of n = 1, 7, 30, 365, 3650 standard normal variables (from left to right), with Gumbel distribution (heavy).
Statistics of Extremes January 2008 slide 24
12
Using the limit law
We assume that for some a > 0 and b,
Pr(M
n
b)/a x G(x),
or equivalently,
PrM
n
x G(x b)/a = G
(x),
where G
_
1 +
_
x
__
1/
+
_
,
dened on x : 1 +(x )/ > 0.
From now on let x
+
= max(x, 0).
and are location and scale parameters
is a shape parameter determining the rate of tail decay, with
> 0 giving the heavy-tailed (Frechet) case
= 0 giving the light-tailed (Gumbel) case
< 0 giving the short-tailed (negative Weibull) case
Statistics of Extremes January 2008 slide 26
13
Quantiles and return levels
In terms of quantiles, take 0 < p < 1 and dene
x
p
=
_
1 log(1 p)
_
,
where G(x
p
) = 1 p.
In extreme value terminology, x
p
is the return level associated with the return period 1/p.
Log y
Q
u
a
n
tile
-2 0 2 4 6
0
5
1
0
1
5
Shape=0.2
Shape=0
Shape=-0.2
Statistics of Extremes January 2008 slide 27
Max-stability
Some insight into these results is obtained using the concept of max-stability.
Denition 3 A distribution G is said to be max-stable if
G
k
(a
k
x +b
k
) = G(x), k = 1, 2, . . . ,
for some constants a
k
and b
k
.
In other words, taking powers of G results only in a change of location and scale.
The connection with extremes is that a distribution is max-stable if and only if it is a GEV
distribution.
Statistics of Extremes January 2008 slide 28
14
Outline proof of GEV as limit law
Suppose that (M
n
b
n
)/a
n
has limit distribution G. Then, for large n,
Pr(M
n
b
n
)/a
n
x G(x)
and so for any k,
Pr(M
nk
b
nk
)/a
nk
x G(x).
But
Pr(M
nk
b
nk
)/a
nk
x = [Pr(M
n
b
nk
)/a
nk
x]
k
,
giving two expressions for the distribution of M
n
:
Pr(M
n
x) G
_
xbn
an
_
, Pr(M
n
x) G
1/k
_
xb
nk
a
nk
_
,
so that G and G
1/k
are identical apart from scaling coecients. Thus, G is max-stable and therefore GEV.
Statistics of Extremes January 2008 slide 29
Domains of attraction
For a given distribution function F, is it easy to determine suitable sequences a
n
and b
n
and to know what limit
G will occur?
For suciently smooth distributions, dening the reciprocal hazard function
r(x) =
1 F(x)
f(x)
.
and letting
b
n
= F
1
(1
1
n
), a
n
= r(b
n
), = lim
x
r
(x)
the limit distribution of (M
n
b
n
)/a
n
is
exp(1 +x)
1/
+
if ,= 0
and
exp(e
x
) if = 0.
Statistics of Extremes January 2008 slide 30
15
Convergence and approximation
There has been a good deal of work on the speed of convergence of M
n
to the limiting regime, which
depends on the underlying distribution Ffor example, convergence is slow for maxima of n Gaussian
variables.
From a statistical viewpoint, this is not so useful: we use the GEV as an approximate distribution for
sample maxima for nite (small?) n, so the key question is whether the GEV ts the available dataassess
this empirically.
Direct use of the GEV rather than the three types separately allows for exible modelling, and ducks the
question of which type is most appropriatethe data decide.
Testing for t of one type or another is usually unhelpful, because setting (say) = 0 can give
unrealistically precise inferences. Often uncertainty in extremes is (appropriately) large, and it can be
misleading to constrain inferences articially.
Statistics of Extremes January 2008 slide 31
Minima
All the ideas apply equally to minima, because
min(X
1
, . . . , X
n
) = max(X
1
, . . . , X
n
).
Our general discussion is for maxima, and we make this transformation without comment when we
model minima.
Statistics of Extremes January 2008 slide 32
Inference
Given observed annual maxima X
1
, . . . , X
k
, aim now is to make inferences on the GEV parameters
(, , ) . Possibilities include:
graphical techniqueshistorically important, remain useful for model-checking;
moment-based estimatorsusually inecient for extremes, as moments may not exist;
probability-weighted momentswidely used in hydrology, but dicult to extend to complex
data;
likelihood-based techniqueswidely used in statistics because
intuitive paradigm often readily adapted to complex data;
have unifying general approximate estimation and testing theory;
incorporation of prior information via Bayes theorem also uses likelihood as key ingredient.
Statistics of Extremes January 2008 slide 33
16
Likelihood
Denition 4 Let y be a data set, assumed to be the realisation of a random variable Y f(y; ), where is
unknown. Then the likelihood and log likelihood are
L() = L(; y) = f
Y
(y; ), () = log L(),
.
The maximum likelihood estimate (MLE)
satises (
.
Often
is unique and in many cases it satises the score (or likelihood) equation
()
= 0,
which is interpreted as a p 1 vector equation if is a p 1 vector.
The observed information dened as
J() =
2
()
T
is a p p matrix if has dimension p.
The likelihood ratio statistic (Wilks statistic) is W() = 2(
) () 0.
Statistics of Extremes January 2008 slide 34
Likelihood approximations
For large n, and if the data were generated from f(y;
0
, then
N
p
_
0
, J(
)
1
_
,
so a standard error for
r
is the square root of the rth diagonal element of J(
)
1
.
To obtain
and J(
W(
0
)
2
p
.
We base tests and/or condence intervals on these results, even when the sample is small (e.g. n = 20 or
less). Often the approximations are adequate for practical work.
For details, see for example Chapter 4 of Davison (2003) Statistical Models, CUP.
Statistics of Extremes January 2008 slide 35
17
Interest and nuisance parameters
In practice usually divides into
interest parameters
q1
central to the problem (often q = 1 in practice);
nuisance parameters
pq1
whose values are not of real importance.
Let
) (, ),
and dene the (generalised) likelihood ratio statistic
W
p
() = 2
_
(
) (,
)
_
= 2
_
(
) (
)
_
.
If
0
is the value of that generated the data from a regular model,
W
p
()
2
q
, for large n.
We often base condence intervals and tests for
0
on this, particularly if the prole log likelihood for ,
p
() = (
) = (,
i=1
_
log (1 + 1/) log
_
1 +
_
xi
_
+
_
1 +
_
xi
_
1/
+
_
;
note this equals if any 1 +(x
i
)/ < 0.
Numerical maximization of log-likelihood.
Calculation of standard errors from inverse of observed information matrix (also obtained numerically).
Diagnostic checks: probability plots, quantile plots, return level plots.
Comparison of competing nested models through deviance (likelihood ratio statistic), and of non-nested
models by minimising the Akaike information criterion AIC = 2dim()
) = min
_
1,
f(x
)q(x
(t)
[ x
)
f(x
(t)
)q(x
[ x
(t)
)
_
4. Dene
x
(t+1)
=
_
x
with probability
x
(t)
with probability 1 .
5. Set t = t + 1 and go to step 2.
Statistics of Extremes January 2008 slide 45
Comments on MetropolisHastings
The algorithm remains valid when f is only proportional to a target density function. This is why
the algorithm is so important in Bayesian applications.
Subject to regularity conditionsin particular, the possibility of moving everywhere in the
domainthe choice of q is arbitrary. Simple families such as random walks are generally chosen,
though poor choices will lead to poor algorithms.
A successful algorithm will:
converge quickly: density of x
t
is near f for small t;
mix well: have low autocorrelation.
These conditions facilitate the use of fairly short series of simulated data for inference, but use of
convergence diagnostics is essential, as there is usually no theoretical guarantee of convergence
Statistics of Extremes January 2008 slide 46
22
Port Pirie: MCMC analysis
MCMC chains for GEV parameters and (by transformation) 100-year return level in analysis of Port Pirie annual
maxima. Analysis based on vague, though proper, priors.
0 1000 3000 5000
3
.
8
4
.
2
4
.
6
5
.
0
Index
l
o
c
a
t
i
o
n
0 1000 3000 5000
0
.
2
0
.
4
0
.
6
0
.
8
Index
s
c
a
l
e
0 1000 3000 5000
0
.
6
0
.
2
0
.
2
Index
s
h
a
p
e
0 1000 3000 5000
4
.
5
5
.
0
5
.
5
6
.
0
6
.
5
Index
1
0
0
y
e
a
r
r
e
t
u
r
n
l
e
v
e
l
Statistics of Extremes January 2008 slide 47
Port Pirie: MCMC analysis
Posterior means:
= 3.87, = 0.208,
= 0.045,
Posterior standard deviations:
SD() = 0.029, SD() = 0.022, SD() = 0.099
Statistics of Extremes January 2008 slide 48
23
Port Pirie: MCMC analysis
3.75 3.85 3.95 4.05
0
5
1
0
1
5
2
0
location
P
r
o
b
a
b
i
l
i
t
y
d
e
n
s
i
t
y
f
u
n
c
t
i
o
n
Location
0.15 0.20 0.25 0.30
0
5
1
0
2
0
scale
P
r
o
b
a
b
i
l
i
t
y
d
e
n
s
i
t
y
f
u
n
c
t
i
o
n
Scale
0.4 0.0 0.2 0.4
0
1
2
3
4
5
6
shape
P
r
o
b
a
b
i
l
i
t
y
d
e
n
s
i
t
y
f
u
n
c
t
i
o
n
Shape
4.0 4.5 5.0 5.5 6.0 6.5 7.0
0
.
0
1
.
0
2
.
0
return level
P
r
o
b
a
b
i
l
i
t
y
d
e
n
s
i
t
y
f
u
n
c
t
i
o
n
100year return level
Posterior densities in analysis of Port Pirie annual maxima.
Statistics of Extremes January 2008 slide 49
Limitations of the GEV limit law
The GEV limit law for block maxima requires careful reading.
It states that when (normalized) maxima have a limit, that limit will be a member of the GEV
family. It does not (by itself) guarantee the existence of a limit, and there exist wide classes of
models for which no limit exists.
One interesting case is the Poisson distribution. If X
i
is a sequence of Poisson variables with
mean , a sequence of constants I
n
can be found such that
lim
n
Pr
_
max
1in
= I
n
or I
n
+ 1
_
= 1
and this degeneracy implies that no limit distribution for normalized maxima exists.
The discreteness in the Poisson distribution causes the oscillation between I
n
and I
n
+ 1. It is not
the cause of the limit degeneracy, which is actually a consequence of the Poisson tail decay.
Statistics of Extremes January 2008 slide 50
24
Point Process Approach slide 51
Improved Inferences
The annual maxima method can be inecient if other data are available. Alternative methods include:
peaks over thresholds,
r-largest order statistics.
Both are special cases of a point process representation.
Statistics of Extremes January 2008 slide 52
Example: Eskdalemuir rainfall
Time
H
o
u
r
l
y
r
a
i
n
f
a
l
l
(
m
m
)
1970 1975 1980 1985
0
5
1
0
1
5
Statistics of Extremes January 2008 slide 53
Example: Eskdalemuir rainfall
Time
H
o
u
r
l
y
r
a
i
n
f
a
l
l
(
m
m
)
1970 1975 1980 1985
0
5
1
0
1
5
Statistics of Extremes January 2008 slide 54
25
Example: Eskdalemuir rainfall
Time
H
o
u
r
l
y
r
a
i
n
f
a
l
l
(
m
m
)
1970 1975 1980 1985
0
5
1
0
1
5
Statistics of Extremes January 2008 slide 55
Point process
A point process T is simply a collection of pointstimes of avalanches, positions of stars in the
sky, time/position of earthquake times/epicentres; . . .
For any suitable set /, we simply count the number of points in /, and for some (random) n we
can write the process as
T =
n
j=1
X
j
where the X
j
are the positions of the points, and
x
puts unit mass at x.
The basic point process is the Poisson process.
Statistics of Extremes January 2008 slide 56
Poisson process
Denition 5 Let A R
d
, and let a function (/) 0 be dened for any measurable / A. A
Poisson process T on A with intensity measure satises
the number of points of T in /, denoted N(/), has the Poisson distribution with mean (/);
If /, B A are disjoint, then N(/) and N(B) are independent.
If / = [a
1
, x
1
] [a
d
, x
d
], and if
(x
1
, . . . , x
d
) =
d
(/)
x
1
x
d
exists, then is called the intensity (density) function of T.
Statistics of Extremes January 2008 slide 57
26
Point process limit: Basic idea
Suppose F unknown, and take X
1
, . . . , X
n
iid
F.
Form a 2-dimensional point process (i, X
i
); i = 1, . . . , n and characterize the behaviour of this
process in regions of the form / = [t
1
, t
2
] [u, ).
More formally, suppose (M
n
b
n
)/a
n
converges to the distribution
G(x) = exp
_
(1 +x)
1/
+
_
.
Construct a sequence of point processes on R
2
by
T
n
=
__
i
n + 1
,
X
i
b
n
a
n
_
: i = 1, . . . , n
_
.
Then T
n
T as n , where T is a Poisson process.
Statistics of Extremes January 2008 slide 58
Poisson limit
Poisson limit when rescaling samples of sizes 10, 100, 1000, 10,000.
t
(
X
-
b
)
/
a
0.0 0.2 0.4 0.6 0.8 1.0
-
8
-
6
-
4
-
2
0
2
t
(
X
-
b
)
/
a
0.0 0.2 0.4 0.6 0.8 1.0
-
8
-
6
-
4
-
2
0
2
t
(
X
-
b
)
/
a
0.0 0.2 0.4 0.6 0.8 1.0
-
8
-
6
-
4
-
2
0
2
t
(
X
-
b
)
/
a
0.0 0.2 0.4 0.6 0.8 1.0
-
8
-
6
-
4
-
2
0
2
Statistics of Extremes January 2008 slide 59
Limiting Poisson process
As n, on regions bounded away from , T
n
T, a Poisson process dened by its intensity
measure .
Consider the set /
x
= [0, 1] [x, ), for some x > u.
The Poisson property gives
Prno points in /
x
= exp(/
x
)
= PrM
n
x
exp
_
(1 +x)
1/
+
_
,
and so, by time-homogeneity of the process,
(t
1
, t
2
) [x, ) = (t
2
t
1
)(1 +x)
1/
+
.
Statistics of Extremes January 2008 slide 60
27
Statistical application
For statistical application, it is convenient to:
Assume the limiting process is a reasonable approximation for nite n above a high threshold, u.
To absorb the unknown scaling coecients into the intensity function and so work directly with
the original series.
To rescale the intensity so that the annual maximum has the GEV distribution with parameters
(, , ) .
This leads to modelling the series (i, X
i
); i = 1, . . . , n as a Poisson process above u with intensity
function
(t
1
, t
2
) (x, ) = n
y
(t
2
t
1
)1 +(x )/
1/
+
where n
y
is the number of years of observation, and the time limits are rescaled to 0 t
1
< t
2
1.
Statistics of Extremes January 2008 slide 61
Likelihood
Maximum likelihood inference is most natural.
For a region of the form /
v
= [0, 1] (v, ) for v > u, the likelihood is
L(/
v
; , , ) = exp(/
v
)
NAv
i=1
d(t
i
, x
i
)
= exp
_
n
y
_
1 +
v
_
1/
+
_
NAv
i=1
1
_
1 +
xi
_
1/1
+
,
where x
1
, . . . , x
NAv
is an enumeration of the N
Av
points that exceed the threshold v.
The parameters are the same as for the maximum model.
Statistics of Extremes January 2008 slide 62
Example: Eskdalemuir rainfall
Time
H
o
u
r
l
y
r
a
i
n
f
a
l
l
(
m
m
)
1970 1975 1980 1985
0
5
1
0
1
5
Model is now applied to a time series of hourly rainfall aggregates measured in mm with a threshold
of u = 5mm.
Statistics of Extremes January 2008 slide 63
28
Example: Eskdalemuir rainfall
(rain.fit <- fpot(esk.rain,threshold=5,model="pp",
start=list(loc=10,scale=1.2,shape=0.1),npp=365.25*24))
Threshold: 5
Number Above: 356
Proportion Above: 0.0024
Estimates
loc scale shape
10.13628 1.86637 0.06696
Standard Errors
loc scale shape
0.35380 0.23673 0.05379
Statistics of Extremes January 2008 slide 64
Example: Eskdalemuir rainfall
0.0 0.2 0.4 0.6 0.8 1.0
0
.
0
0
.
2
0
.
4
0
.
6
0
.
8
1
.
0
Probability Plot
Empirical
M
o
d
e
l
6 8 10 12 14 16 18
5
1
0
1
5
2
0
Quantile Plot
Model
E
m
p
i
r
i
c
a
l
6 8 10 12 14 16
0
.
0
0
.
1
0
.
2
0
.
3
0
.
4
0
.
5
0
.
6
Density Plot
Quantile
D
e
n
s
i
t
y
0.05 0.20 1.00 5.00 20.00
5
1
0
1
5
2
0
2
5
Return Level Plot
Return Period
R
e
t
u
r
n
L
e
v
e
l
Statistics of Extremes January 2008 slide 65
29
Threshold methods
Let X
n,i
= (X
i
b
n
)/a
n
, for i = 1, . . . , n. Then, by the Poisson limit,
PrX
n,i
> u +x [ X
n,i
> u
(0, 1) (u +x, )
(0, 1) (u, )
=
_
1 +
x
+(u )
_
1/
+
.
Absorbing the unknown scaling coecients leads to the survivor function of the Generalized Pareto
Distribution (GPD):
PrX
n,i
> u +x [ X
n,i
> u =
_
1 +
x
_
1/
+
, x > 0,
where = +(u ).
Taking the limit 0 gives the exponential distribution as a special case.
Statistics of Extremes January 2008 slide 66
Alternative inference
An equivalent inference to the point process method is to apply the GPD model to the exceedances of
a threshold u. For the rainfall data we obtain:
(rain.gpdfit <- fpot(esk.rain,threshold=5,npp=365*24))
Deviance: 1058.954
Threshold: 5
Number Above: 356
Proportion Above: 0.0024
Estimates
scale shape
1.52239 0.06702
Standard Errors
scale shape
0.11488 0.05383
Statistics of Extremes January 2008 slide 67
30
Interpretation
There are now just two parameters, since the model conditions on the exceedance (one parameter
is lost corresponding to the crossing rate of u).
The estimate of is (almost!) identical to the point process analysis, while the change of
parameterization leads to a dierent value of .
Provided < 1, the mean residual life exists and satises
E(X u [ X > u) =
+u
1
,
we obtain a simple diagnostic for threshold selection. The mean exceedance above u should be
linear in u at levels for which the model is valid.
Suggests looking for linearity in a plot of the empirical mean residual life.
Statistics of Extremes January 2008 slide 68
Mean residual life plot
mrlplot(esk.rain) gives
0 2 4 6 8 10 12
0
.
0
0
.
5
1
.
0
1
.
5
2
.
0
2
.
5
3
.
0
3
.
5
Mean Residual Life Plot
Threshold
M
e
a
n
E
x
c
e
s
s
Interpretation hard, but u > 2 looks reasonable; u = 5 looks high.
Statistics of Extremes January 2008 slide 69
Threshold selection
Bias-variance trade-o: threshold too lowbias because of the model asymptotics being invalid; threshold too
highvariance is large due to few data points.
An alternative approach is to t the Poisson process model at many thresholds and look for parameter stability:
0 1 2 3 4 5 6 7
1
4
1
6
1
8
2
0
2
2
Threshold
L
o
c
a
t
i
o
n
0 1 2 3 4 5 6 7
0
1
2
3
4
5
6
Threshold
S
c
a
l
e
0 1 2 3 4 5 6 7
0
.
2
0
.
0
0
.
1
0
.
2
0
.
3
Threshold
S
h
a
p
e
Again, u = 5 looks too high.
Statistics of Extremes January 2008 slide 70
31
rlargest order statistics method
Consider the vector of r largest order statistics in a block (say, year) of data:
_
M
(1)
n
b
n
a
n
,
M
(2)
n
b
n
a
n
, . . . ,
M
(r)
n
b
n
a
n
_
rescaled in the same way as just the maximum, M
(1)
n
.
Setting u = M
(r)
n
in the point process likelihood gives
L = exp
_
_
_
_
1 +
M
(r)
n
_
1/
_
_
_
r
i=1
1
_
1 +
M
(i)
n
_
1/1
.
This is the likelihood contribution for a single block (year). The full likelihood is obtained by
multiplying such terms across the blocks.
Statistics of Extremes January 2008 slide 71
Venice Sea Levels
rlarg.fit(venice.data)
nllh:
[1] 1149.268
mle:
[1] 120.3961096 12.6929941 -0.1148273
se:
[1] 1.34704941 0.52516203 0.01901015
Statistics of Extremes January 2008 slide 72
32
Modelling Issues slide 73
Modelling issues
Extreme value data usually show:
dependence on covariate eects;
short term dependence (storms for example);
seasonality (due to annual cycles in meteorology);
longterm trends (due to gradual climatic change);
other forms of nonstationarity (for example, the deterministic eect of tides on sealevels).
For temporal dependence there is a suciently wide-ranging theory which can be invoked. Other
aspects have to be handled at the modelling stage.
We now discuss how to deal with
short-term dependence;
trends and seasonality.
Statistics of Extremes January 2008 slide 74
D(u
n
) condition
The usual (weak) condition which is adopted to eliminate the eect of long-range dependence is
Leadbetters D(u
n
) condition:
For all i
1
< . . . < i
p
< j
1
< . . . < j
q
with j
1
i
p
> l
|Pr{Xi
1
un, . . . , Xip
un, Xj
1
un, . . . , Xjq
un}
Pr{Xi
1
un, . . . , Xip
un}Pr{Xj
1
un, . . . , Xjq
un}| (n, l),
where (n, l
n
)0 for some sequence l
n
= o(n). This implies that rare events that are suciently
separated are independent.
Theorem 6 If D(u
n
) is satised with u
n
= a
n
x +b
n
, and if
PrM
n
a
n
x +b
n
G(x),
then G is a GEV distribution.
Conclusion: Same models apply for processes with (restricted) long-range dependence.
Statistics of Extremes January 2008 slide 75
33
Short-term dependence: Example
Suppose Y
1
, Y
2
, . . .
iid
exp(1), and X
i
= max(Y
i
, Y
i+1
):
0 20 40 60 80 100
0
1
2
3
4
5
6
x
Extremes tend to cluster in pairs.
Statistics of Extremes January 2008 slide 76
Exponential example
Marginal distribution:
PrX
i
< x = PrY
i
< x, Y
i+1
< x = (1 e
x
)
2
.
Let X
1
, X
2
, . . . be an independent series with the same marginal distribution, and M
n
= maxX
1
, . . . , X
n
.
Then
PrM
n
log(2n) < x = [1 expx log(2n)]
2n
=
_
1
1
2n
e
x
_
2n
exp
_
e
x
_
= G
2
(x),
while, for M
n
= maxX
1
, . . . , X
n
,
PrM
n
log(2n) < x = PrY
1
< x + log(2n), . . . , Y
n+1
< x + log(2n)
=
_
1
e
x
2n
_
n+1
exp
_
1
2
e
x
_
= G
1
(x),
and G
1
(x) = G
2
(x)
1/2
.
Statistics of Extremes January 2008 slide 77
34
Example: Moving maxima
Let Y0, Y1, Y2, . . .
iid
F, with
F(y) = exp
n
1
(a+1)y
o
, y > 0,
where 0 a 1 is a parameter, and dene the process Xi by
X0 = Y0, Xi = max{aYi1, Yi}, i = 1, . . . , n.
i
x
(i)
0 10 20 30 40 50
0
.1
0
.5
5
.0
5
0
.0
i
x
(i)
0 10 20 30 40 50
0
.1
0
.5
5
.0
5
0
.0
i
x
(i)
0 10 20 30 40 50
0
.1
0
.5
5
.0
5
0
.0
i
x
(i)
0 10 20 30 40 50
0
.1
0
.5
5
.0
5
0
.0
Statistics of Extremes January 2008 slide 78
Moving maxima: Asymptotic properties
PrX
i
x = PraY
i1
x, Y
i
x = exp(1/x),
Now, let X
1
, X
2
, . . . be a series of independent variables having a marginal standard Frechet distribution, and
dene M
n
= maxX
1
, . . . , X
n
. Then,
PrM
n
nz = [exp1/(nz)]
n
= exp(1/z).
On the other hand, for M
n
= maxX
1
, . . . , X
n
,
PrM
n
nz = PrX
1
nz, . . . , X
n
nz
= PrY
1
nz, . . . , Y
n
nz
=
_
exp
_
1
(a+1)nz
__
n
= exp (1/z)
1
a+1
,
It follows that
PrM
n
nz = [PrM
n
nz]
1
a+1
Statistics of Extremes January 2008 slide 79
35
Extremal index
The previous examples illustrate the following general result.
Theorem 7 Let X
i
be a stationary process and X
i
be independent variables with the same marginal
distribution. Set M
n
= maxX
1
, . . . , X
n
and M
n
= maxX
1
, . . . , X
n
. Under suitable regularity conditions,
Pr (M
n
b
n
)/a
n
z G
1
(z)
as n for normalizing sequences a
n
> 0 and b
n
, where G
1
is a non-degenerate distribution function, if
and only if
Pr (M
n
b
n
)/a
n
z G
2
(z),
where
G
2
(z) = G
1
(z)
for a constant called the extremal index that satises 0 < 1.
Thus if G
1
is GEV, then so is G
2
, with the same a strong robustness result.
Statistics of Extremes January 2008 slide 80
Extremal index
The extremal index can also be dened as
= lim
n
Prmax(X
2
, . . . , X
pn
) u
n
[ X
1
u
n
,
where p
n
= o(n), and the sequence u
n
is such that PrM
n
u
n
converges.
Loosely, is the probability that a high threshold exceedance is the nal element in a cluster of
exceedances.
Thus extremes occur in clusters whose (limiting) mean cluster size is 1/.
The mean distance between clusters is increased by a factor 1/.
In fact the distribution of a cluster maximum is the same as the marginal distribution of an
exceedance, so there is no bias in considering only cluster maxima, if we can identify clusters . . .
Statistics of Extremes January 2008 slide 81
36
Consequences
When clustering occurs, the notion of return level is more complex:
if = 1, then the 100-year-event has probability 0.368 of not appearing in the next 100 years;
if = 1/10, then on average the event also occurs ten times in a millenium, but all together: it has
probability 0.904 of not appearing in the next 100 years.
If we can estimate the tail of marginal distribution F (e.g. by tting to block maxima), then
Pr(M
n
x) F(x)
n
G(x),
where G is GEV with parameters , , . The marginal quantiles are approximately
F
1
(p) G
1
(p
n
) > G
1
(p
n
),
so may be much larger than would be the case with = 1.
A similar argument shows that ignoring can lead to over-estimating a return level estimated using G.
Statistics of Extremes January 2008 slide 82
D
(u
n
) condition
The D
(u
n
) condition is said to hold for the process X
i
if
limsup
n
n
[n/k]
j=2
PX
1
> u
n
, X
j
> u
n
0 as k
If a stationary process satises both the D(u
n
) and D
(u
n
) conditions, then its extremal index is
= 1.
D
(u
n
) holds for many well-known processes, for example Gaussian autoregressive processes. This
raises the issue of how best to model the extremes of such processes, since at any realistic level,
dependence is likely to be strong, while the asymptotic limit is independence.
Other conditions D
, D
(2)
, . . . facilitate extremal index calculation in special cases.
Statistics of Extremes January 2008 slide 83
Modelling techniques
The above discussion tells us that under the weak condition D(u
n
) and some mild additional
conditions, the maxima of stationary series can be modelling using the GEV, and that exceedances
can be modelling using the GPD. This is a strong robustness result, but it implies that extremes will
occur in clusters of correlated observations.
Possible modelling strategies using exceedances and the GPD are:
to identify clusters and model cluster maxima only;
as above, but also estimate the extremal index empirically;
to ignore the dependence on the basis that the marginal model is valid, but to inate standard
errors to account for reduction in independent information;
to specify explicit model for dependence, such as a rst-order Markov chain.
Statistics of Extremes January 2008 slide 84
37
Example: Wooster temperatures
Day Index
D
e
g
re
e
s
B
e
lo
w
0
F
.
0 10 20 30 40
-3
0
-2
0
-1
0
0
1
0
2
0
1 2 3 4
Day Index
D
e
g
re
e
s
B
e
lo
w
0
F
.
0 10 20 30 40
-3
0
-2
0
-1
0
0
1
0
2
0
1 2
Simple estimates of the extremal index are based on empirical means of clusters. Here the runs method is used:
a cluster is deemed to have terminated when there are r consecutive observations below the threshold. Left:
r = 1 gives 4 clusters. Right: r = 3 gives 2 clusters.
Statistics of Extremes January 2008 slide 85
Calculation of return levels
The m-observation return level is
x
m
= u +
_
(m
u
)
1
_
,
where and are the parameters of the threshold excess generalized Pareto distribution,
u
is the
probability of an exceedance of u, and is the extremal index.
u
=
n
u
n
and
=
n
c
n
u
.
where n
c
is number of clusters and n
u
is number of exceedances.
So simply estimate the component
u
by n
c
/n.
Statistics of Extremes January 2008 slide 86
Example: Wooster temperatures
u = 10 u = 20
r = 2 r = 4 r = 2 r = 4
n
c
31 20 43 29
11.8 (3.0) 14.2 (5.2) 17.4 (3.6) 19.0 (4.9)
2
, t > t
0
;
(t) = +y(t).
Statistics of Extremes January 2008 slide 91
Parameter estimation
Model specication (example)
Z
t
GEV((t), (t), (t)),
Likelihood (for complete parameter set ,
L() =
m
t=1
g(z
t
; (t), (t), (t)),
where g is GEV model density.
Maximization of L yields maximum likelihood estimates.
Standard likelihood techniques also yield standard errors, condence intervals etc.
Statistics of Extremes January 2008 slide 92
40
Model reduction
For nested models /
0
/
1
, the deviance statistic is
D = 2
1
(/
1
)
0
(/
0
),
Based on asymptotic likelihood theory, /
0
is rejected by a test at the -level of signicance if
D > c
, where c
(t)),
the standardized variables
Z
t
=
1
(t)
log
_
1 +
(t)
Z
t
(t)
(t)
_
,
each have the standard Gumbel distribution, with probability distribution function
Pr
Z
t
z = exp(e
z
), z R.
Possible diagnostics:
probability plot:
_
i/(m+ 1), exp(exp( z
(i)
)); i = 1, . . . , m
_
quantile plot:
__
log [logi/(m+ 1)] , z
(i)
_
; i = 1, . . . , m
_
Statistics of Extremes January 2008 slide 94
Other extreme value models
Similar techniques are applicable for the threshold exceedance and point process models, but
threshold selection is likely to be a more sensitive issue.
Time-varying thresholds may also be appropriate, though there is little guidance on how to make
such a choice.
Statistics of Extremes January 2008 slide 95
Example: Fremantle sea levels
Model Log-likelihood
Constant 43.6
Linear in 49.9
Quadratic in 50.6
Linear in and 50.7
Based on likelihood considerations, best model is linear in .
For this model, trend is around 2mm per year and
= 0.125 (0.070).
(Note: Model improves further by inclusion of SOI as a covariate for ).
Statistics of Extremes January 2008 slide 96
41
Example: Fremantle sea levels
Year
S
e
a
-
le
v
e
l
(
m
e
t
r
e
s
)
1900 1920 1940 1960 1980
1
.
2
1
.
4
1
.
6
1
.
8
Fitted trend in location parameter for Fremantle annual maximum sea levels.
Statistics of Extremes January 2008 slide 97
Example: Fremantle sea levels
Empirical
M
o
d
e
l
0.0 0.2 0.4 0.6 0.8 1.0
0
.0
0
.2
0
.4
0
.6
0
.8
1
.0
Residual Probability Plot
Model
E
m
p
ir
ic
a
l
-1 0 1 2 3 4
0
2
4
6
Residual Quantile Plot (Gumbel Scale)
Probability and quantile plots for nonstationary GEV analysis of Fremantle annual maximum sea levels.
Statistics of Extremes January 2008 slide 98
Example: Race times
Model Log-likelihood
2
t
.
has log-likelihood 49.5. Not so good as quadratic model, though comparison via likelihood ratio test
is invalid as models are not nested. Better behaviour for large t suggests a preferable model though.
Year
R
a
c
e
T
im
e
(
s
e
c
s
.)
1975 1980 1985 1990
2
3
2
2
3
4
2
3
6
2
3
8
2
4
0
2
4
2
2
4
4
2
4
6
Statistics of Extremes January 2008 slide 101
43
Example: Wooster Temperature Series
Day Index
D
e
g
r
e
e
s
B
e
lo
w
Z
e
r
o
F
.
0 100 200 300 400
-
4
0
-
2
0
0
2
0
Winter
Day Index
D
e
g
r
e
e
s
B
e
lo
w
Z
e
r
o
F
.
0 100 200 300 400
-
6
0
-
4
0
-
2
0
0
Spring
Day Index
D
e
g
r
e
e
s
B
e
lo
w
Z
e
r
o
F
.
0 100 200 300 400
-
7
0
-
6
0
-
5
0
-
4
0
Summer
Day Index
D
e
g
r
e
e
s
B
e
lo
w
Z
e
r
o
F
.
0 100 200 300 400
-
7
0
-
5
0
-
3
0
-
1
0
Autumn
One approach to handle seasonality is to model seasons separately. This is not likely to be sucient
here, at least with only 4 seasons.
Statistics of Extremes January 2008 slide 102
Example: Wooster Temperature Series
Year
D
a
ily
M
in
im
u
m
T
e
m
p
e
r
a
t
u
r
e
(
D
e
g
r
e
e
s
B
e
lo
w
0
F
.
)
1983 1984 1985 1986 1987 1988
-
6
0
-
4
0
-
2
0
0
2
0
Selection of a time-varying threshold for negated Wooster temperature series.
Statistics of Extremes January 2008 slide 103
Example: Wooster Temperature Series
Model p
1. Time-homogeneous 3 143.6
2. As 1. but periodic in 5 126.7
3. As 2. but periodic in log 7 143.6
4. As 3. but periodic in 9 145.9
5. As 3. plus linear trend in and log 9 145.1
6. As 3. with separate for each season 10 143.9
Likelihood considerations lead to Model 3 as preferable.
Statistics of Extremes January 2008 slide 104
44
Semiparametric regression
We may relax parametric assumptions, or mix them with more exible forms. For example, we may want to
model temperatures at sites in the Alps as dependent on altitude a(x), where x is location, plus a smooth
function of time t, giving
(x, t) =
0
+
1
a(x) +g(t), or (x, t) =
0
+
1
a(x) +gt, a(x).
Statistics of Extremes January 2008 slide 105
Formulation
A standard approach is then to include basis functions in the model, so
0
+
1
a(x) +g(t)
0
+
1
a(x) +
1
b
1
(t) + +
p
b
p
(t),
where the functions b
1
(t), . . . , b
p
(t) may be splines, polynomials, etc., depending on the properties sought.
Often take splines, which have optimality properties, and which correspond to prediction of stochastic
processes.
Usually penalise the smoothing parameters = (
1
, . . . ,
p
)
T
, for example, maximising a log likelihood of
form
(, ) +
T
K,
where smoothing parameter
0
.2
5
0
.2
0
0
.1
5
0
.1
0
0
.0
5
North Atlantic oscillation index
x
i
4 2 0 2
0
.2
5
0
.2
0
0
.1
5
0
.1
0
0
.0
5
Statistics of Extremes January 2008 slide 108
46
Swiss winter temperatures
Estimated 0.05 quantiles for winter temperatures at Rheinfelden (298m), Vattis (957m), and Arosa
(1821m), as a function of the North Atlantic oscillation index
3
0
2
5
2
0
1
5
1
0
5
Rheinfelden
3
0
2
5
2
0
1
5
1
0
5
Vattis
3
0
2
5
2
0
1
5
1
0
5
Arosa
Statistics of Extremes January 2008 slide 109
Issues with smoothing extremes
Care needed with choice of model: with GPD, change of threshold u u
changes scale
parameter:
u
u
=
u
+(u
u),
so, for example, the formulation
(
u
, ) = (expg(t), h(x))
at threshold u will become
(
u
, ) =
_
expg(t) +h(x)(u u
), h(x)
_
,
at threshold u
n
=
_
max
i=1,...,n
X
i
/n, max
i=1,...,n
Y
i
/n
_
the marginal distributions of M
n
are standard Frechet for all n. Remaining questions concern
dependence of limit distribution only.
Statistics of Extremes January 2008 slide 114
48
Limit distribution of componentwise maxima
Let M
n
= (M
x,n
, M
y,n
) be componentwise maxima of independent vectors with standard Frechet marginal
distributions. Then if
Pr{M
x,n
x, M
y,n
y}
d
G(x, y),
where G is a non-degenerate distribution function, G has the form
G(x, y) = exp{V (x, y)}, x > 0, y > 0
where
V (x, y) = 2
Z
1
0
max
w
x
,
1w
y
dH(w),
and H is a distribution function on [0, 1] satisfying the mean constraint
Z
1
0
wdH(w) = 1/2.
If H is dierentiable with density h, then
V (x, y) = 2
Z
1
0
max
w
x
,
1w
y
h(w)dw.
Statistics of Extremes January 2008 slide 115
Special cases
Independence: When H is a measure with masses 0.5 on w = 0 and w = 1
G(x, y) = exp(x
1
+y
1
), x > 0, y > 0.
Perfect dependence: When H is a measure that places unit mass on w = 0.5
G(x, y) = expmax(x
1
, y
1
), x > 0, y > 0,
which is the distribution function of variables that are marginally standard Frechet, but which are
perfectly dependent: X = Y with probability 1.
Statistics of Extremes January 2008 slide 116
Parametric models
For modelling purposes, it is usual to specify a parametric family for H or h that encompasses a
wide range of dependence types over the parametric domain.
Standard example is the logistic model, with
h(w) =
1
2
(
1
1)w(1 w)
11/
w
1/
+ (1 w)
1/
2
for 0 < < 1.
In this case
G(x, y) = exp
_
_
x
1/
+y
1/
_
_
, x > 0, y > 0.
Independence and perfect dependence arise as limits as 1 and 0 respectively.
Statistics of Extremes January 2008 slide 117
49
Alternative models
A limitation of the logistic model is its symmetry. Asymmetric alternatives include the
bilogistic model
h(w) =
1
2
(1 )(1 w)
1
w
2
(1 u)u
1
(1 u) +u
1
on 0 < w < 1, where 0 < < 1 and 0 < < 1, and u = u(w, , ) is the solution of
(1 )(1 w)(1 u)
(1 )wu
= 0;
and the Dirichlet model
h(w) =
( + + 1)(w)
1
(1 w)
1
2()()w +(1 w)
++1
, 0 < w < 1,
for parameters > 0 and > 0.
Statistics of Extremes January 2008 slide 118
Inference for multivariate extreme value models
Inference consists of the following steps:
1. estimation of marginal distributions and transformation to standard Frechet;
2. choice of the dependence model H;
3. estimation of the parameters of H by maximum likelihood;
4. model assessment.
There is a potential gain in eciency by estimating marginal and dependence parameters in a single
likelihood maximization.
Statistics of Extremes January 2008 slide 119
Structure variables
Though processes may be multivariate, it may be that a univariate functiona so-called
structure variableis the quantity of interest.
This suggests two possible methods of analysis:
1. univariate extreme value analysis of the sturcture variable process; or
2. multivariate analysis of the full process, followed by marginalization to the structure variable
of interest.
Statistics of Extremes January 2008 slide 120
50
Inference for structure variables
Univariate inference is trivial.
For multivariate inference, if Z = (M
x
, M
y
) is the structure variable of interest, and
M = (M
x
, M
y
) has density g, then
PrZ z =
_
Az
g(x, y)dxdy,
where A
z
= (x, y) : (x, y) z.
Sometimes integration can be avoided. For example, if Z = maxM
x
, M
y
,
PrZ z = PrM
x
z, M
y
z = G(z, z),
where G is the joint distribution function of M.
Statistics of Extremes January 2008 slide 121
Example: Sea levels
Freemantle Annual Maximum Sea-level (m)
P
o
r
t
P
ir
ie
A
n
n
u
a
l
M
a
x
im
u
m
S
e
a
-
le
v
e
l
(
m
)
1.3 1.4 1.5 1.6 1.7 1.8 1.9
3
.
6
3
.
8
4
.
0
4
.
2
4
.
4
4
.
6
Annual maximum sea levels at Port Pirie and Fremantle. Logistic = 0.922 (0.087), implying
near-independence.
Statistics of Extremes January 2008 slide 122
51
Analysis of structure variable
Return Period (Years)
R
e
t
u
r
n
L
e
v
e
l
(
m
)
1 5 10 50 100 500 1000
1
.
0
1
.
5
2
.
0
2
.
5
3
.
0
Return levels of structure variable Z = maxM
x
, (M
y
2.5) using both univariate and multivariate
methods.
Statistics of Extremes January 2008 slide 123
Impact of dependence on structure variables I
Return Period (Years)
R
e
t
u
r
n
L
e
v
e
l
(
m
)
1 5 10 50 100 500 1000
1
.
6
1
.
8
2
.
0
2
.
2
2
.
4
Z = maxM
x
, (M
y
2.5) in logistic model analysis of Fremantle and Port Pirie annual maximum
sea-level series with = 0, 0.25, 0.5, 0.75, 1 ( = 0 is lowest curve).
Statistics of Extremes January 2008 slide 124
52
Impact of dependence on structure variables II
Return Period (Years)
R
e
t
u
r
n
L
e
v
e
l
(
m
)
1 5 10 50 100 500 1000
1
.
2
1
.
4
1
.
6
1
.
8
2
.
0
Z = minM
x
, (M
y
2.5) in logistic model analysis of Fremantle and Port Pirie annual maximum
sea-level series with = 0, 0.25, 0.5, 0.75, 1 ( = 0 is highest curve).
Statistics of Extremes January 2008 slide 125
Point process representation
As in the univariate case, there are threshold exceedance and point process characterizations of
extremes that enable a greater use of available information.
The point process representation, in particular, also provides some insight into the measure
function H that appears in the componentwise limit law.
Statistics of Extremes January 2008 slide 126
Point process construction
(X
1
, Y
1
), (X
2
, Y
2
) . . . a sequence of independent variables with standard Frechet margins.
Assume componentwise maxima convergence
PrM
x,n
x, M
y,n
y G(x, y).
Dene sequence of point processes N
n
by
N
n
= (n
1
X
1
, n
1
Y
1
), . . . , (n
1
X
n
, n
1
Y
n
).
Normalization by n of standard Frechet variables is required to obtain marginal convergence.
Statistics of Extremes January 2008 slide 127
53
Point process limit
On regions bounded from the origin (0, 0), we have
N
n
d
N,
where N is a non-homogeneous Poisson process on (0, ) (0, ).
Moreover, in terms of pseudo-polar coordinates
r = x +y and w =
x
x+y
,
the intensity function of the limiting process N is
(r, w) = 2
dH(w)
r
2
.
The function H is related to G by
G(x, y) = exp2
_
1
0
max
_
w
x
,
1w
y
_
dH(w)
Statistics of Extremes January 2008 slide 128
Comments
The intensity function of the limit process factorizes across r and w. Loosely, the relative
magnitude of extreme events in the two variables is independent of the magnitude itself.
The limit provides an interpretation of H as the distribution of relative magnitudes of extreme
events between the two variables.
In practiceas usualthe limit process is taken to be exact for extreme enough events: in this
case for events which are suciently far from (0, 0).
Statistics of Extremes January 2008 slide 129
Consistency of results
PrM
x,n
x, M
y,n
y = PrN
n
(A) = 0,
where
A = (0, ) (0, )(0, x) (0, y).
So,
PrM
x,n
x, M
y,n
y PrN(A) = 0 = exp(A),
where
(A) =
_
A
2
dr
r
2
dH(w)
=
_
1
w=0
_
r=min{x/w,y/(1w)}
2
dr
r
2
dH(w)
= 2
_
1
w=0
max
_
w
x
,
1w
y
_
dH(w).
Statistics of Extremes January 2008 slide 130
54
Likelihood for Poisson process model
Choosing A = (x, y) : x/n +y/n > r
0
, for large r
0
,
(A) = 2
_
A
dr
r
2
dH(w) = 2
_
r=r
0
dr
r
2
_
1
w=0
dH(w) = 2/r
0
,
which is constant with respect to the parameters of H.
Hence, assuming H has density h,
L(; (x
1
, y
1
), . . . , (x
n
, y
n
)) = exp(A)
N
A
i=1
(x
(i)
/n, y
(i)
/n)
N
A
i=1
h(w
i
),
where w
i
= x
(i)
/(x
(i)
+y
(i)
) for the N
A
points (x
(i)
, y
(i)
) falling in A.
Statistics of Extremes January 2008 slide 131
Comments
Choice of threshold r
0
is based, as usual, on empirical measures of model stability.
Alternative forms of threshold curve lead to more complicated likelihood expressions.
Simultaneous estimation of marginal and dependence features is possibly advantageous, but again
results in a considerably more complicated likelihood.
Statistics of Extremes January 2008 slide 132
Example: Exchange rate data
Year
L
o
g
-
d
a
ily
r
e
tu
r
n
1997 1998 1999 2000 2001
-
0
.0
2
0
.0
0
.0
2
Year
L
o
g
-
d
a
ily
r
e
tu
r
n
1997 1998 1999 2000 2001
-
0
.0
2
0
.0
0
.0
2
Log-daily returns of exchange rates. Top panel: UK sterling/US dollar exchange rate. Bottom panel:
UK sterling/Canadian dollar exchange rate.
Statistics of Extremes January 2008 slide 133
55
Example: Exchange rate data
Sterling v. US Dollars
S
t
e
r
lin
g
v
.
C
a
n
a
d
ia
n
D
o
lla
r
s
-0.02 -0.01 0.0 0.01 0.02
-
0
.
0
2
-
0
.
0
1
0
.
0
0
.
0
1
0
.
0
2
Concurrent values of exchange rates.
Statistics of Extremes January 2008 slide 134
Example: Exchange rate data
Standardized US/UK Daily Returns
S
t
a
n
d
a
r
d
iz
e
d
C
a
n
a
d
a
/
U
K
D
a
ily
R
e
t
u
r
n
s
1 10 100 1000
1
1
0
1
0
0
1
0
0
0
Concurrent values of exchange rates after transformation to Frechet scale.
Statistics of Extremes January 2008 slide 135
56
Example: Exchange rate data
0.0 0.2 0.4 0.6 0.8 1.0
0
.
0
0
.
5
1
.
0
1
.
5
2
.
0
w
R
e
la
t
iv
e
F
r
e
q
u
e
n
c
y
Fitted logistic model density h in point process analysis of exchange rate data. ( = 0.434 (0.025))
Statistics of Extremes January 2008 slide 136
Example: Oceanographic data
Wave Height (m)
S
u
r
g
e
(
m
)
0 2 4 6 8 10
-
0
.
2
0
.
0
0
.
2
0
.
4
0
.
6
0
.
8
Simultaneous values of wave and surge height.
Statistics of Extremes January 2008 slide 137
57
Example: Oceanographic data
0.0 0.2 0.4 0.6 0.8 1.0
0
.
0
0
.
5
1
.
0
1
.
5
2
.
0
w
R
e
la
t
iv
e
F
r
e
q
u
e
n
c
y
Various tted models h in point process analysis of wave-surge data.
Statistics of Extremes January 2008 slide 138
Example: Oceanographic data
Model
Logistic 227.2 0.659
Bilogistic 230.2 0.704 0.603
Dirichlet 238.2 0.852 0.502
Details of various ts to oceanographic data give support for asymmetric dependence structure.
Statistics of Extremes January 2008 slide 139
Higher-dimensional models
Limit results and characterizations have analogous forms in higher dimensions.
Extremal dependence is chracterized by a dependence function H on the simplex
s
d
= {(w1, . . . , w
d
) : wj 0, j = 1, . . . , d,
d
X
j=1
wj = 1}
subject to
Z
S
d
wjdH(w) = 1, j = 1, . . . , d
In principle, inference techniques remain the same, but
1. model specication is more dicult;
2. likelihood calculation is more dicult;
3. reliability of asymptotic results is more questionable.
Statistics of Extremes January 2008 slide 140
58
Asymptotic dependence
A limitation of multivariate extreme value models (and point process equivalents) is that apart from
the degenerate case of independence, all such models are asymptotically dependent; the intuition is
that because these are asymptotic (limiting) models, the extent of dependence cannot vary with the
rareness of the event, whereas this happens in practice.
Two variables X and Y with the same distribution are said to be asymptotically independent if
lim
z
PrY > z [ X > z = 0,
otherwise they are asymptotically dependent.
Loosely, two variables are asymptotically independent if an extreme value in one has zero
probability of happening on the occurrence of an extreme in the other variable.
Statistics of Extremes January 2008 slide 141
Bivariate normal variables
The class of asymptotically independent variables is non-trivial, and includes, for example, all
bivariate normal variables with positive correlation.
Using extreme value models for such distributions is misleading.
The limit is independence, but this provides a poor approximation to dependence at nite
levels.
A tted model at any specic threshold will overestimate strength of dependence on
exrapolation.
Much recent work has looked at inference for asymptotic independent models: Ledford and Tawn,
Heernan, Coles and Tawn, Heernan and Tawn.
Statistics of Extremes January 2008 slide 142
59
Spatial Extremes slide 143
Spatiotemporal extremes
Many environmental extremal problems are spatial or temporal in nature, or both:
heatwaves
rainfall
avalanches
forest res
storms
sea levels
Likely to be more important in the future, prediction (on dierent scales) needed for long-range
planning, short-range evacuation, . . .
Statistics of Extremes January 2008 slide 144
Geostatistics slide 145
Geostatistics
Statistics of spatially-dened variables
Mostly a multivariate normal theory:
remove trends in mean and dispersion in space and time
transform residuals to standard normal margins
t suitable spatial/space-time correlation functions
inferences using WLS, (likelihood), or Bayes (McMC)
More generally, set up model for response variable Y (x) with x A:
Y (x) [ S(x)
ind
f(y; ), S(x) N(x), (x),
and use MetropolisHastings algorithm for inference on S(x), predictions of future Y , etc.
(Diggle, Tawn, Moyeed, 1998), lots of Bayesians
Statistics of Extremes January 2008 slide 145
60
Example: Temperature data
Statistics of Extremes January 2008 slide 146
Example: Temperature data
Lugano (273 m)
BaselBinningen (316 m)
LocarnoMonti (366 m)
MontreuxClarens (405 m)
OeschbergKoppigen (483 m)
Neuchatel (485 m)
Bad Ragaz (496 m)
ZurichMeteoSchweiz (556 m)
BernLiebefeld (565 m)
Chateau d'Oex (985 m)
Engelberg (1035 m)
Montana (1508 m)
DavosDorf (1590 m)
Arosa (1840 m)
GdStBernard (2472 m)
Santis (2490 m)
Jungfraujoch (3580 m)
2001 2002 2003 2004 2005
T
e
m
p
e
r
a
tu
r
e
a
n
o
m
a
ly
(
d
e
g
r
e
e
s
C
e
ls
iu
s
)
Maximum temperature: June, July, August, 20012005
Statistics of Extremes January 2008 slide 147
61
Geostatistics of extremes
Basic setup:
want to model extremes of process Y (x, t), (x, t) A T R
3
time series (maybe intermittent)could be annual maxima, or daily values, or . . .
data available at
sites x
d
A
D
= x
1
, . . . , x
D
A
times T
d
= t
d,1
, . . . , t
d,n
d
, for d 1, . . . , D
exposition simplied if T
d
T
= t
1
, . . . , t
n
d=1
E
Y
d
Pr(Y
d
(
d
[ Y
d
)f(Y
d
) ,
where ( =
d
(
d
and (
d
(
c
= , for c ,= d, with
(
d
= ( y R
D
: F
Y
d
(y
d
) > F
Yc
(y
c
), c = 1, . . . , D, c ,= d
Use GPD for Y
d
and nonparametric linear model for conditional probability, and simulate to estimate
Pr(Y ().
Advantages: can deal with large D, extends to near-independence models
Disadvantages: models on dierent (
d
need not be coherent, uniqueness of representation for conditional
probability unclear, inference messy
Statistics of Extremes January 2008 slide 151
Max-stable processes
General setup:
Z(x) Frechet(1), for x A, so if the Z
m
() are independent and identically distributed, then
m
1
maxZ
1
(x), . . . , Z
m
(x) Frechet(1)
Joint distribution at any subset T = x
1
, . . . , x
d
of sites has multivariate extreme-value
distribution, and extremal coecient
D
=
_
S
D
max
dD
w
d
dH(w
1
, . . . , w
D
),
where D = [T[, o
D
is the unit simplex in R
D
, and H is a measure on o
D
with all its marginal
expectations equal to 1; we have 1
D
D.
Statistics of Extremes January 2008 slide 152
Extremal coecient
Summary of dependence in subset of variables
If Z
1
, . . . , Z
d
all marginally unit Frechet, and joint distribution is max-stable, then
PrZ
1
z, . . . , Z
d
z = exp(
D
/z), z > 0,
where extremal coecient
D
depends on subset.
Interpretation: 1
D
d, where
= 1 corresponds to complete dependence of maxima in subset,
= d corresponds to independence of maxima in subset
Not a complete summary of joint distribution, but basis of useful diagnostics, as can estimate
D
quite
easily.
Rainfall example (sketch!)
Statistics of Extremes January 2008 slide 153
63
Madogram
Exploratory tool (analogous to variogram), designed for dealing with many extremes on a map
F-madogram (Poncet, Cooley, Naveau, 2006/7?)
1
2
E[[FY (x +h) FY (x)[] =
1
2
(h) 1
(h) + 1
,
where (h) is pairwise extremal coecient at range h
Extended by Naveau et al. (2006) to estimation of dependence function V
h
in expression
Pr Y (x) y
1
, Y (x +h) y
2
= exp V
h
(y
1
, y
2
) ,
where
V
h
(y
1
, y
2
) = 2
_
1
0
max
_
w
y
1
,
1 w
y
2
_
dH
h
(w).
Statistics of Extremes January 2008 slide 154
Spectral representations I
de Haan (1984):
A continuous (in probability) max-stable process Z(x) may be expressed as
Z(x)
D
= max
k
U
k
f(x T
k
),
where (U
k
, T
k
) are the points of a Poisson process on R
+
[0, 1] with measure du/u
2
(dt),
where is a nite measure on [0, 1] and f
k
are non-negative L
1
functions.
Interpretation in terms of storms of sizes U
k
and shape f centred at T
k
Exploited by Smith (1990), Coles (1993), Coles and Walshaw (1994), de Haan and Pereira (2006)
with various choices of f (normal, t, Laplace, circular)
Can compute joint density of Z(x
1
), Z(x
2
), but no further
Statistics of Extremes January 2008 slide 155
64
Spectral representations II
Schlather (2002):
Let V (x) be a stationary process on R
d
with = Emax0, V (x) < , and let be a Poisson process on
(0, ) with intensity
1
ds/s
2
. If the V
s
(x) are independent copies of V (x), then
Z(x) = max
s
sV
s
(x),
is a stationary max-stable random process with unit Frechet margins.
Interpretation in terms of maxima of random sheets
Example: V
s
() stationary istropic Gaussian processes with correlation (h), then
log Pr{Z(x1) z1, Z(x2) z2} =
1
2
1
z1
+
1
z2
1 +
1 2
{(h) + 1}z1z2
(z1 +z2)
2
1/2
!
Corresponding extremal coecient (h) = 1 + 2
1/2
1 (h)
1/2
has natural bounds (Schlather and
Tawn, 2003)
Can only represent positive dependencebut most likely in practice
Statistics of Extremes January 2008 slide 156
Pairwise likelihood
Data Z1, . . . , Zn are from (unavailable) full joint density f(z1, . . . , zn; ), yielding log likelihood
() = log f(z1, . . . , zn; ).
If lower-order marginal densities are available, construct composite likelihood by taking product of
(non-independent!) densities.
Simplest example is pairwise log likelihood
2() =
X
i>j
log f(zi, zj; ),
constructed from all distinct disjoint pairs of observations.
If is identiable from the pairwise marginal densities, and if Z
1
, . . . , Z
n
iid
f(z; ), with Z
j
= (Z
j
1
, . . . , Z
j
D
), then
under mild regularity conditions the maximum pairwise likelihood estimator
is consistent, and satises
N
n
, J(
)
1
K(
)J(
)
1
o
as n .
Statistics of Extremes January 2008 slide 157
65
Example: Temperature data
Statistics of Extremes January 2008 slide 158
Swiss summer temperature data
For illustration:
annual maximum temperature data at D = 17 Swiss sites, 19612006
t GEV to standardised values Y (x
d
, t
j
) with
dtj
=
d0
+
d1
t
j
,
d
,
d
, d = 1, . . . , 17, j = 1, . . . , 46,
and obtain Z
dtj
iid
Frechet(1) using estimated probability integral transform
estimate surfaces for
0
,
1
, ,
using splines
use likelihood estimation to obtain
ij
for each pair of sites (model checking)
use pairwise likelihood to t stationary isotropic covariance function
(u) = (1
1
) +
1
exp(u/
2
)
3
, 0
1
1,
2
,
3
> 0,
with estimated values
1
= 0.62 (0.2),
2
= 360 (100)km,
3
= 1.59 (1.29)
simulate max-stable random elds Z
(x) and
assess . . .
Statistics of Extremes January 2008 slide 159
66
Fit of marginal model
0.5 2.0 10.0 50.0
0
.
5
2
.
0
1
0
.
0
5
0
.
0
Station Arosa
0.5 2.0 10.0 50.0
0
.
5
2
.
0
1
0
.
0
5
0
.
0
Station Bad Ragaz
0.2 1.0 5.0 50.0
0
.
5
2
.
0
1
0
.
0
5
0
.
0
Station BaselBinningen
0.2 1.0 5.0 50.0
0
.
5
2
.
0
1
0
.
0
5
0
.
0
Station BernLiebefeld
0.2 1.0 5.0 50.0
0
.
5
2
.
0
1
0
.
0
5
0
.
0
Station Chateau dOex
0.2 1.0 5.0 50.0
0
.
5
2
.
0
1
0
.
0
5
0
.
0
Station DavosDorf
0.2 1.0 5.0 50.0
0
.
5
2
.
0
1
0
.
0
5
0
.
0
Station Engelberg
0.2 1.0 5.0 50.0
0
.
5
2
.
0
1
0
.
0
5
0
.
0
Station GdStBernard
0.5 2.0 10.0 50.0
0
.
5
2
.
0
1
0
.
0
5
0
.
0
Station Jungfraujoch
0.2 1.0 5.0 50.0
0
.
5
2
.
0
1
0
.
0
5
0
.
0
Station LocarnoMonti
0.2 1.0 5.0 50.0
0
.
5
2
.
0
1
0
.
0
5
0
.
0
Station Lugano
0.5 5.0 50.0
0
.
5
2
.
0
1
0
.
0
5
0
.
0
Station Montana
0.2 1.0 5.0 50.0
0
.
5
2
.
0
1
0
.
0
5
0
.
0
Station MontreuxClarens
0.2 1.0 5.0 50.0
0
.
5
2
.
0
1
0
.
0
5
0
.
0
Station Neuchatel
0.2 1.0 5.0 50.0
0
.
5
2
.
0
1
0
.
0
5
0
.
0
Station OeschbergKoppigen
0.5 2.0 10.0 50.0
0
.
5
2
.
0
1
0
.
0
5
0
.
0
Station Satis
0.5 5.0 50.0
0
.
5
2
.
0
1
0
.
0
5
0
.
0
Station ZrichMeteoSchweiz
Statistics of Extremes January 2008 slide 160
Estimated correlations
Individual points: MLEs of correlations 2 SE (grey bars) as a function of distance (10
3
km) between pairs of
points
Solid line: tted correlation estimated using pairwise likelihood
Left: temperature data; right: simulated data
0.0 0.2 0.4 0.6 0.8 1.0
1
.0
0
.5
0
.0
0
.5
1
.0
Schlather model coefficients (red points) and their confidence intervals (gray error bars) and Weibull model curve vs. distance.
Distance between two stations
C
o
e
ffic
ie
n
t e
s
tim
a
te
d
fr
o
m
S
c
h
la
th
e
r
m
o
d
e
l
0.0 0.2 0.4 0.6 0.8 1.0
1
.0
0
.5
0
.0
0
.5
1
.0
Distance between two stations
C
o
e
ffic
ie
n
t e
s
tim
a
te
d
fr
o
m
S
c
h
la
th
e
r
m
o
d
e
l
Statistics of Extremes January 2008 slide 161
67
Fit of correlation curve
Normal QQ plot of g(
ij
) g(
ij
)/SE, for Fisher z transform g
2 1 0 1 2
1
0
1
2
3
Normal QQ Plot
Theoretical Quantiles
S
a
m
p
l
e
Q
u
a
n
t
i
l
e
s
Statistics of Extremes January 2008 slide 162
68
P a i r w i s e t s
0.1 0.5 5.0 50.0
0
.
1
0
.
5
5
.
0
5
0
.
0
pair(1,2) altitudes=(1840,496)
0.1 0.5 5.0 50.0
0
.
1
0
.
5
5
.
0
5
0
.
0
pair(1,3) altitudes=(1840,316)
0.1 0.5 5.0 50.0
0
.
1
0
.
5
5
.
0
5
0
.
0
pair(1,4) altitudes=(1840,565)
0.1 0.5 5.0 50.0
0
.
1
0
.
5
5
.
0
5
0
.
0
pair(1,5) altitudes=(1840,985)
0.1 0.5 5.0 50.0
0
.
1
0
.
5
5
.
0
5
0
.
0
pair(1,6) altitudes=(1840,1590)
0.1 0.5 5.0 50.0
0
.
1
0
.
5
5
.
0
5
0
.
0
pair(1,7) altitudes=(1840,1035)
0.1 0.5 5.0 50.0
0
.
1
0
.
5
5
.
0
5
0
.
0
pair(1,8) altitudes=(1840,2472)
0.1 0.5 5.0 50.0
0
.
1
0
.
5
5
.
0
5
0
.
0
pair(1,9) altitudes=(1840,3580)
0.1 0.5 5.0 50.0
0
.
1
0
.
5
5
.
0
5
0
.
0
pair(1,10) altitudes=(1840,366)
0.1 0.5 5.0 50.0
0
.
1
0
.
5
5
.
0
5
0
.
0
pair(1,11) altitudes=(1840,273)
0.1 0.5 5.0 50.0
0
.
1
0
.
5
5
.
0
5
0
.
0
pair(1,12) altitudes=(1840,1508)
0.1 0.5 5.0 50.0
0
.
1
0
.
5
5
.
0
5
0
.
0
pair(1,13) altitudes=(1840,405)
0.1 0.5 5.0 50.0
0
.
1
0
.
5
5
.
0
5
0
.
0
pair(1,14) altitudes=(1840,485)
0.1 0.5 5.0 50.0
0
.
1
0
.
5
5
.
0
5
0
.
0
pair(1,15) altitudes=(1840,483)
0.1 0.5 5.0 50.0
0
.
1
0
.
5
5
.
0
5
0
.
0
pair(1,16) altitudes=(1840,2490)
0.1 0.5 5.0 50.0
0
.
1
0
.
5
5
.
0
5
0
.
0
pair(1,17) altitudes=(1840,556)
S t a t i s t i c s o f E x t r e m e s J a n u a r y 2 0 0 8 s l i d e 1 6 3
S i m u l a t e d e l d s
F o r i l l u s t r a t i o n , f o l l o w i n g p a g e s s h o w s i m u l a t e d e l d s y
( x , 2 0 0 7 ) o r d e r e d a c c o r d i n g t o v a l u e s o f
_
X
y
( x , 2 0 0 7 ) d x
S c a l e i s s t a n d a r d i z e d r e l a t i v e t o r o b u s t l o c a t i o n a n d s c a l e a t e a c h s i t e
S t a t i s t i c s o f E x t r e m e s J a n u a r y 2 0 0 8 s l i d e 1 6 4
6 9
0
1
2
3
4
5
6 7 8 9 10
45.5
46.0
46.5
47.0
47.5
48.0
500th hottest year out of 1000 realisations for 2007. 500th hottest year out of 1000 realisations for 2007.
6 7 8 9 10
4
5
.5
4
6
.0
4
6
.5
4
7
.0
4
7
.5
4
8
.0
S t a t i s t i c s o f E x t r e m e s J a n u a r y 2 0 0 8 s l i d e 1 6 5
0
1
2
3
4
5
6 7 8 9 10
45.5
46.0
46.5
47.0
47.5
48.0
898th hottest year out of 1000 realisations for 2007. 898th hottest year out of 1000 realisations for 2007.
6 7 8 9 10
4
5
.5
4
6
.0
4
6
.5
4
7
.0
4
7
.5
4
8
.0
S t a t i s t i c s o f E x t r e m e s J a n u a r y 2 0 0 8 s l i d e 1 6 6
0
1
2
3
4
5
6 7 8 9 10
45.5
46.0
46.5
47.0
47.5
48.0
899th hottest year out of 1000 realisations for 2007. 899th hottest year out of 1000 realisations for 2007.
6 7 8 9 10
4
5
.5
4
6
.0
4
6
.5
4
7
.0
4
7
.5
4
8
.0
S t a t i s t i c s o f E x t r e m e s J a n u a r y 2 0 0 8 s l i d e 1 6 7
D i s c u s s i o n
S c h l a t h e r r e p r e s e n t a t i o n i s v e r y g e n e r a l , c a n e n v i s a g e f o r o t h e r t y p e s o f d a t a :
r a i n f a l l t i m e s e r i e s a t a s i n g l e s i t e , u s i n g a d d i t i o n o f a r a n d o m s e t ( M e h d i )
r a i n f a l l i n s p a c e a n d t i m e ?
s n o w f a l l i n s p a c e ( J u l i e t t e ? )
e t c .
M o d e l l i n g i s s u e s :
W h a t p r o c e s s e s a r e s e n s i b l e ? G a u s s i a n r a n d o m e l d s ? L e v y p r o c e s s e s ?
H o w t o e x t e n d t o n e a r - i n d e p e n d e n c e c a s e s ?
H o w t o b u i l d i n c o v a r i a t e s ? S m o o t h i n g ? P r e - w h i t e n i n g o r n o t ?
H o w b e s t t o p e r f o r m e s t i m a t i o n a n d t e s t i n g ?
B a y e s i a n i n f e r e n c e ?
H o w t o b u i l d i n p h y s i c a l k n o w l e d g e o f u n d e r l y i n g p r o c e s s e s ?
S t a t i s t i c s o f E x t r e m e s J a n u a r y 2 0 0 8 s l i d e 1 6 8
7 0