Board of the Foundation of the Scandinavian Journal of Statistics 2004. Published by Blackwell Publishing Ltd, 9600 Garsington
Road, Oxford OX4 2DQ, UK and 350 Main Street, Malden, MA 02148, USA Vol 31: 417–429, 2004
Inference for Observations of Integrated Diffusion Processes
SUSANNE DITLEVSEN and MICHAEL SØRENSEN
University of Copenhagen
ABSTRACT. Estimation of parameters in diffusion models is investigated when the observations are integrals over intervals of the process with respect to some weight function. Observations of this type can, for example, be obtained when the process is observed after passage through an electronic filter. Another example is provided by the ice-core data on oxygen isotopes used to investigate paleo-temperatures. Finally, such data play a role in connection with the stochastic volatility models of finance. The integrated process is not a Markov process. Therefore, prediction-based estimating functions are applied to estimate parameters in the underlying diffusion model. The estimators are shown to be consistent and asymptotically normal. The theory developed in the paper also applies to integrals of processes other than diffusions. The method is applied to inference based on integrated data from Ornstein–Uhlenbeck processes and from the Cox–Ingersoll–Ross model, for both of which an explicit optimal estimating function is found.
Key words: asymptotic normality, consistency, Cox–Ingersoll–Ross model, estimating equation,
ice-core data, non-Markovian process, Ornstein–Uhlenbeck process, prediction-based estimating functions, quasi-likelihood, stochastic differential equation
1. Introduction
In the present paper, we study statistical inference for observations of integrated diffusions. In
several cases, a sample of observations at discrete time points of a diffusion process is not
available but, for example, a realization of the process has been observed after passage
through an electronic filter. Another example is provided by the ice-core records from
Greenland. The isotope ratio ¹⁸O/¹⁶O in the ice, measured as an average in pieces of ice, each piece representing a time interval with time increasing as a function of the depth, is a proxy for paleo-temperatures. The variation of the paleo-temperature can be modelled by a stochastic differential equation, and it is natural to model the ice-core data as an integrated diffusion process, see Ditlevsen et al. (2002). Integrated processes also play an important role in connection with the so-called realized stochastic volatility in finance, see Andersen & Bollerslev (1998), Genon-Catalot et al. (1999), Gloter (1999b), Sørensen (2000), Barndorff-Nielsen & Shephard (2001) and Andersen et al. (2001).
Martingale estimating functions are a useful tool for statistical inference based on discretely
sampled diffusions, see e.g. Bibby & Sørensen (1995, 1997, 2001), Pedersen (2000), and Sørensen
(1997) and references therein. For such data the likelihood function is usually not explicitly
available and can only be found numerically or be otherwise approximated. Integrated diffusion
processes, however, are not Markov processes, for which reason there are no natural or easily
calculated martingales on which to base a class of estimating functions. Moreover, the likelihood
function is not even numerically tractable for such models. Therefore, we will apply the prediction-based estimating functions that were introduced in Sørensen (2000) as a tool for drawing
statistical inference about non-Markovian models and in other situations where no martingale is
readily available. These estimating functions are generalizations of the martingale estimating
functions. It is shown that the method of prediction-based estimating functions under mild
regularity conditions provides a satisfactory solution to the inference problem investigated here.
The conditions ensure existence, consistency and asymptotic normality of the estimators.
Other approaches to inference for integrated diffusions were presented in Gloter (1999a, 2000). Gloter (2000) considers the integrated Ornstein–Uhlenbeck process and compares the Whittle estimator, which in this case is efficient, with Rydén's split-data maximum pseudo-likelihood estimator, see Rydén (1994). In Gloter (1999a) minimum contrast estimators are considered that are consistent when the length of the sampling interval goes to zero as the number of observations goes to infinity.
In Section 2 we present the model of integrated diffusions, briefly review prediction-based estimating functions and apply them to the inference problem. We derive a way of finding the necessary moments of the integrated process and, as examples, find prediction-based estimating functions for the integrated Ornstein–Uhlenbeck process and for the integrated Cox–Ingersoll–Ross (CIR) model.
In Section 3 the optimal prediction-based estimating function is derived, and we show how to compute moments of order j of the integrated process, j being a non-negative integer, provided that these moments exist. A formula is given so that, if an analytic expression for the moments of the underlying process is known or these moments can be simulated, the calculation of the moments needed to find the optimal prediction-based estimating function is easily programmable. The optimal prediction-based estimating functions for the two examples are discussed; in both examples they turn out to be explicit.
In Section 4, asymptotic results about the estimating functions and their estimators are
proved under weak regularity conditions using the mixing properties of the underlying diffusion process.
It is worth pointing out that the theory developed in this paper applies immediately to
integrals of processes other than diffusions too. For the asymptotic theory to hold, it is enough
that the process that is integrated is sufficiently mixing.
2. Integrated diffusions and prediction-based estimating functions
Consider the one-dimensional diffusion

$$dX_t = b(X_t;\theta)\,dt + \sigma(X_t;\theta)\,dW_t, \qquad X_0 \sim \mu_\theta,$$

where $\theta$ is an unknown $p$-dimensional parameter belonging to the parameter space $\Theta \subseteq \mathbb{R}^p$ and $W$ is a one-dimensional standard Wiener process. We assume that $X_0$ is independent of $W$, that the stochastic differential equation has a unique weak solution, and that $X$ is an ergodic, stationary diffusion with invariant measure $\mu_\theta$.
Suppose that a sample of observations at discrete time points is not available but, instead, a running integral of the process with respect to some weight function is available. Specifically, suppose the interval of observation $[0,T]$ is subdivided into $n$ smaller intervals of length $\Delta = T/n$, and let $\nu$ be a probability measure on the interval $[0,\Delta]$. We shall consider observations of the form

$$Y_i = \int_0^{\Delta} X_{(i-1)\Delta+s}\,d\nu(s), \qquad i = 1,\dots,n. \tag{1}$$

Typically, $\nu$ will have a density $u$ with respect to the Lebesgue measure on $[0,\Delta]$, in which case

$$Y_i = \int_{(i-1)\Delta}^{i\Delta} X_s\,u(s-(i-1)\Delta)\,ds, \qquad i = 1,\dots,n. \tag{2}$$

If our observations are obtained by integrating uniformly over the time axis, $\nu$ is simply the uniform distribution on $[0,\Delta]$ with $u = 1/\Delta$, and we get the simpler observations

$$Y_i = \frac{1}{\Delta}\int_{(i-1)\Delta}^{i\Delta} X_s\,ds.$$
Note that by stationarity, the law of the process X is invariant under time translations, which
easily implies that {Yi} is a stationary process.
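The observation schemes (1) and (2) can be mimicked numerically. The sketch below is our own illustration, not the authors' code: it simulates a stationary Ornstein–Uhlenbeck path on a fine grid using the exact AR(1) transition, and approximates each $Y_i$ by a left-endpoint Riemann sum with a discretised weight vector standing in for $\nu$. The function names, the grid and the discretisation of $\nu$ are illustrative assumptions.

```python
import math
import random


def simulate_ou_path(beta, sigma, dt, n_steps, rng):
    """Stationary OU path for dX = -beta X dt + sigma dW on a grid of spacing dt.

    Uses the exact transition X_{t+dt} = e^{-beta dt} X_t + Gaussian noise and
    starts from the invariant law N(0, sigma^2 / (2 beta)).
    """
    stat_sd = sigma / math.sqrt(2.0 * beta)
    phi = math.exp(-beta * dt)
    innov_sd = stat_sd * math.sqrt(1.0 - phi * phi)
    path = [rng.gauss(0.0, stat_sd)]
    for _ in range(n_steps):
        path.append(phi * path[-1] + rng.gauss(0.0, innov_sd))
    return path


def integrated_observations(path, steps_per_interval, weights=None):
    """Approximate Y_i = int_0^Delta X_{(i-1)Delta+s} nu(ds), i = 1..n.

    `weights` is a hypothetical discretisation of nu: one non-negative weight
    per grid point of an interval, summing to 1.  None means uniform weights,
    i.e. u = 1/Delta.
    """
    m = steps_per_interval
    w = weights if weights is not None else [1.0 / m] * m
    n = (len(path) - 1) // m
    return [sum(w[j] * path[i * m + j] for j in range(m)) for i in range(n)]


if __name__ == "__main__":
    rng = random.Random(2004)
    x = simulate_ou_path(beta=1.0, sigma=1.0, dt=0.01, n_steps=1000, rng=rng)
    y = integrated_observations(x, steps_per_interval=100)   # Delta = 1
    print(len(y), y[:3])
```

Replacing the uniform weights by a non-uniform vector mimics a filter with density $u$ as in (2).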
We solve the problem of estimating the parameter h in the underlying process X by applying
the method of prediction-based estimating functions introduced in Sørensen (2000). In the
following, we will briefly outline the method of prediction-based estimating functions. Assume
that $f_j$, $j = 1,\dots,N$, are one-dimensional functions such that $E_\theta(f_j(Y_i)^2) < \infty$ for all $\theta \in \Theta$. We denote the expectation when $\theta$ is the true parameter value by $E_\theta(\cdot)$. Let $h_{jk}$, $j = 1,\dots,N$, $k = 0,\dots,q_j$ ($q_j \ge 0$), be functions from $\mathbb{R}^r$ into $\mathbb{R}$, and define (for $i \ge r+1$) random variables by $Z^{(i-1)}_{jk} = h_{jk}(Y_{i-1}, Y_{i-2}, \dots, Y_{i-r})$. We assume that $E_\theta((Z^{(i-1)}_{jk})^2) < \infty$ for all $\theta \in \Theta$, and let $\mathcal{P}_{i-1,j}$ denote the subspace of the space of square integrable random variables spanned by $Z^{(i-1)}_{j0}, \dots, Z^{(i-1)}_{jq_j}$. Finally, we make the natural assumption that the functions $h_{j0}, \dots, h_{jq_j}$ are linearly independent. The space $\mathcal{P}_{i-1,j}$ can be interpreted as a set of predictors of $f_j(Y_i)$ based on $Y_{i-r}, \dots, Y_{i-1}$. We write the elements of $\mathcal{P}_{i-1,j}$ in the form $a^T Z^{(i-1)}_j$, where $a^T = (a_0, \dots, a_{q_j})$ and $Z^{(i-1)}_j = (Z^{(i-1)}_{j0}, \dots, Z^{(i-1)}_{jq_j})^T$ are $(q_j+1)$-dimensional vectors. We denote transposition by $T$. In the rest of this paper, $h_{j0} = 1$. We will study the estimating function

$$G_n(\theta) = \sum_{i=r+1}^{n} \sum_{j=1}^{N} \Pi^{(i-1)}_j(\theta)\left[f_j(Y_i) - \hat\pi^{(i-1)}_j(\theta)\right], \tag{3}$$

where $Y_i$ is of the form (2), $\Pi^{(i-1)}_j(\theta)$ is a $p$-dimensional stochastic vector, the coordinates of which belong to $\mathcal{P}_{i-1,j}$, and $\hat\pi^{(i-1)}_j(\theta)$ is the minimum mean square error predictor of $f_j(Y_i)$ in $\mathcal{P}_{i-1,j}$.
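When $\theta$ is the true parameter value, we define $C_j(\theta)$ as the covariance matrix of $(Z^{(r)}_{j1}, \dots, Z^{(r)}_{jq_j})^T$ and $b_j(\theta) = \left(\mathrm{Cov}_\theta(Z^{(r)}_{j1}, f_j(Y_{r+1})), \dots, \mathrm{Cov}_\theta(Z^{(r)}_{jq_j}, f_j(Y_{r+1}))\right)^T$. Then we have

$$\hat\pi^{(i-1)}_j(\theta) = \hat a_j(\theta)^T Z^{(i-1)}_j,$$

where $\hat a_j(\theta)^T = (\hat a_{j0}(\theta), \hat a_{j*}(\theta)^T)$ with

$$\hat a_{j*}(\theta) = C_j(\theta)^{-1} b_j(\theta) \tag{4}$$

and

$$\hat a_{j0}(\theta) = E_\theta(f_j(Y_1)) - \sum_{k=1}^{q_j} \hat a_{jk}(\theta)\,E_\theta(Z^{(r)}_{jk}). \tag{5}$$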
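As a worked instance of (4) and (5) (our own illustration, not part of the original derivation), take $N = 1$, $f_1(y) = y$, $r = q_1 = 1$ and $Z^{(i-1)}_{11} = Y_{i-1}$. Then $C_1(\theta) = \mathrm{Var}_\theta(Y_1)$ and $b_1(\theta) = \mathrm{Cov}_\theta(Y_1, Y_2)$ are scalars, and

$$\hat a_{11}(\theta) = \frac{\mathrm{Cov}_\theta(Y_1, Y_2)}{\mathrm{Var}_\theta(Y_1)}, \qquad \hat a_{10}(\theta) = E_\theta(Y_1)\bigl(1 - \hat a_{11}(\theta)\bigr), \qquad \hat\pi^{(i-1)}_1(\theta) = E_\theta(Y_1) + \hat a_{11}(\theta)\bigl(Y_{i-1} - E_\theta(Y_1)\bigr).$$

A predictor of exactly this shape, with $E_\theta(Y_1) = \alpha$, appears for the CIR model in Example 2.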
If, for instance, we take $f_j(y) = y^{m_j}$ and $Z^{(i-1)}_{jk} = Y^{m_j}_{i-k}$, $k = 1,\dots,r$, for some positive integers $m_j$, we need to calculate the moments $E_\theta(Y_1^{m_j})$ and $E_\theta((Y_1 Y_k)^{m_j})$ for $k = 1,\dots,r+1$. Once we have these moments, the vector of coefficients $\hat a_j$ can easily be found by means of the Durbin–Levinson algorithm, see Brockwell & Davis (1991). For many diffusions there exist $K > 0$ and $\lambda > 0$ such that $|\mathrm{Cov}_\theta(Y_1^m, Y_{r+1}^m)| \le K e^{-\lambda(r-1)}$, see Section 4. Therefore $r$ will usually not need to be chosen particularly large.
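A minimal Python sketch of the Durbin–Levinson recursion in its standard textbook form (see Brockwell & Davis, 1991). It takes the autocovariances $\gamma(h) = \mathrm{Cov}_\theta(Y_1^{m_j}, Y_{1+h}^{m_j})$, $h = 0,\dots,r$, and returns the coefficients $\hat a_{jk}$, $k = 1,\dots,r$. With $Z^{(i-1)}_{jk} = Y^{m_j}_{i-k}$, $C_j(\theta)$ is the Toeplitz matrix $\{\gamma(|k-l|)\}$ and $b_j(\theta) = (\gamma(1),\dots,\gamma(r))^T$, so the recursion solves (4) in $O(r^2)$ operations. A plain Gaussian-elimination solver is included only as a cross-check; both function names are ours.

```python
def durbin_levinson(gamma):
    """Durbin-Levinson recursion.

    gamma[h] is the autocovariance at lag h, h = 0..r.  Returns (phi, v), where
    phi[k-1] multiplies the (centred) observation k steps back in the best
    linear predictor of the next value, and v is the one-step mean square
    prediction error.
    """
    r = len(gamma) - 1
    phi = []
    v = gamma[0]
    for n in range(1, r + 1):
        acc = gamma[n] - sum(phi[j] * gamma[n - 1 - j] for j in range(n - 1))
        phi_nn = acc / v
        # phi_{n,j} = phi_{n-1,j} - phi_nn * phi_{n-1,n-j}, j = 1..n-1
        phi = [phi[j] - phi_nn * phi[n - 2 - j] for j in range(n - 1)] + [phi_nn]
        v *= 1.0 - phi_nn ** 2
    return phi, v


def solve_linear(a, b):
    """Gaussian elimination with partial pivoting (cross-check of (4))."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[piv] = m[piv], m[col]
        for i in range(col + 1, n):
            f = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= f * m[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


if __name__ == "__main__":
    gamma = [2.0, 0.9, 0.3, 0.1]                    # illustrative values only
    r = len(gamma) - 1
    toeplitz = [[gamma[abs(k - l)] for l in range(r)] for k in range(r)]
    print(durbin_levinson(gamma)[0])
    print(solve_linear(toeplitz, gamma[1:]))         # same coefficients
```

The intercept $\hat a_{j0}$ then follows from (5) once $E_\theta(Y_1^{m_j})$ is known.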
The choice $f_1(y) = y$ and $f_2(y) = y^2$ with $Z^{(i-1)}_{jk} = Y_{i-k}$, $k = 1,\dots,r$, $j = 1, 2$, and $Z^{(i-1)}_{2k} = Y^2_{i+r-k}$, $k = r+1,\dots,2r$, will presumably be reasonable in many cases. In this case the minimum mean square error predictor of $f_1(Y_i)$ can be found as described above, while the predictor of $f_2(Y_i)$ can be found by applying the two-dimensional Durbin–Levinson algorithm to the process $(Y_i, Y_i^2)$.
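The necessary moments can in all these cases be found from the mixed moments of the process $X$. First we assume that $E_\theta(|X_0|^m) < \infty$. Then

$$E_\theta(Y_1^m) = E_\theta\left[\left(\int_0^\Delta X_s\,u(s)\,ds\right)^m\right] = E_\theta\left[\int_0^\Delta \cdots \int_0^\Delta X_{s_1}\cdots X_{s_m}\,u(s_1)\cdots u(s_m)\,ds_1\cdots ds_m\right] = \int_0^\Delta \cdots \int_0^\Delta E_\theta(X_{s_1}\cdots X_{s_m})\,u(s_1)\cdots u(s_m)\,ds_1\cdots ds_m,$$

where we have used Fubini's theorem. Specifically we get

$$E_\theta(Y_1) = \int_0^\Delta E_\theta(X_s)\,u(s)\,ds = E_\theta(X_0)\int_0^\Delta u(s)\,ds = E_\theta(X_0).$$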
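When $u(t)$ is a constant, we see that

$$E_\theta(Y_1^m) = \frac{m!}{\Delta^m}\int_0^\Delta\int_0^{s_1}\cdots\int_0^{s_{m-1}} E_\theta(X_{s_1}\cdots X_{s_m})\,ds_m\cdots ds_2\,ds_1.$$

Here we have used that $E_\theta(X_{s_1}\cdots X_{s_m})$ does not depend on the ordering of $s_1,\dots,s_m$, so that it is enough to integrate over the region where $0 \le s_m \le \cdots \le s_1 \le \Delta$. The factor $m!$ appears because $s_1,\dots,s_m$ can be ordered in $m!$ different ways. In a similar way we obtain that, when $E_\theta(|X_0|^{m_1+m_2}) < \infty$,

$$E_\theta\left(Y_1^{m_1}Y_k^{m_2}\right) = \int_0^\Delta\cdots\int_0^\Delta E_\theta\left(X_{s_1}\cdots X_{s_{m_1}}X_{(k-1)\Delta+s_{m_1+1}}\cdots X_{(k-1)\Delta+s_{m_1+m_2}}\right)u(s_1)\cdots u(s_{m_1+m_2})\,ds_1\cdots ds_{m_1+m_2}.$$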
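For second-order moments with constant $u = 1/\Delta$ the double integrals collapse to single integrals over the time lag, because the set $\{(s,t) \in [0,\Delta]^2 : s - t \in dh\}$ has length $\Delta - |h|$. This gives $E_\theta(Y_1Y_k) = \Delta^{-2}\int_{-\Delta}^{\Delta}(\Delta - |h|)\,E_\theta(X_0X_{|(k-1)\Delta + h|})\,dh$, with $k = 1$ giving $E_\theta(Y_1^2)$. The sketch below is ours (function names hypothetical). It evaluates this integral with composite Simpson's rule for any user-supplied function `mixed_moment(t)` $= E_\theta(X_0X_t)$, splitting at $h = 0$ where the weight has a kink.

```python
import math


def simpson(f, a, b, n=200):
    """Composite Simpson rule on [a, b] with an even number n of panels."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for j in range(1, n):
        total += (4.0 if j % 2 else 2.0) * f(a + j * h)
    return total * h / 3.0


def second_order_moment(mixed_moment, delta, k, n=200):
    """E(Y_1 Y_k) for uniform weights u = 1/Delta (k = 1 gives E(Y_1^2)).

    mixed_moment(t) must return E(X_0 X_t) for t >= 0 (stationary X).  The
    double integral over [0, Delta] x [(k-1)Delta, k Delta] is reduced to a
    single integral over the lag h with weight (Delta - |h|).
    """
    shift = (k - 1) * delta

    def g(h):
        return (delta - abs(h)) * mixed_moment(abs(shift + h))

    # split at h = 0, where the weight (Delta - |h|) has a kink
    return (simpson(g, -delta, 0.0, n) + simpson(g, 0.0, delta, n)) / delta ** 2


if __name__ == "__main__":
    beta, sigma, delta = 2.0, 1.0, 0.5

    def ou(t):
        # E(X_0 X_t) for the Ornstein-Uhlenbeck process of Example 1
        return sigma ** 2 / (2 * beta) * math.exp(-beta * t)

    print(second_order_moment(ou, delta, 1), second_order_moment(ou, delta, 3))
```

Higher-order moments do not reduce this way and need the ordered multiple integrals above.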
Example 1. For diffusion models where the eigenfunctions of the generator are polynomials it is possible to find all moments of the type $E(X_{t_1}\cdots X_{t_m})$, see e.g. Sørensen (2000). Consider the Ornstein–Uhlenbeck process given by

$$dX_t = -\beta X_t\,dt + \sigma\,dW_t.$$

This process is ergodic, and its stationary distribution is the normal distribution with expectation 0 and variance $\sigma^2/(2\beta)$, provided that $\beta > 0$. Recently, this process was used to model the ice-core data from Greenland mentioned in the Introduction. The observations were of the integrated process; for details see Ditlevsen et al. (2002).

We have that

$$E_\theta(X_t \mid X_0 = x_0) = x_0 e^{-\beta t}$$

and

$$E_\theta(X_0 X_t) = \sigma^2(2\beta)^{-1}e^{-\beta t}.$$

This implies that

$$E_\theta(Y_1^2) = \sigma^2\beta^{-3}\Delta^{-2}\left(\beta\Delta - 1 + e^{-\beta\Delta}\right)$$

and

$$E_\theta(Y_1 Y_k) = \tfrac{1}{2}\sigma^2\beta^{-3}\Delta^{-2}\left(1 - e^{\beta\Delta}\right)^2 e^{-k\beta\Delta}$$

for $k > 1$.
For $f_1(y) = y$, $f_2(y) = y^2$, $Z^{(i-1)}_{1,0} = 1$, $Z^{(i-1)}_{1,1} = Y_{i-1}$ and $Z^{(i-1)}_{2,0} = 1$ (i.e. $q_1 = 1$, $q_2 = 0$, $r = 1$), we get

$$\hat\pi^{(i-1)}_1(\theta) = \frac{(1-e^{-\beta\Delta})^2}{2(\beta\Delta - 1 + e^{-\beta\Delta})}\,Y_{i-1}$$

and

$$\hat\pi^{(i-1)}_2(\theta) = \frac{\sigma^2(\beta\Delta - 1 + e^{-\beta\Delta})}{\beta^3\Delta^2}.$$

One possible estimating equation is obtained by choosing $\Pi^{(i-1)}_1(\theta) = (1,0)^T$ and $\Pi^{(i-1)}_2(\theta) = (0,1)^T$. This is, however, not the optimal estimating function based on the predictors $\hat\pi^{(i-1)}_1(\theta)$ and $\hat\pi^{(i-1)}_2(\theta)$. The optimal estimating function will be found in the next section.
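The closed forms of Example 1 are easy to code. In the sketch below (function names are ours, uniform weights assumed), the test checks two consequences that can be read off the formulas. First, by (4) with $r = 1$ and $E_\theta(Y_1) = 0$, the coefficient of $Y_{i-1}$ in $\hat\pi^{(i-1)}_1$ equals $E_\theta(Y_1Y_2)/E_\theta(Y_1^2)$ and does not depend on $\sigma$. Second, for $k \ge 2$ the cross moments shrink by the factor $e^{-\beta\Delta}$ per additional lag.

```python
import math


def ou_integrated_moments(beta, sigma, delta, k):
    """Return (E(Y_1^2), E(Y_1 Y_k)) for the integrated OU process of Example 1.

    Uniform weights u = 1/Delta are assumed; the cross moment is valid for k > 1.
    """
    c = sigma ** 2 / (beta ** 3 * delta ** 2)
    bd = beta * delta
    second = c * (bd - 1.0 + math.exp(-bd))
    cross = 0.5 * c * (1.0 - math.exp(bd)) ** 2 * math.exp(-k * bd)
    return second, cross


def ou_predictor_coefficient(beta, delta):
    """Coefficient of Y_{i-1} in the predictor pi_1^{(i-1)} of Example 1."""
    bd = beta * delta
    return (1.0 - math.exp(-bd)) ** 2 / (2.0 * (bd - 1.0 + math.exp(-bd)))


if __name__ == "__main__":
    second, cross = ou_integrated_moments(beta=0.7, sigma=1.3, delta=0.5, k=2)
    print(cross / second, ou_predictor_coefficient(0.7, 0.5))   # the two agree
```

For small $\beta\Delta$ the expression $\beta\Delta - 1 + e^{-\beta\Delta}$ suffers from cancellation; writing it as `bd + math.expm1(-bd)` is a more accurate alternative.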
Example 2. Another particular example is the CIR model given by

$$dX_t = -\beta(X_t - \alpha)\,dt + \sigma\sqrt{X_t}\,dW_t.$$

This process is ergodic and its stationary distribution is the gamma distribution with shape parameter $2\beta\alpha\sigma^{-2}$ and scale parameter $\sigma^2/(2\beta)$, provided that $\beta > 0$, $\alpha > 0$, $\sigma > 0$ and $2\beta\alpha \ge \sigma^2$. The process has many applications: it was introduced in mathematical finance as a model of the short-term interest rate by Cox et al. (1985), Feller (1951) proposed it as a model for population growth, and recently it was used by Pedersen (2000) to model nitrous oxide emission from soil.
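All moments of the type $E_\theta(X_{t_1}\cdots X_{t_m})$ can be calculated by means of formulae in Sørensen (2000). In particular, $E_\theta(X_0) = \alpha$, $E_\theta(X_0^2) = \alpha(\alpha + \sigma^2/(2\beta))$, and

$$E_\theta(X_0X_t) = \alpha^2 + \alpha\sigma^2(2\beta)^{-1}e^{-\beta t}.$$

Thus

$$E_\theta(Y_1^2) = \alpha^2 + \alpha\sigma^2\beta^{-3}\Delta^{-2}\left(e^{-\beta\Delta} - 1 + \beta\Delta\right)$$

and

$$E_\theta(Y_1Y_k) = \alpha^2 + \tfrac{1}{2}\alpha\sigma^2\beta^{-3}\Delta^{-2}\left(e^{\beta\Delta} - 1\right)^2 e^{-k\beta\Delta}$$

for $k > 1$. For $f_1(y) = y$, $f_2(y) = y^2$, $Z^{(i-1)}_{1,0} = 1$, $Z^{(i-1)}_{1,1} = Y_{i-1}$ and $Z^{(i-1)}_{2,0} = 1$ (i.e. $q_1 = 1$, $q_2 = 0$, $r = 1$), we get

$$\hat\pi^{(i-1)}_1(\theta) = \alpha(1 - a_{11}(\beta)) + a_{11}(\beta)Y_{i-1}$$

and

$$\hat\pi^{(i-1)}_2(\theta) = a_{20}(\theta),$$

where

$$a_{11}(\beta) = \frac{(1 - e^{-\beta\Delta})^2}{2(\beta\Delta - 1 + e^{-\beta\Delta})}$$

and $a_{20}(\theta) = E_\theta(Y_1^2)$. Thus by choosing $\Pi^{(i-1)}_1(\theta) = (1, Y_{i-1}, 0)^T$ and $\Pi^{(i-1)}_2(\theta) = (0, 0, 1)^T$, we end up with the estimating equations

$$\frac{1}{n-1}\sum_{i=2}^n Y_i = \hat\alpha(1 - a_{11}(\hat\beta)) + a_{11}(\hat\beta)\,\frac{1}{n-1}\sum_{i=2}^n Y_{i-1},$$

$$\sum_{i=2}^n Y_{i-1}Y_i = \hat\alpha(1 - a_{11}(\hat\beta))\sum_{i=2}^n Y_{i-1} + a_{11}(\hat\beta)\sum_{i=2}^n Y_{i-1}^2,$$

$$\hat\sigma^2 = \frac{\hat\beta^3\Delta^2\sum_{i=2}^n\left(Y_i^2 - \hat\alpha^2\right)}{(n-1)\,\hat\alpha\left(e^{-\hat\beta\Delta} - 1 + \hat\beta\Delta\right)}.$$

Note that

$$\hat\alpha = \frac{1}{n-1}\sum_{i=1}^{n-1}Y_i + \frac{Y_n - Y_1}{(n-1)(1 - a_{11}(\hat\beta))}$$

is essentially the average of the observations when $n$ is large and $a_{11}(\hat\beta)$ is not close to 1. We shall see in the next section that the estimating function found here is, in fact, optimal.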
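One remark simplifies the computation (our observation, easily checked from the display above). Writing $c = \alpha(1 - a_{11}(\beta))$, the first two estimating equations are exactly the normal equations of the least-squares regression of $Y_i$ on $(1, Y_{i-1})$, $i = 2,\dots,n$. So $a_{11}(\hat\beta)$ is the fitted slope and $\hat\alpha$ is the fitted intercept divided by $1 - a_{11}(\hat\beta)$. The sketch below (hypothetical function names) inverts $a_{11}$ as a function of $x = \beta\Delta$ by bisection on a log scale and then evaluates $\hat\sigma^2$ from the third equation. The bisection assumes that $a_{11}$ decreases from 1 to 0 on $(0,\infty)$; these are its limits, but monotonicity is assumed here, not proved. A fitted slope outside that range is rejected.

```python
import math
import random


def a11(x):
    """a_11 as a function of x = beta * Delta: (1 - e^{-x})^2 / (2 (x - 1 + e^{-x}))."""
    one_minus = -math.expm1(-x)                  # 1 - e^{-x}, accurate for small x
    return one_minus ** 2 / (2.0 * (x + math.expm1(-x)))


def invert_a11(target, lo=1e-6, hi=1e6, iters=200):
    """Solve a11(x) = target by log-scale bisection (assumes a11 is decreasing)."""
    if not a11(hi) < target < a11(lo):
        raise ValueError("fitted slope outside the range of a_11; no solution")
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if a11(mid) > target:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


def cir_estimates(y, delta):
    """Solve the three estimating equations of Example 2; returns (alpha, beta, sigma^2)."""
    prev, curr = y[:-1], y[1:]                   # Y_{i-1} and Y_i, i = 2..n
    m = len(curr)                                # n - 1
    mp, mc = sum(prev) / m, sum(curr) / m
    sxx = sum((a - mp) ** 2 for a in prev)
    sxy = sum((a - mp) * (b - mc) for a, b in zip(prev, curr))
    slope = sxy / sxx                            # a_11(beta_hat)
    intercept = mc - slope * mp                  # alpha_hat * (1 - a_11(beta_hat))
    x = invert_a11(slope)                        # x = beta_hat * Delta
    alpha = intercept / (1.0 - slope)
    beta = x / delta
    sigma2 = (beta ** 3 * delta ** 2 * sum(b * b - alpha ** 2 for b in curr)
              / (m * alpha * (x + math.expm1(-x))))
    return alpha, beta, sigma2


if __name__ == "__main__":
    # Toy AR(1) series standing in for real integrated CIR data (illustration only).
    rng = random.Random(7)
    y = [1.25]
    for _ in range(2000):
        y.append(0.5 + 0.6 * y[-1] + rng.gauss(0.0, 0.1))
    print(cir_estimates(y, delta=1.0))
```

The regression view also makes the closed form for $\hat\alpha$ given above immediate.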
3. The optimal prediction-based inference for integrated diffusions
In this section we derive the optimal choice of the weight $\Pi^{(i-1)}(\theta)$ in (3) using the results and
notation in Sørensen (2000). Optimality is in the sense of the theory of estimating functions,
see Godambe & Heyde (1987) and Heyde (1997). The optimal member of a class of estimating
functions is the one that provides the most efficient estimator. This estimator is sometimes
called a quasi-likelihood estimator.
In Sørensen (2000) it was shown that the optimal estimating function of the type (3) is given by

$$G_n^*(\theta) = A_n^*(\theta)\sum_{i=r+1}^{n} H^{(i)}(\theta), \tag{6}$$

where

$$H^{(i)}(\theta) = Z^{(i-1)}\left[F(Y_i) - \hat\pi^{(i-1)}(\theta)\right], \tag{7}$$

with $F(x) = (f_1(x),\dots,f_N(x))^T$, $\hat\pi^{(i-1)}(\theta) = (\hat\pi^{(i-1)}_1(\theta),\dots,\hat\pi^{(i-1)}_N(\theta))^T$ and

$$Z^{(i-1)} = \begin{pmatrix} Z^{(i-1)}_1 & 0_{q_1} & \cdots & 0_{q_1} \\ 0_{q_2} & Z^{(i-1)}_2 & \cdots & 0_{q_2} \\ \vdots & \vdots & \ddots & \vdots \\ 0_{q_N} & 0_{q_N} & \cdots & Z^{(i-1)}_N \end{pmatrix}. \tag{8}$$

Here $0_{q_j}$ denotes the $(q_j+1)$-dimensional zero vector. Finally,

$$A_n^*(\theta) = \partial_\theta\hat a(\theta)^T\,\bar C(\theta)\,\bar M_n(\theta)^{-1}, \tag{9}$$

with

$$\bar M_n(\theta) = E_\theta\left(H^{(r+1)}(\theta)H^{(r+1)}(\theta)^T\right) + \sum_{k=1}^{n-r-1}\frac{n-r-k}{n-r}\left\{E_\theta\left(H^{(r+1)}(\theta)H^{(r+1+k)}(\theta)^T\right) + E_\theta\left(H^{(r+1+k)}(\theta)H^{(r+1)}(\theta)^T\right)\right\}, \tag{10}$$

$$\bar C(\theta) = E_\theta\left(Z^{(i-1)}(Z^{(i-1)})^T\right), \tag{11}$$

and

$$\hat a(\theta)^T = \left(\hat a_1(\theta)^T,\dots,\hat a_N(\theta)^T\right), \tag{12}$$

where $\hat a_j(\theta)$ is given by (4) and (5). A sufficient condition for (6) to be optimal is that the matrix $\partial_\theta\hat a(\theta)^T$ has full rank, and that the functions $1, f_1,\dots,f_N$ are linearly independent on the support of the conditional distribution of $Y_n$ given $Y_1,\dots,Y_{n-1}$. In particular, the latter condition implies that the matrix $\bar M_n(\theta)$ is invertible.
In Section 4, we shall see that for many diffusion models there exist $K > 0$ and $\lambda > 0$ such that the absolute values of all entries in the expectation matrices in the sum in (10) are dominated by $K e^{-\lambda(k-r-1)}$ when $k > r$. Therefore, the sum in (10) can in practice often be truncated so that fewer moments need to be calculated.
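A minimal sketch of the truncated sum (10), written by us for illustration. The hypothetical input `cross[k]` holds the matrix $E_\theta(H^{(r+1)}(\theta)H^{(r+1+k)}(\theta)^T)$ for $k = 0,\dots,k_{\max}$, computed for instance from the moment formulas of this section; terms with $k > k_{\max}$ are dropped. The second expectation in (10) is the transpose of the first, so it need not be supplied separately. Nested lists stand in for matrices to keep the sketch dependency-free.

```python
def m_bar(cross, n, r):
    """Truncated version of (10).

    cross[k] is E(H^{(r+1)} H^{(r+1+k)T}) as a list of rows, k = 0..k_max.
    Terms with k > k_max (or k > n - r - 1) are omitted.
    """
    d = len(cross[0])
    k_max = min(len(cross) - 1, n - r - 1)
    out = [row[:] for row in cross[0]]          # copy, leave the input untouched
    for k in range(1, k_max + 1):
        w = (n - r - k) / (n - r)               # weight (n - r - k)/(n - r)
        ck = cross[k]
        for a in range(d):
            for b in range(d):
                # E(H^{(r+1)} H^{(r+1+k)T}) plus its transpose
                out[a][b] += w * (ck[a][b] + ck[b][a])
    return out


if __name__ == "__main__":
    lag0 = [[1.0, 0.0], [0.0, 1.0]]
    lag1 = [[0.2, 0.1], [0.0, 0.3]]             # illustrative numbers only
    print(m_bar([lag0, lag1], n=100, r=1))
```

The result is symmetric by construction, as $\bar M_n(\theta)$ must be.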
Natural choices for $f_j(y)$ and $Z^{(i-1)}_{jk}$ would be $f_j(y) = y^{\kappa_{j0}}$ and $Z^{(i-1)}_{jk} = Y_{i-l_{jk}}^{\kappa_{jk}}$, where $\kappa_{j0}$ and $\kappa_{jk}$ are such that $E_\theta[Y^{4\kappa}]$ exists with $\kappa = \max\{\kappa_{10},\dots,\kappa_{Nq_N}\}$. Note that it is enough that $E_\theta[Y^{2\kappa}]$ exists for a prediction-based estimating function to be well defined. The stricter condition is needed for the optimal prediction-based estimating function to exist. From now on we assume that $f_j$ and $Z^{(i-1)}_{jk}$ have the form just indicated. For simplicity we assume that $\kappa_{j0}$ and $\kappa_{jk}$ are integers. In order to calculate (10), we then need higher-order moments of the form $E_\theta[Y_1^{k_1}Y_{t_1}^{k_2}Y_{t_2}^{k_3}Y_{t_3}^{k_4}]$, where $1 \le t_1 \le t_2 \le t_3$ and where $k_i$, $i = 1,\dots,4$, are non-negative integers such that $k_1+k_2+k_3+k_4 \le 4\kappa$. We will express these moments in terms of the moments of $X_t$, which will usually either be known or possible to determine by simulation.
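Define

$$\psi(v; u; s; r; \theta) = E_\theta\left[X_{v_1}\cdots X_{v_{k_1}}X_{u_1}\cdots X_{u_{k_2}}X_{s_1}\cdots X_{s_{k_3}}X_{r_1}\cdots X_{r_{k_4}}\right],$$

where $v = (v_1,\dots,v_{k_1})$, $u = (u_1,\dots,u_{k_2})$, $s = (s_1,\dots,s_{k_3})$, $r = (r_1,\dots,r_{k_4})$,

$$\varphi(x; k; t; \Delta) = u(x_1 - (t-1)\Delta)\cdots u(x_k - (t-1)\Delta),$$

where $k$ is an integer and $x = (x_1,\dots,x_k)$,

$$\Phi(v; u; s; r; t_1; t_2; t_3; \Delta) = \varphi(v; k_1; 1; \Delta)\,\varphi(u; k_2; t_1; \Delta)\,\varphi(s; k_3; t_2; \Delta)\,\varphi(r; k_4; t_3; \Delta),$$

and

$$A_1 = [0,\Delta]^{k_1}, \quad A_2 = [(t_1-1)\Delta, t_1\Delta]^{k_2}, \quad A_3 = [(t_2-1)\Delta, t_2\Delta]^{k_3}, \quad A_4 = [(t_3-1)\Delta, t_3\Delta]^{k_4},$$

$$T_1 = \{(v_1,\dots,v_{k_1}) : 0 \le v_1 \le \cdots \le v_{k_1} \le \Delta\},$$

$$T_2 = \{(u_1,\dots,u_{k_2}) : (t_1-1)\Delta \le u_1 \le \cdots \le u_{k_2} \le t_1\Delta\},$$

$$T_3 = \{(s_1,\dots,s_{k_3}) : (t_2-1)\Delta \le s_1 \le \cdots \le s_{k_3} \le t_2\Delta\},$$

$$T_4 = \{(r_1,\dots,r_{k_4}) : (t_3-1)\Delta \le r_1 \le \cdots \le r_{k_4} \le t_3\Delta\},$$

$$B = (A_1 \cap T_1)\times(A_2 \cap T_2)\times(A_3 \cap T_3)\times(A_4 \cap T_4).$$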
In the same way as in Section 2 we get

$$E_\theta\left[Y_1^{k_1}Y_{t_1}^{k_2}Y_{t_2}^{k_3}Y_{t_3}^{k_4}\right] = \int_{A_1\times A_2\times A_3\times A_4}\psi(v;u;s;r;\theta)\,\Phi(v;u;s;r;t_1;t_2;t_3;\Delta)\,dt,$$

where $dt = dr_{k_4}\cdots dr_1\,ds_{k_3}\cdots ds_1\,du_{k_2}\cdots du_1\,dv_{k_1}\cdots dv_1$. Thus we need the mixed moments of the process $X$ of order up to $k_1+k_2+k_3+k_4$. These depend on the distances in time between the variables appearing in the expression for the moment, and care has to be taken when several variables are integrated over the same interval, because their ordering in time then changes across the domain of integration. When $u(t) = 1/\Delta$, this can be handled in the following way. Assume that $1 < t_1 < t_2 < t_3$. Arguments of symmetry yield that

$$E_\theta\left[Y_1^{k_1}Y_{t_1}^{k_2}Y_{t_2}^{k_3}Y_{t_3}^{k_4}\right] = \frac{k_1!\,k_2!\,k_3!\,k_4!}{\Delta^{k_1+k_2+k_3+k_4}}\int_B \psi(v;u;s;r;\theta)\,dt. \tag{13}$$

The factor $k_1!$ appears because $v_1,\dots,v_{k_1}$ can be ordered in $k_1!$ different ways. The arguments for the other factors are similar.
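When neither (13) nor closed forms are convenient, the moments can be approximated by simulation, as noted above. The sketch below is ours (names hypothetical). It estimates $E_\theta(Y_1^2Y_t^2)$ for the integrated Ornstein–Uhlenbeck process of Example 1 by Monte Carlo, using the exact AR(1) transition on a fine grid and a Riemann sum for each $Y_i$. Because $(Y_1, Y_t)$ is jointly Gaussian with mean zero, Isserlis' theorem supplies the check value $E_\theta(Y_1^2)^2 + 2E_\theta(Y_1Y_t)^2$. The grid, path count and seed are arbitrary choices, so agreement holds only up to Monte Carlo and discretisation error.

```python
import math
import random


def ou_second_moments(beta, sigma, delta, t):
    """E(Y_1^2) and E(Y_1 Y_t), t > 1, from the closed forms of Example 1."""
    c = sigma ** 2 / (beta ** 3 * delta ** 2)
    bd = beta * delta
    var = c * (bd - 1.0 + math.exp(-bd))
    cov = 0.5 * c * (1.0 - math.exp(bd)) ** 2 * math.exp(-t * bd)
    return var, cov


def mc_fourth_moment(beta, sigma, delta, t, n_paths=20000, m=50, seed=1):
    """Monte Carlo estimate of E(Y_1^2 Y_t^2) for the integrated OU process."""
    rng = random.Random(seed)
    h = delta / m
    phi = math.exp(-beta * h)
    sd0 = sigma / math.sqrt(2.0 * beta)          # stationary standard deviation
    sd_step = sd0 * math.sqrt(1.0 - phi * phi)   # exact transition noise
    total = 0.0
    for _path in range(n_paths):
        x = rng.gauss(0.0, sd0)                  # start in the invariant law
        sums = [0.0] * t
        for i in range(t):
            for _step in range(m):
                sums[i] += x                     # left-endpoint Riemann sum
                x = phi * x + rng.gauss(0.0, sd_step)
        y1, yt = sums[0] / m, sums[t - 1] / m
        total += (y1 * yt) ** 2
    return total / n_paths


if __name__ == "__main__":
    var, cov = ou_second_moments(1.0, 1.0, 1.0, 2)
    print(mc_fourth_moment(1.0, 1.0, 1.0, 2, n_paths=2000), var ** 2 + 2 * cov ** 2)
```

For non-Gaussian diffusions such as the CIR model the Isserlis check is unavailable, but the simulation part carries over once the transition step is replaced by a suitable discretisation.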
Example 3 (Example 1 continued). We will now find the optimal prediction-based estimating function for the integrated Ornstein–Uhlenbeck process with $N = 2$, $f_1(y) = y$, $f_2(y) = y^2$, $Z^{(i-1)}_{1,0} = Z^{(i-1)}_{2,0} = 1$ and $Z^{(i-1)}_{1,1} = Y_{i-1}$ (i.e. $q_1 = r = 1$, $q_2 = 0$). First note that as $\bar C(\theta)$ is diagonal and $\hat a_{10}(\theta) = 0$, it follows that the first column of the matrix $\partial_\theta\hat a(\theta)^T\bar C(\theta)$ is zero. Further, we have

$$E_\theta\left(Y_1^{k_1}Y_{t_1}^{k_2}Y_{t_2}^{k_3}\right) = 0$$

for $1 \le t_1 \le t_2$ and for non-negative integers $k_i$ satisfying $k_1 + k_2 + k_3 = 3$, because all moments of odd order are zero. Therefore, $\bar M_n(\theta)$ has the form

$$\bar M_n(\theta) = \begin{pmatrix} m_{11} & 0 & 0 \\ 0 & m_{22} & m_{23} \\ 0 & m_{32} & m_{33} \end{pmatrix},$$

which together with the observation above implies that $A_n^*(\theta)$ has the form

$$A_n^*(\theta) = \begin{pmatrix} 0 & a_{12} & a_{13} \\ 0 & a_{22} & a_{23} \end{pmatrix}.$$

The $2 \times 2$ matrix $\{a_{ij}\}$ is invertible because $\partial_\theta\hat a(\theta)$ has full rank and the matrices $\bar C(\theta)$ and $\bar M_n(\theta)$ are invertible. Hence we end up with the following optimal prediction-based estimating equations for $\theta$:

$$\frac{\sum_{i=2}^n Y_{i-1}Y_i}{\sum_{i=2}^n Y_{i-1}^2} = \frac{(1 - e^{-\hat\beta\Delta})^2}{2(\hat\beta\Delta - 1 + e^{-\hat\beta\Delta})},$$

$$\hat\sigma^2 = \frac{\hat\beta^3\Delta^2}{\hat\beta\Delta - 1 + e^{-\hat\beta\Delta}}\cdot\frac{1}{n-1}\sum_{i=2}^n Y_i^2.$$

Note that $\hat\beta\Delta - 1 + e^{-\hat\beta\Delta} > 0$ when $\hat\beta \neq 0$. Hence there is no solution if $\sum_{i=2}^n Y_{i-1}Y_i \le 0$. As $Y_i$ and $Y_{i-1}$ are positively correlated, the probability that this happens goes to zero as $n \to \infty$.
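A minimal sketch (ours, with hypothetical names) of solving the two equations above. The first is a single equation in $x = \hat\beta\Delta$, solved here by bisection on a log scale. This relies on the assumption, not proved here, that its right-hand side decreases from 1 to 0 on $(0,\infty)$. The second equation then gives $\hat\sigma^2$ explicitly. In line with the remark above, the code refuses to return a value when the lag-one ratio is not positive (and, under the monotonicity assumption, also when it is at least 1).

```python
import math


def rhs(x):
    """(1 - e^{-x})^2 / (2 (x - 1 + e^{-x})), the right-hand side with x = beta * Delta."""
    one_minus = -math.expm1(-x)                  # 1 - e^{-x}
    return one_minus ** 2 / (2.0 * (x + math.expm1(-x)))


def ou_optimal_estimates(y, delta, lo=1e-6, hi=1e6, iters=200):
    """Solve the optimal estimating equations of Example 3; returns (beta, sigma^2)."""
    ratio = (sum(a * b for a, b in zip(y[:-1], y[1:]))
             / sum(a * a for a in y[:-1]))
    if not rhs(hi) < ratio < rhs(lo):
        raise ValueError("no solution: lag-one ratio outside the range of the rhs")
    for _ in range(iters):
        mid = math.sqrt(lo * hi)                 # bisection on a log scale
        if rhs(mid) > ratio:
            lo = mid
        else:
            hi = mid
    x = math.sqrt(lo * hi)                       # x = beta_hat * Delta
    beta = x / delta
    mean_sq = sum(b * b for b in y[1:]) / (len(y) - 1)
    sigma2 = beta ** 3 * delta ** 2 / (x + math.expm1(-x)) * mean_sq
    return beta, sigma2
```

Using `math.expm1` keeps $x - 1 + e^{-x}$ accurate when $\hat\beta\Delta$ is small, which is exactly the regime of finely integrated data.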
Example 4 (Example 2 continued). Consider again the CIR model. As we have found a three-dimensional estimating function to estimate three parameters, it is necessarily optimal. The matrix $A_n^*(\theta)$ is invertible by arguments similar to those in Example 3 and hence does not change the estimator. There is therefore no need to find an expression for this matrix. Suppose, however, that we know one of the parameters, $\alpha$ say. In order to find the optimal combination of the three coordinates of the prediction-based estimating function derived in Example 2 into a two-dimensional estimating function, we need moments of the form $E_\theta(Y_1Y_{t_1}Y_{t_2})$ and $E_\theta(Y_1Y_{t_1}Y_{t_2}Y_{t_3})$ ($1 \le t_1 \le t_2 \le t_3$). By the formulae above, these can be obtained by integration of moments of the form $E_\theta(X_{t_1}X_{t_2}X_{t_3})$ and $E_\theta(X_{t_1}X_{t_2}X_{t_3}X_{t_4})$, for which explicit and easily integrable expressions are known, see e.g. Sørensen (2000). As the resulting expressions are rather long, they are omitted.
4. Asymptotic results
In this section we give asymptotic results for our estimating functions and the corresponding
estimators when our observations are integrated diffusions, based on general results in
Sørensen (1999, 2000). To do this we need to study which properties the process Y inherits
from the underlying diffusion process X. The integrated process Y is not a Markov process,
but mixing properties and moment conditions satisfied by X are preserved, which is what we
will use in this section.
We begin with a result on the asymptotic behaviour of an estimating function of the general form

$$G_n(\theta) = A_n(\theta)\sum_{i=r}^{n} H^{(i)}(\theta), \tag{14}$$

where $\{A_n(\theta)\}$ is a sequence of $p \times \sum_{j=1}^{N}(q_j+1)$ matrices, and where $H^{(i)}(\theta)$ is given by (7).
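Theorem 1
Suppose the diffusion process $X$ is stationary and $\alpha$-mixing with mixing coefficients $\alpha_t(\theta)$, $t > 0$, and that there exists a $\delta > 0$ such that

$$\sum_{k=1}^{\infty}\alpha_{k\Delta}(\theta)^{\delta/(2+\delta)} < \infty \tag{15}$$

and

$$E_\theta\left(\left|H^{(r)}_{jk}(\theta)\right|^{2+\delta}\right) < \infty, \qquad j = 1,\dots,N,\; k = 0,\dots,q_j. \tag{16}$$

Then as $n \to \infty$,

$$\bar M_n(\theta) \to M(\theta), \tag{17}$$

where $\bar M_n(\theta)$ is given by (10) and where

$$M(\theta) = E_\theta\left(H^{(r)}(\theta)H^{(r)}(\theta)^T\right) + \sum_{k=1}^{\infty}\left\{E_\theta\left(H^{(r)}(\theta)H^{(r+k)}(\theta)^T\right) + E_\theta\left(H^{(r+k)}(\theta)H^{(r)}(\theta)^T\right)\right\}. \tag{18}$$

Assume, moreover, that $A_n(\theta) \to A(\theta)$ as $n \to \infty$. Then as $n \to \infty$,

$$n^{-1}\mathrm{Var}_\theta(G_n(\theta)) \to V(\theta) = A(\theta)M(\theta)A(\theta)^T, \tag{19}$$

and

$$\frac{1}{\sqrt{n}}\,G_n(\theta) \to N(0, V(\theta)) \tag{20}$$

in distribution, provided that the matrix $A(\theta)$ is such that $A(\theta)M(\theta)A(\theta)^T$ is strictly positive definite.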
Proof. First note that the process $Y$ is stationary and $\alpha$-mixing with mixing coefficients $\alpha^Y_k(\theta)$ satisfying $\alpha^Y_k(\theta) \le \alpha_{(k-1)\Delta}(\theta)$, $k = 2, 3, \dots$. This is because, for every $m$, the $\sigma$-algebra generated by $Y_i$, $i = 1,\dots,m$, is contained in the $\sigma$-algebra generated by $X_u$, $0 \le u \le m\Delta$, while the $\sigma$-algebra generated by $Y_i$, $i = m+k, m+k+1,\dots$, is contained in the $\sigma$-algebra generated by $X_u$, $u \ge (m+k-1)\Delta$; the two sets of time points are separated by a gap of length $(k-1)\Delta$. Next note that since $H^{(i)}(\theta)$ is a function of $Y_{i-r},\dots,Y_i$, the process $H^{(i)}(\theta)$, $i = r+1, r+2,\dots$, is $\alpha$-mixing with mixing coefficients $\alpha^H_k(\theta)$ satisfying $\alpha^H_k(\theta) \le \alpha_{(k-r-1)\Delta}(\theta)$, $k = r+2,\dots$, and hence (15) holds with $\alpha_{k\Delta}(\theta)$ replaced by $\alpha^H_k(\theta)$.

To prove asymptotic normality, it is enough to consider the one-dimensional process $v^TG_n(\theta)$ for every $v \in \mathbb{R}^p\setminus\{0\}$ (Cramér–Wold device). Hence the theorem follows from a classical central limit result for strongly mixing processes, see e.g. Theorem 1 in Section 1.5 of Doukhan (1994).
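For the optimal estimating function with $A_n(\theta) = A_n^*(\theta)$, given by (9), $A_n^*(\theta) \to \partial_\theta\hat a(\theta)^T\bar C(\theta)M(\theta)^{-1}$ under the conditions of Theorem 1, provided that $M(\theta)$ is invertible. Hence

$$V(\theta) = \partial_\theta\hat a(\theta)^T\,\bar C(\theta)\,M(\theta)^{-1}\,\bar C(\theta)\,\partial_{\theta^T}\hat a(\theta) \tag{21}$$

in the optimal case.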
The conditions imposed in Theorem 1 to ensure that a central limit theorem holds are not the weakest possible. In fact, the speed with which the mixing coefficient $\alpha_t(\theta)$ needs to go to zero to obtain a limit theorem depends on how heavy the tails of the marginal distribution of $X_t$ are, see Doukhan et al. (1994). That paper also shows that condition (16) can be weakened slightly in the case of exponentially decreasing mixing coefficients.
For the one-dimensional, ergodic diffusion process $X$ there are a number of relatively simple criteria ensuring $\alpha$-mixing with exponentially decreasing mixing coefficients, for which (15) is obviously satisfied. If, for instance, the generator of $X$ has a discrete spectrum, then the process is $\alpha$-mixing. If $\lambda_1$ denotes the smallest non-zero eigenvalue, then the mixing coefficients satisfy

$$\alpha_t(\theta_0) \le e^{-t\lambda_1},$$

see Doukhan (1994, p. 112). Thus $X$ is geometrically $\alpha$-mixing. The diffusion processes considered in Examples 1 and 2 have a discrete spectrum with $\lambda_1 = \beta$.
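To spell out why (15) is then automatic (a one-line computation added for illustration): for every $\delta > 0$ the series is dominated by a geometric series,

$$\sum_{k=1}^{\infty}\alpha_{k\Delta}(\theta_0)^{\delta/(2+\delta)} \le \sum_{k=1}^{\infty}e^{-k\Delta\lambda_1\delta/(2+\delta)} = \frac{1}{e^{\Delta\lambda_1\delta/(2+\delta)} - 1} < \infty.$$

For the Ornstein–Uhlenbeck and CIR examples one may substitute $\lambda_1 = \beta$.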
Other criteria for geometric mixing can be found in Veretennikov (1987, 1989), Doukhan (1994) and Hansen & Scheinkman (1995). Rather general criteria for geometric $\alpha$-mixing of diffusion processes, expressed in the language of Malliavin calculus, were given by Kusuoka & Yoshida (2000). We cite the following straightforward and rather weak set of conditions by Genon-Catalot et al. (2000) on the coefficients $b$ and $\sigma$ that are sufficient to ensure geometric $\alpha$-mixing of $X$. It is presupposed that $X$ is stationary with state space $(l, r)$ ($-\infty \le l < r \le \infty$) and that the usual conditions on the scale measure and the speed measure hold, i.e. that

$$\int_l^{x_0} s(x)\,dx = \int_{x_0}^{r} s(x)\,dx = \infty \quad\text{and}\quad \int_l^r m(x)\,dx < \infty,$$

where

$$s(x) = \exp\left(-2\int_{x_0}^{x}\frac{b(u)}{\sigma^2(u)}\,du\right) \quad\text{and}\quad m(x) = \frac{1}{\sigma^2(x)\,s(x)},$$

and where $x_0 \in (l, r)$.
Condition 1
(i) The function $b$ is continuously differentiable and $\sigma$ is twice continuously differentiable on $(l, r)$, $\sigma(x) > 0$ for all $x \in (l, r)$, and there exists a constant $K > 0$ such that $|b(x)| \le K(1 + |x|)$ and $\sigma^2(x) \le K(1 + x^2)$ for all $x \in (l, r)$.
(ii) $\sigma(x)m(x) \to 0$ as $x \downarrow l$ and $x \uparrow r$.
(iii) $1/\gamma(x)$ has a finite limit as $x \downarrow l$ and $x \uparrow r$, where $\gamma(x) = \sigma'(x) - 2b(x)/\sigma(x)$.
This condition in fact implies more than geometric $\alpha$-mixing: it ensures geometric $\rho$-mixing, which in turn implies the exponential bounds on certain moments mentioned in Sections 2 and 3. Specifically, there exist $K > 0$ and $\lambda > 0$ such that if $Z_1$ is measurable with respect to the $\sigma$-algebra generated by $X_s$, $s \le t_1$, and $Z_2$ is measurable with respect to the $\sigma$-algebra generated by $X_s$, $s \ge t_2$, $t_1 < t_2$, then $|\mathrm{Cov}(Z_1, Z_2)| \le K e^{-\lambda(t_2 - t_1)}\sqrt{\mathrm{Var}(Z_1)\mathrm{Var}(Z_2)}$. Conditions ensuring polynomial $\alpha$-mixing were given by Veretennikov (1997).
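As an illustration (our own check, not stated in the original), the scale and speed conditions and Condition 1 hold for the Ornstein–Uhlenbeck process of Example 1, where $(l, r) = (-\infty, \infty)$, $b(x) = -\beta x$ with $\beta > 0$, and $\sigma(x) = \sigma$ is constant. Then

$$s(x) = \exp\left(\frac{\beta(x^2 - x_0^2)}{\sigma^2}\right), \qquad m(x) = \frac{1}{\sigma^2}\exp\left(-\frac{\beta(x^2 - x_0^2)}{\sigma^2}\right), \qquad \gamma(x) = \frac{2\beta x}{\sigma},$$

so both scale integrals diverge and $m$ is integrable. Condition (i) holds with $K = \max(\beta, \sigma^2)$, $\sigma m(x) \to 0$ as $x \to \pm\infty$ gives (ii), and $1/\gamma(x) = \sigma/(2\beta x) \to 0$ as $x \to \pm\infty$ gives (iii).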
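The following lemma can be used to check the moment condition (16) in Theorem 1.
Lemma 1
Suppose $f_j(y) = y^{\kappa_{j0}}$ and $Z^{(i-1)}_{jk} = Y^{\kappa_{jk}}_{i-l_{jk}}$ with $\kappa_{jk}, l_{jk} \ge 1$ ($j = 1,\dots,N$, $k = 0,\dots,q_j$). If $E_\theta(|X_0|^{4\kappa+\epsilon}) < \infty$ for an $\epsilon > 0$, where $\kappa = \max\{\kappa_{10},\dots,\kappa_{Nq_N}\}$, then (16) holds with $\delta = \epsilon/(2\kappa)$.
Proof. It is enough to check that $E_\theta\left(|Y_i^{m_1}Y_1^{m_2}|^{2+\delta}\right) < \infty$ for $1 \le i \le r$ and $m_1, m_2 \in \{\kappa_{10},\dots,\kappa_{Nq_N}\}$, and by the Cauchy–Schwarz inequality this is the case if $E_\theta(|Y_1|^{2\kappa(2+\delta)}) < \infty$. Finally, by Jensen's inequality, Fubini's theorem and the stationarity of $X$,

$$E_\theta\left(|Y_1|^{2\kappa(2+\delta)}\right) = E_\theta\left(\left|\int_0^\Delta X_u\,\nu(du)\right|^{2\kappa(2+\delta)}\right) \le \int_0^\Delta E_\theta\left(|X_u|^{2\kappa(2+\delta)}\right)\nu(du) = E_\theta\left(|X_0|^{2\kappa(2+\delta)}\right) < \infty.$$

For more general choices of $f_j(y)$ and $Z^{(i-1)}_{jk}$, the existence of the relevant moments must be checked.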
The following result about existence, consistency and asymptotic normality of our estimators can now be proved exactly as the similar result in Sørensen (2000).

Theorem 2
Let $\theta_0$ denote the true value of the parameter vector. Suppose the conditions of Theorem 1 hold for $\theta$ in a neighbourhood $\tilde\Theta$ of $\theta_0$ and that
(i) The vector $\hat a(\theta)$ given by (12) and the matrix $A_n(\theta)$ are twice continuously differentiable with respect to $\theta$.
(ii) The matrices $\partial_{\theta^T}\hat a(\theta_0)$ and $A(\theta_0)$ have rank $p$.
(iii) The matrices $A_n(\theta)$, $\partial_{\theta_i}A_n(\theta)$ and $\partial_{\theta_i}\partial_{\theta_j}A_n(\theta)$ converge to $A(\theta)$, $\partial_{\theta_i}A(\theta)$ and $\partial_{\theta_i}\partial_{\theta_j}A(\theta)$, respectively, uniformly for $\theta \in \tilde\Theta$.
Then for every $n \ge r$, an estimator $\hat\theta_n$ exists that solves the estimating equation $G_n(\hat\theta_n) = 0$ with a probability tending to 1 as $n \to \infty$. Moreover,

$$\hat\theta_n \to \theta_0 \tag{22}$$

in probability and

$$\sqrt{n}\,(\hat\theta_n - \theta_0) \xrightarrow{\mathcal{D}} N\left(0,\; D(\theta_0)^{-1}V(\theta_0)\left(D(\theta_0)^{-1}\right)^T\right) \tag{23}$$

as $n \to \infty$, with $D(\theta_0) = A(\theta_0)\bar C(\theta_0)\partial_{\theta^T}\hat a(\theta_0)$, where $\bar C(\theta_0)$ is given by (11).

For the optimal prediction-based estimating function, we have seen that $A(\theta) = \partial_\theta\hat a(\theta)^T\bar C(\theta)M(\theta)^{-1}$ and that $V(\theta)$ is given by (21). Therefore, the asymptotic variance of $\sqrt{n}(\hat\theta_n - \theta_0)$ is

$$\left(\partial_\theta\hat a(\theta_0)^T\,\bar C(\theta_0)\,M(\theta_0)^{-1}\,\bar C(\theta_0)\,\partial_{\theta^T}\hat a(\theta_0)\right)^{-1}$$

in the optimal case.
The conditions ensuring consistency and asymptotic normality are satisfied for the estimators derived earlier in this paper for the Ornstein–Uhlenbeck process and the CIR model.
5. Conclusion
We have demonstrated that the problem of statistical inference for integrated diffusions can be
solved in a satisfactory and readily implementable way by means of optimal prediction-based
estimating functions. Under mild regularity conditions the estimators are consistent and
asymptotically normal.
We have, moreover, considered some of the problems encountered when the method is
implemented in practice. The calculation of moments needed in order to find the optimal
prediction-based estimating function is easily programmable when an analytic expression is
known for the moments of the underlying diffusion process or when we can obtain the
moments of the diffusion by numerical simulation. This was illustrated by two examples,
where explicit estimators were derived.
We derived the optimal estimating functions and the asymptotic results under the condition
that the intervals over which the diffusion is integrated have the same length. When this is not
the case, as in Ditlevsen et al. (2002), the observations do not form a stationary process, which
implies a more complex expression for the optimal estimating functions and more difficulty in
proving asymptotic results.
Acknowledgements
The research of Michael Sørensen was supported by MaPhySto – A Network in Mathematical
Physics and Stochastics funded by The Danish National Research Foundation, and both
authors were supported by the European Community’s Human Potential Programme under
contract HPRN-CT-2000-00100, DYNSTOCH. Thanks are due to the referees for several
helpful comments.
References
Andersen, T. G. & Bollerslev, T. (1998). Answering the skeptics: yes, standard volatility models do provide
accurate forecasts. Int. Econ. Rev. 39, 885–905.
Andersen, T. G., Bollerslev, T., Diebold, F. X. & Labys, P. (2001). The distribution of realized exchange
rate volatility. J. Amer. Statist. Assoc. 96, 42–55.
Barndorff-Nielsen, O. E. & Shephard, N. (2001). Non-Gaussian Ornstein–Uhlenbeck-based models and
some of their uses in financial econometrics (with discussion). J. Roy. Statist. Soc. Ser. B 63, 167–241.
Bibby, B. M. & Sørensen, M. (1995). Martingale estimation functions for discretely observed diffusion
processes. Bernoulli 1, 17–39.
Bibby, B. M. & Sørensen, M. (1997). A hyperbolic diffusion model for stock prices. Finance Stoch. 1,
25–41.
Bibby, B. M. & Sørensen, M. (2001). Simplified estimating functions for diffusion models with a high-dimensional parameter. Scand. J. Statist. 28, 99–112.
Brockwell, P. J. & Davis, R. A. (1991). Time series: theory and methods. Springer, New York.
Cox, J. C., Ingersoll, J. E. & Ross, S. A. (1985). A theory of the term structure of interest rates.
Econometrica 53, 385–407.
Ditlevsen, P. D., Ditlevsen, S. & Andersen, K. K. (2002). The fast climate fluctuations during the stadial
and interstadial climate states. Ann. Glaciol. 35, 457–462.
Doukhan, P. (1994). Mixing: properties and examples. Lecture Notes in Statistics, Vol. 85. Springer, New York.
Doukhan, P., Massart, P. & Rio, E. (1994). The functional central limit theorem for strongly mixing
processes. Ann. Inst. Henri Poincaré 30, 63–82.
Feller, W. (1951). Diffusion processes in genetics. In Proceedings of the 2nd Berkeley Symposium on
Mathematical Statistics and Probability (ed. J. Neyman) 227–246. University of California Press,
Berkeley.
Genon-Catalot, V., Jeantheau, T. & Larédo, C. (1999). Parameter estimation for discretely observed
stochastic volatility models. Bernoulli 5, 855–872.
Genon-Catalot, V., Jeantheau, T. & Larédo, C. (2000). Stochastic volatility models as hidden Markov
models and statistical applications. Bernoulli 6, 1051–1079.
Gloter, A. (1999a). Parameter estimation for a discretely observed integrated diffusion process. Preprint,
University of Marne-la-Vallée, 13/99.
Gloter, A. (1999b). Parameter estimation for a hidden diffusion. Preprint, University of Marne-la-Vallée,
20/99.
Gloter, A. (2000). Parameter estimation for a discrete sampling of an integrated Ornstein-Uhlenbeck
process. Statistics 35, 225–243.
Godambe, V. P. & Heyde, C. C. (1987). Quasi likelihood and optimal estimation. Int. Statist. Rev. 55,
231–244.
Hansen, L. P. & Scheinkman, J. A. (1995). Back to the future: generating moment implications for
continuous-time Markov processes. Econometrica 63, 767–804.
Heyde, C. C. (1997). Quasi-likelihood and its application. Springer, New York.
Kusuoka, S. & Yoshida, N. (2000). Malliavin calculus, geometric mixing, and expansion of diffusion functionals. Probab. Theory Rel. Fields 116, 457–484.
Pedersen, A. R. (2000). Estimating the nitrous oxide emission rate from the soil surface by means of a
diffusion model. Scand. J. Statist. 27, 385–403.
Rydén, T. (1994). Consistent and asymptotically normal parameter estimates for hidden Markov models.
Ann. Statist. 22, 1884–1895.
Sørensen, M. (1997). Estimating functions for discretely observed diffusions: a review. In Selected
Proceedings of the Symposium on Estimating Functions (eds I. V. Basawa, V. P. Godambe & R. L.
Taylor) IMS Lecture Notes – Monograph Series, Vol. 32, 305–325. Institute of Mathematical Statistics,
Hayward.
Sørensen, M. (1999). On asymptotics of estimating functions. Braz. J. Probab. Statist. 13, 111–136.
Sørensen, M. (2000). Prediction-based estimating functions. Econometr. J. 3, 123–147.
Veretennikov, A. Yu. (1987). Bounds for the mixing rate in the theory of stochastic equations. Theory
Probab. Appl. 32, 273–281.
Veretennikov, A. Yu. (1989). On rate of mixing and the averaging principle for hypoelliptic stochastic
differential equations. Math. USSR Izvestiya 33, 221–231.
Veretennikov, A. Yu. (1997). On polynomial mixing bounds for stochastic differential equations. Stoch.
Proc. Appl. 70, 115–127.
Received February 2002, in final form November 2003
Michael Sørensen, Department of Applied Mathematics and Statistics, University of
Copenhagen, Universitetsparken 5, DK-2100 Copenhagen Ø, Denmark.