Inference for Observations of Integrated Diffusion Processes

2004, Scandinavian Journal of Statistics

Estimation of parameters in diffusion models is investigated when the observations are integrals over intervals of the process with respect to some weight function. This type of observations can, for example, be obtained when the process is observed after passage through an electronic filter. Another example is provided by the ice-core data on oxygen isotopes used to investigate paleo-temperatures. Finally, such data play a role in connection with the stochastic volatility models of finance. The integrated process is not a Markov process. Therefore, prediction-based estimating functions are applied to estimate parameters in the underlying diffusion model. The estimators are shown to be consistent and asymptotically normal. The theory developed in the paper also applies to integrals of processes other than diffusions. The method is applied to inference based on integrated data from Ornstein–Uhlenbeck processes and from the Cox–Ingersoll–Ross model, for both of which an explicit optimal estimating function is found.

© Board of the Foundation of the Scandinavian Journal of Statistics 2004. Published by Blackwell Publishing Ltd, 9600 Garsington Road, Oxford OX4 2DQ, UK and 350 Main Street, Malden, MA 02148, USA. Vol 31: 417–429, 2004

SUSANNE DITLEVSEN and MICHAEL SØRENSEN
University of Copenhagen

Key words: asymptotic normality, consistency, Cox–Ingersoll–Ross model, estimating equation, ice-core data, non-Markovian process, Ornstein–Uhlenbeck process, prediction-based estimating functions, quasi-likelihood, stochastic differential equation

1. Introduction

In the present paper, we study statistical inference for observations of integrated diffusions. In several applications, observations of a diffusion process at discrete time points are not available. Instead, for example, a realization of the process has been observed after passage through an electronic filter.
Another example is provided by the ice-core records from Greenland. The isotope ratio ¹⁸O/¹⁶O in the ice is a proxy for paleo-temperatures. It is measured as an average over pieces of ice, each piece representing a time interval, with time increasing as a function of depth. The variation of the paleo-temperature can be modelled by a stochastic differential equation, and it is natural to model the ice-core data as an integrated diffusion process, see Ditlevsen et al. (2002). Integrated processes also play an important role in connection with the so-called realized stochastic volatility in finance, see Andersen & Bollerslev (1998), Genon-Catalot et al. (1999), Gloter (1999b), Sørensen (2000), Barndorff-Nielsen & Shephard (2001) and Andersen et al. (2001).

Martingale estimating functions are a useful tool for statistical inference based on discretely sampled diffusions, see e.g. Bibby & Sørensen (1995, 1997, 2001), Pedersen (2000), Sørensen (1997) and references therein. For such data the likelihood function is usually not explicitly available and can only be found numerically or otherwise approximated. Integrated diffusion processes, however, are not Markov processes, so there are no natural or easily calculated martingales on which to base a class of estimating functions. Moreover, for such models the likelihood function is not even numerically tractable. We therefore apply the prediction-based estimating functions that were introduced in Sørensen (2000) as a tool for statistical inference about non-Markovian models and in other situations where no martingale is readily available. These estimating functions generalize martingale estimating functions. We show that, under mild regularity conditions, the method of prediction-based estimating functions provides a satisfactory solution to the inference problem investigated here. The conditions ensure existence, consistency and asymptotic normality of the estimators.
Other approaches to inference for integrated diffusions were presented in Gloter (1999a, 2000). Gloter (2000) considers the integrated Ornstein–Uhlenbeck process. It compares the Whittle estimator, which in this case is efficient, with Rydén's split-data maximum pseudo-likelihood estimator, see Rydén (1994). Gloter (1999a) considers minimum contrast estimators that are consistent when the length of the sampling interval goes to zero as the number of observations goes to infinity.

In Section 2 the model of integrated diffusions is presented, and prediction-based estimating functions are briefly reviewed and applied to the inference problem. A way of finding the necessary moments of the integrated process is derived. As examples, prediction-based estimating functions are found for the integrated Ornstein–Uhlenbeck process and for the integrated Cox–Ingersoll–Ross (CIR) model. In Section 3 the optimal prediction-based estimating function is derived, and we show how to compute moments of order $j$ of the integrated process, $j$ being a non-negative integer, provided that these moments exist. The resulting formula makes it easy to program the moments needed for the optimal prediction-based estimating function whenever the moments of the underlying process are known analytically or can be simulated. The optimal prediction-based estimating functions for the two examples are then discussed; in both cases they are explicit. In Section 4, asymptotic results for the estimating functions and their estimators are proved under weak regularity conditions, using the mixing properties of the underlying diffusion process. It is worth pointing out that the theory developed in this paper applies immediately to integrals of processes other than diffusions too.
For the asymptotic theory to hold, it is enough that the process being integrated is sufficiently mixing.

2. Integrated diffusions and prediction-based estimating functions

Consider the one-dimensional diffusion
$$dX_t = b(X_t;\theta)\,dt + \sigma(X_t;\theta)\,dW_t, \qquad X_0 \sim \mu_\theta,$$
where $\theta$ is an unknown $p$-dimensional parameter belonging to the parameter space $\Theta \subseteq \mathbb{R}^p$ and $W$ is a one-dimensional standard Wiener process. We assume the following:
- $X_0$ is independent of $W$;
- the stochastic differential equation has a unique weak solution;
- $X$ is an ergodic, stationary diffusion with invariant measure $\mu_\theta$.

Suppose that a sample of observations at discrete time points is not available but that, instead, a running integral of the process with respect to some weight function is observed. Specifically, suppose the observation interval $[0,T]$ is subdivided into $n$ subintervals of length $\Delta = T/n$, and let $\nu$ be a probability measure on $[0,\Delta]$. We consider observations of the form
$$Y_i = \int_0^\Delta X_{(i-1)\Delta+s}\,d\nu(s), \qquad i = 1,\dots,n. \tag{1}$$
Typically, $\nu$ will have a density $u$ with respect to Lebesgue measure on $[0,\Delta]$, in which case
$$Y_i = \int_{(i-1)\Delta}^{i\Delta} X_s\,u\bigl(s-(i-1)\Delta\bigr)\,ds, \qquad i = 1,\dots,n. \tag{2}$$
If the observations are obtained by integrating uniformly over the time axis, $\nu$ is simply the uniform distribution on $[0,\Delta]$ with $u = 1/\Delta$. This gives the simpler observations
$$Y_i = \frac{1}{\Delta}\int_{(i-1)\Delta}^{i\Delta} X_s\,ds.$$
By stationarity, the law of $X$ is invariant under time translations, which implies that $\{Y_i\}$ is a stationary process.

We estimate the parameter $\theta$ of the underlying process $X$ by the method of prediction-based estimating functions introduced in Sørensen (2000). We now briefly outline the method.

Assume that $f_j$, $j = 1,\dots,N$, are one-dimensional functions such that $E_\theta(f_j(Y_i)^2) < \infty$ for all $\theta \in \Theta$. Here $E_\theta(\cdot)$ denotes expectation when $\theta$ is the true parameter value. Let $h_{jk}$, $j = 1,\dots,N$, $k = 0,\dots,q_j$ ($q_j \ge 0$), be functions from $\mathbb{R}^r$ into $\mathbb{R}$. For $i \ge r+1$, define random variables $Z_{jk}^{(i-1)} = h_{jk}(Y_{i-1}, Y_{i-2},\dots,Y_{i-r})$. We assume that $E_\theta((Z_{jk}^{(i-1)})^2) < \infty$ for all $\theta \in \Theta$. Let $\mathcal{P}_{i-1,j}$ denote the subspace of the space of square integrable random variables spanned by $Z_{j0}^{(i-1)},\dots,Z_{jq_j}^{(i-1)}$. Finally, we make the natural assumption that the functions $h_{j0},\dots,h_{jq_j}$ are linearly independent.

The space $\mathcal{P}_{i-1,j}$ can be interpreted as a set of predictors of $f_j(Y_i)$ based on $Y_{i-r},\dots,Y_{i-1}$. We write the elements of $\mathcal{P}_{i-1,j}$ in the form $a^T Z_j^{(i-1)}$, where $a^T = (a_0,\dots,a_{q_j})$ and $Z_j^{(i-1)} = (Z_{j0}^{(i-1)},\dots,Z_{jq_j}^{(i-1)})^T$ are $(q_j+1)$-dimensional vectors, and $T$ denotes transposition. In the rest of this paper, $h_{j0} = 1$. We study the estimating function
$$G_n(\theta) = \sum_{i=r+1}^{n}\sum_{j=1}^{N} \Pi_j^{(i-1)}(\theta)\Bigl[f_j(Y_i) - \hat\pi_j^{(i-1)}(\theta)\Bigr], \tag{3}$$
where $Y_i$ is of the form (2), $\Pi_j^{(i-1)}(\theta)$ is a $p$-dimensional stochastic vector whose coordinates belong to $\mathcal{P}_{i-1,j}$, and $\hat\pi_j^{(i-1)}(\theta)$ is the minimum mean square error predictor of $f_j(Y_i)$ in $\mathcal{P}_{i-1,j}$.

When $\theta$ is the true parameter value, we define $C_j(\theta)$ as the covariance matrix of $(Z_{j1}^{(r)},\dots,Z_{jq_j}^{(r)})^T$, and
$$b_j(\theta) = \Bigl(\mathrm{Cov}_\theta\bigl(Z_{j1}^{(r)}, f_j(Y_{r+1})\bigr),\dots,\mathrm{Cov}_\theta\bigl(Z_{jq_j}^{(r)}, f_j(Y_{r+1})\bigr)\Bigr)^T.$$
Then
$$\hat\pi_j^{(i-1)}(\theta) = \hat a_j(\theta)^T Z_j^{(i-1)}.$$
Here $\hat a_j(\theta)^T = (\hat a_{j0}(\theta), \check a_j(\theta)^T)$, and $\check a_j(\theta) = (\hat a_{j1}(\theta),\dots,\hat a_{jq_j}(\theta))^T$ is given by
$$\check a_j(\theta) = C_j(\theta)^{-1} b_j(\theta) \tag{4}$$
and
$$\hat a_{j0}(\theta) = E_\theta(f_j(Y_1)) - \sum_{k=1}^{q_j} \hat a_{jk}(\theta)\,E_\theta(Z_{jk}^{(r)}). \tag{5}$$
For instance, take $f_j(y) = y^{m_j}$ and $Z_{jk}^{(i-1)} = Y_{i-k}^{m_j}$, $k = 1,\dots,r$, for some positive integers $m_j$. Then we need the moments $E_\theta(Y_1^{m_j})$ and $E_\theta((Y_1Y_k)^{m_j})$ for $k = 1,\dots,r+1$. Once we have these moments, the vector of coefficients $\check a_j$ can easily be found by means of the Durbin–Levinson algorithm, see Brockwell & Davis (1991).
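The Durbin–Levinson step above can be sketched in Python. This is a minimal illustration, not code from the paper. It assumes that the autocovariances γ(h) = Cov_θ(Y_1^{m_j}, Y_{1+h}^{m_j}), h = 0, …, r, have already been computed from the moments just listed. The function names `durbin_levinson` and `intercept` are hypothetical. The recursion solves the Toeplitz system (4), with `phi[k-1]` multiplying Z_{jk}^{(i-1)} = Y_{i-k}^{m_j}. `intercept` evaluates (5) for this particular choice of f_j and Z_{jk}, where every mean equals E_θ(Y_1^{m_j}).

```python
def durbin_levinson(gamma):
    """Coefficients of the best linear predictor of V_{r+1} from V_r, ..., V_1.

    gamma[h] is the autocovariance at lag h (h = 0, ..., r) of a stationary
    series V (here V_i = Y_i**m_j). Returns (phi, v): phi[k-1] is the weight on
    lag k, i.e. on V_{r+1-k}, and v is the one-step mean square prediction error.
    """
    r = len(gamma) - 1
    if r < 1 or gamma[0] <= 0:
        raise ValueError("need gamma[0] > 0 and at least one lag")
    phi = [gamma[1] / gamma[0]]          # phi_{11}
    v = gamma[0] * (1.0 - phi[0] ** 2)   # v_1
    for n in range(2, r + 1):
        # phi_{nn} = (gamma(n) - sum_{k=1}^{n-1} phi_{n-1,k} gamma(n-k)) / v_{n-1}
        acc = gamma[n] - sum(phi[k] * gamma[n - 1 - k] for k in range(n - 1))
        phi_nn = acc / v
        # phi_{n,k} = phi_{n-1,k} - phi_{nn} * phi_{n-1,n-k}, k = 1, ..., n-1
        phi = [phi[k] - phi_nn * phi[n - 2 - k] for k in range(n - 1)] + [phi_nn]
        v *= 1.0 - phi_nn ** 2
    return phi, v


def intercept(mean, phi):
    """a_hat_{j0} from (5) when E f_j(Y_1) and every E Z_{jk} equal `mean`."""
    return mean * (1.0 - sum(phi))
```

As a sanity check, consider the autocovariances γ(h) = ρ^h/(1 − ρ²) of an AR(1) series. For these, the function should return weights close to [ρ, 0, …, 0] and a prediction error close to 1, up to rounding. The recursion costs O(r²) operations, against O(r³) for inverting C_j(θ) directly. The two-dimensional version mentioned in the text is not covered by this sketch.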
For many diffusions there exist $K > 0$ and $\lambda > 0$ such that $|\mathrm{Cov}_\theta(Y_1^m, Y_{r+1}^m)| \le K e^{-\lambda(r-1)}$, see Section 4. Therefore $r$ will usually not need to be particularly large. In many cases a reasonable choice is $f_1(y) = y$ and $f_2(y) = y^2$, with $Z_{jk}^{(i-1)} = Y_{i-k}$, $k = 1,\dots,r$, $j = 1,2$, and $Z_{2k}^{(i-1)} = Y_{i+r-k}^2$, $k = r+1,\dots,2r$. In this case the minimum mean square error predictor of $f_1(Y_i)$ can be found as described above. The predictor of $f_2(Y_i)$ can be found by applying the two-dimensional Durbin–Levinson algorithm to the process $(Y_i, Y_i^2)$.

The necessary moments can in all these cases be found from the mixed moments of the process $X$. First assume that $E_\theta(|X_0|^m) < \infty$. Then, by Fubini's theorem,
$$E_\theta(Y_1^m) = E_\theta\Bigl(\Bigl(\int_0^\Delta X_s u(s)\,ds\Bigr)^m\Bigr) = E_\theta\Bigl(\int_0^\Delta\!\!\cdots\!\int_0^\Delta X_{s_1}\cdots X_{s_m}\,u(s_1)\cdots u(s_m)\,ds_1\cdots ds_m\Bigr) = \int_0^\Delta\!\!\cdots\!\int_0^\Delta E_\theta(X_{s_1}\cdots X_{s_m})\,u(s_1)\cdots u(s_m)\,ds_1\cdots ds_m.$$
In particular,
$$E_\theta(Y_1) = \int_0^\Delta E_\theta(X_s)u(s)\,ds = E_\theta(X_0)\int_0^\Delta u(s)\,ds = E_\theta(X_0).$$
When $u$ is constant ($u = 1/\Delta$),
$$E_\theta(Y_1^m) = \frac{m!}{\Delta^m}\int_0^\Delta\int_0^{s_1}\!\!\cdots\int_0^{s_{m-1}} E_\theta(X_{s_1}\cdots X_{s_m})\,ds_m\cdots ds_2\,ds_1.$$
Here we have used that $E_\theta(X_{s_1}\cdots X_{s_m})$ does not depend on the ordering of $s_1,\dots,s_m$. It is therefore enough to integrate over the region $0 \le s_m \le \cdots \le s_1 \le \Delta$. The factor $m!$ appears because $s_1,\dots,s_m$ can be ordered in $m!$ different ways. Similarly, when $E_\theta(|X_0|^{m_1+m_2}) < \infty$,
$$E_\theta\bigl(Y_1^{m_1}Y_k^{m_2}\bigr) = \int_0^\Delta\!\!\cdots\!\int_0^\Delta E_\theta\bigl(X_{s_1}\cdots X_{s_{m_1}} X_{(k-1)\Delta+s_{m_1+1}}\cdots X_{(k-1)\Delta+s_{m_1+m_2}}\bigr)\,u(s_1)\cdots u(s_{m_1+m_2})\,ds_1\cdots ds_{m_1+m_2}.$$

Example 1. For diffusion models where the eigenfunctions of the generator are polynomials, it is possible to find all moments of the type $E(X_{t_1}\cdots X_{t_m})$, see e.g. Sørensen (2000).
Consider the Ornstein–Uhlenbeck process given by
$$dX_t = -\beta X_t\,dt + \sigma\,dW_t.$$
Provided that $\beta > 0$, this process is ergodic, and its stationary distribution is the normal distribution with expectation 0 and variance $\sigma^2/(2\beta)$. This process was recently used to model the Greenland ice-core data mentioned in the Introduction. The observations were of the integrated process; for details see Ditlevsen et al. (2002). We have that
$$E_\theta(X_t \mid X_0 = x_0) = x_0 e^{-\beta t} \quad\text{and}\quad E_\theta(X_0X_t) = \sigma^2(2\beta)^{-1}e^{-\beta t}.$$
With uniform weights ($u = 1/\Delta$), this implies that
$$E_\theta(Y_1^2) = \sigma^2\beta^{-3}\Delta^{-2}\bigl(\beta\Delta - 1 + e^{-\beta\Delta}\bigr)$$
and
$$E_\theta(Y_1Y_k) = \tfrac12\sigma^2\beta^{-3}\Delta^{-2}\bigl(1 - e^{-\beta\Delta}\bigr)^2 e^{-(k-2)\beta\Delta}$$
for $k > 1$. Take $f_1(y) = y$, $f_2(y) = y^2$, $Z_{1,0}^{(i-1)} = 1$, $Z_{1,1}^{(i-1)} = Y_{i-1}$ and $Z_{2,0}^{(i-1)} = 1$ (i.e. $q_1 = 1$, $q_2 = 0$, $r = 1$). Then
$$\hat\pi_1^{(i-1)}(\theta) = \frac{(1 - e^{-\beta\Delta})^2}{2(\beta\Delta - 1 + e^{-\beta\Delta})}\,Y_{i-1} \quad\text{and}\quad \hat\pi_2^{(i-1)}(\theta) = \frac{\sigma^2(\beta\Delta - 1 + e^{-\beta\Delta})}{\beta^3\Delta^2}.$$
One possible estimating equation is obtained by choosing $\Pi_1^{(i-1)}(\theta) = (1,0)^T$ and $\Pi_2^{(i-1)}(\theta) = (0,1)^T$. This is, however, not the optimal estimating function based on the predictors $\hat\pi_1^{(i-1)}(\theta)$ and $\hat\pi_2^{(i-1)}(\theta)$. The optimal estimating function is found in the next section.

Example 2. Another example is the CIR model given by
$$dX_t = -\beta(X_t - \alpha)\,dt + \sigma\sqrt{X_t}\,dW_t.$$
Provided that $\beta > 0$, $\alpha > 0$, $\sigma > 0$ and $2\beta\alpha \ge \sigma^2$, this process is ergodic. Its stationary distribution is the gamma distribution with shape parameter $2\beta\alpha\sigma^{-2}$ and scale parameter $\sigma^2/(2\beta)$. The process has many applications:
- Cox et al. (1985) introduced it in mathematical finance as a model of the short-term interest rate.
- Feller (1951) proposed it as a model for population growth.
- Pedersen (2000) recently used it to model nitrous oxide emission from soil.

All moments of the type $E_\theta(X_{t_1}\cdots X_{t_m})$ can be calculated by means of formulae in Sørensen (2000).
In particular, $E_\theta(X_0) = \alpha$, $E_\theta(X_0^2) = \alpha(\alpha + \sigma^2/(2\beta))$ and
$$E_\theta(X_0X_t) = \alpha^2 + \alpha\sigma^2(2\beta)^{-1}e^{-\beta t}.$$
Thus, again with $u = 1/\Delta$,
$$E_\theta(Y_1^2) = \alpha^2 + \alpha\sigma^2\beta^{-3}\Delta^{-2}\bigl(e^{-\beta\Delta} - 1 + \beta\Delta\bigr)$$
and
$$E_\theta(Y_1Y_k) = \alpha^2 + \tfrac12\alpha\sigma^2\beta^{-3}\Delta^{-2}\bigl(e^{-\beta\Delta} - 1\bigr)^2 e^{-(k-2)\beta\Delta}$$
for $k > 1$. Take $f_1(y) = y$, $f_2(y) = y^2$, $Z_{1,0}^{(i-1)} = 1$, $Z_{1,1}^{(i-1)} = Y_{i-1}$ and $Z_{2,0}^{(i-1)} = 1$ (i.e. $q_1 = 1$, $q_2 = 0$, $r = 1$). Then
$$\hat\pi_1^{(i-1)}(\theta) = \alpha(1 - a_{11}(\beta)) + a_{11}(\beta)Y_{i-1} \quad\text{and}\quad \hat\pi_2^{(i-1)}(\theta) = a_{20}(\theta),$$
where
$$a_{11}(\beta) = \frac{(1 - e^{-\beta\Delta})^2}{2(\beta\Delta - 1 + e^{-\beta\Delta})} \quad\text{and}\quad a_{20}(\theta) = E_\theta(Y_1^2).$$
Thus, by choosing $\Pi_1^{(i-1)}(\theta) = (1, Y_{i-1}, 0)^T$ and $\Pi_2^{(i-1)}(\theta) = (0,0,1)^T$, we end up with the estimating equations
$$\frac{1}{n-1}\sum_{i=2}^n Y_i = \hat\alpha\bigl(1 - a_{11}(\hat\beta)\bigr) + a_{11}(\hat\beta)\,\frac{1}{n-1}\sum_{i=2}^n Y_{i-1},$$
$$\sum_{i=2}^n Y_{i-1}Y_i = \hat\alpha\bigl(1 - a_{11}(\hat\beta)\bigr)\sum_{i=2}^n Y_{i-1} + a_{11}(\hat\beta)\sum_{i=2}^n Y_{i-1}^2,$$
$$\hat\sigma^2 = \frac{\hat\beta^3\Delta^2\bigl(\sum_{i=2}^n Y_i^2 - (n-1)\hat\alpha^2\bigr)}{(n-1)\hat\alpha\bigl(e^{-\hat\beta\Delta} - 1 + \hat\beta\Delta\bigr)}.$$
Note that
$$\hat\alpha = \frac{1}{n-1}\sum_{i=1}^{n-1} Y_i + \frac{Y_n - Y_1}{(n-1)(1 - a_{11}(\hat\beta))}$$
is essentially the average of the observations when $n$ is large and $a_{11}(\hat\beta)$ is not close to 1. We shall see in the next section that the estimating function found here is, in fact, optimal.

3. The optimal prediction-based inference for integrated diffusions

In this section we derive the optimal choice of the weight $\Pi^{(i-1)}(\theta)$ in (3), using the results and notation of Sørensen (2000). Optimality is in the sense of the theory of estimating functions, see Godambe & Heyde (1987) and Heyde (1997). The optimal member of a class of estimating functions is the one that yields the most efficient estimator, which is sometimes called a quasi-likelihood estimator. Sørensen (2000) showed that the optimal estimating function of the type (3) is given by
$$G_n^*(\theta) = A_n^*(\theta)\sum_{i=r+1}^{n} H^{(i)}(\theta), \tag{6}$$
where
$$H^{(i)}(\theta) = Z^{(i-1)}\bigl(F(Y_i) - \hat\pi^{(i-1)}(\theta)\bigr), \tag{7}$$
with $F(x) = (f_1(x),\dots,f_N(x))^T$, $\hat\pi^{(i-1)}(\theta) = (\hat\pi_1^{(i-1)}(\theta),\dots,\hat\pi_N^{(i-1)}(\theta))^T$ and
$$Z^{(i-1)} = \begin{pmatrix} Z_1^{(i-1)} & 0_{q_1+1} & \cdots & 0_{q_1+1}\\ 0_{q_2+1} & Z_2^{(i-1)} & \cdots & 0_{q_2+1}\\ \vdots & \vdots & \ddots & \vdots\\ 0_{q_N+1} & 0_{q_N+1} & \cdots & Z_N^{(i-1)} \end{pmatrix}. \tag{8}$$
Here $0_{q_j+1}$ denotes the $(q_j+1)$-dimensional zero vector, matching the dimension of $Z_j^{(i-1)}$. Finally,
$$A_n^*(\theta) = \partial_\theta \hat a(\theta)^T\,\bar C(\theta)\,\bar M_n(\theta)^{-1}, \tag{9}$$
with
$$\bar M_n(\theta) = E_\theta\bigl(H^{(r+1)}(\theta)H^{(r+1)}(\theta)^T\bigr) + \sum_{k=1}^{n-r-1}\frac{n-r-k}{n-r}\Bigl[E_\theta\bigl(H^{(r+1)}(\theta)H^{(r+1+k)}(\theta)^T\bigr) + E_\theta\bigl(H^{(r+1+k)}(\theta)H^{(r+1)}(\theta)^T\bigr)\Bigr], \tag{10}$$
$$\bar C(\theta) = E_\theta\bigl(Z^{(i-1)}(Z^{(i-1)})^T\bigr), \tag{11}$$
and
$$\hat a(\theta)^T = \bigl(\hat a_1(\theta)^T,\dots,\hat a_N(\theta)^T\bigr), \tag{12}$$
where $\hat a_j(\theta)$ is given by (4) and (5). A sufficient condition for (6) to be optimal is that both of the following hold:
- the matrix $\partial_\theta \hat a(\theta)^T$ has full rank;
- the functions $1, f_1,\dots,f_N$ are linearly independent on the support of the conditional distribution of $Y_n$ given $Y_1,\dots,Y_{n-1}$.

In particular, the latter condition implies that the matrix $\bar M_n(\theta)$ is invertible.

In Section 4 we shall see that for many diffusion models there exist $K > 0$ and $\lambda > 0$ with the following property: for $k > r$, the absolute values of all entries of the expectation matrices in the sum in (10) are dominated by $K e^{-\lambda(k-r-1)}$. In practice, the sum in (10) can therefore often be truncated, so that fewer moments need to be calculated.

Natural choices for $f_j(y)$ and $Z_{jk}^{(i-1)}$ are $f_j(y) = y^{\kappa_{j0}}$ and $Z_{jk}^{(i-1)} = Y_{i-\ell_{jk}}^{\kappa_{jk}}$, $k = 1,\dots,q_j$, with lags $1 \le \ell_{jk} \le r$. The exponents $\kappa_{j0}$ and $\kappa_{jk}$ must be such that $E_\theta[Y_1^{4\kappa}]$ exists, where $\kappa = \max\{\kappa_{10},\dots,\kappa_{Nq_N}\}$. A prediction-based estimating function is already well defined when $E_\theta[Y_1^{2\kappa}]$ exists. The stronger condition is needed for the optimal prediction-based estimating function to exist. From now on we assume that $f_j$ and $Z_{jk}^{(i-1)}$ have the form just indicated, and for simplicity that $\kappa_{j0}$ and $\kappa_{jk}$ are integers. To calculate (10), we then need higher-order moments of the form $E_\theta[Y_1^{k_1}Y_{t_1}^{k_2}Y_{t_2}^{k_3}Y_{t_3}^{k_4}]$. Here $1 \le t_1 \le t_2 \le t_3$, and $k_i$, $i = 1,\dots,4$, are non-negative integers such that $k_1 + k_2 + k_3 + k_4 \le 4\kappa$.
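Once these moments are available, the truncation of (10) mentioned above is straightforward to code. The Python sketch below is illustrative only. It assumes a user-supplied, hypothetical function `cross_moment(k)` that returns the matrix E_θ(H^{(r+1)}(θ)H^{(r+1+k)}(θ)^T) as a square list of lists. That matrix could be computed, for instance, from formula (13) for uniform weights or by simulation. Only one family of matrices is needed, because E_θ(H^{(r+1+k)}H^{(r+1)T}) is the transpose of E_θ(H^{(r+1)}H^{(r+1+k)T}). The truncation lag `K` is a tuning choice left to the user.

```python
def mbar_n(cross_moment, n, r, K=None):
    """Truncated version of (10).

    cross_moment(k) must return E[H^(r+1) H^(r+1+k)^T] as a square list of lists
    (k = 0 gives E[H^(r+1) H^(r+1)^T]). Lags k > K are dropped; K=None keeps all
    n - r - 1 lags, which reproduces (10) exactly.
    """
    max_lag = n - r - 1 if K is None else min(K, n - r - 1)
    m0 = cross_moment(0)
    d = len(m0)
    total = [[float(m0[a][b]) for b in range(d)] for a in range(d)]
    for k in range(1, max_lag + 1):
        c = cross_moment(k)
        w = (n - r - k) / (n - r)            # weight (n - r - k)/(n - r)
        for a in range(d):
            for b in range(d):
                # c[a][b] comes from E[H^(r+1) H^(r+1+k)^T], c[b][a] from its transpose
                total[a][b] += w * (c[a][b] + c[b][a])
    return total
```

Because every lag k contributes a term plus its transpose, the result is symmetric by construction, as $\bar M_n(\theta)$ must be. With geometrically decaying entries, a modest `K` typically changes the result very little.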
We will express these moments in terms of the moments of $X_t$, which will usually either be known or possible to determine by simulation. Define
$$\psi(\mathbf v,\mathbf u,\mathbf s,\mathbf r;\theta) = E_\theta\bigl[X_{v_1}\cdots X_{v_{k_1}}X_{u_1}\cdots X_{u_{k_2}}X_{s_1}\cdots X_{s_{k_3}}X_{r_1}\cdots X_{r_{k_4}}\bigr],$$
where $\mathbf v = (v_1,\dots,v_{k_1})$, $\mathbf u = (u_1,\dots,u_{k_2})$, $\mathbf s = (s_1,\dots,s_{k_3})$ and $\mathbf r = (r_1,\dots,r_{k_4})$. Further define
$$\phi(\mathbf x;k,t,\Delta) = u\bigl(x_1-(t-1)\Delta\bigr)\cdots u\bigl(x_k-(t-1)\Delta\bigr),$$
where $k$ is an integer and $\mathbf x = (x_1,\dots,x_k)$, and
$$\Phi(\mathbf v,\mathbf u,\mathbf s,\mathbf r;t_1,t_2,t_3,\Delta) = \phi(\mathbf v;k_1,1,\Delta)\,\phi(\mathbf u;k_2,t_1,\Delta)\,\phi(\mathbf s;k_3,t_2,\Delta)\,\phi(\mathbf r;k_4,t_3,\Delta).$$
Finally, let
$$\begin{aligned} A_1 &= [0,\Delta]^{k_1}, & T_1 &= \{(v_1,\dots,v_{k_1}) : 0 \le v_1 \le \cdots \le v_{k_1} \le \Delta\},\\ A_2 &= [(t_1-1)\Delta, t_1\Delta]^{k_2}, & T_2 &= \{(u_1,\dots,u_{k_2}) : (t_1-1)\Delta \le u_1 \le \cdots \le u_{k_2} \le t_1\Delta\},\\ A_3 &= [(t_2-1)\Delta, t_2\Delta]^{k_3}, & T_3 &= \{(s_1,\dots,s_{k_3}) : (t_2-1)\Delta \le s_1 \le \cdots \le s_{k_3} \le t_2\Delta\},\\ A_4 &= [(t_3-1)\Delta, t_3\Delta]^{k_4}, & T_4 &= \{(r_1,\dots,r_{k_4}) : (t_3-1)\Delta \le r_1 \le \cdots \le r_{k_4} \le t_3\Delta\}, \end{aligned}$$
$$B = (A_1\cap T_1)\times(A_2\cap T_2)\times(A_3\cap T_3)\times(A_4\cap T_4).$$
In the same way as in Section 2 we get
$$E_\theta\bigl[Y_1^{k_1}Y_{t_1}^{k_2}Y_{t_2}^{k_3}Y_{t_3}^{k_4}\bigr] = \int_{A_1\times A_2\times A_3\times A_4}\psi(\mathbf v,\mathbf u,\mathbf s,\mathbf r;\theta)\,\Phi(\mathbf v,\mathbf u,\mathbf s,\mathbf r;t_1,t_2,t_3,\Delta)\,d\tau,$$
where $d\tau = dr_{k_4}\cdots dr_1\,ds_{k_3}\cdots ds_1\,du_{k_2}\cdots du_1\,dv_{k_1}\cdots dv_1$.

Thus we need the mixed moments of the process $X$ of order up to $k_1+k_2+k_3+k_4$. These depend on the time distances between the variables $X_{t_i}$ appearing in the moment. Care is therefore needed when several variables are integrated over the same interval, because the ordering of the integration variables then changes within the domain of integration. When $u(t) = 1/\Delta$, this can be handled as follows. Assume that $1 < t_1 < t_2 < t_3$. Arguments of symmetry yield
$$E_\theta\bigl[Y_1^{k_1}Y_{t_1}^{k_2}Y_{t_2}^{k_3}Y_{t_3}^{k_4}\bigr] = \frac{k_1!\,k_2!\,k_3!\,k_4!}{\Delta^{k_1+k_2+k_3+k_4}}\int_B \psi(\mathbf v,\mathbf u,\mathbf s,\mathbf r;\theta)\,d\tau. \tag{13}$$
The factor $k_1!$ appears because $v_1,\dots,v_{k_1}$ can be ordered in $k_1!$ different ways. The arguments for the other factors are similar.

Example 3 (Example 1 continued).
We now find the optimal prediction-based estimating function for the integrated Ornstein–Uhlenbeck process with $N = 2$, $f_1(y) = y$, $f_2(y) = y^2$, $Z_{1,0}^{(i-1)} = Z_{2,0}^{(i-1)} = 1$ and $Z_{1,1}^{(i-1)} = Y_{i-1}$ (i.e. $q_1 = r = 1$, $q_2 = 0$). First note that $\bar C(\theta)$ is diagonal and $\hat a_{10}(\theta) = 0$. It follows that the first column of the matrix $\partial_\theta \hat a(\theta)^T\bar C(\theta)$ is zero. Further, $E_\theta(Y_1^{k_1}Y_{t_1}^{k_2}Y_{t_2}^{k_3}) = 0$ for $1 \le t_1 \le t_2$ and non-negative integers $k_i$ satisfying $k_1+k_2+k_3 = 3$, because all moments of odd order are zero. Therefore $\bar M_n(\theta)$ has the form
$$\bar M_n(\theta) = \begin{pmatrix} m_{11} & 0 & 0\\ 0 & m_{22} & m_{23}\\ 0 & m_{32} & m_{33}\end{pmatrix}.$$
Together with the observation above, this implies that $A_n^*(\theta)$ has the form
$$A_n^*(\theta) = \begin{pmatrix} 0 & a_{12} & a_{13}\\ 0 & a_{22} & a_{23}\end{pmatrix}.$$
The $2\times2$ matrix $\{a_{ij}\}$ is invertible, because $\partial_\theta\hat a(\theta)^T$ has full rank and the matrices $\bar C(\theta)$ and $\bar M_n(\theta)$ are invertible. Hence we end up with the following optimal prediction-based estimating equations for $\theta$:
$$\frac{(1 - e^{-\hat\beta\Delta})^2}{2(\hat\beta\Delta - 1 + e^{-\hat\beta\Delta})} = \frac{\sum_{i=2}^n Y_{i-1}Y_i}{\sum_{i=2}^n Y_{i-1}^2}, \qquad \hat\sigma^2 = \hat\beta^3\Delta^2\bigl(\hat\beta\Delta - 1 + e^{-\hat\beta\Delta}\bigr)^{-1}\frac{1}{n-1}\sum_{i=2}^n Y_i^2.$$
Note that $\hat\beta\Delta - 1 + e^{-\hat\beta\Delta} > 0$ when $\hat\beta \ne 0$. Hence there is no solution if $\sum_{i=2}^n Y_{i-1}Y_i \le 0$. As $Y_i$ and $Y_{i-1}$ are positively correlated, the probability that this happens goes to zero as $n \to \infty$.

Example 4 (Example 2 continued). Consider again the CIR model. We have found a three-dimensional estimating function to estimate three parameters, so it is necessarily optimal. Here $A_n^*(\theta)$ is a $3\times3$ matrix, invertible by arguments similar to those in Example 3, so multiplying by it does not change the estimator. There is therefore no need to find an expression for this matrix. Suppose, however, that one of the parameters, $\alpha$ say, is known. We may then want the optimal combination of the three coordinates of the prediction-based estimating function derived in Example 2 into a two-dimensional estimating function. For this we need moments of the form $E_\theta(Y_1Y_{t_1}Y_{t_2})$ and $E_\theta(Y_1Y_{t_1}Y_{t_2}Y_{t_3})$, $1 \le t_1 \le t_2 \le t_3$.
By the formulae above, these can be obtained by integrating moments of the form $E_\theta(X_{s_1}X_{s_2}X_{s_3})$ and $E_\theta(X_{s_1}X_{s_2}X_{s_3}X_{s_4})$. Explicit and easily integrable expressions for these are known, see e.g. Sørensen (2000). As the resulting expressions are rather long, they are omitted.

4. Asymptotic results

In this section we give asymptotic results for our estimating functions and the corresponding estimators when the observations are integrated diffusions, based on general results in Sørensen (1999, 2000). To do this we need to study which properties the process $Y$ inherits from the underlying diffusion process $X$. The integrated process $Y$ is not a Markov process, but mixing properties and moment conditions satisfied by $X$ are preserved. This is what we use in this section. We begin with a result on the asymptotic behaviour of an estimating function of the general form
$$G_n(\theta) = A_n(\theta)\sum_{i=r+1}^{n} H^{(i)}(\theta), \tag{14}$$
where $\{A_n(\theta)\}$ is a sequence of $p \times \sum_{j=1}^N (q_j+1)$ matrices and $H^{(i)}(\theta)$ is given by (7).

Theorem 1. Suppose the diffusion process $X$ is stationary and $\alpha$-mixing with mixing coefficients $\alpha_t(\theta)$, $t > 0$. Suppose also that there exists a $\delta > 0$ such that
$$\sum_{k=1}^\infty \alpha_{k\Delta}(\theta)^{\delta/(2+\delta)} < \infty \tag{15}$$
and
$$E_\theta\Bigl(\bigl|H^{(r+1)}(\theta)_{jk}\bigr|^{2+\delta}\Bigr) < \infty, \qquad j = 1,\dots,N,\ k = 0,\dots,q_j. \tag{16}$$
Then, as $n \to \infty$,
$$\bar M_n(\theta) \to M(\theta), \tag{17}$$
where $\bar M_n(\theta)$ is given by (10) and
$$M(\theta) = E_\theta\bigl(H^{(r+1)}(\theta)H^{(r+1)}(\theta)^T\bigr) + \sum_{k=1}^\infty\Bigl\{E_\theta\bigl(H^{(r+1)}(\theta)H^{(r+1+k)}(\theta)^T\bigr) + E_\theta\bigl(H^{(r+1+k)}(\theta)H^{(r+1)}(\theta)^T\bigr)\Bigr\}. \tag{18}$$
Assume, moreover, that $A_n(\theta) \to A(\theta)$ as $n \to \infty$. Then, as $n \to \infty$,
$$n^{-1}\mathrm{Var}_\theta(G_n(\theta)) \to V(\theta) = A(\theta)M(\theta)A(\theta)^T \tag{19}$$
and
$$\frac{1}{\sqrt n}G_n(\theta) \to N(0, V(\theta)) \tag{20}$$
in distribution, provided that the matrix $A(\theta)$ is such that $A(\theta)M(\theta)A(\theta)^T$ is strictly positive definite.

Proof.
First note that the process $Y$ is stationary and $\alpha$-mixing with mixing coefficients $\alpha_k^Y(\theta)$ satisfying $\alpha_k^Y(\theta) \le \alpha_{(k-1)\Delta}(\theta)$, $k = 2, 3,\dots$. To see this, observe the following:
- the $\sigma$-algebra generated by $Y_i$, $i = 1,\dots,n$, is contained in the $\sigma$-algebra generated by $X_u$, $0 \le u \le n\Delta$;
- the $\sigma$-algebra generated by $Y_i$, $i = n+k, n+k+1,\dots$, is contained in the $\sigma$-algebra generated by $X_u$, $u \ge (n+k-1)\Delta$.

The two families of $X$-variables are thus separated by a time lag of $(k-1)\Delta$.

Next, $H^{(i)}(\theta)$ is a function of $Y_{i-r},\dots,Y_i$. Hence the process $H^{(i)}(\theta)$, $i = r+1, r+2,\dots$, is $\alpha$-mixing with mixing coefficients $\alpha_k^H(\theta)$ satisfying $\alpha_k^H(\theta) \le \alpha_{(k-r-1)\Delta}(\theta)$, $k = r+2,\dots$. Consequently, (15) holds with $\alpha_{k\Delta}(\theta)$ replaced by $\alpha_k^H(\theta)$. To prove asymptotic normality, it is enough to consider the one-dimensional process $v^TG_n(\theta)$ for every $v \in \mathbb{R}^p\setminus\{0\}$ (Cramér–Wold device). The theorem then follows from a classical central limit result for strongly mixing processes, see e.g. Theorem 1 in Section 1.5 of Doukhan (1994).

For the optimal estimating function with $A_n(\theta) = A_n^*(\theta)$ given by (9),
$$A_n^*(\theta) \to \partial_\theta\hat a(\theta)^T\bar C(\theta)M(\theta)^{-1}$$
under the conditions of Theorem 1, provided that $M(\theta)$ is invertible. Hence
$$V(\theta) = \partial_\theta\hat a(\theta)^T\bar C(\theta)M(\theta)^{-1}\bar C(\theta)\,\partial_{\theta^T}\hat a(\theta) \tag{21}$$
in the optimal case.

The conditions imposed in Theorem 1 to ensure a central limit theorem are not the weakest possible. In fact, the rate at which the mixing coefficient $\alpha_t(\theta)$ needs to go to zero to obtain a limit theorem depends on how heavy the tails of the marginal distribution of $X_t$ are, see Doukhan et al. (1994). That paper also shows that condition (16) can be weakened slightly in the case of exponentially decreasing mixing coefficients. For the one-dimensional ergodic diffusion process $X$ there are a number of relatively simple criteria ensuring $\alpha$-mixing with exponentially decreasing mixing coefficients, for which (15) is obviously satisfied.
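A short calculation makes the last claim explicit. For illustration, assume a geometric bound $\alpha_t(\theta) \le K e^{-\lambda t}$ with constants $K, \lambda > 0$, as in the discrete-spectrum case discussed below. A geometric series then gives, for any $\delta > 0$,

$$\sum_{k=1}^\infty \alpha_{k\Delta}(\theta)^{\delta/(2+\delta)} \le K^{\delta/(2+\delta)}\sum_{k=1}^\infty e^{-k\lambda\Delta\delta/(2+\delta)} = \frac{K^{\delta/(2+\delta)}}{e^{\lambda\Delta\delta/(2+\delta)} - 1} < \infty.$$

So (15) holds for every $\delta > 0$ under geometric mixing. The choice of $\delta$ is then dictated only by the moment condition (16).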
If, for instance, the generator of $X$ has a discrete spectrum, then the process is $\alpha$-mixing. If $\lambda_1$ denotes the smallest non-zero eigenvalue, then the mixing coefficients satisfy
$$\alpha_t(\theta_0) \le e^{-t\lambda_1},$$
see Doukhan (1994, p. 112). Thus $X$ is geometrically $\alpha$-mixing. The diffusion processes considered in Examples 1 and 2 have a discrete spectrum with $\lambda_1 = \beta$. Other criteria for geometric mixing can be found in Veretennikov (1987, 1989), Doukhan (1994) and Hansen & Scheinkman (1995). Kusuoka & Yoshida (2000) gave rather general criteria for geometric $\alpha$-mixing of diffusion processes, expressed in the language of Malliavin calculus. We cite the following straightforward and rather weak set of conditions from Genon-Catalot et al. (2000). These conditions on the coefficients $b$ and $\sigma$ are sufficient to ensure geometric $\alpha$-mixing of $X$. It is presupposed that $X$ is stationary with state space $(l, r)$ ($-\infty \le l < r \le \infty$). It is further presupposed that the usual conditions on the scale measure and the speed measure hold, i.e. that
$$\int_l^r m(x)\,dx < \infty, \qquad \int_l^{x_0} s(x)\,dx = \infty \quad\text{and}\quad \int_{x_0}^r s(x)\,dx = \infty,$$
where
$$s(x) = \exp\Bigl(-2\int_{x_0}^x \frac{b(u)}{\sigma^2(u)}\,du\Bigr) \quad\text{and}\quad m(x) = \frac{1}{\sigma^2(x)s(x)},$$
and where $x_0 \in (l, r)$.

Condition 1
(i) The function $b$ is continuously differentiable and $\sigma$ is twice continuously differentiable on $(l, r)$, and $\sigma(x) > 0$ for all $x \in (l, r)$. Moreover, there exists a constant $K > 0$ such that $|b(x)| \le K(1+|x|)$ and $\sigma^2(x) \le K(1+x^2)$ for all $x \in (l, r)$.
(ii) $\sigma(x)m(x) \to 0$ as $x \downarrow l$ and $x \uparrow r$.
(iii) $1/\gamma(x)$ has a finite limit as $x \downarrow l$ and $x \uparrow r$, where $\gamma(x) = \sigma'(x) - 2b(x)/\sigma(x)$.

This condition in fact implies more than geometric $\alpha$-mixing: it ensures geometric $\rho$-mixing. That in turn implies the exponential bounds on certain moments mentioned in Sections 2 and 3. Specifically, there exist $K > 0$ and $\lambda > 0$ such that if $Z_1$ is measurable with
respect to the $\sigma$-algebra generated by $X_s$, $s \le t_1$, and $Z_2$ is measurable with respect to the $\sigma$-algebra generated by $X_s$, $s \ge t_2$, $t_1 < t_2$, then
$$|\mathrm{Cov}(Z_1, Z_2)| \le K e^{-\lambda(t_2-t_1)}\sqrt{\mathrm{Var}(Z_1)\mathrm{Var}(Z_2)}.$$
Veretennikov (1997) gave conditions ensuring polynomial $\alpha$-mixing.

The following lemma can be used to check the moment condition (16) in Theorem 1.

Lemma 1. Suppose $f_j(y) = y^{\kappa_{j0}}$ and $Z_{jk}^{(i-1)} = Y_{i-\ell_{jk}}^{\kappa_{jk}}$ with $\kappa_{jk}, \ell_{jk} \ge 1$ ($j = 1,\dots,N$, $k = 0,\dots,q_j$). Let $\kappa = \max\{\kappa_{10},\dots,\kappa_{Nq_N}\}$. If $E_\theta(|X_0|^{4\kappa+\epsilon}) < \infty$ for some $\epsilon > 0$, then (16) holds with $\delta = \epsilon/(2\kappa)$.

Proof. It is enough to check that $E_\theta\bigl(|Y_i^{m_1}Y_1^{m_2}|^{2+\delta}\bigr) < \infty$ for $1 \le i \le r$ and $m_1, m_2 \in \{\kappa_{10},\dots,\kappa_{Nq_N}\}$. By the Cauchy–Schwarz inequality, this is the case if $E_\theta(|Y_1|^{2\kappa(2+\delta)}) < \infty$. Finally, by Jensen's inequality, Fubini's theorem and the stationarity of $X$,
$$E_\theta\bigl(|Y_1|^{2\kappa(2+\delta)}\bigr) = E_\theta\Bigl(\Bigl|\int_0^\Delta X_u\,\nu(du)\Bigr|^{2\kappa(2+\delta)}\Bigr) \le \int_0^\Delta E_\theta\bigl(|X_u|^{2\kappa(2+\delta)}\bigr)\,\nu(du) = E_\theta\bigl(|X_0|^{2\kappa(2+\delta)}\bigr) < \infty.$$

For more general choices of $f_j(y)$ and $Z_{jk}^{(i-1)}$, the existence of the relevant moments must be checked. The following result about existence, consistency and asymptotic normality of our estimators can now be proved exactly as the similar result in Sørensen (2000).

Theorem 2. Let $\theta_0$ denote the true value of the parameter vector. Suppose the conditions of Theorem 1 hold and that, for $\theta$ in a neighbourhood $\tilde\Theta$ of $\theta_0$:
(i) The vector $\hat a(\theta)$ given by (12) and the matrix $A_n(\theta)$ are twice continuously differentiable with respect to $\theta$.
(ii) The matrices $\partial_{\theta^T}\hat a(\theta_0)$ and $A(\theta_0)$ have rank $p$.
(iii) The matrices $A_n(\theta)$, $\partial_{\theta_i}A_n(\theta)$ and $\partial_{\theta_i}\partial_{\theta_j}A_n(\theta)$ converge to $A(\theta)$, $\partial_{\theta_i}A(\theta)$ and $\partial_{\theta_i}\partial_{\theta_j}A(\theta)$, respectively, uniformly for $\theta \in \tilde\Theta$.

Then for every $n \ge r$, an estimator $\hat\theta_n$ exists that solves the estimating equation $G_n(\hat\theta_n) = 0$ with a probability tending to 1 as $n \to \infty$. Moreover,
$$\hat\theta_n \to \theta_0 \tag{22}$$
in probability and
$$\sqrt n(\hat\theta_n - \theta_0) \xrightarrow{\ \mathcal D\ } N\bigl(0, D(\theta_0)^{-1}V(\theta_0)(D(\theta_0)^{-1})^T\bigr) \tag{23}$$
as $n \to \infty$, with $D(\theta_0) = A(\theta_0)\bar C(\theta_0)\partial_{\theta^T}\hat a(\theta_0)$, where $\bar C(\theta_0)$ is given by (11).

For the optimal prediction-based estimating function, we have seen that $A(\theta) = \partial_\theta\hat a(\theta)^T\bar C(\theta)M(\theta)^{-1}$ and that $V(\theta)$ is given by (21). Therefore, in the optimal case, the asymptotic variance of $\sqrt n(\hat\theta_n - \theta_0)$ is
$$\Bigl(\partial_\theta\hat a(\theta_0)^T\bar C(\theta_0)M(\theta_0)^{-1}\bar C(\theta_0)\partial_{\theta^T}\hat a(\theta_0)\Bigr)^{-1}.$$

The conditions ensuring consistency and asymptotic normality are satisfied for the estimators derived earlier in this paper for the Ornstein–Uhlenbeck process and the CIR model.

5. Conclusion

We have demonstrated that the problem of statistical inference for integrated diffusions can be solved in a satisfactory and readily implementable way by means of optimal prediction-based estimating functions. Under mild regularity conditions the estimators are consistent and asymptotically normal. We have, moreover, considered some of the problems encountered when the method is implemented in practice. The moments needed to find the optimal prediction-based estimating function are easy to program when an analytic expression is known for the moments of the underlying diffusion process, or when these moments can be obtained by simulation. This was illustrated by two examples in which explicit estimators were derived.

We derived the optimal estimating functions and the asymptotic results under the condition that the intervals over which the diffusion is integrated all have the same length. When this is not the case, as in Ditlevsen et al. (2002), the observations do not form a stationary process. The optimal estimating functions then have a more complex form, and the asymptotic results are harder to prove.
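The explicit estimator of Example 3 can be implemented in a few lines, as the following minimal Python sketch shows. The simulation scheme is an assumption of this sketch, not part of the paper. The Ornstein–Uhlenbeck path is generated exactly on a fine grid of `substeps` points per interval. Each $Y_i = \Delta^{-1}\int_{(i-1)\Delta}^{i\Delta}X_s\,ds$ is then approximated by the trapezoidal rule. The estimator writes the left-hand side of the first equation of Example 3 as $g(\beta\Delta)$ and solves for $\hat\beta$ by bisection. This relies on $g$ decreasing from 1 to 0 on $(0,\infty)$, which the sketch assumes rather than proves. It then evaluates $\hat\sigma^2$ from the second equation. The function names are hypothetical.

```python
import math
import random


def g(x):
    """(1 - e^{-x})^2 / (2 (x - 1 + e^{-x})); expm1 avoids cancellation for small x."""
    return math.expm1(-x) ** 2 / (2.0 * (x + math.expm1(-x)))


def simulate_integrated_ou(n, delta, beta, sigma, substeps=20, seed=0):
    """Approximate Y_1..Y_n for dX = -beta X dt + sigma dW started in stationarity."""
    rng = random.Random(seed)
    h = delta / substeps
    rho = math.exp(-beta * h)
    step_sd = sigma * math.sqrt((1.0 - rho ** 2) / (2.0 * beta))  # exact OU transition
    x = rng.gauss(0.0, sigma / math.sqrt(2.0 * beta))              # stationary start
    ys = []
    for _ in range(n):
        total = 0.5 * x                       # trapezoidal rule on the fine grid
        for j in range(substeps):
            x = rho * x + step_sd * rng.gauss(0.0, 1.0)
            total += x if j < substeps - 1 else 0.5 * x
        ys.append(total / substeps)           # (h / delta) * sum = average over interval
    return ys


def estimate_integrated_ou(ys, delta):
    """Explicit estimator of Example 3; returns None when no solution exists."""
    num = sum(ys[i - 1] * ys[i] for i in range(1, len(ys)))
    den = sum(ys[i - 1] ** 2 for i in range(1, len(ys)))
    ratio = num / den
    if not 0.0 < ratio < 1.0:                 # g maps (0, inf) into (0, 1)
        return None
    lo, hi = 1e-6, 1.0
    while g(hi) > ratio:                      # bracket the root of g(x) = ratio
        hi *= 2.0
    for _ in range(200):                      # bisection, assuming g is decreasing
        mid = 0.5 * (lo + hi)
        if g(mid) > ratio:
            lo = mid
        else:
            hi = mid
    x = 0.5 * (lo + hi)
    beta_hat = x / delta
    mean_sq = sum(y * y for y in ys[1:]) / (len(ys) - 1)
    sigma2_hat = beta_hat ** 3 * delta ** 2 * mean_sq / (x + math.expm1(-x))
    return beta_hat, sigma2_hat
```

For example, `estimate_integrated_ou(simulate_integrated_ou(20000, 1.0, 1.0, 1.0, seed=1), 1.0)` should return values near $(\beta, \sigma^2) = (1, 1)$ for moderately large $n$. The exact numbers depend on the seed and on the discretization.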
Acknowledgements

The research of Michael Sørensen was supported by MaPhySto – A Network in Mathematical Physics and Stochastics, funded by The Danish National Research Foundation. Both authors were supported by the European Community's Human Potential Programme under contract HPRN-CT-2000-00100, DYNSTOCH. Thanks are due to the referees for several helpful comments.

References

Andersen, T. G. & Bollerslev, T. (1998). Answering the skeptics: yes, standard volatility models do provide accurate forecasts. Int. Econ. Rev. 39, 885–905.
Andersen, T. G., Bollerslev, T., Diebold, F. X. & Labys, P. (2001). The distribution of realized exchange rate volatility. J. Amer. Statist. Assoc. 96, 42–55.
Barndorff-Nielsen, O. E. & Shephard, N. (2001). Non-Gaussian Ornstein–Uhlenbeck-based models and some of their uses in financial econometrics (with discussion). J. Roy. Statist. Soc. Ser. B 63, 167–241.
Bibby, B. M. & Sørensen, M. (1995). Martingale estimation functions for discretely observed diffusion processes. Bernoulli 1, 17–39.
Bibby, B. M. & Sørensen, M. (1997). A hyperbolic diffusion model for stock prices. Finance Stoch. 1, 25–41.
Bibby, B. M. & Sørensen, M. (2001). Simplified estimating functions for diffusion models with a high-dimensional parameter. Scand. J. Statist. 28, 99–112.
Brockwell, P. J. & Davis, R. A. (1991). Time series: theory and methods. Springer, New York.
Cox, J. C., Ingersoll, J. E. & Ross, S. A. (1985). A theory of the term structure of interest rates. Econometrica 53, 385–407.
Ditlevsen, P. D., Ditlevsen, S. & Andersen, K. K. (2002). The fast climate fluctuations during the stadial and interstadial climate states. Ann. Glaciol. 35, 457–462.
Doukhan, P. (1994). Mixing: properties and examples. Lecture Notes in Statistics, Vol. 85. Springer, New York.
Doukhan, P., Massart, P. & Rio, E. (1994). The functional central limit theorem for strongly mixing processes. Ann. Inst. Henri Poincaré 30, 63–82.
Feller, W. (1951). Diffusion processes in genetics. In Proceedings of the 2nd Berkeley Symposium on Mathematical Statistics and Probability (ed. J. Neyman) 227–246. University of California Press, Berkeley.
Genon-Catalot, V., Jeantheau, T. & Larédo, C. (1999). Parameter estimation for discretely observed stochastic volatility models. Bernoulli 5, 855–872.
Genon-Catalot, V., Jeantheau, T. & Larédo, C. (2000). Stochastic volatility models as hidden Markov models and statistical applications. Bernoulli 6, 1051–1079.
Gloter, A. (1999a). Parameter estimation for a discretely observed integrated diffusion process. Preprint, University of Marne-la-Vallée, 13/99.
Gloter, A. (1999b). Parameter estimation for a hidden diffusion. Preprint, University of Marne-la-Vallée, 20/99.
Gloter, A. (2000). Parameter estimation for a discrete sampling of an integrated Ornstein–Uhlenbeck process. Statistics 35, 225–243.
Godambe, V. P. & Heyde, C. C. (1987). Quasi-likelihood and optimal estimation. Int. Statist. Rev. 55, 231–244.
Hansen, L. P. & Scheinkman, J. A. (1995). Back to the future: generating moment implications for continuous-time Markov processes. Econometrica 63, 767–804.
Heyde, C. C. (1997). Quasi-likelihood and its application. Springer, New York.
Kusuoka, S. & Yoshida, N. (2000). Malliavin calculus, geometric mixing, and expansion of diffusion functionals. Probab. Theory Rel. Fields 116, 457–484.
Pedersen, A. R. (2000). Estimating the nitrous oxide emission rate from the soil surface by means of a diffusion model. Scand. J. Statist. 27, 385–403.
Rydén, T. (1994). Consistent and asymptotically normal parameter estimates for hidden Markov models. Ann. Statist. 22, 1884–1895.
Sørensen, M. (1997). Estimating functions for discretely observed diffusions: a review. In Selected Proceedings of the Symposium on Estimating Functions (eds I. V. Basawa, V. P. Godambe & R. L. Taylor) IMS Lecture Notes – Monograph Series, Vol. 32, 305–325. Institute of Mathematical Statistics, Hayward.
Sørensen, M. (1999). On asymptotics of estimating functions. Braz. J. Probab. Statist. 13, 111–136.
Sørensen, M. (2000). Prediction-based estimating functions. Econometr. J. 3, 123–147.
Veretennikov, A. Yu. (1987). Bounds for the mixing rate in the theory of stochastic equations. Theory Probab. Appl. 32, 273–281.
Veretennikov, A. Yu. (1989). On rate of mixing and the averaging principle for hypoelliptic stochastic differential equations. Math. USSR Izvestiya 33, 221–231.
Veretennikov, A. Yu. (1997). On polynomial mixing bounds for stochastic differential equations. Stoch. Proc. Appl. 70, 115–127.

Received February 2002, in final form November 2003

Michael Sørensen, Department of Applied Mathematics and Statistics, University of Copenhagen, Universitetsparken 5, DK-2100 Copenhagen Ø, Denmark.