In this section, we review some important concepts and definitions, which will
be extensively used in the next sections.
Definition 1. A collection $\mathcal{A}$ of subsets of $\Omega$ is a $\sigma$-algebra if
$$\Omega \in \mathcal{A}, \tag{1}$$
$$A^c \in \mathcal{A} \text{ if } A \in \mathcal{A}, \tag{2}$$
$$\bigcup_n A_n \in \mathcal{A} \text{ if } A_1, A_2, \ldots, A_n, \ldots \in \mathcal{A}. \tag{3}$$

Definition 2. A function $\mu : \mathcal{A} \to [0, \infty]$ is a measure on the measurable space $(\Omega, \mathcal{A})$ if
$$\mu(\emptyset) = 0, \tag{4}$$
$$\mu\Big(\bigcup_n A_n\Big) = \sum_n \mu(A_n) \text{ for pairwise disjoint } A_1, A_2, \ldots \in \mathcal{A}. \tag{5}$$
From (5) it follows that $\mu(A) \le \mu(B)$ for all $A \subset B$ in $\mathcal{A}$. The measure $\mu$ is finite if $0 \le \mu(\Omega) < \infty$. Hence, it can be normalized to obtain a probability measure $P$ with $P(A) = \mu(A)/\mu(\Omega) \in [0, 1]$ for all $A \in \mathcal{A}$.
An important measure is the Borel measure $\mu_B$ on the $\sigma$-algebra $\mathcal{B}$ of Borel subsets of $\mathbb{R}$, which assigns to each finite interval its length. However, the measure space $(\mathbb{R}, \mathcal{B}, \mu_B)$ is not complete in the sense that there exist subsets $B'$ of $\mathbb{R}$ with $B' \notin \mathcal{B}$, but $B' \subset B$ for some $B \in \mathcal{B}$ with $\mu_B(B) = 0$. Therefore, we enlarge the $\sigma$-algebra $\mathcal{B}$ to a $\sigma$-algebra $\mathcal{L}$ and extend the measure $\mu_B$ uniquely to the measure $\mu_L$ on $\mathcal{L}$ so that $(\mathbb{R}, \mathcal{L}, \mu_L)$ is complete, that is $L' \in \mathcal{L}$ with $\mu_L(L') = 0$ whenever $L' \subset L$ for some $L \in \mathcal{L}$ with $\mu_L(L) = 0$. We call $\mathcal{L}$ the Lebesgue subsets of $\mathbb{R}$ and $\mu_L$ the Lebesgue measure.
Definition 3. Let $(\Omega_1, \mathcal{A}_1)$ and $(\Omega_2, \mathcal{A}_2)$ be two measurable spaces. The function $f : \Omega_1 \to \Omega_2$ is $\mathcal{A}_1 : \mathcal{A}_2$-measurable if
$$f^{-1}(A_2) = \{\omega_1 \in \Omega_1 : f(\omega_1) \in A_2\} \in \mathcal{A}_1 \tag{6}$$
for all $A_2 \in \mathcal{A}_2$. This means that the pre-image of any $A_2 \in \mathcal{A}_2$ is in $\mathcal{A}_1$.
Definition 4. Let $\Omega$ be the sample space, the $\sigma$-algebra $\mathcal{A}$ a collection of events and $P$ the associated probability measure. We call the triplet $(\Omega, \mathcal{A}, P)$ a probability space if $\mathcal{A}$ and $P$ satisfy the following properties:
$$A^c = \Omega \setminus A,\ A \cup B,\ A \cap B \in \mathcal{A} \text{ if } A, B \in \mathcal{A}, \tag{7}$$
$$0 \le P(A) \le 1,\ P(A^c) = 1 - P(A),\ P(\emptyset) = 0,\ P(\Omega) = 1, \tag{8}$$
$$\bigcup_n A_n,\ \bigcap_n A_n \in \mathcal{A} \text{ if } \{A_n, A_n \in \mathcal{A}\}, \tag{9}$$
$$P\Big(\bigcup_n A_n\Big) = \sum_n P(A_n) \text{ if } \{A_n, A_n \in \mathcal{A}\} \text{ and } A_i \cap A_j = \emptyset \text{ for all } i \ne j. \tag{10}$$
The last two properties hold for a countably (finite or infinite) number of outcomes. For uncountably many basic outcomes we restrict attention to countable combinations of natural events, which are subintervals of $\Omega$ and to which non-zero probabilities are (possibly) assigned.
When $(\Omega_1, \mathcal{A}_1, P)$ is a probability space and $(\Omega_2, \mathcal{A}_2)$ is either $(\mathbb{R}, \mathcal{B})$ or $(\mathbb{R}, \mathcal{L})$, we call the measurable function $f : \Omega_1 \to \mathbb{R}$ a random variable and denote it usually by $X$.
2 Stochastic Processes
Example. The Ornstein-Uhlenbeck process with parameter $\gamma > 0$ is a wide-sense stationary Gaussian process $X = \{X_t, t \in \mathbb{R}^+\}$ for which
$$X_0 \sim \mathcal{N}(0, 1), \quad \mathrm{E}\{X_t\} = 0, \quad C_{s,t} = e^{-\gamma |t-s|},$$
for all $s, t \in \mathbb{R}^+$.
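A minimal simulation sketch of this process, assuming the exact AR(1) discretization $X_{t+\Delta} = e^{-\gamma\Delta} X_t + \sqrt{1 - e^{-2\gamma\Delta}}\, Z$ with $Z \sim \mathcal{N}(0,1)$, which preserves the stationary $\mathcal{N}(0,1)$ law:

```python
import numpy as np

def simulate_ou(gamma, dt, n_steps, n_paths, rng):
    # Exact discretization of the stationary Ornstein-Uhlenbeck process:
    # X_{t+dt} = e^{-gamma*dt} X_t + sqrt(1 - e^{-2*gamma*dt}) Z, Z ~ N(0, 1)
    a = np.exp(-gamma * dt)
    x = rng.standard_normal(n_paths)          # X_0 ~ N(0, 1)
    paths = [x]
    for _ in range(n_steps):
        x = a * x + np.sqrt(1 - a**2) * rng.standard_normal(n_paths)
        paths.append(x)
    return np.array(paths)                    # shape (n_steps + 1, n_paths)

rng = np.random.default_rng(0)
gamma, dt = 0.5, 0.1
X = simulate_ou(gamma, dt, n_steps=20, n_paths=200_000, rng=rng)
# Empirical covariance C_{s,t} at s = 0.5, t = 1.5 vs. theory e^{-gamma|t-s|}
emp = np.mean(X[5] * X[15])
theory = np.exp(-gamma * abs(1.5 - 0.5))
print(emp, theory)                            # both ~ e^{-0.5}
```

The empirical product average should match $e^{-\gamma |t-s|}$ up to Monte Carlo error.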
Definition 7. Let $(\Omega, \mathcal{A}, P)$ be the probability space and $\{\mathcal{A}_t, t \ge 0\}$ an increasing family of sub-$\sigma$-algebras of $\mathcal{A}$. The stochastic process $X = \{X_t, t \in \mathbb{R}^+\}$, with $X_t$ being $\mathcal{A}_t$-measurable for each $t \ge 0$, is a martingale if
$$\mathrm{E}\{X_t \,|\, \mathcal{A}_s\} = X_s, \quad \text{w.p. 1}, \tag{12}$$
for all $0 \le s \le t$.
2.1
Markov chains
We first describe discrete time Markov chains and then generalize to their continuous time counterpart.
Definition 8. Let $\mathcal{X} = \{x_1, \ldots, x_N\}$ be the set of a finite number of discrete states. The discrete time stochastic process $X = \{X_t, t \in T\}$ is a discrete time Markov chain if it satisfies the Markov property, that is
$$P(X_{n+1} = x_j \,|\, X_n = x_{i_n}) = P(X_{n+1} = x_j \,|\, X_1 = x_{i_1}, \ldots, X_n = x_{i_n}) \tag{13}$$
for all states $x_j, x_{i_1}, \ldots, x_{i_n} \in \mathcal{X}$ and all $n$. The transition probabilities are collected in the transition matrix $P_n$ with entries
$$P_n^{(i,j)} = P(X_{n+1} = x_j \,|\, X_n = x_i). \tag{14}$$

Property 8.1. A discrete time Markov chain is homogeneous if $P_n = P$ for all $n = 1, 2, \ldots$.
As a consequence, the probability vector of a homogeneous Markov chain satisfies $p_{n+k} = (P^T)^k p_n$ for any $k \in \mathbb{N} \setminus \{0\}$. The probability distributions depend only on the time that has elapsed. However, this does not mean that the Markov chain is strictly stationary. In order to be so, it is also required that $p_n = \bar{p}$ for each $n = 1, 2, \ldots$, which implies that the probability distributions are equal for all times, such that $\bar{p} = P^T \bar{p}$.

It can be shown that a homogeneous Markov chain has at least one stationary probability vector solution. Therefore, it is sufficient that the initial random variable $X_1$ is distributed according to one of its stationary probability vectors for the Markov chain to be strictly stationary.
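A stationary probability vector $\bar{p} = P^T \bar{p}$ can be computed as an eigenvector of $P^T$ for eigenvalue $1$; a sketch with a hypothetical 3-state transition matrix:

```python
import numpy as np

# Hypothetical 3-state transition matrix P (rows sum to 1); the stationary
# probability vector solves p = P^T p, i.e. it is the eigenvector of P^T
# for eigenvalue 1, normalized to sum to 1.
P = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.7, 0.1],
              [0.1, 0.3, 0.6]])

eigvals, eigvecs = np.linalg.eig(P.T)
k = np.argmin(np.abs(eigvals - 1.0))   # eigenvalue closest to 1
p_bar = np.real(eigvecs[:, k])
p_bar /= p_bar.sum()                   # normalize to a probability vector

print(p_bar)                           # stationary distribution
print(P.T @ p_bar - p_bar)             # ~ 0: fixed point of P^T
```

Starting the chain from $\bar{p}$ then makes it strictly stationary, as stated above.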
Property 8.2. Let $\mathcal{X} = \{x_1, \ldots, x_N\}$ be the set of discrete states and $f : \mathcal{X} \to \mathbb{R}$. The discrete time homogeneous Markov chain $X = \{X_n, n = 1, 2, \ldots\}$ is ergodic if
$$\lim_{T \to \infty} \frac{1}{T} \sum_{n=1}^{T} f(X_n) = \sum_{i=1}^{N} f(x_i)\, \bar{p}(i). \tag{15}$$
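A minimal sketch of (15), assuming a hypothetical 3-state chain and an arbitrary choice of $f$: the time average along one long trajectory is compared with the spatial average under $\bar{p}$.

```python
import numpy as np

# Ergodic time average vs. spatial average for a hypothetical 3-state chain
P = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.7, 0.1],
              [0.1, 0.3, 0.6]])
f = np.array([1.0, 2.0, 5.0])          # arbitrary values f(x_i)

rng = np.random.default_rng(1)
T = 50_000
state, acc = 0, 0.0
for _ in range(T):
    acc += f[state]
    state = rng.choice(3, p=P[state])  # one step of the chain
time_avg = acc / T

# Spatial average under the stationary vector p_bar = P^T p_bar
eigvals, eigvecs = np.linalg.eig(P.T)
p_bar = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
p_bar /= p_bar.sum()
space_avg = f @ p_bar
print(time_avg, space_avg)             # close for large T
```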
Example. The Poisson process with intensity $\lambda > 0$ is a continuous time Markov chain on $\mathbb{N}$ with
$$P(X_t = m) = \frac{(\lambda t)^m}{m!}\, e^{-\lambda t}, \tag{17}$$
where $m \in \mathbb{N}$.

Indeed, invoking the independent increments of the Poisson process we get
$$P(X_s = m_s, X_t = m_t) = P(X_s = m_s,\ X_t - X_s = m_t - m_s) = P(X_s = m_s)\, P(X_t - X_s = m_t - m_s) = \frac{(\lambda s)^{m_s}}{m_s!}\, e^{-\lambda s} \cdot \frac{(\lambda (t-s))^{m_t - m_s}}{(m_t - m_s)!}\, e^{-\lambda (t-s)}.$$
The second factor on the right hand side corresponds to the transition probability $P(X_t = m_t \,|\, X_s = m_s)$ for $m_s \le m_t$. Hence, the Poisson process is homogeneous since
$$P(X_{t+h} = m_t + m \,|\, X_t = m_t) = \frac{(\lambda h)^m}{m!}\, e^{-\lambda h},$$
where $h \ge 0$.
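A simulation sketch, assuming the standard construction from exponential inter-arrival times: the resulting count $X_t$ should have mean and variance $\lambda t$, consistent with the pmf (17).

```python
import numpy as np

# Exponential inter-arrival times with rate lam generate a Poisson process;
# the count X_t on [0, t] then has mean and variance lam * t.
rng = np.random.default_rng(2)
lam, t, n_paths = 2.0, 3.0, 100_000

inter = rng.exponential(1.0 / lam, size=(n_paths, 40))  # inter-arrival times
arrivals = np.cumsum(inter, axis=1)                     # arrival times
counts = (arrivals <= t).sum(axis=1)                    # X_t per path
print(counts.mean(), counts.var())                      # both ~ lam * t = 6
```

The cap of 40 potential arrivals per path is safe here, since $P(X_3 > 40)$ is negligible for $\lambda t = 6$.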
Property 9.2. Let $f : \mathcal{X} \to \mathbb{R}$. The continuous time homogeneous Markov chain $X = \{X_t, t \in \mathbb{R}^+\}$ is ergodic if
$$\lim_{T \to \infty} \frac{1}{T} \int_0^T f(X_t)\, dt = \sum_{i=1}^{N} f(x_i)\, \bar{p}(i). \tag{18}$$

The transition rates of the chain are defined by the limits
$$a^{(i,j)} = \begin{cases} \lim_{t \to 0} \dfrac{p_t^{(i,j)}}{t} & \text{if } i \ne j, \\[1ex] \lim_{t \to 0} \dfrac{p_t^{(i,j)} - 1}{t} & \text{if } i = j. \end{cases} \tag{19}$$
2.2 Diffusion processes

A diffusion process is ergodic if
$$\lim_{T \to \infty} \frac{1}{T} \int_0^T f(X_t)\, dt = \int f(y)\, \bar{p}(y)\, dy, \tag{22}$$
where $\bar{p}(y) = \int p(s, x; t, y)\, \bar{p}(x)\, dx$ is a stationary probability density. This means that the time average limit coincides with the spatial average. In practice, this is often difficult to verify.
Diffusion processes are almost surely continuous functions of time, but they need not be differentiable. Without going into the mathematical details, the continuity of a stochastic process can be defined in terms of continuity with probability one, mean square continuity, and continuity in probability or distribution (see for example [3]).

Another interesting criterion is Kolmogorov's continuity criterion, which states that a continuous time stochastic process $X = \{X_t, t \in T\}$ has continuous sample paths if there exist $a, b, c, h > 0$ such that
$$\mathrm{E}\{|X_t - X_s|^a\} \le c\, |t - s|^{1+b} \tag{30}$$
for all $s, t \in T$ with $|t - s| \le h$.
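For the Wiener process, the Gaussian fourth moment gives $\mathrm{E}\{|W_t - W_s|^4\} = 3(t-s)^2$, so (30) holds with $a = 4$, $b = 1$, $c = 3$; a quick Monte Carlo check of this moment:

```python
import numpy as np

# E{|W_t - W_s|^4} = 3 (t - s)^2 for the Wiener process, since
# W_t - W_s ~ N(0, t - s); this realizes Kolmogorov's criterion (30)
# with a = 4, b = 1, c = 3.
rng = np.random.default_rng(3)
s, t, n = 0.3, 0.8, 1_000_000
increments = rng.normal(0.0, np.sqrt(t - s), size=n)
emp = np.mean(np.abs(increments) ** 4)
theory = 3 * (t - s) ** 2
print(emp, theory)   # both ~ 0.75
```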
The forward evolution of the transition density $p(s, x; t, y)$ of a diffusion process with drift $\mu$ and diffusion coefficient $\sigma$ is given by the Kolmogorov forward equation
$$\frac{\partial p}{\partial t} + \frac{\partial}{\partial y}\{\mu(t, y)\, p\} - \frac{1}{2} \frac{\partial^2}{\partial y^2}\{\sigma^2(t, y)\, p\} = 0, \tag{31}$$
for a fixed initial state $(s, x)$. The backward evolution of the transition density $p(s, x; t, y)$ is given by the Kolmogorov backward equation
$$\frac{\partial p}{\partial s} + \mu(s, x) \frac{\partial p}{\partial x} + \frac{1}{2} \sigma^2(s, x) \frac{\partial^2 p}{\partial x^2} = 0, \tag{32}$$
for a fixed final state $(t, y)$.
The backward equation can be motivated by a binomial approximation in which the process moves from $x$ to $x + \mu \Delta s \pm \sigma \sqrt{\Delta s}$ with equal probability, so that
$$p(s, x; t, y) = \frac{1}{2}\, p(s + \Delta s,\ x + \mu \Delta s + \sigma \sqrt{\Delta s};\ t, y) + \frac{1}{2}\, p(s + \Delta s,\ x + \mu \Delta s - \sigma \sqrt{\Delta s};\ t, y).$$
Taking Taylor expansions up to the first order in $\Delta s$ about $(s, x; t, y)$ leads to
$$0 = \frac{\partial p}{\partial s} \Delta s + \mu \frac{\partial p}{\partial x} \Delta s + \frac{1}{2} \sigma^2 \frac{\partial^2 p}{\partial x^2} \Delta s + \mathcal{O}\big((\Delta s)^{3/2}\big).$$
Since the discrete time process converges in distribution to the diffusion process, we obtain the backward Kolmogorov equation when $\Delta s \to 0$.
Example. The Kolmogorov forward and the backward equations for the Ornstein-Uhlenbeck process with parameter $\gamma > 0$, that is with drift $\mu = -\gamma y$ and diffusion coefficient $\sigma^2 = 2\gamma$, are respectively given by
$$\frac{\partial p}{\partial t} - \gamma \frac{\partial}{\partial y}\{y\, p\} - \gamma \frac{\partial^2 p}{\partial y^2} = 0, \tag{33}$$
$$\frac{\partial p}{\partial s} - \gamma x \frac{\partial p}{\partial x} + \gamma \frac{\partial^2 p}{\partial x^2} = 0. \tag{34}$$

2.3 Wiener processes
On the dyadic partition $t_k^{(n)} = s + k(t - s)/2^n$, $k = 0, 1, \ldots, 2^n$, of the interval $[s, t]$, the sample paths of the Wiener process satisfy
$$\lim_{n \to \infty} \sum_{k=0}^{2^n - 1} \Big( W_{t_{k+1}^{(n)}}(\omega) - W_{t_k^{(n)}}(\omega) \Big)^2 = t - s, \quad \text{w.p. 1},$$
while
$$\sum_{k=0}^{2^n - 1} \Big( W_{t_{k+1}^{(n)}}(\omega) - W_{t_k^{(n)}}(\omega) \Big)^2 \le \max_{0 \le k \le 2^n - 1} \Big| W_{t_{k+1}^{(n)}}(\omega) - W_{t_k^{(n)}}(\omega) \Big| \sum_{k=0}^{2^n - 1} \Big| W_{t_{k+1}^{(n)}}(\omega) - W_{t_k^{(n)}}(\omega) \Big|.$$
(Note that lim sup is the limit superior or supremum limit, that is, the supremum³ of all the limit points.) From the sample path continuity, we have
$$\max_{0 \le k \le 2^n - 1} \Big| W_{t_{k+1}^{(n)}}(\omega) - W_{t_k^{(n)}}(\omega) \Big| \to 0, \quad \text{w.p. 1 as } n \to \infty,$$

³ For $S \subset T$, the supremum of $S$ is the least element of $T$ which is greater than or equal to all elements of $S$.
and therefore
$$\sum_{k=0}^{2^n - 1} \Big| W_{t_{k+1}^{(n)}}(\omega) - W_{t_k^{(n)}}(\omega) \Big| \to \infty, \quad \text{w.p. 1 as } n \to \infty.$$
As a consequence, the sample paths do, almost surely, not have bounded variation on $[s, t]$ and cannot be differentiated.
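Both limits above can be illustrated numerically: on a refining dyadic grid, the sum of squared increments of a simulated Wiener path settles near $t - s$, while the sum of absolute increments keeps growing.

```python
import numpy as np

# Quadratic vs. total variation of a simulated Wiener path on [s, t] = [0, 1]:
# the sum of squared increments approaches t - s = 1 as the grid refines,
# while the sum of absolute increments diverges (unbounded variation).
rng = np.random.default_rng(4)
for n in (8, 12, 16):
    m = 2 ** n
    dW = rng.normal(0.0, np.sqrt(1.0 / m), size=m)  # increments on 2^n intervals
    print(n, np.sum(dW ** 2), np.sum(np.abs(dW)))
```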
The standard Wiener process is also a diffusion process with drift $\mu(s, x) = 0$ and diffusion coefficient $\sigma(s, x) = 1$. Indeed, we have
$$\mu(s, x) = \lim_{t \downarrow s} \frac{\mathrm{E}\{y\} - x}{t - s} = 0,$$
$$\sigma^2(s, x) = \lim_{t \downarrow s} \frac{\mathrm{E}\{(y - x)^2\}}{t - s} = \lim_{t \downarrow s} \frac{\mathrm{E}\{y^2\} - 2\, \mathrm{E}\{y\}\, x + x^2}{t - s} = 1.$$
The Kolmogorov forward and backward equations are thus
$$\frac{\partial p}{\partial t} - \frac{1}{2} \frac{\partial^2 p}{\partial y^2} = 0, \tag{36}$$
$$\frac{\partial p}{\partial s} + \frac{1}{2} \frac{\partial^2 p}{\partial x^2} = 0. \tag{37}$$
Directly evaluating the partial derivatives of the transition density leads to the same results.

Note finally that the Wiener process $W = \{W_t, t \in \mathbb{R}^+\}$ is a martingale. Since $\mathrm{E}\{W_t - W_s \,|\, W_s\} = 0$, w.p. 1, and $\mathrm{E}\{W_s \,|\, W_s\} = W_s$, w.p. 1, we have
$$\mathrm{E}\{W_t \,|\, W_s\} = W_s, \quad \text{w.p. 1}.$$
2.4 Brownian bridge

The Brownian bridge from $x$ at time $0$ to $y$ at time $T$ is the Gaussian process with mean and covariance
$$\mu_t = x - \frac{t}{T}(x - y), \tag{39}$$
$$C_{s,t} = \min\{s, t\} - \frac{st}{T}, \tag{40}$$
for $0 \le s, t \le T$.
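A simulation sketch, assuming the standard construction of the bridge from a Wiener path, $B_t = x(1 - t/T) + y\, t/T + (W_t - (t/T) W_T)$, whose covariance matches (40):

```python
import numpy as np

# Brownian bridge from x at time 0 to y at time T built from a Wiener path:
# B_t = x (1 - t/T) + y t/T + (W_t - (t/T) W_T); its covariance should be
# min{s, t} - s t / T, as in (40).
rng = np.random.default_rng(5)
x, y, T, m, n_paths = 0.0, 1.0, 1.0, 100, 50_000
dt = T / m
W = np.cumsum(rng.normal(0.0, np.sqrt(dt), size=(n_paths, m)), axis=1)
t = np.arange(1, m + 1) * dt
B = x * (1 - t / T) + y * (t / T) + (W - np.outer(W[:, -1], t / T))

s_idx, t_idx = 29, 69                     # grid points s = 0.3, t = 0.7
emp = np.cov(B[:, s_idx], B[:, t_idx])[0, 1]
theory = min(0.3, 0.7) - 0.3 * 0.7 / T
print(emp, theory)                        # both ~ 0.09
```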
2.5 White Noise

Gaussian white noise $\xi_t$ can formally be viewed as the derivative of the Wiener process,
$$\xi_t = \lim_{h \to 0} \frac{W_{t+h} - W_t}{h}, \tag{43}$$
although this limit does not exist as a conventional function of $t$.
2.6 Stochastic Differential Equations

An ordinary differential equation
$$\frac{dx}{dt} = a(t, x(t)) \tag{44}$$
can equivalently be written as an integral equation
$$x(t) = x_0 + \int_{t_0}^{t} a(s, x(s))\, ds \tag{45}$$
with initial value $x(t_0) = x_0$. Including a noisy driving term leads to the symbolic differential
$$\frac{dX_t}{dt} = a(t, X_t) + b(t, X_t)\, \xi_t, \tag{46}$$
for example with $a(t, X_t) = -\gamma X_t$ and $b(t, X_t) = \sigma$ for constants $\gamma$ and $\sigma$.
This symbolic differential can be interpreted as the integral equation along a sample path
$$X_t(\omega) = X_{t_0}(\omega) + \int_{t_0}^{t} a(s, X_s(\omega))\, ds + \int_{t_0}^{t} b(s, X_s(\omega))\, \xi_s(\omega)\, ds \tag{47}$$
for each $\omega$. Now, for the case where $a \equiv 0$ and $b \equiv 1$, we see that $\xi_t$ should be the derivative of the Wiener process $W_t = X_t$, i.e. it is Gaussian white noise. This suggests that (47) can be written as follows:
$$X_t(\omega) = X_{t_0}(\omega) + \int_{t_0}^{t} a(s, X_s(\omega))\, ds + \int_{t_0}^{t} b(s, X_s(\omega))\, dW_s(\omega). \tag{48}$$
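In practice, equations of the form (48) are approximated on a time grid. A minimal sketch of the Euler-Maruyama scheme (a standard discretization, used here with the linear example $a(t, x) = -\gamma x$, $b(t, x) = \sigma$):

```python
import numpy as np

# Euler-Maruyama discretization of (48):
# X_{k+1} = X_k + a(t_k, X_k) dt + b(t_k, X_k) dW_k, with dW_k ~ N(0, dt).
def euler_maruyama(a, b, x0, t0, t1, n_steps, rng):
    dt = (t1 - t0) / n_steps
    x, t = x0, t0
    path = [x0]
    for _ in range(n_steps):
        dw = rng.normal(0.0, np.sqrt(dt))   # Wiener increment
        x = x + a(t, x) * dt + b(t, x) * dw
        t += dt
        path.append(x)
    return np.array(path)

rng = np.random.default_rng(6)
gamma, sigma = 0.5, 1.0
path = euler_maruyama(lambda t, x: -gamma * x, lambda t, x: sigma,
                      x0=1.0, t0=0.0, t1=5.0, n_steps=1000, rng=rng)
print(path[-1])   # one realization of X_5
```

The scheme evaluates $a$ and $b$ at the left endpoint of each subinterval, which is exactly the evaluation rule of the Ito integral introduced below.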
The problem with this formulation is that the Wiener process $W_t$ is (almost surely) nowhere differentiable, such that the white noise process $\xi_t$ does not exist as a conventional function of $t$. As a result, the second integral in (48) cannot be understood as an ordinary (Riemann or Lebesgue) integral. Worse, it is not a Riemann-Stieltjes integral either, since the continuous sample paths of a Wiener process are not of bounded variation. Hence, it is at this point that Ito's stochastic integral comes into play!
3.1 Ito stochastic integral

For a deterministic step function $f(t) = f_j$ on $t_j \le t < t_{j+1}$ for $j = 1, \ldots, n-1$, with $0 = t_1 < t_2 < \ldots < t_n = 1$, the stochastic integral is defined as
$$I[f](\omega) = \sum_{j=1}^{n-1} f_j \left\{ W_{t_{j+1}}(\omega) - W_{t_j}(\omega) \right\}. \tag{50}$$
Note that this integral is a random variable with zero mean as it is a sum of random variables with zero mean. Furthermore, we have the following result:
$$\mathrm{E}\{I^2[f](\omega)\} = \sum_{j=1}^{n-1} f_j^2\, (t_{j+1} - t_j). \tag{51}$$
For a random step function $f(t, \omega) = f_j(\omega)$ on $t_j \le t < t_{j+1}$, the integral is defined analogously:
$$I[f](\omega) = \sum_{j=1}^{n-1} f_j(\omega) \left\{ W_{t_{j+1}}(\omega) - W_{t_j}(\omega) \right\}. \tag{52}$$
Lemma. For any $a, b \in \mathbb{R}$ and any random step functions $f, g$ such that $f_j, g_j$ on $t_j \le t < t_{j+1}$ for $j = 1, 2, \ldots, n-1$ with $0 = t_1 < t_2 < \ldots < t_n = 1$ are $\mathcal{A}_{t_j}$-measurable and mean square integrable, the stochastic integral (52) satisfies the following properties:
$$I[f] \text{ is } \mathcal{A}_1\text{-measurable}, \tag{53}$$
$$\mathrm{E}\{I[f]\} = 0, \tag{54}$$
$$I[af + bg] = a\, I[f] + b\, I[g], \quad \text{w.p. 1}, \tag{55}$$
$$\mathrm{E}\{I^2[f]\} = \sum_j \mathrm{E}\{f_j^2\}(t_{j+1} - t_j). \tag{56}$$

Since $f_j$ is $\mathcal{A}_{t_j}$-measurable and $\{W_{t_{j+1}} - W_{t_j}\}$ is $\mathcal{A}_{t_{j+1}}$-measurable, each term $f_j \{W_{t_{j+1}} - W_{t_j}\}$ is $\mathcal{A}_{t_{j+1}}$-measurable and thus $\mathcal{A}_1$-measurable. Hence, $I[f]$ is $\mathcal{A}_1$-measurable.

From the Cauchy-Schwarz inequality⁴ and the fact that each term in (52) is mean square integrable, it follows that $I[f]$ is integrable. Hence, $I[f](\omega)$ is again mean square integrable. Expanding the square of (52), the $\mathcal{A}_{t_j}$-measurability of $f_j$ and the independence of the increments $W_{t_{j+1}} - W_{t_j}$ make the cross terms vanish, so that
$$\mathrm{E}\{I^2[f]\} = \sum_{j=1}^{n-1} \mathrm{E}\{f_j^2\}(t_{j+1} - t_j).$$

⁴ The Cauchy-Schwarz inequality: $\big( \int f g\, dx \big)^2 \le \int f^2\, dx \int g^2\, dx$.
Consider now a sequence of random step functions $f^{(n)}$ on partitions $0 = t_1^{(n)} < \ldots < t_n^{(n)} = 1$, with
$$I[f^{(n)}](\omega) = \sum_{j=1}^{n-1} f_j^{(n)}(\omega) \left\{ W_{t_{j+1}^{(n)}}(\omega) - W_{t_j^{(n)}}(\omega) \right\}, \tag{57}$$
$$\mathrm{E}\{I^2[f^{(n)}]\} = \sum_{j=1}^{n-1} \mathrm{E}\{(f_j^{(n)})^2\}\, (t_{j+1}^{(n)} - t_j^{(n)}). \tag{58}$$
This converges to the Riemann integral $\int_0^1 \mathrm{E}\{f^2(s, \omega)\}\, ds$ for $n \to \infty$. This result, along with the well-behaved mean square property of the Wiener process, i.e. $\mathrm{E}\{(W_t - W_s)^2\} = t - s$, suggests defining the stochastic integral in terms of mean square convergence.
Theorem 3. The Ito (stochastic) integral $I[f]$ of a function $f : T \times \Omega \to \mathbb{R}$ is the (unique) mean square limit of sequences $I[f^{(n)}]$ for any sequence of random step functions $f^{(n)}$ converging⁵ to $f$:
$$I[f](\omega) = \text{m.s.} \lim_{n \to \infty} \sum_{j=1}^{n-1} f_j^{(n)}(\omega) \left\{ W_{t_{j+1}^{(n)}}(\omega) - W_{t_j^{(n)}}(\omega) \right\}. \tag{59}$$

⁵ In the mean square sense.

The properties (53–56) still apply, but we write $\mathrm{E}\{I^2[f]\} = \int_0^1 \mathrm{E}\{f^2(t, \omega)\}\, dt$ for (56) and call it the Ito isometry (on the unit time interval).
Similarly, the time-dependent Ito integral is a random variable defined on any interval $[t_0, t]$:
$$X_t(\omega) = \int_{t_0}^{t} f(s, \omega)\, dW_s(\omega). \tag{60}$$
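The Ito isometry can be checked numerically for a simple integrand; a sketch with $f(t, \omega) = W_t(\omega)$, for which $\int_0^1 \mathrm{E}\{W_t^2\}\, dt = \int_0^1 t\, dt = 1/2$:

```python
import numpy as np

# Monte Carlo check of the Ito isometry for the integrand f(t, w) = W_t(w):
# E{I^2[f]} should approach int_0^1 E{W_t^2} dt = int_0^1 t dt = 1/2.
# The Ito sums evaluate the integrand at the LEFT endpoint of each subinterval.
rng = np.random.default_rng(7)
n_paths, m = 20_000, 200
dt = 1.0 / m
dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, m))
W_left = np.cumsum(dW, axis=1) - dW        # W at the left grid points (W_0 = 0)
ito_sums = np.sum(W_left * dW, axis=1)     # I[f] per path
print(np.mean(ito_sums))                   # ~ 0, property (54)
print(np.mean(ito_sums ** 2))              # ~ 1/2, the Ito isometry
```

The left-endpoint evaluation is essential: a different evaluation point would converge to a different (non-Ito) integral.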
3.2 Ito formula

The main advantage of the Ito stochastic integral is the martingale property. However, a consequence is that stochastic differentials, which are interpreted as stochastic integrals, do not follow the chain rule of classical calculus! Roughly speaking, an additional term appears due to the fact that $dW_t^2$ is equal to $dt$ in the mean square sense.
Consider the stochastic process $Y = \{Y_t = U(t, X_t), t \ge 0\}$ with $U(t, x)$ having continuous second order partial derivatives. If $X_t$ were continuously differentiable, the chain rule of classical calculus would give the following expression:
$$dY_t = \frac{\partial U}{\partial t}\, dt + \frac{\partial U}{\partial x}\, dX_t, \tag{63}$$
where $dX_t = f\, dW_t$ is the symbolic differential form of (60). The additional term is due to the fact that $\mathrm{E}\{dX_t^2\} = \mathrm{E}\{f^2\}\, dt$ gives rise to an additional term of the first order in $\Delta t$ in the Taylor expansion for $U$:
$$\Delta Y_t = \frac{\partial U}{\partial t} \Delta t + \frac{\partial U}{\partial x} \Delta x + \frac{1}{2} \frac{\partial^2 U}{\partial t^2} \Delta t^2 + \frac{\partial^2 U}{\partial t \partial x} \Delta t\, \Delta x + \frac{1}{2} \frac{\partial^2 U}{\partial x^2} \Delta x^2 + \ldots.$$
Theorem 4. Consider the following general stochastic differential:
$$X_t(\omega) - X_s(\omega) = \int_s^t e(u, \omega)\, du + \int_s^t f(u, \omega)\, dW_u(\omega). \tag{65}$$
Let $Y_t = U(t, X_t)$ with $U$ having continuous partial derivatives $\frac{\partial U}{\partial t}$, $\frac{\partial U}{\partial x}$ and $\frac{\partial^2 U}{\partial x^2}$. The Ito formula is the following stochastic chain rule:
$$Y_t - Y_s = \int_s^t \left( \frac{\partial U}{\partial t} + e_u \frac{\partial U}{\partial x} + \frac{1}{2} f_u^2 \frac{\partial^2 U}{\partial x^2} \right) du + \int_s^t f_u \frac{\partial U}{\partial x}\, dW_u, \quad \text{w.p. 1}. \tag{66}$$
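As a sketch, (66) with $U(t, x) = x^2/2$ applied to $X_t = W_t$ (so $e \equiv 0$, $f \equiv 1$) can be checked by comparing left-endpoint Ito sums against $\frac{1}{2} W_t^2 - \frac{1}{2} t$ along a single path:

```python
import numpy as np

# Ito formula check for U(t, x) = x^2 / 2 and X_t = W_t (e = 0, f = 1):
# the Ito sum approximating int_0^t W_u dW_u should match
# (1/2) W_t^2 - (1/2) t path by path.
rng = np.random.default_rng(8)
m, t = 100_000, 1.0
dt = t / m
dW = rng.normal(0.0, np.sqrt(dt), size=m)
W_left = np.cumsum(dW) - dW               # W at the left grid points
ito_sum = np.sum(W_left * dW)             # ~ int_0^t W_u dW_u
W_t = np.sum(dW)
print(ito_sum, 0.5 * W_t ** 2 - 0.5 * t)  # agree up to O(dt) fluctuations
```

The discrepancy between the two quantities is $\frac{1}{2}\big(t - \sum_k dW_k^2\big)$, which vanishes in mean square as the grid refines.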
Hence, we recover $\int_0^t W_u\, dW_u = \frac{1}{2} W_t^2 - \frac{1}{2} t$ for $s = 0$.

3.3 Multivariate case
Probability Distributions

The Poisson distribution with parameter $\lambda > 0$ has probability mass function
$$P(N = n) = \frac{\lambda^n}{n!}\, e^{-\lambda}, \quad n \in \mathbb{N}. \tag{69}$$
The Gaussian distribution $\mathcal{N}(\mu, \sigma^2)$ has probability density
$$p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right). \tag{70}$$
References

[1] Lawrence C. Evans. An Introduction to Stochastic Differential Equations. Lecture notes (Department of Mathematics, UC Berkeley), available from http://math.berkeley.edu/~evans/.

[2] Crispin W. Gardiner. Handbook of Stochastic Methods. Springer-Verlag, Berlin, 3rd edition, 2004.

[3] Peter E. Kloeden and Eckhard Platen. Numerical Solution of Stochastic Differential Equations. Springer-Verlag, Berlin, 1992.

[4] Bernt Øksendal. Stochastic Differential Equations. Springer-Verlag, Berlin, 1985.