Eberle Introduction To Stochastic Analysis
Andreas Eberle
Contents
I Stochastic Processes
1 Brownian Motion
1.1 From Random Walks to Brownian Motion
    Central Limit Theorem
    Brownian motion as a Lévy process
    Brownian motion as a Markov process
    Wiener Measure
1.2 Brownian Motion as a Gaussian Process
    Multivariate normals
    Gaussian processes
1.3 The Wiener-Lévy Construction
    A first attempt
    The Wiener-Lévy representation of Brownian motion
    Lévy’s construction of Brownian motion
1.4 The Brownian Sample Paths
    Typical Brownian sample paths are nowhere differentiable
    Hölder continuity
    Law of the iterated logarithm
    Passage times
1.5 Strong Markov property and reflection principle
Index
Bibliography
Stochastic Processes
Chapter 1
Brownian Motion
The fundamental rôle played by Brownian motion in stochastic analysis is due to the
Central Limit Theorem. Just as the normal distribution arises as a universal scaling
limit of standardized sums of independent, identically distributed, square integrable
random variables, Brownian motion arises as a universal scaling limit of Random Walks
with square integrable increments.
1.1 From Random Walks to Brownian Motion
A standard approach to model stochastic dynamics in discrete time is to start from a se-
quence of random variables η1 , η2 , . . . defined on a common probability space (Ω, A, P ).
The random variables ηn describe the stochastic influences (noise) on the system. Often
they are assumed to be independent and identically distributed (i.i.d.). In this case the
collection (ηn ) is also called a white noise, whereas a colored noise is given by depen-
dent random variables. A stochastic process Xn , n = 0, 1, 2, . . . , taking values in Rd is
then defined recursively on (Ω, A, P ) by
\[ X_{n+1} = \Phi_{n+1}(X_n, \eta_{n+1}), \qquad n = 0, 1, 2, \ldots. \tag{1.1.1} \]
Here the Φn are measurable maps describing the random law of motion. If X0 and
η1 , η2 , . . . are independent random variables, then the process (Xn ) is a Markov chain
with respect to P .
Now let us assume that the random variables ηn are independent and identically
distributed taking values in R, or, more generally, Rd . The simplest nontrivial
stochastic dynamics of the type described above is the Random Walk
$S_n = \sum_{i=1}^n \eta_i$, which satisfies
\[ S_{n+1} = S_n + \eta_{n+1} \qquad \text{for } n = 0, 1, 2, \ldots. \]
Since the noise random variables ηn are the increments of the Random Walk (Sn ), the
law of motion (1.1.1) in the general case can be rewritten as
\[ X_{n+1} = \Phi_{n+1}(X_n, S_{n+1} - S_n), \qquad n = 0, 1, 2, \ldots. \tag{1.1.2} \]
This equation is a difference equation for (Xn ) driven by the stochastic process (Sn ).
Our aim is to carry out a similar construction as above for stochastic dynamics in con-
tinuous time. The stochastic difference equation (1.1.2) will then eventually be replaced
by a stochastic differential equation (SDE). However, before even being able to think
about how to write down and make sense of such an equation, we have to identify a
continuous-time stochastic process that takes over the rôle of the Random Walk. For
this purpose, we first determine possible scaling limits of Random Walks when the time
steps tend to 0. It will turn out that if the increments are square integrable and the size
of the increments goes to 0 as the length of the time steps tends to 0, then by the Central
Limit Theorem there is essentially only one possible limit process in continuous time:
Brownian motion.
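To see how the CLT determines the possible scaling limits of Random Walks, let us
consider a one-dimensional Random Walk
\[ S_n = \sum_{i=1}^n \eta_i, \qquad n = 0, 1, 2, \ldots, \]
with independent, identically distributed, square integrable increments $\eta_i$,
normalized such that $E[\eta_i] = 0$ and $\operatorname{Var}[\eta_i] = 1$.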
Plotting many steps of the Random Walk seems to indicate that there is a limit process
with continuous sample paths after appropriate rescaling:
[Figure: simulated sample paths of the Random Walk, plotted over increasing numbers of steps.]
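To see what appropriate means, we fix a positive integer m, and try to define a rescaled
Random Walk $S^{(m)}_t$ $(t = 0, 1/m, 2/m, \ldots)$ with time steps of size $1/m$ by
\[ S^{(m)}_{k/m} = c_m \cdot S_k \qquad (k = 0, 1, 2, \ldots) \]
for some constants $c_m > 0$. If $t$ is a multiple of $1/m$, then
\[ \operatorname{Var}[S^{(m)}_t] = c_m^2 \cdot \operatorname{Var}[S_{mt}] = c_m^2 \cdot m \cdot t. \]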
Hence in order to achieve convergence of $S^{(m)}_t$ as $m \to \infty$, we should choose $c_m$
proportional to $m^{-1/2}$. This leads us to define a continuous time process $(S^{(m)}_t)_{t \ge 0}$ by
\[ S^{(m)}_t(\omega) := \frac{1}{\sqrt{m}}\, S_{mt}(\omega) \qquad \text{whenever } t = k/m \text{ for some integer } k, \]
and by linear interpolation for $t \in \left( \frac{k-1}{m}, \frac{k}{m} \right]$.
Figure 1.1: Rescaling of a Random Walk.
Clearly,
\[ E[S^{(m)}_t] = 0 \qquad \text{for all } t \ge 0, \]
and
\[ \operatorname{Var}[S^{(m)}_t] = \frac{1}{m} \operatorname{Var}[S_{mt}] = t \]
whenever t is a multiple of 1/m. In particular, the expectation values and variances for a
fixed time t do not depend on m. Moreover, if we fix a partition 0 ≤ t0 < t1 < . . . < tn
such that each ti is a multiple of 1/m, then the increments
\[ S^{(m)}_{t_{i+1}} - S^{(m)}_{t_i} = \frac{1}{\sqrt{m}} \left( S_{m t_{i+1}} - S_{m t_i} \right), \qquad i = 0, 1, 2, \ldots, n-1, \tag{1.1.4} \]
of the rescaled process $(S^{(m)}_t)_{t \ge 0}$ are independent centered random variables with
variances $t_{i+1} - t_i$. If $t_i$ is not a multiple of $1/m$, then a corresponding statement holds
approximately with an error that should be negligible in the limit $m \to \infty$. Hence, if
the rescaled Random Walks $(S^{(m)}_t)_{t \ge 0}$ converge in distribution to a limit process $(B_t)_{t \ge 0}$,
then $(B_t)_{t \ge 0}$ should have independent increments $B_{t_{i+1}} - B_{t_i}$ over disjoint time intervals
with mean 0 and variances $t_{i+1} - t_i$.
It remains to determine the precise distributions of the increments. Here the Central
Limit Theorem applies. In fact, we can observe that by (1.1.4) each increment
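\[ S^{(m)}_{t_{i+1}} - S^{(m)}_{t_i} = \frac{1}{\sqrt{m}} \sum_{k = m t_i + 1}^{m t_{i+1}} \eta_k \]
is a rescaled sum of $m\,(t_{i+1} - t_i)$ i.i.d. random variables with mean 0 and variance 1.
Therefore, the CLT implies that the distributions of the increments converge weakly to a
normal distribution:
\[ S^{(m)}_{t_{i+1}} - S^{(m)}_{t_i} \xrightarrow{\;D\;} N(0, t_{i+1} - t_i). \]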
Hence if a limit process (Bt ) exists, then it should have independent, normally dis-
tributed increments.
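The rescaling above can be tried out numerically. The following is a minimal sketch, not part of the original notes: the function names `rescaled_walk`, `interpolate` and `sample_variance_at` are hypothetical, and the increments are assumed to be symmetric ±1 coin flips (one possible choice with mean 0 and variance 1).

```python
import math
import random

def rescaled_walk(increments, m):
    """Values S^(m)_{k/m} = S_k / sqrt(m) for k = 0, ..., len(increments)."""
    values = [0.0]
    for eta in increments:
        values.append(values[-1] + eta / math.sqrt(m))
    return values

def interpolate(values, m, t):
    """Linear interpolation of the rescaled walk at a time t on the simulated range."""
    k = min(int(t * m), len(values) - 2)
    frac = t * m - k
    return (1 - frac) * values[k] + frac * values[k + 1]

def sample_variance_at(t, m, n_paths, rng):
    """Empirical variance of S^(m)_t over n_paths walks with +-1 increments (an assumption)."""
    steps = int(round(m * t))
    samples = []
    for _ in range(n_paths):
        s = sum(rng.choice((-1.0, 1.0)) for _ in range(steps))
        samples.append(s / math.sqrt(m))
    mean = sum(samples) / n_paths
    return sum((x - mean) ** 2 for x in samples) / (n_paths - 1)

if __name__ == "__main__":
    rng = random.Random(0)
    for m in (10, 100, 400):
        # The empirical variance at time t = 2 should be close to 2 for every m.
        print(m, sample_variance_at(2.0, m, 500, rng))
```

With the scaling $c_m = m^{-1/2}$ the empirical variance at a fixed time should stay close to $t$ for every $m$, in line with the computation above.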
Thus the increments of a d-dimensional Brownian motion are independent over disjoint
time intervals and have a multivariate normal distribution:
\[ B_t - B_s \sim N(0, (t - s) \cdot I_d) \qquad \text{for any } 0 \le s \le t. \]
Remark. (1). Continuity: Continuity of the sample paths has to be assumed separately:
If $(B_t)_{t \ge 0}$ is a one-dimensional Brownian motion, then the modified process
$(\tilde B_t)_{t \ge 0}$ defined by $\tilde B_0 = B_0$ and
\[ \tilde B_t = B_t \cdot I_{\{B_t \in \mathbb{R} \setminus \mathbb{Q}\}} \qquad \text{for } t > 0 \]
has almost surely discontinuous paths. On the other hand, it satisfies (a) and (b)
since the distributions of $(\tilde B_{t_1}, \ldots, \tilde B_{t_n})$ and $(B_{t_1}, \ldots, B_{t_n})$ coincide
for all $n \in \mathbb{N}$ and $t_1, \ldots, t_n \ge 0$.
(2). Spatial Homogeneity: If (Bt )t≥0 is a Brownian motion starting at 0, then the
translated process (a + Bt )t≥0 is a Brownian motion starting at a.
(3). Existence: There are several constructions and existence proofs for Brownian mo-
tion. In Section 1.3 below we will discuss in detail the Wiener-Lévy construction
of Brownian motion as a random superposition of infinitely many deterministic
paths. This explicit construction is also very useful for numerical approximations.
A more general (but less constructive) existence proof is based on Kolmogorov’s
extension Theorem, cf. e.g. [Klenke].
Brownian motion is the only Lévy process $L_t$ in continuous time with continuous paths
such that $E[L_1] = 0$ and $\operatorname{Var}[L_1] = 1$. The normal distribution of the increments follows under these
assumptions by an extension of the CLT, cf. e.g. [Breiman: Probability]. A simple
example of a Lévy process with non-continuous paths is the Poisson process. Other
examples are α-stable processes which arise as scaling limits of Random Walks when
the increments are not square-integrable. Stochastic analysis based on general Lévy
processes has attracted a lot of interest recently.
Let us now consider a Brownian motion $(B_t)_{t \ge 0}$ starting at a fixed point
$a \in \mathbb{R}^d$, defined on a probability space (Ω, A, P ). The information on the process up to time
t is encoded in the σ-algebra
\[ \mathcal{F}^B_t = \sigma(B_s \mid 0 \le s \le t) \]
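generated by the process. The independence of the increments over disjoint intervals
immediately implies that for any $0 \le s \le t$, the increment $B_t - B_s$ is independent
of $\mathcal{F}^B_s$.
Proof. For any partition $0 = t_0 \le t_1 \le \ldots \le t_n = s$ of the interval $[0, s]$, the increment
$B_t - B_s$ is independent of the σ-algebra
\[ \sigma(B_{t_1} - B_{t_0}, B_{t_2} - B_{t_1}, \ldots, B_{t_n} - B_{t_{n-1}}) \]
generated by the increments up to time $s$. Since
\[ B_{t_k} = B_{t_0} + \sum_{i=1}^k (B_{t_i} - B_{t_{i-1}}) \]
and $B_{t_0}$ is constant, this σ-algebra coincides with $\sigma(B_{t_0}, B_{t_1}, \ldots, B_{t_n})$. Hence $B_t - B_s$
is independent of all finite subcollections of $(B_u \mid 0 \le u \le s)$ and therefore independent
of $\mathcal{F}^B_s$.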
Remark (Heat equation as backward equation and forward equation). The tran-
sition function of Brownian motion is the heat kernel in Rd , i.e., it is the fundamental
solution of the heat equation
\[ \frac{\partial u}{\partial t} = \frac{1}{2} \Delta u. \]
More precisely, pt (x, y) solves the initial value problem
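\[ \begin{aligned}
\frac{\partial}{\partial t} p_t(x, y) &= \frac{1}{2} \Delta_x p_t(x, y) && \text{for any } t > 0,\ x, y \in \mathbb{R}^d, \\
\lim_{t \searrow 0} \int p_t(x, y) f(y)\, dy &= f(x) && \text{for any } f \in C_b(\mathbb{R}^d),\ x \in \mathbb{R}^d,
\end{aligned} \tag{1.1.5} \]
where $\Delta_x = \sum_{i=1}^d \frac{\partial^2}{\partial x_i^2}$ denotes the action of the Laplace operator on the x-variable. The
equation (1.1.5) can be viewed as a version of Kolmogorov’s backward equation for
Brownian motion as a time-homogeneous Markov process, which states that for each
$t > 0$ and $f \in C_b(\mathbb{R}^d)$, the function
\[ v(s, x) = \int p_{t-s}(x, y) f(y)\, dy \]
solves the terminal value problem
\[ \frac{\partial v}{\partial s}(s, x) = -\frac{1}{2} \Delta_x v(s, x) \quad \text{for } s \in [0, t), \qquad \lim_{s \nearrow t} v(s, x) = f(x). \tag{1.1.6} \]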
Note that by the Markov property, v(s, x) = (pt−s f )(x) is a version of the conditional
expectation E[f (Bt ) | Bs = x]. Therefore, the backward equation describes the depen-
dence of the expectation value on starting point and time.
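By symmetry of the heat kernel, $p_t(x, y) = p_t(y, x)$, the transition function also solves
the initial value problem
\[ \begin{aligned}
\frac{\partial}{\partial t} p_t(x, y) &= \frac{1}{2} \Delta_y p_t(x, y) && \text{for any } t > 0 \text{ and } x, y \in \mathbb{R}^d, \\
\lim_{t \searrow 0} \int g(x)\, p_t(x, y)\, dx &= g(y) && \text{for any } g \in C_b(\mathbb{R}^d),\ y \in \mathbb{R}^d.
\end{aligned} \tag{1.1.7} \]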
The equation (1.1.7) is a version of Kolmogorov’s forward equation, stating that for
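$g \in C_b(\mathbb{R}^d)$, the function $u(t, y) = \int g(x)\, p_t(x, y)\, dx$ solves
\[ \frac{\partial u}{\partial t}(t, y) = \frac{1}{2} \Delta_y u(t, y) \quad \text{for } t > 0, \qquad \lim_{t \searrow 0} u(t, y) = g(y). \tag{1.1.8} \]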
The forward equation describes the forward time evolution of the transition densities
pt (x, y) for a given starting point x.
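As a quick sanity check of the heat equation, one can compare finite differences of the one-dimensional heat kernel. This is only a sketch under stated assumptions: the explicit Gaussian form of $p_t(x, y)$ is taken from the density in Corollary 1.3 with $d = 1$, and the names `heat_kernel` and `heat_residual` as well as the step size `h` are hypothetical choices for this illustration.

```python
import math

def heat_kernel(t, x, y):
    """Gaussian transition density for d = 1 (form assumed from Corollary 1.3)."""
    return math.exp(-(x - y) ** 2 / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)

def heat_residual(t, x, y, h=1e-3):
    """Central-difference approximation of d/dt p - (1/2) d^2/dx^2 p at (t, x, y)."""
    dt = (heat_kernel(t + h, x, y) - heat_kernel(t - h, x, y)) / (2.0 * h)
    dxx = (heat_kernel(t, x + h, y) - 2.0 * heat_kernel(t, x, y)
           + heat_kernel(t, x - h, y)) / h ** 2
    return dt - 0.5 * dxx

if __name__ == "__main__":
    # The residual should be close to zero, up to discretization error.
    print(heat_residual(1.0, 0.3, -0.2))
```

Since the kernel is symmetric in $x$ and $y$, the same check applies to the forward equation in the y-variable.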
Corollary 1.3 (Finite dimensional marginals). Suppose that (Bt )t≥0 is a Brownian
motion starting at x0 ∈ Rd defined on a probability space (Ω, A, P ). Then for any
n ∈ N and 0 = t0 < t1 < t2 < . . . < tn , the joint distribution of Bt1 , Bt2 , . . . , Btn is
absolutely continuous with density
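\[ \begin{aligned}
f_{B_{t_1}, \ldots, B_{t_n}}(x_1, \ldots, x_n) &= p_{t_1}(x_0, x_1)\, p_{t_2 - t_1}(x_1, x_2)\, p_{t_3 - t_2}(x_2, x_3) \cdots p_{t_n - t_{n-1}}(x_{n-1}, x_n) \\
&= \prod_{i=1}^n (2\pi (t_i - t_{i-1}))^{-d/2} \cdot \exp\left( -\frac{1}{2} \sum_{i=1}^n \frac{|x_i - x_{i-1}|^2}{t_i - t_{i-1}} \right).
\end{aligned} \tag{1.1.9} \]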
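\[ \begin{aligned}
P[B_{t_1} \in A_1, \ldots, B_{t_n} \in A_n]
&= E\big[ P[B_{t_n} \in A_n \mid \mathcal{F}^B_{t_{n-1}}] \,;\, B_{t_1} \in A_1, \ldots, B_{t_{n-1}} \in A_{n-1} \big] \\
&= E\big[ p_{t_n - t_{n-1}}(B_{t_{n-1}}, A_n) \,;\, B_{t_1} \in A_1, \ldots, B_{t_{n-1}} \in A_{n-1} \big] \\
&= \int_{A_1} \cdots \int_{A_{n-1}} p_{t_1}(x_0, x_1)\, p_{t_2 - t_1}(x_1, x_2) \cdots
\end{aligned} \]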
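The product formula (1.1.9) can be compared with the density of a centered Gaussian vector with covariances $\min(t_i, t_j)$. The sketch below does this for $d = 1$, $n = 2$ and $x_0 = 0$; the function names are hypothetical and the bivariate density is written out by hand for this illustration.

```python
import math

def transition_density(t, x, y):
    """One-dimensional Gaussian transition density p_t(x, y)."""
    return math.exp(-(x - y) ** 2 / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)

def marginal_density(times, points, x0=0.0):
    """Density (1.1.9) of (B_{t_1}, ..., B_{t_n}) for d = 1 and 0 < t_1 < ... < t_n."""
    density, t_prev, x_prev = 1.0, 0.0, x0
    for t, x in zip(times, points):
        density *= transition_density(t - t_prev, x_prev, x)
        t_prev, x_prev = t, x
    return density

def bivariate_gaussian_density(t1, t2, x1, x2):
    """Centered normal density with covariance matrix [[t1, t1], [t1, t2]], t1 < t2."""
    det = t1 * (t2 - t1)
    quad = (t2 * x1 * x1 - 2.0 * t1 * x1 * x2 + t1 * x2 * x2) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))

if __name__ == "__main__":
    print(marginal_density([0.5, 2.0], [0.3, -1.1]))
    print(bivariate_gaussian_density(0.5, 2.0, 0.3, -1.1))
```

Both expressions should agree up to rounding, which anticipates the description of Brownian motion as a Gaussian process with covariance $\min(s, t)$.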
Wiener Measure
The distribution of Brownian motion could be considered as a probability measure on
the product space $(\mathbb{R}^d)^{[0,\infty)}$ consisting of all maps $x : [0, \infty) \to \mathbb{R}^d$. A disadvantage
of this approach is that the product space is far too large for our purposes: It contains
extremely irregular paths $x(t)$, although at least almost every path of Brownian motion
is continuous by definition. Actually, since $[0, \infty)$ is uncountable, the subset of all
continuous paths is not even measurable w.r.t. the product σ-algebra on $(\mathbb{R}^d)^{[0,\infty)}$.
Instead of the product space, we will directly consider the distribution of Brownian
motion on the continuous path space C([0, ∞), Rd). For this purpose, we fix a Brownian
motion (Bt )t≥0 starting at x0 ∈ Rd on a probability space (Ω, A, P ), and we assume that
every sample path $t \mapsto B_t(\omega)$ is continuous. This assumption can always be fulfilled by
modifying a given Brownian motion on a set of measure zero. The full process (Bt )t≥0
can then be interpreted as a single path-space valued random variable (or a "random
path").
[Figure: a sample point ω ∈ Ω is mapped to the path B(ω), a continuous curve in Rd starting at x0.]
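We endow the path space $C([0, \infty), \mathbb{R}^d)$ with the σ-algebra
\[ \mathcal{B} = \sigma(X_t \mid t \ge 0) \]
generated by the coordinate maps $X_t(x) = x_t$.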
Definition. The probability measure µx0 on the path space C([0, ∞), Rd) determined
by (1.1.10) is called Wiener measure (with start in x0 ).
Remark (Uniqueness in distribution). The Theorem asserts that the path space distri-
bution of a Brownian motion starting at a given point x0 is the corresponding Wiener
measure. In particular, it is uniquely determined by the marginal distributions in (1.1.9).
Since the cylinder sets of type $\{X_{t_1} \in A_1, \ldots, X_{t_n} \in A_n\}$ generate the σ-algebra $\mathcal{B}$, the
map $B$ is $\mathcal{A}/\mathcal{B}$-measurable. Moreover, by Corollary 1.3, the probabilities
\[ P[B \in \{X_{t_1} \in A_1, \ldots, X_{t_n} \in A_n\}] = P[B_{t_1} \in A_1, \ldots, B_{t_n} \in A_n] \]
are given by the right hand side of (1.1.10). Finally, the measure $\mu_{x_0}$ is uniquely
determined by (1.1.10), since the system of cylinder sets as above is stable under intersections
and generates the σ-algebra $\mathcal{B}$.
Definition (Canonical model for Brownian motion.). By (1.1.10), the coordinate pro-
cess
Xt (x) = xt , t ≥ 0,
on C([0, ∞), Rd) is a Brownian motion starting at x0 w.r.t. Wiener measure µx0 . We
refer to the stochastic process (C([0, ∞), Rd), B, µx0 , (Xt )t≥0 ) as the canonical model
for Brownian motion starting at x0 .
Multivariate normals
Let us first recall some basics on normal random vectors:
N(m, 0) = δm .
Differentiating (1.2.1) w.r.t. p shows that for a random variable Y ∼ N(m, C), the
mean vector is m and Ci,j is the covariance of the components Yi and Yj . Moreover, the
following important facts hold:
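(2). Any affine function of a normally distributed random vector Y is again normally
distributed: if $Y \sim N(m, C)$, then $AY + b \sim N(Am + b, ACA^\top)$ for any $d \in \mathbb{N}$,
$A \in \mathbb{R}^{d \times n}$ and $b \in \mathbb{R}^d$. Indeed,
\[ E[e^{ip \cdot (AY + b)}] = e^{ip \cdot b}\, E[e^{i (A^\top p) \cdot Y}] = e^{ip \cdot (Am + b) - \frac{1}{2} p \cdot ACA^\top p} \qquad \text{for any } p \in \mathbb{R}^d, \]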
\[ E[e^{ip \cdot Y}] = e^{ip \cdot m - \frac{1}{2} p \cdot Cp} = \prod_{k=1}^n e^{i m_k p_k - \frac{1}{2} C_{k,k} p_k^2} \]
If Y has a multivariate normal distribution N(m, C) then for any p, q ∈ Rn , the random
variables p · Y and q · Y are normally distributed with means p · m and q · m, and
covariance
\[ \operatorname{Cov}[p \cdot Y, q \cdot Y] = \sum_{i,j=1}^n p_i\, C_{i,j}\, q_j = p \cdot Cq. \]
Conversely, we can generate a random vector Y with distribution N(m, C) from i.i.d.
standard normal random variables Z1 , . . . , Zn by setting
\[ Y = m + \sum_{i=1}^n \sqrt{\lambda_i}\, Z_i\, e_i. \tag{1.2.3} \]
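Corollary 1.6 (Generating normal random vectors). Suppose that $C = U \Lambda U^\top$ with
a matrix $U \in \mathbb{R}^{n \times d}$, $d \in \mathbb{N}$, and a diagonal matrix $\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_d) \in \mathbb{R}^{d \times d}$ with
nonnegative entries $\lambda_i$. If $Z = (Z_1, \ldots, Z_d)$ is a random vector with i.i.d. standard
normal random components $Z_1, \ldots, Z_d$ then
\[ Y = U \Lambda^{1/2} Z + m \]
has distribution $N(m, C)$. Indeed, since $Z \sim N(0, I_d)$, the affine transformation rule (2)
above yields
\[ Y \sim N(m, U \Lambda U^\top). \]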
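An alternative way to generate samples from $N(m, C)$ is based on the Cholesky decomposition
\[ C = LL^\top \]
of the covariance matrix as a product of a lower triangular matrix $L$ and the upper
triangular transpose $L^\top$:
(1). Compute the Cholesky decomposition $C = LL^\top$.
(2). Generate independent samples $z_1, \ldots, z_n \sim N(0, 1)$.
(3). Set $y := Lz + m$.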
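The Cholesky-based sampling steps can be sketched in a few lines of plain Python. This is an illustration only: the helper names `cholesky` and `sample_normal` are hypothetical, and the hand-written factorization assumes that C is symmetric and strictly positive definite (the non-negative definite case would need extra care).

```python
import math
import random

def cholesky(c):
    """Lower triangular L with L L^T = c, assuming c is symmetric positive definite."""
    n = len(c)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(c[i][i] - s)
            else:
                low[i][j] = (c[i][j] - s) / low[j][j]
    return low

def sample_normal(m, c, rng):
    """One sample y = L z + m with z a vector of independent N(0, 1) samples."""
    low = cholesky(c)
    z = [rng.gauss(0.0, 1.0) for _ in m]
    return [m[i] + sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(len(m))]

if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_normal([1.0, -1.0], [[4.0, 2.0], [2.0, 3.0]], rng))
```

In practice one would compute the factor once and reuse it for many samples; the sketch recomputes it per call only to keep the code short.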
Gaussian processes
Let I be an arbitrary index set, e.g. I = N, I = [0, ∞) or I = Rn .
The distribution of a Gaussian process $(Y_t)_{t \in I}$ on the path space $\mathbb{R}^I$ or $C(I, \mathbb{R})$ endowed
with the σ-algebra generated by the maps $x \mapsto x_t$, $t \in I$, is uniquely determined by
the multinormal distributions of finite subcollections $Y_{t_1}, \ldots, Y_{t_n}$ as above, and hence
by the expectation values
\[ m(t) = E[Y_t], \qquad t \in I, \]
and the covariances
\[ c(s, t) = \operatorname{Cov}[Y_s, Y_t], \qquad s, t \in I. \]
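Example (AR(1) process). The autoregressive process $(Y_n)_{n = 0, 1, 2, \ldots}$ defined recursively
by $Y_0 \sim N(0, v_0)$,
\[ Y_n = \alpha Y_{n-1} + \varepsilon \eta_n \qquad \text{for } n \in \mathbb{N}, \]
with parameters $v_0 > 0$, $\alpha, \varepsilon \in \mathbb{R}$ and i.i.d. random variables $\eta_n \sim N(0, 1)$ independent
of $Y_0$, is a centered Gaussian process. Its covariance function is given by
\[ c(n, n + k) = v_0 + \varepsilon^2 n \qquad \text{for any } n, k \ge 0 \text{ if } \alpha = 1, \]
and
\[ c(n, n + k) = \alpha^k \cdot \left( \alpha^{2n} v_0 + (1 - \alpha^{2n}) \cdot \frac{\varepsilon^2}{1 - \alpha^2} \right) \qquad \text{for } n, k \ge 0 \text{ otherwise.} \]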
This is easily verified by induction. We now consider some special cases:
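α = 0: In this case $Y_n = \varepsilon \eta_n$ for $n \ge 1$. Hence $(Y_n)$ is a white noise, i.e., a sequence of
independent normal random variables, and
\[ \operatorname{Cov}[Y_n, Y_m] = \varepsilon^2 \cdot \delta_{n,m} \qquad \text{for any } n, m \ge 1. \]
α = 1: Here $Y_n = Y_0 + \varepsilon \sum_{i=1}^n \eta_i$, i.e., the process $(Y_n)$ is a Gaussian Random Walk, and
\[ \operatorname{Cov}[Y_n, Y_m] = v_0 + \varepsilon^2 \cdot \min(n, m) \qquad \text{for any } n, m \ge 0. \]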
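α < 1: For α < 1, the covariances $\operatorname{Cov}[Y_n, Y_{n+k}]$ decay exponentially fast as $k \to \infty$.
If $v_0 = \frac{\varepsilon^2}{1 - \alpha^2}$, then the covariance function is translation invariant:
\[ c(n, n + k) = \frac{\varepsilon^2 \alpha^k}{1 - \alpha^2} \qquad \text{for any } n, k \ge 0. \]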
Therefore, in this case the process (Yn ) is stationary, i.e., (Yn+k )n≥0 ∼ (Yn )n≥0 for all
k ≥ 0.
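The induction behind the covariance formula can be mirrored numerically. The sketch below assumes the recursion $Y_n = \alpha Y_{n-1} + \varepsilon \eta_n$ with standard normal noise (consistent with the special cases above), so that $v_n = \operatorname{Var}[Y_n]$ satisfies $v_n = \alpha^2 v_{n-1} + \varepsilon^2$ and $\operatorname{Cov}[Y_n, Y_{n+k}] = \alpha^k v_n$. The function names are hypothetical.

```python
def ar1_variances(alpha, eps, v0, n_max):
    """Var[Y_n] for n = 0, ..., n_max via the recursion v_n = alpha^2 v_{n-1} + eps^2."""
    variances = [v0]
    for _ in range(n_max):
        variances.append(alpha ** 2 * variances[-1] + eps ** 2)
    return variances

def ar1_covariance(alpha, eps, v0, n, k):
    """Closed form for c(n, n + k) = Cov[Y_n, Y_{n+k}]."""
    if alpha ** 2 == 1:
        # Random Walk type case: the variance grows linearly in n.
        return alpha ** k * (v0 + eps ** 2 * n)
    stationary = eps ** 2 / (1 - alpha ** 2)
    return alpha ** k * (alpha ** (2 * n) * v0 + (1 - alpha ** (2 * n)) * stationary)

if __name__ == "__main__":
    print(ar1_variances(0.5, 2.0, 3.0, 5))
    print([ar1_covariance(0.5, 2.0, 3.0, n, 0) for n in range(6)])
```

Starting from the stationary variance $v_0 = \varepsilon^2 / (1 - \alpha^2)$, the closed form no longer depends on $n$, which is the translation invariance noted above.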
Proof. For a Brownian motion (Bt ) and 0 = t0 < t1 < . . . < tn , the increments Bti −
Bti−1 , 1 ≤ i ≤ n, are independent random variables with distribution N(0, ti − ti−1 ).
Hence,
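\[ (B_{t_1} - B_{t_0}, \ldots, B_{t_n} - B_{t_{n-1}}) \sim \bigotimes_{i=1}^n N(0, t_i - t_{i-1}), \]
which is a multinormal distribution. Since $B_{t_0} = B_0 = 0$, we see that
\[ \begin{pmatrix} B_{t_1} \\ \vdots \\ B_{t_n} \end{pmatrix}
= \begin{pmatrix}
1 & 0 & 0 & \cdots & 0 & 0 \\
1 & 1 & 0 & \cdots & 0 & 0 \\
\vdots & & & \ddots & & \vdots \\
1 & 1 & 1 & \cdots & 1 & 0 \\
1 & 1 & 1 & \cdots & 1 & 1
\end{pmatrix}
\begin{pmatrix} B_{t_1} - B_{t_0} \\ \vdots \\ B_{t_n} - B_{t_{n-1}} \end{pmatrix} \]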
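also has a multivariate normal distribution, i.e., $(B_t)$ is a Gaussian process. Moreover,
since $B_t = B_t - B_0$, we have $E[B_t] = 0$ and
\[ \operatorname{Cov}[B_s, B_t] = \operatorname{Cov}[B_s, B_s] + \operatorname{Cov}[B_s, B_t - B_s] = \operatorname{Var}[B_s] = s \qquad \text{for any } 0 \le s \le t. \]
Conversely, if $(B_t)$ is a continuous centered Gaussian process with $\operatorname{Cov}[B_s, B_t] = \min(s, t)$,
then for any $0 = t_0 < t_1 < \ldots < t_n$ the increments $B_{t_i} - B_{t_{i-1}}$ are jointly normal and
uncorrelated with variances $t_i - t_{i-1}$.
Hence by Theorem 1.5 (3), the increments $B_{t_i} - B_{t_{i-1}}$, $1 \le i \le n$, are independent with
distribution $N(0, t_i - t_{i-1})$, i.e., $(B_t)$ is a Brownian motion.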
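The matrix identity in the proof can be checked directly: if $A$ denotes the lower triangular matrix of ones and $D$ the diagonal matrix of increment variances, then $A D A^\top$ should have entries $\min(t_i, t_j)$. A minimal sketch with a hypothetical helper name `brownian_covariance`:

```python
def brownian_covariance(times):
    """Covariance of (B_{t_1}, ..., B_{t_n}) computed as A D A^T, for 0 < t_1 < ... < t_n."""
    n = len(times)
    # Variances of the independent increments B_{t_i} - B_{t_{i-1}}, with t_0 = 0.
    d = [times[0]] + [times[i] - times[i - 1] for i in range(1, n)]
    # Lower triangular matrix of ones mapping increments to values.
    a = [[1.0 if k <= i else 0.0 for k in range(n)] for i in range(n)]
    return [[sum(a[i][k] * d[k] * a[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]

if __name__ == "__main__":
    for row in brownian_covariance([0.5, 1.0, 2.5]):
        print(row)
```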
Theorem 1.9 (Invariance properties of Wiener measure). Let (Bt )t≥0 be a Brown-
ian motion starting at 0 defined on a probability space (Ω, A, P ). Then the following
processes are again Brownian motions:
(4). The time inversion $(\tilde B_t)_{t \ge 0}$ defined by
\[ \tilde B_0 = 0, \qquad \tilde B_t = t \cdot B_{1/t} \quad \text{for } t > 0. \]
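Proof. The proofs of (1), (2) and (3) are left as an exercise to the reader. To show (4),
we first note that for each $n \in \mathbb{N}$ and $0 \le t_1 < \ldots < t_n$, the vector $(\tilde B_{t_1}, \ldots, \tilde B_{t_n})$ has a
multivariate normal distribution since it is a linear transformation of $(B_{1/t_1}, \ldots, B_{1/t_n})$,
or of $(B_0, B_{1/t_2}, \ldots, B_{1/t_n})$ if $t_1 = 0$. Moreover,
\[ \begin{aligned}
E[\tilde B_t] &= 0 && \text{for any } t \ge 0, \\
\operatorname{Cov}[\tilde B_s, \tilde B_t] &= st \cdot \operatorname{Cov}[B_{1/s}, B_{1/t}] = st \cdot \min\left(\tfrac{1}{s}, \tfrac{1}{t}\right) = \min(t, s) && \text{for any } s, t > 0, \text{ and} \\
\operatorname{Cov}[\tilde B_0, \tilde B_t] &= 0 && \text{for any } t \ge 0.
\end{aligned} \]
Hence $(\tilde B_t)_{t \ge 0}$ is a centered Gaussian process with the same covariance function, and
therefore the same finite dimensional marginals, as $(B_t)_{t \ge 0}$.
Since $\tilde B_t$ is almost surely continuous for $t > 0$, we can conclude that outside a set of
measure zero,
\[ \sup_{s \in (0,t)} |\tilde B_s| = \sup_{s \in (0,t) \cap \mathbb{Q}} |\tilde B_s| \longrightarrow 0 \qquad \text{as } t \searrow 0, \]
i.e., $t \mapsto \tilde B_t$ is almost surely continuous at 0 as well.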
Remark (Long time asymptotics versus local regularity, LLN). The time inversion
invariance of Wiener measure enables us to translate results on the long time asymptotics
of Brownian motion ($t \nearrow \infty$) into local regularity results for Brownian paths
($t \searrow 0$) and vice versa. For example, the continuity of the process $(\tilde B_t)$ at 0 is equivalent
to the law of large numbers:
\[ P\Big[ \lim_{t \to \infty} \frac{1}{t} B_t = 0 \Big] = P\Big[ \lim_{s \searrow 0} s B_{1/s} = 0 \Big] = 1. \]
At first glance, this looks like a simple proof of the LLN. However, the argument is based
on the existence of a continuous Brownian motion, and the existence proof requires
similar arguments as a direct proof of the law of large numbers.
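Wiener measure (with start at 0) is the unique probability measure µ on the continuous
path space $C([0, \infty), \mathbb{R}^d)$ such that the coordinate process $X_t(x) = x_t$ is a Brownian motion
starting at 0. By Corollary 1.3, the finite dimensional marginals of µ for $0 = t_0 < t_1 < \ldots < t_n$ are
\[ \mu_{t_1, \ldots, t_n}(dx_{t_1}, \ldots, dx_{t_n}) = \frac{1}{Z(t_1, \ldots, t_n)} \exp\left( -\frac{1}{2} \sum_{i=1}^n \frac{|x_{t_i} - x_{t_{i-1}}|^2}{t_i - t_{i-1}} \right) \prod_{i=1}^n dx_{t_i}, \tag{1.2.5} \]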
where $Z(t_1, \ldots, t_n)$ is an appropriate finite normalization constant, and $x_0 := 0$. Now
choose a sequence $(\tau_k)_{k \in \mathbb{N}}$ of partitions $0 = t_0^{(k)} < t_1^{(k)} < \ldots < t_{n(k)}^{(k)} = T$ of the interval
$[0, T]$ such that the mesh size $\max_i |t_{i+1}^{(k)} - t_i^{(k)}|$ tends to zero. Taking informally the limit
in (1.2.5), we obtain the heuristic asymptotic representation
\[ \mu(dx) = \frac{1}{Z_\infty} \exp\left( -\frac{1}{2} \int_0^T \left| \frac{dx}{dt} \right|^2 dt \right) \delta_0(dx_0) \prod_{t \in (0,T]} dx_t. \tag{1.2.6} \]
This expression cannot be taken literally, for several reasons:
• The normalizing constant $Z_\infty = \lim_{k \to \infty} Z(t_1^{(k)}, \ldots, t_{n(k)}^{(k)})$ is infinite.
• The integral $\int_0^T \left| \frac{dx}{dt} \right|^2 dt$ is also infinite for µ-almost every path x, since typical
paths of Brownian motion are nowhere differentiable, cf. below.
• The product measure $\prod_{t \in (0,T]} dx_t$ can be defined on cylinder sets but an extension to
the σ-algebra generated by the coordinate maps on $C([0, \infty), \mathbb{R}^d)$ does not exist.
Hence there are several infinities involved in the informal expression (1.2.6). These
infinities magically balance each other such that the measure µ is well defined in contrast
to all of the factors on the right hand side.
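Although not mathematically rigorous, the heuristic expression (1.2.6) can be a very
useful guide for intuition. Note for example that (1.2.6) takes the form
\[ \mu(dx) \propto \exp\left( -\|x\|_H^2 / 2 \right) \delta_0(dx_0) \prod_{t \in (0,T]} dx_t, \tag{1.2.7} \]
where $\|x\|_H = (x, x)_H^{1/2}$ is the norm induced by the inner product
\[ (x, y)_H = \int_0^T \frac{dx}{dt} \frac{dy}{dt}\, dt \tag{1.2.8} \]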
and hence µ ≡ 0.
Brownian motion as a random Fourier series. The approach described here is slightly
different and due to P. Lévy: The idea is to approximate the paths of Brownian mo-
tion on a finite time interval by their piecewise linear interpolations w.r.t. the sequence
of dyadic partitions. This corresponds to a development of the Brownian paths w.r.t.
Schauder functions ("wavelets") which turns out to be very useful for many applica-
tions including numerical simulations.
A first attempt
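Recall that µ0 should be a kind of standard normal distribution w.r.t. the inner product
\[ (x, y)_H = \int_0^1 \frac{dx}{dt} \frac{dy}{dt}\, dt \tag{1.3.1} \]
on functions $x, y : [0, 1] \to \mathbb{R}$. Therefore, we could try to define
\[ B_t(\omega) := \sum_{i=1}^\infty Z_i(\omega)\, e_i(t) \qquad \text{for } t \in [0, 1], \tag{1.3.2} \]
where $(Z_i)_{i \in \mathbb{N}}$ is a sequence of independent standard normal random variables, and
$(e_i)_{i \in \mathbb{N}}$ is an orthonormal basis in the Hilbert space
\[ H = \{ x : [0, 1] \to \mathbb{R} \mid x(0) = 0,\ x \text{ is absolutely continuous with } (x, x)_H < \infty \}. \tag{1.3.3} \]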
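Theorem 1.10. Suppose $(e_i)_{i \in \mathbb{N}}$ is a sequence of orthonormal vectors in a Hilbert space
$H$ and $(Z_i)_{i \in \mathbb{N}}$ is a sequence of i.i.d. random variables with $P[Z_i \ne 0] > 0$. Then the
series $\sum_{i=1}^\infty Z_i(\omega) e_i$ diverges with probability 1 w.r.t. the norm on $H$.
Proof. By orthonormality and by the law of large numbers,
\[ \Big\| \sum_{i=1}^n Z_i(\omega) e_i \Big\|_H^2 = \sum_{i=1}^n Z_i(\omega)^2 \longrightarrow \infty \]
$P$-almost surely as $n \to \infty$.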
The Theorem again reflects the fact that a standard normal distribution on an
infinite-dimensional Hilbert space cannot be realized on the space itself.
and correspondingly the Hilbert space H by the Banach space C([0, 1]). Note that the
supremum norm is weaker than the H-norm. In fact, for x ∈ H and t ∈ [0, 1], the
Cauchy-Schwarz inequality implies
\[ |x_t|^2 = \left| \int_0^t x_s'\, ds \right|^2 \le t \cdot \int_0^t |x_s'|^2\, ds \le \|x\|_H^2, \]
and therefore
\[ \|x\|_{\sup} \le \|x\|_H \qquad \text{for any } x \in H. \]
There are two choices for an orthonormal basis of the Hilbert space H that are of par-
ticular interest: The first is the Fourier basis given by
\[ e_0(t) = t, \qquad e_n(t) = \frac{\sqrt{2}}{\pi n} \sin(\pi n t) \quad \text{for } n \ge 1. \]
With respect to this basis, the series in (1.3.2) is a Fourier series with random coeffi-
cients. Wiener’s original construction of Brownian motion is based on a random Fourier
series. A second convenient choice is the basis of Schauder functions ("wavelets") that
has been used by P. Lévy to construct Brownian motion. Below, we will discuss Lévy’s
construction in detail. In particular, we will prove that for the Schauder functions, the
series in (1.3.2) converges almost surely w.r.t. the supremum norm towards a contin-
uous (but not absolutely continuous) random path (Bt )t∈[0,1] . It is then not difficult to
conclude that (Bt )t∈[0,1] is indeed a Brownian motion.
An obvious advantage of this approximation over a Fourier expansion is that the values
of the approximating functions at the dyadic points remain fixed once the approximating
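\[ e(t) = t, \quad \text{and} \quad e_{n,k}(t) = 2^{-n/2}\, e_{0,0}(2^n t - k), \qquad n = 0, 1, 2, \ldots,\ k = 0, 1, 2, \ldots, 2^n - 1, \]
where
\[ e_{0,0}(t) = \min(t, 1 - t)^+ = \begin{cases} t & \text{for } t \in [0, 1/2], \\ 1 - t & \text{for } t \in (1/2, 1], \\ 0 & \text{for } t \in \mathbb{R} \setminus [0, 1]. \end{cases} \]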
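[Figure: graphs of the Schauder functions $e(t)$, $e_{n,k}(t)$ (supported on $[k \cdot 2^{-n}, (k+1) \cdot 2^{-n}]$ with maximum $2^{-(1+n/2)}$) and $e_{0,0}(t)$ (with maximum 0.5).]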
is a dense subset of H.
\[ \|x\|_H^2 = \sum_{i=1}^\infty (x, e_i)_H^2 \tag{1.3.4} \]
holds.
\[ (x, y)_H = \sum_{i=1}^\infty (x, e_i)_H\, (y, e_i)_H. \tag{1.3.5} \]
For the proofs we refer to any book on functional analysis, cf. e.g. [Reed and Simon:
Methods of modern mathematical physics, Vol. I].
Lemma 1.11. The Schauder functions $e$ and $e_{n,k}$ ($n \ge 0$, $0 \le k < 2^n$) form an
orthonormal basis in the Hilbert space $H$ defined by (1.3.3).
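Proof. By definition of the inner product on $H$, the linear map $d/dt$ which maps an
absolutely continuous function $x \in H$ to its derivative $x' \in L^2(0, 1)$ is an isometry
from $H$ onto $L^2(0, 1)$, i.e.,
\[ (x, y)_H = (x', y')_{L^2(0,1)} \qquad \text{for any } x, y \in H. \]
The derivatives of the Schauder functions are the Haar functions
\[ e'(t) \equiv 1, \qquad e'_{n,k}(t) = 2^{n/2} \left( I_{[k \cdot 2^{-n}, (k + 1/2) \cdot 2^{-n})}(t) - I_{[(k + 1/2) \cdot 2^{-n}, (k+1) \cdot 2^{-n})}(t) \right) \quad \text{for a.e. } t. \]
[Figure: graph of the Haar function $e'_{n,k}$.]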
It is easy to see that these functions form an orthonormal basis in $L^2(0, 1)$. In fact,
orthonormality w.r.t. the $L^2$ inner product can be verified directly. Moreover, the linear
span of the functions $e'$ and $e'_{n,k}$ for $n = 0, 1, \ldots, m$ and $k = 0, 1, \ldots, 2^n - 1$ consists of
all step functions that are constant on each dyadic interval $[j \cdot 2^{-(m+1)}, (j + 1) \cdot 2^{-(m+1)})$.
An arbitrary function in $L^2(0, 1)$ can be approximated by dyadic step functions w.r.t.
the $L^2$ norm. This follows for example directly from the $L^2$ martingale convergence
Theorem, cf. ... below. Hence the linear span of $e'$ and the Haar functions $e'_{n,k}$ is dense
in $L^2(0, 1)$, and therefore these functions form an orthonormal basis of the Hilbert space
$L^2(0, 1)$. Since $x \mapsto x'$ is an isometry from $H$ onto $L^2(0, 1)$, we can conclude that $e$ and
the Schauder functions $e_{n,k}$ form an orthonormal basis of $H$.
The expansion of a function x : [0, 1] → R in the basis of Schauder functions can now
be made explicit. The coefficients of a function x ∈ H in the expansion are
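\[ (x, e)_H = \int_0^1 x' e'\, dt = \int_0^1 x'\, dt = x(1) - x(0) = x(1), \]
\[ \begin{aligned}
(x, e_{n,k})_H &= \int_0^1 x' e'_{n,k}\, dt = 2^{n/2} \int_0^1 x'(t)\, e'_{0,0}(2^n t - k)\, dt \\
&= 2^{n/2} \left[ \left( x((k + \tfrac{1}{2}) \cdot 2^{-n}) - x(k \cdot 2^{-n}) \right) - \left( x((k + 1) \cdot 2^{-n}) - x((k + \tfrac{1}{2}) \cdot 2^{-n}) \right) \right].
\end{aligned} \]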
Proof. It can be easily verified that by definition of the Schauder functions, for each
m ∈ N the partial sum
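\[ x^{(m)}(t) := x(1) e(t) - \sum_{n=0}^m \sum_{k=0}^{2^n - 1} 2^{n/2}\, \Delta_{n,k} x \cdot e_{n,k}(t), \tag{1.3.6} \]
with $\Delta_{n,k} x := \left[ x((k+1) \cdot 2^{-n}) - x((k + \tfrac{1}{2}) \cdot 2^{-n}) \right] - \left[ x((k + \tfrac{1}{2}) \cdot 2^{-n}) - x(k \cdot 2^{-n}) \right]$,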
is the polygonal interpolation of x(t) w.r.t. the (m+1)-th dyadic partition of the interval
[0, 1]. Since the function x is uniformly continuous on [0, 1], the polygonal interpola-
tions converge uniformly to x. This proves the first statement. Moreover, for x ∈ H,
the series is the expansion of x in the orthonormal basis of H given by the Schauder
functions, and therefore it also converges w.r.t. the H-norm.
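The claim that the partial sums are polygonal interpolations can be checked numerically. The following sketch assumes a function $x$ on $[0, 1]$ with $x(0) = 0$ and uses the second order differences $\Delta_{n,k} x = [x((k+1) 2^{-n}) - x((k + \frac{1}{2}) 2^{-n})] - [x((k + \frac{1}{2}) 2^{-n}) - x(k 2^{-n})]$, so that the coefficient of $e_{n,k}$ agrees with $(x, e_{n,k})_H$ computed above. The function names are hypothetical.

```python
def tent(t):
    """e_{0,0}(t) = max(min(t, 1 - t), 0)."""
    return max(min(t, 1.0 - t), 0.0)

def schauder(n, k, t):
    """Schauder function e_{n,k}(t) = 2^{-n/2} e_{0,0}(2^n t - k)."""
    return 2.0 ** (-n / 2.0) * tent(2.0 ** n * t - k)

def schauder_partial_sum(x, m, t):
    """Partial sum x^(m)(t) from (1.3.6) for a function x with x(0) = 0."""
    total = x(1.0) * t
    for n in range(m + 1):
        for k in range(2 ** n):
            left = k * 2.0 ** -n
            mid = (k + 0.5) * 2.0 ** -n
            right = (k + 1) * 2.0 ** -n
            delta = (x(right) - x(mid)) - (x(mid) - x(left))
            total -= 2.0 ** (n / 2.0) * delta * schauder(n, k, t)
    return total

if __name__ == "__main__":
    x = lambda t: t ** 3 - 0.25 * t
    # At dyadic points of level m + 1 the partial sum reproduces x.
    for j in range(5):
        print(j / 4, schauder_partial_sum(x, 1, j / 4), x(j / 4))
```

Between neighbouring dyadic points the partial sum is linear, so evaluating it at a midpoint should return the average of the two neighbouring values of $x$.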
Corollary 1.13 (Wiener-Lévy representation). For a Brownian motion (Bt )t∈[0,1] the
series representation
\[ B_t(\omega) = Z(\omega) e(t) + \sum_{n=0}^\infty \sum_{k=0}^{2^n - 1} Z_{n,k}(\omega)\, e_{n,k}(t), \qquad t \in [0, 1], \tag{1.3.7} \]
Proof. It only remains to verify that the coefficients Z and Zn,k are independent with
standard normal distribution. A vector given by finitely many of these random variables
has a multivariate normal distribution, since it is a linear transformation of increments
of the Brownian motion Bt . Hence it suffices to show that the random variables are
uncorrelated with variance 1. This is left as an exercise to the reader.
The convergence proof relies on a combination of the Borel-Cantelli Lemma and the
Weierstrass criterion for uniform convergence of series of functions. Moreover, we will
need the following result to identify the limit process as a Brownian motion:
Lemma 1.15 (Parseval relation for Schauder functions). For any s, t ∈ [0, 1],
\[ e(t) e(s) + \sum_{n=0}^\infty \sum_{k=0}^{2^n - 1} e_{n,k}(t)\, e_{n,k}(s) = \min(t, s). \]
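Proof. For $g \in H$ and $s \in [0, 1]$, we have
\[ g(s) = g(s) - g(0) = \int_0^1 g' \cdot I_{(0,s)}\, dt = (g, h^{(s)})_H, \]
where $h^{(s)}(t) := \int_0^t I_{(0,s)}(u)\, du = \min(s, t)$. Hence the Parseval relation (1.3.5) applied to
the functions $h^{(s)}$ and $h^{(t)}$ yields
\[ \begin{aligned}
e(t) e(s) + \sum_{n,k} e_{n,k}(t)\, e_{n,k}(s)
&= (e, h^{(t)}) (e, h^{(s)}) + \sum_{n,k} (e_{n,k}, h^{(t)}) (e_{n,k}, h^{(s)}) \\
&= (h^{(t)}, h^{(s)}) = \int_0^1 I_{(0,t)}\, I_{(0,s)}\, du = \min(t, s).
\end{aligned} \]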
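The Parseval relation for the Schauder functions is easy to test at dyadic points, where all terms with $n > m$ vanish on the grid $\{ j \cdot 2^{-(m+1)} \}$ and the sum is therefore finite. A small sketch with hypothetical function names:

```python
def tent(t):
    """e_{0,0}(t) = max(min(t, 1 - t), 0)."""
    return max(min(t, 1.0 - t), 0.0)

def schauder(n, k, t):
    """Schauder function e_{n,k}(t) = 2^{-n/2} e_{0,0}(2^n t - k)."""
    return 2.0 ** (-n / 2.0) * tent(2.0 ** n * t - k)

def parseval_partial_sum(s, t, m):
    """e(t) e(s) plus the sum of e_{n,k}(t) e_{n,k}(s) over n <= m and 0 <= k < 2^n."""
    total = s * t
    for n in range(m + 1):
        for k in range(2 ** n):
            total += schauder(n, k, s) * schauder(n, k, t)
    return total

if __name__ == "__main__":
    # On the grid j / 8 the partial sum with m = 2 should already equal min(s, t).
    for s, t in [(0.125, 0.125), (0.25, 0.75), (0.5, 0.625)]:
        print(s, t, parseval_partial_sum(s, t, 2), min(s, t))
```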
for any n ∈ N. Since the sequence on the right hand side is summable, Mn ≤ n
holds eventually with probability one. Therefore, the sequence on the right hand
side of (1.3.8) is also summable for P -almost every ω. Hence, by (1.3.8) and the
Weierstrass criterion, the partial sums
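\[ B^{(m)}_t(\omega) = Z(\omega) e(t) + \sum_{n=0}^m \sum_{k=0}^{2^n - 1} Z_{n,k}(\omega)\, e_{n,k}(t), \qquad m \in \mathbb{N}, \]
converge uniformly on $[0, 1]$ for $P$-almost every $\omega$, and the limit $t \mapsto B_t(\omega)$ is continuous.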
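(2). $L^2$ convergence for fixed t: We now want to prove that the limit process $(B_t)$
is a Brownian motion, i.e., a continuous Gaussian process with $E[B_t] = 0$ and
$\operatorname{Cov}[B_t, B_s] = \min(t, s)$ for any $t, s \in [0, 1]$. To compute the covariances we first
show that for a given $t \in [0, 1]$ the series approximation $B^{(m)}_t$ of $B_t$ converges
also in $L^2$. Let $l, m \in \mathbb{N}$ with $l < m$. Since the $Z_{n,k}$ are independent (and hence
uncorrelated) with variance 1, we have
\[ E[(B^{(m)}_t - B^{(l)}_t)^2] = \sum_{n=l+1}^m \sum_{k=0}^{2^n - 1} e_{n,k}(t)^2. \]
The right hand side converges to 0 as $l, m \to \infty$, since $\sum_{n,k} e_{n,k}(t)^2 < \infty$ by
Lemma 1.15. Hence $(B^{(m)}_t)_{m \in \mathbb{N}}$ is a Cauchy sequence in $L^2(\Omega, \mathcal{A}, P)$, and since
$B^{(m)}_t \to B_t$ almost surely, we obtain $B^{(m)}_t \to B_t$ in $L^2$.
(3). Expectations and covariances: By the $L^2$ convergence, $E[B_t] = \lim_{m \to \infty} E[B^{(m)}_t] = 0$ and
\[ \begin{aligned}
\operatorname{Cov}[B_t, B_s] &= E[B_t B_s] = \lim_{m \to \infty} E[B^{(m)}_t B^{(m)}_s] \\
&= e(t) e(s) + \lim_{m \to \infty} \sum_{n=0}^m \sum_{k=0}^{2^n - 1} e_{n,k}(t)\, e_{n,k}(s).
\end{aligned} \]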
Here we have used again that the random variables $Z$ and $Z_{n,k}$ are independent
with variance 1. By Parseval’s relation (Lemma 1.15), we conclude
\[ \operatorname{Cov}[B_t, B_s] = \min(t, s). \]
Since the process $(B_t)_{t \in [0,1]}$ has the right expectations and covariances, and, by
construction, almost surely continuous paths, it only remains to show that $(B_t)$ is
a Gaussian process in order to complete the proof:
(4). (Bt )t∈[0,1] is a Gaussian process: We have to show that (Bt1 , . . . , Btl ) has a mul-
tivariate normal distribution for any 0 ≤ t1 < . . . < tl ≤ 1. By Theorem 1.5,
it suffices to verify that any linear combination of the components is normally
distributed. This holds by the next Lemma since
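\[ \sum_{j=1}^l p_j B_{t_j} = \lim_{m \to \infty} \sum_{j=1}^l p_j B^{(m)}_{t_j} \qquad P\text{-a.s.} \]
is an almost sure limit of normally distributed random variables for any $p_1, \ldots, p_l \in \mathbb{R}$.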
Combining Steps 3, 4 and the continuity of sample paths, we conclude that (Bt )t∈[0,1] is
indeed a Brownian motion.
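Lévy's construction lends itself to simulation: sampling the coefficients $Z$ and $Z_{n,k}$ for $n \le m$ gives the partial sum $B^{(m)}$, which at the dyadic points of level $m + 1$ already has the exact Brownian covariances. The sketch below is an illustration under these assumptions; `levy_partial_sum` is a hypothetical name, and `random.Random.gauss` is used as the source of standard normal samples.

```python
import random

def tent(t):
    """e_{0,0}(t) = max(min(t, 1 - t), 0)."""
    return max(min(t, 1.0 - t), 0.0)

def schauder(n, k, t):
    """Schauder function e_{n,k}(t) = 2^{-n/2} e_{0,0}(2^n t - k)."""
    return 2.0 ** (-n / 2.0) * tent(2.0 ** n * t - k)

def levy_partial_sum(m, rng):
    """Return (Z, path), where path samples B^(m) at the points j * 2^{-(m+1)}."""
    z = rng.gauss(0.0, 1.0)
    coeffs = {(n, k): rng.gauss(0.0, 1.0) for n in range(m + 1) for k in range(2 ** n)}
    points = 2 ** (m + 1)
    path = []
    for j in range(points + 1):
        t = j / points
        path.append(z * t + sum(c * schauder(n, k, t) for (n, k), c in coeffs.items()))
    return z, path

if __name__ == "__main__":
    z, path = levy_partial_sum(3, random.Random(0))
    print(z, path)
```

Evaluating every basis function at every grid point is quadratic in the number of points; it keeps the sketch close to the series, whereas a practical implementation would refine the path level by level.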
Lemma 1.16. Suppose that $(X_n)_{n \in \mathbb{N}}$ is a sequence of normally distributed random variables
defined on a common probability space (Ω, A, P ), and $X_n$ converges almost surely to
a random variable $X$. Then $X$ is also normally distributed.
Proof. Suppose $X_n \sim N(m_n, \sigma_n^2)$ with $m_n \in \mathbb{R}$ and $\sigma_n \in (0, \infty)$. By the Dominated
Convergence Theorem,
\[ E[e^{ipX}] = \lim_{n \to \infty} E[e^{ipX_n}] = \lim_{n \to \infty} e^{ip m_n} e^{-\frac{1}{2} \sigma_n^2 p^2}. \]
The limit on the right hand side only exists for all $p$, if either $\sigma_n \to \infty$, or the sequences
$\sigma_n$ and $m_n$ both converge to finite limits $\sigma \in [0, \infty)$ and $m \in \mathbb{R}$. In the first case,
the limit would equal 0 for $p \ne 0$ and 1 for $p = 0$. This is a contradiction, since
characteristic functions are always continuous. Hence the second case occurs, and,
therefore
\[ E[e^{ipX}] = e^{ipm - \frac{1}{2} \sigma^2 p^2} \qquad \text{for any } p \in \mathbb{R}, \]
i.e., $X \sim N(m, \sigma^2)$.
So far, we have constructed Brownian motion only for t ∈ [0, 1]. Brownian motion on
any finite time interval can easily be obtained from this process by rescaling. Brownian
motion defined for all t ∈ R+ can be obtained by joining infinitely many Brownian
motions on time intervals of length 1:
[Figure: Brownian motions $B^{(1)}$, $B^{(2)}$, $B^{(3)}$ on consecutive unit time intervals, joined to a single path.]
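Theorem 1.17. Suppose that $B^{(1)}_t, B^{(2)}_t, \ldots$ are independent Brownian motions starting
at 0 defined for $t \in [0, 1]$. Then the process
\[ B_t := B^{(\lfloor t \rfloor + 1)}_{t - \lfloor t \rfloor} + \sum_{i=1}^{\lfloor t \rfloor} B^{(i)}_1, \qquad t \ge 0, \]
is a Brownian motion defined for $t \in [0, \infty)$.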
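On a discrete time grid, the joining procedure amounts to shifting each unit-interval path by the sum of the terminal values of its predecessors. A minimal sketch, assuming every input path is sampled on the same grid of $[0, 1]$ and starts at 0 (the name `concatenate` is hypothetical):

```python
def concatenate(unit_paths):
    """Join paths on [0, 1], each starting at 0, into one path on [0, len(unit_paths)]."""
    joined = [0.0]
    offset = 0.0
    for path in unit_paths:
        # Skip the initial value 0 of each piece and shift by the accumulated end values.
        joined.extend(offset + value for value in path[1:])
        offset += path[-1]
    return joined

if __name__ == "__main__":
    print(concatenate([[0.0, 1.0, 2.0], [0.0, -1.0, -3.0]]))
```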
Theorem 1.18 (Paley, Wiener, Zygmund 1933). Almost surely, the Brownian sample
path $t \mapsto B_t$ is nowhere differentiable, and
\[ \limsup_{s \searrow t} \left| \frac{B_s - B_t}{s - t} \right| = \infty \qquad \text{for any } t \ge 0. \]
Note that, since there are uncountably many t ≥ 0, the statement is stronger than claim-
ing only the almost sure non-differentiability for any given t ≥ 0.
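Proof. It suffices to show that the set $N$ of all $\omega$ for which there exist $t \in [0, T]$ and
$k, L \in \mathbb{N}$ such that (1.4.1) below holds is a null set for any $T \in \mathbb{N}$. Hence fix $T \in \mathbb{N}$,
and consider $\omega \in N$. Then there exist $k, L \in \mathbb{N}$ and $t \in [0, T]$ such that
\[ |B_s(\omega) - B_t(\omega)| \le L \cdot |s - t| \qquad \text{holds for } s \in (t, t + \tfrac{1}{k}). \tag{1.4.1} \]
To make use of the independence of the increments over disjoint intervals, we note that
for any $n > 4k$, we can find an $i \in \{1, 2, \ldots, nT\}$ such that the intervals $(\frac{i}{n}, \frac{i+1}{n})$,
$(\frac{i+1}{n}, \frac{i+2}{n})$, and $(\frac{i+2}{n}, \frac{i+3}{n})$ are all contained in $(t, t + \frac{1}{k})$.
[Figure: three consecutive intervals of length 1/n inside (t, t + 1/k).]
Hence by (1.4.1),
\[ |B_{\frac{j+1}{n}}(\omega) - B_{\frac{j}{n}}(\omega)| \le |B_{\frac{j+1}{n}}(\omega) - B_t(\omega)| + |B_t(\omega) - B_{\frac{j}{n}}(\omega)| \le L \cdot (\tfrac{j+1}{n} - t) + L \cdot (\tfrac{j}{n} - t) \le \frac{8L}{n} \]
for $j = i, i+1, i+2$. Thus $\omega$ is contained in the set
\[ \tilde N := \bigcup_{k, L \in \mathbb{N}} \bigcap_{n > 4k} \bigcup_{i=1}^{nT} \left\{ |B_{\frac{j+1}{n}} - B_{\frac{j}{n}}| \le \frac{8L}{n} \text{ for } j = i, i+1, i+2 \right\}. \]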
Hölder continuity
The statement of Theorem 1.18 says that a typical Brownian path is not Lipschitz contin-
uous on any non-empty open interval. On the other hand, the Wiener-Lévy construction
shows that the sample paths are continuous. We can almost close the gap between these
two statements by arguing in both cases slightly more carefully:
Hence a typical Brownian path is nowhere Hölder continuous with parameter α > 1/2,
but it is Hölder continuous with parameter α < 1/2 on any finite interval. The critical
case α = 1/2 is more delicate, and will be briefly discussed below.
Proof of Theorem 1.19. The first statement can be shown by a similar argument as in
the proof of Theorem 1.18. The details are left to the reader.
To prove the second statement for T = 1, we use the Wiener-Lévy representation
\[ B_t = Z \cdot t + \sum_{n=0}^\infty \sum_{k=0}^{2^n - 1} Z_{n,k}\, e_{n,k}(t) \qquad \text{for any } t \in [0, 1] \]
with independent standard normal random variables $Z$, $Z_{n,k}$. For $t, s \in [0, 1]$ we obtain
\[ |B_t - B_s| \le |Z| \cdot |t - s| + \sum_n M_n \sum_k |e_{n,k}(t) - e_{n,k}(s)|, \]
where $M_n := \max_k |Z_{n,k}|$ as in the proof of Theorem 1.14. We have shown above that
by the Borel-Cantelli Lemma, $M_n \le n$ eventually with probability one, and hence
\[ M_n(\omega) \le C(\omega) \cdot n \]
for some almost surely finite constant $C(\omega)$. Moreover, note that for each $s$, $t$ and $n$, at
most two summands in $\sum_k |e_{n,k}(t) - e_{n,k}(s)|$ do not vanish. Since $|e_{n,k}(t)| \le \frac{1}{2} \cdot 2^{-n/2}$
and $|e'_{n,k}(t)| \le 2^{n/2}$, we obtain the estimates
By (1.4.5) the sums on the right hand side can both be bounded by a constant multiple of
|t − s|α for any α < 1/2. This proves that (Bt )t∈[0,1] is almost surely Hölder-continuous
of order α.
Theorem 1.20 (Khintchine 1924). For s ≥ 0, the following statements hold almost
surely:
\[ \limsup_{t \searrow 0} \frac{B_{s+t} - B_s}{\sqrt{2t \log\log(1/t)}} = +1, \quad \text{and} \quad \liminf_{t \searrow 0} \frac{B_{s+t} - B_s}{\sqrt{2t \log\log(1/t)}} = -1. \]
Proof. This follows by applying the Theorem above to the Brownian motion $\widehat{B}_t = t \cdot B_{1/t}$.
For example, substituting $h = 1/t$, we have
\[ \limsup_{t \to \infty} \frac{B_t}{\sqrt{2t \log\log(t)}} = \limsup_{h \searrow 0} \frac{h \cdot B_{1/h}}{\sqrt{2h \log\log(1/h)}} = +1 \]
almost surely.
The corollary is a continuous time analogue of Kolmogorov’s law of the iterated logarithm
for Random Walks stating that for $S_n = \sum_{i=1}^{n} \eta_i$, $\eta_i$ i.i.d. with $E[\eta_i] = 0$ and
$\mathrm{Var}[\eta_i] = 1$, one has
\[ \limsup_{n \to \infty} \frac{S_n}{\sqrt{2n \log\log n}} = +1 \quad \text{and} \quad \liminf_{n \to \infty} \frac{S_n}{\sqrt{2n \log\log n}} = -1 \]
almost surely. In fact, one way to prove Kolmogorov’s LIL is to embed the Random
Walk into a Brownian motion, cf. e.g. Rogers and Williams, Vol. I, Ch. 7 or Section 3.3
Passage times
We now study the set of passage times to a given level a for a one-dimensional Brownian
motion (Bt )t≥0 . This set has interesting properties – in particular it is a random fractal.
Fix $a \in \mathbb{R}$, and let
\[ \Lambda_a(\omega) := \{ t \ge 0 : B_t(\omega) = a \}. \]
Assuming that every path is continuous, the random set Λa (ω) is closed for every ω.
Moreover, scale invariance of Brownian motion implies a statistical self similarity property
for the sets of passage times: Since the rescaled process $(c^{-1/2} B_{ct})_{t \ge 0}$ has the same
distribution as $(B_t)_{t \ge 0}$ for any $c > 0$, we can conclude that the set valued random variable
$c \cdot \Lambda_{a/\sqrt{c}}$ has the same distribution as $\Lambda_a$. In particular, $\Lambda_0$ is a fractal in the sense
that
\[ \Lambda_0 \sim c \cdot \Lambda_0 \qquad \text{for any } c > 0. \]
Moreover, by Fubini’s Theorem one easily verifies that $\Lambda_a$ has almost surely Lebesgue
measure zero. In fact, continuity of $t \mapsto B_t(\omega)$ for any $\omega$ implies that $(t, \omega) \mapsto B_t(\omega)$ is
product measurable (Exercise). Hence $\{(t, \omega) : B_t(\omega) = a\}$ is contained in the product
σ-algebra, and
\[ E[\lambda(\Lambda_a)] = E\left[ \int_0^\infty I_{\{a\}}(B_t) \, dt \right] = \int_0^\infty P[B_t = a] \, dt = 0. \]
In particular, for any a ∈ R, the random set Λa is almost surely unbounded, i.e. Brow-
nian motion is recurrent.
Hence,
\[ P\left[ \sup_{t \ge 0} B_t \ge a \right] = P\left[ \sup_{t \ge 0} B_t \ge a \cdot \sqrt{c} \right] \]
for any c > 0, and therefore sup Bt ∈ {0, ∞} almost surely. The first part of the asser-
tion now follows since sup Bt is almost surely strictly positive. By reflection symmetry,
we also obtain inf Bt = −∞ with probability one.
The last Theorem makes a statement on the global structure of the set Λa . By invariance
w.r.t. time inversion this again translates into a local regularity result:
Theorem 1.23 (Fine structure of Λa ). The set Λa is almost surely a perfect set, i.e., any
t ∈ Λa is an accumulation point of Λa .
Proof. We prove the statement for a = 0, the general case being left as an exercise. We
proceed in three steps:
\[ \widetilde{B}_t := B_{T_s + t} - B_{T_s}, \qquad t \ge 0, \]
\[ \{t \ge 0 : \widetilde{B}_t = 0\} = \{t \ge 0 : B_{T_s + t} = B_{T_s}\} = \{t \ge T_s : B_t = a\} \subseteq \Lambda_a. \]
STEP 3: To complete the proof note that we have shown that the following properties
hold with probability one:
(1). Λa is closed.
Since Q+ is a dense subset of R+ , (1) and (2) imply that any t ∈ Λa is an accu-
mulation point of Λa . In fact, for any s ∈ [0, t] ∩ Q, there exists an accumulation
point of Λa in (s, t] by (2), and hence t is itself an accumulation point.
Remark. It can be shown that the set Λa has Hausdorff dimension 1/2.
attained before a given time s ∈ R+ . The idea is to proceed similarly as for Random
Walks, and to reflect the Brownian path after the first passage time
Ta = min{t ≥ 0 : Bt = a}
[Figure: a Brownian path $B_t$ and the reflected path $\widehat{B}_t$ after the first passage time $T_a$]
It seems plausible (e.g. by the heuristic path integral representation of Wiener measure,
or by a Random Walk approximation) that the reflected process $(\widehat{B}_t)_{t \ge 0}$ defined by
\[ \widehat{B}_t := \begin{cases} B_t & \text{for } t \le T_a \\ a - (B_t - a) & \text{for } t > T_a \end{cases} \]
is again a Brownian motion. At the end of this section, we will prove this reflection
principle rigorously by the strong Markov property. Assuming the reflection principle
is true, we can compute the distribution of Ms in the following way:
Proof. (1) holds since Ms ∼ |Bs |. For the proof of (2) we assume w.l.o.g. s = 1. The
general case can be reduced to this case by the scale invariance of Brownian motion
(Exercise). For a ≥ 0 and c ≤ a let
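where Φ denotes the standard normal distribution function. Since $\lim_{a \to \infty} G(a, c) = 0$ and
$\lim_{c \to -\infty} G(a, c) = 0$, we obtain
\[ \begin{aligned}
P[M_1 \ge a, B_1 \le c] = G(a, c) &= - \int_{x=a}^{\infty} \int_{y=-\infty}^{c} \frac{\partial^2 G}{\partial x \partial y}(x, y) \, dy \, dx \\
&= \int_{x=a}^{\infty} \int_{y=-\infty}^{c} 2 \cdot \frac{2x - y}{\sqrt{2\pi}} \cdot \exp\left( - \frac{(2x - y)^2}{2} \right) dy \, dx.
\end{aligned} \]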
The Theorem enables us to compute the distributions of the first passage times Ta . In
fact, for a > 0 and s ∈ [0, ∞) we obtain
\[ P[T_a \le s] = P[M_s \ge a] = 2 \cdot P[B_s \ge a] = 2 \cdot P[B_1 \ge a/\sqrt{s}] = \sqrt{\frac{2}{\pi}} \int_{a/\sqrt{s}}^{\infty} e^{-x^2/2} \, dx. \tag{1.5.1} \]
\[ f_{T_a}(s) = \frac{|a|}{\sqrt{2\pi s^3}} \cdot e^{-a^2/2s}. \]
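As a quick plausibility check, the distribution function (1.5.1) and the density of $T_a$ can be compared numerically. The sketch below uses the identity $2 \cdot P[B_1 \ge x] = \operatorname{erfc}(x/\sqrt{2})$; the function names are illustrative and not part of the text.

```python
import math


def passage_cdf(a: float, s: float) -> float:
    """P[T_a <= s] = 2 * P[B_1 >= |a| / sqrt(s)], written with the complementary error function."""
    if s <= 0:
        return 0.0
    return math.erfc(abs(a) / math.sqrt(2.0 * s))


def passage_density(a: float, s: float) -> float:
    """Density f_{T_a}(s) = |a| / sqrt(2 pi s^3) * exp(-a^2 / (2 s))."""
    return abs(a) / math.sqrt(2.0 * math.pi * s ** 3) * math.exp(-a * a / (2.0 * s))


if __name__ == "__main__":
    a, s, h = 1.0, 2.0, 1e-5
    # central difference quotient of the distribution function
    numerical = (passage_cdf(a, s + h) - passage_cdf(a, s - h)) / (2 * h)
    print(numerical, passage_density(a, s))
```

The difference quotient of the distribution function agrees with the density up to discretization error. The slow decay of the density, of order $s^{-3/2}$, reflects the heavy tail of $T_a$.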
Next, we prove a strong Markov property for Brownian motion. Below we will then
complete the proof of the reflection principle and the statements above by applying the
strong Markov property to the passage time Ta .
Ta = min{t ≥ 0 : Bt = a}
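Note that for $(\mathcal{F}_t^B)$ stopping times $S$ and $T$ with $S \le T$ we have $\mathcal{F}_S^B \subseteq \mathcal{F}_T^B$, since for
$t \ge 0$
\[ A \cap \{S \le t\} \in \mathcal{F}_t^B \implies A \cap \{T \le t\} = A \cap \{S \le t\} \cap \{T \le t\} \in \mathcal{F}_t^B. \]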
For any constant s ∈ R+ , the process (Bs+t − Bs )t≥0 is a Brownian motion independent
of FsB .
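Theorem 1.26 (Strong Markov property). Suppose that $T$ is an almost surely finite
$(\mathcal{F}_t^B)$ stopping time. Then the process $(\widetilde{B}_t)_{t \ge 0}$ defined by
\[ \widetilde{B}_t = B_{T+t} - B_T \ \text{ if } T < \infty, \qquad 0 \ \text{ otherwise}, \]
is a Brownian motion independent of $\mathcal{F}_T^B$.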
Proof. We first assume that $T$ takes values only in $C \cup \{\infty\}$ where $C$ is a countable
subset of $[0, \infty)$. Then for $A \in \mathcal{F}_T^B$ and $s \in C$, we have $A \cap \{T = s\} \in \mathcal{F}_s^B$ and
$\widetilde{B}_t = B_{t+s} - B_s$ on $A \cap \{T = s\}$. By the Markov property, $(B_{t+s} - B_s)_{t \ge 0}$ is a Brownian
motion independent of $\mathcal{F}_s^B$. Hence for any measurable subset Γ of $C([0, \infty), \mathbb{R}^d)$, we
have
\[ \begin{aligned}
P[\{(\widetilde{B}_t)_{t \ge 0} \in \Gamma\} \cap A] &= \sum_{s \in C} P[\{(B_{t+s} - B_s)_{t \ge 0} \in \Gamma\} \cap A \cap \{T = s\}] \\
&= \sum_{s \in C} \mu_0[\Gamma] \cdot P[A \cap \{T = s\}] = \mu_0[\Gamma] \cdot P[A]
\end{aligned} \]
where $\mu_0$ denotes the distribution of Brownian motion starting at 0. This proves the
assertion for discrete stopping times.
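For an arbitrary $(\mathcal{F}_t^B)$ stopping time $T$ that is almost surely finite and $n \in \mathbb{N}$, we set
$T_n = \frac{1}{n} \lceil nT \rceil$, i.e.,
\[ T_n = \frac{k}{n} \quad \text{on} \quad \left\{ \frac{k-1}{n} < T \le \frac{k}{n} \right\} \quad \text{for any } k \in \mathbb{N}. \]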
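The discretization $T_n = \frac{1}{n} \lceil nT \rceil$ rounds $T$ up to the next multiple of $1/n$, so that $T \le T_n < T + 1/n$. A minimal sketch with exact rational arithmetic (the helper name `discretize` is illustrative):

```python
import math
from fractions import Fraction


def discretize(T: Fraction, n: int) -> Fraction:
    """T_n = ceil(n * T) / n, the smallest multiple of 1/n that is >= T."""
    return Fraction(math.ceil(n * T), n)


if __name__ == "__main__":
    T = Fraction(7, 10)
    for n in (1, 2, 4, 8, 16):
        T_n = discretize(T, n)
        print(n, T_n, T_n - T)  # the rounding error is non-negative and below 1/n
```

Rounding up rather than down matters here: $\{T_n = k/n\} = \{(k-1)/n < T \le k/n\}$ only depends on the information available up to time $k/n$, so $T_n$ is again a stopping time.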
We now apply the strong Markov property to prove a reflection principle for Brownian
motion. Consider a one-dimensional continuous Brownian motion (Bt )t≥0 starting at 0.
For a ∈ R let
Theorem 1.27 (Reflection principle). The joint distributions of the following random
variables with values in R+ × C([0, ∞)) × C([0, ∞)) agree:
\[ (T_a, (B_t^{T_a})_{t \ge 0}, (\widetilde{B}_t)_{t \ge 0}) \sim (T_a, (B_t^{T_a})_{t \ge 0}, (-\widetilde{B}_t)_{t \ge 0}) \]
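[Figure: a Brownian path $B_t$ reaching the level $a$ at time $T_a$, together with its reflection $\widehat{B}_t$]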
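As a consequence of the theorem, we can complete the argument given at the beginning
of this section: The "shadow path" $\widehat{B}_t$ of a Brownian path $B_t$ with reflection when
reaching the level $a$ is given by
\[ \widehat{B}_t = \begin{cases} B_t^{T_a} & \text{for } t \le T_a \\ a - \widetilde{B}_{t - T_a} & \text{for } t > T_a \end{cases}, \]
whereas
\[ B_t = \begin{cases} B_t^{T_a} & \text{for } t \le T_a \\ a + \widetilde{B}_{t - T_a} & \text{for } t > T_a \end{cases}. \]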
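By Theorem 1.27, $(\widehat{B}_t)_{t \ge 0}$ has the same distribution as $(B_t)_{t \ge 0}$. Therefore, and since
$\max_{t \in [0,s]} B_t \ge a$ if and only if $\max_{t \in [0,s]} \widehat{B}_t \ge a$, we obtain for $a \ge c$:
\[ \begin{aligned}
P\left[ \max_{t \in [0,s]} B_t \ge a, \ B_s \le c \right] &= P\left[ \max_{t \in [0,s]} \widehat{B}_t \ge a, \ \widehat{B}_s \ge 2a - c \right] \\
&= P\left[ \widehat{B}_s \ge 2a - c \right] \\
&= \frac{1}{\sqrt{2\pi s}} \int_{2a-c}^{\infty} e^{-x^2/2s} \, dx.
\end{aligned} \]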
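The same reflection argument can be verified exactly in the discrete setting that motivated it. The sketch below is an illustration for the simple random walk, not for Brownian motion itself: it enumerates all $\pm 1$ paths of length $n$ and compares the number of paths with running maximum at least $a$ and end point at most $c$ with the number of paths ending at or above $2a - c$, for integers $a \ge 1$ and $c \le a$.

```python
from itertools import product


def reflection_counts(n: int, a: int, c: int) -> tuple[int, int]:
    """Count paths with (max >= a and S_n <= c) and paths with S_n >= 2a - c."""
    lhs = rhs = 0
    for steps in product((-1, 1), repeat=n):
        pos, running_max = 0, 0
        for step in steps:
            pos += step
            running_max = max(running_max, pos)
        if running_max >= a and pos <= c:
            lhs += 1
        if pos >= 2 * a - c:
            rhs += 1
    return lhs, rhs


if __name__ == "__main__":
    print(reflection_counts(10, 2, 1))   # both counts agree
    print(reflection_counts(10, 3, -1))  # both counts agree
```

Reflecting the part of the path after the first visit to $a$ gives a bijection between the two sets of paths, which is the discrete counterpart of Theorem 1.27.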
Chapter 2
Classical analysis starts with studying convergence of sequences of real numbers. Sim-
ilarly, stochastic analysis relies on basic statements about sequences of real-valued ran-
dom variables. Any such sequence can be decomposed uniquely into a martingale, i.e.,
a real-valued stochastic process that is “constant on average”, and a predictable part.
Therefore, estimates and convergence theorems for martingales are crucial in stochastic
analysis.
(2). A stochastic process (Xn )n≥0 is adapted to a filtration (Fn )n≥0 iff each Xn is
Fn -measurable.
Example. (1). The canonical filtration (FnX ) generated by a stochastic process (Xn )
is given by
FnX = σ(X0 , X1 , . . . , Xn ).
If the filtration is not specified explicitly, we will usually consider the canonical
filtration.
(2). Alternatively, filtrations containing additional information are of interest, for ex-
ample the filtration
Fn = σ(Z, X0 , X1 , . . . , Xn )
generated by the process (Xn ) and an additional random variable Z, or the filtra-
tion
Fn = σ(X0 , Y0, X1 , Y1 , . . . , Xn , Yn )
Clearly, the process (Xn ) is adapted to any of these filtrations. In general, (Xn ) is
adapted to a filtration (Fn ) if and only if FnX ⊆ Fn for any n ≥ 0.
and correspondingly with “=” replaced by “≤” or “≥” for super- or submartingales.
Intuitively, a martingale is a “fair game”, i.e., Mn−1 is the best prediction (w.r.t. the
mean square error) for the next value Mn given the information up to time n − 1. A su-
permartingale is “decreasing on average”, a submartingale is “increasing on average”,
and a martingale is both “decreasing” and “increasing”, i.e., “constant on average”. In
particular, by induction on n, a martingale satisfies
\[ E[M_n] = E[M_0] \qquad \text{for any } n \ge 0. \]
Similarly, for a supermartingale, the expectation values E[Mn ] are decreasing. More
generally, we have:
\[ E[M_{n+k} \mid \mathcal{F}_n] \overset{(\le)}{=} M_n \qquad P\text{-almost surely for any } n, k \ge 0. \]
A Random Walk
\[ S_n = \sum_{i=1}^{n} \eta_i, \qquad n = 0, 1, 2, \ldots, \]
\[ \mathcal{F}_n = \sigma(\eta_1, \ldots, \eta_n) = \sigma(S_0, S_1, \ldots, S_n) \]
if and only if the increments ηi are centered random variables. In fact, for any n ∈ N,
A stochastic process
\[ M_n = \prod_{i=1}^{n} Y_i, \qquad n = 0, 1, 2, \ldots, \]
\[ \mathcal{F}_n = \sigma(Y_1, \ldots, Y_n) \]
if and only if E[Yi ] = 1 for any i ∈ N, or E[Yi ] ≤ 1 for any i ∈ N respectively. In fact,
as Mn is Fn -measurable and Yn+1 is independent of Fn , we have
Martingales and supermartingales of this type occur naturally in stochastic growth mod-
els.
Example (Exponential martingales). Consider a Random Walk $S_n = \sum_{i=1}^{n} \eta_i$ with
i.i.d. increments $\eta_i$, and let
\[ Z(\lambda) := E[\exp(\lambda \eta_i)], \qquad \lambda \in \mathbb{R}, \]
denote the moment generating function of the increments. Then for any λ ∈ R with
Z(λ) < ∞, the process
\[ M_n^\lambda := e^{\lambda S_n} / Z(\lambda)^n = \prod_{i=1}^{n} e^{\lambda \eta_i} / Z(\lambda) \]
is a martingale. This martingale can be used to prove exponential bounds for Random
Walks, cf. e.g. Chernoff’s theorem [“Einführung in die Wahrscheinlichkeitstheorie”,
Theorem 8.3].
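For the simple random walk with $P[\eta_i = \pm 1] = 1/2$, the moment generating function is $Z(\lambda) = \cosh \lambda$, and the martingale property of $M_n^\lambda$ can be checked directly by averaging over the next step. The following sketch assumes this particular increment distribution; the function names are illustrative.

```python
import math


def exp_martingale(lam: float, steps: tuple) -> float:
    """M_n^lambda = exp(lam * S_n) / Z(lam)^n with Z(lam) = cosh(lam) for +-1 increments."""
    return math.exp(lam * sum(steps)) / math.cosh(lam) ** len(steps)


def conditional_mean_next(lam: float, steps: tuple) -> float:
    """E[M_{n+1}^lambda | eta_1, ..., eta_n]: average over the two equally likely next steps."""
    up = exp_martingale(lam, steps + (1,))
    down = exp_martingale(lam, steps + (-1,))
    return 0.5 * (up + down)


if __name__ == "__main__":
    history = (1, -1, 1, 1)
    lam = 0.7
    print(exp_martingale(lam, history), conditional_mean_next(lam, history))
```

Both printed values agree up to rounding, since $\frac{1}{2}(e^{\lambda} + e^{-\lambda}) / \cosh \lambda = 1$.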
On the other hand, since in the CRR model Xn only takes the values 1 + a and 1 + b,
we have
Mn := E[F | Fn ], n = 0, 1, 2, . . . ,
Mn = E[M∞ | Fn ]
d) Functions of martingales
Theorem 2.2 (Convex functions of martingales). Suppose that (Mn )n≥0 is an (Fn )
martingale, and u : R → R is a convex function that is bounded from below. Then
(u(Mn )) is an (Fn ) submartingale.
Proof. Since u is lower bounded, u(Mn )− is integrable for any n. Jensen’s inequality
for conditional expectations now implies
\[ E[u(M_{n+1}) \mid \mathcal{F}_n] \ge u\bigl( E[M_{n+1} \mid \mathcal{F}_n] \bigr) = u(M_n) \]
exist, and
(ph)(x) ≤ h(x) (respectively (ph)(x) ≥ h(x))
By the tower property for conditional expectations, any (Fn ) Markov chain is also a
Markov chain w.r.t. the canonical filtration generated by the process.
Example (Classical Random Walk on Zd ). The standard Random Walk (Xn )n≥0 on
Zd is a Markov chain w.r.t. the filtration FnX = σ(X0 , . . . , Xn ) with transition prob-
abilities p(x, x + e) = 1/2d for any unit vector e ∈ Zd . The coordinate processes
(Xni )n≥0 , i = 1, . . . , d, are Markov chains w.r.t. the same filtration with transition prob-
abilities
\[ p(x, x+1) = p(x, x-1) = \frac{1}{2d}, \qquad p(x, x) = \frac{2d - 2}{2d}. \]
University of Bonn 2015/2016
for any x ∈ Zd .
A function h : Z → R is harmonic w.r.t. p if and only if h(x) = ax + b with a, b ∈ R,
and h is superharmonic if and only if it is concave.
Mn := h(Xn ), n = 0, 1, 2, . . . ,
is a martingale (resp. a supermartingale) w.r.t. (Fn ) for every harmonic (resp. super-
harmonic) function h : S → R such that h(Xn ) (resp. h(Xn )+ ) is integrable for all
n.
Below, we will show how to construct more general martingales from Markov chains,
cf. Theorem 2.5. At first, however, we consider a simple example that demonstrates the
usefulness of martingale methods in analyzing Markov chains:
Example (Wright model for evolution). In the Wright model for a population of N
individuals (replicas) with a finite number of possible types, each individual in genera-
tion n + 1 inherits a type from a randomly chosen predecessor in the n th generation.
k N
In order to compute the probabilities of the events “Xn = 0 eventually” and “Xn = N
eventually” we can apply the Optional Stopping Theorem for martingales, cf. Section
2.3 below. Let
\[ T := \min\{ n \ge 0 : X_n = 0 \text{ or } X_n = N \} \]
denote the first hitting time of the absorbing states. If the initial number $X_0$ of individuals
of the given type is $k$, then by the Optional Stopping Theorem,
E[XT ] = E[X0 ] = k.
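Since $X_T$ only takes the values 0 and $N$, the identity $E[X_T] = k$ gives the fixation probability $P[X_T = N] = k/N$. This can be cross-checked by iterating the transition kernel. The sketch below assumes that, given $X_n = k$, the next value $X_{n+1}$ is Binomial$(N, k/N)$ distributed, which is what the verbal description of the model amounts to; the function names are illustrative.

```python
from math import comb


def wright_step(dist: list, N: int) -> list:
    """One generation: push the distribution of X_n through the Binomial(N, k/N) kernel."""
    new = [0.0] * (N + 1)
    for k, mass in enumerate(dist):
        if mass == 0.0:
            continue
        p = k / N
        for j in range(N + 1):
            new[j] += mass * comb(N, j) * p ** j * (1 - p) ** (N - j)
    return new


def fixation_probability(N: int, k: int, generations: int = 500) -> float:
    """Approximate P[X_n = N eventually] when X_0 = k by running many generations."""
    dist = [0.0] * (N + 1)
    dist[k] = 1.0
    for _ in range(generations):
        dist = wright_step(dist, N)
    return dist[N]


if __name__ == "__main__":
    N, k = 6, 2
    print(fixation_probability(N, k), k / N)
```

The mean $\sum_j j \cdot P[X_n = j]$ stays equal to $k$ in every generation, which is the martingale property behind the optional stopping argument.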
Doob Decomposition
Let (Ω, A, P ) be a probability space and (Fn )n≥0 a filtration on (Ω, A).
Intuitively, the value An (ω) of a predictable process can be predicted by the information
available at time n − 1.
Theorem 2.4 (Doob decomposition). Every (Fn ) adapted sequence of integrable ran-
dom variables Yn (n ≥ 0) has a unique decomposition (up to modification on null sets)
Yn = Mn + An (2.2.1)
into an (Fn ) martingale (Mn ) and a predictable process (An ) such that A0 = 0. Ex-
plicitly, the decomposition is given by
\[ A_n = \sum_{k=1}^{n} E[Y_k - Y_{k-1} \mid \mathcal{F}_{k-1}], \quad \text{and} \quad M_n = Y_n - A_n. \tag{2.2.2} \]
Remark. (1). The increments E[Yk −Yk−1 |Fk−1] of the process (An ) are the predicted
increments of (Yn ) given the previous information.
(2). The process (Yn ) is a supermartingale (resp. a submartingale) if and only if the
predictable part (An ) is decreasing (resp. increasing).
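To see formula (2.2.2) at work, consider $Y_n = S_n^2$ for the simple random walk with $\pm 1$ steps. The predicted increment is $E[(S_{k-1} + \eta_k)^2 - S_{k-1}^2 \mid \mathcal{F}_{k-1}] = 1$, so $A_n = n$ and $M_n = S_n^2 - n$. The sketch below computes the decomposition along a given path by averaging over the two equally likely next steps; the choice of example and all names are illustrative.

```python
from fractions import Fraction


def doob_decomposition(path_steps: tuple, y):
    """Return (A, M) along one path of +-1 steps; y maps a tuple of steps to the value Y_n."""
    A = [Fraction(0)]
    for k in range(1, len(path_steps) + 1):
        prefix = tuple(path_steps[: k - 1])
        # predicted increment E[Y_k - Y_{k-1} | F_{k-1}] for fair +-1 steps
        predicted = Fraction(1, 2) * (y(prefix + (1,)) + y(prefix + (-1,))) - y(prefix)
        A.append(A[-1] + predicted)
    M = [y(tuple(path_steps[:n])) - A[n] for n in range(len(path_steps) + 1)]
    return A, M


def square_of_walk(steps: tuple) -> Fraction:
    """Y_n = S_n^2 for the walk with the given steps."""
    return Fraction(sum(steps)) ** 2


if __name__ == "__main__":
    A, M = doob_decomposition((1, 1, -1, 1), square_of_walk)
    print(A)  # predictable part: A_n = n
    print(M)  # martingale part: M_n = S_n^2 - n
```

Here $(Y_n)$ is a submartingale, and correspondingly the predictable part is increasing, in line with the remark above.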
Existence: Conversely, if (An ) and (Mn ) are defined by (2.2.2) then (An ) is predictable
with A0 = 0 and (Mn ) is a martingale, since
\[ M_n^2 = \widetilde{M}_n + \langle M \rangle_n \qquad \text{for any } n \ge 0. \]
Here we have used in the last step that E[Mk − Mk−1 | Fk−1] vanishes since (Mn ) is a
martingale.
is called the conditional variance process of the square integrable martingale (Mn ).
Example (Random Walks). If $M_n = \sum_{i=1}^{n} \eta_i$ is a sum of independent centered random
variables $\eta_i$ and $\mathcal{F}_n = \sigma(\eta_1, \ldots, \eta_n)$ then the conditional variance process is given by
$\langle M \rangle_n = \sum_{i=1}^{n} \mathrm{Var}[\eta_i]$.
The conditional variance process is crucial for generalizations of classical limit theo-
rems such as the Law of Large Numbers or the Central Limit Theorem from sums of
independent random variables to martingales. A direct consequence of the fact that
$M_n^2 - \langle M \rangle_n$ is a martingale is that
with a random noise term. The solution (Xt )t≥0 of such a stochastic differential equa-
tion (SDE) is a stochastic process in continuous time defined on a probability space
(Ω, A, P ) where also the random variables describing the noise effects are defined. The
vector field b is called the (deterministic) “drift”. We will make sense of general SDE
later, but we can already consider time discretizations.
Here, the values 0 and 1 are just a convenient normalization, but it is an important
assumption that the random variables are independent with finite variances. Given an
initial value $x_0 \in \mathbb{R}$ and a fine discretization step size $h > 0$, we now define a stochastic
process $(X_n^{(h)})$ in discrete time by $X_0^{(h)} = x_0$, and
\[ X_{k+1}^{(h)} - X_k^{(h)} = b(X_k^{(h)}) \cdot h + \sigma(X_k^{(h)}) \sqrt{h} \, \eta_{k+1}, \qquad \text{for } k = 0, 1, 2, \ldots \tag{2.2.5} \]
One should think of $X_k^{(h)}$ as an approximation for the value of the process $(X_t)$ at time
$t = k \cdot h$. The equation (2.2.5) can be rewritten as
\[ X_n^{(h)} = x_0 + \sum_{k=0}^{n-1} b(X_k^{(h)}) \cdot h + \sum_{k=0}^{n-1} \sigma(X_k^{(h)}) \cdot \sqrt{h} \cdot \eta_{k+1}. \tag{2.2.6} \]
To understand the scaling factors $h$ and $\sqrt{h}$ we note first that if $\sigma \equiv 0$ then (2.2.5)
respectively (2.2.6) is the Euler discretization of the ordinary differential equation (2.2.3).
Furthermore, if $b \equiv 0$ and $\sigma \equiv 1$, then the diffusive scaling by a factor $\sqrt{h}$ in the second
term ensures that the continuous time process $X_{\lfloor t/h \rfloor}^{(h)}$, $t \in [0, \infty)$, converges in distribution
as $h \searrow 0$. Indeed, the functional central limit theorem (Donsker’s invariance
principle) states that the limit process in this case is a Brownian motion $(B_t)_{t \in [0,\infty)}$. In
general, (2.2.6) is an Euler discretization of a stochastic differential equation of type
where (Bt )t≥0 is a Brownian motion. Let Fn = σ(η1 , . . . , ηn ) denote the filtration gen-
erated by the random variables ηi . The following exercise summarizes basic properties
of the process X (h) in the case of normally distributed increments.
Exercise. Suppose that the random variables ηi are standard normally distributed.
(1). Prove that the process $X^{(h)}$ is a time-homogeneous $(\mathcal{F}_n)$ Markov chain with
transition kernel
\[ p(x, \bullet) = N(x + b(x) h, \sigma(x)^2 h)[\bullet]. \]
(2). Show that the Doob decomposition $X^{(h)} = M^{(h)} + A^{(h)}$ is given by
\[ A_n^{(h)} = \sum_{k=0}^{n-1} b(X_k^{(h)}) \cdot h, \qquad M_n^{(h)} = x_0 + \sum_{k=0}^{n-1} \sigma(X_k^{(h)}) \sqrt{h} \, \eta_{k+1}, \tag{2.2.7} \]
The last equation can be used in combination with the maximal inequality for mar-
tingales to derive bounds for the processes (X (h) ) in an efficient way, cf. Section 2.4
below.
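The recursion (2.2.5) translates directly into a few lines of code. The following sketch assumes standard normal increments $\eta_k$ and tracks the two sums in (2.2.7) alongside the path; the function name, the seed and the coefficient functions in the usage example are illustrative choices.

```python
import math
import random


def euler_scheme(x0, b, sigma, h, n_steps, rng=None):
    """Simulate X^(h) from (2.2.5); also return the drift part A and the martingale part M."""
    rng = rng or random.Random(0)  # fixed seed so that runs are reproducible
    x, a, m = x0, 0.0, x0
    xs, As, Ms = [x], [a], [m]
    for _ in range(n_steps):
        drift = b(x) * h                                       # b(X_k) * h
        noise = sigma(x) * math.sqrt(h) * rng.gauss(0.0, 1.0)  # sigma(X_k) * sqrt(h) * eta_{k+1}
        x += drift + noise
        a += drift
        m += noise
        xs.append(x)
        As.append(a)
        Ms.append(m)
    return xs, As, Ms


if __name__ == "__main__":
    # sigma = 0 reduces the scheme to the Euler method for x' = -x
    xs, _, _ = euler_scheme(1.0, lambda x: -x, lambda x: 0.0, 0.01, 100)
    print(xs[-1], math.exp(-1.0))
    # with noise, the path splits as X = M + A
    xs, As, Ms = euler_scheme(0.5, lambda x: -x, lambda x: 0.5, 0.01, 200)
    print(xs[-1], As[-1] + Ms[-1])
```

With $\sigma \equiv 0$ the output is the deterministic Euler approximation $x_0 (1 - h)^n$ of $e^{-t}$ at $t = nh$, which illustrates the first remark on the scaling factors.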
It is easy to verify that $M_n^2 - [M]_n$ is again a martingale. However, $[M]_n$ is not predictable.
For continuous martingales in continuous time, the quadratic variation and the
conditional variance process coincide. In discrete time or for discontinuous martingales
they are usually different.
Martingale problem
for any function f on the state space such that f (Xn ) is integrable for each n. Compu-
tation of the predictable part leads to the following general result:
(1). (Xn ) is a time homogeneous (Fn ) Markov chain with transition kernel p.
(2). $(X_n)$ is a solution of the martingale problem for the operator $\mathcal{L} = p - I$, i.e.,
there is a decomposition
\[ f(X_n) = M_n^{[f]} + \sum_{k=0}^{n-1} (\mathcal{L} f)(X_k), \qquad n \ge 0, \]
with an $(\mathcal{F}_n)$ martingale $(M_n^{[f]})$ for every function $f : S \to \mathbb{R}$ such that $f(X_n)$ is
integrable for each $n$, or, equivalently, for every bounded function $f : S \to \mathbb{R}$.
Proof. The implication “(1)⇒(2)” is just the Doob decomposition for $f(X_n)$. In fact,
by Theorem 2.4, the predictable part is given by
\[ \begin{aligned}
A_n^{[f]} &= \sum_{k=0}^{n-1} E[f(X_{k+1}) - f(X_k) \mid \mathcal{F}_k] \\
&= \sum_{k=0}^{n-1} \bigl( pf(X_k) - f(X_k) \bigr) = \sum_{k=0}^{n-1} (\mathcal{L} f)(X_k),
\end{aligned} \]
and $M_n^{[f]} = f(X_n) - A_n^{[f]}$ is a martingale.
To prove the converse implication “(2)⇒(1)” suppose that $M_n^{[f]}$ is a martingale for any
bounded $f : S \to \mathbb{R}$. Then
\[ \begin{aligned}
0 &= E[M_{n+1}^{[f]} - M_n^{[f]} \mid \mathcal{F}_n] \\
&= E[f(X_{n+1}) - f(X_n) \mid \mathcal{F}_n] - \bigl( (pf)(X_n) - f(X_n) \bigr) \\
&= E[f(X_{n+1}) \mid \mathcal{F}_n] - (pf)(X_n)
\end{aligned} \]
almost surely for any bounded function f . Hence (Xn ) is an (Fn ) Markov chain with
transition kernel p.
Example (One dimensional Markov chains). Suppose that under Px , the process (Xn )
is a time homogeneous Markov chain with state space S = R or S = Z, initial state
X0 = x, and transition kernel p. Assuming Xn ∈ L2 (Ω, A, P ) for each n, we define the
“drift” and the “fluctuations” of the process by
\[ b(x) := E_x[X_1 - X_0], \qquad a(x) = \mathrm{Var}_x[X_1 - X_0]. \]
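\[ \langle M \rangle_n = \sum_{k=0}^{n-1} \mathrm{Var}[M_{k+1} - M_k \mid \mathcal{F}_k] = \sum_{k=0}^{n-1} \mathrm{Var}[X_{k+1} - X_k \mid \mathcal{F}_k] = \sum_{k=0}^{n-1} a(X_k). \]
Therefore
\[ M_n^2 = \widetilde{M}_n + \sum_{k=0}^{n-1} a(X_k) \tag{2.2.12} \]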
Martingale transforms
Suppose that (Mn )n≥0 is a martingale w.r.t. (Fn ), and (Cn )n∈N is a predictable sequence
of real-valued random variables. For example, we may think of Cn as the stake in the
n-th round of a fair game, and of the martingale increment Mn − Mn−1 as the net gain
(resp. loss) per unit stake. In this case, the capital In of a player with gambling strategy
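$(C_n)$ after $n$ rounds is given recursively by
\[ I_n = I_{n-1} + C_n \cdot (M_n - M_{n-1}) \qquad \text{for any } n \in \mathbb{N}, \]
i.e.,
\[ I_n = I_0 + \sum_{k=1}^{n} C_k \cdot (M_k - M_{k-1}). \]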
is called the martingale transform of the martingale (Mn )n≥0 w.r.t. the predictable
sequence (Cn )n≥1 , or the discrete stochastic integral of C w.r.t. M.
We will see later that the process $C_\bullet M$ is a time-discrete version of the stochastic integral
$\int C_s \, dM_s$ of a predictable continuous-time process $C$ w.r.t. a continuous-time martingale
$M$. To be precise, $(C_\bullet M)_n$ coincides with the Itô integral $\int_0^n C_{\lceil t \rceil} \, dM_{\lfloor t \rfloor}$ of the
left continuous jump process $t \mapsto C_{\lceil t \rceil}$ w.r.t. the right continuous martingale $t \mapsto M_{\lfloor t \rfloor}$.
Example (Martingale strategy). One origin of the word “martingale” is the name of
a well-known gambling strategy: In a standard coin-tossing game, the stake is doubled
each time a loss occurs, and the player stops the game after the first time he wins. If the
net gain in n rounds with unit stake is given by a standard Random Walk
Clearly, with probability one, the game terminates in finite time, and at that time the
player has always won one unit, i.e.,
[Figure: a sample path of $(C_\bullet M)_n$ for the martingale strategy]
At first glance this looks like a safe winning strategy, but of course this would only be
the case if the player had unlimited capital and time available.
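The distribution of the gain after $n$ rounds can be computed exactly by enumerating all coin sequences. The sketch below assumes an initial stake of one unit, so that the stake in round $k$ is $2^{k-1}$ as long as no win has occurred; the function names are illustrative.

```python
from fractions import Fraction
from itertools import product


def doubling_gain(steps: tuple) -> int:
    """Gain (C.M)_n: the stake starts at 1, doubles after each loss, play stops after the first win."""
    gain, stake = 0, 1
    for step in steps:
        gain += stake * step
        if step == 1:
            break  # the player stops after the first win
        stake *= 2
    return gain


def gain_distribution(n: int) -> dict:
    """Exact distribution of the gain after n fair coin tosses."""
    dist = {}
    for steps in product((-1, 1), repeat=n):
        g = doubling_gain(steps)
        dist[g] = dist.get(g, Fraction(0)) + Fraction(1, 2 ** n)
    return dist


if __name__ == "__main__":
    dist = gain_distribution(5)
    print(dist)
    print(sum(g * p for g, p in dist.items()))  # expected gain
```

After $n$ rounds the gain is $+1$ with probability $1 - 2^{-n}$ and $1 - 2^n$ with probability $2^{-n}$, so the expected gain is zero for every finite $n$ even though the gain equals $1$ in the limit.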
Theorem 2.6 (You can’t beat the system!). (1). If (Mn )n≥0 is an (Fn ) martingale,
and (Cn )n≥1 is predictable with Cn · (Mn − Mn−1 ) ∈ L1 (Ω, A, P ) for any n ≥ 1,
then C• M is again an (Fn ) martingale.
(2). If (Mn ) is an (Fn ) supermartingale and (Cn )n≥1 is non-negative and predictable
with Cn · (Mn − Mn−1 ) ∈ L1 for any n, then C• M is again a supermartingale.
This proves the first part of the claim. The proof of the second part is similar.
The theorem shows that a fair game (a martingale) can not be transformed by choice of
a clever gambling strategy into an unfair (or “superfair”) game. In models of financial
markets this fact is crucial to exclude the existence of arbitrage possibilities (riskless
profit).
Example (Martingale strategy, cont.). For the classical martingale strategy, we obtain
This is a classical example showing that the assertion of the dominated convergence
theorem may not hold if the assumptions are violated.
Remark. The integrability assumption in Theorem 2.6 is always satisfied if the random
variables Cn are bounded, or if both Cn and Mn are square-integrable for any n.
Example (Financial market model with one risky asset). Suppose that during each
time interval (n − 1, n), an investor is holding Φn units of an asset with price Sn per
unit at time n. We assume that (Sn ) is an adapted and (Φn ) is a predictable stochastic
process w.r.t. a filtration (Fn ). If the investor always puts his remaining capital onto
a bank account with guaranteed interest rate r (“riskless asset”) then the change of his
capital Vn during the time interval (n − 1, n) is given by
Considering the discounted quantity $\widetilde{V}_n = V_n / (1 + r)^n$, we obtain the equivalent
recursion
\[ \widetilde{V}_n = \widetilde{V}_{n-1} + \Phi_n \cdot (\widetilde{S}_n - \widetilde{S}_{n-1}) \qquad \text{for any } n \ge 1. \tag{2.3.2} \]
\[ \widetilde{V}_n = V_0 + (\Phi_\bullet \widetilde{S})_n. \]
By Theorem 2.6, we can conclude that, if the discounted price process $(\widetilde{S}_n)$ is an $(\mathcal{F}_n)$
martingale w.r.t. a given probability measure, then $(\widetilde{V}_n)$ is a martingale as well. In this
case, assuming that $V_0$ is constant, we obtain in particular
\[ E[\widetilde{V}_n] = V_0, \]
or, equivalently,
\[ E[V_n] = (1 + r)^n V_0 \qquad \text{for any } n \ge 0. \tag{2.3.3} \]
This fact, together with the existence of a martingale measure, can now be used for
option pricing under a no-arbitrage assumption. To this end we assume that the payoff
of an option at time N is given by an (FN )-measurable random variable F . For example,
the payoff of a European call option with strike price K based on the asset with price
process (Sn ) is SN − K if the price SN at maturity exceeds K, and 0 otherwise, i.e.,
F = (SN − K)+ .
Suppose further that the option can be replicated by a hedging strategy (Φn ), i.e., there
exists an F0 -measurable random variable V0 and a predictable sequence of random vari-
ables (Φn )1≤n≤N such that
F = VN
is the value at time N of a portfolio with initial value V0 w.r.t. the trading strategy (Φn ).
Then, assuming the non-existence of arbitrage possibilities, the option price at time
0 has to be V0 , since otherwise one could construct an arbitrage strategy by selling
the option and investing money in the stock market with strategy (Φn ), or conversely.
Therefore, if a martingale measure exists (i.e., an underlying probability measure such
that the discounted stock price $(\widetilde{S}_n)$ is a martingale), then the no-arbitrage price of the
option at time 0 can be computed by (2.3.3) where the expectation is taken w.r.t. the
martingale measure.
The following exercise shows how this works out in the Cox-Ross-Rubinstein binomial
model:
Exercise (No-Arbitrage Pricing in the CRR model). Consider the CRR binomial
model, i.e., $\Omega = \{1 + a, 1 + b\}^N$ with $-1 < a < r < b < \infty$, $X_i(\omega_1, \ldots, \omega_N) = \omega_i$,
$\mathcal{F}_n = \sigma(X_1, \ldots, X_n)$, and
\[ S_n = S_0 \cdot \prod_{i=1}^{n} X_i, \qquad n = 0, 1, \ldots, N, \]
where $S_0$ is a constant.
(1). Completeness of the CRR model: Prove that for any function $F : \Omega \to \mathbb{R}$ there
exists a constant $V_0$ and a predictable sequence $(\Phi_n)_{1 \le n \le N}$ such that $F = V_N$
where $(V_n)_{1 \le n \le N}$ is defined by (2.3.1), or, equivalently,
\[ \frac{F}{(1 + r)^N} = \widetilde{V}_N = V_0 + (\Phi_\bullet \widetilde{S})_N. \]
Hence in the CRR model, any FN -measurable function F can be replicated by
a predictable trading strategy. Market models with this property are called com-
plete.
(2). Option pricing: Derive a general formula for the no-arbitrage price of an option
with payoff function F : Ω → R in the CRR model. Compute the no-arbitrage
price for a European call option with maturity N and strike K explicitly.
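As a numerical companion to part (2), here is one way the price could be evaluated. It is a sketch under the assumption that, under the martingale measure, the $X_i$ are independent with $P[X_i = 1 + b] = p^* = (r - a)/(b - a)$, which is the value that makes $E[X_i] = 1 + r$; the price is then the discounted expected payoff as in (2.3.3). The function name and the parameter values in the usage example are illustrative.

```python
from math import comb


def crr_call_price(S0: float, K: float, a: float, b: float, r: float, N: int) -> float:
    """Discounted expected payoff (S_N - K)^+ under the assumed martingale measure."""
    p = (r - a) / (b - a)  # risk-neutral probability of the factor 1 + b
    expected_payoff = 0.0
    for k in range(N + 1):  # k = number of factors 1 + b among X_1, ..., X_N
        S_N = S0 * (1 + b) ** k * (1 + a) ** (N - k)
        expected_payoff += comb(N, k) * p ** k * (1 - p) ** (N - k) * max(S_N - K, 0.0)
    return expected_payoff / (1 + r) ** N


if __name__ == "__main__":
    print(crr_call_price(100.0, 100.0, -0.1, 0.2, 0.05, 1))   # one-period call
    print(crr_call_price(100.0, 0.0, -0.1, 0.2, 0.05, 5))     # strike 0: the price is S0
```

For strike $K = 0$ the payoff is $S_N$ itself, and the computed price equals $S_0$, which is just the martingale property of the discounted price process.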
Stopped Martingales
One possible strategy for controlling a fair game is to terminate the game at a time
depending on the previous development. Recall that a random variable T : Ω →
{0, 1, 2, . . .} ∪ {∞} is called a stopping time w.r.t. the filtration (Fn ) if and only if
the event {T = n} is contained in Fn for any n ≥ 0, or equivalently, iff {T ≤ n} ∈ Fn
for any n ≥ 0.
SB = min{n ≥ 1 : Xn ∈ B}
If one decides to sell an asset as soon as the price Sn exceeds a given level λ > 0
then the selling time equals T(λ,∞) and is hence a stopping time.
For example, the process stopped at a hitting time TB gets stuck at the first time it enters
the set B.
i.e., we put a unit stake in each round before time T and quit playing at time T . Since
T is a stopping time, the sequence (Cn ) is predictable. Moreover,
and (2.3.4) follows by summing over n. Since the sequence (Cn ) is predictable, bounded
and non-negative, the process C• M is a martingale, supermartingale respectively, pro-
vided the same holds for M.
Suppose for example that (Mn ) is the classical Random Walk starting at 0 and
T = T{1} is the first hitting time of the point 1. Then, by recurrence of the
Random Walk, T < ∞ and MT = 1 hold almost surely although M0 = 0.
(2). If, on the other hand, T is a bounded stopping time, then there exists n ∈ N such
that T (ω) ≤ n for any ω. In this case, the optional stopping theorem implies
\[ E[M_T] = E[M_{T \wedge n}] \overset{(\le)}{=} E[M_0]. \]
More general sufficient conditions for (2.3.5) are given in Theorems 2.8, 2.9 and 2.10
below.
Example (Classical ruin problem). Let a, b, x ∈ Z with a < x < b. We consider the
classical Random Walk
\[ X_n = x + \sum_{i=1}^{n} \eta_i, \qquad \eta_i \text{ i.i.d. with } P[\eta_i = \pm 1] = \frac{1}{2}, \]
with initial value X0 = x. We now show how to apply the Optional Stopping Theorem
to compute the distributions of the exit time
and the exit point XT . These distributions can also be computed by more traditional
methods (first step analysis, reflection principle), but martingales yield an elegant and
general approach.
\[ x = E[X_0] = E[X_{T \wedge n}] \xrightarrow{n \to \infty} E[X_T] = a \cdot r(x) + b \cdot (1 - r(x)) \]
Therefore, by (2.3.6),
\[ E[T_b] \ge \lim_{a \to -\infty} (b - x) \cdot (x - a) = \infty, \]
i.e., Tb is not integrable! These and some other related passage times are im-
portant examples of random variables with a heavy-tailed distribution and infinite
first moment.
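The martingale computations for the ruin problem can be cross-checked by first step analysis, the more traditional method mentioned above. The sketch below solves $u(x) = \frac{1}{2}(u(x-1) + u(x+1)) + g$ on $a < x < b$ exactly, once with $g = 0$ and boundary values $u(a) = 1$, $u(b) = 0$ for $r(x) = P[X_T = a]$, and once with $g = 1$ and zero boundary values for $E[T]$. The difference equations and the helper name are additions for illustration and are not taken from the text.

```python
from fractions import Fraction


def solve_walk_equation(a: int, b: int, g, left, right) -> list:
    """Solve u(x) = (u(x-1) + u(x+1)) / 2 + g for a < x < b with u(a) = left, u(b) = right."""
    def shoot(theta):
        # propagate u(x+1) = 2 u(x) - u(x-1) - 2 g from u(a) = left, u(a+1) = theta
        u = [Fraction(left), Fraction(theta)]
        for _ in range(b - a - 1):
            u.append(2 * u[-1] - u[-2] - 2 * g)
        return u

    u0, u1 = shoot(0), shoot(1)
    # u(b) depends affinely on theta; pick theta so that u(b) = right
    theta = (right - u0[-1]) / (u1[-1] - u0[-1])
    return shoot(theta)


if __name__ == "__main__":
    a, b, x = -3, 5, 1
    ruin = solve_walk_equation(a, b, 0, 1, 0)[x - a]
    mean_exit = solve_walk_equation(a, b, 1, 0, 0)[x - a]
    print(ruin, Fraction(b - x, b - a))   # r(x) = (b - x) / (b - a)
    print(mean_exit, (b - x) * (x - a))   # E[T] = (b - x)(x - a)
```

Letting $a \to -\infty$ in the second formula reproduces the divergence of $E[T_b]$ noted above.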
Taking the limit as λ ց 0, we see that P [T < ∞] = 1. Taking this into account,
and substituting s = 1/ cosh λ in (2.3.9), we can now compute the generating
function of T explicitly:
\[ E[s^T] = e^{-\lambda} = \left( 1 - \sqrt{1 - s^2} \right) / s \qquad \text{for any } s \in (0, 1). \tag{2.3.10} \]
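Formula (2.3.10) can be compared with a direct computation of the first passage probabilities. The sketch below assumes, as the formula suggests, that $T$ is the first hitting time of the point 1 for the classical Random Walk started at 0; it propagates the mass of the paths that have not yet reached 1 and sums $s^n \cdot P[T = n]$ up to a cut-off. The function name and the cut-off are illustrative.

```python
import math


def hitting_time_pgf(s: float, max_steps: int = 400) -> float:
    """Truncated sum of s^n * P[T = n], T = first hitting time of 1 for the walk started at 0."""
    alive = {0: 1.0}  # position -> probability mass of paths that have not hit 1 yet
    total = 0.0
    for n in range(1, max_steps + 1):
        nxt = {}
        hit = 0.0
        for pos, mass in alive.items():
            for step in (-1, 1):
                new = pos + step
                if new == 1:
                    hit += 0.5 * mass  # first visit to 1 happens at time n
                else:
                    nxt[new] = nxt.get(new, 0.0) + 0.5 * mass
        total += hit * s ** n
        alive = nxt
    return total


if __name__ == "__main__":
    s = 0.9
    print(hitting_time_pgf(s), (1 - math.sqrt(1 - s * s)) / s)
```

For $s = 1$ the truncated sum is $P[T \le 400]$, which is still noticeably below one; this slow convergence is another sign of the heavy tail of $T$.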
Theorem 2.8 (Optional Stopping Theorem, Version 2). Suppose that (Mn ) is a mar-
tingale w.r.t. (Fn ), T is an (Fn )-stopping time with P [T < ∞] = 1, and there exists a
random variable Y ∈ L1 (Ω, A, P ) such that
Then
E[MT ] = E[M0 ].
For non-negative supermartingales, we can apply Fatou’s Lemma instead of the Domi-
nated Convergence Theorem to pass to the limit as n → ∞ in the Stopping Theorem.
The advantage is that no integrability assumption is required. Of course, the price to
pay is that we only obtain an inequality:
Example (Dirichlet problem for Markov chains). Suppose that w.r.t. the probability
measure Px , the process (Xn ) is a time-homogeneous Markov chain with measurable
state space (S, B), transition kernel p, and start in x. Let D ∈ B be a measurable
subset of the state space, and $f : D^C \to \mathbb{R}$ a measurable function (the given “boundary
values”), and let
\[ T = \min\{ n \ge 0 : X_n \in D^C \} \]
denote the first exit time of the Markov chain from D. By conditioning on the first
step of the Markov chain, one can show that if f is non-negative or bounded, then the
function
h(x) = Ex [f (XT ) ; T < ∞], (x ∈ S),
is a solution of the Dirichlet problem
DC
(1). Prove that h(XT ∧n ) is a martingale w.r.t. Px for any bounded solution h of the
Dirichlet problem and any x ∈ S.
(3). Similarly, show that for any non-negative f , the function h defined by (2.3.11) is
the minimal non-negative solution of the Dirichlet problem.
We finally state a version of the Optional Stopping Theorem that applies in particular to
martingales with bounded increments:
Corollary 2.10 (Optional Stopping for martingales with bounded increments). Sup-
pose that (Mn ) is an (Fn ) martingale, and there exists a finite constant K ∈ (0, ∞) such
that
Then for any (Fn ) stopping time T with E[T ] < ∞, we have
E[MT ] = E[M0 ].
Let Y denote the expression on the right hand side. We will show that Y is an integrable
random variable – this implies the assertion by Theorem 2.8. To verify integrability of
Y note that the event {T > i} is contained in Fi for any i ≥ 0 since T is a stopping
time. Therefore and by (2.3.12),
by the assumptions.
Exercise (Integrability of stopping times). Prove that the expectation E[T ] of a stop-
ping time T is finite if there exist constants ε > 0 and k ∈ N such that
Mn = Sn − n · m
Theorem 2.11 (Wald’s identity). Suppose that T is an (Fn ) stopping time with E[T ] <
∞. Then
E[ST ] = m · E[T ].
by the independence of the ηi . As the ηi are identically distributed and integrable, the
right hand side is a finite constant. Hence Corollary 2.10 applies, and we obtain
for any c ∈ N. In combination with the Markov-Čebyšev inequality this can be used to
control the running maximum of the Random Walk in terms of the moments of the last
value Sn .
Maximal inequalities are corresponding estimates for $\max(M_0, M_1, \ldots, M_n)$ or $\sup_{k \ge 0} M_k$
when $(M_n)$ is a sub- or supermartingale respectively. These estimates are an important
tool in stochastic analysis. They are a consequence of the Optional Stopping Theorem.
Doob’s inequality
We first prove the basic version of maximal inequalities for sub- and supermartingales:
Note that $T_c < \infty$ whenever $\sup M_k > c$. Hence by the version of the Optional
Stopping Theorem for non-negative supermartingales, we obtain
\[ P[\sup M_k > c] \le P[T_c < \infty] \le \frac{1}{c} E[M_{T_c} ; T_c < \infty] \le \frac{1}{c} E[M_0]. \]
Here we have used in the second and third step that $(M_n)$ is non-negative. Replacing
$c$ by $c - \varepsilon$ and letting $\varepsilon$ tend to zero we can conclude
\[ P[\sup M_k \ge c] = \lim_{\varepsilon \searrow 0} P[\sup M_k > c - \varepsilon] \le \liminf_{\varepsilon \searrow 0} \frac{1}{c - \varepsilon} E[M_0] = \frac{1}{c} \cdot E[M_0]. \]
Corollary 2.13. (1). Suppose that (Mn )n≥0 is an arbitrary submartingale (not neces-
sarily non-negative!). Then
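\[ P\left[ \max_{k \le n} M_k \ge c \right] \le \frac{1}{c} E\left[ M_n^+ ; \max_{k \le n} M_k \ge c \right] \qquad \text{for any } c > 0, \text{ and} \]
\[ P\left[ \max_{k \le n} M_k \ge c \right] \le e^{-\lambda c} E\left[ e^{\lambda M_n} ; \max_{k \le n} M_k \ge c \right] \qquad \text{for any } \lambda, c > 0. \]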
Proof. The corollary follows by applying the maximal inequality to the non-negative
submartingales $M_n^+$, $\exp(\lambda M_n)$, $|M_n|^p$ respectively. These processes are indeed
submartingales, as the functions $x \mapsto x^+$ and $x \mapsto \exp(\lambda x)$ are convex and non-decreasing
for any $\lambda > 0$, and the functions $x \mapsto |x|^p$ are convex for any $p \ge 1$.
Lp inequalities
The last estimate in Corollary 2.13 can be used to bound the Lp norm of the running
maximum of a martingale in terms of the Lp -norm of the last value. The resulting bound,
known as Doob’s Lp -inequality, is crucial for stochastic analysis. We first remark:
Lemma 2.14. If $Y : \Omega \to \mathbb{R}_+$ is a non-negative random variable, and $G(y) = \int_0^y g(x) \, dx$
is the integral of a non-negative function $g : \mathbb{R}_+ \to \mathbb{R}_+$, then
\[ E[G(Y)] = \int_0^\infty g(c) \cdot P[Y \ge c] \, dc. \]
Theorem 2.15 (Doob’s Lp inequality). Suppose that (Mn )n≥0 is a martingale, and let
Then, for any $p, q \in (1, \infty)$ such that $\frac{1}{p} + \frac{1}{q} = 1$, we have
Proof. By Lemma 2.14, Corollary 2.13 applied to the martingales Mn and (−Mn ), and
Fubini’s theorem, we have
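\[ \begin{aligned}
E[(M_n^*)^p] &\overset{2.14}{=} \int_0^\infty p c^{p-1} \cdot P[M_n^* \ge c] \, dc \\
&\overset{2.13}{\le} \int_0^\infty p c^{p-2} E[|M_n| ; M_n^* \ge c] \, dc \\
&\overset{\text{Fub.}}{=} E\left[ |M_n| \cdot \int_0^{M_n^*} p c^{p-2} \, dc \right] \\
&= \frac{p}{p-1} E[|M_n| \cdot (M_n^*)^{p-1}]
\end{aligned} \]
for any $n \ge 0$ and $p \in (1, \infty)$. Setting $q = \frac{p}{p-1}$ and applying Hölder’s inequality to the
right hand side, we obtain
\[ E[(M_n^*)^p] \le q \cdot \|M_n\|_{L^p} \cdot \|(M_n^*)^{p-1}\|_{L^q} = q \cdot \|M_n\|_{L^p} \cdot E[(M_n^*)^p]^{1/q}, \]
i.e.,
\[ \|M_n^*\|_{L^p} = E[(M_n^*)^p]^{1 - 1/q} \le q \cdot \|M_n\|_{L^p}. \tag{2.4.1} \]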
This proves the first inequality. The second inequality follows as n → ∞, since
\[ \|M^*\|_{L^p} = \Big\|\lim_{n \to \infty} M_n^*\Big\|_{L^p} = \liminf_{n \to \infty} \|M_n^*\|_{L^p} \le q \cdot \sup_{n \in \mathbb{N}} \|M_n\|_{L^p} \]
by Fatou’s Lemma.
Hoeffding’s inequality
For a standard Random Walk (Sn ) starting at 0, the reflection principle combined with
Bernstein’s inequality implies the upper bound
for any n ∈ N and c ∈ (0, ∞). A similar inequality holds for arbitrary martingales with
bounded increments:
Theorem 2.16 (Azuma, Hoeffding). Suppose that (Mn ) is a martingale such that
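\[ e^{\lambda Y_n} \le \frac{1}{2}\,\frac{a_n - Y_n}{a_n}\, e^{-\lambda a_n} + \frac{1}{2}\,\frac{a_n + Y_n}{a_n}\, e^{\lambda a_n} \]
\[ E[Y_n \mid \mathcal{F}_{n-1}] = 0, \]
and therefore
\[ E[e^{\lambda Y_n} \mid \mathcal{F}_{n-1}] \le \big(e^{-\lambda a_n} + e^{\lambda a_n}\big)/2 = \cosh(\lambda a_n) \le e^{(\lambda a_n)^2/2}. \]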
Hence, by induction on n,
\[ E[e^{\lambda M_n}] \le \exp\Big(\frac{1}{2}\,\lambda^2 \sum_{i=1}^n a_i^2\Big) \quad \text{for any } n \in \mathbb{N}, \tag{2.4.4} \]
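The two elementary steps behind (2.4.4) can be written out as follows. This is a sketch under two assumptions that are not visible in the surviving text: $Y_n = M_n - M_{n-1}$ denotes the martingale increment, and $M_0 = 0$.

```latex
\begin{align*}
\cosh(x) &= \sum_{k=0}^\infty \frac{x^{2k}}{(2k)!}
          \le \sum_{k=0}^\infty \frac{x^{2k}}{2^k\,k!} = e^{x^2/2},
          \qquad \text{since } (2k)! \ge 2^k\,k! , \\
E[e^{\lambda M_n}] &= E\big[e^{\lambda M_{n-1}}\, E[e^{\lambda Y_n} \mid \mathcal{F}_{n-1}]\big]
          \le e^{(\lambda a_n)^2/2}\, E[e^{\lambda M_{n-1}}] .
\end{align*}
```

Iterating the second line $n$ times produces the product of the factors $e^{(\lambda a_i)^2/2}$, which is the right hand side of (2.4.4).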
\[ N = \sum_{i=0}^{n-l} I_{\{X_{i+1} = a_1,\, X_{i+2} = a_2,\, \ldots,\, X_{i+l} = a_l\}} \tag{2.4.6} \]
\[ E[N] = \sum_{i=0}^{n-l} P[X_{i+k} = a_k \text{ for } k = 1, \ldots, l] = (n - l + 1)/|S|^l. \tag{2.4.7} \]
To estimate the fluctuations of the random variable N around its mean value, we con-
sider the martingale
with initial value M0 = E[N] and terminal value Mn = N. Since at most l of the
summands in (2.4.6) are not independent of i, and each summand takes values 0 and 1
only, we have
|Mi − Mi−1 | ≤ l for each i = 0, 1, . . . , n.
The equation (2.4.7) and the bound (2.4.8) show that N is highly concentrated around its mean if l is small compared to $\sqrt{n}$.
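The counting variable N from (2.4.6) and its mean (2.4.7) can be explored numerically. The following is a minimal sketch, assuming that the letters $X_i$ are i.i.d. and uniformly distributed on a finite alphabet S (which is what the formula (2.4.7) suggests); the function names are hypothetical.

```python
import random

def count_pattern(seq, pattern):
    """Number N of positions i with seq[i:i+l] == pattern, as in (2.4.6)."""
    l = len(pattern)
    return sum(1 for i in range(len(seq) - l + 1) if seq[i:i + l] == pattern)

def expected_count(n, l, alphabet_size):
    """E[N] = (n - l + 1) / |S|^l for i.i.d. uniform letters, as in (2.4.7)."""
    return (n - l + 1) / alphabet_size ** l

def simulate_mean_count(n, pattern, alphabet, n_samples=2000, seed=0):
    """Empirical mean of N over independent random sequences of length n."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_samples):
        seq = [rng.choice(alphabet) for _ in range(n)]
        total += count_pattern(seq, pattern)
    return total / n_samples

if __name__ == "__main__":
    alphabet = ["A", "C", "G", "T"]
    print(expected_count(200, 2, len(alphabet)))
    print(simulate_mean_count(200, ["A", "C"], alphabet))
```

Sequences and patterns are passed as lists so that slices can be compared directly. The empirical mean should be close to the exact value, and the spread of N around it is what the concentration bound controls.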
The notion of a martingale, sub- and supermartingale in continuous time can be defined
similarly as in the discrete parameter case. Fundamental results such as the optional
stopping theorem or the maximal inequality carry over from discrete parameter to con-
tinuous time martingales under additional regularity conditions as, for example, conti-
nuity of the sample paths. Similarly as for Markov chains in discrete time, martingale
methods can be applied to derive explicit expressions and bounds for probabilities and
expectations of Brownian motion in a clear and efficient way.
We start with the definition of martingales in continuous time. Let (Ω, A, P ) denote a
probability space.
CHAPTER 3. MARTINGALES IN CONTINUOUS TIME
However, not every hitting time that we are interested in is a stopping time w.r.t. this
filtration. For example, for one-dimensional Brownian motion (Bt ), the first hitting
time T = inf{t ≥ 0 : Bt > c} of the open interval (c, ∞) is not an (FtB ) stopping
time. An intuitive explanation for this fact is that for t ≥ 0, the event {T ≤ t} is not
contained in FtB , since for a path with Bs ≤ c on [0, t] and Bt = c, we can not decide
at time t, if the path will enter the interval (c, ∞) in the next instant. For this and other
reasons, we also consider the right-continuous filtration
\[ \mathcal{F}_t := \bigcap_{\varepsilon > 0} \mathcal{F}^B_{t+\varepsilon}, \qquad t \ge 0, \]
Exercise (Hitting times as stopping times). Prove that the first hitting time TA =
inf{t ≥ 0 : Bt ∈ A} of a set A ⊆ Rd is an (FtB ) stopping time if A is closed, whereas
TA is an (Ft ) stopping time, but not necessarily an (FtB ) stopping time if A is open.
It is easy to verify that a d-dimensional Brownian motion (Bt ) is also a Brownian motion
w.r.t. the right-continuous filtration (Ft ):
AP = {A ⊆ Ω : ∃A1 , A2 ∈ A : A1 ⊆ A ⊆ A2 , P [A2 \ A1 ] = 0}
It can be shown that the completion (FtP ) of the right-continuous filtration (Ft ) is again
right-continuous. The assertion of Lemma 3.1 obviously carries over to the completed
filtration.
Brownian Martingales
We now identify some basic martingales of Brownian motion:
Proof. We only prove the second assertion for d = 1 and the right-continuous filtration
(Ft ). The verification of the remaining statements is left as an exercise.
For d = 1, since Bt is normally distributed, the Ft -measurable random variable Bt2 − t
is integrable for any t. Moreover, by Lemma 3.1,
\[ B_t^2 = M_t + t \]
of the submartingale $(B_t^2)$ into a martingale $(M_t)$ and the continuous increasing adapted process $\langle B \rangle_t = t$.
A Doob decomposition of the process f (Bt ) for general functions f ∈ C 2 (R) will be
obtained below as a consequence of Itô’s celebrated formula. It states that
\[ f(B_t) - f(B_0) = \int_0^t f'(B_s)\, dB_s + \frac{1}{2} \int_0^t f''(B_s)\, ds \tag{3.1.2} \]
where the first integral is an Itô stochastic integral, cf. Section 6.3. If, for example, f ′ is
bounded, then the Itô integral is a martingale as a function of t. If f is convex then f (Bt )
is a submartingale and the second integral is a continuous increasing adapted process in
t. It is a consequence of (3.1.2) that Brownian motion solves the martingale problem for
the operator L f = f ′′ /2 with domain Dom(L ) = {f ∈ C 2 (R) : f ′ bounded}.
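For $f(x) = x^2$ the formula (3.1.2) reads $B_t^2 = 2\int_0^t B_s\,dB_s + t$, which can be checked on a simulated path. The sketch below is an illustration only: it replaces the Itô integral by left-point Riemann sums on an equidistant grid, and the step number and seed are arbitrary choices.

```python
import math
import random

def ito_check(n_steps=20000, t=1.0, seed=0):
    """Compare B_t^2 with 2 * sum(B_s * dB_s) + t along one simulated path."""
    rng = random.Random(seed)
    dt = t / n_steps
    b = 0.0
    ito_integral = 0.0
    for _ in range(n_steps):
        db = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over one step
        ito_integral += 2.0 * b * db        # f'(B_s) evaluated at the left endpoint
        b += db
    drift = t                               # (1/2) * integral of f''(B_s) ds with f'' = 2
    return b * b, ito_integral + drift

if __name__ == "__main__":
    lhs, rhs = ito_check()
    print(lhs, rhs)
```

The two returned numbers differ exactly by the sum of the squared increments minus t, so their agreement reflects the fact that the quadratic variation of Brownian motion on [0, t] equals t.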
Itô’s formula (3.1.2) can also be extended to the multi-dimensional case, see Section 6.4 below. The second derivative is then replaced by the Laplacian $\Delta f = \sum_{i=1}^d \frac{\partial^2 f}{\partial x_i^2}$.
The multi-dimensional Itô formula implies that a sub- or superharmonic function of d-
dimensional Brownian motion is a sub- or supermartingale respectively, if appropriate
integrability conditions hold. We now give a direct proof of this fact by the mean value
property:
Lemma 3.3 (Mean value property for harmonic function in Rd ). Suppose that h ∈
C 2 (Rd ) is a (super-)harmonic function, i.e.,
\[ \Delta h(x) \overset{(\le)}{=} 0 \quad \text{for any } x \in \mathbb{R}^d. \]
Proof. By the classical mean value property, $h(x)$ is equal to (resp. greater or equal than) the average value $\fint_{\partial B_r(x)} h$ of $h$ on any sphere $\partial B_r(x)$ with center at $x$ and radius $r > 0$, cf. e.g. [XXXKönigsberger: Analysis II]. Moreover, if $\mu$ is a rotationally invariant probability measure then the integral in (3.1.3) is an average of average values over spheres:
\[ \int h(x + y)\, \mu(dy) = \int \fint_{\partial B_r(x)} h \;\, \mu_R(dr) \overset{(\le)}{=} h(x), \]
Theorem 3.5 (Optional Sampling Theorem). Suppose that (Mt )t∈[0,∞] is a martingale
w.r.t. an arbitrary filtration $(\mathcal{F}_t)$ such that $t \mapsto M_t(\omega)$ is continuous for P-almost every $\omega$. Then
E[MT | FS ] = MS P -almost surely (3.2.1)
We point out that an additional assumption on the filtration (e.g. right-continuity) is not
required in the theorem. Stopping times and the σ-algebra FS are defined for arbitrary
filtrations in complete analogy to the definitions for the filtration (FtB ) in Section 1.5.
E[MT ] = E[M0 ]
holds by dominated convergence provided T < ∞ almost surely, and the random vari-
ables MT ∧n , n ∈ N, are uniformly integrable.
Proof of Theorem 3.5. We verify the defining properties of the conditional expectation
in (3.4) by approximating the stopping times by discrete random variables:
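(1). $M_S$ has an $\mathcal{F}_S$-measurable modification: For $n \in \mathbb{N}$ let $\tilde{S}_n = 2^{-n} \lfloor 2^n S \rfloor$, i.e.,
We point out that in general, $\tilde{S}_n$ is not a stopping time w.r.t. $(\mathcal{F}_t)$. Clearly, the sequence $(\tilde{S}_n)_{n \in \mathbb{N}}$ is increasing with $S = \lim \tilde{S}_n$. By almost sure continuity
On the other hand, each of the random variables $M_{\tilde{S}_n}$ is $\mathcal{F}_S$-measurable. In fact,
\[ M_{\tilde{S}_n} \cdot I_{\{S \le t\}} = \sum_{k : k \cdot 2^{-n} \le t} M_{k \cdot 2^{-n}} \cdot I_{\{k 2^{-n} \le S < (k+1) 2^{-n} \text{ and } S \le t\}} \]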
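(2). $E[M_T ; A] = E[M_S ; A]$ for any $A \in \mathcal{F}_S$: For $n \in \mathbb{N}$, the discrete random variables $T_n = 2^{-n} \cdot \lceil 2^n T \rceil$ and $S_n = 2^{-n} \cdot \lceil 2^n S \rceil$ are $(\mathcal{F}_t)$ stopping times satisfying $T_n \ge S_n \ge S$, cf. the proof of Theorem 1.26. In particular, $\mathcal{F}_S \subseteq \mathcal{F}_{S_n} \subseteq \mathcal{F}_{T_n}$. Furthermore, $(T_n)$ and $(S_n)$ are decreasing sequences with $T = \lim T_n$ and $S = \lim S_n$. As T and S are bounded random variables by assumption, the sequences $(T_n)$ and $(S_n)$ are uniformly bounded by a finite constant $c \in (0, \infty)$.
[Figure: the dyadic approximations $\tilde{S}_n(\omega)$ and $S_n(\omega)$ of $S(\omega)$.]
Therefore, we obtain
\begin{align*}
E[M_{T_n} ; A] &= \sum_{k : k \cdot 2^{-n} \le c} E[M_{k \cdot 2^{-n}} ; A \cap \{T_n = k \cdot 2^{-n}\}] \\
&= \sum_{k : k \cdot 2^{-n} \le c} E[M_c ; A \cap \{T_n = k \cdot 2^{-n}\}] \tag{3.2.3} \\
&= E[M_c ; A] \quad \text{for any } A \in \mathcal{F}_{T_n},
\end{align*}
and similarly
In (3.2.3) we have used that $(M_t)$ is an $(\mathcal{F}_t)$ martingale, and $A \cap \{T_n = k \cdot 2^{-n}\} \in \mathcal{F}_{k \cdot 2^{-n}}$. A set $A \in \mathcal{F}_S$ is contained both in $\mathcal{F}_{T_n}$ and $\mathcal{F}_{S_n}$. Thus by (3.2.3) and (3.2.4),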
denote the first exit time from the interval (−b, a) and the first passage time to the point
a, respectively. In Section 1.5 we have computed the distribution of Ta by the reflection
principle. This and other results can be recovered by applying optional stopping to the
basic martingales of Brownian motion. The advantage of this approach is that it carries
over to other diffusion processes.
Exercise (Exit and passage times of Brownian motion). Prove by optional stopping:
(1). Law of the exit point: P [BT = a] = b/(a + b), P [BT = −b] = a/(a + b),
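The law of the exit point can be checked by simulation. The sketch below swaps in a simple symmetric random walk for Brownian motion (for integer a and b the walk leaves (-b, a) at a with the same probability b/(a+b), by the classical gambler's ruin computation); it is meant as a plausibility check, not as a solution of the exercise.

```python
import random

def exit_at_upper(a, b, rng):
    """Run a simple symmetric random walk from 0 until it leaves (-b, a)."""
    s = 0
    while -b < s < a:
        s += rng.choice((-1, 1))
    return s == a

def estimate_exit_probability(a, b, n_paths=20000, seed=0):
    """Monte Carlo estimate of the probability of leaving (-b, a) at a."""
    rng = random.Random(seed)
    hits = sum(exit_at_upper(a, b, rng) for _ in range(n_paths))
    return hits / n_paths

if __name__ == "__main__":
    a, b = 3, 7
    print(estimate_exit_probability(a, b), b / (a + b))
```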
Suppose that h ∈ C 2 (Rd ) is a harmonic function and that (Bt )t≥0 is a d-dimensional
Brownian motion starting at x w.r.t. the probability measure Px . Assuming that
the mean value property for harmonic functions implies that h(Bt ) is a martingale under
Px , cf. Theorem 3.4. The first hitting time T = inf{t ≥ 0 : Bt ∈ Rd \ D} of the com-
plement of an open set D ⊆ Rd is a stopping time w.r.t. the filtration (FtB ). Therefore,
by Theorem 3.5 and the remark below, we obtain
Now let us assume in addition that the set D is bounded. Then T is almost surely finite, and the sequence of random variables $h(B_{T \wedge n})$ ($n \in \mathbb{N}$) is uniformly bounded because $B_{T \wedge n}$ takes values in the closure $\overline{D}$ for any $n \in \mathbb{N}$. Applying the Dominated Convergence Theorem to (3.2.6), we obtain the integral representation
\[ h(x) = E_x[h(B_T)] = \int_{\partial D} h(y)\, \mu_x(dy) \tag{3.2.7} \]
where $\mu_x = P_x \circ B_T^{-1}$ denotes the exit law from D for Brownian motion starting at x.
In Chapter 7, we show that the representation (3.2.7) still holds true if h is a continuous
Generalized mean value property for harmonic functions. For any bounded do-
main D ⊆ Rd and any x ∈ D, h(x) is the average of the boundary values of h on ∂D
w.r.t. the measure µx .
Monte Carlo solution of the Dirichlet problem. The stochastic representation (3.2.9)
can be used as the basis of a Monte Carlo method for computing the harmonic function
h(x) approximately by simulating a large number n of sample paths of Brownian motion
starting at x, and estimating the expectation by the corresponding empirical average. Al-
though in many cases classical numerical methods are more efficient, the Monte Carlo
method is useful in high dimensional cases. Furthermore, it carries over to far more
general situations.
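A minimal sketch of such a Monte Carlo method on the unit disk in dimension two. Instead of simulating Brownian paths on a time grid, it uses the walk-on-spheres technique: by rotational invariance, Brownian motion leaves a ball centred at its current position at a uniformly distributed point of the sphere, so one may jump directly to the largest circle inside the domain. The domain, the tolerance `eps` and all function names are assumptions made for this illustration.

```python
import math
import random

def walk_on_spheres(x, y, boundary_value, rng, eps=1e-3):
    """Follow one walk-on-spheres path in the unit disk and return the
    boundary value at (approximately) the exit point."""
    while True:
        r = 1.0 - math.hypot(x, y)           # distance to the unit circle
        if r < eps:
            norm = math.hypot(x, y)
            return boundary_value(x / norm, y / norm)  # project onto the circle
        theta = rng.uniform(0.0, 2.0 * math.pi)
        x += r * math.cos(theta)             # uniform point on the circle of radius r
        y += r * math.sin(theta)

def estimate_harmonic(x, y, boundary_value, n_paths=20000, seed=0):
    """Empirical average of the boundary values over independent paths."""
    rng = random.Random(seed)
    total = sum(walk_on_spheres(x, y, boundary_value, rng) for _ in range(n_paths))
    return total / n_paths

if __name__ == "__main__":
    f = lambda u, v: u * u - v * v           # boundary values of the harmonic function x^2 - y^2
    print(estimate_harmonic(0.3, 0.4, f), 0.3 ** 2 - 0.4 ** 2)
```

For the harmonic function $h(x, y) = x^2 - y^2$ the exact value at the starting point is known, which makes it easy to compare the empirical average with the truth. The statistical error decays like $n^{-1/2}$ in the number n of paths, independently of the dimension.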
Computation of exit law. Conversely, if the Dirichlet problem (3.2.8) has a unique
solution h, then computation of h (for example by standard numerical methods) enables
us to obtain the expectations in (3.2.8). In particular, the probability h(x) = Px [BT ∈ A]
for Brownian motion exiting the domain on a subset A ⊆ ∂D is informally given as the
solution of the Dirichlet problem
∆h = 0 on D, h = IA on ∂D.
This can be made rigorous under regularity assumptions. The full exit law is the harmonic measure, i.e., the probability measure $\mu_x$ such that the representation (3.2.7) holds for any function $h \in C^2(D) \cap C(\overline{D})$ with $\Delta h = 0$ on D. For simple domains such as half-spaces, balls and cylinders, this harmonic measure can be computed explicitly.
Example (Exit laws from balls). For $d \ge 2$, the exit law from the unit ball $D = \{y \in \mathbb{R}^d : |y| < 1\}$ for Brownian motion starting at a point $x \in \mathbb{R}^d$ with $|x| < 1$ is given by
\[ \mu_x(dy) = \frac{1 - |x|^2}{|y - x|^d}\, \nu(dy) \]
where $\nu$ denotes the normalized surface measure on the unit sphere $S^{d-1} = \{y \in \mathbb{R}^d : |y| = 1\}$. Indeed, the classical Poisson integral formula states that for any $f \in C(S^{d-1})$, the function
\[ h(x) = \int f(y)\, \mu_x(dy) \]
solves the Dirichlet problem on D with boundary values $\lim_{x \to z} h(x) = f(z)$ for any $z \in S^{d-1}$, cf. e.g. [XXX Karatzas/Shreve, Ch. 4]. Hence by (3.2.9),
\[ E_x[f(B_T)] = \int f(y)\, \frac{1 - |x|^2}{|y - x|^d}\, \nu(dy) \]
holds for any $f \in C(S^{d-1})$, and thus by a standard approximation argument, for any indicator function of a measurable subset of $S^{d-1}$.
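In dimension d = 2 the exit law from the unit disk can be integrated numerically. The sketch below checks that the density $(1 - |x|^2)/|y - x|^2$ has total mass one with respect to the normalized arc length and that it reproduces a harmonic function from its boundary values; the quadrature rule and the function names are choices made for this illustration.

```python
import math

def poisson_kernel_2d(x, theta):
    """Density (1 - |x|^2) / |y - x|^2 of mu_x w.r.t. nu for d = 2,
    evaluated at the boundary point y = (cos theta, sin theta)."""
    y = (math.cos(theta), math.sin(theta))
    dist_sq = (y[0] - x[0]) ** 2 + (y[1] - x[1]) ** 2
    return (1.0 - (x[0] ** 2 + x[1] ** 2)) / dist_sq

def integrate_against_exit_law(f, x, n_nodes=2000):
    """Approximate the integral of f w.r.t. mu_x by an equally spaced rule on the circle."""
    total = 0.0
    for k in range(n_nodes):
        theta = 2.0 * math.pi * k / n_nodes
        total += f(math.cos(theta), math.sin(theta)) * poisson_kernel_2d(x, theta)
    return total / n_nodes

if __name__ == "__main__":
    x = (0.3, 0.4)
    print(integrate_against_exit_law(lambda u, v: 1.0, x))          # total mass of mu_x
    print(integrate_against_exit_law(lambda u, v: u * u - v * v, x))  # compare with 0.3^2 - 0.4^2
```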
Theorem 3.6 (Doob’s $L^p$ inequality in continuous time). Suppose that $(M_t)_{t \in [0, \infty)}$ is a martingale with almost surely right continuous sample paths $t \mapsto M_t(\omega)$. Then the following estimates hold for any $a \in [0, \infty)$, $p \in [1, \infty)$, $q \in (1, \infty]$ with $\frac{1}{p} + \frac{1}{q} = 1$, and $c > 0$:
(1). $P\Big[\sup_{t \in [0,a]} |M_t| \ge c\Big] \le c^{-p} \cdot E[|M_a|^p]$,
(2). $\Big\| \sup_{t \in [0,a]} |M_t| \Big\|_{L^p} \le q \cdot \|M_a\|_{L^p}$.
Proof. Let $(\pi_n)$ denote an increasing sequence of partitions of the interval $[0, a]$ such that the mesh size of $\pi_n$ goes to 0 as $n \to \infty$. By Corollary 2.13 applied to the discrete time martingale $(M_t)_{t \in \pi_n}$, we obtain
\[ P\Big[\max_{t \in \pi_n} |M_t| \ge c\Big] \le E[|M_a|^p]/c^p \quad \text{for any } n \in \mathbb{N}. \]
Moreover, as n → ∞,
The first assertion now follows by replacing c by c − ε and letting ε tend to 0. The
second assertion follows similarly from Theorem 2.15.
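The second estimate with p = q = 2 says that the second moment of the running maximum is at most four times the second moment of the final value. A small experiment for a discrete time martingale illustrates this; the simple symmetric random walk used here is an assumption made for the illustration, as are the sample sizes.

```python
import random

def doob_l2_ratio(n_steps=100, n_paths=5000, seed=0):
    """Estimate E[(max_k |S_k|)^2] / E[S_n^2] for a simple symmetric random walk."""
    rng = random.Random(seed)
    sum_max_sq = 0.0
    sum_last_sq = 0.0
    for _ in range(n_paths):
        s = 0
        running_max = 0
        for _ in range(n_steps):
            s += rng.choice((-1, 1))
            running_max = max(running_max, abs(s))
        sum_max_sq += running_max ** 2
        sum_last_sq += s ** 2
    return sum_max_sq / sum_last_sq

if __name__ == "__main__":
    print(doob_l2_ratio())  # Doob's L^2 inequality bounds the true ratio by q^2 = 4
```

The estimated ratio is at least one by construction, and in such experiments it typically stays well below the constant 4, which is sharp only in the worst case.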
Lemma 3.7 (Passage probabilities for lines). For a one-dimensional Brownian motion
(Bt ) starting at 0 we have
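yields
\begin{align*}
P[B_t \ge \beta + \alpha t/2 \text{ for some } t \in [0, a]] &= P\Big[\sup_{t \in [0,a]} (B_t - \alpha t/2) \ge \beta\Big] \\
&= P\Big[\sup_{t \in [0,a]} M_t^\alpha \ge \exp(\alpha\beta)\Big] \le \exp(-\alpha\beta) \cdot E[M_a^\alpha] = \exp(-\alpha\beta).
\end{align*}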
With slightly more effort, it is possible to compute the passage probability and the dis-
tribution of the first passage time of a line explicitly, cf. ?? below.
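The bound $\exp(-\alpha\beta)$ for the passage probability can be compared with a simulation. The sketch below samples Brownian motion on an equidistant time grid, which is an approximation: passages between grid points are missed, so the estimate is biased downwards. The parameter values and names are arbitrary choices.

```python
import math
import random

def passage_probability(alpha, beta, a=1.0, n_steps=200, n_paths=5000, seed=0):
    """Estimate P[B_t >= beta + alpha * t / 2 for some grid point t in (0, a]]."""
    rng = random.Random(seed)
    dt = a / n_steps
    sd = math.sqrt(dt)
    hits = 0
    for _ in range(n_paths):
        b = 0.0
        for k in range(1, n_steps + 1):
            b += rng.gauss(0.0, sd)
            if b >= beta + alpha * k * dt / 2.0:  # the path is on or above the line
                hits += 1
                break
    return hits / n_paths

if __name__ == "__main__":
    alpha, beta = 2.0, 1.0
    print(passage_probability(alpha, beta), math.exp(-alpha * beta))
```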
Application to LIL
A remarkable consequence of Lemma 3.7 is a simplified proof for the upper bound half
of the Law of the Iterated Logarithm:
Theorem 3.8 (LIL, upper bound). For a one-dimensional Brownian motion (Bt ) start-
ing at 0,
\[ \limsup_{t \searrow 0} \frac{B_t}{\sqrt{2t \log\log t^{-1}}} \le +1 \quad P\text{-almost surely.} \tag{3.3.1} \]
where $h(t) := \sqrt{2t \log\log t^{-1}}$. Fix $\theta \in (0, 1)$. The idea is to approximate the function $h(t)$ by affine functions
\[ l_n(t) = \beta_n + \alpha_n t/2 \]
on each of the intervals $[\theta^n, \theta^{n-1}]$, and to apply the upper bounds for the passage probabilities from the lemma.
[Figure: the points $\theta^5 < \theta^4 < \theta^3 < \theta^2 < \theta < 1$ on the time axis.]
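We choose $\alpha_n$ and $\beta_n$ in such a way that $l_n(\theta^n) = h(\theta^n)$ and $l_n(0) = h(\theta^n)/2$, i.e., $\beta_n = h(\theta^n)/2$ and $\alpha_n = h(\theta^n)/\theta^n$. Then
\begin{align*}
l_n(t) \le l_n(\theta^{n-1}) &\le \frac{l_n(\theta^n)}{\theta} \tag{3.3.2} \\
&= \frac{h(\theta^n)}{\theta} \le \frac{h(t)}{\theta} \quad \text{for any } t \in [\theta^n, \theta^{n-1}].
\end{align*}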
3.3. MAXIMAL INEQUALITIES AND THE LIL
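[Figure: graphs of $h(t)$ and of the affine function $l_n(t)$ on the interval $[\theta^n, \theta^{n-1}]$, with the values $h(\theta^n)$ and $h(\theta^n)/2$ marked.]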
We now want to apply the Borel-Cantelli lemma to show that with probability one,
Bt ≤ (1 + δ)ln (t) for large n. By Lemma 3.7,
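By (3.3.2), the right hand side of (3.3.3) is dominated by $(1 + \delta) h(t)/\theta$ for $t \in [\theta^n, \theta^{n-1}]$. Hence
\[ B_t \le \frac{1 + \delta}{\theta}\, h(t) \quad \text{for any } t \in \bigcup_{n \ge N} [\theta^n, \theta^{n-1}], \]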
Since (−Bt ) is again a Brownian motion starting at 0, the upper bound (3.3.1) also
implies
\[ \liminf_{t \searrow 0} \frac{B_t}{\sqrt{2t \log\log t^{-1}}} \ge -1 \quad P\text{-almost surely.} \tag{3.3.4} \]
The converse bounds are actually easier to prove since we can use the independence of
the increments and apply the second Borel-Cantelli Lemma. We only mention the key
steps and leave the details as an exercise:
Exercise (Complete proof of LIL). Prove the Law of the Iterated Logarithm:
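\[ \limsup_{t \searrow 0} \frac{B_t}{h(t)} = +1 \quad \text{and} \quad \liminf_{t \searrow 0} \frac{B_t}{h(t)} = -1 \]
where $h(t) = \sqrt{2t \log\log t^{-1}}$. Proceed in the following way:
(1). Let $\theta \in (0, 1)$ and consider the increments $Z_n = B_{\theta^n} - B_{\theta^{n+1}}$, $n \in \mathbb{N}$. Show that if $\varepsilon > 0$, then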
and complete the proof of the LIL by deriving the lower bounds
\[ \limsup_{t \searrow 0} \frac{B_t}{h(t)} \ge 1 \quad \text{and} \quad \liminf_{t \searrow 0} \frac{B_t}{h(t)} \le -1 \quad P\text{-almost surely.} \tag{3.3.5} \]
The strength of martingale theory is partially due to powerful general convergence the-
orems that hold for martingales, sub- and supermartingales. In this chapter, we study
convergence theorems with different types of convergence including almost sure, L2
and L1 convergence, and consider first applications.
At first, we will again focus on discrete-parameter martingales – the results can then be
easily extended to continuous martingales.
4.1 Convergence in L2
Already when proving the Law of Large Numbers, L2 convergence is much easier to
show than, for example, almost sure convergence. The situation is similar for mar-
tingales: A necessary and sufficient condition for convergence in the Hilbert space
L2 (Ω, A, P ) can be obtained by elementary methods.
Martingales in L2
Consider a discrete-parameter martingale (Mn )n≥0 w.r.t. a filtration (Fn ) on a probabil-
ity space (Ω, A, P ). Throughout this section we assume:
CHAPTER 4. MARTINGALE CONVERGENCE THEOREMS
Mn = E[M∞ | Fn ]
holds almost surely for any n ≥ 0, where M∞ denotes the limit of Mn in L2 (Ω, A, P ).
We will prove in the next section that (Mn ) does also converge almost surely to M∞ .
An analogous result to Theorem 4.2 holds with $L^2$ replaced by $L^p$ for any $p \in (1, \infty)$ but not for $p = 1$, cf. Section 4.3 below.
Indeed,
\begin{align*}
E[M_n^2] - E[M_m^2] &= E[(M_n - M_m)(M_n + M_m)] \\
&= E[(M_n - M_m)^2] + 2 E[M_m \cdot (M_n - M_m)],
\end{align*}
(2). To prove that (4.1.1) is sufficient for L2 convergence, note that the sequence
(E[Mn2 ])n≥0 is increasing by (4.1.2). If (4.1.1) holds then this sequence is bound-
ed, and hence a Cauchy sequence. Therefore, by (4.1.2), (Mn ) is a Cauchy se-
quence in L2 . Convergence now follows by completeness of L2 (Ω, A, P ).
(3). Conversely, if (Mn ) converges in L2 to a limit M∞ , then the L2 norms are bound-
ed. Moreover, by Jensen’s inequality, for each fixed k ≥ 0,
How can boundedness in L2 be verified for martingales? Writing the martingale (Mn )
as the sequence of partial sums of its increments Yn = Mn − Mn−1 , we have
\[ E[M_n^2] = \Big\langle M_0 + \sum_{k=1}^n Y_k,\; M_0 + \sum_{k=1}^n Y_k \Big\rangle_{L^2} = E[M_0^2] + \sum_{k=1}^n E[Y_k^2] \]
\[ \sum_{n=1}^\infty n^{-\alpha} \text{ converges} \iff \alpha > 1, \quad \text{whereas} \]
\[ \sum_{n=1}^\infty (-1)^n n^{-\alpha} \text{ converges} \iff \alpha > 0. \]
Therefore, it seems interesting to see what happens if the signs are chosen randomly.
The L2 martingale convergence theorem yields:
Corollary 4.3. Let (an ) be a real sequence. If (εn ) is a sequence of independent random
variables on (Ω, A, P ) with P [εn = +1] = P [εn = −1] = 1/2, then
\[ \sum_{n=1}^\infty \varepsilon_n a_n \text{ converges in } L^2(\Omega, \mathcal{A}, P) \iff \sum_{n=1}^\infty a_n^2 < \infty. \]
Proof. The sequence $M_n = \sum_{k=1}^n \varepsilon_k a_k$ of partial sums is a martingale with
\[ \sup_{n \ge 0} E[M_n^2] = \sum_{k=1}^\infty E[\varepsilon_k^2 a_k^2] = \sum_{k=1}^\infty a_k^2. \]
Example. The series $\sum_{n=1}^\infty \varepsilon_n \cdot n^{-\alpha}$ converges in $L^2$ if and only if $\alpha > \frac{1}{2}$.
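The criterion can be illustrated by computing the second moments $E[M_n^2] = \sum_{k \le n} k^{-2\alpha}$ of the partial sums, which stay bounded exactly when $\alpha > 1/2$. The following sketch compares the exact value with a simulation for random signs; the function names and sample sizes are hypothetical.

```python
import random

def second_moment(alpha, n):
    """E[M_n^2] = sum over k <= n of a_k^2 for a_k = k^(-alpha)."""
    return sum(k ** (-2.0 * alpha) for k in range(1, n + 1))

def simulated_second_moment(alpha, n, n_paths=4000, seed=0):
    """Empirical second moment of M_n = sum of eps_k * k^(-alpha) with random signs eps_k."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        m = sum(rng.choice((-1.0, 1.0)) * k ** (-alpha) for k in range(1, n + 1))
        total += m * m
    return total / n_paths

if __name__ == "__main__":
    print(second_moment(1.0, 10 ** 5))   # bounded in n, since alpha > 1/2
    print(second_moment(0.5, 10 ** 5))   # harmonic sum, grows like log n
    print(simulated_second_moment(1.0, 50), second_moment(1.0, 50))
```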
(2). For c > 0 let Tc = inf{n ≥ 0 : |Mn | ≥ c}. Apply the Optional Stopping
Theorem to the martingale in the Doob decomposition of (Mn2 ), and conclude
that P [Tc = ∞] = 0.
Proof. Choose any increasing sequence tn ∈ [0, u) such that tn → u. Then (Mtn ) is an
L2 -bounded discrete-parameter martingale. Hence the limit Mu = lim Mtn exists in L2 ,
and
Mtn = E[Mu | Ftn ] for any n ∈ N. (4.1.3)
For an arbitrary t ∈ [0, u), there exists n ∈ N with tn ∈ (t, u). Hence
Mt = E[Mtn | Ft ] = E[Mu | Ft ]
by (4.1.3) and the tower property. In particular, (Mt )t∈[0,u] is a square-integrable mar-
tingale. By orthogonality of the increments,
\[ \lim_{t \nearrow u} E[(M_u - M_t)^2] = 0. \]
Remark. (1). Note that in the proof it is enough to consider a fixed sequence $t_n \nearrow u$.
(2). To obtain almost sure convergence, an additional regularity condition on the sam-
ple paths (e.g. right-continuity) is required, cf. below. This assumption is not
needed for L2 convergence.
Remark (L1 boundedness vs. L1 convergence). (1). The condition sup E[Zn− ] < ∞
holds if and only if (Zn ) is bounded in L1 . Indeed, as E[Zn+ ] < ∞ by our defini-
tion of a supermartingale, we have
(2). Although (Zn ) is bounded in L1 and the limit is integrable, L1 convergence does
not hold in general, cf. the examples below.
Proof. We may assume E[Zn− ] < ∞ since otherwise there is nothing to prove. The key
idea is to set up a predictable gambling strategy that increases our capital by (b − a)
for each completed upcrossing. Since the net gain with this strategy should again be a
supermartingale this yields an upper bound for the average number of upcrossings. Here
is the strategy:
• Wait until Zk ≤ a.
repeat
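and for $k \ge 2$,
\[ C_k = \begin{cases} 1 & \text{if } (C_{k-1} = 1 \text{ and } Z_{k-1} \le b) \text{ or } (C_{k-1} = 0 \text{ and } Z_{k-1} \le a), \\ 0 & \text{otherwise.} \end{cases} \]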
for k ≤ n. Therefore, by Theorem 2.6 and the remark below, the process
\[ (C_\bullet Z)_k = \sum_{i=1}^k C_i \cdot (Z_i - Z_{i-1}), \quad 0 \le k \le n, \]
is again a supermartingale.
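The betting strategy and the resulting gain process can be traced on a concrete path. The sketch below follows the rule for $C_k$, $k \ge 2$, displayed above; the initial bet `C[1] = 1 if z[0] <= a else 0` is an assumption, since its definition is not shown here, and an upcrossing is counted as completed at the first time the path exceeds b while a bet is on.

```python
import random

def upcrossing_strategy(z, a, b):
    """Return (C, gains, upcrossings) for a path z[0..n].

    C[k] for k >= 2 follows the displayed rule; the initial bet C[1] is an
    assumption made for this sketch. gains is the final value (C . Z)_n.
    """
    n = len(z) - 1
    c = [0] * (n + 1)                        # c[0] is unused
    gains = 0
    upcrossings = 0
    for k in range(1, n + 1):
        if k == 1:
            c[k] = 1 if z[0] <= a else 0
        else:
            hold = c[k - 1] == 1 and z[k - 1] <= b
            start = c[k - 1] == 0 and z[k - 1] <= a
            c[k] = 1 if (hold or start) else 0
        gains += c[k] * (z[k] - z[k - 1])    # increment of (C . Z)
        if c[k] == 1 and z[k] > b:           # an upcrossing is completed at time k
            upcrossings += 1
    return c, gains, upcrossings

if __name__ == "__main__":
    rng = random.Random(0)
    z = [0]
    for _ in range(300):
        z.append(z[-1] + rng.choice((-1, 1)))
    a, b = -2, 2
    _, gains, ups = upcrossing_strategy(z, a, b)
    # Pathwise bound behind the upcrossing inequality:
    print(gains, (b - a) * ups - max(a - z[-1], 0))
```

On every path the gain dominates $(b - a)$ times the number of completed upcrossings minus $(Z_n - a)^-$; taking expectations of this pathwise bound is the step that leads to the upcrossing inequality.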
Clearly, the value of the process C• Z increases by at least (b − a) units during each
completed upcrossing. Between upcrossing periods, the value of (C• Z)k is constant.
Finally, if the final time n is contained in an upcrossing period, then the process can
decrease by at most (Zn − a)− units during that last period (since Zk might decrease
before the next upcrossing is completed). Therefore, we have
Zn
Proof. Let
\[ U^{(a,b)} = \sup_{n \in \mathbb{N}} U_n^{(a,b)} \]
denote the total number of upcrossings of the supermartingale $(Z_n)$ over an interval $(a, b)$ with $-\infty < a < b < \infty$. By the upcrossing inequality and monotone convergence,
\[ E[U^{(a,b)}] = \lim_{n \to \infty} E[U_n^{(a,b)}] \le \frac{1}{b - a} \cdot \sup_{n \in \mathbb{N}} E[(Z_n - a)^-]. \tag{4.2.1} \]
Assuming $\sup E[Z_n^-] < \infty$, the right hand side of (4.2.1) is finite since $(Z_n - a)^- \le |a| + Z_n^-$. Therefore,
\[ U^{(a,b)} < \infty \quad P\text{-almost surely,} \]
It remains to show that the almost sure limit Z∞ = lim Zn is an integrable random
variable (in particular, it is finite almost surely). This holds true as, by the remark below
Theorem 4.5, sup E[Zn− ] < ∞ implies that (Zn ) is bounded in L1 , and therefore
by Fatou’s lemma.
Suppose that $P[\eta_i \ne 0] > 0$. Then there exists $\varepsilon > 0$ such that $P[|\eta_i| \ge \varepsilon] > 0$. As the increments are i.i.d., the event $\{|\eta_i| \ge \varepsilon\}$ occurs infinitely often with probability one. Therefore, almost surely, the martingale $(S_n)$ does not converge as $n \to \infty$.
\[ T_a = \inf\{n \ge 0 : S_n \ge a\} \]
of the interval $[a, \infty)$. By the Optional Stopping Theorem, the stopped Random Walk $(S_{T_a \wedge n})_{n \ge 0}$ is again a martingale. Moreover, as $S_k < a$ for any $k < T_a$ and the increments are bounded by c, we obtain the upper bound
Therefore, the stopped Random Walk converges almost surely by the Supermartingale
Convergence Theorem. As (Sn ) does not converge, we can conclude that
P [Ta < ∞] = 1 for any a > 0, i.e., lim sup Sn = ∞ almost surely.
Remark (Almost sure vs. Lp convergence). In the last example, the stopped process
does not converge in Lp for any p ∈ [1, ∞). In fact,
\[ M_n = Z_n/\alpha^n \]
is a martingale. By the almost sure convergence theorem, a finite limit M∞ exists al-
most surely, because Mn ≥ 0 for all n. For the almost sure asymptotics of (Zn ), we
distinguish three different cases:
(2). $\alpha = 1$: Here $(Z_n)$ is a martingale and converges almost surely to a finite limit. If $P[Y_i \ne 1] > 0$ then there exists $\varepsilon > 0$ such that $Y_i \ge 1 + \varepsilon$ infinitely often with probability one. This is consistent with convergence of $(Z_n)$ only if the limit is
zero. Hence, if (Zn ) is not almost surely constant, then also in the critical case,
Zn → 0 almost surely.
(3). α > 1 (supercritical): In this case, on the set {M∞ > 0},
\[ Z_n = M_n \cdot \alpha^n \sim M_\infty \cdot \alpha^n, \]
i.e., (Zn ) grows exponentially fast. The asymptotics on the set {M∞ = 0} is not
evident and requires separate considerations depending on the model.
Although most of the conclusions in the last example could have been obtained without
martingale methods (e.g. by taking logarithms), the martingale approach has the advan-
tage of carrying over to far more general model classes. These include for example
branching processes or exponentials of continuous time processes.
To study the asymptotic behavior of h(x) as x approaches the boundary ∂D, we con-
struct a Markov chain (Xn ) such that h(Xn ) is a martingale: Let r : D → (0, ∞) be a
continuous function such that
and let (Xn ) w.r.t Px denote the canonical time-homogeneous Markov chain with state
space D, initial value x, and transition probabilities
[Figure: the domain D, a point x in D, and the radius r(x).]
By (4.2.3), the function h is integrable w.r.t. p(x, dy), and, by the mean value property,
(1). Boundary regularity: If h is harmonic and bounded from below on D then the limit $\lim_{n \to \infty} h(X_n)$ exists along almost every trajectory $X_n$ to the boundary $\partial D$.
Note that, in contrast to classical results from analysis, the first statement holds without
any smoothness condition on the boundary ∂D. Thus, although boundary values of h
may not exist in the classical sense, they do exist along almost every trajectory of the
Markov chain!
and the right hand side is an integrable random variable. Therefore, (Mn ) converges
almost surely on {Ta = ∞}. Since this holds for every a < 0, we obtain almost sure
convergence on the set
\[ \{\liminf M_n > -\infty\} = \bigcup_{a < 0,\; a \in \mathbb{Q}} \{T_a = \infty\}. \]
Similarly, almost sure convergence follows on the set {lim sup Mn < ∞}.
Now let (Fn )n≥0 be an arbitrary filtration. As a consequence of Theorem 4.7 we obtain:
Proof. Let $S_n = \sum_{k=1}^n I_{A_k}$ and $T_n = \sum_{k=1}^n E[I_{A_k} \mid \mathcal{F}_{k-1}]$. Then $S_n$ and $T_n$ are almost surely increasing sequences. Let $S_\infty = \sup S_n$ and $T_\infty = \sup T_n$ denote the limits on $[0, \infty]$. The claim is that almost surely,
S∞ = ∞ ⇐⇒ T∞ = ∞. (4.2.4)
1st Borel-Cantelli Lemma: If $\sum P[A_n] < \infty$ then $\sum P[A_n \mid \mathcal{F}_{n-1}] < \infty$ almost surely, and therefore
\[ P[A_n \text{ infinitely often}] = 0. \]
2nd Borel-Cantelli Lemma: If $\sum P[A_n] = \infty$ and the $A_n$ are independent then $\sum P[A_n \mid \mathcal{F}_{n-1}] = \sum P[A_n] = \infty$ almost surely, and therefore
\[ U_t^{(a,b)}[Z] := \sup_{\substack{\pi \subset [0,t] \\ \text{finite}}} U^{(a,b)}[(Z_s)_{s \in \pi}]. \]
Note that if (Zs ) has right-continuous sample paths and (πn ) is a sequence of partitions
of [0, t] such that 0, t ∈ π0 , πn ⊂ πn+1 and mesh(πn ) → 0 then
\[ U_t^{(a,b)}[Z] = \lim_{n \to \infty} U^{(a,b)}[(Z_s)_{s \in \pi_n}]. \]
\[ E[U_t^{(a,b)}] \le \frac{1}{b - a}\, E[(Z_t - a)^-]. \]
(2). Convergence Theorem: If $\sup_{s \in [0,u)} E[Z_s^-] < \infty$, then the limit $Z_{u-} = \lim_{s \nearrow u} Z_s$ exists almost surely, and $Z_{u-}$ is an integrable random variable.
where (πn ) is a sequence of partitions as above. The assertion now follows by the
Monotone Convergence Theorem.
(2). The almost sure convergence can now be proven in the same way as in the discrete
time case.
More generally than stated above, the upcrossing inequality also implies that for a right-continuous supermartingale $(Z_s)_{s \in [0,u)}$ all the left limits $\lim_{s \nearrow t} Z_s$, $t \in [0, u)$, exist simultaneously with probability one. Thus almost every sample path is càdlàg (continue à droite, limites à gauche, i.e., right continuous with left limits). By similar arguments, the existence of a modification with right continuous (and hence càdlàg) sample paths can be proven for any supermartingale $(Z_s)$ provided the filtration is right continuous and complete, and $s \mapsto E[Z_s]$ is right continuous, cf. e.g. [XXXRevuz/Yor, Ch.II,§2].
The Supermartingale Convergence Theorem shows that every supermartingale (Zn ) that
is bounded in L1 converges almost surely to an integrable limit Z∞ . However, L1 con-
vergence does not necessarily hold:
Example. (1). Suppose that $Z_n = \prod_{i=1}^n Y_i$ where the $Y_i$ are i.i.d. with $E[Y_i] = 1$, $P[Y_i \ne 1] > 0$. Then, $Z_n \to 0$ almost surely, cf. Example 2 in Section 4.2. On the other hand, $L^1$ convergence does not hold as $E[Z_n] = 1$ for any n.
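A simulation makes the gap between almost sure convergence and $L^1$ convergence visible. The sketch assumes the specific factors $Y_i \in \{0.5, 1.5\}$ with probability 1/2 each, so that $E[Y_i] = 1$ while $E[\log Y_i] < 0$; this choice and the sample sizes are made for illustration only.

```python
import random

def product_paths(n=200, n_paths=5000, seed=0):
    """Simulate Z_n = Y_1 * ... * Y_n for i.i.d. factors with mean one."""
    rng = random.Random(seed)
    finals = []
    for _ in range(n_paths):
        z = 1.0
        for _ in range(n):
            z *= rng.choice((0.5, 1.5))      # E[Y_i] = 1 and P[Y_i != 1] > 0
        finals.append(z)
    return finals

if __name__ == "__main__":
    finals = product_paths()
    frac_small = sum(1 for z in finals if z < 1e-3) / len(finals)
    print(frac_small)                        # fraction of paths that are already tiny
```

Almost all simulated paths are close to zero at time n, although $E[Z_n] = 1$ exactly. The expectation is carried by very rare, very large values, which is precisely the failure of uniform integrability.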
Uniform integrability
Let (Ω, A, P ) be a probability space. The key condition required to deduce L1 conver-
gence from convergence in probability is uniform integrability. To motivate the defini-
tion we first recall two characterizations of integrable random variables:
with relative density |X| w.r.t. P is absolutely continuous w.r.t. P in the following
sense: For any ε > 0 there exists δ > 0 such that
Proof. (1). For an integrable random variable X the first assertion holds by the Monotone Convergence Theorem, since $|X| \cdot I_{\{|X| \ge c\}} \searrow 0$ as $c \nearrow \infty$.
Uniform integrability means that properties (1) and (2) hold uniformly for a family of
random variables:
We will prove below that convergence in probability plus uniform integrability is equiv-
alent to L1 convergence. Before, we state two lemmas giving sufficient conditions for
uniform integrability (and hence for L1 convergence) that can often be verified in appli-
cations:
The first condition in Lemma 4.11 is the classical assumption in the Dominated Con-
vergence Theorem. The second condition holds in particular if
or, if
\[ \sup_{i \in I} E[|X_i| (\log |X_i|)^+] < \infty \quad \text{(Entropy condition)} \]
is satisfied. Boundedness in L1 , however, does not imply uniform integrability, cf. the
examples at the beginning of this section.
The next observation is crucial for the application of uniform integrability to martin-
gales:
{E[X | F ] : F ⊆ A σ-algebra}
Proof. By Lemma 4.10, for any ε > 0 there exists δ > 0 such that
\[ P[|E[X \mid \mathcal{F}]| \ge c] \le \frac{1}{c}\, E[|E[X \mid \mathcal{F}]|] \le \frac{1}{c}\, E[|X|], \]
(4.3.1) holds simultaneously for all σ-algebras F ⊆ A if c is sufficiently large.
Theorem 4.13. Suppose that (Xn )n∈N is a sequence of integrable random variables.
Then (Xn ) converges to a random variable X w.r.t. the L1 norm if and only if Xn
converges to X in probability and the family {Xn : n ∈ N} is uniformly integrable.
Proof. (1). We first prove the “if” part of the assertion under the additional assumption
that the random variables |Xn | are uniformly bounded by a finite constant c: For
ε > 0,
Here we have used that |Xn | ≤ c and hence |X| ≤ c with probability one, because
a subsequence of (Xn ) converges almost surely to X. For sufficiently large n, the
right hand side of (4.3.2) is smaller than 2ε. Therefore, E[ |Xn − X| ] → 0 as
n → ∞.
(2). To prove the “if” part under the uniform integrability condition, we consider the
cut-off-functions
φc (x) = (x ∧ c) ∨ (−c)
[Figure: graph of the cut-off function $\varphi_c$.]
E[ |Xn − X| ]
≤ E[ |Xn − φc (Xn )| ] + E[ |φc (Xn ) − φc (X)| ] + E[ |φc (X) − X| ]
≤ E[ |Xn | ; |Xn | ≥ c] + E[ |φc (Xn ) − φc (X)| ] + E[ |X| ; |X| ≥ c].
Let ε > 0 be given. Choosing c large enough, the first and the last summand on
the right hand side are smaller than ε/3 for all n by uniform integrability of {Xn :
n ∈ N} and integrability of X. Moreover, by (4.3.3), there exists n0 (c) such that
the middle term is smaller than ε/3 for n ≥ n0 (c). Hence E[ |Xn − X| ] < ε for
n ≥ n0 , and thus Xn → X in L1 .
cf. Lemma 4.10. Hence, if $P[A] < \delta$ then $\sup_{n \ge n_0} E[|X_n| ; A] < \varepsilon$.
Moreover, again by Lemma 4.10, there exist δ1 , . . . , δn0 > 0 such that for n ≤ n0 ,
L1 convergence of martingales
If X is an integrable random variable and (Fn ) is a filtration then Mn = E[X | Fn ]
is a martingale w.r.t. (Fn ). The next result shows that an arbitrary martingale can be
represented in this way if and only if it is uniformly integrable:
Theorem 4.14 (L1 Martingale Convergence Theorem). Suppose that (Mn )n≥0 is a
martingale w.r.t. a filtration (Fn ). Then the following statements are equivalent:
Proof.
for c ∈ (0, ∞) sufficiently large. Therefore, the limit M∞ = lim Mn exists al-
most surely and in probability by the almost sure convergence theorem. Uniform
integrability then implies
Mn → M∞ in L1
by Theorem 4.13.
Proof. Let Mn := E[X | Fn ]. By the almost sure and the L1 martingale convergence
theorem, the limit M∞ = lim Mn exists almost surely and in L1 . To obtain a measurable
function that is defined everywhere, we set M∞ := lim sup Mn . It remains to verify, that
M∞ is a version of the conditional expectation E[X | F∞ ]. Clearly, M∞ is measurable
w.r.t. F∞ . Moreover, for n ≥ 0 and A ∈ Fn ,
by (4.3.4). Since $\bigcup \mathcal{F}_n$ is stable under finite intersections,
\[ E[M_\infty ; A] = E[X ; A] \]
holds for all $A \in \sigma(\bigcup \mathcal{F}_n)$ as well.
Fn = σ(A1 , . . . , An ), n ≥ 0.
Note that for each $n \ge 0$, there exist finitely many atoms $B_1, \ldots, B_k \in \mathcal{A}$ (i.e. disjoint events with $\bigcup B_i = \Omega$) such that $\mathcal{F}_n = \sigma(B_1, \ldots, B_k)$. Therefore, the conditional expectation given $\mathcal{F}_n$ can be defined in an elementary way:
\[ E[X \mid \mathcal{F}_n] := \sum_{i : P[B_i] \ne 0} E[X \mid B_i] \cdot I_{B_i}. \]
Moreover, by Corollary 4.15, the limit M∞ = lim E[X | Fn ] exists almost surely and in
L1 , and M∞ is a version of the conditional expectation E[X | F ].
You might (and should) object that the proofs of the martingale convergence theorems
require the existence of conditional expectations. Although this is true, it is possible
to state the necessary results by using only elementary conditional expectations, and
thus to obtain a more constructive proof for existence of conditional expectations given
separable σ-algebras.
Corollary 4.16 (0-1 Law of P. Lévy). If $(\mathcal{F}_n)$ is a filtration on $(\Omega, \mathcal{A}, P)$ then for any event $A \in \sigma(\bigcup \mathcal{F}_n)$,
The L1 Martingale Convergence Theorem also implies that any martingale that is Lp
bounded for some p ∈ (1, ∞) converges in Lp :
(1). Prove that (Mn ) converges almost surely and in L1 , and Mn = E[M∞ | Fn ] for
any n ≥ 0.
Note that uniform integrability of $|M_n|^p$ holds automatically and does not have to be assumed!
Exercise (Backward Martingale Convergence Theorem and LLN). Let (Fn )n≥0 be
a decreasing sequence of sub-σ-algebras on a probability space (Ω, A, P ).
(1). Prove that for any random variable X ∈ L1 (Ω, A, P ), the limit M−∞ of the
sequence M−n := E[X | Fn ] as n → −∞ exists almost surely and in L1 , and
\[ M_{-\infty} = E\Big[X \,\Big|\, \bigcap \mathcal{F}_n\Big] \quad \text{almost surely.} \]
(2). Now let (Xn ) be a sequence of i.i.d. random variables in L1 (Ω, A, P ), and let
Fn = σ(Sn , Sn+1 , . . .) where Sn = X1 + . . . + Xn . Prove that
\[ E[X_1 \mid \mathcal{F}_n] = \frac{S_n}{n}, \]
and conclude that the strong Law of Large Numbers holds:
\[ \frac{S_n}{n} \longrightarrow E[X_1] \quad \text{almost surely.} \]
with i.i.d. random variables ηi ∈ L2 such that E[ηi ] = 0 and Var[ηi ] = 1, a continuous
function σ : R → R, and a scale factor h > 0. Equivalently,
\[ X_n^{(h)} = X_0^{(h)} + \sqrt{h} \cdot \sum_{k=0}^{n-1} \sigma(X_k^{(h)}) \cdot \eta_{k+1}, \qquad n = 0, 1, 2, \ldots. \tag{5.0.2} \]
If σ is constant then as $h \searrow 0$, the rescaled process $(X^{(h)}_{\lfloor t/h \rfloor})_{t \geq 0}$ converges in distribution
to (σ · Bt ) where (Bt ) is a Brownian motion. We are interested in the scaling limit for
general σ. One can prove that the rescaled process again converges in distribution, and
the limit process is a solution of a stochastic integral equation
\[ X_t = X_0 + \int_0^t \sigma(X_s)\, dB_s, \qquad t \geq 0. \tag{5.0.3} \]
Here the integral is an Itô stochastic integral w.r.t. a Brownian motion (Bt ). Usually the
equation (5.0.3) is written briefly as
\[ dX_t = \sigma(X_t)\, dB_t. \]
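The discrete scheme (5.0.2) is easy to simulate. The following is a minimal sketch, assuming fair coin flips ±1 for the noise ηk (any i.i.d. noise with mean 0 and variance 1 would do); the function name and the parameter values are hypothetical.

```python
import math
import random


def simulate_scheme(x0, sigma, h, n_steps, rng):
    """Iterate X_{k+1} = X_k + sqrt(h) * sigma(X_k) * eta_{k+1}, cf. (5.0.2)."""
    path = [x0]
    for _ in range(n_steps):
        eta = rng.choice((-1.0, 1.0))  # assumed noise: mean 0, variance 1
        path.append(path[-1] + math.sqrt(h) * sigma(path[-1]) * eta)
    return path


rng = random.Random(0)
h = 0.01
n_steps = 100  # number of steps needed to reach time 1 = n_steps * h
# Constant sigma = 2: the value at time 1 should have variance close to 4.
finals = [simulate_scheme(0.0, lambda x: 2.0, h, n_steps, rng)[-1]
          for _ in range(2000)]
sample_var = sum(x * x for x in finals) / len(finals)
```

For constant σ the empirical variance at time 1 is close to σ², as expected for the limit σ · B1 . Replacing the lambda by a non-constant function gives a rough simulation of a solution of (5.0.3).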
Example (Stock prices, geometric Brownian motion). A simple discrete time model
for stock prices is given by
\[ I_t = \int_0^t H_s\, dX_s, \qquad t \geq 0, \tag{5.1.1} \]
for continuous functions and, respectively, continuous adapted processes (Hs ) and (Xs ).
where s′ := min{u ∈ π : u > s} denotes the next partition point after s. Note that
the increments δXs vanish for s ≥ t. In particular, only finitely many of the increments
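are not equal to zero. A natural approach for defining the integral It in (5.1.1) would be
Riemann sum approximations:
Variant 1 (non-anticipative): $I_t^n = \sum_{s \in \pi_n} H_s\, \delta X_s$,
Variant 2 (anticipative): $\hat{I}_t^n = \sum_{s \in \pi_n} H_{s'}\, \delta X_s$,
Variant 3 (anticipative): $\mathring{I}_t^n = \sum_{s \in \pi_n} \tfrac{1}{2} (H_s + H_{s'})\, \delta X_s$.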
Note that for finite t, in each of the sums, only finitely many summands do not vanish.
For example,
\[ I_t^n = \sum_{s \in \pi_n,\ s<t} H_s\, \delta X_s = \sum_{s \in \pi_n,\ s<t} H_s \cdot (X_{s' \wedge t} - X_s). \]
Now let us consider at first the case where Hs = Xs and t = 1, i.e., we would like to
define the integral $I = \int_0^1 X_s\, dX_s$. Suppose first that X : [0, 1] → R is a continuous
function of finite variation, i.e.,
\[ V^{(1)}(X) = \sup \Big\{ \sum_{s \in \pi} |\delta X_s| \,:\, \pi \text{ partition of } \mathbb{R}_+ \Big\} < \infty. \]
Then for H = X and t = 1 all the approximations above converge to the same limit as
n → ∞. For example,
\[ |\hat{I}_1^n - I_1^n| = \sum_{s \in \pi_n} (\delta X_s)^2 \leq V^{(1)}(X) \cdot \sup_{s \in \pi_n} |\delta X_s|, \]
and the right-hand side converges to 0 by uniform continuity of X on [0, 1]. In this case
the limit of the Riemann sums is a Riemann-Stieltjes integral
\[ \lim_{n \to \infty} I_1^n = \lim_{n \to \infty} \hat{I}_1^n = \int_0^1 X_s\, dX_s, \]
which is well-defined whenever the integrand is continuous and the integrator is of finite
variation or conversely. The sample paths of Brownian motion, however, are almost
surely not of finite variation. Therefore, the reasoning above does not apply, and in fact
if Xt = Bt is a one-dimensional Brownian motion and Ht = Xt then
\[ E[\, |\hat{I}_1^n - I_1^n| \,] = \sum_{s \in \pi_n} E[(\delta B_s)^2] = \sum_{s \in \pi_n} \delta s = 1, \]
i.e., the L1 -limits of the random sequences $(I_1^n)$ and $(\hat{I}_1^n)$ are different if they exist. Below
we will see that indeed the limits of the sequences $(I_1^n)$, $(\hat{I}_1^n)$ and $(\mathring{I}_1^n)$ do exist in L2 ,
and all the limits are different. The limit of the non-anticipative Riemann sums $I_1^n$
is the Itô stochastic integral $\int_0^1 B_s\, dB_s$, the limit of $(\hat{I}_1^n)$ is the backward Itô integral
$\int_0^1 B_s\, \hat{d}B_s$, and the limit of $\mathring{I}_1^n$ is the Stratonovich integral $\int_0^1 B_s \circ dB_s$. All three notions
of stochastic integrals are relevant. The most important one is the Itô integral because
the non-anticipating Riemann sum approximations imply that the Itô integral $\int_0^t H_s\, dB_s$
is a continuous time martingale transform of Brownian motion if the process (Hs ) is
adapted.
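The difference between the three Riemann sum variants for H = B can be observed numerically on a simulated path. The sketch below is an illustration, assuming an equidistant partition of [0, 1] and a Gaussian random walk as a stand-in for a Brownian path; the function names are hypothetical.

```python
import math
import random


def brownian_path(n, rng):
    """Sample (B_{k/n})_{k=0..n} as a Gaussian random walk."""
    b = [0.0]
    for _ in range(n):
        b.append(b[-1] + rng.gauss(0.0, math.sqrt(1.0 / n)))
    return b


def riemann_sums(b):
    """Return the Ito, backward Ito and Stratonovich type sums for H = B."""
    steps = range(len(b) - 1)
    ito = sum(b[k] * (b[k + 1] - b[k]) for k in steps)
    backward = sum(b[k + 1] * (b[k + 1] - b[k]) for k in steps)
    stratonovich = 0.5 * (ito + backward)
    return ito, backward, stratonovich


rng = random.Random(1)
b = brownian_path(10_000, rng)
ito, backward, stratonovich = riemann_sums(b)
# Sum of squared increments; this is the gap between the two one-sided sums.
qv = sum((b[k + 1] - b[k]) ** 2 for k in range(len(b) - 1))
```

By elementary algebra the symmetric sum telescopes to B1² / 2, while the forward and backward sums equal (B1² − qv)/2 and (B1² + qv)/2 respectively. Since qv stays close to 1 rather than tending to 0, the three sums do not approach a common limit.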
\[ \int_0^t H_s\, dX_s = \lim_{n \to \infty} \sum_{s \in \pi_n} H_s\, \delta X_s \]
Note that the definition is vague since the mode of convergence is not specified. More-
over, the Itô integral might depend on the sequence (πn ). In the following sections we
will see which kind of convergence holds in different circumstances, and in which sense
the limit is independent of (πn ).
To get started let us consider the convergence of Riemann sum approximations for the
Itô integrals $\int_0^t H_s\, dB_s$ of a bounded continuous (Fs ) adapted process (Hs )s≥0 w.r.t. an
(Fs ) Brownian motion (Bs ). Let (πn ) be a fixed sequence of partitions with πn ⊆ πn+1
and mesh(πn ) → 0. Then for the Riemann-Itô sums
\[ I_t^n = \sum_{s \in \pi_n,\ s<t} H_s\, \delta B_s = \sum_{s \in \pi_n,\ s<t} H_s\, (B_{s' \wedge t} - B_s) \]
we have
\[ I_t^n - I_t^m = \sum_{s \in \pi_n,\ s<t} (H_s - H_{\lfloor s \rfloor_m})\, \delta B_s \qquad \text{for any } m \leq n, \]
where
\[ V_m := \sup_{|s-r| \leq \mathrm{mesh}(\pi_m)} (H_s - H_r)^2 \longrightarrow 0 \qquad \text{as } m \to \infty \]
Theorem 5.1 (Itô integrals for bounded continuous integrands, Variant 1). Suppose
that (Hs )s≥0 is a bounded continuous (Fs ) adapted process, and (Bs )s≥0 is an (Fs )
Brownian motion. Then for any fixed t ≥ 0, the Itô integral
\[ \int_0^t H_s\, dB_s = \lim_{n \to \infty} I_t^n \tag{5.1.2} \]
exists as a limit in L2 (Ω, A, P ). Moreover, the limit does not depend on the choice of a
sequence of partitions (πn ) with mesh (πn ) → 0.
Proof. An analogous argument as above shows that for any partitions $\pi$ and $\tilde{\pi}$ such that
$\pi \supseteq \tilde{\pi}$, the L2 distance of the corresponding Riemann sum approximations $I_t^{\pi}$ and $I_t^{\tilde{\pi}}$ is
bounded by a constant $C(\mathrm{mesh}(\tilde{\pi}))$ that only depends on the maximal mesh size of the
two partitions. Moreover, the constant goes to 0 as the mesh sizes go to 0. By choosing
a joint refinement and applying the triangle inequality, we see that
The definition of the Itô integral suggested by Theorem 5.1 has two obvious drawbacks:
Drawback 1: The integral $\int_0^t H_s\, dB_s$ is only defined as an equivalence class in L2 (Ω, A, P ),
i.e., uniquely up to modification on P -measure zero sets. In particular, we do not have a
pathwise definition of $\int_0^t H_s(\omega)\, dB_s(\omega)$ for a given Brownian sample path $s \mapsto B_s(\omega)$.
Drawback 2: Even worse, the construction above works only for a fixed integration
interval [0, t]. The exceptional sets may depend on t and therefore, the process
$t \mapsto \int_0^t H_s\, dB_s$ does not have a meaning yet. In particular, we do not know yet if there
exists a version of this process that is almost surely continuous.
The first drawback is essential: In certain cases it is indeed possible to define stochastic
integrals pathwise, cf. Chapter 6 below. In general, however, pathwise stochastic
integrals cannot be defined. The extra input needed is the Lévy area process, cf. the rough
paths theory developed by T. Lyons and others [XXXLyons, Friz and Victoir, Friz and
Hairer].
Fortunately, the second drawback can be overcome easily. By extending the Itô isometry
to an isometry into the space Mc2 of continuous L2 bounded martingales, we can
construct the complete process $t \mapsto \int_0^t H_s\, dB_s$ simultaneously as a continuous martingale.
The key observation is that by the maximal inequality, continuous L2 bounded
martingales can be controlled uniformly in t by the L2 norm of their final value.
will not necessarily be (Ft ) adapted. To ensure adaptedness, we have to consider the
completed filtration
If the martingales are right-continuous then two modifications agree almost surely, i.e.,
\[ P[M_t = \tilde{M}_t \ \ \forall t \in [0, u]] = 1. \]
In order to obtain norms and not just semi-norms, we consider the spaces
\[ M^2([0, u]) := \mathcal{M}^2([0, u]) / \sim \qquad \text{and} \qquad M_c^2([0, u]) := \mathcal{M}_c^2([0, u]) / \sim \]
As the process (Mt2 ) is a submartingale for any M ∈ M 2 ([0, u]), the norm correspond-
ing to the inner product is given by
This crucial estimate shows that on the subspaces Mc2 and Md2 , the M 2 norm is equiva-
lent to the L2 norm of the supremum of the martingale. Therefore, the M 2 norm can be
used to control (right) continuous martingales uniformly in t!
Lemma 5.2 (Completeness). (1). The space M 2 ([0, u]) is a Hilbert space, and the
linear map M 7→ Mu from M 2 ([0, u]) to L2 (Ω, Fu , P ) is onto and isometric.
(2). The spaces Mc2 ([0, u]) and Md2 ([0, u]) are closed subspaces of M 2 ([0, u]), i.e.,
if (M n ) is a Cauchy sequence in Mc2 ([0, u]), or in Md2 ([0, u]) respectively, then
there exists a (right) continuous martingale M ∈ M 2 ([0, u]) such that
Proof. (1). By definition of the inner product on M 2 ([0, u]), the map M 7→ Mu is
an isometry. Moreover, for X ∈ L2 (Ω, Fu , P ), the process Mt = E[X | Ft ]
is in M 2 ([0, u]) with Mu = X. Hence, the range of the isometry is the whole
space L2 (Ω, Fu , P ). Since L2 (Ω, Fu , P ) is complete w.r.t. the L2 norm, the space
M 2 ([0, u]) is complete w.r.t. the M 2 norm.
(2). If (M n ) is a Cauchy sequence in Mc2 ([0, u]) or in Md2 ([0, u]) respectively, then by
(5.1.3),
Remark. We point out that the (right) continuous representative (Mt ) defined by (5.1.4)
is a martingale w.r.t. the complete filtration (FtP ), but it is not necessarily adapted w.r.t.
(Ft ).
are continuous L2 bounded martingales on [0, u]. We can therefore restate Theorem 5.1
Corollary 5.3 (Itô integrals for bounded continuous integrands, Variant 2). Suppose
that (Hs )s≥0 is a bounded continuous (Fs ) adapted process. Then for any fixed u ≥ 0,
the Itô integral
\[ \int_0^{\bullet} H_s\, dB_s = \lim_{n \to \infty} (I_t^n)_{t \in [0,u]} \tag{5.1.5} \]
exists as a limit in Mc2 ([0, u]). Moreover, the limit does not depend on the choice of a
sequence of partitions (πn ) with mesh (πn ) → 0.
In a first step, we define the integrals for predictable step functions (Ht ) of type
\[ H_t(\omega) = \sum_{i=0}^{n-1} A_i(\omega)\, I_{(t_i, t_{i+1}]}(t) \]
with n ∈ N, 0 ≤ t0 < t1 < t2 < . . . < tn , and bounded Fti -measurable random vari-
ables Ai , i = 0, 1, . . . , n − 1. Let E denote the vector space consisting of all stochastic
processes of this form.
Definition (Itô integral for predictable step functions). For stochastic processes H ∈
E and t ≥ 0 we define
\[ \int_0^t H_s\, dM_s := \sum_{i=0}^{n-1} A_i \cdot (M_{t_{i+1} \wedge t} - M_{t_i \wedge t}) = \sum_{i \,:\, t_i < t} A_i \cdot (M_{t_{i+1} \wedge t} - M_{t_i}). \]
Note that the map (H, M) 7→ H• M is bilinear. The process H• M is a continuous time
martingale transform of M w.r.t. H. It models for example the net gain up to time t
if we hold Ai units of an asset with price process (Mt ) during each of the time intervals
(ti , ti+1 ].
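The definition of the integral for predictable step functions is a finite sum and can be evaluated directly. The following sketch is a hypothetical illustration: it uses a deterministic price path as a stand-in for (Mt ) and constant holdings Ai , so it shows the bookkeeping of the net gain only, not the role of adaptedness.

```python
def step_integral(times, holdings, price, t):
    """Net gain up to time t: sum_i A_i * (M_{t_{i+1} ^ t} - M_{t_i ^ t}).

    times    : [t_0, ..., t_n], increasing
    holdings : [A_0, ..., A_{n-1}], units held during (t_i, t_{i+1}]
    price    : function s -> M_s (here a deterministic stand-in)
    """
    total = 0.0
    for i, a in enumerate(holdings):
        lo = min(times[i], t)
        hi = min(times[i + 1], t)
        total += a * (price(hi) - price(lo))
    return total


times = [0.0, 1.0, 2.0, 3.0]
holdings = [2.0, -1.0, 3.0]
price = lambda s: s * s  # hypothetical price path

gain_early = step_integral(times, holdings, price, 0.5)
gain_late = step_integral(times, holdings, price, 2.5)
```

Intervals that start after t contribute nothing because both truncated times coincide, which is the reason the two sums in the definition agree.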
Similarly to the discrete time case, the fact that Ai is predictable, i.e., Fti -measurable,
is essential for the martingale property:
(1). At first we note that (5.2.1) holds for s, t ∈ {t0 , t1 , . . . , tn }. Indeed, since Ai is
Fti -measurable, the process
\[ (H_\bullet M)_{t_j} = \sum_{i=0}^{j-1} A_i \cdot (M_{t_{i+1}} - M_{t_i}), \qquad j = 0, 1, \ldots, n, \]
is a martingale transform of the discrete time martingale (Mti ), and hence again
a martingale.
(2). Secondly, suppose s, t ∈ [tj , tj+1 ] for some j ∈ {0, 1, 2, . . . , n − 1}. Then
(3). Finally, suppose that s ∈ [tj , tj+1 ] and t ∈ [tk , tk+1] with j < k.
[Figure: time axis showing $t_j \leq s \leq t_{j+1} \leq t_k \leq t \leq t_{k+1}$]
Then by the tower property for conditional expectations and by (1) and (2),
\[ \sum_{i=0}^{n-1} H_{t_i} \cdot (M_{t_{i+1} \wedge t} - M_{t_i \wedge t}) = \int_0^t H_s^{\pi}\, dM_s \tag{5.2.2} \]
where $H^{\pi} := \sum_{i=0}^{n-1} H_{t_i} \cdot I_{(t_i, t_{i+1}]}$ is a process in E .
\[ J(H) = H_\bullet M = \int_0^{\bullet} H_s\, dM_s \]
It turns out that we can even identify explicitly a simple norm on E such that the Itô
map is an isometry. We first consider the case where (Mt ) is a Brownian motion:
Theorem 5.5 (Itô’s isometry for Brownian motion). If (Bt ) is an (Ft ) Brownian mo-
tion on (Ω, A, P ) then for any u ∈ [0, ∞], and for any process H ∈ E ,
\[ \| H_\bullet B \|^2_{M^2([0,u])} = E\Big[ \Big( \int_0^u H_s\, dB_s \Big)^2 \Big] = E\Big[ \int_0^u H_s^2\, ds \Big] = \| H \|^2_{L^2(P \otimes \lambda_{(0,u)})} \tag{5.2.3} \]
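Proof. Suppose that $H = \sum_{i=0}^{n-1} A_i \cdot I_{(t_i, t_{i+1}]}$ with n ∈ N, 0 ≤ t0 < t1 < . . . < tn and
Ai bounded and Fti -measurable. With the notation $\delta_i B := B_{t_{i+1} \wedge u} - B_{t_i \wedge u}$, we obtain
\[ E\Big[ \Big( \int_0^u H_s\, dB_s \Big)^2 \Big] = E\Big[ \Big( \sum_{i=0}^{n-1} A_i\, \delta_i B \Big)^2 \Big] = \sum_{i,k=0}^{n-1} E[A_i A_k \cdot \delta_i B\, \delta_k B]. \tag{5.2.4} \]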
By the martingale property, the summands on the right hand side vanish for i ≠ k.
Indeed, if, for instance, i < k then
on the product space Ω × (0, u). In particular, J respects P ⊗ λ classes, i.e., if $H_s(\omega) = \tilde{H}_s(\omega)$
for P ⊗ λ-almost every (ω, s) then $\int_0^{\bullet} H\, dB = \int_0^{\bullet} \tilde{H}\, dB$ P -almost surely. Hence
J also induces a linear map between the corresponding spaces of equivalence classes.
As usual, we do not always differentiate between equivalence classes and functions, and
so we denote the linear map on equivalence classes again by J :
For continuous square integrable martingales, the assumption is always satisfied. Indeed,
assuming continuity, the “angle bracket process” $\langle M \rangle_t$ coincides almost surely
with the quadratic variation process $[M]_t$ of M, cf. Section 6.3 below. For Brownian
motion, Assumption A holds with
\[ \langle B \rangle_t = t. \]
Since $t \mapsto \langle M \rangle_t(\omega)$ is continuous and non-decreasing for a given ω, it is the distribution
function of a unique positive measure $\langle M \rangle(\omega, dt)$ on R+ .
Theorem 5.6 (Itô’s isometry for martingales). Suppose that (Mt )t≥0 is a right continuous
(Ft ) martingale with angle bracket process $\langle M \rangle$ satisfying Assumption A. Then
for any u ∈ [0, ∞], and for any process H ∈ E ,
\[ \| H_\bullet M \|^2_{M^2([0,u])} = E\Big[ \Big( \int_0^u H_s\, dM_s \Big)^2 \Big] = E\Big[ \int_0^u H_s^2\, d\langle M \rangle_s \Big] \tag{5.2.7} \]
where $d\langle M \rangle$ denotes integration w.r.t. the positive measure with distribution function
$F(t) = \langle M \rangle_t$.
Proof. The proof is similar to the proof of Theorem 5.5 above. Suppose again that
$H = \sum_{i=0}^{n-1} A_i \cdot I_{(t_i, t_{i+1}]}$ with n ∈ N, 0 ≤ t0 < t1 < . . . < tn and Ai bounded and Fti -
measurable. With the same notation as in the proof above, we obtain by the martingale
properties of M and $M^2 - \langle M \rangle$,
\[ E[A_i^2 \cdot (\delta_i M)^2] = E[A_i^2\, E[(\delta_i M)^2 \mid \mathcal{F}_{t_i}]] = E[A_i^2\, E[\delta_i \langle M \rangle \mid \mathcal{F}_{t_i}]] = E[A_i^2 \cdot \delta_i \langle M \rangle]. \]
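Itô's isometry (5.2.3) for Brownian motion can be checked by Monte Carlo simulation for a concrete predictable step function. The sketch below is an illustration under stated assumptions: the partition and the choice Ai = cos(Bti ), which is bounded and Fti -measurable, are hypothetical, and the two sides agree only up to sampling error.

```python
import math
import random


def isometry_check(n_samples=20_000, seed=0):
    """Estimate E[(int_0^1 H dB)^2] and E[int_0^1 H^2 ds] for a step process H."""
    rng = random.Random(seed)
    times = [0.0, 0.25, 0.5, 1.0]
    lhs = 0.0
    rhs = 0.0
    for _ in range(n_samples):
        b = 0.0          # current value B_{t_i}
        integral = 0.0   # sum_i A_i * (B_{t_{i+1}} - B_{t_i})
        h_squared = 0.0  # sum_i A_i^2 * (t_{i+1} - t_i)
        for i in range(len(times) - 1):
            dt = times[i + 1] - times[i]
            a = math.cos(b)  # A_i depends on the path only up to time t_i
            db = rng.gauss(0.0, math.sqrt(dt))
            integral += a * db
            h_squared += a * a * dt
            b += db
        lhs += integral ** 2
        rhs += h_squared
    return lhs / n_samples, rhs / n_samples


lhs, rhs = isometry_check()
```

If Ai were allowed to depend on the increment over (ti , ti+1 ], for example Ai = cos(Bti+1 ), the mixed terms in (5.2.4) would no longer vanish and the two estimates would differ in general.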
For a continuous square integrable martingale, Theorem 5.6 implies that the linear map
\[ J : \mathcal{E} \to \mathcal{M}_c^2([0, u]), \qquad J(H) = \Big( \int_0^r H_s\, dM_s \Big)_{r \in [0,u]}, \]
Again, we denote the corresponding linear map induced on equivalence classes by the
same letter J .
Let Eu denote the closure of the space E in $L^2(\Omega \times (0, u), P_{\langle M \rangle})$. Since J is linear with
This can be used to define the Itô integral for any process in Eu , i.e., for any process that
can be approximated by predictable step functions w.r.t. the $L^2(P_{\langle M \rangle})$ norm:
\[ H_\bullet B := J(H), \qquad \int_0^t H_s\, dB_s := (H_\bullet B)_t. \]
Definition (Itô integral). For H ∈ Eu the process $H_\bullet M = (\int_0^t H_s\, dM_s)_{t \in [0,u]}$ is the up
to modifications unique continuous martingale on [0, u] satisfying
(2). The definition above is consistent in the following sense: If H• M is the stochastic
integral defined on the time interval [0, v] and u ≤ v, then the restriction of H• M
to [0, u] coincides with the stochastic integral on [0, u].
For 0 ≤ s ≤ t we define
\[ \int_s^t H_r\, dM_r := (H_\bullet M)_t - (H_\bullet M)_s. \]
Having defined the Itô integral, we now show that bounded adapted processes with
left-continuous sample paths are contained in the closure of the simple predictable pro-
cesses, and the corresponding stochastic integrals are limits of predictable Riemann sum
approximations. As above, we consider a sequence (πn ) of partitions of R+ such that
mesh(πn ) → 0.
Theorem 5.7 (Approximation by Riemann-Itô sums). Let u ∈ (0, ∞), and suppose
that (Ht )t∈[0,u) is an (FtP ) adapted stochastic process on (Ω, A, P ) such that $(t, \omega) \mapsto H_t(\omega)$
is product-measurable and bounded. If $t \mapsto H_t$ is P -almost surely left continuous
then H is in Eu , and
\[ \int_0^t H_s\, dM_s = \lim_{n \to \infty} \sum_{s \in \pi_n} H_s\, (M_{s' \wedge t} - M_{s \wedge t}), \qquad t \in [0, u], \tag{5.2.10} \]
Proof. For any t ∈ [0, u], the Riemann sums on the right hand side of (5.2.10) are the
stochastic integrals $\int_0^t H_s^n\, dM_s$ of the predictable step functions
\[ H_t^n := \sum_{s \in \pi_n,\ s<u} H_s \cdot I_{(s, s']}(t), \qquad n \in \mathbb{N}. \]
\[ H^n \to H \quad \text{in } L^2(P_{\langle M \rangle}). \]
Here we have used that the sequence (H n ) is uniformly bounded since H is bounded by
assumption. Now, by Itô’s isometry,
\[ \int_0^{\bullet} H_s\, dM_s = \lim_{n \to \infty} \int_0^{\bullet} H_s^n\, dM_s \quad \text{in } M_c^2([0, u]). \]
The corresponding space of equivalence classes of P ⊗λ versions is denoted by L2a (0, u).
Proof. It only remains to show that an L2 (P ⊗ λ) limit of (FtP ) adapted processes again
has an (FtP ) adapted P ⊗ λ-version. Hence consider a sequence H n ∈ L2a (0, u) with
H n → H in L2 (P ⊗ λ). Then there exists a subsequence $(H^{n_k})$ such that $H_t^{n_k}(\omega) \to H_t(\omega)$
for P ⊗ λ-almost every (ω, t) ∈ Ω × (0, u). The process $\tilde{H}$ defined by $\tilde{H}_t(\omega) := \lim H_t^{n_k}(\omega)$
if the limit exists, $\tilde{H}_t(\omega) := 0$ otherwise, is an (FtP ) adapted version of
H.
We can now identify the class of integrands H for which the stochastic integral H• B is
well-defined as a limit of integrals of predictable step functions in Mc2 ([0, u]):
Theorem 5.9 (Admissible integrands for Brownian motion). For any u ∈ (0, ∞],
Proof. Since E ⊆ L2a (0, u) it only remains to show the inclusion “⊇”. Hence fix a
process H ∈ L2a (0, u). We will prove in several steps that H can be approximated by
simple predictable processes w.r.t. the L2 (P ⊗ λ(0,u) ) norm:
(1). Suppose first that H is bounded and has almost surely continuous trajectories.
Then for u < ∞, H is in Eu by Theorem 5.7. For u = ∞, H is still in Eu
provided there exists t0 ∈ (0, ∞) such that Ht vanishes for t ≥ t0 .
for P -almost every ω. It is a standard fact from analysis that (5.2.12) implies
\[ \int_0^u |H_t^n(\omega) - H_t(\omega)|^2\, dt \longrightarrow 0 \qquad \text{as } n \to \infty. \]
(3). We finally prove that general H ∈ L2a (0, u) are contained in Eu . This is a conse-
quence of (2), because we can approximate H by the processes
w.r.t. the L2 (P ⊗ λ[0,u)) norm, cf. [XXXSteele: “Stochastic calculus and financial
applications”, Sect 6.6]. Therefore, the stochastic integral $\int_0^t H\, dB$ can be approximated
for t ≤ u by the correspondingly modified Riemann sums.
For continuous martingales, a similar statement as in Theorem 5.9 holds provided the
angle bracket process is absolutely continuous. Let $\mathcal{L}^2_a(0, u; M)$ denote the linear space
of all product-measurable, (FtP ) adapted stochastic processes $(\omega, t) \mapsto H_t(\omega)$ such that
\[ E\Big[ \int_0^u H_t^2\, d\langle M \rangle_t \Big] < \infty. \]
The corresponding space of equivalence classes w.r.t. $P_{\langle M \rangle}$ is denoted by $L^2_a(0, u; M)$.
5.3 Localization
Square-integrability of the integrand is still an assumption that we would like to avoid,
since it is not always easy to verify or may even fail to hold. The key to extending
Proof. We go through the same approximations as in the proof of Theorem 5.9 above:
(1). Suppose first that $H_t$ and $\tilde{H}_t$ are almost surely continuous and bounded, and there
exists t0 ∈ R+ such that $H_t = \tilde{H}_t = 0$ for t ≥ t0 . Let (πn ) be a sequence of
partitions with mesh(πn ) → 0. Then by Theorem 5.7,
\[ \int_0^t H\, dM = \lim_{n \to \infty} \sum_{s \in \pi_n,\ s<t} H_s \cdot (M_{s' \wedge t} - M_s), \quad \text{and} \]
\[ \int_0^t \tilde{H}\, d\tilde{M} = \lim_{n \to \infty} \sum_{s \in \pi_n,\ s<t} \tilde{H}_s \cdot (\tilde{M}_{s' \wedge t} - \tilde{M}_s) \]
(with ψn defined as in the proof of Theorem 5.9 and $H_t := \tilde{H}_t := 0$ for t < 0)
coincide for t ≤ T . Hence by (1), on {t ≤ T },
\[ \int_0^t H\, dM = \lim \int_0^t H^n\, dM = \lim \int_0^t \tilde{H}^n\, d\tilde{M} = \int_0^t \tilde{H}\, d\tilde{M}, \]
where the convergence holds again almost surely uniformly in t along a subsequence.
(3). Finally, in the general case the assertion follows by approximating H and $\tilde{H}$ by
the bounded processes
\[ H_t^n = ((H_t \wedge n) \vee (-n)) \cdot I_{[0,n]}(t), \qquad \tilde{H}_t^n = ((\tilde{H}_t \wedge n) \vee (-n)) \cdot I_{[0,n]}(t). \]
is product measurable in (t, ω), adapted w.r.t. the filtration (FtP ), and
Here for u ∈ (0, ∞], the space $L^2_{\mathrm{loc}}([0, u), d\langle M \rangle(\omega))$ consists of all measurable functions
f : [0, u) → [−∞, ∞] such that $\int_0^s f(t)^2\, d\langle M \rangle_t(\omega) < \infty$ for any s ∈ (0, u). In
particular, it contains all continuous functions.
From now on, we use the notation $H_t \cdot I_{\{t<T\}}$ for the trivial extension $(\tilde{H}_t)_{0 \leq t < \infty}$ of
a process $(H_t)_{0 \leq t < T}$ beyond the stopping time T . Locally square integrable adapted
processes allow for a localization by stopping times:
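are (FtP ) stopping times. Moreover, for almost every ω, the function $t \mapsto H_t(\omega)$ is
in $L^2_{\mathrm{loc}}([0, T), d\langle M \rangle(\omega))$. Hence the function $t \mapsto \int_0^t H_s(\omega)^2\, d\langle M \rangle_s$ is increasing and
finite on [0, T (ω)), and therefore $T_n(\omega) \nearrow T(\omega)$ as n → ∞. Since Tn is an (FtP )
stopping time, the process $H_t \cdot I_{\{t<T_n\}}$ is (FtP )-adapted, and by (5.3.3),
\[ E\Big[ \int_0^{\infty} (H_s \cdot I_{\{s<T_n\}})^2\, d\langle M \rangle_s \Big] = E\Big[ \int_0^{T_n} H_s^2\, d\langle M \rangle_s \Big] \leq n \qquad \text{for any } n. \]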
A sequence of stopping times as in the lemma will also be called a localizing sequence.
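A localizing sequence can be computed explicitly in a simple deterministic case. The sketch below rests on assumptions that are not taken from the text: it takes Tn to be the first time at which $\int_0^t H_s^2\, d\langle M \rangle_s$ reaches the level n, with $\langle M \rangle_t = t$ and the hypothetical integrand Ht = 1/(1 − t) on [0, 1), which is not square integrable up to T = 1. In this case the integral equals t/(1 − t), so Tn = n/(n + 1).

```python
def first_time_integral_reaches(h, level, t_max, n_grid=200_000):
    """Approximate inf{t : int_0^t h(s)^2 ds >= level} by a left Riemann sum."""
    dt = t_max / n_grid
    accumulated = 0.0
    for k in range(n_grid):
        s = k * dt
        accumulated += h(s) ** 2 * dt
        if accumulated >= level:
            return s + dt
    return t_max  # level not reached before t_max


h = lambda s: 1.0 / (1.0 - s)  # hypothetical integrand, blows up at T = 1
stopping_levels = (1, 3, 9)
localizing = [first_time_integral_reaches(h, n, 1.0) for n in stopping_levels]
```

The computed times increase towards T = 1, and on each interval [0, Tn ] the integral of H² is bounded by n, which is the property used in the estimate above.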
We can now extend the definition of the Itô integral to locally square-integrable adapted
integrands:
Definition (Itô integral with locally square integrable integrand). For a process H ∈
L2a,loc (0, T ; M), the Itô stochastic integral w.r.t. the martingale M is defined for t ∈
[0, T ) by
\[ \int_0^t H_s\, dM_s := \int_0^t H_s \cdot I_{\{s<\hat{T}\}}\, dM_s \qquad \text{for any } t \in [0, \hat{T}] \tag{5.3.4} \]
whenever $\hat{T}$ is an (FtP ) stopping time such that $H_t \cdot I_{\{t<\hat{T}\}} \in L^2_a(0, \infty; M)$.
Theorem 5.12. For H ∈ L2a,loc (0, T ; M) the Itô integral $t \mapsto \int_0^t H_s\, dM_s$ is almost surely
well defined by (5.3.4) as a continuous process on [0, T ).
Proof. We have to verify that the definition does not depend on the choice of the localizing
stopping times. This is a direct consequence of Corollary 5.10: Suppose that $\hat{T}$
and $\tilde{T}$ are stopping times such that $H_t \cdot I_{\{t<\hat{T}\}}$ and $H_t \cdot I_{\{t<\tilde{T}\}}$ are both in $L^2_a(0, \infty; M)$.
Since the two trivially extended processes agree on $[0, \hat{T} \wedge \tilde{T})$, Corollary 5.10 implies
that almost surely,
\[ \int_0^t H_s \cdot I_{\{s<\hat{T}\}}\, dM_s = \int_0^t H_s \cdot I_{\{s<\tilde{T}\}}\, dM_s \qquad \text{for any } t \in [0, \hat{T} \wedge \tilde{T}). \]
Example (Hitting time of a closed set). The hitting time TA of a closed set A by a
continuous adapted process is predictable, as it can be approximated from below by the
hitting times TAk of the neighbourhoods Ak = {x : dist(x, A) < 1/k}. On the other
hand, the hitting time of an open set is not predictable in general.
On the other hand, note that if (Mt ) is a continuous local martingale up to T = ∞, and
the family {Mt∧Tk : k ∈ N} is uniformly integrable for each fixed t ≥ 0, then (Mt ) is a
martingale, because for 0 ≤ s ≤ t
with convergence in L1 .
Proof. We can choose an increasing sequence (Tk ) of stopping times such that Tk < T
on {T > 0} and Ht · I{t<Tk } ∈ L2a (0, ∞; M) for any k. Then, by definition of the Itô
integral,
\[ \int_0^{t \wedge T_k} H_s\, dM_s = \int_0^{t \wedge T_k} H_s \cdot I_{\{s<T_k\}}\, dM_s \qquad \text{almost surely for any } k \in \mathbb{N}, \]
The theorem shows that for a predictable (FtP ) stopping time T , the Itô map
$H \mapsto \int_0^{\bullet} H\, dM$ extends to a linear map
where $L^2_{\mathrm{loc}}(0, T; M)$ is the space of equivalence classes of processes in $\mathcal{L}^2_{\mathrm{loc}}(0, T; M)$
that coincide for $P_{\langle M \rangle}$-a.e. (ω, t), and Mc,loc ([0, T )) denotes the space of equivalence
classes of continuous local (FtP ) martingales up to time T w.r.t. P -almost sure coin-
cidence. Note that different notions of equivalence are used for the integrands and the
integrals.
We finally observe that continuous local martingales (and hence stochastic integrals
w.r.t. continuous martingales) can always be localized by a sequence of bounded martingales
in $M_c^2([0, \infty))$:
If the integrand (Ht ) of a stochastic integral $\int H\, dB$ has continuous sample paths then
local square integrability always holds, and the stochastic integral is a limit of Riemann-
Itô sums: Let (πn ) be a sequence of partitions of R+ with mesh(πn ) → 0.
Theorem 5.14. Suppose that T is a predictable stopping time, and (Ht )0≤t<T is a
stochastic process defined for t < T . If the sample paths t 7→ Ht (ω) are continuous on
[0, T (ω)) for any ω, and the trivially extended process Ht · I{t<T } is (FtP ) adapted, then
H is in L2a,loc (0, T ; M), and for any t ≥ 0,
\[ \int_0^t H_s\, dM_s = \lim_{n \to \infty} \sum_{s \in \pi_n,\ s<t} H_s \cdot (M_{s' \wedge t} - M_s) \qquad \text{on } \{t < T\} \tag{5.3.5} \]
Proof. Let ⌊t⌋n = max{s ∈ πn : s ≤ t} denote the next partition point below t. By
continuity,
\[ H_{\lfloor t \rfloor_n} \cdot I_{\{t<T\}} = \sum_{s \in \pi_n} H_s \cdot I_{\{s<T\}} \cdot I_{[s, s')}(t) \cdot I_{(0,\infty)}(T - t). \]
By continuity, t 7→ Ht (ω) is locally bounded for every ω, and thus H is in L2a,loc (0, T ; M).
Moreover, suppose that (Tk ) is a sequence of stopping times approaching T from below
in the sense of the definition of a predictable stopping time given above. Then
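is a localizing sequence of stopping times with $H_t \cdot I_{\{t<T_k\}}$ in L2a (0, T ; M) for any k,
and $\tilde{T}_k \nearrow T$. Therefore, by definition of the Itô integral and by Theorem 5.7,
\[ \int_0^t H_s\, dM_s = \int_0^t H_s \cdot I_{\{s<\tilde{T}_k\}}\, dM_s = \int_0^t H_s \cdot I_{\{s \leq \tilde{T}_k\}}\, dM_s \]
\[ = \lim_{n \to \infty} \sum_{s \in \pi_n,\ s<t} H_s \cdot (M_{s' \wedge t} - M_s) \qquad \text{on } \{t \leq \tilde{T}_k\} \]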
we obtain (5.3.5).
Our approach to Itô’s formula in this chapter follows that of [Föllmer: Stochastic Anal-
ysis, Vorlesungsskript Uni Bonn WS91/92]. We start with a heuristic derivation of the
formula that will be the central topic of this chapter.
Suppose that s 7→ Xs is a function from [0, t] to R, and F is a smooth function on R. If
(πn ) is a sequence of partitions of the interval [0, t] with mesh(πn ) → 0 then by Taylor’s
theorem
\[ F(X_{s'}) - F(X_s) = F'(X_s) \cdot (X_{s'} - X_s) + \frac{1}{2} F''(X_s) \cdot (X_{s'} - X_s)^2 + \text{higher order terms.} \]
Summing over s ∈ πn we obtain
\[ F(X_t) - F(X_0) = \sum_{s \in \pi_n} \Big( F'(X_s) \cdot (X_{s'} - X_s) + \frac{1}{2} F''(X_s) \cdot (X_{s'} - X_s)^2 + \ldots \Big) \tag{6.0.1} \]
Therefore, the second order terms can be neglected in the limit of (6.0.1) as mesh(πn ) →
0. Similarly, the higher order terms can be neglected, and we obtain the limit equation
\[ F(X_t) - F(X_0) = \int_0^t F'(X_s)\, dX_s, \tag{6.0.2} \]
Of course, (6.0.3) is just the chain rule of classical analysis, and (6.0.2) is the equivalent
chain rule for Stieltjes integrals, cf. Section 6.1 below.
E[(Xs′ − Xs )2 ] = s′ − s.
with probability one. The equation (6.0.4) is the basic version of Itô’s celebrated for-
mula.
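The role of the second order terms in the heuristic derivation can be observed numerically. The sketch below is an illustration, assuming the standard form of Itô's formula for Brownian motion, $F(B_t) - F(B_0) = \int_0^t F'(B_s)\, dB_s + \frac{1}{2} \int_0^t F''(B_s)\, ds$, and the hypothetical choice F(x) = x³ on an equidistant partition of [0, 1]; the variable names are not from the text.

```python
import math
import random

rng = random.Random(2)
n = 10_000
dt = 1.0 / n

b = 0.0
first_order = 0.0   # sum of F'(B_s) * dB_s
second_order = 0.0  # sum of (1/2) F''(B_s) * (dB_s)^2
drift = 0.0         # sum of (1/2) F''(B_s) * ds
for _ in range(n):
    db = rng.gauss(0.0, math.sqrt(dt))
    first_order += 3.0 * b * b * db
    second_order += 0.5 * 6.0 * b * db * db
    drift += 0.5 * 6.0 * b * dt
    b += db

increment = b ** 3  # F(B_1) - F(B_0) with B_0 = 0
```

For F(x) = x³ the Taylor expansion is exact after the third order term, so the gap between increment and first_order + second_order is the sum of the cubed increments, which is negligible. The second order sum itself is not small: it stays close to the drift term, because the squared increments behave like ds on average.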
In Section 6.1, we will first introduce Stieltjes integrals and the chain rule from Stieltjes
calculus systematically. In Section 6.2 we prove a general version of Itô’s formula for
continuous functions with finite quadratic variation in dimension one. Here the setup
and the proof are still purely deterministic. As an aside we obtain a pathwise definition
for stochastic integrals involving only a single one-dimensional process due to Föllmer.
After computing the quadratic variation of Brownian motion in Section 6.3, we consider
first consequences of Itô’s formula for Brownian motions and continuous martingales.
Section 6.4 contains extensions to the multivariate and time-dependent case, as well as
further applications.
Lebesgue-Stieltjes integrals
Fix u ∈ (0, ∞], and suppose that t 7→ At is a right-continuous and non-decreasing
function on [0, u). Then At − A0 is the distribution function of the positive measure µ
on (0, u) determined uniquely by
Therefore, we can define integrals of type $\int_s^t H_r\, dA_r$ as Lebesgue integrals w.r.t. the
measure µA . We extend µ trivially to the interval [0, u), so L1loc ([0, u), µA) is the space
of all functions H : [0, u) → R that are integrable w.r.t. µA on any interval (0, t) with
t < u. Then for any u ∈ [0, ∞] and any function H ∈ L1loc ([0, u), µA), the Lebesgue-
Stieltjes integral of H w.r.t. A is defined by
\[ \int_s^t H_r\, dA_r := \int H_r \cdot I_{(s,t]}(r)\, \mu_A(dr) \qquad \text{for } 0 \leq s \leq t < u. \]
It is easy to verify that the definition is consistent, i.e., varying u does not change the
definition of the integrals, and that $t \mapsto \int_0^t H_r\, dA_r$ is again a right-continuous function.
\[ V_t^{(1)}(A) := \sup_{\pi} \sum_{s \in \pi} |A_{s' \wedge t} - A_{s \wedge t}| \qquad \text{for } 0 \leq t < u, \]
Any right-continuous function of finite variation can be written as the difference of two
non-decreasing right-continuous functions. In fact, we have
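\[ A_t = A_t^{\nearrow} - A_t^{\searrow} \tag{6.1.1} \]
with
\[ A_t^{\nearrow} = \sup_{\pi} \sum_{s \in \pi} (A_{s' \wedge t} - A_{s \wedge t})^+ = \frac{1}{2} (V_t^{(1)}(A) + A_t), \tag{6.1.2} \]
\[ A_t^{\searrow} = \sup_{\pi} \sum_{s \in \pi} (A_{s' \wedge t} - A_{s \wedge t})^- = \frac{1}{2} (V_t^{(1)}(A) - A_t). \tag{6.1.3} \]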
t < u.
\[ \mu_A[B] := \mu_A^+[B] - \mu_A^-[B], \qquad B \in \mathcal{B}(0, u), \tag{6.1.4} \]
and $V_t^{(1)}$ is the distribution function of the measure $|\mu_A| = \mu_A^+ + \mu_A^-$. It is a consequence of (6.1.5)
and (6.1.6) that the measures $\mu_A^+$ and $\mu_A^-$ are singular, i.e., the mass is concentrated on
disjoint sets S + and S − . The decomposition (6.1.7) is hence a particular case of the
Hahn-Jordan decomposition of a signed measure µ of finite variation into a positive and
a negative part, and the measure |µ| is the total variation measure of µ, cf. e.g. [Alt:
Lineare Funktionalanalysis].
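The decomposition of a function of finite variation into an increasing and a decreasing part can be computed directly for a sampled path. The sketch below is an illustration under an explicit assumption: the path is monotone between the sample points, so that the suprema over partitions are attained on the sampling grid. The function name and the sample values are hypothetical, and A0 = 0 as in (6.1.2) and (6.1.3).

```python
def variation_parts(values):
    """Return (A_up, A_down, V) for a path sampled at successive times.

    Assumption: the path is monotone between sample points, so the sums
    over the sampling grid realize the suprema over all partitions.
    """
    pairs = list(zip(values, values[1:]))
    up = sum(max(b - a, 0.0) for a, b in pairs)    # positive parts of increments
    down = sum(max(a - b, 0.0) for a, b in pairs)  # negative parts of increments
    return up, down, up + down


values = [0.0, 2.0, 1.0, 4.0, 3.0]  # hypothetical sampled path with A_0 = 0
up, down, total = variation_parts(values)
```

Here the increasing part collects the upward moves, the decreasing part the downward moves, their difference is the final value and their sum is the total variation.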
We can now apply (6.1.1) to define Lebesgue-Stieltjes integrals w.r.t. functions of finite
variation. A function is integrable w.r.t. a signed measure µ if and only if it is integrable
w.r.t. both the positive part µ+ and the negative part µ− . The Lebesgue integral w.r.t. µ
is then defined as the difference of the Lebesgue integrals w.r.t. µ+ and µ− . Correspond-
ingly, we define the Lebesgue-Stieltjes integral w.r.t. a function At of finite variation as
the Lebesgue integral w.r.t. the associated signed measure µA :
\[ \int_s^t H_r\, dA_r := \int H_r \cdot I_{(s,t]}(r)\, dA_r^{\nearrow} - \int H_r \cdot I_{(s,t]}(r)\, dA_r^{\searrow}, \qquad 0 \leq s \leq t < u, \]
\[ L^1_{\mathrm{loc}}((0, u), |dA|) := L^1_{\mathrm{loc}}((0, u), dA^{\nearrow}) \cap L^1_{\mathrm{loc}}((0, u), dA^{\searrow}) \]
is the intersection of the local L1 spaces w.r.t. the positive measures $dA^{\nearrow} = \mu_A^+$ and
$dA^{\searrow} = \mu_A^-$ on [0, u), or, equivalently, the local L1 space w.r.t. the total variation measure
|dA| = |µA |.
Remark. (1). Simple integrands: If $H_t = \sum_{i=0}^{n-1} c_i \cdot I_{(t_i, t_{i+1}]}$ is a step function with
0 ≤ t0 < t1 < . . . < tn < u and c0 , c1 , . . . , cn−1 ∈ R then
\[ \int_0^t H_s\, dA_s = \sum_{i=0}^{n-1} c_i \cdot (A_{t_{i+1} \wedge t} - A_{t_i \wedge t}). \]
\[ \int_0^t H_s\, dA_s = \lim_{n \to \infty} \sum_{s \in \pi_n,\ s<t} H_s \cdot (A_{s' \wedge t} - A_s), \qquad t \in [0, u), \]
for any sequence (πn ) of partitions of R+ such that mesh(πn ) → 0. For the proof
note that the step functions
\[ H_r^n = \sum_{s \in \pi_n,\ s<t} H_s \cdot I_{(s, s']}(r), \qquad r \in [0, u), \]
\[ V_t^{(1)}(A) = \int_0^t |A'_s|\, ds < \infty \qquad \text{for } t \in [0, u). \]
In the applications that we are interested in, the integrand will mostly be continuous,
and the integrator absolutely continuous. Hence Remarks (2) and (3) above apply.
\[ F(A_t) - F(A_0) = \int_0^t F'(A_s)\, dA_s \qquad \forall t \in [0, u). \tag{6.1.5} \]
where, as usual, s′ denotes the next partition point. By Taylor’s formula, for s ∈ πn
with s < t we have
\[ F(A_{s' \wedge t}) - F(A_s) = F'(A_s)\, \delta A_s + \frac{1}{2} F''(Z_s) \cdot (\delta A_s)^2, \]
As n → ∞, the first (Riemann) sum converges to the Stieltjes integral $\int_0^t F'(A_s)\, dA_s$ by
continuity of F ′ (As ), cf. Remark (2) above.
To see that the second sum converges to zero, note that the range of the continuous
function A restricted to [0, t] is a bounded interval. Since F ′′ is continuous by assump-
tion, F ′′ is bounded on this range by a finite constant c. As Zs is an intermediate value
between As and As′ ∧t , we obtain
\[ \Big| \sum_{s \in \pi_n,\ s<t} F''(Z_s) (\delta A_s)^2 \Big| \leq c \cdot \sum_{s \in \pi_n,\ s<t} (\delta A_s)^2 \leq c \cdot V_t^{(1)}(A) \cdot \sup_{s \in \pi_n,\ s<t} |\delta A_s|. \]
Since $V_t^{(1)}(A) < \infty$, and A is a uniformly continuous function on [0, t], the right hand
side converges to 0 as n → ∞. Hence we obtain (6.1.5) in the limit of (6.1.6) as
n → ∞.
To see that (6.1.5) can be interpreted as a chain rule, we write the equation in differential
form:
dF (A) = F ′ (A)dA. (6.1.7)
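The chain rule (6.1.5) can be checked numerically with left Riemann sums, which converge to the Stieltjes integral for continuous integrands. The sketch below is an illustration with hypothetical choices: the integrator As = s² − s, which is continuously differentiable and hence of finite variation, and F = exp, so that F ′ = exp as well.

```python
import math


def stieltjes_riemann_sum(f_prime, a, t, n):
    """Left Riemann sum sum_k F'(A_{s_k}) * (A_{s_{k+1}} - A_{s_k}) on [0, t]."""
    total = 0.0
    for k in range(n):
        s, s_next = k * t / n, (k + 1) * t / n
        total += f_prime(a(s)) * (a(s_next) - a(s))
    return total


a = lambda s: s * s - s  # hypothetical integrator of finite variation
t = 0.75
approx = stieltjes_riemann_sum(math.exp, a, t, 10_000)
exact = math.exp(a(t)) - math.exp(a(0.0))  # F(A_t) - F(A_0)

# Sum of squared increments of A over the same partition.
squared = sum((a((k + 1) * t / 10_000) - a(k * t / 10_000)) ** 2
              for k in range(10_000))
```

The sum of squared increments is tiny, which is the reason no second order correction appears in the chain rule for functions of finite variation.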
Theorem 6.2 (Stieltjes integrals w.r.t. Stieltjes integrals). Suppose that $I_s = \int_0^s H_r\, dA_r$
where A : [0, u) → R is a function of locally finite variation, and H ∈ L1loc ([0, u), |dA|).
Then the function $s \mapsto I_s$ is again right continuous with locally finite variation
$V_t^{(1)}(I) \leq \int_0^t |H|\, |dA| < \infty$, and, for any function G ∈ L1loc ([0, u), |dI|),
\[ \int_0^t G_s\, dI_s = \int_0^t G_s H_s\, dA_s \qquad \text{for } t \in [0, u). \tag{6.1.8} \]
Proof. Right continuity of It and the upper bound for the variation are left as an exercise.
\[ \sum_{i=0}^{n-1} G_{t_i} (I_{t_{i+1}} - I_{t_i}) = \sum_{i=0}^{n-1} G_{t_i} \cdot \int_{t_i}^{t_{i+1}} H_s\, dA_s = \int_0^t G_{\lfloor s \rfloor} H_s\, dA_s \]
where ⌊s⌋ denotes the largest partition point ≤ s. Choosing a sequence (πn ) of partitions
with mesh(πn ) → 0, the integral on the right hand side converges to the Lebesgue-
Stieltjes integral $\int_0^t G_s H_s\, dA_s$ by continuity of G and the dominated convergence theorem,
whereas the Riemann sum on the left hand side converges to $\int_0^t G_s\, dI_s$. Hence
(6.1.8) holds for continuous G. The equation for general G ∈ L1loc ([0, u), |dI|) follows
then by standard arguments.
Quadratic variation
Consider once more the approximation (6.1.6) that we have used to prove the fundamental
theorem of Stieltjes calculus. We would like to identify the limit of the last sum
$\sum_{s \in \pi_n} F''(Z_s) (\delta A_s)^2$ when A has infinite variation on finite intervals. For F ′′ = 1 this
limit is called the quadratic variation of A if it exists:
(1). The quadratic variation should not be confused with the classical 2-variation de-
fined by
\[ V_t^{(2)}(X) := \sup_{\pi} \sum_{s \in \pi} |X_{s' \wedge t} - X_{s \wedge t}|^2 \]
where the supremum is over all partitions π. The classical 2-variation $V_t^{(2)}(X)$
is strictly positive for any function X that is not constant on [0, t] whereas [X]t
vanishes in many cases, cf. Example (1) below.
(2). In general, the quadratic variation may depend on the sequence of partitions con-
sidered. See however Examples (1) and (3) below.
Example. (1). Functions of finite variation: For any continuous function A : [0, u) →
R of locally finite variation, the quadratic variation along (πn ) vanishes:
by uniform continuity and since $V_t^{(1)}(A) < \infty$.
[X + A]t = [X]t .
and the last two sums converge to 0 as mesh(πn ) → 0 by Example (1) and the
Cauchy-Schwarz inequality.
w.r.t. any fixed sequence (πn ) of partitions such that mesh(πn ) → 0, cf. Theorem
6.6 below.
(4). Itô processes: If $I_t = \int_0^t H_s\, dB_s$ is the stochastic integral of a process H ∈
L2a,loc (0, ∞) w.r.t. Brownian motion then almost surely, the quadratic variation
w.r.t. a fixed sequence of partitions is
\[ [I]_t = \int_0^t H_s^2\, ds \qquad \text{for all } t \geq 0. \]
(5). Continuous martingales: [M] exists and is almost surely finite, see below.
Note that in Examples (3), (4) and (5), the exceptional sets may depend on the sequence
(πn ). If it exists, the quadratic variation [X]t is a non-decreasing function in t. In
particular, Stieltjes integrals w.r.t. [X] are well-defined provided [X] is right continuous.
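As a small numerical illustration of Example (1), the following sketch evaluates the sums of squared increments of a smooth function along uniform partitions of [0, 1]. The choice A(t) = sin(t) and the function name are illustrative assumptions, not part of the notes; the sums should shrink roughly like the mesh size.

```python
import math

def squared_increment_sum(path, t, n_intervals):
    """Sum of squared increments of `path` along the uniform partition of [0, t]."""
    h = t / n_intervals
    total = 0.0
    for k in range(n_intervals):
        increment = path((k + 1) * h) - path(k * h)
        total += increment ** 2
    return total

# A(t) = sin(t) has finite variation on [0, 1], so the sums should tend to 0.
sums = [squared_increment_sum(math.sin, 1.0, 2 ** m) for m in (4, 8, 12)]
```

Refining the partition by a factor of 16 reduces the sum by about the same factor, in line with the bound by the supremum of the increments times the total variation.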
$$\sum_{\substack{s\in\pi_n\\ s<t}} H_s \cdot (X_{s'\wedge t} - X_s)^2 \longrightarrow \int_0^t H_s\,d[X]_s \quad \text{as } n \to \infty \tag{6.2.1}$$
i.e., the infinitesimal increments of the quadratic variation are something like squared
infinitesimal increments of X. This observation is crucial for controlling the second
order terms in the Taylor expansion used for proving the Itô-Doeblin formula.
Proof. The sum on the left-hand side of (6.2.1) is the integral of H w.r.t. the finite
positive measure
$$\mu_n := \sum_{\substack{s\in\pi_n\\ s<t}} (X_{s'\wedge t} - X_s)^2 \cdot \delta_s$$
holds. In particular, if the quadratic variation [X] does not depend on (πn ) then the Itô
integral (6.2.2) does not depend on (πn ) either.
Note that the theorem implies the existence of $\int_0^t f(X_s)\,dX_s$ for any function
$f \in C^1(\mathbb{R})$! Hence this type of Itô integrals can be defined in a purely deterministic way
without relying on the Itô isometry. Unfortunately, the situation is more complicated in
higher dimensions, cf. ?? below.
Proof. Fix t ∈ [0, u) and n ∈ N. As before, for s ∈ πn we set δXs = Xs′ ∧t − Xs∧t
where s′ denotes the next partition point. Then as above we have
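$$F(X_t) - F(X_0) = \sum_{\substack{s\in\pi_n\\ s<t}} F'(X_s)\,\delta X_s + \frac{1}{2} \sum_{\substack{s\in\pi_n\\ s<t}} F''(Z_s^{(n)})(\delta X_s)^2 \tag{6.2.4}$$
$$= \sum_{\substack{s\in\pi_n\\ s<t}} F'(X_s)\,\delta X_s + \frac{1}{2} \sum_{\substack{s\in\pi_n\\ s<t}} F''(X_s)(\delta X_s)^2 + \sum_{\substack{s\in\pi_n\\ s<t}} R_s^{(n)}, \tag{6.2.5}$$
$$|R_s^{(n)}| = \frac{1}{2}\,|F''(Z_s^{(n)}) - F''(X_s)| \cdot (\delta X_s)^2 \le \varepsilon_n (\delta X_s)^2,$$
where
$$\varepsilon_n := \sup_{\substack{a,b\in[0,t]\\ |a-b|\le \mathrm{mesh}(\pi_n)}} |F''(X_a) - F''(X_b)|.$$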
because the sum of the squared increments converges to the finite quadratic variation
[X]t .
We have shown that all the terms on the right hand side of (6.2.4) except the first
Riemann-Itô sum converge as n → ∞. Hence, by (6.2.4), the limit $\int_0^t F'(X_s)\,dX_s$
of the Riemann-Itô sums also exists, and the limit equation (6.2.2) holds.
(2). For functions X with [X] = 0 we recover the classical chain rule dF (X) =
F ′ (X) dX from Stieltjes calculus as a particular case of Itô’s formula.
dZ = Z dX (6.2.6)
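$$\int_0^t F'(X_s)\,\hat{d}X_s := \lim_{n\to\infty} \sum_{\substack{s\in\pi_n\\ s<t}} F'(X_{s'\wedge t}) \cdot (X_{s'\wedge t} - X_s),$$
exist, and
$$F(X_t) - F(X_0) = \int_0^t F'(X_s)\,\hat{d}X_s - \frac{1}{2}\int_0^t F''(X_s)\,d[X]_s \tag{6.2.7}$$
$$= \int_0^t F'(X_s) \circ dX_s. \tag{6.2.8}$$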
Proof. The proof of the backward Itô formula (6.2.7) is completely analogous to that of
Itô’s formula. The Stratonovich formula (6.2.8) follows by averaging the Riemann sum
approximations to the forward and backward Itô rule.
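The averaging argument can be checked on a discrete path. The sketch below is an illustration with hypothetical names and an arbitrary path; it uses the integrand F′(x) = x, i.e. F(x) = x²/2. For this F the forward and backward Riemann sums differ from F(X_t) − F(X_0) by minus and plus half the sum of squared increments, and their average telescopes exactly.

```python
def riemann_sums(x):
    """Forward (Ito), backward and Stratonovich sums of the integrand F'(x) = x,
    together with the sum of squared increments, along a discrete path x."""
    steps = range(len(x) - 1)
    forward = sum(x[k] * (x[k + 1] - x[k]) for k in steps)
    backward = sum(x[k + 1] * (x[k + 1] - x[k]) for k in steps)
    stratonovich = 0.5 * (forward + backward)
    squared = sum((x[k + 1] - x[k]) ** 2 for k in steps)
    return forward, backward, stratonovich, squared

path = [0.0, 0.5, -0.25, 1.0, 0.75]   # arbitrary illustrative path values
fwd, bwd, strat, v = riemann_sums(path)
chain_rule = 0.5 * (path[-1] ** 2 - path[0] ** 2)   # F(X_t) - F(X_0) for F(x) = x^2 / 2
```

Here `strat` equals `chain_rule`, while `fwd` and `bwd` equal `chain_rule - v / 2` and `chain_rule + v / 2`, which are the discrete counterparts of the forward and backward Itô formulas for this F.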
This makes them very attractive for various applications. For example, in stochastic dif-
ferential geometry, the chain rule is of fundamental importance to construct stochastic
processes that stay on a given manifold. Therefore, it is common to use Stratonovich
instead of Itô calculus in this context, cf. the corresponding example in the next sec-
tion. On the other hand, Stratonovich calculus has a significant disadvantage against Itô
calculus: The Stratonovich integrals
$$\int_0^t H_s \circ dB_s = \lim_{n\to\infty} \sum \frac{1}{2}(H_s + H_{s'\wedge t})(B_{s'\wedge t} - B_s)$$
w.r.t. Brownian motion typically are not martingales, because the coefficients
$\frac{1}{2}(H_s + H_{s'\wedge t})$ are not predictable.
with
$$V_t^n = \sum_{\substack{s\in\pi_n\\ s<t}} (X_{s'\wedge t} - X_s)^2 \quad \text{and} \quad I_t^n = \sum_{\substack{s\in\pi_n\\ s<t}} X_s \cdot (X_{s'\wedge t} - X_s)$$
holds. The equation (6.3.1) is a discrete approximation of Itô’s formula for the function
F (x) = x2 . The remainder terms in the approximation vanish in this particular case.
Note that by (6.3.1), the quadratic variation $[X]_t = \lim_{n\to\infty} V_t^n$ exists if and only if the
Riemann sum approximations $I_t^n$ to the Itô integral $\int_0^t X_s\,dX_s$ converge:
$$\exists\ [X]_t = \lim_{n\to\infty} V_t^n \iff \exists \int_0^t X_s\,dX_s = \lim_{n\to\infty} I_t^n.$$
Now suppose that (Xt ) is a continuous martingale with E[Xt2 ] < ∞ for any t ≥ 0.
Then the Riemann sum approximations (Itn ) are continuous martingales for any n ∈ N.
Therefore, by the maximal inequality, for a given u > 0, the processes (Itn ) and (Vtn )
converge uniformly for t ∈ [0, u] in L2 (P ) if and only if the random variables Iun or Vun
respectively converge in L2 (P ).
For the sample paths of a Brownian motion B, the quadratic variation [B] exists almost
surely along any fixed sequence of partitions (πn ) with mesh(πn ) → 0, and [B]t = t
a.s. In particular, [B] is a deterministic function that does not depend on (πn ). The
reason is a law of large numbers type effect when taking the limit of the sum of squared
increments as n → ∞.
for any u ∈ (0, ∞), and for each sequence (πn ) of partitions of R+ with mesh(πn ) → 0.
Warning. (1). Although the almost sure limit in (6.3.2) does not depend on the se-
quence (πn ), the exceptional set may depend on the chosen sequence!
(2). The classical quadratic variation $V_t^{(2)}(B) = \sup_\pi \sum_{s\in\pi} (\delta B_s)^2$ is almost surely
infinite for any t > 0. The classical p-variation is almost surely finite if and only
if p > 2.
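Proof. (1). L²-convergence for fixed t: As usual, the proof of L² convergence is
comparatively simple. For $V_t^n = \sum_{s\in\pi_n} (\delta B_s)^2$ with $\delta B_s = B_{s'\wedge t} - B_{s\wedge t}$, we have
$$E[V_t^n] = \sum_{s\in\pi_n} E[(\delta B_s)^2] = \sum_{s\in\pi_n} \delta s = t, \quad \text{and}$$
$$\mathrm{Var}[V_t^n] = \sum_{s\in\pi_n} \mathrm{Var}[(\delta B_s)^2] = \sum_{s\in\pi_n} E[((\delta B_s)^2 - \delta s)^2]
= E[(Z^2 - 1)^2] \cdot \sum_{s\in\pi_n} (\delta s)^2 \le \text{const.} \cdot t \cdot \mathrm{mesh}(\pi_n),$$
where Z denotes a standard normal random variable.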
Moreover, by (6.3.1), Vtn − Vtm = Itn − Itm is a continuous martingale for any
n, m ∈ N. Therefore, the maximal inequality yields uniform convergence of Vtn
to t for t in a finite interval in the L2 (P ) sense.
(2). Almost sure convergence if $\sum \mathrm{mesh}(\pi_n) < \infty$: Similarly, by applying the
maximal inequality to the process $V_t^n - V_t^m$ and taking the limit as m → ∞, we
obtain
$$P\Big[\sup_{t\in[0,u]} |V_t^n - t| > \varepsilon\Big] \le \frac{2}{\varepsilon^2}\,E[(V_u^n - u)^2] \le \text{const.} \cdot u \cdot \mathrm{mesh}(\pi_n)$$
for any given ε > 0 and u ∈ (0, ∞). If $\sum \mathrm{mesh}(\pi_n) < \infty$ then the sum of
the probabilities is finite, and hence $\sup_{t\in[0,u]} |V_t^n - t| \to 0$ almost surely by the
Borel-Cantelli Lemma.
(3). Almost sure convergence if $\sum \mathrm{mesh}(\pi_n) = \infty$: In this case, almost sure
convergence can be shown by the backward martingale convergence theorem. We refer
to Proposition 2.12 in [Revuz, YorXXX], because for our purposes almost sure
convergence w.r.t. arbitrary sequences of partitions is not essential.
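The law of large numbers effect behind [B]_t = t is easy to observe numerically. The following sketch is an illustration only: the seed, the step count and the function name are assumptions made here. It samples the increments of one Brownian path on a uniform partition of [0, 1] and compares the sum of squared increments with the sum of absolute increments.

```python
import math
import random

def brownian_increments(t, n_steps, rng):
    """Independent N(0, t / n_steps) increments of a Brownian path on [0, t]."""
    sd = math.sqrt(t / n_steps)
    return [rng.gauss(0.0, sd) for _ in range(n_steps)]

rng = random.Random(0)            # fixed seed, chosen only for reproducibility
t, n_steps = 1.0, 2 ** 14
increments = brownian_increments(t, n_steps, rng)

quadratic_variation = sum(db ** 2 for db in increments)   # V_t^n, close to t
total_variation = sum(abs(db) for db in increments)       # grows like sqrt(n_steps)
```

With this mesh the variance bound from part (1) of the proof gives a standard deviation of about 0.01 for the sum of squared increments, whereas the sum of absolute increments is of order 100 and diverges as the partition is refined.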
Theorem 6.7 (Itô’s formula for Brownian motion). Suppose that $F \in C^2(I)$ where
I ⊆ R is an open interval. Then almost surely,
$$F(B_t) - F(B_0) = \int_0^t F'(B_s)\,dB_s + \frac{1}{2}\int_0^t F''(B_s)\,ds \quad \text{for all } t < T, \tag{6.3.3}$$
Proof. For almost every ω, the quadratic variation of t 7→ Bt (ω) along a fixed sequence
of partitions is t. Moreover, for any r < T (ω), the function F is C 2 on a neighbourhood
of {Bt (ω) : t ∈ [0, r]}. The assertion now follows from Theorem 6.7 by noting that
the pathwise integral and the Itô integral as defined in Section 5 coincide almost surely
since both are limits of Riemann-Itô sums w.r.t. uniform convergence for t in a finite
interval, almost surely along a common (sub)sequence of partitions.
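A quick numerical sanity check of (6.3.3) along one simulated path is sketched below. The test function F = sin, the seed and the step count are assumptions made for illustration. The Riemann-Itô sum is evaluated at the left endpoints, and the second order term uses ds in place of the squared increments.

```python
import math
import random

rng = random.Random(1)            # fixed seed, chosen only for reproducibility
n_steps = 2 ** 14
dt = 1.0 / n_steps

b = 0.0                           # B_0 = 0
ito_sum = 0.0                     # approximates the integral of F'(B_s) dB_s
drift = 0.0                       # approximates (1/2) * integral of F''(B_s) ds
for _ in range(n_steps):
    db = rng.gauss(0.0, math.sqrt(dt))
    ito_sum += math.cos(b) * db           # F' = cos, evaluated at the left endpoint
    drift += 0.5 * (-math.sin(b)) * dt    # F'' = -sin
    b += db

lhs = math.sin(b) - math.sin(0.0)         # F(B_1) - F(B_0)
rhs = ito_sum + drift
```

The difference `lhs - rhs` is of the order of the square root of the mesh size. Dropping `drift` leaves a discrepancy that does not vanish under refinement, which is the second order correction that distinguishes Itô's formula from the classical chain rule.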
Consequences
(1). Doob decomposition in continuous time: The Itô integral $M_t^F = \int_0^t F'(B_s)\,dB_s$
is a local martingale up to T, and $M_t^F$ is a square integrable martingale if I = R
and F ′ is bounded. Therefore, (6.3.3) can be interpreted as a continuous time
Doob decomposition of the process (F (Bt )) into the (local) martingale part M F
and an adapted process of finite variation. This process takes over the role of the
predictable part in discrete time.
In particular, we obtain:
Corollary 6.8 (Martingale problem for Brownian motion). Brownian motion is a
solution of the martingale problem for the operator $\mathcal{L} = \frac{1}{2}\frac{d^2}{dx^2}$ with domain
$\mathrm{Dom}(\mathcal{L}) = \{F \in C^2(\mathbb{R}) : \frac{dF}{dx} \text{ is bounded}\}$, i.e., the process
$$M_t^F = F(B_t) - F(B_0) - \int_0^t (\mathcal{L}F)(B_s)\,ds$$
The corollary demonstrates how Itô’s formula can be applied to obtain solutions of
martingale problems, cf. below for generalizations.
Itô’s formula is also the key tool to derive or solve stochastic differential equations
for various stochastic processes of interest:
where (Bt) is a standard Brownian motion on R¹. Itô’s formula yields the stochastic
differential equation
$$dZ_t = t(Z_t)\,dB_t - \frac{1}{2}\,n(Z_t)\,dt, \tag{6.3.4}$$
where t(z) = iz is the unit tangent vector to S¹ at the point z, and n(z) = z is the
outer normal vector. If we omitted the correction term $-\frac{1}{2}n(Z_t)\,dt$ in (6.3.4),
the solution to the s.d.e. would not stay on the circle. This is in contrast to classical
o.d.e., where the correction term is not required. For Stratonovich integrals, we
obtain the modified equation
Notice that in the theorem, we do not assume the existence of an angle bracket process
⟨M⟩. Indeed, the theorem proves that for continuous local martingales, the angle bracket
process always exists and it coincides almost surely with the quadratic variation process
[M]! We point out that for discontinuous martingales, ⟨M⟩ and [M] do not coincide.
Proof. We first assume that M is a bounded martingale: |Mt| ≤ C for some finite
constant C. We then show that $(I^n)$ is a Cauchy sequence in the Hilbert space $M_c^2([0,t])$
for any given t ∈ R+. To this end let n, m ∈ N. We assume without loss of generality
that πm ⊆ πn, otherwise we compare to a common refinement of both partitions. For
s ∈ πn, we denote the next partition point in πn by s′, and the previous partition point
in πm by ⌊s⌋m. Fix t ≥ 0. Then
$$I_t^n - I_t^m = \sum_{\substack{s\in\pi_n\\ s<t}} (M_s - M_{\lfloor s\rfloor_m})(M_{s'\wedge t} - M_s), \quad \text{and hence}$$
$$\|I^n - I^m\|^2_{M^2([0,t])} = E\big[(I_t^n - I_t^m)^2\big]
= E\Big[\sum_{\substack{s\in\pi_n\\ s<t}} (M_s - M_{\lfloor s\rfloor_m})^2 (M_{s'\wedge t} - M_s)^2\Big]
\le E\big[\delta_m^2\big]^{1/2}\, E\Big[\Big(\sum (\delta M_s)^2\Big)^2\Big]^{1/2}, \tag{6.3.7}$$
Having shown the existence of the quadratic variation [M] for continuous local martin-
gales, we observe next that [M] is always non-trivial if M is not constant:
Proof. Again, we assume at first that M is a bounded martingale. Then the Itô integral
$\int_0^\bullet M\,dM$ is a martingale as well. Therefore, by (6.3.6),
i.e., Ms = M0 for any s ∈ [0, t]. In the general case, the assertion follows once more by
localization.
The theorem shows in particular that every local martingale with continuous finite vari-
ation paths is almost surely constant, i.e., the Doob type decomposition of a continu-
ous stochastic process into a local martingale and a continuous finite variation pro-
cess starting at 0 is unique up to equivalence. As a consequence we observe that the
quadratic variation is the unique angle bracket process of M. In particular, up to mod-
ification on measure zero sets, [M] does not depend on the chosen partition sequence
(πn ):
Corollary 6.11 (Quadratic variation as unique angle bracket process). Suppose that
(Mt ) is a continuous local martingale. Then [M] is the up to equivalence unique contin-
uous process of finite variation such that [M]0 = 0 and Mt2 −[M]t is a local martingale.
A remarkable consequence of Itô’s formula for martingales is that any continuous local
martingale (Mt ) (up to T = ∞) with quadratic variation given by [M]t = t for any
t ≥ 0 is a Brownian motion ! In fact, for 0 ≤ s ≤ t and p ∈ R, Itô’s formula yields
$$e^{ipM_t} - e^{ipM_s} = ip\int_s^t e^{ipM_r}\,dM_r - \frac{p^2}{2}\int_s^t e^{ipM_r}\,dr$$
where the stochastic integral can be identified as a local martingale. From this identity
it is not difficult to conclude that the increment Mt − Ms is conditionally independent
of $\mathcal{F}_s^M$ with characteristic function
$$E[e^{ip(M_t - M_s)}] = e^{-p^2(t-s)/2} \quad \text{for any } p \in \mathbb{R},$$
Theorem 6.12 (P. Lévy 1948). A continuous local martingale (Mt )t∈[0,∞) is a Brownian
motion if and only if almost surely,
Lévy’s Theorem is the basis for many important developments in stochastic analysis
including transformations and weak solutions for stochastic differential equations. An
extension to the multi-dimensional case with a detailed proof, as well as several
applications, are contained in Section 11.1 below.
Hint: Set $B_a = M_{T_a}$ where $T_a = [M]^{-1}(a) = \inf\{t \ge 0 : [M]_t = a\}$, and verify by
Lévy’s characterization that B is a Brownian motion.
for any s ∈ πn with s < t, where the dot denotes the Euclidean inner product and $R_s^{(n)}$ is the
remainder term in Taylor’s formula. We would like to obtain a multivariate Itô formula
by summing over s ∈ πn with s < t and taking the limit as n → ∞. A first problem
that arises in this context is the identification of the limit of the sums
$$\sum_{\substack{s\in\pi_n\\ s<t}} g(X_s)\,\delta X_s^{(i)}\,\delta X_s^{(j)}$$
Covariation
Suppose that X, Y : [0, u) → R are continuous functions with continuous quadratic
variations [X]t and [Y ]t w.r.t. (πn ).
The covariation [X, Y ]t is the bilinear form corresponding to the quadratic form [X]t .
In particular, [X, X] = [X]. Furthermore:
Lemma 6.13 (Polarization identity). The covariation [X, Y ]t exists and is a continu-
ous function in t if and only if the quadratic variation [X + Y ]t exists and is continuous
respectively. In this case,
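$$[X,Y]_t = \frac{1}{2}\big([X+Y]_t - [X]_t - [Y]_t\big).$$
Proof. For n ∈ N we have
$$2\sum_{s\in\pi_n} \delta X_s\,\delta Y_s = \sum_{s\in\pi_n} (\delta X_s + \delta Y_s)^2 - \sum_{s\in\pi_n} (\delta X_s)^2 - \sum_{s\in\pi_n} (\delta Y_s)^2.$$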
The assertion follows as n → ∞ because the limits [X]t and [Y ]t of the last two terms
are continuous functions by assumption.
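The algebraic identity used in the proof holds for every partition, not only in the limit. A minimal sketch with made-up path values and hypothetical function names:

```python
def covariation_sum(x, y):
    """Sum of products of increments of two discrete paths of equal length."""
    return sum((x[k + 1] - x[k]) * (y[k + 1] - y[k]) for k in range(len(x) - 1))

def quadratic_variation_sum(x):
    """Sum of squared increments of a discrete path."""
    return covariation_sum(x, x)

x = [0.0, 1.0, 0.5, 2.0]          # arbitrary illustrative values
y = [1.0, 0.0, 0.25, -1.0]
x_plus_y = [a + b for a, b in zip(x, y)]

lhs = covariation_sum(x, y)
rhs = 0.5 * (quadratic_variation_sum(x_plus_y)
             - quadratic_variation_sum(x)
             - quadratic_variation_sum(y))
```

Both sides agree exactly for any choice of values, so existence and continuity of the limits transfer between the covariation and the quadratic variation of the sum as stated in the lemma.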
Remark. Note that by the polarization identity, the covariation [X, Y ]t is the difference
of two increasing functions, i.e., t 7→ [X, Y ]t has finite variation.
Example. (1). Functions and processes of finite variation: If Y has finite variation
then [X, Y ]t = 0 for any t ≥ 0. Indeed,
$$\Big|\sum_{s\in\pi_n} \delta X_s\,\delta Y_s\Big| \le \sup_{s\in\pi_n} |\delta X_s| \cdot \sum_{s\in\pi_n} |\delta Y_s|$$
and the right hand side converges to 0 by uniform continuity of X on [0, t]. In
particular, we obtain again
(3). Itô processes: If $I_t = \int_0^t G_s\,dB_s$ and $F_t = \int_0^t H_s\,d\tilde{B}_s$ with continuous adapted
processes (Gt) and (Ht) and Brownian motions $(B_t)$ and $(\tilde{B}_t)$ then
holds for Itô integrals $I_t = \int_0^t G_s\,dX_s$ and $J_t = \int_0^t H_s\,dY_s$, cf. e.g. Corollary ??.
and the covariation $[X,Y]_t$ exists along a sequence (πn) of partitions with mesh(πn) → 0
then the corresponding backward Itô integral $\int_0^t X_s\,\hat{d}Y_s$ and the Stratonovich integral
$\int_0^t X_s \circ dY_s$ also exist, and
$$\int_0^t X_s\,\hat{d}Y_s = \int_0^t X_s\,dY_s + [X,Y]_t, \quad \text{and}$$
$$\int_0^t X_s \circ dY_s = \int_0^t X_s\,dY_s + \frac{1}{2}[X,Y]_t.$$
Itô’s formula in Rd
By the polarization identity, if [X]t , [Y ]t and [X + Y ]t exist and are continuous then
[X, Y ]t is a continuous function of finite variation.
Lemma 6.15. Suppose that X, Y and X + Y are continuous functions on [0, u) with
continuous quadratic variations w.r.t. (πn). Then
$$\sum_{\substack{s\in\pi_n\\ s<t}} H_s (X_{s'\wedge t} - X_s)(Y_{s'\wedge t} - Y_s) \longrightarrow \int_0^t H_s\,d[X,Y]_s \quad \text{as } n \to \infty$$
By Lemma 6.15, we can take the limit as mesh(πn ) → 0 in the equation derived by
summing (6.4.2) over all s ∈ πn with s < t. In analogy to the one-dimensional case,
this yields the following multivariate version of the pathwise Itô formula:
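$$F(X_t) = F(X_0) + \int_0^t \nabla F(X_s) \cdot dX_s + \frac{1}{2}\sum_{i,j=1}^d \int_0^t \frac{\partial^2 F}{\partial x_i\,\partial x_j}(X_s)\,d[X^{(i)}, X^{(j)}]_s,$$
where the Itô integral is the limit of Riemann sums along (πn):
$$\int_0^t \nabla F(X_s) \cdot dX_s = \lim_{n\to\infty} \sum_{\substack{s\in\pi_n\\ s<t}} \nabla F(X_s) \cdot (X_{s'\wedge t} - X_s). \tag{6.4.4}$$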
The details of the proof are similar to the one-dimensional case and left as an exercise
to the reader. Note that the theorem shows in particular that the Itô integral in (6.4.4) is
independent of the sequence (πn ) if the same holds for the covariations [X (i) , X (j) ].
Remark (Existence of pathwise Itô integrals). The theorem implies the existence of
the Itô integral $\int_0^t b(X_s) \cdot dX_s$ if b = ∇F is the gradient of a C² function F : D ⊆
Rd → R. In contrast to the one-dimensional case, not every C¹ vector field b : D → Rd
is a gradient. Therefore, for d ≥ 2 we do not obtain existence of $\int_0^t b(X_s) \cdot dX_s$ for
any $b \in C^1(D, \mathbb{R}^d)$ from Itô’s formula. In particular, we do not know in general whether the
integrals $\int_0^t \frac{\partial F}{\partial x_i}(X_s)\,dX_s^{(i)}$, 1 ≤ i ≤ d, exist and whether
$$\int_0^t \nabla F(X_s) \cdot dX_s = \sum_{i=1}^d \int_0^t \frac{\partial F}{\partial x_i}(X_s)\,dX_s^{(i)}.$$
If (Xt) is a Brownian motion this is almost surely the case by the existence proof for Itô
integrals w.r.t. Brownian motion from Section 5.
Example (Itô’s formula for Brownian motion in Rd). Suppose that $B_t = (B_t^{(1)}, \ldots, B_t^{(d)})$
is a d-dimensional Brownian motion defined on a probability space (Ω, A, P). Then the
component processes $B_t^{(1)}, \ldots, B_t^{(d)}$ are independent one-dimensional Brownian
motions. Hence for a given sequence of partitions (πn) with mesh(πn) → 0, the
covariations $[B^{(i)}, B^{(j)}]$, 1 ≤ i, j ≤ d, exist almost surely by Theorem 6.6 and the example
above, and
$$[B^{(i)}, B^{(j)}]_t = t \cdot \delta_{ij} \quad \forall t \ge 0$$
P-almost surely. Therefore, we can apply Itô’s formula to almost every trajectory. For
an open subset D ⊆ Rd and a function $F \in C^2(D)$ we obtain:
$$F(B_t) = F(B_0) + \int_0^t \nabla F(B_s) \cdot dB_s + \frac{1}{2}\int_0^t \Delta F(B_s)\,ds \quad \forall t < T_{D^C} \quad P\text{-a.s.} \tag{6.4.5}$$
Corollary 6.17. Suppose that X, Y : [0, u) → R are continuous functions with
continuous quadratic variations [X] and [Y], and continuous covariation [X, Y]. Then
$$X_t Y_t - X_0 Y_0 = \int_0^t \begin{pmatrix} Y_s \\ X_s \end{pmatrix} \cdot d\begin{pmatrix} X_s \\ Y_s \end{pmatrix} + [X,Y]_t \quad \text{for any } t \in [0,u). \tag{6.4.6}$$
If one, or, equivalently, both of the Itô integrals $\int_0^t Y_s\,dX_s$ and $\int_0^t X_s\,dY_s$ exist then (6.4.6)
yields
$$X_t Y_t - X_0 Y_0 = \int_0^t Y_s\,dX_s + \int_0^t X_s\,dY_s + [X,Y]_t. \tag{6.4.7}$$
Proof. The identity (6.4.6) follows by applying Itô’s formula in R² to the process (Xt, Yt)
and the function F(x, y) = xy. If one of the integrals $\int_0^t Y\,dX$ or $\int_0^t X\,dY$ exists, then
the other exists as well, and
$$\int_0^t \begin{pmatrix} Y_s \\ X_s \end{pmatrix} \cdot d\begin{pmatrix} X_s \\ Y_s \end{pmatrix} = \int_0^t Y_s\,dX_s + \int_0^t X_s\,dY_s.$$
As it stands, (6.4.7) is an integration by parts formula for Itô integrals which involves the
correction term [X, Y ]t . In differential notation, it is a product rule for Itô differentials:
d(XY) = X dY + Y dX + d[X, Y].
Again, in Stratonovich calculus a corresponding product rule holds without the correc-
tion term [X, Y ]:
◦d(XY ) = X ◦ dY + Y ◦ dX.
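On a fixed partition both product rules are exact algebraic identities, which the following sketch verifies for two made-up discrete paths (names and values are illustrative assumptions). The Itô sums use left endpoints and need the covariation term; the Stratonovich sums use endpoint averages and do not.

```python
def product_rule_terms(x, y):
    """Left-endpoint (Ito) sums, endpoint-average (Stratonovich) sums and the
    covariation sum for two discrete paths x and y of equal length."""
    steps = range(len(x) - 1)
    ito_y_dx = sum(y[k] * (x[k + 1] - x[k]) for k in steps)
    ito_x_dy = sum(x[k] * (y[k + 1] - y[k]) for k in steps)
    cov = sum((x[k + 1] - x[k]) * (y[k + 1] - y[k]) for k in steps)
    strat_y_dx = sum(0.5 * (y[k] + y[k + 1]) * (x[k + 1] - x[k]) for k in steps)
    strat_x_dy = sum(0.5 * (x[k] + x[k + 1]) * (y[k + 1] - y[k]) for k in steps)
    return ito_y_dx, ito_x_dy, cov, strat_y_dx, strat_x_dy

x = [0.0, 1.0, 0.5, 2.0]          # arbitrary illustrative values
y = [1.0, 0.0, 0.25, -1.0]
ito_y_dx, ito_x_dy, cov, strat_y_dx, strat_x_dy = product_rule_terms(x, y)
increment_of_product = x[-1] * y[-1] - x[0] * y[0]
```

Here `increment_of_product` equals `ito_y_dx + ito_x_dy + cov` and also `strat_y_dx + strat_x_dy`, mirroring (6.4.7) and the Stratonovich product rule.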
Remark / Warning (Existence of $\int X\,dY$, Lévy area). Under the conditions of the
theorem, the Itô integrals $\int_0^t X\,dY$ and $\int_0^t Y\,dX$ do not necessarily exist! The following
statements are equivalent:
(1). The Itô integral $\int_0^t Y_s\,dX_s$ exists (along (πn)).
(2). The Itô integral $\int_0^t X_s\,dY_s$ exists.
exists.
Hence, if the Lévy area $A_t(X,Y)$ is given, the stochastic integrals $\int X\,dY$ and $\int Y\,dX$
can be constructed pathwise. Pushing these ideas further leads to the rough paths theory
developed by T. Lyons and others, cf. [Lyons, St. Flour], [Friz: Rough paths theory].
This integration by parts identity can be used as an alternative definition of the stochastic
integral $\int_0^t H\,dB$ for integrands of finite variation, which can then again be extended to
general integrands in $L^2_a(0,t)$ by the Itô isometry.
holds for any t ≥ 0. Here ∂F/∂a denotes the derivative of F (a, x) w.r.t. the first com-
ponent, and ∇x F and ∂ 2 F/∂xi ∂xj are the gradient and the second partial derivatives
w.r.t. the other components. The most important application of the theorem is for At = t.
Here we obtain the time-dependent Itô formula
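$$dF(t, X_t) = \nabla_x F(t, X_t) \cdot dX_t + \frac{\partial F}{\partial t}(t, X_t)\,dt + \frac{1}{2}\sum_{i,j=1}^d \frac{\partial^2 F}{\partial x_i\,\partial x_j}(t, X_t)\,d[X^{(i)}, X^{(j)}]_t. \tag{6.4.10}$$
Similarly, if d = 1 and $A_t = [X]_t$ then we obtain
$$dF([X]_t, X_t) = \frac{\partial F}{\partial x}([X]_t, X_t)\,dX_t + \Big(\frac{\partial F}{\partial a} + \frac{1}{2}\frac{\partial^2 F}{\partial x^2}\Big)([X]_t, X_t)\,d[X]_t. \tag{6.4.11}$$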
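where
$$I_t = \lim_{n\to\infty} \sum_{\substack{s\in\pi_n\\ s<t}} \nabla^{\mathbb{R}^{d+1}} F(A_s, X_s) \cdot \begin{pmatrix} A_{s'\wedge t} - A_s \\ X_{s'\wedge t} - X_s \end{pmatrix}$$
$$= \lim_{n\to\infty} \Big( \sum \frac{\partial F}{\partial a}(A_s, X_s)(A_{s'\wedge t} - A_s) + \sum \nabla_x F(A_s, X_s) \cdot (X_{s'\wedge t} - X_s) \Big).$$
The first sum on the right hand side converges to the Stieltjes integral $\int_0^t \frac{\partial F}{\partial a}(A_s, X_s)\,dA_s$
as n → ∞. Hence, the second sum also converges, and we obtain (6.4.7) in the limit as
n → ∞.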
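$$\frac{\partial h}{\partial t} + \frac{1}{2}\frac{\partial^2 h}{\partial x^2} = 0 \quad \text{for } t \ge 0,\ x \in \mathbb{R}, \tag{6.4.12}$$
then by (6.4.11),
$$h([X]_t, X_t) = h(0, X_0) + \int_0^t \frac{\partial h}{\partial x}([X]_s, X_s)\,dX_s.$$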
$$dZ_t^{(\alpha)} = \alpha Z_t^{(\alpha)}\,dX_t$$
with initial condition $Z_0^{(\alpha)} = 1$. This shows that in Itô calculus, the functions $Z_t^{(\alpha)}$
are the correct replacements for the exponential functions. The additional factor
$\exp(-\alpha^2 [X]_t/2)$ should be thought of as an appropriate renormalization in the
continuous time limit.
For a Brownian motion (Xt ), we obtain the exponential martingales as general-
ized exponentials.
$$h_n(t, x) = \frac{\partial^n}{\partial \alpha^n} \exp\Big(\alpha x - \frac{1}{2}\alpha^2 t\Big)\Big|_{\alpha=0}$$
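$$h_n(1, x) = e^{x^2/2}\,(-1)^n \frac{d^n}{dx^n} e^{-x^2/2} \quad \text{for any } x \in \mathbb{R}, \tag{6.4.13}$$
$$h_n(t, x) = t^{n/2}\,h_n(1, x/\sqrt{t}) \quad \text{for any } t \ge 0,\ x \in \mathbb{R}, \tag{6.4.14}$$
$$\frac{\partial h_n}{\partial x} = n h_{n-1}, \qquad \frac{\partial h_n}{\partial t} + \frac{1}{2}\frac{\partial^2 h_n}{\partial x^2} = 0. \tag{6.4.15}$$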
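The relations (6.4.14) and (6.4.15) can be checked numerically. The sketch below evaluates the Hermite polynomials via a three-term recursion. The recursion h_{n+1} = x h_n − n t h_{n−1} is not stated in the notes; it is obtained here by differentiating the generating function exp(αx − α²t/2) with respect to α, and the function name is an illustrative choice.

```python
import math

def hermite(n, t, x):
    """h_n(t, x) via the recursion h_{k+1} = x * h_k - k * t * h_{k-1},
    starting from h_0 = 1 and h_1 = x."""
    previous, current = 1.0, x
    if n == 0:
        return previous
    for k in range(1, n):
        previous, current = current, x * current - k * t * previous
    return current

t, x = 2.0, 3.0
h2 = hermite(2, t, x)             # x^2 - t
h3 = hermite(3, t, x)             # x^3 - 3 t x
h4 = hermite(4, t, x)             # x^4 - 6 t x^2 + 3 t^2

# Scaling relation (6.4.14) and the derivative relation in (6.4.15) for n = 3,
# the latter approximated by a central difference in x.
scaled = t ** 2 * hermite(4, 1.0, x / math.sqrt(t))
eps = 1e-4
dh3_dx = (hermite(3, t, x + eps) - hermite(3, t, x - eps)) / (2 * eps)
```

For these values `scaled` agrees with `h4` and `dh3_dx` agrees with `3 * h2` up to discretization error.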
yields
$$h_n(1, x) = \exp(x^2/2)\,(-1)^n \frac{d^n}{d\beta^n} \exp(-\beta^2/2)\Big|_{\beta=x},$$
$$dH_t^{(n)} = n H_t^{(n-1)}\,dX_t. \tag{6.4.16}$$
Therefore, the Hermite polynomials are appropriate replacements for the ordinary
monomials $x^n$ in Itô calculus. If X0 = 0 then $H_0^{(n)} = 0$ for n ≥ 1, and we obtain
inductively
$$H_t^{(0)} = 1, \quad H_t^{(1)} = \int_0^t dX_s, \quad H_t^{(2)} = 2\int_0^t H_s^{(1)}\,dX_s = 2\int_0^t \int_0^s dX_r\,dX_s,$$
and so on.
$$\int_0^t \int_0^{s_n} \cdots \int_0^{s_2} dX_{s_1} \cdots dX_{s_{n-1}}\,dX_{s_n} = \frac{1}{n!}\,h_n([X]_t, X_t).$$
Iterated Itô integrals occur naturally in Taylor expansions of Itô calculus. Therefore, the
explicit expression from the corollary is valuable for numerical methods for stochastic
differential equations, cf. Section ?? below.
The stationary and time-dependent Itô formula enable us to work out the connection of
Brownian motion to several partial differential equations involving the Laplace operator
in detail. One of the many consequences is the evaluation of probabilities and expec-
tation values for Brownian motion by p.d.e. methods. More generally, Itô’s formula
establishes a link between stochastic processes and analysis that is extremely fruitful in
both directions.
Suppose that (Bt ) is a d-dimensional Brownian motion defined on a probability space
(Ω, A, P ) such that every sample path t 7→ Bt (ω) is continuous. We first note that Itô’s
formula shows that Brownian motion solves the martingale problem for the operator
$\mathcal{L} = \frac{1}{2}\Delta$ in the following sense:
222 CHAPTER 7. BROWNIAN MOTION AND PDE
Proof. By the continuity assumptions one easily verifies that $M^F$ is $(\mathcal{F}_t^B)$ adapted.
Moreover, by the time-dependent Itô formula (6.4.10),
$$M_t^F = \int_0^t \nabla_x F(s, B_s) \cdot dB_s \quad \text{for } t < T_{D^C},$$
Choosing a function F that does not explicitly depend on t, we obtain in particular that
$$M_t^F = F(B_t) - F(B_0) - \int_0^t \frac{1}{2}\Delta F(B_s)\,ds$$
is a martingale for any $F \in C_b^2(\mathbb{R}^d)$, and a local martingale up to $T_{D^C}$ for any
$F \in C^2(D)$.
We will generalize this result substantially in Theorem 7.5 below. Before doing so, we apply the
Dirichlet problem to study recurrence and transience of Brownian motions:
We now compute the probabilities P [Ta < Tb ] for a < |x0 | < b. Note that this is a
multi-dimensional analogue of the classical ruin problem. To compute the probability
for given a, b we consider the domain
For b < ∞, the first exit time TDC is almost surely finite,
Then h(Bt ) is a bounded local martingale up to TDC and optional stopping yields
By rotational symmetry, the solution of the Dirichlet problem (7.1.2) can be computed
explicitly. The Ansatz h(x) = f (|x|) leads us to the boundary value problem
$$\frac{d^2 f}{dr^2}(|x|) + \frac{d-1}{|x|}\,\frac{df}{dr}(|x|) = 0, \qquad f(a) = 1, \quad f(b) = 0,$$
for a second order ordinary differential equation. Solutions of the o.d.e. are linear
combinations of the constant function 1 and the function
$$\varphi(s) := \begin{cases} s & \text{for } d = 1, \\ \log s & \text{for } d = 2, \\ s^{2-d} & \text{for } d \ge 3. \end{cases}$$
Figure 7.1: The function φ(s) for different values of d: red (d = 1), blue (d = 2) and
purple (d = 3)
Hence, the unique solution f with boundary conditions f(a) = 1 and f(b) = 0 is
$$f(r) = \frac{\varphi(b) - \varphi(r)}{\varphi(b) - \varphi(a)}.$$
Summarizing, we have shown:
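Theorem 7.3 (Ruin problem in Rd). For a, b > 0 with a < |x0| < b,
$$P[T_a < T_b] = \frac{\varphi(b) - \varphi(|x_0|)}{\varphi(b) - \varphi(a)}, \quad \text{and}$$
$$P[T_a < \infty] = \begin{cases} 1 & \text{for } d \le 2, \\ (a/|x_0|)^{d-2} & \text{for } d > 2. \end{cases}$$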
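The formulas of the theorem are straightforward to evaluate. The sketch below uses hypothetical function names, with r standing for |x0|. It computes the probability of reaching the inner sphere before the outer one, and the probability of ever reaching the inner sphere, obtained by letting b → ∞.

```python
import math

def phi(s, d):
    """The function phi from the radial o.d.e.: s, log s or s^(2-d)."""
    if d == 1:
        return s
    if d == 2:
        return math.log(s)
    return s ** (2 - d)

def ruin_probability(d, a, b, r):
    """P[T_a < T_b] for Brownian motion in R^d started at radius r, a < r < b."""
    return (phi(b, d) - phi(r, d)) / (phi(b, d) - phi(a, d))

def inner_hitting_probability(d, a, r):
    """Limit of ruin_probability as b tends to infinity."""
    return 1.0 if d <= 2 else (a / r) ** (d - 2)

p_1d = ruin_probability(1, 1.0, 3.0, 2.0)           # classical ruin problem
p_3d = ruin_probability(3, 1.0, 3.0, 2.0)
p_3d_far = ruin_probability(3, 1.0, 1e9, 2.0)       # outer radius pushed far away
p_2d_far = ruin_probability(2, 1.0, 1e9, 2.0)       # approaches 1 only logarithmically
```

In dimension three the value for a very large outer radius is already close to the limit a/|x0| = 1/2, whereas in dimension two the convergence to 1 is logarithmically slow in b.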
Corollary 7.4. For a Brownian motion in Rd the following statements hold for any
initial value x0 ∈ Rd :
(1). If d ≤ 2 then every non-empty ball D ⊆ Rd is recurrent, i.e., the last visit time of
D is almost surely infinite:
$$L_D = \sup\{t \ge 0 : B_t \in D\} = \infty \quad P\text{-a.s.}$$
$$L_D < \infty \quad P\text{-a.s.}$$
P [ ∃ t > 0 : Bt = x] = 0.
We point out that the last statement holds even if the starting point x0 coincides with x.
The first statement implies that a typical Brownian sample path is dense in R2 , whereas
by the second statement, $\lim_{t\to\infty} |B_t| = \infty$ almost surely for d ≥ 3.
Proof.
(1),(2) The first two statements follow from Theorem 7.3 and the Markov property.
for any x ∈ D.
Remark. Note that we assume the existence of a smooth solution of the boundary value
problem (7.2.2). Proving that the function u defined by (7.2.4) is a solution of the b.v.p.
without assuming existence is much more demanding.
$$A_t = \int_0^t V(B_s)\,ds$$
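Applying Itô’s formula with $F(a, b) = e^{-a} u(b)$ yields the decomposition
$$dX_t = e^{-A_t}\,\nabla u(B_t) \cdot dB_t - e^{-A_t} u(B_t)\,dA_t + \frac{1}{2} e^{-A_t}\,\Delta u(B_t)\,dt$$
$$= e^{-A_t}\,\nabla u(B_t) \cdot dB_t + e^{-A_t}\Big(\frac{1}{2}\Delta u - V \cdot u\Big)(B_t)\,dt$$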
of Xt into a local martingale up to time T and an absolutely continuous part. Since u
is a solution of (7.2.2), we have $\frac{1}{2}\Delta u - V u = -g$ on D. By applying the optional
stopping theorem with a localizing sequence Tn ↗ T of stopping times, we obtain the
representation
$$u(x) = E_x[X_0] = E_x[X_{T_n}] + E_x\Big[\int_0^{T_n} e^{-A_t} g(B_t)\,dt\Big]$$
$$= E_x\big[e^{-A_{T_n}} u(B_{T_n})\big] + E_x\Big[\int_0^{T_n} e^{-A_t} g(B_t)\,dt\Big]$$
for x ∈ D. The assertion (7.2.4) now follows provided we can interchange the limit
as n → ∞ and the expectation values. For the second expectation on the right hand
side this is possible by the monotone convergence theorem, because g ≥ 0. For the first
expectation value, we can apply the dominated convergence theorem, because
$$\big|e^{-A_{T_n}} u(B_{T_n})\big| \le \exp\Big(\int_0^T V^-(B_s)\,ds\Big) \cdot \sup_{y\in D} |u(y)| \quad \forall n \in \mathbb{N},$$
of the diffusion process, cf. ??. The theorem hence establishes a general connection
between Itô diffusions and boundary value problems for linear second order elliptic
partial differential equations.
By Theorem 7.5 we can compute many interesting expectation values for Brownian mo-
tion by solving appropriate p.d.e. We now consider various corresponding applications.
Let us first recall the Dirichlet problem where V ≡ 0 and g ≡ 0. In this case,
u(x) = Ex[f(BT)]. We have already pointed out in the last section that this can be
used to compute exit distributions and to study recurrence, transience and polarity of
linear subspaces for Brownian motion in Rd . A second interesting case of Theorem 7.5
is the stochastic representation for solutions of the Poisson equation:
which can be interpreted as an average cost accumulated by the Brownian path before
exit from the domain D. In particular, choosing g ≡ 1, we can compute the mean exit
time
u(x) = Ex [T ]
from D for Brownian motion starting at x by solving the corresponding Poisson prob-
lem.
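For d = 1 and D = (−1, 1) the mean exit time can be computed by a short finite-difference sketch. The Poisson problem itself is not displayed in this excerpt; the code assumes it reads ½u′′ = −g on D with u = 0 on the boundary, consistent with the relation ½∆u − Vu = −g used in the proof above. The function name and grid size are illustrative choices.

```python
def mean_exit_time_1d(n_intervals):
    """Solve (1/2) u'' = -1 on (-1, 1) with u(-1) = u(1) = 0 by central differences.

    Returns the interior grid points and the approximate values of u there."""
    h = 2.0 / n_intervals
    m = n_intervals - 1                      # number of interior grid points
    # Tridiagonal system: -u[i-1] + 2 u[i] - u[i+1] = 2 h^2
    diag = [2.0] * m
    rhs = [2.0 * h * h] * m
    # Forward elimination (Thomas algorithm, off-diagonal entries equal to -1)
    for i in range(1, m):
        w = -1.0 / diag[i - 1]
        diag[i] -= w * (-1.0)
        rhs[i] -= w * rhs[i - 1]
    # Back substitution
    u = [0.0] * m
    u[-1] = rhs[-1] / diag[-1]
    for i in range(m - 2, -1, -1):
        u[i] = (rhs[i] + u[i + 1]) / diag[i]
    grid = [-1.0 + (i + 1) * h for i in range(m)]
    return grid, u

grid, u = mean_exit_time_1d(200)
max_error = max(abs(ui - (1.0 - xi * xi)) for xi, ui in zip(grid, u))
```

The computed values coincide with 1 − x² up to rounding, because central differences are exact for quadratic functions. In particular the mean exit time from (−1, 1) for a start at 0 is 1.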
where ∆ is an extra state added to the state space. By setting g(∆) = 0, the stochastic
representation (7.2.5) for a solution of the Poisson problem can be written in the form
$$u(x) = E_x\Big[\int_0^\infty g(X_t)\,dt\Big] = \int_0^\infty (p_t^D g)(x)\,dt, \tag{7.2.6}$$
where
$$p_t^D(x, A) = P_x[X_t \in A], \quad A \subseteq \mathbb{R}^d \text{ measurable},$$
is the transition function for the absorbed process (Xt). Note that for A ⊂ Rd,
$$p_t^D(x, A) = P_x[B_t \in A \text{ and } t < T] \le p_t(x, A) \tag{7.2.7}$$
The function $p_t^D$ is called the heat kernel on the domain D w.r.t. absorption on the
boundary. Note that
$$G^D(x, y) = \int_0^\infty p_t^D(x, y)\,dt$$
is an occupation time density, i.e., it measures the average time a Brownian motion
started in x spends in a small neighbourhood of y before it exits from the domain D.
This shows that the occupation time density $G^D(x, y)$ is the Green function (i.e.,
the fundamental solution of the Poisson equation) for the operator $\frac{1}{2}\Delta$ with Dirichlet
boundary conditions on the domain D.
Note that although for domains with irregular boundary, the Green’s function might not
exist in the classical sense, the function GD (x, y) is always well-defined!
holds for x ∈ D.
As an application, we can, at least in principle, compute the full distribution of the exit
time T . In fact, choosing V ≡ α for some constant α > 0, the corresponding solution
uα of (7.2.8) yields the Laplace transform
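$$u_\alpha(x) = E_x[e^{-\alpha T}] = \int_0^\infty e^{-\alpha t}\,\mu_x(dt) \tag{7.2.9}$$
of $\mu_x = P_x \circ T^{-1}$.
Example (Exit times in R¹). Suppose d = 1 and D = (−1, 1). Then (7.2.8) with
V = α reads
$$\frac{1}{2} u_\alpha''(x) = \alpha u_\alpha(x) \quad \text{for } x \in (-1, 1), \qquad u_\alpha(1) = u_\alpha(-1) = 1.$$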
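One can check directly that the even function cosh(x√(2α)) / cosh(√(2α)) satisfies this boundary value problem. The sketch below is an illustration with a hypothetical function name; it verifies the o.d.e. with a central second difference and recovers the mean exit time from the behaviour of the Laplace transform near α = 0.

```python
import math

def laplace_transform_exit_time(alpha, x):
    """Candidate solution of (1/2) u'' = alpha * u on (-1, 1) with u(-1) = u(1) = 1."""
    k = math.sqrt(2.0 * alpha)
    return math.cosh(k * x) / math.cosh(k)

alpha, x, eps = 1.5, 0.3, 1e-3
u = laplace_transform_exit_time
second_derivative = (u(alpha, x + eps) - 2.0 * u(alpha, x) + u(alpha, x - eps)) / eps ** 2
ode_residual = 0.5 * second_derivative - alpha * u(alpha, x)

# For small alpha, E_x[exp(-alpha T)] is approximately 1 - alpha * E_x[T].
small = 1e-6
mean_exit_time = (1.0 - u(small, x)) / small
```

The residual vanishes up to discretization error, and `mean_exit_time` is close to 1 − x², the solution of the corresponding Poisson problem on (−1, 1).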
By inverting the Laplace transform (7.2.9), one can now compute the distribution µx
of the first exit time T from (−1, 1). It turns out that µx is absolutely continuous with
density
$$f_T(t) = \frac{1}{\sqrt{2\pi t^3}} \sum_{n=-\infty}^{\infty} \Big( (4n+1+x)\,e^{-\frac{(4n+1+x)^2}{2t}} + (4n+1-x)\,e^{-\frac{(4n+1-x)^2}{2t}} \Big), \quad t \ge 0.$$
Figure 7.2: The density of the first exit time T depending on the starting point x ∈
[−1, 1] and the time t ∈ (0, 2].
of a bounded domain A ⊂ Rd for Brownian motion. This only makes sense for d ≥ 3,
since for d ≤ 2, the total occupation time of any non-empty open set is almost surely
infinite by recurrence of Brownian motion in R1 and R2 . The total occupation time is of
the form $\int_0^\infty V(B_s)\,ds$ with $V = I_A$. Therefore, we should in principle be able to apply
Theorem 7.3, but we have to replace the exit time T by +∞ and hence the underlying
bounded domain D by Rd .
then
$$u(x) = E_x\Big[\exp\Big(-\int_0^\infty V(B_t)\,dt\Big)\Big] \quad \text{for any } x \in \mathbb{R}^d.$$
by Theorem 7.3. Now let $D_n = \{x \in \mathbb{R}^d : |x| < n\}$. Then $T_{D_n^C} \nearrow \infty$ as n → ∞. Since
d ≥ 3, Brownian motion is transient, i.e., $\lim_{t\to\infty} |B_t| = \infty$, and therefore by (7.2.10)
denote the Laplace transform of the total occupation time of A. Although V = αIA is
not a continuous function, a representation of vα as a solution of a boundary problem
holds:
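$$\frac{1}{2}\Delta u_\alpha = \alpha I_A u_\alpha \ \text{ on } \mathbb{R}^d \setminus \partial A, \qquad \lim_{|x|\to\infty} u_\alpha(x) = 1, \tag{7.2.12}$$
then $v_\alpha = u_\alpha$.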
Remark. The condition uα ∈ C 1 (Rd ) guarantees that uα is a weak solution of the p.d.e.
(7.2.10) on all of Rd including the boundary ∂U.
$$\frac{1}{2} f_\alpha''(r) + r^{-1} f_\alpha'(r) = \alpha f_\alpha(r) \ \text{ for } r < 1, \qquad \frac{1}{2} f_\alpha''(r) + r^{-1} f_\alpha'(r) = 0 \ \text{ for } r > 1.$$
University of Bonn 2015/2016
Taking into account the boundary condition and the condition uα ∈ C 1 (Rd ), one obtains
the rotationally symmetric solution
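$$u_\alpha(x) = \begin{cases}
1 + \Big( \dfrac{\tanh(\sqrt{2\alpha})}{\sqrt{2\alpha}} - 1 \Big) \cdot r^{-1} & \text{for } r \in [1, \infty), \\[2ex]
\dfrac{\sinh(\sqrt{2\alpha}\,r)}{\sqrt{2\alpha}\,\cosh\sqrt{2\alpha}} \cdot r^{-1} & \text{for } r \in (0, 1), \\[2ex]
\dfrac{1}{\cosh(\sqrt{2\alpha})} & \text{for } r = 0
\end{cases}$$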
of (7.2.10), and hence an explicit formula for vα . In particular, for x = 0 we obtain the
simple formula
$$E_0\Big[\exp\Big(-\alpha \int_0^\infty I_A(B_t)\,dt\Big)\Big] = u_\alpha(0) = \frac{1}{\cosh(\sqrt{2\alpha})}.$$
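The matching conditions for the rotationally symmetric solution can be verified numerically. The sketch below uses a hypothetical function name and writes r for |x|. It checks continuity and continuity of the first derivative across the unit sphere, the limit at the origin, and the boundary condition at infinity.

```python
import math

def occupation_laplace(alpha, r):
    """Rotationally symmetric solution u_alpha as a function of r = |x| in R^3."""
    k = math.sqrt(2.0 * alpha)
    if r == 0.0:
        return 1.0 / math.cosh(k)
    if r < 1.0:
        return math.sinh(k * r) / (k * math.cosh(k) * r)
    return 1.0 + (math.tanh(k) / k - 1.0) / r

alpha, eps = 0.8, 1e-5
u = occupation_laplace
jump = abs(u(alpha, 1.0 - 1e-9) - u(alpha, 1.0))
slope_inside = (u(alpha, 1.0) - u(alpha, 1.0 - eps)) / eps    # one-sided difference
slope_outside = (u(alpha, 1.0 + eps) - u(alpha, 1.0)) / eps   # one-sided difference
near_origin = abs(u(alpha, 1e-6) - u(alpha, 0.0))
at_infinity = u(alpha, 1e9)
```

Both one-sided slopes agree up to the discretization error, which reflects the C¹ condition imposed on u_α, and the value at r = 0 is the limit of the inner branch.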
The right hand side has already appeared in the example above as the Laplace transform
of the exit time distribution of a one-dimensional Brownian motion starting at 0 from the
interval (−1, 1). Since the distribution is uniquely determined by its Laplace transform,
we have proven the remarkable fact that the total occupation time of the unit ball for a
standard Brownian motion in R3 has the same distribution as the first exit time from the
unit ball for a standard one-dimensional Brownian motion:
$$\int_0^\infty I_{\{|B_t^{\mathbb{R}^3}| < 1\}}\,dt \ \sim\ \inf\{t > 0 : |B_t^{\mathbb{R}^1}| > 1\}.$$
This is a particular case of a theorem of Ciesielski and Taylor who proved a correspond-
ing relation between Brownian motion in Rd+2 and Rd for arbitrary d.
that the backward equation for Brownian motion with absorption is a heat equation with
dissipation.
Then the accumulated absorption rate up to time t is given by the increasing process
$$A_t = \int_0^t V(s, B_s)\,ds, \quad t \ge 0.$$
We can think of the process At as an internal clock for the Brownian motion determining
the absorption time. More precisely, we define:
Definition. Suppose that $(B_t)_{t\ge 0}$ is a d-dimensional Brownian motion and T is an
exponentially distributed random variable with parameter 1, independent of (Bt). Let ∆ be
a separate state added to the state space Rd. Then the process (Xt) defined by
$$X_t := \begin{cases} B_t & \text{for } A_t < T, \\ \Delta & \text{for } A_t \ge T, \end{cases}$$
is called a Brownian motion with absorption rate V(t, x), and the random variable
$$\zeta := \inf\{t \ge 0 : A_t \ge T\}$$
A justification for the construction is given by the following informal computation: For
an infinitesimal time interval [t, t + dt] and almost every ω,
by the memoryless property of the exponential distribution, i.e., V (t, x) is indeed the
infinitesimal absorption rate.
Rigorously, it is not difficult to verify that (Xt ) is a Markov process with state space
Rd ∪ {∆} where ∆ is an absorbing state. The Markov process is time-homogeneous if
V (t, x) is independent of t.
For a measurable subset D ⊆ Rd and t ≥ 0 the distribution µt of Xt is given by
Theorem 7.7 (Forward equation for Brownian motion with absorption). The
sub-probability measures µt on Rd solve the heat equation
$$\frac{\partial \mu_t}{\partial t} = \frac{1}{2}\Delta\mu_t - V(t, \bullet)\,\mu_t \tag{7.3.2}$$
in the following distributional sense:
$$\int f(x)\,\mu_t(dx) - \int f(x)\,\mu_0(dx) = \int_0^t \int \Big(\frac{1}{2}\Delta f(x) - V(s, x) f(x)\Big)\,\mu_s(dx)\,ds$$
Here $C_0^2(\mathbb{R}^d)$ denotes the space of $C^2$-functions with compact support. Under additional regularity assumptions it can be shown that $\mu_t$ has a smooth density that solves (7.3.2) in the classical sense. The equation (7.3.2) describes heat flow with cooling when the heat at $x$ at time $t$ dissipates with rate $V(t, x)$.
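Proof. By (7.3.1),
\[ \int f\, d\mu_t = E[\exp(-A_t)\, ;\, f(B_t)] \tag{7.3.3} \]
For $f \in C_0^2(\mathbb{R}^d)$, Itô’s formula yields
\[ e^{-A_t} f(B_t) = f(B_0) + M_t - \int_0^t e^{-A_s} f(B_s) V(s, B_s)\, ds + \frac12 \int_0^t e^{-A_s} \Delta f(B_s)\, ds, \]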
for $t \ge 0$, where $(M_t)$ is a local martingale. Taking expectation values for a localizing sequence of stopping times and subsequently applying the dominated convergence theorem, we obtain
\[ E[e^{-A_t} f(B_t)] = E[f(B_0)] + \int_0^t E\Big[ e^{-A_s} \Big( \frac12 \Delta f - V(s, \bullet) f \Big)(B_s) \Big]\, ds. \]
Here we have used that $\frac12 \Delta f(x) - V(s, x) f(x)$ is uniformly bounded for $(s, x) \in [0, t] \times \mathbb{R}^d$, because $f$ has compact support and $V$ is locally bounded. The assertion now follows by (7.3.3).
Exercise (Heat kernel and Green’s function). The transition kernel for Brownian motion with time-homogeneous absorption rate $V(x)$ restricted to $\mathbb{R}^d$ is given by
\[ p_t^V(x, D) = E_x\Big[ \exp\Big( -\int_0^t V(B_s)\, ds \Big)\, ;\, B_t \in D \Big]. \]
(1). Prove that for any t > 0 and x ∈ Rd , the sub-probability measure pVt (x, •) is
absolutely continuous on Rd with density satisfying
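\[ \begin{aligned} \frac{\partial u}{\partial s}(s, x) &= \frac12 \Delta u(s, x) - V(s, x)\, u(s, x) + g(s, x) \quad \text{for } s \in (0, t],\ x \in \mathbb{R}^d, \\ u(0, x) &= f(x), \end{aligned} \tag{7.3.5} \]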
Remark. The equation (7.3.5) describes heat flow with sinks and dissipation.
Proof. We first reverse time on the interval [0, t]. The function
û(s, x) = u(t − s, x)
Remark (Extension to diffusion processes). Again a similar result holds under appropriate regularity assumptions for Brownian motion replaced by a solution of an s.d.e. $dX_t = \sigma(X_t)\, dB_t + b(X_t)\, dt$ and $\frac12 \Delta$ replaced by the corresponding generator, cf. ??.
\[ A_t = \lambda(\{ s \in [0, t] : B_s > 0 \}) = \int_0^t I_{(0,\infty)}(B_s)\, ds. \]
Theorem 7.9 (Arc-sine law of P. Lévy). For any $t > 0$ and $\theta \in [0, 1]$,
\[ P_0[A_t / t \le \theta] = \frac{2}{\pi} \arcsin \sqrt{\theta} = \frac{1}{\pi} \int_0^\theta \frac{ds}{\sqrt{s(1 - s)}}. \]
Note that the theorem shows in particular that a law of large numbers does not hold! Indeed, for each $\varepsilon > 0$,
\[ P_0\Big[ \Big| \frac1t \int_0^t I_{(0,\infty)}(B_s)\, ds - \frac12 \Big| > \varepsilon \Big] \not\to 0 \quad \text{as } t \to \infty. \]
Even for large times, values of $A_t / t$ close to 0 or 1 are the most probable. By the functional central limit theorem, the proportion of time that one player is ahead in a long coin tossing game or a counting of election results is also close to the arcsine law. In particular, it is more than 20 times more likely that one player is ahead for more than 98% of the time than it is that each player is ahead between 49% and 51% of the time [Steele].
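The arc-sine law can be checked numerically. The following is a minimal sketch, assuming a Gaussian random walk on a uniform grid as a stand-in for Brownian motion on $[0, 1]$; the function names, grid size, number of paths and seed are illustrative choices, not part of the text.

```python
import numpy as np

def fraction_of_time_positive(n_paths=2000, n_steps=1000, seed=0):
    """Simulate discretized Brownian paths on [0, 1] and return, for each
    path, the fraction of grid times at which the path is strictly positive."""
    rng = np.random.default_rng(seed)
    dt = 1.0 / n_steps
    increments = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
    paths = np.cumsum(increments, axis=1)
    return np.mean(paths > 0.0, axis=1)

def arcsine_cdf(theta):
    """Distribution function (2 / pi) * arcsin(sqrt(theta)) from Theorem 7.9."""
    return (2.0 / np.pi) * np.arcsin(np.sqrt(theta))

fractions = fraction_of_time_positive()
for theta in (0.1, 0.25, 0.5, 0.75, 0.9):
    empirical = np.mean(fractions <= theta)
    print(f"theta={theta:.2f}  empirical={empirical:.3f}  arcsine={arcsine_cdf(theta):.3f}")
```

The empirical distribution function should lie close to the arc-sine curve, with visibly more mass near 0 and 1 than near 1/2.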
Before proving the arc-sine law, we give an informal derivation based on the time-
dependent Feynman-Kac formula.
The idea for determining the distribution of $A_t$ is again to consider the Laplace transforms
\[ u(t, x) = E_x[\exp(-\beta A_t)], \qquad \beta > 0. \]
By the Feynman-Kac formula, we could expect that $u$ solves the equation
\[ \frac{\partial u}{\partial t} = \frac12 \frac{\partial^2 u}{\partial x^2} - \beta I_{(0,\infty)}(x)\, u \tag{7.3.6} \]
with initial condition $u(0, x) = 1$. To solve the parabolic p.d.e. (7.3.6), we consider another Laplace transform: The Laplace transform
\[ v_\alpha(x) = \int_0^\infty e^{-\alpha t} u(t, x)\, dt = E_x\Big[ \int_0^\infty e^{-\alpha t - \beta A_t}\, dt \Big], \qquad \alpha > 0, \]
Remark. The method of transforming a parabolic p.d.e. by the Laplace transform into an elliptic equation is standard and used frequently. In particular, the Laplace transform of a transition semigroup $(p_t)_{t \ge 0}$ is the corresponding resolvent $(g_\alpha)_{\alpha \ge 0}$, $g_\alpha = \int_0^\infty e^{-\alpha t} p_t\, dt$, which is crucial for potential theory.
Instead of trying to make the informal argument above rigorous, one can directly prove
the arc-sine law by applying the stationary Feynman-Kac formula:
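(1). Let $g \in C_b(\mathbb{R})$. Show that if $v_\alpha$ is a bounded solution of (7.3.7) on $\mathbb{R} \setminus \{0\}$ with $v_\alpha \in C^1(\mathbb{R}) \cap C^2(\mathbb{R} \setminus \{0\})$ then
\[ v_\alpha(x) = E_x\Big[ \int_0^\infty g(B_t)\, e^{-\alpha t - \beta A_t}\, dt \Big] \quad \text{for any } x \in \mathbb{R}. \]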
(3). Now use the uniqueness of the Laplace inversion to show that the distribution $\mu_t$ of $A_t / t$ under $P_\bullet$ is absolutely continuous with density
\[ f_{A_t / t}(s) = \frac{1}{\pi \sqrt{s \cdot (1 - s)}}. \]
Suppose that (Bt )t≥0 is a given Brownian motion defined on a probability space (Ω, A, P ).
We will now study solutions of stochastic differential equations (SDE) of type
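Recall that $\mathcal{F}_t^{B,P}$ denotes the completion of the filtration $\mathcal{F}_t^B = \sigma(B_s \mid 0 \le s \le t)$ generated by the Brownian motion. Let $T$ be an $(\mathcal{F}_t^{B,P})$ stopping time. We call a process $(t, \omega) \mapsto X_t(\omega)$ defined for $t < T(\omega)$ adapted w.r.t. $\mathcal{F}_t^{B,P}$, if the trivially extended process $\widetilde{X}_t = X_t \cdot I_{\{t < T\}}$ defined by
\[ \widetilde{X}_t := \begin{cases} X_t & \text{for } t < T, \\ 0 & \text{for } t \ge T, \end{cases} \]
is $(\mathcal{F}_t^{B,P})$-adapted.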
246 CHAPTER 8. SDE: EXPLICIT COMPUTATIONS
Definition. An almost surely continuous stochastic process $(t, \omega) \mapsto X_t(\omega)$ defined for $t \in [0, T(\omega))$ is called a strong solution of the stochastic differential equation (8.0.1) if it is $(\mathcal{F}_t^{B,P})$-adapted, and the equation
\[ X_t = X_0 + \int_0^t b(s, X_s)\, ds + \int_0^t \sigma(s, X_s)\, dB_s \quad \text{for } t \in [0, T) \tag{8.0.2} \]
holds $P$-almost surely.
The terminology “strong” solution will be explained later when we introduce “weak”
solutions. The point is that a strong solution is adapted w.r.t. the filtration (FtB,P ) gener-
ated by the Brownian motion. Therefore, a strong solution is essentially (up to modifi-
cation on measure zero sets) a measurable function of the given Brownian motion! The
concept of strong and weak solutions of SDE is not related to the analytic definition of
strong and weak solutions for partial differential equations.
In this section we study properties of solutions and we compute explicit solutions for
one-dimensional SDE. We start with an example:
Example (Asset price model in continuous time). A natural model for an asset price process $(S_n)_{n = 0, 1, 2, \ldots}$ in discrete time is to define $S_n$ recursively by
with an (Ft )-Brownian motion (Bt ) and (FtP ) adapted continuous stochastic processes
(αt )t≥0 and (σt )t≥0 , where (Ft ) is a given filtration on a probability space (Ω, A, P ).
The processes αt and σt describe the instantaneous mean rate of return and the volatility.
Both are allowed to be time dependent and random.
In order to compute a solution of (8.0.3), we assume $S_t > 0$ for any $t \ge 0$, and divide the equation by $S_t$:
\[ \frac{1}{S_t}\, dS_t = \alpha_t\, dt + \sigma_t\, dB_t. \tag{8.0.4} \]
We will prove in Section 8.1 that if an SDE holds then the SDE multiplied by a continuous adapted process also holds, cf. Theorem 8.1. Hence (8.0.4) is equivalent to (8.0.3) if $S_t > 0$. If (8.0.4) were a classical ordinary differential equation then we could use the identity $d \log S_t = \frac{1}{S_t}\, dS_t$ to solve the equation. In Itô calculus, however, the classical chain rule is violated. Nevertheless, it is still useful to compute $d \log S_t$ by Itô’s formula. The process $(S_t)$ has quadratic variation
\[ [S]_t = \Big[ \int_0^\bullet \sigma_r S_r\, dB_r \Big]_t = \int_0^t \sigma_r^2 S_r^2\, dr \quad \text{for any } t \ge 0, \]
almost surely along an appropriate sequence $(\pi_n)$ of partitions with $\operatorname{mesh}(\pi_n) \to 0$. The first equation holds by (8.0.3), since $t \mapsto \int_0^t \alpha_r S_r\, dr$ has finite variation, and the second identity is proved in Theorem 8.1 below. Therefore, Itô’s formula implies:
\[ \begin{aligned} d \log S_t &= \frac{1}{S_t}\, dS_t - \frac{1}{2 S_t^2}\, d[S]_t \\ &= \alpha_t\, dt + \sigma_t\, dB_t - \frac12 \sigma_t^2\, dt \\ &= \mu_t\, dt + \sigma_t\, dB_t, \end{aligned} \]
where $\mu_t := \alpha_t - \sigma_t^2 / 2$, or, equivalently,
\[ S_t = S_0 \cdot \exp\Big( \int_0^t \mu_s\, ds + \int_0^t \sigma_s\, dB_s \Big). \tag{8.0.5} \]
Conversely, one can verify by Itô’s formula that $(S_t)$ defined by (8.0.5) is indeed a solution of (8.0.3). Thus we have proven existence, uniqueness and an explicit representation for a strong solution of (8.0.3). In the special case when $\alpha_t \equiv \alpha$ and $\sigma_t \equiv \sigma$ are constants in $t$ and $\omega$, the solution process
\[ S_t = S_0 \exp\big( \sigma B_t + (\alpha - \sigma^2 / 2)\, t \big) \]
is called a geometric Brownian motion.
Figure 8.1: Three one dimensional geometric Brownian motions with α2 = 1 and σ =
0.1 (blue), σ = 1.0 (red) and σ = 2.0 (magenta).
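A minimal sketch of how such paths can be generated: the closed-form solution $S_t = S_0 \exp(\sigma B_t + (\alpha - \sigma^2/2) t)$ is evaluated along a simulated Brownian path and compared with an Euler-Maruyama discretization of $dS_t = \alpha S_t\, dt + \sigma S_t\, dB_t$ driven by the same increments. The function name, parameter values, step count and seed are assumptions made for illustration, and the Euler-Maruyama scheme is a standard numerical method that is not discussed in the text.

```python
import numpy as np

def simulate_gbm(s0=1.0, alpha=1.0, sigma=0.5, t_end=1.0, n_steps=10_000, seed=0):
    """Return the time grid, the exact geometric Brownian motion and its
    Euler-Maruyama approximation, both driven by the same Brownian increments."""
    rng = np.random.default_rng(seed)
    dt = t_end / n_steps
    dB = rng.normal(0.0, np.sqrt(dt), size=n_steps)
    t = np.linspace(0.0, t_end, n_steps + 1)
    B = np.concatenate(([0.0], np.cumsum(dB)))

    # Closed-form solution S_t = S_0 exp(sigma B_t + (alpha - sigma^2 / 2) t).
    exact = s0 * np.exp(sigma * B + (alpha - 0.5 * sigma**2) * t)

    # Euler-Maruyama recursion S_{k+1} = S_k (1 + alpha dt + sigma dB_k).
    euler = s0 * np.concatenate(([1.0], np.cumprod(1.0 + alpha * dt + sigma * dB)))
    return t, exact, euler

t, exact, euler = simulate_gbm()
rel_err = np.max(np.abs(euler - exact) / exact)
print(f"maximal relative deviation along the path: {rel_err:.5f}")
```

Note the Itô correction $-\sigma^2/2$ in the exponent: dropping it gives a path that drifts away from the Euler approximation.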
where
\[ A_t = \int_0^t K_s\, ds \quad \text{and} \quad I_t = \int_0^t H_s\, dB_s \tag{8.1.2} \]
with (Ht )t<T and (Kt )t<T almost surely continuous and (FtB,P )-adapted. A stochas-
tic process of type (8.1.1) is called an Itô process. In order to compute and analyze
solutions of SDE we will apply Itô’s formula to Itô processes. Since the absolutely con-
tinuous process (At ) has finite variation, classical Stieltjes calculus applies to this part
of an Itô process. It remains to consider the stochastic integral part (It ):
Theorem 8.1 (Composition rule and quadratic variation). Suppose that $T$ is a predictable stopping time and $(H_t)_{t < T}$ is almost surely continuous and adapted.
(1). For any almost surely continuous, adapted process $(G_t)_{0 \le t < T}$, and for any $t \ge 0$,
\[ \lim_{n \to \infty} \sum_{\substack{s \in \pi_n \\ s < t}} G_s\, (I_{s' \wedge t} - I_s) = \int_0^t G_s H_s\, dB_s \tag{8.1.3} \]
(2). For any $t \ge 0$, the quadratic variation $[I]_t$ along $(\pi_n)$ is given by
\[ [I]_t = \lim_{n \to \infty} \sum_{\substack{s \in \pi_n \\ s < t}} (I_{s' \wedge t} - I_s)^2 = \int_0^t H_s^2\, ds \tag{8.1.4} \]
Remark (Uniform convergence). Similarly to the proof of Theorem 5.14 one can show that there is a sequence of bounded stopping times $T_k \nearrow T$ such that almost surely along a subsequence, the convergence in (8.1.3) and (8.1.4) holds uniformly on $[0, T_k]$ for any $k \in \mathbb{N}$.
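Proof. (1). We first fix $a > 0$ and assume that $H$ is in $L_a^2([0, a))$ and $G$ is bounded, left-continuous and adapted on $[0, \infty) \times \Omega$. Since $I_{s' \wedge t} - I_s = \int_s^{s' \wedge t} H_r\, dB_r$, we obtain
\[ \sum_{\substack{s \in \pi_n \\ s < t}} G_s\, (I_{s' \wedge t} - I_s) = \int_0^t G_{\lfloor r \rfloor_n} H_r\, dB_r \]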
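(2). We first assume that $H$ is in $L_a^2([0, \infty))$, continuous and bounded. Then for $s \in \pi_n$,
\[ \delta I_s = I_{s' \wedge t} - I_s = \int_s^{s' \wedge t} H_r\, dB_r = H_s\, \delta B_s + R_s^{(n)} \]
where $R_s^{(n)} := \int_s^{s' \wedge t} (H_r - H_{\lfloor r \rfloor_n})\, dB_r$. Therefore,
\[ \sum_{\substack{s \in \pi_n \\ s < t}} (\delta I_s)^2 = \sum_{\substack{s \in \pi_n \\ s < t}} H_s^2 (\delta B_s)^2 + 2 \sum_{\substack{s \in \pi_n \\ s < t}} R_s^{(n)} H_s\, \delta B_s + \sum_{\substack{s \in \pi_n \\ s < t}} (R_s^{(n)})^2 . \]
Since $[B]_t = t$ almost surely, the first term on the right-hand side converges to $\int_0^t H_s^2\, ds$ with probability one. It remains to show that the remainder terms converge to 0 in probability as $n \to \infty$. This is the case, since
\[ E\Big[ \sum (R_s^{(n)})^2 \Big] = \sum E[(R_s^{(n)})^2] = \sum \int_s^{s' \wedge t} E[(H_r - H_{\lfloor r \rfloor_n})^2]\, dr = \int_0^t E[(H_r - H_{\lfloor r \rfloor_n})^2]\, dr \longrightarrow 0 \]
by the Itô isometry and continuity and boundedness of $H$, whence $\sum (R_s^{(n)})^2 \to 0$ in $L^1$ and in probability, and $\sum R_s^{(n)} H_s\, \delta B_s \to 0$ in the same sense by the Schwarz inequality.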
For H defined up to a stopping time T , the assertion now follows by a localization
procedure similar to the one applied above.
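The identity (8.1.4) can be illustrated numerically. The sketch below assumes the bounded, continuous, adapted integrand $H_s = \cos(B_s)$ and approximates $I$ by its Riemann-Itô sums on a uniform grid, so that $\delta I_s = H_s\, \delta B_s$ and the remainder $R_s^{(n)}$ from the proof is dropped; the integrand, grid size and seed are illustrative assumptions.

```python
import numpy as np

def quadratic_variation_check(n_steps=100_000, t_end=1.0, seed=0):
    """Compare the sum of squared increments of I = int H dB with the
    Riemann sum for int_0^t H_s^2 ds along one simulated Brownian path."""
    rng = np.random.default_rng(seed)
    dt = t_end / n_steps
    dB = rng.normal(0.0, np.sqrt(dt), size=n_steps)
    B = np.concatenate(([0.0], np.cumsum(dB)))
    H = np.cos(B[:-1])              # integrand at left endpoints (adapted)
    dI = H * dB                     # increments of the Riemann-Ito sums
    qv = np.sum(dI**2)              # approximates [I]_t
    integral = np.sum(H**2) * dt    # approximates int_0^t H_s^2 ds
    return qv, integral

qv, integral = quadratic_variation_check()
print(f"sum of squared increments: {qv:.4f},  integral of H^2: {integral:.4f}")
```

Both numbers are random (they depend on the path), but they agree up to an error that vanishes as the mesh goes to 0.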
The theorem and the corresponding composition rule for Stieltjes integrals suggest that we may define stochastic integrals w.r.t. an Itô process
\[ X_t = X_0 + \int_0^t H_s\, dB_s + \int_0^t K_s\, ds, \qquad t < T, \]
Definition. Suppose that $(B_t)$ is a Brownian motion on $(\Omega, \mathcal{A}, P)$ w.r.t. a filtration $(\mathcal{F}_t)$, $X_0$ is an $(\mathcal{F}_0^P)$-measurable random variable, $T$ is a predictable $(\mathcal{F}_t^P)$-stopping time, and $(G_t)$, $(H_t)$ and $(K_t)$ are almost surely continuous, $(\mathcal{F}_t^P)$ adapted processes defined for $t < T$. Then the stochastic integral of $(G_t)$ w.r.t. $(X_t)$ is the Itô process defined by
\[ \int_0^t G_s\, dX_s = \int_0^t G_s H_s\, dB_s + \int_0^t G_s K_s\, ds, \qquad t < T. \]
By Theorem 8.1, this definition is consistent with a definition by Riemann sum approx-
imations. Moreover, the definition shows that the class of Itô processes w.r.t. a given
Brownian motion is closed under taking stochastic integrals! In particular, strong solu-
tions of SDE w.r.t. Itô processes are again Itô processes.
Linearity:
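Composition rule:
\[ dY = G\, dX \;\Rightarrow\; \widetilde{G}\, dY = \widetilde{G} G\, dX, \]
Quadratic variation:
\[ dY = G\, dX \;\Rightarrow\; d[Y] = G^2\, d[X], \]
Itô rule:
\[ dF(t, X) = \frac{\partial F}{\partial x}(t, X)\, dX + \frac{\partial F}{\partial t}(t, X)\, dt + \frac12 \frac{\partial^2 F}{\partial x^2}(t, X)\, d[X] \]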
All equations are to be understood in the sense that the corresponding stochastic inte-
grals over any interval [0, t], t < T , coincide almost surely.
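The proofs are straightforward. For example, if
\[ Y_t = Y_0 + \int_0^t G_s\, dX_s \]
and
\[ X_t = X_0 + \int_0^t K_s\, ds + \int_0^t H_s\, dB_s \]
then, by the definition above, $Y_t = Y_0 + \int_0^t G_s K_s\, ds + \int_0^t G_s H_s\, dB_s$, and hence
\[ \int_0^t \widetilde{G}_s\, dY_s = \int_0^t \widetilde{G}_s G_s K_s\, ds + \int_0^t \widetilde{G}_s G_s H_s\, dB_s = \int_0^t \widetilde{G}_s G_s\, dX_s \]
and
\[ [Y]_t = \Big[ \int_0^\bullet G_s H_s\, dB_s \Big]_t = \int_0^t G_s^2 H_s^2\, ds = \int_0^t G_s^2\, d[X]_s . \]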
Moreover, Theorem 8.1 guarantees that the stochastic integrals in Itô’s formula (which
are limits of Riemann-Itô sums) coincide with the stochastic integrals w.r.t. Itô processes
defined above.
Example (Option Pricing in continuous time I). We again consider the continuous
time asset price model introduced in the beginning of Chapter 8. Suppose an agent is
holding φt units of a single asset with price process (St ) at time t, and he invests the
remainder Vt − φt St of his wealth Vt in the money market with interest rate Rt . We
assume that (φt ) and (Rt ) are continuous adapted processes. Then the change of wealth
in a small time unit should be described by the Itô equation
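Similarly to the discrete time case, we consider the discounted wealth process
\[ \widetilde{V}_t := \exp\Big( -\int_0^t R_s\, ds \Big) V_t . \]
Since $t \mapsto \int_0^t R_s\, ds$ has finite variation, the Itô rule and the composition rule for stochastic integrals imply:
\[ \begin{aligned} d\widetilde{V}_t &= \exp\Big( -\int_0^t R_s\, ds \Big)\, dV_t - \exp\Big( -\int_0^t R_s\, ds \Big) R_t V_t\, dt \\ &= \exp\Big( -\int_0^t R_s\, ds \Big) \phi_t\, dS_t - \exp\Big( -\int_0^t R_s\, ds \Big) R_t \phi_t S_t\, dt \\ &= \phi_t \cdot \Big( \exp\Big( -\int_0^t R_s\, ds \Big)\, dS_t - \exp\Big( -\int_0^t R_s\, ds \Big) R_t S_t\, dt \Big) \\ &= \phi_t\, d\widetilde{S}_t, \end{aligned} \]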
We will now apply Itô’s formula to solutions of stochastic differential equations. Let $b, \sigma \in C(\mathbb{R}_+ \times I)$ where $I \subseteq \mathbb{R}$ is an open interval. Suppose that $(B_t)$ is an $(\mathcal{F}_t)$-Brownian motion on $(\Omega, \mathcal{A}, P)$, and $(X_t)_{0 \le t < T}$ is an $(\mathcal{F}_t^P)$-adapted process with values in $I$ and defined up to an $(\mathcal{F}_t^P)$ stopping time $T$ such that the SDE
\[ X_t - X_0 = \int_0^t b(s, X_s)\, ds + \int_0^t \sigma(s, X_s)\, dB_s \quad \text{for any } t < T \tag{8.1.5} \]
Corollary 8.2 (Doeblin 1941, Itô 1944). Let $F \in C^{1,2}(\mathbb{R}_+ \times I)$. Then almost surely,
\[ F(t, X_t) - F(0, X_0) = \int_0^t (\sigma F')(s, X_s)\, dB_s + \int_0^t \Big( \frac{\partial F}{\partial t} + \frac12 \sigma^2 F'' + b F' \Big)(s, X_s)\, ds \quad \text{for any } t < T, \tag{8.1.6} \]
Proof. Let $(\pi_n)$ be a sequence of partitions with $\operatorname{mesh}(\pi_n) \to 0$. Since the process $t \mapsto X_0 + \int_0^t b(s, X_s)\, ds$ has sample paths of locally finite variation, the quadratic variation of $(X_t)$ is given by
\[ [X]_t = \Big[ \int_0^\bullet \sigma(s, X_s)\, dB_s \Big]_t = \int_0^t \sigma(s, X_s)^2\, ds \quad \forall\, t < T \]
w.r.t. almost sure convergence along a subsequence of $(\pi_n)$. Hence Itô’s formula can be applied to almost every sample path of $(X_t)$, and we obtain
\[ \begin{aligned} F(t, X_t) - F(0, X_0) &= \int_0^t F'(s, X_s)\, dX_s + \int_0^t \frac{\partial F}{\partial t}(s, X_s)\, ds + \frac12 \int_0^t F''(s, X_s)\, d[X]_s \\ &= \int_0^t (\sigma F')(s, X_s)\, dB_s + \int_0^t (b F')(s, X_s)\, ds + \int_0^t \frac{\partial F}{\partial t}(s, X_s)\, ds + \frac12 \int_0^t (\sigma^2 F'')(s, X_s)\, ds \end{aligned} \]
for all $t < T$, $P$-almost surely. Here we have used (8.1.5) and the fact that the Itô integral w.r.t. $X$ is an almost sure limit of Riemann-Itô sums after passing once more to an appropriate subsequence of $(\pi_n)$.
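As a small worked instance of (8.1.6), consider the choice $F(t, x) = x^2$ (an illustrative choice, not taken from the text). Then $\partial F / \partial t = 0$, $F' = 2x$ and $F'' = 2$, so the formula reads:

```latex
\[
  X_t^2 - X_0^2
  = \int_0^t 2 X_s\, \sigma(s, X_s)\, dB_s
  + \int_0^t \big( \sigma(s, X_s)^2 + 2 X_s\, b(s, X_s) \big)\, ds
  \qquad \text{for any } t < T .
\]
```

The term $\sigma(s, X_s)^2\, ds$ is the Itô correction; it would be absent in the classical chain rule.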
\[ \frac{\partial c}{\partial t}(t, x) + r x\, \frac{\partial c}{\partial x}(t, x) + \frac12 \sigma^2 x^2\, \frac{\partial^2 c}{\partial x^2}(t, x) = r\, c(t, x) \tag{8.1.7} \]
\[ \phi_t = \frac{\partial c}{\partial x}(t, S_t) \quad (=: \text{Delta}). \tag{8.1.8} \]
Hint: Consider the discounted portfolio value $\widetilde{V}_t = e^{-rt} V_t$ and, correspondingly, the discounted option value $e^{-rt} c(t, S_t)$. Compute the Itô differentials, and conclude that both processes coincide if $c$ is a solution to (8.1.7) and $\phi_t$ is given by (8.1.8).
\[ (\mathcal{L} F)(t, x) = \frac12 \sigma(t, x)^2 F''(t, x) + b(t, x) F'(t, x). \]
In particular, in the time-homogeneous case and for $T = \infty$, any solution of (8.1.5) solves the martingale problem for the operator $\mathcal{L} F = \frac12 \sigma^2 F'' + b F'$ with domain $C_0^2(I)$.
Similarly as for Brownian motion, the martingales identified by the Itô-Doeblin formula
can be used to compute various expectation values for the Itô diffusion (Xt ). In the next
section we will look at first examples.
Remark (Uniqueness and Markov property of strong solutions). If the coefficients
are, for example, Lipschitz continuous, then the strong solution of the SDE (8.1.5) is
unique, and it has the strong Markov property, i.e., it is a diffusion process in the
classical sense (a strong Markov process with continuous sample paths). By the Itô-
Doeblin formula, the generator of this Markov process is an extension of the operator
(L , C02 (I)).
Although in general, uniqueness and the Markov property may not hold for solutions of
the SDE (8.1.5), we call any solution of this equation an Itô diffusion.
with a given Brownian motion $(B_t)$, $x_0 \in (0, \infty)$, and continuous time-homogeneous coefficients $b, \sigma : (0, \infty) \to \mathbb{R}$. We assume that the solution is defined up to the explosion time
\[ T = \sup_{\varepsilon, r > 0} T_{\varepsilon, r}, \qquad T_{\varepsilon, r} = \inf\{ t \ge 0 \mid X_t \notin (\varepsilon, r) \}. \]
\[ \mathcal{L} h = 0 \iff h'' = -\frac{2b}{\sigma^2}\, h' \iff (\log h')' = -\frac{2b}{\sigma^2}. \]
Therefore, the two-dimensional vector space of harmonic functions is spanned by the constant function 1 and by the function
\[ s(x) = \int_{x_0}^x \exp\Big( -\int_{x_0}^z \frac{2 b(y)}{\sigma(y)^2}\, dy \Big)\, dz. \]
s(x) is called a scale function of the process (Xt ). It is strictly increasing and harmonic
on (0, ∞). Hence we can think of s : (0, ∞) → (s(0), s(∞)) as a coordinate transfor-
mation, and the transformed process s(Xt ) is a local martingale up to the explosion time
T . Applying the martingale convergence theorem and the optional stopping theorem to
s(Xt ) one obtains:
(1). The exit time $T_{\varepsilon, r} = \inf\{ t \in [0, T) : X_t \notin (\varepsilon, r) \}$ is almost surely less than $T$.
(2). $P[T_\varepsilon < T_r] = P[X_{T_{\varepsilon, r}} = \varepsilon] = \dfrac{s(r) - s(x)}{s(r) - s(\varepsilon)}$.
(2). The scale function and the ruin probabilities depend only on the ratio b(x)/σ(x)2 .
In particular, we have
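and
\[ P[X_t \to 0 \text{ as } t \nearrow T] = P\Big[ \bigcup_{r < \infty} \bigcap_{\varepsilon > 0} \{ T_\varepsilon < T_r \} \Big] = \lim_{r \nearrow \infty} \lim_{\varepsilon \searrow 0} \frac{s(r) - s(x_0)}{s(r) - s(\varepsilon)}. \]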
(1). If s(0) = −∞ and s(∞) = ∞, then the process (Xt ) is recurrent, i.e.,
and
\[ P\Big[ \lim_{t \nearrow T} X_t = \infty \Big] = \frac{s(x_0) - s(0)}{s(\infty) - s(0)} \]
Intuitively, if s(0) = −∞, in the natural scale the boundary is transformed to −∞,
which is not a possible limit for the local martingale s(Xt ), whereas otherwise s(0) is
finite and approached by s(Xt ) with strictly positive probability.
Remark (Explosion in finite time, Feller’s test). Corollary 8.4 does not tell us whether the explosion time $T$ is infinite with probability one. It can be shown that this is always the case if $(X_t)$ is recurrent. In general, Feller’s test for explosions provides a necessary and sufficient condition for the absence of explosion in finite time. The idea is to compute a function $g \in C(0, \infty)$ such that $e^{-t} g(X_t)$ is a local martingale and to apply the optional stopping theorem. The details are more involved than in the proof of the corollary above, cf. e.g. Section 6.2 in [Durrett: Stochastic calculus].
and hence
\[ s'(x) = \text{const.} \cdot \exp\Big( -\int_{x_0}^x \frac{2\alpha}{\sigma^2 y}\, dy \Big) = \text{const.} \cdot x^{-2\alpha / \sigma^2}. \]
Therefore,
which again shows that $S_t \to \infty$ for $\alpha > \sigma^2 / 2$, $S_t \to 0$ for $\alpha < \sigma^2 / 2$, and $S_t$ is recurrent for $\alpha = \sigma^2 / 2$.
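The exit probabilities for geometric Brownian motion can be evaluated directly from the scale function. The sketch below assumes the normalization $s'(x) = x^{-2\alpha/\sigma^2}$ and $s(x_0) = 0$ with $x_0 = 1$ (any other choice of the constants gives the same probabilities); the function names are hypothetical.

```python
import math

def gbm_scale_function(x, alpha, sigma, x0=1.0):
    """Scale function of geometric Brownian motion with s'(x) = x**(-2*alpha/sigma**2),
    normalized so that s(x0) = 0."""
    kappa = 1.0 - 2.0 * alpha / sigma**2
    if abs(kappa) < 1e-12:
        # Borderline case alpha = sigma^2 / 2: s'(x) = 1 / x, hence s is a logarithm.
        return math.log(x / x0)
    return (x**kappa - x0**kappa) / kappa

def prob_hit_eps_before_r(x, eps, r, alpha, sigma):
    """P[T_eps < T_r] = (s(r) - s(x)) / (s(r) - s(eps)) for a start in x in (eps, r)."""
    s = lambda z: gbm_scale_function(z, alpha, sigma)
    return (s(r) - s(x)) / (s(r) - s(eps))

for alpha in (0.25, 0.5, 1.0):
    p = prob_hit_eps_before_r(x=1.0, eps=0.5, r=2.0, alpha=alpha, sigma=1.0)
    print(f"alpha={alpha}: P[T_eps < T_r] = {p:.4f}")
```

Increasing the drift $\alpha$ lowers the probability of reaching the lower level first, in line with the asymptotics stated above.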
with given constants $\beta \in \mathbb{R}$, $\sigma > 0$, and values in $\mathbb{R}_+$. Note that in contrast to the equation of geometric Brownian motion, the multiplicative factor $\sqrt{X_t}$ in the noise term is not a linear function of $X_t$. As a consequence, there is no explicit formula for a solution of (8.2.2). Nevertheless, a general existence result guarantees the existence of a strong solution defined up to the explosion time
\[ T = \sup_{\varepsilon, r > 0} T_{\mathbb{R} \setminus (\varepsilon, r)}, \]
is the number of offspring of the i-th individual in the l-th generation. We assume that
the mean and the variance of the offspring distribution are given by
We are interested in a scaling limit of the model as the size $h$ of time steps goes to 0. To establish convergence to a limit process as $h \searrow 0$ we rescale the population size by $h$, i.e., we consider the process
\[ X_t^h := h \cdot Z^h_{\lfloor t \rfloor_h}, \qquad t \in [0, \infty). \]
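\[ E[X_{t+h}^h - X_t^h \mid \mathcal{F}_t^h] = h \cdot E[Z_{t+h}^h - Z_t^h \mid \mathcal{F}_t^h] = h \eta h Z_t^h = h \beta X_t^h, \]
\[ \operatorname{Var}[X_{t+h}^h - X_t^h \mid \mathcal{F}_t^h] = h^2 \cdot \operatorname{Var}[Z_{t+h}^h - Z_t^h \mid \mathcal{F}_t^h] = h^2 \sigma^2 Z_t^h = h \sigma^2 X_t^h, \]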
We now analyze the asymptotics of solutions of (8.2.2). The ratio of drift and diffusion coefficient is $\beta x / (\sigma \sqrt{x})^2 = \beta / \sigma^2$, and hence the derivative of a scale function is
\[ s'(x) = \text{const.} \cdot \exp(-2 \beta x / \sigma^2). \]
Thus $s(0)$ is always finite, and $s(\infty) = \infty$ if and only if $\beta \le 0$. Therefore, by Corollary 8.4, in the subcritical and critical case $\beta \le 0$, we obtain
Cox-Ingersoll-Ross model
The CIR model is a model for the stochastic evolution of interest rates or volatilities. The equation is
\[ dR_t = (\alpha - \beta R_t)\, dt + \sigma \sqrt{R_t}\, dB_t, \qquad R_0 = x_0, \tag{8.2.3} \]
with a one-dimensional Brownian motion $(B_t)$ and positive constants $\alpha, \beta, \sigma > 0$. Although the s.d.e. looks similar to the equation for Feller’s branching diffusion, the behaviour of the drift coefficient near 0 is completely different. In fact, the idea is that the positive drift $\alpha$ pushes the process away from 0 so that a recurrent process on $(0, \infty)$ is obtained. We will see that this intuition is true for $\alpha \ge \sigma^2 / 2$ but not for $\alpha < \sigma^2 / 2$.
Again, there is no explicit solution for the s.d.e. (8.2.3), but existence of a strong solution holds. The ratio of the drift and diffusion coefficient is $(\alpha - \beta x) / (\sigma^2 x)$, which yields
\[ s'(x) = \text{const.} \cdot x^{-2\alpha / \sigma^2} e^{2 \beta x / \sigma^2}. \]
Hence $s(\infty) = \infty$ for any $\beta > 0$, and $s(0) = -\infty$ if and only if $2\alpha \ge \sigma^2$. Therefore, the CIR process is recurrent if and only if $\alpha \ge \sigma^2 / 2$, whereas $R_t \to 0$ as $t \nearrow T$ almost surely otherwise.
By applying Itô’s formula one can now prove that $R_t$ has finite moments, and compute the expectation and variance explicitly. Indeed, taking expectation values in the s.d.e.
\[ R_t = x_0 + \int_0^t (\alpha - \beta R_s)\, ds + \int_0^t \sigma \sqrt{R_s}\, dB_s, \]
we obtain informally
\[ \frac{d}{dt} E[R_t] = \alpha - \beta E[R_t], \]
and hence by variation of constants,
\[ E[R_t] = x_0 \cdot e^{-\beta t} + \frac{\alpha}{\beta} (1 - e^{-\beta t}). \]
To make this argument rigorous requires proving that the local martingale $t \mapsto \int_0^t \sigma \sqrt{R_s}\, dB_s$ is indeed a martingale:
(1). Show by applying Itô’s formula to $x \mapsto |x|^p$ that $E[|R_t|^p] < \infty$ for any $t \ge 0$ and $p \ge 1$.
(3). Proceed in a similar way to compute the variance of $R_t$. Find its asymptotic value $\lim_{t \to \infty} \operatorname{Var}[R_t]$.
where (Bt ) is a Brownian motion, and the coefficients are deterministic continuous
functions β, σ : [0, ∞) → R. Hence the drift term βt Xt is linear in Xt , and the diffusion
coefficient does not depend on Xt , i.e., the noise increment σt dBt is proportional to
white noise dBt with a proportionality factor that does not depend on Xt .
Variation of constants
An explicit strong solution of the SDE (8.3.1) can be computed by a “variation of constants” Ansatz. We first note that the general solution in the deterministic case $\sigma_t \equiv 0$ is given by
\[ X_t = \text{const.} \cdot \exp\Big( \int_0^t \beta_s\, ds \Big). \]
with a continuous Itô process $(C_t)$ driven by the Brownian motion $(B_t)$. By the Itô product rule,
\[ dX_t = \beta_t X_t\, dt + \exp\Big( \int_0^t \beta_s\, ds \Big)\, dC_t. \]
i.e.,
\[ C_t = C_0 + \int_0^t \exp\Big( -\int_0^r \beta_s\, ds \Big) \sigma_r\, dB_r. \]
We thus obtain:
Theorem 8.5. The almost surely unique strong solution of the SDE (8.3.1) with initial value $x$ is given by
\[ X_t^x = x \cdot \exp\Big( \int_0^t \beta_s\, ds \Big) + \int_0^t \exp\Big( \int_r^t \beta_s\, ds \Big) \sigma_r\, dB_r . \]
Note that the theorem not only yields an explicit solution but it also shows that the solution depends smoothly on the initial value $x$. The effect of the noise on the solution is additive and given by a Wiener-Itô integral, i.e., an Itô integral with deterministic integrand. The average value
\[ E[X_t^x] = x \cdot \exp\Big( \int_0^t \beta_s\, ds \Big), \tag{8.3.2} \]
coincides with the solution in the absence of noise, and the mean-square deviation from this solution due to random perturbation of the equation is
\[ \operatorname{Var}[X_t^x] = \operatorname{Var}\Big[ \int_0^t \exp\Big( \int_r^t \beta_s\, ds \Big) \sigma_r\, dB_r \Big] = \int_0^t \exp\Big( 2 \int_r^t \beta_s\, ds \Big) \sigma_r^2\, dr \]
Lemma 8.6. For any deterministic function $h \in L^2(0, t)$, the Wiener-Itô integral $I_t = \int_0^t h_s\, dB_s$ is normally distributed with mean 0 and variance $\int_0^t h_s^2\, ds$.
Proof. Suppose first that $h = \sum_{i=0}^{n-1} c_i \cdot I_{(t_i, t_{i+1}]}$ is a step function with $n \in \mathbb{N}$, $c_0, \ldots, c_{n-1} \in \mathbb{R}$, and $0 \le t_0 < t_1 < \ldots < t_n$. Then $I_t = \sum_{i=0}^{n-1} c_i \cdot (B_{t_{i+1}} - B_{t_i})$ is normally distributed with mean zero and variance
\[ \operatorname{Var}[I_t] = \sum_{i=0}^{n-1} c_i^2 (t_{i+1} - t_i) = \int_0^t h_s^2\, ds. \]
In general, there exists a sequence $(h^{(n)})_{n \in \mathbb{N}}$ of step functions such that $h^{(n)} \to h$ in $L^2(0, t)$, and
\[ I_t = \int_0^t h\, dB = \lim_{n \to \infty} \int_0^t h^{(n)}\, dB \quad \text{in } L^2(\Omega, \mathcal{A}, P). \]
\[ E[I_t] = 0 \quad \text{and} \quad \operatorname{Cov}[I_t, I_s] = \int_0^{t \wedge s} h_r^2\, dr \quad \text{for any } t, s \ge 0. \]
Proof. Let $0 \le t_1 < \ldots < t_n$. To show that $(I_{t_1}, \ldots, I_{t_n})$ has a normal distribution it suffices to prove that any linear combination of the random variables $I_{t_1}, \ldots, I_{t_n}$ is normally distributed. This holds true since any linear combination is again an Itô integral with deterministic integrand:
\[ \sum_{i=1}^n \lambda_i I_{t_i} = \int_0^{t_n} \sum_{i=1}^n \lambda_i \cdot I_{(0, t_i)}(s)\, h_s\, dB_s \]
\[ \begin{aligned} \operatorname{Cov}[I_t, I_s] &= E[I_t I_s] \\ &= E\Big[ \int_0^\infty h_r \cdot I_{(0,t)}(r)\, dB_r \int_0^\infty h_r \cdot I_{(0,s)}(r)\, dB_r \Big] \\ &= (h \cdot I_{(0,t)},\, h \cdot I_{(0,s)})_{L^2(0,\infty)} \\ &= \int_0^{s \wedge t} h_r^2\, dr. \end{aligned} \]
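The statement of Lemma 8.6 can be checked by simulation. The sketch below assumes the integrand $h_s = s$ on $[0, 1]$, for which the variance is $\int_0^1 s^2\, ds = 1/3$, and approximates the Wiener-Itô integral by Riemann-Itô sums with left endpoints; the function name and all numerical parameters are illustrative.

```python
import numpy as np

def sample_wiener_integral(h, t_end=1.0, n_steps=200, n_paths=10_000, seed=0):
    """Draw samples of the Riemann-Ito sum approximating int_0^t h_s dB_s
    for a deterministic integrand h (a vectorized function of time)."""
    rng = np.random.default_rng(seed)
    dt = t_end / n_steps
    grid = np.arange(n_steps) * dt                 # left endpoints of the partition
    dB = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
    return dB @ h(grid)                            # sum_k h(s_k) (B_{s_{k+1}} - B_{s_k})

samples = sample_wiener_integral(lambda s: s)
print(f"sample mean:     {samples.mean():.4f}  (expected 0)")
print(f"sample variance: {samples.var():.4f}  (expected about 1/3)")
```

A histogram of `samples` should also look Gaussian, since each sample is a linear combination of independent normal increments.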
More generally, by Theorem 8.7 and Theorem 8.5, any solution $(X_t)$ of a linear SDE with additive noise and deterministic (or Gaussian) initial value is a continuous Gaussian process. In fact by (8.3.1), the marginals of $(X_t)$ are affine functions of the corresponding marginals of a Wiener-Itô integral:
\[ X_t^x = \frac{1}{h_t} \cdot \Big( x + \int_0^t h_r \sigma_r\, dB_r \Big) \quad \text{with } h_r = \exp\Big( -\int_0^r \beta_u\, du \Big). \]
Hence all finite dimensional marginals of $(X_t^x)$ are normally distributed with
\[ E[X_t^x] = x / h_t \quad \text{and} \quad \operatorname{Cov}[X_t^x, X_s^x] = \frac{1}{h_t h_s} \cdot \int_0^{t \wedge s} h_r^2 \sigma_r^2\, dr. \]
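Gaussian process. The unique strong solution of the s.d.e. (8.3.3) with initial condition $x$ is given explicitly by
\[ V_t^x = e^{-\gamma t} \Big( x + \sigma \int_0^t e^{\gamma s}\, dB_s \Big). \tag{8.3.4} \]
In particular,
\[ E[V_t^x] = e^{-\gamma t} x, \]
and
\[ \begin{aligned} \operatorname{Cov}[V_t^x, V_s^x] &= e^{-\gamma (t+s)} \sigma^2 \int_0^{t \wedge s} e^{2 \gamma r}\, dr \\ &= \frac{\sigma^2}{2\gamma} \big( e^{-\gamma |t-s|} - e^{-\gamma (t+s)} \big) \quad \text{for any } t, s \ge 0. \end{aligned} \]
Note that as $t \to \infty$, the effect of the initial condition decays exponentially fast with rate $\gamma$. Similarly, the correlations between $V_t^x$ and $V_s^x$ decay exponentially as $|t - s| \to \infty$. The distribution at time $t$ is
\[ V_t^x \sim N\Big( e^{-\gamma t} x,\ \frac{\sigma^2}{2\gamma} (1 - e^{-2\gamma t}) \Big). \tag{8.3.5} \]
In particular, as $t \to \infty$
\[ V_t^x \xrightarrow{\ \mathcal{D}\ } N\Big( 0, \frac{\sigma^2}{2\gamma} \Big). \]
One easily verifies that $N(0, \sigma^2 / (2\gamma))$ is an equilibrium for the process: If $V_0 \sim N(0, \sigma^2 / (2\gamma))$ and $(B_t)$ is independent of $V_0$ then
\[ \begin{aligned} V_t &= e^{-\gamma t} V_0 + \sigma \int_0^t e^{\gamma (s - t)}\, dB_s \\ &\sim N\Big( 0,\ \frac{\sigma^2}{2\gamma} e^{-2\gamma t} + \sigma^2 \int_0^t e^{2\gamma (s - t)}\, ds \Big) = N(0, \sigma^2 / (2\gamma)) \end{aligned} \]
for any $t \ge 0$.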
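The equilibrium property can be observed in a simulation. The sketch below assumes that the process is advanced over a time step $h$ by sampling from the Gaussian law in (8.3.5) with the current value as initial condition, which relies on the time-homogeneity and the Markov property of the solution; the function name, parameter values and seed are illustrative.

```python
import numpy as np

def ou_exact_step(v, h, gamma, sigma, rng):
    """Advance Ornstein-Uhlenbeck samples by time h using the Gaussian law
    N(exp(-gamma h) v, sigma^2 (1 - exp(-2 gamma h)) / (2 gamma))."""
    decay = np.exp(-gamma * h)
    noise_std = sigma * np.sqrt((1.0 - decay**2) / (2.0 * gamma))
    return decay * v + noise_std * rng.normal(size=np.shape(v))

gamma, sigma, h = 1.5, 0.8, 0.1
rng = np.random.default_rng(0)
stationary_var = sigma**2 / (2.0 * gamma)

# Start in the equilibrium N(0, sigma^2 / (2 gamma)) and iterate the transition.
v = rng.normal(0.0, np.sqrt(stationary_var), size=50_000)
for _ in range(20):
    v = ou_exact_step(v, h, gamma, sigma, rng)

print(f"stationary variance: {stationary_var:.4f},  sample variance after 20 steps: {v.var():.4f}")
```

Starting instead from a deterministic value, the sample variance approaches the same limit at the exponential rate $2\gamma$.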
Remark. The pathwise counterpart of the Markov property used in the proof above is called the cocycle property of the stochastic flow $x \mapsto V_t^x$.
The Itô-Doeblin formula can now be used to identify the generator of the Ornstein-Uhlenbeck process: Taking expectation values, we obtain the forward equation
\[ E[F(V_t^x)] = F(x) + \int_0^t E[(\mathcal{L} F)(V_s^x)]\, ds \]
whence
\[ \lim_{t \searrow 0} \frac{(p_t f)(x) - f(x)}{t} = \lim_{t \searrow 0} \frac1t \int_0^t E[(\mathcal{L} f)(V_s^x)]\, ds = (\mathcal{L} f)(x) \]
by continuity and dominated convergence. This shows that the infinitesimal generator of the Ornstein-Uhlenbeck process is an extension of the operator $(\mathcal{L}, C_0^2(\mathbb{R}))$.
Change of time-scale
We will now prove that Wiener-Itô integrals can also be represented as Brownian motion with a coordinate transformation on the time axis. Hence solutions of one-dimensional linear SDE with additive noise are affine functions of time changed Brownian motions. We first note that a Wiener-Itô integral $I_t = \int_0^t h_r\, dB_r$ with $h \in L^2_{\mathrm{loc}}(0, \infty)$ is a continuous centered Gaussian process with covariance
\[ \operatorname{Cov}[I_t, I_s] = \int_0^{t \wedge s} h_r^2\, dr = \tau(t) \wedge \tau(s) \]
where
\[ \tau(t) := \int_0^t h_r^2\, dr = \operatorname{Var}[I_t] \]
These are exactly the covariances of a Brownian motion. Since a continuous Gaussian process is uniquely determined by its expectations and covariances, we can conclude:
Theorem 8.9 (Wiener-Itô integrals as time changed Brownian motions). The process $\widetilde{B}_s := I_{\tau^{-1}(s)}$, $0 \le s < \tau(\infty)$, is a Brownian motion, and
\[ I_t = \widetilde{B}_{\tau(t)} \quad \text{for any } t \ge 0,\ P\text{-almost surely.} \]
Proof. Since $(\widetilde{B}_s)_{0 \le s < \tau(\infty)}$ has the same marginal distributions as the Wiener-Itô integral $(I_t)_{t \ge 0}$ (but at different times), $(\widetilde{B}_s)$ is again a continuous centered Gaussian process. Moreover, $\operatorname{Cov}[\widetilde{B}_t, \widetilde{B}_s] = t \wedge s$, so that $(\widetilde{B}_s)$ is indeed a Brownian motion.
Note that the argument above is different from previous considerations in the sense that the Brownian motion $(\widetilde{B}_s)$ is constructed from the process $(I_t)$ and not vice versa. This means that we cannot represent $(I_t)$ as a time-change of a given Brownian motion (e.g. $(B_t)$) but we can only show that there exists a Brownian motion $(\widetilde{B}_s)$ such that $I$ is a time-change of $\widetilde{B}$. This way of representing stochastic processes w.r.t. Brownian motions that are constructed from the process corresponds to the concept of weak solutions of stochastic differential equations, where the driving Brownian motion is not given a priori. We return to these ideas in Section 9, where we will also prove that continuous local martingales can be represented as time-changed Brownian motions.
Theorem 8.9 enables us to represent solutions of linear SDE with additive noise by time-changed Brownian motions. We demonstrate this with an example: By the explicit formula (8.3.4) for the solution of the Ornstein-Uhlenbeck SDE, we obtain:
\[ V_t^x = e^{-\gamma t} \Big( x + \sigma \widetilde{B}_{\frac{1}{2\gamma} (e^{2\gamma t} - 1)} \Big) \]
Proof. The corresponding time change for the Wiener-Itô integral is given by
\[ \tau(t) = \int_0^t \exp(2\gamma s)\, ds = (\exp(2\gamma t) - 1) / (2\gamma). \]
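The time-change representation gives a direct way to sample the Ornstein-Uhlenbeck process at given times. The sketch below assumes that the Brownian motion $\widetilde{B}$ is sampled at the increasing times $\tau(t_1) < \tau(t_2) < \ldots$ through independent Gaussian increments; the function name, the chosen times and the parameters are illustrative.

```python
import numpy as np

def ou_via_time_change(x, gamma, sigma, times, n_paths=20_000, seed=0):
    """Sample V_t = exp(-gamma t) (x + sigma B~_{tau(t)}) at the given increasing
    times, where tau(t) = (exp(2 gamma t) - 1) / (2 gamma)."""
    rng = np.random.default_rng(seed)
    times = np.asarray(times, dtype=float)
    tau = (np.exp(2.0 * gamma * times) - 1.0) / (2.0 * gamma)
    # Increments of B~ over [tau(t_{i-1}), tau(t_i)] are independent N(0, d_tau).
    d_tau = np.diff(np.concatenate(([0.0], tau)))
    B_tilde = np.cumsum(rng.normal(size=(n_paths, len(times))) * np.sqrt(d_tau), axis=1)
    return np.exp(-gamma * times) * (x + sigma * B_tilde)

times = [0.5, 1.0, 2.0]
samples = ou_via_time_change(x=1.0, gamma=1.0, sigma=0.5, times=times)
for i, t in enumerate(times):
    target_var = 0.5**2 / 2.0 * (1.0 - np.exp(-2.0 * t))   # variance from (8.3.5)
    print(f"t={t}: mean={samples[:, i].mean():.4f}, var={samples[:, i].var():.4f}, "
          f"target var={target_var:.4f}")
```

The sample means and variances should match $e^{-\gamma t} x$ and the variance in (8.3.5) up to Monte Carlo error.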
Wiener-Lévy construction
Recall that the Brownian motion $(B_t)$ has the Wiener-Lévy representation
\[ B_t(\omega) = Y(\omega)\, t + \sum_{n=0}^\infty \sum_{k=0}^{2^n - 1} Y_{n,k}(\omega)\, e_{n,k}(t) \quad \text{for } t \in [0, 1] \tag{8.4.1} \]
are independent. This suggests that we can construct the bridge by replacing $Y(\omega)$ by the constant value $y$. Let
\[ X_t^y := yt + X_t = B_t + (y - B_1) \cdot t, \]
and let $\mu_y$ denote the distribution of the process $(X_t^y)_{0 \le t \le 1}$ on $C([0, 1])$. The next theorem shows that $X_t^y$ is indeed a Brownian motion conditioned to end at $y$ at time 1:
(2). $P[(B_t)_{0 \le t \le 1} \in A \mid B_1] = \mu_{B_1}[A]$ holds $P$-almost surely for any given Borel subset $A \subseteq C([0, 1])$.
(3). If $F : C([0, 1]) \to \mathbb{R}$ is a bounded and continuous function (w.r.t. the supremum norm on $C([0, 1])$) then the map $y \mapsto \int F\, d\mu_y$ is continuous.
The last statement says that $y \mapsto \mu_y$ is a continuous function w.r.t. the topology of weak convergence.
Finite-dimensional distributions
We now compute the marginals of the Brownian bridge $X_t^y$:
Corollary 8.12. For any $n \in \mathbb{N}$ and $0 < t_1 < \ldots < t_n < 1$, the distribution of $(X_{t_1}^y, \ldots, X_{t_n}^y)$ on $\mathbb{R}^n$ is absolutely continuous with density
\[ f_y(x_1, \ldots, x_n) = \frac{p_{t_1}(0, x_1)\, p_{t_2 - t_1}(x_1, x_2) \cdots p_{t_n - t_{n-1}}(x_{n-1}, x_n)\, p_{1 - t_n}(x_n, y)}{p_1(0, y)}. \tag{8.4.2} \]
\[ f_{B_{t_1}, \ldots, B_{t_n}, B_1}(x_1, \ldots, x_n, y) = p_{t_1}(0, x_1)\, p_{t_2 - t_1}(x_1, x_2) \cdots p_{t_n - t_{n-1}}(x_{n-1}, x_n)\, p_{1 - t_n}(x_n, y). \]
Since the distribution of $(X_{t_1}^y, \ldots, X_{t_n}^y)$ is a regular version of the conditional distribution of $(B_{t_1}, \ldots, B_{t_n})$ given $B_1$, it is absolutely continuous with the conditional density
In general, any almost surely continuous process on [0, 1] with marginals given by
(8.4.2) is called a Brownian bridge from 0 to y in time 1. A Brownian bridge from x
to y in time t is defined correspondingly for any x, y ∈ R and any t > 0. In fact, this
definition of the bridge process in terms of the marginal distributions carries over from
Brownian motion to arbitrary Markov processes with strictly positive transition densi-
ties. In the case of the Brownian bridge, the marginals are again normally distributed:
Theorem 8.13 (Brownian bridge as a Gaussian process). The Brownian bridge from
0 to y in time 1 is the (in distribution unique) continuous Gaussian process (Xty )t∈[0,1]
with
The function $c(t, s)$ is the Green function of the operator $d^2 / dt^2$ with Dirichlet boundary conditions on the interval $[0, 1]$. This is related to the fact that the distribution of the
The second derivative $d^2 / dt^2$ is the linear operator associated with this quadratic form.
• It cannot be carried over to more general diffusion processes with possibly non-linear drift and diffusion coefficients.
• The bridge $X_t^y = B_t + t (y - B_1)$ does not depend on $(B_t)$ in an adapted way, because the terminal value $B_1$ is required to define $X_t^y$ for any $t > 0$.
We will now show how to construct a Brownian bridge from a Brownian motion in an
adapted way. The idea is to consider an SDE w.r.t. the given Brownian motion with a
drift term that forces the solution to end at a given point at time 1. The size of the drift
term will be large if the process is still far away from the given terminal point at a time
close to 1. For simplicity we consider a bridge (Xt ) from 0 to 0 in time 1. Brownian
bridges with other end points can be constructed similarly. Since the Brownian bridge
is a Gaussian process, we may hope that there is a linear stochastic differential equation
with additive noise that has a Brownian bridge as a solution. We therefore try the Ansatz
The equation (8.4.5) holds if and only if $h_t$ is a constant multiple of $1 / (1 - t)$, and in this case
\[ \beta_t = \frac{d}{dt} \log h_t = \frac{h_t'}{h_t} = \frac{1}{1 - t} \quad \text{for } t \in [0, 1). \]
Summarizing, we have shown:
Theorem 8.14. If $(B_t)$ is a Brownian motion then the process $(X_t)$ defined by
\[ X_t = \int_0^t \frac{1 - t}{1 - r}\, dB_r \quad \text{for } t \in [0, 1), \qquad X_1 = 0, \]
is a Brownian bridge from 0 to 0 in time 1.
Proof. As shown above, $(X_t)_{t \in [0, 1)}$ is a continuous centered Gaussian process with the covariances of the Brownian bridge. Hence its distribution on $C([0, 1))$ coincides with that of the Brownian bridge from 0 to 0. In particular, this implies $\lim_{t \nearrow 1} X_t = 0$ almost surely, so the trivial extension from $[0, 1)$ to $[0, 1]$ defined by $X_1 = 0$ is a Brownian bridge.
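The adapted construction can be simulated directly from the Wiener-Itô integral in Theorem 8.14. The sketch below assumes a left-endpoint Riemann-Itô sum on a uniform grid and compares the sample variance at $t = 1/2$ with the value $t(1 - t) = 1/4$, which is the variance of a Brownian bridge from 0 to 0 in time 1 (a standard fact); grid size, number of paths and seed are illustrative.

```python
import numpy as np

def bridge_from_wiener_integral(n_steps=400, n_paths=5_000, seed=0):
    """Approximate X_t = (1 - t) int_0^t dB_r / (1 - r) on a uniform grid of [0, 1].
    Returns the grid times t_1, ..., t_n and the simulated values at these times."""
    rng = np.random.default_rng(seed)
    dt = 1.0 / n_steps
    r = np.arange(n_steps) * dt                      # left endpoints, all < 1
    dB = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
    integral = np.cumsum(dB / (1.0 - r), axis=1)     # Riemann-Ito sums of dB_r / (1 - r)
    t = r + dt                                       # right endpoints t_k = r_k + dt
    return t, (1.0 - t) * integral

t, X = bridge_from_wiener_integral()
mid = np.argmin(np.abs(t - 0.5))
print(f"sample variance at t=0.5: {X[:, mid].var():.4f}  (bridge variance: 0.25)")
print(f"largest |X| at the terminal time: {np.max(np.abs(X[:, -1])):.2e}")
```

Only increments of $B$ up to time $t$ enter $X_t$, in contrast to the representation $B_t - t B_1$, which uses the terminal value.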
\[ dX_t = b(t, X_t)\, dt + \sum_{k=1}^d \sigma_k(t, X_t)\, dB_t^k. \tag{8.5.1} \]
Theorem 8.15 (Existence, uniqueness and stability under global Lipschitz conditions). Suppose that $b$ and $\sigma$ satisfy a global Lipschitz condition of the following form: For any $t_0 \in \mathbb{R}$, there exists a constant $L \in \mathbb{R}_+$ such that
\[ |b(t, x) - b(t, \widetilde{x})| + \|\sigma(t, x) - \sigma(t, \widetilde{x})\| \le L \cdot |x - \widetilde{x}| \quad \forall\, t \in [0, t_0],\ x, \widetilde{x} \in \mathbb{R}^n. \tag{8.5.3} \]
Then for any initial value $x \in \mathbb{R}^n$, the SDE (8.5.2) has a unique (up to equivalence) strong solution $(X_t)_{t \in [0, \infty)}$ such that $X_0 = x$ $P$-almost surely.
Furthermore, if $(X_t)$ and $(\widetilde{X}_t)$ are two strong solutions with arbitrary initial conditions, then for any $t \in \mathbb{R}_+$, there exists a finite constant $C(t)$ such that
\[ E\Big[ \sup_{s \in [0, t]} |X_s - \widetilde{X}_s|^2 \Big] \le C(t) \cdot E\big[ |X_0 - \widetilde{X}_0|^2 \big]. \]
The proof of Theorem 8.15 is outlined in the exercises below. In Section 12.1, we will
prove more general results that contain the assertion of the theorem as a special case. In
particular, we will see that existence up to an explosion time and uniqueness of strong
solutions still hold true if one assumes only a local Lipschitz condition.
The key step for proving stability and uniqueness is to control the deviation
$$\varepsilon_t := E\Big[\sup_{s \le t} |X_s - \widetilde{X}_s|^2\Big]$$
between two solutions up to time t. Existence of strong solutions can then be shown by
a Picard-Lindelöf approximation based on a corresponding norm:
Exercise (Proof of stability and uniqueness). Suppose that $(X_t)$ and $(\widetilde{X}_t)$ are strong
solutions of (8.5.2), and let $t_0 \in \mathbb{R}_+$. Apply Itô’s isometry and Gronwall’s inequality to
show that if (8.5.3) holds, then there exists a finite constant $C \in \mathbb{R}_+$ such that for any
$t \le t_0$,
$$\varepsilon_t \le C \cdot \Big(\varepsilon_0 + \int_0^t \varepsilon_s\, ds\Big), \quad \text{and} \tag{8.5.4}$$
$$\varepsilon_t \le C \cdot e^{Ct}\, \varepsilon_0. \tag{8.5.5}$$
Hence conclude that two strong solutions with the same initial value coincide almost
surely.
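Let $\Delta_t^n := E[\sup_{s \le t} |X_s^{n+1} - X_s^n|^2]$. Show that if (8.5.3) holds, then for any $t_0 \in \mathbb{R}_+$,
there exists a finite constant $C(t_0)$ such that
$$\Delta_t^{n+1} \le C(t_0) \int_0^t \Delta_s^n\, ds \quad \text{for any } n \ge 0 \text{ and } t \le t_0, \text{ and}$$
$$\Delta_t^n \le C(t_0)^n\, \frac{t^n}{n!}\, \Delta_t^0 \quad \text{for any } n \in \mathbb{N} \text{ and } t \le t_0.$$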
Hence conclude that the limit Xs = limn→∞ Xsn exists uniformly for s ∈ [0, t0 ] with
probability one, and X is a strong solution of (8.5.2) with X0 = x.
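The factorial decay of the distances between successive Picard-Lindelöf iterates can be observed on a single discretized Brownian path. The sketch below is only an illustration under stated assumptions: the coefficients b(x) = −x and σ(x) = x/2 are hypothetical Lipschitz examples, the integrals are replaced by left-point sums on a uniform grid, and the name `picard_iterates` is not from the text.

```python
import math
import random

def picard_iterates(b, sigma, x0, increments, dt, n_iter):
    """Discretized Picard-Lindelof iteration on one fixed Brownian path.

    Returns the sup-distances sup_k |X^{n+1}_k - X^n_k| between successive iterates.
    """
    n_steps = len(increments)
    current = [x0] * (n_steps + 1)          # X^0 is constant equal to x0
    distances = []
    for _ in range(n_iter):
        nxt = [x0]
        for k in range(n_steps):
            # left-point sums approximate int b(X^n) ds + int sigma(X^n) dB
            nxt.append(nxt[-1] + b(current[k]) * dt + sigma(current[k]) * increments[k])
        distances.append(max(abs(u - v) for u, v in zip(nxt, current)))
        current = nxt
    return distances

rng = random.Random(1)
n_steps, t0 = 100, 1.0
dt = t0 / n_steps
dB = [rng.gauss(0.0, math.sqrt(dt)) for _ in range(n_steps)]   # Brownian increments
dist = picard_iterates(lambda x: -x, lambda x: 0.5 * x, 1.0, dB, dt, 60)
```

On the grid, the limit of the iteration is the Euler scheme for the SDE, so the experiment illustrates the contraction argument rather than the accuracy of the discretization.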
with continuous $(\mathcal{F}_t^{B,P})$ adapted stochastic processes $G_s, H_s^1, H_s^2, \ldots, H_s^d$. We now
extend the stochastic calculus rules to such Itô processes that are driven by several
independent Brownian motions. Let $H_s$ and $\widetilde{H}_s$ be continuous $(\mathcal{F}_t^{B,P})$ adapted processes.
The proof is an extension of the proof of Theorem 8.1(ii), where the assertion has been
derived for $k = l$ and $H = \widetilde{H}$. The details are left as an exercise.
Similarly to the one-dimensional case, the lemma can be used to compute the covariation
of Itô integrals w.r.t. arbitrary Itô processes. If Xs and Ys are Itô processes as in (8.5.1),
and Ks and Ls are adapted and continuous then we obtain
$$\Big[\int_0^{\bullet} K\, dX,\ \int_0^{\bullet} L\, dY\Big]_t = \int_0^t K_s L_s\, d[X, Y]_s$$
almost surely uniformly for t ∈ [0, u], along an appropriate subsequence of (πn ).
We now assume again that (Xt )t≥0 is a solution of a stochastic differential equation of
the form (8.5.1). By Lemma 8.16, we can apply Itô’s formula to almost every sample
path t 7→ Xt (ω):
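$$F(t, X_t) = F(0, X_0) + \int_0^t (\sigma^\top \nabla_x F)(s, X_s) \cdot dB_s + \int_0^t \Big(\frac{\partial F}{\partial t} + \mathcal{L}F\Big)(s, X_s)\, ds \quad \text{for all } t \ge 0,$$
where $a_{ij} = \sum_k \sigma_k^i \sigma_k^j$, i.e.,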
The Itô-Doeblin formula shows that for any $F \in C^2(\mathbb{R}_+ \times \mathbb{R}^n)$, the process
$$M_s^F = F(s, X_s) - F(0, X_0) - \int_0^s \Big(\frac{\partial F}{\partial t} + \mathcal{L}F\Big)(t, X_t)\, dt$$
The vector field b(s, x) is called the drift vector field of the SDE, and the coefficients
ai,j (s, x) are called diffusion coefficients.
Examples
Example (Physical Brownian motion with external force).
Change of measure
Absolute Continuity
Suppose that P and Q are probability measures on a measurable space (Ω, A), and F is
a sub-σ-algebra of A.
(2). The measures Q and P are called singular on F if and only if there exists A ∈ F
such that $P[A] = 0$ and $Q[A^C] = 0$.
9.1. LOCAL AND GLOBAL DENSITIES OF PROBABILITY MEASURES
to signed measures.
Example. The Dirac measure $\delta_{1/2}$ is obviously singular w.r.t. Lebesgue measure $\lambda_{(0,1]}$
on the Borel σ-algebra $\mathcal{B}((0, 1])$. However, $\delta_{1/2}$ is absolutely continuous w.r.t. $\lambda_{(0,1]}$
on each of the σ-algebras $\mathcal{F}_n = \sigma(\mathcal{D}_n)$ generated by the dyadic partitions
$\mathcal{D}_n = \{(k \cdot 2^{-n}, (k+1) 2^{-n}] : 0 \le k < 2^n\}$, and $\mathcal{B}((0, 1]) = \sigma(\bigcup \mathcal{D}_n)$.
Proof. The “if” part is obvious. If P [A] = 0 and (9.1.1) holds for each ε > 0 with δ
depending on ε then Q[A] < ε for any ε > 0, and hence Q[A] = 0.
To prove the “only if” part, we suppose that there exists ε > 0 such that (9.1.1) does not
hold for any δ > 0. Then there exists a sequence (An ) of events in F such that
whereas
$$Q[A_n \text{ infinitely often}] = Q\Big[\bigcap_n \bigcup_{m \ge n} A_m\Big] = \lim_{n \to \infty} Q\Big[\bigcup_{m \ge n} A_m\Big] \ge \varepsilon.$$
For any ε > 0 there exists δ > 0 such that for any $n \in \mathbb{N}$ and $a_1, \ldots, a_n, b_1, \ldots, b_n \in \mathbb{R}$,
$$\sum_{i=1}^{n} |b_i - a_i| < \delta \ \Rightarrow\ \sum_{i=1}^{n} |F(b_i) - F(a_i)| < \varepsilon, \tag{9.1.2}$$
The Radon-Nikodym Theorem states that absolute continuity is equivalent to the exis-
tence of a relative density.
P [Bi ] = 0 =⇒ Q[Bi ] = 0.
Example (Dyadic partitions). Any probability measure on the unit interval [0, 1] is
locally absolutely continuous w.r.t. Lebesgue measure on the filtration Fn = σ(Dn )
generated by the dyadic partitions of the unit interval. The Radon-Nikodym derivative
on Fn is the dyadic difference quotient defined by
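$$\frac{d\mu}{d\lambda}\Big|_{\mathcal{F}_n}(x) = \frac{\mu[((k-1) \cdot 2^{-n}, k \cdot 2^{-n}]]}{\lambda[((k-1) \cdot 2^{-n}, k \cdot 2^{-n}]]} = \frac{F(k \cdot 2^{-n}) - F((k-1) \cdot 2^{-n})}{2^{-n}} \tag{9.1.4}$$
for $x \in ((k-1) 2^{-n}, k 2^{-n}]$.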
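The dyadic difference quotient (9.1.4) is easy to evaluate for a given distribution function F. The sketch below assumes that F is supplied as a Python function on [0, 1]; the name `dyadic_density` and the two test measures (one with Lebesgue density 2x, and the Dirac measure at 1/2 from the earlier example) are illustrative choices.

```python
import math

def dyadic_density(F, x, n):
    """Radon-Nikodym derivative on F_n = sigma(D_n), evaluated at x in (0, 1]."""
    h = 2.0 ** (-n)
    k = math.ceil(x / h)          # x lies in ((k-1) * h, k * h]
    return (F(k * h) - F((k - 1) * h)) / h

F = lambda x: x * x               # hypothetical example: density 2x w.r.t. Lebesgue measure
approximations = [dyadic_density(F, 0.3, n) for n in (2, 6, 12)]

dirac = lambda x: 1.0 if x >= 0.5 else 0.0    # distribution function of the Dirac measure at 1/2
dirac_density = dyadic_density(dirac, 0.5, 5)
```

For the absolutely continuous measure the values approach the density 2 · 0.3 = 0.6 as n grows, whereas for the Dirac measure the local density at 1/2 equals 2^n and diverges, in line with singularity on the full Borel σ-algebra.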
Example (Product measures). If $Q = \bigotimes_{i=1}^{\infty} \nu$ and $P = \bigotimes_{i=1}^{\infty} \mu$ are infinite products of
probability measures ν and µ, and ν is absolutely continuous w.r.t. µ with density $\varrho$,
then Q is locally absolutely continuous w.r.t. P on the filtration
Fn = σ(X1 , . . . , Xn )
Now suppose that Q is locally absolutely continuous w.r.t. P on a filtration (Fn ) with
relative densities
$$Z_n = \frac{dQ}{dP}\Big|_{\mathcal{F}_n}.$$
The $L^1$ martingale convergence theorem can be applied to study the existence of a global
density on the σ-algebra
$$\mathcal{F}_\infty = \sigma\Big(\bigcup \mathcal{F}_n\Big).$$
(1). The sequence (Zn ) of successive relative densities is an (Fn )-martingale w.r.t. P .
In particular, (Zn ) converges P -almost surely to Z∞ , and Z∞ is integrable w.r.t.
P.
Given ε > 0, the last summand is smaller than ε/3 for n0 sufficiently large, and
the other two summands on the right hand side are smaller than ε/3 if c is chosen
sufficiently large depending on n0 . Hence (Zn ) is uniformly integrable w.r.t. P .
and therefore
for any A ∈ F∞ .
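To prove (9.1.6) for $A = \Omega$ we observe that for $c \in (0, \infty)$,
$$Q\Big[\limsup_{n \to \infty} Z_n < c\Big] \le \limsup_{n \to \infty} Q[Z_n < c] = \limsup_{n \to \infty} E_P[Z_n ; Z_n < c] \le E_P\Big[\limsup_{n \to \infty} Z_n \cdot I_{\{Z_n < c\}}\Big] \le E_P[Z_\infty] = Q_a[\Omega]$$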
The approach above can be generalized to probability measures that are not absolutely
continuous:
The goal of the exercise is to prove that a Lebesgue density exists if the σ-algebra A is
separable.
(1). Show that if Z is a Lebesgue density of Q w.r.t. P then 1/Z is a Lebesgue density
of P w.r.t. Q. Here 1/∞ := 0 and 1/0 := ∞.
From now on suppose that the σ-algebra is separable with $\mathcal{A} = \sigma(\bigcup \mathcal{F}_n)$ where $(\mathcal{F}_n)$ is
a filtration consisting of σ-algebras generated by finitely many atoms.
(2). Prove that the limit Z∞ = lim Zn exists both P -almost surely and Q-almost
surely, and P [Z∞ < ∞] = 1 and Q[Z∞ > 0] = 1.
exists for almost every t and F ′ is an integrable function on (0, 1). Furthermore, if F is
absolutely continuous then
$$F(s) = \int_0^s F'(t)\, dt \quad \text{for all } s \in [0, 1]. \tag{9.1.7}$$
Remark. Right continuity is only a normalization and can be dropped from the assumptions.
Moreover, the assertion extends to functions of finite variation since these can be
represented as the difference of two monotone functions, cf. ?? below. Similarly, (9.1.7)
also holds for absolutely continuous functions that are not monotone. See e.g. [Elstrodt:
Maß- und Integrationstheorie] for details.
$$\mathcal{F}_n = \sigma(X_1, \ldots, X_n), \quad n \in \mathbb{N},$$
where
$$Z_n = \prod_{i=1}^{n} \frac{d\nu_i}{d\mu_i}(X_i) \in (0, \infty) \quad P\text{-almost surely.}$$
Theorem 9.5 (Kakutani’s dichotomy). The infinite product measures Q and P are
either singular or mutually absolutely continuous with relative density $Z_\infty$. More precisely,
the following statements are equivalent:
(1). $Q \ll P$ on $\mathcal{F}_\infty$.
(2). $Q \approx P$ on $\mathcal{F}_\infty$.
(3). $\prod_{i=1}^{\infty} \int \sqrt{\frac{d\nu_i}{d\mu_i}}\, d\mu_i > 0$.
(4). $\sum_{i=1}^{\infty} d_H^2(\nu_i, \mu_i) < \infty$.
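Here the squared Hellinger distance $d_H^2(\nu, \mu)$ of mutually absolutely continuous
probability measures ν and µ is defined by
$$d_H^2(\nu, \mu) = \frac{1}{2} \int \Big(\sqrt{\frac{d\nu}{d\mu}} - 1\Big)^2 d\mu = \frac{1}{2} \int \Big(\sqrt{\frac{d\mu}{d\nu}} - 1\Big)^2 d\nu = 1 - \int \sqrt{\frac{d\nu}{d\mu}}\, d\mu = 1 - \int \sqrt{\frac{d\mu}{d\nu}}\, d\nu.$$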
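Remark. (1). If mutual absolute continuity holds then the relative densities on $\mathcal{F}_\infty$
are
$$\frac{dQ}{dP} = \lim_{n \to \infty} Z_n \quad P\text{-almost surely, and} \quad \frac{dP}{dQ} = \lim_{n \to \infty} \frac{1}{Z_n} \quad Q\text{-almost surely.}$$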
Proof. (1) ⇐⇒ (3): For $i \in \mathbb{N}$ let $Y_i := \frac{d\nu_i}{d\mu_i}(X_i)$. Then the random variables $Y_i$ are
independent under both P and Q with $E_P[Y_i] = 1$, and
$$Z_n = Y_1 \cdot Y_2 \cdots Y_n.$$
By Theorem 9.3, the measure Q is absolutely continuous w.r.t. P if and only if the martingale
$(Z_n)$ is uniformly integrable. To obtain a sharp criterion for uniform integrability
we switch from $L^1$ to $L^2$, and consider the non-negative martingale
$$M_n = \frac{\sqrt{Y_1}}{\beta_1} \cdot \frac{\sqrt{Y_2}}{\beta_2} \cdots \frac{\sqrt{Y_n}}{\beta_n} \quad \text{with} \quad \beta_i = E_P\big[\sqrt{Y_i}\big] = \int \sqrt{\frac{d\nu_i}{d\mu_i}}\, d\mu_i$$
under the probability measure P. Note that for $n \in \mathbb{N}$,
$$E[M_n^2] = \prod_{i=1}^{n} E[Y_i] / \beta_i^2 = 1 \Big/ \Big(\prod_{i=1}^{n} \beta_i\Big)^2.$$
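Example (Gaussian products). Let $P = \bigotimes_{i=1}^{\infty} N(0, 1)$ and $Q = \bigotimes_{i=1}^{\infty} N(a_i, 1)$ where
$(a_i)_{i \in \mathbb{N}}$ is a sequence of reals. The relative density of the normal distributions
$\nu_i := N(a_i, 1)$ and $\mu := N(0, 1)$ is
$$\frac{d\nu_i}{d\mu}(x) = \frac{\exp(-(x - a_i)^2 / 2)}{\exp(-x^2 / 2)} = \exp(a_i x - a_i^2 / 2),$$
and
$$\int \sqrt{\frac{d\nu_i}{d\mu}}\, d\mu = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \exp\Big(-\frac{1}{2}\big(x^2 - a_i x + a_i^2 / 2\big)\Big)\, dx = \exp(-a_i^2 / 8).$$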
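By criterion (3) of Theorem 9.5, the Gaussian example reduces to the question whether the infinite product of the factors exp(−a_i²/8) is strictly positive, i.e., whether the sum of the a_i² is finite. The sketch below compares partial products for two illustrative sequences, a_i = 1/i (square summable) and a_i = i^{−1/2} (not square summable); the function name `hellinger_product` and the chosen sequences are assumptions made for the illustration.

```python
import math

def hellinger_product(a, n):
    """Partial product prod_{i<=n} exp(-a(i)^2 / 8) of the Hellinger integrals."""
    return math.exp(-sum(a(i) ** 2 for i in range(1, n + 1)) / 8.0)

# a_i = 1/i: the partial products stabilize at a strictly positive value.
summable = [hellinger_product(lambda i: 1.0 / i, n) for n in (10, 1000, 10000)]
# a_i = i^(-1/2): the partial products keep decreasing towards 0.
divergent = [hellinger_product(lambda i: i ** -0.5, n) for n in (10, 1000, 10000)]
```

In the first case the product converges to exp(−π²/48), so Q and P are mutually absolutely continuous; in the second case the partial products decay like a negative power of n, which corresponds to singularity.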
(2). The relative entropy is related to the squared Hellinger distance by the inequality
$$\frac{1}{2} H(\nu \mid \mu) \ge d_H^2(\nu \mid \mu),$$
which follows from the elementary inequality
$$\frac{1}{2} \log x^{-1} = -\log \sqrt{x} \ge 1 - \sqrt{x} \quad \text{for } x > 0.$$
[Figure: a Brownian path $B_t$, the function $h(t)$, and the translated path $B_t^h = B_t + h(t)$.]
Example. (1). Suppose we would like to evaluate the probability that $\sup_{s \in [0,t]} |B_s - g(s)| < \varepsilon$
for a given $t > 0$ and a given function $g \in C([0, \infty), \mathbb{R}^d)$ asymptotically
as $\varepsilon \searrow 0$. One approach is to study the distribution of the translated process
$B_t - g(t)$ near 0.
(2). Similarly, computing the passage probability $P[B_s \ge a + bs \text{ for some } s \in [0, t]]$
to a line $s \mapsto a + bs$ for a one-dimensional Brownian motion is equivalent to
computing the passage probability to the point a for the translated process $B_t - bt$.
of the translated process $B_t^h = B_t + h(t)$ is the image of Wiener measure $\mu_0$ under the
translation map
Recall that Wiener measure is a Gaussian measure on the infinite dimensional space
$C([0, \infty), \mathbb{R}^d)$. The next exercise discusses translations of Gaussian measures in $\mathbb{R}^n$:
(1). Show that if C is non-degenerate then N(h, C) ≈ N(0, C) with relative density
$$\frac{dN(h, C)}{dN(0, C)}(x) = e^{(h, x) - \frac{1}{2}(h, h)} \quad \text{for } x \in \mathbb{R}^n, \tag{9.2.1}$$
(2). Prove that in general, N(h, C) is absolutely continuous w.r.t. N(0, C) if and only
if h is orthogonal to the kernel of C w.r.t. the Euclidean inner product.
On $C([0, \infty), \mathbb{R}^d)$, we can usually not expect the existence of a global density of the
translated measures $\mu_h$ w.r.t. $\mu_0$. The Cameron-Martin Theorem states that for $t \ge 0$, a
relative density on $\mathcal{F}_t^X$ exists if and only if h is contained in the corresponding
Cameron-Martin space:
Theorem 9.6 (Cameron, Martin). For $h \in C([0, \infty), \mathbb{R}^d)$ and $t \in \mathbb{R}_+$ the translated
measure $\mu_h = \mu_0 \circ \tau_h^{-1}$ is absolutely continuous w.r.t. Wiener measure $\mu_0$ on $\mathcal{F}_t^X$ if and
only if h is an absolutely continuous function on [0, t] with $h(0) = 0$ and $\int_0^t |h'(s)|^2\, ds < \infty$.
In this case, the relative density is given by
$$\frac{d\mu_h}{d\mu_0}\Big|_{\mathcal{F}_t^X} = \exp\Big(\int_0^t h'(s)\, dX_s - \frac{1}{2} \int_0^t |h'(s)|^2\, ds\Big), \tag{9.2.2}$$
where $\int_0^t h'(s)\, dX_s$ is the Itô integral w.r.t. the canonical Brownian motion $(X, \mu_0)$.
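For a linear shift h(s) = βs in dimension one, the density (9.2.2) reduces to exp(βX_t − β²t/2), and the change of measure can be tested by Monte Carlo. The sketch below is an illustration only: it assumes d = 1, samples X_t directly from N(0, t) instead of simulating whole paths, and uses the hypothetical names `tilted_probability` and `shifted_probability`. It compares the weighted probability of {X_t > c} under Wiener measure with the exact probability of the same event for the translated process.

```python
import math
import random

def tilted_probability(beta, t, c, n_samples, rng):
    """Estimate E[Z_t ; X_t > c] under Wiener measure, Z_t = exp(beta X_t - beta^2 t / 2)."""
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(0.0, math.sqrt(t))            # X_t ~ N(0, t) under mu_0
        if x > c:
            total += math.exp(beta * x - 0.5 * beta * beta * t)
    return total / n_samples

def shifted_probability(beta, t, c):
    """Exact P[X_t + beta t > c] for X_t ~ N(0, t)."""
    z = (c - beta * t) / math.sqrt(t)
    return 0.5 * math.erfc(z / math.sqrt(2.0))

estimate = tilted_probability(0.5, 1.0, 1.0, 100_000, random.Random(2))
exact = shifted_probability(0.5, 1.0, 1.0)
```

Both numbers approximate the probability that the translated process exceeds c at time t, so they should agree up to Monte Carlo error.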
Before giving a rigorous proof let us explain heuristically why the result should be true.
Clearly, absolute continuity does not hold if $h(0) \ne 0$, since then the translated paths do
not start at 0. Now suppose $h(0) = 0$, and fix $t \in (0, \infty)$. Absolute continuity on $\mathcal{F}_t^X$
means that the distribution $\mu_h^t$ of $(B_s^h)_{0 \le s \le t}$ on $C([0, t], \mathbb{R}^d)$ is absolutely continuous
w.r.t. Wiener measure $\mu_0^t$ on this space. The measure $\mu_0^t$, however, is a kind of infinite
dimensional standard normal distribution w.r.t. the inner product
$$(x, y)_H = \int_0^t x'(s) \cdot y'(s)\, ds$$
Since $\mu_0^t$-almost every path $x \in C([0, \infty), \mathbb{R}^d)$ is not absolutely continuous, this expression
does not make sense. Nevertheless, using finite dimensional approximations,
we can derive the rigorous expression (9.2.2) for the relative density where the integral
$\int_0^t h' x'\, ds$ is replaced by the almost surely well-defined stochastic integral $\int_0^t h'\, dx$:
Proof of Theorem 9.6. We assume t = 1. The proof for other values of t is similar.
Moreover, as explained above, it is enough to consider the case h(0) = 0.
(1). Local densities: We first compute the relative densities when the paths are only
evaluated at dyadic time points. Fix $n \in \mathbb{N}$, let $t_i = i \cdot 2^{-n}$, and let
$$\delta_i x = x_{t_{i+1}} - x_{t_i}$$
Since the normalization constant does not depend on h, the joint distribution
of $(B_{t_1}^h, B_{t_2}^h, \ldots, B_{t_{2^n}}^h)$ is absolutely continuous w.r.t. that of $(B_{t_1}, B_{t_2}, \ldots, B_{t_{2^n}})$
with relative density
$$\exp\Big(\sum \frac{\delta_i h}{\delta t} \cdot \delta_i x - \frac{1}{2} \sum \Big|\frac{\delta_i h}{\delta t}\Big|^2 \delta t\Big). \tag{9.2.3}$$
In fact, the sum on the right hand side coincides with the squared $L^2$ norm
$$\int_0^1 \Big|\frac{dh}{dt}\Big|_{\sigma(\mathcal{D}_n)}\Big|^2 dt$$
absolutely continuous with $h' \in L^2(0, 1)$ then $\frac{dh}{dt}\big|_{\sigma(\mathcal{D}_n)} \to h'(t)$ in $L^2(0, 1)$ by the $L^2$
martingale convergence theorem.
Indeed, the sum on the right-hand side is the Itô integral of the step function
$\frac{dh}{dt}\big|_{\sigma(\mathcal{D}_n)}$ w.r.t. X, and as remarked above, these step functions converge to $h'$ in
$L^2(0, 1)$. Along a subsequence, the convergence in (9.2.5) holds $\mu_0$-almost surely,
and hence by (9.2.4),
$$\lim_{n \to \infty} Z_n = \exp\Big(\int_0^1 h'(s)\, dX_s - \frac{1}{2} \int_0^1 |h'(s)|^2\, ds\Big) \quad \mu_0\text{-a.s.} \tag{9.2.6}$$
(3). Absolute continuity on $\mathcal{F}_1^X$: We still assume $h' \in L^2(0, 1)$. Note that
$\mathcal{F}_1^X = \sigma(\bigcup \mathcal{F}_n)$. Hence for proving that $\mu_h$ is absolutely continuous w.r.t. $\mu_0$ on $\mathcal{F}_1^X$ with
density given by (9.2.6), it suffices to show that $\limsup Z_n < \infty$ $\mu_h$-almost surely
(i.e., the singular part in the Lebesgue decomposition of $\mu_h$ w.r.t. $\mu_0$ vanishes).
Since $\mu_h = \mu_0 \circ \tau_h^{-1}$, the process
Note that the minus sign in front of the second sum has turned into a plus by the
translation! Arguing similarly as above, we see that along a subsequence, $(Z_n)$
converges $\mu_h$-almost surely to a finite limit:
$$\lim Z_n = \exp\Big(\int_0^1 h'(s)\, dW_s + \frac{1}{2} \int_0^1 |h'(s)|^2\, ds\Big) \quad \mu_h\text{-a.s.}$$
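(4). Singularity on $\mathcal{F}_1^X$: Conversely, let us suppose now that h is not absolutely continuous
or $h'$ is not in $L^2(0, 1)$. Then
$$\sum_{i=0}^{2^n - 1} \Big|\frac{\delta_i h}{\delta_i t}\Big|^2 \delta t = \int_0^1 \Big|\frac{dh}{dt}\Big|_{\sigma(\mathcal{D}_n)}\Big|^2 dt \longrightarrow \infty \quad \text{as } n \to \infty.$$
Since
$$\Big\|\sum_{i=0}^{2^n - 1} \frac{\delta_i h}{\delta t} \cdot \delta_i X\Big\|_{L^2(\mu_0)} = \Big(\sum_{i=0}^{2^n - 1} \Big|\frac{\delta_i h}{\delta t}\Big|^2 \delta t\Big)^{1/2},$$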
Note that $T_a^Y$ is also the first passage time to the line $t \mapsto a - \beta t$ for the Brownian
motion $(B_t)$.
Theorem 9.7. For $a > 0$ and $\beta \in \mathbb{R}$, the restriction of the distribution of $T_a^Y$ to $(0, \infty)$
is absolutely continuous with density
$$f_{a,\beta}(t) = \frac{a}{\sqrt{2\pi t^3}} \exp\Big(-\frac{(a - \beta t)^2}{2t}\Big).$$
In particular,
$$P[T_a^Y < \infty] = \int_0^\infty f_{a,\beta}(s)\, ds.$$
Proof. Let $h(t) = \beta t$. By the Cameron-Martin Theorem, the distribution $\mu_h$ of $(Y_t)$ is
absolutely continuous w.r.t. Wiener measure on $\mathcal{F}_t^X$ with density
$$Z_t = \exp(\beta \cdot X_t - \beta^2 t / 2).$$
by the optional sampling theorem. The claim follows by inserting the explicit expression
for $f_{T_a}$ derived in Corollary 1.25.
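The formula for $P[T_a^Y < \infty]$ in Theorem 9.7 can be evaluated by numerical integration of the density. The following sketch uses a plain midpoint rule on a truncated interval; the truncation point, the step count and the function names are illustrative assumptions. For negative drift the result can be compared with the classical closed form exp(2aβ) for the probability that a Brownian motion with drift β < 0 ever reaches a > 0, a standard fact that is not derived in the text above.

```python
import math

def passage_density(a, beta, t):
    """f_{a,beta}(t) = a / sqrt(2 pi t^3) * exp(-(a - beta t)^2 / (2 t))."""
    return a / math.sqrt(2.0 * math.pi * t ** 3) * math.exp(-(a - beta * t) ** 2 / (2.0 * t))

def hitting_probability(a, beta, t_max=60.0, n=60_000):
    """Midpoint-rule approximation of int_0^{t_max} f_{a,beta}(s) ds."""
    h = t_max / n
    return h * sum(passage_density(a, beta, (k + 0.5) * h) for k in range(n))

# Drift beta = -1 pushes the process away from the level a = 1,
# so the passage time is infinite with positive probability.
p_down = hitting_probability(1.0, -1.0)
```

With a = 1 and β = −1 the integral is close to exp(−2), so the passage time is infinite with probability about 0.86.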
where (Gs ) is an adapted process. Recall that the densities in the Cameron-Martin-
Theorem took this form with the deterministic function Gs = h′ (s). We start with a
general discussion about changing measure on filtered probability spaces that will be
useful in other contexts as well.
Let $(\mathcal{F}_t)$ be a filtration on a measurable space $(\Omega, \mathcal{A})$, and fix $t_0 \in (0, \infty)$. We consider
two probability measures P and Q on $(\Omega, \mathcal{A})$ that are mutually absolutely continuous
on the σ-algebra $\mathcal{F}_{t_0}$ with relative density
$$Z_{t_0} = \frac{dP}{dQ}\Big|_{\mathcal{F}_{t_0}} > 0 \quad Q\text{-almost surely.}$$
Then P and Q are also mutually absolutely continuous on each of the σ-algebras $\mathcal{F}_t$,
$t \le t_0$, with Q- and P-almost surely strictly positive relative densities
$$Z_t = \frac{dP}{dQ}\Big|_{\mathcal{F}_t} = E_Q\big[Z_{t_0} \mid \mathcal{F}_t\big] \quad \text{and} \quad \frac{dQ}{dP}\Big|_{\mathcal{F}_t} = \frac{1}{Z_t}.$$
The process $(Z_t)_{t \le t_0}$ is a martingale w.r.t. Q, and, correspondingly, $(1/Z_t)_{t \le t_0}$ is a martingale
w.r.t. P. From now on, we always choose a right continuous version of these
martingales.
Lemma 9.8. 1) For any $0 \le s \le t \le t_0$, and for any $\mathcal{F}_t$-measurable random variable
$X : \Omega \to [0, \infty]$,
2) Suppose that $(M_t)_{t \le t_0}$ is an $(\mathcal{F}_t)$ adapted right continuous stochastic process.
Then
Proof. 1) The right hand side of (9.3.1) is $\mathcal{F}_s$-measurable. Moreover, for any $A \in \mathcal{F}_s$,
$$\begin{aligned}
E_P[M_{t \wedge T_n} ; A \cap \{T_n > s\}] &= E_Q[M_{t \wedge T_n} Z_{t \wedge T_n} ; A \cap \{T_n > s\}] \\
&= E_Q[M_{s \wedge T_n} Z_{s \wedge T_n} ; A \cap \{T_n > s\}] = E_P[M_{s \wedge T_n} ; A \cap \{T_n > s\}]
\end{aligned} \tag{9.3.4}$$
by the martingale property for $(MZ)^{T_n}$, the optional sampling theorem, and the fact
that $P \ll Q$ on $\mathcal{F}_{t \wedge T_n}$ with relative density $Z_{t \wedge T_n}$. (9.3.2) follows from (9.3.3) and
(9.3.4).
Girsanov’s Theorem
We now return to our original problem of identifying the change of measure induced
by a random translation of the paths of a Brownian motion. Suppose that $(X_t)$ is a
Brownian motion in $\mathbb{R}^d$ with $X_0 = 0$ w.r.t. the probability measure Q and the filtration
$(\mathcal{F}_t)$, and fix $t_0 \in [0, \infty)$. Let
$$L_t = \int_0^t G_s \cdot dX_s, \quad t \ge 0,$$
with $G \in L^2_{a,\mathrm{loc}}(\mathbb{R}_+, \mathbb{R}^d)$. Then $[L]_t = \int_0^t |G_s|^2\, ds$, and hence
$$Z_t = \exp\Big(\int_0^t G_s \cdot dX_s - \frac{1}{2} \int_0^t |G_s|^2\, ds\Big) \tag{9.3.5}$$
In order to use $Z_{t_0}$ for changing the underlying probability measure on $\mathcal{F}_{t_0}$ we have to
assume the martingale property:
$$\frac{dP}{dQ}\Big|_{\mathcal{F}_{t_0}} = Z_{t_0} \quad Q\text{-a.s.} \tag{9.3.6}$$
Note that P and Q are mutually absolutely continuous on $\mathcal{F}_t$ for any $t \le t_0$ with
$$\frac{dP}{dQ}\Big|_{\mathcal{F}_t} = Z_t \quad \text{and} \quad \frac{dQ}{dP}\Big|_{\mathcal{F}_t} = \frac{1}{Z_t}$$
both P- and Q-almost surely. We are now ready to prove one of the most important
results of stochastic analysis:
The right-hand side of (9.3.7) is a stochastic integral w.r.t. the Q-Brownian motion X,
and hence a local Q-martingale.
The theorem shows that if X is a Brownian motion w.r.t. Q, and Z defined by (9.3.5) is
a Q-martingale, then X satisfies
dXt = Gt dt + dBt .
with a P -Brownian motion B. This can be used to construct weak solutions of stochastic
differential equations by changing the underlying probability measure, see Section 11.3
below. For instance, if we choose Gt = b(Xt ) then the Q-Brownian motion (Xt ) is a
solution to the SDE
dXt = b(Xt ) dt + dBt ,
of a Brownian motion X.
Novikov’s condition
To verify the assumption in Girsanov’s theorem, we now derive a sufficient condition
for ensuring that the exponential
$$Z_t = \exp\big(L_t - \tfrac{1}{2} [L]_t\big)$$
Theorem 9.10 (Novikov 1971). Let $t_0 \in \mathbb{R}_+$. If $E[\exp([L]_{t_0} / 2)] < \infty$ then $(Z_t)_{t \le t_0}$ is
an $(\mathcal{F}_t)$ martingale.
We only prove the theorem under the slightly more restrictive condition
This simplifies the proof considerably, and the condition is sufficient for many applica-
tions. For a proof in the general case and under even weaker assumptions see e.g. [37].
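Proof. Let $(T_n)_{n \in \mathbb{N}}$ be a localizing sequence for the martingale Z. Then $(Z_{t \wedge T_n})_{t \ge 0}$ is a
martingale for any n. To carry over the martingale property to the process $(Z_t)_{t \in [0, t_0]}$, it
is enough to show that the random variables $Z_{t \wedge T_n}$, $n \in \mathbb{N}$, are uniformly integrable for
each fixed $t \le t_0$. However, for $c > 0$ and $p, q \in (1, \infty)$ with $p^{-1} + q^{-1} = 1$, we have
$$\begin{aligned}
E[Z_{t \wedge T_n} ; Z_{t \wedge T_n} \ge c]
&= E\Big[\exp\Big(L_{t \wedge T_n} - \frac{p}{2} [L]_{t \wedge T_n}\Big) \exp\Big(\frac{p-1}{2} [L]_{t \wedge T_n}\Big) ; Z_{t \wedge T_n} \ge c\Big] \\
&\le E\Big[\exp\Big(p L_{t \wedge T_n} - \frac{p^2}{2} [L]_{t \wedge T_n}\Big)\Big]^{1/p} \cdot E\Big[\exp\Big(q \cdot \frac{p-1}{2} [L]_{t \wedge T_n}\Big) ; Z_{t \wedge T_n} \ge c\Big]^{1/q} \\
&\le E\Big[\exp\Big(\frac{p}{2} [L]_t\Big) ; Z_{t \wedge T_n} \ge c\Big]^{1/q}
\end{aligned} \tag{9.3.9}$$
for any $n \in \mathbb{N}$. Here we have used Hölder’s inequality and the fact that
$\exp\big(p L_{t \wedge T_n} - \frac{p^2}{2} [L]_{t \wedge T_n}\big)$ is an exponential supermartingale. If $\exp\big(\frac{p}{2} [L]_t\big)$ is integrable then the right
hand side of (9.3.9) converges to 0 uniformly in n as $c \to \infty$, because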
and the interest rate is given by $(R_t)$. We assume that $(X_t)$ is a Brownian motion and
$(\alpha_t)$, $(R_t)$, $(\sigma_t)$ and $(1/\sigma_t)$ are adapted bounded continuous processes, all defined on a
filtered probability space $(\Omega, \mathcal{A}, Q, (\mathcal{F}_t))$. Then the discounted asset price
$$\widetilde{S}_t := \exp\Big(-\int_0^t R_s\, ds\Big) S_t$$
satisfies
$$d\widetilde{S}_t = (\alpha_t - R_t) \widetilde{S}_t\, dt + \sigma_t \widetilde{S}_t\, dX_t = \sigma_t \widetilde{S}_t\, dB_t, \tag{9.3.11}$$
where
$$B_t := X_t + \int_0^t \frac{\alpha_s - R_s}{\sigma_s}\, ds.$$
We can apply Girsanov’s Theorem and the Novikov condition to conclude that the process
$(B_t)$ is a Brownian motion under a probability measure P on $(\Omega, \mathcal{A})$ with local
densities w.r.t. Q on $\mathcal{F}_t$ given by
$$Z_t = \exp\Big(\int_0^t G_s \cdot dX_s - \frac{1}{2} \int_0^t |G_s|^2\, ds\Big) \quad \text{where } G_t = (R_t - \alpha_t) / \sigma_t.$$
Therefore, by (9.3.11) and by the assumptions on the coefficients, the process $(\widetilde{S}_t)$ is a
martingale under P. The measure P can now be used to compute option prices under
a no-arbitrage assumption similarly to the discrete time case considered in Section 2.3
above, see Section 9.4.
is the completed filtration generated by (Bt ). It is crucial that the filtration does not con-
tain additional information. By the factorization lemma, this implies that Ft measurable
random variables F : Ω → R are almost surely functions of the Brownian path (Bs )s≤t .
Indeed, we will show that such functions can be represented as stochastic integrals.
Theorem 9.11 (Itô). For any function $F \in L^2(\Omega, \mathcal{F}_1, P)$ there exists a unique process
$G \in L_a^2(0, 1)$ such that
$$F = E[F] + \int_0^1 G_s \cdot dB_s \quad P\text{-almost surely.} \tag{9.4.1}$$
We first show that the corollary follows from Theorem 9.11, and then we prove the
theorem:
Hence, by Theorem 9.11, there exists a unique process $G \in L_a^2(0, 1)$ such that
$$M_1 = E[M_1] + \int_0^1 G \cdot dB = M_0 + \int_0^1 G \cdot dB \quad \text{a.s.,}$$
and thus
$$M_t = E[M_1 \mid \mathcal{F}_t] = M_0 + \int_0^t G \cdot dB \quad \text{a.s. for any } t \in [0, 1].$$
Since both sides in the last equation are almost surely right continuous, the identity
actually holds simultaneously for all t ∈ [0, 1] with probability 1.
Proof of Theorem 9.11. Uniqueness. Suppose that (9.4.1) holds for two processes
$G, \widetilde{G} \in L_a^2(0, 1)$. Then
$$\int_0^1 G \cdot dB = \int_0^1 \widetilde{G} \cdot dB,$$
and hence, by Itô’s isometry,
$$\|G - \widetilde{G}\|_{L^2(P \otimes \lambda)} = \Big\|\int (G - \widetilde{G}) \cdot dB\Big\|_{L^2(P)} = 0.$$
3. Clearly, an Itô representation also holds for any linear combination of functions as in
Step 2.
Fn − E[Fn ] −→ F − E[F ] in L2 (P ).
αt ≡ α ∈ R, σt ≡ σ ∈ (0, ∞), Rt ≡ r ∈ R.
and by (9.3.11), the discounted stock price is proportional to the Itô exponential of σB
where $B_t = X_t + \frac{\alpha - r}{\sigma} t$ is a Brownian motion under the risk-neutral measure P:
Now suppose that we want to compute the no-arbitrage price of an option. For example,
let us consider a European call option where the payoff at the final time t0 is given by
is an $\mathcal{F}_{t_0}^{B,P}$ measurable random variable. Therefore, by Itô’s Representation Theorem
and (9.4.4), there exists a process $G \in L_a^2(0, t_0)$ such that
$$\widetilde{V}_{t_0} = E_P\big[\widetilde{V}_{t_0}\big] + \int_0^{t_0} G_r\, dB_r = E_P\big[\widetilde{V}_{t_0}\big] + \int_0^{t_0} \Phi_r\, d\widetilde{S}_r,$$
where $\Phi_r := G_r / (\sigma \widetilde{S}_r)$. Hence $(\Phi_r)$ is a replicating strategy for the option, i.e., investing
$\Phi_r$ units in the stock and putting the remaining money on the bank account
yields exactly the payoff for the option at time $t_0$ provided our initial capital is given by
$E_P[\widetilde{V}_{t_0}]$. Since otherwise there would be an arbitrage opportunity by selling the option
and investing the gain by the strategy Φ, or conversely, we can conclude that under a
no-arbitrage assumption, the only possible option price at time 0 is given by
$$E_P\big[\widetilde{V}_{t_0}\big] = E_P\Big[\big(S_0 e^{\sigma B_{t_0} - \sigma^2 t_0 / 2} - e^{-r t_0} K\big)^+\Big]$$
Noting that Bt0 ∼ N(0, t0 ) under P , we obtain the Black-Scholes formula for the no-
arbitrage price of a European call option. Notice in particular that the price does not
depend on the usually unknown model parameter α (the mean rate of return).
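The expectation above can be evaluated either in closed form or by sampling $B_{t_0} \sim N(0, t_0)$. The sketch below does both for one hypothetical parameter set (S_0 = K = 100, r = 0.05, σ = 0.2, t_0 = 1); the function names and parameter values are illustrative assumptions, and the closed form used is the standard Black-Scholes expression in terms of the normal distribution function.

```python
import math
import random

def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def call_price_closed_form(s0, k, r, sigma, t0):
    """Black-Scholes price of a European call, obtained by evaluating the Gaussian expectation."""
    d1 = (math.log(s0 / k) + (r + 0.5 * sigma ** 2) * t0) / (sigma * math.sqrt(t0))
    d2 = d1 - sigma * math.sqrt(t0)
    return s0 * norm_cdf(d1) - k * math.exp(-r * t0) * norm_cdf(d2)

def call_price_monte_carlo(s0, k, r, sigma, t0, n_samples, rng):
    """Sample average of (S0 exp(sigma B - sigma^2 t0 / 2) - exp(-r t0) K)^+ with B ~ N(0, t0)."""
    total = 0.0
    for _ in range(n_samples):
        b = rng.gauss(0.0, math.sqrt(t0))
        total += max(s0 * math.exp(sigma * b - 0.5 * sigma ** 2 * t0) - math.exp(-r * t0) * k, 0.0)
    return total / n_samples

exact = call_price_closed_form(100.0, 100.0, 0.05, 0.2, 1.0)
estimate = call_price_monte_carlo(100.0, 100.0, 0.05, 0.2, 1.0, 100_000, random.Random(3))
```

Neither function takes the mean rate of return α as an argument, which reflects the observation that the no-arbitrage price does not depend on it.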
Stochastic Analysis
Appendix
Appendix A
Conditional expectations
$$P[A \mid Y = z] = \frac{P[A \cap \{Y = z\}]}{P[Y = z]}, \quad A \in \mathcal{A},$$
$$E[X \mid Y = z] = \frac{E[X ; Y = z]}{P[Y = z]}, \quad X \in L^1(\Omega, \mathcal{A}, P),$$
for any $z \in S$ with $P[Y = z] > 0$ in an elementary way. Note that for $z \in S$ with
$P[Y = z] = 0$, the conditional probabilities are not defined.
It will turn out to be convenient to consider the conditional probabilities and expecta-
tions not as functions of the outcome z, but as functions of the random variable Y . In
this way, the conditional expectations become random variables:
with
$$g(z) := \begin{cases} E[X \mid Y = z] & \text{if } P[Y = z] > 0 \\ \text{arbitrary} & \text{if } P[Y = z] = 0 \end{cases}$$
is called (a version of the) conditional expectation of X given Y . For an event A ∈ A,
the random variable
P [A | Y ] := E[IA | Y ]
The conditional expectation E[X | Y ] and the conditional probability P [A | Y ] are again
random variables. They take the values E[X | Y = z] and P [A | Y = z], respectively,
on the sets {Y = z}, z ∈ S with P [Y = z] > 0. On each of the null sets {Y =
z}, z ∈ S with P [Y = z] = 0, an arbitrary constant value is assigned to the conditional
expectation. Hence the definition is only almost surely unique.
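For a finite sample space the elementary definition can be carried out literally. The following sketch uses a hypothetical example, two fair dice with X the sum and Y the first die, and exact rational arithmetic; the function name `conditional_expectation` is an illustrative choice.

```python
from fractions import Fraction
from itertools import product

# Hypothetical example: two fair dice, X = sum of both, Y = first die.
omega = list(product(range(1, 7), repeat=2))
prob = {w: Fraction(1, 36) for w in omega}
X = lambda w: w[0] + w[1]
Y = lambda w: w[0]

def conditional_expectation(X, Y, prob):
    """Return z -> E[X | Y = z] for all z with P[Y = z] > 0."""
    numerator, denominator = {}, {}
    for w, p in prob.items():
        z = Y(w)
        numerator[z] = numerator.get(z, 0) + X(w) * p      # E[X ; Y = z]
        denominator[z] = denominator.get(z, 0) + p         # P[Y = z]
    return {z: numerator[z] / denominator[z] for z in denominator if denominator[z] > 0}

g = conditional_expectation(X, Y, prob)
# E[X | Y] is the random variable omega -> g(Y(omega)); its mean equals E[X].
mean_of_cond_exp = sum(g[Y(w)] * p for w, p in prob.items())
```

Here g(z) = z + 7/2, and averaging g(Y) over the sample space returns E[X] = 7.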
(II) $E[\overline{X} \cdot f(Y)] = E[X \cdot f(Y)]$ for all non-negative or bounded functions
$f : S \to \mathbb{R}$, respectively.
In certain cases this is possible but in general, the existence of the limit is not guaran-
teed.
Note, however, that in general, the exceptional set will depend on the event A !
Theorem A.2 (Factorization lemma). Suppose that (S, S) is a measurable space and
Y : Ω → S is a map. Then a map X : Ω → R is measurable w.r.t. σ(Y ) if and only if
X = f (Y ) = f ◦ Y
$(\Omega, \sigma(Y)) \xrightarrow{\ Y\ } (S, \mathcal{S}) \longrightarrow (\mathbb{R}, \mathcal{B}(\mathbb{R}))$
X = sup Xn = sup fn (Y ) = f (Y ),
The factorization lemma can be used to rephrase the characterizing properties (I) and
(II) of conditional expectations in Theorem A.1 in the following way:
The equivalence of (I) and (i) is a consequence of the factorization lemma, and the
equivalence of (II) and (ii) follows by monotone classes, since (ii) states that
The characterization of conditional expectations by (i) and (ii) can be extended immedi-
ately to the case of general conditional expectations given a σ-algebra or given arbitrary
random variables. To this end let X : Ω → R be a non-negative (or integrable) random
variable on a probability space (Ω, A, P ).
Remark. By monotone classes it can be shown that Condition (b) is equivalent to:
$$E[\overline{X} ; A] = E[\widetilde{X} ; A] \quad \text{for any } A \in \mathcal{F}.$$
Therefore, $\overline{X} = \widetilde{X}$ P-almost surely.
In particular,
Proof. (1). By linearity of the expectation, $\lambda E[X \mid \mathcal{F}] + \mu E[Y \mid \mathcal{F}]$
is a version of the conditional expectation $E[\lambda X + \mu Y \mid \mathcal{F}]$.
(2). Let $\overline{X}$ be a version of $E[X \mid \mathcal{F}]$. Since $X \ge 0$ P-almost surely and
$\{\overline{X} < 0\} \in \mathcal{F}$, we obtain
$$E[\overline{X} ; \overline{X} < 0] = E[X ; \overline{X} < 0] \ge 0,$$
and hence $\overline{X} \ge 0$ P-almost surely.
(4). If $X_n \ge 0$ is monotonically increasing, then $\sup E[X_n \mid \mathcal{F}]$ is a non-negative
$\mathcal{F}$-measurable random variable (with values in $[0, \infty]$), and by the “classical”
monotone convergence theorem:
(5). We show that every version of $E[X \mid \mathcal{G}]$ is also a version of $E[E[X \mid \mathcal{F}] \mid \mathcal{G}]$,
i.e., that it satisfies properties (i) and (ii) from the definition of the conditional
expectation:
(6) and (7). Similarly, one verifies that the random variables on the right-hand
sides of the equations in (6) and (7) satisfy the defining properties of the
conditional expectations on the left-hand sides (exercise).
(a) If $f(x, y) = g(x) \cdot h(y)$ with measurable functions $g, h \ge 0$, then
by (6) and (7), P-almost surely:
$$E[f(X, Y) \mid \mathcal{F}] = E[g(X) \cdot h(Y) \mid \mathcal{F}] = h(Y) \cdot E[g(X) \mid \mathcal{F}] = h(Y) \cdot E[g(X)],$$
and hence
$$E[f(X, Y) \mid \mathcal{F}](\omega) = E[g(X) \cdot h(Y(\omega))] = E[f(X, Y(\omega))] \quad \text{for } P\text{-almost every } \omega.$$
$\mathcal{D}$ is a Dynkin system that, by (a), contains all products $B = B_1 \times B_2$ with $B_1 \in \mathcal{S}$
and $B_2 \in \mathcal{T}$. Hence also
$$\mathcal{D} \supseteq \sigma(\{B_1 \times B_2 \mid B_1 \in \mathcal{S}, B_2 \in \mathcal{T}\}) = \mathcal{S} \otimes \mathcal{T}.$$
The last property in Theorem A.4 is often very useful. For independent random variables
X and Y it implies
$$E[f(X, Y) \mid Y](\omega) = E[f(X, Y(\omega))] \quad \text{for } P\text{-almost every } \omega, \tag{A.2.3}$$
We stress that independence of X and Y is essential for (A.2.3) to hold true. The
application of (A.2.3) without independence is a common mistake in computations with
conditional expectations.
Jensen’s inequality
Jensen’s inequality is valid for conditional expectations as well. Let (Ω, A, P ) be a
probability space, X ∈ L1 (Ω, A, P ) an integrable random variable, and F ⊆ A a σ-
algebra.
Proof. Every convex function u can be represented as a supremum of countably many affine
functions, i.e., there exist $a_n, b_n \in \mathbb{R}$ with
To prove this, one considers the supporting lines at all points of a countable dense
subset of $\mathbb{R}$, see e.g. [Williams: Probability with martingales, 6.6]. By
monotonicity and linearity of the conditional expectation, it follows that
The proof of the corollary shows in particular that for a random variable X ∈ Lp , the
conditional expectation E[X | F ] is contained in Lp as well. We now restrict ourselves
to the case p = 2.
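[Figure A.1: $X \in L^2(\Omega, \mathcal{A}, P)$, its projection $E[X \mid \mathcal{F}]$, the origin 0, and the subspace $L^2(\Omega, \mathcal{F}, P)$.]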
Here the second equivalence is shown by the usual extension procedure
(measure-theoretic induction).
(3) ⇒ (2): Let Y be a version of the orthogonal projection of X onto $L^2(\Omega, \mathcal{F}, P)$.
Then for all $Z \in L^2(\Omega, \mathcal{F}, P)$:
$\ge E[(X - Y)^2]$
(2) ⇒ (3): Conversely, if Y is a best approximation of X in $L^2(\Omega, \mathcal{F}, P)$ and
$Z \in L^2(\Omega, \mathcal{F}, P)$, then
The equivalence of (2) and (3) is a well-known functional analytic statement: the best
approximation of a vector in a closed subspace of a Hilbert space is the orthogonal
projection of the vector onto this subspace. The geometric intuition behind this fact is
indicated in Figure A.1.
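The two characterizations, best approximation and orthogonality, can be verified exactly on a finite probability space. The sketch below again uses two fair dice as a hypothetical example, with X the product of the dice and the σ-algebra generated by the first die Y; all names are illustrative.

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))      # two fair dice (hypothetical example)
p = Fraction(1, 36)
X = lambda w: w[0] * w[1]                         # product of the dice
Y = lambda w: w[0]                                # first die generates F = sigma(Y)

# E[X | Y = z] computed from the elementary definition
cond = {z: sum(X(w) for w in omega if Y(w) == z) * p / Fraction(1, 6) for z in range(1, 7)}

def mean_square_error(g):
    """E[(X - g(Y))^2] for a candidate approximation g(Y) in L^2(Omega, F, P)."""
    return sum((X(w) - g(Y(w))) ** 2 * p for w in omega)

best = mean_square_error(lambda z: cond[z])
other = mean_square_error(lambda z: cond[z] + Fraction(1, 2))
# Orthogonality: X - E[X | F] is orthogonal to every F-measurable Z, here Z = Y^2.
inner = sum((X(w) - cond[Y(w)]) * Y(w) ** 2 * p for w in omega)
```

Shifting the conditional expectation by the constant 1/2 increases the mean square error by exactly 1/4, which is the Pythagorean identity behind the equivalence of (2) and (3).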
Proof. (1). We first consider the case $X \in L^2(\Omega, \mathcal{A}, P)$. As just remarked,
the space $L^2(\Omega, \mathcal{F}, P)$ is a closed subspace of the Hilbert space $L^2(\Omega, \mathcal{A}, P)$.
Let $d = \inf\{\|Z - X\|_{L^2} \mid Z \in L^2(\Omega, \mathcal{F}, P)\}$ be the distance from X to this
subspace. To show that a best approximation of X in $L^2(\Omega, \mathcal{F}, P)$ exists,
we choose a sequence $(X_n)$ in this subspace with $\|X_n - X\|_{L^2} \to d$.
By the parallelogram identity, we obtain for $n, m \in \mathbb{N}$:
and hence
$$\limsup_{n,m \to \infty} \|X_n - X_m\|_{L^2}^2 \le 0.$$
Thus the minimizing sequence $(X_n)$ is a Cauchy sequence in the complete space
$L^2(\Omega, \mathcal{F}, P)$, i.e., there exists $Y \in L^2(\Omega, \mathcal{F}, P)$ with
$$\|X_n - Y\|_{L^2} \to 0.$$
For Y we have
i.e., Y is the desired best approximation, and hence a version of the conditional
expectation $E[X \mid \mathcal{F}]$.
(2). For an arbitrary non-negative random variable X on $(\Omega, \mathcal{A}, P)$ there exists a
monotonically increasing sequence $(X_n)$ of non-negative square integrable random
variables with $X = \sup X_n$. One easily verifies that $\sup_n E[X_n \mid \mathcal{F}]$ is a version
of $E[X \mid \mathcal{F}]$.
(3). Correspondingly, one verifies that for general $X \in L^1(\Omega, \mathcal{A}, P)$, a version of the
conditional expectation is given by $E[X \mid \mathcal{F}] = E[X^+ \mid \mathcal{F}] - E[X^- \mid \mathcal{F}]$.
[1]
[2]
[3] Sur l’équation de Kolmogoroff, par W. Doeblin. Éditions Elsevier, Paris, 2000. C.
R. Acad. Sci. Paris Sér. I Math. 331 (2000), Special Issue.
[4] Robert J. Adler and Jonathan E. Taylor. Random fields and geometry. Springer
Monographs in Mathematics. Springer, New York, 2007.
[5] David Applebaum. Lévy Processes and Stochastic Calculus. Cambridge UP, Cam-
bridge, 2004.
[9] Jean-Michel Bismut. Large deviations and the Malliavin calculus. Birkhäuser,
Boston, 1994.
[10] Amir Dembo and Ofer Zeitouni. Large Deviations Techniques and Applications.
Springer, Berlin, 1998.
[11] Frank den Hollander. Large Deviations. American Mathematical Society, Provi-
dence, 2008.
[15] David Elworthy. On the Geometry of Diffusion Operators and Stochastic Flows.
Springer, Berlin, 1999.
[17] William Feller. Introduction to Probability Theory and Its Applications. Wiley,
New York, 1957.
[18] Peter K. Friz and Martin Hairer. A course on rough paths. Universitext. Springer,
Cham, 2014. With an introduction to regularity structures.
[19] Peter K. Friz and Nicolas Victoir. Multidimensional Stochastic Processes as Rough
Paths. Cambridge UP, Cambridge, 2010.
[21] Martin Hairer. On Malliavin’s proof of Hörmander’s theorem. Bull. Sci. Math.,
135(6-7):650–666, 2011.
[22] Elton Hsu. Stochastic Analysis on Manifolds. Oxford University Press, Oxford,
2002.
[23] Nobuyuki Ikeda and Shinzo Watanabe. Stochastic Differential Equations
and Diffusion Processes. North-Holland, Amsterdam, 1981.
[24] Jean Jacod and Albert Shiryaev. Limit Theorems for Stochastic Processes.
Springer, Berlin, 2003.
[25] Ioannis Karatzas and Steven E. Shreve. Brownian motion and stochastic calculus.
Springer, New York, 2010.
[26] Peter E. Kloeden and Eckhard Platen. Numerical solution of stochastic differen-
tial equations, volume 23 of Applications of Mathematics (New York). Springer-
Verlag, Berlin, 1992.
[27] Hiroshi Kunita. Stochastic Flows and Stochastic Differential Equations. Cam-
bridge University Press, Cambridge, 1997.
[29] Terry J. Lyons. Differential Equations Driven by Rough Paths, Ecole d’Eté de
Probabilités de Saint-Flour XXXIV. Springer, 2004.
[31] Dilip B. Madan and Eugene Seneta. The variance gamma model for share market
returns. Journal of Business, 63(4):511–524, 1990.
[34] David Nualart. The Malliavin Calculus and Related Topics. Springer, Berlin,
2006.
[37] Daniel Revuz and Marc Yor. Continuous martingales and Brownian motion.
Springer, Berlin, 2005.
[38] L. C. G. Rogers and David Williams. Diffusions, Markov processes and martin-
gales, Vol. 2: Itô calculus. Cambridge UP, Cambridge, 2008.
[39] Steven E. Shreve. Stochastic calculus for finance. II. Springer Finance. Springer-
Verlag, New York, 2004. Continuous-time models.
[41] Shinzo Watanabe. Lectures on stochastic differential equations and Malliavin cal-
culus. Springer, Berlin, 1984.