Stochastic Optimal Control
May 2010
Deterministic Optimal Control Problem
Consider the following linear dynamical system with the state vector $x(t) \in \mathbb{R}^n$ and the control vector $u(t) \in \mathbb{R}^m$:
$$\dot{x}(t) = A(t)x(t) + B(t)u(t), \qquad x(0) = x_0.$$
The permissible controls over the fixed time interval $[0, T]$ satisfy the following condition:
$$u(t) \in U \quad \text{for all } t \in [0, T],$$
where $U$ is a time-invariant, closed, and convex subset of the control space:
$$U \subseteq \mathbb{R}^m.$$
Hamiltonian function:
$$H(x(t), u(t), p(t), t) = \frac{1}{2}x^T(t)Q(t)x(t) + \frac{1}{2}u^T(t)R(t)u(t) + p^T(t)A(t)x(t) + p^T(t)B(t)u(t)$$
$H$-minimizing control: setting $\partial H/\partial u = R(t)u(t) + B^T(t)p(t)$ to zero (with $R(t) \succ 0$ and the constraint inactive) yields
$$u^*(t) = -R^{-1}(t)B^T(t)p^*(t)$$
Plugging the $H$-minimizing control into the differential equations leads to the following linear two-point boundary-value problem:
$$\dot{x}^*(t) = A(t)x^*(t) - B(t)R^{-1}(t)B^T(t)p^*(t)$$
$$\dot{p}^*(t) = -Q(t)x^*(t) - A^T(t)p^*(t)$$
$$x^*(0) = x_0$$
$$p^*(T) = F x^*(T).$$
Considering that $p^*(T)$ is linear in $x^*(T)$ and that the linear differential equations are homogeneous leads to the educated guess that $p^*(t)$ is linear in $x^*(t)$ at all times, i.e.,
$$p^*(t) = K(t)x^*(t).$$
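Differentiating this ansatz and substituting it into the two-point boundary-value problem shows what $K(t)$ must satisfy: from $\dot{p}^*(t) = \dot{K}(t)x^*(t) + K(t)\dot{x}^*(t)$ and the two differential equations above, since the identity must hold for every $x^*(t)$, one obtains the matrix Riccati differential equation
$$\dot{K}(t) = -K(t)A(t) - A^T(t)K(t) + K(t)B(t)R^{-1}(t)B^T(t)K(t) - Q(t), \qquad K(T) = F.$$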
The cost-to-go function can now be split into two parts, and we obtain
$$J(x, t) = \max_{u:[t,T]\to U} E\left[\int_t^{t+\Delta t} L(x(t), u(t), t)\,dt + \underbrace{K(x(T)) + \int_{t+\Delta t}^{T} L(x(t), u(t), t)\,dt}_{J(x,\,t+\Delta t)}\right] \tag{4}$$
For an Itô diffusion $dx = f(x, u, t)\,dt + g(x, u, t)\,dW$, define the operator
$$\mathcal{A}(\cdot) = f^T(x, u, t)\,\frac{\partial(\cdot)}{\partial x} + \frac{1}{2}\,\mathrm{tr}\left\{ g(x, u, t)\,g^T(x, u, t)\,\frac{\partial^2(\cdot)}{\partial x^2} \right\}$$
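This operator is the differential generator of the diffusion; it enters through Itô's lemma, which for the cost-to-go $J(x, t)$ reads
$$dJ(x, t) = \left(\frac{\partial J(x, t)}{\partial t} + \mathcal{A}J(x, t)\right)dt + J_x(x, t)\,g(x, u, t)\,dW.$$
Integrating this from $t$ to $t+\Delta t$ gives exactly the expansion used in the next step.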
"Z #
t+∆t
J(x, t) = max E L(x(t), u(t), t) dt + J(x, t + ∆t)
u:[t,t+∆t]→U t
"Z
t+∆t
= max E L(x, u, t)dt
u:[t,t+∆t]→U t
Z t+∆t à ! Z t+∆t #
∂J(t, x)
+ J(x, t) + + AJ(x, t) dt + Jx (x, t)g(x, u, t)dW
t ∂t t
"Z
t+∆t
= max E L(x, u, t)dt + J(x, t)
u:[t,t+∆t]→U t
Z t+∆t à ! #
∂J(x, t)
+ + AJ(x, t) dt ,
t ∂t
"Z #
t+∆t
∂J(x, t)
0= max E L(x, u, t) + + AJ(x, t)dt (7)
u:[t,t+∆t]→U t ∂t
For $\Delta t \to 0$ we may set the integrand to zero and interchange the maximum operator with the integral:
$$\max_{u\in U}\left\{ L(x, u, t) + \frac{\partial J(x, t)}{\partial t} + \mathcal{A}J(x, t) \right\} = 0, \tag{8}$$
or, writing out the generator explicitly,
$$\frac{\partial J(t, x)}{\partial t} + \max_{u(t)\in U}\left\{ L(t, x, u) + \frac{\partial J(t, x)}{\partial x}\,f(t, x, u) + \frac{1}{2}\,\mathrm{tr}\left\{ g(t, x, u)\,g^T(t, x, u)\,\frac{\partial^2 J(t, x)}{\partial x^2} \right\}\right\} = 0. \tag{9}$$
1. For a fixed $J(t, x)$, one has to find $u = u(t, x, J)$ such that $L(t, x, u) + J_x(t, x)\,f(t, x, u) + \frac{1}{2}\,\mathrm{tr}\{g(t, x, u)\,g^T(t, x, u)\,J_{xx}(t, x)\}$ is maximal.
2. The function $u$ is put back into the HJB equation, and the partial differential equation $J_t(t, x) + L(t, x, u) + J_x(t, x)\,f(t, x, u) + \frac{1}{2}\,\mathrm{tr}\{g(t, x, u)\,g^T(t, x, u)\,J_{xx}(t, x)\} = 0$ is solved with the terminal condition $J(T, x) = K(T, x(T))$. The solution $J(t, x)$ is the maximal value function.
3. The function $J(t, x)$ is put back into the equation for $u$ derived in Step 1. This results in the optimal feedback control law $u^* = u^*(t, x, J(t, x))$; see the sketch after this list.
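To make the procedure concrete, here is a minimal numerical sketch that runs the three steps on a scalar stochastic LQ problem. The dynamics $dx = (ax + bu)\,dt + \sigma\,dW$, the quadratic cost $E[\int_0^T \frac{1}{2}(qx^2 + ru^2)\,dt + \frac{1}{2}f\,x(T)^2]$ (to be minimized, so the maximizations above become minimizations), and all parameter values are illustrative assumptions, not taken from the notes.

```python
# Three-step HJB procedure for a scalar stochastic LQ problem (illustrative sketch).
import numpy as np
from scipy.integrate import solve_ivp

a, b, q, r, f, sigma, T = -1.0, 1.0, 1.0, 0.1, 2.0, 0.5, 5.0

# Step 1 (fixed J): minimizing (r u^2)/2 + b u J_x over u gives u = -(b/r) J_x.
# Step 2: the ansatz J(t, x) = k(t) x^2 / 2 + c(t) turns the HJB PDE into two ODEs:
#   k'(t) = -2 a k - q + (b^2 / r) k^2,  k(T) = f   (scalar Riccati equation)
#   c'(t) = -sigma^2 k / 2,              c(T) = 0   (noise only shifts the value)
def backward_odes(t, y):
    k, c = y
    return [-2.0 * a * k - q + (b**2 / r) * k**2, -0.5 * sigma**2 * k]

# Integrate backward from T to 0 in the reversed time s = T - t.
sol = solve_ivp(lambda s, y: [-d for d in backward_odes(T - s, y)],
                (0.0, T), [f, 0.0], dense_output=True)
k_of_t = lambda t: sol.sol(T - t)[0]

# Step 3: substituting J back into the Step-1 expression gives linear feedback.
def u_star(t, x):
    return -(b / r) * k_of_t(t) * x

print("feedback gain at t = 0:", -(b / r) * k_of_t(0.0))
```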
For the stochastic LQ problem this procedure leads to
$$J_t + \frac{1}{2}\,x^T\left( Q + C^T J_{xx} C - 2\,C^T J_{xx} D L N^T - N L N^T - C^T J_{xx} D L D^T J_{xx} C \right) x - \frac{1}{2}\,J_x B L B^T J_x^T + \frac{1}{2}\,J_x \left( A - B L N^T - C^T J_{xx} D L B^T \right) x + \frac{1}{2}\,x^T \left( A^T - N L B^T - B L D^T J_{xx} C \right) J_x^T = 0, \tag{13}$$
where $L$ is shorthand for $(R + D^T J_{xx} D)^{-1}$, consistent with the feedback law (33) below.
Hamiltonian function:
$$H(x(t), u(t), t, p(t)) = L(x(t), u(t), t) + p^T(t)\,f(x(t), u(t), t)$$
$$\dot{x}^*(t) = \nabla_p H\big|_* = f(x^*(t), u^*(t), t)$$
$$x^*(0) = x_0$$
$$\dot{p}^*(t) = -\nabla_x H\big|_* = -\nabla_x L(x^*(t), u^*(t), t) - f_x^T(x^*(t), u^*(t), t)\,p^*(t)$$
$$p^*(T) = \nabla_x K(x^*(T))$$
$$H(x^*(t), u^*(t), t, p^*(t)) \le H(x^*(t), u, t, p^*(t)) \quad \text{for all } u \in U.$$
$$H(t, x, u, p, p_x) = L(t, x, u) + f^T(t, x, u)\,p + \frac{1}{2}\,\mathrm{tr}\{p_x\,g(t, x, u)\,g^T(t, x, u)\}.$$
Before stating the stochastic maximum principle we need to define the adjoint variables of the stochastic optimal control problem as
$$p(t) = J_x(t, x(t)), \qquad p_x(t) = J_{xx}(t, x(t)). \tag{17}$$
Note that we can use the Hamiltonian function to write the known HJB equation as
$$-J_t = \max_{u\in U} H(t, x, u, p, p_x). \tag{19}$$
In the following we assume there is a known optimal control law $u^*(t, x, p, p_x)$ which solves the optimal control problem, such that
$$H^*(t, x, p, p_x) = H(t, x, u^*(t, x, p, p_x), p, p_x) \tag{20}$$
$$= L(t, x, u^*) + f^T(t, x, u^*)\,p + \frac{1}{2}\,\mathrm{tr}\left\{p_x\,g(t, x, u^*)\,g^T(t, x, u^*)\right\} \tag{21}$$
$$= -J_t. \tag{22}$$
In the next step we write the differential equations for the state and adjoint variables, $dx^*$ and $dp^*$, respectively:
$$dx^* = f(t, x, u^*)\,dt + g(t, x, u^*)\,dW$$
$$= H_p^*(t, x, p, p_x)\,dt + g(t, x, p, p_x)\,dW,$$
which follows from the system dynamics and (21). Applying Itô's lemma to the definition of $p(t)$ in equation (17) gives
$$dp = J_{xt}\,dt + J_{xx}\,dx + \frac{1}{2}\,J_{xxx}\,(dx)^2 \tag{23}$$
$$= \left[ J_{xt} + J_{xx}\,f + \frac{1}{2}\,\mathrm{tr}\{J_{xxx}\,g\,g^T\} \right] dt + J_{xx}\,g\,dW. \tag{24}$$
Then we take the partial derivative of (22) with respect to $x$ to get $J_{xt}$:
$$-J_{xt} = H_x^* + H_p^*\,\frac{\partial p}{\partial x} + H_{p_x}^*\,\frac{\partial p_x}{\partial x}$$
$$= H_x^* + J_{xx}\,f + \frac{1}{2}\,\mathrm{tr}\{J_{xxx}\,g\,g^T\}, \tag{25}$$
and inserting (25) into (24) leads to
$$dp = -H_x^*\,dt + J_{xx}\,g\,dW \tag{26}$$
$$= -H_x^*\,dt + p_x\,g\,dW. \tag{27}$$
In summary, the stochastic maximum principle involves the Hamiltonian
$$H(t, x, u, p, p_x) = L(t, x, u) + f^T(t, x, u)\,p + \frac{1}{2}\,\mathrm{tr}\{p_x\,g(t, x, u)\,g^T(t, x, u)\},$$
the forward-backward system
$$dx^* = H_p^*\,dt + g\,dW$$
$$dp^* = -H_x^*\,dt + p_x\,g\,dW,$$
$$x^*(0) = x_0, \qquad p^*(T) = K_x(T, x(T)),$$
and the maximum condition
$$H^*(t, x(t), p(t), p_x(t)) = \max_{u\in U} H(t, x(t), u, p(t), p_x(t)). \tag{28}$$
- Maximization:
$$H(t, x^*(t), u^*(t), p^*(t), p_x^*(t)) \ge H(t, x^*(t), u(t), p^*(t), p_x^*(t)) \quad \text{for all } t \in [0, T].$$
For the stochastic LQ example, the maximization reduces to
$$u^*(t) = \arg\min_u \left( \frac{1}{2}\,u^T(t)R(t)u(t) + u^T(t)B^T(t)p^*(t) \right),$$
with the optimal control $u^*(t) = -R^{-1}(t)B^T(t)p^*(t)$. Substituting this back into the system of forward-backward stochastic differential equations (FBSDE) yields
$$dx^*(t) = \left[ A(t)x^*(t) - B(t)R^{-1}(t)B^T(t)p^*(t) \right] dt + \sigma(t)\,dW(t)$$
$$x^*(0) = x_0$$
$$dp^*(t) = -\left[ Q(t)x^*(t) + A^T(t)p^*(t) \right] dt + p_x(t)\,\sigma(t)\,dW(t)$$
$$p^*(T) = M(T)\,x^*(T).$$
With the linear ansatz $p^*(t) = K(t)x + \varphi(t)$, the optimal control becomes
$$u^*(t) = -R^{-1}(t)B^T(t)\big(K(t)x + \varphi(t)\big) = -R^{-1}(t)B^T(t)K(t)x,$$
since $\varphi(t)$ vanishes here. The optimal control law is computed by solving the two stochastic Riccati equations (31) and (32) for $K(t)$ and $\varphi(t)$; a numerical sketch follows below.
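The following minimal sketch illustrates this solution numerically; the matrices, horizon, and noise intensity are illustrative assumptions, not from the notes. It relies on the fact that for purely additive noise $\sigma(t)$ the equation for $K(t)$ reduces to the deterministic Riccati equation (certainty equivalence) and $\varphi(t) \equiv 0$; the closed loop is then simulated with the Euler-Maruyama scheme.

```python
# Riccati sweep and closed-loop simulation for the additive-noise LQ problem (sketch).
import numpy as np

n, m, T, N = 2, 1, 5.0, 1000
dt = T / N
A = np.array([[0.0, 1.0], [-1.0, -0.5]])
B = np.array([[0.0], [1.0]])
Q = np.eye(n)
R = 0.1 * np.eye(m)
M = np.eye(n)             # terminal weight in p*(T) = M x*(T)
sigma = 0.2 * np.eye(n)   # additive noise intensity

# Backward Euler sweep for  -dK/dt = Q + A^T K + K A - K B R^{-1} B^T K,  K(T) = M.
Ks = [None] * (N + 1)
Ks[N] = M
for i in range(N, 0, -1):
    K = Ks[i]
    Ks[i - 1] = K + (Q + A.T @ K + K @ A - K @ B @ np.linalg.solve(R, B.T @ K)) * dt

# Euler-Maruyama simulation of dx* = (A x* + B u*) dt + sigma dW, u* = -R^{-1} B^T K x*.
rng = np.random.default_rng(0)
x = np.array([1.0, 0.0])
for i in range(N):
    u = -np.linalg.solve(R, B.T @ Ks[i] @ x)
    x = x + (A @ x + B @ u) * dt + sigma @ rng.normal(scale=np.sqrt(dt), size=n)
print("closed-loop state at T:", x)
```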
$$u(t, x) = -\Big(R + \sum_{j=1}^m D_j^T K D_j\Big)^{-1}\Big(B^T K + N^T + \sum_{j=1}^m D_j^T K C_j\Big)\,x(t) - \Big(R + \sum_{j=1}^m D_j^T K D_j\Big)^{-1}\Big(B^T \varphi + \sum_{j=1}^m D_j^T d_j\Big). \tag{33}$$
The consumption-investment problem consists of maximizing the expected discounted utility of consumption and terminal wealth,
$$\max_{C(\cdot),\,u(\cdot)} E\left[\int_0^T e^{-\rho t}\,C^\gamma(t)\,dt + \pi\,X^\gamma(T)\right],$$
s.t.
$$dX = \left[ X\big(u^T(b - er) + r\big) - C \right] dt + X\,u^T\sigma\,dW.$$
This problem leads to the following HJB equation (with the shorthand notation $\frac{\partial J}{\partial t} \equiv J_t$, $\frac{\partial J}{\partial X} \equiv J_x$, and $\frac{\partial^2 J}{\partial X^2} \equiv J_{xx}$):
$$J_t + \max_{C(t),\,u(t)}\left[ e^{-\rho t}\,C^\gamma + \big( X(u^T(b - er) + r) - C \big)J_x + \frac{1}{2}\,X^2 J_{xx}\,u^T\Sigma u \right] = 0$$
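First step: the maximization can be carried out explicitly. Differentiating the bracket with respect to $u$ and $C$ and setting the derivatives to zero gives
$$X J_x\,(b - er) + X^2 J_{xx}\,\Sigma u = 0, \qquad \gamma e^{-\rho t}\,C^{\gamma-1} - J_x = 0,$$
which are solved by the following candidate policies.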
$$u(t) = -\frac{J_x}{X J_{xx}}\,\Sigma^{-1}(b - er)$$
$$C(t) = \left(\frac{1}{\gamma}\,e^{\rho t}\,J_x\right)^{\frac{1}{\gamma-1}}.$$
Second step: put these values back into the HJB equation:
$$J_t + e^{\frac{\rho t}{\gamma-1}}\,J_x^{\frac{\gamma}{\gamma-1}}\left(\gamma^{-\frac{\gamma}{\gamma-1}} - \gamma^{-\frac{1}{\gamma-1}}\right) - \frac{1}{2}\,\frac{J_x^2}{J_{xx}}\,(b - er)^T\Sigma^{-1}(b - er) + X r J_x = 0$$
To solve this PDE, we use the separation ansatz
$$J(X, t) = X^\gamma\,e^{-\rho t}\,h(t),$$
whose partial derivatives are
$$\frac{\partial J}{\partial t} = e^{-\rho t}\,X^\gamma\big(h'(t) - \rho\,h(t)\big)$$
$$\frac{\partial J}{\partial X} = e^{-\rho t}\,h(t)\,\gamma\,X^{\gamma-1}$$
$$\frac{\partial^2 J}{\partial X^2} = e^{-\rho t}\,h(t)\,\gamma(\gamma-1)\,X^{\gamma-2}.$$
Third step: plugging these results back into the HJB-PDE yields
$$e^{-\rho t}\,X^\gamma\left( h'(t) + h(t)\Big( r\gamma - \rho + \frac{\gamma\,(b - er)^T\Sigma^{-1}(b - er)}{2(1-\gamma)} \Big) + (1-\gamma)\,h(t)^{\frac{\gamma}{\gamma-1}} \right) = 0,$$
so the proposed ansatz is valid. To specify $h(t)$ and find an explicit solution to the optimal control problem, the ordinary differential equation
$$h'(t) + A\,h(t) + (1-\gamma)\,h(t)^{\frac{\gamma}{\gamma-1}} = 0$$
$$h(T) = \pi\,e^{\rho T},$$
with $A = r\gamma - \rho + \frac{\gamma\,(b - er)^T\Sigma^{-1}(b - er)}{2(1-\gamma)}$, remains to be solved.
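The terminal-value problem for $h(t)$ can be integrated numerically; the sketch below does so in reversed time and evaluates the resulting policies. All parameter values are illustrative assumptions, not from the notes.

```python
# Backward integration of h'(t) + A h(t) + (1 - gamma) h(t)^(gamma/(gamma-1)) = 0,
# h(T) = pi * exp(rho * T), and evaluation of the optimal Merton policies (sketch).
import numpy as np
from scipy.integrate import solve_ivp

gamma, rho, r, pi_w, T = 0.5, 0.05, 0.02, 1.0, 10.0   # pi_w: terminal-wealth weight
b = np.array([0.06, 0.08])                             # risky-asset drift rates
Sigma = np.array([[0.04, 0.01], [0.01, 0.09]])         # covariance of risky returns
excess = b - np.ones(2) * r                            # excess returns b - e r

beta = excess @ np.linalg.solve(Sigma, excess)
A = r * gamma - rho + gamma * beta / (2.0 * (1.0 - gamma))

# In reversed time s = T - t the ODE becomes dh/ds = A h + (1 - gamma) h^(gamma/(gamma-1)).
rhs = lambda s, h: [A * h[0] + (1.0 - gamma) * h[0] ** (gamma / (gamma - 1.0))]
sol = solve_ivp(rhs, (0.0, T), [pi_w * np.exp(rho * T)], dense_output=True)
h = lambda t: sol.sol(T - t)[0]

u_star = np.linalg.solve(Sigma, excess) / (1.0 - gamma)   # constant risky-asset mix
c_rate = lambda t: h(t) ** (1.0 / (gamma - 1.0))          # C*(t) = c_rate(t) * X(t)
print("u* =", u_star, "  C*/X at t = 0:", c_rate(0.0))
```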
Inserting this solution for $J$ and its partial derivatives into the equations for $C$ and $u$ yields the optimal policies for consumption $C^*(t)$ and the investment strategy $u^*(t)$:
$$u^*(t) = -\frac{1}{\gamma-1}\,\Sigma^{-1}(b - er) = \frac{1}{1-\gamma}\,\Sigma^{-1}(b - er)$$
$$C^*(t) = \big(h(t)\,X^{\gamma-1}\big)^{\frac{1}{\gamma-1}} = h(t)^{\frac{1}{\gamma-1}}\,X.$$