Stochastic Optimal Control
May 2010
Deterministic Optimal Control Problem
Consider the following linear dynamical system with the state vector $x(t) \in \mathbb{R}^n$ and the control vector $u(t) \in \mathbb{R}^m$:
$$\dot{x}(t) = A(t)x(t) + B(t)u(t), \qquad x(0) = x_0.$$
The permissible controls over the fixed time interval $[0, T]$ satisfy the following condition:
$$u(t) \in U \quad \text{for all } t \in [0, T],$$
where $U$ is a time-invariant, closed, and convex subset of the control space:
$$U \subseteq \mathbb{R}^m.$$
Hamiltonian function:
$$H(x(t), u(t), p(t), t) = \frac{1}{2}x^T(t)Q(t)x(t) + \frac{1}{2}u^T(t)R(t)u(t) + p^T(t)A(t)x(t) + p^T(t)B(t)u(t)$$
$H$-minimizing control: setting $\partial H/\partial u = R(t)u(t) + B^T(t)p(t)$ to zero (with $R(t) \succ 0$ and the constraint inactive) yields
$$u^*(t) = -R^{-1}(t)B^T(t)p^*(t)$$
Plugging the $H$-minimizing control into the differential equations leads to the following linear two-point boundary-value problem:
$$\dot{x}^*(t) = A(t)x^*(t) - B(t)R^{-1}(t)B^T(t)p^*(t)$$
$$\dot{p}^*(t) = -Q(t)x^*(t) - A^T(t)p^*(t)$$
$$x^*(0) = x_0$$
$$p^*(T) = F x^*(T).$$
Considering that $p^*(T)$ is linear in $x^*(T)$ and that the linear differential equations are homogeneous leads to the educated guess that $p^*(t)$ is linear in $x^*(t)$ at all times, i.e.,
$$p^*(t) = K(t)x^*(t).$$
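Differentiating this ansatz and substituting it into the two-point boundary-value problem shows what $K(t)$ must satisfy: from $\dot{p}^*(t) = \dot{K}(t)x^*(t) + K(t)\dot{x}^*(t)$ and the two differential equations above, since the identity must hold for every $x^*(t)$, one obtains the matrix Riccati differential equation
$$\dot{K}(t) = -K(t)A(t) - A^T(t)K(t) + K(t)B(t)R^{-1}(t)B^T(t)K(t) - Q(t), \qquad K(T) = F.$$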
The cost-to-go function can now be split into two parts, and we obtain
$$J(x, t) = \max_{u:[t,T]\to U} E\left[\int_t^{t+\Delta t} L(x(t), u(t), t)\,dt + \underbrace{K(x(T)) + \int_{t+\Delta t}^{T} L(x(t), u(t), t)\,dt}_{J(x,\,t+\Delta t)}\right] \tag{4}$$
For an Itô diffusion $dx = f(x, u, t)\,dt + g(x, u, t)\,dW$, define the operator
$$\mathcal{A}(\cdot) = f^T(x, u, t)\,\frac{\partial(\cdot)}{\partial x} + \frac{1}{2}\,\mathrm{tr}\left\{ g(x, u, t)\,g^T(x, u, t)\,\frac{\partial^2(\cdot)}{\partial x^2} \right\}$$
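This operator is the differential generator of the diffusion; it enters through Itô's lemma, which for the cost-to-go $J(x, t)$ reads
$$dJ(x, t) = \left(\frac{\partial J(x, t)}{\partial t} + \mathcal{A}J(x, t)\right)dt + J_x(x, t)\,g(x, u, t)\,dW.$$
Integrating this from $t$ to $t+\Delta t$ gives exactly the expansion used in the next step.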
"Z #
t+∆t
J(x, t) = max E L(x(t), u(t), t) dt + J(x, t + ∆t)
u:[t,t+∆t]→U t
"Z
t+∆t
= max E L(x, u, t)dt
u:[t,t+∆t]→U t
Z t+∆t à ! Z t+∆t #
∂J(t, x)
+ J(x, t) + + AJ(x, t) dt + Jx (x, t)g(x, u, t)dW
t ∂t t
"Z
t+∆t
= max E L(x, u, t)dt + J(x, t)
u:[t,t+∆t]→U t
Z t+∆t à ! #
∂J(x, t)
+ + AJ(x, t) dt ,
t ∂t
"Z #
t+∆t
∂J(x, t)
0= max E L(x, u, t) + + AJ(x, t)dt (7)
u:[t,t+∆t]→U t ∂t
For $\Delta t \to 0$ we may set the integrand to zero and interchange the maximum operator with the integral:
$$\max_{u\in U}\left\{ L(x, u, t) + \frac{\partial J(x, t)}{\partial t} + \mathcal{A}J(x, t) \right\} = 0, \tag{8}$$
or, writing out the generator explicitly,
$$\frac{\partial J(t, x)}{\partial t} + \max_{u(t)\in U}\left\{ L(t, x, u) + \frac{\partial J(t, x)}{\partial x}\,f(t, x, u) + \frac{1}{2}\,\mathrm{tr}\left\{ g(t, x, u)\,g^T(t, x, u)\,\frac{\partial^2 J(t, x)}{\partial x^2} \right\}\right\} = 0. \tag{9}$$
1. For a fixed $J(t, x)$, one has to find $u = u(t, x, J)$ such that $L(t, x, u) + J_x(t, x)\,f(t, x, u) + \frac{1}{2}\,\mathrm{tr}\{g(t, x, u)\,g^T(t, x, u)\,J_{xx}(t, x)\}$ is maximal.
2. The function $u$ is put back into the HJB equation, and the partial differential equation $J_t(t, x) + L(t, x, u) + J_x(t, x)\,f(t, x, u) + \frac{1}{2}\,\mathrm{tr}\{g(t, x, u)\,g^T(t, x, u)\,J_{xx}(t, x)\} = 0$ is solved with the terminal condition $J(T, x) = K(T, x(T))$. The solution $J(t, x)$ is the maximal value function.
3. The function $J(t, x)$ is put back into the equation for $u$ derived in Step 1. This results in the optimal feedback control law $u^* = u^*(t, x, J(t, x))$; see the sketch after this list.
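To make the procedure concrete, here is a minimal numerical sketch that runs the three steps on a scalar stochastic LQ problem. The dynamics $dx = (ax + bu)\,dt + \sigma\,dW$, the quadratic cost $E[\int_0^T \frac{1}{2}(qx^2 + ru^2)\,dt + \frac{1}{2}f\,x(T)^2]$ (to be minimized, so the maximizations above become minimizations), and all parameter values are illustrative assumptions, not taken from the notes.

```python
# Three-step HJB procedure for a scalar stochastic LQ problem (illustrative sketch).
import numpy as np
from scipy.integrate import solve_ivp

a, b, q, r, f, sigma, T = -1.0, 1.0, 1.0, 0.1, 2.0, 0.5, 5.0

# Step 1 (fixed J): minimizing (r u^2)/2 + b u J_x over u gives u = -(b/r) J_x.
# Step 2: the ansatz J(t, x) = k(t) x^2 / 2 + c(t) turns the HJB PDE into two ODEs:
#   k'(t) = -2 a k - q + (b^2 / r) k^2,  k(T) = f   (scalar Riccati equation)
#   c'(t) = -sigma^2 k / 2,              c(T) = 0   (noise only shifts the value)
def backward_odes(t, y):
    k, c = y
    return [-2.0 * a * k - q + (b**2 / r) * k**2, -0.5 * sigma**2 * k]

# Integrate backward from T to 0 in the reversed time s = T - t.
sol = solve_ivp(lambda s, y: [-d for d in backward_odes(T - s, y)],
                (0.0, T), [f, 0.0], dense_output=True)
k_of_t = lambda t: sol.sol(T - t)[0]

# Step 3: substituting J back into the Step-1 expression gives linear feedback.
def u_star(t, x):
    return -(b / r) * k_of_t(t) * x

print("feedback gain at t = 0:", -(b / r) * k_of_t(0.0))
```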
For the stochastic LQ problem this procedure leads to
$$J_t + \frac{1}{2}\,x^T\left( Q + C^T J_{xx} C - 2\,C^T J_{xx} D L N^T - N L N^T - C^T J_{xx} D L D^T J_{xx} C \right) x - \frac{1}{2}\,J_x B L B^T J_x^T + \frac{1}{2}\,J_x \left( A - B L N^T - C^T J_{xx} D L B^T \right) x + \frac{1}{2}\,x^T \left( A^T - N L B^T - B L D^T J_{xx} C \right) J_x^T = 0, \tag{13}$$
where $L$ is shorthand for $(R + D^T J_{xx} D)^{-1}$, consistent with the feedback law (33) below.
Hamiltonian function:
$$H(x(t), u(t), t, p(t)) = L(x(t), u(t), t) + p^T(t)\,f(x(t), u(t), t)$$
$$\dot{x}^*(t) = \nabla_p H\big|_* = f(x^*(t), u^*(t), t)$$
$$x^*(0) = x_0$$
$$\dot{p}^*(t) = -\nabla_x H\big|_* = -\nabla_x L(x^*(t), u^*(t), t) - f_x^T(x^*(t), u^*(t), t)\,p^*(t)$$
$$p^*(T) = \nabla_x K(x^*(T))$$
$$H(x^*(t), u^*(t), t, p^*(t)) \le H(x^*(t), u, t, p^*(t)) \quad \text{for all } u \in U.$$
$$H(t, x, u, p, p_x) = L(t, x, u) + f^T(t, x, u)\,p + \frac{1}{2}\,\mathrm{tr}\{p_x\,g(t, x, u)\,g^T(t, x, u)\}.$$
Before stating the stochastic maximum principle we need to define the adjoint variables of the stochastic optimal control problem as
$$p(t) = J_x(t, x(t)), \qquad p_x(t) = J_{xx}(t, x(t)). \tag{17}$$
Note that we can use the Hamiltonian function to write the known HJB equation as
$$-J_t = \max_{u\in U} H(t, x, u, p, p_x). \tag{19}$$
In the following we assume there is a known optimal control law $u^*(t, x, p, p_x)$ which solves the optimal control problem, such that
$$H^*(t, x, p, p_x) = H(t, x, u^*(t, x, p, p_x), p, p_x) \tag{20}$$
$$= L(t, x, u^*) + f^T(t, x, u^*)\,p + \frac{1}{2}\,\mathrm{tr}\left\{p_x\,g(t, x, u^*)\,g^T(t, x, u^*)\right\} \tag{21}$$
$$= -J_t. \tag{22}$$
In the next step we write the differential equations for the state and adjoint variables, $dx^*$ and $dp^*$, respectively:
$$dx^* = f(t, x, u^*)\,dt + g(t, x, u^*)\,dW$$
$$= H_p^*(t, x, p, p_x)\,dt + g(t, x, p, p_x)\,dW,$$
which follows from the system dynamics and (21). Applying Itô's lemma to the definition of $p(t)$ in equation (17) gives
$$dp = J_{xt}\,dt + J_{xx}\,dx + \frac{1}{2}\,J_{xxx}\,(dx)^2 \tag{23}$$
$$= \left[ J_{xt} + J_{xx}\,f + \frac{1}{2}\,\mathrm{tr}\{J_{xxx}\,g\,g^T\} \right] dt + J_{xx}\,g\,dW. \tag{24}$$
Then we take the partial derivative of (22) with respect to $x$ to get $J_{xt}$:
$$-J_{xt} = H_x^* + H_p^*\,\frac{\partial p}{\partial x} + H_{p_x}^*\,\frac{\partial p_x}{\partial x}$$
$$= H_x^* + J_{xx}\,f + \frac{1}{2}\,\mathrm{tr}\{J_{xxx}\,g\,g^T\}, \tag{25}$$
and inserting (25) into (24) leads to
$$dp = -H_x^*\,dt + J_{xx}\,g\,dW \tag{26}$$
$$= -H_x^*\,dt + p_x\,g\,dW. \tag{27}$$
In summary, the stochastic maximum principle involves the Hamiltonian
$$H(t, x, u, p, p_x) = L(t, x, u) + f^T(t, x, u)\,p + \frac{1}{2}\,\mathrm{tr}\{p_x\,g(t, x, u)\,g^T(t, x, u)\},$$
the forward-backward system
$$dx^* = H_p^*\,dt + g\,dW$$
$$dp^* = -H_x^*\,dt + p_x\,g\,dW,$$
$$x^*(0) = x_0, \qquad p^*(T) = K_x(T, x(T)),$$
and the maximum condition
$$H^*(t, x(t), p(t), p_x(t)) = \max_{u\in U} H(t, x(t), u, p(t), p_x(t)). \tag{28}$$
- Maximization:
$$H(t, x^*(t), u^*(t), p^*(t), p_x^*(t)) \ge H(t, x^*(t), u(t), p^*(t), p_x^*(t)) \quad \text{for all } t \in [0, T].$$
For the stochastic LQ example, the maximization reduces to
$$u^*(t) = \arg\min_u \left( \frac{1}{2}\,u^T(t)R(t)u(t) + u^T(t)B^T(t)p^*(t) \right),$$
with the optimal control $u^*(t) = -R^{-1}(t)B^T(t)p^*(t)$. Substituting this back into the system of forward-backward stochastic differential equations (FBSDE) yields
$$dx^*(t) = \left[ A(t)x^*(t) - B(t)R^{-1}(t)B^T(t)p^*(t) \right] dt + \sigma(t)\,dW(t)$$
$$x^*(0) = x_0$$
$$dp^*(t) = -\left[ Q(t)x^*(t) + A^T(t)p^*(t) \right] dt + p_x(t)\,\sigma(t)\,dW(t)$$
$$p^*(T) = M(T)\,x^*(T).$$
With the linear ansatz $p^*(t) = K(t)x + \varphi(t)$, the optimal control becomes
$$u^*(t) = -R^{-1}(t)B^T(t)\big(K(t)x + \varphi(t)\big) = -R^{-1}(t)B^T(t)K(t)x,$$
since $\varphi(t)$ vanishes here. The optimal control law is computed by solving the two stochastic Riccati equations (31) and (32) for $K(t)$ and $\varphi(t)$; a numerical sketch follows below.
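The following minimal sketch illustrates this solution numerically; the matrices, horizon, and noise intensity are illustrative assumptions, not from the notes. It relies on the fact that for purely additive noise $\sigma(t)$ the equation for $K(t)$ reduces to the deterministic Riccati equation (certainty equivalence) and $\varphi(t) \equiv 0$; the closed loop is then simulated with the Euler-Maruyama scheme.

```python
# Riccati sweep and closed-loop simulation for the additive-noise LQ problem (sketch).
import numpy as np

n, m, T, N = 2, 1, 5.0, 1000
dt = T / N
A = np.array([[0.0, 1.0], [-1.0, -0.5]])
B = np.array([[0.0], [1.0]])
Q = np.eye(n)
R = 0.1 * np.eye(m)
M = np.eye(n)             # terminal weight in p*(T) = M x*(T)
sigma = 0.2 * np.eye(n)   # additive noise intensity

# Backward Euler sweep for  -dK/dt = Q + A^T K + K A - K B R^{-1} B^T K,  K(T) = M.
Ks = [None] * (N + 1)
Ks[N] = M
for i in range(N, 0, -1):
    K = Ks[i]
    Ks[i - 1] = K + (Q + A.T @ K + K @ A - K @ B @ np.linalg.solve(R, B.T @ K)) * dt

# Euler-Maruyama simulation of dx* = (A x* + B u*) dt + sigma dW, u* = -R^{-1} B^T K x*.
rng = np.random.default_rng(0)
x = np.array([1.0, 0.0])
for i in range(N):
    u = -np.linalg.solve(R, B.T @ Ks[i] @ x)
    x = x + (A @ x + B @ u) * dt + sigma @ rng.normal(scale=np.sqrt(dt), size=n)
print("closed-loop state at T:", x)
```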
$$u(t, x) = -\Big(R + \sum_{j=1}^m D_j^T K D_j\Big)^{-1}\Big(B^T K + N^T + \sum_{j=1}^m D_j^T K C_j\Big)\,x(t) - \Big(R + \sum_{j=1}^m D_j^T K D_j\Big)^{-1}\Big(B^T \varphi + \sum_{j=1}^m D_j^T d_j\Big). \tag{33}$$
The consumption-investment problem consists of maximizing the expected discounted utility of consumption and terminal wealth,
$$\max_{C(\cdot),\,u(\cdot)} E\left[\int_0^T e^{-\rho t}\,C^\gamma(t)\,dt + \pi\,X^\gamma(T)\right],$$
s.t.
$$dX = \left[ X\big(u^T(b - er) + r\big) - C \right] dt + X\,u^T\sigma\,dW.$$
This problem leads to the following HJB equation (with the shorthand notation $\frac{\partial J}{\partial t} \equiv J_t$, $\frac{\partial J}{\partial X} \equiv J_x$, and $\frac{\partial^2 J}{\partial X^2} \equiv J_{xx}$):
$$J_t + \max_{C(t),\,u(t)}\left[ e^{-\rho t}\,C^\gamma + \big( X(u^T(b - er) + r) - C \big)J_x + \frac{1}{2}\,X^2 J_{xx}\,u^T\Sigma u \right] = 0$$
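First step: the maximization can be carried out explicitly. Differentiating the bracket with respect to $u$ and $C$ and setting the derivatives to zero gives
$$X J_x\,(b - er) + X^2 J_{xx}\,\Sigma u = 0, \qquad \gamma e^{-\rho t}\,C^{\gamma-1} - J_x = 0,$$
which are solved by the following candidate policies.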
$$u(t) = -\frac{J_x}{X J_{xx}}\,\Sigma^{-1}(b - er)$$
$$C(t) = \left(\frac{1}{\gamma}\,e^{\rho t}\,J_x\right)^{\frac{1}{\gamma-1}}.$$
Second step: put these values back into the HJB equation:
$$J_t + e^{\frac{\rho t}{\gamma-1}}\,J_x^{\frac{\gamma}{\gamma-1}}\left(\gamma^{-\frac{\gamma}{\gamma-1}} - \gamma^{-\frac{1}{\gamma-1}}\right) - \frac{1}{2}\,\frac{J_x^2}{J_{xx}}\,(b - er)^T\Sigma^{-1}(b - er) + X r J_x = 0$$
To solve this PDE, we use the separation ansatz
$$J(X, t) = X^\gamma\,e^{-\rho t}\,h(t),$$
whose partial derivatives are
$$\frac{\partial J}{\partial t} = e^{-\rho t}\,X^\gamma\big(h'(t) - \rho\,h(t)\big)$$
$$\frac{\partial J}{\partial X} = e^{-\rho t}\,h(t)\,\gamma\,X^{\gamma-1}$$
$$\frac{\partial^2 J}{\partial X^2} = e^{-\rho t}\,h(t)\,\gamma(\gamma-1)\,X^{\gamma-2}.$$
Third step: plugging these results back into the HJB-PDE yields
$$e^{-\rho t}\,X^\gamma\left( h'(t) + h(t)\Big( r\gamma - \rho + \frac{\gamma\,(b - er)^T\Sigma^{-1}(b - er)}{2(1-\gamma)} \Big) + (1-\gamma)\,h(t)^{\frac{\gamma}{\gamma-1}} \right) = 0,$$
so the proposed ansatz is valid. To specify $h(t)$ and find an explicit solution to the optimal control problem, the ordinary differential equation
$$h'(t) + A\,h(t) + (1-\gamma)\,h(t)^{\frac{\gamma}{\gamma-1}} = 0$$
$$h(T) = \pi\,e^{\rho T},$$
with $A = r\gamma - \rho + \frac{\gamma\,(b - er)^T\Sigma^{-1}(b - er)}{2(1-\gamma)}$, remains to be solved.
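The terminal-value problem for $h(t)$ can be integrated numerically; the sketch below does so in reversed time and evaluates the resulting policies. All parameter values are illustrative assumptions, not from the notes.

```python
# Backward integration of h'(t) + A h(t) + (1 - gamma) h(t)^(gamma/(gamma-1)) = 0,
# h(T) = pi * exp(rho * T), and evaluation of the optimal Merton policies (sketch).
import numpy as np
from scipy.integrate import solve_ivp

gamma, rho, r, pi_w, T = 0.5, 0.05, 0.02, 1.0, 10.0   # pi_w: terminal-wealth weight
b = np.array([0.06, 0.08])                             # risky-asset drift rates
Sigma = np.array([[0.04, 0.01], [0.01, 0.09]])         # covariance of risky returns
excess = b - np.ones(2) * r                            # excess returns b - e r

beta = excess @ np.linalg.solve(Sigma, excess)
A = r * gamma - rho + gamma * beta / (2.0 * (1.0 - gamma))

# In reversed time s = T - t the ODE becomes dh/ds = A h + (1 - gamma) h^(gamma/(gamma-1)).
rhs = lambda s, h: [A * h[0] + (1.0 - gamma) * h[0] ** (gamma / (gamma - 1.0))]
sol = solve_ivp(rhs, (0.0, T), [pi_w * np.exp(rho * T)], dense_output=True)
h = lambda t: sol.sol(T - t)[0]

u_star = np.linalg.solve(Sigma, excess) / (1.0 - gamma)   # constant risky-asset mix
c_rate = lambda t: h(t) ** (1.0 / (gamma - 1.0))          # C*(t) = c_rate(t) * X(t)
print("u* =", u_star, "  C*/X at t = 0:", c_rate(0.0))
```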
Inserting this solution for $J$ and its partial derivatives into the equations for $C$ and $u$ yields the optimal policies for consumption $C^*(t)$ and the investment strategy $u^*(t)$:
$$u^*(t) = -\frac{1}{\gamma-1}\,\Sigma^{-1}(b - er) = \frac{1}{1-\gamma}\,\Sigma^{-1}(b - er)$$
$$C^*(t) = \big(h(t)\,X^{\gamma-1}\big)^{\frac{1}{\gamma-1}} = h(t)^{\frac{1}{\gamma-1}}\,X.$$