Lecture 8, S21
Optimal Control
and Dynamic Games
Shankar Sastry, Forrest Laine, Claire Tomlin
These notes represent an introduction to the theory of optimal control, the linear quadratic
regulator (LQR), and dynamic games.
There exist two main approaches to optimal control:
1. For the calculus of variations, the optimal curve should be such that neighboring curves
do not lead to smaller costs. Thus the ‘derivative’ of the cost function about the optimal
curve should be zero: one takes small variations about the candidate optimal solution
and attempts to make the change in the cost zero.
2. For dynamic programming, the optimal curve remains optimal at intermediate points
in time.
In these notes, both approaches are discussed for optimal control; the results are then spe-
cialized to the case of linear dynamics and quadratic cost. Then, the methods are generalized
to dynamic games.
the ox-hide into a tiny strip and proceeded to enclose the entire area of what came to be
known as Carthage in a circle of the appropriate radius. The calculus of variations is really
the ancient precursor to optimal control. Isoperimetric problems of the kind that gave
Dido her kingdom were treated in detail by Tonelli and later by Euler. Both Euler and
Lagrange laid the foundations of mechanics in a variational setting culminating in the Euler
Lagrange equations. Newton used variational methods to determine the shape of a body that
minimizes drag, and Bernoulli formulated his brachistochrone problem in the seventeenth
century, which attracted the attention of Newton and L’Hôpital. This intellectual heritage
was revived and generalized by Bellman [3] in the context of dynamic programming and by
Pontryagin and his school in the so-called Pontryagin principle for optimal control ([6]).
Consider a nonlinear, possibly time-varying dynamical system described by

ẋ = f(x, u, t)                                                        (1)

with state x(t) ∈ Rn and control input u(t) ∈ Rni. Consider the problem of minimizing the
performance index

J = φ(x(tf), tf) + ∫_{t0}^{tf} L(x(t), u(t), t) dt                    (2)
where t0 is the initial time, tf the final time (free), L(x, u, t) is the running cost, and
φ(x(tf ), tf ) is the cost at the terminal time. The initial time t0 is assumed to be fixed
and tf variable. Problems involving a cost only on the final and initial state are referred
to as Mayer problems, those involving only the integral or running cost are called Lagrange
problems and costs of the form of equation (2) are referred to as Bolza problems. We will
also have a constraint on the final state given by
ψ(x(tf ), tf ) = 0 (3)
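For example, a minimum-time problem, with L ≡ 1 and φ ≡ 0, is a Lagrange problem; a
problem that penalizes only the miss distance of x(tf) from a desired target is a Mayer
problem; and adding a control-effort term such as ∫_{t0}^{tf} ‖u(t)‖² dt to the latter produces
a Bolza problem. A typical final state constraint of the form (3) is ψ(x(tf), tf) = x(tf) − xd,
which pins the terminal state to a desired value xd (here xd is simply an illustrative target,
not notation used elsewhere in these notes).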
The variation of (4) is given by assuming independent variations in δu(·), δx(·), δp(·), δλ,
and δtf:

δJ̃ = (D1φ + D1ψᵀλ)δx|tf + (D2φ + D2ψᵀλ)δt|tf + ψᵀδλ + (H − pᵀẋ)δt|tf
      + ∫_{t0}^{tf} [ D1H δx + D3H δu − pᵀδẋ + (D2Hᵀ − ẋ)ᵀδp ] dt                 (6)
The notation DiH stands for the derivative of H with respect to the i-th argument. Thus,
for example,

D1H(x, p, u, t) = ∂H/∂x,        D3H(x, p, u, t) = ∂H/∂u.

Integrating ∫ pᵀδẋ dt by parts yields
δJ̃ = (D1φ + D1ψᵀλ − pᵀ)δx(tf) + (D2φ + D2ψᵀλ + H)δtf + ψᵀδλ
      + ∫_{t0}^{tf} [ (D1H + ṗᵀ)δx + D3H δu + (D2Hᵀ − ẋ)ᵀδp ] dt                  (7)
An extremum of J˜ is achieved when δ J˜ = 0 for all independent variations δλ, δx, δu, δp.
These conditions are recorded in the following
Table 1

    Condition                                                     Variation
    State equation          ẋ = (∂H/∂p)ᵀ                          δp
    Costate equation        ṗ = −(∂H/∂x)ᵀ                         δx
    Input stationarity      ∂H/∂u = 0                             δu
    Boundary conditions     D1φ − pᵀ = −D1ψᵀλ |tf                 δx(tf)
                            H + D2φ = −D2ψᵀλ |tf                  δtf
The conditions of Table (1) and the boundary conditions x(t0 ) = x0 and the constraint on
the final state ψ(x(tf ), tf ) = 0 constitute the necessary conditions for optimality. The end
point constraint equation is referred to as the transversality condition:
D1φ − pᵀ = −D1ψᵀλ
H + D2φ = −D2ψᵀλ                                                      (8)
The optimality conditions may be written explicitly as
ẋ = (∂H/∂p)ᵀ(x, u∗, p)
ṗ = −(∂H/∂x)ᵀ(x, u∗, p)                                               (9)

with the stationarity condition reading

∂H/∂u (x, u∗, p) = 0
and the endpoint constraint ψ(x(tf), tf) = 0. The key point in the derivation of the
necessary conditions of optimality is that the Legendre transformation of the Lagrangian to
be minimized into a Hamiltonian converts a functional minimization problem into a static
optimization problem on the function H(x, u, p, t).
The question of when these equations also constitute sufficient conditions for (local) opti-
mality is an important one and needs to be ascertained by taking the second variation of J̃.
This is an involved procedure, but the input stationarity condition in Table (1) hints at a
sufficient condition for a given trajectory x∗(·), u∗(·), p∗(·) to be a local minimum: the
Hessian of the Hamiltonian,

D22H(x∗, u∗, p∗, t)                                                   (10)

should be positive definite along the optimal trajectory. A sufficient condition for this is
simply to ask that the ni × ni Hessian matrix

D22H(x, u, p, t)                                                      (11)

be positive definite for all x, u, p, t. As far as conditions for global minimality are concerned,
again the stationarity condition hints at a sufficient condition for global minimality being that

u∗(t) = argmin_u H(x∗(t), u, p∗(t), t)                                 (12)
Sufficient conditions for this are, for example, the convexity of the Hamiltonian H(x, u, p, t)
in u.
Finally, there are instances in which the Hamiltonian H(x, u, p, t) is not a function of u at
some values of x, p, t. These cases are referred to as singular extremals and need to be treated
with care, since the value of u is left unspecified as far as the optimization is concerned.
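As a simple illustration (not worked out in the notes themselves), take the scalar system
ẋ = u with L = ½(x² + u²), φ ≡ 0, fixed tf, and no terminal constraint ψ. Then
H = ½x² + ½u² + pu, input stationarity gives u∗ = −p, the state and costate equations
become ẋ = −p and ṗ = −x, and the transversality condition reduces to p(tf) = 0. Since
D22H = 1 > 0 and H is convex in u, the resulting extremal minimizes H over u both locally
and globally.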
Further, if there is no final state constraint, the boundary condition simplifies even further
to

p(tf) = D1φᵀ|tf                                                       (14)
    State equation              ẋ = (∂H/∂p)ᵀ = f(x, u∗)
    Costate equation            ṗ = −(∂H/∂x)ᵀ = −D1fᵀ p − D1Lᵀ
    Stationarity condition      0 = (∂H/∂u)ᵀ = D2Lᵀ + D2fᵀ p
    Transversality conditions   D1φ − pᵀ = −D1ψᵀλ
                                H(tf) = 0
In addition, it may be verified that

dH∗/dt = (∂H∗/∂x)(x, p) ẋ + (∂H∗/∂p)(x, p) ṗ = 0                       (15)

so that H∗(t) is constant; combined with the condition H(tf) = 0 above, this establishes
that H∗(t) ≡ 0.
As an example, consider a mechanical system of the simplest kind², with configuration q
and dynamics

q̇ = u

² For example, there is no dissipation and there are no nonholonomic constraints. Holonomic or integrable
constraints are dealt with by adding appropriate Lagrange multipliers. If nonholonomic constraints are dealt
with in the same manner, we get equations of motion, dubbed vakonomic by Arnold [11], which do not
correspond to experimentally observed motions. On the other hand, if there are only holonomic constraints,
the equations of motion that we derive from Hamilton's principle of least action are equivalent to Newton's
laws.
with Lagrangian
L(q, u) = T (q, u) − U (q)
The equations (9) in this context have H(q, u, p) = L(q, u) + pT u. u∗ = u∗ (p, q) is chosen
so as to minimize the Hamiltonian H. A necessary condition for stationarity is that u∗ (p, q)
satisfies
0 = ∂H/∂u = ∂L/∂u + pᵀ                                                (16)
The form of the equations (9) in this context is that of the familiar Hamilton Jacobi equations.
The costate p has the interpretation of momentum.
q̇ = ∂H∗/∂p (p, q) = u∗(p, q)
ṗ = −∂H∗/∂q (p, q)                                                    (17)
Combining the second of these equations with (16) yields the familiar Euler Lagrange equa-
tions
∂L/∂q − d/dt (∂L/∂q̇) = 0                                              (18)
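For instance, for a point mass m in a potential field, T = ½ m q̇ᵀq̇ and U = U(q), and
equation (18) gives m q̈ = −(∂U/∂q)ᵀ, which is Newton's second law.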
In the sequel, H∗ denotes the optimal Hamiltonian, H∗(x, p, t) := H(x, u∗, p, t).
Theorem 1 (Hamilton Jacobi Bellman equation). Consider the time-varying optimal control
problem of (2) with fixed end time tf and time-varying dynamics. If the optimal value
function, i.e. J∗(x(t0), t0), is a smooth function of x, t, then J∗(x, t) satisfies the Hamilton
Jacobi Bellman partial differential equation

∂J∗/∂t (x, t) = −H∗(x, ∂J∗/∂x (x, t), t)                               (19)

with boundary conditions given by J∗(x, tf) = φ(x, tf) for all x ∈ {x : ψ(x, tf) = 0}.
Proof: The proof uses the principle of optimality. This principle says that if we have found
the optimal trajectory on the interval from [t, tf ] by solving the optimal control problem
on that interval, the resulting trajectory is also optimal on all subintervals of this interval
of the form [t1 , tf ] with t1 > t, provided that the initial condition at time t1 was obtained
from running the system forward along the optimal trajectory from time t. Thus, using
t1 = t + ∆t, it follows that
J∗(x, t) = min_{u(τ), t ≤ τ ≤ t+∆t} [ ∫_t^{t+∆t} L(x, u, τ) dτ + J∗(x + ∆x, t + ∆t) ]      (20)
Expanding J∗(x + ∆x, t + ∆t) to first order and letting ∆t → 0 gives

−∂J∗/∂t = min_{u(t)} [ L + (∂J∗/∂x) f ]                                (21)

with boundary condition
J ∗ (x, tf ) = φ(x, tf )
on the surface ψ(x) = 0. Using the definition of the Hamiltonian in equation (5), it follows
from equation (21) that the Hamilton Jacobi equation of equation (19) holds.
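The recursion (20) can be implemented directly on a grid. The sketch below is purely
illustrative and uses assumed data that is not taken from the notes: scalar dynamics ẋ = u,
running cost L = ½(x² + u²), terminal cost φ(x) = ½x², no terminal constraint, and a crude
discretization of x, u, and t.

```python
import numpy as np

# Assumed, illustrative problem: x_dot = u, L = 0.5*(x^2 + u^2), phi(x) = 0.5*x^2.
xs = np.linspace(-3.0, 3.0, 301)       # state grid
us = np.linspace(-3.0, 3.0, 121)       # candidate controls
dt, T = 0.01, 2.0

L = lambda x, u: 0.5 * (x ** 2 + u ** 2)
f = lambda x, u: u

J = 0.5 * xs ** 2                      # terminal condition J*(x, T) = phi(x)
for _ in range(round(T / dt)):
    # Principle of optimality, eq. (20):
    #   J*(x, t) = min_u [ L(x, u) dt + J*(x + f(x, u) dt, t + dt) ]
    # np.interp clamps at the grid edges, which is adequate for this crude sketch.
    candidates = [L(xs, u) * dt + np.interp(xs + f(xs, u) * dt, xs, J) for u in us]
    J = np.min(np.array(candidates), axis=0)

# For this particular problem the Riccati equation (39) of Section 3 gives V(t) = 1,
# so the exact value is J*(x, 0) = 0.5 x^2; the grid value should be close.
print("J*(1, 0) ~", np.interp(1.0, xs, J))
```

As the grids are refined, the computed values approach ½x², the quadratic value function
predicted by the Riccati analysis of the next section.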
Remarks:
1. The preceding theorem was stated as a necessary condition for extremal solutions of
the optimal control problem. As far as minimizing and globally optimal solutions of the
optimal control problem are concerned, the Hamilton Jacobi Bellman equation reads as in
equation (21). In this sense, the form of the Hamilton Jacobi Bellman equation in (21) is
more general.
2. The Eulerian conditions of Table (1) are easily obtained from the Hamilton Jacobi
Bellman equation by proving that pᵀ(t) := ∂J∗/∂x (x, t) satisfies the costate equations of
that Table. Indeed, consider equation (21). Since u(t) is unconstrained, it follows that it
should satisfy

∂L/∂u (x∗, u∗) + (∂f/∂u)ᵀ p = 0                                        (22)
Now differentiating the definition of p(t) above with respect to t yields

dpᵀ/dt = ∂²J∗/∂t∂x (x∗, t) + ∂²J∗/∂x² f(x∗, u∗, t)                     (23)
Differentiating the Hamilton Jacobi equation (21) with respect to x and using the
relation (22) for a stationary solution yields
−∂²J∗/∂t∂x (x∗, t) = ∂L/∂x + ∂²J∗/∂x² f + pᵀ ∂f/∂x                     (24)
Using equation (24) in equation (23) yields
−ṗ = (∂f/∂x)ᵀ p + (∂L/∂x)ᵀ                                             (25)
establishing that p is indeed the co-state of Table 1. The boundary conditions on p(t)
follow from the boundary conditions on the Hamilton Jacobi Bellman equation.
2.2 Free end time problems
In the instance that the final time tf is free, the transversality conditions are that
pᵀ(tf) = D1φ + D1ψᵀλ
H(tf) = −(D2φ + D2ψᵀλ)                                                 (27)
A special class of problems of especial interest is minimum time problems, where tf is to
be minimized subject to the constraints. This is accounted for by setting the Lagrangian
to be 1 and the terminal state cost φ ≡ 0, so that the Hamiltonian is H(x, u, p, t) =
1 + pᵀf(x, u, t). Note that by differentiating H(x, u, p, t) with respect to time,
we get
dH∗/dt = D1H∗ ẋ + D2H∗ u̇ + D3H∗ ṗ + ∂H∗/∂t                             (28)
Continuing the calculation using the Hamilton Jacobi equation,
dH∗/dt = (∂H∗/∂x + ṗᵀ) f(x, u∗, t) + ∂H∗/∂t = ∂H∗/∂t                    (29)
In particular, if H∗ is not an explicit function of t, it follows that H∗ is constant along
the optimal trajectory. Thus, for minimum time problems for which f(x, u, t) and ψ(x, t)
are not explicitly functions of t, it follows that 0 = H(tf) ≡ H(t).
The material here will borrow from Stephen Boyd’s lectures in Stanford’s EE363 class, which
gives an excellent presentation of this topic [14].
3.1 Continuous-time Linear Quadratic Regulator
The Linear Quadratic Regulator is a classical problem first formulated by Rudolf Kalman in
the 1960’s [15]. The problem involves finding the optimal control policies for a system with
linear dynamics and a quadratic running-cost. In particular, we consider the time-invariant
dynamical system described by
ẋ = Ax + Bu (30)
with state x(t) ∈ Rn and control input u(t) ∈ Rm .
We wish to minimize the performance index
J = ½ x(T)ᵀ QT x(T) + ½ ∫_{t0}^{T} [ x(τ)ᵀ Q x(τ) + u(τ)ᵀ R u(τ) ] dτ              (31)
The Hamiltonian for this problem is

H(x, u, p) = ½ xᵀQx + ½ uᵀRu + pᵀ(Ax + Bu)                             (32)

and the necessary conditions for optimality become
ẋ = Ax + Bu                                                            (33)
x(0) = x0                                                              (34)
ṗ = −Aᵀp − Qx                                                          (35)
0 = Ru + Bᵀp                                                           (36)
p(T) = QT x(T)                                                         (37)
Note that D22H(x, u, p) = R ≻ 0, so the necessary conditions for optimality are also sufficient
for a global minimum. Furthermore, because from Eq. (36) we have that u(t) = −R⁻¹Bᵀp(t),
we can combine the above equations into the following differential equation:
    [ ẋ ]   [  A    −BR⁻¹Bᵀ ] [ x ]
    [   ] = [                ] [   ]                                   (38)
    [ ṗ ]   [ −Q      −Aᵀ    ] [ p ]
Claim: There exists a positive-semi-definite matrix V given by the following matrix differ-
ential equation, known as the Riccati differential equation:
−V̇ = A| V + V A − V BR−1 B | V + Q, V (T ) = QT , (39)
such that p(t) = V(t)x(t). To verify the claim, differentiate p(t) = V(t)x(t), substitute (39)
for V̇ and (36) for u, and use p = Vx:

ṗ = V̇x + Vẋ                                                            (40)
  = −(AᵀV + VA − VBR⁻¹BᵀV + Q)x + V(Ax − BR⁻¹Bᵀp)                      (41)
  = −Qx − AᵀVx + VBR⁻¹BᵀVx − VBR⁻¹BᵀVx                                 (42)
  = −Qx − Aᵀp                                                          (43)

which is exactly the costate equation (35).
Hence, by first integrating the value function V backwards in time from the terminal
condition V(T) = QT, we can substitute p(t) = V(t)x(t) and solve forward in time the
system ẋ = Ax + Bu∗, where u∗(t) = −R⁻¹BᵀV(t)x(t), starting from the initial condition
x(0) = x0. This provides the optimal trajectory x(t), as well as the optimal control u∗(t),
as a linear function of the state.
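This backward–forward procedure is easy to implement numerically. The following sketch is
illustrative only: the system matrices, horizon, and step size are assumed values chosen for
the example (they do not come from the notes), and a simple Euler scheme stands in for a
proper ODE integrator.

```python
import numpy as np

# Illustrative, assumed data (not from the notes): a 2-state, 1-input system.
A = np.array([[0.0, 1.0], [-1.0, 0.5]])
B = np.array([[0.0], [1.0]])
Q, R, QT = np.eye(2), np.array([[1.0]]), np.eye(2)
T, N = 5.0, 5000
dt = T / N

# Integrate the Riccati differential equation (39) backward from V(T) = QT.
V = np.zeros((N + 1, 2, 2))
V[N] = QT
for k in range(N, 0, -1):
    Vk = V[k]
    minus_Vdot = A.T @ Vk + Vk @ A - Vk @ B @ np.linalg.solve(R, B.T) @ Vk + Q
    V[k - 1] = Vk + dt * minus_Vdot      # Euler step backward in time

# Roll the closed-loop system forward with u*(t) = -R^{-1} B^T V(t) x(t).
x0 = np.array([1.0, 0.0])
x, cost = x0.copy(), 0.0
for k in range(N):
    u = -np.linalg.solve(R, B.T @ V[k] @ x)
    cost += 0.5 * (x @ Q @ x + u @ R @ u) * dt
    x = x + dt * (A @ x + B @ u)
cost += 0.5 * x @ QT @ x
# The accumulated cost should be close to 0.5 * x0' V(0) x0 (value-function interpretation).
print(cost, 0.5 * x0 @ V[0] @ x0)
```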
Here R̄ = hR and Q̄ = hQ.
The dynamic programming solution to the discrete-time LQR problem gives the relation
uk = Kk xk where, in the standard backward recursion,

Kk = −(R̄ + B̄ᵀVk+1B̄)⁻¹ B̄ᵀVk+1Ā
Vk = Q̄ + ĀᵀVk+1Ā − ĀᵀVk+1B̄ (R̄ + B̄ᵀVk+1B̄)⁻¹ B̄ᵀVk+1Ā

Here Vk is a discrete-time version of the value function, and we have defined Ā = (I + hA)
and B̄ = hB. Substituting and removing all terms with h² or higher-order terms, we get

Vk = Q̄ + Vk+1 + h AᵀVk+1 + h Vk+1A − h Vk+1BR⁻¹BᵀVk+1

Rearranging, we have

−(1/h)(Vk+1 − Vk) = Q + AᵀVk+1 + Vk+1A − Vk+1BR⁻¹BᵀVk+1                (51)
And in the limit h → 0, we recover the continuous-time Riccati differential equation
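This limit can be checked numerically. The sketch below again uses assumed, illustrative
data (the same matrices as the continuous-time sketch above) and the standard discrete-time
dynamic-programming recursion for Vk; the discrete solution approaches the continuous-time
one as h shrinks.

```python
import numpy as np

# Same illustrative data as the continuous-time sketch above (assumed values).
A = np.array([[0.0, 1.0], [-1.0, 0.5]])
B = np.array([[0.0], [1.0]])
Q, R, QT = np.eye(2), np.array([[1.0]]), np.eye(2)
T = 5.0

def discrete_lqr_V0(h):
    """Backward Riccati recursion for the Euler-discretized problem with step h."""
    Abar, Bbar, Qbar, Rbar = np.eye(2) + h * A, h * B, h * Q, h * R
    V = QT.copy()
    for _ in range(round(T / h)):
        K = -np.linalg.solve(Rbar + Bbar.T @ V @ Bbar, Bbar.T @ V @ Abar)
        V = Qbar + Abar.T @ V @ Abar + Abar.T @ V @ Bbar @ K
    return V

for h in (0.1, 0.01, 0.001):
    # The (0, 0) entry approaches the continuous-time V(0) as h -> 0.
    print(h, discrete_lqr_V0(h)[0, 0])
```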
Consider now time-varying affine dynamics

xk+1 = Ak xk + Bk uk + ck.

We consider minimizing the performance objective given by the time-varying cost

J = ½ [ xNᵀ QT xN + 2 qNᵀ xN
        + Σ_{k=0}^{N−1} ( xkᵀ Qk xk + 2 ukᵀ Sk xk + ukᵀ Rk uk + 2 qkᵀ xk + 2 rkᵀ uk ) ]      (53)
Stacking the states and inputs into a single vector z = (x0, u0, x1, u1, . . . , uN−1, xN), the
objective (53) can be written as ½ zᵀHz + hᵀz, where

        [ Q0  S0ᵀ                   ]         [ q0 ]
        [ S0  R0                    ]         [ r0 ]
        [          Q1  S1ᵀ          ]         [ q1 ]
    H = [          S1  R1           ],   h =  [ r1 ]                   (56)
        [                   ⋱       ]         [ ⋮  ]
        [                       QN  ]         [ qN ]
Similarly, we can express the initial condition constraint as well as the dynamics constraints
in the compact form Gz + g = 0 where
        [  I                            ]         [ −xinit ]
        [ −A0  −B0   I                  ]         [  −c0   ]
    G = [            −A1  −B1   I       ],   g =  [  −c1   ]           (57)
        [                  −A2  −B2   I ]         [  −c2   ]
        [                          ⋱    ]         [   ⋮    ]
Combining the performance objective and constraint terms, we have the following optimiza-
tion problem:
min_z   ½ zᵀHz + hᵀz                                                   (58)
s.t.    Gz + g = 0                                                     (59)
The objective is a quadratic function of z and the constraints are linear. Hence, this is a
standard Equality-Constrained Quadratic Program. The Karush-Kuhn-Tucker conditions of
optimality [16] for this problem are given by
Hz + h + Gᵀλ = 0                                                       (60)
Gz + g = 0                                                             (61)
It turns out that the system above has a particular sparsity structure and can be permuted
such that the matrix on the left-hand side is banded. This allows for very efficient computation
of the solution vectors z and λ.
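A compact numerical sketch of this construction follows. The horizon, dimensions, and
matrices are arbitrary illustrative choices rather than data from the notes, and for brevity
the dense KKT system is solved directly instead of exploiting the banded structure.

```python
import numpy as np

# Illustrative, assumed data (not from the notes): horizon N, dims n, m, LTI blocks.
n, m, N = 2, 1, 4
A = [np.array([[1.0, 0.1], [0.0, 1.0]])] * N
B = [np.array([[0.0], [0.1]])] * N
c = [np.zeros(n)] * N
Q, R, S, QT = np.eye(n), np.eye(m), np.zeros((m, n)), np.eye(n)
x_init = np.array([1.0, 0.0])

nz = N * (n + m) + n                     # z = (x0, u0, x1, u1, ..., xN)

# Cost matrices H, h of eq. (56); the linear terms q_k, r_k are zero here.
H = np.zeros((nz, nz))
for k in range(N):
    i = k * (n + m)
    H[i:i + n, i:i + n] = Q
    H[i:i + n, i + n:i + n + m] = S.T
    H[i + n:i + n + m, i:i + n] = S
    H[i + n:i + n + m, i + n:i + n + m] = R
H[-n:, -n:] = QT
h = np.zeros(nz)

# Constraint matrices G, g of eq. (57): initial condition plus dynamics.
ng = (N + 1) * n
G, g = np.zeros((ng, nz)), np.zeros(ng)
G[:n, :n] = np.eye(n)
g[:n] = -x_init
for k in range(N):
    r, i = (k + 1) * n, k * (n + m)
    G[r:r + n, i:i + n] = -A[k]
    G[r:r + n, i + n:i + n + m] = -B[k]
    G[r:r + n, i + n + m:i + 2 * n + m] = np.eye(n)
    g[r:r + n] = -c[k]

# KKT system (60)-(61); a production solver would exploit the banded structure.
KKT = np.block([[H, G.T], [G, np.zeros((ng, ng))]])
sol = np.linalg.solve(KKT, -np.concatenate([h, g]))
z, lam = sol[:nz], sol[nz:]
print("x1 =", z[n + m:n + m + n])        # first propagated state
```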
Not all types of convex problems have closed-form solutions like this one. In fact, most do
not. However, there exist many efficient solvers for convex problems which are guaranteed
to find globally optimal solutions.
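For instance, reusing nz, H, h, G, g, and the KKT solution z from the sketch above, a
generic modeling tool such as cvxpy (shown here purely as an illustration, not as part of
the original notes) recovers the same solution:

```python
import numpy as np
import cvxpy as cp

# Reuses nz, H, h, G, g, and the KKT solution z from the previous sketch.
zv = cp.Variable(nz)
prob = cp.Problem(cp.Minimize(0.5 * cp.quad_form(zv, H) + h @ zv),
                  [G @ zv + g == 0])
prob.solve()
print("difference from KKT solution:", np.linalg.norm(zv.value - z))  # should be small
```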
Consider now a two-person zero-sum dynamic game in which the two players, through their
inputs u and d, influence the dynamics ẋ = f(x, u, d, t), with one player trying to minimize
and the other to maximize the same cost function, taken to be of the form

J = φ(x(tf), tf) + ∫_{t0}^{tf} L(x, u, d, t) dt                        (64)
We will assume that player 1 (u) is trying to minimize J and player 2 (d) is trying to
maximize J. For simplicity, we have omitted the final state constraint and assumed the
end time tf to be fixed; we will relax these assumptions when we study pursuit evasion
games. The game is said to have perfect
information if both players have access to the full state x(t). The solution of two person zero
sum games proceeds very much along the lines of the optimal control problem by setting up
the Hamiltonian
H(x, u, d, p, t) = L(x, u, d, t) + pT f (x, u, d, t) (65)
Rather than simply minimizing H(x, u, d, p, t) the game is said to have a saddle point solution
if the following analog of the saddle point condition for two person zero sum static games
holds:
min_u max_d H(x, u, d, p, t) = max_d min_u H(x, u, d, p, t)            (66)
If the minmax is equal to the maxmin, the resulting optimal Hamiltonian is denoted H ∗ (x, p, t)
and the optimal inputs u∗ , d∗ are determined to be respectively,
u∗(t) = argmin_u max_d H(x, u, d, p, t)                                (67)
and
d∗(t) = argmax_d min_u H(x, u, d, p, t)                                (68)
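As an illustration (an example chosen here, not one worked in the notes), consider linear
dynamics ẋ = Ax + Bu + Dd with L = ½(xᵀQx + uᵀRu − γ²dᵀd), R ≻ 0 and γ > 0. The
Hamiltonian H = L + pᵀ(Ax + Bu + Dd) is then convex in u and concave in d, so the saddle
condition (66) holds, with u∗ = −R⁻¹Bᵀp and d∗ = γ⁻²Dᵀp obtained from ∂H/∂u = 0 and
∂H/∂d = 0.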
The equations for the state and costate and the transversality conditions are given as before
by
ẋ = (∂H∗/∂p)ᵀ(x, p)
ṗ = −(∂H∗/∂x)ᵀ(x, p)                                                   (69)
with boundary conditions x(t0) = x0 and pᵀ(tf) = D1φ(x(tf)); these are again the familiar
Hamilton Jacobi equations. As before, one can introduce the optimal cost to go J∗(x(t), t)
and we have the following analog of Theorem (1):
If the optimal cost to go J∗(x, t) is a smooth function of x, t, it satisfies the analog of the
Hamilton Jacobi Bellman equation (19), ∂J∗/∂t (x, t) = −H∗(x, ∂J∗/∂x (x, t), t), now with
H∗ the saddle point Hamiltonian, and with boundary conditions given by J∗(x, tf) = φ(x)
for all x.
Remarks
1. We have dealt with saddle solutions for unconstrained input signals u, d thus far in the
development. If the inputs are constrained to lie in sets U, D respectively, the saddle
solutions can be guaranteed to exist if, for example, U and D are convex and compact and
H is convex in u and concave in d.
2. The sort of remarks that were made about free endpoint optimal control problems can
also be made about games.
3. In our problem formulation for games, we did not include explicit terminal state con-
straints of the form ψ(x(tf ), tf ) = 0. These can be easily included, and we will study
this situation in greater detail under the heading of pursuit evasion games.
4. The key point in the theory of dynamical games is that the Legendre transformation
of the Lagrangian cost function into the Hamiltonian function converts the solution of
the “dynamic” game into a “static” game, where one needs to find a saddle point of
the Hamiltonian function H(x, u, d, p, t). This is very much in the spirit of the calculus
of variations and optimal control.
5.1 Non-cooperative Nash solutions
When there are N players each able to influence the process by controls ui ∈ Rni , i =
1, . . . , N , modeled by
ẋ = f (x, u1 , . . . , uN , t) (72)
and each cost functional (to be minimized) is of the form
Ji(u1(·), . . . , uN(·)) = φi(x(tf), tf) + ∫_{t0}^{tf} Li(x, u1, . . . , uN, t) dt          (73)
different solution concepts need to be invoked. The simplest non-cooperative solution strat-
egy is a so-called non-cooperative Nash equilibrium. A set of controls u∗i, i = 1, . . . , N is
said to be a Nash strategy if, for each player, modifying that strategy while the others play
their Nash strategies results in an increase in his cost, that is, for i = 1, . . . , N,

Ji(u∗1, . . . , u∗i, . . . , u∗N) ≤ Ji(u∗1, . . . , u∗i−1, ui, u∗i+1, . . . , u∗N)   for all ui(·)     (74)
It is important to note that Nash equilibria may not be unique. It is also easy to see that
for 2 person zero sum games, a Nash equilibrium is a saddle solution.
As in the previous section on saddle solutions, we can write Hamilton Jacobi equations for
Nash equilibria by defining Hamiltonians Hi(x, u1, . . . , uN, pi, t) according to

Hi(x, u1, . . . , uN, pi, t) = Li(x, u1, . . . , uN, t) + piᵀ f(x, u1, . . . , uN, t)
The conditions for a Nash equilibrium of equation (74) are that there exist u∗i(x, p, t) such
that

u∗i = argmin_{ui} Hi(x, u∗1, . . . , u∗i−1, ui, u∗i+1, . . . , u∗N, pi, t),    i = 1, . . . , N.
Then, we have N sets of Hamilton Jacobi equations for the N players, with Hi∗ =
Hi∗(x, u∗1, . . . , u∗N, pi, t). Note that we have changed the costate variables to pi to account
for the different Hamiltonians and boundary conditions.
ẋ = (∂Hi∗/∂pi)ᵀ
ṗi = −(∂Hi∗/∂x)ᵀ                                                       (77)
Consider finally a hierarchical (Stackelberg) formulation, where player 1 is the leader and
player 2 the follower. Once player 1 announces a strategy uo1(·), if player 2 is rational he
chooses his strategy so as to minimize his cost J2 with the dynamics
ẋ = f (x, uo1 , u2 , t) (78)
and

J2(u2) = φ2(x(tf), tf) + ∫_{t0}^{tf} L2(x, uo1(t), u2, t) dt
Thus, u∗2 (uo1 ) is chosen to minimize H2 (x, uo1 , u2 , p2 , t), where p2 satisfies the differential equa-
tion
ṗ2 = −(∂H2/∂x)ᵀ(x, uo1, u2(uo1), p2, t),      p2(tf) = D1ᵀφ2(x(tf), tf)
In turn, the leader chooses his strategy to be that u∗1 which minimizes J1, subject to the
assumption that player 2 will rationally play u∗2(u∗1). Thus, the system of constraints subject
to which he minimizes J1 is

ẋ = f(x, u1, u2, t),                        x(t0) = x0
ṗ2 = −(∂H2/∂x)ᵀ(x, u1, u2(u1), p2, t),      p2(tf) = D1ᵀφ2(x(tf), tf)                  (79)
0 = D3H2(x, u1, u2, p2, t)
The last equation in (79) is the stationarity condition for minimizing H2. The optimization
problem of the system in (79) is not a standard optimal control problem in R2n because
there is an additional equality to be satisfied. Thus, Lagrange multipliers (co-states) taking
values in R2n+n2 for t ∈ [t0, tf] are needed. We will omit the details in these notes.
References
[1] S. S. Sastry, Lectures in Optimal Control and Dynamic Games, Notes for the course
EECS290A, Advanced Topics in Control Theory, University of California, Berkeley,
1996.
[5] A. E. Bryson and Y.-C. Ho, Applied Optimal Control, Blaisdell Publishing Company,
Waltham, 1969.

[6] L. Pontryagin, V. Boltyanskii, R. Gamkrelidze, and E. Mishchenko, The Mathematical
Theory of Optimal Processes, Wiley, New York, 1962.

[7] L. C. Young, Optimal Control Theory, Cambridge University Press, 2nd edition, 1980.

[10] W. Fleming and R. Rishel, Deterministic and Stochastic Optimal Control, Springer-Verlag,
1975.

[11] V. Arnold, Mathematical Methods of Classical Mechanics, Springer-Verlag, 2nd edition,
1989.

[12] J. von Neumann and O. Morgenstern, Theory of Games and Economic Behavior, Princeton
University Press, 1947.

[13] T. Basar and G. Olsder, Dynamic Noncooperative Games, Academic Press, 2nd edition,
1995.

[14] S. Boyd, EE363: Linear Dynamical Systems, Stanford course website (2021):
https://web.stanford.edu/class/ee363/.

[15] R. E. Kalman, Contributions to the theory of optimal control, Bol. Soc. Mat. Mexicana,
5(2), pp. 102–119, 1960.

[16] S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, 2004.