Pontryagin's Maximum Principle in a Nutshell

Moritz Diehl
Overview

I Derivation 1: Hamilton-Jacobi-Bellman equation


I Derivation 2: Calculus of Variations
I Properties of Euler-Lagrange Equations
I Boundary Value Problem (BVP) Formulation
I Numerical Solution of BVP
I Discrete Time Pontryagin Principle
Continuous Time Optimal Control

Regard simplified optimal control problem:


[Figure: state trajectory x(t) from the initial value x0 at t = 0 to the terminal cost term E(x(T)) at t = T; below it, the control trajectory u(t).]

  minimize     ∫₀ᵀ L(x(t), u(t)) dt + E(x(T))
  x(·), u(·)
subject to
x(0) − x0 = 0, (fixed initial value)
ẋ(t) − f (x(t), u(t)) = 0, t ∈ [0, T ]. (ODE model)
Hamilton-Jacobi-Bellman equation

Recall:
I Hamilton-Jacobi-Bellman Equation:

  −∂J/∂t (x, t) = min_u H(x, ∇x J(x, t), u)

I with Hamiltonian function

  H(x, λ, u) := L(x, u) + λᵀ f(x, u)

I and terminal condition J(x, T) = E(x).
Pontryagin’s Maximum Principle

OBSERVATION: In HJB, the optimal controls

  u*(x, t) = arg min_u H(x, ∇x J(x, t), u)

depend only on the derivative ∇x J(x, t), not on J itself!


IDEA: Introduce adjoint variables

  λ(t) := ∂J/∂x (x(t), t)ᵀ ∈ R^nx

and get the controls from

Pontryagin's Maximum Principle (historical name)

  u*(x, λ) = arg min_u H(x, λ, u)


QUESTION: How to obtain λ(t)?


(Differentiation Lemma)

We want to differentiate an optimal value that depends on the
parameters y = (x, λ). How can we do that most easily?

LEMMA: If H*(y) = min_u H(y, u), then

  ∂H*/∂y (y) = ∂H/∂y (y, u*)   with u* = arg min_u H(y, u).

PROOF:

  ∂H*/∂y (y) = ∂H/∂y (y, u*) + ∂H/∂u (y, u*) ∂u*/∂y (y),

where the second term vanishes due to the first order optimality
condition ∂H/∂u (y, u*) = 0.

(The lemma can be extended to constrained problems, using partial
derivatives of the Lagrangian.)
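A quick numerical sanity check of this lemma (a minimal Python sketch with a hypothetical scalar function, not part of the slides): the total derivative of the optimal value H* equals the partial derivative of H with the minimizer held fixed.

```python
# Check the differentiation lemma for H(y, u) = (u - y)^2 + y^2, whose minimizer
# is u*(y) = y and whose optimal value is H*(y) = y^2 (illustrative example).
import numpy as np

def H(y, u):
    return (u - y)**2 + y**2

def u_star(y):            # arg min_u H(y, u), here known in closed form
    return y

def H_star(y):            # optimal value H*(y) = min_u H(y, u)
    return H(y, u_star(y))

y, eps = 1.3, 1e-6
dHstar_dy = (H_star(y + eps) - H_star(y - eps)) / (2 * eps)                  # total derivative
dH_dy_frozen = (H(y + eps, u_star(y)) - H(y - eps, u_star(y))) / (2 * eps)   # u held fixed
print(dHstar_dy, dH_dy_frozen)    # both approximately 2*y = 2.6
```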
Adjoint Differential Equation

I Differentiate the HJB equation

  −∂J/∂t (x, t) = min_u H(x, ∇x J(x, t), u) = H(x, ∇x J(x, t), u*)

with respect to x and obtain:

  −∂/∂t ∇x J(x, t) = ∇x H(x, λ, u*) + ∇²x J(x, t) ∂H/∂λ (x, λ, u*)ᵀ,   where λ = ∇x J(x, t) and ∂H/∂λᵀ = f(x, u*),

or equivalently, along a trajectory with ẋ = f(x, u*),

  ∂/∂t ∇x J(x, t) + ∇²x J(x, t) f(x, u*) = d/dt ∇x J(x(t), t) = λ̇(t) = −∇x H(x, λ, u*).
Terminal Condition

I Likewise, differentiate J(x, T) = E(x) with respect to x

and obtain the terminal condition

  λ(T) = ∇x E(x(T)).


Necessary Optimality Conditions

Summarize optimality conditions as boundary value problem:


  x(0) = x0,                                           (initial value)
  ẋ(t) = f(x(t), u*(t)),                 t ∈ [0, T],   (ODE model)
  λ̇(t) = −∂H/∂x (x(t), λ(t), u*(t))ᵀ,    t ∈ [0, T],   (adjoint equations)
  u*(t) = arg min_u H(x(t), λ(t), u),    t ∈ [0, T],   (minimum principle)
  λ(T) = ∇x E(x(T)).                                   (adjoint final value)
Overview

I Derivation 1: Hamilton-Jacobi-Bellman equation


I Derivation 2: Calculus of Variations
I Properties of Euler-Lagrange Equations
I Boundary Value Problem (BVP) Formulation
I Numerical Solution of BVP
Alternative Derivation

Regard the infinite-dimensional optimization problem:


  minimize     ∫₀ᵀ L(x(t), u(t)) dt + E(x(T))
  x(·), u(·)
subject to
x(0) − x0 = 0, (fixed initial value)
ẋ(t) − f (x(t), u(t)) = 0, t ∈ [0, T ]. (ODE model)

Introduce Lagrange multipliers λ and the "Lagrangian functional"


  L(x(·), u(·), λ(·)) = ∫₀ᵀ [ L(x, u) − λᵀ(ẋ − f(x, u)) ] dt + E(x(T)).
Infinitesimal Variations

Abbreviate using the Hamiltonian H(x, λ, u):


  L(x(·), u(·), λ(·)) = ∫₀ᵀ [ H(x, λ, u) − λᵀ ẋ ] dt + E(x(T))

Regard an infinitesimal variation of L(x(·), u(·), λ(·)) with
perturbation δx(t):

  δL = ∫₀ᵀ [ ∂H/∂x δx − λᵀ δẋ ] dt + ∂E/∂x δx(T)

which by partial integration yields:


  δL = ∫₀ᵀ [ (∇x H + λ̇)ᵀ δx − d/dt (λᵀ δx) ] dt + ∂E/∂x δx(T)
Infinitesimal Variations (contd.)

Using the fact that δx(0) = 0 and requiring δL = 0 yields


  0 = ∫₀ᵀ (∇x H + λ̇)ᵀ δx dt + (∇x E − λ(T))ᵀ δx(T)

which implies, for arbitrary variations,

  λ̇ = −∇x H(x(t), λ(t), u*(t))

and

  λ(T) = ∇x E(x(T))
Thus, calculus of variations leads to the same adjoint differential
equations as differentiation of HJB!
Overview

I Derivation 1: Hamilton-Jacobi-Bellman equation


I Derivation 2: Calculus of Variations
I Properties of Euler-Lagrange Equations
I Boundary Value Problem (BVP) Formulation
I Numerical Solution of BVP
How to obtain explicit expression for controls?

I In the simplest case,

  u*(t) = arg min_u H(x(t), λ(t), u)

is defined by

  ∂H/∂u (x(t), λ(t), u*(t)) = 0.
I In the presence of path constraints, the expression for u*(t) changes
whenever the set of active constraints changes. This leads to state
dependent switches.
I If the minimum of the Hamiltonian is locally not unique, "singular arcs"
occur. Their treatment needs higher order derivatives of H.
Nice Case: Example

Regard L(x, u) = ½ (xᵀQx + uᵀRu) with invertible R and
f(x, u) = Ax + Bu. Then

  H(x, λ, u) = ½ (xᵀQx + uᵀRu) + λᵀ(Ax + Bu)

and

  ∂H/∂u = uᵀR + λᵀB.

Thus, ∂H/∂u = 0 implies that

  u* = −R⁻¹Bᵀλ.
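A small numerical check of this formula (an illustrative Python sketch with arbitrary example matrices, not part of the slides): the closed-form u* indeed makes the gradient of the Hamiltonian with respect to u vanish.

```python
# Verify that u* = -R^{-1} B^T lambda satisfies dH/du = u^T R + lambda^T B = 0
# for the LQ Hamiltonian above (random illustrative data).
import numpy as np

rng = np.random.default_rng(0)
nx, nu = 3, 2
A = rng.standard_normal((nx, nx))
B = rng.standard_normal((nx, nu))
Q = np.eye(nx)
R = np.diag([1.0, 2.0])                      # invertible R
x, lam = rng.standard_normal(nx), rng.standard_normal(nx)

u_star = -np.linalg.solve(R, B.T @ lam)      # u* = -R^{-1} B^T lambda
grad_H_u = R @ u_star + B.T @ lam            # (dH/du)^T evaluated at u*
print(np.linalg.norm(grad_H_u))              # ~1e-16
```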
Singular Arcs

But what if the relation

  ∂H/∂u (x(t), λ(t), u*) = 0

is not invertible with respect to u*?
This occurs, e.g., if L(x, u) is independent of u and f(x, u) is linear
in u.
Singular arcs are due to the fact that only the integral of controls
influences the states, and “singular” perturbations (that go up and
down quickly) do not matter in the objective.
Remedy for Singular Arcs

“What is zero should also have zero derivative”.


Therefore, we differentiate totally with respect to time:

  d/dt ∂H/∂u (x(t), λ(t), u*) = 0,

i.e.

  ∂/∂x (∂H/∂u) ẋ + ∂/∂λ (∂H/∂u) λ̇ = 0,   with ẋ = f(x, u) and λ̇ = −∂H/∂xᵀ.

If this still does not allow us to find u* explicitly, differentiate even
further . . .
Singular Arc: Example

Regard L(x, u) = ½ xᵀQx and f(x, u) = Ax + Bu. Then

  H(x, λ, u) = ½ xᵀQx + λᵀ(Ax + Bu)

and

  ∂H/∂u = λᵀB.

This is not invertible with respect to u*!
Singular Arc: Example (contd.)

Differentiating with respect to time yields:


  d/dt ∂H/∂u = λ̇ᵀB = −∂H/∂x B = −(xᵀQ + λᵀA)B.

Differentiating once more yields:

  d²/dt² ∂H/∂u = −ẋᵀQB − λ̇ᵀAB = −(Ax + Bu)ᵀQB + (xᵀQ + λᵀA)AB.

Setting this to zero (provided BᵀQB is invertible) finally yields the feedback law

  u* = (BᵀQB)⁻¹ Bᵀ ( (AᵀQ − QA)x + AᵀAᵀλ ).

This is only applicable on singular arcs.
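A quick numerical check of this feedback law (an illustrative Python sketch with random example matrices, not part of the slides): substituting u* back in should make the second time derivative of ∂H/∂u vanish.

```python
# Verify that the singular-arc feedback makes
#   d^2/dt^2 (dH/du) = -(A x + B u)^T Q B + (x^T Q + lambda^T A) A B
# equal to zero (Q symmetric; random illustrative data).
import numpy as np

rng = np.random.default_rng(1)
nx, nu = 4, 2
A = rng.standard_normal((nx, nx))
B = rng.standard_normal((nx, nu))
Q = np.eye(nx)
x, lam = rng.standard_normal(nx), rng.standard_normal(nx)

rhs = B.T @ ((A.T @ Q - Q @ A) @ x + A.T @ A.T @ lam)
u_sing = np.linalg.solve(B.T @ Q @ B, rhs)               # singular-arc feedback law

residual = -(A @ x + B @ u_sing) @ Q @ B + (x @ Q + lam @ A) @ A @ B
print(np.linalg.norm(residual))                          # ~1e-14
```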


Euler-Lagrange Differential Equations

Note that

  ∂H/∂λ (x, λ, u)ᵀ = f(x, u).

Thus, the combined equations

  ẋ = ∂H/∂λ (x, λ, u*)ᵀ,
  λ̇ = −∂H/∂x (x, λ, u*)ᵀ

form a Hamiltonian system. Volume in (x, λ) space is preserved. But this
also means that if the dynamics of x is very stable, i.e. contracting,
then the dynamics of λ must be expanding.
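This stable/unstable pairing can be seen concretely in the LQ case (a minimal Python sketch with illustrative random matrices, not part of the slides): eliminating u* = −R⁻¹Bᵀλ gives a linear system whose matrix has zero trace and eigenvalues in ± pairs.

```python
# For the LQ "nice case", the Euler-Lagrange system is d/dt (x, lambda) = M (x, lambda) with
#   M = [[ A, -B R^{-1} B^T ],
#        [-Q, -A^T          ]].
# M has zero trace (the flow preserves volume) and its spectrum is symmetric:
# every contracting direction is matched by an expanding one.
import numpy as np

rng = np.random.default_rng(2)
nx, nu = 3, 1
A = rng.standard_normal((nx, nx))
B = rng.standard_normal((nx, nu))
Q, R = np.eye(nx), np.eye(nu)

M = np.block([[A, -B @ np.linalg.solve(R, B.T)],
              [-Q, -A.T]])

print(np.trace(M))                             # ~0: divergence-free flow in (x, lambda)
print(np.sort_complex(np.linalg.eigvals(M)))   # eigenvalues come in +/- pairs
```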
Overview

I Derivation 1: Hamilton-Jacobi-Bellman equation


I Derivation 2: Calculus of Variations
I Properties of Euler-Lagrange Equations
I Boundary Value Problem (BVP) Formulation
I Numerical Solution of BVP
Boundary Value Problem (BVP)

Can summarize the BVP


  x(0) = x0,                                           (initial value)
  ẋ(t) = f(x(t), u*(t)),                 t ∈ [0, T],   (ODE model)
  −λ̇(t) = ∂H/∂x (x(t), λ(t), u*(t))ᵀ,    t ∈ [0, T],   (adjoint equations)
  u*(t) = arg min_u H(x(t), λ(t), u),    t ∈ [0, T],   (minimum principle)
  λ(T) = ∂E/∂x (x(T))ᵀ.                                (adjoint final value)

by using y = (x, λ) and substituting u* explicitly as

  0 = r(y(0), y(T)),                   (boundary conditions)
  ẏ(t) = f̃(y(t)),       t ∈ [0, T].    (ODE model)
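For the LQ example above, the reduced system ẏ = f̃(y) and the boundary residual r can be written out explicitly (a minimal Python sketch; the double-integrator matrices and the terminal cost E(x) = ½ xᵀ P_T x are illustrative assumptions, not from the slides).

```python
# Assemble ydot = f_tilde(y) and r(y0, yT) for y = (x, lambda), with
# u* = -R^{-1} B^T lambda and terminal condition lambda(T) = P_T x(T).
import numpy as np

nx, nu = 2, 1
A = np.array([[0.0, 1.0], [0.0, 0.0]])      # double integrator (example choice)
B = np.array([[0.0], [1.0]])
Q, R = np.eye(nx), np.eye(nu)
P_T = 10.0 * np.eye(nx)                     # terminal cost weight (assumption)
x0 = np.array([1.0, 0.0])

def f_tilde(y):
    """Combined state/adjoint dynamics (f(x, u*), -grad_x H)."""
    x, lam = y[:nx], y[nx:]
    u = -np.linalg.solve(R, B.T @ lam)      # minimum principle, explicit
    return np.concatenate([A @ x + B @ u, -(Q @ x + A.T @ lam)])

def r(y0, yT):
    """Boundary residual: x(0) = x0 and lambda(T) = P_T x(T)."""
    return np.concatenate([y0[:nx] - x0, yT[nx:] - P_T @ yT[:nx]])
```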
BVP analysis

The BVP
  0 = r(y(0), y(T)),                   (boundary conditions)
  ẏ(t) = f̃(y(t)),       t ∈ [0, T],    (ODE model)

has 2nx differential equations ẏ = f̃ and 2nx boundary conditions r.
It is therefore (usually) well-defined.
But how to solve a BVP?
Overview

I Derivation 1: Hamilton-Jacobi-Bellman equation


I Derivation 2: Calculus of Variations
I Properties of Euler-Lagrange Equations
I Boundary Value Problem (BVP) Formulation
I Numerical Solution of BVP
I single shooting
I collocation
Single Shooting

Guess initial value for y0 . Use numerical integration to obtain


trajectory as function y (t; y0 ) of y0 .

[Figure: trajectory y(t; y0) starting at the guessed initial value y0 and ending at the terminal value y(T; y0).]

Obtain in particular terminal value y (T ; y0 ).


Single Shooting (contd.)

The only remaining equation is

  F(y0) := r(y0, y(T; y0)) = 0,

which might or might not be satisfied for the guess y0.


Fortunately, r has as many components as y0, so we can apply
Newton's method for root finding of F(y0) = 0, which iterates

  y0^(k+1) = y0^k − (∂F/∂y0 (y0^k))⁻¹ F(y0^k).

Attention: to evaluate ∂F/∂y0 (y0^k) we have to compute ODE
sensitivities.
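A minimal single-shooting sketch for the LQ boundary value problem above (Python, not from the slides; the matrices repeat the illustrative double-integrator example, and finite differences stand in for proper ODE sensitivities).

```python
# Single shooting with Newton's method: guess y0 = (x0, lambda0), integrate
# ydot = f_tilde(y), and drive F(y0) = r(y0, y(T; y0)) to zero.
import numpy as np
from scipy.integrate import solve_ivp

nx, T = 2, 1.0
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q, R, P_T = np.eye(nx), np.eye(1), 10.0 * np.eye(nx)
x0 = np.array([1.0, 0.0])

def f_tilde(t, y):
    x, lam = y[:nx], y[nx:]
    u = -np.linalg.solve(R, B.T @ lam)
    return np.concatenate([A @ x + B @ u, -(Q @ x + A.T @ lam)])

def F(y0):
    yT = solve_ivp(f_tilde, (0.0, T), y0, rtol=1e-10, atol=1e-10).y[:, -1]
    return np.concatenate([y0[:nx] - x0, yT[nx:] - P_T @ yT[:nx]])   # r(y0, y(T; y0))

y0 = np.concatenate([x0, np.zeros(nx)])          # initial guess: zero adjoints
for k in range(10):
    Fk = F(y0)
    if np.linalg.norm(Fk) < 1e-8:
        break
    eps = 1e-6                                   # finite-difference sensitivities dF/dy0
    J = np.column_stack([(F(y0 + eps * e) - Fk) / eps for e in np.eye(2 * nx)])
    y0 = y0 - np.linalg.solve(J, Fk)             # Newton step

print("lambda(0) =", y0[nx:], " ||F|| =", np.linalg.norm(F(y0)))
```

Because this example is linear-quadratic, Newton converges essentially in one step; for nonlinear dynamics the expanding adjoint dynamics can make single shooting ill-conditioned, which motivates collocation and multiple shooting.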
Collocation (Sketch)

I Discretize states on grid with node values si ≈ y (ti ).


I Replace the infinite-dimensional ODE constraint

  0 = ẏ(t) − f̃(y(t)),   t ∈ [0, T]

by finitely many equality constraints

  ci(si, si+1) = 0,   i = 0, . . . , N − 1,

e.g. the implicit midpoint rule

  ci(si, si+1) := (si+1 − si)/(ti+1 − ti) − f̃((si + si+1)/2).
Nonlinear Equation in Collocation

After discretization, obtain large scale, but sparse nonlinear


equation system:

r (s0 , sN ) = 0, (boundary conditions)


ci (si , si+1 ) = 0, i = 0, . . . , N − 1, (discretized ODE model)

Solve again with Newton’s method. Exploit sparsity in linear


system setup and solution.
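A minimal collocation sketch for the same LQ boundary value problem (Python, not from the slides; it uses the implicit midpoint residuals from the previous slide and a generic dense root finder, so the sparsity is not exploited as it would be in a serious implementation).

```python
# Midpoint collocation: unknowns are the node values s_0, ..., s_N of y = (x, lambda);
# equations are the boundary conditions and the collocation residuals c_i(s_i, s_{i+1}).
import numpy as np
from scipy.optimize import fsolve

nx, N, T = 2, 20, 1.0
t = np.linspace(0.0, T, N + 1)
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q, R, P_T = np.eye(nx), np.eye(1), 10.0 * np.eye(nx)
x0 = np.array([1.0, 0.0])

def f_tilde(y):
    x, lam = y[:nx], y[nx:]
    u = -np.linalg.solve(R, B.T @ lam)
    return np.concatenate([A @ x + B @ u, -(Q @ x + A.T @ lam)])

def residual(w):
    s = w.reshape(N + 1, 2 * nx)                 # node values s_i ~ y(t_i)
    res = [s[0, :nx] - x0]                       # boundary condition x(0) = x0
    for i in range(N):                           # c_i(s_i, s_{i+1}) = 0, midpoint rule
        res.append((s[i + 1] - s[i]) / (t[i + 1] - t[i]) - f_tilde((s[i] + s[i + 1]) / 2))
    res.append(s[N, nx:] - P_T @ s[N, :nx])      # boundary condition lambda(T) = P_T x(T)
    return np.concatenate(res)

w = fsolve(residual, np.zeros((N + 1) * 2 * nx)) # square system: (N+1)*2*nx unknowns and equations
s = w.reshape(N + 1, 2 * nx)
print("lambda(0) from collocation:", s[0, nx:])
```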
Discrete Time Optimal Control Problem

  minimize     Σ_{i=0}^{N−1} li(si, qi) + E(sN)
    s, q

subject to
  s0 − x0 = 0,                                     (initial value)
  si+1 − fi(si, qi) = 0,   i = 0, . . . , N − 1,   (discrete system)
  hi(si, qi) ≥ 0,          i = 0, . . . , N,       (path constraints)
  r(sN) ≥ 0.                                       (terminal constraints)

This NLP can arise also from a direct multiple shooting parameterization of a
continuous-time optimal control problem. It can be solved by SQP or by the
constrained Gauss-Newton method.
Optimality of Discretized Optimal Control

For simplicity, drop all inequalities, regard only:


  minimize     Σ_{i=0}^{N−1} li(si, qi) + E(sN)
    s, q

subject to
  s0 − x0 = 0,                                     (initial value)
  si+1 − fi(si, qi) = 0,   i = 0, . . . , N − 1.   (discrete system)

What are KKT optimality conditions of this discretized optimal


control problem?
Discrete Optimality Conditions 1

Procedure:
I Introduce multiplier vectors λ0 , . . . , λN for all dynamic state
constraints.
I Formulate the Lagrangian

  L(s, q, λ) = E(sN) + (s0 − x0)ᵀ λ0
             + Σ_{i=0}^{N−1} [ li(si, qi) − (si+1 − fi(si, qi))ᵀ λi+1 ]

I Compute
∇si L and ∇qi L,
which must be zero for optimal solution.
Discrete Optimality Conditions 2

Obtain
1. ∇si L = −λi + ∇si li(si, qi) + ∇si fi(si, qi) λi+1 = 0   (i = 0, . . . , N − 1)
2. ∇sN L = −λN + ∇sN E(sN) = 0
3. ∇qi L = ∇qi li(si, qi) + ∇qi fi(si, qi) λi+1 = 0   (i = 0, . . . , N − 1)
4. ∇λ0 L = s0 − x0 = 0
5. ∇λi+1 L = si+1 − fi(si, qi) = 0   (i = 0, . . . , N − 1)
These conditions can be simplified by introducing the discrete time
Hamiltonian

  Hi(si, qi, λi+1) = li(si, qi) + fi(si, qi)ᵀ λi+1

as follows. . .
Discrete Pontryagin Principle

The KKT conditions are now equivalent to:


1. λi = ∇si Hi (si , qi , λi+1 ) (i = 0, . . . , N − 1) (adjoint equation)
2. λN = ∇sN E (sN ) (terminal condition on adjoints)
3. ∇qi Hi (si , qi , λi+1 ) = 0 (i = 0, . . . , N − 1) (minimum principle)
4. s0 = x0 (initial condition)
5. si+1 = fi (si , qi ) (i = 0, . . . , N − 1) (system dynamics)
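These conditions can be evaluated by a forward sweep for the states and a backward sweep for the adjoints. A minimal Python sketch (illustrative linear-quadratic data, not from the slides) uses this recursion to compute the gradient of the cost with respect to the controls; at an optimum, condition 3 says these gradients vanish.

```python
# Forward/backward sweeps of the discrete Pontryagin conditions for
# stage cost l_i = 1/2 (s^T Q s + q^T R q), dynamics s_{i+1} = Ad s_i + Bd q_i,
# and terminal cost E(s_N) = 1/2 s_N^T P_T s_N (illustrative data).
import numpy as np

N, nx, nu = 20, 2, 1
Ad = np.array([[1.0, 0.1], [0.0, 1.0]])
Bd = np.array([[0.005], [0.1]])
Q, R, P_T = np.eye(nx), np.eye(nu), 10.0 * np.eye(nx)
x0 = np.array([1.0, 0.0])
q = [np.zeros(nu) for _ in range(N)]          # some fixed (non-optimal) control guess

# forward sweep: system dynamics (conditions 4 and 5)
s = [x0]
for i in range(N):
    s.append(Ad @ s[i] + Bd @ q[i])

# backward sweep: adjoint recursion (conditions 1 and 2)
lam = [None] * (N + 1)
lam[N] = P_T @ s[N]                            # lambda_N = grad E(s_N)
for i in reversed(range(N)):
    lam[i] = Q @ s[i] + Ad.T @ lam[i + 1]      # lambda_i = grad_{s_i} H_i

# gradient of the cost w.r.t. each control (condition 3; zero only at the optimum)
grad_q = [R @ q[i] + Bd.T @ lam[i + 1] for i in range(N)]
print("norm of control gradient:", np.linalg.norm(np.concatenate(grad_q)))
```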
Summary

I Pontryagin’s Maximum Principle can be derived by in two


ways (HJB/Calc. of Variations)
I Controls must be explicitly derived. On singular arcs, need
higher order derivatives.
I Boundary Value Problem is well posed but double ODE is
usually unstable. Not easy to simulate forward.
I Solve BVP with single shooting, collocation, or multiple
shooting.
Literature

I A. E. Bryson and Y. C. Ho: Applied Optimal Control, Hemisphere/Wiley, 1975.
