Optimal Control
Topics :
1. Performance Indices
2. Calculus of Variations
3. Pontryagin's Principle
4. Linear Regulators with Quadratic Costs
Copyright © Claudiu C. Remsing, 2006.
All rights reserved.
This section deals with the problem of compelling a system to behave in some "best possible" way. Of course, the precise control strategy will depend upon the criterion used to decide what is meant by "best", and we first discuss some choices for measures of system performance. This is followed by a description of some mathematical techniques for determining optimal control policies, including the special case of linear systems with quadratic performance index, when a complete analytical solution is possible.
ẋ = F(t, x, u),  x(t₀) = x₀ ∈ Rᵐ.  (5.1)

Here x(t) = [x₁(t) x₂(t) . . . xₘ(t)]ᵀ is the state vector, u(t) = [u₁(t) u₂(t) . . . u_ℓ(t)]ᵀ is the control vector, and F is a vector-valued mapping having components F₁, F₂, . . . , Fₘ.
Note : We shall assume that the Fi are continuous and satisfy standard condi-
tions, such as having continuous first order partial derivatives (so that the solution
exists and is unique for the given initial condition). We say that F is continuously
differentiable (or of class C 1 ).
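It may help to see the setup (5.1) concretely. The following sketch (the dynamics F and the control signal u are assumed purely for illustration, not taken from the text) integrates a two-state system under a given control with scipy:

```python
import numpy as np
from scipy.integrate import solve_ivp

# A concrete instance of (5.1): m = 2 states, one control input.
# Both the dynamics F and the control u below are illustrative assumptions.
def u(t):
    return np.sin(t)                      # a fixed control signal u(t)

def F(t, x):
    # C^1 right-hand side F(t, x, u(t)) of the state equation.
    return [x[1], -x[0] + u(t)]

sol = solve_ivp(F, (0.0, 10.0), [1.0, 0.0])  # x(t0) = x0 = (1, 0)
print(sol.y[:, -1])                          # state at the final time
```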
For a minimum-time problem, a suitable performance index to be minimized is

J := t₁ − t₀ = ∫_{t₀}^{t₁} dt,

where t₁ is the first instant of time at which the desired state is reached.
The desired final state is now to be attained with minimum total expenditure
of control effort. Suitable performance indices to be minimized are
J := ∫_{t₀}^{t₁} Σ_{i=1}^{ℓ} βᵢ |uᵢ| dt  (5.4)

or

J := ∫_{t₀}^{t₁} uᵀRu dt  (5.5)

where R = [rᵢⱼ] is a positive definite symmetric matrix (Rᵀ = R > 0) and the βᵢ and rᵢⱼ are weighting factors.
The aim here is to follow or "track" as closely as possible some desired state x̄(·) throughout the interval [t₀, t₁]. A suitable performance index is

J := ∫_{t₀}^{t₁} eᵀQe dt  (5.6)

where e(t) := x(t) − x̄(t) is the tracking error and Q is a positive semi-definite symmetric matrix.
Note : Such systems are called servomechanisms; the special case when x̄(·) is constant or zero is called a regulator. If the uᵢ(·) are unbounded, then the minimization problem can lead to a control vector having infinite components. This is unacceptable for real-life problems, so to restrict the total control effort, the following index can be used :

J := ∫_{t₀}^{t₁} (eᵀQe + uᵀRu) dt.  (5.7)
Expressions (costs) of the form (5.5), (5.6) and (5.7) are termed quadratic
performance indices (or quadratic costs).
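To make the quadratic costs concrete, here is a minimal numerical sketch (the weights Q, R and the trajectories e(·), u(·) are assumed for illustration, not taken from the text) which evaluates (5.7) on a time grid:

```python
import numpy as np

# Evaluate J = ∫ (e'Qe + u'Ru) dt numerically; all data are assumptions.
t = np.linspace(0.0, 1.0, 201)
e = np.stack([np.exp(-t), -np.exp(-t)])   # tracking error e(t), shape (2, N)
u = np.sin(t)[None, :]                    # control u(t), shape (1, N)
Q = np.diag([1.0, 0.5])                   # positive semi-definite weight
R = np.array([[0.1]])                     # positive definite weight

integrand = (np.einsum('in,ij,jn->n', e, Q, e)
             + np.einsum('in,ij,jn->n', u, R, u))
J = np.trapz(integrand, t)                # trapezoidal approximation of (5.7)
print(J)
```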
5.1.2 Example. (Soft landing problem) Consider a landing vehicle of unit mass, with altitude x₁ and vertical velocity x₂, whose descent under a thrust u is modelled by

ẋ₁ = x₂,  ẋ₂ = u

with the soft landing conditions

x₁(T) = 0,  x₂(T) = 0.

A suitable performance index is

J := ∫₀ᵀ (|u| + k) dt.

This expression represents a sum of the total fuel consumption and time to landing, k being a factor which weights the relative importance of these two quantities.
Simple application

Consider the linear system

ẋ = Ax,  x(0) = x₀  (5.8)

with performance index

J_r := ∫₀^∞ t^r xᵀQx dt,  r = 0, 1, 2, . . .  (5.9)

where Q is a positive definite symmetric matrix.

Note : If (5.8) represents a regulator, with x(·) being the deviation from some desired constant state, then minimizing J_r with respect to system parameters is equivalent to making the system approach its desired state in an "optimal" way. Increasing the value of r in (5.9) corresponds to penalizing large values of t in this process.
Provided A is a stability matrix, the value of J₀ is given by

J₀ = x₀ᵀ P x₀  (5.10)

where P is the unique symmetric positive definite solution of the Lyapunov equation

AᵀP + PA = −Q.  (5.11)

Similarly,

J₁ = x₀ᵀ P₁ x₀  (5.12)

where

AᵀP₁ + P₁A = −P,

and in general the required matrices are generated recursively by

AᵀP_{r+1} + P_{r+1}A = −P_r,  r = 0, 1, 2, . . . ;  P₀ = P.  (5.13)
For example, consider the system

z̈ + 2ωk ż + ω²z = 0,

written in state space form with x₁ = z, x₂ = ż, and take Q = diag(1, q), so that J₀ = ∫₀^∞ (z² + q ż²) dt. Solving (5.11) for P = [pᵢⱼ] gives

p₁₁ = k/ω + (1 + qω²)/(4kω),  p₁₂ = p₂₁ = 1/(2ω²),  p₂₂ = (1 + qω²)/(4kω³).
Note : In fact, by determining x(t) it can be deduced that the value of k so obtained does indeed give the desirable system transient behaviour. However, there is no a priori way of deciding on a suitable value for the factor q, which weights the relative importance of reducing z(·) and ż(·) to zero. This illustrates a disadvantage of the performance index approach, although in some applications it is possible to use physical arguments to choose values for weighting factors.
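The formulas for the pᵢⱼ can be checked symbolically. The sketch below assumes the state space form A = [[0, 1], [−ω², −2kω]] and the weight Q = diag(1, q) described above, and verifies that P satisfies the Lyapunov equation (5.11):

```python
import sympy as sp

k, w, q = sp.symbols('k omega q', positive=True)

# State matrix of z'' + 2ωk z' + ω² z = 0 with x1 = z, x2 = z'.
A = sp.Matrix([[0, 1], [-w**2, -2*k*w]])
Q = sp.diag(1, q)

p11 = k/w + (1 + q*w**2) / (4*k*w)
p12 = 1 / (2*w**2)
p22 = (1 + q*w**2) / (4*k*w**3)
P = sp.Matrix([[p11, p12], [p12, p22]])

# Check the Lyapunov equation (5.11): A'P + PA = -Q.
print(sp.simplify(A.T * P + P * A + Q))   # -> zero matrix
```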
The basic optimal control problem is to choose the control u(·) so as to minimize the cost functional

J[u] := ϕ(x(t₁), t₁) + ∫_{t₀}^{t₁} L(t, x, u) dt

subject to

ẋ = F(t, x, u),  x(t₀) = x₀ ∈ Rᵐ.

We assume that the functions ϕ and L are continuously differentiable.
Note : (1) The cost functional J is in fact a function on the function space U (of all admissible controls) :

J : u ∈ U ↦ J[u] ∈ R.

(2) A control u∗ is optimal if

J[u] − J[u∗] ≥ 0

for all admissible controls u.
Assume that u is differentiable on [t₀, t₁] and that t₀ and t₁ are fixed. The variation in Jₐ corresponding to a variation δu in u is

δJₐ = [(∂ϕ/∂x − p) δx]_{t=t₁} + ∫_{t₀}^{t₁} ((∂H/∂x + ṗ) δx + (∂H/∂u) δu) dt

where H := L + pF is the Hamiltonian and the state equation

ẋ = F(t, x, u)

has been used. (Here ∂ϕ/∂x denotes the row vector [∂ϕ/∂x₁ · · · ∂ϕ/∂xₘ], and similarly for ∂H/∂x and ∂H/∂u.)
subject to

ẋ = F(t, x, u),  x(t₀) = x₀.

The adjoint equation ṗ∗ = −∂H/∂x and the state equation

ẋ = F(t, x, u)

together give a total of 2m linear or nonlinear ODEs with (mixed) boundary conditions x(t₀) and p(t₁). In general, analytical solution is not possible and numerical techniques have to be used.
5.2.3 Example. Consider the problem of minimizing

J := ∫₀ᵀ (x² + u²) dt

subject to

ẋ = −ax + u,  x(0) = x₀ ∈ R.

The Hamiltonian is

H = L + pF = x² + u² + p(−ax + u).
Also,

ṗ∗ = −∂H/∂x = −2x∗ + ap∗

and

(∂H/∂u)|_{u=u∗} = 2u∗ + p∗ = 0,
where x∗ and p∗ denote the state and adjoint variables for an optimal solu-
tion.
Substitution produces

ẋ∗ = −ax∗ − (1/2) p∗
and since ϕ ≡ 0, the boundary condition is just
p(T ) = 0.
Note : We have only found necessary conditions for optimality; further discussion
of this point goes far beyond the scope of this course.
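For this example the resulting two-point boundary value problem can also be solved numerically. A sketch with the illustrative values a = 1, T = 1, x₀ = 1 (these values are assumed, not from the text), using scipy's BVP solver:

```python
import numpy as np
from scipy.integrate import solve_bvp

a, T, x0 = 1.0, 1.0, 1.0    # assumed values for illustration

def rhs(t, y):
    # y[0] = x*, y[1] = p*; the optimal control is u* = -p*/2.
    x, p = y
    return np.vstack((-a * x - 0.5 * p,    # x*' = -a x* - p*/2
                      -2.0 * x + a * p))   # p*' = -2 x* + a p*

def bc(ya, yb):
    # Mixed boundary conditions: x(0) = x0 and p(T) = 0.
    return np.array([ya[0] - x0, yb[1]])

t = np.linspace(0.0, T, 50)
sol = solve_bvp(rhs, bc, t, np.zeros((2, t.size)))
u_star = -0.5 * sol.sol(t)[1]              # recovered optimal control u*(t)
print(u_star[:3])
```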
If L and F do not depend explicitly on t, then differentiating H = L + pF with respect to t we get

Ḣ = dH/dt = (∂L/∂u) u̇ + (∂L/∂x) ẋ + p ((∂F/∂u) u̇ + (∂F/∂x) ẋ) + ṗ F
  = (∂L/∂u + p ∂F/∂u) u̇ + (∂L/∂x + p ∂F/∂x) ẋ + ṗ F
  = (∂H/∂u) u̇ + (∂H/∂x) ẋ + ṗ F
  = (∂H/∂u) u̇ + (∂H/∂x + ṗ) F.

Along an optimal trajectory ∂H/∂u = 0 and ṗ∗ = −∂H/∂x, so Ḣ = 0 ; that is,

H|_{u=u∗} = constant,  t₀ ≤ t ≤ t₁.
Discussion
We have so far assumed that t₁ is fixed and x(t₁) is free. If this is not necessarily the case, then we obtain

δJₐ = [(∂ϕ/∂x − p) δx + (H + ∂ϕ/∂t)|_{u=u∗} δt]_{t=t₁} + ∫_{t₀}^{t₁} ((∂H/∂x + ṗ) δx + (∂H/∂u) δu) dt.
The expression outside the integral must be zero (by virtue of Proposition
5.2.1), making the integral zero. The implications of this for some important
special cases are now listed. The initial condition x(t0 ) = x0 holds through-
out.
(1) Fixed final time t₁ and fixed final state : here δt|_{t=t₁} = 0 and δx|_{t=t₁} = 0, and the condition is

x∗(t₁) = x_f

(and this replaces p(t₁) = (∂ϕ/∂x)|_{t=t₁} ).
(2) Free final time t₁ and free final state : both δt|_{t=t₁} and δx|_{t=t₁} are now arbitrary, so for the expression

[(∂ϕ/∂x − p) δx + (H + ∂ϕ/∂t)|_{u=u∗} δt]_{t=t₁}

to vanish, both

p(t₁) = (∂ϕ/∂x)|_{t=t₁}  and  (H + ∂ϕ/∂t)|_{u=u∗, t=t₁} = 0

must hold.
In particular, if ϕ ≡ 0 and L and F do not depend explicitly on t, then combining this with the constancy of H established above gives

H|_{u=u∗} = 0,  t₀ ≤ t ≤ t₁.
5.2.4 Example. A particle of unit mass moves along the x-axis subject
to a force u(·). It is required to determine the control which transfers the
particle from rest at the origin to rest at x = 1 in unit time, so as to minimize
the effort involved, measured by
J := ∫₀¹ u² dt.
The equation of motion is ẍ = u; in state space form, with x₁ = x and x₂ = ẋ, this becomes

ẋ₁ = x₂,  ẋ₂ = u.
We have
H = L + pF = p1 x2 + p2 u + u2 .
From

(∂H/∂u)|_{u=u∗} = 0

we get

2u∗ + p∗₂ = 0,

and the adjoint equations are ṗ∗₁ = −∂H/∂x₁ = 0, ṗ∗₂ = −∂H/∂x₂ = −p∗₁.
Integration gives
p∗2 = C1 t + C2
and thus

ẋ∗₂ = −(1/2)(C₁t + C₂),

which on integrating, and using the given conditions x₂(0) = 0 = x₂(1), produces

x∗₂(t) = (1/2) C₂ (t² − t),  C₁ = −2C₂.
Finally, integrating the equation ẋ₁ = x₂ and using x₁(0) = 0, x₁(1) = 1 gives

x∗₁(t) = t²(3 − 2t),  C₂ = −12.
Hence the optimal control is

u∗(t) = −(1/2) p∗₂(t) = 6 − 12t.
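As a quick independent check (a sketch, not part of the original solution), one can verify symbolically that u∗(t) = 6 − 12t indeed transfers the particle from rest at 0 to rest at 1 in unit time:

```python
import sympy as sp

t, s = sp.symbols('t s')
u = 6 - 12 * s                                 # candidate optimal control u*(s)
x2 = sp.integrate(u, (s, 0, t))                # velocity, with x2(0) = 0
x1 = sp.integrate(x2.subs(t, s), (s, 0, t))    # position, with x1(0) = 0

print(sp.expand(x1))                 # 3*t**2 - 2*t**3 = t**2*(3 - 2*t)
print(x1.subs(t, 1), x2.subs(t, 1))  # 1 0 -> rest at x = 1 at time 1
```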
An interesting case is when the final state is required to lie on a surface defined by k relations

g₁(x₁, x₂, . . . , xₘ) = 0
g₂(x₁, x₂, . . . , xₘ) = 0
⋮
g_k(x₁, x₂, . . . , xₘ) = 0.

In this case δx|_{t=t₁} is no longer arbitrary but must be tangent to this surface, and (taking ϕ ≡ 0 ) the transversality condition

p(t₁) = μ₁ (∂g₁/∂x)|_{t=t₁} + · · · + μ_k (∂g_k/∂x)|_{t=t₁}  (5.19)

holds for some constants μ₁, . . . , μ_k.
For example, suppose that the system

ẋ₁ = x₂,  ẋ₂ = −x₂ + u

is to be steered onto the line

ax₁ + bx₂ = c

at time T so as to minimize

∫₀ᵀ u² dt.

The values of a, b, c, and T are given.
From

H = u² + p₁x₂ − p₂x₂ + p₂u

we get

u∗ = −(1/2) p∗₂.  (5.20)
The adjoint equations ṗ∗₁ = −∂H/∂x₁ = 0 and ṗ∗₂ = −∂H/∂x₂ = p∗₂ − p∗₁ then show that

p∗₁ = c₁,  p∗₂ = c₂eᵗ + c₁  (5.21)

must hold.
It is easy to verify that (5.19) produces

p∗₁(T)/p∗₂(T) = a/b  (5.23)
and (5.22) and (5.23) give four equations for the four unknown constants ci .
The optimal control u∗ (·) is then obtained from (5.20) and (5.21).
Note : In some problems the restriction on the total amount of control effort which can be expended to carry out a required task may be expressed in the form

∫_{t₀}^{t₁} L₀(t, x, u) dt = c  (5.24)

where c is a given constant. Such a constraint can be dealt with by introducing an additional state variable x_{m+1} with x_{m+1}(t₀) = 0, so that

ẋ_{m+1} = L₀(t, x, u).

This ODE is simply added to the original one (5.1) together with the condition x_{m+1}(t₁) = c.

Suppose now that the control variables are required to satisfy constraints of the form

|uᵢ(t)| ≤ Kᵢ,  i = 1, 2, . . . , ℓ.

This implies that the set of final states which can be achieved is restricted.
Our aim here is to derive the necessary conditions for optimality corre-
sponding to Theorem 5.2.2 for the unbounded case.
An admissible control is one which satisfies the constraints, and we consider variations δu such that

• u∗ + δu is admissible.

The change in the cost functional

J[u] = ϕ(x(t₁), t₁) + ∫_{t₀}^{t₁} L(t, x, u) dt

is then determined to first order by δJ, and a necessary condition for u∗ to minimize J is

δJ[u∗, δu] ≥ 0.
It can be shown that this implies

H(t, x∗, u∗ + δu, p∗) ≥ H(t, x∗, u∗, p∗)

for all admissible δu and all t in [t₀, t₁]. This states that u∗ minimizes H, so we have "established" the following result (Pontryagin's Minimum Principle, PMP) : for u∗(·) to be optimal it is necessary that it minimize the Hamiltonian over all admissible controls at each instant.
Note : (1) With a slightly different definition of H, the principle becomes one of maximizing H, and is then referred to as Pontryagin's Maximum Principle.
(2) u∗ (·) is now allowed to be piecewise continuous. (A rigorous proof is beyond the
scope of this course.)
(3) Our derivation assumed that t1 was fixed and x(t1 ) free; the boundary con-
ditions for other situations are precisely the same as those given in the preceding
section.
5.3.2 Example. Consider again the “soft landing” problem (cf. Example
5.1.2), where the performance index
J = ∫₀ᵀ (|u| + k) dt
is to be minimized subject to
ẋ1 = x2 , ẋ2 = u.
The Hamiltonian is
H = |u| + k + p1 x2 + p2 u.
Minimizing H with respect to u (subject to |u| ≤ 1 ) gives

u∗ = −1 if p∗₂ > 1,  u∗ = 0 if −1 < p∗₂ < 1,  u∗ = +1 if p∗₂ < −1,

and the adjoint equations ṗ∗₁ = 0, ṗ∗₂ = −p∗₁ yield

p∗₁ = c₁,  p∗₂ = c₂ − c₁t,

where c₁ and c₂ are constants. Since p∗₂ is linear in t, it follows that it can
take each of the values +1 and −1 at most once in [0, T ], so u∗ (·) can switch
at most twice. We must however use physical considerations to determine an
actual optimal control.
Since the landing vehicle begins with a downwards velocity at altitude h, a logical sequence of control would seem to be

u∗ = 0 , followed by u∗ = +1.

Assuming the switch occurs at time t₁, the solution of
ẋ₁ = x₂,  ẋ₂ = u,  x₁(0) = h,  x₂(0) = −v

is

x∗₁(t) = h − vt                      if 0 ≤ t ≤ t₁
x∗₁(t) = h − vt + (1/2)(t − t₁)²     if t₁ ≤ t ≤ T      (5.27)

x∗₂(t) = −v                          if 0 ≤ t ≤ t₁
x∗₂(t) = −v + (t − t₁)               if t₁ ≤ t ≤ T.     (5.28)
The soft landing conditions

x₁(T) = 0,  x₂(T) = 0

determine the switching time t₁. Moreover, since the final time is free and H does not depend explicitly on t, H|_{u=u∗} = 0 throughout; evaluating H at t = 0, where u∗ = 0, gives

k − v p∗₁(0) = 0

or

p∗₁(0) = k/v.

Hence we have

p∗₁(t) = k/v,  t ≥ 0
and

p∗₂(t) = −(kt)/v − 1 + (kt₁)/v
using the assumption that p∗₂(t₁) = −1. Thus the assumed optimal control will be valid if t₁ > 0 and p∗₂(0) < 1 (the latter condition being necessary since u∗ = 0 ), and these conditions imply that

h > (1/2) v²,  k < 2v² / (h − (1/2)v²).  (5.29)
Note : If these inequalities do not hold, then some different control strategy (such as u∗ = −1, then u∗ = 0, then u∗ = +1 ) becomes optimal. For example, if k is increased so that the second inequality in (5.29) is violated, then this means that more emphasis is placed on the time to landing in the performance index. It is therefore reasonable to expect this time would be reduced by first accelerating downwards with u∗ = −1 before coasting with u∗ = 0.
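The switching structure is easy to explore numerically. A sketch with assumed values of h and v (chosen here to satisfy (5.29); they are not from the text):

```python
import numpy as np

h, v = 10.0, 2.0                  # assumed altitude and descent speed

t1 = (h - 0.5 * v**2) / v         # switching time: u* = 0 on [0, t1]
T = t1 + v                        # touchdown time, from x2(T) = 0

assert t1 > 0                     # i.e. h > v^2/2, first inequality in (5.29)

# Evaluate the trajectory (5.27)-(5.28) at touchdown:
x1_T = h - v * T + 0.5 * (T - t1) ** 2
x2_T = -v + (T - t1)
print(t1, T, x1_T, x2_T)          # x1(T) = 0 and x2(T) = 0, as required
```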
We can now discuss a general linear regulator problem in the usual form
ẋ = Ax + Bu (5.30)
where x(·) is the deviation from the desired constant state. The aim is to
transfer the system from some initial state to the origin in minimum time,
subject to
|ui (t)| ≤ Ki , i = 1, 2, . . . , `.
The Hamiltonian is

H = 1 + p(Ax + Bu)
  = 1 + pAx + [pb₁ pb₂ . . . pb_ℓ] u
  = 1 + pAx + Σ_{i=1}^{ℓ} (p bᵢ) uᵢ

where b₁, b₂, . . . , b_ℓ denote the columns of B. By (PMP), H is minimized with respect to uᵢ by taking

u∗ᵢ(t) = −Kᵢ sgn(sᵢ(t))

where

sᵢ(t) := p∗(t) bᵢ  (5.31)
is the switching function for the ith variable. The adjoint equation is
ṗ∗ = −(∂/∂x)(p∗Ax)

or

ṗ∗ = −p∗A.
If si (t) ≡ 0 in some time interval, then u∗i (t) is indeterminate in this interval.
We now therefore investigate whether the expression in (5.31) can vanish.
Firstly, we can assume that bᵢ ≠ 0. Next, since the final time is free, the condition H|_{u=u∗} = 0 holds, which gives (for all t )

1 + p∗(Ax∗ + Bu∗) = 0
so clearly p∗(t) cannot be zero for any value of t. Finally, if sᵢ(t) = p∗(t)bᵢ vanishes identically on some interval, then so do all its derivatives, and using ṗ∗ = −p∗A this implies that

p∗ [bᵢ  Abᵢ  A²bᵢ  · · ·  A^{m−1}bᵢ] = 0.  (5.32)
If the system (5.30) is c.c. by the ith input acting alone (i.e. uj ≡ 0, j 6=
i ), then by Theorem 3.1.3 the matrix in (5.32) is nonsingular, and equation
(5.32) then has only the trivial solution p∗ = 0. However, we have already
ruled out this possibility, so si cannot be zero. Thus provided the controlla-
bility condition holds, there is no time interval in which u∗i is indeterminate.
The optimal control for the ith variable then has the bang-bang form
u∗i = ± Ki .
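The switching functions are easy to compute explicitly from the adjoint solution. A sketch for a double integrator with assumed adjoint initial data (purely illustrative choices, not from the text):

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0], [0.0, 0.0]])   # double integrator (assumed example)
b = np.array([0.0, 1.0])                 # single input column
p0 = np.array([1.0, -0.5])               # illustrative initial adjoint row
K = 1.0                                  # control bound

for t in np.linspace(0.0, 3.0, 7):
    p = p0 @ expm(-A * t)                # solution of p' = -p A
    s = p @ b                            # switching function (5.31)
    print(f"t={t:.1f}  s={s:+.3f}  u*={-K * np.sign(s):+.1f}")
```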
J := (1/2) xᵀ(t₁) M x(t₁) + (1/2) ∫₀^{t₁} (xᵀQ(t)x + uᵀR(t)u) dt  (5.34)

with R(t) positive definite and M and Q(t) positive semi-definite symmetric matrices for t ≥ 0 (the factors 1/2 enter only for convenience).
Note : The quadratic term in u in (5.34) ensures that the total amount of control
effort is restricted, so that the control variables can be assumed unbounded.
The Hamiltonian is

H = (1/2) xᵀQx + (1/2) uᵀRu + p(Ax + Bu)

and the necessary condition (5.17) for optimality gives

(∂/∂u)[(1/2)(u∗)ᵀRu∗ + p∗Bu∗] = (Ru∗)ᵀ + p∗B = 0
so that

u∗ = −R⁻¹Bᵀ(p∗)ᵀ  (5.35)

and combining this equation with (5.36) produces the system of 2m linear ODEs

d/dt [ x∗ ; (p∗)ᵀ ] = [ A(t)  −B(t)R⁻¹(t)Bᵀ(t) ; −Q(t)  −Aᵀ(t) ] [ x∗ ; (p∗)ᵀ ].  (5.37)
Since x(t₁) is not specified, the boundary condition is

(p∗(t₁))ᵀ = M x∗(t₁).  (5.38)
The form of (5.38) suggests seeking a solution of (5.37) with

(p∗(t))ᵀ = P(t) x∗(t),

in which case (5.35) becomes the linear feedback control

u∗(t) = −R⁻¹(t)Bᵀ(t)P(t) x∗(t).  (5.39)

Differentiating, we also get

Ṗx∗ + P ẋ∗ − (ṗ∗)ᵀ = 0

and substituting for ẋ∗, (ṗ∗)ᵀ (from (5.37)) and (p∗)ᵀ produces

(Ṗ + PA − PBR⁻¹BᵀP + Q + AᵀP) x∗(t) = 0.

Since this must hold for arbitrary x∗(t), it follows that P(t) satisfies the Riccati matrix differential equation

Ṗ = PBR⁻¹BᵀP − AᵀP − PA − Q  (5.40)

with boundary condition P(t₁) = M.
Note : (1) Since the matrix M is symmetric, it follows that P(t) is symmetric for all t, so the (matrix) ODE (5.40) represents m(m+1)/2 scalar first order (quadratic) ODEs, which can be integrated numerically.
(2) Even when the matrices A, B, Q, and R are all time-invariant, the solution P(t) of (5.40), and hence the feedback matrix in (5.39), will in general still be time-varying.
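Numerical integration of (5.40) proceeds backwards in time from P(t₁) = M. A minimal sketch with assumed constant matrices (they are not from the text):

```python
import numpy as np
from scipy.integrate import solve_ivp

# Assumed data for illustration.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
M = np.zeros((2, 2))
t1 = 5.0

def riccati(t, p_flat):
    # Matrix Riccati ODE (5.40): P' = P B R^-1 B' P - A'P - P A - Q.
    P = p_flat.reshape(2, 2)
    dP = P @ B @ np.linalg.solve(R, B.T) @ P - A.T @ P - P @ A - Q
    return dP.ravel()

# Integrate backwards from the terminal condition P(t1) = M.
sol = solve_ivp(riccati, (t1, 0.0), M.ravel())
P0 = sol.y[:, -1].reshape(2, 2)
print(P0)    # P(0); the feedback (5.39) is u* = -R^-1 B' P(t) x
```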
However, of particular interest is the case when in addition the final time
t1 tends to infinity. Then there is no need to include the terminal expression
in the performance index since the aim is to make x(t1 ) → 0 as t1 → ∞, so
we set M = 0. Let Q1 be a matrix having the same rank as Q and such that
Q = QT1 Q1 . It can be shown that the solution P (t) of (5.40) does become a
constant matrix P , and we have :
5.4.1 Proposition. If the system

ẋ = Ax + Bu(t)

is c.c. and the pair (A, Q₁) is c.o., then the control which minimizes

∫₀^∞ (xᵀQx + uᵀRu) dt  (5.41)
is given by

u∗(t) = −R⁻¹BᵀP x(t)  (5.42)

where P is the unique positive definite symmetric matrix which satisfies the so-called algebraic Riccati equation

PBR⁻¹BᵀP − AᵀP − PA − Q = 0.  (5.43)
Note : Equation (5.43) represents m(m+1)/2 quadratic algebraic equations for the unknown elements (entries) of P, so the solution will not in general be unique. However, it can be shown that if a positive definite solution of (5.43) exists, then there is only one such solution.
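In practice (5.43) is solved numerically. A sketch with assumed data, using scipy's continuous-time algebraic Riccati solver:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Assumed data for illustration.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

P = solve_continuous_are(A, B, Q, R)   # positive definite solution of (5.43)
K = np.linalg.solve(R, B.T @ P)        # optimal gain: u* = -K x, as in (5.42)
print(P)
print(np.linalg.eigvals(A - B @ K))    # closed loop eigenvalues: stable
```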
Interpretation

If y := Q₁x is regarded as an output vector, then

yᵀy = xᵀQ₁ᵀQ₁x = xᵀQx,

so the state term in (5.41) measures the magnitude of this output.
The closed loop system obtained by applying the optimal control (5.42) is

ẋ = (A − BR⁻¹BᵀP)x  (5.44)

and it is straightforward to verify that

(A − BR⁻¹BᵀP)ᵀP + P(A − BR⁻¹BᵀP) = −(PBR⁻¹BᵀP + Q)  (5.45)

using the fact that P is a solution of (5.43). Since R⁻¹ is positive definite and Q is positive semi-definite, the matrix on the RHS in (5.45) is negative semi-definite, so Proposition 4.3.10 is not directly applicable, unless Q is actually positive definite.
It can be shown that if the triplet (A, B, Q₁) is not necessarily c.c. and c.o. but merely stabilizable and detectable, then the algebraic Riccati equation (5.43) has a unique solution, and the closed loop system (5.44) is asymptotically stable.
Note : Thus a solution of the algebraic Riccati equation leads to a stabilizing linear
feedback control (5.42) irrespective of whether or not the open loop system is stable.
(This provides an alternative to the methods of section 3.3.)
If x∗ (·) is the solution of the closed loop system (5.44), then (as in (5.10))
equation (5.45) implies
d/dt ((x∗)ᵀP x∗) = −(x∗)ᵀ (PBR⁻¹BᵀP + Q) x∗
                = −(u∗)ᵀRu∗ − (x∗)ᵀQx∗.

Since the closed loop matrix A − BR⁻¹BᵀP is a stability matrix, x∗(t) → 0 as t → ∞, so we can integrate both sides of this equality with respect to t (from 0 to ∞ ) to obtain the minimum value of (5.41) :

∫₀^∞ ((x∗)ᵀQx∗ + (u∗)ᵀRu∗) dt = x₀ᵀ P x₀.  (5.46)
In the uncontrolled case (when B = 0 ), equations (5.43) and (5.46) reduce to

AᵀP + PA = −Q

and

J₀ = ∫₀^∞ xᵀQx dt = x₀ᵀ P x₀

respectively, in agreement with (5.10) and (5.11).
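This special case can also be checked numerically; a sketch with an assumed stability matrix A, weight Q, and initial state:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

A = np.array([[-1.0, 1.0], [0.0, -2.0]])   # assumed stability matrix
Q = np.eye(2)
x0 = np.array([1.0, 1.0])

P = solve_continuous_lyapunov(A.T, -Q)     # solves A'P + P A = -Q
print(x0 @ P @ x0)                         # J0 = x0' P x0
```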
5.5 Exercises
Exercise 93 A system is described by

ẋ = −2x + u.

Show that the control which transfers the system from x(0) = 1 to x(1) = 0 and minimizes ∫₀¹ u² dt is

u∗(t) = − 4e²ᵗ / (e⁴ − 1).
Exercise 94 A system is described by

d³z/dt³ = u,

where z(·) denotes displacement. Starting from some given initial position with given velocity and acceleration, it is required to choose u(·), which is constrained by |u(t)| ≤ k, so as to make displacement, velocity, and acceleration equal to zero in the least possible time. Show using (PMP) that the optimal control consists of

u∗ = ± k

with at most two switchings.
Exercise 95 A system is described by

z̈ + aż + bz = u,

where a > 0 and a² < 4b. The control variable is subject to |u(t)| ≤ k and is to be chosen so that the system reaches the state z(T) = 0, ż(T) = 0 in minimum possible time. Show that the optimal control is of bang-bang type, u∗ = ± k.
Exercise 96 A system is described by

ẋ = −2x + 2u,  x ∈ R.

(a) If u(·) is chosen to minimize a given performance index over a fixed interval [0, T], show that the optimal control has the form

u∗(t) = c₁ eᵗ sinh(t√2 + c₂)

where c₁ and c₂ are certain constants. (DO NOT try to determine their values.)
(b) If u(·) is such that |u(t)| ≤ k, where k is a constant, and the system is
to be brought to the origin in the shortest possible time, show that the
optimal control is bang-bang, with at most one switch.
determine the control which transfers it from x(0) = 0 to the line L with equation
x1 + 5x2 = 15
Exercise 99 Use Proposition 5.4.1 to find the feedback control which minimizes

∫₀^∞ (x₂² + (1/10) u²) dt

subject to

ẋ₁ = −x₁ + u,  ẋ₂ = x₁.
Exercise 100

(a) Use the Riccati equation formulation to determine the feedback control for the system

ẋ = −x + u,  x ∈ R

which minimizes

J = (1/2) ∫₀¹ (3x² + u²) dt.

[Hint : In the Riccati equation for the problem put P(t) = − ẇ(t)/w(t). ]
(b) If the system is to be transferred to the origin from an arbitrary initial
state with the same performance index, use the calculus of variations to
determine the optimal control.