Optimal Control
Topics :
1. Performance Indices
2. Calculus of Variations
3. Pontryagin's Principle
4. Linear Regulators with Quadratic Costs
Copyright © Claudiu C. Remsing, 2006.
All rights reserved.
This section deals with the problem of compelling a system to behave in some "best possible" way. Of course, the precise control strategy will depend upon the criterion used to decide what is meant by "best", and we first discuss some choices for measures of system performance. This is followed by a description of some mathematical techniques for determining optimal control policies, including the special case of linear systems with quadratic performance index, when a complete analytical solution is possible.
ẋ = F(t, x, u),  x(t₀) = x₀ ∈ Rᵐ.  (5.1)

Here x(t) = [x₁(t) x₂(t) . . . xₘ(t)]ᵀ is the state vector, u(t) = [u₁(t) u₂(t) . . . u_ℓ(t)]ᵀ is the control vector, and F is a vector-valued mapping having components F₁, F₂, . . . , Fₘ.
Note : We shall assume that the Fi are continuous and satisfy standard condi-
tions, such as having continuous first order partial derivatives (so that the solution
exists and is unique for the given initial condition). We say that F is continuously
differentiable (or of class C 1 ).
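It may help to see the setup (5.1) concretely. The following sketch (the dynamics F and the control signal u are assumed purely for illustration, not taken from the text) integrates a two-state system under a given control with scipy:

```python
import numpy as np
from scipy.integrate import solve_ivp

# A concrete instance of (5.1): m = 2 states, one control input.
# Both the dynamics F and the control u below are illustrative assumptions.
def u(t):
    return np.sin(t)                      # a fixed control signal u(t)

def F(t, x):
    # C^1 right-hand side F(t, x, u(t)) of the state equation.
    return [x[1], -x[0] + u(t)]

sol = solve_ivp(F, (0.0, 10.0), [1.0, 0.0])  # x(t0) = x0 = (1, 0)
print(sol.y[:, -1])                          # state at the final time
```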
For a minimum-time problem, a suitable performance index to be minimized is

J := t₁ − t₀ = ∫_{t₀}^{t₁} dt,

where t₁ is the first instant of time at which the desired state is reached.
The desired final state is now to be attained with minimum total expenditure
of control effort. Suitable performance indices to be minimized are
J := ∫_{t₀}^{t₁} Σ_{i=1}^{ℓ} βᵢ |uᵢ| dt  (5.4)

or

J := ∫_{t₀}^{t₁} uᵀRu dt  (5.5)

where R = [rᵢⱼ] is a positive definite symmetric matrix (Rᵀ = R > 0) and the βᵢ and rᵢⱼ are weighting factors.
The aim here is to follow or "track" as closely as possible some desired state x̄(·) throughout the interval [t₀, t₁]. A suitable performance index is

J := ∫_{t₀}^{t₁} eᵀQe dt  (5.6)

where e(t) := x(t) − x̄(t) is the tracking error and Q is a positive semi-definite symmetric matrix.
Note : Such systems are called servomechanisms; the special case when x̄(·) is constant or zero is called a regulator. If the uᵢ(·) are unbounded, then the minimization problem can lead to a control vector having infinite components. This is unacceptable for real-life problems, so to restrict the total control effort, the following index can be used :

J := ∫_{t₀}^{t₁} (eᵀQe + uᵀRu) dt.  (5.7)
Expressions (costs) of the form (5.5), (5.6) and (5.7) are termed quadratic
performance indices (or quadratic costs).
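To make the quadratic costs concrete, here is a minimal numerical sketch (the weights Q, R and the trajectories e(·), u(·) are assumed for illustration, not taken from the text) which evaluates (5.7) on a time grid:

```python
import numpy as np

# Evaluate J = ∫ (e'Qe + u'Ru) dt numerically; all data are assumptions.
t = np.linspace(0.0, 1.0, 201)
e = np.stack([np.exp(-t), -np.exp(-t)])   # tracking error e(t), shape (2, N)
u = np.sin(t)[None, :]                    # control u(t), shape (1, N)
Q = np.diag([1.0, 0.5])                   # positive semi-definite weight
R = np.array([[0.1]])                     # positive definite weight

integrand = (np.einsum('in,ij,jn->n', e, Q, e)
             + np.einsum('in,ij,jn->n', u, R, u))
J = np.trapz(integrand, t)                # trapezoidal approximation of (5.7)
print(J)
```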
5.1.2 Example. (Soft landing problem) Consider a landing vehicle of unit mass, with altitude x₁ and vertical velocity x₂, whose descent under a thrust u is modelled by

ẋ₁ = x₂,  ẋ₂ = u

with the soft landing conditions

x₁(T) = 0,  x₂(T) = 0.

A suitable performance index is

J := ∫₀ᵀ (|u| + k) dt.

This expression represents a sum of the total fuel consumption and time to landing, k being a factor which weights the relative importance of these two quantities.
Simple application

Consider the linear system

ẋ = Ax,  x(0) = x₀  (5.8)

with performance index

J_r := ∫₀^∞ t^r xᵀQx dt,  r = 0, 1, 2, . . .  (5.9)

where Q is a positive definite symmetric matrix.

Note : If (5.8) represents a regulator, with x(·) being the deviation from some desired constant state, then minimizing J_r with respect to system parameters is equivalent to making the system approach its desired state in an "optimal" way. Increasing the value of r in (5.9) corresponds to penalizing large values of t in this process.
Provided A is a stability matrix, the value of J₀ is given by

J₀ = x₀ᵀ P x₀  (5.10)

where P is the unique symmetric positive definite solution of the Lyapunov equation

AᵀP + PA = −Q.  (5.11)

Similarly,

J₁ = x₀ᵀ P₁ x₀  (5.12)

where

AᵀP₁ + P₁A = −P,

and in general the required matrices are generated recursively by

AᵀP_{r+1} + P_{r+1}A = −P_r,  r = 0, 1, 2, . . . ;  P₀ = P.  (5.13)
For example, consider the system

z̈ + 2ωk ż + ω²z = 0,

written in state space form with x₁ = z, x₂ = ż, and take Q = diag(1, q), so that J₀ = ∫₀^∞ (z² + q ż²) dt. Solving (5.11) for P = [pᵢⱼ] gives

p₁₁ = k/ω + (1 + qω²)/(4kω),  p₁₂ = p₂₁ = 1/(2ω²),  p₂₂ = (1 + qω²)/(4kω³).
Note : In fact, by determining x(t) it can be deduced that the value of k so obtained does indeed give the desirable system transient behaviour. However, there is no a priori way of deciding on a suitable value for the factor q, which weights the relative importance of reducing z(·) and ż(·) to zero. This illustrates a disadvantage of the performance index approach, although in some applications it is possible to use physical arguments to choose values for weighting factors.
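The formulas for the pᵢⱼ can be checked symbolically. The sketch below assumes the state space form A = [[0, 1], [−ω², −2kω]] and the weight Q = diag(1, q) described above, and verifies that P satisfies the Lyapunov equation (5.11):

```python
import sympy as sp

k, w, q = sp.symbols('k omega q', positive=True)

# State matrix of z'' + 2ωk z' + ω² z = 0 with x1 = z, x2 = z'.
A = sp.Matrix([[0, 1], [-w**2, -2*k*w]])
Q = sp.diag(1, q)

p11 = k/w + (1 + q*w**2) / (4*k*w)
p12 = 1 / (2*w**2)
p22 = (1 + q*w**2) / (4*k*w**3)
P = sp.Matrix([[p11, p12], [p12, p22]])

# Check the Lyapunov equation (5.11): A'P + PA = -Q.
print(sp.simplify(A.T * P + P * A + Q))   # -> zero matrix
```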
The basic optimal control problem is to choose the control u(·) so as to minimize the cost functional

J[u] := ϕ(x(t₁), t₁) + ∫_{t₀}^{t₁} L(t, x, u) dt

subject to

ẋ = F(t, x, u),  x(t₀) = x₀ ∈ Rᵐ.

We assume that the functions ϕ and L are continuously differentiable.
Note : (1) The cost functional J is in fact a function on the function space U (of all admissible controls) :

J : u ∈ U ↦ J[u] ∈ R.

(2) A control u∗ is optimal if

J[u] − J[u∗] ≥ 0

for all admissible controls u.
Assume that u is differentiable on [t₀, t₁] and that t₀ and t₁ are fixed. The variation in Jₐ corresponding to a variation δu in u is

δJₐ = [(∂ϕ/∂x − p) δx]_{t=t₁} + ∫_{t₀}^{t₁} ((∂H/∂x + ṗ) δx + (∂H/∂u) δu) dt

where H := L + pF is the Hamiltonian and the state equation

ẋ = F(t, x, u)

has been used. (Here ∂ϕ/∂x denotes the row vector [∂ϕ/∂x₁ · · · ∂ϕ/∂xₘ], and similarly for ∂H/∂x and ∂H/∂u.)
subject to

ẋ = F(t, x, u),  x(t₀) = x₀.

The adjoint equation ṗ∗ = −∂H/∂x and the state equation

ẋ = F(t, x, u)

together give a total of 2m linear or nonlinear ODEs with (mixed) boundary conditions x(t₀) and p(t₁). In general, analytical solution is not possible and numerical techniques have to be used.
5.2.3 Example. Consider the problem of minimizing

J := ∫₀ᵀ (x² + u²) dt

subject to

ẋ = −ax + u,  x(0) = x₀ ∈ R.

The Hamiltonian is

H = L + pF = x² + u² + p(−ax + u).
Also,

ṗ∗ = −∂H/∂x = −2x∗ + ap∗

and

(∂H/∂u)|_{u=u∗} = 2u∗ + p∗ = 0,
where x∗ and p∗ denote the state and adjoint variables for an optimal solu-
tion.
Substitution produces

ẋ∗ = −ax∗ − (1/2) p∗
and since ϕ ≡ 0, the boundary condition is just
p(T ) = 0.
Note : We have only found necessary conditions for optimality; further discussion
of this point goes far beyond the scope of this course.
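For this example the resulting two-point boundary value problem can also be solved numerically. A sketch with the illustrative values a = 1, T = 1, x₀ = 1 (these values are assumed, not from the text), using scipy's BVP solver:

```python
import numpy as np
from scipy.integrate import solve_bvp

a, T, x0 = 1.0, 1.0, 1.0    # assumed values for illustration

def rhs(t, y):
    # y[0] = x*, y[1] = p*; the optimal control is u* = -p*/2.
    x, p = y
    return np.vstack((-a * x - 0.5 * p,    # x*' = -a x* - p*/2
                      -2.0 * x + a * p))   # p*' = -2 x* + a p*

def bc(ya, yb):
    # Mixed boundary conditions: x(0) = x0 and p(T) = 0.
    return np.array([ya[0] - x0, yb[1]])

t = np.linspace(0.0, T, 50)
sol = solve_bvp(rhs, bc, t, np.zeros((2, t.size)))
u_star = -0.5 * sol.sol(t)[1]              # recovered optimal control u*(t)
print(u_star[:3])
```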
If L and F do not depend explicitly on t, then differentiating H = L + pF with respect to t we get

Ḣ = dH/dt = (∂L/∂u) u̇ + (∂L/∂x) ẋ + p ((∂F/∂u) u̇ + (∂F/∂x) ẋ) + ṗ F
  = (∂L/∂u + p ∂F/∂u) u̇ + (∂L/∂x + p ∂F/∂x) ẋ + ṗ F
  = (∂H/∂u) u̇ + (∂H/∂x) ẋ + ṗ F
  = (∂H/∂u) u̇ + (∂H/∂x + ṗ) F.

Along an optimal trajectory ∂H/∂u = 0 and ṗ∗ = −∂H/∂x, so Ḣ = 0 ; that is,

H|_{u=u∗} = constant,  t₀ ≤ t ≤ t₁.
Discussion
We have so far assumed that t₁ is fixed and x(t₁) is free. If this is not necessarily the case, then we obtain

δJₐ = [(∂ϕ/∂x − p) δx + (H + ∂ϕ/∂t)|_{u=u∗} δt]_{t=t₁} + ∫_{t₀}^{t₁} ((∂H/∂x + ṗ) δx + (∂H/∂u) δu) dt.
The expression outside the integral must be zero (by virtue of Proposition
5.2.1), making the integral zero. The implications of this for some important
special cases are now listed. The initial condition x(t0 ) = x0 holds through-
out.
(1) Fixed final time t₁ and fixed final state : here δt|_{t=t₁} = 0 and δx|_{t=t₁} = 0, and the condition is

x∗(t₁) = x_f

(and this replaces p(t₁) = (∂ϕ/∂x)|_{t=t₁} ).
(2) Free final time t₁ and free final state : both δt|_{t=t₁} and δx|_{t=t₁} are now arbitrary, so for the expression

[(∂ϕ/∂x − p) δx + (H + ∂ϕ/∂t)|_{u=u∗} δt]_{t=t₁}

to vanish, both

p(t₁) = (∂ϕ/∂x)|_{t=t₁}  and  (H + ∂ϕ/∂t)|_{u=u∗, t=t₁} = 0

must hold.
In particular, if ϕ ≡ 0 and L and F do not depend explicitly on t, then combining this with the constancy of H established above gives

H|_{u=u∗} = 0,  t₀ ≤ t ≤ t₁.
5.2.4 Example. A particle of unit mass moves along the x-axis subject
to a force u(·). It is required to determine the control which transfers the
particle from rest at the origin to rest at x = 1 in unit time, so as to minimize
the effort involved, measured by
J := ∫₀¹ u² dt.
The equation of motion is ẍ = u; in state space form, with x₁ = x and x₂ = ẋ, this becomes

ẋ₁ = x₂,  ẋ₂ = u.
We have
H = L + pF = p1 x2 + p2 u + u2 .
From

(∂H/∂u)|_{u=u∗} = 0

we get

2u∗ + p∗₂ = 0,

and the adjoint equations are ṗ∗₁ = −∂H/∂x₁ = 0, ṗ∗₂ = −∂H/∂x₂ = −p∗₁.
Integration gives
p∗2 = C1 t + C2
and thus

ẋ∗₂ = −(1/2)(C₁t + C₂),

which on integrating, and using the given conditions x₂(0) = 0 = x₂(1), produces

x∗₂(t) = (1/2) C₂ (t² − t),  C₁ = −2C₂.
Finally, integrating the equation ẋ₁ = x₂ and using x₁(0) = 0, x₁(1) = 1 gives

x∗₁(t) = t²(3 − 2t),  C₂ = −12.
Hence the optimal control is

u∗(t) = −(1/2) p∗₂(t) = 6 − 12t.
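As a quick independent check (a sketch, not part of the original solution), one can verify symbolically that u∗(t) = 6 − 12t indeed transfers the particle from rest at 0 to rest at 1 in unit time:

```python
import sympy as sp

t, s = sp.symbols('t s')
u = 6 - 12 * s                                 # candidate optimal control u*(s)
x2 = sp.integrate(u, (s, 0, t))                # velocity, with x2(0) = 0
x1 = sp.integrate(x2.subs(t, s), (s, 0, t))    # position, with x1(0) = 0

print(sp.expand(x1))                 # 3*t**2 - 2*t**3 = t**2*(3 - 2*t)
print(x1.subs(t, 1), x2.subs(t, 1))  # 1 0 -> rest at x = 1 at time 1
```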
An interesting case is when the final state is required to lie on a surface defined by k relations

g₁(x₁, x₂, . . . , xₘ) = 0
g₂(x₁, x₂, . . . , xₘ) = 0
⋮
g_k(x₁, x₂, . . . , xₘ) = 0.

In this case δx|_{t=t₁} is no longer arbitrary but must be tangent to this surface, and (taking ϕ ≡ 0 ) the transversality condition

p(t₁) = μ₁ (∂g₁/∂x)|_{t=t₁} + · · · + μ_k (∂g_k/∂x)|_{t=t₁}  (5.19)

holds for some constants μ₁, . . . , μ_k.
For example, suppose that the system

ẋ₁ = x₂,  ẋ₂ = −x₂ + u

is to be steered onto the line

ax₁ + bx₂ = c

at time T so as to minimize

∫₀ᵀ u² dt.

The values of a, b, c, and T are given.
From

H = u² + p₁x₂ − p₂x₂ + p₂u

we get

u∗ = −(1/2) p∗₂.  (5.20)
The adjoint equations ṗ∗₁ = −∂H/∂x₁ = 0 and ṗ∗₂ = −∂H/∂x₂ = p∗₂ − p∗₁ then show that

p∗₁ = c₁,  p∗₂ = c₂eᵗ + c₁  (5.21)

must hold.
It is easy to verify that (5.19) produces

p∗₁(T)/p∗₂(T) = a/b  (5.23)
and (5.22) and (5.23) give four equations for the four unknown constants ci .
The optimal control u∗ (·) is then obtained from (5.20) and (5.21).
Note : In some problems the restriction on the total amount of control effort which can be expended to carry out a required task may be expressed in the form

∫_{t₀}^{t₁} L₀(t, x, u) dt = c  (5.24)

where c is a given constant. Such a constraint can be dealt with by introducing an additional state variable x_{m+1} with x_{m+1}(t₀) = 0, so that

ẋ_{m+1} = L₀(t, x, u).

This ODE is simply added to the original one (5.1) together with the condition x_{m+1}(t₁) = c.

Suppose now that the control variables are required to satisfy constraints of the form

|uᵢ(t)| ≤ Kᵢ,  i = 1, 2, . . . , ℓ.

This implies that the set of final states which can be achieved is restricted.
Our aim here is to derive the necessary conditions for optimality corre-
sponding to Theorem 5.2.2 for the unbounded case.
An admissible control is one which satisfies the constraints, and we consider variations δu such that

• u∗ + δu is admissible.

The change in the cost functional

J[u] = ϕ(x(t₁), t₁) + ∫_{t₀}^{t₁} L(t, x, u) dt

is then determined to first order by δJ, and a necessary condition for u∗ to minimize J is

δJ[u∗, δu] ≥ 0.
It can be shown that this implies

H(t, x∗, u∗ + δu, p∗) ≥ H(t, x∗, u∗, p∗)

for all admissible δu and all t in [t₀, t₁]. This states that u∗ minimizes H, so we have "established" the following result (Pontryagin's Minimum Principle, PMP) : for u∗(·) to be optimal it is necessary that it minimize the Hamiltonian over all admissible controls at each instant.
Note : (1) With a slightly different definition of H, the principle becomes one of maximizing H, and is then referred to as Pontryagin's Maximum Principle.
(2) u∗ (·) is now allowed to be piecewise continuous. (A rigorous proof is beyond the
scope of this course.)
(3) Our derivation assumed that t1 was fixed and x(t1 ) free; the boundary con-
ditions for other situations are precisely the same as those given in the preceding
section.
5.3.2 Example. Consider again the “soft landing” problem (cf. Example
5.1.2), where the performance index
J = ∫₀ᵀ (|u| + k) dt
is to be minimized subject to
ẋ1 = x2 , ẋ2 = u.
The Hamiltonian is
H = |u| + k + p1 x2 + p2 u.
Minimizing H with respect to u (subject to |u| ≤ 1 ) gives

u∗ = −1 if p∗₂ > 1,  u∗ = 0 if −1 < p∗₂ < 1,  u∗ = +1 if p∗₂ < −1,

and the adjoint equations ṗ∗₁ = 0, ṗ∗₂ = −p∗₁ yield

p∗₁ = c₁,  p∗₂ = c₂ − c₁t,

where c₁ and c₂ are constants. Since p∗₂ is linear in t, it follows that it can
take each of the values +1 and −1 at most once in [0, T ], so u∗ (·) can switch
at most twice. We must however use physical considerations to determine an
actual optimal control.
Since the landing vehicle begins with a downwards velocity at altitude h, a logical sequence of control would seem to be

u∗ = 0 , followed by u∗ = +1.

Assuming the switch occurs at time t₁, the solution of
ẋ₁ = x₂,  ẋ₂ = u,  x₁(0) = h,  x₂(0) = −v

is

x∗₁(t) = h − vt                      if 0 ≤ t ≤ t₁
x∗₁(t) = h − vt + (1/2)(t − t₁)²     if t₁ ≤ t ≤ T      (5.27)

x∗₂(t) = −v                          if 0 ≤ t ≤ t₁
x∗₂(t) = −v + (t − t₁)               if t₁ ≤ t ≤ T.     (5.28)
The soft landing conditions

x₁(T) = 0,  x₂(T) = 0

determine the switching time t₁. Moreover, since the final time is free and H does not depend explicitly on t, H|_{u=u∗} = 0 throughout; evaluating H at t = 0, where u∗ = 0, gives

k − v p∗₁(0) = 0

or

p∗₁(0) = k/v.

Hence we have

p∗₁(t) = k/v,  t ≥ 0
and

p∗₂(t) = −(kt)/v − 1 + (kt₁)/v
using the assumption that p∗₂(t₁) = −1. Thus the assumed optimal control will be valid if t₁ > 0 and p∗₂(0) < 1 (the latter condition being necessary since u∗ = 0 ), and these conditions imply that

h > (1/2) v²,  k < 2v² / (h − (1/2)v²).  (5.29)
Note : If these inequalities do not hold, then some different control strategy (such as u∗ = −1, then u∗ = 0, then u∗ = +1 ) becomes optimal. For example, if k is increased so that the second inequality in (5.29) is violated, then this means that more emphasis is placed on the time to landing in the performance index. It is therefore reasonable to expect this time would be reduced by first accelerating downwards with u∗ = −1 before coasting with u∗ = 0.
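The switching structure is easy to explore numerically. A sketch with assumed values of h and v (chosen here to satisfy (5.29); they are not from the text):

```python
import numpy as np

h, v = 10.0, 2.0                  # assumed altitude and descent speed

t1 = (h - 0.5 * v**2) / v         # switching time: u* = 0 on [0, t1]
T = t1 + v                        # touchdown time, from x2(T) = 0

assert t1 > 0                     # i.e. h > v^2/2, first inequality in (5.29)

# Evaluate the trajectory (5.27)-(5.28) at touchdown:
x1_T = h - v * T + 0.5 * (T - t1) ** 2
x2_T = -v + (T - t1)
print(t1, T, x1_T, x2_T)          # x1(T) = 0 and x2(T) = 0, as required
```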
We can now discuss a general linear regulator problem in the usual form
ẋ = Ax + Bu (5.30)
where x(·) is the deviation from the desired constant state. The aim is to
transfer the system from some initial state to the origin in minimum time,
subject to
|ui (t)| ≤ Ki , i = 1, 2, . . . , `.
The Hamiltonian is

H = 1 + p(Ax + Bu)
  = 1 + pAx + [pb₁ pb₂ . . . pb_ℓ] u
  = 1 + pAx + Σ_{i=1}^{ℓ} (p bᵢ) uᵢ

where b₁, b₂, . . . , b_ℓ denote the columns of B. By (PMP), H is minimized with respect to uᵢ by taking

u∗ᵢ(t) = −Kᵢ sgn(sᵢ(t))

where

sᵢ(t) := p∗(t) bᵢ  (5.31)
is the switching function for the ith variable. The adjoint equation is
ṗ∗ = −(∂/∂x)(p∗Ax)

or

ṗ∗ = −p∗A.
If si (t) ≡ 0 in some time interval, then u∗i (t) is indeterminate in this interval.
We now therefore investigate whether the expression in (5.31) can vanish.
Firstly, we can assume that bᵢ ≠ 0. Next, since the final time is free, the condition H|_{u=u∗} = 0 holds, which gives (for all t )

1 + p∗(Ax∗ + Bu∗) = 0
so clearly p∗(t) cannot be zero for any value of t. Finally, if sᵢ(t) = p∗(t)bᵢ vanishes identically on some interval, then so do all its derivatives, and using ṗ∗ = −p∗A this implies that

p∗ [bᵢ  Abᵢ  A²bᵢ  · · ·  A^{m−1}bᵢ] = 0.  (5.32)
If the system (5.30) is c.c. by the ith input acting alone (i.e. uj ≡ 0, j 6=
i ), then by Theorem 3.1.3 the matrix in (5.32) is nonsingular, and equation
(5.32) then has only the trivial solution p∗ = 0. However, we have already
ruled out this possibility, so si cannot be zero. Thus provided the controlla-
bility condition holds, there is no time interval in which u∗i is indeterminate.
The optimal control for the ith variable then has the bang-bang form
u∗i = ± Ki .
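The switching functions are easy to compute explicitly from the adjoint solution. A sketch for a double integrator with assumed adjoint initial data (purely illustrative choices, not from the text):

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0], [0.0, 0.0]])   # double integrator (assumed example)
b = np.array([0.0, 1.0])                 # single input column
p0 = np.array([1.0, -0.5])               # illustrative initial adjoint row
K = 1.0                                  # control bound

for t in np.linspace(0.0, 3.0, 7):
    p = p0 @ expm(-A * t)                # solution of p' = -p A
    s = p @ b                            # switching function (5.31)
    print(f"t={t:.1f}  s={s:+.3f}  u*={-K * np.sign(s):+.1f}")
```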
J := (1/2) xᵀ(t₁) M x(t₁) + (1/2) ∫₀^{t₁} (xᵀQ(t)x + uᵀR(t)u) dt  (5.34)

with R(t) positive definite and M and Q(t) positive semi-definite symmetric matrices for t ≥ 0 (the factors 1/2 enter only for convenience).
Note : The quadratic term in u in (5.34) ensures that the total amount of control
effort is restricted, so that the control variables can be assumed unbounded.
The Hamiltonian is

H = (1/2) xᵀQx + (1/2) uᵀRu + p(Ax + Bu)

and the necessary condition (5.17) for optimality gives

(∂/∂u)[(1/2)(u∗)ᵀRu∗ + p∗Bu∗] = (Ru∗)ᵀ + p∗B = 0
so that

u∗ = −R⁻¹Bᵀ(p∗)ᵀ  (5.35)

and combining this equation with (5.36) produces the system of 2m linear ODEs

d/dt [ x∗ ; (p∗)ᵀ ] = [ A(t)  −B(t)R⁻¹(t)Bᵀ(t) ; −Q(t)  −Aᵀ(t) ] [ x∗ ; (p∗)ᵀ ].  (5.37)
Since x(t₁) is not specified, the boundary condition is

(p∗(t₁))ᵀ = M x∗(t₁).  (5.38)
The form of (5.38) suggests seeking a solution of (5.37) with

(p∗(t))ᵀ = P(t) x∗(t),

in which case (5.35) becomes the linear feedback control

u∗(t) = −R⁻¹(t)Bᵀ(t)P(t) x∗(t).  (5.39)

Differentiating, we also get

Ṗx∗ + P ẋ∗ − (ṗ∗)ᵀ = 0

and substituting for ẋ∗, (ṗ∗)ᵀ (from (5.37)) and (p∗)ᵀ produces

(Ṗ + PA − PBR⁻¹BᵀP + Q + AᵀP) x∗(t) = 0.

Since this must hold for arbitrary x∗(t), it follows that P(t) satisfies the Riccati matrix differential equation

Ṗ = PBR⁻¹BᵀP − AᵀP − PA − Q  (5.40)

with boundary condition P(t₁) = M.
Note : (1) Since the matrix M is symmetric, it follows that P(t) is symmetric for all t, so the (matrix) ODE (5.40) represents m(m+1)/2 scalar first order (quadratic) ODEs, which can be integrated numerically.
(2) Even when the matrices A, B, Q, and R are all time-invariant, the solution P(t) of (5.40), and hence the feedback matrix in (5.39), will in general still be time-varying.
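Numerical integration of (5.40) proceeds backwards in time from P(t₁) = M. A minimal sketch with assumed constant matrices (they are not from the text):

```python
import numpy as np
from scipy.integrate import solve_ivp

# Assumed data for illustration.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
M = np.zeros((2, 2))
t1 = 5.0

def riccati(t, p_flat):
    # Matrix Riccati ODE (5.40): P' = P B R^-1 B' P - A'P - P A - Q.
    P = p_flat.reshape(2, 2)
    dP = P @ B @ np.linalg.solve(R, B.T) @ P - A.T @ P - P @ A - Q
    return dP.ravel()

# Integrate backwards from the terminal condition P(t1) = M.
sol = solve_ivp(riccati, (t1, 0.0), M.ravel())
P0 = sol.y[:, -1].reshape(2, 2)
print(P0)    # P(0); the feedback (5.39) is u* = -R^-1 B' P(t) x
```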
However, of particular interest is the case when in addition the final time
t1 tends to infinity. Then there is no need to include the terminal expression
in the performance index since the aim is to make x(t1 ) → 0 as t1 → ∞, so
we set M = 0. Let Q1 be a matrix having the same rank as Q and such that
Q = QT1 Q1 . It can be shown that the solution P (t) of (5.40) does become a
constant matrix P , and we have :
5.4.1 Proposition. If the system

ẋ = Ax + Bu(t)

is c.c. and the pair (A, Q₁) is c.o., then the control which minimizes

∫₀^∞ (xᵀQx + uᵀRu) dt  (5.41)
is given by

u∗(t) = −R⁻¹BᵀP x(t)  (5.42)

where P is the unique positive definite symmetric matrix which satisfies the so-called algebraic Riccati equation

PBR⁻¹BᵀP − AᵀP − PA − Q = 0.  (5.43)
Note : Equation (5.43) represents m(m+1)/2 quadratic algebraic equations for the unknown elements (entries) of P, so the solution will not in general be unique. However, it can be shown that if a positive definite solution of (5.43) exists, then there is only one such solution.
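In practice (5.43) is solved numerically. A sketch with assumed data, using scipy's continuous-time algebraic Riccati solver:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Assumed data for illustration.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

P = solve_continuous_are(A, B, Q, R)   # positive definite solution of (5.43)
K = np.linalg.solve(R, B.T @ P)        # optimal gain: u* = -K x, as in (5.42)
print(P)
print(np.linalg.eigvals(A - B @ K))    # closed loop eigenvalues: stable
```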
Interpretation

If y := Q₁x is regarded as an output vector, then

yᵀy = xᵀQ₁ᵀQ₁x = xᵀQx,

so the state term in (5.41) measures the magnitude of this output.
The closed loop system obtained by applying the optimal control (5.42) is

ẋ = (A − BR⁻¹BᵀP)x  (5.44)

and it is straightforward to verify that

(A − BR⁻¹BᵀP)ᵀP + P(A − BR⁻¹BᵀP) = −(PBR⁻¹BᵀP + Q)  (5.45)

using the fact that P is a solution of (5.43). Since R⁻¹ is positive definite and Q is positive semi-definite, the matrix on the RHS in (5.45) is negative semi-definite, so Proposition 4.3.10 is not directly applicable, unless Q is actually positive definite.
It can be shown that if the triplet (A, B, Q₁) is not necessarily c.c. and c.o. but merely stabilizable and detectable, then the algebraic Riccati equation (5.43) has a unique solution, and the closed loop system (5.44) is asymptotically stable.
Note : Thus a solution of the algebraic Riccati equation leads to a stabilizing linear
feedback control (5.42) irrespective of whether or not the open loop system is stable.
(This provides an alternative to the methods of section 3.3.)
If x∗ (·) is the solution of the closed loop system (5.44), then (as in (5.10))
equation (5.45) implies
d/dt ((x∗)ᵀP x∗) = −(x∗)ᵀ (PBR⁻¹BᵀP + Q) x∗
                = −(u∗)ᵀRu∗ − (x∗)ᵀQx∗.

Since the closed loop matrix A − BR⁻¹BᵀP is a stability matrix, x∗(t) → 0 as t → ∞, so we can integrate both sides of this equality with respect to t (from 0 to ∞ ) to obtain the minimum value of (5.41) :

∫₀^∞ ((x∗)ᵀQx∗ + (u∗)ᵀRu∗) dt = x₀ᵀ P x₀.  (5.46)
In the uncontrolled case (when B = 0 ), equations (5.43) and (5.46) reduce to

AᵀP + PA = −Q

and

J₀ = ∫₀^∞ xᵀQx dt = x₀ᵀ P x₀

respectively, in agreement with (5.10) and (5.11).
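This special case can also be checked numerically; a sketch with an assumed stability matrix A, weight Q, and initial state:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

A = np.array([[-1.0, 1.0], [0.0, -2.0]])   # assumed stability matrix
Q = np.eye(2)
x0 = np.array([1.0, 1.0])

P = solve_continuous_lyapunov(A.T, -Q)     # solves A'P + P A = -Q
print(x0 @ P @ x0)                         # J0 = x0' P x0
```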
5.5 Exercises
Exercise 93 A system is described by

ẋ = −2x + u.

Show that the control which transfers the system from x(0) = 1 to x(1) = 0 and minimizes ∫₀¹ u² dt is

u∗(t) = − 4e²ᵗ / (e⁴ − 1).
Exercise 94 A system is described by

d³z/dt³ = u,

where z(·) denotes displacement. Starting from some given initial position with given velocity and acceleration, it is required to choose u(·), which is constrained by |u(t)| ≤ k, so as to make displacement, velocity, and acceleration equal to zero in the least possible time. Show using (PMP) that the optimal control consists of

u∗ = ± k

with at most two switchings.
Exercise 95 A system is described by

z̈ + aż + bz = u,

where a > 0 and a² < 4b. The control variable is subject to |u(t)| ≤ k and is to be chosen so that the system reaches the state z(T) = 0, ż(T) = 0 in minimum possible time. Show that the optimal control is of bang-bang type, u∗ = ± k.
Exercise 96 A system is described by

ẋ = −2x + 2u,  x ∈ R.

(a) If u(·) is chosen to minimize a given performance index over a fixed interval [0, T], show that the optimal control has the form

u∗(t) = c₁ eᵗ sinh(t√2 + c₂)

where c₁ and c₂ are certain constants. (DO NOT try to determine their values.)
(b) If u(·) is such that |u(t)| ≤ k, where k is a constant, and the system is
to be brought to the origin in the shortest possible time, show that the
optimal control is bang-bang, with at most one switch.
determine the control which transfers it from x(0) = 0 to the line L with equation
x1 + 5x2 = 15
Exercise 99 Use Proposition 5.4.1 to find the feedback control which minimizes

∫₀^∞ (x₂² + (1/10) u²) dt

subject to

ẋ₁ = −x₁ + u,  ẋ₂ = x₁.
Exercise 100

(a) Use the Riccati equation formulation to determine the feedback control for the system

ẋ = −x + u,  x ∈ R

which minimizes

J = (1/2) ∫₀¹ (3x² + u²) dt.

[Hint : In the Riccati equation for the problem put P(t) = − ẇ(t)/w(t). ]
(b) If the system is to be transferred to the origin from an arbitrary initial
state with the same performance index, use the calculus of variations to
determine the optimal control.