5 Optimal Control
One of the essential elements of the control problem, discussed in Section 1.1, is a means
of testing the performance of any proposed control law. Whenever we use the term “best” or
“optimal” to describe the effectiveness of a given control strategy, we do so with respect to
some numerical index of performance called the performance index. We assume that the value
of the performance index decreases as the quality of the given admissible control law increases.
The admissible controller that ensures the completion of the system objective and at the same
time minimizes the performance index is called an optimal controller for the system. In this
chapter, we discuss various methods of constructing optimal controllers.
5.1 Performance Indices
Suppose that the objective is to control a dynamical system on a fixed interval [t0, tf] so that the components of the state vector are "small." A suitable performance index to be minimized would be
$$J_1 = \int_{t_0}^{t_f} x^T(t)\,x(t)\,dt. \qquad (5.3)$$
Obviously, if J1 is small, then the state vector norm ‖x(t)‖ is small in the sense of the above
performance index.
If the objective is to control the system so that the components of the output, y(t) = Cx(t), are to be small, then we could use the performance index
$$J_2 = \int_{t_0}^{t_f} y^T(t)\,y(t)\,dt = \int_{t_0}^{t_f} x^T(t)\,C^T C\,x(t)\,dt = \int_{t_0}^{t_f} x^T(t)\,Q\,x(t)\,dt, \qquad (5.4)$$
where Q = CᵀC.
Similarly, to keep the control effort small, we could use the performance index
$$J_3 = \int_{t_0}^{t_f} u^T(t)\,u(t)\,dt, \qquad (5.5)$$
or
$$J_4 = \int_{t_0}^{t_f} u^T(t)\,R\,u(t)\,dt, \qquad (5.6)$$
where the weight matrix R is symmetric positive definite. There is no loss of generality in assuming the weight matrix R to be symmetric. If R were not symmetric, we could represent the quadratic term uᵀRu equivalently as
$$u^T R\,u = u^T\left(\frac{R + R^T}{2}\right)u,$$
where the matrix ½(R + Rᵀ) is symmetric.
As pointed out by Owens [218, p. 186], we cannot simultaneously minimize the performance
indices (5.3) and (5.5) because minimization of (5.3) requires large control signals, while min-
imization of (5.5) requires small control signals. To solve the dilemma, we could compromise
between the two conflicting objectives by minimizing the performance index that is a convex
combination of J1 and J3 ,
$$J = \lambda J_1 + (1-\lambda)J_3 = \int_{t_0}^{t_f}\bigl(\lambda\,x^T(t)x(t) + (1-\lambda)\,u^T(t)u(t)\bigr)\,dt, \qquad (5.7)$$
where λ is a parameter in the range [0, 1]. If λ = 1, then J = J1 ; and if λ = 0, then J = J3 . By trial
and error, we could select λ from the interval [0, 1] to compromise between the two extremes.
A generalization of the performance index (5.7) is
$$J = \int_{t_0}^{t_f}\bigl(x^T(t)\,Q\,x(t) + u^T(t)\,R\,u(t)\bigr)\,dt.$$
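A quadratic performance index of this form can be evaluated numerically along a simulated closed-loop trajectory. The following minimal Python sketch (the plant matrices, gain, weights, and horizon are illustrative choices, not taken from the text) approximates J with a forward-Euler simulation and a rectangle rule.

```python
# Sketch: numerically evaluate J = ∫ (xᵀQx + uᵀRu) dt along a simulated trajectory.
import numpy as np

A = np.array([[0.0, 1.0], [-2.0, -3.0]])   # illustrative plant
B = np.array([[0.0], [1.0]])
Q = np.eye(2)                               # state weight (symmetric PSD)
R = np.array([[1.0]])                       # control weight (symmetric PD)
K = np.array([[1.0, 1.0]])                  # some stabilizing feedback gain

dt, tf = 1e-3, 10.0
x = np.array([[1.0], [0.0]])
J = 0.0
for _ in range(int(tf / dt)):
    u = -K @ x
    J += float(x.T @ Q @ x + u.T @ R @ u) * dt   # rectangle rule
    x = x + (A @ x + B @ u) * dt                 # forward Euler step
print("approximate J =", J)
```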
In certain applications, we may wish the final state x(tf) to be as close as possible to 0. Then, a possible performance measure to be minimized is
$$x^T(t_f)\,F\,x(t_f), \qquad (5.8)$$
where F is a symmetric positive semidefinite weight matrix. More generally, if the state is to stay close to a desired trajectory xd(t), we can combine a terminal term with an integral term and minimize
$$J = \tfrac{1}{2}\bigl(x(t_f)-x_d(t_f)\bigr)^T F\bigl(x(t_f)-x_d(t_f)\bigr) + \tfrac{1}{2}\int_{t_0}^{t_f}\Bigl(\bigl(x(t)-x_d(t)\bigr)^T Q\bigl(x(t)-x_d(t)\bigr) + u^T(t)\,R\,u(t)\Bigr)\,dt. \qquad (5.10)$$
5.2 A Glimpse at the Calculus of Variations
◆ Example 5.1
The length l of the arc connecting two given points A(x0, y0) and B(x1, y1) in the x–y plane, as depicted in Figure 5.1, is an example of a functional. We know that
$$l = l(y(x)) = \int_{x_0}^{x_1}\sqrt{1 + (dy(x)/dx)^2}\,dx.$$
Figure 5.1 An arc y = y(x) connecting the points A(x0, y0) and B(x1, y1).
◆ Example 5.2
Another example of a functional is the area of the surface of revolution formed by
a surface connecting two circular disks as illustrated in Figure 5.2. In particular, the
surface of revolution can be formed by a soap film clinging to two circular disks.
When a wire circle is dipped into a soap solution and withdrawn, a circular disk
of soap film bounded by the circle is formed. If a second smaller circle is made to
touch the disk and then moved away, the two circles will be joined by a surface
of film that is a surface of revolution in the particular case when the circles are
parallel and have their centers on the same axis perpendicular to their planes. If
we denote the arc of intersection with the x–y plane by y = y( x), then the area of
the surface of revolution is
$$v = v(y(x)) = \int_{x_0}^{x_1} 2\pi\,y(x)\sqrt{1 + (dy(x)/dx)^2}\,dx,$$
subject to the constraints y( x0 ) = y0 , y( x1 ) = y1 . One may then want to minimize
v = v( y( x)) to find y = y( x) resulting in the surface of revolution of minimum area.
Figure 5.2 A surface of revolution generated by a curve y = y(x) joining A(x0, y0) and B(x1, y1).
Figure 5.3 Illustration of the brachistochrone problem: a particle of mass m slides under gravity from A(0, 0) to B(x1, y1) along a curve (x(t), y(t)).
Figure 5.4 Illustration of the geodesics problem: the shortest line connecting two points A and B on a given surface.
The problem of optimization of functionals dates back to Johann Bernoulli. In 1696 he published
a letter in which he posed the problem, which is now known as the brachistochrone problem.
Brachistochrone translates from Greek as the shortest time. The brachistochrone problem can be
described as follows. A particle of mass m is to slide down along a frictionless track connecting
two given points. The motion of the particle is to be constrained only by gravity and the shape
of the curve of the track. Find the curve of the track that minimizes the time of travel between the
two given points. We illustrate the problem in Figure 5.3. The brachistochrone problem was
solved by Johann Bernoulli himself, his brother Jacob, Isaac Newton, the Marquis de L’Hôpital,
Gottfried Leibniz, and Tschirnhaus. A solution to the brachistochrone problem is presented in
Example 5.23. In 1697 Johann Bernoulli posed and solved the geodesics problem, which can be
formulated as follows: Find a line of shortest length that connects two points on a given surface;
see Figure 5.4 for an illustration of the problem. The curves of the least length connecting pairs
of points on a surface are called geodesics. A general procedure for solving the above type
of problems, using the calculus of variations, was given by Leonhard Euler (who flourished
between 1707 and 1783).
To acquaint ourselves with the methods of the calculus of variations, we first need to introduce
concepts concerning functionals.
Recall that a function of one variable is a mapping that assigns to each real number a unique real number. An increment of the argument t of a function of one variable is Δt = t − t1.
Similarly, a functional is a mapping that assigns to each function, from some class of func-
tions, a unique number. Thus, we can say that a functional is a “function of a function.” Let
x : R → R be an argument of a functional. By a variation δx of an argument x of a functional
v we mean the difference of two functions
δx = x − x1 .
We assume that x can change in an arbitrary way in some class of functions.
Recall that a function x : R → R is continuous if a small change of its argument t corresponds
to a small change of the value x(t) of the function. Similarly, a functional v is said to be continuous
if a “small” change of its argument x corresponds to a small change of the value of the functional.
A functional v is called linear if
v(ax1 + x2 ) = av(x1 ) + v(x2 ),
where a is a constant.
We will now introduce the notion of the variation of a functional. This notion is analogous to
the notion of a function differential. To connect the two, we first consider the case of a function
of one variable. Let x be a differentiable function defined on an open interval U , and let t ∈ U .
The derivative of x at t is defined as
$$x'(t) = \lim_{\Delta t\to 0}\frac{x(t+\Delta t) - x(t)}{\Delta t}. \qquad (5.11)$$
Let
$$\varphi(\Delta t) = \frac{x(t+\Delta t) - x(t)}{\Delta t} - x'(t). \qquad (5.12)$$
The function φ is not defined at Δt = 0; however,
$$\lim_{\Delta t\to 0}\varphi(\Delta t) = 0.$$
We rewrite (5.12) as
$$x(t+\Delta t) - x(t) = x'(t)\,\Delta t + \varphi(\Delta t)\,\Delta t. \qquad (5.13)$$
The above relation has meaning only when Δt ≠ 0. To make it hold at Δt = 0, we define φ(Δt)|Δt=0 = 0. To proceed, let
$$\beta(\Delta t) = \begin{cases}\varphi(\Delta t) & \text{if } \Delta t > 0,\\ -\varphi(\Delta t) & \text{if } \Delta t < 0.\end{cases} \qquad (5.14)$$
Then, if x is differentiable, there exists a function β such that
$$x(t+\Delta t) - x(t) = \Delta x = x'(t)\,\Delta t + \beta(\Delta t)\,|\Delta t| = L(t,\Delta t) + \beta(\Delta t)\,|\Delta t|, \qquad (5.15)$$
where limΔt→0 β(Δt) = 0 and L(t, Δt) = x'(t)Δt is a linear function of Δt.
For a real-valued function f = f(x): Rⁿ → R, relation (5.15) takes the form
$$f(x+\Delta x) - f(x) = \Delta f = Df(x)\,\Delta x + \beta(\Delta x)\,\|\Delta x\| = L(x,\Delta x) + \beta(\Delta x)\,\|\Delta x\|, \qquad (5.16)$$
◆ Example 5.3
We find the variation of the functional
$$v = \int_0^1\bigl(2x^2(t) + x(t)\bigr)\,dt.$$
Computing the increment of v directly,
$$\Delta v = \int_0^1\bigl(2(x+\delta x)^2 + (x+\delta x)\bigr)\,dt - \int_0^1\bigl(2x^2 + x\bigr)\,dt = \int_0^1(4x+1)\,\delta x\,dt + \int_0^1 2(\delta x)^2\,dt.$$
The part that is linear in δx is the variation,
$$\delta v = \int_0^1(4x+1)\,\delta x\,dt.$$
To proceed, note that the linear part on the right-hand side of (5.16) can be computed as
$$Df(x)\,\Delta x = L(x,\Delta x) = \left.\frac{d}{d\alpha}f(x+\alpha\,\Delta x)\right|_{\alpha=0}. \qquad (5.17)$$
As mentioned before, the above also holds for operators over Banach spaces other than Euclidean spaces. Using this observation, we can state and prove a lemma that will allow us to calculate the variation of a given functional more efficiently than the method we used in Example 5.3.
Lemma 5.1 If the variation of a functional v exists, then
$$\delta v = \left.\frac{d}{d\alpha}\,v(x+\alpha\,\delta x)\right|_{\alpha=0}.$$
Proof Suppose that for a given functional v there exists its variation. This means that we can represent Δv as
$$\Delta v = v(x+\alpha\,\delta x) - v(x) = L(x,\alpha\,\delta x) + \beta(\alpha\,\delta x)\,|\alpha|\,\|\delta x\|.$$
Then, the derivative of v(x + αδx) with respect to α evaluated at α = 0 is equal to
$$\lim_{\alpha\to 0}\frac{\Delta v}{\alpha} = \lim_{\alpha\to 0}\frac{L(x,\alpha\,\delta x) + \beta(\alpha\,\delta x)\,|\alpha|\,\|\delta x\|}{\alpha} = \lim_{\alpha\to 0}\frac{L(x,\alpha\,\delta x)}{\alpha} + \lim_{\alpha\to 0}\frac{\beta(\alpha\,\delta x)\,|\alpha|\,\|\delta x\|}{\alpha} = L(x,\delta x),$$
because L(·, ·) is linear with respect to the second argument, and hence
$$L(x,\alpha\,\delta x) = \alpha\,L(x,\delta x).$$
Furthermore,
$$\lim_{\alpha\to 0}\frac{\beta(\alpha\,\delta x)\,|\alpha|\,\|\delta x\|}{\alpha} = \lim_{\alpha\to 0}\beta(\alpha\,\delta x)\,\|\delta x\| = 0.$$
◆ Example 5.4
We will now use Lemma 5.1 to find the variation of the functional considered in Example 5.3. We have
$$\delta v = \left.\frac{d}{d\alpha}\,v(x+\alpha\,\delta x)\right|_{\alpha=0} = \left.\frac{d}{d\alpha}\int_0^1\bigl(2(x+\alpha\,\delta x)^2 + (x+\alpha\,\delta x)\bigr)\,dt\right|_{\alpha=0} = \left.\int_0^1\bigl(4(x+\alpha\,\delta x)\,\delta x + \delta x\bigr)\,dt\right|_{\alpha=0} = \int_0^1(4x+1)\,\delta x\,dt,$$
which agrees with the variation computed in Example 5.3.
◆ Example 5.5
We will find the variation of the functional
$$v(x_1, x_2) = v(x_1(t), x_2(t)) = \int_0^1 x_1 e^{-x_2}\,dt.$$
We have
$$\delta v = \left.\frac{d}{d\alpha}\int_0^1(x_1+\alpha\,\delta x_1)\,e^{-(x_2+\alpha\,\delta x_2)}\,dt\right|_{\alpha=0} = \left.\int_0^1\frac{d}{d\alpha}\Bigl((x_1+\alpha\,\delta x_1)\,e^{-(x_2+\alpha\,\delta x_2)}\Bigr)\,dt\right|_{\alpha=0} = \left.\int_0^1\Bigl(\delta x_1\,e^{-(x_2+\alpha\,\delta x_2)} - \delta x_2\,e^{-(x_2+\alpha\,\delta x_2)}(x_1+\alpha\,\delta x_1)\Bigr)\,dt\right|_{\alpha=0}.$$
Thus,
$$\delta v = \int_0^1\bigl(e^{-x_2}\,\delta x_1 - x_1 e^{-x_2}\,\delta x_2\bigr)\,dt.$$
Definition 5.1 A functional v is maximal on some function x0 if the value of the functional on any function x close to x0 is less than or equal to that on x0, that is,
$$\Delta v = v(x) - v(x_0) \le 0.$$
We will now give a first-order necessary condition for a given functional to achieve a maximum or minimum on a curve.
Theorem 5.1 If for a functional v there exists a variation and the functional reaches a maximum or a minimum on a curve x0, then
$$\delta v(x_0) = 0.$$
Proof Suppose that x0 and δx are fixed. Then
$$v(x_0 + \alpha\,\delta x) = \Phi(\alpha);$$
that is, v(x0 + αδx) is a function of α that for α = 0 achieves its maximum or minimum. This implies that dΦ/dα|α=0 = 0, or, equivalently,
$$\left.\frac{d}{d\alpha}\,v(x_0 + \alpha\,\delta x)\right|_{\alpha=0} = 0.$$
The above results will be used in the next section to determine conditions that a curve has to
satisfy to be a maximizer or a minimizer, that is, to be an extremal for a given functional.
Problem 1 Suppose that we are given two points (t0, x0) and (t1, x1) in the (t, x)-plane. We wish to find a curve, or trajectory, joining the given points such that the functional
$$v(x) = \int_{t_0}^{t_1} F(t, x, \dot{x})\,dt \qquad (5.18)$$
along this trajectory can achieve its extremal—that is, maximal or minimal—value. The problem is illustrated in Figure 5.5. In solving Problem 1, we use the following lemma.
Figure 5.5 Illustration of Problem 1: an optimal trajectory x = x(t) joining (t0, x0) and (t1, x1), together with a comparison curve x* = x*(t).
Lemma 5.2 Suppose that the function φ is continuous on the interval [t0, t1]. Then
$$\int_{t_0}^{t_1}\varphi(t)\,\delta x(t)\,dt = 0$$
for every variation δx if and only if φ(t) = 0 for all t ∈ [t0, t1].
Proof Sufficiency is obvious because we can satisfy ∫_{t0}^{t1} φ(t)δx(t) dt = 0 by choosing φ that vanishes at every point of the interval [t0, t1].
We prove necessity by contraposition. We thus prove that if φ is nonzero on some subinterval of [t0, t1], then ∫_{t0}^{t1} φ(t)δx(t) dt ≠ 0 for some variation δx. Specifically, assume, without loss of generality, that φ is positive on the interval [a, b], where t0 ≤ a ≤ b ≤ t1. Note that if φ were nonzero at only one point, then because it is continuous, it would have to be nonzero in some neighborhood of that point. Hence, we can assume that a < b. Consider now a variation given by
$$\delta x = \delta x(t) = \begin{cases} k^2(t-a)^2(t-b)^2 & \text{for } a \le t \le b,\\ 0 & \text{elsewhere.}\end{cases} \qquad (5.19)$$
If we assumed φ to be negative on the interval [a, b], then we would use δx equal to the negative of (5.19). We illustrate δx in Figure 5.6. Because δx is positive in [a, b] and zero elsewhere in the interval [t0, t1], the integrand of ∫_{t0}^{t1} φ(t)δx(t) dt has the same sign as φ in [a, b] and is zero outside this subinterval. Thus
$$\int_{t_0}^{t_1}\varphi(t)\,\delta x(t)\,dt = \int_a^b\varphi(t)\,\delta x(t)\,dt \ne 0.$$
Solution to Problem 1 We assume that there is an optimal trajectory x = x(t) joining the points (t0, x0) and (t1, x1) such that δv(x(t)) = 0. Let us then consider an arbitrary admissible curve x* = x*(t) that is close to x and belongs to the family of curves
$$x(t,\alpha) = x(t) + \alpha\bigl(x^*(t) - x(t)\bigr) = x(t) + \alpha\,\delta x(t).$$
Note that for α = 0 we get x(t), and for α = 1 we obtain x*(t). Furthermore,
$$\frac{d}{dt}(\delta x(t)) = \frac{d}{dt}\bigl(x^*(t) - x(t)\bigr) = \delta x',$$
and analogously
$$\frac{d^2}{dt^2}(\delta x(t)) = \frac{d^2}{dt^2}\bigl(x^*(t) - x(t)\bigr) = \delta x''.$$
We are now ready to investigate the behavior of the functional v given by (5.18) on the curves from the family x(t, α). Note that the functional v can be considered as a function of α, that is, v(x(t, α)) = Φ(α). It follows from the first-order necessary condition for a point to be an extremizer of a function of one variable that α = 0 is a candidate to be an extremizer of Φ if
$$\left.\frac{d\Phi(\alpha)}{d\alpha}\right|_{\alpha=0} = 0,$$
where
$$\Phi(\alpha) = \int_{t_0}^{t_1} F\bigl(t, x(t,\alpha), x'(t,\alpha)\bigr)\,dt,$$
and x'(t, α) = (d/dt)x(t, α). We now evaluate dΦ(α)/dα:
$$\frac{d\Phi(\alpha)}{d\alpha} = \int_{t_0}^{t_1}\left(F_x\,\frac{d}{d\alpha}x(t,\alpha) + F_{x'}\,\frac{d}{d\alpha}x'(t,\alpha)\right)dt,$$
where Fx = ∂F(t, x(t,α), x'(t,α))/∂x and Fx' = ∂F(t, x(t,α), x'(t,α))/∂x'. Because
$$\frac{d}{d\alpha}x(t,\alpha) = \frac{d}{d\alpha}\bigl(x(t)+\alpha\,\delta x(t)\bigr) = \delta x(t)$$
and
$$\frac{d}{d\alpha}x'(t,\alpha) = \frac{d}{d\alpha}\bigl(x'(t)+\alpha\,\delta x'(t)\bigr) = \delta x'(t),$$
we can write
$$\frac{d\Phi}{d\alpha}(\alpha) = \int_{t_0}^{t_1}\Bigl(F_x\bigl(t, x(t,\alpha), x'(t,\alpha)\bigr)\,\delta x(t) + F_{x'}\bigl(t, x(t,\alpha), x'(t,\alpha)\bigr)\,\delta x'(t)\Bigr)\,dt.$$
By Lemma 5.1,
$$\frac{d\Phi}{d\alpha}(0) = \delta v.$$
Therefore,
$$\delta v = \int_{t_0}^{t_1}\bigl(F_x\,\delta x + F_{x'}\,\delta x'\bigr)\,dt,$$
where for convenience we dropped the argument t. Integrating the last term by parts, along with taking into account that δx' = (δx)', yields
$$\delta v = \bigl(F_{x'}\,\delta x\bigr)\Big|_{t_0}^{t_1} + \int_{t_0}^{t_1}\left(F_x - \frac{d}{dt}F_{x'}\right)\delta x\,dt.$$
The end points (t0, x0) and (t1, x1) are fixed. This implies that
$$\delta x|_{t=t_0} = x^*(t_0) - x(t_0) = 0$$
and
$$\delta x|_{t=t_1} = x^*(t_1) - x(t_1) = 0.$$
Hence
$$\delta v = \int_{t_0}^{t_1}\left(F_x - \frac{d}{dt}F_{x'}\right)\delta x\,dt.$$
By Lemma 5.2,
$$\delta v = 0 \quad\text{if and only if}\quad F_x - \frac{d}{dt}F_{x'} = 0.$$
The equation Fx − (d/dt)Fx' = 0 can also be represented as
$$F_x - F_{x't} - F_{x'x}\,x' - F_{x'x'}\,x'' = 0.$$
The equation
$$F_x - \frac{d}{dt}F_{x'} = 0 \qquad (5.20)$$
is known as the Euler–Lagrange equation. The curves x = x(t, C1 , C2 ) that are solutions to
the Euler–Lagrange equation are called extremals. The constants C1 and C2 are the integration
constants. Thus, a functional can only achieve its extremum on the extremals.
◆ Example 5.6
We find the extremal for the functional
$$v = \int_0^{\pi/2}\bigl((x')^2 - x^2\bigr)\,dt, \qquad (5.21)$$
where x(0) = 0 and x(π/2) = 1. The Euler–Lagrange equation for the problem takes the form
$$F_x - \frac{d}{dt}F_{x'} = \frac{\partial}{\partial x}\bigl((x')^2 - x^2\bigr) - \frac{d}{dt}\frac{\partial}{\partial x'}\bigl((x')^2 - x^2\bigr) = -2x - \frac{d}{dt}(2x') = -2x - 2x'' = 0.$$
Equivalently, the Euler–Lagrange equation for the functional (5.21) can be represented as
$$x'' + x = 0.$$
The characteristic equation of the above differential equation is s² + 1 = 0. Therefore, the general solution of the Euler–Lagrange equation has the form
$$x(t) = C_1\cos t + C_2\sin t.$$
Taking into account the end-point conditions gives
$$x(0) = C_1 = 0, \qquad x(\pi/2) = C_2 = 1.$$
Thus, we have one extremal,
$$x(t) = \sin t.$$
The functional (5.21) is minimized or maximized only on the extremal x(t) = sin t.
◆ Example 5.7
We now find the extremal for the functional
$$v = \int_0^1\bigl((x')^2 + 12xt\bigr)\,dt, \qquad (5.22)$$
where x(0) = 0 and x(1) = 1. The Euler–Lagrange equation for this functional is
$$F_x - \frac{d}{dt}F_{x'} = \frac{\partial}{\partial x}\bigl((x')^2 + 12xt\bigr) - \frac{d}{dt}\frac{\partial}{\partial x'}\bigl((x')^2 + 12xt\bigr) = 12t - \frac{d}{dt}(2x') = 12t - 2x'' = 0.$$
Equivalently, the Euler–Lagrange equation for the functional (5.22) can be represented as
$$x'' = 6t.$$
The general solution to the above differential equation is
$$x(t) = C_1 t + C_2 + t^3.$$
From the end-point conditions, we get
$$x(0) = C_2 = 0, \qquad x(1) = C_1 + 1 = 1.$$
Hence, C1 = 0, and the extremal has the form
$$x(t) = t^3.$$
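The Euler–Lagrange computations of Examples 5.6 and 5.7 can be checked symbolically. The sketch below (Python, assuming sympy is available) forms Fx − (d/dt)Fx' for each integrand and verifies that the candidate extremals satisfy the equation.

```python
# Sketch: verify F_x - d/dt F_x' = 0 on the extremals of Examples 5.6 and 5.7.
import sympy as sp

t, X, Xd = sp.symbols('t X Xd')   # X ~ x(t), Xd ~ x'(t), treated as independent symbols
x = sp.Function('x')

def euler_lagrange(F):
    """Return F_x - d/dt F_x' with X, Xd replaced by the function x(t) and its derivative."""
    Fx, Fxd = sp.diff(F, X), sp.diff(F, Xd)
    subs = {X: x(t), Xd: sp.diff(x(t), t)}
    return sp.simplify(Fx.subs(subs) - sp.diff(Fxd.subs(subs), t))

F1 = Xd**2 - X**2                 # Example 5.6, extremal x(t) = sin t
print(sp.simplify(euler_lagrange(F1).subs(x(t), sp.sin(t)).doit()))   # expect 0

F2 = Xd**2 + 12*X*t               # Example 5.7, extremal x(t) = t^3
print(sp.simplify(euler_lagrange(F2).subs(x(t), t**3).doit()))        # expect 0
```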
Suppose now that A = A(t0, x10, ..., xn0) and B = B(t1, x11, ..., xn1) are two given points in the (n + 1)-dimensional space (t, x1, ..., xn), along with the functional
$$v = v(x_1,\ldots,x_n) = \int_{t_0}^{t_1} F(t, x_1,\ldots,x_n, \dot{x}_1,\ldots,\dot{x}_n)\,dt.$$
To find a trajectory along which δv = 0, we consider the variation in only one coordinate, say xi, while the other coordinates are fixed. We perform this process for each coordinate. The functional v then depends only upon xi, and we write
$$v(x_1,\ldots,x_i,\ldots,x_n) = \tilde{v}(x_i).$$
Applying the result of Problem 1 to each ṽ(xi) yields the system of Euler–Lagrange equations
$$F_{x_i} - \frac{d}{dt}F_{\dot{x}_i} = 0, \qquad i = 1,\ldots,n. \qquad (5.23)$$
In particular, if F is the Lagrangian L of a mechanical system, then (5.23) takes the form
$$L_{x_i} - \frac{d}{dt}L_{\dot{x}_i} = 0, \qquad i = 1,\ldots,n,$$
which are the Lagrange equations of motion for a particle. Thus a particle moves along a path
on which the functional
$$J = \int_{t_1}^{t_2} L\,dt$$
is extremized, which agrees with Hamilton's principle, which says that a particle moves along the path that minimizes the integral J.
In our discussion up to now, both end points were fixed. We now would like to solve a more
general optimization problem, where the end points need not be fixed. For simplicity, we assume
first that one of the end points is fixed, while the other is free. After we solve this problem, we
can easily generalize to the case when both end points are free.
Problem 2 In the (t, x)-plane the left end point A is fixed, while the right end point B is free (see Figure 5.7). It is required to find a trajectory x = x(t) starting at A on which the functional
$$v = \int_{t_0}^{t_1} F(t, x, x')\,dt$$
can achieve its extremal value.
Solution to Problem 2 Following Elsgolc [75, pp. 64–72], we begin by assuming that a solution
to Problem 2 exists; that is, there is a trajectory x = x(t) joining (t0 , x0 ) and (t1 , x1 ) such
that δv = 0. This trajectory must satisfy the Euler–Lagrange equation. Consider an adjoining
curve x = x(t) + δx(t) whose right end point moves to (t1 + δt1, x1 + δx1), as illustrated in Figure 5.7. The corresponding increment of the functional is
$$\Delta v = \int_{t_1}^{t_1+\delta t_1} F(t, x+\delta x, x'+\delta x')\,dt + \int_{t_0}^{t_1}\bigl(F(t, x+\delta x, x'+\delta x') - F(t, x, x')\bigr)\,dt.$$
Figure 5.7 Illustration of Problem 2: trajectories starting at the fixed point A with a free right end point.
Our goal now is to extract from Δv its linear portion in δx, which we know is the variation of the given functional. We begin by approximating the first integral as
$$\int_{t_1}^{t_1+\delta t_1} F(t, x+\delta x, x'+\delta x')\,dt \approx F(t, x, x')\big|_{t=t_1}\,\delta t_1.$$
Expanding the integrand of the second integral into a Taylor series and retaining only first-order terms yields
$$\int_{t_0}^{t_1}\bigl(F(t, x+\delta x, x'+\delta x') - F(t, x, x')\bigr)\,dt = \int_{t_0}^{t_1}\bigl(F_x(t, x, x')\,\delta x + F_{x'}(t, x, x')\,\delta x'\bigr)\,dt.$$
Integrating the second term by parts and combining with the first term gives
$$\int_{t_0}^{t_1}\bigl(F_x\,\delta x + F_{x'}\,\delta x'\bigr)\,dt = \bigl(F_{x'}\,\delta x\bigr)\Big|_{t_0}^{t_1} + \int_{t_0}^{t_1}\left(F_x - \frac{d}{dt}F_{x'}\right)\delta x\,dt,$$
where for simplicity of notation we suppressed the arguments of Fx and Fx'. We assumed that a solution to the problem exists. Hence, an extremal must satisfy the Euler–Lagrange equation Fx − (d/dt)Fx' = 0. Because the left end point is fixed, we obtain
$$\delta x|_{t=t_0} = 0.$$
Taking into account the results of the above manipulations yields
$$\Delta v \approx F(t, x, x')\big|_{t=t_1}\,\delta t_1 + \bigl(F_{x'}\,\delta x\bigr)\big|_{t=t_1}. \qquad (5.24)$$
Figure 5.8 Geometric relation between δx|t=t1, δx1, and x'(t1)δt1 at the free right end point.
The final step in obtaining δv is to evaluate (Fx' δx)|t=t1. The problem is two-dimensional, so we can illustrate it graphically as shown in Figure 5.8. Referring to Figure 5.8, we obtain
$$DB = \delta x|_{t=t_1}, \qquad FC = \delta x_1, \qquad EC \approx x'(t_1)\,\delta t_1, \qquad DB = FC - EC.$$
Hence,
$$\delta x|_{t=t_1} \approx \delta x_1 - x'(t_1)\,\delta t_1.$$
Substituting the above into (5.24) gives
$$\delta v = F(t, x, x')\big|_{t=t_1}\,\delta t_1 + F_{x'}\big|_{t=t_1}\bigl(\delta x_1 - x'(t_1)\,\delta t_1\bigr) = \bigl(F - x'F_{x'}\bigr)\big|_{t=t_1}\,\delta t_1 + F_{x'}\big|_{t=t_1}\,\delta x_1.$$
Because the variations δt1 and δx1 are arbitrary, we obtain δv = 0 only if, in addition to the Euler–Lagrange equation, the following conditions are also satisfied:
$$\bigl(F - x'F_{x'}\bigr)\big|_{t=t_1} = 0 \qquad\text{and}\qquad F_{x'}\big|_{t=t_1} = 0.$$
The above end-point relations are called the transversality conditions. In conclusion, the optimal
trajectory must satisfy both the Euler–Lagrange equation and the transversality conditions.
◆ Example 5.8
Consider the functional of Example 5.7, where now the left end point is fixed as before—that is, x(0) = 0—while the right end point x(1) is free. Here, we have t1 = 1. Because δt1 = 0 and δx1 is arbitrary, the transversality conditions for the right end point reduce to the single condition
$$F_{x'}\big|_{t=t_1} = 0.$$
Proceeding as in Example 5.7, the general solution of the Euler–Lagrange equation is x(t) = C1 t + C2 + t³, and x(0) = 0 gives C2 = 0. The transversality condition then yields
$$0 = F_{x'}\big|_{t=t_1} = 2x'(t_1) = 2\bigl(C_1 + 3t^2\bigr)\big|_{t=1} = 2(C_1 + 3).$$
Hence C1 = −3, and the extremal is
$$x(t) = -3t + t^3.$$
Figure 5.9 A plot of the extremal, x(t) = −3t + t³, of Example 5.8.
◆ Example 5.9
We consider again the same functional as in Example 5.7, where, this time, the left end point is fixed, as before, while at the right end point t1 is free and x(t1) = 1. Hence, in this case, δx1 = 0 while δt1 is arbitrary. The transversality conditions reduce to the single condition
$$\bigl(F - x'F_{x'}\bigr)\big|_{t=t_1} = 0.$$
For this functional,
$$F - x'F_{x'} = 12xt - (x')^2,$$
where
$$x' = \bigl(C_1 t + t^3\bigr)' = C_1 + 3t^2.$$
Note that because x(t1) = C1 t1 + t1³ = 1, we have
$$C_1 = \frac{1 - t_1^3}{t_1}.$$
Taking into account the above and the fact that x(t1) = 1, we perform simple algebraic manipulations to arrive at the following equation for t1:
$$4t_1^6 - 8t_1^3 + 1 = 0,$$
whose positive roots are t1 = 0.51169 and t1 = 1.2311. Plots of the two extremals for this example are shown in Figure 5.10.
Figure 5.10 Plots of the extremals of Example 5.9.
A similar argument shows that if the right end point B is constrained to lie on a given curve x = g1(t), then δx1 ≈ g1'(t1)δt1, and δv = 0 only if
$$\bigl(F + (g_1' - x')\,F_{x'}\bigr)\big|_{t=t_1} = 0. \qquad (5.25)$$
In the next section we discuss a specific problem of minimizing a functional subject to
constraints in the form of differential equations.
5.3 Linear Quadratic Regulator
Integrating both sides of the resulting equation with respect to time from 0 to ∞, we obtain
$$V(x(\infty)) - V(x(0)) = -\int_0^\infty\bigl(x^T Q x + u^{*T} R\,u^*\bigr)\,dt,$$
implying that
$$J(u^*) \le J(\tilde{u}),$$
which is a contradiction, and the proof is complete.
It follows from the above theorem that the synthesis of the optimal control law involves finding an appropriate Lyapunov function or, equivalently, the matrix P. The appropriate P is found by minimizing
$$f(u) = \frac{dV}{dt} + x^T Q x + u^T R u, \qquad (5.27)$$
where V = xᵀPx. We first apply to (5.27) the first-order necessary condition for unconstrained minimization,
$$\frac{\partial}{\partial u}\left(\frac{dV}{dt} + x^T Q x + u^T R u\right)\bigg|_{u=u^*} = 0^T.$$
Evaluating the derivative gives
$$2x^T P B + 2u^T R = 0^T,$$
and hence the optimal control has the form
$$u^* = -R^{-1} B^T P x. \qquad (5.28)$$
We next check the second-order sufficiency condition:
$$\frac{\partial^2}{\partial u^2}\left(\frac{dV}{dt} + x^T Q x + u^T R u\right) = \frac{\partial^2}{\partial u^2}\bigl(2x^T P A x + 2x^T P B u + x^T Q x + u^T R u\bigr) = \frac{\partial}{\partial u}\bigl(2x^T P B + 2u^T R\bigr) = 2R > 0.$$
Thus the second-order sufficiency condition for u* to minimize (5.27) is satisfied.
We now turn our attention to finding an appropriate matrix P. The optimal closed-loop system has the form
$$\dot{x} = \bigl(A - B R^{-1} B^T P\bigr)x, \qquad x(0) = x_0.$$
Our optimal controller satisfies the equation
$$\left(\frac{dV}{dt} + x^T Q x + u^{*T} R u^*\right)\bigg|_{u=u^*} = 0;$$
that is,
$$2x^T P A x + 2x^T P B u^* + x^T Q x + u^{*T} R u^* = 0.$$
We substitute the expression for u*, given by (5.28), into the above equation and represent it as
$$x^T\bigl(A^T P + P A\bigr)x - 2x^T P B R^{-1} B^T P x + x^T Q x + x^T P B R^{-1} B^T P x = 0.$$
Factoring out x yields
$$x^T\bigl(A^T P + P A + Q - P B R^{-1} B^T P\bigr)x = 0.$$
The above equation should hold for any x. For this to be true we must have
$$A^T P + P A + Q - P B R^{-1} B^T P = O. \qquad (5.29)$$
The above equation is referred to as the algebraic Riccati equation (ARE). In conclusion, the synthesis of the optimal linear state-feedback controller minimizing the performance index
$$J = \int_0^\infty\bigl(x^T Q x + u^T R u\bigr)\,dt$$
subject to
$$\dot{x} = A x + B u, \qquad x(0) = x_0,$$
requires solving the ARE given by (5.29).
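In numerical work, the ARE (5.29) can be solved directly. The minimal Python sketch below uses scipy's continuous-time ARE solver and forms the gain K = R⁻¹BᵀP; the plant data are those of the linearized one-link arm used later in Example 5.13 (the expected closed-loop eigenvalues, roughly −2.70 and −3.65, are quoted in Example 5.14).

```python
# Sketch: LQR synthesis by solving AᵀP + PA + Q - PBR⁻¹BᵀP = 0 and K = R⁻¹BᵀP.
import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[0.0, 1.0], [9.8, -1.0]])
B = np.array([[0.0], [1.0]])
Q = np.diag([1.0, 0.0])      # Q = cᵀc with c = [1 0]
R = np.array([[1.0]])

P = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ P)              # K = R⁻¹BᵀP
print("P =\n", P)
print("K =", K)
print("closed-loop eigenvalues:", np.linalg.eigvals(A - B @ K))  # about -2.70, -3.65
```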
◆ Example 5.10
Consider the following model of a dynamical system:
$$\dot{x} = 2u_1 + 2u_2 = \begin{bmatrix}2 & 2\end{bmatrix}u, \qquad x(0) = 3,$$
with the associated performance index
$$J = \int_0^\infty\bigl(x^2 + r u_1^2 + r u_2^2\bigr)\,dt.$$
1. We first find the solution to the ARE corresponding to the linear state-feedback optimal controller. We have
$$A = 0, \qquad B = \begin{bmatrix}2 & 2\end{bmatrix}, \qquad Q = 1, \qquad R = r I_2.$$
The ARE becomes
$$O = A^T P + P A + Q - P B R^{-1} B^T P = 1 - \frac{8}{r}p^2,$$
whose solution is
$$p = \sqrt{\frac{r}{8}}.$$
2. We now write the equation of the closed-loop system driven by the optimal controller. The optimal controller has the form
$$u = -R^{-1} B^T P x = -\frac{1}{\sqrt{2r}}\begin{bmatrix}1\\ 1\end{bmatrix}x.$$
Hence,
$$\dot{x} = \begin{bmatrix}2 & 2\end{bmatrix}u = -\frac{4}{\sqrt{2r}}\,x.$$
3. Finally, we find the value of J for the optimal closed-loop system. We have
$$J = x^T(0)\,P\,x(0) = \frac{9}{2}\sqrt{\frac{r}{2}}.$$
We can verify directly that J = (9/2)√(r/2) as follows:
$$\min J = \int_0^\infty\bigl(x^2 + r u_1^2 + r u_2^2\bigr)\,dt = \int_0^\infty 9\exp\!\left(-\frac{8t}{\sqrt{2r}}\right)(1+1)\,dt = 9\,\frac{\sqrt{2r}}{4} = \frac{9}{2}\sqrt{\frac{r}{2}}.$$
A major difficulty when solving the ARE is that it is a nonlinear equation. In the next section
we present an efficient method for solving the ARE.
The 2n × 2n matrix in the middle is called the Hamiltonian matrix. We use the symbol H to
denote the Hamiltonian matrix, that is,
$$H = \begin{bmatrix} A & -B R^{-1} B^T \\ -Q & -A^T \end{bmatrix}.$$
We have thus reduced the problem of solving the ARE to that of constructing appropriate matrices
X and P X. To proceed further, let v i be an eigenvector of H and let si be the corresponding
eigenvalue; then
Hv i = si v i .
In our further discussion, we assume that H has at least n distinct real eigenvalues among its
2n eigenvalues. (The results obtained can be generalized for the case when the eigenvalues of
H are complex or nondistinct.) Thus, we can write
$$H[v_1\ v_2\ \cdots\ v_n] = [v_1\ v_2\ \cdots\ v_n]\begin{bmatrix} s_1 & 0 & \cdots & 0\\ 0 & s_2 & \cdots & 0\\ \vdots & \vdots & & \vdots\\ 0 & 0 & \cdots & s_n\end{bmatrix}.$$
Let
$$\begin{bmatrix} X \\ P X \end{bmatrix} = [v_1\ v_2\ \cdots\ v_n],$$
where the corresponding eigenvalues are arranged in the diagonal matrix diag(s1, s2, ..., sn). Partitioning the matrix of eigenvectors into two n × n blocks, we write
$$\begin{bmatrix} X \\ P X \end{bmatrix} = \begin{bmatrix} W \\ Z \end{bmatrix}.$$
Thus X = W and P X = Z, and therefore
$$P = Z W^{-1}. \qquad (5.32)$$
We now have to decide what set of n eigenvalues should be chosen from those of H to construct
the particular P. In the case when all 2n eigenvalues of H are distinct, the number of different
◆ Example 5.11
Consider the following model of a dynamical system:
ẋ = 2x + u,
along with the associated performance index
$$J = \int_0^\infty\bigl(x^2 + r u^2\bigr)\,dt.$$
Find the value of r such that the optimal closed-loop system has its pole at −3.
We form the associated Hamiltonian matrix
$$H = \begin{bmatrix} A & -B R^{-1} B^T \\ -Q & -A^T \end{bmatrix} = \begin{bmatrix} 2 & -\dfrac{1}{r} \\ -1 & -2 \end{bmatrix}.$$
Its characteristic equation is
$$\det(s I_2 - H) = s^2 - 4 - \frac{1}{r} = 0.$$
Hence, setting s = −3 gives 9 − 4 − 1/r = 0, so that
$$r = \frac{1}{5}$$
results in the optimal closed-loop system having its pole located at −3.
The eigenvector method for solving the ARE can be generalized to the case when the eigenvalues
of the Hamiltonian matrix H are not necessarily distinct or real.
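The eigenvector route of (5.32) is easy to carry out numerically. The sketch below (Python; selecting the eigenvectors of the stable eigenvalues, which is the standard choice giving the stabilizing P) reproduces the data of Example 5.11 with r = 1/5.

```python
# Sketch of the eigenvector method: build H, take stable eigenvectors, P = Z W⁻¹.
import numpy as np

A = np.array([[2.0]]); B = np.array([[1.0]])
Q = np.array([[1.0]]); R = np.array([[0.2]])       # r = 1/5
n = A.shape[0]

H = np.block([[A, -B @ np.linalg.inv(R) @ B.T],
              [-Q, -A.T]])
w, V = np.linalg.eig(H)
stable = V[:, w.real < 0]                          # eigenvectors of the n stable eigenvalues
W, Z = stable[:n, :], stable[n:, :]
P = (Z @ np.linalg.inv(W)).real
print("eigenvalues of H:", np.sort(w.real))        # expect -3 and +3 for this data
print("P =", P)
print("closed-loop pole:", A - B @ np.linalg.inv(R) @ B.T @ P)   # expect -3
```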
Suppose now that we wish to select the weight matrix Q so that the closed-loop system has a prescribed set of poles while the performance index
$$J = \int_0^\infty\bigl(x^T Q x + u^T R u\bigr)\,dt$$
is minimized. In other words, we are interested in designing an optimal control system with prescribed poles. We illustrate the problem with a very simple example.
◆ Example 5.12
For a dynamical system modeled by
ẋ = 2x + u
= Ax + B u,
along with the performance index J = ∫₀^∞ (q x² + u²) dt, suppose that we wish to choose the weight q so that the optimal closed-loop system has its pole at −3. The associated Hamiltonian matrix is H = [[2, −1], [−q, −2]], so det(sI₂ − H) = s² − 4 − q. We want
$$\det[s I_2 - H] = (s+3)(s-3) = s^2 - 9.$$
Hence, the desired value of q is q = 5.
We now discuss a method for designing optimal control systems with prescribed poles. In
constructing the gain matrix K and the weight matrix Q, we repeatedly use a state variable
transformation. To proceed, we analyze the system model, the performance index, and the
Hamiltonian matrix in a new basis. Let
$$x = T\tilde{x},$$
where det T ≠ 0. Then, the system model in the new coordinates has the form
$$\dot{\tilde{x}}(t) = T^{-1} A T\,\tilde{x}(t) + T^{-1} B\,u(t) = \tilde{A}\tilde{x}(t) + \tilde{B}u(t).$$
The performance index J, after the state variable transformation, takes the form
$$\tilde{J} = \int_0^\infty\bigl(\tilde{x}^T(t)\,T^T Q T\,\tilde{x}(t) + u^T(t)\,R\,u(t)\bigr)\,dt = \int_0^\infty\bigl(\tilde{x}^T(t)\,\tilde{Q}\,\tilde{x}(t) + u^T(t)\,R\,u(t)\bigr)\,dt, \qquad \tilde{Q} = T^T Q T.$$
Hence, the eigenvalues of H and H̃ are identical. Our next step is to compute the characteristic
equation of H̃ as a function of Q̃. Computing det(s I 2n − H̃) is facilitated after we perform
some manipulations on the characteristic matrix s I 2n − H̃ that do not change its determinant.
Specifically, we premultiply the characteristic matrix s I 2n − H̃ by a 2n × 2n matrix whose
determinant equals 1. Then, the determinant of the product is the same as det(s I 2n − H̃), but it
is much easier to evaluate. We have
$$\begin{bmatrix} I_n & 0 \\ -\tilde{Q}(s I_n - \tilde{A})^{-1} & I_n \end{bmatrix}\begin{bmatrix} s I_n - \tilde{A} & \tilde{B} R^{-1}\tilde{B}^T \\ \tilde{Q} & s I_n + \tilde{A}^T \end{bmatrix} = \begin{bmatrix} s I_n - \tilde{A} & \tilde{B} R^{-1}\tilde{B}^T \\ 0 & s I_n + \tilde{A}^T - \tilde{Q}(s I_n - \tilde{A})^{-1}\tilde{B} R^{-1}\tilde{B}^T \end{bmatrix}.$$
Let
$$F = \tilde{B} R^{-1}\tilde{B}^T;$$
then
$$\det(s I_{2n} - \tilde{H}) = \det\begin{bmatrix} s I_n - \tilde{A} & F \\ 0 & s I_n + \tilde{A}^T - \tilde{Q}(s I_n - \tilde{A})^{-1} F \end{bmatrix} = \det(s I_n - \tilde{A})\,\det\bigl(s I_n + \tilde{A}^T - \tilde{Q}(s I_n - \tilde{A})^{-1} F\bigr).$$
Suppose now that the matrix A has n linearly independent eigenvectors; hence it is diagonalizable. Let
$$T = [t_1\ t_2\ \cdots\ t_n]$$
be the matrix whose columns are the eigenvectors of A, so that Ã = T⁻¹AT = diag(λ1, ..., λn).
From the form of the characteristic polynomial of the Hamiltonian matrix, we conclude that
selecting an appropriate value of q̃ j j allows us to shift the jth pole of the closed-loop system
into a prespecified location while leaving the other poles in their original positions. To see this
we represent the term that involves q̃jj as
$$(s+\lambda_j)(s-\lambda_j) - \tilde{q}_{jj} f_{jj} = s^2 - \lambda_j^2 - \tilde{q}_{jj} f_{jj} = (s - s_j)(s + s_j),$$
where
$$s_j = \pm\sqrt{\lambda_j^2 + \tilde{q}_{jj} f_{jj}}.$$
From the above expression for sj, we can calculate the q̃jj that results in the jth pole being shifted to −√(λj² + q̃jj fjj). We obtain
$$\tilde{q}_{jj} = \frac{s_j^2 - \lambda_j^2}{f_{jj}}. \qquad (5.33)$$
Note that while selecting the desired location for s j , we have to make sure that q̃ j j > 0 to ensure
positive semidefiniteness of the weight matrix Q̃ j . Having Q̃ j , we compute Q j = T −T Q̃ j T −1
and solve the Riccati equation
AT P j + P j A + Q j − P j B R−1 B T P j = O (5.34)
for P j . We compute u = −R−1 B T P j x + u j = −K j x + u j that shifts only one pole of the
closed-loop system
ẋ = ( A − B K j )x + Bu j
= A j x + Bu j
to its specified position. We then start with the new system
ẋ = A j x + Bu j
and shift the next pole to a specified location, and then we combine the obtained feedback gains.
The above method is based on the result, established by Solheim [267], that different feedback
matrices Kj and the corresponding weight matrices Qj can be combined to give the final optimal gain K and the corresponding weight matrix Q. Indeed, suppose we apply the above method to the system ẋ = (A − BKj)x + Buj. After finding an appropriate weight matrix Qj+1, we solve the Riccati equation
$$A_j^T P_{j+1} + P_{j+1} A_j + Q_{j+1} - P_{j+1} B R^{-1} B^T P_{j+1} = O \qquad (5.35)$$
for Pj+1 to compute the gain Kj+1. Adding the two Riccati equations (5.34) and (5.35) and performing simple manipulations yields
$$K = K_j + K_{j+1} = R^{-1} B^T P_j + R^{-1} B^T P_{j+1}.$$
In a similar manner we can show that the method works for more than just two iterations.
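One iteration of this pole-shifting procedure is easy to prototype. The sketch below (Python; it assumes, as the method does, a diagonalizable A with real distinct eigenvalues, and uses the data of Example 5.15 as a test case) computes q̃jj from (5.33), builds Qj, solves the ARE, and returns Kj.

```python
# Sketch: shift one eigenvalue lambda_j of A to a desired s_j via q~_jj = (s_j² - λ_j²)/f_jj.
import numpy as np
from scipy.linalg import solve_continuous_are

def shift_one_pole(A, B, R, lam_shift, s_des):
    lam, T = np.linalg.eig(A)                       # A = T diag(lam) T⁻¹
    j = int(np.argmin(np.abs(lam - lam_shift)))     # column of T for the eigenvalue to move
    Tinv = np.linalg.inv(T)
    F = Tinv @ B @ np.linalg.inv(R) @ B.T @ Tinv.T
    q_jj = (s_des**2 - lam[j].real**2) / F[j, j].real
    Qt = np.zeros_like(A); Qt[j, j] = q_jj          # Q~_j
    Qj = Tinv.T @ Qt @ Tinv                         # Q_j = T⁻ᵀ Q~_j T⁻¹
    Pj = solve_continuous_are(A, B, Qj, R)
    return np.linalg.solve(R, B.T @ Pj), Qj         # K_j = R⁻¹BᵀP_j

A = np.array([[0.0, 1.0], [9.8, -1.0]]); B = np.array([[0.0], [1.0]])
K1, Q1 = shift_one_pole(A, B, np.array([[2.0]]), 2.6702, -4.0)   # Example 5.15 data
print("K1 =", K1)                                   # expect roughly [24.48, 6.67]
print("eigs:", np.linalg.eigvals(A - B @ K1))       # expect about -4 and -3.6702
```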
◆ Example 5.13
Consider a simple model of a one-link robot manipulator, shown in Figure 5.11. The
motion of the robot arm is controlled by a DC motor via a gear. The DC motor
is armature-controlled and its schematic is shown in Figure 5.12. We assume that
the motor moment of inertia is negligible compared with that of the robot arm.
We model the arm as a point mass m attached to the end of a massless rod of
length l. Hence the arm inertia is I a = ml 2 . We assume that the gear train has
no backlash, and all connecting shafts are rigid. As can be seen from Figure 5.11,
counterclockwise rotation of the arm is defined as positive, and clockwise rotation
as negative.
Figure 5.11 A one-link robot manipulator: a point mass m at the end of a rod of length l, driven by a DC motor through a 1:N gear.
Figure 5.12 Schematic of the armature-controlled DC motor: armature inductance La, resistance Ra, current ia, applied voltage u, back emf eb, and motor shaft position θm.
The torque developed on the motor shaft is
$$T_m = K_m i_a,$$
and the radii of the gears are proportional to their numbers of teeth. The work done by the gears must be equal. Let Tp denote the torque applied to the robot arm. Then,
$$T_p\,\theta_p = T_m\,\theta_m.$$
Because the gear ratio is 1:N, we have θm = N θp, and hence
$$T_p = N T_m = N K_m i_a.$$
We use Newton’s second law to write the equation modeling the arm dynamics,
$$I_a\frac{d^2\theta_p}{dt^2} = m g l\sin\theta_p + T_p. \qquad (5.36)$$
Substituting into (5.36) the expressions for Ia and Tp and then rearranging yields
$$m l^2\frac{d^2\theta_p}{dt^2} = m g l\sin\theta_p + N K_m i_a, \qquad (5.37)$$
The equation describing the armature circuit is
$$L_a\frac{d i_a}{dt} + R_a i_a + K_b N\frac{d\theta_p}{dt} = u,$$
where Kb is the back emf constant. We assume that La ≈ 0. Hence,
$$u = R_a i_a + K_b N\frac{d\theta_p}{dt}. \qquad (5.38)$$
We next compute i a from (5.38) and substitute the result into (5.37) to get
$$m l^2\frac{d^2\theta_p}{dt^2} = m g l\sin\theta_p + \frac{N K_m}{R_a}\left(u - K_b N\frac{d\theta_p}{dt}\right). \qquad (5.39)$$
We can now construct a state-space model of the one-link robot. For this we choose the following state and output variables:
$$x_1 = \theta_p, \qquad x_2 = \frac{d\theta_p}{dt} = \omega_p, \qquad y = x_1.$$
Then, using (5.39), we obtain the following simple state-space model of the robot manipulator:
$$\begin{bmatrix}\dot{x}_1\\ \dot{x}_2\end{bmatrix} = \begin{bmatrix} x_2 \\ \dfrac{g}{l}\sin x_1 - \dfrac{K_b K_m N^2}{m l^2 R_a}x_2 + \dfrac{N K_m}{m l^2 R_a}u \end{bmatrix}, \qquad y = x_1.$$
Figure 5.13 Plots of y = x1 and x2 versus time of the uncontrolled nonlinear system of Example 5.13.
Figure 5.14 A phase portrait of the uncontrolled nonlinear system of Example 5.13.
Plots of y = x1 and x2 versus time are shown in Figure 5.15. A phase portrait of
the linearized system is shown in Figure 5.16. Let
$$J = \int_0^\infty\bigl(y^2 + u^2\bigr)\,dt.$$
We shall find the linear state-feedback control law u = −kx that minimizes J subject to the equations given by (5.40). We have
$$Q = c^T c = \begin{bmatrix}1 & 0\\ 0 & 0\end{bmatrix} \qquad\text{and}\qquad R = [1].$$
Figure 5.15 Plots of x1 and x2 versus time of the linearized system of Example 5.13.
Figure 5.16 A phase portrait of the linearized system of Example 5.13.
Figure 5.17 Plots of x1 and x2 versus time of the linear closed-loop system of Example 5.13.
Figure 5.18 A phase portrait of the linear closed-loop system of Example 5.13.
Figure 5.19 Plots of x1 and x2 of the nonlinear closed-loop system of Example 5.13.
Figure 5.20 A phase portrait of the nonlinear closed-loop system of Example 5.13.
◆ Example 5.14
We use the data from the previous example to find the weight matrix Q 2 and the
corresponding k2 so that λ2 is shifted to s2 = −5; that is, the new poles of the
closed-loop system
ẋ = ( Ac − bk2 ) x
are s1 = λ1 and s2 = −5. For the control law u = −kx + r, the linear closed-loop system is
$$\dot{x} = \begin{bmatrix}0 & 1\\ -9.8509 & -6.3484\end{bmatrix}x + \begin{bmatrix}0\\ 1\end{bmatrix}r = A_c x + b r,$$
where for simplicity we write x rather than δx and u instead of δu. The eigenvalues of Ac are λ1 = −2.7003 and λ2 = −3.6481. We have λ1 ≠ λ2, and thus the matrix Ac is diagonalizable. Note that distinct eigenvalues are only a sufficient condition for diagonalizability. We construct the similarity transformation using the eigenvectors of Ac. The eigenvectors of Ac are
$$v_1 = \begin{bmatrix}0.3473\\ -0.9378\end{bmatrix} \qquad\text{and}\qquad v_2 = \begin{bmatrix}-0.2644\\ -0.9644\end{bmatrix}.$$
Let x = Tz, where T = [v1 v2]. Then, Ac in the new coordinates has the form
$$\tilde{A}_c = T^{-1} A_c T = \begin{bmatrix}-2.7003 & 0\\ 0 & -3.6481\end{bmatrix}$$
and
$$\tilde{b} = T^{-1} b = \begin{bmatrix}3.0383\\ 3.9912\end{bmatrix}.$$
Next, we compute
$$T^{-1} b R^{-1} b^T T^{-T} = \begin{bmatrix}9.2312 & 12.1264\\ 12.1264 & 15.9296\end{bmatrix}.$$
Hence,
$$\tilde{q}_{22} = \frac{s_2^2 - \lambda_2^2}{f_{22}} = \frac{5^2 - (-3.6481)^2}{15.9296} = 0.7339.$$
Therefore,
$$\tilde{Q}_2 = \begin{bmatrix}0 & 0\\ 0 & 0.7339\end{bmatrix} = T^T Q_2 T$$
and
$$Q_2 = T^{-T}\tilde{Q}_2 T^{-1} = \begin{bmatrix}85.2450 & 31.5686\\ 31.5686 & 11.6907\end{bmatrix}.$$
Solving the ARE with Ac, b, Q2, and R yields
$$P_2 = \begin{bmatrix}9.8572 & 3.6504\\ 3.6504 & 1.3518\end{bmatrix}$$
and
$$k_2 = [3.6504\ \ 1.3518].$$
As a check, if we take
$$Q_{\text{new}} = Q + Q_2 = \begin{bmatrix}86.2450 & 31.5686\\ 31.5686 & 11.6907\end{bmatrix},$$
◆ Example 5.15
For the linearized model given by (5.40) and R = [2], we will now determine Q
and the corresponding k that result in the closed-loop system ẋ = ( A − bk) x with
poles located at s1 = −4 and s2 = −5, where
$$A = \begin{bmatrix}0 & 1\\ 9.8 & -1\end{bmatrix}, \qquad b = \begin{bmatrix}0\\ 1\end{bmatrix}.$$
We shift poles into the desired locations one by one. We first diagonalize A using a similarity transformation constructed from the eigenvectors of A. The eigenvalues of A are λ1 = 2.6702 and λ2 = −3.6702. We denote the corresponding eigenvectors by v1 and v2, respectively. We obtain
$$T^{-1} A T = \begin{bmatrix}2.6702 & 0\\ 0 & -3.6702\end{bmatrix},$$
where
$$T = [v_1\ v_2] = \begin{bmatrix}0.3507 & -0.2629\\ 0.9365 & 0.9648\end{bmatrix}.$$
We will now construct k1 so that λ1 is shifted to s1 = −4, while λ2 remains unchanged; that is, the eigenvalues of A − bk1 become s1 and λ2. For this, we first compute
$$F = T^{-1} b R^{-1} b^T T^{-T} = \begin{bmatrix}0.1011 & 0.1349\\ 0.1349 & 0.1800\end{bmatrix}.$$
Hence, f11 = 0.1011,
$$\tilde{q}_{11} = \frac{4^2 - (2.6702)^2}{0.1011} = 87.7352,$$
and
$$\tilde{Q}_1 = \begin{bmatrix}87.7352 & 0\\ 0 & 0\end{bmatrix} = T^T Q_1 T.$$
Hence,
$$Q_1 = T^{-T}\tilde{Q}_1 T^{-1} = \begin{bmatrix}239.0024 & 65.1202\\ 65.1202 & 17.7431\end{bmatrix}.$$
Solving the associated ARE, using R = [2], yields
$$P_1 = \begin{bmatrix}179.7014 & 48.9626\\ 48.9626 & 13.3407\end{bmatrix}.$$
Therefore,
$$k_1 = [24.4813\ \ 6.6703]$$
and
$$A_1 = A - b k_1 = \begin{bmatrix}0 & 1\\ -14.6813 & -7.6703\end{bmatrix}.$$
We will now construct k2 that shifts λ2 to s2 while leaving s1 unchanged. We first diagonalize A1 using its eigenvectors. Note that A1 and A share the common eigenvector v2. The other eigenvector of A1, corresponding to its eigenvalue s1, is u1 = [−0.2425 0.9701]ᵀ. We have
$$T^{-1} A_1 T = \begin{bmatrix}s_1 & 0\\ 0 & \lambda_2\end{bmatrix},$$
where
$$T = [u_1\ v_2] = \begin{bmatrix}-0.2425 & 0.2629\\ 0.9701 & -0.9648\end{bmatrix}.$$
We next compute
$$T^{-1} b R^{-1} b^T T^{-T} = \begin{bmatrix}78.0607 & 72.0157\\ 72.0157 & 66.4389\end{bmatrix}.$$
Hence, f22 = 66.4389,
$$\tilde{q}_{22} = \frac{5^2 - (-3.6702)^2}{66.4389} = 0.1735,$$
and
$$\tilde{Q}_2 = \begin{bmatrix}0 & 0\\ 0 & 0.1735\end{bmatrix} = T^T Q_2 T.$$
From the above, we obtain
$$Q_2 = T^{-T}\tilde{Q}_2 T^{-1} = \begin{bmatrix}368.9003 & 92.2211\\ 92.2211 & 23.0543\end{bmatrix}.$$
Solving the associated ARE, using R = [2], yields
$$P_2 = \begin{bmatrix}42.5495 & 10.6370\\ 10.6370 & 2.6591\end{bmatrix}$$
and
$$k_2 = [5.3185\ \ 1.3296].$$
The final weight matrix Q that results in shifting the poles into the desired locations is
$$Q = Q_1 + Q_2 = \begin{bmatrix}607.9027 & 157.3413\\ 157.3413 & 40.7974\end{bmatrix}.$$
The corresponding P is
$$P = P_1 + P_2 = \begin{bmatrix}222.2509 & 59.5996\\ 59.5996 & 15.9998\end{bmatrix}.$$
The corresponding optimal gain vector k is
$$k = k_1 + k_2 = [29.7998\ \ 7.9999].$$
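The final numbers of Example 5.15 are easy to cross-check numerically; the short Python sketch below (an illustrative check, not part of the original text) verifies that the combined gain places the poles at −4 and −5 and that k = R⁻¹bᵀP for the combined P.

```python
# Quick numerical check of the combined design of Example 5.15.
import numpy as np

A = np.array([[0.0, 1.0], [9.8, -1.0]])
b = np.array([[0.0], [1.0]])
k = np.array([[29.7998, 7.9999]])
P = np.array([[222.2509, 59.5996], [59.5996, 15.9998]])

print(np.linalg.eigvals(A - b @ k))   # approximately -4 and -5
print(b.T @ P / 2.0)                  # R = [2]; this reproduces k
```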
Figure 5.21 Estimating the region of asymptotic stability: the level set V(x) = constant made tangent to the boundary of the region where V̇(x) < 0.
Solving the above optimization problem corresponds to maximizing the size of the region
{x : V(x) ≤ c} in which V̇(x) < 0 by making the boundary of {x : V(x) ≤ c} tangent to the boundary of {x : V̇(x) < 0}. This is illustrated in Figure 5.21.
We next estimate the RAS for single-input linear dynamical systems with saturating linear
optimal control. Our development follows that of Friedland [91, Section 5.1]. We will use such
controllers in Chapter 11 to stabilize chaotic systems. We consider a single-input linear system
model,
$$\dot{x} = A x + b\,U_{\max}\,\mathrm{sat}(u/U_{\max}), \qquad (5.44)$$
where x ∈ Rⁿ and
$$\mathrm{sat}(z) = \begin{cases} 1 & \text{if } z \ge 1,\\ z & \text{if } |z| < 1,\\ -1 & \text{if } z \le -1.\end{cases}$$
In the linear region, |u| ≤ Umax, (5.44) becomes ẋ = Ax + bu. We will consider using the control law of the form
$$u = U_{\max}\,\mathrm{sat}(-k x/U_{\max}). \qquad (5.45)$$
When |kx| ≤ Umax, the control law given by (5.45) becomes a linear state-feedback controller, u = −kx. For simplicity of notation, we can assume that Umax = 1. Indeed, we can always scale the input vector b so that Umax = 1. The gain vector k is obtained by solving the associated LQR problem with the performance index
$$J = \int_0^\infty\bigl(x^T Q x + r u^2\bigr)\,dt,$$
where we assume that the pair ( A, b) is reachable, Q = Q T > 0, and r > 0. Solving the algebraic
Riccati equation,
$$A^T P + P A + Q - P b b^T P/r = O, \qquad (5.46)$$
we use its solution, P = Pᵀ > 0, to construct the gain vector
$$k = b^T P/r. \qquad (5.47)$$
We then proceed with evaluating V̇ = (d/dt)(xᵀPx) on the trajectories of the closed-loop system,
$$\dot{V} = 2x^T P\dot{x} = x^T\bigl(A^T P + P A\bigr)x - 2x^T P b\,\mathrm{sat}(k x). \qquad (5.48)$$
By (5.47),
$$b^T P = r k. \qquad (5.49)$$
Using the above and (5.46) gives
$$A^T P + P A = r k^T k - Q. \qquad (5.50)$$
Substituting (5.50) and (5.49) into (5.48) yields
$$\dot{V} = -x^T Q x + r\,k x\bigl(k x - 2\,\mathrm{sat}(k x)\bigr),$$
which is negative for all x ≠ 0 satisfying |kx| ≤ 2. Hence the set {x : xᵀPx ≤ c̃} is an estimate of the RAS, where c̃ is the largest level value for which this set is contained in the region |kx| ≤ 2, that is,
$$\tilde{c} = \min_{|k x| = 2} x^T P x.$$
We will use the Lagrange multiplier method to solve the above constrained optimization problem. We will solve the equivalent problem,
$$\text{minimize } x^T P x \qquad\text{subject to}\qquad (k x)^2 = 4.$$
The Lagrangian for the above problem is
$$l(x, \lambda) = x^T P x + \lambda\bigl(4 - (k x)^2\bigr).$$
Applying to the Lagrangian l the first-order necessary conditions for unconstrained minimization gives
$$D_x l(x, \lambda) = 2x^T P - 2\lambda(k x)k = 0^T, \qquad (5.57)$$
$$D_\lambda l(x, \lambda) = 4 - (k x)^2 = 0. \qquad (5.58)$$
From (5.57), we obtain
$$x^T P x = \lambda(k x)^2.$$
Because (kx)² = 4,
$$\lambda = \tfrac{1}{4}x^T P x,$$
and therefore
$$\tilde{c} = x^T P x = 4\lambda.$$
Again, using (5.57), we obtain
$$x^T P = \lambda(k x)k.$$
Postmultiplying both sides of the above by P⁻¹kᵀ gives
$$x^T k^T = k x = \lambda(k x)\,k P^{-1} k^T. \qquad (5.59)$$
Note that kx ≠ 0 because (kx)² = 4. Hence, from (5.59),
$$\lambda = \frac{1}{k P^{-1} k^T}.$$
Using (5.47), k = bᵀP/r, so that
$$\lambda = \frac{1}{k P^{-1} k^T} = \frac{r^2}{b^T P b}.$$
Hence,
$$\tilde{c} = 4\lambda = \frac{4 r^2}{b^T P b}$$
is the level value defining the estimate {x : xᵀPx < c̃} of the region of asymptotic stability.
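The estimate is straightforward to compute once P is available. The sketch below (Python; the plant matrices are illustrative, not from the text) solves the ARE, forms k = bᵀP/r, and evaluates c̃ = 4r²/(bᵀPb).

```python
# Sketch: region-of-attraction estimate for a saturating LQR loop.
import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[0.0, 1.0], [1.0, 0.0]])   # an unstable, reachable plant (illustrative)
b = np.array([[0.0], [1.0]])
Q = np.eye(2)
r = 1.0

P = solve_continuous_are(A, b, Q, np.array([[r]]))
k = (b.T @ P) / r                          # equation (5.47)
c_tilde = 4.0 * r**2 / float(b.T @ P @ b)  # the level value derived above
print("k =", k)
print("c_tilde =", c_tilde)                # {x : xᵀPx < c_tilde} estimates the RAS
```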
We now turn to the discrete-time linear quadratic regulator problem. Consider the system model
$$x(k+1) = A x(k) + B u(k), \qquad (5.60)$$
with a specified initial condition x(0) = x0, where x(k) ∈ Rⁿ and u(k) ∈ Rᵐ. We assume that the pair (A, B) is reachable. Our objective is to construct a stabilizing linear state-feedback controller,
$$u(k) = -K x(k),$$
that minimizes the performance index
$$J = \sum_{k=0}^{\infty}\bigl(x^T(k) Q x(k) + u^T(k) R u(k)\bigr), \qquad (5.61)$$
where Q = Qᵀ ≥ 0 and R = Rᵀ > 0. We assume that the components of the control vector are unconstrained. An optimal control law will be denoted u*.
Because we seek the controller among the ones that stabilize the system (5.60), the closed-loop system,
$$x(k+1) = (A - B K)x(k), \qquad (5.62)$$
is asymptotically stable; that is, the eigenvalues of the matrix A − BK are in the open unit disk. This implies that there exists a Lyapunov function, V(x(k)) = xᵀ(k)Px(k), for the system (5.62). Therefore the first forward difference, ΔV(x(k)) = V(x(k + 1)) − V(x(k)), evaluated on the trajectories of (5.62) is negative definite. We now state and prove a sufficient condition for u(k) = −Kx(k) to be optimal.
Theorem 5.4 If the state-feedback controller, u(k) = −Kx(k), is such that
$$\min_u\bigl(\Delta V(x(k)) + x^T(k) Q x(k) + u^T(k) R u(k)\bigr) = 0, \qquad (5.63)$$
then it is optimal.
Thus, if a linear stabilizing controller satisfies the hypothesis of the theorem, then the value of the performance index (5.61) resulting from applying this controller is
$$J = x_0^T P x_0.$$
To show that such a controller is an optimal one, we will use a proof by contradiction. We assume that the hypothesis of the theorem holds but the controller u* that satisfies this hypothesis is not optimal; that is, there is a controller ũ such that
$$J(\tilde{u}) < J(u^*). \qquad (5.64)$$
The hypothesis of the theorem implies that
$$\Delta V(x(k))\big|_{u=\tilde{u}} + x^T(k) Q x(k) + \tilde{u}^T(k) R\,\tilde{u}(k) \ge 0. \qquad (5.65)$$
We represent (5.65) as
$$\Delta V(x(k))\big|_{u=\tilde{u}} \ge -x^T(k) Q x(k) - \tilde{u}^T(k) R\,\tilde{u}(k).$$
Summing the above from k = 0 to k = ∞ yields
$$V(x(0)) \le \sum_{k=0}^{\infty}\bigl(x^T(k) Q x(k) + \tilde{u}^T(k) R\,\tilde{u}(k)\bigr);$$
that is,
$$J(\tilde{u}) \ge J(u^*),$$
which contradicts (5.64). The proof is complete.
We now use Theorem 5.4 to construct the optimal controller. Let
$$f(u(k)) = \Delta V(x(k)) + x^T(k) Q x(k) + u^T(k) R u(k). \qquad (5.66)$$
We use the definition of ΔV and substitute into (5.66) the system equation (5.60) to obtain
$$f(u(k)) = x^T(k+1) P x(k+1) - x^T(k) P x(k) + x^T(k) Q x(k) + u^T(k) R u(k) = \bigl(A x(k) + B u(k)\bigr)^T P\bigl(A x(k) + B u(k)\bigr) - x^T(k) P x(k) + x^T(k) Q x(k) + u^T(k) R u(k). \qquad (5.67)$$
Applying to the above function the first-order necessary condition for a relative minimum gives
$$\frac{\partial f(u(k))}{\partial u(k)} = 2\bigl(A x(k) + B u(k)\bigr)^T P B + 2u^T(k) R = 2x^T(k) A^T P B + 2u^T(k)\bigl(R + B^T P B\bigr) = 0^T.$$
The matrix R + BᵀPB is positive definite because R is positive definite, and therefore the matrix R + BᵀPB is invertible. Hence,
$$u^*(k) = -\bigl(R + B^T P B\bigr)^{-1} B^T P A\,x(k) = -K x(k), \qquad (5.68)$$
where
$$K = \bigl(R + B^T P B\bigr)^{-1} B^T P A.$$
Let
$$S = R + B^T P B.$$
Then, we represent (5.68) as
$$u^*(k) = -S^{-1} B^T P A\,x(k). \qquad (5.69)$$
We next check whether u* satisfies the second-order sufficient condition for a relative minimizer of the function (5.66). We have
$$\frac{\partial^2}{\partial u^2(k)}\Bigl(x^T(k+1) P x(k+1) - x^T(k) P x(k) + x^T(k) Q x(k) + u^T(k) R u(k)\Bigr) = \frac{\partial}{\partial u(k)}\Bigl(2x^T(k) A^T P B + 2u^T(k)\bigl(R + B^T P B\bigr)\Bigr) = 2\bigl(R + B^T P B\bigr) > 0; \qquad (5.70)$$
that is, u* satisfies the second-order sufficient condition for a relative minimizer of the function (5.66).
The optimal controller (5.69) can be constructed if we have found an appropriate positive definite matrix P. Our next task is to devise a method that allows us to compute the desired matrix P. For this, we first find the equation describing the closed-loop system driven by the optimal controller given by (5.69):
$$x(k+1) = \bigl(A - B S^{-1} B^T P A\bigr)x(k). \qquad (5.71)$$
Our controller satisfies the hypothesis of Theorem 5.4; that is,
$$x^T(k+1) P x(k+1) - x^T(k) P x(k) + x^T(k) Q x(k) + u^{*T}(k) R u^*(k) = 0.$$
Substituting into the above equation the expression for u* given by (5.69), along with performing simple manipulations, gives
$$\begin{aligned}
&x^T(k)\bigl(A - B S^{-1} B^T P A\bigr)^T P\bigl(A - B S^{-1} B^T P A\bigr)x(k) - x^T(k) P x(k) + x^T(k) Q x(k) + x^T(k) A^T P B S^{-1} R S^{-1} B^T P A\,x(k)\\
&\quad= x^T(k) A^T P A\,x(k) - 2x^T(k) A^T P B S^{-1} B^T P A\,x(k) + x^T(k) A^T P B S^{-1} B^T P B S^{-1} B^T P A\,x(k)\\
&\qquad\; - x^T(k) P x(k) + x^T(k) Q x(k) + x^T(k) A^T P B S^{-1} R S^{-1} B^T P A\,x(k)\\
&\quad= x^T(k) A^T P A\,x(k) - x^T(k) P x(k) + x^T(k) Q x(k) - 2x^T(k) A^T P B S^{-1} B^T P A\,x(k) + x^T(k) A^T P B S^{-1}\bigl(R + B^T P B\bigr)S^{-1} B^T P A\,x(k)\\
&\quad= x^T(k) A^T P A\,x(k) - x^T(k) P x(k) + x^T(k) Q x(k) - 2x^T(k) A^T P B S^{-1} B^T P A\,x(k) + x^T(k) A^T P B S^{-1} B^T P A\,x(k)\\
&\quad= x^T(k)\bigl(A^T P A - P + Q - A^T P B S^{-1} B^T P A\bigr)x(k) = 0. \qquad (5.72)
\end{aligned}$$
The above equation should hold for any x. For this to be true we must have
$$A^T P A - P + Q - A^T P B S^{-1} B^T P A = O. \qquad (5.73)$$
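Equation (5.73) is the discrete-time algebraic Riccati equation, and scipy provides a solver for it. The sketch below (Python; the plant is an illustrative discretized double integrator, not from the text) solves (5.73) and forms the gain K = (R + BᵀPB)⁻¹BᵀPA.

```python
# Sketch: discrete-time LQR via the discrete ARE (5.73).
import numpy as np
from scipy.linalg import solve_discrete_are

A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.5], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

P = solve_discrete_are(A, B, Q, R)           # AᵀPA - P + Q - AᵀPB(R+BᵀPB)⁻¹BᵀPA = 0
K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
print("K =", K)
print("closed-loop eigenvalues:", np.linalg.eigvals(A - B @ K))   # inside the unit disk
```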
5.4 Dynamic Programming
We now discuss Bellman's principle of optimality (PO). Consider an optimal trajectory, illustrated in Figure 5.22, that starts at the initial state x0 = x(t0) and terminates at the final state x(tf) at time tf. Let x(t) be an intermediate state on the optimal tra-
jectory and let J ∗ (t, x(t)) be the optimal cost of traveling from x(t) to the final state x(t f ). Thus,
J ∗ (t0 , x(t0 )) is the optimal cost of traveling from x 0 = x(t0 ) to x(t f ). It then follows that the
portion of the optimal trajectory from x(t) to x(t f ) is also optimal. For if some other path,
indicated by the dashed curve in Figure 5.22, connecting x(t) and x(t f ) resulted in a smaller
cost than the solid path, then this would provide a less expensive route from x 0 to x(t f ), thus
contradicting the optimality of the original trajectory.
We note that the PO seems to be already known to Johann Bernoulli, who stated in 1706 that
“. . . all curves, which should give a maximum, preserve also in all their sections the laws of the
same maximum.” Here, the reader should interpret maximum as optimum.
Bellman’s principle of optimality serves to “limit the number of potentially optimal control
strategies that must be investigated” [182, p. 315]. The optimal control strategy is determined,
using the PO, by working backward from the final stage.
We illustrate this “backwardness” with an example of a sequential decision-making process.
We define a decision as the choice of alternative paths leaving a given node.
◆ Example 5.16
We use the PO to find the minimum cost path that starts at the node a and ends
at any one of the nodes k, l, m in the routing network depicted in Figure 5.23.
The travel costs are shown beside each path segment. The line segments can only
be traversed from left to right. Applying the PO to this routing problem, we find
the optimal path by performing a backward pass through the network. Let J q∗ be
the minimum cost from a generic node q to any one of the three possible final
nodes. Then, we have
$$J_g^* = 16, \qquad J_h^* = 8, \qquad J_i^* = 7, \qquad J_j^* = 6.$$
Continuing backward,
$$J_d^* = \min\{3 + J_g^*,\ 11 + J_h^*\} = \min\{3 + 16,\ 11 + 8\} = 19.$$
Thus, in this case, we have two optimal paths from node d. We can take either the dgk or the dhk path. We then compute optimal paths from nodes e and f. We have
$$J_e^* = \min\{6 + J_h^*,\ 10 + J_i^*\} = \min\{6 + 8,\ 10 + 7\} = 14$$
Figure 5.23 A routing network; the travel cost is shown beside each path segment.
and
$$J_f^* = \min\{4 + J_i^*,\ 12 + J_j^*\} = \min\{4 + 7,\ 12 + 6\} = 11.$$
Thus, the optimal path emanating from node e is ehk, and the path from node f is fil. The next stage yields
$$J_b^* = \min\{9 + J_d^*,\ 2 + J_e^*\} = \min\{9 + 19,\ 2 + 14\} = 16$$
and
$$J_c^* = \min\{5 + J_e^*,\ 14 + J_f^*\} = \min\{5 + 14,\ 14 + 11\} = 19.$$
Therefore, the optimal path starting from b is behk, and starting from c the optimal path is cehk. Finally, the initial, and hence final, stage yields
$$J_a^* = \min\{4 + J_b^*,\ 7 + J_c^*\} = \min\{4 + 16,\ 7 + 19\} = 20.$$
Hence, the optimal, minimum total cost route from a to any one of the three possible termination nodes k, l, or m is abehk. The total cost of this path is 20.
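The backward pass of Example 5.16 can be written as a few lines of code. In the Python sketch below the edge costs are those quoted in the example's computations; the terminal arcs (g-k, h-k, i-l, j-m) are assumed to carry the stated cost-to-go values J_g*, J_h*, J_i*, J_j*.

```python
# Sketch: backward dynamic-programming pass for the routing network of Example 5.16.
edges = {                      # node -> list of (successor, cost)
    'a': [('b', 4), ('c', 7)],
    'b': [('d', 9), ('e', 2)],
    'c': [('e', 5), ('f', 14)],
    'd': [('g', 3), ('h', 11)],
    'e': [('h', 6), ('i', 10)],
    'f': [('i', 4), ('j', 12)],
    'g': [('k', 16)], 'h': [('k', 8)], 'i': [('l', 7)], 'j': [('m', 6)],  # assumed terminal arcs
}

J = {'k': 0, 'l': 0, 'm': 0}   # optimal cost-to-go at the terminal nodes
best = {}
for node in ['g', 'h', 'i', 'j', 'd', 'e', 'f', 'b', 'c', 'a']:   # backward pass
    J[node], best[node] = min((c + J[s], s) for s, c in edges[node])

path, n = ['a'], 'a'
while n in best:               # forward read-out of the optimal route
    n = best[n]
    path.append(n)
print(J['a'], ''.join(path))   # expect 20 and the route 'abehk'
```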
We now consider a class of discrete-time dynamical systems modeled by the difference equation
x(k + 1) = f (k, x(k), u(k)), (5.74)
with a given initial condition x(k0 ) = x 0 , and the associated performance index
$$J = J_0 = \Phi(N, x(N)) + \sum_{k=0}^{N-1} F(k, x(k), u(k)).$$
Let Jk*(x(k)) denote the minimum cost of transferring the system from x(k) to the terminal point x(N). Because x(N) is the terminal point, we have JN*(x(N)) = Φ(N, x(N)). Using the PO, we obtain the cost of transfer from x(N − 1) to the terminal point:
$$J_{N-1}^*(x(N-1)) = \min_{u(N-1)}\bigl\{F(N-1, x(N-1), u(N-1)) + J_N^*(x(N))\bigr\} = \min_{u(N-1)}\bigl\{F(N-1, x(N-1), u(N-1)) + \Phi(N, x(N))\bigr\}.$$
When applying the PO to solving optimal control problems for discrete-time dynamical systems, we usually perform two stage-by-stage passes through the time stages. We begin with a backward pass. For this, we use (5.74) to eliminate x(N). Next, we carry out the minimization to find u*(N − 1). We then repeat the process. A typical step of the backward pass is described as
$$J_k^*(x(k)) = \min_{u(k)}\bigl\{F(k, x(k), u(k)) + J_{k+1}^*(x(k+1))\bigr\} = \min_{u(k)}\bigl\{F(k, x(k), u(k)) + J_{k+1}^*\bigl(f(k, x(k), u(k))\bigr)\bigr\}.$$
The backward pass is completed when the initial time k0 is reached. Now, because x 0 = x(k0 )
is known, we find u0 = u(k0 ) in terms of this state. We can then proceed with the forward pass.
We use x 0 , u0 , and (5.74) to compute x(k0 + 1). The state x(k0 + 1) is then used to compute
u(k0 + 1) from x(k0 + 2) = f (k0 + 1, x(k0 + 1), u(k0 + 1)), and so on.
◆ Example 5.17
Consider a scalar discrete-time dynamical system modeled by
$$x(k+1) = 2x(k) - 3u(k), \qquad x(0) = 4,$$
with the associated performance index
$$J = J_0 = (x(2) - 10)^2 + \frac{1}{2}\sum_{k=0}^{1}\bigl(x^2(k) + u^2(k)\bigr).$$
Note that the final state, in this example, is free. We use the PO to find the optimal u(0) and u(1). There are no constraints on u(k). We begin with the backward pass. We have
$$J^*(x(2)) = (x(2) - 10)^2 = (2x(1) - 3u(1) - 10)^2.$$
Then,
$$J^*(x(1)) = \min_{u(1)}\bigl\{\tfrac{1}{2}\bigl(x^2(1) + u^2(1)\bigr) + J^*(x(2))\bigr\} = \min_{u(1)}\bigl\{\tfrac{1}{2}\bigl(x^2(1) + u^2(1)\bigr) + (2x(1) - 3u(1) - 10)^2\bigr\}.$$
Because there are no constraints on u(1), we find the optimal u(1) as a function of x(1) by applying the first-order necessary condition for unconstrained optimization to get
$$\frac{\partial}{\partial u(1)}\Bigl\{\tfrac{1}{2}\bigl(x^2(1) + u^2(1)\bigr) + (2x(1) - 3u(1) - 10)^2\Bigr\} = 0.$$
Hence,
$$u(1) = \tfrac{1}{19}\bigl(12x(1) - 60\bigr),$$
and therefore
$$J^*(x(1)) = \frac{1}{2}\left(x^2(1) + \left(\frac{12x(1) - 60}{19}\right)^2\right) + \left(2x(1) - 3\,\frac{12x(1) - 60}{19} - 10\right)^2.$$
Using
$$J^*(x(0)) = \min_{u(0)}\bigl\{\tfrac{1}{2}\bigl(x^2(0) + u^2(0)\bigr) + J^*(x(1))\bigr\}, \qquad x(1) = 2x(0) - 3u(0) = 8 - 3u(0),$$
and the fact that there are no constraints on u(0), we again use the first-order necessary condition for unconstrained optimization to find the optimal u(0). We compute
$$\frac{\partial}{\partial u(0)}\Bigl\{\tfrac{1}{2}\bigl(x^2(0) + u^2(0)\bigr) + J^*(x(1))\Bigr\} = 0$$
to get
$$\frac{\partial}{\partial u(0)}\left\{\frac{1}{2}\bigl(16 + u^2(0)\bigr) + \frac{1}{2}\left(\bigl(8 - 3u(0)\bigr)^2 + \left(\frac{12(8 - 3u(0)) - 60}{19}\right)^2\right) + \left(2(8 - 3u(0)) - 3\,\frac{12(8 - 3u(0)) - 60}{19} - 10\right)^2\right\} = 0.$$
Solving the above equation yields u*(0) = 2.0153. Hence, x(1) = 2x(0) − 3u*(0) = 1.9542, and
$$u(1) = \frac{12x(1) - 60}{19} = -1.9237.$$
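The two-stage solution above can be checked by brute force. The sketch below (Python; it assumes the dynamics x(k+1) = 2x(k) − 3u(k) and x(0) = 4 implied by the example's algebra) performs a dense grid search over (u(0), u(1)).

```python
# Brute-force check of Example 5.17 by grid search over (u(0), u(1)).
import numpy as np

grid = np.linspace(-5.0, 5.0, 1001)
u0, u1 = np.meshgrid(grid, grid)
x0 = 4.0
x1 = 2*x0 - 3*u0
x2 = 2*x1 - 3*u1
J = (x2 - 10)**2 + 0.5*(x0**2 + u0**2) + 0.5*(x1**2 + u1**2)
i = np.unravel_index(np.argmin(J), J.shape)
print("u(0) ~", u0[i], "  u(1) ~", u1[i])   # approximately 2.02 and -1.92, as above
```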
We now apply the above procedure to the linear quadratic problem with the system model x(k + 1) = Ax(k) + Bu(k) and the performance index
$$J = \frac{1}{2}x^T(N) H_N x(N) + \frac{1}{2}\sum_{k=0}^{N-1}\bigl(x^T(k) Q x(k) + u^T(k) R u(k)\bigr).$$
Therefore,
$$J^*(x(N)) = \tfrac{1}{2}x^T(N) H_N x(N).$$
Because there are no constraints on control, we can apply the first-order necessary condition from static optimization to find u*(N − 1) as a function of x(N − 1),
$$\frac{\partial J_{N-1}}{\partial u(N-1)} = \bigl(A x(N-1) + B u(N-1)\bigr)^T H_N B + u^T(N-1) R = 0^T.$$
Solving for the control satisfying the first-order necessary condition, we obtain
$$u^*(N-1) = -\bigl(R + B^T H_N B\bigr)^{-1} B^T H_N A\,x(N-1).$$
Let
$$K_{N-1} = \bigl(R + B^T H_N B\bigr)^{-1} B^T H_N A;$$
then
$$u^*(N-1) = -K_{N-1}\,x(N-1).$$
Computing the optimal cost of transferring the system from N − 1 to N yields
$$J_{N-1}^*(x(N-1)) = \tfrac{1}{2}x^T(N-1)\Bigl[(A - B K_{N-1})^T H_N (A - B K_{N-1}) + K_{N-1}^T R K_{N-1} + Q\Bigr]x(N-1).$$
Let
$$H_{N-1} = (A - B K_{N-1})^T H_N (A - B K_{N-1}) + K_{N-1}^T R K_{N-1} + Q;$$
then
$$J_{N-1}^*(x(N-1)) = \tfrac{1}{2}x^T(N-1) H_{N-1}\,x(N-1).$$
Decrementing k to N − 2 yields an expression of the same form. Note that JN−2 has the same form as JN−1. Thus, we obtain the analogous optimal feedback gain with N replaced by N − 1. Continuing in this fashion, we get the following results for each k = N − 1, N − 2, ..., 0:
$$K_k = \bigl(R + B^T H_{k+1} B\bigr)^{-1} B^T H_{k+1} A,$$
$$u^*(k) = -K_k\,x(k),$$
$$H_k = (A - B K_k)^T H_{k+1} (A - B K_k) + K_k^T R K_k + Q,$$
and
$$J_k^*(x(k)) = \tfrac{1}{2}x^T(k) H_k\,x(k).$$
The above control scheme can be implemented by computing the sequence of gain matrices {Kk} off-line and storing it. Then, we can implement the controller u*(k) = −Kk x(k). Note that the time-varying nature of the feedback controller may make an implementation of the control strategy quite complicated, especially for large n and N. However, as we go to the limit as N → ∞, we can simplify the solution to the problem. Specifically, let us consider a (reachable) system model
$$x(k+1) = A x(k) + B u(k), \qquad k \ge 0, \qquad x(0) = x_0,$$
and the performance index
$$J = \frac{1}{2}\sum_{k=0}^{\infty}\bigl(x^T(k) Q x(k) + u^T(k) R u(k)\bigr)$$
obtained from the previously considered performance index by letting N → ∞ and setting HN = 0. Then, the optimal controller takes the form
$$u^*(k) = -K_\infty x(k), \qquad k \ge 0,$$
where
$$K_\infty = \bigl(R + B^T H_\infty B\bigr)^{-1} B^T H_\infty A \qquad (5.75)$$
and H∞ is defined as the limit obtained by continuing the recursion
$$H_k = (A - B K_k)^T H_{k+1} (A - B K_k) + K_k^T R K_k + Q. \qquad (5.76)$$
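The backward recursion just derived is easy to program. The sketch below (Python; illustrative matrices and horizon) computes the gain sequence {Kk} offline from HN backwards; for large N the first gain approaches the steady-state gain K∞ of (5.75).

```python
# Sketch: finite-horizon backward Riccati recursion for the gain sequence {K_k}.
import numpy as np

A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.5], [1.0]])
Q = np.eye(2); R = np.array([[1.0]]); H_N = np.zeros((2, 2))
N = 20

H = H_N
gains = [None] * N
for k in range(N - 1, -1, -1):
    K = np.linalg.solve(R + B.T @ H @ B, B.T @ H @ A)       # K_k from H_{k+1}
    H = (A - B @ K).T @ H @ (A - B @ K) + K.T @ R @ K + Q   # H_k
    gains[k] = K

print("K_0 =", gains[0])    # for large N this approaches the steady-state gain K_inf
```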
Consider now a dynamical system modeled by ẋ = f(t, x, u), where the controls are constrained to a set U of admissible controllers, and suppose that the objective is to transfer the system from an arbitrary initial state to a prescribed final state xf in minimum time. We call such a controller time optimal. We assume that for any x ≠ xf there exists an admissible, time-optimal control transferring the system from x to xf in minimum time. We denote by J*(t, x) = J*(t, x(t)) the time of the optimal transfer from x(t) to the specified xf. Observe that J*(t, x) = J*(t, x1, x2, ..., xn), that is,
$$J^*: \mathbb{R}^{n+1} \to \mathbb{R},$$
which means that J* is a real-valued function of n + 1 variables. We assume that the function J* is continuous and has continuous partial derivatives everywhere except at the point xf.
Figure 5.24 Illustration of the derivation of the condition that a controller must satisfy when transferring a dynamical system from an initial state to the final state in an optimal manner.
Now, let x0 = x(t0) ≠ xf be an arbitrary initial state in Rⁿ. Assume that we applied some admissible u0 ∈ U on the time interval [t0, t]. As a result of this control action, the system is transferred from x(t0) to x(t). The time spent on the motion from x(t0) to x(t) is t − t0. We then apply an admissible optimal control that transfers the system in minimum time J*(t, x(t)) from x(t) to xf. The above considerations are illustrated in Figure 5.24. Denoting the optimal time of transferring the system from x(t0) to xf by J*(t0, x(t0)), we obtain
$$(t - t_0) + J^*(t, x(t)) \ge J^*(t_0, x(t_0)).$$
Rearranging the terms of the above inequality and dividing by (t − t0) yields
$$1 + \frac{J^*(t, x(t)) - J^*(t_0, x(t_0))}{t - t_0} \ge 0.$$
Letting t → t0 and taking into account that ẋi = fi(t, x, u), i = 1, 2, ..., n, we obtain
$$1 + \frac{\partial J^*(t, x)}{\partial t} + \sum_{i=1}^{n}\frac{\partial J^*(t, x)}{\partial x_i}\dot{x}_i = 1 + \frac{\partial J^*(t, x)}{\partial t} + \sum_{i=1}^{n}\frac{\partial J^*(t, x)}{\partial x_i}f_i = 1 + \frac{\partial J^*}{\partial t} + \frac{\partial J^*}{\partial x}f \ge 0.$$
We conclude that for any x0 ≠ xf and for any u ∈ U,
$$1 + \frac{\partial J^*}{\partial t} + \frac{\partial J^*}{\partial x}f \ge 0.$$
We next define the Hamiltonian function for the time-optimal control problem to be
$$H(t, x, u, p) = 1 + p^T f(t, x, u),$$
where
$$p = \begin{bmatrix}\dfrac{\partial J^*}{\partial x_1} & \dfrac{\partial J^*}{\partial x_2} & \cdots & \dfrac{\partial J^*}{\partial x_n}\end{bmatrix}^T.$$
Thus, we can say that for any x0 ≠ xf and for any u ∈ U,
$$\frac{\partial J^*(t, x)}{\partial t} + H(t, x, u) \ge 0.$$
Figure 5.25 An optimal trajectory from x(t0) through x(t) to xf, together with a hypothetical faster path (dashed).
In the following, we use the PO. Suppose that u(t) ∈ U is an optimal controller that transfers the system from the initial state x(t0) ≠ xf to the given final state xf in a time tf. Thus, we have J*(t0, x(t0)) = tf − t0. The transfer from x(t) to xf requires (tf − t) time units. See Figure 5.25 for an illustration of the arguments. Hence,
$$t_f - t = J^*(t_0, x(t_0)) - (t - t_0),$$
and it is not possible to go from x(t) to xf in a time less than J*(t0, x(t0)) − (t − t0). For if such a more rapid motion were to exist, indicated by the dashed curve in Figure 5.25, then by moving from x0 to x(t) during the time t − t0 and then moving from x(t) to xf during a time less than J*(t0, x(t0)) − (t − t0), we would accomplish the transfer from x0 to xf during a time less than J*(t0, x(t0)), which is impossible. Therefore, we can write
$$t_f - t = J^*(t, x(t)) = J^*(t_0, x(t_0)) - (t - t_0).$$
Rearranging terms, dividing by t − t0, and letting t → t0 yields
$$\frac{\partial J^*(t, x)}{\partial t} + \min_{u\in U} H(t, x, u) = 0 \qquad\text{for any } x \ne x_f.$$
Theorem 5.5 Let ẋ = f(t, x, u) be a control system model, let U be a set of admissible controllers, and let xf be a prescribed final state. If we suppose that
1. for any x ≠ xf there exists an admissible optimal control transferring x to xf, and
2. the function J*(t, x(t)), which is the time of optimal transfer of the system from x, at time t, to xf, is continuously differentiable except at the point xf,
then
$$\frac{\partial J^*(t, x)}{\partial t} + H(t, x, u) \ge 0 \qquad\text{for any } u \in U \text{ and } x \ne x_f,$$
$$\frac{\partial J^*(t, x)}{\partial t} + \min_{u\in U} H(t, x, u) = 0 \qquad\text{along any optimal transfer with } x \ne x_f.$$
The above theorem is a special case of a more general result known as the continuous form
of the dynamic programming. We now use the PO to derive a continuous form of dynamic
programming for a class of problems with more general performance indices.
Define
$$J(t, x(t), u(\tau)) = \Phi(t_f, x(t_f)) + \int_t^{t_f} F(\tau, x(\tau), u(\tau))\,d\tau,$$
and let J*(t, x(t)) denote its minimal value over the admissible controls. Splitting the interval of integration at t + Δt, and noting that by the PO the trajectory on the interval [t + Δt, tf] must be optimal, we obtain
$$J^*(t, x(t)) = \min_u\left\{\int_t^{t+\Delta t} F\,d\tau + J^*(t+\Delta t, x(t+\Delta t))\right\}.$$
We next expand J*(t + Δt, x(t + Δt)) in a Taylor series about the point (t, x(t)) to get
$$J^*(t, x(t)) = \min_u\left\{\int_t^{t+\Delta t} F\,d\tau + J^* + \frac{\partial J^*}{\partial t}\Delta t + \frac{\partial J^*}{\partial x}\bigl(x(t+\Delta t) - x(t)\bigr) + \text{H.O.T.}\right\},$$
where H.O.T. stands for higher-order terms. Because J*(t, x(t)) is independent of u, we cancel the terms J*(t, x(t)) out and use the fact that x(t + Δt) − x(t) ≈ ẋΔt to obtain
$$0 = \min_u\left\{\int_t^{t+\Delta t} F\,d\tau + \frac{\partial J^*}{\partial t}\Delta t + \frac{\partial J^*}{\partial x}\dot{x}\,\Delta t + \text{H.O.T.}\right\}.$$
By assumption, Δt is small; hence we can write
$$0 = \min_u\left\{F\,\Delta t + \frac{\partial J^*}{\partial t}\Delta t + \frac{\partial J^*}{\partial x}f\,\Delta t + \text{H.O.T.}\right\}.$$
Dividing the above equation by Δt and letting Δt → 0 yields
$$0 = \frac{\partial J^*}{\partial t} + \min_u\left\{F + \frac{\partial J^*}{\partial x}f\right\}. \qquad (5.81)$$
Let H = F + (∂J*/∂x) f denote the Hamiltonian function. Then, we write the partial differential equation (5.81) as
$$0 = \frac{\partial J^*}{\partial t} + \min_u H, \qquad (5.82)$$
which is known as the Hamilton–Jacobi–Bellman (HJB) equation.
◆ Example 5.18
We consider a general class of dynamical systems modeled by the scalar differential equation
$$\dot{x} = a x + b u, \qquad x(t_0) = x_0,$$
where the initial time t0 equals zero, the final time tf < ∞ is fixed, and the final state x(tf) is free. The performance index is
$$J(t) = \frac{1}{2}f\,x^2(t_f) + \frac{1}{2}\int_t^{t_f}\bigl(q\,x^2(\tau) + r\,u^2(\tau)\bigr)\,d\tau,$$
with r > 0. Our goal is to construct the control u* that minimizes J(0).
We first form the Hamiltonian function for the problem:
$$H\!\left(t, x, u, \frac{\partial J^*}{\partial x}\right) = \frac{1}{2}q x^2 + \frac{1}{2}r u^2 + \frac{\partial J^*}{\partial x}(a x + b u).$$
Because there are no constraints on the control u, we can determine the form
of the optimal control by applying the first-order necessary condition for static
optimization. We compute
$$\frac{\partial H}{\partial u} = r u + b\,\frac{\partial J^*}{\partial x} = 0.$$
Hence,
$$u^* = -\frac{1}{r}\,b\,\frac{\partial J^*}{\partial x}.$$
Note that
$$\frac{\partial^2 H}{\partial u^2} = r > 0;$$
that is, the second-order sufficient condition for static minimization is satisfied. We have thus reduced the problem of constructing the optimal control to that of finding ∂J*/∂x. Substituting u* into the HJB equation yields
$$\frac{\partial J^*}{\partial t} + \frac{1}{2}q x^2 + \frac{b^2}{2r}\left(\frac{\partial J^*}{\partial x}\right)^2 + \frac{\partial J^*}{\partial x}\,a x - \frac{b^2}{r}\left(\frac{\partial J^*}{\partial x}\right)^2 = \frac{\partial J^*}{\partial t} + \frac{1}{2}q x^2 - \frac{b^2}{2r}\left(\frac{\partial J^*}{\partial x}\right)^2 + \frac{\partial J^*}{\partial x}\,a x = 0.$$
Let us now assume that J ∗ is a quadratic function of the state for all t ≤ t f , that is,
    J* = ½ p(t) x²,
so that
    ∂J*/∂t = ½ ṗ(t) x²  and  ∂J*/∂x = p(t) x.
Substituting the above into the HJB equation and performing some manipulations
gives
    (½ ṗ(t) + ½ q + a p(t) − b² p²(t)/(2r)) x² = 0.
Because the above equation must hold for all x(t), we require
    ½ ṗ(t) + ½ q + a p(t) − b² p²(t)/(2r) = 0,
or, equivalently,
    ṗ(t) + q + 2a p(t) − b² p²(t)/r = 0    (5.83)
subject to the boundary condition p(t_f) = f, which we obtain by comparing J* = ½ p(t) x² and
J(t_f) = ½ f x²(t_f). To solve the above equation we assume that
    p(t) = ρ ẇ(t)/w(t),
where ρ is a constant to be selected. Then
    ṗ + q + 2a p − b² p²/r
        = ρ (ẅ w − ẇ²)/w² + q + 2a ρ ẇ/w − b² ρ² ẇ²/(r w²)
        = (ρ ẅ w − ρ ẇ² + q w² + 2a ρ ẇ w − b² ρ² ẇ²/r)/w²
        = 0.
We now choose ρ so that the terms nonlinear in ẇ cancel out. For this it is enough
that
−ρ ẇ2 − b 2 ρ 2 ẇ2 /r = 0.
Hence, if we set
    ρ = −r/b²,
then the equation to be solved takes the form
    (−(r/b²) ẅ − (2ar/b²) ẇ + q w) w / w² = 0.
The above will be satisfied if
    −(r/b²) ẅ − (2ar/b²) ẇ + q w = 0,
or, equivalently,
    ẅ + 2a ẇ − (q b²/r) w = 0.
Hence,
    w(t) = C1 e^{s1 t} + C2 e^{s2 t},  where  s_{1,2} = −a ± √(a² + q b²/r),
and
    p(t) = −(r/b²) ẇ(t)/w(t) = −(r/b²)(C1 s1 e^{s1 t} + C2 s2 e^{s2 t})/(C1 e^{s1 t} + C2 e^{s2 t}).
The boundary condition p(t_f) = f then yields an expression for C1 in terms of C2.
We substitute the above expression for C1 into p(t) and cancel out C2 . Then, we
use the obtained result to construct the optimal state-feedback controller:
    u* = −(1/r) b ∂J*/∂x = −(1/r) b p(t) x.
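The scalar Riccati equation (5.83) is also easy to integrate numerically. The following MATLAB
sketch integrates (5.83) backward from the boundary condition p(t_f) = f and plots the resulting
time-varying feedback gain; the parameter values a, b, q, r, f, and tf below are illustrative
assumptions, not data taken from the example.

    % Backward integration of the scalar Riccati equation (5.83).
    % Illustrative parameter values (assumptions, not from the text):
    a = -1;  b = 1;  q = 2;  r = 1;  f = 0;  tf = 5;

    % (5.83): pdot = -q - 2*a*p + (b^2/r)*p^2, with p(tf) = f.
    pdot = @(t, p) -q - 2*a*p + (b^2/r)*p.^2;
    [tp, p] = ode45(pdot, [tf 0], f);      % integrate backward from t = tf

    k = (b/r)*p;                           % time-varying gain, u*(t) = -k(t)*x(t)
    plot(tp, k), xlabel('t'), ylabel('k(t)'), title('Optimal feedback gain')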
◆ Example 5.19
Consider the schematic of an armature-controlled DC motor, given in Figure 5.26,
where the system parameters are Ra = 2 Ω, La ≈ 0 H, Kb = 2 V/rad/sec, and
Ki = 2 N·m/A, and the equivalent moment of inertia referred to the motor shaft is
Ieq = 1 kg·m². The friction is assumed to be negligible.
Figure 5.26 Schematic of an armature-controlled DC motor system of Example 5.19.
With La ≈ 0, the armature current is ia = (ea − Kb ω)/Ra, and the equation of motion is
Ieq ω̇ = Ki ia; for the given parameter values this reduces to ω̇ = ea − 2ω, where ea is the
armature voltage and ω is the angular velocity. The control interval is [0, 10] sec, the final
state is free, and there are no constraints on ea. Assume J* = ½ p(t) ω².
Hence,
    ea = 2ω − ∂J*/∂ω.    (5.85)
Because J* = ½ p(t) ω², we have
    ∂J*/∂t = ½ ṗ ω²  and  ∂J*/∂ω = p ω.
Substituting these into the HJB equation yields
    ṗ − p² = 0.    (5.86)
We assume that p = ρ ẇ/w. Therefore,
    ṗ = ρ (ẅ w − ẇ²)/w².
Selecting ρ = −1 eliminates the nonlinear terms in the numerator, and we are left
with the equation
ẅ = 0,
whose solution is
w = w(t) = C1 t + C2 ,
so that
    p = −ẇ/w = −C1/(C1 t + C2).
Because
    p(10) = 1 = −C1/(10 C1 + C2),
Hence,
C2 = −11C1 ,
and
    p = p(t) = −C1/(C1 t + C2) = −C1/(C1 t − 11 C1) = −1/(t − 11).
Therefore,
    ea* = 2ω − ∂J*/∂ω = 2ω − p ω = (2 + 1/(t − 11)) ω.
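A quick numerical check of this result is straightforward. The sketch below simulates the
closed-loop motor under the time-varying feedback found above; the reduced model ω̇ = ea − 2ω
follows from the parameter values of the example, while the initial angular velocity w0 is an
illustrative assumption.

    % Closed-loop simulation of Example 5.19 (sketch).
    w0 = 5;                                   % assumed initial angular velocity
    ea = @(t, w) (2 + 1./(t - 11)).*w;        % optimal armature voltage from the example
    wdot = @(t, w) ea(t, w) - 2*w;            % motor dynamics: wdot = ea - 2*w
    [t, w] = ode45(wdot, [0 10], w0);
    plot(t, w), xlabel('t (sec)'), ylabel('\omega (rad/sec)')
    % Along the optimal trajectory wdot = w/(t - 11), so w(t) = w0*(1 - t/11).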
In the above examples, the final time t f was fixed and t f < ∞. In the following example, we
consider the linear quadratic regulator design for t f = ∞ using the HJB equation.
◆ Example 5.20
For the dynamical system model
    ẋ = [0 1; 0 −1] x + [0; 1] u
with the performance index
    J = ½ ∫_0^∞ (x1² + x2² + u²) dt,
we will use the HJB equation to find the optimal state-feedback controller. We first
form the Hamiltonian function
    H = ½ (x1² + x2² + u²) + (∂J*/∂x1) x2 + (∂J*/∂x2)(−x2 + u).
Because the final time is infinite and the problem is time invariant, J* does not depend
explicitly on t. Hence,
    ∂J*/∂t = 0,
and the HJB equation takes the form
    min_u H = 0.
Because u is unconstrained, we set ∂H/∂u = u + ∂J*/∂x2 = 0. Hence,
    u* = −∂J*/∂x2.
We seek J* as a quadratic form in the state,
    J* = ½ (α x1² + 2β x1 x2 + γ x2²).
Note that
    ∂J*/∂x1 = α x1 + β x2  and  ∂J*/∂x2 = β x1 + γ x2.
Substituting u* = −(β x1 + γ x2) into the HJB equation and equating the coefficients of x1²,
x1 x2, and x2² to zero yields three candidate solutions:
1. α = 2, β = γ = 1.
2. α = −2, β = 1, γ = −3.
3. α = 0, β = γ = −1.
The corresponding state-feedback controllers u = −∂J*/∂x2 = −(β x1 + γ x2) are
    u1 = −x1 − x2,
    u2 = −x1 + 3x2,
    u3 = x1 + x2.
However, only u1 stabilizes the plant. Hence, the optimal control law for the
problem is
    u* = −x1 − x2.
Note that only for this controller is the boundary condition J*(x(t_f)) = J*(x(∞)) = 0
satisfied.
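This conclusion is easy to verify numerically. The following MATLAB sketch (the initial state
x0 is an illustrative assumption) confirms that only the first candidate gives a positive
definite quadratic form and that only u1 yields a stable closed loop.

    % Numerical check of Example 5.20 (sketch).
    A = [0 1; 0 -1];  B = [0; 1];

    % Candidate quadratic forms J* = 0.5*x'*P*x for the three solutions:
    P1 = [2 1; 1 1];  P2 = [-2 1; 1 -3];  P3 = [0 -1; -1 -1];
    disp(eig(P1)), disp(eig(P2)), disp(eig(P3))   % only P1 has positive eigenvalues

    K = [1 1];                     % u* = -K*x = -x1 - x2
    disp(eig(A - B*K))             % closed-loop eigenvalues: -1, -1 (stable)

    x0 = [1; -2];                  % assumed initial state
    [t, x] = ode45(@(t, x) (A - B*K)*x, [0 10], x0);
    plot(t, x), xlabel('t'), legend('x_1', 'x_2')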
5.5 Pontryagin's Minimum Principle
We now derive necessary conditions of optimality for the problem of minimizing a performance
index of the form
    J = Θ(x(t_f)) + ∫_{t0}^{t_f} F(x, u) dt    (5.87)
subject to
    ẋ = f(x, u), x(t0) = x0, and x(t_f) = x_f.    (5.88)
We consider two cases: fixed final state and free final state.
It follows from our discussion of the HJB equation that the optimal controller, u∗ , minimizes
the Hamiltonian function, that is,
u∗ = arg min H (x, u, p),
u∈U
where the costate vector p is defined by
    p^T = ∂J*/∂x.    (5.89)
To derive a differential equation for p along the optimal trajectory, we compute
    ∂H/∂x = ∂/∂x (F + p^T f)
          = ∂F/∂x + f^T ∂p/∂x + p^T ∂f/∂x,    (5.90)
where ∂ F/∂ x = ∇ xT F. From our previous discussion it follows that the Hamiltonian function
attains its minimum on the optimal trajectory. Taking this into account, we conclude that for the
optimal process we obtain
    ∂H/∂x = 0^T for (x, u) = (x*, u*).
Hence, for the optimal process (x ∗ , u∗ ) we have
    ∂F/∂x + f^T ∂p/∂x + p^T ∂f/∂x = 0^T.    (5.91)
We now differentiate p with respect to time to get
    ṗ = (∂p/∂x) ẋ = (∂p/∂x) f.    (5.92)
Combining (5.91) and (5.92) and taking into account (5.89), we obtain
    ṗ = −(∂F/∂x)^T − (∂f/∂x)^T p.    (5.93)
Thus, p is the solution to the differential equation (5.93), and hence p = p(t). Therefore
∂ p/∂ x = 0, and by (5.90) we have
    ∂H/∂x = ∂F/∂x + p^T ∂f/∂x.    (5.94)
Comparing (5.93) and (5.94) gives
    ṗ = −(∂H/∂x)^T.    (5.95)
The above equation is known in the literature as the adjoint, or costate, equation.
We summarize our development in the following theorem.
Theorem 5.6 Necessary conditions for u ∈ U to minimize (5.87) subject to (5.88) are
    ṗ = −(∂H/∂x)^T,
where H = H(x, u, p) = F(x, u) + p^T f(x, u), and
    H(x*, u*, p*) = min_{u∈U} H(x, u, p).
If the final state, x(t_f), is free, then in addition to the above conditions it is required
that the following end-point condition is satisfied:
    p(t_f) = ∇_x Θ |_{t=t_f}.
We now illustrate the above theorem with the following well-known example.
◆ Example 5.21
We consider a controlled object modeled by
    ẋ = [0 1; 0 0] x + [0; 1] u = Ax + bu
with the performance index
    J = ∫_0^{t_f} dt,
the control constraint |u(t)| ≤ 1, and the boundary conditions x(0) = x0 and x(t_f) = 0.
The Hamiltonian for this minimum-time problem is
    H = 1 + p1 x2 + p2 u,
so the costate equations give ṗ1 = 0 and ṗ2 = −p1; hence p1 = d1 and p2 = −d1 t + d2,
where d1 and d2 are constants. Minimizing H over |u| ≤ 1 yields
    u* = −sign(p2) = −sign(−d1 t + d2).
Thus, the optimal control law is piecewise constant taking the values 1 or −1. This
control law has at most two intervals of constancy because the argument of the sign
function is a linear function, −d1 t + d2 , that changes its sign at most once. This
type of control is called a bang-bang control because it switches back and forth
between its extreme values. System trajectories for u = 1 and u = −1 are families
of parabolas, and their equations were derived in Example 2.2. A phase portrait for
u = 1 is shown in Figure 2.4, while that for u = −1 is shown in Figure 2.5. Note
that only one parabola from each family passes through the specified terminal
point x = [0 0]T in the state plane. Segments of the two parabolas through the
origin form the switching curve,
    x1 = −½ x2² sign(x2).
This means that if an initial state is above the switching curve, then u = −1 is used
until the switching curve is reached. Then, u = 1 is used to reach the origin. For an
initial state below the switching curve, the control u = 1 is used first to reach the
switching curve, and then the control is switched to u = −1. The above control
action can be described as
1 if v < 0,
u = −sign( x2 ) if v = 0,
−1 if v > 0,
3
2.5
2
1.5
1
.5
x2 0
.5
1
1.5
2
2.5
3
6 5 4 3 2 1 0 1 2 3 4 5 6
x1
Figure 5.27 A phase-plane portrait of the time optimal closed-loop system of Example 5.21.
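The switching logic above is easy to simulate. The following MATLAB sketch (the initial state
and the fixed integration step are illustrative assumptions) integrates the double integrator
under the bang-bang law and traces the resulting trajectory in the phase plane; near the
switching curve the relay control chatters numerically, which is expected for this kind of
simulation.

    % Bang-bang time-optimal control of the double integrator (sketch).
    x = [4; 2];  dt = 1e-3;  X = x;  T = 0;        % assumed initial state and step
    while norm(x) > 1e-2 && T(end) < 20
        v = x(1) + 0.5*x(2)^2*sign(x(2));          % switching function
        if v ~= 0
            u = -sign(v);
        else
            u = -sign(x(2));                       % slide along the switching curve
        end
        x = x + dt*[x(2); u];                      % Euler step of x1dot = x2, x2dot = u
        X = [X x];  T = [T, T(end) + dt];          %#ok<AGROW>
    end
    plot(X(1,:), X(2,:)), grid on, xlabel('x_1'), ylabel('x_2')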
◆ Example 5.22
Consider the schematic of an armature-controlled DC motor, given in Figure 5.26.
Use the minimum principle to find the armature voltage ea (t) such that the motor
angular velocity ω changes from ω(0) = 0 rad/sec at t = 0 to ω(1) = 10 rad/sec
while minimizing the energy dissipated in the armature resistor, Ra . There are no
constraints on ea . Note that the energy dissipated in the armature resistor is
    J = ∫_0^1 Ra ia²(t) dt,
where ia = (ea − Kb ω)/Ra is the armature current, so that the motor dynamics are
Ieq ω̇ = Ki ia and the Hamiltonian function is
    H = Ra ia² + p (Ki/(Ieq Ra))(ea − Kb ω).
Because there are no constraints on ea, we can find the optimal control by solving
the equation
    0 = ∂H/∂ea = 2(ea − Kb ω)/Ra + (Ki/(Ieq Ra)) p.
Hence,
ea = 2ω − p. (5.96)
The costate equation is
    ṗ = −∂H/∂ω = 2Kb(ea − Kb ω)/Ra + (Ki Kb/(Ra Ieq)) p = 2(ea − 2ω) + 2p.    (5.97)
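The necessary conditions (5.96) and (5.97) can be carried to a numerical answer. Substituting
ea = 2ω − p into the motor equation ω̇ = ea − 2ω gives ω̇ = −p, while (5.97) reduces to ṗ = 0,
so the costate is constant; the boundary conditions ω(0) = 0 and ω(1) = 10 then force p = −10.
The MATLAB sketch below checks this reasoning by integrating the reduced equations.

    % Sketch: minimum-energy transfer of Example 5.22 via the necessary conditions.
    p = -10;                                % constant costate implied by the boundary data
    rhs = @(t, z) [-z(2); 0];               % z = [w; p]: wdot = -p, pdot = 0
    [t, z] = ode45(rhs, [0 1], [0; p]);

    w  = z(:, 1);
    ea = 2*w - p;                           % optimal armature voltage, ea = 2*w + 10
    plot(t, [w ea]), xlabel('t (sec)'), legend('\omega (rad/sec)', 'e_a (V)')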
◆ Example 5.23
In this example, we will solve Johann Bernoulli’s brachistochrone problem, intro-
duced in Section 5.2, using the minimum principle. We first recall the problem as
restated by Johann Bernoulli in 1697 (quoted from Dunham [70, p. 200]):
“Among the infinitely many curves which join the two given points . . . choose
the one such that, if the curve is replaced by a thin tube or groove, and a small
sphere placed in it and released, then this will pass from one point to the other in
the shortest time.”
It is difficult not to mention the reward that one obtains for solving the problem
(quoted from Dunham [70, p. 200]):
“Let who can seize quickly the prize which we have promised to the solver.
Admittedly this prize is neither of gold nor silver, for these appeal only to base and
venal souls. . . . Rather, since virtue itself is its own most desirable reward and a
fame is a powerful incentive, we offer the prize fitting for the man of noble blood,
compounded of honor, praise, and approbation . . .” We add that Johann Bernoulli
wrote the above after he already knew the solution to the problem!
Model and the Performance Index
The field is conservative; hence the sum of the kinetic and potential energies is
constant,
    ½ m v²(t) − m g y(t) = ½ m v²(t0) − m g y(t0) = 0,
where we assumed that the particle is at rest at t0. Using the above, we can find
the particle's speed at any time t ≥ t0:
    v(t) = √(2 g y(t)).
Then, the brachistochrone problem can be formulated as an optimal control prob-
lem of the form
    minimize  J(t0) = ∫_{t0}^{T} dt
    subject to
        ẋ = v cos θ,
        ẏ = v sin θ,
where θ is the angle that the particle velocity vector forms with the positive x axis,
while ẋ and ẏ are the horizontal and vertical components of the particle velocity
vector. The path angle θ is considered as the control input to be determined.
Particle Trajectory
To proceed, we form the Hamiltonian function
H = 1 + p1 v cos θ + p 2 v sin θ, (5.99)
where p 1 and p 2 are the costate variables. The costate equations are
    ṗ1 = −∂H/∂x = 0,
    ṗ2 = −∂H/∂y = −(g/v)(p1 cos θ + p2 sin θ).
It follows from the above that p 1 = K = constant. Because there are no constraints
on the control input θ, the necessary condition for the control optimality is
    ∂H/∂θ = 0,
where
    ∂H/∂θ = −p1 v sin θ + p2 v cos θ.
Hence,
p 2 = p 1 tan θ = K tan θ.
We next find an expression for θ̇. First note that
    ṗ2 = (∂p2/∂θ) θ̇ = (∂(K tan θ)/∂θ) θ̇ = (K/cos²θ) θ̇.    (5.100)
On the other hand,
    ṗ2 = −(g/v)(p1 cos θ + p2 sin θ)
       = −(g/v)(K cos θ + K tan θ sin θ)
       = −(K g/v)(cos θ + sin²θ/cos θ)
       = −(K g/v)(cos²θ + sin²θ)/cos θ
       = −K g/(v cos θ).    (5.101)
Combining (5.100) and (5.101) yields
    θ̇ = −(g/v) cos θ.    (5.102)
We now use the boundary conditions to express x and y as functions of time and
θ. First, using the above we can write
    ẏ = (dy/dθ) θ̇ = −(dy/dθ)(g/v) cos θ.
On the other hand, ẏ = v sin θ. Hence,
    dy/v² = dy/(2 g y) = −(1/g) tan θ dθ.
Integrating both sides yields
    (1/(2g)) ln|y| = (1/g) ln|cos θ| + constant;
that is,
    y = C cos²θ,
    y(T) = y1 = C cos²θ(T).
Hence,
    C = y1/cos²θ(T)
and
    y(t) = (y1/cos²θ(T)) cos²θ(t).    (5.103)
Similarly,
    ẋ = (dx/dθ) θ̇ = −(dx/dθ)(g/v) cos θ,
while, on the other hand, ẋ = v cos θ. Hence,
    dx/dθ = −v²/g = −2y = −2 (y1/cos²θ(T)) cos²θ.
Integrating with respect to θ gives
    x = −(y1/cos²θ(T)) (θ + ½ sin 2θ) + C̃,
where C̃ is a constant of integration. The boundary condition x(T) = x1 gives
    C̃ = x1 + (y1/cos²θ(T)) (θ(T) + ½ sin 2θ(T)),
and hence
    x(t) = x1 + (y1/(2 cos²θ(T))) (2(θ(T) − θ(t)) + sin 2θ(T) − sin 2θ(t)).    (5.104)
We point out that the equations for x(t), y(t), and θ̇ specify an implicit feedback control
law in the sense that the control θ can be calculated given the state (x(t), y(t)).
We now use the boundary conditions to find the description of the time-optimal
closed-loop system. We can assume, without loss of generality, that x(t0) = x(0) = 0 and
y(t0) = y(0) = 0. Then, using the expression for y(t), we note that
y(t0) = 0 = (y1/cos²θ(T)) cos²θ(t0). Thus, the angle of departure is θ(t0) = 90°. With this
initial path angle, y, and hence v, increases as quickly as possible at the outset.
Using these data, we next evaluate x at t = t0 to form an equation that can be
used to find the angle of arrival θ(T). The equation is
    x1 + (y1/(2 cos²θ(T))) (2θ(T) − π + sin 2θ(T)) = 0.
The above equation can be solved easily using MATLAB's Symbolic Toolbox or
MATLAB's fsolve function. Once θ(T) is found, we use the expression for y
to find cos θ, where
    cos θ(t) = √(y(t)/y1) cos θ(T).
Hence,
    sin θ(t) = √(1 − cos²θ(t)).
We substitute the above expression into ẋ = v cos θ and ẏ = v sin θ to obtain the
state-space model of the time optimal motion of the particle moving under the
force of gravity from a given point to another specified point. The equations are
    ẋ(t) = √(2 g y(t)) √(y(t)/y1) cos θ(T),
    ẏ(t) = √(2 g y(t)) √(1 − (y(t)/y1) cos²θ(T)).
Substituting cos θ(t) = √(y(t)/y1) cos θ(T) and v = √(2 g y(t)) into (5.102) shows that the
path angle changes at a constant rate:
    dθ/dt = −(g/√(2 g y(T))) cos θ(T).
Hence,
    θ(T) − θ(t) = −(g/√(2 g y(T))) cos θ(T) (T − t).
2gy( T )
To obtain an expression for the particle travel time, we set t = t0 = 0 in the above equation
and perform simple manipulations to obtain
    T = √(2 y(T)/g) (π/2 − θ(T))/cos θ(T).
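For a concrete terminal point, the arrival angle θ(T) and the travel time T are easily computed
numerically. In the MATLAB sketch below the terminal point (x1, y1) and the value of g are
illustrative assumptions; the equation handed to fzero is the boundary condition x(t0) = 0
derived above, which has a root θ(T) in (0, π/2) whenever x1/y1 < π/2, that is, whenever the
terminal point lies before the lowest point of the cycloid.

    % Sketch: arrival angle and travel time for the brachistochrone of Example 5.23.
    g = 9.81;  x1 = 1;  y1 = 1;        % assumed terminal point (x1, y1), start at the origin

    % Boundary condition x(t0) = 0 written as an equation in theta(T):
    thEq = @(thT) x1 + y1/(2*cos(thT)^2) * (2*thT - pi + sin(2*thT));
    thT  = fzero(thEq, [1e-6, pi/2 - 1e-3]);

    % Travel time from the expression derived above (note y(T) = y1):
    T = sqrt(2*y1/g) * (pi/2 - thT) / cos(thT);
    fprintf('theta(T) = %.4f rad,  T = %.4f sec\n', thT, T)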
We will now show that equations (5.104) and (5.103) describe a cycloid. To pro-
ceed, we define
    R = y1/(2 cos²θ(T))  and  φ(t) = π − 2θ(t).    (5.105)
Using the above notation and taking into account that sin(π − φ) = sin φ, we
rewrite (5.104) as
    x(t) = x1 − R(φ(T) − sin φ(T)) + R(φ(t) − sin φ(t)).
Figure 5.28 Constructing an upside-down cycloid—the solution to the brachistochrone problem.
Using the notation given in (5.105) and the fact that cos 2θ = 2 cos²θ − 1, we
represent (5.103) as
    y(t) = (y1/cos²θ(T)) cos²θ(t)
         = (y1/cos²θ(T)) (cos 2θ(t) + 1)/2
         = R(1 + cos(π − φ(t)))
         = R(1 − cos φ(t)).
Together with the expression for x(t), this is the parametric description of a cycloid
generated by a circle of radius R.
In the following, we discuss a general minimum time regulator problem for a class of multi-
input dynamical systems modeled by
ẋ = Ax + Bu.
The goal is to transfer the system from some initial state to the origin in minimum time subject
to
|u i | ≤ ki , ki > 0,
i = 1, 2, . . . , m.
"t
Thus, the performance index for the problem is J = t0 f dt. We begin our analysis by forming
the associated Hamiltonian function
H = 1 + pT ( Ax + Bu)
m
= 1 + pT Ax + ( pT bi )u i ,
i=1
where bi is the ith column of B. Let
si (t) = p∗T (t)bi .
Then, applying Pontryagin’s minimum principle, we obtain the following form of the optimal
control:
u i∗ (t) = −ki sign(si (t)), i = 1, 2, . . . , m.
Thus, the optimal control for the ith control channel has the bang-bang form u_i* = ±k_i. Note
that the zeros of the switching function s_i(t) = p*^T(t) b_i determine the switching times
for the ith control channel. We compute the
costate vector from the costate equation,
    ṗ*^T = −∂H/∂x = −∂/∂x (p*^T Ax) = −p*^T A,
or
ṗ∗ = − AT p∗ .
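Because ṗ* = −A^T p*, the costate is p*(t) = e^{−A^T t} p*(0), and the switching functions
s_i(t) can be evaluated directly once an initial costate is fixed. The MATLAB sketch below
does this for a double-integrator example; the matrices, the bound k, and the initial costate
p0 are illustrative assumptions.

    % Sketch: switching functions for a minimum-time problem.
    A = [0 1; 0 0];  B = [0; 1];  k = 1;   % assumed system and control bound
    p0 = [1; -0.5];                        % assumed initial costate

    t = linspace(0, 5, 200);
    s = zeros(size(B, 2), numel(t));
    for j = 1:numel(t)
        s(:, j) = B'*(expm(-A'*t(j))*p0);  % s_i(t) = b_i'*p*(t)
    end
    u = -k*sign(s);                        % bang-bang control u_i = -k_i*sign(s_i)
    plot(t, s, t, u), xlabel('t'), legend('s_1(t)', 'u_1(t)')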
We now apply the minimum principle to the linear quadratic regulator problem: minimize
    J = ½ x^T(t_f) F x(t_f) + ½ ∫_{t0}^{t_f} (x^T(t) Q x(t) + u^T(t) R u(t)) dt    (5.106)
subject to
    ẋ(t) = Ax(t) + Bu(t), x(t0) = x0, x(t_f) is free.    (5.107)
We assume that there are no constraints on the control input u. The weight matrices F and Q
are real symmetric positive semidefinite, and the matrix R is real symmetric positive definite.
The Hamiltonian function for the above linear quadratic control problem is
    H = ½ x^T Q x + ½ u^T R u + p^T(Ax + Bu),    (5.108)
where p ∈ R^n is the costate vector. It follows from Theorem 5.6 that the optimal controller
must minimize the Hamiltonian function. Because the control vector is unconstrained, the
necessary condition for a minimum, ∂H/∂u = 0^T, yields
    u* = −R⁻¹ B^T p.
Substituting u* into the state equation and combining the result with the costate equation
ṗ = −Qx − A^T p gives a linear system of differential equations in x and p with the boundary
conditions x(t0) = x0 and p(t_f) = F x(t_f).
In summary, we reduced the linear quadratic control problem to solving 2n linear differential
equations with mixed boundary conditions, which is an example of a two-point boundary value
problem, or TPBVP for short. We shall now solve the above TPBVP. If we had the initial
conditions x(t0 ) and p(t0 ), then the solution to (5.114) would have the form
    [x(t_f); p(t_f)] = e^{H(t_f − t0)} [x(t0); p(t0)],    (5.116)
where
    H = [A  −B R⁻¹ B^T; −Q  −A^T].
However, in this particular TPBVP, p(t0 ) is unknown and instead we are given p(t f ). To proceed,
let
    [x(t_f); p(t_f)] = e^{H(t_f − t)} [x(t); p(t)],    (5.117)
where
    e^{H(t_f − t)} = [Φ11(t_f, t)  Φ12(t_f, t); Φ21(t_f, t)  Φ22(t_f, t)].
Writing p(t) = P(t) x(t) and eliminating the costate from the resulting relations leads to the
matrix Riccati differential equation
    Ṗ = P B R⁻¹ B^T P − A^T P − P A − Q.    (5.128)
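Equation (5.128) is readily integrated backward in time. The sketch below assumes the standard
terminal condition P(t_f) = F, with illustrative system data; the optimal feedback is then
u = −R⁻¹ B^T P(t) x.

    % Sketch: backward integration of the Riccati differential equation (5.128).
    A = [0 1; 0 -1];  B = [0; 1];          % assumed system data
    Q = eye(2);  R = 1;  F = zeros(2);  tf = 5;

    vec = @(M) M(:);                       % stack a matrix into a column vector
    ric = @(t, p) vec( reshape(p,2,2)*B*(R\B')*reshape(p,2,2) ...
                       - A'*reshape(p,2,2) - reshape(p,2,2)*A - Q );

    [t, p] = ode45(ric, [tf 0], vec(F));   % integrate from P(tf) = F back to t = 0
    P0 = reshape(p(end, :), 2, 2);         % P(0); the gain is K(t) = inv(R)*B'*P(t)
    disp(P0)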
Notes
Sussmann and Willems [273] made the following statement, in 1997, regarding the origins of
optimal control: “Optimal control was born in 1697—300 years ago—in Groningen, a university
town in the north of The Netherlands, when Johann Bernoulli, professor of mathematics at the
local university from 1695 to 1705, published his solution of the brachystochrone problem.”
For a serious student of optimal control, the seminal text by Lee and Markus [177] is recom-
mended. Less advanced than Lee and Markus are Kirk [160], Bryson and Ho [36], Owens [218],
and Pierre [234]. The classical text by Kwakernaak and Sivan [173] is also worth perusing be-
cause of its clarity and its treatment of the stochastic aspects of control problems. Informative,
and at the same time entertaining, accounts of the origins of the calculus of variations are given
by Dunham [70, Chapter 8] and by Sussmann and Willems [273]. Elsgolc [75] provides a lucid
introduction, illustrated with many examples, to the calculus of variations. Applications of the
calculus of variations in physics and engineering are provided by Weinstock [297]. Interrela-
tions between the calculus of variations and optimal control are discussed by Hestenes [121].
Frequency-domain aspects of optimal control and robust optimal control design are discussed by
Burl [39]. Informal discussion of the Pontryagin minimum principle in Section 5.5 was adapted
from Boltyanskii [30]. For more numerical examples related to optimal control, see Lewis and
Syrmos [182], and Bryson [37]. Exciting applications of optimal control to a number of control
problems of technology and science can be found in Lee and Markus [177].
EXERCISES
5.1 Find the variations of the functionals:
" π/2
(a) v(x1 , x2 ) = 0 (x14 + 2x12 x22 + x22 ) dt,
"1
(b) v(x1 , x2 ) = 0 (e x1 −x2 − e x1 +x2 ) dt.
5.2 Find the equations of the curves that are extremals for the functional
    v(x) = ∫_0^{t_f} (¼ ẋ² − t ẋ − x + ½ x ẋ) dt
Assuming that u = − kx is such that J is minimized, find the poles of the closed-loop
system.
5.5 Given a dynamical system model
    ẋ = [0 1; 0 0] x + [0; 1] u,
    y = [2 0] x,
and the performance index
    J = ∫_0^∞ (y² + u²) dt.
Use the Kronecker product to represent the associated algebraic Riccati equation as the
nonlinear vector algebraic equation. Solve it, and construct the optimal state-feedback
control law u = −kx. Assume that
    x(0) = [1; 2].
Find the optimal value of J .
5.6 Find a state-feedback control law u so as to minimize
    J = ∫_0^1 (x² + u²) dt
subject to
ẋ = −x + u, x(0) = 2.
Assume J* = p(t) x².
5.7 Use the principle of optimality to find the set of numbers x1 , x2 , . . . , xn satisfying
x1 + x2 + · · · + xn = c, where c is a positive constant such that the product x1 x2 · · · xn
is maximized (see [20]).
5.8 Use dynamic programming to find the maximum cost path between node a and node
t in the routing network depicted in Figure 5.29, where the travel costs are shown
beside each segment and toll charges are shown by each node. The network can only
be traversed from left to right.
Figure 5.29 A routing network of Exercise 5.8.
5.9 Use dynamic programming to find u(0) and u(1) that minimize
    J = (x(2) − 2)² + Σ_{k=0}^{1} u²(k)
subject to
x(k + 1) = x(k) + u(k), x(0) = 1.
5.10 Consider the following optimal control problem:
    minimize  J2(u) = ½ Σ_{k=0}^{1} u(k)²
subject to x(k + 1) = x(k) + 2u(k), x(0) = 3, x(2) = 0.
Convert the above optimal control problem into a quadratic optimization problem with
an equality constraint using the composite input vector
    z = [u(0); u(1)].
Then, employ Lagrange’s multiplier theorem to find the optimal sequence {u ∗ (0),
u ∗ (1)}.
5.11 Consider the following optimal control problem:
    minimize  J2(u) = ½ x(2)² + ½ Σ_{k=0}^{1} (x(k)² + u(k)²)
subject to x(k + 1) = x(k) + 2u(k), x(0) = 3.
Convert the above optimal control problem into a quadratic optimization problem with
an equality constraint using the composite input-state vector
    z = [u(0); x(1); u(1); x(2)].
Then, employ Lagrange’s multiplier theorem to find the optimal sequence {u ∗ (0),
u ∗ (1)}.
5.12 Find u = u(t) so as to minimize
    J = ∫_0^1 u² dt
subject to
ẋ = −2x + u, x(0) = 10, x(1) = 0.
5.13 For the circuit shown in Figure 5.30, find the input voltage, u(t), that changes the
current in the inductor from i = 10 A at t = 0 to i = 0 at t = 1 sec while minimizing
the performance index,
    J = ∫_0^1 u²(t) dt.
Figure 5.30 The circuit of Exercise 5.13: a 2 Ω resistor in series with a 1 H inductor,
driven by the input voltage u(t); i(t) is the inductor current.
5.14 Use the minimum principle to find the input voltage u(t) that charges the capacitor, in
Figure 5.31, from x(0) = 2 V at t = 0 to x(10) = 12 V while minimizing the energy
dissipated in the resistor. There are no constraints on u(t). Note that, by Kirchhoff’s
voltage law,
Ri(t) + x(t) = u(t),
Figure 5.31 A capacitor charging circuit of Exercise 5.14: R = 100 kΩ in series with
C = 10 μF; u(t) is the source voltage, i(t) the current, and x(t) the capacitor voltage.
5.20 (Based on Rugh and Murphy [246]) Given a dynamical system model
ẋ = Ax + bu,
where
    A = [0 1 0 ··· 0 0; 0 0 1 ··· 0 0; ⋮ ; 0 0 0 ··· 0 1; −a0 −a1 −a2 ··· −a_{n−2} −a_{n−1}],
    b = [0; 0; ⋮ ; 0; 1],
and the associated performance index
    J = ½ ∫_0^∞ (x^T Q x + u²) dt,  Q = Q^T ≥ 0.
We label the above problem as ( A, b). Consider now the dynamical system model
x̃˙ = Ã x̃ + b̃ũ,
where b = b̃,
    Ã = [0 1 0 ··· 0 0; 0 0 1 ··· 0 0; ⋮ ; 0 0 0 ··· 0 1; 0 0 0 ··· 0 0],
and the associated performance index has the form
    J̃ = ½ ∫_0^∞ (x̃^T Q̃ x̃ + ũ²) dt,
where Q̃ = Q + D Ã + Ã^T D + D b b^T D, and
    D = [0 1 0 ··· 0 −a0; 0 0 1 ··· 0 −a1; ⋮ ; 0 0 0 ··· 0 −a_{n−2}; −a0 −a1 −a2 ··· −a_{n−2} −a_{n−1}].
Hint Note that à = A− bbT D and b̃ = b. Write down the algebraic Riccati equation
for the ( Ã, b̃) problem and perform appropriate substitutions.
5.21 Consider the following variation of the brachistochrone problem: A particle of mass
m is to slide down a frictionless track connecting two given points that are not located
on the vertical line. The particle’s initial speed is v0 . The motion of the particle is
constrained only by gravity and the shape of the track. Find the curve of the track that
minimizes the travel time between the two given points. Solve this brachistochrone
problem using the minimum principle of Pontryagin. Illustrate your derivations with a
numerical example and simulations using MATLAB.
5.22 (a) Consider the ground vehicle model described in Subsection 1.7.2. Let the input
signal be u = δ f (the front wheel steer angle) and δr = 0. Write a MATLAB
script that animates the lateral motion Y versus the longitudinal motion X of the
vehicle, the steer angle δ f , and the heading angle ψ. Use the steer angle shown in
Figure 5.32.
(b) Use the linear model and the data from Table 1.1 to design a linear quadratic
regulator (LQR). In this part, the control has the form
δ f = −kx + Yr e f ,
where Yr e f is the command input shown in Figure 5.33. That is, the closed-loop
Figure 5.32 Plot of the steer angle δ f versus time for Exercise 5.22.
Figure 5.33 Plot of the command input Yr e f versus time for Exercise 5.22.