Nonlinear Programming: 3rd Edition
Theoretical Solutions Manual

Dimitri P. Bertsekas
Massachusetts Institute of Technology
NOTE
This manual contains solutions of the theoretical problems, marked in the book by www. It is
continuously updated and improved, and it is posted on the internet at the book's www page
http://www.athenasc.com/nonlinbook.html
Many thanks are due to several people who have contributed solutions, and particularly to David
Brown, Angelia Nedic, Asuman Ozdaglar, and Cynara Wu.
Solutions Chapter 2
SECTION 2.1
2.1.3 www
We have that

    f(x^{k+1}) ≤ max_i (1 + λ_i P^k(λ_i))² f(x^0),    (1)

for any polynomial P^k of degree k and any k, where {λ_i} is the set of the eigenvalues of Q. Choose
P^k such that

    1 + λP^k(λ) = ((z_1 − λ)/z_1) · ((z_2 − λ)/z_2) ⋯ ((z_k − λ)/z_k).
Define I_j = [z_j − δ_j, z_j + δ_j] for j = 1, . . . , k. Since each λ_i belongs to I_j for some j, we have

    max_i (1 + λ_i P^k(λ_i))² ≤ max_{1≤j≤k} max_{λ∈I_j} (1 + λP^k(λ))².    (2)
Moreover, for λ ∈ I_j,

    (1 + λP^k(λ))² ≤ δ_j² (z_j + δ_j − z_1)² ⋯ (z_j + δ_j − z_{j−1})² / (z_1² z_2² ⋯ z_j²).

Here we used the fact that λ ∈ I_j implies λ < z_l for l = j + 1, . . . , k, and therefore
(z_l − λ)²/z_l² ≤ 1 for all l = j + 1, . . . , k. Thus, from (2) we obtain

    max_i (1 + λ_i P^k(λ_i))² ≤ R,    (3)

where

    R = max{ δ_1²/z_1², δ_2²(z_2 + δ_2 − z_1)²/(z_1² z_2²), . . . , δ_k²(z_k + δ_k − z_1)² ⋯ (z_k + δ_k − z_{k−1})²/(z_1² z_2² ⋯ z_k²) }.

The desired estimate follows from (1) and (3).
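As a numerical illustration (not part of the original argument), the following Python sketch builds a Q whose eigenvalues lie in k = 3 narrow clusters around centers z_j, runs the conjugate gradient method on f(x) = (1/2)x′Qx − b′x, and prints the cost gap f(x^k) − f(x^*), which becomes negligible after k = 3 iterations, as the estimate with small δ_j predicts. All specific choices (dimension, seed, centers, widths) are arbitrary.

import numpy as np

rng = np.random.default_rng(0)
z = [1.0, 10.0, 100.0]            # cluster centers z_j (arbitrary)
delta = 0.01                      # cluster half-widths delta_j (arbitrary)
eigs = np.concatenate([zj + delta * rng.uniform(-1.0, 1.0, 5) for zj in z])
U, _ = np.linalg.qr(rng.standard_normal((eigs.size, eigs.size)))
Q = U @ np.diag(eigs) @ U.T       # eigenvalues of Q lie in 3 narrow clusters
b = rng.standard_normal(eigs.size)

f = lambda x: 0.5 * x @ Q @ x - b @ x
fstar = f(np.linalg.solve(Q, b))

# Standard conjugate gradient iteration for the quadratic.
x = np.zeros(eigs.size)
g = Q @ x - b
d = -g
for k in range(4):
    alpha = (g @ g) / (d @ Q @ d)
    x = x + alpha * d
    g_new = Q @ x - b
    d = -g_new + ((g_new @ g_new) / (g @ g)) * d
    g = g_new
    print(k + 1, f(x) - fstar)    # nearly zero after 3 steps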
2.1.4 www
It suffices to show that the subspace spanned by g^0, g^1, . . . , g^{k−1} is the same as the subspace
spanned by g^0, Qg^0, . . . , Q^{k−1}g^0, for k = 1, . . . , n. We will prove this by induction. Clearly, for
k = 1 the statement is true. Assume it is true for k − 1 < n − 1, i.e.,

    span{g^0, g^1, . . . , g^{k−1}} = span{g^0, Qg^0, . . . , Q^{k−1}g^0},

where span{v^0, . . . , v^l} denotes the subspace spanned by the vectors v^0, . . . , v^l. Assume that
g^k ≠ 0 (i.e., x^k ≠ x^*). Since g^k = ∇f(x^k) and x^k minimizes f over the manifold x^0 +
span{g^0, g^1, . . . , g^{k−1}}, from our assumption we have x^k = x^0 + Σ_{i=0}^{k−1} ξ_i Q^i g^0 for
some scalars ξ_i, and hence

    g^k = Qx^k − b = Q(x^0 + Σ_{i=0}^{k−1} ξ_i Q^i g^0) − b = Qx^0 − b + Σ_{i=0}^{k−1} ξ_i Q^{i+1} g^0.    (1)

If ξ_{k−1} = 0, then from (1) and the inductive hypothesis it follows that

    g^k ∈ span{g^0, Qg^0, . . . , Q^{k−1}g^0} = span{g^0, g^1, . . . , g^{k−1}},

which is impossible, since g^k ≠ 0 and g^k is orthogonal to span{g^0, g^1, . . . , g^{k−1}}. Hence
ξ_{k−1} ≠ 0, and by (1), Q^k g^0 is a linear combination of g^k and g^0, Qg^0, . . . , Q^{k−1}g^0, while
g^k ∈ span{g^0, Qg^0, . . . , Q^k g^0}. It follows that

    span{g^0, g^1, . . . , g^k} = span{g^0, Qg^0, . . . , Q^k g^0},

which completes the induction.
2.1.5 www
Let {x^k} be the sequence generated by the conjugate gradient method, and let {d^k} be the sequence
of the corresponding Q-conjugate directions. We know that x^{k+1} minimizes f over

    x^0 + span{d^0, d^1, . . . , d^k}.
Let {x̃^k} be the sequence generated by the method described in the exercise. In particular, x̃^1 is
generated from x^0 by steepest descent and line minimization, and for k ≥ 1, x̃^{k+1} minimizes f
over the two-dimensional linear manifold

    x̃^k + span{∇f(x̃^k), x̃^k − x̃^{k−1}}.

Assume inductively that x̃^k = x^k. Then x̃^k − x̃^{k−1} is proportional to d^{k−1}, and since
d^k = −∇f(x^k) + β^k d^{k−1}, we have span{∇f(x̃^k), x̃^k − x̃^{k−1}} = span{d^{k−1}, d^k}, so

    x^0 + span{d^0, d^1, . . . , d^k} ⊃ x^k + span{d^{k−1}, d^k} ⊃ x^k + span{d^k}.

The vector x^{k+1} minimizes f over the linear manifold on the left-hand side above, and also
over the linear manifold on the right-hand side above (by the definition of a conjugate direction
method). Moreover, x̃^{k+1} minimizes f over the linear manifold in the middle above. Hence
x^{k+1} = x̃^{k+1}.
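The equivalence just established can be checked numerically. The sketch below (a minimal illustration; the random quadratic and the number of iterations are arbitrary choices) generates x̃^{k+1} by solving the two-dimensional subproblem over x̃^k + span{∇f(x̃^k), x̃^k − x̃^{k−1}} in closed form and compares each iterate with the conjugate gradient iterate.

import numpy as np

rng = np.random.default_rng(1)
n = 6
A = rng.standard_normal((n, n))
Q = A @ A.T + n * np.eye(n)                  # positive definite Q
b = rng.standard_normal(n)
grad = lambda x: Q @ x - b

def min_over(x, B):
    # Exact minimizer of the quadratic over the manifold x + range(B):
    # solve (B'QB) a = -B' grad(x).
    a = np.linalg.solve(B.T @ Q @ B, -B.T @ grad(x))
    return x + B @ a

# Method of the exercise: a steepest descent step, then 2-dimensional searches.
x_prev = np.zeros(n)
x = min_over(x_prev, grad(x_prev)[:, None])  # line minimization along g^0
tilde = [x]
for _ in range(3):
    B = np.column_stack([grad(x), x - x_prev])
    x_prev, x = x, min_over(x, B)
    tilde.append(x)

# Conjugate gradient iterates for comparison.
xc = np.zeros(n)
g = grad(xc)
d = -g
for xt in tilde:
    alpha = (g @ g) / (d @ Q @ d)
    xc = xc + alpha * d
    g_new = grad(xc)
    d, g = -g_new + ((g_new @ g_new) / (g @ g)) * d, g_new
    print(np.linalg.norm(xc - xt))           # ~1e-12: the iterates coincide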
2.1.6 www

Suppose that x^1, . . . , x^k have been generated by the method of Exercise 1.6.5, which, by the result
of that exercise, is equivalent to the conjugate gradient method. Let y^k and x^{k+1} be generated
by the two line searches given in the exercise. Since x^k minimizes f over

    x^0 + span{g^0, g^1, . . . , g^{k−1}},

we have

    g^k ⊥ span{g^0, g^1, . . . , g^{k−1}},

and in particular

    g^k ⊥ g^{k−1}.    (1)
Also, since y^k is the vector that minimizes f over the line y_α = x^k − αg^k, α ≥ 0, we have

    g^k ⊥ ∇f(y^k).    (2)
Any vector on the line passing through x^{k−1} and y^k has the form

    y = αx^{k−1} + (1 − α)y^k,    α ∈ ℜ,

so that its gradient is

    ∇f(y) = Qy − b = α(Qx^{k−1} − b) + (1 − α)(Qy^k − b) = αg^{k−1} + (1 − α)∇f(y^k).    (3)

From Eqs. (1)-(3), it follows that g^k is orthogonal to the gradient ∇f(y) of any vector y on the
line passing through x^{k−1} and y^k.
In particular, for the vector x^{k+1} that minimizes f over this line, we have that ∇f(x^{k+1})
is orthogonal to g^k. Furthermore, because x^{k+1} minimizes f over the line passing through x^{k−1}
and y^k, ∇f(x^{k+1}) is orthogonal to y^k − x^{k−1}. Thus, ∇f(x^{k+1}) is orthogonal to

    span{g^k, y^k − x^{k−1}} = span{g^k, x^k − x^{k−1}},

since x^{k−1}, x^k, and y^k form a triangle whose side connecting x^k and y^k is proportional to g^k.
Thus x^{k+1} minimizes f over

    x^k + span{g^k, x^k − x^{k−1}},

i.e., it is generated by the method of Exercise 1.6.5, and hence coincides with the conjugate
gradient iterate.
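Again, a small numerical check (not part of the original proof): the two-line-search scheme of the exercise is run side by side with the conjugate gradient method on a random positive definite quadratic; with exact line minimizations the two produce the same points.

import numpy as np

rng = np.random.default_rng(2)
n = 5
A = rng.standard_normal((n, n))
Q = A @ A.T + n * np.eye(n)
b = rng.standard_normal(n)
grad = lambda x: Q @ x - b

def line_min(x, d):
    # Exact minimization of the quadratic along the line {x + t d : t real}.
    return x + (-(d @ grad(x)) / (d @ Q @ d)) * d

# Two-line-search scheme: y^k from steepest descent at x^k, then x^{k+1}
# from a line search on the line through x^{k-1} and y^k.
x_prev = np.zeros(n)
x = line_min(x_prev, -grad(x_prev))          # x^1: plain steepest descent
for _ in range(3):
    y = line_min(x, -grad(x))
    x_prev, x = x, line_min(x_prev, y - x_prev)

# Conjugate gradient after the same number (4) of line minimizations.
xc = np.zeros(n)
g = grad(xc)
d = -g
for _ in range(4):
    xc = line_min(xc, d)
    g_new = grad(xc)
    d, g = -g_new + ((g_new @ g_new) / (g @ g)) * d, g_new

print(np.linalg.norm(x - xc))                # ~1e-12: same point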
2.1.7 www
Consider the quadratic function

    f(x) = (1/2)x′Qx + b′x.

At the kth iteration we have

    x^k = arg min{ f(x) | x = x^0 + Σ_{i=0}^{k−1} γ_i d^i, γ_i ∈ ℜ } = arg min{ f(x) | x = x^0 + Σ_{i=0}^{k−1} δ_i g^i, δ_i ∈ ℜ },
where the d^i are the conjugate directions and the g^i are the gradient vectors. At the beginning of
the (k + 1)st iteration, there are two possibilities:
(1) g^k = 0: In this case, x^k is the global minimum, since f(x) is a convex function.
(2) g^k ≠ 0: In this case, a new conjugate direction d^k is generated. Here, we also have two
possibilities:
(a) A minimum of f along the direction d^k exists (this is the case when d^k′Qd^k > 0), and the
algorithm proceeds as in the positive definite case.
(b) A minimum along the direction d^k does not exist. This occurs if there exists a direction
d in the manifold spanned by d^0, . . . , d^k such that d′Qd = 0 and b′d ≠ 0. The problem
in this case has no solution.
If the problem has no solution (which occurs if there is some vector d such that d′Qd = 0
but b′d ≠ 0), the algorithm will terminate because the line minimization problem along such a
direction d is unbounded from below.
If the problem has infinitely many solutions (which happens when b′d = 0 for every vector d
with d′Qd = 0), then the algorithm will proceed as if the matrix Q were positive definite, i.e., it
will find one of the solutions (case (1) occurs).
However, in both situations the algorithm will terminate in at most m steps, where m is
the rank of the matrix Q. Indeed, the conjugate directions d^i along which the method performs
line minimizations satisfy d^i′Qd^i > 0, so the vectors Qd^0, . . . , Qd^{k−1} are linearly independent
(if Σ_i α_i Qd^i = 0, multiplying by d^j′ yields α_j d^j′Qd^j = 0 for each j) and belong to the range
of Q, whose dimension is m. Hence the manifold

    {x ∈ ℜ^n | x = x^0 + Σ_{i=0}^{k−1} γ_i d^i, γ_i ∈ ℜ}

can expand for at most m iterations.
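The behavior described above can be observed numerically. The sketch below is illustrative only; it uses the sign convention f(x) = (1/2)x′Qx − b′x, so with g = Qx − b the unboundedness test along d reads d′Qd = 0 and b′d ≠ 0. It applies the conjugate gradient method to a rank-4 positive semidefinite Q with b in the range of Q, and finds one of the infinitely many solutions in rank(Q) steps; the branch for the unbounded case is included.

import numpy as np

rng = np.random.default_rng(3)
n, m = 6, 4
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
Q = U[:, :m] @ np.diag([1.0, 2.0, 3.0, 4.0]) @ U[:, :m].T   # PSD, rank m
b = Q @ rng.standard_normal(n)     # b in range(Q): infinitely many solutions

x = np.zeros(n)
g = Q @ x - b
d = -g
for k in range(n):
    dQd = d @ Q @ d
    if dQd <= 1e-12:               # d'Qd = 0: f unbounded below if b'd != 0
        print("unbounded direction encountered" if abs(b @ d) > 1e-9
              else "degenerate direction with b'd = 0")
        break
    x = x + ((g @ g) / dQd) * d
    g_new = Q @ x - b
    if np.linalg.norm(g_new) <= 1e-10:
        print("solution found after", k + 1, "steps; rank(Q) =", m)
        break
    d = -g_new + ((g_new @ g_new) / (g @ g)) * d
    g = g_new

print(np.linalg.norm(Q @ x - b))   # ~0: one of the infinitely many solutions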
2.1.8 www
Let S_1 and S_2 be subspaces with S_1 ∩ S_2 a proper subspace of ℜ^n (i.e., a subspace
of ℜ^n other than {0} and ℜ^n itself). Suppose that the subspace S_1 ∩ S_2 is spanned by linearly
independent vectors v^k, k ∈ K ⊆ {1, 2, . . . , n}. Assume that x^1 and x^2 minimize the given
quadratic function f over the manifolds M_1 and M_2 that are parallel to the subspaces S_1 and S_2,
respectively, i.e.,

    x^1 = arg min_{x∈M_1} f(x),    x^2 = arg min_{x∈M_2} f(x).
From the definition of x^1 and x^2 we have

    d/dt f(x^1 + tv^k)|_{t=0} = 0    and    d/dt f(x^2 + tv^k)|_{t=0} = 0,    ∀ k ∈ K,

or equivalently

    x^1′Qv^k − b′v^k = 0    and    x^2′Qv^k − b′v^k = 0,    ∀ k ∈ K.

By subtracting these relations, we obtain

    (x^1 − x^2)′Qv^k = 0,    ∀ k ∈ K.

Hence, x^1 − x^2 is Q-conjugate to all vectors in the intersection S_1 ∩ S_2. We can use this property
to construct a conjugate direction method that does not evaluate gradients and uses only line
minimizations, in the following way.
Initialization: Choose any direction d^1 and points y^1 and z^1 such that M_1^1 = y^1 + span{d^1},
M_2^1 = z^1 + span{d^1}, M_1^1 ≠ M_2^1. Let d^2 = x_1^1 − x_2^1, where x_i^1 = arg min_{x∈M_i^1} f(x) for i = 1, 2.

kth step: Given the Q-conjugate directions d^1, . . . , d^k, choose points y^k and z^k such that the
manifolds M_1^k = y^k + span{d^1, . . . , d^k} and M_2^k = z^k + span{d^1, . . . , d^k} are distinct, and
let x_i^k = arg min_{x∈M_i^k} f(x) for i = 1, 2. As both x_1^k and x_2^k minimize f over manifolds that
are parallel to span{d^1, . . . , d^k}, setting d^{k+1} = x_2^k − x_1^k we have that d^1, . . . , d^k, d^{k+1}
are Q-conjugate directions (here we have used the property established above).
In this procedure it is important to have a step which, given a nonoptimal point x, generates
a point y for which f(y) < f(x), and which, if x is optimal, indicates this fact. In other words, the
step must first determine whether x is optimal, and if x is not optimal, it must find a better point.
A typical example of such a step is one iteration of the cyclic coordinate descent method, which
avoids the calculation of derivatives.
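The parallel-subspace step at the heart of this construction is easy to verify numerically. In the sketch below (a minimal illustration on a random quadratic), f is minimized over two parallel lines and the difference of the two minimizers is checked to be Q-conjugate to the line direction. The exact line minimization is written in closed form here, although in the derivative-free method it would be carried out by a line search using only function values.

import numpy as np

rng = np.random.default_rng(4)
n = 5
A = rng.standard_normal((n, n))
Q = A @ A.T + n * np.eye(n)
b = rng.standard_normal(n)

def line_min(x, d):
    # Exact line minimization of f(x) = 0.5 x'Qx - b'x along x + t d.
    return x + ((b - Q @ x) @ d / (d @ Q @ d)) * d

d1 = rng.standard_normal(n)
y1 = np.zeros(n)
z1 = rng.standard_normal(n)          # two distinct parallel lines
x1 = line_min(y1, d1)                # minimizer over y1 + span{d1}
x2 = line_min(z1, d1)                # minimizer over z1 + span{d1}
d2 = x2 - x1

print(d1 @ Q @ d2)                   # ~0: d1 and d2 are Q-conjugate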
SECTION 2.2
2.2.1 www
The proof is by induction. Suppose that the relation D^k q^i = p^i holds for all i ≤ k − 1. The
relation D^{k+1} q^i = p^i also holds for i = k because of the following calculation, in which
y^k = p^k − D^k q^k:

    D^{k+1} q^k = D^k q^k + y^k y^k′q^k/(q^k′y^k) = D^k q^k + y^k = D^k q^k + (p^k − D^k q^k) = p^k.

For i < k, we have

    D^{k+1} q^i = D^k q^i + y^k(p^k − D^k q^k)′q^i/(q^k′y^k) = p^i + y^k(p^k′q^i − q^k′p^i)/(q^k′y^k).

Since p^k′q^i = p^k′Qp^i = q^k′p^i, the second term in the right-hand side vanishes and we have
D^{k+1} q^i = p^i. This completes the proof.
To show that (D^n)^{−1} = Q, note that from the relation D^n q^i = p^i, i = 0, . . . , n − 1, we have

    D^n = [p^0 ⋯ p^{n−1}][q^0 ⋯ q^{n−1}]^{−1},    (*)

while from the equation Qp^i = Q(x^{i+1} − x^i) = (Qx^{i+1} − b) − (Qx^i − b) = ∇f(x^{i+1}) − ∇f(x^i) = q^i,
we have

    Q[p^0 ⋯ p^{n−1}] = [q^0 ⋯ q^{n−1}],

or equivalently

    Q = [q^0 ⋯ q^{n−1}][p^0 ⋯ p^{n−1}]^{−1}.    (**)

(Note here that the matrix [p^0 ⋯ p^{n−1}] is invertible, since both Q and [q^0 ⋯ q^{n−1}] are
invertible by assumption.) By comparing Eqs. (*) and (**), it follows that (D^n)^{−1} = Q.
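As a numerical check of the hereditary relation D^n q^i = p^i and of (D^n)^{−1} = Q (illustrative only: the update used is the rank-one formula D^{k+1} = D^k + y^k y^k′/(q^k′y^k) with y^k = p^k − D^k q^k, as in the calculation above, and the steps p^k are taken to be the coordinate vectors so that they are linearly independent):

import numpy as np

rng = np.random.default_rng(5)
n = 5
A = rng.standard_normal((n, n))
Q = A @ A.T + n * np.eye(n)          # positive definite Q

D = np.eye(n)                        # D^0: any symmetric matrix
for k in range(n):
    p = np.eye(n)[:, k]              # n linearly independent steps p^k
    q = Q @ p                        # q^k = Q p^k on a quadratic
    y = p - D @ q
    if abs(q @ y) > 1e-12:           # y = 0 means D q = p holds already
        D = D + np.outer(y, y) / (q @ y)

print(np.linalg.norm(D - np.linalg.inv(Q)))   # ~0: D^n = Q^{-1}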
2.2.2 www
2.2.3 www
(a) For simplicity, we drop superscripts. Let V = I − ρqp′, where ρ = 1/(q′p). We have

    V′DV + ρpp′ = (I − ρpq′)D(I − ρqp′) + ρpp′ = D − ρ(pq′D + Dqp′) + ρ(1 + ρq′Dq)pp′,

which is the right-hand side of the given update formula, so the update can be written as
D̄ = V′DV + ρpp′.
(b) We have, by using repeatedly the update formula for D of part (a),

    D^k = V^{k−1}′D^{k−1}V^{k−1} + ρ^{k−1}p^{k−1}p^{k−1}′
        = V^{k−1}′V^{k−2}′D^{k−2}V^{k−2}V^{k−1} + ρ^{k−2}V^{k−1}′p^{k−2}p^{k−2}′V^{k−1} + ρ^{k−1}p^{k−1}p^{k−1}′,

and so on, until D^k is expressed in terms of D^0 and the pairs (p^i, q^i), i = 0, . . . , k − 1.
Thus to calculate the direction −D^k∇f(x^k), we need only to store D^0 and the past vectors p^i,
q^i, i = 0, 1, . . . , k − 1, and to perform the matrix-vector multiplications needed using the above
formula for D^k. Note that multiplication of a matrix V^i or V^i′ with any vector is relatively
simple. It requires only two vector operations: one inner product, and one vector addition.
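A sketch of this storage scheme (illustrative; random pairs (p^i, q^i) with p^i′q^i > 0 stand in for the data a quasi-Newton iteration would produce) computes D^k v recursively from D^0 and the stored pairs, and checks the result against the explicitly formed matrix:

import numpy as np

rng = np.random.default_rng(6)
n, k = 5, 3
D0 = np.eye(n)
ps = [rng.standard_normal(n) for _ in range(k)]
qs = [rng.standard_normal(n) for _ in range(k)]
qs = [q if p @ q > 0 else -q for p, q in zip(ps, qs)]   # keep q'p > 0

def apply_Dk(j, v):
    # Returns D^j v using only D^0 and the stored pairs (p^i, q^i):
    # D^j v = V^{j-1}' ( D^{j-1} (V^{j-1} v) ) + rho^{j-1} p^{j-1} (p^{j-1}' v).
    if j == 0:
        return D0 @ v
    p, q = ps[j - 1], qs[j - 1]
    rho = 1.0 / (q @ p)
    w = v - rho * q * (p @ v)            # w = V^{j-1} v, with V = I - rho q p'
    u = apply_Dk(j - 1, w)               # u = D^{j-1} w
    return u - rho * p * (q @ u) + rho * p * (p @ v)    # V'u + rho p (p'v)

# Check against the explicit recursion D^j = V' D^{j-1} V + rho p p'.
D = D0.copy()
for p, q in zip(ps, qs):
    rho = 1.0 / (q @ p)
    V = np.eye(n) - rho * np.outer(q, p)
    D = V.T @ D @ V + rho * np.outer(p, p)

v = rng.standard_normal(n)
print(np.linalg.norm(apply_Dk(k, v) - D @ v))   # ~0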
2.2.4 www
Suppose that D is updated by the DFP formula and H is updated by the BFGS formula. Thus
the update formulas are

    D̄ = D + pp′/(p′q) − Dqq′D/(q′Dq),

    H̄ = H + (1 + p′Hp/(q′p))qq′/(q′p) − (Hpq′ + qp′H)/(q′p).

If we assume that HD is equal to the identity I, and form the product H̄D̄ using the above
formulas, we can verify with a straightforward calculation that H̄D̄ is equal to I. Thus if the
initial H and D are inverses of each other, the above updating formulas will generate (at each
step) matrices that are inverses of each other.
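The calculation can be confirmed numerically; the following sketch (with a random positive definite D, H = D^{−1}, and a random pair p, q with p′q > 0) applies the two formulas above and checks that H̄D̄ = I:

import numpy as np

rng = np.random.default_rng(7)
n = 5
M = rng.standard_normal((n, n))
D = M @ M.T + n * np.eye(n)         # symmetric positive definite D
H = np.linalg.inv(D)                # start with H D = I
p = rng.standard_normal(n)
q = rng.standard_normal(n)
q = q if p @ q > 0 else -q          # ensure p'q > 0

# DFP update of D and BFGS update of H, as written above.
Dbar = D + np.outer(p, p) / (p @ q) - (D @ np.outer(q, q) @ D) / (q @ D @ q)
Hbar = (H + (1.0 + (p @ H @ p) / (q @ p)) * np.outer(q, q) / (q @ p)
          - (H @ np.outer(p, q) + np.outer(q, p) @ H) / (q @ p))

print(np.linalg.norm(Hbar @ Dbar - np.eye(n)))   # ~1e-13: still inverses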
2.2.5 www
Consider the DFP update

    D̄ = D + pp′/(p′q) − Dqq′D/(q′Dq),

and let

    R̄ = Q^{1/2}D̄Q^{1/2},    R = Q^{1/2}DQ^{1/2},
    r = Q^{1/2}p,    q = Qp = Q^{1/2}r.

Then the update can be written as

    R̄ = R + rr′/(r′r) − Rrr′R/(r′Rr).

(a) Let λ_1 ≤ ⋯ ≤ λ_n be the eigenvalues of R, and let µ_1 ≤ ⋯ ≤ µ_n be the eigenvalues of

    P = R − Rrr′R/(r′Rr).

Since P is obtained from R by subtracting the rank-one positive semidefinite matrix
(Rr)(Rr)′/(r′Rr), the eigenvalues of the two matrices interlace:

    µ_1 ≤ λ_1 ≤ µ_2 ≤ ⋯ ≤ µ_n ≤ λ_n,

and the update takes the form

    R̄ = P + rr′/(r′r).

We have R̄r = r, so 1 is an eigenvalue of R̄. The other eigenvalues are the eigenvalues µ_2, . . . , µ_n
of P, since their corresponding eigenvectors e_2, . . . , e_n are orthogonal to r, so that

    R̄e_i = Pe_i = µ_ie_i,    i = 2, . . . , n.
(b) We have

    λ_1 ≤ r′Rr/(r′r) ≤ λ_n,

so if we multiply the matrix R with r′r/(r′Rr), its eigenvalue range shifts so that it contains 1.
Since

    r′r/(r′Rr) = p′Qp/(p′Q^{1/2}RQ^{1/2}p) = p′q/(q′Q^{−1/2}RQ^{−1/2}q) = p′q/(q′Dq),

multiplication of R by r′r/(r′Rr) is equivalent to multiplication of D by p′q/(q′Dq).
(c) For the BFGS update (cf. Exercise 1.7.2), we again pre- and postmultiply with Q^{1/2}. We
obtain

    R̄ = R + (1 + r′Rr/(r′r))rr′/(r′r) − (Rrr′ + rr′R)/(r′r),

and an analysis similar to the ones in parts (a) and (b) goes through.
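A numerical check of part (a) (illustrative; a random symmetric positive definite R stands in for Q^{1/2}DQ^{1/2}) confirms that 1 is an eigenvalue of R̄, that the eigenvalues of P interlace those of R, and that the remaining eigenvalues of R̄ are µ_2, . . . , µ_n:

import numpy as np

rng = np.random.default_rng(8)
n = 5
M = rng.standard_normal((n, n))
R = M @ M.T + np.eye(n)              # stands in for Q^{1/2} D Q^{1/2}
r = rng.standard_normal(n)

P = R - (R @ np.outer(r, r) @ R) / (r @ R @ r)
Rbar = P + np.outer(r, r) / (r @ r)

print(np.linalg.norm(Rbar @ r - r))  # ~0: Rbar r = r, so 1 is an eigenvalue
lam = np.linalg.eigvalsh(R)          # lambda_1 <= ... <= lambda_n
mu = np.linalg.eigvalsh(P)           # mu_1 <= ... <= mu_n (here mu_1 = 0)
print(np.all(mu <= lam + 1e-10), np.all(lam[:-1] <= mu[1:] + 1e-10))  # interlacing
print(np.allclose(np.sort(np.append(mu[1:], 1.0)), np.linalg.eigvalsh(Rbar)))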
2.2.6 www
(a) We use induction. Assume that the method coincides with the conjugate gradient method
up to iteration k. For simplicity, denote g^k = ∇f(x^k) for all k. The argument given at the end of
the proof of Prop. 1.6.1 then shows that the formula for the direction generated by the method is
the same as the conjugate gradient formula.
(b) Use a scaling argument, whereby we work in the transformed coordinate system y = D^{−1/2}x,
in which the matrix D becomes the identity.