15.093 Optimization Methods


Lecture 20: The Conjugate Gradient Algorithm

Optimality conditions for constrained optimization

1 Outline
Slide 1
1. The Conjugate Gradient Algorithm
2. Necessary Optimality Conditions
3. Sufficient Optimality Conditions
4. Convex Optimization
5. Applications

2 The Conjugate Gradient Algorithm


2.1 Quadratic functions
Slide 2
$$\min\ f(x) = \frac{1}{2}\, x^\top Q x + c^\top x$$

Definition: $d_1, \dots, d_n$ are $Q$-conjugate if $d_i \neq 0$ and $d_i^\top Q d_j = 0$ for all $i \neq j$.

Proposition: If $d_1, \dots, d_n$ are $Q$-conjugate, then $d_1, \dots, d_n$ are linearly independent.

2.2 Motivation
Slide 3
Given $Q$-conjugate directions $d_1, \dots, d_n$ and the current iterate $x^k$, compute

$$\min_{\alpha}\ f(x^k + \alpha d_k) = c^\top x^k + \alpha\, c^\top d_k + \frac{1}{2}(x^k + \alpha d_k)^\top Q (x^k + \alpha d_k) = f(x^k) + \alpha \nabla f(x^k)^\top d_k + \frac{1}{2}\alpha^2\, d_k^\top Q d_k$$

Solution:

$$\hat{\alpha}_k = \frac{-\nabla f(x^k)^\top d_k}{d_k^\top Q d_k}, \qquad x^{k+1} = x^k + \hat{\alpha}_k d_k$$

2.3 Expanding Subspace


Theorem
Slide 4
Let $d_1, \dots, d_n$ be $Q$-conjugate. Then $x^{k+1}$ solves

$$\min\ f(x) \quad \text{s.t.}\quad x = x^1 + \sum_{j=1}^{k} \alpha_j d_j$$

Moreover, $x^{n+1} = x^*$.

2.4 The Algorithm
Slide 5
Step 0 Given $x^1$, set $k := 1$, $d_1 = -\nabla f(x^1)$.
Step 1 For $k = 1, \dots, n$ do:
If $\|\nabla f(x^k)\| \leq \epsilon$, stop; else:

$$\hat{\alpha}_k = \arg\min_{\alpha} f(x^k + \alpha d_k) = \frac{-\nabla f(x^k)^\top d_k}{d_k^\top Q d_k}$$

$$x^{k+1} = x^k + \hat{\alpha}_k d_k$$

$$d_{k+1} = -\nabla f(x^{k+1}) + \lambda_k d_k, \qquad \lambda_k = \frac{\nabla f(x^{k+1})^\top Q d_k}{d_k^\top Q d_k}$$
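To make the updates concrete, here is a minimal NumPy sketch of this algorithm for the quadratic case. The function name and the tolerance `eps` are illustrative choices, not from the lecture:

```python
import numpy as np

def cg_quadratic(Q, c, x, eps=1e-10):
    """Minimize f(x) = 0.5 x'Qx + c'x for symmetric positive definite Q."""
    g = Q @ x + c                        # gradient of f at x
    d = -g                               # Step 0: start with steepest descent
    for k in range(len(c)):              # Step 1: at most n iterations
        if np.linalg.norm(g) <= eps:
            break
        alpha = -(g @ d) / (d @ Q @ d)   # exact minimizer along d
        x = x + alpha * d
        g = Q @ x + c
        lam = (g @ Q @ d) / (d @ Q @ d)  # keeps d_{k+1} Q-conjugate to d_k
        d = -g + lam * d
    return x
```

By the expanding subspace theorem, the returned point is exact (up to roundoff) after at most $n$ iterations.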

2.5 Correctness
Slide 6
Theorem: The directions $d_1, \dots, d_n$ generated by the algorithm are $Q$-conjugate.

2.6 Convergence Properties


Slide 7
• This is a finite algorithm: it terminates in at most $n$ steps.
• If $Q$ has only $k$ distinct eigenvalues, the CGA finds an optimal solution in $k$ steps (checked numerically in the sketch below).
• Idea of pre-conditioning: consider

$$\min\ f(Sx) = \tfrac{1}{2}(Sx)^\top Q (Sx) + c^\top S x$$

so that the number of distinct eigenvalues of $S^\top Q S$ is small.
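The claim about $k$ distinct eigenvalues is easy to check numerically. A sketch, assuming NumPy; the 10×10 matrix with eigenvalues 1, 2, and 5 is constructed purely for illustration:

```python
import numpy as np

# A 10x10 symmetric PD matrix with exactly 3 distinct eigenvalues: 1, 2, 5.
rng = np.random.default_rng(0)
V, _ = np.linalg.qr(rng.standard_normal((10, 10)))  # random orthonormal basis
Q = V @ np.diag([1, 1, 1, 1, 2, 2, 2, 5, 5, 5]) @ V.T
c = rng.standard_normal(10)

x = np.zeros(10)
g = Q @ x + c
d = -g
for k in range(1, 11):
    alpha = -(g @ d) / (d @ Q @ d)
    x = x + alpha * d
    g = Q @ x + c
    print(f"iteration {k}: ||grad|| = {np.linalg.norm(g):.2e}")
    if np.linalg.norm(g) < 1e-10:
        break                            # reaches roundoff at k = 3
    lam = (g @ Q @ d) / (d @ Q @ d)
    d = -g + lam * d
```

The printed gradient norm drops to roundoff after exactly 3 iterations, matching the claim.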

2.7 Example
Slide 8
$$f(x) = \frac{1}{2}\, x^\top Q x - c^\top x$$

$$Q = \begin{pmatrix}
35 & 19 & 22 & 28 & 16 & 3 & 16 & 6 & 4 & 4 \\
19 & 43 & 33 & 19 & 5 & 2 & 5 & 4 & 0 & 0 \\
22 & 33 & 40 & 29 & 12 & 7 & 6 & 2 & 2 & 4 \\
28 & 19 & 29 & 39 & 16 & 7 & 14 & 6 & 2 & 4 \\
16 & 5 & 12 & 16 & 12 & 4 & 8 & 2 & 4 & 8 \\
3 & 2 & 7 & 7 & 4 & 5 & 1 & 0 & 1 & 4 \\
16 & 5 & 6 & 14 & 8 & 1 & 12 & 2 & 2 & 4 \\
6 & 4 & 2 & 6 & 2 & 0 & 2 & 4 & 0 & 0 \\
4 & 0 & 2 & 2 & 4 & 1 & 2 & 0 & 2 & 4 \\
4 & 0 & 4 & 4 & 8 & 4 & 4 & 0 & 4 & 16
\end{pmatrix}
\quad \text{and} \quad
c = \begin{pmatrix} -1 \\ 0 \\ 0 \\ -3 \\ 0 \\ -2 \\ 0 \\ -6 \\ -7 \\ -4 \end{pmatrix}$$

$$\kappa(Q) \approx 17{,}641, \qquad \delta = \left( \frac{\kappa(Q) - 1}{\kappa(Q) + 1} \right)^2 = 0.999774$$
Slide 9

 k        f(x^k)    f(x^k) − f(x^*)    (f(x^k) − f(x^*)) / (f(x^{k−1}) − f(x^*))
 1     12.000000        2593.726852    1.000000
 2      8.758578        2590.485430    0.998750
 3      1.869218        2583.596069    0.997341
 4    −12.777374        2568.949478    0.994331
 5    −30.479483        2551.247369    0.993109
 6   −187.804367        2393.922485    0.938334
 7   −309.836907        2271.889945    0.949024
 8   −408.590428        2173.136424    0.956532
 9   −754.887518        1826.839334    0.840646
10  −2567.158421          14.568431    0.007975
11  −2581.711672           0.015180    0.001042
12  −2581.726852           0.000000    0.000000
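A short script reproduces the structure of this table: slow progress while the bound δ ≈ 1, then sudden collapse to the optimum. This is a sketch assuming NumPy; the slide does not give the starting point $x^1$ (the table shows $f(x^1) = 12$), so an arbitrary start such as $x^1 = 0$ reproduces the qualitative behavior rather than the exact numbers:

```python
import numpy as np

Q = np.array([
    [35, 19, 22, 28, 16,  3, 16,  6,  4,  4],
    [19, 43, 33, 19,  5,  2,  5,  4,  0,  0],
    [22, 33, 40, 29, 12,  7,  6,  2,  2,  4],
    [28, 19, 29, 39, 16,  7, 14,  6,  2,  4],
    [16,  5, 12, 16, 12,  4,  8,  2,  4,  8],
    [ 3,  2,  7,  7,  4,  5,  1,  0,  1,  4],
    [16,  5,  6, 14,  8,  1, 12,  2,  2,  4],
    [ 6,  4,  2,  6,  2,  0,  2,  4,  0,  0],
    [ 4,  0,  2,  2,  4,  1,  2,  0,  2,  4],
    [ 4,  0,  4,  4,  8,  4,  4,  0,  4, 16]], dtype=float)
c = np.array([-1, 0, 0, -3, 0, -2, 0, -6, -7, -4], dtype=float)

f = lambda x: 0.5 * x @ Q @ x - c @ x    # note the minus sign in this example
x_star = np.linalg.solve(Q, c)           # gradient Qx - c = 0 at the optimum
f_star = f(x_star)

x = np.zeros(10)                         # stand-in for the slide's (ungiven) x^1
g = Q @ x - c
d = -g
gap_prev = f(x) - f_star
for k in range(1, 13):
    gap = f(x) - f_star
    print(f"{k:2d}  {f(x):15.6f}  {gap:15.6f}  {gap / gap_prev:9.6f}")
    gap_prev = gap
    alpha = -(g @ d) / (d @ Q @ d)
    x = x + alpha * d
    g = Q @ x - c
    lam = (g @ Q @ d) / (d @ Q @ d)
    d = -g + lam * d
```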

2.8 General problems


Slide 10
Step 0 Given $x^1$, set $k := 1$, $d_1 = -\nabla f(x^1)$.
Step 1 If $\|\nabla f(x^k)\| \leq \epsilon$, stop; else:

$$\hat{\alpha}_k = \arg\min_{\alpha} f(x^k + \alpha d_k)$$

(computed by a one-dimensional line search; for general $f$ there is no closed-form step)

$$x^{k+1} = x^k + \hat{\alpha}_k d_k$$

$$d_{k+1} = -\nabla f(x^{k+1}) + \lambda_k d_k, \qquad \lambda_k = \frac{\|\nabla f(x^{k+1})\|^2}{\|\nabla f(x^k)\|^2}$$

Step 2 $k \leftarrow k + 1$, go to Step 1.
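A sketch of this general version, assuming SciPy's scalar minimizer for the one-dimensional line search; the Rosenbrock test function and the iteration cap are my own choices for illustration:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def nonlinear_cg(f, grad, x, eps=1e-8, max_iter=1000):
    """Conjugate gradient for general smooth f (Fletcher-Reeves update)."""
    g = grad(x)
    d = -g
    for _ in range(max_iter):
        if np.linalg.norm(g) <= eps:
            break
        alpha = minimize_scalar(lambda a: f(x + a * d)).x  # line search
        x = x + alpha * d
        g_new = grad(x)
        lam = (g_new @ g_new) / (g @ g)   # ||grad_{k+1}||^2 / ||grad_k||^2
        d = -g_new + lam * d
        g = g_new
    return x

# Example: the Rosenbrock function, minimized at (1, 1).
f = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
grad = lambda x: np.array([
    -2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
    200 * (x[1] - x[0]**2)])
print(nonlinear_cg(f, grad, np.array([-1.0, 1.0])))
```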

3 Necessary Optimality Conditions
3.1 Nonlinear Optimization
Slide 11
$$\min\ f(x) \quad \text{s.t.}\quad g_j(x) \leq 0,\ j = 1, \dots, p; \qquad h_i(x) = 0,\ i = 1, \dots, m$$

$$P = \{\, x \mid g_j(x) \leq 0,\ j = 1, \dots, p;\ \ h_i(x) = 0,\ i = 1, \dots, m \,\}$$

3.2 The KKT conditions
Slide 12
Discovered by Karush, Kuhn, and Tucker in the 1950s.
Theorem
If
• $\bar{x}$ is a local minimum of P
• $I = \{ j \mid g_j(\bar{x}) = 0 \}$ is the set of tight constraints
• Constraint qualification condition (CQC): the vectors $\nabla g_j(\bar{x})$, $j \in I$, and $\nabla h_i(\bar{x})$, $i = 1, \dots, m$, are linearly independent
Slide 13
Then there exist vectors $(u, v)$ such that:

1. $\nabla f(\bar{x}) + \sum_{j=1}^{p} u_j \nabla g_j(\bar{x}) + \sum_{i=1}^{m} v_i \nabla h_i(\bar{x}) = 0$

2. $u \geq 0$

3. $u_j g_j(\bar{x}) = 0$, $j = 1, \dots, p$

3.3 Some Intuition from LO


Slide 14
Linearize the functions in the neighborhood of the solution $\bar{x}$. The problem becomes:

$$\min\ f(\bar{x}) + \nabla f(\bar{x})^\top (x - \bar{x})$$
$$\text{s.t.}\quad g_j(\bar{x}) + \nabla g_j(\bar{x})^\top (x - \bar{x}) \leq 0, \quad j \in I$$
$$h_i(\bar{x}) + \nabla h_i(\bar{x})^\top (x - \bar{x}) = 0, \quad i = 1, \dots, m$$
Slide 15
This is an LO problem. Dual feasibility:

$$\sum_{j \in I} \hat{u}_j \nabla g_j(\bar{x}) + \sum_{i=1}^{m} \hat{v}_i \nabla h_i(\bar{x}) = \nabla f(\bar{x}), \qquad \hat{u}_j \leq 0$$

Change variables to $u_j = -\hat{u}_j$, $v_i = -\hat{v}_i$ to obtain:

$$\nabla f(\bar{x}) + \sum_{j=1}^{p} u_j \nabla g_j(\bar{x}) + \sum_{i=1}^{m} v_i \nabla h_i(\bar{x}) = 0, \qquad u_j \geq 0$$

3.4 Example 1
Slide 16
$$\min\ f(x) = (x_1 - 12)^2 + (x_2 + 6)^2$$
$$\text{s.t.}\quad h_1(x) = 8x_1 + 4x_2 - 20 = 0$$
$$g_1(x) = x_1^2 + x_2^2 + 3x_1 - 4.5 x_2 - 6.5 \leq 0$$
$$g_2(x) = (x_1 - 9)^2 + x_2^2 - 64 \leq 0$$

$\bar{x} = (2, 1)^\top$; $g_1(\bar{x}) = 0$, $g_2(\bar{x}) = -14$, $h_1(\bar{x}) = 0$.
Slide 17

• $I = \{1\}$
• $\nabla f(\bar{x}) = (-20, 14)^\top$; $\nabla g_1(\bar{x}) = (7, -2.5)^\top$
• $\nabla g_2(\bar{x}) = (-14, 2)^\top$; $\nabla h_1(\bar{x}) = (8, 4)^\top$
• $u_1 = 4$, $u_2 = 0$, $v_1 = -1$
• $\nabla g_1(\bar{x})$ and $\nabla h_1(\bar{x})$ are linearly independent
• $\nabla f(\bar{x}) + u_1 \nabla g_1(\bar{x}) + u_2 \nabla g_2(\bar{x}) + v_1 \nabla h_1(\bar{x}) = 0$:

$$\begin{pmatrix} -20 \\ 14 \end{pmatrix} + 4 \begin{pmatrix} 7 \\ -2.5 \end{pmatrix} + 0 \begin{pmatrix} -14 \\ 2 \end{pmatrix} + (-1) \begin{pmatrix} 8 \\ 4 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$$
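These numbers are easy to verify mechanically; a minimal sketch, using only the gradients computed above:

```python
import numpy as np

x = np.array([2.0, 1.0])                               # candidate point
grad_f  = np.array([2 * (x[0] - 12), 2 * (x[1] + 6)])  # (-20, 14)
grad_g1 = np.array([2 * x[0] + 3, 2 * x[1] - 4.5])     # (7, -2.5)
grad_g2 = np.array([2 * (x[0] - 9), 2 * x[1]])         # (-14, 2)
grad_h1 = np.array([8.0, 4.0])

u1, u2, v1 = 4.0, 0.0, -1.0
print(grad_f + u1 * grad_g1 + u2 * grad_g2 + v1 * grad_h1)   # [0. 0.]
```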

3.5 Example 2
Slide 18
$$\max\ x^\top Q x \quad \text{s.t.}\quad x^\top x \leq 1$$

Q arbitrary; not a convex optimization problem. Equivalently:

$$\min\ -x^\top Q x \quad \text{s.t.}\quad x^\top x \leq 1$$

3.5.1 KKT
Slide 19
$$-2Q\bar{x} + 2u\bar{x} = 0$$
$$\bar{x}^\top \bar{x} \leq 1$$
$$u \geq 0$$
$$u(1 - \bar{x}^\top \bar{x}) = 0$$
Slide 20

3.5.2 Solutions of KKT

• $\bar{x} = 0$, $u = 0$, objective value 0.
• $\bar{x} \neq 0 \Rightarrow Q\bar{x} = u\bar{x} \Rightarrow \bar{x}$ is an eigenvector of Q with non-negative eigenvalue u (and $\bar{x}^\top \bar{x} = 1$ by complementarity when $u > 0$).
• $\bar{x}^\top Q \bar{x} = u\, \bar{x}^\top \bar{x} = u$.
• Thus, pick the largest nonnegative eigenvalue $\hat{u}$ of Q. The solution is the corresponding eigenvector $\hat{x}$, normalized so that $\hat{x}^\top \hat{x} = 1$. If all eigenvalues are negative, $\hat{x} = 0$.
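Numerically, this recipe is a single symmetric eigendecomposition. A sketch with a made-up symmetric Q:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
Q = (A + A.T) / 2                        # an arbitrary symmetric Q

eigvals, eigvecs = np.linalg.eigh(Q)     # ascending eigenvalues, unit eigenvectors
u_hat = eigvals[-1]                      # largest eigenvalue
x_hat = eigvecs[:, -1] if u_hat >= 0 else np.zeros(5)
print(u_hat, x_hat @ Q @ x_hat)          # optimal value equals u_hat when u_hat >= 0
```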

3.6 Are CQC Necessary?
Slide 21
$$\min\ x_1 \quad \text{s.t.}\quad x_1^2 - x_2 \leq 0, \qquad x_2 = 0$$
The feasible set is the single point $(0, 0)^\top$.
KKT:
$$\begin{pmatrix} 1 \\ 0 \end{pmatrix} + u \begin{pmatrix} 2x_1 \\ -1 \end{pmatrix} + v \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$$
KKT multipliers do not exist, even though $(0, 0)^\top$ is a local minimum: at $x = (0, 0)^\top$ the first component reads $1 = 0$. Check $\nabla g_1(0, 0) = (0, -1)^\top$ and $\nabla h_1(0, 0) = (0, 1)^\top$: they are linearly dependent, so the CQC fails.

3.7 Constraint Qualification


Slide 22
Slater condition: there exists an $x^0$ such that $g_j(x^0) < 0$ for all $j = 1, \dots, p$, and $h_i(x^0) = 0$ for all $i = 1, \dots, m$.

Theorem: Under the Slater condition, the KKT conditions are necessary.

4 Sufficient Optimality Conditions
Slide 23
Theorem: If
• $\bar{x}$ is feasible for P
• the feasible set P is convex and f(x) is convex
• there exist vectors $(u, v)$ with $u \geq 0$:

$$\nabla f(\bar{x}) + \sum_{j=1}^{p} u_j \nabla g_j(\bar{x}) + \sum_{i=1}^{m} v_i \nabla h_i(\bar{x}) = 0$$

$$u_j g_j(\bar{x}) = 0, \quad j = 1, \dots, p$$

Then $\bar{x}$ is a global minimum of P.

4.1 Proof
Slide 24
• Let $x \in P$. Then $(1 - \lambda)\bar{x} + \lambda x \in P$ for $\lambda \in [0, 1]$.
• For the tight constraints ($g_j(\bar{x}) = 0$; the others have $u_j = 0$), $g_j(\bar{x} + \lambda(x - \bar{x})) \leq 0$ implies

$$\nabla g_j(\bar{x})^\top (x - \bar{x}) \leq 0$$

• Similarly, $h_i(\bar{x} + \lambda(x - \bar{x})) = 0$ implies

$$\nabla h_i(\bar{x})^\top (x - \bar{x}) = 0$$

Slide 25
• Thus,

$$\nabla f(\bar{x})^\top (x - \bar{x}) = -\left( \sum_{j=1}^{p} u_j \nabla g_j(\bar{x}) + \sum_{i=1}^{m} v_i \nabla h_i(\bar{x}) \right)^{\!\top} (x - \bar{x}) \geq 0$$

• By convexity of f, $f(x) \geq f(\bar{x}) + \nabla f(\bar{x})^\top (x - \bar{x}) \geq f(\bar{x})$.

5 Convex Optimization
Slide 26
• The KKT conditions are always necessary under CQC.
• The KKT conditions are sufficient for convex optimization problems.
• The KKT conditions are necessary and sufficient for convex optimization problems under CQC.
• For $\min f(x)$ s.t. $Ax = b$, $x \geq 0$ with f(x) convex, the KKT conditions are necessary and sufficient even without CQC.

5.0.1 Separating hyperplanes


Slide 27
Theorem: Let S be a nonempty closed convex subset of $\mathbb{R}^n$ and let $x^* \in \mathbb{R}^n$ be a vector that does not belong to S. Then there exists some vector $c \in \mathbb{R}^n$ such that $c^\top x^* < c^\top x$ for all $x \in S$.
Proof in BT, p. 170.

5.1 Sketch of the Proof under convexity


Slide 28
• Suppose $\bar{x}$ is a local (and thus global) optimal solution.
• The system $f(x) < f(\bar{x})$, $g_j(x) \leq 0$, $j = 1, \dots, p$, $h_i(x) = 0$, $i = 1, \dots, m$, is infeasible.
• Let $U = \{ (u_0, u, v) \mid \text{there exists } x:\ f(x) < u_0,\ g_j(x) \leq u_j,\ h_i(x) = v_i \}$.
• $(f(\bar{x}), 0, 0) \notin U$.
• U is convex.
Slide 29
• By the separating hyperplane theorem, there is a vector $(c_0, c, d)$ such that

$$c_0 u_0 + \sum_{j=1}^{p} c_j u_j + \sum_{i=1}^{m} d_i v_i > c_0 f(\bar{x}) \quad \forall\ (u_0, u, v) \in U.$$
[Figure: a closed convex set S and a point x* outside it, separated by a hyperplane.]

• $c_0 \geq 0$ and $c_j \geq 0$ for $j \in I$ (the tight constraints $g_j(\bar{x}) = 0$). Why?
If $(u_0, u, v) \in U$, then $(u_0 + \lambda, u, v) \in U$ for $\lambda \geq 0$. Thus,

$$\forall\ \lambda \geq 0: \quad c_0 (u_0 + \lambda) + \sum_{j=1}^{p} c_j u_j + \sum_{i=1}^{m} d_i v_i > c_0 f(\bar{x}) \ \Rightarrow\ c_0 \geq 0.$$

• Select $(u_0, u, v) = (f(x) + \lambda,\ g_1(x), \dots, g_p(x),\ h_1(x), \dots, h_m(x)) \in U$:

$$c_0 (f(x) + \lambda) + \sum_{j=1}^{p} c_j g_j(x) + \sum_{i=1}^{m} d_i h_i(x) > c_0 f(\bar{x})$$

• Take $\lambda \to 0$:

$$c_0 f(x) + \sum_{j=1}^{p} c_j g_j(x) + \sum_{i=1}^{m} d_i h_i(x) \geq c_0 f(\bar{x})$$

• $c_0 > 0$ (constraint qualification is needed here).
• Divide by $c_0$ and set $u_j = c_j / c_0 \geq 0$, $v_i = d_i / c_0$; for every $x$:

$$f(x) + \sum_{j=1}^{p} u_j g_j(x) + \sum_{i=1}^{m} v_i h_i(x) \geq f(\bar{x}), \qquad u_j \geq 0$$

• Since $g_j(\bar{x}) \leq 0$, $u_j \geq 0$, and $h_i(\bar{x}) = 0$:

$$f(\bar{x}) + \sum_{j=1}^{p} u_j g_j(\bar{x}) + \sum_{i=1}^{m} v_i h_i(\bar{x}) \leq f(\bar{x}) \leq f(x) + \sum_{j=1}^{p} u_j g_j(x) + \sum_{i=1}^{m} v_i h_i(x).$$

• Thus,

$$f(\bar{x}) = \min_x \left( f(x) + \sum_{j=1}^{p} u_j g_j(x) + \sum_{i=1}^{m} v_i h_i(x) \right)$$

$$\sum_{j=1}^{p} u_j g_j(\bar{x}) = 0 \ \Rightarrow\ u_j g_j(\bar{x}) = 0$$

• The unconstrained optimality conditions at the minimizer $\bar{x}$:

$$\nabla f(\bar{x}) + \sum_{j=1}^{p} u_j \nabla g_j(\bar{x}) + \sum_{i=1}^{m} v_i \nabla h_i(\bar{x}) = 0$$

6 Applications
6.1 Linear Optimization
Slide 30
$$\min\ c^\top x \quad \text{s.t.}\quad Ax = b,\ x \geq 0$$

Equivalently:

$$\min\ c^\top x \quad \text{s.t.}\quad Ax - b = 0,\ -x \leq 0$$

6.1.1 KKT
Slide 31
$$c + A^\top \hat{u} - v = 0, \qquad v \geq 0, \qquad v_j x_j = 0, \qquad Ax - b = 0, \qquad x \geq 0$$

With $u = -\hat{u}$:

$$A^\top u \leq c \quad \text{(dual feasibility)}$$
$$(c_j - A_j^\top u)\, x_j = 0 \quad \text{(complementarity)}$$
$$Ax - b = 0, \quad x \geq 0 \quad \text{(primal feasibility)}$$
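A sketch checking all three conditions on a small made-up instance, assuming SciPy's HiGHS-based `linprog`, whose `res.eqlin.marginals` field exposes the equality-constraint duals:

```python
import numpy as np
from scipy.optimize import linprog

# Made-up instance of min c'x s.t. Ax = b, x >= 0.
A = np.array([[1.0, 1.0, 1.0],
              [2.0, 1.0, 0.0]])
b = np.array([4.0, 5.0])
c = np.array([1.0, 2.0, 3.0])

res = linprog(c, A_eq=A, b_eq=b, method="highs")
x, u = res.x, res.eqlin.marginals        # primal solution and equality duals

print("primal feasibility:", np.allclose(A @ x, b) and np.all(x >= -1e-9))
print("dual feasibility:  ", np.all(A.T @ u <= c + 1e-9))
print("complementarity:   ", np.allclose((c - A.T @ u) * x, 0.0))
```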

6.2 Portfolio Optimization
Slide 32
x = weights of the portfolio

$$\max\ r^\top x - \frac{\lambda}{2}\, x^\top Q x \quad \text{s.t.}\quad e^\top x = 1$$

Equivalently:

$$\min\ -r^\top x + \frac{\lambda}{2}\, x^\top Q x \quad \text{s.t.}\quad e^\top x = 1$$

6.2.1 KKT
Slide 33
$$-r + \lambda Q x + u e = 0 \ \Rightarrow\ x = \frac{1}{\lambda} Q^{-1} (r - u e)$$

$$e^\top x = 1 \ \Rightarrow\ e^\top Q^{-1} (r - u e) = \lambda \ \Rightarrow\ u = \frac{e^\top Q^{-1} r - \lambda}{e^\top Q^{-1} e}$$

As λ changes, the tradeoff between risk and return changes, and the allocation changes as well. This is the essence of modern portfolio theory.
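The closed form translates directly into code. A minimal sketch with made-up data for r and Q; `e` is the vector of all ones:

```python
import numpy as np

def portfolio_weights(r, Q, lam):
    """Closed-form KKT solution of min -r'x + (lam/2) x'Qx s.t. e'x = 1."""
    e = np.ones(len(r))
    Qinv_r = np.linalg.solve(Q, r)
    Qinv_e = np.linalg.solve(Q, e)
    u = (e @ Qinv_r - lam) / (e @ Qinv_e)
    return (Qinv_r - u * Qinv_e) / lam

r = np.array([0.08, 0.12, 0.10])         # made-up expected returns
Q = np.array([[0.04, 0.01, 0.00],        # made-up covariance matrix
              [0.01, 0.09, 0.02],
              [0.00, 0.02, 0.06]])
for lam in (1.0, 5.0, 25.0):             # larger lam = more risk-averse
    x = portfolio_weights(r, Q, lam)
    print(lam, x.round(3), "sum =", round(x.sum(), 6))
```

Sweeping λ traces out the tradeoff: each value gives a different allocation, always summing to one.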

MIT OpenCourseWare
http://ocw.mit.edu

15.093J / 6.255J Optimization Methods


Fall 2009

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
