15.093 Optimization Methods


Lecture 20: The Conjugate Gradient Algorithm

Optimality conditions for constrained optimization

1 Outline
Slide 1
1. The Conjugate Gradient Algorithm
2. Necessary Optimality Conditions
3. Sufficient Optimality Conditions
4. Convex Optimization
5. Applications

2 The Conjugate Gradient Algorithm


2.1 Quadratic functions
Slide 2
$$\min\ f(x) = \frac{1}{2}\, x^\top Q x + c^\top x$$

Definition: $d_1, \dots, d_n$ are $Q$-conjugate if $d_i \neq 0$ and $d_i^\top Q d_j = 0$ for all $i \neq j$.

Proposition: If $d_1, \dots, d_n$ are $Q$-conjugate, then $d_1, \dots, d_n$ are linearly independent.

2.2 Motivation
Slide 3
Given $Q$-conjugate directions $d_1, \dots, d_n$ and the current iterate $x^k$, compute

$$\min_{\alpha}\ f(x^k + \alpha d_k) = c^\top x^k + \alpha\, c^\top d_k + \frac{1}{2}(x^k + \alpha d_k)^\top Q (x^k + \alpha d_k) = f(x^k) + \alpha \nabla f(x^k)^\top d_k + \frac{1}{2}\alpha^2\, d_k^\top Q d_k$$

Solution:

$$\hat{\alpha}_k = \frac{-\nabla f(x^k)^\top d_k}{d_k^\top Q d_k}, \qquad x^{k+1} = x^k + \hat{\alpha}_k d_k$$

2.3 Expanding Subspace


Theorem
Slide 4
Let $d_1, \dots, d_n$ be $Q$-conjugate. Then $x^{k+1}$ solves

$$\min\ f(x) \quad \text{s.t.}\quad x = x^1 + \sum_{j=1}^{k} \alpha_j d_j$$

Moreover, $x^{n+1} = x^*$.

2.4 The Algorithm
Slide 5
Step 0 Given $x^1$, set $k := 1$, $d_1 = -\nabla f(x^1)$.
Step 1 For $k = 1, \dots, n$ do:
If $\|\nabla f(x^k)\| \leq \epsilon$, stop; else:

$$\hat{\alpha}_k = \arg\min_{\alpha} f(x^k + \alpha d_k) = \frac{-\nabla f(x^k)^\top d_k}{d_k^\top Q d_k}$$

$$x^{k+1} = x^k + \hat{\alpha}_k d_k$$

$$d_{k+1} = -\nabla f(x^{k+1}) + \lambda_k d_k, \qquad \lambda_k = \frac{\nabla f(x^{k+1})^\top Q d_k}{d_k^\top Q d_k}$$
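To make the updates concrete, here is a minimal NumPy sketch of this algorithm for the quadratic case. The function name and the tolerance `eps` are illustrative choices, not from the lecture:

```python
import numpy as np

def cg_quadratic(Q, c, x, eps=1e-10):
    """Minimize f(x) = 0.5 x'Qx + c'x for symmetric positive definite Q."""
    g = Q @ x + c                        # gradient of f at x
    d = -g                               # Step 0: start with steepest descent
    for k in range(len(c)):              # Step 1: at most n iterations
        if np.linalg.norm(g) <= eps:
            break
        alpha = -(g @ d) / (d @ Q @ d)   # exact minimizer along d
        x = x + alpha * d
        g = Q @ x + c
        lam = (g @ Q @ d) / (d @ Q @ d)  # keeps d_{k+1} Q-conjugate to d_k
        d = -g + lam * d
    return x
```

By the expanding subspace theorem, the returned point is exact (up to roundoff) after at most $n$ iterations.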

2.5 Correctness
Slide 6
Theorem: The directions $d_1, \dots, d_n$ generated by the algorithm are $Q$-conjugate.

2.6 Convergence Properties


Slide 7
• This is a finite algorithm: it terminates in at most $n$ steps.
• If $Q$ has only $k$ distinct eigenvalues, the CGA finds an optimal solution in $k$ steps (checked numerically in the sketch below).
• Idea of pre-conditioning: consider

$$\min\ f(Sx) = \tfrac{1}{2}(Sx)^\top Q (Sx) + c^\top S x$$

so that the number of distinct eigenvalues of $S^\top Q S$ is small.
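The claim about $k$ distinct eigenvalues is easy to check numerically. A sketch, assuming NumPy; the 10×10 matrix with eigenvalues 1, 2, and 5 is constructed purely for illustration:

```python
import numpy as np

# A 10x10 symmetric PD matrix with exactly 3 distinct eigenvalues: 1, 2, 5.
rng = np.random.default_rng(0)
V, _ = np.linalg.qr(rng.standard_normal((10, 10)))  # random orthonormal basis
Q = V @ np.diag([1, 1, 1, 1, 2, 2, 2, 5, 5, 5]) @ V.T
c = rng.standard_normal(10)

x = np.zeros(10)
g = Q @ x + c
d = -g
for k in range(1, 11):
    alpha = -(g @ d) / (d @ Q @ d)
    x = x + alpha * d
    g = Q @ x + c
    print(f"iteration {k}: ||grad|| = {np.linalg.norm(g):.2e}")
    if np.linalg.norm(g) < 1e-10:
        break                            # reaches roundoff at k = 3
    lam = (g @ Q @ d) / (d @ Q @ d)
    d = -g + lam * d
```

The printed gradient norm drops to roundoff after exactly 3 iterations, matching the claim.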

2.7 Example
Slide 8
$$f(x) = \frac{1}{2}\, x^\top Q x - c^\top x$$

$$Q = \begin{pmatrix}
35 & 19 & 22 & 28 & 16 & 3 & 16 & 6 & 4 & 4 \\
19 & 43 & 33 & 19 & 5 & 2 & 5 & 4 & 0 & 0 \\
22 & 33 & 40 & 29 & 12 & 7 & 6 & 2 & 2 & 4 \\
28 & 19 & 29 & 39 & 16 & 7 & 14 & 6 & 2 & 4 \\
16 & 5 & 12 & 16 & 12 & 4 & 8 & 2 & 4 & 8 \\
3 & 2 & 7 & 7 & 4 & 5 & 1 & 0 & 1 & 4 \\
16 & 5 & 6 & 14 & 8 & 1 & 12 & 2 & 2 & 4 \\
6 & 4 & 2 & 6 & 2 & 0 & 2 & 4 & 0 & 0 \\
4 & 0 & 2 & 2 & 4 & 1 & 2 & 0 & 2 & 4 \\
4 & 0 & 4 & 4 & 8 & 4 & 4 & 0 & 4 & 16
\end{pmatrix}
\quad \text{and} \quad
c = \begin{pmatrix} -1 \\ 0 \\ 0 \\ -3 \\ 0 \\ -2 \\ 0 \\ -6 \\ -7 \\ -4 \end{pmatrix}$$

$$\kappa(Q) \approx 17{,}641, \qquad \delta = \left( \frac{\kappa(Q) - 1}{\kappa(Q) + 1} \right)^2 = 0.999774$$
Slide 9

 k        f(x^k)    f(x^k) − f(x^*)    (f(x^k) − f(x^*)) / (f(x^{k−1}) − f(x^*))
 1     12.000000        2593.726852    1.000000
 2      8.758578        2590.485430    0.998750
 3      1.869218        2583.596069    0.997341
 4    −12.777374        2568.949478    0.994331
 5    −30.479483        2551.247369    0.993109
 6   −187.804367        2393.922485    0.938334
 7   −309.836907        2271.889945    0.949024
 8   −408.590428        2173.136424    0.956532
 9   −754.887518        1826.839334    0.840646
10  −2567.158421          14.568431    0.007975
11  −2581.711672           0.015180    0.001042
12  −2581.726852           0.000000    0.000000
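A short script reproduces the structure of this table: slow progress while the bound δ ≈ 1, then sudden collapse to the optimum. This is a sketch assuming NumPy; the slide does not give the starting point $x^1$ (the table shows $f(x^1) = 12$), so an arbitrary start such as $x^1 = 0$ reproduces the qualitative behavior rather than the exact numbers:

```python
import numpy as np

Q = np.array([
    [35, 19, 22, 28, 16,  3, 16,  6,  4,  4],
    [19, 43, 33, 19,  5,  2,  5,  4,  0,  0],
    [22, 33, 40, 29, 12,  7,  6,  2,  2,  4],
    [28, 19, 29, 39, 16,  7, 14,  6,  2,  4],
    [16,  5, 12, 16, 12,  4,  8,  2,  4,  8],
    [ 3,  2,  7,  7,  4,  5,  1,  0,  1,  4],
    [16,  5,  6, 14,  8,  1, 12,  2,  2,  4],
    [ 6,  4,  2,  6,  2,  0,  2,  4,  0,  0],
    [ 4,  0,  2,  2,  4,  1,  2,  0,  2,  4],
    [ 4,  0,  4,  4,  8,  4,  4,  0,  4, 16]], dtype=float)
c = np.array([-1, 0, 0, -3, 0, -2, 0, -6, -7, -4], dtype=float)

f = lambda x: 0.5 * x @ Q @ x - c @ x    # note the minus sign in this example
x_star = np.linalg.solve(Q, c)           # gradient Qx - c = 0 at the optimum
f_star = f(x_star)

x = np.zeros(10)                         # stand-in for the slide's (ungiven) x^1
g = Q @ x - c
d = -g
gap_prev = f(x) - f_star
for k in range(1, 13):
    gap = f(x) - f_star
    print(f"{k:2d}  {f(x):15.6f}  {gap:15.6f}  {gap / gap_prev:9.6f}")
    gap_prev = gap
    alpha = -(g @ d) / (d @ Q @ d)
    x = x + alpha * d
    g = Q @ x - c
    lam = (g @ Q @ d) / (d @ Q @ d)
    d = -g + lam * d
```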

2.8 General problems


Slide 10
Step 0 Given $x^1$, set $k := 1$, $d_1 = -\nabla f(x^1)$.
Step 1 If $\|\nabla f(x^k)\| \leq \epsilon$, stop; else:

$$\hat{\alpha}_k = \arg\min_{\alpha} f(x^k + \alpha d_k)$$

(computed by a one-dimensional line search; for general $f$ there is no closed-form step)

$$x^{k+1} = x^k + \hat{\alpha}_k d_k$$

$$d_{k+1} = -\nabla f(x^{k+1}) + \lambda_k d_k, \qquad \lambda_k = \frac{\|\nabla f(x^{k+1})\|^2}{\|\nabla f(x^k)\|^2}$$

Step 2 $k \leftarrow k + 1$, go to Step 1.
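A sketch of this general version, assuming SciPy's scalar minimizer for the one-dimensional line search; the Rosenbrock test function and the iteration cap are my own choices for illustration:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def nonlinear_cg(f, grad, x, eps=1e-8, max_iter=1000):
    """Conjugate gradient for general smooth f (Fletcher-Reeves update)."""
    g = grad(x)
    d = -g
    for _ in range(max_iter):
        if np.linalg.norm(g) <= eps:
            break
        alpha = minimize_scalar(lambda a: f(x + a * d)).x  # line search
        x = x + alpha * d
        g_new = grad(x)
        lam = (g_new @ g_new) / (g @ g)   # ||grad_{k+1}||^2 / ||grad_k||^2
        d = -g_new + lam * d
        g = g_new
    return x

# Example: the Rosenbrock function, minimized at (1, 1).
f = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
grad = lambda x: np.array([
    -2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
    200 * (x[1] - x[0]**2)])
print(nonlinear_cg(f, grad, np.array([-1.0, 1.0])))
```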

3 Necessary Optimality Conditions
3.1 Nonlinear Optimization
Slide 11
$$\min\ f(x) \quad \text{s.t.}\quad g_j(x) \leq 0,\ j = 1, \dots, p; \qquad h_i(x) = 0,\ i = 1, \dots, m$$

$$P = \{\, x \mid g_j(x) \leq 0,\ j = 1, \dots, p;\ \ h_i(x) = 0,\ i = 1, \dots, m \,\}$$

3.2 The KKT conditions
Slide 12
Discovered by Karush, Kuhn, and Tucker in the 1950s.
Theorem
If
• $\bar{x}$ is a local minimum of P
• $I = \{ j \mid g_j(\bar{x}) = 0 \}$ is the set of tight constraints
• Constraint qualification condition (CQC): the vectors $\nabla g_j(\bar{x})$, $j \in I$, and $\nabla h_i(\bar{x})$, $i = 1, \dots, m$, are linearly independent
Slide 13
Then there exist vectors $(u, v)$ such that:

1. $\nabla f(\bar{x}) + \sum_{j=1}^{p} u_j \nabla g_j(\bar{x}) + \sum_{i=1}^{m} v_i \nabla h_i(\bar{x}) = 0$

2. $u \geq 0$

3. $u_j g_j(\bar{x}) = 0$, $j = 1, \dots, p$

3.3 Some Intuition from LO


Slide 14
Linearize the functions in the neighborhood of the solution $\bar{x}$. The problem becomes:

$$\min\ f(\bar{x}) + \nabla f(\bar{x})^\top (x - \bar{x})$$
$$\text{s.t.}\quad g_j(\bar{x}) + \nabla g_j(\bar{x})^\top (x - \bar{x}) \leq 0, \quad j \in I$$
$$h_i(\bar{x}) + \nabla h_i(\bar{x})^\top (x - \bar{x}) = 0, \quad i = 1, \dots, m$$
Slide 15
This is an LO problem. Dual feasibility:

$$\sum_{j \in I} \hat{u}_j \nabla g_j(\bar{x}) + \sum_{i=1}^{m} \hat{v}_i \nabla h_i(\bar{x}) = \nabla f(\bar{x}), \qquad \hat{u}_j \leq 0$$

Change variables to $u_j = -\hat{u}_j$, $v_i = -\hat{v}_i$ to obtain:

$$\nabla f(\bar{x}) + \sum_{j=1}^{p} u_j \nabla g_j(\bar{x}) + \sum_{i=1}^{m} v_i \nabla h_i(\bar{x}) = 0, \qquad u_j \geq 0$$

3.4 Example 1
Slide 16
$$\min\ f(x) = (x_1 - 12)^2 + (x_2 + 6)^2$$
$$\text{s.t.}\quad h_1(x) = 8x_1 + 4x_2 - 20 = 0$$
$$g_1(x) = x_1^2 + x_2^2 + 3x_1 - 4.5 x_2 - 6.5 \leq 0$$
$$g_2(x) = (x_1 - 9)^2 + x_2^2 - 64 \leq 0$$

$\bar{x} = (2, 1)^\top$; $g_1(\bar{x}) = 0$, $g_2(\bar{x}) = -14$, $h_1(\bar{x}) = 0$.
Slide 17

• $I = \{1\}$
• $\nabla f(\bar{x}) = (-20, 14)^\top$; $\nabla g_1(\bar{x}) = (7, -2.5)^\top$
• $\nabla g_2(\bar{x}) = (-14, 2)^\top$; $\nabla h_1(\bar{x}) = (8, 4)^\top$
• $u_1 = 4$, $u_2 = 0$, $v_1 = -1$
• $\nabla g_1(\bar{x})$ and $\nabla h_1(\bar{x})$ are linearly independent
• $\nabla f(\bar{x}) + u_1 \nabla g_1(\bar{x}) + u_2 \nabla g_2(\bar{x}) + v_1 \nabla h_1(\bar{x}) = 0$:

$$\begin{pmatrix} -20 \\ 14 \end{pmatrix} + 4 \begin{pmatrix} 7 \\ -2.5 \end{pmatrix} + 0 \begin{pmatrix} -14 \\ 2 \end{pmatrix} + (-1) \begin{pmatrix} 8 \\ 4 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$$
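These numbers are easy to verify mechanically; a minimal sketch, using only the gradients computed above:

```python
import numpy as np

x = np.array([2.0, 1.0])                               # candidate point
grad_f  = np.array([2 * (x[0] - 12), 2 * (x[1] + 6)])  # (-20, 14)
grad_g1 = np.array([2 * x[0] + 3, 2 * x[1] - 4.5])     # (7, -2.5)
grad_g2 = np.array([2 * (x[0] - 9), 2 * x[1]])         # (-14, 2)
grad_h1 = np.array([8.0, 4.0])

u1, u2, v1 = 4.0, 0.0, -1.0
print(grad_f + u1 * grad_g1 + u2 * grad_g2 + v1 * grad_h1)   # [0. 0.]
```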

3.5 Example 2
Slide 18
$$\max\ x^\top Q x \quad \text{s.t.}\quad x^\top x \leq 1$$

Q arbitrary; not a convex optimization problem. Equivalently:

$$\min\ -x^\top Q x \quad \text{s.t.}\quad x^\top x \leq 1$$

3.5.1 KKT
Slide 19
$$-2Q\bar{x} + 2u\bar{x} = 0$$
$$\bar{x}^\top \bar{x} \leq 1$$
$$u \geq 0$$
$$u(1 - \bar{x}^\top \bar{x}) = 0$$
Slide 20

3.5.2 Solutions of KKT

• $\bar{x} = 0$, $u = 0$, objective value 0.
• $\bar{x} \neq 0 \Rightarrow Q\bar{x} = u\bar{x} \Rightarrow \bar{x}$ is an eigenvector of Q with non-negative eigenvalue u (and $\bar{x}^\top \bar{x} = 1$ by complementarity when $u > 0$).
• $\bar{x}^\top Q \bar{x} = u\, \bar{x}^\top \bar{x} = u$.
• Thus, pick the largest nonnegative eigenvalue $\hat{u}$ of Q. The solution is the corresponding eigenvector $\hat{x}$, normalized so that $\hat{x}^\top \hat{x} = 1$. If all eigenvalues are negative, $\hat{x} = 0$.
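Numerically, this recipe is a single symmetric eigendecomposition. A sketch with a made-up symmetric Q:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
Q = (A + A.T) / 2                        # an arbitrary symmetric Q

eigvals, eigvecs = np.linalg.eigh(Q)     # ascending eigenvalues, unit eigenvectors
u_hat = eigvals[-1]                      # largest eigenvalue
x_hat = eigvecs[:, -1] if u_hat >= 0 else np.zeros(5)
print(u_hat, x_hat @ Q @ x_hat)          # optimal value equals u_hat when u_hat >= 0
```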

3.6 Are CQC Necessary?
Slide 21
$$\min\ x_1 \quad \text{s.t.}\quad x_1^2 - x_2 \leq 0, \qquad x_2 = 0$$
The feasible set is the single point $(0, 0)^\top$.
KKT:
$$\begin{pmatrix} 1 \\ 0 \end{pmatrix} + u \begin{pmatrix} 2x_1 \\ -1 \end{pmatrix} + v \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$$
KKT multipliers do not exist, even though $(0, 0)^\top$ is a local minimum: at $x = (0, 0)^\top$ the first component reads $1 = 0$. Check $\nabla g_1(0, 0) = (0, -1)^\top$ and $\nabla h_1(0, 0) = (0, 1)^\top$: they are linearly dependent, so the CQC fails.

3.7 Constraint Qualification


Slide 22
Slater condition: there exists an $x^0$ such that $g_j(x^0) < 0$ for all $j = 1, \dots, p$, and $h_i(x^0) = 0$ for all $i = 1, \dots, m$.

Theorem: Under the Slater condition, the KKT conditions are necessary.

4 Sufficient Optimality Conditions
Slide 23
Theorem: If
• $\bar{x}$ is feasible for P
• the feasible set P is convex and f(x) is convex
• there exist vectors $(u, v)$ with $u \geq 0$:

$$\nabla f(\bar{x}) + \sum_{j=1}^{p} u_j \nabla g_j(\bar{x}) + \sum_{i=1}^{m} v_i \nabla h_i(\bar{x}) = 0$$

$$u_j g_j(\bar{x}) = 0, \quad j = 1, \dots, p$$

Then $\bar{x}$ is a global minimum of P.

4.1 Proof
Slide 24
• Let $x \in P$. Then $(1 - \lambda)\bar{x} + \lambda x \in P$ for $\lambda \in [0, 1]$.
• For the tight constraints ($g_j(\bar{x}) = 0$; the others have $u_j = 0$), $g_j(\bar{x} + \lambda(x - \bar{x})) \leq 0$ implies

$$\nabla g_j(\bar{x})^\top (x - \bar{x}) \leq 0$$

• Similarly, $h_i(\bar{x} + \lambda(x - \bar{x})) = 0$ implies

$$\nabla h_i(\bar{x})^\top (x - \bar{x}) = 0$$

Slide 25
• Thus,

$$\nabla f(\bar{x})^\top (x - \bar{x}) = -\left( \sum_{j=1}^{p} u_j \nabla g_j(\bar{x}) + \sum_{i=1}^{m} v_i \nabla h_i(\bar{x}) \right)^{\!\top} (x - \bar{x}) \geq 0$$

• By convexity of f, $f(x) \geq f(\bar{x}) + \nabla f(\bar{x})^\top (x - \bar{x}) \geq f(\bar{x})$.

5 Convex Optimization
Slide 26
• The KKT conditions are always necessary under CQC.
• The KKT conditions are sufficient for convex optimization problems.
• The KKT conditions are necessary and sufficient for convex optimization problems under CQC.
• For $\min f(x)$ s.t. $Ax = b$, $x \geq 0$ with f(x) convex, the KKT conditions are necessary and sufficient even without CQC.

5.0.1 Separating hyperplanes


Slide 27
Theorem: Let S be a nonempty closed convex subset of $\mathbb{R}^n$ and let $x^* \in \mathbb{R}^n$ be a vector that does not belong to S. Then there exists some vector $c \in \mathbb{R}^n$ such that $c^\top x^* < c^\top x$ for all $x \in S$.
Proof in BT, p. 170.

5.1 Sketch of the Proof under convexity


Slide 28
• Suppose $\bar{x}$ is a local (and thus global) optimal solution.
• The system $f(x) < f(\bar{x})$, $g_j(x) \leq 0$, $j = 1, \dots, p$, $h_i(x) = 0$, $i = 1, \dots, m$, is infeasible.
• Let $U = \{ (u_0, u, v) \mid \text{there exists } x:\ f(x) < u_0,\ g_j(x) \leq u_j,\ h_i(x) = v_i \}$.
• $(f(\bar{x}), 0, 0) \notin U$.
• U is convex.
Slide 29
• By the separating hyperplane theorem, there is a vector $(c_0, c, d)$ such that

$$c_0 u_0 + \sum_{j=1}^{p} c_j u_j + \sum_{i=1}^{m} d_i v_i > c_0 f(\bar{x}) \quad \forall\ (u_0, u, v) \in U.$$
[Figure: a closed convex set S and a point x* outside it, separated by a hyperplane.]

• $c_0 \geq 0$ and $c_j \geq 0$ for $j \in I$ (the tight constraints $g_j(\bar{x}) = 0$). Why?
If $(u_0, u, v) \in U$, then $(u_0 + \lambda, u, v) \in U$ for $\lambda \geq 0$. Thus,

$$\forall\ \lambda \geq 0: \quad c_0 (u_0 + \lambda) + \sum_{j=1}^{p} c_j u_j + \sum_{i=1}^{m} d_i v_i > c_0 f(\bar{x}) \ \Rightarrow\ c_0 \geq 0.$$

• Select $(u_0, u, v) = (f(x) + \lambda,\ g_1(x), \dots, g_p(x),\ h_1(x), \dots, h_m(x)) \in U$:

$$c_0 (f(x) + \lambda) + \sum_{j=1}^{p} c_j g_j(x) + \sum_{i=1}^{m} d_i h_i(x) > c_0 f(\bar{x})$$

• Take $\lambda \to 0$:

$$c_0 f(x) + \sum_{j=1}^{p} c_j g_j(x) + \sum_{i=1}^{m} d_i h_i(x) \geq c_0 f(\bar{x})$$

• $c_0 > 0$ (constraint qualification is needed here).
• Divide by $c_0$ and set $u_j = c_j / c_0 \geq 0$, $v_i = d_i / c_0$; for every $x$:

$$f(x) + \sum_{j=1}^{p} u_j g_j(x) + \sum_{i=1}^{m} v_i h_i(x) \geq f(\bar{x}), \qquad u_j \geq 0$$

• Since $g_j(\bar{x}) \leq 0$, $u_j \geq 0$, and $h_i(\bar{x}) = 0$:

$$f(\bar{x}) + \sum_{j=1}^{p} u_j g_j(\bar{x}) + \sum_{i=1}^{m} v_i h_i(\bar{x}) \leq f(\bar{x}) \leq f(x) + \sum_{j=1}^{p} u_j g_j(x) + \sum_{i=1}^{m} v_i h_i(x).$$

• Thus,

$$f(\bar{x}) = \min_x \left( f(x) + \sum_{j=1}^{p} u_j g_j(x) + \sum_{i=1}^{m} v_i h_i(x) \right)$$

$$\sum_{j=1}^{p} u_j g_j(\bar{x}) = 0 \ \Rightarrow\ u_j g_j(\bar{x}) = 0$$

• The unconstrained optimality conditions at the minimizer $\bar{x}$:

$$\nabla f(\bar{x}) + \sum_{j=1}^{p} u_j \nabla g_j(\bar{x}) + \sum_{i=1}^{m} v_i \nabla h_i(\bar{x}) = 0$$

6 Applications
6.1 Linear Optimization
Slide 30
$$\min\ c^\top x \quad \text{s.t.}\quad Ax = b,\ x \geq 0$$

Equivalently:

$$\min\ c^\top x \quad \text{s.t.}\quad Ax - b = 0,\ -x \leq 0$$

6.1.1 KKT
Slide 31
$$c + A^\top \hat{u} - v = 0, \qquad v \geq 0, \qquad v_j x_j = 0, \qquad Ax - b = 0, \qquad x \geq 0$$

With $u = -\hat{u}$:

$$A^\top u \leq c \quad \text{(dual feasibility)}$$
$$(c_j - A_j^\top u)\, x_j = 0 \quad \text{(complementarity)}$$
$$Ax - b = 0, \quad x \geq 0 \quad \text{(primal feasibility)}$$
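A sketch checking all three conditions on a small made-up instance, assuming SciPy's HiGHS-based `linprog`, whose `res.eqlin.marginals` field exposes the equality-constraint duals:

```python
import numpy as np
from scipy.optimize import linprog

# Made-up instance of min c'x s.t. Ax = b, x >= 0.
A = np.array([[1.0, 1.0, 1.0],
              [2.0, 1.0, 0.0]])
b = np.array([4.0, 5.0])
c = np.array([1.0, 2.0, 3.0])

res = linprog(c, A_eq=A, b_eq=b, method="highs")
x, u = res.x, res.eqlin.marginals        # primal solution and equality duals

print("primal feasibility:", np.allclose(A @ x, b) and np.all(x >= -1e-9))
print("dual feasibility:  ", np.all(A.T @ u <= c + 1e-9))
print("complementarity:   ", np.allclose((c - A.T @ u) * x, 0.0))
```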

6.2 Portfolio Optimization
Slide 32
x = weights of the portfolio

$$\max\ r^\top x - \frac{\lambda}{2}\, x^\top Q x \quad \text{s.t.}\quad e^\top x = 1$$

Equivalently:

$$\min\ -r^\top x + \frac{\lambda}{2}\, x^\top Q x \quad \text{s.t.}\quad e^\top x = 1$$

6.2.1 KKT
Slide 33
$$-r + \lambda Q x + u e = 0 \ \Rightarrow\ x = \frac{1}{\lambda} Q^{-1} (r - u e)$$

$$e^\top x = 1 \ \Rightarrow\ e^\top Q^{-1} (r - u e) = \lambda \ \Rightarrow\ u = \frac{e^\top Q^{-1} r - \lambda}{e^\top Q^{-1} e}$$

As λ changes, the tradeoff between risk and return changes, and the allocation changes as well. This is the essence of modern portfolio theory.
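The closed form translates directly into code. A minimal sketch with made-up data for r and Q; `e` is the vector of all ones:

```python
import numpy as np

def portfolio_weights(r, Q, lam):
    """Closed-form KKT solution of min -r'x + (lam/2) x'Qx s.t. e'x = 1."""
    e = np.ones(len(r))
    Qinv_r = np.linalg.solve(Q, r)
    Qinv_e = np.linalg.solve(Q, e)
    u = (e @ Qinv_r - lam) / (e @ Qinv_e)
    return (Qinv_r - u * Qinv_e) / lam

r = np.array([0.08, 0.12, 0.10])         # made-up expected returns
Q = np.array([[0.04, 0.01, 0.00],        # made-up covariance matrix
              [0.01, 0.09, 0.02],
              [0.00, 0.02, 0.06]])
for lam in (1.0, 5.0, 25.0):             # larger lam = more risk-averse
    x = portfolio_weights(r, Q, lam)
    print(lam, x.round(3), "sum =", round(x.sum(), 6))
```

Sweeping λ traces out the tradeoff: each value gives a different allocation, always summing to one.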

MIT OpenCourseWare
http://ocw.mit.edu

15.093J / 6.255J Optimization Methods


Fall 2009

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
