Unconstrained Optimization (Contd.) and Constrained Optimization


Lecture 12

Unconstrained Optimization (contd.)


Constrained Optimization

October 15, 2008



Outline

• Gradient descent algorithm


• Improvement to result in Lec 11
• At what rate will it converge?

• Constrained minimization over simple sets


Unconstrained Minimization
minimize f(x)
subject to x ∈ ℝⁿ.
• Assumption 1:
• The function f is convex and continuously differentiable over ℝⁿ.
• The optimal value f* = inf_{x ∈ ℝⁿ} f(x) is finite.

• Gradient descent algorithm

$$x_{k+1} = x_k - \alpha \nabla f(x_k)$$
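As a quick illustration (added here, not part of the original slides), a minimal sketch of the constant-stepsize gradient method in Python; the quadratic objective, stepsize, and starting point are assumptions chosen only for the demo.

import numpy as np

def gradient_descent(grad, x0, alpha, num_iters):
    """Constant-stepsize gradient method: x_{k+1} = x_k - alpha * grad(x_k)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iters):
        x = x - alpha * grad(x)
    return x

# Assumed example: f(x) = 0.5 * ||x||^2, so grad f(x) = x and the minimizer is 0.
x_final = gradient_descent(grad=lambda x: x, x0=[4.0, -2.0], alpha=0.1, num_iters=200)
print(x_final)  # close to the minimizer [0, 0]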


Theorem 1: Bounded Gradients

• Theorem 1 Let Assumption 1 hold, and suppose that the gradients are
  bounded by some L > 0. Then, the gradient method generates the sequence {x_k} such
  that
$$\liminf_{k \to \infty} f(x_k) \le f^* + \frac{\alpha L^2}{2}.$$

• We first proved that for all y ∈ ℝⁿ and for all k,
$$\|x_{k+1} - y\|^2 \le \|x_k - y\|^2 - 2\alpha_k (f(x_k) - f(y)) + \alpha_k^2 \|\nabla f(x_k)\|^2.$$
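For completeness, a brief sketch (added here) of why this basic iterate relation holds: expand the update and use the convexity of f. This is the standard argument, not the literal steps from Lecture 11.
$$\|x_{k+1} - y\|^2 = \|x_k - \alpha_k \nabla f(x_k) - y\|^2 = \|x_k - y\|^2 - 2\alpha_k \nabla f(x_k)^T (x_k - y) + \alpha_k^2 \|\nabla f(x_k)\|^2,$$
and by convexity f(y) ≥ f(x_k) + ∇f(x_k)^T(y − x_k), i.e., ∇f(x_k)^T(x_k − y) ≥ f(x_k) − f(y), which gives the inequality above.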


Theorem 2: Lipschitz Gradients

• Q-approximation Lemma
  For a continuously differentiable function whose gradients are Lipschitz with constant M, we have
$$f(y) \le f(x) + \nabla f(x)^T (y - x) + \frac{M}{2}\|y - x\|^2 \quad \text{for all } x, y \in \mathbb{R}^n.$$
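For reference, a one-line sketch (added here; the standard integral argument, not reproduced from the slides) of where this bound comes from:
$$f(y) - f(x) - \nabla f(x)^T (y - x) = \int_0^1 \bigl(\nabla f(x + t(y - x)) - \nabla f(x)\bigr)^T (y - x)\, dt \le \int_0^1 t M \|y - x\|^2\, dt = \frac{M}{2}\|y - x\|^2.$$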

• Theorem Let Assumption 1 hold, and assume that the gradients of f
  are Lipschitz. Then, for α with α < 2/M, we have
$$\lim_{k \to \infty} \|\nabla f(x_k)\| = 0.$$

If in addition, an optimal solution exists [i.e., min_x f(x) is attained
at some x*], then every accumulation point of the sequence {x_k} is
optimal.


• Proof: Using the Q-approximation Lemma with y = x_{k+1} and x = x_k, we
have
$$f(x_{k+1}) \le f(x_k) - \alpha \|\nabla f(x_k)\|^2 + \frac{\alpha^2 M}{2}\|\nabla f(x_k)\|^2 = f(x_k) - \frac{\alpha}{2}(2 - \alpha M)\,\|\nabla f(x_k)\|^2.$$

By summing these relations over k and using 2 − αM > 0, we can see that
$$\sum_{k} \|\nabla f(x_k)\|^2 < \infty,$$
implying lim_{k→∞} ‖∇f(x_k)‖ = 0.
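To make the summation step explicit (this bound is added here; it is the standard telescoping argument under the stated assumptions):
$$\frac{\alpha}{2}(2 - \alpha M) \sum_{k=0}^{K} \|\nabla f(x_k)\|^2 \le \sum_{k=0}^{K} \bigl(f(x_k) - f(x_{k+1})\bigr) = f(x_0) - f(x_{K+1}) \le f(x_0) - f^*,$$
and letting K → ∞ gives the finite sum.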


Suppose a solution exists, and the sequence {xk } has an accumulation
point x̄. Then, by continuity of the gradient we have ∇f (xk ) → ∇f (x̄)
along an appropriate subsequence.


Since ∇f (xk ) → 0, it follows ∇f (x̄) = 0, implying that x̄ is a solution.


• Where do we use convexity?
• Correct Theorem 2: Assume that the gradients of f are Lipschitz. Then,
  for α with α < 2/M, we have
$$\sum_{k=1}^{\infty} \|\nabla f(x_k)\|^2 < \infty.$$

If in addition, an optimal solution exists [i.e., min_x f(x) is attained
at some x*], then every accumulation point of the sequence {x_k} is
optimal.
• {x_k} need not converge. Note the difference between a limit point and
an accumulation point.
• Example: f(x) = 1/(1 + x²). Satisfies the conditions, and x_k → ∞.
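A small numerical sketch (added for illustration, not from the original slides) of this example: the gradient norms vanish while the iterates drift off to infinity, so without convexity the iterates need not converge. The stepsize and starting point below are assumptions for the demo.

# f(x) = 1 / (1 + x^2): smooth with a Lipschitz gradient and inf f = 0, but not convex.
def grad_f(x):
    return -2.0 * x / (1.0 + x**2) ** 2

x = 1.0        # assumed starting point
alpha = 0.4    # assumed stepsize
for k in range(200000):
    x = x - alpha * grad_f(x)

print(x, abs(grad_f(x)))  # x keeps growing (toward infinity) while |f'(x)| tends to 0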


What does convexity buy?

• Answer: Convergence of the sequence {xk }

• Recall/Define X ∗ as optimal set.

• Theorem 3: Let Assumption 1 hold. Further, assume that the gradients
  of f are Lipschitz and that α < 2/M. If X* is nonempty, then
$$\lim_{k \to \infty} x_k = x^*, \qquad x^* \in X^*.$$

• Proof (outline): The previous theorem tells us a lot: every convergent
  subsequence of {x_k} converges to a point in X*. Questions:
  • Is there a convergent subsequence?
  • Do all the subsequences converge to the same point in X*?


The conditions of Lemma 1 are satisfied. Fix y = x̃, where x̃ ∈ X*:
$$\|x_{k+1} - \tilde{x}\|^2 \le \|x_k - \tilde{x}\|^2 - 2\alpha (f(x_k) - f^*) + \alpha^2 \|\nabla f(x_k)\|^2.$$
Dropping the negative term,
$$\|x_{k+1} - \tilde{x}\|^2 \le \|x_k - \tilde{x}\|^2 + \alpha^2 \|\nabla f(x_k)\|^2.$$

Note from Theorem 2 that
$$\sum_{l=1}^{\infty} \|\nabla f(x_l)\|^2 < \infty.$$


Therefore, adding α² Σ_{l=k+1}^∞ ‖∇f(x_l)‖² to both sides, we obtain
$$\|x_{k+1} - \tilde{x}\|^2 + \alpha^2 \sum_{l=k+1}^{\infty} \|\nabla f(x_l)\|^2 \le \|x_k - \tilde{x}\|^2 + \alpha^2 \sum_{l=k}^{\infty} \|\nabla f(x_l)\|^2.$$
Define
$$u_k = \|x_k - \tilde{x}\|^2 + \alpha^2 \sum_{l=k}^{\infty} \|\nabla f(x_l)\|^2.$$
Thus, the relation above reads
$$u_{k+1} \le u_k,$$
which implies that {u_k} converges. We already know that the tail Σ_{l=k}^∞ ‖∇f(x_l)‖² converges to 0 as k → ∞. Therefore, ‖x_k − x̃‖ converges. Therefore, {x_k}
is bounded, so a convergent subsequence exists.
Take a convergent subsequence {x_{s_k}} and let its limit point be x̄. We know


x̄ ∈ X* from the previous theorem. Thus,
$$\lim_{k \to \infty} \|x_{s_k} - \bar{x}\| = 0.$$

But we just showed that lim_{k→∞} ‖x_k − x̄‖ exists. Therefore, the limit
must be 0, and {x_k} converges to x̄.

• Quick Summary:
• Without convexity no convergence of iterates.
• With convexity convergence of iterates.

• Go back and check f(x) = 1/(1 + x²). It can't be convex.
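Indeed, a quick second-derivative check (added here) confirms it:
$$f''(x) = \frac{6x^2 - 2}{(1 + x^2)^3},$$
which is negative for |x| < 1/√3, so f is not convex.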

• General note: What do I get from all this analysis?


Rates of convergence

• Successfully solved the problem. Finish the course and go home? Unfortunately,
  no. Gradient descent has poor rates of convergence, as is empirically
  observed when we simulate.
• See last page for an example.
• Let's consider the class of strongly convex functions. Recall:
• ∇²f(x) ⪰ mI for all x ∈ ℝⁿ
• f(y) ≥ f(x) + ∇f(x)^T(y − x) + (m/2)‖y − x‖² for all x, y ∈ ℝⁿ
• (1/(2m))‖∇f(x)‖² ≥ f(x) − f* ≥ (m/2)‖x − x*‖² (a sketch of the first inequality follows below)
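A sketch (added here) of where the leftmost inequality comes from, using the strong-convexity bound above: minimizing the right-hand side of f(y) ≥ f(x) + ∇f(x)^T(y − x) + (m/2)‖y − x‖² over y (the minimizer is y = x − (1/m)∇f(x)) gives
$$f^* \ge f(x) - \frac{1}{2m}\|\nabla f(x)\|^2, \quad\text{i.e.,}\quad \frac{1}{2m}\|\nabla f(x)\|^2 \ge f(x) - f^*.$$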


• Theorem 4: Let Assumption 1 hold. Further, assume that f is strongly
  convex, the gradients of f are Lipschitz, and α < min(2, m)/M. Then,
$$\|x_k - x^*\| \le c\, q^k, \qquad 0 < q < 1.$$
  Note: Geometric rate of convergence. (Normal people call it exponential ;))

• Proof: Strong convexity ⇒ a unique minimizer. The basic iterate relation


with y = x* gives
$$\begin{aligned}
\|x_{k+1} - x^*\|^2 &\le \|x_k - x^*\|^2 - 2\alpha (f(x_k) - f^*) + \alpha^2 \|\nabla f(x_k)\|^2 \\
&\le \|x_k - x^*\|^2 - 2\alpha\,\tfrac{m}{2}\|x_k - x^*\|^2 + \alpha^2 \|\nabla f(x_k)\|^2 \\
&= \|x_k - x^*\|^2 - 2\alpha\,\tfrac{m}{2}\|x_k - x^*\|^2 + \alpha^2 \|\nabla f(x_k) - \nabla f(x^*)\|^2 \\
&\le \|x_k - x^*\|^2 - \alpha m \|x_k - x^*\|^2 + \alpha^2 M \|x_k - x^*\|^2 \\
&= \left(1 - m\alpha + \alpha^2 M\right) \|x_k - x^*\|^2 \\
&\le \left(1 - m\alpha + \alpha^2 M\right)^{k+1} \|x_0 - x^*\|^2.
\end{aligned}$$

• Remark: The result holds when 0 < α < 2/m.

• Geometric is not bad. But, how many functions are strongly convex?
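As a quick illustration of the geometric rate (added here, not from the slides): gradient descent on a strongly convex quadratic shrinks the distance to the minimizer by a roughly constant factor per step. The matrix, stepsize, and starting point below are assumptions for the demo.

import numpy as np

# Assumed example: f(x) = 0.5 * x^T A x with A positive definite, so x* = 0.
A = np.array([[3.0, 0.0], [0.0, 1.0]])   # m = 1 (smallest eigenvalue), M = 3 (largest)
x = np.array([5.0, -4.0])
alpha = 0.25                              # assumed stepsize, within the allowed range here

errors = []
for k in range(30):
    errors.append(np.linalg.norm(x))      # ||x_k - x*||
    x = x - alpha * (A @ x)               # gradient step: grad f(x) = A x

ratios = [errors[k + 1] / errors[k] for k in range(len(errors) - 1)]
print(ratios[:5])   # ratios settle near a constant q < 1, i.e., ||x_k - x*|| <= c * q^k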


Constrained minimization over simple sets

minimize f(x)
subject to x ∈ X.
• Assumption 2
• f is continuously differentiable and convex on an open set containing X
• X is closed and compact
• f ∗ > −∞

• Optimality condition: ∇f(x*)^T(x − x*) ≥ 0 for all x ∈ X

• Simple sets: sets with an easy projection, e.g., balls, hyperplanes, halfspaces (closed-form projections sketched below)
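For concreteness (added here; these are the standard closed-form projections, not reproduced from the slides):
$$P_{\{x:\,\|x - c\| \le r\}}[z] = c + r\,\frac{z - c}{\max\{\|z - c\|,\, r\}}, \qquad
P_{\{x:\, a^T x = b\}}[z] = z - \frac{a^T z - b}{\|a\|^2}\, a, \qquad
P_{\{x:\, a^T x \le b\}}[z] = z - \frac{\max\{a^T z - b,\, 0\}}{\|a\|^2}\, a.$$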

• Assumption 2: The set X is closed and convex.


• Recall projection
• ‖P_X[x] − x‖ ≤ ‖x − y‖ for all y ∈ X
• ‖P_X[x] − P_X[y]‖ ≤ ‖x − y‖ for all x, y ∈ ℝⁿ

• Gradient projection algorithm

$$x_{k+1} = P_X[x_k - \alpha_k \nabla f(x_k)]$$
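A minimal sketch (added for illustration) of the gradient projection method with projection onto a Euclidean ball; the feasible set, objective, and stepsize below are assumptions for the demo.

import numpy as np

def project_onto_ball(z, center, radius):
    # Euclidean projection onto the ball {x : ||x - center|| <= radius}.
    d = z - center
    dist = np.linalg.norm(d)
    return z if dist <= radius else center + radius * d / dist

def projected_gradient(grad, project, x0, alpha, num_iters):
    # x_{k+1} = P_X[x_k - alpha * grad(x_k)] with a constant stepsize.
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iters):
        x = project(x - alpha * grad(x))
    return x

# Assumed example: minimize f(x) = ||x - b||^2 over the unit ball centered at 0.
b = np.array([2.0, 2.0])
x_final = projected_gradient(
    grad=lambda x: 2.0 * (x - b),
    project=lambda z: project_onto_ball(z, center=np.zeros(2), radius=1.0),
    x0=np.zeros(2), alpha=0.1, num_iters=100,
)
print(x_final)  # close to b / ||b||, i.e., about [0.707, 0.707]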

• Can we obtain results similar to Theorems 1-4?

• Lemma 1 and Theorem 1 Let Assumptions 1 and 2 hold, and suppose
  that the gradients are bounded. Then, for any y ∈ X and all k,
$$\|x_{k+1} - y\|^2 \le \|x_k - y\|^2 - 2\alpha_k (f(x_k) - f(y)) + \alpha_k^2 \|\nabla f(x_k)\|^2,$$


and hence, for α_k = α,
$$\liminf_{k \to \infty} f(x_k) \le f^* + \frac{\alpha L^2}{2}.$$

• Proof: By the definition of the method, it follows that for any k,
$$\begin{aligned}
\|x_{k+1} - y\|^2 &= \|P_X[x_k - \alpha_k \nabla f(x_k)] - y\|^2 \\
&\le \|x_k - \alpha_k \nabla f(x_k) - y\|^2 \\
&\le \|x_k - y\|^2 - 2\alpha_k \nabla f(x_k)^T (x_k - y) + \alpha_k^2 \|\nabla f(x_k)\|^2.
\end{aligned}$$

The rest of the argument is identical.

• Theorem 2: Not very interesting. After all, ∇f(x*) need not be 0.
  What we could show is something like lim_{k→∞} ∇f(x_k)^T(x − x_k) ≥ 0
  for all x ∈ X.


• Theorem 3: Let Assumption 2 hold. Further, assume that the gradients
  of f are Lipschitz and that α < 2/M. If X* is nonempty, then
$$\lim_{k \to \infty} x_k = x^*, \qquad x^* \in X^*.$$

• Theorem 4: Let Assumption 2 hold. Further, assume that f is strongly
  convex, the gradients of f are Lipschitz, and α < min(2, m)/M. Then,
$$\|x_k - x^*\| \le \bar{c}\, \bar{q}^{\,k}, \qquad 0 < \bar{q} < 1.$$


Proofs

Pack it. I am going home! Prof. Nedić will do it. I think.

