Unconstrained Optimization (Contd.) and Constrained Optimization


Lecture 12

Unconstrained Optimization (contd.)


Constrained Optimization

October 15, 2008



Outline

• Gradient descent algorithm


• Improvement to result in Lec 11
• At what rate will it converge?

• Constrained minimization over simple sets


Unconstrained Minimization
minimize f(x)
subject to x ∈ ℝⁿ.
• Assumption 1:
• The function f is convex and continuously differentiable over ℝⁿ.
• The optimal value f* = inf_{x ∈ ℝⁿ} f(x) is finite.

• Gradient descent algorithm

$$x_{k+1} = x_k - \alpha \nabla f(x_k)$$
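As a quick illustration (added here, not part of the original slides), a minimal sketch of the constant-stepsize gradient method in Python; the quadratic objective, stepsize, and starting point are assumptions chosen only for the demo.

import numpy as np

def gradient_descent(grad, x0, alpha, num_iters):
    """Constant-stepsize gradient method: x_{k+1} = x_k - alpha * grad(x_k)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iters):
        x = x - alpha * grad(x)
    return x

# Assumed example: f(x) = 0.5 * ||x||^2, so grad f(x) = x and the minimizer is 0.
x_final = gradient_descent(grad=lambda x: x, x0=[4.0, -2.0], alpha=0.1, num_iters=200)
print(x_final)  # close to the minimizer [0, 0]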


Theorem 1: Bounded Gradients

• Theorem 1 Let Assumption 1 hold, and suppose that the gradients are
  bounded by some L > 0. Then, the gradient method generates the sequence {x_k} such
  that
$$\liminf_{k \to \infty} f(x_k) \le f^* + \frac{\alpha L^2}{2}.$$

• We first proved that for all y ∈ ℝⁿ and for all k,
$$\|x_{k+1} - y\|^2 \le \|x_k - y\|^2 - 2\alpha_k (f(x_k) - f(y)) + \alpha_k^2 \|\nabla f(x_k)\|^2.$$
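For completeness, a brief sketch (added here) of why this basic iterate relation holds: expand the update and use the convexity of f. This is the standard argument, not the literal steps from Lecture 11.
$$\|x_{k+1} - y\|^2 = \|x_k - \alpha_k \nabla f(x_k) - y\|^2 = \|x_k - y\|^2 - 2\alpha_k \nabla f(x_k)^T (x_k - y) + \alpha_k^2 \|\nabla f(x_k)\|^2,$$
and by convexity f(y) ≥ f(x_k) + ∇f(x_k)^T(y − x_k), i.e., ∇f(x_k)^T(x_k − y) ≥ f(x_k) − f(y), which gives the inequality above.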


Theorem 2: Lipschitz Gradients

• Q-approximation Lemma
  For a continuously differentiable function whose gradients are Lipschitz with constant M, we have
$$f(y) \le f(x) + \nabla f(x)^T (y - x) + \frac{M}{2}\|y - x\|^2 \quad \text{for all } x, y \in \mathbb{R}^n.$$
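For reference, a one-line sketch (added here; the standard integral argument, not reproduced from the slides) of where this bound comes from:
$$f(y) - f(x) - \nabla f(x)^T (y - x) = \int_0^1 \bigl(\nabla f(x + t(y - x)) - \nabla f(x)\bigr)^T (y - x)\, dt \le \int_0^1 t M \|y - x\|^2\, dt = \frac{M}{2}\|y - x\|^2.$$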

• Theorem Let Assumption 1 hold, and assume that the gradients of f
  are Lipschitz. Then, for α with α < 2/M, we have
$$\lim_{k \to \infty} \|\nabla f(x_k)\| = 0.$$

If in addition, an optimal solution exists [i.e., min_x f(x) is attained
at some x*], then every accumulation point of the sequence {x_k} is
optimal.


• Proof: Using the Q-approximation Lemma with y = x_{k+1} and x = x_k, we
have
$$f(x_{k+1}) \le f(x_k) - \alpha \|\nabla f(x_k)\|^2 + \frac{\alpha^2 M}{2}\|\nabla f(x_k)\|^2 = f(x_k) - \frac{\alpha}{2}(2 - \alpha M)\,\|\nabla f(x_k)\|^2.$$

By summing these relations over k and using 2 − αM > 0, we can see that
$$\sum_{k} \|\nabla f(x_k)\|^2 < \infty,$$
implying lim_{k→∞} ‖∇f(x_k)‖ = 0.
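To make the summation step explicit (this bound is added here; it is the standard telescoping argument under the stated assumptions):
$$\frac{\alpha}{2}(2 - \alpha M) \sum_{k=0}^{K} \|\nabla f(x_k)\|^2 \le \sum_{k=0}^{K} \bigl(f(x_k) - f(x_{k+1})\bigr) = f(x_0) - f(x_{K+1}) \le f(x_0) - f^*,$$
and letting K → ∞ gives the finite sum.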


Suppose a solution exists, and the sequence {xk } has an accumulation
point x̄. Then, by continuity of the gradient we have ∇f (xk ) → ∇f (x̄)
along an appropriate subsequence.


Since ∇f (xk ) → 0, it follows ∇f (x̄) = 0, implying that x̄ is a solution.


• Where do we use convexity?
• Correct Theorem 2: Assume that the gradients of f are Lipschitz. Then,
  for α with α < 2/M, we have
$$\sum_{k=1}^{\infty} \|\nabla f(x_k)\|^2 < \infty.$$

If in addition, an optimal solution exists [i.e., min_x f(x) is attained
at some x*], then every accumulation point of the sequence {x_k} is
optimal.
• {x_k} need not converge. Note the difference between a limit point and
an accumulation point.
• Example: f(x) = 1/(1 + x²). Satisfies the conditions, and x_k → ∞.
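A small numerical sketch (added for illustration, not from the original slides) of this example: the gradient norms vanish while the iterates drift off to infinity, so without convexity the iterates need not converge. The stepsize and starting point below are assumptions for the demo.

# f(x) = 1 / (1 + x^2): smooth with a Lipschitz gradient and inf f = 0, but not convex.
def grad_f(x):
    return -2.0 * x / (1.0 + x**2) ** 2

x = 1.0        # assumed starting point
alpha = 0.4    # assumed stepsize
for k in range(200000):
    x = x - alpha * grad_f(x)

print(x, abs(grad_f(x)))  # x keeps growing (toward infinity) while |f'(x)| tends to 0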


What does convexity buy?

• Answer: Convergence of the sequence {xk }

• Recall/Define X ∗ as optimal set.

• Theorem 3: Let Assumption 1 hold. Further, assume that the gradients
  of f are Lipschitz and that α < 2/M. If X* is nonempty, then
$$\lim_{k \to \infty} x_k = x^*, \qquad x^* \in X^*.$$

• Proof (outline): The previous theorem tells us a lot: every convergent
  subsequence of {x_k} converges to a point in X*. Questions:
  • Is there a convergent subsequence?
  • Do all the subsequences converge to the same point in X*?


The conditions of Lemma 1 are satisfied. Fix y = x̃, where x̃ ∈ X*:
$$\|x_{k+1} - \tilde{x}\|^2 \le \|x_k - \tilde{x}\|^2 - 2\alpha (f(x_k) - f^*) + \alpha^2 \|\nabla f(x_k)\|^2.$$
Dropping the negative term,
$$\|x_{k+1} - \tilde{x}\|^2 \le \|x_k - \tilde{x}\|^2 + \alpha^2 \|\nabla f(x_k)\|^2.$$

Note from Theorem 2 that
$$\sum_{l=1}^{\infty} \|\nabla f(x_l)\|^2 < \infty.$$


Therefore, adding α² Σ_{l=k+1}^∞ ‖∇f(x_l)‖² to both sides, we obtain
$$\|x_{k+1} - \tilde{x}\|^2 + \alpha^2 \sum_{l=k+1}^{\infty} \|\nabla f(x_l)\|^2 \le \|x_k - \tilde{x}\|^2 + \alpha^2 \sum_{l=k}^{\infty} \|\nabla f(x_l)\|^2.$$
Define
$$u_k = \|x_k - \tilde{x}\|^2 + \alpha^2 \sum_{l=k}^{\infty} \|\nabla f(x_l)\|^2.$$
Thus, the relation above reads
$$u_{k+1} \le u_k,$$
which implies that {u_k} converges. We already know that the tail Σ_{l=k}^∞ ‖∇f(x_l)‖² converges to 0 as k → ∞. Therefore, ‖x_k − x̃‖ converges. Therefore, {x_k}
is bounded, so a convergent subsequence exists.
Take a convergent subsequence {x_{s_k}} and let its limit point be x̄. We know


x̄ ∈ X* from the previous theorem. Thus,
$$\lim_{k \to \infty} \|x_{s_k} - \bar{x}\| = 0.$$

But we just showed that lim_{k→∞} ‖x_k − x̄‖ exists. Therefore, the limit
must be 0, and {x_k} converges to x̄.

• Quick Summary:
• Without convexity no convergence of iterates.
• With convexity convergence of iterates.

• Go back and check f(x) = 1/(1 + x²). It can't be convex.
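Indeed, a quick second-derivative check (added here) confirms it:
$$f''(x) = \frac{6x^2 - 2}{(1 + x^2)^3},$$
which is negative for |x| < 1/√3, so f is not convex.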

• General note: What do I get from all this analysis?


Rates of convergence

• Successfully solved the problem. Finish the course and go home? Unfortunately,
  no. Gradient descent has poor rates of convergence, as is empirically
  observed when we simulate.
• See last page for an example.
• Let's consider the class of strongly convex functions. Recall:
• ∇²f(x) ⪰ mI for all x ∈ ℝⁿ
• f(y) ≥ f(x) + ∇f(x)^T(y − x) + (m/2)‖y − x‖² for all x, y ∈ ℝⁿ
• (1/(2m))‖∇f(x)‖² ≥ f(x) − f* ≥ (m/2)‖x − x*‖² (a sketch of the first inequality follows below)
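A sketch (added here) of where the leftmost inequality comes from, using the strong-convexity bound above: minimizing the right-hand side of f(y) ≥ f(x) + ∇f(x)^T(y − x) + (m/2)‖y − x‖² over y (the minimizer is y = x − (1/m)∇f(x)) gives
$$f^* \ge f(x) - \frac{1}{2m}\|\nabla f(x)\|^2, \quad\text{i.e.,}\quad \frac{1}{2m}\|\nabla f(x)\|^2 \ge f(x) - f^*.$$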


• Theorem 4: Let Assumption 1 hold. Further, assume that f is strongly
  convex, the gradients of f are Lipschitz, and α < min(2, m)/M. Then,
$$\|x_k - x^*\| \le c\, q^k, \qquad 0 < q < 1.$$
  Note: Geometric rate of convergence. (Normal people call it exponential ;))

• Proof: Strong convexity ⇒ a unique minimizer. The basic iterate relation


with y = x* gives
$$\begin{aligned}
\|x_{k+1} - x^*\|^2 &\le \|x_k - x^*\|^2 - 2\alpha (f(x_k) - f^*) + \alpha^2 \|\nabla f(x_k)\|^2 \\
&\le \|x_k - x^*\|^2 - 2\alpha\,\tfrac{m}{2}\|x_k - x^*\|^2 + \alpha^2 \|\nabla f(x_k)\|^2 \\
&= \|x_k - x^*\|^2 - 2\alpha\,\tfrac{m}{2}\|x_k - x^*\|^2 + \alpha^2 \|\nabla f(x_k) - \nabla f(x^*)\|^2 \\
&\le \|x_k - x^*\|^2 - \alpha m \|x_k - x^*\|^2 + \alpha^2 M \|x_k - x^*\|^2 \\
&= \left(1 - m\alpha + \alpha^2 M\right) \|x_k - x^*\|^2 \\
&\le \left(1 - m\alpha + \alpha^2 M\right)^{k+1} \|x_0 - x^*\|^2.
\end{aligned}$$

• Remark: The result holds when 0 < α < 2/m.

• Geometric is not bad. But, how many functions are strongly convex?
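As a quick illustration of the geometric rate (added here, not from the slides): gradient descent on a strongly convex quadratic shrinks the distance to the minimizer by a roughly constant factor per step. The matrix, stepsize, and starting point below are assumptions for the demo.

import numpy as np

# Assumed example: f(x) = 0.5 * x^T A x with A positive definite, so x* = 0.
A = np.array([[3.0, 0.0], [0.0, 1.0]])   # m = 1 (smallest eigenvalue), M = 3 (largest)
x = np.array([5.0, -4.0])
alpha = 0.25                              # assumed stepsize, within the allowed range here

errors = []
for k in range(30):
    errors.append(np.linalg.norm(x))      # ||x_k - x*||
    x = x - alpha * (A @ x)               # gradient step: grad f(x) = A x

ratios = [errors[k + 1] / errors[k] for k in range(len(errors) - 1)]
print(ratios[:5])   # ratios settle near a constant q < 1, i.e., ||x_k - x*|| <= c * q^k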


Constrained minimization over simple sets

minimize f(x)
subject to x ∈ X.
• Assumption 2
• f is continuously differentiable and convex on an open set containing X
• X is closed and compact
• f ∗ > −∞

• Optimality condition: ∇f(x*)^T(x − x*) ≥ 0 for all x ∈ X

• Simple sets: sets with an easy projection, e.g., balls, hyperplanes, halfspaces (closed-form projections sketched below)
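For concreteness (added here; these are the standard closed-form projections, not reproduced from the slides):
$$P_{\{x:\,\|x - c\| \le r\}}[z] = c + r\,\frac{z - c}{\max\{\|z - c\|,\, r\}}, \qquad
P_{\{x:\, a^T x = b\}}[z] = z - \frac{a^T z - b}{\|a\|^2}\, a, \qquad
P_{\{x:\, a^T x \le b\}}[z] = z - \frac{\max\{a^T z - b,\, 0\}}{\|a\|^2}\, a.$$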

• Assumption 2: The set X is closed and convex.


• Recall projection
• ‖P_X[x] − x‖ ≤ ‖x − y‖ for all y ∈ X
• ‖P_X[x] − P_X[y]‖ ≤ ‖x − y‖ for all x, y ∈ ℝⁿ

• Gradient projection algorithm

$$x_{k+1} = P_X[x_k - \alpha_k \nabla f(x_k)]$$
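A minimal sketch (added for illustration) of the gradient projection method with projection onto a Euclidean ball; the feasible set, objective, and stepsize below are assumptions for the demo.

import numpy as np

def project_onto_ball(z, center, radius):
    # Euclidean projection onto the ball {x : ||x - center|| <= radius}.
    d = z - center
    dist = np.linalg.norm(d)
    return z if dist <= radius else center + radius * d / dist

def projected_gradient(grad, project, x0, alpha, num_iters):
    # x_{k+1} = P_X[x_k - alpha * grad(x_k)] with a constant stepsize.
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iters):
        x = project(x - alpha * grad(x))
    return x

# Assumed example: minimize f(x) = ||x - b||^2 over the unit ball centered at 0.
b = np.array([2.0, 2.0])
x_final = projected_gradient(
    grad=lambda x: 2.0 * (x - b),
    project=lambda z: project_onto_ball(z, center=np.zeros(2), radius=1.0),
    x0=np.zeros(2), alpha=0.1, num_iters=100,
)
print(x_final)  # close to b / ||b||, i.e., about [0.707, 0.707]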

• Can we obtain results similar to Theorems 1-4?

• Lemma 1 and Theorem 1 Let Assumptions 1 and 2 hold, and suppose
  that the gradients are bounded. Then, for any y ∈ X and all k,
$$\|x_{k+1} - y\|^2 \le \|x_k - y\|^2 - 2\alpha_k (f(x_k) - f(y)) + \alpha_k^2 \|\nabla f(x_k)\|^2,$$


and hence, for α_k = α,
$$\liminf_{k \to \infty} f(x_k) \le f^* + \frac{\alpha L^2}{2}.$$

• Proof: By the definition of the method, it follows that for any k,
$$\begin{aligned}
\|x_{k+1} - y\|^2 &= \|P_X[x_k - \alpha_k \nabla f(x_k)] - y\|^2 \\
&\le \|x_k - \alpha_k \nabla f(x_k) - y\|^2 \\
&\le \|x_k - y\|^2 - 2\alpha_k \nabla f(x_k)^T (x_k - y) + \alpha_k^2 \|\nabla f(x_k)\|^2.
\end{aligned}$$

The rest of the argument is identical.

• Theorem 2: Not very interesting. After all, ∇f(x*) need not be 0.
  What we could show is something like lim_{k→∞} ∇f(x_k)^T(x − x_k) ≥ 0
  for all x ∈ X.


• Theorem 3: Let Assumption 2 hold. Further, assume that the gradients
  of f are Lipschitz and that α < 2/M. If X* is nonempty, then
$$\lim_{k \to \infty} x_k = x^*, \qquad x^* \in X^*.$$

• Theorem 4: Let Assumption 2 hold. Further, assume that f is strongly
  convex, the gradients of f are Lipschitz, and α < min(2, m)/M. Then,
$$\|x_k - x^*\| \le \bar{c}\, \bar{q}^{\,k}, \qquad 0 < \bar{q} < 1.$$


Proofs

Pack it. I am going home! Prof. Nedić will do it. I think.

