Lecture 9


Quasi-Newton methods

• Idea: instead of using the true Hessian $\nabla^2 f(x_k)$ in the Newton direction, use a symmetric positive definite approximation $B_k$:

$$f(x_k + p) \approx f(x_k) + \nabla f(x_k)^T p + \frac{1}{2}\, p^T B_k p$$

• $B_k$ is constructed at each step from $B_{k-1}$ and the gradient information $\nabla f(x_k)$ and $\nabla f(x_{k-1})$.
• It is common to develop the update formula directly for the inverse, $B_k^{-1} = A_k$, with a correction term $\Delta A_k$:

$$A_{k+1} = A_k + \Delta A_k$$

• Direction: $p_k = -A_k \nabla f(x_k)$
• Eventually $B_k$ converges to $\nabla^2 f(x_k)$.
• Quasi-Newton methods give superlinear convergence.

CL 603 Optimization 29
Quasi-Newton methods
• DFP (Davidon–Fletcher–Powell) method 1,2

$$A_{k+1} = A_k + \frac{\Delta x_k \Delta x_k^T}{\Delta x_k^T \Delta g_k} - \frac{A_k \Delta g_k \Delta g_k^T A_k}{\Delta g_k^T A_k \Delta g_k}$$

• BFGS (Broyden–Fletcher–Goldfarb–Shanno) method 3-6

$$A_{k+1} = \frac{\Delta x_k \Delta x_k^T}{\Delta x_k^T \Delta g_k} + \left(I - \frac{\Delta x_k \Delta g_k^T}{\Delta x_k^T \Delta g_k}\right) A_k \left(I - \frac{\Delta g_k \Delta x_k^T}{\Delta x_k^T \Delta g_k}\right)$$

with $A_0 = I$, $\Delta x_k = x_{k+1} - x_k$, $\Delta g_k = \nabla f(x_{k+1}) - \nabla f(x_k)$
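The two updates can be sketched directly in NumPy (a minimal illustration, not from the lecture; the quadratic $Q$ and step $\Delta x$ are made-up numbers). Both updates satisfy the secant condition $A_{k+1}\,\Delta g_k = \Delta x_k$ by construction, which the demo checks:

```python
import numpy as np

def dfp_update(A, dx, dg):
    # DFP: A_{k+1} = A_k + (Δx Δx^T)/(Δx^T Δg) - (A Δg Δg^T A)/(Δg^T A Δg)
    # (A is symmetric, so (A Δg)(A Δg)^T = A Δg Δg^T A)
    return (A + np.outer(dx, dx) / (dx @ dg)
              - np.outer(A @ dg, A @ dg) / (dg @ A @ dg))

def bfgs_update(A, dx, dg):
    # BFGS: A_{k+1} = ρ Δx Δx^T + (I - ρ Δx Δg^T) A (I - ρ Δg Δx^T), ρ = 1/(Δx^T Δg)
    rho = 1.0 / (dx @ dg)
    I = np.eye(len(dx))
    return (rho * np.outer(dx, dx)
            + (I - rho * np.outer(dx, dg)) @ A @ (I - rho * np.outer(dg, dx)))

# made-up quadratic check: for f = 0.5 x^T Q x, Δg = Q Δx exactly
Q = np.array([[3.0, 1.0], [1.0, 2.0]])   # SPD, so Δx^T Δg > 0
dx = np.array([1.0, -0.5])
dg = Q @ dx
A0 = np.eye(2)                           # A_0 = I
for update in (dfp_update, bfgs_update):
    A1 = update(A0, dx, dg)
    print(np.allclose(A1 @ dg, dx))      # secant condition A_{k+1} Δg_k = Δx_k -> True
```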


1. Davidon WC, AEC Res. Develop. Rep. ANL-5990, 1959
2. Fletcher R and Powell MJD, Computer J., 163, 1963
3. Broyden CG, J. Inst. Math. Appl., 76–90, 1970
4. Fletcher R, Computer J., 317–322, 1970
5. Goldfarb D, Math. Prog., 94–110, 1977
6. Shanno DF, Math. Comput., 647–657, 1970
Quasi-Newton methods
• Both updating formulas preserve symmetry and positive definiteness.
• They satisfy the descent property.
• They require storage for an $N \times N$ matrix $A_k$.
• The methods are robust and work on a wide variety of practical problems.
• The DFP method suffers from the practical difficulty of $A_{k+1}$ becoming ill-conditioned; the work-around is to restart with $A_k = I$.
• BFGS requires fewer restarts and is less dependent on exact line searches.
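Putting the pieces together, a minimal quasi-Newton loop can be sketched as follows (an illustration under assumptions: a hand-picked strictly convex quadratic test problem, for which the exact line search has a closed form — not the lecture's motivating example):

```python
import numpy as np

def bfgs_inverse_update(A, dx, dg):
    # BFGS update of the inverse-Hessian approximation A_k (formula from the slides)
    rho = 1.0 / (dx @ dg)
    I = np.eye(len(dx))
    return (rho * np.outer(dx, dx)
            + (I - rho * np.outer(dx, dg)) @ A @ (I - rho * np.outer(dg, dx)))

# made-up quadratic: f(x) = 0.5 x^T Q x - b^T x, minimizer x* = Q^{-1} b = [1, 2]
Q = np.array([[2.0, 0.0], [0.0, 20.0]])
b = np.array([2.0, 40.0])
grad = lambda x: Q @ x - b

x = np.array([4.0, 4.0])            # initial guess
A = np.eye(2)                       # A_0 = I
g = grad(x)
for _ in range(20):
    if np.linalg.norm(g) < 1e-10:
        break
    p = -A @ g                      # direction p_k = -A_k ∇f(x_k)
    alpha = -(g @ p) / (p @ Q @ p)  # exact line search (closed form for a quadratic)
    x_new = x + alpha * p
    g_new = grad(x_new)
    A = bfgs_inverse_update(A, x_new - x, g_new - g)
    x, g = x_new, g_new

print(x)                            # converges to [1, 2] within a few iterations
```

On a quadratic with exact line searches, BFGS terminates in at most $N$ steps, consistent with $A_k$ converging to the true inverse Hessian.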

Application to motivating example
DFP method
1. Initial guess $x_0 = [4; 4]$
2. $s_1 = -\nabla f(x_0) = [0.65; 0.39]$
3. $\alpha_1 = 13.16$
4. $x_1 = [12.58; 9.14]$
5. $A_1 = [10.12\ \ 5.31;\ 5.31\ \ 4.04]$
6. $s_2 = [0.09; -0.10]$
7. $\alpha_2 = 7.00$
8. $x_2 = [13.20; 8.43]$
9. $A_2 = [9.11\ \ {-2.14};\ {-2.14}\ \ 4.37]$
10. $s_3 = [0.27; 0.05]$
11. $\alpha_3 = 3.35$
12. $x_3 = [14.11; 8.58]$

After 3 iterations: $x^\star = [14.11; 8.58]$, $f^\star = -7.36$
Analytical solution: $x^\star = [14.14; 8.58]$, $f^\star = -7.36$

(Figure: contour plot of the objective over $0 \le x_1 \le 20$ with the DFP iterates $x_0, \dots, x_3$ overlaid.)

Application to motivating example
BFGS method
1. Initial guess $x_0 = [4; 4]$
2. $s_1 = -\nabla f(x_0) = [0.65; 0.39]$
3. $\alpha_1 = 13.16$
4. $x_1 = [12.58; 9.14]$
5. $A_1 = [10.13\ \ 5.29;\ 5.29\ \ 4.06]$
6. $s_2 = [0.09; -0.105]$
7. $\alpha_2 = 6.79$
8. $x_2 = [13.20; 8.43]$
9. $A_2 = [18.62\ \ 0.07;\ 0.07\ \ 4.89]$
10. $s_3 = [0.68; 0.15]$
11. $\alpha_3 = 1.33$
12. $x_3 = [14.11; 8.62]$

After 3 iterations: $x^\star = [14.11; 8.62]$, $f^\star = -7.36$
Analytical solution: $x^\star = [14.14; 8.58]$, $f^\star = -7.36$

(Figure: contour plot of the objective over $0 \le x_1 \le 20$ with the BFGS iterates $x_0, \dots, x_3$ overlaid.)

Selection of step length $\alpha_k$
• Exact line search: minimize $f(x_k + \alpha_k p_k)$ using univariate optimization
  • Can be computationally expensive
• Inexact line search: select $\alpha_k$ such that it achieves a ‘reasonable’ reduction in $f$ along the search direction $p_k$
• Trade-off: substantial decrease in $f$ versus the computational effort involved

Need conditions on $\alpha_k$ to select an appropriate step length

Conditions on step length
• Simple condition: provide reduction, $f(x_k + \alpha_k p_k) < f(x_k)$
• Armijo condition (AC): if $p_k$ is a descent direction, $\alpha_k$ should give a sufficient decrease in $f$, i.e.

$$f(x_k + \alpha_k p_k) \le f(x_k) + c_1 \alpha_k \nabla f(x_k)^T p_k$$

for some $c_1 \in (0, 1)$

• The reduction in $f$ should be proportional to both the step length and the directional derivative $\nabla f(x_k)^T p_k$
• In practice, $c_1$ is taken as a small value (say $10^{-4}$)

• The AC alone is not sufficient to ensure reasonable progress, as it is satisfied for all sufficiently small values of $\alpha_k$

Armijo condition – graphical interpretation
• $f(x) = (x_1 - 2)^4 + (x_1 - 2)^2 x_2^2 + (x_2 + 1)^2$, $x_k = [1; 1]$, $c_1 = 0.2$
• $p_k = -\nabla f(x_k) = [6; -6]$

$\alpha \le 0.32$

Any small step $\alpha \to 0$ satisfies this condition
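A quick numerical check of the Armijo bound (a sketch; the objective below is the reconstructed form of this slide's example) confirms that the condition holds exactly for $\alpha$ up to about 0.32:

```python
import numpy as np

# reconstructed example objective and its gradient (assumed form)
f = lambda x: (x[0] - 2)**4 + (x[0] - 2)**2 * x[1]**2 + (x[1] + 1)**2
grad = lambda x: np.array([4*(x[0] - 2)**3 + 2*(x[0] - 2)*x[1]**2,
                           2*(x[0] - 2)**2 * x[1] + 2*(x[1] + 1)])

xk = np.array([1.0, 1.0])
c1 = 0.2
p = -grad(xk)                        # steepest-descent direction [6, -6]
slope = grad(xk) @ p                 # directional derivative, here -72

# scan step lengths and keep those satisfying the Armijo condition
alphas = np.linspace(1e-4, 1.0, 10001)
ok = [a for a in alphas if f(xk + a*p) <= f(xk) + c1*a*slope]
print(round(max(ok), 2))             # 0.32, matching the slide
```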


Curvature condition

$$\nabla f(x_k + \alpha_k p_k)^T p_k \ge c_2\, \nabla f(x_k)^T p_k, \quad c_2 \in (c_1, 1)$$

• Rules out very small steps.
• $c_2 = 0.9$ for Newton-type methods, $0.1$ for conjugate gradient

$c_1 = 0.2$, $c_2 = 0.4$: curvature condition gives $\alpha \ge 0.07$; combined with the Armijo condition, $0.07 \le \alpha \le 0.32$
Exact line search: $\alpha^\star = 0.25$

(Figure: $f(x_k + \alpha p_k)$ versus $\alpha$ with the line of slope $c_2 \nabla f(x_k)^T p_k$ marking the curvature bound.)
Wolfe conditions
• Armijo condition and curvature condition:

$$f(x_k + \alpha_k p_k) \le f(x_k) + c_1 \alpha_k \nabla f(x_k)^T p_k$$
$$\nabla f(x_k + \alpha_k p_k)^T p_k \ge c_2\, \nabla f(x_k)^T p_k$$

with $0 < c_1 < c_2 < 1$

• Strong Wolfe conditions: to bring $\alpha_k$ closer to a local minimizer,

$$f(x_k + \alpha_k p_k) \le f(x_k) + c_1 \alpha_k \nabla f(x_k)^T p_k$$
$$\left|\nabla f(x_k + \alpha_k p_k)^T p_k\right| \le c_2 \left|\nabla f(x_k)^T p_k\right|$$

with $0 < c_1 < c_2 < 1$

A small $c_1$ and a large $c_2$ lead to a large bound on $\alpha$
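The two Wolfe tests are straightforward to code. The sketch below (using the reconstructed example objective from the earlier slides, an assumed form) checks trial steps against both conditions and reproduces the interval behaviour: too-small steps fail curvature, too-large steps fail Armijo:

```python
import numpy as np

def wolfe_conditions(f, grad, xk, p, alpha, c1, c2):
    # returns (armijo_ok, curvature_ok) for a trial step length alpha
    slope = grad(xk) @ p
    armijo = f(xk + alpha*p) <= f(xk) + c1*alpha*slope
    curvature = grad(xk + alpha*p) @ p >= c2*slope
    return bool(armijo), bool(curvature)

# reconstructed example objective and gradient (assumed form)
f = lambda x: (x[0] - 2)**4 + (x[0] - 2)**2 * x[1]**2 + (x[1] + 1)**2
grad = lambda x: np.array([4*(x[0] - 2)**3 + 2*(x[0] - 2)*x[1]**2,
                           2*(x[0] - 2)**2 * x[1] + 2*(x[1] + 1)])
xk = np.array([1.0, 1.0])
p = -grad(xk)

print(wolfe_conditions(f, grad, xk, p, 0.25, 0.2, 0.4))  # (True, True): inside the interval
print(wolfe_conditions(f, grad, xk, p, 0.02, 0.2, 0.4))  # (True, False): too small a step
print(wolfe_conditions(f, grad, xk, p, 0.50, 0.2, 0.4))  # (False, True): too large a step
```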

Strong Wolfe conditions

$0.07 \le \alpha \le 0.31$

Eliminated the region $0.31 \le \alpha \le 0.32$, where $\nabla f(x_k + \alpha_k p_k)^T p_k$ is large and positive

(Figure: the strong Wolfe bound $c_2 \left|\nabla f(x_k)^T p_k\right|$ on the slope, giving the feasible interval $0.07 \le \alpha \le 0.31$.)
Why $c_1 < c_2$
$c_1 = 0.5$, $c_2 = 0.2$:

Armijo condition: $\alpha \le 0.12$; curvature condition: $\alpha \ge 0.14$

No feasible $\alpha$

Backtracking algorithm
• Based on the Armijo condition
1. Choose $\bar{\alpha} > 0$ and $\rho, c_1 \in (0, 1)$
2. Set $\alpha = \bar{\alpha}$
3. Is $f(x_k + \alpha p_k) \le f(x_k) + c_1 \alpha \nabla f(x_k)^T p_k$ satisfied?
   a. Yes: set $\alpha_k = \alpha$ and stop
   b. No: set $\alpha = \rho \alpha$ and go back to step 3
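The steps above can be sketched as a short function (the demo objective is the reconstructed example from the line-search slides, an assumed form):

```python
import numpy as np

def backtrack(f, grad, xk, p, alpha_bar=1.0, rho=0.5, c1=1e-4, max_iter=50):
    # Backtracking line search: shrink alpha until the Armijo condition holds
    slope = grad(xk) @ p                 # directional derivative (negative for descent p)
    alpha = alpha_bar
    for _ in range(max_iter):
        if f(xk + alpha*p) <= f(xk) + c1*alpha*slope:
            return alpha                 # step 3a: accept
        alpha *= rho                     # step 3b: alpha = rho * alpha, retry
    return alpha

# demo on the reconstructed line-search example (assumed form)
f = lambda x: (x[0] - 2)**4 + (x[0] - 2)**2 * x[1]**2 + (x[1] + 1)**2
grad = lambda x: np.array([4*(x[0] - 2)**3 + 2*(x[0] - 2)*x[1]**2,
                           2*(x[0] - 2)**2 * x[1] + 2*(x[1] + 1)])
xk = np.array([1.0, 1.0])
alpha = backtrack(f, grad, xk, -grad(xk))
print(alpha)   # 0.25: accepted after two halvings of alpha_bar = 1
```

Note that with $\rho = 0.5$ the accepted step happens to coincide with the exact line-search minimizer $\alpha^\star = 0.25$ for this example; in general backtracking only guarantees the Armijo condition.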

How to select $\bar{\alpha}$
• For Newton-type methods, $\bar{\alpha} = 1$
• Choice 1: assume that the first-order change in the function across two iterates is the same:

$$\bar{\alpha}\, \nabla f(x_k)^T p_k = \alpha_{k-1} \nabla f(x_{k-1})^T p_{k-1}$$

$$\bar{\alpha} = \frac{\alpha_{k-1} \nabla f(x_{k-1})^T p_{k-1}}{\nabla f(x_k)^T p_k}$$

• Choice 2: fit a quadratic using $f(x_{k-1})$, $f(x_k)$ and $\nabla f(x_k)^T p_k$, and define $\bar{\alpha}$ as the minimizer of this quadratic:

$$\bar{\alpha} = \frac{2\,\big(f(x_k) - f(x_{k-1})\big)}{\nabla f(x_k)^T p_k}$$
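Both choices are one-line formulas; a minimal sketch with hypothetical values (the numbers below are made up for illustration):

```python
def alpha_init_slope(alpha_prev, slope_prev, slope_k):
    # Choice 1: assume the same first-order change across iterates
    return alpha_prev * slope_prev / slope_k

def alpha_init_quadratic(f_k, f_prev, slope_k):
    # Choice 2: minimizer of the quadratic fitted to f(x_{k-1}), f(x_k), ∇f(x_k)^T p_k
    return 2.0 * (f_k - f_prev) / slope_k

# hypothetical numbers: previous step 0.5 along slope -10, current slope -8
print(alpha_init_slope(0.5, -10.0, -8.0))    # 0.625
print(alpha_init_quadratic(4.0, 6.0, -8.0))  # 0.5
```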

Newton with inexact line search
Newton's method with inexact line search
1. Initial guess $x_0 = [4; 4]$
2. $\alpha_0 = 0.25$
3. $x_1 = [9.64; 9.09]$
4. $\alpha_1 = 1.00$
5. $x_2 = [12.39; 8.71]$
6. $\alpha_2 = 1.00$
7. $x_3 = [13.83; 8.61]$
8. $\alpha_3 = 1.00$
9. $x_4 = [14.13; 8.58]$

After 4 iterations: $x^\star = [14.13; 8.58]$, $f^\star = -7.36$
Analytical solution: $x^\star = [14.14; 8.58]$, $f^\star = -7.36$

