Lecture 9


Quasi-Newton methods

• Idea: instead of using the true Hessian $\nabla^2 f(x_k)$ in the Newton direction, use a symmetric positive definite approximation $B_k$:

$$f(x_k + p) \approx f(x_k) + \nabla f(x_k)^T p + \frac{1}{2}\, p^T B_k p$$

• $B_k$ is constructed at each step from $B_{k-1}$ and the gradient information $\nabla f(x_k)$ and $\nabla f(x_{k-1})$.
• It is common to develop the update formula directly for the inverse, $B_k^{-1} = A_k$, with a correction term $\Delta A_k$:

$$A_{k+1} = A_k + \Delta A_k$$

• Direction: $p_k = -A_k \nabla f(x_k)$
• Eventually $B_k$ converges to $\nabla^2 f(x_k)$.
• Quasi-Newton methods give superlinear convergence.

CL 603 Optimization 29
Quasi-Newton methods
• DFP (Davidon–Fletcher–Powell) method 1,2

$$A_{k+1} = A_k + \frac{\Delta x_k \Delta x_k^T}{\Delta x_k^T \Delta g_k} - \frac{A_k \Delta g_k \Delta g_k^T A_k}{\Delta g_k^T A_k \Delta g_k}$$

• BFGS (Broyden–Fletcher–Goldfarb–Shanno) method 3-6

$$A_{k+1} = \frac{\Delta x_k \Delta x_k^T}{\Delta x_k^T \Delta g_k} + \left(I - \frac{\Delta x_k \Delta g_k^T}{\Delta x_k^T \Delta g_k}\right) A_k \left(I - \frac{\Delta g_k \Delta x_k^T}{\Delta x_k^T \Delta g_k}\right)$$

with $A_0 = I$, $\Delta x_k = x_{k+1} - x_k$, $\Delta g_k = \nabla f(x_{k+1}) - \nabla f(x_k)$
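The two updates can be sketched directly in NumPy (a minimal illustration, not from the lecture; the quadratic $Q$ and step $\Delta x$ are made-up numbers). Both updates satisfy the secant condition $A_{k+1}\,\Delta g_k = \Delta x_k$ by construction, which the demo checks:

```python
import numpy as np

def dfp_update(A, dx, dg):
    # DFP: A_{k+1} = A_k + (Δx Δx^T)/(Δx^T Δg) - (A Δg Δg^T A)/(Δg^T A Δg)
    # (A is symmetric, so (A Δg)(A Δg)^T = A Δg Δg^T A)
    return (A + np.outer(dx, dx) / (dx @ dg)
              - np.outer(A @ dg, A @ dg) / (dg @ A @ dg))

def bfgs_update(A, dx, dg):
    # BFGS: A_{k+1} = ρ Δx Δx^T + (I - ρ Δx Δg^T) A (I - ρ Δg Δx^T), ρ = 1/(Δx^T Δg)
    rho = 1.0 / (dx @ dg)
    I = np.eye(len(dx))
    return (rho * np.outer(dx, dx)
            + (I - rho * np.outer(dx, dg)) @ A @ (I - rho * np.outer(dg, dx)))

# made-up quadratic check: for f = 0.5 x^T Q x, Δg = Q Δx exactly
Q = np.array([[3.0, 1.0], [1.0, 2.0]])   # SPD, so Δx^T Δg > 0
dx = np.array([1.0, -0.5])
dg = Q @ dx
A0 = np.eye(2)                           # A_0 = I
for update in (dfp_update, bfgs_update):
    A1 = update(A0, dx, dg)
    print(np.allclose(A1 @ dg, dx))      # secant condition A_{k+1} Δg_k = Δx_k -> True
```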


1. Davidon WC, AEC Res. Develop. Rep. ANL-5990, 1959
2. Fletcher R and Powell MJD, Computer J., 163, 1963
3. Broyden CG, J. Inst. Math. Appl., 76–90, 1970
4. Fletcher R, Computer J., 317–322, 1970
5. Goldfarb D, Math. Prog., 94–110, 1977
6. Shanno DF, Math. Comput., 647–657, 1970
Quasi-Newton methods
• Both updating formulas preserve symmetry and positive definiteness.
• They satisfy the descent property.
• They require storage for an $N \times N$ matrix $A_k$.
• The methods are robust and work on a wide variety of practical problems.
• The DFP method suffers from the practical difficulty of $A_{k+1}$ becoming ill-conditioned; the work-around is to restart with $A_k = I$.
• BFGS requires fewer restarts and is less dependent on exact line searches.
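Putting the pieces together, a minimal quasi-Newton loop can be sketched as follows (an illustration under assumptions: a hand-picked strictly convex quadratic test problem, for which the exact line search has a closed form — not the lecture's motivating example):

```python
import numpy as np

def bfgs_inverse_update(A, dx, dg):
    # BFGS update of the inverse-Hessian approximation A_k (formula from the slides)
    rho = 1.0 / (dx @ dg)
    I = np.eye(len(dx))
    return (rho * np.outer(dx, dx)
            + (I - rho * np.outer(dx, dg)) @ A @ (I - rho * np.outer(dg, dx)))

# made-up quadratic: f(x) = 0.5 x^T Q x - b^T x, minimizer x* = Q^{-1} b = [1, 2]
Q = np.array([[2.0, 0.0], [0.0, 20.0]])
b = np.array([2.0, 40.0])
grad = lambda x: Q @ x - b

x = np.array([4.0, 4.0])            # initial guess
A = np.eye(2)                       # A_0 = I
g = grad(x)
for _ in range(20):
    if np.linalg.norm(g) < 1e-10:
        break
    p = -A @ g                      # direction p_k = -A_k ∇f(x_k)
    alpha = -(g @ p) / (p @ Q @ p)  # exact line search (closed form for a quadratic)
    x_new = x + alpha * p
    g_new = grad(x_new)
    A = bfgs_inverse_update(A, x_new - x, g_new - g)
    x, g = x_new, g_new

print(x)                            # converges to [1, 2] within a few iterations
```

On a quadratic with exact line searches, BFGS terminates in at most $N$ steps, consistent with $A_k$ converging to the true inverse Hessian.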

Application to motivating example
DFP method
1. Initial guess $x_0 = [4; 4]$
2. $s_1 = -\nabla f(x_0) = [0.65; 0.39]$
3. $\alpha_1 = 13.16$
4. $x_1 = [12.58; 9.14]$
5. $A_1 = [10.12\ \ 5.31;\ 5.31\ \ 4.04]$
6. $s_2 = [0.09; -0.10]$
7. $\alpha_2 = 7.00$
8. $x_2 = [13.20; 8.43]$
9. $A_2 = [9.11\ \ {-2.14};\ {-2.14}\ \ 4.37]$
10. $s_3 = [0.27; 0.05]$
11. $\alpha_3 = 3.35$
12. $x_3 = [14.11; 8.58]$

After 3 iterations: $x^\star = [14.11; 8.58]$, $f^\star = -7.36$
Analytical solution: $x^\star = [14.14; 8.58]$, $f^\star = -7.36$

(Figure: contour plot of the objective over $0 \le x_1 \le 20$ with the DFP iterates $x_0, \dots, x_3$ overlaid.)

Application to motivating example
BFGS method
1. Initial guess $x_0 = [4; 4]$
2. $s_1 = -\nabla f(x_0) = [0.65; 0.39]$
3. $\alpha_1 = 13.16$
4. $x_1 = [12.58; 9.14]$
5. $A_1 = [10.13\ \ 5.29;\ 5.29\ \ 4.06]$
6. $s_2 = [0.09; -0.105]$
7. $\alpha_2 = 6.79$
8. $x_2 = [13.20; 8.43]$
9. $A_2 = [18.62\ \ 0.07;\ 0.07\ \ 4.89]$
10. $s_3 = [0.68; 0.15]$
11. $\alpha_3 = 1.33$
12. $x_3 = [14.11; 8.62]$

After 3 iterations: $x^\star = [14.11; 8.62]$, $f^\star = -7.36$
Analytical solution: $x^\star = [14.14; 8.58]$, $f^\star = -7.36$

(Figure: contour plot of the objective over $0 \le x_1 \le 20$ with the BFGS iterates $x_0, \dots, x_3$ overlaid.)

Selection of step length $\alpha_k$
• Exact line search: minimize $f(x_k + \alpha_k p_k)$ using univariate optimization
  • Can be computationally expensive
• Inexact line search: select $\alpha_k$ such that it achieves a ‘reasonable’ reduction in $f$ along the search direction $p_k$
• Trade-off: substantial decrease in $f$ versus the computational effort involved

Need conditions on $\alpha_k$ to select an appropriate step length

Conditions on step length
• Simple condition: provide reduction, $f(x_k + \alpha_k p_k) < f(x_k)$
• Armijo condition (AC): if $p_k$ is a descent direction, $\alpha_k$ should give a sufficient decrease in $f$, i.e.

$$f(x_k + \alpha_k p_k) \le f(x_k) + c_1 \alpha_k \nabla f(x_k)^T p_k$$

for some $c_1 \in (0, 1)$

• The reduction in $f$ should be proportional to both the step length and the directional derivative $\nabla f(x_k)^T p_k$
• In practice, $c_1$ is taken as a small value (say $10^{-4}$)

• The AC alone is not sufficient to ensure reasonable progress, as it is satisfied for all sufficiently small values of $\alpha_k$

Armijo condition – graphical interpretation
• $f(x) = (x_1 - 2)^4 + (x_1 - 2)^2 x_2^2 + (x_2 + 1)^2$, $x_k = [1; 1]$, $c_1 = 0.2$
• $p_k = -\nabla f(x_k) = [6; -6]$

$\alpha \le 0.32$

Any small step $\alpha \to 0$ satisfies this condition
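A quick numerical check of the Armijo bound (a sketch; the objective below is the reconstructed form of this slide's example) confirms that the condition holds exactly for $\alpha$ up to about 0.32:

```python
import numpy as np

# reconstructed example objective and its gradient (assumed form)
f = lambda x: (x[0] - 2)**4 + (x[0] - 2)**2 * x[1]**2 + (x[1] + 1)**2
grad = lambda x: np.array([4*(x[0] - 2)**3 + 2*(x[0] - 2)*x[1]**2,
                           2*(x[0] - 2)**2 * x[1] + 2*(x[1] + 1)])

xk = np.array([1.0, 1.0])
c1 = 0.2
p = -grad(xk)                        # steepest-descent direction [6, -6]
slope = grad(xk) @ p                 # directional derivative, here -72

# scan step lengths and keep those satisfying the Armijo condition
alphas = np.linspace(1e-4, 1.0, 10001)
ok = [a for a in alphas if f(xk + a*p) <= f(xk) + c1*a*slope]
print(round(max(ok), 2))             # 0.32, matching the slide
```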


Curvature condition

$$\nabla f(x_k + \alpha_k p_k)^T p_k \ge c_2\, \nabla f(x_k)^T p_k, \quad c_2 \in (c_1, 1)$$

• Rules out very small steps.
• $c_2 = 0.9$ for Newton-type methods, $0.1$ for conjugate gradient

$c_1 = 0.2$, $c_2 = 0.4$: curvature condition gives $\alpha \ge 0.07$; combined with the Armijo condition, $0.07 \le \alpha \le 0.32$
Exact line search: $\alpha^\star = 0.25$

(Figure: $f(x_k + \alpha p_k)$ versus $\alpha$ with the line of slope $c_2 \nabla f(x_k)^T p_k$ marking the curvature bound.)
Wolfe conditions
• Armijo condition and curvature condition:

$$f(x_k + \alpha_k p_k) \le f(x_k) + c_1 \alpha_k \nabla f(x_k)^T p_k$$
$$\nabla f(x_k + \alpha_k p_k)^T p_k \ge c_2\, \nabla f(x_k)^T p_k$$

with $0 < c_1 < c_2 < 1$

• Strong Wolfe conditions: to bring $\alpha_k$ closer to a local minimizer,

$$f(x_k + \alpha_k p_k) \le f(x_k) + c_1 \alpha_k \nabla f(x_k)^T p_k$$
$$\left|\nabla f(x_k + \alpha_k p_k)^T p_k\right| \le c_2 \left|\nabla f(x_k)^T p_k\right|$$

with $0 < c_1 < c_2 < 1$

A small $c_1$ and a large $c_2$ lead to a large bound on $\alpha$
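The two Wolfe tests are straightforward to code. The sketch below (using the reconstructed example objective from the earlier slides, an assumed form) checks trial steps against both conditions and reproduces the interval behaviour: too-small steps fail curvature, too-large steps fail Armijo:

```python
import numpy as np

def wolfe_conditions(f, grad, xk, p, alpha, c1, c2):
    # returns (armijo_ok, curvature_ok) for a trial step length alpha
    slope = grad(xk) @ p
    armijo = f(xk + alpha*p) <= f(xk) + c1*alpha*slope
    curvature = grad(xk + alpha*p) @ p >= c2*slope
    return bool(armijo), bool(curvature)

# reconstructed example objective and gradient (assumed form)
f = lambda x: (x[0] - 2)**4 + (x[0] - 2)**2 * x[1]**2 + (x[1] + 1)**2
grad = lambda x: np.array([4*(x[0] - 2)**3 + 2*(x[0] - 2)*x[1]**2,
                           2*(x[0] - 2)**2 * x[1] + 2*(x[1] + 1)])
xk = np.array([1.0, 1.0])
p = -grad(xk)

print(wolfe_conditions(f, grad, xk, p, 0.25, 0.2, 0.4))  # (True, True): inside the interval
print(wolfe_conditions(f, grad, xk, p, 0.02, 0.2, 0.4))  # (True, False): too small a step
print(wolfe_conditions(f, grad, xk, p, 0.50, 0.2, 0.4))  # (False, True): too large a step
```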

Strong Wolfe conditions

$0.07 \le \alpha \le 0.31$

Eliminated the region $0.31 \le \alpha \le 0.32$, where $\nabla f(x_k + \alpha_k p_k)^T p_k$ is large and positive

(Figure: the strong Wolfe bound $c_2 \left|\nabla f(x_k)^T p_k\right|$ on the slope, giving the feasible interval $0.07 \le \alpha \le 0.31$.)
Why $c_1 < c_2$
$c_1 = 0.5$, $c_2 = 0.2$:

Armijo condition: $\alpha \le 0.12$; curvature condition: $\alpha \ge 0.14$

No feasible $\alpha$

Backtracking algorithm
• Based on the Armijo condition
1. Choose $\bar{\alpha} > 0$ and $\rho, c_1 \in (0, 1)$
2. Set $\alpha = \bar{\alpha}$
3. Is $f(x_k + \alpha p_k) \le f(x_k) + c_1 \alpha \nabla f(x_k)^T p_k$ satisfied?
   a. Yes: set $\alpha_k = \alpha$ and stop
   b. No: set $\alpha = \rho \alpha$ and go back to step 3
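The steps above can be sketched as a short function (the demo objective is the reconstructed example from the line-search slides, an assumed form):

```python
import numpy as np

def backtrack(f, grad, xk, p, alpha_bar=1.0, rho=0.5, c1=1e-4, max_iter=50):
    # Backtracking line search: shrink alpha until the Armijo condition holds
    slope = grad(xk) @ p                 # directional derivative (negative for descent p)
    alpha = alpha_bar
    for _ in range(max_iter):
        if f(xk + alpha*p) <= f(xk) + c1*alpha*slope:
            return alpha                 # step 3a: accept
        alpha *= rho                     # step 3b: alpha = rho * alpha, retry
    return alpha

# demo on the reconstructed line-search example (assumed form)
f = lambda x: (x[0] - 2)**4 + (x[0] - 2)**2 * x[1]**2 + (x[1] + 1)**2
grad = lambda x: np.array([4*(x[0] - 2)**3 + 2*(x[0] - 2)*x[1]**2,
                           2*(x[0] - 2)**2 * x[1] + 2*(x[1] + 1)])
xk = np.array([1.0, 1.0])
alpha = backtrack(f, grad, xk, -grad(xk))
print(alpha)   # 0.25: accepted after two halvings of alpha_bar = 1
```

Note that with $\rho = 0.5$ the accepted step happens to coincide with the exact line-search minimizer $\alpha^\star = 0.25$ for this example; in general backtracking only guarantees the Armijo condition.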

How to select $\bar{\alpha}$
• For Newton-type methods, $\bar{\alpha} = 1$
• Choice 1: assume that the first-order change in the function across two iterates is the same:

$$\bar{\alpha}\, \nabla f(x_k)^T p_k = \alpha_{k-1} \nabla f(x_{k-1})^T p_{k-1}$$

$$\bar{\alpha} = \frac{\alpha_{k-1} \nabla f(x_{k-1})^T p_{k-1}}{\nabla f(x_k)^T p_k}$$

• Choice 2: fit a quadratic using $f(x_{k-1})$, $f(x_k)$ and $\nabla f(x_k)^T p_k$, and define $\bar{\alpha}$ as the minimizer of this quadratic:

$$\bar{\alpha} = \frac{2\,\big(f(x_k) - f(x_{k-1})\big)}{\nabla f(x_k)^T p_k}$$
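Both choices are one-line formulas; a minimal sketch with hypothetical values (the numbers below are made up for illustration):

```python
def alpha_init_slope(alpha_prev, slope_prev, slope_k):
    # Choice 1: assume the same first-order change across iterates
    return alpha_prev * slope_prev / slope_k

def alpha_init_quadratic(f_k, f_prev, slope_k):
    # Choice 2: minimizer of the quadratic fitted to f(x_{k-1}), f(x_k), ∇f(x_k)^T p_k
    return 2.0 * (f_k - f_prev) / slope_k

# hypothetical numbers: previous step 0.5 along slope -10, current slope -8
print(alpha_init_slope(0.5, -10.0, -8.0))    # 0.625
print(alpha_init_quadratic(4.0, 6.0, -8.0))  # 0.5
```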

Newton with inexact line search
Newton's method with inexact line search
1. Initial guess $x_0 = [4; 4]$
2. $\alpha_0 = 0.25$
3. $x_1 = [9.64; 9.09]$
4. $\alpha_1 = 1.00$
5. $x_2 = [12.39; 8.71]$
6. $\alpha_2 = 1.00$
7. $x_3 = [13.83; 8.61]$
8. $\alpha_3 = 1.00$
9. $x_4 = [14.13; 8.58]$

After 4 iterations: $x^\star = [14.13; 8.58]$, $f^\star = -7.36$
Analytical solution: $x^\star = [14.14; 8.58]$, $f^\star = -7.36$

