Gcmma
Gcmma
Gcmma
Krister Svanberg
KTH, Royal Institute of Technology
This note describes the algorithms used in the authors latest fortran implementations of
MMA and GCMMA. The first versions of these methods were published in [1] and [2].
The fortran implementations of the authors MMA and GCMMA codes are based on the
assumption that the users optimization problem is written on the following form, where the
optimization variables are x = (x1 , . . . , xn )T , y = (y1 , . . . , ym )T and z.
minimize f0 (x) + z +
m
X
(ci yi + 12 yi2 )
i=1
i = 1, . . . , m
xmin
xj xmax
,
j
j
j = 1, . . . , n
yi 0,
i = 1, . . . , m
(1.1)
z 0,
max , a , c and f max are
where f0 , f1 , . . . , fm are given differentiable functions, while xmin
i
i
j , xj
i
min
max
given real numbers which satisfy xj < xj , ai 0 and ci 0.
xj
xmax
,
j
i = 1, . . . , m
(1.2)
j = 1, . . . , n,
where f0 , f1 , . . . , fm are given differentiable functions. To make problem (1.1) (almost) equivalent to this problem (1.2), first let ai = 0 for all i > 0. Then z = 0 in any optimal solution
to (1.1). Further, for each i, let ci = a large number, so that the variables yi become very
expensive. Then typically y = 0 in any optimal solution to (1.1), and the corresponding x
is an optimal solution to (1.2).
It should be noted that the problem (1.1) always has feasible solutions, and in fact also at
least one optimal solution. This holds even if the users problem (1.2) does not have any
feasible solutions, in which case some yi > 0 in the optimal solution of (1.1).
1
2
p
X
i )2
(hi (x) h
i=1
i = 1, . . . , q
(1.3)
xmin
xj xmax
, j = 1, . . . , n
j
j
i and g max are given constants.
where hi and gi are given differentiable functions, while h
i
Problem (1.3) may equivalently be written on the following form with variables x IRn and
y1 , . . . , y2p IR:
p
X
2
minimize 12
(yi2 + yp+i
)
i=1
i,
subject to yi hi (x) h
i hi (x),
yp+i h
gi (x)
xmin
j
gimax ,
xj
yi 0,
xmax
,
j
i = 1, . . . , p
i = 1, . . . , p
i = 1, . . . , q
j = 1, . . . , n
i = 1, . . . , 2p.
(1.4)
= 2p + q,
f0 (x)
= 0,
fi (x)
= hi (x),
i,
= h
i = 1, . . . , p
= hi (x),
i,
= h
i = 1, . . . , p
fimax
fp+i (x)
max
fp+i
i = 1, . . . , p
i = 1, . . . , p
i = 1, . . . , q
max
f2p+i
i = 1, . . . , q
ai
= 0,
i = 1, . . . , m
ci
= 0,
i = 1, . . . , 2p
c2p+i
= large number, i = 1, . . . , q.
gimax ,
As a third example of how to transform a given problem to the form (1.1), assume that the
user wants to solve a min-max problem on the form
minimize
i=1,..,p
i = 1, . . . , q
(1.5)
xmin
xj xmax
, j = 1, . . . , n
j
j
where hi and gi are given differentiable functions, while gimax are given constants. For each
given x, the value of the objective function in problem (1.5) is the largest of the p real
numbers h1 (x), . . . , hp (x). If there is a known number hmin such that hi (x) hmin for all i
and all feasible x, then problem (1.5) may equivalently be written on the following form with
variables x IRn and z IR :
minimize
z + hmin
gimax ,
xj
xmax
,
j
i = 1, . . . , p
i = 1, . . . , q
j = 1, . . . , n
z 0.
To make problem (1.1) (almost) equivalent to this problem (1.6), let
m
= p + q,
f0 (x)
= hmin , (a constant!)
fi (x)
= hi (x),
fimax
i = 1, . . . , p
hmin ,
i = 1, . . . , p
i = 1, . . . , q
max
fp+i
i = 1, . . . , q
ai
= 1,
i = 1, . . . , p
ap+i
= 0,
i = 1, . . . , q
ci
= large number,
i = 1, . . . , m
gimax ,
(1.6)
MMA is a method for solving problems on the form (1.1), using the following approach: In
each iteration k {1, 2, . . .}, the current iteration point (x(k) , y(k) , z (k) ) is given. Then an
approximating subproblem, in which the original functions fi are replaced by certain convex
(k)
functions fi , is generated. The choice of these approximating functions is based mainly on
(k)
(k)
gradient information at the current iteration point, but also on some parameters uj and lj
(moving asymptotes) which are updated in each iteration based on information from previous iteration points. The subproblem is solved, and the unique optimal solution becomes
the next iteration point (x(k+1) , y(k+1) , z (k+1) ). Then a new subproblem is generated, etc.
(k+1)
(k)
A possible convergence criterium is that the iteration process is stopped when |xj
xj | <
< (xmax
xmin
j
j ) for all j = 1, . . . , n, where is a small number. The default value of this
parameter , called XCHTOL in the fortran code, is 104 . Other convergence criteria are of
course also possible.
The MMA subproblem looks as follows:
(k)
minimize f0 (x) + z + 12 d0 z 2 +
m
X
(ci yi + 12 yi2 )
i=1
(k)
subject to fi (x) ai z yi fimax ,
(k)
i = 1, . . . , m
(k)
xj j ,
j = 1, . . . , n,
yi 0,
(2.1)
i = 1, . . . , m
z 0,
(k)
In this subproblem (2.1), the approximating functions fi (x) are chosen as
(k)
fi (x)
(k)
n
X
(k)
pij
(k)
uj
j=1
+
xj
qij
xj
!
(k)
(k)
lj
+ ri , i = 0, 1, . . . , m,
(2.2)
where
(k)
pij
(k)
qij
(k)
(uj
(k)
xj )2
(k)
(k)
(xj lj )2
!
fi (k) +
fi (k)
105
,
1.001
(x ) + 0.001
(x ) + max
xj
xj
xj xmin
j
!
fi (k) +
fi (k)
105
0.001
(x ) + 1.001
(x ) + max
,
xj
xj
xj xmin
j
!
(k)
(k)
n
X
p
q
ij
ij
(k)
ri = fi (x(k) )
+ (k) (k) .
(k)
(k)
uj xj
xj lj
j=1
(2.3)
fi (k) +
fi (k)
denotes the largest of the two numbers
Here,
(x )
(x ) and 0,
xj
xj
fi (k)
fi (k)
while
(x )
denotes the largest of the two numbers
(x ) and 0.
xj
xj
(2.4)
(2.5)
(k)
It follows from the formulas (2.2)(2.5) that the functions fi are always first order approximations of the original functions fi at the current iteration point, i.e.
(k)
fi
fi (k)
(k)
(x(k) ) =
(x ).
fi (x(k) ) = fi (x(k) ) and
xj
xj
(2.6)
(k)
(k)
Moreover, the approximating functions f0 , . . . , fm are strictly convex. Based on this, it
can be shown that there is always a unique optimal solution to the MMA subproblem.
p0j
(k)
uj
j=1
subject to
q0j
xj
xj
(k)
n
X
(k)
uj
(k)
xj
xj
(k)
xj j ,
yi 0,
+ r0 + z + 12 d0 z 2 +
(k)
lj
qij
!
(k)
(k)
pij
j=1
(k)
(k)
n
X
m
X
(ci yi + 12 yi2 )
i=1
!
(k)
ai z yi bi ,
(k)
lj
i = 1, . . . , m
(2.7)
j = 1, . . . , n,
i = 1, . . . , m
z 0,
(k)
where bi
(k)
= fimax ri
for i = 1, . . . , m.
(k)
and j
(k)
max
= max{ xmin
xmin
j , lj + 0.1(xj lj ), xj 0.5(xj
j ) },
The bounds j
(k)
(k)
(k)
(k)
(k)
(k)
(k)
(k)
(2.8)
(k)
= min{ xmax
, uj 0.1(uj xj ), xj + 0.5(xmax
xmin
j
j
j ) },
(k)
(k)
xj j
xmin
xj xmax
,
j
j
(k)
(k)
(2.9)
(2.10)
(k)
0.9(uj xj ),
(2.11)
(k)
0.5(xmax
xmin
j
j ).
(2.12)
0.9(xj lj ) xj xj
0.5(xmax
xmin
j
j ) xj xj
(k)
(k)
(k)
(k)
The default rules for updating the lower asymptotes lj and the upper asymptotes uj
as follows. The first two iterations, when k = 1 and k = 2,
(k)
= xj 0.5(xmax
xmin
j
j ),
(k)
= xj + 0.5(xmax
xmin
j
j ).
lj
uj
are
(k)
(k)
(2.13)
lj
(k)
uj
where
(k)
(k)
(k)
(k1)
= xj j (xj
=
(k)
xj
(k) (k1)
j (uj
(k1)
lj
),
(k1)
xj
),
(k)
(k1)
(k1)
(k2)
)(xj
xj
) < 0,
0.7 if (xj xj
(k)
(k1)
(k1)
(k2)
=
1.2 if (xj xj
)(xj
xj
) > 0,
(k)
(k1)
(k1)
(k2)
1 if (xj xj
)(xj
xj
) = 0,
(2.14)
(2.15)
xj 0.01(xmax
xmin
j
j ),
(k)
xj 10(xmax
xmin
j
j ),
(k)
xj + 0.01(xmax
xmin
j
j ),
(k)
xj + 10(xmax
xmin
j
j ).
lj
lj
uj
uj
(k)
(k)
(2.16)
(k)
(k)
(k)
(k)
or uj
Note that most of the explicit numbers in the above expressions are just default values of
different parameters in the fortran code. More precisely:
The
The
The
The
The
The
The
The
number
number
number
number
number
number
number
number
105 in (2.3) and (2.4) is the default value of the parameter RAAI.
0.1 in (2.8) and (2.9) is the default value of the parameter ALBEFA.
0.5 in (2.8) and (2.9) is the default value of the parameter XXMOVE.
0.5 in (2.13) is the default value of the parameter GHINIT.
0.7 in (2.15) is the default value of the parameter GHDECR.
1.2 in (2.15) is the default value of the parameter GHINCR.
0.01 in (2.16) is the default value of the parameter ASYMIN.
10.0 in (2.16) is the default value of the parameter ASYMAX.
All these values can be carefully changed by the user. As an example, a more conservative
method is obtain by decreasing XXMOVE and/or ASYMAX and/or GHINIT and/or GHINCR.
The globally convergent version of MMA, from now on called GCMMA, for solving problems
of the form (1.1) consists of outer and inner iterations. The index k is used to denote
the outer iteration number, while the index is used to denote the inner iteration number.
Within each outer iteration, there may be zero, one, or several inner iterations. The double
index (k, ) is used to denote the :th inner iteration within the k:th outer iteration.
The first iteration point is obtained by first chosing x(1) X and then chosing y(1) and z (1)
such that (x(1) , y(1) , z (1) ) becomes a feasible solution of (1.1). This is easy. An outer iteration
of the method, going from the k:th iteration point (x(k) , y(k) , z (k) ) to the (k + 1):th iteration
point (x(k+1) , y(k+1) , z (k+1) ), can be described as follows:
Given (x(k) , y(k) , z (k) ), an approximating subproblem is generated and solved. In this sub(k,0)
problem, the functions fi (x) are replaced by certain convex functions fi
(x). The optimal
(k,0) (k,0)
(k,0)
(k,0)
(k,0)
, z
). If fi
(
x
) fi (
x(k,0) ), for all
i = 0, 1, . . . , m , the next iteration point becomes (x(k+1) , y(k+1) , z (k+1) ) = (
x(k,0) , y
(k,0) , z(k,0) ),
and the outer iteration is completed (without any inner iterations needed). Otherwise, an
inner iteration is made, which means that a new subproblem is generated and solved at
(k,1)
(k,0)
x(k) , with new approximating functions fi
(x) which are more conservative than fi
(x)
for those indices i for which the above inequality was violated. The optimal solution of
(k,1) (k,1)
this new subproblem is denoted (
x(k,1) , y
(k,1) , z(k,1) ). If fi
(
x
) fi (
x(k,1) ), for all i =
0, 1, . . . , m , the next iteration point becomes (x(k+1) , y(k+1) , z (k+1) ) = (
x(k,1) , y
(k,1) , z(k,1) ),
and the outer iteration is completed (with one inner iterations needed). Otherwise, another inner iteration is made, which means that a new subproblem is generated and solved
(k,2)
at x(k) , with new approximating functions fi
(x), etc. These inner iterations are re(k,) (k,)
(k,)
peated until fi
(
x
) fi (
x
) for all i = 0, 1, . . . , m , which always happens after
a finite (usually small) number of inner iterations. Then the next iteration point becomes
(x(k+1) , y(k+1) , z (k+1) ) = (
x(k,) , y
(k,) , z(k,) ), and the outer iteration is completed (with
inner iterations needed).
It should be noted that in each inner iteration, there is no need to recalculate the gradients
fi (x(k) ), since x(k) has not changed. Gradients of the original functions fi are calculated
only once in each outer iteration. This is an important note since the calculation of gradients
is typically the most time consuming part in structural optimization.
The GCMMA subproblem looks as follows, for k {1, 2, 3, . . .} and {0, 1, 2, . . .}:
(k,)
minimize f0 (x) + z + 12 d0 z 2 +
m
X
(ci yi + 12 yi2 )
i=1
(k,)
subject to fi
(x) ai z yi fimax ,
(k)
(k)
xj j ,
i = 1, . . . , m
j = 1, . . . , n,
yi 0,
i = 1, . . . , m
z 0,
(3.1)
(k,)
where the approximating functions fi
(x) are chosen as
(k,) !
(k,)
n
X
q
p
ij
ij
(k,)
(k,)
+
+ ri , i = 0, 1, . . . , m .
fi
(x) =
(k)
(k)
uj xj
xj lj
j=1
(3.2)
Here,
(k,)
pij
(k,)
qij
(k)
fi (k)
1.001
(x )
xj
(k)
= (uj xj )2
+
fi (k)
+ 0.001
(x )
xj
(k,)
xmax
xmin
j
j
, (3.3)
!
(k,)
i
fi (k)
fi (k) +
(x ) + 1.001
(x ) + max
, (3.4)
0.001
xj
xj
xj xmin
j
!
(k,)
(k,)
n
X
q
p
ij
ij
(k,)
(3.5)
+ (k) (k) .
ri
= fi (x(k) )
(k)
(k)
uj xj
xj lj
j=1
(k)
(k)
(xj lj )2
(k)
(k)
(k)
(k)
Between each outer iteration, the bounds j and j and the asymptotes lj and uj are
updated as in the original MMA, the formulas (2.8)(2.16) still hold.
(k,)
The parameters i
in (3.3) and (3.4) are strictly positive and updated as follows:
Within a given outer iteration k, the only differences between two inner iterations are the
values of some of these parameters. In the beginning of each outer iteration, when = 0, the
following values are used:
n
X
fi (k) max
(k,0)
min
min 0.1
i
= max{ ,
(3.6)
xj (x )(xj xj ) } , for i = 0, 1, .., m,
n
j=1
where the default value for the parameter min , called RAAMIN in the fortran code, is 106 .
(k,)
(k,)
(x) =
(k)
(k)
(k)
n
X
(uj lj )(xj xj )2
j=1
(k)
(k)
(3.7)
Now, let
(k,)
(k,)
i
(k)
(k,)
fi (
x(k,) ) fi
(
x(k,) )
.
d(k) (
x(k,) )
(3.8)
(k,)
(k,)
(k,)
Then hi (
x(k,) )+(i +i )d(k) (
x(k,) ) = fi (
x(k,) ), which shows that i +i
might
(k,+1)
be a natural value of i
. In order to get a globally convergent method, this natural value
is modified as follows.
(k,+1)
= min{ 1.1 (i
(k,+1)
= i
(k,)
(k,)
+ i
(k,)
(k,)
) , 10i
(k,)
> 0,
(k,)
0.
} if i
if i
(3.9)
(k,)
It follows from the formulas (3.2)(3.5) that the functions fi
are always first order approximations of the original functions fi at the current iteration point, i.e.
(k,)
fi
fi (k)
(k,) (k)
(x(k) ) =
(x ).
fi
(x ) = fi (x(k) ) and
xj
xj
(3.10)
(k,)
(k,)
Since the parameters i
are always strictly positive, the functions fi
are strictly convex.
Based on this, it can be shown that there is always a unique optimal solution to the GCMMA
subproblem.
m
X
(ci yi + 21 yi2 )
i=1
subject to fi (x) ai z yi bi ,
where
fi (x) =
n
X
i = 1, . . . , m
j xj j ,
j = 1, . . . , n
z 0, yi 0,
i = 1, . . . , m
fij (xj ) =
j=1
n
X
j=1
pij
qij
+
uj xj
xj lj
(4.1)
, for i = 0, 1, . . . , m.
(4.2)
4.1
A Lagrangean relaxation of the problem, with respect to the explicit constraints only,
gives rise to the following Lagrange function:
L(x, y, z, ) = f0 (x) + z + 21 d0 z 2 +
m
X
(ci yi +
i=1
1 2
2 yi )
m
X
i (fi (x) ai z yi bi ),
(4.3)
i=1
m
X
i fi (x) =
i=1
X
n
n
X
pj ()
qj ()
+
=
Lxj (xj , ),
uj xj
xj lj
j=1
L (y, ) =
m
X
(4.4)
(4.5)
j=1
(ci yi + 21 yi2 i yi ) ,
(4.6)
i=1
Lz (z, ) = (1T a) z + 12 d0 z 2 .
10
(4.7)
Here,
pj () = p0j +
m
X
i pij > 0
and qj () = q0j +
m
X
i qij > 0.
(4.8)
The dual function value () at the point 0 is by definition obtained as the optimal value
of the problem of minimizing the Lagrange function with respect to the primal variables
x, y and z, subject to the implicit constraints only, i.e. as the optimal value of the problem
minimize L(x, y, z, )
subject to j xj j ,
j = 1, . . . , n,
(4.9)
z 0, yi 0, i = 1, . . . , m.
Note that, in this problem (4.9), the vector is held fixed.
Due to (4.4), problem (4.9) can be divided into three separate problems which can be solved
independently of each other. First one problem in x IRn , namely
minimize Lx (x, ) =
n
X
Lxj (xj , )
j=1
(4.10)
subject to j xj j , j = 1, . . . , n,
which in turn can be divided into n separate problems, each involving only one variable xj :
minimize Lxj (xj , ) =
qj ()
pj ()
+
subject to j xj j .
uj xj
xj lj
(4.11)
(4.13)
(4.14)
Each of the three problems (4.11), (4.13) and (4.14) can easily be solved analytically.
The optimal solutions yi () and z() of (4.13) and (4.14), respectively, are given by
yi () = (i ci )+ and z() =
(T a 1)+
,
d0
11
(4.15)
The optimal values of the problems (4.12) and (4.14), respectively, are then given by
Ly (
y(), ) = 12
m
X
((i ci )+ )2 = 21
i=1
and
Lz (
z (), ) =
m
X
yi ()2
(4.16)
i=1
((T a1)+ )2
= 12 d0 z()2 .
2d0
(4.17)
Let j () denote the unique number xj (lj , uj ) which satisfies the equation
Lxj
(xj , ) = 0,
xj
where Lxj (xj , ) is defined in (4.11). Straigh-forward calculations give that
p
p
lj pj () + uj qj ()
p
j () = p
.
pj () + qj ()
Then the optimal solution x
j () of (4.11) is
j ()
x
j () =
(4.19)
given by
if j () j ,
if j < j () < j ,
(4.20)
if j () j ,
(4.18)
(4.21)
j=1
(4.22)
i=1
(4.24)
The following theorem, which is proved later, provides the main motivation for considering
this dual problem D.
is an optimal solution to the dual problem (4.24), and let
Theorem 1: Assume that
(
x, y
, z) be the corresponding primal solution obtained from the formulas
4.2
In this section, explicit expressions for the gradient and the Hessian of the dual function
are derived. These are needed when a modified Newton method is used for solving the dual
problem. It turns out that the first order derivatives of are continuous everywhere, while
the second order derivatives are not.
First, some simplifying notations are introduced. Let
w = (xT , yT , z)T ,
W = { w = (xT , yT , z)T | j xj j , j, yi 0, i, z 0 },
P
g0 (w) = f0 (x) + z + 1 d0 z 2 +
(ci yi + 1 y 2 ),
i
2 i
gi (w) = fi (x) ai z yi bi ,
g(w) = (g1 (w), . . . , gm (w))T .
Then the primal subproblem (4.1) may be written
P:
minimize g0 (w)
subject to gi (w) 0, i = 1, . . . , m
(4.25)
w W.
The corresponding Lagrange function, which is the same Lagrange function as in (4.3), is
L(w, ) = g0 (w) +
m
X
(4.26)
i=1
while the corresponding dual function, which is the same dual function as in (4.23), is
() = min L(w, ) = L(w(),
) = g0 (w())
+ T g(w()),
wW
(4.27)
where w()
= (
x()T , y
()T , z())T is the unique w W which minimizes L(w, ) on W .
The different components in the vector w()
() ()
)),
(4.28)
T
() ()
()
g(w()).
(4.29)
() ()
) L( w(
),
),
),
)).
change place in (4.28).
Then (4.29) follows by simply letting and
13
t 0
+ t d) ()
(
T d.
= g(w(
))
t
(4.30)
+ t d))T d ( + t d) () g(w(
T d,
g(w(
))
t
while for t < 0, they imply that:
+ t d))T d.
T d ( + t d) () g(w(
g(w(
))
t
+ t d)) g(w(
when t 0.
The lemma now follows from the fact that g(w(
))
Lemma 4.2.3: The partial derivatives of the dual function , at a given point ,
are given by
() = gi (w()),
i = 1, . . . , m.
i
(4.31)
() =
i
j=1
pij
qij
+
uj x
j () x
j ()lj
ai T
( a1)+ (i ci )+ bi .
d0
(4.32)
where x
j () is defined by the formula (4.20).
Proof:
According to Lemma 4.2.3, and the definition of gi (w), the partial derivative of the dual
function are given by
() = fi (
x()) ai z() yi () bi .
i
(4.33)
The lemma now follows from the definition of fi (x) and the formulas (4.15).
Since these partial derivatives of are continuous, the dual function is a continuously
differentiable function.
14
() =
n
X
j=1
qj ()
pj ()
+
uj x
j () x
j ()lj
y () = 21
m
X
=
n
X
xj (),
(4.34)
(4.35)
j=1
((i ci )+ )2 ,
(4.36)
i=1
z () =
((T a1)+ )2
.
2d0
(4.37)
n
X
xj (),
(4.38)
j=1
0
if i < ci ,
not defined if i = ci ,
[y ()]ii =
1
if i > ci ,
the elements of the matrix z () are given by
0
if T a < 1,
not defined if T a = 1,
[z ()]ik =
aa
i k
if T a > 1,
d0
(4.39)
(4.40)
x
j () ik =
if
j () < j or j () > j ,
if
j () = j or j () = j ,
hj () vij () vkj ()
if
j < j () < j ,
(4.41)
where
pij
qij
vij () =
and hj () =
pj () qj ()
15
p
pj ()qj ()
.
2(uj lj )
(4.42)
Proof:
Straight-forward analytical calculations of the second order derivatives of y () in (4.36)
and z () in (4.37) directly give the expressions for y () in (4.39) and z () in (4.40).
If j < j () < j then x
j () = j () so that
xj
pij
qij
() = fij (
xj ()) = fij (j ()) =
.
+
i
uj j () j () lj
(4.43)
p
p
pj () + qj ()
uj lj
qij
pij
p
+p
pj ()
qj ()
!
,
,
k i
2(uj lj )
pj () qj ()
pj () qj ()
(4.44)
(4.45)
(4.46)
Thus, the first order derivatives of xj are constant in a neighbourhood of , which implies
that all the second order derivates of xj are zero at . This proves the first line in (4.41).
On the hyperplanes j () = j and j () = j , the second order derivatives of xj typically
dont exist since they make a sudden jump when passes through any of these hyperplanes.
(It is typically not the case that the right hand side in (4.45) is zero when j () = j or j .)
16
4.3
))
))
Proof:
0, g(w(
0 and
T g(w(
= 0.
First, assume that
))
))
T
T
()
g(w())
0 for every 0.
(4.47)
> 0.
T
= k gk (w())
< 0.
T g(w())
= k gk (w())
Assume that
= w(
).
T
(4.48)
T g(w)
where the equality follows since
= 0, the strict inequality follows since w
is the
unique minimizer of the Lagrange function, and the final inequality follows since g(w) 0
0. The equality in (4.48) also implies that g0 (w)
= L( w(
)
= ().
and
= L( w,
)
),
17
4.4
(4.49)
Since the dual objective function is concave and continuously differentiable, a given point
1, . . . ,
m )T IRm is an optimal solution to this dual problem if and only if
= (
(1)
(2)
(3)
i 0
for all i = 1, . . . , m,
(4.50)
i = 0 } and I1 ()
i > 0 }.
={i|
={i|
where I0 ()
Then the unique optimal solution (
x, y
, z) to the primal subproblem (4.1) is obtained from
i 0
for all i = 1, . . . , m,
(3)
() g for all i I1 (),
i
(2)
(4.51)
The default value of the tolerance parameter g , called GEPS in the fortran code, is 105 .
(4.52)
w W.
Note that if = 0 then the primal subproblem (4.1) is obtained.
The Lagrange function corresponding to P() is
L (w, ) = g0 (w) +
X
i
i ,
(4.53)
wW
i = L0 (w(),
i ,
(4.54)
where w()
i ,
(4.55)
(4.56)
i subject to 0.
(4.57)
() = gi (w(
))
= gi (w)
.
i
Therefore, it follows from the definition (4.51) that
(2)
i = 0,
gi (w)
+g if
(4.58)
i > 0.
(3) g gi (w)
+g if
In particular, this implies that w
is a feasible solution to P(+g ).
Next, assume that w is a feasible solution to P(g ). Then
X
i gi (w)
g0 (w)
= L0 (w,
)
i >0
L0 (w, )
i gi (w)
i >0
= g0 (w) +
i gi (w)
i >0
g0 (w) +
i gi (w)
i >0
i (g )
i >0
= g0 (w),
and the proof is complete.
19
X
i >0
i (g )
(4.59)
Assume that
= w(
).
Then w
is a feasible solution to P(0) and g0 (w)
the optimal value of P(2g ).
is an g optimal solution to D(g ), and let w
Assume that
= w(
).
Then w
is a feasible solution to P(2g ) and g0 (w)
the optimal value of P(0).
is an g optimal solution to D(0), and let w
Assume that
= w(
).
Then w
is a feasible solution to P(g ) and g0 (w)
the optimal value of P(g ).
4.5
(4.60)
In this section, the gradient vector of the dual objective function is denoted g(), i.e.
g() = (g1 (), . . . , gm ())T =
T
() , . . . ,
() .
1
m
(4.61)
(4.62)
1, . . . ,
m )T and the index set A satisfy that
= (
Lemma 4.5.1: Assume that the point
(1)
i = 0 for all i A,
(4.63)
0 for all i A.
(4) gi ()
is an optimal solution to the problem D.
Then
is an optimal solution to the problem D, then there is an
Conversely, If
index set A M such that (1)(4) are satisfied (namely every index set A
i = 0}).
< 0} A { i M |
such that { i M | gi ()
Lemma 4.5.2: The point = (1 , . . . , m )T is an optimal solution to the equality-constrained
subproblem DA if and only if
(1)
i = 0 for all i A,
(4.64)
i = 0 for all i A,
(4.65)
(4.66)
(4.67)
g for all i A.
gi ()
Def: A point = (1 , . . . , m )T is an g optimal solution to the problem DA
if the following conditions hold:
i = 0 for all i A,
|gi ()| g for all i M \A.
(4.68)
i = 0 }.
is an g optimal solution to D, and A = { i M |
Lemma 4.5.4: Assume that
is an g optimal solution also to DA .
Then
Further, each g optimal solution to DA which satisfies 0 and
gi () g for all i A is an g optimal solution to D.
21
4.6
The active set algorithm, used in the fortran implementation for finding an g optimal
solution to the dual problem D, is now described.
At iteration k, the current index set guess is A(k) and the current iteration point is (k) .
It might be noted that if (k+1) 6= (k) then ((k+1) ) > ((k) ).
STEP 0: The index counter is set to k = 1, the first iteration point is set to (1) = 0,
and the first index guess is set to A(1) = { i M | gi ((1) ) g }.
If A(1) = M, then (1) = 0 is an g optimal solution to the dual problem D and the algorithm is stopped! Otherwise, the algorithm continues with STEP 1.
STEP 1: Here, the current index set A(k) M is given, together with the current iteration
(k)
(k)
point (k) IRm which satisfies i = 0 for i A(k) and i 0 for i M \A(k) .
If | gi ((k) )| g for all i M \A(k) , the algorithm continues with STEP 2.
Otherwise, the algorithm continues with STEP 3.
STEP 2: Here, the current iteration point (k) is an g optimal solution to DA(k) .
If gi ((k) ) g for all i A(k) then (k) is an g optimal solution to D and the
algorithm is stopped!
Otherwise, let p A(k) be an index such that gp ((k) ) = maxi { gi ((k) ) | i A(k) } (> g ).
Then the index set, the iteration point, and the iteration counter are updated as follows:
A(k+1) = A(k) \ {p}, (k+1) = (k) , k k + 1,
whereafter the algorithm continues with STEP 1.
STEP 3: Here, the current iteration point (k) is not an g optimal solution to DA(k) .
(k)
Then a search direction d(k) IRm such that di = 0 for i A(k) and g((k) )T d(k) > 0
is calculated. If d(k) 0 the algorithm continues with STEP 4.
( (k)
)
i
(k)
Otherwise, a maximal steplength is calculated through tmax = min
| di < 0 .
(k)
i
di
If g((k) + tmax d(k) )T d(k) < 0 the algorithm continues with STEP 4.
(k)
(k)
22
(5.1)
(1)
(1)
With c1 = c2 = 1000, a1 = a2 = 0, the starting point (x1 , x2 , x3 ) = (4, 3, 2), and the
(k)
(k)
parameter GEPS = 107 , it turned out that y1 = y2 = z (k) = 0 for all k 2, while x(k) ,
f0 (x(k) ), f1 (x(k) ) and f2 (x(k) ), for k = 1, . . . , 7, are shown in the following two tables:
k
(k)
x1
(k)
x2
(k)
x3
f0 (x(k) )
f1 (x(k) )
f2 (x(k) )
MMA:
9.959929
6.848340 9.215195
8.803031
8.885662 9.023207
8.770329
8.999802 9.000017
8.770249
9.000001 8.999998
8.770246
9.000000 9.000000
8.770246
9.000000 9.000000
(k)
x1
(k)
x2
(k)
x3
f0 (x(k) )
f1 (x(k) )
f2 (x(k) )
8.937619
8.650326 8.991408
8.773025
8.997020 8.998887
8.770396
8.999988 8.999891
8.770255
8.999998 8.999992
8.770246
9.000000 9.000000
Note that, on this particular problem, MMA is slightly faster than GCMMA. Also note that
some of the iteration points generated by MMA are slightly infeasible, while all the iteration
points generated by GCMMA are feasible (within the tolerances given by GEPS).
References
[1] K. Svanberg, The method of moving asymptotes a new method for structural optimization, International Journal for Numerical Methods in Engineering, 1987, 24, 359-373.
[2] K. Svanberg, A class of globally convergent optimization methods based on conservative
convex separable approximations, SIAM Journal of Optimization, 2002, 12, 555-573.
23