Gcmma

Download as pdf or txt
Download as pdf or txt
You are on page 1of 23

MMA and GCMMA Fortran versions March 2013

Krister Svanberg
KTH, Royal Institute of Technology
This note describes the algorithms used in the authors latest fortran implementations of
MMA and GCMMA. The first versions of these methods were published in [1] and [2].

Considered optimization problem

The fortran implementations of the authors MMA and GCMMA codes are based on the
assumption that the users optimization problem is written on the following form, where the
optimization variables are x = (x1 , . . . , xn )T , y = (y1 , . . . , ym )T and z.
minimize f0 (x) + z +

m
X

(ci yi + 12 yi2 )

i=1

subject to fi (x) ai z yi fimax ,

i = 1, . . . , m

xmin
xj xmax
,
j
j

j = 1, . . . , n

yi 0,

i = 1, . . . , m

(1.1)

z 0,
max , a , c and f max are
where f0 , f1 , . . . , fm are given differentiable functions, while xmin
i
i
j , xj
i
min
max
given real numbers which satisfy xj < xj , ai 0 and ci 0.

In problem (1.1), the true optimization variables are x1 , . . . , xn , while y1 , . . . , ym and z


are artificial optimization variables which should make it easier for the user to formulate
and solve certain subclasses of problems, like least squares problems and minmax problems.
As a first example of how to transform a given problem to the form (1.1), assume that the
user wants to solve a problem on the following standard form for nonlinear programming.
minimize f0 (x)
subject to fi (x) fimax ,
xmin
j

xj

xmax
,
j

i = 1, . . . , m

(1.2)

j = 1, . . . , n,

where f0 , f1 , . . . , fm are given differentiable functions. To make problem (1.1) (almost) equivalent to this problem (1.2), first let ai = 0 for all i > 0. Then z = 0 in any optimal solution
to (1.1). Further, for each i, let ci = a large number, so that the variables yi become very
expensive. Then typically y = 0 in any optimal solution to (1.1), and the corresponding x
is an optimal solution to (1.2).
It should be noted that the problem (1.1) always has feasible solutions, and in fact also at
least one optimal solution. This holds even if the users problem (1.2) does not have any
feasible solutions, in which case some yi > 0 in the optimal solution of (1.1).

Now some practical considerations:


The user should preferably scale the constraints in such a way that 1 fimax 100 for each
i (and not fimax = 1010 ). The objective function f0 (x) should preferably be scaled such that
1 f0 (x) 100 for reasonable values on the variables. The variables xj should preferably
be scaled such that 0.1 xmax
xmin
100, for all j.
j
j
Concerning the large numbers on the coefficients ci mentioned above, the user should for
numerical reasons try to avoid extremely large values on these coefficients (like 1010 ). It is
better to start with reasonably large values and then, if it turns out that not all yi = 0 in
the optimal solution of (1.1), increase the corresponding values of ci by e.g. a factor 100 and
solve the problem again, etc. If the functions and the variables have been scaled according to
above, then resonably large values on the parameters ci could be, say, ci = 1000 or 10000.
Finally, concerning the simple bound constraints xmin
xj xmax
, it may sometimes be
j
j
the case that some variables xj do not have any prescribed upper and/or lower bounds. In
that case, it is in practice always possible to choose artificial bounds xmin
and xmax
such
j
j
that every realistic solution x satisfies the corresponding bound constraints. The user should
then preferably avoid choosing xmax
xmin
unnecessarily large. It is better to try some
j
j
reasonable bounds and then, if it turns out that some variable xj becomes equal to such an
artificial bound in the optimal solution of (1.1), change this bound and solve the problem
again (starting from the recently obtained solution), etc.
As a second example of how to transform a given problem to the form (1.1), assume that the
user wants to solve a constrained least squares problem on the form
minimize

1
2

p
X
i )2
(hi (x) h
i=1

subject to gi (x) gimax ,

i = 1, . . . , q

(1.3)

xmin
xj xmax
, j = 1, . . . , n
j
j
i and g max are given constants.
where hi and gi are given differentiable functions, while h
i
Problem (1.3) may equivalently be written on the following form with variables x IRn and
y1 , . . . , y2p IR:
p
X
2
minimize 12
(yi2 + yp+i
)
i=1

i,
subject to yi hi (x) h
i hi (x),
yp+i h
gi (x)
xmin
j

gimax ,

xj

yi 0,

xmax
,
j

i = 1, . . . , p
i = 1, . . . , p
i = 1, . . . , q
j = 1, . . . , n
i = 1, . . . , 2p.

(1.4)

To make problem (1.1) (almost) equivalent to this problem (1.4), let


m

= 2p + q,

f0 (x)

= 0,

fi (x)

= hi (x),
i,
= h

i = 1, . . . , p

= hi (x),
i,
= h

i = 1, . . . , p

fimax
fp+i (x)
max
fp+i

i = 1, . . . , p
i = 1, . . . , p

f2p+i (x) = gi (x),

i = 1, . . . , q

max
f2p+i

i = 1, . . . , q

ai

= 0,

i = 1, . . . , m

ci

= 0,

i = 1, . . . , 2p

c2p+i

= large number, i = 1, . . . , q.

gimax ,

As a third example of how to transform a given problem to the form (1.1), assume that the
user wants to solve a min-max problem on the form
minimize

max {hi (x)}

i=1,..,p

subject to gi (x) gimax ,

i = 1, . . . , q

(1.5)

xmin
xj xmax
, j = 1, . . . , n
j
j
where hi and gi are given differentiable functions, while gimax are given constants. For each
given x, the value of the objective function in problem (1.5) is the largest of the p real
numbers h1 (x), . . . , hp (x). If there is a known number hmin such that hi (x) hmin for all i
and all feasible x, then problem (1.5) may equivalently be written on the following form with
variables x IRn and z IR :
minimize

z + hmin

subject to z hi (x) hmin ,


gi (x)
xmin
j

gimax ,

xj

xmax
,
j

i = 1, . . . , p
i = 1, . . . , q
j = 1, . . . , n

z 0.
To make problem (1.1) (almost) equivalent to this problem (1.6), let
m

= p + q,

f0 (x)

= hmin , (a constant!)

fi (x)

= hi (x),

fimax

i = 1, . . . , p

hmin ,

i = 1, . . . , p

fp+i (x) = gi (x),

i = 1, . . . , q

max
fp+i

i = 1, . . . , q

ai

= 1,

i = 1, . . . , p

ap+i

= 0,

i = 1, . . . , q

ci

= large number,

i = 1, . . . , m

gimax ,

(1.6)

The ordinary MMA

MMA is a method for solving problems on the form (1.1), using the following approach: In
each iteration k {1, 2, . . .}, the current iteration point (x(k) , y(k) , z (k) ) is given. Then an
approximating subproblem, in which the original functions fi are replaced by certain convex
(k)
functions fi , is generated. The choice of these approximating functions is based mainly on
(k)
(k)
gradient information at the current iteration point, but also on some parameters uj and lj
(moving asymptotes) which are updated in each iteration based on information from previous iteration points. The subproblem is solved, and the unique optimal solution becomes
the next iteration point (x(k+1) , y(k+1) , z (k+1) ). Then a new subproblem is generated, etc.
(k+1)
(k)
A possible convergence criterium is that the iteration process is stopped when |xj
xj | <
< (xmax
xmin
j
j ) for all j = 1, . . . , n, where is a small number. The default value of this
parameter , called XCHTOL in the fortran code, is 104 . Other convergence criteria are of
course also possible.
The MMA subproblem looks as follows:
(k)
minimize f0 (x) + z + 12 d0 z 2 +

m
X

(ci yi + 12 yi2 )

i=1
(k)
subject to fi (x) ai z yi fimax ,
(k)

i = 1, . . . , m

(k)

xj j ,

j = 1, . . . , n,

yi 0,

(2.1)

i = 1, . . . , m

z 0,
(k)
In this subproblem (2.1), the approximating functions fi (x) are chosen as
(k)
fi (x)

(k)

n
X

(k)

pij

(k)
uj

j=1

+
xj

qij
xj

!
(k)

(k)
lj

+ ri , i = 0, 1, . . . , m,

(2.2)

where
(k)
pij

(k)
qij

(k)
(uj

(k)
xj )2

(k)
(k)
(xj lj )2

!



fi (k) +
fi (k)
105
,
1.001
(x ) + 0.001
(x ) + max
xj
xj
xj xmin
j


!



fi (k) +
fi (k)
105
0.001
(x ) + 1.001
(x ) + max
,
xj
xj
xj xmin
j
!
(k)
(k)
n
X
p
q
ij
ij
(k)
ri = fi (x(k) )
+ (k) (k) .
(k)
(k)
uj xj
xj lj
j=1

(2.3)


fi (k) +
fi (k)
denotes the largest of the two numbers
Here,
(x )
(x ) and 0,
xj
xj


fi (k)
fi (k)
while
(x )
denotes the largest of the two numbers
(x ) and 0.
xj
xj


(2.4)

(2.5)

(k)
It follows from the formulas (2.2)(2.5) that the functions fi are always first order approximations of the original functions fi at the current iteration point, i.e.
(k)

fi
fi (k)
(k)
(x(k) ) =
(x ).
fi (x(k) ) = fi (x(k) ) and
xj
xj

(2.6)

(k)
(k)
Moreover, the approximating functions f0 , . . . , fm are strictly convex. Based on this, it
can be shown that there is always a unique optimal solution to the MMA subproblem.

An alternative, but equivalent, formulation of the MMA subproblem (2.1) is as follows:


minimize

p0j

(k)
uj

j=1

subject to

q0j

xj

xj

(k)

n
X

(k)
uj

(k)

xj

xj
(k)

xj j ,

yi 0,

+ r0 + z + 12 d0 z 2 +

(k)
lj

qij

!
(k)

(k)

pij

j=1

(k)

(k)

n
X

m
X

(ci yi + 12 yi2 )

i=1

!
(k)

ai z yi bi ,

(k)
lj

i = 1, . . . , m

(2.7)

j = 1, . . . , n,

i = 1, . . . , m

z 0,
(k)

where bi

(k)

= fimax ri

for i = 1, . . . , m.

(k)

and j

(k)

max
= max{ xmin
xmin
j , lj + 0.1(xj lj ), xj 0.5(xj
j ) },

The bounds j

(k)

(k)

in (2.1) and (2.7) are chosen as


(k)

(k)

(k)

(k)

(k)

(k)

(k)

(2.8)

(k)

= min{ xmax
, uj 0.1(uj xj ), xj + 0.5(xmax
xmin
j
j
j ) },
(k)

which means that the constraints j


of constraints:

(k)

xj j

are equivalent to the following three sets

xmin
xj xmax
,
j
j
(k)

(k)

(2.9)

(2.10)

(k)

0.9(uj xj ),

(2.11)

(k)

0.5(xmax
xmin
j
j ).

(2.12)

0.9(xj lj ) xj xj

0.5(xmax
xmin
j
j ) xj xj

(k)

(k)

(k)

(k)

The default rules for updating the lower asymptotes lj and the upper asymptotes uj
as follows. The first two iterations, when k = 1 and k = 2,
(k)

= xj 0.5(xmax
xmin
j
j ),

(k)

= xj + 0.5(xmax
xmin
j
j ).

lj

uj

are

(k)
(k)

(2.13)

In later iterations, when k 3,


(k)

lj

(k)
uj

where
(k)

(k)

(k)

(k1)

= xj j (xj
=

(k)
xj

(k) (k1)
j (uj

(k1)

lj

),

(k1)
xj
),

(k)
(k1)
(k1)
(k2)

)(xj
xj
) < 0,
0.7 if (xj xj
(k)
(k1)
(k1)
(k2)
=
1.2 if (xj xj
)(xj
xj
) > 0,

(k)
(k1)
(k1)
(k2)
1 if (xj xj
)(xj
xj
) = 0,

(2.14)

(2.15)

provided that this leads to values that satisfy


(k)

xj 0.01(xmax
xmin
j
j ),

(k)

xj 10(xmax
xmin
j
j ),

(k)

xj + 0.01(xmax
xmin
j
j ),

(k)

xj + 10(xmax
xmin
j
j ).

lj

lj

uj

uj

(k)

(k)

(2.16)

(k)

(k)

(k)

If any of these bounds is violated, the corresponding lj


of the violated inequality.

(k)

or uj

is put to the right hand side

Note that most of the explicit numbers in the above expressions are just default values of
different parameters in the fortran code. More precisely:
The
The
The
The
The
The
The
The

number
number
number
number
number
number
number
number

105 in (2.3) and (2.4) is the default value of the parameter RAAI.
0.1 in (2.8) and (2.9) is the default value of the parameter ALBEFA.
0.5 in (2.8) and (2.9) is the default value of the parameter XXMOVE.
0.5 in (2.13) is the default value of the parameter GHINIT.
0.7 in (2.15) is the default value of the parameter GHDECR.
1.2 in (2.15) is the default value of the parameter GHINCR.
0.01 in (2.16) is the default value of the parameter ASYMIN.
10.0 in (2.16) is the default value of the parameter ASYMAX.

All these values can be carefully changed by the user. As an example, a more conservative
method is obtain by decreasing XXMOVE and/or ASYMAX and/or GHINIT and/or GHINCR.

GCMMA the globally convergent version of MMA

The globally convergent version of MMA, from now on called GCMMA, for solving problems
of the form (1.1) consists of outer and inner iterations. The index k is used to denote
the outer iteration number, while the index is used to denote the inner iteration number.
Within each outer iteration, there may be zero, one, or several inner iterations. The double
index (k, ) is used to denote the :th inner iteration within the k:th outer iteration.
The first iteration point is obtained by first chosing x(1) X and then chosing y(1) and z (1)
such that (x(1) , y(1) , z (1) ) becomes a feasible solution of (1.1). This is easy. An outer iteration
of the method, going from the k:th iteration point (x(k) , y(k) , z (k) ) to the (k + 1):th iteration
point (x(k+1) , y(k+1) , z (k+1) ), can be described as follows:
Given (x(k) , y(k) , z (k) ), an approximating subproblem is generated and solved. In this sub(k,0)
problem, the functions fi (x) are replaced by certain convex functions fi
(x). The optimal
(k,0) (k,0)
(k,0)
(k,0)
(k,0)

solution of this subproblem is denoted (


x
,y

, z
). If fi
(
x
) fi (
x(k,0) ), for all
i = 0, 1, . . . , m , the next iteration point becomes (x(k+1) , y(k+1) , z (k+1) ) = (
x(k,0) , y
(k,0) , z(k,0) ),
and the outer iteration is completed (without any inner iterations needed). Otherwise, an
inner iteration is made, which means that a new subproblem is generated and solved at
(k,1)
(k,0)
x(k) , with new approximating functions fi
(x) which are more conservative than fi
(x)
for those indices i for which the above inequality was violated. The optimal solution of
(k,1) (k,1)
this new subproblem is denoted (
x(k,1) , y
(k,1) , z(k,1) ). If fi
(
x
) fi (
x(k,1) ), for all i =
0, 1, . . . , m , the next iteration point becomes (x(k+1) , y(k+1) , z (k+1) ) = (
x(k,1) , y
(k,1) , z(k,1) ),
and the outer iteration is completed (with one inner iterations needed). Otherwise, another inner iteration is made, which means that a new subproblem is generated and solved
(k,2)
at x(k) , with new approximating functions fi
(x), etc. These inner iterations are re(k,) (k,)
(k,)

peated until fi
(
x
) fi (
x
) for all i = 0, 1, . . . , m , which always happens after
a finite (usually small) number of inner iterations. Then the next iteration point becomes
(x(k+1) , y(k+1) , z (k+1) ) = (
x(k,) , y
(k,) , z(k,) ), and the outer iteration is completed (with
inner iterations needed).
It should be noted that in each inner iteration, there is no need to recalculate the gradients
fi (x(k) ), since x(k) has not changed. Gradients of the original functions fi are calculated
only once in each outer iteration. This is an important note since the calculation of gradients
is typically the most time consuming part in structural optimization.
The GCMMA subproblem looks as follows, for k {1, 2, 3, . . .} and {0, 1, 2, . . .}:
(k,)
minimize f0 (x) + z + 12 d0 z 2 +

m
X

(ci yi + 12 yi2 )

i=1
(k,)
subject to fi
(x) ai z yi fimax ,
(k)

(k)

xj j ,

i = 1, . . . , m
j = 1, . . . , n,

yi 0,

i = 1, . . . , m

z 0,

(3.1)

(k,)
where the approximating functions fi
(x) are chosen as
(k,) !
(k,)
n
X
q
p
ij
ij
(k,)
(k,)
+
+ ri , i = 0, 1, . . . , m .
fi
(x) =
(k)
(k)
uj xj
xj lj
j=1

(3.2)

Here,
(k,)

pij

(k,)
qij

(k)

fi (k)
1.001
(x )
xj

(k)

= (uj xj )2

+

fi (k)
+ 0.001
(x )
xj

(k,)

xmax
xmin
j
j

, (3.3)

!



(k,)
i
fi (k)
fi (k) +
(x ) + 1.001
(x ) + max
, (3.4)
0.001
xj
xj
xj xmin
j
!
(k,)
(k,)
n
X
q
p
ij
ij
(k,)
(3.5)
+ (k) (k) .
ri
= fi (x(k) )
(k)
(k)
uj xj
xj lj
j=1


(k)
(k)
(xj lj )2

(k)

(k)

(k)

(k)

Between each outer iteration, the bounds j and j and the asymptotes lj and uj are
updated as in the original MMA, the formulas (2.8)(2.16) still hold.
(k,)
The parameters i
in (3.3) and (3.4) are strictly positive and updated as follows:
Within a given outer iteration k, the only differences between two inner iterations are the
values of some of these parameters. In the beginning of each outer iteration, when = 0, the
following values are used:

n
X
fi (k) max
(k,0)
min
min 0.1


i
= max{ ,
(3.6)
xj (x ) (xj xj ) } , for i = 0, 1, .., m,
n
j=1

where the default value for the parameter min , called RAAMIN in the fortran code, is 106 .
(k,)

In each new inner iteration, the updating of i


is based on the solution of the most recent
(k,)

subproblem. Note that fi


(x) may be written on the form:
(k,)
(k)
(k,)
fi
(x) = hi (x) + i d(k) (x),
(k)

(k,)

where hi (x) and d(k) (x) do not depend on i


(k)

(x) =

(k)

. Some calculations give that

(k)

(k)

n
X

(uj lj )(xj xj )2

j=1

(uj xj )(xj lj )(xmax


xmin
j
j )

(k)

(k)

(3.7)

Now, let
(k,)

(k,)

i
(k)

(k,)

fi (
x(k,) ) fi
(
x(k,) )
.
d(k) (
x(k,) )

(3.8)

(k,)

(k,)

(k,)

Then hi (
x(k,) )+(i +i )d(k) (
x(k,) ) = fi (
x(k,) ), which shows that i +i
might
(k,+1)
be a natural value of i
. In order to get a globally convergent method, this natural value
is modified as follows.
(k,+1)

= min{ 1.1 (i

(k,+1)

= i

(k,)

(k,)

+ i

(k,)

(k,)

) , 10i

(k,)

> 0,

(k,)

0.

} if i
if i

(3.9)

(k,)
It follows from the formulas (3.2)(3.5) that the functions fi
are always first order approximations of the original functions fi at the current iteration point, i.e.
(k,)

fi
fi (k)
(k,) (k)
(x(k) ) =
(x ).
fi
(x ) = fi (x(k) ) and
xj
xj

(3.10)

(k,)
(k,)
Since the parameters i
are always strictly positive, the functions fi
are strictly convex.
Based on this, it can be shown that there is always a unique optimal solution to the GCMMA
subproblem.

A dual method for solving the subproblems

In the fortran implementation, a dual approach based on Lagrangean relaxation is used


for solving the subproblems in both MMA and GCMMA. The dual problem, which is a
maximization problem with a concave objective function and no other constraints than nonnegativity requirements on the (dual) variables, is solved by a modified Newton method,
combined with an active set strategy to handle the non-negativity constraints, whereafter
the optimal dual solution is translated to a corresponding optimal solution of the primal
MMA subproblem. The purpose of this section is to describe this approach in more details.
The MMA and GCMMA subproblems are in this section written as
minimize f0 (x) + z + 12 d0 z 2 +

m
X
(ci yi + 21 yi2 )
i=1

subject to fi (x) ai z yi bi ,

where
fi (x) =

n
X

i = 1, . . . , m

j xj j ,

j = 1, . . . , n

z 0, yi 0,

i = 1, . . . , m

fij (xj ) =

j=1

n 
X
j=1

pij
qij
+
uj xj
xj lj

(4.1)


, for i = 0, 1, . . . , m.

(4.2)

The constraints fi (x) ai z yi bi will be called the explicit constraints, while


the constraints j xj j , z 0 and yi 0 will be called the implicit constraints.

4.1

Lagrangean relaxation and the dual subproblem

A Lagrangean relaxation of the problem, with respect to the explicit constraints only,
gives rise to the following Lagrange function:
L(x, y, z, ) = f0 (x) + z + 21 d0 z 2 +

m
X

(ci yi +

i=1

1 2
2 yi )

m
X

i (fi (x) ai z yi bi ),

(4.3)

i=1

where = (1 , . . . , m )T is the vector of non-negative Lagrange multipliers i for the


explicit constraints. This Lagrange function can be written
L(x, y, z, ) = Lx (x, ) + Ly (y, ) + Lz (z, ) T b, where
L (x, ) = f0 (x) +
x

m
X

i fi (x) =

i=1

 X
n 
n
X
pj ()
qj ()
+
=
Lxj (xj , ),
uj xj
xj lj
j=1

L (y, ) =

m
X

(4.4)
(4.5)

j=1

(ci yi + 21 yi2 i yi ) ,

(4.6)

i=1

Lz (z, ) = (1T a) z + 12 d0 z 2 .
10

(4.7)

Here,
pj () = p0j +

m
X

i pij > 0

and qj () = q0j +

m
X

i qij > 0.

(4.8)

The dual function value () at the point 0 is by definition obtained as the optimal value
of the problem of minimizing the Lagrange function with respect to the primal variables
x, y and z, subject to the implicit constraints only, i.e. as the optimal value of the problem
minimize L(x, y, z, )
subject to j xj j ,

j = 1, . . . , n,

(4.9)

z 0, yi 0, i = 1, . . . , m.
Note that, in this problem (4.9), the vector is held fixed.
Due to (4.4), problem (4.9) can be divided into three separate problems which can be solved
independently of each other. First one problem in x IRn , namely
minimize Lx (x, ) =

n
X

Lxj (xj , )

j=1

(4.10)

subject to j xj j , j = 1, . . . , n,
which in turn can be divided into n separate problems, each involving only one variable xj :
minimize Lxj (xj , ) =

qj ()
pj ()
+
subject to j xj j .
uj xj
xj lj

(4.11)

Then one problem in y IRm , namely


minimize Ly (y, )
(4.12)
subject to yi 0, i = 1, . . . , m,
which in turn can be divided into m separate problems, each involving only one variable yi :
minimize (ci i ) yi + 12 yi2 subject to yi 0.

(4.13)

And finally one problem in z IR, namely


minimize Lz (z, ) = (1T a) z + 12 d0 z 2 subject to z 0.

(4.14)

Each of the three problems (4.11), (4.13) and (4.14) can easily be solved analytically.
The optimal solutions yi () and z() of (4.13) and (4.14), respectively, are given by
yi () = (i ci )+ and z() =

(T a 1)+
,
d0

where (i ci )+ denotes the largest of the two numbers i ci and 0,


and (T a 1)+ denotes the largest of the two numbers T a 1 and 0.

11

(4.15)

The optimal values of the problems (4.12) and (4.14), respectively, are then given by
Ly (
y(), ) = 12

m
X

((i ci )+ )2 = 21

i=1

and
Lz (
z (), ) =

m
X

yi ()2

(4.16)

i=1

((T a1)+ )2
= 12 d0 z()2 .
2d0

(4.17)

Let j () denote the unique number xj (lj , uj ) which satisfies the equation
Lxj
(xj , ) = 0,
xj
where Lxj (xj , ) is defined in (4.11). Straigh-forward calculations give that
p
p
lj pj () + uj qj ()
p
j () = p
.
pj () + qj ()
Then the optimal solution x
j () of (4.11) is

j ()
x
j () =

(4.19)

given by
if j () j ,
if j < j () < j ,

(4.20)

if j () j ,

The optimal value of the problems (4.10) is then given by



n
n 
X
X
pj ()
qj ()
x
x
L (
x(), ) =
Lj (
xj (), ) =
+
.
uj x
j () x
j () lj
j=1

(4.18)

(4.21)

j=1

Finally, the optimal value of the problem (4.9) is given by


L(
x(), y
(), z(), ) = Lx (
x(), ) + Ly (
y(), ) + Lz (
z (), ) T b,

(4.22)

which by definition is the dual function value () at the point . Thus,



m
n 
X
X
qj ()
pj ()
((T a1)+ )2
+
21
((i ci )+ )2
T b . (4.23)
() =
uj x
j () x
j ()lj
2d0
j=1

i=1

The dual problem D is by definition the following maximization problem.


D: maximize () subject to 0.

(4.24)

The following theorem, which is proved later, provides the main motivation for considering
this dual problem D.
is an optimal solution to the dual problem (4.24), and let
Theorem 1: Assume that
(
x, y
, z) be the corresponding primal solution obtained from the formulas

(4.20) and (4.15) with = .


Then (
x, y
, z) is the unique optimal solution to the primal subproblem (4.1).
Moreover, the optimal values of the problems (4.1) and (4.24) are equal.
12

4.2

Gradient and Hessian of the dual function

In this section, explicit expressions for the gradient and the Hessian of the dual function
are derived. These are needed when a modified Newton method is used for solving the dual
problem. It turns out that the first order derivatives of are continuous everywhere, while
the second order derivatives are not.
First, some simplifying notations are introduced. Let
w = (xT , yT , z)T ,
W = { w = (xT , yT , z)T | j xj j , j, yi 0, i, z 0 },
P
g0 (w) = f0 (x) + z + 1 d0 z 2 +
(ci yi + 1 y 2 ),
i

2 i

gi (w) = fi (x) ai z yi bi ,
g(w) = (g1 (w), . . . , gm (w))T .
Then the primal subproblem (4.1) may be written
P:

minimize g0 (w)
subject to gi (w) 0, i = 1, . . . , m

(4.25)

w W.
The corresponding Lagrange function, which is the same Lagrange function as in (4.3), is
L(w, ) = g0 (w) +

m
X

i gi (w) = g0 (w) + T g(w),

(4.26)

i=1

while the corresponding dual function, which is the same dual function as in (4.23), is
() = min L(w, ) = L(w(),

) = g0 (w())

+ T g(w()),

wW

(4.27)

where w()

= (
x()T , y
()T , z())T is the unique w W which minimizes L(w, ) on W .
The different components in the vector w()

are given by the formulas (4.20) and (4.15).


then
Lemma 4.2.1: If and
( )
T g(w(

() ()
)),

(4.28)

T
() ()

()
g(w()).

(4.29)

Proof: First, (4.28) follows from


= L( w(),
)
L( w(
) L( w(
)
= ( )
T g(w(

() ()

) L( w(
),
),
),
)).
change place in (4.28).
Then (4.29) follows by simply letting and

13

and d IRm then


Lemma 4.2.2: If
lim

t 0

+ t d) ()

(
T d.
= g(w(
))
t

(4.30)

+ t d. For t > 0, the inequalitites (4.28) and (4.29) imply that:


Proof: Let =

+ t d))T d ( + t d) () g(w(
T d,
g(w(

))
t
while for t < 0, they imply that:

+ t d))T d.
T d ( + t d) () g(w(

g(w(
))
t
+ t d)) g(w(
when t 0.
The lemma now follows from the fact that g(w(

))
Lemma 4.2.3: The partial derivatives of the dual function , at a given point ,
are given by

() = gi (w()),

i = 1, . . . , m.
i

(4.31)

Proof: Follows from Lemma 4.2.2 by letting d = ei .


Lemma 4.2.4: The partial derivatives of the dual function , at a given point ,
are given by
n

() =
i
j=1

pij
qij
+
uj x
j () x
j ()lj

ai T
( a1)+ (i ci )+ bi .
d0

(4.32)

where x
j () is defined by the formula (4.20).
Proof:
According to Lemma 4.2.3, and the definition of gi (w), the partial derivative of the dual
function are given by

() = fi (
x()) ai z() yi () bi .
i

(4.33)

The lemma now follows from the definition of fi (x) and the formulas (4.15).
Since these partial derivatives of are continuous, the dual function is a continuously
differentiable function.

14

Now to the Hessian matrix. The dual function can be written


() = x () + y () + z () T b, where
x

() =

n 
X
j=1

qj ()
pj ()
+
uj x
j () x
j ()lj

y () = 21

m
X


=

n
X

xj (),

(4.34)
(4.35)

j=1

((i ci )+ )2 ,

(4.36)

i=1

z () =

((T a1)+ )2
.
2d0

(4.37)

The Hessian matrix of is a corresponding sum of the Hessian matrices of x , y and z .


Lemma 4.2.5: The mm Hessian matrix () of the dual function is given by
() = y () + z () +

n
X

xj (),

(4.38)

j=1

where y () is a diagonal matrix with diagonal elements

0
if i < ci ,

not defined if i = ci ,
[y ()]ii =

1
if i > ci ,
the elements of the matrix z () are given by

0
if T a < 1,

not defined if T a = 1,
[z ()]ik =

aa

i k
if T a > 1,
d0

(4.39)

(4.40)

and the elements of the matrix xj () are given by

 x 
j () ik =

if

j () < j or j () > j ,

typically not defined

if

j () = j or j () = j ,

hj () vij () vkj ()

if

j < j () < j ,

(4.41)

where
pij
qij
vij () =

and hj () =
pj () qj ()

15

p
pj ()qj ()
.
2(uj lj )

(4.42)

Proof:
Straight-forward analytical calculations of the second order derivatives of y () in (4.36)
and z () in (4.37) directly give the expressions for y () in (4.39) and z () in (4.40).
If j < j () < j then x
j () = j () so that
xj
pij
qij
() = fij (
xj ()) = fij (j ()) =
.
+
i
uj j () j () lj

(4.43)

Note that this holds not only at but also in a neighbourhood of .


The expression (4.19) for j () now gives that
xj
() =
i

p
p
pj () + qj ()
uj lj

qij
pij
p
+p
pj ()
qj ()

!
,

and then, by taking the derivative of this with respect to k ,


p



2 xj
pj ()qj ()
pkj
qkj
pij
qij
() =

,
k i
2(uj lj )
pj () qj ()
pj () qj ()

(4.44)

(4.45)

which proves the last line in (4.41).


If j () < j or j () > j then x
j () = j or j , and then
xj
pij
qij
pij
qij
() = fij (
xj ()) =
+
or
+
.
i
uj j
j lj
uj j
j lj

(4.46)

Thus, the first order derivatives of xj are constant in a neighbourhood of , which implies
that all the second order derivates of xj are zero at . This proves the first line in (4.41).
On the hyperplanes j () = j and j () = j , the second order derivatives of xj typically
dont exist since they make a sudden jump when passes through any of these hyperplanes.
(It is typically not the case that the right hand side in (4.45) is zero when j () = j or j .)

16

4.3

Some theoretical results for the dual problem

is an optimal solution to the dual problem (4.24) if and only if


Lemma 4.3.1: The point
0, g(w(
0 and
T g(w(
= 0.

))
))
Proof:
0, g(w(
0 and
T g(w(
= 0.
First, assume that
))
))
T
T

Then, for any 0, ( ) g(w(


)) = g(w(
)) 0, and then the inequality (4.28)
0, so that
is an optimal solution to the dual problem (4.24).
implies that () ()
is an optimal solution to the dual problem (4.24).
In the rest of the proof, assume that
0 and ()
() 0 for every 0, and then (4.29) implies that
Then
T

()
g(w())

0 for every 0.

(4.47)

0 . Assume that gk (w(


> 0 for some k.
First, we show that g(w(
))
))

Then let i = i for all i 6= k and k = k + k where k > 0.


If k is chosen sufficiently small then gk (w())

> 0.
T

But then 0 and () g(w())

= k gk (w())

< 0, which contradicts (4.47).


k gk (w(
T g(w(
= 0 . Assume that
6= 0 for some k.
Next, we show that
))
))

< 0 for this k.


Since k 0 and gk (w(
)) 0, it follows that k > 0 and gk (w(
))
i for all i 6= k and k =
k k where 0 < k <
k .
Then let i =
If k is chosen sufficiently small then gk (w())

< 0.
T g(w())

But then 0 and ()

= k gk (w())

< 0, which contradicts (4.47).


The preparations are now made for proving Theorem 1 from section 4.1.
Before the proof, the theorem is reformulated with the simplified notations.
is an optimal solution of the dual problem (4.24) then
Theorem 1: If
is the unique globally optimal solution of the primal subproblem (4.1).
w(
)
= (),
so that the optimal values of the
Moreover, g0 (w(
))
primal problem and the dual problem are equal.
Proof:
is an optimal solution to the dual problem and let w

Assume that
= w(
).
T

Then, from Lemma 4.3.1, g(w)


0 and g(w)
= 0.
This implies, in particular, that w
is a feasible solution to the primal problem.
Now, assume that w is another feasible solution to the primal problem,
i.e. w W , g(w) 0 and w 6= w.
Then
T g(w)
T g(w) g0 (w),
g0 (w)
= g0 (w)
+
< g0 (w) +

(4.48)

T g(w)
where the equality follows since
= 0, the strict inequality follows since w
is the
unique minimizer of the Lagrange function, and the final inequality follows since g(w) 0
0. The equality in (4.48) also implies that g0 (w)
= L( w(
)
= ().

and
= L( w,
)
),

17

4.4

g optimal dual solutions and sufficiently good primal solutions

We repeat that the dual problem reads


D: maximize () subject to 0.

(4.49)

Since the dual objective function is concave and continuously differentiable, a given point
1, . . . ,
m )T IRm is an optimal solution to this dual problem if and only if
= (

(1)
(2)
(3)

i 0

for all i = 1, . . . , m,

() 0 for all i I0 (),


i

() = 0 for all i I1 (),


i

(4.50)

i = 0 } and I1 ()
i > 0 }.
={i|
={i|
where I0 ()
Then the unique optimal solution (
x, y
, z) to the primal subproblem (4.1) is obtained from

the formulas (4.20) and (4.15), with = .


In the fortran implementation, the optimality conditions (2) and (3) are relaxed slightly.
1, . . . ,
m )T is said to be a g optimal solution to D if
= (
Def: A dual vector
(1)

i 0

for all i = 1, . . . , m,

() g for all i I0 (),


i




(3)
() g for all i I1 (),
i

(2)

(4.51)

The default value of the tolerance parameter g , called GEPS in the fortran code, is 105 .

The corresponding primal solution (


x, y
, z), obtained from (4.20) and (4.15) with = ,
is then considered to be a sufficiently good solution to the primal subproblem (4.1).
In order to analyze how good a sufficiently good primal solution is, the following problem
is considered, where the parameter is a (typically small) real number, positive or negative
or zero.
P() : minimize g0 (w)
subject to gi (w) , i = 1, . . . , m

(4.52)

w W.
Note that if = 0 then the primal subproblem (4.1) is obtained.
The Lagrange function corresponding to P() is
L (w, ) = g0 (w) +

i (gi (w) ) = L0 (w, )

X
i

where L0 (w, ) is the Lagrange function in (4.3).


18

i ,

(4.53)

The corresponding dual objective function is


() = min L (w, ) = min L0 (w, )
wW

wW

i = L0 (w(),

i ,

(4.54)

where w()

is the unique w W which minimizes L0 (w, ) on W (for given 0).


It may be noted that
() = 0 ()

i ,

(4.55)

where 0 () is the dual function in (4.23).


The dual problem D() corresponding to the primal problem P() is
D() : maximize () subject to 0,

(4.56)

which equivalently may be written


D() : maximize 0 ()

i subject to 0.

(4.57)

is an g optimal solution to D(), and let w

Lemma 4.4.1: Assume that


= w(
).
Then w
is a feasible solution to P(+g ), and g0 (w)
g0 (w) for every
feasible solution w to P(g ).

() = gi (w(
))
= gi (w)

.
i
Therefore, it follows from the definition (4.51) that

Proof: First, it follows from (4.31) that

(2)

i = 0,
gi (w)
+g if
(4.58)

i > 0.
(3) g gi (w)
+g if
In particular, this implies that w
is a feasible solution to P(+g ).
Next, assume that w is a feasible solution to P(g ). Then
X
i gi (w)

g0 (w)
= L0 (w,
)

i >0


L0 (w, )

i gi (w)

i >0

= g0 (w) +

i gi (w)

i >0

g0 (w) +

i gi (w)

i >0

i (g )

i >0

= g0 (w),
and the proof is complete.

19

X
i >0

i (g )

(4.59)

Three special cases may be pointed out: = g , = g and = 0.


The corresponding statements read:
Corollary 4.4.1:
is an g optimal solution to D(g ), and let w

Assume that
= w(
).
Then w
is a feasible solution to P(0) and g0 (w)
the optimal value of P(2g ).
is an g optimal solution to D(g ), and let w

Assume that
= w(
).
Then w
is a feasible solution to P(2g ) and g0 (w)
the optimal value of P(0).
is an g optimal solution to D(0), and let w

Assume that
= w(
).
Then w
is a feasible solution to P(g ) and g0 (w)
the optimal value of P(g ).

4.5

An active set method for the dual problem

Consider again the dual problem


D:

maximize () subject to i 0 for i = 1, . . . , m.

(4.60)

In this section, the gradient vector of the dual objective function is denoted g(), i.e.
g() = (g1 (), . . . , gm ())T =

T

() , . . . ,
() .
1
m

(4.61)

Let M = {1, . . . , m} and let A be a given subset of M, i.e. A M.


Def: The equality-constrained subproblem DA corresponding to A is defined as
DA :

maximize () subject to i = 0 for all i A.

(4.62)

1, . . . ,
m )T and the index set A satisfy that
= (
Lemma 4.5.1: Assume that the point
(1)

i = 0 for all i A,

= 0 for all i M \A,


(2) gi ()
(3)

i 0 for all i M \A,

(4.63)

0 for all i A.
(4) gi ()
is an optimal solution to the problem D.
Then
is an optimal solution to the problem D, then there is an
Conversely, If
index set A M such that (1)(4) are satisfied (namely every index set A
i = 0}).
< 0} A { i M |
such that { i M | gi ()
Lemma 4.5.2: The point = (1 , . . . , m )T is an optimal solution to the equality-constrained
subproblem DA if and only if
(1)

i = 0 for all i A,

(2) gi () = 0 for all i M \A.


20

(4.64)

and the index set A satisfy (1)(4) in Lemma 4.5.1.


Lemma 4.5.3: Assume that the point
Assume further that the same index set A and the possibly new point
satisfy (1)(3) in Lemma 4.5.1, i.e.
(1)

i = 0 for all i A,

(2) gi () = 0 for all i M \A,


(3)

(4.65)

i 0 for all i M \A.

Then and A also satisfy (4) in Lemma 4.5.1, i.e.


(4) gi () 0 for all i A,

(4.66)

and is an optimal solution both to DA and to D.


Thus, an optimal solution to the inequality-constrained dual problem D might be obtained
by solving the (typically much easier) equality-constrained subproblem DA , while ensuring
that 0 is satisfied. The difficulty, of course, is that the optimal index set A is not
known beforehand.
The main idea with an active set method is to make a good guess of the optimal index
set A, a guess which is iteratively updated during the solution process.
When optimal solution is replaced by g optimal solution, the corresponding result is
as follows, where (4.67) is essentially a repetition of (4.51).
1, . . . ,
m )T is an g optimal solution to the problem D
= (
Def: A point
if there is an index set A M such that the following conditions hold:
i = 0 for all i A,

g for all i M \A.


|gi ()|
i 0 for all i M \A.

(4.67)

g for all i A.
gi ()
Def: A point = (1 , . . . , m )T is an g optimal solution to the problem DA
if the following conditions hold:
i = 0 for all i A,
|gi ()| g for all i M \A.

(4.68)

i = 0 }.
is an g optimal solution to D, and A = { i M |
Lemma 4.5.4: Assume that
is an g optimal solution also to DA .
Then
Further, each g optimal solution to DA which satisfies 0 and
gi () g for all i A is an g optimal solution to D.

21

4.6

Active set algorithm for finding an g optimal solution

The active set algorithm, used in the fortran implementation for finding an g optimal
solution to the dual problem D, is now described.
At iteration k, the current index set guess is A(k) and the current iteration point is (k) .
It might be noted that if (k+1) 6= (k) then ((k+1) ) > ((k) ).
STEP 0: The index counter is set to k = 1, the first iteration point is set to (1) = 0,
and the first index guess is set to A(1) = { i M | gi ((1) ) g }.
If A(1) = M, then (1) = 0 is an g optimal solution to the dual problem D and the algorithm is stopped! Otherwise, the algorithm continues with STEP 1.
STEP 1: Here, the current index set A(k) M is given, together with the current iteration
(k)
(k)
point (k) IRm which satisfies i = 0 for i A(k) and i 0 for i M \A(k) .
If | gi ((k) )| g for all i M \A(k) , the algorithm continues with STEP 2.
Otherwise, the algorithm continues with STEP 3.
STEP 2: Here, the current iteration point (k) is an g optimal solution to DA(k) .
If gi ((k) ) g for all i A(k) then (k) is an g optimal solution to D and the
algorithm is stopped!
Otherwise, let p A(k) be an index such that gp ((k) ) = maxi { gi ((k) ) | i A(k) } (> g ).
Then the index set, the iteration point, and the iteration counter are updated as follows:
A(k+1) = A(k) \ {p}, (k+1) = (k) , k k + 1,
whereafter the algorithm continues with STEP 1.
STEP 3: Here, the current iteration point (k) is not an g optimal solution to DA(k) .
(k)
Then a search direction d(k) IRm such that di = 0 for i A(k) and g((k) )T d(k) > 0
is calculated. If d(k) 0 the algorithm continues with STEP 4.
( (k)
)
i
(k)
Otherwise, a maximal steplength is calculated through tmax = min
| di < 0 .
(k)
i
di
If g((k) + tmax d(k) )T d(k) < 0 the algorithm continues with STEP 4.
(k)

(k)

Otherwise, let q M \A(k) be an index such that q + tmax dq = 0.


Then the index set, the iteration point, and the iteration counter are updated as follows:
A(k+1) = A(k) {q}, (k+1) = (k) + tmax d(k) , k k + 1,
whereafter the algorithm continues with STEP 1.
STEP 4: Here, a line search from the point (k) in the search direction d(k) is made.
A steplength tk such that g((k) + tk d(k) )T d(k) > 0 g((k) + (1+)tk d(k) )T d(k)
is calculated. (A default value of the parameter is 0.03.)
Then the index set, the iteration point, and the iteration counter are updated as follows:
A(k+1) = A(k) , (k+1) = (k) + tk d(k) , k k + 1,
whereafter the algorithm continues with STEP 1.

22

A small test problem

Consider the following problem in the variables x = (x1 , x2 , x3 )T :


minimize x21 + x22 + x23
subject to (x1 5)2 + (x2 2)2 + (x3 1)2 9,

(5.1)

(x1 3)2 + (x2 4)2 + (x3 3)2 9,


0 xj 5, j = 1, 2, 3.
(1)

(1)

(1)

With c1 = c2 = 1000, a1 = a2 = 0, the starting point (x1 , x2 , x3 ) = (4, 3, 2), and the
(k)
(k)
parameter GEPS = 107 , it turned out that y1 = y2 = z (k) = 0 for all k 2, while x(k) ,
f0 (x(k) ), f1 (x(k) ) and f2 (x(k) ), for k = 1, . . . , 7, are shown in the following two tables:
k

(k)

x1

(k)

x2

(k)

x3

f0 (x(k) )

f1 (x(k) )

f2 (x(k) )

1 4.000000 3.000000 2.000000 29.000000 3.000000 3.000000

MMA:

2 2.390298 1.805719 0.992865

9.959929

6.848340 9.215195

3 2.038452 1.762359 1.241707

8.803031

8.885662 9.023207

4 2.017793 1.778557 1.239183

8.770329

8.999802 9.000017

5 2.017626 1.779369 1.238257

8.770249

9.000001 8.999998

6 2.017554 1.779796 1.237758

8.770246

9.000000 9.000000

7 2.017526 1.779968 1.237558

8.770246

9.000000 9.000000

(k)

x1

(k)

x2

(k)

x3

f0 (x(k) )

f1 (x(k) )

f2 (x(k) )

1 4.000000 3.000000 2.000000 29.000000 3.000000 3.000000


2 2.555037 1.890622 1.076547 11.261620 5.995666 8.347138
GCMMA:

3 2.072173 1.795876 1.191027

8.937619

8.650326 8.991408

4 2.016184 1.791365 1.224353

8.773025

8.997020 8.998887

5 2.016950 1.783479 1.233496

8.770396

8.999988 8.999891

6 2.017408 1.780681 1.236728

8.770255

8.999998 8.999992

7 2.017508 1.780073 1.237436

8.770246

9.000000 9.000000

Note that, on this particular problem, MMA is slightly faster than GCMMA. Also note that
some of the iteration points generated by MMA are slightly infeasible, while all the iteration
points generated by GCMMA are feasible (within the tolerances given by GEPS).

References

[1] K. Svanberg, The method of moving asymptotes a new method for structural optimization, International Journal for Numerical Methods in Engineering, 1987, 24, 359-373.
[2] K. Svanberg, A class of globally convergent optimization methods based on conservative
convex separable approximations, SIAM Journal of Optimization, 2002, 12, 555-573.

23

You might also like