Lecture 6: Optimality Conditions for Nonlinear Programming


Shuzhong Zhang

In this lecture we will discuss a very important topic: how to
characterize and solve a nonlinear program with explicit equality and
inequality constraints. That is,

    minimize    f(x)
    subject to  h_i(x) = 0,    i = 1, ..., m,
                g_j(x) \le 0,  j = 1, ..., r.

To start our discussion we first consider the case where no inequalities
are present.

Let x^* be an optimal point for the problem.

Consider a penalty function

    F_c(x) = f(x) + \frac{c}{2} \sum_{i=1}^m h_i(x)^2 + \frac{\alpha}{2} \|x - x^*\|^2,

where c > 0 is a penalty parameter and \alpha > 0 is a fixed constant.
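
To make the construction concrete, here is a small numerical sketch (an added illustration, not part of the original notes) of the penalty scheme above on a toy instance: minimize x_1^2 + x_2^2 subject to x_1 + x_2 = 1, whose optimum is x^* = (1/2, 1/2) with multiplier \lambda = -1. The choice \alpha = 1 and the use of scipy's BFGS routine are assumptions made for the illustration; as c grows, the minimizer of F_c approaches x^* and c\,h(x^k) approaches \lambda.

    import numpy as np
    from scipy.optimize import minimize

    # Toy instance: minimize x1^2 + x2^2 subject to x1 + x2 - 1 = 0.
    # Optimum: x* = (0.5, 0.5), Lagrangian multiplier lambda = -1.
    x_star = np.array([0.5, 0.5])
    alpha = 1.0                      # illustrative choice; any fixed alpha > 0 works

    def f(x): return x[0]**2 + x[1]**2
    def h(x): return x[0] + x[1] - 1.0

    for c in [1.0, 10.0, 100.0, 1000.0]:
        Fc = lambda x, c=c: f(x) + 0.5*c*h(x)**2 + 0.5*alpha*np.sum((x - x_star)**2)
        xk = minimize(Fc, np.zeros(2), method='BFGS').x
        # c*h(x^k) should approach the multiplier lambda = -1 as c grows
        print(c, xk, c*h(xk))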

Let c_k \to \infty, and let x^k be the corresponding optimal solution of the
unconstrained function F_{c_k}(x).

We see that \lim_{k \to \infty} x^k = x^*.

We assume additionally the following regularity condition, sometimes
termed a constraint qualification:

    rank([\nabla h_1(x), ..., \nabla h_m(x)]) = m

for any x.

Now, by the optimality of x^k we have

    0 = \nabla F_{c_k}(x^k)
      = \nabla f(x^k) + c_k \sum_{i=1}^m h_i(x^k) \nabla h_i(x^k) + \alpha (x^k - x^*).

Let

    H(x) = [\nabla h_1(x), ..., \nabla h_m(x)].

We have

    c_k h(x^k) = -\big( H(x^k)^T H(x^k) \big)^{-1} H(x^k)^T \big( \nabla f(x^k) + \alpha (x^k - x^*) \big),

where h(x) = (h_1(x), ..., h_m(x))^T. Now we take the limit. By letting

    \lambda = \lim_{k \to \infty} c_k h(x^k)

we have

    \lambda = -\big( H(x^*)^T H(x^*) \big)^{-1} H(x^*)^T \nabla f(x^*).

This gives us the relation

    \nabla f(x^*) + \sum_{i=1}^m \lambda_i \nabla h_i(x^*) = 0.

The \lambda_i's are called the Lagrangian multipliers.

Example. We can apply the above optimality condition to solve

    minimize    x_1^2 + x_2^2 + x_3^2
    subject to  x_1 + x_2 + x_3 = 1.
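
As a quick check (an added illustration, not part of the original notes): the condition \nabla f(x^*) + \lambda \nabla h(x^*) = 0 here reads 2x_i + \lambda = 0 for each i, so all coordinates are equal, and the constraint gives x^* = (1/3, 1/3, 1/3) with \lambda = -2/3. A short symbolic sketch confirming this:

    import sympy as sp

    x1, x2, x3, lam = sp.symbols('x1 x2 x3 lam')
    f = x1**2 + x2**2 + x3**2
    h = x1 + x2 + x3 - 1
    L = f + lam*h                                  # Lagrangian L(x, lambda)

    # Stationarity in x together with the equality constraint
    eqs = [sp.diff(L, v) for v in (x1, x2, x3)] + [h]
    print(sp.solve(eqs, (x1, x2, x3, lam)))        # {x1: 1/3, x2: 1/3, x3: 1/3, lam: -2/3}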

Moreover, the second order necessary condition tells us that

    0 \preceq \nabla^2 F_{c_k}(x^k)
      = \nabla^2 f(x^k) + c_k \sum_{i=1}^m \big[ \nabla h_i(x^k) \nabla h_i(x^k)^T + h_i(x^k) \nabla^2 h_i(x^k) \big] + \alpha I.

Let y be such that H(x^*)^T y = 0. Then, by taking the limit (more precisely,
applying the above to vectors y^k with H(x^k)^T y^k = 0 converging to y, so that
the terms c_k (\nabla h_i(x^k)^T y^k)^2 vanish), we have

    y^T \Big( \nabla^2 f(x^*) + \sum_{i=1}^m \lambda_i \nabla^2 h_i(x^*) + \alpha I \Big) y \ge 0.

This is true for any \alpha > 0. Thus,

    y^T \Big( \nabla^2 f(x^*) + \sum_{i=1}^m \lambda_i \nabla^2 h_i(x^*) \Big) y \ge 0

for any y with H(x^*)^T y = 0.

This proves the famous theorem of Lagrange.

Theorem 1. Consider the equality constrained optimization problem

    minimize    f(x)
    subject to  h_i(x) = 0,  i = 1, ..., m.

Suppose that the constraints are regular, i.e.,

    rank([\nabla h_1(x), ..., \nabla h_m(x)]) = m

for any x. Then, if x^* is an optimal solution, there must exist \lambda_i,
i = 1, ..., m, such that

    \nabla f(x^*) + \sum_{i=1}^m \lambda_i \nabla h_i(x^*) = 0

and

    y^T \Big( \nabla^2 f(x^*) + \sum_{i=1}^m \lambda_i \nabla^2 h_i(x^*) \Big) y \ge 0

for all y with H(x^*)^T y = 0.

Conventionally we call the function

    L(x, \lambda) = f(x) + \sum_{i=1}^m \lambda_i h_i(x)

the Lagrangian function.

The Lagrange theorem can be stated as: there is \lambda such that

    \nabla_x L(x^*, \lambda) = 0

and

    y^T \nabla^2_{xx} L(x^*, \lambda) y \ge 0

for all y with H(x^*)^T y = 0.

Just like in the unconstrained case, if, additionally, x^* satisfies

    y^T \Big( \nabla^2 f(x^*) + \sum_{i=1}^m \lambda_i \nabla^2 h_i(x^*) \Big) y > 0

for any y \ne 0 with H(x^*)^T y = 0, then x^* is a local minimum.

In other words, this is a sufficient condition for optimality.

In the form of the Lagrangian function, the sufficient condition is

    y^T \nabla^2_{xx} L(x^*, \lambda) y > 0

for all y \ne 0 with H(x^*)^T y = 0.
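
A practical way to test this sufficient condition numerically (an added sketch, not from the notes) is to take a basis Z of the null space of H(x^*)^T and check that the reduced Hessian Z^T \nabla^2_{xx} L(x^*, \lambda) Z is positive definite. For the example solved earlier (minimize x_1^2 + x_2^2 + x_3^2 subject to x_1 + x_2 + x_3 = 1):

    import numpy as np
    from scipy.linalg import null_space

    # Example data: f(x) = x1^2 + x2^2 + x3^2, h(x) = x1 + x2 + x3 - 1
    lam = -2.0/3.0
    hess_f = 2.0*np.eye(3)                  # Hessian of f
    hess_h = np.zeros((3, 3))               # h is linear, so its Hessian vanishes
    H = np.array([[1.0], [1.0], [1.0]])     # H(x*) = [grad h(x*)]

    Z = null_space(H.T)                     # columns span {y : H(x*)^T y = 0}
    reduced = Z.T @ (hess_f + lam*hess_h) @ Z
    print(np.linalg.eigvalsh(reduced))      # all eigenvalues > 0 => x* is a local minimum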

Lagrangian multipliers are also known as shadow prices.

Consider

    minimize    f(x)
    subject to  h_i(x) = y_i,  i = 1, ..., m.

Suppose the problem has an optimal solution with value c(y), and let the
corresponding Lagrangian multiplier be \lambda(y). Then,

    \nabla c(y) = -\lambda(y),

i.e.,

    c(y + \Delta y) - c(y) \approx -\sum_{i=1}^m \lambda_i(y) \Delta y_i.
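
A small numerical illustration of this shadow-price interpretation (an added sketch with assumed data, not from the notes): for minimize x_1^2 + x_2^2 subject to x_1 + x_2 = y one has c(y) = y^2/2 and \lambda(y) = -y, so the slope of c at y = 1 should equal -\lambda(1) = 1.

    import numpy as np
    from scipy.optimize import minimize

    def value(y):
        # c(y) = min x1^2 + x2^2  subject to  x1 + x2 = y
        cons = [{'type': 'eq', 'fun': lambda x: x[0] + x[1] - y}]
        return minimize(lambda x: x[0]**2 + x[1]**2, np.zeros(2),
                        method='SLSQP', constraints=cons).fun

    y, dy = 1.0, 1e-4
    lam = -y                                   # multiplier lambda(y), computed by hand
    print((value(y + dy) - value(y)) / dy)     # finite-difference slope of c, approx. 1
    print(-lam)                                # -lambda(y) = 1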

How about the problem with both equality and inequality constraints?

    minimize    f(x)
    subject to  h_i(x) = 0,    i = 1, ..., m,
                g_j(x) \le 0,  j = 1, ..., r.

First, let

    A(x) = \{ j \mid g_j(x) = 0 \}

be the set of active constraints at the point x.

Assume that

    \nabla h_i(x), \ i = 1, ..., m, \quad \mbox{and} \quad \nabla g_j(x), \ j \in A(x),

are always linearly independent. (This is a regularity condition, or
constraint qualification.)

We may convert each inequality constraint into an equality constraint,
resulting in the equivalent problem

    minimize    f(x)
    subject to  h_i(x) = 0,          i = 1, ..., m,
                g_j(x) + z_j^2 = 0,  j = 1, ..., r.

Let the Lagrangian multipliers be \lambda (for the first m constraints) and \mu (for
the next r constraints).

The Lagrangian function is

    L(x, z, \lambda, \mu) = f(x) + \sum_{i=1}^m \lambda_i h_i(x) + \sum_{j=1}^r \mu_j \big( g_j(x) + z_j^2 \big).

Moreover,

    \nabla_{x,z} L(x, z, \lambda, \mu) =
      \begin{pmatrix}
        \nabla f(x) + \sum_{i=1}^m \lambda_i \nabla h_i(x) + \sum_{j=1}^r \mu_j \nabla g_j(x) \\
        2 \mu_1 z_1 \\
        \vdots \\
        2 \mu_r z_r
      \end{pmatrix}.

Its Hessian matrix is

    \nabla^2_{x,z} L(x, z, \lambda, \mu) =
      \begin{pmatrix}
        \nabla^2 f(x) + \sum_{i=1}^m \lambda_i \nabla^2 h_i(x) + \sum_{j=1}^r \mu_j \nabla^2 g_j(x) & 0 \\
        0 & 2\,\mathrm{diag}(\mu)
      \end{pmatrix},

where \mathrm{diag}(\mu) is the diagonal matrix with \mu on its diagonal. Hence,

    ( y^T \ w^T ) \, \nabla^2_{x,z} L(x, z, \lambda, \mu) \begin{pmatrix} y \\ w \end{pmatrix}
      = y^T \Big( \nabla^2 f(x) + \sum_{i=1}^m \lambda_i \nabla^2 h_i(x) + \sum_{j=1}^r \mu_j \nabla^2 g_j(x) \Big) y
        + 2 \sum_{j=1}^r \mu_j w_j^2.

The first order optimality condition gives

    \nabla f(x^*) + \sum_{i=1}^m \lambda_i \nabla h_i(x^*) + \sum_{j=1}^r \mu_j \nabla g_j(x^*) = 0,

    2 \mu_j z_j^* = 0,  j = 1, ..., r.

This implies that \mu_j = 0 whenever z_j^* \ne 0, i.e. whenever j \notin A(x^*).

The second order optimality condition states that

    ( y^T \ w^T ) \, \nabla^2_{x,z} L(x, z, \lambda, \mu) \begin{pmatrix} y \\ w \end{pmatrix} \ge 0

for all (y, w) such that

    \big( \nabla_{x,z} h_i(x, z) \big)^T \begin{pmatrix} y \\ w \end{pmatrix} = 0,  i = 1, ..., m,

and

    \big( \nabla_{x,z} g_j(x, z) \big)^T \begin{pmatrix} y \\ w \end{pmatrix} = 0,  j = 1, ..., r,

where h_i and g_j are viewed as functions of (x, z), with g_j(x, z) = g_j(x) + z_j^2.

Now,

    \nabla_{x,z} h_i(x, z) = \begin{pmatrix} \nabla h_i(x) \\ 0 \end{pmatrix}
    \quad \mbox{and} \quad
    \nabla_{x,z} g_j(x, z) = \begin{pmatrix} \nabla g_j(x) \\ 2 z_j e_j \end{pmatrix},

where e_j denotes the j-th unit vector in the z-space. Therefore,

    \big( \nabla_{x,z} g_j(x, z) \big)^T \begin{pmatrix} y \\ w \end{pmatrix} = \nabla g_j(x)^T y + 2 z_j w_j.

Hence, the second order optimality condition is

    y^T \Big( \nabla^2 f(x^*) + \sum_{i=1}^m \lambda_i \nabla^2 h_i(x^*) + \sum_{j \in A(x^*)} \mu_j \nabla^2 g_j(x^*) \Big) y \ge 0

for all y such that \nabla h_i(x^*)^T y = 0, i = 1, ..., m, and
\nabla g_j(x^*)^T y = 0, j \in A(x^*).

Now we are in a position to state the famous theorem of Karush, Kuhn
and Tucker.

Theorem 2. Consider

    minimize    f(x)
    subject to  h_i(x) = 0,    i = 1, ..., m,
                g_j(x) \le 0,  j = 1, ..., r.

Suppose that the constraints are regular, i.e.,

    \nabla h_1(x), ..., \nabla h_m(x), \ \nabla g_j(x), \ j \in A(x),

are linearly independent for any given x. Then, if x^* is an optimal
solution, there must exist \lambda_i, i = 1, ..., m, and \mu_j \ge 0, j = 1, ..., r, such that

    \nabla_x L(x^*, \lambda, \mu) = 0
    \quad \mbox{and} \quad
    y^T \nabla^2_{xx} L(x^*, \lambda, \mu) y \ge 0

for all y such that \nabla h_i(x^*)^T y = 0, i = 1, ..., m, and \nabla g_j(x^*)^T y = 0,
j \in A(x^*), where L(x, \lambda, \mu) = f(x) + \sum_{i=1}^m \lambda_i h_i(x) + \sum_{j=1}^r \mu_j g_j(x).
Moreover, \mu_j = 0 for all j \notin A(x^*).

The condition becomes sufficient if

    y^T \Big( \nabla^2 f(x^*) + \sum_{i=1}^m \lambda_i \nabla^2 h_i(x^*) + \sum_{j \in A(x^*)} \mu_j \nabla^2 g_j(x^*) \Big) y > 0

for all y \ne 0 with \nabla h_i(x^*)^T y = 0, i = 1, ..., m, and \nabla g_j(x^*)^T y = 0,
j \in A(x^*), and moreover

    \mu_j > 0  for all j \in A(x^*).

The last condition is called strict complementarity.

The usual complementarity refers to the relationship

    \mu_j \ge 0, \quad g_j(x^*) \le 0, \quad \mu_j g_j(x^*) = 0

for all j = 1, ..., r.

The KKT condition also becomes sufficient when the problem is convex.
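
For a concrete convex instance, the following sketch (an added illustration with assumed data, not from the notes) checks the KKT relations at a candidate point of minimize x_1^2 + x_2^2 subject to x_1 + x_2 - 1 = 0 and -x_1 \le 0. At x^* = (1/2, 1/2) the inequality is inactive, so \mu = 0 and \lambda = -1:

    import numpy as np

    x = np.array([0.5, 0.5]); lam = -1.0; mu = 0.0

    grad_f = 2*x
    grad_h = np.array([1.0, 1.0])              # gradient of h(x) = x1 + x2 - 1
    grad_g = np.array([-1.0, 0.0])             # gradient of g(x) = -x1
    h = x[0] + x[1] - 1.0
    g = -x[0]

    stationarity = grad_f + lam*grad_h + mu*grad_g
    print(np.allclose(stationarity, 0))                          # stationarity holds
    print(abs(h) < 1e-12, g <= 0, mu >= 0, abs(mu*g) < 1e-12)    # feasibility + complementarity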

In the statement of the KKT theorem, we need to assume a regularity
condition.

Without any regularity condition, Lagrangian multipliers may not exist.
Consider for example:

    minimize    x_1 + x_2
    subject to  (x_1 - 1)^2 + x_2^2 - 1 \le 0,
                -(x_1 - 2)^2 - x_2^2 + 4 \le 0.
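
To see what goes wrong here (an added explanation and sketch; the minus signs in the two constraints above are restored from context): the first constraint confines x to the disk of radius 1 centered at (1, 0), the second to the exterior of the disk of radius 2 centered at (2, 0), and these touch only at the origin, so x^* = (0, 0)^T is the unique feasible point. Both constraints are active there with parallel gradients, so regularity fails, and no \mu \ge 0 can satisfy \nabla f(x^*) + \mu_1 \nabla g_1(x^*) + \mu_2 \nabla g_2(x^*) = 0, since the second component of \nabla f is 1 while both constraint gradients have zero second component.

    import numpy as np

    x = np.array([0.0, 0.0])                      # the unique feasible point
    grad_f = np.array([1.0, 1.0])
    grad_g1 = np.array([2*(x[0] - 1), 2*x[1]])    # gradient of (x1-1)^2 + x2^2 - 1
    grad_g2 = np.array([-2*(x[0] - 2), -2*x[1]])  # gradient of -(x1-2)^2 - x2^2 + 4

    print(grad_g1, grad_g2)                       # (-2, 0) and (4, 0): linearly dependent
    print(np.linalg.matrix_rank(np.vstack([grad_g1, grad_g2])))   # rank 1 < 2
    # The second component of the stationarity equation reads 1 + 0 + 0 = 0: impossible.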

However, the following condition always holds: there are \lambda_i,
i = 1, ..., m, and \mu_j \ge 0, j = 0, 1, ..., r, such that

    \mu_0 \nabla f(x^*) + \sum_{i=1}^m \lambda_i \nabla h_i(x^*) + \sum_{j=1}^r \mu_j \nabla g_j(x^*) = 0,

where \mu_j g_j(x^*) = 0 for j = 1, ..., r, and not all of these multipliers are
zero.

This condition is known as the Fritz John condition.

In solving an optimization problem, one is often interested in the case
where \mu_0 > 0.

There are several alternative conditions to guarantee the regularity of
the constraints, thus ensuring the existence of Lagrangian multipliers.
Those are called constraint qualifications.

A famous constraint qualification is known as the Mangasarian-Fromovitz
constraint qualification:

The vectors \nabla h_i(x^*) (i = 1, ..., m) are linearly independent, and there
exists d such that

    \nabla h_i(x^*)^T d = 0,  i = 1, ..., m,

and

    \nabla g_j(x^*)^T d < 0  for all j \in A(x^*).

It is easy to see that with this condition, one must have \mu_0 > 0.
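
The Mangasarian-Fromovitz condition at a given point can be checked by a small linear program (an added sketch with assumed data, not from the notes): search for a bounded direction d with \nabla h_i(x^*)^T d = 0 for all i and \nabla g_j(x^*)^T d \le t for the active j, and minimize t; MFCQ holds exactly when the optimal t is negative. Below, x^* = (0, 1/2, 1/2) with h(x) = x_1 + x_2 + x_3 - 1 and the active inequality g_1(x) = -x_1:

    import numpy as np
    from scipy.optimize import linprog

    grad_h = np.array([[1.0, 1.0, 1.0]])      # rows: gradients of equality constraints
    grad_gA = np.array([[-1.0, 0.0, 0.0]])    # rows: gradients of active inequalities

    n = 3                                     # variables are (d, t); minimize t
    c = np.zeros(n + 1); c[-1] = 1.0
    A_ub = np.hstack([grad_gA, -np.ones((grad_gA.shape[0], 1))])   # grad_gA d - t <= 0
    A_eq = np.hstack([grad_h, np.zeros((grad_h.shape[0], 1))])     # grad_h d = 0
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(len(A_ub)),
                  A_eq=A_eq, b_eq=np.zeros(len(A_eq)),
                  bounds=[(-1, 1)]*(n + 1), method='highs')
    print(res.x[:n], res.fun)                 # res.fun < 0  =>  an MFCQ direction d exists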

The Mangasarian-Fromovitz constraint qualification may still be
difficult to verify.

There is a very well known and simply verifiable constraint qualification
in case all the h_i's are linear and the g_j's are convex. This is known as the
Slater constraint qualification:

There is \bar{x} such that

    h_i(\bar{x}) = 0  for all i = 1, ..., m,

and

    g_j(\bar{x}) < 0  for all j = 1, ..., r.

To prove that Slater's condition implies the Mangasarian-Fromovitz
condition, we need only use the subgradient inequality: for j \in A(x^*),

    0 > g_j(\bar{x}) \ge g_j(x^*) + \nabla g_j(x^*)^T (\bar{x} - x^*) = \nabla g_j(x^*)^T (\bar{x} - x^*).

Therefore, if we let d = \bar{x} - x^*, then

    \nabla g_j(x^*)^T d < 0

for j \in A(x^*), and

    \nabla h_i(x^*)^T d = 0

due to the linearity of h_i, i = 1, ..., m.

Now we wish to establish a link to geometry.

For a set X and a given x \in X, we call y a tangent direction if there is a
sequence x^k \in X with x^k \ne x and x^k \to x such that

    (x^k - x)/\|x^k - x\| \to y/\|y\|.

The collection of all tangent directions at x is called the tangent cone,
denoted by T(x).

Recall Proposition 1 in Lecture 5: for a constrained optimization problem

    (P)  minimize    f(x)
         subject to  x \in X,

if x^* \in X is a local optimal point for (P), then

    \nabla f(x^*)^T (x - x^*) \ge 0

for all x \in X.

Quite clearly, this proposition suggests that at an optimal point x^* we
must have

    \nabla f(x^*)^T y \ge 0

for all y \in T(x^*).

What is T(x^*) in the case of equality and inequality constraints?

In a normal situation (the quasi-regular case) we have

    T(x^*) = \big\{ y \mid \nabla h_i(x^*)^T y = 0, \ i = 1, ..., m; \ \nabla g_j(x^*)^T y \le 0 \mbox{ for } j \in A(x^*) \big\}.

Note the following well known Farkas Lemma:

    c^T y \ge 0 for all y with By = 0 and Cy \le 0

if and only if there exist \lambda and \mu \ge 0 such that

    c = B^T \lambda - C^T \mu.
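
The "easy" direction of this version of the Farkas Lemma can be illustrated directly (an added sketch with assumed data, not from the notes): if c = B^T \lambda - C^T \mu with \mu \ge 0, then for every y with By = 0 and Cy \le 0 we get c^T y = \lambda^T B y - \mu^T C y \ge 0.

    import numpy as np

    B = np.array([[1.0, 1.0, 0.0]])           # By = 0   means  y1 + y2 = 0
    C = np.array([[0.0, 0.0, 1.0]])           # Cy <= 0  means  y3 <= 0
    lam = np.array([2.0]); mu = np.array([3.0])
    c = B.T @ lam - C.T @ mu                  # c = (2, 2, -3)

    rng = np.random.default_rng(0)
    for _ in range(1000):
        u = rng.standard_normal(2)
        y = np.array([u[0], -u[0], -abs(u[1])])   # a direction with By = 0, Cy <= 0
        assert c @ y >= -1e-12                    # c^T y = -3*y3 >= 0
    print("verified on 1000 sampled directions")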

Using the Farkas Lemma, we can conclude that if x^* is a quasi-regular
point and a local minimum, then there must exist \lambda and \mu_j \ge 0 for
j \in A(x^*) such that

    0 = \nabla f(x^*) + \sum_{i=1}^m \lambda_i \nabla h_i(x^*) + \sum_{j \in A(x^*)} \mu_j \nabla g_j(x^*).

By supplementing \mu_j = 0 for all j \notin A(x^*) we then have

    0 = \nabla f(x^*) + \sum_{i=1}^m \lambda_i \nabla h_i(x^*) + \sum_{j=1}^r \mu_j \nabla g_j(x^*)

with \mu_j g_j(x^*) = 0 for all j = 1, ..., r.

This is exactly the optimality condition of Karush, Kuhn and Tucker.