Lecture 6: Optimality Conditions For Nonlinear Programming
Consider the penalized function

F_k(x) = f(x) + \frac{\alpha_k}{2} \sum_{i=1}^m h_i(x)^2 + \frac{\epsilon}{2} \| x - x^* \|^2,

where x^* is an optimal solution of the equality constrained problem, \alpha_k > 0 is a penalty parameter, and \epsilon > 0 is fixed.
Let \alpha_k \to \infty, and let x^k be the corresponding optimal solution of the unconstrained function F_k(x).

We see that \lim_{k \to \infty} x^k = x^*: as \alpha_k grows, the penalty forces h_i(x^k) \to 0, while the proximal term \frac{\epsilon}{2} \| x - x^* \|^2 pulls x^k toward x^*.
We assume additionally the following regularity condition (sometimes termed a constraint qualification):

rank([\nabla h_1(x), ..., \nabla h_m(x)]) = m

for any x.
Now, by the optimality of x^k we have

0 = \nabla F_k(x^k) = \nabla f(x^k) + \alpha_k \sum_{i=1}^m h_i(x^k) \nabla h_i(x^k) + \epsilon (x^k - x^*).
Let

H(x) = [\nabla h_1(x), ..., \nabla h_m(x)].
Writing h(x) = (h_1(x), ..., h_m(x))^T and multiplying the stationarity equation on the left by (H(x^k)^T H(x^k))^{-1} H(x^k)^T (the inverse exists by the regularity condition), we have

\alpha_k h(x^k) = -(H(x^k)^T H(x^k))^{-1} H(x^k)^T (\nabla f(x^k) + \epsilon (x^k - x^*)).
Now we take the limit. Letting

\lambda^* = \lim_{k \to \infty} \alpha_k h(x^k),

we have (the term \epsilon (x^k - x^*) vanishes since x^k \to x^*)

\lambda^* = -(H(x^*)^T H(x^*))^{-1} H(x^*)^T \nabla f(x^*).
This gives us the relation

\nabla f(x^*) + \sum_{i=1}^m \lambda_i^* \nabla h_i(x^*) = 0.

The \lambda_i^*'s are called the Lagrangian multipliers.
Example. We can apply the above optimality condition to solve

minimize x_1^2 + x_2^2 + x_3^2
subject to x_1 + x_2 + x_3 = 1.
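A short computation, added here for concreteness: with h(x) = x_1 + x_2 + x_3 - 1, the condition \nabla f(x^*) + \lambda^* \nabla h(x^*) = 0 reads

2 x_i^* + \lambda^* = 0, i = 1, 2, 3,

so x_1^* = x_2^* = x_3^* = -\lambda^*/2. Feasibility gives -3\lambda^*/2 = 1, hence \lambda^* = -2/3 and x^* = (1/3, 1/3, 1/3)^T, with optimal value 1/3.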
Moreover, the second order necessary condition tells us that

0 \preceq \nabla^2 F_k(x^k)
= \nabla^2 f(x^k) + \alpha_k \sum_{i=1}^m [ \nabla h_i(x^k) \nabla h_i(x^k)^T + h_i(x^k) \nabla^2 h_i(x^k) ] + \epsilon I.
Let y be such that H(x^*)^T y = 0. Then, by taking the limit, we have

y^T [ \nabla^2 f(x^*) + \sum_{i=1}^m \lambda_i^* \nabla^2 h_i(x^*) + \epsilon I ] y \geq 0.
This is true for any \epsilon > 0. Thus,

y^T [ \nabla^2 f(x^*) + \sum_{i=1}^m \lambda_i^* \nabla^2 h_i(x^*) ] y \geq 0

for any y with H(x^*)^T y = 0.
This proves the famous theorem of Lagrange.
Theorem 1 Consider the equality constrained optimization problem

minimize f(x)
subject to h_i(x) = 0, i = 1, ..., m.

Suppose that the constraints are regular, i.e.,

rank([\nabla h_1(x), ..., \nabla h_m(x)]) = m

for any x. Then, if x^* is an optimal solution, there must exist \lambda_i^*, i = 1, ..., m, such that

\nabla f(x^*) + \sum_{i=1}^m \lambda_i^* \nabla h_i(x^*) = 0
and

y^T [ \nabla^2 f(x^*) + \sum_{i=1}^m \lambda_i^* \nabla^2 h_i(x^*) ] y \geq 0

for all y with H(x^*)^T y = 0.
Conventionally we call the function

L(x, \lambda) = f(x) + \sum_{i=1}^m \lambda_i h_i(x)

the Lagrangian function.
The Lagrange theorem can be stated as: there is \lambda^* such that

\nabla_x L(x^*, \lambda^*) = 0

and

y^T \nabla^2_{xx} L(x^*, \lambda^*) y \geq 0

for all y with H(x^*)^T y = 0.
Just like in the unconstrained case, if, additionally, x^* satisfies

y^T [ \nabla^2 f(x^*) + \sum_{i=1}^m \lambda_i^* \nabla^2 h_i(x^*) ] y > 0

for any y \neq 0 with H(x^*)^T y = 0, then x^* is a local minimum.

In other words, this is a sufficient condition for optimality.
In the form of the Lagrangian function, the sufficient condition is

y^T \nabla^2_{xx} L(x^*, \lambda^*) y > 0

for all y \neq 0 with H(x^*)^T y = 0.
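Continuing the example above: L(x, \lambda) = x_1^2 + x_2^2 + x_3^2 + \lambda (x_1 + x_2 + x_3 - 1), so \nabla^2_{xx} L(x^*, \lambda^*) = 2I, which is positive definite. The sufficient condition therefore holds for every y \neq 0, confirming that x^* = (1/3, 1/3, 1/3)^T is a local (indeed, by convexity, global) minimum.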
Lagrangian multipliers are also known as shadow prices. Consider

minimize f(x)
subject to h_i(x) = y_i, i = 1, ..., m.

Suppose the problem has an optimal solution with value c(y), and let \lambda(y) be the corresponding Lagrangian multiplier. Then,

\nabla c(y) = -\lambda(y),

i.e.,

c(y + \Delta y) - c(y) \approx -\sum_{i=1}^m \lambda_i(y) \Delta y_i.
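To illustrate with the earlier quadratic example (a check added here, under the sign convention above): for

minimize x_1^2 + x_2^2 + x_3^2 subject to x_1 + x_2 + x_3 = y,

the same computation as before gives x_i^*(y) = y/3 and \lambda(y) = -2y/3, while c(y) = y^2/3. Hence \nabla c(y) = 2y/3 = -\lambda(y), as claimed.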
How about the problem with both equality and inequality constraints?

minimize f(x)
subject to h_i(x) = 0, i = 1, ..., m,
g_j(x) \leq 0, j = 1, ..., r.

First, let

A(x) = \{ j \mid g_j(x) = 0 \}

be the set of active constraints at the point x.

Assume that

[\nabla h_i(x), \nabla g_j(x) \mid i = 1, ..., m; j \in A(x)]

are always linearly independent. (This is a regularity condition, or constraint qualification.)
We may convert each inequality constraint into an equality constraint, resulting in the equivalent problem

minimize f(x)
subject to h_i(x) = 0, i = 1, ..., m,
g_j(x) + z_j^2 = 0, j = 1, ..., r.

Let the Lagrangian multipliers be \lambda (for the first m constraints) and \mu (for the next r constraints).
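For instance (a hypothetical constraint, not from the slides), the inequality x_1 + x_2 - 1 \leq 0 becomes the equality x_1 + x_2 - 1 + z^2 = 0 in the variables (x, z); the new variable z is unrestricted in sign, and any x feasible for the original constraint corresponds to z = \pm\sqrt{1 - x_1 - x_2}.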
The Lagrangian function is

L(x, z, \lambda, \mu) = f(x) + \sum_{i=1}^m \lambda_i h_i(x) + \sum_{j=1}^r \mu_j (g_j(x) + z_j^2).

Moreover,

\nabla L(x, z, \lambda, \mu) =
\begin{pmatrix}
\nabla f(x) + \sum_{i=1}^m \lambda_i \nabla h_i(x) + \sum_{j=1}^r \mu_j \nabla g_j(x) \\
2 \mu_1 z_1 \\
\vdots \\
2 \mu_r z_r
\end{pmatrix}.
Its Hessian matrix is

\nabla^2_{x,z} L(x, z, \lambda, \mu) =
\begin{pmatrix}
\nabla^2 f(x) + \sum_{i=1}^m \lambda_i \nabla^2 h_i(x) + \sum_{j=1}^r \mu_j \nabla^2 g_j(x) & 0 \\
0 & 2 \, diag(\mu)
\end{pmatrix},

where diag(\mu) is the diagonal matrix with \mu as its diagonal entries. Hence,

(y^T, w^T) \nabla^2_{x,z} L(x, z, \lambda, \mu) \begin{pmatrix} y \\ w \end{pmatrix}
= y^T [ \nabla^2 f(x) + \sum_{i=1}^m \lambda_i \nabla^2 h_i(x) + \sum_{j=1}^r \mu_j \nabla^2 g_j(x) ] y + 2 \sum_{j=1}^r \mu_j w_j^2.
The first order optimality condition gives

\nabla f(x^*) + \sum_{i=1}^m \lambda_i^* \nabla h_i(x^*) + \sum_{j=1}^r \mu_j^* \nabla g_j(x^*) = 0,
2 \mu_j^* z_j^* = 0, j = 1, ..., r.

This implies that \mu_j^* = 0 whenever z_j^* \neq 0; since (z_j^*)^2 = -g_j(x^*), this is exactly the case j \notin A(x^*).
The second order optimality condition states that

(y^T, w^T) \nabla^2_{x,z} L(x, z, \lambda, \mu) \begin{pmatrix} y \\ w \end{pmatrix} \geq 0

for all (y, w) such that

(\nabla_{x,z} h_i(x, z))^T \begin{pmatrix} y \\ w \end{pmatrix} = 0, i = 1, ..., m,

and

(\nabla_{x,z} g_j(x, z))^T \begin{pmatrix} y \\ w \end{pmatrix} = 0, j = 1, ..., r,

where, with a slight abuse of notation, g_j(x, z) stands for g_j(x) + z_j^2.
Now,

\nabla_{x,z} h_i(x, z) = \begin{pmatrix} \nabla h_i(x) \\ 0 \end{pmatrix}

and

\nabla_{x,z} g_j(x, z) = \begin{pmatrix} \nabla g_j(x) \\ 2 z_j e_j \end{pmatrix},

where e_j denotes the j-th unit vector in R^r. Therefore,

(\nabla_{x,z} g_j(x, z))^T \begin{pmatrix} y \\ w \end{pmatrix} = (\nabla g_j(x))^T y + 2 z_j w_j.
Hence, the second order optimality condition is

y^T [ \nabla^2 f(x^*) + \sum_{i=1}^m \lambda_i^* \nabla^2 h_i(x^*) + \sum_{j \in A(x^*)} \mu_j^* \nabla^2 g_j(x^*) ] y \geq 0

for all y such that \nabla h_i(x^*)^T y = 0, i = 1, ..., m, and \nabla g_j(x^*)^T y = 0, j \in A(x^*). (For j \in A(x^*) we have z_j^* = 0, so the constraint on (y, w) reduces to \nabla g_j(x^*)^T y = 0; for j \notin A(x^*), \mu_j^* = 0 and the w-terms drop out.)
Now we are in a position to state the famous theorem of Karush, Kuhn and Tucker.

Theorem 2 Consider

minimize f(x)
subject to h_i(x) = 0, i = 1, ..., m,
g_j(x) \leq 0, j = 1, ..., r.

Suppose that the constraints are regular, i.e.,

\nabla h_1(x), ..., \nabla h_m(x), \nabla g_j(x), j \in A(x),

are linearly independent for any given x. Then, if x^* is an optimal solution, there must exist \lambda_i^*, i = 1, ..., m, and \mu_j^* \geq 0, j = 1, ..., r, such that

\nabla_x L(x^*, \lambda^*, \mu^*) = 0 and y^T \nabla^2_{xx} L(x^*, \lambda^*, \mu^*) y \geq 0

for all y such that \nabla h_i(x^*)^T y = 0, i = 1, ..., m, and \nabla g_j(x^*)^T y = 0, j \in A(x^*). Moreover, \mu_j^* = 0 for all j \notin A(x^*).
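As a numerical illustration, the following minimal sketch (added here; not part of the lecture) solves a small made-up instance with NumPy/SciPy and recovers the KKT multiplier from the stationarity condition; the instance and the starting point are assumptions chosen for illustration only.

import numpy as np
from scipy.optimize import minimize

# Hypothetical instance: minimize x1^2 + x2^2 subject to g(x) = 1 - x1 - x2 <= 0.
f = lambda x: x[0] ** 2 + x[1] ** 2
grad_f = lambda x: np.array([2.0 * x[0], 2.0 * x[1]])
grad_g = np.array([-1.0, -1.0])  # constant gradient of g(x) = 1 - x1 - x2

# SciPy's "ineq" convention is fun(x) >= 0, i.e. x1 + x2 - 1 >= 0 here.
res = minimize(f, x0=[5.0, -3.0], method="SLSQP",
               constraints=[{"type": "ineq",
                             "fun": lambda x: x[0] + x[1] - 1.0}])
x_star = res.x  # expect approximately (0.5, 0.5); the constraint is active

# Recover mu >= 0 by solving the stationarity equation
#   grad_f(x*) + mu * grad_g(x*) = 0  in the least squares sense.
mu = np.linalg.lstsq(grad_g.reshape(-1, 1), -grad_f(x_star), rcond=None)[0]
print(x_star, mu)                    # approx [0.5 0.5] and [1.0]
print(grad_f(x_star) + mu * grad_g)  # approx [0. 0.]  (stationarity holds)

Since \nabla^2_{xx} L = 2I \succ 0 in this instance, the second order condition of Theorem 2 holds as well.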
The condition becomes sufficient if

y^T [ \nabla^2 f(x^*) + \sum_{i=1}^m \lambda_i^* \nabla^2 h_i(x^*) + \sum_{j \in A(x^*)} \mu_j^* \nabla^2 g_j(x^*) ] y > 0

for all y \neq 0 with \nabla h_i(x^*)^T y = 0, i = 1, ..., m, and \nabla g_j(x^*)^T y = 0, j \in A(x^*), and moreover,

\mu_j^* > 0, for j \in A(x^*).

The last condition is called strict complementarity.
The usual complementarity refers to the relationship:

\mu_j^* \geq 0, g_j(x^*) \leq 0, and \mu_j^* g_j(x^*) = 0

for all j = 1, ..., r.

The KKT condition also becomes sufficient when the problem is convex.
In the statement of the KKT theorem, we need to assume a regularity condition. Without any regularity condition, Lagrangian multipliers may not exist. Consider for example:

minimize x_1 + x_2
subject to (x_1 - 1)^2 + x_2^2 - 1 \leq 0,
-(x_1 - 2)^2 - x_2^2 + 4 \leq 0.
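A short calculation, added here, shows what goes wrong (assuming the signs reconstructed above): the first constraint describes the disk of radius 1 centered at (1, 0), the second the complement of the disk of radius 2 centered at (2, 0), so the feasible set is the single tangency point x^* = (0, 0)^T. There,

\nabla f(x^*) = (1, 1)^T, \nabla g_1(x^*) = (-2, 0)^T, \nabla g_2(x^*) = (4, 0)^T:

the active gradients are linearly dependent (regularity fails), and no \mu_1, \mu_2 \geq 0 can satisfy \nabla f(x^*) + \mu_1 \nabla g_1(x^*) + \mu_2 \nabla g_2(x^*) = 0, since the second component of that equation reads 1 = 0.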
However, the following condition always holds at an optimal solution x^*: there are \lambda_i^*, i = 1, ..., m, and \mu_j^* \geq 0, j = 0, 1, ..., r, not all zero, such that

\mu_0^* \nabla f(x^*) + \sum_{i=1}^m \lambda_i^* \nabla h_i(x^*) + \sum_{j=1}^r \mu_j^* \nabla g_j(x^*) = 0,

where \mu_j^* g_j(x^*) = 0 for all j = 1, ..., r.

The Mangasarian-Fromovitz constraint qualification requires that \nabla h_1(x^*), ..., \nabla h_m(x^*) be linearly independent and that there exist a direction d with

(\nabla h_i(x^*))^T d = 0, i = 1, ..., m,

and

(\nabla g_j(x^*))^T d < 0, for all j \in A(x^*).

It is easy to see that with this condition, one must have \mu_0^* > 0.
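A brief justification of the last claim (an added sketch): if \mu_0^* = 0, taking the inner product of the displayed equation with d gives \sum_{j \in A(x^*)} \mu_j^* (\nabla g_j(x^*))^T d = 0, since (\nabla h_i(x^*))^T d = 0 and \mu_j^* = 0 for inactive j by complementarity. Each summand is nonpositive with (\nabla g_j(x^*))^T d < 0, so \mu_j^* = 0 for all j \in A(x^*); then \sum_{i=1}^m \lambda_i^* \nabla h_i(x^*) = 0, and linear independence forces all \lambda_i^* = 0, contradicting that not all multipliers vanish. Dividing the equation by \mu_0^* > 0 then recovers the KKT multipliers.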
The Mangasarian-Fromovitz constraint qualification may still be difficult to verify.

There is a very well known and easily verifiable constraint qualification for the case where all the h_i's are linear and the g_j's are convex. This is known as the Slater constraint qualification:

There is \bar{x} such that

h_i(\bar{x}) = 0, for all i = 1, ..., m,

and

g_j(\bar{x}) < 0, for all j = 1, ..., r.
To prove that Slater's condition implies the Mangasarian-Fromovitz condition, we need only use the subgradient inequality:

0 > g_j(\bar{x}) \geq g_j(x^*) + (\nabla g_j(x^*))^T (\bar{x} - x^*) = (\nabla g_j(x^*))^T (\bar{x} - x^*)

for j \in A(x^*). Therefore, if we let d = \bar{x} - x^*, then

(\nabla g_j(x^*))^T d < 0

for j \in A(x^*), and

\nabla h_i(x^*)^T d = 0

due to the linearity of h_i, i = 1, ..., m.
Now we wish to establish a link to geometry.

For a set X and a given x \in X, we call y a tangent direction if there is a sequence x^k \in X with x^k \neq x and x^k \to x such that

(x^k - x)/\|x^k - x\| \to y/\|y\|.

The collection of all tangent directions at x is called the tangent cone, denoted by T(x).
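For intuition (an added example): for the unit disk X = \{ x \in R^2 \mid x_1^2 + x_2^2 \leq 1 \} at the boundary point x = (1, 0)^T, the tangent cone is the half-space T(x) = \{ y \mid y_1 \leq 0 \}, while at any interior point of X we have T(x) = R^2.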
Recall Proposition 1 in Lecture 5. For a constrained optimization problem

(P) minimize f(x)
subject to x \in X,

if x^* is an optimal solution, then

(\nabla f(x^*))^T (x - x^*) \geq 0

for all x \in X.
Quite clearly, this proposition suggests that at an optimal point x^* we must have

(\nabla f(x^*))^T y \geq 0

for all y \in T(x^*).

What is T(x^*)? The point x^* is called quasi-regular if

T(x^*) = \{ y \mid (\nabla h_i(x^*))^T y = 0, i = 1, ..., m, (\nabla g_j(x^*))^T y \leq 0, for j \in A(x^*) \}.
Note the following well known Farkas Lemma:

c^T y \geq 0 for all y with By = 0 and Cy \leq 0

is equivalent to: there exist \lambda and \mu \geq 0 such that c = B^T \lambda - C^T \mu.
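The easy direction of the equivalence is quickly checked (an added remark): if c = B^T \lambda - C^T \mu with \mu \geq 0, then for any y with By = 0 and Cy \leq 0,

c^T y = \lambda^T B y - \mu^T C y = -\mu^T (C y) \geq 0,

since every term of \mu^T (C y) is the product of a nonnegative \mu_j and a nonpositive (C y)_j.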
Using the Farkas Lemma (with c = \nabla f(x^*), B = H(x^*)^T, and C the matrix whose rows are (\nabla g_j(x^*))^T, j \in A(x^*)), we can conclude that if x^* is a quasi-regular point and an optimal solution, then there exist \lambda_i^*, i = 1, ..., m, and \mu_j^* \geq 0 for j \in A(x^*) such that

0 = \nabla f(x^*) + \sum_{i=1}^m \lambda_i^* \nabla h_i(x^*) + \sum_{j \in A(x^*)} \mu_j^* \nabla g_j(x^*).
By supplementing \mu_j^* = 0 for all j \notin A(x^*) we then have

0 = \nabla f(x^*) + \sum_{i=1}^m \lambda_i^* \nabla h_i(x^*) + \sum_{j=1}^r \mu_j^* \nabla g_j(x^*)

with \mu_j^* g_j(x^*) = 0 for all j = 1, ..., r.