constraint $c_i$ by $10c_i$. The new problem is actually equivalent (that is, it has the same feasible set and same solution), but the optimal multiplier $\lambda_i^*$ corresponding to $c_i$ will be replaced by $\lambda_i^*/10$. However, since $c_i(x)$ is replaced by $10c_i(x)$, the product $\lambda_i^* c_i(x)$ does not change. If, on the other hand, we replace the objective function $f$ by $10f$, the multipliers $\lambda_i^*$ in (12.34) all will need to be replaced by $10\lambda_i^*$. Hence in (12.80) we see that the sensitivity of $f$ to perturbations has increased by a factor of 10, which is exactly what we would expect.
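This scaling behavior is easy to check numerically. The following sketch (our illustration, not part of the original text) uses the hypothetical one-variable problem $\min 0.5x^2$ subject to $x - 1 \ge 0$, whose optimal multiplier is $\lambda^* = 1$, and recovers the multiplier from the stationarity condition $\nabla f(x^*) = \lambda \nabla c(x^*)$:

```python
# Illustration (not from the text): how the Lagrange multiplier scales when
# the constraint or the objective of  min 0.5*x^2  s.t.  x - 1 >= 0
# is multiplied by 10. The exact multiplier of the original problem is 1.
import numpy as np
from scipy.optimize import minimize

def multiplier(f, grad_f, c, grad_c):
    """Solve min f(x) s.t. c(x) >= 0, then recover the multiplier from
    the stationarity condition grad_f(x*) = lambda * grad_c(x*)."""
    res = minimize(f, x0=[3.0], jac=grad_f, method="SLSQP",
                   constraints=[{"type": "ineq", "fun": c, "jac": grad_c}])
    x = res.x
    return grad_f(x)[0] / grad_c(x)[0]

f = lambda x: 0.5 * x[0] ** 2
gf = lambda x: np.array([x[0]])
c = lambda x: x[0] - 1.0
gc = lambda x: np.array([1.0])

lam = multiplier(f, gf, c, gc)                                          # ~1.0
lam_c10 = multiplier(f, gf, lambda x: 10 * c(x), lambda x: 10 * gc(x))  # ~0.1
lam_f10 = multiplier(lambda x: 10 * f(x), lambda x: 10 * gf(x), c, gc)  # ~10.0
print(lam, lam_c10, lam_f10)
```

Scaling the constraint by 10 divides the multiplier by 10, while scaling the objective by 10 multiplies it by 10, matching the discussion above.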
12.9 DUALITY
In this section we present some elements of the duality theory for nonlinear programming. This theory is used to motivate and develop some important algorithms, including the augmented Lagrangian algorithms of Chapter 17. In its full generality, duality theory ranges beyond nonlinear programming to provide important insight into the fields of convex nonsmooth optimization and even discrete optimization. Its specialization to linear programming proved central to the development of that area; see Chapter 13. (We note that the discussion of linear programming duality in Section 13.1 can be read without consulting this section first.)

Duality theory shows how we can construct an alternative problem from the functions and data that define the original optimization problem. This alternative dual problem is related to the original problem (which is sometimes referred to in this context as the primal for purposes of contrast) in fascinating ways. In some cases, the dual problem is easier to solve computationally than the original problem. In other cases, the dual can be used to obtain easily a lower bound on the optimal value of the objective for the primal problem. As remarked above, the dual has also been used to design algorithms for solving the primal problem.

Our results in this section are mostly restricted to the special case of (12.1) in which there are no equality constraints and the objective $f$ and the negatives $-c_i$ of the inequality constraint functions are all convex. For simplicity, we assume that there are $m$ inequality constraints labelled $1, 2, \dots, m$, and rewrite (12.1) as follows:

$$\min_{x \in \mathbb{R}^n} f(x) \quad \text{subject to } c_i(x) \ge 0, \; i = 1, 2, \dots, m.$$

If we assemble the constraints into a vector function

$$c(x) \stackrel{\text{def}}{=} (c_1(x), c_2(x), \dots, c_m(x))^T,$$

we can write the problem as

$$\min_{x \in \mathbb{R}^n} f(x) \quad \text{subject to } c(x) \ge 0, \tag{12.81}$$
for which the Lagrangian function (12.16) with Lagrange multiplier vector $\lambda \in \mathbb{R}^m$ is

$$\mathcal{L}(x, \lambda) = f(x) - \lambda^T c(x).$$

We define the dual objective function $q : \mathbb{R}^m \to \mathbb{R}$ as follows:

$$q(\lambda) \stackrel{\text{def}}{=} \inf_x \mathcal{L}(x, \lambda). \tag{12.82}$$

In many problems, this infimum is $-\infty$ for some values of $\lambda$. We define the domain of $q$ as the set of $\lambda$ values for which $q$ is finite, that is,

$$\mathcal{D} \stackrel{\text{def}}{=} \{\lambda \mid q(\lambda) > -\infty\}. \tag{12.83}$$
Note that calculation of the infimum in (12.82) requires finding the global minimizer of the function $\mathcal{L}(\cdot, \lambda)$ for the given $\lambda$, which, as we have noted in Chapter 2, may be extremely difficult in practice. However, when $f$ and $-c_i$ are convex functions and $\lambda \ge 0$ (the case in which we are most interested), the function $\mathcal{L}(\cdot, \lambda)$ is also convex. In this situation, all local minimizers are global minimizers (as we verify in Exercise 12.4), so computation of $q(\lambda)$ becomes a more practical proposition. The dual problem to (12.81) is defined as follows:

$$\max_{\lambda \in \mathbb{R}^m} q(\lambda) \quad \text{subject to } \lambda \ge 0. \tag{12.84}$$
EXAMPLE 12.10

Consider the problem

$$\min_{(x_1, x_2)} \; 0.5(x_1^2 + x_2^2) \quad \text{subject to } x_1 - 1 \ge 0. \tag{12.85}$$

The Lagrangian is

$$\mathcal{L}(x_1, x_2, \lambda_1) = 0.5(x_1^2 + x_2^2) - \lambda_1(x_1 - 1).$$

If we hold $\lambda_1$ fixed, this is a convex function of $(x_1, x_2)^T$. Therefore, the infimum with respect to $(x_1, x_2)^T$ is achieved when the partial derivatives with respect to $x_1$ and $x_2$ are zero, that is,

$$x_1 = \lambda_1, \quad x_2 = 0.$$
By substituting these infimal values into $\mathcal{L}(x_1, x_2, \lambda_1)$, we obtain the dual objective (12.82):

$$q(\lambda_1) = 0.5(\lambda_1^2 + 0) - \lambda_1(\lambda_1 - 1) = -0.5\lambda_1^2 + \lambda_1. \tag{12.86}$$
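As an illustrative aside (ours, not part of the original text), the dual objective (12.86) and the dual problem (12.84) for this example can be checked numerically; here is a minimal sketch in which the inner infimum is computed by an unconstrained minimization:

```python
# Numerical companion to Example 12.10 (illustration only): evaluate
# q(lam) = inf_x L(x, lam) by an inner minimization, then maximize q
# over lam >= 0 to solve the dual problem (12.84).
from scipy.optimize import minimize, minimize_scalar

def q(lam):
    # inner problem: minimize the Lagrangian over (x1, x2) for fixed lam;
    # L(., lam) is convex, so the local minimizer found here is global
    L = lambda x: 0.5 * (x[0] ** 2 + x[1] ** 2) - lam * (x[0] - 1.0)
    return minimize(L, x0=[0.0, 0.0]).fun

print(q(1.0), -0.5 * 1.0 ** 2 + 1.0)  # numeric vs. closed form (12.86): both ~0.5

# dual problem (12.84): maximize q(lam) subject to lam >= 0
res = minimize_scalar(lambda lam: -q(lam), bounds=(0.0, 10.0), method="bounded")
print(res.x, -res.fun)  # ~1.0 and ~0.5: the optimal multiplier and value
```

The maximizer $\lambda_1 = 1$ and the optimal value $0.5$ reappear below in the discussion following Theorem 12.12.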
In the remainder of this section, we show how the dual problem is related to (12.81). Our first result concerns the concavity of $q$.
Theorem 12.10.
The function $q$ defined by (12.82) is concave and its domain $\mathcal{D}$ is convex.

PROOF. For any $\lambda_0$ and $\lambda_1$ in $\mathbb{R}^m$, any $x \in \mathbb{R}^n$, and any $\alpha \in [0, 1]$, we have

$$\mathcal{L}(x, (1 - \alpha)\lambda_0 + \alpha\lambda_1) = (1 - \alpha)\mathcal{L}(x, \lambda_0) + \alpha\mathcal{L}(x, \lambda_1).$$

By taking the infimum of both sides in this expression, using the definition (12.82), and using the result that the infimum of a sum is greater than or equal to the sum of infimums, we obtain

$$q((1 - \alpha)\lambda_0 + \alpha\lambda_1) \ge (1 - \alpha)q(\lambda_0) + \alpha q(\lambda_1),$$

confirming the concavity of $q$. If both $\lambda_0$ and $\lambda_1$ belong to $\mathcal{D}$, this inequality implies that $q((1 - \alpha)\lambda_0 + \alpha\lambda_1) > -\infty$ also, and therefore $(1 - \alpha)\lambda_0 + \alpha\lambda_1 \in \mathcal{D}$, verifying the convexity of $\mathcal{D}$.

The optimal value of the dual problem (12.84) gives a lower bound on the optimal objective value for the primal problem (12.81). This observation is a consequence of the following weak duality result.
Theorem 12.11 (Weak Duality).
For any $\bar{x}$ feasible for (12.81) and any $\bar{\lambda} \ge 0$, we have $q(\bar{\lambda}) \le f(\bar{x})$.

PROOF.

$$q(\bar{\lambda}) = \inf_x \left[ f(x) - \bar{\lambda}^T c(x) \right] \le f(\bar{x}) - \bar{\lambda}^T c(\bar{x}) \le f(\bar{x}),$$

where the final inequality follows from $\bar{\lambda} \ge 0$ and $c(\bar{x}) \ge 0$.
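A quick sanity check of this bound on Example 12.10 (our sketch, not from the text): for any $\lambda \ge 0$ and any feasible point (that is, any point with $x_1 \ge 1$), the dual objective (12.86) never exceeds the primal objective.

```python
# Weak duality (Theorem 12.11) checked numerically on Example 12.10:
# q(lam) <= f(x) for random lam >= 0 and random feasible x (x1 >= 1).
import numpy as np

f = lambda x: 0.5 * (x[0] ** 2 + x[1] ** 2)
q = lambda lam: -0.5 * lam ** 2 + lam      # closed form (12.86)

rng = np.random.default_rng(0)
for _ in range(1000):
    lam = rng.uniform(0.0, 5.0)                                # any lam >= 0
    x = np.array([1.0 + rng.uniform(0.0, 3.0), rng.normal()])  # feasible point
    assert q(lam) <= f(x)
```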
For the remaining results, we note that the KKT conditions (12.34) specialized to (12.81) are as follows:

$$\nabla f(x^*) - \nabla c(x^*)\lambda^* = 0, \tag{12.87a}$$
$$c(x^*) \ge 0, \tag{12.87b}$$
$$\lambda^* \ge 0, \tag{12.87c}$$
$$\lambda_i^* \, c_i(x^*) = 0, \quad i = 1, 2, \dots, m, \tag{12.87d}$$

where $\nabla c(x^*)$ is the $n \times m$ matrix defined by

$$\nabla c(x^*) \stackrel{\text{def}}{=} [\nabla c_1(x^*), \nabla c_2(x^*), \dots, \nabla c_m(x^*)].$$

The next result shows that optimal Lagrange multipliers for (12.81) are solutions of the dual problem (12.84) under certain conditions. It is essentially due to Wolfe [309].
Theorem 12.12.
Suppose that $x^*$ is a solution of (12.81) and that $f$ and $-c_i$, $i = 1, 2, \dots, m$, are convex functions on $\mathbb{R}^n$ that are differentiable at $x^*$. Then any $\lambda^*$ for which $(x^*, \lambda^*)$ satisfies the KKT conditions (12.87) is a solution of (12.84).
PROOF. Suppose that $(x^*, \lambda^*)$ satisfies (12.87). We have from $\lambda^* \ge 0$ that $\mathcal{L}(\cdot, \lambda^*)$ is a convex and differentiable function. Hence, for any $x$, we have

$$\mathcal{L}(x, \lambda^*) \ge \mathcal{L}(x^*, \lambda^*) + \nabla_x \mathcal{L}(x^*, \lambda^*)^T (x - x^*) = \mathcal{L}(x^*, \lambda^*),$$

where the last equality follows from (12.87a). Therefore, we have

$$q(\lambda^*) = \inf_x \mathcal{L}(x, \lambda^*) = \mathcal{L}(x^*, \lambda^*) = f(x^*) - (\lambda^*)^T c(x^*) = f(x^*),$$

where the last equality follows from (12.87d). Since we have from Theorem 12.11 that $q(\lambda) \le f(x^*)$ for all $\lambda \ge 0$, it follows immediately from $q(\lambda^*) = f(x^*)$ that $\lambda^*$ is a solution of (12.84).

Note that if the functions are continuously differentiable and a constraint qualification such as LICQ holds at $x^*$, then an optimal Lagrange multiplier is guaranteed to exist, by Theorem 12.1.

In Example 12.10, we see that $\lambda_1^* = 1$ is both an optimal Lagrange multiplier for the problem (12.85) and a solution of (12.86). Note too that the optimal objective for both problems is $0.5$.

We prove a partial converse of Theorem 12.12, which shows that solutions of the dual problem (12.84) can sometimes be used to derive solutions of the original problem (12.81). The essential condition is strict convexity of the function $\mathcal{L}(\cdot, \hat{\lambda})$ for a certain value $\hat{\lambda}$. We note that this condition holds if either $f$ is strictly convex (as is the case in Example 12.10) or if $-c_i$ is strictly convex for some $i = 1, 2, \dots, m$ with $\hat{\lambda}_i > 0$.
Theorem 12.13.
Suppose that $f$ and $-c_i$, $i = 1, 2, \dots, m$, are convex and continuously differentiable on $\mathbb{R}^n$. Suppose that $x^*$ is a solution of (12.81) at which LICQ holds. Suppose that $\hat{\lambda}$ solves (12.84), and that the infimum in $\inf_x \mathcal{L}(x, \hat{\lambda})$ is attained at $\hat{x}$. Assume further that $\mathcal{L}(\cdot, \hat{\lambda})$ is a strictly convex function. Then $\hat{x} = x^*$ (that is, $x^*$ is the unique solution of (12.81)), and $f(x^*) = \mathcal{L}(\hat{x}, \hat{\lambda})$.

PROOF. Assume for contradiction that $\hat{x} \ne x^*$. From Theorem 12.1, because of the LICQ assumption, there exists $\lambda^*$ satisfying (12.87). Hence, from Theorem 12.12, we have that $\lambda^*$ also solves (12.84), so that

$$\mathcal{L}(\hat{x}, \hat{\lambda}) = q(\hat{\lambda}) = q(\lambda^*) = \mathcal{L}(x^*, \lambda^*).$$

Because $\hat{x} = \arg\min_x \mathcal{L}(x, \hat{\lambda})$, we have from Theorem 2.2 that $\nabla_x \mathcal{L}(\hat{x}, \hat{\lambda}) = 0$. Moreover, by strict convexity of $\mathcal{L}(\cdot, \hat{\lambda})$, it follows that

$$\mathcal{L}(x^*, \hat{\lambda}) - \mathcal{L}(\hat{x}, \hat{\lambda}) > \nabla_x \mathcal{L}(\hat{x}, \hat{\lambda})^T (x^* - \hat{x}) = 0.$$

Hence, we have

$$\mathcal{L}(x^*, \hat{\lambda}) > \mathcal{L}(\hat{x}, \hat{\lambda}) = \mathcal{L}(x^*, \lambda^*),$$

so in particular we have

$$-\hat{\lambda}^T c(x^*) > -(\lambda^*)^T c(x^*) = 0,$$

where the final equality follows from (12.87d). Since $\hat{\lambda} \ge 0$ and $c(x^*) \ge 0$, this yields the contradiction, and we conclude that $\hat{x} = x^*$, as claimed.

In Example 12.10, at the dual solution $\hat{\lambda}_1 = 1$, the infimum of $\mathcal{L}(x_1, x_2, \hat{\lambda}_1)$ is achieved at $(x_1, x_2)^T = (1, 0)^T$, which is the solution of the original problem (12.85).

A slightly different form of duality that is convenient for computations, known as the Wolfe dual [309], can be stated as follows:

$$\max_{x, \lambda} \; \mathcal{L}(x, \lambda) \tag{12.88a}$$
$$\text{subject to } \nabla_x \mathcal{L}(x, \lambda) = 0, \quad \lambda \ge 0. \tag{12.88b}$$
The following result explains the relationship of the Wolfe dual (12.88) to (12.81).
Theorem 12.14.
Suppose that $f$ and $-c_i$, $i = 1, 2, \dots, m$, are convex and continuously differentiable on $\mathbb{R}^n$. Suppose that $(x^*, \lambda^*)$ is a solution pair of (12.81) at which LICQ holds. Then $(x^*, \lambda^*)$ solves the problem (12.88).
PROOF. From the KKT conditions (12.87) we have that $(x^*, \lambda^*)$ satisfies (12.88b), and that $\mathcal{L}(x^*, \lambda^*) = f(x^*)$. Therefore, for any pair $(x, \lambda)$ that satisfies (12.88b), we have that

$$\mathcal{L}(x^*, \lambda^*) = f(x^*) \ge f(x^*) - \lambda^T c(x^*) = \mathcal{L}(x^*, \lambda) \ge \mathcal{L}(x, \lambda) + \nabla_x \mathcal{L}(x, \lambda)^T (x^* - x) = \mathcal{L}(x, \lambda),$$

where the second inequality follows from the convexity of $\mathcal{L}(\cdot, \lambda)$. We have therefore shown that $(x^*, \lambda^*)$ maximizes $\mathcal{L}$ over the constraints (12.88b), and hence solves (12.88).
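To make the Wolfe dual concrete, here is a small sketch (ours, not from the book) that solves (12.88) for Example 12.10 directly as a constrained problem in the joint variables $(x_1, x_2, \lambda_1)$; on the feasible set defined by (12.88b) the objective reduces to the concave function $-0.5\lambda_1^2 + \lambda_1$, so a local solver suffices here:

```python
# Illustrative Wolfe dual (12.88) for Example 12.10, solved jointly in
# (x1, x2, lam): maximize L(x, lam) s.t. grad_x L(x, lam) = 0, lam >= 0.
from scipy.optimize import minimize

def negL(z):
    x1, x2, lam = z
    return -(0.5 * (x1 ** 2 + x2 ** 2) - lam * (x1 - 1.0))

def gradxL(z):
    x1, x2, lam = z
    return [x1 - lam, x2]          # stationarity in x, constraint (12.88b)

res = minimize(negL, x0=[0.0, 0.0, 0.5], method="SLSQP",
               constraints=[{"type": "eq", "fun": gradxL}],
               bounds=[(None, None), (None, None), (0.0, None)])
print(res.x)      # ~ [1, 0, 1]: the pair (x*, lambda*) of Theorem 12.14
print(-res.fun)   # ~ 0.5, the common optimal value
```

Note how the Wolfe dual delivers the primal solution and the multiplier together, which is what makes this form convenient for computation.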
EXAMPLE 12.11 (LINEAR PROGRAMMING)
An important special case of (12.81) is the linear programming problem

$$\min \; c^T x \quad \text{subject to } Ax - b \ge 0, \tag{12.89}$$

for which the dual objective is

$$q(\lambda) = \inf_x \left[ c^T x - \lambda^T (Ax - b) \right] = \inf_x \, (c - A^T\lambda)^T x + b^T\lambda.$$

If $c - A^T\lambda \ne 0$, the infimum is clearly $-\infty$ (we can set $x$ to be a large negative multiple of $c - A^T\lambda$ to make $q$ arbitrarily large and negative). When $c - A^T\lambda = 0$, on the other hand, the dual objective is simply $b^T\lambda$. In maximizing $q$, we can exclude $\lambda$ for which $c - A^T\lambda \ne 0$ from consideration (the maximum obviously cannot be attained at a point $\lambda$ for which $q(\lambda) = -\infty$). Hence, we can write the dual problem (12.84) as follows:

$$\max_\lambda \; b^T\lambda \quad \text{subject to } A^T\lambda = c, \; \lambda \ge 0. \tag{12.90}$$

Alternatively, we can write the Wolfe dual form (12.88) of this problem as

$$\max_{(\lambda, x)} \; c^T x - \lambda^T (Ax - b) \quad \text{subject to } A^T\lambda = c, \; \lambda \ge 0,$$

and by substituting the constraint $A^T\lambda - c = 0$ into the objective we obtain (12.90) again.

For some matrices $A$, the dual problem (12.90) may be computationally easier to solve than the original problem (12.89). We discuss the possibilities further in Chapter 13.
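The following sketch (illustrative, with made-up data) checks numerically that (12.89) and (12.90) share their optimal value, using scipy.optimize.linprog; note that linprog defaults to nonnegative variables, so we must declare $x$ free in the primal:

```python
# Illustration (not from the book): the LP dual (12.90) attains the same
# optimal value as the primal (12.89) for this small made-up instance.
import numpy as np
from scipy.optimize import linprog

c = np.array([1.0, 1.0])
A = np.array([[1.0, 0.0], [0.0, 1.0]])
b = np.array([1.0, 2.0])

# primal: min c^T x  s.t.  A x - b >= 0, x free
primal = linprog(c, A_ub=-A, b_ub=-b, bounds=[(None, None)] * 2)

# dual (12.90): max b^T lam  s.t.  A^T lam = c, lam >= 0
dual = linprog(-b, A_eq=A.T, b_eq=c, bounds=[(0.0, None)] * 2)

print(primal.fun, -dual.fun)   # both 3.0 for this data
```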
EXAMPLE 12.12

Consider

$$\min_x \; \frac{1}{2} x^T G x + c^T x \quad \text{subject to } Ax - b \ge 0, \tag{12.91}$$
where $G$ is a symmetric positive definite matrix. The dual objective for this problem is

$$q(\lambda) = \inf_x \mathcal{L}(x, \lambda) = \inf_x \; \frac{1}{2} x^T G x + c^T x - \lambda^T (Ax - b). \tag{12.92}$$

Since $G$ is positive definite, $\mathcal{L}(\cdot, \lambda)$ is a strictly convex quadratic function, and the infimum is achieved when $\nabla_x \mathcal{L}(x, \lambda) = 0$, that is,

$$Gx + c - A^T\lambda = 0. \tag{12.93}$$

Hence, we can substitute for $x$ in the infimum expression and write the dual objective explicitly as follows:

$$q(\lambda) = -\frac{1}{2}(A^T\lambda - c)^T G^{-1} (A^T\lambda - c) + b^T\lambda.$$

Alternatively, we can write the Wolfe dual form (12.88) by retaining $x$ as a variable and including the constraint (12.93) explicitly in the dual problem, to obtain

$$\max_{(\lambda, x)} \; \frac{1}{2} x^T G x + c^T x - \lambda^T (Ax - b) \quad \text{subject to } Gx + c - A^T\lambda = 0, \; \lambda \ge 0. \tag{12.94}$$
To make it clearer that the objective is concave, we can use the constraint to substitute $(c - A^T\lambda)^T x = -x^T G x$ in the objective, and rewrite the dual formulation as follows:

$$\max_{(\lambda, x)} \; -\frac{1}{2} x^T G x + \lambda^T b \quad \text{subject to } Gx + c - A^T\lambda = 0, \; \lambda \ge 0. \tag{12.95}$$

Note that the Wolfe dual form requires only positive semidefiniteness of $G$.
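A small numerical sketch of this example (ours, with made-up data $G = I$, $c = 0$, and the single constraint $x_1 + x_2 \ge 2$): solve the primal QP directly and maximize the explicit dual objective derived above, confirming that the optimal values agree.

```python
# Illustration (not from the book): primal QP (12.91) vs. its explicit
# dual objective q(lam) = -0.5 (A^T lam - c)^T G^{-1} (A^T lam - c) + b^T lam.
import numpy as np
from scipy.optimize import minimize

G = np.eye(2)                  # symmetric positive definite
c = np.zeros(2)
A = np.array([[1.0, 1.0]])     # single constraint: x1 + x2 - 2 >= 0
b = np.array([2.0])

# primal: min 0.5 x^T G x + c^T x  s.t.  A x - b >= 0
primal = minimize(lambda x: 0.5 * x @ G @ x + c @ x, x0=np.zeros(2),
                  constraints=[{"type": "ineq", "fun": lambda x: A @ x - b}],
                  method="SLSQP")

# dual: max q(lam)  s.t.  lam >= 0
def q(lam):
    v = A.T @ lam - c
    return -0.5 * v @ np.linalg.solve(G, v) + b @ lam

dual = minimize(lambda lam: -q(lam), x0=np.array([0.5]),
                bounds=[(0.0, None)], method="L-BFGS-B")

print(primal.fun, -dual.fun)  # both ~1.0; the dual maximizer lam = 1
                              # is the optimal multiplier
```

For this data the primal solution is $x^* = (1, 1)$ with objective $1$, and the dual objective $q(\lambda) = -\lambda^2 + 2\lambda$ is maximized at $\lambda = 1$ with the same value, as Theorem 12.12 predicts.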