Linear Conic Programming
Linear Conic Programming
Linear Conic Programming
ii
Preface
This monograph is developed for MS&E 314, Semidenite Programming, which I am teaching at Stanford. Information, supporting materials, and computer programs related to this book may be found at the following address on the World-Wide Web: http://www.stanford.edu/class/msande314 Please report any question, comment and error to the address: [email protected] A little story in the development of semidenite programming. One day in 1990, I visited the Computer Science Department of the University of Minnesota and met a young graduate student, Farid Alizadeh. He, working then on combinatorial optimization, introduced me semidenite optimization or linear programming over the positive denite matrix cone. We had a very extensive discussion that afternoon and concluded that interior-point linear programming algorithms could be applicable to solving SDP. I urged Farid to look at the LP potential functions and to develop an SDP primal potential reduction algorithm. He worked hard for several months, and one afternoon showed up in my oce in Iowa City, about 300 miles from Minneapolis. He had everything worked out, including potential, algorithm, complexity bound, and even a dictionary from LP to SDP, but was stuck on one problem which was how to keep the symmetry of the matrix. We went to a bar nearby on Clinton Street in Iowa City (I paid for him since I was a third-year professor then and eager to demonstrate that I could take care of students). After chatting for a while, I suggested that he use X 1/2 X 1/2 to keep symmetry instead of X 1 which he was using, where X is the current symmetric positive denite matrix and is the symmetric directional matrix. He returned to Minneapolis and moved to Berkeley shortly after, and few weeks later sent me an e-mail message telling me that everything had worked out beautifully. At the same time or even earlier, Nesterov and Nemirovskii developed a more general and powerful theory in extending interior-point algorithms for solving conic programs, where SDP was a special case. Boyd and his group later presented a wide range of SDP applications and formulations, many of which were incredibly novel and elegant. Then came the primal-dual algorithm, the Max-Cut, ... and SDP established its full popularity. YINYU YE iii
iv Stanford, 2002
PREFACE
Chapter 1
Semidenite Programming, hereafter SDP, is a natural extension of Linear programming (LP) that is a central decision model in Management Science and Operations Research. LP plays an extremely important role in the theory and application of Optimization. In one sense it is a continuous optimization problem in minimizing a linear objective function over a convex polyhedron; but it is also a combinatorial problem involving selecting an extreme point among a nite set of possible vertices. Businesses, large and small, use linear programming models to optimize communication systems, to schedule transportation networks, to control inventories, to adjust investments, and to maximize productivity. In LP, the variables form a vector which is regiured to be nonnegative, where in SDP they are components of a matrix and it is constrained to be positive semidenite. Both of them may have linear equality constraints as well. One thing in common is that interior-point algorithms developed in past two decades for LP are naturally applied to solving SDP. Interior-point algorithms are continuous iterative algorithms. Computation experience with sophisticated procedures suggests that the number of iterations necessarily grows much more slowly than the dimension grows. Furthermore, they have an established worst-case polynomial iteration bound, providing the potential for dramatic improvement in computation eectiveness. The goal of the monograph is to provide a text book for teaching Semidenite Programming, a modern Linear Programming decision model and its applications in other scientic and engineering elds. One theme of the monograph is the mapping between SDP and LP, so that the reader, with knowledge of LP, can understand SDP with little eort. The monograph is organized as follows. In Chapter 1, we discuss some 1
necessary mathematical preliminaries. We also present several decision and optimization problems and several basic numerical procedures used throughout the text. Chapter 2 is devoted to studying the theories and geometries of linear and matrix inequalities, convexity, and semidenite programming. Almost all interior-point methods exploit rich geometric properties of linear and matrix inequalities, such as center, volume, potential, etc. These geometries are also helpful for teaching, learning, and research. Chapter 3 is focused on interior-point algorithms. Here, we select two types algorithms: the path-following algorithm and the potential reduction algorithm. Each algorithm has three forms, the primal, the dual and the primal-dual form. We analyze the worst-case complexity bound for them, where we will use the real number computation model in our analysis because of the continuous nature of interior-point algorithms. We also compare the complexity theory with the convergence rate used in numerical analysis. Not only has the convergnece speed of SDP algorithms been signicantly improved during the last decade, but also the problem domain applicable by SDP has dramatically widened. Chapters 4, 5, and 6 would describe some of SDP applications and new established results in Engineering, Combinatory Optimization, Robust Optimization, Euclidean Geometry Computation, etc. Finally, we discuss major computational issues in Chapter 7. We discuss several eective implementation techniques frequently used in interior-point SDP software, such as the sparse linear system, the predictor and corrector step, and the homogeneous and self-dual formulation. We also present major diculties and challengies faced by SDP.
1.2
Mathematical Preliminaries
This section summarizes mathematical background material for linear algebra, linear programming, and nonlinear optimization.
1.2.1
Basic notations
The notation described below will be followed in general. There may be some deviation where appropriate. By R we denote the set of real numbers. R+ denotes the set of nonnegative real numbers, and R+ denotes the set of positive numbers. For a natural number
n n, the symbol Rn (Rn + , R+ ) denotes the set of vectors with n components in n R (R+ , R+ ). We call Rn + the interior of R+ . The vector inequality x y means xj yj for j = 1, 2, ..., n. Zero represents a vector whose entries are all zeros and e represents a vector whose entries are all ones, where their dimensions may vary according to other vectors in expressions. A vector is always considered as a column vector, unless otherwise stated. Upper-case letters will be used to represent matrices. Greek letters will
typically be used to represent scalars. For convenience, we sometime write a column vector x as x = (x1 ; x2 ; . . . ; xn ) and a row vector as x = (x1 , x2 , . . . , xn ). Addition of matrices and multiplication of matrices with scalars are standard. The superscript T denotes transpose operation. The inner product in Rn is dened as follows:
n
x, y := xT y =
j =1
xj yj
for
x, y Rn .
The l2 norm of a vector x is given by x and the l norm is x In general, the p norm is
n 1/p 2
xT x,
=
1
|xj |
p = 1, 2, ...
The dual of the p norm, denoted by . , is the q norm, where 1 1 + = 1. p q In this book, . generally represents the l2 norm. For natural numbers m and n, Rmn denotes the set of real matrices with m rows and n columns. For A Rmn , we assume that the row index set of A is {1, 2, ..., m} and the column index set is {1, 2, ..., n}. The ith row of A is denoted by ai. and the j th column of A is denoted by a.j ; the i and j th component of A is denoted by aij . If I is a subset of the row index set and J is a subset of the column index set, then AI denotes the submatrix of A whose rows belong to I , AJ denotes the submatrix of A whose columns belong to J , AIJ denotes the submatrix of A induced by those components of A whose indices belong to I and J , respectively. The identity matrix is denoted by I . The null space of A is denoted N (A) and the range of A is R(A). The determinant of an n n-matrix A is denoted by det(A). The trace of A, denoted by tr(A), is the sum of the diagonal entries in A. The operator norm of A, denoted by A , is A
2
:= max n
0=xR
Ax 2 . x 2
For a vector x Rn , Dx represents a diagonal matrix in Rnn whose diagonal entries are the entries of x, i.e., Dx = diag(x). A matrix Q Rnn is said to be positive denite (PD), denoted by Q if xT Qx > 0, xT Qx 0, for all for all x = 0, 0, if x. 0,
If Q 0, then Q is called negative denite (ND), denoted by Q 0; if Q 0, then Q is called negative semi-denite (NSD), denoted by Q 0. If Q is symmetric, then its eigenvalues are all real numbers; furthermore, Q is PSD if and only if all its eigenvalues are non-negative, and Q is PD if and only if all its eigenvalue are positive. Given a PD matrix Q we can dene a Q-norm , . Q , for vector x as x Q = xT Qx . Mn denotes the space of symmetric matrices in Rnn . The inner product in Mn is dened as follows: X, Y := X Y = trX T Y =
i,j
Xi,j Yi,j
for X, Y Mn .
This is a generalization of the vector inner product to matrices. The matrix norm associated with the inner product is called Frobenius norm: X f = trX T X .
n n Mn + denote the set of positive semi-denite matrices in M . M+ denotes the n set of positive denite matrices in Mn . We call Mn + the interior of M+ . 0 1 2 k k {x }0 is an ordered sequence x , x , x , ..., x , .... A sequence {xk } 0 is convergent to x , denoted xk x , if
xk x 0.
k A point x is a limit point of {xk } 0 if there is a subsequence of {x } convergent to x. If g (x) 0 is a real valued function of a real nonnegative variable, the notation g (x) = O(x) means that g (x) c x for some constant c ; the notation g (x) = (x) means that g (x) cx for some constant c; the notation g (x) = (x) means that cx g (x) c x. Another notation is g (x) = o(x), which means that g (x) goes to zero faster than x does:
x0
lim
g (x) = 0. x
1.2.2
Convex sets
If x is a member of the set , we write x ; if y is not a member of , we write y . The union of two sets S and T is denoted S T ; the intersection of them is denoted S T . A set can be specied in the form = {x : P (x)} as the set of all elements satisfying property P . For y Rn and > 0, B (y, ) = {x : x y } is the ball or sphere of radius with center y . In addition, for a positive denite matrix Q of dimension n, E (y, Q) = {x : (x y )T Q(x y ) 1} is called an ellipsoid. The vector y is the center of E (y, Q). A set is closed if xk x, where xk , implies x . A set is open if around every point y there is a ball that is contained in , i.e., there is an > 0 such that B (y, ) . A set is bounded if it is contained within a ball with nite radius. A set is compact if it is both closed and bounded. The (topological) interior of any set , denoted , is the set of points in which is are the centers of some balls contained in . The closure of , denoted , that the smallest closed set containing . The boundary of is the part of is not in . A set C is said to be convex if for every x1 , x2 C and every real number , 0 < < 1, the point x1 + (1 )x2 C . The convex hull of a set is the intersection of all convex sets containing . A set C is a cone if x C implies x C for all > 0. A cone that is also convex is a convex cone. For a cone C E , the dual of C is the cone C := {y : x, y 0 for all x C },
Example 1.2 The set of all positive semi-denite matrices in Mn , Mn + , is a convex cone, called the positive semi-denite matrix cone. The dual of the cone is also Mn + ; it is self-dual. Example 1.3 The set {(t; x) Rn+1 : t x } is a convex cone in Rn+1 , called the second-order cone. The dual of the cone is also the second-order cone in Rn+1 ; it is self-dual. A cone C is (convex) polyhedral if C can be represented by C = {x : Ax 0} for some matrix A (Figure 1.1). Example 1.4 The non-negative orthant is a polyhedral cone, and neither the positive semi-denite matrix cone nor the second-order cone is polyhedral.
Polyhedral Cone
Nonpolyhedral Cone
Figure 1.1: Polyhedral and nonpolyhedral cones. The most important type of convex set is a hyperplane. Hyperplanes dominate the entire theory of optimization. Let a be a nonzero n-dimensional vector, and let b be a real number. The set H = {x Rn : aT x = b} is a hyperplane in Rn (Figure 1.2). Relating to hyperplane, positive and negative closed half spaces are given by H+ = {x : aT x b} H = {x : aT x b}.
+
H
0
Figure 1.2: A hyperplane and half-spaces. A set which can be expressed as the intersection of a nite number of closed half spaces is said to be a convex polyhedron: P = {x : Ax b}.
A bounded polyhedron is called polytope. Let P be a polyhedron in Rn , F is a face of P if and only if there is a vector c for which F is the set of points attaining max {cT x : x P } provided the this maximum is nite. A polyhedron has only nite many faces; each face is a nonempty polyhedron. The most important theorem about the convex set is the following separating theorem (Figure 1.3). Theorem 1.1 (Separating hyperplane theorem) Let C E , where E is either Rn or Mn , be a closed convex set and let y be a point exterior to C . Then there is a vector a E such that a, y < inf a, x .
xC
The geometric interpretation of the theorem is that, given a convex set C and a point y outside of C , there is a hyperplane containing y that contains C in one of its open half spaces.
Figure 1.3: Illustration of the separating hyperplane theorem; an exterior point y is separated by a hyperplane from a convex set C . Example 1.5 Let C be a unit circle centered at the point (1; 1). That is, C = {x R2 : (x1 1)2 + (x2 1)2 1}. If y = (2; 0), a = (1; 1) is a separating hyperplane vector. If y = (0; 1), a = (0; 1) is a separating hyperplane vector. It is worth noting that these separating hyperplanes are not unique. We use the notation E to represent either Rn or Mn , depending on the context, throughout this book, because all our decision and optimization problems take variables from one or both of these two vector spaces.
1.2.3
Real functions
The real function f (x) is said to be continuous at x if xk x implies f (xk ) f (x). The real function f (x) is said to be continuous on set E , where recall that E is either Rn or Mn , if f (x) is continuous at x for every x .
P (x) = n log(c x)
j =1
log xj
P (X ) = n log(C X ) log det(X ) is homogeneous of degree 0. A set of real-valued function f1 , f2 , ..., fm dened on E can be written as a single vector function f = (f1 , f2 , ..., fm )T Rm . If f has continuous partial derivatives of order p, we say f C p . The gradient vector or matrix of a real-valued function f C 1 is a vector or matrix f (x) = {f /xij }, for i, j = 1, ..., n.
If f C 2 , we dene the Hessian of f to be the n2 -dimensional symmetric matrix 2 f (x) = 2f xij xkl for i, j, k, l = 1, ..., n.
If f = (f1 , f2 , ..., fm )T Rm , the Jacobian matrix of f is f1 (x) . ... f (x) = fm (x) f is a (continuous) convex function if and only if for 0 1, f (x + (1 )y ) f (x) + (1 )f (y ). f is a (continuous) quasi-convex function if and only if for 0 1, f (x + (1 )y ) max[f (x), f (y )]. Thus, a convex function is a quasi-convex function. The level set of f is given by L(z ) = {x : f (x) z }. f is a quasi-convex function implies that the level set of f is convex for any given z (see Exercise 1.9). A group of results that are used frequently in analysis are under the heading of Taylors theorem or the mean-value theorem. The theorem establishes the linear and quadratic approximations of a function.
Theorem 1.2 (Taylor expansion) Let f C 1 be in a region containing the line segment [x, y ]. Then there is a , 0 1, such that f (y ) = f (x) + f (x + (1 )y )(y x). Furthermore, if f C 2 then there is a , 0 1, such that f (y ) = f (x) + f (x)(y x) + (1/2)(y x)T 2 f (x + (1 )y )(y x). We also have several propositions for real functions. The rst indicates that the linear approximation of a convex function is a under-estimate. Proposition 1.3 Let f C 1 . Then f is convex over a convex set if and only if f (y ) f (x) + f (x)(y x) for all x, y . The following proposition states that the Hessian of a convex function is positive semi-denite. Proposition 1.4 Let f C 2 . Then f is convex over a convex set if and only if the Hessian matrix of f is positive semi-denite throughout .
1.2.4
Inequalities
There are several important inequalities that are frequently used in algorithm design and complexity analysis. Cauchy-Schwarz: given x, y Rn , then xT y x y . Arithmetic-geometric mean: given x Rn +, xj n Harmonic: given x Rn +, xj 1/xj n2 .
xj
1/n
10
1.3
A decision or optimization problem has a form that is usually characterized by the decision variables and the constraints. A problem, P , consists of two sets, data set Zp and solution set Sp . In general, Sp can be implicitly dened by the so-called optimality conditions. The solution set may be empty, i.e., problem P may have no solution. Theorem 1.5 Weierstrass theorem A continuous function f dened on a compact set (bounded and closed) E has a minimizer in ; that is, there is an x such that for all x , f (x) f (x ). In what follows, we list several decision and optimization problems. More problems will be listed later when we address them.
1.3.1
Given A Rmn and b Rm , the problem is to solve m linear equations for n unknowns: Ax = b. The data and solution sets are Zp = {A Rmn , b Rm } and Sp = {x Rn : Ax = b}. Sp in this case is an ane set. Given an x, one can easily check to see if x is in Sp by a matrix-vector multiplication and a vector-vector comparison. We say that a solution of this problem is easy to recognize. To highlight the analogy with the theories of linear inequalities and linear programming, we list several well-known results of linear algebra. The rst theorem provides two basic representations, the null and row spaces, of a linear subspaces. Theorem 1.6 Each linear subspace of Rn is generated by nitely many vectors, and is also the intersection of nitely many linear hyperplanes; that is, for each linear subspace of L of Rn there are matrices A and C such that L = N (A) = R(C ). The following theorem was observed by Gauss. It is sometimes called the fundamental theorem of linear algebra. It gives an example of a characterization in terms of necessary and sucient conditions, where necessity is straightforward, and suciency is the key of the characterization. Theorem 1.7 Let A Rmn and b Rm . The system {x : Ax = b} has a solution if and only if that AT y = 0 implies bT y = 0. A vector y , with AT y = 0 and bT y = 1, is called an infeasibility certicate for the system {x : Ax = b}. Example 1.7 Let A = (1; 1) and b = (1; 1). Then, y = (1/2; 1/2) is an infeasibility certicate for {x : Ax = b}.
11
1.3.2
Given A Rmn and c Rn , the system of equations AT y = c may be overdetermined or have no solution. Such a case usually occurs when the number of equations is greater than the number of variables. Then, the problem is to nd an y Rm or s R(AT ) such that AT y c or s c is minimized. We can write the problem in the following format: (LS ) or minimize subject to minimize subject to AT y c y Rm ,
2
(LS )
sc 2 s R(AT ).
In the former format, the term AT y c 2 is called the objective function, y is called the decision variable. Since y can be any point in Rm , we say this (optimization) problem is unconstrained. The data and solution sets are Zp = {A Rmn , c Rn } and Sp = {y Rm : AT y c
2
AT x c
for every x Rm }.
Given a y , to see if y Sp is as the same as the original problem. However, from a projection theorem in linear algebra, the solution set can be characterized and represented as Sp = {y Rm : AAT y = Ac}, which becomes a system of linear equations and always has a solution. The vector s = AT y = AT (AAT )+ Ac is the projection of c onto the range of AT , where AAT is called normal matrix and (AAT )+ is called pseudo-inverse. If A has full row rank then (AAT )+ = (AAT )1 , the standard inverse of full rank matrix AAT . If A is not of full rank, neither is AAT and (AAT )+ AAT x = x only for x R(AT ). The vector c AT y = (I AT (AAT )+ A)c is the projection of c onto the null space of A. It is the solution of the following least-squares problem: (LS ) minimize subject to xc 2 x N (A).
In the full rank case, both matrices AT (AAT )1 A and I AT (AAT )1 A are called projection matrices. These symmetric matrices have several desired properties (see Exercise 1.15).
1.3.3
Given A Rmn and b Rm , the problem is to nd a solution x Rn satisfying Ax b or prove that the solution set is empty. The inequality
12
problem includes other forms such as nding an x that satises the combination of linear equations Ax = b and inequalities x 0. The data and solution sets of the latter are Zp = {A Rmn , b Rm } and Sp = {x Rn : Ax = b, x 0}.
Traditionally, a point in Sp is called a feasible solution, and a strictly positive point in Sp is called a strictly feasible or interior feasible solution. The following results are Farkas lemma and its variants. Theorem 1.8 (Farkas lemma) Let A Rmn and b Rm . Then, the system {x : Ax = b, x 0} has a feasible solution x if and only if that AT y 0 implies bT y 0. A vector y , with AT y 0 and bT y = 1, is called a (primal) infeasibility certicate for the system {x : Ax = b, x 0}. Geometrically, Farkas lemma means that if a vector b Rm does not belong to the cone generated by a.1 , ..., a.n , then there is a hyperplane separating b from cone(a.1 , ..., a.n ). Example 1.8 Let A = (1, 1) and b = 1. Then, y = 1 is an infeasibility certicate for {x : Ax = b, x 0}. Theorem 1.9 (Farkas lemma variant) Let A Rmn and c Rn . Then, the system {y : AT y c} has a solution y if and only if that Ax = 0 and x 0 imply cT x 0. Again, a vector x 0, with Ax = 0 and cT x = 1, is called a (dual) infeasibility certicate for the system {y : AT y c}. Example 1.9 Let A = (1; 1) and c = (1; 2). Then, x = (1; 1) is an infeasibility certicate for {y : AT y c}. We say {x : Ax = b, x 0} or {y : AT y c} is approximately feasible in the sense that we have an approximate solution to the equations and inequalities. In this case we can show that any certicate proving their infeasibility must have large norm. Conversely, if {x : Ax = b, x 0} or {y : AT y c} is approximately infeasible in the sense that we have an approximate certicate in Farkas lemma, then any feasible solution must have large norm. Example 1.10 Given > 0 but small. Let A = (1, 1) and b = . Then, x = (0; 0) is approximately feasible for {x : Ax = b, x 0}, and the infeasibility certicate y = 1/ has a large norm. Let A = (1; 1) and c = (1; 1 ). Then, y = 1 is approximately feasible for {y : AT y c}, and the infeasibility certicate x = (1/ ; 1/ ) has a large norm.
13
1.3.4
Given A Rmn , b Rm and c, l, u Rn , the linear programming (LP) problem is the following optimization problem: minimize subject to cT x Ax = b, l x u,
where some elements in l may be meaning that the associated variables are unbounded from below, and some elements in u may be meaning that the associated variables are unbounded from above. If a variable is unbounded either from below or above, then it is called a free variable The standard form linear programming problem is given below, which we will use throughout this book: (LP ) minimize cT x subject to Ax = b, x 0. The linear function cT x is called the objective function, and x is called the decision variables. In this problem, Ax = b and x 0 enforce constraints on the selection of x. The set Fp = {x : Ax = b, x 0} is called feasible set or feasible region. A point x Fp is called a feasible point, and a feasible point x is called an optimal solution if cT x cT x for all feasible points x. If there is a sequence {xk } such that xk is feasible and cT xk , then (LP) is said to be unbounded. The data and solution sets for (LP), respectively, are Zp = {A Rmn , b Rm , c Rn } and Sp = {x Fp : cT x cT y, for every y Fp }.
Again, given an x, to see if x Sp is as dicult as the original problem. However, due to the duality theorem, we can simplify the representation of the solution set signicantly. With every (LP), another linear program, called the dual (LD), is the following problem: (LD) maximize subject to bT y AT y + s = c, s 0,
where y Rm and s Rn . The components of s are called dual slacks. Denote by Fd the sets of all (y, s) that are feasible for the dual. We see that (LD) is also a linear programming problem where y is a free vector. The following theorems give us an important relation between the two problems. Theorem 1.10 (Weak duality theorem) Let Fp and Fd be non-empty. Then, cT x bT y where x Fp , (y, s) Fd .
14
This theorem shows that a feasible solution to either problem yields a bound on the value of the other problem. We call cT x bT y the duality gap. From this we have important results. Theorem 1.11 (Strong duality theorem) Let Fp and Fd be non-empty. Then, x is optimal for (LP) if and only if the following conditions hold: i) x Fp ; ii) there is (y , s ) Fd ; iii) cT x = bT y . Theorem 1.12 (LP duality theorem) If (LP) and (LD) both have feasible solutions then both problems have optimal solutions and the optimal objective values of the objective functions are equal. If one of (LP) or (LD) has no feasible solution, then the other is either unbounded or has no feasible solution. If one of (LP) or (LD) is unbounded then the other has no feasible solution. The above theorems show that if a pair of feasible solutions can be found to the primal and dual problems with equal objective values, then these are both optimal. The converse is also true; there is no gap. From this condition, the solution set for (LP) and (LD) is cT x bT y n m n Ax Sp = (x, y, s) (R+ , R , R+ ) : AT y s = 0 = b , = c
(1.1)
which is a system of linear inequalities and equations. Now it is easy to verify whether or not a pair (x, y, s) is optimal. For feasible x and (y, s), xT s = xT (c AT y ) = cT x bT y is called the complementarity gap. If xT s = 0, then we say x and s are complementary to each other. Since both x and s are nonnegative, xT s = 0 implies that xj sj = 0 for all j = 1, . . . , n. Thus, one equation plus nonnegativity are transformed into n equations. Equations in (1.1) become Dx s = Ax = AT y s = 0 b c.
(1.2)
This system has total 2n + m unknowns and 2n + m equations including n nonlinear equations. The following theorem plays an important role in analyzing LP interiorpoint algorithms. It give a unique partition of the LP variables in terms of complementarity.
15
Theorem 1.13 (Strict complementarity theorem) If (LP) and (LD) both have feasible solutions then both problems have a pair of strictly complementary solutions x 0 and s 0 meaning X s = 0 Moreover, the supports P = {j : x j > 0} and Z = {j : s j > 0} and x + s > 0.
are invariant for all pairs of strictly complementary solutions. Given (LP) or (LD), the pair of P and Z is called the (strict) complementarity partition. {x : AP xP = b, xP 0, xZ = 0} is called the primal T optimal face, and {y : cZ AT Z y 0, cP AP y = 0} is called the dual optimal face. Select m linearly independent columns, denoted by the index set B , from A. Then matrix AB is nonsingular and we may uniquely solve AB xB = b for the m-vector xB . By setting the variables, xN , of x corresponding to the remaining columns of A equal to zero, we obtain a solution x such that Ax = b. Then, x is said to be a (primal) basic solution to (LP) with respect to the basis AB . The components of xB are called basic variables. A dual vector y satisfying AT B y = cB is said to be the corresponding dual basic solution. If a basic solution x 0, then x is called a basic feasible solution. If the dual solution is also feasible, that is s = c AT y 0, then x is called an optimal basic solution and AB an optimal basis. A basic feasible solution is a vertex on the boundary of the feasible region. An optimal basic solution is an optimal vertex of the feasible region. If one or more components in xB has value zero, that basic solution x is said to be (primal) degenerate. Note that in a nondegenerate basic solution the basic variables and the basis can be immediately identied from the nonzero components of the basic solution. If all components, sN , in the corresponding dual slack vector s, except for sB , are non-zero, then y is said to be (dual) nondegenerate. If both primal and dual basic solutions are nondegenerate, AB is called a nondegenerate basis. Theorem 1.14 (LP fundamental theorem) Given (LP) and (LD) where A has full row rank m,
16
i) if there is a feasible solution, there is a basic feasible solution; ii) if there is an optimal solution, there is an optimal basic solution. The above theorem reduces the task of solving a linear program to that searching over basic feasible solutions. By expanding upon this result, the simplex method, a nite search procedure, is derived. The simplex method is to proceed from one basic feasible solution (an extreme point of the feasible region) to an adjacent one, in such a way as to continuously decrease the value of the objective function until a minimizer is reached. In contrast, interior-point algorithms will move in the interior of the feasible region and reduce the value of the objective function, hoping to by-pass many extreme points on the boundary of the region.
1.3.5
Given Q Rnn , A Rmn , b Rm and c Rn , the quadratic programming (QP) problem is the following optimization problem: (QP ) minimize subject to q (x) := (1/2)xT Qx + cT x Ax = b, x 0.
We may denote the feasible set by Fp . The data and solution sets for (QP) are Zp = {Q Rnn , A Rmn , b Rm , c Rn } and Sp = {x Fp : q (x) q (y ) for every y Fp }. A feasible point x is called a KKT point, where KKT stands for KarushKuhn-Tucker, if the following KKT conditions hold: there exists (y Rm , s Rn ) such that (x , y , s ) is feasible for the following dual problem: (QD) maximize subject to d(x, y ) := bT y (1/2)xT Qx AT y + s Qx = c, x, s 0,
and satises the complementarity condition (x )T s = (1/2)(x )T Qx + cT x (bT y (1/2)(x )T Qx = 0. Similar to LP, we can write the KKT condition as:
m n (x, y, s) (Rn + , R , R+ )
and
Dx s = 0 Ax = b AT y + Qx s = c.
(1.3)
Again, this system has total 2n + m unknowns and 2n + m equations including n nonlinear equations.
17
The above condition is also called the rst-order necessary condition. If Q is positive semi-denite, then x is an optimal solution for (QP) if and only if x is a KKT point for (QP). In this case, the solution set for (QP) is characterized by a system of linear inequalities and equations. One can see (LP) is a special case of (QP).
1.4
An algorithm is a list of instructions to solve a problem. For every instance of problem P , i.e., for every given data Z Zp , an algorithm for solving P either determines that Sp is empty or generates an output x such that x Sp or x is close to Sp in certain measure. The latter x is called an approximate solution. Let us use Ap to denote the collection of all possible algorithm for solving every instance in P . Then, the (operation) complexity of an algorithm A Ap for solving an instance Z Zp is dened as the total arithmetic operations: +, , , /, and comparison on real numbers. Denote it by co (A, Z ). Sometimes it is convenient to dene the iteration complexity, denoted by ci (A, Z ), where we assume that each iteration costs a polynomial number (in m and n) of arithmetic operations. In most iterative algorithms, each iteration can be performed eciently both sequentially and in parallel, such as solving a system of linear equations, rank-one updating the inversion of a matrix, pivoting operation of a matrix, multiplying a matrix by a vector, etc. In the real number model, we introduce , the error for an approximate solution as a parameter. Let c(A, Z, ) be the total number of operations of algorithm A for generating an -approximate solution, with a well-dened measure, to problem P . Then, c(A, ) := sup c(A, Z, ) fA (m, n, ) for any
Z Zp
> 0.
We call this complexity model error-based. One may also view an approximate solution an exact solution to a problem -near to P with a well-dened measure in the data space. This is the so-called backward analysis model in numerical analysis. If fA (m, n, ) is a polynomial in m, n, and log(1/ ), then algorithm A is a polynomial algorithm and problem P is polynomially solvable. Again, if fA (m, n, ) is independent of and polynomial in m and n, then we say algorithm A is a strongly polynomial algorithm. If fA (m, n, ) is a polynomial in m, n, and (1/ ), then algorithm A is a polynomial approximation scheme or pseudo-polynomial algorithm . For some optimization problems, the complexity theory can be applied to prove not only that they cannot be solved in polynomial-time, but also that they do not have polynomial approximation schemes. In practice, approximation algorithms are widely used and accepted in practice. Example 1.11 There is a strongly polynomial algorithm for sorting a vector in
18
descending or ascending order, for matrix-vector multiplication, and for computing the norm of a vector. Example 1.12 Consider the bisection method to locate a root of a continuous function f (x) : R R within interval [0, 1], where f (0) > 0 and f (1) < 0. The method calls the oracle to evaluate f (1/2) (counted as one operation). If f (1/2) > 0, we throw away [0, 1/2); if f (1/2) < 0, we throw away (1/2, 1]. Then we repeat this process on the remaining half interval. Each step of the method halves the interval that contains the root. Thus, in log(1/ ) steps, we must have an approximate root whose distance to the root is less than . Therefore, the bisection method is a polynomial algorithm. We have to admit that the criterion of polynomiality is somewhat controversial. Many algorithms may not be polynomial but work ne in practice. This is because polynomiality is built upon the worst-case analysis. However, this criterion generally provides a qualitative statement: if a problem is polynomial solvable, then the problem is indeed relatively easy to solve regardless of the algorithm used. Furthermore, it is ideal to develop an algorithm with both polynomiality and practical eciency.
1.4.1
Convergence rate
Most algorithms are iterative in nature. They generate a sequence of everimproving points x0 , x1 , ..., xk , ... approaching the solution set. For many optimization problems and/or algorithms, the sequence will never exactly reach the solution set. One theory of iterative algorithms, referred to as local or asymptotic convergence analysis, is concerned with the rate at which the optimality error of the generated sequence converges to zero. Obviously, if each iteration of competing algorithms requires the same amount of work, the speed of the convergence of the error reects the speed of the algorithm. This convergence rate, although it may hold locally or asymptotically, provides evaluation and comparison of dierent algorithms. It has been widely used by the nonlinear optimization and numerical analysis community as an efciency criterion. In many cases, this criterion does explain practical behavior of iterative algorithms. Consider a sequence of real numbers {rk } converging to zero. One can dene several notions related to the speed of convergence of such a sequence. Denition 1.1 . Let the sequence {rk } converge to zero. The order of convergence of {rk } is dened as the supermum of the nonnegative numbers p satisfying 0 lim sup
k
Denition 1.2 . Let the sequence {rk } converge to zero such that lim sup
k
1.5. BASIC COMPUTATIONAL PROCEDURES Then, the sequence is said to converge quadratically to zero.
19
It should be noted that the order of convergence is determined only by the properties of the sequence that holds as k . In this sense we might say that the order of convergence is a measure of how good the tail of {rk } is. Large values of p imply the faster convergence of the tail. Denition 1.3 . Let the sequence {rk } converge to zero such that lim sup
k
Then, the sequence is said to converge linearly to zero with convergence ratio . Linear convergence is the most important type of convergence behavior. A linearly convergence sequence, with convergence ratio , can be said to have a tail that converges to zero at least as fast as the geometric sequence c k for a xed number c. Thus, we also call linear convergence geometric convergence. As a rule, when comparing the relative eectiveness of two competing algorithms both of which produce linearly convergent sequences, the comparison is based on their corresponding convergence ratiothe smaller the ratio, the faster the algorithm. The ultimate case where = 0 is referred to as superlinear convergence.
1 T Example 1.13 Consider the conjugate gradient algorithm for minimizing 2 x Qx+ c. Starting from an x0 Rn and d0 = Qx0 + c, the method uses iterative formula
xk+1 = xk k dk where k = and dk+1 = Qxk+1 k dk where k = (dk )T Q(Qxk+1 + c) . dk 2 Q (dk )T (Qxk + c) , dk 2 Q
This algorithm is superlinearly convergent (in fact, it converges in nite number of steps).
1.5
There are several basic numerical problems frequently solved by interior-point algorithms.
20
1.5.1
Probably the best-known algorithm for solving a system of linear equations is the Gaussian elimination method. Suppose we want to solve Ax = b. We may assume a11 = 0 after some row switching, where aij is the component of A in row i and column j . Then we can subtract appropriate multiples of the rst equation from the other equations so as to have an equivalent system: a11 0 A1. A x1 x = b1 b .
This is a pivot step, where a11 is called a pivot, and A is called a Schur complement. Now, recursively, we solve the system of the last m 1 equations for x . Substituting the solution x found into the rst equation yields a value for x1 . The last process is called back-substitution. In matrix form, the Gaussian elimination method transforms A into the form U 0 C 0
and L is a nonsingular, lower-triangular matrix. This is called the LU -decomposition. Sometimes, the matrix is transformed further to a form D 0 C 0
where D is a nonsingular, diagonal matrix. This whole procedure uses about nm2 arithmetic operations. Thus, it is a strong polynomial-time algorithm.
1.5.2
The theory says that y minimizes AT y c if and only if AAT y = Ac. So the problem is reduced to solving a system of linear equations with a symmetric semi-positive denite matrix.
21
One method is Choleskis decomposition. In matrix form, the method transforms AAT into the form AAT = LLT , where L is a lower-triangular matrix and is a diagonal matrix. (Such a transformation can be done in about nm2 arithmetic operations as indicated in the preceding section.) L is called the Choleski factor of AAT . Thus, the above linear system becomes LLT y = Ac, and y can be obtained by solving two triangle systems of linear equations.
1.5.3
The Newton method is used to solve a system of nonlinear equations: given f (x) : Rn Rn , the problem is to solve n equations for n unknowns such that f (x) = 0. The idea behind Newtons method is to use the Taylor linear approximation at the current iterate xk and let the approximation be zero: f (x) f (xk ) + f (xk )(x xk ) = 0.
The Newton method is thus dened by the following iterative formula: xk+1 = xk (f (xk ))1 f (xk ), where scalar 0 is called step-size. Rarely, however, is the Jacobian matrix f inverted. Generally the system of linear equations f (xk )dx = f (xk ) is solved and xk+1 = xk + dx is used. The direction vector dx is called a Newton step, which can be carried out in strongly polynomial time. A modied or quasi Newton method is dened by xk+1 = xk M k f (xk ), where M k is an n n symmetric matrix. In particular, if M k = I , the method is called the gradient method, where f is viewed as the gradient vector of a real function. The Newton method has a superior asymptotic convergence order equal 2 for f (xk ) . It is frequently used in interior-point algorithms, and believed to be the key to their eectiveness.
22
1.5.4
1,
or
(BD)
minimize subject to
1.
x minimizes (BP) if and only if there always exists a y such that they satisfy AAT y = Ac, and if c AT y = 0 then x = (c AT y )/ c AT y ; otherwise any feasible x is a solution. The solution y for (BD) is given as follows: Solve AAT y = b, and if y = 0 then set y = y / AT y ;
otherwise any feasible y is a solution. So these two problems can be reduced to solving a system of linear equations.
1.5.5
or simply (BD)
This problem is used by the classical trust region method for nonlinear optimization. The optimality conditions for the minimizer y of (BD) are (Q + I )y = b, and 0, y
2
1, 0.
(1 y 2 ) = 0,
(Q + I )
These conditions are necessary and sucient. This problem can be solved in polynomial time log(1/ ) and log(log(1/ )) by the bisection method or a hybrid of the bisection and Newton methods, respectively. In practice, several trust region procedures have been very eective in solving this problem. The ball-constrained quadratic problem will be used an a sub-problem by several interior-point algorithms in solving complex optimization problems. We will discuss them later in the book.
1.6. NOTES
23
1.6
Notes
The term complexity was introduced by Hartmanis and Stearns [155]. Also see Garey and Johnson [118] and Papadimitriou and Steiglitz [248]. The N P theory was due to Cook [70] and Karp [180]. The importance of P was observed by Edmonds [88]. Linear programming and the simplex method were introduced by Dantzig [73]. Other inequality problems and convexity theories can be seen in Gritzmann and Klee [141], Gr otschel, Lov asz and Schrijver [142], Gr unbaum [143], Rockafellar [264], and Schrijver [271]. Various complementarity problems can be found found in Cottle, Pang and Stone [72]. The positive semi-denite programming, an optimization problem in nonpolyhedral cones, and its applications can be seen in Nesterov and Nemirovskii [241], Alizadeh [8], and Boyd, Ghaoui, Feron and Balakrishnan [56]. Recently, Goemans and Williamson [125] obtained several breakthrough results on approximation algorithms using positive semi-denite programming. The KKT condition for nonlinear programming was given by Karush, Kuhn and Tucker [195]. It was shown by Klee and Minty [184] that the simplex method is not a polynomial-time algorithm. The ellipsoid method, the rst polynomial-time algorithm for linear programming with rational data, was proven by Khachiyan [181]; also see Bland, Goldfarb and Todd [52]. The method was devised independently by Shor [277] and by Nemirovskii and Yudin [239]. The interior-point method, another polynomial-time algorithm for linear programming, was developed by Karmarkar. It is related to the classical barrier-function method studied by Frisch [109] and Fiacco and McCormick [104]; see Gill, Murray, Saunders, Tomlin and Wright [124], and Anstreicher [21]. For a brief LP history, see the excellent article by Wright [323]. The real computation model was developed by Blum, Shub and Smale [53] and Nemirovskii and Yudin [239]. Other complexity issues in numerical optimization were discussed in Vavasis [318]. Many basic numerical procedures listed in this chapter can be found in Golub and Van Loan [133]. The ball-constrained quadratic problem and its solution methods can be seen in Mor e [229], Sorenson [281], and Dennis and Schnable [76]. The complexity result of the ball-constrained quadratic problem was proved by Vavasis [318] and Ye [332].
1.7
Exercises
1.1 Let Q Rnn be a given nonsingular matrix, and a and b be given Rn vectors. Show (Q + abT )1 = Q1 1 1+ bT Q1 a Q1 abT Q1 .
24
1.2 Prove that the eigenvalues of all matrices Q Mnn are real. Furthermore, show that Q is PSD if and only if all its eigenvalues are non-negative, and Q is PD if and only if all its eigenvalues are positive. 1.3 Using the ellipsoid representation in Section 1.2.2, nd the matrix Q and vector y that describes the following ellipsoids: 1. The 3-dimensional sphere of radius 2 centered at the origin; 2. The 2-dimensional ellipsoid centered at (1; 2) that passes the points (0; 2), (1; 0), (2; 2), and (1; 4); 3. The 2-dimensional ellipsoid centered at (1; 2) with axes parallel to the line y = x and y = x, and passing through (1; 0), (3; 4), (0; 3), and (2; 1). 1.4 Show that the biggest coordinate-aligned ellipsoid that is entirely contained
a n in Rn + and has its center at x R+ can be written as:
E (xa ) = {x Rn :
(X a )1 (x xa ) 1}.
1.5 Show that the non-negative orthant, the positive semi-denite cone, and the second-order cone are all self-dual. 1.6 Consider the convex set C = {x R2 : (x1 1)2 + (x2 1)2 1} and let y R2 . Assuming y C , 1. Find the point in C that is closest to y ; 2. Find a separating hyperplane vector as a function of y . 1.7 Using the idea of Exercise 1.6, prove the separating hyperplane theorem 1.1. 1.8 Given an m n matrix A and a vector c Rn , consider the function n B (y ) = j =1 log sj where s = c AT y > 0. Find B (y ) and 2 B(y ) in terms of s.
m
Given C Mn , Ai Mn , i = 1, , m, and b Rm , consider the function B (y ) := log det(S ), where S = C and 2 B (y ) in terms of S .
i=1
yi Ai
0. Find B(y )
The best way to do this is to use the denition of the partial derivative f (y )i = lim f (y1 , y2 , ..., yi + , ..., ym ) f (y1 , y2 , ..., yi , ..., ym ) .
1.9 Prove that the level set of a quasi-convex function is convex. 1.10 Prove Propositions 1.3 and 1.4 for convex functions in Section 1.2.3.
1.7. EXERCISES
25
(x) dened below 1.11 Let f1 , . . ., fm be convex functions. Then, the function f is also convex:
i=1,...,m
max fi (x)
m
fi (x)
i=1
1.12 Prove the Harmonic inequality described in Section 1.2.4. 1.13 Prove Farkas lemma 1.7 for linear equations. 1.14 Prove the linear least-squares problem always has a solution. 1.15 Let P = AT (AAT )1 A or P = I AT (AAT )1 A. Then prove 1. P = P 2 . 2. P is positive semi-denite. 3. The eigenvalues of P are either 0 or 1. 1.16 Using the separating theorem, prove Farkas lemmas 1.8 and 1.9. 1.17 If a system AT y c of linear inequalities in m variables has no solution, show that AT y c has a subsystem (A )T y c of at most m + 1 inequalities having no solution. 1.18 Prove the LP fundamental theorem 1.14. 1.19 If (LP) and (LD) have a nondegenerate optimal basis AB , prove that the strict complementarity partition in Theorem 1.13 is P = B. 1.20 If Q is positive semi-denite, prove that x is an optimal solution for (QP) if and only if x is a KKT point for (QP). 1.21 Prove X S 0 if both X and S are positive semi-denite matrices. 1.22 Prove that two positive semi-denite matrices are complementary to each other, X S = 0, if and only if XS = 0. 1.23 Let both (LP) and (LD) for a given data set (A, b, c) have interior feasible points. Then consider the level set (z ) = {y : c AT y 0, z + bT y 0} where z < z and z designates the optimal objective value. Prove that (z ) is bounded and has an interior for any nite z < z , even Fd is unbounded.
26
1.24 Given an (LP) data set (A, b, c) and an interior feasible point x0 , nd the feasible direction dx (Adx = 0) that achieves the steepest decrease in the objective function.
m n 1.25 Given an (LP) data set (A, b, c) and a feasible point (x0 , y 0 , s0 ) (Rn + , R , R+ ) for the primal and dual, and ignoring the nonnegativity condition, write the systems of linear equations used to calculate the Newton steps for nding points that satisfy the optimality equations (1.2) and (1.3), respectively.
1.26 Show the optimality conditions for the minimizer y of (BD) in Section 1.5.5: (Q + I )y = b, and are necessary and sucient. 0, y 1, 0, (1 y ) = 0,
(Q + I )
Chapter 2
Semidenite Programming
2.0.1 Semi-denite programming (SDP)
Given C Mn , Ai Mn , i = 1, 2, ..., m, and b Rm , the semi-denite programming problem is to nd a matrix X Mn for the optimization problem: (SDP ) inf subject to C X Ai X = bi , i = 1, 2, ..., m, X
0.
Recall that the operation is the matrix inner product A B := trAT B. The notation X 0 means that X is a positive semi-denite matrix, and X 0 means that X is a positive denite matrix. If a point X 0 and satises all equations in (SDP), it is called a (primal) strictly or interior feasible solution. . The dual problem to (SDP) can be written as: (SDD) sup bT y m subject to i yi Ai + S = C, S 0,
which is analogous to the dual (LD) of LP. Here y Rm and S Mn . If a point (y, S 0) satises all equations in (SDD), it is called a dual interior feasible solution. Example 2.1 Let P (y Rm ) = C + i yi Ai , where C and Ai , i = 1, . . . , m, are given symmetric matrices. The problem of minimizing the max-eigenvalue of P (y ) can be cast as a (SDD) problem. In semi-denite programming, we minimize a linear function of a matrix in the positive semi-denite matrix cone subject to ane constraints. In contrast to the positive orthant cone of linear programming, the positive semi-denite 27
m
28
matrix cone is non-polyhedral (or non-linear), but convex. So positive semidenite programs are convex optimization problems. Semi-denite programming unies several standard problems, such as linear programming, quadratic programming, and convex quadratic minimization with convex quadratic constraints, and nds many applications in engineering, control, and combinatorial optimization. From Farkas lemma for linear programming, a vector y , with AT y 0 and T b y = 1, always exists and is called a (primal) infeasibility certicate for the system {x : Ax = b, x 0}. But this does not hold for matrix equations in the positive semi-denite matrix cone. Example 2.2 Consider A1 = and b= 1 0 0 0 , A2 = 0 2 X M2 +. 0 1 1 0
The problem is that {y : yi = Ai X, i = 1, 2, ..., m, X 0} is not a closed set. Similary, 0 1 1 0 C= and A1 = 1 0 0 0 makes C A1 y1 infeasible but it does not have an infeasible certicate. We have several theorems analogous to Farkas lemma. Theorem 2.1 (Farkas lemma in SDP) Let Ai Mn , i = 1, ..., m, have rank m m (i.e., i yi Ai = 0 implies y = 0) and b Rm . Then, there exists a symmetric matrix X 0 with Ai X = bi , i = 1, ..., m, if and only if
m i
yi Ai
0 and
m i
yi Ai = 0 implies bT y < 0.
Corollary 2.2 (Farkas lemma in SDP) Let Ai Mn , i = 1, ..., m, have rank m m m (i.e., i yi Ai = 0 implies y = 0) and C Mn . Then, i yi Ai C if and only if X 0, X = 0 and Ai X = 0, implies C X > 0. In other words, an X 0, X = 0, Ai X = 0,
m i
i = 1, ..., m,
i = 1, ..., m,
and C X 0 proves that yi Ai C is impossible. Note the dierence between the LP and SDP. The weak duality theorem for SDP is identical to that of (LP) and (LD).
29 Corollary 2.3 (Weak duality theorem in SDP) Let Fp and Fd , the feasible sets for the primal and dual, be non-empty. Then, C X bT y where X Fp , (y, S ) Fd . But we need more to make the strong duality theorem hold. Theorem 2.4 (Strong duality theorem in SDP) Let Fp and Fd be non-empty and at least one of them has an interior. Then, X is optimal for (PS) if and only if the following conditions hold: i) X Fp ; ii) there is (y, S ) Fd ; iii) C X = bT y or X S = 0. Again note the dierence between the above theorem and the strong duality theorem for LP. Example 2.3 The 0 1 C= 1 0 0 0 and b= following SDP has 0 0 0 , A1 = 0 0 0 a duality gap: 0 0 0 1 0 , A 1 = 1 0 0 0 0 10 . 1 0 0 0 0 2
Two positive semi-denite matrices are complementary to each other, X S = 0, if and only if XS = 0 (Exercise 1.22). From the optimality conditions, the solution set for certain (SDP) and (SDD) is Sp = {X Fp , (y, S ) Fd : C X bT y = 0}, or Sp = {X Fp , (y, S ) Fd : XS = 0}, which is a system of linear matrix inequalities and equations. In general, we have Theorem 2.5 (SDP duality theorem) If one of (SDP) or (SDD) has a strictly or interior feasible solution and its optimal value is nite, then the other is feasible and has the same optimal value. If one of (SDP) or (SDD) is unbounded then the other has no feasible solution. Note that a duality gap may exists if neither (SDP) nor (SDD) has a strictly feasible point. This is in contrast to (LP) and (LD) where no duality gap exists if both are feasible. Although semi-denite programs are much more general than linear programs, they are not much harder to solve. It has turned out that most interiorpoint methods for LP have been generalized to semi-denite programs. As in LP, these algorithms possess polynomial worst-case complexity under certain computation models. They also perform well in practice. We will describe such extensions later in this book.
30
2.1
2.1.1
Analytic Center
AC for polytope
Let be a bounded polytope in Rm represented by n (> m) linear inequalities, i.e., = {y Rm : c AT y 0}, where A Rmn and c Rn are given and A has rank m. Denote the interior of by
m T = {y R : c A y > 0}.
Dene d(y ) =
(cj aT j y ),
j =1
y ,
where a.j is the j th column of A. Traditionally, we let s := c AT y and call it a slack vector. Thus, the function is the product of all slack variables. Its logarithm is called the (dual) potential function,
n n
log(cj aT .j y ) =
j =1
log sj ,
(2.1)
and B (y ) is the classical logarithmic barrier function. For convenience, in what follows we may write B(s) to replace B(y ) where s is always equal to c AT y . Example 2.4 Let A = (1, 1) and c = (1; 1). Then the set of is the interval [1, 1]. Let A = (1, 1, 1) and c = (1; 1; 1). Then the set of is also the interval [1, 1]. Note that d(1/2) = (3/2)(1/2) = 3/4 and d (1/2) = (3/)(1/2)(1/2) = 3/8 and B (1/2) = log(3/8). The interior point, denoted by y a and sa = c AT y a , in that maximizes the potential function is called the analytic center of , i.e., B () := B (y a , ) = max log d(y, ).
y
and
B(1/2) = log(3/4),
(y a , sa ) is uniquely dened, since the potential function is strictly concave in a bounded convex . Setting B(y, ) = 0 and letting xa = (S a )1 e, the analytic center (y a , sa ) together with xa satisfy the following optimality conditions: Xs = Ax = AT y s = e 0 c. (2.2)
Note that adding or deleting a redundant inequality changes the location of the analytic center.
31
Example 2.5 Consider = {y R : y 0, y 1}, which is interval [0, 1]. The analytic center is y a = 1/2 with xa = (2, 2)T . Consider
n times
= {y R : y 0, , y 0, y 1}, which is, again, interval [0, 1] but y 0 is copied n times. The analytic center for this system is y a = n/(n + 1) with xa = ((n + 1)/n, , (n + 1)/n, (n + 1))T . The analytic center can be dened when the interior is empty or equalities are presented, such as = {y Rm : c AT y 0, By = b}. Then the analytic center is chosen on the hyperplane {y : By = b} to maximize the product of the slack variables s = c AT y . Thus, the interior of is not used in the sense that the topological interior for a set is used. Rather, it refers to the interior of the positive orthant of slack variables: Rn + := {s : s 0}. When say has an interior, we mean that
T Rn + {s : s = c A y for some y where By = b} = . n n Again Rn + := {s R+ : s > 0}, i.e., the interior of the orthant R+ . Thus, if
has only a single point y with s = c AT y > 0, we still say is not empty. Example 2.6 Consider the system = {x : Ax = 0, eT x = n, x 0}, which is called Karmarkars canonical set. If x = e is in then e is the analytic center of , the intersection of the simplex {x : eT x = n, x 0} and the hyperplane {x : Ax = 0} (Figure 2.1).
2.1.2
AC for SDP
m i
Let be a bounded convex set in Rm represented by n (> m) a matrix inequality, i.e., = {y Rm : C Let S = C
m i
yi Ai
0, }.
yi Ai and
m
yi Ai ).
m
(2.3)
a The interior point, denoted by y a and S a = C i yi Ai , in that maximizes the potential function is called the analytic center of , i.e.,
max B(y ).
y
32
x3
(0,0,3)
Ax=0
x1
x2
Figure 2.1: Illustration of the Karmarkar (simplex) polytope and its analytic center. (y a , S a ) is uniquely dened, since the potential function is strictly concave in a bounded convex . Setting B(y, ) = 0 and letting X a = (S a )1 , the analytic center (y a , S a ) together with X a satisfy the following optimality conditions: XS AX AT y S = I = 0 = C. (2.4)
2.2
We show how potential functions can be dened to solve linear programming problems and semi-denite programming. We assume that for a given LP data set (A, b, c), both the primal and dual have interior feasible point. We also let z be the optimal value of the standard form (LP) and (LD). Denote the feasible sets of (LP) and (LD) by Fp and Fd , respectively. Denote by F = Fp Fd , and the interior of F by F .
2.2.1
where z < z . Since both (LP) and (LD) have interior feasible point for given (A, b, c), (z ) is bounded and has an interior for any nite z , even := Fd is unbounded (Exercise 1.23). Clearly, (z ) , and if z2 z1 , (z2 ) (z1 ) and the inequality z + bT y is translated from z = z1 to z = z2 .
33
From the duality theorem again, nding a point in (z ) has a homogeneous primal problem minimize cT x zx0 s.t. Ax bx0 = 0, (x , x0 ) 0. For (x , x0 ) satisfying Ax bx0 = 0, (x , x0 ) > 0, let x := x /x0 F p , i.e., Ax = b, x > 0. Then, the primal potential function for (z ) (Figure 2.2), as described in the preceding section, is
n
P (x , (z )) = (n + 1) log(cT x zx0 )
j =0 n
log xj
= (n + 1) log(cT x z )
j =1
The latter, Pn+1 (x, z ), is the Karmarkar potential function in the standard LP form with a lower bound z for z .
b T y = bT y a
ya
bT y = z
The objective hyperplane
ya
Figure 2.2: Intersections of a dual feasible region and the objective hyperplane; bT y z on the left and bT y bT y a on the right. One algorithm for solving (LD) is suggested in Figure 2.2. If the objective hyperplane is repeatedly translated to the analytic center, the sequence of new
34
analytic centers will converge to an optimal solution and the potentials of the new polytopes will decrease to . As we illustrated before, one can represent (z ) dierently:
times
(z ) = {y : c AT y 0, z + bT y 0, , z + bT y 0},
(2.6)
i.e., z + bT y 0 is copied times. Geometrically, this representation does not change (z ), but it changes the location of its analytic center. Since the last inequalities in (z ) are identical, they must share the same slack value and the same corresponding primal variable. Let (x , x0 ) be the primal variables. Then the primal problem can be written as
times
minimize s.t.
c x zx0 zx0
times
Ax bx0 bx0 = 0, (x , x0 ) 0.
Let x = x /(x0 ) F p . Then, the primal potential function for the new (z ) given by (2.6) is
n
P (x, (z ))
= =
(n + ) log(cT x z (x0 ))
j =1 n
log xj log x0
(n + ) log(cT x z )
j =1
log xj + log
log xj
(2.7)
is an extension of the Karmarkar potential function in the standard LP form with a lower bound z for z . It represents the volume of a coordinate-aligned ellipsoid whose intersection with A(z) contains S(z) , where z + bT y 0 is duplicated times.
2.2.2
We can also develop a dual potential function, symmetric to the primal, for (y, s) F d Bn+ (y, s, z ) = (n + ) log(z bT y )
j =1 n
log sj ,
(2.8)
35
where z is a upper bound of z . One can show that it represents the volume of a coordinate-aligned ellipsoid whose intersection with the ane set {x : Ax = b} contains the primal level set
times
{x Fp : c x z 0, , cT x z 0}, where cT x z 0 is copied times (Exercise 2.1). For symmetry, we may write Bn+ (y, s, z ) simply by Bn+ (s, z ), since we can always recover y from s using equation AT y = c s.
2.2.3
A primal-dual potential function for linear programming will be used later. For x F p and (y, s) F d it is dened by
n
n+ (x, s) := (n + ) log(xT s)
j =1
log(xj sj ),
(2.9)
n+ (x, s) = = = Since
(n + ) log(cT x bT y )
j =1 n
log xj
j =1
log sj
Pn+ (x, bT y )
j =1 n
log sj log xj .
j =1
Bn+ (s, cT x)
then, for > 0, n+ (x, s) implies that xT s 0. More precisely, we have n+ (x, s) n log n xT s exp( ). We have the following theorem: Theorem 2.6 Dene the level set ( ) := {(x, y, s) F : n+ (x, s) }. i) ( 1 ) ( 2 ) if 1 2 .
36 ii)
( ) = {(x, y, s) F : n+ (x, s) < }. ) has non-empty interseciii) For every , ( ) is bounded and its closure ( tion with the solution set. Later we will show that a potential reduction algorithm generates sequences {x , y k , sk } F such that n+n (xk+1 , y k+1 , sk+1 ) n+n (xk , y k , sk ) .05 for k = 0, 1, 2, .... This indicates that the level sets shrink at least a constant rate independently of m or n.
k
2.2.4
The potential functions for SDP of Section 2.0.1 are analogous to those for LP. For given data, we assume that both (SDP) and (SDD) have interior feasible points. Then, for any X F p and (y, S ) F d , the primal potential function is dened by Pn+ (X, z ) := (n + ) log(C X z ) log det(X ), the dual potential function is dened by Bn+ (y, S, z ) := (n + ) log(z bT y ) log det(S ), where 0 and z designates the optimal objective value. For X F p and (y, S ) F d the primal-dual potential function for SDP is dened by n+ (X, S ) := = = = (n + ) log(X S ) log(det(X ) det(S )) (n + ) log(C X bT y ) log det(X ) log det(S ) Pn+ (X, bT y ) log det(S ) Bn+ (S, C X ) log det(X ),
z z;
z z,
where 0. Note that if X and S are diagonal matrices, these denitions reduce to those for LP. Note that we still have (Exercise 2.2) n+ (X, S ) = log(X S ) + n (X, S ) log(X S ) + n log n. Then, for > 0, n+ (X, S ) implies that X S 0. More precisely, we have n+ (X, S ) n log n X S exp( ). We also have the following corollary:
37
Corollary 2.7 Let (SDP) and (SDD) have non-empty interior and dene the level set ( ) := {(X, y, S ) F : n+ (X, S ) }. i) ( 1 ) ( 2 ) ii) ( ) = {(X, y, S ) F : n+ (X, S ) < }. ) has non-empty interseciii) For every , ( ) is bounded and its closure ( tion with the solution set.
if
1 2 .
2.3
Many interior-point algorithms nd a sequence of feasible points along a central path that connects the analytic center and the solution set. We now present this one of the most important foundations for the development of interior-point algorithms.
2.3.1
Consider a linear program in the standard form (LP) and (LD). Assume that F = , i.e., both F p = and F d = , and denote z the optimal objective value. The central path can be expressed as C= (x, y, s) F : Xs =
xT s e n
in the primal-dual form. We also see C = (x, y, s) F : n (x, s) = n log n . For any > 0 one can derive the central path simply by minimizing the primal LP with a logarithmic barrier function: (P )
minimize s.t.
cT x j =1 log xj Ax = b, x 0.
Let x() F p be the (unique) minimizer of (P). Then, for some y Rm it satises the optimality conditions Xs = Ax = AT y s = e b c. (2.10)
38
CHAPTER 2. SEMIDEFINITE PROGRAMMING Consider minimizing the dual LP with the barrier function: (D)
maximize s.t.
bT y + j =1 log sj AT y + s = c, s 0.
Let (y (), s()) F d be the (unique) minimizer of (D). Then, for some x Rn it satises the optimality conditions (2.10) as well. Thus, both minimizers x() and (y (), s()) are on the central path with x()T s() = n. Another way to derive the central path is to consider again the dual level set (z ) of (2.5) for any z < z (Figure 2.3). Then, the analytic center (y (z ), s(z )) of (z ) and a unique point (x (z ), x0 (z )) satises Ax (z ) bx0 (z ) = 0, X (z )s = e, s = c AT y, and x0 (z )(bT y z ) = 1. Let x(z ) = x (z )/x0 (z ), then we have Ax(z ) = b, X (z )s(z ) = e/x0 (z ) = (bT y (z ) z )e. Thus, the point (x(z ), y (z ), s(z )) is on the central path with = bT y (z ) z and cT x(z ) bT y (z ) = x(z )T s(z ) = n(bT y (z ) z ) = n. As we proved earlier in Section 2.2, (x(z ), y (z ), s(z )) exists and is uniquely dened, which imply the following theorem: Theorem 2.8 Let both (LP) and (LD) have interior feasible points for the given data set (A, b, c). Then for any 0 < < , the central path point (x(), y (), s()) exists and is unique.
ya
2.3. CENTRAL PATHS OF LP AND SDP Theorem 2.9 Let (x(), y (), s()) be on the central path.
39
i) The central path point (x(), s()) is bounded for 0 < 0 and any given 0 < 0 < . ii) For 0 < < , cT x( ) < cT x() and bT y ( ) > bT y ().
iii) (x(), s()) converges to an optimal solution pair for (LP) and (LD). Moreover, the limit point x(0)P is the analytic center on the primal optimal face, and the limit point s(0)Z is the analytic center on the dual optimal face, where (P , Z ) is the strict complementarity partition of the index set {1, 2, ..., n}. Proof. Note that (x(0 ) x())T (s(0 ) s()) = 0, since (x(0 ) x()) N (A) and (s(0 ) s()) R(AT ). This can be rewritten as
n j
2n.
Thus, x() and s() are bounded, which proves (i). We leave the proof of (ii) as an exercise. Since x() and s() are both bounded, they have at least one limit point which we denote by x(0) and s(0). Let x P (xZ = 0) and sZ (sP = 0), respectively, be the unique analytic centers on the primal and dual optimal faces: {xP : AP xP = b, xP 0} and {sZ : sZ = cZ AT Z y 0, cP AT P y = 0}. Again, we have
n s j x()j + xj s()j = n, j
or
j P
x j x()j
+
j Z
s j s()j
= n.
Thus, we have
x()j x j /n > 0, j P
and
s()j s j /n > 0, j Z .
j P
j Z
s j 1 s()j x()j
j Z
or
j P
x j
j Z
s()j .
s j
However, ( j P x j )( j Z sj ) is the maximal value of the potential function over all interior point pairs on the optimal faces, and x(0)P and s(0)Z is one interior point pair on the optimal face. Thus, we must have
j P
x j
j Z
= s j
j P
x(0)j
j Z
s(0)j .
Therefore,
x(0)P = x P
and s(0)Z = s Z ,
since x P and sZ are the unique maximizer pair of the potential function. This also implies that the limit point of the central path is unique.
Xs e
and =
xT s n
= | min(0, min(x))|.
2.3. CENTRAL PATHS OF LP AND SDP Theorem 2.10 Let (x, y, s) N ( ) for constant 0 < < 1. i) The N ( ) {(x, y, s) : xT s n0 } is bounded for any given 0 < .
41
ii) Any limit point of N ( ) as 0 is an optimal solution pair for (LP) and (LD). Moreover, for any j P xj (1 )x j , n (1 )s j , n
2.3.2
Consider a SDP problem in Section 2.0.1 and Assume that F = , i.e., both F p = and F d = . The central path can be expressed as C = (X, y, S ) F : XS = I, 0 < < , or a symmetric form C = (X, y, S ) F : X .5 SX .5 = I, 0 < < ,
n .5 .5 where X .5 Mn = X. + is the square root matrix of X M+ , i.e., X X We also see C = (X, y, S ) F : n (X, S ) = n log n .
When X and S are diagonal matrices, this denition is identical to LP. We also have the following corollary: Corollary 2.11 Let both (SDP) and (SDD) have interior feasible points. Then for any 0 < < , the central path point (X (), y (), S ()) exists and is unique. Moreover, i) the central path point (X (), S ()) is bounded where 0 < 0 for any given 0 < 0 < . ii) For 0 < < , C X ( ) < C X () and bT y ( ) > bT y ().
iii) (X (), S ()) converges to an optimal solution pair for (SDP) and (SDD), and the rank of the limit of X () is maximal among all optimal solutions of (SDP) and the rank of the limit S () is maximal among all optimal solutions of (SDD).
42
2.4
Notes
2.5
Exercises
2.1 Let (LP) and (LD) have interior. Prove the dual potential function Bn+1 (y, s, z ), where z is a upper bound of z , represents the volume of a coordinate-aligned ellipsoid whose intersection with the ane set {x : Ax = b} contains the primal level set {x Fp : cT x z }. 2.2 Let X, S Mn be both positive denite. Then prove n (X, S ) = n log(X S ) log(det(X ) det(S )) n log n. 2.3 Consider linear programming and the level set ( ) := {(X, y, S ) F : n+ (x, s) }. Prove that ( 1 ) ( 2 ) if 1 2 ,
) has non-empty intersection and for every ( ) is bounded and its closure ( with the solution set. 2.4 Prove (ii) of Theorem 2.9. 2.5 Prove Theorem 2.10. 2.6 Prove Corollary 2.11. Here we assume that X () = X ( ) and y () = y (mu ).
Chapter 3
Interior-Point Algorithms
This pair of semidenite programs can be solved in polynomial time. There are actually several polynomial algorithms. One is the primal-scaling algorithm, and it uses only X to generate the iterate direction. In other words, X k+1 S k+1 = Fp (X k ),
where Fp is the primal algorithm iterative mapping. Another is the dual-scaling algorithm which is the analogue of the dual potential reduction algorithm for linear programming. The dual-scaling algorithm uses only S to generate the new iterate: X k+1 S k+1 = Fd (S k ),
where Fd is the dual algorithm iterative mapping. The third is the primal-dual scaling algorithm which uses both X and S to generate the new iterate and references therein): X k+1 S k+1 = Fpd (X k , S k ),
where Fpd is the primal-dual algorithm iterative mapping. All these algorithms generate primal and dual iterates simultaneously, and possess O( n ln(1/ )) iteration complexity to yield the duality gap accuracy . Other scaling algorithms have been proposed in the past. For example, an SDP equivalent of Dikins ane-scaling algorithm could be very fast. However this algorithm may not even converge. Recall that Mn denotes the set of symmetric matrices in Rnn . Let Mn + denote the set of positive semi-denite matrices and Mn + the set of positive denite matrices in Mn . The goal of this section is to extend interior-point algorithms to solving the positive semi-denite programming problem (SDP) and (SDD) presented in Section 2.0.1. 43
44
(SDP) and (SDD) are analogues to linear programming (LP) and (LD). In fact, as the notation suggest, (LP) and (LD) can be expressed as a positive semi-denite program by dening C = diag(c), Ai = diag(ai ), b = b,
where ai is the ith row of matrix A. Many of the theorems and algorithms used in LP have analogues in SDP. However, while interior-point algorithms for LP are generally considered competitive with the simplex method in practice and outperform it as problems become large, interior-point methods for SDP outperform other methods on even small problems. Denote the primal feasible set by Fp and the dual by Fd . We assume that both F p and F d are nonempty. Thus, the optimal solution sets for both (SDP ) and (SDD) are bounded and the central path exists, see Section 2.3 Let z denote the optimal value and F = Fp Fd . In this section, we are interested in nding an approximate solution for the SDP problem: C X bT y = S X . For simplicity, we assume that a central path pair (X 0 , y 0 , S 0 ), which satises (X 0 ).5 S 0 (X 0 ).5 = 0 I
and
0 = X 0 S 0 /n,
is known. We will use it as our initial point throughout this section. Let X F p , (y, S ) F d , and z z . Then consider the primal potential function P (X, z ) = (n + ) log(C X z ) log det X, and the primal-dual potential function (X, S ) = (n + ) log(S X ) log det XS, where = n. Let z = bT y . Then S X = C X z , and we have (x, s) = P (x, z ) log det S. Like in Chapter 4, these functions will be used to solve SDP problems. Dene the -norm,, which is the traditional l2 operator norm for matrices, of Mn by X := max {|j (X )|},
j {1,...,n}
where j (X ) is the j th eigenvalue of X , and the Euclidean or l2 norm, which is the traditional Frobenius norm, by X := X =
n
X X =
j =1
(j (X ))2 .
45
We rename these norms because they are perfect analogues to the norms of vectors used in LP. Furthermore, note that, for X Mn ,
n n
tr(X ) =
j =1
j (X )
and
det(I + X ) =
(1 + j (X )).
j =1
Then, we have the following lemma which resembles Lemma 3.1. We rst prove two important lemmas. Lemma 3.1 If d Rn such that d
n
eT d
i=1
log(1 + di ) eT d
3.1
Let (x, y, s) F . Then consider the primal-dual potential function: n+ (x, s) = (n + ) log(xT s)
j =1
log(xj sj ),
log sj .
Recall that when = 0, n+ (x, s) is minimized along the central path. However, when > 0, n+ (x, s) means that x and s converge to the optimal face, and the descent gets steeper as increases. In this section we choose = n. The process calculates steps for x and s, which guarantee a constant reduction in the primal-dual potential function. As the potential function decreases, both x and s are forced to an optimal solution pair. Consider a pair of (xk , y k , sk ) F . Fix z k = bT y k , then the gradient vector of the primal potential function at xk is Pn+ (xk , z k ) = ( n + ) (n + ) c (X k )1 e = T k c (X k )1 e. k T k (s ) x c x zk
We directly solve the ball-constrained linear problem for direction dx : minimize s.t. Pn+ (xk , z k )T dx Adx = 0, (X k )1 dx .
46
Let the minimizer be dx . Then d x = where pk = p(z k ) :== (I X k AT (A(X k )2 AT )1 AX k )X k Pn+ (xk , z k ). Update xk+1 = xk + dx = xk and, Pn+ (xk+1 , z k ) Pn+ (xk , z k ) pk + Here, we have used the fact Pn+ (xk+1 , z k ) Pn+ (xk , z k ) n+ (X k )1 dx 2 T T k 1 c d e ( X ) d + x x cT xk z k 2(1 (X k )1 dx (X k )1 dx 2 = P (xk , z k )T dx + 2(1 (X k )1 dx ) 2 = pk + 2(1 )
)
X k pk , pk
X k pk , pk 2 . 2(1 )
(3.1)
Thus, as long as pk > 0, we may choose an appropriate such that Pn+ (xk+1 , z k ) Pn+ (xk , z k ) for some positive constant . By the relation between n+ (x, s) and Pn+ (x, z ), the primal-dual potential function is also reduced. That is, n+ (xk+1 , sk ) n+ (xk , sk ) . However, even if pk is small, we will show that the primal-dual potential function can be reduced by a constant by increasing z k and updating (y k , sk ). We focus on the expression of pk , which can be rewritten as pk = (I X k AT (A(X k )2 AT )1 AX k )( = where s(z k ) = c AT y (z k ) (3.3) ( n + ) X k s(z k ) e, cT xk z k (n + ) X k c e) cT xk z k (3.2)
47
(3.4)
Regarding pk = p(z k ) , we have the following lemma: Lemma 3.3 Let k = If p(z k ) < min then the following three inequalities hold: s(z k ) > 0, X k s(z k ) e < , and < (1 .5/ n)k . (3.6) n ,1 , n + 2 (3.5) cT xk z k (xk )T sk = n n and = (xk )T s(z k ) . n
Proof. The proof is by contradiction. i) If the rst inequality of (3.6) is not true, then j such that sj (z k ) 0 and p(z k ) 1 (n + ) xj sj (z k ) 1. nk
ii) If the second inequality of (3.6) does not hold, then p(z k )
2
= =
where the last relation prevails since the quadratic term yields the minimum at n (n + ) = . nk n + 2 iii) If the third inequality of (3.6) is violated, then (n + ) 1 . 5 (1 + )(1 ) 1, nk n n
48
The lemma says that, when p(z k ) is small, then (xk , y (z k ), s(z k )) is in the neighborhood of the central path and bT y (z k ) > z k . Thus, we can increase z k to bT y (z k ) to cut the dual level set (z k ). We have the following potential reduction theorem to evaluate the progress.
Theorem 3.4 Given (xk , y k , sk ) F . Let = n, z k = bT y k , xk+1 be given k+1 k k+1 by (3.1), and y = y (z ) in (3.4) and s = s(z k ) in (3.3). Then, either
n+ (xk+1 , sk ) n+ (xk , sk ) or n+ (xk , sk+1 ) n+ (xk , sk ) where > 1/20. Proof. If (3.5) does not hold, i.e., p(z k ) min then Pn+ (xk+1 , z k ) Pn+ (xk , z k ) min n 2 , 1 + , n + 2 2(1 ) n ,1 , n + 2
hence from the relation between Pn+ and n+ , n+ (xk+1 , sk ) n+ (xk , sk ) min 2 n , 1 + . n + 2 2(1 )
Otherwise, from Lemma 3.3 the inequalities of (3.6) hold: i) The rst of (3.6) indicates that y k+1 and sk+1 are in F d .
49
ii) Using the second of (3.6) and applying Lemma 3.1 to vector X k sk+1 /, we have
n
n log(xk )T sk+1
j =1 n
k+1 log(xk ) j sj
= n log n
j =1
k+1 /) log(xk j sj
n log n +
n log(xk )T sk
j =1
k log(xk j sj ) +
2 . 2(1 )
iii) According to the third of (3.6), we have n(log(xk )T sk+1 log(xk )T sk ) = n log . k 2
Adding the two inequalities in ii) and iii), we have n+ (xk , sk+1 ) n+ (xk , sk ) 2 + . 2 2(1 )
Theorem 3.4 establishes an important fact: the primal-dual potential function can be reduced by a constant no matter where xk and y k are. In practice, one can perform the line search to minimize the primal-dual potential function. This results in the following primal-dual potential reduction algorithm. Algorithm 3.1 Given a central path point (x0 , y 0 , s0 ) F . Let z 0 = bT y 0 . Set k := 0. While (sk )T xk do 1. Compute y1 and y2 from (3.4). 2. If there exists z such that s(z ) > 0, compute z = arg min n+ (xk , s(z )),
z
and if n+ (xk , s( z )) < n+ (xk , sk ) then y k+1 = y ( z ), sk+1 = s( z ) and z k+1 = bT y k+1 ; otherwise, y k+1 = y k , sk+1 = sk and z k+1 = z k .
50
CHAPTER 3. INTERIOR-POINT ALGORITHMS 3. Let xk+1 = xk X k p(z k+1 ) with = arg min n+ (xk X k p(z k+1 ), sk+1 ).
0
4. Let k := k + 1 and return to Step 1. The performance of the algorithm results from the following corollary: Corollary 3.5 Let = bT y 0 )/ ) iterations with n. Then, Algorithm 3.1 terminates in at most O( n log(cT x0 cT xk bT y k . Proof. In O( n log((x0 )T s0 / )) iterations n log((x0 )T s0 / ) = n+ (xk , sk ) n+ (x0 , s0 ) n log(xk )T sk + n log n n+ (x0 , s0 ) = n log((xk )T sk /(x0 )T s0 ). Thus, i.e., cT xk bT y k = (xk )T sk . n log(cT xk bT y k ) = n log(xk )T sk n log ,
3.2
Another technique for solving linear programs is the symmetric primal-dual algorithm. Once we have a pair (x, y, s) F with = xT s/n, we can generate a new iterate x+ and (y + , s+ ) by solving for dx , dy and ds from the system of linear equations: Sdx + Xds = e Xs, Adx = 0, (3.8) AT dy ds = 0. Let d := (dx , dy , ds ). To show the dependence of d on the current pair (x, s) T T and the parameter , we write d = d(x, s, ). Note that dT x ds = dx A dy = 0 here. The system (3.8) is the Newton step starting from (x, s) which helps to nd the point on the central path with duality gap n, see Section 2.3.1. If = 0, it steps toward the optimal solution characterized by the system of equations (1.2); if = 1, it steps toward the central path point (x(), y (), s()) characterized by the system of equations (2.10); if 0 < < 1, it steps toward a central path point with a smaller complementarity gap. In the algorithm presented in
51
this section, we choose = n/(n + ) < 1. Each iterate reduces the primaldual potential function by at least a constant , as does the previous potential reduction algorithm. To analyze this algorithm, we present the following lemma, whose proof is omitted. Lemma 3.6 Let the direction d = (dx , dy , ds ) be generated by equation (3.8) with = n/(n + ), and let = min(Xs) Xs) , (3.9)
xT s (XS )1/2 ( (n +) e
and
s+ = s + ds .
Then, we have (x+ , y + , s+ ) F and n+ (x+ , s+ ) n+ (x, s) min(Xs) (XS )1/2 (e (n + ) 2 Xs ) + . xT s 2(1 )
Let v = Xs. Then, we can prove the following lemma (Exercise 3.3): Lemma 3.7 Let v Rn be a positive vector and n. Then, min(v ) V 1/2 (e (n + ) v) eT v 3/4 .
for a constant . This result will provide a competitive theoretical iteration bound, but a faster algorithm may be again implemented by conducting a line search along direction d to achieve the greatest reduction in the primal-dual potential function. This leads to Algorithm 3.2 Given (x0 , y 0 , s0 ) F . Set While (sk )T xk do
n and k := 0.
1. Set (x, s) = (xk , sk ) and = n/(n + ) and compute (dx , dy , ds ) from (3.8).
52
CHAPTER 3. INTERIOR-POINT ALGORITHMS 2. Let xk+1 = xk + dx , y k+1 = y k + dy , and sk+1 = sk + ds where = arg min n+ (xk + dx , sk + ds ).
0
3. Let k := k + 1 and return to Step 1. Theorem 3.8 Let = O( n). Then, Algorithm 3.2 terminates in at most O( n log((x0 )T s0 / )) iterations with cT xk bT y k .
3.3
Consider a pair of (X k , y k , S k ) F . Fix z k = bT y k , then the gradient matrix of the primal potential function at X k is P (X k , z k ) = n+ C (X k )1 . Sk X k
< 1. Then,
Mn +
and P (X, z k ) P (X k , z k ) P (X k , z k ) (X X k ) + (X k ).5 (X X k )(X k ).5 2 2(1 (X k ).5 (X X k )(X k ).5 A1 A2 A= ... . Am A1 X A2 X = b, AX = ... Am X
m )
Let
Then, dene
and AT y =
y i Ai .
i=1
Then, we directly solve the following ball-constrained problem: minimize s.t. P (X k , z k ) (X X k ) A(X X k ) = 0, (X k ).5 (X X k )(X k ).5 < 1.
53
Let X = (X k ).5 X (X k ).5 . Note that for any symmetric matrices Q, T Mn and X Mn +, Q X .5 T X .5 = X .5 QX .5 T and XQ Then we transform the above problem into minimize s.t. where (X k ).5 P (X k , z k )(X k ).5 (X I ) A (X I ) = 0, i = 1, 2, ..., i, X I , .
= QX
= X .5 QX .5 .
Let the minimizer be X and let X k+1 = (X k ).5 X (X k ).5 . Then X I = X k+1 X k = where Pk = = or Pk = and yk = PA (X k ).5 P (X k , z k )(X k ).5 (X k ).5 P (X k , z k )(X k ).5 A y k
T
, (3.10)
(X k ).5 P k (X k ).5 , Pk
Here, PA
Sk X k T (A A )1 A (X k ).5 P (X k , z k )(X k ).5 . n+ is the projection operator onto the null space of A , and A1 A1 A1 A2 ... A1 Am A2 A1 A2 A2 ... A2 Am T Mm . A A := ... ... ... ... Am A1 Am A2 ... Am Am P (X k , z k ) (X k ).5 P k (X k ).5 Pk (X k ).5 P (X k , z k )(X k ).5 P k Pk Pk 2 = P k , Pk
54 we have
P (X k+1 , z k ) P (X k , z k ) P k +
2 . 2(1 )
Thus, as long as P k > 0, we may choose an appropriate such that P (X k+1 , z k ) P (X k , z k ) for some positive constant . Now, we focus on the expression of P k , which can be rewritten as P (z k ) := P k = with and y (z k ) := y k = y2 where y1 and y2 are given by y1 y2 = (A A )1 A I = (A A )1 b, T = (A A )1 A (X k ).5 C (X k ).5 .
T T
(3.11)
S (z k ) = C AT y (z k ) Sk X k C X k zk y1 = y2 y1 , n+ n+
(3.12) (3.13)
(3.14)
Regarding P k = P (z k ) , we have the following lemma resembling Lemma 3.3. Lemma 3.10 Let k = If P (z k ) < min then the following three inequalities hold: S (z k ) 0, (X k ).5 S (z k )(X k ).5 e < , and < (1 .5/ n)k . (3.16) n ,1 , n + 2 (3.15) Sk X k C X k zk = n n and = S (z k ) X k . n
Proof. The proof is by contradiction. For example, if the rst inequality of (3.16) is not true, then (X k ).5 S (z k )(X k ).5 has at least one eigenvalue less than or equal to zero, and P (z k ) 1. The proof of the second and third inequalities are similar to that of Lemma 3.3. Based on this lemma, we have the following potential reduction theorem.
55
Theorem 3.11 Given X k F p and (y k , S k ) F d , let = n, z k = bT y k , X k+1 be given by (3.10), and y k+1 = y (z k ) in (3.13) and S k+1 = S (z k ) in (3.12). Then, either
(X k+1 , S k ) (X k , S k ) or where > 1/20. Proof. If (3.15) does not hold, i.e., P (z k ) min n ,1 , n + 2 (X k , S k+1 ) (X k , S k ) ,
Otherwise, from Lemma 3.10 the inequalities of (3.16) hold: i) The rst of (3.16) indicates that y k+1 and S k+1 are in F d . ii) Using the second of (3.16) and applying Lemma 3.2 to matrix (X k ).5 S k+1 (X k ).5 /, we have n log S k+1 X k log det S k+1 X k = n log S k+1 X k / log det(X k ).5 S k+1 (X k ).5 / = n log n log det(X k ).5 S k+1 (X k ).5 / (X k ).5 S k+1 (X k ).5 / I 2 n log n + 2(1 (X k ).5 S k+1 (X k ).5 / I ) 2 n log n + 2(1 ) 2 n log S k X k log det S k X k + . 2(1 ) iii) According to the third of (3.16), we have n(log S k+1 X k log S k X k ) = n log . k 2
Adding the two inequalities in ii) and iii), we have (X k , S k+1 ) (X k , S k ) 2 + . 2 2(1 )
56
Theorem 3.11 establishes an important fact: the primal-dual potential function can be reduced by a constant no matter where X k and y k are. In practice, one can perform the line search to minimize the primal-dual potential function. This results in the following potential reduction algorithm. Algorithm 3.3 Given x0 F p and (y 0 , s0 ) F d . Let z 0 = bT y 0 . Set k := 0. While S k X k do 1. Compute y1 and y2 from (3.14). 2. Set y k+1 = y ( z ), S k+1 = S ( z ), z k+1 = bT y k+1 with z = arg min (X k , S (z )).
z z k
If (X k , S k+1 ) > (X k , S k ) then y k+1 = y k , S k+1 = S k , z k+1 = z k . 3. Let X k+1 = X k (X k ).5 P (z k+1 )(X k ).5 with = arg min (X k (X k ).5 P (z k+1 )(X k ).5 , S k+1 ).
0
4. Let k := k + 1 and return to Step 1. The performance of the algorithm results from the following corollary: Corollary 3.12 Let = n. Then, Algorithm 3.3 terminates in at most O( n log(C X 0 bT y 0 )/ ) iterations with C X k bT y k . Proof. In O( n log(S 0 X 0 / )) iterations n log(S 0 X 0 / ) = (X k , S k ) (X 0 , S 0 ) n log S k X k + n log n (X 0 , S 0 ) n log(S k X k /S 0 X 0 ). =
Thus, i.e.,
n log(C X k bT y k ) =
n log S k X k
n log ,
C X k bT y k = S k X k .
57
3.4
Once we have a pair (X, y, S ) F with = S X/n, we can apply the primaldual Newton method to generate a new iterate X + and (y + , S + ) as follows: Solve for dX , dy and dS from the system of linear equations: D1 dX D1 + dS AdX AT dy dS where D = X .5 (X .5 SX .5 ).5 X .5 . Note that dS dX = 0. This system can be written as dX + dS A dX T A dy dS where dX = D.5 dX D.5 , and dS = D.5 dS D.5 , R = D.5 (X 1 S )D.5 , . = R, = 0, = 0, (3.18) = R := X 1 S, = 0, = 0, (3.17)
Again, we have dS dX = 0, and dy = (A A )1 A R , dS = A dy , and dX = R dS . Then, assign dS = AT dy Let and dX = D(R dS )D.
T T
V 1/2 = D.5 XD.5 = D.5 SD.5 Mn + . Then, we can verify that S X = I V . We now present the following lemma, whose proof is very similar to that for LP and 3.6 and will be omitted. Lemma 3.13 Let the direction dX , dy and dS be generated by equation (3.17) with = n/(n + ), and let = V
1/2 I V 1 /2 n+ V
V 1 /2
(3.19)
58
and
S + = S + dS .
Applying Lemma 3.7 to v Rn as the vector of the n eigenvalues of V , we can prove the following lemma: Lemma 3.14 Let V Mn + and
n. Then, 3/4.
for a constant . This leads to Algorithm 3.4. Algorithm 3.4 Given (X 0 , y 0 , S 0 ) F . Set = While S k X k do
n and k := 0.
1. Set (X, S ) = (X k , S k ) and = n/(n + ) and compute (dX , dy , dS ) from (3.17). 2. Let X k+1 = X k + dX , y k+1 = y k + dy , and S k+1 = S k + dS , where = arg min (X k + dX , S k + dS ).
0
3. Let k := k + 1 and return to Step 1. Theorem 3.15 Let = n. Then, Algorithm 3.4 terminates in at most O( n log(S 0 X 0 / )) iterations with C X k bT y k .
59
3.5
An open question is how to exploit the sparsity structure by polynomial interiorpoint algorithms so that they can also solve large-scale problems in practice. In this paper we try to respond to this question. We show that many large-scale semidenite programs arisen from combinatorial and quadratic optimization have features which make the dual-scaling interior-point algorithm the most suitable choice: 1. The computational cost of each iteration in the dual algorithm is less that the cost the primal-dual iterations. Although primal-dual algorithms may possess superlinear convergence, the approximation problems under consideration require less accuracy than some other applications. Therefore, the superlinear convergence exhibited by primal-dual algorithms may not be utilized in our applications. The dual-scaling algorithm has been shown to perform equally well when only a lower precision answer is required. 2. In most combinatorial applications, we need only a lower bound for the optimal objective value of (SDP). Solving (SDD) alone would be sucient to provide such a lower bound. Thus, we may not need to generate an X at all. Even if an optimal primal solution is necessary, our dual-scaling algorithm can generate an optimal X at the termination of the algorithm with little additional cost. 3. For large scale problems, S tends to be very sparse and structured since it is the linear combination of C and the Ai s. This sparsity allows considerable savings in both memory and computation time. The primal matrix, X , may be much less sparse and have a structure unknown beforehand. Consequently, primal and primal-dual algorithms may not fully exploit the sparseness and structure of the data. These problems include the semidenite relaxations of the graph-partition problem, the box-constrained quadratic optimization problem, the 0 1 integer set covering problem, etc. The dual-scaling algorithm, which is a modication of the dual-scaling linear programming algorithm, reduces the Tanabe-Todd-Ye primal-dual potential function (X, S ) = ln(X S ) ln det X ln det S. The rst term decreases the duality gap, while the second and third terms keep X and S in the interior of the positive semidenite matrix cone. When > n, the inmum of the potential function occurs at an optimal solution. Also note that, using the arithmetic-geometric mean inequality, we have n ln(X S ) ln det X ln det S n ln n.
60
Since A(X )T y = AT : m S n is
m i=1
yi (Ai X ) = ( A (y ) =
i=1 T
yi Ai .
Let z = C X for some feasible X and consider the dual potential function (y, z ) = ln( z bT y ) ln det S. Its gradient is (y, z ) = b + A(S 1 ). z bT y 0 and (3.20)
(S k ).5 AT (y y k ) (S k ).5 < 1, using the above lemma, the concavity of the rst term in the potential function, and the fact that (S k ).5 S (S k ).5 I = (S k ).5 (S S k )(S k ).5 = (S k ).5 AT (y y k ) (S k ).5 , we establish an overestimater for the potential reduction: (y, z k ) (y k , z k ) k T = ln( z b y ) ln( z k bT y k ) ln det((S k ).5 S (S k ).5 ) k T ln( z b y ) ln( z k bT y k ) + trace((S k ).5 S (S k ).5 I ) k .5 T k (S ) (A (yy ))(S k ).5 + 2(1 (S k ).5 (AT (yyk ))(S k ).5 ) = ln( z k bT y ) ln( z k bT y k ) + A((S k )1 )T (y y k ) k .5 T k (S ) (A (yy ))(S k ).5 + 2(1 (S k ).5 (AT (yyk ))(S k ).5 ) (S k ).5 (AT (y y k ))(S k ).5 ( y k , z k )T (y y k ) + 2(1 (S k ).5 (AT (yyk ))(S k ).5 ) .
(3.21)
Therefore, beginning with a strictly feasible dual point (y k , S k ) and upper bound z k , each iteration solves this following problem. Minimize Subject to T (y k , z k )(y y k ) k .5 (S ) AT (y y k ) (S k ).5 , (3.22)
where is a positive constant less than 1. For simplicity, in what follows we let k = z k bT y k .
61
The rst order Karusch-Kuhn-Tucker conditions state that the minimum point, y k+1 , of this convex problem satises M k (y k+1 y k )+ (y k , z k ) = M k (y k+1 y k )+ ( k b+A((S k )1 )) = 0 z bT y k (3.23) for a positive value of , where A1 (S k )1 (S k )1 A1 A1 (S k )1 (S k )1 Am . . .. . . Mk = . . . Am (S k )1 (S k )1 A1 and Am (S k )1 (S k )1 Am A1 (S k )1 . . A((S k )1 ) = . . Am (S k )1
The matrix M k is a Gram matrix and is positive denite when S k 0 and the Ai s are linearly independent. In this paper, it will sometimes be referred to as M. Using the ellipsoidal constraint, the minimal solution, y k+1 , of (3.22) is given by y k+1 y k = d( z k )y (3.24) T k k (y , z )(M k )1 (y k , z k ) where d( z k )y = (M k )1 (y k , z k ). (3.25) Unlike linear programming, semidenite programs require a signicant amount of the time to compute the system of equations used to to determine the step direction. For arbitrary symmetric matrices Ai , the AHO direction can be computed in 5nm3 + n2 m2 + O(max{m, n}3 ) operations, the HRVW/KSH/M direction uses 2nm3 + n2 m2 + O(max{m, n}3 ) operations, and the NT direction uses nm3 + n2 m2 /2 + O(max{m, n}3 ) operations. The complexity of computing the matrix is a full order of magnitude higher than any other step of the algorithm. Fujisawa, Kojima and Nakata explored another technique for computing primal-dual step directions that exploit the sparsity of the data matrices. However, it is our belief that only the dual-scaling algorithm can fully exploit the structure and sparsity of many problems, as explained below. k Generally, Mij = Ai (S k )1 (S k )1 Aj . When Ai = ai aT i , the Gram matrix can be rewritten in the form T k 1 2 k 1 (a1 (S ) a1 ) (aT am )2 1 (S ) . . .. . . Mk = (3.26) . . .
k 1 (aT a1 )2 m (S )
k 1 (aT am )2 m (S )
and
k 1 aT a1 1 (S ) . . A((S k )1 ) = . . T k 1 am (S ) am
62
This matrix can be computed very quickly without computing, or saving, (S k )1 . Instead, S k can be factored, and then we can use
63
Algorithm M: To compute M k and A((S k )1 ), factor S k = Lk (Lk )T and do the following: For i = 1 : m; Solve Lk wi = ai ;
T k = (A((S k )1 )i )2 ; wi and Mii A((S k )1 )i = wi
For j = 1 : m 1; end.
T k wj )2 ; = (wi Mij
end;
Solving each of the m systems of equations uses n2 + O(n) oating point operations. Since there are m(m + 1)/2 vector multiplications, Algorithm M, uses nm2 + n2 m + O(nm) operations after factoring S k . Note that these operations can be signicantly reduced if S k is structured and sparse. In applications like the maximum cut problem, the matrix S k is indeed very sparse while its inverse is usually dense, so working with S k is faster than working with its inverse. Using matrices of the form Ai = ai aT i also reduces the complexity of primal-dual algorithms by a factor of n, but even the quickest direction to compute takes about twice as long as our dual-scaling direction. Furthermore, they all need to handle dense X . Algorithm M needs to store all vectors w1 , ..., wm and they are generally dense. To save storage and exploit the sparsity of ai , ..., am , an alternative algorithm is Algorithm M: To compute M k and A((S k )1 ), factor S k = Lk (Lk )T and do the following: For i = 1 : m; Solve S k wi = ai ;
k T = (A((S k )1 )i )2 ; ai and Mii A((S k )1 )i = wi
For j = i + 1 : m; end.
k T Mij = (wi aj )2 ;
end;
Algorithm M does not need to store wj but uses one more back-solver for wi . To nd a feasible primal point X , we solve the least squares problem Minimize Subject to (S k ).5 X (S k ).5 A(X ) = b.
k I
(3.27)
This problem looks for a matrix X ( z k ) near the central path. Larger values of generally give a lower objective value, but provide a solution matrix that is not positive denite more frequently. The answer to (3.27) is a by-product of computing (3.25), given explicitly by X ( zk ) = k k 1 T A (d( z k )y ) + S k (S k )1 . (S )
(3.28)
64
Creating the primal matrix may be costly. However, the evaluation of the primal objective value C X ( z k ) requires drastically less work. C X ( zk ) = bT y k + X ( zk ) S k = bT y k + trace = bT y k + = bT y k +
k AT (d( z k )y ) trace (S ) k k 1 k T d( z )y A((S ) ) + n k k 1 (S ) k 1
AT (d( z k )y ) + S k (S k )1 S k +I
Since the vectors A((S k )1 ) and d( z k )y were previously found in calculating the dual step direction, the cost of computing a primal objective value is the cost of a vector dot product! The matrix X ( z k ) never gets computed during the iterative process, saving time and memory. On the other hand, primal-dual methods require far more resources to compute the primal variables X . Dening P ( z k ) = k (S k ).5 X ( z k )(S k ).5 I, (3.29) we have the following lemma:
k Lemma 3.16 Let = n + n, and < 1. If k n
z k bT y k , n
X ( z k ) S k n
C X ( z k )bT y k , n
n , 1 ), n + 2
(3.30)
2. (S k ).5 X ( z k )(S k ).5 I ; 3. (1 .5/ n)k . Proof. The proofs are by contradiction. If the rst inequality is false, then (S k ).5 X ( z k )(S k ).5 has at least one nonpositive eigenvalue, which by (3.29) implies that P ( z k ) 1. If the second does not hold, then P ( zk )
2
= = = >
where the last inequality is true because the quadratic term has a minimum at n = n+ 2 . nk
3.5. DUAL ALGORITHM FOR SDP If the third inequality does not hold, then > nk which leads to P ( zk )
2
65
1 1+ n
.5 1 n
1.
1 nk 1 1+ n 1
n 1 2 n
2 2
2 2 n
(1 )2 .
y k+1 y k
(S k ).5
T (y k , z k )d( z k )y = P ( zk ) and
(3.31) (3.32)
Updating the dual variables according to y k+1 = y k + and S k+1 = C AT (y k+1 ), (3.33)
assures the positive deniteness of S k+1 when < 1, which assures that they are feasible. Using (3.32) and (3.21), the reduction in the potential function satises the inequality (y k+1 , z k ) (y k , z k ) P ( zk ) + 2 . 2(1 ) (3.34)
The theoretical algorithm can be stated as follows. 0 0 DUAL ALGORITHM. Given an upper bound z 0 and a dual point (y , S ) 0 T 0 such that S = C A y 0, set k = 0, > n + n, (0, 1), and do the following: while z k bT y k do begin
66
CHAPTER 3. INTERIOR-POINT ALGORITHMS 1. Compute A((S k )1 ) and the Gram matrix M k (3.26) using Algorithm M or M. 2. Solve (3.25) for the dual step direction d( z k )y . 3. Calculate P ( z k ) using (3.31). 4. If (3.30) is true, then X k+1 = X ( z k ), z k+1 = C X k+1 , and (y k+1 , S k+1 ) = k k (y , S ); else y k+1 = y k + and z k+1 = z k . endif 5. k := k + 1.
P ( zk )
end We can derive the following potential reduction theorem based on the above lemma: Theorem 3.17 (X k+1 , S k+1 ) (X k , S k ) where > 1/50 for a suitable . Proof. (X k+1 , S k+1 )(X k , S k ) = (X k+1 , S k+1 ) (X k+1 , S k ) + (X k+1 , S k ) (X k , S k ) . In each iteration, one of the dierences is zero. If P ( z k ) does not satisfy (3.30), the dual variables get updated and (3.34) shows sucient improvement in the potential function when = 0.4. On the other hand, if the primal matrix gets updated, then using Lemma 3.2 and the rst two parts of Lemma 3.16, n ln X k+1 S k ln det X k+1 ln det S k = n ln X k+1 S k ln det X k+1 S k = n ln X k+1 S k / ln det X k+1 S k / = n ln n ln det (S k ).5 X k+1 (S k ).5 / k .5 ) X k+1 (S k ).5 /I n ln n + 2(1 (S (S k ).5 X k+1 (S k ).5 /I ) n ln n + n ln X S
2 2(1) k k
ln det X k ln det S k +
2 2(1)
67
By choosing = 0.4 again, we have the desired result. This theorem leads to Corollary 3.18 Let n + n and (X 0 , S 0 ) ( n) ln(X 0 S 0 ). Then, the algorithm terminates in at most O(( n) ln(X 0 S 0 / )) iterations. Proof. In O(( n) ln(X 0 S 0 / )) iterations, (X k , S k ) ( n) ln( )). Also, ( n) ln(C X k bT y k ) = ( n) ln(X k S k ) (X k , S k ) n ln n (X k , S k ). Combining the two inequalities, C X k bT y k = X k S k < .
Again, from (3.28) we see that the algorithm can generate an X k as a byproduct. However, it is not needed in generating the iterate direction, and it is only explicitly used for proving convergence and complexity. Theorem 3.19 Each iteration of the dual algorithm uses O(m3 + nm2 + n2 m + n3 ) oating point iterations. Proof. Creating S , or S + AT (d( z k )), uses matrix additions and O(mn2 ) op3 erations; factoring it uses O(n ) operations. Creating the Gram matrix uses nm2 + 2n2 m + O(nm) operations, and solving the system of equations uses O(m3 ) operations. Dot products for z k+1 and P ( z k ) , and the calculation of y k+1 use only O(m) operations. These give the desired result.
3.6
Notes
The primal potential reduction algorithm for positive semi-denite programming is due to Alizadeh [9, 8], in which Ye has suggested studying the primal-dual potential function for this problem and looking at symmetric preserving scal1/2 1/2 ings of the form X0 XX0 , and to Nesterov and Nemirovskii [241], and the primal-dual algorithm described here is due to Nesterov and Todd [242, 243]. One can also develop a dual potential reduction algorithm. In general, consider (P SP ) inf s.t. C X A X = b, X K,
(P SD)
sup s.t.
bT y A Y + S = C, S K,
where K is a convex homogeneous cone. Interior-point algorithms compute a search direction (dX , dY , dS ) and a new strictly feasible primal-dual pair X + and (Y + ; S + ) is generated from X + = X + dX , Y + = Y + dY , S + = S + dS , for some step-sizes and . The search direction (dX , dY , dS ) is determined by the following equations: A d X = 0, and dX + F (S )dS = or dS + F (X )dX = or dS + F (Z )dX = where Z is chosen to satisfy S = F (Z )X. (3.39) The dierences among the three algorithms are the computation of the search direction and their theoretical close-form step-sizes. All three generate an optimal solution (X, Y, S ), i.e., X S in a guaranteed polynomial time. Other primal-dual algorithms for positive semi-denite programming are in Alizadeh, Haeberly and Overton [10, 12], Boyd, Ghaoui, Feron and Balakrishnan [56], Helmberg, Rendl, Vanderbei and Wolkowicz [156], Jarre [170], de Klerk, Roos and Terlaky.[185], Kojima, Shindoh and Hara [190], Monteiro and Zhang [228], Nesterov, Todd and Ye [244], Potra and Sheng [257], Shida, Shindoh and Kojima [276], Sturm and Zhang [285], Tseng [305], Vandenberghe and Boyd [313, 314], and references therein. Ecient interior-point algorithms are also developed for optimization over the second-order cone; see Andersen and Christiansen [17], Lobo, Vandenberghe and Boyd [204], and Xue and Ye [326]. These algorithms have established the best approximation complexity results for some combinatorial problems. Primal-dual adaptive path-following algorithms, the predictor-corrector algorithms and the wide-neighborhood algorithms can also be developed for solving (SDP). dS = A dY (feasibility) (3.35)
n+ X F (S ) (dual scaling), X S
3.7. EXERCISES
69
3.7
Exercises
3.1 Prove a slightly stronger variant of Lemma 3.1: If D Mn such that 0 D < 1 then Trace(D) log det(I D) Trace(D) 3.2 Given S D 2 2(1 D
)
Given X
n. Prove 3/4 .
(n + ) v) eT v
3.4 Prove the following convex quadratic inequality (Ay + b)T (Ay + b) cT y d 0 is equivalent to a matrix inequality I (Ay + b)T Ay + b cT y + d 0.
Using this relation to formulate a convex quadratic minimization problem with convex quadratic inequalities as an (SDD) problem. 3.5 Prove Corollary 3.9. 3.6 Prove Lemma 3.13. 3.7 Describe and analyze a dual potential algorithm for positive semi-denite programming in the standard form.
70
Chapter 4
A (randomized) algorithm for a maximization problem is called (randomized) r-approximation algorithm, where 0 < r 1, if it outputs a feasible solution with its (expected) value at least r times the optimum value for all instances of the problem. More precisely, let w (> 0) be the (global) maximum value of a given problem instance. Then, a r-approximate maximizer x satises w(x) r w . or E[w(x)] r w . A (randomized) algorithm for a minimization problem is called (randomized) r-approximation algorithm, where 1 r, if it outputs a feasible solution with its (expected) value at most r times the optimum value for all instances of the problem. More precisely, let w (> 0) be the (global) minimal value of a given problem instance. Then, a r-approximate minimizer x satises w(x) r w . or E[w(x)] r w . 71
72
4.2
Minimize
xT Qx + 2q T x 1. (4.1)
Here, the given matrix Q Mn , the set of n-dimensional symmetric matrices; vector q Rn ; and . is the Euclidean norm.
4.2.1
Homogeneous Case: q = 0
X = xxT . Minimize Subject to QX I X 1, X 0, Rank(X ) = 1.
Matrix-formulation: Let
z = (BQP)
SDP-relaxation: Remove the rank-one constraint. z SDP := (SDP) Subject to I X 1, X The dual of (SDP) can be written as: z SDP = (DSDP) Subject to yI + S = Q, y 0, S 0. Maximize y 0. Minimize QX
X is a minimal matrix solution to SDP if and only if there exist a feasible dual variable y 0 such that S = Q y I 0
y (1 I X ) = 0 S X = 0. Observation: z SDP z . Theorem 4.1 The SDP relaxation is exact, meaning z SDP z .
73
X =
j =1
xj xT j ,
is a solution to (BQP).
4.2.2
Non-Homogeneous Case
X = (1; x)(1; x)T Mn+1 , Q = I = 0 q 0 0 1 0 qT Q 0T I 0T 0 , ,
Matrix-formulation: Let
Q X I X 1, I1 X = 1, X 0, Rank(X ) = 1.
SDP-relaxation: Remove the rank-one constraint. z = (SDP) Subject to I X 1, I1 X = 1 , X 0. The dual of (SDP) can be written as: z SDP = (DSDP) Subject to y 1 I + y 2 I1 + S = Q , y1 0, S 0. Maximize y1 + y2 Minimize Q X
74
X is a minimal matrix solution to SDP if and only if there exist a feasible dual variables (y1 0, y2 ) such that
S = Q y1 I y2 I1 y1 (1 I X ) = 0
S X = 0. Observation: z SDP z . Lemma 4.2 Let X be a positive semidenite matrix of rank r, A be a given symmetric matrix. Then, there is a decomposition of X
r
X=
j =1
xj xT j ,
X=
j =1
pj pT j
be any decomposition of X . Assume that pT 1 Ap1 < 0 Now let x1 = (p1 + p2 )/ and x2 = (p2 p1 )/ where makes (p1 + p2 )T A(p1 + p2 ) = 0. We have
T T T p1 pT 1 + p2 p2 = x1 x1 + x2 x2
and
pT 2 Ap2 > 0. 1 + 2 1 + 2,
and A (X x1 xT 1 ) = 0. Note that X x1 xT 1 is still positive semidenite and its rank is r 1. Theorem 4.3 The SDP relaxation is exact, meaning z SDP z .
75
X =
j =1
xj xT j ,
is a solution to (BQP).
4.3
X = xxT . QX A1 X 1, A2 X 1, X 0, Rank(X ) = 1.
Minimize Subject to
SDP-relaxation: Remove the rank-one constraint. z = (SDP) Subject to A1 X 1, A2 X 1, X 0. The dual of (SDP) can be written as: z SDP = (DSDP) Subject to y1 A1 + y2 A2 + S = Q, y1 , y2 0, S 0. Maximize y1 + y2 Minimize QX
Assume that there is no gap between the primal and dual. Then, X is a minimal matrix solution to SDP if and only if there exist a feasible dual variable (y1 , y2 ) 0 such that S = Q y1 A1 y2 A2 0
76
S X = 0. Observation: z SDP z . Theorem 4.4 The SDP relaxation is exact, meaning z SDP z . Moreover, there is a decomposition of X
r
X =
j =1
xj xT j ,
where r is the rank of X , such that for any j , x j = xj for some is a solution to (QP-2).
4.4
Max-Cut Problem
Consider the Max Cut problem on an undirected graph G = (V, E ) with nonnegative weights wij for each edge in E (and wij = 0 if (i, j ) E ), which is the problem of partitioning the nodes of V into two sets S and V \ S so that w(S ) :=
iS, j V \S
wij
is maximized. A problem of this type arises from many network planning, circuit design, and scheduling applications. This problem can be formulated by assigning each node a binary variable xj : 1 z = Maximize w(x) := wij (1 xi xj ) 4 i,j (MC) Subject to x2 i = 1, i = 1, ..., n.
The Coin-Toss Method: Let each node be selected to one side, or xi be 1, independently with probability .5. Then, swap nodes from the majority side to the minority side using the greedy method. E[w(x)] 0.5 z .
77
4.4.1
Semi-denite relaxation
z SDP := minimize s.t. QX Ij X = 1, j = 1, ..., n, X 0. eT y Q D(y ). (4.3)
The dual is
(4.4)
Let V = (v1 , . . . , vn ) Rnn , i.e., vj is the j th column of V , such that X = V TV . Generate a random vector u N (0, I ):
(4.5)
4.4.2
Approximation analysis
Then, one can prove from Sheppard [274] (see Goemans and Williamson [125] and Bertsimas and Ye [50]): E[ xi x j ] = 2 ij ), arcsin(X i, j = 1, 2, . . . , n.
U
= -1
V j /||Vj ||
=1 arccos(.)
V i /||Vi ||
=1
= -1
j i Figure 4.1: Illustration of the product ( v ) ( v ) on the 2-dimensional i j unit circle. As the unit vector u is uniformly generated along the circle, the product is either 1 or 1.
vT u
vT u
78
Lemma 4.5 For x [1, 1) 1 (2/ ) arcsin(x) .878. 1x Lemma 4.6 Let X 0 and d(X ) 1. Then arcsin[X ] X.
Theorem 4.7 We have i) If Q is a Laplacian matrix, then E( x T Qx ) .878z SDP .878z , so that z .878z SDP . ii) If Q is positive semidenite E( xT Qx ) so that z 2 SDP 2 z z, 2 SDP z .
4.5
that is, we can compute a feasible solution x, in polynomial time, such that x T Q0 x 1 z. min(m2 , n2 )
4.5.1
SDP relaxation
(SDP ) z SDP := maximize Q0 X subject to Qi X 1, i = 1, ..., m X 0.
4.5. MULTIPLE ELLIPSOID-CONSTRAINED QUADRATIC MAXIMIZATION79 and its dual (SDD) z SDD := minimize subject to yi m Z = i=1 yi Qi Q0 Z 0, yi 0, i = 1, ..., m.
m i=1
We assume that (SDD) is feasible. Then (SDP) and (SDD) have no duality gap, that is, z SDP = z SDD . Moreover, z = z SDP if and only if (SDP) has a feasible rank-one matrix solution and meet complementarity condition.
4.5.2
Bound on Rank
Lemma 4.8 Let X be an maximizer of (SDP). Then we can compute another maximizer of (SDP) whose rank is no more than m by solving a linear program. Suppose the rank of initial X is r. Then, there exist column vectors xj , j = 1, ..., r, such that
r
X =
j =1
xj xT j .
Let
T aij = Qi xj xT j = xj Qi xj .
Note that v1 = ... = vr = 1 is an optimal solution for the problem. But we should have at least r inequalities active at an optimal basic solution. Thus, we should have at most m of vj are positive at an optimal basic solution. Let X be an SDP maximizer whose rank r min(m, n). Then, there exist column vectors xj , j = 1, ..., r, such that
r
X =
j =1
xj xT j .
Thus, each xj is a feasible solution of (HQP). Then, we can select an xj such that xT j Q0 xj is the largest, and have max xT j Q0 xj 1 r
r
j =1,...,r
xT j Q0 xj =
j =1
80
4.5.3
Improvements
Lemma 4.10 Let X be an maximizer of (SDP). Then we can compute another maximizer of (SDP) whose rank is no more than m 1 by solving a linear program. Thus, we have Theorem 4.11 For solving HQP-m, where Qi approximation ratio 1 . min(m 1, n) 0 for i = 1, ..., m, we have
For large m, consider SDP in the standard form: z := Minimize Subject to C X Ai X = bi , i = 1, ..., m X 0.
Theorem 4.12 (Pataki [252], Barvinok [34]) Let X be a minimizer of the SDP in the standard form. Then we can compute a minimizer of the problem whose rank r satises r(r + 1)/2 m in polinomial time. Proof. If the rank of X , r, satises the inequality, then we need do nothing. Thus, we assume r(r + 1)/2 > m, and let V V T = X , Then consider Minimize Subject to V T CV U V T Ai V U = bi , i = 1, ..., m U 0. (4.6) V Rnr .
4.5. MULTIPLE ELLIPSOID-CONSTRAINED QUADRATIC MAXIMIZATION81 Moreover, for any feasible solution of (4.6) one can construct a feasible solution for the original SDP using X (U ) = V U V T and C X (U ) = V T CV U. (4.7)
Thus, the minimal value of (4.6) is also z , and U = I is a minimizer of (4.6). Now we show that any feasible solution U to (4.6) is a minimizer for (4.6); thereby X (U ) of (4.7) is a minimizer for the original SDP. Consider the dual of (4.6) m z := Maximize bT y = i=1 bi yi Subject to V T CV
m i=1
yi V T Ai V T .
(4.8)
Let y be a dual maximizer. Since U = I is an interior optimizer for the primal, the strong duality condition holds, i.e.,
m
I (V CV
i=1
T V Ai V T ) = 0 yi
so that we have V T CV
m T yi V Ai V T = 0. i=1
Then, any feasible solution of (4.6) satises the strong duality condition so that it must be also optimal. Consider the system of homogeneous linear equations V T Ai V W = 0, i = 1, ..., m where W is a r r symmetric matrices (does not need to be denite). This system has r(r + 1)/2 real number variables and m equations. Thus, as long as r(r +1)/2 > m, we must be able to nd a symmetric matrix W = 0 to satisfy all m equations. Without loss of generality, let W be either indenite or negative semidenite (if it is positive semidenite, we take W as W ), that is, W has at least one negative eigenvalue, and consider U () = I + W. | where is the least eigenvalue of W , we have Choosing = 1/| U ( ) 0
and it has at least one 0 eigenvalue or rank(U ( )) < r, and V T Ai V U ( ) = V T Ai V (I + W ) = V T Ai V I = bi , i = 1, ..., m. That is, U ( ) is a feasible and so it is an optimal solution for (4.6). Then, X (U ( )) = V U ( )V T
82
is a new minimizer for the original SDP, and rank(X (U ( ))) < r. This process can be repeated till the system of homogemeous linear equations has only all zero solution, which is necessarily given by r(r + 1)/2 m. Thus, we must be able to nd an SDP solution whose rank satises the inequality. The total number of such reduction steps is bounded by n 1 and each step uses no more than O(m2 n) arithmetic operations. Therefore, the total number of arithmetic operations is a polynomial in m and n, i.e., in (strongly) polynomial time given the least eigenvalue of W .
4.5.4
Randomized Rounding
z SDP := maximize subject to Q0 V T U V Qi V U 1, i = 1, ..., m V U 0.
Note that U = I is an optimal solution for the problem. Q0 V T is a diagonal matrix Without loss of generality, assume that D := V and Ai = V Qi V , i = 1, ..., m. Rewrite the problem z SDP := maximize D U subject to Ai U 1, i = 1, ..., m U 0, and U = I is an optimal solution for the problem. Generates a random r-dimension vector u where, independently, uj = Note that and Let u = Then, for i = 1, ..., m, maxi 1 with probability.5 1 otherwise. E[uuT ] = I, D uuT = D I = z SDP . 1 Ai uuT u.
4.6. MAX-BISECTION PROBLEM x x T is a feasible solution for (SDP), and Q0 x x T = where for i = 1, ..., m Ai I 1. Lemma 4.13 For any given vector a, the probability Pr(aaT uuT > a 2 ) 2 exp(/2). Lemma 4.14 The probability Pr(Ai uuT > ) Pr(Ai uuT > (Ai I )) 2r exp(/2). Lemma 4.15 The probability Pr(max(Ai uuT ) > ) 2mr exp(/2).
i
83
1 r , 2m 2 1 . 2
4.6
Max-Bisection Problem
Consider the Max-Bisection problem on an undirected graph G = (V, E ) with non-negative weights wij for each edge in E (and wij = 0 if (i, j ) E ), which is the problem of partitioning the even number of nodes in V into two sets S and V \ S of equal cardinality so that w(S ) :=
iS, j V \S
wij
84
is maximized. This problem can be formulated by assigning each node a binary variable xj : w := (MB) subject to
j =1
Maximize
1 4
n
wij (1 xi xj )
i,j
xj = 0
or
eT x = 0
x2 j = 1, j = 1, . . . , n, where e n (n even) is the column vector of all ones, superscript T is the transpose operator. Note that xj takes either 1 or 1, so that we can choose either S = {j : xj = 1} or S = {j : xj = 1}. The constraint eT x = 0 ensures that |S | = |V \ S |. A problem of this type arises from many network planning, circuit design, and scheduling applications. In particular, the popular and widely used MinBisection problem sometimes can be solved by nding the Max-Bisection over the complementary graph of G. The Coin-Toss Method: Let each node be selected to one side, or xi be 1, independently with probability .5. Then, swap nodes from the majority side to the minority side using the greedy method. E[w(x)] 0.5 z .
4.6.1
SDP relaxation
Maximize Subject to 1 4 wij (1 Xij )
i,j
Xii = 1, X
i,j
i = 1, ..., n,
Xij = 0, 0.
What complicates matters in Max-Bisection, comparing to Max-Cut, is that two objectives are actually soughtthe objective value of w(S ) and the size of S . Therefore, in any (randomized) rounding method at the beginning, we need to balance the (expected) quality of w(S ) and the (expected) size of S . We want high w(S ); but, at the same time, zero or small dierence between |S | and n/2, since otherwise we have to either add or subtract nodes from S , resulting in a deterioration of w(S ) at the end. Our method is built upon a careful balance of the two, plus an improved proof technique.
85
4.6.2
We rst review the Frieze and Jerrum method, and then proceed with our improved method. be an optimal solution of Problem (SDP). Since X is positive semidefLet X inite, the randomization method of Goemans and Williamson [125] essentially generates a random vector u from a multivariate normal distribution with 0 , that is, using (4.5) to generate S = {i : x mean and covariance matrix X i = 1} or S = {i : x i = 1}. Then, one can prove Sheppard [274] (see Goemans and Williamson [125], Frieze and Jerrum [110], and Bertsimas and Ye [50]): E[ xi x j ] = and 1 where (1) := min
1y<1
2 ij ), arcsin(X
i, j = 1, 2, . . . , n,
(4.9)
2 ij ) (1)(1 X ij ), arcsin(X 1
2
arcsin(y ) .878567 1y
2 ij ) arcsin(X
ij ) wij (1)(1 X
i,j
(4.10)
2 ij ) arcsin(X
ij ) (1)(1 X
i,j
= (1) since
i,j
n2 , 4
(4.11)
ij = eeT X = 0. X
However, x may not satisfy eT x = 0, i.e., S may not be a bisection. Then, using a greedy method, Frieze and Jerrum have adjusted S by swapping nodes
86
from the majority block into the minority block until they are equally sized. Note that inequality (4.11) assures that not too many nodes need to be swapped. Also, in selecting swapping nodes, Frieze and Jerrum make sure that the least weighted node gets swapped rst. More precisely, (w.l.o.g) let |S | n/2; and for each i S , let (i) = j S wij and S = {i1 , i2 , ..., i|S | }, where (i1 ) (i2 ) = {i1 , i2 , ..., in/2 }. Clearly, the construction of ... (i|S | ). Then, assign S bisection S guarantees that ) n w (S ) , w(S 2|S | n/2 |S | n. (4.12)
, they dene two random variIn order to analyze the quality of bisection S ables: 1 w := w(S ) = wij (1 x i x j ) 4 i,j and m := |S |(n |S |) = n2 (eT x )2 1 = 4 4 4 (1 x i x j )
i,j
(since (eT x )2 = (2|S | n)2 ). Then, from (4.10) and (4.11), E[w] (1) w and E[m] (1) m , where m = n2 . 4
w m + , w m and z3
(4.13)
since w/w 2|S |/n 2 and m/m 1. For simplicity, Frieze and Jerrums analysis can be described as follows. They ), create x repeatedly generate samples u N (0, X or S using the randomization method (4.5), construct S using the swapping procedure, and record the largest ) among these samples. Since E[z ] 2(1) and z 3, they are value of w(S almost sure to have one z to meet its expectation before too long, i.e., to have z 2(1). Moreover, when (4.14) holds and suppose w(S ) = w , which from (4.13) and (4.14) implies that m 2(1) . m (4.14)
4.6. MAX-BISECTION PROBLEM Suppose that |S | = n; then the above inequality implies 2(1) 4 (1 ). Applying (4.12) and (4.15), one can see that ) w(S = w (S ) 2 w 2 (2(1) 4 (1 ))w 2 2( 2(1) 1)w .
87
(4.15)
The last inequality follows from simple calculus that = 2(1)/2 yields the minimal value for (2(1) 4 (1 ))/(2 ) when 0 1. Note that for ) in the process (1) .878567, 2( 2(1) 1) > .651. Since the largest w(S is at least as good as the one who meets z 2(1), they proceed to prove a .651-approximation algorithm for Max-Bisection.
4.6.3
and a Our improved rounding method is to use a convex combination of X positive semidenite matrix P as the covariance matrix to generate u and x , i.e, + (1 )P ), u N (0, X x = sign(u), S = {i : x i = 1} or S = {i : x i = 1}
from the Frieze and Jerrum swapping procedure. such that |S | n/2, and then S One choice of P is n 1 P = (I eeT ), n1 n
1 where I is the n-dimensional identity matrix. Note that I n eeT is the projection matrix onto the null space of eT x = 0, and P is a feasible solution for (SDP). The other choice is P = I , which was also proposed in Nesterov [240], and by Zwick [345] for approximating the Max-Cut problem when the graph is sparse. Again, the overall performance of the rounding method is determined by two factors: the expected quality of w(S ) and how much S need to be downsized. + (1 )P provides a balance The convex combination parameter used in X in the combination between these two factors. Typically, the more use of X results in higher expected w(S ) but larger expected dierence between |S | and n/2; and the more use of P results in less expected w(S ) and more accurate |S |.
88
More precisely, our hope is that we could provide two new inequalities in replacing (4.10) and (4.11): E[w(S )] = and E[|S |(n |S |)] = E (eT x )2 n2 1 = 4 4 4 (1 E[ xi x j ])
i,j
1 4
wij (1 E[ xi x j ]) w
i,j
(4.16)
n2 , 4
(4.17)
such that would be slightly less than (1) but would be signicantly greater than (1). Thus, we could give a better overall bound than .651 for MaxBisection. Before proceed, we prove a technical result. Again, let two random variables w := w(S ) = 1 4 wij (1 x i x j ) and
i,j
m := |S |(n |S |) =
1 4
(1 x i x j ).
i,j
Furthermore, for a given parameter 0, let new random variable z ( ) = Then, we have E[z ( )] + and z 2 + . Note that Frieze and Jerrum used = 1 in their analyses. Now we prove the following lemma: Lemma 4.17 Assume (4.16) and (4.17) hold. Then, for any given /(4 ), if random variable z ( ) meets its expectation, i.e., z ( ) + , then ) 2 w (S In particular, if = 1 ( 1) 2 1 ( + ) w . w m + . w m (4.18)
89
w(S ) = w
and
|S | = n,
which from (4.18) and z ( ) + implies that + 4 (1 ). Applying (4.12) we see that ) w(S w(S ) 2 w = 2 + 4 (1 ) w 2 2( ( + ) ) w .
The last inequality follows from simple calculus that + = 2 yields the minimal value for ( + 4 (1 ))/(2 ) in the interval [0, 1], if /(4 ). In particular, substitute = 1 1) ( 2 1
into the rst inequality, we have the second desired result in the lemma. The motivation to select =
1 2 ( 1
value for 2( ( + ) ). In fact, when both = = (1) .878567 as in the case of Frieze and Jerrum, 1+ > 0.6515, 1
which is just slightly better than 2( 2(1) 1) 0.6511 proved by Frieze and Jerrum. So their choice = 1 is almost optimal. We emphasize that is only used in the analysis of the quality bound, and is not used in the rounding method.
4.6.4
A simple .5-approximation
To see the impact of in the new rounding method, we analyze the other extreme case where = 0 and P = 1 n (I eeT ). n1 n
90
That is, we generate u N (0, P ), then x and S . Now, we have 1 1 2 1 E[w(S )] = E wij (1 x i x j ) = 1 + arcsin( ) 4 i,j 4 n1 and E
wij .5 w
i=j
where from (4.9) we have used the facts that E[ xi x j ] = and 1 2 2 1 arcsin( ), n1 wij =
i=j i<j
i=j
wij w .
Comparing (4.19) and (4.20) to (4.10) and (4.11), here the rst inequality on w(S ) is worse, .5 vs .878567; but the second inequality is substantially better, 1 1 n vs .878567; i.e., x here is a bisection with probability almost 1 when n is large. Using Lemma 4.17, we see that the method is a .5 > 1+ 1 1 + 1/n approximation method. The same ratio can be established for P = I .
4.6.5
A .699-approximation
We now prove our main result. For simplicity, we will use P = I in our rounding + (1 )I method. Therefore, we discuss using the convex combination of X as the covariance matrix to generate u, x , S and S for a given 0 1, i.e., + (1 )P ), u N (0, X x = sign(u), S = {i : x i = 1} or S = {i : x i = 1} from the Frieze and Jerrum swapping procedure. such that |S | n/2, and then S Dene 2 arcsin(y ) 1 ; (4.21) () := min 1y<1 1y
4.6. MAX-BISECTION PROBLEM and () := (1 where b() = 1 2 arcsin() and c() = min
1y<1
91
1 )b() + c(), n
(4.22)
2 arcsin() arcsin(y ) . 1y
Note that (1) = (1) .878567 as shown in Goemans and Williamson [125]; 1 and (0) = .5 and (0) = 1 n . Similarly, one can also verify that (.89) .835578, b(.89) .301408 and c(.89) .660695.
We now prove another technical lemma: Lemma 4.18 For any given 0 1 in our rounding method, inequalities (4.16) and (4.17) hold for = ( ) Proof. Since 1 have
2
and
= ().
E[w(S )] =
wij 1
i,j
ij ) wij ()(1 X
i,j
from the denition (4.22) we have E[|S |(n |S |)] = = = = E n2 (eT x )2 4 4 1 2 ij ) 1 arcsin(X 4
i=j
1 4 1 4 1 4
1
i=j
b() +
i=j
ij ) b() + c()(1 X
i=j
92
CHAPTER 4. SDP FOR COMBINATORIAL OPTIMIZATION = = = 1 (n2 n)b() + (n2 n)c() + nc() 4 1 n2 (1 )b() + c() n 4 2 n () . 4
Lemmas 4.17 and 4.18 together imply that for any given between 0 and 1, our rounding method will generate a ) w (S ( ) 1+ 1 ( ) w
as soon as (bounded) z ( ) of (4.18) meets its expectation. Thus, we can set ( ) to a value in [0, 1] such that is maximized, that is, let
1+ 1 ( )
() 1+ ( ) 1 ( ) .
[0,1]
1+
1 ( )
In particular, if = .89 is selected in the new rounding method, (.89) > .8355, and for n suciently large ( 104 ) (.89) = (1 which imply rM B = ( ) 1+ 1 ( ) (.89) 1+ 1 (.89) > .69920. 1 )b(.89) + c(.89) > .9620, n
This bound yields the nal result: Theorem 4.19 There is a polynomial-time approximation algorithm for MaxBisection whose expected cut is at least rM B times the maximal bisection cut, if the number of nodes in the graph is suciently large. In particular, if parameter = .89 is used, our rounding method is a .699-approximation for Max-Bisection. The reader may ask why we have used two dierent formulations in dening () of (4.21) and () of (4.22). The reason is that we have no control on the ratio, , of the maximal bisection cut w over the total weight i<j wij , i.e., := w
i<j
wij
93
Note that ranges from 1/2 to 1. Indeed, using the second derivation in Lemma 4.18, we can also prove that in Lemma 4.17 1 b() + c(). 2
Then, for = .89, we have .8113, which is less than .8355 established by using the rst derivation. However, if .8, then we have .8490. For Max-Bisection on these graphs, our method is a .710 approximation for setting = .89. This bound can be further improved by setting a smaller . In general, the quality bound improves as decreases. When near 1/2, we have a close to 1 approximation 1 if = 0 is chosen, since b(0) = 1, c(0) = 0, 21 and 1 n . In any case, we can run our rounding method 100 times for parameter = .00, .01, ..., .98, .99 and report the best rounding solution among the 100 tries. This will ensure us to produce a solution with a near best guarantee, but without the need to know .
4.7
Quadratic Optimization
In this section we consider approximating the global minimizer of the following QP problem with certain quadratic constraints: q (Q) := minimize s.t. q (x) := xT Qx n 2 j =1 aij xj = bi , i = 1, . . . , m, e x e,
where symmetric matrix Q Mn , A = {aij } Rmn and b Rm are given and e Rn is, again, the vector of all ones. We assume that the problem is feasible and denote by x(Q) its global minimizer. Normally, there is a linear term in the objective function: q (x) = xT Qx + cT x. However, the problem can be homogenized as minimize s.t. xT Qx + t cT x n 2 j =1 aij xj = bi , i = 1, . . . , m, e x e, 1 t 1
by adding a scalar variable t. There always is an optimal solution (x, t) for this problem in which t = 1 or t = 1. If t = 1, then x is also optimal for the
94
original problem; if t = 1, then x is optimal for the original problem. Thus, without loss of generality, we can assume q (x) = xT Qx throughout this section. The function q (x) has a global maximizer over the bounded feasible set as well. Let q := q (Q) and q := q (Q) denote their maximal and minimal objective values, respectively. We now present a fast algorithm to compute a 4/7-approximate minimizer, that is, to compute a feasible x , such that q ( x) q 4/7. q q
4.7.1
The algorithm for approximating the QP minimizer is to solve a positive semidenite programming relaxation problem: p(Q) := minimize s.t. QX Ai X = bi , i = 1, . . . , m, d(X ) e, X 0.
(4.23)
Here, Ai = diag(ai ), ai = (ai1 , . . . , ain ), and unknown X Mn is a symmetric matrix. Furthermore, d(X ) is a vector containing the diagonal components of X . Note that d(X ) e can be written as Ij X 1, j = 1, . . . , n, where Ij is the all-zero matrix except the j th diagonal component equal to 1. The dual of the relaxation is p(Q) = maximize s.t. eT z + bT y Q D(z ) +
m i=1
yi Ai , z 0,
(4.24)
where D(z ) is the diagonal matrix such that d(D(z )) = z Rn . Note that the relaxation is feasible and its dual has an interior feasible point so that there is no duality gap between the primal and dual. Denote by X (Q) and (y (Q), z (Q)) an optimal solution pair for the primal (4.23) and dual (4.24). For simplicity, in what follows we let x = x(Q) and X = X (Q). We have the following relations between the QP problem and its relaxation: Proposition 4.20 Let q := q (Q), q := q (Q), p := p(Q), p := p(Q), ( y, z ) = (y (Q), z (Q)). Then, i) q is the maximal objective value of xT Qx in the feasible set of the QP problem; ii) p = eT z + bT y and it is the maximal objective value of Q X in the feasible m set of the relaxation, and D( z ) + i=1 y i Ai Q 0; iii) pqq p .
95
Proof. The rst and second statements are straightforward to verify. Let X = xxT Mn . Then X 0, d(X ) e,
n
Ai X = xT Ai x =
j =1
aij (xj )2 = bi , i = 1, . . . , m,
and Q X = xT Qx = q (x) = q (Q). Thus, we have q (Q) p(Q), or p q . Similarly, we can prove q (Q) p(Q), or p q . Since X is positive semi-denite, there is a matrix factorization V = (v 1 , . . . , v n ) Rnn , i.e., v j is the j th column of V , such that X = V T V . Then, after obtaining X and V , we generate a random vector u uniformly distributed on the unit sphere in Rn and assign x = D (V T u), where D = diag( v 1 , . . . , v n ) = diag( x11 , . . . , xnn ), (4.25)
and, for any x Rn , (x) Rn is the vector whose j th component is sign(xj ): sign(xj ) = 1 if xj 0 1 otherwise.
It is easily see that x is a feasible point for the QP problem and we will show later that its expected objective value, Eu q ( x), satises Eu q ( x) q 4 1 . q q 2 7 That is, x is a 4/7-approximate minimizer for the QP problem expectantly. One can generate u repeatedly and choose the best x in the process. Thus, we will almost surely generate a x that is a 4/7-approximate minimizer.
4.7.2
Approximation analysis
First, we present a lemma which will be only used in our analysis. Lemma 4.21 Let u be uniformly distributed on the unit sphere in Rn . Then, q (Q) = minimize Eu ( (V T u)T DQD (V T u)) s.t. Ai (V T V ) = bi , i = 1, . . . , m, vj 1, j = 1, . . . , n,
where D = diag( v1 , . . . , vn ).
96
Proof. Since, for any feasible V , D (V T u) is a feasible point for the QP problem, we have q (Q) Eu ( (V T u)T DQD (V T u)). On the other hand, for any xed u with u = 1, we have
n n
qij vi
(4.26)
xi x
Thus, vi
which implies that for this particular feasible V q (Q) = q (x) = Eu ( (V T u)T DQD (V T u)). These two relations give the desired result. For any function of one variable f (t) and X Rnn , let f [X ] Rnn be the matrix with the components f (xij ). We have the next technical lemma whose proof is an exercise. Lemma 4.22 Let X 0 and d(X ) 1. Then arcsin[X ] X.
Now we are ready to prove the following theorem, where we use inmum to replace minimum, since for simplicity we require X to be positive denite in our subsequent analysis. Theorem 4.23 q (Q) = inmum s.t.
2 1 XD1 ]D) Q (D arcsin[D Ai X = bi , i = 1, . . . , m,
d(X ) e, X where
0,
= 1 2Pr (
T T vj u vi u ) = ( ) . vi vj
97
T T u vj vi u ) = ( ) vi vj
1 arccos
T vi vj vi vj
,
2
we have
2 arcsin
T vi vj vi vj
=
i=1 j =1
qij vi
vj
2 arcsin
T vi vj vi vj
Finally, Lemma 4.21 gives us the desired result. Theorem 4.23 and Lemma 4.22 lead to our main result: Theorem 4.24 We have i) p q ii) q p iii) p p q q 4 ( p p). 2 ( p p). 2 ( p p).
Proof. We prove (i). Recall z = z (Q) 0, y = y (Q), p = p(Q) = m T T e z + b y , and D( z ) + i=1 y i Ai Q 0. Thus, for any X 0, d(X ) e and D = diag( x11 , . . . , xnn ), from Theorem 4.23 q = q (Q) 2 2 Q (D arcsin[D1 XD1 ]D)
m m
=
1
Q D( z)
i=1
y i Ai + D( z) +
i=1
y i Ai
D arcsin[D1 XD1 ]D
98
Q D( z)
i=1 m
y i Ai y i Ai
D arcsin[D1 XD1 ]D
+ D( z) +
i=1
Q D( z)
i=1 m
y i Ai y i Ai
+ D( z) +
i=1
D arcsin[D1 XD1 ]D
m
(since Q D( z)
i=1 m
y i Ai X
D1 XD1 .)
Q D( z)
i=1 m
y i Ai y i Ai
+ D( z) +
i=1
D arcsin[D1 XD1 ]D y i Ai X
QX
D( z) +
i=1 m
+ D( z) +
i=1
y i Ai
m
QX z T d(X )
i=1
+ z d(D arcsin[D =
XD
]D) +
i=1
= Let X
T ( b) QX z T d(X ) y T b + z T ( d(X )) + y 2 2 1 1 (since d(D arcsin[D XD ]D) = d(X ) and aT i d(X ) = bi ) 2 Q X + ( 1) z T d(X ) + ( 1) yT b 2 2 Q X + ( 1)( zT e + y T b) 2 (since 0 d(X ) e and z 0) p. Q X + ( 1) 2
0 converge to X , then Q X p and we prove (i). Replacing Q with Q proves (ii) in the theorem. Adding the rst two inequalities gives (iii) of the theorem.
4.8. NOTES
99
The result indicates that the positive semi-denite relaxation value p p is a constant approximation of q q . Similarly, the following corollary can be devised: Corollary 4.25 Let X = V T V 0, d(X ) e, Ai X = bi (i = 1, . . . , m), D = diag( x11 , . . . , xnn ), and x = D (V T u) where u with u = 1 is a random vector uniformly distributed on the unit sphere. Moreover, let X 0 X. Then,
X X
X X
Finally, we have the following theorem: Theorem 4.26 Let x be randomly generated from X . Then Eu q ( x) q 1 < 4/7. q q 2 Proof. Since p q 2 2 2 2 p + (1 )p (1 ) p + p q p,
2 p + 2 + p 2 p + 2 + p
2 (1 )( p p) 2 p p) ( 2 = 2 2
1.
4.8
Notes
Semidenite relaxations have recently appeared in relation to relaxations for 0-1 optimization problems. In [207], a lifting procedure is presented to obtain 2 a problem in n ; and then the problem is projected back to obtain tighter
100
inequalities. See also [29]. Several of the operators that arise in our applications are similar to those that appear in [207]. However, our motivation and approach is dierent. A discussion of several applications for semidenite relaxation appears in [9]. Also see recent papers by Fujie and Kojima [113] and Polijak, Rendl and Wolkowicz [254]. During the past ten years, there have been several remarkable results on approximating specic quadratic problems using positive semi-denite programming. Goemans and Williamson [125] proved an approximation result for the Maxcut problem where 1 0.878. Nesterov [240] extended their result to approximating a boolean QP problem where 4/7. Most of the results are related to graph partition. Given an undirected graph G = (V, E ) with |V | = n, non-negative weights wij on edges (i, j ) E , and an integer k (1 < k < n), the maximization graph partition (MAX-GP) problem is to determine a subset S V of k nodes such that an objective function w(S ) is maximized. Some examples of MAX-GP are: Dense-k -Subgraph (DSP), where the total edge weights of the subgraph induced by S is maximized; Max-Cut with size k (MC), where the total edge weights of the edges crossing between S and V \ S is maximized; Max-Not-Cut with size k (MNC), where the total edge weights of the non-crossing edges between S and V \ S is maximized; Max-Vertex-Cover with size k (MVC), where the total edge weights of the edges covered by S is maximized. Since these MAX-GP problems are NP-hard (e.g., see [100] for DSP, [2] for MC, [151] for MNC, and [253] for MVC), one should not expect to nd polynomial time algorithms for computing their optimal solutions. Therefore, we are interested in how close to optimality one can approach in polynomial time. A (randomized) polynomial time approximation algorithm for a maximization problem has a performance guarantee or worst case ratio 0 < r 1, if it outputs a feasible solution whose (expected) value is at least r times the maximal value for all instance of the problem. Such an algorithm is often called (randomized) r-approximation algorithm. A key step in designing a good approximation algorithm for such a maximization problem is to establish a good upper bound on the maximal objective value. Linear programming (LP) and semidenite programming (SDP) have been frequently used to provide such upper bounds for many NP-hard problems. There are several approximation algorithms for DSP. Kortsarz and Peleg [193] devised an approximation algorithm which has a performance guarantee O(n0.3885 ). Feige, Kortsarz and Peleg [101] improved it to O(n1/3+ ), for some > 0. The other approximation algorithms have performance guarantees which are the function of k/n, e.g., a greedy heuristic by Asahiro et.al. [23], and SDP relaxation based algorithms developed by Feige and Langberg [99] and Feige and Seltser [100], and Srivastav and Wolf [282]. The previously best performance guarantee, k/n for general k and k/n + k for k n/2, was obtained by Feige and Langberg [99]. Moreover, for the case k = n/2, Ye and Zhang [151], using a new SDP relaxation, obtained an improved 0.586 performance guarantee from 0.517 of Feige and Langberg [99]. For approximating the MC problem with size k , both LP and SDP based
4.9. EXERCISES
101
approximation algorithms have a performance ratio 1/2 for all k , see Ageev and Sviridenko [2] and Feige and Langberg [99]. For k = n/2, i.e., the MaxBisection, Frieze and Jerrum [110] obtained a 0.651-approximation algorithm (the same bound was also obtained by Andersson [16] in his paper for max-psection). Subsequently, this ratio has been improved to 0.699 by Ye [334]. Both of their approximation algorithms are based on SDP relaxations. For approximating the MNC problem with size k , the LP-based approximak(nk) tion algorithm has a performance ratio 1 2 n(n1) for all k , and the SDP-based algorithm has a ratio .5 + k for k n/2, see Feige and Langberg [99]. Again, Ye and Zhang [151] obtained a 0.602-approximation algorithm for MNC when k = n/2, comparing to .541 of Feige and Langberg [99]. Han et al. [151] has presented an improved method to round an optimal solution of the SDP relaxation of the MAX-GP problem for general k . This rounding technique is related to the well-known rounding method introduced by Goemans and Williamson [125], Feige and Goemans [98] for MAX-DICUT and MAX 2-SAT, Zwick [344] for constraint satisfaction problems, and Nesterov[240] and Zwick [345] for MAX-CUT. This kind of randomized algorithm can be derandomized by the technique of Mahajan and Ramesh[214]. What complicates matters in the MAX-GP problem, comparing to the MAXCUT problem, is that two objectives are soughtthe objective value of w(S ) and the size of S . Therefore, in any (randomized) rounding method, we need to balance the (expected) quality of w(S ) and the (expected) size of S . One wants high w(S ); but, at the same time, zero or small dierence between |S | and k , since otherwise we have to either add or subtract nodes from S , resulting in a deterioration of w(S ) at the end. Our improved rounding method is built upon this balance need. As consequences of the improved rounding method, they have yielded improved approximation performance ratios for DSP, MC, MNC and MVC, on a wide range of k . On approximating DSP, for example, our algorithm has guaranteed performance ratios .648 for k = 3n/5, 0.586 for k = n/2, 0.486 for k = 2n/5 and 0.278 for k = n/4. For MC and MNC, the performance guarantees are also much better than 0.5 for a wide range of k . On approximating MVC, our algorithm has guaranteed performance ratios .845 for k = 3n/5, 0.811 for k = n/2, and 0.733 for k = 2n/5.
4.9
Exercises
4.1 Let X be a positive semidenite matrix of rank r, A be a given symmetric matrix. Then, there is a decomposition of X
r
X=
j =1
xj xT j ,
102
4.2 Prove Theorem 4.4. 4.3 The s t Max-Cut Problem: This is a max-cut problem where we require that a pair of nodes s and t be separated by the cut. Construct the SDP relaxation for the problem. Show that the problem can be approximated by the factor 0.878. The result holds for more pairs need to be separated. 4.4 Triangle Inequality: In the Max-Cut or other problems, decision variable xi is either 1 or 1. Thus, for any three variables we have |xi + xj + xk | 1 and |xi xj | + |xj xk | |xk xi | which call triangle-type inequalities. Show how to incoporate these inequalities in the SDP relaxation of the problem. 4.5 Let X 0 and d(X ) 1. Then arcsin[X ] X.
4.6 Prove Lemma 4.10. 4.7 In solving the standard SDP problem with m equality constraints, we have shown that there is an optimal SDP solution whose rank r satisfying r(r + 1) m, 2 and there is a strongly polynomial-time algorithm to nd such a solution from any exactly optimal SDP solution. In reality, one may never generate an exactly optimal SDP solution but an -optimal solution X , meaning its minimal objective value is away from the optimal z : C X z + . Prove that, from X , one can nd an -optimal SDP solution whose rank r satises r(r + 1) m+1 2 in strongly polynomial time. Show how the rank-reduction procedure works. 4.8 Consider the following eigenvalue optimization problem minimize subject to nmax (Q + Diag( y )) eT y =0 y Rm , where max (A) is the maximum eigenvalue of the matrix A. Show that this problem is equivalent to the dual of the SDP relaxation for Max-Cut. (4.27)
4.9. EXERCISES
103
4.9 Quadratic 0-1 Programming: The following problem is referred as the Quadratic 0-1 programming maximize subject to xT Bx x {0, 1}n . (4.28)
Show that the following matrix representation is equivalent to the above quadratic 0-1 programming problem (4.28): maximize BX 1 diag (X ) diag (X )T X 0, (4.29)
= subject to X
) = 1, rank(X where diag(A) is the vector of diagnoal elements in A. Note 1. It suces to show that both problems have the same feasible region. Note 2. Problem (4.29) without the rank 1 constraint on X is an SDP relaxation of (4.28). 4.10 The 2-Catalog Segmentation Problem ([325]): Given a ground set I of n items, a family {S1 , S2 , , Sm } of subsets of I and an integer 1 k n. The problem is to nd two subsets A1 , A2 I such that |A1 | = |A2 | = k to maximize m i=1 max{|Si A1 |, |Si A2 |}. Here I can be the list of goods; there are m customers where customer i is interested in the goods of Si ; A1 and A2 are the two catalogs one of which could be sent to each of the customers such that the (total) satisfaction is maximized. Find an SDP relaxation to the problem. 4.11 The Sparsest Cut Problem ([22]): Given a graph G = (V, E ). For any ) with |S | |V |/2, the edge expansion of the cut is |E (S, S )|/|S |. The cut (S, S problem is to nd the S such that |E (S, S )|/|S | is the smallest. Find an SDP relaxation to the problem.
104
Chapter 5
5.1
The basic mathematical model of Euclidean distance geometry optimization can be described as a quadratically constrained optimization problem. Let n unknown points xj Rd , j = 1, ..., n. For each pair of (i, j ), we are given ij and lower bound d between the two points Euclidean distance upper bound d ij xi and xj , and an objective function of x1 , ..., xn . Moreover, we require that each point xj where is a convex set. Then, the basic model can be formulated as: minimize subject to f (x1 , ..., xn ) (dij )2 xi xj xj , j,
2
ij )2 , i < j (d
(5.1)
(dij )2
(5.2)
105
106
is not convex, so that the ecient convex optimization techniques cannot apply to solving this problem. Let X = [x1 x2 ... xn ] be the d n matrix that needs to be determined. Then T xi xj 2 = eT ij X Xeij , where eij is the vector with 1 at the ith position, 1 at the j th position and zero everywhere else. Let Y = X T X . Then problem (5.1) can be rewritten as: minimize subject to f (X, Y ) 2 (dij )2 eT ij Y eij (dij ) , i < j, xj , j, Y = X T X.
(5.3)
Our approach is to relax problem (5.3) to a semidenite program: minimize subject to f (X, Y ) 2 (dij )2 eT ij Y eij (dij ) , i < j, xj , j, Y X T X.
(5.4)
Then, the problem can be written as a standard SDP problem: minimize subject to f (Z ) Z (1 : d, 1 : d) = I, ij )2 , i < j, (dij )2 (0; eij )T Z (0; eij ) (d Z (1 : d, j ) j = d + 1, ..., d + n, Z 0.
(5.5)
If f (Z ) is a convex function of Z , then (5.5) becomes a convex optimization problem. In particular, if f (Z ) is a linear function of Z and is a polyhedron, then (5.5) is a (linear) semidenite program.
5.2
There has been an increase in the use of semidenite programming (SDP) for solving wide range of Euclidean distance geometry problems, such as data compression, metric-space embedding, ball packing, chain folding etc. One application of the SDP Euclidean distance geometry model lies in ad hoc wireless sensor networks which are constructed for monitoring environmental information(temperature, sound levels, light etc) across an entire physical space. Typical networks of this type consist of a large number of densely deployed sensor
107
nodes which gather local data and communicate with other nodes within a small range. The sensor data from these nodes are relevant only if we know what location they refer to. Therefore knowledge of of each sensor position becomes imperative. Generally, the use of a GPS system is a very expensive solution to this requirement. Indeed, other techniques to estimate node positions are being developed that rely just on the measurements of distances between neighboring nodes [51, 65, 79, 116, 161, 163, 245, 268, 269, 270, 273]. The distance information could be based on criterion like time of arrival, angle of arrival and received signal strength. Depending on the accuracy of these measurements and processor, power and memory constraints at each of the nodes, there is some degree of error in the distance information. Furthermore, it is assumed that we already know the positions of a few anchor nodes. The problem of nding the positions of all the nodes given a few anchor nodes and partial distance information between the nodes is called the position estimation or localization problem. In particular, the paper [51] describes an SDP relaxation based model for the position estimation problem in sensor networks. The optimization problem is set up so as to minimize the error in the approximate distances between the sensors. Observable traces are developed to measure the quality of the distance data. The basic idea behind the technique is to convert the non-convex quadratic distance constraints into linear constraints by introducing relaxations to remove the quadratic term in the formulation. The performance of this technique is highly satisfactory compared to other techniques. Very few anchor nodes are required to accurately estimate the position of all the unknown sensors in a network. Also the estimation errors are minimal even when the anchor nodes are not suitably placed within the network. More importantly, for each sensor the model generates numerical data to measure the reliability and accuracy of the positions computed from the model, which can be used to detect erroneous or outlier sensors.
5.2.1
For simplicity, let the sensor points be placed on a plane. Recall that we have m known anchor points ak R2 , k = 1, ..., m, and n unknown sensor points xj R2 , j = 1, ..., n. For every pair of two points, we have a Euclidean distance measure if the two are within a communication distance range R. Therefore, ij between unknown say for (i, j ) Nx , we are given Euclidean distance data d sensors i and j , and for (k, j ) Na we know distance dkj between anchor k and sensor j . Note that for the rest of pairs we have only a lower bound R for their pair-wise distances. Therefore, the localization problem can be formulated as
108
an error minimization problem with mixed equalities and inequalities: minimize subject to
i,j Nx , i<j 2
xi xj ak xj xi xj ak xj
|ij | + k,j Na |kj | ij )2 + ij , (i, j ) Nx , i < j, = (d 2 kj )2 + kj , for (k, j ) Na , = (d 2 R2 , for the rest i < j, 2 R2 , for the rest k, j.
Thus, we relax the problem to a semidenite program: minimize subject to |ij | + k,j Na |kj | (1; 0; 0) Z (1; 0; 0) = 1 (0; 1; 0)T Z (0; 1; 0) = 1 (1; 1; 0)T Z (1; 1; 0) = 2 ij )2 + ij , (i < j, j ) Nx , (0; eij )T Z (0; eij ) = (d T kj )2 + kj , (k, j ) Na , (ak ; ej ) Z (ak ; ej ) = (d T 2 (0; eij ) Z (0; eij ) R , (i < j, j ) / Nx , (ak ; ej )T Z (ak ; ej ) R2 , (k, j ) / Na , Z 0.
i,j Nx , i<j T
(5.6)
The matrix of Z has 2n + n(n + 1)/2 unknown variables. Consider the case that among {k, i, j }, there are 2n + n(n + 1)/2 of the pairs where each of them has same distance upper and lower bounds, and = 0 for the minimal solution of (5.6). Then we have at least 2n + n(n + 1)/2 linear equalities among the constraints. Moreover, if these equalities are linearly independent, then Z has a unique solution. Therefore, we can show Proposition 5.1 If there are 2n + n(n + 1)/2 distance pairs each of which has an accurate distance measure and other distance bounds are feasible. Then, the minimal value of = 0 in (5.6). Moreover, if (5.6) has a unique minimal solution I X = , Z T X Y = (X )T X and X equal true positions of the unknown then we must have Y sensors. That is, the SDP relaxation solves the original problem exactly. Proof. Let X be the true locations of the n points, and Z = I (X )T X (X ) X
T
Then Z and = 0 is a feasible solution for (5.6). is the unique solution to satisfy the 2n+n(n+1)/2 On the other hand, since Z = Z so that Y = (X )T X = X T X . equalities, we must have Z We present a simple case to show what it means for the system has a unique solution. Consider n = 1 and m = 3. The accurate distance measures from
109
unknown b1 to known a1 , a2 and a3 are d11 , d21 and d31 , respectively. Therefore, the three linear equations are y 2xT a1 y 2xT a2 y 2xT a3 = (d11 )2 a1 = (d21 )2 a2 = (d31 )2 a3
2 2 2
This system has a unique solution if it has a solution and the matrix 1 a1 1 a2 1 a3
is nonsingular. This essentially means that the three points a1 , a2 and a3 are not on the same line, and then x = b1 can be uniquely determined. Here, the SDP method reduces to the so-called triangular method. Proposition 5.1 and the example show that the SDP relaxation method has the advantage of the triangular method in solving the original problem.
5.2.2
The case discussed in Proposition 5.1 is deterministic. Alternatively, each xj can be viewed a random point x j since the distance measures contain random errors. Then the solution to the SDP problem provides the rst and second moment information on x j , j = 1, ..., n. Such an interpretation appears to be rst stated in Bertsimas and Ye [50]. Generally, we have E[ xj ] x j , j = 1, ..., n and where = Z ij , E[ xT j ] Y i x I T X i, j = 1, ..., n. X Y
is the optimal solution of the SDP problem. Thus, X T X Y represents the co-variance matrix of x j , j = 1, ..., n. These quantities also constitute error management and analyses of the original problem data. For example,
n
X T X ) = Trace(Y
j =1
jj x (Y j
),
the total trace of the co-variance matrix, measures the quality of distance sample data dij and dkj . In particular, individual trace jj x Y j
2
(5.7)
110
which is also the variance of x j , helps us to detect possible distance measure errors, and outlier or defect sensors. These errors often occur in real applications either due to the lack of data information or noisy measurement, and are often dicult to detect since the true location of sensors is unknown. We again use the same simple case to illustrate our theory. Consider n = 1 and m = 3. The inexact distance measures from unknown b1 to known a1 , a2 and a3 are d11 + , d21 + and d31 + , respectively, where is a random error with zero mean. Therefore, the three linear equations are y 2 xT a1 + a1 y 2 xT a2 + a2 y 2 xT a3 + a3
2 2 2
2 2 2
Taking expect values on both sides, we have E[ y ] 2E[ x]T a1 + a1 E[ y ] 2E[ x]T a2 + a2 E[ y ] 2E[ x]T a3 + a3 or E[ y ] E[ x]T E[ x] + E[ x] a1 E[ y ] E[ x]T E[ x] + E[ x] a2 T E[ y ] E[ x] E[ x] + E[ x] a3 The solution to the linear equation is E[ x] = b1 , and E[ y ] E[ x]T E[ x] = E[ 2 ] or E[ y ] = E[ 2 ] + b1 2 . That is, x is a point estimate of b1 . Moreover, from E[ y ] E[ x 2 ] + E[ x 2 ] b1 we have E[ y x 2 ] E[ 2 ] and E[ x 2 ] b1
2 2 2 2 2 2 2 2
= E[ y ] b1
= E[ 2 ],
E[ 2 ],
so that the quantity of y x 2 is a lower bound estimate for the error variance and the variance of x is also bounded by the error variance. This quantity gives an interval estimation of b1 . More generally, we have
5.2. WIRELESS SENSOR NETWORK LOCALIZATION Proposition 5.2 Let the noisy measurements ij = dij + d and kj = dkj + d
j, i
111
j,
i=j
k, j
where dij are the true distances and j are independent random errors with zero mean. Moreover, let the minimal = 0 in (5.6) and the anchor points are linear independent. Then, we have E[ xj ] = bj and ij ] = (bi )T bj i = j, E[Y where bj is the true position of xj , j = 1, ..., n, and = Z is the minimizer of (5.6). Proof. Since = 0, we have, for all i, j, k , ii 2Y ij + Y jj Y T jj 2 Y xj ak + ak 2 = (dij + = (dkj +
i
and
jj ] = bj E[Y
+ E[ 2 j] j
I T X
X Y
+ j )2 2 j) .
Taking expect values on both sides, we have ii ] 2E[Y ij ] + E[Y jj ] E[Y jj ] E[ E[Y xj ]T E[ xj ] + E[ xj ] ak or ii ] 2E[Y ij ] + E[Y jj ] E[ E[Y xi ] E[ xj ] 2 + E[ xj ] E[ xi ] jj ] E[ E[Y xj ]T E[ xj ] + E[ xj ] ak Thus, E[ xj ] = bj and ij ] = (bi )T bj i = j E[Y is the solution satisfying these equations. jj ] = bj and E[Y
2 2 2 2
2 (dij )2 + E[ 2 i ] + E[ j ]
= (dkj )2 + E[ 2 j ].
= =
2 (dij )2 + E[ 2 i ] + E[ j ]
(dkj )2 + E[ 2 j ].
+ E[ 2 j] j
112
5.3
In the previous section, we show that if all accurate distances are given, then the graph can be localized by solving an SDP problem in polynomial time. In real applications, only partial distances(or edges) are known to the model. Can these graph be localized or realized in polynomial time? In this section, we given a positive answer to a family of uniquely localizable graphs. Informally, a graph is uniquely localizable in dimension d if (i) it has a unique realization in Rd , and (ii) it does not have any nontrivial realization whose ane span is Rh , where h > d. Specically, we present an SDP model that guarantees to nd the unique realization in polynomial time when the input graph is uniquely localizable. The proof employs SDP duality theory and properties of interior point algorithms for SDP. To the best of our knowledge, this is the rst time such a theoretical guarantee is proven for a general localization algorithm. Moreover, in view of the hardness result of Aspnes et. al. [24], The result is close to be the best possible in terms of identifying the largest family of eciently realizable graphs. We also introduce the concept of strong localizability. Informally, a graph is strongly localizable if it is uniquely localizable and remains so under slight perturbations. We show that the SDP model will identify all the strongly localizable subgraphs in the input graph.
5.3.1
Preliminaries
For the simplication of our analyses, we ignore the lower bound constraints and study the Graph Localization problem that is dened as follows: nd a realization of x1 , . . . , xn Rd such that ak xj xi xj
2 2
2 =d kj = d2 ij
(k, j ) Na (i, j ) Nx
(5.8)
Recall that m anchor points a1 , . . . , am Rd whose locations are known, and n sensor points x1 , . . . , xn Rd whose locations we wish to determine. Furtherkj between ak more, we are given partial accurate Euclidean distance values d and xj for (k, j ) Na , and dij between xi and xj for some (i < j, j ) Nx . Again, we can write the relaxed problem as a standard SDP problem, namely, nd a symmetric matrix Z R(d+n)(d+n) to: maximize subject to 0 Z1:d,1:d = Id (0; eij )(0; eij )T Z = d2 ij 2 (ak ; ej )(ak ; ej ) Z = d kj
T
(i, j ) Nx (k, j ) Na
(5.9)
where Z1:d,1:d is the d d principal submatrix of Z . Note that this formulation forces any possible feasible solution matrix to have rank at least d.
5.3. GENERAL SDP THEORY ON GRAPH REALIZATION The dual of the SDP relaxation is given by: minimize Id V +
(i,j )Nx
113
yij d2 ij +
(k,j )Na
2 wkj d kj
subject to +
V 0
0 0
+
(i,j )Nx
(5.10)
Note that the dual is always feasible, as V = 0, yij = 0 for all (i, j ) Nx and wkj = 0 for all (k, j ) Na is a feasible solution.
5.3.2
We now investigate when will the SDP (5.9) have an exact relaxation, i.e. when will the solution matrix Z have rank d. Suppose that problem (5.9) is feasible. kj and dij represent exact distance values for This occurs when, for instance, d = [ = ( Id ; X )T (Id ; X ) is a the positions X x1 x 2 . . . x n ]. Then, the matrix Z feasible solution for (5.9). Now, since the primal is feasible, the minimal value of the dual must be 0, i.e. there is no duality gap between the primal and dual. Let U be the (d + n)dimensional dual slack matrix, i.e.: U = V 0 +
(k,j )Na
0 0
+
(i,j )Nx
Then, from the duality theorem for SDP (see, e.g., [10]), we have: be a feasible solution for (5.9) and U be an optimal slack Theorem 5.3 Let Z matrix of (5.10). Then, U = 0 or Z U = 0; 1. complementarity condition holds: Z ) + Rank(U ) d + n; 2. Rank(Z ) d and Rank(U ) n. 3. Rank(Z An immediate result from the theorem is the following: Corollary 5.4 If an optimal dual slack matrix has rank n, then every solution of (5.9) has rank d. That is, problems (5.8) and (5.9) are equivalent and (5.8) can be solved as an SDP in polynomial time. Another technical result is the following:
114
Proposition 5.5 If every sensor point is connected, directly or indirectly, to an anchor point in (5.8), then any solution to (5.9) must be bounded, that is, Yjj is bounded for all j = 1, . . . , n. Proof. If sensor point xj is connected to an anchor point ak , then we have: xj
2
2aT k xj + ak
Yjj 2aT k xj + ak
2 =d kj
so that from the triangle inequality xj in (5.9) is bounded. Hence, we have: 2 + 2 ak Yjj d kj xj ak
2
Furthermore, if xi is connected to xj and Yjj is bounded, we have: Yii 2 Yii Yjj + Yjj Yii 2Yij + Yjj = d2 ij
so that from the triangle inequality Yii must be also bounded. In general, a primal (dual) maxrank solution is a solution that has the highest rank among all solutions for primal (5.9) (dual (5.10)). It is known [147, 130, 210] that various pathfollowing interiorpoint algorithms compute the maxrank solutions for both the primal and dual in polynomial time. This motivates the following denition. Denition 5.1 Problem (5.8) is uniquely localizable if there is a unique local Rdn and there is no xj Rh , j = 1, . . . , n, where h > d, such ization X that: 2 (ak ; 0) xj 2 = d (k, j ) Na kj 2 2 xi xj = dij (i, j ) Nx xj = ( x j ; 0) for some j {1, . . . , n} The latter says that the problem cannot have a nontrivial localization in some higher dimensional space Rh (i.e. a localization dierent from the one obtained by setting xj = ( xj ; 0) for j = 1, . . . , n), where anchor points are augmented to (ak ; 0) Rh , for k = 1, . . . , m. We now develop the following theorem: Theorem 5.6 Suppose that the network is connected. statements are equivalent: 1. Problem (5.8) is uniquely localizable. 2. The maxrank solution matrix of (5.9) has rank d. 3. The solution matrix of (5.9) satises Y = X T X . Proof. The equivalence between 2. and 3. is straightforward. Now, since any rank d solution of (5.9) is a solution to (5.8), from 2. to 1. we need to prove that if the maxrank solution matrix of (5.9) has rank d then it is unique. Suppose not, i.e., (5.9) has two rankd feasible solutions: Z1 = Id T X1 X1 T X1 X1 and Z2 = Id T X2 X2 T X2 X2 Then, the following
115
Then, the matrix Z = Z1 + Z2 , where + = 1 and , > 0 is a feasible solution and its rank must be d, since all feasible solution of (5.9) has rank at least d but the maxrank is assumed to be d. Therefore, we have: Z = = Id T T X1 + X2 Id BT B BT B X1 + X2 T T X1 X1 + X2 X2
where B = X1 +X2 . It follows that (X1 X2 )T (X1 X2 ) = 0, or X1 X2 = 0, i.e. Z1 = Z2 , which is a contradiction. Next, we prove the direction from 1. to 2., that is, the rank of a maxrank solution of (5.9) is d. Suppose that there is a feasible solution Z of (5.9) whose rank is greater than d. Then, we must have Y X T X and Y = X T X . Thus, we T T have the decomposition Y X X = (X ) X , where X = [x1 , . . . , xn ] Rrn and r is the rank of Y X T X . Now, consider the point: x j = Then, we have: x j
2
xj xj
Rd+r
for j = 1, . . . , n
= Yjj , ( xi )T x j = Yij
i, j ;
and there exist at least one x j such that x j = xj or xj = 0. Moreover, since the network is connected, we conclude from Proposition 5.5 that Yii and Yij are bounded for all i, j . Hence, we have: (ak ; 0) x j x i x j
2 2
2 =d kj = d2 ij
(k, j ) Na (i, j ) Nx
In other words, x j is a localization of problem (5.8) in Rd+r , which is a contradiction. Theorem 5.6 establishes, for the rst time, that as long as problem (5.8) is uniquely localizable, then the realization can be computed in polynomial time by solving the SDP relaxation. Conversely, if the relaxation solution computed by an interiorpoint algorithm (which generates maxrank feasible solutions) has rank d (and hence Y = X T X ), then X is the unique realization of problem (5.8). Moreover, as the recent result of Aspnes et. al. [24] shows, the results of Theorem 5.6 are close to be the best possible, since the problem of computing a realization of the sensors on the plane is NPcomplete in general, even when the instance has a unique solution on the plane.
5.3.3
Although unique localizability is an useful notion in determining the solvability of the Sensor Network Localization problem, it is not stable under perturbation. As we shall see in Section 5.3.4, there exist networks which are uniquely localizable, but may no longer be so after small perturbation of the sensor points. This motivates us to dene another notion called strong localizability.
116
Denition 5.2 We say problem (5.8) is strongly localizable if the dual of its SDP relaxation (5.10) has an optimal dual slack matrix with rank n. Note that if a network is strongly localizable, then it is uniquely localizable from Theorems 5.3 and 5.6, since the rank of all feasible solution of the primal is d. We show how we can construct a rankn optimal dual slack matrix. First, note that if U is an optimal dual slack matrix of rank n, then it can be written in the form U = (X T ; In )T W (X T ; In ) for some positive denite matrix W of rank n. Now, consider the dual matrix U . It has the form: U= U11 T U12 U12 U22
where U22 is an n n matrix. Moreover, it can be decomposed as U22 = A + D, where Aij = yij if i = j , Aii = j Aij ; and D is a diagonal matrix where Dii = (k,i)Na wki . (If there is no (k, i) Na , then Dii = 0.) Note that if we impose the constraints yij 0 and wki 0, then both A and D are positive semidenite. Moreover, we have the following: Proposition 5.7 Suppose that the network is connected. Furthermore, suppose that yij < 0 for all (i, j ) Nx , and that wki < 0 for some (k, i) Na , with Na = . Then, U22 is positive denite, i.e. it has rank n. Proof. Since A and D are positive semidenite, we have xT U22 x 0 for all x Rn . We now show that there is no x Rn \{0} such that xT Ax = xT Dx = 0. Suppose to the contrary that we have such an x. Then, since D is diagonal, n we have xT Dx = i=1 Dii x2 i = 0. In particular, for Dii > 0, we have xi = 0. Now, note that:
n n
xT Ax =
i=1 j =1
xi xj Aij =
i<j
(xi xj )2 Aij
Thus, xT Ax = 0 implies that xi = xj for all i, j . Since Na = , there exists an i such that Dii > 0, whence xi = 0. It follows that x = 0. Proposition 5.7 gives us a recipe for putting U into the desired form. First, we set U22 to be a positive 22 , where X is the matrix denite matrix. Then, we need to set U12 = XU containing the true locations of the sensors. We now investigate when this is possible. Note that the above condition is simply a system of linear equations. Let Ai be the set of sensors connected to anchor i, and let E be the number of sensorsensor edges. Then, the above system has E + i |Ai | variables. The number of equations is E + 3m, where m is the number of sensors that are connected to some anchors. Hence, a sucient condition for solvability is that the system of equations are linearly independent, and that i |Ai | 3m. In particular, this shows that the trilateration graphs dened in [89] are strongly localizable. We now develop the next theorem.
117
Theorem 5.8 If a problem (graph) contains a subproblem (subgraph) that is strongly localizable, then the submatrix solution corresponding to the subproblem in the SDP solution has rank d. That is, the SDP relaxation computes a solution that localizes all possibly localizable unknown points. Proof. Let the subproblem have ns unknown points and they are indexed as 1, . . . , ns . Since it is strongly localizable, an optimal dual slack matrix Us of the SDP relaxation for the subproblem has rank ns . Then in the dual problem of the SDP relaxation for the whole problem, we set V and those w associated with the subproblem to the optimal slack matrix Us and set all other w equal 0. Then, the slack matrix: U= Us 0 0 0 0
must be optimal for the dual of the (wholeproblem) SDP relaxation, and it is complementary to any primal feasible solution of the (wholeproblem) SDP relaxation: Zs Id Xs Z= 0 where Zs = T Xs Ys However, we have 0 = Z U = Zs Us and Us , Zs 0. The rank of Us is ns implies that the rank of Zs is exactly d, i.e. Ys = (Xs )T Xs , so Xs is the unique realization of the subproblem.
5.3.4
A Comparison of Notions
In this section, we will show that the notions of unique localizability, strong localizability and rigidity in R2 are all distinct.
5.3.5
We have already remarked earlier that a strongly localizable graph is necessarily uniquely localizable. However, as we shall see, the converse is not true. Let G1 be the network shown in Figure 5.1(a). The key feature of G1 is that the sensor x2 lies on the line joining anchors a1 and a3 . It is not hard to check that this network is uniquely localizable. Now, suppose to the contrary that G1 is strongly localizable. Then, the dual slack matrix U admits the decomposition T , I )T W (X T , I ). It is easy to verify that: U = ( X U12 U22 = ( y21 a2 + y 31 a3 , y 12 a1 + y 32 a3 ) = ( y21 + y 31 ) y12 y12 y12 ( y12 + y 32 ) y12
22 . This is equivalent to the and the form of U requires that U12 = XU following system of equations: ( x1 a2 ) y21 + ( x1 a3 ) y31 = ( x1 x 2 )y12 ( x2 a1 ) y12 + ( x2 a3 ) y32 = ( x1 x 2 )y12 (5.11) (5.12)
118
Since x 2 lies on the ane space spanned by a1 and a3 , equation (5.12) implies that y12 = 0. However, equation (5.11) would then imply that x 1 lies on the ane space spanned by a2 and a3 , which is a contradiction. Thus, we conclude that G1 is not strongly localizable.
a1
a1=(0,1.4)
x1
x2
x1=(0,0.5)
x2=(0.6,0.7)
a2=(1,0)
a3=(1,0)
a2
a3
5.3.6
By denition, a uniquely localizable network is rigid in R2 . However, the converse is not true. To see this, let G2 be the network shown in Figure 5.1(b). Note that G2 can be viewed as a perturbed version of G1 . It is easy to verify that G2 is rigid. Thus, by Theorem 5.6, it can fail to be uniquely localizable only if it has a realization in some higher dimension. Indeed, the above network has an 3dimensional realization. The idea for constructing such a realization is as follows. Let us rst remove the edge (x1 , x2 ). Then, reect the subgraph induced by a1 , x2 , a2 across the dotted line. Now, consider two spheres, one centered at a2 and the other centered at a3 , both having radius 5/2. The intersection of these spheres is a circle, and we can move x1 along this circle until the distance between x1 and x2 equals to the prespecied value. Then, we can put the edge (x1 , x2 ) back and obtain an 3dimensional realization of the network. More precisely, for the above realization, the reected version of x2 has co 173 112 495 23 ordinates x2 = 370 , 185 , 0 . Now, let x1 = 0, 64 , 64 . It is straightforward to verify that: x1 a2 x1 a3 x1 x2
2 2 2
= x1 a2 = x1 a3 = x1 x2
2 2 2
= = =
5 4 5 4 2 5
119
Hence, we conclude that G2 is not uniquely localizable. It would be nice to have a characterization on those graphs which are rigid in the plane but have higher dimensional realizations. However, nding such a characterization remains a challenging task, as such characterization would necessarily be noncombinatorial, and would depend heavily on the geometry of the network. For instance, the networks shown in Figure 5.2, while having the same combinatorial property as the one shown in Figure 5.1(b), are uniquely localizable (in fact, they are both strongly localizable):
a1
a1
x2
x2
x1
a2 a3
a2
a3
x1
(a)
(b)
5.3.7
The computational results presented here were generated using the interiorpoint algorithm SDP solvers SeDuMi of Sturm and DSDP2.0 of Benson and Ye. The performance of this technique seems highly satisfactory compared to other techniques. Very few anchor nodes are required to accurately estimate the position of all the unknown nodes in a network. Also the estimation errors are minimal even when the anchor nodes are not suitably placed within the network. Simulations were performed on a network of 50 sensors or nodes randomly generated in a square region of [.5 .5] [.5 .5]. The distances between the nodes was then calculated. If the distance between two notes was less than a given radiorange between [0, 1], a random error was added to it ij = dij (1 + randn(1) noisyf actor), d where noisyf actor was a given number between [0, 1], and then both upper and lower bound constraints were applied for that distance in the SDP model. If the distance was beyond the given radio range, only the lower bound constraint,
120
1.001 radiorange2 , was applied. The average estimation error is dened by 1 x j aj , n j =1 where x j comes from the SDP solution and aj is the true position of the j th X T X the total trace. node. As discussed earlier, we called the trace of Y Connectivity indicates how many of the nodes, on average, are within the radio range of a sensor. Also the original and the estimated sensors were plotted. The blue diamond nodes refer to the positions of the anchor nodes, green circle nodes to the true locations of the unknown sensors and red star nodes to their estimated positions . The discrepancies in the positions can be estimated by the osets from X between the true and the estimated points as indicated by the solid lines, see Figure 5.3. Even with noisy data measurement, the position estimations for the sensors are fairly accurate. Figure 5.4 shows the correlation between individual oset errors (not observable) and individual traces (observable) of corresponding sensors when the radio range is set at 0.25.
n
0.5
0.5
0.4
0.4
0.3
0.3
0.2
0.2
0.1
0.1
0.1
0.1
0.2
0.2
0.3
0.3
0.4
0.4
0.5 0.5
0.4
0.3
0.2
0.1
0.1
0.2
0.3
0.4
0.5
0.5 0.5
0.4
0.3
0.2
0.1
0.1
0.2
0.3
0.4
0.5
total
Figure 5.3: Position estimations with 3 anchors, noisy factor=0, and various radio ranges.
Several key questions have to be answered for the future research: What is the minimal radio range such that the network is localizable? How does the sensor geographical distribution change the minimal radio range? What is the sensitivity of the distance data? How to lter the data noisy? etc.
121
0.4
0.3
0.2
0.2
0.1
0.15
0.1
0.1
0.2
0.3
0.05
0.4
0.5 0.5
0.4
0.3
0.2
0.1
0.1
0.2
0.3
0.4
0.5
10
15
20
25
30
35
40
45
50
of
anchors=3,
of
anchors=3,
radio
Figure 5.4: Diamond: the oset distance between estimated and true positions, jj x Box: the square root of trace Y j 2 .
5.4
We list few other variances of the distance geometry or graph localization problem.
5.4.1
The basic mathematical model of the metric distance embedding problem can be described as a quadratically constrained optimization problem. We are given metric p-norm distances dij for all i, j = 1, ..., n. The problem is to nd n unknown points xj Rn , j = 1, ..., n such that xi xj
2
= (dij )2 i = j
or the error is minimized. Thus, an optimization problem can be formulated as: minimize subject to (dij )2 xi xj 2 (dij )2 , i < j xj Rn , j = 1, ..., n. (5.13)
minimize subject to
(5.14)
122 Again
Maximize Subject to 0
i<j
uij (dij )2
Reduce Y s rank to O(log n) for 2 =1+ . rank O(log n) for p O(log n) for all p. ? Better bound for 1
5.4.2
Molecular conrmation
The basic mathematical model of the molecular conrmation problem can be described as a quadratically constrained optimization problem as well. For a ij and lower pair of (i, j ) N , we are given Euclidean distance upper bound d bound dij between points i and j , the problem is to nd Y such that 2 (dij )2 (eij eT ij ) Y (dij ) , i < j N . Furthermore, it is desired to nd other missing dij such that the Euclidean distance matrix can be completed. The progem can be formulated as an SDP: minimize I Y 2 subject to (dij )2 (eij eT ij ) Y (dij ) , i < j N Y 0. (5.15)
5.4.3
The Euclidean ball packing problem is an old mathematical geometry problem with plenty modern applications in Bio-X and chemical structures. It is a special case of the Euclidean distance geometry problem, say d = 3, xj R3 is the ij = and d = ri + rj , where ri unknown position of the center of ball j , d ij and rj are the radii of balls i and j , respectively; and represents a convex container where the balls should be packed.
123
Say that we want to pack n balls in a box with width and length equal 2R and like to minimize the height of the box, we can formulate it as minimize subject to xi xj 2 (ri + rj )2 , i = j, R + rj xj (1) R rj , j, R + rj xj (2) R rj , j, rj xj (3) rj , j,
(5.16)
The rst constraint tells that the distance between any two balls should be greater than or equal to the sum of the two balls radii, the second says that the rst coordinate of xj is within the width constraint and the third is for the length requirement, and the fourth indicates that the ball center has to be above rj from the bottom and its height is below rj where is to be minimized.
5.4.4
Given P , a data point set of p1 , ..., pn Rd , a fundamental question is how to embed P into Q of q1 , ..., qn Rk , where k d, such that qj s keep all essential information of P , such as the norms, distances and angles between pj s. In other words, nd a d k -dimension subspace such that the projections of pj s onto the subspace has a minimal information loss. We believe that this problem can be posted as an Euclidean distance geometry problem Lk := minimize subject to
nk T 2 i=1 ((pl ) xi ) , nk T 2 i=1 ((pl pm ) xi ) 2 xi = 1, i = 1, ..., n
l, , l < m, k, (xh )T xi = 0, h < i. (5.17) Here, x1 , ..., xnk represent the n k orthogonal bases of a n k -dimension subT space. Thus, (pT l xi ; ...; pl xnk ) represents the projection of pl onto the n k subspace. After the problem being solved and the subspace is found, the projections of pl s onto the complement k -dimension subspace must have all remaining information of pl s. In fact, the model without pair-wise distance constraints is called the minimum radius problem, which is a fundamental computational geometry and statistics problem. Assume that x1 , x2 , , xnk Rd are the optimal solution of (5.17). Then T T one can easily verify that the matrix Y = x1 xT 1 + x2 x2 + + xnk xnk is a feasible solution for the following SDP model:
k :=
minimize subject to
(5.18) 0.
It follows that k Rk (P )2 . Then, we can design a rounding procedure to round Y , a solution to (5.18), into orthogonal n k bases and prove their quality.
124
Theorem 5.9 We can computed in polynomial time, a (n k )-subspace such 2 that, with probability at least 1 n , its projected value of (5.17) is at most 12 log(n) of the minimal one.
5.5 Approximating the radii of point sets
The outer radius, R_k(P), is the minimum of the Euclidean radius of P projected onto a k-dimensional subspace. Computing the minimal radius is a fundamental problem in computational convexity with applications in global optimization, data mining, statistics and clustering, and has received considerable attention in the computational geometry literature [60, 153]. The square of R_k(P) can be defined as the optimal value of the following Euclidean distance minimization problem:

R_k(P)² := minimize    t
           subject to  Σ_{i=1}^k ((p_l)^T x_i)² ≤ t,   p_l ∈ P,
                       ||x_i||² = 1,   i = 1, ..., k,
                       (x_i)^T x_j = 0,   i ≠ j ≤ k.      (5.19)

This leads to the following SDP relaxation:
θ_k := minimize    t
       subject to  Trace(p_l p_l^T X) ≤ t,   p_l ∈ P,
                   Trace(X) = k,
                   I ⪰ X ⪰ 0.      (5.20)

It follows that θ_k ≤ R_k(P)².
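For readers who want to experiment with the relaxation (5.20), here is a minimal sketch, again assuming CVXPY; the function and variable names are illustrative.

```python
import cvxpy as cp
import numpy as np

def outer_radius_relaxation(P, k):
    """Sketch of the SDP (5.20); P is an (n x d) array whose rows are the points."""
    n, d = P.shape
    X = cp.Variable((d, d), PSD=True)
    t = cp.Variable()
    cons = [cp.trace(X) == k, X << np.eye(d)]
    cons += [cp.trace(np.outer(P[l], P[l]) @ X) <= t for l in range(n)]
    cp.Problem(cp.Minimize(t), cons).solve()
    return t.value, X.value   # theta_k and an optimal X of (5.20)
```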
Lemma 5.10 There exists an integer r ≥ k such that we can compute, in polynomial time, r nonnegative reals λ_1, λ_2, ..., λ_r with each λ_i ≤ 1, and r orthogonal unit vectors v_1, v_2, ..., v_r, such that

(i). Σ_{i=1}^r λ_i = k;

(ii). Σ_{i=1}^r λ_i ⟨p, v_i⟩² ≤ R_k(P)², for any p ∈ P.
Proof. We solve the semidefinite program (5.20), and let X be an optimal solution of (5.20). We claim that the rank of X, say r, is at least k. This follows from the fact that Trace(X) = k and I ⪰ X ⪰ 0. In other words, Trace(X) = k implies that the sum of the eigenvalues of X is equal to k, and I ⪰ X ⪰ 0 implies that all eigenvalues are less than or equal to 1. Therefore, X has at least k non-zero eigenvalues, which implies that the rank of X is at least k. Let λ_1, λ_2, ..., λ_r be the r nonnegative eigenvalues and v_1, v_2, ..., v_r the corresponding eigenvectors. Then we have Σ_{i=1}^r λ_i = k and max_{1≤i≤r} λ_i ≤ 1. Furthermore, for any p ∈ P,

Σ_{i=1}^r λ_i ⟨p, v_i⟩² = Trace( p p^T Σ_{i=1}^r λ_i v_i v_i^T ) = Trace(p p^T X) ≤ θ_k ≤ R_k(P)².
5.5.1 Partitioning the eigenvalues
In this section, we prove a lemma concerning how to deterministically group the eigenvalues and their eigenvectors. The proof of the lemma is elementary, but it plays an important role in proving our main result.

Lemma 5.11 The index set {1, 2, ..., r} can be partitioned into k sets I_1, I_2, ..., I_k such that

(i). ∪_{i=1}^k I_i = {1, 2, ..., r}, and for any i ≠ j, I_i ∩ I_j = ∅;

(ii). for any i, 1 ≤ i ≤ k,  Σ_{j∈I_i} λ_j ≥ 1/2.
Proof. Recall that Σ_{j=1}^r λ_j = k and 0 ≤ λ_j ≤ 1 for all j. Without loss of generality, we can assume that λ_1 ≥ λ_2 ≥ ... ≥ λ_r. Our partitioning algorithm is the same as the Longest-Processing-Time heuristic algorithm for the parallel machine scheduling problem. The algorithm works as follows:

STEP 1. For i = 1, 2, ..., k, set I_i = ∅ and let L_i = 0. Let I = {1, 2, ..., r}.

STEP 2. While I ≠ ∅: choose j from I with the smallest index; choose the set i with the smallest value L_i; let I_i := I_i ∪ {j}, L_i := L_i + λ_j and I := I \ {j}.

It is clear that when the algorithm stops, the sets I_1, I_2, ..., I_k satisfy condition (i). Now we prove condition (ii) by contradiction. Assume that there exists some t such that Σ_{j∈I_t} λ_j < 1/2. We claim that then Σ_{j∈I_i} λ_j ≤ 1 for all i. Otherwise, suppose Σ_{j∈I_{t'}} λ_j > 1 for some t'. Note that λ_j ≤ 1 for every j, and thus at least two eigenvalues are assigned to I_{t'}. Denote the last one assigned by λ_s. Since the algorithm always assigns the next eigenvalue to the set with the smallest current sum, at the moment λ_s was assigned we had Σ_{j∈I_{t'}\{s}} λ_j ≤ Σ_{j∈I_t} λ_j < 1/2; otherwise we would not have assigned λ_s to I_{t'} in the algorithm. It follows that λ_s > Σ_{j∈I_{t'}} λ_j − 1/2 > 1/2. This is impossible: since λ_s is the last eigenvalue assigned to I_{t'} and the eigenvalues are assigned in nonincreasing order, λ_s ≤ λ_j for every other j ∈ I_{t'}, so Σ_{j∈I_{t'}\{s}} λ_j ≥ λ_s > 1/2, a contradiction. Therefore Σ_{j∈I_i} λ_j ≤ 1 for all i, and hence

Σ_{i=1}^k Σ_{j∈I_i} λ_j < (k − 1) + 1/2 < k.

However, we know that, by condition (i), Σ_{i=1}^k Σ_{j∈I_i} λ_j = Σ_{j=1}^r λ_j = k. This results in a contradiction. Therefore, such a t does not exist and we have proved condition (ii).

Notice that the running time of the partitioning algorithm is bounded by O(r k). An alternative way of partitioning the eigenvalues is the following: First, put the eigenvalues that are greater than or equal to 1/2 into distinct subsets. If the number of such eigenvalues, say l, is not less than k, then we are done. Otherwise, arbitrarily put the remaining eigenvalues into the k − l subsets such that the sum of the eigenvalues in each subset is greater than or equal to 1/2. This method was suggested by an anonymous referee.
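The greedy procedure in the proof (a Longest-Processing-Time style assignment) can be sketched as follows; the function name and data layout are illustrative, assuming the eigenvalues are already sorted in nonincreasing order.

```python
import heapq

def partition_eigenvalues(lams, k):
    """Greedy LPT-style partition of eigenvalues lams (nonincreasing,
    sum = k, each <= 1) into k index sets I_1, ..., I_k."""
    sets = [[] for _ in range(k)]
    heap = [(0.0, i) for i in range(k)]   # (current load L_i, set index i)
    heapq.heapify(heap)
    for j, lam in enumerate(lams):
        load, i = heapq.heappop(heap)     # set with the smallest load
        sets[i].append(j)
        heapq.heappush(heap, (load + lam, i))
    return sets
```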
5.5.2 Randomized rounding
Assume now that we have found I_1, ..., I_k. Our randomized rounding procedure then works as follows.

STEP 1. Generate an r-dimensional random vector σ such that each entry of σ takes value, independently, 1 or −1 with probability 1/2 each way.

STEP 2. For i = 1, 2, ..., k, let

x_i = ( Σ_{j∈I_i} σ_j √λ_j v_j ) / √( Σ_{j∈I_i} λ_j ).
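A direct transcription of STEP 1 and STEP 2, assuming NumPy and that the eigenvalues, eigenvectors, and the partition from Lemma 5.11 are given, might look as follows (illustrative names).

```python
import numpy as np

def randomized_rounding(lams, V, index_sets, rng=None):
    """STEP 1-2 above: lams[j] are eigenvalues, V[:, j] the eigenvectors,
    index_sets the partition I_1, ..., I_k from Lemma 5.11."""
    rng = rng or np.random.default_rng()
    sigma = rng.choice([-1.0, 1.0], size=len(lams))        # STEP 1
    xs = []
    for I in index_sets:                                    # STEP 2
        num = sum(sigma[j] * np.sqrt(lams[j]) * V[:, j] for j in I)
        xs.append(num / np.sqrt(sum(lams[j] for j in I)))
    return xs   # k orthogonal unit vectors (Lemmas 5.12-5.13)
```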
The following lemmas show that x_1, x_2, ..., x_k form a feasible solution for the original problem; in other words, they are k orthogonal unit vectors.

Lemma 5.12 For i = 1, 2, ..., k, ||x_i|| = 1.

Proof. Recall that ⟨v_l, v_j⟩ = 0 for any l ≠ j and ||v_j|| = 1. By definition,

||x_i||² = ⟨x_i, x_i⟩
        = (1 / Σ_{j∈I_i} λ_j) ⟨ Σ_{j∈I_i} σ_j √λ_j v_j , Σ_{j∈I_i} σ_j √λ_j v_j ⟩
        = (1 / Σ_{j∈I_i} λ_j) Σ_{j∈I_i} (σ_j)² λ_j ||v_j||²
        = (1 / Σ_{j∈I_i} λ_j) Σ_{j∈I_i} λ_j
        = 1.
Lemma 5.13 If s ≠ t then ⟨x_s, x_t⟩ = 0.

Proof. The proof is similar to that of Lemma 5.12:

⟨x_s, x_t⟩ = ( 1 / ( √(Σ_{j∈I_s} λ_j) √(Σ_{j∈I_t} λ_j) ) ) ⟨ Σ_{j∈I_s} σ_j √λ_j v_j , Σ_{j∈I_t} σ_j √λ_j v_j ⟩ = 0.
The last equality holds since for any j ∈ I_s and l ∈ I_t, ⟨v_j, v_l⟩ = 0.

Now we establish a bound on the performance of our algorithm. First, let us introduce Bernstein's theorem (see, e.g., [238]), which is a form of the Chernoff bound.

Lemma 5.14 Let σ be a random vector whose entries are independent and either 1 or −1 with probability .5 each way. Then, for any vector e and β > 0,

prob{ |⟨σ, e⟩| > β ||e|| } ≤ 2 exp(−β²/2).

Let C_ip = Σ_{j∈I_i} λ_j ⟨p, v_j⟩². Then we have

Lemma 5.15 For every i and p ∈ P,

prob{ ⟨p, x_i⟩² > 12 log(n) C_ip } ≤ 2/n³.
Proof. Given i and p, define an |I_i|-dimensional vector e such that its entries are √λ_j ⟨p, v_j⟩, j ∈ I_i, respectively. Furthermore, define the vector σ|I_i whose entries are those of σ with indices in I_i. First notice that

||e||² = Σ_{j∈I_i} (√λ_j ⟨p, v_j⟩)² = Σ_{j∈I_i} λ_j ⟨p, v_j⟩² = C_ip.

Moreover,

⟨p, x_i⟩² = ⟨p, Σ_{j∈I_i} σ_j √λ_j v_j⟩² / Σ_{j∈I_i} λ_j
          ≤ 2 ⟨p, Σ_{j∈I_i} σ_j √λ_j v_j⟩²   (since Σ_{j∈I_i} λ_j ≥ 1/2)
          = 2 ( Σ_{j∈I_i} σ_j √λ_j ⟨p, v_j⟩ )² = 2 ⟨σ|I_i, e⟩².

Hence, the event ⟨p, x_i⟩² > 12 log(n) C_ip implies ⟨σ|I_i, e⟩² > 6 log(n) ||e||².
Therefore, the conclusion of the lemma follows by using Lemma 5.14 and letting β = √(6 log(n)).

Theorem 5.16 We can compute, in polynomial time, a (d − k)-flat F such that, with probability at least 1 − 2/n, the distance between any point p ∈ P and F is at most √(12 log(n)) · R_k(P).

Proof. For given i = 1, 2, ..., k and p ∈ P, consider the event

B_ip = { ⟨p, x_i⟩² > 12 log(n) C_ip },

and let B = ∪_{i,p} B_ip. The probability that the event B happens is bounded by

prob{B} ≤ Σ_{i,p} prob{ ⟨p, x_i⟩² > 12 log(n) C_ip } ≤ 2kn/n³ ≤ 2/n.

If B does not happen, then for any i and p, ⟨p, x_i⟩² ≤ 12 log(n) C_ip. Therefore, for each p ∈ P,

Σ_{i=1}^k ⟨p, x_i⟩² ≤ 12 log(n) Σ_{i=1}^k C_ip ≤ 12 log(n) R_k(P)².

The last inequality follows from Lemma 5.10. This completes the proof by taking F as the flat which is orthogonal to the vectors x_1, x_2, ..., x_k.
5.6 A distributed SDP computation scheme
The SDP problems presented in this chapter can be solved in a distributed fashion, which has not been studied before. Here we describe an iterative distributed SDP computation scheme for sensor network localization. We first partition the anchors into many clusters according to their physical positions, and assign a sensor to these clusters if it has a direct connection to one of the
anchors. We then solve semidefinite programs independently at each cluster, and fix those sensors' positions which have high accuracy measures according to the SDP computation. These positioned sensors become ghost anchors and are used to help position the remaining un-positioned sensors. The distributed scheme then repeats. A round of the distributed computation method is straightforward and intuitive:

1. Partition the anchors into a number of clusters according to their geographical positions. In our implementation, we partition the entire sensor area into a number of equal-sized squares, and the anchors in the same square form a regional cluster.

2. Each (unpositioned) sensor checks whether it has a direct connection to an anchor (is within communication range of an anchor). If it does, it becomes an unknown sensor point in the cluster to which that anchor belongs. Note that a sensor may be assigned to multiple clusters and some sensors are not assigned to any cluster.

3. For each cluster of anchors and unknown sensors, formulate the error minimization problem for that cluster, and solve the resulting SDP model if the number of anchors is more than 2. Typically, each cluster has fewer than 100 sensors and the model can be solved efficiently.

4. After solving each SDP model, check the individual trace (5.7) for each unknown sensor in the model. If it is below a predetermined small tolerance, label the sensor as positioned and let its estimate x̂_j become an anchor. If a sensor is assigned to multiple clusters, we choose the x̂_j that has the smallest individual trace. This is done so as to choose the best estimate of the particular sensor from the estimates provided by solving the different clusters.

5. Consider the positioned sensors as anchors and return to Step 1 to start the next round of estimation.

Note that the solution of the SDP problem in each cluster can be carried out at the cluster level, so the computation is highly distributive. The only information that needs to be passed among the neighboring clusters is which of the unknown sensors become positioned after a round of SDP solutions. In solving the SDP model for each cluster, even if the number of sensors is below 100, the total number of constraints could be in the range of thousands. However, many of those bounding-away constraints, i.e., the constraints between two remote points, are inactive or redundant at the optimal solution. Therefore, we adopt an iterative active constraint generation method. First, we solve the problem including only partial equality constraints and completely ignoring the bounding-away inequality constraints to obtain a solution. Secondly, we verify the equality and inequality constraints and add those violated at the current solution into the model, and then re-solve it with a warm-start solution. We can repeat this process until all of the constraints are satisfied.
Typically, only about O(n + m) constraints are active at the final solution, so the total number of constraints in the model can be controlled at O(n + m).
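One possible way to organize a single round of the scheme is sketched below; solve_cluster_sdp is a stand-in for the per-cluster SDP solve (assumed to return position estimates and their individual traces), and all names and data layouts are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def one_round(anchors, sensors, grid, radio_range, tol, solve_cluster_sdp):
    """One round of the distributed scheme above.
    anchors/sensors: dicts id -> position (sensors hold current estimates)."""
    # Steps 1-2: cluster anchors by grid cell, attach directly connected sensors.
    clusters = {}
    for a_id, a_pos in anchors.items():
        cell = tuple((a_pos // grid).astype(int))
        cl = clusters.setdefault(cell, {"anchors": {}, "sensors": set()})
        cl["anchors"][a_id] = a_pos
        for s_id, s_pos in sensors.items():
            if np.linalg.norm(s_pos - a_pos) <= radio_range:
                cl["sensors"].add(s_id)
    # Steps 3-4: solve each cluster; promote well-estimated sensors to ghost anchors.
    best = {}
    for cl in clusters.values():
        if len(cl["anchors"]) <= 2 or not cl["sensors"]:
            continue
        estimates, traces = solve_cluster_sdp(cl["anchors"], cl["sensors"])
        for s_id in cl["sensors"]:
            if traces[s_id] < tol and traces[s_id] < best.get(s_id, (np.inf,))[0]:
                best[s_id] = (traces[s_id], estimates[s_id])
    # Step 5: positioned sensors become anchors for the next round.
    for s_id, (_, pos) in best.items():
        anchors[s_id] = pos
        sensors.pop(s_id, None)
    return anchors, sensors
```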
5.6.1 Computational results
Simulations were performed on networks of 2,000 to 4,000 sensor points which are randomly generated in a square region [−.5, .5] × [−.5, .5]. We generally select the first 5–10% of the points as anchors, that is, the anchors are also uniformly distributed in the same random manner. The tolerance for labeling a sensor as positioned is set at 0.01 · (1 + noisy-factor) · radio-range. One simulation solves a network localization with 4,000 sensors, where the entire sensor region is partitioned into 10 × 10 equal-sized squares, that is, 100 clusters, and the radio range is set at .035. The total solution time for the five-round computation on a single Pentium 1.2 GHz and 500 MB PC, excluding computing the d̂_ij, is about four minutes using DSDP2.0. The final solution is plotted in Figure 5.5.
Figure 5.5: Third and final round position estimations in the 4,000 sensor network, noisy-factor = 0, radio-range = 0.045, and the number of clusters = 100.

It is usually the outlying sensors at the boundary, or the sensors which do not have many anchors within the radio range, that are not estimated in the initial
stages of the method. Gradually, as the number of well-estimated sensors or ghost anchors grows, more and more of these points are positioned. We have also estimated the same 4,000-sensor network using noisy data. It is noted that the erroneous points are isolated within particular regions. This clearly indicates that the clustering approach prevents the propagation of errors to other clusters. We feel that our distributed computation technology is promising for solving very large scale distance geometry related problems.
5.7 Notes
The distance geometry problem and its variants arise from applications in various areas, such as molecular conformation, dimensionality reduction, Euclidean ball packing, and, more recently, wireless sensor network localization [5, 51, 79, 159, 269, 273]. In the sensor network setting, the vertices of G correspond to sensors, the edges of G correspond to communication links, and the weights correspond to distances. Furthermore, the vertices are partitioned into two sets: one is the anchors, whose exact positions are known (via GPS, for example); and the other is the sensors, whose positions are unknown. The goal is to determine the positions of all the sensors. We shall refer to this problem as the Sensor Network Localization problem. Note that we can view the Sensor Network Localization problem as a variant of the Graph Realization problem in which a subset of the vertices is constrained to be in certain positions. In many sensor network applications, sensors collect data that are location dependent. Thus, another related question is whether the given instance has a unique realization in the required dimension (say, in R²). Indeed, most of the previous works on the Sensor Network Localization problem fall into two categories: one deals with computing a realization of a given instance [51, 79, 89, 159, 268, 269, 270, 273], and the other deals with determining whether a given instance has a unique realization in R^d using graph rigidity [89, 137]. It is interesting to note that, from an algorithmic viewpoint, the two problems above have very different characteristics. Under certain non-degeneracy assumptions, the question of whether a given instance has a unique realization on the plane can be decided efficiently [167], while the problem of computing a realization on the plane is NP-complete in general, even if the given instance has a unique realization on the plane [24]. Thus, it is not surprising that all the aforementioned heuristics for computing a realization of a given instance do not guarantee to find it in the required dimension. On another front, there have been attempts to characterize families of graphs that admit polynomial-time algorithms for computing a realization in the required dimension. For instance, Eren et al. [89] have shown that the family of trilateration graphs has this property. (A graph is a trilateration graph in dimension d if there exists an ordering of the vertices 1, ..., d + 1, d + 2, ..., n such that (i) the first d + 1 vertices form a complete graph, and (ii) each vertex j > d + 1 has at least d + 1 edges to vertices earlier in the sequence.) However, the question of whether there exist larger families of graphs with this property is open.
We should mention here that various researchers have used SDP to study the distance geometry problem (or its variants) before. For instance, Barvinok [33, 34] has studied this problem in the context of quadratic maps and used SDP theory to analyze the possible dimensions of the realization. Alfakih and Wolkowicz [6, 7] have related this problem to the Euclidean Distance Matrix Completion problem and obtained an SDP formulation for the former. Moreover, Alfakih has obtained a characterization of rigid graphs in [3] using Euclidean distance matrices and has studied some of the computational aspects of such characterization in [4] using SDP. However, these papers mostly addressed the question of realizability of the input graph, and the analyses of their SDP models only guarantee that they will find a realization whose dimension lies within a certain range. Thus, these models are not quite suitable for our application. In contrast, our analysis takes advantage of the presence of anchors and gives a condition which guarantees that our SDP model will find a realization in the required dimension. We remark that SDP has also been used to compute and analyze distance geometry problems where the realization is allowed to have a certain amount of distortion in the distances [35, 203]. Again, these methods can only guarantee to find a realization that lies in a range of dimensions. Thus, it would be interesting to extend our method to compute low-distortion realizations in a given dimension. For some related work in this direction, see, e.g., [26]. The SDP models were first used by [63, 64] for ad hoc wireless sensor network localization, and the results described here were based on their work. The result on the radii of point sets is due to [339], which improves the ratio of [317] by a factor of O(√log d) that could be as large as O(√log n). This ratio also matches the previously best known ratio for approximating the special case R_1(P) of [238], the width of the point set P.
5.8 Exercises
5.1 Consider the minimization problem

minimize    Σ_{(i,j)∈N_x, i<j} (α_ij)² + Σ_{(k,j)∈N_a} (α_kj)²
subject to  ||x_i − x_j||² = (d̂_ij)² + α_ij,   (i, j) ∈ N_x, i < j,
            ||a_k − x_j||² = (d̂_kj)² + α_kj,   (k, j) ∈ N_a,
            ||x_i − x_j||² ≥ R²,   for the rest i < j,
            ||a_k − x_j||² ≥ R²,   for the rest k, j;

and interpret the objective function.

5.2 Find the SDP relaxation to the following minimization problem:

minimize    Σ_{(i,j)∈N_x, i<j} (α_ij)² + Σ_{(k,j)∈N_a} (α_kj)²
subject to  ||x_i − x_j|| = d̂_ij + α_ij,   (i, j) ∈ N_x, i < j,
            ||a_k − x_j|| = d̂_kj + α_kj,   (k, j) ∈ N_a,
            ||x_i − x_j||² ≥ R²,   for the rest i < j,
            ||a_k − x_j||² ≥ R²,   for the rest k, j.
5.3 The Un-Folding Problem: with the same notions, consider the problem

maximize    Σ_i ||x_i||²
subject to  ||a_k − x_j||² = (d̂_kj)²,   (k, j) ∈ N_a,
            ||x_i − x_j||² = (d_ij)²,   (i, j) ∈ N_x.      (5.21)

Find the SDP relaxation and its dual, and explain what they are.

5.4 Given one sensor and three anchors in R², suppose the three distances from the sensor to the three anchors are given. Show that the graph is strongly localizable if and only if the sensor's position lies in the linear space of the three anchors' positions.

5.5 Prove that the graphs depicted in Figures 5.2(a) and 5.2(b) are strongly localizable.

5.6 Suppose the angle between two edges joined at a point is known. How can one incorporate this information in the SDP relaxation of the distance geometry problem?
Chapter 6
Consider an optimization problem of the form

Minimize    f(x, ξ)
Subject to  F(x, ξ) ∈ K,

where ξ is the data of the problem, x ∈ R^n is the decision vector, and K is a convex cone. For deterministic optimization, we assume ξ is known and fixed. In reality, ξ may not be certain: knowledge of ξ may be limited to its belonging to a given uncertainty set U. The constraints must then be satisfied for every ξ in the uncertainty set U, and the optimal solution must give the best guaranteed value of sup_{ξ∈U} f(x, ξ). This leads to the so-called robust counterpart:

Minimize    sup_{ξ∈U} f(x, ξ)
(ROPT)
Subject to  F(x, ξ) ∈ K   for all ξ ∈ U.      (6.2)
6.1.1 Stochastic Method

Minimize    E[f(x, ξ)]
(EOPT)
Subject to  E[F(x, ξ)] ∈ K.
6.1.2 Sampling Method

Minimize    z
(SOPT)
Subject to  F(x, ξ^k) ∈ K,
            f(x, ξ^k) ≤ z,   for a large number of samples ξ^k ∈ U.
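A minimal sketch of the sampling counterpart (SOPT), assuming CVXPY and that f_of and F_of are user-supplied callables returning a convex CVXPY expression and a list of CVXPY constraints for each fixed sample (hypothetical names):

```python
import cvxpy as cp

def scenario_counterpart(n, samples, f_of, F_of):
    """(SOPT): minimize z s.t. f(x, xi_k) <= z and F(x, xi_k) feasible
    for every sampled xi_k."""
    x = cp.Variable(n)
    z = cp.Variable()
    constraints = []
    for xi in samples:
        constraints.append(f_of(x, xi) <= z)
        constraints += F_of(x, xi)
    cp.Problem(cp.Minimize(z), constraints).solve()
    return x.value, z.value
```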
6.2
Consider the problem

Minimize    q^T x
Subject to  x^T A^T A x ≤ 1,      (6.3)

where the matrix A is uncertain.
6.2.1 Ellipsoid uncertainty
A ∈ { A_0 + Σ_{j=1}^k u_j A_j : u^T u ≤ 1 }.
The robust counterpart requires the constraint to hold for every A in the uncertainty set:

Minimize    q^T x
Subject to  ||(A_0 + Σ_{j=1}^k u_j A_j) x|| ≤ 1   for all u: u^T u ≤ 1,

or equivalently

Minimize    q^T x
Subject to  ||(A_0 + Σ_{j=1}^k u_j A_j) x||² ≤ 1   for all u: u^T u ≤ 1.
Let F(x) = (A_0 x, A_1 x, ..., A_k x), so that (A_0 + Σ_{j=1}^k u_j A_j) x = F(x) (1; u). The robust constraint then says that

(1; u)^T F(x)^T F(x) (1; u) ≤ 1   whenever   (1; u)^T [1, 0; 0, −I] (1; u) ≥ 0,

i.e., that u^T u ≤ 1 implies

(1; u)^T ( [1, 0; 0, 0] − F(x)^T F(x) ) (1; u) ≥ 0.
6.2.2 S-Lemma
Lemma 6.1 Let P and Q be two symmetric matrices such that there exists u_0 satisfying (u_0)^T P u_0 > 0. Then the implication

u^T P u ≥ 0   ⟹   u^T Q u ≥ 0

holds true if and only if there exists λ ≥ 0 such that Q ⪰ λ P.
Applying the S-lemma with P = [1, 0; 0, −I] and Q = [1, 0; 0, 0] − F(x)^T F(x), the implication

(1; u)^T [1, 0; 0, −I] (1; u) ≥ 0   ⟹   (1; u)^T ( [1, 0; 0, 0] − F(x)^T F(x) ) (1; u) ≥ 0

holds if and only if there is a λ ≥ 0 such that

[1, 0; 0, 0] − F(x)^T F(x) ⪰ λ [1, 0; 0, −I].

This is equivalent to

[ D(λ)    F(x)^T ]
[ F(x)    I      ] ⪰ 0,   where D(λ) = [1 − λ, 0; 0, λ I],

which is an SDP constraint since F(x) is linear in x. Here we have used another technical lemma.

Lemma 6.2 Let P be a symmetric matrix and A be a rectangular matrix. Then P ⪰ A^T A if and only if

[ P    A^T ]
[ A    I   ] ⪰ 0.
6.2.3 SDP formulation

Considering λ and x as variables, the robust problem finally becomes an SDP problem:

Minimize    q^T x
(REQP)
Subject to  [ D(λ)    F(x)^T ]
            [ F(x)    I      ] ⪰ 0,   λ ≥ 0,   where D(λ) = [1 − λ, 0; 0, λ I].
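The LMI in (REQP) can be prototyped directly; the sketch below assumes CVXPY and that the ellipsoidal uncertainty is supplied as the list A_list = [A_0, A_1, ..., A_k] (illustrative names).

```python
import cvxpy as cp
import numpy as np

def robust_qc(q, A_list):
    """Sketch of (REQP): minimize q^T x subject to ||A(u) x|| <= 1 for all
    u in the unit ball, where A(u) = A_list[0] + sum_j u_j A_list[j]."""
    m, n = A_list[0].shape
    k = len(A_list) - 1
    x = cp.Variable(n)
    lam = cp.Variable(nonneg=True)
    # F(x) = (A_0 x, A_1 x, ..., A_k x), an m x (k+1) matrix
    Fx = cp.hstack([cp.reshape(A @ x, (m, 1)) for A in A_list])
    D = cp.bmat([[cp.reshape(1 - lam, (1, 1)), np.zeros((1, k))],
                 [np.zeros((k, 1)), lam * np.eye(k)]])
    M = cp.bmat([[D, Fx.T], [Fx, np.eye(m)]])
    cp.Problem(cp.Minimize(q @ x), [M >> 0]).solve()
    return x.value
```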
6.3
6.3.1 Ellipsoid uncertainty
(A, b, γ) ∈ { (A_0, b_0, γ_0) + Σ_{j=1}^k u_j (A_j, b_j, γ_j) : u^T u ≤ 1 }.
The robust problem is

Minimize    q^T x
Subject to  x^T A^T A x + 2 b^T x + γ ≤ 0   for all (A, b, γ) in the uncertainty set, i.e., for all u: u^T u ≤ 1.

Writing A(u) x = F(x)(1; u) with F(x) = (A_0 x, A_1 x, ..., A_k x), the constraint can be written as

(1; u)^T ( G(x) + F(x)^T F(x) ) (1; u) ≤ 0   whenever u^T u ≤ 1,

where

G(x) = [ γ_0 + 2 x^T b_0     γ_1/2 + x^T b_1   ...   γ_k/2 + x^T b_k ]
       [ γ_1/2 + x^T b_1     0                 ...   0               ]
       [ ...                                                          ]
       [ γ_k/2 + x^T b_k     0                 ...   0               ].
6.3.2 SDP formulation
From the S-lemma, we immediately see that the robust constraint holds if and only if there is a λ ≥ 0 such that

−G(x) − F(x)^T F(x) ⪰ λ [1, 0; 0, −I],

which, by Lemma 6.2, is equivalent to

[ −G(x) − λ [1, 0; 0, −I]    F(x)^T ]
[ F(x)                       I      ] ⪰ 0,

which is an SDP constraint since the matrix is affine in x and λ. Considering λ and x as variables, the robust problem finally becomes an SDP problem:

Minimize    q^T x
(REQP)
Subject to  [ −G(x) − λ [1, 0; 0, −I]    F(x)^T ]
            [ F(x)                       I      ] ⪰ 0,   λ ≥ 0.
6.4
Minimize    q^T x
Subject to  x^T A_i^T A_i x + 2 b_i^T x + γ_i ≤ 0,   i = 1, ..., t.      (6.5)
6.4.1 Ellipsoid uncertainty

(A_i, b_i, γ_i) ∈ { (A_i^0, b_i^0, γ_i^0) + Σ_{j=1}^k u_j (A_i^j, b_i^j, γ_i^j) : u^T u ≤ 1 }.
Minimize    q^T x
Subject to  x^T A_i^T A_i x + 2 b_i^T x + γ_i ≤ 0,   i = 1, ..., t,   for all u: u^T u ≤ 1.
6.4.2 SDP formulation

Let again F_i(x) = (A_i^0 x, A_i^1 x, ..., A_i^k x), and let G_i(x) denote the matrix of the previous section built from (b_i^j, γ_i^j). Considering λ_i, i = 1, ..., t, and x as variables, the robust problem finally becomes an SDP problem:

Minimize    q^T x
Subject to  [ −G_i(x) − λ_i [1, 0; 0, −I]    F_i(x)^T ]
            [ F_i(x)                         I        ] ⪰ 0,   λ_i ≥ 0,   i = 1, ..., t.
6.5
Recall the robust counterpart

Minimize    sup_{u∈U} f(x, ξ(u))
Subject to  F(x, ξ(u)) ∈ K   for all u ∈ U,

where the uncertain data are parameterized as ξ = ξ(u), u ∈ U. Represent the inner problem sup_{u∈U} f(x, ξ(u)) by its conic dual problem and let the dual objective function be φ(η_0, x), where η_0 collects the dual variables. The dual objective function is an upper bound on f(x, ξ(u)), so we could represent the problem by

Minimize    φ(η_0, x)
(ROPT)
Subject to  F(x, ξ(u)) ∈ K for all u ∈ U,      (6.7)
            dual constraints on η_0.

We handle the constraints in the exact same manner, and the problem becomes

Minimize    φ(η_0, x)
(ROPT)
Subject to  Φ(η, x) ∈ C,      (6.8)
            dual constraints on η_0, η,

where Φ(η, x) is, component-wise, the dual objective function of sup_{u∈U} F(x, ξ(u)).
6.5.1 Examples
Consider a robust linear constraint

a x + b ≤ 0   for all   (a, b) ∈ { (a_0, b_0) + Σ_{i=1}^k u_i (a_i, b_i) : u ∈ U }.

Case of ||u||_2 ≤ 1:

max_{u: ||u||_2 ≤ 1} { a_0 x + b_0 + Σ_{i=1}^k u_i (a_i x + b_i) } = a_0 x + b_0 + √( Σ_{i=1}^k (a_i x + b_i)² ),

so the robust constraint becomes

(a_0 x + b_0) + √( Σ_{i=1}^k (a_i x + b_i)² ) ≤ 0.

Case of ||u||_∞ ≤ 1:

max_{u: ||u||_∞ ≤ 1} { a_0 x + b_0 + Σ_{i=1}^k u_i (a_i x + b_i) } = a_0 x + b_0 + Σ_{i=1}^k |a_i x + b_i|,

so the robust constraint becomes

(a_0 x + b_0) + Σ_{i=1}^k |a_i x + b_i| ≤ 0.

Case of ||u||_1 ≤ 1:

max_{u: ||u||_1 ≤ 1} { a_0 x + b_0 + Σ_{i=1}^k u_i (a_i x + b_i) } = a_0 x + b_0 + max_{1≤i≤k} |a_i x + b_i|,

so the robust constraint becomes

(a_0 x + b_0) + max_{1≤i≤k} |a_i x + b_i| ≤ 0.
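Each case reflects a standard norm-duality fact: the worst-case term is the dual norm of the residual vector with entries a_i x + b_i. A small CVXPY sketch, with illustrative names, builds the corresponding convex constraint:

```python
import cvxpy as cp

def robust_linear_constraint(x, a0, b0, A, b, norm):
    """Robust constraint a(u) x + b(u) <= 0 for (a,b) = (a0,b0) + sum_i u_i (a_i,b_i),
    over the unit ball of the given norm on u (2, 'inf', or 1).
    A is k x n with rows a_i, b has entries b_i."""
    residuals = A @ x + b                      # entries a_i x + b_i
    if norm == 2:
        worst = cp.norm(residuals, 2)          # sqrt(sum of squares)
    elif norm == 'inf':
        worst = cp.norm(residuals, 1)          # sum |a_i x + b_i| (dual of the inf-ball)
    else:
        worst = cp.norm(residuals, 'inf')      # max_i |a_i x + b_i| (dual of the 1-ball)
    return [a0 @ x + b0 + worst <= 0]
```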
6.6 Applications

6.6.1 Robust Linear Least Squares
Consider the least squares problem

Minimize    ||A y − c||²
Subject to  y ∈ Y.

With the uncertainty set

(A, c) ∈ { (A_0, c_0) + Σ_{i=1}^k u_i (A_i, c_i) : ||u|| ≤ 1 },

the robust counterpart can be drawn from the earlier discussion, since the original problem is a convex QP problem.
6.6.2
Consider an electric circuit represented as a connected oriented graph with N arcs, each of which possesses a given conductance s_ij. Assume that the arcs of the circuit are charged with external voltages y and internal voltages v. These voltages induce currents in the circuit; as a result, the circuit dissipates heat. According to Kirchhoff's laws, the heat is given by

H = min_{v∈R^m} (Qv + P y)^T S (Qv + P y),
where Q and P are some matrices given by the topology of the circuit, m is the number of nodes of the circuit, and S is an N × N diagonal matrix with the s_ij as its diagonal entries. The problem is to choose y so as to minimize the dissipated heat:

min_{y∈Y} min_{v∈R^m} (Qv + P y)^T S (Qv + P y),
where the conductance matrix is uncertain:

S = S(u) = S^0 + Σ_{i=1}^k u_i S^i,   ||u|| ≤ 1.
Since (Qv + P y)^T S(u)(Qv + P y) is convex in (y, v) and linear in u, this problem is equivalent to

min_{y∈Y, v∈R^m}  max_{u∈U}  (Qv + P y)^T S(u)(Qv + P y).
6.6.3 Portfolio optimization

Minimize    x^T V x − μ^T x
Subject to  e^T x = 1,

with the uncertainty set

(V, μ) ∈ { (V^0, μ^0) + Σ_{i=1}^k u_i (V^i, μ^i) : ||u|| ≤ 1 }.

The robust counterpart of the portfolio problem is

Minimize    max_{V∈V} [x^T V x] − min_{μ∈S} [μ^T x]
Subject to  e^T x = 1.
6.7 Exercises
6.1 Prove the S-lemma.

6.2 In the linear least squares problem, assume that

(A, c) ∈ { (A_0, c_0) + Σ_{i=1}^k u_i (A_i, c_i) : ||u|| ≤ 1 }.

Construct the robust counterparts of the problem for three norms on u: ||u||_2 ≤ 1, ||u||_∞ ≤ 1, and ||u||_1 ≤ 1.
6.3 In the circuit design problem, assume that

S = S(u) = S^0 + Σ_{i=1}^k u_i S^i,   ||u|| ≤ 1,

and S^i ⪰ 0 for all i = 0, ..., k. Construct the robust counterparts of the problem for three norms on u: ||u||_2 ≤ 1, ||u||_∞ ≤ 1, and ||u||_1 ≤ 1.
6.4 In the portfolio optimization problem, assume that

(V, μ) ∈ { (V^0, μ^0) + Σ_{i=1}^k u_i (V^i, μ^i) : ||u|| ≤ 1 },

and V^i ⪰ 0 for all i = 0, ..., k. Construct the robust counterparts of the problem for three norms on u: ||u||_2 ≤ 1, ||u||_∞ ≤ 1, and ||u||_1 ≤ 1.
Chapter 7
Quantum Computation uses Quantum Mechanics to enhance computation power. The first concrete result was developed for Integer Factorization by Shor in 1994, and then for quantum search and space reduction by Grover in 1996. The field is closely linked to Quantum Information Theory and Statistics, Quantum Channels and Optimization. To describe quantum mechanics, Hilbert spaces and linear operators are generally used. In this lecture, we restrict ourselves to finite-dimensional real vector spaces. In classical quantum physics, a quantum state, or a state, is a column vector v in the N-dimensional vector space R^N with norm ||v||² = v^T v = 1. Nowadays, another formalism is widely accepted, in which a state is a positive semidefinite symmetric matrix V with trace 1 that belongs to R^{N×N}. A matrix with these properties is called a density matrix. For example, a density matrix V of rank one can be represented as V = v v^T, with Tr(V) = 1. A density matrix of rank higher than 1 can be written as
V = Σ_{i=1}^N λ_i v_i v_i^T,   λ_i ≥ 0,   Σ_{i=1}^N λ_i = 1.

A density matrix of rank 1 is called a pure state, and a density matrix of rank higher than 1 is called a mixed state. The state V is interpreted as a probabilistic mixture of pure states v_i v_i^T with mixing probabilities λ_i.
7.2
The state of a system may change as time goes on. Given {A_1, A_2, ..., A_k}, where A_i ∈ R^{M×N}, such that

Σ_{i=1}^k A_i^T A_i = I,

the corresponding completely positive (CP) mapping is defined by

T(V) = Σ_{i=1}^k A_i V A_i^T.
Note that T(V) ∈ R^{M×M}, so a CP map is a linear mapping that conveys quantum states in R^{N×N} to states in R^{M×M}. In quantum computing, a pure state is in R^{2^n}, i.e., N = 2^n, called an n-qubit state. When n = 1, the unit orthogonal vectors (1; 0) and (0; 1) represent the classical bits.

A measurement is described as a finite set of symmetric positive semidefinite matrices {M_1, M_2, ..., M_k} whose sum is the identity matrix, i.e.,

M_i ⪰ 0, i = 1, ..., k,   and   Σ_{i=1}^k M_i = I,

called a Positive Operator-Valued Measure (POVM). Given the system in state V and a measurement {M_1, M_2, ..., M_k} performed on the system, we obtain a random variable X with

Pr(X = l) = V • M_l,   l = 1, ..., k.
A fundamental problem in quantum detection theory is to find an optimal measurement for given mixed states. Suppose we are given m states V_1, V_2, ..., V_m with corresponding prior probabilities ρ_1, ρ_2, ..., ρ_m. For a measurement M = {M_1, ..., M_k}, we have P(j|i) = Tr(V_i M_j). Let c_ij (≥ 0) be the cost (penalty) of taking a true state i to be j. Then the average cost is

C(M) = Σ_{i,j} ρ_i P(j|i) c_ij = Σ_j Tr( (Σ_i ρ_i c_ij V_i) M_j ).

Let W_j = Σ_i ρ_i c_ij V_i, j = 1, ..., k. The problem of minimizing the average cost can be formulated as the following SDP:

Minimize    Σ_{j=1}^k Tr(W_j M_j)
Subject to  Σ_{j=1}^k M_j = I,
            M_j ⪰ 0, j = 1, ..., k.
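The detection SDP above is straightforward to prototype; the sketch below assumes CVXPY and that the matrices W_j have been precomputed (illustrative names).

```python
import cvxpy as cp
import numpy as np

def optimal_measurement(W):
    """Sketch of the detection SDP: W is a list of the k matrices W_j."""
    k, N = len(W), W[0].shape[0]
    M = [cp.Variable((N, N), PSD=True) for _ in range(k)]
    constraints = [sum(M) == np.eye(N)]
    objective = cp.Minimize(sum(cp.trace(W[j] @ M[j]) for j in range(k)))
    cp.Problem(objective, constraints).solve()
    return [Mj.value for Mj in M]
```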
The dual of this SDP is

Maximize    Tr(U)
Subject to  W_j ⪰ U,   j = 1, ..., k,

and an optimal measurement satisfies

Σ_{j=1}^k M_j W_j = Σ_{j=1}^k W_j M_j ⪯ W_i,   i = 1, ..., k.
7.3
A classical discrete memoryless channel consists of a set of input symbols A = {a_1, a_2, ..., a_m}, a set of output symbols B = {b_1, b_2, ..., b_n}, and a set of conditional probabilities V_ij = P(b_j | a_i) with

Σ_{j=1}^n V_ij = 1,   for i = 1, ..., m.
Therefore, V = {V_ij} is a stochastic matrix, where V_ij is the probability that b_j is received when a_i is sent through the channel. The capacity C(V) of a channel V, defined as the maximum of rates (speeds) at which information can be transmitted reliably, is given by the following theorem due to Shannon. The classical channel coding theorem is modeled as:

C(V) =  Maximize    I(p, V)
        Subject to  Σ_{i=1}^m p_i = 1, with p = {p_i},
                    p_i ≥ 0, i = 1, ..., m,

where

I(p, V) = Σ_i p_i Σ_j V_ij log( V_ij / Σ_k p_k V_kj ).

The quantity

D(q || r) = Σ_j q_j log( q_j / r_j )

is a measure of distance between two probability distributions q = (q_j) and r = (r_j), and plays an important role in information theory (the Kullback-Leibler divergence). Then the function I(p, V) can be written in terms of the above Kullback-Leibler divergence:

I(p, V) = Σ_i p_i D(V_i || pV),

where r = pV is the distribution defined by r_j = Σ_i p_i V_ij. With the channel V fixed, I(p, V) is concave in p = (p_i). Hence, the computation of C(V) is a concave maximization problem with linear constraints.
A quantum memoryless channel consists of a set S_1 of input states (or density matrices) in R^{m×m}, a set S_2 of output states (or density matrices) in R^{n×n}, and a CP map Γ: S_1 → S_2. The capacity C(Γ) of a quantum channel Γ is defined to be the maximum of rates (speeds) at which reliable transmission of classical information through the channel is possible. The quantum channel coding theorem due to A. S. Holevo is modeled by

C(Γ) = Maximize    I({λ_i}, {X_i}, Γ)
       Subject to  X_i ∈ S_1,   i = 1, ..., d,
                   Σ_{i=1}^d λ_i = 1,   λ_i ≥ 0,

with X̄ = Σ_i λ_i X_i. Observe that I({λ_i}, {X_i}, Γ) can be written in terms of the quantum counterpart of the Kullback-Leibler divergence (often called relative entropy)

D(X || Y) = Tr( X (log X − log Y) ).

Once the ensemble X = {X_i} is fixed, I({λ_i}, {X_i}, Γ) is a concave function in λ = {λ_i}. This is again a convex programming problem. But, in general, I({λ_i}, {X_i}, Γ) is neither convex nor concave in (λ, X).
7.4
Interaction Between Prover and Verifier. Consider two CP mappings: given {A_1, A_2, ..., A_k} and {B_1, B_2, ..., B_l}, where A_i, B_i ∈ R^{M×N}, such that

Σ_{i=1}^k A_i^T A_i = I   and   Σ_{i=1}^l B_i^T B_i = I,

define

T_1(V) = Σ_{i=1}^k A_i V A_i^T   and   T_2(V) = Σ_{i=1}^l B_i V B_i^T.

The question is: are there two states V_1 and V_2 such that T_1(V_1) = T_2(V_2); or does T_1(V_1) ≠ T_2(V_2) hold for all V_1 and V_2? An SDP formulation for this problem is

Minimize    t
Subject to  I • V_1 = I • V_2 = 1,   V_1, V_2 ⪰ 0,
            T_1(V_1) − T_2(V_2) + t I ⪰ 0,
            t I − T_1(V_1) + T_2(V_2) ⪰ 0.
7.5 Notes

7.6 Exercises
Bibliography
[1] M. Abdi, H. El Nahas, A. Jard, and E. Moulines. Semidenite positive relaxation of the maximum-likelihood criterion applied to multiuser detection in a CDMA context. IEEE Signal Processing Letters, 9(6):165 167, June 2002. [2] A. A. Ageev and M. I. Sviridenko, An Approximation Algorithm for Hypergraph Max k -Cut with Given Sizes of Parts, Proceedings of ESA2000, Lecture Notes in Computer Science 1879, (2000) 32-41. [3] A. Y. Alfakih. Graph Rigidity via Euclidean Distance Matrices. Linear Algebra and Its Applications 310:149165, 2000. [4] A. Y. Alfakih. On Rigidity and Realizability of Weighted Graphs. Linear Algebra and Its Applications 325:5770, 2001. [5] A. Y. Alfakih, A. Khandani, H. Wolkowicz. Solving Euclidean Distance Matrix Completion Problems Via Semidenite Programming. Comput. Opt. and Appl. 12:1330, 1999. [6] A. Y. Alfakih, H. Wolkowicz. On the Embeddability of Weighted Graphs in Euclidean Spaces. Research Report CORR 9812, University of Waterloo, Dept. of Combinatorics and Optimization, 1998. [7] A. Y. Alfakih, H. Wolkowicz. Euclidean Distance Matrices and the Molecular Conformation Problem. Research Report CORR 200217, University of Waterloo, Dept. of Combinatorics and Optimization, 2002. [8] F. Alizadeh. Combinatorial optimization with interior point methods and semidenite matrices. Ph.D. Thesis, University of Minnesota, Minneapolis, MN, 1991. [9] F. Alizadeh. Optimization over the positive semi-denite cone: Interior point methods and combinatorial applications. In P. M. Pardalos, editor, Advances in Optimization and Parallel Computing, pages 125. North Holland, Amsterdam, The Netherlands, 1992. [10] F. Alizadeh. Interior point methods in semidenite programming with applications to combinatorial optimization. SIAM J. Optimization, 5:13 51, 1995. 151
[11] F. Alizadeh, J. P. A. Haeberly, and M. L. Overton, Primal-dual interior point methods for semidenite programming, Technical Report 659, Computer Science Department, Courant Institute of Mathematical Sciences, New York University, 1994. [12] F. Alizadeh, J.-P. A. Haeberly, and M. L. Overton. Complementarity and nondegeneracy in semidenite programming. Math. Programming, 77:111128, 1997. [13] B. Alkire and L. Vandenberghe. Convex optimization problems involving nite autocorrelation sequences. Mathematical Programming. To appear. [14] E. D. Andersen and K. D. Andersen. The APOS linear programming solver: an implementation of the homogeneous algorithm. Working Paper, Department of Management, Odense University, Denmark, 1997. [15] E. D. Andersen and Y. Ye. On a homogeneous algorithm for the monotone complementarity problem. Working Paper, Department of Management Sciences, The University of Iowa, IA, 1995. To appear in Math. Programming. [16] G. Andersson and L. Engebrestsen, Better Approximation Algorithms for Set Splitting and Not-All-Equal SAT, Information Processing Letters 65, (1998) 305-311. [17] K. D. Andersen and E. Christiansen. A newton barrier method for minimizing a sum of Euclidean norms subject to linear equality constraints. Preprint 95-07, Department of Mathematics and Computer Science, Odense University, Denmark, 1995. [18] K. M. Anstreicher. A standard form variant and safeguarded linesearch for the modied Karmarkar algorithm. Math. Programming, 47:337351, 1990. [19] K. M. Anstreicher and M. Fampa, A long-step path following algorithm for semidenite programming problems, Working Paper, Department of Management Science, The University of Iowa, Iowa City, 1996. [20] K. M. Anstreicher, D. den Hertog, C. Roos, and T. Terlaky. A long step barrier method for convex quadratic programming. Algorithmica, 10:365382, 1993. [21] K. M. Anstreicher. On long step path following and SUMT for linear and quadratic programming. SIAM J. Optimization, 6:33-46, 1996. [22] S. Arora, S. Rao and U. Vazirani. Expander ows, geometric embeddings and graph partitioning. The 35th STOC04 2004.
[23] Y. Asahiro, K. Iwama, H. Tamaki, and T. Tokuyama, Greedily nding a dense subgraph, in Proceedings of the 5th Scandinavian Workshop on Algorithm Theory (SWAT), Lecture Notes in Computers Science, 1097, 136-148, Springer-Verlag, 1996. [24] James Aspnes, David Goldenberg, Yang Richard Yang. On the Computational Complexity of Sensor Network Localization. ALGOSENSORS 2004, in LNCS 3121:3244, 2004. [25] M. Badoiu, S. Har-Peled and P. Indyk, Approximate Clustering via Core-sets, In Proc. ACM Symp. Theory of Computing, 2002. [26] M. B adoiu. Approximation Algorithm for Embedding Metrics into a TwoDimensional Space. Proc. 14th SODA 434-443, 2003. [27] M. B adoiu, E. D. Demaine, M. T. Hajiaghayi, P. Indyk. Low Dimensional Embedding with Extra Information. Proc. 20th SoCG 320 329, 2004. [28] O. Bahn, O. Du Merle, J. L. Gon, and J. P. Vial. A cutting plane method from analytic centers for stochastic programming. Math. Programming, 69:4574, 1995. [29] Balas, E., Ceria, S. and Cornuejols, G., A lift-and-project cutting plane algorithm for mixed 0-1 programs. Mathematical Programming, 58:295 324, 1993. [30] R. Banavar and A. Kalele. A mixed norm performance measure for the design of multirate lterbanks. IEEE Transactions on Signal Processing, 49(2):354359, Feb. 2001. [31] I. B ar any and F uredi. Computing the volume is dicult. In Proceedings of the 18th Annual ACM Symposium on Theory of Computing, page 442447, ACM, New York, 1986. [32] G. Barequet and S. Har-Peled, Eciently Approximating the MinimumVolume Bounding Box of a Point Set in Three Dimensions, J. Algorithms, 38:91-109, 2001. [33] A. Barvinok. Problems of Distance Geometry and Convex Properties of Quadratic Maps. Disc. Comp. Geom. 13:189202, 1995. [34] A. Barvinok. A Remark on the Rank of Positive Semidenite Matrices Subject to Ane Constraints. Disc. Comp. Geom. 25(1):2331, 2001. [35] A. Barvinok. A Course in Convexity. AMS, 2002. [36] D. A. Bayer and J. C. Lagarias. The nonlinear geometry of linear programming, Part I: Ane and projective scaling trajectories. Transactions of the American Mathematical Society, 314(2):499526, 1989.
[37] D. A. Bayer and J. C. Lagarias. The nonlinear geometry of linear programming, Part II: Legendre transform coordinates. Transactions of the American Mathematical Society, 314(2):527581, 1989. [38] D. A. Bayer and J. C. Lagarias. Karmarkars linear programming algorithm and Newtons method. Math. Programming, 50:291330, 1991. [39] M. Bellare and P. Rogaway. The complexity of approximating a nonlinear program. Math. Programming, 69:429-442, 1995. [40] S. J. Benson and Y. Ye. DSDP User Guide. unix.mcs.anl.gov/ benson, November 2000. http://www-
[41] S. J. Benson, Y. Ye, and X. Zhang. Mixed linear and semidenite programming for combinatorial optimization. Optimization Methods and Software, 10:515544, 1999. [42] S. J. Benson, Y. Ye, and X. Zhang. Solving large scale sparse semidenite programs for combinatorial optimization. SIAM Journal of Optimization, 10:443461, 2000. [43] A. Ben-Tal, M. Kocvara, A. Nemirovski, and J. Zowe. Free material optimization via semidenite programming: the multiload case with contact conditions. SIAM Review, 42(4):695715, 2000. [44] A. Ben-Tal and A. Nemirovski. Robust truss topology design via semidefinite programming. SIAM J. Optim., 7(4):9911016, 1997. [45] A. Ben-Tal and A. Nemirovski. Structural design via semidenite programming. In Handbook on Semidenite Programming, pages 443467. Kluwer, Boston, 2000. [46] A. BenTal and A. S. Nemirovskii. Interior point polynomial time method for truss topology design. SIAM J. Optimization, 4:596612, 1994. [47] D. P. Bertsekas and J. N. Tsitsiklis. Introduction to Linear Optimization. Athena Scientic, Belmont, MA, 1997. [48] D. P. Bertsekas. Nonlinear Programming. Athena Scientic, Belmont, MA, 1995. [49] D. Bertsimas and J. Nino-Mora. Optimization of multiclass queuing networks with changeover times via the achievable region approach: part ii, the multi-station case. Mathematics of Operations Research, 24(2), May 1999. [50] D. Bertsimas and Y. Ye. Semidenite relaxations, multivariate normal distributions, and order statistics. Handbook of Combinatorial Optimization (Vol. 3), D.-Z. Du and P.M. Pardalos (Eds.) pp. 1-19, (1998 Kluwer Academic Publishers).
[51] P. Biswas, Y. Ye. Semidenite Programming for Ad Hoc Wireless Sensor Network Localization. Proc. 3rd IPSN 4654, 2004. [52] R. G. Bland, D. Goldfarb and M. J. Todd. The ellipsoidal method: a survey. Operations Research, 29:10391091, 1981. [53] L. Blum, M. Shub and S. Smale. On a theory of computations over the real numbers: NP-completeness, recursive functions and universal machines. Proc. 29th Symp. Foundations of Computer Science, 387397, 1988. [54] H.L. Bodlaender, P. Gritzmann, V. Klee and J. Van Leeuwen, The Computational Complexity of Norm Maximization, Combinatorica, 10: 203-225, 1990. [55] K. H. Borgwardt. The Simplex Method: A Probabilistic Analysis. Springer, Berlin, 1987. [56] S. Boyd, L. E. Ghaoui, E. Feron, and V. Balakrishnan. Linear Matrix Inequalities in System and Control Science. SIAM Publications. SIAM, Philadelphia, 1994. [57] B. Borchers. CSDP, a C library for semidenite programming. Optimization Software and Methods, 11:613623, 1999. [58] B. Borchers. SDPLIB 1.0 : A collection of semidenite programming test problems. Technical report, Faculty of Mathematics, Institute of Mining and Technology, New Mexico Tech, Socorro, NM, USA, July 1998. [59] B. Borchers. Presentation at the Seventh DIMACS Implementation Challenge Piscataway, NJ, November 2, 2000. [60] A. Brieden, P. Gritzmann, R. Kannan, V. Klee, L. Lovasz and M. Simonovits, Deterministic and Randomized Polynomial-time Approximation of Radii, To appear in Mathematika. [61] A. Brieden, P. Gritzmann, R. Kannan, V. Klee, L. Lovasz and M. Simonovits, Approximation of Diameters: Randomization Doesnt Help, In Proc. IEEE Symp. Foundations of Comp. Sci., 244-251, 1998. [62] A. Brieden, Geometric Optimization Problems Likely Not Contained in APX, Discrete Comput. Geom., 28:201-209, 2002. [63] P. Biswas and Y. Ye. Semidenite Programming for Ad Hoc Wireless Sensor Network Localization. Management Science and Engineering, Stanford, CA 94305, September 2003. [64] P. Biswas and Y. Ye. A distributed method for solving semideinite programs arising from Ad Hoc Wireless Sensor Network Localization. Management Science and Engineering, Stanford, CA 94305, October 2003.
[65] N. Bulusu, J. Heidemann, D. Estrin. GPS-less low cost outdoor localization for very small devices. TR 00-729, Computer Science, University of Southern California, April, 2000. [66] S. Burer and R. D. C. Monteiro. An ecient algorithm for solving the MAXCUT SDP relaxation. Manuscript, School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA USA, December 1998. [67] S. Burer, R. D. C. Monteiro, and Y. Zhang. Solving semidenite programs via nonlinear programming. part I: Transformations and derivatives. Technical Report TR99-17. Department of Computational and Applied Mathematics, Rice University, TX, September 1999. [68] S. Burer, R. D. C. Monteiro, and Y. Zhang. Solving semidenite programs via nonlinear programming. part II: Interior point methods for a subclass of SDPs. Technical Report TR99-23. Department of Computational and Applied Mathematics, Rice University, TX, October 1999. [69] G. Calaore and M. Indri. Robust calibration and control of robotic manipulators. In American Control Conference, pages 20032007, 2000. [70] S. A. Cook. The complexity of theorem-proving procedures. Proc. 3rd ACM Symposium on the Theory of Computing, 151158, 1971. [71] R. W. Cottle and G. B. Dantzig. Complementary pivot theory in mathematical programming. Linear Algebra and its Applications, 1:103-125, 1968. [72] R. Cottle, J. S. Pang, and R. E. Stone. The Linear Complementarity Problem, chapter 5.9 : Interiorpoint methods, pages 461475. Academic Press, Boston, 1992. [73] G. B. Dantzig. Linear Programming and Extensions. Princeton University Press, Princeton, New Jersey, 1963. [74] L. Danzer, B. Gr unbaum and V. Klee. Hellys theorem and its relatives. in V. Klee, editor, Convexity, page 101180, AMS, Providence, RI, 1963. [75] T. Davidson, Z. Luo, and K. Wong. Design of orthogonal pulse shapes for communications via semidenite programming. IEEE Transactions on Communications, 48(5):14331445, May 2000. [76] J. E. Dennis and R. E. Schnabel. Numerical Methods for Unconstrained Optimization and Nonlinear Equations. PrenticeHall, Englewood Clis, New Jersey, 1983. [77] C. de Soua, R. Palhares, and P. Peres. Robust H lter design for uncertain linear systems with multiple time-varying state delays. IEEE Transactions on Signal Processing, 49(3):569575, March 2001.
[78] I. I. Dikin. On the convergence of an iterative process. Upravlyaemye Sistemi, 12:5460, 1974. (In Russian). [79] L. Doherty, L. E. Ghaoui, S. J. Pister. Convex Position Estimation in Wireless Sensor Networks. Proc. 20th INFOCOM Volume 3:16551663, 2001. [80] A. Doherty, P. Parrilo, and F. Spedalieri. Distinguishing separable and entangled states. Physical Review Letters, 88(18), 2002. [81] Y. Doids, V. Guruswami, and S. Khanna. The 2-catalog segmentation problem. In Proceedings of SODA, pages 378380, 1999. [82] C. Du, L. Xie, and Y. Soh. H ltering of 2-D discrete systems. IEEE Transactions on Signal Processing, 48(6):17601768, June 2000. [83] H. Du, L. Xie, and Y. Soh. H reduced-order approximation of 2-D digital lters. IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, 48(6):688698, June 2001. [84] G. E. Dullerud and F. Paganini. A Course in Robust Control Theory, volume 36 of Texts in Applied Mathematics. Springer-Verlag, 2000. [85] B. Dumitrescu, I. Tabus, and P Stoica. On the parameterization of positive real sequences and MA parameter estimation. IEEE Transactions on Signal Processing, 49(11):26302639, November 2001. [86] M. E. Dyer and A. M. Frieze. The complexity of computing the volume of a polyhedron. SIAM J. Comput., 17:967974, 1988. [87] M. E. Dyer, A. M. Frieze and R. Kannan. A random polynomial time algorithm for estimating volumes of convex bodies. J. Assoc. Comp. Mach., 38:117, 1991. [88] J. Edmonds. Systems of distinct representatives and linear algebra. J. Res. Nat. Bur. Standards, 71B, 1967, pages 241245. [89] T. Eren, D. K. Goldenberg, W. Whiteley, Y. R. Yang, A. S. Moore, B. D. O. Anderson, P. N. Belhumeur. Rigidity, Computation, and Randomization in Network Localization. Proc. 23rd INFOCOM, 2004. [90] G. Elekes. A geometric inequality and the complexity of computing volume. Discr. Comput. Geom., 1:289292, 1986. [91] J. Elzinga and T. G. Moore. A central cutting plane algorithm for the convex programming problem. Math. Programming, 8:134145, 1975. [92] U. Faigle, W. Kern, and M. Streng, Note on the Computaional Complexity of j -Radii of Polytopes in Rn , Mathematical Programming, 73:15, 1996.
[93] J. Falkner, F. Rendl, and H. Wolkowicz, A computational study of graph partitioning, Mathematical Programming, 66(2):211-240, 1994. [94] S. C. Fang and S. Puthenpura. Linear Optimization and Extensions. PrenticeHall, Englewood Clis, NJ, 1994. [95] L. Faybusovich. Jordan algebras, Symmetric cones and Interior-point methods. Research Report, University of Notre Dame, Notre Dame, 1995. [96] L. Faybusovich. Semidenite programming: a path-following algorithm for a linear-quadratic functional. SIAM J. Optimization, 6:10071024, 1996. [97] U. Feige and M. X. Goemans. Approximating the value of two prover proof systems, with applications to max 2sat and max dicut. In Proceedings of the 3nd Israel Symposium on Theory and Computing Systems, pages 182189, 1995. [98] U. Feige and M. X. Goemans, Approximating the value of two prover proof systems, with applications to MAX 2SAT and MAX DICUT, in Proceedings of the Third Israel Symposium on Theory of Computing and Systems, (1995) 182-189. [99] U. Feige and M. Langberg, Approximation algorithms for maximization problems arising in graph partitioning, to appear in J. Algorithms. [100] U. Feige and M. Seltser, On the densest k -subgraph problem, Technical report, Department of Applied Mathematics and Computer Science, The Weizmann Institute, Rehovot, September 1997. [101] U. Feige, G. Kortsarz and D. Peleg, The dense k -subgraph problem, to appear in Algorithmica. [102] U. Feige and M. Langberg. Approximation algorithms for maximization problems arising in graph partitioning. Journal of Algorithms, 41:174 211, 2001. [103] U. Feige and M. Langberg. The rpr2 rounding technique for semidente programs. In ICALP, Lecture Notes in Computer Science. Springer, Berlin, 2001. [104] A. V. Fiacco and G. P. McCormick. Nonlinear Programming : Sequential Unconstrained Minimization Techniques. John Wiley & Sons, New York, 1968. Reprint : Volume 4 of SIAM Classics in Applied Mathematics, SIAM Publications, Philadelphia, PA, 1990. [105] R. Freund and F. Jarre. An interiorpoint method for fractional programs with convex constraints. Math. Programming, 67:407440, 1994.
[106] R. M. Freund. Dual gauge programs, with applications to quadratic programming and the minimum-norm problem. Math. Programming, 38:4767, 1987. [107] R. M. Freund. Polynomialtime algorithms for linear programming based only on primal scaling and projected gradients of a potential function. Math. Programming, 51:203222, 1991. [108] E. Fridman and U. Shaked. A new H lter design for linear time delay systems. IEEE Transactions on Signal Processing, 49(11):28392843, July 2001. [109] K. R. Frisch. The logarithmic potential method for convex programming. Unpublished manuscript, Institute of Economics, University of Oslo, Oslo, Norway, 1955. [110] A. Frieze and M. Jerrum. Improved approximation algorithms for max k -cut and max bisection. Algorithmica, 18:6781, 1997. [111] M. Fu, C. de Souza, and Z. Luo. Finite-horizon robust Kalman lter design. IEEE Transactions on Signal Processing, 49(9):21032112, Sept. 2001. [112] M. Fu, Z.-Q. Luo and Y. Ye. Approximation algorithms for quadratic programming. manuscript, Department of Electrical and Computer Engineering, McMaster University, Hamilton, Ontario, CANADA L8S 4K1, 1996. [113] T. Fujie and M. Kojima, Semidenite programming relaxation for nonconvex quadratic programs, Journal of Global Optimization 10:367-380, (1997). [114] K. Fujisawa, M. Fukuda, M. Kojima, and K. Nakata. Numerical Evaluation of SDPA (SemiDenite Programming Algorithm). In High Performance Optimization. Kluwer Academic Press, 1999, pp. 267301. [115] K. Fujisawa, M. Kojima and K. Nakata, Exploiting Sparsity in PrimalDual Interior Point Methods for Semidenite Programming Research Report B-324, Department of Mathematical and Computing Sciences, Tokyo Institute of Technology, Oh-Okayama, Meguro-ku, Tokyo 152, September 1997. [116] D. Ganesan, B. Krishnamachari, A. Woo, D. Culler, D. Estrin, and S. Wicker. An empirical study of epidemic algorithms in large scale multihop wireless networks. UCLA/CSD-TR-02-0013, Computer Science, UCLA, 2002. [117] D. Gale. The Theory of Linear Economic Models. McGraw-Hill, New York, 1960.
[118] M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman, San Francisco, 1979. [119] D. M. Gay. A variant of Karmarkars linear programming algorithm for problems in standard form. Math. Programming, 37:8190, 1987. Errata in Math. Programming, 40:111, 1988. [120] J. Geromel. Optimal linear ltering under parameter uncertainty. IEEE Transactions on Signal Processing, 47(1):168175, Jan. 1999. [121] J. Geromel and M. De Oliveira. H2 /H robust ltering for convex bounded uncertain systems. IEEE Transactions on Automatic Control, 46(1):100107, Jan. 2001. [122] L. E. Gibbons, D. W. Hearn and P. M. Pardalos. A continuous based heuristic for the maximum clique problem. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, 26:103-124, 1996. [123] P. E. Gill, W. M. Murray, and M. H. Wright. Practical Optimization. Academic Press, London, 1981. [124] P. E. Gill, W. Murray, M. A. Saunders, J. A. Tomlin, and M. H. Wright. On projected Newton barrier methods for linear programming and an equivalence to Karmarkars projective method. Math. Programming, 36:183209, 1986. [125] M. X. Goemans and D. P. Williamson. Improved approximation algorithms for maximum cut and satisability problems using semidenite programming. Journal of Assoc. Comput. Mach., 42:11151145, 1995. [126] M. X. Goemans and D. P. Williamson, Approximation algorithms for MAX-3-CUT and other problems via complex semidenite programming, in 33rd STOC, pp. 443-452, 2001. [127] J. L. Gon, Z. Q. Luo, and Y. Ye. Complexity analysis of an interiorpoint cutting plane method for convex feasibility problem. SIAM J. Optimization, 6(3):638652, 1996. [128] J. L. Gon and J. P. Vial. Cutting planes and column generation techniques with the projective algorithm. J. Optim. Theory Appl., 65:409 429, 1990. [129] D. Goldfarb and G. Iyengar. Robust portfolio selection problems. Mathematics of Operations Research. Submitted. [130] D. Goldfarb, K. Scheinberg. Interior Point Trajectories in Semidenite Programming. SIAM J. Opt. 8(4):871886, 1998.
[131] D. Goldfarb and M. J. Todd. Linear Programming. In G. L. Nemhauser, A. H. G. Rinnooy Kan, and M. J. Todd, editors, Optimization, volume 1 of Handbooks in Operations Research and Management Science, pages 141170. North Holland, Amsterdam, The Netherlands, 1989. [132] A. J. Goldman and A. W. Tucker. Polyhedral convex cones. In H. W. Kuhn and A. W. Tucker, Editors, Linear Inequalities and Related Systems, page 1940, Princeton University Press, Princeton, NJ, 1956. [133] G. H. Golub and C. F. Van Loan. Matrix Computations, 2nd Edition. Johns Hopkins University Press, Baltimore, MD, 1989. [134] C. C. Gonzaga. Polynomial ane algorithms for linear programming. Math. Programming, 49:721, 1990. [135] C. C. Gonzaga. An algorithm for solving linear programming problems in O(n3 L) operations. In N. Megiddo, editor, Progress in Mathematical Programming : Interior Point and Related Methods, Springer Verlag, New York, 1989, pages 128. [136] C. C. Gonzaga. Path following methods for linear programming. SIAM Review, 34(2):167227, 1992. [137] J. Graver, B. Servatius, H. Servatius. Combinatorial Rigidity. AMS, 1993. [138] P. Gritzmann and V. Klee, Inner and Outer j -Radii of Convex Bodies in Finite-Dimensional Normed Spaces, Discrete Comput. Geom., 7:255280, 1992. [139] P. Gritzmann and V. Klee, Computational Complexity of Inner and Outer j -Radii of Polytopes in Finite-Dimenional Normed Spaces, Math. Program., 59:162-213, 1993. [140] P. Gritzmann and V. Klee, On the Complexity of Some basic Problems in Computational Convexity: I. Containment Problems, Discrete Math., 136:129-174, 1994. [141] P. Gritzmann and V. Klee. Mathematical programming and convex geometry. In P. Gruber and J. Wills, editors, Hand Book of Convex Geometry, North Holland, Amsterdam, 1993, page 627674. [142] M. Gr otschel, L. Lov asz and A. Schrijver. Geometric Algorithms and Combinatorial Optimization. Springer, Berlin, 1988. [143] B. Gr unbaum. Convex Polytopes. John Wiley & Sons, New York, NY, 1967. [144] O. G uler. Barrier functions in interior-point methods. Mathematics of Operations Research, 21(4):860885, 1996.
[145] O. G uler. Existence of interior points and interior paths in nonlinear monotone complementarity problems. Mathematics of Operations Research, 18(1):128147, 1993. [146] O. G uler and L. Tuncel. Characterizations of the barrier parameter of homogeneous convex cones. Research Report CORR 95-14, Department of Combinatorics and Optimization, University of Waterloo, Waterloo, Ontario, Canada, 1995. To appear in Math. Programming. [147] O. G uler and Y. Ye. Convergence behavior of interior point algorithms. Math. Programming, 60:215228, 1993. [148] D. Guo, L. Rasmussen, S. Sun, and T. Lim. A matrix-algebraic approach to linear parallel interference cancellation in CDMA. IEEE Transactions on Communications, 48(1):152161, Jan. 2000. [149] E. Halperin and U. Zwick, Approximation algorithms for MAX 4-SAT and rounding procedures for semidenite programs, Proceedings of 7th IPCO (1999), 202-217. [150] E. Halperin and U. Zwick. A unied framework for obtaining improved approximation algorithms for maximum graph bisection problems. In IPCO, Lecture Notes in Computer Science. Springer, Berlin, 2001. [151] Q. Han, Y. Ye, and J. Zhang. An improved rounding method and semidefinite programming relaxation for graph partition. Math. Programming, 92:509535, 2002. [152] S. Har-Peled and K.R. Varadarajan, Approximate Shape Fitting via Linearization, In Proc. 42nd Annu. IEEE Sympos. Found. Comput. Sci., 66-73, 2001. [153] S. Har-Peled and K. Varadarajan, Projective Clustering in High Dimensions Using Core-sets, In Proc. ACM Symp. Comput. Geom., 2002. [154] S. Har-Peled and K. Varadarajan, High-Dimensional Shap Fitting in Linear Time, In Proc. ACM Symp. Comput. Geom., 2003. [155] J. Hartmanis and R. E. Stearns, On the computational complexity of algorithms. Trans. A.M.S, 117:285306, 1965. [156] C. Helmberg and F. Rendl. A spectral bundle method for semidenite programming. SIAM Journal of Optimization, 10:673696, 2000. [157] C. Helmberg, F. Rendl, R. J. Vanderbei and H. Wolkowicz. An interior point method for semidenite programming. SIAM J. Optimization, 6:342361, 1996. [158] B. Hendrickson. Conditions for Unique Graph Realizations. SIAM J. Comput. 21(1):6584, 1992.
[159] B. Hendrickson. The Molecule Problem: Exploiting Structure in Global Optimization. SIAM J. Opt. 5(4):835857, 1995. [160] D. den Hertog. Interior Point Approach to Linear, Quadratic and Convex Programming, Algorithms and Complexity. Ph.D. Thesis, Faculty of Mathematics and Informatics, TU Delft, NL2628 BL Delft, The Netherlands, 1992. [161] J. Hightower and G. Boriello. Location systems for ubiquitous computing. IEEE Computer, 34(8) (2001) 57-66. [162] R. Horst, P. M. Pardalos, and N. V. Thoai. Introduction to Global Optimization. Kluwer Academic Publishers, Boston, 1995. [163] A. Howard, M. J. Mataric, and G. S. Sukhatme. Relaxation on a mesh: a formalism for generalized localization. In Proc. IEEE/RSJ Intl Conf. on Intelligent Robots and Systems (IROS01) (2001) 1055-1060. [164] L. Huaizhong and M. Fu. A linear matrix inequality approach to robust H ltering. IEEE Transactions on Signal Processing, 45(9):23382350, Sept. 1997. [165] P. Huard. Resolution of mathematical programming with nonlinear constraints by the method of centers. In J. Abadie, editor, Nonlinear Programming, pages 207219. North Holland, Amsterdam, The Netherlands, 1967. [166] P. Huard and B. T. Li eu. La m ethode des centres dans un espace topologique. Numeriche Methematik, 8:56-67, 1966. [167] B. Jackson, T. Jord an. Connected Rigidity Matroids and Unique Realizations of Graphs. Preprint, 2003. [168] R. Jagannathan and S. Schaible. Duality in generalized fractional programming via Farkas lemma. Journal of Optimization Theory and Applications, 41:417424, 1983. [169] B. Jansen. Interior Point Techniques in Optimization. Ph.D. Thesis, Faculty of Mathematics and Informatics, TU Delft, NL2628 BL Delft, The Netherlands, 1995. [170] F. Jarre. An interiorpoint method for minimizing the largest eigenvalue of a linear combination of matrices. SIAM J. Control and Optimization, 31:13601377, 1993. [171] F. Jarre. Interiorpoint methods for convex programming. Applied Mathematics & Optimization, 26:287311, 1992. [172] F. Jarre, M. Kocvara, and J. Zowe. Optimal truss design by interior point methods. SIAM J. Optim., 8(4):10841107, 1998.
[173] F. John. Extremum problems with inequalities as subsidiary conditions. In Sduties and Essays, presented to R. Courant on his 60th Birthday, January 8, 1948, Interscience, New York, 1948, page 187-204. [174] N. Johnson and S. Kotz, Distributions in Statistics: Continuous Multivariate Distributions, John Wiley & Sons, 1972. [175] W.B. Johnson and J. Lindenstrauss, Extensions of Lipshitz Mapping into Hilbert Space, Comtemporary Mathematics, 26:189-206, 1984. [176] M. J Kaiser, T. L. Morin and T. B. Trafalis. Centers and invariant points of convex bodies. In P. Gritzmann and B. Sturmfels, editors, Applied Geometry and Discrete Mathematics: The Victor Klee Festschrift, Amer. Math. Soc. and Ass. Comput. Mach., Providence, RI, 1991. [177] D. Karger, R. Motwani and M. Sudan, Approximate graph coloring by semidenite programming, in 35th FOCS (1994), 2-13. [178] S. E. Karisch, F. Rendl, and J. Clausen, Solving graph bisection problems with semidenite programming, Technical Report DIKU-TR-97/9, Department of Computer Science, University of Copenhagen, July 1, 1997. [179] N. K. Karmarkar. A new polynomialtime algorithm for linear programming. Combinatorica, 4:373395, 1984. [180] R. M. Karp. Reducibility among combinatorial problems. R. E. Miller and J. W. Thatcher, eds., Complexity of Computer Computations, Plenum Press, New York, 1972, pages 85103. [181] L. G. Khachiyan. A polynomial algorithm for linear programming. Doklady Akad. Nauk USSR, 244:10931096, 1979. Translated in Soviet Math. Doklady, 20:191194, 1979. [182] L. G. Khachiyan and M. J. Todd. On the complexity of approximating the maximal inscribed ellipsoid for a polytope. Math. Programming, 61:137160, 1993. [183] J. Kleinberg, C. Papadimitriou, and P. Raghavan. Segmentation problems. In Proceedings of the 30th Symposium on Theory of Computation, pages 473482, 1998. [184] V. Klee and G. J. Minty. How good is the simplex method. In O. Shisha, editor, Inequalities III, Academic Press, New York, NY, 1972. [185] E. de Klerk, C. Roos and T. Terlaky. Initialization in semidenite programming via a selfdual skewsymmetric embedding. Report 9610, Faculty of Technical Mathematics and Computer Science, Delft University of Technology, Delft, 1996. To appear in Operations Research Letters.
[186] M. Kocvara, J. Zowe, and A. Nemirovski. Cascading: an approach to robust material optimization. Computers and Structures, 76:431-442, 2000.
[187] M. Kojima, N. Megiddo, T. Noma and A. Yoshise. A unified approach to interior point algorithms for linear complementarity problems. Lecture Notes in Computer Science 538, Springer Verlag, New York, 1991.
[188] M. Kojima, S. Mizuno, and A. Yoshise. A polynomial-time algorithm for a class of linear complementarity problems. Math. Programming, 44:1-26, 1989.
[189] M. Kojima, S. Mizuno, and A. Yoshise. An O(√n L) iteration potential reduction algorithm for linear complementarity problems. Math. Programming, 50:331-342, 1991.
[190] M. Kojima, S. Shindoh, and S. Hara. Interior-point methods for the monotone semidefinite linear complementarity problem in symmetric matrices. SIAM J. Optimization, 7:86-125, 1997.
[191] M. Kojima, N. Megiddo, and Y. Ye. An interior point potential reduction algorithm for the linear complementarity problem. Math. Programming, 54:267-279, 1992.
[192] K. O. Kortanek, F. Potra, and Y. Ye. On some efficient interior point methods for nonlinear convex programming. Linear Algebra and Its Applications, 152:169-189, 1991.
[193] G. Kortsarz and D. Peleg. On choosing a dense subgraph. In 34th FOCS, pages 692-701, 1993.
[194] E. Kranich. Interior-point methods bibliography. SIAG/OPT Views and News, A Forum for the SIAM Activity Group on Optimization, 1:11, 1992.
[195] H. W. Kuhn and A. W. Tucker. Nonlinear programming. In J. Neyman, editor, Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, pages 481-492. University of California Press, Berkeley and Los Angeles, CA, 1951.
[196] J. C. Lagarias. The nonlinear geometry of linear programming, Part III: Projective Legendre transform coordinates and Hilbert geometry. Transactions of the American Mathematical Society, 320:193-225, 1990.
[197] J. C. Lagarias. A collinear scaling interpretation of Karmarkar's linear programming algorithm. SIAM J. Optimization, 3:630-636, 1993.
[198] J. Lasserre. Global optimization with polynomials and the problem of moments. SIAM J. Optimization, 11:796-817, 2001.
[199] J. Lasserre. Semidefinite programming vs. LP relaxation for polynomial programming. Mathematics of Operations Research, 27(2):347-360, May 2002.
[200] M. Laurent. Matrix Completion Problems. The Encyclopedia of Optimization, 3:221-229, 2001.
[201] A. Levin. On an algorithm for the minimization of convex functions. Soviet Math. Doklady, 6:286-290, 1965.
[202] C. Lin and R. Saigal. On solving large scale semidefinite programming problems - a case study of quadratic assignment problem. Technical Report, Dept. of Ind. and Oper. Eng., University of Michigan, Ann Arbor, MI, 1997.
[203] N. Linial, E. London, Yu. Rabinovich. The Geometry of Graphs and Some of Its Algorithmic Applications. Combinatorica, 15(2):215-245, 1995.
[204] M. S. Lobo, L. Vandenberghe, and S. Boyd. Second-order cone programming. Manuscript, Information Systems Laboratory, Dept. of Electrical Engineering, Stanford University, Stanford, CA, 1997.
[205] L. Lovász. An Algorithmic Theory of Numbers, Graphs and Convexity, volume 50 of CBMS-NSF Regional Conference Series in Applied Mathematics. SIAM, Philadelphia, 1986.
[206] L. Lovász. On the Shannon Capacity of a graph. IEEE Transactions on Information Theory, IT-25(1):1-7, Jan. 1979.
[207] L. Lovász and A. Schrijver. Cones of matrices and set-functions, and 0-1 optimization. SIAM J. Optimization, 1:166-190, 1990.
[208] W. Lu. A unified approach for the design of 2-D digital filters via semidefinite programming. IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, 49(6):814-826, June 2002.
[209] D. G. Luenberger. Linear and Nonlinear Programming, 2nd Edition. Addison-Wesley, Menlo Park, CA, 1984.
[210] Z. Q. Luo, J. Sturm and S. Zhang. Superlinear convergence of a symmetric primal-dual path following algorithm for semidefinite programming. SIAM J. Optimization, 8(1):59-81, 1998.
[211] Z. Q. Luo, J. Sturm and S. Zhang. Duality and self-duality for conic convex programming. Report 9620/A, Econometric Institute, Erasmus University Rotterdam, 1996.
[212] I. J. Lustig, R. E. Marsten, and D. F. Shanno. Computational experience with a globally convergent primal-dual predictor-corrector algorithm for linear programming. Math. Programming, 66:123-135, 1994.
[213] I. J. Lustig, R. E. Marsten, and D. F. Shanno. Interior point methods: Computational state of the art. ORSA Journal on Computing, 6:1-14, 1994.
[214] S. Mahajan and H. Ramesh. Derandomizing semidefinite programming based approximation algorithms. In 36th FOCS, pages 162-169, 1995.
[215] M. Mahmoud and A. Boujarwah. Robust H∞ filtering for a class of linear parameter-varying systems. IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, 48(9):1131-1138, Sept. 2001.
[216] L. McLinden. The analogue of Moreau's proximation theorem, with applications to the nonlinear complementarity problem. Pacific Journal of Mathematics, 88:101-161, 1980.
[217] N. Megiddo. Pathways to the optimal set in linear programming. In N. Megiddo, editor, Progress in Mathematical Programming: Interior Point and Related Methods, pages 131-158. Springer Verlag, New York, 1989. Identical version in: Proceedings of the 6th Mathematical Programming Symposium of Japan, Nagoya, Japan, 1986, pages 1-35.
[218] S. Mehrotra. Quadratic convergence in a primal-dual method. Mathematics of Operations Research, 18:741-751, 1993.
[219] S. Mehrotra. On the implementation of a primal-dual interior point method. SIAM J. Optimization, 2(4):575-601, 1992.
[220] G. Millerioux and J. Daafouz. Global chaos synchronization and robust filtering in noisy context. IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, 48(10):1170-1176, Oct. 2001.
[221] Hans D. Mittelmann. Independent Benchmark Results. http://plato.la.asu.edu/, November 2000.
[222] S. Mizuno, M. J. Todd, and Y. Ye. On adaptive step primal-dual interior-point algorithms for linear programming. Mathematics of Operations Research, 18:964-981, 1993.
[223] S. Mizuno, M. J. Todd, and Y. Ye. A surface of analytic centers and primal-dual infeasible-interior-point algorithms for linear programming. Mathematics of Operations Research, 20:135-162, 1995.
[224] R. D. C. Monteiro and I. Adler. Interior path following primal-dual algorithms, Part I: Linear programming. Math. Programming, 44:27-41, 1989.
[225] R. D. C. Monteiro and I. Adler. Interior path following primal-dual algorithms, Part II: Convex quadratic programming. Math. Programming, 44:43-66, 1989.
[226] R. D. C. Monteiro and J.-S. Pang. On two interior-point mappings for nonlinear semidefinite complementarity problems. Working Paper, Johns Hopkins University, 1996.
[227] R. D. C. Monteiro and P. R. Zanjácomo. Implementation of primal-dual methods for semidefinite programming based on Monteiro and Tsuchiya directions and their variants. Technical Report, School of Ind. and Systems Engineering, Georgia Institute of Technology, Atlanta, 1997.
[228] R. D. C. Monteiro and Y. Zhang. A unified analysis for a class of path-following primal-dual interior-point algorithms for semidefinite programming. School of ISyE, Georgia Tech, GA 30332, 1995.
[229] J. J. Moré. The Levenberg-Marquardt algorithm: implementation and theory. In G. A. Watson, editor, Numerical Analysis, Springer-Verlag, New York, 1977.
[230] J. Moré and Z. Wu. Global Continuation for Distance Geometry Problems. SIAM J. Opt., 7:814-836, 1997.
[231] K. G. Murty. Linear Complementarity, Linear and Nonlinear Programming. Heldermann Verlag, Berlin, 1988.
[232] K. G. Murty and S. N. Kabadi. Some NP-complete problems in quadratic and nonlinear programming. Math. Programming, 39:117-129, 1987.
[233] M. Muramatsu. Affine scaling algorithm fails for semidefinite programming. Research Report No. 16, Department of Mechanical Engineering, Sophia University, Japan, July 1996.
[234] M. Muramatsu and T. Tsuchiya. An affine scaling method with an infeasible starting point: convergence analysis under nondegeneracy assumption. Annals of Operations Research, 62:325-356, 1996.
[235] M. Muramatsu and R. Vanderbei. Primal-dual affine scaling algorithms fail for semidefinite programming. Technical Report, SOR, Princeton University, Princeton, NJ, 1997.
[236] S. G. Nash and A. Sofer. A barrier method for large-scale constrained optimization. ORSA Journal on Computing, 5:40-53, 1993.
[237] D. J. Newman. Location of the maximum on unimodal surfaces. J. Assoc. Comput. Mach., 12:395-398, 1965.
[238] A. Nemirovski, C. Roos and T. Terlaky. On Maximization of Quadratic Forms Over Intersection of Ellipsoids with Common Center. Math. Program., 86:463-473, 1999.
[239] A. S. Nemirovsky and D. B. Yudin. Problem Complexity and Method Efficiency in Optimization. John Wiley and Sons, Chichester, 1983. Translated by E. R. Dawson from Slozhnost' Zadach i Effektivnost' Metodov Optimizatsii, 1979, Glavnaya redaktsiya fiziko-matematicheskoi literatury, Izdatelstva Nauka.
[240] Y. E. Nesterov. Semidefinite relaxation and nonconvex quadratic optimization. Optimization Methods and Software, 9:141-160, 1998.
[241] Yu. E. Nesterov and A. S. Nemirovskii. Interior Point Polynomial Methods in Convex Programming: Theory and Algorithms. SIAM Publications, SIAM, Philadelphia, 1993.
[242] Yu. E. Nesterov and M. J. Todd. Self-scaled barriers and interior-point methods for convex programming. Mathematics of Operations Research, 22(1):1-42, 1997.
[243] Yu. E. Nesterov and M. J. Todd. Primal-dual interior-point methods for self-scaled cones. Technical Report No. 1125, School of Operations Research and Industrial Engineering, Cornell University, Ithaca, NY, 1995. To appear in SIAM J. Optimization.
[244] Yu. Nesterov, M. J. Todd and Y. Ye. Infeasible-start primal-dual methods and infeasibility detectors for nonlinear programming problems. Technical Report No. 1156, School of Operations Research and Industrial Engineering, Cornell University, Ithaca, NY, 1996.
[245] D. Niculescu and B. Nath. Ad-hoc positioning system. In IEEE GlobeCom, Nov. 2001.
[246] R. Palhares, C. de Souza, and P. Dias Peres. Robust H∞ filtering for uncertain discrete-time state-delayed systems. IEEE Transactions on Signal Processing, 48(8):1696-1703, Aug. 2001.
[247] R. Palhares and P. Peres. LMI approach to the mixed H2/H∞ filtering design for discrete-time uncertain systems. IEEE Transactions on Aerospace and Electronic Systems, 37(1):292-296, Jan. 2001.
[248] C. H. Papadimitriou and K. Steiglitz. Combinatorial Optimization: Algorithms and Complexity. Prentice-Hall, Englewood Cliffs, NJ, 1982.
[249] P. M. Pardalos and J. B. Rosen. Constrained Global Optimization: Algorithms and Applications. Springer-Verlag, Lecture Notes in Computer Sciences 268, 1987.
[250] J. Park, H. Cho, and D. Park. Design of GBSB neural associative memories using semidefinite programming. IEEE Transactions on Neural Networks, 10(4):946-950, July 1999.
[251] P. Parrilo. Semidefinite programming relaxations for semialgebraic problems. Mathematical Programming. To appear.
[252] G. Pataki. On the Rank of Extreme Matrices in Semidefinite Programs and the Multiplicity of Optimal Eigenvalues. Mathematics of Operations Research, 23(2):339-358, 1998.
[253] E. Petrank. The hardness of approximation: gap location. Computational Complexity, 4:133-157, 1994.
[254] S. Poljak, F. Rendl and H. Wolkowicz. A recipe for semidefinite relaxation for 0-1 quadratic programming. Journal of Global Optimization, 7:51-73, 1995.
[255] R. Polyak. Modified barrier functions (theory and methods). Math. Programming, 54:177-222, 1992.
[256] F. A. Potra and Y. Ye. Interior point methods for nonlinear complementarity problems. Journal of Optimization Theory and Application, 88:617-642, 1996.
[257] F. Potra and R. Sheng. Homogeneous interior-point algorithms for semidefinite programming. Reports On Computational Mathematics, No. 82/1995, Department of Mathematics, The University of Iowa, 1995.
[258] E. Rains. A semidefinite program for distillable entanglement. IEEE Transactions on Information Theory, 47(7):2921-2933, November 2001.
[259] M. Ramana. An exact duality theory for semidefinite programming and its complexity implications. Math. Programming, 77:129-162, 1997.
[260] M. Ramana, L. Tuncel and H. Wolkowicz. Strong duality for semidefinite programming. CORR 95-12, Department of Combinatorics and Optimization, University of Waterloo, Waterloo, Ontario, Canada, 1995. To appear in SIAM J. Optimization.
[261] L. Rasmussen, T. Lim, and A. Johansson. A matrix-algebraic approach to successive interference cancellation in CDMA. IEEE Transactions on Communications, 48(1):145-151, Jan. 2000.
[262] F. Rendl and H. Wolkowicz. A semidefinite framework for trust region subproblems with applications to large scale minimization. Math. Programming, 77:273-300, 1997.
[263] J. Renegar. A polynomial-time algorithm, based on Newton's method, for linear programming. Math. Programming, 40:59-93, 1988.
[264] R. T. Rockafellar. Convex Analysis. Princeton University Press, Princeton, New Jersey, 1970.
[265] C. Roos, T. Terlaky and J.-Ph. Vial. Theory and Algorithms for Linear Optimization: An Interior Point Approach. John Wiley & Sons, Chichester, 1997.
[266] M. Saadat and T. Chen. Design of nonuniform multirate filter banks by semidefinite programming. IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, 47(11):1311-1314, November 2000.
[267] R. Saigal and C. J. Lin. An infeasible start predictor corrector method for semi-definite linear programming. Tech. Report, Dept. of Industrial and Operations Engineering, University of Michigan, Ann Arbor, MI, Dec. 1995.
[268] C. Savarese, J. Rabaey, K. Langendoen. Robust Positioning Algorithms for Distributed Ad-Hoc Wireless Sensor Networks. USENIX Annual Technical Conference, 2002.
[269] A. Savvides, C.-C. Han, M. B. Srivastava. Dynamic Fine-Grained Localization in Ad-Hoc Networks of Sensors. Proc. 7th MOBICOM, 166-179, 2001.
[270] A. Savvides, H. Park, M. B. Srivastava. The Bits and Flops of the n-hop Multilateration Primitive for Node Localization Problems. Proc. 1st WSNA, 112-121, 2002.
[271] A. Schrijver. Theory of Linear and Integer Programming. John Wiley & Sons, New York, 1986.
[272] U. Shaked, L. Xie, and Y. Soh. New approaches to robust minimum variance filter design. IEEE Transactions on Signal Processing, 49(11):2620-2629, November 2001.
[273] Y. Shang, W. Ruml, Y. Zhang, M. P. J. Fromherz. Localization from Mere Connectivity. Proc. 4th MOBIHOC, 201-212, 2003.
[274] W. F. Sheppard. On the calculation of the double integral expressing normal correlation. Transactions of the Cambridge Philosophical Society, 19:23-66, 1900.
[275] H. D. Sherali and W. P. Adams. Computational advances using the reformulation-linearization technique (RLT) to solve discrete and continuous nonconvex problems. Optima, 49:1-6, 1996.
[276] Masayuki Shida, Susumu Shindoh and Masakazu Kojima. Existence of search directions in interior-point algorithms for the SDP and the monotone SDLCP. Research Report B-310, Dept. of Mathematical and Computing Sciences, Tokyo Institute of Technology, Oh-Okayama, Meguro, Tokyo 152, Japan, 1996. To appear in SIAM J. Optimization.
[277] N. Z. Shor. Quadratic optimization problems. Soviet Journal of Computer and Systems Sciences, 25:1-11, 1987. Originally published in Tekhnicheskaya Kibernetika, No. 1, 1987, pp. 128-139.
[278] M. Skutella. Semidefinite relaxations for parallel machine scheduling. In Proceedings of the 39th Annual IEEE Symposium on Foundations of Computer Science (FOCS'98), pages 472-481, 1998.
[279] M. Skutella. Convex quadratic and semidefinite programming relaxations in scheduling. Journal of the ACM, 48(2):206-242, 2001.
[280] G. Sonnevend. An analytic center for polyhedrons and new classes of global algorithms for linear (smooth, convex) programming. In A. Prekopa, J. Szelezsan, and B. Strazicky, editors, System Modelling and Optimization: Proceedings of the 12th IFIP Conference held in Budapest, Hungary, September 1985, volume 84 of Lecture Notes in Control and Information Sciences, pages 866-876. Springer Verlag, Berlin, Germany, 1986.
[281] D. C. Sorenson. Newton's method with a model trust region modification. SIAM J. Numer. Anal., 19:409-426, 1982.
[282] A. Srivastav and K. Wolf. Finding dense subgraphs with semidefinite programming. In K. Jansen and J. Rolim, editors, Approximation Algorithms for Combinatorial Optimization, pages 181-191, 1998.
[283] P. Stoica, T. McKelvey, and J. Mari. MA estimation in polynomial time. IEEE Transactions on Signal Processing, 48(7):1999-2012, July 2000.
[284] J. F. Sturm. SeDuMi, let SeDuMi seduce you. http://fewcal.kub.nl/sturm/software/sedumi.html, October 2001.
[285] J. F. Sturm and S. Zhang. Symmetric primal-dual path following algorithms for semidefinite programming. Report 9554/A, Econometric Institute, Erasmus University, Rotterdam, 1995.
[286] J. F. Sturm and S. Zhang. On cones of nonnegative quadratic functions. SEEM2001-01, Chinese University of Hong Kong, March 2001.
[287] K. Tanabe. Complementarity-enforced centered Newton method for mathematical programming. In K. Tone, editor, New Methods for Linear Programming, pages 118-144, The Institute of Statistical Mathematics, 4-6-7 Minami Azabu, Minato-ku, Tokyo 106, Japan, 1987.
[288] K. Tanabe. Centered Newton method for mathematical programming. In M. Iri and K. Yajima, editors, System Modeling and Optimization, Proceedings of the 13th IFIP Conference, pages 197-206, Springer-Verlag, Tokyo, Japan, 1988.
[289] R. A. Tapia. Current research in numerical optimization. SIAM News, 20:10-11, March 1987.
[290] S. P. Tarasov, L. G. Khachiyan, and I. I. Erlikh. The method of inscribed ellipsoids. Soviet Mathematics Doklady, 37:226-230, 1988.
[291] H. Tan and L. K. Rasmussen. The application of semidefinite programming for detection in CDMA. IEEE Journal on Selected Areas in Communications, 19(8):1442-1449, August 2001.
[292] Z. Tan, Y. Soh, and L. Xie. Envelope-constrained H∞ filter design: an LMI optimization approach. IEEE Transactions on Signal Processing, 48(10):2960-2963, Oct. 2000.
[293] Z. Tan, Y. Soh, and L. Xie. Envelope-constrained H∞ FIR filter design. IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, 47(1):79-82, Jan. 2000.
[294] T. Terlaky, editor. Interior-Point Methods in Mathematical Programming. Kluwer Academic Publishers, Boston, 1996.
[295] M. J. Todd. Improved bounds and containing ellipsoids in Karmarkar's linear programming algorithm. Mathematics of Operations Research, 13:650-659, 1988.
[296] M. J. Todd. Potential-reduction methods in mathematical programming. Math. Programming, 76:3-45, 1996.
[297] M. J. Todd. A study of search directions in primal-dual interior-point methods for semidefinite programming. Optimization Methods and Software, 11&12:1-46, 1999.
[298] M. J. Todd and B. P. Burrell. An extension of Karmarkar's algorithm for linear programming using dual variables. Algorithmica, 1(4):409-424, 1986.
[299] M. J. Todd and Y. Ye. A centered projective algorithm for linear programming. Mathematics of Operations Research, 15:508-529, 1990.
[300] K. C. Toh. A note on the calculation of steplengths in interior point methods for semidefinite programming. To appear in Computational Optimization and Applications, 2001.
[301] K. C. Toh, M. J. Todd, and R. H. Tütüncü. SDPT3: a MATLAB software package for semidefinite programming, version 1.3. Optimization Methods and Software, 11:545-581, 1999.
[302] K. C. Toh and M. Kojima. Solving some large scale semidefinite programs via the conjugate residual method. Working paper, August 2000.
[303] Y. L. Tong. The Multivariate Normal Distribution. Springer-Verlag, New York, 1990.
[304] J. F. Traub, G. W. Wasilkowski and H. Wozniakowski. Information-Based Complexity. Academic Press, San Diego, CA, 1988.
[305] P. Tseng. Search directions and convergence analysis of some infeasible path-following methods for the monotone semi-definite LCP. Report, Department of Mathematics, University of Washington, Seattle, Washington 98195, 1996.
[306] P. Tseng. Approximating QP. Report, Department of Mathematics, University of Washington, Seattle, Washington, 2003.
[307] C. Tseng and B. Chen. H∞ fuzzy estimation for a class of nonlinear discrete-time dynamic systems. IEEE Transactions on Signal Processing, 49(11):2605-2619, Nov. 2001.
[308] H. Tuan, P. Apkarian, and T. Nguyen. Robust and reduced-order filtering: new LMI-based characterizations and methods. IEEE Transactions on Signal Processing, 49(12):2975-2984, Dec. 2001.
[309] H. Tuan, P. Apkarian, T. Nguyen, and T. Narikiyo. Robust mixed H2/H∞ filtering of 2-D systems. IEEE Transactions on Signal Processing, 50(7):1759-1771, July 2002.
[310] A. W. Tucker. Dual systems of homogeneous linear relations. In H. W. Kuhn and A. W. Tucker, editors, Linear Inequalities and Related Systems, page 3, Princeton University Press, Princeton, NJ, 1956.
[311] P. M. Vaidya. A new algorithm for minimizing a convex function over convex sets. Math. Programming, 73:291-341, 1996.
[312] P. M. Vaidya and D. S. Atkinson. A technique for bounding the number of iterations in path following algorithms. In P. Pardalos, editor, Complexity in Numerical Optimization, pages 462-489, World Scientific, New Jersey, 1993.
[313] L. Vandenberghe and S. Boyd. A primal-dual potential reduction method for problems involving matrix inequalities. Math. Programming, 69:205-236, 1995.
[314] L. Vandenberghe and S. Boyd. Semidefinite programming. SIAM Review, 38(1):49-95, 1996.
[315] L. Vandenberghe, S. Boyd, and S.-P. Wu. Determinant maximization with linear matrix inequality constraints. Technical Report, Information Systems Laboratory, Stanford University, CA, 1996.
[316] R. J. Vanderbei. LOQO: An Interior Point Code for Quadratic Programming. Program in Statistics & Operations Research, Princeton University, NJ, 1995.
[317] K. Varadarajan, S. Venkatesh and J. Zhang. On Approximating the Radii of Point Sets in High Dimensions. In Proc. 43rd Annu. IEEE Sympos. Found. Comput. Sci., 2002.
[318] S. A. Vavasis. Nonlinear Optimization: Complexity Issues. Oxford Science, New York, NY, 1991.
[319] F. Wang and V. Balakrishnan. Robust Kalman filters for linear time-varying systems with stochastic parametric uncertainties. IEEE Transactions on Signal Processing, 50(4):803-813, April 2002.
[320] S. Wang, L. Xie, and C. Zhang. H2 optimal inverse of periodic FIR digital filters. IEEE Transactions on Signal Processing, 48(9):2696-2700, Sept. 2000.
[321] T. Wang, R. D. C. Monteiro, and J.-S. Pang. An interior point potential reduction method for constrained equations. Math. Programming, 74(2):159-196, 1996.
[322] H. Wolkowicz and Q. Zhao. Semidefinite programming for the graph partitioning problem. Manuscript, 1996. To appear in Discrete Applied Math.
[323] M. H. Wright. A brief history of linear programming. SIAM News, 18(3):4, November 1985.
[324] S. J. Wright. Primal-Dual Interior-Point Methods. SIAM, Philadelphia, 1996.
[325] C. Xu, Y. Ye and J. Zhang. Approximate the 2-Catalog Segmentation Problem Using Semidefinite Programming Relaxations. Optimization Methods and Software, 18(6):705-719, 2003.
[326] G. Xue and Y. Ye. Efficient algorithms for minimizing a sum of Euclidean norms with applications. Working Paper, Department of Management Sciences, The University of Iowa, 1995. To appear in SIAM J. Optimization.
[327] F. Yang and Y. Hung. Robust mixed H2/H∞ filtering with regional pole assignment for uncertain discrete-time systems. IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, 49(8):1236-1241, Aug. 2002.
[328] M. Yannakakis. Computing the minimum fill-in is NP-complete. SIAM J. Algebraic Discrete Methods, pages 77-79, 1981.
[329] Y. Ye. Interior Point Algorithms: Theory and Analysis. Wiley-Interscience Series in Discrete Mathematics and Optimization. John Wiley & Sons, New York, 1997.
[330] Y. Ye. Karmarkar's algorithm and the ellipsoid method. Operations Research Letters, 6:177-182, 1987.
[331] Y. Ye. An O(n³L) potential reduction algorithm for linear programming. Math. Programming, 50:239-258, 1991.
[332] Y. Ye. Combining binary search and Newton's method to compute real roots for a class of real functions. Journal of Complexity, 10:271-280, 1994.
[333] Y. Ye. Approximating quadratic programming with bound and quadratic constraints. Mathematical Programming, 84:219-226, 1999.
[334] Y. Ye. A .699 approximation algorithm for Max-Bisection. Mathematical Programming, 90(1):101-111, 2001.
[335] Y. Ye and M. Kojima. Recovering optimal dual solutions in Karmarkar's polynomial algorithm for linear programming. Math. Programming, 39:305-317, 1987.
[336] Y. Ye, O. Güler, R. A. Tapia, and Y. Zhang. A quadratically convergent O(√n L)-iteration algorithm for linear programming. Math. Programming, 59:151-162, 1993.
[337] Y. Ye, M. J. Todd and S. Mizuno. An O(√n L)-iteration homogeneous and self-dual linear programming algorithm. Mathematics of Operations Research, 19:53-67, 1994.
[338] Y. Ye and S. Zhang. New results on quadratic minimization. Working Paper, Department of SSEM, The Chinese University of Hong Kong, May 2001. To appear in SIAM J. Optimization.
[339] Y. Ye and J. Zhang. An improved algorithm for approximating the radii of point sets. APPROX 2003, August 2003.
[340] Q. Zhao. Semidefinite Programming for Assignment and Partitioning Problems. PhD thesis, University of Waterloo, 1996.
[341] Q. Zhao, S. Karisch, F. Rendl, and H. Wolkowicz. Semidefinite programming relaxations for the quadratic assignment problems. J. Combinatorial Optimization, 2(1), 1998. CORR 95-27.
[342] H. Zhou, L. Xie, and C. Zhang. A direct approach to H2 optimal deconvolution of periodic digital channels. IEEE Transactions on Signal Processing, 50(7):1685-1698, July 2002.
[343] J. Zowe, M. Kocvara, and M. Bendsoe. Free material optimization via mathematical programming. Mathematical Programming, 79:445-466, 1997.
[344] U. Zwick. Approximation algorithms for constraint satisfaction problems involving at most three variables per constraint. In 9th SODA, pages 201-210, 1998.
[345] U. Zwick. Outward rotations: a tool for rounding solutions of semidefinite programming relaxations, with applications to MAX CUT and other problems. In Proceedings of the 31st Symposium on Theory of Computation, pages 679-687, 1999.