

LNMB PhD Course Networks and Semidefinite Programming 2014/2015 Monique Laurent CWI, Amsterdam, and Tilburg University These notes are based on material developed by M. Laurent and F. Vallentin for the Mastermath course Semidefinite Optimization February 8, 2015 CONTENTS 1 Positive semidefinite matrices 1.1 Basic definitions . . . . . . . . . . . . . . . . . . . . . . . 1.1.1 Characterizations of positive semidefinite matrices n 1.1.2 The positive semidefinite cone S0 . . . . . . . . . 1.1.3 The trace inner product . . . . . . . . . . . . . . . 1.2 Basic properties . . . . . . . . . . . . . . . . . . . . . . . . 1.2.1 Schur complements . . . . . . . . . . . . . . . . . 1.2.2 Kronecker and Hadamard products . . . . . . . . . 1.2.3 Properties of the kernel . . . . . . . . . . . . . . . 1.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1 1 3 3 4 4 5 6 7 2 Semidefinite programs 2.1 Semidefinite programs . . . . . . . . . . . . . 2.1.1 Recap on linear programs . . . . . . . 2.1.2 Semidefinite program in primal form . 2.1.3 Semidefinite program in dual form . . 2.1.4 Duality . . . . . . . . . . . . . . . . . 2.2 Application to eigenvalue optimization . . . . 2.3 Some facts about complexity . . . . . . . . . . 2.3.1 More differences between LP and SDP 2.3.2 Algorithms . . . . . . . . . . . . . . . 2.3.3 Gaussian elimination . . . . . . . . . . 2.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 10 10 10 12 12 14 15 15 16 17 18 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 Graph coloring and independent sets 20 3.1 Preliminaries on graphs . . . . . . . . . . . . . . . . . . . . . . . 21 3.1.1 Stability and chromatic numbers . . . . . . . . . . . . . . 21 3.1.2 Perfect graphs . . . . . . . . . . . . . . . . . . . . . . . . . 22 i 3.1.3 The perfect graph theorem . . . . . . . . . . . . . 3.2 Linear programming bounds . . . . . . . . . . . . . . . . . 3.2.1 Fractional stable sets and colorings . . . . . . . . . 3.2.2 Polyhedral characterization of perfect graphs . . . 3.3 Semidefinite programming bounds . . . . . . . . . . . . . 3.3.1 The theta number . . . . . . . . . . . . . . . . . . 3.3.2 Computing maximum stable sets in perfect graphs 3.3.3 Minimum colorings of perfect graphs . . . . . . . . 3.4 Other formulations of the theta number . . . . . . . . . . 3.4.1 Dual formulation . . . . . . . . . . . . . . . . . . . 3.4.2 Two more (lifted) formulations . . . . . . . . . . . 3.5 The theta body TH(G) . . . . . . . . . . . . . . . . . . . . 3.6 The theta number for vertex-transitive graphs . . . . . . . 3.7 Bounding the Shannon capacity . . . . . . . . . . . . . . . 3.8 Geometric application . . . . . . . . . . . . . . . . . . . . 3.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 Approximating the MAX CUT problem 4.1 Introduction . . . . . . . . . . . . . . . . . . . . . 4.1.1 The MAX CUT problem . . . . . . . . . . 4.1.2 Linear programming relaxation . . . . . . 4.2 The algorithm of Goemans and Williamson . . . . 4.2.1 Semidefinite programming relaxation . . 4.2.2 Dual semidefinite programming relaxation 4.2.3 The Goemans-Williamson algorithm . . . 4.2.4 Remarks on the algorithm . . . . . . . . . 4.3 Extension to variations of MAX CUT . . . . . . . 4.3.1 The maximum bisection problem . 
. . . . 4.3.2 The maximum k-cut problem . . . . . . . 4.4 Extension to quadratic programming . . . . . . . 4.4.1 Nesterov’s approximation algorithm . . . 4.4.2 Quadratic programs modeling MAX 2SAT 4.4.3 Approximating MAX 2-SAT . . . . . . . . 4.5 Further reading and remarks . . . . . . . . . . . 4.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . ii . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 24 24 25 28 28 29 30 31 31 32 34 35 37 38 41 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 45 45 47 50 50 52 54 57 57 57 59 61 61 63 64 65 66 CHAPTER 1 POSITIVE SEMIDEFINITE MATRICES In this chapter we collect basic facts about positive semidefinite matrices, which we will need in the next chapter to define semidefinite programs. We use the following notation. √ Throughout pPnkxk denotes the Euclidean norm 2 of x ∈ Rn , defined by kxk = xT x = i=1 xi . An orthonormal basis of Rn is a set of unit vectors {u1 , . . . , un } that are pairwise orthogonal: kui k = 1 for all i and uT i uj = 0 for all i 6= j. For instance, the standard unit vectors e1 , . . . , en ∈ Rn form an orthonormal basis. In denotes the n × n identity matrix and Jn denotes the all-ones matrix (we may sometimes omit the index n if the dimension is clear from the context). We let S n denote the set of symmetric n×n matrices and O(n) denote the set of orthogonal matrices. A matrix P ∈ Rn×n is orthogonal if P P T = In or, equivalently, P T P = In , i.e. the rows (resp., the columns) of P form an orthonormal basis of Rn . A diagonal matrix D ∈ S n has entries zero at all off-diagonal positions: Dij = 0 for all i 6= j. 1.1 Basic definitions 1.1.1 Characterizations of positive semidefinite matrices We recall the notions of eigenvalues and eigenvectors. For a matrix X ∈ Rn×n , a nonzero vector u ∈ Rn is an eigenvector of X if there exists a scalar λ ∈ R such that Xu = λu, then λ is the eigenvalue of X for the eigenvector u. A fundamental property of symmetric matrices is that they admit a set of eigenvectors {u1 , . . . , un } forming an orthonormal basis of Rn . This is the spectral decomposition theorem, one of the most important theorems about symmetric 1 matrices. Theorem 1.1.1. (Spectral decomposition theorem) Any real symmetric matrix X ∈ S n can be decomposed as X= n X λi u i u T i, (1.1) i=1 where λ1 , . . . , λn ∈ R are the eigenvalues of X and where u1 , . . . , un ∈ Rn are the corresponding eigenvectors which form an orthonormal basis of Rn . In matrix terms, X = P DP T , where D is the diagonal matrix with the λi ’s on the diagonal and P is the orthogonal matrix with the ui ’s as its columns. Next we define positive semidefinite matrices and give several equivalent characterizations. Theorem 1.1.2. (Positive semidefinite matrices) The following assertions are equivalent for a symmetric matrix X ∈ S n . (1) X is positive semidefinite, written as X  0, which is defined by the property: xT Xx ≥ 0 for all x ∈ Rn . (2) The smallest eigenvalue of is nonnegative, i.e., the spectral decomposition PX n of X is of the form X = i=1 λi ui uTi with all λi ≥ 0. (3) X = LLT for some matrix L ∈ Rn×k (for some k ≥ 1), called a Cholesky decomposition of X. (4) There exist vectors v1 , . . . 
, vn ∈ Rk (for some k ≥ 1) such that Xij = viT vj for all i, j ∈ [n]; the vectors vi ’s are called a Gram representation of X. (5) All principal minors of X are non-negative. Proof. (i) =⇒ (ii): By assumption, uT i Xui ≥ 0 for all i ∈ [n]. On the other hand, 2 Xui = λi ui implies uT Xu = λ ku k = λi , and thus λi ≥ 0 for all i. i i i i (ii) =⇒ (iii): By assumption, X has a decomposition (1.1) where all scalars λi are nonnegative. Define the matrix L ∈ Rn×n whose i-th column is the vector √ λi ui . Then X = LLT holds. (iii) =⇒ (iv): Assume X = LLT where L ∈ Rn×k . Let vi ∈ Rk denote the i-th row of L. The equality X = LLT gives directly that Xij = viT vj for all i, j ∈ [n]. (iv) =⇒ (i): Assume Xij P = viT vj for all i, j ∈P[n], where v1 , . . . , vnP ∈ Rk , and let n n n n T T x ∈ R . Then, x Xx = i,j=1 xi xj Xij = i,j=1 xi xj vi vj = k i=1 xi vi k2 is thus nonnegative. This shows that X  0. The equivalence (i) ⇐⇒ (v) can be found in any standard Linear Algebra textbook (and will not be used here). Observe that for a diagonal matrix X, X  0 if and only if its diagonal entries are nonnegative: Xii ≥ 0 for all i ∈ [n]. 2 The above result extends to positive definite matrices. A matrix X is said to be positive definite, which is denoted as X ≻ 0, if it satisfies any of the following equivalent properties: (1) xT Xx > 0 for all x ∈ Rn \ {0}; (2) all eigenvalues of X are strictly positive; (3) in a Cholesky decomposition of X, the matrix L is nonsingular; (4) in any Gram representation of X as (viT vj )ni,j=1 , the system of vectors {v1 , . . . , vn } has full rank n; and (5) all the principal minors of X are positive (in fact positivity of all the leading principal minors already implies positive definiteness, this is known as Sylvester’s criterion). n 1.1.2 The positive semidefinite cone S0 n We let S0 denote the set of all positive semidefinite matrices in S n , called the n positive semidefinite cone. Indeed, S0 is a convex cone in S n , i.e., the following holds: X, X ′  0, λ, λ′ ≥ 0 =⇒ λX + λ′ X ′  0 n (check it). Moreover, S0 is a closed subset of S n . (Assume we have a sequence (i) of matrices X  0 converging to a matrix X as i → ∞ and let x ∈ Rn . Then xT X (i) x ≥ 0 for all i and thus xT Xx ≥ 0 by taking the limit.) Moreover, as a n direct application of (1.1), we find that the cone S0 is generated by rank one matrices, i.e., n S0 = cone{xxT : x ∈ Rn }. (1.2) n Furthermore, the cone S0 is full-dimensional and the matrices lying in its interior are precisely the positive definite matrices. 1.1.3 The trace inner product The trace of an n × n matrix A is defined as Tr(A) = n X Aii . i=1 Taking the trace is a linear operation: Tr(λA) = λTr(A), Tr(A + B) = Tr(A) + Tr(B). Moreover, the trace satisfies the following properties: Tr(A) = Tr(AT ), Tr(AB) = Tr(BA), Tr(uuT ) = uT u = kuk2 for u ∈ Rn . (1.3) Using the fact that Tr(uuT ) = 1 for any unit vector u, combined with (1.1), we deduce that the trace of a symmetric matrix is equal to the sum of its eigenvalues. Lemma 1.1.3. If X ∈ S n has eigenvalues λ1 , . . . , λn , then Tr(X) = λ1 + . . . + λn . 3 One can define an inner product, denoted as h·, ·i, on Rn×n by setting hA, Bi = Tr(AT B) = n X i,j=1 Aij Bij for A, B ∈ Rn×n . This defines the Frobenius norm on Rn×n by setting kAk = p hA, Ai = (1.4) qP n i,j=1 A2ij . In other words, this is the usual Euclidean norm, just viewing a matrix as a vec2 tor in Rn . For a vector x ∈ Rn we have hA, xxT i = xT Ax. The following property is useful to know: Lemma 1.1.4. Let A, B ∈ S n and P ∈ O(n). 
Then, hA, Bi = hP AP T , P BP T i. Proof. Indeed, hP AP T , P BP T i is equal to Tr(P AP T P BP T ) = Tr(P ABP T ) = Tr(ABP T P ) = Tr(AB) = hA, Bi, where we have used the fact that P T P = P P T = In and the commutativity rule from (1.3). Positive semidefinite matrices satisfy the following fundamental property: Lemma 1.1.5. For a symmetric matrix A ∈ S n , n A  0 ⇐⇒ hA, Bi ≥ 0 for all B ∈ S0 . n Proof. The proof is based on the fact that S0 is generated by rank 1 matrices T (recall (1.2)). Indeed, if A  0 then hA, xx i = xT Ax ≥ 0 for all x ∈ Rn , and n n thus hA, Bi ≥ 0 for all B ∈ S0 . Conversely, if hA, Bi ≥ 0 for all B ∈ S0 then, T T for B = xx , we obtain that x Ax ≥ 0, which shows A  0. n In other words, the cone S0 is self dual, i.e., it coincides with its dual cone1 . 1.2 Basic properties 1.2.1 Schur complements We recall some basic operations about positive semidefinite matrices. The proofs of the following Lemmas 1.2.1, 1.2.2 and 1.2.3 are easy and left as an exercise. Lemma 1.2.1. If X  0 then every principal submatrix of X is positive semidefinite. 1 By n is the set of all matrices Y ∈ S n satisfying hY, Xi ≥ 0 definition, the dual of the cone S0 n for all X ∈ S0 . 4 Moreover, any matrix congruent to X  0 (i.e., of the form P XP T where P is nonsingular) is positive semidefinite: Lemma 1.2.2. Let P ∈ Rn×n be a nonsingular matrix. Then, X  0 ⇐⇒ P XP T  0. Lemma 1.2.3. Let X ∈ S n be a matrix having the following block-diagonal form:   A 0 X= . 0 C Then, X  0 ⇐⇒ A  0 and B  0. We now introduce the notion of Schur complement, which can be very useful for showing positive semidefiniteness. Lemma 1.2.4. Let X ∈ S n be a matrix in block form   A B , X= BT C (1.5) where A ∈ S p , C ∈ S n−p and B ∈ Rp×(n−p) . If A is non-singular, then X  0 ⇐⇒ A  0 and C − B T A−1 B  0. The matrix C − B T A−1 B is called the Schur complement of A in X. Proof. One can verify that the following identity holds:    0 I T A P, where P = X=P 0 0 C − B T A−1 B  A−1 B . I As P is nonsingular, we deduce that X  0 if and only if (P −1 )T XP −1  0 (use Lemma 1.2.2), which is thus equivalent to A  0 and C − B T A−1 B  0 (use Lemma 1.2.3). 1.2.2 Kronecker and Hadamard products Given two matrices A = (Aij ) ∈ Rn×m and B = (Bhk ) ∈ Rp×q , their Kronecker product is the matrix A ⊗ B ∈ Rnp×mq with entries Aih,jk = Aij Bhk ∀i ∈ [n], j ∈ [m], h ∈ [p], k ∈ [q]. The matrix A ⊗ B can be seen as the n × m block matrix whose ij-th block is the p × q matrix Aij B for all i ∈ [n], j ∈ [m]. Alternatively, it can be seen as the 5 p × q block matrix whose hk-block is the n × m matrix Bhk A for h ∈ [p], k ∈ [q]. As an example, I2 ⊗ J3 takes the form:   1 0 1 0 1 0  0 1 0 1 0 1    I2 I2 I2   I2 I2 I2  = 1 0 1 0 1 0 , 0 1 0 1 0 1   I2 I2 I2 1 0 1 0 1 0 0 1 0 1 0 1 or, after permuting rows and columns, the form:  1 1 1 0 1 1 1 0    1 1 1 0 J3 0 = 0 0 0 1 0 J3  0 0 0 1 0 0 0 1 0 0 0 1 1 1  0 0  0 . 1  1 1 This includes in particular defining the Kronecker product u ⊗ v ∈ Rnp of two vectors u ∈ Rn and v ∈ Rp , with entries (u ⊗ v)ih = ui vh for i ∈ [n], h ∈ [p]. Given two matrices A, B ∈ Rn×m , their Hadamard product is the matrix A ◦ B ∈ Rn×m with entries (A ◦ B)ij = Aij Bij ∀i ∈ [n], j ∈ [m]. Note that A ◦ B coincides with the principal submatrix of A ⊗ B indexed by the subset of all ‘diagonal’ pairs of indices of the form (ii, jj) for i ∈ [n], j ∈ [m]. Here are some (easy to verify) facts about these products, where the matrices and vectors have the appropriate sizes. 1. 
(A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD). 2. In particular, (A ⊗ B)(u ⊗ v) = (Au) ⊗ (Bv). 3. Assume A ∈ S n and B ∈ S p have, respectively, eigenvalues α1 , . . . , αn and β1 , . . . , βp . Then A ⊗ B ∈ S np has eigenvalues αi βh for i ∈ [n], h ∈ [p]. In particular, A, B  0 =⇒ A ⊗ B  0 and A ◦ B  0, A  0 =⇒ A◦k = ((Aij )k )ni,j=1  0 ∀k ∈ N. 1.2.3 Properties of the kernel Here is a first useful property of the kernel of positive semidefinite matrices. Lemma 1.2.5. Assume X ∈ S n is positive semidefinite and let x ∈ Rn . Then, Xx = 0 ⇐⇒ xT Xx = 0. 6 Pn Proof. The ‘only if’ part is clear. Conversely, decompose Px = i=1 xi uiT in the orthonormal base of eigenvectors of X.PThen, Xx = i λi xi ui and x Xx = P 2 2 T i λi xi and thus xi = 0 for each i for i xi λi . Hence, 0 = x Xx gives 0 = which λi > 0. This shows that x is a linear combination of the eigenvectors ui with eigenvalue λi = 0, and thus Xx = 0. Clearly, X  0 implies Xii ≥ 0 for all i (because Xii = eT i Xei ≥ 0). Moreover, if X  0 has a zero diagonal entry at position (i, i) then the whole i-th row/column is identically zero. This follows from the following property: Lemma 1.2.6. Let X ∈ S n be a matrix in block form   A B X= , BT C (1.6) where A ∈ S p , C ∈ S n−p and B ∈ Rp×(n−p) . Assume y ∈ Rp belongs to the kernel of A, i.e., Ay = 0. Then the vector x = (y, 0, . . . , 0) ∈ Rn (obtained from y by adding zero coordinates at the remaining n − p positions) belongs to the kernel of X, i.e., Xx = 0. Proof. We have: xT Xx = uT Au = 0 which, in view of Lemma 1.2.5, implies that Xx = 0. We conclude with the following property: The inner product of two positive semidefinite matrices is zero if and only if their matrix product is equal to 0. Lemma 1.2.7. Let A, B  0. Then, hA, Bi = 0 ⇐⇒ AB = 0. Proof. TheP ‘only if’ part is clear since hA, Bi = Tr(AB). Assume now hA, Bi = 0. n Say, B = uT i , where λi ≥ 0 and the ui form an orthonormal base. i=1 λi uiP T Then, 0 = hA, Bi = i λi hA, ui uT i i. This implies that each term λi hA, ui ui i = T T λi ui Aui is equal to 0, since λi ≥ 0 and ui Aui ≥ 0 (as A  0). Hence, λi > 0 implies uT (by Lemma 1.2.5). each term i Aui = 0 and thus Aui = 0 P P Therefore, T λi Aui is equal to 0 and thus AB = A( i λi ui uT ) = = 0. λ Au u i i i i i 1.3 Exercises 1.1 Given x1 , . . . , xn ∈ R, consider the following matrix   1 x 1 . . . xn  x1 x1 0 0   X= . . ..  .. . 0 0 xn 0 0 xn That is, X ∈ S n+1 is the matrix indexed by {0, 1, . . . , n}, with entries X00 = 1, X0i = Xi0 = Xii = xi for i ∈ [n], and all other entries are equal to 0. 7 Show: X  0 if and only if xi ≥ 0 for all i ∈ [n] and Hint: Use Schur complements. Pn i=1 xi ≤ 1. 1.2. Define the matrix Fij = (ei − ej )(ei − ej )T ∈ S n for 1 ≤ i < j ≤ n. That is, Fij has entries 1 at positions (i, i) and (j, j), entries −1 at (i, j) and (j, i), and entries 0 at all other positions. (a) Show: Fij  0. (b) Assume that X ∈ S n satisfies the condition: X Xii ≥ |Xij | for all i ∈ [n]. j∈[n]:j6=i (Then X is said to be diagonally dominant.) Show: X  0. 1.3 Let X ∈ {±1}n×n be a symmetric matrix whose entries are 1 or −1. Show: X  0 if and only if X = xxT for some x ∈ {±1}n . 8 CHAPTER 2 SEMIDEFINITE PROGRAMS Semidefinite programming is the analogue of linear programming but now, instead of having variables that are vectors assumed to lie in the nonnegative n orthant Rn≥0 , we have variables that are matrices assumed to lie in the cone S0 of positive semidefinite matrices. 
Thus semidefinite optimization can be seen as linear optimization over the convex cone of positive semidefinite matrices. In this chapter we introduce semidefinite programs and give some basic properties, in particular, about duality and complexity. For convenience we recap some notation, mostly already introduced in the previous chapter. S n denotes the set of symmetric n × n matrices. For a matrix n X ∈ S n , X  0 means that X is positive semidefinite and S0 is the cone of positive semidefinite matrices; X ≻ 0 means that X is positive definite. Throughout In (or simply I when the dimension is clear from the context) denotes the n × n identity matrix, e denotes the all-ones vector, i.e., e = (1, . . . , 1)T ∈ Rn , and Jn = eeT (or simply J) denotes the all-ones matrix. The vectors e1 , . . . , en are the standard unit vectors in Rn , and the matrices n T Eij = (ei eT j + ej ei )/2 form the standard basis of S . O(n) denotes the set of orthogonal matrices, where A is orthogonal if AAT = In or, equivalently, AT A = I n . Pn We consider the trace inner product: hA, Bi = Tr(AT B) = i,j=1 Aij Bij for P n two matrices A, B ∈ Rn×n . Here Tr(A) = hIn , Ai = i=1 Aii denotes the trace of A. Recall that Tr(AB) = Tr(BA); in particular, hQAQT , QBQT i = hA, Bi if Q is an orthogonal matrix. A well known property of the positive semidefinite cone n S0 is that it is self-dual: for a matrix X ∈ S n , X  0 if and only if hX, Y i ≥ 0 n for all Y ∈ S0 . For a matrix A ∈ S n , diag(A) denotes the vector in Rn with entries are the diagonal entries of A and, for a vector a ∈ Rn , Diag(a) ∈ S n is the diagonal matrix with diagonal entries the entries of a. 9 2.1 Semidefinite programs 2.1.1 Recap on linear programs We begin with recalling the standard form of a linear program, in primal form: p∗ = maxn {cT x : aT j x = bj (j ∈ [m]), x ≥ 0}, x∈R (2.1) m where c, a1 , . . . , am ∈ Rn and b = (bj )m are the given data of the LP. j=1 ∈ R Then the dual LP reads:   m m X  X bj y j : (2.2) yj a j − c ≥ 0 . d∗ = minm y∈R   j=1 j=1 We recall the following well known facts about LP duality: Theorem 2.1.1. The following holds for the programs (2.1) and (2.2). 1. (weak duality) If x is primal feasible and y is dual feasible then cT x ≤ bT y. Thus, p∗ ≤ d∗ . 2. (strong duality) p∗ = d∗ unless both programs (2.1) and (2.2) are infeasible (in which case p∗ = −∞ and d∗ = +∞). If p∗ is finite (i.e., (2.1) is feasible and bounded) or if d∗ is finite (i.e., (2.2) is feasible and bounded), then p∗ = d∗ and both (2.1) and (2.2) have optimum solutions. 3. (optimality condition) If (x, y) is a pair of primal/dual feasible solutions, then they are primal/dual optimal solutions if and only if cT x = bT y or, equivalently, the complementary slackness condition holds:   m X xi  yj aj − c = 0 ∀i ∈ [n]. j=1 i 2.1.2 Semidefinite program in primal form The standard form of a semidefinite program (abbreviated as SDP) is a maximization problem of the form p∗ = sup{hC, Xi : hAj , Xi = bj (j ∈ [m]), X  0}. (2.3) X Here A1 , . . . , Am ∈ S n are given n×n symmetric matrices and b ∈ Rm is a given vector, they are the data of the semidefinite program (2.3). The matrix X is the variable, which is constrained to be positive semidefinite and to lie in the affine subspace W = {X ∈ S n | hAj , Xi = bj (j ∈ [m])} 10 of S n . The goal is to maximize the linear objective function hC, Xi over the feasible region n F = S0 ∩ W, n obtained by intersecting the positive semidefinite cone S0 with the affine subspace W. 
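To make the standard primal form (2.3) concrete, here is a minimal numerical sketch. It assumes the third-party Python modeling package CVXPY together with one of its bundled SDP solvers (neither is part of these notes), and the data C, A1, b1 are made up purely for illustration:

import numpy as np
import cvxpy as cp   # assumed third-party modeling package

# Tiny instance of (2.3):  max <C, X>  s.t.  <A1, X> = b1,  X PSD.
C = np.array([[1.0, 0.0],
              [0.0, -1.0]])
A1 = np.eye(2)            # the single constraint <I, X> = Tr(X) = 1
b1 = 1.0

X = cp.Variable((2, 2), symmetric=True)
constraints = [X >> 0, cp.trace(A1 @ X) == b1]
prob = cp.Problem(cp.Maximize(cp.trace(C @ X)), constraints)
prob.solve()
print(prob.value)         # optimal value 1, attained at X = e1 e1^T

The same pattern extends verbatim to any number m of affine constraints <Aj, X> = bj, and to the minimization form discussed next.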
Of course, one can also handle minimization problems, of the form inf {hC, Xi : hAj , Xi = bj (j ∈ [m]), X  0} X since they can be brought into the above standard maximization form using the fact that infhC, Xi = − suph−C, Xi. In the special case when the matrices Aj , C are diagonal matrices, with diagonals aj , c ∈ Rn , then the program (2.3) reduces to the linear program (2.1). Indeed, let x denote the vector consisting of the diagonal entries of the matrix X, so that x ≥ 0 if X  0, and hC, Xi = cT x, hAj , Xi = aT j x. Hence semidefinite programming contains linear programming as a special instance. A feasible solution X ∈ F is said to be strictly feasible if X is positive definite. The program (2.3) is said to be strictly feasible if it admits at least one strictly feasible solution. Note that we write a supremum in (2.3) rather than a maximum. This is because the optimum value p∗ might not be attained in (2.3). In general, p∗ ∈ R ∪ {±∞}, with p∗ = −∞ if the problem (2.3) is infeasible (i.e., F = ∅) and p∗ = +∞ might occur in which case we say that the problem is unbounded. We give a small example as an illustration. Example 2.1.2. Consider the problem of minimizing/maximizing X11 over the feasible region   n o X11 a Fa = X ∈ S 2 : X =  0 where a ∈ R is a given parameter. a 0 Note that det(X) = −a2 for any X ∈ Fa . Hence, if a 6= 0 then Fa = ∅ (the problem is infeasible). Moreover, if a = 0 then the problem is feasible but not strictly feasible. The minimum value of X11 over F0 is equal to 0, attained at X = 0, while the maximum value of X11 over F0 is equal to ∞ (the problem is unbounded). Example 2.1.3. As another example, consider the problem     X11 1 0 . p∗ = inf 2 X11 : 1 X22 X∈S (2.4) Then the infimum is p∗ = 0 which is reached at the limit when X11 = 1/X22 and letting X22 tend to ∞. So the infimum is not attained. 11 2.1.3 Semidefinite program in dual form The program (2.3) is often referred to as the primal SDP in standard form. One can define its dual SDP, which takes the form:   m m X  X d∗ = inf bj yj = bT y : (2.5) y j Aj − C  0 . y   j=1 j=1 Thus the dual program has variables yj , one for each linear constraint of the primal program. The positive semidefinite constraint arising in (2.5) is also named a linear matrix inequality (LMI). The P SDP (2.5) is said to be strictly feasible if it has a feasible solution y for which j yj Aj − C ≻ 0. Example 2.1.4. Let us work out the dual SDP of the SDP in Example 2.1.3. First we write (2.4) in standard primal form as − p∗ = max2 {h−E11 , Xi : hE12 , Xi = 2}. X∈S (2.6) As there is one linear equation, there is one dual variable y and the dual SDP reads:   1 y ∗ − d = inf {2y : yE12 + E11 =  0}. (2.7) y 0 y∈R Hence y = 0 is the only dual feasible solution. Hence, the dual optimum value is d∗ = 0, attained at y = 0. 2.1.4 Duality The following facts relate the primal and dual SDP’s. They are simple, but very important. Lemma 2.1.5. Let X be a feasible solution of (2.3) and let y be a feasible solution of (2.5). Then the following holds. 1. (weak duality) We have: hC, Xi ≤ bT y and thus p∗ ≤ d∗ . 2. (optimality condition) Assume that p∗ = d∗ holds. Then X is an optimal solution of (2.3) and y is an optimal solution Pm of (2.5) if and only if equality: hC, Xi = bT y holds or, equivalently, hX, j=1 yj Aj −Ci = 0 which, in turn, is equivalent to the following complementarity condition:   m X yj Aj − C  = 0. X j=1 Proof. Let (X, y) is a primal/dual pair of feasible solutions. 1. 
We have: X X X hX, yj Aj − Ci = hX, Aj iyj − hX, Ci = bj yj − hX, Ci = bT y − hC, Xi, j j j (2.8) 12 where we Pequality. As both P used the fact that hAj , Xi = bj to get the second X and j yj Aj − C are positive semidefinite, we get: hX, j yj Aj − Ci ≥ 0, which implies hC, Xi ≤ bT y and thus p∗ ≤ d∗ . 2. By assumption, we have: hC, Xi ≤ p∗ = d∗ ≤ bT y. Hence, (X, y) form a pair of primal/dual optimal solutionsP if and only if hC, Xi = bT y or, equivalently (in view P of relation (2.8)), hX, j yj Aj − Ci = 0. Finally, as both X and Z = j yj Aj − C are positive semidefinite, we deduce that hX, Zi = 0 if and only if XZ = 0. (Recall Lemma 1.2.7.) The quantity d∗ − p∗ is called the duality gap. While there is no duality gap in LP, there might be a positive duality gap between the primal and dual SDP’s. When there is no duality gap, i.e., when p∗ = d∗ , one says that strong duality holds. Having strong duality is a very desirable situation, which happens when at least one of the primal and dual semidefinite programs is strictly feasible. We only quote the following result on strong duality. For its proof we refer e.g. to the textbook [1] or to [3]. Theorem 2.1.6. (Strong duality: no duality gap) Consider the pair of primal and dual programs (2.3) and (2.5). 1. Assume that the dual program (2.5) is bounded from below (d∗ > −∞) and that it is strictly feasible. Then the primal program (2.3) attains its supremum (i.e., p∗ = hC, Xi for some primal feasible X) and there is no duality gap: p∗ = d∗ . 2. Assume that the primal program (2.3) is bounded from above (p∗ < ∞) and that it is strictly feasible. Then the dual program (2.5) attains its infimum (i.e., d∗ = bT y for some dual feasible y) and there is no duality gap: p∗ = d∗ . Consider again the primal and dual SDP’s of Example 2.1.4. Then, the primal (2.6) is strictly feasible, the dual (2.7) attains its optimum value and there is no duality gap, while the dual is not strictly feasible and the primal does not attain its optimum value. We conclude with an example having a positive duality gap. Example 2.1.7. Consider the primal semidefinite program with data matrices       −1 0 0 1 0 0 0 0 1 C =  0 −1 0 , A1 = 0 0 0 , A2 = 0 1 0 , 0 0 0 0 0 0 1 0 0 and b1 = 0, b2 = 1. It reads p∗ = sup{−X11 − X22 : X11 = 0, 2X13 + X22 = 1, X  0} and its dual reads   y1 + 1  d∗ = inf y2 : y1 A1 + y2 A2 − C =  0  y2 13 0 y2 + 1 0   y2  00 .  0 Then any primal feasible solution satisfies X13 = 0, X22 = 1, so that the primal optimum value is equal to p∗ = −1, attained at the matrix X = E22 . Any dual feasible solution satisfies y2 = 0, so that the dual optimum value is equal to d∗ = 0, attained at y = 0. Hence there is a positive duality gap: d∗ − p∗ = 1. Note that in this example both the primal and dual programs are not strictly feasible. 2.2 Application to eigenvalue optimization Given a matrix C ∈ S n , let λmin (C) (resp., λmax (C)) denote its smallest (resp., largest) eigenvalue. One can express them (please check it) as follows: λmax (C) = xT Cx = max xT Cx, \{0} kxk2 x∈Sn−1 max n x∈R (2.9) where Sn−1 = {x ∈ Rn | kxkx = 1} denotes the unit sphere in Rn , and λmin (C) = xT Cx = min xT Cx. \{0} kxk2 x∈Sn−1 min n x∈R (2.10) (This is known as the Rayleigh principle.) As we now see the largest and smallest eigenvalues can be computed via a semidefinite program. Namely, consider the semidefinite program p∗ = sup {hC, Xi : Tr(X) = hI, Xi = 1, X  0} (2.11) and its dual program d∗ = inf {y : yI − C  0} . 
(2.12) y∈R In view of (2.9), we have that d∗ = λmax (C). The feasible region of (2.11) is bounded (all entries of any feasible X lie in [0, 1]) and contains a positive definite matrix (e.g., the matrix In /n), hence the infimum is attained in (2.12). Analogously, the program (2.12) is bounded from below (as y ≥ λmax (C) for any feasible y) and strictly feasible (pick y large enough), hence the infimum is attained in (2.12). Moreover there is no duality gap: p∗ = d∗ . Here we have applied Theorem 2.1.6. Thus we have shown: Lemma 2.2.1. The largest and smallest eigenvalues of a symmetric matrix C ∈ S n can be expressed with the following semidefinite programs: λmax (C) = max s.t. hC, Xi Tr(X) = 1, X  0 = min s.t. y yIn − C  0 λmin (C) = min s.t. hC, Xi = Tr(X) = 1, X  0 max s.t. y C − yIn  0 14 More generally, also the sum of the k largest eigenvalues of a symmetric matrix can be computed via a semidefinite program. For details see [4]. Theorem 2.2.2. (Fan’s theorem) Let C ∈ S n be a symmetric matrix with eigenvalues λ1 ≥ . . . ≥ λn . Then the sum of its k largest eigenvalues is given by any of the following two programs: λ1 + · · · + λk = maxn {hC, Xi : Tr(X) = k, In  X  0} , X∈S λ1 + · · · + λk = max Y ∈Rn×k  hC, Y Y T i : Y T Y = Ik . (2.13) (2.14) 2.3 Some facts about complexity 2.3.1 More differences between LP and SDP We have already seen above several differences between linear programming and semidefinite programming: there might be a duality gap between the primal and dual programs and the supremum/infimum might not be attained even though they are finite. We point out some more differences regarding rationality and bit size of optimal solutions. In the classical bit (Turing machine) model of computation an integer number p is encoded in binary notation, so that its bit size is log p + 1 (logarithm in base 2). Rational numbers are encoded as two integer numbers and the bit size of a vector or a matrix is the sum of the bit sizes of its entries. Consider a linear program max{cT x : Ax = b, x ≥ 0} (2.15) where the data A, b, c is rational valued. From the point of view of computability this is a natural assumption and it would be desirable to have an optimal solution which is also rational-valued. A fundamental result in linear programming asserts that this is indeed the case: If program (4.4) has an optimal solution, then it has a rational optimal solution x ∈ Qn , whose bit size is polynomially bounded in terms of the bit sizes of A, b, c (see e.g. [10]). On the other hand it is easy to construct instances of semidefinite programming where the data are rational valued, yet there is no rational optimal solution. For instance, the following program     1 x max x : 0 (2.16) x 2 √ attains its maximum at x = ± 2. Consider now the semidefinite program, with variables x1 , . . . , xn ,       1 2 1 xi−1 inf xn :  0 for i = 2, . . . , n .  0, xi−1 xi 2 x1 n (2.17) Then any feasible solution satisfies xn ≥ 22 . Hence the bit-size of an optimal solution is exponential in n, thus exponential in terms of the bit-size of the data. 15 2.3.2 Algorithms It is well known that linear programs (with rational data c, a1 , . . . , am , b) can be solved in polynomial time. Although the simplex method invented by Dantzig in 1948 performs very well in practice, it is still an open problem whether it gives a polynomial time algorithm for solving general LP’s. The first polynomialtime algorithm for solving LP’s was given by Khachiyan in 1979, based on the ellipsoid method. 
The value of this algorithm is however mainly theoretical as it is very slow in practice. Later the algorithm of Karmarkar in 1984 opened the way to polynomial time algorithms for LP based on interior-point algorithms, which also perform well in practice. What about algorithms for solving semidefinite programs? First of all, one cannot hope for a polynomial time algorithm permitting to solve any semidefinite program exactly. Indeed, even if the data of the SDP are assumed to be rational valued, the output might be an irrational number, thus not representable in the bit model of computation. Such an instance was mentioned above in (2.16). Therefore, one can hope at best for an algorithm permitting to compute in polynomial time an ǫ-approximate optimal solution. However, even if we set up to this less ambitious goal of just computing ǫ-approximate optimal solutions, we should make some assumptions on the semidefinite program, roughly speaking, in order to avoid having too large or too small optimal solutions. An instance of SDP whose output is exponentially large in the bit size of the data was mentioned above in (2.17). On the positive side, it is well known that one can test whether a given rational matrix is positive semidefinite in polynomial time — using Gaussian elimination. Hence one can test in polynomial time membership in the positive n semidefinite cone and, moreover, if X 6∈ S0 , then one can compute in polynon mial time a hyperplane strictly separating X from S0 (again as a byproduct of Gaussian elimination). See Section 2.3.3 below for details. This observation is at the base of the polynomial time algorithm for solving approximately semidefinite programs, based on the ellipsoid method. Roughly speaking, one can solve a semidefinite program in polynomial time up to any given precision. More precisely, we quote the following result describing the complexity of solving semidefinite programming with the ellipsoid method: Consider the semidefinite program p∗ = sup{hC, Xi : hAj , Xi = bj (j ∈ [m]), X  0}, where Aj , C, bj are integer valued. Denote by F its feasibility region. Suppose that an integer R is known a priori such that either F = ∅ or there exists X ∈ F with kXk ≤ R. Let ǫ > 0 be given.Then, either one can find a matrix X ∗ at distance at most ǫ from F and such that |hC, X ∗ i − p∗ | ≤ ǫ, or one can find a certificate that F does not contain a ball of radius ǫ. The complexity of this algorithm is polynomial in n, m, log R, log(1/ǫ), and the bit size of the input data. 16 Again, although polynomial time in theory, algorithms based on the ellipsoid method are not practical. Instead, interior-point algorithms are used to solve semidefinite programs in practice. We refer e.g. to [1], [2], [10], [6] for more information about algorithms for linear and semidefinite programming. 2.3.3 Gaussian elimination Let A = (aij ) ∈ S n be a rational matrix. Gaussian elimination permits to do the following tasks in polynomial time: (i) Either: find a rational matrix U ∈ Qn×n and a rational diagonal matrix D ∈ Qn×n such that A = U DU T , thus showing that A  0. (ii) Or: find a rational vector x ∈ Qn such that xT Ax < 0, thus showing that A is not positive semidefinite and giving a hyperplane separating A from n the cone S0 . Here is a sketch. We distinguish three cases. Case 1: a11 < 0. Then, (ii) applies, since eT 1 Ae1 < 0. Case 2: a11 = 0, but some entry a1j is not zero, say a12 6= 0. Then choose λ ∈ Q such that 2λa12 + a22 < 0, so that xT Ax < 0 for the vector x = (λ, 1, 0, . . . 
, 0) and thus (ii) applies again. Case 3: a11 > 0. Then we apply Gaussian elimination to the rows Rj and columns Cj of A for j = 2, . . . , n. Namely, for each j = 2, . . . , n, we replace Cj a1j C1 , and analogously we replace Rj by Rj − aa12 Rj , which amounts to by Cj − a11 11 making all entries of A equal to zero at the positions (1, j) and (j, 1) for j 6= 1. a1j For this, define the matrices Pj = In − a11 E1j and P = P2 · · · Pn . Then, P is rational and nonsingular, and P T AP has the block form:   1 0 T , P AP = 0 A′ where A′ ∈ S n−1 . Thus, A  0 ⇐⇒ P T AP  0 ⇐⇒ A′  0. Then, we proceed inductively with the matrix A′ ∈ S n−1 : • Either, we find W ∈ Q(n−1)×(n−1) and a diagonal matrix D′ ∈ Q(n−1)×(n−1) such that A′ = W T D′ W . Then, we obtain that A = U T DU , setting     1 0 1 0 U= . P −1 , D = 0 W 0 D′ • Or, we find y ∈ Qn−1 such that y T A′ y < 0. Then, we obtain that xT Ax < 0, after defining z = (0, y) and x = P z ∈ Qn . 17 2.4 Exercises 2.1. Consider a symmetric matrix A ∈ S n and a scalar t ∈ R. Show: There exists a scalar t0 ∈ R such that the matrix A + tIn is positive definite for all t ≥ t0 . Give some explicit value for t0 . 2.2. Consider the semidefinite program: p∗ = sup {X22 : X11 = 0, X12 = 1, X  0}. X∈S 2 (a) Write it in standard primal form and determine the value of p∗ . (b) Write the dual semidefinite program and determine its optimum value d∗ . (c) Is there a duality gap? Are the semidefinite programs strictly feasible? n 2.3. (a) Consider a matrix A ∈ S0 and L ∈ Rn×k such that A = LLT , a n vector b ∈ R and a scalar c ∈ R. Show: For any vector x ∈ Rn ,   Ik LT x T T x Ax ≤ b x + c ⇐⇒  0. x T L bT x + c (b) Show: For any vector x ∈ Rn , kxk2 ≤ 1 ⇐⇒  1 x xT In   0. (c) Given c ∈ Rn , consider the problem: min {cT x : kxk2 ≤ 1}. x∈Rn Reformulate it as a semidefinite program and write its dual semidefinite program. Is there a duality gap? 2.4. Let G = (V = [n], E) be a graph and let d = (dij ){i,j}∈E ∈ RE ≥0 be given nonnegative weights on the edges. Consider the following problem (P): Find vectors v1 , . . . , vn ∈ Rk (for some integer k ≥ 1) such that n X i=1 kvi k2 = 1, kvi − vj k2 = dij for all {i, j} ∈ E and for which the sum Pn i,j=1 viT vj is minimum. (a) Formulate problem (P) as an instance of semidefinite program. (b) If in problem (P) we would like to add the additional constraint that the vectors v1 , . . . , vn should belong to Rk for some fixed dimension k, which condition would you add to the semidefinite program? Hint: This could be a condition on the rank of the matrix variable. 18 BIBLIOGRAPHY [1] A. Ben-Tal, A. Nemirovski. Lectures on Modern Convex Optimization, SIAM, 2001. [2] M. Grötschel, L. Lovász and A. Schrijver. Geometric Algorithms and Combinatorial Optimization. Springer, 1988. [3] M. Laurent and A. Vallentin. Semidefinite Optimization. Lecture Notes. https://sites.google.com/site/mastermathsdp/lectures [4] M. Overton and R.S. Womersley. On the sum of the k largest eigenvalues of a symmetric matrix. SIAM Journal on Matrix Analysis and its Applications 13(1):41–45, 1992. [5] A. Schrijver, Theory of linear and integer programming, John Wiley & Sons, 1986. [6] L. Vandenberghe and S. Boyd. Semidefinite programming. SIAM Review 38:49–95, 1996. http://stanford.edu/~boyd/papers/sdp.html 19 CHAPTER 3 GRAPH COLORING AND INDEPENDENT SETS In this chapter we discuss how semidefinite programming can be used for constructing tractable bounds for two hard combinatorial problems: for finding maximum independent sets and minimum colorings in graphs. 
We introduce the graph parameter ϑ(G), known as the theta number of the graph G. This parameter was introduced by L. Lovász in his seminal paper [8]. We present several equivalent formulations and explain how ϑ(G) can be used to compute maximum stable sets and minimum colorings in perfect graphs in polynomial time, whereas these problems are NP-hard for general graphs. Here are some definitions that we use in this chapter. Let G = (V, E) be a graph; often we let V = [n] = {1, . . . , n}. Then, E denotes the set of pairs {i, j} of distinct nodes that are not adjacent in G. The graph G = (V, E) is called the complementary graph of G. G is self-complementary if G and G are isomorphic graphs. Given a subset S ⊆ V , G[S] denotes the subgraph induced by S: its node set is S and its edges are all pairs {i, j} ∈ E with i, j ∈ S. The graph Cn is the circuit (or cycle) of length n, with node set [n] and edges the pairs {i, i + 1} (for i ∈ [n], indices taken modulo n). For a set S ⊆ V , its characteristic vector is the vector χS ∈ {0, 1}V , whose i-th entry is 1 if i ∈ S and 0 otherwise. We let e = (1, . . . , 1)T denote the all-ones vector. 20 3.1 Preliminaries on graphs 3.1.1 Stability and chromatic numbers A subset S ⊆ V of nodes is said to be stable (or independent) if no two nodes of S are adjacent in G. Then the stability number of G is the parameter α(G) defined as the maximum cardinality of an independent set in G. A subset C ⊆ V of nodes is called a clique if every two distinct nodes in C are adjacent. The maximum cardinality of a clique in G is denoted ω(G), the clique number of G. Clearly, ω(G) = α(G). Computing the stability number of a graph is a hard problem: Given a graph G and an integer k, deciding whether α(G) ≥ k is an N P -complete problem. Given an integer k ≥ 1, a k-coloring of G is an assignment of numbers (view them as colors) from {1, · · · , k} to the nodes in such a way that two adjacent nodes receive distinct colors. In other words, this corresponds to a partition of V into k stable sets: V = S1 ∪ · · · ∪ Sk , where Si is the stable set consisting of all nodes that received the i-th color. The coloring (or chromatic) number is the smallest integer k for which G admits a k-coloring, it is denoted as χ(G). Again it is an N P -complete problem to decide whether a graph is k-colorable. In fact, it is N P -complete to decide whether a planar graph is 3-colorable. On the other hand, it is known that every planar graph is 4-colorable – this is the celebrated 4-color theorem. Moreover, observe that one can decide in polynomial time whether a graph is 2-colorable, since one can check in polynomial time whether a graph is bipartite. Figure 3.1: The Petersen graph has α(G) = 4, ω(G) = 2 and χ(G) = 3 Clearly, any two nodes in a clique of G must receive distinct colors. Therefore, for any graph, the following inequality holds: ω(G) ≤ χ(G). (3.1) This inequality is strict, for example, when G is an odd circuit, i.e., a circuit of odd length at least 5, or its complement. Indeed, for an odd circuit C2n+1 (n ≥ 2), ω(C2n+1 ) = 2 while χ(C2n+1 ) = 3. Moreover, for the complement 21 G = C2n+1 , ω(G) = n while χ(G) = n + 1. For an illustration see the cycle of length 7 and its complement in Figure 6.2. Figure 3.2: For C7 and its complement C7 : ω(C7 ) = 2, χ(C7 ) = 3, ω(C7 ) = α(C7 ) = 3, χ(C7 ) = 4 3.1.2 Perfect graphs It is intriguing to understand for which graphs equality ω(G) = χ(G) holds. 
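As a quick sanity check of the odd-circuit examples above, the following brute-force sketch (plain Python; the helper functions are ad hoc and written only for this illustration) computes the clique and chromatic numbers of C7 and of its complement, confirming the values ω(C7) = 2, χ(C7) = 3 and ω(C7-bar) = 3, χ(C7-bar) = 4 reported in Figure 3.2:

from itertools import combinations, product

def clique_number(adj):
    n = len(adj)
    return max(len(S) for r in range(1, n + 1)
               for S in combinations(range(n), r)
               if all(j in adj[i] for i, j in combinations(S, 2)))

def chromatic_number(adj):
    n = len(adj)
    for k in range(1, n + 1):
        for col in product(range(k), repeat=n):
            if all(col[i] != col[j] for i in range(n) for j in adj[i] if i < j):
                return k

n = 7
adj = [{(i - 1) % n, (i + 1) % n} for i in range(n)]      # the circuit C7
adj_c = [set(range(n)) - adj[i] - {i} for i in range(n)]  # its complement
print(clique_number(adj), chromatic_number(adj))          # 2 3
print(clique_number(adj_c), chromatic_number(adj_c))      # 3 4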
Note that any graph G with ω(G) < χ(G) can be embedded in a larger graph Ĝ with ω(Ĝ) = χ(Ĝ), simply by adding to G a clique of size χ(G) (disjoint from V ). This justifies the following definition, introduced by C. Berge in the early sixties, which makes the problem well posed. Definition 3.1.1. A graph G is said to be perfect if equality ω(H) = χ(H) holds for all induced subgraphs H of G (including H = G). Here are some classes of perfect graphs. For each of them the relation ω(G) = χ(G) gives a combinatorial min-max relation. 1. Bipartite graphs (the relation ω(G) = χ(G) = 2 is clear). 2. Line graphs of bipartite graphs (the min-max relation claims that the maximum cardinality of a matching is equal to the minimum cardinality of a vertex cover, which is König’s theorem). 3. Comparability graphs (the min-max relation corresponds to Diilworth’s theorem). It follows from the definition and the above observation about odd circuits that if G is a perfect graph then it does not contain an odd circuit of length at least 5 or its complement as an induced subgraph. Berge already conjectured that all perfect graphs arise in this way. Resolving this conjecture has haunted generations of graph theorists. It was finally settled in 2002 by Chudnovsky, Robertson, Seymour and Thomas who proved the following result, known as the strong perfect graph theorem: 22 Theorem 3.1.2. (The strong perfect graph theorem)[2] A graph G is perfect if and only if it does not contain an odd circuit of length at least 5 or its complement as an induced subgraph. This implies the following structural result about perfect graphs, known as the perfect graph theorem, already proved by Lovász in 1972. Theorem 3.1.3. (The perfect graph theorem)[7] If G is a perfect graph, then its complement G too is a perfect graph. We give a direct proof of Theorem 3.1.3 in the next section and we will mention later some other, more geometric, characterizations of perfect graphs (see, e.g., Theorem 3.2.5). 3.1.3 The perfect graph theorem Lovász [7] proved the following result, which implies the perfect graph theorem (Theorem 3.1.3). The proof given below follows the elegant linear-algebraic argument of Gasparian [4]. Theorem 3.1.4. A graph G is perfect if and only if |V (G′ )| ≤ ω(G′ )χ(G′ ) for each induced subgraph G′ of G. Proof. Necessity is easy: Assume that G is perfect and let G′ be an induced subgraph of G. Then χ(G′ ) = ω(G′ ) and thus V (G′ ) can be covered by ω(G′ ) stable sets, which implies that |V (G′ )| ≤ ω(G′ )α(G′ ). To show sufficiency, assume for a contradiction that there exists a graph G which satisfies the condition but is not perfect; choose such a graph with |V (G)| minimal. Then, n ≤ α(G)ω(G), ω(G) < χ(G) and ω(G′ ) = χ(G′ ) for each induced subgraph G′ 6= G of G. Set ω = ω(G) and α = α(G) for simplicity. Our first claim is: Claim 1: There exist αω + 1 stable sets S0 , . . . , Sαω such that each vertex of G is covered by exactly α of them. Proof of the claim: Let S0 be a stable set of size α in G. For each node v ∈ S0 , as G\v is perfect (by the minimality assumption on G), χ(G\v) = ω(G\v) ≤ ω. Hence, V \ {v} can be partitioned into ω stable sets. In this way we obtain a collection of αω stable sets which together with S0 satisfy the claim. Our next claim is: Claim 2: For each i = 0, 1, . . . , αω, there exists a clique Ki of size ω such that Ki ∩ Si = ∅ and Ki ∩ Sj 6= ∅ for j 6= i. Proof of the claim: For each i = 0, 1, . . . , αω, as G \ Si is perfect we have that χ(G\Si ) = ω(Si ) ≤ ω. 
This implies that χ(G\Si ) = ω since, if χ(G\Si ) ≤ ω −1, then one could color G with ω colors, contradicting our assumption on G. Hence there exists a clique Ki disjoint from Si and with |Ki | = ω. Moreover Ki meets all the other αω stable sets Sj for j 6= i. This follows from the fact that each 23 of the ω elements of Ki belongs to α stable sets among the Sj ’s (Claim 1) and these ωα sets are pairwise distinct. We can now conclude the proof. Define the matrices M, N ∈ Rn×(αω+1) , whose columns are χS0 , . . . , χSαω (the incidence vectors of the stable sets Si ), and the vectors χK0 , . . . , χαω+1 (the incidence vectors of the cliques Ki ), respectively. By Claim 2, we have that M T N = J − I (where J is the all-ones matrix and I the identity). As J − I is nonsingular, we obtain that that rank(M T N ) = rank(J − I) = αω + 1. On the other hand, rank(M T N ) ≤ rankN ≤ n. Thus we obtain that n ≥ αω + 1, contradicting our assumption on G. 3.2 Linear programming bounds 3.2.1 Fractional stable sets and colorings Let ST(G) denote the polytope in RV defined as the convex hull of the characteristic vectors of the stable sets of G: ST(G) = conv{χS : S ⊆ V, S is a stable set in G}, called the stable set polytope of G. Hence, computing α(G) is linear optimization over the stable set polytope: α(G) = max{eT x : x ∈ ST(G)}. We have now defined the stable set polytope by listing explicitly its extreme points. Alternatively, it can also be represented by its hyperplanes representation, i.e., in the form ST(G) = {x ∈ RV : Ax ≤ b} for some matrix A and some vector b. As computing the stability number is a hard problem one cannot hope to find the full linear inequality description of the stable set polytope (i.e., the explicit A and b). However some partial information is known: several classes of valid inequalities for the stable set polytope are known. For instance, if C is a clique of G, then the clique inequality X xi ≤ 1 (3.2) x(C) = i∈C is valid for ST(G): any stable set can contain at most one vertex from the clique C. The clique inequalities define the polytope  QST(G) = x ∈ RV : x ≥ 0, x(C) ≤ 1 ∀C clique of G (3.3) and maximizing the linear function eT x over it gives the parameter α∗ (G) = max{eT x : x ∈ QST(G)}, 24 (3.4) known as the fractional stability number of G. Clearly, QST(G) is a relaxation of the stable set polytope: ST(G) ⊆ QST(G). (3.5) Analogously, χ∗ (G) denotes the fractional coloring number of G, defined by the following linear program: ) ( X X S ∗ λS χ = e, λS ≥ 0 ∀S stable in G . λS : χ (G) = min S stable in G S stable in G (3.6) If we add the constraint that all λS should be integral we obtain the coloring number of G. Thus, χ∗ (G) ≤ χ(G). In fact the fractional stability number of G coincides with the fractional coloring number of its complement: α∗ (G) = χ∗ (G), and it is nested between α(G) and χ(G). Lemma 3.2.1. For any graph G, we have α(G) ≤ α∗ (G) = χ∗ (G) ≤ χ(G), (3.7) where χ∗ (G) is the optimum value of the linear program:     X X yC χC = e, yC ≥ 0 ∀C clique of G . min yC :   C clique of G (3.8) C clique of G Proof. The inequality α(G) ≤ α∗ (G) in (3.7) follows from the inclusion (3.5) and the inequality χ∗ (G) ≤ χ(G) was observed above. We now show that α∗ (G) = χ∗ (G). For this, we first observe that in the linear program (3.4) the condition x ≥ 0 can be removed without changing the optimal value; that is, α∗ (G) = max{eT x : x(C) ≤ 1 ∀C clique of G} (3.9) (check it). 
Now, it suffices to observe that the dual LP of the above linear program (3.9) coincides with the linear program (3.8). For instance, for an odd circuit C2n+1 (n ≥ 2), α∗ (C2n+1 ) = lies strictly between α(C2n+1 ) = n and χ(C2n+1 ) = n + 1. 2n+1 2 (check it) When G is a perfect graph, equality holds throughout in relation (3.7). As we see in the next section, there is a natural extension of this result to weighted graphs, which permits to show the equality ST(G) = QST(G) when G is a perfect graph. Moreover, it turns out that this geometric property characterizes perfect graphs. 3.2.2 Polyhedral characterization of perfect graphs For any graph G, the factional stable set polytope is a linear relaxation of the stable set polytope: ST(G) ⊆ QST(G). Here we show a geometric characterization of perfect graphs: G is perfect if and only if both polytopes coincide: ST(G) = QST(G). 25 The following operation of duplicating a node will be useful. Let G = (V, E) be a graph and let v ∈ V . Add to G a new node, say v ′ , which is adjacent to v and to all neighbours of v in G. In this way we obtain a new graph H, which we say is obtained from G by duplicating v. Repeated duplicating is called replicating. Lemma 3.2.2. Let H arise from G by duplicating a node. If G is perfect then H too is perfect. Proof. First we show that α(H) = χ(H) if H arises from G by duplicating node v. Indeed, by construction, α(H) = α(G), which is equal to χ(G) since G is perfect. Now, if C1 , . . . , Ct are cliques in G that cover V with (say) v ∈ C1 , then C1 ∪{v ′ }, . . . , Ct are cliques in H covering V (H). This shows that χ(G) = χ(H), which implies that α(H) = χ(H). From this we can conclude that, if H arises from G by duplicating a node v, then α(H ′ ) = χ(H ′ ) for any induced subgraph H ′ of H, using induction on the number of nodes of G. Indeed, either H ′ is an induced subgraph of G (if H ′ does not contain both v and v ′ ), or H ′ is obtained by duplicating v in an induced subgraph of G; in both cases we have that α(H ′ ) = χ(H ′ ). Hence, if H arises by duplicating a node in a perfect graph G, then H is perfect which, by Theorem 3.1.3, implies that H is perfect. Given node weights w ∈ RV+ , we define the following weighted analogues of the (fractional) stability and chromatic numbers: α(G, w) = max wT x, x∈ST(G) α∗ (G, w) = χ∗ (G, w) = min y χ(G, w) = min y       X yC : C clique of G X C clique of G yC : max x∈QST(G) X C clique of G X C clique of G wT x, yC χC = w, yC ≥ 0 ∀C clique of G    yC χC = w, yC ∈ Z, yC ≥ 0 ∀C clique of G When w is the all-ones weight function, we find again α(G), α∗ (G), χ∗ (G) and χ(G), respectively. The following analogue of (3.7) holds for arbitrary node weights: (3.10) α(G, w) ≤ α∗ (G, w) = χ∗ (G, w) ≤ χ(G, w). Lemma 3.2.3. Let G be a perfect graph and let w ∈ ZV≥0 be nonnegative integer node weights. Then, α(G, w) = χ(G, w). Proof. Let H denote the graph obtained from G by duplicating node i wi times if wi ≥ 1 and deleting node i if wi = 0. Then, by construction, α(G, w) = ω(H), which is equal to χ(H) since H is perfect (by Lemma 3.2.2). Say, S̃1 , . . . , S̃t are 26 ,    . t = χ(H) stable sets in H partitioning V (H). Each stable set S̃k corresponds to a stable set Sk in G (since S̃k contains at most one of the wi copies of each node i of G). Now, these stable sets S1 , . . . , St have the property that each node i of G belongs to exactly wi of them, which shows that χ(G, w) ≤ t = χ(H). This implies that χ(G, w) ≤ χ(H) = α(G, w), giving equality χ(G, w) = α(G, w). 
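A small numerical check of Lemma 3.2.3 and of the chain (3.10) can be done by brute force. The sketch below computes α(G, w) by enumeration and the fractional cover number χ*(G, w) from the linear program defining it above; it assumes SciPy's linprog is available, and the graph (a path on 4 nodes, which is bipartite and hence perfect) and the node weights are made up for illustration. For a perfect graph all quantities in (3.10) coincide, so the two printed values should agree:

import numpy as np
from itertools import combinations
from scipy.optimize import linprog

def cliques(adj):
    # all nonempty cliques, by enumeration (fine for tiny graphs)
    n = len(adj)
    return [S for r in range(1, n + 1) for S in combinations(range(n), r)
            if all(j in adj[i] for i, j in combinations(S, 2))]

def alpha_w(adj, w):
    # alpha(G, w): maximum weight of a stable set, by enumeration
    n = len(adj)
    return max(sum(w[i] for i in S)
               for r in range(n + 1) for S in combinations(range(n), r)
               if all(j not in adj[i] for i, j in combinations(S, 2)))

def chi_star_w(adj, w):
    # chi*(G, w): fractional weighted clique cover, as a linear program
    C = cliques(adj)
    A_eq = np.array([[1.0 if i in S else 0.0 for S in C] for i in range(len(adj))])
    res = linprog(c=np.ones(len(C)), A_eq=A_eq, b_eq=np.array(w, dtype=float),
                  bounds=[(0, None)] * len(C), method="highs")
    return res.fun

adj = [{1}, {0, 2}, {1, 3}, {2}]   # the path on 4 nodes
w = [2, 1, 3, 1]                   # made-up integer node weights
print(alpha_w(adj, w), chi_star_w(adj, w))   # both equal 5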
We will also use the following geometric property of down-monotone polytopes. A polytope P ⊆ Rn≥0 is said to be down-monotone if x ∈ P and 0 ≤ y ≤ x (coordinate-wise) implies y ∈ P . Lemma 3.2.4. Let P, Q ⊆ Rn be polytopes such that P ⊆ Q. (i) P = Q if and only if the following equality holds for all weights w ∈ Rn : max wT x = max wT x. x∈P x∈Q (3.11) (ii) Assume that P ⊆ Q ⊆ Rn≥0 are down-monotone. Then P = Q if and only if (3.11) holds for all nonnegative weights w ∈ Rn≥0 . Moreover, in (i) and (ii) it suffices to show that (3.11) holds for integer weights w. Proof. (i) The ‘only if’ part is clear. The ‘if part’ follows using the ‘hyperplane separation’ theorem: Assume that P ⊂ Q and that there exists z ∈ Q \ P . Then there exists a hyperplane separating z from P , i.e., there exists a nonzero vector w ∈ Rn and a scalar w0 ∈ R such that wT z > w0 and wT x ≤ w0 for all x ∈ P . These two facts contradict the condition (3.11). (ii) The ‘only if’ part is clear. For the ‘if part’, it suffices to show that the equality (3.11) holds for all weights w if it holds for all nonnegative weights w′ . This follows from the following claim (applied to both P and Q). Claim: Let P ⊆ Rn≥0 be a down-monotone polytope, let w ∈ Rn and define the nonnegative vector w′ ∈ Rn≥0 by wi′ = max{wi , 0} for i ∈ [n]. Then, maxx∈P wT x = maxx∈P (w′ )T x. Proof of the claim: Suppose x ∈ P maximizes wT x over P ; we claim that xi = 0 at all positions i for which wi < 0. Indeed, if xi > 0 and wi < 0 then, by setting yi = 0 and yj = xj for j 6= i, one obtains another point y ∈ P (since 0 ≤ y ≤ x and P is down-monotone) with wT y > wT x. Therefore, wT x = (w′ )T x and thus x maximizes w′ over P . The last part of the lemma follows using a continuity argument (if (3.11) holds for all integer weights w, it holds for all rational weights (by scaling) and thus for all real weights (taking limits)). We can now show the following geometric characterization of perfect graphs, due to Chvátal [3]. Theorem 3.2.5. [3] A graph G is perfect if and only if ST(G) = QST(G). 27 Proof. First assume that G is perfect, we show that ST(G) = QST(G). As ST(G) and QST(G) are down-monotone in RV≥0 , we can apply Lemma 3.2.4. Hence, it suffices to show that, for any w ∈ ZV≥0 , α(G, w) = maxx∈ST(G) wT x is equal to α∗ (G, w) = maxx∈QST(G) wT x, which follows from Lemma 3.2.3 (applied to G). Conversely, assume that ST(G) = QST(G) and that G is not perfect. Pick a minimal subset U ⊆ V for which the subgraph G′ of G induced by U satisfies α(G′ ) < χ(G′ ). Setting w = χU , we have that α(G′ ) = α(G, w) which, by assumption, is equal to maxx∈QST(G) wT x = α∗ (G, w). Consider the dual of the linear program defining α∗ (G, w) with an optimal solution y = (yC ). Pick a clique C of G for which yC > 0. Using complementary slackness, we deduce that x(C) = 1 for any optimal solution x ∈ QST(G) and thus |C ∩ S| = 1 for any maximum cardinality stable set S ⊆ U . Let G′′ denote the subgraph of G induced by U \ C. Then, α(G′′ ) ≤ α(G′ ) − 1 < χ(G′ ) − 1 ≤ χ(G′′ ), which contradicts the minimality assumption made on U . When G is a perfect graph, an explicit linear inequality description is known for its stable set polytope, given by the clique inequalities. However, it is not clear how to use this information in order to give an efficient algorithm for optimizing over the stable set polytope of a perfect graph. 
As we see later in Section 3.5 there is yet another description of ST(G) – in terms of semidefinite programming, using the theta body TH(G) – that will allow to give an efficient algorithm. 3.3 Semidefinite programming bounds 3.3.1 The theta number Definition 3.3.1. Given a graph G = (V, E), consider the following semidefinite program max {hJ, Xi : Tr(X) = 1, Xij = 0 ∀{i, j} ∈ E, X  0} . X∈S n (3.12) Its optimal value is denoted as ϑ(G), and called the theta number of G. This parameter was introduced by Lovász [8]. He proved the following simple, but crucial result – called the Sandwich Theorem by Knuth [6] – which shows that ϑ(G) provides a bound for both the stability number of G and the chromatic number of the complementary graph G. Theorem 3.3.2. (Lovász’ sandwich theorem) For any graph G, we have that α(G) ≤ ϑ(G) ≤ χ(G). Proof. Given a stable set S of cardinality |S| = α(G), define the matrix X= 1 S S T χ (χ ) ∈ S n . |S| 28 Then X is feasible for (3.12) with objective value hJ, Xi = |S| (check it). This shows the inequality α(G) ≤ ϑ(G). Now, consider a matrix X feasible for the program (3.12) and a partition of V into k cliques: V = C1 ∪ · · · ∪ Ck . Our goal is now to show that hJ, Xi ≤ k, Pk Ci which will imply ϑ(G) ≤ χ(G). For this, using the relation e = i=1 χ , observe that Y := k X i=1 Moreover, kχCi − e * X,  kχCi − e k X Ci T Ci T χ (χ ) i=1 = k2 k X i=1 + χCi (χCi )T − kJ. = Tr(X). P Indeed the matrix i χCi (χCi )T has all its diagonal entries equal to 1 and it has zero off-diagonal entries outside the edge set of G, while X has zero offdiagonal entries on the edge set of G. As X, Y  0, we obtain 0 ≤ hX, Y i = k 2 Tr(X) − khJ, Xi and thus hJ, Xi ≤ k Tr(X) = k. An alternative argument for the inequality ϑ(G) ≤ χ(G), showing an even more transparent link to coverings by cliques, will be given in the paragraph after the proof of Lemma 3.4.2. 3.3.2 Computing maximum stable sets in perfect graphs Assume that G is a graph satisfying α(G) = χ(G). Then, as a direct application of Theorem 3.3.2, α(G) = χ(G) = ϑ(G) can be computed by solving the semidefinite program (3.12), it suffices to solve this semidefinite program with precision ǫ < 1/2 as one can then find α(G) by rounding the optimal value to the nearest integer. In particular, combining with the perfect graph theorem (Theorem 3.1.3): Theorem 3.3.3. If G is a perfect graph then α(G) = χ(G) = ϑ(G) and ω(G) = χ(G) = ϑ(G). Hence one can compute the stability number and the chromatic number in polynomial time for perfect graphs. Moreover, one can also find a maximum stable set and a minimum coloring in polynomial time for perfect graphs. We now indicate how to construct a maximum stable set – we deal with minimum graph colorings in the next section. Let G = (V, E) be a perfect graph. Order the nodes of G as v1 , · · · , vn . Then we construct a sequence of induced subgraphs G0 , G1 , · · · , Gn of G. Hence each Gi is perfect, also after removing a node, so that we can compute in polynomial time the stability number of such graphs. The construction goes as follows: Set G0 = G. For each i = 1, · · · , n do the following: 29 1. Compute α(Gi−1 \vi ). 2. If α(Gi−1 \vi ) = α(G), then set Gi = Gi−1 \vi . 3. Otherwise, set Gi = Gi−1 . By construction, α(Gi ) = α(G) for all i. In particular, α(Gn ) = α(G). Moreover, the node set of the final graph Gn is a stable set and, therefore, it is a maximum stable set of G. Indeed, if the node set of Gn is not stable then it contains a node vi for which α(Gn \vi ) = α(Gn ). 
But then, as Gn is an induced subgraph of Gi−1 , one would have that α(Gn \vi ) ≤ α(Gi−1 \vi ) and thus α(Gi−1 \vi ) = α(G), so that node vi would have been removed at Step 2. Hence, the above algorithm permits to construct a maximum stable set in a perfect graph G in polynomial time – namely by solving n + 1 semidefinite programs for computing α(G) and α(Gi−1 \vi ) for i = 1, · · · , n. More generally, given integer node weights w ∈ ZV≥0 , the above algorithm can also be used to find a stable set S of maximum weight w(S). For this, construct the new graph G′ in the following way: Duplicate each node i ∈ V wi times, i.e., replace node i ∈ V by a set Wi of wi nodes pairwise non-adjacent, and make two nodes x ∈ Wi and y ∈ Wj adjacent if i and j are adjacent in G. By Lemma 3.2.2, the graph G′ is perfect. Moreover, α(G′ ) is equal to the maximum weight w(S) of a stable set S in G. From this it follows that, if the weights wi are bounded by a polynomial in n, then one can compute α(G, w) in polynomial time. (More generally, one can compute α(G, w) in polynomial time, e.g. by optimizing the linear function wT x over the theta body TH(G), introduced in Section 3.5 below.) 3.3.3 Minimum colorings of perfect graphs We now describe an algorithm for computing a minimum coloring of a perfect graph G in polynomial time. This will be reduced to several computations of the theta number which we will use for computing the clique number of some induced subgraphs of G. Let G = (V, E) be a perfect graph. Call a clique of G maximum if it has maximum cardinality ω(G). The crucial observation is that it suffices to find a stable set S in G which meets all maximum cliques. First of all, such a stable set S exists: in a ω(G)-coloring, any color class S must meet all maximum cliques, since ω(G \ S) = χ(G \ S) = ω(G) − 1. Now, if we have found such a stable set S, then one can recursively color G\S with ω(G\S) = ω(G) − 1 colors (in polynomial time), and thus one obtains a coloring of G with ω(G) colors. The algorithm goes as follows: For t ≥ 1, we grow a list L of t maximum cliques C1 , · · · , Ct . Suppose C1 , · · · , Ct have been found. Then do the following: 1. We find a stable set S meeting each of the cliques C1 , · · · , Ct (see below). 30 2. Compute ω(G\S). 3. If ω(G\S) < ω(G) then S meets all maximum cliques and we are done. 4. Otherwise, compute a maximum clique Ct+1 in G\S, which is thus a new maximum clique of G, and we add it to the list L. Pt The first step can be done as follows: Set w = i=1 χCi ∈ ZV≥0 . As G is perfect, we know that α(G, w) = χ(G, w), which in turn is equal to t. (Indeed, χ(G, w) ≤ t follows from the definition of w. Moreover, if y = (yCP ) is feasible T for the program defining χ(G, w) then, on the one hand, w e = C yC |C| ≤ P T y ω(G) and, on the other hand, w e = tω(G), thus implying t ≤ χ(G, w).) C C Now we compute a stable set S having maximum possible weight w(S). Hence, w(S) = t and thus S meets each of the cliques C1 , · · · , Ct . The above algorithm has polynomial running time, since the number of iterations is bounded by |V |. To see this, define the affine space Lt ⊆ RV defined by the equations x(C1 ) = 1, · · · , x(Ct ) = 1 corresponding to the cliques in the current list L. Then, Lt contains strictly Lt+1 , since χS ∈ Lt \ Lt+1 for the set S constructed in the first step, and thus the dimension decreases at least by 1 at each iteration. 
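For illustration, here is a minimal computational sketch of the procedure of Section 3.3.2 (this is a sketch only; it assumes the Python modeling package cvxpy with the SCS solver, and the function names theta, alpha and max_stable_set are ours, chosen for this example). The theta number is computed from the primal program (3.12) and, for a perfect graph, rounded to the nearest integer to obtain α(G); as discussed above, solver precision better than 1/2 suffices for the rounding to be valid.

    import cvxpy as cp

    def theta(n, edges):
        # Primal program (3.12): max <J, X> s.t. Tr(X) = 1, X_ij = 0 on edges, X psd.
        X = cp.Variable((n, n), symmetric=True)
        constraints = [X >> 0, cp.trace(X) == 1]
        constraints += [X[i, j] == 0 for (i, j) in edges]
        prob = cp.Problem(cp.Maximize(cp.sum(X)), constraints)
        prob.solve(solver=cp.SCS)
        return prob.value

    def alpha(n, edges):
        # For a perfect graph alpha(G) = theta(G); round the SDP value to the nearest integer.
        return int(round(theta(n, edges)))

    def max_stable_set(n, edges):
        # Node-deletion procedure of Section 3.3.2: drop v whenever alpha stays unchanged.
        target = alpha(n, edges)
        nodes = list(range(n))
        for v in range(n):
            rest = [u for u in nodes if u != v]
            pos = {u: k for k, u in enumerate(rest)}
            sub = [(pos[i], pos[j]) for (i, j) in edges if i in pos and j in pos]
            if alpha(len(rest), sub) == target:
                nodes = rest
        return nodes  # the surviving nodes form a maximum stable set

    # Example: the 6-cycle (bipartite, hence perfect); a stable set of size 3 is returned.
    print(max_stable_set(6, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]))

Each call to alpha solves one semidefinite program, so the sketch follows the n + 1 SDP computations described above.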
3.4 Other formulations of the theta number 3.4.1 Dual formulation We now give several equivalent formulations for the theta number obtained by applying semidefinite programming duality and some further elementary manipulations. Lemma 3.4.1. The theta number can be expressed by any of the following programs: ϑ(G) = ϑ(G) = ϑ(G) = min t∈R,A∈S n {t : tI + A − J  0, Aij = 0 (i = j or {i, j} ∈ E)}, (3.13) min t∈R,B∈S n min t∈R,C∈S n  t : tI − B  0, Bij = 1 (i = j or {i, j} ∈ E) , (3.14) {t : C − J  0, Cii = t (i ∈ V ), Cij = 0 ({i, j} ∈ E)}, (3.15)  ϑ(G) = minn λmax (B) : Bij = 1 (i = j or {i, j} ∈ E) . B∈S (3.16) Proof. First we build the dual of the semidefinite program (3.12), which reads:     X t : tI + min yij Eij − J  0 . (3.17)  t∈R,y∈RE  {i,j}∈E 31 As both programs (3.12) and (3.17) are strictly feasible, there is no duality gap: the optimal value of (3.17) is equal to ϑ(G), and the optimal values are attained in both programs P– here we have applied the duality theorem (Theorem 2.1.6). Setting A = {i,j}∈E yij Eij , B = J − A and C = tI + A in (3.17), it follows that the program (3.17) is equivalent to each of the programs (3.13), (3.14) and (3.15). Finally the formulation (3.16) follows directly from (3.14) after recalling that λmax (B) is the smallest scalar t for which tI − B  0. 3.4.2 Two more (lifted) formulations We give here two more formulations for the theta number. They rely on semidefinite programs involving symmetric matrices of order 1 + n which we will index by the set {0} ∪ V , where 0 is an additional index that does not belong to V . Lemma 3.4.2. The theta number ϑ(G) is equal to the optimal value of the following semidefinite program: min {Z00 : Z  0, Z0i = Zii = 1 (i ∈ V ), Zij = 0 ({i, j} ∈ E)}. Z∈S n+1 (3.18) Proof. We show that the two semidefinite programs in (3.13) and (3.18) are equivalent. For this, observe that   t eT tI + A − J  0 ⇐⇒ Z :=  0, e I + 1t A which follows by taking the Schur complement of the upper left corner t in the block matrix Z. Hence, if (t, A) is feasible for (3.13), then Z is feasible for (3.18) with same objective value: Z00 = t. The construction can be reversed: if Z is feasible for (3.18), then one can construct (t, A) feasible for (3.13) with t = Z00 . Hence both programs are equivalent. From the formulation (3.18), the link of the theta number to the (fractional) chromatic number is even more transparent. Lemma 3.4.3. For any graph G, we have that ϑ(G) ≤ χ∗ (G). Proof. Let y = (yC ) be feasible for the linear program (3.8) defining χ∗ (G). For each clique C define the vector zC = (1 χC ) ∈ R1+n , obtained by appending an P entry equal to 1T to the characteristic vector of C. Define the matrix Z = C clique of G yC zC zC . One can verify that Z is feasible for the program (3.18) P with objective value Z00 = C yC (check it). This shows ϑ(G) ≤ χ∗ (G). Applying duality to the semidefinite program (3.18), we obtain1 the following formulation for ϑ(G). 1 Of course there is more than one road leading to Rome: one can also show directly the equivalence of the two programs (3.12) and (3.19). 32 Lemma 3.4.4. The theta number ϑ(G) is equal to the optimal value of the following semidefinite program: ) ( X Yii : Y  0, Y00 = 1, Y0i = Yii (i ∈ V ), Yij = 0 ({i, j} ∈ E) . max n+1 Y ∈S i∈V (3.19) Proof. First we write the program (3.18) in standard form, using the elementary matrices Eij (with entries 1 at positions (i, j) and (j, i) and 0 elsewhere): inf{hE00 , Zi : hEii , Zi = 1, hE0i , Zi = 2 (i ∈ V ), hEij , Zi = 0 ({i, j} ∈ E), Z  0}. 
Next we write the dual of this sdp:   X  X X yi Eii + zi E0i + sup yi + 2zi : Y = E00 − uij Eij  0 .   i∈V i∈V {i,j}∈E Observe now that the matrix Y ∈ S n+1 occurring in this program can be equivalently characterized by the conditions: Y00  P = 1, Yij = 0 if {i, j} P∈ E and Y  0. Moreover the objective function reads: i∈V yi + 2zi = − i∈V Yii + 2Y0i . Therefore the dual can be equivalently reformulated as ! ) ( X Yii + 2Y0i : Y  0, Y00 = 1, Yij = 0 ({i, j} ∈ E) . (3.20) max − i∈V As (3.18) is strictly feasible (check it) there is no duality gap, the optimal value of (3.20) is attained and it is equal to ϑ(G). Let Y be an optimal solution of (3.20). We claim that Y0i + Yii = 0 for all i ∈ V . Indeed, assume that Y0i + Yii 6= 0 for some i ∈ V . Then, Yii 6= 0. Let us multiply the i-th column and the i-th row of the matrix Y by the scalar − YY0i . In ii this way we obtain a new matrix Y ′ which is still feasible for (3.20), but now  2 Y2 with a better objective value: Indeed, by construction, Yii′ = Yii − YY0i = Y0i ii ii   2 Y0i Y0i ′ ′ and Y0i = Y0i − Yii = − Yii −Yii . Moreover, the i-th term in the new objective value is Y2 −(Yii′ + 2Y0i′ ) = 0i > −(Yii + 2Y0i ). Yii This contradicts optimality of Y and thus we have shown that Y0i = −Yii for all i ∈ V . Therefore, we can add w.l.o.g. the conditionP Y0i = −Yii (i ∈ V ) to (3.20), so that its objective function can be replaced by i∈V Yii . Finally, to get the program (3.19), it suffices to observe that one can change the signs on the first row and column of Y (indexed by the index 0). In this way we obtain a matrix Ỹ such that Ỹ0i = −Y0i for all i and Ỹij = Yij at all other positions. Thus Ỹ now satisfies the conditions Yii = Y0i for i ∈ V and it is an optimal solution of (3.19). 33 3.5 The theta body TH(G) It is convenient to introduce the following set of matrices X ∈ S n+1 , where columns and rows are indexed by the set {0} ∪ V : MG = {Y ∈ S n+1 : Y00 = 1, Y0i = Yii (i ∈ V ), Yij = 0 ({i, j} ∈ E), Y  0}, (3.21) which is thus the feasible region of the semidefinite program (3.19). Now let TH(G) denote the convex set obtained by projecting the set MG onto the subspace RV of the diagonal entries: TH(G) = {x ∈ RV : ∃Y ∈ MG such that xi = Yii ∀i ∈ V }, (3.22) called the theta body of G. It turns out that TH(G) is nested between ST(G) and QST(G). Lemma 3.5.1. For any graph G, we have that ST(G) ⊆ TH(G) ⊆ QST(G). Proof. The inclusion ST(G) ⊆ TH(G) follows from the fact that the characteristic vector of any stable set S lies in TH(G). To see this, define the vector y = (1 χS ) ∈ Rn+1 obtained by adding an entry equal to 1 to the characteristic vector of S, and define the matrix Y = yy T ∈ S n+1 . Then Y ∈ MG and χS = (Yii )i∈V , which shows that χS ∈ TH(G). We now show the inclusion TH(G) ⊆ QST(G). For this take x ∈ TH(G) and let Y ∈ MG such that x = (Yii )i∈V . Then x ≥ 0 (as the diagonal entries of a psd matrix are nonnegative). Moreover, for any clique C of G, we have that x(C) ≤ 1 (cf. Exercise 1.1). In view of Lemma 3.4.4, maximizing the all-ones objective function over TH(G) gives the theta number: ϑ(G) = max {eT x : x ∈ TH(G)}. x∈RV As maximizing eT x over QST(G) gives the LP bound α∗ (G), Lemma 3.5.1 implies directly that the SDP bound ϑ(G) dominates the LP bound α∗ (G): Corollary 3.5.2. For any graph G, we have that α(G) ≤ ϑ(G) ≤ α∗ (G). Combining the inclusion from Lemma 3.5.1 with Theorem 3.2.5, we deduce that TH(G) = ST(G) = QST(G) for perfect graphs. It turns out that these equalities characterize perfect graphs. 
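Before turning to that characterization, here is a small numerical illustration of the lifted formulation (3.19), i.e., of maximizing the all-ones objective over the theta body TH(G). It is a sketch only, assuming cvxpy with the SCS solver; the index 0 plays the role of the additional row and column. For the 5-cycle the returned value should be close to √5 ≈ 2.236, in accordance with the value of ϑ(C5) computed in Section 3.6.

    import cvxpy as cp

    n = 5
    edges = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 1)]   # C5 on nodes 1,...,5; index 0 is the extra one
    Y = cp.Variable((n + 1, n + 1), symmetric=True)
    constraints = [Y >> 0, Y[0, 0] == 1]
    constraints += [Y[0, i] == Y[i, i] for i in range(1, n + 1)]
    constraints += [Y[i, j] == 0 for (i, j) in edges]
    objective = cp.Maximize(sum(Y[i, i] for i in range(1, n + 1)))
    prob = cp.Problem(objective, constraints)
    prob.solve(solver=cp.SCS)
    print(prob.value)  # approximately sqrt(5) = 2.236..., the value of theta(C5)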
Theorem 3.5.3. (see [2]) For any graph G the following assertions are equivalent. 1. G is perfect. 2. TH(G) = ST(G) 3. TH(G) = QST(G). 34 4. TH(G) is a polytope. We also mention the following beautiful relationship between the theta bodies of a graph G and of its complementary graph G: Theorem 3.5.4. For any graph G, TH(G) = {x ∈ RV≥0 : xT z ≤ 1 ∀z ∈ TH(G)}. In other words, we know an explicit linear inequality description of TH(G); moreover, the normal vectors to the supporting hyperplanes of TH(G) are precisely the elements of TH(G). One inclusion is easy: Lemma 3.5.5. If x ∈ TH(G) and z ∈ TH(G) then xT z ≤ 1. Proof. Let Y ∈ MG and Z ∈ MG such that x = (Yii ) and z = (Zii ). Let Z ′ be obtained from Z by changing signs in its first row and column (indexed by 0). Then hY, Z ′ i ≥ 0 as Y, Z ′  0. Moreover, hY, Z ′ i = 1 − xT z (check it), thus giving xT z ≤ 1. 3.6 The theta number for vertex-transitive graphs First we mention an inequality relating the theta numbers of a graph and its complement. (You will show it in Exercise 6.1.) Proposition 3.6.1. For any graph G = (V, E), we have that ϑ(G)ϑ(G) ≥ |V |. We now show that equality ϑ(G)ϑ(G) = |V | holds for certain symmetric graphs, namely for vertex-transitive graphs. In order to show this, one exploits in a crucial manner the symmetry of G, which permits to show that the semidefinite program defining the theta number has an optimal solution with a special (symmetric) structure. We need to introduce some definitions. Let G = (V, E) be a graph. A permutation σ of the node set V is called an automorphism of G if it preserves edges, i.e., {i, j} ∈ E implies {σ(i), σ(j)} ∈ E. Then the set Aut(G) of automorphisms of G is a group. The graph G is said to be vertex-transitive if for any two nodes i, j ∈ V there exists an automorphism σ ∈ Aut(G) mapping i to j: σ(i) = j. The group of permutations of V acts on symmetric matrices X indexed by V . Namely, if σ is a permutation of V and Pσ is the corresponding permutation matrix (with Pσ (i, j) = 1 if j = σ(i) and 0 otherwise), then one can build the new symmetric matrix σ(X) := Pσ XPσT = (Xσ(i),σ(j) )i,j∈V . If σ is an automorphism of G, then it preserves the feasible region of the semidefinite program (3.12) defining the theta number ϑ(G). This is an easy, but very useful fact, which is based on the convexity of the feasible region. 35 Lemma 3.6.2. If X is feasible for the program (3.12) and σ is an automorphism of G, then σ(X) is again feasible for (3.12), moreover with the same objective value as X. Proof. Directly from the fact that hJ, σ(X)i = hJ, Xi, Tr(σ(X)) = Tr(X) and σ(X)ij = Xσ(i)σ(j) = 0 if {i, j} ∈ E (since σ is an automorphism of G). Lemma 3.6.3. The program (3.12) has an optimal solution X ∗ which is invariant under action of the automorphism group of G, i.e., satisfies σ(X ∗ ) = X ∗ for all σ ∈ Aut(G). Proof. Let X be an optimal solution of (3.12). By Lemma 3.6.2, σ(X) is again an optimal solution for each σ ∈ Aut(G). Define the matrix X 1 σ(X), X∗ = |Aut(G)| σ∈Aut(G) obtained by averaging over all matrices σ(X) for σ ∈ Aut(G). As the set of optimal solutions of (3.12) is convex, X ∗ is still an optimal solution of (3.12). Moreover, by construction, X ∗ is invariant under action of Aut(G). Corollary 3.6.4. If G is a vertex-transitive graph then the program (3.12) has an optimal solution X ∗ satisfying Xii∗ = 1/n for all i ∈ V and X ∗ e = ϑ(G) n e. Proof. By Lemma 3.6.3, there is an optimal solution X ∗ which is invariant under action of Aut(G). 
As G is vertex-transitive, all diagonal entries of X ∗ are equal: Indeed, let i, j ∈ V and σ ∈ Aut(G) such that σ(i) = j. Then, ∗ Xjj = Xσ(i)σ(i) = Xii . As Tr(X ∗ ) = 1 we must = 1/n for all i. AnalP have∗ Xii P ∗ for all i, j, = k∈V Xjk ogously, the invariance of X ∗ implies that k∈V Xik ∗ i.e., X e = λe for some scalar λ. Combining with the condition hJ, X ∗ i = ϑ(G) we obtain that λ = ϑ(G) n . Proposition 3.6.5. If G is a vertex-transitive graph, then ϑ(G)ϑ(G) = |V |. Proof. By Corollary 3.6.4, there is an optimal solution X ∗ of the program (3.12) defining ϑ(G) which satisfies Xii∗ = 1/n for i ∈ V and X ∗ e = ϑ(G) n e. Then n2 n n2 ∗ ∗ X − J  0 (check it). Hence, t = and C = X define a feasible ϑ(G) ϑ(G) ϑ(G) solution of the program (3.15) defining ϑ(G), which implies ϑ(G) ≤ n/ϑ(G). Combining with Proposition 3.6.1 we get the equality ϑ(G)ϑ(G) = |V |. For instance, the cycle Cn is vertex-transitive, so that ϑ(Cn )ϑ(Cn ) = n. In particular, as C5 is isomorphic to C5 , we deduce that √ ϑ(C5 ) = 5. (3.23) (3.24) For n even, Cn is bipartite (and thus perfect), so that ϑ(Cn ) = α(Cn ) = n2 and ϑ(Cn ) = ω(Cn ) = 2. For n odd, one can compute ϑ(Cn ) using the above symmetry reduction: 36 Proposition 3.6.6. For any odd n ≥ 3, ϑ(Cn ) = 1 + cos(π/n) n cos(π/n) and ϑ(Cn ) = . 1 + cos(π/n) cos(π/n) Proof. As ϑ(Cn )ϑ(Cn ) = n, it suffices to compute ϑ(Cn ). We use the formulation (3.16). As Cn is vertex-transitive, there is an optimal solution B whose entries are all equal to 1, except Bij = 1 + x for some scalar x whenever |i − j| = 1 (modulo n). In other words, B = J + xACn , where ACn is the adjacency matrix of the cycle Cn . Thus ϑ(Cn ) is equal to the minimum value of λmax (B) for all possible x. The eigenvalues of ACn are known: They are ω k + ω −k 2iπ (for k = 0, 1, · · · , n − 1), where ω = e n is an n-th root of unity. Hence the eigenvalues of B are n + 2x and x(ω k + ω −k ) for k = 1, · · · , n − 1. (3.25) We minimize the maximum of the values in (3.25) when choosing x such that n + 2x = −2x cos(π/n) (check it). This gives ϑ(Cn ) = λmax (B) = −2x cos(π/n) = n cos(π/n) 1+cos(π/n) . 3.7 Bounding the Shannon capacity The theta number was introduced by Lovász [8] in connection with the problem of computing the Shannon capacity of a graph, a problem in coding theory considered by Shannon. We need some definitions. Definition 3.7.1. (Strong product) Let G = (V, E) and H = (W, F ) be two graphs. Their strong product is the graph denoted as G · H with node set V × W and with edges the pairs of distinct nodes {(i, r), (j, s)} ∈ V × W with (i = j or {i, j} ∈ E) and (r = s or {r, s} ∈ F ). If S ⊆ V is stable in G and T ⊆ W is stable in H then S × T is stable in G · H. Hence, α(G · H) ≥ α(G)α(H). Let Gk denote the strong product of k copies of G, we have that α(Gk ) ≥ (α(G))k . Based on this, one can verify that q q Θ(G) := sup k α(Gk ) = lim k α(Gk ). k→∞ k≥1 (3.26) The parameter Θ(G) was introduced by Shannon in 1956, it is called the Shannon capacity of the graph G. The motivation is as follows. Suppose V is a finite alphabet, where some pairs of letters could be confused when they are transmitted over some transmission channel. These pairs of confusable letters can be seen as the edge set E of a graph G = (V, E). Then the stability number of 37 G is the largest number of one-letter messages that can be sent without danger of confusion. Words of length k correspond to k-tuples in V k . 
Two words (i1 , · · · , ik ) and (j1 , · · · , jk ) can be confused if at every position h ∈ [k] the two letters ih and jh are equal or can be confused, which corresponds to having an edge in the strong product Gk . Hence the largest number of words of length k that can be sent without danger of confusion is equal to the stability number of Gk and the Shannon capacity of G represents the rate of correct transmission of the graph. For instance, for the 5-cycle C5 , α(C5 ) = 2, but α((C5 )2 ) ≥ 5. Indeed, if 1, 2, 3, 4, 5 are the nodes of C5 (in this cyclic order), then the five 2-letter words (1,√ 1), (2, 3), (3, 5), (4, 2), (5, 4) form a stable set in G2 . This implies that Θ(C5 ) ≥ 5. Determining the exact Shannon capacity of a graph is a very difficult problem in general, even for small graphs. For instance, the exact value of the Shannon capacity of C5 was not known until Lovász [8] showed how to use the theta number in order to upper bound the Shannon capacity: Lovász √ √ showed that Θ(G) ≤ ϑ(G) and ϑ(C5 ) = 5, which implies that Θ(C5 ) = 5. For instance, although the exact value of the theta number of C2n+1 is known (cf. Proposition 3.6.6), the exact value of the Shannon capacity of C2n+1 is not known, already for C7 . Theorem 3.7.2. For any graph G, we have that Θ(G) ≤ ϑ(G). The proof is based on the multiplicative property of the theta number from Lemma 3.7.3 – which you will prove in Exercise 6.2 – combined with the fact that the theta number upper bounds p the stability number: For any integer k, α(Gk ) ≤ ϑ(Gk ) = (ϑ(G))k implies k α(Gk ) ≤ ϑ(G) and thus Θ(G) ≤ ϑ(G). Lemma 3.7.3. The theta number of the strong product of two graphs G and H satisfies ϑ(G · H) = ϑ(G)ϑ(H). As an application one can compute the Shannon capacity of any graph G which is vertex-transitive and self-complementary (e.g., like C5 ). Theorem 3.7.4. If G = (V, E) is a vertex-transitivepgraph, then Θ(G · G) = |V |. If, moreover, G is self-complementary, then Θ(G) = |V |. Proof. We have Θ(G · G) ≥ α(G · G) ≥ |V |, since the set of diagonal pairs {(i, i) : i ∈ V } is stable in G · G. The reverse inequality follows from Lemma 3.7.3 combined with Proposition 3.6.5: Θ(G · G) ≤ ϑ(G · G) = ϑ(G)ϑ(G) = |V |. 2 2 If G is isomorphic to G then Θ(G p · G) = Θ(G ) = (Θ(G)) (check the rightmost equality). This gives Θ(G) = |V |. 3.8 Geometric application Definition 3.8.1. A set of vectors u1 , . . . , un ∈ Rd is called a vector labeling of G if they are unit vectors: kui k = 1 for all i ∈ V and satisfy the orthogonality 38 conditions: uT i uj = 0 ∀{i, j} ∈ E. In other words, they form an orthonormal representation of G (with the notation of Exercise 3.3). We are interested in the following graph parameter ξ(G), defined as the smallest integer d for which G admits a vector labeling in Rd . Theorem 3.8.2. For any graph G, we have: ϑ(G) ≤ ξ(G) ≤ χ(G). Proof. We show first the inequality ξ(G) ≤ χ(G). If d = χ(G), consider a partition of V into d stable sets S1 , . . . , Sd . We now assign to all nodes v ∈ Si the standard unit vector ei in Rd . In this way we have defined a vector labeling of G in the space Rd , thus showing ξ(G) ≤ d. We now show the inequality ϑ(G) ≤ ξ(G). For this, set d = ξ(G) and consider a vector labeling u1 , . . . , un ∈ Rd of G. Define the matrices U0 = Id , n+1 whose entries are U i = ui uT i (i ∈ [n]). We now consider the matrix Z ∈ S defined as Zij = hUi , Uj i for i, j ∈ {0, 1, . . . , n}. 
By construction, Z is positive semidefinite and satisfies: Z00 = d, Z0i = Tr(Ui ) = kui k2 = 1, Zii = hUi , Ui i = 2 kui k2 = 1, and Zij = hUi , Uj i = (uT i uj ) = 0 for all edges {i, j} ∈ E. Hence, Z is feasible for the program (3.18) in Lemma 3.4.2 (for the graph G) and thus we can conclude that ϑ(G) ≤ Z00 = d. Hence ξ(G) gives a bound for the chromatic number which is at least as good as the theta number. For bipartite graphs it is easy to check that equality holds throughout, more precisely the following holds (Exercise 4.5). Lemma 3.8.3. For any graph G, we have the equivalences: ξ(G) ≤ 2 ⇐⇒ χ(G) ≤ 2 ⇐⇒ G is bipartite. However, for larger values, the parameter ξ(G) is hard to compute. In fact, checking whether ξ(G) ≤ 3 is already an NP-hard problem. This follows from a construction of Peeters [9] which we now sketch. Consider the prism graph H, with node set V (H) = {x, a, b, c, d, y} and with edge set E(H) = {{a, x}, {a, b}, {b, x}, {c, d}, {c, y}, {d, y}, {x, d}, {b, c}, {a, y}}. We will use the following (easy to check) combinatorial property of H: H is 3-colorable and it has essentially two distinct 3-colorings, depending whether the two nodes x, y receive the same color or not. We will also use the following geometric property of its 3-dimensional vector labelings. Lemma 3.8.4. Consider a vector labeling ux , ua , ub , uc , ud , uy in R3 of the prism graph H. Then, the two vectors ux and uy are either orthogonal or parallel. 39 Proof. As the set {x, a, b} is a clique of H the vectors ux , ua , ub are pairwise orthogonal and thus form an orthonormal basis of R3 . Hence the remaining vectors uc , ud , uy have a unique linear decomposition in this basis. Moreover, they are pairwise orthogonal since the set {c, d, y} is also a clique of H. It is now easy to check that the coefficients of the linear decompositions of uc .ud , uy in the basis {ux , ua , ub } can follow only the following two patterns, where * indicates a nonzero entry: u  x uc ∗ ud  0 uy 0 ua 0 ∗ 0 ub  0 0  and ∗ u  x uc 0 ud  0 uy ∗ ua ∗ 0 0 ub  0 ∗ . 0 Therefore, in the first case the vectors ux and uy are orthogonal and in the second case they are parallel. Let G = (V = [n], E) be a graph. We now build a new graph G′ containing G as follows: For each pair of distinct nodes i, j ∈ V , we add a copy Hij of H whose nodes are x = i, y = j and four new additionnal nodes called aij , bij , cij , dij . Thus G′ contains G as an induced subgraph and G′ has 4 n−1 2  new edges. Clearly, χ(G) ≤ χ(G′ ). The following result new nodes and 9 n−1 2 shows that χ(G) ≤ 3 ⇐⇒ ξ(G′ ) ≤ 3. As the problem of testing whether χ(G) ≤ 3 is NP-complete, we can conclude that the problem of deciding whether ξ(G) ≤ 3 is also NP-hard. Proposition 3.8.5. Let G and G′ be as indicated above. Then the following holds: χ(G) ≤ 3 ⇐⇒ χ(G′ ) ≤ 3 ⇐⇒ ξ(G′ ) ≤ 3. Proof. We show: χ(G′ ) ≤ 3 =⇒ ξ(G′ ) ≤ 3 =⇒ χ(G) ≤ 3 =⇒ χ(G′ ) ≤ 3. The implication: χ(G′ ) ≤ 3 =⇒ ξ(G′ ) is clear, since ξ(G′ ) ≤ χ(G′ ). The implication: χ(G) ≤ 3 =⇒ χ(G′ ) ≤ 3 is easy. For this consider a 3coloring of G. Then, for any i 6= j ∈ V , one can extend the coloring of the nodes i, j to a 3-coloring of Hij , and thus χ(G′ ) ≤ 3. We now show the last implication: ξ(G′ ) ≤ 3 =⇒ χ(G) ≤ 3. Assume that ξ(G′ ) ≤ 3 and consider a vector labeling ux ∈ R3 (for x ∈ V (G′ )) of G′ . Then, the vectors labeling each subgraph Hij satisfy the conclusion of Lemma 3.8.4. Therefore, for any i 6= j ∈ V , the vectors ui and uj are either orthogonal or parallel. 
Moreover, if {i, j} is an edge of G, then ui and uj are orthogonal. If we now look at the lines spanned by the vectors ui (for i ∈ V ), it follows that there are (at most) three distinct lines among them. (Indeed, if there would be four distinct lines among them, then they should be spanned by four pairwise orthogonal vectors, which is a contradiction since we are in the space R3 .) This induces a partition of the vertices of G into three sets, depending to which of the three lines the vector vi is paralel to. Moreover, each of these sets is a stable set of G (since adjacent vertices correspond to orthogonal vectors). Hence we have found a 3-coloring of G. 40 Consider the semidefinite program: X  0, Xii = 1 (i ∈ [n]), Xij = 0 ({i, j} ∈ E). Summarizing, what the above shows is that checking whether this semidefinite program admits a feasible solution of rank at most 3 is an NP-hard problem. More generally, for any fixed integer r ≥ 1, checking existence of a feasible solution of rank at most r to a semidefinite program is NP-hard. For the case r = 1 this is easy to see: Consider e.g. the formulation (3.19) of the theta number ϑ(G) from Lemma 3.4.4. Then, a matrix Y which is feasible for (3.19) and has rank 1 corresponds exactly to a stable set S ⊆ V , namely Y = yy T , where y ∈ {0, 1}n+1 is the extended characteristic vector of S (with 1 at its zero-th entry). Therefore, If one could decide whether there is a feasible Pn solution of rank 1 to the program (3.19) augmented with the condition i=1 Yii ≥ k, then one could decide whether α(G) ≥ k, which is however an NP-complete problem. There are some known results that permit to ‘predict’ the rank of a feasible solution to a semidefinite program. In particular the following result holds, which bounds the rank in terms of the number of linear equations in the semidefinite program. Theorem 3.8.6. (see [1]) Consider the semidefinite program: hAj , Xi = bj (j ∈ [m]), X  0. (3.27) If it has a feasible solution then it has one whose rank r satisfies:   r+1 ≤ m. 2 In particular, if m ≤ 2, then there is a feasible solution of rank 1. See Exercise 3.4 for a weaker result: there exists a feasible solution of rank at most m + 1. 3.9 Exercises 3.1. Show: ϑ(G)ϑ(G) ≥ n for any graph G on n nodes. √ Show: ϑ(C5 ) ≥ 5. Hint: Let C be a feasible solution for the program (3.15) defining ϑ(G), and let C ′ be a feasible solution of the analogous program defining ϑ(G). Use the fact that hC − J, C ′ − Ji ≥ 0, hC − J, Ji ≥ 0, hC ′ − J, Ji ≥ 0 (why is this true?), and compute hC, C ′ i. 3.2 The goal is to show the result of Lemma 3.7.3 about the theta number of the strong product of two graphs G = (V, E) and H = (W, F ): ϑ(G · H) = ϑ(G)ϑ(H). 41 (a) Show that ϑ(G · H) ≥ ϑ(G)ϑ(H). (b) Show that ϑ(G · H) ≤ ϑ(G)ϑ(H). Hint: Use the primal formulation (3.12) for (a), and the dual formulation (Lemma 3.4.1) for (b), and think of using Kronecker products of matrices in order to build feasible solutions. 3.3 Let G = (V = [n], E) be a graph. A set of vectors u1 , . . . , un ∈ Rn is said to be an orthonormal representation of G if they satisfy: kui k = 1 ∀i ∈ V, uT i uj = 0 ∀{i, j} ∈ E. Consider the graph parameter ϑ1 (G) = min max c,ui i∈V 1 , (cT ui )2 where the minimum is taken over all unit vectors c and all orthonormal representations u1 , · · · , un of G. (a) Show: ϑ(G) ≤ ϑ1 (G). Hint: Consider unit vectors c, u1 , . . . , un forming an orthonormal representation of G. 
Set t = maxi (cT u1 i )2 and define the symmetric mauT u trix A with entries Aij = (cT uii)(cjT uj ) for {i, j} ∈ E and Aij = 0 otherwise. It might also be useful to consider the matrix M , defined as the Gram matrix of the vectors c − cTuui i for i ∈ V . Show that (t, A) is feasible for the formulation (3.13) for ϑ(G). (b) Show: ϑ1 (G) ≤ ϑ(G). Hint: Consider an optimum solution (t = ϑ(G), B) to the program (3.14) defining ϑ(G). Say tI − B is the Gram matrix of the vectors x1 , . . . , xn . Show that there exists a unit vector c orthogonal √ i (i ∈ V ) form an to x1 , . . . , xn . Show that the vectors ui = c+x t orthonormal representation of G. √ (c) Show: ϑ(C5 ) ≤ 5. Hint: Use the formulation via ϑ1 (G) and the following construction (known as Lovász’ umbrella construction). Set c = (0, 0, 1) ∈ R3 and, for k = 1, 2, 3, 4, 5, define the vector uk = (s cos(2kπ/5), s sin(2kπ/5), t) ∈ R3 , where s and t are scalars to be determined. Show that one can choose s and t in such a way that u1 , . . . , u5 form an orthonormal representation of C5 . Recall: cos(2π/5) = √ 5−1 4 . 3.4 Consider the semidefinite program: X  0, hAj , Xi = bj (j ∈ [m]). 42 (3.28) Ps Assume that X is a feasible solution of (3.28) and write X = i=1 vi viT for some vectors v1 , . . . , vs ∈ Rn , where s = rankX. Define the set ) ( s X T s yi vi vi i = bj (j ∈ [m] . P = y ∈ R : y ≥ 0, hAj , i=1 (a) Show: If Pm Aj ≻ 0, then P is a nonempty polytope. Ps (b) Show: If y is a vertex of P , then the matrix Xy = i=1 yi vi viT is a feasible solution of (3.28) with rank at most m. Pm (c) Show: If j=1 Aj ≻ 0, then there exists a feasible solution of (3.28) with rank at most m. j=1 (d) Show: There exists a feasible solution of (3.28) with rank at most m + 1. 3.5 Show Lemma 3.8.3. 43 BIBLIOGRAPHY [1] A. Barvinok. A course in geometry. AMS, 2002. [2] M. Chudnovsky, N. Robertson, P. Seymour, and R. Thomas. The strong perfect graph theorem, Annals of Mathematics 164 (1): 51229, 2006. [3] V. Chvátal. On certain polytopes associated with graphs. Journal of Combinatorial Theory, Series B 18:138–154, 1975. [4] G.S. Gasparian. Minimal imperfect graphs: a simple approach. Combinatorica, 16:209–212, 1996. [5] M. Grötschel, L. Lovász, A. Schrijver. Geometric Algorithms in Combinatorial Optimization, Springer, 1988. http://www.zib.de/groetschel/ pubnew/paper/groetschellovaszschrijver1988.pdf [6] D.E. Knuth. The Sandwich Theorem. The Electronic Journal of Combinatorics 1, A1, 1994. http://www.combinatorics.org/ojs/index.php/eljc/ article/view/v1i1a1 [7] L. Lovász. A characterization of perfect graphs. Journal of Combinatorial Theory, Series B 13:95–98, 1972. [8] L. Lovász. On the Shannon capacity of a graph. IEEE Transactions on Information Theory IT-25:1–7, 1979. [9] R. Peeters. Orthogonal representations over finite fields and the chromatic number of graphs. Combinatorica 16(3):417–431, 1996. 44 CHAPTER 4 APPROXIMATING THE MAX CUT PROBLEM 4.1 Introduction 4.1.1 The MAX CUT problem The maximum cut problem (MAX CUT) is the following problem in combinatorial optimization. Let G = (V, E) be a graph and let w = (wij ) ∈ RE + be nonnegative weights assigned to the edges. Given a subset S ⊆ V , the cut δG (S) consists of the edges {u, v} ∈ E having exactly one endnode in S, i.e., with |{i, j} ∩ S| = 1. In other words, δG (S) consists of the edges that are cut by the partition (S, S = V \ S) of V . The cut δG (S) is called trivial if S = ∅ or V (inPwhich case it is empty). 
Then the weight of the cut δG (S) is w(δG (S)) = {i,j}∈δG (S) wij and the MAX CUT problem asks for a cut of maximum weight, i.e., compute mc(G, w) = max w(δG (S)). S⊆V T When w = e = (1, . . . , 1) is the all-ones weight function, MAX CUT asks for a cut of maximum cardinality and we use the simpler notation mc(G) = mc(G, e). It is sometimes convenient to extend the weight function w ∈ RE to all pairs of nodes of V , by setting wij = 0 if {i, j} is not an edge of G. Given disjoint subsets S, T ⊆ V , it is also convenient to use the following notation: X wij . w(S, T ) = i∈S,j∈T Thus, w(S, S) = w(δG (S)) for all S ⊆ V. 45 To state its complexity, we formulate MAX CUT as a decision problem: MAX CUT: Given a graph G = (V, E), edge weights w ∈ ZE + and an integer k ∈ N, decide whether there exists a cut of weight at least k. It is well known that MAX CUT is an NP-complete problem. In fact, MAX CUT is one of Karp’s 21 NP-complete problems. So unless the complexity classes P and NP coincide there is no efficient polynomial-time algorithm which solves MAX CUT exactly. We give here a reduction of MAX CUT from the PARTITION problem, defined below, which is one the first six basic NP-complete problems in Garey and Johnson [5]: PARTITION: Given naturalPnumbers aP 1 , . . . , an ∈ N, decide whether there exists a subset S ⊆ [n] such that i∈S ai = i6∈S ai . Theorem 4.1.1. The MAX CUT problem is NP-complete. Proof. It is clear that MAX CUT to the class NP. We now show a reduction from PARTITION. Let a1 , . . . , an ∈ N be given. Construct the following weights wij = Pn ai aj for the edges of the completeP graph Kn . Set σ = i=1 ai and k = σ 2 /4. For any subset S ⊆ [n], set a(S) = i∈S ai . Then, we have w(S, S) = X i∈S,j∈S wij = X ai aj = ( i∈S,j∈S X i∈S ai )( X j∈S aj ) = a(S)(σ−a(S)) ≤ σ 2 /4, with equality if and only if a(S) = σ/2 or, equivalently, a(S) = a(S). From this it follows that there is a cut of weight at least k if and only if the sequence a1 , . . . , an can be partitioned. This concludes the proof. This hardness result for MAX CUT is in sharp contrast to the situation of the MIN CUT problem, which asks for a nontrivial cut of minimum weight, i.e., to compute min w(S, S). S⊆V :S6=∅,V (For MIN CUT the weights of edges are usually called capacities and they also assumed to be nonnegative). It is well known that the MIN CUT problem can be solved in polynomial time (together with its dual MAX FLOW problem), using the Ford-Fulkerson algorithm. Specifically, the Ford-Fulkerson algorithm permits to find in polynomial time a minimum cut (S, S) separating a given source s and a given sink t, i.e., with s ∈ S and t ∈ S. Thus a minimum weight nontrivial cut can be obtained by applying this algorithm |V | times, fixing any s ∈ V and letting t vary over all nodes of V \ {s}. Details can be found in Chapter 4 of the Lecture Notes [10]. Even stronger, Håstad in 2001 showed that it is NP-hard to approximate 16 ∼ 0.941. MAX CUT within a factor of 17 On the positive side, one can compute a 0.878-approximation of MAX CUT in polynomial time, using semidefinite programming. This algorithm, due to 46 Figure 4.1: Minimum and maximum weight cuts Goemans and Williamson [6], is one of the most influential approximation algorithms which are based on semidefinite programming. We will explain this result in detail in Section 4.2. Before doing that we recall some results for MAX CUT based on using linear programming. 
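To make the reduction in the proof of Theorem 4.1.1 concrete, the following small sketch (in Python with numpy; the instance and the names are ours and purely illustrative) builds the weights w_ij = a_i a_j on the complete graph and checks the identity w(S, S̄) = a(S)(σ − a(S)) used in the proof. The enumeration of all subsets is of course exhaustive and only meant for tiny instances.

    import itertools
    import numpy as np

    a = np.array([3, 1, 1, 2, 3])        # an instance of PARTITION
    sigma = a.sum()
    k = sigma ** 2 / 4                   # target cut weight from the reduction

    best = 0
    for r in range(len(a) + 1):
        for S in itertools.combinations(range(len(a)), r):
            S = set(S)
            # cut weight in K_n with weights w_ij = a_i * a_j
            w_cut = sum(a[i] * a[j] for i in S for j in range(len(a)) if j not in S)
            assert w_cut == a[list(S)].sum() * (sigma - a[list(S)].sum())
            best = max(best, w_cut)

    # The instance is a yes-instance of PARTITION iff the maximum cut weight reaches sigma^2 / 4.
    print(best, k, best == k)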
4.1.2 Linear programming relaxation In order to define linear programming bounds for MAX CUT, one needs to find some linear inequalities that are satisfied by all cuts of G, i.e., some valid inequalities for the cut polytope of G. Large classes of such inequalities are known (cf. e.g. [3] for an overview and references). We now present some simple but important valid inequalities for the cut polytope of the complete graph Kn , which is denoted as CUTn , and defined as the convex hull of the incidence vectors of the cuts of Kn : CUTn = conv{χδKn (S) : S ⊆ [n]}. For instance, for n = 2, CUTn = [0, 1] and, for n = 3, CUT3 is a simplex in R3 (indexed by the edges of K3 ordered as {1, 2}, {1, 3}, {2, 3}) with as vertices the incidence vectors of the four cuts (S, S) of K3 : (0, 0, 0), (1, 1, 0), (1, 0, 1), and (0 1 1) (for S = ∅, {1}, {2} and {3}, respectively). As a first easy observation it is important to realize that in order to compute the maximum cut mc(G, w) in a weighted graph G on n nodes, one can as well deal with the complete graph Kn . Indeed, any cut δG (S) of G can be obtained from the corresponding cut δKn (S) of Kn , simply by ignoring the pairs that are not edges of G, in other words, by projecting onto the edge set of G. Hence one can reformulate any maximum cut problem as a linear optimization problem over the cut polytope of Kn : X wij xij ; mc(G, w) = max x∈CUTn {i,j}∈E the graph G is taken into account by the objective function of this LP. The following triangle inequalities are valid for the cut polytope CUTn : xij − xik − xjk ≤ 0, xij + xjk + xjk ≤ 2, 47 (4.1) for all distinct i, j, k ∈ [n]. This is easy to see, just verify that these inequalities hold when x is equal to the incidence vector of a cut. The triangle inequalities (4.1) imply the following bounds (check it): 0 ≤ xij ≤ 1 (4.2) on the variables. Let METn denote the polytope in RE(Kn ) defined by the triangle inequalities (4.1). Thus, METn is a linear relaxation of CUTn , tighter than the trivial relaxation by the unit hypercube: CUTn ⊆ METn ⊆ [0, 1]E(Kn ) . Moreover, one can verify that, if we add integrality conditions to the triangle inequalities, then we obtain an integer programming reformulation of MAX CUT. More precisley, we have CUTn = conv(METn ∩ {0, 1}En ). It is known that equality CUTn = METn holds for n ≤ 4, but the inclusion is strict for n ≥ 5. Indeed, the inequality: X xij ≤ 6 (4.3) 1≤i<j≤5 is valid for CUT5 (as any cut of K5 has cardinality 0, 4 or 6), but it is not valid for MET5 . For instance, the vector (2/3, . . . , 2/3) ∈ R10 belongs to MET5 but it violates the inequality (4.3) (since 10.2/3 > 6). We can define the following linear programming bound:    X  lp(G, w) = max wij xij : x ∈ METn   (4.4) {i,j}∈E(G) for the maximum cut: mc(G, w) ≤ lp(G, w). For the all-ones weight function w = e we also denote lp(G) = lp(G, e). The graphs for which this bound is tight have been characterized by Barahona [1]. Theorem 4.1.2. Let G be a graph. Then, mc(G, w) = lp(G, w) for all weight functions w ∈ RE if and only if the graph G has no K5 minor. In particular, if G is a planar graph, then mc(G, w) = lp(G, w) so that the maximum cut can be computed in polynomial time using linear programming. A natural question is how good the LP bound is for general graphs. Here are some easy bounds. Lemma 4.1.3. Let G be a graph with nonnegative weights w. The following holds: w(E)/2 ≤ mc(G, w) ≤ lp(G, w) ≤ w(E). 48 Proof. 
The inequality lp(G, w) ≤ w(E) follows from the inclusion METn ⊆ [0, 1]E(Kn ) and the fact that w ≥ 0. We now show that w(E)/2 ≤ mc(G, w). For this, pick S ⊆ V for which (S, S) is a cut of maximum weight: w(S, S) = mc(G, w). Thus if we move one node i ∈ S to S, or if we move one node j ∈ S to S, then we obtain another cut whose weight is at most w(S, S). This gives: w(S \ {i}, S ∪ {i}) − w(S, S) = w(S \ {i}, {i}) − w({i}, S) ≤ 0, w(S ∪ {j}, S \ {j}) − w(S, S) = w({j}, S \ {j}) − w(S, {j}) ≤ 0. Summing the first relation over i ∈ S and using the fact that 2w(E(S)) = P w(S \ {i}, {i}), where E(S) is the set of edges contained in S, and the fact i∈S P that i∈S w({i}, S) = w(S, S), we obtain: 2w(E(S)) ≤ w(S, S). Analogously, summing the second relation over j ∈ S, we obtain: 2w(E(S)) ≤ w(S, S). Summing these two relations yields: w(E(S)) + w(E(S)) ≤ w(S, S). Now adding w(S, S) to both sides implies: w(E) ≤ 2w(S, S) = 2mc(G, w), which concludes the proof. The above proof shows in fact that w(S, S) ≥ w(E)/2 for any cut (S, S) that is locally optimal, in the sense that switching one node from one side to the other side does not improve the weight of the cut. As an application of Lemma 4.1.3, we obtain that 1 w(E) mc(G, w) 1 ≤ ≤ ≤ 1 for all nonnegative weights w ≥ 0. 2 2 lp(G, w) lp(G, w) It turns out that there are graphs for which the ratio mc(G, w)/lp(G, w) can be arbitrarily close to 1/2 [9]. Hence, for these graphs, the ratio w(E)/lp(G, w) is arbitrarily close to 1, which means that the metric polytope does not provide a better approximation of the cut polytope than its trivial relaxation by the hypercube [0, 1]E . We now provide another argument for the lower bound mc(G, w) ≥ w(E)/2. This argument is probabilistic and based on the following simple randomized algorithm: Construct a random partition (S, S) of V by assigning, independently, with probability 1/2, each node i ∈ V to either side of the partition. Then the probability that an edge {i, j} is cut by the partition is equal to P({i, j} is cut) = P(i ∈ S, j ∈ S) + P(i ∈ S, j ∈ S) = 1 1 1 1 1 · + · = . 2 2 2 2 2 Hence, the expected weight of the cut produced by this random partition is equal to E(w(S, S)) = X wij P({i, j} is cut) = {i,j}∈E X {i,j}∈E 49 wij w(E) 1 = . 2 2 Here we have used the linearity of the expectation. As the expected weight of a cut is at most the maximum weight of a cut, we obtain the inequality: mc(G, w) ≥ E(w(S, S)) = w(E)/2. Moreover this simple randomized algorithm yields a random cut whose expected weight is at least half the optimum weight: E(w(S, S)) ≥ 0.5 · mc(G, w). In the next section, we will see another probabilistic argument, due to Goemans and Williamson, which permits to construct a much better random cut. Namely we will get a random cut whose expected weight satisfies: E(w(S, S)) ≥ 0.878 · mc(G, w), thus improving the above factor 0.5. The crucial tool will be to use a semidefinite relaxation for MAX CUT combined with a simple, but ingenious randomized “hyperplane rounding” technique. 4.2 The algorithm of Goemans and Williamson In this section we discuss the basic semidefinite programming relaxation for MAX CUT and its dual reformulation and then we present the approximation algorithm of Goemans and Williamson. 4.2.1 Semidefinite programming relaxation Here we will introduce the basic semidefinite programming relaxation for MAX CUT. For this we first reformulate MAX CUT as a (non-convex) quadratic optimization problem having quadratic equality constraints. 
With every vertex i ∈ V , we associate a binary variable xi ∈ {−1, +1} which indicates whether i lies in S or in S, say, i ∈ S if xi = −1 and i ∈ S if xi = +1. We model the binary constraint xi ∈ {−1, +1} as a quadratic equality constraint x2i = 1 for i ∈ V. For two vertices i, j ∈ V we have 1 − xi xj ∈ {0, 2}. This value equals to 0 if i and j lie on the same side of the cut (S, S) and the value equals to 2 if i and j lie on different sides of the cut. Hence, one can express the weight of the cut (S, S) by X 1 − xi xj w(S, S) = . wij 2 {i,j}∈E Now, the MAX CUT problem can be equivalently formulated as    1 X wij (1 − xi xj ) : x2i = 1 ∀i ∈ V . mc(G, w) = max  2 {i,j}∈E 50 (4.5) Next, we introduce a matrix variable X = (xij ) ∈ S n , whose entries xij model the pairwise products xi xj . Then, as the matrix (xi xj )ni,j=1 = xxT is positive semidefinite, we can require the condition that X should be positive semidefinite. Moreover, the constraints x2i = 1 give the constraints Xii = 1 for all i ∈ [n]. Therefore we can formulate the following semidefinite programming relaxation:    1 X wij (1 − Xij ) : X  0, Xii = 1 ∀i ∈ [n] . (4.6) sdp(G, w) = max  2 {i,j}∈E By construction, we have: mc(G, w) ≤ sdp(G, w). (4.7) Again we set sdp(G) = sdp(G, e) for the all-ones weight function. The feasible region of the above semidefinite program is the convex (nonpolyhedral) set En = {X ∈ S n : X  0, Xii = 1 ∀i ∈ [n]}, called the elliptope (and its members are known as correlation matrices). One can visualize the elliptope E3 . Indeed, for a 3 × 3 symmetric matrix X with an all-ones diagonal, we have:   1 x y X = x 1 z   0 ⇐⇒ 1 + 2xyz − x2 − y 2 − z 2 ≥ 0, x, y, z ∈ [−1, 1], y z 1 which expresses the fact that the determinant of X is nonnegative as well as the three 2 × 2 principal subdeterminants. The following Figure 4.2.1 visualizes the set of triples (x, y, z) for which X ∈ E3 . Notice that the elliptope E3 looks like an “inflated” tetrahedron, while the underlying tetrahedron corresponds to the linear relaxation MET3 . Figure 4.2: Views on the convex set E3 behind the semidefinite relaxation. We have: CUTn ⊆ En , with strict inclusion for any n ≥ 3. One can for instance verify that mc(Kn ) < sdp(Kn ) for any odd n ≥ 3. 51 4.2.2 Dual semidefinite programming relaxation Given a graph G with edge weights w, its Laplacian matrix Lw is the symmetric n × n matrix with entries: X (Lw )ii = wij (i ∈ [n]), j:{i,j}∈E (Lw )ij = −wij ({i, j} ∈ E), (Lw )ij = 0 (i 6= j, {i, j} 6∈ E). The following can be checked (Exercise 4.2). Lemma 4.2.1. The following properties hold for the Laplacian matrix Lw : P (i) For any vector x ∈ {±1}n , 41 xT Lw x = 12 {i,j}∈E wij (1 − xi xj ). (ii) For any nonnegative edge weights w ≥ 0, Lw  0. This permits to reformulate the quadratic formulation (4.5) of MAX CUT as   1 T mc(G, w) = max x Lw x : x2i = 1 ∀i ∈ V (4.8) 4 and its semidefinite relaxation (4.6) as   1 hLw , Xi : X  0, Xii = 1 ∀i ∈ V . sdp(G, w) = max 4 (4.9) This is a semidefinite program in standard primal form and its dual semidefinite program reads: ) ( n X 1 (4.10) yi : Diag(y) − Lw  0 . sdp(G, w) = min 4 i=1 Here, for a vector y ∈ Rn , Diag(y) denotes the diagonal matrix with diagonal entries y1 , . . . , yn . Indeed, as the program (4.6) is strictly feasible (e.g. the identity matrix is strictly feasible), the minimization program in (4.10) attains its minimum and there is no duality gap: its minimum value is equal to sdp(G, w). 
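As a numerical illustration of the relaxation (4.6), the following sketch computes sdp(C5) for the 5-cycle with unit weights (a sketch only, assuming cvxpy with the SCS solver). The value is about 4.52, strictly larger than mc(C5) = 4, which reflects the strict inclusion CUTn ⊂ En noted above.

    import cvxpy as cp

    n = 5
    edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]   # C5 with unit weights
    X = cp.Variable((n, n), symmetric=True)
    constraints = [X >> 0] + [X[i, i] == 1 for i in range(n)]
    objective = cp.Maximize(sum(0.5 * (1 - X[i, j]) for (i, j) in edges))
    prob = cp.Problem(objective, constraints)
    prob.solve(solver=cp.SCS)
    print(prob.value)   # about 4.52, while mc(C5) = 4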
We now give a reformulation for the program (4.10), in terms of the maximum eigenvalue of the Laplacian matrix up to a suitable diagonal shift. This upper bound for MAX CUT was first introduced and investigated by Delorme and Poljak [2]. Theorem 4.2.2. Let G = (V = [n], E) be a graph and let w ∈ RE + be nonnegative edge weights. The following holds: ) ( n X n ui = 0 . (4.11) λmax (Lw + Diag(u)) : sdp(G, w) = minn u∈R 4 i=1 52 Proof. Let ϕ(G, w) denote the optimum value of the program in (4.11). We need to show that sdp(G, w) = ϕ(G, w). Observe first that, using the fact that λmax (Lw + Diag(u)) = min{t : tI − (Lw + Diag(u))  0}, t∈R we can reformulate ϕ(G, w) as ( ) n X n ui = 0 . ϕ(G, w) = min n t : tI − (Lw + Diag(u))  0, t∈R,u∈R 4 i=1 (4.12) Consider first a feasible solution y to the program (4.10). Set t = n4 eT y and u = te − 4y (where e denotes the all-ones vector). Then, eT u = 0 and we have tI − (Lw + Diag(u)) = 4Diag(y) − Lw  0. Hence, (t, u) is feasible for the program defining ϕ(G, w) and thus ϕ(G, w) ≤ nt/4 = eT y. This shows that ϕ(G, w) ≤ sdp(G, w). Conversely, consider a feasible solution (t, u) for the program defining ϕ(G, w). Set y = 14 (te − u). Then, Diag(y) − L4w = 41 (tI − (Lw + Diag(u))  0. This shows that sdp(G, w) ≤ eT y = nt/4 and thus sdp(G, w) ≤ ϕ(G, w). What the above result says is that the semidefinite bound amounts to finding an optimal ‘correcting’ vector u to add to the diagonal of the Laplacian matrix Lw . Interestingly, Delorme and Poljak [2] show the unicity of this optimal correcting vector. Lemma 4.2.3. The program (4.11) has a unique optimal solution u. Proof. Let X be an optimal solution to the primal program (4.9), let u be an optimal solution of the program (4.11) and set t = λmax (Lw + Diag(u)). We claim that X(tI − (Lw + Diag(u)) = 0. Indeed, by the optimality condition we have that hLw /4, Xi = nt/4. Combining this with the fact that hI, Xi = n and hX, Diag(u)i = uT e = 0, we obtain that hX, tI − (Lw + Diag(u)i = tn − hX, Lw i = 0. As both X and tI − (Lw + Diag(u)) are positive semidefinite, we can conclude that their product is zero, that is, X(tI − (Lw + Diag(u)) = 0. Now, if u′ is another optimal solution of (4.11), X(tI − (Lw + Diag(u′ )) = 0 also holds. Therefore, combining with the above identity, we obtain that XDiag(u − u′ ) = 0, which easily implies that u = u′ (since the diagonal entries of X are equal to 1). As an immediate corollary of Theorem 4.2.2 we obtain that mc(G, w) ≤ sdp(G, w) ≤ n λmax (Lw ) 4 (by selecting u = 0 in the program (4.11)). Moreover, if G is vertex-transitive and w = e is the all-ones weight function, then selecting the zero vector u = 0 is the optimum solution. 53 Lemma 4.2.4. If G is vertex transitive then sdp(G) = n 4 λmax (L). (You will show this in Exercise 4.5). As an application, one can compute the exact value of sdp(G) when G is an odd cycle (see Exercise 4.5). In particular, one has: mc(C5 ) 32 √ =: α∗ ∼ 0.884. = sdp(C5 ) 25 + 5 5 Delorme and Poljak [2] show that the ratio mc(G)/sdp(G) is at least α∗ ∼ 0.884 for several classes of graphs. As we will see in the next section, for general graphs, the worst case ratio for mc(G)/sdp(G) is at least 0.878... Interestingly, Poljak [8] could show that α∗ is the worst case value for the ratio of the LP and SDP bounds: lp(G, w) ≥ α∗ ∼ 0.884. sdp(G, w) 4.2.3 The Goemans-Williamson algorithm Goemans and Williamson [6] show the following result for the semidefinite programming bound sdp(G, w). Theorem 4.2.5. 
Given a graph G with nonnegative edge weights w, the following inequalities hold: sdp(G, w) ≥ mc(G, w) ≥ 0.878 · sdp(G, w). The proof is algorithmic and it gives an approximation algorithm which approximates the MAX CUT problem within a ratio of 0.878. The GoemansWilliamson algorithm has five steps: 1. Solve the semidefinite program (4.6); let X be an optimal solution, so P that sdp(G, w) = {i,j}∈E wij (1 − Xij )/2. 2. Perform a Cholesky decomposition of X to find unit vectors vi ∈ R|V | for i ∈ V , so that X = (viT vj )i,j∈V . 3. Choose a random unit vector r ∈ R|V | , according to the rotationally invariant probability distribution on the unit sphere. 4. Define a cut (S, S) by setting xi = sign(viT r) for all i ∈ V . That is, i ∈ S if and only if sign(viT r) ≤ 0. P 5. Check whether {i,j}∈E wij (1 − xi xj )/2 ≥ 0.878 · sdp(G, w). If not, go to step 3. The steps 3 and 4 in the algorithm are called a randomized rounding procedure because a solution of a semidefinite program is “rounded” (or better: projected) to a solution of the original combinatorial problem with the help of randomness. 54 Note also that because the expectation of the constructed solution is at least 0.878 · sdp(G, w) the algorithm eventually terminates; it will pass step 5 and without getting stuck in an endless loop. One can show that with high probability we do not have to wait long until the condition in step 5 is fulfilled. The following lemma (also known as Grothendieck’s identity, since it came up in work of Grothendieck in the 50’s, however in the different context of functional analysis) is the key to the proof of Theorem 4.2.5. For t ∈ [−1, 1], recall that arccos t is the unique angle ϑ ∈ [0, π] such that cos ϑ = t. Lemma 4.2.6. Let u, v ∈ Rd (for some d ≥ 1) be unit vectors and let r ∈ Rd be a random unit vector chosen according to the rotationally invariant probability distribution on the unit sphere. The following holds. (i) The probability that sign(uT r) 6= sign(v T r) is equal to P(sign(uT r) 6= sign(v T r)) = arccos(uT v) . π (4.13) (ii) The expectation of the random variable sign(uT r) sign(v T r) ∈ {−1, +1} is equal to 2 E[sign(uT r) sign(v T r)] = arcsin(uT v). (4.14) π Proof. (i) If u = ±v then arccos(uT v) = 0, π and the identity (4.13) holds clearly. We now assume that u, v span a vector space W of dimension 2. Let s be the orthogonal projection of r onto W , so that rT u = sT u and rT v = sT v. Then the vectors u, v lie on the unit circle in W and s/ksk is now uniformely distributed on the unit circle. Hence, the probability that sign(uT r) 6= sign(v T r) depends only on the angle between u and v and P[sign(uT r) 6= sign(v T r)] = 2 · 1 1 arccos(uT v) = arccos(uT v). 2π π (To see this, it helps to draw a figure.) (ii) By definition, the expectation E[sign(uT r) sign(v T r)] can be computed as (+1) · P[sign(uT r) = sign(v T r)] + (−1) · P[sign(uT r) 6= sign(v T r)] = 1 − 2 · P[sign(uT r) 6= sign(v T r)] = 1 − 2 · arccos(uT v) , π where we have used (i) for the last equality. Now use the trigonometric identity arcsin t + arccos t = π , 2 to conclude the proof of (ii). Using elementary univariate calculus one can show the following fact. 55 Lemma 4.2.7. For all t ∈ [−1, 1)], the following inequality holds: 2 arccos t ≥ 0.878. π 1−t (4.15) One can also “see” this on the following plots of the function in (4.15), where t varies in [−1, 1) in the first plot and in [−0.73, −0.62] in the second plot. 
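The five steps above can be coded in a few lines; the following sketch is an illustration only (in Python, assuming cvxpy with the SCS solver and numpy; the function name and the trial bound are ours). One design remark: the sketch factors X = V^T V from an eigendecomposition rather than a Cholesky decomposition, since an optimal X is typically singular; any factorization of X works for the rounding step. The random hyperplane is generated from standard normal coordinates, as explained in Section 4.2.4 below.

    import cvxpy as cp
    import numpy as np

    def goemans_williamson(n, edges, weights, trials=100):
        # Step 1: solve the semidefinite relaxation (4.6).
        X = cp.Variable((n, n), symmetric=True)
        cons = [X >> 0] + [X[i, i] == 1 for i in range(n)]
        sdp_obj = sum(w * 0.5 * (1 - X[i, j]) for (i, j), w in zip(edges, weights))
        prob = cp.Problem(cp.Maximize(sdp_obj), cons)
        prob.solve(solver=cp.SCS)
        sdp_val = prob.value

        # Step 2: factor X = V^T V (eigendecomposition; columns of V are the unit vectors v_i).
        evals, evecs = np.linalg.eigh(X.value)
        V = (evecs * np.sqrt(np.clip(evals, 0, None))).T

        best_cut, best_val = None, -1.0
        for _ in range(trials):
            # Steps 3-4: random hyperplane through the origin, then sign rounding.
            r = np.random.normal(size=n)
            x = np.sign(r @ V)
            x[x == 0] = 1.0
            val = sum(w * 0.5 * (1 - x[i] * x[j]) for (i, j), w in zip(edges, weights))
            if val > best_val:
                best_val, best_cut = val, x
            # Step 5: stop as soon as the 0.878 guarantee is met.
            if best_val >= 0.878 * sdp_val:
                break
        return best_cut, best_val, sdp_val

    # Example: the 5-cycle with unit weights; mc(C5) = 4.
    edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
    print(goemans_williamson(5, edges, [1.0] * 5))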
10 8.791e-1 8 8.79e-1 6 8.789e-1 4 8.788e-1 8.787e-1 2 8.786e-1 -1 -0.5 0 1 0.5 -0.73 -0.72 -0.71 -0.7 -0.69 -0.68 -0.67 -0.66 -0.65 Proof. (of Theorem 4.2.5) Let X be the optimal solution of the semidefinite program (4.6) and let v1 , . . . , vn be unit vectors such that X = (viT vj )ni,j=1 , as in Steps 1,2 of the GW algorithm. Let (S, S) be the random partition of V , as in Steps 3,4 of the algorithm. We now use Lemma 4.2.6(i) to compute the expected value of the cut (S, S): P P E(w(S, S)) = {i,j}∈E wij P({i, j} is cut) = {i,j}∈E wij P(xi 6= xj ) = = P P {i,j}∈E {i,j}∈E wij P(sign(viT r) 6= sign(vjT r)) = wij  1−viT vj 2 By Lemma 4.2.7, each term 2 π    arccos(v T v ) · π2 1−vT vij j . P {i,j}∈E wij arccos(viT vj ) π i arccos(viT vj ) 1−viT vj can be lower bounded by the constant 0.878. Since all weights are nonnegative, each term wij (1−viT vj ) is nonnegative. Therefore, we can lower bound E(w(S, S)) in the following way:   X 1 − viT vj E(w(S, S)) ≥ 0.878 · . wij 2 {i,j}∈E Now we recognize that the objective value sdp(G, w) of the semidefinite program is appearing in the right hand side and we obtain:   X 1 − viT vj = 0.878 · sdp(G, w). wij E(w(S, S)) ≥ 0.878 · 2 {i,j}∈E Finally, it is clear that the maximum weight of a cut is at least the expected value of the random cut (S, S): mc(G, w) ≥ E(w(S, S)). 56 Putting things together we can conclude that mc(G, w) ≥ E(w(S, S)) ≥ 0.878 · sdp(G, w). This concludes the proof, since the other inequality mc(G, w) ≤ sdp(G, w) holds by (4.7). 4.2.4 Remarks on the algorithm It remains to give a procedure which samples a random vector from the unit sphere. This can be done if one can sample random numbers from the standard normal (Gaussian) distribution (with mean zero and variance one) which has probability density 2 1 f (x) = √ e−x /2 . 2π Many software packages include a procedure which produces random numbers from the standard normal distribution. If we sample n real numbers x1 , . . . , xn independently uniformly at random from the standard normal distribution, then, the vector 1 (x1 , . . . , xn )T ∈ S n−1 r= p 2 2 x1 + · · · + xn is distributed according to the rotationally invariant probability measure on the unit sphere. Finally we mention that one can modify the Goemans-Williamson algorithm so that it becomes an algorithm which runs deterministically (without the use of randomness) in polynomial time and which gives the same approximation ratio. This was done by Mahajan and Ramesh in 1995. 4.3 Extension to variations of MAX CUT We now consider two variations of MAX CUT: the maximum bisection and maximum k-cut problems, where the analysis of Goemans and Williamson can be extended to yield good approximations. 4.3.1 The maximum bisection problem The maximum weight bisection problem is the variant of the maximum cut problem, where one wants to find a cut (S, S) such that |S| = n/2 (called a bisection) having maximum possible weight. This is also an NP-hard problem. n As before let us encode a partition P (S, S) of V by a vector x ∈ {±1} . Then we have a bisection precisely when i∈V xi = 0. Hence the maximum bisection problem can be formulated by the following quadratic program:   1 X  X mcb (G, w) := max xi = 0 , wij (1 − xi xj ) : x2i = 1 (i ∈ V ), 2  i∈V {i,j}∈E 57 P which is obtained by adding the constraint i∈V xi = 0 to the program (4.5). T As before we consider a matrix X ∈ S n modelling the outer Pproduct2 xx . 
As P x ) = 0 and x = 0 can be equivalently written as ( the constraint i∈V i i∈V i P P thus as i,j∈V xi xj = 0, this gives the constraint i,j∈V Xij for the matrix X. Therefore, we get the following semidefinite program: sdpb (G, w) := max  1 2 X {i,j}∈E (1 − Xij ) : X  0, Xii = 1 (i ∈ V ), X i,j∈V which gives a relaxation for the maximum bisection problem:   Xij = 0 ,  (4.16) mcb (G, w) ≤ sdpb (G, w). Frieze and Jerrum [4] use and extend the idea of Goemans-Williamson to derive a 0.651-approximation algorithm for the maximum bisection problem. For this, one first performs the steps 1-4 of the Goemans-Williamson algorithm. That is, 1. Compute an optimal solution X of the semidefinite program (4.16) and vectors v1 , . . . , vn ∈ Rn such that Xij = viT vj for i, j ∈ V . 2. Choose a random unit vector r ∈ Rn (according to the rotationally invariant probability distribution on the unit sphere) and define the cut (S, S) by S = {i ∈ V : rT vi > 0}. In the next step, one has to modify the cut (S, S) in order to get a bisection. 3. Say s := |S| ≥ n/2 and write S = {i1 , . . . , is }, where w(i1 , S) ≥ w(i2 , S) ≥ . . . ≥ w(is , S). Then, set S̃ = {i1 , . . . , in/2 }, so that (S̃, V \ S̃) is a bisection. First we relate the weights of the two cuts δG (S) and δG (S̃). Lemma 4.3.1. We have: w(δG (S̃)) ≥ n 2|S| w(δ(S)). Proof. Set T = S \ S̃ = {in/2+1 , . . . , is }. By construction, w(j, S) ≤ w(i, S) for all i ∈ S̃ and j ∈ T . After summing over i ∈ S̃ and j ∈ T , we obtain that |S̃|w(T, S) ≤ |T |w(S̃, S). Therefore,   |S| |T | |S| = w(S̃, S) ≤ w(δG (S̃)) , w(S, S) = w(S̃, S)+w(T, S) ≤ w(S̃, S) 1 + |S̃| |S̃| |S̃| as w(δG (S̃)) = w(S̃, T ) + w(S̃, S) ≥ w(S̃, S). This concludes the proof . 58 We now consider the following random variable Z: Z= |S|(n − |S|) w(δG (S)) + . sdpb (G, w) n2 /4 The first term is the weight of the random cut (S, S) in (G, w) scaled by the optimum value of the semidefinite program, whose expected value is at least αGW = 0.878.. by the analysis of Goemans-Williamson. For the second term, observe that the numerator is the cardinality of the random cut (S, S) in the unweighted P 1−v T v complete graph Kn , while the denominator is n2 /4 = {i,j}∈E(Kn ) 2i j , usPn ing the fact that i=1 vi = 0. Hence the expected value of the second term too is at least αGW by the analysis of Goemans-Williamson. This shows: Lemma 4.3.2. E(Z) ≥ 2αGW , where αGW = 0.878... The strategy used by Frieze and Jerrum [4] to get a ”good” bisection is now to repeat the above steps 1-3 until obtained a cut (S, S) satisfying (almost) the condition: Z ≥ 2αGW . (Roughly speaking, this will happen with high probability after sufficiently many rounds.) Indeed, as we now show, if the random cut (S, S) satisfies the inequality: Z ≥ 2αGW , then the corresponding bisection (S̃, V \ S̃) provides a good (0.631) approximation of the maximum √ bisection. Note that 2( 2αGW − 1) ∼ 0.631. √ Lemma 4.3.3. If Z ≥ 2αGW then w(δG (S̃)) ≥ 2( 2αGW − 1)sdpb (G, w). Proof. Using Lemma 4.3.1, we get: w(δG (S̃)) ≥ n 2|S| w(δG (S)) = ≥ n 2|S| 2(n−|S|) sdpb (G, w)  n 2(n−|S|) sdpb (G, w). n Zsdpb (G, w) − n |S| αGW − √ n It suffices now to check that |S| αGW − 2(n−|S|) ≥ 2( 2αGW −1) or, equivalently, n 2 √ αGW − 2 2αGW |S| + 2|S| n  n2 ≥ 0. The latter holds since the left hand side is equal √ 2 √ to . αGW − 2 |S| n 4.3.2 The maximum k-cut problem Given a graph G = (V, E) and nonnegative edge weights w ∈ RE + , the maximum k-cut problem asks for a partition P = (S1 , . . . 
4.3.2 The maximum k-cut problem

Given a graph $G = (V,E)$ and nonnegative edge weights $w \in \mathbb{R}^E_+$, the maximum k-cut problem asks for a partition $\mathcal{P} = (S_1,\ldots,S_k)$ of $V$ into $k$ classes so that the total weight $w(\mathcal{P})$ of the edges cut by the partition is maximized. Here, an edge is cut by the partition $\mathcal{P}$ if its end nodes belong to distinct classes, and thus we define
\[
w(\mathcal{P}) = \sum_{1\le h < h' \le k}\ \sum_{\{i,j\}\in E,\, i\in S_h,\, j\in S_{h'}} w_{ij}.
\]
For $k = 2$, we find again MAX CUT.

As in the case of MAX CUT, there is a simple probabilistic algorithm permitting to construct a random partition of weight at least $w(E)(k-1)/k$. Namely, construct a random partition of $V$ into $k$ classes by assigning each node $i \in V$ independently, with probability $1/k$, to any of the $k$ classes. Then, the probability that two nodes $i,j$ fall into the same class is equal to $1/k$ and thus the expected weight of the random partition is $w(E)(1 - \frac{1}{k})$.

Frieze and Jerrum [4] present an approximation algorithm with performance guarantee $\alpha_k$ satisfying:

(i) $\alpha_k > 1 - \frac{1}{k}$ and $\displaystyle\lim_{k\to\infty} \frac{\alpha_k - (1 - \frac{1}{k})}{2 k^{-2}\ln k} = 1$.

(ii) $\alpha_2 = \alpha_{GW} > 0.878..$, $\alpha_3 \ge 0.832$, $\alpha_4 \ge 0.850$, $\alpha_5 \ge 0.874$, $\alpha_{10} \ge 0.926$, $\alpha_{100} \ge 0.99$.

The starting point is to model a partition $\mathcal{P} = (S_1,\ldots,S_k)$ of $V$ by a vector $x \in \mathbb{R}^n$ whose coordinates can take $k$ possible values. For $k = 2$ these two possible values are $\pm 1$ and, for general $k \ge 2$, these possible values are $k$ unit vectors $a_1,\ldots,a_k \in \mathbb{R}^{k-1}$ satisfying
\[
a_i^T a_j = -\frac{1}{k-1} \quad \text{for all } i \neq j \in [k].
\]
(Such vectors exist since the $k\times k$ matrix $kI - J$ is positive semidefinite with rank $k-1$.) Now we can formulate the maximum k-cut problem as the problem:
\[
\mathrm{mc}_k(G,w) = \max\Big\{ \frac{k-1}{k}\sum_{\{i,j\}\in E} w_{ij}(1 - x_i^T x_j) : x_1,\ldots,x_n \in \{a_1,\ldots,a_k\} \Big\}
\]
and a natural semidefinite relaxation is:
\[
\mathrm{sdp}_k(G,w) = \max\Big\{ \frac{k-1}{k}\sum_{\{i,j\}\in E} w_{ij}(1 - X_{ij}) : X \succeq 0,\ X_{ii} = 1\ (i\in V),\ X_{ij} \ge -\frac{1}{k-1}\ (i\neq j \in V) \Big\}. \tag{4.17}
\]
Frieze and Jerrum propose the following randomized algorithm (a small code sketch of the rounding in Steps 2-3 is given at the end of this subsection):

1. Compute an optimal solution $X$ of the semidefinite program (4.17) and vectors $v_1,\ldots,v_n \in \mathbb{R}^n$ such that $X = (v_i^T v_j)$.

2. Choose $k$ independent random vectors $r_1,\ldots,r_k \in \mathbb{R}^n$ (according to the rotationally invariant probability distribution on the unit sphere).

3. For $h \in [k]$, let $S_h$ consist of the nodes $i \in V$ for which $r_h^T v_i = \max_{h'\in[k]} r_{h'}^T v_i$. This defines a partition $\mathcal{P} = (S_1,\ldots,S_k)$ of $V$ (after breaking ties arbitrarily, as they occur with probability 0).

The analysis is along the same lines as the analysis for the case $k = 2$, although the technical details are more involved. We give a sketch only. The following observation is crucial: given two unit vectors $u, v$, the probability that both maxima in $\max_{h'\in[k]} r_{h'}^T u$ and $\max_{h'\in[k]} r_{h'}^T v$ are attained at the same vector among $r_1,\ldots,r_k$ depends only on the angle between the two vectors $u, v$ and thus on $u^T v$; we denote this probability by $k \cdot I(u^T v)$. Then the expected weight of the random partition $\mathcal{P}$ returned by the above algorithm is equal to
\[
\mathbb{E}(w(\mathcal{P})) = \sum_{\{i,j\}\in E} w_{ij}\big(1 - k I(v_i^T v_j)\big)
= \sum_{\{i,j\}\in E} w_{ij}\,\frac{k-1}{k}(1 - v_i^T v_j)\cdot\frac{k}{k-1}\,\frac{1 - k I(v_i^T v_j)}{1 - v_i^T v_j}
\ge \alpha_k \cdot \mathrm{sdp}_k(G,w),
\]
after setting
\[
\alpha_k := \min_{-\frac{1}{k-1}\le t < 1}\ \frac{k}{k-1}\cdot\frac{1 - k I(t)}{1 - t}.
\]
The evaluation of the constants $\alpha_k$ requires delicate technical details.
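For concreteness, here is a minimal sketch of the rounding in Steps 2-3 above (a Python/numpy sketch; the function name kcut_round and the representation of the vectors $v_i$ as rows of a matrix are our own assumptions).

```python
import numpy as np

def kcut_round(V, k, rng=None):
    """Rounding step of the max k-cut algorithm sketched above (Steps 2-3).

    V : (n, d) array whose i-th row is the unit vector v_i with Gram matrix X.
    k : number of classes.
    Returns an array label with label[i] = h meaning vertex i is put in S_h.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, d = V.shape
    # k independent random unit vectors r_1, ..., r_k.
    R = rng.standard_normal((k, d))
    R /= np.linalg.norm(R, axis=1, keepdims=True)
    # Assign each vertex i to the class h maximizing r_h^T v_i
    # (ties occur with probability zero).
    scores = V @ R.T            # scores[i, h] = r_h^T v_i
    return np.argmax(scores, axis=1)
```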
4.4 Extension to quadratic programming

In this section we consider several extensions to general quadratic problems, where some (weaker) approximation guarantees can be shown.

4.4.1 Nesterov's approximation algorithm

Let us now consider the general quadratic problem:
\[
\mathrm{qp}(A) = \max\Big\{ \sum_{i,j=1}^n A_{ij} x_i x_j : x_i^2 = 1\ \forall i \in [n] \Big\}, \tag{4.18}
\]
where $A \in \mathcal{S}^n$ is a symmetric matrix. Analogously we can define the semidefinite programming relaxation:
\[
\mathrm{sdp}(A) = \max\big\{ \langle A, X\rangle : X \succeq 0,\ X_{ii} = 1\ \forall i \in [n] \big\}. \tag{4.19}
\]
The inequality $\mathrm{qp}(A) \le \mathrm{sdp}(A)$ holds for any symmetric matrix $A$. In the case when $A = L_w/4$ is the (scaled) Laplacian matrix of the weighted graph $(G,w)$, we find the MAX CUT problem, $\mathrm{qp}(A) = \mathrm{mc}(G,w)$, and its semidefinite relaxation, $\mathrm{sdp}(A) = \mathrm{sdp}(G,w)$. Assuming all edge weights are nonnegative, we have just seen that $0.878..\cdot\mathrm{sdp}(A) \le \mathrm{qp}(A)$. The matrix $A = L_w/4$ has two specific properties: (a) it is positive semidefinite and (b) its off-diagonal entries are nonpositive.

We next consider the case when $A$ is assumed to be positive semidefinite only. Then Nesterov [7] shows that $\mathrm{sdp}(A)$ yields a $\frac{2}{\pi}$-approximation for $\mathrm{qp}(A)$ (Theorem 4.4.2 below). His proof is based on the same rounding technique of Goemans and Williamson, but the analysis is different. It relies on the following property of the function $\arcsin t$: there exist positive scalars $a_k > 0$ ($k \ge 1$) such that
\[
\arcsin t = t + \sum_{k\ge 1} a_k t^{2k+1} \quad \text{for all } t \in [-1,1]. \tag{4.20}
\]
Based on this one can show the following result.

Lemma 4.4.1. Given a matrix $X = (X_{ij}) \in \mathcal{S}^n$, define the new matrix $\tilde{X} = (\arcsin X_{ij} - X_{ij})_{i,j=1}^n$, whose entries are the images of the entries of $X$ under the map $t \mapsto \arcsin t - t$. Then, $X \succeq 0$ implies $\tilde{X} \succeq 0$.

Proof. The proof uses the following fact: if $X = (X_{ij})_{i,j=1}^n$ is positive semidefinite then, for any integer $k \ge 1$, the matrix $(X_{ij}^k)_{i,j=1}^n$ (whose entries are the $k$-th powers of the entries of $X$) is positive semidefinite as well (recall Section 1.2.2 of Chapter 1). Using this fact, the form of the series decomposition (4.20), and taking limits, implies the result of the lemma.

Theorem 4.4.2. Assume $A$ is a positive semidefinite matrix. Then,
\[
\mathrm{sdp}(A) \ge \mathrm{qp}(A) \ge \frac{2}{\pi}\,\mathrm{sdp}(A).
\]
Proof. Let $X$ be an optimal solution of the semidefinite program (4.19) and let $v_1,\ldots,v_n$ be unit vectors such that $X = (v_i^T v_j)_{i,j=1}^n$ (as in Steps 1,2 of the GW algorithm). Pick a random unit vector $r$ and set $x_i = \mathrm{sign}(v_i^T r)$ for $i \in [n]$ (as in Steps 3,4 of the GW algorithm). We now use Lemma 4.2.6(ii) to compute the expected value of $\sum_{i,j=1}^n A_{ij} x_i x_j$:
\[
\mathbb{E}\Big(\sum_{i,j=1}^n A_{ij} x_i x_j\Big) = \sum_{i,j=1}^n A_{ij}\,\mathbb{E}(x_i x_j) = \frac{2}{\pi}\sum_{i,j=1}^n A_{ij}\arcsin(v_i^T v_j)
= \frac{2}{\pi}\Big(\sum_{i,j=1}^n A_{ij} X_{ij} + \sum_{i,j=1}^n A_{ij}(\arcsin X_{ij} - X_{ij})\Big).
\]
By Lemma 4.4.1, the second term is equal to $\langle A, \tilde{X}\rangle \ge 0$, since $\tilde{X} \succeq 0$ and $A \succeq 0$. Moreover, we recognize in the first term the objective value of the semidefinite program (4.19). Combining these facts, we obtain:
\[
\mathbb{E}\Big(\sum_{i,j=1}^n A_{ij} x_i x_j\Big) \ge \frac{2}{\pi}\,\mathrm{sdp}(A).
\]
On the other hand, it is clear that $\mathrm{qp}(A) \ge \mathbb{E}\big(\sum_{i,j=1}^n A_{ij} x_i x_j\big)$. This concludes the proof.

4.4.2 Quadratic programs modeling MAX 2SAT

Here we consider another class of quadratic programs, of the form:
\[
\mathrm{qp}(a,b) = \max\Big\{ \sum_{ij\in E_1} a_{ij}(1 - x_i x_j) + \sum_{ij\in E_2} b_{ij}(1 + x_i x_j) : x \in \{\pm 1\}^n \Big\}, \tag{4.21}
\]
where $a_{ij}, b_{ij} \ge 0$ for all $ij$. Write the semidefinite relaxation:
\[
\mathrm{sdp}(a,b) = \max\Big\{ \sum_{ij\in E_1} a_{ij}(1 - X_{ij}) + \sum_{ij\in E_2} b_{ij}(1 + X_{ij}) : X \succeq 0,\ X_{ii} = 1\ \forall i \in [n] \Big\}. \tag{4.22}
\]
Goemans and Williamson [6] show that the same approximation result holds as for MAX CUT:

Theorem 4.4.3. Assume that $a, b \ge 0$. Then,
\[
\mathrm{sdp}(a,b) \ge \mathrm{qp}(a,b) \ge 0.878 \cdot \mathrm{sdp}(a,b).
\]
In the proof we will use the following variation of Lemma 4.2.7.

Lemma 4.4.4. For any $z \in [-1,1]$, the following inequality holds:
\[
\frac{2}{\pi}\,\frac{\pi - \arccos z}{1 + z} \ge 0.878.
\]
Proof. Set $t = -z \in [-1,1]$. Using the identity $\arccos(-t) = \pi - \arccos t$ and applying (4.15), we get:
\[
\frac{2}{\pi}\,\frac{\pi - \arccos z}{1 + z} = \frac{2}{\pi}\,\frac{\arccos t}{1 - t} \ge 0.878.
\]
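As a quick numerical sanity check of the constants appearing in Lemmas 4.2.7 and 4.4.4, one can evaluate both ratio functions on a fine grid. The following Python/numpy snippet is a sketch of such a check (the grid resolution is an arbitrary choice of ours, not part of the original analysis).

```python
import numpy as np

# Inspect the two ratio functions from Lemmas 4.2.7 and 4.4.4:
#   h1(t) = (2/pi) * arccos(t) / (1 - t)          for t in [-1, 1),
#   h2(z) = (2/pi) * (pi - arccos(z)) / (1 + z)   for z in (-1, 1].
# Both should stay above roughly 0.878; the minimum (about 0.8786) is
# attained near t ~ -0.69, respectively z ~ 0.69.
t = np.linspace(-1.0, 0.999, 200001)
h1 = (2 / np.pi) * np.arccos(t) / (1 - t)
z = np.linspace(-0.999, 1.0, 200001)
h2 = (2 / np.pi) * (np.pi - np.arccos(z)) / (1 + z)
print(h1.min(), t[h1.argmin()])   # ~0.8786, near t ~ -0.69
print(h2.min(), z[h2.argmin()])   # ~0.8786, near z ~ 0.69
```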
Proof. (of Theorem 4.4.3) We apply the GW algorithm: let $X = (v_i^T v_j)$ be an optimal solution of (4.22). Pick a random unit vector $r$ and set $x_i = \mathrm{sign}(v_i^T r)$ for $i \in [n]$. Using the fact that $\mathbb{E}(x_i x_j) = 1 - 2\,\mathbb{P}(x_i \neq x_j) = 1 - 2\,\frac{\arccos(v_i^T v_j)}{\pi}$, we can compute the expected value of the quadratic objective of (4.21) evaluated at $x$:
\[
\mathbb{E}\Big(\sum_{ij\in E_1} a_{ij}(1 - x_i x_j) + \sum_{ij\in E_2} b_{ij}(1 + x_i x_j)\Big)
= 2\sum_{ij\in E_1} a_{ij}\,\frac{\arccos(v_i^T v_j)}{\pi} + 2\sum_{ij\in E_2} b_{ij}\Big(1 - \frac{\arccos(v_i^T v_j)}{\pi}\Big)
\]
\[
= \sum_{ij\in E_1} \underbrace{a_{ij}(1 - v_i^T v_j)}_{\ge 0}\,\underbrace{\frac{2}{\pi}\,\frac{\arccos(v_i^T v_j)}{1 - v_i^T v_j}}_{\ge 0.878}
+ \sum_{ij\in E_2} \underbrace{b_{ij}(1 + v_i^T v_j)}_{\ge 0}\,\underbrace{\frac{2}{\pi}\,\frac{\pi - \arccos(v_i^T v_j)}{1 + v_i^T v_j}}_{\ge 0.878}
\ge 0.878 \cdot \mathrm{sdp}(a,b).
\]
Here we have used Lemmas 4.2.7 and 4.4.4. From this we can conclude that $\mathrm{qp}(a,b) \ge 0.878 \cdot \mathrm{sdp}(a,b)$.

In the next section we indicate how to use the quadratic program (4.21) in order to formulate MAX 2SAT.

4.4.3 Approximating MAX 2-SAT

An instance of MAX SAT is given by a collection of Boolean clauses $C_1,\ldots,C_m$, where each clause $C_j$ is a disjunction of literals, drawn from a set of variables $\{z_1,\ldots,z_n\}$. A literal is a variable $z_i$ or its negation $\overline{z}_i$. Moreover there is a weight $w_j$ attached to each clause $C_j$. The MAX SAT problem asks for an assignment of truth values to the variables $z_1,\ldots,z_n$ that maximizes the total weight of the clauses that are satisfied.

MAX 2SAT consists of the instances of MAX SAT where each clause has at most two literals. It is an NP-complete problem [5] and, analogously to MAX CUT, it is also hard to approximate. Goemans and Williamson show that their randomized algorithm for MAX CUT also applies to MAX 2SAT and yields again a 0.878-approximation algorithm. Prior to their result, the best approximation ratio was 3/4, due to Yannakakis (1994). To show this it suffices to model MAX 2SAT as a quadratic program of the form (4.21). We now indicate how to do this.

We introduce a variable $x_i \in \{\pm 1\}$ for each variable $z_i$ of the SAT instance. We also introduce an additional variable $x_0 \in \{\pm 1\}$ which is used as follows: $z_i$ is true if $x_i = x_0$ and false otherwise. Given a clause $C$, define its value $v(C)$ to be 1 if the clause $C$ is true and 0 otherwise. Thus,
\[
v(z_i) = \frac{1 + x_0 x_i}{2}, \qquad v(\overline{z}_i) = 1 - v(z_i) = \frac{1 - x_0 x_i}{2}.
\]
Based on this one can now express $v(C)$ for a clause with two literals:
\[
v(z_i \vee z_j) = 1 - v(\overline{z}_i \wedge \overline{z}_j) = 1 - v(\overline{z}_i)\,v(\overline{z}_j) = 1 - \frac{1 - x_0 x_i}{2}\cdot\frac{1 - x_0 x_j}{2}
= \frac{1 + x_0 x_i}{4} + \frac{1 + x_0 x_j}{4} + \frac{1 - x_i x_j}{4}.
\]
Analogously, one can express $v(z_i \vee \overline{z}_j)$ and $v(\overline{z}_i \vee \overline{z}_j)$, by replacing $x_i$ by $-x_i$ when $z_i$ is negated. In all cases we see that $v(C)$ is a linear combination of terms of the form $1 + x_i x_j$ and $1 - x_i x_j$ with nonnegative coefficients. Now MAX 2SAT can be modelled as
\[
\max\Big\{ \sum_{j=1}^m w_j\,v(C_j) : x_0^2 = x_1^2 = \ldots = x_n^2 = 1 \Big\}.
\]
This quadratic program is of the form (4.21). Hence Theorem 4.4.3 applies. Therefore, the approximation algorithm of Goemans and Williamson gives a 0.878-approximation for MAX 2SAT.
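To make the translation concrete, the following Python sketch evaluates $v(C)$ for a 2-clause from a $\pm 1$ assignment and checks that it coincides with the truth value of the clause. The encoding of literals and the function name clause_value are our own, purely illustrative choices.

```python
from itertools import product

def clause_value(lits, x):
    """Value v(C) of a clause with one or two literals, following the modeling above.

    lits : list of one or two literals, each a pair (i, positive) with
           i >= 1 the variable index and positive a bool (False = negated).
    x    : dict mapping 0, 1, ..., n to +1/-1; x[0] plays the role of x_0,
           and variable z_i is true iff x[i] == x[0].
    """
    def lit_value(i, positive):
        s = 1 if positive else -1
        return (1 + s * x[0] * x[i]) / 2      # v(z_i) or v(not z_i)

    if len(lits) == 1:
        return lit_value(*lits[0])
    (i, pi), (j, pj) = lits
    # v(l_i or l_j) = 1 - v(not l_i) * v(not l_j)
    return 1 - (1 - lit_value(i, pi)) * (1 - lit_value(j, pj))

# Sanity check: v(C) equals 1 exactly when the clause is satisfied.
for x0, x1, x2 in product([1, -1], repeat=3):
    x = {0: x0, 1: x1, 2: x2}
    truth = (x1 == x0) or (x2 != x0)          # clause z_1 or (not z_2)
    assert clause_value([(1, True), (2, False)], x) == (1 if truth else 0)
```

Expanding $v(C)$ symbolically, as done above, yields exactly the coefficients $a_{ij}, b_{ij} \ge 0$ of a program of the form (4.21).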
4.5 Further reading and remarks

We start with an anecdote: Knuth writes about the finding of the approximation ratio 0.878 in his article "Mathematical Vanity Plates". For their work [6], Goemans and Williamson won in 2000 the Fulkerson prize (sponsored jointly by the Mathematical Programming Society and the AMS), which recognizes outstanding papers in the area of discrete mathematics.

How good is the MAX CUT algorithm? Are there graphs where the value of the semidefinite relaxation and the value of the maximum cut are a factor of 0.878 apart, or is this value 0.878, which maybe looks strange at first sight, only an artefact of our analysis? It turns out that the value is optimal. In 2002 Feige and Schechtman gave an infinite family of graphs for which the ratio mc/sdp converges to exactly 0.878... This proof uses a lot of nice mathematics (continuous graphs, Voronoi regions, the isoperimetric inequality) and it is explained in detail in Chapter 8 of the book Approximation Algorithms and Semidefinite Programming of Gärtner and Matoušek.

In 2007, Khot, Kindler, Mossel and O'Donnell showed that the algorithm of Goemans and Williamson is optimal in the following sense: if the unique games conjecture is true, then there is no polynomial time algorithm achieving a better approximation ratio than 0.878, unless P = NP. Currently, the validity and the implications of the unique games conjecture are under heavy investigation. The book of Gärtner and Matoušek also contains an introduction to the unique games conjecture.

4.6 Exercises

4.1 The goal of this exercise is to show that the maximum weight stable set problem can be formulated as an instance of the maximum cut problem. Let $G = (V,E)$ be a graph with node weights $c \in \mathbb{R}^V_+$. Define the new graph $G' = (V',E')$ with node set $V' = V \cup \{0\}$, with edge set $E' = E \cup \{\{0,i\} : i \in V\}$, and with edge weights $w \in \mathbb{R}^{E'}_+$ defined by
\[
w_{0i} = c_i - \deg_G(i)\,M \ \text{ for } i \in V, \qquad w_{ij} = M \ \text{ for } \{i,j\} \in E.
\]
Here, $\deg_G(i)$ denotes the degree of node $i$ in $G$, and $M$ is a constant to be determined.

(a) Let $S \subseteq V$. Show: $w(S, V'\setminus S) = c(S) - 2M|E(S)|$.

(b) Show: if $M$ is sufficiently large, then $S \subseteq V$ is a stable set of maximum weight in $(G,c)$ if and only if $(S, V'\setminus S)$ is a cut of maximum weight in $(G',w)$. Give an explicit value of $M$ for which the above holds.

4.2 Let $G = (V = [n], E)$ be a graph and let $E_n$ denote the set of edges of the complete graph $K_n$ (thus $E \subseteq E_n$). Let $\pi_E$ denote the projection from the space $\mathbb{R}^{E_n}$ to its subspace $\mathbb{R}^E$, mapping $x = (x_{ij})_{1\le i<j\le n}$ to $y = (x_{ij})_{\{i,j\}\in E}$. The metric polytope MET(G) is the polytope in $\mathbb{R}^E$ defined by the inequalities:
\[
0 \le x_{ij} \le 1 \quad \text{for all edges } \{i,j\} \text{ of } G,
\]
and
\[
x(F) - x(C\setminus F) \le |F| - 1 \quad \text{for all cycles } C \text{ in } G \text{ and all subsets } F \subseteq C \text{ with odd cardinality } |F|.
\]
Recall that the metric polytope MET$_n$ is the polytope in $\mathbb{R}^{E_n}$ defined by the triangle inequalities:
\[
x_{ij} + x_{ik} + x_{jk} \le 2, \quad x_{ij} \le x_{ik} + x_{jk}, \quad x_{ik} \le x_{ij} + x_{jk}, \quad x_{jk} \le x_{ij} + x_{ik}
\]
for all $1 \le i < j < k \le n$. Show: MET(G) = $\pi_E$(MET$_n$).

4.3 Let $G = (V = [n], E)$ be a graph with edge weights $w \in \mathbb{R}^E$. Define the Laplacian matrix $L_w \in \mathcal{S}^n$ by: $(L_w)_{ii} = \sum_{j\in V:\{i,j\}\in E} w_{ij}$ for $i \in V$, $(L_w)_{ij} = -w_{ij}$ if $\{i,j\} \in E$, and $(L_w)_{ij} = 0$ otherwise.

(a) Show: $x^T L_w x = 2\sum_{\{i,j\}\in E} w_{ij}(1 - x_i x_j)$ for any vector $x \in \{\pm 1\}^n$.

(b) Show: if $w \ge 0$ then $L_w \succeq 0$.

(c) Give an example of weights $w$ for which $L_w$ is not positive semidefinite.

4.4 Let $G = (V,E)$ be a graph with nonnegative edge weights $w \in \mathbb{R}^E_+$. Let $\mathrm{sdp}(G,w)$ be the optimum value of the semidefinite relaxation for MAX CUT (as defined in (4.6)).

(a) Show: $\mathrm{sdp}(G,w) \le w(E)$.

(b) Show: $\mathrm{sdp}(G,w) = \mathrm{mc}(G,w)$ if $G$ is bipartite.

(c) Assume that $G$ is the complete graph $K_n$ and all edge weights are equal to 1. Show: $\mathrm{sdp}(K_n) = n^2/4$. What is the value of $\mathrm{mc}(K_n)$?

4.5 Let $G = (V,E)$ be a graph and let $\mathrm{sdp}(G)$ be the optimum value of the semidefinite relaxation for unweighted MAX CUT, where all edge weights are equal to 1.
Let $L$ denote the Laplacian matrix of $G$ and let $\lambda_{\max}(L)$ denote its largest eigenvalue.

(a) Show: if $G$ is vertex-transitive then $\mathrm{sdp}(G) = \frac{n}{4}\lambda_{\max}(L)$.

(b) Show: for the odd cycle $G = C_{2n+1}$ on $2n+1$ vertices,
\[
\mathrm{sdp}(G) = \frac{2n+1}{2}\Big(1 + \cos\frac{\pi}{2n+1}\Big).
\]
(c) How does the ratio $\frac{\mathrm{mc}(C_5)}{\mathrm{sdp}(C_5)}$ compare to the Goemans-Williamson ratio 0.878..?

4.6 Let $G = (V = [n], E)$ be a graph and let $w \in \mathbb{R}^E_+$ be nonnegative edge weights.

(a) Show the following reformulation of the MAX CUT problem:
\[
\mathrm{mc}(G,w) = \max\Big\{ \sum_{\{i,j\}\in E} w_{ij}\,\frac{\arccos(v_i^T v_j)}{\pi} : v_1,\ldots,v_n \text{ unit vectors in } \mathbb{R}^n \Big\}.
\]
Hint: Use the analysis of the Goemans-Williamson algorithm.

(b) Let $v_1,\ldots,v_5$ be unit vectors. Show:
\[
\sum_{1\le i<j\le 3} \arccos(v_i^T v_j) + \arccos(v_4^T v_5) - \sum_{i=1}^3\sum_{j=4}^5 \arccos(v_i^T v_j) \le 0.
\]
(c) Let $v_1,\ldots,v_7$ be unit vectors. Show:
\[
\sum_{1\le i<j\le 7} \arccos(v_i^T v_j) \le 12\pi.
\]

4.7 For a matrix $A \in \mathbb{R}^{m\times n}$ we define the following quantities:
\[
f(A) = \max_{I\subseteq[m],\,J\subseteq[n]} \Big| \sum_{i\in I}\sum_{j\in J} A_{ij} \Big|,
\]
called the cut norm of $A$, and
\[
g(A) = \max\Big\{ \sum_{i\in[m]}\sum_{j\in[n]} A_{ij} x_i y_j : x_1,\ldots,x_m, y_1,\ldots,y_n \in \{\pm 1\} \Big\}.
\]
(a) Show: $f(A) \le g(A) \le 4 f(A)$.

(b) Assume that all row sums and all column sums of $A$ are equal to 0. Show: $g(A) = 4 f(A)$.

(c) Formulate a semidefinite programming relaxation for $g(A)$.

(d) Show:
\[
g(A) = \max\Big\{ \sum_{i\in[m]}\sum_{j\in[n]} A_{ij} x_i y_j : x_1,\ldots,x_m, y_1,\ldots,y_n \in [-1,1] \Big\}.
\]
(e) Assume that $A$ is a symmetric positive semidefinite $n\times n$ matrix. Show:
\[
g(A) = \max\Big\{ \sum_{i=1}^n\sum_{j=1}^n A_{ij} x_i x_j : x_1,\ldots,x_n \in \{\pm 1\} \Big\}.
\]

4.8 Let $G = (V,E)$ be a graph with nonnegative edge weights $w \in \mathbb{R}^E_+$. The goal is to show that the maximum cut problem in $(G,w)$ can be formulated as an instance of computing the cut norm $f(A)$ of some matrix $A$ (as defined in Exercise 4.7). For this consider the following $2|E| \times |V|$ matrix $A$. Its columns are indexed by $V$ and, for each edge $e = \{u,v\} \in E$, there are two rows in $A$, indexed by the two oriented pairs $e_1 = (u,v)$ and $e_2 = (v,u)$, with entries:
\[
A_{e_1,u} = w_e, \quad A_{e_1,v} = -w_e, \quad A_{e_2,u} = -w_e, \quad A_{e_2,v} = w_e.
\]
Show: $\mathrm{mc}(G,w) = f(A)$.

BIBLIOGRAPHY

[1] F. Barahona. The max-cut problem in graphs not contractible to K5. Operations Research Letters 2:107–111, 1983.

[2] C. Delorme and S. Poljak. Laplacian eigenvalues and the maximum cut problem. Mathematical Programming 62:557–574, 1993.

[3] M. Deza and M. Laurent. Geometry of Cuts and Metrics. Springer, 1997.

[4] A. Frieze and M. Jerrum. Improved approximation algorithms for MAX k-CUT and MAX BISECTION. Algorithmica 18:67–81, 1997.

[5] M.R. Garey and D.S. Johnson. Computers and Intractability - A Guide to the Theory of NP-Completeness. Freeman, 1979.

[6] M.X. Goemans and D.P. Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. Journal of the ACM 42:1115–1145, 1995.

[7] Y. Nesterov. Quality of semidefinite relaxation for nonconvex quadratic optimization. CORE Discussion Paper 9719, 1997.

[8] S. Poljak. Polyhedral and eigenvalue approximations of the max-cut problem. In Sets, Graphs and Numbers, Vol. 60 of Colloquia Mathematica Societatis János Bolyai, Budapest, pp. 569–581, 1991.

[9] S. Poljak and Z. Tuza. The expected relative error of the polyhedral approximation of the max-cut problem. Operations Research Letters 16:191–198, 1994.

[10] A. Schrijver. A Course in Combinatorial Optimization. Lecture Notes. Available at http://homepages.cwi.nl/~lex/files/dict.pdf